sharpbyte.dev
Sample · Path A topic

Python for ML code

A topic page spells out the outcome you should get, what to know first, common traps, and where to read next. It then shows code in the same terminal-style panel used across the tracks, so examples stay easy to scan. Many copilot patterns pair a hosted LLM with RAG for grounded answers; see the hub glossary for more anchors like #mcp or #eval.

Outcome

You can create a reproducible training script layout with logging, config entrypoints, and a smoke test before touching data.
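
One possible reading of "a smoke test before touching data" is to run a single training step on a handful of synthetic rows and check that a well-formed metrics line lands on disk. The sketch below assumes that reading; `train_one_step` is a hypothetical stand-in for a real training loop, and the toy model (fitting `y = w * x`) is only there to produce a loss.

```python
import json
import random
import tempfile
from pathlib import Path


def train_one_step(rows: list[tuple[float, float]], lr: float, out_dir: Path) -> float:
    """Hypothetical stand-in for a real loop: one gradient step fitting y = w * x."""
    w = 0.0
    # Gradient of the mean squared error with respect to w, evaluated at w = 0.
    grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
    w -= lr * grad
    loss = sum((w * x - y) ** 2 for x, y in rows) / len(rows)
    out_dir.mkdir(parents=True, exist_ok=True)
    with (out_dir / "metrics.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"epoch": 0, "loss": loss}) + "\n")
    return loss


def smoke_test() -> dict:
    """Run one step on synthetic rows and check the log line that comes out."""
    rng = random.Random(0)  # local generator, so global random state is untouched
    rows = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(8))]
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run"
        loss = train_one_step(rows, lr=0.1, out_dir=run_dir)
        lines = (run_dir / "metrics.jsonl").read_text(encoding="utf-8").splitlines()
    record = json.loads(lines[0])
    assert len(lines) == 1 and record["epoch"] == 0
    assert record["loss"] == loss and loss >= 0.0
    return record
```

A check like this runs in well under a second, so it can sit in front of every CI job or be called from a pytest function.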

Prerequisites

  • Comfort with Python 3.10+ syntax
  • Terminal + virtual environment workflow

Core ideas

  • Isolate dependencies per project (venv or container)
  • Make paths portable with pathlib
  • Log structured context (epoch, split, metric) instead of relying on print alone
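
To make the structured-logging idea concrete, here is a minimal sketch built only on the standard `logging` module. The class name `JsonLineFormatter`, the logger name, and the field list are assumptions for illustration; the standard library does not emit `extra` fields on its own, so the formatter picks them off the record explicitly.

```python
import io
import json
import logging

# Hypothetical context fields, chosen to match the bullet above.
CONTEXT_FIELDS = ("epoch", "split", "metric", "value")


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object, including selected `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        for name in CONTEXT_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


def make_logger(stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("train.sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
# Illustrative values only; a real run would pass measured metrics.
log.info("eval_done", extra={"epoch": 3, "split": "val", "metric": "accuracy", "value": 0.91})
print(buffer.getvalue(), end="")
```

Each call produces one self-describing line, which is easier to filter by epoch or split than free-form print output.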

Mental model

Treat every training run like a micro-service deployment: pinned deps, explicit entrypoint, observable logs.
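
An "explicit entrypoint" can be as small as an `argparse` parser that names every knob a run accepts. The flags below (`--config`, `--seed`, `--dry-run`) are hypothetical choices for this sketch, not a prescribed interface.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line surface of the training script in one place."""
    parser = argparse.ArgumentParser(prog="train.py", description="Launch one training run.")
    parser.add_argument("--config", type=Path, default=Path("configs/experiment.yaml"),
                        help="path to the YAML experiment config")
    parser.add_argument("--seed", type=int, default=None,
                        help="override the seed stored in the config")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate arguments and exit before training")
    return parser


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse `argv`, or sys.argv when `argv` is None."""
    return build_parser().parse_args(argv)
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`, for example `parse_args(["--config", "configs/debug.yaml", "--seed", "7"])`.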

Hands-on

Scaffold a train.py that reads a YAML config, sets seeds, and writes JSON Lines logs.
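
Two pieces of that scaffold are easy to get subtly wrong: validating the loaded config and seeding. The sketch below assumes the config arrives as a plain dict (as `yaml.safe_load` would return) with `lr` and `output_dir` keys plus an optional `seed` key; `TrainConfig` and `set_seeds` are hypothetical names. NumPy is seeded only if it happens to be installed.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    lr: float
    output_dir: str
    seed: int = 0  # assumed optional key

    @classmethod
    def from_dict(cls, raw: dict) -> "TrainConfig":
        """Fail early on missing keys and coerce values to the expected types."""
        missing = {"lr", "output_dir"} - set(raw)
        if missing:
            raise KeyError(f"config is missing keys: {sorted(missing)}")
        # float() also covers values such as "1e-3", which PyYAML can load as strings.
        return cls(
            lr=float(raw["lr"]),
            output_dir=str(raw["output_dir"]),
            seed=int(raw.get("seed", 0)),
        )


def set_seeds(seed: int) -> None:
    """Seed the stdlib generator, plus NumPy when it is available."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return
    np.random.seed(seed)
```

Frameworks with their own generators (for example a deep learning library) need their own seeding call in addition to this.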

Example — training entrypoint sketch


train.py
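# Minimal reproducible entry point; expand in your own repo
from __future__ import annotations

import json
import logging
import random
from pathlib import Path

import yaml  # third-party: pip install pyyaml

logger = logging.getLogger(__name__)


def load_config(path: Path) -> dict:
    """Read the YAML experiment config into a plain dict."""
    with path.open("r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def main() -> None:
    # Without basicConfig the root logger drops INFO records.
    logging.basicConfig(level=logging.INFO)
    cfg = load_config(Path("configs/experiment.yaml"))
    # Seed before any randomness; "seed" is an assumed config key.
    random.seed(cfg.get("seed", 0))
    # `extra` only attaches fields to the record; a JSON formatter is needed to emit them.
    logger.info("run_start lr=%s", cfg["lr"], extra={"json_fields": {"lr": cfg["lr"]}})
    out = Path(cfg["output_dir"])
    out.mkdir(parents=True, exist_ok=True)
    # Append one JSON object per line so later epochs do not overwrite earlier ones.
    with (out / "metrics.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"epoch": 0, "loss": 0.42}) + "\n")  # placeholder values


if __name__ == "__main__":
    main()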

Real-world example

Internal fine-tuning jobs are launched from a single CLI that records the Git SHA, dataset version, and hyperparameters, so audits can be matched against what shipped.
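
A run manifest of that kind might be sketched as follows. This is an assumption about how such a CLI could record its inputs, not a description of any particular internal tool; `write_manifest`, the `manifest.json` filename, and the field names are hypothetical. The commit hash comes from `git rev-parse HEAD`, with a fallback when Git or a checkout is unavailable.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path


def git_sha(repo_dir: Path) -> str:
    """Return the current commit hash, or "unknown" outside a usable Git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def write_manifest(out_dir: Path, dataset_version: str, hparams: dict,
                   repo_dir: Path = Path(".")) -> Path:
    """Write the facts needed to audit a run next to its outputs."""
    manifest = {
        "git_sha": git_sha(repo_dir),
        "dataset_version": dataset_version,
        "hparams": hparams,
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

Writing the manifest before training starts means even a crashed run leaves behind a record of what was attempted.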

Pitfalls

  • Hard-coded absolute paths that break on CI
  • Silent pip install drift between laptops
  • No seed control → “unlucky” runs that cannot be reproduced
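
The first pitfall can often be avoided by resolving every path against a discovered project root. The sketch below assumes the root is marked by a `pyproject.toml` file; the marker and both helper names are hypothetical and can be swapped for whatever the repo uses.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {start}")


def resolve_in_project(root: Path, relative: str) -> Path:
    """Join a config-supplied relative path onto the root, rejecting absolute ones."""
    path = Path(relative)
    if path.is_absolute():
        raise ValueError(f"expected a path relative to the project root, got {relative!r}")
    return root / path
```

Rejecting absolute paths at load time surfaces a laptop-specific config on the developer's machine rather than on CI.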

Further reading

Python logging cookbook; Twelve-Factor config discipline; pathlib docs.

Bridge

Practical counterpart: Path B — Scripting & APIs