The question of how much it actually costs to train a capable coding agent from scratch now has a concrete answer, at least at the 1.3B parameter scale. Nanocode is an open-source project that runs the full pipeline: base-model pretraining on code, supervised fine-tuning on synthetic data with critique loops, and Direct Preference Optimisation for alignment. The resulting model can read files, run Bash commands, make edits, and search code. The full run takes about nine hours on a TPU v6e-8 and costs roughly $200. A smaller 477M parameter variant costs $34 and trains in ninety minutes.

The project is written entirely in JAX and optimised for TPU execution via XLA compilation. The tokeniser incorporates code data from The Stack v2 at a 1:5 ratio with general text, which the authors report yields 50.9% better token efficiency on code compared to the baseline. The preference-optimisation step uses DPO rather than traditional RLHF, keeping the pipeline simpler and cheaper. The alignment recipe follows Anthropic’s Constitutional AI methodology: generate synthetic training data, run rejection sampling with critique loops, then fine-tune on the filtered outputs.
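Part of why DPO keeps the pipeline cheap is that the loss is just a classification objective over preference pairs, with no reward model or RL loop. A minimal sketch of the standard DPO loss in JAX looks like the following (this is an illustration of the technique, not Nanocode's actual implementation; the function name and argument shapes are assumptions):

```python
import jax
import jax.numpy as jnp

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimisation loss over a batch of preference pairs.

    Each argument is the summed log-probability a model assigns to a full
    response, shape (batch,). `policy_*` come from the model being trained,
    `ref_*` from a frozen reference copy (here, the SFT checkpoint).
    """
    # Implicit rewards: how much the policy has moved away from the
    # reference on the preferred vs. the rejected response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log(sigmoid(margin)) == softplus(-margin), which is numerically stable.
    margin = chosen_rewards - rejected_rewards
    return jnp.mean(jax.nn.softplus(-margin))
```

The loss is minimised by widening the log-probability margin between the preferred and rejected responses relative to the reference model, with `beta` controlling how far the policy is allowed to drift.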

The $200 figure is interesting less as a practical training cost and more as a benchmark for what the underlying compute actually looks like. Cloud AI API costs are opaque in a way that makes it easy to treat model capability as a kind of magic. Nanocode makes the cost concrete: nine hours of TPU time, a specific dataset construction procedure, a specific set of architectural decisions. The 1.3B model is nowhere near production Claude quality, but it is a functional coding agent trained from scratch for less than a dinner out in San Francisco.

For practitioners building specialised coding tools, the key takeaway is the recipe for domain-specific fine-tuning. The synthetic data generation approach, where you generate outputs, critique them, and train on the filtered set, is applicable at much smaller scales than full pretraining. If you have a specific coding task, a specific codebase pattern, or a specific workflow you want a model to follow, this pipeline gives you a principled path to get there without needing Google-scale resources.
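The generate-critique-filter loop described above is simple enough to sketch in a few lines. Here is a minimal Python version of the general rejection-sampling pattern, not Nanocode's code; `generate` and `critique` are stand-ins for whatever model and critic you plug in, and `k` and `threshold` are illustrative parameters:

```python
def build_sft_dataset(prompts, generate, critique, k=4, threshold=0.7):
    """Rejection sampling with a critique step.

    For each prompt, sample k candidate responses, score each with the
    critic, and keep only the best candidate if it clears the threshold.
    The surviving pairs become the supervised fine-tuning set.
    """
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        scored = [(critique(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset
```

The filtering step is what does the work: the model trains only on outputs its critic already endorses, so quality compounds across rounds instead of averaging toward the generator's typical output.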

The caveat is that $200 buys you a 1.3B parameter model trained on publicly available code data. The quality ceiling is real. You are not producing a competitor to frontier models. What you are producing is something you fully control, can run privately, and can fine-tune further for your specific use case. At the current price of cloud inference, a lightweight specialised model that handles 80% of your routine coding tasks locally might pay for itself in a month.