Uber allocated $3.4 billion to R&D in 2025 — a 9% year-on-year increase — and still managed to exhaust its AI coding budget before April. CTO Praveen Neppalli Naga told staff the company is “back to the drawing board” on AI tooling costs. The proximate cause was Claude Code adoption far exceeding projections, with usage surging since late 2025 while Cursor, the previous primary tool, plateaued. About 11% of Uber’s live backend code updates are now AI-generated.

The incentive structure made this predictable. Uber ranked engineers on internal leaderboards based on AI tool usage. Usage is easy to measure; whether the AI output was net beneficial is not. A leaderboard that rewards token consumption will produce token consumption. Engineers who want a good score have every reason to route tasks through AI tools regardless of whether a direct approach would be faster, cheaper, or produce better code. This is not a failure of the tools — it’s a failure of the metric.

The deeper issue is that AI coding tools have unusual cost curves. A senior engineer who would have spent 20 minutes writing a function instead writes a 500-word prompt, iterates through three model responses, and pastes a result. The wall-clock time is similar or faster, but the compute cost is invisible to the engineer and not tracked against any productivity output. Multiply this across thousands of engineers with leaderboard incentives and the cost scales independently of whether the output is good. Uber's 11% AI-generated code figure suggests real adoption, but the company apparently can't yet demonstrate that 11% maps to a proportional reduction in engineering time or defect rates.
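The scaling dynamic can be made concrete with a back-of-envelope model. Every number below is an illustrative assumption, not an Uber figure — the point is only that cost grows linearly with routed tasks, with no term anywhere for output quality.

```python
# Hypothetical model of invisible per-task compute cost at enterprise scale.
# All constants are assumptions for illustration, not reported Uber numbers.

ENGINEERS = 5000          # engineers with AI tooling enabled
TASKS_PER_DAY = 6         # tasks routed through the AI tool per engineer
TOKENS_PER_TASK = 60_000  # prompt + three iterated responses (assumed)
COST_PER_MTOK = 15.0      # blended $ per million tokens (assumed)

# Note: nothing in this formula depends on whether the output was good.
daily_cost = ENGINEERS * TASKS_PER_DAY * TOKENS_PER_TASK * COST_PER_MTOK / 1_000_000
annual_cost = daily_cost * 250  # working days per year

print(f"daily:  ${daily_cost:,.0f}")
print(f"annual: ${annual_cost:,.0f}")
```

Under these assumptions the spend runs to millions per year, and a leaderboard incentive pushes every multiplier in the formula upward at once.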

Enterprise AI deployments running into this pattern should stop measuring usage. The right metrics are task completion time against baseline, defect rate on AI-assisted versus human-written code, and whether AI usage is concentrated in the task types it actually improves. If your AI cost is growing faster than any of those outputs, the tool is being used but not applied. Leaderboards for a tool that charges per token are a good way to generate very expensive usage data.
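The three metrics above can be computed from ordinary task-tracking data. A minimal sketch, with a hypothetical schema and made-up sample records (no real tool exposes exactly these fields):

```python
# Sketch of outcome metrics to replace usage leaderboards.
# The Task schema and sample data are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Task:
    kind: str          # task type, e.g. "boilerplate" or "core-logic"
    ai_assisted: bool  # whether the change was AI-assisted
    hours: float       # time to completion
    defects: int       # defects surfaced in review or production

tasks = [
    Task("boilerplate", True, 0.5, 0),
    Task("boilerplate", False, 2.0, 0),
    Task("core-logic", True, 3.0, 2),
    Task("core-logic", False, 3.5, 1),
]

def mean(xs):
    return sum(xs) / len(xs)

ai = [t for t in tasks if t.ai_assisted]
human = [t for t in tasks if not t.ai_assisted]

# 1. Task completion time against the human baseline.
speedup = mean([t.hours for t in human]) / mean([t.hours for t in ai])

# 2. Defect rate on AI-assisted versus human-written code.
ai_defect_rate = sum(t.defects for t in ai) / len(ai)
human_defect_rate = sum(t.defects for t in human) / len(human)

# 3. Where AI usage actually concentrates, by task type.
by_kind = {}
for t in ai:
    by_kind[t.kind] = by_kind.get(t.kind, 0) + 1

print(f"speedup x{speedup:.2f}; defects/task AI={ai_defect_rate} human={human_defect_rate}")
print("AI usage by task type:", by_kind)
```

In this toy data the AI is faster overall but ships more defects on core logic — exactly the kind of split a raw usage count hides.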

Uber is still committed to the direction — Naga described a vision of “agent engineers” handling full development cycles — but the current moment is a recalibration. The company has real AI-generated code in production and real costs that outran its models. Working out what the actual productivity improvement is, and pricing AI tooling against that rather than against unlimited consumption, is the correct next step. Other enterprises on similar trajectories should do the same calculation before they hit the same wall.