A new paper on arXiv presents evidence that reasoning models encode their action choices in internal representations before the chain-of-thought text plays out. On this account, the model has already decided what to do by the time it starts writing; the CoT reads as rationalisation, not deliberation.
This matters more than it might first appear. A growing number of production systems use chain-of-thought outputs as an audit trail, treating reasoning traces as evidence that the model considered alternatives and arrived at its answer through a sound process. If the trace is post-hoc narration rather than the actual decision pathway, that audit trail is less trustworthy than assumed.
The finding also raises uncomfortable questions for alignment researchers working on process supervision. Methods that reward or penalise specific reasoning steps may be shaping the narration without changing the underlying decision. You could train a model to produce impeccable reasoning traces while leaving its actual decision-making process untouched.
For practitioners building systems that depend on CoT inspection (think: automated code review, medical reasoning chains, legal analysis), the practical implication is clear. Treat reasoning traces as one signal among many, not as ground truth about how the model reached its conclusion. Pair them with output validation and, where possible, probe internal representations directly.
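The last suggestion, probing internal representations, usually means training a simple linear probe on hidden states captured before the CoT is generated, and checking whether the eventual decision is already decodable. The sketch below is illustrative only: it uses synthetic "hidden states" with a planted decision direction rather than real model activations, and a hand-rolled logistic-regression probe in NumPy. All names and dimensions here are assumptions, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 synthetic "hidden states" (dimension 64) in which
# a binary action is linearly encoded along one planted direction, plus
# Gaussian noise. A real probe would use activations captured from the model
# at the prompt's final token, before any CoT text is emitted.
n, d = 1000, 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n)  # the "decision" each state encodes
states = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, direction) * 2.0

# Held-out split so probe accuracy measures decoding, not memorisation.
split = 800
Xtr, ytr = states[:split], labels[:split]
Xte, yte = states[split:], labels[split:]

# Logistic-regression probe trained with plain gradient descent.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))  # predicted P(label = 1)
    w -= lr * (Xtr.T @ (p - ytr)) / split
    b -= lr * np.mean(p - ytr)

acc = np.mean(((Xte @ w + b) > 0) == yte)
print(f"probe accuracy on held-out states: {acc:.2f}")
```

If a probe like this decodes the final decision well above chance from pre-CoT activations, that is the kind of evidence the paper points to; on the synthetic data above it recovers the planted direction almost perfectly, which real activations will not match.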
The paper is early-stage and the findings will need replication across model families. But the direction is consistent with what mechanistic interpretability work has been suggesting for the past year: what models say they are doing and what they are actually doing are not always the same thing.