Anthropic shipped streaming tool use for Claude: agent frameworks can now invoke tools while the model is still generating, rather than waiting for the full response to complete. Previously, you had to wait for the entire generation to finish before parsing out any tool calls, which added noticeable latency to multi-step agent workflows.

The change matters most for agentic systems that chain multiple tool calls together. A research agent that needs to search, fetch, and summarise can now kick off the search while still generating its reasoning about what to search for. In practice, this can cut end-to-end latency significantly for workflows that involve three or more sequential tool calls.
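The overlap is easiest to see in code. Here is a minimal sketch of the dispatch pattern, using a simulated event stream: tool calls are launched as tasks the moment they appear, instead of after the stream ends. The event dicts and the `run_tool` stub are illustrative stand-ins, not real SDK objects or tools.

```python
import asyncio

# Simulated stream events. The shapes are illustrative stand-ins for the
# real streaming events, not the SDK's actual types.
EVENTS = [
    {"type": "text", "text": "I should search for recent papers."},
    {"type": "tool_call", "name": "search", "input": {"query": "streaming tool use"}},
    {"type": "text", "text": "While that runs, I can plan the summary."},
    {"type": "tool_call", "name": "fetch", "input": {"url": "https://example.com/paper"}},
]

async def run_tool(name, tool_input):
    # Placeholder tool: a real implementation would call a search API, etc.
    await asyncio.sleep(0.05)  # pretend network latency
    return f"{name} result for {tool_input}"

async def consume(events):
    tasks = []
    for event in events:
        if event["type"] == "tool_call":
            # Dispatch immediately instead of waiting for the stream to end,
            # so tool latency overlaps with the model's remaining generation.
            tasks.append(asyncio.create_task(run_tool(event["name"], event["input"])))
    return await asyncio.gather(*tasks)

results = asyncio.run(consume(EVENTS))
print(results)
```

With non-streaming responses, both tool calls would start only after the final text token; here the `search` task is already running while later events are still being consumed.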

The implementation follows the existing tool use schema, so migrating from non-streaming mode is straightforward. Tools are defined the same way; the difference is that tool call events now arrive as part of the stream rather than bundled at the end. Most agent frameworks that support Claude will likely update their streaming handlers within days.
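Concretely, a streaming handler has to accumulate a tool call across several events before it can act on it. The sketch below follows Anthropic's documented streaming event format (`content_block_start` for a `tool_use` block, `content_block_delta` events carrying `input_json_delta` fragments, then `content_block_stop`), but the event dicts here are hand-built simulations rather than real SDK objects:

```python
import json

# Simulated tool-use stream events, following the documented event types:
# tool input arrives as partial JSON strings, one fragment per delta event.
events = [
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_01", "name": "search"}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"query": '}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '"streaming tool use"}'}},
    {"type": "content_block_stop", "index": 1},
]

def collect_tool_calls(events):
    blocks = {}     # block index -> {"name": ..., "json": accumulated fragments}
    completed = []
    for ev in events:
        if ev["type"] == "content_block_start" and ev["content_block"]["type"] == "tool_use":
            blocks[ev["index"]] = {"name": ev["content_block"]["name"], "json": ""}
        elif ev["type"] == "content_block_delta" and ev["index"] in blocks:
            # Fragments are not valid JSON on their own; just concatenate.
            blocks[ev["index"]]["json"] += ev["delta"]["partial_json"]
        elif ev["type"] == "content_block_stop" and ev["index"] in blocks:
            block = blocks.pop(ev["index"])
            # The block is complete, so the accumulated string parses cleanly.
            completed.append({"name": block["name"], "input": json.loads(block["json"])})
    return completed

calls = collect_tool_calls(events)
print(calls)  # [{'name': 'search', 'input': {'query': 'streaming tool use'}}]
```

The key migration point is the buffering: a non-streaming handler receives the tool input as a finished object, while a streaming handler must wait for `content_block_stop` before parsing, and can only dispatch the tool at that moment.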

For anyone building production agent systems, this is the kind of infrastructure improvement that compounds. Individually, saving a few hundred milliseconds per tool call is minor. Across a five-step agent workflow running thousands of times a day, it adds up to a meaningful reduction in both latency and cost (since connections are held open for shorter periods).
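The back-of-envelope arithmetic, with all figures illustrative rather than measured:

```python
# Illustrative numbers only: "a few hundred milliseconds" saved per call,
# a five-step workflow, running ten thousand times a day.
saved_per_call_s = 0.3
steps_per_run = 5
runs_per_day = 10_000

saved_per_day_s = saved_per_call_s * steps_per_run * runs_per_day
print(f"{saved_per_day_s:,.0f} s/day saved (~{saved_per_day_s / 3600:.1f} hours)")
# → 15,000 s/day saved (~4.2 hours)
```

Hours of cumulative connection time per day is where the cost side shows up, even though no single request feels dramatically faster.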

The broader signal is that API providers are now optimising for agentic use cases specifically, not just chat completion. Expect more changes in this direction from all the major providers over the coming months.