Gradio has shipped gradio.Server, an extension of FastAPI that exposes Gradio’s backend infrastructure without requiring the Gradio UI. You annotate functions with @app.api() and get queuing, request serialisation, concurrency control, and ZeroGPU support. You serve your own HTML, React app, or anything else from @app.get() routes. The Gradio aesthetic — which has been a persistent friction point for teams building production-facing tools — is now optional.

Gradio has always been the fastest path to a GPU-backed API on Hugging Face Spaces: a few lines of Python, free hosting, automatic batching, and ZeroGPU for on-demand GPU allocation. The problem is that the default Gradio UI is immediately recognisable and hard to customise beyond surface-level theming. Teams building internal tools or customer-facing demos have had to choose between Gradio’s backend convenience and design freedom. The decoupling resolves that tension directly.

The @app.api() decorator does more than raw FastAPI would out of the box. It handles GPU collision prevention by serialising concurrent requests through the queuing engine, which matters when your inference function needs to hold a GPU allocation for several seconds. It also registers the function as an MCP tool via @app.mcp.tool(), which means Claude and other MCP-compatible clients can call it directly without any additional plumbing. The reference demo in the announcement is a text-behind-image app: roughly 50 lines of Python for the backend (BiRefNet model, PyTorch, PIL), and around 1,300 lines of vanilla HTML/CSS/JS for the frontend. No build tools, full client-side PNG export, and it runs on Spaces with ZeroGPU.

If you have been hosting inference endpoints on raw FastAPI because Gradio’s UI did not fit your use case, this changes the calculus. The queuing and ZeroGPU integration alone save meaningful engineering time on Space-hosted demos and internal tools. If you are serving to MCP clients, the @app.mcp.tool() registration is a one-liner that removes a manual integration step.

This is still Spaces-first. The ZeroGPU benefit only applies on Hugging Face Spaces. For self-hosted deployments, gradio.Server is just a FastAPI wrapper with queuing, which is useful but less distinctive. Teams with their own GPU infrastructure may find the overhead of the Gradio dependency unjustified relative to a purpose-built FastAPI setup.