Meta has released Muse Spark, its first hosted model that is not an open-weights release. This is a departure for a company that has positioned itself as the open-source counterweight to proprietary frontier labs. The model is currently in private API preview but accessible via meta.ai with a Facebook or Instagram login. Artificial Analysis ranks it fourth overall, behind only Gemini 3.1 Pro, GPT 5.4, and Claude Opus 4.6. Whether that holds up under independent evaluation is an open question, but the gap from Llama 4 to a model competitive at the frontier tier is significant.

The meta.ai interface ships 16 built-in capabilities, including: web search, page loading, pattern matching, semantic search across Instagram, Threads, and Facebook content, image generation in artistic and realistic modes, a Python 3.9 sandbox with pandas, numpy, matplotlib, and scikit-learn, file management, sub-agent spawning, and account linking for calendar and email. The Python sandbox and sub-agent spawning put this in the same category as the Claude computer use and GPT operator workflows — a hosted AI that can delegate to other agents and execute code.
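The private API has no public documentation yet, but the delegate-and-execute pattern the tool list implies is easy to sketch. Everything below is an assumption for illustration: the tool names, the dispatch shape, and the handlers are hypothetical, not Meta's actual API.

```python
# Hypothetical sketch of a tool-dispatch loop like the one a sandbox +
# sub-agent setup implies. Tool names and payload shapes are invented.

def run_python(source: str) -> dict:
    """Execute code in a throwaway namespace, mimicking a sandbox tool."""
    namespace: dict = {}
    exec(source, namespace)  # a real sandbox would isolate this properly
    return {"result": namespace.get("result")}

def spawn_subagent(task: str) -> dict:
    """Stand-in for sub-agent spawning: here it just echoes the task."""
    return {"result": f"sub-agent handled: {task}"}

# Registry mapping tool names to handlers; a hosted model would emit
# (tool, payload) pairs and the runtime would route them the same way.
TOOLS = {"python": run_python, "spawn": spawn_subagent}

def dispatch(tool: str, payload: str) -> dict:
    handler = TOOLS.get(tool)
    if handler is None:
        return {"error": f"unknown tool: {tool}"}
    return handler(payload)

print(dispatch("python", "result = 2 + 2"))   # {'result': 4}
print(dispatch("spawn", "summarise thread"))
```

The point of the sketch is the routing layer: code execution and sub-agent delegation are just two entries in the same tool table, which is what makes a 16-tool surface manageable.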

The visual grounding tool has no direct equivalent in the current offerings from other labs. It analyses images and returns structured output in three formats: pixel coordinates for point detection, bounding boxes for spatial localisation, and object counts. Simon Willison’s write-up demonstrates it counting whiskers on a raccoon and locating pelicans with pixel-level precision. Grounding image understanding in structured positional output rather than prose description changes what you can build on top of it — downstream automation can act on coordinates in ways it cannot act on “there are three birds near the top left.”
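To make the downstream-automation point concrete, here is a minimal sketch of consuming such output. The response shape — field names like "points", "boxes", and "counts" — is an assumption for illustration; Meta has not published the actual schema.

```python
# Hypothetical visual-grounding response; every field name here is an
# assumed schema, not Meta's documented format.
grounding = {
    "points": [{"label": "pelican", "x": 412, "y": 88}],
    "boxes": [{"label": "pelican", "x0": 380, "y0": 60, "x1": 450, "y1": 120}],
    "counts": {"pelican": 3},
}

def crop_rects(boxes: list[dict], image_size: tuple[int, int]) -> list[tuple]:
    """Turn bounding boxes into crop rectangles clamped to the image bounds.

    This is the step prose output cannot support: coordinates feed directly
    into cropping, masking, or click automation.
    """
    w, h = image_size
    return [
        (max(0, b["x0"]), max(0, b["y0"]), min(w, b["x1"]), min(h, b["y1"]))
        for b in boxes
    ]

print(crop_rects(grounding["boxes"], (640, 480)))  # [(380, 60, 450, 120)]
print(grounding["counts"]["pelican"])              # 3
```

The same structure would drive a moderation pipeline (blur the box), a UI agent (click the point), or an analytics job (aggregate the counts) — none of which can be built reliably on a prose description.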

For practitioners choosing an AI stack, the competitive field now includes a Meta hosted offering with native social content search and visual grounding, which addresses two use cases that previously required separate specialised services. The Meta content search is obviously limited to Meta’s platforms, but for applications in social media monitoring, content moderation, or trend analysis, that is exactly the data source that matters.

The relevant caution is that benchmark rankings from the releasing lab should be treated as a starting point. Terminal-Bench 2.0 is one independent benchmark where Muse Spark underperforms its claimed positioning, and full independent evaluation takes weeks. The 16-tool setup and the visual grounding capability are worth watching closely; the benchmark claims are worth watching sceptically.