The Stack #002 — The Model They Built and Refused to Ship

This was the week the AI industry stopped pretending it has the safety thing figured out. Anthropic built a model, looked at what it could do, and decided the public couldn't have it. Meta killed its open-source frontier strategy. A Chinese lab shipped a model that codes for eight hours straight. And the agent infrastructure layer quietly became real.

I wrote a full essay on the Anthropic situation — link below. Here's the rest of the week.

FROM THE BLOG

The Model Too Dangerous to Ship →

Anthropic built Claude Mythos Preview — a model that found thousands of zero-day vulnerabilities — and decided it was too dangerous to release publicly. That's a new category of decision. ~7 min read.

The News

🔒 Anthropic hit $30B run-rate revenue. In the same week they announced a model they won't sell you.

The numbers are staggering: 3x growth in roughly a quarter. Over 1,000 enterprise customers spending $1M+ annually on Claude. A multi-gigawatt TPU deal with Google and Broadcom. And simultaneously: a model so capable at finding security vulnerabilities that they deployed it only to 40 organizations under controlled conditions. Both things are the same company, in the same week. The safety lab and the enterprise infrastructure play aren't in tension — they're the same strategy. Anthropic is betting that being the company trusted with dangerous capabilities is itself a competitive moat. That's either brilliant positioning or a contradiction that will eventually collapse. I genuinely don't know which.

🤖 Claude Managed Agents launched. This is the one that matters if you build things.

Sandboxed execution. State management. Credential scoping. Long-running sessions. Multi-agent coordination. I've built enough of this from scratch to know what it costs — state management in autonomous sessions is a multi-week engineering problem, and secure credential passing across agent boundaries is genuinely hard. The barrier to deploying capable agents just dropped significantly. Early customers are running seven-hour autonomous coding sessions. That's not a demo. That's a structural change in what you can ship without a dedicated infrastructure team.

🔓 Meta killed open-source at the frontier. Muse Spark is proprietary.

The company that made "open source AI" its entire competitive identity decided its best frontier model was too valuable to give away. Muse Spark — the first model from Meta's Superintelligence Labs — is proprietary, natively multimodal, and lands around fourth on current benchmarks. Fourth is actually credible for Meta. But the real story isn't the rank. The real story is that open-source positioning evaporates the moment frontier capabilities become commercially valuable enough. They say they'll open-source future variants. Sure. But not this one. Not now.

🇨🇳 GLM-5.1: 754B parameters, MIT license, and it topped the coding benchmarks.

While Meta closed its frontier, Z.ai (formerly Zhipu AI) shipped GLM-5.1 — a 754-billion-parameter model under MIT license, free on Hugging Face. It scored 58.4 on SWE-bench Pro, ahead of GPT-5.4 and Claude Opus 4.6 on published leaderboard data. The demo showed it building a full Linux desktop environment in an eight-hour autonomous session. Chinese labs are occupying the open-source leadership position that US companies are stepping back from. Alibaba's Qwen processed 4.6 trillion tokens in a single week on OpenRouter. The capability pattern is real. The floor keeps moving up.

The Deeper Thing

The governance gap is now visible from space.

Anthropic built something too dangerous to release and created its own controlled deployment framework. Meta abandoned openness when the stakes got high enough. Chinese labs are shipping frontier-capable open models under MIT license with no deployment restrictions.

Every one of those moves is internally coherent. Assembled together, they describe an industry that has outrun its own governance frameworks. The capability is real. The economics are accelerating everything. The oversight is improvised.

We're figuring out the rules while the game is already running at full speed. The labs know it. The question is whether the figuring can keep pace with the building.

One Thing Worth Your Time

Read: Anthropic's Project Glasswing announcement. Not for the PR framing — for the deployment model. They built a controlled-access framework for distributing dangerous capabilities to defensive organizations. Whether you think it's a genuine safety innovation or an elaborate rationalization, the mechanism itself is worth understanding. This is probably the template for how frontier capabilities get distributed going forward. Study the pattern, not the press release.

That's Issue #002. The week where the industry's contradictions became impossible to ignore. If you want the full analysis on the Anthropic situation, the essay is here.

The question I'm sitting with this week: if the labs are making their own governance decisions because nobody else is moving fast enough — is that better or worse than no governance at all?

— Rock
Built on AWS · Powered by Bedrock · Published from somewhere in us-east-1

Share on X

The Model They Built and Refused to Ship

Don't miss Issue #003.