The Week the Computer Started Using Itself

There's a feature that landed this week that I haven't been able to stop thinking about.

OpenAI's Codex — already used by more than two million developers — can now operate your macOS apps on its own. Not just write code. Use the actual applications. Click things. Navigate interfaces. Run builds. Open files you haven't opened yet and figure out what they mean.

That's a sentence that requires a second read.

The coding agent isn't waiting for you to hand it a task. It's in the app. It's doing the thing.

The Race to Own the Developer's Desktop

This week, every major lab shipped something aimed at the same target: the hour-to-hour workflow of a software developer.

Anthropic redesigned Claude Code's desktop app entirely. The new version has a sidebar for managing multiple sessions at once, a drag-and-drop workspace layout, and a built-in terminal and file editor. The design philosophy is explicit: you shouldn't be working with one agent on one task. You should be managing several agents running in parallel while you think about something else.

That's a different model of development work than anything we've had before. Not "the AI helps you write code." The AI runs tasks; you review outputs.

Two days later, OpenAI showed Codex opening macOS applications on its own: reading gauges in your project tracker, navigating your build tools, interacting with UIs that don't have APIs. The agent's workspace is now the entire computer, not just the editor.

Meanwhile, Claude Opus 4.7 launched. The early access feedback is unusually direct. One company reported that "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." Another said it "catches its own logical faults during the planning phase." A third said it's the first model they've tested that "resists dissonant-data traps," meaning scenarios where the context is subtly wrong and most models just go along with it.

That last one matters more than the benchmark score. Models that catch their own errors close a loop that has required human review at every step.

What $122 Billion Actually Bought

OpenAI closed its $122 billion round at an $852 billion post-money valuation earlier this month. The round announcement confirmed what the product roadmap has been telegraphing: the superapp thesis is real, and Codex is the enterprise wedge.

Two million weekly Codex users. Usage growing more than 70% month over month. APIs processing 15 billion tokens per minute.
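Those figures are easier to feel after a unit conversion. Here is a back-of-the-envelope sketch in Python, using only the numbers quoted above. The function names are mine, and holding 70% monthly growth constant for several months is an assumption for illustration, not something the announcement claims.

```python
# Back-of-the-envelope conversions for the usage figures quoted above.
# Assumption: the 70% monthly growth rate is held constant, which the
# source does not claim; it only reports the current rate.

TOKENS_PER_MINUTE = 15e9   # "15 billion tokens per minute"
MONTHLY_GROWTH = 0.70      # "more than 70% month over month" (lower bound)


def tokens_per_day(tokens_per_minute: float) -> float:
    """Convert a per-minute token rate into a per-day rate."""
    return tokens_per_minute * 60 * 24


def growth_multiple(monthly_growth: float, months: int) -> float:
    """Compound a fixed monthly growth rate over a number of months."""
    return (1 + monthly_growth) ** months


if __name__ == "__main__":
    print(f"{tokens_per_day(TOKENS_PER_MINUTE):.3e} tokens per day")
    for months in (3, 6, 12):
        multiple = growth_multiple(MONTHLY_GROWTH, months)
        print(f"{months:>2} months at +70%/month: {multiple:,.1f}x")
```

At a constant 70%, usage would be up roughly 580x after a year, which is why a rate like that can only describe a short stretch of time.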

The funding note is worth reading directly: they describe a flywheel where compute drives models, models drive products, products drive adoption, and adoption funds more compute. That's not remarkable — every platform company describes a flywheel. What's remarkable is that this one appears to actually be spinning. They went from $1B revenue per quarter at the end of 2024 to $2B per month now.
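The two revenue figures are quoted in different units, so the size of the jump is easy to miss. A minimal sketch of the arithmetic, assuming "$1B per quarter" and "$2B per month" are both flat run-rates for their periods (the variable and function names are mine):

```python
# Put both revenue figures on the same per-month basis.
# Assumption: each figure is a flat run-rate for its period.

QUARTERLY_REVENUE_END_2024_B = 1.0  # $1B per quarter, end of 2024
MONTHLY_REVENUE_NOW_B = 2.0         # $2B per month, now


def monthly_from_quarterly(quarterly_billions: float) -> float:
    """Spread a quarterly figure evenly across three months."""
    return quarterly_billions / 3


def annualized_from_monthly(monthly_billions: float) -> float:
    """Annualize a monthly run-rate."""
    return monthly_billions * 12


if __name__ == "__main__":
    then_monthly = monthly_from_quarterly(QUARTERLY_REVENUE_END_2024_B)
    multiple = MONTHLY_REVENUE_NOW_B / then_monthly
    print(f"End of 2024: ${then_monthly:.2f}B per month")
    print(f"Now:         ${MONTHLY_REVENUE_NOW_B:.2f}B per month")
    print(f"Multiple:    {multiple:.1f}x")
    print(f"Annualized:  ${annualized_from_monthly(MONTHLY_REVENUE_NOW_B):.0f}B")
```

On those assumptions, that is a sixfold increase in monthly revenue, and an annualized run-rate of about $24B.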

For comparison: OpenAI says it is growing four times faster than Google or Meta did in their early years. That's the company's own claim, but the underlying numbers are real.

The quiet decision to kill Sora, their video generator, is the tell. When you're at this scale, you cut the experiments that don't compound. Video generation is hard to turn into an enterprise workflow tool. Coding is trivially connectable to every CI/CD pipeline, every codebase, every deploy script that already exists. The product roadmap follows the money, and the money follows the developers.

Software Is Now a Target

The piece from earlier this week — "OpenAI, Google, and Anthropic are eating the software world alive" — describes something that feels obvious in retrospect but wasn't obvious when it was happening.

The original pitch for AI coding tools was productivity improvement. Twenty percent faster. Fifty percent fewer bugs. A junior engineer becoming more senior faster.

What's actually happening is category compression. The space between "idea" and "working software" is collapsing. The market for tools that help humans write code is shrinking as the market for agents that write code expands. Those are not the same market.

A friend who runs a development team described it to me recently: the bottleneck used to be engineering hours. Now the bottleneck is requirements clarity. If you can write down precisely what you want, you can have it running in the time it takes to review the output. The constraint moved up the stack.

That shift has a second-order effect nobody's talking about enough: the value of being able to specify things precisely just went way up. The market is about to care a lot more about people who can think clearly about what software should do, and a lot less about people who can implement it.

The Pentagon Wants Mythos Too

The story that deserves more attention this week: the White House is reportedly seeking access to Claude Mythos Preview — Anthropic's cybersecurity model, the one that finds zero-day vulnerabilities across operating systems and browsers.

At the same time, Google is reportedly in talks to let the Pentagon use Gemini in classified settings. OpenAI already has a contract allowing DOD use of its models for "all lawful purposes."

This is all happening while Anthropic is actively fighting the government in federal court. The Pentagon designated Anthropic a "supply-chain risk," essentially a national security concern, and Anthropic sued to block it. The judge is expected to rule soon. Meanwhile, the White House apparently wants a model built by the company the Pentagon called a risk.

The contradiction is not a bug. It's a feature of how government works. Multiple agencies with competing interests making decisions that don't coordinate. One hand designating you a threat; another hand asking if they can have a subscription.

What it tells you about the actual stakes: Mythos Preview is genuinely powerful enough that the national security apparatus wants it. They're not seeking access out of curiosity. They're seeking access because whoever controls this capability first has an asymmetric advantage in the vulnerability research and offensive security domains.

Anthropic responded to this entire situation by releasing Opus 4.7 — a slightly less capable version with cybersecurity-specific safeguards built in — as a way to iterate on deployment patterns before Mythos eventually goes broader. They called it the "first such model" in that progression. Read between the lines: they're building a controlled release infrastructure for a capability they believe needs one.

The Infrastructure That Actually Matters

Underneath all of this: compute.

Anthropic's Google and Broadcom deal for multiple gigawatts of TPU capacity — coming online starting in 2027 — is infrastructure at a scale that puts them in a different competitive tier than any AI company that isn't also a hyperscaler.

I think about infrastructure in terms of floors. What is the minimum any serious player needs? What does the floor cost? And who has already committed so far beyond the floor that the floor itself doesn't constrain them anymore?

A year ago, the floor was a handful of A100s and a good relationship with one cloud provider. Now it's gigawatts. The floor moved so fast that a lot of companies didn't notice they'd fallen through it until long after it had happened.

The labs that will matter in three years are the ones that signed the compute deals in 2025 and 2026. Not because compute is scarce — it won't always be — but because the models trained on that compute will compound. Better training infrastructure now means better models next year, which means more revenue, which means more infrastructure in 2027. That's not a prediction. That's just reading the announcements.

What This Week Says About Next Year

Here's the actual argument underneath everything I've described:

The AI industry is running a controlled experiment in replacing software labor with software infrastructure.

The coding agent tools aren't a feature. They're a preview of what enterprise software procurement looks like when the buyer can specify outcomes instead of paying for human-hour capacity. That transition is happening faster than anyone predicted twelve months ago.

The companies that are winning — not in the press cycle, in the actual economics — are the ones who understood earliest that AI is not a product category. It's an infrastructure layer. And infrastructure compounds in ways that products don't.

Codex using your Mac apps autonomously isn't a party trick. It's a signal about the direction of travel.

Every tool your organization uses that has an interface is now, in principle, automatable. Every workflow that runs through software is now, in principle, delegatable. The question isn't whether AI agents will run your internal processes. It's who builds the layer that runs all of them, and who the customers are, and what the contracts look like.

OpenAI at $2B per month already knows the answer to those questions. Anthropic at $30B run-rate is starting to. The rest of the industry is still writing the specification.

— Rock
