Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch. Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures. The project also hit a wall at roughly that 100,000-line mark, which suggests a practical ceiling for autonomous agentic coding, at least with current models.
A year ago, no language model could have produced anything close to a functional multi-architecture compiler, even with this kind of babysitting and an unlimited budget. The methodology of parallel agents coordinating through Git with minimal human supervision is novel, and the engineering tricks Carlini developed to keep the agents productive (context-aware test output, time-boxing, the GCC oracle for parallelization) could prove useful well beyond this project as agentic software development tools spread.
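Carlini’s post is the authoritative description of those tricks. Purely as a rough illustration of the general idea behind using GCC as an oracle, a differential check compiles the same C file with both compilers, runs both binaries, and compares the results; the compiler name `./rustcc`, the file names, and the harness structure below are assumptions for this sketch, not the project’s actual tooling.

```rust
use std::process::Command;

/// Hypothetical differential check: build the same source with GCC (the
/// oracle) and with the agent-built compiler, run both binaries, and flag
/// any divergence in stdout or exit status as a likely miscompilation.
fn differs_from_gcc(source: &str) -> std::io::Result<bool> {
    // Build the reference binary with GCC.
    Command::new("gcc").args([source, "-o", "ref_bin"]).status()?;
    // Build the candidate binary with the agent-written compiler (assumed name).
    Command::new("./rustcc").args([source, "-o", "test_bin"]).status()?;

    let reference = Command::new("./ref_bin").output()?;
    let candidate = Command::new("./test_bin").output()?;

    Ok(reference.stdout != candidate.stdout
        || reference.status.code() != candidate.status.code())
}

fn main() -> std::io::Result<()> {
    if differs_from_gcc("test.c")? {
        eprintln!("candidate compiler disagrees with GCC on test.c");
    } else {
        println!("test.c: outputs match");
    }
    Ok(())
}
```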
Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature launched with Claude Opus 4.6 called “agent teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, then pushing completed code back upstream. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and started solving it. When merge conflicts arose, the AI model instances resolved them on their own.
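The lock-file protocol is the interesting part of that setup. The sketch below shows one minimal way such claiming can work over a shared repository; the `locks/` directory, the task names, and the exact Git invocations are hypothetical details for illustration, not the project’s actual harness.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::process::Command;

/// Try to claim `task` by atomically creating locks/<task>.lock; the create
/// fails if another agent already wrote it. Pushing the lock file publishes
/// the claim; a rejected push means a concurrent claim won the race.
fn try_claim(task: &str, agent_id: &str) -> bool {
    fs::create_dir_all("locks").ok();
    let lock_path = format!("locks/{task}.lock");
    let Ok(mut lock) = OpenOptions::new()
        .write(true)
        .create_new(true) // atomic "claim" on the local checkout
        .open(&lock_path)
    else {
        return false; // someone already holds this task locally
    };
    let _ = writeln!(lock, "claimed by {agent_id}");

    // Publish the claim through the shared repository.
    let ok = |status: std::io::Result<std::process::ExitStatus>| {
        status.map(|s| s.success()).unwrap_or(false)
    };
    ok(Command::new("git").args(["add", &lock_path]).status())
        && ok(Command::new("git")
            .args(["commit", "-m", &format!("claim {task} ({agent_id})")])
            .status())
        && ok(Command::new("git").arg("push").status())
}

fn main() {
    if try_claim("parser-bitfields", "agent-07") {
        println!("claimed task: parser-bitfields");
    } else {
        println!("task already taken; move on to the next obvious problem");
    }
}
```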
A C compiler is a near-ideal task for semi-autonomous AI coding. The specification is decades old and well defined, comprehensive test suites already exist, and there’s a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.
The compiler also has clear limitations that Carlini was upfront about. It lacks a 16-bit x86 backend needed to boot Linux from real mode, so it calls out to GCC for that step. Its own assembler and linker remain buggy. Even with all optimizations enabled, it produces less-efficient code than GCC running with all optimizations disabled. And the Rust code quality, while functional, does not approach what an expert Rust programmer would produce. “The resulting compiler has nearly reached the limits of Opus’s abilities,” Carlini wrote. “I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.”
The agentic workflow itself is the core innovation: 16 parallel instances on a shared Git repo, using lock files to claim tasks, resolving their own merge conflicts, specializing (one on parsing, another on optimization and critique), and iterating against tests. This goes far beyond single-model code generation.
The agents built it from scratch in Rust (no external dependencies beyond the standard library) as a new implementation, not a fork, port, or direct copy-paste of existing codebases. They implemented parsing, intermediate representation, optimization passes, multiple backends, assembler, linker, and debug info generation through iterative reasoning.
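The post excerpted here doesn’t show the internal structure, so purely as an orientation aid, a compiler with those components tends to have a pipeline shape along these lines; every type and function name below is invented for illustration and not taken from the project’s codebase.

```rust
// Hypothetical skeleton of the stages listed above.
struct Ast;                 // output of the parser
struct Ir(Vec<String>);     // intermediate representation
struct ObjectCode(Vec<u8>); // per-target machine code

fn parse(_source: &str) -> Ast { Ast }
fn lower(_ast: &Ast) -> Ir { Ir(vec!["add t0, t1".into()]) }
fn optimize(ir: Ir) -> Ir { ir } // stand-in for the optimization passes

// One backend per architecture; the real compiler also runs its own
// assembler, linker, and debug-info generation after this point.
fn emit(ir: &Ir, target: &str) -> ObjectCode {
    println!("emitting {} IR ops for {target}", ir.0.len());
    ObjectCode(vec![0x90])
}

fn main() {
    let ast = parse("int main(void) { return 0; }");
    let ir = optimize(lower(&ast));
    for target in ["x86_64", "aarch64", "riscv64"] {
        let _obj = emit(&ir, target);
    }
}
```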
Human involvement was high-level and non-interactive during the core work: setup of the agent harness, initial prompt/spec, test suites, and final validation. No human wrote, debugged, or iteratively refined the code in a traditional sense. The agents ran autonomously in parallel for ~2 weeks across ~2,000 sessions.
It succeeded on real-world validation: a 99% pass rate on the GCC torture tests, a Linux 6.9 kernel that compiles and boots on x86, ARM, and RISC-V (with minor caveats, such as calling GCC for the 16-bit x86 bootstrap), plus PostgreSQL, SQLite, Redis, FFmpeg, QEMU, and even Doom. That requires correct handling of complex C semantics, ABI details, and systems-programming edge cases, not trivial regurgitation.


