I read through the entire leaked source - 2,203 files, ~30MB of TypeScript. This post covers the agentic loop, tool concurrency, context compaction, the permission model, and the architectural decisions (good and bad) behind one of the most widely used AI coding agents in the world.
Claude Code is Anthropic's terminal-based AI coding agent. You type a prompt, it reads your codebase, runs commands, edits files. It is a TypeScript application that renders a terminal UI with React and Ink. When you type a message, it gets sent to the Claude API along with a system prompt and a list of tool definitions. Claude responds with text, tool calls, or both. Tool calls execute locally on your machine. The results get fed back into the conversation. The loop repeats until Claude has nothing left to do.
There is no remote execution server. Your files, shell, and credentials stay on your machine unless a tool explicitly reaches out (like WebFetch or an MCP server).
That is the simple version. After spending hours tracing execution paths and grepping through every module, here is what I actually found. The engineering is impressive in places - the query loop, the tool concurrency system, the permission classifier. Real thought went into these. But some parts made me stop and think "wait, really?" This is not a hit piece. Every codebase has debt. But Claude Code runs arbitrary commands on your machine, and it is built by a company with $10B+ in funding. Some of these choices are worth examining.
Every interaction, whether you are in the interactive REPL or running a headless claude -p command, follows the same cycle. The core loop lives in query.ts, wrapped by QueryEngine.ts which adds session tracking, cost accounting, and SDK integration.
--print / stdin. The message gets appended to the conversation history. If you are resuming a session, the full history is loaded from disk first.
CLAUDE.md memory files, the current date, and the full list of available tools. This context is memoized for the session and only rebuilt if explicitly invalidated.
When a tool_use block arrives, the tool starts executing immediately, while the stream is still running. The system does not wait for the full response before acting on tool calls.
The loop is a while(true) async generator that yields streaming events. It is not recursive. State is carried forward via a mutable State object that gets reconstructed at each iteration boundary. The generator pattern lets the UI consume events incrementally without buffering the entire turn in memory.
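To make the loop shape concrete, here is a minimal sketch of a non-recursive while(true) async generator. Every name in it (LoopEvent, ModelTurn, CallModel, RunTool, queryLoop) is hypothetical and not taken from query.ts, and for brevity it runs tools after the model reply finishes rather than mid-stream.

```typescript
// Hypothetical shapes; the real types in query.ts are not shown in this article.
type LoopEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_result"; name: string; output: string };

interface ModelTurn {
  text: string;
  toolCalls: { name: string; input: string }[];
}

interface State {
  messages: string[];
  turn: number;
}

// Stand-ins for the API call and for local tool execution.
type CallModel = (messages: string[]) => Promise<ModelTurn>;
type RunTool = (name: string, input: string) => Promise<string>;

async function* queryLoop(
  prompt: string,
  callModel: CallModel,
  runTool: RunTool,
): AsyncGenerator<LoopEvent> {
  let state: State = { messages: [prompt], turn: 0 };

  while (true) {
    const reply = await callModel(state.messages);
    if (reply.text) yield { kind: "text", text: reply.text };

    // No tool calls means the model has nothing left to do.
    if (reply.toolCalls.length === 0) return;

    const results: string[] = [];
    for (const call of reply.toolCalls) {
      const output = await runTool(call.name, call.input);
      results.push(output);
      yield { kind: "tool_result", name: call.name, output };
    }

    // Rebuild the state at the iteration boundary instead of recursing.
    state = {
      messages: [...state.messages, reply.text, ...results],
      turn: state.turn + 1,
    };
  }
}
```

A consumer iterates with for await and can render each event as it arrives, which is the property the generator pattern buys.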
The architecture here is clean. But the component that renders this loop is where things get rough. The main REPL interface is a single React component in screens/REPL.tsx. It is 5,005 lines long. Inside that one file: 68 useState calls, 43 useEffects, 54 useRefs, 44 useCallbacks, 18 useMemos. That is 227 hook calls in one component. The JSX nesting goes 22 spaces deep. Over 300 conditional branches. The import section alone is 244 statements pulling from 235 distinct modules.
A file with 227 hook calls is functionally untestable in isolation. Any useEffect can end up interacting with any useState, so the dependency arrays become impossible to reason about. There is a // TODO: fix this on line 4114, sitting next to an eslint-disable-next-line react-hooks/exhaustive-deps - the team knows it. A state machine driving 15-20 focused components would be the standard approach for a UI with this many states: initializing, waiting for input, streaming, executing tools, awaiting permission, compacting, showing results. Each state maps to a component. The 68 useStates become one typed state object with explicit transitions.
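As a rough illustration of "one typed state object with explicit transitions", the states listed above could be modeled as a discriminated union with a pure transition function. The event names and payload fields are my own assumptions, not anything from REPL.tsx.

```typescript
// Hypothetical state and event names, based on the states listed above.
type ReplState =
  | { status: "initializing" }
  | { status: "waitingForInput" }
  | { status: "streaming"; partialText: string }
  | { status: "executingTools"; pending: number }
  | { status: "awaitingPermission"; toolName: string }
  | { status: "compacting" }
  | { status: "showingResults"; text: string };

type ReplEvent =
  | { type: "ready" }
  | { type: "submit" }
  | { type: "chunk"; text: string }
  | { type: "toolsRequested"; count: number }
  | { type: "permissionNeeded"; toolName: string }
  | { type: "permissionResolved" }
  | { type: "toolsFinished" }
  | { type: "contextFull" }
  | { type: "compactionDone" }
  | { type: "turnComplete" };

// Pure function: unknown (state, event) pairs leave the state unchanged.
function transition(state: ReplState, event: ReplEvent): ReplState {
  switch (state.status) {
    case "initializing":
      return event.type === "ready" ? { status: "waitingForInput" } : state;
    case "waitingForInput":
      if (event.type === "submit") return { status: "streaming", partialText: "" };
      if (event.type === "contextFull") return { status: "compacting" };
      return state;
    case "streaming":
      if (event.type === "chunk") {
        return { status: "streaming", partialText: state.partialText + event.text };
      }
      if (event.type === "toolsRequested") {
        return { status: "executingTools", pending: event.count };
      }
      if (event.type === "turnComplete") {
        return { status: "showingResults", text: state.partialText };
      }
      return state;
    case "executingTools":
      if (event.type === "permissionNeeded") {
        return { status: "awaitingPermission", toolName: event.toolName };
      }
      if (event.type === "toolsFinished") return { status: "streaming", partialText: "" };
      return state;
    case "awaitingPermission":
      return event.type === "permissionResolved"
        ? { status: "executingTools", pending: 1 }
        : state;
    case "compacting":
      return event.type === "compactionDone" ? { status: "waitingForInput" } : state;
    case "showingResults":
      return event.type === "submit" ? { status: "streaming", partialText: "" } : state;
  }
}
```

Because the function is pure, each transition can be unit-tested without mounting any React component, and each status can render through its own small component.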
This is probably the most well-designed piece of the architecture. Most agent frameworks wait for the model to finish its entire response, then execute all the tool calls, then send the results back. Claude Code does not do that.
As soon as a tool_use block arrives in the stream, a StreamingToolExecutor picks it up and starts executing it. While the model is still generating the rest of its response, the first tool is already running. This overlaps model I/O with tool execution and meaningfully reduces end-to-end latency on multi-tool turns.
| Scenario | What happens |
|---|---|
| All queued tools are concurrent-safe | They run in parallel. Read, Grep, Glob, and WebSearch are all concurrent-safe. |
| A non-concurrent tool is queued | It waits until all running tools finish, then runs alone. Bash is the main example. |
| User aborts mid-execution | Running tools are cancelled. Queued tools get synthetic tool_result blocks saying "Interrupted by user" to keep the conversation consistent. |
| Model fallback mid-stream | Old executor is discarded, partial messages are tombstoned, thinking blocks are stripped, fresh executor spins up for the retry. |
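The first two rows of the table can be sketched as a batching rule: consecutive concurrent-safe tools share a parallel batch, and a non-concurrent tool always gets a batch to itself. This is a simplified sketch under my own assumptions (QueuedTool, planBatches, and runQueue are hypothetical names); the real StreamingToolExecutor also handles streaming arrival, aborts, and fallback, which are left out here.

```typescript
interface QueuedTool {
  name: string;
  concurrencySafe: boolean;
  run: () => Promise<string>;
}

// Group the queue into batches: runs of concurrent-safe tools share a batch,
// and every non-concurrent tool gets a batch of its own.
function planBatches(queue: QueuedTool[]): QueuedTool[][] {
  const batches: QueuedTool[][] = [];
  for (const tool of queue) {
    const last = batches[batches.length - 1];
    if (tool.concurrencySafe && last !== undefined && last.every((t) => t.concurrencySafe)) {
      last.push(tool);
    } else {
      batches.push([tool]);
    }
  }
  return batches;
}

async function runQueue(queue: QueuedTool[]): Promise<string[]> {
  const results: string[] = [];
  for (const batch of planBatches(queue)) {
    // Tools inside a batch run in parallel; batches run one after another,
    // so a non-concurrent tool waits for everything before it and runs alone.
    results.push(...(await Promise.all(batch.map((t) => t.run()))));
  }
  return results;
}
```

For a queue of Read, Grep, Bash, Glob this plans three batches: Read and Grep together, then Bash alone, then Glob.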
The context window is finite. Long conversations fill it up. Claude Code has a layered compaction system that is more sophisticated than most people realize.
When all layers fail: If the context is still too large after every compaction strategy has run, the system blocks the API call entirely and surfaces an error. It does not silently drop messages. There is also a reactive compaction path that kicks in if the API returns a 413 (prompt too long), attempting one last recovery before giving up.
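The "try each layer, then refuse rather than drop messages" behavior can be sketched as follows. This is an assumption-heavy illustration: Compactor and fitContext are hypothetical names, and character count stands in for a real token count.

```typescript
// A compactor takes the conversation and returns a smaller version of it.
type Compactor = (messages: string[]) => string[];

function fitContext(
  messages: string[],
  maxChars: number,
  compactors: Compactor[], // ordered cheapest first
): string[] {
  // Character count is a stand-in for tokens in this sketch.
  const size = (m: string[]) => m.reduce((n, s) => n + s.length, 0);

  let current = messages;
  for (const compact of compactors) {
    if (size(current) <= maxChars) return current;
    current = compact(current);
  }

  // Every strategy has run: block the call instead of silently dropping messages.
  if (size(current) > maxChars) {
    throw new Error("context still too large after compaction");
  }
  return current;
}
```

The important design choice is the final throw: the caller sees an explicit failure instead of a conversation that quietly lost history.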
Claude Code ships with 40+ built-in tools and supports unlimited external tools via MCP. Every tool implements the same Tool<Input, Output, Progress> interface. Each tool has a Zod input schema for parameter validation, a call() method, and five rendering methods for different lifecycle points.
| Property | What it controls |
|---|---|
| isConcurrencySafe() | Whether this tool can run in parallel. Defaults to false (fail-closed). |
| isReadOnly() | Read-only tools get lighter permission checks. |
| maxResultSizeChars | When output exceeds this, it gets saved to a temp file and the model receives a preview. Prevents one large result from flooding the context. |
| interruptBehavior() | 'cancel' kills the tool on Ctrl+C; 'block' keeps it running. |
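A reduced sketch of how a factory with fail-closed defaults might look. ToolSketch is a hypothetical, trimmed-down stand-in for Tool<Input, Output, Progress> (no Zod schema, no rendering methods), and apart from isConcurrencySafe() defaulting to false, the default values below are my assumptions.

```typescript
// Hypothetical, reduced tool shape for illustration only.
interface ToolSketch<Input, Output> {
  name: string;
  call(input: Input): Promise<Output>;
  isConcurrencySafe(): boolean;
  isReadOnly(): boolean;
  maxResultSizeChars: number;
  interruptBehavior(): "cancel" | "block";
}

type ToolOptions<I, O> = Pick<ToolSketch<I, O>, "name" | "call"> &
  Partial<Omit<ToolSketch<I, O>, "name" | "call">>;

function buildTool<I, O>(options: ToolOptions<I, O>): ToolSketch<I, O> {
  return {
    name: options.name,
    call: options.call,
    // Fail-closed: a tool must opt in to running in parallel.
    isConcurrencySafe: options.isConcurrencySafe ?? (() => false),
    // Assumed default: treat a tool as mutating unless it says otherwise.
    isReadOnly: options.isReadOnly ?? (() => false),
    // Assumed value; the real limit is not stated in this article.
    maxResultSizeChars: options.maxResultSizeChars ?? 30_000,
    // Assumed default.
    interruptBehavior: options.interruptBehavior ?? (() => "cancel"),
  };
}

// Usage: a read-only tool opts in to parallel execution explicitly.
const echoTool = buildTool<string, string>({
  name: "Echo",
  call: async (input) => input,
  isConcurrencySafe: () => true,
  isReadOnly: () => true,
});
```

The point of the factory is that forgetting a property produces the conservative behavior, not the permissive one.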
The tool design is solid. All tools use a buildTool() factory with safe defaults. But the Tool.ts type definition file itself is 792 lines long and imports from permission types, message types, analytics, MCP types, agent types, progress types, hooks, and more. When the central type in your architecture imports from everything and everything imports from it, you get dependency cycles. Grepping the codebase for "break import cycle" or "circular dependency" hits 61 different files.
The pattern is always the same: extract types to a separate file, use lazy requires, or inline code that should be imported. Entire files like types/permissions.ts and schemas/hooks.ts exist purely as import cycle band-aids. 61 files means the module graph was never designed - it grew organically and now has deep tangles. Every lazy require is a place where TypeScript cannot help you at compile time.
This is where Claude Code is most opinionated, and it is well done. Every single tool invocation goes through a six-step permission check before anything executes. The system defaults to denying, not allowing.
validateInput() runs first. Catches invalid file paths, blocked device paths (/dev/zero, /dev/random), or nonsensical arguments.
"Bash" blocks all shell commands. A specific deny like "Bash(rm -rf *)" blocks just that pattern.
Bash parses the command into an AST and evaluates each subcommand individually against permission rules. This is proper shell parsing via tree-sitter, not regex matching. This is one of the best-designed parts of the entire codebase.
The permission mode (default, auto, bypass, plan) can override the decision. In bypass mode, everything is auto-approved.
Anti-fatigue protection. If you deny a tool 5 times in a row, the system stops asking and auto-rejects subsequent attempts. Prevents the model from getting stuck in a retry loop on a denied action.
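Two of the pieces above, deny-rule matching and the anti-fatigue counter, can be sketched in a few lines. To be clear about the assumptions: parseRule, isDenied, and DenialTracker are hypothetical names, and this sketch matches a whole command string with a simple wildcard, which is exactly the shortcut the real implementation avoids by parsing the command with tree-sitter and checking each subcommand.

```typescript
interface Rule {
  tool: string;
  pattern?: string;
}

// "Bash" -> { tool: "Bash" }; "Bash(rm -rf *)" -> { tool: "Bash", pattern: "rm -rf *" }
function parseRule(rule: string): Rule {
  const match = /^([^()]+)\((.*)\)$/.exec(rule);
  if (!match) return { tool: rule };
  const [, tool = "", pattern = ""] = match;
  return { tool, pattern };
}

// Turn a "*" wildcard pattern into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function isDenied(tool: string, command: string, denyRules: string[]): boolean {
  return denyRules.map(parseRule).some(
    (rule) =>
      rule.tool === tool &&
      (rule.pattern === undefined || globToRegExp(rule.pattern).test(command)),
  );
}

// Anti-fatigue: after `limit` consecutive denials, stop prompting and auto-reject.
class DenialTracker {
  private consecutive = 0;
  private readonly limit: number;

  constructor(limit = 5) {
    this.limit = limit;
  }

  recordDenial(): void {
    this.consecutive++;
  }

  recordApproval(): void {
    this.consecutive = 0;
  }

  shouldAutoReject(): boolean {
    return this.consecutive >= this.limit;
  }
}
```

String matching like this is easy to bypass with command chaining or substitution, which is why evaluating each parsed subcommand individually is the stronger design.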
Claude Code uses Bun's compile-time feature() function for feature gating. There are 89 distinct feature flags referenced 960 times across the codebase. On top of that, 472 distinct environment variables are referenced across 1,425 call sites.
Some of these are clearly experiments (ABLATION_BASELINE, OVERFLOW_TEST_TOOL). Some are entire product directions (KAIROS, COORDINATOR_MODE, BRIDGE_MODE). Some sound like they should have shipped or been deleted months ago (EXPERIMENTAL_SKILL_SEARCH, NEW_INIT).
When you have KAIROS, KAIROS_BRIEF, KAIROS_CHANNELS, KAIROS_DREAM, KAIROS_GITHUB_WEBHOOKS, and KAIROS_PUSH_NOTIFICATION as separate flags, that is not a gradual rollout. That is an entire parallel product built inside the same codebase behind conditional requires. A monorepo without the monorepo tooling.
Since feature() is compile-time, dead code gets eliminated from the build. The runtime never sees unused paths. Performance-wise, it is fine. The cost is in developer experience - 960 feature checks scattered across the codebase, and nobody knows which ones are still alive.
This leads to a pattern that appears everywhere, especially in REPL.tsx (17 times) and query.ts (6 times):
TypeScript code using require() inside an ES module, wrapped in a compile-time feature check, with a type assertion to recover the types that require() loses. Each one is a place where the type system has a gap. The as typeof import(...) cast tells TypeScript "trust me." If someone changes the export shape, the cast silently lies. No compiler error. You find out at runtime. Dynamic import() would preserve types and is the standard solution for conditional module loading.
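The failure mode is easy to reproduce in miniature. This sketch does not use Bun's feature() or a real require(); untypedLoad is a hypothetical stand-in that, like require() in an ES module, returns a value the compiler knows nothing about, and VoiceModule is an invented module shape.

```typescript
// Hypothetical module shape that the calling code was written against.
interface VoiceModule {
  startVoice(): string;
}

// Stand-in for require(): the return type says nothing about the module.
// Imagine the export was renamed from startVoice to beginVoice last week.
function untypedLoad(): unknown {
  return { beginVoice: () => "listening" };
}

// The cast compiles cleanly even though the shape no longer matches.
const voice = untypedLoad() as VoiceModule;

function tryStart(mod: VoiceModule): string {
  try {
    return mod.startVoice();
  } catch (err) {
    // startVoice is undefined at runtime, so the call throws a TypeError.
    return "runtime failure: " + (err instanceof TypeError);
  }
}
```

Calling tryStart(voice) returns "runtime failure: true": the compiler was satisfied, and the mismatch only surfaces when the code path runs. With a dynamic import() the module type is inferred from the actual file, so the rename would be a compile error at the call site.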
Every analytics call in Claude Code requires a type cast to a type named AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS. It appears 1,193 times across the codebase.
The intent is admirable. Claude Code runs on people's actual codebases. You do not want to accidentally log file paths, source code, or secrets to an analytics pipeline. So they made a type that forces developers to manually confirm "yes, this string is safe to log."
But when you are writing this 58-character type cast over a thousand times, it stops being a guardrail. It becomes a ritual. A builder pattern with runtime validation - something that actually throws if a value looks like a path or source code - would catch the thing the type cast only claims to prevent.
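A minimal sketch of what such a builder could look like. MetadataBuilder and both heuristics are my own assumptions; the checks are deliberately crude (any slash counts as a path) and a production version would need tuning against false positives.

```typescript
type SafeMetadata = Record<string, string | number | boolean>;

// Assumed heuristics, intentionally strict for the sake of the example.
const PATH_LIKE = /[\\/]/; // any path separator
const CODE_LIKE = /[{};]|=>/; // braces, semicolons, arrow functions

class MetadataBuilder {
  private readonly fields: SafeMetadata = {};

  add(key: string, value: string | number | boolean): this {
    if (typeof value === "string") {
      if (PATH_LIKE.test(value)) {
        throw new Error(`analytics field "${key}" looks like a file path`);
      }
      if (CODE_LIKE.test(value)) {
        throw new Error(`analytics field "${key}" looks like source code`);
      }
    }
    this.fields[key] = value;
    return this;
  }

  build(): SafeMetadata {
    return { ...this.fields };
  }
}

// Usage: safe values pass through, suspicious ones fail loudly at the call site.
const metadata = new MetadataBuilder()
  .add("event", "tool_used")
  .add("durationMs", 120)
  .build();
```

The difference from the cast is that the check runs on the actual value, so a mistake shows up in tests or logs instead of relying on the developer's attention at the thousandth call site.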
main.tsx is 4,683 lines and contains every CLI command definition, all argument parsing via Commander.js, the complete OAuth login flow, session resume logic, remote session management, profile startup benchmarking, plugin loading, and MDM configuration. The comments explain why:
Everything is in one file to minimize the import graph depth. Bun evaluates imports eagerly. Deeper import trees mean more startup latency. Keeping everything in main.tsx means one level of imports instead of three or four. They are saving ~135ms at startup by making the entry point unreadable. A lazy-loading command registry - only load the init module when someone runs claude init, only load OAuth when authentication is needed - would achieve the same thing. That is the usual approach in CLI tools with many subcommands.
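A lazy-loading registry of the kind described could be sketched like this. CommandRegistry and its method names are hypothetical; in a real CLI each loader would typically wrap a dynamic import() of the command's module, which is shown only as a comment because the module paths are invented.

```typescript
type CommandHandler = (args: string[]) => Promise<string>;
type Loader = () => Promise<CommandHandler>;

class CommandRegistry {
  private readonly loaders = new Map<string, Loader>();
  private readonly loaded = new Map<string, CommandHandler>();

  // Registering is cheap: nothing is loaded until the command is run.
  register(name: string, loader: Loader): void {
    this.loaders.set(name, loader);
  }

  async run(name: string, args: string[]): Promise<string> {
    let handler = this.loaded.get(name);
    if (!handler) {
      const loader = this.loaders.get(name);
      if (!loader) throw new Error(`unknown command: ${name}`);
      handler = await loader(); // the module cost is paid here, on first use
      this.loaded.set(name, handler);
    }
    return handler(args);
  }
}

// In a real CLI the loader would be something like (hypothetical path):
//   registry.register("init", () => import("./commands/init.js").then((m) => m.run));
```

Startup then only pays for the registry itself, and the entry point can stay small without deepening the eagerly evaluated import graph.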
The system prompt itself is built in layers: a base prompt, optional teammate instructions, browser integration hints, custom agent prompts, proactive mode addendum, and assistant mode addendum. Each layer is appended via string concatenation in sequence.
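The layering can be pictured as an ordered list of optional strings joined in sequence. The field names and the blank-line separator below are assumptions for illustration; only the order of the layers comes from the description above.

```typescript
// Hypothetical option names, one per layer described above.
interface PromptLayers {
  base: string;
  teammateInstructions?: string;
  browserHints?: string;
  customAgentPrompt?: string;
  proactiveAddendum?: string;
  assistantAddendum?: string;
}

function buildSystemPrompt(layers: PromptLayers): string {
  const ordered = [
    layers.base,
    layers.teammateInstructions,
    layers.browserHints,
    layers.customAgentPrompt,
    layers.proactiveAddendum,
    layers.assistantAddendum,
  ];
  // Skip layers that are not active; the separator is an assumption.
  return ordered
    .filter((layer): layer is string => typeof layer === "string" && layer.length > 0)
    .join("\n\n");
}
```

Keeping the order fixed matters for caching: a stable prefix stays identical between requests even when a later layer is toggled.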
Feature flags at build time. Many subsystems are gated behind compile-time feature flags using bun:bundle. Disabled features get their code eliminated at build time, not just branched around at runtime. This keeps the shipped bundle lean for each platform target (CLI, VS Code extension, desktop app, SDK).
Claude Code has a hidden pet system (behind the BUDDY feature flag). Procedurally generated companions with rarity tiers, species, hats, eye styles, and stat distributions. In buddy/types.ts, the species list is defined like this:
One of the species names collides with an internal model codename. Anthropic's CI greps the build output for these codenames as a security canary. Instead of adding a regex exclusion for the buddy module, they hex-encoded all 18 species names. Future developers reading this file will be baffled by why "duck" cannot just be "duck". But the fact that engineers spent time building a pet system with rarity tiers inside a terminal coding tool is honestly charming.
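To show what hex-encoding a name does to readability, here is a small round-trip sketch. The exact encoding form used in buddy/types.ts is not reproduced here, so encodeHex and decodeHex are hypothetical helpers and "duck" is just the example name mentioned above.

```typescript
// Encode each character as two hex digits (ASCII-range names only in this sketch).
function encodeHex(text: string): string {
  return Array.from(text, (ch) => ch.charCodeAt(0).toString(16).padStart(2, "0")).join("");
}

// Decode pairs of hex digits back into characters.
function decodeHex(hex: string): string {
  let out = "";
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}
```

Here encodeHex("duck") gives "6475636b", which no longer matches a plain-text grep for the word, and decodeHex("6475636b") restores "duck" at runtime.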
Claude Code can spawn sub-agents via the Agent tool. Each sub-agent runs its own isolated agentic loop with a separate conversation, and optionally a restricted tool set. The leader agent orchestrates; workers execute.
Workers run with the swarm worker permission handler, which auto-denies interactive prompts. They can only do things pre-approved by rules or hooks. If a worker hits a permission wall, it fails gracefully and reports back to the leader. Workers cannot spawn their own sub-agents (no recursive spawning) and each can run in an isolated git worktree to avoid file conflicts.
413 (prompt too long): The response is withheld. The system attempts context collapse drain first (cheapest), then reactive compaction (costs an API call). If both fail, it surfaces the error. No silent message dropping.
Output truncation: If the response is capped at the default 8K limit, automatically retry with 64K. If still too long, inject synthetic continue messages up to a configurable limit.
Model failure mid-stream: Tombstone partial messages, strip incompatible thinking blocks, create a fresh streaming executor on the fallback model, notify the user. The conversation continues seamlessly.
MCP integration: MCP tools go through the exact same permission cascade, result size management, and concurrency model as built-in tools. Built-in tools are sorted separately from MCP tools before concatenation in the API call to preserve prompt cache stability - if MCP tools were interleaved, adding or removing one connection would invalidate the entire prompt cache.
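The cache-stability argument is easier to see with a sketch. ToolDef and orderToolsForRequest are hypothetical names; the idea shown is only that built-in tools form a stable, independently sorted prefix and MCP tools are appended after it.

```typescript
interface ToolDef {
  name: string;
  source: "builtin" | "mcp";
}

function orderToolsForRequest(tools: ToolDef[]): ToolDef[] {
  const byName = (a: ToolDef, b: ToolDef) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  // Sort each group on its own, then concatenate: built-ins always come first.
  const builtin = tools.filter((t) => t.source === "builtin").sort(byName);
  const mcp = tools.filter((t) => t.source === "mcp").sort(byName);
  return [...builtin, ...mcp];
}
```

With this ordering, connecting or disconnecting an MCP server only changes the tail of the tool list, so the built-in portion of the request stays byte-identical and remains cacheable as a prefix.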
Output sanitization. All MCP tool outputs pass through a Unicode sanitizer that strips control characters and zero-width sequences. Prevents a malicious MCP server from injecting terminal escape codes that could manipulate your terminal display.
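A sanitizer along these lines can be approximated with two character classes. The exact set of characters the real sanitizer strips is not listed here, so the ranges below are my assumptions: C0 and C1 control characters except tab, newline, and carriage return, plus a handful of common zero-width code points.

```typescript
// Control characters, keeping \t (U+0009), \n (U+000A) and \r (U+000D).
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/g;
// Zero-width space, non-joiner, joiner, word joiner, and byte order mark.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function sanitizeToolOutput(text: string): string {
  return text.replace(CONTROL_CHARS, "").replace(ZERO_WIDTH, "");
}
```

Removing the ESC character (U+001B) is what defuses terminal escape sequences: the remaining "[31m" fragment is printed as harmless literal text instead of being interpreted by the terminal.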
Most of these issues come from the same root cause: Claude Code grew faster than its architecture could keep up with. You can see the layers of history. A simple terminal REPL grew into a multi-agent coordinator with voice mode, companion pets, vim bindings, and remote sessions. Features got added behind flags faster than old flags got cleaned up. The module graph grew connections faster than anyone drew boundaries.
This is not unique to Anthropic. Every fast-moving company has codebases like this. The reason Claude Code's case is interesting is the scale: this is one of the most important AI products in the world, and its source reveals the same messy engineering trade-offs that exist at every startup.
The code ships. It works. Lots of developers rely on it daily. The query loop architecture, the streaming tool executor, the permission classifier with tree-sitter shell parsing - these are genuinely well-engineered systems. That matters more than clean architecture. But it is worth being honest about the cost.
Built from the source. Last updated March 2026.