Portfolio | Nour Alhouseini - Ethical Hacker

Inside Claude Code's Source Code

I read through the entire leaked source - 2,203 files, ~30MB of TypeScript. The agentic loop, tool concurrency, context compaction, permission model, and the architectural decisions (good and bad) behind one of the most widely used AI coding agents in the world.

The Big Picture

Claude Code is Anthropic's terminal-based AI coding agent. You type a prompt, it reads your codebase, runs commands, edits files. It is a TypeScript application that renders a terminal UI with React and Ink. When you type a message, it gets sent to the Claude API along with a system prompt and a list of tool definitions. Claude responds with text, tool calls, or both. Tool calls execute locally on your machine. The results get fed back into the conversation. The loop repeats until Claude has nothing left to do.

There is no remote execution server. Your files, shell, and credentials stay on your machine unless a tool explicitly reaches out (like WebFetch or an MCP server).

That is the simple version. After spending hours tracing execution paths and grepping through every module, here is what I actually found. The engineering is impressive in places - the query loop, the tool concurrency system, the permission classifier. Real thought went into these. But some parts made me stop and think "wait, really?" This is not a hit piece. Every codebase has debt. But Claude Code runs arbitrary commands on your machine, and it is built by a company with $10B+ in funding. Some of these choices are worth examining.

The Agentic Loop

Every interaction, whether you are in the interactive REPL or running a headless claude -p command, follows the same cycle. The core loop lives in query.ts, wrapped by QueryEngine.ts which adds session tracking, cost accounting, and SDK integration.

Your message enters the system

You type something in the REPL or pass it via --print / stdin. The message gets appended to the conversation history. If you are resuming a session, the full history is loaded from disk first.

Context is assembled

Before hitting the API, the system builds a prompt from several layers: your git status (branch, recent commits, working tree state), all discovered CLAUDE.md memory files, the current date, and the full list of available tools. This context is memoized for the session and only rebuilt if explicitly invalidated.

Pre-flight checks run

Before the API call, the system runs up to four compaction strategies to keep the conversation within the context window: history snipping, microcompaction, auto-compaction, and context collapse. If the context is still too large after all that, the request is blocked before it wastes an API call.

The API streams a response

The conversation is sent to the Claude API and the response streams back token by token. When a tool_use block arrives, the tool starts executing immediately, while the stream is still running. The system does not wait for the full response before acting on tool calls.

Tools execute (with permission checks)

Each tool call passes through a six-step permission cascade before running. Multiple concurrent-safe tools can run in parallel. Non-concurrent tools (like Bash) get an exclusive lock.

The loop decides what to do next

The system checks 11 possible exit conditions: normal completion, max turns reached, budget exhausted, context overflow, user abort, and others. If none are met and the response contained tool calls, the loop iterates.

The loop is a while(true) async generator that yields streaming events. It is not recursive. State is carried forward via a mutable State object that gets reconstructed at each iteration boundary. The generator pattern lets the UI consume events incrementally without buffering the entire turn in memory.
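To make the shape concrete, here is a minimal sketch of a non-recursive loop written as an async generator. Everything in it is hypothetical - the type names, the callModel and runTool callbacks, and the two exit reasons are stand-ins chosen for illustration, not the real definitions from query.ts, and it runs tools sequentially for brevity.

```typescript
// Hypothetical shapes for illustration; not the real types from query.ts.
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type LoopEvent =
  | { kind: "assistant"; text: string }
  | { kind: "tool_result"; id: string; output: string }
  | { kind: "done"; reason: "completed" | "max_turns" };
type LoopState = { history: string[]; turn: number };

async function* agentLoop(
  prompt: string,
  callModel: (history: string[]) => Promise<ModelTurn>,
  runTool: (call: ToolCall) => Promise<string>,
  maxTurns = 10,
): AsyncGenerator<LoopEvent> {
  let state: LoopState = { history: [prompt], turn: 0 };
  while (true) {
    // Exit condition: turn budget exhausted.
    if (state.turn >= maxTurns) {
      yield { kind: "done", reason: "max_turns" };
      return;
    }
    const reply = await callModel(state.history);
    yield { kind: "assistant", text: reply.text };
    // Exit condition: the model asked for no tools, so the turn is complete.
    if (reply.toolCalls.length === 0) {
      yield { kind: "done", reason: "completed" };
      return;
    }
    const results: string[] = [];
    for (const call of reply.toolCalls) {
      const output = await runTool(call);
      results.push(output);
      yield { kind: "tool_result", id: call.id, output };
    }
    // State is rebuilt at the iteration boundary instead of recursing.
    state = {
      history: [...state.history, reply.text, ...results],
      turn: state.turn + 1,
    };
  }
}
```

A consumer iterates with for await and can render each event as it arrives, which is the property the generator pattern buys.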

The architecture here is clean. But the component that renders this loop is where things get rough. The main REPL interface is a single React component in screens/REPL.tsx. It is 5,005 lines long. Inside that one file: 68 useState calls, 43 useEffects, 54 useRefs, 44 useCallbacks, 18 useMemos. That is 227 hook calls in one component. The JSX nesting goes 22 spaces deep. Over 300 conditional branches. The import section alone is 244 statements pulling from 235 distinct modules.

A file with 227 hook calls is functionally untestable in isolation. Any useEffect can end up interacting with any useState, so the dependency arrays become impossible to reason about. There is a // TODO: fix this on line 4114, sitting next to an eslint-disable-next-line react-hooks/exhaustive-deps - the team knows it. A state machine driving 15-20 focused components would be the standard approach for a UI with this many states: initializing, waiting for input, streaming, executing tools, awaiting permission, compacting, showing results. Each state maps to a component. The 68 useStates become one typed state object with explicit transitions.
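As a rough illustration of what "one typed state object with explicit transitions" could look like, here is a small sketch using a discriminated union and a pure transition function. The phase and event names are hypothetical and cover only a subset of the states listed above; this is not a proposal for the actual REPL.tsx refactor.

```typescript
// Hypothetical phases and events; a subset of the states named in the text.
type ReplState =
  | { phase: "waitingForInput" }
  | { phase: "streaming"; partialText: string }
  | { phase: "awaitingPermission"; toolName: string }
  | { phase: "executingTool"; toolName: string }
  | { phase: "showingResult"; text: string };

type ReplEvent =
  | { type: "submit" }
  | { type: "token"; text: string }
  | { type: "toolRequested"; toolName: string }
  | { type: "permissionGranted" }
  | { type: "permissionDenied" }
  | { type: "toolFinished" }
  | { type: "streamEnded" };

// Pure function: every legal transition is spelled out, everything else is ignored.
function transition(state: ReplState, event: ReplEvent): ReplState {
  switch (state.phase) {
    case "waitingForInput":
    case "showingResult":
      return event.type === "submit" ? { phase: "streaming", partialText: "" } : state;
    case "streaming":
      if (event.type === "token") {
        return { phase: "streaming", partialText: state.partialText + event.text };
      }
      if (event.type === "toolRequested") {
        return { phase: "awaitingPermission", toolName: event.toolName };
      }
      if (event.type === "streamEnded") {
        return { phase: "showingResult", text: state.partialText };
      }
      return state;
    case "awaitingPermission":
      if (event.type === "permissionGranted") {
        return { phase: "executingTool", toolName: state.toolName };
      }
      if (event.type === "permissionDenied") {
        return { phase: "streaming", partialText: "" };
      }
      return state;
    case "executingTool":
      return event.type === "toolFinished" ? { phase: "streaming", partialText: "" } : state;
  }
}
```

Because the function is pure, each transition can be unit-tested without rendering anything, and each phase can map to its own small component.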

Streaming Tool Execution

This is probably the most well-designed piece of the architecture. Most agent frameworks wait for the model to finish its entire response, then execute all the tool calls, then send the results back. Claude Code does not do that.

As soon as a tool_use block arrives in the stream, a StreamingToolExecutor picks it up and starts executing it. While the model is still generating the rest of its response, the first tool is already running. This overlaps model I/O with tool execution and meaningfully reduces end-to-end latency on multi-tool turns.

Scenario	What happens
All queued tools are concurrent-safe	They run in parallel. `Read`, `Grep`, `Glob`, and `WebSearch` are all concurrent-safe.
A non-concurrent tool is queued	It waits until all running tools finish, then runs alone. `Bash` is the main example.
User aborts mid-execution	Running tools are cancelled. Queued tools get synthetic `tool_result` blocks saying "Interrupted by user" to keep the conversation consistent.
Model fallback mid-stream	Old executor is discarded, partial messages are tombstoned, thinking blocks are stripped, fresh executor spins up for the retry.
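The first two rows of the table can be sketched as a small scheduler. This is an illustration under stated assumptions, not the StreamingToolExecutor itself: QueuedTool and runQueue are hypothetical names, tools arrive as a pre-built array rather than from a live stream, and abort and error handling are omitted.

```typescript
// Hypothetical shape; the real executor consumes tool_use blocks from a stream.
type QueuedTool = {
  name: string;
  concurrencySafe: boolean;
  run: () => Promise<string>;
};

async function runQueue(queue: QueuedTool[]): Promise<string[]> {
  const results = new Array<string>(queue.length);
  let inFlight: Promise<void>[] = [];

  for (const [i, tool] of queue.entries()) {
    if (tool.concurrencySafe) {
      // Start immediately; do not wait for earlier concurrent-safe tools.
      inFlight.push(tool.run().then((out) => { results[i] = out; }));
    } else {
      // Exclusive tool: drain everything running, then run alone.
      await Promise.all(inFlight);
      inFlight = [];
      results[i] = await tool.run();
    }
  }
  await Promise.all(inFlight);
  return results; // results keep queue order regardless of completion order
}
```

Keeping results in queue order matters because each tool_result has to line up with the tool_use block that requested it.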

Context Management

The context window is finite. Long conversations fill it up. Claude Code has a layered compaction system that is more sophisticated than most people realize.

Layer 1: History Snip

The cheapest strategy. Removes the oldest messages entirely from the API payload. The raw transcript on disk is never touched.

↓

Layer 2: Microcompaction

Targets stale tool results. If a file was read 30 messages ago and not referenced since, the full content gets replaced with a short stub. Defers cache invalidation until after the API response to preserve prompt cache efficiency.

↓

Layer 3: Auto-Compaction

Spawns a forked Claude process that summarizes older messages. The summary replaces the originals in the API payload. The full transcript is preserved on disk. Costs an extra API call but can recover thousands of tokens.

↓

Layer 4: Context Collapse

A projection-based approach. Instead of rewriting history, maintains a compressed "view" over the full message list and swaps it in at read time. Most experimental strategy, feature-gated.

When all layers fail: If the context is still too large after every compaction strategy has run, the system blocks the API call entirely and surfaces an error. It does not silently drop messages. There is also a reactive compaction path that kicks in if the API returns a 413 (prompt too long), attempting one last recovery before giving up.
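The layering can be sketched as a pipeline that applies the cheapest strategy first and stops as soon as the payload fits. The names here (Strategy, fitToWindow, the two example strategies) and the four-characters-per-token estimate are assumptions made for the example; the real strategies are far more selective about what they drop.

```typescript
type Message = { role: "user" | "assistant" | "tool"; text: string };
type Strategy = { name: string; apply: (msgs: Message[]) => Message[] };

// Assumption: a crude estimate of about 4 characters per token.
const estimateTokens = (msgs: Message[]): number =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.text.length / 4), 0);

function fitToWindow(
  msgs: Message[],
  limit: number,
  strategies: Strategy[],
): { payload: Message[]; applied: string[] } {
  let current = msgs; // the input array is never mutated, mirroring the on-disk transcript
  const applied: string[] = [];
  for (const strategy of strategies) {
    if (estimateTokens(current) <= limit) break; // already fits, skip costlier layers
    current = strategy.apply(current);
    applied.push(strategy.name);
  }
  if (estimateTokens(current) > limit) {
    // Fail loudly instead of silently dropping messages.
    throw new Error("Context still too large after compaction; request blocked");
  }
  return { payload: current, applied };
}

// Hypothetical stand-ins for the first two layers.
const snipOldest: Strategy = {
  name: "snip",
  apply: (m) => m.slice(Math.floor(m.length / 2)), // drop the older half
};
const stubToolResults: Strategy = {
  name: "microcompact",
  apply: (m) =>
    m.map((x) => (x.role === "tool" ? { ...x, text: "[stale tool result elided]" } : x)),
};
```

Calling fitToWindow(history, limit, [snipOldest, stubToolResults]) returns both the trimmed payload and the list of layers that actually ran, which is useful for logging how often the expensive layers are needed.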

The Tool System

Claude Code ships with 40+ built-in tools and supports unlimited external tools via MCP. Every tool implements the same Tool<Input, Output, Progress> interface. Each tool has a Zod input schema for parameter validation, a call() method, and five rendering methods for different lifecycle points.

Property	What it controls
`isConcurrencySafe()`	Whether this tool can run in parallel. Defaults to `false` (fail-closed).
`isReadOnly()`	Read-only tools get lighter permission checks.
`maxResultSizeChars`	When output exceeds this, it gets saved to a temp file and the model receives a preview. Prevents one large result from flooding the context.
`interruptBehavior()`	`'cancel'` kills the tool on Ctrl+C; `'block'` keeps it running.
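A sketch of how a factory with fail-closed defaults and a result-size cap might fit together. The property names follow the table above and the factory name matches the source's buildTool(), but the signatures, the default cap of 10000 characters, and the clampResult helper are assumptions for illustration; the Zod schema, rendering methods, and temp-file write are left out.

```typescript
// Simplified, hypothetical version of the tool shape described above.
type ToolSpec<I, O> = {
  name: string;
  call: (input: I) => O;
  isConcurrencySafe?: () => boolean;
  isReadOnly?: () => boolean;
  maxResultSizeChars?: number;
};

type BuiltTool<I, O> = Required<ToolSpec<I, O>>;

function buildTool<I, O>(spec: ToolSpec<I, O>): BuiltTool<I, O> {
  return {
    name: spec.name,
    call: spec.call,
    // Fail-closed: a tool must opt in to parallel execution and lighter checks.
    isConcurrencySafe: spec.isConcurrencySafe ?? (() => false),
    isReadOnly: spec.isReadOnly ?? (() => false),
    maxResultSizeChars: spec.maxResultSizeChars ?? 10000, // assumed default
  };
}

// Returns a preview when the output exceeds the cap. The real system also
// saves the full output to a temp file; that step is omitted here.
function clampResult(output: string, maxChars: number): { text: string; truncated: boolean } {
  if (output.length <= maxChars) return { text: output, truncated: false };
  return {
    text: output.slice(0, maxChars) + "\n[output truncated; full result saved elsewhere]",
    truncated: true,
  };
}
```

The useful property is that forgetting a field makes a tool more restricted, never less.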

The tool design is solid. All tools use a buildTool() factory with safe defaults. But the Tool.ts type definition file itself is 792 lines long and imports from permission types, message types, analytics, MCP types, agent types, progress types, hooks, and more. When the central type in your architecture imports from everything and everything imports from it, you get dependency cycles. Grepping the codebase for "break import cycle" or "circular dependency" hits 61 different files.

// types/permissions.ts
// Pure permission type definitions extracted to break import cycles.

// schemas/hooks.ts
// Hook Zod schemas extracted to break import cycles.
// circular dependency between settings/types.ts and plugins/schemas.ts.

// utils/systemPrompt.ts
// Lazy require to avoid circular dependency at module load time

// utils/bash/ast.ts (line 2218)
// circular import with bashPermissions.ts.

The pattern is always the same: extract types to a separate file, use lazy requires, or inline code that should be imported. Entire files like types/permissions.ts and schemas/hooks.ts exist purely as import cycle band-aids. 61 files means the module graph was never designed - it grew organically and now has deep tangles. Every lazy require is a place where TypeScript cannot help you at compile time.

The Permission Model

This is where Claude Code is most opinionated, and it is well done. Every single tool invocation goes through a six-step permission check before anything executes. The system defaults to denying, not allowing.

Input validation

The tool's own validateInput() runs first. Catches invalid file paths, blocked device paths (/dev/zero, /dev/random), or nonsensical arguments.

Deny rules

Checks against configured deny rules from all scopes. A blanket deny like "Bash" blocks all shell commands. A specific deny like "Bash(rm -rf *)" blocks just that pattern.

Tool-specific permission logic

Bash parses the command into an AST and evaluates each subcommand individually against permission rules. This is proper shell parsing via tree-sitter, not regex matching. This is one of the best-designed parts of the entire codebase.

Rule-based evaluation

Allow and deny rules from all configured scopes (session > project > user > policy) are evaluated in priority order. The first matching rule wins.

Mode-based adjustment

The active permission mode (default, auto, bypass, plan) can override the decision. In bypass mode, everything is auto-approved.

Optional classifier

In auto mode, a background classifier can speculatively pre-approve Bash commands. It runs in parallel with the permission prompt. Whichever resolves first wins.
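The ordering of the cascade can be sketched as a single function that returns on the first decisive step. This is a deliberately simplified model: the matches helper does plain prefix matching on the raw string, which is exactly what the real Bash path avoids by parsing an AST; the tool-specific step and the classifier are omitted; and evaluating deny rules before the bypass check is an assumption about ordering, not something confirmed by the source.

```typescript
type Decision = "allow" | "deny" | "ask";
type Scope = "session" | "project" | "user" | "policy";
type Rule = { scope: Scope; effect: "allow" | "deny"; pattern: string };
type ToolRequest = { tool: string; arg: string };
type Mode = "default" | "bypass";

const SCOPE_ORDER: Scope[] = ["session", "project", "user", "policy"];

// "Bash" matches every Bash call; "Bash(git *)" matches args starting with "git ".
// Assumption: naive prefix matching, for illustration only.
function matches(pattern: string, req: ToolRequest): boolean {
  const open = pattern.indexOf("(");
  if (open === -1) return pattern === req.tool;
  if (pattern.slice(0, open) !== req.tool) return false;
  const argPattern = pattern.slice(open + 1, -1);
  return argPattern.endsWith("*")
    ? req.arg.startsWith(argPattern.slice(0, -1))
    : req.arg === argPattern;
}

function decide(req: ToolRequest, rules: Rule[], mode: Mode): Decision {
  // 1. Input validation.
  if (req.arg.trim() === "") return "deny";
  // 2. Deny rules from every scope.
  if (rules.some((r) => r.effect === "deny" && matches(r.pattern, req))) return "deny";
  // 3. Rule evaluation in scope priority order; the first match wins.
  for (const scope of SCOPE_ORDER) {
    const hit = rules.find((r) => r.scope === scope && matches(r.pattern, req));
    if (hit) return hit.effect;
  }
  // 4. Mode-based adjustment.
  if (mode === "bypass") return "allow";
  // Nothing approved the call, so it is not allowed silently.
  return "ask";
}
```

Note that the fall-through result is "ask" rather than "allow": without a matching rule or mode, the call never runs unattended.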

Anti-fatigue protection. If you deny a tool 5 times in a row, the system stops asking and auto-rejects subsequent attempts. Prevents the model from getting stuck in a retry loop on a denied action.
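The anti-fatigue rule amounts to a per-tool streak counter. A minimal sketch follows, assuming the streak is tracked per tool name and resets on any approval; the article only states the threshold of 5, so the keying and reset behavior are guesses, and DenialTracker is a hypothetical name.

```typescript
class DenialTracker {
  private readonly streaks = new Map<string, number>();
  private readonly limit: number;

  constructor(limit = 5) {
    this.limit = limit;
  }

  recordDenial(tool: string): void {
    this.streaks.set(tool, (this.streaks.get(tool) ?? 0) + 1);
  }

  // Assumption: an approval breaks the "in a row" streak.
  recordApproval(tool: string): void {
    this.streaks.delete(tool);
  }

  shouldAutoReject(tool: string): boolean {
    return (this.streaks.get(tool) ?? 0) >= this.limit;
  }
}
```

Once shouldAutoReject returns true, the caller can skip the prompt and hand the model a denial result directly.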

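How a permission request is actually surfaced depends on the permission handler in use. There are three.

Interactive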

The main REPL. Shows a confirmation dialog in the terminal. Supports keyboard navigation to approve, deny, or grant session-wide permissions.

Coordinator

Used by the swarm leader. Runs automated checks before showing UI dialogs. Limits prompt frequency to avoid flooding the user.

Swarm Worker

Background agents that cannot show UI. Auto-denies all interactive permission requests. Only hook-based approvals work here. If a worker needs something it does not have permission for, it fails and reports back to the leader.

89 Feature Flags, 960 References

Claude Code uses Bun's compile-time feature() function for feature gating. There are 89 distinct feature flags referenced 960 times across the codebase. On top of that, 472 distinct environment variables are referenced across 1,425 call sites.

Some of these are clearly experiments (ABLATION_BASELINE, OVERFLOW_TEST_TOOL). Some are entire product directions (KAIROS, COORDINATOR_MODE, BRIDGE_MODE). Some sound like they should have shipped or been deleted months ago (EXPERIMENTAL_SKILL_SEARCH, NEW_INIT).

ABLATION_BASELINE, AGENT_MEMORY_SNAPSHOT, AGENT_TRIGGERS,
BASH_CLASSIFIER, BG_SESSIONS, BRIDGE_MODE, BUDDY,
CACHED_MICROCOMPACT, CONTEXT_COLLAPSE, COORDINATOR_MODE,
DAEMON, DIRECT_CONNECT, ENHANCED_TELEMETRY_BETA,
KAIROS, KAIROS_BRIEF, KAIROS_CHANNELS, KAIROS_DREAM,
KAIROS_GITHUB_WEBHOOKS, KAIROS_PUSH_NOTIFICATION,
LODESTONE, MCP_SKILLS, PROACTIVE, REACTIVE_COMPACT,
SSH_REMOTE, ULTRAPLAN, ULTRATHINK, VOICE_MODE,
WEB_BROWSER_TOOL, WORKFLOW_SCRIPTS
// ... 60 more flags

When you have KAIROS, KAIROS_BRIEF, KAIROS_CHANNELS, KAIROS_DREAM, KAIROS_GITHUB_WEBHOOKS, and KAIROS_PUSH_NOTIFICATION as separate flags, that is not a gradual rollout. That is an entire parallel product built inside the same codebase behind conditional requires. A monorepo without the monorepo tooling.

Since feature() is compile-time, dead code gets eliminated from the build. The runtime never sees unused paths. Performance-wise, it is fine. The cost is in developer experience - 960 feature checks scattered across the codebase, and nobody knows which ones are still alive.

This leads to a pattern that appears everywhere, especially in REPL.tsx (17 times) and query.ts (6 times):

// query.ts
const reactiveCompact = feature('REACTIVE_COMPACT')
  ? (require('./services/compact/reactiveCompact.js')
     as typeof import('./services/compact/reactiveCompact.js'))
  : null

const contextCollapse = feature('CONTEXT_COLLAPSE')
  ? (require('./services/contextCollapse/index.js')
     as typeof import('./services/contextCollapse/index.js'))
  : null

TypeScript code using require() inside an ES module, wrapped in a compile-time feature check, with a type assertion to recover the types that require() loses. Each one is a place where the type system has a gap. The as typeof import(...) cast tells TypeScript "trust me." If someone changes the export shape, the cast silently lies. No compiler error. You find out at runtime. Dynamic import() would preserve types without a cast and is the standard solution for conditional module loading, at the cost of making the load asynchronous.

The Type Cast That Appears 1,193 Times

Every analytics call in Claude Code requires a type cast to a type named AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS. It appears 1,193 times across the codebase.

logEvent('tengu_startup_telemetry', {
  entrypoint: entrypoint as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
  action: 'hint_converted' as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
  variant: idleHintShownRef.current as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
})

The intent is admirable. Claude Code runs on people's actual codebases. You do not want to accidentally log file paths, source code, or secrets to an analytics pipeline. So they made a type that forces developers to manually confirm "yes, this string is safe to log."

But when you are writing this 58-character type name over a thousand times, it stops being a guardrail. It becomes a ritual. A builder pattern with runtime validation - something that actually throws if a value looks like a path or source code - would catch the thing the type cast only claims to prevent.
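Here is roughly what such a builder could look like. It is a sketch under loud assumptions: AnalyticsEvent, looksLikePath, and looksLikeCode are hypothetical names, and the regular expressions are blunt heuristics that would need tuning against real data (they will reject some harmless values, such as version strings).

```typescript
// Heuristic (assumption): any slash, backslash, or trailing file extension looks like a path.
function looksLikePath(value: string): boolean {
  return /[\/\\]/.test(value) || /\.[a-z0-9]{1,4}$/i.test(value);
}

// Heuristic (assumption): braces, parens, semicolons, '=' or newlines look like code.
function looksLikeCode(value: string): boolean {
  return /[{};()=]|\n/.test(value);
}

class AnalyticsEvent {
  readonly name: string;
  private readonly fields: Record<string, string> = {};

  constructor(name: string) {
    this.name = name;
  }

  set(key: string, value: string): this {
    if (value.length > 64 || looksLikePath(value) || looksLikeCode(value)) {
      // The message names the key only, so the rejected value is never echoed.
      throw new Error(`Refusing to log "${key}": value looks like a path or source code`);
    }
    this.fields[key] = value;
    return this;
  }

  build(): { name: string; fields: Record<string, string> } {
    return { name: this.name, fields: { ...this.fields } };
  }
}
```

A call site then reads new AnalyticsEvent('tengu_startup_telemetry').set('action', 'hint_converted').build(), and a leaked path fails at the moment it is set instead of passing through a cast.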

The 4,683-Line Entry Point

main.tsx is 4,683 lines and contains every CLI command definition, all argument parsing via Commander.js, the complete OAuth login flow, session resume logic, remote session management, profile startup benchmarking, plugin loading, and MDM configuration. The comments explain why:

// main.tsx - lines 1-8
// These side-effects must run before all other imports:
// 1. profileCheckpoint marks entry before heavy module evaluation begins
// 2. startMdmRawRead fires MDM subprocesses in parallel with the
//    remaining ~135ms of imports below
// 3. startKeychainPrefetch fires both macOS keychain reads in parallel
//    (~65ms on every macOS startup)

Everything is in one file to minimize the import graph depth. Bun evaluates imports eagerly. Deeper import trees mean more startup latency. Keeping everything in main.tsx means one level of imports instead of three or four. The payoff is measured against the ~135ms of import evaluation the comment mentions, and it is bought by making the entry point unreadable. A lazy-loading command registry - only load the init module when someone runs claude init, only load OAuth when authentication is needed - would achieve the same thing. That is how most large CLI tools handle it.
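A lazy-loading registry is small enough to sketch in full. CommandRegistry, Command, and Loader are hypothetical names; in a real CLI each loader would be a dynamic import of a command module, which is replaced here by plain async functions so the example stays self-contained.

```typescript
type Command = { run: (args: string[]) => Promise<string> };
type Loader = () => Promise<Command>;

class CommandRegistry {
  private readonly loaders = new Map<string, Loader>();
  private readonly cache = new Map<string, Promise<Command>>();

  // Registering is cheap: nothing is loaded until the command is invoked.
  register(name: string, loader: Loader): void {
    this.loaders.set(name, loader);
  }

  async run(name: string, args: string[]): Promise<string> {
    const loader = this.loaders.get(name);
    if (!loader) throw new Error(`Unknown command: ${name}`);
    let pending = this.cache.get(name);
    if (!pending) {
      pending = loader(); // the first use triggers the load
      this.cache.set(name, pending);
    }
    const command = await pending;
    return command.run(args);
  }
}
```

With this shape the entry point only holds a table of names and loaders, so startup cost scales with the command that was actually run, not with the number of commands that exist.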

The system prompt itself is built in layers: a base prompt, optional teammate instructions, browser integration hints, custom agent prompts, proactive mode addendum, and assistant mode addendum. Each layer is appended via string concatenation in sequence.

Feature flags at build time. Many subsystems are gated behind compile-time feature flags using bun:bundle. Disabled features get their code eliminated at build time, not just branched around at runtime. This keeps the shipped bundle lean for each platform target (CLI, VS Code extension, desktop app, SDK).

String.fromCharCode to Spell "Duck"

Claude Code has a hidden pet system (behind the BUDDY feature flag). Procedurally generated companions with rarity tiers, species, hats, eye styles, and stat distributions. In buddy/types.ts, the species list is defined like this:

// buddy/types.ts
// One species name collides with a model-codename canary in
// excluded-strings.txt. The check greps build output (not source),
// so runtime-constructing the value keeps the literal out of the
// bundle while the check stays armed for the actual codename.

const c = String.fromCharCode

export const duck    = c(0x64,0x75,0x63,0x6b) as 'duck'
export const goose   = c(0x67,0x6f,0x6f,0x73,0x65) as 'goose'
export const blob    = c(0x62,0x6c,0x6f,0x62) as 'blob'
export const cat     = c(0x63,0x61,0x74) as 'cat'
export const dragon  = c(0x64,0x72,0x61,0x67,0x6f,0x6e) as 'dragon'
export const penguin = c(0x70,0x65,0x6e,0x67,0x75,0x69,0x6e) as 'penguin'
// ... 12 more species, all hex-encoded

One of the species names collides with an internal model codename. Anthropic's CI greps the build output for these codenames as a security canary. Instead of adding a regex exclusion for the buddy module, they hex-encoded all 18 species names. Future developers reading this file will be baffled by why "duck" cannot just be "duck". But the fact that engineers spent time building a pet system with rarity tiers inside a terminal coding tool is honestly charming.

Multi-Agent Coordination

Claude Code can spawn sub-agents via the Agent tool. Each sub-agent runs its own isolated agentic loop with a separate conversation, and optionally a restricted tool set. The leader agent orchestrates; workers execute.

Workers run with the swarm worker permission handler, which auto-denies interactive prompts. They can only do things pre-approved by rules or hooks. If a worker hits a permission wall, it fails gracefully and reports back to the leader. Workers cannot spawn their own sub-agents (no recursive spawning) and each can run in an isolated git worktree to avoid file conflicts.

Error Recovery

413 (prompt too long): The response is withheld. The system attempts context collapse drain first (cheapest), then reactive compaction (costs an API call). If both fail, it surfaces the error. No silent message dropping.

Output truncation: If the response is capped at the default 8K limit, the system automatically retries with 64K. If the output is still too long, it injects synthetic continue messages up to a configurable limit.

Model failure mid-stream: The system tombstones partial messages, strips incompatible thinking blocks, creates a fresh streaming executor on the fallback model, and notifies the user. The conversation continues seamlessly.

MCP integration: MCP tools go through the exact same permission cascade, result size management, and concurrency model as built-in tools. Built-in tools are sorted separately from MCP tools before concatenation in the API call to preserve prompt cache stability - if MCP tools were interleaved, adding or removing one connection would invalidate the entire prompt cache.
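The cache-stability argument is easy to see in a sketch: if built-in tools are sorted on their own and always come first, the leading part of the tool list is identical no matter which MCP servers are connected. ToolDefinition and orderToolsForCache are hypothetical names, and sorting by name is an assumed ordering key.

```typescript
type ToolDefinition = { name: string; source: "builtin" | "mcp" };

function orderToolsForCache(tools: ToolDefinition[]): ToolDefinition[] {
  const byName = (a: ToolDefinition, b: ToolDefinition): number =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  // filter() returns new arrays, so the caller's list is not reordered in place.
  const builtin = tools.filter((t) => t.source === "builtin").sort(byName);
  const mcp = tools.filter((t) => t.source === "mcp").sort(byName);
  // Built-ins form a stable prefix; MCP tools can only change the tail.
  return [...builtin, ...mcp];
}
```

Had the two groups been sorted together, an MCP tool whose name falls between two built-ins would shift everything after it and change the cached prefix.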

Output sanitization. All MCP tool outputs pass through a Unicode sanitizer that strips control characters and zero-width sequences. Prevents a malicious MCP server from injecting terminal escape codes that could manipulate your terminal display.
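A sanitizer of this kind can be very small. The sketch below is an assumption about scope, not the actual implementation: it strips C0 and C1 control characters (keeping tab, newline, and carriage return) plus a handful of common zero-width code points, and sanitizeToolOutput is a hypothetical name.

```typescript
// C0 controls except \t, \n, \r; DEL; and the C1 control range.
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/g;
// Zero-width space, non-joiner, joiner, word joiner, and byte order mark.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function sanitizeToolOutput(text: string): string {
  return text.replace(CONTROL_CHARS, "").replace(ZERO_WIDTH, "");
}
```

Removing the ESC byte (U+001B) is what defuses an escape sequence: the remaining characters such as [31m are left behind as inert visible text instead of being interpreted by the terminal.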

The Pattern Underneath

Most of these issues come from the same root cause: Claude Code grew faster than its architecture could keep up with. You can see the layers of history. A simple terminal REPL grew into a multi-agent coordinator with voice mode, companion pets, vim bindings, and remote sessions. Features got added behind flags faster than old flags got cleaned up. The module graph grew connections faster than anyone drew boundaries.

This is not unique to Anthropic. Every fast-moving company has codebases like this. The reason Claude Code's case is interesting is the scale: this is one of the most important AI products in the world, and its source reveals the same messy engineering trade-offs that exist at every startup.

The code ships. It works. Lots of developers rely on it daily. The query loop architecture, the streaming tool executor, the permission classifier with tree-sitter shell parsing - these are genuinely well-engineered systems. That matters more than clean architecture. But it is worth being honest about the cost.

Built from the source. Last updated March 2026.