PROMPT TO PRODUCTION
Chapter 9 of 19 · 12 min read

Chapter 9: Context Management

Quick Start (5-10 minutes)

Every AI conversation has a context window – a limited working memory. Claude Code’s context window holds approximately 200,000 tokens. Everything in your conversation – your messages, files read, tool outputs, and Claude’s responses – must fit inside it. When it fills up, older information fades, and Claude starts “forgetting” earlier decisions.

Here is how to monitor and manage your context right now.

Check your context usage. Claude Code displays a context usage indicator as you work. Watch for it to climb:

Context: ████████░░░░░░░░░░░░ 40%   ← Comfortable, keep working
Context: █████████████████░░░ 85%   ← Time to wrap up or start fresh

Use /compact when context gets high. The /compact command tells Claude to compress the conversation history, preserving key decisions while freeing up space:

You: /compact

Claude: Compacted conversation. Key context preserved:
        - Working on Next.js app with JWT auth
        - Currently implementing password reset
        - Files modified: src/auth/reset.ts, src/models/User.ts
        Context reduced from 78% to 32%.

Start a fresh conversation when you switch tasks. If you have been debugging for an hour and now want to build a new feature, start a new conversation. Carry forward only what you need:

New conversation:
You: I'm working on a Next.js app with JWT auth (HTTP-only cookies).
     Key files: src/auth/, src/models/User.ts, src/middleware/auth.ts.
     I need to implement email verification. Let's start.

Try This Now: Open Claude Code on any project. Ask it to read two files and explain something about the codebase. Then run /compact and notice how the context usage drops while your key decisions are preserved.


Core Concepts (15-20 minutes reading)

What Is a Context Window?

Think of context as the surface of your desk. Everything you are actively working on must fit on the desk. When it gets full, you have to clear items to make room for new ones. A well-organized desk is more productive than a cluttered one.

Claude Code’s “desk” is 200,000 tokens. Here is what counts toward that limit:

What                 Approximate Tokens    Example
──────────────────── ───────────────────── ──────────────────────────────────────
Your messages        50-150 each           “Fix the login validation bug”
Files Claude reads   5-8 tokens per line   A 500-line file uses ~3,000 tokens
Tool outputs         200-5,000 each        Grep results, npm output, test results
Claude’s responses   300-3,000 each        Code generation + explanation

Token basics: 1 token is roughly 4 characters or 0.75 words. A typical 500-line source file uses about 2,500-4,000 tokens, or 1-2% of your total context.
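
For a quick back-of-the-envelope check, you can estimate a file’s token cost from its character count. A minimal shell sketch using the 4-characters-per-token heuristic (the file path is an example, and this is a rough approximation, not Claude’s actual tokenizer):

# Estimate tokens as character count / 4 (heuristic, not a real tokenizer)
echo $(( $(wc -c < src/components/Dashboard.tsx) / 4 ))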

Recognizing Context Limits

Four warning signs that context is getting full:

  1. Slower responses. Early in a conversation, responses come in 3-5 seconds. Near the limit, they take 15-30 seconds.

  2. Claude “forgets” earlier decisions. You said “Use TypeScript for everything” at message 5, but at message 100 Claude generates a JavaScript file.

  3. Repeated questions. Claude asks about your framework or database when you already discussed it 50 messages ago.

  4. System notifications. Claude Code shows context usage warnings as you approach capacity.

When context fills up completely, the system must drop older messages to make room. Your foundational decisions (architecture, conventions) were often in those early messages – and now they are gone.

Six Strategies for Efficient Context Usage

Strategy 1: Search before reading.

Do not read entire files. Locate the specific section you need first.

Inefficient (loads 2,000 lines):
You: Read src/components/Dashboard.tsx

Efficient (loads only 70 lines):
You: Find the data fetching logic in the Dashboard component
Claude: [Grep locates it at lines 450-520]
You: Read lines 450-520 of src/components/Dashboard.tsx
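
The same pattern is easy to reproduce from a shell if you want to pre-locate code before pointing Claude at it. A sketch with a hypothetical function name and the line range from the example above:

# Step 1: find where the logic lives (line numbers only, cheap)
grep -n "fetchDashboardData" src/components/Dashboard.tsx
# Step 2: read just the surrounding range instead of the whole file
sed -n '450,520p' src/components/Dashboard.tsx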

Strategy 2: One task per conversation.

Keep conversations focused on a single feature or task. When you finish authentication and want to start payment processing, begin a new conversation with a brief handoff summary.

Conversation 1: "Implement JWT Authentication"
  → End with summary of decisions and key file locations

Conversation 2: "Add Payment Processing"
  → Start with pasted summary from Conversation 1
  → Full context available for implementation

Strategy 3: Use /compact proactively.

Do not wait until context is 90% full. Run /compact when you finish a major subtask or when context passes 60-70%:

You: /compact
     Focus on: implementing the password reset endpoint.
     Key decisions: JWT in HTTP-only cookies, bcrypt with 10 rounds,
     Prisma for database access.

Providing focus hints helps Claude preserve the most relevant information.

Strategy 4: Use agents for exploration.

When you need to understand how authentication works across 20 files, let an agent explore in its own context space and return a summary. Your main conversation stays lean.

Inefficient:
You: How does authentication work?
[Claude reads 20 files into your context → 65% consumed]

Efficient:
You: How does authentication work?
Claude: [Launches explore agent in separate context]
        [Agent reads files, analyzes, returns summary]
Your context usage: 15% (just the summary)

Strategy 5: Give Claude context upfront instead of making it search.

If you know the relevant files, tell Claude directly:

You: I need to add rate limiting to the login endpoint.
     The relevant files are:
     - src/api/auth/login.ts (the endpoint)
     - src/middleware/rateLimit.ts (existing rate limiter)
     - src/config/limits.ts (configuration)
     Please read these and implement rate limiting.

This is faster and more context-efficient than having Claude search the entire project.

Strategy 6: Redirect verbose output.

Commands like npm install or long test suites dump thousands of tokens into context. Redirect them:

Instead of:
npm install --verbose
[5,000+ tokens of output]

Do this:
npm install --silent 2>&1 | tail -5
[100 tokens of output]
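
The same idea works for any noisy command: keep the full log on disk for later inspection and let only a short tail reach the conversation. A sketch (the log path is an example):

# Full output goes to a file; only the last 20 lines enter context
npm test > /tmp/test.log 2>&1; tail -20 /tmp/test.log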

Try This Now: Pick a project and try the “search before reading” approach. Ask Claude to find a specific function using grep, then read only the relevant lines. Compare how much cleaner this feels versus reading the entire file.

When to Start a New Conversation

Start fresh when any of these apply:

Signal                             What to Do
────────────────────────────────── ────────────────────────────────────────────
Context passes 80%                 Summarize and start a new conversation
Switching to a different feature   New conversation with a brief handoff
Exploration phase complete         Start implementation in a fresh conversation
Claude forgets key decisions       Context is too full – start fresh
Long debug session resolved        Debug logs are wasting space – start fresh

The Handoff Pattern:

End of Conversation 1:
You: Summarize what we accomplished and what I need for next steps.

Claude: "Summary:
  - Implemented JWT auth in src/auth/
  - User model in src/models/User.ts
  - Login: POST /api/auth/login
  - Tokens in HTTP-only cookies
  - Next: Add password reset"

Start of Conversation 2:
You: I'm working on a Next.js app with JWT auth.
     Key files: src/auth/, src/models/User.ts
     Tokens stored in HTTP-only cookies.
     I need to implement password reset.

Include in your handoff: project stack, key architectural decisions, important file locations, conventions, and current objective. Skip: detailed code, debug logs, search results, rejected alternatives.

Working with Large Codebases

A large enterprise codebase might have 10,000 files and 500,000 lines of code. Claude can hold roughly 50-100 files in context at once. The solution is the pyramid strategy:

Level 1: Project Overview (load once, ~5% context)
├── README.md, package.json, architecture docs

Level 2: Feature Area (current focus, ~15% context)
├── src/auth/  ← Working here now

Level 3: Everything Else (search as needed, minimal context)
└── src/payments/, src/notifications/, etc.
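
A cheap way to build the Level 1 overview is to load structure without content: a depth-limited directory listing costs a few hundred tokens where reading the files themselves would cost thousands. A sketch, assuming the tree utility is installed:

# Directory skeleton only – names, not file contents
tree -L 2 --dirsfirst src/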

The Reference Document Pattern. Maintain a PROJECT_CONTEXT.md file with your stack, conventions, architecture decisions, and key file locations. Paste it at the start of each new conversation instead of re-exploring the codebase:

# Project Quick Reference

## Stack
- Next.js 13.4, PostgreSQL + Prisma, JWT auth, Tailwind CSS

## Conventions
- API responses: {success, data, error}
- TypeScript strict mode, all functions have JSDoc
- JWT in HTTP-only cookies (not localStorage)

## Key File Locations
- Auth: src/auth/
- Database: src/db/
- API routes: app/api/

This replaces dozens of file reads with a single paste.
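
To make the paste frictionless, keep the file one command away. On macOS, for instance (xclip or wl-copy play the same role on Linux):

# Copy the reference doc to the clipboard, ready to paste into a new conversation
pbcopy < PROJECT_CONTEXT.md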

Worktrees for Parallel Work

When you need to work on multiple features simultaneously, use worktrees to give each agent its own working directory. This prevents file conflicts and lets each conversation operate independently:

You: /worktree feature-auth
[Claude creates isolated worktree for authentication work]

Meanwhile, in another terminal:
You: /worktree feature-payments
[Separate worktree for payment work]

Each worktree has its own branch and file state, so agents working in parallel cannot interfere with each other.
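
If your setup lacks a worktree command, plain git provides the same isolation. A minimal sketch with example directory and branch names:

# One directory per feature, each on its own branch
git worktree add ../myapp-auth -b feature-auth
git worktree add ../myapp-payments -b feature-payments
git worktree list   # show all active worktrees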


Deep Dive (optional)

Context Consumption Benchmarks

Understanding real token costs helps you plan conversations:

Operation                          Tokens    % of 200K
────────────────────────────────── ───────── ─────────
Read small file (50-100 lines)     500       0.25%
Read medium file (100-500 lines)   2,500     1.25%
Read large file (500-1000 lines)   5,000     2.50%
Grep search (20 matches)           800       0.40%
Agent focused exploration          1,500     0.75%
Short user message                 50        0.03%
Claude’s medium response           1,000     0.50%
npm test output (20 tests)         800       0.40%
Git diff (small)                   1,000     0.50%

Realistic scenario: Implementing a feature

Action                              Tokens    Cumulative
─────────────────────────────────── ───────── ──────────
Agent exploration (focused)          1,500     0.75%
Read 3 relevant files                3,300     2.40%
10-message planning discussion       6,000     5.40%
Code generation (3 files)            5,500     8.15%
Test run + 2 fix iterations          3,800    10.05%
Documentation updates                2,500    11.30%
─────────────────────────────────── ───────── ──────────
Total: ~22,600 tokens (11.3%)

Result: Complete feature with nearly 89% of context remaining.

Well-managed conversations routinely stay under 15% context for a complete feature.

Performance Impact by Context Level

Context Used   Response Time      Quality Risk   Action
────────────── ────────────────── ────────────── ────────────────────────────
0-40%          Fast (3-5s)        Very low       Keep working
40-60%         Normal (5-8s)      Low            Monitor occasionally
60-75%         Slower (8-12s)     Medium         Consider /compact
75-85%         Noticeably slow    High           Plan transition or /compact
85-100%        Slow (18-30s+)     Very high      Start new conversation now

Multi-Session Project Management

For projects spanning days or weeks, use a living context document:

# Project Context (Last updated: 2025-11-18)

## Current State
- Authentication: Complete
- Payment processing: In progress
- Notifications: Not started

## Recent Changes
- Nov 18: Added Stripe integration
- Nov 17: Implemented JWT auth

## Active Issues
- TODO: Add rate limiting to auth endpoints
- BUG: Profile image upload fails for PNG files

Update this document at the end of each session. Paste it at the start of each new conversation. This is your single source of truth that keeps every conversation aligned.
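
A tiny end-of-session ritual keeps the document honest. For example, append a dated entry before you close the terminal (the log message is illustrative):

# Append a dated entry so the "Recent Changes" trail stays current
echo "- $(date '+%b %d'): Added Stripe webhook handler" >> PROJECT_CONTEXT.md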

When Things Go Wrong

Problem: Context fills up unexpectedly fast.

Common causes: reading very large files (2000+ lines), verbose npm/build output, grep searches returning 100+ matches, or Claude generating lengthy responses.

Fix: Audit what consumed context. Use npm install --silent, redirect verbose output to files, use --files-with-matches for grep instead of full content, and read targeted line ranges instead of entire files.
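
For instance, the difference between listing matching files and dumping every matching line is often an order of magnitude in tokens (the search term is a hypothetical example):

# Every matching line enters context (can be thousands of tokens):
grep -rn "useAuth" src/
# File names only – same information for a locate task, far cheaper (-l = --files-with-matches):
grep -rl "useAuth" src/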

Problem: Claude forgets decisions despite low context.

Even at 40% context, Claude may not attend equally to all content. Critical decisions buried 50 messages deep can fade.

Fix: Re-state critical decisions when starting major features. Create “context anchors” at the top of your conversation. Use a PROJECT_CONTEXT.md file. Run /compact with focus hints that emphasize your key decisions.

Problem: /compact loses important information.

Sometimes compaction drops details you still need.

Fix: Provide explicit focus hints when compacting: /compact Focus: payment integration details, Stripe API keys location, webhook endpoint structure. After compacting, verify Claude still knows the critical facts by asking a quick question.

Problem: Summaries miss key details during conversation transitions.

Your handoff summary was too brief or too vague, and the new conversation lacks critical context.

Fix: Use a structured summary template:

Tech stack and versions:
Architectural decisions:
Key file locations and purposes:
Code conventions:
Known issues and TODOs:
Current objective:

Problem: Context management feels like overhead.

You are spending more time thinking about context than coding.

Fix: Use simple rules, not complex strategies:

  1. Context over 80%? Start new conversation.
  2. Switching features? Start new conversation.
  3. Over 100 messages? Consider starting fresh.
  4. Otherwise, keep working.

Run /compact when you notice slowness. Maintain PROJECT_CONTEXT.md once and update it in 2 minutes at the end of each session. With practice, context management becomes automatic.


Chapter Checkpoint

Five-Bullet Summary

  1. Context is finite but generous. You have 200,000 tokens – enough for a full feature, but not infinite. Strategic management keeps you productive.
  2. Search before reading. Use grep to locate code, then read only the relevant lines. This can reduce context usage by 80% or more.
  3. One task per conversation. Focused conversations perform better. Summarize and start fresh when switching features.
  4. Use /compact and agents. The /compact command compresses history. Agents explore in their own context space, keeping your main conversation lean.
  5. Maintain a project reference document. A living PROJECT_CONTEXT.md replaces dozens of file reads and makes every conversation transition smooth.

Competency Checklist

After completing this chapter, you should be able to:

  - Read the context usage indicator and act before it climbs past 80%
  - Run /compact with focus hints at natural breakpoints
  - Write a handoff summary and start a fresh conversation when switching tasks
  - Use search-before-read and agents to keep file reads lean
  - Maintain a PROJECT_CONTEXT.md that carries project knowledge across sessions

Next: Chapter 10 – Test-Driven Development with AI