Chapter 9: Context Management
Quick Start (5-10 minutes)
Every AI conversation has a context window: a limited working memory. Claude Code's context window holds approximately 200,000 tokens. Everything in your conversation (your messages, files read, tool outputs, and Claude's responses) must fit inside it. When it fills up, older information fades, and Claude starts "forgetting" earlier decisions.
Here is how to monitor and manage your context right now.
Check your context usage. Claude Code displays a context usage indicator as you work. Watch for it to climb:
Context: ████████░░░░░░░░░░░░ 40% ← Comfortable, keep working
Context: █████████████████░░░ 85% ← Time to wrap up or start fresh
Use /compact when context gets high. The /compact command tells Claude to compress the conversation history, preserving key decisions while freeing up space:
You: /compact
Claude: Compacted conversation. Key context preserved:
- Working on Next.js app with JWT auth
- Currently implementing password reset
- Files modified: src/auth/reset.ts, src/models/User.ts
Context reduced from 78% to 32%.
Start a fresh conversation when you switch tasks. If you have been debugging for an hour and now want to build a new feature, start a new conversation. Carry forward only what you need:
New conversation:
You: I'm working on a Next.js app with JWT auth (HTTP-only cookies).
Key files: src/auth/, src/models/User.ts, src/middleware/auth.ts.
I need to implement email verification. Let's start.
Try This Now: Open Claude Code on any project. Ask it to read two files and explain something about the codebase. Then run /compact and notice how the context usage drops while your key decisions are preserved.
Core Concepts (15-20 minutes reading)
What Is a Context Window?
Think of context as the surface of your desk. Everything you are actively working on must fit on the desk. When it gets full, you have to clear items to make room for new ones. A well-organized desk is more productive than a cluttered one.
Claude Code's "desk" is 200,000 tokens. Here is what counts toward that limit:
| What | Approximate Tokens | Example |
|---|---|---|
| Your messages | 50-150 each | "Fix the login validation bug" |
| Files Claude reads | 5-8 tokens per line | A 500-line file uses ~3,000 tokens |
| Tool outputs | 200-5,000 each | Grep results, npm output, test results |
| Claude's responses | 300-3,000 each | Code generation + explanation |
Token basics: 1 token is roughly 4 characters or 0.75 words. A typical 500-line source file uses about 2,500-4,000 tokens, or 1-2% of your total context.
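The 4-characters-per-token rule of thumb is easy to turn into a quick back-of-the-envelope estimator. The sketch below is a heuristic only, not Claude's actual tokenizer, and the 25-characters-per-line average is an assumption for typical source code:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token rule of thumb.
    A heuristic approximation, not Claude's real tokenizer."""
    return max(1, len(text) // 4)

def estimate_file_tokens(line_count: int, avg_chars_per_line: int = 25) -> int:
    """Estimate the token cost of reading a source file, assuming an
    average line length (25 chars/line is a guess for typical code)."""
    return estimate_tokens("x" * (line_count * avg_chars_per_line))

# A 500-line file lands in the 2,500-4,000 token range from the table above
print(estimate_file_tokens(500))  # 3125
```

Running estimates like this before asking Claude to read a large file helps you decide whether to read it whole or search for a targeted line range instead.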
Recognizing Context Limits
Four warning signs that context is getting full:
Slower responses. Early in a conversation, responses come in 3-5 seconds. Near the limit, they take 15-30 seconds.
Claude "forgets" earlier decisions. You said "Use TypeScript for everything" at message 5, but at message 100 Claude generates a JavaScript file.
Repeated questions. Claude asks about your framework or database when you already discussed it 50 messages ago.
System notifications. Claude Code shows context usage warnings as you approach capacity.
When context fills up completely, the system must drop older messages to make room. Your foundational decisions (architecture, conventions) were often in those early messages, and now they are gone.
Six Strategies for Efficient Context Usage
Strategy 1: Search before reading.
Do not read entire files. Locate the specific section you need first.
Inefficient (loads 2,000 lines):
You: Read src/components/Dashboard.tsx
Efficient (loads only 70 lines):
You: Find the data fetching logic in the Dashboard component
Claude: [Grep locates it at lines 450-520]
You: Read lines 450-520 of src/components/Dashboard.tsx
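The same search-then-slice workflow can be sketched in code. This is a hypothetical helper for illustration, not a Claude Code API: it finds the first line matching a pattern and returns only a small window around it, rather than the whole file:

```python
def find_and_slice(lines: list[str], pattern: str, context: int = 10) -> list[str]:
    """Return only a window of lines around the first match of `pattern`,
    mimicking 'grep to locate, then read a targeted range'."""
    for i, line in enumerate(lines):
        if pattern in line:
            start = max(0, i - context)
            return lines[start : i + context + 1]
    return []  # no match: load nothing rather than everything

# Usage sketch: feed it the file split into lines
# window = find_and_slice(open("src/components/Dashboard.tsx").read().splitlines(),
#                         "fetchData", context=35)
```

A 70-line window costs a few hundred tokens; the full 2,000-line file would cost thousands.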
Strategy 2: One task per conversation.
Keep conversations focused on a single feature or task. When you finish authentication and want to start payment processing, begin a new conversation with a brief handoff summary.
Conversation 1: "Implement JWT Authentication"
→ End with summary of decisions and key file locations
Conversation 2: "Add Payment Processing"
→ Start with pasted summary from Conversation 1
→ Full context available for implementation
Strategy 3: Use /compact proactively.
Do not wait until context is 90% full. Run /compact when
you finish a major subtask or when context passes 60-70%:
You: /compact
Focus on: implementing the password reset endpoint.
Key decisions: JWT in HTTP-only cookies, bcrypt with 10 rounds,
Prisma for database access.
Providing focus hints helps Claude preserve the most relevant information.
Strategy 4: Use agents for exploration.
When you need to understand how authentication works across 20 files, let an agent explore in its own context space and return a summary. Your main conversation stays lean.
Inefficient:
You: How does authentication work?
[Claude reads 20 files into your context: 65% consumed]
Efficient:
You: How does authentication work?
Claude: [Launches explore agent in separate context]
[Agent reads files, analyzes, returns summary]
Your context usage: 15% (just the summary)
Strategy 5: Give Claude context upfront instead of making it search.
If you know the relevant files, tell Claude directly:
You: I need to add rate limiting to the login endpoint.
The relevant files are:
- src/api/auth/login.ts (the endpoint)
- src/middleware/rateLimit.ts (existing rate limiter)
- src/config/limits.ts (configuration)
Please read these and implement rate limiting.
This is faster and more context-efficient than having Claude search the entire project.
Strategy 6: Redirect verbose output.
Commands like npm install or long test suites dump
thousands of tokens into context. Redirect them:
Instead of:
npm install --verbose
[5,000+ tokens of output]
Do this:
npm install --silent 2>&1 | tail -5
[100 tokens of output]
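The same trimming can be done when you run commands yourself and want to share only the relevant tail. A minimal sketch, assuming a POSIX shell is available for `subprocess`:

```python
import subprocess

def run_quietly(cmd: str, keep_last: int = 5) -> str:
    """Run a shell command and keep only the last few lines of output,
    mimicking `cmd 2>&1 | tail -5` to avoid dumping thousands of tokens."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    lines = (result.stdout + result.stderr).splitlines()
    return "\n".join(lines[-keep_last:])

print(run_quietly("echo step1; echo step2; echo step3", keep_last=2))
```

The final lines of a build or install log usually contain the success/failure summary, which is the part worth keeping in context.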
Try This Now: Pick a project and try the "search before reading" approach. Ask Claude to find a specific function using grep, then read only the relevant lines. Compare how much cleaner this feels versus reading the entire file.
When to Start a New Conversation
Start fresh when any of these apply:
| Signal | What to Do |
|---|---|
| Context passes 80% | Summarize and start new conversation |
| Switching to a different feature | New conversation with brief handoff |
| Exploration phase complete | Start implementation in fresh conversation |
| Claude forgets key decisions | Context is too full; start fresh |
| Long debug session resolved | Debug logs are wasting space; start fresh |
The Handoff Pattern:
End of Conversation 1:
You: Summarize what we accomplished and what I need for next steps.
Claude: "Summary:
- Implemented JWT auth in src/auth/
- User model in src/models/User.ts
- Login: POST /api/auth/login
- Tokens in HTTP-only cookies
- Next: Add password reset"
Start of Conversation 2:
You: I'm working on a Next.js app with JWT auth.
Key files: src/auth/, src/models/User.ts
Tokens stored in HTTP-only cookies.
I need to implement password reset.
Include in your handoff: project stack, key architectural decisions, important file locations, conventions, and current objective. Skip: detailed code, debug logs, search results, rejected alternatives.
Working with Large Codebases
A large enterprise codebase might have 10,000 files and 500,000 lines of code. Claude can hold roughly 50-100 files in context at once. The solution is the pyramid strategy:
Level 1: Project Overview (load once, ~5% context)
└── README.md, package.json, architecture docs
Level 2: Feature Area (current focus, ~15% context)
└── src/auth/ ← Working here now
Level 3: Everything Else (search as needed, minimal context)
└── src/payments/, src/notifications/, etc.
The Reference Document Pattern. Maintain a
PROJECT_CONTEXT.md file with your stack, conventions,
architecture decisions, and key file locations. Paste it at the start of
each new conversation instead of re-exploring the codebase:
# Project Quick Reference
## Stack
- Next.js 13.4, PostgreSQL + Prisma, JWT auth, Tailwind CSS
## Conventions
- API responses: {success, data, error}
- TypeScript strict mode, all functions have JSDoc
- JWT in HTTP-only cookies (not localStorage)
## Key File Locations
- Auth: src/auth/
- Database: src/db/
- API routes: app/api/

This replaces dozens of file reads with a single paste.
Worktrees for Parallel Work
When you need to work on multiple features simultaneously, use worktrees to give each agent its own working directory. This prevents file conflicts and lets each conversation operate independently:
You: /worktree feature-auth
[Claude creates isolated worktree for authentication work]
Meanwhile, in another terminal:
You: /worktree feature-payments
[Separate worktree for payment work]
Each worktree has its own branch and file state, so agents working in parallel cannot interfere with each other.
Deep Dive (optional)
Context Consumption Benchmarks
Understanding real token costs helps you plan conversations:
| Operation | Tokens | % of 200K |
|---|---|---|
| Read small file (50-100 lines) | 500 | 0.25% |
| Read medium file (100-500 lines) | 2,500 | 1.25% |
| Read large file (500-1000 lines) | 5,000 | 2.50% |
| Grep search (20 matches) | 800 | 0.40% |
| Agent focused exploration | 1,500 | 0.75% |
| Short user message | 50 | 0.03% |
| Claudeβs medium response | 1,000 | 0.50% |
| npm test output (20 tests) | 800 | 0.40% |
| Git diff (small) | 1,000 | 0.50% |
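The benchmark figures can be combined into a quick budget check before starting a session. The costs below are the approximations from the table above, not measured values:

```python
CONTEXT_LIMIT = 200_000  # approximate Claude Code context window, per this chapter

# Approximate per-operation costs taken from the benchmark table
COSTS = {
    "read_small_file": 500,
    "read_medium_file": 2_500,
    "read_large_file": 5_000,
    "grep_search": 800,
    "agent_exploration": 1_500,
    "medium_response": 1_000,
}

def context_used(operations: list[str]) -> tuple[int, float]:
    """Sum the estimated token cost of planned operations and return
    (total_tokens, percent_of_context)."""
    total = sum(COSTS[op] for op in operations)
    return total, round(100 * total / CONTEXT_LIMIT, 2)

plan = ["agent_exploration", "read_medium_file", "read_medium_file",
        "grep_search", "medium_response"]
print(context_used(plan))  # (8300, 4.15)
```

Even a generous plan like this consumes under 5% of the window, which is why well-managed sessions rarely approach the limit.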
Realistic scenario: Implementing a feature
| Action | Tokens | Cumulative |
|---|---|---|
| Agent exploration (focused) | 1,500 | 0.75% |
| Read 3 relevant files | 3,300 | 2.40% |
| 10-message planning discussion | 6,000 | 5.40% |
| Code generation (3 files) | 5,500 | 8.15% |
| Test run + 2 fix iterations | 3,800 | 10.05% |
| Documentation updates | 2,500 | 11.30% |
Total: ~22,600 tokens (11.3%)
Result: Complete feature with roughly 88% context remaining.
Well-managed conversations routinely stay under 15% context for a complete feature.
Performance Impact by Context Level
| Context Used | Response Time | Quality Risk | Action |
|---|---|---|---|
| 0-40% | Fast (3-5s) | Very low | Keep working |
| 40-60% | Normal (5-8s) | Low | Monitor occasionally |
| 60-75% | Slower (8-12s) | Medium | Consider /compact |
| 75-85% | Noticeably slow | High | Plan transition or /compact |
| 85-100% | Slow (18-30s+) | Very high | Start new conversation now |
Multi-Session Project Management
For projects spanning days or weeks, use a living context document:
# Project Context (Last updated: 2025-11-18)
## Current State
- Authentication: Complete
- Payment processing: In progress
- Notifications: Not started
## Recent Changes
- Nov 18: Added Stripe integration
- Nov 17: Implemented JWT auth
## Active Issues
- TODO: Add rate limiting to auth endpoints
- BUG: Profile image upload fails for PNG files

Update this document at the end of each session. Paste it at the start of each new conversation. This is your single source of truth that keeps every conversation aligned.
When Things Go Wrong
Problem: Context fills up unexpectedly fast.
Common causes: reading very large files (2000+ lines), verbose npm/build output, grep searches returning 100+ matches, or Claude generating lengthy responses.
Fix: Audit what consumed context. Use
npm install --silent, redirect verbose output to files, use
--files-with-matches for grep instead of full content, and
read targeted line ranges instead of entire files.
Problem: Claude forgets decisions despite low context.
Even at 40% context, Claude may not attend equally to all content. Critical decisions buried 50 messages deep can fade.
Fix: Re-state critical decisions when starting major features. Create "context anchors" at the top of your conversation. Use a PROJECT_CONTEXT.md file. Run /compact with focus hints that emphasize your key decisions.
Problem: /compact loses important information.
Sometimes compaction drops details you still need.
Fix: Provide explicit focus hints when compacting:
/compact Focus: payment integration details, Stripe API keys location, webhook endpoint structure.
After compacting, verify Claude still knows the critical facts by asking
a quick question.
Problem: Summaries miss key details during conversation transitions.
Your handoff summary was too brief or too vague, and the new conversation lacks critical context.
Fix: Use a structured summary template:
Tech stack and versions:
Architectural decisions:
Key file locations and purposes:
Code conventions:
Known issues and TODOs:
Current objective:
Problem: Context management feels like overhead.
You are spending more time thinking about context than coding.
Fix: Use simple rules, not complex strategies:
- Context over 80%? Start new conversation.
- Switching features? Start new conversation.
- Over 100 messages? Consider starting fresh.
- Otherwise, keep working.
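The simple rules above fit in a few lines of code. A minimal sketch, with the thresholds taken directly from this chapter:

```python
def next_step(context_pct: float, switching_feature: bool, messages: int) -> str:
    """Apply the chapter's simple rules in priority order:
    over 80% context or a feature switch means a fresh start;
    very long conversations are worth reconsidering; otherwise continue."""
    if context_pct > 80 or switching_feature:
        return "start new conversation"
    if messages > 100:
        return "consider starting fresh"
    return "keep working"

print(next_step(85, False, 40))   # start new conversation
print(next_step(50, False, 120))  # consider starting fresh
print(next_step(50, False, 30))   # keep working
```

The point is not to automate the decision but to show how little judgment it actually requires day to day.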
Run /compact when you notice slowness. Maintain
PROJECT_CONTEXT.md once and update it in 2 minutes at the end of each
session. With practice, context management becomes automatic.
Chapter Checkpoint
Five-Bullet Summary
- Context is finite but generous. You have 200,000 tokens: enough for a full feature, but not infinite. Strategic management keeps you productive.
- Search before reading. Use grep to locate code, then read only the relevant lines. This can reduce context usage by 80% or more.
- One task per conversation. Focused conversations perform better. Summarize and start fresh when switching features.
- Use /compact and agents. The /compact command compresses history. Agents explore in their own context space, keeping your main conversation lean.
- Maintain a project reference document. A living PROJECT_CONTEXT.md replaces dozens of file reads and makes every conversation transition smooth.
Competency Checklist
After completing this chapter, you should be able to:
Next: Chapter 10 β Test-Driven Development with AI
PROMPT TO PRODUCTION