Token limits represent one of the most significant but poorly understood constraints in AI-assisted legal work. Whether you're using Claude, ChatGPT, or a specialized legal AI platform like Harvey or Legora, token limits determine how much information the model can process in a single conversation. Once exceeded, they force you to start fresh, potentially losing critical context mid-project.
This post examines what tokens actually are, why they matter for your practice, and how to work within their constraints.
Tokens Are the Building Blocks of AI Communication
Think of tokens as the vocabulary units that AI models use to process language, though they're not quite words. A token can be a complete word ("litigation"), part of a word ("un" + "precedented"), punctuation marks, or even whitespace. AI models don't read text the way humans do. They convert everything into numerical representations, and tokens serve as the bridge between human language and machine processing.
When you interact with an AI:
- You type: "Motion for summary judgment"
- The system tokenizes this into roughly five tokens, e.g., ["Motion", " for", " summary", " judg", "ment"] (the exact split varies by tokenizer)
- Each token gets converted to a unique numerical ID
- The AI processes these number sequences
- Output tokens are generated and converted back to readable text
A useful rule of thumb for English text: 1 token ≈ 4 characters, or roughly 75% of a word. So a short phrase like "artificial intelligence" uses only a handful of tokens, while a typical page of legal text (around 500 words) consumes approximately 650-700 tokens.
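If you want to sanity-check these estimates, tokenizer libraries make it easy. Here's a minimal sketch using OpenAI's open-source tiktoken library; Claude and other models use their own tokenizers, so treat the counts as ballpark figures rather than platform-exact numbers:

```python
# Rough token counting with OpenAI's open-source tiktoken library.
# Claude uses a different, non-public tokenizer, so these counts
# are ballpark figures, not exact numbers for every platform.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding

for text in ["Motion for summary judgment", "artificial intelligence"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]  # show how the text was split
    print(f"{text!r}: {len(ids)} tokens -> {pieces}")
```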
The Scarcity Problem: Context Windows and Token Limits
Every AI model operates within a "context window," essentially the model's working memory for a conversation. Claude holds up well even at the upper end of that range, reasoning effectively over a nearly full context window, and Claude Sonnet 4.5 accepts up to 200,000 input tokens. That sounds like a lot (roughly 150,000 words, or about 500 pages), but in litigation practice you can burn through it surprisingly quickly.
The context window includes everything:
- Your initial prompt or question
- Any documents you upload or paste
- The AI's responses
- The entire conversation history up to that point
- Any system instructions or custom settings
Once you max out the context window, you're forced to start a new conversation, potentially losing critical context and continuity. Imagine building a complex motion to compel over several iterations, only to hit the token limit just when you need to add that final section incorporating the judge's recent ruling. You'll need to start fresh in a new window, re-uploading documents and re-establishing context, a frustrating and time-consuming process.
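You can get ahead of that ceiling by tracking roughly how full the window is as you work. Here's a minimal sketch using the 4-characters-per-token heuristic from above; real platforms report exact counts, and the 200,000 figure is Claude Sonnet 4.5's input limit:

```python
# Track approximate context usage across a conversation using the
# 1 token ~= 4 characters heuristic. Everything counts: prompts,
# uploaded documents, AI responses, and the prior history.

CONTEXT_WINDOW = 200_000  # Claude Sonnet 4.5 input token limit

history: list[str] = []

def add_turn(text: str) -> None:
    history.append(text)
    used = sum(len(t) // 4 for t in history)  # rough token estimate
    pct = 100 * used / CONTEXT_WINDOW
    print(f"~{used:,} tokens used ({pct:.1f}% of window)")
    if pct > 80:
        print("Approaching the limit -- request a handoff summary now.")

add_turn("User: Summarize the key claims in the attached complaint.")
add_turn("AI: The complaint asserts three claims: breach of contract...")
```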
Token Economics in Legal AI Platforms
Even if your firm uses Harvey or Legora, token limitations still matter. These platforms run on the same underlying LLMs (GPT-4, Claude, etc.) that power ChatGPT and Claude.ai, which means they inherit the same fundamental token constraints. The specialized legal interface and features don't eliminate the context window problem.
When you upload a 200-page merger agreement to Harvey for analysis, or feed multiple deposition transcripts into Legora's review system, you're still consuming tokens from the same limited pool. The platforms may handle token management behind the scenes, implementing sophisticated rate limiting and load balancing, but the underlying constraint remains: each conversation or workflow has a maximum capacity before you need to start fresh.
This reality becomes particularly apparent during complex projects. An associate using Harvey to analyze hundreds of contracts for a due diligence project will still hit token walls, just as they would using ChatGPT directly. The platform might handle the overflow more gracefully, perhaps by automatically chunking documents or creating new sessions, but the limitation affects workflow efficiency and continuity. Similarly, Legora users drafting lengthy agreements with multiple rounds of revision will encounter the same context limitations that exist in the base models, requiring strategic planning about how to structure their work within these constraints.
Token management becomes even more critical because these platforms often handle multiple concurrent users and complex workflows. Running unnecessary document reviews, requesting redundant analyses, or maintaining overly verbose prompts doesn't just waste tokens; it can slow down responses for your entire team during peak usage periods. Different features also consume tokens differently: research functions that query vast databases, document review tools that process hundreds of agreements simultaneously, and workflow automation features all have different token profiles. Understanding these patterns helps you choose the right tool for each task and avoid bottlenecks that affect the entire team.
Litigation Tasks That Devour Tokens
Understanding which tasks consume the most tokens helps you plan your AI workflow more effectively. The following litigation activities are particularly token-intensive:
Document Review and Analysis
- Uploading a 100-page deposition transcript: ~35,000-40,000 tokens
- Analyzing multiple contracts for inconsistencies: 5,000-10,000 tokens per contract
- Reviewing discovery documents for privilege: Token cost scales with document volume
Brief Writing and Motion Practice
- Initial motion draft with supporting memorandum: 3,000-5,000 tokens
- Multiple rounds of revision and refinement: 2,000-3,000 tokens per iteration
- Incorporating case law citations and analysis: 1,000-2,000 tokens per major case discussed
Legal Research and Analysis
- Complex multi-jurisdictional research queries: 500-1,000 tokens per query
- Synthesizing research results: 2,000-4,000 tokens
- Creating case law summaries: 1,500-2,500 tokens per case
Deposition and Trial Preparation
- Generating deposition outline from case documents: 10,000-15,000 tokens
- Creating examination questions: 2,000-3,000 tokens per witness
- Analyzing opposing counsel's filings for weaknesses: 5,000-8,000 tokens
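To see how these costs stack up against a 200,000-token window, here's a minimal planning sketch. The per-task figures are midpoints of the illustrative ranges above, not measurements from any particular platform:

```python
# Plan a session against the context window using rough per-task
# estimates (midpoints of the illustrative ranges in this post).

CONTEXT_WINDOW = 200_000

TASK_COSTS = {
    "deposition_transcript_100pp": 37_500,
    "motion_draft_with_memo": 4_000,
    "revision_round": 2_500,
    "case_law_discussion": 1_500,
}

def plan_session(tasks: list[str]) -> int:
    """Print estimated usage and return remaining headroom."""
    used = sum(TASK_COSTS[t] for t in tasks)
    print(f"Estimated usage: {used:,} / {CONTEXT_WINDOW:,} tokens")
    return CONTEXT_WINDOW - used

headroom = plan_session([
    "deposition_transcript_100pp",
    "motion_draft_with_memo",
    "revision_round",
    "revision_round",
    "case_law_discussion",
])
print(f"Headroom: {headroom:,} tokens")
```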
Best Practices for Token Conservation
Working efficiently within token constraints requires strategic thinking about how you structure your AI interactions. The following approaches have proven effective:
Front-load Important Context
Start conversations with essential background information in a concise format. Instead of uploading entire case files, create a "case summary" document that captures key facts, parties, claims, and procedural history in 1-2 pages.
Use Hierarchical Information Structure
Begin with high-level requests, then drill down into specifics. Rather than asking the AI to analyze an entire contract at once, focus on specific sections or issues sequentially.
Implement a "Save and Resume" Strategy
Before hitting token limits, ask the AI to create a comprehensive summary of work completed, key findings, and next steps. Save this summary along with any generated drafts, then use it to efficiently restart in a new conversation if needed.
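A reusable handoff prompt makes this repeatable. One possible template, shown here as a Python string you might keep in a snippets file; the four-part structure is a suggestion, not a platform feature:

```python
# A reusable "save and resume" handoff prompt. The four-part
# structure is a suggested template, not a platform feature.
HANDOFF_PROMPT = """\
Before we run out of context, produce a handoff summary with:
1. Work completed so far (drafted sections, with current text)
2. Key findings and decisions, citing the documents relied on
3. Preferred terminology, citation format, and style choices
4. Open questions and concrete next steps
Format it so I can paste it at the top of a new conversation."""

print(HANDOFF_PROMPT)
```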
Batch Similar Tasks
Group related questions or document reviews together. Instead of starting separate conversations for each deposition transcript review, process them in sequence within a single session.
Clean Your Input Data
Remove unnecessary formatting, headers, footers, and redundant information from documents before uploading. Convert PDFs to clean text when possible to avoid token waste on formatting artifacts.
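Here's a minimal cleaning sketch; the specific patterns (repeated page footers, Bates stamps) are illustrative assumptions, so tailor them to your own documents:

```python
# Strip common formatting artifacts from extracted PDF text before
# upload. The footer and Bates-stamp patterns are illustrative
# assumptions; adjust them for your own documents.
import re

def clean_for_upload(text: str) -> str:
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines
        if re.fullmatch(r"Page \d+ of \d+", stripped):
            continue  # drop repeated page footers
        if re.fullmatch(r"[A-Z]{2,}-\d{6,}", stripped):
            continue  # drop Bates-number lines
        kept.append(stripped)
    return "\n".join(kept)

raw = "Page 3 of 212\nABC-000417\nThe parties agree that...\n"
print(clean_for_upload(raw))  # -> "The parties agree that..."
```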
Working Around Token Limitations
When you do hit the context window ceiling, these strategies help maintain productivity:
Strategic Conversation Splitting
Break complex projects into logical phases. For example, split a summary judgment motion into: (1) factual background development, (2) legal standard articulation, (3) argument construction, and (4) final assembly and polishing.
Create Checkpoint Documents
Generate interim work products that capture essential outputs (drafted sections, research findings, strategic insights) that can be referenced in subsequent conversations.
Use External Memory Systems
Maintain a separate document tracking key decisions, preferred language, and important context that can be quickly fed into new conversations.
Leverage Platform-Specific Features
Harvey's systems use a variety of AI models, and each request carries a different computational load depending on the size of the request (prompt tokens) and of the response (completion tokens). Understanding how your specific platform allocates tokens can help you optimize usage.
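If your team also works with the underlying APIs directly, you can see that prompt/completion split on every request. Here's a sketch using the Anthropic Python SDK, with the usage fields as exposed by that SDK (requires an API key):

```python
# Read prompt vs. completion token counts from a single request
# via the Anthropic Python SDK (requires ANTHROPIC_API_KEY set).
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",  # current Sonnet alias at time of writing
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize Rule 56 in two sentences."}],
)

# input_tokens covers everything sent (prompt, documents, history);
# output_tokens covers only the generated reply.
print(f"Prompt tokens:     {response.usage.input_tokens}")
print(f"Completion tokens: {response.usage.output_tokens}")
```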
The Upshot
Token limitations aren't going away anytime soon; they're a fundamental constraint of current AI technology. But they're also not insurmountable. By understanding how tokens work and implementing smart usage strategies, you can maximize the value you get from AI tools while minimizing frustration and wasted effort.
The most successful AI adopters in Biglaw aren't necessarily those with unlimited budgets for AI tools. They're the ones who understand the technology's constraints and work intelligently within them. Whether you're using general-purpose AI assistants or specialized platforms like Harvey and Legora, mastering token economics is becoming as important as mastering Westlaw searches was a generation ago.
Every token counts, but with the right approach, you have more than enough to transform how you practice law. The key is working with the technology's grain, not against it.