Context Engineering: Feeding a nutritious diet to your LLM
By Amey Gujre and Sakshi Ray
Dec 24, 2025
When Andrej Karpathy, one of the leading voices in AI, endorsed “context engineering” over “prompt engineering,” it marked a pivotal shift in how we think about building intelligent systems. The message was clear: what you feed into your model matters as much as the model itself.
From prompt engineering to context engineering
Prompt engineering was the first step - teaching us to phrase questions with precision. But as AI evolves into agentic systems that plan, reason, and execute tasks autonomously, we’ve entered a new discipline. Context engineering isn’t about clever phrasing anymore - it’s about designing the entire environment an LLM operates in.
Filling the context window involves strategically loading the model’s temporary workspace with complete, relevant, and well-structured information. Since every token matters and each instruction vies for attention, the way this space is organized directly influences whether your agent succeeds or struggles.
Understanding context: Beyond the prompt
Context isn't just the prompt you send; it's everything the model perceives beforehand, forming a comprehensive information structure.
The system prompt's instructions set the scene for the agent, describing what sort of tasks we want it to perform.
The user input can be anything from a question to a request for a task to be completed.
Short-term memory provides the LLM context about the ongoing chat.
Long-term memory can store and retrieve both long-term chat history and other relevant information.
Information retrieved from a knowledge base is often vector-database retrieval, but it can also include relevant information pulled from external knowledge bases via API calls, MCP tools, or other sources.
Tools and their definitions provide additional context to the LLM as to what tools it has access to.
Responses from tools feed the results of tool runs back to the LLM as additional context to work with.
Structured output defines response format specifications, such as JSON schemas.
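Putting these components together, a context-assembly step might look like the sketch below. This is a minimal illustration, not any particular SDK's API; the message roles, helper names, and request shape are all assumptions.

```python
# A minimal sketch of assembling the context components listed above
# into one chat-style request. All names here are illustrative.

def build_context(system_prompt, user_input, chat_history,
                  retrieved_docs, tool_definitions, response_schema):
    """Combine the layers of context into a single message list."""
    messages = [{"role": "system", "content": system_prompt}]

    # Short-term memory: the ongoing conversation, oldest first.
    messages.extend(chat_history)

    # Retrieved knowledge: facts pulled from a knowledge base.
    if retrieved_docs:
        knowledge = "\n".join(f"- {doc}" for doc in retrieved_docs)
        messages.append({"role": "system",
                         "content": f"Relevant knowledge:\n{knowledge}"})

    # The current user request goes last so it gets the most attention.
    messages.append({"role": "user", "content": user_input})

    # Tools and the output format travel alongside the messages.
    return {"messages": messages,
            "tools": tool_definitions,
            "response_format": response_schema}


request = build_context(
    system_prompt="You are a concise travel assistant.",
    user_input="Find me a flight to Lisbon next Friday.",
    chat_history=[{"role": "user", "content": "I prefer morning flights."},
                  {"role": "assistant", "content": "Noted - mornings it is."}],
    retrieved_docs=["Loyalty tier: gold", "Home airport: JFK"],
    tool_definitions=[{"name": "search_flights",
                       "description": "Search flights by route and date."}],
    response_schema={"type": "json_object"},
)
print(request["messages"][0])
```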
The five-layer context architecture
Foundational instructions (System identity)
This is the agent’s core blueprint defining its role, tone, behavior, and boundaries. It acts as the permanent foundation that governs every response, ensuring consistency and alignment with its intended persona and purpose.
Retrieved Knowledge (RAG layer)
This is the engine of truth and factual grounding. Using Retrieval-Augmented Generation, the system enhances LLM responses with real-time or proprietary data from external sources. The process involves:
Retrieve: Turn user queries into search vectors
Search: Look up relevant entries in vector databases or document stores
Augment: Inject the retrieved content into the prompt so the model can ground its answer
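A minimal sketch of this loop, using a bag-of-words vector as a stand-in for learned embeddings and an in-memory list as a stand-in for a vector database:

```python
# Toy retrieve-and-search loop. Real systems use learned embeddings
# and a vector database; a sparse bag-of-words vector stands in here
# so the sketch runs with the standard library alone.
from collections import Counter
import math

def embed(text):
    """Stand-in embedder: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

documents = [
    "Refunds are processed within 5 business days.",
    "Premium users get priority support via chat.",
    "Passwords must be reset every 90 days.",
]
index = [(doc, embed(doc)) for doc in documents]   # Search: pre-built store

query_vec = embed("how long do refunds take")      # Retrieve: query -> vector
top = max(index, key=lambda item: cosine(query_vec, item[1]))

# Augment: the best match is injected into the prompt before generation.
print(f"Context for the LLM: {top[0]}")
```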
Dynamic inputs (Real-time context)
This layer captures what's happening in the moment, reflecting the user's current state and environment. It consists of inputs generated on the fly. The context isn't static; it adapts to the user's evolving preferences. Each user request may require different information, so providing everything at once can slow down the system.
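As a rough sketch, a dynamic-input layer might render only the real-time fields the current request needs; the field names and user_state structure below are hypothetical:

```python
# A small sketch of on-the-fly context: only the fields the current
# request needs are injected, rather than everything at once.
from datetime import datetime, timezone

def dynamic_context(user_state, needed_fields):
    """Render just the real-time fields relevant to this request."""
    snapshot = {"time_utc": datetime.now(timezone.utc).isoformat(),
                **user_state}
    lines = [f"{k}: {snapshot[k]}" for k in needed_fields if k in snapshot]
    return "Current context:\n" + "\n".join(lines)

state = {"location": "Berlin", "device": "mobile", "cart_items": 3}
# A checkout question needs cart state; a weather question would not.
print(dynamic_context(state, ["time_utc", "location", "cart_items"]))
```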
Memory (Short-term and long-term)
Short-term memory: Summarizes or stores recent interactions (such as the last few user turns) to maintain continuity.
Long-term memory: Retains persistent user data such as goals, habits, or knowledge graphs to recall over time. By compressing and prioritizing essential details, the system balances continuity with efficiency.
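One way this might look in code, assuming a summarize() stub in place of a real LLM summarization call; the class and thresholds are illustrative:

```python
# A sketch of the two memory tiers: a rolling short-term buffer that is
# summarized when it grows too long, and a persistent long-term store.

class Memory:
    def __init__(self, max_turns=6):
        self.short_term = []          # recent (role, text) turns
        self.long_term = {}           # persistent facts about the user
        self.max_turns = max_turns

    def add_turn(self, role, text):
        self.short_term.append((role, text))
        if len(self.short_term) > self.max_turns:
            # Compress older turns into one salient summary line.
            keep = self.max_turns // 2
            old, self.short_term = (self.short_term[:-keep],
                                    self.short_term[-keep:])
            summary = summarize(old)
            self.short_term.insert(0, ("system", f"Summary so far: {summary}"))

    def remember(self, key, value):
        self.long_term[key] = value   # e.g. goals, habits, preferences

def summarize(turns):
    # Placeholder: a real system would ask the LLM to summarize.
    return f"{len(turns)} earlier turns about " + turns[0][1][:30] + "..."

mem = Memory(max_turns=4)
mem.remember("goal", "train for a marathon")
for i in range(6):
    mem.add_turn("user", f"message {i} about my training plan")
print(mem.short_term[0])
print(mem.long_term)
```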
Tool outputs (Action layer)
This layer transforms the LLM from a static responder into an active problem-solver.
When the model needs information it doesn’t have, it triggers external tools (APIs, Python scripts, databases). This is where the agent connects reasoning with real-world execution, bridging intelligence with action.
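A bare-bones sketch of that loop, with a hard-coded "model decision" and a pretend get_weather tool standing in for a real API:

```python
# Minimal action-layer sketch: the model requests a tool by name, a
# dispatcher runs it, and the result is appended back into the context.
import json

def get_weather(city):
    """Pretend API call; a real tool would hit an external service."""
    return {"city": city, "forecast": "sunny", "high_c": 24}

TOOLS = {"get_weather": get_weather}

# Imagine the LLM emitted this tool call because it lacked the data.
model_output = {"tool": "get_weather", "arguments": {"city": "Lisbon"}}

tool = TOOLS[model_output["tool"]]
result = tool(**model_output["arguments"])

# The tool's response becomes new context for the next model turn.
context_update = {"role": "tool", "content": json.dumps(result)}
print(context_update)
```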
Critical Context Problems
Context poisoning
Low-quality or hallucinated information seeps in and becomes accepted as truth, corrupting reasoning.
Solution: Validate and quarantine suspect context before saving it to long-term memory, isolating bad data in separate threads.
Context distraction
The model gets overwhelmed by irrelevant details, losing sight of its main goal.
Solution: Apply context summarization and compression to retain only salient details - a cleaner, lighter mental meal.
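For instance, a compression step might keep recent turns verbatim and collapse the rest into a summary once a token budget is exceeded. In the sketch below, word count is a crude stand-in for a real tokenizer and the summary is a placeholder for an LLM call:

```python
# Context compression sketch: older turns are collapsed into a short
# summary while recent turns are kept verbatim.

def count_tokens(text):
    return len(text.split())          # crude stand-in for a tokenizer

def compress(history, budget=50, keep_recent=3):
    total = sum(count_tokens(t) for t in history)
    if total <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Placeholder summary; a real system would ask the LLM for one.
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent

history = [f"turn {i}: " + "details " * 10 for i in range(8)]
for line in compress(history):
    print(line)
```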
Context confusion
Too many tools or facts create noise, leading to incorrect reasoning or tool misuse.
Solution: Manage the tool loadout dynamically, retrieving only relevant tool descriptions using RAG techniques.
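A toy version of such a dynamic loadout, using difflib's string similarity as a stand-in for an embedding-based retriever; the tool names and descriptions are hypothetical:

```python
# Dynamic tool loadout sketch: rank tool descriptions against the
# query and surface only the top matches, not every tool.
import difflib

TOOL_DESCRIPTIONS = {
    "search_flights": "Search for flights by route, date, and airline.",
    "book_hotel": "Reserve hotel rooms by city and date range.",
    "convert_currency": "Convert an amount between two currencies.",
    "get_weather": "Get the weather forecast for a city.",
}

def select_tools(query, k=2):
    """Return the k tools whose descriptions best match the query."""
    scored = sorted(
        TOOL_DESCRIPTIONS.items(),
        key=lambda item: difflib.SequenceMatcher(
            None, query.lower(), item[1].lower()).ratio(),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

# Only the relevant tools enter the context, reducing confusion.
print(select_tools("what will the weather be in Lisbon"))
```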
Context clash
Conflicting or outdated instructions paralyze reasoning.
Solution: Prune or offload older information as new inputs arrive, maintaining clarity of purpose through a “Scratchpad” workspace.
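One simple way to realize such a scratchpad is to key notes by topic, so a newer instruction overwrites the stale one instead of sitting next to it and contradicting it. A minimal sketch, with hypothetical topics:

```python
# Scratchpad sketch: each note is keyed by topic, so newer input
# replaces older, conflicting input rather than accumulating.

class Scratchpad:
    def __init__(self):
        self.notes = {}               # topic -> latest note

    def write(self, topic, note):
        self.notes[topic] = note      # newer input overwrites older

    def render(self):
        return "\n".join(f"{t}: {n}" for t, n in self.notes.items())

pad = Scratchpad()
pad.write("destination", "Book a flight to Paris")
pad.write("destination", "Changed my mind - make it Lisbon")  # prunes clash
pad.write("budget", "Keep it under $600")
print(pad.render())   # only the current, non-conflicting instructions
```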
The four pillars of context engineering
Write context: Save information outside the context window via scratchpads, external notes, or runtime states, preserving intermediate thoughts and plans across sessions.
Select context: Semantic search, embeddings, and knowledge graphs retrieve only the most relevant memories, documents, or tool descriptions per step.
Compress context: Hierarchical summarization and pruning heuristics condense large histories into token-efficient, salient representations.
Isolate context: Split complex tasks among specialized sub-agents with tailored contexts and tools, dramatically improving focus, modularity, and scalability.
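A skeletal illustration of the isolate pillar, with call_llm() as a placeholder for a real model call and two hypothetical sub-agents:

```python
# Context isolation sketch: each sub-agent gets only the slice of
# context and tools it needs, not the full shared history.

def call_llm(system, task):
    return f"[{system!r} handled: {task}]"   # placeholder response

SUB_AGENTS = {
    "research": {"system": "You research facts. Cite sources.",
                 "tools": ["web_search"]},
    "writing":  {"system": "You draft prose. No external tools.",
                 "tools": []},
}

def run_subtask(agent_name, task):
    agent = SUB_AGENTS[agent_name]
    # Each call carries a tailored system prompt, not the full history.
    return call_llm(agent["system"], task)

facts = run_subtask("research", "Find 2024 EV adoption rates")
draft = run_subtask("writing", f"Write a paragraph using: {facts}")
print(draft)
```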
Core challenges and techniques
Context engineering tackles two fundamental challenges: selecting the right context and fitting it within token limits.
Knowledge base selection: Modern agentic systems access multiple knowledge bases. Providing AI with context about available resources ensures accurate retrieval and prevents confusion.
Context ordering: With finite context windows, strategic ordering matters. Techniques like post-retrieval summarization maximize utility within constraints.
Structured information: Avoid overcrowding by providing only relevant context rather than exhaustive data dumps.
Workflow engineering: Map complex task workflows, control when to invoke AI versus standard tools, implement error handling and fallbacks, and optimize for specific outcomes.
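To make the workflow-engineering point concrete, here is a rough sketch: a deterministic router handles what it can, the LLM is invoked only when needed, and failures fall back to a safe default. All function names are illustrative, and the LLM failure is simulated.

```python
# Workflow sketch: deterministic tools first, LLM only when needed,
# errors caught with a safe fallback.

def rule_based_router(query):
    """Cheap deterministic path for queries we can handle directly."""
    if "order status" in query.lower():
        return "Your order shipped yesterday."
    return None                        # signal: needs the LLM

def call_llm(query):
    raise TimeoutError("model endpoint unavailable")  # simulated failure

def handle(query):
    answer = rule_based_router(query)
    if answer is not None:
        return answer                  # no LLM call needed
    try:
        return call_llm(query)
    except TimeoutError:
        return "Sorry, I can't answer right now - a human will follow up."

print(handle("What is my order status?"))    # deterministic path
print(handle("Explain your refund policy"))  # LLM path with fallback
```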
Why LLMs need context engineering
Large Language Models don’t fail because of the code or framework; they fail because of what they’re fed.
Finite context windows: Like computer RAM, context windows (even at hundreds of thousands of tokens) are limited workspaces. Exceeding capacity causes the oldest information to fall away.
Hallucination: LLMs have static, frozen knowledge. Without access to recent events or proprietary data, they confidently invent plausible-sounding but false answers, confusing the sound of truth with actual truth.
Instruction neglect: Overloading with simultaneous instructions dramatically degrades performance. The more constraints are stacked, the higher the cognitive load, and the more likely it is that critical rules are forgotten.
Context engineering isn’t just a technical optimization; it’s the backbone of intelligent AI systems. It transforms LLMs from passive responders into active, situational thinkers. In this new era, performance doesn’t depend on bigger models or fancier prompts. It depends on feeding them smarter, with cleaner, structured, and purpose-driven context. Because one truth now defines the future of AI systems:
Your LLM is what it eats.