Developers using AI complete 21% more tasks and merge 98% more pull requests. Yet study after study finds no measurable gain at the organizational level. If you're a CTO or engineering leader looking at your sprint velocity and wondering why AI hasn't moved the needle, you're not alone and you're not imagining it.
This isn't a tooling problem. It's a context problem.
Key Takeaways
- Individual AI productivity gains are real: 21% more tasks, 98% more PRs merged.
- Team-level outcomes worsen: PR review time up 91%, PR size up 154%, bugs per developer up 9%.
- 75% of engineers use AI tools, yet most organizations report no measurable performance gains.
- The root cause: AI planning tools read tickets. Tickets are outputs, not inputs. Without a structured context layer, AI generates plans that teams don't trust and don't use.

The AI Productivity Paradox Is Real
Researchers at Faros.ai tracked more than 10,000 developers across 1,255 teams and found that AI coding assistants drive a genuine jump in individual output: 21% more tasks completed, 98% more pull requests merged (Faros.ai, July 2025). And yet, at the organizational level, productivity gains are nowhere to be found. That's not a rounding error. That's a structural failure.
This isn't a niche finding from a small sample. Larridin's Developer Productivity Benchmarks found that 75% of engineers use AI tools, yet most organizations see no measurable performance gains (Larridin, 2026). So we have widespread adoption and almost no measurable systemic impact. Something is fundamentally broken.
The tempting explanation is that the AI models aren't good enough yet. That's wrong. The models are fine. The problem is what you feed them.
What's Actually Driving the AI Productivity Gap?
The Faros.ai data tells a more uncomfortable story when you look past the headline numbers. Yes, individual developers ship more. But PR review time jumps 91%, average PR size inflates 154%, and bugs per developer increase 9% (Faros.ai, July 2025). Individual throughput goes up. The downstream system gets worse.
It only looks like a paradox. It's a congestion problem.
When a developer ships twice as fast, the reviewer's queue doubles. Larger PRs take longer to review and introduce more defect surface area. The bottleneck moves. It doesn't disappear. AI accelerates one node in the system without coordinating the rest of it.
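The congestion effect can be sketched with a toy queue model. The daily PR counts and review capacity below are illustrative assumptions, not figures from the Faros.ai data. The point is only that when PRs arrive faster than a fixed review capacity can clear them, the backlog grows every day.

```python
def review_backlog(prs_opened_per_day: float, reviews_per_day: float, days: int) -> list[float]:
    """Return the end-of-day review backlog for each day in a toy queue model."""
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog += prs_opened_per_day             # new PRs arrive
        backlog -= min(backlog, reviews_per_day)  # reviewers clear what they can
        history.append(backlog)
    return history


# Hypothetical team: review capacity stays at 10 PRs per day in both cases.
before_ai = review_backlog(prs_opened_per_day=10, reviews_per_day=10, days=5)
after_ai = review_backlog(prs_opened_per_day=20, reviews_per_day=10, days=5)

print(before_ai)  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(after_ai)   # [10.0, 20.0, 30.0, 40.0, 50.0]
```

The model ignores PR size, which the text suggests makes things worse: larger PRs would also lower the number of reviews completed per day.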
AI planning tools were supposed to fix this at the system level. They haven't. Most of them are wrappers around whatever inputs you already have, which brings us to the actual problem.
Why Don't Standard AI Planning Tools Fix the Gap?
Here's a blunt observation: most AI planning tools are glorified summarizers. They take your Jira board, a Confluence doc or two, and your last standup notes, feed them to a language model, and return something that looks like a sprint plan. The output is plausible. It is also contextless.
Only 29% of developers trust AI-generated output (Stack Overflow Developer Survey 2025). That number isn't surprising once you understand what those tools are actually reading.
Jira tickets are status records. They tell you a task exists and roughly what it is. They don't explain why the task was created. They don't capture what customer problem it solves, what technical constraints shaped the estimate, or what happened in the last three sprints that makes this task risky. An AI reading those tickets is guessing. Teams can tell. That's why they ignore the output.
What Is the Context Problem in AI Planning?
So what does AI actually need to plan a sprint well? Engineers spend 25% of their workweek searching for information (Atlassian State of Teams, April 2026). That's the cost of fragmented context. AI planning tools inherit that fragmentation and then try to reason through it.
Context, for a planning AI, is not a ticket title and a story point estimate. It is:
- The product brief that originated the epic and the customer interview that validated it.
- The PR that partially addressed the problem last sprint and the review comment that flagged it as incomplete.
- The retrospective note that named this feature area as a recurring blocker.
- The team velocity over the last six sprints and the capacity constraint for the next one.
- The architectural decision record that limits which approach is even viable.
No AI planning tool on the market today ingests all of this. They ingest tickets. Tickets are the output of a planning process, not the input to one. When you hand an AI the outputs and ask it to reconstruct the inputs, you get hallucinated rationale dressed up as a plan.
What Does a Context Layer Actually Change?
This is where the concept of a Context Lake becomes concrete rather than abstract. A Context Lake is a centralized, structured repository of all software knowledge: decisions, customer signals, retrospectives, velocity history, architecture constraints, and linked work items. It's not a document store. It's a graph. Relationships matter as much as content.
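A minimal sketch of the "graph, not document store" idea, assuming a plain in-memory structure. The class name, node IDs, node kinds, and relation names are all hypothetical and chosen for illustration; a real context layer would need persistence, permissions, and ingestion from source systems.

```python
from collections import deque


class ContextGraph:
    """Toy in-memory graph: nodes carry content, edges carry a relation name."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, str]] = {}
        self.edges: dict[str, list[tuple[str, str]]] = {}

    def add_node(self, node_id: str, kind: str, text: str) -> None:
        self.nodes[node_id] = {"kind": kind, "text": text}

    def link(self, source: str, relation: str, target: str) -> None:
        self.edges.setdefault(source, []).append((relation, target))

    def context_for(self, start: str, max_hops: int = 2) -> list[str]:
        """Breadth-first walk returning node IDs reachable within max_hops of start."""
        seen = {start}
        queue = deque([(start, 0)])
        reached = []
        while queue:
            node_id, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand past the hop limit
            for _relation, target in self.edges.get(node_id, []):
                if target not in seen:
                    seen.add(target)
                    reached.append(target)
                    queue.append((target, hops + 1))
        return reached


# Hypothetical records for illustration only.
graph = ContextGraph()
graph.add_node("TICKET-1", "ticket", "Add retry to payment webhook")
graph.add_node("ADR-7", "decision", "Webhooks must be idempotent")
graph.add_node("RETRO-3", "retrospective", "Payments named as a recurring blocker")
graph.add_node("INTERVIEW-2", "customer_signal", "Customer reported duplicate charges")
graph.link("TICKET-1", "constrained_by", "ADR-7")
graph.link("TICKET-1", "flagged_in", "RETRO-3")
graph.link("ADR-7", "motivated_by", "INTERVIEW-2")

print(graph.context_for("TICKET-1"))  # ['ADR-7', 'RETRO-3', 'INTERVIEW-2']
```

Starting from a single ticket, the walk surfaces the decision that constrains it, the retrospective that flagged it, and the customer signal behind the decision. A flat document store holding the same four records would not expose those relationships.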
Google's DORA research offers a warning worth taking seriously. A 25% increase in AI adoption correlated with a 1.5% decrease in throughput and a 7.2% decrease in stability in teams with dysfunctional processes (Google DORA 2025, approximately 5,000 respondents). AI doesn't fix a broken process. It amplifies it.
The same principle works in reverse. When AI has access to a structured context layer, it stops guessing. It can generate sprint plans grounded in actual decisions, flagged risks from prior retrospectives, and capacity constraints from real velocity data. The output is defensible. Teams can interrogate it. That's a fundamentally different tool.
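As one illustration of "capacity constraints from real velocity data," the sketch below averages recent sprint velocities and scales the result by next sprint's availability. The formula, function names, and numbers are assumptions made for this example, not a method prescribed by any of the cited reports.

```python
from statistics import mean


def planned_capacity(recent_velocities: list[float], availability: float) -> float:
    """Estimate next-sprint capacity in story points.

    recent_velocities: points completed in each recent sprint.
    availability: fraction of normal team capacity available next sprint (0.0 to 1.0).
    """
    if not recent_velocities:
        raise ValueError("need at least one past sprint")
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return mean(recent_velocities) * availability


def select_items(estimates: dict[str, float], capacity: float) -> list[str]:
    """Walk items in priority order and keep each one that still fits the capacity."""
    selected: list[str] = []
    used = 0.0
    for item, points in estimates.items():
        if used + points <= capacity:
            selected.append(item)
            used += points
    return selected


# Hypothetical six-sprint history with a quarter of the team out next sprint.
capacity = planned_capacity([30, 34, 28, 32, 30, 26], availability=0.75)
print(capacity)  # 22.5
print(select_items({"A": 8, "B": 13, "C": 5, "D": 3}, capacity))  # ['A', 'B']
```

Because the inputs are explicit, a team can interrogate the result: change the availability or drop an outlier sprint and see how the plan shifts.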
83% of Agile practitioners use AI tools, yet 55% spend 10% or less of their work time with AI (AI4Agile Practitioners Report, Scrum.org, February 2026, n=289). That's not adoption reluctance. That's the market telling you the current tools don't fit into actual planning workflows. A context-aware AI fits differently, because the output connects to work people already understand.
The Teams Getting It Right
Elite engineering teams see 2.5 to 3.5x ROI on AI coding tools compared to the median (Larridin Developer Productivity Benchmarks, 2026). What separates them from the majority of organizations seeing no measurable gain?
They don't treat AI as an individual productivity tool bolted onto existing workflows. They treat it as a system-level investment that requires system-level inputs. In practice, this means:
- Written architectural decisions with explicit rationale, not just outcomes.
- Tagged retrospectives that link recurring issues to specific work items and technical areas.
- Customer feedback connected to product decisions, not siloed in a CRM.
- Sprint reviews that capture what was descoped and why, not just what shipped.
These teams are building context lakes without calling them that. The structure is there. The linkage is there. The AI they run on top of it produces output that reflects institutional knowledge, and teams use it.
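The practices above amount to keeping records with explicit links. A small sketch, assuming a hypothetical `RetroNote` record with an area tag and linked work item IDs, shows how tagged retrospectives make recurring issues queryable rather than buried in meeting notes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RetroNote:
    sprint: int
    area: str                      # technical area tag, e.g. "payments"
    work_items: tuple[str, ...]    # IDs of linked work items
    summary: str


def recurring_areas(notes: list[RetroNote], min_sprints: int = 2) -> list[str]:
    """Return areas flagged in at least min_sprints distinct sprints, most frequent first."""
    sprints_by_area: dict[str, set[int]] = {}
    for note in notes:
        sprints_by_area.setdefault(note.area, set()).add(note.sprint)
    counts = Counter({area: len(sprints) for area, sprints in sprints_by_area.items()})
    return [area for area, n in counts.most_common() if n >= min_sprints]


# Hypothetical retrospective notes for illustration only.
notes = [
    RetroNote(12, "payments", ("PAY-101",), "Webhook retries caused rework"),
    RetroNote(13, "payments", ("PAY-117",), "Estimate missed due to idempotency work"),
    RetroNote(13, "search", ("SRCH-9",), "Index rebuild slowed the release"),
    RetroNote(14, "payments", ("PAY-123",), "Review blocked on missing test data"),
]

print(recurring_areas(notes))  # ['payments']
```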
Only 15% of Agile practitioners have received formal AI training (AI4Agile Practitioners Report, Scrum.org, February 2026). The gap isn't willingness to adopt. It's organizational infrastructure. Elite teams built the infrastructure first and the AI adoption followed naturally.
What's the One Question to Ask Your AI Planning Tool?
Before you adopt any AI planning tool, or before you accept that the one you have is working, ask it a single question: What does it read before it generates a plan?
If the answer is "your Jira board" or "your Confluence space," you know what you're getting. A faster version of a bad process. The model will be fluent, the output will look structured, and your team will ignore it within two sprints because it doesn't reflect how decisions actually get made.
If the answer is "your full software knowledge graph: decisions, customer signals, retrospectives, velocity, architecture constraints, and linked work items," that's a different conversation. That's a tool that might actually change what your team does on Monday morning, not just how fast it generates a ticket.
The AI productivity paradox resolves the moment the AI has the same context your best engineer carries in their head. Right now, no planning tool gives it that. The ones that figure it out first will make the rest of the category look like spreadsheets.
Frequently Asked Questions
Why do individual developers get more productive with AI while teams don't?
AI accelerates one node in the system. Individual developers ship more code and merge more PRs. But PR review time increases 91% and PR size inflates 154% (Faros.ai, July 2025), creating downstream congestion. The bottleneck moves. Without AI coordinating the entire system, individual speed creates team-level friction.
What makes a context-aware planning tool different from a standard AI planning tool?
Most AI planning tools read Jira tickets and summarize them. A context-aware tool reads the full knowledge graph: product briefs, customer interviews, retrospectives, architectural decisions, and velocity history. Only 29% of developers trust AI-generated output (Stack Overflow Developer Survey 2025). Trust should improve when output reflects real decisions, not ticket descriptions.
Does more AI adoption always improve engineering outcomes?
No. Google DORA found that a 25% increase in AI adoption correlated with a 1.5% decrease in throughput and 7.2% decrease in stability in teams with dysfunctional processes (Google DORA 2025). AI amplifies the process that already exists. If that process lacks structured context, AI-generated plans will too.
What is a Context Lake and how does it fix the problem?
A Context Lake is a centralized repository of all software knowledge — decisions, customer signals, retrospectives, velocity history, and architecture constraints — structured as a graph so relationships between pieces of information are preserved. When an AI reads a Context Lake instead of a ticket board, it can generate plans grounded in actual institutional knowledge.
How do elite teams achieve 2.5 to 3.5x ROI on AI tools when most organizations see no measurable gain?
They invest in structured context before they invest in AI tooling. Written ADRs, tagged retrospectives, linked customer feedback, and documented sprint decisions give AI the inputs it needs to generate trustworthy output. Elite teams are building context infrastructure deliberately. Most teams skip it and then blame the model when the output isn't useful.

