The 11% Problem: Why the AI Agent Gap Is Becoming a Moat


Last week, Meta’s Director of AI Alignment told her OpenClaw agent to scan her inbox and suggest what to archive or delete. “Don’t action until I tell you to,” she wrote.

The agent nuked her inbox anyway.

Summer Yue — whose literal job is making sure AI systems follow human instructions — watched the agent “speedrun deleting” hundreds of emails while ignoring her commands to stop. She typed “STOP OPENCLAW.” It kept going. She had to sprint to her Mac Mini and kill the processes manually.

Her post-mortem? “Rookie mistake. Turns out alignment researchers aren’t immune to misalignment.”

This happened to one person, on one email account, running one agent. Meanwhile: $650 billion in AI infrastructure spending this year from Microsoft, Amazon, Google, and Meta alone. NVIDIA just posted $68.1 billion in quarterly revenue, up 73% year over year. Every enterprise vendor is selling “agentic AI.”

And yet, according to Deloitte’s 2025 Emerging Technology Trends study, only 11% of organizations have AI agents actually running in production.

The gap between what we’re spending on AI infrastructure and what we’re shipping in production keeps widening. It’s becoming a moat.


First, a Primer: What AI Agents Actually Are

A chatbot answers questions. A copilot sits alongside you while you work — suggesting code, drafting replies, summarizing docs. You’re still driving.

An agent receives a goal and goes after it. It breaks the goal into steps, picks the right tools (APIs, browsers, file systems, databases), and executes. Often without waiting for your approval on each step. Think less “ask for directions” and more “hand someone your car keys.”
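That goal-to-steps-to-tools loop can be sketched in a few lines. This is a hypothetical illustration, not OpenClaw’s or any vendor’s actual code; `call_llm` and the tool table are stand-ins:

```python
# Hypothetical sketch of an agent loop: goal in, autonomous actions out.
# `call_llm` and the tool table are stand-ins, not a real model API.

def call_llm(goal, history):
    """Placeholder for the model call that plans the next step."""
    return {"tool": "done", "args": {}}  # a real model plans dynamically

TOOLS = {
    "search_files": lambda args: f"results for {args}",
    "send_email":   lambda args: f"sent to {args.get('to')}",
}

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):
        step = call_llm(goal, history)  # the model picks the next tool
        if step["tool"] == "done":
            break
        # Executes immediately: no per-step human approval, which is
        # both the power and the risk.
        result = TOOLS[step["tool"]](step["args"])
        history.append((step["tool"], result))
    return history

print(run_agent("archive stale emails"))  # [] with the placeholder model
```

The thing to notice is what’s absent: there is no approval prompt between planning and execution.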

OpenClaw is the agent that made this click for millions of people. Built by Austrian developer Peter Steinberger (who just got hired by OpenAI), it’s open source, runs locally on your machine, and connects to WhatsApp, Telegram, Slack, or Discord. It reads and writes files, runs shell commands, sends emails, controls browsers, manages calendars, runs scheduled automations. It remembers things across sessions. And it has what Steinberger calls a “heartbeat” — it proactively checks in and acts on your behalf even when you’re not talking to it.

People call it “Claude with hands.” The large language models (Claude, GPT, Gemini, DeepSeek) provide the reasoning. OpenClaw adds the ability to act on that reasoning — to touch real systems, move real data, make real changes.

Which is also what makes agents so much harder to deploy than chatbots. A chatbot that hallucinates gives you a wrong answer. An agent that hallucinates takes a wrong action — on your production systems, with your customer data, at 3 AM.


The Adoption Funnel

Here’s where things stand as of early 2026:

Two-thirds of companies are experimenting. One in twenty has shipped anything that matters.

Meanwhile, 95% of organizations report their AI spending has not produced measurable business returns, per the MIT State of AI in Business 2025 report. S&P Global found that 42% of companies abandoned most of their AI initiatives in 2024 — up from 17% the prior year.

And Gartner predicts that over 40% of agentic AI projects will be canceled outright by the end of 2027.


The Bifurcation

Most companies are stuck in what McKinsey calls “pilot purgatory.” A small group is pulling away. Deloitte’s 2026 State of AI in the Enterprise report found that the share of companies with 40%+ of their AI projects in production is expected to double in the next six months. McKinsey shows high-performing organizations are 3x more likely to scale agents than their peers.

So the gap is compounding. One group is accumulating production experience — learning what breaks, what scales, what drives ROI — while everyone else restarts the same pilot for the third time.

Gartner projects that 40% of enterprise applications will embed AI agents by end of 2026, up from less than 5% in 2025. The companies building governance, integration infrastructure, and production reliability now will own those embedded workflows. Everyone else will be buying from them.

That advantage compounds quarterly.


Five Things Killing Agents Between Pilot and Production

1. Nobody actually knows how the process works

A study of 20 companies deploying AI agents, published this month, found that 14 of them were trying to automate processes that were never documented, never stable, and in some cases never understood — even by the people running them.

The CTO of a mid-sized logistics company put it this way, two months into a failed rollout: “We thought we were deploying software. We discovered we were holding up a mirror to ourselves. And we didn’t like what we saw.”

Most organizations have massive amounts of institutional knowledge that exists only in people’s heads. Agents can’t access any of it.

2. Agents only see a fraction of enterprise data

Roughly 10-20% of enterprise data is structured — clean ERP fields, CRM records, transaction logs. The other 80-90% lives in contracts, emails, PDFs, Slack threads, meeting notes, and policy documents.

Most agents only operate on the structured slice, which means they’re making decisions based on a fraction of available context. Context gaps are the root cause of most agent failures in enterprise settings, not model quality.

3. Integration is the actual product

The 20-company study found that the average organization ran 14 distinct software systems per workflow. Four had clean APIs. Three had APIs that technically existed but hadn’t been updated in years. The rest ran on manual entry, screen scraping, email, spreadsheet exports, or — in one case — a fax machine feeding into a scan-to-email workflow.

Each of those integration points can break an agent. And unlike a human who shrugs and improvises, an agent either fails hard or fails silently. The silent failures are worse.

The Summer Yue incident showed this in miniature. Her real inbox was too large and triggered context compaction — where the agent compresses its context window to keep running. During that compaction, it lost her original safety instruction. The “don’t take action” directive vanished. The agent kept executing with full confidence on an incomplete set of instructions.

That failure mode scales directly to enterprise deployments, across systems far more consequential than email.
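The failure shape is easy to reproduce in miniature. Here is a hypothetical sketch assuming a naive keep-the-most-recent compaction strategy; real agents summarize rather than truncate, but instructions at the start of a long history are at risk either way:

```python
# Sketch of why naive context compaction can drop a safety instruction.
# Hypothetical simplification: keep only recent messages under a budget.

def compact(messages, budget):
    """Keep the most recent messages that fit in the character budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        if used + len(msg) > budget:
            break  # oldest messages (often the instructions) go first
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))

# A directive up front, then a large inbox that fills the window.
history = ["SYSTEM: do not delete anything until I approve"]
history += [f"email {i}: ..." for i in range(500)]

compacted = compact(history, budget=200)
print(any("do not delete" in m for m in compacted))  # False: directive gone
```

After compaction the agent still has a coherent-looking context, so it keeps executing with full confidence on an incomplete instruction set.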

4. Humans were silently fixing agent errors all along

The 20-company study found that teams systematically underestimated edge cases by 3 to 8 times. Initial pilots succeeded because human reviewers were quietly catching and correcting agent mistakes. The moment those review layers were removed to hit cost targets, the agent’s actual error rate surfaced for the first time.

5. Governance doesn’t exist yet

Deloitte’s 2026 report found that only 21% of companies deploying agentic AI have a mature governance model. 42% are still developing one. 35% don’t have a formal strategy at all — no audit trails, no identity management, no compliance frameworks designed for autonomous systems. Agents that can’t pass a SOC2 review never make it out of the sandbox.

Cisco’s AI security research team tested a third-party OpenClaw skill and found it performed data exfiltration and prompt injection without user awareness. CrowdStrike published enterprise tools to detect and remove OpenClaw from corporate machines. Meta banned OpenClaw from internal workflows after the Yue incident.


What the 11% Are Doing Differently

The biggest differentiator, per McKinsey, is that the companies succeeding don’t bolt agents onto existing processes. They redesign the process around what agents are good at. Organizations that treat agents as “productivity add-ons” consistently fail to scale.

They also start narrow. The highest adoption areas right now are coding (nearly 90% of organizations use AI for development), data analysis and report generation (60%), and internal process automation (48%). Constrained, measurable, high-repetition tasks.

Governance matters more than most teams realize. Deloitte found that enterprises where senior leadership actively shapes AI governance — audit trails, identity management, kill switches — achieve significantly greater business value than those who delegate it to technical teams.

And they model the economics upfront. The emerging pattern in 2026 is treating agent cost optimization the way companies learned to treat cloud cost optimization in the microservices era. Inference costs and runaway API bills have killed more agent projects than bad models have.


Five Companies That Got It Right (And What They Actually Did)

Walmart: Surgical Agents, Not General-Purpose AI

Walmart’s CTO Hari Vasudev described the company’s approach as “surgical”: purpose-built agents trained on Walmart’s proprietary retail data, each doing one specific thing well. No general-purpose AI trying to be everything.

They have four “super agents” — composite systems made of multiple task-specific agents stitched together. Sparky handles customer shopping. Wibey serves developers (and they’re now building Wibey using Wibey). A merchant AI agent called “Wally” identifies root causes of inventory imbalances in real time — not just flagging shortages, but autonomously analyzing weather patterns, social media trends, and logistics bottlenecks to reroute stock before shelves go empty.

Results: their “self-healing inventory system” saved $55 million in waste during its 2025 rollout, primarily in perishables. A separate AI negotiation bot (built with startup Pactum) closes deals with 64-68% of long-tail suppliers it approaches, achieving 1.5-3% cost savings and extended payment terms — at a scale human buyers could never cover. Digital twins of store equipment cut emergency maintenance calls by 30% and repair costs by 20%. AI-directed shift planning went from 90 minutes to 30.

The key insight: Walmart’s agents aren’t replacing people at scale. They’re covering tasks that were economically impossible for humans to do at Walmart’s volume — negotiating with thousands of small suppliers, monitoring every refrigeration unit, rerouting inventory across 4,700 stores in real time.

Klarna: The Cautionary Success Story

Klarna is the company everyone cites when they talk about AI agents working. The numbers are real: their OpenAI-powered assistant handled 2.3 million conversations in its first month, doing the work of 700 full-time agents. Resolution time dropped from 11 minutes to under 2. Repeat inquiries fell 25%. The company reported $60 million in savings and customer service cost per transaction dropped 40% over two years, from $0.32 to $0.19.

But the full story is more instructive than the highlight reel.

In May 2025, CEO Sebastian Siemiatkowski told Bloomberg: “Cost unfortunately seems to have been a too predominant evaluation factor… what you end up having is lower quality.” Klarna started rehiring human agents. By February 2026, they’d adopted an “Uber-style” model: AI handles simple, standardized queries while humans become the VIP tier for complex, emotionally sensitive, or high-value cases.

Forrester VP Kate Leggett summarized it well: “They overpivoted to cost containment, without thinking about the longer-term impact of customer experience.”

The lesson isn’t that AI agents failed at Klarna. They work. The lesson is that optimizing for cost savings without modeling the downstream quality impact is how you get a system that technically resolves tickets faster but drives customer complaints about impersonal, robotic interactions. Klarna’s revenue per employee went from $300K to $1.3M since 2022. The company is heading toward 2,000 employees from 5,000. The AI is staying. The humans are coming back for the work AI can’t do well.

Salesforce (Customer Zero): What 1.5 Million Support Requests Taught Them

Salesforce deployed its own Agentforce platform internally before selling it — what they call “Customer Zero.” In one year, their service agent handled 1.5 million+ support requests, most without human intervention. Their SDR agent worked 43,000 leads and generated $1.7 million in new pipeline from dormant leads. Agentforce in Slack gave teams 500,000 hours back.

The most useful finding wasn’t about performance. It was about failure modes. When they first launched the customer support agent, the answers were factually accurate but the experience felt transactional. Customers who had been used to human agents saying “I’m really sorry to see that” got technically correct but emotionally flat responses. Fix: they explicitly trained the agent on conversational empathy, not just knowledge retrieval.

Customer results across 12,000 Agentforce deployments: Reddit deflected 46% of support cases and cut resolution times by 84% (8.9 minutes to 1.4 minutes). 1-800-Accountant hit 90% case deflection during tax week. Heathrow Airport’s “Hallie” agent answers questions about security wait times, restaurant locations, and gate directions — grounded entirely in Data 360, their unified customer data layer. UK police forces’ “Bobbi” agent resolves 82% of inbound citizen queries without human escalation.

JPMorgan Chase: $2 Billion In, $2 Billion Out

Jamie Dimon told Bloomberg in October 2025: “We spend $2 billion a year on developing artificial intelligence technology, and save about the same amount annually from the investment.” He called it “the tip of the iceberg.”

The specifics behind that number: 450+ AI use cases in production, up from 300 use cases and $100 million in value in 2022. 200,000 employees have access to LLM Suite, their internal AI platform built on OpenAI and Anthropic models. Half use it daily. The platform updates every eight weeks as JPMorgan feeds it more proprietary data and connects it to more internal systems.

The compounding pattern is the point. JPMorgan went from $100M in AI value (2022) → $1B (2024) → $1.5B → $2B (2025), growing 30-40% annually. Their fraud detection systems alone prevent an estimated $1.5B in annual losses. “Ask David,” a multi-agent system for the Private Bank, automates investment research with 90%+ accuracy on domain-specific queries. AI-powered card upgrade recommendations generated over $220 million in benefit.

Chief Analytics Officer Derek Waldron described the endgame to CNBC: “Every employee will have their own personalized AI assistant; every process is powered by AI agents, and every client experience has an AI concierge.” He also acknowledged the reality: “There is a value gap between what the technology is capable of and the ability to fully capture that within an enterprise.” Even at $18 billion in annual tech spend, full deployment will take years.

The lesson for smaller organizations: JPMorgan spent three years building data infrastructure before the AI value curve inflected. They didn’t skip the foundation. They also didn’t build their own models — they built the integration layer between commodity models and proprietary financial data. That’s where the $2B in value lives.

General Mills: AI That Moves Cereal, Not Just Data

General Mills isn’t a company you’d associate with cutting-edge AI. That’s what makes their results worth studying. Their supply chain chief Paul Gallagher built an “always-on” AI model — developed with Palantir — that replaced episodic, human-driven supply chain decisions with continuous, machine-driven optimization.

The system (called ELF) assesses 5,000+ daily shipments from plants to warehouses. In its first six months, it made ~400 daily recommendations to teams, with 70% accepted automatically. Result: $20 million+ in logistics savings since fiscal year 2024, plus a 30% waste reduction in manufacturing where AI analytics were deployed.

Gallagher put the shift plainly: “We’re moving from a world where people make decisions supported by machines to one where the machines make most of the decisions guided by people.” Decisions that took a day now take a minute. The company doubled its digital, data, and technology investments since 2019, and expanded ELF globally after proving out the U.S. pilot.

General Mills didn’t try to build a general-purpose AI platform. They built a digital twin of their supply chain, layered AI on top for specific decisions (routing, procurement timing, waste monitoring), and kept humans in the loop for exceptions. It’s the same pattern as Walmart and JPMorgan: narrow scope, proprietary data, measurable outcome, then expand.

Sierra: $100M ARR in 21 Months on Outcome-Based Pricing

Sierra is the clearest proof that AI agents have moved from experiment to infrastructure. Founded in 2023 by former Salesforce co-CEO Bret Taylor and Google veteran Clay Bavor, the company hit $100 million ARR in 21 months — making it one of the fastest-growing enterprise software companies in history. Valued at $10 billion after a $350M round in September 2025.

What surprised the founders wasn’t the tech companies adopting Sierra. It was the traditional businesses. ADT, SiriusXM, Casper, WeightWatchers, Bissell — companies that were never on anyone’s AI-native shortlist are now running production agents.

The results across the customer base:

Brex — customers get answers 90% faster, saving 15,000+ hours per year. Their CX and ops teams build customer journeys without code while engineering maintains control via Sierra’s Agent SDK. They piloted voice the same quarter they launched chat. Brex’s CTO framed the build-vs-buy decision clearly: they wanted engineering focused on what Brex does best (spend management), not on building and maintaining an AI agent platform.

Casper — their “Luna 2.0” agent handles 74% of inquiries autonomously. During their Labor Day sale, traffic doubled and the agent maintained resolution quality without scaling human staff. CSAT scores went up nearly a full point. Two team members review conversations daily to flag issues and create knowledge articles — that continuous feedback loop is what keeps the resolution rate climbing.

WeightWatchers — agent handles nearly 70% of all customer sessions with a 4.6/5 satisfaction score. The agent scored higher on empathy than WeightWatchers’ own call-center staff.

SiriusXM’s “Harmony” agent became their highest-rated, lowest-effort customer service channel — handling subscription management, technical troubleshooting, and content recommendations. They’re now the first customer on Sierra’s Agent Data Platform, which unifies structured and unstructured data so Harmony can move from transactional interactions to proactive, personalized engagement.

Sierra’s pricing model is worth noting: outcome-based, meaning you pay when the agent completes a task, not per seat or per month. That pricing structure aligns incentives in a way that traditional SaaS licensing never did — Sierra only makes money when the agent actually works.

Glean: The Knowledge Layer That Makes Agents Useful

Glean approaches the agent problem from a different angle. Where Sierra builds the customer-facing agent, Glean builds the internal knowledge layer that employees (and other agents) need to function.

The numbers: $200 million ARR as of December 2025, doubling from $100M in just nine months. Valued at $7.2 billion. Customers include Booking.com, Comcast, eBay, Intuit, LinkedIn, Pinterest, Samsung, and Zillow. 27 billion documents indexed across 100+ enterprise SaaS applications. Over 100 million agent actions per year on their platform, targeting one billion by end of 2025.

Glean’s core insight maps directly to failure pattern #2 from earlier in this article: agents only see a fraction of enterprise data. Glean’s platform connects to everything — Slack, Google Workspace, Microsoft 365, Salesforce, Jira, GitHub — and provides permission-aware search and retrieval across all of it. When a support agent receives a customer ticket, Glean can surface relevant documentation, prior Slack threads, similar past tickets, and internal policy documents, but only the ones that person is authorized to see.

The engagement metrics tell the real story: employees average five queries per day — comparable to consumer web search behavior — with a 40% weekly active/monthly active ratio, more than double the SaaS industry average. The $1M+ contract segment has grown nearly threefold in the past year.

CEO Arvind Jain framed the competitive position at Fortune Brainstorm AI: “The next decade won’t be defined by who builds the biggest model, but by who builds the most trusted and useful systems on top of them.” Glean’s bet is that the knowledge graph — understanding how every person, document, and system in an organization connects — is the layer that makes all other AI agents useful.


Build vs. Buy: The Framework That Actually Matters

The build-vs-buy question in 2026 isn’t the same one enterprises asked about cloud a decade ago. MIT NANDA research puts a number on it: purchased AI solutions succeed roughly 67% of the time versus 22% for internal builds. That gap alone should inform the default.

But the real answer is: build AND buy — in different places.

Buy when the capability is commoditizing. Agent frameworks (LangChain/LangGraph, Microsoft AutoGen, CrewAI), orchestration platforms (AWS Bedrock AgentCore, Google Agentspace, Salesforce Agentforce), and the underlying models themselves — these are moving too fast for most enterprises to build and maintain. Anthropic now captures 40% of enterprise LLM spend, up from 12% two years ago. OpenAI dropped from half the market to under a quarter. The model layer is a commodity. Treat it like one.

Build where your data is the moat. Your domain knowledge, proprietary ontologies, evaluation datasets, governance policies, and integration logic — that’s where the differentiation lives. As one CIO told InformationWeek: “Use the agent engine now to learn what works, but architect your stack so you can swap in vendor innovations as they mature — while your real differentiation lives in the domain models, policies, and evaluation data that no platform vendor can ship for you.”

The practical decision tree:

Use a platform (Salesforce Agentforce, ServiceNow, Microsoft Copilot Studio) if your workflows already run on that vendor’s ecosystem, you need governance and audit trails on day one, and your team doesn’t have dedicated ML engineers. Salesforce has 18,500 Agentforce deals closed, 9,500+ paid, and 330% ARR growth — adoption is real.

Use an open framework (LangGraph, AutoGen, CrewAI) if you need to connect agents across multiple vendor ecosystems, your workflows are complex enough that no-code builders can’t handle the logic, and you have engineering capacity to own the stack. LangGraph powers Klarna, Replit, and Elastic. AutoGen is the enterprise play for Azure-heavy shops.

Build custom only for the specific capability layer that creates competitive advantage: your proprietary data pipelines, evaluation suites, domain-specific guardrails, and workflow orchestration logic. Walmart didn’t build their own LLMs. They built Element — a proprietary ML platform for deploying, testing, and monitoring agents — on top of commodity models.

The biggest mistake enterprises make is building where they should buy (custom model training when fine-tuned commercial APIs would work) and buying where they should build (off-the-shelf governance when their industry requires custom compliance frameworks).


A Concrete Playbook: Going From Pilot to Production

These patterns come from the case studies above, the 20-company deployment study, and McKinsey’s finding that workflow redesign is the strongest driver of enterprise-level AI impact.

Step 1: Pick the right first agent. The highest-ROI deployments in 2025 were document processing, data reconciliation, compliance checks, and invoice handling — not customer-facing chatbots. Start with high-volume, rules-heavy, internal-facing tasks where errors are catchable and reversible. Walmart started with supplier negotiations and inventory monitoring. Salesforce started with support ticket routing and dormant lead qualification.

Step 2: Document the actual process before automating it. Fourteen of the twenty companies in the deployment study failed because they automated processes that were never documented. Map the workflow end to end. Find the tribal knowledge. Identify the edge cases the current team handles by instinct. If you can’t write it down clearly enough for a new hire to follow, an agent can’t follow it either.

Step 3: Solve the data problem first. Every successful enterprise deployment in the Ampcome study of 30+ deployments fused structured data (ERP, CRM, billing) with unstructured data (contracts, emails, PDFs, policies) into a unified semantic layer before writing any agent logic. Deployments that planned to “add unstructured later” consistently failed because agent logic built on partial context produced outputs that were technically correct but operationally wrong.

Step 4: Keep the humans in the loop longer than you think you should. Set explicit confidence thresholds. Below 90% confidence, the agent escalates to a human. Klarna learned this the hard way — removing human review to hit cost targets is what surfaced the quality problems. Build the human review period into the budget, not as a phase to “get past” but as a permanent feature for high-stakes decisions.
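A minimal sketch of that escalation gate, assuming the agent exposes a per-action confidence score (the 90% threshold is the illustrative figure from the step above):

```python
# Sketch of a confidence-gated escalation policy (Step 4).
# Assumes the agent surfaces a confidence score per proposed action.

CONFIDENCE_THRESHOLD = 0.90  # illustrative; tune per workflow and risk

def route(action, confidence):
    """Below the threshold, the agent hands off instead of acting."""
    if confidence < CONFIDENCE_THRESHOLD:
        return ("escalate_to_human", action)
    return ("execute", action)

print(route("refund $40", 0.97))    # ('execute', 'refund $40')
print(route("refund $4000", 0.72))  # ('escalate_to_human', 'refund $4000')
```

The budget implication follows directly: the escalation branch is a permanent staffing line, not a temporary scaffold.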

Step 5: Govern with deterministic rules, not probabilistic scores. The Ampcome research found that every successful deployment encoded approval hierarchies, compliance thresholds, and escalation criteria as hard if-then logic. Agents that make autonomous decisions based on confidence scores alone will eventually make a confident wrong decision on something that matters.
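The same idea as code: a hypothetical policy check with hard if-then rules evaluated before every action, independent of model confidence. The rule names and limits here are invented for illustration:

```python
# Sketch of deterministic governance rules (Step 5): hard if-then checks
# run before any action, regardless of how confident the model is.
# Rule names and limits are illustrative, not from a specific deployment.

def check_policy(action):
    if action["type"] == "payment" and action["amount"] > 10_000:
        return "require_manager_approval"
    if action["type"] == "delete" and action.get("target") == "production":
        return "blocked"
    if action.get("region") == "EU" and not action.get("gdpr_reviewed"):
        return "require_compliance_review"
    return "allowed"

print(check_policy({"type": "payment", "amount": 25_000}))
# require_manager_approval
print(check_policy({"type": "email"}))
# allowed
```

A confidence score never appears in the function: the point is that these gates fire even when the agent is certain.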

Step 6: Model the economics like cloud costs. Track inference costs per agent, per task, per workflow. Set budgets. Agent cost optimization is the 2026 equivalent of cloud cost optimization in the microservices era. Runaway API bills and unmonitored token usage have killed more agent projects than model quality has.
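A sketch of per-agent, per-workflow cost metering. The per-million-token prices below are placeholders, not any provider’s actual rates:

```python
# Sketch of per-task cost metering (Step 6). Prices are illustrative
# USD per million tokens; substitute your provider's actual rates.
from collections import defaultdict

PRICE_PER_M_TOKENS = {"input": 3.00, "output": 15.00}  # placeholder rates

spend = defaultdict(float)  # keyed by (agent, workflow)

def record(agent, workflow, input_tokens, output_tokens):
    """Attribute one model call's cost to an agent/workflow pair."""
    cost = (input_tokens * PRICE_PER_M_TOKENS["input"] +
            output_tokens * PRICE_PER_M_TOKENS["output"]) / 1_000_000
    spend[(agent, workflow)] += cost
    return cost

record("invoice-agent", "ap-processing", 12_000, 1_500)
record("invoice-agent", "ap-processing", 9_000, 1_100)
print(round(spend[("invoice-agent", "ap-processing")], 4))  # 0.102
```

Pennies per task, but multiplied by thousands of tasks and a retry loop gone wrong, this is the line item that kills projects; budget alerts belong on `spend`, not on the monthly invoice.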

Step 7: Measure outcomes, not activity. The question isn’t “how many tickets did the agent resolve” — Klarna learned that metric can look great while customer experience degrades. Define 2-3 outcome KPIs (cost-to-serve, error rate, time-to-resolution, CSAT) with baselines and measurement windows before you deploy.
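A minimal sketch of outcome-first measurement: each KPI compared against its pre-deployment baseline rather than counting agent activity. The numbers are illustrative (the cost-to-serve figures echo Klarna’s reported $0.32 to $0.19 drop):

```python
# Sketch of outcome KPIs vs. baselines (Step 7). Illustrative numbers only.

baseline = {"cost_to_serve": 0.32, "error_rate": 0.04, "csat": 4.1}
current  = {"cost_to_serve": 0.19, "error_rate": 0.06, "csat": 3.8}

for kpi, base in baseline.items():
    delta = current[kpi] - base
    print(f"{kpi}: {base} -> {current[kpi]} ({delta:+.2f})")
```

A ticket-count metric would call this deployment a win; the error-rate and CSAT deltas surface the Klarna-style quality regression before customers do.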


The OpenClaw Paradox

The same week CrowdStrike published enterprise tools to remove OpenClaw from corporate machines, Kilo launched KiloClaw — letting anyone deploy a hosted OpenClaw agent in under 60 seconds. OpenClaw has over 161,000 GitHub stars. Its creator just got hired by OpenAI. Mac Minis are reportedly hard to find because so many people are spinning up personal agents.

The technology works, and it’s moving fast enough that the gap between companies building on it and companies trying to contain it gets harder to close every month.

Jensen Huang said it on NVIDIA’s earnings call this week: “In this new world of AI, compute is revenues.” The infrastructure buildout isn’t waiting for the application layer to catch up. Hyperscalers are spending $650 billion this year and guiding higher.

The 11% shipping agents today are accumulating something that doesn’t show up in vendor demos or analyst reports: production reps. They know what breaks at scale, how to recover from agent errors, how to measure whether an agentic workflow actually saved money. That knowledge compounds.

Every quarter in pilot purgatory is a quarter someone else is getting those reps.


Sources Appendix

  1. Summer Yue / OpenClaw inbox incident — Fast Company, Windows Central, Tom’s Hardware
  2. NVIDIA Q4 FY2026 earnings ($68.1B revenue) — CNBC, Fortune
  3. Deloitte Emerging Technology Trends (11% production stat) — Deloitte Insights
  4. Deloitte 2026 State of AI in the Enterprise — Deloitte Press Release
  5. McKinsey State of AI / 3x scaling stat — via MachineLearningMastery
  6. MIT State of AI in Business 2025 (95% failure rate) — Fortune
  7. S&P Global (42% abandoned AI initiatives) — via Index.dev
  8. Gartner (40% agentic projects canceled by 2027) — Gartner Newsroom
  9. 20-company AI agent study — Medium / Abdul Tayyeb Datarwala
  10. 80/20 enterprise data split & context gaps — Ampcome
  11. OpenClaw background — Wikipedia, OpenClaw.ai
  12. Cisco security analysis of OpenClaw — Cisco Blogs
  13. CrowdStrike OpenClaw removal tools — CrowdStrike Blog
  14. Meta bans OpenClaw internally — UCStrategies
  15. KiloClaw launch (60-second deployment) — VentureBeat
  16. Enterprise agent adoption stats (coding, data analysis) — BeamSec
  17. Gartner (40% enterprise apps to embed agents by end 2026) — via MachineLearningMastery
  18. Hyperscaler capex ($650B) and Jensen Huang earnings call — Fortune, Nasdaq
  19. Walmart AI strategy (“surgical” agents, super agents) — AI News, IT Brew
  20. Walmart self-healing inventory ($55M savings) — FinancialContent
  21. Walmart Pactum AI negotiation bot (64-68% close rate) — Klover.ai
  22. Walmart associate AI tools (shift planning 90→30 min) — Walmart Corporate
  23. Walmart Element platform & Wibey — Digital Commerce 360
  24. Klarna AI assistant (2.3M conversations, 700 FTE equivalent) — Klarna Press
  25. Klarna $60M savings, 853 FTE equivalent — Yahoo Finance / CX Dive
  26. Klarna cost per transaction decline (40%) — CX Dive
  27. Klarna CEO admits quality problems, rehires humans — Substack / Vidmar
  28. Klarna Forrester “overpivoted” analysis — CX Dive
  29. Salesforce Customer Zero (1.5M requests, 500K hours saved) — Salesforce News
  30. Salesforce Agentforce 360 (Reddit, 1-800-Accountant results) — Salesforce IR
  31. Salesforce Agentforce customer stories (Heathrow, Engine) — Salesforce News, Success Stories
  32. UK police “Bobbi” agent (82% resolution) — Salesforce UK
  33. Agentforce 18,500 deals, 330% ARR growth — CX Today
  34. MIT NANDA build vs. buy (67% vs 22% success) — via Neontri
  35. Anthropic 40% enterprise LLM spend — Beam AI
  36. CIO advice on agent engine as replaceable component — InformationWeek
  37. Agentic AI frameworks (LangGraph, AutoGen, CrewAI) — SpaceO
  38. Ampcome 30+ deployment study (unified context, deterministic rules) — Ampcome
  39. McKinsey workflow redesign as strongest impact driver — McKinsey State of AI
  40. Sierra $100M ARR in 21 months — TechBuzz
  41. Sierra $10B valuation, Agent OS evolution — CMSWire
  42. Brex + Sierra (90% faster, 15K hours saved) — Sierra / Brex
  43. Casper Luna 2.0 (74% resolution, CSAT +1 point) — Sierra / Casper
  44. WeightWatchers (70% sessions, 4.6/5 CSAT, empathy scores) — Sierra Blog, AIM Media
  45. SiriusXM Harmony agent + Agent Data Platform — Sierra Blog
  46. Glean $200M ARR, doubling in 9 months — Fortune
  47. Glean 27B documents, 100M+ agent actions, engagement metrics — BusinessWire, Yahoo Finance
  48. JPMorgan $2B AI spend = $2B savings (Dimon quote) — AIM Media House
  49. JPMorgan 450+ AI use cases, LLM Suite, 30-40% annual growth — AI News
  50. JPMorgan AI infrastructure ($100M→$2B value trajectory) — The Data Letter
  51. JPMorgan “Ask David” 90%+ accuracy, $220M card upgrade benefit — Klover.ai
  52. JPMorgan “fully AI-connected enterprise” vision (Waldron/CNBC) — CNBC
  53. General Mills $20M+ AI logistics savings — CIO Dive, Food Dive
  54. General Mills 30% waste reduction, ELF system — Supply Chain Strategy
  55. General Mills “always-on” supply chain model — Consumer Goods Technology

Vatsal writes about AI, technology, and markets at vigyaan.com.