MCP Server Security Risks: The AI Agent Toolchain Exposed

cybersecurity vulnerability alert or warning on computer screen - Computer code on a dark screen with line numbers.

What if the biggest security vulnerability in your AI development pipeline isn't a line of code—but the autonomous system generating it?

That question, surfaced in Forrester analysis tied to Snyk's June 2026 platform launch, exposes a structural gap in how enterprise security teams think about risk. The traditional model assumes humans write software and tools check their work. Agentic AI coding environments break that assumption at the foundation. According to Google News, citing SC Media's coverage dated June 25, 2026, Snyk announced Evo Agentic Development Security (Evo ADS) on June 23, 2026—with general availability scheduled for June 29, 2026—as a governance layer targeting the autonomous systems now writing, reviewing, and deploying enterprise code before human eyes ever see it. As Forrester put it: "Risk now enters before code reaches the repository, shifting the core security question from 'Is this code secure?' to 'Can we trust the system that created it?'"

The Evidence: When the Toolchain Becomes the Threat Vector

As of June 25, 2026, Snyk's MCPInspect tool had analyzed 9,700 developer environments and catalogued 4,524 unique MCP servers—the Model Context Protocol connectors that allow AI coding agents to invoke external tools, read file systems, call APIs, and execute actions without direct developer input. The top 1% of those installations ran 13 or more MCP servers per machine; some environments hosted more than 80 running concurrently.

The surface area is larger than most security teams realize. As of June 25, 2026, according to Snyk's research, 43% of developers run two or more AI coding environments simultaneously, and over 50% have MCP servers installed. Nearly one in four enterprise developers has at least one agent skill installed, averaging 18 skills per developer. More than one in ten of those skills reference external dependencies or externally hosted instructions—meaning the governance boundary extends well beyond the company's own infrastructure.

Of the MCP servers catalogued, 833 were identified as vulnerable and 18 carried descriptions flagged as suspicious. Independently, Cisco's AI Defense team examined 31,000 agent skills and found that 26% contained at least one vulnerability; 1,184 malicious skills were confirmed across the ClawHub repository. These aren't theoretical threat scenarios—they're live findings across developer machines already in production use.

In March 2026, Amazon experienced a 6-hour outage affecting 6.3 million orders, with investigators linking the disruption to issues in AI-generated code. Separate 2026 security research documented functional exploit chains running entirely through the agent toolchain: a poisoned security scanner that backdoored the LiteLLM library, and prompt injection attacks embedded in agent dependencies rather than in generated output. The attack no longer looks like a vulnerability. It looks like a normal tool call.

What It Means: Supply Chain Governance for AI Agents

Evo ADS structures its approach across three enforcement layers. Supply chain security vets MCP servers and agent skills before they reach developer machines. Behavioral governance monitors agent actions in real time during execution. Output validation scans AI-generated code at the moment of creation—not at the pull request gate, not at the CI/CD boundary, but before the code ever enters version control.

Chart: Agentic AI Security Market projected to grow from $1.25 billion (2025) to $13.52 billion by 2032, at a 42.0% CAGR. Source: market research data cited by Snyk, current as of June 25, 2026.

That three-layer model is a material departure from where AppSec tooling has historically been positioned. Traditional SAST tools (Static Application Security Testing—automated code review that flags vulnerabilities before deployment) operate at the tail end of the pipeline. Evo ADS attempts interception at the beginning, the middle, and the output simultaneously—treating the agent itself as an untrusted component, not just its artifacts.

Snyk CTO Manoj Nair stated plainly at the platform's launch: "Ask a security leader for a complete inventory of the AI agents, MCP servers and skills running across their developer machines and in most organizations that inventory doesn't exist. That is the gap Evo ADS closes." Brendan Putek, Director of DevOps at Relay Network, offered a practitioner's framing that resonates with anyone who has tried to retrofit governance onto agentic workflows: "The blast radius isn't bounded and we're early in the curve. Working with Snyk, we landed on what I think is the right architecture: controls built directly into the agent workflow." Oliver Neuberger, Managing Director EMEA at Accenture, summarized the enterprise mandate: "Their impact demands mindful development and the right guardrails—so enterprises can deploy them securely."

Snyk chose to launch Evo ADS at the AI Engineer World's Fair, where it serves as the exclusive security sponsor—a venue choice that signals AI developer tooling and security governance are converging into a single purchasing decision. As of 2025, the Agentic AI Security Market was valued at $1.25 billion, with projections placing it at $1.65 billion in 2026 and $13.52 billion by 2032—a 42.0% compound annual growth rate indicating genuine category formation. Competitor Cycode has shipped a comparable runtime governance platform, confirming that the "scan at commit" model is broadly recognized as insufficient. Teams already evaluating which AI coding tools to standardize on—a comparison covered in depth by AI Tools Newslens in its breakdown of Copilot, Cursor, and Claude Code—will find that governance tooling selection is now a parallel decision, not an afterthought.

Where Real-Time Behavioral Monitoring Actually Fails

Behavioral governance at agent runtime sounds tractable until you model the engineering constraints. Every tool call, every MCP server invocation, every intermediate action in a multi-step agent chain must be intercepted, evaluated, and either cleared or blocked in real time—without adding enough latency to make the workflow unusable or cause developers to disable the monitoring. That's not a product gap; it's a physics problem.

Context window blowups compound the overhead. An agent running 15 sequential tool calls, each returning several kilobytes of context, can exhaust a 128K token window in a single session. Adding behavioral state tracking to that process increases compute cost at exactly the point in the pipeline where every added millisecond has a direct developer-productivity cost. At the scale of thousands of developer machines, the numbers compound quickly and uncomfortably.

The structural challenge underneath all of this is the inventory problem Nair named directly: governance can only act on what it can see. Supply chain vetting requires comprehensive discovery of every MCP server across every developer machine—and with 4,524 unique servers catalogued from fewer than 10,000 environments, the surface area is expanding faster than any static allowlist strategy can handle. As of June 25, 2026, 62% of organizations cite security as the primary barrier to scaling agentic AI. That's not a problem a single tooling layer fully closes.

The underlying vulnerability rate also deserves direct attention: as of 2026, 45% of AI-generated code contains security flaws, and AI-generated code introduces vulnerabilities at 2.74 times the rate of human-written code. Output scanning at creation time catches artifacts. It does not alter the model's propensity to generate insecure patterns. Addressing that root cause requires model-level controls, systematic red-teaming of the agent itself, and eval-driven development (using automated test harnesses to validate agent behavior across security-relevant scenarios)—none of which the three-layer Evo ADS framework directly addresses.

Frequently Asked Questions

What is agentic development security, and how does it differ from traditional code scanning?

Traditional code scanning evaluates software after a developer writes it—at the pull request or CI/CD gate. Agentic development security addresses risk that enters before code reaches a repository: through the AI agent's tool connections (MCP servers, plugins, third-party skills) and the autonomous actions the agent takes during code generation. Forrester frames the shift as moving from "Is this code secure?" to "Can we trust the system that created it?"—a distinction that requires governance at the agent layer, not just the output layer.

How do MCP servers create specific security risks for enterprise AI coding workflows?

Model Context Protocol (MCP) servers are the connective interfaces that let AI coding agents invoke external tools, read file systems, call APIs, and access data sources without explicit per-action developer approval. A compromised or malicious MCP server becomes a direct pathway into internal systems, indistinguishable from a legitimate tool call. As of June 25, 2026, Snyk's analysis of 9,700 environments identified 833 vulnerable MCP servers, with some developer machines running more than 80 concurrent MCP connections. One in 12 developers with MCP servers installed has high or critical security findings in their environment.

Are AI coding tools safe for enterprise use without a dedicated governance platform?

Based on research current as of June 25, 2026, the risk is measurable and significant without additional controls. Snyk's data shows 45% of AI-generated code contains security flaws, with AI-generated code introducing vulnerabilities at 2.74 times the rate of human-written code. Cisco's AI Defense team independently found 26% of 31,000 agent skills contained at least one vulnerability, with 1,184 malicious skills confirmed in the wild. Enterprise teams can use AI coding assistants productively—but that use requires layered governance covering the agent's tool connections, runtime behavior, and generated output, not only a final scan before deployment.

Bottom line: In my read, the most important signal here isn't the product launch—it's the Forrester reframe. When the security question shifts from evaluating code to evaluating the system that writes it, the AppSec toolchain needs to be rebuilt, not extended. I'd argue that the organizations most exposed right now aren't those who haven't adopted AI coding agents; they're the ones who have, without first building the inventory visibility that any governance layer depends on to function. The three enforcement layers Evo ADS offers address real gaps. But governance without inventory is policy without enforcement—and the inventory problem has to come first, or the layers govern a surface you can't fully see.

Disclaimer: This article is editorial commentary for informational purposes only and does not constitute legal, security, or financial advice. Research based on publicly available sources current as of June 25, 2026.