Prompt Injection Is Now OWASP's #1 AI Threat: When Your Coding Agent Becomes the Attacker

Quick Answer

Prompt injection is now ranked the number one AI security risk by OWASP, and according to Help Net Security's June 2026 report it maps to six of the ten categories in OWASP's Top 10 for Agentic Applications. The threat has moved from "the chatbot says something rude" to "the coding agent in your editor runs attacker-controlled commands." In April 2026 a trojanized Bitwarden CLI package was caught specifically hunting for Claude, Cursor, Codex, and Aider credentials. This article explains what changed, why your agent is now part of the attack surface, and the one part of the problem you fully control: the code that gets committed. Every claim is attributed to a named source.

Why Prompt Injection Suddenly Tops the List

The structural reason is simple and, for now, unsolved. As Help Net Security summarizes the OWASP position, large language models "treat the system prompt, the user's request, and any text retrieved from external sources as a single stream of tokens." The model cannot reliably tell your instructions apart from instructions hidden in a file it just read. When that model can also run shell commands, edit files, and open network connections, a malicious instruction in a README or a pull request becomes executable.

This is why Securance and other trackers put prompt injection at the top of the 2026 threat list. It is not the most sophisticated attack. It is the one with the widest reach, because every retrieval-augmented agent, every "read this repo and fix it" workflow, and every MCP tool integration is a potential injection point.

The Shift: Your Agent Is Now the Attack Surface

A run of CVEs in 2026 made the abstract concrete. These are real, assigned identifiers, not hypotheticals.

CVE	Tool	What the injection does	Severity
CVE-2025-53773	GitHub Copilot	Hidden prompt in a PR description leads to remote code execution	CVSS 9.6
CVE-2026-22708	Cursor	Poisons the agent's environment so allowlisted commands carry payloads	High
CVE-2025-54135 (CurXecute)	Cursor	Malicious prompt hidden in a repo README triggers RCE	High
CVE-2025-6514	Model Context Protocol	Untrusted MCP tool input reaches a shell	CVSS 9.6
CVE-2025-59532	OpenAI Codex CLI	Injection via untrusted project content	High

Microsoft's own research, "When prompts become shells," documents the same theme across agent frameworks: the gap between "the model decided to run this" and "an attacker decided to run this" can be a single line of hidden text. Help Net Security notes the five fastest-growing coding tools (Claude Code, Gemini CLI, Codex, Cline, and Aider) all carry security advisories, with Claude Code alone listed at 22.

The Bitwarden CLI Worm That Hunted AI Credentials

The most direct warning shot landed on 22 April 2026. According to Palo Alto Networks and Endor Labs, a malicious version of the @bitwarden/cli npm package was live for roughly 90 minutes. The payload stole GitHub and npm tokens, SSH keys, .env files, and shell history, then self-propagated by backdooring any package the victim could publish.

The detail that matters for this audience: the malware specifically probed for Claude, Cursor, Codex CLI, and Aider. As The Hacker News reported, this is among the first documented supply chain attacks to explicitly target AI coding assistant credentials. The attackers understand that a developer's machine running an agent is a high-value target: it holds cloud keys, repo write access, and now an autonomous tool that can be steered.

A Mental Model: The Lethal Trifecta

You cannot patch the token-stream problem yourself, so the practical defenses are about limiting what an injected instruction can reach. Help Net Security highlights two heuristics that have become standard.

The first is the "Lethal Trifecta": danger spikes when an agent simultaneously has access to private data, exposure to untrusted content, and the ability to communicate externally. Remove any one leg and exfiltration gets much harder. The second is Meta's "Agents Rule of Two": an agent operating without human approval should satisfy at most two of those three properties at once. If your agent reads arbitrary repos (untrusted content) and holds your cloud keys (private data), then it should not also be able to make arbitrary outbound network calls unattended.

// ❌ BAD - all three legs of the trifecta, unattended.
// Agent has: repo write + cloud keys in env + unrestricted shell/network,
// and auto-runs on every "review this external PR" with no human gate.

What You Actually Control: The Code That Ships

You cannot stop a model from reading a poisoned README. You can control what an attacker finds if a payload runs in your environment, and what your agent quietly commits. Both are scannable.

// ✅ GOOD - nothing worth stealing sits in the tree.
// config.ts
export const OPENAI_KEY = process.env.OPENAI_KEY;          // read at runtime
export const SUPABASE_SERVICE_ROLE = process.env.SUPABASE_SERVICE_ROLE;
// .env is gitignored, secrets live in a manager and rotate on a schedule,
// and every new dependency the agent adds is verified before install.

Keep secrets out of the repo and out of git history. A credential-harvesting payload greps the working tree first. If there is nothing there, the trifecta's "private data" leg shrinks.
Review what the agent commits, not just what it says. An injected instruction can add a dependency or a network call you never asked for. Diff the agent's commits, especially lockfile and config changes.
Verify new imports. Injection and slopsquatting both end in a bad package name landing in your package.json. Confirm each one exists and is the package you intended.
Scan the output automatically. Tools like VibeDoctor (vibedoctor.io) automatically scan your codebase for hardcoded secrets, private keys, and hallucinated or freshly-published imports, and flag the exact file and line, so the residue an agent or a payload leaves behind does not reach production. Free to sign up.
Gate autonomous runs on untrusted content. Require human approval before an agent acts on an external PR, issue, or web page.

FAQ

Can prompt injection be fully prevented?

Not today. Because the model processes instructions and data as one token stream, there is no reliable parser-level separation. The defenses that work are about reducing blast radius: limiting an agent's permissions, gating risky actions on human approval, and keeping high-value secrets out of reach. OWASP and Help Net Security both frame it as a risk to be contained, not a bug to be patched once.

Does this mean I should stop using Claude Code, Cursor, or Codex?

No. Advisory counts reflect scrutiny and popularity as much as risk, and all of these tools ship fixes quickly. The point is to use them with the same caution you would give any process that can run shell commands: keep them updated, do not auto-approve actions on untrusted input, and scan what they produce.

What was actually stolen in the Bitwarden CLI attack?

According to Palo Alto Networks and Endor Labs, the payload harvested GitHub and npm tokens, SSH keys, .env contents, shell history, and cloud secrets from infected developer machines, and probed for AI coding tool credentials. Bitwarden stated it found no evidence that end-user vault data was accessed. The risk was to developers who installed the bad version, whose local secrets and publish tokens were the target.

What is the "Lethal Trifecta" in one sentence?

An AI agent is dangerous when it can simultaneously access private data, read untrusted content, and communicate externally. Break any one of those three and a successful injection has far less it can do.

How does scanning my own code help against an injection in someone else's repo?

It addresses the two outcomes you can measure. First, it shrinks the prize: if no secrets sit in your tree, a payload that runs has less to steal. Second, it catches the residue: injected instructions often end with a new dependency, a hardcoded endpoint, or a secret written into a file. Scanning your repo on every change surfaces those changes regardless of where the injection came from.