Quick Answer
Three independent data sets converged in the first half of 2026, and they tell the same story. The Cloud Security Alliance's 2026 research note compiles them: Veracode testing finds about 45% of AI-generated code introduces an OWASP Top 10 weakness, enterprise findings from Apiiro rose roughly tenfold in six months, and confirmed CVEs attributed to AI-generated code climbed from 6 in January to 35 in March. None of this means "stop using AI to code." It means the review step you used to do in your head no longer scales to the volume of code you now generate. This article walks through the numbers and what to check before you ship. Every figure is attributed to its named source.
The Headline Numbers
Start with code quality at the unit level. According to Veracode testing cited in the CSA note, the security pass rate for AI-generated code has sat at roughly 55% from 2025 into March 2026, meaning close to half of samples ship a real weakness. The failure rate is not evenly distributed.
| Category | Failure rate in AI-generated code | Source |
|---|---|---|
| Any OWASP Top 10 weakness introduced | ~45% of samples | Veracode (via CSA) |
| Java (security tests) | 72% fail | Veracode (via CSA) |
| Cross-site scripting | 86% fail | Veracode (via CSA) |
| Log injection | 88% fail | Veracode (via CSA) |
The takeaway is not the exact percentage. It is that the most common AI mistakes cluster in a handful of well-understood categories, which means they are detectable and fixable if anything is actually looking.
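To make "detectable and fixable" concrete, here is a minimal sketch of the standard defenses for the two worst rows in the table: output escaping for cross-site scripting and control-character neutralization for log injection. The function names `escapeHtml` and `sanitizeForLog` are illustrative, not from Veracode or the CSA note, and the sketch assumes untrusted text is being placed into HTML element content or a quoted attribute. Other contexts (URLs, inline scripts) need different encoding.

```typescript
// Characters that let untrusted text break out of HTML element content
// or a quoted attribute value, mapped to their entity equivalents.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape untrusted text before interpolating it into HTML.
function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Replace CR, LF and other ASCII control characters with a visible \xNN
// form so user input cannot forge an extra log line.
function sanitizeForLog(untrusted: string): string {
  return untrusted.replace(/[\u0000-\u001f\u007f]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return "\\x" + (code < 16 ? "0" : "") + code.toString(16);
  });
}
```

In most real projects the templating layer or logging library should do this for you. The point of the sketch is that both fixes are a few lines long, which is why a scanner or reviewer who is actually looking can catch the omission.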
The Velocity Problem Is the Real Story
A 45% defect rate would be manageable if the volume were flat. It is not. The CSA note cites Apiiro enterprise data showing AI-assisted developers commit at three to four times the rate of their peers, and that monthly security findings rose from around 1,000 to over 10,000 across a six-month window, a tenfold surge.
Inside that increase, the dangerous categories grew fastest: privilege-escalation paths up 322%, architectural design flaws up 153%, and exposed credentials appearing at nearly twice the rate seen among developers not using AI. This is the core of the problem. AI did not invent new vulnerability classes. It scaled the production of old ones past the point where manual review can keep up. You are shipping more code, faster, with the same number of eyes on it.
The CVE Curve Is Bending Up
Lab metrics become real when they turn into assigned CVEs. The CSA note references a Georgia Tech tracker, the Vibe Security Radar, recording CVEs directly attributed to AI-generated code: 6 in January 2026, 15 in February, and 35 in March. The cumulative confirmed total was 74, with the researchers estimating the true figure is five to ten times higher in observable repositories because most are never formally reported.
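The arithmetic behind those figures is worth spelling out. The sketch below uses only the numbers quoted above; the variable names are illustrative. One thing it makes visible: the three monthly counts sum to 56, so the cumulative total of 74 presumably includes cases recorded outside those three months. The article's source does not break that down, so treat it as an inference.

```typescript
// Monthly confirmed CVE counts quoted above (Vibe Security Radar, via CSA).
const jan = 6;
const feb = 15;
const mar = 35;
const cumulativeConfirmed = 74;

// Month-over-month growth factors.
const growthJanToFeb = feb / jan; // 2.5
const growthFebToMar = mar / feb; // roughly 2.33

// Sum of the three quoted months, for comparison with the cumulative total.
const quarterSum = jan + feb + mar; // 56

// The researchers' "five to ten times higher" estimate applied to the total.
const estimatedLow = cumulativeConfirmed * 5; // 370
const estimatedHigh = cumulativeConfirmed * 10; // 740

console.log({ growthJanToFeb, growthFebToMar, quarterSum, estimatedLow, estimatedHigh });
```

So the count more than doubled in each of two consecutive months, and the researchers' own multiplier puts the plausible real total somewhere between 370 and 740.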
For context on impact, the CSA note also cites an Escape.tech analysis of 1,400 vibe-coded applications that surfaced 2,038 highly critical vulnerabilities, more than 400 leaked secrets, and 175 instances of exposed personal data. These are not lab samples. They are deployed apps with real users.
Hallucinated Packages: A New Supply Chain Door
One AI-specific failure deserves its own section because attackers have already industrialized it. The CSA note reports that roughly 20% of AI-generated code samples reference packages that do not exist on the registry, and that 43% of those invented names recur consistently across similar prompts. That consistency is the danger: an attacker can predict the hallucinated name, register it, and wait for the next developer's AI to suggest it. The technique is called slopsquatting.
```typescript
// ❌ BAD - trusting an import the model invented.
import { sanitizeHtml } from 'react-safe-html'; // illustrative: a name the model supplied, never verified on npm
// If the name is unregistered and an attacker claims it tomorrow, your next install pulls their code.

// ✅ GOOD - verify every AI-suggested dependency before trusting it.
import DOMPurify from 'dompurify'; // real, audited, widely used
// Before adding any new package the AI proposes:
//   npm info dompurify   // confirm it exists, check publish dates and maintainers
//   read the repo, not just the name the model gave you
```
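One way to make that verification step hard to skip is to flag any import whose package has not been reviewed yet. The sketch below is a rough heuristic, not a real tool: `findUnvettedImports` and `packageNameOf` are hypothetical names, the "vetted" set is assumed to be a list your team maintains, and the regex will miss unusual syntax and can match inside comments. A production check would use a proper parser.

```typescript
// Return the npm package name behind an import specifier, or null for
// relative/absolute paths and built-ins addressed with the "node:" prefix.
// Note: bare built-ins such as "fs" are not recognized and would be flagged.
function packageNameOf(specifier: string): string | null {
  if (
    specifier.startsWith(".") ||
    specifier.startsWith("/") ||
    specifier.startsWith("node:")
  ) {
    return null;
  }
  const parts = specifier.split("/");
  // Scoped packages keep two segments: "@scope/name/sub" -> "@scope/name".
  if (specifier.startsWith("@")) {
    return parts.length >= 2 ? parts[0] + "/" + parts[1] : null;
  }
  return parts[0] ?? null;
}

// List packages the source imports that are not in the reviewed set.
function findUnvettedImports(source: string, vetted: ReadonlySet<string>): string[] {
  // Matches `from 'x'`, `import 'x'`, `import('x')` and `require('x')`.
  const pattern = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["']([^"']+)["']/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = packageNameOf(match[1] ?? "");
    if (name !== null && !vetted.has(name)) {
      found.add(name);
    }
  }
  return Array.from(found).sort();
}
```

Run against the snippet above with `new Set(["dompurify"])` as the vetted list, this would report `react-safe-html` and nothing else. Each reported name then gets the manual check: confirm it exists on the registry, look at its history, and read the repo before it reaches your lockfile.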
The Perception Gap and What to Check Before You Ship
The reason this surge is quiet is psychological. The CSA note cites Snyk data showing that nearly 80% of developers believe AI tools generate more secure code than humans write, even though studies have found that developers using AI assistants were more likely to submit insecure code while feeling more confident about it. Confidence is rising while the defect rate holds. That gap is exactly where bugs ship.
You close it with a checklist that runs on every change, not on vibes.
- Validate and parameterize. The biggest Veracode failure buckets are injection and XSS. Use parameterized queries everywhere and validate input with a schema library like Zod.
- Verify every new import. Given a 1-in-5 hallucination rate, never let an AI-suggested package reach your lockfile unchecked.
- Hunt for committed secrets. Exposed credentials are appearing at nearly twice the rate seen among developers not using AI. Scan the tree and the git history, not just the latest diff.
- Check authorization on every route. Access-control bugs do not show up as errors. Confirm each endpoint scopes data to the current user.
- Automate it. Tools like VibeDoctor (vibedoctor.io) automatically scan AI-generated codebases for these exact categories (injection, XSS, hallucinated imports, leaked secrets, and unprotected routes) and flag the specific file paths and line numbers, so the review keeps pace with the code. Free to sign up.
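As an illustration of how mechanical the secrets item on this checklist can be, here is a minimal pattern-based scan. It is a sketch under stated assumptions: the three rules are a small illustrative sample, `scanForSecrets` and the rule names are made up for this example, and dedicated scanners ship far larger rule sets plus entropy checks.

```typescript
interface SecretFinding {
  line: number; // 1-based line number within the scanned text
  rule: string; // which pattern matched
}

// Illustrative patterns only. None use the global flag, so RegExp.test
// carries no state between calls.
const SECRET_RULES: { rule: string; pattern: RegExp }[] = [
  // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  { rule: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  // PEM private key header, with or without an algorithm word such as RSA.
  { rule: "private-key", pattern: /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/ },
  // Assignments like password = "..." or api_key: '...' with a literal value.
  {
    rule: "hardcoded-credential",
    pattern: /\b(?:password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*["'][^"']{8,}["']/i,
  },
];

// Scan text line by line and report every rule that matches.
function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  text.split(/\r?\n/).forEach((content, index) => {
    for (const { rule, pattern } of SECRET_RULES) {
      if (pattern.test(content)) {
        findings.push({ line: index + 1, rule });
      }
    }
  });
  return findings;
}
```

To cover history as well as the working tree, the same function can be fed the text output of `git log -p --all` in addition to current file contents. A secret that was committed and later deleted is still exposed, so any hit in history means rotating the credential, not just removing the line.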
FAQ
Does a 45% defect rate mean AI coding tools are bad?
No. It means AI-generated code needs the same review and testing as any other code, and currently often skips it. The tools are a genuine productivity multiplier. The mistake is treating their output as finished rather than as a first draft that has not been security-reviewed.
Why are the CVE numbers still small if the defect rate is so high?
Because most vulnerabilities in small apps are never formally reported as CVEs. The Georgia Tech tracker counts only confirmed, attributed cases, and its own researchers estimate the real figure is five to ten times higher. The trend line, from 6 to 15 to 35 in three months, matters more than the absolute count.
What is slopsquatting?
It is squatting on package names that AI models reliably hallucinate. Because the same fake name appears across many prompts, an attacker can register it and wait for AI assistants to recommend it to unsuspecting developers. Always confirm a package exists and is legitimate before installing it.
Which vulnerability categories should I prioritize?
Start where the data shows the highest failure rates and impact: injection and XSS (the largest Veracode buckets), broken access control, exposed secrets, and hallucinated dependencies. These five cover the majority of what shows up in scans of real AI-built apps.
Is manual code review enough?
It is necessary but no longer sufficient at AI velocity. When developers commit three to four times as much code, a human reviewer cannot catch every missing auth check or invented import. Automated scanning handles the mechanical, high-volume checks so human review can focus on logic and design.
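A missing auth check is a good example of a mechanical check. The sketch below assumes a hypothetical route table (`RouteDef`) in which each route lists the names of the guards that run before its handler; real frameworks expose this information differently, and `requireAuth` is an assumed guard name, not a library API.

```typescript
// Hypothetical route description for illustration only.
interface RouteDef {
  method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE";
  path: string;
  middleware: string[]; // names of guards that run before the handler
}

// Report every route that is neither explicitly public nor behind the guard.
function findUnprotectedRoutes(
  routes: RouteDef[],
  publicPaths: ReadonlySet<string>,
  authGuard: string = "requireAuth",
): string[] {
  return routes
    .filter((r) => !publicPaths.has(r.path) && r.middleware.indexOf(authGuard) === -1)
    .map((r) => r.method + " " + r.path);
}

// Example route table with one route that was shipped without a guard.
const exampleRoutes: RouteDef[] = [
  { method: "GET", path: "/health", middleware: [] },
  { method: "POST", path: "/login", middleware: [] },
  { method: "GET", path: "/invoices/:id", middleware: ["requireAuth"] },
  { method: "DELETE", path: "/invoices/:id", middleware: [] },
];

console.log(findUnprotectedRoutes(exampleRoutes, new Set(["/health", "/login"])));
```

A check like this only proves that a guard is present. Whether the handler scopes data to the current user is a logic question, and that is where human review time is better spent.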