AI Coding Agents Are Writing Entire Apps - Who's Checking Their Work?

Quick Answer

AI coding agents like Claude Code, Devin, Windsurf, and Cursor Agent mode can now write entire applications autonomously - not just autocomplete lines, but scaffold full projects with routing, database schemas, authentication, and deployment configs. The problem is that these agents introduce the same categories of vulnerabilities that AI autocomplete does, but at a much larger scale and with far less human oversight. Automated scanning after every agent session is no longer optional.

From Autocomplete to Autonomous: A Fundamental Shift

In 2024, AI coding meant autocomplete. You typed a function signature and Copilot suggested the body. You accepted or rejected each suggestion individually. The human was in the loop on every line.

In 2026, the paradigm has shifted. Tools like Claude Code, Devin, Windsurf, and Cursor Agent mode operate autonomously. You describe what you want - "build a SaaS dashboard with Stripe billing, user auth, and a REST API" - and the agent writes the entire application. Hundreds or thousands of files. Full stack. Working code.

GitHub has reported that Copilot writes 46% of the code in files where it is enabled. With agentic tools, that share can approach 90-100% for new projects. The agent does not assist you - it replaces you as the primary author. And that changes the risk profile completely.

The shift from autocomplete to autonomous agents means the surface area of unreviewed code has grown by orders of magnitude. When you accepted one autocomplete suggestion at a time, you at least glanced at each block. When an agent writes 50 files in a session, nobody is reading most of them.

What AI Agents Get Wrong Consistently

AI coding agents produce code that works - it compiles, it runs, it does what you asked. But "works" and "secure" are fundamentally different standards. A Stanford study found that developers using AI assistants wrote less secure code while believing it was more secure. With autonomous agents producing entire codebases, this confidence gap is even more dangerous.

Veracode's 2025 GenAI Code Security Report found that 45% of the AI-generated code samples it tested contained security flaws. The patterns are consistent across agents. Here is what they routinely produce:

God files. Agents love putting everything in one file. A single 800-line API route file with authentication, business logic, database queries, and error handling all mixed together. No separation of concerns.

Missing authentication. Agents generate working API endpoints but skip auth middleware. The route works perfectly in development and gets shipped to production wide open.

Hardcoded secrets. Connection strings, API keys, and JWT secrets appear as string literals. The agent gets the functionality right but stores the secret in the worst possible place.

No error handling. Happy path only. Agents rarely generate try-catch blocks, input validation, or graceful failure modes unless you explicitly ask for them.
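Two of these patterns, hardcoded secrets and happy-path-only handlers, have small and well-known mitigations. The sketch below is illustrative rather than a definitive implementation: requireEnv and asyncHandler are hypothetical helper names, and the handler shape assumes Express-style (req, res, next) middleware.

```javascript
// Hypothetical helpers; the names requireEnv and asyncHandler are illustrative.

// Read a secret from the environment and fail fast when it is missing,
// instead of falling back to a string literal committed to source control.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Wrap an async Express-style handler so that a thrown error or rejected
// promise is passed to next(err) and reaches the error-handling middleware.
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}
```

In use, a secret would be loaded once at startup with something like requireEnv('JWT_SECRET'), and each async route handler would be passed through asyncHandler before being registered. Express 4 does not forward rejected promises from async handlers on its own, which is why a wrapper of this kind is common there.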

A Real Example: Agent-Generated API Route

Here is a pattern that AI coding agents produce constantly. An API route that handles user data with no authentication, no input validation, and a SQL injection vulnerability:

// ❌ BAD - Typical AI agent output
app.post('/api/users/update', async (req, res) => {
  const { id, name, email, role } = req.body;
  // No auth check - anyone can update any user
  // No input validation - trusts client data completely
  const result = await db.query(
    `UPDATE users SET name='${name}', email='${email}', role='${role}' WHERE id=${id}`
  );
  res.json({ success: true, user: result.rows[0] });
});

This code works. It updates users. An agent will generate it, test it, confirm it functions, and move to the next task. But it has four critical vulnerabilities in nine lines: no authentication, no authorization, no input validation, and SQL injection via string interpolation.

Here is what that route should look like:

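// ✅ GOOD - Secure version with auth, validation, parameterized query
// Assumes: const { z } = require('zod'); and a requireAuth middleware
// (defined elsewhere) that rejects unauthenticated requests and sets req.user.
app.post('/api/users/update', requireAuth, async (req, res) => {
  // role is deliberately not accepted from the client
  const schema = z.object({
    id: z.string().uuid(),
    name: z.string().min(1).max(100),
    email: z.string().email(),
  });
  const parsed = schema.safeParse(req.body);
  if (!parsed.success) return res.status(400).json({ error: 'Invalid input' });

  // Users can only update their own profile
  if (parsed.data.id !== req.user.id) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  try {
    // Return only the columns the client needs, not every column in the row
    const result = await db.query(
      'UPDATE users SET name=$1, email=$2 WHERE id=$3 RETURNING id, name, email',
      [parsed.data.name, parsed.data.email, parsed.data.id]
    );
    if (result.rows.length === 0) return res.status(404).json({ error: 'User not found' });
    res.json({ success: true, user: result.rows[0] });
  } catch (err) {
    // Do not leak database error details to the client
    res.status(500).json({ error: 'Internal server error' });
  }
});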

The secure version adds authentication middleware, Zod input validation, an authorization check, and a parameterized query. It also drops role from the accepted fields, so a user cannot promote themselves by sending role in the request body. An AI agent often will not produce this version unless you spell out every one of these requirements - and even then, it may skip some.
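The secure route relies on a requireAuth middleware that is not shown. As a rough sketch of what such a middleware might look like, assuming bearer tokens in the Authorization header: makeRequireAuth and verifyToken are hypothetical names, and verifyToken stands in for whatever session or JWT check the project actually uses.

```javascript
// Hypothetical factory. verifyToken is an assumed async function that returns
// a user object for a valid token and null (or throws) for an invalid one.
function makeRequireAuth(verifyToken) {
  return async function requireAuth(req, res, next) {
    const header = (req.headers && req.headers.authorization) || '';
    const [scheme, token] = header.split(' ');
    if (scheme !== 'Bearer' || !token) {
      return res.status(401).json({ error: 'Unauthorized' });
    }

    let user = null;
    try {
      user = await verifyToken(token);
    } catch (err) {
      // Treat any verification failure as unauthenticated; do not leak details.
      user = null;
    }
    if (!user) {
      return res.status(401).json({ error: 'Unauthorized' });
    }

    // Later handlers can compare req.user.id against the record being changed.
    req.user = user;
    return next();
  };
}
```

The design choice here is to fail closed: a missing header, a malformed header, and a failed verification all produce the same 401, and the handler behind the middleware never runs.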

Agent vs. Autocomplete: The Risk Multiplier

The difference between autocomplete and agentic coding is not just speed - it is the oversight model. Here is how the risk profile changes:

Dimension	AI Autocomplete (Copilot, Tab-complete)	AI Agent (Claude Code, Devin, Cursor Agent)
Code per session	10-50 lines	500-5,000+ lines
Human review	Each suggestion individually	Final result only (if at all)
Files modified	1-3 files	10-50+ files per task
Architecture decisions	Human decides structure	Agent decides structure
Secret handling	Human places secrets	Agent may hardcode secrets
Test coverage	Human writes or skips tests	Agent usually skips tests unless asked
Dependency choices	Human selects packages	Agent selects (may hallucinate packages)
Blast radius when wrong	Localized to one function	Systemic across entire codebase

The critical difference is the last row. When autocomplete gets something wrong, the blast radius is one function. When an agent gets something wrong, the pattern repeats across every file it touches. One bad security decision becomes a codebase-wide vulnerability.

The Specific Agents and Their Patterns

Each agentic tool has its own tendencies. Claude Code tends to produce cleaner architecture but can over-engineer solutions, creating complex abstractions where a simple function would suffice. Devin produces working end-to-end implementations but frequently skips authentication middleware and input validation on API routes.

Cursor Agent mode works well within an existing codebase but makes risky assumptions about the project structure when generating new files. Windsurf generates code quickly but leans heavily on hardcoded configuration values. Platforms like Bolt, Lovable, and v0 (by Vercel) produce visually polished frontends but the backend code they scaffold - when they scaffold any at all - tends to have the weakest security posture.

Apiiro's 2025 research confirmed that AI-generated code contains security vulnerabilities at 2.74x the rate of human-written code. That research predates the widespread adoption of fully agentic workflows. The actual rate for agent-generated code, where the AI makes architecture decisions without human input, is likely higher - though no published study has isolated agentic output yet.

The common thread across all these tools is that they optimize for "does it work?" and not "is it safe?" They are trained to produce functional code, and functional code with a SQL injection vulnerability is still functional code.

How to Fix It: Scanning After Every Agent Session

The solution is not to stop using AI coding agents. They are genuinely productive tools. The solution is to treat every agent session like a pull request from a junior developer who is talented but does not think about security.

That means automated scanning after every session. Not a manual code review - nobody is going to read 50 files line by line. Automated analysis that checks for the specific patterns agents introduce: unprotected routes, hardcoded secrets, SQL injection, missing validation, god files, no error handling, hallucinated imports.
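To make "rule-based pattern detection" concrete, here is a deliberately tiny sketch of the idea. It is not how any particular scanner works: the three rules, their regular expressions, and the scanSource name are assumptions for illustration, and production tools rely on real parsers and far larger rule sets.

```javascript
// Illustrative rules only; each one is checked against a single line of text.
const RULES = [
  {
    id: 'sql-interpolation',
    // A template literal that starts with a SQL verb and contains ${...}.
    pattern: /`\s*(SELECT|INSERT|UPDATE|DELETE)\b[^`]*\$\{/i,
  },
  {
    id: 'hardcoded-secret',
    // A credential-like identifier assigned a string literal of 8+ characters.
    pattern: /\b\w*(secret|password|api_?key|token)\w*\s*[:=]\s*['"][^'"]{8,}['"]/i,
  },
  {
    id: 'route-without-middleware',
    // A mutating route whose second argument is already the handler function.
    pattern: /\b(app|router)\.(post|put|patch|delete)\(\s*['"][^'"]+['"]\s*,\s*(async\s*)?(\(|function\b)/,
  },
];

// Return one finding per (line, rule) match, with 1-based line numbers.
function scanSource(source, file = '<input>') {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ file, line: index + 1, rule: rule.id });
      }
    }
  });
  return findings;
}
```

Run against the insecure route shown earlier, these rules would flag the route registration and the interpolated UPDATE statement. Line-based regexes like these miss multi-line constructs and produce false positives, which is the gap dedicated scanners exist to close.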

Tools like VibeDoctor's Vibe Check (vibedoctor.io) run 149+ automated checks across security, performance, code quality, and AI-specific patterns - catching exactly the kinds of issues that AI coding agents introduce. Free to sign up.

The workflow should be: agent writes code, automated scanner checks it, you fix the findings, then you ship. This adds minutes to a process that otherwise takes hours of manual review. The key is that the scanner needs to understand AI-specific patterns - god files, hallucinated packages, hardcoded localhost URLs, missing error boundaries - not just traditional lint rules.
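One of the AI-specific checks mentioned above, hallucinated packages, can be partly approximated without any network access: compare what the code imports against what package.json declares. The sketch below assumes a Node.js project; findUndeclaredImports and packageNameOf are hypothetical names, and the regex-based import detection is a simplification that can be fooled by strings and comments.

```javascript
const { builtinModules } = require('node:module');

// Reduce an import specifier to its package name:
// 'lodash/fp' -> 'lodash', '@scope/pkg/sub' -> '@scope/pkg'.
function packageNameOf(specifier) {
  const parts = specifier.split('/');
  return specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// Return the sorted package names that the source imports but the given
// package.json object does not declare in dependencies or devDependencies.
function findUndeclaredImports(source, packageJson) {
  const declared = new Set([
    ...Object.keys(packageJson.dependencies || {}),
    ...Object.keys(packageJson.devDependencies || {}),
  ]);
  // Matches: from 'x', require('x'), import 'x', and import('x').
  const importPattern = /(?:\bfrom\s*|\brequire\(\s*|\bimport\s*\(?\s*)['"]([^'"]+)['"]/g;
  const missing = new Set();
  for (const match of source.matchAll(importPattern)) {
    const specifier = match[1];
    // Skip relative or absolute paths and Node built-ins.
    if (specifier.startsWith('.') || specifier.startsWith('/')) continue;
    if (specifier.startsWith('node:')) continue;
    const name = packageNameOf(specifier);
    if (builtinModules.includes(name)) continue;
    if (!declared.has(name)) missing.add(name);
  }
  return [...missing].sort();
}
```

This only shows that an import is undeclared. It does not prove that a declared package actually exists in the registry, so a fuller check would also look each name up there.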

What Happens Next

AI coding agents are only getting more autonomous. The trajectory is clear: more code generated per session, more architectural decisions delegated to the AI, and less human review of individual files. Frameworks and platforms such as Next.js and Vercel are building deeper agent integrations. The gap between "AI wrote it" and "a human verified it" is widening.

The builders who ship safely will be the ones who pair fast generation with fast verification. Let the agent write the code. Let the scanner check the code. Fix what the scanner finds. This is the new workflow - and the teams that adopt it will move faster than those who either avoid agents entirely or use them without any safety net.

The question is not whether AI agents will write your code. They already are. The question is whether anyone is checking their work before it reaches your users.

FAQ

Are AI coding agents less secure than AI autocomplete?

The vulnerabilities are the same categories - SQL injection, missing auth, hardcoded secrets - but the scale is different. Autocomplete introduces these issues one suggestion at a time with human review on each. Agents introduce them across entire codebases with minimal review. The risk per line of code is similar, but the total exposure is much higher because agents produce 10-100x more code per session.

Can I make AI agents write more secure code with better prompts?

Partially. Adding security requirements to your system prompt or project rules (like .cursorrules or CLAUDE.md) improves the output. But prompts are not enforced - the agent may still skip a requirement, particularly when the context gets complex. Prompts reduce the frequency of issues but do not eliminate them. You still need automated verification.

Which AI coding agent is the safest to use?

Claude Code tends to produce fewer security vulnerabilities in testing, particularly around input validation and secret handling. But no agent is consistently safe enough to skip scanning. The model improvements between versions are real but the gap between "functional" and "secure" has not closed for any tool. Use whichever agent fits your workflow, and scan the output regardless.

How often should I scan agent-generated code?

After every agent session that produces or modifies code. If the agent wrote 30 files, scan before you commit. The scan takes 2-5 minutes and catches issues that would take hours to find manually. For teams using agents daily, integrate scanning into your CI/CD pipeline so nothing ships without a check.
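In a pipeline, "nothing ships without a check" usually comes down to one decision: should this set of findings fail the build? A minimal sketch of that gate follows. The severity levels, the finding shape, and the shouldFailBuild name are assumptions, since each scanner defines its own output format.

```javascript
// Hypothetical severity scale; real scanners define their own levels.
const SEVERITY_RANK = { low: 1, medium: 2, high: 3, critical: 4 };

// Return true when any finding is at or above the configured threshold.
// Findings with an unrecognized severity fail the build rather than pass silently.
function shouldFailBuild(findings, failOn = 'high') {
  const threshold = SEVERITY_RANK[failOn];
  if (threshold === undefined) {
    throw new Error(`Unknown severity threshold: ${failOn}`);
  }
  return findings.some((finding) => {
    const rank = SEVERITY_RANK[finding.severity];
    return rank === undefined || rank >= threshold;
  });
}

// In a CI step, a non-zero exit code is what blocks the merge or deploy:
//   process.exitCode = shouldFailBuild(findings) ? 1 : 0;
```

Starting with a high threshold and lowering it over time lets a team adopt the gate without blocking every build on day one.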

Will AI agents eventually fix their own security issues?

Some agents are adding self-review capabilities, and future models will likely produce more secure code by default. But self-review has a fundamental limitation: the same model that introduced the vulnerability may not recognize it as a vulnerability. Independent verification from a purpose-built scanner - using rule-based pattern detection and CVE databases rather than LLM judgment - will remain necessary for the foreseeable future.