Forty-five percent of AI-generated code contains known security vulnerabilities at the point of submission, yet syntax correctness rates exceed 95%, according to Veracode's Spring 2026 GenAI Code Security analysis. That gap between clean syntax and dangerous behavior is exactly where existing pull request review breaks down. The code looks fine. The reviewer approves it. The vulnerability ships.
The problem isn't that teams aren't reviewing. It's that they're reviewing for the wrong signals. A 2025 arXiv study of 675 AI-generated security pull requests found 52.4% were merged despite containing embedded security vulnerabilities. These weren't obscure edge cases: they were regex flaws, injection patterns, and path traversal issues. They escaped review because human review processes are calibrated for human code failure modes, not AI code failure modes.
The Five Signals Your Review Process Isn't Built to Catch
Hardcoded credentials in the diff. AI coding agents optimize for functional output first. GitGuardian's 2026 State of Secrets Sprawl report found AI-assisted commits carry a 3.2% secret-leak rate, more than double the 1.5% baseline for human-only code. Reviewers scanning for logic, architecture, and correctness rarely run string-pattern analysis on the diff itself. Secrets hide in configuration helpers, test fixtures, and initialization code that reviewers treat as scaffolding.
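As a rough illustration of what string-pattern analysis on a diff can look like, here is a minimal sketch. The function name and the three rules are assumptions made for this example; production scanners rely on much larger rule sets and entropy checks.

```python
import re

# Illustrative rules only (assumed for this sketch); real scanners ship far more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"(?i)\b(?:password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_diff_for_secrets(diff_text):
    """Return (line_number_in_diff, rule_name) for each added line matching a rule."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Inspect added lines only; skip the '+++ b/<file>' header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line[1:]):
                findings.append((lineno, name))
    return findings
```

Scanning only added lines keeps the check focused on what the PR introduces, which is also why it catches fixtures and setup code that a reviewer might skim past.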
XSS and input sanitization gaps. Veracode's Spring 2026 data shows cross-site scripting prevention (CWE-80) has only a 15% pass rate in AI-generated code, meaning 85% of AI-written code with XSS-relevant patterns fails the check. The underlying problem is structural: XSS prevention requires tracking user input through dataflow paths and applying sanitization at the right boundary. AI models generate syntactically clean code that passes every build check while leaving injection paths open in the output layer.
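To make the "right boundary" point concrete, the sketch below contrasts two hypothetical render functions (names invented for this example). Both are syntactically valid and would pass a build; only the second escapes user input where it enters an HTML context, using the standard library's `html.escape`.

```python
import html


def render_comment_unsafe(author, body):
    # Interpolates user input straight into markup: a <script> tag in `body`
    # reaches the browser unchanged.
    return f"<p><b>{author}</b>: {body}</p>"


def render_comment(author, body):
    # Escape at the output boundary, where the data enters an HTML context.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"
```

Real applications usually get this from a templating engine with auto-escaping enabled rather than from hand-written helpers; the pair above only shows what the missing step looks like in a diff.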
Regex patterns with catastrophic backtracking. The arXiv study of AI-generated security PRs found regex efficiency issues (CWE-1333) account for 36.2% of identified vulnerabilities, the single largest category. AI agents generate regular expressions that solve the immediate matching problem without verifying whether the pattern is safe under adversarial input. Catastrophic backtracking turns a validation function into a denial-of-service vector. It produces no test failure and no linting error.
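A pattern such as `^(a+)+$` shows the failure mode: on a long run of `a` characters followed by one non-matching character, a backtracking engine like Python's `re` can take exponential time. The sketch below is a deliberately crude heuristic, assumed for illustration, that flags one common risky shape in a pattern's source text. It is not a complete ReDoS analysis.

```python
import re

# Crude heuristic: a group that contains + or * and is itself followed by + or *,
# e.g. (a+)+ or (\w+\s?)*. It misses other risky shapes (such as overlapping
# alternations like (a|a)*) and can false-positive on escaped characters like \+.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")


def looks_backtracking_prone(pattern):
    """Return True if the pattern source contains a nested unbounded quantifier."""
    return NESTED_QUANTIFIER.search(pattern) is not None
```

A check like this is cheap enough to run on every regex literal added in a diff, with flagged patterns routed to a reviewer or a dedicated analyzer.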
OS command injection and path traversal in utility code. The same study found OS command injection (CWE-78) at 13.0% and path traversal (CWE-22) at 10.3% of AI security PR vulnerabilities. Both patterns concentrate in utility functions (file processors, shell wrappers, build helpers) that reviewers often treat as implementation detail rather than security-sensitive surface. AI agents write this code fluently and quickly, which makes it look reliable.
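For reference, this is roughly what the safer form of such a utility looks like. The helper names and the `wc -l` use case are hypothetical, and the code assumes Python 3.9+ and a POSIX `wc` on the path. The two things to look for in review are the containment check on the resolved path and the argument list passed without a shell.

```python
import subprocess
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


def count_lines(base_dir, user_path):
    """Run `wc -l` on a validated path without invoking a shell."""
    target = resolve_inside(base_dir, user_path)
    # An argument list with the default shell=False means the filename is never
    # parsed by a shell. "--" stops option parsing for names starting with "-".
    result = subprocess.run(
        ["wc", "-l", "--", str(target)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.split()[0])
```

The vulnerable variants are usually one-line differences: `shell=True` with an f-string command, or `open(base + "/" + name)` with no resolution step.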
Oversized changesets that distribute risk across files. AI coding agents generate entire modules in a single commit. When a PR touches 40 files, per-file review depth drops substantially, and vulnerabilities embedded in utility modules or helper classes receive the least attention. The arXiv study identified PR content characteristics, including changeset scope, as among the strongest predictors of whether embedded security issues escaped review. A large diff doesn't just mean more code. It means each individual signal gets proportionally less scrutiny.
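Changeset scope is easy to measure mechanically. The sketch below parses the tab-separated output of `git diff --numstat` (added, deleted, path; binary files report `-` for both counts). The thresholds of 20 files and 400 changed lines are placeholder assumptions, not recommendations from the studies cited here.

```python
def changeset_stats(numstat_text):
    """Return (files_touched, lines_changed) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat_text.splitlines():
        parts = row.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        added, deleted = parts[0], parts[1]
        files += 1
        if added != "-":  # binary files report "-" for both counts
            lines += int(added) + int(deleted)
    return files, lines


def exceeds_size_policy(numstat_text, max_files=20, max_lines=400):
    """True if the changeset is larger than the (assumed) policy thresholds."""
    files, lines = changeset_stats(numstat_text)
    return files > max_files or lines > max_lines
```

In CI, the input would typically come from something like `git diff --numstat origin/main...HEAD`, with the base branch name adjusted to the repository.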
Why These Signals Consistently Escape Standard Review
Human code review is optimized for evaluating intent, architecture decisions, and logic correctness. It was never designed to run string-pattern analysis on credentials, trace user input through dataflow graphs, verify regex efficiency under adversarial load, or assess changeset-level risk distribution. These are pattern-detection and statistical problems, not architectural judgment calls.
The scale gap makes the signal problem worse. A 2026 survey of over 900 engineering leaders found 80.9% of technical teams are actively deploying AI coding agents, yet only 29% felt prepared to secure those deployments. Teams are merging AI-generated PRs at a pace that existing review capacity wasn't built to match. Manually scanning every AI-generated changeset for regex efficiency issues and injection patterns isn't sustainable; it requires a different class of check, applied automatically, at submission time. This is the core argument for signal-based PR risk scoring for AI-generated code: it fills the structural gap between what automated CI catches and what human review is calibrated to find.
What re-entry.ai Does About This
re-entry.ai scores each pull request against these five signal categories (credential exposure patterns, injection and sanitization gaps, regex complexity, path traversal indicators, and changeset size distribution) before it reaches human reviewers. Every PR receives a risk score with the specific signals flagged, so reviewer attention concentrates on high-signal PRs rather than distributing equally across the queue. Teams that adopt automated PR risk scoring as a pre-review gate convert signal detection from a reviewer responsibility into an automated check that runs at submission: consistently, on every PR, regardless of who authored it.
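To show the general shape of a signal-based score, here is a generic sketch. It is not re-entry.ai's scoring model: the signal names, the weights, and the 100-point cap are all hypothetical choices made for illustration.

```python
# Hypothetical weights for illustration only; not re-entry.ai's actual model.
SIGNAL_WEIGHTS = {
    "credential_exposure": 40,
    "injection_or_sanitization_gap": 25,
    "regex_complexity": 15,
    "path_traversal": 15,
    "oversized_changeset": 5,
}


def score_pull_request(flagged_signals):
    """Return (score capped at 100, sorted list of recognized flagged signals)."""
    recognized = sorted(s for s in set(flagged_signals) if s in SIGNAL_WEIGHTS)
    score = min(100, sum(SIGNAL_WEIGHTS[s] for s in recognized))
    return score, recognized
```

Returning the flagged signals alongside the number matters more than the exact weights, since a reviewer needs to know where to look, not just how worried to be.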
Start with a targeted retrospective: pull the last 30 AI-generated PRs your team merged and run automated scans against the five signal categories above. That snapshot will identify which pattern is your highest-frequency gap and where to focus governance effort first. Setting a pull request size policy for AI-generated code is the lowest-overhead control available immediately: it directly limits the diffuse-risk signal without requiring new tooling or process restructuring. For teams that need automated coverage across all five signal categories from day one, re-entry.ai provides risk scoring calibrated to AI coding agent output patterns without restructuring your review workflow.
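The retrospective tally can be as simple as the sketch below. It assumes scan results have already been collected into a mapping from a PR identifier to the signal categories found in that PR; the identifiers and category names in the usage example are made up.

```python
from collections import Counter


def rank_signal_gaps(findings_by_pr):
    """Count how many PRs each signal category appears in, most frequent first."""
    counts = Counter()
    for signals in findings_by_pr.values():
        counts.update(set(signals))  # count each category once per PR
    return counts.most_common()


# Usage with made-up scan results for three merged PRs:
example = {
    "pr-101": ["regex_complexity", "regex_complexity"],
    "pr-102": ["regex_complexity", "credential_exposure"],
    "pr-103": [],
}
ranking = rank_signal_gaps(example)
```

Counting per PR rather than per finding keeps one noisy changeset from dominating the ranking.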