AI Agent Security Testing in CI/CD
Security testing for AI agents is often treated as a periodic exercise — a red team engagement every quarter, a pre-launch audit before a major release. This approach has a fundamental problem: AI agents change constantly. Model updates, new tools, prompt revisions, RAG corpus changes — each is a potential new vulnerability. Quarterly testing catches a tiny fraction of the risk surface.
The solution is continuous adversarial AI testing integrated directly into CI/CD pipelines — the same way you'd integrate unit tests or SAST scanning.
This guide covers how to do it.
Why CI/CD Integration Is Non-Negotiable for AI Agent Security
Traditional software security can rely heavily on static analysis — examining code for patterns that correlate with vulnerability classes. AI agent security cannot. The vulnerabilities are behavioral and emergent. They only manifest when the agent runs.
This means:
- Every change is potentially a new vulnerability — a prompt edit that makes the agent more helpful may also make it more susceptible to goal hijacking. A new tool that adds capability expands the tool abuse surface.
- Model updates introduce regressions — when your LLM provider updates the underlying model, your agent's behavior changes. Defense mechanisms that worked against GPT-4o may need calibration against its successor.
- Manual testing doesn't scale — running a full adversarial test suite manually before every deployment is infeasible. CI/CD integration makes it automatic.
- Compliance requires continuous evidence — NIST AI RMF and emerging AI security frameworks expect documented, ongoing security testing — not a once-a-year report.
The CI/CD Integration Architecture
┌─────────────────────────────────────────────────────┐
│ Developer pushes code / prompt / tool change        │
└────────────────────────┬────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│ CI Pipeline: Build + Unit Tests + Type Check        │
└────────────────────────┬────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│ Deploy to staging environment                       │
└────────────────────────┬────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│ FortifAI Adversarial Scan                           │
│ → 150+ payloads against staging agent endpoint      │
│ → OWASP Agentic Top 10 coverage                     │
│ → Structured JSON results                           │
└────────────────────────┬────────────────────────────┘
                         │
              ┌──────────┴──────────┐
              │                     │
   Critical/High findings     No Critical/High
              │                     │
              ▼                     ▼
     ┌─────────────────┐  ┌────────────────────┐
     │ Block deploy    │  │ Deploy to prod     │
     │ Create ticket   │  │ Security badge ✓   │
     └─────────────────┘  └────────────────────┘

Step 1: Set Up Your FortifAI Configuration
Create a fortifai.config.ts in your project root that points to your staging environment endpoint:
// fortifai.config.ts
export default {
  // Your AI agent's HTTP endpoint
  target: process.env.AGENT_STAGING_URL ?? "http://localhost:3000/api/chat",

  // HTTP method your agent uses
  method: "POST" as const,

  // Request headers (auth token, content type)
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.AGENT_INTERNAL_TOKEN}`,
  },

  // Request body shape — {{FORTIFAI_PAYLOAD}} is replaced with each test payload
  requestBody: {
    messages: [
      {
        role: "user",
        content: "{{FORTIFAI_PAYLOAD}}",
      },
    ],
  },

  // Where in the response to find the agent's reply
  responseExtractor: "choices[0].message.content",

  // Which OWASP categories to run (default: all)
  categories: ["AA1", "AA2", "AA3", "AA4", "AA5", "AA6"],

  // Fail the scan if any of these severities are found
  failOn: ["critical", "high"],
}

Step 2: GitHub Actions Integration
# .github/workflows/ai-security.yml
name: AI Agent Security Scan

on:
  push:
    branches: [main, staging]
  pull_request:
    branches: [main]

jobs:
  ai-security-scan:
    name: Adversarial AI Security Scan
    runs-on: ubuntu-latest
    needs: [build, deploy-staging]  # Run after staging deployment

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Run FortifAI Adversarial Scan
        run: npx fortifai scan
        env:
          FORTIFAI_API_KEY: ${{ secrets.FORTIFAI_API_KEY }}
          AGENT_STAGING_URL: ${{ secrets.AGENT_STAGING_URL }}
          AGENT_INTERNAL_TOKEN: ${{ secrets.AGENT_INTERNAL_TOKEN }}

      - name: Upload Security Report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: fortifai-security-report
          path: fortifai-report.json
          retention-days: 90

      - name: Post PR Comment with Findings
        if: github.event_name == 'pull_request' && failure()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = JSON.parse(fs.readFileSync('fortifai-report.json', 'utf8'));
            // Surface everything that can block the deploy (matches failOn: critical + high)
            const blocking = report.findings.filter(f => f.severity === 'critical' || f.severity === 'high');
            const body = `## ⚠️ AI Security Scan Failed\n\n` +
              `**Blocking findings (critical/high):** ${blocking.length}\n\n` +
              blocking.map(f => `- **${f.category}**: ${f.title}`).join('\n');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body
            });

Step 3: GitLab CI Integration
# .gitlab-ci.yml (AI security stage)
ai-security-scan:
  stage: security
  image: node:20-alpine
  needs: [deploy-staging]
  script:
    - npx fortifai scan
  artifacts:
    when: always
    paths:
      - fortifai-report.json
    expire_in: 90 days
  variables:
    FORTIFAI_API_KEY: $FORTIFAI_API_KEY
    AGENT_STAGING_URL: $AGENT_STAGING_URL
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

Step 4: Configure Severity Gates
The most important decision is what severity level blocks a deployment. Recommended defaults:
| Severity | Finding Type | Gate Policy |
|---|---|---|
| Critical | Agent fully complied with adversarial instruction | Block deployment |
| High | Agent showed significant vulnerability (partial compliance, data leakage) | Block deployment |
| Medium | Agent showed minor vulnerability or defense gap | Warn + create ticket |
| Low | Informational security improvement | Create ticket |
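Taken together, this table amounts to a simple exit-code policy over the scan report. As a minimal sketch, here is the blocking half of that policy; the script name and the report schema (`findings[].severity/category/title`) are assumptions for illustration, not FortifAI's documented output:

```typescript
// scripts/check-gates.ts (hypothetical) — enforce the severity-gate table.
// Report schema is assumed: an array of findings with severity/category/title.

type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  severity: Severity;
  category: string;
  title: string;
}

// Exit code implied by the gate policy: 1 blocks the deploy, 0 allows it.
function gateExitCode(
  findings: Finding[],
  blockOn: Severity[] = ["critical", "high"],
): number {
  const blocking = findings.filter((f) => blockOn.includes(f.severity));
  for (const f of blocking) {
    console.error(`BLOCK [${f.severity}] ${f.category}: ${f.title}`);
  }
  // Medium findings warn (and should open a ticket) but never block.
  for (const f of findings.filter((f) => f.severity === "medium")) {
    console.warn(`WARN  [medium] ${f.category}: ${f.title}`);
  }
  return blocking.length > 0 ? 1 : 0;
}

// Wire-up sketch in CI, after the scan step has written its report:
//   const report = JSON.parse(fs.readFileSync("fortifai-report.json", "utf8"));
//   process.exit(gateExitCode(report.findings));
```

A wrapper like this is only needed if you want gate logic beyond what failOn/warnOn express, for example per-category exemptions or ticket creation.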
Configure your gates in fortifai.config.ts:
failOn: ["critical", "high"],  // Exit code 1 on these severities
warnOn: ["medium"],            // Exit code 0, but print warnings

Step 5: Handle False Positives
Adversarial AI testing on probabilistic systems will produce some false positives — the model may behave non-deterministically, producing a "vulnerable" response on one run and a clean response on the next.
Managing this:
// fortifai.config.ts
retries: 3,           // Re-run payloads that produce findings to confirm
confirmThreshold: 2,  // Finding must appear in 2 of 3 runs to be reported

With a confirmThreshold: 2 setting, a finding only blocks deployment if it reproduces reliably, which reduces deployment blocks caused by false positives while preserving the security signal.
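The confirmation rule is just a vote count across repeated runs. A minimal sketch of that logic (the function name is illustrative, not FortifAI's API):

```typescript
// Illustrative confirmation logic: a finding is reported only if it reproduces
// in at least `confirmThreshold` of the re-runs. Each element of `runVerdicts`
// is one run's verdict for a single payload: did it produce a finding?
function isConfirmed(runVerdicts: boolean[], confirmThreshold: number): boolean {
  const positives = runVerdicts.filter(Boolean).length;
  return positives >= confirmThreshold;
}
```

The trade-off is scan time: with retries: 3, every payload that produces a finding costs up to three agent calls instead of one.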
Step 6: Shift Left — Run in Local Dev
Developers don't have to wait for CI to catch AI security regressions. FortifAI runs locally:
# Run against local dev server
AGENT_STAGING_URL=http://localhost:3000/api/chat npx fortifai scan
# Quick scan — run only Critical-severity payload categories
npx fortifai scan --categories AA1,AA2,AA6
# Watch mode — re-scan on file changes (for prompt/config development)
npx fortifai scan --watch

Adding a pre-commit hook:
# .husky/pre-commit
npx fortifai scan --quick --fail-on critical

This catches prompt injection regressions before they're even committed.
Step 7: Track Security Posture Over Time
The FortifAI dashboard aggregates scan results across deployments, showing:
- Security posture trend over time (are Critical findings increasing or decreasing?)
- Finding distribution by OWASP category
- Regression tracking (which changes introduced which findings)
- Coverage confirmation (were all OWASP categories scanned?)
Use this data to:
- Identify OWASP categories where your agent consistently struggles
- Detect when model updates cause security regressions
- Generate compliance evidence for NIST AI RMF or internal audit requirements
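If you archive the JSON reports from each pipeline run (as the CI artifacts above do), the category breakdown can also be computed locally. A sketch, assuming the same findings[] schema used elsewhere in this guide:

```typescript
// Hypothetical aggregation over archived scan reports: total findings per
// OWASP Agentic category, so a consistently weak category stands out.
interface Finding {
  severity: string;
  category: string; // e.g. "AA1" .. "AA6"
  title: string;
}

interface Report {
  findings: Finding[];
}

function categoryCounts(reports: Report[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const report of reports) {
    for (const f of report.findings) {
      counts.set(f.category, (counts.get(f.category) ?? 0) + 1);
    }
  }
  return counts;
}
```

Running this over a month of reports gives the same category distribution the dashboard shows, which is useful as raw evidence for audit requests.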
Sample Security Gate Policy for Teams
Pull Request → Staging → AI Security Scan → Production
Gate 1 (Staging gate): Block on Critical or High
Gate 2 (Production gate): Block on Critical; require CISO sign-off on High
Exemptions: High findings may be accepted with documented risk decision
from security team lead, expiring after 30 days.
Reporting: Monthly security posture report generated from scan history.

FortifAI integrates with any CI/CD system via the npx fortifai scan CLI. Set up your first pipeline scan → | Read the CLI docs →