The baseline question is simple: can a developer run a scanner against real vulnerable Java projects and get useful evidence before code reaches human review?
According to the benchmark figures published on Cognium's research pages, SAST-only analysis detects 42.5% of CVEs on CWE-Bench-Java, compared with a 22.5% CodeQL baseline on the same dataset. With the LLM verification layer added, the reported detection rate rises to 81.7%.
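To make the gap between those figures concrete, the short sketch below computes the percentage-point gain and the relative ratio against the CodeQL baseline. It only does arithmetic on the three rates reported above; the function and variable names are illustrative, not part of any Cognium tooling.

```python
def compare_rates(candidate_pct: float, baseline_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative ratio) of candidate over baseline."""
    gain = round(candidate_pct - baseline_pct, 1)
    ratio = round(candidate_pct / baseline_pct, 2)
    return gain, ratio


# Detection rates exactly as reported in the text above.
CODEQL_BASELINE = 22.5
SAST_ONLY = 42.5
WITH_LLM_VERIFICATION = 81.7

sast_gain = compare_rates(SAST_ONLY, CODEQL_BASELINE)
llm_gain = compare_rates(WITH_LLM_VERIFICATION, CODEQL_BASELINE)

print(f"SAST-only vs CodeQL: +{sast_gain[0]} points, {sast_gain[1]}x")
print(f"With LLM verification vs CodeQL: +{llm_gain[0]} points, {llm_gain[1]}x")
```

Reporting both the point gain and the ratio avoids overstating either one: a ratio alone hides how low the baseline is, and a point gain alone hides how much of the dataset is still missed.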
Why developers should care
AI-generated code changes the review loop. Pull requests can arrive faster than security teams can manually inspect them, and generic linters do not explain whether a finding is exploitable. Developers need findings that show source, path, sink, and remediation context.
- Run a local scan before opening a pull request.
- Export SARIF so findings appear in GitHub code scanning.
- Use LLM verification to rank findings that need human attention.
- Tune source, sink, and sanitizer definitions for internal frameworks.
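As a rough illustration of what "source, path, sink" context looks like in practice, the sketch below reads a SARIF 2.1.0 document and pulls out the first code flow of each result. It relies only on standard SARIF fields (`runs`, `results`, `codeFlows`, `threadFlows`, `physicalLocation`). Whether Cognium populates `codeFlows` for every finding is an assumption, so the code falls back to the reported location when no flow is present.

```python
import json
from typing import Any, Dict, List


def _location(loc: Dict[str, Any]) -> str:
    """Format a SARIF location object as 'file:line'."""
    phys = loc.get("physicalLocation", {})
    uri = phys.get("artifactLocation", {}).get("uri", "?")
    line = phys.get("region", {}).get("startLine", "?")
    return f"{uri}:{line}"


def summarize(sarif: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary dict per SARIF result, with source, sink, and path."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            steps: List[str] = []
            # Use only the first thread flow of the first code flow, if any.
            for flow in result.get("codeFlows", [])[:1]:
                for thread in flow.get("threadFlows", [])[:1]:
                    steps = [_location(step.get("location", {}))
                             for step in thread.get("locations", [])]
            reported = [_location(loc) for loc in result.get("locations", [])]
            findings.append({
                "rule": result.get("ruleId", "?"),
                "level": result.get("level", "warning"),
                "message": result.get("message", {}).get("text", ""),
                # First step of the flow is treated as the source, last as the sink.
                "source": steps[0] if steps else None,
                "sink": steps[-1] if steps else (reported[0] if reported else None),
                "path": steps,
            })
    return findings
```

A typical call would be `summarize(json.load(open("cognium.sarif")))`, after which each entry can be printed or attached to a pull request comment.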
Reproducible workflow
The article should link readers directly to commands they can run. The first version can use the open-source scanner and a CI workflow, then expand with dataset and harness details as the benchmark repository is published.
# install Cognium
npm install -g cognium

# scan a Java repository and emit SARIF
cognium scan . --format sarif -o cognium.sarif

# upload SARIF in CI for pull request review (GitHub Actions step, not a shell command)
uses: github/codeql-action/upload-sarif@v3
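A CI job usually also needs a pass or fail signal, not only an uploaded report. The sketch below is one possible gate, assuming the scan wrote `cognium.sarif` as in the commands above: it counts results by SARIF `level` and returns a non-zero exit code when error-level findings exceed a threshold. The `gate` function and its `max_errors` parameter are hypothetical helpers, not part of the Cognium CLI.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def count_levels(sarif: Dict[str, Any]) -> Counter:
    """Count SARIF results by level ('error', 'warning', 'note', 'none')."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # 'warning' is used here when a result carries no explicit level.
            counts[result.get("level", "warning")] += 1
    return counts


def gate(sarif_path: str, max_errors: int = 0) -> int:
    """Return a process exit code: 0 lets the CI job pass, 1 fails it."""
    sarif = json.loads(Path(sarif_path).read_text(encoding="utf-8"))
    counts = count_levels(sarif)
    print(f"errors={counts['error']} warnings={counts['warning']} notes={counts['note']}")
    return 1 if counts["error"] > max_errors else 0
```

In a pipeline this could be wired up as `sys.exit(gate("cognium.sarif"))` after the scan step. Keeping the threshold as a parameter lets a team start permissive and tighten it over time.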
How this supports AI trust verification
The SAST result is the first evidence layer. Cognium can combine the scan output with AI trust scoring, agent provenance, and skills registry evidence to decide whether an AI-generated pull request is ready for human review.
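The layering described above can be pictured as a simple decision function. The sketch below is purely illustrative: the `Evidence` fields, the `min_trust` threshold of 0.7, and the blocking rules are all hypothetical placeholders, since the text does not specify how Cognium weighs these signals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Evidence:
    """Hypothetical evidence bundle for one AI-generated pull request."""
    sast_errors: int           # error-level findings from the SAST scan
    trust_score: float         # assumed AI trust score in the range [0, 1]
    provenance_verified: bool  # assumed: the authoring agent could be identified
    skills_registered: bool    # assumed: the agent's skills appear in the registry


def ready_for_human_review(evidence: Evidence, min_trust: float = 0.7) -> Tuple[bool, List[str]]:
    """Return (ready, blocking reasons); the PR is ready only if nothing blocks it."""
    reasons: List[str] = []
    if evidence.sast_errors > 0:
        reasons.append(f"{evidence.sast_errors} error-level SAST finding(s)")
    if evidence.trust_score < min_trust:
        reasons.append(f"trust score {evidence.trust_score:.2f} below {min_trust:.2f}")
    if not evidence.provenance_verified:
        reasons.append("agent provenance not verified")
    if not evidence.skills_registered:
        reasons.append("agent skills not found in registry")
    return (not reasons, reasons)
```

Returning the list of reasons alongside the boolean keeps the decision explainable, which matters when a blocked pull request has to be justified to its author.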
That positioning makes the article useful for both search and conversion: developers find concrete benchmark data, then get a practical path into local scanning, CI integration, and enterprise pilot evaluation.