Semantic SAST + LLM Verification Benchmark

The baseline question is simple: can a developer run a scanner against real vulnerable Java projects and get useful evidence before code reaches human review?

As reported on Cognium's research pages, Cognium detects 42.5% of CVEs with SAST-only analysis on CWE-Bench-Java, compared with a 22.5% CodeQL baseline on the same dataset. Adding the LLM verification layer raises the reported detection rate to 81.7%.

Cognium SAST-only: 42.5%
SAST + LLM verification: 81.7%
CodeQL baseline: 22.5%
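The three reported rates are easier to compare as ratios and percentage-point differences. A small awk sketch that uses only the published percentages; `compare_rates` is a hypothetical helper name, and the output describes detection rates only, since the reported figures do not cover false positives.

```shell
#!/bin/sh
# compare_rates CODEQL SAST SAST_LLM
# Hypothetical helper: prints ratios and percentage-point (pp) gains
# between three detection rates given in percent.
compare_rates() {
  awk -v c="$1" -v s="$2" -v l="$3" 'BEGIN {
    printf "SAST-only vs CodeQL: %.2fx (+%.1f pp)\n", s / c, s - c
    printf "SAST+LLM vs SAST-only: %.2fx (+%.1f pp)\n", l / s, l - s
    printf "SAST+LLM vs CodeQL: %.2fx (+%.1f pp)\n", l / c, l - c
  }'
}

# Reported CVE detection rates (%) on CWE-Bench-Java.
compare_rates 22.5 42.5 81.7
```

With the reported figures, SAST-only analysis detects about 1.89x the CodeQL baseline (+20.0 points), and LLM verification roughly doubles the SAST-only rate again (1.92x, +39.2 points).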

Why developers should care

AI-generated code changes the review loop: pull requests can arrive faster than security teams can inspect them manually, and generic linters do not explain whether a finding is exploitable. Developers need findings that show the source where untrusted data enters, the path it takes through the code, the sink where it becomes dangerous, and context for remediation.

Run a local scan before opening a pull request.
Export SARIF so findings appear in GitHub code scanning.
Use LLM verification to rank findings that need human attention.
Tune source, sink, and sanitizer definitions for internal frameworks.
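The first item, a local scan before opening a pull request, can run as a git pre-push hook. This is a minimal sketch, assuming the `cognium` CLI is installed as in the benchmark workflow and that `jq` is on PATH. The hook location and "non-zero exit aborts the push" behavior are standard git; the `SCAN_OUT` variable, the function names, and the policy of blocking only on error-level findings are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch of a git pre-push hook: save as .git/hooks/pre-push and chmod +x.
# Assumptions: cognium and jq are installed; SCAN_OUT and the
# error-only blocking policy are illustrative, not Cognium defaults.

SCAN_OUT="${SCAN_OUT:-cognium.sarif}"

# Count SARIF results whose level is explicitly "error" across all runs.
# Results without a level are not counted (SARIF then falls back to the
# rule's configured level, which defaults to "warning").
sarif_error_count() {
  jq '[.runs[]?.results[]? | select(.level == "error")] | length' "$1"
}

run_prepush_scan() {
  # Same scan command as the benchmark workflow; git runs hooks from the
  # root of the working tree, so "." is the repository.
  cognium scan . --format sarif -o "$SCAN_OUT" || return 1
  errors=$(sarif_error_count "$SCAN_OUT") || return 1
  if [ "$errors" -gt 0 ]; then
    echo "pre-push: $errors error-level finding(s) in $SCAN_OUT; push blocked." >&2
    return 1
  fi
  echo "pre-push: no error-level findings."
}

# Only scan when git invokes this file as the pre-push hook.
if [ "$(basename "$0")" = "pre-push" ]; then
  run_prepush_scan || exit 1
fi
```

Blocking only on `error` keeps the hook fast to act on; warnings and notes still reach the pull request through the SARIF upload.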

Reproducible workflow

The commands below install the open-source scanner, run a scan, and feed the results into CI. Dataset and harness details will follow once the benchmark repository is published.

benchmark workflow

# install Cognium
npm install -g cognium

# scan a Java repository and emit SARIF
cognium scan . --format sarif -o cognium.sarif

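# upload SARIF in CI for pull request review: this is a GitHub Actions
# step, not a shell command, so add it to a workflow file
#   - uses: github/codeql-action/upload-sarif@v3
#     with:
#       sarif_file: cognium.sarif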
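Before uploading, it can help to print a per-level count of findings in the CI log. SARIF 2.1.0 stores findings under `runs[].results[]`, each with an optional `level`. This is a minimal sketch assuming `jq` is installed; `sarif_level_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# sarif_level_summary FILE
# Prints "level: count" lines, sorted by level name.
# Simplification: a result with no explicit "level" is counted as
# "warning" (SARIF resolves it from the rule configuration, whose
# default is "warning").
sarif_level_summary() {
  jq -r '
    [.runs[]?.results[]? | (.level // "warning")]
    | group_by(.)
    | map("\(.[0]): \(length)")
    | .[]
  ' "$1"
}
```

In CI, calling `sarif_level_summary cognium.sarif` after the scan step gives reviewers a quick count before the full findings appear in GitHub code scanning.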

How this supports AI trust verification

The SAST result is the first evidence layer. Cognium can combine the scan output with AI trust scoring, agent provenance, and skills registry evidence to decide whether an AI-generated pull request is ready for human review.

Together, these give developers concrete benchmark data and a practical path into local scanning, CI integration, and enterprise pilot evaluation.
