Research — Cognium

Research Hub

Advisories

Monthly category-level disclosures on AI agent vulnerabilities. Each advisory is numbered in CA-YYYY-NNN format and has a permanent URL.
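As an illustration of the numbering scheme, here is a minimal sketch of how an advisory ID could be validated and split into its parts. It assumes YYYY is a four-digit year and NNN is a zero-padded three-digit sequence number; the function name `parse_advisory_id` and the sample ID `CA-2025-001` are hypothetical and do not refer to a published advisory.

```python
import re

# Assumption: "CA-" + four-digit year + "-" + three-digit sequence number.
ADVISORY_ID = re.compile(r"CA-(\d{4})-(\d{3})")


def parse_advisory_id(advisory_id: str) -> tuple[int, int]:
    """Return (year, sequence) for an ID shaped like CA-YYYY-NNN.

    Raises ValueError if the string does not match the assumed format.
    """
    match = ADVISORY_ID.fullmatch(advisory_id)
    if match is None:
        raise ValueError(f"not a CA-YYYY-NNN identifier: {advisory_id!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical ID used only to exercise the parser.
    print(parse_advisory_id("CA-2025-001"))
```

Using `fullmatch` rather than `match` rejects IDs with trailing characters, which matters if the ID is used to build a permanent URL.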

Benchmarks

Reproducible vulnerability detection benchmarks. On CWE-Bench-Java we measured detection rates of 42.5% (SAST only) and 81.7% (SAST + LLM).

Reports

Quarterly reports on the AI agent ecosystem. Q1: Skills. Q2: Agents. Q3: OSS. Q4: Supply Chain.

Methodology

How we scan, score, and verify. Dataset documentation, evaluation harness, and limitations. arXiv preprint coming.

Artifacts

Reproducible benchmark summaries, demo Spaces, and evaluation metadata are published under CogniumHQ on Hugging Face.

SAST-Only

CWE-Bench-Java, 120 projects. Cognium SAST alone: 42.5%. CodeQL on the same dataset: 22.5%. Reproduce it yourself.

SAST + LLM

Same dataset, with an LLM verification layer on top of SAST: 81.7%, a 3.6x improvement over the CodeQL baseline of 22.5%. Repo ships Week 3.
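The arithmetic behind the 3.6x figure can be checked with a short sketch using only the percentages quoted on this page. It assumes each percentage is a fraction of the 120 projects, which is our reading of the numbers and not something the harness output confirms; the helper names are hypothetical.

```python
# Detection rates quoted on this page for CWE-Bench-Java.
TOTAL_PROJECTS = 120
RATES = {
    "CodeQL": 0.225,
    "Cognium SAST": 0.425,
    "Cognium SAST + LLM": 0.817,
}


def improvement(rate: float, baseline: float) -> float:
    """Ratio of a detection rate to the baseline rate."""
    return rate / baseline


def implied_count(rate: float, total: int = TOTAL_PROJECTS) -> int:
    """Number of projects a rate would correspond to.

    Assumption: rates are fractions of the 120 projects.
    """
    return round(rate * total)


if __name__ == "__main__":
    baseline = RATES["CodeQL"]
    for name, rate in RATES.items():
        ratio = improvement(rate, baseline)
        print(f"{name}: {rate:.1%}, ~{implied_count(rate)} of {TOTAL_PROJECTS}, "
              f"{ratio:.1f}x vs CodeQL")
```

Under that assumption the rates correspond to roughly 27, 51, and 98 of 120 projects, and 81.7 / 22.5 rounds to 3.6.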