Static analysis is the foundation. Cognium-AI runs deterministic security analysis first, then optionally asks an OpenAI-compatible model to add context, explanation, and prioritization. If the model is unavailable, slow, or disabled, static results still return.
That separation matters for teams adopting AI-generated code. The security gate should not depend on a model being perfect. The LLM layer should improve triage while the scanner keeps the baseline reliable.
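The fallback behavior described above can be pictured with a small shell sketch. This is not Cognium-AI's implementation: `enrich_or_fallback` is a hypothetical helper, and it assumes GNU coreutils `timeout` is available. It only illustrates the idea that enrichment is optional and static results are always returned.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of Cognium-AI).
# Usage: enrich_or_fallback <static-results> <timeout-seconds> <enrichment command...>
enrich_or_fallback() {
    static_results=$1
    limit=$2
    shift 2
    # Run the enrichment command under a time limit (GNU coreutils `timeout`,
    # which can only run external commands, not shell functions).
    # A non-zero exit, a timeout, or empty output all fall back to static results.
    if enriched=$(timeout "$limit" "$@" 2>/dev/null) && [ -n "$enriched" ]; then
        printf '%s\n' "$enriched"
    else
        printf '%s\n' "$static_results"
    fi
}
```

The design choice to notice is that the enrichment step can only replace the output when it succeeds within the time limit. Every failure mode resolves to the static baseline.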
What LLM enrichment adds
LLM enrichment is useful when a finding needs more surrounding context than a static rule can cheaply encode. The model can read nearby code, explain why a path matters, and help distinguish a practical issue from a theoretical warning.
- Contextual vulnerability explanations for developers reviewing pull requests.
- Better prioritization when several findings compete for attention.
- Specification generation for teams using Specifica as a behavioral baseline.
- Provider flexibility through OpenAI-compatible endpoints.
Recommended model setups
The right model depends on your operating constraint: local privacy, CI speed, cloud convenience, or maximum review depth. The table below summarizes the validated configurations from the recent comparison run.
| Setup | Model | Observed scan time | Best fit |
|---|---|---|---|
| Ollama local | llama3.2:3b | ~29s | Fast local scans, developer laptops, and CI jobs with strict time budgets. |
| Ollama local | qwen2.5-coder:7b | ~64s | Code-heavy repositories where a coding-specialist local model is preferred. |
| Ollama local | phi4-mini | ~44s | Teams standardizing on compact Microsoft-aligned models. |
| GitHub Models or OpenAI | openai/gpt-4o-mini | ~10s | Cloud CI where speed and operational simplicity matter. |
| OpenAI or Azure OpenAI | gpt-4o | ~15s | High-stakes review paths where deeper analysis is worth the extra cost. |
The benchmark used python-vuln-demo, a five-file, 396-line Python repository with intentional vulnerabilities. Treat these timings as directional, not universal. Repository size, model host, network latency, and prompt volume will change results.
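Because the timings are directional, it may be worth measuring a scan in your own environment before choosing a model. The sketch below uses a hypothetical `time_cmd` helper that reports elapsed wall-clock seconds for any command. The scan invocation is left as a comment so the snippet runs without the CLI installed.

```shell
#!/bin/sh
# Hypothetical helper: print elapsed whole seconds for a command,
# and return that command's exit status.
time_cmd() {
    rc=0
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || rc=$?
    end=$(date +%s)
    echo $((end - start))
    return "$rc"
}

# Example (assumes cognium-ai is installed and configured):
#   time_cmd cognium-ai scan ./src
```

Running it a few times per model on a representative repository gives a more useful comparison than a single measurement.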
Local setup with Ollama
Use local models when code cannot leave your environment or when developers need a repeatable laptop workflow without cloud credentials.
    # install Ollama on macOS
    brew install ollama

    # pull the fastest validated local model
    ollama pull llama3.2:3b

    # configure Cognium-AI
    export LLM_ENRICHMENT_MODEL="llama3.2:3b"
    export LLM_BASE_URL="http://localhost:11434/v1"
    export LLM_API_KEY="ollama"

    cognium-ai scan ./src
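Before scanning, it can help to confirm that the model has actually been pulled. The sketch below is a hypothetical `model_pulled` helper. It assumes the output of `ollama list` starts with a header row and shows the model name in the first column.

```shell
#!/bin/sh
# Hypothetical helper: read `ollama list` output on stdin and succeed if the
# given model name appears in the first (NAME) column, skipping the header row.
model_pulled() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example (assumes Ollama is installed and its server is running):
#   ollama list | model_pulled "llama3.2:3b" || ollama pull "llama3.2:3b"
```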
Cloud setup with GitHub Models
Use GitHub Models when the team already lives in GitHub and wants cloud-hosted LLM enrichment without running model infrastructure.
    export LLM_ENRICHMENT_MODEL="openai/gpt-4o-mini"
    export LLM_BASE_URL="https://models.github.ai/inference"
    export LLM_API_KEY="your_github_token"

    cognium-ai scan ./src
OpenAI or Azure-compatible setup
Cognium-AI uses OpenAI-compatible configuration, so teams can point enrichment at OpenAI, Azure OpenAI, an internal gateway, or a compatible local server.
    export LLM_ENRICHMENT_MODEL="gpt-4o-mini"
    export LLM_BASE_URL="https://api.openai.com/v1"
    export LLM_API_KEY="sk-..."

    cognium-ai scan ./src
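Since all three setups share the same variables, a quick endpoint check can be written once. The sketch below assumes the endpoint follows the common OpenAI-compatible convention of serving chat requests at `<base URL>/chat/completions`. Whether Cognium-AI calls that exact route is an assumption here, and `chat_url` and `smoke_test` are hypothetical helpers, not Cognium-AI commands.

```shell
#!/bin/sh
# Hypothetical helpers for checking an OpenAI-compatible endpoint.

# Build the chat completions URL from a base URL.
chat_url() {
    # Strip one trailing slash so both ".../v1" and ".../v1/" work.
    printf '%s/chat/completions\n' "${1%/}"
}

# Send a one-message request using the same three variables shown above.
# Fails on HTTP errors (--fail) or after 30 seconds (--max-time).
smoke_test() {
    curl -sS --fail --max-time 30 "$(chat_url "$LLM_BASE_URL")" \
        -H "Authorization: Bearer $LLM_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"$LLM_ENRICHMENT_MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}"
}
```

Calling `smoke_test` after exporting the variables should return a JSON response if the endpoint, key, and model name agree. Some gateways, Azure OpenAI deployments among them, may use a different path or extra query parameters, so treat a failure as a prompt to check the provider's documentation, not as proof of a bad key.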
CI workflow
Start with report-only output. Once the signal is tuned, add thresholds and exit codes for repositories that are ready for blocking gates.
    name: Cognium-AI Security Scan
    on: [pull_request]
    # Explicit token permissions: read the repo, call GitHub Models, upload SARIF.
    permissions:
      contents: read
      models: read
      security-events: write
    jobs:
      scan:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: npm install -g cognium-ai
          - name: Run Cognium-AI
            env:
              LLM_ENRICHMENT_MODEL: openai/gpt-4o-mini
              LLM_BASE_URL: https://models.github.ai/inference
              LLM_API_KEY: ${{ secrets.GITHUB_TOKEN }}
            run: |
              cognium-ai scan . -f sarif -q -o cognium.sarif
              cognium-ai trust . -f json -q -o trust.json
          - uses: github/codeql-action/upload-sarif@v3
            with:
              sarif_file: cognium.sarif

Model selection checklist
Use local models when privacy or air-gapped execution is the priority. Use cloud models when speed and operational simplicity matter. Avoid reasoning-heavy models for per-file scan enrichment unless you have explicitly measured the latency and token cost.
- Fastest local default: llama3.2:3b through Ollama.
- Code-specialist local option: qwen2.5-coder:7b.
- Fast cloud default: openai/gpt-4o-mini through GitHub Models or OpenAI.
- Higher-sensitivity cloud review: gpt-4o through OpenAI or Azure OpenAI.
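The checklist can be encoded as a small lookup so scripts and CI jobs pick a model by constraint, not by a hard-coded name. In this sketch `pick_model` and the profile names (`local-fast`, `local-code`, `cloud-fast`, `cloud-deep`) are hypothetical labels. Only the model identifiers come from the checklist above.

```shell
#!/bin/sh
# Hypothetical helper: map an operating constraint to a model from the checklist.
pick_model() {
    case "$1" in
        local-fast) echo "llama3.2:3b" ;;
        local-code) echo "qwen2.5-coder:7b" ;;
        cloud-fast) echo "openai/gpt-4o-mini" ;;
        cloud-deep) echo "gpt-4o" ;;
        *)
            echo "unknown profile: $1" >&2
            return 1
            ;;
    esac
}

# Example:
#   LLM_ENRICHMENT_MODEL=$(pick_model cloud-fast) && export LLM_ENRICHMENT_MODEL
```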
Commands that matter most
For the current Cognium-AI command surface, the production workflow should center on scan, trust, quality, spec generation, and spec-drift checks.
    # security scan with optional LLM enrichment
    cognium-ai scan ./src

    # deterministic trust and quality evidence
    cognium-ai trust ./src -f json -o trust.json
    cognium-ai quality ./src -f json -o quality.json

    # Specifica workflow
    cognium-ai generate-spec ./src --all
    cognium-ai spec-diff ./src --threshold 70 --exit-code

    # environment checks
    cognium-ai doctor
    cognium-ai version
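When a repository is ready for a blocking gate, CI acts on exit status. The sketch below shows one possible way to turn a finding count into a pass or fail result. `gate` is a hypothetical helper, and the commented `jq` query assumes the scan output follows the standard SARIF 2.1.0 layout (`runs[].results[].level`) and that `jq` is installed on the runner.

```shell
#!/bin/sh
# Hypothetical helper: fail when a finding count exceeds an allowed budget.
gate() {
    count=$1
    budget=$2
    if [ "$count" -gt "$budget" ]; then
        echo "FAIL: $count findings exceed budget of $budget" >&2
        return 1
    fi
    echo "PASS: $count findings within budget of $budget"
}

# Example (assumes jq is installed and cognium.sarif is standard SARIF 2.1.0):
#   errors=$(jq '[.runs[].results[]? | select(.level == "error")] | length' cognium.sarif)
#   gate "$errors" 0 || exit 1
```

Starting with a generous budget and lowering it as the signal is tuned keeps the transition from report-only to blocking gradual.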
Public artifacts
Cognium publishes reproducible benchmark summaries, demo Spaces, and evaluation metadata under CogniumHQ on Hugging Face as they become available. Use those artifacts alongside the open-source scanner and benchmark pages when evaluating model-assisted security scan enrichment.
Bottom line
LLM enrichment should make findings easier to understand, not make the security gate fragile. Cognium-AI keeps static analysis as the safety net and lets teams add local or cloud model reasoning when the workflow benefits from it.