Static analysis is the foundation. Cognium-AI runs deterministic security analysis first, then optionally asks an OpenAI-compatible model to add context, explanation, and prioritization. If the model is unavailable, slow, or disabled, static results still return.

That separation matters for teams adopting AI-generated code. The security gate should not depend on a model being perfect. The LLM layer should improve triage while the scanner keeps the baseline reliable.

<1s Static-only baseline on the sample repo
29s Fastest local LLM-enriched scan observed
10s Fast cloud LLM-enriched scan observed

What LLM enrichment adds

LLM enrichment is useful when a finding needs more surrounding context than a static rule can cheaply encode. The model can read nearby code, explain why a path matters, and help distinguish a practical issue from a theoretical warning.

  • Contextual vulnerability explanations for developers reviewing pull requests.
  • Better prioritization when several findings compete for attention.
  • Specification generation for teams using Specifica as a behavioral baseline.
  • Provider flexibility through OpenAI-compatible endpoints.

Recommended model setups

The right model depends on your operating constraint: local privacy, CI speed, cloud convenience, or maximum review depth. The table below summarizes the validated configurations from the recent comparison run.

SetupModelObserved scan timeBest fit
Ollama localllama3.2:3b~29sFast local scans, developer laptops, and CI jobs with strict time budgets.
Ollama localqwen2.5-coder:7b~64sCode-heavy repositories where a coding-specialist local model is preferred.
Ollama localphi4-mini~44sTeams standardizing on compact Microsoft-aligned models.
GitHub Models or OpenAIopenai/gpt-4o-mini~10sCloud CI where speed and operational simplicity matter.
OpenAI or Azure OpenAIgpt-4o~15sHigh-stakes review paths where deeper analysis is worth the extra cost.

The benchmark used python-vuln-demo, a five-file, 396-line Python repository with intentional vulnerabilities. Treat these timings as directional, not universal. Repository size, model host, network latency, and prompt volume will change results.

Local setup with Ollama

Use local models when code cannot leave your environment or when developers need a repeatable laptop workflow without cloud credentials.

ollama setup
# install Ollama on macOS
brew install ollama

# pull the fastest validated local model
ollama pull llama3.2:3b

# configure Cognium-AI
export LLM_ENRICHMENT_MODEL="llama3.2:3b"
export LLM_BASE_URL="http://localhost:11434/v1"
export LLM_API_KEY="ollama"

cognium-ai scan ./src

Cloud setup with GitHub Models

Use GitHub Models when the team already lives in GitHub and wants cloud-hosted LLM enrichment without running model infrastructure.

github models setup
export LLM_ENRICHMENT_MODEL="openai/gpt-4o-mini"
export LLM_BASE_URL="https://models.github.ai/inference"
export LLM_API_KEY="your_github_token"

cognium-ai scan ./src

OpenAI or Azure-compatible setup

Cognium-AI uses OpenAI-compatible configuration, so teams can point enrichment at OpenAI, Azure OpenAI, an internal gateway, or a compatible local server.

openai-compatible setup
export LLM_ENRICHMENT_MODEL="gpt-4o-mini"
export LLM_BASE_URL="https://api.openai.com/v1"
export LLM_API_KEY="sk-..."

cognium-ai scan ./src

CI workflow

Start with report-only output. Once the signal is tuned, add thresholds and exit codes for repositories that are ready for blocking gates.

.github/workflows/cognium-ai.yml
name: Cognium-AI Security Scan
on: [pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g cognium-ai
      - name: Run Cognium-AI
        env:
          LLM_ENRICHMENT_MODEL: openai/gpt-4o-mini
          LLM_BASE_URL: https://models.github.ai/inference
          LLM_API_KEY: ${{ secrets.GITHUB_TOKEN }}
        run: |
          cognium-ai scan . -f sarif -q -o cognium.sarif
          cognium-ai trust . -f json -q -o trust.json
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: cognium.sarif

Model selection checklist

Use local models when privacy or air-gapped execution is the priority. Use cloud models when speed and operational simplicity matter. Avoid reasoning-heavy models for per-file scan enrichment unless you have explicitly measured the latency and token cost.

  • Fastest local default: llama3.2:3b through Ollama.
  • Code-specialist local option: qwen2.5-coder:7b.
  • Fast cloud default: openai/gpt-4o-mini through GitHub Models or OpenAI.
  • Higher-sensitivity cloud review: gpt-4o through OpenAI or Azure OpenAI.

Commands that matter most

For the current Cognium-AI command surface, the production workflow should center on scan, trust, quality, spec generation, and spec-drift checks.

developer commands
# security scan with optional LLM enrichment
cognium-ai scan ./src

# deterministic trust and quality evidence
cognium-ai trust ./src -f json -o trust.json
cognium-ai quality ./src -f json -o quality.json

# Specifica workflow
cognium-ai generate-spec ./src --all
cognium-ai spec-diff ./src --threshold 70 --exit-code

# environment checks
cognium-ai doctor
cognium-ai version

Public artifacts

Cognium publishes reproducible benchmark summaries, demo Spaces, and evaluation metadata under CogniumHQ on Hugging Face as public artifacts become available. Use those artifacts alongside the open-source scanner and benchmark pages when evaluating model-assisted security scan enrichment.

Bottom line

LLM enrichment should make findings easier to understand, not make the security gate fragile. Cognium-AI keeps static analysis as the safety net and lets teams add local or cloud model reasoning when the workflow benefits from it.

Read developer guideRequest pilot