AGENTS.md — GuardDog Nexus
Project overview
GuardDog Nexus integrates GuardDog with Sonatype Nexus Repository Manager. It receives webhooks from Nexus, downloads packages from proxied repositories, scans them with GuardDog CLI, and stores results in SQLite. A web dashboard built with FastAPI + Jinja2 + htmx displays findings with optional LLM-based analysis.
Stack: Python 3.12, FastAPI, SQLAlchemy (async), aiosqlite, Jinja2, htmx, Docker Compose.
Package ecosystems: PyPI, Go (proxy.golang.org), npm (registry.npmjs.org).
GuardDog binary: installed inside the Docker image via uv pip install --system guarddog. Supports pypi, go, npm subcommands.
Quick start
cp .env.example .env
# edit .env to set LLM vars if needed
make docker-up
# → guarddog-nexus :8080, Nexus :8081
For local development without Docker:
make install dev
export $(cat .env | xargs)
python -m guarddog_nexus.main
make test # 85 tests
make lint # ruff
make format # ruff format + fix
Architecture
guarddog_nexus/
├── core/ # Business logic
│ ├── scanner.py # subprocess guarddog CLI
│ ├── harvester.py # download → sha256 → scan → store
│ ├── nexus.py # httpx client + pypi/go/npm path extractors
│ └── llm.py # OpenAI-compatible analysis client
├── db/ # Persistence
│ ├── engine.py # async SQLAlchemy + auto-migration
│ ├── models.py # Scan, Finding ORM
│ └── queries.py # shared query builders
├── routes/ # HTTP layer
│ ├── webhooks.py # POST /webhooks/nexus
│ ├── api_scans.py # REST /api/v1/scans
│ ├── api_packages.py
│ ├── api_findings.py
│ ├── metrics.py # GET /metrics (Prometheus)
│ └── web.py # HTML UI (Jinja2 + htmx)
├── web/ # Static assets
│ ├── templates/ # Jinja2 templates
│ └── static/ # CSS, JS
├── config.py # env-var configuration dataclass
├── constants.py # all magic strings/limits
├── i18n.py # RU/EN translation dictionaries
├── logging_setup.py # JSON logging + syslog
└── main.py # FastAPI app, middleware, lifespan
Data flow:
- Nexus sends `UPDATED` webhook → `POST /webhooks/nexus`
- `webhooks.py` validates signature, extracts asset info, spawns background task
- `harvester.py` downloads file (async via `asyncio.to_thread`), computes SHA256, deduplicates
- `scanner.py` runs `guarddog <ecosystem> scan <file> --output-format json`
- Findings stored in SQLite (`scans` + `findings` tables)
- If `LLM_ENABLED=1` and `LLM_AUTO_ANALYZE=1`, `llm.py` sends each finding to the configured model. `finding.report` state machine: `None` → `{"status": "analyzing"}` → `{verdict, summary, analysis, severity_rating}` or `None` on failure.
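The scanner step in the flow above can be sketched roughly as follows. This is a minimal illustration, not the contents of `scanner.py`: the function names `build_scan_command` and `run_scan`, the `SUPPORTED_ECOSYSTEMS` set, and the error handling are assumptions. Only the command shape and the semaphore idea come from this document.

```python
import asyncio
import json

# Ecosystems named in this document; the constant name is hypothetical.
SUPPORTED_ECOSYSTEMS = {"pypi", "go", "npm"}


def build_scan_command(ecosystem: str, file_path: str) -> list[str]:
    """Build `guarddog <ecosystem> scan <file> --output-format json` as an argv list."""
    if ecosystem not in SUPPORTED_ECOSYSTEMS:
        raise ValueError(f"unsupported ecosystem: {ecosystem}")
    return ["guarddog", ecosystem, "scan", file_path, "--output-format", "json"]


async def run_scan(ecosystem: str, file_path: str, limiter: asyncio.Semaphore) -> dict:
    """Run the CLI without blocking the event loop, bounded by a semaphore."""
    async with limiter:  # e.g. asyncio.Semaphore(MAX_CONCURRENT_SCANS)
        proc = await asyncio.create_subprocess_exec(
            *build_scan_command(ecosystem, file_path),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        # Assumption: a non-zero exit code is treated as a scan failure.
        raise RuntimeError(stderr.decode(errors="replace"))
    return json.loads(stdout)
```

Passing the arguments as a list avoids shell interpolation of package file names.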
Key conventions
- Python ≥ 3.10: type hints with `| None` syntax, no `Optional`
- Imports: absolute from `guarddog_nexus.<module>`; relative (`..`) within subpackages
- Line length: 100 (ruff)
- Lint: `ruff check guarddog_nexus tests` (E/F/I/W rules)
- Format: `ruff format guarddog_nexus tests`
- Tests: `pytest -v` (85 tests, pytest-asyncio auto mode)
- Commits: Russian descriptions, prefix convention: `feat:`, `fix:`, `refactor:`, `docs:`, `ui:`
- No comments in code unless explicitly requested
- Async I/O: file reads/writes wrapped in `asyncio.to_thread()`; never raw `open()` in async context
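As an illustration of the async I/O convention, a blocking file read can be pushed to a worker thread like this. The helper names `_sha256_sync` and `sha256_of_file` are hypothetical and not taken from the codebase. The comments are explanatory only; project code follows the no-comments rule above.

```python
import asyncio
import hashlib
from pathlib import Path


def _sha256_sync(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Blocking helper: hash the file in chunks so large packages stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


async def sha256_of_file(path: Path) -> str:
    """Async wrapper: the blocking open()/read() runs in a thread, not on the event loop."""
    return await asyncio.to_thread(_sha256_sync, path)
```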
Configuration
All via environment variables, defined in config.py. Key ones:
| Variable | Default | Notes |
|---|---|---|
| `NEXUS_URL` | `http://localhost:8081` | |
| `NEXUS_ALLOWED_HOSTS` | host from `NEXUS_URL` | comma-separated, SSRF protection |
| `WEBHOOK_SECRET` | `""` | HMAC-SHA256 validation |
| `MAX_CONCURRENT_SCANS` | `4` | `asyncio.Semaphore` for guarddog processes |
| `LLM_ENABLED` | `0` | `1` to enable analysis |
| `LLM_AUTO_ANALYZE` | `0` | `1` to auto-trigger after scan; `0` = manual via UI button |
| `LLM_API_KEY` | `""` | OpenAI-compatible key |
| `LLM_MODEL` | `gpt-4o-mini` | |
| `LLM_MAX_CONCURRENT_ANALYSES` | `2` | Semaphore for LLM calls |
| `DATABASE_PATH` | `data/guarddog.db` | |

Full list in config.py.
Database
- SQLite via `aiosqlite` async driver
- Tables: `scans`, `findings`
- Auto-migration in `db/engine.py`: `_migrate()` adds missing columns on startup
- Indexes created in `_ensure_indexes()`: `scans.status`, `scans.sha256`, `scans.package_name`, `scans.package_version`, `scans.flagged`, `scans.nexus_asset_url`, `findings.scan_id`
- `Scan` fields: id, package_name, package_version, ecosystem, repository, nexus_asset_url, sha256, status, total_findings, flagged, started_at, finished_at, error_message, initiator, source_ip
- `Finding` fields: id, scan_id, data (JSON), report (JSON, nullable), created_at
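The "add missing columns on startup" idea can be sketched with the standard-library `sqlite3` module. This is a synchronous simplification for illustration only: the project uses async SQLAlchemy, and the `EXPECTED_COLUMNS` mapping, the column types, and the `migrate` function below are assumptions, not the actual `_migrate()` implementation.

```python
import sqlite3

# Hypothetical declaration of columns that must exist, keyed by table name.
EXPECTED_COLUMNS = {
    "scans": {"initiator": "TEXT", "source_ip": "TEXT"},
}


def migrate(conn: sqlite3.Connection, expected: dict = EXPECTED_COLUMNS) -> list[str]:
    """Add any expected column that the live table lacks; return what was added."""
    added = []
    for table, columns in expected.items():
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        for name, sql_type in columns.items():
            if name not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
                added.append(f"{table}.{name}")
    return added
```

Running it a second time is a no-op, which is what makes it safe to call on every startup.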
LLM analysis
finding.report drives UI state:
| Value | UI |
|---|---|
| `None` | Show "Analyze with LLM" button (manual mode only) |
| `{"status": "analyzing"}` | Show spinner |
| `{verdict:, summary:, ...}` | Show report + "Retry" link |

Auto mode (LLM_AUTO_ANALYZE=1): analysis runs immediately after scan; button hidden.
Manual mode (LLM_AUTO_ANALYZE=0): user clicks button; button visible for each finding.
Per-finding asyncio.Lock in web.py prevents concurrent analysis of the same finding. Retry passes ?retry=1 to bypass the idempotency guard.
Webhooks
Only UPDATED action is accepted (not CREATED). Format field in asset data determines ecosystem: pypi, go, npm.
Per-URL locking (asyncio.Lock) prevents parallel scans of the same asset. SHA256 dedup prevents re-scanning identical file content.
Templates (htmx)
- Fragment templates prefixed with `_` are returned for `HX-Request` requests
- Filter-bar lives outside htmx target; never replaced
- Sortable columns use `hx-get` with all filter params in URL
- Language persists via cookie `lang`, set by middleware
- Shared includes: `_status_badge.html` (scan status), `_pagination.html` (page nav), `_llm_spinner.html` (LLM progress)
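The fragment-vs-full-page choice can be illustrated with a small helper. The `pick_template` function and the `scans` page name are hypothetical; the only facts relied on are that htmx sends an `HX-Request: true` header and that fragment templates here use a `_` prefix.

```python
def pick_template(page: str, headers: dict[str, str]) -> str:
    """Return the `_`-prefixed fragment for htmx requests, the full page otherwise."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    is_htmx = normalized.get("hx-request", "").lower() == "true"
    return f"_{page}.html" if is_htmx else f"{page}.html"
```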
Docker
docker compose up -d --build # build + start
docker compose down # stop
docker compose down -v # stop + destroy volumes (make docker-destroy)
docker compose logs -f # tail logs
The Dockerfile parses pyproject.toml for dependency list (single source of truth). GuardDog is installed as a separate uv pip install step.
Testing
- `make test` runs `pytest -v`
- Tests use in-memory SQLite (`:memory:`)
- `conftest.py` sets up `os.environ` before importing the app
- Mock `guarddog` output via fixtures; no real CLI execution
- 85 tests covering: API, webhooks, harvester, scanner, web UI
When adding features:
- Always run `python3 -m pytest -v` before committing
- Always run `ruff check guarddog_nexus tests`
- Add tests for new extractors, endpoints, edge cases
Common tasks
Add a new ecosystem:
- Add extractor in `core/nexus.py` → `EXTRACTORS` dict
- Add format handler in `routes/webhooks.py` → `_detect_ecosystem()`
- Create proxy repo in `scripts/setup-nexus.sh`
- Add test fixture in `tests/conftest.py`
- Add tests in `tests/test_nexus.py`
Add a new env var:
- `config.py` → field in Config dataclass
- `.env.example` → add with default
- `docker-compose.yml` → add to environment if needed in Docker
- `README.md` → add to env table
Add a UI string (i18n):
- `i18n.py` → add to `_STRINGS` dict
- Template → `{{ t('key', request.state.lang) }}`
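A sketch of what a lookup behind `t('key', lang)` could look like. The layout of `_STRINGS` (language code first, then key), the sample key, and the fallback order are assumptions for illustration; the actual structure in `i18n.py` may differ.

```python
# Hypothetical layout: language code -> key -> translated string.
_STRINGS = {
    "en": {"scans.title": "Scans"},
    "ru": {"scans.title": "Сканирования"},
}


def t(key: str, lang: str = "en") -> str:
    """Look up `key` for `lang`, falling back to English, then to the key itself."""
    table = _STRINGS.get(lang, _STRINGS["en"])
    return table.get(key, _STRINGS["en"].get(key, key))
```

Falling back to the key itself keeps a missing translation visible in the UI instead of raising during template rendering.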
Run a manual webhook test:
curl -X POST http://localhost:8080/webhooks/nexus \
-H "Content-Type: application/json" \
-d '{"action":"UPDATED","repositoryName":"pypi-proxy",
"asset":{"format":"pypi","name":"/packages/pkg/ver/pkg-ver.tar.gz",
"downloadUrl":"http://nexus:8081/repository/pypi-proxy/packages/pkg/ver/pkg-ver.tar.gz"}}'
Notes
- AI-generated code: all code in this repository was generated by an AI assistant (Claude). Review carefully before production use.
- No Nexus Pro required: the system works with Nexus OSS. Webhooks can be triggered manually or via community plugins.
- GuardDog deadlocks: GuardDog is CPU-intensive. Use `MAX_CONCURRENT_SCANS` to avoid resource exhaustion.
- LLM may be slow: increase `LLM_TIMEOUT_SECONDS` for large models. Set `LLM_MAX_CONCURRENT_ANALYSES` to limit parallel requests.
Workflow
Workflow — MANDATORY after completing a feature or session
Before responding to the user, you MUST complete ALL of:
- Lint: `ruff check guarddog_nexus tests` (must pass) + `ruff format guarddog_nexus tests`
- Test: `python3 -m pytest -v` (must pass 100%)
- Commit: `git add -A && git commit -m "prefix: description"` using the existing prefix convention (`feat:`, `fix:`, `refactor:`, `docs:`, `ui:`)
- Rebuild: `docker compose up -d --build`
- Document: update `AGENTS.md` if the change introduces a new concept, env var, endpoint, or workflow
If you skip any of these, the user will need to do them manually. Do NOT skip commit and rebuild.
These steps must be executed sequentially — lint before test, test before commit, commit before rebuild.