Marker689 6984844161 feat: LLM analysis (progress indicator, rescan button, dashboard statistics)
- Added a {"status": "analyzing"} status to finding.report while LLM analysis is running
- Rescan (Retry) button under the LLM report in manual mode
- LLM statistics on the dashboard: analysed / pending
- Protection against double analysis via a per-finding asyncio.Lock
- _llm_spinner.html: spinner fragment for the analysing state
- Removed dead code: constants, i18n, CSS, queries
- Fixes: _env_int, DB indexes, UnicodeDecodeError, time.mktime, and others
- Templates: shared includes (_status_badge, _pagination)
- AGENTS.md: workflow (lint, test, commit, rebuild)
2026-05-10 09:54:04 +03:00


AGENTS.md — GuardDog Nexus

Project overview

GuardDog Nexus integrates GuardDog with Sonatype Nexus Repository Manager. It receives webhooks from Nexus, downloads packages from proxied repositories, scans them with GuardDog CLI, and stores results in SQLite. A web dashboard built with FastAPI + Jinja2 + htmx displays findings with optional LLM-based analysis.

Stack: Python 3.12, FastAPI, SQLAlchemy (async), aiosqlite, Jinja2, htmx, Docker Compose.

Package ecosystems: PyPI, Go (proxy.golang.org), npm (registry.npmjs.org).

GuardDog binary: installed inside the Docker image via uv pip install --system guarddog. Supports pypi, go, npm subcommands.


Quick start

cp .env.example .env
# edit .env to set NEXUS_PASSWORD, optionally LLM vars
make docker-up
# → guarddog-nexus :8080, Nexus :8081

For local development without Docker:

make install dev
set -a; . ./.env; set +a   # export every variable in .env (comment lines are ignored)
python -m guarddog_nexus.main
make test     # 50 tests
make lint     # ruff
make format   # ruff format + fix

Architecture

guarddog_nexus/
├── core/              # Business logic
│   ├── scanner.py     # subprocess guarddog CLI
│   ├── harvester.py   # download → sha256 → scan → store
│   ├── nexus.py       # httpx client + pypi/go/npm path extractors
│   └── llm.py         # OpenAI-compatible analysis client
├── db/                # Persistence
│   ├── engine.py      # async SQLAlchemy + auto-migration
│   ├── models.py      # Scan, Finding ORM
│   └── queries.py     # shared query builders
├── routes/            # HTTP layer
│   ├── webhooks.py    # POST /webhooks/nexus
│   ├── api_scans.py   # REST /api/v1/scans
│   ├── api_packages.py
│   ├── api_findings.py
│   ├── metrics.py     # GET /metrics (Prometheus)
│   └── web.py         # HTML UI (Jinja2 + htmx)
├── web/               # Static assets
│   ├── templates/     # Jinja2 templates
│   └── static/        # CSS, JS
├── config.py          # env-var configuration dataclass
├── constants.py       # all magic strings/limits
├── i18n.py            # RU/EN translation dictionaries
├── logging_setup.py   # JSON logging + syslog
└── main.py            # FastAPI app, middleware, lifespan

Data flow:

  1. Nexus sends UPDATED webhook → POST /webhooks/nexus
  2. webhooks.py validates signature, extracts asset info, spawns background task
  3. harvester.py downloads file, computes SHA256, deduplicates
  4. scanner.py runs guarddog <ecosystem> scan <file> --output-format json
  5. Findings stored in SQLite (scans + findings tables)
  6. If LLM_ENABLED=1, llm.py sends each finding to the configured model and stores the report in findings.report
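Steps 3 to 5 can be sketched as a single coroutine. This is a minimal illustration, not the project's harvester.py. The names harvest and ScanResult are hypothetical, as are the injected download/scan/store callables. SHA256 deduplication is omitted to keep the flow readable. The asyncio.Semaphore stands in for MAX_CONCURRENT_SCANS.

```python
import asyncio
import hashlib
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    url: str
    sha256: str
    findings: list[dict] = field(default_factory=list)


# Hypothetical callable shapes standing in for the real download/scan/store code.
Download = Callable[[str], Awaitable[bytes]]
Scan = Callable[[str, bytes], Awaitable[list[dict]]]
Store = Callable[[ScanResult], Awaitable[None]]


async def harvest(
    url: str,
    ecosystem: str,
    download: Download,
    scan: Scan,
    store: Store,
    scan_slots: asyncio.Semaphore,
) -> ScanResult:
    # Step 3: fetch the artifact and fingerprint it.
    content = await download(url)
    digest = hashlib.sha256(content).hexdigest()
    # Step 4: only as many scans as the semaphore allows run at the same time.
    async with scan_slots:
        findings = await scan(ecosystem, content)
    # Step 5: persist the scan and its findings.
    result = ScanResult(url=url, sha256=digest, findings=findings)
    await store(result)
    return result
```

Injecting the three steps keeps the orchestration testable without Nexus or the GuardDog CLI. Fakes are enough to check ordering and the concurrency bound.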

Key conventions

  • Python ≥ 3.10 — type hints with | None syntax, no Optional
  • Imports: absolute from guarddog_nexus.<module>; relative (..) within subpackages
  • Line length: 100 (ruff)
  • Lint: ruff check guarddog_nexus tests (E/F/I/W rules)
  • Format: ruff format guarddog_nexus tests
  • Tests: pytest -v (50 tests, pytest-asyncio auto mode)
  • Commits: Russian descriptions, prefix convention: feat:, fix:, refactor:, docs:, ui:
  • No comments in code unless explicitly requested

Configuration

All via environment variables, defined in config.py. Key ones:

Variable                      Default                Notes
NEXUS_URL                     http://localhost:8081
NEXUS_PASSWORD                (none)                 Required
WEBHOOK_SECRET                ""                     HMAC-SHA256 validation
MAX_CONCURRENT_SCANS          4                      asyncio.Semaphore limit for guarddog processes
LLM_ENABLED                   0                      Set to 1 to enable analysis
LLM_API_KEY                   ""                     OpenAI-compatible API key
LLM_MODEL                     gpt-4o-mini
LLM_MAX_CONCURRENT_ANALYSES   2                      Semaphore limit for LLM calls
DATABASE_PATH                 data/guarddog.db

Full list in config.py.
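Here is a minimal sketch of how a configuration dataclass might read the variables in the table. It assumes an _env_int helper like the one named in the commit log, though its signature here is a guess. The field names, the required-password check and the "1 means enabled" parsing are illustrative only; config.py remains the source of truth.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    # Unset or blank values fall back to the default; garbage fails loudly.
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


@dataclass(frozen=True)
class Config:
    nexus_url: str
    nexus_password: str
    webhook_secret: str
    max_concurrent_scans: int
    llm_enabled: bool
    llm_api_key: str
    llm_model: str
    llm_max_concurrent_analyses: int
    database_path: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        password = env.get("NEXUS_PASSWORD", "")
        if not password:
            raise RuntimeError("NEXUS_PASSWORD is required")
        return cls(
            nexus_url=env.get("NEXUS_URL", "http://localhost:8081"),
            nexus_password=password,
            webhook_secret=env.get("WEBHOOK_SECRET", ""),
            max_concurrent_scans=_env_int(env, "MAX_CONCURRENT_SCANS", 4),
            llm_enabled=env.get("LLM_ENABLED", "0") == "1",
            llm_api_key=env.get("LLM_API_KEY", ""),
            llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
            llm_max_concurrent_analyses=_env_int(env, "LLM_MAX_CONCURRENT_ANALYSES", 2),
            database_path=env.get("DATABASE_PATH", "data/guarddog.db"),
        )
```

Accepting any mapping instead of reading os.environ directly means tests can pass a plain dict.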


Database

  • SQLite via aiosqlite async driver
  • Tables: scans, findings
  • Auto-migration: _migrate() in db/engine.py adds missing columns on startup
  • Scan fields: id, package_name, package_version, ecosystem, repository, nexus_asset_url, sha256, status, total_findings, flagged, started_at, finished_at, error_message, initiator, source_ip
  • Finding fields: id, scan_id, data (JSON), report (JSON, nullable), created_at
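The auto-migration idea can be illustrated with the standard-library sqlite3 module instead of async SQLAlchemy. Each table is created if absent, its existing columns are read with PRAGMA table_info, and ALTER TABLE ... ADD COLUMN adds anything missing. The column types and the migrate_columns name are assumptions; only the column names come from the lists above.

```python
import sqlite3

# Desired columns per table. The SQL types are assumptions for illustration.
SCHEMA: dict[str, dict[str, str]] = {
    "scans": {
        "id": "INTEGER PRIMARY KEY",
        "package_name": "TEXT",
        "package_version": "TEXT",
        "ecosystem": "TEXT",
        "repository": "TEXT",
        "nexus_asset_url": "TEXT",
        "sha256": "TEXT",
        "status": "TEXT",
        "total_findings": "INTEGER",
        "flagged": "INTEGER",
        "started_at": "TEXT",
        "finished_at": "TEXT",
        "error_message": "TEXT",
        "initiator": "TEXT",
        "source_ip": "TEXT",
    },
    "findings": {
        "id": "INTEGER PRIMARY KEY",
        "scan_id": "INTEGER REFERENCES scans(id)",
        "data": "TEXT",
        "report": "TEXT",
        "created_at": "TEXT",
    },
}


def migrate_columns(conn: sqlite3.Connection) -> list[str]:
    """Create missing tables, then add missing columns. Returns the columns added."""
    added: list[str] = []
    for table, columns in SCHEMA.items():
        cols_sql = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols_sql})")
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        for name, ctype in columns.items():
            if name not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ctype}")
                added.append(f"{table}.{name}")
    conn.commit()
    return added
```

SQLite's ADD COLUMN cannot add a PRIMARY KEY or UNIQUE column. This pattern therefore only covers additive changes such as the nullable report column.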

Webhooks

Only the UPDATED action is accepted (not CREATED). The format field in the asset data determines the ecosystem: pypi, go, or npm.

Per-URL locking (asyncio.Lock) prevents parallel scans of the same asset. SHA256 dedup prevents re-scanning identical file content.
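Both guards can be combined in a small in-memory helper. This sketch rests on assumptions: ScanGate is a hypothetical name, and the real code presumably checks SHA256 values against the scans table rather than an in-process set.

```python
import asyncio
import hashlib
from collections import defaultdict
from collections.abc import Awaitable, Callable


class ScanGate:
    """Per-URL locking plus SHA256 dedup, kept in memory for illustration."""

    def __init__(self) -> None:
        # One lock per asset URL, created lazily on first access.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._seen: set[str] = set()

    async def run_once(
        self, url: str, content: bytes, scan: Callable[[bytes], Awaitable[None]]
    ) -> bool:
        """Scan content unless identical bytes were already scanned. True if scanned."""
        async with self._locks[url]:
            digest = hashlib.sha256(content).hexdigest()
            if digest in self._seen:
                return False
            await scan(content)
            self._seen.add(digest)
            return True
```

The lock dictionary grows by one entry per URL, so a long-running service would need to prune it. Two different URLs carrying identical bytes can still be scanned concurrently, because the lock is per URL and the digest check happens inside it.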


Templates (htmx)

  • Fragment templates (prefixed with _) are returned for requests that carry the HX-Request header
  • The filter bar lives outside the htmx target, so it is never replaced
  • Sortable columns use hx-get with all filter params in the URL
  • The language persists via the lang cookie, set by middleware
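The choice between fragment and full page, and the language lookup, come down to two small functions. Both names and DEFAULT_LANG are hypothetical. In a FastAPI route, request.headers and request.cookies could be passed in directly.

```python
from collections.abc import Mapping

SUPPORTED_LANGS = ("en", "ru")  # the RU/EN dictionaries in i18n.py
DEFAULT_LANG = "en"  # assumption: the project's real default may differ


def pick_template(name: str, headers: Mapping[str, str]) -> str:
    """Return the _-prefixed fragment for htmx requests, else the full page."""
    # htmx sends "HX-Request: true" on every request it issues. Keys are expected
    # lowercase here; Starlette's request.headers is case-insensitive anyway.
    is_htmx = headers.get("hx-request", "").lower() == "true"
    return f"_{name}" if is_htmx else name


def resolve_lang(cookies: Mapping[str, str]) -> str:
    """Read the lang cookie, falling back to DEFAULT_LANG for missing or unknown values."""
    lang = cookies.get("lang", "")
    return lang if lang in SUPPORTED_LANGS else DEFAULT_LANG
```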

Docker

docker compose up -d --build   # build + start
docker compose down             # stop
docker compose down -v          # stop + destroy volumes (make docker-destroy)
docker compose logs -f          # tail logs

The Dockerfile parses pyproject.toml for dependency list (single source of truth). GuardDog is installed as a separate uv pip install step.


Testing

  • make test runs pytest -v
  • Tests use in-memory SQLite (:memory:)
  • conftest.py sets up os.environ before importing the app
  • Mock guarddog output via fixtures — no real CLI execution
  • 50 tests covering: API, webhooks, harvester, scanner, web UI
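Below is a sketch of a scanner wrapper around the CLI invocation from the data-flow section. It comes with a test that replaces asyncio.create_subprocess_exec using unittest.mock, so no real guarddog process runs. run_guarddog and its timeout default are hypothetical. Treating a non-zero exit code as failure is also an assumption. The fake JSON payload is invented and does not reflect GuardDog's actual output schema.

```python
import asyncio
import json
from unittest import mock


async def run_guarddog(ecosystem: str, path: str, timeout: float = 300.0) -> dict:
    """Run `guarddog <ecosystem> scan <path> --output-format json` and parse stdout."""
    proc = await asyncio.create_subprocess_exec(
        "guarddog", ecosystem, "scan", path, "--output-format", "json",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # do not leave a runaway guarddog process behind
        await proc.wait()
        raise
    # Assumption: a non-zero exit code means the scan itself failed.
    if proc.returncode != 0:
        msg = stderr.decode(errors="replace")
        raise RuntimeError(f"guarddog exited with {proc.returncode}: {msg}")
    # errors="replace" keeps stray non-UTF-8 bytes from raising UnicodeDecodeError.
    return json.loads(stdout.decode(errors="replace"))


def test_run_guarddog_parses_json() -> None:
    # Fake process object: nothing is executed and the payload is invented.
    fake_proc = mock.Mock(returncode=0)
    fake_proc.communicate = mock.AsyncMock(return_value=(b'{"issues": 0}', b""))
    fake_exec = mock.AsyncMock(return_value=fake_proc)
    with mock.patch("asyncio.create_subprocess_exec", fake_exec):
        result = asyncio.run(run_guarddog("pypi", "/tmp/pkg.tar.gz"))
    assert result == {"issues": 0}
    assert fake_exec.await_args.args[:3] == ("guarddog", "pypi", "scan")
```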

When adding features:

  • Always run python3 -m pytest -v before committing
  • Always run ruff check guarddog_nexus tests
  • Add tests for new extractors, endpoints, edge cases

Common tasks

Add a new ecosystem:

  1. Add an extractor to the EXTRACTORS dict in core/nexus.py
  2. Add a format handler to _detect_ecosystem() in routes/webhooks.py
  3. Create proxy repo in scripts/setup-nexus.sh
  4. Add test fixture in tests/conftest.py
  5. Add tests in tests/test_nexus.py

Add a new env var:

  1. config.py → field in Config dataclass
  2. .env.example → add with default
  3. docker-compose.yml → add to environment if needed in Docker
  4. README.md → add to env table

Add a UI string (i18n):

  1. i18n.py → add to _STRINGS dict
  2. Template → {{ t('key', request.state.lang) }}
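This is a minimal sketch of the lookup behind t(). It assumes _STRINGS maps each key to per-language strings, though the real layout may be language-first instead. It also assumes English is the fallback. The two keys are made up.

```python
# Hypothetical keys; the real _STRINGS layout in i18n.py may differ.
_STRINGS: dict[str, dict[str, str]] = {
    "scans.title": {"en": "Scans", "ru": "Сканирования"},
    "findings.title": {"en": "Findings", "ru": "Находки"},
}


def t(key: str, lang: str = "en") -> str:
    """Look up key in lang, falling back to English, then to the key itself."""
    entry = _STRINGS.get(key)
    if entry is None:
        return key
    return entry.get(lang) or entry.get("en") or key
```

To make it callable from templates, it could be registered as a Jinja2 global, for example templates.env.globals["t"] = t on a FastAPI Jinja2Templates instance.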

Run a manual webhook test:

curl -X POST http://localhost:8080/webhooks/nexus \
  -H "Content-Type: application/json" \
  -d '{"action":"UPDATED","repositoryName":"pypi-proxy",
       "asset":{"format":"pypi","name":"/packages/pkg/ver/pkg-ver.tar.gz",
       "downloadUrl":"http://nexus:8081/repository/pypi-proxy/packages/pkg/ver/pkg-ver.tar.gz"}}'

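The same request can be built from Python with urllib.request, which also makes it easy to add a signature when WEBHOOK_SECRET is set. The header name below is a placeholder; check which header routes/webhooks.py actually reads. A hex HMAC-SHA256 over the raw body is assumed. build_test_webhook is a hypothetical helper.

```python
import hashlib
import hmac
import json
import urllib.request

WEBHOOK_URL = "http://localhost:8080/webhooks/nexus"
# Placeholder header name; confirm against routes/webhooks.py.
SIGNATURE_HEADER = "X-Nexus-Webhook-Signature"


def build_test_webhook(
    path: str, repository: str = "pypi-proxy", secret: str = ""
) -> urllib.request.Request:
    """Build (but do not send) the same UPDATED webhook as the curl example."""
    body = json.dumps({
        "action": "UPDATED",
        "repositoryName": repository,
        "asset": {
            "format": "pypi",
            "name": path,
            "downloadUrl": f"http://nexus:8081/repository/{repository}{path}",
        },
    }).encode()
    headers = {"Content-Type": "application/json"}
    if secret:
        headers[SIGNATURE_HEADER] = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return urllib.request.Request(WEBHOOK_URL, data=body, headers=headers, method="POST")
```

Sending it is then urllib.request.urlopen(build_test_webhook("/packages/pkg/ver/pkg-ver.tar.gz")).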
Notes

  • AI-generated code: all code in this repository was generated by an AI assistant (Claude). Review carefully before production use.
  • No Nexus Pro required: the system works with Nexus OSS. Webhooks can be triggered manually or via community plugins.
  • GuardDog deadlocks: GuardDog is CPU-intensive. Use MAX_CONCURRENT_SCANS to avoid resource exhaustion.
  • LLM may be slow: increase LLM_TIMEOUT_SECONDS for large models. Set LLM_MAX_CONCURRENT_ANALYSES to limit parallel requests.
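According to the commit log, analysis uses a per-finding asyncio.Lock and marks a finding with {"status": "analyzing"} while the model runs. The sketch below shows one way to combine that with LLM_MAX_CONCURRENT_ANALYSES and LLM_TIMEOUT_SECONDS. LLMAnalyzer, the injected analyze callable and the shape of the timeout report are all hypothetical.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable

# Hypothetical model call: takes the finding's data, returns a report dict.
Analyze = Callable[[dict], Awaitable[dict]]


class LLMAnalyzer:
    def __init__(self, analyze: Analyze, max_concurrent: int = 2, timeout: float = 60.0) -> None:
        self._analyze = analyze
        self._slots = asyncio.Semaphore(max_concurrent)  # LLM_MAX_CONCURRENT_ANALYSES
        self._timeout = timeout  # LLM_TIMEOUT_SECONDS
        self._locks: defaultdict[int, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def analyze(self, finding: dict) -> dict:
        lock = self._locks[finding["id"]]
        if lock.locked():
            # Already being analysed: return the current placeholder, don't start twice.
            return finding["report"]
        async with lock:
            finding["report"] = {"status": "analyzing"}
            try:
                async with self._slots:
                    report = await asyncio.wait_for(
                        self._analyze(finding["data"]), self._timeout
                    )
            except asyncio.TimeoutError:
                report = {"status": "error", "error": "timeout"}  # hypothetical shape
            finding["report"] = report
            return report
```

The locked() check and the acquire happen with no await in between, so within one event loop two callers cannot both start an analysis of the same finding.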

Workflow

After every change — follow these steps in order:

  1. Document — update AGENTS.md if the change introduces a new concept, env var, endpoint, or workflow.
  2. Lint: ruff check guarddog_nexus && ruff format guarddog_nexus
  3. Test: python3 -m pytest -v (must pass 100%)
  4. Commit — use the existing commit prefix convention (feat:, fix:, refactor:, docs:, ui:).
  5. Rebuild: docker compose up -d --build to deploy changes.
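Steps 2 and 3 can be chained so that a lint failure stops the run before the tests start. Here is a small sketch using subprocess. The command list mirrors the steps above, and run_checks is a hypothetical helper, not part of the repository.

```python
import subprocess
import sys

# Steps 2 and 3 of the workflow; adjust the commands if your setup differs.
CHECKS: list[list[str]] = [
    ["ruff", "check", "guarddog_nexus"],
    ["ruff", "format", "guarddog_nexus"],
    [sys.executable, "-m", "pytest", "-v"],
]


def run_checks(checks: list[list[str]]) -> bool:
    """Run each command in order and stop at the first failure. True if all pass."""
    for cmd in checks:
        print("$", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            print("failed:", cmd[0], file=sys.stderr)
            return False
    return True
```

For example, run_checks(CHECKS) returns False as soon as ruff or pytest exits non-zero, which is the point at which to fix the issue rather than commit.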