5.5 KiB
GuardDog Nexus - Improvement Plan
Status Check: Completed Analysis
After thoroughly reviewing the codebase, I've identified several areas for improvement. Below is the prioritized plan.
Priority 1: Critical Bug Fixes
1.1 Fix Harvester Race Condition (T4)
File: guarddog_nexus/core/harvester.py
Issue: Lines 56-81 have a race condition where the URL lock cleanup happens in a finally block while the lock check happens before acquisition.
Current problematic flow:
# Line 56-65: Check if locked
async with _url_lock:
if download_url not in _url_locks:
_url_locks[download_url] = asyncio.Lock()
lock = _url_locks[download_url]
if lock.locked():
# Skip...
async with _url_lock:
_url_locks.pop(download_url, None) # Line 64
return None
async with lock:
# ... work ...
finally:
async with _url_lock:
_url_locks.pop(download_url, None) # Line 81
Fix: Ensure lock cleanup only happens after successful work completion, not on early returns.
1.2 Fix LLM Response Parsing (T4)
File: guarddog_nexus/core/llm.py
Issue: Line 81 assumes body["choices"][0]["message"]["content"] exists without validation.
Fix: Add proper error handling:
try:
choices = body.get("choices", [])
if not choices:
raise ValueError("No choices in response")
message = choices[0].get("message", {})
content = message.get("content", "")
return json.loads(content)
except (KeyError, IndexError, json.JSONDecodeError) as e:
...
Priority 2: Reliability Improvements
2.1 Add LLM Retry Logic (T4)
File: guarddog_nexus/core/llm.py
Issue: Failed LLM calls have no retry mechanism.
Fix: Add exponential backoff retry:
async def analyze_finding(finding_data: dict, max_retries: int = 3) -> dict | None:
for attempt in range(max_retries):
result = await _attempt_analysis(finding_data)
if result:
return result
if attempt < max_retries - 1:
await asyncio.sleep(2 ** attempt)
return None
2.2 Add Dependency Health Checks (T4)
File: guarddog_nexus/main.py
Issue: No health checks for database or external dependencies.
Fix: Add /health/dependencies endpoint:
@app.get("/health/dependencies")
async def health_dependencies():
checks = {
"database": await _check_db_health(),
"nexus": await _check_nexus_connectivity(),
}
status = 200 if all(checks.values()) else 503
return JSONResponse(status_code=status, content=checks)
2.3 Fix Lock Cleanup (T4)
Files: guarddog_nexus/core/harvester.py, guarddog_nexus/routes/web.py
Issue: _url_locks and _llm_locks dicts grow indefinitely.
Fix: Add periodic cleanup using asyncio.create_task():
async def _cleanup_locks():
while True:
await asyncio.sleep(3600) # Every hour
for key in list(_url_locks.keys()):
if not _url_locks[key].locked():
_url_locks.pop(key, None)
Priority 3: Code Quality
3.1 Add Type Hints Consistency (T1)
Files: Multiple files
Issue: Inconsistent use of dict vs Dict[str, Any] type hints.
3.2 Add Input Validation (T4)
File: guarddog_nexus/routes/webhooks.py
Issue: Limited validation of webhook payload structure.
Fix: Add validation for required fields before processing.
3.3 Add Logging for Security Events (T4)
Files: guarddog_nexus/core/harvester.py, guarddog_nexus/routes/webhooks.py
Issue: Security-related events not logged at appropriate levels.
Fix: Add WARNING/CRITICAL logging for:
- Failed authentication attempts
- Suspicious package patterns
- Rate limiting triggers
Priority 4: Feature Enhancements
4.1 Add Scan Status Tracking (T4)
File: guarddog_nexus/core/harvester.py
Issue: No visibility into scan progress for long-running packages.
Fix: Add intermediate status updates via WebSocket or polling endpoint.
4.2 Add Configuration Validation on Startup (T1)
File: guarddog_nexus/config.py
Issue: Invalid configurations discovered only at runtime.
Fix: Add validation in Config.__post_init__():
def __post_init__(self):
if not self.nexus_password:
raise ValueError("NEXUS_PASSWORD is required")
if self.llm_enabled and not self.llm_api_key:
raise ValueError("LLM_API_KEY required when LLM_ENABLED=1")
4.3 Add CSV Export with Filters (T4)
Files: guarddog_nexus/routes/api_scans.py, guarddog_nexus/routes/api_packages.py
Issue: CSV exports don't support all filter options available in API.
Fix: Sync filter parameters between API and CSV export endpoints.
Implementation Order
- Week 1: Priority 1 fixes (race condition, LLM parsing)
- Week 2: Priority 2 improvements (retries, health checks, lock cleanup)
- Week 3: Priority 3 code quality (type hints, validation, logging)
- Week 4: Priority 4 features (status tracking, config validation, CSV filters)
Verification Steps
After each change:
- Run
ruff check guarddog_nexus tests - Run
python3 -m pytest -v(must pass 85 tests) - Run
ruff format guarddog_nexus tests - Manual testing with Docker Compose
Risk Assessment
| Change | Risk Level | Mitigation |
|---|---|---|
| Harvester race condition fix | MEDIUM | Thorough concurrent testing |
| LLM retry logic | LOW | Ensure idempotency |
| Health checks | LOW | Graceful degradation |
| Lock cleanup | LOW | Conservative cleanup intervals |