Test Performance Metrics¶

This document tracks test performance baselines and CI parallelization configuration for the project.

Last updated: 2026-01-05
Related issue: NEM-1375

Test Suite Overview¶

Suite	Test Count	Parallelization	CI Configuration
Backend Unit	8,229	pytest-xdist (`-n auto --dist=worksteal`)	1 job, parallel workers
Backend Integration	1,556	Serial (`-n0`) - 4 domain shards	4 parallel jobs (API, WebSocket, Services, Models)
Frontend Vitest	2,000+	Thread pool	8 shards
Frontend E2E (Chromium)	500+	Playwright workers	4 shards
Frontend E2E (Firefox/WebKit)	500+	Playwright workers	1 job each (non-blocking)

CI Parallelization Strategy¶

Backend Tests¶

Unit Tests¶

Configuration: Single job with pytest-xdist
Flags: -n auto --dist=worksteal
Rationale: Worksteal scheduling lets idle workers take queued tests from busier workers, which balances uneven test durations
Expected runtime: ~60-90 seconds

Integration Tests (4 Domain Shards)¶

Integration tests are sharded by domain to maximize parallelism while avoiding database contention:

Shard	Job Name	Test Domains	Filter Pattern
1	`integration-tests-api`	API routes	`test__api or test_api_`
2	`integration-tests-websocket`	WebSocket/PubSub	`test_websocket or test__broadcaster or test__pubsub`
3	`integration-tests-services`	Business logic	`test__integration or test_batch or test_detector*`
4	`integration-tests-models`	Database models	`test_models or test_database or test_baseline*`

Configuration: Serial execution within each shard (-n0)
Rationale: Avoids database deadlocks from concurrent schema creation
Expected runtime per shard: ~60-90 seconds (shards run in parallel)
Total wall-clock time: ~2-3 minutes (down from ~6-8 minutes serial)

Frontend Tests¶

Vitest (Component/Hook Tests)¶

Configuration: 8-shard matrix strategy
CI command: npx vitest run --shard=${{ matrix.shard }}/8
Timeout: 3 minutes per shard
Coverage: Collected per-shard, merged in frontend-coverage-merge job
Expected runtime per shard: ~30-60 seconds
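
To make the matrix expansion concrete, the following minimal sketch lists the command each shard job would run. It assumes `matrix.shard` takes the values 1 through 8; the helper name `shard_commands` is hypothetical and not part of the CI workflow.

```python
SHARD_COUNT = 8  # from the 8-shard matrix strategy above


def shard_commands(total: int = SHARD_COUNT) -> list[str]:
    """Return one Vitest command per shard, using 1-based shard indexes."""
    return [f"npx vitest run --shard={index}/{total}" for index in range(1, total + 1)]


for command in shard_commands():
    print(command)
```

Each shard runs a disjoint slice of the test files, which is why coverage has to be merged afterwards in a separate job.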

Playwright E2E Tests¶

Primary (Chromium): 4 shards with --shard=1/4 through --shard=4/4
Secondary (Firefox/WebKit): 1 job each, non-blocking (continue-on-error: true)
Workers: 4 parallel workers in CI
Retries: 2 per test in CI, to absorb flaky failures
Expected runtime per shard: ~2-4 minutes

Performance Thresholds¶

The scripts/audit-test-durations.py script enforces these thresholds:

Test Category	Max Duration	Warning at (80%)
Unit tests	1.0s	0.8s
Integration tests	5.0s	4.0s
E2E tests	5.0s	4.0s
Known slow tests	60.0s	48.0s
Benchmarks	Excluded	N/A
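
The table can be read as a simple three-way classification per test. The sketch below is an illustration, not the logic of scripts/audit-test-durations.py: the category keys, the `classify` helper, and the exact boundary handling (fail only above the maximum, warn at or above 80%) are assumptions.

```python
# Maximum durations in seconds, copied from the table above.
THRESHOLDS = {
    "unit": 1.0,
    "integration": 5.0,
    "e2e": 5.0,
    "known_slow": 60.0,
}
WARNING_RATIO = 0.8  # warn once a test reaches 80% of its maximum


def classify(duration: float, category: str) -> str:
    """Return "fail", "warn", or "ok" for a single test duration."""
    limit = THRESHOLDS[category]
    if duration > limit:  # assumption: a test exactly at the limit still passes
        return "fail"
    if duration >= limit * WARNING_RATIO:
        return "warn"
    return "ok"


print(classify(0.5, "unit"))
print(classify(4.2, "integration"))
print(classify(61.0, "known_slow"))
```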

Known Slow Test Patterns¶

Tests matching these patterns use the extended 60s threshold:

Pipeline worker manager tests (real stop timeouts)
Health monitor subprocess tests
System broadcaster reconnection tests
Property-based tests with text generation
TLS certificate generation (RSA key gen)
E2E error state tests (API retry exhaustion)
Accessibility tests (axe-core analysis)
ReID service retry/backoff tests

See scripts/audit-test-durations.py for the full pattern list.
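
As a rough illustration of how pattern matching could select the extended threshold, consider the sketch below. The regular expressions and the `threshold_for` helper are hypothetical; the real patterns live in scripts/audit-test-durations.py and may look quite different.

```python
import re

DEFAULT_THRESHOLD = 1.0      # unit-test maximum, in seconds
KNOWN_SLOW_THRESHOLD = 60.0  # extended maximum for known slow tests

# Hypothetical patterns, loosely modeled on the categories listed above.
KNOWN_SLOW_PATTERNS = [
    re.compile(r"pipeline_worker_manager"),
    re.compile(r"health_monitor.*subprocess"),
    re.compile(r"tls.*cert"),
]


def threshold_for(test_id: str, default: float = DEFAULT_THRESHOLD) -> float:
    """Return the extended threshold if the test ID matches a known slow pattern."""
    if any(pattern.search(test_id) for pattern in KNOWN_SLOW_PATTERNS):
        return KNOWN_SLOW_THRESHOLD
    return default


print(threshold_for("tests/unit/test_pipeline_worker_manager.py::test_stop_timeout"))
print(threshold_for("tests/unit/test_api.py::test_list_events"))
```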

Baseline Metrics (2026-01-05)¶

Test Counts (Minimum Thresholds)¶

These baselines prevent accidental test deletion:

Suite	Minimum Count	Current Count
Backend Unit	2,900	8,229
Backend Integration	600	1,556
Frontend Test Files	50	135
Frontend E2E Specs	15	26
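
The check behind these minimums can be sketched as a comparison of counted tests against the table. This is a minimal illustration using the values above; the `count_violations` helper and the choice to treat a missing suite as a count of zero are assumptions, not the behavior of the actual CI job.

```python
# Minimum counts copied from the table above.
MINIMUM_COUNTS = {
    "Backend Unit": 2900,
    "Backend Integration": 600,
    "Frontend Test Files": 50,
    "Frontend E2E Specs": 15,
}


def count_violations(current: dict[str, int]) -> list[str]:
    """Return one message per suite whose count fell below its minimum."""
    violations = []
    for suite, minimum in MINIMUM_COUNTS.items():
        found = current.get(suite, 0)  # assumption: a missing suite counts as zero
        if found < minimum:
            violations.append(f"{suite}: found {found}, expected at least {minimum}")
    return violations


current_counts = {
    "Backend Unit": 8229,
    "Backend Integration": 1556,
    "Frontend Test Files": 135,
    "Frontend E2E Specs": 26,
}
print(count_violations(current_counts))
```

With the current counts every suite sits well above its minimum, so the list of violations is empty.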

Expected CI Duration¶

Stage	Expected Duration	Notes
Lint + Typecheck	~1-2 min	Runs in parallel
Backend Unit Tests	~1-2 min	pytest-xdist parallel
Backend Integration Tests	~2-3 min	4 shards in parallel
Frontend Vitest	~1-2 min	8 shards in parallel
Frontend E2E (Chromium)	~3-5 min	4 shards in parallel
Total CI time	~8-12 min	All jobs parallelized

Improvement from Parallelization¶

Configuration	Estimated Time	Improvement
Serial (no parallelization)	~25-35 min	Baseline
Current parallelized	~8-12 min	60-70% reduction
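
The 60-70% figure appears to come from comparing matching ends of the two ranges (fastest with fastest, slowest with slowest). The short calculation below reproduces that arithmetic; the `reduction_percent` helper is illustrative only.

```python
def reduction_percent(before_min: float, after_min: float) -> float:
    """Percentage of wall-clock time saved when going from before to after."""
    return (1 - after_min / before_min) * 100


fast_case = reduction_percent(25, 8)   # fastest serial run vs fastest parallel run
slow_case = reduction_percent(35, 12)  # slowest serial run vs slowest parallel run
print(f"{fast_case:.0f}% and {slow_case:.0f}%")
```

Pairing opposite ends of the ranges gives a wider spread (about 52% for 25 to 12 minutes, about 77% for 35 to 8 minutes), so the quoted range is best read as a typical value rather than a guarantee.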

Caching Strategy¶

GitHub Actions Cache¶

Cache	Key Pattern	Purpose
uv dependencies	`uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}`	Python packages
npm dependencies	`npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}`	Node packages
Playwright browsers	`playwright-*-${{ runner.os }}-${{ hashFiles('package-lock.json') }}`	Browser binaries
Docker layers	`gha,scope=backend` / `gha,scope=frontend`	Image build layers
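
The key patterns above tie each cache to the contents of a lockfile, so a dependency change produces a new key and a fresh cache. The sketch below illustrates that idea only. It does not reproduce the exact digest that GitHub Actions' `hashFiles` computes, and the `cache_key` helper is hypothetical.

```python
import hashlib


def cache_key(prefix: str, runner_os: str, lockfile_bytes: bytes) -> str:
    """Build a cache key that changes whenever the lockfile contents change."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}-{runner_os}-{digest}"


before = cache_key("uv", "Linux", b"requests==2.31.0\n")
after = cache_key("uv", "Linux", b"requests==2.32.0\n")
print(before == after)  # False: a changed lockfile yields a different key
```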

Test Performance Audit Job¶

The test-performance-audit CI job:

Downloads JUnit XML results from all test jobs
Parses test durations from XML files
Flags tests exceeding thresholds
Reports warnings for tests approaching limits
Excludes benchmark tests from analysis
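
The parsing and flagging steps above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the code in scripts/audit-test-durations.py: the sample XML, the `slow_tests` helper, and the rule that any test ID containing "benchmark" is excluded are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit XML, shaped like the reports pytest writes with --junitxml.
SAMPLE_JUNIT = """<testsuites>
  <testsuite name="pytest" tests="3">
    <testcase classname="tests.unit.test_api" name="test_fast" time="0.120"/>
    <testcase classname="tests.unit.test_api" name="test_slow" time="1.450"/>
    <testcase classname="tests.benchmarks.test_perf" name="test_benchmark_query" time="9.000"/>
  </testsuite>
</testsuites>"""


def slow_tests(junit_xml: str, threshold: float) -> list[tuple[str, float]]:
    """Return (test ID, seconds) pairs above the threshold, slowest first."""
    root = ET.fromstring(junit_xml)
    flagged = []
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        if "benchmark" in test_id:  # assumption: benchmarks are identified by name
            continue
        duration = float(case.get("time", "0"))
        if duration > threshold:
            flagged.append((test_id, duration))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


print(slow_tests(SAMPLE_JUNIT, threshold=1.0))
```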

Running Locally¶

# After running tests with JUnit output
uv run python scripts/audit-test-durations.py test-results/

# Environment variable overrides
UNIT_TEST_THRESHOLD=2.0 \
INTEGRATION_TEST_THRESHOLD=10.0 \
uv run python scripts/audit-test-durations.py test-results/

Monitoring and Maintenance¶

Weekly Checks¶

Review CI run times in GitHub Actions
Check for new slow tests in audit reports
Verify test count thresholds remain accurate
Update slow test patterns if needed

When to Update Baselines¶

Update this document when:

Test counts change significantly (for example, after a large batch of tests is added)
New slow test patterns are identified
CI configuration changes (shards, parallelization)
Performance improvements are made

Alerting¶

The CI workflow includes:

test-count-verification job - fails if test counts drop below thresholds
test-performance-audit job - warns when a test reaches 80% of its threshold, fails when a test exceeds its threshold

References¶

CI workflow: .github/workflows/ci.yml
Test duration audit: scripts/audit-test-durations.py
Vitest config: frontend/vite.config.ts
Playwright config: frontend/playwright.config.ts
pytest config: pyproject.toml