Engineering Standards
The machine-readable spec that drives automated conformance checks across every service. Fetched live from ecosystem-standards at build time.
Principles (9 rules)
Reuse over rebuild
Work in one layer should have a clear path to surfacing in another. Cogs feed APIs. APIs feed sites. Shared libraries reduce duplication across all layers.
Pipeline resilience — continue on item failure
Pipelines continue on per-file errors. One bad input does not abort a run. Failures are logged, prefixed, and left for retry — never silently dropped.
AI evaluates process, not data
AI is infrastructure, not a feature. It runs during the pipeline to evaluate health and conformance — not to enrich output data for end users.
New stack, opportunistic migration
New things are built with the current best stack. Existing things migrate when the opportunity arises — not on a forced schedule.
Observability is not optional
If you cannot see what a system is doing in production, you cannot deploy it confidently. Logging, error tracking, and run history are requirements, not enhancements. Every production service must have all three.
Deploy anytime, release when ready
Code and features are decoupled. Deployments are safe at any time. Feature flags control what users see. A flag should be the first tool considered when a feature spans multiple services or when the frontend needs to ship before the API is ready. See CD-001 for the full feature flag standard.
Portfolio artifacts are working systems
Portfolio pieces are live, queryable, demonstrable systems — not case studies, READMEs, or screenshots.
Every bug fix includes a regression test
If a bug was found in production, a test is written that would have caught it. This is non-negotiable.
Standards are a living document
This document is updated as the ecosystem evolves. Outdated standards are worse than no standards — they create confusion and erode trust in the process. Standards must be reviewed at least every 90 days.
Python (16 rules)
uv for dependency management
uv replaces Poetry. Faster installs, simpler lockfile, better resolution. All new cogs use uv. Existing cogs migrate when touched.
ruff for linting and formatting
ruff replaces black + isort + flake8. Single tool, significantly faster. Configured in pyproject.toml.
Python 3.11+ minimum version
Minimum version for new work is Python 3.11. Python 3.13 preferred. Type hints used throughout.
Pydantic for all external data validation
All external data (CSV rows, API responses, Drive file metadata) validated through Pydantic models before processing. Replaces ad-hoc dict access.
hatchling as build backend
hatchling is the ecosystem standard build backend for all Python packages. Already used by common-python-utils. Requires no [tool.setuptools] config. New repos must use hatchling from day one. Existing repos migrate as a chore commit.
ruff line length is 88
All Python repos set line-length = 88 in [tool.ruff] in pyproject.toml. Matches Python community default and Black. Ensures cross-repo consistency.
src layout required
All packages use src/<package_name>/ + tests/<package_name>/. No flat layouts.
common-python-utils declared as dependency
All cogs declare common-python-utils as a dependency. Shared behaviors live there, not duplicated per-repo. Flag if cog re-implements logging, API clients, or metadata helpers.
pyproject.toml as single source of truth
pyproject.toml is the only configuration file. No setup.py, no requirements.txt.
pre-commit configured
ruff and basic hooks configured via .pre-commit-config.yaml. Runs on every commit.
Naming conventions — Python
- Package name: snake_case matching repo name (hyphens become underscores)
- Module files: snake_case verb phrases (process_new_files.py)
- Functions: snake_case descriptive verbs (generate_dj_set_collection())
- Constants: UPPER_SNAKE_CASE in the config module
- Pydantic models: PascalCase nouns (DjSetRecord, TrackRow)
- Log messages: emoji prefix for lifecycle events (🚀 start ✅ success ❌ failure)
FAILED_ prefix for failed inputs
Files that fail processing are renamed FAILED_<original> in the source folder for manual retry. Failed files must not be silently deleted or left unmarked.
possible_duplicate_ prefix for duplicates
Duplicate files are renamed rather than overwritten or deleted. Human review required.
finally for temp file cleanup
Temp files created during processing are always cleaned up in a finally block regardless of success or failure.
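A small stdlib-only sketch of the rule; the processing step is a stand-in:

```python
import os
import tempfile

def process_with_temp(data: bytes) -> int:
    """Whatever happens mid-processing, the temp file is removed."""
    fd, path = tempfile.mkstemp(suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return os.path.getsize(path)  # stand-in for the real processing step
    finally:
        if os.path.exists(path):
            os.remove(path)  # runs on success AND on failure
```

If the processing step raises, the `finally` block still runs before the exception propagates, so no `.part` files accumulate across failed runs.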
No getattr() access for undeclared Settings fields
pydantic-settings with extra="ignore" silently drops env vars not declared as fields. Using getattr(settings, "KEY", default) on an undeclared key is always equivalent to using the hardcoded default. This pattern is prohibited. Any key accessed via settings.KEY or getattr(settings, "KEY", ...) must be declared as a typed field on the Settings class. Deferred fields must be documented in a commented-out stub block in config.py.
Every key in .env.example must be declared in Settings
Any key in .env.example that is not a declared field on Settings creates a misleading contract. Exception: keys consumed by external tooling (RAILWAY_*, NODE_ENV) may be listed with a comment indicating they are not read by the app.
Testing (14 rules)
Testing pyramid — right ratio, not maximum coverage
The goal is the right ratio of test types, not maximum line coverage.
- Unit tests (many): pure functions in isolation, no I/O, no external calls. Fast.
- Integration tests (some): cog interaction with mocked external dependencies. One per major external integration.
- E2E tests (few): happy path only, run nightly rather than on every commit. One per pipeline, covering the full flow.
Normalization test required per cog
A test that verifies the normalization/cleaning logic on representative input. Covers the core transformation the cog exists to do.
Deduplication test required per cog
A test that verifies duplicate inputs are correctly identified and handled — not silently overwritten.
Failure path test required per cog
A test that verifies the cog handles a bad input correctly — logs the error, marks the file, and continues rather than aborting.
Output shape test required per cog
A test that verifies the output (JSON schema, DB row, or file structure) matches the expected contract.
pytest as test runner
pytest is the test runner for all Python projects. Configured in pyproject.toml.
pytest-cov for coverage in CI
Coverage measured on every CI run. Report in terminal. Threshold not enforced by number — enforced by critical path coverage (TEST-001 through TEST-004).
respx/httpx for HTTP mocking — no real external calls
respx/httpx used for mocking HTTP calls in integration tests. No real external calls to Drive, Sheets, or any external API in unit or integration tests.
mypy must run in CI if [tool.mypy] is declared
If a repo declares [tool.mypy] in pyproject.toml, a CI step invoking "uv run mypy src/" (or equivalent) is required. A mypy config that is never run in CI gives false assurance and drifts silently. Exception: mypy may be omitted during initial setup if repo is in a documented "typing: in progress" state. This exception must not persist past first stable release.
FastAPI TestClient for API tests
FastAPI's TestClient via httpx used for all API endpoint tests. Tests run without a live server.
Database fixtures — no production data in tests
Tests use a separate test database or in-memory SQLite with transaction rollback. Never run against the production database.
Contract test for every API endpoint
Every API endpoint has a dedicated test asserting the response envelope shape: { data, meta } on success, { error: { code, message } } on failure. One contract test per endpoint minimum. Contract tests run in CI on every push.
Mock verification required
Every mock in a test must be verified with assert_called() or assert_called_once(). Tests that pass because a mock was never called are false positives.
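A stdlib sketch of the rule; the notification function is hypothetical:

```python
from collections.abc import Callable
from unittest.mock import Mock

def notify_on_failure(failed: list[str], alert: Callable[[str], None]) -> None:
    """Hypothetical code under test: alerts only when something failed."""
    if failed:
        alert(f"{len(failed)} file(s) failed")

# Verify the mock was actually exercised, not just constructed.
alert = Mock()
notify_on_failure(["FAILED_a.csv"], alert)
alert.assert_called_once()  # raises AssertionError if never called

quiet = Mock()
notify_on_failure([], quiet)
assert not quiet.called  # the no-call expectation is also explicit
```

Without `assert_called_once()`, a refactor that stops invoking `alert` entirely would leave this test green, which is exactly the false positive the rule prohibits.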
GAP: Most existing cogs have import-level tests only
Most existing cogs currently have import-level tests only. This is a known gap with active remediation underway. Remediation order: deejay-cog first (new platform foundation), then by activity level. When a cog is remediated, this gap entry is closed and the cog is marked as compliant with TEST-001 through TEST-004.
Documentation (13 rules)
README.md is mandatory
Every repo has a README. It describes: purpose (one paragraph), inputs, outputs, environment variables, how to run locally, how to run tests, and versioning policy.
README describes inputs and outputs
For processors: what files are expected, where they come from, what is produced and where it goes. For APIs: what endpoints exist and what they serve.
CHANGELOG.md required
Tracks meaningful changes per version. Not every commit — only changes that affect behavior, interface, or configuration. Managed by semantic-release. Never edited manually.
.env.example is current
Every environment variable used by the service is documented in .env.example with a description and example value. Kept current — not a one-time artifact.
Design decisions captured
When a significant architectural decision is made, the rationale is written down — in docs/DESIGN.md, a CHANGELOG entry, or a README note. Not just what was decided but why.
README "Running locally" section is complete
Every repo's README includes a "Running locally" section that covers: (1) prerequisites — Python version (≥3.11) and uv; (2) install — `uv sync --all-extras`; (3) pre-commit — `uv run pre-commit install` (run once after cloning) and `uv run pre-commit run --all-files` (run manually at any time); (4) run — the exact command(s) to execute the service or scripts; (5) test — `uv run pytest` and the coverage variant. The section must be kept current. Copy-paste from the README must work on a clean clone with no prior knowledge of the repo.
Docstrings on all public functions and classes
All public functions and classes have docstrings. One sentence minimum describing what the function does, not how.
Pydantic field descriptions required
Pydantic model fields use the description parameter. Models are self-documenting — no separate documentation needed for data shapes.
No dead code
Commented-out code is removed, not left in place. Version control is the history — the codebase is the present.
Split package identity documented at entry point
If a Python library's install name (pyproject.toml [project] name) differs from its import namespace (src/ directory name), both names and their relationship must be documented in __init__.py docstring. README must show both correct install snippet and correct import path.
OpenAPI docs are first-class
FastAPI's /docs is a deliverable, not a side effect. Every endpoint must be complete and accurate before the service is considered done. All endpoints have summary, description, and response_model defined.
Public endpoints documented as intentional
Public (unauthenticated) endpoints include a note in their description confirming they are intentionally public.
Standards document is versioned
Every update to the standards repo increments the version in index.yaml. The AI evaluator always references a specific version. Evaluations without a standards version reference are invalid.
API (10 rules)
All new API services on Railway
All new API services deployed on Railway regardless of language. Python services use FastAPI. TypeScript services use Hono on Node. Railway is the single hosting standard. Cloudflare Workers deprecated for new API work.
PostgreSQL as data store for new services
All new services use PostgreSQL deployed on Railway. Not D1 or SQLite.
ORM required — SQLAlchemy (Python) or Drizzle (TypeScript)
Python: SQLAlchemy async with asyncpg driver, Pydantic-compatible models, Alembic for migrations. TypeScript: Drizzle ORM with postgres.js driver, TypeScript-native schema definitions, Drizzle Kit for migrations. No raw SQL without ORM.
Versioned routes — /v1/<resource>
All routes versioned from day one: /v1/<resource>. No unversioned routes in production.
Response envelope on all endpoints
All responses wrapped: { data: ..., meta: { count, version } }. Errors: { error: { code, message } }. No bare arrays or inconsistent shapes.
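A minimal sketch of helpers that produce the envelope; the real ones live in the shared libraries, and these names are illustrative:

```python
SCHEMA_VERSION = "1"  # illustrative; real versioning comes from the service

def success(data: list) -> dict:
    """Wrap a successful payload in the standard envelope."""
    return {"data": data, "meta": {"count": len(data), "version": SCHEMA_VERSION}}

def failure(code: str, message: str) -> dict:
    """Wrap an error in the standard error envelope."""
    return {"error": {"code": code, "message": message}}
```

Centralizing envelope construction in one place is what makes the cross-stack parity rule (see Cross-Stack) enforceable: handlers never hand-build `{ data, meta }` inline.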
owner_id on all tables
Every table includes owner_id (Clerk user ID). Not enforced as FK today but present for future multi-tenant use.
Clerk for authentication
Authentication via Clerk across all services. Consistent across the ecosystem.
Public endpoints explicit and intentional
Public (unauthenticated) endpoints are explicitly documented and intentional — not accidental.
No unverified write endpoints reachable from the public internet
Any FastAPI service with write endpoints (POST, PATCH, DELETE) that depend on an unverified header (X-Owner-Id or equivalent) must either be provably isolated on a private network (Railway private networking, no public port), OR have CLERK_AUTH_ENABLED=true and RS256 JWT verification active. A module-level docstring in auth.py documenting the current posture and upgrade path is required regardless of which condition is satisfied.
API auth scheme and HTTP client must match
When kaianolevine-api's auth scheme changes, CommonPythonApiClient._headers() in common-python-utils must change in the same release. A mismatch causes all pipeline writes to return 401. Auth mechanism used by the API and sent by its client must be documented in the same location (auth.py cross-referencing common-python-utils and vice versa).
Pipeline (9 rules)
Prefect for new pipelines
New event-driven pipelines use Prefect. Python-native, built-in observability, retry logic, run history. GitHub Actions is for CI/CD only — not pipeline orchestration.
Idempotent pipeline steps
Every pipeline step can be safely re-run without side effects. Re-running a step produces the same result as running it once.
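One way to make a step trivially idempotent is to keep it pure and deterministic; this sketch (function and record shapes are hypothetical) dedupes and sorts, so running it on its own output changes nothing:

```python
def build_collection(records: list[dict]) -> list[dict]:
    """Deterministic collection rebuild: same input, same output.

    Re-processing an already-seen record is a no-op, so the step can be
    re-run safely after a partial failure.
    """
    seen: set[str] = set()
    out: list[dict] = []
    for rec in sorted(records, key=lambda r: r["id"]):
        if rec["id"] not in seen:
            seen.add(rec["id"])
            out.append(rec)
    return out
```

For steps with side effects (writing files, DB rows), the equivalent property is achieved by deriving output identity from input identity and replacing wholesale rather than appending.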
Separate process and collect steps
Processing new inputs and rebuilding the collection are separate pipeline steps — not combined. Allows independent retry.
Concurrency groups required for shared resources
Pipelines writing to shared resources declare a concurrency group. cancel-in-progress: false for data pipelines.
Archive, never delete raw inputs
Raw inputs are archived after processing, never deleted. Archive subfolder used.
Dual logger pattern in Prefect flows
Flow functions use get_run_logger() for Prefect-visible logs, with a fallback to the standard module logger for local runs: wrap the get_run_logger() call in try/except and fall back to the module-level logger when no flow-run context is available.
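A minimal sketch of the fallback, with an illustrative module logger name:

```python
import logging

log = logging.getLogger("my_cog")  # module-level fallback logger

def get_logger():
    """Prefer Prefect's run logger inside a flow run; fall back locally."""
    try:
        from prefect import get_run_logger  # assumes Prefect is installed
        return get_run_logger()  # raises when called outside a flow run
    except Exception:
        return log
```

Catching broad `Exception` is deliberate here: both "Prefect not installed" and "no active run context" should degrade to the local logger rather than crash a local script.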
Retry logic on external API calls
Tasks that call external APIs use retries=2, retry_delay_seconds=30. Tasks calling Google APIs inherit common-python-utils retry logic.
watcher-cog as canonical Drive trigger
The canonical Drive-event trigger pattern is: file appears in a watched Drive folder → watcher-cog detects the change (1-minute poll via the Drive API) → watcher-cog calls the Prefect API to create a flow run → Prefect executes the cog flow. watcher-cog is an always-on Railway worker service. It is config-driven: adding a new folder-to-cog mapping requires one WatcherConfig entry — no code changes. The previous pattern (Apps Script → repository_dispatch → GitHub Actions) is retired. google-app-script-trigger is archived.
AI evaluation step as final pipeline task
All production pipelines include an AI evaluation step as the final task. The step assesses run conformance against the current standards version and writes findings to the pipeline_evaluations table.
Frontend (10 rules)
Astro for all static sites
All portfolio, technical, and community sites use Astro. Static-first, component islands for interactivity. If the primary product is content delivery, use Astro. If it is an app with persistent UI state, use React.
Vite + React + TypeScript for web apps
Standard stack for all React web apps. Vite for bundling, React 19, TypeScript throughout. No exceptions.
Tailwind CSS for styling
Tailwind is the styling primitive for all React apps and Astro sites. No CSS modules, no styled-components, no other CSS framework.
shadcn/ui for components
shadcn/ui is the component library standard. Components are copied into src/components/ui/ — not installed as a dependency. Built on Radix UI primitives and Tailwind.
React Hook Form + Zod for forms and validation
All forms use React Hook Form. All validation schemas use Zod. These are assumed by shadcn/ui form components and are the ecosystem standard.
Graceful degradation — build succeeds without API
Build succeeds with empty or unavailable API data. Site does not break if API is down.
No hardcoded API URLs
No hardcoded API URLs. PUBLIC_API_URL (Astro) or VITE_API_URL (React) env var used throughout.
Pinned starter version
Astro sites pin their starter version. Upstream preserved in-repo for reference. Do not blindly upgrade starter versions — pin and document the version in use.
Build-time data for static content
Infrequently-changing data (collections, summaries) fetched at build time via Astro data files. Do not make runtime API calls for content that could be fetched at build time.
Runtime queries for interactive demos only
Live API queries reserved for search, filtering, and interactive demo surfaces. Implemented via Astro component islands. Document which endpoints are called at runtime vs build time.
Delivery (20 rules)
Feature flags for major functionality toggles
Feature flags are used to decouple deployment from release. They gate major sections of functionality — not minor implementation details.

Canonical use cases:
- Kill switches: disable a risky integration (e.g. Anthropic API calls in evaluator-cog) without a redeploy.
- Readiness gates: API ships a new endpoint; frontend UI is hidden until the flag is enabled. Allows frontend code to be merged and deployed independently of API readiness.
- Maintenance mode: return 503 gracefully during migrations without touching code.

Flags are stored in the feature_flags table in api-kaianolevine-com's PostgreSQL database and served via GET /v1/feature-flags. The public read endpoint is unauthenticated. Write endpoints (POST, PATCH, DELETE /v1/feature-flags/:key) require Clerk JWT auth with admin role.

Flag naming convention: <service>.<feature>. Examples: evaluator_cog.llm_soft_rules, evaluator_cog.conformance_check, watcher_cog.drive_polling, api.maintenance_mode.

Flag anatomy — each flag must have:
- key: string identifier
- enabled: boolean
- description: why it exists and when it should be deleted
- permanent: boolean — true for infrastructure flags (maintenance_mode), false for rollout/readiness flags (must be deleted post-rollout)

Lifecycle contract:
1. Ship code with the flag check.
2. Deploy, then activate when ready by flipping the flag via the admin panel.
3. For non-permanent flags: ship a follow-up PR removing the flag check once the feature is fully rolled out and stable.
4. Delete the DB row after the cleanup PR merges.

Include a comment in code pointing to the flag key so it is easy to find: # feature flag: evaluator_cog.llm_soft_rules

Flags are NOT used for:
- Fine-grained A/B testing or percentage rollouts
- Per-user targeting
- Replacing Prefect flow configuration
- Gating minor implementation details

Clients check flags at runtime (not build time) via a short-TTL in-memory cache (30–60 seconds per process) to avoid hitting the API on every request. Astro static sites fetch flags client-side on load. Fail-open or fail-closed behavior must be intentional and documented per flag.
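The short-TTL in-memory cache can be sketched as below; `fetch_fn` stands in for the real GET /v1/feature-flags call, and the class name is hypothetical:

```python
import time

class FlagCache:
    """Per-process flag cache with a short TTL (sketch, not the real client)."""

    def __init__(self, fetch_fn, ttl_seconds: float = 45.0):
        self._fetch = fetch_fn          # callable returning {key: enabled}
        self._ttl = ttl_seconds
        self._flags: dict[str, bool] = {}
        self._loaded_at: float | None = None

    def is_enabled(self, key: str, default: bool = False) -> bool:
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at > self._ttl:
            try:
                self._flags = self._fetch()
                self._loaded_at = now
            except Exception:
                # On fetch failure, serve stale flags plus the caller's
                # default; whether that fails open or closed is the
                # per-flag decision the rule requires documenting.
                pass
        return self._flags.get(key, default)
```

The `default` argument is where fail-open vs fail-closed is expressed per call site: kill switches typically default to enabled (fail open), readiness gates to disabled (fail closed).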
Sentry for error tracking — all production services
All production services integrate Sentry for unhandled exception tracking. This includes FastAPI services, Hono services, and Python cogs (always-on worker services). Free tier is sufficient. Sentry is initialised at service entry point before any application logic runs. SENTRY_DSN is set as an environment variable — never hardcoded. The same Sentry project may be shared across related services or a dedicated project used per service. Sentry covers Layer 3 observability (unhandled exceptions). It does not replace structured logging (Layer 2) or liveness monitoring (Layer 1).
Structured logging via shared library
All Python services use the common-python-utils logger. All TypeScript services use the kaiano-ts-utils logger. Never use print() in production code paths. Log output is JSON-formatted in production.

Standard log event shape (all services):
- timestamp: ISO 8601
- service: repo name (e.g. watcher-cog, api-kaianolevine-com)
- level: DEBUG | INFO | WARN | ERROR
- category: infra | pipeline | data | api
- event: snake_case event name (e.g. trigger_fired, file_processed)
- context: key-value pairs specific to the event

Category definitions:
- infra: service lifecycle, triggers fired/not fired, heartbeats, Drive poll results
- pipeline: file processed/skipped/failed, pipeline completed/failed, Prefect flow run created
- data: data quality issues, schema violations, evaluation findings, duplicate detection
- api: HTTP errors (4xx/5xx), slow responses (>2s), external API failures, auth failures

Emoji prefixes are used for human-readable local output only. Structured JSON is the production format.
GitHub Actions version tags must be valid
All "uses: owner/repo@vN" references in .github/workflows/ must reference version tags that exist in the action's release history. Invalid version tags cause CI to fail silently. Current pinned versions (March 2026): actions/checkout@v6, actions/setup-node@v6, astral-sh/setup-uv@v7. Verify the tag exists before committing any version pin.
Conventional Commits format
All commit messages follow: type: description. Types: feat, fix, docs, refactor, chore, test, ci.
BREAKING CHANGE footer for major bumps
Use explicit BREAKING CHANGE in commit body for major version bumps. The feat!: shorthand is unreliable. Correct pattern using two -m flags: git commit -m 'feat: description' -m 'BREAKING CHANGE: explanation'
semantic-release on all repos
Every repo has .releaserc.json and a release job in ci.yml. On merge to main: tests run, semantic-release reads commits, determines bump, updates version file, updates CHANGELOG.md, creates git tag, creates GitHub Release.
Never manually edit version files or CHANGELOG
Never manually edit version in pyproject.toml or package.json. Never manually edit CHANGELOG.md. Both are owned by semantic-release.
fetch-depth: 0 on CI checkout
All CI jobs that run semantic-release must use fetch-depth: 0 on the checkout step. Without this semantic-release cannot read full git history and will not release correctly.
Plugins installed explicitly before semantic-release
semantic-release plugins are installed via npm install --no-save in the release job before running npx semantic-release. Not via package.json. Required because npx only installs the core package.
Prefect Cloud for pipeline observability
All pipeline flows connect to Prefect Cloud (free tier). Run history, step logs, and flow state are visible at app.prefect.cloud. GitHub Actions logs are for deep debugging only — not the primary observability surface.
GitHub Actions is CI/CD only — not a trigger relay
GitHub Actions handles lint, tests, deploys, and semantic-release. It does not trigger pipeline flows. Drive-event triggers go through watcher-cog. Scheduled and manual triggers go through Prefect deployments. repository_dispatch as a trigger relay for cogs is a retired pattern.
Healthchecks.io for always-on worker services
Always-on Railway worker services (services with no HTTP port that run continuously) integrate Healthchecks.io as a dead man's switch. The service pings HEALTHCHECKS_URL on every work cycle. If the ping goes silent beyond the grace period, Healthchecks.io sends an email alert. Recommended settings: period 1 minute, grace 5 minutes. HEALTHCHECKS_URL is set as an environment variable — never hardcoded. Free tier is sufficient.
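A stdlib-only sketch of the per-cycle ping; the function name is illustrative:

```python
import os
import urllib.request

def ping_healthcheck(timeout: float = 10.0) -> bool:
    """Ping the dead man's switch once per work cycle.

    Never raises: a monitoring failure must not take down the worker.
    """
    url = os.environ.get("HEALTHCHECKS_URL")  # never hardcoded
    if not url:
        return False  # e.g. local development, where no check is configured
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False
```

The call goes at the end of each successful work cycle, so a hung or crashed loop stops pinging and trips the alert after the grace period.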
Log levels used consistently and intentionally
Log levels follow a strict contract across all services:
- DEBUG: internal state useful for local debugging only. Never emitted in production by default.
- INFO: meaningful lifecycle events — service started, file processed, trigger fired, pipeline completed. Reviewable weekly. Default production level.
- WARN: something unexpected happened but the service recovered. Skipped files, retried calls, degraded behavior. Reviewed promptly.
- ERROR: something failed and requires attention. Unhandled exceptions, failed triggers, data integrity violations. Reviewed immediately.

Never use ERROR for expected failure modes (e.g. file not found when that is a valid outcome). Never use INFO for high-frequency noise that would obscure meaningful events.
Structured log event shape — all services
All meaningful log events emitted by production services include these fields:
- timestamp: ISO 8601
- service: repo name (e.g. watcher-cog, api-kaianolevine-com)
- level: DEBUG | INFO | WARN | ERROR
- category: infra | pipeline | data | api
- event: snake_case event name (e.g. trigger_fired, file_processed)
- context: key-value pairs specific to the event

Category definitions:
- infra: service lifecycle, triggers fired/not fired, heartbeats, Drive poll results
- pipeline: file processed/skipped/failed, pipeline completed/failed, Prefect flow run created
- data: data quality issues, schema violations, evaluation findings, duplicate detection
- api: HTTP errors (4xx/5xx), slow responses (>2s), external API failures, auth failures

Log output is JSON in production, human-readable with emoji prefixes in local development. The shared library (common-python-utils for Python, kaiano-ts-utils for TypeScript) provides the logger — services do not configure logging inline.
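The standard event shape can be sketched as a single helper; in the real ecosystem this lives in common-python-utils, so the function below is illustrative only:

```python
import json
import logging
from datetime import datetime, timezone

def log_event(logger, level, *, service, category, event, **context):
    """Emit one standard-shape event as a single JSON line (sketch)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,       # repo name, e.g. watcher-cog
        "level": logging.getLevelName(level),
        "category": category,     # infra | pipeline | data | api
        "event": event,           # snake_case, e.g. file_processed
        "context": context,       # event-specific key-value pairs
    }
    logger.log(level, json.dumps(record))
    return record
```

Because every event shares one shape, Railway's log viewer (or Better Stack) can filter by `category` and `event` across all services uniformly.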
Three-layer observability stack — all production services
Every production service implements all three observability layers:
- Layer 1 — Liveness: Healthchecks.io for always-on workers. Railway auto-restart for all services.
- Layer 2 — Structured logs: common-python-utils or kaiano-ts-utils logger emitting JSON events with the standard shape. Queryable in the Railway log viewer or Better Stack.
- Layer 3 — Exceptions: Sentry capturing all unhandled exceptions with full stack trace and context.

These three layers are non-overlapping and non-redundant:
- Layer 1 catches: process died, service unreachable
- Layer 2 catches: business logic errors, skipped work, slow operations
- Layer 3 catches: unhandled exceptions and crashes

A service missing any layer has a blind spot in production. Pipelines additionally require Prefect run history as a fourth layer covering orchestration-level observability.
Doppler as canonical secret store
Doppler is the single source of truth for all infrastructure secrets across the MiniAppPolis ecosystem. This includes API keys, database URLs, service tokens, internal API keys, Sentry DSNs, Healthchecks URLs, and Clerk credentials.

Doppler syncs automatically to:
- Railway: all API services and cogs receive secrets via the Doppler → Railway native sync
- GitHub Actions: CI secrets synced via the Doppler → GitHub integration
- Cloudflare Pages: secrets synced via the Doppler CLI in CI/CD

Secret management workflow:
- All secret changes are made in Doppler only
- Doppler pushes changes to downstream platforms automatically
- Never manually set secrets in Railway, GitHub, or Cloudflare if Doppler is managing that service
- .env.example lists all required secret keys with no values — this file is the human-readable contract of what a service needs

Doppler project structure: one Doppler project per service or cog. Environments: development, staging, production (minimum).

The only known carve-out is Prefect Cloud. Prefect uses its own encrypted Blocks for flow-level secrets. Prefect Blocks are managed directly in Prefect Cloud and are not synced from Doppler. When a secret used in a Prefect Block is rotated, Prefect must be updated manually. This is a known limitation — document it per-secret in Doppler's description field.
Internal service-to-service auth via per-caller API keys
Internal calls between Railway services and cogs use per-caller API keys passed via the X-Internal-API-Key header. Each caller (each cog or service making internal API calls) has its own key — not a single shared secret across all callers.

Per-caller keys allow:
- Independent rotation without touching all callers
- Caller identity logging on every internal request
- Revocation of a single caller's access without affecting others

All internal API keys are stored in Doppler and injected at runtime. The receiving service (api-kaianolevine-com) validates the key and logs the caller identity on every internal request. common-python-utils provides get_internal_headers(), which returns the correct header dict for outgoing internal requests. All Python cogs use this utility — never construct the header inline. Clerk JWT auth is for user-facing requests only. Internal service-to-service calls never use Clerk JWTs.
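The shared helper might look roughly like this; the exact signature and the `INTERNAL_API_KEY` env var name are assumptions, not the real common-python-utils implementation:

```python
import os

def get_internal_headers() -> dict[str, str]:
    """Sketch of the shared-library helper for internal requests.

    The per-caller key is injected by Doppler at runtime via an
    environment variable (name illustrative), never hardcoded.
    """
    key = os.environ.get("INTERNAL_API_KEY")
    if not key:
        raise RuntimeError("INTERNAL_API_KEY is not configured")
    return {"X-Internal-API-Key": key}
```

Failing loudly on a missing key is deliberate: a cog that silently sends unauthenticated internal requests would produce confusing 401s at the receiving service instead of a clear startup error.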
Feature flag admin panel in kaianolevine.com
The admin control panel for feature flags lives at an /admin route in the kaianolevine.com software portfolio site. It is gated behind Clerk authentication with admin role verification.

Capabilities:
- View all flags and their current state
- Toggle flags on/off
- Create new flags (key, description, permanent boolean)
- Delete flags (permanent flags require confirmation)

The admin panel calls authenticated write endpoints on api-kaianolevine-com. The public GET /v1/feature-flags endpoint remains unauthenticated for use by Astro sites and cogs. Each site (kaianolevine.com, wcs.kaianolevine.com, deejaytools.com) has its own Clerk application and independent user pool. Admin access to the flag panel is scoped to the kaianolevine.com Clerk app only.
Pull before branching
Always run git pull origin main before creating a new feature branch. Prevents merge conflicts with semantic-release commits (chore(release): X.Y.Z [skip ci]) that land on main after every release.
Evaluation (7 rules)
Evaluation findings stored, not ephemeral
Evaluation results are written to the database alongside pipeline outputs. They are reviewable, queryable, and feed the portfolio surface. Results logged only are not acceptable for production pipelines.
Every evaluation references a standards version
Every evaluation call includes a reference to the current version of the standards document being evaluated against. Without the rubric, evaluation is undefined.
Findings are specific and actionable
Findings reference a specific standard by ID, describe what was observed, and suggest a concrete remediation. Vague findings are not acceptable.
Structural conformance dimension
Evaluation dimension: does the repo/workflow follow the patterns in the standards? Covers src layout, naming conventions, error handling, tooling choices.
Pipeline consistency dimension
Evaluation dimension: did this run behave the same way as previous runs? Covers unexpected timing changes, new error types, missing steps.
Standards currency dimension
Evaluation dimension: has this repo been evaluated against the current version of the standards document? Is it behind? Flag if the last evaluation used a standards version more than one minor version behind current.
Deterministic conformance checks — partial coverage
The conformance flow in evaluator-cog implements ~12 deterministic checks covering file presence, pyproject.toml, CI YAML, AST scanning, and test structure. Approximately 38 of the 50 checkable rules are not yet covered deterministically and fall to LLM assessment only. Remediation: backfill deterministic checks one domain at a time, starting with delivery.yaml (CD-004 action version pinning) and documentation.yaml (DOC-013 README completeness).
Cross-Stack (2 rules)
Use shared library — do not reimplement shared behaviors
Python services declare common-python-utils as a dependency. TypeScript services declare kaiano-ts-utils as a dependency. Neither stack reimplements logging, auth verification, or response helpers provided by the shared library.
Cross-stack response shape parity
API response shapes are identical across Python and TypeScript services:
- Success: { data: ..., meta: { count, version } }
- Error: { error: { code, message } }

Defined in kaiano-ts-utils for TypeScript. Enforced by response_model in FastAPI.