TL;DR — From Spec to Shipped Code, Hands-Off

Praxis turns a written spec into merged, reviewed code with no human in the loop for the busywork. Claude Opus breaks the spec into a task graph and reviews every pull request; cheap local LLMs do the actual coding via Aider in isolated Docker containers, each on its own git branch. Opus squash-merges what passes and retries what fails (up to 3×). An optional autonomous loop then proposes its own improvements. The result: a self-driving plan → implement → review → merge cycle where expensive reasoning is reserved for planning and judgment, and commodity tokens do the typing.

   spec ─▶ ┌────────┐   tasks   ┌──────────┐  PR diff  ┌────────┐
           │  OPUS  │──────────▶│  AIDER   │──────────▶│  OPUS  │
           │  plan  │           │ agents   │           │ review │
           └────────┘           │ (Docker) │           └───┬────┘
                                └──────────┘        pass │  │ fail
                                                   merge ◀┘  └▶ retry (≤3)
                                                     │
                                          all tasks merged ─▶ integration PR
                                                     │
                                       optional: Opus proposes next improvement

The Problem

Using a frontier model like Opus for every keystroke is expensive and slow; using a cheap local model for everything produces unreviewed, low-quality code. Real engineering value is split: planning and code review need deep reasoning, while implementation is largely mechanical. Most AI coding tools collapse these into one model and one context, so they neither plan well nor scale cheaply — and a human still has to babysit every step.

The Approach — Two-Tier Model, Two-Tier Git

Praxis separates thinking from typing, and isolates every unit of work on its own branch so parallel agents never collide.

  ┌──────────────────────────────────────────────────────────────┐
  │  ORCHESTRATOR  (FastAPI + SQLite, REST + SSE)               │
  │                                                            │
  │  Task Queue ─▶ Agent Manager ─▶ Opus Bridge                │
  │  (state machine)  (Docker lifecycle)  (claude -p)          │
  └───────────────┬───────────────────────────┬────────────────┘
        plan/{date}-{slug} branch    agent/{task-slug} branches
                  │                            │
        ┌─────────▼──────────┐      ┌──────────▼──────────┐
        │  Opus (subscription)│      │ Aider × N (local LM)│
        │  plan + review only │      │  parallel coding    │
        └────────────────────┘      └─────────────────────┘
  • Right model for the job: Claude Opus (via the claude -p CLI on a subscription) plans and reviews; local LLMs in LM Studio do the implementation — minimizing frontier-token spend.
  • Two-tier branching: a plan/ branch per spec, an agent/ branch per task — parallel agents work in isolation, then merge up into an integration PR.
  • Quality gate built in: Opus reviews each PR diff and only squash-merges on pass; failures are re-dispatched up to 3 times.
  • Autonomous mode: a confidence-scored loop lets Opus propose and ship its own improvements, with a toggleable human approval gate and automatic resume after rate limits.
  • Real-time visibility: live log streaming over SSE to a single-file dashboard, a Typer CLI, and an authenticated REST API.

Results & Engineering Quality

  • End-to-end autonomy: a single spec flows through planning, parallel implementation, review, and merge without manual intervention on the happy path.
  • Cost-efficient by design: expensive reasoning is confined to planning/review; bulk code generation runs on free local inference.
  • Production-grade footing: 101 tests at 88% coverage, strict typing (mypy), and ruff-enforced style.
  • Deploy-anywhere: pure-stdlib-friendly stack with Docker Compose and a Caddy auto-HTTPS hosted profile; SQLite persistence with no external services.
  • Resilient: automatic rate-limit resume (5h + buffer) keeps long autonomous runs alive unattended.