Full stack in a weekend: a Python adjudication core, a Django-Ninja API, a React 19 single-page app, and a packaged desktop build, all shipped end to end in 36 hours
Model reads, engine decides: forced tool_use extraction pulls charges, ledger balance, and eligibility, then a pure-Python function computes the payout, so every dollar traces back to code and a document line and nothing is hallucinated
91% of claims within $250 of the human decision, median error of $0, mean absolute error of $62, at about $0.33 per claim on the held-out test split
Multi-user with per-tenant SQLite databases and workspace sharing: a run can be snapshotted and shared read-only into a space, copied on share so a viewer never touches the originator's live data
Security hardening throughout: master-approved signup, PBKDF2 passwords, session tokens stored only as SHA-256 hashes, and an allow-list column projection that structurally blocks the human-answer fields from ever reaching the model
Built-in white-hat security pass: the code is reviewed by autohack, my own autonomous bug-hunter, which traces user input to sinks and has a second model try to disprove each finding
Hybrid document reading routes each PDF page by text density: about 75% read free with pure-Python pdfminer, scanned pages go to vision, and no poppler or tesseract binaries means the same code runs everywhere including the desktop build
Calibration with zero API calls: extractions are stored and the engine is a pure function, so a candidate rulebook (JSON, not code) re-scores against the human decisions by replaying stored reads