Proof, not vibes
Real goals that a naive pipeline leaves subtly broken (the file looks created, the answer looks right, the parallel split looks done), each driven by kazi to objective convergence. Every case below is a genuine kazi apply --json run on a released binary, with before/after evidence and a command you can re-run.
No unverifiable claims. Every iteration count, cost, and wall-clock figure is copied verbatim from a real run recorded in the devlog; the full reproduction steps — and the metered cost_usd per run — live in the methodology doc. If a number can’t be traced to a captured run, it isn’t on this page.
#1 · kazi v1.64.2 · released binary · real claude harness
Exact-content file
Goal: Create VERSION.txt whose contents are exactly 1.0.0
The subtle break: “the file exists” looks done, but the content has to be exact. A naive run is just as happy with a stray extra line or with v1.0.0.
Without kazi: The file is created and declared done. The content is plausible but not exact, so the version string is silently wrong.
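With kazi: kazi’s predicate checks the content, not just existence: test -f VERSION.txt && [ "$(cat VERSION.txt)" = "1.0.0" ]. Command substitution strips trailing newlines, so a trailing newline is tolerated, but anything else (an extra line, a v prefix, stray spaces) fails. “Exists” is never mistaken for “done”; VERSION.txt is verified to contain exactly 1.0.0.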
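The same check can be mirrored in Python for testing goal content locally. This is a minimal sketch, not how kazi evaluates predicates (kazi runs the shell command itself); `version_predicate` is a hypothetical helper that approximately reproduces the shell semantics, including the fact that `$(cat …)` strips trailing newlines and nothing else.

```python
import tempfile
from pathlib import Path


def version_predicate(path: Path, expected: str = "1.0.0") -> bool:
    """Hypothetical helper approximating:
    test -f VERSION.txt && [ "$(cat VERSION.txt)" = "1.0.0" ]
    """
    # `test -f`: the path must exist and be a regular file.
    if not path.is_file():
        return False
    # `$(cat ...)` removes every trailing newline and nothing else,
    # so strip only b"\n" before comparing the raw bytes.
    return path.read_bytes().rstrip(b"\n") == expected.encode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        f = Path(d) / "VERSION.txt"
        print(version_predicate(f))  # False: the file does not exist yet
        for content in [b"1.0.0", b"1.0.0\n", b"v1.0.0", b"1.0.0\nextra\n", b" 1.0.0"]:
            f.write_bytes(content)
            print(repr(content), version_predicate(f))
```

Only the first two contents pass; the v prefix, the extra line and the leading space all fail, which is exactly the gap between “exists” and “done”.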
- status: converged
- iterations: 2
- wall clock: 18.5 s
- tokens: 39,712
Reproduce:
$ kazi apply version.goal.toml --workspace <ws> --harness claude --json
#2 · kazi v1.64.1 · released binary · real claude harness
Self-correcting against an opaque oracle
Goal: Write solution.py printing the sum of n<1,000,000 that are palindromes in base 10, 2 AND 8 (answer 610)
The subtle break: The agent’s first multi-step attempt looks plausible but is wrong. With no objective check, that wrong answer ships.
Without kazi: The first dispatch prints a plausible-but-wrong number. A prose pipeline accepts it, because nothing grades the result.
With kazi: kazi grades the output against a one-way sha256 oracle, so the model can’t read the answer out of the checker. It catches the wrong first attempt and re-drives. The cheap model (Haiku 4.5) self-corrected on iteration 2, and solution.py was verified to print 610.
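A one-way oracle of this kind can be sketched in Python. This is an illustration under stated assumptions, not kazi’s checker: it assumes the checker holds only the SHA-256 hex digest of the expected stdout after stripping surrounding whitespace, so it can grade an answer without containing it. `solve`, `oracle_accepts` and `drive` are hypothetical names, and the attempt list in the demo stands in for successive agent dispatches.

```python
from __future__ import annotations

import hashlib


def is_palindrome(s: str) -> bool:
    return s == s[::-1]


def solve(limit: int = 1_000_000) -> int:
    """Sum of n < limit that are palindromes in base 10, 2 AND 8."""
    return sum(
        n
        for n in range(1, limit)
        # Base 10 first: it is the most selective filter
        # (fewer than 2,000 base-10 palindromes lie below one million).
        if is_palindrome(str(n))
        and is_palindrome(format(n, "b"))
        and is_palindrome(format(n, "o"))
    )


def oracle_accepts(stdout: str, expected_sha256: str) -> bool:
    """Grade output by hash only; the expected answer never appears here."""
    digest = hashlib.sha256(stdout.strip().encode()).hexdigest()
    return digest == expected_sha256


def drive(attempts: list[str], expected_sha256: str) -> int | None:
    """Return the 1-based iteration whose output the oracle accepts, else None."""
    for i, out in enumerate(attempts, start=1):
        if oracle_accepts(out, expected_sha256):
            return i
    return None


if __name__ == "__main__":
    # A real goal would commit only the hex digest; it is computed here
    # purely so the demo is self-contained.
    expected = hashlib.sha256(b"610").hexdigest()
    answer = solve()
    print(answer)  # 610
    # Hypothetical wrong first attempt: 872187 is the sum you get if the
    # base-8 condition is dropped (bases 10 and 2 only).
    print(drive(["872187\n", f"{answer}\n"], expected))  # 2
```

Because the checker compares digests, a wrong answer is rejected without the checker ever revealing what the right one is.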
- status: converged
- iterations: 2
- wall clock: 39.3 s
- model: claude-haiku-4-5
Reproduce:
$ kazi apply ./goal.toml --workspace ./ws --harness claude --model claude-haiku-4-5 --json
#3 · kazi v1.64.2 · released binary · real claude harness
A real cross-group dependency, parallelized correctly
Goal: Build three Go capabilities (contract, /healthz, streaming) where streaming consumes the contract type
The subtle break: Split the work across parallel agents naively and the streaming endpoint compiles against a Widget type that doesn’t exist yet — a broken build that “looks done” per-agent.
Without kazi: A naive parallel split dispatches streaming before the contract exists, so the build breaks; the alternative, a serial run, gives up all the parallelism.
With kazi: kazi computes the wave schedule from the authored needs edges. result-contract and health dispatch in the same millisecond (their blast radii are disjoint), streaming waits 0.222 s until result-contract objectively converges, and then all three merge into one workspace. The collective converged with exit 0.
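Deriving waves from needs edges amounts to a layered topological sort. A minimal Python sketch, assuming each group simply lists the groups it needs (the goal-file syntax is not shown here, so the dict below is a hypothetical stand-in): groups with no unmet needs form the first wave, and each later wave holds the groups whose needs all sit in earlier waves. `waves` is a hypothetical helper that only computes the order; it does not model kazi gating dispatch on convergence.

```python
from __future__ import annotations


def waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Layered topological sort (Kahn's algorithm, one layer at a time).

    `needs` maps each group to the groups it depends on.
    Raises ValueError on an unknown dependency or a cycle.
    """
    for group, deps in needs.items():
        for dep in deps:
            if dep not in needs:
                raise ValueError(f"{group} needs unknown group {dep}")

    remaining = {g: set(deps) for g, deps in needs.items()}
    schedule: list[list[str]] = []
    while remaining:
        # Every group whose needs are all satisfied can dispatch concurrently.
        ready = sorted(g for g, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        schedule.append(ready)
        for g in ready:
            del remaining[g]
        # Groups in this wave no longer block anything later.
        for deps in remaining.values():
            deps.difference_update(ready)
    return schedule


if __name__ == "__main__":
    # Hypothetical encoding of case #3: only streaming declares a need.
    needs = {
        "result-contract": [],
        "health": [],
        "streaming": ["result-contract"],
    }
    print(waves(needs))  # [['health', 'result-contract'], ['streaming']]
```

A strict barrier between waves is the simplest policy; dispatching each group as soon as its own needs converge is a finer-grained alternative that the same dependency data supports.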
- collective: converged
- groups: 3 (2 concurrent, 1 gated)
- blocked: 0
- scheduler: single-node, NATS-free
Reproduce:
$ kazi apply priv/examples/predicate_graph_waves.toml --workspace <scratch> --parallel --harness claude --json
Committed goal file: priv/examples/predicate_graph_waves.toml
Why the “converged” verdicts mean something
The inverse case is the founding dogfood: a deterministic test with two coupled predicates, where fixing a.txt breaks b.txt as a side effect. A naive fix makes one predicate pass while regressing the other; kazi catches the green→red regression, attributes it to the dispatch that caused it, and never declares done. A single exit code would have hidden it.
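Catching a green→red regression only requires comparing per-predicate results before and after each dispatch. A minimal Python sketch, assuming predicates report plain booleans; `Verdict`, `check_dispatch` and the predicate and dispatch names are hypothetical, modelled on the a.txt/b.txt scenario rather than taken from the test’s source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Verdict:
    dispatch: str
    regressions: list[str]  # predicates that went green -> red
    converged: bool  # True only if every predicate is green


def check_dispatch(dispatch: str, before: dict[str, bool], after: dict[str, bool]) -> Verdict:
    # A predicate regresses if it passed before this dispatch and fails after it.
    # A predicate missing from `after` is treated as failing.
    regressed = sorted(p for p, ok in before.items() if ok and not after.get(p, False))
    return Verdict(
        dispatch=dispatch,
        regressions=regressed,
        converged=bool(after) and all(after.values()) and not regressed,
    )


if __name__ == "__main__":
    # The naive fix turns a.txt green but breaks b.txt as a side effect.
    v = check_dispatch(
        "fix-a",
        before={"a.txt": False, "b.txt": True},
        after={"a.txt": True, "b.txt": False},
    )
    print(v.regressions, v.converged)  # ['b.txt'] False
```

A single exit code collapses each run to pass or fail; keeping per-predicate results across dispatches is what makes it possible to name the dispatch that broke b.txt.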
$ mix test test/kazi/slice1_dogfood_test.exs
Deterministic: no model, no network. Source: test/kazi/slice1_dogfood_test.exs (the Slice-1 acceptance dogfood, T1.8). Labelled as a test, not a live kazi apply run.
Each new converged dogfood becomes a new entry here. To add one: capture a real kazi apply --json envelope, record it in the devlog, and append a case — never a number without a run behind it.