Execute
Default Action¶
PYTHONUNBUFFERED=1 millstone --cli claude --cli-reviewer codex -n 1 --max-cycles 6
Run this command. Verify the result. Review the remainder of the tasklist to ensure it's still sensibly aligned with the objective given the ground truth of the implementation. Repeat until the tasklist has no unchecked tasks or human input is required.
When invoked from Claude Code (Bash tool): -
PYTHONUNBUFFERED=1— required to see output; the Bash tool's piped stdout triggers Python's line-buffering and swallows output otherwise. -CLAUDECODEenv var — millstone automatically strips this from the subprocess environment when invoking claude, so nested sessions work without any manual action.
Responsibilities¶
- Execute — Run millstone to process tasklist tasks.
- Verify — Check that changes are correct after each task.
- Intervene — Fix problems when the orchestrator gets stuck.
- Escalate — Stop and report when something needs human judgment.
Decision Framework¶
- Continue IF: Exit 0, tests pass, changes align with task, diff matches scope.
- Stop/Escalate IF: Loop detected, commits wrong, tests fail without obvious cause, scope creep, STOP.md created, uncertainty > 30%.
- Intervene IF: Ambiguous task, flawed design, missing context.
Autonomy Guidelines¶
- High: Well-defined/mechanical tasks, stable project.
- Low: Ambiguous/Complex tasks, security/auth, post-major changes, generated plans.
- Zero: Requires external context, product judgment, high risk.
Preflight¶
- Skim top unchecked tasklist items; rewrite any ambiguous ones.
- Predict likely blockers (missing tests, unclear API, cross-cutting changes). Add prerequisite tasks if needed.
- Decide autonomy level in advance.
Runtime Expectations¶
| Operation | Typical Duration | Timeout |
|---|---|---|
Single task (-n 1) |
2-30 minutes | 60 min |
Authoring Loop Invariant¶
Every outer-loop authoring step — --analyze, --design, and --plan — runs an iterative write/review/fix loop. A reviewer agent checks the output and requests revisions until it approves or --max-cycles is exhausted. --max-cycles applies equally to inner-loop build-review iterations and outer-loop authoring loops.
Scoping Remote Backlogs¶
When using the MCP tasklist provider, the default scope is all open items returned by the agent's configured MCP server. Narrow it with [millstone.artifacts.tasklist_filter] in .millstone/config.toml — filter keys are forwarded to the agent as part of the read instruction:
[millstone.artifacts.tasklist_filter]
label = "sprint-1"
cycles = ["Cycle 5"]
Use local .millstone/tasklist.md for solo or personal projects; use the MCP provider plus a filter for team boards where the backlog lives in a remote service. The tasklist storage location is determined dynamically by the configured provider — the agent receives instructions specifying where to read and write tasks. Full filter reference: docs/providers/mcp.md.
Invocation Patterns¶
millstone -n 1— Single task (default, safest)millstone -n 3— Batch, when confident tasks are well-defined
Exit Codes & Recovery¶
- Exit 0 (Success): Verify with
git log -1, continue if appropriate. - Exit 1 (Halted): Diagnose cause.
- Check
.millstone/STOP.md(Sanity check failure). - Check output (LoC threshold).
- Check output (Sensitive files).
- Check output (Loop detection).
- Check
git status(Commit failed). - Recovery:
millstone --continue(After manual review/fix).git revert HEAD(If tests failing/bad commit).millstone --task "simpler..."(If stuck in loop).
- Check
Correcting Changes¶
In order of preference:
1. Inject task: Add - [ ] Fix X... to .millstone/tasklist.md → millstone -n 1. Maintains traceability.
2. Direct task: millstone --task "Fix X...". For quick, one-off fixes.
3. Revert & reframe: git revert HEAD → rewrite original task → re-run.
CLI Configuration¶
- Default:
claude(Anthropic). - Options:
codex(OpenAI). - Per-role:
--cli-builder,--cli-reviewer,--cli-sanity,--cli-analyzer. - Persistent config:
.millstone/config.toml.
Anti-Patterns¶
- Polling millstone runs instead of waiting for completion.
- Ignoring exit code 1.
- Trusting
--cycle --no-approveon unfamiliar codebases. - Continuing blindly after repeated halts.
Debugging¶
- Run logs:
.millstone/runs/
millstone to complete the tasklist with exceptional quality.
<planning>: Analyze current state → strategize blockers → select command → verify against constraints.
2. Response: the command to run and one sentence explaining why.
<planning> block.
2. Execute: Run the selected command.
3. Verify: Check exit code and logs immediately.
4. Loop: Repeat until the tasklist is empty or the Decision Framework requires escalation.