Architecture
Goals (design intent)
- gh-aw format compatibility: Task files use Markdown + YAML frontmatter like Agentic Workflows (gh-aw); you can drop community workflows into .wm/tasks/.
- No compile step: No .lock.yml, no gh aw compile.
- Go + go-gh: GitHub auth follows gh auth login (see internal/ghclient for API usage from commands like assign).
- Thin coordination on GitHub: Issues, labels, Actions, PRs; no extra control plane.
High-level pipeline
Each task follows trigger → resolve → run (RunTask). The run is a five-phase, in-process pipeline (no gh-aw-style compile):
- activation: event/task validation, feature branch for PR mode, per-run artifact dir under .wm/runs/ or WM_RUN_DIR.
- agent: runAgent subprocess, with WM_SAFE_OUTPUT_FILE set to the per-run output.jsonl when a run dir exists.
- validation: successful exit and output size bound; a context deadline is surfaced as a timeout.
- safe-outputs (internal/output): reads output.jsonl (gh wm emit) when safe-outputs: is non-empty; max: and label allowlists are enforced; empty output warns and succeeds.
- conclusion (defer): checkpoint comment, branch rollback on failure, result.json.

RunTask returns a types.RunResult with Phase, Success, Errors, RunDir, and AgentResult; wm run logs phase= and artifacts= on stderr when a run directory is created.
Optional checkpoints (internal/checkpoint): when WM_CHECKPOINT=1, the runner loads the latest checkpoint from issue comments into the prompt before the agent, and posts a new checkpoint comment after a successful run (see internal/engine/runner.go).
Code map
| Concern | Location | Role |
|---|---|---|
| CLI entry | cmd/ | Cobra commands: init, compile, upgrade, update, assign, resolve, run, process-outputs, emit, status, logs, add. |
| Config + tasks | internal/config/ | Load .wm/config.yml, parse .wm/tasks/*.md frontmatter (frontmatter.go). |
| Event → task names | internal/trigger/match.go | MatchOnOR: implements on: OR-semantics against types.GitHubEvent. |
| Orchestration | internal/engine/ | ResolveMatchingTasks and ResolveForcedTask (resolver.go); forced resolve pins one task by filename without evaluating on: (matches local gh wm run); RunTask (runner.go), per-run dirs (rundir.go), activation checks (activation.go), output validation (validation.go), conclusion/defer (conclusion.go), runAgent (agent.go). |
| Post-agent steps | internal/output/ | RunSuccessOutputs: applies output.jsonl (gh wm emit) when safe-outputs: is set (see task-format). |
| wm-agent.yml generation | internal/gen/wmagent.go, triggers.go, schedules.go | Task-driven union of GitHub on: keys (issues / issue_comment / pull_request types, slash_command → issue_comment, on.schedule crons); writes caller workflow. |
| Embedded templates | internal/templates/ | Starters for gh wm init (config.yml, tasks). |
| GitHub API helpers | internal/ghclient/ | Labels, issue comments (gh api). |
| Feature branch before PR | internal/gitbranch/ | When safe-outputs includes create-pull-request, create wm/<task>-… on the default branch so the agent does not commit directly to main. |
GitHub Actions: reusable workflows and generated wm-agent.yml
Business repos use an auto-generated wm-agent.yml (from gh wm init / gh wm compile). Runner labels come from workflow.runs_on in .wm/config.yml. Optional workflow.install_claude_code (default true) controls whether CI installs the Claude Code CLI before gh-wm run. Optional workflow.gh_wm_extension_version passes --pin to gh extension install (tag or commit; see gh help extension install). Optional workflow.pre_steps lists prerequisite Actions steps (toolchains, deps). compile rewrites wm-agent.yml when you change any of these.
- Resolve always uses reusable agent-resolve.yml.
- Run uses reusable agent-run.yml when workflow.pre_steps is empty. If workflow.pre_steps is set, the generator embeds the same checkout → pre-steps → gh extension install → (optional) Claude Code install → gh wm run --agent-only → pack workspace → artifact → gh wm process-outputs sequence inline in wm-agent.yml (reusable workflows cannot take arbitrary step YAML as inputs).
agent-resolve.yml (.github/workflows/agent-resolve.yml)
- runs-on is driven by the runs_on workflow input (JSON array of labels), with default ["ubuntu-latest"]; generated wm-agent.yml passes labels from .wm/config.yml.
- Checks out the repo, ensures gh via the composite install-gh-cli action (official cli/cli Linux tarball when gh is missing on self-hosted runners), installs gh-wm via gh extension install, writes the GitHub event JSON to .wm/runs/github-event.json (under the ignored runs/ tree; see .wm/.gitignore) so git status stays clean for gh wm run's working-tree check, sets WM_SCHEDULE_CRON from github.event.schedule when event_name is schedule (so only tasks whose normalized cron matches that tick are resolved), then runs: gh wm resolve --repo-root . --event-name "$EVENT_NAME" --payload .wm/runs/github-event.json --json
- Exposes the printed JSON array as job output tasks, and sets has_tasks to the string true or false so the caller can skip the run job when nothing matched (avoids matrix/fromJSON errors on empty input).
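
As a rough sketch of the two job outputs described above, the following shows one way to render a task list as a JSON array plus a has_tasks string. The resolveOutputs helper and the task names are hypothetical; the real workflow takes the array from gh wm resolve --json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// resolveOutputs renders the tasks array and the has_tasks flag as strings,
// the form in which job outputs are passed between Actions jobs.
func resolveOutputs(tasks []string) (tasksJSON string, hasTasks string, err error) {
	if tasks == nil {
		tasks = []string{} // a nil slice would marshal to "null" instead of "[]"
	}
	b, err := json.Marshal(tasks)
	if err != nil {
		return "", "", err
	}
	return string(b), strconv.FormatBool(len(tasks) > 0), nil
}

func main() {
	// Task names below are made up for illustration.
	for _, tasks := range [][]string{nil, {"triage", "fix-bug"}} {
		tasksJSON, hasTasks, err := resolveOutputs(tasks)
		if err != nil {
			panic(err)
		}
		fmt.Printf("tasks=%s has_tasks=%s\n", tasksJSON, hasTasks)
	}
}
```

Keeping has_tasks as a separate string lets the caller gate the run job with a plain comparison before fromJSON ever sees an empty matrix.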
agent-run.yml (.github/workflows/agent-run.yml), used when workflow.pre_steps is unset
- The generated caller runs this workflow only when needs.resolve.outputs.has_tasks == 'true'. Matrix over fromJSON(needs.resolve.outputs.tasks) with fail-fast: false.
- Two jobs (token sandbox):
  - agent uses permissions: contents|issues|pull-requests: read so the GITHUB_TOKEN available to the agent subprocess cannot mutate GitHub state; it runs gh wm run … --agent-only, then tars the entire workspace (including .git and .wm/runs/<id>/) into ${{ runner.temp }}/wm-workspace.tar.gz (per job, so parallel matrix legs on self-hosted runners do not race on a shared /tmp path) and uploads wm-workspace-<task>-<run_id>.tar.gz as an artifact.
  - outputs (needs agent) uses permissions: write, downloads that artifact to ${{ runner.temp }} (not the repo root), runs tar -xzf into GITHUB_WORKSPACE, and runs gh wm process-outputs --run-dir … to apply safe-outputs: and conclusion with a write-capable token (download outside the checkout avoids leaving the archive as an untracked file, which would break gh pr create).
- Upload/download use actions/upload-artifact@v5 and actions/download-artifact@v7 (Node.js 24). Self-hosted runners must be Actions Runner 2.327.1 or newer for download-artifact@v7.
- Unless install_claude_code is false, the agent job runs the official Claude Code installer (https://claude.ai/install.sh) and appends $HOME/.local/bin to GITHUB_PATH so claude is on PATH on minimal self-hosted runners.
Inline run_agent / run_outputs jobs, used when workflow.pre_steps is set
- Same matrix and the same read/write split as agent-run.yml (payload under .wm/runs/github-event.json); steps include workflow.pre_steps after checkout and before gh extension install; when workflow.install_claude_code is true (default), the same Claude Code install + GITHUB_PATH steps run before gh wm run --agent-only.
- run_outputs is gated with if: always() && !cancelled() && … so a failing matrix leg for one task does not skip process-outputs for other tasks.
- The process-outputs step uses gh wm process-outputs --task "<task>" so the gh-wm binary resolves the newest .wm/runs/<id>/ for that task (same logic as --run-dir, no extra runtime dependencies).
GitHub Actions token sandbox
agent-run.yml enforces safe-outputs by denying the agent direct gh writes: the agent job runs with a read-only GITHUB_TOKEN (repository read on contents/issues/PRs). Intended GitHub mutations go through gh wm emit into output.jsonl, then the outputs job runs gh wm process-outputs --run-dir <path> (engine.ProcessRunOutputs) so only policy-validated actions execute. process-outputs requires a persisted result.json from gh wm run --agent-only with agent_result.success true; otherwise it refuses to apply outputs. The workspace tarball preserves .git so create-pull-request (git push + gh pr create) still works in the outputs job. For create_pull_request, add_labels, and create_issue outputs that use labels, gh-wm ensures each label exists on the repository (create with a default color if missing) before gh pr create or issue APIs run, so repos do not need labels pre-seeded. Activation side effects that require write (on.reaction) may fail in the agent job on a read-only token; they are logged and the run continues (see runner.go).
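
The result.json precondition can be pictured as a small gate that runs before any output is applied. The sketch below assumes only the agent_result.success field named above; the persistedResult type, the canApplyOutputs helper, and the error messages are hypothetical and do not reproduce engine.ProcessRunOutputs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// persistedResult models only the part of result.json this gate needs.
type persistedResult struct {
	AgentResult *struct {
		Success bool `json:"success"`
	} `json:"agent_result"`
}

// canApplyOutputs returns nil only when result.json records a successful agent run.
func canApplyOutputs(resultJSON []byte) error {
	var r persistedResult
	if err := json.Unmarshal(resultJSON, &r); err != nil {
		return fmt.Errorf("parse result.json: %w", err)
	}
	if r.AgentResult == nil {
		return errors.New("result.json has no agent_result")
	}
	if !r.AgentResult.Success {
		return errors.New("agent_result.success is false; refusing to apply outputs")
	}
	return nil
}

func main() {
	fmt.Println(canApplyOutputs([]byte(`{"agent_result":{"success":true}}`)))
	fmt.Println(canApplyOutputs([]byte(`{"agent_result":{"success":false}}`)))
}
```

Failing closed on a missing or unparsable file keeps the write-capable job from acting on a run whose outcome is unknown.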
Note: In CI, the installed binary name is gh-wm. When installed as a gh extension, the same commands are available as gh wm ….
Loop prevention (generated wm-agent.yml)
The generator adds workflow-level defenses so agent side effects (labels, comments, PRs) are less likely to cascade into repeated runs—especially when gh uses a PAT (where GitHub does not suppress follow-up workflow runs the way it does for the default GITHUB_TOKEN):
- concurrency: one in-flight run per issue/PR number (falls back to github.run_id for schedule/dispatch).
- resolve job if: skips when github.actor is github-actions[bot], except for schedule and workflow_dispatch (those are always evaluated).
ResolveMatchingTasks adds resolver-side guards before MatchOnOR: skip events whose sender is a Bot (same exceptions: schedule, workflow_dispatch); issue_comment ignores comments that contain the hidden <!-- wm-agent: marker appended by add-comment and checkpoint posts. Use on.issues.labels in tasks that should run only when specific labels are added (see task-format).
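
The resolver-side guards can be read as a single predicate over the incoming event. The sketch below is an approximation under stated assumptions: the event struct and shouldSkip are hypothetical names, and the sender type is assumed to come from the payload's sender.type field.

```go
package main

import (
	"fmt"
	"strings"
)

// agentMarker is the hidden comment prefix mentioned above.
const agentMarker = "<!-- wm-agent:"

// event is a hypothetical, flattened view of the fields the guards inspect.
type event struct {
	Name        string // GitHub event name, e.g. "issue_comment"
	SenderType  string // assumed to mirror payload sender.type ("User", "Bot")
	CommentBody string // only meaningful for issue_comment
}

// shouldSkip reports whether an event should be dropped before task matching.
func shouldSkip(ev event) bool {
	// schedule and workflow_dispatch are always evaluated.
	if ev.Name == "schedule" || ev.Name == "workflow_dispatch" {
		return false
	}
	if ev.SenderType == "Bot" {
		return true
	}
	// Comments written by the agent itself carry the hidden marker.
	return ev.Name == "issue_comment" && strings.Contains(ev.CommentBody, agentMarker)
}

func main() {
	fmt.Println(shouldSkip(event{Name: "issues", SenderType: "Bot"}))
	fmt.Println(shouldSkip(event{Name: "schedule", SenderType: "Bot"}))
	fmt.Println(shouldSkip(event{Name: "issue_comment", SenderType: "User", CommentBody: "done <!-- wm-agent: x -->"}))
}
```

The marker check matters most when gh runs with a PAT, because comments then appear to come from a regular user rather than a bot.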
Resolve behavior details
- engine.ResolveMatchingTasks applies the loop guards above, then loads all tasks and keeps those where trigger.MatchOnOR(event, task.OnMap()) is true.
- Schedule events: For event_name == schedule, every task that includes on.schedule matches at the MatchOnOR layer; agent-resolve.yml sets WM_SCHEDULE_CRON from the payload's schedule field so ResolveMatchingTasks can filter with trigger.ScheduleCronMatches (same fuzzy cron as gen.FuzzyNormalizeSchedule for that task path). For local gh wm resolve, set WM_SCHEDULE_CRON yourself when simulating a schedule tick, or omit it to list every schedule task.
- Payload: Event JSON is read from --payload or GITHUB_EVENT_PATH when set; if both are unset, the payload defaults to {}. Event name comes from --event-name or GITHUB_EVENT_NAME.
Run behavior details
- engine.RunTask returns a RunResult with phase, accumulated errors, timing, and RunDir. In order, it:
  - validates the event and engine and builds TaskContext;
  - creates a per-run directory (NewRunDir: .wm/runs/<id>/ or WM_RUN_DIR/<id>/);
  - optionally loads checkpoint text;
  - optionally creates a feature branch via internal/gitbranch when safe-outputs includes create-pull-request (see CLI reference);
  - runs runAgent, which writes prompt.md, appends safe outputs instructions when safe-outputs: is set, sets WM_SAFE_OUTPUT_FILE, and streams combined stdout/stderr to a per-run agent log file (agent-stdout.log by default, or structured conversation.json / conversation.jsonl when print-mode JSON is enabled for the built-in claude CLI); on Unix it sends SIGTERM, then kills, when the run context is canceled;
  - validates agent output size (from log file stat when present) and success;
  - on success, runs output.RunSuccessOutputs unless RunOptions.AgentOnly is set (CI token sandbox: stop after validation; use ProcessRunOutputs later). Empty output.jsonl warns and succeeds (implicit noop).
- A deferred conclusion runs when not agent-only: on success, checkpoint comment if WM_CHECKPOINT=1; on failure, checkout of the previous branch if a feature branch was created; finally result.json and meta.json (phase conclusion). With AgentOnly, the defer writes result.json / run.json without conclusion; ProcessRunOutputs performs outputs + conclusion.
- runAgent builds the prompt from the task body + context.files + optional checkpoint hint + Available Outputs (when safe-outputs: is non-empty); sets WM_TASK_TOOLS when tools: is present; selects CLI via WM_AGENT_CMD or engine: (claude, codex; use WM_AGENT_CMD for a custom CLI). Default claude uses stdin for the prompt, --dangerously-skip-permissions, and optional --model / --max-turns from global config so the agent can run tools (including gh) non-interactively.
- When claude_output_format / WM_CLAUDE_OUTPUT_FORMAT request json or stream-json, the runner also passes --output-format and, for stream-json, --verbose (built-in claude only; WM_AGENT_CMD and codex keep plain-text capture).
- When RunOptions.LogWriter is set (e.g. gh wm run streaming to stderr), built-in claude forces stream-json so subprocess output is newline-delimited as events occur instead of buffering text until exit; the raw JSONL is written unchanged to conversation.jsonl, while the log writer receives human-readable lines parsed from the same stream (logstream.go).
- In-memory Stdout/Summary hold a 64 KiB tail of the transcript when a run dir is used (full text is on disk).
- Timeout: cmd/run uses timeout-minutes from task frontmatter (default 45, max 480).
RunTask pipeline (detailed reference)
Implementation: RunTask, rundir.go, activation.go, validation.go, conclusion.go.
Contract: One gh-wm run / gh wm run process executes the pipeline below. The primary API result is types.RunResult (Phase, Success, AgentResult, Errors, Duration, RunDir) plus a Go error. Conclusion (checkpoint, branch rollback, result.json) runs in a defer after task and tc are set; if the run fails earlier (e.g. config load, missing task, invalid event), tc may be nil and conclusion does nothing (and no run dir is created if failure is before NewRunDir).
Phase 1 — Activation (PhaseActivation)
| Reads | Purpose |
|---|---|
| Disk: .wm/config.yml, .wm/tasks/*.md | config.Load → global config + tasks |
| In-memory: *GitHubEvent | Must be non-nil; Payload non-nil; Name non-empty (except unknown for local empty-event runs) |
| Env: GITHUB_REPOSITORY, WM_AGENT_CMD, task engine: / global engine | Engine validation |
| Env: WM_CHECKPOINT=1 (optional) | Enables checkpoint read below |
| Env: WM_RUN_DIR (optional) | Base path for per-run dirs instead of <repo>/.wm/runs/ |
| Disk: claude_output_format in .wm/config.yml; env: WM_CLAUDE_OUTPUT_FORMAT (optional) | Overrides config when set: text (default), json, or stream-json for built-in claude; chooses run-dir filename, --output-format, and --verbose when stream-json |
| GitHub API: ghclient.ListIssueCommentBodies (optional) | Only with checkpoint mode + GITHUB_REPOSITORY + issue/PR number: load comment bodies to find latest <!-- wm-checkpoint: … --> |
| In-memory outputs | |
|---|---|
| TaskContext | Task name, RepoPath, event, issue/PR numbers from payload (extractNumbers) |
| CheckpointHint | Latest checkpoint summary text for the agent prompt |
| Writes / side effects | Where |
|---|---|
| Optional: feature branch | Local git repo (gitbranch.PrepareFeatureForPR) when safe-outputs includes create-pull-request |
| Per-run directory | <repo>/.wm/runs/<id>/ or WM_RUN_DIR/<id>/: meta.json (phase activation); PruneRunDirs drops dirs older than 7 days under .wm/runs (and under WM_RUN_DIR when set) |
Phase 2 — Agent (PhaseAgent)
| Reads | Purpose |
|---|---|
| Task body, global context.files | Prompt in runAgent |
| CheckpointHint | Appended to prompt |
| Repo working tree | Agent subprocess Dir = --repo-root; agent may edit files / run git |
| Outputs | |
|---|---|
| AgentResult | Combined transcript: full agent log on disk (agent-stdout.log, or conversation.json / conversation.jsonl when structured print-mode output is enabled for built-in claude); Stdout/Summary hold a 64 KiB tail when a run dir exists (for checkpoints/comments); Success, ExitCode, TimedOut if context deadline exceeded |
| Optional stream | Tee to RunOptions.LogWriter (CLI uses stderr) and to the same per-run log file; raw bytes match the subprocess (e.g. conversation.jsonl when stream-json) while stderr shows formatted [tool] / [agent] / [result] lines when streaming is enabled |
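
One way to keep a bounded in-memory tail while the full transcript goes to disk is a writer that discards its oldest bytes. The tailBuffer type below is a hypothetical sketch of that idea, not the implementation in agent.go; in practice it could sit next to the log file behind an io.MultiWriter.

```go
package main

import "fmt"

// tailLimit is the 64 KiB bound mentioned above.
const tailLimit = 64 * 1024

// tailBuffer is an io.Writer that keeps only the last tailLimit bytes written.
type tailBuffer struct {
	buf []byte
}

// Write appends p and then trims the front so at most tailLimit bytes remain.
func (t *tailBuffer) Write(p []byte) (int, error) {
	t.buf = append(t.buf, p...)
	if extra := len(t.buf) - tailLimit; extra > 0 {
		t.buf = append(t.buf[:0], t.buf[extra:]...)
	}
	return len(p), nil
}

// String returns the retained tail.
func (t *tailBuffer) String() string { return string(t.buf) }

func main() {
	var tail tailBuffer
	for i := 0; i < 10000; i++ {
		fmt.Fprintf(&tail, "log line %d\n", i)
	}
	fmt.Println(len(tail.String()) <= tailLimit)
}
```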
Phase 3 — Validation (PhaseValidation)
| Reads | Purpose |
|---|---|
| AgentResult, run context | In-process checks; deadline → timeout error |
| Checks | |
|---|---|
| validateAgentOutputErr | Non-nil result, Success, not timed out; size from the on-disk agent log path when set, else in-memory text length ≤ 12 MiB. Empty successful output is allowed. |
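
The checks in this table can be sketched as one function over the agent result and an output size. The agentResult struct, the validateAgentOutput name, the order of the checks, and the error messages are assumptions; only the conditions themselves (non-nil, success, no timeout, 12 MiB bound, empty output allowed) come from the table.

```go
package main

import (
	"errors"
	"fmt"
)

// maxOutputBytes is the 12 MiB bound from the table above.
const maxOutputBytes = 12 << 20

// agentResult is a reduced, hypothetical stand-in for the engine's AgentResult.
type agentResult struct {
	Success  bool
	TimedOut bool
	ExitCode int
}

// validateAgentOutput applies the validation checks; outputSize is the log file
// size when a log exists, otherwise the length of the in-memory text.
func validateAgentOutput(res *agentResult, outputSize int64) error {
	switch {
	case res == nil:
		return errors.New("no agent result")
	case res.TimedOut:
		return errors.New("agent timed out")
	case !res.Success:
		return fmt.Errorf("agent failed with exit code %d", res.ExitCode)
	case outputSize > maxOutputBytes:
		return fmt.Errorf("agent output is %d bytes, limit is %d", outputSize, maxOutputBytes)
	}
	return nil // empty successful output is allowed
}

func main() {
	fmt.Println(validateAgentOutput(&agentResult{Success: true}, 0))
	fmt.Println(validateAgentOutput(&agentResult{Success: true}, 13<<20))
	fmt.Println(validateAgentOutput(&agentResult{TimedOut: true}, 10))
}
```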
Phase 4 — Safe outputs (PhaseOutputs)
| Reads | Purpose |
|---|---|
| AgentResult.SafeOutputFilePath, output.jsonl, TaskContext, safe-outputs: | RunSuccessOutputs: NDJSON items (empty warns with implicit noop) |
| Writes (if configured / requested) | Persistence |
|---|---|
| create-pull-request / create_pull_request | git push, gh pr create → GitHub |
| create-issue / create_issue | gh issue create → GitHub |
| add-labels / add_labels | GitHub API → labels |
| remove-labels / remove_labels | GitHub API → remove labels |
| add-comment / add_comment | gh issue comment / gh pr comment → GitHub |
| noop | Log only |
| missing_tool / missing_data | Log only |
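
The filtering of output.jsonl by declared keys and max: can be sketched as below. Several details are assumptions made for illustration: that each NDJSON line carries its kind in a "type" field, that hyphenated and underscored names are treated as the same key, and that items over the limit are dropped silently. The item type and the normalize and filterItems helpers are hypothetical and do not reproduce internal/output.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// item models one NDJSON line; the "type" field name is an assumption.
type item struct {
	Type string `json:"type"`
}

// normalize maps "create-pull-request" and "create_pull_request" to one key.
func normalize(kind string) string { return strings.ReplaceAll(kind, "-", "_") }

// filterItems keeps items whose kind is declared in limits and within its max.
func filterItems(ndjson string, limits map[string]int) ([]item, error) {
	allowed := make(map[string]int, len(limits))
	for kind, limit := range limits {
		allowed[normalize(kind)] = limit
	}
	counts := map[string]int{}
	var kept []item
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var it item
		if err := json.Unmarshal([]byte(line), &it); err != nil {
			return nil, fmt.Errorf("bad output line: %w", err)
		}
		kind := normalize(it.Type)
		limit, declared := allowed[kind]
		if !declared || counts[kind] >= limit {
			continue // undeclared kind or over max: dropped in this sketch
		}
		counts[kind]++
		kept = append(kept, it)
	}
	return kept, sc.Err()
}

func main() {
	out := `{"type":"add_comment"}
{"type":"add_comment"}
{"type":"create_issue"}`
	kept, err := filterItems(out, map[string]int{"add-comment": 1})
	fmt.Println(len(kept), err)
}
```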
Phase 5 — Conclusion (deferred)
Runs in defer via concludeRun only when task and tc are non-nil.
On success (runSucceeded):
| Action | Reads | Writes |
|---|---|---|
| Checkpoint | WM_CHECKPOINT=1, AgentResult text | New issue comment (ghclient.PostIssueComment), body includes encoded checkpoint plus hidden <!-- wm-agent: footer for loop prevention |
On failure:
| Action | Reads | Writes |
|---|---|---|
| Branch rollback | branchCreated, prevBranch | git checkout previous branch on disk (if applicable) |
| Artifacts | RunResult | meta.json (phase conclusion), result.json (serialized outcome) |
Checkpoint failures are appended to RunResult.Errors and do not always change the primary returned error from an earlier phase.
What persists where
| Kind | Where |
|---|---|
| RunResult / errors | In-memory for the process; CLI prints phase=, artifacts=, and failure phase: on stderr |
| Per-run artifacts | .wm/runs/<id>/ (or WM_RUN_DIR/<id>/): prompt.md; optional output.jsonl (gh wm emit); combined agent stdout/stderr (agent-stdout.log by default, or conversation.json / conversation.jsonl when claude_output_format / WM_CLAUDE_OUTPUT_FORMAT is json / stream-json for built-in claude); meta.json (phase updates); result.json (final snapshot); run.json (merged meta + outcome for tooling). Ignore runs/ under .wm/ via .wm/.gitignore (gh wm init / gh wm compile ensure that file). |
| Agent tail in memory | Last 64 KiB of combined output in AgentResult when a run dir is used (full output remains in the per-run agent log file above) |
| Repo state | Whatever git / the agent wrote under --repo-root |
| Coordination | GitHub: labels, issue/PR comments, PRs — the main external persistence |
| Checkpoints | Issue comments when WM_CHECKPOINT=1, encoded in internal/checkpoint |
Note: RunResult.Phase is the last phase reached or where failure occurred; it is not set to a separate conclusion value after the defer. There is no collaborator/actor permission gate in the current implementation.
Security posture (minimal)
- In GitHub Actions with the generated agent-run.yml, the agent job uses a read-only token so direct gh mutations from the subprocess fail; validated writes happen only in the outputs job via gh wm process-outputs (internal/output). Local gh wm run still uses your normal gh auth unless you enforce otherwise.
- Agent-driven output.jsonl is filtered by declared safe-outputs: keys, max:, and label allowlists.
- Draft PR defaults in safe-outputs / .wm/config.yml feed gh pr create when create-pull-request is listed (agent can override draft per item when requesting create_pull_request).