Building a Crude Agentic Dev Loop with Hermes + GitHub Copilot
I've been running an automated pipeline that takes a GitHub issue, hands it to GitHub Copilot, waits for a PR, reviews it, and merges — without me touching a keyboard. It doesn't always work. When it does, it's genuinely useful. This post is an honest account of how it works, what breaks, and whether it's worth the setup.
What is Hermes?
Hermes Agent is a local AI assistant I built for personal automation. It runs as a background process with access to your filesystem, a browser, and external services like GitHub and Telegram. The three features that make it useful for this kind of pipeline are:
- Persistent memory — it remembers context across sessions
- Cron scheduling — you can tell it to run something every N minutes
- Tool access — it can run shell commands, call APIs, and control a browser
The key concept: Hermes is the orchestrator. It doesn't write code. It reads GitHub, makes decisions, calls the Copilot API, watches for results, and acts on them. GitHub Copilot is the implementer. It reads the issue and writes the code.
Connecting Hermes to GitHub
The gh CLI
Hermes uses the gh CLI for most GitHub operations. If you've already authenticated it, there's nothing extra to do — Hermes uses the same token. No separate PAT required.
gh auth login
gh auth status
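The snippets later in this post call a gh_get() helper that I never show. Here is a minimal sketch of one way to write it on top of gh api, so it reuses the token from gh auth login. The runner parameter and the _run_gh name are assumptions added for illustration (they make the helper easy to test without touching the network); the real script may look different.

```python
import json
import subprocess
from typing import Any, Callable, Sequence

# Hypothetical type alias: anything that takes a command and returns its stdout.
Runner = Callable[[Sequence[str]], str]


def _run_gh(cmd: Sequence[str]) -> str:
    """Run a gh command and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    return result.stdout


def gh_get(path: str, runner: Runner = _run_gh) -> Any:
    """GET a REST path through `gh api` and decode the JSON response."""
    out = runner(["gh", "api", "--method", "GET", path])
    # Some endpoints return an empty body; treat that as None.
    return json.loads(out) if out.strip() else None
```

Because the runner is injectable, the decision logic built on top of gh_get() can be exercised with canned JSON instead of live API calls.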
Enabling the Copilot Cloud Agent
This is the step nobody mentions in the docs. Before you can assign issues to Copilot programmatically, you have to enable the GitHub Copilot cloud agent on the repository via the web UI:
- Go to your repo → Settings → Copilot
- Enable the coding agent
- While you're there, grant it bypass permissions for branch protection rules (more on why later)
Without this step, assignment calls to the API will silently fail or return misleading errors.
The Pipeline Concept
The core idea is a cron job that runs every 3 minutes and does the following:
- Read open GitHub issues with the Jeeves label
- For any unassigned issue, assign it to Copilot
- For issues that are assigned and in progress, wait for Copilot to finish
- When Copilot finishes, find the PR, run a review, and merge
Stateless design. Every tick of the cron derives the current state entirely from GitHub — there's no local state file. If the process restarts, it picks up exactly where it left off. This makes it resilient to crashes and restarts, which matters for a crude pipeline running on a laptop.
Labels as control signals. Three labels drive the pipeline's behaviour:
| Label | Meaning |
|---|---|
| Jeeves | This issue is in the pipeline |
| status: pipeline-paused | Something went wrong; don't retry |
| status: ruleset-blocked | Copilot hit a branch protection rule |
Two queues for parallel workstreams. I have two label-based queues (Jeeves and Jeeves2) and two corresponding cron jobs, so I can run two issues through the pipeline in parallel.
Here's a rough state machine diagram:
Issue has label `Jeeves`
        │
        ▼
  [Unassigned?]
        │ yes
        ▼
Assign to Copilot ──► Wait for copilot_work_finished event
                                    │
                              [Timed out?]
                          │ yes            │ no
                          ▼                ▼
                    Retry (max 3)     Find the PR
                      or Pause             │
                                           ▼
                                    CI checks pass?
                                           │ yes
                                           ▼
                                    Review + Merge
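The diagram above can be read as a pure function from derived state to the next action. This is a sketch rather than the real process_issue(): the action names and the timed_out, retries and ci_ok parameters are assumptions for illustration, while the state keys match what derive_state() returns further down.

```python
MAX_RETRIES = 3  # from the post: assignment is retried up to 3 times


def next_action(state: dict, timed_out: bool = False,
                retries: int = 0, ci_ok: bool = False) -> str:
    """Map the state derived from GitHub to one pipeline action."""
    if state.get("paused"):
        return "skip"  # status: pipeline-paused means hands off
    if not state.get("assigned_to_copilot"):
        return "assign"
    if not state.get("work_finished"):
        if not timed_out:
            return "wait"
        # Timed out: retry until the budget is spent, then pause the issue.
        return "retry" if retries < MAX_RETRIES else "pause"
    if state.get("pr") is None:
        return "wait"  # Copilot says it is done, but no PR was found yet
    return "review_and_merge" if ci_ok else "wait"
```

Keeping the decision separate from the API calls means each cron tick only has to gather state, ask for one action, and perform it.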
The Pipeline Script
The pipeline lives in a single Python script, runbook_pipeline.py. The entry point is process_issue(), which calls derive_state() to read GitHub and then acts based on what it finds.
derive_state()
def derive_state(issue_number: int) -> dict:
    """Rebuild the pipeline state for one issue from GitHub alone."""
    issue = gh_get(f"/repos/{REPO}/issues/{issue_number}")
    labels = [label["name"] for label in issue["labels"]]
    assignees = [a["login"] for a in issue["assignees"]]
    # Case-insensitive prefix match: the agent's login may come back
    # capitalised or with a bot suffix.
    assigned_to_copilot = any(
        a.lower().startswith("copilot") for a in assignees
    )
    pr = find_linked_pr(issue_number)
    work_finished = has_copilot_finished(issue_number)
    return {
        "issue": issue,
        "labels": labels,
        "assigned_to_copilot": assigned_to_copilot,
        "paused": "status: pipeline-paused" in labels,
        "pr": pr,
        "work_finished": work_finished,
    }
Finding the PR
Copilot doesn't always link its PR back to the issue via closes #N. The closingIssuesReferences field in GraphQL is unreliable. The fallback that actually works:
def find_linked_pr(issue_number: int):
    # Try the GraphQL closing references first
    closing = get_closing_issues_prs(issue_number)
    if closing:
        return closing[0]
    # Fallback: if there's exactly one open copilot/ branch PR, assume it's ours.
    # The REST `head` filter expects a full user:branch name, not a prefix,
    # so list the open PRs and filter on the branch prefix here.
    prs = gh_get(f"/repos/{REPO}/pulls?state=open&per_page=100")
    copilot_prs = [pr for pr in prs if pr["head"]["ref"].startswith("copilot/")]
    if len(copilot_prs) == 1:
        return copilot_prs[0]
    return None
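For the GraphQL side, here is a sketch of what a helper like get_closing_issues_prs() could do: ask for open PRs together with their closingIssuesReferences, then filter client-side. The query text and the prs_closing_issue name are my illustration, not the code from the real script, and the page sizes are arbitrary.

```python
# Open PRs with the issues each one claims to close.
CLOSING_REFS_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    pullRequests(states: [OPEN], first: 50) {
      nodes {
        number
        headRefName
        closingIssuesReferences(first: 10) { nodes { number } }
      }
    }
  }
}
"""


def prs_closing_issue(response: dict, issue_number: int) -> list:
    """Return the PR nodes whose closing references include issue_number."""
    nodes = response["data"]["repository"]["pullRequests"]["nodes"]
    return [
        pr for pr in nodes
        if any(ref["number"] == issue_number
               for ref in pr["closingIssuesReferences"]["nodes"])
    ]
```

When Copilot omits the closing keyword, the list comes back empty and the branch-name fallback takes over.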
The "done" signal
Copilot creates a copilot_work_finished timeline event on the issue when it finishes. This is the most reliable signal, but it doesn't always fire. The fallback is checking whether a review request has appeared on the PR.
def has_copilot_finished(issue_number: int) -> bool:
    timeline = gh_get(f"/repos/{REPO}/issues/{issue_number}/timeline")
    return any(
        e.get("event") == "copilot_work_finished"
        for e in timeline
    )
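The function above only covers the timeline event. A sketch of the review-request fallback, assuming the PR dict is the REST pull request object (which carries requested_reviewers and requested_teams lists); the function names here are made up for illustration:

```python
from typing import Optional


def review_requested(pr: Optional[dict]) -> bool:
    """True if the PR has at least one pending review request."""
    if not pr:
        return False
    return bool(pr.get("requested_reviewers") or pr.get("requested_teams"))


def copilot_done(timeline: list, pr: Optional[dict]) -> bool:
    """Primary signal: the timeline event. Fallback: a review request on the PR."""
    finished_event = any(
        e.get("event") == "copilot_work_finished" for e in timeline
    )
    return finished_event or review_requested(pr)
```

The fallback is weaker than the event: it only says someone was asked to review, so I would treat it as "probably done" and still let the CI gate decide whether to merge.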
CI check deduplication
GitHub keeps all historical check runs on a commit — including the failed first attempt before a re-run. The naive approach (any FAILURE = fail) gets tricked by a check that failed and then passed on retry.
The fix: deduplicate by check name and keep only the most recent run for each.
def get_latest_check_runs(pr: dict) -> list:
    sha = pr["head"]["sha"]
    runs = gh_get(f"/repos/{REPO}/commits/{sha}/check-runs")["check_runs"]
    # Deduplicate: keep only the most recent run for each check name
    seen = {}
    for run in sorted(runs, key=lambda r: r["started_at"] or ""):
        seen[run["name"]] = run
    return list(seen.values())

def ci_passing(pr: dict) -> bool:
    runs = get_latest_check_runs(pr)
    if not runs:
        return False  # No runs yet: wait
    # A run that is still queued or in progress means CI is not done: wait
    if any(r["status"] != "completed" for r in runs):
        return False
    return all(r["conclusion"] == "success" for r in runs)
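A worked example of the deduplication on canned data, with one extension: a three-way result so the pipeline can tell "still running" apart from "failed". Treating neutral and skipped conclusions as acceptable is my assumption here, not something the pipeline above does.

```python
OK_CONCLUSIONS = {"success", "neutral", "skipped"}  # assumption: these do not block


def latest_by_name(runs: list) -> list:
    """Keep only the most recently started run for each check name."""
    latest = {}
    for run in sorted(runs, key=lambda r: r.get("started_at") or ""):
        latest[run["name"]] = run
    return list(latest.values())


def ci_status(runs: list) -> str:
    """Return 'pending', 'passing' or 'failing' for a commit's check runs."""
    latest = latest_by_name(runs)
    if not latest or any(r["status"] != "completed" for r in latest):
        return "pending"
    if all(r["conclusion"] in OK_CONCLUSIONS for r in latest):
        return "passing"
    return "failing"


# "build" failed once, then passed on a re-run; only the re-run should count.
sample = [
    {"name": "build", "status": "completed", "conclusion": "failure",
     "started_at": "2025-01-01T10:00:00Z"},
    {"name": "build", "status": "completed", "conclusion": "success",
     "started_at": "2025-01-01T10:05:00Z"},
    {"name": "lint", "status": "completed", "conclusion": "success",
     "started_at": "2025-01-01T10:00:00Z"},
]
print(ci_status(sample))  # passing
```

Sorting on the raw timestamp strings works because GitHub returns them in one fixed ISO 8601 UTC format, which sorts chronologically as text.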
Auto-resolving review threads
Before merging, the pipeline resolves any open review threads so the merge isn't blocked by a stale comment.
def resolve_review_threads(pr_node_id: str):
    threads = get_review_threads(pr_node_id)
    for thread in threads:
        if not thread["isResolved"]:
            graphql_mutation(
                "resolveReviewThread",
                {"threadId": thread["id"]}
            )
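The get_review_threads() and graphql_mutation() helpers are not shown above. A sketch of the GraphQL documents they would need to send, assuming the PR's node ID is already known; the constant and function names are illustrative, and a PR with more than 100 threads would need pagination.

```python
# List the review threads on a pull request, looked up by node ID.
LIST_THREADS_QUERY = """
query($id: ID!) {
  node(id: $id) {
    ... on PullRequest {
      reviewThreads(first: 100) {
        nodes { id isResolved }
      }
    }
  }
}
"""

# Resolve a single review thread.
RESOLVE_THREAD_MUTATION = """
mutation($threadId: ID!) {
  resolveReviewThread(input: {threadId: $threadId}) {
    thread { id isResolved }
  }
}
"""


def unresolved_thread_ids(response: dict) -> list:
    """Pull the IDs of unresolved threads out of a LIST_THREADS_QUERY response."""
    nodes = response["data"]["node"]["reviewThreads"]["nodes"]
    return [n["id"] for n in nodes if not n["isResolved"]]


def resolve_payload(thread_id: str) -> dict:
    """Build the JSON body to POST to the GraphQL endpoint for one thread."""
    return {"query": RESOLVE_THREAD_MUTATION, "variables": {"threadId": thread_id}}
```

Resolving every thread automatically is a blunt instrument: it also dismisses comments a human reviewer left on purpose, which is acceptable for a solo repo and probably not for a shared one.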
The Cron Setup
In Hermes, a cron job is created with a natural-language schedule and an optional pre-run script.
Create a cron job:
  name: jeeves-pipeline
  schedule: every 3m
  pre-run: python3 runbook_pipeline.py --queue jeeves --state-only
  prompt: |
    The pipeline state is above.
    Process the next issue in the Jeeves queue.
    Run: python3 runbook_pipeline.py --queue jeeves
The pre-run script pattern is important. It runs the Python script to gather current state before the cron prompt executes. Hermes then has that context in its working memory when it decides what to do.
I have two cron jobs — jeeves-pipeline and jeeves2-pipeline — both running the same underlying script but targeting different label queues. This gives parallel processing without any extra infrastructure.
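On the script side, the two flags used in the cron definition could be parsed roughly like this. It is a sketch: the QUEUE_LABELS mapping and the parse_args name are assumptions about how the queue name turns into a label, not an excerpt from runbook_pipeline.py.

```python
import argparse

# Assumed mapping from the --queue value to the GitHub label it watches.
QUEUE_LABELS = {"jeeves": "Jeeves", "jeeves2": "Jeeves2"}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="runbook_pipeline.py")
    parser.add_argument("--queue", required=True, choices=sorted(QUEUE_LABELS),
                        help="which label queue to process")
    parser.add_argument("--state-only", action="store_true",
                        help="print the derived state and exit without acting")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    label = QUEUE_LABELS[args.queue]
    mode = "state only" if args.state_only else "process"
    print(f"queue={args.queue} label={label} mode={mode}")
```

With this shape, a third queue is one more entry in the mapping plus one more cron job.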
Pain Points and Rough Edges
This is not a polished system. Here's what breaks and how I've worked around it.
Copilot getting stuck
Sometimes Copilot just... stops. No error, no PR, no copilot_work_finished event. The pipeline has a 90-minute timeout per issue, after which it retries assignment up to 3 times. If all retries fail, it adds status: pipeline-paused to the issue and stops touching it.
Manual recovery: remove the status: pipeline-paused label. The pipeline will pick it up on the next tick.
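Because there is no local state file, the timeout and the retry count also have to come from GitHub. One way to do that, sketched under the assumption that every (re)assignment to Copilot leaves an "assigned" event in the issue timeline; the function names are hypothetical and the real script may track this differently. It only applies while no copilot_work_finished event has been seen.

```python
from datetime import datetime, timedelta, timezone

TIMEOUT = timedelta(minutes=90)  # from the post
MAX_RETRIES = 3                  # from the post


def _parse(ts: str) -> datetime:
    """Parse a GitHub timestamp such as 2025-01-01T10:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def copilot_assignments(timeline: list) -> list:
    """Times at which the issue was assigned to a copilot* login."""
    return [
        _parse(e["created_at"])
        for e in timeline
        if e.get("event") == "assigned"
        and (e.get("assignee") or {}).get("login", "").lower().startswith("copilot")
    ]


def timeout_action(timeline: list, now: datetime) -> str:
    """Decide between assign, wait, retry and pause for an unfinished issue."""
    assigned = copilot_assignments(timeline)
    if not assigned:
        return "assign"
    if now - max(assigned) < TIMEOUT:
        return "wait"
    retries_used = len(assigned) - 1  # the first assignment is not a retry
    return "retry" if retries_used < MAX_RETRIES else "pause"
```

A caller would pass datetime.now(timezone.utc) as now; keeping it a parameter makes the 90-minute boundary easy to test.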
Stale CI check runs
Described above. The deduplication fix in get_latest_check_runs() handles this, but it took a few failed merges to figure out the pattern.
Ruleset violations
Copilot hits branch protection rules it can't bypass. The error message GitHub returns is generic: "repository ruleset violation". Easy to mistake for a code problem.
Two fixes:
- Go to Settings → Copilot → grant bypass permissions. This lets the Copilot agent bypass required reviews on its own PRs.
- Label the issue status: ruleset-blocked so you know why it's paused without having to dig through the timeline.
Model slug drift
If you specify a model like claude-opus-4.6 in the Copilot assignment API, it works until GitHub removes that model slug. When the model disappears, the API returns a "repository ruleset violation" error — a completely misleading message that obscures the real cause.
Two fixes:
- Use an empty string for the model field to get Copilot's default
- Or verify the slug against the Copilot models API before assignment:
def get_available_models() -> list[str]:
    response = gh_get("/copilot/models")
    return [m["id"] for m in response.get("models", [])]

def assign_to_copilot(issue_number: int, model: str = ""):
    # Only look up the model list when a specific slug was requested
    if model and model not in get_available_models():
        print(f"Model {model} not available, using default")
        model = ""
    gh_patch(f"/repos/{REPO}/issues/{issue_number}", {
        "assignees": ["copilot"],
        "body": issue_body_with_model(model),
    })
PR linking is unreliable
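Copilot doesn't reliably use closes #N in its PR description, so the closingIssuesReferences GraphQL field is often empty. The fallback ("if there's exactly one open copilot/ branch PR, assume it's ours") works well in practice as long as only one copilot/ PR is open in the repository at a time. If both queues run against the same repository, two such PRs can be open at once, and the fallback then returns nothing until one of them merges.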
The pipeline pauses and stays paused
When Copilot times out, the issue gets the status: pipeline-paused label. If the issue is later resolved outside the pipeline (manually, or on a retry you kicked off by hand), the label stays and the pipeline never resumes.
Manual fix: remove the label. There's no auto-recovery here — getting the pipeline to safely detect "actually, this is resolved now" is more complexity than I've wanted to add to what's supposed to be a crude tool.
Is It Worth It?
Honest answer: yes, with caveats.
It requires babysitting. You can't set it up and forget it. Copilot gets stuck, models drift, rulesets block things, and PRs need occasional human review even when the code looks right.
It requires good issue specs. This is the real lesson from the analytics post. Copilot's success rate is almost entirely determined by issue quality. Vague issues produce incorrect PRs. Specific issues with exact file paths, expected behaviour, and acceptance criteria produce correct ones.
But for a solo dev with a backlog of well-defined issues, having a background agent working through them is genuinely valuable. You write the issue, walk away, and come back to a merged PR, at least some of the time. When it works, it's the closest thing I've found to delegation without a team.
The pattern scales better than you'd expect. The pipeline is cheap to run (it's a Python script and a cron job), stateless (no database, no persisted queues), and recoverable (every label and state lives on GitHub). Adding a new queue is a two-line change.
If you want to experiment with agentic tooling without committing to a heavy platform, this is a reasonable place to start. The rough edges are real, but they're navigable — and understanding where the edges are is half the value.