Building a Crude Agentic Dev Loop with Hermes + GitHub Copilot

Chris Child | 2026-04-25 | 9 min read

I've been running an automated pipeline that takes a GitHub issue, hands it to GitHub Copilot, waits for a PR, reviews it, and merges — without me touching a keyboard. It doesn't always work. When it does, it's genuinely useful. This post is an honest account of how it works, what breaks, and whether it's worth the setup.

What is Hermes?

Hermes Agent is a local AI assistant I built for personal automation. It runs as a background process with access to your filesystem, a browser, and external services like GitHub and Telegram. The three features that make it useful for this kind of pipeline are:

  • Persistent memory — it remembers context across sessions
  • Cron scheduling — you can tell it to run something every N minutes
  • Tool access — it can run shell commands, call APIs, and control a browser

The key concept: Hermes is the orchestrator. It doesn't write code. It reads GitHub, makes decisions, calls the Copilot API, watches for results, and acts on them. GitHub Copilot is the implementer. It reads the issue and writes the code.

Connecting Hermes to GitHub

The gh CLI

Hermes uses the gh CLI for most GitHub operations. If you've already authenticated it, there's nothing extra to do — Hermes uses the same token. No separate PAT required.

gh auth login
gh auth status
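The scripts later in this post call gh_get() and gh_patch() without showing them. Here is a minimal sketch of what they could look like, assuming they simply shell out to gh api so they reuse the token from gh auth login. The names gh_api, gh_get, gh_patch and the injectable run parameter are illustrative assumptions, not Hermes internals.

```python
import json
import subprocess
from typing import Any, Callable, Optional

# Hypothetical helpers: the post calls gh_get() and gh_patch() without showing
# them. This sketch shells out to `gh api`, which reuses the gh login token.
Runner = Callable[..., subprocess.CompletedProcess]


def gh_api(path: str, method: str = "GET", body: Optional[dict] = None,
           run: Runner = subprocess.run) -> Any:
    """Run `gh api` and return the decoded JSON response (None if empty)."""
    cmd = ["gh", "api", "--method", method, path]
    payload = None
    if body is not None:
        cmd += ["--input", "-"]  # read the JSON request body from stdin
        payload = json.dumps(body)
    # check=True raises CalledProcessError when gh exits non-zero (e.g. HTTP 4xx/5xx)
    result = run(cmd, input=payload, capture_output=True, text=True, check=True)
    return json.loads(result.stdout) if result.stdout.strip() else None


def gh_get(path: str, run: Runner = subprocess.run) -> Any:
    return gh_api(path, run=run)


def gh_patch(path: str, body: dict, run: Runner = subprocess.run) -> Any:
    return gh_api(path, method="PATCH", body=body, run=run)
```

The run parameter only exists so the wrappers can be exercised without network access. List endpoints return a single page, so paths that list things should carry per_page=100.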

Enabling the Copilot Cloud Agent

This is the step nobody mentions in the docs. Before you can assign issues to Copilot programmatically, you have to enable the GitHub Copilot cloud agent on the repository via the web UI:

  1. Go to your repo → Settings → Copilot
  2. Enable the coding agent
  3. While you're there, grant it bypass permissions for branch protection rules (more on why later)

Without this step, assignment calls to the API will silently fail or return misleading errors.

The Pipeline Concept

The core idea is a cron job that runs every 3 minutes and does the following:

  1. Read open GitHub issues with the Jeeves label
  2. For any unassigned issue, assign it to Copilot
  3. For issues that are assigned and in progress, wait for Copilot to finish
  4. When Copilot finishes, find the PR, run a review, and merge

Stateless design. Every tick of the cron derives the current state entirely from GitHub — there's no local state file. If the process restarts, it picks up exactly where it left off. This makes it resilient to crashes and restarts, which matters for a crude pipeline running on a laptop.

Labels as control signals. Three labels drive the pipeline's behaviour:

Label                      Meaning
Jeeves                     This issue is in the pipeline
status: pipeline-paused    Something went wrong; don't retry
status: ruleset-blocked    Copilot hit a branch protection rule

Two queues for parallel workstreams. I have two label-based queues (Jeeves and Jeeves2) and two corresponding cron jobs, so I can run two issues through the pipeline in parallel.

Here's a rough state machine diagram:

Issue has label `Jeeves`
         │
         ▼
    [Unassigned?]
         │ yes
         ▼
  Assign to Copilot ──► Wait for copilot_work_finished event
                                   │
                              [Timed out?]
                                   │ yes                  │ no
                                   ▼                      ▼
                             Retry (max 3)         Find the PR
                             or Pause              │
                                                   ▼
                                             CI checks pass?
                                                   │ yes
                                                   ▼
                                           Review + Merge

The Pipeline Script

The pipeline lives in a single Python script, runbook_pipeline.py. The entry point is process_issue(), which calls derive_state() to read GitHub and then acts based on what it finds.

derive_state()

def derive_state(issue_number: int) -> dict:
    """Rebuild the pipeline state for one issue entirely from GitHub (no local state)."""
    # gh_get, REPO, find_linked_pr and has_copilot_finished are defined elsewhere in the script.
    issue = gh_get(f"/repos/{REPO}/issues/{issue_number}")
    labels = [label["name"] for label in issue["labels"]]
    assignees = [a["login"] for a in issue["assignees"]]

    # Match the Copilot login case-insensitively on its prefix
    # (this also covers an exact "copilot" login).
    assigned_to_copilot = any(a.lower().startswith("copilot") for a in assignees)

    pr = find_linked_pr(issue_number)
    work_finished = has_copilot_finished(issue_number)

    return {
        "issue": issue,
        "labels": labels,
        "assigned_to_copilot": assigned_to_copilot,
        "paused": "status: pipeline-paused" in labels,
        "pr": pr,
        "work_finished": work_finished,
    }
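process_issue() itself isn't shown, but the decision it makes can be sketched as a pure function over the dict that derive_state() returns. This is a minimal sketch assuming the steps from the pipeline list above; next_action, the action strings, and the ci_ok parameter (None meaning checks are still running) are hypothetical, and timeout handling is left out.

```python
from typing import Optional


# Hypothetical decision step for process_issue(). It reads the dict returned by
# derive_state(); ci_ok is passed in (None = checks still running) so the
# function stays pure and can be tested without GitHub.
def next_action(state: dict, ci_ok: Optional[bool] = None) -> str:
    if state["paused"] or "status: ruleset-blocked" in state["labels"]:
        return "skip"              # a human has to clear the label first
    if not state["assigned_to_copilot"]:
        return "assign"            # hand the issue to Copilot
    if not state["work_finished"]:
        return "wait"              # Copilot is still working
    if state["pr"] is None:
        return "wait_for_pr"       # finished, but no PR found yet
    if ci_ok is None:
        return "wait_for_ci"
    return "review_and_merge" if ci_ok else "ci_failed"
```

Keeping the decision pure means the only side-effecting code is the part that acts on the returned string, which fits the stateless, re-derive-everything-each-tick design.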

Finding the PR

Copilot doesn't always link its PR back to the issue via closes #N. The closingIssuesReferences field in GraphQL is unreliable. The fallback that actually works:

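def find_linked_pr(issue_number: int):
    # Try the GraphQL closing references first
    # (get_closing_issues_prs is a helper defined elsewhere in the script).
    closing = get_closing_issues_prs(issue_number)
    if closing:
        return closing[0]

    # Fallback: list open PRs and filter on the branch prefix client-side.
    # The REST `head` filter expects an exact owner:branch name, so
    # head=OWNER:copilot/ would not match a branch like copilot/fix-123.
    prs = gh_get(f"/repos/{REPO}/pulls?state=open&per_page=100")

    # If there's exactly one open copilot/ branch PR, assume it's ours
    copilot_prs = [pr for pr in prs if pr["head"]["ref"].startswith("copilot/")]
    if len(copilot_prs) == 1:
        return copilot_prs[0]

    return None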
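The "exactly one copilot/ PR" rule gives up as soon as two Copilot PRs are open in the same repository. A stricter fallback, sketched here under the assumption that Copilot mentions the issue number (e.g. "#123") somewhere in the PR title or body, keeps only PRs that reference this issue. match_pr_by_mention is a hypothetical helper that works on the REST PR dicts returned by the pulls endpoint.

```python
import re


# Hypothetical stricter fallback: keep only open copilot/ PRs whose title or
# body mentions "#<issue_number>"; \b stops #12 from matching #123.
def match_pr_by_mention(prs: list, issue_number: int):
    pattern = re.compile(rf"#{issue_number}\b")
    candidates = [
        pr for pr in prs
        if pr["head"]["ref"].startswith("copilot/")
        and pattern.search(f"{pr.get('title') or ''}\n{pr.get('body') or ''}")
    ]
    # Still refuse to guess if the mention is ambiguous.
    return candidates[0] if len(candidates) == 1 else None
```

If Copilot doesn't mention the issue at all, this returns None, which is no worse than the original fallback with two PRs open.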

The "done" signal

Copilot creates a copilot_work_finished timeline event on the issue when it finishes. This is the most reliable signal, but it doesn't always fire, so the fallback is to check whether a review request has appeared on the PR. The function below checks only the primary signal.

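def has_copilot_finished(issue_number: int) -> bool:
    # The timeline endpoint is paginated (30 events per page by default), so ask
    # for the 100-event maximum; very long timelines would still need paging.
    timeline = gh_get(f"/repos/{REPO}/issues/{issue_number}/timeline?per_page=100")
    return any(
        event.get("event") == "copilot_work_finished"
        for event in timeline
    )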
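The review-request fallback can be sketched like this, assuming the PR is the REST pulls object, whose requested_reviewers and requested_teams fields list pending review requests. looks_finished is a hypothetical name. One caveat: those fields only hold *pending* requests, so they empty again once the reviewer submits a review.

```python
from typing import Optional


# Hypothetical fallback "done" check: trust the timeline event if it fired,
# otherwise treat a pending review request on the PR as the finish signal.
def looks_finished(event_fired: bool, pr: Optional[dict]) -> bool:
    if event_fired:
        return True
    if not pr:
        return False  # no PR yet, so nothing to infer from
    return bool(pr.get("requested_reviewers") or pr.get("requested_teams"))
```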

CI check deduplication

GitHub keeps all historical check runs on a commit — including the failed first attempt before a re-run. The naive approach (any FAILURE = fail) gets tricked by a check that failed and then passed on retry.

The fix: deduplicate by check name and keep only the most recent run for each.

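def get_latest_check_runs(pr: dict) -> list:
    sha = pr["head"]["sha"]
    # per_page=100 so a large CI matrix isn't cut off at the default page size
    runs = gh_get(f"/repos/{REPO}/commits/{sha}/check-runs?per_page=100")["check_runs"]

    # Deduplicate: sort oldest-first and let later runs overwrite earlier ones,
    # so only the most recent run for each check name survives
    latest = {}
    for run in sorted(runs, key=lambda r: r["started_at"] or ""):
        latest[run["name"]] = run

    return list(latest.values())

def ci_passing(pr: dict) -> bool:
    runs = get_latest_check_runs(pr)
    if not runs:
        return False  # No runs yet: wait
    # Every latest run must be completed AND successful. Skipping unfinished
    # runs here would let a PR "pass" while checks are still running.
    return all(
        r["status"] == "completed" and r["conclusion"] == "success"
        for r in runs
    )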
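A boolean can't tell "still running" apart from "failed", and the pipeline needs to wait on one and stop on the other. Here is a minimal network-free sketch of the same deduplication plus a three-way verdict. dedupe, ci_verdict, and the PASSING set are hypothetical; treating skipped and neutral as passing is a judgement call, not GitHub's rule.

```python
# A network-free copy of the deduplication step plus a hypothetical
# three-way verdict, so "still running" and "failed" are distinguishable.
PASSING = {"success", "skipped", "neutral"}  # judgement call: adjust to taste


def dedupe(runs: list) -> list:
    """Keep the most recent run per check name (later runs overwrite earlier)."""
    latest = {}
    for run in sorted(runs, key=lambda r: r.get("started_at") or ""):
        latest[run["name"]] = run
    return list(latest.values())


def ci_verdict(latest_runs: list) -> str:
    if not latest_runs:
        return "pending"  # no checks reported yet
    if any(r["status"] != "completed" for r in latest_runs):
        return "pending"  # at least one check is queued or running
    if all(r["conclusion"] in PASSING for r in latest_runs):
        return "pass"
    return "fail"
```

Fed a "test" check that failed at 10:00 and passed on a re-run at 10:05, ci_verdict returns "fail" on the raw list and "pass" after dedupe, which is exactly the stale-run trap described above.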

Auto-resolving review threads

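Before merging, the pipeline resolves any open review threads so the merge isn't blocked by a stale comment. This matters when branch protection (or a ruleset) requires conversation resolution before merging; without that setting, unresolved threads don't block the merge.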

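def resolve_review_threads(pr_node_id: str):
    """Resolve every open review thread on the PR so none can block the merge."""
    # get_review_threads and graphql_mutation are helpers defined elsewhere in the script.
    threads = get_review_threads(pr_node_id)
    for thread in threads:
        if not thread["isResolved"]:
            graphql_mutation(
                "resolveReviewThread",
                {"threadId": thread["id"]}
            )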
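The two GraphQL helpers aren't shown. A minimal sketch, assuming they go through gh api graphql (which treats every field other than query as a GraphQL variable) and that only the first 100 threads matter: gh_graphql, get_review_threads, and resolve_thread here are hypothetical stand-ins, with resolve_thread playing the role of the graphql_mutation call above.

```python
import json
import subprocess
from typing import Callable, List

# Hypothetical stand-ins for get_review_threads() and the resolve mutation.
# Only the first 100 review threads are fetched.
THREADS_QUERY = """
query($id: ID!) {
  node(id: $id) {
    ... on PullRequest {
      reviewThreads(first: 100) { nodes { id isResolved } }
    }
  }
}
"""

RESOLVE_MUTATION = """
mutation($threadId: ID!) {
  resolveReviewThread(input: {threadId: $threadId}) { thread { id isResolved } }
}
"""


def gh_graphql(query: str, variables: dict, run: Callable = subprocess.run) -> dict:
    cmd = ["gh", "api", "graphql", "-f", f"query={query}"]
    for key, value in variables.items():
        cmd += ["-f", f"{key}={value}"]  # -f passes each variable as a string
    out = run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["data"]


def get_review_threads(pr_node_id: str, run: Callable = subprocess.run) -> List[dict]:
    data = gh_graphql(THREADS_QUERY, {"id": pr_node_id}, run)
    return data["node"]["reviewThreads"]["nodes"]


def resolve_thread(thread_id: str, run: Callable = subprocess.run) -> None:
    gh_graphql(RESOLVE_MUTATION, {"threadId": thread_id}, run)
```

The PR's node ID is the node_id field on the REST pull request object, so no extra lookup is needed.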

The Cron Setup

In Hermes, a cron job is created with a natural-language schedule and an optional pre-run script.

Create a cron job:
  name: jeeves-pipeline
  schedule: every 3m
  pre-run: python3 runbook_pipeline.py --queue jeeves --state-only
  prompt: |
    The pipeline state is above.
    Process the next issue in the Jeeves queue.
    Run: python3 runbook_pipeline.py --queue jeeves

The pre-run script pattern is important. It runs the Python script to gather current state before the cron prompt executes. Hermes then has that context in its working memory when it decides what to do.

I have two cron jobs — jeeves-pipeline and jeeves2-pipeline — both running the same underlying script but targeting different label queues. This gives parallel processing without any extra infrastructure.
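The pre-run call and the real run differ only by --state-only, and that flag has to guarantee the script reads without acting, otherwise a single tick could act twice. A minimal sketch of the argument parsing, assuming the queue names map onto the Jeeves and Jeeves2 labels; QUEUE_LABELS and parse_args are hypothetical.

```python
import argparse

# Hypothetical queue-name -> label mapping; the post uses the labels
# "Jeeves" and "Jeeves2" but doesn't show how the script maps them.
QUEUE_LABELS = {"jeeves": "Jeeves", "jeeves2": "Jeeves2"}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Copilot issue pipeline")
    parser.add_argument("--queue", choices=sorted(QUEUE_LABELS), required=True)
    parser.add_argument("--state-only", action="store_true",
                        help="print the derived state and exit without acting")
    args = parser.parse_args(argv)
    args.label = QUEUE_LABELS[args.queue]  # the GitHub label this queue watches
    return args
```

With this shape, adding a queue really is one dictionary entry plus one more cron job.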

Pain Points and Rough Edges

This is not a polished system. Here's what breaks and how I've worked around it.

Copilot getting stuck

Sometimes Copilot just... stops. No error, no PR, no copilot_work_finished event. The pipeline has a 90-minute timeout per issue, after which it retries assignment up to 3 times. If all retries fail, it adds status: pipeline-paused to the issue and stops touching it.

Manual recovery: remove the status: pipeline-paused label. The pipeline will pick it up on the next tick.
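The timeout rule is easy to express as a pure function. This is a sketch assuming both inputs are re-derived from GitHub on every tick (for example, the time of the latest Copilot "assigned" event in the issue timeline, and how many reassignments have already happened); timeout_action and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout rule: 90 minutes per attempt, at most 3 retries, then
# pause. Inputs are assumed to be derived from the issue timeline each tick.
TIMEOUT = timedelta(minutes=90)
MAX_RETRIES = 3


def timeout_action(assigned_at: datetime, retries_so_far: int,
                   now: datetime, finished: bool) -> str:
    if finished:
        return "continue"   # Copilot is done; move on to PR and CI checks
    if now - assigned_at < TIMEOUT:
        return "wait"
    if retries_so_far < MAX_RETRIES:
        return "retry"      # reassign the issue to Copilot
    return "pause"          # add the status: pipeline-paused label
```

GitHub timestamps end in "Z"; datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc) turns them into the aware datetimes this function expects.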

Stale CI check runs

Described above. The deduplication fix in get_latest_check_runs() handles this, but it took a few failed merges to figure out the pattern.

Ruleset violations

Copilot hits branch protection rules it can't bypass. The error message GitHub returns is generic: "repository ruleset violation". Easy to mistake for a code problem.

Two fixes:

  1. Go to Settings → Copilot → grant bypass permissions. This lets the Copilot agent bypass required reviews on its own PRs.
  2. Label the issue status: ruleset-blocked so you know why it's paused without having to dig through the timeline.
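The labelling half can be automated by matching the error text. A minimal sketch, assuming the API error message is available as a string and contains the generic wording quoted above; label_for_error is hypothetical. Note the caveat from the next section: a retired model slug returns the same message, so this label can point at the wrong cause.

```python
from typing import Optional

# Hypothetical mapping from a failed call's error text to a pipeline label.
RULESET_MARKER = "repository ruleset violation"


def label_for_error(message: str) -> Optional[str]:
    if RULESET_MARKER in message.lower():  # case-insensitive match
        return "status: ruleset-blocked"
    return None  # unknown error: leave the issue unlabelled
```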

Model slug drift

If you specify a model like claude-opus-4.6 in the Copilot assignment API, it works until GitHub removes that model slug. When the model disappears, the API returns a "repository ruleset violation" error — a completely misleading message that obscures the real cause.

Two fixes:

  • Use an empty string for the model field to get Copilot's default
  • Or verify the slug against the Copilot models API before assignment:

def get_available_models() -> list[str]:
    response = gh_get("/copilot/models")
    return [m["id"] for m in response.get("models", [])]

def assign_to_copilot(issue_number: int, model: str = ""):
    # Only query the models endpoint when a specific slug was requested;
    # an empty string already means "use Copilot's default".
    if model and model not in get_available_models():
        print(f"Model {model} not available, using default")
        model = ""

    # issue_body_with_model is a helper defined elsewhere in the script.
    gh_patch(f"/repos/{REPO}/issues/{issue_number}", {
        "assignees": ["copilot"],
        "body": issue_body_with_model(model),
    })

PR linking is unreliable

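Copilot doesn't reliably use closes #N in its PR description, so the closingIssuesReferences GraphQL field is often empty. The fallback ("if there's exactly one open copilot/ branch PR, assume it's ours") works well in practice as long as you're only running one issue per queue at a time. The count is repo-wide, though: if both queues target the same repository and each has an active issue, there are two open copilot/ PRs and the fallback finds nothing.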

The pipeline pauses and stays paused

When Copilot times out, the issue gets the status: pipeline-paused label. If the issue is later resolved outside the pipeline (by hand, or by a Copilot retry you triggered manually), the label stays and the pipeline never picks the issue back up.

Manual fix: remove the label. There's no auto-recovery here — getting the pipeline to safely detect "actually, this is resolved now" is more complexity than I've wanted to add to what's supposed to be a crude tool.

Is It Worth It?

Honest answer: yes, with caveats.

It requires babysitting. You can't set it up and forget it. Copilot gets stuck, models drift, rulesets block things, and PRs need occasional human review even when the code looks right.

It requires good issue specs. This is the real lesson from the analytics post. Copilot's success rate is almost entirely determined by issue quality. Vague issues produce incorrect PRs. Specific issues with exact file paths, expected behaviour, and acceptance criteria produce correct ones.

But for a solo dev with a backlog of well-defined issues, having a background agent working through them is genuinely valuable. You write the issue, walk away, and come back to a merged PR, sometimes. When it works, it's the closest thing I've found to delegation without a team.

The pattern scales better than you'd expect. The pipeline is cheap to run (it's a Python script and a cron job), stateless (no database, no persisted queues), and recoverable (every label and state lives on GitHub). Adding a new queue is a two-line change.

If you want to experiment with agentic tooling without committing to a heavy platform, this is a reasonable place to start. The rough edges are real, but they're navigable — and understanding where the edges are is half the value.
