Hermes Agent’s Most Dangerous Bug Looks Like Success

It can say the work is done before the machine changes. Here’s the proof layer every serious user needs before trusting tool access.

May 11, 2026

Hermes Agent can fail in a way that looks almost like success.

Someone asks it to initialize a folder. The agent explains the plan, reports completion, and the folder still doesn’t exist.

A direct terminal request works right after that.

So the tool exists. The missing step is verified action. A recent Reddit thread described this pattern during an Arch Linux setup using Ollama and local models. The discussion later pointed toward context-window configuration as a likely factor in that specific case, but the wider lesson still holds: users need proof of what actually ran.

Wrong answers waste time.

Tool-using agents create a deeper problem when they sound right while machine state stays unchanged.

The fix isn’t a louder prompt. It isn’t more autonomy language. It isn’t pretending every model behaves the same way.

The fix is an action receipt.

An action receipt is a short proof packet. It tells you what Hermes tried, which tool ran, where it ran, what changed, and how the result was checked.

For a beginner, it answers one practical question:

Did Hermes actually do the thing?

For an advanced operator, it becomes a debugging habit. It helps separate prompt confusion from provider issues, backend mismatch, missing tool calls, gateway state problems, and failed verification.

That’s why action receipts should become a basic Hermes workflow.

What Hermes is built to do

Hermes Agent isn’t just a chat window.

The official repository describes Hermes as an agent from Nous Research with a built-in learning loop. The repo points to skills, memory, scheduled automations, subagents, gateway access, provider flexibility, and multiple terminal backends as core parts of the system.

That capability creates a requirement most people skip: when Hermes claims it changed something, the user needs proof.

A plain response says this:

I created the project folder and initialized the files.

A receipt says this:

Action receipt

Task: Create a project folder and README file
Tool used: terminal
Backend: local
Command: mkdir -p hermes-action-test && echo "Hermes action test." > hermes-action-test/README.md
Result: exit code 0
Verification command: ls -la hermes-action-test && cat hermes-action-test/README.md
Verification output: README.md exists and contains the expected sentence
Human review: confirm this is the right folder before adding more files

One version asks for trust.

The other gives evidence.

The real problem is action ambiguity

Most users describe this failure too vaguely.

They say Hermes hallucinated, got lazy, or didn’t understand.

That may be true, but it doesn’t help much.

The sharper diagnosis is action ambiguity.

Action ambiguity happens when the agent’s message sounds like work happened, but the user can’t tell whether a tool actually ran.

Hermes may be operating through a model, provider, terminal backend, file operation, tool call, gateway session, profile, skill, or memory update. Those layers don’t fail in one clean way.

A GitHub issue from April 2026 reported that custom endpoints could behave like plain chat agents, with no command or tool execution enabled by default. The reporter expected command execution, but the model only responded like a chat agent.

Another April issue described unstable tool calling in Hermes v0.8.0, including configured tools not being used, invalid tool names, empty responses, retry loops, and generic fallback replies.

These reports don’t describe one universal bug.

They point to the same operator need.

Hermes users need receipts for state-changing work.

Why v0.13 makes this more important

Hermes v0.13.0 shipped on May 7, 2026 as the Tenacity Release. The release notes describe durable multi-agent Kanban, heartbeat, reclaim, zombie detection, incomplete-exit blocking, per-task retries, hallucination recovery, persistent goals, checkpoints v2, gateway auto-resume, and cron watchdog mode.

That release moves Hermes toward longer-running work.

Longer work needs clearer proof.

A single terminal command is easy to check. A larger workflow can pass through a board, restart after a gateway interruption, retry a failed task, and report progress through chat.

Human review still needs to know what actually happened at each step.

A checkpoint should say what state was saved.

Retry notes should name the earlier failure before another attempt starts.

Kanban completion should include the verification that moved the card forward.

Gateway updates should separate planning from completed action.

Without receipts, reliability features can make confusion last longer.
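Retry notes that name the earlier failure can be pictured as a small loop that logs every attempt. This is a minimal Python sketch, not Hermes code: `run_with_retries`, `action`, and `verify` are hypothetical names, and `verify` is assumed to return an `(ok, detail)` pair.

```python
def run_with_retries(action, verify, max_attempts=3):
    """Run action() then verify(), logging each attempt.

    Every log entry names the failure that triggered the retry,
    so a reviewer can see why another attempt started.
    """
    log = []
    previous_failure = None
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            ok, detail = verify()
        except Exception as exc:  # a crash is recorded as a failed attempt
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        log.append({
            "attempt": attempt,
            "retrying_after": previous_failure,  # None on the first attempt
            "verified": ok,
            "detail": detail,
        })
        if ok:
            break
        previous_failure = detail
    return log
```

The last entry in the log is the one that matters for a "done" claim: if its `verified` value is false, the task did not complete, no matter how many attempts ran.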

The beginner mental model

Think of Hermes like someone helping in your workshop.

“I fixed it” is a claim.

A useful receipt says what was touched, which tool was used, what changed, and what still needs review.

Hermes needs the software version of that.

Folder creation should include the path and a check that proves it exists.

File edits should name the file and summarize the change.

Package installs should include the command and a version check.

Memory updates should explain what was saved and why.

Skill creation should identify the file and the trigger.

That doesn’t make the workflow complicated.

It makes the agent’s claim inspectable.

The advanced operator view

Receipts also make debugging cleaner.

A receipt with no command output suggests Hermes stayed in explanation mode.

A receipt that names the Docker backend tells the user not to expect the file on the host unless its folder was mounted.

On a custom endpoint, a receipt with no tool call shows the model behaved like a plain chat endpoint.

A failed verification check catches the problem before the agent reports success.

The official configuration docs describe multiple terminal backends for Hermes, including local, Docker, SSH, Modal, Daytona, Vercel Sandbox, and Singularity or Apptainer. The same docs warn that local backend commands run directly on the user’s machine with no isolation.

That detail is easy for beginners to miss.

A file can exist in a container but not on the host.

Remote execution can succeed while the user checks a local folder.

Sandboxed work can finish while the main workspace stays untouched.

Receipts stop that from turning into a mystery.
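One way to make the backend visible is to record where a command ran alongside the command itself. The Python sketch below is an illustration under assumptions: `execution_context` and `same_filesystem_view` are hypothetical helpers, and the `/.dockerenv` check is only a common hint that a process is inside a Docker container, not proof.

```python
import os
import socket
from pathlib import Path


def execution_context():
    """Collect facts that show where a command actually ran."""
    return {
        "hostname": socket.gethostname(),
        "cwd": os.getcwd(),
        # Docker usually creates this marker file inside containers.
        # Treat it as a hint, not a guarantee.
        "looks_like_docker": Path("/.dockerenv").exists(),
    }


def same_filesystem_view(agent_ctx, user_ctx):
    """True when both contexts report the same host and container hint."""
    return (
        agent_ctx["hostname"] == user_ctx["hostname"]
        and agent_ctx["looks_like_docker"] == user_ctx["looks_like_docker"]
    )
```

If the agent's context and the user's own shell disagree, the user is probably checking a different filesystem than the one the agent changed.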

The first test should be boring

Avoid customer messages, repo-wide rewrites, WhatsApp automation, or any workflow where a false “done” creates real damage.

Start with a disposable folder and one verified file write.

Use this prompt:

Create a folder called hermes-action-test.

Inside that folder, create a file called README.md.

The file should contain one sentence:
Hermes action receipts prove machine state changed.

After you act, return an action receipt with:
- the exact command or tool used
- the terminal backend
- the target path
- the verification command
- the verification output

Don't say the task is complete unless the verification command passes.

Then check the result yourself.
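Checking by hand with `ls` and `cat` is enough. For a repeatable check, a short Python sketch like the one below does the same thing. The function name `check_receipt_test` is hypothetical, and it assumes you run it from the directory where Hermes was asked to work.

```python
from pathlib import Path

EXPECTED = "Hermes action receipts prove machine state changed."


def check_receipt_test(base_dir="."):
    """Return (passed, reason) for the disposable-folder test."""
    readme = Path(base_dir) / "hermes-action-test" / "README.md"
    if not readme.parent.is_dir():
        return False, f"folder missing: {readme.parent}"
    if not readme.is_file():
        return False, f"file missing: {readme}"
    # Ignore a trailing newline, which echo-style commands usually add.
    content = readme.read_text(encoding="utf-8").strip()
    if content != EXPECTED:
        return False, f"unexpected content: {content!r}"
    return True, f"verified: {readme}"
```

Call `check_receipt_test()` after Hermes reports completion. A `False` result next to a confident "done" is exactly the failure this article describes.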

This test isn’t about the folder. It checks whether Hermes can plan, act, verify, and report the result without mixing those stages together.

A flashy demo can hide that problem.

A boring receipt test exposes it fast.

Use receipts only when state changes

Receipts shouldn’t appear after every normal chat response.

That would make Hermes annoying.

Use them when Hermes changes external state.

External state means something outside the chat may now be different.

The rule:

Require an action receipt when Hermes changes external state.

Skip the receipt when Hermes only explains, drafts, compares, or brainstorms.

External state includes files, folders, repositories, configs, memory, skills, schedules, gateway settings, browser actions, API calls, outbound messages, package installs, deployments, and server notes.

That keeps the system light until proof matters.

The receipt format

A useful receipt should be short enough to read and specific enough to debug.

Use this:

Action receipt

Task:
State change expected:
Tool or backend used:
Exact command or action:
Target path or object:
Result:
Verification check:
Verification output:
Files or state changed:
Remaining risk:
Human review needed before:

A one-word receipt isn’t enough:

Done.

A usable receipt looks like this:

Action receipt

Task: Create a test folder and README file
State change expected: New folder and file on disk
Tool or backend used: terminal, local backend
Exact command or action: mkdir -p hermes-action-test && echo "Hermes action receipts prove machine state changed." > hermes-action-test/README.md
Target path or object: ./hermes-action-test/README.md
Result: exit code 0
Verification check: ls -la hermes-action-test && cat hermes-action-test/README.md
Verification output: README.md exists and contains the expected sentence
Files or state changed: Created hermes-action-test/README.md
Remaining risk: User should confirm this ran in the intended working directory
Human review needed before: Adding more files or modifying a real project

Now the user has a checkpoint, not just a claim.
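The same idea can be sketched outside Hermes. The Python example below is an illustration, not how Hermes produces receipts: `Receipt` and `run_with_receipt` are hypothetical names, and it assumes a POSIX shell is available. It runs the action and then a separate verification command, recording both.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Receipt:
    task: str
    command: str
    exit_code: int
    verification_check: str
    verification_output: str
    verified: bool


def run_with_receipt(task, command, verify, cwd=None):
    """Run `command`, then run `verify` as a separate check, and record both."""
    action = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True
    )
    check = subprocess.run(
        verify, shell=True, cwd=cwd, capture_output=True, text=True
    )
    return Receipt(
        task=task,
        command=command,
        exit_code=action.returncode,
        verification_check=verify,
        # Keep stderr when stdout is empty so a failed check stays visible.
        verification_output=(check.stdout or check.stderr).strip(),
        # Verified only when both the action and the check exit with 0.
        verified=action.returncode == 0 and check.returncode == 0,
    )
```

The design choice that matters is the second `subprocess.run`: the verification is its own command with its own exit code, so a successful action cannot vouch for itself.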

Turn the pattern into a Hermes skill

Hermes skills are the right place to store this habit.

Official docs describe skills as on-demand knowledge documents that Hermes loads when needed. They live under the skills directory, appear as slash commands, and can be created or changed by the agent.

Action receipts make sense as a starter skill because they’re not a clever prompt.

They’re a reusable procedure for work that changes state.

Save this as a skill only after you test it in a safe folder:

---
name: action-receipt
description: Require proof whenever Hermes changes files, tools, memory, skills, schedules, configs, repos, messages, browser state, or external systems.
---

# Action Receipt Skill

## Purpose

Use this skill when Hermes performs work that should change external state.

A chat answer isn't proof.
A plan isn't proof.
A completed receipt shows that a tool or command ran and that the result was checked.

## Trigger this skill when the user asks Hermes to:

- create files or folders
- edit files
- delete files
- inspect a repo
- run terminal commands
- initialize a project
- install packages
- modify configuration
- update memory
- create or update skills
- schedule a job
- interact with browser tools
- send or prepare external messages
- commit changes
- open a pull request
- deploy or touch infrastructure

## Don't trigger this skill for:

- pure explanation
- brainstorming
- rough drafting
- concept comparison
- non-action Q&A

## Procedure

1. Restate the requested action.
2. Identify the expected state change.
3. Name the tool or backend required.
4. Ask for approval before destructive, external, or sensitive actions.
5. Execute the smallest safe action.
6. Capture the exact command or tool summary.
7. Run a separate verification check.
8. Report the action receipt.
9. Stop if verification fails.

## Receipt format

Action receipt

- Task:
- State change expected:
- Tool or backend used:
- Exact command or action:
- Target path or object:
- Result:
- Verification check:
- Verification output:
- Files or state changed:
- Remaining risk:
- Human review needed before:

## Failure rules

Never say the task is complete unless verification passes.

No tool ran:
"No tool or command executed. This was only a plan or explanation."

Container, remote host, or sandbox:
Name the backend clearly.

Filesystem mismatch:
Warn the user when they're checking a different filesystem than the execution backend.

Verification failure:
Show the failed check and ask for the next instruction.
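The failure rules in the skill above can also be read as a small decision function. This Python sketch is only an illustration of that logic: `failure_notes` and `may_claim_complete` are hypothetical names, and the backend labels are plain strings chosen for the example.

```python
NO_TOOL_MESSAGE = "No tool or command executed. This was only a plan or explanation."


def failure_notes(tool_ran, backend, checked_on, verification_passed):
    """Return the warnings a receipt should carry, following the failure rules."""
    if not tool_ran:
        return [NO_TOOL_MESSAGE]
    notes = []
    if backend != "local":
        # Container, remote host, or sandbox: name the backend clearly.
        notes.append(f"Ran on backend: {backend}")
    if checked_on != backend:
        notes.append(
            f"Filesystem mismatch: ran on {backend}, checked on {checked_on}"
        )
    if not verification_passed:
        notes.append(
            "Verification failed: show the failed check and ask for the next instruction."
        )
    return notes


def may_claim_complete(tool_ran, verification_passed):
    """Completion may be claimed only when a tool ran and verification passed."""
    return bool(tool_ran and verification_passed)
```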

Receipts don’t replace sandboxing

A receipt proves what happened.

It doesn’t make every action safe.

Hermes profiles are often misunderstood here. The official profile docs say a profile gives Hermes a separate state directory with its own config, environment file, personality file, memories, sessions, skills, cron jobs, and gateway state. The same docs warn that profiles aren’t sandboxes. Filesystem access is controlled separately.

Receipts and sandboxes solve different problems.

Profiles separate agent state.

Sandboxes limit what the agent can touch.

Receipts show what the agent did.

You still need small permissions, allowlisted users, careful secrets, backups, and review before risky external actions.

Receipts make the workflow inspectable. Broad access still needs hard boundaries.

What changes for beginners

Beginners don’t have to guess whether Hermes understood the task.

They can ask for proof.

That one habit lowers the fear around terminal work because the user can inspect the result instead of trusting a confident sentence.

A vague bug report sounds like this:

Hermes doesn't work.

A useful report sounds like this:

Hermes claimed it initialized a folder, but the receipt showed no command. When I asked for an explicit terminal command, ls -la worked. Backend was local. Model was Ollama with gemma4:26b. The folder didn't exist after verification.

The second version gives the community something to diagnose.

It also helps the user think like an operator without needing to become a developer overnight.

What changes for power users

Action receipts can become a house rule for serious Hermes profiles.

Repo maintenance should require a receipt before files change.

Research workflows can demand source receipts before memory updates.

Gateway sends should stay blocked until the draft, target, and approval point are visible.

Docker-backed setups should name the container or mounted workspace.

CI-style workflows can reject “done” until tests, diffs, or file checks pass.

This is where receipts become more than beginner training.

They create a bridge between chat and operations.

The user doesn’t have to trust the agent’s summary alone. The work leaves a small trail.
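A house rule like this can be enforced mechanically. The Python sketch below, with the hypothetical helpers `parse_receipt` and `missing_fields`, checks a receipt's text against the fields in the receipt format and reports which ones are absent or empty. It assumes one `Field: value` pair per line, with or without a leading dash.

```python
REQUIRED_FIELDS = [
    "Task",
    "State change expected",
    "Tool or backend used",
    "Exact command or action",
    "Target path or object",
    "Result",
    "Verification check",
    "Verification output",
    "Files or state changed",
    "Remaining risk",
    "Human review needed before",
]


def parse_receipt(text):
    """Map each 'Field: value' line to a dict, tolerating a leading '- '."""
    fields = {}
    for line in text.splitlines():
        line = line.strip().lstrip("-").strip()
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(text):
    """Return the required fields that are absent or empty."""
    fields = parse_receipt(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A gate can then refuse "done" whenever `missing_fields(receipt_text)` returns a non-empty list. A complete receipt still has to be true, so this catches missing proof, not false proof.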

Why this is the better Hermes article right now

The crowded lane is obvious: v0.13 recaps, “self-improving agent” explainers, basic installs, and big claims about autonomous workflows.

Those topics will get attention.

Action receipts are a better operator angle because they explain the problem people feel after the install.

The Reddit post about Hermes talking without acting is a beginner pain signal. The GitHub issues around custom endpoints and tool-call instability show the same theme from more technical surfaces.

That’s the opening.

Hermes doesn’t only need more users installing it.

It needs more users who can tell the difference between fluent text and machine state.

Receipts teach that distinction in a way beginners can use immediately and advanced operators can build into their workflows.

Use this before your next Hermes task

Paste this into Hermes before any task that should change files, memory, skills, configs, schedules, repos, browser state, or external systems:

For this task, use action receipts.

Before acting:
1. Tell me the exact state change you plan to make.
2. Name the tool or terminal backend you expect to use.
3. Ask before destructive actions, external sends, installs, deletes, config edits, or memory writes.

During the task:
1. Use real tools or commands when machine state must change.
2. Don't describe a command as completed unless it actually ran.
3. Capture the command, path, backend, result, and verification output.

After each action, return this receipt:

Action receipt
- Task:
- State change expected:
- Tool or backend used:
- Exact command or action:
- Target path or object:
- Result:
- Verification check:
- Verification output:
- Files or state changed:
- Remaining risk:
- Human review needed before:

Stop rule:
If verification fails, don't claim completion. Show the failure and ask what to do next.

Save this as `ACTION_RECEIPT.md`

# ACTION_RECEIPT.md

Use this receipt whenever Hermes claims to change external state.

External state includes files, folders, repos, configs, memory, skills, scheduled jobs, gateway settings, browser actions, API calls, messages, deployments, package installs, and server notes.

## Receipt format

### Task

Describe the requested task in one sentence.

### State change expected

Name what should be different after the action.

### Tool or backend used

Name the tool, command, or backend.

Examples:
- terminal
- local backend
- Docker backend
- SSH backend
- browser tool
- skill_manage
- memory update
- gateway action

### Exact command or action

Paste the command, tool summary, or file operation.

### Target path or object

Show the folder, file, memory entry, skill name, message target, repo path, schedule, or external object.

### Result

Include the exit code, success output, error output, created path, changed file, or returned object.

### Verification check

Run a separate check.

Examples:
- ls -la target-folder
- cat file.md
- git diff -- file.md
- python -m pytest
- hermes skills list
- grep -n "expected text" file.md

### Verification output

Paste the relevant output.

### Files or state changed

List each file, folder, or setting that is now different.

### Remaining risk

State what might still be wrong.

Examples:
- command ran in Docker, not the host
- file exists, but content still needs review
- install succeeded, but package behavior hasn't been tested
- memory changed, but user should confirm it should persist

### Human review needed before

Name the next step that requires human approval.

### Stop rule

If verification fails, don't claim completion.
Summarize the failure and ask for the next instruction.

The practical next move

Run the disposable folder test before trusting Hermes with a real workflow.

Then save the receipt pattern as a skill.

After that, use it inside low-risk work: repo notes, local file organization, research folders, docs cleanup, or read-only server documentation.

Hermes gets more useful when it can act, verify, report, and reuse the procedure later. The receipt is what lets the user know which part happened.

Tyrannicides

That dude who has the YouTube channel, you know, the dude with the skull cap and glasses… it’s funny how he’s a programmer and I dropped out of programming at DeVry, but I understand most of what he says. He was speaking to this exact same subject matter, which is also what ultimately pushed me to Hermes.

I was already intrigued with Hermes ever since I first heard of it a month or two ago. But ultimately, OpenClaw breaking on updates was the final straw. I say that my M4 Mac mini OpenClaw agent has been the most stable agent amongst the other three. Two Pi 5s running on Ollama cloud and a Ryzen 3 mini PC running off my Ryzen 5 PC running LM Studio server. The Ryzen 3 was the most stable, until the most recent update broke that agent.

I am still in the process of deciding whether or not to keep the other 3 as OpenClaw agents (I will probably do half and half), but I can say so far I am more than impressed with Hermes. It’s stable, it’s not as fragile as OpenClaw, and I dare say it’s almost as if it’s smarter in some strange way. Like it makes the same model (don’t roast me for not having dual 5090s), qwen3.5-9B running on dual 3060s at 128k tokens, run better and handle tasks better. I look at the same model running in OpenClaw and the reasoning is confusing. It makes me wonder, TF are you actually doing. Whereas with Hermes, it’s more cleanly laid out. More sensible. But indeed, the proof is in the receipts. Hermes is sort of doing that at the end of tasks already, albeit with brief summaries. A detailed receipt would be nice for programming dropouts like myself who just paste outputs into ChatGPT and ask Chat, “what did he say?”
