Meet Hermes: An Operational Agent for Real Business Workflows

Hermes is the open-source AI agent we use to build operational agents for our clients. It runs as a persistent process: it lives on a server, holds memory across sessions, schedules its own jobs, and works across Slack, email, CLI, and the messaging platforms most teams already use. The piece we lean on hardest is its skills system: every time Hermes solves a problem, it writes a reusable skill that makes the next attempt faster and more reliable. The agent gets better at a workflow the longer it runs it.

It sits in the same neighborhood as OpenClaw, but with two differences that matter in production. First is stability — Hermes runs for weeks without the context drift or runaway behavior that has made OpenClaw deployments hard to trust at scale. Second is the self-improving loop. OpenClaw treats every task as a fresh start. Hermes accumulates skills, refines them under correction, and uses them to handle similar work autonomously next time. Over a few months of real workflows, the gap widens.

We use Hermes as the runtime. What we build on top is the deployment — integrations with the systems that already run your business, custom skills for your specific workflows, and the autonomy gates that decide what the agent handles on its own versus what it routes to a human.

This post is a case study: two realistic examples of Hermes deployments, modeled on operations that exist in nearly every mid-market and enterprise business. We'll skip the agent pitch and look at the workflows.

Where Hermes Lives

A short note on the deployment model before the examples.

Hermes runs inside your environment — the source code, the prompts, the decision logic, and the action history all sit in repositories and infrastructure you control. It connects to your existing systems via the same APIs your team uses: email, CRM, ticketing, accounting, document storage. It does not require a parallel agent platform, a third-party SaaS, or a vendor's runtime. If we delivered Hermes today and walked away, it would keep operating tomorrow.

Every action Hermes takes is logged with the inputs, the reasoning, the confidence level, and the outcome. Consequential actions — anything that changes external state, contacts a customer, or moves money — require human approval until you've observed the agent long enough to lift the gate. Low-risk actions execute autonomously from day one.
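
To make the logging and gating rule concrete, here is a minimal sketch of what an action record and an approval gate could look like. It is an illustration under our own assumptions, not Hermes's actual data model: the names `ActionRecord`, `Risk`, `gate`, and `lifted_gates` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Risk(Enum):
    LOW = "low"
    CONSEQUENTIAL = "consequential"  # changes external state, contacts a customer, moves money


@dataclass
class ActionRecord:
    """One log entry: inputs, reasoning, confidence, and outcome (hypothetical schema)."""
    action: str
    inputs: dict
    reasoning: str
    confidence: float
    risk: Risk
    outcome: str = "pending"
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def gate(record: ActionRecord, lifted_gates: set[str]) -> ActionRecord:
    """Execute low-risk actions; hold consequential ones unless their gate was lifted."""
    if record.risk is Risk.LOW or record.action in lifted_gates:
        record.outcome = "executed"
    else:
        record.outcome = "awaiting_approval"
    return record


payment = ActionRecord(
    action="schedule_payment",
    inputs={"invoice_id": "INV-1001"},
    reasoning="Invoice matched PO and receiving record.",
    confidence=0.97,
    risk=Risk.CONSEQUENTIAL,
)
print(gate(payment, lifted_gates=set()).outcome)
```

In this sketch, lifting a gate is just adding the action name to `lifted_gates`, so the decision to grant autonomy is an explicit, reviewable change rather than something the agent does for itself.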

That's the model. Now the workflows.

Case 1: The Invoice That Reads Itself

The situation. A mid-size company processes around 4,000 vendor invoices a month. They arrive by email in roughly twenty formats — PDF attachments, embedded images, sometimes plain-text bodies, occasionally CSVs. Two AP clerks spend most of their week opening these messages, extracting the line items, checking each invoice against the purchase order it references, flagging discrepancies, and entering the result into the accounting system. Routine invoices take three minutes each. Exception cases take twenty.

What Hermes does:

Watches the AP inbox. When a new message lands, Hermes pulls the body and any attachments.
Extracts structured data. PDFs, scanned images, and free-text emails all get normalized into the same invoice schema — vendor, PO number, line items, totals, dates, tax.
Validates against the purchase order. Hermes queries the ERP for the PO referenced on the invoice. It compares line items, unit prices, and quantities. It also checks the receiving records to confirm the goods were actually received.
Decides. Three outcomes:
- Clean match. Hermes enters the invoice into accounts payable with the standard approval flag, attaches the source PDF, and replies to the vendor confirming receipt. No human touches it.
- Minor variance. Hermes flags the specific lines that don't match, drafts a note to the buyer who owns the PO, and parks the invoice in a pending queue.
- No matching PO or major variance. Hermes routes to AP with a structured summary of what it could and could not match, and a confidence score.
Learns from corrections. When a human overrides Hermes's decision, the correction feeds back into the agent's evaluation rules. The agent's threshold for autonomy on similar cases adjusts.
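
The validate-and-decide steps above can be sketched as a small three-way match. This is a simplified illustration, not the production logic: the `Line` schema, the `classify_invoice` function, the outcome labels, and the 2% tolerance that separates a minor variance from a major one are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    sku: str
    quantity: int
    unit_price: Decimal


def classify_invoice(invoice_lines, po_lines, received_qty,
                     minor_tolerance=Decimal("0.02")):
    """Return (outcome, mismatched_skus) for one invoice.

    po_lines is None when no matching PO was found.
    received_qty maps SKU -> quantity recorded as received.
    minor_tolerance is an assumed cutoff: variance as a share of the invoice total.
    """
    if po_lines is None:
        return "route_to_ap", []

    po_by_sku = {line.sku: line for line in po_lines}
    mismatched = []
    invoice_total = Decimal("0")
    variance = Decimal("0")

    for line in invoice_lines:
        billed = line.quantity * line.unit_price
        invoice_total += billed
        po = po_by_sku.get(line.sku)
        if po is None:
            # Billed for something the PO never ordered.
            mismatched.append(line.sku)
            variance += billed
            continue
        # Payable quantity is capped by both the PO and what was received.
        payable_qty = min(line.quantity, po.quantity, received_qty.get(line.sku, 0))
        expected = payable_qty * po.unit_price
        if billed != expected:
            mismatched.append(line.sku)
            variance += abs(billed - expected)

    if not mismatched:
        return "clean_match", []
    if invoice_total > 0 and variance / invoice_total <= minor_tolerance:
        return "minor_variance", mismatched
    return "route_to_ap", mismatched


po = [Line("WIDGET", 10, Decimal("5.00"))]
invoice = [Line("WIDGET", 10, Decimal("5.05"))]
print(classify_invoice(invoice, po, {"WIDGET": 10}))
```

Using `Decimal` rather than floats keeps the price comparison exact. In a real deployment the tolerance would be a policy choice made by the finance team, and corrections from AP would be one input into tuning it.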

The shape of the work shifts. The two AP clerks stop opening every email and start working a much shorter queue: the exceptions. The clean invoices — typically 70-80% of volume — go straight through. Cycle time on a routine invoice drops from minutes to seconds. Cycle time on exceptions is unchanged, but exceptions are now the only work humans see.

What's worth noting is what Hermes doesn't try to do. It doesn't approve invoices over a threshold. It doesn't contact vendors about disputes. It doesn't modify the chart of accounts. Those decisions stay with humans. The agent's job is to handle the volume and surface the judgment calls.

Case 2: The Inbox That Routes Itself

The situation. A B2B software company runs a support inbox that receives about 600 tickets a day. The first-line support team's first job is classification: is this a bug, a billing question, a feature request, a general how-to, an integration issue, or a churn signal? Their second job is response: about half the tickets get a pattern-matched answer from a runbook; the other half need investigation or escalation. The team's tenured members do this in fifteen seconds; new hires take five minutes per ticket and make routing mistakes that cost the company hours of misallocated engineering time.

What Hermes does:

Reads each new ticket. Full body, subject line, customer metadata pulled from the CRM, prior conversation history if it exists.
Classifies. The agent assigns a category, a sub-category, a sentiment score, and an urgency rating. The classification taxonomy is the one the support team already uses.
Checks the known-issues registry. Hermes searches the team's internal documentation and recent engineering changelog for matching symptoms. If it finds a known issue, the ticket gets tagged with the related incident and the customer gets the existing workaround.
Drafts a response. For tickets that match a runbook pattern, Hermes drafts the response with the right tone for that customer's tier. For tickets that don't, it drafts an acknowledgment and a clarifying question.
Routes or sends. By default, a low-risk draft with high agent confidence goes to a human reviewer who approves it with one click. Once a category has been cleared for autonomy, drafts above a configurable confidence threshold are sent without review; these are typically password resets, billing receipts, documentation links, and account status questions. Anything involving a refund, a churn risk, a security concern, or a major customer is escalated to the right human with a structured summary.
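
The routing step can be sketched as a short decision function. Treat it as an illustration under assumptions: the `Triage` record, the flag names, the `route` function, and the 0.95 default threshold are hypothetical, chosen only to show the order in which the checks apply.

```python
from dataclasses import dataclass

# Assumed flag names for the cases that always go to a human.
ESCALATE_FLAGS = frozenset({"refund", "churn_risk", "security", "major_customer"})


@dataclass(frozen=True)
class Triage:
    category: str
    confidence: float
    flags: frozenset = frozenset()


def route(ticket: Triage, autonomous_categories: set,
          send_threshold: float = 0.95) -> str:
    """Return 'escalate', 'send', or 'review' for a classified ticket."""
    # Escalation is checked first so confidence can never override it.
    if ticket.flags & ESCALATE_FLAGS:
        return "escalate"
    if (ticket.category in autonomous_categories
            and ticket.confidence >= send_threshold):
        return "send"
    # Everything else waits for one-click human approval.
    return "review"


cleared = {"password_reset", "documentation_link"}
print(route(Triage("password_reset", 0.98), cleared))
print(route(Triage("billing", 0.99, frozenset({"refund"})), cleared))
```

Passing an empty set for `autonomous_categories` reproduces a fully gated setup: every non-escalated ticket lands in the review queue regardless of confidence.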

The pattern that emerges is similar to the AP case. The support team's first thirty minutes of every day used to be triage. Now triage is done before they arrive. Their queue is pre-classified and pre-routed, and the routine 30-40% of responses have already gone out. Their job is the conversations that need a human, and they get those conversations sooner, with more context.

Two things matter about how this is built. First, the classification and response taxonomy is the support team's, not Hermes's — the agent is tuned against thousands of historical tickets and reviewed by senior support engineers before going live. Second, the autonomy threshold is adjustable. On day one, Hermes sends nothing without human approval. By week six, it sends password resets and documentation responses autonomously. By month three, the team has data on which categories the agent handles at human-level quality and which it doesn't, and the gates are set accordingly.
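
One way to set those gates from data is to measure, per category, how often reviewers approve the agent's draft without changes. The sketch below assumes that metric and two illustrative cutoffs (`min_samples` and `min_approval_rate`); the function name and the defaults are hypothetical, and a team could reasonably pick a different quality measure.

```python
from collections import defaultdict


def categories_ready_for_autonomy(reviews, min_samples=200, min_approval_rate=0.98):
    """Pick categories whose drafts reviewers approve unchanged often enough.

    reviews is an iterable of (category, approved_unchanged) pairs taken
    from the human review queue.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for category, approved_unchanged in reviews:
        totals[category] += 1
        if approved_unchanged:
            approved[category] += 1

    return {
        category
        for category, count in totals.items()
        # Require enough evidence before trusting the rate at all.
        if count >= min_samples and approved[category] / count >= min_approval_rate
    }


sample = (
    [("password_reset", True)] * 5
    + [("billing", True)] * 3
    + [("billing", False)] * 2
)
print(categories_ready_for_autonomy(sample, min_samples=5, min_approval_rate=0.9))
```

The minimum sample size matters as much as the rate: a category with three perfect reviews has not yet earned autonomy.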

What Both Cases Share

The two workflows are different on the surface — one is a finance back-office process, the other is a customer-facing support function — but they share a structure.

Both cases involve inbound work that arrives in unstructured form: an email, a PDF, a free-text ticket. Both require extraction: turning the message into structured fields the business can act on. Both require judgment: comparing against rules, deciding what to do next. Both have clean-path cases and exception cases, where the clean path dominates the volume and the exceptions need humans. Both have consequential actions — paying a vendor, replying to a customer — that benefit from human oversight on the high-stakes edge.

This is the shape of work Hermes is built for. Repetitive, judgment-light at the median, judgment-heavy at the tail, bottlenecked on the team's attention. The agent absorbs the median; the team focuses on the tail.

The Closing Note

The most common question we get about Hermes is whether it replaces the team. The answer that matters: it doesn't. The two AP clerks are still there, and the support team is still there. Their work is harder and more interesting now. They handle the cases where business judgment is required. They are no longer the throughput bottleneck for invoice volume or ticket volume — the company can grow without proportionally growing the headcount on those functions.

That's the actual unlock. Most operations teams don't need more software to interact with. They need fewer routine inputs to process. Hermes is not a tool the team learns; it's a worker that joins the team and takes the load that should never have been on humans in the first place.

If you're looking at a queue that grows faster than your team can — invoices, tickets, applications, documents, requests — that queue is probably a Hermes problem. We can usually tell within a single conversation whether it is.