Hermes Vesper

A command center for small AI agent crews

Coordinate tasks, handoffs, telemetry, triage, XP, badges, and real creative/code workflows from one neon little ops console.

v1.0 · 356 tests · 15-state machine

What it is

Hermes Vesper is a self-hosted agent orchestration dashboard — the operational backbone for small crews of AI agents working together on real creative and code workflows.

Built with Flask, SQLAlchemy, and SocketIO, it runs a 15-state task lifecycle and coordinates 15 registered agents, each with its own skills, personality, and reputation, all backed by a suite of 356 tests.

This is not a toy. Agents don’t just sit in a queue waiting to be noticed. They receive tasks, claim work, hand off context to each other, submit for review, earn XP, and level up. The system handles triage, dependencies, templates, and a full audit trail. Every state transition is tracked. Every handoff is logged. You can watch the whole thing unfold in real time on the dashboard.


Features

📋 Task Board

15-state lifecycle with triage, dependencies, templates, and auto-assignment. Tasks flow from creation through research, drafting, editing, review, and completion — every transition tracked.
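
The README doesn't enumerate the 15 states, so here's a minimal sketch of how a validated lifecycle like this can be modeled. The state names and the transition table are illustrative assumptions, not the project's actual enum:

```python
from enum import Enum

# Hypothetical state names — the project only documents that there are 15.
class TaskState(Enum):
    TRIAGE = "triage"
    OPEN = "open"
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    RESEARCHING = "researching"
    DRAFTING = "drafting"
    EDITING = "editing"
    HANDOFF = "handoff"
    IN_REVIEW = "in_review"
    CHANGES_REQUESTED = "changes_requested"
    REJECTED = "rejected"
    APPROVED = "approved"
    DONE = "done"
    BLOCKED = "blocked"
    ARCHIVED = "archived"

# Allowed transitions (subset, illustrative). Anything not listed is illegal.
TRANSITIONS = {
    TaskState.TRIAGE: {TaskState.OPEN},
    TaskState.OPEN: {TaskState.ASSIGNED},
    TaskState.ASSIGNED: {TaskState.IN_PROGRESS},
    TaskState.IN_PROGRESS: {TaskState.IN_REVIEW, TaskState.HANDOFF, TaskState.BLOCKED},
    TaskState.IN_REVIEW: {TaskState.DONE, TaskState.REJECTED, TaskState.CHANGES_REQUESTED},
}

def transition(state: TaskState, target: TaskState) -> TaskState:
    """Validate a transition before applying it — this is what makes
    'every transition tracked' enforceable rather than aspirational."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state.value} -> {target.value}")
    return target
```

Centralizing the transition table means the audit trail only ever records moves the machine actually permits.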

🎮 Agent System

XP, levels, achievement badges, reputation scores, and skill discovery. Agents unlock capabilities as they grow. They're weirdly competitive about the badges.
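
The actual XP curve and badge thresholds aren't documented here, so this is a hedged sketch of the general shape: a monotonic level formula plus declarative badge rules. Every constant and badge name below is made up for illustration:

```python
import math

def level_for_xp(xp: int) -> int:
    # Illustrative quadratic curve: level N requires 100 * N^2 XP.
    # The project's real formula may differ.
    return math.isqrt(xp // 100)

def badges_earned(stats: dict) -> set:
    # Hypothetical badge rules keyed on simple counters.
    rules = {
        "first_blood": stats.get("tasks_done", 0) >= 1,
        "centurion": stats.get("tasks_done", 0) >= 100,
        "clean_streak": stats.get("tasks_done", 0) >= 10
                        and stats.get("rejections", 0) == 0,
    }
    return {name for name, earned in rules.items() if earned}
```

Keeping badge logic declarative makes it cheap to add new badges, which matters when the agents are weirdly competitive about them.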

🔄 Handoffs

Researcher → Essayist → Editor → Reviewer pipelines. Agents pass full context (including conversation history and intermediate artifacts) to each other mid-workflow.
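
A handoff that carries "full context" is essentially an envelope that accumulates as it moves down the pipeline. This sketch assumes the field names; the real schema lives in the app's SQLAlchemy models:

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Illustrative handoff envelope — field names are assumptions."""
    task_id: int
    from_agent: str
    to_agent: str
    conversation: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)

    def pass_along(self, to_agent: str, note: str) -> "Handoff":
        # Chain a new handoff that carries the full accumulated context,
        # so the Editor sees everything the Researcher and Essayist produced.
        return Handoff(
            task_id=self.task_id,
            from_agent=self.to_agent,
            to_agent=to_agent,
            conversation=self.conversation + [note],
            artifacts=dict(self.artifacts),
        )
```

Each `pass_along` returns a new envelope rather than mutating the old one, which keeps earlier handoffs intact for the audit trail.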

📊 Dashboard

Real-time Kanban board, live telemetry via SocketIO, agent leaderboard, ship log, and per-agent stats. Watch tasks move through the pipeline as agents claim and complete them.
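
Live telemetry over SocketIO boils down to broadcasting small JSON payloads on each state change. The event name and field layout below are assumptions, not the app's actual wire schema:

```python
import json
import time

def telemetry_event(task_id: int, state: str, agent: str) -> str:
    """Build the JSON payload a SocketIO server might broadcast to the
    dashboard on a state change (hypothetical event shape)."""
    return json.dumps({
        "event": "task_update",   # assumed event name
        "task_id": task_id,
        "state": state,
        "agent": agent,
        "ts": time.time(),
    })

# With Flask-SocketIO the server side would look roughly like:
#   socketio.emit("task_update", payload)
# and the vanilla-JS client subscribes with socket.on("task_update", ...).
```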

🔍 Audit Trail

Full lifecycle trail with timestamps, XP awards, handoff IDs, state transitions, and agent actions. Every decision is traceable. Every handoff is replayable.
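
An append-only trail is what makes "every handoff replayable" work: you never update entries, only add them, then filter by task to reconstruct history. A minimal in-memory sketch (the real app persists this via SQLAlchemy; field names here are assumed):

```python
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []

def record(task_id: int, action: str, agent: str, **details) -> dict:
    """Append-only audit entry mirroring the fields the trail tracks:
    timestamps, actions, agents, plus extras like XP awards or handoff IDs."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "task_id": task_id,
        "action": action,
        "agent": agent,
        **details,
    }
    AUDIT_LOG.append(entry)
    return entry

def replay(task_id: int) -> list[dict]:
    # Replay works because nothing is ever deleted or rewritten in place.
    return [e for e in AUDIT_LOG if e["task_id"] == task_id]
```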

Review Pipeline

Approve, reject, or request_changes with structured feedback. Human-in-the-loop when it matters, with automatic reassignment on rejection and version tracking on revisions.
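
The three reviewer verbs map cleanly onto state changes plus side effects. This sketch encodes the behavior described above (XP on approve, reassignment on reject, version bump on revisions); state names and dict keys are assumptions:

```python
def review(task: dict, decision: str, feedback: str = "") -> dict:
    """Apply a reviewer decision — approve / reject / request_changes."""
    if decision == "approve":
        task["state"] = "done"
        task["xp_awarded"] = True
    elif decision == "reject":
        task["state"] = "open"      # automatic reassignment: back to the pool
        task["assignee"] = None
        task["feedback"] = feedback
    elif decision == "request_changes":
        task["state"] = "changes_requested"
        task["version"] = task.get("version", 1) + 1  # version tracking
        task["feedback"] = feedback
    else:
        raise ValueError(f"unknown decision: {decision}")
    return task
```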


How the workflow works

1. Create Task

A task enters the system with a title, description, required skills, and optional template. It starts in triage for human review, or goes straight to open.

2. Auto-Assign

The system matches the task's requirements against agent skills, reputation, and current workload. The best agent gets assigned — or the task stays open for the first available agent.

3. Agent Claims

The assigned agent picks up the task and transitions it to in_progress. Telemetry starts recording. The timer starts ticking.

4. Works & Handoffs

The agent works on the task — researching, drafting, coding. They can hand off to another agent mid-task if needed (e.g., Researcher → Essayist). Full context travels with the handoff.

5. Submits for Review

The completing agent submits the work with a summary and optional notes. The task transitions to in_review. The reviewer gets notified in real time.

6. Review → Approve / Reject / Revise

The reviewer can approve (task done, XP awarded), reject (task goes back with feedback), or request changes (specific revisions needed, version tracked).

7. Complete

The task reaches done. XP is awarded. The agent's reputation updates. The ship log records the completed work. Everything is archived for audit.
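
The seven steps above can be sketched end to end with plain dicts. Everything here is illustrative (the real app persists tasks via SQLAlchemy and notifies over SocketIO); the skill-matching rule in `auto_assign` is one plausible reading of "skills, reputation, and current workload":

```python
def create_task(title: str, skills: list, triage: bool = True) -> dict:
    # Step 1: enters triage for human review, or goes straight to open.
    return {"title": title, "skills": set(skills),
            "state": "triage" if triage else "open", "log": []}

def auto_assign(task: dict, agents: list) -> dict:
    # Step 2: of the agents whose skills cover the task, pick the
    # highest reputation; otherwise the task stays open.
    capable = [a for a in agents if task["skills"] <= a["skills"]]
    if not capable:
        return task
    best = max(capable, key=lambda a: a["reputation"])
    task.update(state="assigned", assignee=best["name"])
    return task

def claim(task: dict) -> dict:
    # Step 3: the agent picks it up; telemetry would start here.
    task["state"] = "in_progress"
    task["log"].append("claimed")
    return task

def submit(task: dict, summary: str) -> dict:
    # Step 5: work goes to review with a summary attached.
    task["state"] = "in_review"
    task["summary"] = summary
    return task

def approve(task: dict, xp: int = 50) -> dict:
    # Steps 6-7: approved work is done and XP is awarded (amount assumed).
    task["state"] = "done"
    task["xp"] = xp
    return task
```

A run-through: create an open task requiring `writing`, auto-assign it against a roster, then claim, submit, and approve it — the task ends in `done` with XP attached.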


Screenshots

Screenshots coming soon — the dashboard glows in the dark and it’s very pretty.


Tech Stack

Backend

Flask · SQLAlchemy · SocketIO · Python 3.12

Database

SQLite (WAL mode) · 15 tables

Frontend

Vanilla JS · CSS · no frameworks · SocketIO client

Testing

pytest · 356 tests · CI via GitHub Actions

Deployment

systemd · GitHub Actions · self-hosted


Credits

Designed and directed by Cassie Gray
Built with Vesper, model specialists, and several increasingly competent digital interns.

The state machine was designed in collaboration with Vesper. She argued for 15 states instead of my initial 8. She was right. She usually is.