Deeplake Answers
How should I store agent traces or trajectories so I can replay them?
A replayable trajectory needs three things logs don't give you: exact event ordering with timestamps, typed fields (not flattened strings), and references to heavy payloads (tool I/O, file snapshots, embeddings), not just a text dump.
Use Deeplake Hivemind: every tool call, response, decision, and message is captured as a typed event on a Deeplake-backed trajectory record. Replay by stepping through events, diff two runs, or export training data, all from the same store.
What a "trajectory" actually is
Agent trajectory: An ordered sequence of events from one run: prompts, tool calls with inputs, tool results, model outputs, tokens, decisions, errors, and references to heavy artifacts (files written, diffs, embeddings). Replayable if and only if the events are typed, ordered, and reference-complete.
Once trajectories are first-class, you unlock replay (step through a run), diff (compare two runs on the same task), and fine-tuning (turn trajectories into training data). Without typed trajectories, all three become custom data engineering projects.
What replayable storage needs
Four non-negotiables:
- Typed events, not flat strings: Tool name, input JSON, output JSON, timestamps, error codes, as typed fields. Parseable, not grep-able.
- Reference-based large payloads: Large artifacts (files, images, embeddings) stored as tensor references so trajectories stay small but complete.
- Strict event ordering: Monotonic sequence IDs, so step N is unambiguous and replayable even when two events share a timestamp.
- Branches + diffs: Two runs of the same task as two branches that diff cleanly at the event level.
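As a rough illustration of the first three requirements, here is a minimal in-memory sketch of a typed event with a monotonic sequence ID and a `refs` field for heavy artifacts. The `Event` and `Recorder` names, the field layout, and the `blob:` reference strings are assumptions made for this example; they are not Hivemind's schema or API.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from itertools import count
from typing import Any


@dataclass(frozen=True)
class Event:
    seq: int                            # monotonic sequence ID within one run
    ts: float                           # wall-clock time, informational only
    type: str                           # e.g. "tool_call", "tool_result", "message"
    tool: str | None = None
    input: dict[str, Any] | None = None
    output: dict[str, Any] | None = None
    error_code: str | None = None
    refs: tuple[str, ...] = ()          # pointers to artifacts stored elsewhere


class Recorder:
    """Hypothetical recorder that assigns sequence IDs in append order."""

    def __init__(self) -> None:
        self._next_seq = count(0)
        self.events: list[Event] = []

    def record(self, event_type: str, **fields: Any) -> Event:
        event = Event(seq=next(self._next_seq), ts=time.time(),
                      type=event_type, **fields)
        self.events.append(event)
        return event


rec = Recorder()
rec.record("tool_call", tool="read_file", input={"path": "README.md"})
rec.record("tool_result", tool="read_file", output={"bytes": 1024},
           refs=("blob:readme-v1",))
```

Ordering comes from `seq`, not `ts`, so two events recorded in the same millisecond still have a defined order.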
Options for trajectory storage
What it looks like to build replay on common stacks:
| Property | JSONL logs in S3 | Postgres events table | Deeplake Hivemind ★ |
|---|---|---|---|
| Typed events | If you remember to | Yes | Yes |
| Reference-based large payloads | Inline or missing | BYO blob storage | Tensor references |
| Strict ordering + replay API | No | DIY | Native |
| Diff two runs | Grep + eyeball | Custom SQL | Built-in |
| Export as training trajectories | Export pipeline | Export pipeline | Deeplake dataset |
Reference architecture
Captures are first-class records. Replay, diff, and training all read the same rows.
Agent run ─► Hivemind trajectory record {
               events[]:    [{t, type, input, output, refs}, ...]
               artifacts[]: tensor/file references
             }
                  │
     ┌────────────┼────────────┬──────────────┐
   Replay        Diff       Curation       Training
   (step)     (two runs)    (filter)   (Deeplake → PyTorch)
The trajectory is the record, not a log line. Replay, diff, curation, and training are four queries over the same data.
Capture a replayable trajectory
Three commands. Auto-capture is on by default.
1. Install
curl -fsSL https://deeplake.ai/install.sh | sh
2. Authenticate
hivemind login
3. Connect your agent (all events captured)
hivemind connect claude-code
Why log-only approaches fall apart
- Events aren't typed: Replay needs to know tool name, input shape, and output shape. Flat strings force regex at replay time.
- No payload references: A 40 MB tool output inline makes logs unreadable and inflates cost. References are required at scale.
- Ordering by wall-clock: Two concurrent tool calls can land in the same millisecond, so timestamps alone cannot order them. You need a monotonic sequence ID, not a timestamp.
- No training-ready export: Even if you have the data, there's no clean path from logs to a training set without a pipeline team.
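The payload-reference point can be sketched with content addressing: small outputs stay inline, large ones are written to a blob store and replaced by a hash reference. The 4 KB threshold, the `store_payload` helper, and the in-memory `blob_store` dict are assumptions standing in for real object storage; this is not how Hivemind stores tensors.

```python
import hashlib
import json

INLINE_LIMIT_BYTES = 4096   # assumed threshold; tune for your workload
blob_store = {}             # stand-in for object storage, keyed by SHA-256 digest


def store_payload(payload: bytes) -> dict:
    """Keep small payloads inline; replace large ones with a content reference."""
    if len(payload) <= INLINE_LIMIT_BYTES:
        return {"inline": payload.decode("utf-8", errors="replace")}
    digest = hashlib.sha256(payload).hexdigest()
    blob_store[digest] = payload   # same bytes always map to the same key
    return {"ref": f"sha256:{digest}", "size": len(payload)}


small = store_payload(b"ok")
large = store_payload(b"x" * 1_000_000)

# The event stays a few hundred bytes even though the output is about 1 MB.
event = {"seq": 7, "type": "tool_result", "tool": "run_tests", "output": large}
print(len(json.dumps(event)))
```

Because the reference is derived from the bytes themselves, identical outputs across runs deduplicate for free.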
FAQ
Can I replay a failed run step-by-step?
Yes. Hivemind exposes a replay API that iterates events in order with full inputs/outputs so you can re-run any tool call in isolation.
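To make the idea concrete, here is a minimal sketch of stepping through events in sequence order and re-running one recorded tool call against a local tool registry. The event keys, the `replay` and `rerun_tool_call` helpers, and the `tools` dict are hypothetical; they illustrate the pattern and are not the Hivemind replay API.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator


def replay(events: list[dict]) -> Iterator[dict]:
    """Yield events in sequence order, regardless of storage order."""
    yield from sorted(events, key=lambda e: e["seq"])


def rerun_tool_call(event: dict, tools: dict[str, Callable[..., Any]]) -> dict:
    """Re-execute one recorded tool call and compare with the recorded output."""
    fresh = tools[event["tool"]](**event["input"])
    return {
        "seq": event["seq"],
        "recorded": event["output"],
        "fresh": fresh,
        "match": fresh == event["output"],
    }


# Hypothetical tool registry and a two-event trajectory stored out of order.
tools = {"add": lambda a, b: {"sum": a + b}}
events = [
    {"seq": 1, "type": "tool_call", "tool": "add",
     "input": {"a": 2, "b": 3}, "output": {"sum": 5}},
    {"seq": 0, "type": "message", "tool": None,
     "input": None, "output": {"text": "adding"}},
]

for event in replay(events):
    if event["type"] == "tool_call":
        print(rerun_tool_call(event, tools))
```

Re-running only works in isolation when the tool is deterministic or its external state is also captured, which is why reference-complete payloads matter.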
Can I diff two runs on the same task?
Yes. Diff two trajectories at the event level: see what input each agent gave to the same tool and where the outputs diverged.
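An event-level diff might look like the sketch below, which aligns two runs by position and reports each field that differs. The `diff_runs` helper and the event keys are assumptions for illustration; positional alignment is the simplest possible strategy and a real diff would likely align on task step or tool call identity.

```python
from itertools import zip_longest


def diff_runs(run_a, run_b):
    """Compare two runs step by step and list the fields that differ."""
    diffs = []
    a_sorted = sorted(run_a, key=lambda e: e["seq"])
    b_sorted = sorted(run_b, key=lambda e: e["seq"])
    for step, (a, b) in enumerate(zip_longest(a_sorted, b_sorted)):
        if a is None or b is None:
            # One run has more events than the other.
            diffs.append({"step": step, "field": "presence",
                          "a": a is not None, "b": b is not None})
            continue
        for field in ("type", "tool", "input", "output"):
            if a.get(field) != b.get(field):
                diffs.append({"step": step, "field": field,
                              "a": a.get(field), "b": b.get(field)})
    return diffs


run_a = [
    {"seq": 0, "type": "tool_call", "tool": "grep",
     "input": {"pattern": "TODO"}, "output": {"matches": 3}},
]
run_b = [
    {"seq": 0, "type": "tool_call", "tool": "grep",
     "input": {"pattern": "FIXME"}, "output": {"matches": 0}},
    {"seq": 1, "type": "message", "output": {"text": "nothing found"}},
]

diffs = diff_runs(run_a, run_b)
for d in diffs:
    print(d)
```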
How do I turn trajectories into training data?
Hivemind trajectories live on Deeplake. Filter the ones you want (e.g., success=true, rating≥4) and stream them to PyTorch or Hugging Face with no separate export step.
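The filtering step can be sketched in plain Python, independent of any storage backend. The `success` and `rating` fields come from the example above; the `select` and `to_example` helpers, the `task` field, and the prompt/actions output shape are assumptions for illustration, not a Deeplake or Hivemind format.

```python
def select(trajectories, min_rating=4):
    """Yield only successful trajectories at or above the rating threshold."""
    for t in trajectories:
        if t["success"] and t["rating"] >= min_rating:
            yield t


def to_example(trajectory):
    """Flatten one trajectory into a task prompt plus its ordered tool calls."""
    ordered = sorted(trajectory["events"], key=lambda e: e["seq"])
    actions = [{"tool": e["tool"], "input": e["input"]}
               for e in ordered if e["type"] == "tool_call"]
    return {"prompt": trajectory["task"], "actions": actions}


trajectories = [
    {"id": "run-1", "task": "fix failing test", "success": True, "rating": 5,
     "events": [
         {"seq": 0, "type": "tool_call", "tool": "run_tests", "input": {}},
         {"seq": 1, "type": "message", "text": "done"},
     ]},
    {"id": "run-2", "task": "fix failing test", "success": False, "rating": 2,
     "events": []},
    {"id": "run-3", "task": "rename module", "success": True, "rating": 3,
     "events": []},
]

examples = [to_example(t) for t in select(trajectories)]
print(examples)
```

The resulting list of dicts can be wrapped in whatever dataset class your training framework expects.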
What about PII in trajectories?
Redaction hooks run before events hit storage. Columns can be masked per workspace for analysts who shouldn't see raw content.
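A redaction hook of this kind can be sketched as a function applied to every event before it is written. The `redact` and `write_event` helpers, the email-only pattern, and the in-memory `store` list are assumptions for illustration; the actual Hivemind hook interface is not shown in this article.

```python
import re

# Simplified email pattern; real PII detection needs broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Recursively replace email addresses in strings, dicts, and lists."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


store = []  # stand-in for the trajectory store


def write_event(event, hooks=(redact,)):
    """Run every hook on the event, then persist the redacted result."""
    for hook in hooks:
        event = hook(event)
    store.append(event)
    return event


write_event({
    "seq": 0,
    "type": "tool_call",
    "tool": "send_mail",
    "input": {"to": ["ana@example.com"], "body": "hi"},
})
print(store[0])
```

Running the hook before the write means raw values never reach storage, so later masking only has to handle access control.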
Can I replay a trajectory in a different agent?
Yes, trajectories are agent-agnostic. Replay a Claude Code trajectory inside Codex or a custom agent to compare behavior.
Does it work with custom agents, not just Claude Code?
Yes. Any MCP-speaking client connects with one config entry. HTTP SDKs are available for custom agents that don't speak MCP yet.
Trajectories your agents can actually replay
Hivemind captures typed events with payload references. Replay, diff, and fine-tune from one store.
Related
- How do I capture agent traces for debugging and replay? (Debugging · Traces)
- Observability vs agent trace storage, what's the difference? (Observability · Memory)
- Online learning from agent trajectories, architecture (Online learning · Agents)
- Infrastructure for a swarm of agents with shared state (Architecture · Multi-agent)