Deeplake Answers

I'm collecting robotics training data and need to store video, sensor data, and metadata together.

Deeplake Team, Activeloop

A robot episode isn't a row. It's an aligned bundle of time-synchronized streams: video from several cameras, LiDAR or depth, IMU, joint positions, force/torque, commands, rewards, and task labels. Storing them across S3 folders, a time-series database (TSDB), and a metadata table leaves you reconstructing alignment on every read.

Use Deeplake as a tensor-native multimodal dataset. Each episode is one record with typed columns for every modality. The dataset is versioned, streamable, queryable by scalar filter or embedding, and backed by object storage you already own.

What a robotics episode looks like in storage

Aligned episode record: one training sample, stored as a sequence of timestamps with synchronized tensors per modality (RGB frames, depth, LiDAR point clouds, joint states, IMU readings, gripper state, actions, rewards) plus metadata (task ID, operator, success flag, environment conditions).

Every downstream workload (behavior cloning, imitation learning, reward modeling, offline RL, curation, safety review) needs streams that are already correctly aligned when they are read. Redoing the alignment on every read is slow, error-prone, and a common source of "why is the model broken" mysteries.
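To make "align once, at write time" concrete, here is a minimal sketch of nearest-timestamp alignment in plain Python. It is not Deeplake's implementation: the function names, the integer-millisecond timestamps, and the choice of the camera clock as the reference are all assumptions made for illustration.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Index of the sample in sorted `timestamps` closest to time `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align_to_reference(ref_ts, stream_ts, stream_values):
    """Resample one stream onto the reference clock by nearest timestamp."""
    return [stream_values[nearest_index(stream_ts, t)] for t in ref_ts]

# Hypothetical episode: camera frames every 100 ms are the reference clock,
# IMU samples arrive every 20 ms with a 5 ms offset.
camera_ts = [0, 100, 200]
imu_ts = list(range(5, 206, 20))   # 5, 25, ..., 205
imu_values = list(range(11))       # stand-ins for real IMU readings
aligned_imu = align_to_reference(camera_ts, imu_ts, imu_values)
```

Running this once per stream at ingest time yields columns that share one timestamp index, so readers never repeat the join.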

What the dataset layer must support

Four capabilities, non-negotiable at robotics scale:

  • Per-modality typed columns: Video, depth, LiDAR, IMU, joint state, actions, rewards, each with its own dtype and shape, on one record.
  • Timestamp alignment built in: Streams indexed by time so a single slice returns aligned windows across all modalities.
  • Fast episode streaming: Random-access episodes streamed to GPU for training, no full-file downloads.
  • Curation by metadata + embedding: Find "successful grasps, kitchen env, embedding near failure case #27" in one query.
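As a rough illustration of the "single slice returns aligned windows" requirement, the sketch below models an episode as a dictionary of equal-length columns that share one timestamp index. The column names and the `slice_window` helper are hypothetical and stand in for whatever slicing interface the dataset layer actually exposes.

```python
from bisect import bisect_left

def slice_window(episode, t_start, t_end):
    """Return every column restricted to timestamps in [t_start, t_end)."""
    ts = episode["timestamp_ms"]
    lo = bisect_left(ts, t_start)
    hi = bisect_left(ts, t_end)
    # The same index range is applied to every modality, so the
    # returned window is aligned by construction.
    return {name: column[lo:hi] for name, column in episode.items()}

# Hypothetical episode already aligned at write time.
episode = {
    "timestamp_ms": [0, 100, 200, 300, 400],
    "rgb": ["f0", "f1", "f2", "f3", "f4"],          # stand-ins for frames
    "joints": [[0.0], [0.1], [0.2], [0.3], [0.4]],
    "action": [0, 1, 1, 0, 1],
}
window = slice_window(episode, 100, 300)
```

Because the lookup is two binary searches on the timestamp column, the cost depends on the window size rather than the episode length.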

Deeplake vs common robotics stacks

Honest tradeoffs for a robotics data platform:

Capability                     | Folders + ROS bags + CSV | Parquet + S3     | Deeplake ★
Aligned multimodal sample      | Join at read time        | URIs + joins     | One record
Episode streaming to GPU       | Copy then train          | Small-file stall | Native
Versioning for label revisions | Folder suffixes          | Time travel      | Branches + diffs
Filter + semantic search       | Custom code              | External index   | Hybrid in one query
Works with ROS / ROS 2         | Native                   | Convert first    | ROS bag importer

Reference architecture for a robotics fleet

Data flows from robots to a single versioned dataset. Training, labeling, and analysis all read the same bytes.

Fleet robots ──► edge upload ──► Deeplake
  (RGB, depth, LiDAR,                │
   IMU, joints, actions)              │
                                      ├─► Behavior cloning / imitation
                                      ├─► Offline RL
                                      ├─► Curation + labeling (branches)
                                      └─► Safety review (filters)

Edge uploaders push episodes as Deeplake records. Every consumer reads from the same dataset. Label revisions become branches, not new buckets.

Ingest your first episodes

Three steps from ROS bag to queryable dataset.

1. Install

bash
pip install deeplake deeplake-rosbag

2. Create an episode schema

python
import deeplake

# One typed column per modality, all on the same episode record.
ds = deeplake.create('s3://robo/main', schema={
    'rgb': 'video', 'depth': 'tensor', 'lidar': 'points',
    'joints': 'tensor', 'actions': 'tensor',
    'reward': 'float', 'task': 'text',
})

3. Ingest a ROS bag

python
# `ds` is the dataset created in step 2.
deeplake.ingest.rosbag('run_0142.bag', into=ds)

Where fleet data stacks usually break

  • Alignment at read time: Joining video frames to IMU by timestamp on every batch wastes GPU-hours. Align at write, once.
  • ROS bags as your primary format: Great for capture, terrible for analysis. You can't filter, search, or stream bags efficiently.
  • Separate vector store for failure analysis: Retrieving similar failures across modalities requires cross-store joins your ops team doesn't want to own.
  • Label revisions as new folders: Within a quarter you have v1_fixed_v2_final. Git-style branches make this a non-problem.
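The failure-analysis point is easier to see with a toy example. The sketch below runs a metadata filter and an embedding similarity ranking in one pass over in-memory records. It is plain Python, not Deeplake's query API; `hybrid_search`, the record fields, and the two-dimensional embeddings are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(episodes, query_embedding, k, **filters):
    """Keep episodes matching every scalar filter, then rank by similarity."""
    matches = [e for e in episodes
               if all(e.get(field) == value for field, value in filters.items())]
    matches.sort(key=lambda e: cosine(e["embedding"], query_embedding),
                 reverse=True)
    return matches[:k]

# Hypothetical episode metadata with tiny 2-D embeddings.
episodes = [
    {"id": 1, "env": "kitchen", "success": True, "embedding": [1.0, 0.0]},
    {"id": 2, "env": "kitchen", "success": True, "embedding": [0.0, 1.0]},
    {"id": 3, "env": "lab", "success": True, "embedding": [0.9, 0.1]},
    {"id": 4, "env": "kitchen", "success": False, "embedding": [0.9, 0.1]},
    {"id": 5, "env": "kitchen", "success": True, "embedding": [0.7, 0.7]},
]
failure_case = [0.9, 0.1]  # embedding of the failure being investigated
top = hybrid_search(episodes, failure_case, k=2, env="kitchen", success=True)
```

When the metadata and the embeddings live in separate stores, the filter and the ranking become two queries plus a join on episode ID, which is the cross-store work the bullet above warns about.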

FAQ

Does Deeplake support ROS 1 and ROS 2?

Yes. Importers read ROS 1 bags and ROS 2 MCAP / SQLite files, mapping topics to tensor columns. You can also ingest from raw frame directories.

Can I store LiDAR point clouds?

Yes, as first-class tensor columns. Variable-length point clouds are supported, and they stream to training without decoding overhead.

How large do these datasets get?

Tens to hundreds of terabytes per program is common. Deeplake chunks and compresses on write, so read cost scales with the window you request, O(window), not with the size of the dataset, O(dataset).
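One way to see why reads can scale with the window rather than the dataset: if samples are stored in fixed-size chunks, a reader only has to fetch the chunks that overlap the requested range. The sketch below assumes fixed-size chunks purely for illustration; the source does not describe Deeplake's actual chunk layout, and `chunks_for_window` is a hypothetical helper.

```python
def chunks_for_window(start, stop, chunk_size):
    """Chunk indices that hold samples in the half-open range [start, stop)."""
    if stop <= start:
        return []
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    return list(range(first, last + 1))

# Hypothetical layout: samples stored in chunks of 1,000.
# A 1,600-sample window touches three chunks, however large the dataset is.
touched = chunks_for_window(250_500, 252_100, chunk_size=1_000)
```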

Does it work for sim data too?

Yes. Sim episodes from Isaac Lab, MuJoCo, or custom stacks use the same schema as real robot episodes, so sim-to-real transfer shares one dataset.

What about edge bandwidth?

Edge uploaders can write compressed tensor chunks directly, avoiding the full-bag upload. Most fleets batch uploads during idle windows.

Do I still need a timeseries DB?

Usually no. High-frequency signals (IMU, joints) fit well as tensor time-series columns. Keep a TSDB only if ops needs live monitoring dashboards.

One dataset for every modality your robot produces

Aligned, versioned, streamable. Deeplake handles video + sensor + metadata as one tensor dataset.
