Teleoperation & VLA Annotation

Training data for robotics teams that move fast.

We recruit operators, run teleoperation sessions on your hardware, and deliver policy-ready datasets in LeRobot, RLDS, or HDF5 — in weeks, not months.

Start a Pilot → See How It Works

COLLECTION PIPELINE · LIVE

2wk turnaround · 300+ demos/week · 5ms label density

PICK & PLACE 88% · ASSEMBLY 72% · BIMANUAL 61%

Teleoperation Data Collection

VLA Action Annotation

LeRobot v2/v3 Native

Policy Smoke Test

RLDS / RT-X Compatible

Action Segmentation

Natural Language Labels

HDF5 / robomimic

Humanoid Robotics

2-Week Pilots

LatAm Operator Network

OpenVLA · pi0 · Octo · ACT

$14B+

Raised by humanoid robotics companies in 2024–2025

MARKET SIZE

300+

Teleoperation demos per week, delivered to HuggingFace

WEEKLY CAPACITY

<5s

Average annotation label density per demonstration

LABEL PRECISION

2wk

From brief to policy-ready dataset delivery

TURNAROUND

The Bottleneck

Your model is blocked on data, not architecture.

You have the robot. You have the model architecture. But collecting 500 quality demonstrations, annotating subtask segments, normalizing action vectors, and formatting for OpenVLA or pi0 — that's 3 months of engineering time you don't have.

"A dataset of one million noisy demonstrations is not an asset — it's a liability. Bad demos teach inconsistent strategies, and more data just compounds the noise."

Most teams end up stuck in a loop: collect more data → still bad policy → collect even more data. The problem isn't volume. It's quality, annotation, and format expertise.

Teleop is expensive in-house

$50K–$150K per station in hardware. We bring lean setups to your facility.

Generic annotators miss robot physics

They can't tell a failed grasp from a good one. We train specialists.

No one validates that the data trains a policy

Most vendors ship files. We run a smoke test before delivery.

Scale AI minimums are out of reach

Enterprise contracts start at $500K+. We work with Seed–Series B.

Services

Everything from collection to delivery.

Three services, one vendor. We slot into your pipeline wherever you need us.

01 ————

Teleoperation Data Collection

We recruit, train, and manage teleoperation operators who collect demonstrations on your hardware — at your facility or remotely with portable Quest 3 + GELLO setups. Pick and place, assembly, bimanual tasks, deformable objects. You bring the robot. We bring the operators, the protocol, and the quality control.

50–2,000+ demos · On-site or remote · QC included

02 ————

VLA Action Annotation

Send us your raw recordings. We deliver fully annotated datasets: episode-level language instructions, subtask segmentation, success/failure labels, contact state tags, and trajectory quality scores.

OpenVLA · pi0 · Octo · ACT
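
To make the annotation layers above concrete, here is a minimal sketch of what one annotated episode record could look like. The class names, field names, and the `[0, 1]` range for the quality score are assumptions made for illustration, not our delivery schema; the real layout follows whichever target format you choose.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SubtaskSegment:
    label: str          # e.g. "reach", "grasp", "place"
    start_frame: int    # inclusive
    end_frame: int      # exclusive
    contact_state: str  # hypothetical tag values: "no_contact", "in_contact"


@dataclass
class EpisodeAnnotation:
    episode_id: str
    instruction: str      # episode-level language instruction
    success: bool         # success/failure label
    quality_score: float  # trajectory quality, assumed to lie in [0, 1]
    segments: List[SubtaskSegment] = field(default_factory=list)

    def problems(self, num_frames: int) -> List[str]:
        """Return consistency problems; an empty list means the record is usable."""
        found = []
        if not 0.0 <= self.quality_score <= 1.0:
            found.append("quality_score outside [0, 1]")
        cursor = 0
        for seg in self.segments:
            if seg.start_frame != cursor:
                found.append(f"gap or overlap before segment '{seg.label}'")
            if seg.end_frame <= seg.start_frame:
                found.append(f"empty segment '{seg.label}'")
            cursor = seg.end_frame
        if self.segments and cursor != num_frames:
            found.append("segments do not cover the whole episode")
        return found

    def to_jsonl(self) -> str:
        """Serialize the record as a single JSON line."""
        return json.dumps(asdict(self))
```

A record whose segments tile the episode end to end returns an empty list from `problems()`; gaps, overlaps, or an out-of-range score each produce one message, so a reviewer can catch them before the record is written out.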

03 ————

Policy Smoke Test

Before we ship, we run a mini training pass on 10% of your dataset. If the policy doesn't converge, we fix the data before you ever see it. No other vendor does this.

Included on every delivery
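
As a rough illustration of the smoke test idea, the sketch below covers only the two bookkeeping steps around the training pass: drawing a reproducible 10% subset of episodes and deciding whether a loss curve dropped enough to count as converging. The function names, the window size, and the 50% relative-drop threshold are assumptions chosen for the example; the training pass itself is out of scope here.

```python
import random
from typing import List, Sequence


def sample_smoke_subset(
    episode_ids: Sequence[str], fraction: float = 0.10, seed: int = 0
) -> List[str]:
    """Pick a reproducible subset of episodes (at least one) for the mini training pass."""
    if not episode_ids:
        return []
    k = max(1, round(len(episode_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the same subset can be re-drawn later
    return sorted(rng.sample(list(episode_ids), k))


def looks_converged(
    losses: Sequence[float], window: int = 5, min_relative_drop: float = 0.5
) -> bool:
    """Compare the mean loss over the first and last `window` steps.

    Returns True when the loss fell by at least `min_relative_drop`
    (a hypothetical threshold) relative to where it started.
    """
    if len(losses) < 2 * window:
        return False
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    if start <= 0:
        return False
    return (start - end) / start >= min_relative_drop
```

In use, the loss values logged by the mini training run on the sampled episodes would be passed to `looks_converged`; a `False` result is the signal to go back to the data before delivery.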

Why Train Them AI

Built by people who've done this.

Built from the inside

Our founder built the automated action labeling platform powering humanoid robotics data at a leading physical AI company — at $100M ARR. We didn't learn this from the outside.

Your hardware, our operators

No hardware minimums. We operate on your existing setup — ALOHA, UR arms, Franka, or custom rigs. Portable Quest 3 setups available for on-site deployments across LatAm.

Format-native delivery

LeRobot v2/v3, RLDS, HDF5, robomimic — delivered in the exact format your training loop expects. Zero conversion work on your end. We know the difference between a good RLDS record and a broken one.

Mid-market focus

No $500K minimums. No 6-month enterprise sales cycles. Pilots start at 100 demos. We work with Seed–Series B robotics companies that can't get Scale AI's attention — but need the same quality.

2-week pilot turnaround

300-demo annotated pilot in your hands in 2 weeks. Large vendors take 4–6 weeks for comparable scope. We move at startup speed because we are one.

LatAm environmental diversity

Operators across Latin America provide real-world environmental diversity — homes, kitchens, warehouses, offices — that a single-location lab can never produce on its own.

Building in Public

What we're building.

Live projects, active research, and infrastructure in progress.

● Live

Apr 2026

Market Research — Robotics Data Ops

Deep analysis of the humanoid robotics sector, where companies raised $14B+ in 2024–2025. Identified the VLA data ops gap for Seed–Series B companies underserved by current vendors. Mapped 50+ YC targets.

Research · Competitive Analysis · 50+ Companies

● Live

Apr 2026

trainthemai.com Relaunch

Full repositioning from generic POV video collection to VLA data ops boutique for mid-market robotics. New services, new positioning, new design.

Website · Positioning

◐ In Progress

Coming Soon

Humanoid Manipulation Dataset

Egocentric + multi-camera demos on Tiangong humanoid hardware. Action segmentation, language labels, success flags. Publishing to HuggingFace in LeRobot v2.

HuggingFace · LeRobot v2 · Tiangong

◐ In Progress

Q2 2026

Automated Action Labeling Platform

Proprietary platform for fine-grained temporal annotation, multi-stream action segmentation, and automated QC scoring — built for humanoid robotics data ops.

Internal Tool · VLA Annotation · QC Automation

○ Planned

Q2 2026

LatAm Teleop Operator Network

Distributed network of trained teleoperation operators across Latin America. Quest 3 + GELLO setups. Target: 20 operators, 500+ demos/week capacity.

Operations · Quest 3 · LatAm

○ Planned

Q2 2026

Policy Smoke Test Pipeline

Automated pipeline running a mini ACT or Diffusion Policy training pass on 10% of every delivered dataset. First vendor in the space to offer training-ready validation.

ACT · Diffusion Policy · QA

Delivered in

LeRobot v2/v3

RLDS / RT-X

HDF5

robomimic

JSONL / CSV

Who We Work With

Built for Seed–Series B robotics teams that move too fast to build an internal data ops team.

If you raised $2M–$50M, have robots in the lab, and are hitting the data collection bottleneck — we're the team you'd hire, without the hiring overhead.

Start a Pilot → See the Process

Process

From brief to policy-ready data.

Share Your Brief

Task description, robot specs, target format, volume, and timeline. 30-minute scoping call optional. We'll tell you exactly what to expect before we start.

We Script & Deploy Operators

We write the collection protocol, define success criteria, train operators on your specific task, and configure the data pipeline. You review the protocol before collection begins.

Collect, QA & Annotate

Real-time operator rejection of bad demos, automated trajectory quality checks, human QC review, and full VLA annotation — action segmentation, language labels, success flags.
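
For a sense of what an automated trajectory quality check can look like, here is a minimal sketch that flags three common problems in a recorded action sequence: episodes that are too short, sudden jumps between consecutive frames, and long stretches where nothing moves. The function name and every threshold are assumptions for illustration; real limits depend on the robot, the control rate, and the task.

```python
from typing import List, Sequence


def check_trajectory(
    actions: Sequence[Sequence[float]],
    max_step: float = 0.2,           # largest allowed per-frame change in any dimension
    min_frames: int = 10,            # shorter episodes are rejected outright
    idle_eps: float = 1e-4,          # changes below this count as "not moving"
    max_idle_fraction: float = 0.5,  # share of idle frames tolerated
) -> List[str]:
    """Return a list of issue tags; an empty list means the episode passes."""
    if len(actions) < min_frames:
        return ["too_short"]

    # Largest absolute change across action dimensions for each frame transition.
    steps = [
        max(abs(c - p) for p, c in zip(prev, curr))
        for prev, curr in zip(actions, actions[1:])
    ]

    issues = []
    if any(step > max_step for step in steps):
        issues.append("jump")
    idle_fraction = sum(step < idle_eps for step in steps) / len(steps)
    if idle_fraction > max_idle_fraction:
        issues.append("mostly_idle")
    return issues
```

Episodes that come back with any tag would go to human QC review rather than being dropped silently, so a reviewer can decide whether the flag reflects a bad demonstration or a threshold that needs tuning for the task.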

Policy Smoke Test & Delivery

We run a mini training pass to validate that the dataset trains a policy. Then we deliver to HuggingFace or your preferred endpoint in your target format, with a full data card.

Get Started

Ready to fuel your model?

Tell us what you're building and we'll scope a pilot. Most pilots ship in 2 weeks. No contract minimums, no lock-in, no enterprise sales cycle.

Pilots start at 100 demonstrations

Response within 24 hours

LeRobot, RLDS, HDF5, or your format

Policy smoke test on every delivery

No minimum contract size
