Teleoperation & VLA Annotation

Training data for robotics teams that move fast.

We recruit operators, run teleoperation sessions on your hardware, and deliver policy-ready datasets in LeRobot, RLDS, or HDF5 — in weeks, not months.

$14B+
Raised by humanoid robotics companies in 2024–2025
CAPITAL RAISED
300+
Teleoperation demos per week, delivered to HuggingFace
WEEKLY CAPACITY
<5s
Average time between annotation labels within a demonstration
LABEL PRECISION
2wk
From brief to policy-ready dataset delivery
TURNAROUND
The Bottleneck

Your model is blocked on data, not architecture.

You have the robot. You have the model architecture. But collecting 500 quality demonstrations, annotating subtask segments, normalizing action vectors, and formatting for OpenVLA or pi0 — that's 3 months of engineering time you don't have.

"A dataset of one million noisy demonstrations is not an asset — it's a liability. Bad demos teach inconsistent strategies, and more data just compounds the noise."

Most teams end up stuck in a loop: collect more data → still bad policy → collect even more data. The problem isn't volume. It's quality, annotation, and format expertise.

Teleop is expensive in-house

$50K–$150K per station in hardware. We bring lean setups to your facility.

Generic annotators miss robot physics

They can't tell a failed grasp from a good one. We train specialists.

No one validates that the data trains

Most vendors ship files. We run a smoke test before delivery.

Scale AI minimums are out of reach

Enterprise contracts start at $500K+. We work with Seed–Series B.

Services

Everything from collection to delivery.

Three services, one vendor. We slot into your pipeline wherever you need us.

01 ————

Teleoperation Data Collection

We recruit, train, and manage teleoperation operators who collect demonstrations on your hardware — at your facility or remotely with portable Quest 3 + GELLO setups. Pick and place, assembly, bimanual tasks, deformable objects. You bring the robot. We bring the operators, the protocol, and the quality control.

50–2,000+ demos · On-site or remote · QC included
02 ————

VLA Action Annotation

Send us your raw recordings. We deliver fully annotated datasets: episode-level language instructions, subtask segmentation, success/failure labels, contact state tags, and trajectory quality scores.

OpenVLA · pi0 · Octo · ACT
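
To make the deliverables above concrete, here is a minimal sketch of what one annotated episode could look like as a record. The class names, field names, contact-state values, and the 0 to 1 quality scale are assumptions made for this illustration, not a published delivery schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SubtaskSegment:
    start_s: float       # segment start, seconds from episode start
    end_s: float         # segment end, seconds from episode start
    label: str           # subtask name, e.g. "reach" or "grasp"
    contact_state: str   # hypothetical tag set, e.g. "no_contact", "grasp"


@dataclass
class EpisodeAnnotation:
    episode_id: str
    instruction: str       # episode-level language instruction
    success: bool          # success/failure label
    quality_score: float   # assumed scale: 0.0 (worst) to 1.0 (best)
    segments: List[SubtaskSegment] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        if not 0.0 <= self.quality_score <= 1.0:
            problems.append("quality_score outside [0, 1]")
        prev_end = 0.0
        for i, seg in enumerate(self.segments):
            if seg.end_s <= seg.start_s:
                problems.append(f"segment {i} has non-positive duration")
            if seg.start_s < prev_end:
                problems.append(f"segment {i} overlaps the previous segment")
            prev_end = max(prev_end, seg.end_s)
        return problems


example = EpisodeAnnotation(
    episode_id="ep_0001",
    instruction="pick up the red block and place it in the bin",
    success=True,
    quality_score=0.92,
    segments=[
        SubtaskSegment(0.0, 2.4, "reach", "no_contact"),
        SubtaskSegment(2.4, 3.1, "grasp", "grasp"),
        SubtaskSegment(3.1, 6.0, "transport_and_place", "grasp"),
    ],
)

# One JSON object per line is a convenient shape for a JSONL export.
line = json.dumps(asdict(example))
```

A record like this would still need to be mapped onto the target format (LeRobot, RLDS, HDF5) before training. The `validate` method only catches structural mistakes such as overlapping segments.
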
03 ————

Policy Smoke Test

Before we ship, we run a mini training pass on 10% of your dataset. If the policy doesn't converge, we fix the data before you ever see it. No other vendor does this.

Included on every delivery
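
For readers who want a feel for the idea, the two bookkeeping steps around a smoke test can be sketched in a few lines of Python: picking a reproducible 10% subset of episodes, and deciding whether a loss curve has dropped enough to count as "converging". The function names, the window size, and the 20% relative-drop threshold are assumptions chosen for illustration. The training pass itself is left out, and this is not a description of our production pipeline.

```python
import random
from typing import List, Sequence


def sample_subset(episode_ids: Sequence[str], fraction: float = 0.10,
                  seed: int = 0) -> List[str]:
    """Pick a reproducible random subset of episodes (at least one)."""
    if not episode_ids:
        return []
    k = max(1, round(len(episode_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the same subset can be re-run
    return sorted(rng.sample(list(episode_ids), k))


def loss_converged(losses: Sequence[float], window: int = 5,
                   min_drop: float = 0.2) -> bool:
    """Heuristic check: the mean loss over the last `window` steps must be
    at least `min_drop` (relative) below the mean over the first `window`."""
    if len(losses) < 2 * window:
        return False  # not enough steps to compare start and end
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    if start <= 0:
        return False
    return (start - end) / start >= min_drop


# Usage: the loss values would come from a short training run on the subset.
episode_ids = [f"ep_{i:04d}" for i in range(300)]
subset = sample_subset(episode_ids)
falling = [1.0 / (1 + step) for step in range(20)]
flat = [1.0] * 20
```

A falling curve such as `falling` passes the check, while `flat` does not. A real check would also look at held-out loss or rollout success, which this sketch ignores.
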
Why Train Them AI

Built by people who've done this.

Built from the inside
Our founder built the automated action labeling platform powering humanoid robotics data at a leading physical AI company — at $100M ARR. We didn't learn this from the outside.
Your hardware, our operators
No hardware minimums. We operate on your existing setup — ALOHA, UR arms, Franka, or custom rigs. Portable Quest 3 setups available for on-site deployments across LatAm.
Format-native delivery
LeRobot v2/v3, RLDS, HDF5, robomimic — delivered in the exact format your training loop expects. Zero conversion work on your end. We know the difference between a good RLDS record and a broken one.
Mid-market focus
No $500K minimums. No 6-month enterprise sales cycles. Pilots start at 100 demos. We work with Seed–Series B robotics companies that can't get Scale AI's attention — but need the same quality.
2-week pilot turnaround
300-demo annotated pilot in your hands in 2 weeks. Large vendors take 4–6 weeks for comparable scope. We move at startup speed because we are one.
LatAm environmental diversity
Operators across Latin America provide real-world environmental diversity — homes, kitchens, warehouses, offices — that a single-location lab can never produce on its own.
Building in Public

What we're building.

Live projects, active research, and infrastructure in progress.

● Live
Apr 2026

Market Research — Robotics Data Ops

Deep analysis of the humanoid robotics market, where companies raised $14B+ in 2024–2025. Identified the VLA data ops gap for Seed–Series B companies underserved by current vendors. Mapped 50+ YC targets.

Research · Competitive Analysis · 50+ Companies
● Live
Apr 2026

trainthemai.com Relaunch

Full repositioning from generic POV video collection to VLA data ops boutique for mid-market robotics. New services, new positioning, new design.

Website · Positioning
◐ In Progress
Coming Soon

Humanoid Manipulation Dataset

Egocentric + multi-camera demos on Tiangong humanoid hardware. Action segmentation, language labels, success flags. Publishing to HuggingFace in LeRobot v2.

HuggingFace · LeRobot v2 · Tiangong
◐ In Progress
Q2 2026

Automated Action Labeling Platform

Proprietary platform for fine-grained temporal annotation, multi-stream action segmentation, and automated QC scoring — built for humanoid robotics data ops.

Internal Tool · VLA Annotation · QC Automation
○ Planned
Q2 2026

LatAm Teleop Operator Network

Distributed network of trained teleoperation operators across Latin America. Quest 3 + GELLO setups. Target: 20 operators, 500+ demos/week capacity.

Operations · Quest 3 · LatAm
○ Planned
Q2 2026

Policy Smoke Test Pipeline

Automated pipeline running a mini ACT or Diffusion Policy training pass on 10% of every delivered dataset. First vendor in the space to offer training-ready validation.

ACT · Diffusion Policy · QA
Delivered in
LeRobot v2/v3
RLDS / RT-X
HDF5
robomimic
JSONL / CSV
Who We Work With

Built for Seed–Series B robotics teams that move too fast to build an internal data ops team.

If you raised $2M–$50M, have robots in the lab, and are hitting the data collection bottleneck — we're the team you'd hire, without the hiring overhead.

Process

From brief to policy-ready data.

Share Your Brief

Task description, robot specs, target format, volume, and timeline. 30-minute scoping call optional. We'll tell you exactly what to expect before we start.

We Script & Deploy Operators

We write the collection protocol, define success criteria, train operators on your specific task, and configure the data pipeline. You review the protocol before collection begins.

Collect, QA & Annotate

Real-time operator rejection of bad demos, automated trajectory quality checks, human QC review, and full VLA annotation — action segmentation, language labels, success flags.
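
As an illustration of what an automated trajectory quality check can look for, here is a minimal Python sketch that flags episodes that are too short, contain sudden jumps between frames, or sit idle for most of the recording. The function name, the thresholds, and the choice of checks are assumptions made for this example. Real checks depend on the robot, the control rate, and the task.

```python
from typing import List, Sequence


def check_trajectory(
    timestamps: Sequence[float],
    positions: Sequence[Sequence[float]],  # one joint/pose vector per frame
    min_duration_s: float = 1.0,     # assumed minimum useful episode length
    max_step: float = 0.05,          # assumed max per-frame change per dimension
    idle_eps: float = 1e-4,          # change below this counts as "not moving"
    max_idle_fraction: float = 0.5,  # assumed tolerated share of idle frames
) -> List[str]:
    """Return a list of issues found in one episode (empty if it looks fine)."""
    if len(timestamps) != len(positions) or len(timestamps) < 2:
        return ["too few samples or mismatched lengths"]

    issues = []
    if any(t1 <= t0 for t0, t1 in zip(timestamps, timestamps[1:])):
        issues.append("timestamps not strictly increasing")
    if timestamps[-1] - timestamps[0] < min_duration_s:
        issues.append("episode shorter than min_duration_s")

    # Largest absolute change in any dimension between consecutive frames.
    steps = [
        max(abs(b - a) for a, b in zip(p0, p1))
        for p0, p1 in zip(positions, positions[1:])
    ]
    if max(steps) > max_step:
        issues.append("jump larger than max_step between consecutive frames")

    idle = sum(1 for s in steps if s < idle_eps)
    if idle / len(steps) > max_idle_fraction:
        issues.append("robot idle for most of the episode")
    return issues


# Usage: a smooth 2.9-second episode sampled at 10 Hz.
ts = [i * 0.1 for i in range(30)]
smooth = [[0.01 * i, 0.0] for i in range(30)]
report = check_trajectory(ts, smooth)
```

Checks like these only catch mechanical problems. Whether a demonstration shows a good strategy still takes the human QC review described above.
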

Policy Smoke Test & Delivery

We run a mini training pass to validate the dataset trains a policy. Then deliver to HuggingFace or your preferred endpoint in your target format, with a full data card.

Get Started

Ready to fuel your model?

Tell us what you're building and we'll scope a pilot. Most pilots ship in 2 weeks. No contract minimums, no lock-in, no enterprise sales cycle.

Pilots start at 100 demonstrations
Response within 24 hours
LeRobot, RLDS, HDF5, or your format
Policy smoke test on every delivery
No minimum contract size