Introducing Robots ID
- Humandroid

- Feb 6
Today we’re introducing Robots ID, an end-to-end operating system for humanoid robots designed to capture, structure, and scale real-world robot intelligence. Robots ID enables humanoid robots to be trained directly from human behavior through teleoperation, motion capture, and multimodal sensing, turning physical experience into reusable intelligence.

Robots ID is built as a data-first system. It connects humans, robots, and AI models in a continuous learning loop, allowing robots to observe, imitate, and eventually generalize skills across tasks, environments, and embodiments.
Our initial focus is enabling high-fidelity data capture and training for industrial and service humanoid robots, accelerating the path toward autonomous physical labor.
Human-in-the-Loop by Design
Robots ID places humans at the center of robot training. Using VR, teleoperation, and direct control interfaces, humans perform tasks while robots record full-stack multimodal data: vision, proprioception, actions, and intent.
This data becomes the foundation for training vision-language-action (VLA) and control models that progressively reduce the need for human intervention.
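As an illustration, a captured demonstration can be modeled as a stream of timestamped multimodal frames. The sketch below is a minimal Python rendering of that idea; the `DemoFrame` and `DemoSession` names and fields are our assumptions, not the actual Robots ID schema.

```python
from dataclasses import dataclass, field

@dataclass
class DemoFrame:
    """One timestep of a recorded human demonstration."""
    timestamp: float                # seconds since session start
    rgb: bytes                      # encoded camera frame (e.g. JPEG)
    proprioception: list[float]     # joint positions and velocities
    action: list[float]             # operator command sent to the robot
    intent: str | None = None       # optional high-level label ("pick", "place", ...)

@dataclass
class DemoSession:
    """A full teleoperation session: the unit handed to training pipelines."""
    robot_id: str
    task: str
    frames: list[DemoFrame] = field(default_factory=list)

    def record(self, frame: DemoFrame) -> None:
        self.frames.append(frame)
```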

Haptic Gloves & Dexterous Manipulation
For tasks requiring fine motor control (assembly, inspection, tool use), Robots ID integrates with haptic glove systems like Manus, enabling finger-level teleoperation with force feedback. Operators feel what the robot touches, dramatically improving demonstration quality for manipulation tasks.
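In code, finger-level teleoperation with force feedback reduces to a bidirectional loop: glove pose flows to the robot, and measured contact forces flow back to the glove. The sketch below uses hypothetical `Glove` and `RobotHand` interfaces as stand-ins; the real Manus SDK and robot hand drivers expose different APIs.

```python
import time

class Glove:
    """Hypothetical haptic glove interface (stand-in for a real SDK)."""
    def read_finger_joints(self) -> list[float]:
        return [0.0] * 20                       # finger joint angles, radians

    def set_force_feedback(self, forces: list[float]) -> None:
        pass                                    # drive per-finger haptic actuators

class RobotHand:
    """Hypothetical dexterous hand driver."""
    def command_joints(self, q: list[float]) -> None:
        pass                                    # send joint targets to the hand

    def read_fingertip_forces(self) -> list[float]:
        return [0.0] * 5                        # measured contact forces, newtons

def teleop_loop(glove: Glove, hand: RobotHand,
                hz: float = 100.0, seconds: float = 5.0) -> None:
    """Mirror glove pose onto the robot hand and reflect contact forces back."""
    t_end = time.monotonic() + seconds
    while time.monotonic() < t_end:
        hand.command_joints(glove.read_finger_joints())         # human -> robot
        glove.set_force_feedback(hand.read_fingertip_forces())  # robot -> human
        time.sleep(1.0 / hz)
```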

Cross-Embodiment
A single operating system supports multiple humanoid platforms and morphologies. Skills learned on one robot can be transferred, adapted, and reused across different embodiments.
RobotsOS ID1 abstracts hardware differences while preserving embodiment-specific signals, enabling scalable fleet learning.
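One common way to realize this separation is an adapter per embodiment: a shared, embodiment-neutral action interface on one side and robot-specific joint commands on the other. The `EmbodimentAdapter` sketch below is our illustration, not the RobotsOS ID1 API.

```python
from abc import ABC, abstractmethod

class EmbodimentAdapter(ABC):
    """Maps embodiment-neutral actions to robot-specific commands and back,
    while leaving embodiment-specific signals available to the learning stack."""

    @abstractmethod
    def to_robot(self, shared_action: dict) -> list[float]:
        """Translate a shared action into this robot's joint command vector."""

    @abstractmethod
    def from_robot(self, joint_state: list[float]) -> dict:
        """Lift raw joint state into the shared observation space."""

class HumanoidA(EmbodimentAdapter):
    N_JOINTS = 32                               # this platform's DoF count

    def to_robot(self, shared_action: dict) -> list[float]:
        # Retargeting (e.g. IK from end-effector targets) would go here.
        return [0.0] * self.N_JOINTS

    def from_robot(self, joint_state: list[float]) -> dict:
        return {"arm_state": joint_state[:6]}   # simplified projection

# A skill expressed against the shared interface runs on any adapter.
cmd = HumanoidA().to_robot({"ee_pose_target": [0.4, 0.1, 0.9, 0.0, 0.0, 0.0]})
```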

Cross-Timescale Intelligence
Robots ID operates across multiple timescales, from high-level task intent down to millisecond-level joint execution, ensuring alignment between human demonstrations, learned policies, and real-time control.
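The nesting of timescales can be pictured as three loops, each feeding the one below it. The 1 Hz / 50 Hz / 1 kHz rates and function names in this sketch are illustrative assumptions, not the system's actual frequencies.

```python
def plan_step(task: str) -> dict:
    """Slow loop (~1 Hz): task-level intent and subgoal selection (placeholder)."""
    return {"subgoal": "reach_shelf"}

def policy_step(subgoal: dict) -> list[float]:
    """Mid loop (~50 Hz): learned policy emits joint targets (placeholder)."""
    return [0.0] * 32

def control_step(targets: list[float]) -> None:
    """Fast loop (~1 kHz): low-level joint execution (placeholder)."""

def run(task: str, plan_ticks: int = 3) -> None:
    """Nest the three timescales: plan > policy > control."""
    for _ in range(plan_ticks):
        subgoal = plan_step(task)           # high-level intent
        for _ in range(50):                 # 50 policy ticks per plan tick
            targets = policy_step(subgoal)
            for _ in range(20):             # 20 control ticks per policy tick
                control_step(targets)

run("restock_shelf")
```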
Human & Training Orchestration Layer
An agentic coordination layer manages humans, robots, and training objectives across the fleet, keeping track of:
Training goals and task definitions
Human teleoperation sessions
Robot availability and embodiment context
Data quality and coverage requirements
It assigns robots to training sessions, schedules demonstrations, coordinates multiple operators, and tracks dataset completeness and performance across the fleet.
It integrates with enterprise systems, simulation platforms, and AI training pipelines to ensure that captured data flows seamlessly into model development.
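A toy version of the assignment step might look like the following: available robots are matched to the training goals with the largest remaining coverage gap. The greedy strategy and the `TrainingGoal`/`Robot` fields are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class TrainingGoal:
    task: str
    embodiment: str
    demos_needed: int       # coverage requirement for this task
    demos_done: int = 0

@dataclass
class Robot:
    robot_id: str
    embodiment: str
    available: bool = True

def assign_sessions(goals: list[TrainingGoal],
                    robots: list[Robot]) -> list[tuple[str, str]]:
    """Greedy pass: send available robots to the goals with the biggest gap."""
    assignments: list[tuple[str, str]] = []
    open_goals = sorted(
        (g for g in goals if g.demos_done < g.demos_needed),
        key=lambda g: g.demos_needed - g.demos_done,
        reverse=True,
    )
    for goal in open_goals:
        for robot in robots:
            if robot.available and robot.embodiment == goal.embodiment:
                assignments.append((robot.robot_id, goal.task))
                robot.available = False
                break
    return assignments
```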
Robot-Level Cognitive Layer
A robot-level reasoning layer interprets human intent and task structure during execution. Operating at second to sub-minute timescales, it bridges human demonstrations and machine understanding. This layer:
Observes human actions through multimodal input
Segments demonstrations into meaningful sub-tasks
Annotates intent, context, and failure modes
Dynamically adapts task execution based on environment changes
Captured demonstrations can be converted into structured workflows (SOPs) and reused for retraining, simulation, or autonomous execution.
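To make the SOP idea concrete, here is a minimal sketch of turning annotated sub-task segments into a structured workflow; the `SubTask` fields and the output format are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    start: float            # seconds into the demonstration
    end: float
    outcome: str            # e.g. "success" or an annotated failure mode

def to_sop(task: str, segments: list[SubTask]) -> dict:
    """Convert an annotated demonstration into a reusable, ordered SOP."""
    return {
        "task": task,
        "steps": [
            {"order": i, "name": s.name,
             "duration_s": round(s.end - s.start, 2), "outcome": s.outcome}
            for i, s in enumerate(segments, start=1)
        ],
    }

sop = to_sop("shelf_restock", [
    SubTask("locate_item", 0.0, 3.1, "success"),
    SubTask("grasp_item", 3.1, 6.8, "success"),
    SubTask("place_on_shelf", 6.8, 11.4, "failure: slipped_grasp"),
])
```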

Multi-VLA Intelligence Layer
Robots ID is model-agnostic by design.
Instead of relying on a single Vision-Language-Action model, it composes multiple VLA and embodied foundation models based on task complexity, embodiment constraints, and deployment context.
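One simple way to express such composition is a routing layer over a model registry: each model declares the contexts it handles, and the router picks the first match. The registry entries and `select_model` logic below are our illustration, not the actual selection policy.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VLAModel:
    name: str
    handles: Callable[[dict], bool]     # predicate over the task context

REGISTRY = [
    VLAModel("dexterous-manipulation-vla",
             lambda ctx: ctx.get("needs_fine_manipulation", False)),
    VLAModel("long-horizon-planner-vla",
             lambda ctx: ctx.get("horizon_steps", 0) > 20),
    VLAModel("general-purpose-vla", lambda ctx: True),   # fallback, keep last
]

def select_model(task_context: dict) -> str:
    """Return the first registered model whose predicate accepts the context."""
    for model in REGISTRY:
        if model.handles(task_context):
            return model.name
    raise LookupError("no model matches the task context")

print(select_model({"needs_fine_manipulation": True}))
# -> dexterous-manipulation-vla
```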
UBTECH Thinker
Thinker is a state-of-the-art open-source vision–language foundation model designed for embodied intelligence. Unlike conventional VLMs, it addresses perspective, temporal, and task-level limitations by combining high-quality data curation, multi-stage training, and reinforcement learning.
Thinker excels in task planning, spatial and temporal understanding, and visual grounding, achieving state-of-the-art results across multiple embodied AI benchmarks and demonstrating strong potential as a foundation for autonomous robotic decision-making.
NVIDIA Cosmos + GR00T N1.6
Cosmos and GR00T together form the physical-reasoning, simulation-aligned VLA stack: Cosmos provides world modeling and data generation, while GR00T generates executable VLA policies trained and validated in Isaac Sim before real-world deployment.
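In spirit, the sim-before-hardware flow is a promotion gate on simulated success. The stub below is purely illustrative: it does not use the real Cosmos, GR00T, or Isaac Sim APIs, and the threshold is an invented example.

```python
def evaluate_in_sim(policy: str) -> float:
    """Stand-in for rolling a candidate policy out over many simulated episodes."""
    return 0.91                         # placeholder success rate

def deploy_gate(policy: str, min_success: float = 0.90) -> bool:
    """Promote a policy to hardware only once it clears the sim success bar."""
    success = evaluate_in_sim(policy)
    print(f"{policy}: sim success rate {success:.2f}")
    return success >= min_success

if deploy_gate("candidate-policy-v3"):
    print("promote to real-world pilot")
else:
    print("return to training")
```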
Unitree UnifoLM-VLA-0
UnifoLM-VLA-0 is a Vision–Language–Action (VLA) model from the UnifoLM family, designed as a general-purpose manipulation brain for humanoid robots such as those from Unitree.
Unlike traditional VLMs, it is continuously trained on robot manipulation data, directly linking perception, understanding, and action.
AgiBot GO-1
AgiBot World Colosseo is a full-stack, large-scale robot learning platform developed by AgiBot to advance bimanual manipulation in intelligent embodied systems.
It combines large-scale, high-quality robot interaction datasets with foundation models, standardized benchmarks, and tooling that enable scalable training, evaluation, and deployment of manipulation skills.
By providing an open ecosystem for both academia and industry, AgiBot World Colosseo aims to democratize access to robot data and accelerate progress toward the “ImageNet moment” for Embodied AI.
From Demonstration to Autonomy
Robots ID is not just a control system; it is a learning engine.
By continuously capturing real-world demonstrations, structuring them into reusable intelligence, and feeding them into training pipelines, RobotsOS ID1 accelerates the transition from:
Human-controlled robots
→ supervised autonomy
→ generalized physical intelligence

A Marketplace for Physical Intelligence
All knowledge captured and structured by Robots ID does not remain siloed. It becomes part of a Humanoid Robot Skills Marketplace, where trained capabilities can be shared, reused, and scaled across different robots, organizations, and use cases.

Each human demonstration, workflow (SOP), and trained VLA skill is converted into a versioned digital asset, enriched with performance metrics, usage context, and embodiment compatibility. These skills can be:
Shared across robots within the same fleet
Transferred across different customers or industries
Retrained or adapted to new environments
Integrated directly into autonomy and deployment pipelines
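As a concrete picture, a versioned skill asset might carry metadata like the following; the `SkillAsset` schema and the example values are hypothetical, not the marketplace's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class SkillAsset:
    """A versioned marketplace entry for one trained capability."""
    skill_id: str
    version: str                            # e.g. "2.1.0"
    source: str                             # "demonstration" | "sop" | "vla_policy"
    embodiments: list[str]                  # compatible robot platforms
    metrics: dict = field(default_factory=dict)   # success rate, cycle time, ...
    usage_context: str = ""                 # where the skill was validated

asset = SkillAsset(
    skill_id="pallet_unloading",
    version="2.1.0",
    source="vla_policy",
    embodiments=["humanoid-a", "humanoid-b"],
    metrics={"success_rate": 0.93, "avg_cycle_s": 41.5},
    usage_context="warehouse, mixed-SKU pallets",
)
```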
The platform allows a new class of contributors to participate:
Universities
System integrators
Robotics labs and technology centers
These contributors can specialize in specific tasks, domains, or embodiments, accelerating learning cycles and expanding the overall skill library.
By opening the ecosystem to external trainers and integrators, physical learning shifts from an isolated effort into a network effect: as more robots are trained by more contributors, the available skill set grows in breadth, quality, and robustness for everyone.
RobotsOS ID1 is not only how humanoid robots learn; it is the infrastructure through which physical intelligence is collaboratively built, distributed, and scaled.
Training the Workforce of 2030
By unifying human behavior, robot execution, and AI training under a single operating system, RobotsOS ID1 lays the foundation for a new kind of workforce: a scalable, autonomous physical workforce for 2030 and beyond.
Our roadmap is explicit: to progressively advance humanoid robots toward Level 5 physical autonomy by 2030. Reaching this level requires learning directly from real human labor, performed in real environments, at scale.

The cross-timescale architecture of Robots ID allows each layer to evolve independently while the system as a whole compounds in capability. Every demonstration, assisted task, and supervised autonomous run becomes training data for the next generation of robots.
We are building RobotsOS ID1 to train the workforce of the future. What begins today as human-in-the-loop learning is the starting point for a fully autonomous physical workforce.
Robots ID is how the workforce of 2030 learns, starting today.


