Human Archive startup pays Indian gig workers to train the world's robots

Human Archive, founded by Berkeley and Stanford researchers, is recruiting gig workers in India to collect physical training data for AI and robotics labs. Workers wear camera-equipped caps and sensor devices to generate real-world movement data. The startup is tapping into India's large gig economy to meet surging global demand for robot training datasets.

A Silicon Valley startup called Human Archive, co-founded by researchers from UC Berkeley and Stanford University, is turning to India's vast gig economy to solve one of the biggest bottlenecks in modern robotics: the shortage of real-world physical training data.

## Gig Workers as Data Collectors

Workers recruited through the programme are paid to wear specially designed camera-equipped caps and sensor-laden devices as they go about everyday tasks. The footage and motion data they generate are then packaged and sold to AI and robotics laboratories that are racing to train the next generation of physical robots.

The approach reflects a broader scramble across the tech industry to gather diverse, high-quality embodied data: the kind that teaches robots how humans move, pick up objects, open doors, and navigate crowded spaces. Unlike text or image data, physical movement data remains scarce, and it is expensive to produce in controlled laboratory settings.

## India as a Data Powerhouse

By tapping India's large and growing gig workforce, Human Archive aims to collect data at a scale and cost that would be difficult to replicate in the United States or Europe. India has tens of millions of gig economy participants, making it an attractive source of demographically diverse human movement footage.

The startup's model holds promise but also raises questions: while it offers supplementary income to workers in a competitive labour market, researchers and ethicists have begun scrutinising how informed consent and data ownership are handled when individual bodies become training inputs for commercial AI systems.
