Researchers from Stanford Present Mobile ALOHA: A Low-Cost and Whole-Body Teleoperation System for Data Collection

Since it enables humans to teach robots any skill, imitation learning via human-provided demonstrations is a promising approach for creating generalist robots. Lane-following in mobile robots, basic pick-and-place manipulation, and more delicate manipulations like spreading pizza sauce or inserting a battery may all be taught to robots through direct behavior cloning. However, rather than merely requiring individual mobility or manipulation behaviors, many activities in realistic, everyday situations need whole-body coordination of mobility and dexterous manipulation.

Research from Stanford University investigates whether imitation learning can be used for tasks in which bimanual mobile robots must be controlled with their whole body. Two key issues hamper the widespread use of imitation learning for bimanual mobile manipulation. (1) Plug-and-play, readily available hardware for whole-body teleoperation is still lacking. (2) Off-the-shelf bimanual mobile manipulators are expensive: robots such as the PR2 and the TIAGo cost over USD 200k, which puts them out of reach for typical research labs. Additional hardware and calibration are also required to enable teleoperation on these platforms.

This study addresses the difficulties in implementing imitation learning for bimanual mobile manipulation. On the hardware side, the researchers introduce Mobile ALOHA, an inexpensive whole-body teleoperation system designed to gather data on bimanual mobile manipulation. Mobile ALOHA extends the original ALOHA, an inexpensive and dexterous bimanual puppeteering setup, by mounting it on a wheeled base.

To permit base movement, the user is physically tethered to the system and backdrives the wheels. This allows the base to be moved independently while the user controls ALOHA with both hands. Recording the arm puppeteering and base velocity data simultaneously turns the setup into a whole-body teleoperation system.

The team notes that excellent performance in imitation learning can be obtained by simply concatenating the base and arm actions and then training by direct imitation learning. In particular, they create a 16-dimensional action vector by joining the mobile base’s linear and angular velocity with the 14-DoF joint positions of ALOHA. With nearly no implementation change, this formulation enables Mobile ALOHA to benefit directly from earlier deep imitation learning methods.
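The concatenation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the ordering (arm joints first, then base velocities), and the example values are assumptions made here for clarity.

```python
from typing import List, Sequence, Tuple

ARM_DOF = 14   # joint positions of the two ALOHA arms, as stated in the article
BASE_DOF = 2   # linear and angular velocity of the mobile base


def make_action(arm_joint_positions: Sequence[float],
                base_linear_velocity: float,
                base_angular_velocity: float) -> List[float]:
    """Concatenate arm joint targets and base velocities into one action.

    The ordering (arms first, base last) is an assumption of this sketch.
    """
    if len(arm_joint_positions) != ARM_DOF:
        raise ValueError(
            f"expected {ARM_DOF} joint positions, got {len(arm_joint_positions)}"
        )
    return list(arm_joint_positions) + [base_linear_velocity, base_angular_velocity]


def split_action(action: Sequence[float]) -> Tuple[List[float], float, float]:
    """Inverse of make_action: recover arm targets and base velocities."""
    if len(action) != ARM_DOF + BASE_DOF:
        raise ValueError(
            f"expected {ARM_DOF + BASE_DOF} values, got {len(action)}"
        )
    return list(action[:ARM_DOF]), action[ARM_DOF], action[ARM_DOF + 1]


# Hypothetical example values: arms at rest, base moving forward while turning.
example = make_action([0.0] * ARM_DOF, 0.25, -0.1)
print(len(example))
```

Because the policy only ever sees a single flat vector, an imitation learning method written for the 14-dimensional static setup would mainly need its output size changed to 16, which is consistent with the "nearly no implementation change" claim.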

The researchers point out that there are few to no accessible bimanual mobile manipulation datasets. Motivated by the recent success of pre-training and co-training on varied robot datasets, they looked for a way to improve imitation learning performance further and turned to static bimanual datasets, which are easier to obtain and more plentiful. In particular, they use the static ALOHA datasets made available through the RT-X release. This data comprises 825 episodes of tasks unrelated to the Mobile ALOHA tasks, collected on a setup in which the two arms are mounted separately.
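One simple way to picture co-training is a batch sampler that mixes the two data sources. The sketch below is an illustration under stated assumptions, not the method from the paper: the function name, the `static_fraction` parameter, and its default of 0.5 are hypothetical, and the article does not specify how the two datasets are mixed.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_cotraining_batch(mobile_episodes: Sequence[T],
                            static_episodes: Sequence[T],
                            batch_size: int,
                            static_fraction: float = 0.5,
                            rng: Optional[random.Random] = None) -> List[T]:
    """Draw a training batch that mixes mobile and static episodes.

    Each element comes from the static dataset with probability
    `static_fraction` (an assumed value), otherwise from the mobile dataset.
    """
    if rng is None:
        rng = random.Random()
    batch = []
    for _ in range(batch_size):
        use_static = rng.random() < static_fraction
        source = static_episodes if use_static else mobile_episodes
        batch.append(rng.choice(source))
    return batch


# Placeholder episode identifiers; the counts follow the article
# (50 demonstrations for a Mobile ALOHA task, 825 static ALOHA episodes).
mobile = [("mobile", i) for i in range(50)]
static = [("static", i) for i in range(825)]
batch = sample_cotraining_batch(mobile, static, batch_size=8,
                                rng=random.Random(0))
print(len(batch))
```

Sampling by source rather than by episode keeps the small mobile dataset from being swamped by the much larger static one. A real pipeline would also have to give the static episodes some value for the two base-velocity dimensions, a detail the article does not cover.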

Despite the differences in tasks and morphology, the study demonstrates positive transfer in almost all mobile manipulation tasks, achieving comparable or better performance and data efficiency than policies trained using only Mobile ALOHA data. This observation also holds across different classes of state-of-the-art imitation learning methods, such as Diffusion Policy and ACT.

The result also holds across a range of complex tasks, including pushing in chairs, calling an elevator, opening a two-door wall cabinet to store heavy cooking pots, and cleaning up spilled wine. With just 50 human demonstrations per task, co-training allows the researchers to reach success rates of over 80%, an average absolute improvement of 34% compared to training without co-training.
