DeepMind Research Develops AutoRT: Transforming Robotic Learning Through AI-Driven Task Execution in Real-World Environments

Google DeepMind has introduced AutoRT, a system that uses existing foundation models to deploy operational robots in unseen scenarios with minimal human supervision. It targets a central obstacle in training embodied foundation models for robots: the shortage of data grounded in the physical world. AutoRT uses vision-language models for scene understanding and grounding, and large language models to generate diverse and novel instructions for a fleet of robots. The goal is to enable large-scale, “in-the-wild” data collection, allowing robots to adapt to new environments and tasks autonomously.

Current methods in autonomous robotics focus on acquiring individual robotic skills, while large language models (LLMs) and vision-language models (VLMs) provide the ability to reason over abstract tasks. The researchers state that truly open-ended tasks in diverse settings remain difficult because extensive real-world robotic experience is lacking. AutoRT addresses this by orchestrating a fleet of robots with a large foundation model. This model guides the robots to perform tasks based on user prompts, scene understanding from VLMs, and task proposals from LLMs, all while adhering to a robot constitution that specifies rules and safety constraints.

AutoRT’s approach comprises several key components. The system begins with exploration, where robots navigate and map the environment using a natural language map approach. The robot constitution, inspired by Asimov’s laws, sets foundational, safety, and embodiment rules, providing a framework for safe and effective task generation. Task generation combines scene description by VLMs with task proposal by LLMs, using prompts specific to each robot’s collection policy. Affordance filtering applies the constitutional rules to check that each generated task is feasible and safe. AutoRT employs diverse collection policies, including teleoperation, scripted pick policies, and autonomous policies, to maximize data diversity. Guardrails, in the form of traditional robot environment controls, add a further layer of safety in real-world settings.
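To make the flow of these components concrete, here is a minimal Python sketch of the describe, propose, filter, and assign steps. It is an illustration under stated assumptions, not AutoRT's implementation: every name in it (CONSTITUTION, Task, describe_scene, propose_tasks, violates_constitution, choose_policy, generate_tasks) is hypothetical, the VLM and LLM calls are replaced by hard-coded stubs, and the constitution check is reduced to simple keyword matching as a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stand-in for the robot constitution. The keywords below are
# invented for this sketch and are not taken from the AutoRT paper.
CONSTITUTION = {
    "safety": ("human", "animal", "knife", "sharp"),
    "embodiment": ("heavy", "two-handed"),
}


@dataclass(frozen=True)
class Task:
    instruction: str
    policy: str  # "teleop" or "scripted_pick" in this sketch


def describe_scene(image_id: str) -> list[str]:
    """Stub for a VLM call: returns object names for a known scene id."""
    fake_scenes = {"desk_01": ["sponge", "chip bag", "knife"]}
    return fake_scenes.get(image_id, [])


def propose_tasks(objects: list[str]) -> list[str]:
    """Stub for an LLM call: proposes one pick instruction per object."""
    return [f"pick up the {obj}" for obj in objects]


def violates_constitution(instruction: str) -> bool:
    """Placeholder affordance filter: reject on any forbidden keyword."""
    text = instruction.lower()
    return any(
        keyword in text
        for keywords in CONSTITUTION.values()
        for keyword in keywords
    )


def choose_policy(instruction: str) -> str:
    """Assumed routing rule: simple picks are scripted, the rest teleoperated."""
    return "scripted_pick" if instruction.startswith("pick up") else "teleop"


def generate_tasks(image_id: str) -> list[Task]:
    """Describe the scene, propose tasks, filter them, and assign a policy."""
    objects = describe_scene(image_id)
    proposals = propose_tasks(objects)
    return [
        Task(instruction=p, policy=choose_policy(p))
        for p in proposals
        if not violates_constitution(p)
    ]


if __name__ == "__main__":
    for task in generate_tasks("desk_01"):
        print(task)
```

In this toy scene the proposal involving the knife is dropped by the filter, and the two remaining tasks are routed to the scripted pick policy. A real system would replace the stubs with model calls and would still rely on the physical guardrails described above.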

In conclusion, AutoRT presents a pioneering system for large-scale robotic data collection in real-world scenarios. By leveraging foundation models and incorporating a robot constitution, AutoRT enables the autonomous deployment of robots in diverse environments, with the ability to propose and execute tasks aligned with human preferences. The system’s effectiveness is demonstrated through extensive real-world evaluations, showcasing its capability to collect diverse and valuable data. AutoRT marks a significant step towards addressing the challenges of scaling robotic learning and autonomy in dynamic, unseen environments.

Check out the Paper, Project, and Blog. All credit for this research goes to the researchers of this project.