The world’s most capable Robotics Foundation Model provides robots a deep understanding of language and the physical world
Today Covariant (https://covariant.ai), the world’s leading AI Robotics company, released RFM-1: a Robotics Foundation Model that provides robots the human-like ability to reason, representing the first time Generative AI has successfully given commercial robots a deeper understanding of language and the physical world.
The key challenge with traditional robotic automation, which relies on manual programming or specialized learned models, is a lack of reliability and flexibility in real-world scenarios. To create value at scale, robots must be able to handle an unlimited variety of items and scenarios autonomously.
Starting with warehouse pick-and-place operations, RFM-1 showcases the power of Robotics Foundation Models. In warehouse environments, Covariant's approach of combining the largest real-world robot production dataset with a massive collection of Internet data is unlocking new levels of robotic productivity and showing a path to broader applications ranging from hospitals and homes to factories, stores, restaurants, and more.
“Robotics Foundation Models require access to a vast amount of high-quality multimodal data. These models require data that reflects the wide range of information a robot needs to make decisions, including text, images, video, physical measurements, and robot actions,” said Peter Chen, Chief Executive Officer and Co-Founder, Covariant. “Unlike AI for the digital world, there is no internet to scrape for large-scale robot interaction data with the physical world. So we built a highly scalable data collection system which has collected tens of millions of trajectories by deploying a large fleet of warehouse automation robots to dozens of customers around the world.”
Since 2017, Covariant’s previous AI models have enabled robots to operate in a commercially meaningful way across a diverse set of warehouse operations and industries. These robots have been able to adapt to their embodiment, understand the scenes they are faced with, reliably handle items they have never seen before, and achieve human-level speed and reliability.
The introduction of RFM-1 sets a new frontier for what's possible. Built as a multimodal any-to-any sequence model, RFM-1 is an 8-billion-parameter model trained on text, images, video, robot actions, and physical measurements to perform autoregressive next-token prediction. Because it tokenizes all modalities into a common space, this training enables RFM-1 to accept any modality as input and predict any modality as output.
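To make the "any-to-any" idea concrete, the following is a minimal sketch of how multiple modalities can share one token vocabulary so that a single autoregressive next-token model can map any input modality to any output modality. The token ranges, vocabulary sizes, and toy predictor below are illustrative assumptions, not Covariant's actual implementation.

```python
# Hypothetical per-modality id ranges within one shared token vocabulary.
MODALITY_RANGES = {
    "text":   (0,      10_000),  # e.g. subword tokens
    "image":  (10_000, 20_000),  # e.g. VQ codebook indices for video frames
    "action": (20_000, 21_000),  # e.g. discretized robot commands
    "sensor": (21_000, 22_000),  # e.g. discretized physical measurements
}

def tokenize(modality: str, raw_ids: list[int]) -> list[int]:
    """Map modality-local ids into the shared token space via an offset."""
    lo, hi = MODALITY_RANGES[modality]
    assert all(0 <= i < hi - lo for i in raw_ids), "id out of modality range"
    return [lo + i for i in raw_ids]

def modality_of(token: int) -> str:
    """Recover which modality a shared-space token belongs to."""
    for name, (lo, hi) in MODALITY_RANGES.items():
        if lo <= token < hi:
            return name
    raise ValueError(f"token {token} outside all modality ranges")

def generate(prompt: list[int], steps: int, next_token_fn) -> list[int]:
    """Autoregressive decoding: append one predicted token at a time."""
    seq = list(prompt)
    for _ in range(steps):
        seq.append(next_token_fn(seq))
    return seq

# Toy stand-in for the trained model (NOT a real predictor): it always
# continues in the "action" range, illustrating image + text in, actions out.
toy_predict = lambda seq: MODALITY_RANGES["action"][0] + len(seq) % 1_000

prompt = tokenize("image", [5, 42]) + tokenize("text", [17])
out = generate(prompt, steps=3, next_token_fn=toy_predict)
print([modality_of(t) for t in out])
# → ['image', 'image', 'text', 'action', 'action', 'action']
```

Because every modality lives in the same sequence space, the same next-token objective covers captioning (image in, text out), video prediction (image in, image out), and control (image and text in, actions out) without separate model heads.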
With a deep understanding of language and the physical world, RFM-1 gives robots the sophisticated ability to reason and make decisions on the fly. This delivers high levels of robotic autonomy, lowers associated costs and implementation times, and opens the door for the rapid development of new applications and robotic form factors, such as consumer and humanoid robots.
Specific RFM-1 capabilities include:
- Physics world model: RFM-1’s understanding of physics emerges from learning to generate videos. RFM-1 can predict via AI-generated videos how objects will react to robotic actions. This physics world model, powered by Covariant’s multimodal robotics dataset, improves speed and reliability by enabling robots to simulate the result of future scenarios and select the best course of action.
- Language-guided programming: By making robots taskable and giving them an understanding of the English language, RFM-1 enables robots and humans to collaborate and problem-solve by simply communicating with each other — significantly lowering the barriers of customizing AI behavior to address dynamic business needs and the long-tail of corner case scenarios.
- Learning from self-reflection: In-context learning allows robots to learn on the fly and improve by reflecting on their own actions. With RFM-1, robots can realize this learning in minutes rather than weeks or months, drastically increasing performance while reducing ramp time for a new system, scenario, or item.
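The physics world model capability above amounts to model-based action selection: simulate the outcome of each candidate action with a learned model, then execute the one with the best predicted result. The sketch below illustrates that selection loop; the scoring function and candidate actions are hypothetical stand-ins, since RFM-1's actual world model predicts video of how objects will react.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    grasp_point: tuple[float, float]  # hypothetical pick location (x, y) in the bin
    approach_angle: float             # degrees from vertical

def predict_success(action: Action) -> float:
    """Stand-in for the learned world model. In RFM-1 this role is played by
    generating video of the action's outcome; here a toy score simply favors
    top-down grasps near the bin center (0.5, 0.5)."""
    cx, cy = action.grasp_point
    centered = 1.0 - ((cx - 0.5) ** 2 + (cy - 0.5) ** 2)
    top_down = 1.0 - abs(action.approach_angle) / 90.0
    return 0.5 * centered + 0.5 * top_down

def select_best(candidates: list[Action]) -> Action:
    """Simulate each candidate action and pick the best predicted outcome."""
    return max(candidates, key=predict_success)

candidates = [
    Action(grasp_point=(0.5, 0.5), approach_angle=0.0),   # centered, top-down
    Action(grasp_point=(0.9, 0.1), approach_angle=45.0),  # corner, angled
]
best = select_best(candidates)
print(best.grasp_point)  # → (0.5, 0.5)
```

The design point is that the robot's reliability then comes from the quality of the world model rather than from hand-coded rules: better outcome prediction directly yields better action choices.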
“Recent advances in Generative AI have demonstrated beautiful video creation capabilities, yet these models are still very disconnected from physical reality and limited in their ability to understand the world robots are faced with. Covariant’s RFM-1, which is trained on a very large dataset that is rich in physical robot interactions, represents a significant leap forward towards building generalized AI models that can accurately simulate the physical world,” commented Pieter Abbeel, Chief Scientist and Co-Founder, Covariant.
Learn more about RFM-1 on the Covariant blog.
The capabilities will be available for live demonstration at Covariant's headquarters in Emeryville, CA (by appointment only) and on a first-come, first-served basis at the MODEX 2024 trade show in Atlanta, GA, from March 11 to March 14, 2024 (visit booth C7085 to reserve a spot).
The post Covariant introduces RFM-1 to give robots human-like ability to reason first appeared on AI-TechPark.