DayDreamer, a reinforcement-learning (RL) artificial intelligence (AI) algorithm created by researchers from the University of California, Berkeley, can teach a quadruped to walk in just one hour. The algorithm helps robots quickly learn tasks like picking, navigating or walking by using a world model.
The world model lets the algorithm learn more quickly than RL alone, and without needing a simulator. It was used to train a Unitree Robotics A1 quadruped to roll off its back and walk in just an hour, a Universal Robots UR5 manipulator and a UFACTORY xArm 6 to complete a pick-and-place task in around 10 hours, and a Sphero Ollie mobile robot to complete a navigation task in two hours.
DayDreamer collects experience as the robot interacts with its environment and uses that experience to train a neural-network world model. The world model lets the algorithm predict the results of a series of actions. These predictions are then used with RL to train a controller for the robot.
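To make that loop concrete, here is a minimal Python sketch of the idea: gather experience, fit a model to it, then use the model to predict where a series of actions leads. It is a toy under loud assumptions, not DayDreamer's code. The one-dimensional "robot", the hidden 0.5 gain, and every function name here are hypothetical, and a least-squares fit stands in for the neural networks the researchers use.

```python
import random

def real_step(position, action):
    """Stand-in for the real robot. The 0.5 gain is hidden from the learner."""
    return position + 0.5 * action

def collect_experience(num_steps, seed=0):
    """Drive the 'robot' with random actions and record each transition."""
    rng = random.Random(seed)
    position, data = 0.0, []
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)
        next_position = real_step(position, action)
        data.append((position, action, next_position))
        position = next_position
    return data

def fit_world_model(data):
    """Least-squares fit of: next_position - position = gain * action."""
    numerator = sum(a * (nxt - pos) for pos, a, nxt in data)
    denominator = sum(a * a for _, a, _ in data)
    return numerator / denominator

def predict(gain, position, actions):
    """Use the learned model, not the robot, to roll a series of actions forward."""
    for action in actions:
        position += gain * action
    return position

experience = collect_experience(200)
gain = fit_world_model(experience)
predicted = predict(gain, 0.0, [1.0, 1.0, -0.5])
```

Once the model is fitted, `predict` never touches `real_step`, which is the point: the consequences of an action sequence can be estimated without moving the robot.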
This process has advantages over typical robot training methods. It’s faster than RL on its own and better equipped to handle the complexity and dynamics of the real world than training with a simulated environment. The world model also requires less development time and cost than simulated environments.
The world model uses an encoder neural network to map sensor data into a lower-dimensional representation, and a dynamics network to predict how motor actions will change that representation.
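The division of labor between the encoder and the dynamics network can be sketched in a few lines of Python. This is an illustration only: the hand-picked weight matrices, the four-value sensor reading, and the function names are all assumptions, and simple linear maps stand in for the trained neural networks described in the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical weights. In a real system these would be learned from data.
ENCODER_WEIGHTS = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]  # maps 4 sensor values to a 2-value latent representation
LATENT_WEIGHTS = [[1.0, 0.0], [0.0, 0.5]]  # how the latent evolves on its own
ACTION_WEIGHTS = [[0.25], [0.0]]           # how one motor command shifts it

def encode(sensors):
    """Encoder: compress raw sensor data into the smaller representation."""
    return matvec(ENCODER_WEIGHTS, sensors)

def predict_next_latent(latent, action):
    """Dynamics: predict the next latent from the current latent and an action."""
    carried = matvec(LATENT_WEIGHTS, latent)
    pushed = matvec(ACTION_WEIGHTS, action)
    return [c + p for c, p in zip(carried, pushed)]

sensors = [1.0, 3.0, 2.0, 2.0]
latent = encode(sensors)
next_latent = predict_next_latent(latent, [1.0])
```

Note that the prediction happens entirely in the compact representation. The dynamics function never sees the raw sensor values, only the encoder's summary of them.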
Then, a reward neural network judges which motor actions are best based on whether or not they achieve the task. Next, an RL actor-critic algorithm uses the resulting world model to learn control behaviors. This method allows the AI algorithm to consider many different motor actions at the same time, instead of having the robot try one behavior at a time, as in typical RL.
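The benefit of weighing many behaviors inside the model can be shown with a small Python sketch. Be aware that this swaps in a simpler technique: instead of an actor-critic learner, it scores every candidate action sequence with the model and keeps the best one. The one-dimensional state, the goal value of 3.0, the discount factor, and the hand-written `dynamics` and `reward` functions are hypothetical stand-ins for learned networks.

```python
import itertools

def dynamics(state, action):
    """Hypothetical learned dynamics: the action nudges a 1-D latent state."""
    return state + action

def reward(state, goal=3.0):
    """Hypothetical learned reward: higher the closer the state is to the goal."""
    return -abs(goal - state)

def imagined_return(state, actions, discount=0.9):
    """Roll a plan forward in the model and sum its discounted rewards."""
    total, weight = 0.0, 1.0
    for action in actions:
        state = dynamics(state, action)
        total += weight * reward(state)
        weight *= discount
    return total

def best_plan(state, horizon=3, choices=(-1.0, 0.0, 1.0)):
    """Score every candidate action sequence in imagination and keep the best."""
    candidates = itertools.product(choices, repeat=horizon)
    return max(candidates, key=lambda plan: imagined_return(state, plan))

plan = best_plan(0.0)
```

Here all 27 three-step plans are evaluated without a single real-world trial. An actor-critic method goes further by training a policy and a value estimate on such imagined rollouts rather than enumerating plans.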
DayDreamer lets robots adapt quickly to their surroundings. The team found that, using the algorithm, the quadruped learned within 10 minutes to withstand being pushed or to quickly roll over and stand back up. The robotic arms learned to pick and place objects using only camera images and sparse rewards, and the mobile robot learned to navigate to its goal position using only camera images.
The team’s model and several experiments were published in a paper co-authored by Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg and Pieter Abbeel. The paper was published on arXiv. The DayDreamer code will soon be open-sourced, according to the project’s website, while an earlier version of the algorithm is available on GitHub.