Most robotic grippers do not closely resemble human hands because they are designed for a limited range of functions or for high precision and repeatability. Human hands, however, are dexterous enough to perform feats that remain difficult for robots. The key to robotic manipulation is not the hardware but the software, according to OpenAI. The company posted to its blog today about how it trained a robot hand to solve a Rubik’s Cube.
San Francisco-based OpenAI has been working on artificial general intelligence, in which robots learn to solve problems independently rather than being programmed with specific solutions. In July, Microsoft Corp. said it was investing $1 billion in OpenAI and partnering with it to develop AI on the Azure platform.
OpenAI’s blog post refers to a research paper its team wrote explaining how models trained in simulation could “solve a manipulation problem of unprecedented complexity on a real robot.”
The company has been working since May 2017 to train a robot hand to solve a Rubik’s Cube. While it was able to do so in simulation by July 2017, the physical robot achieved that capability only in July 2019.
The goal is to help train robots to eventually be general-purpose household assistants. Mobile manipulators have also drawn interest for e-commerce order fulfillment, packing, manufacturing, and other tasks.
Applying machine learning to complex manipulation
“Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it,” said OpenAI. “Our robot still hasn’t perfected its technique, though, as it solves the Rubik’s Cube 60% of the time (and only 20% of the time for a maximally difficult scramble).”
The goal wasn’t just to solve a Rubik’s Cube, which other robots can do faster, but to be able to manipulate it without having data on all possible orientations and combinations first.
To get to that point, OpenAI kept the hardware it has been using since 2017 — a Shadow Dexterous E Series Hand — with a PhaseSpace motion-capture system for tracking the five fingertips. The company also kept its three RGB Basler cameras for visual pose estimation. It made only minor modifications to the Dactyl system for grip and robustness.
The researchers did modify the Rubik’s Cube for its testing to include built-in sensors and a Bluetooth module. This enabled the cube to report its state and helped with the manipulation and testing.
While Dactyl’s hardware remained mostly the same, OpenAI’s latest research was distinguished by the techniques it used to train two neural networks on the custom robot platform, chiefly automatic domain randomization (ADR). Ordinary domain randomization was not enough to train AI and robots to apply generalized lessons.
“The biggest challenge we faced was to create environments in simulation diverse enough to capture the physics of the real world,” OpenAI wrote. “Factors like friction, elasticity and dynamics are incredibly difficult to measure and model for objects as complex as Rubik’s Cubes or robotic hands, and we found that domain randomization alone is not enough.”
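The plain domain randomization that fell short can be sketched as sampling each physics parameter from fixed, hand-tuned bounds for every simulated episode. The parameter names and ranges below are illustrative assumptions, not values from OpenAI's paper:

```python
import random

# Hypothetical fixed ranges for plain domain randomization: every
# simulated episode draws its physics parameters uniformly from
# hand-tuned bounds that never change over the course of training.
FIXED_RANGES = {
    "cube_mass_kg": (0.08, 0.12),
    "friction_coeff": (0.5, 1.5),
    "cube_size_m": (0.055, 0.060),
}

def sample_plain_randomization():
    """Draw one simulated environment's physics parameters."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in FIXED_RANGES.items()}

env_params = sample_plain_randomization()
```

Because the bounds are static, the diversity of training environments is capped by whatever the engineers guessed up front — which is the limitation ADR addresses.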
ADR generated simulations of increasing complexity, and the control policy learned to solve them using a recurrent neural network and reinforcement learning. The convolutional neural network for pose prediction was trained on the same data but separately from the control policy, said OpenAI.
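The two-network split described above — a vision model estimating the cube's pose and a recurrent policy keeping memory across control steps — can be sketched roughly as follows. All class names, the stub outputs, and the joint count are illustrative assumptions (the Shadow hand has on the order of 20 actuated degrees of freedom), not OpenAI's actual code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CubePose:
    """Estimated cube state: 3-D position and a quaternion orientation."""
    position: List[float]
    orientation: List[float]

def vision_pose_estimate(camera_images):
    """Stand-in for the convolutional pose estimator, trained separately
    from the control policy on ADR-generated data. The real model maps
    three RGB camera images to the cube's pose; this stub returns a
    fixed placeholder pose."""
    return CubePose(position=[0.0, 0.0, 0.1],
                    orientation=[1.0, 0.0, 0.0, 0.0])

class RecurrentPolicy:
    """Stand-in for the recurrent control policy trained with
    reinforcement learning. Carrying hidden state across steps is what
    lets it adapt to each environment's dynamics at test time."""
    def __init__(self):
        self.hidden = None

    def act(self, pose, fingertip_positions):
        self.hidden = (pose, fingertip_positions)  # toy memory update
        return [0.0] * 20  # illustrative: one target per actuated joint

policy = RecurrentPolicy()
pose = vision_pose_estimate(camera_images=[None, None, None])
action = policy.act(pose, fingertip_positions=[[0.0, 0.0, 0.0]] * 5)
```

The design point is the separation: the vision network and the control policy are trained independently on the same randomized distribution, and only the pose estimate flows between them at run time.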
“Control policies and vision-state estimators trained with ADR exhibit vastly improved sim2real [simulation-to-reality] transfer,” stated OpenAI. “For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time.”
By “meta-learning,” OpenAI meant that the algorithm — and, by extension, robots — should be able to learn without prior knowledge and react accordingly to unforeseen factors in the environment. MIT and other research institutions are also working on the problem.
Overcoming random obstacles to Rubik’s Cube solution
As the neural network got better at solving the Rubik’s Cube, the amount of domain randomization was automatically increased, forcing the network to generalize its lessons. Random factors included the size and mass of the cube, the amount of friction, and the visible parts of the hand itself.
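The automatic-widening behavior can be sketched for a single randomized quantity: when the policy performs well on environments sampled at the edge of the current range, the range grows. The class name, step size, and thresholds here are illustrative assumptions, not OpenAI's implementation:

```python
import random

class ADRParameter:
    """One randomized quantity whose sampling range expands automatically.

    Sketch of the ADR idea: if the policy's recent success rate on
    environments drawn near the range boundary exceeds a threshold,
    the boundary is pushed outward, so the curriculum hardens only
    as the network improves.
    """
    def __init__(self, init_value, step, lower_cap, upper_cap):
        self.low = init_value      # range starts as a single point
        self.high = init_value
        self.step = step
        self.lower_cap = lower_cap
        self.upper_cap = upper_cap

    def sample(self, rng):
        return rng.uniform(self.low, self.high)

    def maybe_expand(self, boundary_success_rate, threshold=0.8):
        """Widen the range if the policy handles the current edge well."""
        if boundary_success_rate >= threshold:
            self.low = max(self.lower_cap, self.low - self.step)
            self.high = min(self.upper_cap, self.high + self.step)

friction = ADRParameter(init_value=1.0, step=0.1,
                        lower_cap=0.2, upper_cap=2.0)
for success in [0.9, 0.85, 0.5, 0.95]:  # one evaluation per update cycle
    friction.maybe_expand(success)
# Three of the four evaluations cleared the threshold,
# so the range has widened to roughly [0.7, 1.3].
```

Starting each range at a single point and letting performance drive the expansion is what removes the hand-tuning that fixed-range randomization requires.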
In addition to setting the challenges of manipulating and solving the Rubik’s Cube, the researchers added a rubber glove, a blanket, and a stuffed giraffe as environmental obstacles.
After repeated simulations and randomizations, the robot exceeded performance thresholds for both manipulating the block and solving the puzzle.
“We find that our system trained with ADR is surprisingly robust to perturbations, even though we never trained with them,” said OpenAI. “The robot can successfully perform most flips and face rotations under all tested perturbations, though not at peak performance.”
OpenAI found that visually representing how the neural networks solve problems helped associate semantic behaviors with the data gathered during simulations. This provided insight into the steps the algorithm took to move and solve the Rubik’s Cube.
While a Rubik’s Cube might seem a long way from figuring out how to open a refrigerator and fetch a beverage, developing human-level dexterity is an important step toward service robots that can observe, decide, and react to a wide variety of circumstances, said OpenAI.