NVIDIA research announced at CoRL this week puts smart picking within robots’ grasping range.
ZURICH — Researchers at NVIDIA Corp. today announced a new deep-learning system for perceiving a variety of objects for robotic grasping and manipulation. In a research paper presented at the Conference on Robot Learning (CoRL) here, NVIDIA explained how its breakthrough could help robots in different situations.
“We want robots to be able to interact with their environment in a safe and skillful manner,” stated Stan Birchfield, a principal research scientist at NVIDIA.
“We’ve developed a novel deep-learning system that enables a robot to estimate in real time the pose of objects with an off-the-shelf RGB camera and reach out, grasp, and manipulate them,” said Hector Marinez, director of corporate communications at NVIDIA.
“Our ultimate goal is for robots to work in the real world,” he said. “To do that, they have to perceive changes in complex environments.”
Spotting and moving objects
“Industrial robots today are very good at repeating the same trajectory over and over, but they have no awareness of the environment,” Birchfield told Robotics Business Review. “Robots with this level of perception don’t really exist right now in industry.”
“Most researchers in computer vision address the problem with algorithms developed on particular data sets, but those data sets don’t necessarily tell you whether it will work in a real-world environment,” he said. “We’ve been working on this particular problem for about one year.”
“We presented in May at ICRA,” said Birchfield, referring to a paper presented at the International Conference on Robotics and Automation (ICRA) in Brisbane, Australia. “A human-trained robot would mimic a human and execute a task in the real world, stacking colored cubes.”
NVIDIA said its new algorithm advances machine vision and robotic grasping by accounting for different object orientations and locations.
“Robots need to detect and estimate the pose of objects,” explained Birchfield. “With our system, we can find an object in real time, estimate its orientation, and then grasp it.”
Bridging the ‘reality gap’
The challenge for deep-learning systems is having enough high-quality data to train the neural networks, Birchfield explained.
“This is important for real-time performance,” he said. “Some networks are trained only on synthetic [computer-generated] data, and other systems developed in the research community are trained on real-world data.”
“There has been a divide between synthetic data and real data,” said Birchfield. “When you train a neural network on synthetic data alone, it typically doesn’t work well on real images.”
“We came up with a trick for bridging the ‘reality gap,’” he added. “We mix domain-randomized and photorealistic images, and we get better results than with either alone, or than current algorithms.”
“We can generate a limitless amount of training data that’s already labeled — that’s a huge benefit of synthetic data,” Birchfield said.
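The mixing strategy Birchfield describes can be pictured with a short sketch. The following is a minimal illustration, not NVIDIA’s training code: two synthetic data sources, one domain-randomized and one photorealistic, are concatenated so that each shuffled batch draws from both. Random tensors stand in for images, and the dataset sizes and names are assumptions for illustration only.

# Minimal sketch of mixing domain-randomized and photorealistic synthetic data.
# Not NVIDIA's code; random tensors stand in for rendered images, and the
# integer labels stand in for the annotations that come free with synthetic data.
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

def fake_dataset(num_images):
    # Stand-in "images" (3 x 64 x 64) plus one auto-generated label per image.
    images = torch.rand(num_images, 3, 64, 64)
    labels = torch.randint(0, 10, (num_images,))
    return TensorDataset(images, labels)

domain_randomized = fake_dataset(600)  # cluttered, heavily randomized scenes
photorealistic = fake_dataset(400)     # physically based, realistic renders

# Concatenate the two sources; shuffling then interleaves them in every batch.
mixed = ConcatDataset([domain_randomized, photorealistic])
loader = DataLoader(mixed, batch_size=32, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels.shape)  # torch.Size([32, 3, 64, 64]) torch.Size([32])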
NVIDIA’s neural network for robotic grasping was trained on NVIDIA Tesla V100 GPUs in a DGX Station, using a cuDNN-accelerated deep-learning framework.
“Our neural network is based on a convolutional pose machine, originally developed at another university for human pose estimation,” said Birchfield. “It generates a series of belief maps, one per joint in the human skeleton.”
“We adapted that for our work, building belief maps for the bounding-box vertices of 3D objects,” he said. “From the bounding box in image space, we can infer the six degrees of freedom, or pose, of objects in a scene.”
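The last step Birchfield describes, going from projected bounding-box vertices in the image to a full six-degree-of-freedom pose, is a standard Perspective-n-Point (PnP) computation. The sketch below is not NVIDIA’s code; it uses OpenCV’s solvePnP with made-up object dimensions and camera intrinsics, and simulates the network’s detected 2D vertices, to show how rotation and translation are recovered from the 2D-3D correspondences.

# Minimal sketch of recovering 6-DoF pose from detected bounding-box vertices
# with PnP. Object size, camera intrinsics, and the "true" pose are placeholders.
import cv2
import numpy as np

# 3D corners of a cuboid object in its own coordinate frame (meters).
w, h, d = 0.07, 0.10, 0.07  # placeholder, roughly soup-can sized
object_points = np.array(
    [[x, y, z] for x in (-w / 2, w / 2)
               for y in (-h / 2, h / 2)
               for z in (-d / 2, d / 2)])

# Placeholder pinhole intrinsics for an off-the-shelf RGB camera.
camera_matrix = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

# Simulate the network's output: project the corners under a known pose to get
# the 2D vertex locations that the belief maps would localize in the image.
rvec_true = np.array([0.2, -0.4, 0.1])
tvec_true = np.array([0.05, -0.02, 0.6])
image_points, _ = cv2.projectPoints(
    object_points, rvec_true, tvec_true, camera_matrix, dist_coeffs)

# Recover the 6-DoF pose (rotation + translation) from the 2D-3D matches.
ok, rvec, tvec = cv2.solvePnP(
    object_points, image_points, camera_matrix, dist_coeffs)
rotation, _ = cv2.Rodrigues(rvec)
print("recovered translation (m):", tvec.ravel())
print("recovered rotation matrix:\n", rotation)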
Robotic grasping in service applications
While NVIDIA is conducting pure research into robotic perception and manipulation, Birchfield and Marinez acknowledged the potential for practical applications.
“This technology is important for pick-and-place tasks,” Marinez said. “It’s also useful for stacking or putting something such as a can of soup back on a shelf.”
In addition, NVIDIA said its research is relevant to robots interacting with humans, as well as objects that have moved.
“For example, a robot could safely hand a glass to a person or take it from a person,” Marinez said.
“This is a key step forward to household and service robots, but there’s plenty more work to do,” Birchfield said. “We don’t expect robots in the home tomorrow, but this is the type of technology that’s needed.”
Next steps for deep learning research
“The next steps include scaling up to hundreds or thousands of objects, handling symmetric objects, for which the pose is ambiguous, and dealing with nonrigid objects,” noted Birchfield.
“We’re focusing on settings where you have a robot manipulator and a workspace,” he said. “You could imagine it in a mobile manipulator setting, executing tasks in multiple spaces.”
“In a closed-loop system, we’re improving performance not just from a motion perspective, but also from a control perspective, to produce smooth motions,” said Birchfield. “The network will evolve to handle the perception problem and enable other robots.”
“This awareness of the environment includes static objects, people, and other dynamic objects, but it’s for robotic grasping, not navigation,” he said.
Sharing robotic grasping research
“We have a custom plug-in developed for Unreal Engine,” said Birchfield. “It can randomize object positions and orientations, add distractor objects that cause occlusion, and vary the lighting and rendering.”
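The plug-in itself runs inside Unreal Engine, but the kind of per-frame randomization Birchfield describes can be sketched in a few lines. The example below is an illustration only; the parameter names and ranges are assumptions, not the plug-in’s actual API.

# Minimal sketch of per-frame domain randomization: random object pose, a random
# number of occluding distractors, and randomized lighting. All field names and
# ranges are illustrative assumptions.
import random

def sample_scene(max_distractors=10):
    return {
        "object_pose": {
            "position_m": [random.uniform(-0.5, 0.5) for _ in range(3)],
            "rotation_deg": [random.uniform(0, 360) for _ in range(3)],
        },
        "distractors": [
            {"position_m": [random.uniform(-0.5, 0.5) for _ in range(3)]}
            for _ in range(random.randint(0, max_distractors))
        ],
        "light": {
            "intensity": random.uniform(0.2, 3.0),
            "color_temperature_k": random.uniform(3000, 8000),
        },
    }

print(sample_scene())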
“We’re making the code publicly available, and we hope that other researchers will see the value of it and use it,” he said. “There has been a healthy trend in the past five to 10 years in computer vision and robotics of researchers sharing code.”
“We’re focused on the research side, not commercialization,” Marinez said.