Although humans rely on multiple senses to comprehend the world, robots largely use vision and, increasingly, touch. Carnegie Mellon University researchers this week said they have found that hearing could significantly improve robotic perception.
In what they claimed is the first large-scale study of the interactions between sound and robotic action, researchers at CMU’s Robotics Institute found that robots could use sounds to differentiate between objects, such as a metal screwdriver and a metal wrench.
Hearing also could help robots determine what type of action caused a sound and help them use sounds to predict the physical properties of new objects.
Hearing improves perception more than expected
“A lot of preliminary work in other fields indicated that sound could be useful, but it wasn’t clear how useful it would be in robotics,” said Lerrel Pinto, who recently earned his Ph.D. in robotics at CMU and will join the faculty of New York University this fall. He and his colleagues found the performance rate was quite high, with robots that used sound successfully classifying objects 76% of the time.
The results were so encouraging, he added, that it might prove useful to equip future robots with instrumented canes, enabling them to tap on objects they want to identify.
The researchers presented their findings last month during the virtual Robotics: Science and Systems conference. Other team members included Abhinav Gupta, associate professor of robotics, and Dhiraj Gandhi, a former master’s student who is now a research scientist at Facebook Artificial Intelligence Research’s Pittsburgh lab. The Defense Advanced Research Projects Agency and the Office of Naval Research supported the research.
Building a dataset
To perform their study, the researchers created a large dataset, simultaneously recording video and audio of 60 common objects — such as toy blocks, hand tools, shoes, apples and tennis balls — as they slid or rolled around a tray and crashed into its sides. They have since released this hearing dataset, cataloging 15,000 interactions, for use by other researchers.
The team captured these interactions using an experimental apparatus they called Tilt-Bot — a square tray attached to the arm of a Sawyer robot. It was an efficient way to build a large dataset; they could place an object in the tray and let Sawyer spend a few hours moving the tray in random directions with varying levels of tilt as cameras and microphones recorded each action.
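The random tilting described above can be illustrated with a short Python sketch. This is not the researchers' control code: the names `TiltCommand`, `sample_tilt_commands` and `tilt_to_roll_pitch`, the 30-degree tilt limit and the uniform sampling are all assumptions chosen for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TiltCommand:
    direction_deg: float  # direction the tray tips toward (hypothetical convention)
    tilt_deg: float       # how steeply the tray is tipped


def sample_tilt_commands(n, max_tilt_deg=30.0, seed=None):
    """Draw n random tray tilts: uniform direction, uniform tilt magnitude.

    The 30-degree default limit is an assumed value, not one from the study.
    """
    rng = random.Random(seed)
    return [
        TiltCommand(
            direction_deg=rng.uniform(0.0, 360.0),
            tilt_deg=rng.uniform(0.0, max_tilt_deg),
        )
        for _ in range(n)
    ]


def tilt_to_roll_pitch(cmd):
    """Small-angle split of a tilt into roll and pitch components, in degrees."""
    theta = math.radians(cmd.direction_deg)
    return cmd.tilt_deg * math.cos(theta), cmd.tilt_deg * math.sin(theta)


if __name__ == "__main__":
    for cmd in sample_tilt_commands(5, seed=0):
        roll, pitch = tilt_to_roll_pitch(cmd)
        print(f"direction={cmd.direction_deg:6.1f}  tilt={cmd.tilt_deg:4.1f}  "
              f"roll={roll:5.1f}  pitch={pitch:5.1f}")
```

In a real rig, each sampled command would be sent to the arm while the cameras and microphones record, so every tilt becomes one labeled interaction.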
They also collected some data beyond the tray, using Sawyer to push objects on a surface.
Though the size of this dataset is unprecedented, other researchers have also studied how intelligent agents can glean information from sound. For instance, Oliver Kroemer, assistant professor of robotics, led research into using sound sensing to estimate the amount of granular materials, such as rice or pasta, by shaking a container, or to estimate the flow of those materials from a scoop.
Pinto said the usefulness of robot hearing was therefore not surprising, though he and the others were surprised at just how useful it proved to be. They found, for instance, that a robot could use what it learned about the sound of one set of objects to make predictions about the physical properties of previously unseen objects.
“I think what was really exciting was that when it failed, it would fail on things you expect it to fail on,” he said. For instance, a robot couldn’t use sound to tell the difference between a red block and a green block. “But if it was a different object, such as a block versus a cup, it could figure that out,” said Pinto.
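A toy Python sketch can show why a block and a cup are separable by sound while two blocks of different colors are not. It is not the CMU team's model: the synthetic decaying sine waves, the 300 Hz and 1,500 Hz pitches, the zero-crossing feature and the nearest-centroid classifier are all assumptions picked to keep the example small.

```python
import math
import random

RATE_HZ = 8000  # assumed sample rate for the toy signals


def impact_sound(freq_hz, decay_per_s=20.0, duration_s=0.1):
    """Toy impact: an exponentially decaying sine wave (a stand-in for real audio)."""
    n = int(duration_s * RATE_HZ)
    return [
        math.exp(-decay_per_s * i / RATE_HZ) * math.sin(2 * math.pi * freq_hz * i / RATE_HZ)
        for i in range(n)
    ]


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ; rises with pitch."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(signal, signal[1:]))
    return crossings / (len(signal) - 1)


def make_clips(base_freq_hz, count, rng):
    """Simulate repeated impacts whose pitch varies by up to 5% per clip."""
    return [impact_sound(base_freq_hz * rng.uniform(0.95, 1.05)) for _ in range(count)]


def fit_centroids(labeled_clips):
    """Average the zero-crossing feature for each label."""
    return {
        label: sum(zero_crossing_rate(c) for c in clips) / len(clips)
        for label, clips in labeled_clips.items()
    }


def classify(clip, centroids):
    """Return the label whose centroid is closest to the clip's feature."""
    feature = zero_crossing_rate(clip)
    return min(centroids, key=lambda label: abs(centroids[label] - feature))


rng = random.Random(0)
# Hypothetical pitches: a dull thud for the block, a higher ring for the cup.
BASE_FREQ_HZ = {"block": 300.0, "cup": 1500.0}
train = {label: make_clips(f, 20, rng) for label, f in BASE_FREQ_HZ.items()}
test = {label: make_clips(f, 20, rng) for label, f in BASE_FREQ_HZ.items()}

centroids = fit_centroids(train)
correct = sum(classify(c, centroids) == label for label, clips in test.items() for c in clips)
accuracy = correct / sum(len(clips) for clips in test.values())
print(f"block vs. cup accuracy: {accuracy:.0%}")
```

Color never enters `impact_sound`, so a red block and a green block would produce the same features here, and no classifier built on them could tell the two apart.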