Brown researchers simplify human-to-robot communication with large language models

The Brown research team tested its Lang2LTL software on a Spot robot from Boston Dynamics on campus. | Source: Juan Siliezar, Brown University

Researchers at Brown University said they have developed software that can translate plainly worded instructions into behaviors that robots can carry out without needing thousands of hours of training data.

Most current software for robot navigation can’t reliably translate everyday language into the mathematical language that robots can understand and act on, noted the researchers at Brown’s Humans to Robots Laboratory. Software systems have an even harder time making logical leaps based on complex or expressive directions, they said.

To handle such commands, traditional systems require training on thousands of hours of data so that the robot does what it is supposed to do when it encounters a particular type of command. However, recent advances in large language models (LLMs) have changed the way that robots learn.

LLMs change how robots learn

These LLMs have opened doors for robots to unlock new abilities in understanding and reasoning, said the Brown team. The researchers said they were excited to bring these capabilities outside of the lab and into the world in a year-long experiment. The team detailed its research in a recently published paper.

The team used AI language models to create a method that compartmentalizes the instructions. This method eliminates the need for training data and allows robots to follow plainly worded instructions to locations using only a map, the team claimed.

In addition, the Brown lab’s software gives navigation robots a grounding tool that can take natural language commands and generate behaviors. The software also lets robots work out the logical leaps they need to make, based on both the context of the instructions and what the instructions say the robot can do and in what order.

“In the paper, we were particularly thinking about mobile robots moving around an environment,” Stefanie Tellex, a computer science professor at Brown and senior author of the new study, said in a release. “We wanted a way to connect complex, specific and abstract English instructions that people might say to a robot — like go down Thayer Street in Providence and meet me at the coffee shop, but avoid the CVS and first stop at the bank — to a robot’s behavior.”

Step by step with Lang2LTL

The software system created by the team, called Lang2LTL, works by breaking down instructions into modular pieces. The team gave a sample instruction — a user telling a drone to go to the store on Main Street after visiting the bank — to show how this works.

When presented with that instruction, Lang2LTL first pulls out the two locations named. The model then matches these locations with specific spots that it knows are in the robot’s environment.

It makes this decision by analyzing the metadata it has on the locations, like their addresses or what kind of store they are. The system looks at nearby stores and then focuses on just the ones on Main Street to decide where it needs to go.
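The matching step described above can be pictured as a filter over map metadata. The following is only a minimal sketch under stated assumptions, not the Lang2LTL implementation: the `Landmark` fields, the toy map, and the `ground` helper are all hypothetical names invented for illustration, and the real system may compare locations in a very different way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    ident: str   # hypothetical map identifier
    kind: str    # what the place is, e.g. "bank" or "store"
    street: str  # street part of the address


# Toy map, invented for illustration only.
LANDMARKS = [
    Landmark("store_main", "store", "Main Street"),
    Landmark("store_elm", "store", "Elm Street"),
    Landmark("bank_main", "bank", "Main Street"),
]


def ground(kind, street=None, landmarks=LANDMARKS):
    """Return identifiers of landmarks matching the kind and, if given, the street."""
    matches = [lm for lm in landmarks if lm.kind == kind]
    if street is not None:
        # Narrow the nearby candidates down to the named street.
        matches = [lm for lm in matches if lm.street == street]
    return [lm.ident for lm in matches]


store_candidates = ground("store", street="Main Street")
bank_candidates = ground("bank")
print(store_candidates)
print(bank_candidates)
```

In this toy map, asking for a store without naming a street returns both stores, while adding "Main Street" narrows the result to one, which mirrors the narrowing the article describes.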

After this, the language model finishes translating the command to linear temporal logic, the mathematical codes and symbols that can express these commands in a way the robot understands. It plugs the locations it mapped into the formula it has been creating and gives these commands to the robot.
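To make the last step concrete, here is a small sketch of plugging grounded locations into a formula. It rests on several assumptions: the template `F ( a & F b )` (read "eventually reach a, and from then on eventually reach b") is one plausible linear temporal logic encoding of the sample instruction, not necessarily the formula Lang2LTL produces, and the landmark identifiers and both helper functions are hypothetical. The checker evaluates the formula over a finite list of visited places, a simplification of how temporal logic is normally defined.

```python
# Formula with placeholders; F is the "eventually" operator of LTL.
# This template is an assumed encoding of "visit a, then visit b".
LIFTED_FORMULA = "F ( a & F b )"


def plug_in(lifted, grounding):
    """Replace placeholder symbols with grounded landmark identifiers, token by token."""
    return " ".join(grounding.get(tok, tok) for tok in lifted.split())


def visits_in_order(trace, first, second):
    """Check F(first & F second) on a finite trace of visited landmarks."""
    if first not in trace:
        return False
    # The earliest visit to `first` leaves the longest remaining suffix,
    # so it is enough to look for `second` from that point onward.
    return second in trace[trace.index(first):]


# Hypothetical identifiers for the bank and the store on Main Street.
grounding = {"a": "bank_main", "b": "store_main"}
formula = plug_in(LIFTED_FORMULA, grounding)
print(formula)
print(visits_in_order(["bank_main", "home", "store_main"], "bank_main", "store_main"))
```

Note that this template only requires a store visit at some point after the bank; it does not forbid passing the store earlier. A stricter reading of "after" would need an additional "until" constraint.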

Brown scientists continue testing

The Brown researchers tested the system in two ways. First, the research team put the software through simulations in 21 cities using OpenStreetMap, an open geographic database.

According to the team, the system was accurate 80% of the time within these simulations. The team also tested its system indoors on Brown’s campus using a Spot robot from Boston Dynamics.

In the future, the team plans to release a simulation based on OpenStreetMap that users can try out for themselves. The simulation will be on the project website, and users will be able to type in natural language commands for a simulated drone to carry out. This will let the researchers better study how their software works and fine-tune it.

The team also plans to add manipulation capabilities to the software. The research was supported by the National Science Foundation, the Office of Naval Research, the Air Force Office of Scientific Research, Echo Labs, and Amazon Robotics.