Berkeley released BridgeData V2 dataset for robot learning at scale

The initial and final state of a trajectory with the natural language annotation “close cabinet”. | Source: UC Berkeley

A research team at UC Berkeley has released BridgeData V2, a large and diverse dataset of robotic manipulation behaviors. The dataset aims to facilitate research in scalable robot learning.

This updated dataset is compatible with open-vocabulary and multi-task learning methods conditioned on goal images or natural language instructions. Skills learned from the dataset generalize to novel objects and environments, and across institutions.

The Berkeley team collected data from a wide range of tasks in many different environments, varying the objects, camera poses, and workspace positioning. These variations are intended to support broad generalization.

The dataset includes 60,096 trajectories: 50,365 teleoperated demonstrations and 9,731 rollouts from a scripted pick-and-place policy. The data spans 24 environments and 13 skills. Each trajectory is labeled with a natural language instruction corresponding to the task the robot is performing.
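As a quick sanity check on these figures, the short Python sketch below confirms that the two data sources add up to the reported total and works out each source's share. The constant and function names are illustrative only and are not part of the BridgeData V2 code.

```python
# Counts as reported in the article; names here are hypothetical.
TELEOPERATED_DEMOS = 50_365
SCRIPTED_ROLLOUTS = 9_731
REPORTED_TOTAL = 60_096


def share(part: int, total: int) -> float:
    """Return part as a fraction of total."""
    return part / total


combined = TELEOPERATED_DEMOS + SCRIPTED_ROLLOUTS
# The two sources account for every reported trajectory.
assert combined == REPORTED_TOTAL

teleop_share = share(TELEOPERATED_DEMOS, combined)
scripted_share = share(SCRIPTED_ROLLOUTS, combined)

print(f"teleoperated: {teleop_share:.1%}")  # teleoperated: 83.8%
print(f"scripted:     {scripted_share:.1%}")  # scripted:     16.2%
```

In other words, roughly five out of every six trajectories are human teleoperated demonstrations.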

The 24 environments included in the dataset are grouped into four different categories. Most of the data comes from seven distinct toy kitchens, which all include some combination of sinks, stoves, and microwaves.

Most of the 13 skills included in BridgeData V2 come from foundational object manipulation tasks like pick-and-place, pushing, and sweeping. Some data comes from environment manipulation, which includes things like opening and closing doors and drawers. The rest of the data comes from more complex tasks, like stacking blocks, folding clothes, and sweeping granular media. Some parts of the data come from mixtures of these categories.

The team evaluated several state-of-the-art offline learning methods using the dataset. First, they evaluated the methods on tasks that are seen in the training data. Even though the tasks were seen in training, the methods still had to generalize to novel object positions, distractor objects, and lighting. The team then evaluated the methods on tasks that require generalizing skills in the data to novel objects and environments.

The data was collected on a WidowX 250 6DOF robot arm. The team collected the demonstrations by teleoperating the robot with a VR controller. The control frequency is 5 Hz and the average trajectory length is 38 timesteps.
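To put the control frequency and trajectory length in more familiar units, here is a rough Python sketch that converts timesteps to seconds. The dataset-wide estimate assumes the 38-timestep average applies across all 60,096 trajectories, which the article does not state explicitly, so treat that figure as a ballpark only. All names are illustrative.

```python
# Figures as reported in the article; names here are hypothetical.
CONTROL_HZ = 5
AVG_TIMESTEPS = 38
TOTAL_TRAJECTORIES = 60_096


def duration_seconds(timesteps: float, control_hz: float) -> float:
    """Convert a number of control timesteps into seconds."""
    return timesteps / control_hz


avg_seconds = duration_seconds(AVG_TIMESTEPS, CONTROL_HZ)

# Assumption: the reported average holds over the whole dataset.
est_total_timesteps = TOTAL_TRAJECTORIES * AVG_TIMESTEPS
est_total_hours = duration_seconds(est_total_timesteps, CONTROL_HZ) / 3600

print(f"average trajectory: {avg_seconds:.1f} s")  # average trajectory: 7.6 s
print(f"estimated total:    {est_total_hours:.0f} h")  # estimated total:    127 h
```

So a typical trajectory lasts under eight seconds, and under the stated assumption the dataset amounts to something on the order of 127 hours of robot interaction.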

For sensing, the team used an RGBD camera that is fixed in an over-the-shoulder view, two RGB cameras with poses that are randomized during data collection, and an RGB camera attached to the robot’s wrist. All of the images are saved at a 640×480 resolution.
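The sensing setup can be summarized as a small data structure. The Python sketch below is only an illustration of the rig as described here: the class, field, and camera names are hypothetical and do not reflect the dataset's actual file layout, and the article does not say whether every trajectory includes every view.

```python
from dataclasses import dataclass

# Image resolution as reported in the article.
IMAGE_WIDTH, IMAGE_HEIGHT = 640, 480


@dataclass(frozen=True)
class Camera:
    name: str               # hypothetical label, not a dataset key
    has_depth: bool         # True for the RGBD camera
    pose_randomized: bool   # True if the pose is re-randomized during collection
    on_wrist: bool = False  # True if mounted on the robot's wrist


# The four cameras described in the article.
CAMERA_RIG = (
    Camera("over_shoulder", has_depth=True, pose_randomized=False),
    Camera("randomized_0", has_depth=False, pose_randomized=True),
    Camera("randomized_1", has_depth=False, pose_randomized=True),
    Camera("wrist", has_depth=False, pose_randomized=False, on_wrist=True),
)

pixels_per_image = IMAGE_WIDTH * IMAGE_HEIGHT
depth_cameras = [cam.name for cam in CAMERA_RIG if cam.has_depth]

print(len(CAMERA_RIG), "cameras,", pixels_per_image, "pixels per image")
print("depth available from:", depth_cameras)
```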

The dataset is available for download, with the teleoperated demonstrations and the rollouts from the scripted pick-and-place policy provided as separate zip files. The team also provides model training code and pre-trained weights for getting started with the dataset.

The team's repository provides code and instructions for training on the dataset and evaluating policies, and a companion guide covers setting up the robot hardware.