The Robot Report


UC Berkeley open-sources BDD100K self-driving dataset

By Steve Crowe | June 5, 2018


UC Berkeley has released its BDD100K self-driving dataset to the public. The dataset is vast: 100,000 videos that can be used to advance autonomous vehicle technologies. It is part of the university’s DeepDrive project, which investigates state-of-the-art computer vision and machine learning technologies for automotive applications.

Developers can download the BDD100K dataset here and read more about it in this academic paper. Each video in the dataset is about 40 seconds long and was recorded at 720p and 30 frames per second. According to the researchers, the videos were collected from about 50,000 trips on streets throughout the United States.

The videos were shot at different times of the day and in various weather conditions. Datasets like this are vital to teaching autonomous systems how to cope with different environments and driving conditions.

The UC Berkeley team said the BDD100K database contains about one million cars, more than 300,000 street signs, 130,000 pedestrians, and much more. The videos also include GPS locations (from mobile phones), IMU data, and timestamps across 1,100 hours of footage.
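The 1,100-hour figure follows directly from the clip count and clip length quoted above; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the dataset's total footage,
# using the figures quoted in the article.
num_videos = 100_000
seconds_per_video = 40
fps = 30

total_hours = num_videos * seconds_per_video / 3600
total_frames = num_videos * seconds_per_video * fps

print(f"{total_hours:,.0f} hours")   # ~1,111 hours, matching the ~1,100 quoted
print(f"{total_frames:,} frames")    # 120,000,000 frames at 30 fps
```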

Related: MapLite enables autonomous vehicles to navigate unmapped roads


BDD100K isn’t the only self-driving dataset available, but it is the largest.

BDD100K is especially suitable for training computer vision systems to detect and avoid pedestrians on the street, as it contains more people than other datasets. CityPersons, a dataset specialized for pedestrian detection, has only about one-quarter as many people per image as BDD100K.

BDD100K isn’t the first publicly available self-driving dataset, but it is the largest. Baidu released its ApolloScape dataset in March, but BDD100K is 800 times larger. It’s also 4,800 times bigger than Mapillary’s dataset and 8,000 times bigger than KITTI.

Annotating the BDD100K dataset

Classifying all the objects in each of these videos would be quite time-consuming for developers, so UC Berkeley has already done that work, annotating more than 100,000 images with 2D bounding boxes around objects such as traffic signs, people, bicycles, other vehicles, trains, and traffic lights.
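These 2D box annotations ship as JSON alongside the images. The sketch below shows one way to group boxes by object category; the field names (`labels`, `category`, `box2d`) follow the published BDD100K label format, but treat them as assumptions and verify against the schema you actually download.

```python
import json

# Minimal sketch of reading BDD100K-style 2D box labels.
# Field names ("labels", "category", "box2d") follow the published
# BDD100K JSON label format -- treat them as assumptions and check
# them against the files you download.
sample = json.loads("""
{
  "name": "example-frame.jpg",
  "labels": [
    {"category": "car",
     "box2d": {"x1": 45.0, "y1": 254.0, "x2": 357.0, "y2": 487.0}},
    {"category": "traffic sign",
     "box2d": {"x1": 600.0, "y1": 100.0, "x2": 640.0, "y2": 140.0}}
  ]
}
""")

def boxes_by_category(frame: dict) -> dict:
    """Group a frame's 2D bounding boxes by object category."""
    out = {}
    for label in frame.get("labels", []):
        box = label.get("box2d")
        if box is None:  # some label types (e.g. drivable area) have no box
            continue
        out.setdefault(label["category"], []).append(
            (box["x1"], box["y1"], box["x2"], box["y2"])
        )
    return out

print(boxes_by_category(sample))
```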


BDD100K has two types of lane markings.

The annotated videos label two types of lane markings: markings vertical to the direction of travel are colored red, while those parallel to it are colored blue. The researchers also want to take road segmentation to the next level. They’ve divided the drivable area into two categories: “directly drivable area” (red) and “alternatively drivable area” (blue). We’ll let the researchers explain:

“In our dataset, the ‘directly drivable area’ defines the area that the driver is currently driving on – it is also the region where the driver has priority over other cars or the ‘right of the way’. In contrast, ‘alternatively drivable area’ is a lane the driver is currently not driving on, but could do so – via changing lanes. Although the directly and alternatively drivable areas are visually indistinguishable, they are functionally different, and require potential algorithms to recognize blocking objects and scene context.”

The researchers continue, “In alignment with our understanding, on highways or city streets, where traffic is closely regulated, drivable areas are mostly within lanes and do not overlap with the vehicles or objects on the road. However, in residential areas, the lanes are sparse. Our annotators can judge what is drivable based on the surroundings.”


The BDD100K self-driving dataset divides the drivable area into two categories: “directly drivable area” and “alternatively drivable area.”

To annotate all this data, the researchers built a semi-automatic tool that speeds up labeling bounding boxes, semantic segmentation, and lanes in the driving database. The tool can be accessed via a web browser.

For box annotation, for example, the team trained a Fast-RCNN object detection model on 55,000 labeled videos. The model works alongside human annotators and, the researchers said, saves 60 percent of the time required for drawing and adjusting bounding boxes.
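The mechanism behind that time saving can be sketched in a few lines: the detector pre-fills candidate boxes, so the human only verifies or adjusts them instead of drawing each one from scratch. All the timing constants and the verify/adjust split below are illustrative assumptions, not figures from the paper.

```python
# Toy sketch of model-in-the-loop labeling: a detector pre-fills
# candidate boxes, and a human only accepts or adjusts them rather
# than drawing every box by hand. All constants are illustrative
# assumptions, not numbers from the BDD100K paper.

SECONDS_DRAW = 10.0    # drawing a box from scratch (assumed)
SECONDS_VERIFY = 2.0   # accepting a correct pre-filled box (assumed)
SECONDS_ADJUST = 6.0   # fixing a slightly-off pre-filled box (assumed)

def annotation_time(n_boxes: int, detector_hit_rate: float) -> float:
    """Estimated seconds to label n_boxes with detector assistance."""
    hits = int(n_boxes * detector_hit_rate)   # boxes the human just accepts
    misses = n_boxes - hits                   # boxes the human must adjust
    return hits * SECONDS_VERIFY + misses * SECONDS_ADJUST

manual = 100 * SECONDS_DRAW                             # 1000 s fully by hand
assisted = annotation_time(100, detector_hit_rate=0.8)  # 80*2 + 20*6 = 280 s
print(f"time saved: {1 - assisted / manual:.0%}")       # 72% under these assumptions
```

Under these (assumed) numbers the saving exceeds the 60 percent the researchers reported; the point is only that even a moderately accurate detector shifts most of the work from drawing to verifying.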

“Our annotation system incorporates different kinds of labeling heuristics to improve productivity, and can be extended to different types of image annotation,” the researchers wrote. “With this production-ready annotation system, we are able to label a driving video dataset that is larger and more diverse than existing datasets. This dataset comes with comprehensive annotations that are necessary for a complete driving system.

“Moreover, experiments show that this new dataset is more challenging and more comprehensive than existing ones, and can serve as a good benchmark for domain adaption due to its diversity. This will serve to help the research community with understanding on how different scenarios affect existing algorithms’ performance.”


The back- (left) and front-end of BDD100K’s labeling tool.

About The Author

Steve Crowe

Steve Crowe is Executive Editor, Robotics, WTWH Media, and chair of the Robotics Summit & Expo and RoboBusiness. He is also co-host of The Robot Report Podcast, the top-rated podcast for the robotics industry. He joined WTWH Media in January 2018 after spending four-plus years as Managing Editor of Robotics Trends Media. He can be reached at scrowe@wtwhmedia.com.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media