How Zoox robotaxis make predictions while on the road

Zoox’s robotaxis have sensors placed high on the vehicle’s four corners that give it a 360º view of its surroundings. | Source: Zoox

Amazon acquired Zoox in June 2020 for what reports suggested was around $1.2 billion. Since then, the company has revealed its Zoox vehicle, a rectangular, passenger-focused vehicle with no driver’s seat or steering wheel, and has expanded its testing facilities, but it has otherwise been quiet.

Amazon recently demonstrated how the Zoox vehicle can predict how its surroundings will change up to eight seconds into the future. Those seconds give the vehicle time to react and make safe driving decisions.

Zoox’s artificial intelligence (AI) stack is at the heart of the vehicle’s ability to predict these outcomes. To accomplish this, the stack employs three broad processes: perception, prediction, and planning.

Predicting the future

Zoox’s AI stack starts with its perception stage, where the vehicle identifies everything in its surroundings and tracks how each object is moving.

The perception phase begins with high-resolution data gathered from the vehicle’s sensors. The vehicle is equipped with a variety of sensors: visual cameras, LiDAR, radar, and longwave infrared cameras. These sensors are placed high on the vehicle’s four corners, giving it an overlapping, 360º view of its surroundings that extends more than 100 m.

The robotaxi combines this data with a prebuilt, detailed semantic map of its environment called the Zoox Road Network (ZRN). The ZRN contains information about local infrastructure, road rules, speed limits, intersection layouts, the locations of traffic symbols, and more.

The perception AI then identifies and classifies surrounding cars, pedestrians and cyclists, which are called “agents.” The AI tracks each agent’s velocity and trajectory. It then boils this data down to its essentials and renders it as a 2D image optimized for machine learning.
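As a rough illustration of the rasterization step described above, the sketch below draws tracked agents into a top-down, multi-channel grid centered on the vehicle. The `Agent` fields, the five channels, the grid size and the one-meter cell size are all assumptions made for this example; the article does not describe Zoox’s actual encoding.

```python
import math
from dataclasses import dataclass

# Hypothetical channel layout, chosen only for this example.
CHANNELS = ["car", "pedestrian", "cyclist", "vx", "vy"]


@dataclass
class Agent:
    kind: str   # one of "car", "pedestrian", "cyclist"
    x: float    # meters to the right of the vehicle
    y: float    # meters ahead of the vehicle
    vx: float   # velocity along x, in m/s
    vy: float   # velocity along y, in m/s


def rasterize(agents, size=200, meters_per_cell=1.0):
    """Return grid[channel][row][col], with the vehicle at the grid center."""
    grid = [[[0.0] * size for _ in range(size)] for _ in CHANNELS]
    half = size // 2
    for agent in agents:
        col = half + math.floor(agent.x / meters_per_cell)
        # Row 0 is the farthest point ahead of the vehicle.
        row = half - 1 - math.floor(agent.y / meters_per_cell)
        if not (0 <= row < size and 0 <= col < size):
            continue  # outside the area covered by the grid
        grid[CHANNELS.index(agent.kind)][row][col] = 1.0
        grid[CHANNELS.index("vx")][row][col] = agent.vx
        grid[CHANNELS.index("vy")][row][col] = agent.vy
    return grid


agents = [
    Agent("car", x=10.0, y=5.0, vx=3.0, vy=0.0),
    Agent("pedestrian", x=-4.0, y=12.0, vx=0.0, vy=-1.2),
    Agent("cyclist", x=500.0, y=0.0, vx=0.0, vy=4.0),  # too far away, skipped
]
image = rasterize(agents)
```

Each channel is a separate 2D layer, so a network reading the image sees where each kind of agent is and how fast it is moving at the same pixel location.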

This image is presented to a convolutional neural network, which decides what items in the image matter to the vehicle. The image includes around 60 channels of semantic information about all of the agents in it.

With this information, the machine learning system creates a probability distribution of potential trajectories for each dynamic agent in the vehicle’s surroundings. The machine learning system considers the trajectory of all agents, as well as how cars are expected to move on a given street, what traffic lights are doing, the workings of crosswalks and more.
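To make the idea of a probability distribution over trajectories concrete, here is a minimal sketch that turns raw scores for a few candidate maneuvers into probabilities with a softmax. The softmax is a common choice for this step, not something the article attributes to Zoox, and the maneuver names and scores are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical model scores for one agent's candidate trajectories.
candidate_scores = {"go_straight": 2.0, "turn_left": 0.5, "stop": -1.0}

probabilities = dict(
    zip(candidate_scores, softmax(list(candidate_scores.values())))
)
```

A higher score yields a higher probability, and the probabilities across all candidates for one agent always sum to 1.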

The resulting predictions usually extend about eight seconds into the future and are recalculated every tenth of a second with new information from the perception system.
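The timing can be sketched as a loop that produces a fresh eight-second forecast at every 0.1-second tick. The horizon and update period come from the article; the constant-velocity forecast and the 0.5-second spacing between forecast points are stand-in assumptions, far simpler than a learned model.

```python
HORIZON_S = 8.0        # how far ahead each forecast reaches (from the article)
UPDATE_PERIOD_S = 0.1  # how often forecasts are recalculated (from the article)
SAMPLE_DT_S = 0.5      # spacing of points within a forecast (assumption)


def constant_velocity_forecast(x, y, vx, vy):
    """Stand-in predictor: assume the agent keeps its current velocity."""
    steps = round(HORIZON_S / SAMPLE_DT_S)
    return [
        (x + vx * k * SAMPLE_DT_S, y + vy * k * SAMPLE_DT_S)
        for k in range(1, steps + 1)
    ]


def run(ticks, x, y, vx, vy):
    """Produce one forecast per update tick for a toy agent."""
    forecasts = []
    for tick in range(ticks):
        t = tick * UPDATE_PERIOD_S
        # In a real system, perception would supply a newly observed state here.
        observed_x, observed_y = x + vx * t, y + vy * t
        forecasts.append(constant_velocity_forecast(observed_x, observed_y, vx, vy))
    return forecasts


# One second of operation for an agent moving at 5 m/s along x.
forecasts = run(ticks=10, x=0.0, y=0.0, vx=5.0, vy=0.0)
```

At this rate the system produces ten forecasts per second, each one replacing the last, so an agent that changes course is reflected in the predictions within a tenth of a second.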

The weighted predictions are passed to the final stage of the process, the planner. The planner is the car’s executive decision-maker. It takes the predictions from the previous phase and uses them to decide how the Zoox vehicle will move.
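One simple way a planner could use weighted predictions is to pick the maneuver with the lowest expected cost. The sketch below does exactly that; the plan names, probabilities and cost values are hypothetical, and the article does not say how Zoox’s planner actually scores its options.

```python
def choose_plan(ego_plans, predictions, cost):
    """Pick the plan with the lowest probability-weighted cost.

    predictions is a list of (predicted_behavior, probability) pairs.
    """
    def expected_cost(plan):
        return sum(prob * cost(plan, behavior) for behavior, prob in predictions)

    return min(ego_plans, key=expected_cost)


# Hypothetical weighted predictions for a pedestrian near a crosswalk.
predictions = [("pedestrian_crosses", 0.3), ("pedestrian_waits", 0.7)]

# Hypothetical costs: proceeding is cheap unless the pedestrian crosses.
COSTS = {
    ("proceed", "pedestrian_crosses"): 100.0,
    ("proceed", "pedestrian_waits"): 0.0,
    ("yield", "pedestrian_crosses"): 1.0,
    ("yield", "pedestrian_waits"): 1.0,
}

best = choose_plan(
    ["proceed", "yield"],
    predictions,
    cost=lambda plan, behavior: COSTS[(plan, behavior)],
)
```

With these numbers, proceeding has an expected cost of 30 and yielding an expected cost of 1, so the sketch chooses to yield even though the pedestrian is more likely to wait.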

Constantly improving predictions

While Zoox’s AI stack has millions of miles of sensor data collected by the company’s test fleet to train on, the team is still working to improve the stack’s accuracy.

Right now, the team is working on a graph neural network (GNN) approach to improve the stack’s prediction capabilities. A GNN would enable the vehicle to understand the relationships between the agents around it and itself, as well as how those relationships will change over time.
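For readers unfamiliar with GNNs, the core operation is message passing: each node in a graph updates its own features using those of its neighbors. The sketch below shows a single round that simply averages features, with the vehicle and nearby agents as nodes. The node names, feature values and edges are made up, and a real GNN would add learned weights, nonlinearities and several rounds.

```python
def message_passing_round(features, edges):
    """Average each node's feature vector with those of its neighbors."""
    neighbors = {node: [] for node in features}
    for a, b in edges:  # edges are treated as undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors[node]]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# Hypothetical features per node: [lateral offset in m, speed in m/s].
features = {
    "ego": [0.0, 10.0],
    "car_ahead": [0.0, 6.0],
    "cyclist": [2.0, 4.0],
}
# Hypothetical edges: the vehicle interacts with both agents.
edges = [("ego", "car_ahead"), ("ego", "cyclist")]

updated = message_passing_round(features, edges)
```

After one round, each node’s features carry information about the agents it is connected to, which is what lets a graph-based model reason about interactions rather than treating each agent in isolation.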

The team is also working to integrate the prediction and planning stages more deeply to create a feedback loop. This would let the planner ask the prediction system how agents might react to certain behaviors before it carries out a decision.
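A feedback loop of this kind can be pictured as a planner that queries the predictor once per candidate action and compares the answers. In the sketch below, `predict_reaction`, the action names, the probabilities and the risk values are all hypothetical placeholders, not details from Zoox.

```python
def predict_reaction(ego_action):
    """Hypothetical predictor: how another driver might respond to ego_action."""
    table = {
        "merge_now": {"other_brakes": 0.6, "other_holds_speed": 0.4},
        "wait_for_gap": {"other_brakes": 0.05, "other_holds_speed": 0.95},
    }
    return table[ego_action]


# Hypothetical risk for each (ego action, other driver's reaction) pair.
RISK = {
    ("merge_now", "other_brakes"): 2.0,
    ("merge_now", "other_holds_speed"): 50.0,
    ("wait_for_gap", "other_brakes"): 1.0,
    ("wait_for_gap", "other_holds_speed"): 1.0,
}


def plan_with_feedback(actions):
    """Ask the predictor about each action, then pick the lowest expected risk."""
    def expected_risk(action):
        reactions = predict_reaction(action)
        return sum(prob * RISK[(action, r)] for r, prob in reactions.items())

    return min(actions, key=expected_risk)


decision = plan_with_feedback(["merge_now", "wait_for_gap"])
```

The difference from a one-way pipeline is that the predicted reactions depend on what the vehicle itself intends to do, so each candidate action gets its own forecast.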