The Robot Report


Encord releases EBIND multimodal embedding model for AI agents

By Eugene Demaitre | November 19, 2025


The EBIND model enables AI teams to use multimodal data. Source: StockBuddies, AI, via Adobe Stock

As robots tackle increasingly complex environments and tasks, their artificial intelligence needs to be able to process and use data from many sources. Encord today launched EBIND, an embedding model that it said allows AI teams to enhance the capabilities of agents, robots, and other AI systems that use multimodal data.

“The EBIND model we’ve launched today further demonstrates the power of Encord’s data-centric approach to driving progress in multimodal AI,” stated Ulrik Stig Hansen, co-founder and president of Encord. “The speed, performance and functionality of the model are all made possible by the high-quality E-MM1 dataset it was built on – demonstrating again that AI teams do not need to be constrained by compute power to push the boundaries of what is possible in this field.”

Founded in 2021, Encord provides data infrastructure for physical and multimodal AI. The company, which has offices in London and San Francisco, said its platform enables AI labs, human data companies, and enterprise AI teams to curate, label, and manage data for AI models and systems at scale. It uses agentic and human-in-the-loop workflows so these teams can work with multiple types of data.

EBIND built on E-MM1 dataset, covers five modalities

Encord built EBIND on its recently released E-MM1 dataset, which it claimed is “the largest open-source multimodal dataset in the world.” The model allows users to retrieve audio, video, text, or image data using data of any other modality.

EBIND can also incorporate 3D point clouds from lidar sensors as a modality. This allows downstream multimodal models to, for example, understand an object’s position, shape, and relationships to other objects in its physical environment.
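The core idea behind a model like this is a shared embedding space: every modality is encoded into the same vector space, so a query in one modality can retrieve neighbors in any other. The sketch below illustrates that retrieval pattern with toy numpy vectors; the file names, dimensions, and values are invented for illustration and are not Encord's API or EBIND's actual embeddings.

```python
# Illustrative sketch of cross-modal retrieval in a shared embedding space.
# All encoder outputs below are made-up stand-ins, not EBIND's real vectors.
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit sphere so dot product = cosine similarity."""
    return v / np.linalg.norm(v)

# Pretend each modality-specific encoder has mapped its input into the same
# 4-dimensional space (real embedding models use hundreds of dimensions).
index = {
    "truck_image.png": normalize(np.array([0.9, 0.1, 0.0, 0.1])),
    "truck_audio.wav": normalize(np.array([0.8, 0.2, 0.1, 0.1])),
    "plane_cloud.pcd": normalize(np.array([0.0, 0.1, 0.9, 0.2])),
}

def retrieve(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Return the k indexed items whose embeddings are closest to the query."""
    q = normalize(query_vec)
    scores = {name: float(q @ vec) for name, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# A text query embedded near the "truck" region retrieves the truck assets
# regardless of their modality (image and audio outrank the point cloud).
print(retrieve(np.array([0.85, 0.15, 0.05, 0.1])))
```

Because similarity is computed on vectors alone, nothing in the retrieval step depends on which modality produced the query or the indexed items.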

“It was quite difficult to bring together all the data,” acknowledged Eric Landau, co-founder and CEO of Encord. “Data coming in through the internet is often paired, like text and data, or maybe with some sensor data.”

“It’s difficult to find these quintuples in the wild, so we had to go through a very painstaking exercise of constructing the data set that powered EBIND,” he told The Robot Report. “We’re quite excited by the power we saw of having all the different modalities interact in a simultaneous manner. This data set is 100 times larger than the next largest one.”

AI and robotics developers can use EBIND to build multimodal models, explained Encord. With it, they can extrapolate the 3D shape of a car based on a 2D image, locate video based on simple voice prompts, or accurately render the sound of an airplane based on its position relative to the listener, for instance.

“That’s how you compare the sound of a truck in a snowy environment to the image of it, to the actual audio file, to the 3D representation,” Landau said. “And we were actually surprised that data as diverse and specific as that actually existed and could be related from a multimodal sense.”

Thanks to the higher quality of data, Encord said EBIND is smaller and faster than competing models, while maintaining a lower cost per data item and supporting a broader range of modalities. In addition, the model’s smaller size means it can be deployed and run on local infrastructure, significantly reducing latency and enabling real-time inference.

Encord makes model open-source

Encord said its release of EBIND as an open-source model demonstrates its commitment to making multimodal AI more accessible.

“We are very proud of the highly competitive embedding model our team has created, and even more pleased to further democratize innovation in multimodal AI by making it open source,” said Stig Hansen.

Encord asserted that this will empower AI teams, from university labs and startups to publicly traded companies, to quickly expand and enhance the capabilities of their multimodal models in a cost-effective way.

“Encord has seen tremendous success with our open-source E-MM1 dataset and EBIND training methodology, which are allowing AI teams around the world to develop, train, and deploy multimodal models with unprecedented speed and efficiency,” said Landau. “Now we’re taking the next step, providing the AI community with a model that will form a critical piece of their broader multimodal systems by enabling them to seamlessly and quickly retrieve any modality of data, regardless of whether the initial query comes in the form of text, audio, image, video or 3D point cloud.”



Use cases range from LLMs and quality control to safety

Encord said it expects key use cases for EBIND to include:

  • Enabling large language models (LLMs) to understand all data modalities from a single unified space
  • Teaching LLMs to describe or answer questions about images, audio, video and/or 3D content
  • Cross-modal learning, or using examples from one data type such as images to help models recognize patterns in others like audio
  • Quality-control applications such as detecting instances in which audio doesn’t match the generated video or finding biases in datasets
  • Using embeddings from the EBIND model to condition video generation using text, objects, or audio embeddings, such as transferring an audio “style” to 3D models
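The quality-control use case above follows directly from the shared space: if the audio track and the video it accompanies embed far apart, the pair is suspect. The following is a minimal sketch of that check, assuming hypothetical embeddings and an arbitrary threshold chosen for illustration only.

```python
# Toy sketch of embedding-based quality control: flag clips whose audio
# embedding disagrees with the video embedding. Vectors and the threshold
# are invented for illustration; they are not Encord's implementation.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mismatched(video_emb: np.ndarray, audio_emb: np.ndarray,
               threshold: float = 0.5) -> bool:
    """True when the two modalities disagree more than the threshold allows."""
    return cosine(video_emb, audio_emb) < threshold

# A matching clip: siren footage paired with siren audio embeds nearby.
assert not mismatched(np.array([1.0, 0.1]), np.array([0.9, 0.2]))
# A mismatch: siren footage paired with birdsong audio embeds far apart.
assert mismatched(np.array([1.0, 0.1]), np.array([0.1, 1.0]))
```

In practice the threshold would be calibrated on known-good pairs rather than fixed by hand.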

Encord works with customers including Synthesia, Toyota, Zipline, AXA Financial, and Northwell Health.

“We work across the spectrum of physical AI, including autonomous vehicles, traditional robots for manufacturing and logistics, humanoids, and drones,” said Landau. “Our focus is on these applications where AI is embodied in the real world, and we’re agnostic to the form that it takes.”

Users could also swap in different sensor modalities such as tactile or even olfactory sensing or synthetic data, he said. “One of our initiatives is that we’re now looking at multilingual sources, because a lot of the textual data is heavily weighted to English,” added Landau. “We’re looking at expanding the data set itself.”

“Humans take in multiple sets of sensory data to navigate and make inferences and decisions,” he noted. “It’s not just visual data, but also audio data and sensory data. If you have an AI that’s existing in the physical world, you would want it to have a similar set of abilities to operate as effectively as humans do in 3D space.

“So you want your autonomous vehicle to not just see and not just sense through lidar, but also to hear if there’s a siren in the background, you want your car to know that a police car, which might not be in sight, is coming,” Landau concluded. “Our view is that all physicalized systems will be multimodal in some sense in the future.”

About The Author

Eugene Demaitre

Eugene Demaitre is editorial director of the robotics group at WTWH Media. He was senior editor of The Robot Report from 2019 to 2020 and editorial director of Robotics 24/7 from 2020 to 2023. Prior to working at WTWH Media, Demaitre was an editor at BNA (now part of Bloomberg), Computerworld, TechTarget, and Robotics Business Review.

Demaitre has participated in robotics webcasts, podcasts, and conferences worldwide. He has a master's from the George Washington University and lives in the Boston area.


Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media