The Robot Report


Facebook AI Describes Photos to Blind Users

By Steve Crowe | April 5, 2016

Facebook just became a lot more accessible to the 39 million-plus people who are blind and 246 million-plus people with severe visual impairments. Facebook today introduced automatic alternative text, or automatic alt text, an artificial intelligence (AI) application that generates a verbal description of images on the site.

Now people using screen readers on iOS devices will hear a list of items a photo may contain. For example, automatic alt text will now tell a Facebook user that an image “may contain three people, smiling, outdoors.” Before automatic alt text, Facebook users would only hear the name of the person who shared the photo.

Just last week, interestingly, Twitter also made its service more accessible to the visually impaired. Twitter now lets users add descriptions, or “alternative text,” to images so that people using screen readers and braille displays can find out what an image shows.

So, how did Facebook build automatic alt text? The site’s object recognition technology is based on a neural network that contains billions of parameters and is trained with millions of examples of visual objects. The company’s software developers and engineers work with the Facebook Accessibility team to make technology more accessible. Here’s more from Facebook:

While Facebook’s visual recognition technology described above can be used to recognize a wide range of objects and scenes (both referred to as “concepts” in the rest of this post), for this first launch we carefully selected a set of about 100 concepts based on their prominence in photos as well as the accuracy of the visual recognition engine. We also chose concepts that had very specific meanings, and we avoided concepts open to interpretation. The current list of concepts covers a wide range of things that can appear in photos, such as people’s appearance (e.g., baby, eyeglasses, beard, smiling, jewelry), nature (outdoor, mountain, snow, sky), transportation (car, boat, airplane, bicycle), sports (tennis, swimming, stadium, baseball), and food (ice cream, pizza, dessert, coffee). These concepts provide different sets of information about the image, including people (e.g., people count, smiling, child, baby), objects (car, building, tree, cloud, food), settings (inside restaurant, outdoor, nature), and other image properties (text, selfie, close-up).

We make sure that our object detection algorithm can detect any of these concepts with a minimum precision of 0.8 (some are as high as 0.99). Even with such a high quality bar, we can still retrieve at least one concept for more than 50 percent of photos on Facebook. Over time our goal is to keep increasing the vocabulary of automatic alt text to provide even richer descriptions.
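The post does not say how the 0.8 precision bar is enforced. One common approach, assumed here and not confirmed by Facebook, is to pick a per-concept confidence cut-off on human-labeled validation photos: choose the lowest cut-off whose precision still meets the bar, then measure coverage, the share of photos where at least one concept clears its cut-off (the post's "more than 50 percent" figure). The function names `min_threshold_for_precision` and `coverage` are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def min_threshold_for_precision(
    scores: Sequence[float],
    labels: Sequence[bool],
    min_precision: float = 0.8,
) -> Optional[float]:
    """Lowest confidence cut-off whose validation precision is >= min_precision.

    scores: model confidence for one concept on each validation photo.
    labels: True if a human labeler says the concept is really present.
    Returns None when no cut-off reaches the required precision.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    true_pos = 0
    for i, (score, is_positive) in enumerate(ranked):
        true_pos += int(is_positive)
        # Only evaluate a cut-off once every example tied at this score is
        # counted, because "score >= cut-off" accepts all of them together.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and true_pos / (i + 1) >= min_precision:
            best = score  # lower cut-offs keep more photos, so keep updating
    return best


def coverage(
    photos: Sequence[Mapping[str, float]], thresholds: Mapping[str, float]
) -> float:
    """Fraction of photos where at least one concept clears its cut-off."""
    if not photos:
        return 0.0
    hits = sum(
        any(
            score >= thresholds[concept]
            for concept, score in photo.items()
            if concept in thresholds
        )
        for photo in photos
    )
    return hits / len(photos)


if __name__ == "__main__":
    cut = min_threshold_for_precision(
        [0.95, 0.9, 0.8, 0.7, 0.6], [True, True, False, True, False]
    )
    print(cut)  # 0.9
```

Under this sketch, a concept for which `min_threshold_for_precision` returns `None` would simply be left out of the vocabulary, which is one way to read the post's remark that concepts were selected partly on recognition accuracy.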

Construction of sentence

After detecting the major objects in a photo, we need to organize them in a way that feels natural to people. We experimented with different approaches, such as ordering the concepts by their confidence, showing the concepts with a confidence level (such as 50 percent or 75 percent) attached to them, and so on. After many surveys and in-lab user experience studies, and after using this feature ourselves, we decided to group all the concepts into three categories (people, objects, and scenes) and then present information in this order. For each photo, we first report the number of people (approximated by the number of faces) in the photo, and whether they are smiling or not; we then list all the objects we detect, ordered by the detection algorithm’s confidence; scenes, such as settings and properties of the entire image (e.g., indoor, outdoor, selfie, meme), will be presented at the end. In addition, since we cannot guarantee that the description we deliver is 100 percent accurate (given that it’s neither created nor reviewed by a human), we start our sentence with the phrase “Image may contain” to convey uncertainty. As a result, we will construct a sentence like “Image may contain: two people, smiling, sunglasses, sky, tree, outdoor.”
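The ordering rules Facebook describes (people first, then objects by confidence, then scenes, all behind the “Image may contain” hedge) can be sketched as a small function. This is an illustration, not Facebook’s implementation: the function name `describe`, the split of concepts into objects and scenes (`SCENE_CONCEPTS`), and the spelled-out number words are all assumptions, since the post does not publish its concept lists or code.

```python
from typing import List, Optional, Tuple

# Hypothetical split of the vocabulary: these labels are treated as "scenes"
# (whole-image settings and properties); everything else counts as an object.
SCENE_CONCEPTS = {"indoor", "outdoor", "nature", "selfie", "meme", "close-up", "text"}

# Spelled-out counts, matching phrasing such as "two people" in the post.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def describe(
    face_count: int, smiling: bool, detections: List[Tuple[str, float]]
) -> Optional[str]:
    """Build an alt-text sentence: people first, then objects, then scenes."""
    parts: List[str] = []

    # 1. People: the count is approximated by the number of detected faces.
    if face_count > 0:
        if face_count < len(NUMBER_WORDS):
            word = NUMBER_WORDS[face_count]
        else:
            word = str(face_count)
        parts.append(f"{word} {'person' if face_count == 1 else 'people'}")
        if smiling:
            parts.append("smiling")

    # 2. Objects, ordered by the detector's confidence (highest first).
    objects = [d for d in detections if d[0] not in SCENE_CONCEPTS]
    parts.extend(name for name, _ in sorted(objects, key=lambda d: d[1], reverse=True))

    # 3. Scenes last, in the order given (the post does not specify an order).
    parts.extend(name for name, _ in detections if name in SCENE_CONCEPTS)

    if not parts:
        return None  # nothing cleared the quality bar, so no sentence is built
    # The hedge "Image may contain" signals that no human reviewed the output.
    return "Image may contain: " + ", ".join(parts) + "."
```

Called as `describe(2, True, [("tree", 0.81), ("outdoor", 0.97), ("sky", 0.93), ("sunglasses", 0.95)])`, with made-up confidences, the sketch returns the post's example sentence, “Image may contain: two people, smiling, sunglasses, sky, tree, outdoor.” Keeping the three groups as separate steps mirrors the people/objects/scenes order Facebook settled on after its user studies.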

Facebook says it took about 10 months to get automatic alt text to its current stage. The biggest challenge was “balancing people’s desire for more information about the images with the quality and social intelligence of such information. Interpretation of visual content can be very subjective and context-dependent. For instance, though people mostly care about who is in the photo and what they are doing, sometimes the background of the photo is what makes it interesting or significant.”

As of now, Facebook’s automatic alt text is available only to people using screen readers on iOS with the language set to English. However, Facebook said it plans to bring automatic alt text to other languages in the near future.

Let’s hope this goes better than Microsoft’s AI-powered chat bot, Tay.ai, which was shut down just 16 hours after launch when it started tweeting racial slurs, defending white supremacist propaganda, and supporting genocide. Tay was designed to engage in playful conversations with 18- to 24-year-olds. It could tell jokes, play games, send pictures, and tell you your horoscope. Tay was even supposed to become more personalized with users as time went on. But within hours of going live, Twitter users took advantage of Tay’s flaws and forced Microsoft to shut it down.

About The Author

Steve Crowe

Steve Crowe is Executive Editor, Robotics, WTWH Media, and chair of the Robotics Summit & Expo and RoboBusiness. He is also co-host of The Robot Report Podcast, the top-rated podcast for the robotics industry. He joined WTWH Media in January 2018 after spending four-plus years as Managing Editor of Robotics Trends Media. He can be reached at [email protected]


Copyright © 2026 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media
