Using a Samsung Simband, MIT CSAIL has developed a wearable AI that analyzes audio and measures movement, heart rate, blood pressure and skin temperature to detect the tone of a conversation and help people better understand social interactions.
Many people, especially those with anxiety or conditions like Asperger’s, struggle with everyday social situations. MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) thinks its new wearable AI can help them better understand social interactions.
The wearable device could eventually serve as a “social coach,” predicting whether a conversation you’re having is happy, sad or neutral based on a person’s speech patterns and vitals. MIT CSAIL outfitted study participants with a Samsung Simband, a device that captures high-resolution physiological waveforms to measure heart rate, blood pressure, blood flow and skin temperature. The system also captured audio data and text transcripts to analyze the speaker’s tone, pitch, energy and vocabulary.
As a participant tells a story, MIT CSAIL says its system can analyze audio, text transcriptions and physiological signals to determine the overall tone of the story with 83 percent accuracy. Using deep-learning techniques, the system can then classify the overall nature of a conversation as happy or sad, and classify each five-second block of every conversation as positive, negative or neutral.
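To make the idea of five-second blocks concrete, here is a minimal Python sketch of how time-stamped sensor readings might be grouped into fixed windows before being handed to a classifier. It is an illustration only, not MIT CSAIL’s code: the function names, the `(timestamp, value)` sample format and the use of a per-window mean as a feature are all assumptions made for this example.

```python
from statistics import mean

WINDOW_SECONDS = 5.0  # block length mentioned in the article


def split_into_windows(samples, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, value) samples into fixed-length windows.

    Hypothetical helper: windows are returned in time order, and a window
    that received no samples is simply skipped.
    """
    windows = {}
    for timestamp, value in samples:
        index = int(timestamp // window_seconds)
        windows.setdefault(index, []).append(value)
    return [windows[i] for i in sorted(windows)]


def window_means(samples, window_seconds=WINDOW_SECONDS):
    """Summarize each window by its mean (an assumed, deliberately simple feature)."""
    return [mean(w) for w in split_into_windows(samples, window_seconds)]


# Made-up heart-rate readings as (seconds since start, beats per minute).
heart_rate = [(0.0, 60), (2.5, 62), (5.0, 70), (9.9, 74), (10.0, 80)]
print(window_means(heart_rate))
```

In a real system each window would carry many features drawn from audio, text and physiological signals, and a trained model, rather than a simple average, would assign the positive, negative or neutral label.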
“The system picks up on how, for example, the sentiment in the text transcription was more abstract than the raw accelerometer data,” says graduate student Tuka Alhanai, who co-authored a related paper with PhD candidate Mohammad Ghassemi. “It’s quite remarkable that a machine could approximate how we humans perceive these interactions, without significant input from us as researchers.”
The model associated long pauses and monotonous vocal tones with sadder stories, while more energetic, varied speech patterns were associated with happier stories. In terms of body language, sadder stories were also strongly associated with increased fidgeting and cardiovascular activity, as well as certain postures like putting one’s hands on one’s face.
On average, the model could classify the mood of each five-second interval with an accuracy that was approximately 18 percent above chance, and a full 7.5 percent better than existing approaches.
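For readers unfamiliar with “above chance” comparisons, the short Python sketch below shows one common way to compute such a figure. The article does not say how the researchers defined chance or whether the 18 percent is an absolute or relative difference, so this example assumes a uniform baseline over three equally likely labels and reports the gap in percentage points; the function names and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of intervals where the predicted label matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def points_above_chance(predicted, actual, n_classes=3):
    """Percentage points by which accuracy exceeds a uniform-guess baseline.

    Assumption: chance is 1 / n_classes, i.e. all labels are equally likely.
    """
    return 100 * (accuracy(predicted, actual) - 1 / n_classes)


# Made-up labels for four five-second intervals.
predicted = ["positive", "negative", "neutral", "positive"]
actual = ["positive", "negative", "positive", "positive"]
print(round(points_above_chance(predicted, actual), 1))
```

If the labels in the real data were unevenly distributed, a majority-class baseline would be a stricter comparison than the uniform one assumed here.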
“As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions,” says Ghassemi. “Our results show that it’s possible to classify the emotional tone of conversations in real-time.”
In future work, the team hopes to collect data on a much larger scale, potentially using commercial devices like the Apple Watch so the system can be deployed more easily in the real world.
“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” says Alhanai. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”
Alhanai says the team is keeping privacy in mind when developing this wearable AI. The algorithm runs locally on a user’s device as a way of protecting personal information.
“Our next step is to improve the algorithm’s emotional granularity so it can call out boring, tense, and excited moments with greater accuracy instead of just labeling interactions as ‘positive’ or ‘negative’,” says Alhanai. “Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other.”