When she was studying for her Ph.D. at Cambridge, Rana el Kaliouby, CEO of Affectiva, realized she was spending more time with technology than with any human being.
“I realized that this machine was emotion blind. It had absolutely no idea how I was feeling,” el Kaliouby said. “It took actions or decisions that were not at all congruent with my emotional and mental state, but perhaps even worse, it was the main mode of communication I had with my family back home.”
El Kaliouby began to think of the possibility of devices understanding emotions the same way humans do. With artificial intelligence becoming more mainstream, such as in self-driving cars and in assisting in health care, she said the emphasis in AI is on efficiency, “and there is no consideration for the human elements.”
“I really want to kind of bring that balance. I want to marry the IQ and the EQ in our devices. … This has the power not only to reimagine our relationship with technology and human-machine interfaces,” el Kaliouby said, “but, more importantly, reconnect humans in more powerful ways and bring empathy into the equation in terms of human-to-human connection and communication.”
El Kaliouby is the CEO of Affectiva, an emotion recognition and analysis software company, and author of the memoir Girl Decoded: A Scientist’s Quest to Reclaim Our Humanity by Bringing Emotional Intelligence to Technology. She presented her lecture “Humanizing Technology with AI” at 10:45 a.m. EDT Tuesday, July 21, on the CHQ Assembly Video Platform as the second part of Week Four’s theme of “The Ethics of Tech: Scientific, Corporate and Personal Responsibility.” El Kaliouby discussed how artificial intelligence can be used to improve human communication and make the world safer, and how these aspirations can only be attained through ethical practices and diverse data collection.
El Kaliouby grew up in the Middle East and studied computer science at The American University in Cairo.
“At the time, I got so fascinated by the role technology plays in connecting people and how it changes the way we connect and communicate, and that’s been a common thread across my research, and my work over the last 25-plus years,” el Kaliouby said.
Ninety percent of human communication is nonverbal, el Kaliouby said, including facial expressions, gestures and vocal intonations. Researchers have long studied nonverbal communication, she said, from Guillaume-Benjamin-Amand Duchenne and Charles Darwin to, more recently, Paul Ekman, who in the 1970s mapped facial muscle movements into a coding system. For example, she said, when people smile, they activate the zygomatic major muscle, and when they furrow their brow, they use the corrugator muscle.
Becoming a certified face reader requires hundreds of hours of training, so el Kaliouby and her team use computer vision and machine learning to automate the process. They feed the algorithm hundreds of thousands of examples of people smiling, smirking or frowning, and the AI looks for similarities. Her team started with facial expressions, then added vocal intonations, then activities like eating, drinking and sleeping.
“The more data you give it, the better it becomes,” el Kaliouby said. “The more diverse data you give it, the better, more robust and more accurate it becomes.”
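The learning loop el Kaliouby describes, showing the system many labeled examples and letting it find the patterns, can be sketched in miniature. The following is a hypothetical illustration using a toy nearest-centroid classifier over invented feature vectors; it is not Affectiva’s actual pipeline, and the feature names are made up for the example.

```python
# Toy sketch of supervised expression recognition: learn from labeled
# examples, then classify a new one. The feature vectors are invented
# stand-ins for real facial measurements (e.g. muscle activations).

def train_centroids(examples):
    """Average the feature vectors for each label ("smile", "frown", ...)."""
    sums, counts = {}, {}
    for features, label in examples:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Pick the label whose centroid is closest to the new example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Training data: (mouth_curve, brow_furrow) pairs -- purely illustrative.
examples = [
    ([0.9, 0.1], "smile"), ([0.8, 0.2], "smile"),
    ([0.1, 0.9], "frown"), ([0.2, 0.8], "frown"),
]
centroids = train_centroids(examples)
print(classify(centroids, [0.85, 0.15]))  # prints "smile"
```

More and more diverse examples shift those averages toward something representative of real faces, which is the intuition behind el Kaliouby’s point about data diversity.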
One of the applications of this technology is helping people with autism communicate. El Kaliouby said that these individuals often avoid looking at the face altogether because they find it too overwhelming.
“So they’re completely missing out on that 90% of communication we’re talking about, which impacts their ability to make friends if they’re in school, their ability to keep jobs if they’re adults, so it has a lot of dire consequences,” el Kaliouby said.
After earning her Ph.D., el Kaliouby worked at MIT and created a device, similar to Google Glass, that functioned as a real-time coach and helped children on the autism spectrum understand facial expressions. The device turned the process into a game, awarding the children points for looking at faces. It remains a research project, but el Kaliouby said they found the children’s communication was improving.
El Kaliouby and her MIT colleague Rosalind Picard left the university in 2009 and founded Affectiva. They realized the technology had many applications, such as detecting driver drowsiness, and using facial and vocal biomarkers to detect stress, depression, suicidal intent and Parkinson’s disease.
“Imagine if we can detect all of that just off of your cell phone. Very transformative, but at the same time, we recognize that this data is very personal,” el Kaliouby said. “There is a lot of potential for abuse.”
She said Affectiva set up core values. First, it would reject any business whose intended use of the technology it felt was inappropriate, even if that meant turning away revenue. Second, anyone who chose to give Affectiva their data would be compensated. Third, the company would focus on ethical practices and ensure its algorithms are not biased.
“This is really important for me. This is the biggest issue right now in the space of artificial intelligence. It’s not that the robots are going to take over the universe,” el Kaliouby said. “It is that we are just building bias into these systems and then deploying them at scale unintentionally, but with really dire consequences.”
Affectiva has collected 9.5 million facial videos from 90 countries. These videos include people of different genders, ages, ethnicities, and even people wearing face masks and hijabs.
El Kaliouby said a few years after starting Affectiva, one of their principles was tested. The company was running out of money. Two months away from not making payroll, a venture arm of an intelligence agency offered them $40 million in funding to research lie detection, surveillance and security.
“I remember going back home one night and just kind of imagining what the world would look like if we took that money and Affectiva pivoted to working on this,” el Kaliouby said. “I just really didn’t feel that that was in line with why we started Affectiva. Our start was in autism and bridging this communication gap between people, between companies.”
In 2020, the company started an international program where they pair young people with Affectiva employees.
“These kids are asking awesome questions, and really kind of challenging us around what we’re building and how we’re building all this technology. That gives me a lot of hope in the future,” el Kaliouby said.
The lecture then shifted to a Q-and-A session with Chief of Staff and Vice President of Strategic Initiatives Shannon Rozner. The first question was how Affectiva retrained its algorithms to analyze people wearing face masks.
El Kaliouby said that people rely on the lower half of the face for communication, particularly because that is where the mouth is. But when people genuinely smile out of joy (a Duchenne smile), they also activate muscles around their eyes, causing wrinkles like crow’s feet.
“What we’re seeing is that when you do cover the lower half of your face, you need to exaggerate some of your expressions so that they can manifest in the upper half of the face, but also use things like head gestures and hand gestures that can accentuate some of these nonverbal signals,” el Kaliouby said.
Rozner then asked if el Kaliouby could explain the process of collecting data, from the initial gathering to analysis by the AI.
El Kaliouby said that for training an algorithm to detect a smile, they have two ways of gathering data. One is having people from all over the world watch a video on a laptop, and, with the person’s permission, use the camera to record their reaction. The other way is to have people put dash cameras in their cars and record their daily commute for a few weeks.
Then the videos go to human annotators, who watch them in slow motion and label the frames that contain smiles. The annotators’ findings serve as the correct answers, or validation data set: if three out of five annotators find that a person is smiling, then the person is most likely smiling. El Kaliouby said the AI’s findings are tested against the annotators’, and the process is repeated for different expressions and emotions, and even for actions like eating or drinking.
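The majority-vote step she describes, counting an expression as present when three out of five annotators agree, amounts to a simple threshold over the annotators’ labels. A minimal sketch (the function name and data layout are invented for illustration):

```python
# Aggregate human annotators' labels for one video frame into a
# "ground truth" validation label: the frame counts as a smile if a
# majority of the five annotators marked it as one (3 of 5, per the lecture).

def majority_label(annotations, threshold=3):
    """annotations: list of booleans, one per annotator (True = smile)."""
    return sum(annotations) >= threshold

frame_votes = [True, True, False, True, False]  # 3 of 5 say "smile"
print(majority_label(frame_votes))  # prints True -> treated as a smile
```

The model’s prediction for each frame is then scored against this aggregated label rather than against any single annotator’s opinion.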
“The repertoire of things we can train the machine is endless. I find that really exciting,” el Kaliouby said.
The final question was what el Kaliouby’s ideal world would look like in 50 years, and what role the average person might play in creating it.
She said the power of consumers is monumental, and that people need to choose companies that are committed to the ethical development of AI. El Kaliouby said that people being educated about AI will go a long way, and help create not only a more productive and automated future, but a more empathetic and human one, too.
“I really hope in 50 years, we have rebuilt our technology in a way that gives us a sense of connection,” el Kaliouby said. “Not the illusion of a connection, but the real sense that we are connected across borders and across our differences. I’m excited about that.”