A report from Belgian public broadcaster VRT NWS has revealed how contractors paid to transcribe audio clips collected by Google’s AI assistant can end up listening to sensitive information about users, including names, addresses, and details about their personal lives.
It’s the latest story showing how our interactions with AI assistants are not as private as we may like to believe. Earlier this year, a report from Bloomberg revealed similar details about Amazon’s Alexa, explaining how audio clips recorded by Echo devices are sent without users’ knowledge to human contractors, who transcribe what’s being said in order to improve the company’s AI systems.
Worse, these audio clips are often recorded entirely by accident. Usually, AI assistants like Alexa and Google Assistant only start recording audio when they hear their wake word (eg, “Okay Google”), but these reports show the devices often start recording by mistake.
In the story by VRT NWS, which focuses on Dutch- and Flemish-speaking Google Assistant users, the broadcaster reviewed a thousand or so recordings, 153 of which had been captured accidentally. A contractor told the publication that he transcribes around 1,000 audio clips from Google Assistant every week. In one of the clips he reviewed, he heard a female voice in distress and said he felt that “physical violence” had been involved. “And then it becomes real people you’re listening to, not just voices,” said the contractor.
Tech companies say that sending audio clips to humans to be transcribed is an essential process for improving their speech recognition technology. They also stress that only a small percentage of recordings are shared in this way. A spokesperson for Google told Wired that just 0.2 percent of all recordings are transcribed by humans, and that these audio clips are never presented with identifying information about the user.
Vague disclosures of this kind could cause legal trouble for the company, says Michael Veale, a technology privacy researcher at the Alan Turing Institute in London. He told Wired that this level of disclosure might not meet the standards set by the EU’s GDPR. “You have to be very specific on what you’re implementing and how,” said Veale. “I think Google hasn’t done that because it would look creepy.”
In a blog post published later in the day, Google defended its practice of using human reviewers to listen to Assistant audio. The company says it applies “a wide range of safeguards to protect user privacy throughout the entire review process,” and that it does this review work to improve the Assistant’s natural language processing and its support for multiple languages. But Google also acknowledged that those safeguards failed in the case of the Belgian contract worker, who broke the company’s data security and privacy rules by providing the audio to VRT NWS.
“We just learned that one of these language reviewers has violated our data security policies by leaking confidential Dutch audio data,” writes David Monsees, a product manager on the Google Search team who authored the blog post. “Our Security and Privacy Response teams have been activated on this issue, are investigating, and we will take action. We are conducting a full review of our safeguards in this space to prevent misconduct like this from happening again.”
Update 7/11, 6:33PM ET: Added information and comment from Google’s blog post published in response to the VRT NWS report.