“Today, when people want to talk to any digital assistant, they’re thinking about two things: what do I want to get done, and how should I phrase my command in order to get that done,” Subramanya says. “I think that’s very unnatural. There’s a huge cognitive burden when people are talking to digital assistants; natural conversation is one way that cognitive burden goes away.”
Making conversations with Assistant more natural means improving its reference resolution—its ability to link a phrase to a specific entity. For example, if you say, “Set a timer for 10 minutes,” and then say, “Change it to 12 minutes,” a voice assistant needs to understand and resolve what you’re referencing when you say “it.”
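To make the idea concrete, here is a minimal sketch of reference resolution for the timer example above. It is a rule-based toy, not how Assistant works (the article describes BERT-based models), and every name in it (`Timer`, `DialogueContext`, `handle`) is hypothetical. It assumes that "it" always refers to the most recently mentioned timer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timer:
    minutes: int


@dataclass
class DialogueContext:
    # Hypothetical dialogue state: the entity mentioned most recently.
    last_entity: Optional[Timer] = None


# Toy patterns for the two commands in the example; real systems use learned models.
SET_RE = re.compile(r"set a timer for (\d+) minutes?", re.IGNORECASE)
CHANGE_RE = re.compile(r"change it to (\d+) minutes?", re.IGNORECASE)


def handle(utterance: str, ctx: DialogueContext) -> Optional[Timer]:
    """Create a timer, or resolve "it" to the last timer and update it."""
    match = SET_RE.search(utterance)
    if match:
        # A new entity enters the conversation; remember it for later references.
        ctx.last_entity = Timer(minutes=int(match.group(1)))
        return ctx.last_entity

    match = CHANGE_RE.search(utterance)
    if match:
        if ctx.last_entity is None:
            # Nothing for "it" to refer to, so the reference cannot be resolved.
            return None
        # Assumption: "it" means the most recently mentioned timer.
        ctx.last_entity.minutes = int(match.group(1))
        return ctx.last_entity

    return None


ctx = DialogueContext()
timer = handle("Set a timer for 10 minutes", ctx)
handle("Change it to 12 minutes", ctx)
print(timer.minutes)
```

The second command only makes sense because the context object remembers the first one. Without that stored state, "it" has nothing to point to and the request fails.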
The new NLU models are powered by machine-learning technology, specifically bidirectional encoder representations from transformers, or BERT. Google unveiled this technique in 2018 and applied it first to Google Search. Earlier language-understanding technology analyzed each word in a sentence on its own, but BERT processes the relationships among all the words in a phrase, greatly improving its ability to identify context.
One example of how BERT improved Search is the query “Parking on hill with no curb.” Before, the results still covered parking on hills with curbs. After BERT was enabled, Google Search surfaced a website that advised drivers to point their wheels toward the side of the road.
With BERT models now employed for timers and alarms, Subramanya says Assistant can respond to related queries, like the aforementioned adjustments, with almost 100 percent accuracy. But this superior contextual understanding doesn’t work everywhere just yet. Google says it’s slowly working on bringing the updated models to more tasks, like reminders and controlling smart home devices.
William Wang, director of UC Santa Barbara’s Natural Language Processing group, says Google’s improvements are radical, especially since applying the BERT model to spoken language understanding is “not a very easy thing to do.”
“In the whole field of natural language processing, after 2018, with Google introducing this BERT model, everything changed,” Wang says. “BERT actually understands what follows naturally from one sentence to another and what is the relationship between sentences. You’re learning a contextual representation of the word, phrases, and also sentences, so compared to prior work before 2018, this is much more powerful.”
Most of these improvements might be limited to timers and alarms, but you will see a general improvement in the voice assistant’s ability to understand context more broadly. For example, if you ask it the weather in New York and follow that up with questions like “What’s the tallest building there?” and “Who built it?”, Assistant will continue providing answers knowing which city you’re referencing. This isn’t exactly new, but the update makes Assistant even more adept at solving these contextual puzzles.
Teaching Assistant Names
Assistant is now better at understanding unique names too. If you’ve tried to call or send a text to someone with an uncommon name, there’s a good chance it took multiple tries or didn’t work at all because Google Assistant was unaware of the proper pronunciation.