Human Speech in Digital Assistants: Blurring the Lines between Tool and Companion

Digital assistants, Amazon, Alexa, Echo, Google Home, Siri, artificial intelligence, AI, Cortana, speech, linguistics, software development, technology, machine learning

Digital assistants can do many things: tell you the weather, tell you a joke, guide you to your destination, or remind you to return that phone call. But one thing they cannot yet do is hide the fact that they are robots.

Amazon is currently working to correct this by giving Alexa a more natural-sounding voice, using language tags designed to let Alexa do more humanlike things with her voice. These include whispering, pausing, and varying the speed, volume, emphasis, and pitch of her speech. It also stands to reason that, following Amazon's lead, other digital assistants like Siri and Cortana will sound increasingly human.
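To make the idea concrete, the language tags described above are a form of SSML (Speech Synthesis Markup Language), the XML-based markup used to shape synthesized speech. Below is a minimal Python sketch of how such markup might be assembled; the helper `build_ssml` and its parameters are hypothetical illustrations, though the tag names (`<speak>`, `<prosody>`, `<break>`, and Alexa's whisper effect) follow commonly documented SSML conventions:

```python
def build_ssml(text: str, whisper: bool = False, rate: str = "medium",
               pitch: str = "medium", volume: str = "medium",
               pause_ms: int = 0) -> str:
    """Wrap plain text in SSML-style tags controlling how a voice
    assistant renders it: speed, pitch, volume, pauses, whispering."""
    body = text
    if pause_ms:
        # A trailing pause, e.g. for dramatic effect before the next sentence.
        body += f'<break time="{pause_ms}ms"/>'
    # Prosody tags vary the delivery of the enclosed text.
    body = (f'<prosody rate="{rate}" pitch="{pitch}" '
            f'volume="{volume}">{body}</prosody>')
    if whisper:
        # Alexa-style whisper effect wraps the whole utterance.
        body = f'<amazon:effect name="whispered">{body}</amazon:effect>'
    # Every SSML document is rooted in a <speak> element.
    return f"<speak>{body}</speak>"

print(build_ssml("I can keep a secret.", whisper=True,
                 rate="slow", volume="soft"))
```

A skill developer would hand a string like this to the text-to-speech engine instead of plain text, which is all it takes for the same sentence to come out whispered, slowed down, or softened.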

But is that what we really want or need from digital assistants?

More sophisticated language tags would no doubt allow for richer communication, including subtext. They would also make us feel more attached to Alexa: if she sounds more human, we will perceive her as human on a subconscious level, even though we know she isn't.

The more human she sounds, the more people will want to use and interact with her.

The Downside

A perhaps unexpected consequence of making Alexa sound human is that it would change the way we interact with her, which may hinder her ability to understand us. If she starts mimicking speech patterns, saying "umm" mid-sentence, then we will probably start responding in kind, punctuating our words with fillers and slang that she may not understand.

So ultimately, it could have the opposite of the intended effect. Instead of making the AI more useful, these sophistications begin to detract from the work it is supposed to perform. A trade-off would ensue between an AI with a personality, which is more pleasant to interact with, and one with less personality but higher utility.

Setting Expectations

Ultimately, the personality of an AI should signal its purpose. This helps users gauge how far its abilities extend. For example, if an AI's voice sounds very humanlike, users will set their expectations for its other functions equally high.

So coming back to the question: what do people really want from a digital assistant? Where on the spectrum between pure utility and emotional connection do we want it to sit?

For a company like Amazon, or any company for that matter, the motive is to sell you products. The more acute Alexa's ability to understand, and reciprocate, emotional cues, the better she will ultimately be at selling you things. But in a different setting, the same technology could be used in education, eldercare, or something else entirely.

In the future, we can expect AIs like Alexa to influence public policy, employment, and ethics as more and more nuance is added to them. This is why we need to stop and really consider which direction we want to take, rather than simply making improvements for improvement's sake.
