A Definitive Guide to Voice User Interface (VUI) Design

Voice-controlled devices are on the rise. According to one of Google’s articles, “How voice assistance is reshaping consumer behavior,” about 70% of requests to the Google Assistant are expressed in natural language rather than the keywords people typically type into a web search. In addition, 41% of people who own a smart speaker say it feels like talking to a real person.

Many experts predict that voice user interface design will revolutionize how we interact with computers in the next decade. This post looks at voice user interface design and the critical aspects of designing voice interfaces.

Understanding Voice User Interface (VUI)

A voice user interface (VUI) lets users interact with a device through voice commands. Heavy use of screen-based devices is known to cause fatigue, which has helped drive the development and adoption of voice user interfaces.

With VUIs, users don’t have to look at a screen to control devices and apps. Leading tech companies such as Amazon, Google, Facebook, Apple, and Microsoft have developed (or are developing) voice-controlled devices and voice-enabled AI assistants. Well-known examples include Google Assistant, Apple’s Siri, and Amazon’s Alexa.

Besides the AI assistants, smart speakers are available on the market today, including Apple HomePod, Google Home, and Amazon Echo. Voice interfaces and interactions are likely to become more popular in the future. According to the Smart Audio Report, 25% of US adults own a smart speaker, and 33% of the US population uses voice search features.

If you plan to design a voice user interface, make sure you understand how one works so that your design provides a good user experience rather than frustrating users.

Why Voice User Interface Design Matters

With the leading tech giants investing millions in voice technology, one may wonder whether voice will replace screens. While that has not happened yet, VUI adoption is on a steep upward trajectory. Here’s why:

The Technology

Artificial intelligence (AI) is gaining momentum, with many tech companies embracing it. Some experts think that we might experience a robot takeover in the future. Thanks to AI and cloud computing, machines can understand many human speech variations more accurately than before.

The User

Intuition is a critical component of speech communication. Short of mind-reading, speech has less friction than any other communication method.

Companies

Voice offers an excellent opportunity to build mutual trust, friendship, and affinity. Building a positive rapport benefits a company and helps create a better user experience that brings customers back.

Voice is a big driver for people working in UX design or those interested in a better user experience. Many companies will likely have voice design openings in the future. Since few people have these skills, this will present an excellent opportunity for those who learn them now.

How Does a Voice Interface Work?

Several artificial intelligence (AI) technologies, including Automatic Speech Recognition, Speech Synthesis, and Named Entity Recognition, make up a voice user interface. You can add voice UIs to devices or build them into applications.

A VUI processes the user’s speech. It is backed by AI technology that enables it to understand the user’s intent and provide a response. The VUI’s speech components are hosted in a private or public cloud.
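The stages described above can be sketched as a small pipeline. This is only an illustration under loud assumptions: every function below is a hypothetical stand-in (the "audio" is plain UTF-8 text, and the language understanding is a keyword check), not how any particular assistant is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    intent: str
    entities: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """Stand-in for Automatic Speech Recognition (audio in, text out)."""
    # Assumption: the "audio" is just UTF-8 text so the sketch stays runnable.
    return audio.decode("utf-8")


def understand(text: str) -> Interpretation:
    """Stand-in for intent detection plus Named Entity Recognition."""
    words = text.lower().rstrip("?.!").split()
    if "weather" in words:
        # Naive entity extraction: the word after "in" is taken as the city.
        city = words[words.index("in") + 1] if "in" in words[:-1] else None
        return Interpretation("get_weather", {"city": city})
    return Interpretation("unknown")


def respond(result: Interpretation) -> str:
    """Turn the interpreted request into the text of a reply."""
    if result.intent == "get_weather":
        city = result.entities.get("city")
        if city is None:
            return "Which city would you like the weather for?"
        return f"Here is the weather for {city.title()}."
    return "Sorry, I did not catch that."


def synthesize(text: str) -> bytes:
    """Stand-in for Speech Synthesis (text in, audio out)."""
    return text.encode("utf-8")


def handle_request(audio: bytes) -> bytes:
    """Run one request through the whole pipeline."""
    return synthesize(respond(understand(recognize_speech(audio))))
```

In a real product each stand-in would be replaced by a speech or language service, but the order of the stages stays the same.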

Like most companies, you may want to add a graphical user interface (GUI) to your VUI for a better user experience. Visual cues and sound effects let the user know whether the device is listening, processing speech, or giving a response.
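One way to keep those cues consistent is to model the device as a small state machine. The sketch below is an assumption for illustration: the state names follow the paragraph above, while the cue descriptions and the allowed transitions are hypothetical.

```python
from enum import Enum


class AssistantState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"


# Hypothetical cues; a real product would define its own lights and sounds.
CUES = {
    AssistantState.IDLE: "light off",
    AssistantState.LISTENING: "steady light plus a short chime",
    AssistantState.PROCESSING: "pulsing light",
    AssistantState.RESPONDING: "light follows the spoken reply",
}

# Which states may follow each state (an assumed, simplified model).
ALLOWED = {
    AssistantState.IDLE: {AssistantState.LISTENING},
    AssistantState.LISTENING: {AssistantState.PROCESSING, AssistantState.IDLE},
    AssistantState.PROCESSING: {AssistantState.RESPONDING},
    AssistantState.RESPONDING: {AssistantState.IDLE, AssistantState.LISTENING},
}


def transition(current: AssistantState, target: AssistantState):
    """Move to the target state and return it with the cue to show or play."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target, CUES[target]
```

Because every state has exactly one cue, the user always gets the same signal for "listening" or "processing", whichever feature triggered it.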

Advantages and Disadvantages of Voice Interface

Advantages

The benefits of VUI are numerous. They include:

Ease of use: Users who are not comfortable with technology can use their voice to request tasks from AI assistants and VUI devices.
Saves time: It takes less time to dictate a text message or request than to type it, so voice is more convenient for users.
Eyes-free: VUI will come in handy if you need an eyes-free experience. This is especially the case if you experience screen fatigue issues or when you need to focus on a task rather than the device.
Hands-free: Sometimes, it is more practical to speak than type. This is the case when cooking, driving, and doing similar tasks.

Disadvantages

Misinterpretation: Voice recognition software is not without flaws. For instance, it may fail to interpret the context of what was said, leading to errors. VUIs may also fail to distinguish homophones such as ‘real’ and ‘reel’ or ‘road’ and ‘rode.’
Privacy concerns in public spaces: Many users find it hard to give voice commands to devices in public spaces because of privacy concerns and noise.

The Difference Between Voice-Only Interactions and Multimodal Ones

A multimodal interface lets you control a device without using your hands while still showing the results of your voice commands on a screen. A good example is a voice-controlled TV. Such an interface can present more information at once than a voice-only device. With voice-only devices, you need to consider cognitive overload and the quality and speed of information delivery.

Let’s look at the example below to shed more light on the difference between voice-only and multimodal interactions.

Suppose you search for a CBD cookbook. A voice-only device will read the results to you at a reasonable pace. A multimodal device, on the other hand, will display several results on its screen, and you can tell it to open your preferred option from the list. You control the device with your voice but see the results on the screen.

That means designers should consider both voice-only and multimodal interfaces when designing devices and apps.

VUI Design Fundamental Properties

Before we look at how to design a voice user interface, let’s look at the crucial properties of VUI design:

Hands-Free and Eyes-Free

You need to create a voice-first user interface design even when the VUI device has a screen. While the screen makes the voice interaction better, the user should be able to complete the operation without looking at the screen.

Of course, some tasks cannot be completed by voice alone. However, that does not justify designing actions that force users to rely on the screen alone. If a task depends on a screen, design it so that users start with voice before switching to the visual interface.

Tone of Voice

Voice is way more than a medium of interaction. Listening to someone (even for a few seconds), you learn a lot about them—gender, age, education, trustworthiness, intelligence, etc.

As such, you need to give your VUI a personality. It needs to match your brand values and be specific to evoke a unique personality.

Personalization

Personalization is another critical component of VUI design. It goes beyond knowing a user’s name: it is about identifying each user’s needs and providing information that matches them.

VUI provides an excellent opportunity for product designers to personalize each user’s interaction. It helps identify new and returning users and create user profiles. After all, as the system learns more about the users, it offers a more personalized experience.

Human-Like Conversation

No one wants to feel like they are talking to a robot, and users of your VUI are no exception. The conversation should feel natural and resemble a conversation between people. If your system requires users to remember certain phrases to perform specific tasks, you are getting it wrong.

As a rule of thumb, let users use their everyday language. If the commands are unclear, something is wrong, and a redesign may be necessary.

Trust

You cannot create a robust user engagement without trust. Trust is a critical component of a good user experience. Creating good interaction with the voice interface is a great way to create trust.

Some of the ways you can achieve this include:

Be careful with private data: Don’t verbalize sensitive data, as it may lead to privacy issues, especially since users might not be alone
Avoid purely promotional content: No one wants to be sold to. Avoid mentioning brands or products out of context, as users may find it too salesy
No offensive content: Adapt sensitive content to the user’s age or region
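The first point, not speaking private data aloud, could be enforced at the place where replies are built. The sketch below is only one possible approach; the field names and the reply wording are hypothetical.

```python
# Hypothetical field names used only for illustration.
SENSITIVE_FIELDS = {"account_balance", "card_number", "home_address"}


def render_reply(field_name: str, value: str, has_screen: bool) -> dict:
    """Decide what is spoken aloud and what is only shown on a screen."""
    if field_name not in SENSITIVE_FIELDS:
        # Harmless data can be both spoken and displayed.
        return {"speech": value, "screen": value if has_screen else None}
    if has_screen:
        # Sensitive data goes to the screen; the voice only points to it.
        return {"speech": "I have put that on the screen for you.", "screen": value}
    # No private channel available, so the value is not disclosed at all.
    return {"speech": "I cannot read that aloud here. Please check the app.", "screen": None}
```

With this shape, a sensitive value never reaches the speech channel, even if an individual feature forgets to check who else might be listening.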

How to Design a Voice User Interface

Designing a voice user interface is different from any other UX project. In this section, we look at the process of VUI design.

Conduct User Research

To identify problems and users’ pain points, you need to conduct user research. User research will help you understand how the user persona interacts with an assistant at different stages of engagement.

Aim to understand the needs, behaviors, and motivations of the user. The goal is to find out where voice can serve as an interaction method in the customer journey map. Is there a point where voice interactions can enhance the user flow? If you have not yet created the customer journey, think about where voice interactions could fit into the user flow. If the journey already exists, look at how voice interactions can improve it.

Ideally, you need to solve users’ problems to improve the user flow.

Competitor Analysis

A VUI competitor analysis is critical to determine how competitors implement voice interactions. Some of the factors to focus on when analyzing a competitor’s product include:

The type of voice command they use
Customer reviews
The use cases of the app

Use the information to design a better product.

Define User Requirements

User research and competitor research are not enough. Conducting interviews and user testing will help define users’ pain points and requirements. This way, you can focus on different scenarios before creating conversation flows. Capture user requirements as user stories and design a dialog flow for each. Next, prototype VUI conversations using dialogs that show the interaction between the user and the device.

Key things to remember when prototyping VUI conversation with dialog flows include:

Keep the interaction conversational and simple
Have a strong error strategy
Confirm when a task is completed
Create an additional layer of strong security
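Two of these points, a strong error strategy and confirming a completed task, can be prototyped in a few lines. The sketch below assumes a hypothetical table-booking dialog in which the system reprompts with increasing guidance and gives up politely after two failed attempts; the prompts and the tiny time vocabulary are invented for illustration.

```python
# Escalating reprompts: the second one tells the user what they can say.
REPROMPTS = [
    "Sorry, I did not catch that. What time would you like?",
    "You can say a time, such as 'seven pm'. What time would you like?",
]
FALLBACK = "I am still having trouble. Let's try again later."

# Hypothetical vocabulary; a real system would parse times properly.
KNOWN_TIMES = {"seven pm": "7:00 PM", "eight pm": "8:00 PM"}


def book_table(user_turns):
    """Return the system's lines for a list of things the user said."""
    transcript = ["What time would you like your table?"]
    for attempt, heard in enumerate(user_turns):
        time = KNOWN_TIMES.get(heard.strip().lower())
        if time:
            # Confirm explicitly so the user knows the task is complete.
            transcript.append(f"Done. Your table is booked for {time}.")
            return transcript
        if attempt >= len(REPROMPTS):
            break
        transcript.append(REPROMPTS[attempt])
    # Either the reprompts ran out or the user stopped answering.
    transcript.append(FALLBACK)
    return transcript
```

Reading the returned transcript aloud is a quick way to check whether the reprompts sound helpful rather than repetitive.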

A dialog flow guides users through the customer journey. It should consist of:

Keywords that encourage the interaction—this includes voice triggers such as “Hello @username.”
Branches showing the direction of the conversation
Sample dialogs
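Before reaching for a dedicated tool, a dialog flow with these three parts can be drafted as plain data. The sketch below is a hypothetical shopping flow: each node has a prompt, its branches are keyed by a keyword, and walking the flow with example user turns produces a sample dialog. The keyword matching is deliberately naive.

```python
# Hypothetical flow: node name -> prompt and keyword-triggered branches.
FLOW = {
    "start": {
        "prompt": "Hello! Would you like to buy something or track a delivery?",
        "branches": {"buy": "buy", "track": "track"},
    },
    "buy": {"prompt": "What would you like to buy?", "branches": {}},
    "track": {"prompt": "What is your order number?", "branches": {}},
}


def walk(flow, user_turns, node="start"):
    """Play user turns against the flow and return the resulting dialog."""
    lines = [("system", flow[node]["prompt"])]
    for said in user_turns:
        lines.append(("user", said))
        branches = flow[node]["branches"]
        # Follow the first branch whose keyword appears in what was said.
        target = next((dest for key, dest in branches.items() if key in said.lower()), None)
        if target is None:
            # No branch matched: apologize and repeat the current prompt.
            lines.append(("system", "Sorry, I did not get that. " + flow[node]["prompt"]))
            continue
        node = target
        lines.append(("system", flow[node]["prompt"]))
    return lines
```

Calling `walk(FLOW, ["I want to track my package"])` yields a three-line sample dialog that ends with the system asking for the order number.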

In effect, a dialog flow is a script of the entire conversation. Tools that can simplify creating dialog flows include:

Dialogflow
Voiceflow
Speechly

Testing

It is also important to test your dialogs. Ideally, start testing your VUI designs as soon as you have the sample dialogs. Getting feedback during the design process helps identify usability issues and fix them early enough.

A great way to test a dialog is to act it out. Have one person play the system and another play the user. As you practice the scripts, focus on how they sound when spoken aloud.

Remember, however, that non-verbal cues are not available to VUI systems, so make sure the participants cannot make eye contact while testing your dialogs.

Another way to test your dialogs is to observe actual user behavior. Watch people use your product for the first time and note any usability issues.

Consider the Anatomy of a Voice Command

VUI designers need to think about the possible interaction scenarios and objectives: what exactly does the user want to achieve? A voice command consists of three elements: intent, utterance, and slot.

Intent

This refers to the primary objective of the user’s voice command. Voice interactions fall into two categories: low-utility and high-utility interactions.

A low-utility interaction involves a vague, hard-to-decipher task, for instance when a user wants more information about a topic. The interface needs to confirm that the information falls within its service scope and then ask follow-up questions to understand the request and respond better.

A high-utility interaction, on the other hand, involves a specific task, such as asking for the bedroom lights to be turned off.

Utterance

This refers to how a user phrases the voice command that triggers a task. While some phrasings are easy to understand, UX designers should not ignore other variations. For instance, instead of saying, “play me song X,” a user could say, “could you play song X.”

Designers should consider these variations to make it easier for AI to understand and respond to requests.
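One simple way to capture such variations while prototyping is to list several phrasings for the same intent. The sketch below uses regular expressions; the pattern list is a hypothetical starting point, and production systems typically rely on trained language models rather than hand-written patterns.

```python
import re

# Several phrasings that all mean "play this song" (illustrative, not complete).
# More specific patterns come first so "play me X" is not read as the song "me X".
PLAY_SONG_PATTERNS = [
    r"^play me (?P<song>.+)$",
    r"^(?:could|can) you play (?P<song>.+)$",
    r"^i (?:want|would like) to (?:hear|listen to) (?P<song>.+)$",
    r"^play (?P<song>.+)$",
]


def match_play_song(utterance: str):
    """Return the intent and song if the utterance fits a known phrasing."""
    text = utterance.lower().strip().rstrip("?.!")
    for pattern in PLAY_SONG_PATTERNS:
        found = re.match(pattern, text)
        if found:
            return {"intent": "play_song", "song": found.group("song")}
    return None
```

Both “Play me Yesterday” and “Could you play Yesterday?” resolve to the same intent here, while an unrelated request returns `None` and can be handed to the error strategy.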

Slot

A slot is a variable within the command that the system uses to fulfill the request. Slots can be optional or required depending on the task. For instance, if a user requests music on Spotify, they may say, “play me music.” Since the AI can respond to the request without the variable, the slot here is optional. However, if a user wants to book a reservation at a specific time, the time is the slot and it is required.
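The difference between optional and required slots can be written down as a small table that the dialog logic consults. The intent and slot names below are hypothetical, chosen to mirror the music and reservation examples.

```python
# Hypothetical slot definitions for two intents.
INTENT_SLOTS = {
    "play_music": {"required": [], "optional": ["genre"]},
    "book_reservation": {"required": ["time"], "optional": ["party_size"]},
}


def next_action(intent: str, slots: dict) -> str:
    """Ask for the first missing required slot, otherwise fulfill the request."""
    missing = [name for name in INTENT_SLOTS[intent]["required"] if name not in slots]
    if missing:
        return f"ask:{missing[0]}"
    return "fulfill"
```

Here `next_action("play_music", {})` can go straight to fulfillment, whereas `next_action("book_reservation", {})` first asks for the time.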

Industries Likely to be Impacted by Voice User Interface

While all industries can use voice interactions, some will likely experience the most significant impact. These include:

Devices for the Visually Impaired

As VUIs become popular with the general population, voice services will also improve substantially for people with visual impairments who rely on them. Visually impaired people have long relied on tools such as screen readers, but that experience has its shortcomings. With devices designed specifically for voice, they will be able to access a wealth of online information.

Automobile

The automobile industry can benefit significantly from voice interaction for these reasons:

Operating other devices while driving is not recommended
Looking at graphical interfaces while driving can lead to accidents
It allows extended periods of uninterrupted driving
Many customers are not new to voice assistants in cars

Enabling voice commands in cars is a natural improvement. Examples include “find an ATM nearby,” “play music,” and “read my emails.”

Searching for such things by hand while driving is not just inconvenient; it puts you at risk. Improved voice commands mean a better user experience and greater safety.

Customer Service

As machines become increasingly reliable, they will significantly affect call centers. Artificially intelligent bots may handle direct interactions in the future, giving people more time to deal with more complex issues.

Wearable Electronics

Many wearable electronics rely on an operating system or a smartphone to access information. However, voice-enabled wearables could eliminate this intermediary and perform various functions on their own.