TV Voice Assistant

UI Revision

BACKGROUND
The voice assistant of Changhong TV is called Changhong Xiaobai. Since 2017, as features accumulated and different designers contributed, the design had become fragmented. It had many problems in usability, compatibility with new features, visual aesthetics, and maintenance, so a UI revision was needed.
ROLE
Organize Usability Tests / Competitive Analysis / UI Design

Participate in: Prototyping
April - July 2020
Industry background
Dialogue is a human instinct, and a voice-enabled device can feel more human. However, even without technical limitations, human-machine interaction is not a single mode of listening and speaking, because that is not what natural communication is like. At present, in home scenes, users’ interaction with the TV is still a simultaneous process involving three types of information: visual, audio, and tactile, which correspond to viewing, listening/speaking, and using the remote control respectively. For UX designers, it’s worth thinking about how a TV can carry so much perceptual design at the same time.
The older UI
Voice experience design is a complex system covering user research, functions, content, interaction, and technical implementation, in which the GUI plays an auxiliary role.
Key concepts
VUI
Voice UI is an interaction model where a human interacts with a machine and performs a set of tasks at least in part by using voice.
Domain
divides user tasks into broad categories, such as the video, music, and encyclopedia domains.
Continuous Dialog
is a mode that saves users from having to wake the voice assistant before every sentence. In regular mode, you need to wake it every time.
Classic scene
Mike: “Changhong Xiaobai, recommend me some TV dramas.”
Xiaobai: “See if there's anything here you like,” while displaying some popular TV dramas.
Mike (after considering for a while): “The second one.”
Xiaobai: Starts playing Better Call Saul.
User Experience Map
Any voice task, including the example above, will go through the following journey:
Anatomy of older UI
Our voice GUI consists of two parts: basic area (required) and content area (optional).
Older UI design
Full scrim: such as the video domain, where movie posters fly in from all directions.
Continuous dialog: a bottom bar without hint text.
Half scrim: used in most domains, like weather. This scrim is a half-screen blue gradient mask plus a black semi-translucent container, and the background is visible under the scrim.
Revision requirements
Based on usability tests with 50 users, feedback from across the country, and expert evaluation, I summarized all the requirements:
State indicating motion
Problems: Although it looks cool, the motion has many frames and uses a lot of memory. The state motion in continuous dialog is inconsistent with the motions in regular mode.

Analysis: Reduce the space the motion occupies, as with the scrim. Make it simpler and more intuitive. Ensure feasibility.
Basic area
Problems: The Basic Area obscures too much of the background, and the large area of blue distracts from the visual focus.

Analysis: Reduce the Basic Area to just enough space to contain its elements. The scrim should either not obscure the background or cover it completely. Desaturate the scrim.
Hint text
Problem: Hints are not easy to notice.

Analysis: The hint text is already in the foreground, so we should look at the color contrast or the overall layout: streamline UI elements and focus on information.
Video domain
Problems: The animation of posters flying in is visually chaotic, offers no natural path for the focus to move along, and is difficult for the front end to maintain. The focus is not obvious.

Analysis: 1. Design a new layout for posters, and remove the entrance animation to focus on the resource itself. 2. Unify the design specifications.
Non-video domains
Problems: The half scrim has redundant layers, which clutter the interface over a playing background. This layout also scales poorly.

Analysis: Redesign the Content Area’s framework for all domains.
Others
Problems: Empty states are not uniform. There are some minor problems in specific domains, such as the out-of-focus state in the face recognition domain not being obvious.

Analysis: Unify all empty states such as loading and error. Troubleshoot the small issues in each domain one by one.
Revision goals
Re-layout
Since the most important and frequently used function of a TV is watching videos, the video domain is given priority. Therefore, the sequence of re-layout is: Basic Area > Video domain > non-video domains.

In addition, the scrim needs to present the content in an optimal way: it should neither waste space nor appear crowded.
Unify motions
Unify the continuous dialog state with the 5 existing states, and reserve room for potential future states.

Design new motions if the new layout requires them.
Build design specifications
Establish atomic design guidelines for UI elements. Make reusable and scalable components and templates.

Customize layout templates for the media resources provided by existing content partners, while ensuring the templates adapt to potential content in the future.
Revision process
Basic area
After competitive analysis, I worked out 7 alternatives for the Basic Area's layout:
A. Fan
Advantages
1. It inherits the most from the older UI.
2. It leaves the left and right background corners less obscured.
3. It exposes more vertical space, which is precious on TV.
Pain points
1. The hints sit too low to be noticed (we tried it).
2. The middle part of the screen is widely covered.
3. It does not suit continuous dialog or streaming recognition of the user's command, since center alignment makes the text jitter.
B1. Bottom rectangle
Advantages
1. It obscures the background the least.
2. Browsing from left to right conforms to people's reading habits.
3. The container color can change with the scene, adding variety and interest.
4. It is also a popular layout.
Pain points
1. It inherits the least from the older UI.
2. The state-indicating motion is not eye-catching.
3. The spacing between the text on the left and the hints on the right must be defined so that they don't collide.
B2. Feathered bottom rectangle
Advantage
It feels more high-tech and inherits more of the older visual style than B1.
Pain point
To ensure legibility, the container actually has to be taller, so it covers more area.
C1. Two-line bottom rectangle
Advantages
1. Streaming recognition is easy to display.
2. Left alignment looks compact and tidy.
Pain points
1. The edge of the tall container rigidly splits the entire interface.
2. Empty space on the right side is a waste.
C2. Feathered two-line bottom rectangle
Advantage
It feels more high-tech and inherits more of the older visual style than C1.
Pain point
The container's translucency is hard to define: too little transparency is fine for reading but hides the background, while too much hurts legibility and requires more height. In short, it obscures too much.
D. Sidebar dialog
Advantage
It shows the dialog in an intuitive way.
Pain points
1. The screen is visually unbalanced.
2. Much of the container area is unnecessary.
3. The hint text has limited space.
4. Right alignment makes the streaming recognition text jitter.
5. It’s an outdated layout.
D. Sidebar dialog
Advantage
Same as D, but more balanced.
Pain points
1. Same as D, but it obscures more.
2. Poor readability, since elements are scattered too far to the left and right.
How could I decide among so many advantages and pain points? I categorized all the influencing factors, quantified each layout's performance according to universal interaction principles, and then worked out the priority:
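The scoring step can be sketched as a weighted sum per layout. This is only an illustration: the original scoring table is not reproduced in the text, so the factor names, weights, and scores below are hypothetical placeholders, not the values used in the project.

```python
# Hypothetical factors and weights (higher weight = higher priority).
# None of these numbers come from the project; they only show the method.
WEIGHTS = {"background_visibility": 3, "hint_noticeability": 2, "inheritance": 1}

# Illustrative 1-5 scores for two of the alternatives (made-up values).
SCORES = {
    "A. Fan": {"background_visibility": 2, "hint_noticeability": 1, "inheritance": 5},
    "B1. Bottom rectangle": {"background_visibility": 5, "hint_noticeability": 4, "inheritance": 1},
}


def weighted_total(scores, weights):
    """Sum each factor's score multiplied by that factor's weight."""
    return sum(weights[factor] * scores[factor] for factor in weights)


def rank_layouts(all_scores, weights):
    """Return layout names ordered from highest to lowest weighted total."""
    return sorted(
        all_scores,
        key=lambda name: weighted_total(all_scores[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_layouts(SCORES, WEIGHTS):
        print(name, weighted_total(SCORES[name], WEIGHTS))
```

Keeping the weights separate from the scores makes it easy to see how the ranking shifts when the priority of a factor changes.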
In the end we chose B1, the bottom rectangle, and added a gradient transition like B2's. To solve B1's pain points, we needed to design a new set of state-indicating motions and define the spacing between the text on the left and on the right. Technically, we were able to keep up with the user’s speech: for every sentence the user says, we display one sentence.
* This blue figure was designed by my colleague.
In continuous dialog mode, the user can skip waking Xiaobai and talk to it directly. In the waiting state, Xiaobai shrinks into the corner, where it does not occupy the screen but still connects seamlessly with the regular states.
The waiting state of continuous dialog
The listening state of continuous dialog
Content area - video domain
A TV screen is like a big canvas with plenty of room to play. The layout of video posters must take these factors into account:
Container’s proportion
Alternatives: half-screen, full-screen, and half to full-screen.
Navigation style
A grid is more suitable than a masonry layout, tabs, or multi-row swim lanes, because our poster size can be unified and we don’t have themed content at the moment.
Poster’s aspect ratio
We may use posters in landscape or portrait.
Number of posters
Consider whether the posters exceed one screen, and how large each poster is.
After competitive analysis, I came up with 4 possible layouts:
1. Full-screen grids
6 posters per row, scroll down to turn pages
Analysis
It totally obscures the background, but provides a lot of options.
2. Half-screen swim lane
10+ per row, scroll right to turn pages
Analysis
Although it obscures less and displays more items, the interaction is unfriendly for both remote control and voice, and suits an air mouse instead. LG has a world-leading air mouse, so they adopt this layout.
3. Half-screen swim lane
6.5 posters per row, scroll right
Analysis
It obscures less and holds a moderate number of items, which is nearly satisfactory.
4. Poster in landscape
3.5 posters per row, scroll right, half or full-screen
Analysis
The landscape view suits videos with long titles, such as Chinese short videos and English videos. For Chinese long videos, the portrait view is more suitable.
Behavioral data indicated that users’ video intentions fall into 2 types: users who know what they want to watch (“Play Game of Thrones”), and users who are unsure and ask Xiaobai for advice (“Recommend me a TV drama”). We call them ordinary recommendation and personalized recommendation respectively. Since the former returns little content in most cases, it uses a half-screen container, while the latter uses full-screen. Both present no more than 10 posters:
Ordinary recommendation
The target content is precise and small in quantity. A half-screen container does not completely interrupt the background playback.
Personalized recommendation
10 targeted posters. Since the user is more inclined to browse, the full-screen layout needs no paging and the whole process can be completed without the remote control.
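The rule described above (half-screen for ordinary recommendation, full-screen for personalized recommendation, at most 10 posters in both) could be expressed roughly as follows. The function name, intent labels, and data shape are assumptions made for illustration, not the production implementation.

```python
from dataclasses import dataclass

MAX_POSTERS = 10  # from the text: both modes present no more than 10 posters


@dataclass
class VideoLayout:
    container: str  # "half-screen" or "full-screen"
    posters: list


def choose_video_layout(intent, results):
    """Pick a container for the video domain.

    `intent` is a hypothetical label: "ordinary" when the user names a
    specific title, "personalized" when the user asks for advice.
    """
    if intent not in ("ordinary", "personalized"):
        raise ValueError(f"unknown intent: {intent!r}")
    container = "half-screen" if intent == "ordinary" else "full-screen"
    # Cap the result list so neither layout needs paging.
    return VideoLayout(container=container, posters=list(results[:MAX_POSTERS]))


if __name__ == "__main__":
    layout = choose_video_layout("personalized", [f"poster-{i}" for i in range(15)])
    print(layout.container, len(layout.posters))
```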
Content area - non-video domains
This is more complicated because the resource specifications are so diverse. Nevertheless, they are all permutations and combinations of basic elements: text and graphics. I built them up gradually from molecules to organisms according to my modified atomic design system:
Small molecules
include inseparable graphics, text and components.
The sizes of text, components, and small images are similar, so I treat them as one type when arranging them:
Big molecules
are the combination of small molecules.
Organisms
are the repetition of molecules.
Like playing with LEGO, I combined and repeated the images according to their size. This step lets molecules be permuted in many ways to form a compact organism. Moreover, a secondary container expanded by the remote may be added in the future, and it will be fully compatible.
I actually obtained many more molecules. This method ensures that all 20+ domains of the content area have corresponding layouts and remain expandable. The following table shows these domains.
Content area - background (Container)
In the real world, content may appear in any of the previous molecules or organisms, which calls for specialized containers: a large container holding few items wastes space, while a small container holding many items looks crowded and messy. There are 5 alternative containers to choose from:
a. Card
The card itself is the container, and it is small.
Advantage
It obscures the background the least.
Pain points
It is not prominent and easily looks messy.
Solution: use ample inner padding in a bright, clean color.
Suitable for: small molecules, little content.
b. Floating translucent container
The container is a black translucent rectangle, as in the older UI.
Advantage
It obscures less.
Pain point
It can look very messy.
Suitable for: little to medium content.
c. Half-screen feathered container
A dark gradient scrim, with a responsive height.
Advantage
It feels strongly high-tech and inherits the older visual style.
Pain points
It obscures more and easily looks messy. Make sure the content keeps its distance from the changing background.
Suitable for: medium content, repeated arrangement.
d. Half-screen opaque container
A dark opaque rectangle with responsive height.
Advantage
Prominent, never messy.
Pain point
The edge of the container should not split the interface too sharply.
Suitable for: medium content, repeated arrangement.
e. Full-screen container
Opaque or nearly opaque full-screen scrim, totally obscuring background.
Advantage
Most prominent, never messy.
Pain point
It obscures the most.
Suitable for: complex combinations of multiple items, interfaces that expand vertically, and large amounts of content.
Score all layouts by priority:
It can be seen that scheme d (half-screen opaque container) is the best. In the end we adopted d and e: two container types are few enough to avoid fragmentation, yet together they can hold any amount of content, from little to much.
The height of the half-screen container adapts to the content until it reaches the upper limit, after which a slider is used to turn pages.
Measurements on a real TV showed that at least 42% of the underlying background must remain exposed.
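As a rough sketch of this sizing rule, the container height can be clamped so that 42% of the screen height stays visible, with paging once the content exceeds that limit. The 1080 px screen height, the assumption that the 42% is measured vertically on a full-width container, and the function name are all illustrative.

```python
import math

MIN_BACKGROUND_EXPOSED = 0.42  # from the text: at least 42% must stay visible


def half_screen_container(content_height, screen_height=1080):
    """Return (container_height, page_count) in pixels and pages.

    Assumes a full-width container, so the exposed share of the background
    equals the share of the screen height left uncovered.
    """
    max_height = math.floor(screen_height * (1 - MIN_BACKGROUND_EXPOSED))
    # The container grows with the content until it hits the upper limit.
    height = min(content_height, max_height)
    # Beyond the limit, the remaining content is reached by paging.
    pages = max(1, math.ceil(content_height / max_height))
    return height, pages


if __name__ == "__main__":
    print(half_screen_container(400))
    print(half_screen_container(1500))
```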
Face recognition
When the character has no corresponding media content, only use the half-screen container to display recognized characters.
When the character has related content, use the full-screen container to show it.
Style and UI Design
At this point, the basic area, the layouts of all domains, and the containers are settled. The next step is to put them all together, considering in particular whether the content has a focus state or needs paging, which depends on the specific domain and scene.

Coincidentally, our TV design guidelines were being built while I worked on this project, so the specifications for color, typography, grids, and focus came from those guidelines. You are welcome to read the color guidelines in another work in this portfolio.
UIs
Tests and conclusions
User preference test
Our department invited 40 users to try the new and older versions, then conducted preference tests and collected feedback. Some of the feedback is shown below:
80% of the 40 users preferred the new version, 12.5% thought the two were the same, and 7.5% preferred the old one. Specifically, UI & interaction was the biggest improvement (12%), followed by motion and color & containers (2% and 7% respectively).
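For readers who prefer headcounts, the preference percentages convert directly into numbers of users. The snippet below is just that arithmetic; the dictionary labels are shorthand introduced here.

```python
TOTAL_USERS = 40

# Percentages as reported in the preference test.
PREFERENCE_PERCENT = {"new version": 80, "no difference": 12.5, "old version": 7.5}


def to_headcounts(percentages, total):
    """Convert percentage shares of `total` into whole-user counts."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


counts = to_headcounts(PREFERENCE_PERCENT, TOTAL_USERS)

if __name__ == "__main__":
    for label, count in counts.items():
        print(f"{label}: {count} of {TOTAL_USERS} users")
```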

The most common positive comments about the new version were: easy to view, high space utilization, and better layout.

The most common complaints were: containers too dark, a dull color scheme, and motion that was neither attractive nor obvious enough. I immediately adjusted the container colors, and what you see is the optimized result.

Follow-up compatibility tests showed that this new structure can easily support various needs such as festival skin changes, mini programs, third-party apps, a guide for continuous dialog, etc.
Conclusions and learnings
1 / The biggest takeaway is that I managed to break the design down to its most granular elements while applying my revised Atomic Design System in a real project. I now know that a surface built with this meticulous method can be stable, comprehensive, and extensible.

2 / It’s often emphasized that a big screen should bring a sense of immersion with edge-to-edge content. That is true, which is why TV interfaces usually use overlaying structures. However, overlays often lead to legibility issues, so the content must be prioritized. We should always consider the changing background behind the foreground information and avoid clutter.

3 / In this work, I mainly analyzed two-dimensional surfaces, not three- or four-dimensional factors such as secondary surfaces and timing. Although the current structure is ready for unknown new needs, I need further study in that direction.