Project 2

Industry background

Dialogue is a human instinct, and a voice-enabled device can feel more human. However, even without technical limitations, human-machine interaction is not just a single mode of listening and speaking, because that is not what natural communication is like. At present, in home settings, users’ interaction with a TV is still a simultaneous process involving three types of information: visual, audio, and tactile, which correspond to viewing, listening/speaking, and using the remote control respectively. For UX designers, it’s worth thinking about how a TV can carry so many perceptual channels at the same time.

The older UI

Voice experience design is a complex system, covering user research, functions, content, interaction, and technical implementation, while GUI is an auxiliary model.

Key concepts

VUI

Voice UI is an interaction model where a human interacts with a machine and performs a set of tasks at least in part by using voice.

Domain

divides user tasks into broad categories, such as video, music, encyclopedia, etc.

Continuous Dialog

is a mode that saves users from having to wake the voice assistant before every sentence, whereas in regular mode they need to wake it every time.
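The difference between the two modes can be sketched as a small state holder. This is only an illustrative model, not the production logic: the `VoiceSession` class, its fields, and the `hear` method are hypothetical names, and the wake phrase is taken from the example dialogue in this case study.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "changhong xiaobai"  # wake phrase borrowed from the example dialogue


@dataclass
class VoiceSession:
    """Toy model of wake-word handling (hypothetical, for illustration)."""
    continuous: bool = False  # True = continuous dialog mode
    awake: bool = False       # whether the assistant is currently listening

    def hear(self, utterance: str) -> Optional[str]:
        """Return the command to process, or None if the utterance is ignored."""
        text = utterance.strip()
        if text.lower().startswith(WAKE_WORD):
            self.awake = True
            # Drop the wake phrase and any separator that follows it.
            text = text[len(WAKE_WORD):].lstrip(" ,")
        if not self.awake or not text:
            return None
        if not self.continuous:
            # Regular mode: the assistant goes back to sleep after each command.
            self.awake = False
        return text


regular = VoiceSession(continuous=False)
regular.hear("Changhong Xiaobai, recommend me some TV dramas.")
print(regular.hear("The second one."))   # ignored: no wake word in regular mode

dialog = VoiceSession(continuous=True)
dialog.hear("Changhong Xiaobai, recommend me some TV dramas.")
print(dialog.hear("The second one."))    # accepted without waking again
```

In this sketch the only difference between the modes is whether the session falls asleep after a command, which is why a follow-up such as “The second one” works only in continuous dialog.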

Classic scene

Mike: “Changhong Xiaobai, recommend me some TV dramas.”
Xiaobai: “See if there are any of these you like,” while displaying some popular TV dramas.
Mike: (after considering for a while) “The second one.”
Xiaobai: Starts playing Better Call Saul.

User Experience Map

Any voice task, including the example above, will go through the following journey:

User Experience Map (PDF)

Anatomy of older UI

Our voice GUI consists of two parts: basic area (required) and content area (optional).

Older UI design

Full scrim: such as the video domain, where movie posters fly in from all directions.

Continuous dialog: a bottom bar without hint text.

Half scrim: used in most domains, like weather. This scrim is a half-screen blue gradient mask plus a black semi-translucent container, and the background is visible under the scrim.

Revision requirements

Based on usability tests with 50 users, feedback from across the country, and expert evaluation, I summarized all the requirements:

State indicating motion

Problems: Although it looks cool, the motion has many frames and consumes a lot of memory. The state motion in continuous dialog is inconsistent with the motions in regular mode.

Analysis: Reduce the space the motion occupies, as with the scrim. Make it simpler and more intuitive. Ensure feasibility.

Basic area

Problems: The Basic Area obscures too much of the background, and the large blue area distracts from the visual focus.

Analysis: Reduce the Basic Area to just the space needed to contain its elements. The scrim should either leave the background unobscured or cover it completely. Desaturate the scrim.

Hint text

Problem: Hints are not easy to notice.

Analysis: The hint text is already in the foreground, so we may consider the color contrast or the overall layout: streamline UI elements and focus on information.

Video domain

Problems: The animation of posters flying in is visually chaotic, offers no natural path for moving the focus, and is difficult for the front end to maintain. The focused item is not obvious.

Analysis: 1. Design a new layout for posters, removing the entrance animation and focusing on the media resources. 2. Unify the design specifications.

Non-video domains

Problems: The half scrim has redundant layers, which clutter the interface over a playing background. This layout also scales poorly.

Analysis: Redesign the Content Area’s framework for all domains.

Others

Problems: Empty states are not uniform. There are also minor problems in specific domains; for example, the out-of-focus state in the face recognition domain is not obvious.

Analysis: Unify all empty states, such as loading and error. Troubleshoot the small issues in each domain one by one.

Revision goals

Re-layout

Since the most important and frequently used function of a TV is watching videos, the video domain is given priority. Therefore, the sequence of re-layout is: Basic Area > video domain > non-video domains.

In addition, the scrim needs to present the content in an optimal way, neither wasting space nor appearing crowded.

Unify motions

Unify the continuous dialog state with the 5 existing states, and reserve room for potential states in the future.

Design new motions where the new layout requires them.

Build design specifications

Establish atomic design guidelines for UI elements. Make reusable and scalable components and templates.

Customize layout templates for the media resources provided by existing content partners, while ensuring the templates can adapt to potential content in the future.

Revision process

Basic area

After competitive analysis, I worked through 7 alternatives for the Basic Area's layout:

A. Fan

Advantages

1. It inherits the most from the older design.
2. Leaves the left and right corners of the background less obscured.
3. Exposes more vertical space, which is precious on TV.

Pain points

1. The hints sit too low to be noticed (we tried it).
2. The middle part of the screen is widely covered.
3. Unsuitable for continuous dialog and for streaming recognition of the user's command, since center alignment makes the text jitter.

B1. Bottom rectangle

Advantages

1. Obscures the background the least.
2. Browsing from left to right conforms to people's reading habits.
3. The container color can change with the scene, which adds variety and interest.
4. It is also a popular layout.

Pain points

1. Inherits the least from the older design.
2. The state-indicating motion is not eye-catching.
3. The spacing between the text on the left and the hints on the right needs to be defined so that they do not collide.

B2. Feathered bottom rectangle

Advantage

Conveys a stronger sense of technology and more visual inheritance than B1.

Pain point

To ensure legibility, the container actually needs more height and covers a larger area.

C1. Two-line bottom rectangle

Advantages

1. Streaming recognition is easy to display.
2. Left alignment looks compact and tidy.

Pain points

1. The tall container’s edge rigidly splits the entire interface.
2. The empty space on the right side is wasted.

C2. Feathered two-line bottom rectangle

Advantage

Conveys a stronger sense of technology and more visual inheritance than C1.

Pain point

The container's translucency is hard to define: too little opacity hurts readability, while too much requires more height. In short, it obscures too much.

D. Sidebar dialog

Advantage

It shows the dialog in an intuitive way.

Pain points

1. The screen is visually unbalanced.
2. A lot of unnecessary container area.
3. Hint text has limited space.
4. The right-side position makes the streaming recognition text jitter.
5. It’s an outdated layout.

D. Sidebar dialog

Advantage

Same as D, but more balanced.

Pain points

1. Same as D, and it obscures more than D.
2. Poor readability, since elements are scattered too far to the left and right.

How might I decide with so many advantages and pain points? I categorized all the influencing factors, quantified each layout's performance against universal interaction principles, and then worked out the priority:

Finally we chose B1, the bottom rectangle, and added some gradient transition like B2. To solve the pain points of B1, we needed to design a new set of state-indicating motions and define the spacing between the text on the left and on the right. Technically, we were able to keep up with the user’s speech: each sentence the user says is displayed as one sentence.
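The weighted comparison described above can be sketched roughly as follows. The factor names, weights, and 1 to 5 ratings below are placeholders invented for illustration; the actual factors and numbers used in the project are not reproduced here, and only three of the seven alternatives are shown.

```python
from typing import Dict, List

# Hypothetical factors and weights (must sum to 1.0); not the project's real values.
WEIGHTS: Dict[str, float] = {
    "background_visibility": 0.4,
    "hint_noticeability": 0.3,
    "streaming_text_stability": 0.2,
    "visual_inheritance": 0.1,
}

# Placeholder ratings on a 1-5 scale, loosely echoing the pros and cons listed above.
RATINGS: Dict[str, Dict[str, int]] = {
    "A. Fan": {"background_visibility": 3, "hint_noticeability": 2,
               "streaming_text_stability": 1, "visual_inheritance": 5},
    "B1. Bottom rectangle": {"background_visibility": 5, "hint_noticeability": 4,
                             "streaming_text_stability": 5, "visual_inheritance": 1},
    "D. Sidebar dialog": {"background_visibility": 2, "hint_noticeability": 2,
                          "streaming_text_stability": 2, "visual_inheritance": 3},
}


def weighted_score(ratings: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of rating * weight over all factors."""
    return sum(weights[factor] * ratings[factor] for factor in weights)


def rank_layouts(all_ratings: Dict[str, Dict[str, int]],
                 weights: Dict[str, float]) -> List[str]:
    """Layout names ordered from highest to lowest weighted score."""
    return sorted(all_ratings,
                  key=lambda name: weighted_score(all_ratings[name], weights),
                  reverse=True)


for name in rank_layouts(RATINGS, WEIGHTS):
    print(f"{name}: {weighted_score(RATINGS[name], WEIGHTS):.1f}")
```

The value of this kind of matrix is less in the exact numbers than in forcing every alternative to be judged against the same prioritized factors.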

Content area - video domain

A TV screen is like a big canvas with room for free play. The layout of video posters must take these factors into account:

Container’s proportion

Alternatives: half-screen, full-screen, and half to full-screen.

Navigation style

Instead of masonry layout, tabs, or multi-row swim lanes, grids are suitable because our poster size can be unified and we don’t have themed content at the moment.

Poster’s aspect ratio

We may use posters in landscape or portrait.

Number of posters

Consider whether the posters exceed one screen, as well as the size of each poster.

After competitive analysis, I came up with 4 possible layouts:

1. Full-screen grids

6 posters per row, scroll down to turn pages

Analysis

It totally obscures the background, but provides a lot of options.

2. Half-screen swim lane

10+ per row, scroll right to turn pages

Analysis

Although it obscures less and displays more items, the interaction is unfriendly for both remote control and voice, and suits an air mouse instead. LG has a world-leading air mouse, so they adopt this layout.

3. Half-screen swim lane

6.5 posters per row, scroll right

Analysis

It obscures less and contains a moderate number of items, which is nearly satisfactory.

4. Poster in landscape

3.5 posters per row, scroll right, half or full-screen

Analysis

The landscape view suits videos with long titles, such as Chinese short videos and English videos. For Chinese long videos, the portrait view is more suitable.

Behavioral data indicated that users’ video intentions fall into 2 types: they know exactly what they want to watch (“Play Game of Thrones”), or they are undecided and ask Xiaobai for advice (“Recommend me a TV drama”). We call these ordinary recommendation and personalized recommendation respectively. Since the former returns little content in most cases, it uses a half-screen container, while the latter uses full-screen; both present no more than 10 posters:

Ordinary recommendation

The target content is precise and small in quantity. A half-screen container will not completely interrupt the background playback.

Personalized recommendation

10 targeted posters. Since the user is more inclined to browse, the full-screen layout needs no paging and requires no remote control throughout the process.
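The rule for picking a container can be written down in a few lines. This is a minimal sketch under stated assumptions: the intent labels `"ordinary"` and `"personalized"`, the `PosterPanel` type, and the `build_panel` function are hypothetical names, and only the half-screen/full-screen mapping and the 10-poster cap come from the text.

```python
from dataclasses import dataclass
from typing import List

MAX_POSTERS = 10  # both recommendation types present no more than 10 posters


@dataclass
class PosterPanel:
    container: str       # "half-screen" or "full-screen"
    posters: List[str]   # titles to display, already capped


def build_panel(intent: str, results: List[str]) -> PosterPanel:
    """Choose the container from the intent type and cap the poster count."""
    if intent == "ordinary":          # e.g. "Play Game of Thrones"
        container = "half-screen"
    elif intent == "personalized":    # e.g. "Recommend me a TV drama"
        container = "full-screen"
    else:
        raise ValueError(f"unknown intent: {intent!r}")
    return PosterPanel(container=container, posters=results[:MAX_POSTERS])


panel = build_panel("personalized", [f"Drama {i}" for i in range(1, 26)])
print(panel.container, len(panel.posters))
```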

Content area - non-video domains

This is more complicated because of the diverse resource specifications. Nevertheless, they are all permutations and combinations of basic elements: text and graphics. I built them up gradually from molecules to organisms according to my modified atomic design system:

Small molecules

include inseparable graphics, text and components.

Text, components, and small images are similar in size, so I treat them as one type when arranging them:

Big molecules

are the combination of small molecules.

Organisms

are the repetition of molecules.

Like playing with LEGO, I combined and repeated the images according to their size features. This step allows molecules to be permuted in rich ways to form a compact organism. Moreover, there may in the future be a secondary container expanded with the remote, and the structure will be fully compatible with it.
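The three levels above map naturally onto nested data structures. The following is a rough sketch, not the actual design system: the class names mirror the terms used here, while the `day_card` template, the `build_organism` helper, and the weather rows are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SmallMolecule:
    kind: str      # "text", "image", or "component"
    content: str


@dataclass
class BigMolecule:
    parts: List[SmallMolecule]   # a combination of small molecules


@dataclass
class Organism:
    items: List[BigMolecule]     # a repetition of molecules


def day_card(day: str, icon: str, temperature: str) -> BigMolecule:
    """Hypothetical weather template: one icon plus two lines of text."""
    return BigMolecule(parts=[
        SmallMolecule("image", icon),
        SmallMolecule("text", day),
        SmallMolecule("text", temperature),
    ])


def build_organism(rows: Sequence[Tuple[str, ...]],
                   template: Callable[..., BigMolecule]) -> Organism:
    """Repeat one big-molecule template over every data row."""
    return Organism(items=[template(*row) for row in rows])


forecast = build_organism(
    [("Mon", "sunny.png", "24°C"), ("Tue", "rain.png", "19°C")],
    day_card,
)
print(len(forecast.items), [p.kind for p in forecast.items[0].parts])
```

Because an organism only knows how to repeat a template, supporting a new domain in this sketch means writing one new template function rather than a new layout.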

I actually obtained many more molecules. This method ensures that all 20+ domains of the content area have corresponding layouts and that the system remains expandable. The following table shows these domains.

Tests and conclusions

User preference test

Our department organized 40 users to try the new and older versions, then conducted preference tests and collected feedback. Some of the feedback is shown below:

80% of the 40 users preferred the new version, 12.5% thought the two versions were the same, and 7.5% preferred the old one. Specifically, UI & interaction was the biggest improvement (12%), followed by motion and color & containers (2% and 7% respectively).

The most common positive comments about the new version were: easy to view, high space utilization, and better layout.

The most common complaints were: containers too dark, a dull color scheme, and motion that was neither attractive nor obvious enough. I immediately adjusted the container colors, and what you see is the optimized result.

Follow-up compatibility tests showed that this new structure can easily support various needs such as festival skin changes, mini programs, third-party apps, a guide for continuous dialog, etc.

Conclusions and learnings

1 / The biggest takeaway is that I managed to dissect the design down to its most granular elements and apply my revised Atomic Design System in a real project. I now know that a surface built with this meticulous method can be stable, comprehensive, and extensible.

2 / It’s often emphasized that a big screen should bring a sense of immersion with edge-to-edge content. That is true, which is why TV interfaces usually use overlay structures. However, overlays often lead to legibility issues, so the content must be prioritized. We should always consider the changing background behind the foreground information and avoid clutter.

3 / In this work, I mainly analyzed two-dimensional surfaces and did not cover three- or four-dimensional factors such as secondary surfaces and timing. Although the current structure is ready for unknown new needs, I need to study further in that direction.
