Summary
Early in the 2010s, Nuance Communications had internal research teams working on natural language processing (NLP) projects to make it possible to train natural language understanding (NLU) and automatic speech recognition (ASR) engines more quickly with smaller data sets, while retaining high accuracy.
The product launched publicly as Nuance Mix midway through the decade. When Cerence was spun out as an independent company in 2019, its version of Mix was rebranded as Cerence Studio. This included the tools for NLU (with ASR functionality) and Dialog (with natural language generation (NLG)), a sample mobile app, an SDK, and prototypes for a design tool.
In the early days of Mix, our team operated much like an internal start-up. I was the sole UX researcher, the front-end product manager/scrum master, and the technical writer for documentation, as well as one of the UX architects and graphical UX designers. By the launch of Cerence Studio after the spin-off from Nuance, I continued to conduct UX research, but had switched to overseeing UX design and UI design.
Problem
NLP technologies are traditionally complex and best understood by speech scientists and machine learning experts, and most advancements in accuracy still relied on large models that took a great deal of time to train. The underlying engines met the research goals of faster training on smaller data sets with high accuracy, but the middleware and front end built on top of them existed only in prototype form.
These projects aimed to make those NLP technologies available to a larger group of internal users without those backgrounds, and to democratize the technologies so that outside developers and designers could speech-enable anything they wanted. The tools also needed to manage deployment flows for various models, collect usage data, support localization, and more.
The central problem was always taking technologies that had traditionally been handled by experts in code, and making them accessible both through a web-based GUI and via APIs.
Design
The internal research teams provided us with prototype interfaces for the NLU and Dialog tools, as well as the sample mobile application. I was part of a small team of designers who took these prototypes and created and evaluated multiple iterations of an improved UX, with support from the wider design organization.
[Mix and Studio logos, small, to right]
For NLU, we created a layout that allowed users to structure their data hierarchically, with user utterances (sample sentences) tagged with concepts (slots/mentions) as part of intents. Think of an intent like a function in code, with concepts as the variables passed in from user input. For example, in a coffee-ordering app the user might say, “Give me a large caramel latte”, resulting in the intent “order_coffee” and the concepts “size: large”, “flavor: caramel”, and “type: latte”.
[Image of NLU interface]
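To make the intent/concept structure concrete, here is a minimal sketch of how an annotated training utterance could be represented. The type and field names (TrainingSample, ConceptAnnotation, span) are illustrative assumptions, not the actual Mix/Studio data format:

```typescript
// Hypothetical shape for an annotated NLU training sample. Illustrative only;
// this is not the actual Mix/Studio data format.
interface ConceptAnnotation {
  concept: string;        // e.g. "size"
  value: string;          // e.g. "large"
  span: [number, number]; // character offsets [start, end) in the utterance
}

interface TrainingSample {
  utterance: string; // the raw sample sentence
  intent: string;    // the "function" this utterance maps to
  concepts: ConceptAnnotation[];
}

const sample: TrainingSample = {
  utterance: "Give me a large caramel latte",
  intent: "order_coffee",
  concepts: [
    { concept: "size",   value: "large",   span: [10, 15] },
    { concept: "flavor", value: "caramel", span: [16, 23] },
    { concept: "type",   value: "latte",   span: [24, 29] },
  ],
};
```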
For Dialog, the major challenge was allowing developers and designers to visualize how a dialog engine must guide a user through a dialog, including prompting for more information and responding using NLG. We used decision tables, allowing users to set up a logical flow for the engine to execute. The tables supported multi-modal input, including voice and text, as well as facets (e.g., information that could be used to load images and more). The structure was not flat or two-dimensional; many factors were nested within the table and needed to remain accessible and usable.
[Image of Dialog interface]
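As a rough illustration of the decision-table idea, the sketch below models a fragment of a coffee-ordering flow: one row prompts for a missing concept, another responds once everything is filled. The row structure, field names, and bracketed placeholder syntax are all assumptions made for the sake of the example, not the product's actual format:

```typescript
// Hypothetical sketch of one dialog decision table. Illustrative only.
interface DecisionRow {
  // Concept values the dialog state must match; "MISSING" means not yet filled.
  conditions: Record<string, string | "MISSING">;
  // Multi-modal output: a spoken prompt plus optional display text.
  prompt?: { tts: string; displayText?: string };
  // Facets: extra data the client can use, e.g. to load an image.
  facets?: Record<string, string>;
  // Next table to evaluate after this row's actions run.
  next?: string;
}

// Rows are evaluated top to bottom and the first matching row wins (a common
// decision-table semantic): prompt for a missing size, otherwise confirm.
const orderCoffeeTable: DecisionRow[] = [
  {
    conditions: { size: "MISSING" },
    prompt: { tts: "What size would you like?", displayText: "Choose a size" },
    next: "order_coffee", // re-evaluate once the user answers
  },
  {
    conditions: {}, // all required concepts filled
    prompt: { tts: "Your [size] [flavor] [type] is on its way." },
    facets: { image: "coffee_cup.png" },
  },
];
```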
I created personas based on user research, supporting our broad base of intended users, ranging from hackathon-style developers to speech science experts to voice designers, and more.
[Persona images]
Finally, I worked with and oversaw two UX designers on early designs for a design tool that would interface with the other tools, allowing designers to create flows that were linked to the NLU and dialog models. These concepts were not productized.
[Design layer image]
Research
Over a dozen interviews were conducted with internal subject-matter experts (SMEs) in speech science, machine learning and AI, NLP (NLU and dialog) development, and dialog design. Some of these SMEs were also put in front of the prototypes and observed while they attempted to learn and use the tools. This allowed me to create UX and product requirements, which fed into the early designs as we went from prototypes to our beta product.
Throughout the process, I conducted user-centered design sessions with prospective users who matched our personas. These sessions allowed us to poke holes in our concepts for both tools, and to iterate and refine.
With both prototypes and beta versions, I conducted user studies that served simultaneously as onboarding and usability testing. New users were given access to documentation ahead of the session, then observed with minimal interaction from us as they explored and began using the tools for the first time. When stuck, they could ask questions. Once they completed some basic tasks, they were given training much as new customers would be, followed by an interview. Results were analyzed and fed back into our design process, allowing us to continue to refine.
During the beta phase, we attended numerous hackathons, trained students on the tools, and observed and interviewed them about their experiences.
After launch, we would periodically observe new employees as they started to use the tools for the first time to measure our improvements for first-time use. Additionally, a small subset of users would participate in follow-up studies to evaluate how their usage of the tool had evolved over time.
Outcome
Cerence Studio is now used by multiple large automakers and a fitness equipment company to manage some of their domains, and customize elements of their user experience. A growing number of internal projects also use Studio, allowing a larger range of developers and designers to work on Cerence projects without being speech science or machine learning experts.