“We’re at the beginning of the end of the first phase of voice,” said James Poulter, CEO of Vixen Labs.
The beginning of the end of the first phase might be optimistic for voice, but that did not dampen the enthusiasm at the second annual Voice.Summit in New Jersey this week. Another speaker likened voice’s current stage to the release of the first iPhone: if you remember, at that time everyone else had feature phones, and Android didn’t exist.
Not surprisingly, this year’s summit reminded me a bit of the early days of mobile development: more talk about how than why, more developer content than business content, and just an inkling that we really don’t know what we don’t know about where voice will go. One of the great things about this conference is that it’s a cross-industry event. There was a specific healthcare track, with topics ranging from best practices for designing voice care plans to the ethical considerations of voice, AI, and bots in healthcare, but there was also the opportunity to learn from other industries, and directly from the key technology leaders in the space.
The following are some highlights, learnings, and implications for healthcare.
Dave Isbitski, Alexa Developer Evangelist, kicked off the summit with a couple of announcements for Alexa. The first was skill connections, which enable one skill to invoke another: for example, if you want to print content from a skill, you can call another skill designed for printing, like the HP skill. While this is currently limited to named skills, it has huge value in healthcare, where every skill shouldn’t have to recreate medication lists or a full lexicon of disease education; it would be better to call on a proven authority like Mayo Clinic or WebMD for more information.
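To make the hand-off idea concrete, here is a rough sketch of the directive a skill’s response might carry to delegate a task to another skill. The general shape follows Alexa’s Skill Connections feature, but the specific task URI, payload fields, and token below are illustrative assumptions, not a verified API reference.

```python
# Hypothetical sketch: a health-education skill handing off printing of a
# medication list to a print-capable skill via a Skill Connections directive.
# The task URI and input fields are illustrative, not a verified API.

def build_print_handoff(pdf_url, title):
    """Build a Connections.StartConnection-style directive for a print task."""
    return {
        "type": "Connections.StartConnection",
        "uri": "connection://AMAZON.PrintPDF/1",  # illustrative task URI
        "input": {
            "@type": "PrintPDFRequest",
            "@version": "1",
            "title": title,
            "url": pdf_url,
        },
        # Returned to the calling skill when the connected task completes.
        "token": "med-education-handout",
    }

directive = build_print_handoff(
    "https://example.com/medication-list.pdf", "Your Medication List")
```

The appeal for healthcare is that the calling skill never has to implement printing (or a drug lexicon, or disease education) itself; it only names the task and supplies the content.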
Dave also announced Dialog Flow, which uses neural networks to develop skills with less coding and less manual tagging, although admittedly we’re realistically still moving from single-turn to multi-turn conversations rather than full machine learning and AI. This is probably okay for healthcare: let’s focus on getting patient feedback on structured conversations, like triage surveys, before trying to design a system that is completely responsive to any healthcare need.
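The single-turn versus multi-turn distinction can be sketched without any platform API at all: a structured triage survey is just a set of required slots that the skill elicits one turn at a time. The slot names and question wording below are hypothetical.

```python
# Minimal sketch of multi-turn slot elicitation for a triage survey.
# Slot names and question wording are hypothetical.
TRIAGE_QUESTIONS = {
    "symptom": "What symptom are you experiencing?",
    "duration": "How long have you had it?",
    "severity": "On a scale of one to ten, how severe is it?",
}

def next_prompt(filled_slots):
    """Return the next unanswered question, or None when the survey is done."""
    for slot, question in TRIAGE_QUESTIONS.items():
        if slot not in filled_slots:
            return question
    return None  # all slots filled: hand off to scoring or escalation logic

# One simulated turn: the patient has answered the first question.
prompt = next_prompt({"symptom": "headache"})
```

The point is that a structured conversation like this is testable with patients today, long before a skill can respond to arbitrary health questions.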
Another announcement, the ability to have one skill that supports two languages rather than a separate skill for each language, would also be beneficial in healthcare for delivering patient instructions, especially to family members who may have different “first” languages.
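Under the hood, a single skill serving two languages largely comes down to keying responses by the request’s locale, with a sensible fallback. A minimal sketch, in which the locale codes and instruction strings are illustrative:

```python
# Sketch: per-locale patient instructions with an English fallback.
# Locale codes and strings are illustrative.
INSTRUCTIONS = {
    "en-US": "Take one tablet with food, twice a day.",
    "es-US": "Tome una tableta con comida, dos veces al día.",
}

def localized_instruction(locale):
    """Pick the instruction for the requested locale, falling back to en-US."""
    return INSTRUCTIONS.get(locale, INSTRUCTIONS["en-US"])
```

A family member whose device is set to a Spanish locale would hear the instructions in Spanish from the same skill, with no second install required.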
Samsung has jumped into the voice fray with Bixby, a voice assistant designed to be a developer platform for inserting voice into any type of device. Microsoft has the same strategy with Cortana; the difference is that Samsung itself ships televisions and refrigerators that can be voice-enabled. Samsung is also in the “voice and” category, with screens a key part of its delivery. This has some really interesting implications for healthcare if you think about the television as the focal point of the living room: health reminders and actions delivered there could have great impact. We’re working on a mobile version of the MIND diet, which would have huge impact if delivered through voice and visual reminders on the refrigerator door. The challenge with these modalities, though, is that it may take generations for the technology to become ubiquitous, versus the $39 Echo Dot. Samsung sees a world where your voice assistant knows you across all your devices, which would definitely be helpful in maintaining health context.
Voice as an ingredient, and part of an IoT and AI strategy, was echoed by Microsoft. No surprise, since Cortana doesn’t have a body, or even a hockey puck. This strategy could be very interesting in healthcare if you think about the talking EpiPen: why wouldn’t all devices and complex equipment have voice prompts for both patients and providers? There was also a meetup group at the conference demonstrating voice running on an ARM chip, which could be very interesting for cheaper medical devices.
Designers and Developers
The tradeshow floor was full of mostly developer tools for building, testing, and securing voice applications, and the rallying cry in sessions was for the platform providers (Amazon, Google, Apple, Samsung, Microsoft) to standardize their approaches to voice, if not for the developers, then for the end users. One key gap is the lack of a standard interface: just as the APIs from platform vendors use different terminology, there is no standard interface or set of reusable components, aside from the idea of a wake word, to help users navigate. Mobile had the same problem in the early days, and still does to some extent; it was addressed with onboarding experiences and tours built into apps, something voice has yet to do, but which, if done consistently, could really improve usability.
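The mobile-style onboarding tour can be approximated in voice with nothing more than a first-launch flag in the user’s persisted session state. A minimal sketch, in which the attribute name and spoken wording are hypothetical:

```python
# Sketch: give first-time users a short spoken tour, skip it on return visits.
# The attribute name and prompt wording are hypothetical.

def launch_message(persistent_attrs):
    """Return the launch prompt, marking the tour as seen on first run."""
    if not persistent_attrs.get("has_launched"):
        persistent_attrs["has_launched"] = True
        return ("Welcome! You can ask me to read today's care plan, "
                "log a symptom, or hear your medication schedule.")
    return "Welcome back. What would you like to do?"
```

Spelling out a skill’s capabilities on first launch is one way to compensate for the lack of shared navigation conventions across platforms.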
“Complexity and ecosystem lock-in are threats to ubiquity and frictionless experience. Let’s not build an ecosystem that locks that in.” James Poulter, CEO of Vixen Labs
There was also an admonition not to build apps for the sake of building apps, but to focus on user need, and to understand that what users want most of all is convenience: they will use the most convenient interface for the task (web, mobile, TV, phone, voice).
What does this mean in healthcare? Context is very important. Make sure your users know exactly what your skill can and can’t do, so they don’t expect the full canon of medical knowledge from your one skill. At Wellpepper, we are firmly in the “voice and” category (and yet I still get invited to speak at these events): our voice interactions are a subset of the patient’s care plan, and just as patients don’t expect the mobile app to do more than deliver the care plan prescribed by their physicians, the same holds true for the voice experiences in Wellpepper interactive care plans.
While I didn’t hear anyone talking specifically about the ethical issues of the eavesdropping scandals, or of the need for humans to manually tag voice snippets in order to improve machine learning, I did attend a great session by Brooke Hawkins on ethics and design implications in healthcare. Issues tackled included disclosure of the exact conditions for an app’s efficacy, whether A/B testing on patients can even be done, and understanding the implications of focusing on specific measures in a care plan. On this last one, she suggested that care plans that include weight, like our Sugarpod diabetes care plan, need to consider the implications of that focus.
Along the same lines of not building voice apps for the sake of it, there was also a lot of talk about how your brand is reflected in your app. Not all healthcare systems think about their brand impact, although they should, and the voice skill is an extension of that. Interestingly, David Ciccarelli from Voices.com, which represents voice talent, mentioned that most developers use the standard Alexa voice. That’s not surprising, as it’s expensive to have someone record every possible response for your application, although it’s interesting to imagine a world where your healthcare app speaks in the voice of your own doctor. Given what we’ve seen of the correlation between adherence and healthcare provider engagement, this could give a huge boost to patient outcomes. The technology to synthesize a voice from other recordings is not that far off, so it wouldn’t require that your doctor record everything. Or perhaps it would make more sense to have a specific doctor be the voice of all of your apps, which might be more credible than Alexa dispensing healthcare information. Ciccarelli provided a nice matrix of when to use synthetic voices and when to use real humans that applies well in healthcare.
There’s no question that voice spans all ages. Dave Isbitski opened the conference by saying that his kids and his parents were equally excited by voice applications. Speakers on the Best Practices for Developing Voice Care Plans panel (myself included) were developing specialized care plans for children, seniors, and everyone else. While there are definite generational differences in usage patterns for voice assistants, there is also a “voice-first” generation coming. One speaker mentioned that his child, who had Alexa from birth, knew the limitations of the device and didn’t ask for more than Alexa was capable of delivering, while a 6-year-old family friend without that experience wanted to ask things like “do you know my teacher?” In our testing we found similar differences between generations, with seniors more likely to try to have a conversation and younger people sticking to the script of the care plan a bit more. I heard one developer say that they track slang used by end users to estimate age and adjust the interactions accordingly.
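A crude version of that slang-based adjustment might look like the following. The word lists, and the choice between a concise and a conversational style, are entirely hypothetical; they stand in for whatever signals and adaptations a real skill would use.

```python
# Hypothetical sketch: adjust prompt style based on slang in the utterance.
# The marker words and the two styles are illustrative assumptions.
YOUNGER_MARKERS = {"lit", "lowkey", "bet", "fam"}

def prompt_style(utterance):
    """Pick a prompt style from slang cues in the user's utterance."""
    words = set(utterance.lower().split())
    # Slang associated with younger users suggests keeping prompts short;
    # otherwise allow a longer, more conversational style.
    return "concise" if words & YOUNGER_MARKERS else "conversational"
```

This mirrors what we saw in testing: the same care plan script can be delivered tersely to users who stick to the script, and more conversationally to those, like many seniors, who treat the assistant as a conversation partner.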
Wrapping It Up For Healthcare
We’ve written before about the use cases for voice in healthcare, and there are many: documenting clinic visits, transcribing physician notes, medication adherence, education, and patient care plans, as well as voice biomarkers, which my fellow panelists called a huge pool of untapped diagnostic data. If we’re at the early days of voice apps, we’re also at the early days of voice data. There’s a ton to be discovered, and the research, especially in healthcare, is just starting.
Our expectations for voice are high. Let’s hope it delivers.
If you’re interested in learning more about voice in healthcare here are some great resources: