September saw the announcement of the Voice Interoperability Initiative. It brings together a diverse range of household tech names from around the world: car manufacturers like BMW, hardware brands like Sonos, Bose & Harman Kardon, the tech giants Amazon & Microsoft, and component manufacturers whose names consumers rarely hear even though their parts sit inside our gear. The announcement seeks to usher in collaboration in the design and operation of voice-enabled products.
Where today you typically find a single voice provider in your device – Siri in your Apple gear, Alexa on Amazon widgets and Cortana baked into Windows & Xbox – with each device siloed to engage only with the respective manufacturer’s voice service, the interoperability initiative seeks to:
- Develop services that work with each other, whilst protecting privacy & security
- Promote choice & flexibility
- Make it easier for hardware vendors to include multiple voice services in one product
- Accelerate machine learning & conversational AI research
A key enabler of this vision is support for multiple simultaneous ‘wake words’: the ability to have your device listen for several possible starts to the commands you utter and, on detecting one, route the audio that follows to the appropriate service for processing.
Chip manufacturers like Qualcomm and Intel will be instrumental in enabling this capability. When available it could, as an example, give you a single Alexa device that routes “Alexa, do X” to the Amazon voice service for processing, but passes “Cortana, do Y” through to Microsoft’s service to reach into your work calendar, documents etc. and run commands there.
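To make the routing idea concrete, here is a minimal sketch in Python. It is not how any vendor actually implements this: real devices detect wake words in the audio stream on-device, whereas this example assumes the utterance has already been turned into text. The handler functions (`amazon_service`, `microsoft_service`) and the `WAKE_WORDS` table are hypothetical names I have made up purely for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


# Hypothetical stand-ins for calls out to each vendor's voice service.
def amazon_service(command: str) -> str:
    return f"[Alexa] handling: {command}"


def microsoft_service(command: str) -> str:
    return f"[Cortana] handling: {command}"


# Assumed mapping of wake word -> service handler; a real device would
# load this from whichever services the owner has enabled.
WAKE_WORDS: Dict[str, Callable[[str], str]] = {
    "alexa": amazon_service,
    "cortana": microsoft_service,
}


def split_wake_word(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (wake_word, command) if the utterance starts with a known wake word."""
    first, _, rest = utterance.strip().partition(" ")
    wake = first.rstrip(",.!?").lower()
    if wake in WAKE_WORDS:
        return wake, rest.strip()
    return None


def route(utterance: str) -> Optional[str]:
    """Send the command to the service that owns the wake word, if any."""
    parsed = split_wake_word(utterance)
    if parsed is None:
        return None  # no recognised wake word, so the device stays idle
    wake, command = parsed
    return WAKE_WORDS[wake](command)


if __name__ == "__main__":
    print(route("Alexa, do X"))
    print(route("Cortana, do Y"))
    print(route("Siri, do Z"))
```

The point of the lookup table is that adding a further voice service means adding one entry rather than rewriting the device logic, which is roughly the flexibility the initiative is promising hardware vendors.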
As I’ve mentioned in previous articles on IoT, standardisation can be key to adoption as services seek ubiquity in our lives. Having a single device able to perform multiple functions was key to the success of the home computer & the smartphone before it. Will the ability of a single device (be it a counter-top speaker, my phone or my headphones) to reach multiple voice services be the inflection point for voice service adoption?
There are plenty of challenges along the way, from securing access (I don’t want guests in my living room to be able to deliberately or inadvertently query my work sales pipeline for example), to the age-old blocker to adoption – user experience.
I don’t know about you, but I often find myself thinking “I must get back to Fred about that question he asked” only to struggle to find which channel the question was asked on – was it a WhatsApp message? An SMS? An email? An IM on Teams?
If I have to remember which service controls my home automation, which is hooked up to my work calendars and which has an alarm clock set, there is plenty of opportunity for confusion. Joining up a consistent and clear UX across these multiple channels is going to be one hell of a challenge, but an agreement between over 30 parties to set some standards, agree some core principles and start working towards a unified approach may be just the kick-start the technology needs.