Currently, there is no such thing as a single Microsoft speech service.
But Microsoft is taking the first steps toward creating a single speech application programming interface (API) and software development kit (SDK) that will work across its products and services, including Windows, Office, Cortana, Xbox and the HoloLens.
Microsoft disclosed this move last week in a rather understated way at its Build 2018 conference. (This Day 3 Build session on the “Cognitive Services Speech SDK” covers some of the details.)
Microsoft has some ambitious goals for its coming unified Speech Service, which falls under its Microsoft Cognitive Services umbrella. (Cognitive services are Azure APIs that developers can use to add various AI capabilities to their own apps and services.)
The new unified Speech Service “unites several Azure speech services that were previously available separately: Bing Speech (comprising speech recognition and text to speech), Custom Speech, and Speech Translation. Like its precursors, the Speech service is powered by the technologies used in other Microsoft products, including Cortana and Microsoft Office,” according to Microsoft.
Microsoft is aiming to have the common speech API and SDK “run on all modern platforms” and “support all modern programming languages.” Microsoft wants the service to be accessible to developers at all levels, from novice to expert, and to work online, offline, in hybrid scenarios and in batch, officials said. The new API and SDK will provide speech-to-text, speech-to-intent, speech translation and custom keyword-spotter invocation. They will work with both single-shot spoken commands and continuous ones. Microsoft is committing to handling all 28 spoken languages in the single, unified Speech SDK.
“We don’t have all that today, but this (Speech preview) is a good first step,” said Rob Chambers during last week’s Speech SDK session. The preview supports Windows 10, Linux and Android (via the Speech Devices SDK), and currently works with C#, C++ and Java. Support for iOS and macOS is coming “soon.”
The Speech Devices SDK is a “pre-tuned library paired with specific microphone-enabled hardware,” explains Microsoft in its documentation. “The SDK makes it easy to integrate your device with the cloud-based Microsoft Speech service and create an exceptional user experience for your customers.”
The Devices SDK is meant to enable companies to build their own “ambient devices with a customized wake word,” and provides noise suppression, echo cancellation, far-field voice and more. The SDK preview currently provides access to Speech to Text and Speech Translation; Text to Speech is not yet supported.
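To make the “customized wake word” idea concrete: conceptually, a keyword spotter ignores everything it hears until the designated wake word is detected, and only then forwards speech to the cloud service. The toy Python sketch below illustrates that gating logic on a stream of already-recognized words; all names here are hypothetical, and the real Speech Devices SDK does this acoustically on-device against audio, not text.

```python
# Toy illustration of wake-word gating (not the actual Speech Devices SDK API):
# words are ignored until the custom keyword appears, then the rest of the
# utterance is forwarded, as if handed off to the cloud speech service.

def spot_keyword(word_stream, keyword="computer"):
    """Yield only the words spoken after the wake word is detected."""
    awake = False
    for word in word_stream:
        if not awake:
            # A real spotter matches acoustics; here we just compare text.
            if word.lower() == keyword:
                awake = True  # wake word heard: start forwarding speech
        else:
            yield word

utterance = ["hey", "computer", "what", "time", "is", "it"]
command = list(spot_keyword(utterance))
print(command)  # -> ['what', 'time', 'is', 'it']
```

The point of doing this on-device is that nothing is streamed to the cloud until the wake word fires, which is why the Devices SDK pairs the spotter with tuned microphone hardware.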
Microsoft officials said they are moving the existing Microsoft Translator app/service to the new unified Speech Service and SDK as of its next release. Office also plans to replace its current dictation engine, which is based on Dictate technology developed by the Microsoft Garage incubator, with the new service/SDK.
“Microsoft is planning to move Office Dictation to the Microsoft Speech Service and unified SDK when it becomes generally available. In the meantime, Office Dictation will continue to be updated and the migration will be seamless for customers,” a spokesperson told me when I asked about timing.
The spokesperson said Microsoft expects the service/SDK to become generally available some time in the “next few months.”
I’ve also asked the Windows team about its plans regarding when/how Windows 10 will support the new unified speech service and SDK. When the Windows 10 April 2018 Update shipped, Microsoft officials touted its improved built-in dictation as one of the release’s main selling points. But Windows doesn’t use the same speech engine as Office or other Microsoft products at this time; it uses legacy Microsoft speech technology.
So far, no word back from the Windows team on what it’s planning on this front.