Voice Assistant
Eng. Oscar Alonso Rosete Beas



Agenda
AI Voice Assistant
1. Wake word detection
2. Speech recognition (speech-to-text)
3. Natural language understanding and skills
4. Speech synthesis
Coordinator
speech-to-intent
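The four stages and the coordinator can be sketched in a few lines of Python. This is only an illustrative outline: the `Coordinator` and `Intent` classes and the fake stage functions are hypothetical names, and the "audio" is plain UTF-8 text so that the example runs without a microphone or any speech engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    name: str
    slots: dict


@dataclass
class Coordinator:
    """Chains the four stages; each stage is a pluggable callable."""
    detect_wake_word: Callable[[bytes], bool]
    speech_to_text: Callable[[bytes], str]
    understand: Callable[[str], Optional[Intent]]
    synthesize: Callable[[str], bytes]

    def handle(self, audio: bytes) -> Optional[bytes]:
        # 1. Ignore audio that does not contain the wake word.
        if not self.detect_wake_word(audio):
            return None
        # 2. Speech recognition (speech-to-text).
        text = self.speech_to_text(audio)
        # 3. Natural language understanding: text to intent.
        intent = self.understand(text)
        if intent is None:
            reply = "Sorry, I did not understand."
        else:
            reply = f"Running {intent.name}"
        # 4. Speech synthesis of the reply.
        return self.synthesize(reply)


# Stand-in stages (assumptions for the demo, not real engines).
def fake_wake(audio: bytes) -> bool:
    return audio.lower().startswith(b"hey edison")


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlu(text: str) -> Optional[Intent]:
    if "blue" in text:
        return Intent("changeColor", {"color": "blue"})
    return None


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


assistant = Coordinator(fake_wake, fake_stt, fake_nlu, fake_tts)
```

A speech-to-intent engine would replace stages 2 and 3 with a single callable that maps audio directly to an intent.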

speech-to-text


Vosk API
Vosk is a speech recognition toolkit. Its main strengths are:
- Supports 17 languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino.
- Works offline, even on lightweight devices such as Raspberry Pi, Android and iOS.
Docs:
https://alphacephei.com/vosk/
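A minimal sketch of offline transcription with the Vosk Python bindings is shown here. It assumes the `vosk` package is installed, that a model has been downloaded and unpacked into a local directory, and that the input is a 16-bit mono PCM WAV file. The function names `extract_text` and `transcribe_wav` are made up for this example.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Pull the transcript out of a Vosk result string such as '{"text": "hello"}'."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a 16-bit mono PCM WAV file offline with Vosk."""
    # Imported lazily so extract_text() works even without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(model, wav.getframerate())
        while True:
            frames = wav.readframes(4000)
            if not frames:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(frames):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever is still buffered at the end of the file.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(piece for piece in pieces if piece)
```

A call would look like `transcribe_wav("command.wav", "model")`, where both paths are placeholders.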

Rhasspy
Rhasspy is composed of independent services that coordinate over MQTT using a superset of the Hermes protocol.
Docs:
https://rhasspy.readthedocs.io/en/latest/#services
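In the Hermes protocol, recognized intents are published as JSON on topics of the form `hermes/intent/<intentName>`. The sketch below shows how a client might decode such a message. The sample payload, the intent name `ChangeLightColor` and the helper `parse_intent` are illustrative assumptions, and the MQTT connection itself (for example a subscription to `hermes/intent/#` with a client library) is left out so the example stays self-contained.

```python
import json
from typing import Optional

INTENT_TOPIC_PREFIX = "hermes/intent/"


def parse_intent(topic: str, payload: bytes) -> Optional[dict]:
    """Decode a Hermes intent message into a small dictionary."""
    if not topic.startswith(INTENT_TOPIC_PREFIX):
        return None  # Not an intent message (e.g. hotword or ASR traffic).
    message = json.loads(payload)
    # Each slot carries its name and a nested value object.
    slots = {s["slotName"]: s["value"]["value"] for s in message.get("slots", [])}
    return {
        "intent": message["intent"]["intentName"],
        "site": message.get("siteId", "default"),
        "slots": slots,
    }


# Hypothetical message, shaped like the ones an intent recognizer publishes.
sample_topic = "hermes/intent/ChangeLightColor"
sample_payload = json.dumps({
    "input": "set the lights in the living room to blue",
    "siteId": "default",
    "intent": {"intentName": "ChangeLightColor", "confidenceScore": 1.0},
    "slots": [
        {"slotName": "location", "value": {"kind": "Unknown", "value": "living room"}},
        {"slotName": "color", "value": {"kind": "Unknown", "value": "blue"}},
    ],
}).encode("utf-8")

parsed = parse_intent(sample_topic, sample_payload)
```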


Picovoice
Picovoice is an end-to-end platform for building voice products. Unlike Alexa and Google services, Picovoice runs entirely on-device and, according to the vendor, is also more accurate. With Picovoice, a user’s intent can be inferred from a naturally spoken utterance such as:
"Hey Edison, set the lights in the living room to blue."
- Porcupine: wake word engine
- Rhino: Speech-to-Intent engine
Docs:
https://picovoice.ai/docs/
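To make the idea of wake word plus speech-to-intent concrete, the toy example below maps the sample utterance to an intent and its slots. It is not the Picovoice API: it works on text instead of audio, and the intent name `changeColor`, the regular expression and the `infer` function are assumptions chosen for illustration.

```python
import re
from typing import Optional

WAKE_WORD = "hey edison"
# Hypothetical grammar covering a single intent.
PATTERN = re.compile(
    r"set the lights in the (?P<location>[a-z ]+?) to (?P<color>[a-z]+)"
)


def infer(utterance: str) -> Optional[dict]:
    """Return None without the wake word, otherwise an inference result."""
    # Normalize: lower-case and drop punctuation.
    text = re.sub(r"[^a-z ]", "", utterance.lower()).strip()
    if not text.startswith(WAKE_WORD):
        return None
    command = text[len(WAKE_WORD):].strip()
    match = PATTERN.fullmatch(command)
    if match is None:
        return {"understood": False, "intent": None, "slots": {}}
    return {"understood": True, "intent": "changeColor", "slots": match.groupdict()}


result = infer("Hey Edison, set the lights in the living room to blue.")
```

Here `result["slots"]` holds the location and the color, which is the information a skill needs to act on the command.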
Comparative table
Tech | Owner | License | Connectivity | Wake word | Speech-to-text | Speech-to-intent | Speech synthesis |
---|---|---|---|---|---|---|---|
Google recognition | - | - | Online only | Yes | Yes | Yes | Yes |
DeepSpeech | Mozilla | Open source | Offline | No | Yes | No | Yes |
Picovoice | Picovoice | Free for personal use | Offline | Yes | Yes | Yes | Yes |
Vosk | Alpha Cephei | Open source | Offline | No | Yes | No | No |
Snowboy | KITT.AI (discontinued) | Open source | Offline | Yes | No | No | No |
Mycroft | Mycroft Community | Open source | Offline | Yes | Yes | No | Yes |
Rhasspy | Community | Open source | Offline | Yes | Yes | Yes | Yes |
Summary
Of the tools evaluated, three look promising: Google's speech recognition, Picovoice and the Rhasspy project.
Given the offline requirement, Picovoice and Rhasspy are the strongest candidates, although Picovoice is not free for commercial use and Rhasspy is intended for Linux computers.
Own development
Further research could cover developing our own wake word detection software and implementing it on top of the Vosk API or Mozilla DeepSpeech.
The total number of hours required for in-house development is detailed in the following slides.
Rhasspy Development Effort
The following research tasks are required to make Rhasspy compatible with Android.
Rhasspy currently supports Raspberry Pi 2-3 B/B+ (armhf/aarch64), desktop/laptop/server (amd64) and Raspberry Pi Zero (armv6l).
https://rhasspy.readthedocs.io/en/latest/hardware/
- Install Rhasspy correctly on a Raspberry Pi to serve as a benchmark
- Implement an MQTT broker on Android
- Implement a web server on Android
- Test wake words
- Test speech-to-intent
These tasks are estimated to require a total effort of 40 hours.
Android demo:
https://github.com/razzo04/rhasspy-mobile-app
Own Development Effort
The following research tasks are required to build on vosk-api or Mozilla DeepSpeech:
- Implement a wake word/hot word engine
  - Ground-up approach
  - PocketSphinx
  - Snowboy
- Implement speech-to-intent
  - Ground-up approach
  - voice2json
Testing voice2json, PocketSphinx and Snowboy for compatibility with these speech recognition APIs is estimated to require 40 hours.
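One possible starting point for a ground-up wake word check is to scan the transcripts produced by a speech-to-text engine for the wake phrase, tolerating small recognition errors. The sketch below is only a text-level stand-in under that assumption. Dedicated engines such as PocketSphinx or Snowboy work on the audio signal itself, and the function name and the 0.8 threshold are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher


def contains_wake_word(transcript: str, wake_word: str, threshold: float = 0.8) -> bool:
    """Return True if any window of the transcript is close to the wake phrase."""
    words = transcript.lower().split()
    target = wake_word.lower().split()
    phrase = " ".join(target)
    size = len(target)
    # Slide a window of the same word count as the wake phrase over the transcript.
    for start in range(len(words) - size + 1):
        window = " ".join(words[start:start + size])
        # ratio() is 1.0 for identical strings and drops as they diverge.
        if SequenceMatcher(None, window, phrase).ratio() >= threshold:
            return True
    return False
```

The fuzzy comparison lets a slightly misrecognized phrase such as "hay edison" still trigger the assistant, at the cost of a higher risk of false activations as the threshold is lowered.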
Offline Android Voice Assistant
By Oscar Rosete