Voice Assistant

Eng. Oscar Alonso Rosete Beas

Agenda

Comparative table

Description of alternatives

Summary

AI Voice Assistant System

Own development

01/18

AI Voice Assistant

Wake word detection

Speech recognition

speech-to-text

Natural language understanding and skills

Speech Synthesis

Coordinator

speech-to-intent

speech-to-text
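
The role of the coordinator can be illustrated with a minimal sketch. It assumes each stage is a plain callable; the class name `Assistant`, the field names and the stub stages are hypothetical and stand in for whichever engine is eventually chosen.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Assistant:
    """Coordinator chaining the stages of the diagram (all stages are hypothetical callables)."""
    detect_wake_word: Callable[[bytes], bool]   # wake word detection
    speech_to_text: Callable[[bytes], str]      # speech recognition
    understand: Callable[[str], str]            # NLU and skills: text in, reply text out
    synthesize: Callable[[str], bytes]          # speech synthesis

    def handle(self, wake_audio: bytes, command_audio: bytes) -> Optional[bytes]:
        # Stay idle until the wake word is heard.
        if not self.detect_wake_word(wake_audio):
            return None
        text = self.speech_to_text(command_audio)
        reply = self.understand(text)
        return self.synthesize(reply)


# Stub stages so the sketch runs without any speech library.
assistant = Assistant(
    detect_wake_word=lambda audio: audio == b"hey edison",
    speech_to_text=lambda audio: audio.decode("utf-8"),
    understand=lambda text: f"OK: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

A speech-to-intent engine would replace the `speech_to_text` and `understand` stages with a single step.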

Vosk API

Vosk is a speech recognition toolkit. Its main strengths are:

Supports 17 languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino.

Works offline, even on lightweight devices - Raspberry Pi, Android, iOS

Docs:

https://alphacephei.com/vosk/
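
As a rough illustration of offline use, the sketch below transcribes a WAV file with the Vosk Python bindings. It assumes the `vosk` package is installed, that a language model has been downloaded to a local directory, and that the recording is mono 16-bit PCM; the function names `transcribe` and `final_text` are ours, not part of Vosk.

```python
import json
import wave


def final_text(result_json: str) -> str:
    """Extract the transcript from a Vosk result, which is a JSON string."""
    return json.loads(result_json).get("text", "")


def transcribe(wav_path: str, model_dir: str) -> str:
    # Imported here so the helper above works without Vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            # Feed audio in small chunks, as a live microphone stream would.
            recognizer.AcceptWaveform(chunk)
        return final_text(recognizer.FinalResult())
```

Calling `transcribe("command.wav", "model")` would return the recognized sentence as plain text; both paths are placeholders.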

Rhasspy

Rhasspy is composed of independent services that coordinate over MQTT using a superset of the Hermes protocol.

Docs:

https://rhasspy.readthedocs.io/en/latest/#services
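
To show what coordinating over MQTT looks like from a client's point of view, here is a sketch of a handler for recognized intents. It assumes intents arrive on topics of the form `hermes/intent/<intentName>` with a JSON payload carrying `intent` and `slots` fields; this layout reflects our reading of the Hermes protocol and should be checked against the Rhasspy reference. The example message is hand-written, not captured from a running system.

```python
import json

INTENT_PREFIX = "hermes/intent/"


def parse_intent(topic: str, payload: bytes):
    """Return (intent_name, slots) for an intent message, or None for other topics."""
    if not topic.startswith(INTENT_PREFIX):
        return None
    message = json.loads(payload)
    slots = {s["slotName"]: s["value"]["value"] for s in message.get("slots", [])}
    return message["intent"]["intentName"], slots


# Example message, hand-written for illustration (intent and slot names are hypothetical).
example = json.dumps({
    "input": "set the living room light to blue",
    "intent": {"intentName": "ChangeLightColor", "confidenceScore": 1.0},
    "slots": [
        {"slotName": "room", "value": {"value": "living room"}},
        {"slotName": "color", "value": {"value": "blue"}},
    ],
}).encode("utf-8")
```

In a real client, `parse_intent` would be called from the message callback of an MQTT library with the topic and payload of each received message.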

Picovoice

Picovoice is an end-to-end platform for building voice products. Unlike Alexa and Google services, it runs entirely on-device; the vendor also claims higher accuracy. With Picovoice, a user's intent can be inferred from a naturally spoken utterance such as:

Hey Edison, set the lights in the living room to blue.

Porcupine wake word engine
Rhino Speech-to-Intent
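
What "speech-to-intent" produces can be illustrated with a toy, text-only stand-in. The sketch below is not the Picovoice API: it uses a hypothetical regular-expression grammar for a single intent and assumes the utterance has already been transcribed. The intent name `changeColor` and the slot names are ours.

```python
import re

# Hypothetical grammar for one intent; not Rhino's context format.
PATTERN = re.compile(
    r"set the lights in the (?P<location>[a-z ]+?) to (?P<color>[a-z]+)"
)


def infer_intent(utterance: str, wake_word: str = "hey edison"):
    """Return a dict with intent and slots, or None if nothing matches."""
    text = utterance.lower().strip().rstrip(".")
    if not text.startswith(wake_word):
        return None  # the wake word stage would not have triggered
    command = text[len(wake_word):].lstrip(" ,")
    match = PATTERN.fullmatch(command)
    if match is None:
        return None
    return {"intent": "changeColor", "slots": match.groupdict()}
```

For the utterance above, the result has the intent `changeColor` with the slots `location = "living room"` and `color = "blue"`. A real engine performs this inference directly on audio rather than on a transcript.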

Docs:

https://picovoice.ai/docs/

Comparative table

Tech	Owner	License	Connectivity	Wake word	Speech-to-text	Speech-to-intent	Speech synthesis
Google recognition	Google	-	Internet required	Yes	Yes	Yes	Yes
Deep Speech	Mozilla	Open source	Offline	No	Yes	No	Yes
Picovoice	Picovoice	Free for personal use	Offline	Yes	Yes	Yes	Yes
Vosk	Alpha Cephei	Open source	Offline	No	Yes	No	No
Snowboy	KITT.AI (discontinued)	Open source	Offline	Yes	No	No	No
Mycroft	Mycroft Community	Open source	Offline	Yes	Yes	No	Yes
Rhasspy	Community	Open source	Offline	Yes	Yes	Yes	Yes

Summary

Among the evaluated tools, three are promising: Google's speech recognition, Picovoice (with its Rhino speech-to-intent engine) and the Rhasspy project.

Given the offline usage requirement, the most promising are Picovoice and Rhasspy. Note, however, that Picovoice is not free for commercial use and Rhasspy is intended for Linux computers.
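
The shortlist can be reproduced by filtering the rows of the comparative table. The sketch assumes the selection criteria are wake word plus speech-to-intent support, with offline operation as an optional requirement; these criteria are our reading of the summary, and the function name `shortlist` is hypothetical.

```python
# Rows copied from the comparative table:
# (tech, works offline, wake word, speech-to-text, speech-to-intent, speech synthesis)
TOOLS = [
    ("Google recognition", False, True, True, True, True),
    ("Deep Speech", True, False, True, False, True),
    ("Picovoice", True, True, True, True, True),
    ("Vosk", True, False, True, False, False),
    ("Snowboy", True, True, False, False, False),
    ("Mycroft", True, True, True, False, True),
    ("Rhasspy", True, True, True, True, True),
]


def shortlist(require_offline: bool = True):
    """Tools offering wake word and speech-to-intent, optionally offline only."""
    return [
        name
        for name, offline, wake, _stt, intent, _tts in TOOLS
        if wake and intent and (offline or not require_offline)
    ]
```

With the offline requirement, `shortlist()` keeps Picovoice and Rhasspy; without it, `shortlist(require_offline=False)` also keeps Google recognition.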

Own development

Further research could cover in-house wake word detection software, implemented on top of the Vosk API or Deep Speech.

The total number of hours required for in-house development is detailed in the following slides.

Rhasspy Development Effort

The following research tasks are required to ensure that Rhasspy is compatible with Android.

Rhasspy currently supports Raspberry Pi 2-3 B/B+ (armhf/aarch64), desktop/laptop/server (amd64) and Raspberry Pi Zero (armv6l).

https://rhasspy.readthedocs.io/en/latest/hardware/

Complete a working installation on Raspberry Pi to serve as a benchmark
Implement an MQTT broker on Android
Implement a web server on Android
Test wake words
Test speech-to-intent

These tasks are estimated to require a total effort of 40 hours.

Android demo:
https://github.com/razzo04/rhasspy-mobile-app

Own Development Effort

The following research tasks are required to use vosk-api or Mozilla Deep Speech:

Implement a wake word / hot word engine
- Ground-up approach
- PocketSphinx
- Snowboy
Implement speech-to-intent
- Ground-up approach
- voice2json

Testing voice2json, PocketSphinx and Snowboy to make sure they are compatible with the chosen speech recognition API (vosk-api or Deep Speech) is estimated to require 40 hours.