Voice Assistant

Eng. Oscar Alonso Rosete Beas

Agenda


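AI Voice Assistant

A voice assistant chains four stages, with a coordinator passing data between them:

  1. Wake word detection
  2. Speech recognition (speech-to-text)
  3. Natural language understanding and skills
  4. Speech synthesis

Speech-to-intent engines combine stages 2 and 3 into a single step.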
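The coordinator's role can be sketched as a loop that stays idle until the wake word fires and then runs the remaining stages in order. This is a minimal Python sketch, not the design of any particular product: the stage callables, the `Coordinator` class and the fixed `command_frames` window are hypothetical stand-ins for real engines and for proper end-of-speech detection.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Dict, Iterable, List

# Hypothetical stage signatures; real engines (wake word, STT, NLU, TTS)
# would be wrapped behind these callables.
WakeWordDetector = Callable[[bytes], bool]
SpeechToText = Callable[[List[bytes]], str]
Understand = Callable[[str], Dict[str, str]]  # text -> {"intent": ..., slot: value}
Skill = Callable[[Dict[str, str]], str]       # parsed intent -> reply text
Synthesize = Callable[[str], bytes]


@dataclass
class Coordinator:
    detect_wake_word: WakeWordDetector
    speech_to_text: SpeechToText
    understand: Understand
    skills: Dict[str, Skill]
    synthesize: Synthesize
    command_frames: int = 3  # hypothetical fixed window captured after the wake word

    def run(self, frames: Iterable[bytes]) -> List[bytes]:
        """Consume audio frames and return one synthesized reply per command."""
        replies: List[bytes] = []
        it = iter(frames)
        for frame in it:
            # Stage 1: ignore audio until the wake word is detected.
            if not self.detect_wake_word(frame):
                continue
            # Take the next few frames as the spoken command.
            command = list(islice(it, self.command_frames))
            text = self.speech_to_text(command)  # Stage 2
            intent = self.understand(text)       # Stage 3: NLU
            skill = self.skills.get(intent.get("intent", ""))
            reply = skill(intent) if skill else "Sorry, I did not understand."
            replies.append(self.synthesize(reply))  # Stage 4
        return replies
```

Keeping each stage behind a plain callable lets the same coordinator be tested with fakes and later wired to Porcupine, Vosk, Rhino or Rhasspy services.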

speech-to-text

Vosk API

Vosk is a speech recognition toolkit. Its main strengths are:

  • Supports 17 languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino
  • Works offline, even on lightweight devices such as Raspberry Pi, Android and iOS
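A minimal sketch of offline transcription with the Vosk Python bindings (`pip install vosk`), assuming a model has been downloaded and unpacked into `model_dir` and that the input is a mono, 16-bit PCM WAV file. The function names `transcribe_wav` and `extract_text` are illustrative, not part of Vosk.

```python
import json
import wave
from typing import List


def extract_text(result_json: str) -> str:
    """Pull the transcript out of a Vosk result JSON string such as '{"text": "..."}'."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a mono, 16-bit PCM WAV file offline with a downloaded Vosk model."""
    # Imported here so extract_text() can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("Vosk expects mono 16-bit PCM audio")
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        pieces: List[str] = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has been finalized.
            if rec.AcceptWaveform(data):
                pieces.append(extract_text(rec.Result()))
        # FinalResult flushes whatever audio is still buffered.
        pieces.append(extract_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

Long recordings produce one result per detected utterance, so the sketch joins them into a single transcript.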


Docs:

https://alphacephei.com/vosk/

Rhasspy

Rhasspy is composed of independent services that coordinate over MQTT using a superset of the Hermes protocol.
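To make the Hermes idea concrete, the sketch below builds a `hermes/tts/say` message and decodes a `hermes/intent/<intentName>` message using only the standard library. The field names (`text`, `siteId`, `intent.intentName`, `slots[].slotName`, `slots[].value.value`) follow our reading of the Rhasspy reference and should be checked against it; publishing and subscribing would be done with an MQTT client such as paho-mqtt, which is not shown. The helper names are illustrative.

```python
import json
from typing import Dict, Tuple


def tts_say(text: str, site_id: str = "default") -> Tuple[str, str]:
    """Build the topic and JSON payload asking Rhasspy's TTS service to speak."""
    return "hermes/tts/say", json.dumps({"text": text, "siteId": site_id})


def parse_intent(topic: str, payload: str) -> Dict[str, object]:
    """Reduce a hermes/intent/<intentName> message to its intent name and slot values."""
    if not topic.startswith("hermes/intent/"):
        raise ValueError("not an intent topic: " + topic)
    message = json.loads(payload)
    # Each slot carries its name and a typed value object; keep only the value.
    slots = {slot["slotName"]: slot["value"]["value"] for slot in message.get("slots", [])}
    return {
        "intent": message["intent"]["intentName"],
        "slots": slots,
        "siteId": message.get("siteId", "default"),
    }
```

Because every service only exchanges such JSON messages over a broker, an Android port can reuse or replace individual services as long as it speaks the same topics.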


Docs:

https://rhasspy.readthedocs.io/en/latest/#services

Picovoice

Picovoice is an end-to-end platform for building voice products. Unlike Alexa and Google services, Picovoice runs entirely on-device, and the vendor claims it is also more accurate. With Picovoice, one can infer a user's intent from a naturally spoken utterance such as:

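"Hey Edison, set the lights in the living room to blue."

It combines two engines:

  • Porcupine wake word engine, which detects "Hey Edison"
  • Rhino Speech-to-Intent engine, which infers the intent from the rest of the command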
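A sketch of how the two engines could be chained with the Picovoice Python SDKs (`pvporcupine`, `pvrhino`). It assumes a Picovoice AccessKey, a custom "Hey Edison" keyword file (`.ppn`) and a Rhino context file (`.rhn`) trained in the Picovoice Console, and an iterable of 16 kHz, 16-bit audio frames matching the engines' frame length. `listen` and `describe_inference` are illustrative names, not SDK functions.

```python
from typing import Dict, Iterable, List, Optional


def describe_inference(is_understood: bool, intent: Optional[str], slots: Dict[str, str]) -> str:
    """Turn a Rhino inference into a short, human-readable command description."""
    if not is_understood:
        return "not understood"
    args = ", ".join(f"{k}={v}" for k, v in sorted(slots.items()))
    return f"{intent}({args})"


def listen(access_key: str, keyword_path: str, context_path: str,
           frames: Iterable[List[int]]) -> List[str]:
    """Wait for the wake word with Porcupine, then hand the following audio to Rhino."""
    # Imported lazily so describe_inference() can run without the Picovoice SDKs.
    import pvporcupine
    import pvrhino

    porcupine = pvporcupine.create(access_key=access_key, keyword_paths=[keyword_path])
    rhino = pvrhino.create(access_key=access_key, context_path=context_path)
    results: List[str] = []
    awake = False
    try:
        # Assumption: both engines use the same frame length at 16 kHz.
        for pcm in frames:
            if not awake:
                # process() returns the detected keyword index, or -1 if none.
                awake = porcupine.process(pcm) >= 0
            elif rhino.process(pcm):  # True once Rhino has finalized an inference
                inference = rhino.get_inference()
                results.append(describe_inference(
                    inference.is_understood, inference.intent, inference.slots))
                awake = False
    finally:
        porcupine.delete()
        rhino.delete()
    return results
```

For the example utterance, a context with a hypothetical `changeColor` intent would be reported as `changeColor(color=blue, location=living room)`.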


Docs:

https://picovoice.ai/docs/

Comparative table

Tech | Owner | License | Internet | Wake word | Speech-to-text | Speech-to-intent | Speech synthesis
Google Speech Recognition | Google | - | Online only | Yes | Yes | Yes | Yes
DeepSpeech | Mozilla | Open source | Offline | No | Yes | No | Yes
Picovoice Rhino | Picovoice | Free for personal use | Offline | Yes | Yes | Yes | Yes
Vosk | Alpha Cephei | Open source | Offline | No | Yes | No | No
Snowboy | KITT.AI (discontinued) | Open source | Offline | Yes | No | No | No
Mycroft | Mycroft Community | Open source | Offline | Yes | Yes | No | Yes
Rhasspy | Community | Open source | Offline | Yes | Yes | Yes | Yes

Summary

Three of the evaluated tools stand out:

Google Speech Recognition, Picovoice (with its Rhino speech-to-intent engine) and the Rhasspy project.

Given the offline requirement, the most promising are Picovoice and Rhasspy, with two caveats: Picovoice is not free for commercial use, and Rhasspy is intended for Linux computers.
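The selection can be reproduced by encoding the comparative table's feature columns and keeping only the tools that work offline and cover every stage. Licensing is left out because it is treated as a caveat rather than a filter; the `TOOLS` list and the `candidates` function are illustrative.

```python
from typing import Dict, List, Tuple

# Rows transcribed from the comparative table (Yes/No columns as booleans).
TOOLS: List[Dict[str, object]] = [
    {"tech": "Google", "offline": False, "wake_word": True, "stt": True, "intent": True, "tts": True},
    {"tech": "DeepSpeech", "offline": True, "wake_word": False, "stt": True, "intent": False, "tts": True},
    {"tech": "Picovoice", "offline": True, "wake_word": True, "stt": True, "intent": True, "tts": True},
    {"tech": "Vosk", "offline": True, "wake_word": False, "stt": True, "intent": False, "tts": False},
    {"tech": "Snowboy", "offline": True, "wake_word": True, "stt": False, "intent": False, "tts": False},
    {"tech": "Mycroft", "offline": True, "wake_word": True, "stt": True, "intent": False, "tts": True},
    {"tech": "Rhasspy", "offline": True, "wake_word": True, "stt": True, "intent": True, "tts": True},
]


def candidates(tools: List[Dict[str, object]],
               required: Tuple[str, ...] = ("offline", "wake_word", "stt", "intent", "tts")) -> List[str]:
    """Names of the tools that satisfy every required column."""
    return [str(t["tech"]) for t in tools if all(t[k] for k in required)]
```

With all stages required, only Picovoice and Rhasspy remain; relaxing the requirement to offline wake word detection alone also admits Snowboy and Mycroft.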

Own development

Further research could explore building our own wake word detection software and implementing the remaining stages with the Vosk API or Mozilla DeepSpeech.
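One low-effort way to prototype in-house wake word detection with Vosk, assumed here rather than taken from this evaluation, is to restrict the recognizer's grammar to the wake phrase plus `[unk]`, which turns the general recognizer into a keyword spotter. Per the Vosk documentation, runtime grammars work only with models that have a dynamic graph (typically the small models), and the phrase (here a hypothetical `hey edison`) must be in the model's vocabulary. The function names are illustrative.

```python
import json
from typing import Iterable


def matches_wake_phrase(result_json: str, wake_phrase: str) -> bool:
    """True if a Vosk Result() or PartialResult() JSON string is exactly the wake phrase."""
    message = json.loads(result_json)
    heard = message.get("text", message.get("partial", ""))
    return heard.strip() == wake_phrase


def wait_for_wake_word(model_dir: str, frames: Iterable[bytes],
                       wake_phrase: str = "hey edison", sample_rate: int = 16000) -> bool:
    """Return True as soon as the wake phrase is heard in 16-bit mono PCM chunks."""
    # Imported lazily so matches_wake_phrase() runs without Vosk installed.
    from vosk import KaldiRecognizer, Model

    # Grammar limited to the wake phrase; "[unk]" absorbs all other speech.
    rec = KaldiRecognizer(Model(model_dir), sample_rate, json.dumps([wake_phrase, "[unk]"]))
    for chunk in frames:
        if rec.AcceptWaveform(chunk):
            if matches_wake_phrase(rec.Result(), wake_phrase):
                return True
        # Checking partial results lowers latency at the cost of more false triggers.
        elif matches_wake_phrase(rec.PartialResult(), wake_phrase):
            return True
    return False
```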

The total effort required for in-house development is detailed in the following slides.

Rhasspy Development Effort

The following research tasks are required to make Rhasspy compatible with Android.

Rhasspy currently supports Raspberry Pi 2-3 B/B+ (armhf/aarch64), desktop/laptop/server (amd64) and Raspberry Pi Zero (armv6l):

https://rhasspy.readthedocs.io/en/latest/hardware/


  • Install Rhasspy correctly on a Raspberry Pi to serve as a benchmark
  • Implement an MQTT broker on Android
  • Implement a web server on Android
  • Test wake words
  • Test speech-to-intent
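The two test tasks could share a small harness that sends the same sentences to the Raspberry Pi benchmark and to the Android port and compares intent accuracy. This sketch assumes Rhasspy 2.5's HTTP API, where `POST /api/text-to-intent` takes plain text and returns JSON with the intent name under `intent.name`, on the default port 12101; the host name and function names are hypothetical. Audio-based tests would post WAV data to the speech-to-intent endpoint instead.

```python
import json
import urllib.request
from typing import Callable, Iterable, Tuple


def rhasspy_text_to_intent(text: str, base_url: str = "http://raspberrypi.local:12101") -> str:
    """POST a sentence to Rhasspy's text-to-intent endpoint and return the intent name."""
    request = urllib.request.Request(
        base_url + "/api/text-to-intent",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        result = json.load(response)
    return result.get("intent", {}).get("name", "")


def intent_accuracy(cases: Iterable[Tuple[str, str]], predict: Callable[[str], str]) -> float:
    """Fraction of (sentence, expected_intent) pairs that predict() gets right."""
    cases = list(cases)
    if not cases:
        return 0.0
    hits = sum(1 for sentence, expected in cases if predict(sentence) == expected)
    return hits / len(cases)
```

Passing the predictor as a function keeps the harness independent of the target: the same cases can run against `rhasspy_text_to_intent` with the Raspberry Pi's URL and with the Android device's URL.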

These tasks are estimated to take a total of 40 hours.

Android demo:
https://github.com/razzo04/rhasspy-mobile-app

Own Development Effort

The following research tasks are required to build on the Vosk API or Mozilla DeepSpeech:

  • Implement a wake word (hotword) engine
    • Ground-up approach
    • PocketSphinx
    • Snowboy
  • Implement speech-to-intent
    • Ground-up approach
    • voice2json
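A ground-up speech-to-intent stage can be as simple as matching the speech-to-text transcript against sentence templates with slots. The sketch below is a hypothetical illustration of that idea, not voice2json's actual template format; `TEMPLATES`, `SLOT_VALUES` and the intent names are made up for the example.

```python
import re
from typing import Dict, List, Optional

# Hypothetical sentence templates: "{slot}" marks a slot filled from SLOT_VALUES.
TEMPLATES: Dict[str, str] = {
    "changeColor": "set the lights in the {location} to {color}",
    "lightsOff": "turn off the lights in the {location}",
}
SLOT_VALUES: Dict[str, List[str]] = {
    "location": ["living room", "kitchen", "bedroom"],
    "color": ["blue", "red", "green", "white"],
}


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn a template into a regex with one named group per slot."""
    # With a capturing group, re.split alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    regex = "".join(
        re.escape(part) if i % 2 == 0
        else "(?P<{}>{})".format(part, "|".join(re.escape(v) for v in SLOT_VALUES[part]))
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


COMPILED = {intent: compile_template(t) for intent, t in TEMPLATES.items()}


def text_to_intent(text: str) -> Optional[Dict[str, object]]:
    """Return {"intent": ..., "slots": {...}} for the first matching template, else None."""
    # Lowercase, drop punctuation and collapse whitespace before matching.
    normalized = " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())
    for intent, pattern in COMPILED.items():
        match = pattern.fullmatch(normalized)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return None
```

Exact matching only works if the transcript is error-free; a real implementation on top of Vosk or DeepSpeech output would likely need fuzzy matching or a recognizer grammar built from the same templates.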

Testing voice2json, PocketSphinx and Snowboy for compatibility with the Vosk API and DeepSpeech is estimated to require 40 hours.

Offline Android Voice Assistant

By Oscar Rosete
