Amazon Polly Hands-On Demo

Agenda

In this demo, we will:

  1. Set up an S3 bucket for storing audio files
  2. Create an IAM role for Polly to access S3
  3. Synthesize speech using the Amazon Polly console
  4. Create a speech synthesis task for longer text
  5. Experiment with different voices and SSML
  6. Test the generated audio files
  7. Clean up resources

Create bucket

polly-audio-demo-361981
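
The console step above can also be scripted. This is a minimal sketch, assuming boto3 is installed and credentials are configured; the region passed in is an assumption, since the demo does not state one. The helper names (`is_valid_bucket_name`, `create_bucket_kwargs`, `create_bucket`) are hypothetical.

```python
import re

BUCKET_NAME = "polly-audio-demo-361981"  # bucket name used in the demo


def is_valid_bucket_name(name: str) -> bool:
    """Check the basic S3 naming rules: 3-63 characters, lowercase letters,
    digits, dots and hyphens, starting and ending with a letter or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None


def create_bucket_kwargs(name: str, region: str) -> dict:
    """Build the arguments for S3 create_bucket.

    us-east-1 must not send a LocationConstraint; other regions require one.
    """
    kwargs = {"Bucket": name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(name: str, region: str) -> None:
    """Create the bucket (hypothetical helper, needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(name, region))
```

Bucket names are globally unique, so a different numeric suffix is likely needed when repeating the demo.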

Create Role

AmazonPollyFullAccess

AmazonS3FullAccess

Name, review, and create

PollyS3AccessRole

Step 1: Select trusted entities

Create role

Role PollyS3AccessRole created
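
As a rough scripted equivalent of attaching the two managed policies, the sketch below assumes the role PollyS3AccessRole already exists (the trusted entity chosen in the console is not shown in the slides, so role creation is left out). `attach_managed_policies` is a hypothetical helper; `iam_client` is expected to be a boto3 IAM client.

```python
ROLE_NAME = "PollyS3AccessRole"  # role name from the demo
MANAGED_POLICIES = ("AmazonPollyFullAccess", "AmazonS3FullAccess")


def managed_policy_arn(policy_name: str) -> str:
    """Return the ARN of an AWS managed policy."""
    return f"arn:aws:iam::aws:policy/{policy_name}"


def attach_managed_policies(iam_client, role_name=ROLE_NAME,
                            policy_names=MANAGED_POLICIES):
    """Attach each managed policy to the role and return the ARNs used."""
    arns = [managed_policy_arn(name) for name in policy_names]
    for arn in arns:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return arns
```

Usage, assuming boto3 is available: `attach_managed_policies(boto3.client("iam"))`. The full-access policies are convenient for a demo; a narrower policy would be preferable outside one.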

Amazon Polly - Joanna

Welcome to Amazon Polly! 
This is a demonstration of text-to-speech conversion using AWS artificial intelligence services. 
Polly can convert text into lifelike speech, making it easy to create applications that talk 
and build entirely new categories of speech-enabled products.
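
The same short text can be synthesized outside the console. A minimal sketch, assuming a boto3 Polly client is passed in; the MP3 output format and the helper names are assumptions, and no engine is specified because the slide does not name one.

```python
def synthesize_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the arguments for Polly's synchronous SynthesizeSpeech call."""
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(polly_client, text, path, voice_id="Joanna"):
    """Synthesize `text` and write the audio bytes to `path`.

    Returns the number of bytes written.
    """
    response = polly_client.synthesize_speech(**synthesize_request(text, voice_id))
    audio = response["AudioStream"].read()  # streaming body holding the audio
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

Usage, assuming boto3 is available: `synthesize_to_file(boto3.client("polly"), "Welcome to Amazon Polly!", "welcome.mp3")`.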

Matthew

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Polly's Text-to-Speech service uses advanced deep learning technologies to synthesize natural sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.

In addition to Standard TTS voices, Amazon Polly offers Neural Text-to-Speech voices that deliver advanced improvements in speech quality through a new machine learning approach. Polly's Neural TTS technology also supports a Newscaster speaking style that is tailored to news narration use cases.

Amazon Polly is designed to be highly scalable, cost-effective, and easy to use. You can quickly integrate high-quality speech synthesis into your applications without worrying about the underlying infrastructure.

Save to S3

polly-audio-demo-361981
synthesis-tasks/

Output Location

S3 synthesis tasks

Completed
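
For longer text, the asynchronous task shown above can be started and polled in code until it reaches Completed. A sketch under assumptions: `polly_client` is a boto3 Polly client, the MP3 format and polling interval are illustrative, and the helper names are hypothetical.

```python
import time


def start_task_request(text, bucket, prefix, voice_id="Matthew",
                       output_format="mp3"):
    """Build the arguments for StartSpeechSynthesisTask."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": prefix,
    }


def wait_for_task(polly_client, task_id, poll_seconds=5.0, max_polls=60):
    """Poll GetSpeechSynthesisTask until the task completes or fails.

    Returns the final SynthesisTask dict, which includes OutputUri on success.
    """
    for _ in range(max_polls):
        task = polly_client.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        if task["TaskStatus"] in ("completed", "failed"):
            return task
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Usage, assuming boto3 is available: call `polly.start_speech_synthesis_task(**start_task_request(text, "polly-audio-demo-361981", "synthesis-tasks/"))`, read `["SynthesisTask"]["TaskId"]` from the response, and pass it to `wait_for_task`.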

Check out the S3 bucket

Standard - SSML 

<speak>
    <prosody rate="slow" pitch="low">
        Hello, and welcome to our SSML demonstration.
    </prosody>
    <break time="1s"/>
    Now I'll speak at a normal pace.
    <break time="0.5s"/>
    I can spell out words: <say-as interpret-as="spell-out">AWS</say-as>
    or say numbers as digits: <say-as interpret-as="digits">12345</say-as>.
    <break time="0.5s"/>
    <prosody rate="fast" pitch="high">
        And I can speak quickly with a higher pitch!
    </prosody>
    <break time="1s"/>
    Here's a substitute word: <sub alias="World Wide Web">WWW</sub>
</speak>

Neural - SSML

<speak>
    <amazon:domain name="news">
        This is a news-style announcement using Amazon Polly's neural voice technology.
    </amazon:domain>
    <break time="1s"/>
    <prosody rate="medium">
        Neural voices provide higher quality speech synthesis.
    </prosody>
</speak>

news
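
When sending SSML like the documents above through the API, the text type has to be set to SSML, and the news domain needs the neural engine. The sketch below picks the engine from the markup; the selection rule and the helper name `build_ssml_request` are assumptions for illustration, not Polly behavior.

```python
# Tags that, to my knowledge, only work with the neural engine.
NEURAL_ONLY_TAGS = ("<amazon:domain",)


def build_ssml_request(ssml, voice_id, output_format="mp3"):
    """Build SynthesizeSpeech arguments for an SSML document.

    Chooses Engine="neural" when a neural-only tag is present, and
    "standard" otherwise (standard voices accept the prosody pitch attribute).
    """
    uses_neural = any(tag in ssml for tag in NEURAL_ONLY_TAGS)
    return {
        "Text": ssml,
        "TextType": "ssml",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": "neural" if uses_neural else "standard",
    }
```

Usage, assuming boto3 is available: `polly.synthesize_speech(**build_ssml_request(news_ssml, "Matthew"))`. Matthew is an assumption here; the newscaster style is only offered for a small set of voices.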

Dynamic Range Compression

<speak>
    <amazon:effect name="drc">
        This text uses dynamic range compression for more consistent audio levels across different playback devices.
    </amazon:effect>
</speak>
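
If the text to wrap comes from user input, it should be XML-escaped before being placed inside the SSML. A small sketch using only the standard library; `wrap_with_drc` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def wrap_with_drc(text: str) -> str:
    """Wrap plain text in a dynamic range compression effect.

    Escapes &, < and > so the result stays well-formed SSML.
    """
    return f'<speak><amazon:effect name="drc">{escape(text)}</amazon:effect></speak>'
```

The returned string would be sent with the text type set to SSML, as with the other SSML examples in this demo.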

Clean Up

Empty the bucket

permanently delete

Delete the bucket 

Delete bucket

polly-audio-demo-361981
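
The empty-then-delete sequence can be sketched in code as well. This assumes an unversioned bucket (a versioned bucket would also need its object versions removed) and a boto3 S3 client passed in as `s3_client`; the helper names are hypothetical.

```python
def batches(items, size=1000):
    """Yield slices of at most `size` items (DeleteObjects accepts up to 1000 keys)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def empty_and_delete_bucket(s3_client, bucket):
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects removed.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket)
        for obj in page.get("Contents", [])  # empty pages have no Contents
    ]
    for batch in batches(keys):
        s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch]},
        )
    s3_client.delete_bucket(Bucket=bucket)
    return len(keys)
```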

Delete the Role 

PollyS3AccessRole
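
A role with managed policies attached cannot be deleted through the API until those policies are detached. The sketch below assumes the role has no inline policies or instance profiles, and that `iam_client` is a boto3 IAM client; `delete_role_with_policies` is a hypothetical helper.

```python
def delete_role_with_policies(iam_client, role_name="PollyS3AccessRole"):
    """Detach all managed policies from the role, then delete the role.

    Returns the list of policy ARNs that were detached.
    """
    attached = iam_client.list_attached_role_policies(RoleName=role_name)
    arns = [policy["PolicyArn"] for policy in attached["AttachedPolicies"]]
    for arn in arns:
        iam_client.detach_role_policy(RoleName=role_name, PolicyArn=arn)
    iam_client.delete_role(RoleName=role_name)
    return arns
```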

🙏 Thanks for watching