Google Cloud Text-to-Speech API

Q: How much does Google Cloud Text-to-Speech cost?

Google Cloud TTS has a free tier of 4 million characters per month for standard voices and 1 million characters for WaveNet/Neural2/Studio voices. After that: Standard voices cost $4 per million characters; WaveNet $16 per million; Neural2 $16 per million; Studio (highest quality) $160 per million characters.
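As a rough illustration of the pricing above, the following sketch estimates a monthly bill from a character count. The function name and the dictionary layout are hypothetical. The sketch also assumes that each voice type has its own free allowance and that billing is strictly linear per character, neither of which is confirmed by the figures quoted here.

```python
# Prices (USD per million characters) and free allowances as quoted above.
PRICE_PER_MILLION = {
    "standard": 4.0,
    "wavenet": 16.0,
    "neural2": 16.0,
    "studio": 160.0,
}
# Assumption: each premium voice type gets its own 1M-character allowance.
FREE_CHARACTERS = {
    "standard": 4_000_000,
    "wavenet": 1_000_000,
    "neural2": 1_000_000,
    "studio": 1_000_000,
}


def estimate_monthly_cost(voice_type: str, characters: int) -> float:
    """Return an estimated monthly cost in USD for one voice type."""
    key = voice_type.lower()
    if key not in PRICE_PER_MILLION:
        raise ValueError(f"unknown voice type: {voice_type}")
    if characters < 0:
        raise ValueError("characters must be non-negative")
    # Only characters beyond the free allowance are billed.
    billable = max(0, characters - FREE_CHARACTERS[key])
    return billable * PRICE_PER_MILLION[key] / 1_000_000
```

For example, under these assumptions 3 million WaveNet characters in a month would leave 2 million billable characters, while 4 million Standard characters would stay inside the free tier.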

Q: What are WaveNet voices and how are they different?

WaveNet voices are AI-generated voices that sound significantly more natural than standard TTS. Neural2 voices are Google's latest generation, trained on WaveNet technology with improved prosody and naturalness. Studio voices are the premium tier: recorded by professional voice actors and enhanced by AI, sounding nearly indistinguishable from human speech.

Q: How many languages does Google Cloud TTS support?

Google Cloud TTS supports 50+ languages and variants, with over 380 voices total. This includes major world languages as well as regional variants (e.g., multiple English accents, Brazilian vs. European Portuguese). WaveNet quality varies by language; it is strongest for English, Spanish, French, German, Japanese, and Korean.

Q: What audio formats can Google Cloud TTS output?

Google Cloud TTS can output MP3, LINEAR16 (WAV), OGG Opus, MULAW, and ALAW audio. You can also customise the speaking rate (0.25x–4x), pitch (−20 to +20 semitones), and volume gain. For telephony applications, use MULAW at 8kHz; for most applications, MP3 at 24kHz offers the best size-to-quality ratio.
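The options above can be sketched as a small helper that builds the audio configuration part of a synthesis request. The helper name is hypothetical. The field names follow the v1 REST `audioConfig` object as best understood here, and the accepted ranges are the ones quoted in this answer, so check both against the current API reference before relying on them.

```python
# Encodings and ranges as quoted in the answer above.
SUPPORTED_ENCODINGS = {"MP3", "LINEAR16", "OGG_OPUS", "MULAW", "ALAW"}


def build_audio_config(encoding="MP3", sample_rate_hertz=24000,
                       speaking_rate=1.0, pitch=0.0, volume_gain_db=0.0):
    """Validate the settings and return an audioConfig-style dictionary."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if not 0.25 <= speaking_rate <= 4.0:
        raise ValueError("speaking_rate must be between 0.25 and 4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be between -20 and +20 semitones")
    return {
        "audioEncoding": encoding,
        "sampleRateHertz": sample_rate_hertz,
        "speakingRate": speaking_rate,
        "pitch": pitch,
        "volumeGainDb": volume_gain_db,
    }


# Telephony profile: MULAW at 8 kHz.
telephony = build_audio_config("MULAW", sample_rate_hertz=8000)
# General-purpose profile: MP3 at 24 kHz (the defaults).
general = build_audio_config()
```

Validating locally keeps obviously out-of-range values from costing a round trip to the service.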

380+ voices across 50+ languages with WaveNet and Neural2

Freemium ✓ Verified ★ 4.6 🇺🇸 United States

Google Cloud Text-to-Speech offers one of the widest selections of voices (380+) in the industry across 50+ languages and variants. WaveNet and Neural2 voices produce highly natural-sounding speech using deep learning. The generous free tier of 4 million characters/month makes it the go-to for prototyping and medium-volume applications. SSML support gives fine-grained control over pronunciation, speed, pitch, and pauses. Used in IVR systems, accessibility tools, e-learning platforms, and smart speakers.
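To illustrate the kind of control SSML gives, here is a minimal sketch that wraps a few phrases in standard SSML elements (`speak`, `prosody`, `break`). The function name and its parameters are hypothetical, and the exact set of SSML attributes the service honours should be checked against its SSML documentation.

```python
from xml.sax.saxutils import escape

# Named rate values defined by the SSML prosody element.
ALLOWED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_ssml(phrases, pause_ms=400, rate="medium", pitch_semitones=0):
    """Join phrases with a fixed pause and wrap them in a prosody element."""
    if rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported rate: {rate}")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape &, < and > so user text cannot break the markup.
    body = pause.join(escape(phrase) for phrase in phrases)
    return (
        f'<speak><prosody rate="{rate}" pitch="{pitch_semitones:+d}st">'
        f"{body}</prosody></speak>"
    )


ssml = build_ssml(["Hello & welcome.", "Press 1."],
                  pause_ms=500, rate="slow", pitch_semitones=2)
```

The resulting string would be sent as the `ssml` field of the request input instead of plain `text`.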

API Details

Auth Method: API Key
Pricing Model: Freemium
Free Tier: Yes, 4 million characters/month free
Rate Limit: 300 RPM
Format: REST / JSON / gRPC
Versioning: v1, v1beta1
SLA / Uptime: 99.9%
Compliance: SOC 2, ISO 27001, HIPAA, GDPR
Geographic Restrictions: Global (30+ regions)
Last Verified: 2026-02-20
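Given the API key authentication and REST/JSON format listed above, a request to the v1 endpoint might be prepared as in the sketch below. The helper names are hypothetical, and the endpoint URL, the `X-Goog-Api-Key` header, and the base64 `audioContent` response field reflect the v1 REST interface as understood here rather than anything stated in this listing. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Assumed v1 REST endpoint for synthesis.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(api_key, text, language_code="en-US",
                             voice_name=None, audio_encoding="MP3"):
    """Prepare (but do not send) a POST request for speech synthesis."""
    voice = {"languageCode": language_code}
    if voice_name:
        voice["name"] = voice_name
    body = {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {"audioEncoding": audio_encoding},
    }
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Goog-Api-Key": api_key,
        },
        method="POST",
    )


def decode_audio(response_json):
    """Decode the base64 audio payload from a parsed JSON response."""
    return base64.b64decode(response_json["audioContent"])
```

Passing the prepared request to `urllib.request.urlopen` and parsing the JSON reply would complete the call. With a 300 RPM limit, a client that spaces requests at least 0.2 seconds apart stays under the quota.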
