Google Cloud Text-to-Speech API
380+ voices across 50+ languages with WaveNet and Neural2
Google Cloud Text-to-Speech offers one of the widest voice selections in the industry: 380+ voices across 50+ languages and variants. WaveNet and Neural2 voices use deep learning to produce highly natural-sounding speech. The free tier (4 million characters per month for Standard voices, 1 million for WaveNet and Neural2) makes it a common choice for prototyping and medium-volume applications. SSML support gives fine-grained control over pronunciation, speed, pitch, and pauses. Typical uses include IVR systems, accessibility tools, e-learning platforms, and smart speakers.
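As a rough illustration of the SSML controls mentioned above, the sketch below assembles a small SSML document using only the Python standard library. The helper name `build_ssml` and the sample text are hypothetical; the tags used (`break`, `say-as`, `prosody`) are standard SSML elements, but check the service's SSML documentation for which attributes a given voice honours.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(intro: str, spelled_out: str, slow_part: str) -> str:
    """Return an SSML string that combines a pause, say-as, and prosody.

    Hypothetical helper for illustration; not part of any Google library.
    """
    return (
        "<speak>"
        f"{escape(intro)}"
        '<break time="500ms"/>'  # pause for half a second
        # pronunciation: read the value character by character
        f'<say-as interpret-as="characters">{escape(spelled_out)}</say-as>'
        # speed and pitch: slower delivery, raised by two semitones
        f'<prosody rate="slow" pitch="+2st">{escape(slow_part)}</prosody>'
        "</speak>"
    )


ssml = build_ssml("Your code is", "A1B2", "Please write it down & keep it safe.")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

Escaping the text before embedding it matters because characters such as `&` or `<` in user-supplied text would otherwise make the SSML invalid.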
Frequently Asked Questions
Google Cloud TTS has a free tier of 4 million characters per month for Standard voices and 1 million characters per month for WaveNet/Neural2/Studio voices. Beyond the free tier, Standard voices cost $4 per million characters, WaveNet and Neural2 voices $16 per million, and Studio voices (highest quality) $160 per million.
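To make the arithmetic concrete, here is a minimal cost estimator built from the figures quoted in this answer. The function name `estimate_monthly_cost` is hypothetical, and the sketch assumes the free allowance is applied separately to each voice type; verify both the rates and that assumption against Google's current pricing page before relying on the result.

```python
# Figures copied from the text above (USD per 1 million characters).
RATE_PER_MILLION = {"standard": 4.0, "wavenet": 16.0, "neural2": 16.0, "studio": 160.0}

# Monthly free characters; assumed here to apply per voice type.
FREE_CHARS = {
    "standard": 4_000_000,
    "wavenet": 1_000_000,
    "neural2": 1_000_000,
    "studio": 1_000_000,
}


def estimate_monthly_cost(characters: int, voice_type: str) -> float:
    """Estimate the monthly bill in USD for one voice type (illustrative only)."""
    tier = voice_type.lower()
    if tier not in RATE_PER_MILLION:
        raise ValueError(f"unknown voice type: {voice_type!r}")
    # Only characters beyond the free allowance are billed.
    billable = max(0, characters - FREE_CHARS[tier])
    return billable / 1_000_000 * RATE_PER_MILLION[tier]


print(estimate_monthly_cost(5_000_000, "standard"))
print(estimate_monthly_cost(3_000_000, "wavenet"))
```

For example, 5 million Standard characters leave 1 million billable characters, and 3 million WaveNet characters leave 2 million billable.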
WaveNet voices are AI-generated voices that sound significantly more natural than standard TTS. Neural2 voices are Google's latest generation, trained on WaveNet technology with improved prosody and naturalness. Studio voices are the premium tier: recorded by professional voice actors and enhanced by AI, they sound nearly indistinguishable from human speech.
Google Cloud TTS supports 50+ languages and variants, with over 380 voices in total. This includes major world languages as well as regional variants (e.g., multiple English accents, Brazilian vs. European Portuguese). WaveNet quality varies by language; it is strongest for English, Spanish, French, German, Japanese, and Korean.
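Regional variants are distinguished by their language codes (such as `pt-BR` and `pt-PT`). The sketch below filters a voice list by language or by exact variant. The helper `voices_for_language` is hypothetical, and `SAMPLE_VOICES` is a small hand-written sample shaped like entries from the API's voice listing (`name`, `languageCodes`), not real output.

```python
# Illustrative sample only; a real listing contains hundreds of entries.
SAMPLE_VOICES = [
    {"name": "en-US-Wavenet-D", "languageCodes": ["en-US"]},
    {"name": "en-GB-Neural2-A", "languageCodes": ["en-GB"]},
    {"name": "pt-BR-Wavenet-A", "languageCodes": ["pt-BR"]},
    {"name": "pt-PT-Standard-A", "languageCodes": ["pt-PT"]},
]


def voices_for_language(voices: list[dict], language: str) -> list[str]:
    """Return voice names matching a language ('pt') or a variant ('pt-BR')."""
    wanted = language.lower()
    matches = []
    for voice in voices:
        for code in voice["languageCodes"]:
            code = code.lower()
            # Match the exact code, or any regional variant of a bare language.
            if code == wanted or code.startswith(wanted + "-"):
                matches.append(voice["name"])
                break
    return matches


print(voices_for_language(SAMPLE_VOICES, "pt"))
print(voices_for_language(SAMPLE_VOICES, "en-GB"))
```

Matching on the prefix plus a hyphen avoids false positives between languages whose codes share leading letters.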
Google Cloud TTS can output MP3, LINEAR16 (WAV), OGG Opus, MULAW, and ALAW audio. You can also customise the speaking rate (0.25x to 4x), pitch (-20 to +20 semitones), and volume gain. For telephony applications, use MULAW at 8 kHz; for most applications, MP3 at 24 kHz offers the best size-to-quality ratio.
