The Speech Synthesis Markup Language (SSML) is a W3C standard for marking up text for speech output. It provides tags for controlling the voice (including its gender), rate of speech, volume, and pitch (tone). It also provides tags for controlling how words are spoken, for instance spelling out abbreviations. SSML is also used for speech output in VoiceXML, which is likewise a W3C standard.
At this time, KTTS provides only very basic support for SSML, with the following restrictions:
Works only with the Festival Interactive and Hadifix Talkers.
You must install the rab_diphone (British male) voice, as this is the default voice Festival uses when speaking SSML.
The Speed setting on the Audio screen is ignored when speaking text containing SSML.
If either the Speed or the Pitch setting in the Festival configuration dialog is not set to 100%, SSML text is usually spoken in a monotone.
The following sample text can be used to experiment with SSML.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE speak PUBLIC "-//W3C//DTD SYNTHESIS 1.0//EN"
  "http://www.w3.org/TR/speech-synthesis/synthesis.dtd">
<speak version="1.0" xml:lang="en-US">
  <prosody pitch="low"> Who's been sleeping in my bed? </prosody>
  said papa bear.
  <prosody pitch="medium"> Who's been sleeping in my bed? </prosody>
  said momma bear.
  <prosody pitch="high"> Who's been sleeping in my bed? </prosody>
  said baby bear.
</speak>
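To try other pitch values without editing the XML by hand, a small script can generate documents shaped like the sample above and check that they are well-formed before they are pasted into KTTS. The following is a minimal Python sketch using only the standard library. The helper names `build_ssml` and `is_well_formed` are hypothetical and not part of KTTS, and the check covers XML well-formedness only, not validity against the SSML DTD. Whether a given pitch value is actually honored depends on the Talker.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# XML declaration and DOCTYPE copied from the sample text above.
SSML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE speak PUBLIC "-//W3C//DTD SYNTHESIS 1.0//EN"\n'
    '  "http://www.w3.org/TR/speech-synthesis/synthesis.dtd">\n'
)


def build_ssml(lines, lang="en-US"):
    """Build an SSML document (hypothetical helper, not a KTTS API).

    `lines` is a list of (pitch, quote, narration) tuples. Each quote is
    wrapped in <prosody pitch="..."> and followed by plain narration text.
    """
    parts = [SSML_HEADER, f'<speak version="1.0" xml:lang={quoteattr(lang)}>\n']
    for pitch, quote, narration in lines:
        # escape() protects &, < and > so the text cannot break the markup.
        parts.append(f"  <prosody pitch={quoteattr(pitch)}>{escape(quote)}</prosody>\n")
        parts.append(f"  {escape(narration)}\n")
    parts.append("</speak>\n")
    return "".join(parts)


def is_well_formed(ssml):
    """Return True if the text parses as XML (the DTD is not consulted)."""
    try:
        ET.fromstring(ssml.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    story = [
        ("low", "Who's been sleeping in my bed?", "said papa bear."),
        ("medium", "Who's been sleeping in my bed?", "said momma bear."),
        ("high", "Who's been sleeping in my bed?", "said baby bear."),
    ]
    document = build_ssml(story)
    if is_well_formed(document):
        print(document)
```

Running the script prints a document equivalent to the sample text. Changing the first element of each tuple (for example to `x-low` or `x-high`, which SSML 1.0 also defines) gives a quick way to produce variations for testing.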
More robust support for SSML is planned for the next version of KTTS.