Last update: 28th January 2013
I added a chart to facilitate understanding of the concepts used for classification. It is somewhat outdated, as non-uniform unit selection is not explicitly mentioned.
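A TTS system always consists of two components, which I name following Dutoit's introduction: the NLP component, which analyzes the text, and the DSP component, which synthesizes the speech.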
The engines that synthesize the speech (DSP component) are mainly based on three technologies: formant synthesis, diphone concatenation, and non-uniform unit selection.



The test sentences were:
sentence 1:
» An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen. «
As I found this sentence a bit too simple, I thought up another test sentence that contains a collection of known problems for the NLP module (in some demos this sentence is truncated because of the provider's limit on the number of characters):
sentence 2:
» Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm. «
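To make the difficulty concrete, here is a minimal Python sketch that flags the tokens in sentence 2 which an NLP module has to expand before synthesis. The abbreviation table, the regular expressions and the function names are assumptions made for illustration; they are not taken from any of the systems listed on this page, and a real front end would need a far larger lexicon and context-dependent rules.

```python
import re

SENTENCE_2 = ("Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - "
              "meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 "
              "Gramm Heroin zu 1,04 DM das Gramm.")

# Hypothetical mini-lexicon of abbreviations and their spoken forms.
ABBREVIATIONS = {"Dr.": "Doktor", "z.B.": "zum Beispiel",
                 "tgl.": "täglich", "DM": "Mark"}

DATE = re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")   # e.g. 15.3.00
DECIMAL = re.compile(r"\d+,\d+")                   # German decimal comma
ACRONYM = re.compile(r"[A-ZÄÖÜ]{2,}")              # e.g. NATO, CIA


def classify(token):
    """Return (cleaned token, label) for one whitespace-separated token."""
    token = token.strip("()«»")
    if token in ABBREVIATIONS:
        return token, "abbreviation"
    core = token.rstrip(".,")  # drop sentence punctuation, keep inner dots
    for label, pattern in (("date", DATE), ("decimal", DECIMAL),
                           ("acronym", ACRONYM)):
        if pattern.fullmatch(core):
            return core, label
    return core, "word"


def problem_tokens(sentence):
    """List every token that is not an ordinary word."""
    tagged = (classify(t) for t in sentence.split())
    return [(tok, label) for tok, label in tagged if label != "word"]


if __name__ == "__main__":
    for tok, label in problem_tokens(SENTENCE_2):
        print(f"{tok:10} {label}")
```

The sketch only detects the problem cases. It does not decide how to read them aloud, and it misses the initial "A." and the homograph "Heroin", which need context to resolve.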
Now, six years after thinking up those sentences, more pressing problems for German speech synthesis in services such as email reading arise from the pronunciation of English terms. For example, most systems would not pronounce the following sentence correctly without tuning:
sentence 3:
» Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden. «
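One conceivable form of such tuning is an exception lexicon that marks English stems inside German words, so that each segment can be sent to pronunciation rules for the right language. The following Python sketch illustrates the idea only. The stem list and the function name are hypothetical, and none of the systems listed here is known to work this way.

```python
# Hypothetical exception lexicon of English stems (lower case).
ENGLISH_STEMS = ("manpower", "cancel", "download", "file", "server")


def split_by_language(word):
    """Split a word into (segment, language) parts around an English stem."""
    lower = word.lower()
    for stem in ENGLISH_STEMS:
        start = lower.find(stem)
        if start != -1:
            end = start + len(stem)
            parts = [(word[:start], "de"),      # German prefix, e.g. "ge"
                     (word[start:end], "en"),   # English stem
                     (word[end:], "de")]        # German suffix, e.g. "en"
            return [(seg, lang) for seg, lang in parts if seg]
    return [(word, "de")]


if __name__ == "__main__":
    sentence = ("Die Manpowerdiskussion wird gecancelt, "
                "du kannst das File vom Server downloaden.")
    for raw in sentence.split():
        print(split_by_language(raw.strip(".,")))
```

Plain substring matching is deliberately naive: a word such as "Profile" would be split wrongly, so a usable lexicon would need morphological checks on top.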
All sound files are MP3, 64 kb.
| company | engine name | technology | languages (internet abbr.) | voice name | year (approx.) | s1 | s2 | s3 |
|---|---|---|---|---|---|---|---|---|
| Acapela Group (formerly Babeltech, Infovox and Elan) | Acapela HQ TTS | non-uniform unit-selection | DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR | Andreas | 2011 | sample | sample | sample |
| | | | | Julia | 2009 | sample | sample | sample |
| | | | | Klaus | 2006 | sample | sample | sample |
| | | | | Sarah | 2003 | sample | sample | sample |
| | greeting bunny | non-uniform unit-selection | DE, US, FR, IT, ES, NL, SE, NO, DK, BE | bunny | 2008 | sample | sample | sample |
| | Elan's SaySo | non-uniform unit-selection | DE, US, FR, IT, ES | Lea | 2003 | sample | sample | sample |
| | Elan's Tempo | diphone-concatenation (PSOLA). Pitch Synchronous Overlap and Add: a famous algorithm for changing the pitch and timing of speech that made diphone synthesis a great success for many years. | DE, US, UK, FR, ES, IT, BR, PT, RU, PL | Thomas | 1998 | sample | sample | sample |
| | | | | Dagmar | 1996 | sample | sample | sample |
| | Babeltech's BrightSpeech | non-uniform unit-selection, same as Acapela HQ TTS | | Ingrid | 2002 | sample | sample | - |
| | Babeltech's Babil | diphone-concatenation based on the commercial Mbrola engine. MBROLA (Multi Band Resynthesis Overlap and Add) is similar to PSOLA, but the database is processed beforehand to adapt pitch, amplitude and spectral features. | DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA | Eva | 2000 | sample | sample | sample |
| | | | | Greta | 2000 | sample | sample | sample |
| | | | | Helga (8 kHz) | 2000 | sample | sample | sample |
| | | | | Gerhard (8 kHz) | 2000 | sample | sample | sample |
| | | | | Steffen | 1997 | sample | sample | sample |
| | Infovox 330/Infovox Desktop | diphone-concatenation (probably the same as Babil). Infovox 310 is the Apple version | DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE | Helga | 1996 | sample | sample | - |
| | | | | Gerhard (sample) | 1996 | - | - | - |
| | Infovox 210/230 | formant-synthesis (successor of KTH's OVE, originally Telia Promotor) | DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE | - | 1994 | sample | sample | - |
| | Infovox Desktop PRO | non-uniform unit-selection, same as Acapela HQ TTS | | | | | | |
| Aculab | - | diphone-concatenation with LPC-coded units. LPC (linear predictive coding) was originally a compression algorithm; it is useful for synthesis because it is based on a source/filter model of speech. | DE, UK, US, FR, BR, IT, ES | Julia | 1998 | sample | sample | - |
| Amazon (formerly Ivona) | IVONA TTS | non-uniform unit-selection | DE, US, UK, ES, RO, PL, MX | Hans | 2011 | sample | sample | sample |
| | | | | Marlene | 2011 | sample | sample | sample |
| Atip | Proser | NLP component and voices from Atip, Mbrola engine (diphone-concatenation) from Babeltech | DE, US | Carla | 2000 | sample | sample | sample |
| | | | | Erkan (Turkish accent) | 2004 | sample | sample | sample |
| | | | | Fifi (French accent) | 2004 | sample | sample | sample |
| | | | | Steffen | 1997 | sample | sample | sample |
| | | | | Eva | 2000 | sample | sample | sample |
| AT&T | Natural Voices | non-uniform unit-selection | DE, IT, US, UK, FR, MX* | Klara | 2001 | sample | sample | sample |
| | | | | Reiner | 2002 | sample | sample | sample |
| Bell Labs (Lucent) | - | LPC-diphone concatenation | DE, FR, ES, US, UK, IT, RU, RO, CN | - | 1997 | sample | sample | - |
| Cepstral | - | non-uniform unit-selection | DE, UK, US, ES, FR, EG, TH, AF | Katrin | 2003 | sample | sample | sample |
| | | | | Matthias | 2003 | sample | sample | sample |
| Fonix | Dectalk | rule-based formant-synthesis (the legendary formant synthesizer, based on Klatt's MITalk) | DE, US, UK, ES, MX*, FR | - | 1982 | sample | sample | - |
| GData | Logox | microsegment synthesis (concatenating subphonetic units) | DE, US, UK | - | 2000 | sample | sample | - |
| | | | | Bill | 1998 | sample | sample | sample |
| | | | | Bill, Swabian accent | 2002 | sample | sample | sample |
| | | | | Bill, Hessian accent | 2002 | sample | sample | sample |
| | | | | Bill, Saxon accent | 2002 | sample | sample | sample |
| | | | | Bill, French accent | 2002 | sample | sample | sample |
| IBM | CTTS | non-uniform unit-selection | DE, US, UK, JP, KR, IT, ES, FR | male, courtesy of IBM. Database speaker is Gilles Karolyi. | 2002 | sample | sample | sample (8 kHz) |
| | | | | female, 8 kHz | 2004 | - | - | - |
| Meridian | Orpheus, formerly from Dolphin Oceanic Ltd | formant synthesis | DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN (Cantonese and Mandarin), CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE | - | 2009 | sample | sample | sample |
| Microsoft | Microsoft Speech Platform - Runtime Languages (Version 11) | non-uniform unit-selection | ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN | Hedda | 2012 | sample | sample | sample |
| Nuance (formerly called Scansoft) | Vocalizer (formerly RealSpeak, originally from Lernout & Hauspie), converged with RVoice (formerly Rhetorical); first commercial German unit-selection TTS | non-uniform unit-selection | DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO | Anna (11 kHz, courtesy of Nuance) | 2010 | sample | sample | sample |
| | | | | Yannick (11 kHz, courtesy of Nuance) | 2006 | sample | sample | sample |
| | | | | Yannick, embedded version recorded from a cell phone | 2009 | sample | sample | sample |
| | | | | Monika and/or Beate (?) - same as RVoice F026 | 2005 | sample | sample | sample |
| | | | | Steffi (8 kHz) | 2004 | sample | sample | sample |
| | | | | Vera (8 kHz) | 1999 | sample | sample | sample |
| | Former Loquendo, formerly called Actor, now Loquendo TTS | non-uniform unit-selection | DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE | Ulrike, no longer available | 2001 | sample | sample | sample |
| | | | | Stefan, courtesy of Loquendo | 2003 | sample | sample | sample |
| | | | | Katrin, courtesy of Loquendo | 2003 | sample | sample | sample |
| | Former SVOX, commercial version of the ETH Zürich system | diphone-synthesis | DE, FR, IT, US, ES | Nicole | 2000 | sample | sample | - |
| | Former SVOX, Corporate | non-uniform unit-selection | DE, US | Petra | 2005 | sample | sample | sample |
| | | | | Markus | 2005 | sample | sample | sample |
| | | | | Marlene (sample) | 2003 | - | - | - |
| | Speechify (formerly SpeechWorks, now merged with Scansoft) | non-uniform unit-selection | DE, US, UK, AU, JP, MX*, FR, BR, CA(FR) | Tessa | 2002 | sample | sample | sample |
| | RVoice, formerly Rhetorical | non-uniform unit-selection | DE, UK, US, GR, ES | F018 | 2002 | sample | sample | sample |
| | | | | M027 | 2004 | sample | sample | sample |
| | | | | F026 | 2004 | sample | sample | sample |
| | Vocalizer 4.05 (former Nuance, before acquisition by Scansoft) | non-uniform unit-selection | DE, US, UK, AU, CA(FR), MX*, BR | Anna Weber | 2004 | sample | sample | sample |
| | Vocalizer 1.0 (former Nuance, before acquisition by Scansoft) | non-uniform unit-selection (licensed Fonix engine) | DE, US, UK, NL, FR, IT, NO, ES, SE | - | 2001 | sample | sample | - |
| | ETI Eloquence (originally from Eloquent Technologies, then Speechworks), also licensed to IBM (ViaVoice Outloud) | rule-based formant-synthesis (Klatt-style) | DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR | - | 1998 | sample | sample | - |
| | TTS3000 (orig. Lernout & Hauspie) | diphone-synthesis | DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR | Stefan | 1996 | sample | sample | - |
| | | | | Anna (sample) | 1996 | - | - | - |
| | TruVoice (orig. Centigram, then Lernout & Hauspie) | formant-synthesis | DE, US, MX*, FR, IT | - | 1996 | sample | sample | - |
| SpeechConcept | Cerevoice, technology from Cereproc | non-uniform unit-selection | DE, EN, FR, IT, ES | Sophie, adult, Corporate Voice, courtesy of SpeechConcept | 2011 | sample | sample | sample |
| | | | | Leopold, Austrian adult, courtesy of SpeechConcept | 2011 | sample | sample | sample |
| | | | | Alex, adult, courtesy of SpeechConcept | 2011 | sample | sample | sample |
| | | | | Gudrun, adult, courtesy of SpeechConcept | 2011 | sample | sample | sample |
| | | | | Nick, youth, courtesy of SpeechConcept | 2011 | sample | sample | sample |
| | | | | Saskia, youth, courtesy of SpeechConcept | 2011 | sample | sample | sample |
| VoiceINTERConnect | Commercial version of the DreSS synthesizer | diphone-synthesis | DE | male voice | 2000 | sample | sample | sample |
| | | | | female voice | 2000 | sample | sample | sample |
| VoiceRSS | Online TTS web service, free of charge, engine unknown | unknown | DE, ES, CN, HK, TW, DK, NL, AU, CA, GB, IN, US, FI, CA, FR, IT, JP, KR, NO, PL, BR, PT, MX, RU, SE | female voice; sample downloaded via the HTTP API | 2013 | sample | sample | sample |
* Mexican stands for Latin American Spanish
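Several technology cells in the table above name diphone concatenation. A diphone spans the transition from one phone to the next, so an utterance of n phones (padded with silence on both sides) is built from n + 1 stored units. The following Python sketch shows only this unit lookup step; the phone symbols, the silence marker and the function name are assumptions made for illustration, and the actual signal processing (PSOLA, LPC, MBROLA) is not modelled.

```python
def to_diphones(phones, silence="_"):
    """Turn a phone sequence into the diphone names a synthesizer would fetch."""
    padded = [silence, *phones, silence]
    # Pair every phone with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    hallo = ["h", "a", "l", "o"]  # hypothetical SAMPA-like transcription
    print(to_diphones(hallo))
```

Because every unit covers one phone-to-phone transition, the joins fall in the comparatively stable middle of each phone, which is the usual motivation for choosing diphones as the basic unit.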
| institution | system | description | year (approx.) | s1 | s2 | s3 |
|---|---|---|---|---|---|---|
| Berkom | Felix | Research system by the former R&D department of Deutsche Telekom. Hybrid approach combining formant synthesis for voiced phonemes with concatenation of waveform-coded units for unvoiced parts. | 1998 | sample | sample | sample |
| Ruhr-Universität Bochum | SyRUB, Version 4.1.1 | Research system of the Ruhr-Universität Bochum. | 1995 | sample | sample | sample |
| IKP Bonn | BOSS | Speech synthesis framework based on non-uniform unit-selection | 2001 | sample | sample | sample |
| | HADIFIX | mixed inventory concatenation | 1995 | sample | sample | - |
| Uni Dresden | DreSS | diphone-synthesis | 1996 | sample | sample | sample |
| Gerhard Mercator University of Duisburg | - | formant-synthesis | 1996 | sample | sample | - |
| Jonathan Duddington | e-speak (eSpeak) | formant-synthesis, based on the 1995 Unix "speak" program. Open source. | 2006 | sample | sample | sample |
| IMS Stuttgart | diphone synthesis | Diphone concatenation developed at the IMS Stuttgart. TTS framework from Festival. Voice database from MBROLA. | 2000 | sample | sample | sample |
| | non-uniform unit-selection | Developed for the Smartkom project. TTS framework from Festival. (sample) | 2003 | - | - | - |
| KTH Stockholm | Infovox | formant-synthesis from Sweden. Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt (commercial version). | 1992 | sample | - | - |
| DFKI | MARY | Non-uniform unit selection based on the Pavoque corpus. | 2011 | sample | sample | sample |
| | | Non-uniform unit selection based on the BITS corpus; for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany. | 2007 (bits 1) | sample | sample | sample |
| | | | 2007 (bits 2) | sample | sample | sample |
| | | | 2007 (bits 3) | sample | sample | sample |
| | | | 2007 (bits 4) | sample | sample | sample |
| | | diphone-synthesis (DSP is MBROLA, NLP from DFKI/University of Saarbrücken) | 2000 | sample | sample | sample |
| Uni Mons | MBROLA / Txt2Pho. Hadifix NLP in combination with Mbrola synthesis (diphone-synthesis). Available free of charge for non-commercial use. Own sentences can be synthesized online. MBROLA-TTS is available for a large number of languages. | de8, see Markus Binsteiner | 2002 | - | - | - |
| | | de7 (by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities | 2002 | sample | sample | sample |
| | | de6 (by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities | 2002 | sample | sample | sample |
| | | de5 (by Fred Englert/ATIP, female, 22 kHz) | 2000 | sample | sample | sample |
| | | de4 (by IMS Stuttgart, male, 16 kHz), includes English and French diphones | 2002 | sample | sample | sample |
| | | de3 (by ATIP, female), first 22.05 kHz voice | 2000 | sample | sample | sample |
| | | de2 (by ATIP, male, 16 kHz) | 1997 | sample | sample | sample |
| | | de1 (by Fred Englert, female, 16 kHz) | 1996 | sample | sample | sample |
| Uni Budapest | Multivox 5 | (ProfiVox) diphone-concatenation from the University of Budapest. Male speaker 1 | 2004 | sample | sample | - |
| | | Male speaker 2 | 2004 | sample | sample | - |
| | Multivox 3 | formant-synthesis from TU Budapest. Languages: DE, HU, FI, NL, ES, PT, SA, Esperanto (!) (sample) | 1994 | - | - | - |
| Oregon Graduate Institute (OGI) | OGI/Festival | LPC-diphone concatenation developed at the Oregon Graduate Institute, Center for Spoken Language Understanding, during a workshop in 1998. TTS framework from Festival. | 1998 | sample | sample | - |
| ETH Zürich | SVOX | diphone-concatenation. The commercial version is listed above. | 1998 | sample | sample | - |
| Austrian Research Institute for Artificial Intelligence (ÖFAI) | VieCtoS | Vienna Concept-to-Speech system. If the prosody sounds poor, that is due to my limited knowledge of ToBI labels. The technology is demisyllable LPC concatenation. | 1998 | sample | - | - |
For the following systems it was not possible to synthesize my own sentences:
| name | description | year (approx.) | MP3 (64 kB) |
|---|---|---|---|
| CHATR | non-uniform unit selection waveform-concatenation from ATR, Japan. Male voice | 1997 | sample |
| | Female voice | 1997 | sample |
| Markus Binsteiner | from TFH Berlin, diphone synthesis with MBROLA (voice de8), simulation of a Bavarian accent. | 2004 | sample |
| (Uni-) Dresden | Voice 1: concatenative formant-synthesizer | 1993 | sample |
| | TUSY: hardware formant-synthesizer | 1987 | sample |
| | ROSY (Robotron Synthesizer): hardware formant-synthesizer | 1977 | sample |
| | Syni 2: punchcard-controlled formant-synthesizer | 1975 | sample |
| | Syni 1: punchcard-controlled formant-synthesizer | 1972 | sample |
| Eurovocs | new version, diphone-synthesis from t & i, technology from Lernout & Hauspie. | 1998 | sample |
| | old version, diphone-synthesis from t & i, technology from Lernout & Hauspie. | 1996 | sample |
| First Byte | product names: Monologue, ProVoice. waveform-concatenation synthesis (?) (link broken) | 1998 | sample |
| (Uni-) Köln | articulatory-synthesis (actually not a TTS system) | 1996 | sample |
| SAMT | (Sprach-Ausgabesystem in Multiplex-Technik): hardware-based formant synthesis of the former Forschungsinstitut der Deutschen Bundespost. | 1987 | sample |
| Texas Instruments Language Translator | LPC-coded word-concatenation from Texas Instruments. Male voice | 1980 | sample |
| SpeakEaZy | waveform-concatenation synthesis from Keller & Trauth (link broken). | 1998 | sample |
| Spengi | diphone-synthesis from Philips/IPO Eindhoven, information based on Gregor Moehler's examples. | 1997 | sample |
| LAIPTTS-D | TTS system from the University of Lausanne (LAIP), uses the MBROLA engine. Includes a model to reduce/elaborate articulation according to speech rate. | 1998 | sample, sample |
| University of West Bohemia in Pilsen | concatenative synthesizer ARTIC (ARtificial Talker In Czech). Commercial version available from SpeechTech under the name ERIS. | 2002 | sample |
| Univ. of Rostock, Peter Birkholz | Vocal Tract Lab: articulatory synthesis (hand-tweaked articulatory movements transformed into a mathematical model to generate sound waves) | 2006 | sample |
The following table lists some products to enhance text-to-speech quality.
| company | product | description | date | sample |
|---|---|---|---|---|
| ReadSpeaker | SagEs / SayIt | Server-based website reader. Based on Acapela products. The sample reads a newspaper article (Tagesspiegel). Note the pronunciation of the word "playstation". | 7/11/07 | sample |
| ETeX | - | Dictionaries. | 1/7/05 | sample |
| Interlinx, acquired by Speech Concept | emphasis / SpeechOptimizer | Tuning tool for pronunciation and prosody modeling. | 1/7/05 | sample |
Speech synthesis examples that did not fit elsewhere.
| Description | Example |
|---|---|
| Ultrafast speech synthesis as used by blind users, at 14 syllables per second, based on the formant synthesizer Eloquence | sample |
| RealSpeak British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin." | sample |
| TTS of the Fiat "Blue & Me" navigation head unit with Microsoft CE. Voice Steffi from Nuance. | sample |
| Apple iPhone 2011, recorded with a PC microphone from an Apple iPhone 4.1; the TTS is a faster, compact version of the voice Yannick from Nuance | sample |
The following engines are based on systems with a different name:
For the following systems I have not yet received samples:
For the following systems I have no information about the supplier:
Systems are usually either system- or signal-modeling, primarily rule-based or data-based, and can be distinguished by the type of their basic units and the way these are coded.
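These classification axes can be written down as a small record type. The Python sketch below is one possible encoding, not the chart itself: the class and field names are assumptions, and only values that the tables on this page state explicitly are filled in. The system/signal axis is left unset because the tables do not give it for individual systems.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modeling(Enum):
    SYSTEM = "system modeling"
    SIGNAL = "signal modeling"


class Knowledge(Enum):
    RULE_BASED = "rule-based"
    DATA_BASED = "data-based"


@dataclass(frozen=True)
class TtsSystem:
    name: str
    basic_unit: Optional[str] = None      # e.g. diphone, demisyllable
    unit_coding: Optional[str] = None     # e.g. LPC, PSOLA
    knowledge: Optional[Knowledge] = None
    modeling: Optional[Modeling] = None   # not stated in the tables


# Entries taken from the technology descriptions in the tables above.
SYSTEMS = [
    TtsSystem("Elan's Tempo", basic_unit="diphone", unit_coding="PSOLA"),
    TtsSystem("Aculab", basic_unit="diphone", unit_coding="LPC"),
    TtsSystem("Dectalk", knowledge=Knowledge.RULE_BASED),
    TtsSystem("VieCtoS", basic_unit="demisyllable", unit_coding="LPC"),
    TtsSystem("Logox", basic_unit="microsegment"),
]


def names_where(systems, **criteria):
    """Return the names of all systems whose fields match the given values."""
    return [s.name for s in systems
            if all(getattr(s, field) == value for field, value in criteria.items())]


if __name__ == "__main__":
    print(names_where(SYSTEMS, basic_unit="diphone"))
    print(names_where(SYSTEMS, unit_coding="LPC"))
```

Keeping unit type and unit coding as separate fields reflects the point that two systems can share a basic unit (diphones) while coding it differently (PSOLA versus LPC).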
Credits: the following persons delivered information and/or samples: