
German Text-to-Speech

last update: 3rd January 2018


  1. Foreword
  2. Commercial systems
  3. Universities/research
  4. Other systems
  5. Service systems
  6. Further samples
  7. Licensed products
  8. Missing examples
  9. Unknown examples
  10. TTS classification chart
  11. Credits
  12. Change-log



Foreword

I added a chart to make the concepts used for the classification easier to understand. It is somewhat outdated, as non-uniform unit selection is not mentioned explicitly.

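A TTS system always consists of two components, named here following Dutoit's introduction: a natural language processing (NLP) component, which turns the text into a phonetic transcription with prosodic information, and a digital signal processing (DSP) component, which turns that transcription into speech.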

The engines that synthesize the speech (the DSP component) are mainly based on five technologies:

The test sentences were:

sentence 1:

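» An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen. «
(English: At the weekends I have now always gone home and visited Agnes. The weather had actually always been very nice.)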

As I found this sentence a bit too simple, I thought up a second test sentence that contains a collection of known problems for the NLP module (in some demos this sentence is truncated because the provider restricts the number of characters):

sentence 2:

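» Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm. «
(English: Dr. A. Smithe of NATO, and not of the CIA, has, to my knowledge, been supplying e.g. the heroine daily since 15.3.00 with 13.84 grams of heroin at 1.04 DM per gram.)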
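Two of the NLP problems packed into sentence 2, abbreviations and decimal commas, can be illustrated with a minimal text-normalization sketch. The abbreviation lexicon below is hypothetical and tiny, and reading the decimal places digit by digit is an assumption about the desired output; dates such as "15.3.00", ordinals and the expansion of numbers into words are deliberately left out.

```python
import re

# Hypothetical mini-lexicon; a real NLP module needs a large, context-sensitive one.
ABBREVIATIONS = {
    "Dr.": "Doktor",
    "z.B.": "zum Beispiel",
    "tgl.": "täglich",
    "DM": "D-Mark",
}

def read_decimal(match):
    """Turn '13,84' into '13 Komma 8 4' (digits after the comma read one by one)."""
    whole, frac = match.group(1), match.group(2)
    return f"{whole} Komma {' '.join(frac)}"

def normalize(text):
    """Expand decimal commas and known abbreviations in a German sentence."""
    text = re.sub(r"(\d+),(\d+)", read_decimal, text)
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())

print(normalize("Dr. A. Smithe versorgt z.B. tgl. 13,84 Gramm zu 1,04 DM"))
# prints: Doktor A. Smithe versorgt zum Beispiel täglich 13 Komma 8 4 Gramm zu 1 Komma 0 4 D-Mark
```

A whitespace-based lookup like this fails as soon as punctuation sticks to a token, as in "(und", and it cannot tell whether "A." is an initial or the end of a sentence; this is why real systems tokenize and disambiguate before expanding abbreviations.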

Six years after thinking up those sentences, I find that more pressing problems for German speech synthesis in services such as e-mail reading arise from the pronunciation of English terms; for example, most systems would not pronounce the following sentence correctly without tuning:

sentence 3:

» Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden. «
(English: The manpower discussion is being cancelled; you can download the file from the server.)
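One common form of tuning is an exception lexicon that recognizes English stems inside German-inflected words, so that the English part can be passed to English pronunciation rules while German affixes ("ge-", "-t", "-en") and compound parts ("-diskussion") stay German. The sketch below only performs the detection step under that assumption; the stem list is hypothetical, and the simple prefix heuristic would also match German words such as "Filet".

```python
# Hypothetical tuning lexicon of English stems (a real one holds thousands).
ENGLISH_STEMS = {"manpower", "cancel", "file", "server", "download"}
GERMAN_PREFIXES = ("ge", "")  # try the participle prefix first, then none

def find_english_stem(word):
    """Split word into (German prefix, English stem, German remainder) or return None."""
    w = word.lower()
    for prefix in GERMAN_PREFIXES:
        if not w.startswith(prefix):
            continue
        core = w[len(prefix):]
        # Prefer the longest matching stem.
        for stem in sorted(ENGLISH_STEMS, key=len, reverse=True):
            if core.startswith(stem):
                return prefix, stem, core[len(stem):]
    return None

def annotate(sentence):
    """Map each word that contains a known English stem to its analysis."""
    words = (w.strip(".,!?«» ") for w in sentence.split())
    return {w: hit for w in words if (hit := find_english_stem(w))}

sentence = "Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden."
for word, parts in annotate(sentence).items():
    print(word, parts)
```

For sentence 3 this flags all five problem words, e.g. "gecancelt" as ('ge', 'cancel', 't') and "downloaden" as ('', 'download', 'en'); the remaining German words pass through unchanged.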


Commercial systems

company/link description/engine name technology languages voice name year (approx.) s1 s2 s3
Acapela Group (former Babeltech, Infovox and Elan)


Acapela HQ TTS non-uniform unit-selection DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR Claudia 2015 mp3 mp3 mp3
Claudia (Smile) 2017 mp3 - <-
Lea (child) 2013 mp3 mp3 mp3
Jonas (child) 2013 mp3 mp3 mp3
Andreas 2011 mp3 mp3 mp3
Julia 2009 mp3 mp3 mp3
Klaus 2006 mp3 mp3 mp3
Sarah 2003 mp3 mp3 mp3
Custom Voice Non-uniform unit-selection / Artificial neural nets. DE, US, UK, FR, ES, IT, BR, PT, RU, PL Felix (page author): artificial neural net model adapted with 15 minutes of data 2017 mp3 mp3 mp3
Felix (page author) non-uniform unit-selection from 2 hours of data 2017 mp3 mp3 mp3
greeting bunny non-uniform unit-selection DE, US, FR, IT, ES, NL, SE, NO, DK, BE bunny 2008 mp3 mp3 mp3
Elan's SaySo non-uniform unit-selection DE, US, FR, IT, ES Lea 2003 mp3 mp3 mp3
Elan's Tempo diphone-concatenation (PSOLA). Pitch Synchronous Overlap and Add: a famous algorithm for changing the pitch and duration of speech, which made diphone synthesis a great success for many years. DE, US, UK, FR, ES, IT, BR, PT, RU, PL Thomas 1998 mp3 mp3 mp3
Dagmar 1996 mp3 mp3 mp3
Babeltech's BrightSpeech non-uniform unit-selection, same as Acapela HQ TTS Ingrid 2002 mp3 mp3 -
Babeltech's Babil diphone-concatenation based on the commercial MBROLA engine. MBROLA (Multi Band Resynthesis Overlap and Add) is similar to PSOLA, but the database is processed beforehand to adapt pitch, amplitude and spectral features. DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA Eva 2000 mp3 mp3 mp3
Greta 2000 mp3 mp3 mp3
Helga (8 kHz) 2000 mp3 mp3 mp3
Gerhard (8 kHz) 2000 mp3 mp3 mp3
Steffen 1997 mp3 mp3 mp3
Infovox 330/Infovox Desktop diphone-concatenation (probably the same as Babil). Infovox 310 is the Apple version. DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE Helga 1996 mp3 mp3 -
Gerhard mp3 1996 - - -
Infovox 210/230 formant-synthesis (successor of KTH's OVE, originally Telia Promotor) DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE - 1994 mp3 mp3 -
Infovox Desktop PRO non-uniform unit-selection, same as Acapela HQ TTS
- diphone-concatenation with LPC-coded units. LPC (linear predictive coding), originally a compression algorithm, is useful for synthesis because it is based on a source/filter model of speech. DE, UK, US, FR, BR, IT, ES Julia 1998 mp3 mp3 -
Amazon (formerly Ivona)
IVONA TTS non-uniform unit-selection DE, US, UK, ES, RO, PL, MX Hans 2011 mp3 mp3 mp3
Marlene 2011 mp3 mp3 mp3
CereVoice, developments from Aristech, CereProc and the University of Edinburgh non-uniform unit-selection DE, EN, FR, IT, ES, US, NL, JP Sophie, adult, corporate voice, courtesy of Aristech 2011 mp3 mp3 mp3
Leopold, Austrian adult, courtesy of Aristech 2013 mp3 mp3 mp3
Alex, adult, courtesy of Aristech 2016 mp3 mp3 mp3
Gudrun, adult, courtesy of Aristech 2013 mp3 mp3 mp3
Nick, youth, courtesy of Aristech 2011 mp3 mp3 mp3
Saskia, youth, courtesy of Aristech 2011 mp3 mp3 mp3
Proser NLP-component and voices from Atip, Mbrola Engine (diphone-concatenation) from Babeltech DE, US Carla 2000 mp3 mp3 mp3
Erkan (Turkish accent) 2004 mp3 mp3 mp3
Fifi (French accent) 2004 mp3 mp3 mp3
Steffen 1997 mp3 mp3 mp3
Eva 2000 mp3 mp3 mp3
Natural Voices non-uniform unit-selection DE, IT, US, UK, FR, MX* Klara 2001 mp3 mp3 mp3
Reiner 2002 mp3 mp3 mp3
Bell-Labs (Lucent)
- LPC-diphone concatenation DE, FR, ES, US, UK, IT, RU, RO, CN - 1997 mp3 mp3 -
- non-uniform unit-selection DE, UK, US, ES, FR, EG, TH, AF Katrin 2003 mp3 mp3 mp3
Matthias 2003 mp3 mp3 mp3
DECtalk rule-based formant-synthesis (the legendary formant synthesizer, based on Klatt's MITalk) DE, US, UK, ES, MX*, FR - 1982 mp3 mp3 -
Logox microsegment synthesis (concatenating subphonetic units), no longer developed DE, US, UK - 2000 mp3 mp3 -
Bill 1998 mp3 mp3 mp3
Bill (Swabian accent) 2002 mp3 mp3 mp3
Bill (Hessian accent) 2002 mp3 mp3 mp3
Bill (Saxon accent) 2002 mp3 mp3 mp3
Bill (French accent) 2002 mp3 mp3 mp3
Unknown non-uniform unit-selection, can be accessed via the translation service NA female 19th oct. 2013 mp3 mp3 mp3
CTTS non-uniform unit-selection DE, US, UK, JP, KR, IT, ES, FR male, courtesy of IBM. Database speaker is Gilles Karolyi. 2002 mp3 mp3 mp3
female mp3 8kHz 2004 - - -
Development system from unsupervised audiobook extraction non-uniform unit-selection DE, US, UK, GR, BG Christian, courtesy of Innoetics. 2015 mp3 mp3 mp3
Claudia, courtesy of Innoetics. 2015 mp3 mp3 mp3
Jessi, courtesy of Innoetics. 2015 mp3 mp3 mp3
Karlsson, courtesy of Innoetics. 2015 mp3 mp3 mp3
Orpheus, formerly from Dolphin Oceanic Ltd formant synthesis. DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN (Cantonese and Mandarin), CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE - 2009 mp3 mp3 mp3
Microsoft Speech Platform - Runtime Languages (Version 11) non-uniform unit-selection. ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN Hedda 2012 mp3 mp3 mp3
A Hoya company, as is ReadSpeaker. Non-uniform unit-selection DE, US, UK, MX, TW, TH, KR, IT, CN, CH, JP, CT, BR, PT, FR Lena, female voice 2018 mp3
Tim, male voice 2018 mp3
Nuance (formerly called Scansoft)
Vocalizer (formerly RealSpeak, originally from Lernout & Hauspie), converged with RVoice (formerly Rhetorical); the first commercial German unit-selection TTS. non-uniform unit-selection DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO Victor 2016 mp3 mp3 mp3
Anna (11 kHz, courtesy of Nuance) 2010 mp3 mp3 mp3
Yannick (11 kHz, courtesy of Nuance) 2006 mp3 mp3 mp3
Yannick embedded version recorded from a cell phone 2009 mp3 mp3 mp3
Monika and/or Beate (?) - same as RVoice F026 2005 mp3 mp3 mp3
Steffi (8 kHz) 2004 mp3 mp3 mp3
Steffi 2, newer version with enhanced voice quality and better pronunciation. 2015 mp3 mp3 mp3
Vera (8 kHz) 1999 mp3 mp3 mp3
Former Loquendo, formerly called Actor, now Loquendo TTS non-uniform unit-selection DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE Ulrike, no longer available 2001 mp3 mp3 mp3
Stefan, courtesy of Loquendo 2003 mp3 mp3 mp3
Katrin, courtesy of Loquendo 2003 mp3 mp3 mp3
Former SVOX, commercial version of the ETH Zürich system. diphone-synthesis DE, FR, IT, US, ES Nicole 2000 mp3
Former SVOX, Corporate non-uniform unit-selection DE, US Petra 2005 mp3 mp3 mp3
Markus 2005 mp3 mp3 mp3
Marlene mp3 2003 - - -
Speechify (formerly SpeechWorks, now merged with Scansoft) non-uniform unit-selection DE, US, UK, AU, JP, MX*, FR, BR, CA(FR) Tessa 2002 mp3
RVoice, former Rhetorical non-uniform unit-selection DE, UK, US, GR, ES F018 2002 mp3 mp3 mp3
M027 2004 mp3 mp3 mp3
F026 2004 mp3 mp3 mp3
Vocalizer 4.05 (former Nuance, before the acquisition by ScanSoft) non-uniform unit-selection DE, US, UK, AU, CA(FR), MX*, BR Anna Weber 2004 mp3 mp3 mp3
Vocalizer 1.0 (former Nuance, before the acquisition by ScanSoft) non-uniform unit-selection (licensed fonix engine) DE, US, UK, NL, FR, IT, NO, ES, SE - 2001 mp3 mp3 -
ETI Eloquence (originally from Eloquent Technologies, then SpeechWorks), also licensed to IBM (ViaVoice Outloud) rule-based formant-synthesis (Klatt-style) DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR - 1998 mp3 mp3 -
TTS3000 (orig. Lernout & Hauspie) diphone-synthesis DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR Stefan 1996 mp3 mp3 -
Anna mp3 1996 - - -
TruVoice (orig. Centigram, then Lernout & Hauspie) formant-synthesis DE, US, MX*, FR, IT - 1996 mp3 mp3 -
OnScreenVoices, by tom weber software
Samples courtesy of tom weber software Non-uniform unit-selection synthesis DE Andreas 2015 mp3
Marianne 2015 mp3
A Hoya company. non-uniform unit-selection synthesis using deep artificial neural networks DE, GB, US, AU, ES, FR, NL, SE Max, male voice, courtesy of ReadSpeaker 2018 mp3
Commercial version of the Dress Synthesizer. diphone-synthesis DE male voice 2000 mp3
female voice 2000 mp3
Votrax Early hardware Formant synthesizer Formant synthesis DE Samples taken from an Audiodata Braille reader. 1974 mp3 mp3 mp3
Spin-off from Orange Labs. Hybrid Non-uniform unit-selection / HMM synthesis DE, FR, EN, ES, IT, AR Sylvia, female voice, courtesy of Voxygen 2014 mp3
Matthias, male voice, courtesy of Voxygen 2014 mp3

* Mexican stands for Latin American Spanish
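Most entries in the table above use non-uniform unit selection. Its search step is commonly formulated as dynamic programming (a Viterbi search) over candidate units, minimizing the sum of target costs (how well a unit fits the desired specification) and join costs (how well neighbouring units fit together). The sketch below assumes hypothetical units described only by a label and a mean pitch, with toy cost functions; the zero join cost for units that were adjacent in the recordings is what lets the search favour longer contiguous stretches, i.e. "non-uniform" units.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit: label, origin in the recordings, mean F0 in Hz."""
    label: str
    utt: int
    pos: int
    pitch: float

def target_cost(unit, wanted_pitch):
    # Toy target cost: pitch mismatch only.
    return abs(unit.pitch - wanted_pitch) / 10.0

def join_cost(a, b):
    # Units that were neighbours in the same recording join for free.
    if a.utt == b.utt and b.pos == a.pos + 1:
        return 0.0
    return 1.0 + abs(a.pitch - b.pitch) / 20.0

def select_units(targets, database):
    """Viterbi search; targets is a list of (label, wanted_pitch) pairs.

    Assumes every target label has at least one candidate in the database.
    """
    cands = [[u for u in database if u.label == lab] for lab, _ in targets]
    # best[i][j] = (cheapest cost of a path ending in cands[i][j], back-pointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in cands[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in cands[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(cands[i - 1])
            )
            row.append((cost + target_cost(u, targets[i][1]), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(cands[i][j])
        j = best[i][j][1]
    return path[::-1]

db = [
    Unit("h-a", 1, 0, 125), Unit("a-l", 1, 1, 120), Unit("l-o", 1, 2, 115),  # one recording
    Unit("h-a", 2, 5, 120), Unit("a-l", 3, 7, 115), Unit("l-o", 4, 2, 110),  # scattered units
]
best_path = select_units([("h-a", 120), ("a-l", 115), ("l-o", 110)], db)
print([(u.utt, u.pos) for u in best_path])  # prints [(1, 0), (1, 1), (1, 2)]
```

Here the contiguous stretch from recording 1 wins (total cost 1.5) although the scattered units match the target pitch exactly, because every non-contiguous join adds at least 1.0. Real systems use many more features, such as phonetic context, duration and spectral distance at the joins, and prune the candidate lists.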

Universities / Research

institution system description year (approx.) s1 s2 s3
Simple4All Tundra corpus EU FP7 Project "Simple4All" Tundra corpus, system features unsupervised learning. 2013 mp3 mp3 mp3
Berkom Felix Research system by the former R&D department of German Telekom. Hybrid approach combining formant synthesis for voiced phonemes with concatenation of waveform-coded units for unvoiced parts. 1998 mp3 mp3 mp3
Ruhr-Universität Bochum SyRUB, Version 4.1.1 Research system of the Ruhr-Universität Bochum. 1995 mp3 mp3 mp3
IKP Bonn BOSS Speech Synthesis Framework based on non-uniform unit-selection 2001 mp3 mp3 mp3
HADIFIX mixed inventory concatenation 1995 mp3 mp3 -
Uni Dresden DreSS diphone-synthesis 1996 mp3 mp3 mp3
Gerhard Mercator University of Duisburg - formant-synthesis 1996 mp3 mp3 -
Jonathan Duddington e-speak (eSpeak) formant-synthesis, based on the 1995 Unix "speak" program. Open source 2006 mp3 mp3 mp3
IMS Stuttgart diphone synthesis diphone concatenation developed at the IMS Stuttgart. TTS-Framework from Festival. Voice-Database from MBROLA 2000 mp3 mp3 mp3
non-uniform unit-selection engine Developed for the Smartkom-project. TTS-Framework from Festival. mp3 2003 - - -
KTH Stockholm Infovox formant-synthesis from Sweden. Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt (commercial version). 1992 mp3 - -
DFKI MARY Non-uniform unit selection based on the Pavoque corpus. 2011 mp3 mp3 mp3
Non-uniform unit selection based on the BITS corpus, for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany. 2007
bits 1 mp3 mp3 mp3
bits 2 mp3 mp3 mp3
bits 3 mp3 mp3 mp3
bits 4 mp3 mp3 mp3
diphone-synthesis (DSP is MBROLA, NLP from DFKI/University of Saarbrücken) 2000 mp3 mp3 mp3
Uni Mons MBROLA / Txt2Pho. Hadifix NLP in combination with MBROLA synthesis (diphone-synthesis). Available free of charge for non-commercial use. MBROLA-TTS is available for a large number of languages. de8, see Markus Binsteiner 2002 - - -
de7 (by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities 2002 mp3 mp3 mp3
de6 (by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities 2002 mp3 mp3 mp3
de5 (by Fred Englert/ATIP, female, 22 kHz) 2000 mp3 mp3 mp3
de4 (by IMS Stuttgart, male, 16 kHz), includes english and french diphones 2002 mp3 mp3 mp3
de3 (by ATIP, female), first 22.05 kHz voice 2000 mp3 mp3 mp3
de2 (by ATIP, male, 16 kHz) 1997 mp3 mp3 mp3
de1 (by Fred Englert, female, 16 kHz) 1996 mp3 mp3 mp3
Uni Budapest Multivox 5 (ProfiVox) diphone-concatenation from University of Budapest. male speaker 1 2004 mp3 mp3 -
male speaker 2 2004 mp3 mp3 -
Multivox 3 formant-synthesis from TU-Budapest. Languages: DE, HU, FI, NL, ES, PT, SA, Esperanto (!) mp3 1994 - - -
Oregon Graduate Institute (OGI) OGI/Festival LPC-diphone concatenation developed at the Oregon Graduate Institute, Center for Spoken Language Understanding, during a workshop in 1998. TTS framework from Festival. 1998 mp3 mp3 -
ETH Zürich SVOX diphone-concatenation (for the commercial version see "Former SVOX" under commercial systems) 1998 mp3 mp3 mp3
Austrian Research Institute for Artificial Intelligence (ÖFAI) VieCtoS Vienna Concept-to-Speech system. If the prosody sounds poor, this is due to my limited knowledge of ToBI labels. The technology is demisyllable LPC concatenation. 1998 mp3 - -
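Several of the diphone systems above (the MBROLA voices, DreSS, SVOX) impose the target prosody on stored units with PSOLA-style overlap-add. A minimal time-domain PSOLA pitch modification could look like the sketch below; it assumes the pitch marks are already known and that there are at least two of them, treats the whole signal as voiced and does not compensate duration. It illustrates the principle and is not the MBROLA algorithm, which additionally resynthesizes the database beforehand.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def psola_pitch_shift(x, marks, factor):
    """Scale the pitch of x by factor (> 1 raises it) using TD-PSOLA.

    x: list of samples
    marks: ascending analysis pitch marks (sample indices), at least two
    factor: pitch-scaling factor; the overall duration stays roughly the same
    """
    out = [0.0] * len(x)
    t = float(marks[0])  # current synthesis pitch mark
    while t <= marks[-1]:
        # Take the analysis mark closest to the synthesis mark.
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - t))
        # Local pitch period from the neighbouring analysis mark.
        period = marks[i + 1] - marks[i] if i + 1 < len(marks) else marks[i] - marks[i - 1]
        # Cut a two-period, Hann-windowed segment around the analysis mark
        # and overlap-add it centred on the synthesis mark.
        w = hann(2 * period + 1)
        centre = int(round(t))
        for n in range(-period, period + 1):
            src, dst = marks[i] + n, centre + n
            if 0 <= src < len(x) and 0 <= dst < len(out):
                out[dst] += w[n + period] * x[src]
        # Synthesis marks are spaced by the local period divided by the factor.
        t += period / factor
    return out

# A pulse train with a period of 100 samples, pitch marks on the pulses.
signal = [1.0 if k % 100 == 0 and k > 0 else 0.0 for k in range(1000)]
marks = list(range(100, 1000, 100))
shifted = psola_pitch_shift(signal, marks, 2.0)
print([k for k, v in enumerate(shifted) if v > 0.5])  # pulses now every 50 samples
```

With a factor of 2 the synthesis marks are placed every 50 samples, so the output pulse train has twice the fundamental frequency over the same time span; segments are reused (here each analysis period roughly twice) rather than the signal being resampled, which is why the spectral envelope is largely preserved.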

With the following systems it was not possible to synthesize my own sentences:

name/link description year (approx.) sample
AEG Telefunken unknown concatenation ("Parcor-Synthetisator") 1978 mp3
CHATR non-uniform unit selection waveform-concatenation from ATR, Japan. Male Voice 1997 mp3
Female Voice 1997 mp3
Char2Wav Deep artificial neural networks from the University of Montreal: an end-to-end model for speech synthesis learned with Deeplearning4J. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. For the German samples, the Pavoque database was used for training. 2017 mp3
Markus Binsteiner from TFH Berlin, diphone synthesis with MBROLA (voice de8), simulating a Bavarian accent. 2004 mp3
(Uni-) Dresden Voice 1: concatenative formant-synthesizer 1993 mp3
TUSY: hardware formant-synthesizer 1987 mp3
ROSY (Robotron Synthesizer): hardware formant-synthesizer 1977 mp3
Syni 2: punchcard controlled formant-synthesizer 1975 mp3
Syni 1: punchcard controlled formant-synthesizer 1972 mp3
Eurovocs new version, diphone-synthesis from t & i, technology from Lernout & Hauspie. 1998 mp3
old version, diphone-synthesis from t & i, technology from Lernout & Hauspie. 1996 mp3
First Byte product names: Monologue, ProVoice. Waveform-concatenation synthesis (?) (link broken) 1998 mp3 mp3
HHI unknown 1978 mp3
(Uni-) Köln articulatory-synthesis (actually not a TTS-system) 1996 mp3
KTH's OVE III Formant synthesis from the KTH, Sweden 1967 mp3
Karl Küpfmüller / Bernhard Cramer Hardware phoneme concatenation 1955 mp3
LAIPTTS-D TTS system from the University of Lausanne (LAIP), uses the MBROLA engine. Includes a model that reduces or elaborates articulation according to speech rate. 1998 mp3 mp3
Unknown Russian TTS unknown / formant? 1970? mp3
SAMT (Sprach-Ausgabesystem in Multiplex-Technik): hardware-based formant synthesis of former Forschungsinstitut der Deutschen Bundespost. 1987 mp3
H.W. Strube, University of Göttingen Articulatory synthesis. 1977? mp3
Texas Instruments Language Translator LPC coded word-concatenation from Texas Instruments. Male Voice 1980 mp3
SpeakEaZy waveform-concatenation synthesis from Keller & Trauth. (link broken). 1998 mp3
Spengi diphone-synthesis from Philips/IPO Eindhoven 1997 mp3
University of West Bohemia in Pilsen concatenative synthesizer ARTIC (ARtificial Talker In Czech). Commercial version available by speechtech by the name of ERIS. 2002 mp3
Wolfgang von Kempelen's Speaking Machine Hardware manual sound generator ("papa", "mama") 1769 mp3
Vocal Tract Lab Univ. of Dresden, Peter Birkholz Vocal Tract Lab: articulatory synthesis (hand-tweaked articulatory movements are transformed by a mathematical model into sound waves) 2006 mp3
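Many of the early systems in this table, like DECtalk, Eloquence and eSpeak further up, are formant synthesizers. Their core building block is one second-order digital resonator per formant, in the form described by Klatt (1980); the sketch below cascades three of them and excites them with a plain impulse train. The formant and bandwidth values for the [a]-like vowel are rough illustrative assumptions, and real formant synthesizers use shaped glottal pulses, noise sources and time-varying parameters.

```python
import math

def resonator(x, freq, bw, fs):
    """Second-order digital resonator in the form given by Klatt (1980)."""
    T = 1.0 / fs
    C = -math.exp(-2 * math.pi * bw * T)
    B = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    A = 1.0 - B - C  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0    # the two previous output samples
    out = []
    for sample in x:
        y = A * sample + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(formants, f0=100.0, fs=16000, dur=0.3):
    """Excite a cascade of formant resonators with an impulse train at f0."""
    n = int(round(fs * dur))
    period = int(fs / f0)
    # Voice source: plain impulses, one per pitch period.
    signal = [1.0 if k % period == 0 else 0.0 for k in range(n)]
    # Vocal tract: one resonator per (frequency, bandwidth) pair, in cascade.
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal

# Approximate formant frequencies/bandwidths in Hz for an [a]-like vowel.
A_VOWEL = [(730, 60), (1090, 90), (2440, 120)]
samples = synthesize_vowel(A_VOWEL)
```

The result is 0.3 s of a buzzy, vowel-like signal at 16 kHz; after scaling to 16-bit integers it can be written to a file with Python's standard `wave` module. Changing the formant list changes the vowel, and changing `f0` changes the pitch independently, which is the flexibility that made rule-based formant synthesis attractive.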

Service products

The following table lists some products to enhance text-to-speech quality.

company product description date sample
ReadSpeaker (which now commercializes its own engine under the name rSpeak; both are Hoya companies) SagEs / SayIt Server-based website reader. Based on Acapela products. The sample reads a newspaper article (Tagesspiegel). Note the pronunciation of the word "playstation". 7/11/07 mp3
ETeX - Dictionaries. 1/7/05 mp3
Interlinx, acquired by Speech Concept emphasis / SpeechOptimizer Tuning tool for pronunciation and prosody modeling. 1/7/05 mp3

Further examples

Speech synthesis examples that did not fit elsewhere.

Description Example
Ultrafast speech synthesis as used by blind people, with 14 syllables per second, based on formant synthesis Eloquence mp3
RealSpeak British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin." mp3
TTS of the Fiat "Blue & Me" navigation head unit with Microsoft CE. Voice Steffi of Nuance. mp3
Apple iPhone 2011, recorded with a PC microphone from an Apple iPhone 4.1; the TTS is a faster, compact version of the voice Yannick by Nuance mp3, mp3, mp3
"Karlchen": telephone-based automatic train information system from the 1970s mp3

Licensed Systems

The following engines are based on systems with a different name:

Missing examples

For the following systems I have not yet obtained samples:

Unknown examples

For the following systems I have no information about the supplier:

Categorization of text-to-speech systems

Systems are usually based on either system modeling or signal modeling, are primarily rule-based or data-based, and can be distinguished by the type of their basic units and the way these units are coded.
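The four classification axes can be written down as a small data structure, which makes it easy to query the catalogue along any of them. The entries below are an illustrative interpretation of the table descriptions on this page (for example, treating formant synthesis as phoneme-based and the Vocal Tract Lab as rule-based), not vendor information, and the names `TTSClass` and `systems_with` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Modeling(Enum):
    SYSTEM = "system modeling"  # models the speech production apparatus
    SIGNAL = "signal modeling"  # models the acoustic speech signal

class Knowledge(Enum):
    RULE_BASED = "rule-based"
    DATA_BASED = "data-based"

@dataclass(frozen=True)
class TTSClass:
    modeling: Modeling
    knowledge: Knowledge
    unit: str    # type of the basic unit, e.g. "diphone"
    coding: str  # how the units are coded, e.g. "LPC"

# Illustrative entries derived from the table descriptions on this page;
# the labels are an interpretation, not vendor information.
SYSTEMS = {
    "DECtalk": TTSClass(Modeling.SIGNAL, Knowledge.RULE_BASED, "phoneme", "formant parameters"),
    "MBROLA de7": TTSClass(Modeling.SIGNAL, Knowledge.DATA_BASED, "diphone", "MBROLA"),
    "VieCtoS": TTSClass(Modeling.SIGNAL, Knowledge.DATA_BASED, "demisyllable", "LPC"),
    "Loquendo TTS": TTSClass(Modeling.SIGNAL, Knowledge.DATA_BASED, "non-uniform", "waveform"),
    "Vocal Tract Lab": TTSClass(Modeling.SYSTEM, Knowledge.RULE_BASED, "articulatory gesture", "articulatory model"),
}

def systems_with(**criteria):
    """Names of all systems whose classification matches every given field."""
    return sorted(name for name, cls in SYSTEMS.items()
                  if all(getattr(cls, field) == value for field, value in criteria.items()))

print(systems_with(knowledge=Knowledge.DATA_BASED))
# prints ['Loquendo TTS', 'MBROLA de7', 'VieCtoS']
```

Keeping the axes as separate fields rather than one combined label reflects the point of the chart: the same unit type (e.g. diphones) can be coded in different ways (LPC, PSOLA, MBROLA), so the dimensions vary independently.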


Credits

The following persons provided information and/or samples:

