

German Text-to-Speech

Last update: 29th January 2014

Contents:

  1. Foreword
  2. Commercial systems
  3. Universities/research
  4. Other systems
  5. Service systems
  6. Further samples
  7. Licensed products
  8. Missing examples
  9. Unknown examples
  10. TTS classification chart
  11. Credits
  12. Change-log

Remarks

Terminology

I added a chart to make the concepts used for classification easier to understand. It is somewhat outdated, as non-uniform unit selection is not explicitly mentioned.

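A TTS system always consists of two components, which I name following Dutoit's introduction: the NLP component, which analyses the text, and the DSP component, which synthesizes the speech.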

The engines that synthesize the speech (the DSP component) are based mainly on four technologies:


The test sentences were:

sentence 1:

» An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen. «

As I found this sentence a bit too simple, I thought up another test sentence that contains a collection of known problems for the NLP module (in some demos this sentence is truncated because of the provider's limit on the number of characters):

sentence 2:

» Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm. «
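To make the problem classes in sentence 2 more concrete, here is a minimal Python sketch that tags the tokens an NLP module has to expand before synthesis. The category names, the small abbreviation lexicon and the regular expressions are illustrative assumptions of mine, not the rules of any system listed on this page.

```python
import re

# Hypothetical mini-lexicons; a real NLP module would use far larger ones.
ABBREVIATIONS = {"Dr.", "z.B.", "tgl."}
CURRENCIES = {"DM"}

# Hypothetical pattern classes, checked in this order.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")),
    ("decimal", re.compile(r"\d+,\d+")),        # German decimal comma
    ("initial", re.compile(r"[A-ZÄÖÜ]\.")),     # single capital plus dot
    ("acronym", re.compile(r"[A-ZÄÖÜ]{2,}")),   # spelled out or read as a word
]


def classify(token):
    """Return the problem class of a token, or None for ordinary words."""
    if token in ABBREVIATIONS:
        return "abbreviation"
    if token in CURRENCIES:
        return "currency"
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


def tag_tokens(sentence):
    """Return (token, category) pairs for all tokens that need expansion."""
    tagged = []
    for raw in sentence.split():
        token = raw.strip("()«»")  # drop brackets and quotation marks
        category = classify(token)
        if category is not None:
            tagged.append((token, category))
    return tagged


SENTENCE_2 = ("Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - "
              "meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 "
              "Gramm Heroin zu 1,04 DM das Gramm.")

for token, category in tag_tokens(SENTENCE_2):
    print(f"{token:10} {category}")
```

The lexicon lookup is used for abbreviations because a plain "word plus dot" pattern would also match the sentence-final "Gramm.". Tagging is only the first step: deciding how each class is spoken (NATO as a word, CIA letter by letter) still needs further rules.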

Now, six years after thinking up those sentences, a more pressing problem for German speech synthesis in services such as email reading is the pronunciation of English terms. For example, most systems would not pronounce the following sentence correctly without tuning:

sentence 3:

» Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden. «
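One simple form of such tuning is an exception lexicon. The following Python sketch flags words in sentence 3 that contain an English stem, so that they could be routed to a different pronunciation rule. The stem list and the plain substring matching are assumptions chosen for illustration; substring matching is crude and would produce false hits on a larger vocabulary.

```python
# Hypothetical exception lexicon of English stems; a real system would pair
# each entry with a phonetic transcription.
ENGLISH_STEMS = ("manpower", "cancel", "file", "server", "download")


def find_english_stems(sentence):
    """Return (word, stem) pairs for words containing a listed English stem."""
    hits = []
    for raw in sentence.split():
        word = raw.strip(".,«»")  # drop punctuation and quotation marks
        lowered = word.lower()
        for stem in ENGLISH_STEMS:
            if stem in lowered:
                hits.append((word, stem))
                break
    return hits


SENTENCE_3 = ("Die Manpowerdiskussion wird gecancelt, du kannst das File "
              "vom Server downloaden.")

for word, stem in find_english_stems(SENTENCE_3):
    print(f"{word:20} contains English stem '{stem}'")
```

Note that the English stems appear inside German compounds and inflections (Manpowerdiskussion, gecancelt, downloaden), which is why a whole-word dictionary lookup alone would miss them.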

Commercial

company/link engine name technology languages voice name year (approx.) s1 s2 s3
Acapela Group (formerly Babeltech, Infovox and Elan)
Acapela HQ TTS non-uniform unit-selection DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR Lea (Child) 2013 mp3 mp3 mp3
Jonas (child) 2013 mp3 mp3 mp3
Andreas 2011 mp3 mp3 mp3
Julia 2009 mp3 mp3 mp3
Klaus 2006 mp3 mp3 mp3
Sarah 2003 mp3 mp3 mp3
greeting bunny non-uniform unit-selection DE, US, FR, IT, ES, NL, SE, NO, DK, BE bunny 2008 mp3 mp3 mp3
Elan's SaySo non-uniform unit-selection DE, US, FR, IT, ES Lea 2003 mp3 mp3 mp3
Elan's Tempo diphone-concatenation (PSOLA: Pitch Synchronous Overlap and Add, a well-known algorithm for changing the pitch and timing of speech that made diphone synthesis a great success for many years) DE, US, UK, FR, ES, IT, BR, PT, RU, PL Thomas 1998 mp3 mp3 mp3
Dagmar 1996 mp3 mp3 mp3
Babeltech's BrightSpeech non-uniform unit-selection, same as Acapela HQ TTS Ingrid 2002 mp3 mp3 -
Babeltech's Babil diphone-concatenation based on commercial Mbrola-engine. MBROLA (Multi Band Resynthesis Overlap and Add), similar to PSOLA but the database is treated beforehand to adapt pitch, amplitude and spectral features. DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA Eva 2000 mp3 mp3 mp3
Greta 2000 mp3 mp3 mp3
Helga (8 kHz) 2000 mp3 mp3 mp3
Gerhard (8 kHz) 2000 mp3 mp3 mp3
Steffen 1997 mp3 mp3 mp3
Infovox 330/Infovox Desktop diphone-concatenation (probably same as Babil). Infovox 310 is the Apple version DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE Helga 1996 mp3 mp3 -
Gerhard mp3 1996 - - -
Infovox 210/230 formant-synthesis (successor of KTH's OVE, originally Telia Promotor) DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE - 1994 mp3 mp3 -
Infovox Desktop PRO non-uniform unit-selection, same as Acapela HQ TTS
Aculab
. diphone-concatenation with LPC coded units. LPC (linear predictive coding), originally a compression algorithm, useful for synthesis because based on a source/filter model of speech. DE, UK, US, FR, BR, IT, ES Julia 1998 mp3 mp3 -
Amazon (formerly Ivona)
IVONA TTS non-uniform unit-selection DE, US, UK, ES, RO, PL, MX Hans 2011 mp3 mp3 mp3
Marlene 2011 mp3 mp3 mp3
Aristech
Cerevoice, Developments from SpeechConcept, CereProc, University of Edinburgh non-uniform unit-selection DE, EN, FR, IT, ES, US, NL, JP Sophie, adult, Corporate Voice, courtesy of Aristech 2011 mp3 mp3 mp3
Leopold, Austrian adult, courtesy of Aristech 2013 mp3 mp3 mp3
Alex, adult, courtesy of Aristech 2013 mp3 mp3 mp3
Gudrun, adult, courtesy of Aristech 2013 mp3 mp3 mp3
Nick, youth, courtesy of Aristech 2011 mp3 mp3 mp3
Saskia, youth, courtesy of Aristech 2011 mp3 mp3 mp3
Atip
Proser NLP-component and voices from Atip, Mbrola Engine (diphone-concatenation) from Babeltech DE, US Carla 2000 mp3 mp3 mp3
Erkan (Turkish accent) 2004 mp3 mp3 mp3
Fifi (French accent) 2004 mp3 mp3 mp3
Steffen 1997 mp3 mp3 mp3
Eva 2000 mp3 mp3 mp3
AT&T
Natural Voices non-uniform unit-selection DE, IT, US, UK, FR, MX* Klara 2001 mp3 mp3 mp3
Reiner 2002 mp3 mp3 mp3
Bell-Labs (Lucent)
- LPC-diphone concatenation DE, FR, ES, US, UK, IT, RU, RO, CN - 1997 mp3 mp3 -
Cepstral
- non-uniform unit-selection DE, UK, US, ES, FR, EG, TH, AF Katrin 2003 mp3 mp3 mp3
Matthias 2003 mp3 mp3 mp3
Fonix/SpeechFX
Dectalk rule-based formant-synthesis (the legendary formant synthesizer, based on Klatt's MITalk) DE, US, UK, ES, MX*, FR - 1982 mp3 mp3 -
GData
Logox microsegment synthesis (concatenating subphonetic units), no longer developed DE, US, UK - 2000 mp3 mp3 -
Bill 1998 mp3 mp3 mp3
Bill Swabian accent 2002 mp3 mp3 mp3
Bill Hessian accent 2002 mp3 mp3 mp3
Bill Saxon accent 2002 mp3 mp3 mp3
Bill French accent 2002 mp3 mp3 mp3
Google
Unknown non-uniform unit-selection, can be accessed via the translation service NA female 19th Oct. 2013 mp3 mp3 mp3
IBM
CTTS non-uniform unit-selection DE, US, UK, JP, KR, IT, ES, FR male, courtesy of IBM. Database speaker is Gilles Karolyi. 2002 mp3 mp3 mp3
8kHz
female mp3 8kHz 2004 - - -
Lumenvox
Lumenvox Text-to-Speech non-uniform unit-selection DE, US, UK, AU, BR, FR, CA, ES, DA, NL, MX, PO, IT, RO, RU, Welsh Lukas 2013 mp3 mp3 mp3
Heidi 2013 mp3 mp3 mp3
Meridian
Orpheus, formerly from Dolphin Oceanic Ltd formant synthesis. DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN (Cantonese and Mandarin), CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE - 2009 mp3 mp3 mp3
Microsoft
Microsoft Speech Platform - Runtime Languages (Version 11) non-uniform unit-selection. ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN Hedda 2012 mp3 mp3 mp3
Nuance (formerly called Scansoft)
Vocalizer (formerly RealSpeak, originally from Lernout & Hauspie), converged with RVoice (formerly Rhetorical); first commercial German unit-selection TTS non-uniform unit-selection DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO Anna (11 kHz, courtesy of Nuance) 2010 mp3 mp3 mp3
Yannick (11 kHz, courtesy of Nuance) 2006 mp3 mp3 mp3
Yannick embedded version recorded from a cell phone 2009 mp3 mp3 mp3
Monika and/or Beate (?) - same as RVoice F026 2005 mp3 mp3 mp3
Steffi (8 kHz) 2004 mp3 mp3 mp3
Vera (8 kHz) 1999 mp3 mp3 mp3
Former Loquendo, formerly called Actor, now Loquendo TTS non-uniform unit-selection DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE Ulrike, no longer available 2001 mp3 mp3 mp3
Stefan, courtesy of Loquendo 2003 mp3 mp3 mp3
Katrin, courtesy of Loquendo 2003 mp3 mp3 mp3
Former SVOX, commercial version of the ETH-Zuerich System. diphone-synthesis DE, FR, IT, US, ES Nicole 2000 mp3 mp3 -
Former SVOX, Corporate non-uniform unit-selection DE, US Petra 2005 mp3 mp3 mp3
Markus 2005 mp3 mp3 mp3
Marlene mp3 2003 - - -
Speechify (formerly SpeechWorks, now merged with Scansoft) non-uniform unit-selection DE, US, UK, AU, JP, MX*, FR, BR, CA(FR) Tessa 2002 mp3 mp3 mp3
RVoice, former Rhetorical non-uniform unit-selection DE, UK, US, GR, ES F018 2002 mp3 mp3 mp3
M027 2004 mp3 mp3 mp3
F026 2004 mp3 mp3 mp3
Vocalizer 4.05 (former Nuance, before acquisition by Scansoft) non-uniform unit-selection DE, US, UK, AU, CA(FR), MX*, BR Anna Weber 2004 mp3 mp3 mp3
Vocalizer 1.0 (former Nuance, before acquisition by Scansoft) non-uniform unit-selection (licensed Fonix engine) DE, US, UK, NL, FR, IT, NO, ES, SE - 2001 mp3 mp3 -
ETI Eloquence (originally from Eloquent Technologies, then Speechworks), also licensed to IBM (ViaVoice Outloud) rule-based formant-synthesis (Klatt-style) DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR - 1998 mp3 mp3 -
TTS3000 (orig. Lernout & Hauspie) diphone-synthesis DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR Stefan 1996 mp3 mp3 -
Anna mp3 1996 - - -
TruVoice (orig. Centigram, then Lernout & Hauspie) formant-synthesis DE, US, MX*, FR, IT - 1996 mp3 mp3 -
VoiceINTERConnect
Commercial version of the Dress Synthesizer. diphone-synthesis DE male voice 2000 mp3 mp3 mp3
female voice 2000 mp3 mp3 mp3
VoiceRSS
Online TTS Webservice, free of charge, engine unknown. unknown DE, ES, CN, HK, TW, DK, NL, AU, CA, GB, IN, US, FI, CA, FR, IT, JP, KR, NO, PL, BR, PT, MX, RU, SE female voice, sample downloaded via the HTTP API 2013 mp3 mp3 mp3

* MX (Mexican) stands for Latin American Spanish

Universities / Research

institution system description year (approx.) s1 s2 s3
Simple4All Tundra corpus EU FP7 Project "Simple4All" Tundra corpus, system features unsupervised learning. 2013 mp3 mp3 mp3
Berkom Felix Research system by former r&d department of German Telekom. Hybrid approach combining formant synthesis for voiced phonemes and concatenating with waveform coded units for unvoiced parts. 1998 mp3 mp3 mp3
Ruhr-Universität Bochum SyRUB, Version 4.1.1 Research system of the Ruhr-Universität Bochum. 1995 mp3 mp3 mp3
IKP Bonn BOSS Speech Synthesis Framework based on non-uniform unit-selection 2001 mp3 mp3 mp3
HADIFIX mixed inventory concatenation 1995 mp3 mp3 -
Uni Dresden DreSS diphone-synthesis 1996 mp3 mp3 mp3
Gerhard Mercator University of Duisburg - formant-synthesis 1996 mp3 mp3 -
Jonathan Duddington e-speak (eSpeak) formant-synthesis, based on the 1995 unix "speak"-program. Open-source 2006 mp3 mp3 mp3
IMS Stuttgart diphone synthesis diphone concatenation developed at the IMS Stuttgart. TTS-Framework from Festival. Voice-Database from MBROLA 2000 mp3 mp3 mp3
non-uniform unit-selection engine Developed for the Smartkom-project. TTS-Framework from Festival. mp3 2003 - - -
KTH Stockholm Infovox formant-synthesis from Sweden. Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt. (commercial version) 1992 mp3 - -
DFKI MARY Non-uniform unit selection based on the Pavoque corpus. 2011 mp3 mp3 mp3
Non-uniform unit selection based on the BITS corpus; for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany.
bits 1 2007 mp3 mp3 mp3
bits 2 2007 mp3 mp3 mp3
bits 3 2007 mp3 mp3 mp3
bits 4 2007 mp3 mp3 mp3
diphone-synthesis (DSP is MBROLA, NLP from DFKI/University of Saarbrücken) 2000 mp3 mp3 mp3
Uni Mons MBROLA / Txt2Pho. Hadifix NLP in combination with Mbrola-Synthesis (diphone-synthesis). Available for free for noncommercial use. MBROLA-TTS is available for a large number of languages. de8, see Markus Binsteiner 2002 - - -
de7 (by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities 2002 mp3 mp3 mp3
de6 (by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities 2002 mp3 mp3 mp3
de5 (by Fred Englert/ATIP, female, 22 kHz) 2000 mp3 mp3 mp3
de4 (by IMS Stuttgart, male, 16 kHz), includes English and French diphones 2002 mp3 mp3 mp3
de3 (by ATIP, female), first 22.05 kHz voice 2000 mp3 mp3 mp3
de2 (by ATIP, male, 16 kHz) 1997 mp3 mp3 mp3
de1 (by Fred Englert, female, 16 kHz) 1996 mp3 mp3 mp3
Uni Budapest Multivox 5 (ProfiVox) diphone-concatenation from University of Budapest. male speaker 1 2004 mp3 mp3 -
male speaker 2 2004 mp3 mp3 -
Multivox 3 formant-synthesis from TU-Budapest. Languages: DE, HU, FI, NL, ES, PT, SA, Esperanto (!) mp3 1994 - - -
Oregon Graduate Institute (OGI) OGI/Festival LPC-diphone concatenation developed at the Oregon Graduate Institute, Center for Spoken Language Understanding, during a workshop in 1998. TTS-Framework from Festival. 1998 mp3 mp3 -
ETH Zürich SVOX diphone-concatenation. commercial version here 1998 mp3 mp3 -
Austrian Research Institute for Artificial Intelligence (ÖFAI) VieCtoS Vienna Concept-to-Speech system. If the prosody sounds poor, it is due to my limited knowledge of ToBI labels. Technology is demisyllable-LPC-concatenation 1998 mp3 - -

With the following systems it was not possible to synthesize my own sentences:

name/link description year (approx.) mpeg3
CHATR non-uniform unit selection waveform-concatenation from ATR, Japan. Male Voice 1997 mp3
Female Voice 1997 mp3
Markus Binsteiner from TFH Berlin, diphone synthesis with MBROLA (voice de8), simulation of a Bavarian accent. 2004 mp3
(Uni-) Dresden Voice 1: concatenative formant-synthesizer 1993 mp3
TUSY: hardware formant-synthesizer 1987 mp3
ROSY (Robotron Synthesizer): hardware formant-synthesizer 1977 mp3
Syni 2: punchcard controlled formant-synthesizer 1975 mp3
Syni 1: punchcard controlled formant-synthesizer 1972 mp3
Eurovocs new version, diphone-synthesis from t & i, technology from Lernout & Hauspie. 1998 mp3
old version, diphone-synthesis from t & i, technology from Lernout & Hauspie. 1996 mp3
First Byte product-name:Monologue, ProVoice. waveform-concatenation synthesis (?) (link broken) 1998 mp3
(Uni-) Köln articulatory-synthesis (actually not a TTS-system) 1996 mp3
SAMT (Sprach-Ausgabesystem in Multiplex-Technik): hardware-based formant synthesis of former Forschungsinstitut der Deutschen Bundespost. 1987 mp3
Texas Instruments Language Translator LPC coded word-concatenation from Texas Instruments. Male Voice 1980 mp3
SpeakEaZy waveform-concatenation synthesis from Keller & Trauth. (link broken). 1998 mp3
Spengi diphone-synthesis from Philips/IPO Eindhoven 1997 mp3
LAIPTTS-D TTS-system from the University of Lausanne (LAIP), uses the MBROLA engine. Includes a model to reduce/elaborate articulation according to speech-rate. 1998 mp3 mp3
University of West Bohemia in Pilsen concatenative synthesizer ARTIC (ARtificial Talker In Czech). Commercial version available from SpeechTech under the name ERIS. 2002 mp3
Univ. of Rostock, Peter Birkholz Vocal Tract Lab: Articulatory synthesis (hand-tweaked articulatory movements transformed into a mathematical model to generate soundwaves) 2006 mp3

Service products

The following table lists some products to enhance text-to-speech quality.

company product description date sample
ReadSpeaker SagEs / SayIt Server-based website reader. Based on Acapela products. Sample reads a newspaper article (Tagesspiegel). Note the pronunciation of the word "playstation". 7/11/07 mp3
ETeX - Dictionaries. 1/7/05 mp3
Interlinx, acquired by Speech Concept emphasis / SpeechOptimizer Tuning tool for pronunciation and prosody modeling. 1/7/05 mp3

Further examples

Speech synthesis examples that did not fit elsewhere.

Description Example
Ultrafast speech synthesis as used by blind people, at 14 syllables per second, based on the formant synthesizer Eloquence mp3
RealSpeak British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin." mp3
TTS of the Fiat "Blue & Me" navigation head unit with Microsoft CE. Voice Steffi from Nuance. mp3
Apple iPhone 2011, recorded with a PC microphone from an Apple iPhone 4.1; the TTS is a faster, compact version of the voice Yannick from Nuance mp3, mp3, mp3

Licensed Systems

The following engines are based on systems with a different name:


Missing examples

For the following systems I have not yet obtained samples:


Unknown examples

For the following systems I have no information about the supplier:


Categorization of text-to-speech systems

Systems usually model either the system or the signal, are primarily rule-based or data-based, and can be distinguished by the type of their basic units and the way these units are coded.
[Figure: TTS technology overview]


Credits:

The following people provided information and/or samples:


Changelog


Speechsynthesis-demos with simulated emotion