

German Text-to-Speech

last update: 16th February 2018

Contents:

  1. Foreword
  2. Commercial systems
  3. Universities/research
  4. Other systems
  5. Service systems
  6. Further samples
  7. Licensed products
  8. Missing examples
  9. Unknown examples
  10. TTS classification chart
  11. Credits
  12. Change-log

Remarks

Terminology

I added a chart to make the concepts used for classification easier to understand. It is somewhat outdated, as non-uniform unit selection is not explicitly mentioned.

TTS always consists of two components, which I name following Dutoit's introduction:

The engines that synthesize the speech (DSP component) are mainly based on five technologies:


The test sentences were:

sentence 1:

An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen.

As I found this sentence a bit too simple, I thought up another test sentence which contains a collection of known problems for the NLP module (in some demos this sentence is truncated because the provider limits the number of characters):

sentence 2:

Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm.
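To make the kinds of problems in sentence 2 more concrete, the following is a minimal sketch of how an NLP front end might flag tokens that need special treatment before pronunciation. The abbreviation table, the regular expressions and the function names are assumptions made for illustration only; they are not taken from any of the systems listed on this page, and a real text normalizer needs far larger lexica and context rules.

```python
import re

SENTENCE_2 = ("Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - "
              "meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit "
              "13,84 Gramm Heroin zu 1,04 DM das Gramm.")

# Hypothetical, deliberately tiny abbreviation table (assumption).
ABBREVIATIONS = {
    "Dr.": "Doktor",
    "z.B.": "zum Beispiel",
    "tgl.": "täglich",
}

# Token classes tried in order; the patterns are illustrative assumptions.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")),
    ("decimal", re.compile(r"\d+,\d+")),
    ("acronym", re.compile(r"[A-ZÄÖÜ]{2,}")),
]


def classify(token):
    """Return a coarse class for one whitespace-separated token."""
    core = token.strip("()")
    if core in ABBREVIATIONS:
        return "abbreviation"
    if re.fullmatch(r"[A-ZÄÖÜ]\.", core):
        return "initial"
    # Only now drop trailing punctuation, so "tgl." is not mistaken
    # for a sentence-final word.
    core = core.rstrip(".,;:!?")
    for label, pattern in PATTERNS:
        if pattern.fullmatch(core):
            return label
    return "word"


def flag_problem_tokens(sentence):
    """List (token, class) pairs for every token that is not a plain word."""
    return [(t, classify(t)) for t in sentence.split() if classify(t) != "word"]


for token, label in flag_problem_tokens(SENTENCE_2):
    print(f"{token:10} {label}")
```

A sketch like this only finds the candidates; deciding how each one is spoken (for example whether an acronym is spelled out or read as a word) is a separate step that pattern matching alone cannot settle.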

Speaking now, 6 years after thinking up those sentences, more pressing problems for German speech synthesis used in services like email reading arise from the pronunciation of English terms. For example, the following sentence would not be pronounced correctly by most systems without tuning:

sentence 3:

Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden.
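One conceivable kind of tuning for sentence 3 is to mark the English part of each mixed word so that a later pronunciation step can treat it differently from the German affixes. The sketch below does this with a hypothetical list of English stems and a naive substring search; the stem list and the function names are assumptions for illustration, not a description of how any listed system works.

```python
SENTENCE_3 = ("Die Manpowerdiskussion wird gecancelt, "
              "du kannst das File vom Server downloaden.")

# Hypothetical mini-lexicon of English stems (assumption). A real system
# would need a large lexicon or a statistical language identifier.
ENGLISH_STEMS = ("manpower", "cancel", "download", "server", "file")


def mark_english(token):
    """Split a token into (german_prefix, english_stem, german_suffix).

    Returns None when no known English stem occurs in the token.
    The substring search is naive: it would also fire on a German word
    such as "Profile", which contains "file".
    """
    lower = token.lower()
    for stem in ENGLISH_STEMS:
        start = lower.find(stem)
        if start != -1:
            end = start + len(stem)
            return token[:start], token[start:end], token[end:]
    return None


def mark_sentence(sentence):
    """Return {token: (prefix, stem, suffix)} for all mixed or English tokens."""
    marked = {}
    for raw in sentence.split():
        token = raw.strip(".,")
        parts = mark_english(token)
        if parts is not None:
            marked[token] = parts
    return marked


for token, parts in mark_sentence(SENTENCE_3).items():
    print(token, parts)
```

With the stem list above, "gecancelt" is split into the German prefix "ge", the English stem "cancel" and the German suffix "t", which is exactly the mixture that makes such words hard for a purely German letter-to-sound module.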

Commercial

company/link | description/engine name | technology | languages | voice name | year (approx.) | s1 | s2 | s3
Acapela

Acapela was formed in December 2003 from a combination of three European companies specializing in vocal technologies, Babel Technologies (Belgium), Infovox (Sweden) and Elan Speech (France).
Acapela HQ TTS | non-uniform unit-selection | DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR | Claudia
2015mp3mp3mp3
Claudia (Smile)
2017mp3mp3mp3
Lea (Child)
2013mp3mp3mp3
Jonas (child)
2013mp3mp3mp3
Andreas
2011mp3mp3mp3
Julia
2009mp3mp3mp3
Klaus
2006mp3mp3mp3
Sarah
2003mp3mp3mp3
Custom Voice | non-uniform unit-selection, ANN | DE | Felix DNN (page author):
Artificial neural net model adapted with 15 minutes data
2017mp3mp3mp3
Felix (page author):
Non-uniform unit-selection from 2 hours data
2017mp3mp3mp3
Greeting Bunny | non-uniform unit-selection | DE, US, FR, IT, ES, NL, SE, NO, DK, BE | Bunny
2008mp3mp3mp3
Aculab

Aculab | diphone
Diphone concatenation with LPC-coded units. LPC (linear predictive coding) was originally a compression algorithm; it is useful for synthesis because it is based on a source/filter model of speech.
DE, UK, US, FR, BR, IT, ES | Julia
1998mp3mp3-
Aristech

Formerly Speechconcept
Cerevoice | non-uniform unit-selection
Developments from Aristech, CereProc and University of Edinburgh
DE, EN, FR, IT, ES, US, NL, JP | Sophie, adult
Corporate Voice, courtesy of Aristech
2011mp3mp3mp3
Leopold, Austrian adult
courtesy of Aristech
2013mp3mp3mp3
Alex, adult
courtesy of Aristech
2016mp3mp3mp3
Gudrun, adult
courtesy of Aristech
2013mp3mp3mp3
Nick, youth
courtesy of Aristech
2011mp3mp3mp3
Saskia, youth
courtesy of Aristech
2011mp3mp3mp3
Atip

Proser | diphone
NLP-component and voices from Atip, Mbrola Engine (diphone-concatenation) from Babeltech
DE, US | Carla
2000mp3mp3mp3
Erkan
Turkish accent
2004mp3mp3mp3
Fifi
French accent
2004mp3mp3mp3
Steffen
2000mp3mp3mp3
Eva
2000mp3mp3mp3
AT&T

Natural Voices | non-uniform unit-selection | DE, IT, US, UK, FR, MX* | Klara
2001mp3mp3mp3
Reiner
2002mp3mp3mp3
Babeltech

Brightspeech | non-uniform unit-selection
same as Acapela HQ TTS
Ingrid
2002mp3mp3-
Babil | diphone
diphone-concatenation based on commercial Mbrola-engine. MBROLA (Multi Band Resynthesis Overlap and Add), similar to PSOLA but the database is treated beforehand to adapt pitch, amplitude and spectral features.
DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA | Eva
2000mp3mp3mp3
Greta
2000mp3mp3mp3
Steffen
1997mp3mp3mp3
Helga
Same as Infovox 330
1998---
Gerhard
Same as Infovox 330
1998---
Bell Labs

diphone
LPC-diphone concatenation
DE, FR, ES, US, UK, IT, RU, RO, CN
mp3mp3-
Centigram

Acquired by Lernout & Hauspie, later Nuance
TruVoice | formant | DE, US, MX*, FR, IT
1996mp3mp3-
Cepstral

Cepstral TTS | non-uniform unit-selection
Associated with Alan Black, one of the pioneers of non-uniform unit-selection and lead scientist of Festival, an open source text-to-speech framework developed at the Univ. of Edinburgh and CMU.
DE, UK, US, ES, FR, EG, TH, AF | Kathrin
2003mp3mp3mp3
Matthias
2003mp3mp3mp3
Deutsche Telekom

Berkom TTS | formant
Research system by the former R&D department of Deutsche Telekom. Hybrid approach that uses formant synthesis for voiced phonemes and concatenates waveform-coded units for unvoiced parts.
DE | Felix
1998mp3mp3mp3
SAMT | hardware-based formant synthesis
(Sprach-Ausgabesystem in Multiplex-Technik): hardware-based formant synthesis of former Forschungsinstitut der Deutschen Bundespost.
DE

Other sample: mp3
1987---
Digital Equipment Corporation

DecTalk | formant
First commercial text-to-speech synthesizer. Rule-based formant synthesis (the legendary formant synthesizer, based on Klatt's MITalk).
DE, US, UK, ES, MX*, FR
1982mp3mp3-
Elan

SaySo | non-uniform unit-selection | DE, US, FR, IT, ES | Lea
2003mp3mp3mp3
Tempo | diphone
Pitch Synchronous Overlap and Add (PSOLA): famous algorithm to change pitch and time of speech that made diphone-synthesis a great success for many years.
DE, US, UK, FR, ES, IT, BR, PT, RU, PL | Thomas
1998mp3mp3mp3
Dagmar
1996mp3mp3mp3
Eloquent Technologies

Acquired by Scansoft.
ETI Eloquence
rule-based formant-synthesis (Klatt-style). Later sold by Speechworks, also licensed to IBM (ViaVoice Outloud)
DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR
1998mp3mp3-
GData

Logox | microsegment synthesis
Microsegment synthesis (concatenating subphonetic units), no longer developed. Originally based on research from the Univ. of Saarbrücken.
DE, US, UK | Default voice
2000mp3mp3-
Bill
1998mp3mp3mp3
Bill (Swabian accent)
2002mp3mp3mp3
Bill (Hessian accent)
2002mp3mp3mp3
Bill (Saxon accent)
2002mp3mp3mp3
Bill (French accent)
2002mp3mp3mp3
Google

non-uniform unit-selection | Female
Samples were accessed via the translation service.
2013mp3mp3mp3
IBM

CTTS | non-uniform unit-selection | DE, US, UK, JP, KR, IT, ES, FR | Male
Courtesy of IBM. Database speaker is Gilles Karolyi. Sentence 3 sample is 8 kHz.
2002mp3mp3mp3
Female

Other sample: mp3
2004---
Infovox

330/Infovox Desktop | diphone-concatenation
Probably the same as Babeltech Babil. Infovox 310 is the Apple version.
DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE | Helga
8 kHz version
Other sample: mp3
1996mp3mp3mp3
Gerhard
8 kHz version
Other sample: mp3
1996mp3mp3mp3
210/230 | formant-synthesis
Successor of KTH's OVE, originally Telia Promotor
DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE
1994mp3mp3-
Desktop PRO | non-uniform unit-selection
same as Acapela HQ TTS

---
Innoetics

non-uniform unit-selection
Development system from unsupervised audiobook extraction
DE, US, UK, GR, BG | Christian
Courtesy of Innoetics
2015mp3mp3mp3
Claudia
Courtesy of Innoetics
2015mp3mp3mp3
Jessi
Courtesy of Innoetics
2015mp3mp3mp3
Kalrsson
Courtesy of Innoetics
2015mp3mp3mp3
Ivona

Owned by Amazon
Ivona TTS | non-uniform unit-selection
Licensed by Lumenvox.
DE, US, UK, ES, RO, PL, MX | Hans
2011mp3mp3mp3
Marlene
2011mp3mp3mp3
Lernout & Hauspie

Acquired by Scansoft in 2001 after bankruptcy
TTS3000 | diphone | DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR | Stefan
1996mp3mp3-
Anna

Other sample: mp3
1996---
Loquendo

Acquired by Nuance in 2011
Loquendo TTS | non-uniform unit-selection
Formerly called Actor
DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE | Katrin
Courtesy of Loquendo.
2003mp3mp3mp3
Stefan
Courtesy of Loquendo.
2003mp3mp3mp3
Ulrike
2001mp3mp3mp3
Meridian

Orpheus | formant
Formerly from Dolphin Oceanic Ltd. Specialized in fast speech as used by blind customers.
DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN, MD, CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE | Orpheus
2009mp3mp3mp3
Microsoft

Microsoft Speech Platform - Runtime Languages (Version 11) | non-uniform unit-selection | ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN | Hedda
2012mp3mp3mp3
Neospeech

A Hoya company, as is ReadSpeaker.
non-uniform unit-selection | DE, US, UK, MX, TW, TH, KR, IT, CN, CH, JP, CT, BR, PT, FR | Lena
2018mp3mp3mp3
Tim
2018mp3mp3mp3
Nuance

Formerly Scansoft (originating from Kurzweil and Xerox), acquired the European pioneers Lernout & Hauspie in 2001, and took the name of a smaller company named Nuance, which they acquired in 2005.
Vocalizer DNN | Artificial neural nets | US | Nuance Website Sample

Other sample: mp3
2018---
Vocalizer | non-uniform unit-selection
Formerly called RealSpeak (Vocalizer was the name of the original Nuance product), originally from Lernout & Hauspie, converged with RVoice (formerly Rhetorical). First commercial German unit-selection TTS.
DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO | Victor
2016mp3mp3mp3
Anna
11 kHz, courtesy of Nuance
2010mp3mp3mp3
Yannick
11 kHz, courtesy of Nuance
2006mp3mp3mp3
Yannick 2
Yannick embedded version recorded from a cell phone
2009mp3mp3mp3
Monika
and Beate (?) - same as RVoice F026
2005mp3mp3mp3
Steffi
8 kHz
2004mp3mp3mp3
Steffi 2
Newer version with enhanced voice quality and better pronunciation.
2015mp3mp3mp3
Vera
8 kHz
1999mp3mp3mp3
Nuance (until 2005)

Acquired by Scansoft in 2005
Vocalizer 4.05 | non-uniform unit-selection | DE, US, UK, AU, CA(FR), MX*, BR | Anna Weber
2004mp3mp3-
Vocalizer 1.0 | non-uniform unit-selection
licensed Fonix engine
DE, US, UK, NL, FR, IT, NO, ES, SE
2001mp3mp3-
ReadSpeaker

A Hoya company, as is NeoSpeech. Formerly called rSpeak.
non-uniform unit-selection
using deep artificial neural networks
DE, GB, US, AU, ES, FR, NL, SE | Max
courtesy of ReadSpeaker
2018mp3mp3mp3
Rhetorical Systems

Was headquartered in Edinburgh, Scotland. Acquired by Scansoft / Nuance in 2004
RVoice | non-uniform unit-selection | DE, UK, US, GR, ES | F026
2004mp3mp3mp3
M027
2004mp3mp3mp3
F018
mp3mp3mp3
Speechworks

Acquired by Scansoft / Nuance in 2003
Speechify | non-uniform unit-selection | DE, US, UK, AU, JP, MX*, FR, BR, CA(FR) | Tessa
2002mp3mp3mp3
Svox

Originally a spin-off from ETH Zurich. Acquired by Nuance in 2011
Svox Corporate | non-uniform unit-selection | DE, FR, IT, US, ES | Petra
2005mp3mp3mp3
Markus
2005mp3mp3mp3
Marlene

Other sample: mp3
2003---
diphone | DE, FR, IT, US, ES | Nicole
2000mp3mp3-
Tom Weber Software

OnScreenVoices | non-uniform unit-selection | DE | Andreas
Samples courtesy of Tom Weber Software
2015mp3mp3mp3
Marianne
Samples courtesy of Tom Weber Software
2015mp3mp3mp3
VoiceINTERConnect

diphone
Commercial version of the DRESS synthesizer (University of Dresden).
female voice
2000mp3mp3mp3
male voice
2000mp3mp3mp3
Votrax

formant
Early hardware formant synthesizer. Samples taken from an Audiodata Braille reader.
DE
1974mp3mp3mp3
Voxygen

Spin-off from French Orange Labs.
Hybrid non-uniform unit-selection / HMM synthesis | DE, FR, EN, ES, IT, AR | Sylvia
courtesy of Voxygen
2014mp3mp3mp3
Matthias
courtesy of Voxygen
2014mp3mp3mp3

Universities / Research

Institution | System | Remark | Year (approx.) / remark | s1 | s2 | s3
IKP Bonn

BOSS
non-uniform unit-selection
2001

mp3mp3mp3
Hadifix
mixed inventory concatenation
HADIFIX = HAlbsilben, DIphone und suFIXe
DE
1995

mp3mp3-
University of Budapest

Multivox 5 (Profivox)
diphone synthesis
2004
male speaker 1
mp3mp3-
2004
male speaker 2
mp3mp3-
Multivox 3
formant synthesis
DE, HU, FI, NL, ES, PT, SA, Esperanto
1994


Other sample: mp3
---
DFKI

Mary
non-uniform unit-selection
Mary = modular architecture for speech synthesis, open source. Also a great tool for teaching about speech synthesis, because the input and output of the different processing modules can be viewed as text.
DE, EN, Tibetan
2011
Pavoque corpus
mp3mp3mp3
2007
Bits 1
for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany.
mp3mp3mp3
2007
Bits 2
mp3mp3mp3
2007
Bits 3
mp3mp3mp3
2007
Bits 4
mp3mp3mp3
Mary/Mbrola
diphone
DE, EN
2000

mp3mp3mp3
Technical University of Dresden

DRESS
diphone synthesis
1996

mp3mp3mp3
Voice 1
concatenative formant-synthesizer
1993


Other sample: mp3
---
TUSY
hardware formant-synthesizer
1987


Other sample: mp3
---
ROSY
hardware formant-synthesizer
Robotron Synthesizer
1977


Other sample: mp3
---
Syni 2
punchcard controlled formant-synthesizer
Robotron Synthesizer
1975


Other sample: mp3
---
Syni 1
punchcard controlled formant-synthesizer
Robotron Synthesizer
1972


Other sample: mp3
---
Jonathan Duddington

eSpeak
formant-synthesis
Based on the 1995 Unix "speak" program. Open source.
2006

mp3mp3mp3
ETH Zürich

Svox
diphone-concatenation
Predecessor of the commercial version later acquired by Nuance.
1998

mp3mp3mp3
Gerhard Mercator University of Duisburg


formant-synthesis
1996

mp3mp3-
KTH Stockholm

Infovox
formant synthesis
Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt
1992

mp3--
Ove III
Hardware formant synthesis
Orator Verbis Electris (OVE). Developed by Gunnar Fant
1967


Other sample: mp3
---
University of Mons

Mbrola
diphone-synthesis
Mbrola: Multi-band Resynthesis Overlap and Add. The NLP (text phonemisation) component is Txt2Pho, the Hadifix NLP in combination with Mbrola synthesis. Available for free for noncommercial use. MBROLA-TTS is available for about 34 different languages.
1998
de8
Markus Binsteiner's work on a Bavarian dialect
Other sample: mp3
---
2000
de7
(by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities (for emotional speech simulation).
mp3mp3mp3
2000
de6
(by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities (for emotional speech simulation).
mp3mp3mp3
2000
de5
by Fred Englert (ATIP), female, 22 kHz
mp3mp3mp3
2000
de4
By IMS Stuttgart, male, 16 kHz, includes English and French diphones
mp3mp3mp3
2000
de3
by ATIP, female, first 22.05 kHz voice
mp3mp3mp3
1997
de2
By ATIP, male, 16 kHz
mp3mp3mp3
1996
de1
By ATIP, female, 16 kHz
mp3mp3mp3
ÖFAI (Austrian Research Institute for Artificial Intelligence)

VieCtoS
demisyllable-LPC-concatenation
Vienna Concept-to-Speech system. If the prosody sounds poor, it is due to my limited knowledge of ToBI labels.
1998

mp3--
OGI, Oregon Graduate Institute


LPC-diphone concatenation
Developed at the OGI, Center for Spoken Language Understanding during a workshop in 1998. TTS-Framework is Festival
1998

mp3mp3-
Ruhr-Universität Bochum

SyRUB, Version 4.1.1 | 1995

mp3mp3mp3
Simple4All

Tundra corpus
non-uniform unit-selection
EU FP7 Project "Simple4All" Tundra corpus, system features unsupervised learning.
2013

mp3mp3mp3

With the following systems it was not possible to synthesize my own sentences:

name/link | description | year (approx.) | mpeg3
AEG Telefunken



unknown concatenation ("Parcor-Synthetisator")
1978

mp3
ATR



non-uniform unit selection
1997
male
mp3
1997
female
mp3
Bose

unknown

unknown
Recorded from a Bose SoundLink Mini II Bluetooth speaker, February 2018
2018

mp3
Univ. of Dresden, Peter Birkholz

Vocal Tract Lab

Articulatory synthesis


Hand-tweaked articulatory movements transformed into a mathematical model to generate sound waves
-
ELIS Lab

Eurovocs

diphone-synthesis
Technology from Lernout & Hauspie
1998

mp3
1996

mp3
First Byte



Product names: Monologue, ProVoice. Waveform-concatenation synthesis (?)
1998

mp3
HHI: Heinrich Hertz Institut



technology unknown
1978

mp3
Keller & Trauth

SpeakEaZy

waveform-concatenation synthesis
1998

mp3
Wolfgang von Kempelen's Speaking Machine



Hardware manual sound generator ("papa", "mama")
1769

mp3
University of Köln

Institut für Phonetik


articulatory-synthesis (actually not a TTS-system)
1996

mp3
Karl Küpfmüller / Bernhard Cramer



Hardware phoneme concatenation
1955

mp3
University of Lausanne (LAIP)



TTS system from the University of Lausanne (LAIP), uses the MBROLA engine. Includes a model to reduce or elaborate articulation according to speech rate.
1998

mp3
Mila (Machine learning laboratory at the University of Montreal)

Char2Wav

Deep artificial neural networks from the University of Montreal: an end-to-end model for speech synthesis learned with Deeplearning4J. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. For the German samples, the Pavoque database was used for training.
2017

mp3
Philips/IPO Eindhoven

Spengi

diphone-synthesis
1997

mp3
Unknown Russian TTS



unknown / formant?
1970

mp3
H.W. Strube, University of Göttingen



Articulatory synthesis.
1977

mp3
Texas Instruments Language Translator



LPC coded word-concatenation
1980
Male Voice
mp3
University of West Bohemia in Pilsen

ARTIC (ARtificial Talker In Czech)

concatenative synthesizer
Commercial version available from SpeechTech under the name ERIS.
2002

mp3






Service products

The following table lists some products to enhance text-to-speech quality.

company | product | description | date | sample
ReadSpeaker (now commercializes its own engine under the name rSpeak; both are Hoya companies) | SagEs / SayIt | Server-based website reader. Based on Acapela products. Sample reads a newspaper article (Tagesspiegel). Note the pronunciation of the word "playstation". | 7/11/07 | mp3
ETeX-Dictionaries. | 1/7/05 | mp3
Interlinx, acquired by Speech Concept | emphasis / SpeechOptimizer | Tuning tool for pronunciation and prosody modeling. | 1/7/05 | mp3

Further examples

Speech synthesis examples that did not fit elsewhere.

Description | Example
Ultrafast speech synthesis as used by blind users, at 14 syllables per second, based on the formant synthesizer Eloquence | mp3
RealSpeak, British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin." | mp3
TTS of the Fiat "Blue & Me" navigation head unit with Microsoft CE. Voice Steffi from Nuance. | mp3
Apple iPhone 2011: recorded with a PC microphone from an Apple iPhone 4.1; the TTS is a faster, compact version of the Nuance voice Yannick | mp3, mp3, mp3
"Karlchen": telephone-based automatic train information system from the 1970s | mp3

Licensed Systems

The following engines are based on systems with a different name:

Missing examples

For the following systems I have not yet obtained samples:


Unknown examples

For the following systems I have no information about the supplier: