German Text-to-Speech

last update: 12th November 2025

Remarks

This collection of German TTS samples is maintained by Felix Burkhardt.
There's also a page on speech synthesis demos with simulated emotion.
I appreciate hints about missing systems! ( felixbur@gmx.de )
In general, the demonstrations don't reflect the current quality of the systems!
The information provided is to the best of my knowledge, but of course I can't guarantee its correctness!
Some differences in quality are due to different sample rates; most demos are 16 kHz, but some are 8 kHz or 22 kHz.
Don't rely on the year specifications, I'm very unsure about most of them. Many of them simply denote the year when I took the samples (although they're meant to stand for the year when the voice / engine was released).
The remark "courtesy of company X" means that I got the samples specially for this page from the respective company. Possibly they made special adjustments with respect to pronunciation and prosody.

Terminology

I added a chart to facilitate the understanding of the concepts used for classification. It's somewhat outdated, as non-uniform unit selection is not explicitly mentioned.

TTS always consists of two components, which I call (following Dutoit's introduction):

NLP (Natural Language Processing): conversion of orthographic text into a phoneme-alphabet representation and a prosody description.
DSP (Digital Speech Processing): the speech engine, i.e. synthesis of the speech signal from the output of the NLP component.
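
To make the interface between the two components more concrete, here is a minimal sketch of what an NLP stage might hand over to a DSP stage: a list of phonemes with durations and pitch targets. Everything in it is an assumption made for illustration: the toy lexicon, the phoneme symbols, the durations and the pitch values are invented, and the text format is only loosely modelled on the phoneme/duration/pitch lists that diphone engines such as Mbrola read.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    phoneme: str                 # symbol from some phoneme alphabet (illustrative only)
    duration_ms: int             # prosody: how long the segment lasts
    f0_hz: Optional[int] = None  # prosody: pitch target at mid-segment, if any


# Hypothetical toy lexicon. A real NLP component also needs text normalisation,
# letter-to-sound rules and a prosody model.
TOY_LEXICON = {"hallo": ["h", "a", "l", "o:"]}


def nlp(text: str) -> List[Segment]:
    """Toy NLP stage: lexicon lookup plus a simple falling pitch contour."""
    segments = [Segment("_", 100)]  # leading pause
    for word in text.lower().split():
        for i, phoneme in enumerate(TOY_LEXICON[word]):  # KeyError for unknown words
            is_vowel = phoneme[0] in "aeiou"
            f0 = 120 - 5 * i if is_vowel else None
            segments.append(Segment(phoneme, 120 if is_vowel else 60, f0))
    segments.append(Segment("_", 100))  # trailing pause
    return segments


def to_pho(segments: List[Segment]) -> str:
    """Serialise as 'phoneme duration [position-in-percent pitch]' lines."""
    lines = []
    for s in segments:
        line = f"{s.phoneme} {s.duration_ms}"
        if s.f0_hz is not None:
            line += f" 50 {s.f0_hz}"  # pitch target placed at 50 % of the segment
        lines.append(line)
    return "\n".join(lines)


print(to_pho(nlp("hallo")))
```

The DSP component would then turn such a list into a waveform, which is where the technologies listed in this section differ.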

The engines that synthesize the speech (DSP-component) are mainly based on five technologies:

DNN Synthesis: The most recent addition to speech synthesis algorithms are artificial neural networks (or deep neural nets, meaning that the number of layers is higher than in traditional artificial neural network architectures, say five). TTS with neural nets has been done for decades, but to my knowledge not for German. They replace the HMM approach in predicting the best acoustic parameters for a given sequence of symbols representing text.
HMM Synthesis: Synthesis based on Hidden Markov Models, a statistical approach to model the transition probabilities of the acoustic parameters based on the speech to be generated. The approaches are trained on a relatively large data corpus, but have a small footprint for synthesis because they don't operate on the waveform data directly but on some parameterized representation (e.g. LPC). However, this is also the reason they tend to produce artifacts. Sample: Simple4All
Non-uniform unit-selection: Best fitting chunks of speech from large databases get concatenated, minimizing a double cost-function: best fit to neighbor unit and best fit to target prosody. Sounds most natural (similar to original speaker), but is inflexible and has a large footprint. Sample: RealSpeak
Diphone-synthesis: Speech concatenated from diphone-units (two-phone combinations), prosody-fitting done by signal-manipulation (depends on unit-coding). Relatively small footprint but not very natural. Sample: Bell-Labs synthesis
Formant-synthesis: Speech synthesized by physical models (formants are resonance frequencies in the vocal tract). Very flexible and smallest footprint, but very unnatural. Sample: Eloquence
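
The "double cost-function" of non-uniform unit-selection can be illustrated with a small sketch. It assumes that each candidate unit is described only by three hypothetical pitch features (mean, start and end pitch in Hz), that the target cost is the pitch distance to the desired prosody, and that the join cost is the pitch jump between neighbouring units. Real systems use many more features (spectrum, duration, context) and much larger databases; the dynamic-programming search shown here is one common way to minimize the summed cost, not necessarily what any listed engine does.

```python
from typing import Dict, List, Tuple

Unit = Dict[str, float]  # hypothetical features: "f0", "start_f0", "end_f0" in Hz


def target_cost(unit: Unit, target_f0: float) -> float:
    """Best fit to target prosody: distance between unit pitch and desired pitch."""
    return abs(unit["f0"] - target_f0)


def join_cost(left: Unit, right: Unit) -> float:
    """Best fit to neighbor unit: pitch jump at the concatenation point."""
    return abs(left["end_f0"] - right["start_f0"])


def select_units(candidates: List[List[Unit]], targets: List[float],
                 w_join: float = 1.0) -> Tuple[float, List[int]]:
    """Pick one candidate per slot so that target + weighted join costs are minimal."""
    # best[i][j] = (cheapest cost of a path ending in candidate j of slot i, backpointer)
    best = [[(target_cost(u, targets[0]), -1) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + w_join * join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i]), back))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(j)
        j = best[i][j][1]
    path.reverse()
    return total, path


def flat(f0: float) -> Unit:
    """Toy unit with constant pitch."""
    return {"f0": f0, "start_f0": f0, "end_f0": f0}


CANDIDATES = [[flat(100), flat(120)], [flat(110), flat(121)]]
TARGETS = [118, 112]

print(select_units(CANDIDATES, TARGETS))              # (12.0, [1, 1])
print(select_units(CANDIDATES, TARGETS, w_join=0.0))  # (4.0, [1, 0])
```

In the toy data, the second slot's 110 Hz unit fits the target better, but with the join cost switched on the search prefers the 121 Hz unit because it connects more smoothly to its neighbour. This trade-off is what makes the output sound close to the original speaker while remaining hard to bend towards arbitrary prosody.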

The test sentences were:

sentence 1:

An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen.

As I found this sentence a bit too simple, I thought up another test sentence which contains a collection of known problems for the NLP module (in some demos this sentence is truncated due to the provider's restriction on the number of characters):

sentence 2:

Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm.
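
To illustrate the kind of work the NLP module has to do on such a sentence, here is a minimal text-normalisation sketch. It is an assumption-laden toy: the abbreviation table and the rule of reading decimals as "Komma" followed by single digits are my own simplifications. Dates ("15.3.00"), currency, acronyms that are spelled out versus read as a word (CIA vs. NATO), the single initial "A." and the homograph "Heroin" are deliberately left unhandled, although these are exactly the hard parts.

```python
import re

# Hypothetical abbreviation table; a real system needs a far larger,
# context-sensitive lexicon.
ABBREVIATIONS = {
    "Dr.": "Doktor",
    "z.B.": "zum Beispiel",
    "tgl.": "täglich",
}

# German decimal numbers use a comma, e.g. "13,84".
DECIMAL = re.compile(r"(\d+),(\d+)")


def expand_decimal(match: "re.Match[str]") -> str:
    """Rewrite '13,84' as '13 Komma 8 4' (fraction digits are read one by one)."""
    whole, fraction = match.group(1), match.group(2)
    return f"{whole} Komma {' '.join(fraction)}"


def normalise(text: str) -> str:
    """Expand known abbreviations and decimal numbers, token by token."""
    tokens = []
    for token in text.split():
        token = ABBREVIATIONS.get(token, token)
        token = DECIMAL.sub(expand_decimal, token)
        tokens.append(token)
    return " ".join(tokens)


print(normalise("tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm."))
```

The integer parts are still digits after this step; converting them into number words would be a further stage.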

Now, six years after thinking up those sentences, a more pressing problem for German speech synthesis used in services like email reading is the pronunciation of English terms; e.g. the following sentence would not be pronounced correctly by most systems without tuning:

sentence 3:

Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden.

Commercial

company/link	description/engine name	technology	languages	voice name	year (approx.)	s1	s2	s3
Acapela Acapela was formed in December 2003 from a combination of three European companies specializing in vocal technologies, Babel Technologies (Belgium), Infovox (Sweden) and Elan Speech (France).	Acapela HQ TTS	non-uniform unit-selection	DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR	Claudia	2015
				Claudia (Smile)	2017
				Lea (Child)	2013
				Jonas (child)	2013
				Andreas	2011
				Julia	2009
				Klaus	2006
				Sarah	2003
	Custom Voice	non-uniform unit-selection, ANN	DE	Felix DNN (page author): Artificial neural net model adapted with 15 minutes data	2017
	Custom Voice	non-uniform unit-selection, ANN	DE	Felix (page author): Non-uniform unit-selection from 2 hours data	2017
	Greeting Bunny	non-uniform unit-selection	DE, US, FR, IT, ES, NL, SE, NO, DK, BE	Bunny	2008
Aculab	Aculab	diphone diphone-concatenation with LPC coded units. LPC (linear predictive coding), originally a compression algorithm, useful for synthesis because based on a source/filter model of speech.	DE, UK, US, FR, BR, IT, ES	Julia	1998			-
Aristech Formerly Speechconcept	Cerevoice	non-uniform unit-selection Developments from Aristech, CereProc and University of Edinburgh	DE, EN, FR, IT, ES, US, NL, JP	Sophie, adult Corporate Voice, courtesy of Aristech	2011
				Leopold, Austrian adult courtesy of Aristech	2013
				Alex, adult courtesy of Aristech	2016
				Gudrun, adult courtesy of Aristech	2013
				Nick, youth courtesy of Aristech	2011
				Saskia, youth courtesy of Aristech	2011
Atip	Proser	diphone NLP-component and voices from Atip, Mbrola Engine (diphone-concatenation) from Babeltech	DE, US	Carla	2000
				Erkan Turkish accent	2004
				Fifi French accent	2004
				Steffen	2000
				Eva	2000
AT&T	Natural Voices	non-uniform unit-selection	DE, IT, US, UK, FR, MX*	Klara	2001
AT&T	Natural Voices	non-uniform unit-selection	DE, IT, US, UK, FR, MX*	Reiner	2002
Babeltech	Brightspeech	non-uniform unit-selection same as Acapela HQ TTS		Ingrid	2002			-
	Babil	diphone diphone-concatenation based on commercial Mbrola-engine. MBROLA (Multi Band Resynthesis Overlap and Add), similar to PSOLA but the database is treated beforehand to adapt pitch, amplitude and spectral features.	DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA	Eva	2000
				Greta	2000
				Steffen	1997
				Helga Same as Infovox 330	1998	-	-	-
				Gerhard Same as Infovox 330	1998	-	-	-
Bell Labs		diphone LPC-diphone concatenation	DE, FR, ES, US, UK, IT, RU, RO, CN					-
Centigram Acquired by Lernout & Hauspie, later Nuance	TruVoice	formant	DE, US, MX*, FR, IT		1996			-
Cepstral	Cepstral TTS	non-uniform unit-selection Associated with Alan Black, one of the pioneers of non-uniform unit-selection and lead scientist of Festival, an open source text-to-speech framework developed at Univ. of Edinburgh and the CMU.	DE, UK, US, ES, FR, EG, TH, AF	Kathrin	2003
Cepstral	Cepstral TTS		DE, UK, US, ES, FR, EG, TH, AF	Matthias	2003
Deutsche Telekom	Berkom TTS	formant Research system by the former R&D department of Deutsche Telekom. Hybrid approach combining formant synthesis for voiced phonemes with concatenation of waveform-coded units for unvoiced parts.	DE	Felix	1998
Deutsche Telekom	SAMT	hardware-based formant synthesis (Sprach-Ausgabesystem in Multiplex-Technik): hardware-based formant synthesis of former Forschungsinstitut der Deutschen Bundespost.	DE	Other sample:	1987	-	-	-
Digital Equipment Corporation	DecTalk	formant First commercial text-to-speech synthesizer. Rule-based formant synthesis (the legendary formant synthesizer, based on Klatt's MITalk)	DE, US, UK, ES, MX*, FR		1982			-
Elan	SaySo	non-uniform unit-selection	DE, US, FR, IT, ES	Lea	2003
	Tempo	diphone Pitch Synchronous Overlap and Add (PSOLA): famous algorithm to change pitch and time of speech that made diphone-synthesis a great success for many years.	DE, US, UK, FR, ES, IT, BR, PT, RU, PL	Thomas	1998
	Tempo		DE, US, UK, FR, ES, IT, BR, PT, RU, PL	Dagmar	1996
Elevenlabs	elevenlabs	end2end End2end synthesis models speech by a large neural network.	many	Matilda	2025
Eloquent Technologies Acquired by Scansoft.	ETI Eloquence	rule-based formant-synthesis (Klatt-style). Later sold by Speechworks, also licensed to IBM (ViaVoice Outloud)	DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR		1998			-
GData	Logox	microsegment synthesis Microsegmentsynthesis (concatenating subphonetic units), not developed any more. Originally based on research from Univ. of Saarbrücken.	DE, US, UK	Default voice	2000			-
				Bill	1998
				Bill (Swabian accent)	2002
				Bill (Hessian accent)	2002
				Bill (Saxon accent)	2002
				Bill (French accent)	2002
google	wavenet	wavenet: artificial neural nets end-to-end	AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE, TH, TL, TR, UK, VI, ZH	Wavenet A (female)	2018
				Wavenet B (male)	2018
				Wavenet C (female)	2018
				Wavenet D (male)	2018
				Wavenet E	2022
				Wavenet F	2022
	Google Basic	so-called basic (non-uniform unit selection?)	AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE, TH, TL, TR, UK, VI, ZH	Standard A (female)	2018
				Standard B (male)	2018
				Basic C	2022
				Basic D	2022
				Basic E	2022
				Basic F	2022
	Google Translate	non-uniform unit-selection		Female Samples were accessed via the translation service.	2013
ibm	Watson	unknown	CS, DE, EN, ES, FR, IT, JA, KS, NL, PT, SV, ZH	Birgit	2022
				Dieter	2022
				Erika	2022
	CTTS	non-uniform unit-selection	DE, US, UK, JP, KR, IT, ES, FR	Male Courtesy of IBM. Database speaker is Gilles Karolyi. Sentence 3 sample is 8 kHz.	2002
	CTTS	non-uniform unit-selection	DE, US, UK, JP, KR, IT, ES, FR	Female Other sample:	2004	-	-	-
Infovox	330/Infovox Desktop	diphone-concatenation Probably same as Babeltech Babil. Infovox 310 is apple version	DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE	Helga 8 kHz version Other sample:	1996
	330/Infovox Desktop		DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE	Gerhard 8 kHz version Other sample:	1996
	210/230	formant-synthesis successor of KTH's OVE, originally telia promotor	DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE		1994			-
	Desktop PRO	non-uniform unit-selection same as Acapela HQ TTS				-	-	-
Innoetics		non-uniform unit-selection Development system from unsupervised audiobook extraction	DE, US, UK, GR, BG	Christian Courtesy of Innoetics	2015
				Claudia Courtesy of Innoetics	2015
				Jessi Courtesy of Innoetics	2015
				Kalrsson Courtesy of Innoetics	2015
Ivona Owned by Amazon	Ivona TTS	non-uniform unit-selection Licensed by Lumenvox.	DE, US, UK, ES, RO, PL, MX	Hans	2011
Ivona Owned by Amazon	Ivona TTS	non-uniform unit-selection Licensed by Lumenvox.	DE, US, UK, ES, RO, PL, MX	Marlene	2011
Lernout & Hauspie Acquired by Scansoft in 2001 after bankruptcy	TTS3000	diphone	DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR	Stefan	1996			-
	TTS3000	diphone	DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR	Anna Other sample:	1996	-	-	-
Loquendo Acquired by Nuance in 2011	Loquendo TTS	non-uniform unit-selection Formerly called Actor	DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE	Katrin Courtesy of Loquendo.	2003
				Stefan Courtesy of Loquendo.	2003
				Ulrike	2001
Meridian	Orpheus	formant Formerly from Dolphin Oceanic Ltd. Specialized on fast speech as used by blind customers.	DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN, MD, CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE	Orpheus	2009
Microsoft	Microsoft Azure TTS services	deep neural nets DNN	ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN	Amala	2022
				Bernd	2022
				Christoph	2022
				Conrad	2022
				Elke	2022
				Gisela	2022
				Kasper	2022
				Killian	2022
				Klarissa	2022
				Klaus	2022
				Louisa	2022
				Maja	2022
				Ralf	2022
				Tanja	2022
				Katja (Neural)	2020
	Microsoft Mobile Voices	non-uniform unit-selection	ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN	Katja	2014
	Microsoft Mobile Voices	non-uniform unit-selection		Stefan	2014
	Microsoft Speech Platform - Runtime Languages (Version 11)	non-uniform unit-selection	ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN	Hedda	2012
Neospeech A Hoya company. As is ReadSpeaker.		non-uniform unit-selection	DE, US, UK, MX, TW, TH, KR, IT, CN, CH, JP, CT, BR, PT, FR	Lena	2018
Neospeech A Hoya company. As is ReadSpeaker.		non-uniform unit-selection	DE, US, UK, MX, TW, TH, KR, IT, CN, CH, JP, CT, BR, PT, FR	Tim	2018
Nuance Formerly Scansoft (originating from Kurzweil and Xerox), acquired European pioneers Lernout & Hauspie in 2001, took the name of a smaller company named Nuance which they acquired in 2005	Vocalizer DNN	Artificial neural nets	US	Nuance Website Sample Other sample:	2018	-	-	-
	Vocalizer	non-uniform unit-selection Formerly called RealSpeak (Vocalizer was the name of the original Nuance product), originally from Lernout & Hauspie, converged with RVoice (formerly Rhetorical). First commercial German unit-selection TTS	DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO	Victor	2016
				Anna 11 kHz, courtesy of Nuance	2010
				Yannick 11 kHz, courtesy of Nuance	2006
				Yannick 2 Yannick embedded version recorded from a cell phone	2009
				Monika and Beate (?) - same as RVoice F026	2005
				Steffi 8 kHz	2004
				Steffi 2 Newer version with enhanced voicequality and better pronunciation.	2005
				Vera 8 kHz	1999
Nuance (until 2005) Acquired by Scansoft in 2005	Vocalizer 4.05	non-uniform unit-selection	DE, US, UK, AU, CA(FR), MX*, BR	Anna Weber	2004			-
Nuance (until 2005) Acquired by Scansoft in 2005	Vocalizer 1.0	non-uniform unit-selection licensed Fonix engine	DE, US, UK, NL, FR, IT, NO, ES, SE		2001			-
Nokia	nokiatts	formant	DE, UK, IT, NL, TR		2005
openAI The makers of ChatGPT.	openai_fm	end2end deep learning (ALM)	many	Alloy style: patient speaker	2025
ReadSpeaker A Hoya company. As is NeoSpeech. Formerly called rSpeak		non-uniform unit-selection using deep neural artificial networks	DE,GB,US,AU,ES,FR,NL,SE	Max courtesy of ReadSpeaker	2018
Rhetorical Systems Was headquartered in Edinburgh, Scotland. Acquired by Scansoft / Nuance in 2004	RVoice	non-uniform unit-selection	DE, UK, US, GR, ES	F026	2004
				M027	2004
				F018
Speechworks Acquired by Scansoft / Nuance in 2003	Speechify	non-uniform unit-selection	DE, US, UK, AU, JP, MX*, FR, BR, CA(FR)	Tessa	2002
Svox Originally a spin-off from ETH Zurich. Acquired by Nuance in 2011	Svox Corporate	non-uniform unit-selection	DE, FR, IT, US, ES	Petra	2005
				Markus	2005
				Marlene Other sample:	2003	-	-	-
		diphone	DE, FR, IT, US, ES	Nicole	2000			-
TextSpeaK	TextSpeakSE Version 2 v3.8.20-TTS-EM-HD2	unknown, perhaps non-uniform unit-selection	DE	Peter	2024
thorstenvoice	VITS	deep learning model: VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech)	DE	Thorsten	2023
thorstenvoice	Tacotron 2 - DDC	deep learning model: Double Decoder Consistency model architecture	DE	Thorsten	2023
tom weber software	Fahrgastansagen TTS	non-uniform unit-selection	DE	Andreas Samples courtesy of tom weber software	2015
tom weber software	Fahrgastansagen TTS	non-uniform unit-selection	DE	Marianne Samples courtesy of tom weber software	2015
VoiceINTERConnect		diphone Commercial version of the Dress Synthesizer (University of Dresden).		female voice	2000
VoiceINTERConnect				male voice	2000
Votrax		formant Early hardware Formant synthesizer. Samples taken from an Audiodata Braille reader.	DE		1974
Voxygen Spin-off from French Orange Labs.		Hybrid non-uniform unit-selection / HMM synthesis	DE, FR, EN, ES, IT, AR	Sylvia courtesy of Voxygen	2014
Voxygen Spin-off from French Orange Labs.		Hybrid non-uniform unit-selection / HMM synthesis	DE, FR, EN, ES, IT, AR	Matthias courtesy of Voxygen	2014

Universities / Research

Institution	System	Remark	Year (approx.) / remark	s1	s2	s3
IKP Bonn	BOSS	non-uniform unit-selection	2001
IKP Bonn	Hadifix	mixed inventory concatenation HADIFIX = HAlbsilben, DIphone und suFIXe DE	1995			-
University of Budapest	Multivox 5 (Profivox)	diphone synthesis	2004 male speaker 1			-
	Multivox 5 (Profivox)	diphone synthesis	2004 male speaker 2			-
	Multivox 3	formant synthesis DE, HU, FI, NL, ES, PT, SA, Esperanto	1994 Other sample:	-	-	-
DFKI	Mary	non-uniform unit-selection Mary = modular architecture for speech synthesis, open source. Also a great tool for teaching about speech synthesis, because the output and input of the different processing modules can be viewed as text. DE, EN, Tibetan	2011 Pavoque corpus
			2007 Bits 1 for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany.
			2007 Bits 2
			2007 Bits 3
			2007 Bits 4
	Mary/Mbrola	diphone DE, EN	2000
Technical university of Dresden	DRESS	diphone synthesis	1996
	Voice 1	concatenative formant-synthesizer	1993 Other sample:	-	-	-
	TUSY	hardware formant-synthesizer	1987 Other sample:	-	-	-
	ROSY	hardware formant-synthesizer Robotron Synthesizer	1977 Other sample:	-	-	-
	Syni 2	punchcard controlled formant-synthesizer Robotron Synthesizer	1975 Other sample:	-	-	-
	Syni 1	punchcard controlled formant-synthesizer Robotron Synthesizer	1972 Other sample:	-	-	-
Michael Pucher with Austrian academy of sciences	hts-engine-world	HMM-based vocoder synthesis, for details see the article M. Pucher, D. Schabus, J. Yamagishi, F. Neubarth, V. Strom: Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis. Speech Communication, Volume 52, Issue 2, February 2010, Pages 164-179. Specialized on Austrian dialects/sociolects. based on open-source software: https://github.com/mipuc/hts-engine-world	2020 LEO Austrian German male		-
			2020 HPO Viennese dialect male		-
			2020 JOE Viennese youth female		-
			2020 KEP Austrian German male, adaptive voice		-
			2020 MPU Austrian German male, adaptive voice		-
Jonathan Duddington	eSpeak	formant-synthesis based on the 1995 unix "speak"-program. Open-source	2006
ETH Zürich	Svox	diphone-concatenation Predecessor of the commercial version later acquired by Nuance.	1998
Gerhard Mercator University of Duisburg		formant-synthesis	1996			-
KTH Stockholm	Infovox	formant synthesis Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt	1992		-	-
KTH Stockholm	Ove III	Hardware formant synthesis Orator Verbis Electris (OVE) . Developed by Gunnar Fant	1967 Other sample:	-	-	-
University of Mons	Mbrola	diphone-synthesis Mbrola: Multi-band Resynthesis Overlap and Add. The NLP (text phonemisation) component is Txt2Pho, the Hadifix NLP in combination with Mbrola synthesis. Available for free for noncommercial use. MBROLA-TTS is available for about 34 different languages.	1998 de8 Markus Binsteiner's work on a Bavarian dialect Other sample:	-	-	-
			2000 de7 (by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities (for emotional speech simulation).
			2000 de6 (by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities (for emotional speech simulation).
			2000 de5 by Fred Englert (ATIP), female, 22 kHz
			2000 de4 By IMS Stuttgart, male, 16 kHz, includes english and french diphones
			2000 de3 by ATIP, female, first 22.05 kHz voice
			1997 de2 By ATIP, male, 16 kHz
			1996 de1 By ATIP, female, 16 kHz
ÖFAI (Austrian Research Institute for Artificial Intelligence)	VieCtoS	demisyllable-LPC-concatenation Vienna Concept-to-Speech system. If the prosody sounds poor it's due to my limited knowledge of Tobi-Labels.	1998		-	-
OGI, Oregon Graduate Institute,		LPC-diphone concatenation Developed at the OGI, Center for Spoken Language Understanding during a workshop in 1998. TTS-Framework is Festival	1998			-
Ruhr-Universität Bochum	SyRUB, Version 4.1.1		1995
Simple4All	Tundra corpus	non-uniform unit-selection EU FP7 Project "Simple4All" Tundra corpus, system features unsupervised learning.	2013
Espnet	Hokuspokus model	ANN: Tacotron2 Thanks to kan-bayashi en, jp, de	2022 Hokuspokus
Hochschule Hof, Institut für Informationssysteme	VITS	deep learning Vits (VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) model de	2023 Friedrich
			2023 Eva
			2023 Bernd
	Tacotron2	deep learning Tacotron 2 model de	2023 Hokuspokus

With the following systems it wasn't possible to synthesize my own sentences:

name/link	description	year (approx.)	mpeg3
AEG Telefunken	SVS (SPRAUS Voll Synthese) unknown	1975
AEG Telefunken	Karlchen unknown concatenation ("Parcor-Synthetisator") Deutsche Bahn Auskunftssystem	1978
ATR	non-uniform unit selection	1997 male
ATR	non-uniform unit selection	1997 female
Bose	unknown engine, unknown technology; recorded from a Bose SoundLink Mini II Bluetooth speaker in February 2018	2018
Univ. of Dresden, Peter Birkholz	Vocal Tract Lab Articulatory synthesis	Handtweaked articulatory movements transformed into a mathematical model to generate soundwaves	-
ELIS Lab	Eurovocs diphone-synthesis Technology from Lernout & Hauspie	1998
ELIS Lab		1996
First Byte	Product names: Monologue, ProVoice. Waveform-concatenation synthesis (?)	1998
HHI: Heinrich Hertz Institut	technology unknown	1978
Keller & Trauth.	SpeakEaZy waveform-concatenation synthesis	1998
SlowSoft	SlangTTS Non-uniform unit-selection synthesis	2020
Wolfgang_von_Kempelen's Speaking Machine	Hardware manual sound generator ("papa", "mama")	1769
University of Köln Institut für Phonetik	articulatory-synthesis (actually not a TTS-system)	1996
Karl Küpfmüller / Bernhard Cramer	Hardware phoneme concatenation	1955
University of Lausanne (LAIP)	TTS-system from the university of Lausanne (LAIP), uses MBROLA -engine. Includes a model to reduce/elaborate articulation according to speech-rate.	1998
Mila (Machine learning laboratory at the University of Montreal)	Char2Wav Deep neural artificial networks from University of Montreal: An end-to-end model for speech synthesis learned with Deeplearning4J. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. For the German samples, the Pavoque database was used for training.	2017
Philips/IPO Eindhoven	Spengi diphone-synthesis	1997
Unknown Russian TTS	unknown / formant?	1970
H.W. Strube, University of Göttingen	Articulatory synthesis.	1977
Texas Instruments Language Translator	LPC coded word-concatenation	1980 Male Voice
University of West Bohemia in Pilsen	ARTIC (ARtificial Talker In Czech) concatenative synthesizer Commercial version available by speechtech by the name of ERIS.	2002
			-

Service products

The following table lists some products to enhance text-to-speech quality.

company	product	description	date
ReadSpeaker (now commercializing its own engine under the name rSpeak; a Hoya company)	SagEs / SayIt	Server-based website reader. Based on Acapela products. Sample reads a newspaper article (Tagesspiegel). Note pronunciation of the word "playstation".	7/11/07
ETeX	-	Dictionaries.	1/7/05
Interlinx, acquired by Speech Concept	emphasis / SpeechOptimizer	Tuning tool for pronunciation and prosody modeling.	1/7/05

Further examples

Speech synthesis examples that did not fit elsewhere.

Description	Example
Ultrafast speech synthesis as used by blind users, with 14 syllables per second, based on formant synthesis Eloquence
realspeak British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin."
TTS of the Fiat "Blue & Me" Navigation Headunit with Microsoft CE. Voice Steffi of Nuance.
Apple iPhone 2011, recorded with a PC microphone from an Apple iPhone 4.1; the TTS is a faster, compact version of the voice Yannick by Nuance

Licensed Systems

The following engines are based on systems with a different name:

SpeaKing Synthesis uses SVOX
LinguaTec VoiceReader uses Nuance Vocalizer voices
Lumenvox uses Ivona
POSSY from the ETH Zürich is a multilingual extension of SVOX
Infovox Desktop Version 2.0 PRO same as Babeltech's Brightspeech
VoicePro from WinDi see Babeltech/Mbrola.
IBM's Viavoice Outloud see Eloquent.
Digalo see Acapela Elan Tempo
Voice RSS uses Microsoft Hedda

Missing examples

For the following systems I haven't obtained samples yet:

ALLVOC, predecessor of Elan from France Telecom based on PSOLA
Papageno, from Siemens
Tubsy from Technical University Berlin
SyRUB from the University of Bochum.

Unknown examples

For the following systems I have no information about the supplier:

Categorization of text-to-speech systems

Systems usually model either the system or the signal, are primarily rule-based or data-based, and can be distinguished by the type of their basic units and the way these are coded. (Chart: TTS technology overview)

Credits:

The following persons delivered information and/or samples:

Ulf Beckmann
Patrick Chabane
Jean-Luc Deladrière
Bernhard Frötschl
Robert Kachel
Adrian Kurz
Michael Lang
Selinay Pachale
Eric Röder
Ali Savas
Stefan Seide
Bernhard Zeller

Changelog

2025/11/12: added Nokia samples
2025/03/24: added OpenAI fm samples
2025/03/17: added Elevenlabs samples
2024/11/18: added TextSpeaK samples
2023/03/15: added iisys samples
2023/03/14: added Thorsten-voice samples
2022/08/18: added Espnet2 samples
2022/03/23: added Microsoft, IBM and Google samples
2020/12/14: added Microsoft Katja DNN samples
2020/7/20: added Pucher samples
2020/5/20: added SlowSoft
2018/10/9: added Microsoft Mobile Voices
2018/9/5: added Google's wavenet samples
2018/4/24: added Google's basic samples
2018/4/24: added AEG Karlchen sample
2018/4/24: added AEG SPRAUS sample
2018/2/16: added Nuance DNN sample
2018/2/15: complete re-write in XML/XSLT, added Bose sample
2018/1/3: added new ReadSpeaker samples and NeoSpeech, both Hoya companies. Removed German version of this page. Too much work
2017/12/13: added Author's custom voice
2017/05/12: added Char2Wav
2017/03/1: added acapela Claudia Smile sample
2017/02/15: added rSpeak Max samples
2016/10/28: added Nuance vocalizer Victor example phrase samples
2016/10/13: added Nuance vocalizer Victor sample
2016/05/09: linked second Firstbyte Provoice sample
2016/04/20: updated Aristech Alex samples
2016/1/4: added Votrax full samples
2015/12/4: removed VoiceRSS because they used Microsoft TTS
2015/12/4: added Vocalizer (RealSpeak) Steffi newer version
2015/9/17: added Innoetics
2015/5/18: added Acapela voice Claudia
2015/5/9: added OnScreenVoices samples
2014/12/18: added H.W. Strube sample
2014/12/17: added Karlchen
2014/12/17: added Votrax, Von Kempelen, Küpfmüller, HHI, AEG Telefunken, OVE III and unknown Russian TTS
2014/11/27: added Voxygen
2014/9/25: removed Lumenvox
2014/1/29: added Acapela child voices Lea and Jonas
2014/1/29: removed broken links
2013/10/29: added Google
2013/09/18: added Simple4all
2013/09/5: added Lumenvox
2013/08/15: changed SpeechConcept to Aristech and updated samples
2013/01/28: added VoiceRSS samples
2013/01/28: introduced Amazon
2012/11/28: added Microsoft samples
2012/01/05: shifted identified samples to other section
2012/01/05: removed Apple because Nuance TTS is used
2012/01/05: added Acapela voice Andreas samples
2012/01/03: added SyRUB samples
2012/01/03: added SpeechConcept Leopold voice samples
2011/09/08: added unknown section.
2011/09/05: added SpeechConcept corporate voice samples. Noted acquisition of Svox and Loquendo by Nuance. Removed product comparison table (too much work to keep up-to-date)
2011/02/28: added Acapela HQTTS Andreas sample.
2011/02/22: added Mary Pavoque samples.
2011/02/21: added Ivona samples.
2011/02/16: added iPhone samples.
2010/10/8: added Vocalizer Anna samples.
2010/01/06: added Vocalizer Yannick embedded samples
2009/09/17: added BrightSpeech Julia samples
2009/09/17: added SpeechConcept samples
2009/09/17: added orpheus from meridian
2009/01/15: added product comparison table
2008/09/23: added berkom speech sample
2008/09/15: added ultra fast speech sample
2008/04/14: added Acapela greeting bunny sample
2007/11/07: added ReadSpeaker sample
2007/10/15: added e-speak and vocal tract lab
2007/09/28: added Mary BITS samples
2007/07/14: added Texas Instruments Language Translator
2006/11/29: added new voice (Klaus) for BrightSpeech
2006/8/21: added updated Loquendo samples
2006/6/19: added Vocalizer Yannick samples.
2005/11/3: added vocalizer sample.
2005/10/11: added second LAIP-TTS sample.
2005/08/29: added SAMT sample.
2005/07/01: added ETeX and Interlinx service descriptions.
2005/06/23: added SVOX unit-selection samples, courtesy of SVOX.
2005/03/31: added realspeak fun sample
2005/02/22: added Logox accent samples
2005/01/04: added MBROLA de-X samples
2004/12/21: added Scansoft Steffi samples
October 29th 2004: added Chatr female sample
October 25th 2004: added Binsteiner sample
September 23rd 2004: added univ. stuttgart unit-selection sample
August 11th 2004: added more s3 samples
August 10th 2004: added Boss s2 sample
August 9th 2004: added rvoice F026 samples
July 7th 2004: added IBM ctts female sample
July 4th 2004: added artic synthesizer
June 24th 2004: added sentence 3 for selected commercial engines
June 22nd 2004: added multivox 5 and old dresden formant-synthesizers
June 11th 2004: added voiceINTERConnect
June 10th 2004: added languages for commercial engines
June 10th 2004: added erkan and fifi voices for atip's proser

Speechsynthesis-demos with simulated emotion

Buster Beat, the Berlin based Ska band.