last update: 18th November 2024
I added a chart to facilitate the understanding of the concepts used for classification. It's somewhat outdated, as non-uniform unit selection is not explicitly mentioned.
TTS always consists of two components, which I name following Dutoit's introduction: an NLP component and a DSP component.
The engines that synthesize the speech (DSP component) are mainly based on five technologies:
The test sentences were:
sentence 1:
An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen.
As I found this sentence a bit too simple, I thought up another test sentence which contains a collection of known problems for the NLP module (in some demos this sentence is truncated due to the provider's limit on the number of characters):
sentence 2:
Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm.
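To make the problem classes in sentence 2 visible, here is a minimal sketch of a token classifier, assuming a tiny hand-made abbreviation lexicon and a few regular expressions. The names `ABBREVIATIONS`, `PATTERNS`, `classify` and `special_tokens` are my own illustrations and are not taken from any of the systems listed on this page. The sketch only flags tokens that need special treatment; it does not verbalize them, and it cannot detect the homograph "Heroin".

```python
import re

# Hypothetical mini-lexicon; a real NLP component needs a far larger one.
ABBREVIATIONS = {"Dr.": "Doktor", "z.B.": "zum Beispiel", "tgl.": "täglich"}

# Order matters: the first pattern that matches the whole token wins.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")),
    ("decimal", re.compile(r"\d+,\d+")),
    ("initial", re.compile(r"[A-ZÄÖÜ]\.")),
    ("acronym", re.compile(r"[A-ZÄÖÜ]{2,}")),
]


def classify(token):
    """Return the problem class of a whitespace-separated token."""
    core = token.strip("(),")  # drop brackets and commas, keep dots
    if core in ABBREVIATIONS:
        return "abbreviation"
    for label, pattern in PATTERNS:
        if pattern.fullmatch(core):
            return label
    return "word"


def special_tokens(sentence):
    """List all tokens that are not plain words, with their class."""
    result = []
    for token in sentence.split():
        label = classify(token)
        if label != "word":
            result.append((token.strip("(),"), label))
    return result


SENTENCE_2 = ("Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - "
              "meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit "
              "13,84 Gramm Heroin zu 1,04 DM das Gramm.")

for token, label in special_tokens(SENTENCE_2):
    print(f"{token:10} {label}")
```

A real front end would go on to expand each flagged token into words, which is where the systems in the table below differ most.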
Six years after thinking up those sentences, a more pressing problem for German speech synthesis in services like email reading is the pronunciation of English terms. For example, most systems would not pronounce the following sentence correctly without tuning:
sentence 3:
Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden.
company/link | description/engine name | technology | languages | voice name | year (approx.) | s1 | s2 | s3 |
---|---|---|---|---|---|---|---|---|
Acapela
Acapela was formed in December 2003 from a combination of three European companies specializing in vocal technologies, Babel Technologies (Belgium), Infovox (Sweden) and Elan Speech (France). |
Acapela HQ TTS | non-uniform unit-selection | DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR | Claudia |
2015 | |||
Claudia (Smile) |
2017 | |||||||
Lea (Child) |
2013 | |||||||
Jonas (child) |
2013 | |||||||
Andreas |
2011 | |||||||
Julia |
2009 | |||||||
Klaus |
2006 | |||||||
Sarah |
2003 | |||||||
Custom Voice | non-uniform unit-selection, ANN | DE | Felix DNN (page author): Artificial neural net model adapted with 15 minutes data |
2017 | ||||
Felix (page author): Non-uniform unit-selection from 2 hours data |
2017 | |||||||
Greeting Bunny | non-uniform unit-selection | DE, US, FR, IT, ES, NL, SE, NO, DK, BE | Bunny |
2008 | ||||
Aculab
|
Aculab | diphone diphone-concatenation with LPC coded units. LPC (linear predictive coding), originally a compression algorithm, useful for synthesis because based on a source/filter model of speech. |
DE, UK, US, FR, BR, IT, ES | Julia |
1998 | - | ||
Aristech
Formerly Speechconcept |
Cerevoice | non-uniform unit-selection Developments from Aristech, CereProc and University of Edinburgh |
DE, EN, FR, IT, ES, US, NL, JP | Sophie, adult Corporate Voice, courtesy of Aristech |
2011 | |||
Leopold, Austrian adult courtesy of Aristech |
2013 | |||||||
Alex, adult courtesy of Aristech |
2016 | |||||||
Gudrun, adult courtesy of Aristech |
2013 | |||||||
Nick, youth courtesy of Aristech |
2011 | |||||||
Saskia, youth courtesy of Aristech |
2011 | |||||||
Atip
|
Proser | diphone NLP-component and voices from Atip, Mbrola Engine (diphone-concatenation) from Babeltech |
DE, US | Carla |
2000 | |||
Erkan Turkish accent |
2004 | |||||||
Fifi French accent |
2004 | |||||||
Steffen |
2000 | |||||||
Eva |
2000 | |||||||
AT&T
|
Natural Voices | non-uniform unit-selection | DE, IT, US, UK, FR, MX* | Klara |
2001 | |||
Reiner |
2002 | |||||||
Babeltech
|
Brightspeech | non-uniform unit-selection same as Acapela HQ TTS |
Ingrid |
2002 | - | |||
Babil | diphone diphone-concatenation based on commercial Mbrola-engine. MBROLA (Multi Band Resynthesis Overlap and Add), similar to PSOLA but the database is treated beforehand to adapt pitch, amplitude and spectral features. |
DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA | Eva |
2000 | ||||
Greta |
2000 | |||||||
Steffen |
1997 | |||||||
Helga Same as Infovox 330 |
1998 | - | - | - | ||||
Gerhard Same as Infovox 330 |
1998 | - | - | - | ||||
Bell Labs
|
diphone LPC-diphone concatenation |
DE, FR, ES, US, UK, IT, RU, RO, CN |
|
- | ||||
Centigram
Acquired by Lernout & Hauspie, later Nuance |
TruVoice | formant | DE, US, MX*, FR, IT |
|
1996 | - | ||
Cepstral
|
Cepstral TTS | non-uniform unit-selection Associated with Alan Black, one of the pioneers of non-uniform unit-selection and lead scientist of Festival, an open source text-to-speech framework developed at Univ. of Edinburgh and the CMU. |
DE, UK, US, ES, FR, EG, TH, AF | Kathrin |
2003 | |||
Matthias |
2003 | |||||||
Deutsche Telekom
|
Berkom TTS | formant Research system by the former R&D department of Deutsche Telekom. Hybrid approach combining formant synthesis for voiced phonemes with concatenation of waveform-coded units for unvoiced parts. |
DE | Felix |
1998 | |||
SAMT | hardware-based formant synthesis (Sprach-Ausgabesystem in Multiplex-Technik): hardware-based formant synthesis of former Forschungsinstitut der Deutschen Bundespost. |
DE |
Other sample: |
1987 | - | - | - | |
Digital Equipment Corporation
|
DecTalk | formant First commercial text-to-speech synthesizer. Rule-based formant synthesis: the legendary formant synthesizer, based on Klatt's MITalk. |
DE, US, UK, ES, MX*, FR |
|
1982 | - | ||
Elan
|
SaySo | non-uniform unit-selection | DE, US, FR, IT, ES | Lea |
2003 | |||
Tempo | diphone Pitch Synchronous Overlap and Add (PSOLA): famous algorithm to change pitch and time of speech that made diphone-synthesis a great success for many years. |
DE, US, UK, FR, ES, IT, BR, PT, RU, PL | Thomas |
1998 | ||||
Dagmar |
1996 | |||||||
Eloquent Technologies
Acquired by Scansoft. |
ETI Eloquence |
rule-based formant-synthesis (Klatt-style). Later sold by Speechworks, also licensed to IBM (ViaVoice Outloud) |
DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR |
|
1998 | - | ||
GData
|
Logox | microsegment synthesis Microsegmentsynthesis (concatenating subphonetic units), not developed any more. Originally based on research from Univ. of Saarbrücken. |
DE, US, UK | Default voice |
2000 | - | ||
Bill |
1998 | |||||||
Bill (Swabian accent) |
2002 | |||||||
Bill (Hessian accent) |
2002 | |||||||
Bill (Saxon accent) |
2002 | |||||||
Bill (French accent) |
2002 | |||||||
Google
|
WaveNet | WaveNet: end-to-end artificial neural nets | AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE, TH, TL, TR, UK, VI, ZH | Wavenet A (female) |
2018 | |||
Wavenet B (male) |
2018 | |||||||
Wavenet C (female) |
2018 | |||||||
Wavenet D (male) |
2018 | |||||||
Wavenet E |
2022 | |||||||
Wavenet F |
2022 | |||||||
Google Basic | so-called basic (non-uniform unit selection?) | AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE, TH, TL, TR, UK, VI, ZH | Standard A (female) |
2018 | ||||
Standard B (male) |
2018 | |||||||
Basic C |
2022 | |||||||
Basic D |
2022 | |||||||
Basic E |
2022 | |||||||
Basic F |
2022 | |||||||
Google Translate | non-uniform unit-selection | Female Samples were accessed via the translation service. |
2013 | |||||
IBM
|
Watson | unknown | CS, DE, EN, ES, FR, IT, JA, KS, NL, PT, SV, ZH | Birgit |
2022 | |||
Dieter |
2022 | |||||||
Erika |
2022 | |||||||
CTTS | non-uniform unit-selection | DE, US, UK, JP, KR, IT, ES, FR | Male Courtesy of IBM. Database speaker is Gilles Karolyi. Sentence 3 sample is 8 kHz. |
2002 | ||||
Female Other sample: |
2004 | - | - | - | ||||
Infovox
|
330/Infovox Desktop | diphone-concatenation Probably same as Babeltech Babil. Infovox 310 is the Apple version |
DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE | Helga 8 kHz version Other sample: |
1996 | |||
Gerhard 8 kHz version Other sample: |
1996 | |||||||
210/230 | formant-synthesis Successor of KTH's OVE, originally from Telia Promotor |
DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE |
|
1994 | - | |||
Desktop PRO | non-uniform unit-selection same as Acapela HQ TTS |
|
- | - | - | |||
Innoetics
|
non-uniform unit-selection Development system from unsupervised audiobook extraction |
DE, US, UK, GR, BG | Christian Courtesy of Innoetics |
2015 | ||||
Claudia Courtesy of Innoetics |
2015 | |||||||
Jessi Courtesy of Innoetics |
2015 | |||||||
Kalrsson Courtesy of Innoetics |
2015 | |||||||
Ivona
Owned by Amazon |
Ivona TTS | non-uniform unit-selection Licensed by Lumenvox. |
DE, US, UK, ES, RO, PL, MX | Hans |
2011 | |||
Marlene |
2011 | |||||||
Lernout & Hauspie
Acquired by Scansoft in 2001 after bankruptcy |
TTS3000 | diphone | DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR | Stefan |
1996 | - | ||
Anna Other sample: |
1996 | - | - | - | ||||
Loquendo
Acquired by Nuance in 2011 |
Loquendo TTS | non-uniform unit-selection Formerly called Actor |
DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE | Katrin Courtesy of Loquendo. |
2003 | |||
Stefan Courtesy of Loquendo. |
2003 | |||||||
Ulrike |
2001 | |||||||
Meridian
|
Orpheus | formant Formerly from Dolphin Oceanic Ltd. Specialized in fast speech as used by blind customers. |
DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN, MD, CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE | Orpheus |
2009 | |||
Microsoft
|
Microsoft Azure TTS services | deep neural nets DNN | ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN | Amala |
2022 | |||
Bernd |
2022 | |||||||
Christoph |
2022 | |||||||
Conrad |
2022 | |||||||
Elke |
2022 | |||||||
Gisela |
2022 | |||||||
Kasper |
2022 | |||||||
Killian |
2022 | |||||||
Klarissa |
2022 | |||||||
Klaus |
2022 | |||||||
Louisa |
2022 | |||||||
Maja |
2022 | |||||||
Ralf |
2022 | |||||||
Tanja |
2022 | |||||||
Katja (Neural) |
2020 | |||||||
Microsoft Mobile Voices | non-uniform unit-selection | ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN | Katja |
2014 | ||||
Stefan |
2014 | |||||||
Microsoft Speech Platform - Runtime Languages (Version 11) | non-uniform unit-selection | ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN | Hedda |
2012 | ||||
Neospeech
A Hoya company. As is ReadSpeaker. |
non-uniform unit-selection | DE, US, UK, MX, TW, TH, KR, IT, CN, CH, JP, CT, BR, PT, FR | Lena |
2018 | ||||
Tim |
2018 | |||||||
Nuance
Formerly Scansoft (originating from Kurzweil and Xerox), acquired European pioneers Lernout & Hauspie in 2001, took the name of a smaller company named Nuance which they acquired in 2005 |
Vocalizer DNN | Artificial neural nets | US | Nuance Website Sample Other sample: |
2018 | - | - | - |
Vocalizer | non-uniform unit-selection Formerly called RealSpeak (Vocalizer was the name of the original Nuance product), originally from Lernout & Hauspie, converged with RVoice (formerly Rhetorical). First commercial German unit-selection TTS |
DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO | Victor |
2016 | ||||
Anna 11 kHz, courtesy of Nuance |
2010 | |||||||
Yannick 11 kHz, courtesy of Nuance |
2006 | |||||||
Yannick 2 Yannick embedded version recorded from a cell phone |
2009 | |||||||
Monika and Beate (?) - same as RVoice F026 |
2005 | |||||||
Steffi 8 kHz |
2004 | |||||||
Steffi 2 Newer version with enhanced voice quality and better pronunciation. |
2005 | |||||||
Vera 8 kHz |
1999 | |||||||
Nuance (until 2005)
Acquired by Scansoft in 2005 |
Vocalizer 4.05 | non-uniform unit-selection | DE, US, UK, AU, CA(FR), MX*, BR | Anna Weber |
2004 | - | ||
Vocalizer 1.0 | non-uniform unit-selection licensed Fonix engine |
DE, US, UK, NL, FR, IT, NO, ES, SE |
|
2001 | - | |||
ReadSpeaker
A Hoya company. As is NeoSpeech. Formerly called rSpeak |
non-uniform unit-selection using deep neural artificial networks |
DE,GB,US,AU,ES,FR,NL,SE | Max courtesy of ReadSpeaker |
2018 | ||||
Rhetorical Systems
Was headquartered in Edinburgh, Scotland. Acquired by Scansoft / Nuance in 2004 |
RVoice | non-uniform unit-selection | DE, UK, US, GR, ES | F026 |
2004 | |||
M027 |
2004 | |||||||
F018 |
||||||||
Speechworks
Acquired by Scansoft / Nuance in 2003 |
Speechify | non-uniform unit-selection | DE, US, UK, AU, JP, MX*, FR, BR, CA(FR) | Tessa |
2002 | |||
Svox
Originally a spin-off from ETH Zurich. Acquired by Nuance in 2011 |
Svox Corporate | non-uniform unit-selection | DE, FR, IT, US, ES | Petra |
2005 | |||
Markus |
2005 | |||||||
Marlene Other sample: |
2003 | - | - | - | ||||
diphone | DE, FR, IT, US, ES | Nicole |
2000 | - | ||||
TextSpeaK
|
TextSpeakSE Version 2 v3.8.20-TTS-EM-HD2 | unknown, perhaps non-uniform unit-selection | DE | Peter |
2024 | |||
thorstenvoice
|
VITS | deep learning model: VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) | DE | Thorsten |
2023 | |||
Tacotron 2 - DDC | deep learning model: Double Decoder Consistency model architecture | DE | Thorsten |
2023 | ||||
tom weber
software
|
Fahrgastansagen TTS | non-uniform unit-selection | DE | Andreas Samples courtesy of tom weber software |
2015 | |||
Marianne Samples courtesy of tom weber software |
2015 | |||||||
VoiceINTERConnect
|
diphone Commercial version of the Dress Synthesizer (University of Dresden). |
female voice |
2000 | |||||
male voice |
2000 | |||||||
Votrax
|
formant Early hardware formant synthesizer. Samples taken from an Audiodata Braille reader. |
DE |
|
1974 | ||||
Voxygen
Spin-off from French Orange Labs. |
Hybrid non-uniform unit-selection / HMM synthesis | DE, FR, EN, ES, IT, AR | Sylvia courtesy of Voxygen |
2014 | ||||
Matthias courtesy of Voxygen |
2014 |
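The Aculab entry above describes LPC as useful for synthesis because it is based on a source/filter model of speech. A minimal sketch of that idea, under my own assumptions: the source is a plain impulse train (a voiced sound with constant pitch) and the filter is an all-pole filter with a single resonance. The function names and all numeric values are illustrative and not taken from any listed system; a real LPC synthesizer uses more coefficients, estimated frame by frame from recorded speech.

```python
import math


def impulse_train(n_samples, period):
    """Voiced excitation: one unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(excitation, coeffs):
    """LPC synthesis filter: y[n] = e[n] + sum_k coeffs[k-1] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y += a * out[n - k]
        out.append(y)
    return out


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Predictor coefficients of a two-pole resonance (one 'formant')."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius < 1
    theta = 2 * math.pi * freq_hz / sample_rate          # pole angle
    return [2 * r * math.cos(theta), -r * r]


SAMPLE_RATE = 8000                                         # assumed, in Hz
excitation = impulse_train(800, SAMPLE_RATE // 100)        # 100 Hz pitch
coeffs = resonator_coeffs(500, 100, SAMPLE_RATE)           # resonance at 500 Hz
signal = all_pole_filter(excitation, coeffs)               # 0.1 s of "speech"
```

Changing `period` alters the pitch without touching the filter, and changing `coeffs` alters the timbre without touching the pitch; this separation is what makes the model attractive for synthesis.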
Institution | System | Remark | Year (approx.) / remark | s1 | s2 | s3 |
---|---|---|---|---|---|---|
IKP Bonn
|
BOSS |
non-uniform unit-selection |
2001 |
|||
Hadifix |
mixed inventory concatenation HADIFIX = HAlbsilben, DIphone und suFIXe (demisyllables, diphones and suffixes) DE |
1995 |
- | |||
University of Budapest
|
Multivox 5 (Profivox) |
diphone synthesis |
2004 male speaker 1 |
- | ||
2004 male speaker 2 |
- | |||||
Multivox 3 |
formant synthesis DE, HU, FI, NL, ES, PT, SA, Esperanto |
1994 Other sample: |
- | - | - | |
DFKI
|
Mary |
non-uniform unit-selection Mary = modular architecture for speech synthesis, open source. Also a great tool for teaching speech synthesis, because the input and output of the different processing modules can be viewed as text. DE, EN, Tibetan |
2011 Pavoque corpus |
|||
2007 Bits 1 for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany. |
||||||
2007 Bits 2 |
||||||
2007 Bits 3 |
||||||
2007 Bits 4 |
||||||
Mary/Mbrola |
diphone DE, EN |
2000 |
||||
Technical university of Dresden
|
DRESS |
diphone synthesis |
1996 |
|||
Voice 1 |
concatenative formant-synthesizer |
1993 Other sample: |
- | - | - | |
TUSY |
hardware formant-synthesizer |
1987 Other sample: |
- | - | - | |
ROSY |
hardware formant-synthesizer Robotron Synthesizer |
1977 Other sample: |
- | - | - | |
Syni 2 |
punchcard controlled formant-synthesizer Robotron Synthesizer |
1975 Other sample: |
- | - | - | |
Syni 1 |
punchcard controlled formant-synthesizer Robotron Synthesizer |
1972 Other sample: |
- | - | - | |
Michael Pucher with
Austrian academy of sciences
|
hts-engine-world |
HMM-based vocoder synthesis, for details see the article M. Pucher, D. Schabus, J. Yamagishi, F. Neubarth, V. Strom: Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis. Speech Communication, Volume 52, Issue 2, February 2010, Pages 164-179. Specialized in Austrian dialects/sociolects. Based on open-source software: https://github.com/mipuc/hts-engine-world |
2020 LEO Austrian German male |
- | ||
2020 HPO Viennese dialect male |
- | |||||
2020 JOE Viennese youth female |
- | |||||
2020 KEP Austrian German male, adaptive voice |
- | |||||
2020 MPU Austrian German male, adaptive voice |
- | |||||
Jonathan Duddington
|
eSpeak |
formant-synthesis based on the 1995 Unix "speak" program. Open source |
2006 |
|||
ETH Zürich
|
Svox |
diphone-concatenation Predecessor of the commercial version later acquired by Nuance. |
1998 |
|||
Gerhard Mercator University of Duisburg
|
formant-synthesis |
1996 |
- | |||
KTH Stockholm
|
Infovox |
formant synthesis Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt |
1992 |
- | - | |
Ove III |
Hardware formant synthesis Orator Verbis Electris (OVE). Developed by Gunnar Fant |
1967 Other sample: |
- | - | - | |
University of Mons
|
Mbrola |
diphone-synthesis Mbrola: Multi-band Resynthesis Overlap and Add. The NLP (text phonemisation) component is Txt2Pho, the Hadifix NLP in combination with Mbrola synthesis. Available for free for noncommercial use. MBROLA-TTS is available for about 34 different languages. |
1998 de8 Markus Binsteiner's work on a Bavarian dialect Other sample: |
- | - | - |
2000 de7 (by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities (for emotional speech simulation). |
||||||
2000 de6 (by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities (for emotional speech simulation). |
||||||
2000 de5 by Fred Englert (ATIP), female, 22 kHz |
||||||
2000 de4 By IMS Stuttgart, male, 16 kHz, includes english and french diphones |
||||||
2000 de3 by ATIP, female, first 22.05 kHz voice |
||||||
1997 de2 By ATIP, male, 16 kHz |
||||||
1996 de1 By ATIP, female, 16 kHz |
||||||
ÖFAI (Austrian Research Institute for Artificial
Intelligence)
|
VieCtoS |
demisyllable-LPC-concatenation Vienna Concept-to-Speech system. If the prosody sounds poor, it's due to my limited knowledge of ToBI labels. |
1998 |
- | - | |
OGI, Oregon Graduate Institute
|
LPC-diphone concatenation Developed at the OGI, Center for Spoken Language Understanding during a workshop in 1998. TTS-Framework is Festival |
1998 |
- | |||
Ruhr-Universität Bochum
|
SyRUB, Version 4.1.1 | 1995 |
||||
Simple4All
|
Tundra corpus |
non-uniform unit-selection EU FP7 Project "Simple4All" Tundra corpus, system features unsupervised learning. |
2013 |
|||
Espnet
|
Hokuspokus model |
ANN: Tacotron2 Thanks to kan-bayashi en, jp, de |
2022 Hokuspokus |
|||
Hochschule Hof,
Institut für Informationssysteme
|
VITS |
deep learning Vits (VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) model de |
2023 Friedrich |
|||
2023 Eva |
||||||
2023 Bernd |
||||||
Tacotron2 |
deep learning Tacotron 2 model de |
2023 Hokuspokus |
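Both PSOLA and MBROLA in the tables above carry "overlap and add" in their names. As a rough illustration of that building block only, here is a sketch of plain windowed overlap-add. It is explicitly not PSOLA: PSOLA places its frames pitch-synchronously, which this sketch does not attempt, so stretching real speech this way would produce audible artifacts. The function names and the frame sizes are my own assumptions.

```python
import math


def hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length)
            for n in range(length)]


def overlap_add(signal, frame_len, analysis_hop, synthesis_hop):
    """Cut windowed frames every `analysis_hop` samples and
    add them back every `synthesis_hop` samples."""
    window = hann(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // analysis_hop
    out = [0.0] * ((n_frames - 1) * synthesis_hop + frame_len)
    for i in range(n_frames):
        start_in = i * analysis_hop
        start_out = i * synthesis_hop
        for n in range(frame_len):
            out[start_out + n] += window[n] * signal[start_in + n]
    return out


# Test signal: 400 samples of a sine wave (values are illustrative).
tone = [math.sin(2 * math.pi * 5 * n / 400) for n in range(400)]

# Equal hops with 50 % overlap: the interior of the signal is reconstructed.
same = overlap_add(tone, frame_len=80, analysis_hop=40, synthesis_hop=40)

# Smaller analysis hop than synthesis hop: the output becomes longer.
stretched = overlap_add(tone, frame_len=80, analysis_hop=20, synthesis_hop=40)
```

With equal hops the two overlapping window halves sum to one, so nothing is changed; the duration change comes purely from re-spacing the frames.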
With the following systems it wasn't possible to synthesize my own sentences:
name/link | description | year (approx.) | mpeg3 |
---|---|---|---|
AEG Telefunken
|
SVS (SPRAUS Voll Synthese)
unknown |
1975 |
|
Karlchen
unknown concatenation ("Parcor-Synthetisator") Deutsche Bahn information system |
1978 |
||
ATR
|
non-uniform unit selection |
1997 male |
|
1997 female |
|||
Bose
|
unknown
unknown Recorded from a Bose SoundLink Mini II Bluetooth speaker, February 2018 |
2018 |
|
Univ. of Dresden, Peter Birkholz
|
Vocal Tract Lab
Articulatory synthesis |
Hand-tweaked articulatory movements transformed into a mathematical model to generate sound waves |
- |
ELIS Lab
|
Eurovocs
diphone-synthesis Technology from Lernout & Hauspie |
1998 |
|
1996 |
|||
First Byte
|
Product names: Monologue, ProVoice. Waveform-concatenation synthesis (?) |
1998 |
|
HHI:
Heinrich Hertz Institut
|
technology unknown |
1978 |
|
Keller & Trauth
|
SpeakEaZy
waveform-concatenation synthesis |
1998 |
|
SlowSoft
|
SlangTTS
Non-uniform unit-selection synthesis |
2020 |
|
Wolfgang von Kempelen's Speaking Machine
|
Hardware manual sound generator ("papa", "mama") |
1769 |
|
University of Köln
Institut für Phonetik |
articulatory-synthesis (actually not a TTS-system) |
1996 |
|
Karl Küpfmüller / Bernhard
Cramer
|
Hardware phoneme concatenation |
1955 |
|
University of Lausanne (LAIP)
|
TTS system from the University of Lausanne (LAIP), uses the MBROLA engine. Includes a model to reduce/elaborate articulation according to speech rate. |
1998 |
|
Mila (Machine learning laboratory at the University of Montreal)
|
Char2Wav
Deep neural artificial networks from University of Montreal: An end-to-end model for speech synthesis learned with Deeplearning4J. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. For the German samples, the Pavoque database was used for training. |
2017 |
|
Philips/IPO Eindhoven
|
Spengi
diphone-synthesis |
1997 |
|
Unknown Russian TTS
|
unknown / formant? |
1970 |
|
H.W. Strube, University of Göttingen
|
Articulatory synthesis. |
1977 |
|
Texas Instruments Language Translator
|
LPC coded word-concatenation |
1980 Male Voice |
|
University of West Bohemia in Pilsen
|
ARTIC (ARtificial Talker In Czech)
concatenative synthesizer Commercial version available from SpeechTech under the name ERIS. |
2002 |
|
The following table lists some products to enhance text-to-speech quality.
company | product | description | date | sample |
---|---|---|---|---|
ReadSpeaker (a Hoya company; now commercializes its own engine under the name rSpeak) | SagEs / SayIt | Server-based website reader. Based on Acapela products. Sample reads a newspaper article (Tagesspiegel). Note pronunciation of the word "playstation". | 7/11/07 | |
ETeX | - | Dictionaries. | 1/7/05 | |
Interlinx, acquired by Speech Concept | emphasis / SpeechOptimizer | Tuning tool for pronunciation and prosody modeling. | 1/7/05 |
Speech synthesis examples that did not fit elsewhere.
Description | Example |
---|---|
Ultrafast speech synthesis as used by blind users, at 14 syllables per second, based on the Eloquence formant synthesizer | |
RealSpeak British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin." | |
TTS of the Fiat "Blue & Me" Navigation Headunit with Microsoft CE. Voice Steffi of Nuance. | |
Apple iPhone, 2011. Recorded with a PC microphone from an Apple iPhone 4.1; the TTS is a faster, compact version of the Nuance voice Yannick. | |
The following engines are based on systems with a different name:
For the following systems I didn't yet get samples:
For the following systems I have no information about the supplier:
The following persons delivered information and/or samples: