Languages of Interest

Introduction

Here, we briefly describe the languages for which we would like to build a TTS voice. These languages have been requested by users and organizations who feel there is a need. If we can obtain funding for any of them, we will proceed from the early research stage to building voices.

We will keep this page updated. Let us know if you have a further candidate: lp at louderpages dot org.

Amharic

According to Wikipedia, Amharic is spoken by 67 million people, mostly in Ethiopia, as a first or second language. We are told that there is no good, usable TTS voice.

Amharic is a member of the Semitic language family. But, unlike Arabic and Hebrew, it is fairly phonemic, with all vowels being written in the text. (“Phonemic” might be translated as “What You See Is What You Hear”.)

The language is mostly phonemic, but it gives us some problems because of gemination: two words can be written the same way yet be pronounced with a different consonant length and have different meanings. Such words are called “homographs”. The human reader knows their meaning from context and so can pronounce them correctly. This is not possible with TTS. We have homographs in English like “read” (“I read the book yesterday, and tomorrow I will read it again”). They are rare in English, and TTS users get used to the wrong one being spoken. But if a language has a lot of them, then we begin to worry that the TTS will get too many wrong and become confusing and tiring.
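To make the homograph problem concrete, here is a minimal Python sketch. The tiny lexicon, the phone symbols, and the function names are all made up for illustration and are not taken from any real TTS engine. It only shows that a lookup by spelling alone returns more than one candidate pronunciation for “read”, with nothing in the text to choose between them.

```python
# Hypothetical mini-lexicon: spelling -> candidate pronunciations.
# The phone strings are illustrative, not from a real voice.
LEXICON = {
    "read": ["r iy d", "r eh d"],  # present tense vs. past tense
    "book": ["b uh k"],
    "the": ["dh ah"],
}


def pronunciations(word):
    """Return every pronunciation listed for this spelling (may be empty)."""
    return LEXICON.get(word.lower(), [])


def is_homograph(word):
    """A spelling is ambiguous if it has more than one pronunciation."""
    return len(pronunciations(word)) > 1


def ambiguous_words(sentence):
    """List the words in a sentence whose pronunciation cannot be
    decided from the spelling alone."""
    words = [w.strip(".,!?\"'") for w in sentence.split()]
    return [w for w in words if is_homograph(w)]


if __name__ == "__main__":
    text = "I read the book yesterday, and tomorrow I will read it again"
    print(ambiguous_words(text))
```

In English only a few words would be flagged this way. The concern described above is a language where such a check would flag many words in an ordinary sentence.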

In the following video we drop you into a short lecture on Amharic, where there are some interesting consonants for the English speaker to consider, followed by that gemination issue again.

Armenian

We are talking here of Eastern Armenian, spoken in the Republic of Armenia. eSpeak NG TTS is in use, but it suffers from a number of inaccuracies.

In the video below, eSpeak NG TTS is used by an Armenian screen-reader user to read English and Russian. We are told that the Armenian eSpeak NG voice is very poor, and perhaps that is why it is not demonstrated in this instructional video.

Armenian is somewhat challenging for us because it has a number of hidden vowels.

Afrikaans to isiZulu, via Setswana and isiXhosa

These are four major languages spoken in Southern Africa. The Republic of South Africa includes them amongst its eleven official languages. One of those eleven, English, is well taken care of in the TTS field. Some voices exist in these other languages, provided by a research institute which uses a similar technology to ours. But, for one reason or another, they have their problems.

Afrikaans is a Germanic language coming originally from the Netherlands. However, its spelling and pronunciation are not as regular as those of its relative, Dutch. This requires a lot of dictionary-like rules.

Setswana, isiZulu and isiXhosa are Bantu languages, the last two being very closely related Nguni languages. All use tones and clicks, but their use is much reduced in Setswana, which is the national language of Botswana.

IsiZulu and isiXhosa have a rich set of consonants and click sounds, but the real challenge is their use of tones. The tones are not marked in writing, and so we cannot tell which word is being referred to, and thus how to pronounce it. This is like the gemination problem in Amharic, but with the tone of vowels rather than the length of consonants.

Here is a short video covering the wonderful click sounds used in Nguni languages. It’s actually a lot more complicated than this.

Vietnamese (Southern Pronunciation)

There are quite a few Vietnamese voices for TTS. But we are told that they all suffer from one or more of the following: they are not very natural, they use Northern pronunciation, or they are costly. Even a low-cost voice can be out of the reach of blind users because a credit card is usually needed, and blind people can find it difficult to obtain banking services.

In the video below, two Southerners describe and try to speak using Northern pronunciation. As you will hear, it can be quite different.

The Indic Languages

RHVoice developers get many requests from speakers of both northern and southern languages of South Asia: Hindi, Bengali, Nepali, Marathi, Telugu, Tamil and several others. We think that reasonable TTS voices exist for some of those languages, but not on all platforms: some languages may be available on Windows but not on Android, and vice versa. Price is also an issue.