Docs for the languages, regional dialects, and industry domains supported by Voci's ASR engine
The ASR engine uses machine learning components known as models to represent knowledge about speech. This knowledge is applied during transcription. The two types of models used are acoustic models and language models. The acoustic model converts audio into a stream of sound symbols specific to a language, such as English or Spanish. The language model is responsible for converting the stream of sound symbols into text.
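The two-stage flow described above can be illustrated with a toy sketch. This is not Voci's implementation: the function names, the frame labels, and the tiny lexicon are all hypothetical, the "acoustic model" only collapses repeated frame labels into sound symbols, and the "language model" is a plain lexicon lookup standing in for a real statistical model.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon: sound-symbol sequences -> words.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}


def acoustic_model(frames: List[str]) -> List[str]:
    """Toy stand-in: collapse consecutive duplicate frame labels into symbols."""
    symbols: List[str] = []
    for label in frames:
        if not symbols or symbols[-1] != label:
            symbols.append(label)
    return symbols


def language_model(symbols: List[str]) -> str:
    """Toy stand-in: greedy longest-match of symbol runs against LEXICON."""
    words: List[str] = []
    max_len = max(len(key) for key in LEXICON)
    i = 0
    while i < len(symbols):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(symbols) - i), 0, -1):
            word = LEXICON.get(tuple(symbols[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            i += 1  # no match starting here: skip the unknown symbol
    return " ".join(words)


def transcribe(frames: List[str]) -> str:
    """Audio frames -> sound symbols -> text, mirroring the two model types."""
    return language_model(acoustic_model(frames))


frames = ["HH", "HH", "AH", "L", "L", "OW", "OW", "W", "ER", "ER", "L", "D"]
print(transcribe(frames))
```

The point of the sketch is only the division of labor: the first stage depends on how a language sounds, while the second depends on which words and word sequences are expected.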
Accents are incorporated into the process of developing acoustic models. For example, audio from calls originating in the Southern, Northeast, Midwest, and Western regions of the United States was used to train Voci's North American English acoustic model (eng1), which enables this model to work robustly throughout the United States. For accents that diverge strongly from those found in the United States, different acoustic models are necessary for optimal accuracy. Voci supports United Kingdom, European, and Australian English in this way.
The language packages included in the table use the "call center" specialty, which is tuned for a typical contact center and performs well for most use cases. Several additional language packages are available for more specific needs. For example, eng1:voicemail is designed to process a typical voicemail message. The links in the language column describe all available language packages in more detail. Level 1 and Level 2 language packages provide a strong baseline capability that works well out of the box. All included languages support both speakers (agent and client).
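Judging from the single example eng1:voicemail, a package identifier appears to pair an acoustic model code with a specialty, separated by a colon. The helper below is a minimal sketch under that assumption; `LanguagePackage` and `parse_package` are hypothetical names, not part of any Voci API, and the handling of a bare identifier such as eng1 is a guess.

```python
from typing import NamedTuple, Optional


class LanguagePackage(NamedTuple):
    language: str              # e.g. "eng1"
    specialty: Optional[str]   # e.g. "voicemail"; None if not specified


def parse_package(identifier: str) -> LanguagePackage:
    """Split an identifier of the assumed form 'language[:specialty]'."""
    language, sep, specialty = identifier.strip().partition(":")
    if not language:
        raise ValueError(f"missing language code in {identifier!r}")
    if sep and not specialty:
        raise ValueError(f"missing specialty after ':' in {identifier!r}")
    return LanguagePackage(language, specialty or None)


print(parse_package("eng1:voicemail"))
```

Keeping the specialty optional avoids hard-coding a default name, since the exact identifier for the "call center" specialty is not given here.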
If you need a specialized model that does not appear in these sections, refer to Custom language modeling for more information on developing custom language packages.