Factors affecting accuracy
Transcription accuracy is not a single measurement. It depends on several factors, which are summarized below. Voci speech scientists apply best practices to improve accuracy against many of these factors.
Domain: Voicemail, Survey, Call Center, Healthcare, etc.
Applications with simpler dialogue (for example, single-caller voicemail) yield higher accuracy than multi-party conversations that may include overtalk. Voci builds language models specifically for these applications to maximize accuracy and speed. Learn more: Language models
Audio quality: compression, codecs, stereo vs. mono.
The noisier and more compressed the audio, the lower the accuracy. Typical telephone audio is encoded with G.711 at 64 kbps, and Voci takes this format as a baseline. A lower encoding rate will result in lower accuracy. Recording source audio in dual channel rather than mono typically results in higher accuracy, by as much as 10%. Voci always recommends dual channel. Learn more: Single-channel (mono) and channel-separated audio
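As a quick check before submitting audio, the channel count and bit rate of a recording can be inspected locally. The sketch below is illustrative only and is not part of any Voci tooling: the function name describe_wav is hypothetical, and it assumes an uncompressed PCM WAV file, because Python's standard wave module does not read G.711-encoded WAV files. The 64 kbps G.711 baseline follows from 8,000 samples per second at 8 bits per sample.

```python
import wave

# G.711 baseline: 8000 samples/s * 8 bits/sample = 64000 bit/s per channel.
G711_BIT_RATE_BPS = 8000 * 8


def describe_wav(path):
    """Summarize a PCM WAV file (hypothetical helper, not a Voci API)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits_per_sample = wav.getsampwidth() * 8
    return {
        "channels": channels,
        "sample_rate_hz": sample_rate,
        "bits_per_sample": bits_per_sample,
        # Uncompressed bit rate across all channels.
        "bit_rate_bps": channels * sample_rate * bits_per_sample,
        # Two or more channels suggests channel-separated (dual channel) audio.
        "channel_separated": channels >= 2,
    }
```

A file that reports a single channel is mono, so both speakers share one track; a file that reports two channels may be channel-separated, provided each speaker was actually recorded on a separate channel.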
Field Tuning: Substitutions
Substitution is an automatic speech recognition (ASR) feature that corrects recurring errors in transcripts. In V‑Blaze deployments, substitution rules find known transcription errors and replace them with the corrected text, which improves transcription accuracy.
Learn more: Substitutions
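The find-and-replace idea behind substitution rules can be sketched in a few lines. This is a conceptual illustration only and does not reflect the actual V-Blaze rule format or API: the rule list, the example misrecognitions, and the function name apply_substitutions are all assumptions made for the example.

```python
import re

# Hypothetical rules: (pattern for a recurring misrecognition, corrected text).
RULES = [
    (r"\bvoicy\b", "Voci"),
    (r"\bv blaze\b", "V-Blaze"),
]


def apply_substitutions(transcript, rules=RULES):
    """Apply each rule in order, matching whole words case-insensitively."""
    for pattern, replacement in rules:
        transcript = re.sub(pattern, replacement, transcript, flags=re.IGNORECASE)
    return transcript
```

For example, apply_substitutions("thank you for calling voicy") returns the transcript with "voicy" replaced by "Voci". The word-boundary anchors keep a rule from rewriting text inside a longer word.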
Field Tuning: OOV
OOV (out-of-vocabulary) is an ASR tuning feature designed to improve transcription accuracy for audio that contains brand- and industry-specific terminology. OOV enhances existing language models with new words and preferential treatment for those words.
Learn more: Out-of-vocabulary (OOV)
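Before tuning, it can help to collect candidate terms that a vocabulary does not yet cover. The sketch below is a rough illustration and not Voci's OOV interface: it assumes a known word list is available as a Python set and that sample domain text (for example, product documentation or corrected transcripts) is on hand. The function name find_oov_candidates and the min_count parameter are hypothetical.

```python
import re
from collections import Counter


def find_oov_candidates(text, vocabulary, min_count=2):
    """Return (word, count) pairs for words missing from the vocabulary.

    Only words seen at least min_count times are kept, most frequent first.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(token for token in tokens if token not in vocabulary)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

Terms that appear often in the domain text but are absent from the vocabulary, such as brand names or industry jargon, are the ones most worth reviewing as OOV additions.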