Custom language modeling
Creating a custom language model improves overall accuracy and adds words that are specific to a customer's business to the model dictionary.
"Light" language modeling requires 50-100 hours of recorded speech audio. This audio must be transcribed manually and combined with existing language resources to create a custom language model that is better attuned to a specific customer's calls.
Typically, fewer than two months are required for model creation once Voci receives recorded audio. The combined effort of audio collection and model creation is non-trivial, but once the model is deployed, transcription accuracy is improved with no negative impact on transcription speed.
Full custom language modeling
The same process used in light language modeling is applied in "Full" custom language modeling, but on a larger scale. "Full" language modeling requires a minimum of 300 hours of speech audio. Up to 25,000 hours of audio can be used to improve the outcome of the modeling effort. Three to four months are required for full custom language model creation, no matter how much audio in that range is submitted. The combined effort of audio collection and model creation is larger than for light language modeling, but once the model is deployed, the improvement in overall accuracy is greater.
Full custom language and acoustic modeling
In "Full" custom language and acoustic modeling, the same custom language modeling process is supplemented with the creation of a custom acoustic model that is specifically attuned to the customer's call acoustics. This dual process results in greater robustness to noise and better recognition of accents that occur commonly on the specific customer's calls. A minimum of 1,000 hours of speech audio is required, and up to 25,000 hours can be used to create the models. Four to six months are required for full custom language and acoustic model creation, no matter how much audio in that range is submitted. This option requires the most effort of all tuning options, but it provides the most accurate transcripts.
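The audio requirements and timelines of the three options can be summarized in a short lookup, sketched below in Python. This is an illustration only and not part of any Voci tool: the names `Tier`, `TIERS`, and `eligible_tiers` are hypothetical. It also assumes that each option is gated purely by its minimum number of audio hours; the text gives 50-100 hours for light modeling and does not say how amounts between 100 and 300 hours are handled, so the sketch treats 50 hours as a simple minimum.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    """One tuning option (hypothetical structure, for illustration only)."""
    name: str
    min_hours: int   # minimum hours of recorded speech audio
    timeline: str    # model creation time once audio is received


# Minimum hours and timelines as stated in the sections above.
TIERS = [
    Tier("light language model", 50, "typically fewer than two months"),
    Tier("full language model", 300, "three to four months"),
    Tier("full language and acoustic model", 1000, "four to six months"),
]

# Upper bound on usable audio stated for both "Full" options.
MAX_USEFUL_HOURS = 25_000


def eligible_tiers(audio_hours: float) -> List[Tier]:
    """Return every option whose minimum audio requirement is met.

    Assumption: eligibility depends only on the minimum hours.
    """
    return [tier for tier in TIERS if audio_hours >= tier.min_hours]


if __name__ == "__main__":
    for tier in eligible_tiers(350):
        print(f"{tier.name}: {tier.timeline}")
```

With 350 hours of audio, for example, the sketch lists the light and full language model options but not the combined language and acoustic option, which needs at least 1,000 hours.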