Preparing your data

To develop substitutions, you need a set of transcripts and the associated audio files. The dataset should be representative of the audio you will transcribe with your Voci ASR solution. It should be large enough that recurring transcription errors appear often enough to be identified reliably, yet small enough to be processed quickly. Voci recommends using transcripts produced from approximately 1000 hours of recorded audio. Fewer hours are acceptable, but a smaller dataset limits the data available for finding consistent errors that are suitable for this type of correction.
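To see how a candidate dataset compares with the recommended amount of audio, you can total the duration of its recordings. The following is a minimal sketch, not part of any Voci tooling. It assumes the audio is stored as uncompressed WAV files under a single directory, and the function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def total_audio_hours(audio_dir):
    """Sum the durations of all .wav files under audio_dir, in hours.

    Assumption: every recording is a WAV file that the standard
    library `wave` module can read. Other formats need another reader.
    """
    seconds = sum(wav_duration_seconds(p) for p in Path(audio_dir).rglob("*.wav"))
    return seconds / 3600.0
```

For example, total_audio_hours("collected_audio") returns the number of hours of audio found under a hypothetical collected_audio directory.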

Note: You can use preexisting transcripts and the associated audio files for this process.
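When reusing preexisting material, it can help to confirm that every transcript still has its audio file and vice versa. The sketch below is illustrative only. It assumes that a transcript and its audio share a base file name (for example, call001.wav and call001.json), which is a hypothetical naming convention and may not match your environment.

```python
from pathlib import Path


def pair_transcripts_with_audio(audio_dir, transcript_dir, audio_ext=".wav"):
    """Match audio files to JSON transcripts by base file name.

    Returns (paired, missing_transcripts, missing_audio):
      paired              dict of base name -> (audio path, transcript path)
      missing_transcripts base names with audio but no transcript
      missing_audio       base names with a transcript but no audio
    """
    # Assumption: files are matched on the name without its extension.
    audio = {p.stem: p for p in Path(audio_dir).glob(f"*{audio_ext}")}
    transcripts = {p.stem: p for p in Path(transcript_dir).glob("*.json")}

    paired = {
        stem: (audio[stem], transcripts[stem])
        for stem in sorted(audio.keys() & transcripts.keys())
    }
    missing_transcripts = sorted(audio.keys() - transcripts.keys())
    missing_audio = sorted(transcripts.keys() - audio.keys())
    return paired, missing_transcripts, missing_audio
```

Only the pairs in the first returned value are usable for developing substitutions. The two lists identify files to transcribe again or to leave out of the dataset.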

Proceed through the steps required to configure your transcription solution, upload the collected audio, and produce JSON transcripts of that audio. Be sure to select the same language model that you use during normal transcription. Substitutions are specific to a language model because different language models produce different results from the same speech.

Refer to the Voci guide relevant to your ASR solution for more details on how to prepare your data: