Substitutions

Substitution is an automatic speech recognition (ASR) feature that corrects recurring errors in transcripts. In V‑Blaze deployments, substitution rules improve transcription accuracy by finding transcription errors and replacing them with the corrected text.

Transcription errors occur for a variety of reasons, including the following:

  • Words that are not included in the language model dictionary

  • Fast or poorly enunciated speech

  • Unlikely word associations or industry-specific terms

  • Poor audio quality

Substitution rules can be created to correct many of these errors.

Once you have identified the transcription errors you want to correct, you can develop substitution rules and use them to improve transcription accuracy. The process of improving transcription accuracy on a deployed V‑Blaze system is referred to as Field Tuning.

Substitutions work best on errors where specific words or phrases are frequently mis-transcribed in the same or similar ways. Each rule is a "before" and "after" pair: the "before" text is the error and the "after" text is the correct replacement. For example, if the spoken phrase “date of birth” is frequently transcribed as “data birth,” the “before : after” substitution rule “data birth : date of birth” corrects it.
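The effect of a substitution rule can be illustrated with a short Python sketch. This is not the V‑Blaze implementation, and the function names parse_rule and apply_rules are hypothetical. The sketch assumes that each rule is a single line in the “before : after” form, and that matching is on whole words without regard to case. The matching behavior of a real deployment may differ.

```python
import re

def parse_rule(line):
    """Split a 'before : after' line into a (before, after) pair."""
    before, sep, after = line.partition(" : ")
    if not sep or not before.strip() or not after.strip():
        raise ValueError(f"not a 'before : after' rule: {line!r}")
    return before.strip(), after.strip()

def apply_rules(text, rules):
    """Replace each 'before' phrase with its 'after' phrase, whole words only."""
    for before, after in rules:
        # \b anchors stop "data birth" from matching inside a longer word.
        pattern = r"\b" + re.escape(before) + r"\b"
        # A function replacement inserts 'after' literally (no backslash escapes).
        text = re.sub(pattern, lambda m: after, text, flags=re.IGNORECASE)
    return text

rules = [parse_rule("data birth : date of birth")]
corrected = apply_rules("can you confirm your data birth please", rules)
print(corrected)
```

Rules are applied in list order, so in this sketch a later rule sees the output of an earlier one. Keep that in mind if one rule's "after" text could match another rule's "before" text.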

You must have ASR transcripts available before you can develop substitution rules. First, analyze the transcripts to locate errors. Once you have identified particular errors, and if you are using V‑Spark for analytics, you can search across transcripts to determine how consistently and how frequently each error occurs. Whether or not you use V‑Spark, you should listen to call audio to confirm that the same word or phrase is being spoken for any suspected transcription error.
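If transcripts are available as plain text, the frequency and consistency of a suspected error can also be estimated with a small script. The following Python sketch is one possible approach, not a V‑Blaze or V‑Spark feature. The function name count_phrases is hypothetical, and the sample transcripts are invented for illustration.

```python
import re
from collections import Counter

def count_phrases(transcripts, phrases):
    """Return (total hits per phrase, number of transcripts containing each phrase)."""
    totals = Counter()
    spread = Counter()
    for text in transcripts:
        for phrase in phrases:
            # Whole-word, case-insensitive match of the suspected error phrase.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
            if hits:
                totals[phrase] += hits
                spread[phrase] += 1
    return totals, spread

# Invented sample transcripts, for illustration only.
transcripts = [
    "please confirm your data birth",
    "what is your data birth and your data birth year",
    "thank you for calling",
]
totals, spread = count_phrases(transcripts, ["data birth"])
print(totals["data birth"], spread["data birth"])
```

A phrase that appears often and across many transcripts is a stronger candidate for a substitution rule than one that appears in a single call. A count alone does not confirm an error, so the audio check described above still applies.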

One of the primary uses of transcripts is analysis. The goal of transcript analysis is to extract actionable insights to help make more informed decisions. Transcription errors hinder the analysis process because mis-transcribed words and phrases can limit or skew the results. Substitution is an effective field tuning approach when used to target the words and phrases that directly affect transcript analytics.