Best practices

Substitution rules work best when they are not too general and not too specific. Rules that are too general will cause substitutions to match and replace when they should not. This case is known as a false positive. Rules that are too specific won't allow substitutions to match and replace where they're supposed to. This case is known as a false negative. Develop your substitution rules to strike a balance between the potential risk of false positives and false negatives. A good substitution rule has just the right amount of specificity.

The following is an example of a substitution rule that is too general and would result in false positives.

this : fish

The substitution rule above seems like an obvious solution if the word "fish" is consistently mistranscribed as "this". However, "this" is a very common word, so the rule would convert every spoken "this" to "fish". The overly general substitution rule above would cause more transcription errors than it would correct. The solution is to add some context to make the matching phrase more specific.

fresh this : fresh fish
quality this : quality fish
bad this : bad fish
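
The difference between the two approaches can be checked with a small simulation. The Python sketch below is not the actual substitution engine. It assumes that a rule is a plain, case-sensitive, whole-word phrase replacement, and the helper names parse_rule and apply_rules are hypothetical.

```python
import re

def parse_rule(line):
    """Split a 'match : replacement' line into its two sides."""
    match, replacement = line.split(" : ", 1)
    return match.strip(), replacement.strip()

def apply_rules(text, rule_lines):
    """Apply each rule in order as a whole-word phrase replacement (assumed behavior)."""
    for line in rule_lines:
        match, replacement = parse_rule(line)
        # Only match when the phrase is bounded by whitespace or the text edges.
        pattern = r"(?<!\S)" + re.escape(match) + r"(?!\S)"
        text = re.sub(pattern, lambda m, r=replacement: r, text)
    return text

transcript = "this is the place to buy fresh this"

too_general = apply_rules(transcript, ["this : fish"])
with_context = apply_rules(transcript, ["fresh this : fresh fish"])

print(too_general)   # fish is the place to buy fresh fish
print(with_context)  # this is the place to buy fresh fish
```

The general rule rewrites the legitimate "this" at the start of the sentence, while the rule with context only touches the mistranscribed word.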

The following is an example of a substitution rule that would result in false negatives.

My name is fred and I'm on les sick : My name is fred and I'm on /Lasix/

The substitution rule above contains too much context and will fail to correct most occurrences of "les sick". Avoid making overly specific substitution rules like the one above as they miss the majority of opportunities to correct an error. The following example illustrates a more effective substitution rule with less context.

les sick : /Lasix/

If the mistranscription was "less sick" instead of "les sick", a substitution rule similar to the one above would be too general. It wouldn't be out of the ordinary for someone to say "I feel less sick today". In this instance, adding some context to the rule would make the necessary corrections.

on less sick : on /Lasix/

It might be tempting to add more context to the rule above, such as "I'm on less sick : I'm on /Lasix/". However, that would result in false negatives because it would fail to correct phrases such as "I'm not on less sick" or "My dad is on less sick".
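
One way to compare candidate rules is to count how many known errors each matching phrase would catch and how many ordinary phrases it would wrongly change. The sketch below uses only the phrases quoted in this section. The helper rule_matches is hypothetical and assumes simple whole-word matching, which may differ from the real engine.

```python
def rule_matches(match_phrase, utterance):
    """True if the words of match_phrase occur consecutively in the utterance."""
    needle = match_phrase.split()
    words = utterance.split()
    return any(words[i:i + len(needle)] == needle
               for i in range(len(words) - len(needle) + 1))

# Mistranscriptions that should be corrected to /Lasix/.
errors = [
    "I'm on less sick",
    "I'm not on less sick",
    "My dad is on less sick",
]
# Ordinary speech that must be left alone.
not_errors = ["I feel less sick today"]

results = {}
for phrase in ["less sick", "on less sick", "I'm on less sick"]:
    corrected = sum(rule_matches(phrase, u) for u in errors)
    false_pos = sum(rule_matches(phrase, u) for u in not_errors)
    results[phrase] = (corrected, false_pos)
    print(f"{phrase!r}: corrects {corrected} of {len(errors)}, "
          f"false positives {false_pos} of {len(not_errors)}")
```

Under these assumptions, "less sick" catches all three errors but also changes the ordinary phrase, "I'm on less sick" catches only one error, and "on less sick" catches all three with no false positives.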

Being overly specific will limit the effectiveness of your substitution rules. The following is another example of a substitution rule with too much context.

Welcome to Wells Fargo for a change : Welcome to Wells Fargo /Foreign/ /Exchange/

The substitution rule above is ineffective because it would fail to correct other instances of the same transcription error. Consider the following mistranscribed phrases.

  • "Good morning, this is Jim from Wells Fargo for a change"

  • "Thank you for calling Wells Fargo for a change"

The following substitution rule will correct the same transcription error in different contexts like the ones listed above.

Wells Fargo for a change : Wells Fargo /Foreign/ /Exchange/
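
When several mistranscribed phrases end the same way, the words they share can suggest a starting point for the matching side of a rule. The sketch below is only a heuristic for illustration, not a documented feature of the product, and the function name common_word_suffix is hypothetical. A candidate found this way still needs to be reviewed for false positives.

```python
def common_word_suffix(phrases):
    """Return the longest run of trailing words shared by every phrase."""
    reversed_words = [p.split()[::-1] for p in phrases]
    shared = []
    # Walk backwards from the last word until the phrases disagree.
    for column in zip(*reversed_words):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return " ".join(reversed(shared))

observed = [
    "Welcome to Wells Fargo for a change",
    "Good morning, this is Jim from Wells Fargo for a change",
    "Thank you for calling Wells Fargo for a change",
]

candidate = common_word_suffix(observed)
print(candidate)  # Wells Fargo for a change
```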

Domain context

Distinctive context can appear on either side of a transcription error. The surrounding context of the transcription error should be taken into account when developing substitution rules because that context mitigates the potential risk of false positives.

For example, the term “ACH” appears frequently in the context of banking and moving money (payments, direct deposits). “ACH” is an important topic and categorization term for analysis and should not be ignored. When spoken quickly, "ACH" can sound very similar to "KCH". This is especially true when audio is damaged by lossy compression, poor cellular reception, or loud background noise.

A substitution rule like "k c h : /ACH/" is too general and would likely introduce false positives when people spell out names that include the letters "k c h". Rather than create a substitution rule without context like "k c h : /ACH/", search the transcripts for different instances of "k c h" and add the necessary context to your rules.
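
Searching the transcripts for context can be partly automated. The sketch below counts the words that appear immediately before and after each occurrence of a phrase. The sample transcripts are invented for illustration, and the function name neighbor_counts is hypothetical. In practice the input would be your own transcript text.

```python
from collections import Counter

def neighbor_counts(transcripts, phrase):
    """Count the word immediately before and after each occurrence of phrase."""
    needle = phrase.split()
    before, after = Counter(), Counter()
    for line in transcripts:
        words = line.split()
        for i in range(len(words) - len(needle) + 1):
            if words[i:i + len(needle)] == needle:
                if i > 0:
                    before[words[i - 1]] += 1
                if i + len(needle) < len(words):
                    after[words[i + len(needle)]] += 1
    return before, after

# Invented sample lines, not real transcript data.
transcripts = [
    "i want to set up a k c h payment",
    "the k c h payment did not go through",
    "can you do an electronic k c h for me",
    "my last name is spelled k c h o",
]

before, after = neighbor_counts(transcripts, "k c h")
print("before:", before.most_common())
print("after:", after.most_common())
```

Neighbors that recur and clearly belong to the banking domain, such as "payment" or "electronic" here, are good candidates for rule context. Neighbors such as "spelled" point to cases the rules should not match.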

The following example illustrates several substitution rules with different contexts, all meant to correct different instances of the same transcription error.

electronic k c h : electronic /ACH/
k c h debit : /ACH/ debit
k c h transfer : /ACH/ transfer
k c h payment : /ACH/ payment

The substitution rules above will correct a phrase such as "make an electronic k c h payment", where matching context could occur on either side.
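
The behavior described above can be simulated as follows. This is a sketch under stated assumptions, not the real engine: rules are applied top to bottom as whole-word phrase replacements, and the replacement text, including the slashes, is inserted verbatim because the sketch does not model how the slashes are interpreted. The function name substitute is hypothetical.

```python
import re

RULES = """\
electronic k c h : electronic /ACH/
k c h debit : /ACH/ debit
k c h transfer : /ACH/ transfer
k c h payment : /ACH/ payment
"""

def substitute(text, rules_text):
    """Apply rules top to bottom as whole-word phrase replacements (assumed behavior)."""
    for line in rules_text.splitlines():
        match, replacement = (side.strip() for side in line.split(" : ", 1))
        pattern = r"(?<!\S)" + re.escape(match) + r"(?!\S)"
        # Replacement text, including the slashes, is inserted verbatim.
        text = re.sub(pattern, lambda m, r=replacement: r, text)
    return text

both_sides = substitute("make an electronic k c h payment", RULES)
right_side = substitute("set up a k c h payment", RULES)
spelled_name = substitute("my last name is spelled k c h o", RULES)

print(both_sides)    # make an electronic /ACH/ payment
print(right_side)    # set up a /ACH/ payment
print(spelled_name)  # my last name is spelled k c h o
```

The first phrase is corrected by the rule with context on the left, the second by a rule with context on the right, and the spelled-out name is left unchanged because no rule's context is present.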

Using multiple rules to correct different instances of the same error

Increasing the specificity of a substitution rule lowers its false positive rate and raises its false negative rate. For a single rule, if you're uncertain about including more context, it is better to be too specific than too general. Increasing specificity is the only way to lower the false positive rate.

You can lower the false negative rate by creating multiple rules to correct different occurrences of the same transcription error. Ensure each rule contains distinctive context that addresses a specific subset of errors.

Recall the overly general example "this : fish". Most corrections made by this substitution rule will be false positives. The rule must be made more specific to bring down the false positive rate. In this case, multiple substitution rules should be created with more context. The following example illustrates multiple substitution rules with minimal contextual information to correct different instances of the same transcription error.

fresh this : fresh fish
quality this : quality fish
this market : fish market
this store : fish store
Note: The last two rules are risky. Imagine someone saying “I visited this store” or “I’m bullish in this market.” More context to increase specificity will help, if available. However, if fish is frequently mentioned, the last two rules might be acceptable. Substitution candidates should be carefully validated to ensure the false positive rate of each rule is acceptable.
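
Validation can be as simple as reviewing a sample of the substitutions each rule made and computing the share that were wrong. The sketch below assumes a hypothetical review log in which a person has marked each sampled substitution as correct or incorrect. The log entries are invented for illustration, and what counts as an acceptable rate is a judgment that depends on your data.

```python
from collections import defaultdict

# Hypothetical review log: (rule, reviewer judged the substitution correct?)
reviewed = [
    ("fresh this : fresh fish", True),
    ("fresh this : fresh fish", True),
    ("this store : fish store", True),
    ("this store : fish store", False),
    ("this store : fish store", False),
]

def false_positive_rates(reviews):
    """Return, for each rule, the share of its reviewed substitutions that were wrong."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for rule, was_correct in reviews:
        totals[rule] += 1
        if not was_correct:
            wrong[rule] += 1
    return {rule: wrong[rule] / totals[rule] for rule in totals}

rates = false_positive_rates(reviewed)
for rule, rate in rates.items():
    print(f"{rule}: {rate:.0%} of reviewed substitutions were false positives")
```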

Check the substitution file

Voci provides an additional Python 2.7 script to ensure the substitution file is functional. Use subsCheck.py to check a substitution file for errors.
Note: Contact support@vocitec.com for access to subsCheck.py.

Once you've completed the substitution list and saved the file, check it for errors using subsCheck.py. The following example illustrates a command to run the subsCheck script on a particular file.

subsCheck.py mySubsFile.sub

If there are any mistakes in the file, subsCheck will indicate the nature and location of the errors. Find the errors, correct them, and run the script again to ensure the file is free of errors.