numtrans

Values: true (default), false

Description:

Controls whether certain words in transcribed text are converted into numeric digits and related conventional formats, including dollar amounts, wall-clock times, percentages, ordinals, web addresses, and telephone numbers. For example, with numtrans set to true (the default), the words “forty two percent” would be transformed into the text “42%”.

In most cases it is desirable to leave numtrans turned on, but there are special cases where it should be turned off. For example, if you are evaluating the Word Error Rate (WER) of Voci’s transcripts, numtrans must be disabled.

WER measurements are valid only against verbatim text, because there is no 1:1 mapping between spoken words and their conventional representations. For example, the word sequences “four nine zero” and “four hundred and ninety” both map to the numeric representation “490”.
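The effect on scoring can be illustrated with a minimal sketch. The `wer` function below is a generic word-level edit-distance calculation written for this example; it is not Voci's scoring tool, and the sample sentences are invented. It assumes the reference transcript is verbatim, so any numeric formatting in the hypothesis is counted as errors even when the recognition itself was correct.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented sample data: a verbatim reference and two possible transcripts.
reference = "the balance is four nine zero"
verbatim_output = "the balance is four nine zero"  # as with numtrans=false
formatted_output = "the balance is 490"            # as with numtrans=true

print(wer(reference, verbatim_output))   # 0.0
print(wer(reference, formatted_output))  # 0.5
```

In this sketch the formatted transcript is scored at 50% WER (one substitution and two deletions against six reference words) although every word was recognized correctly, which is why the measurement is only meaningful with numtrans disabled.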

As of V‑Blaze version 7.3, there is a numtrans configuration file for each supported language. These configuration files specify how transcripts containing numbers are formatted based on the type of information. The following table shows how that information is formatted in transcripts based on the language.

Table 1. numtrans Information Types

| Type | Examples | Languages Available |
|---|---|---|
| currency | €1,900.50 $1,900.50 $0.20 | eng, spa |
| | 1,900.50€ 1,900.50$ 0.20$ | fre1 |
| | 1 900,50€ 1 900,50$ 0,20$ | fre-fr |
| | 1.900,50€ $1.900,50 $0,20 0,20€ | ger, ita |
| percentage | 5% 0.03% 2000% | eng, spa, fre1 |
| | 5% 0,03% 2000% | fre-fr, ger, ita |
| phone (X represents any digit) | XXX-XXXX XXX-XXX-XXXX 1-XXX-XXX-XXXX | eng, spa, fre1 |
| | 0X XX XX XX XX +XX X XX XX XX XX | fre-fr |
| | XXX-XXX-XXXX 0XX / XXXXXXXXX | ger |
| | XXX-XXXXXXX 1-XXX-XXXXXXX | ita |
| time | 8:00 AM 3:20 PM 15:20 PM 15:20 | eng, spa, ger |
| | 8h00 3h20 PM 15h20 PM 15h20 | fre1, fre-fr |
| ordinal | 21st 10th first third ninth | all |
| | This applies unless an address is detected, in which case short form is always used. For example: 142 3rd street | |
| cardinal | 4 or more digits are always concatenated. For example: "eleven twelve" = 1112 | all |
| | 3 digits are concatenated unless the first word is two digits. For example: "twelve three" = 12 3, "three twelve" = 312, "three one two" = 312 | |
| | 2 digits are never concatenated. For example: "two three" = 2 3 | |
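The cardinal rules in Table 1 can be sketched in a few lines of Python. This is only an illustration of the three rules as stated. The actual behavior is driven by the per-language numtrans configuration files, whose format is not shown here. The `WORD_TO_DIGITS` lookup and the `format_cardinals` function are hypothetical names invented for this example, and the lookup covers only the words zero through nineteen.

```python
# Hypothetical lookup for this sketch only: spoken words zero through nineteen.
WORD_TO_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10", "eleven": "11", "twelve": "12", "thirteen": "13",
    "fourteen": "14", "fifteen": "15", "sixteen": "16", "seventeen": "17",
    "eighteen": "18", "nineteen": "19",
}


def format_cardinals(spoken: str) -> str:
    """Apply the cardinal concatenation rules from Table 1 to spoken number words."""
    # Unknown words raise KeyError; a real system would handle far more input.
    parts = [WORD_TO_DIGITS[word] for word in spoken.split()]
    total_digits = sum(len(part) for part in parts)

    if total_digits >= 4:
        # 4 or more digits are always concatenated.
        return "".join(parts)
    if total_digits == 3:
        # 3 digits are concatenated unless the first word is two digits.
        if len(parts[0]) == 2:
            return " ".join(parts)
        return "".join(parts)
    # 2 digits are never concatenated.
    return " ".join(parts)


for phrase in ["eleven twelve", "twelve three", "three twelve",
               "three one two", "two three"]:
    print(f'"{phrase}" = {format_cardinals(phrase)}')
```

Run against the phrases from the table, the sketch reproduces the listed results: 1112, 12 3, 312, 312, and 2 3.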