numtrans
Values: true (default), false
Description:
Controls whether certain words in transcribed text are converted into numeric digits and related conventional formats, including dollar amounts, wall-clock times, percentages, ordinals, web addresses, and telephone numbers. For example, with numtrans set to true (the default), the words “forty two percent” are transformed into the text “42%”.
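To make the idea of number translation concrete, here is a toy sketch of the kind of rewrite described above. It is not Voci's implementation: the function name `toy_numtrans` and its rules are hypothetical, and it only handles whole numbers from 0 to 99 plus the word "percent". Digit-by-digit readings such as "four nine zero" are left as separate numbers.

```python
# Toy illustration only; the real numtrans rules are not described in this document.
_SMALL_WORDS = (
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen"
).split()
SMALL = {word: n for n, word in enumerate(_SMALL_WORDS)}         # 0..19
ONES = {word: n for word, n in SMALL.items() if 1 <= n <= 9}     # 1..9
TENS = {
    word: 10 * n
    for n, word in enumerate(
        "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2
    )
}


def toy_numtrans(text: str) -> str:
    """Replace spoken numbers below 100 with digits; fold a following 'percent' into '%'."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i].lower()
        if word in TENS:
            value = TENS[word]
            i += 1
            # "forty two" -> 42: a tens word may be followed by a ones word.
            if i < len(words) and words[i].lower() in ONES:
                value += ONES[words[i].lower()]
                i += 1
        elif word in SMALL:
            value = SMALL[word]
            i += 1
        else:
            # Not a number word: copy it through unchanged.
            out.append(words[i])
            i += 1
            continue
        if i < len(words) and words[i].lower() == "percent":
            out.append(f"{value}%")
            i += 1
        else:
            out.append(str(value))
    return " ".join(out)


print(toy_numtrans("forty two percent"))  # 42%
```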
In most cases it is desirable to leave numtrans turned on, but there are special cases where it should be turned off. For example, if you are evaluating the Word Error Rate (WER) of Voci’s transcripts, numtrans must be disabled.
WER measurements are only valid against verbatim text because there is not a 1:1 mapping between words that are spoken and conventional representations. For example, both of the word sets “four nine zero” and “four hundred and ninety” will map to the numeric representation “490”.
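The effect on scoring can be sketched with a small WER calculation. This is a minimal illustration, assuming WER is computed as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical, and real scoring tools usually also normalize case and punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


verbatim_reference = "call extension four nine zero"

# A verbatim transcript (numtrans disabled) scores perfectly.
print(word_error_rate(verbatim_reference, "call extension four nine zero"))  # 0.0

# The same correct recognition, shown with digits, is penalized.
print(word_error_rate(verbatim_reference, "call extension 490"))  # 0.6
```

In the second call the recognizer made no mistakes, yet three of the five reference words count as errors because “490” replaces three spoken words. The same “490” would also score as fully wrong against the reference “four hundred and ninety”, which is why the comparison is only meaningful on verbatim text.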
As of V‑Blaze version 7.3, there is a numtrans configuration file for each supported language. These configuration files specify how transcripts containing numbers are formatted based on the type of information. The following table shows how that information is formatted in transcripts based on the language.
| Type | Examples | Languages Available |
|---|---|---|
| currency | €1,900.50; $1,900.50; $0.20 | eng, spa |
| | 1,900.50€; 1,900.50$; 0.20$ | fre1 |
| | 1 900,50€; 1 900,50$; 0,20$ | fre-fr |
| | 1.900,50€; $1.900,50; $0,20; 0,20€ | ger, ita |
| percentage | 5%; 0.03%; 2000% | eng, spa, fre1 |
| | 5%; 0,03%; 2000% | fre-fr, ger, ita |
| phone (X represents any digit) | XXX-XXXX; XXX-XXX-XXXX; 1-XXX-XXX-XXXX | eng, spa, fre1 |
| | 0X XX XX XX XX; +XX X XX XX XX XX | fre-fr |
| | XXX-XXX-XXXX; 0XX / XXXXXXXXX | ger |
| | XXX-XXXXXXX; 1-XXX-XXXXXXX | ita |
| time | 8:00 AM; 3:20 PM; 15:20 PM; 15:20 | eng, spa, ger |
| | 8h00; 3h20 PM; 15h20 PM; 15h20 | fre1, fre-fr |
| ordinal | 21st; 10th; first; third; ninth. This applies unless an address is detected, in which case the short form is always used. For example: 142 3rd street | all |
| cardinal | | all |
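
The currency rows of the table can be read as three per-language settings: a thousands separator, a decimal separator, and whether the currency symbol follows the amount. The sketch below encodes that reading. It is an illustration only, not the format of the actual numtrans configuration files. `CURRENCY_STYLES` and `format_currency` are hypothetical names, and the language keys are copied as they are printed in the table.

```python
from decimal import Decimal

# language -> (thousands separator, decimal separator, symbols written after the amount)
# Values are inferred from the currency examples in the table above.
CURRENCY_STYLES = {
    "eng": (",", ".", ()),
    "spa": (",", ".", ()),
    "fre1": (",", ".", ("€", "$")),
    "fre-fr": (" ", ",", ("€", "$")),
    "ger": (".", ",", ("€",)),
    "ita": (".", ",", ("€",)),
}


def format_currency(amount: str, symbol: str, language: str) -> str:
    """Format an amount such as '1900.50' in the style shown for the given language."""
    thousands, decimal_sep, suffix_symbols = CURRENCY_STYLES[language]
    # Render with US-style separators first, then swap in the language's separators.
    whole, fraction = f"{Decimal(amount):,.2f}".split(".")
    number = whole.replace(",", thousands) + decimal_sep + fraction
    if symbol in suffix_symbols:
        return number + symbol
    return symbol + number


print(format_currency("1900.50", "€", "eng"))     # €1,900.50
print(format_currency("1900.50", "€", "fre-fr"))  # 1 900,50€
print(format_currency("0.20", "$", "ger"))        # $0,20
```

Note that for ger and ita the table places the euro sign after the amount and the dollar sign before it, which is why symbol placement is stored per symbol and not as a single per-language flag.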