Receiving language identification information
Language identification (LID) information is written in a `lidinfo` section of the JSON transcript of an audio file. The JSON transcript also contains information about the language model specified during transcription and the model selected by the language identification module to transcribe each utterance.
The `lid` parameter enables you to use the ASR engine's Language Identification (LID) module to identify the language spoken in the input audio and automatically use an appropriate language model. To force the use of an alternate model with a different "domain" name, specify it using `lid=language_model`.
Refer to the parameter reference for more detail on additional `lid` options and how to use them.
The following cURL example submits the file sample1.wav for transcription with language identification activated:

```shell
curl -F lid=true \
     -F file=@sample1.wav \
     https://vcloud.vocitec.com/transcribe
```
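The same request can be made programmatically. The following is a minimal Python sketch using only the standard library, assuming the same endpoint as the cURL example; any authentication fields the service may require are omitted here.

```python
# Hedged sketch: POST an audio file with lid=true as multipart/form-data,
# mirroring the cURL example above. Standard library only.
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Assemble a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\nContent-Disposition: form-data; '
             f'name="{name}"\r\n\r\n{value}\r\n').encode()
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; '
         f'name="{file_field}"; filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path, url="https://vcloud.vocitec.com/transcribe"):
    """Submit the audio file with LID enabled and return the raw response."""
    with open(path, "rb") as f:
        body, ctype = build_multipart({"lid": "true"}, "file", path, f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

In practice a client library such as `requests` makes the multipart handling shorter; the manual version is shown here only to keep the sketch dependency-free.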
The response to this POST command is a JSON transcript file, sample1.json, that includes a `lidinfo` section, as in the following stereo audio example:

```json
"lidinfo": {
    "0": {
        "conf": 1.0,
        "lang": "spa",
        "speech": 8.45
    },
    "1": {
        "conf": 0.92,
        "lang": "spa",
        "speech": 0.99
    }
}
```
The `lidinfo` section is a global, top-level dictionary that contains the following fields:

- `conf` — the confidence score of the language identification decision
- `lang` — the three-letter language code specifying the language that was identified for the stream
- `speech` — the number of seconds of automatically detected speech that were used to determine the language used in the stream
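The fields above can be read directly from a parsed transcript. The following is a small illustrative sketch; `summarize_lidinfo` is a hypothetical helper, not part of the API, and the transcript literal reuses the stereo example values.

```python
# Hedged sketch: read the per-channel lidinfo dictionary from a parsed JSON
# transcript and report the identified language for each channel.
import json

transcript = json.loads("""
{
  "lidinfo": {
    "0": {"conf": 1.0,  "lang": "spa", "speech": 8.45},
    "1": {"conf": 0.92, "lang": "spa", "speech": 0.99}
  }
}
""")

def summarize_lidinfo(transcript):
    """Map each channel to its (language, confidence, seconds of speech)."""
    return {
        channel: (info["lang"], info["conf"], info["speech"])
        for channel, info in transcript["lidinfo"].items()
    }

summary = summarize_lidinfo(transcript)
# e.g. summary["0"] == ("spa", 1.0, 8.45)
```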
In addition, the JSON transcript includes the following fields:
- `model` — a top-level field that reports the value of the language model that was specified by the `model` tag. For example: `"model": "eng1:survey"`
- `model` — an additional field in the `metadata` dictionary for each element of the `utterances` array. The `model` field for each `utterances` element identifies the language model that was selected by LID for use in transcribing that utterance. For example:

  ```json
  "metadata": {
      "source": "spa-eng-sample.wav",
      "model": "eng1:callcenter",
      "uttid": 0,
      "channel": 0
  }
  ```
- `langinfo` — a breakdown of language information that is added when more than one language was detected. For example:

  ```json
  "langinfo": {
      "spa": {
          "utts": 1,
          "speech": 17.46,
          "conf": 1.0,
          "time": 21.56
      },
      "eng": {
          "utts": 1,
          "speech": 1.35,
          "conf": 0.81,
          "time": 0.93
      }
  }
  ```
- `langfinal` — added to the `lidinfo` object when the LID language has been detected in the audio channel, but with a confidence that is less than the `lidthreshold` value. In these cases, the default model (or the model specified by the `model` parameter) is used to transcribe the audio instead. `langfinal` indicates the language of the model that was actually used to transcribe the audio. For example:

  ```json
  "lidinfo": {
      "lang": "spa",
      "speech": 1.35,
      "langfinal": "eng",
      "conf": 0.81
  }
  ```
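A consumer of the transcript can use the presence of `langfinal` to tell whether the LID decision was accepted or the engine fell back to the default model. The sketch below assumes the `lidinfo` structure documented above; `effective_language` is a hypothetical helper, not part of the API.

```python
# Hedged sketch: per the docs, langfinal appears in lidinfo only when the
# detected language's confidence fell below lidthreshold, in which case it
# names the language of the model actually used for transcription.
def effective_language(lidinfo_entry):
    """Return (language_used, fell_back) for one channel's lidinfo entry."""
    if "langfinal" in lidinfo_entry:
        return lidinfo_entry["langfinal"], True   # low confidence: fallback model used
    return lidinfo_entry["lang"], False           # LID decision accepted

entry = {"lang": "spa", "speech": 1.35, "langfinal": "eng", "conf": 0.81}
lang, fell_back = effective_language(entry)
# lang == "eng", fell_back is True
```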