Receiving language identification information

Language identification (LID) information is written in a lidinfo section of the JSON transcript of an audio file. The JSON transcript also contains information about the language model specified during transcription and the model selected by the language identification module to transcribe each utterance.

The lid parameter enables you to use the ASR engine's Language Identification (LID) module to identify the language spoken in the input audio and automatically select an appropriate language model. To force the use of an alternate model with a different "domain" name, specify it using lid=language_model.

Refer to the parameter reference for more detail on additional lid options and how to use them.

The following cURL example submits the file sample1.wav for transcription with language identification enabled:

curl -F lid=true \
  -F file=@sample1.wav \
  https://vcloud.vocitec.com/transcribe
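
If you are scripting this call rather than using cURL, the following Python sketch (using the third-party requests library) performs the same multipart POST. The endpoint and form fields are taken from the cURL example above; the commented-out lid value illustrates the lid=language_model form, and the model name shown there is only illustrative.

    import requests

    # Same request as the cURL example: enable LID and upload the audio.
    with open("sample1.wav", "rb") as audio:
        response = requests.post(
            "https://vcloud.vocitec.com/transcribe",
            data={"lid": "true"},
            # To force an alternate model domain instead, use the
            # lid=language_model form, e.g. data={"lid": "eng1:callcenter"}
            # (model name here is illustrative).
            files={"file": audio},
        )

    transcript = response.json()  # the JSON transcript, e.g. sample1.json
    print(transcript.get("lidinfo"))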

The response to this POST command is a JSON transcript, sample1.json, that includes a lidinfo section, as in the following stereo audio example:

"lidinfo": { "0": { "conf": 1.0, "lang": "spa", "speech": 8.4499999999999993 }, "1": { "conf": 0.92000000000000004, "lang": spa", "speech": 0.98999999999999999 }, },

The lidinfo section is a global, top-level dictionary that contains the following fields. For multichannel audio, as in the stereo example above, these fields appear under a separate key for each channel:

  • conf — the confidence score of the language identification decision

  • lang — the three-letter language code specifying the language that was identified for the stream

  • speech — the number of seconds of automatically detected speech that were used to determine the language used in the stream
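
As a rough sketch of how these fields might be read from a transcript (assuming the channel-keyed lidinfo layout of the stereo example above):

    import json

    # Sketch: print per-channel LID results, assuming lidinfo is keyed
    # by channel number as in the stereo example.
    with open("sample1.json") as f:
        transcript = json.load(f)

    for channel, info in transcript["lidinfo"].items():
        print(
            f"channel {channel}: {info['lang']} "
            f"(confidence {info['conf']:.2f}, "
            f"{info['speech']:.2f}s of detected speech)"
        )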

The JSON transcript also includes the following fields:

  • model — as a top-level field that reports the language model that was specified by the model parameter. For example:

    "model": "eng1:survey"
  • model — as a field in the metadata dictionary of each element in the utterances array, identifying the language model that LID selected to transcribe that utterance. For example:

     "metadata": { "source": "spa-eng-sample.wav", "model": "eng1:callcenter", "uttid": 0, "channel": 0 }
  • langinfo — a per-language breakdown that is added when more than one language was detected. For example:

     "langinfo": { "spa": { "utts": 1, "speech": 17.46, "conf": 1.0, "time": 21.56 }, "eng": { "utts": 1, "speech": 1.35, "conf": 0.81, "time": 0.93 }
  • langfinal — Added to the lidinfo object when the LID language has been detected in the audio channel, but with a confidence lower than the lidthreshold value. In these cases, the default model (or the model specified by the model parameter) is used to transcribe the audio instead, and langfinal indicates the language of the model that was actually used; see the parsing sketch after this list. For example:

    "lidinfo": { "lang": "spa", "speech": 1.35, "langfinal": "eng", "conf": 0.81 }