lid

Values: false, true, language, language_model

Description:

The lid parameter enables the ASR engine's Language Identification (LID) module, which identifies the language spoken in the input audio and automatically selects an appropriate language model. To force the use of an alternate model with a different "domain" name, specify it using lid=language_model. An example request follows the list below.

Note: LID is only supported for English-Spanish and English-French language pairs.
  • lid=true - automatically selects the language identification model based on the LID and language models that are available.

  • lid=language - the alternative language to detect. The primary language is determined from the primary model, and an alternative language model for the specified language is selected automatically.

  • lid=language_model - the alternate language model to use.

  • lid=language_model,language_model - for dual-channel audio, this specifies the alternate language models to use for channel 0 and channel 1, respectively.

  • lid=LANG:notext - use this to skip decoding text from speech when the language LANG is detected. If this option is specified and LANG is detected, utterances in the request's JSON transcript output do not contain any word events or metadata. Specify only the base language when using this option; do not include the region or domain. For example, lid=spa:notext is valid, but lid=eng-us:callcenter:notext is not.

  • lid=language:info - use this to decode all audio using the primary model, but provide language identification information in the transcript.

  • lid=false - LID is not used.

Note: LID is only performed if the submitted audio has a sample rate of 8000 Hz.
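
For example, the lid tag is sent along with the audio in a transcription request. The following is a minimal sketch in Python using the requests library; the host, port, /transcribe endpoint, form-field names, and model name are assumptions and should be adjusted to match your deployment.

    import requests

    # Minimal sketch: submit 8 kHz audio for transcription with LID enabled.
    # The URL, form-field names, and model name below are assumptions; adjust
    # them to match your ASR engine's REST interface.
    URL = "http://asr-host:17171/transcribe"

    with open("call_8khz.wav", "rb") as audio:
        response = requests.post(
            URL,
            files={"file": ("call_8khz.wav", audio, "audio/wav")},
            data={
                "model": "eng-us:callcenter",  # primary language model (assumed name)
                "lid": "spa",                  # detect Spanish as the alternative language
            },
            timeout=300,
        )

    response.raise_for_status()
    print(response.json())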

The following parameters provide additional options when using the lid tag (a combined example follows the table):

Table 1. Additional LID Options

lidmaxtime
Values: integer (default is 20)
Description: Specifies the maximum audio duration, in seconds, to analyze. For example, if lidmaxtime=20, the ASR engine analyzes at most 20 seconds of audio. This tag has no effect when lidutt=true.

lidthreshold
Values: float between 0 and 1 (default is 0)
Description: Adjusts the confidence level required for the system to select the alternative language. Setting this option to a value greater than zero increases the preference for the default model.

lideffort
Values: float (default is 0.7)
Description: Specifies the confidence level that LID must reach before it stops analyzing audio.

lidoffset=N
Values: integer
Description: Delays the start of LID until N seconds into the audio. If there is not enough audio left after the offset, LID processes the preceding utterances in reverse. lidoffset=N behaves differently depending on whether lidutt is enabled:

  • If lidoffset=N and lidutt=false, utterances beginning before the specified offset are not sent to the LID engine unless there is not enough audio after the offset for the LID engine to make a conclusive decision. Note that when lidutt=false, the ASR engine cannot begin transcribing the audio until LID completes, so lidoffset=N with large values of N may significantly impact transcription performance.

  • If lidoffset=N and lidutt=true, utterances before the specified offset are never sent to the LID engine. In this case, large values of N have no impact on performance.

lidprior
Values: float between 0 and 1 (default is 0.5)
Description: Defines the prior probability of the alternative LID language being spoken. The default value of 0.5 indicates to the ASR engine that the alternative and primary languages are equally likely to be spoken.

lidutt
Values: true, false
Description: Runs LID on every utterance. By default, LID runs only once per stream or audio channel. This option is only available with V‑Blaze 7.1+.

In V‑Blaze 7.2+, the lid tag may be omitted when using lidutt. Prior to version 7.2, both parameters had to be specified to run per-utterance LID, for example, lid=spa1:callcenter lidutt=true. In version 7.2+, the same command can be expressed as lidutt=spa1:callcenter.

Note: This option has a significant performance impact and should only be used when necessary.
Note: Language identification stops analyzing audio once confidence exceeds the value specified in lidthreshold or once the analyzed audio exceeds the duration limit set in lidmaxtime.
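
The tuning tags above are passed in the same way as lid itself. Continuing the earlier sketch (same assumed endpoint and form-field names), only the form fields change; the values below are illustrative only.

    # Illustrative form fields combining lid with its tuning tags; pass this
    # dictionary as the data= argument of the request in the earlier sketch.
    data = {
        "model": "eng-us:callcenter",  # primary language model (assumed name)
        "lid": "spa",                  # alternative language to detect
        "lidmaxtime": 30,              # analyze at most 30 seconds of audio
        "lidthreshold": 0.2,           # require more confidence before selecting the alternative language
        "lidoffset": 10,               # delay LID until 10 seconds into the audio
        "lidprior": 0.3,               # prior probability that the alternative language is spoken
    }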

When LID scoring is below the decision threshold, the ASR engine will transcribe the audio with the language model specified by the model tag (or the default model for the ASR configuration if model is not explicitly provided). The results are indicated by a lidinfo.langfinal element in the JSON output.
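
The final language decision can then be read back from the transcript. The sketch below assumes the JSON response from the earlier request has been saved to transcript.json and that lidinfo appears at the top level of that output; adjust the lookup to match your engine's actual transcript structure.

    import json

    # Read the LID outcome from a saved JSON transcript. The location of the
    # lidinfo element within the transcript is an assumption.
    with open("transcript.json") as f:
        transcript = json.load(f)

    lidinfo = transcript.get("lidinfo", {})
    if "langfinal" in lidinfo:
        print("Final language decision:", lidinfo["langfinal"])
    else:
        print("No lidinfo.langfinal element found in this transcript.")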

Language identification is a licensed optional feature.

For additional information about using the lid tag, see:

Receiving language identification information