Music

When music detection is enabled, every utterance is passed through a classifier that labels it as music or non-music. Non-music utterances are processed normally. Music utterances are assumed to contain noisy audio and are not transcribed.

Important: Utterances classified as music are skipped by all optional transcription features, such as LID, GID, EID, and diarization.

Utterances that are not classified as music will not have a music field.
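Because the music field appears only on music utterances, client code should treat a missing field as non-music. The following is a minimal Python sketch under that assumption. The utterance list, its values, and the helper names is_music and split_by_music are hypothetical; only the music and musicinfo keys follow the documented output.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical utterances shaped like the documented output.
# Only the music utterance carries a "music" key.
utterances: List[Dict[str, Any]] = [
    {"start": 0.0, "end": 8.5, "music": True,
     "musicinfo": {"score": 0.5, "used": 8.44}},
    {"start": 8.5, "end": 12.0, "confidence": 0.91},
]


def is_music(utt: Dict[str, Any]) -> bool:
    """Return True only if the utterance is flagged as music.

    A missing "music" key means the utterance was not classified as music.
    """
    return utt.get("music", False) is True


def split_by_music(
    utts: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split utterances into (music, non-music) lists, preserving order."""
    music = [u for u in utts if is_music(u)]
    speech = [u for u in utts if not is_music(u)]
    return music, speech


music_utts, speech_utts = split_by_music(utterances)
print(len(music_utts), len(speech_utts))  # prints: 1 1
```

Using dict.get with a default keeps the check safe whether or not the key is present, so the same code works with music=true and music=info.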

Use the following parameters to enable and configure music detection.

Table 1. Music Detection

Name

Values

Description

music

true, info, false (default)

Set music=true to enable music detection. Each utterance is then classified as music or non-music, and utterances classified as music are not transcribed.

By default, utterance-level musicinfo fields are not included in the transcript. Set music=info to include them. Within each musicinfo dictionary, score is the utterance's music score (see musicthreshold), and used is the amount of utterance audio the music identifier used when classifying the utterance.

Music classification results are reported at the utterance level with a music field. For example, an utterance classified as music with music=info would show the following:

{
 "confidence": ...,
 "end": ...,
 "start": ...,
 "music": true,
 "musicinfo": {
   "score": 0.5,
   "used": 8.44
 },
 "events": ...
}

musicoffset

number, default 0.0

Time in seconds from the beginning of the stream before music identification begins. Utterances occurring before this offset are not passed through the music identifier, so they are always classified as non-music.

musicthreshold

number, default 0.3

Minimum music score required to classify an utterance as music. The music classifier assigns each utterance a score between -1 and 1; the sign gives the direction of the classification and the magnitude gives its confidence.

For example, a score of -0.9 indicates high confidence that the utterance is non-music, whereas -0.1 indicates low confidence that it is non-music. Likewise, a score of 0.9 indicates high confidence that the utterance is music, whereas 0.1 indicates low confidence that it is music. Every utterance with a music score greater than or equal to musicthreshold is classified as music.

musicminutt

number, default 1.0

Minimum utterance duration, in seconds, required to run music identification on an utterance. Utterances shorter than musicminutt are not passed through the music detector, so they are always classified as non-music.
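Taken together, musicoffset, musicminutt, and musicthreshold define a simple decision rule. The Python sketch below models that rule as a reading aid, not as the service's implementation. MusicConfig and classify are hypothetical names, the score argument stands in for the classifier output, and the sketch assumes an utterance "occurs before" the offset when its start time is earlier than musicoffset and that its duration is end minus start.

```python
from dataclasses import dataclass


@dataclass
class MusicConfig:
    # Defaults mirror the documented parameter defaults.
    musicoffset: float = 0.0     # seconds from stream start before detection begins
    musicthreshold: float = 0.3  # minimum score (range -1 to 1) to count as music
    musicminutt: float = 1.0     # minimum utterance duration in seconds


def classify(start: float, end: float, score: float, cfg: MusicConfig) -> bool:
    """Return True if the utterance would be classified as music.

    Assumption: "before the offset" means start < musicoffset.
    The score is ignored for utterances the detector would skip.
    """
    if start < cfg.musicoffset:
        return False  # not passed to the music identifier: non-music
    if end - start < cfg.musicminutt:
        return False  # too short to run music identification: non-music
    return score >= cfg.musicthreshold  # threshold is inclusive


cfg = MusicConfig()
print(classify(10.0, 18.5, 0.5, cfg))  # True: 0.5 >= 0.3
print(classify(10.0, 18.5, 0.1, cfg))  # False: below the threshold
print(classify(10.0, 10.6, 0.9, cfg))  # False: shorter than 1.0 s
print(classify(2.0, 9.0, 0.9, MusicConfig(musicoffset=5.0)))  # False: before offset
```

Both skip conditions produce the same non-music result, so the order in which the sketch checks them does not affect the outcome.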