The utterances array
The top-level utterances array is included in a JSON transcript when any text is decoded from audio. It is an array of objects, one for each utterance.
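Since the array is present only when text was decoded, client code may want to treat a missing key as "no utterances". The Python snippet below is a minimal sketch of that idea. The `get_utterances` helper and both sample transcripts are hypothetical and made up for illustration; only the top-level `utterances` key comes from the description above.

```python
import json


def get_utterances(transcript_json: str) -> list:
    """Return the top-level utterances array, or [] if it is absent."""
    transcript = json.loads(transcript_json)
    # The array is only included when text was decoded, so default to [].
    return transcript.get("utterances", [])


# Hypothetical transcripts, for illustration only.
with_speech = '{"utterances": [{"start": 0.5, "end": 2.1}, {"start": 3.0, "end": 4.4}]}'
without_speech = "{}"

print(len(get_utterances(with_speech)))     # 2
print(len(get_utterances(without_speech)))  # 0
```

Returning an empty list rather than `None` lets callers loop over the result without a separate presence check.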
An utterance is defined in this context as an uninterrupted chain of spoken language by a single speaker. An utterance ends when a period of silence exceeds a threshold duration or when the utterance exceeds the maximum utterance duration threshold. Each object in the utterances array may contain the elements in the following table:
Element | Availability | Type | Description |
---|---|---|---|
emotion | All | string | Emotional intelligence consists of both acoustic and linguistic information. Events can be given the following values: |
confidence | All | number | A measure of how confident the speech recognition system is in its utterance transcription results. |
end | All | number | End time of the utterance in seconds |
recvtz | All | array | An array containing two values: |
sentiment | All | string | Utterance-level linguistic sentiment value: Sentiment values are derived from the ratio of positive to negative classifications as determined by |
sentimentex | All | array | Contains detailed sentiment information for each utterance |
gender | All | string | Gender prediction of the speaker |
rawemotion | All | string | Acoustic emotion values (version 7.1+): Acoustic emotion values (prior to version 7.1): |
lidinfo | V‑Blaze version 7.1+ | array | The For example: |
start | All | number | Start time of the utterance in seconds |
donedate | All | string | Date and time the utterance transcription was completed by the speech-to-text engine |
recvdate | All | string | Date and time the utterance was received by the speech-to-text engine |
events | All | array | Contains information about individual words. Each element is a word object that contains the following values: For example: Objects in the events array may contain additional key-value pairs depending on the parameters specified with the transcription request. |
metadata | All | object | Speaker and information of the utterance. Each object contains the following values: For example: |
music | V‑Blaze version 7.3+ | boolean | Appears only if audio was processed using the stream tag |
musicinfo | V‑Blaze version 7.3+ | object | Appears only if audio was processed using the stream tag |
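
Because each utterance object may or may not contain a given element, it is usually safest to read fields defensively. The following Python sketch computes an utterance's duration from `start` and `end`, and reads `confidence` and `events` only when they are present. The `summarize` helper and the sample values are hypothetical. The word objects inside `events` are left empty on purpose, since this example does not assume anything about their keys.

```python
def summarize(utterance: dict) -> dict:
    """Summarize one utterance object, tolerating missing elements."""
    start = utterance.get("start")
    end = utterance.get("end")
    # Duration is only defined when both timestamps are present.
    if start is not None and end is not None:
        duration = round(end - start, 3)
    else:
        duration = None
    return {
        "duration_s": duration,
        "confidence": utterance.get("confidence"),
        # Each element of events describes one word, so its length is a word count.
        "word_count": len(utterance.get("events", [])),
    }


# Made-up utterance objects, for illustration only.
sample = [
    {"start": 0.5, "end": 2.25, "confidence": 0.75, "events": [{}, {}, {}]},
    {"start": 3.0, "end": 4.5},
]

for utt in sample:
    print(summarize(utt))
```

Using `dict.get` keeps the same code path working for elements that appear only in some versions or only with certain request parameters.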