The app_data object
The top-level app_data object contains information calculated from the entire transcript, rather than from individual utterances. It also includes metadata describing the audio's basic attributes and the score results from any applications configured for the folder that processed the audio and transcript.

The app_data object includes the elements in the following table. Note that some agent- or client-specific fields may not appear for single-channel audio that is not diarized.

The app_data object is generated by V‑Spark and does not appear in V‑Blaze or V‑Cloud JSON transcripts.
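For illustration, a minimal app_data fragment might look like the following. All field values here are invented placeholders, not output from a real transcript:

```json
"app_data": {
    "agent_channel": 0,
    "agent_clarity": "0.85",
    "agent_emotion": "Positive",
    "client_channel": 1,
    "client_clarity": "0.79",
    "client_emotion": "Positive",
    "duration": "00:05:23",
    "words": 712
}
```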
| Element | Type | Description |
|---|---|---|
| agent_channel | number | The audio channel with agent speech. |
| agent_clarity | string | How clear the speech on the agent channel is. Expressed as a range between 0 and 1, where 1 is clearest. |
| agent_emotion | string | Indicates overall agent emotion, calculated using both acoustic and linguistic information. Note: V‑Spark defaults to a value of Positive for agent_emotion and client_emotion in the absence of specific data from ASR results. |
| agent_gender | string | Gender prediction for the speaker classified as agent. |
| client_channel | number | The audio channel with client speech. |
| client_clarity | string | How clear the speech on the client channel is. Expressed as a range between 0 and 1, where 1 is clearest. |
| client_emotion | string | Indicates overall client emotion, calculated using both acoustic and linguistic information. Note: V‑Spark defaults to a value of Positive for agent_emotion and client_emotion in the absence of specific data from ASR results. |
| client_gender | string | Gender prediction for the speaker classified as client. |
| datetime | string | Transcript date and time expressed in Coordinated Universal Time (UTC). |
| diarization | number | The level of confidence the system has in its classification of agent and client for audio with two speakers on a single channel. Expressed as a range between 0 and 1, where 1 indicates the best speaker separation. |
| duration | string | The duration of the initial audio file, expressed in hours, minutes, and seconds using the format HH:MM:SS. |
| overall_emotion | string | Indicates the audio file's overall emotion, calculated using both acoustic and linguistic information. |
| overtalk | string | Percentage of the call in which the agent talks over or interrupts the client. Equal to the number of turns in which the agent initiated overtalk divided by the total number of agent turns. |
| | object | Contains any application scores that have been calculated for the transcript. |
| silence | string | Percentage of the overall duration that is silence. Equal to all non-speech time, calculated as call duration minus the sum of the duration of each word. If music and noise are not decoded as word events, they are counted as silence. |
| tId | number | The unique transcript identifier. |
| url | string | Location of the audio file associated with the transcript. |
| words | number | Total number of words in the transcript. |
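The overtalk and silence definitions above can be sketched as simple calculations. This is an illustrative reconstruction of the formulas as the table states them, not V‑Spark source code; the data shapes (a list of word durations, a list of agent turns with a hypothetical `initiated_overtalk` flag) are invented for the example:

```python
def silence_pct(call_duration: float, word_durations: list[float]) -> float:
    """Silence percentage: (call duration - total word duration) / call duration.

    Any time not covered by a decoded word event (including music or noise
    not decoded to words) counts as silence, per the table definition.
    """
    speech_time = sum(word_durations)
    return round(100.0 * (call_duration - speech_time) / call_duration, 1)


def overtalk_pct(agent_turns: list[dict]) -> float:
    """Overtalk percentage: agent turns that initiated overtalk / total agent turns."""
    initiated = sum(1 for turn in agent_turns if turn["initiated_overtalk"])
    return round(100.0 * initiated / len(agent_turns), 1)
```

For example, a 100-second call with 60 seconds of decoded words yields 40.0% silence, and an agent who initiated overtalk on 2 of 4 turns yields 50.0% overtalk.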