The app_data object

The top-level app_data object contains information calculated from the entire transcript rather than from individual utterances. It also includes metadata describing the audio's basic attributes, along with score results from any applications configured for the folder that processed the audio and transcript.

The app_data object includes the elements described in the following table. Note that some agent- and client-specific fields may not appear for single-channel audio that has not been diarized.

Important: The app_data object is generated by V‑Spark, and does not appear in V‑Blaze or V‑Cloud JSON transcripts.
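For reference, the following shows the general shape of an app_data object. Every element from Table 1 is included, although not all of them appear in every transcript. All values are illustrative placeholders rather than output from a real transcript, and the contents of scorecard are omitted:

    {
        "agent_channel": 1,
        "agent_clarity": "0.864",
        "agent_emotion": "Positive",
        "agent_gender": "Male",
        "client_channel": 0,
        "client_clarity": "0.791",
        "client_emotion": "Improving",
        "client_gender": "Female",
        "datetime": "2021-06-15 14:03:22 UTC",
        "diarization": 0.93,
        "duration": "00:05:43",
        "overall_emotion": "Positive",
        "overtalk": "4%",
        "scorecard": {},
        "silence": "12%",
        "tId": 12345,
        "url": "http://example.com/audio/12345.wav",
        "words": 734
    }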
Table 1. Elements in the app_data object

agent_channel (number)
    The audio channel that contains agent speech.

agent_clarity (string)
    How clear the speech on the agent channel is, expressed as a value between 0 and 1, where 1 is clearest.

agent_emotion (string)
    The overall agent emotion, calculated using both acoustic and linguistic information. One of the following values:
      • Positive
      • Improving
      • Negative
      • Worsening
    Note: V‑Spark defaults to a value of Positive for both agent_emotion and client_emotion in the absence of specific data from ASR results.

agent_gender (string)
    Gender prediction for the speaker classified as the agent.

client_channel (number)
    The audio channel that contains client speech.

client_clarity (string)
    How clear the speech on the client channel is, expressed as a value between 0 and 1, where 1 is clearest.

client_emotion (string)
    The overall client emotion, calculated using both acoustic and linguistic information. One of the following values:
      • Positive
      • Improving
      • Negative
      • Worsening
    Note: V‑Spark defaults to a value of Positive for both agent_emotion and client_emotion in the absence of specific data from ASR results.

client_gender (string)
    Gender prediction for the speaker classified as the client.

datetime (string)
    The transcript's date and time, expressed in Coordinated Universal Time (UTC).

diarization (number)
    The system's confidence in its classification of agent and client for audio with two speakers on a single channel, expressed as a value between 0 and 1, where 1 indicates the best speaker separation.

duration (string)
    The duration of the original audio file, expressed in hours, minutes, and seconds using the format HH:MM:SS.

overall_emotion (string)
    The audio file's overall emotion, calculated using both acoustic and linguistic information. One of the following values:
      • Positive
      • Improving
      • Negative
      • Worsening

overtalk (string)
    The percentage of the call during which the agent talks over or interrupts the client. Equal to the number of turns in which the agent initiated overtalk divided by the total number of agent turns (see the sketch after this table).

scorecard (object)
    Contains any application scores that have been calculated for the transcript.

silence (string)
    The percentage of the overall duration that is silence. Equal to all non-speech time, calculated as the call duration minus the sum of the durations of all words. If music and noise are not decoded as word events, they are counted as silence (see the sketch after this table).

tId (number)
    The unique transcript ID used to reference a particular transcript.

url (string)
    The location of the audio file associated with the transcript.

words (number)
    The total number of words in the transcript.
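
The overtalk and silence values are ratios derived from turn and word timings. The sketch below illustrates the two formulas given in the table, assuming a hypothetical Turn and Word representation of a diarized transcript; the names and data structures are assumptions for illustration, not V‑Spark's implementation.

    from dataclasses import dataclass

    @dataclass
    class Word:
        start: float  # word start time, in seconds from the start of the call
        end: float    # word end time

    @dataclass
    class Turn:
        speaker: str  # "agent" or "client" (hypothetical labels)
        start: float
        end: float
        words: list   # list of Word

    def overtalk_ratio(turns):
        # overtalk: agent turns that begin while a client turn is still in
        # progress, divided by the total number of agent turns.
        agent = [t for t in turns if t.speaker == "agent"]
        client = [t for t in turns if t.speaker == "client"]
        if not agent:
            return 0.0
        interruptions = sum(
            1 for a in agent if any(c.start < a.start < c.end for c in client)
        )
        return interruptions / len(agent)

    def silence_ratio(call_duration, turns):
        # silence: call duration minus the summed duration of every decoded
        # word, expressed as a fraction of the call. Music and noise that are
        # not decoded as word events fall into this remainder.
        spoken = sum(w.end - w.start for t in turns for w in t.words)
        return max(0.0, call_duration - spoken) / call_duration

For example, in a 6-second call where the agent's only turn begins while the client is still speaking:

    turns = [
        Turn("client", 0.0, 4.0, [Word(0.0, 0.4), Word(0.5, 1.0), Word(1.2, 3.8)]),
        Turn("agent", 3.5, 6.0, [Word(3.5, 4.0), Word(4.2, 5.8)]),
    ]
    print(f"{overtalk_ratio(turns):.0%}")      # 100%: the only agent turn interrupts
    print(f"{silence_ratio(6.0, turns):.0%}")  # 7%: 0.4 s of 6.0 s has no words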