Speech intelligence

Table 1. Speech intelligence features
| Feature | Real-time/Post Call | Local/Cloud | Languages | Description |
|---|---|---|---|---|
| Emotion | Both | Both | All | Classifies emotion based on combined acoustic features and word sentiment scores. |
| Emotional intelligence | Both | Both | English only | Voci's emotion detection feature combines acoustic features and word sentiment scores to determine whether a given utterance is Positive, Improving, Neutral, Worsening, or Negative. |
| Detailed sentiment scoring | Both | Both | English only | Classifies sentiment based on word usage at the call and utterance level. Custom sentiment rules can also be applied. |
| Confidence scores | Both | Both | All | Scores words, utterances, and calls for the system's confidence in the transcription results. |
| Speaker turns | Both | Both | All | The number of distinct speaker turns detected in the audio. |
| Speaker time | Both | Both | All | The total audio time, in seconds, during which words were detected, along with the percentage of total audio time during which words were detected. |
| Total word counts | Both | Both | All | The total number of words spoken in the transcribed audio file. |
| Language Identification (LID) | Both | Both | English, Spanish, French | If a LID-supported language is detected, the ASR engine switches to the model for the detected language. |
| Gender identification | Both | Both | All | Identifies speakers as male or female. |
| Agent identification | Both | Both | English only | Identifies which channel is the agent versus the customer. |
| Music detection | Both | Both | All | Acoustic-based classification model that identifies when music occurs. Each utterance is scored from -1 to +1, reflecting the likelihood that it is music. Music utterances are not transcribed. |
| Overtalk | Both | Both | All | Overtalk occurs when speakers talk over one another. A recording's overtalk percentage is the count of Agent-initiated overtalk turns as a percentage of the total number of Agent-speaking turns. In other words, it measures how many of the Agent's turns interrupted a Client's turn. |
| Silence | Both | Both | All | An utterance is an uninterrupted chain of spoken language by a single speaker. An utterance ends when a period of silence exceeds a threshold duration or when the utterance exceeds the maximum utterance duration threshold. |
| Text information and word counts | Both | Both | All | The total number of words is provided for each call, depending on parameters. Other counts include: the number of seconds and average audio time spent on speech, overtalk (including number of occurrences), and silence; the number of speaker turns; and the number of substitutions, when enabled. |
| Credit card detection | Both | Both | | Identifies which numbers are likely credit cards (n-16 digits) by adding a tag to the transcript metadata file (even if the number was redacted). Luhn numbers are not redacted when detected, and there is no "scrub only Luhn numbers" functionality. |
| Speaker turns | Both | Both | All | Adds the number of distinct speaker turns in the audio, for stereo or diarized audio only. |
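
Two of the metrics in Table 1 are simple enough to illustrate in a few lines. The Python sketch below is illustrative only and is not Voci's implementation. It computes an overtalk percentage from turn counts as the Overtalk row defines it, and applies the standard Luhn checksum that the credit card row refers to. The function names and the choice to return 0.0 when there are no Agent turns are assumptions made for this example.

```python
def overtalk_percentage(agent_turns: int, agent_overtalk_turns: int) -> float:
    """Agent-initiated overtalk turns as a percentage of all Agent turns.

    Hypothetical helper: the inputs are plain counts, not fields from any
    particular transcript format.
    """
    if agent_turns == 0:
        # Assumption: report 0.0 when the Agent never spoke.
        return 0.0
    return 100.0 * agent_overtalk_turns / agent_turns


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and dashes are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(overtalk_percentage(agent_turns=20, agent_overtalk_turns=5))  # 25.0
    print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
```

A number that passes the Luhn check is only a candidate card number. The table says such numbers are tagged in the transcript metadata but does not describe how detection is performed internally.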