Speech intelligence
Feature | Real-time / Post-call | Local / V‑Cloud | Languages | Description |
---|---|---|---|---|
Emotion | Both | Both | All | Classifies emotion based on combined acoustic features and word sentiment scores. |
Emotional intelligence | Both | Both | English only | Voci's emotion detection feature uses a synthesis of acoustic features and word sentiment scores to determine if a given utterance is Positive, Improving, Neutral, Worsening, or Negative. |
Detailed sentiment scoring | Both | Both | English only | Classifies sentiment based on word usage at the call and utterance level. Custom sentiment rules can also be applied. |
Confidence scores | Both | Both | All | Scores words, utterances, and calls for the system's confidence in the transcription results. |
Speaker turns | Both | Both | All | The number of distinct speaker turns detected in the audio. |
Speaker time | Both | Both | All | The total audio time, in seconds, during which words were detected, along with that time as a percentage of the total audio time. |
Total word counts | Both | Both | All | The total number of words spoken in the transcribed audio file. |
Language Identification (LID) | Both | Both | English, Spanish, French | If a LID-supported language is detected, the ASR engine switches to the corresponding model for the detected language. |
Gender identification | Both | Both | All | Identifies speakers as male or female. |
Agent identification | Both | Both | English only | Identifies which channel is the agent versus the customer. |
Music detection | Both | Both | All | Acoustic-based classification model that identifies when music occurs. Each utterance is scored -1 to +1, corresponding to the probability that it is music. Music utterances are not transcribed. |
Overtalk | Both | Both | All | Overtalk occurs when speakers talk over one another. A recording's overtalk percentage is the count of Agent-initiated overtalk turns as a percentage of the total number of Agent-speaking turns. In other words, out of all of the Agent's turns, it measures how many turns interrupted a Client's turn. |
Silence | Both | Both | All | An utterance is an uninterrupted chain of spoken language by a single speaker. An utterance ends with a period of silence that exceeds a threshold duration or that exceeds the maximum utterance duration threshold. |
Text information and word counts | Both | Both | All | The total number of words is provided for each call depending on parameters. Other counts included are: |
Credit card detection | Both | Both | | Identifies which numbers are likely credit cards (n-16 digits) by adding a tag to the transcript metadata file (even if the number was redacted). Luhn numbers are not redacted when detected, and there is no Luhn-specific redaction functionality. |
Speaker turns | Both | Both | All | Adds the number of distinct speaker turns in the audio, for stereo or diarized audio only. |
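
The overtalk percentage described in the table can be illustrated with a small calculation. The following is a minimal sketch, not Voci's implementation: the `Turn` record, the `"agent"` and `"client"` labels, and the rule that an Agent turn counts as overtalk when it starts while a Client turn is still in progress are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical speaker-turn record; not a Voci data structure."""
    speaker: str  # assumed labels: "agent" or "client"
    start: float  # turn start time in seconds
    end: float    # turn end time in seconds


def overtalk_percentage(turns):
    """Agent-initiated overtalk turns as a percentage of all Agent turns.

    Assumption: an Agent turn is "agent-initiated overtalk" when it starts
    strictly inside a Client turn (the Agent begins speaking before the
    Client has finished).
    """
    agent_turns = [t for t in turns if t.speaker == "agent"]
    client_turns = [t for t in turns if t.speaker == "client"]
    if not agent_turns:
        return 0.0  # no Agent turns, so nothing to measure
    interrupting = sum(
        1
        for a in agent_turns
        if any(c.start < a.start < c.end for c in client_turns)
    )
    return 100.0 * interrupting / len(agent_turns)


example_turns = [
    Turn("client", 0.0, 5.0),
    Turn("agent", 4.0, 8.0),    # starts during the first Client turn
    Turn("client", 9.0, 12.0),
    Turn("agent", 11.0, 14.0),  # starts during the second Client turn
    Turn("client", 15.0, 18.0),
    Turn("agent", 18.5, 20.0),  # starts after the Client has finished
    Turn("agent", 21.0, 23.0),  # no Client turn in progress
]
print(overtalk_percentage(example_turns))
```

In this example two of the four Agent turns begin while the Client is still speaking, so the result is 50.0. Client-initiated interruptions are deliberately ignored, matching the table's definition, which counts only Agent turns.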