Single-channel (mono) and channel-separated audio
It is important to distinguish between single-channel (mono) and channel-separated audio.
In mono audio, all speakers are recorded on a single channel.
In channel-separated audio, each speaker is isolated to a distinct channel.
Channel-separated audio makes it possible to transcribe each channel independently and maintain a perfect correspondence between the person speaking and the words spoken. For analytic purposes, it is important to have each speaker on a separate channel.
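As an illustration of what isolating each speaker to a distinct channel means at the sample level, the following minimal sketch de-interleaves raw 16-bit stereo PCM data into two mono streams. The function name `split_stereo_pcm16` and the assumption of 16-bit, two-channel, interleaved samples are illustrative only and do not describe any specific product's audio pipeline.

```python
import array


def split_stereo_pcm16(frames: bytes) -> tuple[bytes, bytes]:
    """De-interleave 16-bit stereo PCM (L R L R ...) into two mono byte strings.

    Assumption: samples are signed 16-bit integers in native byte order,
    with the left and right channels interleaved sample by sample.
    """
    samples = array.array("h")
    samples.frombytes(frames)  # raises ValueError on a partial sample
    if len(samples) % 2:
        raise ValueError("stereo data must contain an even number of samples")
    left = samples[0::2]   # every even-indexed sample belongs to channel 0
    right = samples[1::2]  # every odd-indexed sample belongs to channel 1
    return left.tobytes(), right.tobytes()
```

Each returned byte string can then be handled as an independent mono recording, for example one for the agent and one for the caller.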
For example, channel-separated audio not only decouples overtalk from overall accuracy but also allows the overtalk in calls to be measured objectively. In single-channel (mono) audio, by contrast, the greater the overtalk, the lower the overall accuracy will be.
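One way to picture an objective overtalk measurement is to take the speech intervals detected on each channel and add up the time during which both channels are active. The sketch below assumes that each channel's speech is already available as a list of (start, end) times in seconds, sorted and non-overlapping within a channel; the function name `overtalk_seconds` and this input format are hypothetical.

```python
def overtalk_seconds(agent, caller):
    """Return the total time, in seconds, during which both channels contain speech.

    Assumption: each argument is a list of (start, end) tuples in seconds,
    sorted by start time, with no overlaps inside a single channel.
    """
    total = 0.0
    i = j = 0
    while i < len(agent) and j < len(caller):
        a_start, a_end = agent[i]
        c_start, c_end = caller[j]
        # Length of the intersection of the two intervals (negative if disjoint).
        overlap = min(a_end, c_end) - max(a_start, c_start)
        if overlap > 0:
            total += overlap
        # Advance whichever interval finishes first.
        if a_end <= c_end:
            i += 1
        else:
            j += 1
    return total


agent_speech = [(0.0, 2.0), (5.0, 8.0)]
caller_speech = [(1.5, 5.5), (7.0, 9.0)]
print(overtalk_seconds(agent_speech, caller_speech))
```

Dividing the result by the call duration would give an overtalk percentage. No equivalent calculation is possible on mono audio, where simultaneous speech is mixed into one signal.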
For mono audio, Voci employs a process called diarization to separate the speakers into their own channels. Diarization is less effective when the source audio includes hold music, voice recordings, or more than two speakers. Overtalk may also reduce diarization accuracy. However, for typical agent and caller situations with only two speakers, diarization is very effective at separating the speakers into their own channels for enhanced analytics.
Types of Errors from Mono Transcripts
The following list describes four types of errors that may occur when transcribing mono audio with diarization applied.
- Overtalk Word Error
Overtalk word errors are caused by two people speaking at the same time. This creates an unintelligible audio region that cannot be transcribed reliably. Overtalk negatively impacts accuracy whether or not diarization is used.
- Diarization Word Error
Diarization word errors are caused by the diarizer splitting channels within a word instead of between words. When this incorrect split occurs, each word fragment is transcribed independently, resulting in transcription errors.
- Diarization Side Error
Diarization side errors occur when the diarizer makes an incorrect assignment and places speech on the wrong channel.
- Side Classification Error
Side classification errors are caused by failures to correctly identify the side of the conversation containing the majority of the contact center agent’s speech.
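The last two error types can be made concrete with a small scoring sketch. It assumes a two-speaker call in which each diarized word carries the channel it was assigned to and a hand-labelled reference speaker ("agent" or "caller"). The function names and this input format are hypothetical and are not taken from any Voci tool.

```python
from collections import Counter


def agent_side(words):
    """Return the channel that holds the majority of the agent's words.

    Assumption: words is a list of (channel, speaker) pairs, where speaker
    is a reference label, either "agent" or "caller".
    """
    counts = Counter(channel for channel, speaker in words if speaker == "agent")
    if not counts:
        raise ValueError("no agent words in input")
    return counts.most_common(1)[0][0]


def side_error_count(words):
    """Count words placed on the wrong channel (two speakers, two channels)."""
    agent_channel = agent_side(words)
    # A word is on the wrong side when "spoken by the agent" and
    # "placed on the agent channel" disagree.
    return sum(
        1
        for channel, speaker in words
        if (speaker == "agent") != (channel == agent_channel)
    )


words = [
    (0, "agent"), (0, "agent"), (1, "agent"),
    (1, "caller"), (1, "caller"), (0, "caller"),
]
print(agent_side(words), side_error_count(words))
```

In this sketch, a side classification error would correspond to a system reporting a channel other than the one returned by `agent_side` as the agent's side.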
Types of Errors from Stereo Transcripts
Transcripts of channel-separated stereo audio contain only a single type of error: the incorrect transcription of an audio region. This is referred to as word error.
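Word error is commonly quantified as a word error rate: the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the system's transcript, divided by the number of reference words. The following is a minimal sketch of that general metric, not necessarily the scoring procedure used by any particular vendor; the function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("thank you for calling support",
                      "thank you for calling the sport"))
```

With channel-separated audio, this calculation can be run separately on each channel's transcript, giving one figure for the agent and one for the caller.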