Single-channel (mono) and channel-separated audio

It is important to distinguish between single-channel (mono) and channel-separated audio.

  • In mono audio, all speakers are recorded on a single channel.

  • In channel-separated audio, each speaker is isolated to a distinct channel.

Because each channel can be transcribed independently, channel-separated audio preserves an exact correspondence between the person speaking and the words spoken. This reliable speaker attribution is why having each speaker on a separate channel is important for analytics.
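As an illustration of treating each channel independently, the following minimal Python sketch splits interleaved two-channel PCM data into two mono streams using only the standard-library wave module. The function names and file paths are hypothetical and are not part of any Voci tooling. The sketch assumes an uncompressed PCM WAV file with exactly two channels.

```python
import wave
from typing import Tuple


def deinterleave_stereo(frames: bytes, sample_width: int) -> Tuple[bytes, bytes]:
    """Split interleaved two-channel PCM bytes into (first, second) channel bytes."""
    frame_size = 2 * sample_width  # one sample per channel in each frame
    if len(frames) % frame_size != 0:
        raise ValueError("data does not contain a whole number of stereo frames")
    first = bytearray()
    second = bytearray()
    for k in range(0, len(frames), frame_size):
        first += frames[k:k + sample_width]
        second += frames[k + sample_width:k + frame_size]
    return bytes(first), bytes(second)


def split_stereo_wav(src_path: str, first_path: str, second_path: str) -> None:
    """Write each channel of a two-channel WAV file to its own mono WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel file")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    channels = deinterleave_stereo(frames, width)
    for out_path, data in zip((first_path, second_path), channels):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

Which channel carries the agent and which carries the caller depends on the recording setup, so the sketch deliberately labels them only as first and second.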

For example, channel-separated audio decouples overtalk from overall accuracy and also allows the overtalk in calls to be measured objectively. In single-channel (mono) audio, by contrast, the greater the overtalk, the lower the overall accuracy will be.
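One way to picture an objective overtalk measurement: if each channel has been reduced to a list of speech segments (for example, by a voice activity detector, which is assumed here and not shown), overtalk is the total time during which both channels contain speech. The following sketch is illustrative only. The function name and the segment times are invented for the example.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def overtalk_seconds(agent: List[Segment], caller: List[Segment]) -> float:
    """Return the total time during which both channels contain speech.

    Assumes each list is sorted by start time and that segments within
    a single list do not overlap one another.
    """
    total = 0.0
    i = j = 0
    while i < len(agent) and j < len(caller):
        start = max(agent[i][0], caller[j][0])
        end = min(agent[i][1], caller[j][1])
        if end > start:
            total += end - start
        # Advance whichever segment finishes first.
        if agent[i][1] < caller[j][1]:
            i += 1
        else:
            j += 1
    return total


# Hypothetical speech segments for a short call.
agent_speech = [(0.0, 4.0), (6.0, 9.0)]
caller_speech = [(3.5, 6.5), (8.0, 10.0)]
print(overtalk_seconds(agent_speech, caller_speech))  # 2.0
```

With mono audio the same regions would have to be inferred from a single mixed signal, which is why the measurement is described as objective only for channel-separated audio.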

For mono audio, Voci employs a process called diarization to assign each speaker to a separate channel. Diarization is less effective when the source audio includes hold music, voice recordings, or more than two speakers. Overtalk may also reduce diarization accuracy. However, in typical agent and caller situations with only two speakers, diarization is very effective at separating speakers into their own channels for enhanced analytics.

Note: Recording channel-separated source audio instead of mono typically yields a 10% increase in accuracy. Voci highly recommends using channel-separated audio for transcription.

Types of Errors from Mono Transcripts

The following list describes four types of errors that may occur when transcribing mono audio with diarization applied.

Overtalk Word Error

Overtalk word errors are caused by two people speaking at the same time. This creates an unintelligible audio region that cannot be transcribed reliably. Overtalk negatively impacts accuracy whether or not diarization is used.

Diarization Word Error

Diarization word errors are caused by the diarizer placing a channel split within a word instead of between words. When this happens, each word fragment is transcribed independently, which produces errors.
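To make the idea of a split falling within a word concrete, the following sketch checks a list of split times against word time intervals and reports any split that lands strictly inside a word. It is a hypothetical illustration: the word timings, the split times, and the function name are invented, and a real diarizer does not necessarily expose its output in this form.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def splits_inside_words(words: List[Word],
                        split_points: List[float]) -> List[Tuple[float, str]]:
    """Return (split_time, word) for every split that falls strictly inside a word."""
    hits = []
    for t in split_points:
        for text, start, end in words:
            if start < t < end:
                hits.append((t, text))
                break  # a split can fall inside at most one word
    return hits


# Hypothetical word timings and split times.
words = [
    ("thank", 0.00, 0.30),
    ("you", 0.35, 0.50),
    ("for", 0.60, 0.75),
    ("calling", 0.80, 1.30),
]
splits = [0.55, 1.05]
print(splits_inside_words(words, splits))  # [(1.05, 'calling')]
```

The split at 0.55 seconds falls in the gap between "you" and "for" and is harmless, while the split at 1.05 seconds cuts "calling" into two fragments that would each be transcribed separately.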

Diarization Side Error

Diarization side errors occur when the diarizer makes an incorrect assignment and places speech on the wrong channel.

Side Classification Error

Side classification errors occur when the system fails to correctly identify which side of the conversation contains the majority of the contact center agent’s speech.

Note: The errors mentioned above do not apply to channel-separated audio.

Types of Errors from Stereo Transcripts

Transcripts of channel-separated stereo audio contain only one type of error: incorrect transcription of an audio region. This is referred to as word error.
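A common way to quantify word error is the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The following sketch implements that standard definition. It is offered as an assumption about how word error could be scored, not as a description of how Voci measures accuracy, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("please hold the line", "please hold a line"))  # 0.25
```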