Adjusting for audio

Table 1. Adjusting for different types of audio input

Name

Values

Description

datahdr

WAVE (Default for files with .wav extension), NONE

In V‑Blaze version 7.2+, datahdr defaults to WAVE when not specified otherwise.

Set datahdr=WAVE when audio contains a RIFF header that specifies the audio sampling rate, sampling width, and encoding. Files with the .wav extension typically have such a header, although this is not guaranteed.

Set datahdr=NONE when audio does not contain a header. When using headerless audio, provide specific values for at least samprate, sampwidth, and encoding to help the ASR engine process the audio correctly.

Note: MP3 files do not work with V‑Blaze.
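
Because a .wav extension does not guarantee a header, it can help to inspect the first bytes of a file before choosing a datahdr value. The following is a minimal sketch, not part of the product: the function name detect_datahdr is hypothetical, and the check relies only on the standard RIFF layout (the ASCII tag "RIFF", a 4-byte size, then the ASCII tag "WAVE").

```python
import struct


def detect_datahdr(first_bytes: bytes) -> str:
    """Suggest a datahdr value from the first bytes of an audio file.

    Hypothetical helper: returns "WAVE" when the bytes begin with a
    RIFF/WAVE header, otherwise "NONE".
    """
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte little-endian
    # chunk size, and then b"WAVE".
    has_riff = len(first_bytes) >= 12 and first_bytes[0:4] == b"RIFF"
    if has_riff and first_bytes[8:12] == b"WAVE":
        return "WAVE"
    return "NONE"


# Exercise the check with a synthetic 12-byte prefix and with raw zeros.
riff_prefix = b"RIFF" + struct.pack("<I", 36) + b"WAVE"
print(detect_datahdr(riff_prefix))   # WAVE
print(detect_datahdr(b"\x00" * 12))  # NONE
```

In practice the bytes would come from reading the first 12 bytes of the audio file in binary mode.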

diarize

false (default), true, noise

Diarization is the process of recognizing distinct speakers on a single (mono) audio channel and segmenting detected speech into separate channels, which are identified in JSON output. Voci’s diarization capability is designed to do this for two speakers, typically a call agent engaged in a conversation with a client over the phone.

You should only set diarize to true under the following conditions:

  • You know that your audio only contains a single audio channel.

  • You know that 2 people are talking on the channel.

  • Segregation of 2 speakers in the transcripts is important for your use case.

Enabling diarize will include the following fields in JSON output:

  • diascore — Indicates the system's level of confidence that it correctly classified detected speech into individual channels. The confidence level is expressed as a value between 0 and 1, where 1 indicates the best speaker separation. Refer to Confidence scores for more information on the confidence scoring system.

  • chaninfo — Provides additional information specific to each channel. chaninfo only appears for stereo or diarized audio. Refer to Top-level elements for more information.

The noise setting is typically not needed. However, if you are experiencing excessive diarization errors due to interference from non-speech sources, you can apply noise reduction by setting diarize=noise.

Tip: Music identification is recommended instead of diarize=noise for noise filtration.

Note: Redaction accuracy is marginally reduced when redaction is used in combination with diarize. For maximum redaction accuracy, avoid diarization when using any of the redaction options.

Diarization is a licensed optional feature.
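
The three conditions listed above for enabling diarization can be expressed as a simple check. This is an illustrative sketch only: the function choose_diarize and its argument names are hypothetical, and it does not cover the diarize=noise setting.

```python
def choose_diarize(num_channels: int, num_speakers: int,
                   need_speaker_separation: bool) -> str:
    """Return a suggested diarize value ("true" or "false").

    Hypothetical helper mirroring the documented conditions:
    a single audio channel, two speakers, and a use case that
    needs the speakers separated in the transcript.
    """
    if num_channels == 1 and num_speakers == 2 and need_speaker_separation:
        return "true"
    return "false"


print(choose_diarize(1, 2, True))  # true
print(choose_diarize(2, 2, True))  # false
```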

encoding

SPCM, UPCM, ULAW, ALAW

Specifies the algorithm used to encode the audio. Encoding must be supplied when raw or headerless audio is being transcribed.

Refer to encoding for more information on this parameter.

endian

LITTLE (default), BIG

Specifies the byte ordering of audio samples. In a BIG endian data word, the most significant byte comes first when reading from left to right. In a LITTLE endian data word, the least significant byte comes first. LITTLE endian (the default) is the more common ordering.

This parameter is not required unless your audio uses BIG endian byte ordering.
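
To make the two byte orderings concrete, the sketch below encodes the same 16-bit sample both ways with Python's standard struct module. The function name encode_sample_16 is hypothetical and exists only for this illustration.

```python
import struct


def encode_sample_16(value: int, endian: str) -> bytes:
    """Pack one signed 16-bit sample using the given byte ordering."""
    if endian not in ("LITTLE", "BIG"):
        raise ValueError("endian must be LITTLE or BIG")
    # "<" selects little endian and ">" selects big endian in struct.
    fmt = "<h" if endian == "LITTLE" else ">h"
    return struct.pack(fmt, value)


sample = 0x1234
print(encode_sample_16(sample, "LITTLE").hex())  # 3412 (least significant byte first)
print(encode_sample_16(sample, "BIG").hex())     # 1234 (most significant byte first)
```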

nchannels

integer

Specifies the number of audio channels. Required for real-time decoding when the audio has no data header.

resample [INTERNAL ONLY]

true (default), false, integer

When resample=true, all files with sample rates over 8000 Hz are resampled to 8000 Hz. Set resample=false to disable resampling.

Set resample to an integer to resample to a given sample rate.

samprate

integer

Specifies the sampling rate of the audio to be transcribed. Telephone audio is typically sampled at 8000 Hz. For best results, the sampling rate should be a multiple of 8000 (e.g., 8000, 16000, 24000, etc.). Values less than 8000 are not supported. The sampling rate must be supplied when raw or headerless audio is being transcribed.
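
The sampling rate rules above can be checked before audio is submitted. The sketch below is a hypothetical pre-flight check: the function name check_samprate and the three labels it returns are invented for this example and are not product values.

```python
def check_samprate(samprate: int) -> str:
    """Classify a sampling rate against the documented guidance.

    Returns one of three hypothetical labels:
    "unsupported" for rates below 8000 Hz,
    "recommended" for multiples of 8000 Hz,
    "supported" for any other rate of 8000 Hz or more.
    """
    if samprate < 8000:
        return "unsupported"
    if samprate % 8000 == 0:
        return "recommended"
    return "supported"


for rate in (6000, 8000, 16000, 44100):
    print(rate, check_samprate(rate))
```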

sampwidth

integer

Specifies the size of each digitized audio sample in bytes. This parameter is only applicable when raw or headerless audio is being transcribed and the encoding parameter is set to either SPCM or UPCM, in which case it must be supplied.
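
The requirements for headerless audio described in this table can be gathered into one parameter set. The following is a sketch under stated assumptions: headerless_params is a hypothetical helper, it only builds and validates a dictionary of the parameter names documented here, and how those parameters are submitted with a request is outside its scope.

```python
PCM_ENCODINGS = {"SPCM", "UPCM"}
ALL_ENCODINGS = PCM_ENCODINGS | {"ULAW", "ALAW"}


def headerless_params(encoding, samprate, sampwidth=None,
                      nchannels=None, endian="LITTLE"):
    """Build parameters for raw or headerless audio (hypothetical helper)."""
    if encoding not in ALL_ENCODINGS:
        raise ValueError("encoding must be SPCM, UPCM, ULAW, or ALAW")
    if samprate < 8000:
        raise ValueError("sampling rates below 8000 are not supported")

    params = {
        "datahdr": "NONE",
        "encoding": encoding,
        "samprate": str(samprate),
    }
    # sampwidth applies only to SPCM and UPCM, where it must be supplied.
    if encoding in PCM_ENCODINGS:
        if sampwidth is None:
            raise ValueError("sampwidth is required for SPCM and UPCM")
        params["sampwidth"] = str(sampwidth)
    # nchannels is needed for real-time decoding without a data header.
    if nchannels is not None:
        params["nchannels"] = str(nchannels)
    # endian only needs to be sent when the audio is BIG endian.
    if endian == "BIG":
        params["endian"] = "BIG"
    return params


print(headerless_params("SPCM", 8000, sampwidth=2))
print(headerless_params("ULAW", 8000, nchannels=1))
```

Omitting sampwidth for ULAW and ALAW follows the statement that the parameter is only applicable to SPCM and UPCM encodings.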