Platform features

Table 1. Voci ASR platform features

Out-of-vocabulary (OOV) | | | Support is rolling out to all languages; model versions 7.5 and later support OOV | OOV (out-of-vocabulary) is an ASR tuning feature designed to improve transcription accuracy for audio that contains brand- and industry-specific terminology. OOV extends existing language models with new words and gives those words preferential treatment.
Voice activity detection | Both | Both | All | Algorithm that distinguishes human voice from noise and silence, so that only the parts of the audio recognized as speech are sent to the ASR. Configurable for a given use case.
Auto punctuation | Both | Both | All | Adds punctuation and capitalization. Fully punctuated transcripts improve speech analysis by making the caller's intended meaning easier to interpret.
Number translations | Both | Both | All | Controls whether certain words in transcribed text are converted into numeric digits and related conventional formats, including dollar amounts, wall-clock times, percentages, ordinals, web addresses, and telephone numbers. For example, with numtrans set to true (the default), the words “forty two percent” are transformed into the text “42%”.
Transcoding | Post-call | V‑Cloud only | All | Determines whether V‑Cloud uses its built-in decoders to convert incoming audio into a supported format when necessary.
Output formatting | Both | Both | All | Allows a customer to specify the transcript delivery format. The following outputs are supported: json (default), jsontop, text, and noutts.
Callbacks | Both | Both | All | Callbacks let another application receive and directly interact with the produced transcripts, which allows automated production workflows for speech transcription.
Text redaction | Both | Both | All | Redacts numbers from a transcript. Automated numeric redaction reduces PCI/PII risk by finding and eliminating credit card and other sensitive numbers from audio and text.
Audio redaction | Both | Both | All | Replaces sensitive segments of an audio file with silence. Automated redaction reduces PCI/PII risk by finding and eliminating credit card and other sensitive numbers from audio and text.
Speaker separation (diarization) | Post-call | Both | All | Automatically separates customer and agent voices when both are recorded on one channel, so that their utterances can be analyzed independently. This is referred to as diarization.
Global language coverage | Both | Both | All | Voci supports 30+ languages, accents, and domains.
REST API | Both | Both | All | Voci provides several different APIs for our products:
Direct-to-Transcript audio connector | Both | Both | All | Direct-to-Transcript (DtT) technology integrates directly into call center telephony systems, capturing and transcribing call audio immediately while also capturing the associated metadata. This process is critical for deep analytics.
Protocol support | Both | Both | All | HTTP/HTTPS, WebSockets, MRCP/UniMRCP, AudioCodes, SIP
Platform integrations and connectors | Both | Both | All | Five9, 8x8, Calabrio, Genesys, Verint, AWS Connect, and others
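
Several of the features in Table 1 are selected per request. As a rough sketch, the Python below assembles a set of request options and checks the output format against the values listed in the table. Only numtrans and the four output format names come from the table; the "output" key, the build_options helper, and the string encoding of the boolean are assumptions made for illustration, not confirmed API parameters.

```python
# Output formats listed in Table 1; "json" is the documented default.
SUPPORTED_OUTPUTS = ("json", "jsontop", "text", "noutts")


def build_options(output="json", numtrans=True):
    """Return a dict of request options (hypothetical helper, not a Voci API).

    The "output" key name and the "true"/"false" string encoding are
    assumptions; check the API reference for the real parameter names.
    """
    if output not in SUPPORTED_OUTPUTS:
        raise ValueError(
            f"unsupported output format {output!r}; "
            f"expected one of {', '.join(SUPPORTED_OUTPUTS)}"
        )
    return {
        "output": output,
        # numtrans defaults to true according to Table 1.
        "numtrans": "true" if numtrans else "false",
    }


if __name__ == "__main__":
    print(build_options())
    print(build_options(output="text", numtrans=False))
```

Validating the format name before sending a request surfaces a typo such as "jason" locally instead of as a server-side error.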
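
To make the number translation example in Table 1 concrete, here is a toy Python sketch that rewrites spelled-out percentages below one hundred as digits. It is not how the ASR engine implements numtrans; the function names and word tables are hypothetical, and the sketch covers only phrases of the “forty two percent” kind, with no punctuation handling.

```python
# Word tables for numbers below 100 (illustrative only).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_int(words):
    """Convert ["forty", "two"] to 42; return None if not a number under 100."""
    if len(words) == 1:
        word = words[0]
        if word in UNITS:
            return UNITS[word]
        return TENS.get(word)
    if len(words) == 2 and words[0] in TENS and 1 <= UNITS.get(words[1], 0) <= 9:
        return TENS[words[0]] + UNITS[words[1]]
    return None


def translate_percentages(text):
    """Replace phrases such as "forty two percent" with "42%"."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try a two-word number first, then a one-word number,
        # each of which must be followed by the word "percent".
        for n in (2, 1):
            if i + n < len(tokens) and tokens[i + n].lower() == "percent":
                value = words_to_int([t.lower() for t in tokens[i:i + n]])
                if value is not None:
                    out.append(f"{value}%")
                    i += n + 1
                    break
        else:
            # No number phrase starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(translate_percentages("forty two percent"))  # 42%
```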