Output options

The following parameters are used to refine the transcription output. Refer to output for more information on using the output parameter.

Table 1. Output Options

Name

Availability

Values

Description

agentid

V‑Blaze version 7.3+

true, false

Predicts whether each speaker is the agent or the client. Set agentid=true when submitting a transcription request to enable this prediction. Refer to agentscore for more information.

insecure

All

false (default), true

This option explicitly allows curl to perform "insecure" SSL connections and transfers. By default, every SSL connection is verified against the installed CA certificate bundle, and any connection that fails verification is rejected. Setting insecure=true disables that check, which corresponds to curl's -k (--insecure) flag.

This option is only relevant when HTTPS URLs are provided for callback or utterance_callback.

Refer to http://curl.haxx.se/docs/sslcerts.html for more details on this parameter.

luhn

V‑Blaze version 7.3.1+

false (default), true, {integer list or range}

Controls whether numbers in transcription output are checked by the Luhn algorithm, which is used to validate different kinds of sensitive numbers, including credit card and personal identification numbers. Luhn-valid numbers are counted and flagged in transcription output. Refer to luhn for more information on this parameter.
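As background for the luhn option, the following is a minimal sketch of the general Luhn checksum. It illustrates the public algorithm only; it is not the ASR engine's implementation, and the function name luhn_valid is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored. This is an
    illustrative sketch, not the engine's own implementation.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 whenever the doubled value exceeds 9.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

For example, luhn_valid("4111 1111 1111 1111") is True for this widely used test card number, while changing its last digit makes the check fail.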

music

V‑Blaze version 7.3+

true, false, info

Set music=true to enable music detection. When music detection is enabled, all utterances will be passed through an algorithm to be classified as music or non-music. Utterances classified as non-music will be handled as normal. Utterances classified as music are assumed to contain noisy audio and will not be transcribed.

By default, utterance-level musicinfo fields are not included in the transcript. Set music=info to include them. Within the musicinfo dictionary, score is the utterance's music score (explained in musicthreshold), and used is the amount of utterance audio that the music identifier used when classifying the utterance.

Music classification results are reported at the utterance level with a music field. For example, an utterance classified as music with music=info would show the following:

{
  "confidence": ...,
  "end": ...,
  "start": ...,
  "music": true,
  "musicinfo": {
    "score": 0.5,
    "used": 8.44
  },
  "events": ...
}
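A client consuming this output might separate music utterances from speech as sketched below. The top-level "utterances" key and the sample values are assumptions made for illustration; only the music and musicinfo fields come from the description above.

```python
import json

# Made-up sample transcript. The "utterances" key is an assumption about
# the surrounding JSON layout and is used here only for illustration.
SAMPLE_TRANSCRIPT = """
{
  "utterances": [
    {"start": 0.0, "end": 8.44, "music": true,
     "musicinfo": {"score": 0.5, "used": 8.44}},
    {"start": 8.44, "end": 12.1, "music": false}
  ]
}
"""


def music_spans(transcript: dict) -> list:
    """Return (start, end, score) for each utterance flagged as music."""
    spans = []
    for utt in transcript.get("utterances", []):
        # Treat a missing "music" field as non-music.
        if utt.get("music", False):
            # musicinfo is only present when music=info was requested.
            info = utt.get("musicinfo", {})
            spans.append((utt["start"], utt["end"], info.get("score")))
    return spans


spans = music_spans(json.loads(SAMPLE_TRANSCRIPT))
```

When musicinfo is absent (music=true rather than music=info), the score in each tuple is None.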

numtrans

All

true (default), false

Controls whether certain words in transcribed text are converted into numeric digits and related conventional formats, including dollar amounts, wall-clock times, percentages, ordinals, web addresses, and telephone numbers. For example, with numtrans set to true (the default), the words “forty two percent” would be transformed into the text “42%”.

Refer to numtrans for more information on this parameter.

outstream

All

true (default), false

Enables or disables utterance streaming. The effect depends on the realtime setting:

  1. If realtime is disabled, outstream=true streams a complete (post-call) transcript; however, the resulting stream maintains the same format as real-time utterance streaming.

  2. If realtime is enabled, outstream=false disables utterance streaming.

  3. If realtime is enabled and an utterance_callback is set, outstream=true enables real-time utterance streaming to the specified callback server.

punctuate

All

true (default), false

Controls whether transcript text is punctuated or not. In most cases it is desirable to leave punctuation turned on, but there are special cases where it should be turned off. For example, if you are evaluating the Word Error Rate (WER) of Voci’s transcripts, punctuation must be disabled.

As of V‑Blaze version 7.1.0-1, setting punctuate=false will generate English output in lowercase.
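To illustrate why punctuation affects WER, here is a minimal sketch of a word-level WER calculation (word edit distance divided by the number of reference words). The function name word_error_rate is hypothetical, and this is a generic formulation rather than any particular scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Tokens are compared exactly, so "sat" and "sat." count as different
    words. Illustrative sketch only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "the cat sat", the hypothesis "The cat sat." counts two substitutions out of three words even though the words themselves were recognized correctly, whereas the unpunctuated lowercase hypothesis "the cat sat" scores 0.0.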

textinfo

V‑Blaze version 7.3+

true (default), false

The textinfo object is included in a JSON transcript by default when any text is decoded from an audio file. To exclude the textinfo object, specify the stream tag textinfo=false when submitting audio for transcription.

For more information, refer to The textinfo object.

turnlist

V‑Blaze version 7.3+

true (default), false

Shows the number of distinct speaker turns detected in the audio, indicated by turns.

utterance_callback

All

URL

Specify the URL of a callback server to which each utterance is streamed as it is transcribed. A callback server is required for real-time speech processing. In the ASR engine, a callback is the address (and, optionally, the method name and parameters) of a web application that can receive data via HTTP or HTTPS. Callbacks typically let another application receive and act on transcripts directly as the ASR engine produces them.

Refer to utterance_callback for more information on this parameter.
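A callback receiver can be very small. The sketch below uses only the Python standard library and assumes that each utterance arrives as an HTTP POST with a JSON body; that delivery format is an assumption here, and the names UtteranceHandler, make_server, and received are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Utterances collected so far. A real application would process them
# instead of keeping them in memory.
received = []


class UtteranceHandler(BaseHTTPRequestHandler):
    """Accept POSTed utterances (assumed here to be JSON documents)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
        except ValueError:
            # Keep non-JSON payloads (for example plain text) as raw text.
            received.append({"raw": body.decode("utf-8", errors="replace")})
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def make_server(host="127.0.0.1", port=0):
    """Create the receiver; port=0 asks the OS for any free port."""
    return HTTPServer((host, port), UtteranceHandler)
```

Calling make_server(port=8080).serve_forever() would run the receiver; its address would then be supplied as the utterance_callback value, for example http://receiver.example:8080/utterance (a placeholder host).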

utterance_fmt

All

json (default), jsontop, text, noutts

Select an output format for utterances. The value specified here only applies when streaming utterances. The output format of complete (post-call) transcripts will continue to appear as the default JSON output unless specified otherwise by the output parameter.

Important: realtime is required to use this option.

zip

All

false (default), true

Specifies whether or not to place the transcript within a zip file.
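The options in Table 1 are set as name=value pairs when a request is submitted. The following sketch assembles such pairs into a curl argument list using curl's -F (form field) flag. It assumes that the options are sent as multipart form fields and that the audio is attached in a field named file; the URL and file name are placeholders, and the helper names are hypothetical.

```python
def build_form_fields(audio_path, **options):
    """Turn keyword options into name=value form fields.

    Booleans become "true"/"false" to match the values in Table 1.
    The "file" field name is an assumption made for this sketch.
    """
    fields = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        fields.append(f"{name}={value}")
    fields.append(f"file=@{audio_path}")
    return fields


def curl_command(url, audio_path, **options):
    """Build an argument list suitable for subprocess.run()."""
    argv = ["curl"]
    for field in build_form_fields(audio_path, **options):
        argv += ["-F", field]
    argv.append(url)
    return argv


# Placeholder host and audio file; substitute real values before use.
command = curl_command(
    "http://asr.example/transcribe",
    "sample.wav",
    punctuate=False,
    numtrans=True,
    utterance_fmt="json",
)
```

Building the argument list as a Python list, rather than as a single shell string, avoids quoting problems when a value such as a callback URL contains special characters.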