Common tags

Table 1. Common /transcribe API Parameters for V‑Blaze

callback (optional)

Values: URL (HTTP and HTTPS are supported)

Description: The URL to which V‑Blaze will POST transcripts. A callback is the address and, optionally, the method name and parameters of a web application that can receive data via HTTP or HTTPS. Callbacks are usually used to let another application receive and work with the transcripts directly.

V‑Blaze transcripts are normally returned immediately and directly to the user or application that submitted the audio file for transcription. When a callback is specified, the resulting transcript is instead POSTed to the callback address and is not returned in the response. V‑Blaze does not retry failed callbacks.

file (required)

Values: PCM audio data in WAVE or RAW format

Description: A single audio file to process.

model (optional)

Values: see language models

Description: Indicates which language model(s) should be used to transcribe the audio. This parameter can be set to a single language model or to a list of language models. If it is not specified, the default model is used. Refer to model for more information on this parameter.

output (optional)

Values: json (default), jsontop, text, jsonlist, jsontext, noutt

Description: Indicates the desired output format. Refer to output for more information on this parameter.

realtime

Values: false (default), true

Description: Controls whether the ASR engine processes incoming audio in real-time mode. Real-time mode is enabled by a license setting, and this parameter cannot turn it on if the license does not enable it. The parameter is therefore only useful for specifying that the ASR engine should not process incoming audio in real time even though real-time mode is enabled in the license.

requestid

Description: The unique identifier for the request, used for tracing purposes. It can be specified as a parameter or in the X-Request-Id HTTP header. If a requestid is provided in either of these ways, it is included in JSON output and in the WebAPI access log.

Refer to requestid for WebAPI for more information on how to use requestid.
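As a rough illustration of how the parameters in Table 1 fit together, the following Python sketch assembles the optional form fields for a /transcribe request and checks them against the values listed in the table. The helper name build_transcribe_fields, its validation rules, and the placeholder callback URL are assumptions made for illustration; they are not part of the V‑Blaze API.

```python
from urllib.parse import urlparse

# Output formats listed in Table 1.
OUTPUT_FORMATS = {"json", "jsontop", "text", "jsonlist", "jsontext", "noutt"}


def build_transcribe_fields(model=None, output=None, callback=None,
                            realtime=None, requestid=None):
    """Return a dict of optional /transcribe form fields (hypothetical helper).

    Arguments left as None are omitted so that the server defaults apply.
    """
    fields = {}
    if model is not None:
        fields["model"] = model
    if output is not None:
        if output not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {output!r}")
        fields["output"] = output
    if callback is not None:
        # Table 1 states that HTTP and HTTPS callback URLs are supported.
        if urlparse(callback).scheme not in ("http", "https"):
            raise ValueError("callback must be an HTTP or HTTPS URL")
        fields["callback"] = callback
    if realtime is not None:
        # Sent as the literal strings "true" or "false".
        fields["realtime"] = "true" if realtime else "false"
    if requestid is not None:
        fields["requestid"] = requestid
    return fields


fields = build_transcribe_fields(
    output="jsontop",
    callback="https://app.example.com/transcripts",  # placeholder URL
    realtime=False,
    requestid="john,1234,567-uuid",
)
print(fields)
```

The resulting fields would be sent together with the required file field as a multipart/form-data POST, which is what curl does when each parameter is passed with -F.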

model

Values: installation-dependent

Description:

The model parameter specifies the language model(s) to use for transcription. It can be set to a single language model, which is then used to transcribe all channels, or to a comma-separated list of language models. V‑Cloud supports only a single language model for this parameter. Voci works with customers to ensure that their deployment delivers the best results possible, providing the language models that are most closely associated with the types of audio that each customer is transcribing. You will receive the model names that are authorized for your account from Voci Support.

V‑Blaze supports a comma-separated list of models in channel order. For example, if the client is on channel 0 and the agent is on channel 1, you could use a different model for each channel by setting the model parameter to model=eng1:client,eng1:agent. That setting would use the eng1:client language model to transcribe channel 0 and the eng1:agent language model to transcribe channel 1.

Note: No spaces are permitted in the value specified for the model parameter.
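The channel-ordered list and the no-spaces rule described above could be enforced on the client side with a small helper. The following Python sketch is one way to do that; the function name build_model_param is hypothetical, and rejecting commas inside a model name is an added assumption (a comma would otherwise be read as a separator).

```python
def build_model_param(models_by_channel):
    """Join per-channel model names into a value for the model parameter.

    models_by_channel[0] applies to channel 0, [1] to channel 1, and so on.
    """
    if not models_by_channel:
        raise ValueError("at least one model name is required")
    for name in models_by_channel:
        # No whitespace is allowed anywhere in the value; a comma inside a
        # name would be indistinguishable from the list separator.
        if not name or any(ch.isspace() for ch in name) or "," in name:
            raise ValueError(f"invalid model name: {name!r}")
    return ",".join(models_by_channel)


# Client on channel 0, agent on channel 1, as in the example above.
print(build_model_param(["eng1:client", "eng1:agent"]))
```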

If you don't specify a value for the model parameter, the first available model in your configuration is used. To determine the default model, use the /models API call, as illustrated in the following example.

$ curl http://example:17171/models
{"models":["eng-us:callcenter","eng1:voicemail","eng1:survey"]}

In the example above, eng-us:callcenter is listed first, so it is the model that would be used if you did not specify a model when transcribing audio.
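A client could determine the default model programmatically by reading the /models response. The Python sketch below parses a response body and returns the first entry; it assumes, based on the description above, that the first model in the list is the default. The function name default_model is hypothetical.

```python
import json


def default_model(models_response):
    """Return the first model listed in a /models response body."""
    models = json.loads(models_response)["models"]
    if not models:
        raise ValueError("the server reported no models")
    return models[0]


# Response body copied from the /models example above.
body = '{"models":["eng-us:callcenter","eng1:voicemail","eng1:survey"]}'
print(default_model(body))  # eng-us:callcenter
```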

Voci works with customers to ensure that their deployment delivers the best results possible, providing the language models that are best aligned with the business domain from which the speech originates.

Refer to Language models for more information on supported languages.

requestid for WebAPI

The unique identifier for the request for tracing purposes. This can be specified as a parameter or in the X-Request-Id HTTP header. If a requestid is provided in one of these ways, the specified requestid is included in JSON output and in the WebAPI access log.

The requestid is included in the final transcript and also in utterance callbacks as a top-level field.

The requestid can be any value you choose. For example, you could pass an ID that is later used to fetch metadata from a table, or you could pass all of the metadata in the requestid itself as a CSV string or in any other format you prefer, such as JSON. The following example shows a requestid:

$ curl -F "requestid=john,1234,567-uuid" -F "output=jsontop" -F "file=@/opt/voci/server/examples/sample1.wav" localhost:17171/transcribe; echo 
{"source":"sample1.wav","confidence":0.89,"donedate":"2020-01-23 13:03:02.881927","requestid":"john,1234,567-uuid","recvtz":["EST",-18000],"text":"And that it was resolved in a very professional manner. Your employees a very good.","model":"devel:callcenter","recvdate":"2020-01-23 13:03:02.276387"}
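If metadata is carried in the requestid as a CSV string, as in the example above, it can be packed before the request and unpacked from the returned transcript. The following Python sketch uses the standard csv module for both steps; the helper names pack_requestid and unpack_requestid are hypothetical, and the meaning of each field is left to the caller.

```python
import csv
import io


def pack_requestid(values):
    """Encode a sequence of metadata values as a single CSV line."""
    buffer = io.StringIO()
    # An empty line terminator keeps the result free of trailing newlines.
    csv.writer(buffer, lineterminator="").writerow(values)
    return buffer.getvalue()


def unpack_requestid(requestid):
    """Decode a CSV-style requestid back into a list of strings."""
    return next(csv.reader([requestid]))


rid = pack_requestid(["john", 1234, "567-uuid"])
print(rid)                    # john,1234,567-uuid
print(unpack_requestid(rid))  # ['john', '1234', '567-uuid']
```

Note that every value comes back as a string after unpacking, so numeric fields need to be converted by the caller.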

The following is an example of the utterance callback JSON with requestid included:

{"source":"sample1.wav","utterance":{"confidence":0.89,"end":6.17,"recvtz":["EST",-18000],"text":"And that it was resolved in a very professional manner. Your employees a very good.","start":0.55,"donedate":"2020-01-23 13:04:47.875351","recvdate":"2020-01-23 13:04:47.274705","metadata":{"source":"sample1.wav","model":"devel:callcenter","uttid":0,"channel":0}},"requestid":"john,1234,567-uuid"}
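An application that receives utterance callbacks could use the top-level requestid to tie each utterance back to the original request. The Python sketch below extracts a few fields from a callback body shaped like the example above; the function name summarize_utterance_callback and the choice of fields are assumptions for illustration, and the sample body is an abbreviated copy of the example.

```python
import json


def summarize_utterance_callback(body):
    """Pick out the fields of an utterance callback that are useful for tracing."""
    payload = json.loads(body)
    utterance = payload["utterance"]
    return {
        # requestid is a top-level field and is absent if none was supplied.
        "requestid": payload.get("requestid"),
        "channel": utterance["metadata"]["channel"],
        "start": utterance["start"],
        "end": utterance["end"],
        "text": utterance["text"],
    }


# Abbreviated copy of the utterance callback example above.
body = (
    '{"source":"sample1.wav","utterance":{"confidence":0.89,"end":6.17,'
    '"text":"And that it was resolved in a very professional manner.",'
    '"start":0.55,"metadata":{"source":"sample1.wav",'
    '"model":"devel:callcenter","uttid":0,"channel":0}},'
    '"requestid":"john,1234,567-uuid"}'
)
print(summarize_utterance_callback(body))
```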