Common tags

Table 1. Common /transcribe API Parameters for V‑Cloud

Name

Values

Description

callback (optional)

URL: HTTP or HTTPS are supported

The URL to which the resulting transcript is POSTed. A callback is the address and (optionally) method name and parameters of a web application that can receive data via HTTP or HTTPS. Callbacks are usually used to let another application receive the produced transcripts and work with them directly.

Once a callback returns success (indicated by HTTP code 200), the result is no longer available from V‑Cloud. If a callback fails, it will be retried until it succeeds or until a maximum number of retries is reached.

Note:

V‑Cloud supports HTTP basic authentication for callbacks. If a callback server requires authentication information, prepend your access credentials to the hostname of the URL as shown in the following example:

https://username:password@hostname.com
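
The following is a minimal Python sketch of how a client might build such a callback URL. The helper name with_basic_auth is hypothetical, and the percent-encoding of reserved characters is an assumption: the source does not state how V‑Cloud treats encoded credentials, so test it against your callback server before relying on it.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url: str, username: str, password: str) -> str:
    """Return url with username:password@ prepended to the hostname."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("callback must use HTTP or HTTPS")
    # Percent-encode reserved characters (such as @ and :) so they are not
    # mistaken for URL delimiters. Assumption: the receiver decodes them.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    # Drop any credentials that are already present in the URL.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(
        (parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(with_basic_auth("https://hostname.com", "username", "password"))
```

The resulting string can be passed as the value of the callback parameter.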

callbackurl (optional)

false (default), true

Controls whether transcription results are posted to the callback host as a file or as a URL. If the value is set to false, the results file is posted directly to the callback host. If the value is set to true, a URL pointing to the results file is posted instead.

Receiving results via URL keeps the callback POST much smaller than posting the results file directly (especially if scrubaudio=true, as these result files contain audio and may be very large). Additionally, callbackurl gives the callback host more flexibility, because the host can pull the results from the provided URL at its convenience rather than being forced to accept the result files in the callback POST itself.

Note:

A few things to consider when using callbackurl:

  1. V‑Cloud does not automatically delete the result files when callbackurl is set to true. Make an API call to V‑Cloud's DELETE method after successfully retrieving the results.

  2. The URL provided in the callback expires after one day; however, a new URL may always be requested through the /transcribe/result endpoint for up to 14 days. If the results are not retrieved within 14 days, they will be lost.
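
The two time limits above can be sketched as a small decision helper for a callback host that defers its downloads. This is only an illustration: the function name next_step and its return values are hypothetical, and it assumes the client records when the results became available and when the current URL was issued. The source does not specify how V‑Cloud handles the exact boundaries.

```python
from datetime import datetime, timedelta, timezone

URL_LIFETIME = timedelta(days=1)      # callback URL expires after one day
RESULT_LIFETIME = timedelta(days=14)  # results are kept for up to 14 days


def next_step(url_issued_at: datetime, results_ready_at: datetime, now: datetime) -> str:
    """Decide what a callback host should do with a stored results URL."""
    if now - results_ready_at >= RESULT_LIFETIME:
        return "lost"             # retention window has passed
    if now - url_issued_at >= URL_LIFETIME:
        return "request-new-url"  # ask /transcribe/result for a fresh URL
    return "download"             # URL is still valid; fetch, then DELETE


if __name__ == "__main__":
    ready = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(next_step(ready, ready, ready + timedelta(hours=6)))
```

After a successful download, remember to call V‑Cloud's DELETE method, since the result files are not removed automatically.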

file (required)

Supported archive formats are zip (MIME type application/zip) and 7z (MIME type application/x-7z-compressed).

The archive can be encrypted with a password, in which case the zpass parameter specifies the password.

Supported audio formats are PCM and ITU G.711.

A single audio file or zip file that contains one or more audio files to process.

The Linux file command gives the following outputs for accepted audio file formats:

$ file example1.wav
example1.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 8000 Hz

$ file example2.wav
example2.wav: RIFF (little-endian) data, WAVE audio, ITU G.711 mu-law, mono 8000 Hz

$ file example3.wav
example3.wav: RIFF (little-endian) data, WAVE audio, ITU G.711 A-law, mono 8000 Hz

The key compatibility indicators are WAVE audio with either linear PCM (reported by file as Microsoft PCM) or ITU G.711 encoding. The sample rate may be between 8000 and 16000 Hz, with 8000 Hz preferred. The file may also contain more than one channel.
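
Where the file command is not available, a similar check can be sketched in Python by reading the RIFF header directly. This is a rough pre-flight check under stated assumptions, not V‑Cloud's own validation: the function names are hypothetical, the format tags (1 for PCM, 6 for A-law, 7 for mu-law) come from the WAVE format, and files that use the extensible WAVE header are reported as unsupported even if their payload is PCM.

```python
import struct

# WAVE format tags: 1 = PCM, 6 = ITU G.711 A-law, 7 = ITU G.711 mu-law.
ACCEPTED_FORMATS = {1: "PCM", 6: "ITU G.711 A-law", 7: "ITU G.711 mu-law"}


def check_wav_header(data: bytes):
    """Return (ok, detail) after inspecting the leading bytes of a file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return False, "not a RIFF/WAVE file"
    pos = 12
    # Walk the chunk list until the "fmt " chunk is found.
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            if pos + 16 > len(data):
                return False, "truncated fmt chunk"
            fmt_tag, channels, rate = struct.unpack_from("<HHI", data, pos + 8)
            if fmt_tag not in ACCEPTED_FORMATS:
                return False, f"unsupported format tag {fmt_tag}"
            if not 8000 <= rate <= 16000:
                return False, f"sample rate {rate} Hz is outside 8000-16000 Hz"
            return True, f"{ACCEPTED_FORMATS[fmt_tag]}, {channels} channel(s), {rate} Hz"
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are word-aligned
    return False, "no fmt chunk found"


def example_header(fmt_tag: int, channels: int, rate: int) -> bytes:
    """Build a minimal 8-bit WAV header (no audio data) for demonstration."""
    bits = 8
    block_align = channels * bits // 8
    fmt = struct.pack("<HHIIHH", fmt_tag, channels, rate,
                      rate * block_align, block_align, bits)
    return (b"RIFF" + struct.pack("<I", 4 + 8 + len(fmt)) + b"WAVE"
            + b"fmt " + struct.pack("<I", len(fmt)) + fmt)


if __name__ == "__main__":
    print(check_wav_header(example_header(7, 1, 8000)))
```

For a real file, pass the first few kilobytes, for example the bytes returned by open("example1.wav", "rb").read(4096).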

Tip: Transcoding from other audio formats is available as an optional licensed feature. Refer to Adjusting for audio for more information on the transcode parameter.

url (alternative to file)

URL

Available with V‑Cloud version 1.6 and later:

Use url as an alternative to the file parameter for submitting audio data to Voci. The provided URL must support HTTP GET and return a Content-Length header when queried. V‑Cloud verifies that the data is properly received if the response to the query includes a Content-MD5 or ETag header. The ETag is used for verification only if its value is a valid MD5 digest.
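
To judge in advance whether a given audio host provides headers that could be used for this verification, a check along the following lines may help. It is a sketch under assumptions, not V‑Cloud's actual logic: the function names are hypothetical, "valid MD5" is taken to mean 32 hexadecimal characters, and Content-MD5 is taken to be the base64-encoded digest defined in RFC 1864.

```python
import base64
import hashlib
import re

_MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def etag_as_md5(etag: str):
    """Return the lowercase hex MD5 if the ETag looks like one, else None."""
    value = etag.strip().strip('"')  # ETag values are normally quoted
    return value.lower() if _MD5_HEX.fullmatch(value) else None


def content_md5_matches(body: bytes, content_md5: str) -> bool:
    """Compare a Content-MD5 header (base64 of the digest) with the body."""
    expected = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return expected == content_md5.strip()


if __name__ == "__main__":
    # A plain MD5 ETag is usable; a suffixed one (as produced by some
    # multipart uploads) is not.
    print(etag_as_md5('"d41d8cd98f00b204e9800998ecf8427e"'))
    print(etag_as_md5('"d41d8cd98f00b204e9800998ecf8427e-5"'))
```

If neither header is usable, the data can still be submitted; it simply is not verified this way.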

The following url features are only available with V‑Cloud version 1.6-2021.02.25 and later:

The url parameter supports basic HTTP authentication. Prepend the access credentials to the hostname when a URL audio source requires authentication, as shown in the following example:

curl -F token=your-token-here \
     -F url=http://username:password@hostname.com/sample.wav \
      https://vcloud.vocitec.com/transcribe

The url parameter supports HTTP authorization request headers. Include the authorization header, authorization type, and access credentials when a URL audio source requires authorization, as shown in the following example:

curl -F token=your-token-here \
     -F url=http://hostname.com/sample.wav \
     -H 'Authorization: auth_type credentials' \
      https://vcloud.vocitec.com/transcribe
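
The placeholder auth_type credentials depends on what the audio host expects. As an illustration, the sketch below builds header values for two standard HTTP schemes, Basic and Bearer. The function names are hypothetical, and which schemes your audio host accepts is an assumption to confirm with that host.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build a Basic scheme value: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def bearer_authorization(token: str) -> str:
    """Build a Bearer scheme value from an access token issued by the host."""
    return f"Bearer {token}"


if __name__ == "__main__":
    # The printed line can be passed to curl with -H.
    print("Authorization: " + basic_authorization("username", "password"))
```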

The url parameter supports presigned URLs from Amazon Web Services (AWS) to submit audio data to V‑Cloud. Using AWS to share a presigned URL requires a named profile and valid security credentials. Refer to Named Profiles and Sharing an object with a presigned URL for more information.

To use AWS as an audio source for the url parameter, include the presigned URL, the name of the profile associated with the presigned URL, and the region, as shown in the following example:

curl -F token=your-token-here \
     -F url="$(aws s3 presign s3://s3bucketname/sample.wav --profile profilename --region us-east-2)" \
     -F filemd5=your-MD5-sum-here \
     -X POST \
     https://vcloud.vocitec.com/transcribe

Note: Voci recommends specifying the audio's MD5 sum with filemd5=your-MD5-sum-here in requests that use presigned AWS URLs to submit audio data. If getting the MD5 sum is an issue, an alternative option is to disable MD5 verification with filemd5=false.
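
One way to obtain the MD5 sum before uploading the audio to the bucket is sketched below. The helper name md5_hex is hypothetical, and the sketch assumes that filemd5 expects the hexadecimal digest (the form printed by tools such as md5sum); the source does not state the expected encoding.

```python
import hashlib
import io


def md5_hex(stream, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a binary stream, read in chunks to limit memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # In practice, open the audio file instead:
    #     with open("sample.wav", "rb") as handle:
    #         print(md5_hex(handle))
    print(md5_hex(io.BytesIO(b"example audio bytes")))
```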

filetype (optional)

Content-Type

Used to manually specify the Content-Type of your audio as shown in the following example:

curl -F token=your-token-here \
     -F url=http://username:password@hostname.com/audio-sample \
     -F filetype=audio/x-wav \
     https://vcloud.vocitec.com/transcribe

Note: In most cases, setting the filetype is not necessary because the file type is automatically determined by the filename extension. However, in some situations, such as when a URL source does not contain a file extension, it may be necessary to manually specify the filetype.
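
A client can apply the same idea before submitting a request, as in the sketch below. The function name is hypothetical, and Python's mimetypes table is only a stand-in here: V‑Cloud's own extension mapping is not documented in this section and may differ.

```python
import mimetypes
from urllib.parse import urlsplit


def needs_explicit_filetype(url: str) -> bool:
    """Return True when the URL path has no extension that maps to a type."""
    path = urlsplit(url).path  # ignores the query string of presigned URLs
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed is None


if __name__ == "__main__":
    for candidate in ("http://hostname.com/sample.wav",
                      "http://hostname.com/audio-sample"):
        print(candidate, needs_explicit_filetype(candidate))
```

When the function returns True, add a filetype field such as filetype=audio/x-wav to the request.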

model (optional)

See language models

The model parameter specifies the language model to use for transcription. The value specified for this parameter should be a single language model, which is used to transcribe all channels. Voci works with customers to ensure that their deployment delivers the best results possible, providing the language models that are most closely associated with the types of audio that each customer is transcribing. You will receive the model names that are authorized for your account from Voci Support.

output (optional)

Values: json (default), jsontop, text

Indicates the desired output format. Refer to output for more information on this parameter.

token (required)

Used to authenticate and authorize the request. You will receive a token from Voci Support to use with requests to the ASR server. All requests made with your token will be tied to your account. Please notify Voci Support immediately if your token is compromised or lost.

requestid

The unique identifier for the request about which you want to retrieve results or status information. This is auto-generated and appears in the JSON output.
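
The parameters in this table can be combined and checked on the client before a request is sent. The following sketch assembles the form fields and converts them into curl -F arguments. It is an illustration under assumptions: the function names are hypothetical, the rule that exactly one of file or url is supplied is inferred from url being described as an alternative to file, and the @ prefix is curl's syntax for uploading a local file.

```python
ALLOWED_OUTPUT = {"json", "jsontop", "text"}


def build_transcribe_fields(token, *, file_path=None, url=None, output="json",
                            callback=None, callbackurl=False, model=None):
    """Validate common /transcribe parameters and return them as a dict."""
    if not token:
        raise ValueError("token is required")
    if (file_path is None) == (url is None):
        raise ValueError("provide exactly one of file_path or url")
    if output not in ALLOWED_OUTPUT:
        raise ValueError(f"output must be one of {sorted(ALLOWED_OUTPUT)}")
    fields = {"token": token, "output": output}
    if url is not None:
        fields["url"] = url
    else:
        fields["file"] = "@" + file_path  # curl -F syntax for a file upload
    if callback is not None:
        fields["callback"] = callback
        fields["callbackurl"] = "true" if callbackurl else "false"
    if model is not None:
        fields["model"] = model
    return fields


def to_curl_args(fields):
    """Flatten the fields into ['-F', 'name=value', ...] for curl."""
    return [arg for name, value in fields.items()
            for arg in ("-F", f"{name}={value}")]


if __name__ == "__main__":
    fields = build_transcribe_fields("your-token-here", file_path="example1.wav")
    print(["curl", *to_curl_args(fields), "https://vcloud.vocitec.com/transcribe"])
```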