WebSocket API interface

The WebSocket interface allows for bidirectional streaming of data to and from the V‑Blaze REST API.

Opening a connection

To initiate a WebSocket connection, connect to ws://<webapi_host_here>:17171/transcribe using any standard WebSocket client. Note that WebSocket Secure (wss://) is not supported.

All transcription tags must be specified either in the connection query string:

ws://webapi:17171/transcribe?model=eng1:callcenter&output=text

or in HTTP headers named X-Voci-<tag_name>, as in the following example:

  • X-Voci-Model: eng1:callcenter

  • X-Voci-Output: text
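The two forms carry the same tags. The hypothetical helpers below are a sketch that builds both the query-string URL and the equivalent headers from a single dictionary of tags. They assume that a header name is X-Voci- followed by the tag name with its first letter capitalized, which matches the examples on this page. HTTP header names are case-insensitive, so the capitalization is only cosmetic.

```python
from urllib.parse import urlencode


def build_url(host, tags, port=17171):
    """Return a ws:// URL that carries the transcription tags in its query string."""
    # safe=':' leaves colons (as in eng1:callcenter) unescaped; other special
    # characters are percent-encoded by urlencode.
    return f'ws://{host}:{port}/transcribe?{urlencode(tags, safe=":")}'


def build_headers(tags):
    """Return the same tags as X-Voci-* HTTP headers (hypothetical naming helper)."""
    return {f'X-Voci-{name.capitalize()}': value for name, value in tags.items()}


tags = {'model': 'eng1:callcenter', 'output': 'text'}
print(build_url('webapi', tags))
print(build_headers(tags))
```

Either result can be passed to a WebSocket client: the URL as the connection address, or the dictionary as the handshake headers.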

Submitting audio

Once connected, audio may be submitted by writing directly to the WebSocket. Any number of write calls may be made, and the audio data may be split between write calls in any manner. After all audio has been submitted, a zero-length WebSocket write (an empty dataframe) must be sent to indicate the end of the audio.
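Here is a minimal sketch of the submission side. It assumes the audio comes from a local file. The generator reads the file in fixed-size chunks, so the whole file never sits in memory, and it finishes with the empty frame that marks the end of the audio. The function name and the 8192-byte default are arbitrary choices for illustration. With the websocket-client library, each item could be passed to ws.send(frame, opcode=websocket.ABNF.OPCODE_BINARY).

```python
def audio_frames(path, chunk_size=8192):
    """Yield the audio file in chunks, followed by an empty end-of-audio frame."""
    with open(path, 'rb') as f:
        # Read until EOF; chunk boundaries need not align with audio samples.
        while chunk := f.read(chunk_size):
            yield chunk
    # A zero-length write tells V-Blaze that all audio has been submitted.
    yield b''
```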

Receiving results

Results may be received in a variety of formats depending on the tags.

Default

By default, a single textual transcript or a zipped file, depending on the provided tags, will be returned to the client. This response is guaranteed to be sent within a single textual (for transcripts) or binary (for zipped data) WebSocket message.
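The sketch below shows one way a client might handle the single default response. A str message is the transcript. A bytes message is treated as a zip archive and unpacked in memory with the standard zipfile module. The function name and its tagged return value are illustrative only. The archive's contents depend on the tags and are not assumed here.

```python
import io
import zipfile


def handle_default_result(message):
    """Return ('transcript', text) or ('zip', {member name: member bytes})."""
    if isinstance(message, str):
        # Textual WebSocket message: a plain transcript.
        return ('transcript', message)
    # Binary WebSocket message: zipped output, opened without touching disk.
    with zipfile.ZipFile(io.BytesIO(message)) as archive:
        return ('zip', {name: archive.read(name) for name in archive.namelist()})
```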

Utterance Streaming

If outstream=true and no utterance_callback is provided, utterance results will be streamed back to the client. Each utterance result will be sent in its own textual WebSocket message. After all utterance results are sent, an empty message will be sent, followed by the complete transcription.
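The hypothetical class below illustrates this message sequence by sorting the textual messages of a streamed session. Every message before the empty one is an utterance result, and the message after it is the complete transcription. Utterance payloads are stored as raw strings because their exact format depends on the output tags. An on_message callback would simply pass each str message to on_text.

```python
class UtteranceStream:
    """Sorts streamed text messages into utterances and the final transcript."""

    def __init__(self):
        self.utterances = []      # one entry per utterance message, in order
        self.transcript = None    # complete transcription, once received
        self._utterances_done = False

    def on_text(self, message):
        if self._utterances_done:
            # The message after the empty one is the complete transcription.
            self.transcript = message
        elif message == '':
            # The empty message marks the end of the utterance results.
            self._utterances_done = True
        else:
            self.utterances.append(message)
```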

Scrubbed Audio Streaming

If scrubaudio=true and outstream=true, both utterance data and audio data will be streamed back to the client. Utterance data is always sent in textual WebSocket messages, while audio data is sent in binary WebSocket messages. The utterance and audio messages may be interleaved. As with non-audio streaming, the last textual WebSocket message sent will contain the complete transcription.
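Here is a minimal collector for the interleaved case. It assumes that the binary messages can be concatenated in arrival order to reconstruct the scrubbed audio, which is consistent with the full example, where each binary message is appended to a file. Following the description above, the complete transcription is taken to be the last textual message. The transcript property is therefore only meaningful once the server has closed the connection. The class name is hypothetical.

```python
class ScrubbedStream:
    """Separates interleaved text (utterance) and binary (audio) messages."""

    def __init__(self):
        self.text_messages = []
        self.audio = bytearray()

    def on_message(self, message):
        if isinstance(message, str):
            self.text_messages.append(message)
        else:
            # Binary message: the next piece of scrubbed audio.
            self.audio.extend(message)

    @property
    def transcript(self):
        # The last textual message holds the complete transcription.
        return self.text_messages[-1] if self.text_messages else None
```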

Note that the outstream tag defaults to true when realtime=true is specified, so in that case it does not need to be set explicitly.
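A small hypothetical helper can summarize when utterances are streamed to the client. It encodes the two conditions stated in this section: outstream must be in effect, and no utterance_callback may be given. It also assumes that an explicit outstream value takes precedence over the default implied by realtime=true.

```python
def streams_utterances(tags):
    """Return True if utterance results should arrive over the WebSocket itself."""

    def is_true(value):
        return str(value).lower() == 'true'

    if 'outstream' in tags:
        outstream = is_true(tags['outstream'])
    else:
        # outstream defaults to true when realtime=true.
        outstream = is_true(tags.get('realtime', 'false'))
    # With an utterance_callback, utterances are not streamed back to the client.
    return outstream and 'utterance_callback' not in tags
```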

Closing the connection

Once the API has returned all response data, it will initiate a close. Any client-initiated close will be interpreted as an error and may result in the loss of data.
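Some clients use a blocking receive loop instead of callbacks. The sketch below keeps reading until V‑Blaze sends its close frame and never initiates a close itself. It assumes a connection object with a recv_data() method that returns (opcode, payload), as websocket-client's WebSocket.recv_data() does. The opcode values and the close-payload layout (a 2-byte big-endian status code followed by a UTF-8 reason) come from RFC 6455. The function name is hypothetical.

```python
import struct

# Frame opcodes defined by RFC 6455.
OPCODE_TEXT, OPCODE_BINARY, OPCODE_CLOSE = 0x1, 0x2, 0x8


def receive_until_server_close(conn):
    """Collect messages until the server closes; returns (messages, status, reason)."""
    messages = []
    while True:
        opcode, data = conn.recv_data()
        if opcode == OPCODE_TEXT:
            messages.append(data.decode('utf-8'))
        elif opcode == OPCODE_BINARY:
            messages.append(bytes(data))
        elif opcode == OPCODE_CLOSE:
            # Close payload: optional 2-byte big-endian status code + UTF-8 reason.
            if len(data) < 2:
                return messages, None, ''
            (status,) = struct.unpack('!H', data[:2])
            return messages, status, data[2:].decode('utf-8', errors='replace')
```

A status of 1000 indicates a successful close, matching the on_close handler in the full example. With websocket-client, conn could be the object returned by websocket.create_connection(URL, header=HEADERS), used after all audio and the empty end-of-audio frame have been sent. An abrupt disconnect without a close frame is not handled here.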

For a complete code sample using the WebSocket interface, refer to the WebSocket example below.

WebSocket example

The following is a simple Python example that illustrates how to interact with the V‑Blaze REST API WebSocket interface. The URL, FILE, and HEADERS variables may be modified to change the behavior of this script.


#!/usr/bin/python3

# pip3 install websocket_client
import websocket

# URL of the V-Blaze machine to connect to
URL = 'ws://localhost:17171/transcribe'

# Path of an audio file to transcribe
FILE = 'sample1.wav'

# HTTP headers. May specify V-Blaze parameters like:
# 'X-Voci-Tag_name': 'tag_value'
HEADERS = {
    'X-Voci-Model': 'eng1:callcenter',
    'X-Voci-Output': 'text',

    # Audio encoding must be specified explicitly when using the websocket
    # interface. This instructs V-Blaze that the provided audio will have a
    # WAVE header from which the audio encoding information may be extracted.
    'X-Voci-Datahdr': 'WAVE',

    # For tags which contain special characters you may escape them like so:
    # from urllib.parse import quote
    # 'X-Voci-Subst_rules': quote(...),
}

# ------------------------------------------------------------------------------

def main():
    # Create and run a websocket application. The on_open function is called
    # after the connection is established. The on_message function is called
    # each time we receive a message from V-Blaze. The on_close function is
    # called when the connection is closed.
    ws = websocket.WebSocketApp(
        URL, header=HEADERS,
        on_open=on_open, on_message=on_message, on_close=on_close
    )
    ws.run_forever()
    ws.close()

def on_open(ws):
    # Sends the file to V-Blaze. This example sends the entire file in a single
    # send call; however, you may split the audio data into smaller chunks and
    # make multiple send calls. This is especially useful if you want to avoid
    # loading the entire file into memory or if you are submitting audio in
    # realtime.
    with open(FILE, 'rb') as f:
        ws.send(f.read(), opcode=websocket.ABNF.OPCODE_BINARY)
    # An empty binary dataframe signals that all audio has been submitted.
    ws.send(bytearray(), opcode=websocket.ABNF.OPCODE_BINARY)

def on_message(ws, message):
    # V-Blaze sends text websocket dataframes containing transcripts and binary
    # dataframes containing scrubbed audio (zipped results, when requested,
    # also arrive as binary dataframes). This if statement distinguishes
    # between the two. If "X-Voci-Scrubaudio: true" was not specified and the
    # output is not zipped, this check is not necessary since no binary data
    # will be received.
    if isinstance(message, str):
        # Handle transcript
        print(f'Received transcript: {message}')
    else:
        # Handle scrubbed audio
        print(f'Got {len(message)} bytes of scrubbed audio')
        with open(f'{FILE}.scrubbed', 'ab') as f:
            f.write(message)
    
def on_close(ws, status, message):
    # status is an integer websocket status code.
    # message is a string detailing the reason for the close (may be empty).
    # A 1000 status code is used for successful closes
    if status == 1000:
        print('Closed successfully')
    else:
        print(f'Closed with code {status} and error: {message}')

if __name__ == '__main__': main()