Use callbacks to receive results

In REST applications, a callback is the address and (optionally) the method name and parameters of a web application that can receive data via HTTP. Callbacks are typically used to enable another application to receive and directly interact with the transcripts produced by V‑Spark.

This tutorial shows how to set up a simple callback server, submit audio for transcription with /transcribe, and examine the results. It also suggests ways to troubleshoot common problems when setting up and using a callback server.

Setting up a sample callback server

To follow this example, you must have a callback server running on a known host and port. If you do not already have one, the easiest way to simulate a callback server is to use the netcat application to listen on a specified port and display the information that it receives. The netcat application is a networking utility for reading from and writing to network connections using TCP or UDP. Its executable is typically named nc or nc.exe, depending on your operating system. The netcat utility is included in most Linux distributions and is freely available for most modern operating systems.

The sample output shown later in this section was produced by a netcat listener that was started with the following Linux command:

while true ; do nc -l 5555 -k ; done

This trivial callback server runs the netcat command in a loop, listening on port 5555 (the -l option) and continuing to listen for another connection after its current connection is completed (the -k option). As written, it displays what it receives but does not return anything to the caller.

V‑Spark retries submissions to a callback until the callback returns success (HTTP code 200), and cancels the callback after a limited number of retries (100, by default). To avoid those retries, the following variant of the sample server sends a success code to every caller:

while true ; do echo -e "HTTP/1.1 200 OK\r\n" | nc -l 5555; done

In this variant, the loop pipes the success code into a netcat command that is listening on port 5555, so every connection receives that success code before the listener restarts.
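
Where netcat is unavailable, or when you want every request answered with a well-formed response, a small Python script can play the same role. The following is a minimal sketch and is not part of V‑Spark: the names CallbackHandler and make_server are assumptions made for illustration, and only the Python standard library is used. It prints each POST that it receives and returns HTTP code 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: prints each POST and always answers 200."""

    def _send_ok(self):
        # Report success (HTTP code 200) with an empty body.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self):
        # Read exactly Content-Length bytes so the read does not block.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(self.headers, end="")
        print(body.decode("utf-8", errors="replace"))
        self._send_ok()

    def do_GET(self):
        # Lets a plain probe of the URL see the success code as well.
        self._send_ok()


def make_server(port=5555):
    # An empty host string binds to all interfaces on the given port.
    return HTTPServer(("", port), CallbackHandler)
```

To listen on port 5555 as in the netcat example, call make_server(5555).serve_forever() and stop the server with Ctrl-C.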

Receiving transcription results

A successful call to the V‑Spark API returns the transcript in the default (JSON) format or whatever other format you specified with the output stream tag in your call to V‑Spark's /transcribe API.

A callback server is generally used to collect output and forward it to another application, to process the transcript itself, or simply to preserve the output for later use. With the sample callback server that was introduced earlier, transcripts are written to the standard output of the shell in which you ran the netcat command. For example, the following command submits the audio file sample7.wav for transcription, requests text output, and names the callback server that should receive the result:

curl -F "file=@sample7.wav;type=audio/wav" -F output=text \
     -F token=0123456789ABCDEFGHIJ0123456789ABS \
     -F callback=http://196.168.6.64:5555 \
     https://vcloud.vocitec.com/transcribe

This call would return a message like the following:

{"requestid":"3b1c30e0-e62e-4da0-9487-c2f2c76310c7"}
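
In the sample callback output in this section, the same requestid appears again in the requestid field that is POSTed to the callback server. A client can use that value to match each callback to the audio file that it submitted. The following is a minimal sketch of such bookkeeping, assuming the response body has the JSON shape shown above; the names pending, record_submission, and match_callback are hypothetical.

```python
import json

# Hypothetical bookkeeping: requestid -> name of the submitted audio file.
pending = {}


def record_submission(response_body, audio_name):
    """Remember which audio file a /transcribe response refers to."""
    request_id = json.loads(response_body)["requestid"]
    pending[request_id] = audio_name
    return request_id


def match_callback(request_id):
    """Return the audio file for a callback's requestid, or None if unknown."""
    return pending.pop(request_id, None)
```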

The following example shows the output that the netcat callback server displays after a call to that server when text output was requested:

POST / HTTP/1.1
Content-Type: multipart/form-data;boundary=x7UiTsbnoupKk6ndj9DxpOvyt6NtDFjnn3K0OC
User-Agent: Java/1.7.0_161
Host: 73.174.3.131:5555
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 1100

--x7UiTsbnoupKk6ndj9DxpOvyt6NtDFjnn3K0OC
Content-Disposition: form-data; name="requestid"
Content-Type: text/plain;charset=UTF-8
Content-Length: 36

3b1c30e0-e62e-4da0-9487-c2f2c76310c7
--x7UiTsbnoupKk6ndj9DxpOvyt6NtDFjnn3K0OC
Content-Disposition: form-data; name="file"; filename="sample7.txt"
Content-Type: text/plain
Content-Length: 702

Thank you for calling Center point energy technical support. I understand you need to report a gas leak and I have your name please
my name is Joe and I thank you Mr. Know what is your address or account number
my address and then one Martin Houston, Texas is there. Anyone inside the house? I know everyone is out of the house. I notice the strange smell when I got home and I called you I am sending and gas technician to your home to fix the problem. Could you give me a good number to reach you at
you can call 28195345 zero's.
Thank you, please be safe and wait for the technician to arrive call us back if anything changes.
Thank you, bye. Good bye and thank you for calling Center point energy.

--x7UiTsbnoupKk6ndj9DxpOvyt6NtDFjnn3K0OC--
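
An application that receives this POST has to split the multipart body into its requestid and file fields. One way to do that, sketched here with Python's standard email package, is to prepend the request's Content-Type header to the body and parse the result as a MIME message. The function name parse_callback is an assumption made for illustration, and the sketch does no validation beyond what the parser itself performs.

```python
from email.parser import BytesParser


def parse_callback(content_type, body):
    """Return {field name: payload bytes} for a multipart/form-data body."""
    # The email parser expects headers before the body, so rebuild a
    # minimal MIME message from the request's Content-Type header.
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser().parsebytes(header + body)
    fields = {}
    if message.is_multipart():
        for part in message.get_payload():
            # The field name is a parameter of each part's
            # Content-Disposition header, for example name="requestid".
            name = part.get_param("name", header="content-disposition")
            fields[name] = part.get_payload(decode=True)
    return fields
```

Called with the Content-Type header value and the raw body of the request shown above, the function would return a dictionary with the keys requestid and file.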

As discussed earlier, the goal of a callback server is to enable another application to receive and directly interact with the transcriptions produced by V‑Spark. However, a simple callback server such as the one used in this section can also be convenient when testing the effects of trying different options with calls to V‑Spark's /transcribe method.

For example, the following is the callback server's output after transcribing the same sample audio file using the output=text option and adding the diarize=true option:

POST / HTTP/1.1
Content-Type: multipart/form-data;boundary=PX0PI61Stzs4xNw-G7SyqnxXPcstL3PEmbF
User-Agent: Java/1.7.0_161
Host: 73.174.3.131:5555
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 1096

--PX0PI61Stzs4xNw-G7SyqnxXPcstL3PEmbF
Content-Disposition: form-data; name="requestid"
Content-Type: text/plain;charset=UTF-8
Content-Length: 36

9ca9674c-65b7-46a7-aa06-1ceaeae9d8af
--PX0PI61Stzs4xNw-G7SyqnxXPcstL3PEmbF
Content-Disposition: form-data; name="file"; filename="sample7.txt"
Content-Type: text/plain
Content-Length: 707

Thank you for calling Center point energy to technical support.
I understand you need to report a gas leak and I have your name please.
My name is John.
Thank you, Mr. Darrow.
What is your address or account number?
My address is and and Walmart in Houston Texas.
Is there anyone inside the house?
Know everyone is out of the house so I noticed the strange now when I
got home and I called you.
Hi, I'm sending a gas technician to your home to fix the problem.
Could you give me a good number to reach you at.
You can call 281-953-4507.
Thank you, please be safe and wait for the technician to arrive call us back if anything changes.
Thank you bye.
Good, bye and thank you for calling Center point energy.

--PX0PI61Stzs4xNw-G7SyqnxXPcstL3PEmbF--

In the example output, you can see that enabling V‑Spark's diarize option has improved the identification of the different speakers on the call, even though the audio file is still in mono.

Troubleshooting callbacks

If files are uploaded successfully to V‑Spark, you receive a success code (HTTP code 200) and a requestid in response to each upload. If your callback server is still not receiving results, check the items in the following list:

  • Verify that external hosts can reach your callback server - Receiving a success code and requestid in response to POSTing a request to V‑Spark shows that the system that is POSTing the request can reach V‑Spark. It does not show that V‑Spark can reach your callback host. When V‑Spark cannot reach that host, the cause is usually a firewall or network connectivity restriction.

    Verifying connectivity can most easily be done using a simple callback server like the one in the previous section.

    To test connectivity between V‑Spark and your callback server, log in on a host that is not on your local network and can be reached directly from the Internet. Once you are logged in there, attempt to reach the host on which your callback server is running. The following is a sample curl command that simply probes the URL at which a callback server is listening:

    curl -i http://host:5555

    The host and port that you specify are the host and port on which your callback server is listening.

    The -i option tells the curl command to display the HTTP header that it receives. For example, if you are using the sample callback server that echoes a success code, you receive a result like the following:

    HTTP/1.1 200 OK

    Note: If your network administration policies restrict inbound connectivity from external hosts, contact support@vocitec.com for the list of V‑Spark IP addresses from which access needs to be allowed.

  • Identify problems in your callback server - If you are able to reach the host and port on which your callback server is running from some other host on the Internet, connectivity is not the problem. Try the following steps to identify problems with your callback server:

    • Verify that you can POST directly to your callback server - Use a command like the following to simulate the data that V‑Spark sends to your callback server:

      curl -F "file=@test.json;type=application/json" \
           -F requestid=700e7496-4fce-4963-aa7b-b3b26600f813 \
           https://HOST:PORT/endpoint

      This command provides the two fields of the multipart POST that your callback server needs to be able to handle. Ensure that your callback server correctly returns success (HTTP code 200) when these two fields are received.

    • Verify correct error handling - V‑Spark transcription can encounter an error. In such cases, an error message is POSTed in an error field to your callback server, so your callback server must be able to handle error messages from V‑Spark. The following example command sends the error message "This is a sample error" to your callback server:

      curl -F "error=This is a sample error" \
           -F requestid=700e7496-4fce-4963-aa7b-b3b26600f813 \
           http://HOST:PORT/endpoint

      This sample command should trigger error handling in your callback server, such as logging a message.
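
The two checks above suggest the decision that a callback server makes for each POST: a request that carries an error field is a failed transcription, and a request that carries a file field is a transcript. The following is a minimal sketch of that decision, assuming the POST has already been parsed into a dictionary of field names and text values; the function name handle_callback and the wording of the returned messages are hypothetical.

```python
def handle_callback(fields):
    """Return a log message describing one callback POST.

    fields is assumed to map form field names (requestid, file, error)
    to text values; this shape is an assumption for illustration.
    """
    request_id = fields.get("requestid", "unknown")
    if "error" in fields:
        # Transcription failed; record the message sent in the error field.
        return "request %s failed: %s" % (request_id, fields["error"])
    if "file" in fields:
        # Transcription succeeded; the file field holds the transcript.
        size = len(fields["file"])
        return "request %s succeeded: %d characters" % (request_id, size)
    # Neither expected field is present.
    return "request %s: unexpected fields %s" % (request_id, sorted(fields))
```

A real callback server would typically write these messages to a log and return success (HTTP code 200) in every case so that the POST is not retried.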

If you still cannot identify or resolve the problem with your callback server, contact support@vocitec.com for assistance in diagnosing the problem that you are experiencing.

Testing callbacks

The following command calls the V‑Spark API, specifies the address of the callback server, specifies that you want text format output, and identifies the audio file that you want to transcribe:

curl -F callback=http://www.example.com:5555 \
     -F token=123e4567e89b12d3a456426655440000 \
     -F output=text -F "file=@sample7.wav;type=audio/wav" \
     http://vcloud.vocitec.com/transcribe

This sample command sends a text transcript of the audio file sample7.wav to the callback server. The transcribed text is the following:

Thank you for calling Center point energy technical support. I understand you need to report a gas leak and I have your name please
my name is Joe and I thank you Mr. Know what is your address or account number
my address and then one Martin Houston, Texas is there. Anyone inside the house? I know everyone is out of the house. I notice the strange smell when I got home and I called you I am sending and gas technician to your home to fix the problem. Could you give me a good number to reach you at
you can call 28195345 zero's.
Thank you, please be safe and wait for the technician to arrive call us back if anything changes.
Thank you, bye. Good bye and thank you for calling Center point energy.

Note that the sample audio file used in this example is a mono audio file, so the different portions of the audio in which voices are active (known as utterances) are separated by newlines.
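
Because each utterance occupies its own line in this text output, an application can recover the individual utterances by splitting the transcript on newlines. The following is a minimal sketch; the function name split_utterances is an assumption made for illustration, and blank lines are discarded.

```python
def split_utterances(transcript):
    """Return the non-empty lines of a text transcript, one per utterance."""
    return [line.strip() for line in transcript.splitlines() if line.strip()]
```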