JSON transcripts
All Voci products use the JSON file format to store transcript data derived from source audio. This data includes the text decoded from speech, along with metadata that describes audio attributes and the results of linguistic and emotional analysis performed by Voci products.
The data fields included in Voci JSON transcription output vary depending on the products, circumstances, and optional features that were used to generate the output. There are three categories of Voci JSON data:
- Core ASR data generated by the ASR engine under most circumstances and whenever text is decoded from speech. For example, fields like the top-level `asr` and `model` elements will always be included in JSON output because they refer to ASR engine and language model attributes. Similarly, the `utterances` array is included if audio was successfully transcribed because it is generated any time speech is decoded from audio.
- Conditional and parameter data generated only under certain conditions or when certain transcription parameters are specified. For example, the top-level `chaninfo` object is included only for stereo or diarized audio, and the top-level `emotion` field is included only when the transcription request includes the stream tag `emotion = true`. Under some conditions, fields with identical names appear at different levels of the JSON data hierarchy. For example, the `agentscore` field is included at the top level only when processing undiarized mono audio with the stream tag `agentid = true`. When processing stereo audio with `agentid = true`, the `agentscore` field instead appears in the `chaninfo` objects for each channel of audio (see the sketches after this list).
- Data added to JSON transcripts when output is processed by V‑Spark for analytics and application scoring. This data can be viewed in the V‑Spark UI or in the top-level `app_data` and `client_data` objects and the `last_modified` field generated for V‑Spark JSON output.
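To make the placement of these fields concrete, the following is a minimal, hand-written sketch of a transcript for undiarized mono audio submitted with the stream tags `agentid = true` and `emotion = true`, after V‑Spark processing has added its analytics fields. The field names come from the list above; all values, the internal shape of the `utterances` entries (words, confidences, timestamps), and the placeholder contents of `app_data` and `client_data` are illustrative assumptions rather than authoritative output.

```json
{
  "asr": "...",
  "model": "...",
  "agentscore": 0.87,
  "emotion": "Positive",
  "utterances": [
    {
      "start": 0.0,
      "end": 2.4,
      "events": [
        { "word": "hello", "confidence": 0.98, "start": 0.0, "end": 0.4 },
        { "word": "thanks", "confidence": 0.95, "start": 0.5, "end": 0.9 }
      ]
    }
  ],
  "app_data": { "...": "..." },
  "client_data": { "...": "..." },
  "last_modified": "..."
}
```

In this sketch, `agentscore` and `emotion` appear at the top level only because the audio is undiarized mono, and the V‑Spark fields (`app_data`, `client_data`, `last_modified`) appear only because the transcript has been processed by V‑Spark.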
Check the `warning` tag in the JSON output to see if there were any issues with the transcription.
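For comparison, the sketch below shows a stereo transcription with `agentid = true`, where `agentscore` moves into the per-channel `chaninfo` objects, along with a `warning` tag to show where such a message would surface. The exact structure of the `chaninfo` entries and the sample warning text are assumptions made for illustration, and utterance content is omitted for brevity.

```json
{
  "asr": "...",
  "model": "...",
  "chaninfo": [
    { "channel": 0, "agentscore": 0.91 },
    { "channel": 1, "agentscore": 0.08 }
  ],
  "utterances": [],
  "warning": "example: one channel contained little or no speech"
}
```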