Speech API
Speech API triggers audio-file processing in Medallia Speech for the purpose of creating a feedback record in Experience Cloud.
Additionally, there is a Medallia connector that uses this mechanism to fetch data from the Media File Transfer storage bucket for ingestion.
Restrictions and limits
The API supports bulk uploads and can handle up to 1,000 records per request. Additionally, there is a limit to the number of API calls an app can make within a given time period:
- Up to 13,000 requests per minute
- Up to 325,000 requests per 24-hour period
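Because a request can carry at most 1,000 records, a larger upload has to be split across several requests. The following is a minimal sketch of that batching; `chunk_records` and `MAX_RECORDS_PER_REQUEST` are illustrative names and are not part of the Speech API.

```python
from typing import Iterator, List, Sequence

# Per-request record cap stated in the limits above.
MAX_RECORDS_PER_REQUEST = 1000


def chunk_records(
    records: Sequence[dict], size: int = MAX_RECORDS_PER_REQUEST
) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


if __name__ == "__main__":
    # 2,500 placeholder records are split into batches of 1000, 1000, and 500.
    demo = [{"call_identifier": str(i)} for i in range(2500)]
    print([len(batch) for batch in chunk_records(demo)])
```

Each batch sent this way counts as one request against the per-minute and 24-hour limits.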
Authentication and authorization
Authentication identifies who is making an API request, and authorization identifies what data the requester may access. OAuth is an industry standard for authorizing limited access to services and data. Applications must obtain a secure token that identifies the application that makes the request. The token is passed to the resource server (API server) with each API request. For more information, see Authenticate APIs with OAuth.
- The application must have an account.
- The account's role must have permission to access the API.
- API access is authenticated with OAuth. To use OAuth, the application must first obtain an OAuth access token, by requesting one for the application's client ID and secret. For detailed information, see Authenticating APIs with OAuth.
Request/response formats
- A required Content-Type header field describes the content. The acceptable type is application/json for JSON.
- An optional Accept header field tells the Speech API how to format the response. The acceptable type is application/json for JSON.
Parameter | Description | Required | Values |
---|---|---|---|
Bearer | Access token | Required | See Authentication and authorization. |
Content-Type | Format of request data | Required | application/json |
Accept | Format of response data | Optional | application/json |
- No other header fields are expected.
- The request URL: always use the base instance for the company's Medallia installation.
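The header requirements above can be sketched with Python's standard library. This is an illustration under assumptions, not a definitive client: `build_request` is a hypothetical helper, the access token is assumed to travel in a standard OAuth 2.0 `Authorization: Bearer` header, and the URL is left to the caller because it depends on your company's instance.

```python
import json
import urllib.request
from typing import List


def build_request(url: str, access_token: str, records: List[dict]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying `records` as a JSON array."""
    headers = {
        # Assumption: the token is sent as a standard OAuth 2.0 bearer header.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",  # required
        "Accept": "application/json",        # optional
    }
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The prepared request can then be sent with `urllib.request.urlopen(request)`; no other header fields are added, in line with the list above.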
-
URL and endpoint
The Speech API accesses resources from a URL that follows this format:
- api-host is the server for your company's Medallia Experience Cloud instance. For detailed information about identifying the host, see API hosts.
- service is speech.
- api-version is v0.
- endpoint is bulk-ingest.
POST body
This API is used to provide context for voice signals by sending metadata associated with the signals (audio files) in the body of the HTTP POST request.
The request body should be encoded in a JSON array of objects, with each object containing keys and values that match the parameters shown in the following table:
Parameter | Description | Type | Required | Notes |
---|---|---|---|---|
call_identifier | A unique record identifier, encoded as a JSON string. | String | Required | It can be the Universal Call ID (UCID) or some other similar tracked value. Important: Make sure you can track this parameter, since it can later be used as an external ID for the file during import or export operations. For example: 12345-67890-1234567890. |
speech_file_name | Name of the audio file associated with the record, encoded as a JSON string. | String | Required | S3 supports the use of forward slashes in file names to simulate folders. If your company uses this feature, you must include the full path in the Speech File Name. For example: audio/2020-07-03!1000/T15996_A.wav. |
unit_identifier | ID of the agent that handled the call (typically the last agent the customer is transferred to, if there are multiple agents), encoded as a JSON string. | String | Optional | This must match the ID that is included in the organizational hierarchy for the agent. Note: You can supplement with additional unit fields through the transfer of custom metadata. For example, if your company is using Apps (formerly known as Best Practice Packages) that have a different unit field, you can include that field as metadata. |
call_date_and_time | Date and time of the interaction, encoded as a JSON string of an ISO-8601 timestamp. | Datetime | Required | Format is yyyy-MM-dd HH:mm:ssZZ (e.g. 2016-01-01 11:30:00-0800). |
engine | The speech-to-text transcription engine to use for the call, encoded as a JSON string. | String | Required | The default value is "Engine1". The accepted values are: |
call_recording_url | URL to an external resource of the call interaction recording, encoded as a JSON string. | String | Optional | This is typically used to reference back to the source (third-party) system. Note: This URL is not used to download the call recording. It is intended as a clickable link from Medallia Reporting to the source system. |
vertical_model | Medallia Speech vertical model to use for analyzing the call contents, encoded as a JSON string. | String | Optional | The default value is "Call Center". The accepted value is "Call Center". |
locale | Primary language spoken by the customer during the call, encoded as a JSON string of ISO 639-1 values. | String | Optional | The default value is "en-US". The accepted ISO 639-1 values are: Note: When the engine is Engine1 and the field agent_locale has a value, locale is used for the customer channel and agent_locale for the agent channel. Otherwise, locale is used as the media language for the entire file. |
agent_locale | Primary language spoken by the agent during the call, encoded as a JSON string of ISO 639-1 values. | String | Optional | The default and accepted values are the same as those for locale. Note: This parameter is available only when the engine is "Engine1". |
apply_diarization | Boolean that determines whether diarization needs to be applied to the audio file during processing, encoded as a JSON string. | String | Optional | Diarization presumes two people are speaking, and separates mono audio recordings into distinct channels by categorizing speech into two groups. So, this setting only applies to mono-channel recordings, which need to get diarized. The default value is "No". The accepted values are: |
agent_channel | Determines which of the 2 channels (0 or 1) is associated with the agent. The other channel is associated with the customer. Note: The initiator of a call is assigned to channel 0. For inbound calls, set the agent channel to 1. For outbound calls, set the agent channel to 0. | String | Optional | Must be mapped during Auto-Importer processing. Confirm how your telephony system records data to audio channels to properly set this value. The default value is "0". The allowed values are: |
substitutions | The set of transcription substitutions to make during processing, encoded as a JSON object of key/value pairs. Substitutions can correct errors in transcripts using substitution rules that find and replace transcription errors with corrected values. | Substitution data object | Optional | The format is that of a JSON object, where the keys are the original versions to find and the values are the replacement versions. See the example below for proper formatting: {"appeal box":"a PO box","triple A batteries":"AAA batteries"} Important: Substitution rules are processed as part of the call made to the Speech API, and therefore cannot be applied to historical data. If you need to apply new substitution rules to data already transcribed by Speech, you must resend the associated audio file through the API. |
apply_redaction | Boolean that determines whether redaction is performed on the audio and its transcription, encoded as a JSON string. | String | Optional | By default, if no value is set, redaction is set to "Yes". Restriction: Redaction is not available for the Amazon Transcribe engine. The allowed values are: Note: For security purposes, Medallia Speech automatically redacts credit card numbers, Social Security Numbers, and street addresses from the transcription and playback audio. If your company wishes to keep that information visible in Experience Cloud, set "apply_redaction": "No" as part of the transcription API request. |
first_name | First name of the customer, encoded as a JSON string. | String | Optional | Only required if a followup survey is being sent for the Contact Center interaction, since this would be necessary for the email invitation. |
last_name | Last name of the customer, encoded as a JSON string. | String | Optional | Only required if a followup survey is being sent for the Contact Center interaction, since this would be necessary for the email invitation. |
Email address of the customer, encoded as a JSON string. | String | Optional | Only required if a followup survey is being sent for the Contact Center interaction, since this would be necessary for the email invitation. | |
phone_number | Phone number of the customer, encoded as a JSON string. | String | Optional | This allows closed-loop feedback processes to have the customer phone number available when applicable. It can be based on the ANI (Automatic Number Identification). |
connection_id | Unique identifier of the connection profile. For more information see Implement Speech. | String | Optional | This property is set automatically when you create a new connection profile. |
connector_id | Unique identifier of a specific Speech Connector API as configured in Medallia Admin Suite. | String | Optional. Important: When using the Speech API, if one or more Speech API type connectors are configured, this parameter is required. In this scenario, the API fails if the connector_id is not provided. | If set, the Medallia Speech data in the API request is routed through the connector for processing, including extra metadata available in speech_additional_info. |
speech_additional_info | Additional information specific to each speech vendor. Use this parameter to send additional call audio metadata. | Information data object | Optional | See the example below for proper formatting: {"queue_name":"Bank","queue_id":"12","direction":"Inbound","skill":"Bank","agent_first_name":"Gordon","agent_last_name":"Gekko"} This option enables clients to augment the default set of call metadata. Restriction: Use of this field requires setting a connector_id. |
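As an illustration of the table above, the sketch below formats a timestamp in the yyyy-MM-dd HH:mm:ssZZ layout and checks a record for the four parameters marked Required before it is sent. The helper names (`format_call_time`, `validate_record`) are hypothetical, the checks cover only rules stated in the table, and the server remains the authority on what is valid.

```python
from datetime import datetime, timedelta, timezone

# Parameters marked "Required" in the table above.
REQUIRED_FIELDS = ("call_identifier", "speech_file_name", "call_date_and_time", "engine")


def format_call_time(moment: datetime) -> str:
    """Render a timezone-aware datetime as yyyy-MM-dd HH:mm:ssZZ."""
    if moment.tzinfo is None:
        raise ValueError("call time must carry a UTC offset")
    return moment.strftime("%Y-%m-%d %H:%M:%S%z")


def validate_record(record: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not record.get(name)
    ]
    # The table states that speech_additional_info requires a connector_id.
    if "speech_additional_info" in record and not record.get("connector_id"):
        problems.append("speech_additional_info requires connector_id")
    return problems


# Example record built from the sample values quoted in the table.
record = {
    "call_identifier": "12345-67890-1234567890",
    "speech_file_name": "audio/2020-07-03!1000/T15996_A.wav",
    "call_date_and_time": format_call_time(
        datetime(2016, 1, 1, 11, 30, tzinfo=timezone(timedelta(hours=-8)))
    ),
    "engine": "Engine1",
    "apply_redaction": "No",
    "substitutions": {"appeal box": "a PO box"},
}
```

The request body is a JSON array, so a single record is still sent wrapped in a list, for example `json.dumps([record])`.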
Response
The API is synchronous; the response is a JSON object that includes:
Element | Description | Type | Notes |
---|---|---|---|
job_id | UUID of the transcription job. | String | Medallia recommends storing this value for troubleshooting and auditing purposes. |
status | The overall status of the processing of the request. This value represents whether the basic requirements were met to accept the file for processing; it does not indicate that the transcription will succeed. | String | Values: Note: A status of ACCEPTED or REJECTED means all the call entities provided in the request take on that status. A status of PARTIALLY_ACCEPTED means that there is a difference in status on particular call entities in the request, and the details array should be parsed for further status on each. |
details | An array of details related to the call entities from the request. | File processing data object | This element is only returned when a specific file or several files could not be processed (when the status is PARTIALLY_ACCEPTED). See Error handling. |
call_identifier | Unique record identifier. | String | This value is used to associate the response in the details array with the Medallia Speech API response details. |
speech_file_name | Filename of the audio file on the Medallia Media File Transfer system that is associated with the record. | String | - |
status | Status of the record processing. | String | Values: |
error_message | Brief and human-readable description of the error that occurred. | String | - |
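A short sketch of reading such a response follows. It assumes that call_identifier, speech_file_name, status, and error_message are fields of each entry in the details array, and that an entry carrying an error_message did not pass; `failed_records` and the sample body (including its error text) are hypothetical.

```python
from typing import List, Tuple


def failed_records(response: dict) -> List[Tuple[str, str]]:
    """Return (call_identifier, error_message) pairs found in the details array."""
    failures = []
    for entry in response.get("details") or []:
        # Assumption: an entry that carries an error_message did not pass.
        message = entry.get("error_message")
        if message:
            failures.append((entry.get("call_identifier", ""), message))
    return failures


# Hypothetical response body shaped after the table above. Per-record status
# values are not listed in this document, so they are omitted here.
sample = {
    "job_id": "00000000-0000-0000-0000-000000000000",
    "status": "PARTIALLY_ACCEPTED",
    "details": [
        {"call_identifier": "A1", "speech_file_name": "a1.wav"},
        {
            "call_identifier": "B2",
            "speech_file_name": "b2.wav",
            "error_message": "file not found",
        },
    ],
}
```

Calling `failed_records(sample)` returns the single pair for record B2, which can be logged together with job_id for troubleshooting.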
Sample requests
The following samples show how to format the body of the request when using the Speech API to transfer call data.
Sample request - Speech API call with one record, all fields
Sample response - Accepted
Sample request - Speech API call with 6 records
Sample response - Accepted
Sample request - Speech API call with 5 records
Sample response - Partially accepted
Error handling
1. Client problems, such as a rate-limited or unauthorized request (4xx HTTP codes).
2. The body of the request fails internal validation (syntax, formatting, etc.).
3. The user-supplied parameters or context are bad (for example, the specified file cannot be found in the storage system).
4. One or more Speech API type connectors are configured, but you have not provided a connector_id.
For errors 1 and 2 above, the application gets an HTTP error, so the response won't be a Speech API response because the error happened before processing the request.
Sample request - Speech API call with one record - Invalid payload
Sample response - Status 400 - Bad request
For error 3, the response has an HTTP 200 code and contains a mix of file data and error messages, because the API tries to process the request and returns details when something goes wrong.
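The split between request-level and record-level failures can be sketched as follows. This is a hedged illustration: `RequestRejected` and `records_to_resend` are hypothetical names, any status other than 200 is treated as a request-level failure, and a details entry is assumed to have failed when it carries an error_message.

```python
from typing import List, Optional


class RequestRejected(Exception):
    """Raised when the HTTP status shows the request was not processed."""


def records_to_resend(
    http_status: int, response: Optional[dict], sent: List[dict]
) -> List[dict]:
    """Pick the records from `sent` that need attention after a call."""
    if http_status != 200:
        # Request-level failure: there is no Speech API response to inspect.
        raise RequestRejected(f"HTTP {http_status}: fix the request before retrying")
    response = response or {}
    status = response.get("status")
    if status == "ACCEPTED":
        return []
    if status == "REJECTED":
        # Every call entity in the request takes on the overall status.
        return list(sent)
    # Otherwise match failed details entries back to the records that were sent.
    failed_ids = {
        entry.get("call_identifier")
        for entry in response.get("details") or []
        if entry.get("error_message")
    }
    return [rec for rec in sent if rec.get("call_identifier") in failed_ids]
```

Keeping the sent records keyed by call_identifier makes it possible to correct and resubmit only the affected files instead of the whole batch.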
Sample request - Speech API call with one record - Missing file
Sample response - Status 200 - OK
For error 4, the Speech API fails with the following if the connector_id is not provided:
Speech API payload examples
The following samples show how to format the body of the request when using the Speech API in different contexts.
Speech API Payload | Speech API payload + Connection ID profiles in Setup | Speech API payload for connectors + metadata |
---|---|---|
Loaded via Auto Importer. | Loaded via Auto Importer. | Metadata added to payload via speech_additional_info mapped to fields through connector data mappings. Tip: While engine, vertical model, locales, diarization, redactions, substitutions and other values can be defined, there is currently no merge mechanism between the Speech API payload and connector settings, so Experience Cloud uses the values defined in the Speech API payload if present. |