Speech API

Speech API triggers audio-file processing in Medallia Speech for the purpose of creating a feedback record in Experience Cloud.

Once audio files have been uploaded to Experience Cloud, use this API to trigger file processing in Medallia Speech, which will create a feedback record for each call.
Note: To use this API, customers must have already transferred the call recording files to an intermediary storage system between their network and the Experience Cloud instance. Medallia recommends that you use a Medallia Media File Transfer storage bucket for this purpose.

Additionally, there is a Medallia connector that uses this mechanism to fetch data from the Media File Transfer storage bucket for ingestion.

Restrictions and limits

The API supports bulk uploads and can handle up to 1,000 records per request. Additionally, there is a limit to the number of API calls an app can make within a given time period:

  • Up to 13,000 requests per minute

  • Up to 325,000 requests per 24-hour period

Important: The timeout for a request is 180 seconds, and the API Gateway discards any request that takes longer. Processing can continue past 180 seconds, so check in reports whether the files have been processed before sending another request.
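A client with more than 1,000 call records needs to split them across several requests. The following is a minimal sketch of that batching step; the function name chunk_records and the sample records are illustrative and not part of the Speech API.

```python
from typing import Iterator, List

# Per-request record limit stated in "Restrictions and limits".
MAX_RECORDS_PER_REQUEST = 1000


def chunk_records(records: List[dict], size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices of `records`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Illustrative data: 2,500 placeholder records become three request bodies.
records = [{"call_identifier": f"call-{i}"} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunk_records(records)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each yielded batch can then be serialized as the JSON array for one POST request.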

Authentication and authorization

Authentication identifies who is making an API request, and authorization identifies what data the requester may access. OAuth is an industry standard for authorizing limited access to services and data. Applications must obtain a secure token that identifies the application that makes the request. The token is passed to the resource server (API server) with each API request. For more information, see Authenticate APIs with OAuth.

To use the Speech API:
  1. The application must have an account.

  2. The account's role must have permission to access the API.

  3. API access is authenticated with OAuth. To use OAuth, the application must first obtain an OAuth access token by requesting one with the application's client ID and secret. For detailed information, see Authenticating APIs with OAuth.
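As a rough illustration of step 3, the sketch below builds (but does not send) a token request in the style of the standard OAuth 2.0 client credentials grant. The token URL is a placeholder, and sending the client ID and secret in an HTTP Basic header is an assumption; the OAuth documentation referenced above is the authority for the real endpoint and parameters.

```python
import base64
import urllib.parse
import urllib.request

# Placeholder URL (assumption): take the real token endpoint from the OAuth documentation.
TOKEN_URL = "https://instance.example.com/oauth/token"


def build_token_request(client_id: str, client_secret: str,
                        token_url: str = TOKEN_URL) -> urllib.request.Request:
    """Build, without sending, a client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            # Assumption: credentials travel in an HTTP Basic Authorization header.
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


token_request = build_token_request("my-client-id", "my-client-secret")
print(token_request.get_method(), token_request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the access token would then be read from the JSON response.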

Request/response formats

Requests to the Medallia Speech API use the HTTP POST method. Each request includes:
  • A required Content-Type header field that describes the request content. The accepted type is application/json.

  • An optional Accept header field that tells the Speech API how to format the response. The accepted type is application/json.

    Parameter | Description | Required | Values
    Bearer | Access token | Required | See Authentication and authorization.
    Content-Type | Format of request data | Required | application/json
    Accept | Format of response data | Optional | application/json
  • No other header fields are expected.

  • The request URL:

    • Always use the base instance for the company's Medallia installation.
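The header fields described above could be assembled as in the sketch below. It assumes the "Bearer" entry refers to the standard Authorization: Bearer <token> header; the function name build_headers and the token value are illustrative.

```python
from typing import Dict


def build_headers(access_token: str, include_accept: bool = True) -> Dict[str, str]:
    """Return the header fields for a Speech API request."""
    if not access_token:
        raise ValueError("access_token is required")
    headers = {
        # Assumption: the access token is sent as a standard OAuth bearer token.
        "Authorization": f"Bearer {access_token}",
        # Required: the request body is JSON.
        "Content-Type": "application/json",
    }
    if include_accept:
        # Optional: ask for a JSON response.
        headers["Accept"] = "application/json"
    return headers


headers = build_headers("example-token")
print(sorted(headers))  # ['Accept', 'Authorization', 'Content-Type']
```

No other header fields are added, matching the statement that none are expected.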

URL and endpoint

The Speech API accesses resources from a URL that follows this format:

https://<api-host>/<service>/<api-version>/<endpoint>
Where:
  • api-host is the server for your company's Medallia Experience Cloud instance. For detailed information about identifying the host, see API hosts.

  • service is speech.

  • api-version is v0.

  • endpoint is bulk-ingest.
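Putting the four parts together, a small helper can produce the full request URL. This is a sketch: the function name build_endpoint_url is illustrative, and the host shown is the placeholder used in the sample requests in this document.

```python
def build_endpoint_url(api_host: str,
                       service: str = "speech",
                       api_version: str = "v0",
                       endpoint: str = "bulk-ingest") -> str:
    """Compose https://<api-host>/<service>/<api-version>/<endpoint>."""
    host = api_host.strip().rstrip("/")
    if not host or "://" in host:
        raise ValueError("api_host must be a bare host name, without a scheme")
    return f"https://{host}/{service}/{api_version}/{endpoint}"


url = build_endpoint_url("instance.apis.medallia.com")
print(url)  # https://instance.apis.medallia.com/speech/v0/bulk-ingest
```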

POST body

This API is used to provide context for voice signals by sending metadata associated with the signals (audio files) in the body of the HTTP POST request.

Note: The following table lists the parameters that can be sent for each call record. Medallia recommends that you include these additional metadata values in your API request to ensure your audio is being transcribed more precisely to meet your business needs.

The request body must be encoded as a JSON array of objects, with each object containing keys and values that match the parameters shown in the following table:

Parameter | Description | Type | Required | Notes
call_identifier | A unique record identifier, encoded as a JSON string. | String | Required | It can be the Universal Call ID (UCID) or some other similar tracked value.
Important: Make sure you can track this parameter, since it can later be used as an external ID for the file during import or export operations.
For example: 12345-67890-1234567890.
speech_file_name | Name of the audio file associated with the record, encoded as a JSON string. | String | Required | S3 supports the use of forward slashes in file names to simulate folders. If your company uses this feature, you must include the full path in the Speech File Name. For example: audio/2020-07-03!1000/T15996_A.wav.
unit_identifier | ID of the agent that handled the call (typically the last agent the customer is transferred to, if there are multiple agents), encoded as a JSON string. | String | Optional | This must match the ID that is included in the organizational hierarchy for the agent.
Note: You can supplement with additional unit fields through the transfer of custom metadata. For example, if your company is using Apps (formerly known as Best Practice Packages) that have a different unit field, you can include that field as metadata.
call_date_and_time | Date and time of the interaction, encoded as a JSON string of an ISO-8601 timestamp. | Datetime | Required | Format is yyyy-MM-dd HH:mm:ssZZ (e.g. 2016-01-01 11:30:00-0800).
engine | The speech-to-text transcription engine to use for the call, encoded as a JSON string. | String | Required | The default value is "Engine1".
The accepted values are:
  • Engine1: Voci engine

  • Engine2: Amazon Transcribe engine

  • Engine3: Speechmatics engine

call_recording_url | URL to an external resource of the call interaction recording, encoded as a JSON string. | String | Optional | This is typically used to reference back to the source (third-party) system.
Note: This URL is not used to download the call recording. It is intended as a clickable link from Medallia Reporting to the source system.
vertical_model | Medallia Speech vertical model to use for analyzing the call contents, encoded as a JSON string. | String | Optional | The default value is "Call Center". The accepted value is "Call Center".

locale | Primary language spoken by the customer during the call, encoded as a JSON string of ISO 639-1 values. | String | Optional | The default value is "en-US".
The accepted ISO 639-1 values are:
  • en-US
  • en-GB
  • en-AU
  • es-US
  • es-ES
  • es-MX
  • fr-CA
  • fr-FR
  • de-DE
  • it-IT
  • pt-BR
  • sv-SE (only available for Engine2)
  • el-GR (only available for Engine2)
  • ko-KR (only available for Engine2)
  • zh-TW (only available for Engine2)

Note: When the engine is Engine1 and the agent_locale field has a value, locale is used for the customer channel and agent_locale for the agent channel. Otherwise, locale is used as the media language for the entire file.
agent_locale | Primary language spoken by the agent during the call, encoded as a JSON string of ISO 639-1 values. | String | Optional | The default and accepted values are the same as those for locale.
Note: This parameter is available only when the engine is "Engine1".
apply_diarization | Boolean that determines whether diarization needs to be applied to the audio file during processing, encoded as a JSON string. | String | Optional | Diarization presumes two people are speaking, and separates mono audio recordings into distinct channels by categorizing speech into two groups. This setting therefore applies only to mono-channel recordings. The default value is "No".

The accepted values are:
  • Yes

  • No

agent_channel | Determines which of the two channels (0 or 1) is associated with the agent. The other channel is associated with the customer. Note: The initiator of a call is assigned to channel 0. For inbound calls, set the agent channel to 1. For outbound calls, set the agent channel to 0. | String | Optional | Must be mapped during Auto-Importer processing. Confirm how your telephony system records data to audio channels to properly set this value. The default value is "0".

The allowed values are:
  • 0

  • 1

substitutions | The set of transcription substitutions to make during processing, encoded as a JSON object of key/value pairs. Substitutions can correct errors in transcripts using substitution rules that find and replace transcription errors with corrected values. | Substitution data object | Optional | The format is that of a JSON object, where the keys are the original versions to find and the values are the replacement versions.

See the example below for proper formatting:

{"appeal box":"a PO box","triple A batteries":"AAA batteries"}

Important: Substitution rules are processed as part of the call made to the Speech API, and therefore cannot be applied to historical data. If you need to apply new substitution rules to data already transcribed by Speech, you must resend the associated audio file through the API.
apply_redaction | Boolean that determines whether redaction is performed on the audio and its transcription, encoded as a JSON string. | String | Optional | By default, if no value is set, redaction is set to "Yes".

Restriction: Redaction is not available for the Amazon Transcribe engine.
The allowed values are:
  • Yes

  • No

Note: For security purposes, Medallia Speech automatically redacts credit card numbers, Social Security Numbers, and street addresses from the transcription and playback audio. If your company wishes to keep that information visible in Experience Cloud, set "apply_redaction": "No" as part of the transcription API request.
first_name | First name of the customer, encoded as a JSON string. | String | Optional | Only required if a follow-up survey is being sent for the Contact Center interaction, since this would be necessary for the email invitation.
last_name | Last name of the customer, encoded as a JSON string. | String | Optional | Only required if a follow-up survey is being sent for the Contact Center interaction, since this would be necessary for the email invitation.
email | Email address of the customer, encoded as a JSON string. | String | Optional | Only required if a follow-up survey is being sent for the Contact Center interaction, since this would be necessary for the email invitation.
phone_number | Phone number of the customer, encoded as a JSON string. | String | Optional | This allows closed-loop feedback processes to have the customer phone number available when applicable. It can be based on the ANI (Automatic Number Identification).
connection_id | Unique identifier of the connection profile. For more information, see Implement Speech. | String | Optional | This property is set automatically when you create a new connection profile.
connector_id | Unique identifier of a specific Speech Connector API as configured in Medallia Admin Suite. | String | Optional |
Important: When using the Speech API, if one or more Speech API type connectors are configured, this parameter is required. In this scenario, the API fails if the connector_id is not provided.
If set, the Medallia Speech data in the API request is routed through the connector for processing, including extra metadata available in speech_additional_info.
speech_additional_info | Additional information specific to each speech vendor. Use this parameter to send additional call audio metadata. | Information data object | Optional | See the example below for proper formatting:

{"queue_name":"Bank","queue_id":"12","direction":"Inbound","skill":"Bank","agent_first_name":"Gordon","agent_last_name":"Gekko"}

This option enables clients to augment the default set of call metadata.

Restriction: Use of this field requires setting a connector_id.
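Several of the rules in the table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not the validation the service itself performs; the function name validate_record and the wording of the messages are illustrative.

```python
from typing import Dict, List

# Parameters marked Required in the table above.
REQUIRED_FIELDS = ("call_identifier", "speech_file_name", "call_date_and_time", "engine")
ENGINES = {"Engine1", "Engine2", "Engine3"}
# Locales the table lists as available only for Engine2.
ENGINE2_ONLY_LOCALES = {"sv-SE", "el-GR", "ko-KR", "zh-TW"}
YES_NO_FIELDS = ("apply_diarization", "apply_redaction")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems found in one call record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    engine = record.get("engine")
    if engine is not None and engine not in ENGINES:
        problems.append(f"unknown engine: {engine}")
    locale = record.get("locale")
    if locale in ENGINE2_ONLY_LOCALES and engine != "Engine2":
        problems.append(f"locale {locale} is only available for Engine2")
    if "agent_locale" in record and engine != "Engine1":
        problems.append("agent_locale is only available for Engine1")
    for field in YES_NO_FIELDS:
        if field in record and record[field] not in ("Yes", "No"):
            problems.append(f"{field} must be \"Yes\" or \"No\"")
    if record.get("agent_channel", "0") not in ("0", "1"):
        problems.append("agent_channel must be \"0\" or \"1\"")
    if "speech_additional_info" in record and not record.get("connector_id"):
        problems.append("speech_additional_info requires connector_id")
    return problems


good = {
    "call_identifier": "0696e114-b819-11ea-b3de-0242ac130004",
    "speech_file_name": "T15584.wav",
    "call_date_and_time": "2020-06-24T13:02:00-03:00",
    "engine": "Engine1",
}
bad = dict(good, locale="sv-SE", speech_additional_info={"queue_name": "Bank"})
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a record at once.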

Response

The API is synchronous; the response is a JSON object that includes:

Element | Description | Type | Notes
job_id | UUID of the transcription job. | String | Medallia recommends storing this value for troubleshooting and auditing purposes.
status | The overall status of the processing of the request. This value represents whether the basic requirements were met to accept the file for processing; it does not indicate that the transcription will succeed. | String | Values:
  • ACCEPTED

  • PARTIALLY_ACCEPTED

  • REJECTED

Note: A status of ACCEPTED or REJECTED means all the call entities provided in the request take on that status.  A status of PARTIALLY_ACCEPTED means that there is a difference in status on particular call entities in the request, and the details array should be parsed for further status on each.
details | An array of details related to the call entities from the request. | File processing data object |

This element is only returned when a specific file or several files could not be processed (when the status is PARTIALLY_ACCEPTED).

See Error handling.
call_identifier | Unique record identifier. | String | This value is used to associate the response in the details array with the Medallia Speech API response details.
speech_file_name | Filename of the audio file on the Medallia Media File Transfer system that is associated with the record. | String |
status | Status of the record processing. | String | Values:
  • ACCEPTED

  • REJECTED

error_message | Brief and human-readable description of the error that occurred. | String |
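A caller typically needs the job_id for auditing and the list of rejected records for follow-up. The sketch below reads both from a response body; the helper name rejected_records is illustrative, and the sample data is a shortened version of a partially accepted response.

```python
import json
from typing import Dict, List


def rejected_records(response: Dict[str, object]) -> List[Dict[str, str]]:
    """Return the entries of the details array whose status is REJECTED."""
    details = response.get("details") or []
    return [entry for entry in details if entry.get("status") == "REJECTED"]


response_text = """
{ "job_id": "377533aa-d342-4a6a-8af3-724a91587bb1",
  "status": "PARTIALLY_ACCEPTED",
  "details": [
    { "call_identifier": "cb80e932-9a86-45c5-af83-4a0441daca3b",
      "speech_file_name": "T15560A.wav", "status": "ACCEPTED" },
    { "call_identifier": "e5265198-4a43-4e17-8f7e-60a0c8146401",
      "speech_file_name": "T15560C.wav", "status": "REJECTED",
      "error_message": "File T15560C.wav does not exist" } ] }
"""

response = json.loads(response_text)
failures = rejected_records(response)
print(response["job_id"], response["status"])
for entry in failures:
    print(entry["call_identifier"], entry["error_message"])
```

The function returns an empty list when the details array is absent, as in a fully accepted response.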

Sample requests

The following samples show how to format the body of the request when using the Speech API to transfer call data.

Sample request - Speech API call with one record, all fields

POST https://instance.apis.medallia.com/speech/v0/bulk-ingest
Content-Type: application/json
[ { "call_identifier": "0696e114-b819-11ea-b3de-0242ac130004", "speech_file_name": "T15584.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1", "call_recording_url": "https://www.youtube.com/watch?v=6CcCyYSMwnQ", "vertical_model": "Call Center", "locale": "en-US", "agent_locale": "en-US", "apply_diarization": "No", "agent_channel": "0", "substitutions": {"appeal box":"a PO box","triple A batteries":"AAA batteries"}, "apply_redaction": "Yes", "first_name": "Janelle", "last_name": "Perry", "email": "janelle.perry@mail.com", "phone_number": "555-555-5555", "connection_id": "79589f06-ad70-4621-bdf3-37ef1c693ff0", "connector_id": "68478g15-be61-3512-ceg2-26de2b782gg1", "speech_additional_info": {"queue_name":"Bank","queue_id":"12","direction":"Inbound","skill":"Bank","agent_first_name":"Gordon","agent_last_name":"Gekko"} } ]

Sample response - Accepted

{ "job_id": "713c865e-0d1f-43a0-9998-5fada657850b", "status": "ACCEPTED" }

Sample request - Speech API call with 6 records

POST https://instance.apis.medallia.com/speech/v0/bulk-ingest
Content-Type: application/json
[ { "call_identifier": "8a98e8f7-f815-4247-90da-57ec53da6c50", "speech_file_name": "T15987A.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1", "call_recording_url": "https://www.youtube.com/watch?v=6CcCyYSMwnQ", "apply_redaction": "Yes", "substitutions": {"sub1":"subA"}, "locale": "en-US", "agent_locale": "en-US" }, { "call_identifier": "7f06bf84-98fc-4776-8617-033022819c9c", "speech_file_name": "T15987B.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1", "apply_redaction": "Yes", "first_name": "Janelle", "last_name": "Perry" }, { "call_identifier": "0e04a5a9-bab2-4644-9aab-04e887213e50", "speech_file_name": "T15987C.wav", "unit_identifier": "svc_tech_103", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1", "email": "janelle.perry@mail.com", "phone_number": "555-555-5555" }, { "call_identifier": "504b4661-30f5-46f3-b833-c008d5c9b8c6", "speech_file_name": "T15987D.wav", "unit_identifier": "svc_tech_103", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1", "locale": "en-US", "agent_channel": "0", "substitutions": null, "apply_redaction": "Yes" }, { "call_identifier": "e7cd7a2b-fd51-4028-99db-4bc13aa85ffd", "speech_file_name": "T15987E.wav", "unit_identifier": "svc_tech_103", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1", "call_recording_url": "https://www.youtube.com/watch?v=6CcCyYSMwnQ", "vertical_model": "Call Center" }, { "call_identifier": "75285c45-0675-4325-95a8-867a1575f074", "speech_file_name": "T15987F.wav", "unit_identifier": "svc_tech_103", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2", "call_recording_url": "https://www.youtube.com/watch?v=6CcCyYSMwnQ", "vertical_model": "Call Center", "apply_diarization": "No", "apply_redaction": "Yes", "email": "janelle.perry@mail.com" } ]

Sample response - Accepted

{ "job_id": "84c18904-b2a2-445b-88c2-efe8ccb0cab9", "status": "ACCEPTED" }

Sample request - Speech API call with 5 records

POST https://instance.apis.medallia.com/speech/v0/bulk-ingest
Content-Type: application/json
[ { "call_identifier": "cb80e932-9a86-45c5-af83-4a0441daca3b", "speech_file_name": "T15560A.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2" }, { "call_identifier": "6ff2ce14-3be8-43ba-949b-f00254a7be3f", "speech_file_name": "T15560B.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2" }, { "call_identifier": "e5265198-4a43-4e17-8f7e-60a0c8146401", "speech_file_name": "T15560C.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2" }, { "call_identifier": "f2796561-9c7e-4825-bee3-769a954e1c66", "speech_file_name": "T15560D.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2" }, { "call_identifier": "1a87c03c-d148-4b70-b745-eeca94b1cc0b", "speech_file_name": "T15560E.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1" } ]

Sample response - Partially accepted

{ "job_id": "377533aa-d342-4a6a-8af3-724a91587bb1", "status": "PARTIALLY_ACCEPTED", "details": [ { "call_identifier": "cb80e932-9a86-45c5-af83-4a0441daca3b", "speech_file_name": "T15560A.wav", "status": "ACCEPTED" }, { "call_identifier": "6ff2ce14-3be8-43ba-949b-f00254a7be3f", "speech_file_name": "T15560B.wav", "status": "ACCEPTED" }, { "call_identifier": "e5265198-4a43-4e17-8f7e-60a0c8146401", "speech_file_name": "T15560C.wav", "status": "REJECTED", "error_message": "File T15560C.wav does not exist" }, { "call_identifier": "f2796561-9c7e-4825-bee3-769a954e1c66", "speech_file_name": "T15560D.wav", "status": "ACCEPTED" }, { "call_identifier": "1a87c03c-d148-4b70-b745-eeca94b1cc0b", "speech_file_name": "T15560E.wav", "status": "REJECTED", "error_message": "File T15560E.wav does not exist" } ] }

Error handling

There are several types of errors that can happen when calling the API:
  1. Client problems, such as rate limiting or failed authorization (4xx HTTP codes).

  2. The body of the request fails internal validation (syntax, formatting, etc.).

  3. The user-supplied parameters or context are bad (cannot find the specified file in the storage system).

  4. One or more Speech API type connectors are configured, but you have not provided a connector_id.

For errors 1 and 2, the application receives an HTTP error. Because the failure occurs before the request is processed, the response is not a Speech API response.

Sample request - Speech API call with one record - Invalid payload

POST https://instance.apis.medallia.com/speech/v0/bulk-ingest
Content-Type: application/json
[ { "call_identifier": "0696d868-b819-11ea-b3de-0242ac130004", "speech_file_name": "file.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1" ]

Sample response - Status 400 - Bad request

{ "error_type": "invalid_input", "message": "Input request body is not valid", "_allowed": [] }

For error 3, the response has an HTTP 200 status code and contains a mix of file data and error messages, because the API attempts to process the request and returns details when something goes wrong.

Sample request - Speech API call with one record - Missing file

POST https://instance.apis.medallia.com/speech/v0/bulk-ingest
Content-Type: application/json
[ { "call_identifier": "ab8c2994-9359-4269-ba4f-cd4c0d53635e", "speech_file_name": "T15525.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine1" } ]

Sample response - Status 200 - OK

{ "job_id": "68a49d2d-71ba-4dda-a27b-85d17039d2cc", "status": "REJECTED", "details": [ { "call_identifier": "ab8c2994-9359-4269-ba4f-cd4c0d53635e", "speech_file_name": "T15525.wav", "status": "REJECTED", "error_message": "File T15525.wav does not exist" } ] }
Note: The service does not check whether duplicate parameters are sent in the body of a request or across several requests. If the same request is sent more than once, every copy is accepted.
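Because the service accepts duplicates, a client that wants to avoid resubmitting a call can filter its own batch first. The sketch below keeps the first record for each call_identifier; treating call_identifier as the duplicate key is an assumption, and the function name drop_duplicates is illustrative.

```python
from typing import Dict, List


def drop_duplicates(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep the first record seen for each call_identifier, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record.get("call_identifier")
        if key is not None and key in seen:
            continue  # a record with this identifier is already in the batch
        if key is not None:
            seen.add(key)
        unique.append(record)
    return unique


batch = [
    {"call_identifier": "a", "speech_file_name": "a.wav"},
    {"call_identifier": "b", "speech_file_name": "b.wav"},
    {"call_identifier": "a", "speech_file_name": "a-again.wav"},
]
deduped = drop_duplicates(batch)
print([r["speech_file_name"] for r in deduped])  # ['a.wav', 'b.wav']
```

This only guards a single batch; avoiding duplicates across requests would need the client to remember which identifiers it has already sent.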

For error 4, the Speech API fails with the following if the connector_id is not provided:

{ "job_id": "7b721583-cb4f-48bb-8861-426ed0cf8719", "status": "REJECTED", "details": [ { "call_identifier": "12042024-454004002450966-record-cv20240413001", "speech_file_name": "Audio/454004002450966.wav", "status": "REJECTED", "error_message": "Send failed; nested exception is org.apache.kafka.common.errors.SaslAuthenticationException: {\"status\":\"invalid_token\"}" } ] }
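The distinction between HTTP-level errors and errors reported inside a 200 response can be captured in a small dispatch function. This is a sketch under the assumption that errors 1 and 2 always surface as a 4xx or 5xx status while errors 3 and 4 arrive as a Speech API response body; the outcome labels are illustrative.

```python
from typing import Dict, Optional


def classify_outcome(http_status: int, body: Optional[Dict[str, object]]) -> str:
    """Map an HTTP status and parsed response body to a coarse outcome label."""
    if http_status >= 400:
        # Errors 1 and 2: the request failed before processing, so the body
        # is not a Speech API response.
        return "http_error"
    status = (body or {}).get("status")
    if status == "ACCEPTED":
        return "accepted"
    if status == "PARTIALLY_ACCEPTED":
        # Some records failed: inspect the details array for each record.
        return "partially_accepted"
    if status == "REJECTED":
        # Errors 3 and 4: every record in the request was rejected.
        return "rejected"
    return "unknown"


print(classify_outcome(400, {"error_type": "invalid_input"}))  # http_error
print(classify_outcome(200, {"job_id": "x", "status": "REJECTED", "details": []}))  # rejected
```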

Speech API payload examples

The following samples show how to format the body of the request when using the Speech API in different contexts.

Speech API payload

Loaded via Auto Importer.
[ { "call_identifier": "cb80e932-9a86-45c5-af83-4a0441daca3b", "speech_file_name": "T15560A.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2", "agent_channel": "1", "apply_redaction": "Yes", "apply_diarization": "No", "locale": "en-US", "substitutions": {"appeal box":"a PO box","triple A batteries":"AAA batteries"} } ]

Speech API payload + Connection ID profiles in Setup

Loaded via Auto Importer.
[ { "call_identifier": "cb80e932-9a86-45c5-af83-4a0441daca3b", "speech_file_name": "T15560A.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2", "agent_channel": "1", "apply_redaction": "Yes", "apply_diarization": "No", "locale": "en-US", "substitutions": {"appeal box":"a PO box","triple A batteries":"AAA batteries"}, "connector_id": "e8274f70-981c-11ed-938e-9f223223dd53" } ]

Speech API payload for connectors + metadata

Metadata added to the payload via speech_additional_info is mapped to fields through connector data mappings.
[ { "call_identifier": "cb80e932-9a86-45c5-af83-4a0441daca3b", "speech_file_name": "T15560A.wav", "unit_identifier": "wm_advisor_1", "call_date_and_time": "2020-06-24T13:02:00-03:00", "engine": "Engine2", "agent_channel": "1", "apply_redaction": "Yes", "apply_diarization": "No", "locale": "en-US", "substitutions": {"appeal box":"a PO box","triple A batteries":"AAA batteries"}, "connector_id": "e8274f70-981c-11ed-938e-9f223223dd53", "speech_additional_info": { "RECORDINGID": "62d53717-5a2e-d006-8e18-32e92cef00", "RECORDINGDATE": "2023-01-19T21:26:44Z", "LINENAME": "SMP_LEGACY_MN_1", "CALLID": "2003735217", "CALLTYPE": "External", "CALLDIRECTION": "Inbound", "STATIONID": "MN0136", "LOCALNAME": "Dacy Hanson", "ASSIGNEDWORKGROUP": "", "STARTTIME": "2023-01-06T21:26:44Z", "REMOTENAME": "GEORGE TWIGG", "ENDTIME": "2023-01-06T21:28:59.23Z", "CALLDURATIONSECONDS": "180", "DNIS": "2172389143", "SKILLSET": "CS_RES_NNE_FIBER", "DISCOTYPE": "Remote Disconnect" } } ]
Tip: Although engine, vertical model, locales, diarization, redaction, substitutions, and other values can be defined, there is currently no merge mechanism between the Speech API payload and connector settings, so Experience Cloud uses the values defined in the Speech API payload if present.