V‑Spark 4.3

V‑Spark 4.3 includes security updates, bug fixes, and user experience improvements. The following release notes contain fixes from multiple iterations of 4.3 because these iterations were released in close proximity. The most recent updates are listed first.

Important:

System administrators must run the vspark-admin script with the core-update parameter on any system that has been upgraded to V‑Spark 4.3.1 from an earlier version. Systems upgraded from 4.3.1 to 4.3.2 or 4.3.3 do not require a core-update.

Refer to the core-update row in the V‑Spark 4.0 release notes for more information about this script.

4.3.3

Updated the retry logic for handling database errors. V‑Spark now retries any database error with an error code between 2000 and 2099 (inclusive), along with the codes 9001, 9002, 9003, and 9006. Previously, V‑Spark retried only a more limited set of 2000-level errors.
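The retry rule can be read as a simple predicate over error codes. The following is a minimal sketch of that rule, not V‑Spark source code; the function and constant names are hypothetical.

```python
# Hypothetical helper illustrating the 4.3.3 retry rule; not V-Spark source code.
EXTRA_RETRYABLE_CODES = frozenset({9001, 9002, 9003, 9006})


def is_retryable_db_error(code: int) -> bool:
    """Return True if the error code is in the set that is now retried."""
    # Codes 2000 through 2099 inclusive, plus the four listed 9000-level codes.
    return 2000 <= code <= 2099 or code in EXTRA_RETRYABLE_CODES


for code in (2006, 2099, 2100, 9003, 9004):
    print(code, is_retryable_db_error(code))
```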
Improved logging functionality. Log messages generated when detecting language model availability now include a URL for the request that generated the message.

Fixes in 4.3.3

Resolved an issue with default sort for search results in the Files View. Search results once again display in descending order of Time by default. This fix resolves a regression first introduced with version 4.3.0. Manual sorting was unaffected by this regression.
Resolved an issue for systems hosted on AWS that caused some folder data to be deleted erroneously. Previously, deleting a folder could cause the JSON and mp3 data in long-term storage for other folders starting with the deleted folder's name to also be deleted.
For example, for systems hosted on AWS, long-term storage data associated with the folder foobar was erroneously deleted along with data in the deleted folder foo. Folder configuration and Elasticsearch and database records were not affected by this issue.
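These notes do not describe the underlying cause. One common way this class of problem arises in object storage is selecting keys by a bare name prefix instead of a delimited path prefix. The sketch below illustrates that general pitfall only; the key layout (`<folder>/<file>`) and the function names are assumptions, and this is not V‑Spark's storage code.

```python
# Illustrative only: the key layout and function names are assumptions.
keys = ["foo/a.json", "foo/a.mp3", "foobar/b.json", "foobar/b.mp3"]


def keys_matching_bare_prefix(all_keys, folder):
    """Risky pattern: the prefix 'foo' also matches keys under 'foobar'."""
    return [k for k in all_keys if k.startswith(folder)]


def keys_matching_folder(all_keys, folder):
    """Safer pattern: include the delimiter so only keys under 'foo/' match."""
    return [k for k in all_keys if k.startswith(folder + "/")]


print(keys_matching_bare_prefix(keys, "foo"))
print(keys_matching_folder(keys, "foo"))
```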

4.3.2

Updated the sorting logic for columns in CSV exports generated from the Files View list. Columns in reports created with the Files View are now sorted consistently and independently of the order shown in the UI.
Columns are sorted alphabetically, from left to right, in two groups. The first group contains general columns like agent id, duration, and other audio record attributes. The second group contains application-related columns, the names of which begin with the app. prefix.
This change ensures that column order will be consistent for every report generated, including paginated reports.
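The two-group ordering can be sketched as follows. This is an illustration of the described behavior rather than the export code itself; the function name and the sample application column names are hypothetical, and whether the real sort is case-sensitive is not stated here.

```python
def order_export_columns(columns):
    """Return general columns first, then app.-prefixed columns, each sorted alphabetically."""
    general = sorted(c for c in columns if not c.startswith("app."))
    app_related = sorted(c for c in columns if c.startswith("app."))
    return general + app_related


# Sample input in an arbitrary UI order; application names are made up.
ui_order = ["duration", "app.Sales.score", "agent id", "app.Churn.score"]
print(order_export_columns(ui_order))
```

Because the result depends only on the set of column names, every page of a paginated report gets the same column order.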

4.3.1

Updated the Elasticsearch package included with V‑Spark to address the Log4j security vulnerability. The voci-config-elasticsearch package included with V‑Spark 4.3.1, and initially released with version 4.3.0-2, allows users to upgrade to any minor version of Elasticsearch 7, with the caveat that versions newer than 7.16.2-1 have not been tested for use with V‑Spark.
The full list of service dependencies tested with various V‑Spark versions is available at the following link: Third-party service QA tests. The versions tested with this release are included with these release notes.
For more information about V‑Spark and the Log4j vulnerability, please refer to the following statement from the Voci support team:
1. Voci V‑Spark does not use Log4j directly, so we are not planning any updates to V‑Spark software for this issue.
2. V‑Spark does have an operational dependency on Elasticsearch. The Elasticsearch team has determined that there is no remote execution risk in version 7.x releases, and that a minor issue with potential DNS-based information exposure is easily mitigated with a basic configuration update. Out of an abundance of caution, Voci advises that the mitigation be performed.
3. The best solution to this issue is an upgrade to Elasticsearch version 7.16.2-1 or greater, which users can perform at any time convenient to them and their teams. Voci does not anticipate any issues with using Elasticsearch versions greater than 7.16.2-1 with V‑Spark, but those versions have not been tested by the V‑Spark Engineering team.
More information about the risks and remediation associated with Log4j and Elasticsearch can be found on Elasticsearch's website.
Increased the timeout threshold for jobs in the report worker queue. Jobs now wait in the report worker queue for a maximum of 1 hour before timing out and being requeued.
Added functionality to escape certain characters in the field names for exported CSV reports. Custom metadata fields in exported CSV reports whose names begin with =, +, -, or @ are prefixed with an apostrophe (') to prevent spreadsheet applications from interpreting these labels as formulas.
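The escaping rule for field names can be sketched as shown below. The function name is hypothetical and the snippet is an illustration of the rule, not the exporter's implementation.

```python
# Characters that spreadsheet applications may treat as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def escape_field_name(name: str) -> str:
    """Prefix the name with an apostrophe if it starts with a formula character."""
    return "'" + name if name.startswith(FORMULA_PREFIXES) else name


for field in ("=SUM(A1)", "+region", "-delta", "@handle", "agent id"):
    print(repr(escape_field_name(field)))
```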
Improved error handling when Elasticsearch is unavailable. Transcript worker processes now pick up new jobs only when these processes detect that Elasticsearch is up and running. Previously, jobs accepted while Elasticsearch was unavailable were picked up and immediately placed in the errors folder.
Added example and reference files to V‑Spark's system configuration settings directory. The four new example files show system configurations for various use cases and are located in the config/vspark.config.d.examples directory. The reference copies of default configuration files are located in the config/vspark.config.d.examples/base.defaults directory.
Improved V‑Spark security to help protect against remote code execution and other vulnerabilities.

Fixes in 4.3.1

Resolved an issue with residual memory usage associated with report worker processes that caused them to use an abnormally large amount of memory. These processes now terminate and restart when they have exceeded a configurable memory threshold after finishing a report job. The new system configuration option report_worker_idle_memory_threshold sets the amount of memory at which report worker processes terminate and restart, takes an integer value expressed in MB, and has a default value of 200.
Report worker process memory usage grows linearly with the number of calls included in daily and monthly reports, at an approximate rate of 10 KB per call. Although it is thus still possible for report worker processes to consume a large amount of memory, with this update, memory used by report worker processes is released back to the operating system as soon as reports are generated.
Report worker jobs that terminate and restart generate a WARNING-level message in backendWorker.log.
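As a rough worked example of the figures above, the sketch below estimates report memory from call count and compares an idle process against the threshold. It reads the stated rate as roughly 10 kilobytes per call and assumes 1 MB = 1024 KB; the function names are hypothetical, and the comparison is an illustration of the described behavior, not the worker's actual check.

```python
KB_PER_CALL = 10            # approximate rate stated in these notes
DEFAULT_THRESHOLD_MB = 200  # default for report_worker_idle_memory_threshold


def estimated_report_memory_mb(num_calls: int) -> float:
    """Estimate memory growth in MB for a report covering num_calls calls."""
    return num_calls * KB_PER_CALL / 1024


def should_restart(idle_memory_mb: float, threshold_mb: int = DEFAULT_THRESHOLD_MB) -> bool:
    """Return True if memory after a finished job exceeds the threshold."""
    return idle_memory_mb > threshold_mb


calls = 50_000
estimate = estimated_report_memory_mb(calls)
print(f"{calls} calls: about {estimate:.0f} MB, restart: {should_restart(estimate)}")
```

Under these assumptions, a report covering about 20,480 calls reaches the default 200 MB threshold.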

Known issues in 4.3.1

Application changes may not display in real time when made by another user from a different host. Although application editing works, users editing an application simultaneously from different hosts must refresh the Application Editor to see changes made by another user. This issue does not typically occur when both users are being served by the same host.

4.3.1 Tests

The following table shows the versions of third-party dependencies used to test V‑Spark version 4.3.1.

Table 1. V‑Spark 4.3.1 Dependency Versions Tested
Dependency	Version(s) Tested
Elasticsearch	7.16.2-1
Database	MariaDB 5.5 (EPEL repository)
Redis	3.2 (EPEL repository)

4.3.0

Added functionality to display and export columns for custom metadata fields and their values using the Show/Hide Columns dropdown in the Dashboard Files View. The new Custom metadata section of the dropdown list includes all metadata fields associated with the folder currently selected in the dashboard. If the dashboard is set to display all folders, the dropdown includes all custom metadata fields for every folder the user has permission to view.
Enabling custom metadata fields using the dropdown adds a column for each enabled field to the table of search results. If a record in the list of search results has a value for the enabled field, that value will display on the search result line in the cell associated with the custom metadata field column; otherwise, the cell is blank. Enabled fields and their values are also included in search result CSV exports.
As a part of this change, the Delete icon for search results displayed in the files view has been moved to the left side of each line in the results list to improve user experience. Previously, the icon appeared on the right side of each result line.
Improved user experience when tagging search results in bulk. The dashboard now displays a Saving... message while adding tags to multiple search results. Previously, when tagging a large number of files, the page would sometimes display a Successfully Added Tags message before the new tags were visible in search results.
Increased the timeout threshold for jobs in the dashboard worker queue. Jobs now wait in the dashboard worker queue for a maximum of 4 hours before timing out and being requeued.
Added log messaging for all requeued jobs. Requeued jobs generate a WARNING-level message with the date and time, type of worker process, and other details in backendWorker.log with the format used in the following example:
```
2021-12-01 18:01:00.108 TranscriptWorker 28387 WARNING  Job Requeued: Found inflight job (transcripts_f145c2a1-2a91-4cef-a245-ac4521e8c17b) with an expired TTL -- <requestid=06babe23-2ad5-44b2-8153-4ceed1f76cd8>
```
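For log monitoring, a line in this format can be split into fields with a regular expression. The pattern below is a sketch derived only from the single example above; the field names are hypothetical (in particular, treating the number after the worker type as a process ID is an assumption), and other log messages may not follow the same layout.

```python
import re

# Pattern inferred from the one example line; field names are assumptions.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<worker>\S+) (?P<pid>\d+) (?P<level>[A-Z]+)\s+"
    r"(?P<message>.*?)(?: -- <requestid=(?P<requestid>[0-9a-f-]+)>)?$"
)

SAMPLE = (
    "2021-12-01 18:01:00.108 TranscriptWorker 28387 WARNING  Job Requeued: "
    "Found inflight job (transcripts_f145c2a1-2a91-4cef-a245-ac4521e8c17b) "
    "with an expired TTL -- <requestid=06babe23-2ad5-44b2-8153-4ceed1f76cd8>"
)


def parse_log_line(line: str):
    """Return a dict of fields, or None if the line does not match the pattern."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_log_line(SAMPLE)
if fields and fields["message"].startswith("Job Requeued"):
    print(fields["timestamp"], fields["worker"], fields["requestid"])
```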
Changed the format for temporary Elasticsearch index names. Previously, V‑Spark assigned the prefix .vspark-temp- to temporary Elasticsearch indices created for application scoring. The default value for this prefix has been changed to @vstemp to account for changes to index name requirements in the next major version of Elasticsearch.
In addition to the prefix change, temporary index names now include date, time, and detailed request components. The new complete format for temporary index names is prefix_YYYYMMDD_HHMMSSz_requestid_orgshort_tid and includes the following components:
- prefix — The prefix configured in V‑Spark, which is @vstemp by default.
- YYYYMMDD — The year, month, and day on which the index was created.
- HHMMSSz — The hour, minute, second, and time zone for the index, expressed in UTC.
- requestid — The unique identifier for the request that generated the transcript.
- orgshort — The short name of the organization associated with the transcript.
- tid — The unique transcriptID identifier for the transcript.
Important: Although a core-update is not required to implement the temporary index change, a full system restart of all backend nodes is required. Otherwise, the system will be in an inconsistent state with some temporary indices using the deprecated format and some using the new format.
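To make the new naming format concrete, the sketch below assembles a name from its components. It is an illustration only: the function name and sample values are hypothetical, rendering the time zone component as a literal trailing "z" is an assumption, and any normalization V‑Spark applies to the request, organization, or transcript values is not shown.

```python
from datetime import datetime, timezone


def temp_index_name(requestid, orgshort, tid, created, prefix="@vstemp"):
    """Build prefix_YYYYMMDD_HHMMSSz_requestid_orgshort_tid from its parts.

    `created` must be a timezone-aware datetime; it is converted to UTC.
    """
    created_utc = created.astimezone(timezone.utc)
    return "_".join([
        prefix,
        created_utc.strftime("%Y%m%d"),
        created_utc.strftime("%H%M%S") + "z",  # assumed rendering of the zone marker
        requestid,
        orgshort,
        tid,
    ])


# Placeholder values chosen for readability, not real identifiers.
when = datetime(2021, 12, 1, 18, 1, 0, tzinfo=timezone.utc)
print(temp_index_name("req1", "acme", "42", when))
```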
Addressed an issue with startup verification of database names that contain both lowercase and uppercase characters. Previously, verification would sometimes fail when checking the names of database servers with lower_case_table_names set to 1, as is common for Azure-hosted databases. For more information, refer to the external Azure database documentation.
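The general idea behind a case-tolerant name check can be sketched as follows. With lower_case_table_names set to 1, MySQL and MariaDB store database and table names in lowercase, so a mixed-case configured name will not match the reported name byte for byte. The function below is a hypothetical illustration of comparing names under that setting, not V‑Spark's verification code.

```python
def database_name_matches(configured: str, reported: str, lower_case_table_names: int) -> bool:
    """Compare a configured database name with the name the server reports."""
    if lower_case_table_names != 0:
        # Name comparisons are not case-sensitive on such servers.
        return configured.lower() == reported.lower()
    return configured == reported


print(database_name_matches("VSpark_DB", "vspark_db", 1))
print(database_name_matches("VSpark_DB", "vspark_db", 0))
```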
The third-party Python and NodeJS packages, and the optional third-party Elasticsearch and database packages, used with V‑Spark are now offered as independent packages in the Voci repository.
- Python — voci-spark-pylib
- NodeJS — voci-spark-nodelib
- Elasticsearch — voci-config-elasticsearch
- Databases — voci-config-database
Improved V‑Spark security to help protect against configuration request forgery and remote command execution.

Fixes in 4.3.0

Updated the vspark-admin check-health maintenance script to account for changes to Elasticsearch and Redis security configurations made with V‑Spark version 4.2.0. This resolves the known issue described in the 4.2.0 release notes.
Addressed an issue that prevented application score data from exporting correctly in certain conditions. Previously, exporting CSV reports from the dashboard with All folders selected in the dashboard dropdown would generate a file with only application names and no score data. Those reports now contain the correct data.
Addressed issues with both the /sysinfo endpoint and the system status page in the V‑Spark UI that caused Elasticsearch to leave too many TCP connections open when the system was under heavy load. This fix resolves a regression first introduced with V‑Spark 4.2.0, initially released on 2021-09-23.
Addressed a rare issue with the Job Manager component's drive failure recovery and retry logic where a Job Manager process could fail to recover from operating-system-level drive failures when moving certain files.