Rules

Syntax

Redaction uses a series of regular expression (regex) matches, specified in a JSON object. The regex field provides the regular expression to match against, and the repl field is the string that replaces each match.

The following sample shows the syntax of a redaction rule:

{
  "regex": "string_to_be_redacted",
  "repl": "string_to_replace",
  "text": true/false,   [if false, override to not have text redacted]
  "audio": true/false   [if false, override to not have audio redacted]
}
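
As a rough illustration of how the regex and repl fields work together, the following Python sketch applies one rule to a single token. It assumes Python's re module semantics, which may differ from the regex flavor the engine uses, and apply_rule is a hypothetical helper written for this example, not part of the product.

```python
import json
import re

# A rule as it would appear in a redaction file (JSON text).
rule = json.loads(r'{"regex": "\\d", "repl": "#"}')

def apply_rule(rule, token):
    """Hypothetical helper: replace every match of rule["regex"] with rule["repl"]."""
    return re.sub(rule["regex"], rule["repl"], token)

print(apply_rule(rule, "4111-1111"))  # prints ####-####
```

Note the doubled backslash: in JSON the pattern \d must be written as "\\d".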

Default redaction rules and exclusions

Use the following rule to redact all numbers:

{
  "description": "replace all digits with #",
  "regex": "\\d",
  "repl": "#"
}

Use the following rules to exclude specific number types from redaction:

Exclude ordinal numbers from redaction:

{
  "description": "exclude ordinal numbers from scrubbing",
  "regex": "^(¿)?(\\d+/)?\\d+(st|nd|rd|th|ᵒ|ᵃ|e|er|re)[.,?]?$",
  "repl": "",
  "text": false,
  "report": false,
  "audio": false
}

Exclude percentages from redaction:

{
  "description": "exclude percentages from scrubbing",
  "regex": "^((\\d+\\.)?\\d+%)([.,?]?)$",
  "repl": "",
  "text": false,
  "report": false,
  "audio": false
}

Exclude times from redaction:

{
  "description": "exclude clock times from scrubbing",
  "regex": "^(¿)?([1-9]|10|11|12):[0-5][0-9]( [AP]M)?[.,?]?$",
  "repl": "",
  "text": false,
  "report": false,
  "audio": false
}

Exclude price amounts from redaction:

{
  "description": "exclude prices from scrubbing",
  "regex": "^(¿)?([\\d,. ]+(R?\\$|€)|(R?\\$|€)[\\d,. ]+)[.,?]?$",
  "repl": "",
  "text": false,
  "report": false,
  "audio": false
}

Exclude short decimal numbers from redaction:

{
  "description": "exclude short floating point numbers (w/decimal point) from scrubbing",
  "regex": "^(¿)?\\d{1,4}[.,]\\d{1,4}[.,?]?$",
  "repl": "",
  "text": false,
  "report": false,
  "audio": false
}
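
To see which tokens each exclusion pattern is meant to catch, the patterns above can be tried against a few sample tokens. This is only a sketch under the assumption that Python's re module behaves like the engine's regex matcher for these patterns; the sample tokens and the matching_exclusions helper are made up for illustration.

```python
import re

# Exclusion patterns copied from the rules above (JSON escaping removed).
EXCLUSIONS = {
    "ordinal": r"^(¿)?(\d+/)?\d+(st|nd|rd|th|ᵒ|ᵃ|e|er|re)[.,?]?$",
    "percentage": r"^((\d+\.)?\d+%)([.,?]?)$",
    "clock time": r"^(¿)?([1-9]|10|11|12):[0-5][0-9]( [AP]M)?[.,?]?$",
    "price": r"^(¿)?([\d,. ]+(R?\$|€)|(R?\$|€)[\d,. ]+)[.,?]?$",
    "short decimal": r"^(¿)?\d{1,4}[.,]\d{1,4}[.,?]?$",
}

def matching_exclusions(token):
    """Return the names of all exclusion patterns that match the token."""
    return [name for name, pattern in EXCLUSIONS.items()
            if re.search(pattern, token)]

for token in ["21st", "12.5%", "9:45 PM", "$19.99", "3.14", "5551234"]:
    print(repr(token), matching_exclusions(token))
```

In this sketch the last token, 5551234, matches none of the exclusions, so it would fall through to the rule that replaces digits with #.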

Default redaction file — scrub.conf

Note: V‑Cloud users cannot use custom redaction files. Contact support@vocitec.com for more information.

Use a text editor to replicate the default redaction file. The text editor must be capable of saving in plain text. Freely available text editors such as Emacs, Vim, Nano, and Notepad++ work best as they save in plain text by default.

The redaction file must conform to standard JSON formatting and regular expression matching. Refer to JSON Structures for more information on JSON formatting. Additionally, the file must include the default rules and exclusions in the sample below. Include custom redaction rules and exclusions after the defaults.
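
Before deploying a custom file, it can help to check that it parses as JSON and that each regex compiles. The following Python sketch is one way to do that. It is not a tool shipped with the product: validate_redaction_rules is a hypothetical helper, it assumes the file is a JSON array of objects that each carry string regex and repl fields, and Python's re module may accept or reject some patterns differently from the engine.

```python
import json
import re

def validate_redaction_rules(text):
    """Return a list of problems found in redaction-file text (empty if none)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    # Assumption: the file is a JSON array of rule objects, like the default file.
    if not isinstance(rules, list):
        return ["top-level value is not a JSON array"]
    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {index}: not a JSON object")
            continue
        # Assumption: every rule needs string-valued "regex" and "repl" fields.
        for field in ("regex", "repl"):
            if not isinstance(rule.get(field), str):
                problems.append(f"rule {index}: missing or non-string {field!r}")
        regex = rule.get("regex")
        if isinstance(regex, str):
            try:
                re.compile(regex)
            except re.error as err:
                problems.append(f"rule {index}: regex does not compile: {err}")
    return problems

# The second rule has an unbalanced parenthesis, so it is reported.
sample = r'[{"regex": "\\d", "repl": "#"}, {"regex": "(", "repl": ""}]'
for problem in validate_redaction_rules(sample):
    print(problem)
```

A missing comma between fields, a common mistake when editing rules by hand, is reported by this sketch as invalid JSON.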

The following sample is the default redaction file that is automatically applied unless specified otherwise:

[
  {
    "README": "DO NOT EDIT/REMOVE THIS FILE - USER MODIFICATIONS SHOULD BE MADE IN /opt/voci/state/scrub.conf",
    "description": "README",
    "regex": "^$",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "always scrub audio for any characters surrounded by double-octothorpes (via substitutions)",
    "regex": "^((¿)?)##(.+)##([.,?]?)$",
    "repl": "\\1\\3\\4",
    "tospace": "_"
  },
  {
    "description": "exclude any characters surrounded by double-atsymbols (via substitutions)",
    "regex": "^((¿)?)@@(.+)@@([.,?]?)$",
    "repl": "\\1\\3\\4",
    "tospace": "_",
    "report": false,
    "audio" : false
  },
  {
    "description": "Exclude words that include non-digits other than punctuation",
    "regex": "[^-+$%:0-9.,?]",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "exclude ordinal numbers from scrubbing",
    "regex": "^(¿)?(\\d+/)?\\d+(st|nd|rd|th|ᵒ|ᵃ|e|er|re)[.,?]?$",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "exclude percentages from scrubbing",
    "regex": "^(¿)?(\\d+[.,])?\\d+%[.,?]?$",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "exclude clock times from scrubbing",
    "regex": "^(¿)?([1-9]|10|11|12):[0-5][0-9]( [AP]M)?[.,?]?$",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "exclude prices from scrubbing",
    "regex": "^(¿)?([\\d,. ]+(R?\\$|€)|(R?\\$|€)[\\d,. ]+)[.,?]?$",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "exclude short floating point numbers (w/decimal point) from scrubbing",
    "regex": "^(¿)?\\d{1,4}[.,]\\d{1,4}[.,?]?$",
    "repl": "",
    "text": false,
    "report": false,
    "audio": false
  },
  {
    "description": "replace all other digits with #",
    "regex": "\\d",
    "repl": "#"
  }
]

Important: If an /opt/voci/state/scrub.conf file exists, the default redaction configuration distributed with the ASR Engine is ignored completely.
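
The layout of the default file, with exclusions first and the catch-all digit rule last, suggests that rule order matters. The Python sketch below models that idea on a subset of the default rules. It rests on an unconfirmed assumption: that the first rule whose regex matches a token decides the outcome, and that "text": false leaves the token unchanged. The engine's actual evaluation order and semantics are not documented here, and redact_token is a hypothetical helper.

```python
import re

# Subset of the default rules, in file order (JSON escaping removed).
RULES = [
    # Tokens containing anything other than digits and listed punctuation.
    {"regex": r"[^-+$%:0-9.,?]", "repl": "", "text": False},
    # Short decimal numbers such as 3.14.
    {"regex": r"^(¿)?\d{1,4}[.,]\d{1,4}[.,?]?$", "repl": "", "text": False},
    # Catch-all: replace every remaining digit.
    {"regex": r"\d", "repl": "#"},
]

def redact_token(token, rules=RULES):
    """ASSUMPTION: the first matching rule decides what happens to the token."""
    for rule in rules:
        if re.search(rule["regex"], token):
            if rule.get("text", True) is False:
                return token  # exclusion rule: leave the text as is
            return re.sub(rule["regex"], rule["repl"], token)
    return token  # no rule matched

for token in ["5551234", "3.14", "A320", "12345.67"]:
    print(token, "->", redact_token(token))
```

Under these assumptions, 3.14 and A320 pass through unchanged, while 5551234 and 12345.67 reach the final rule and have their digits replaced.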

Exclusion examples

Add exclusion rules to control which information is not redacted. Exclusions are useful for avoiding unnecessary redactions, such as product names that contain numbers. The following examples are redaction rules that exclude certain characters from redaction:

Match a two-digit number and return it unchanged:

{
  "description": "exclude 2 digit numbers from scrubbing",
  "regex": "(^\\d{2}[.,]?$)",
  "repl": "\\1",
  "text": false,
  "report": false,
  "audio": false
}

Exclude alphanumeric tokens (anything containing both letters and numbers):

{
  "description": "exclude alpha numeric - anything with both letters and numbers from scrubbing",
  "regex": "^([A-Za-z]+\\d+\\w*|\\d+[A-Za-z]+\\w*)([.,?]?)$",
  "repl": "\\1",
  "text": false,
  "report": false,
  "audio": false
}
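
Custom exclusion patterns like the two above are worth trying against sample tokens before they go into a redaction file. The following Python sketch does that, assuming Python's re module matches these patterns the same way the engine does; the tokens and the is_excluded helper are invented for illustration.

```python
import re

# Patterns from the two exclusion examples (JSON escaping removed).
TWO_DIGIT = r"(^\d{2}[.,]?$)"
ALPHANUMERIC = r"^([A-Za-z]+\d+\w*|\d+[A-Za-z]+\w*)([.,?]?)$"

def is_excluded(pattern, token):
    """Return True if the exclusion pattern matches the token."""
    return re.search(pattern, token) is not None

for token in ["42", "42.", "421", "A320", "747x", "2024"]:
    print(
        token,
        "two-digit:", is_excluded(TWO_DIGIT, token),
        "alphanumeric:", is_excluded(ALPHANUMERIC, token),
    )
```

In this sketch, 421 and 2024 match neither pattern, so they would still be subject to the digit redaction rule.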