Rules
Syntax
Redaction uses a series of regular expression (regex) matches, each specified in a JSON object. The regex field provides the regular expression to match against, and the repl field is the string that replaces each match.
The following sample shows the syntax of a redaction rule:
{
"regex": "string_to_be_redacted",
"repl": "string_to_replace",
"text": true/false, [if false, text is not redacted]
"audio": true/false [if false, audio is not redacted]
}
Default redaction rules and exclusions
Use the following rule to redact all numbers:
{
"description": "replace all digits with #",
"regex": "\\d",
"repl": "#"
}
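As a rough illustration of how this rule behaves, the following Python sketch loads the rule as JSON and applies it with Python's re module. It assumes the regex dialect is compatible with Python's, which this document does not state. Note that the doubled backslash in the JSON text ("\\d") decodes to the single-backslash regex \d.

```python
import json
import re

# The rule above, as JSON text. In JSON, "\\d" decodes to the regex \d.
rule = json.loads(r'{"description": "replace all digits with #", "regex": "\\d", "repl": "#"}')

print(rule["regex"])  # \d

# Replace every match of the regex with the repl string.
redacted = re.sub(rule["regex"], rule["repl"], "555-0142")
print(redacted)  # ###-####
```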
Use the following rules to exclude specific number types from redaction:
Exclude ordinal numbers from redaction:
{
"description": "exclude ordinal numbers from scrubbing",
"regex": "^(¿)?(\\d+/)?\\d+(st|nd|rd|th|ᵒ|ᵃ|e|er|re)[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
}
Exclude percentages from redaction:
{
"description": "exclude percentages from scrubbing",
"regex": "^((\\d+\\.)?\\d+%)([.,?]?)$",
"repl": "",
"text": false,
"report": false,
"audio": false
}
Exclude times from redaction:
{
"description": "exclude clock times from scrubbing",
"regex": "^(¿)?([1-9]|10|11|12):[0-5][0-9]( [AP]M)?[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
}
Exclude price amounts from redaction:
{
"description": "exclude prices from scrubbing",
"regex": "^(¿)?([\\d,. ]+(R?\\$|€)|(R?\\$|€)[\\d,. ]+)[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
}
Exclude short decimal numbers from redaction:
{
"description": "exclude short floating point numbers (w/decimal point) from scrubbing",
"regex": "^(¿)?\\d{1,4}[.,]\\d{1,4}[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
}
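To see which words each exclusion covers, the patterns above can be tried against sample words. The sketch below is only a test harness, assuming Python-compatible regex behavior; the dictionary keys and the helper name matching_exclusions are hypothetical labels, not part of the product.

```python
import re

# Patterns copied from the exclusion rules above (JSON "\\d" is the regex \d).
EXCLUSIONS = {
    "ordinal": r"^(¿)?(\d+/)?\d+(st|nd|rd|th|ᵒ|ᵃ|e|er|re)[.,?]?$",
    "percentage": r"^((\d+\.)?\d+%)([.,?]?)$",
    "clock time": r"^(¿)?([1-9]|10|11|12):[0-5][0-9]( [AP]M)?[.,?]?$",
    "price": r"^(¿)?([\d,. ]+(R?\$|€)|(R?\$|€)[\d,. ]+)[.,?]?$",
    "short decimal": r"^(¿)?\d{1,4}[.,]\d{1,4}[.,?]?$",
}

def matching_exclusions(word):
    """Return the names of all exclusion patterns that match the word."""
    return [name for name, pattern in EXCLUSIONS.items() if re.search(pattern, word)]

for word in ["3rd", "12.5%", "9:30", "$19.99", "3.14", "4111111111111111"]:
    print(word, matching_exclusions(word))
```

With these samples, each of the first five words matches exactly one exclusion, while the 16-digit number matches none and would therefore fall through to the digit rule.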
Default redaction file — scrub.conf
Use a text editor that can save plain text to replicate the default redaction file. Freely available text editors such as Emacs, Vim, Nano, and Notepad++ work well because they save in plain text by default.
The redaction file must be valid standard JSON, and every regex value must be a valid regular expression. Refer to JSON Structures for more information on JSON formatting. Additionally, the file must include the default rules and exclusions in the sample below. Include custom redaction rules and exclusions after the defaults.
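Before deploying an edited file, it can help to check it offline. The following is a minimal sketch of such a check, not a product tool: the function name validate_rules is hypothetical, it assumes every rule needs string regex and repl fields (as in every sample in this section), and it compiles patterns with Python's re module, which may differ from the regex engine the product uses.

```python
import json
import re

def validate_rules(text):
    """Return a list of problems found in redaction-file text (empty if none)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(rules, list):
        return ["top level must be a JSON array of rule objects"]
    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: not a JSON object")
            continue
        # Assumption: regex and repl are required strings.
        for field in ("regex", "repl"):
            if not isinstance(rule.get(field), str):
                problems.append(f"rule {i}: missing or non-string {field!r}")
        # The override flags, when present, must be JSON booleans.
        for flag in ("text", "report", "audio"):
            if flag in rule and not isinstance(rule[flag], bool):
                problems.append(f"rule {i}: {flag!r} must be true or false")
        if isinstance(rule.get("regex"), str):
            try:
                re.compile(rule["regex"])
            except re.error as err:
                problems.append(f"rule {i}: bad regex: {err}")
    return problems

print(validate_rules(r'[{"regex": "\\d", "repl": "#"}]'))  # []
# A missing comma between fields is reported as invalid JSON.
print(validate_rules('[{"regex": "a", "repl": "b" "text": false}]'))
```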
The following sample is the default redaction file that is automatically applied unless specified otherwise:
[
{
"README": "DO NOT EDIT/REMOVE THIS FILE - USER MODIFICATIONS SHOULD BE MADE IN /opt/voci/state/scrub.conf",
"description": "README",
"regex": "^$",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "always scrub audio for any characters surrounded by double-octothorpes (via substitutions)",
"regex": "^((¿)?)##(.+)##([.,?]?)$",
"repl": "\\1\\3\\4",
"tospace": "_"
},
{
"description": "exclude any characters surrounded by double-atsymbols (via substitutions)",
"regex": "^((¿)?)@@(.+)@@([.,?]?)$",
"repl": "\\1\\3\\4",
"tospace": "_",
"report": false,
"audio" : false
},
{
"description": "Exclude words that include non-digits other than punctuation",
"regex": "[^-+$%:0-9.,?]",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "exclude ordinal numbers from scrubbing",
"regex": "^(¿)?(\\d+/)?\\d+(st|nd|rd|th|ᵒ|ᵃ|e|er|re)[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "exclude percentages from scrubbing",
"regex": "^(¿)?(\\d+[.,])?\\d+%[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "exclude clock times from scrubbing",
"regex": "^(¿)?([1-9]|10|11|12):[0-5][0-9]( [AP]M)?[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "exclude prices from scrubbing",
"regex": "^(¿)?([\\d,. ]+(R?\\$|€)|(R?\\$|€)[\\d,. ]+)[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "exclude short floating point numbers (w/decimal point) from scrubbing",
"regex": "^(¿)?\\d{1,4}[.,]\\d{1,4}[.,?]?$",
"repl": "",
"text": false,
"report": false,
"audio": false
},
{
"description": "replace all other digits with #",
"regex": "\\d",
"repl": "#"
}
]
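The order of the default file, with exclusions placed before the final digit rule, suggests that rules are tried in order for each word and that the first matching rule decides the outcome. The sketch below models the text side of that behavior only. The evaluation order, the default of true for a missing text field, and the function name redact_word are all assumptions inferred from the layout of the file, not documented behavior.

```python
import json
import re

# A shortened rule list in the same format as the default file (illustrative subset).
RULES = json.loads(r'''
[
  {"regex": "[^-+$%:0-9.,?]", "repl": "", "text": false},
  {"regex": "^(¿)?\\d{1,4}[.,]\\d{1,4}[.,?]?$", "repl": "", "text": false},
  {"regex": "\\d", "repl": "#"}
]
''')

def redact_word(rules, word):
    """Assumed semantics: the first rule whose regex matches decides the outcome."""
    for rule in rules:
        if re.search(rule["regex"], word):
            if rule.get("text", True):  # assumed default when "text" is absent
                return re.sub(rule["regex"], rule["repl"], word)
            return word  # "text": false leaves the word unchanged
    return word

words = ["card", "1234", "costs", "3.14", "mp3"]
print([redact_word(RULES, w) for w in words])
# ['card', '####', 'costs', '3.14', 'mp3']
```

Under these assumptions, a custom exclusion only takes effect if it appears before the rule that would otherwise redact the word.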
Exclusion examples
Add exclusion rules to keep specific information from being redacted. Exclusions are useful for avoiding unnecessary redactions, such as product names that contain numbers. The following examples are redaction rules that exclude certain characters from redaction:
Exclude two-digit numbers (match a two-digit number and return it unchanged):
{
"description": "exclude 2 digit numbers from scrubbing",
"regex": "(^\\d{2}[.,]?$)",
"repl": "\\1",
"text": false,
"report": false,
"audio": false
}
Exclude alphanumeric words:
{
"description": "exclude alpha numeric - anything with both letters and numbers from scrubbing",
"regex": "^([A-Za-z]+\\d+\\w*|\\d+[A-Za-z]+\\w*)([.,?]?)$",
"repl": "\\1",
"text": false,
"report": false,
"audio": false
}
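The two example patterns can be checked against sample words before they are added to the file. The following sketch assumes Python-compatible regex behavior; the constant names TWO_DIGIT and ALPHANUMERIC are hypothetical labels for the patterns above.

```python
import re

# Patterns copied from the two exclusion examples above.
TWO_DIGIT = r"(^\d{2}[.,]?$)"
ALPHANUMERIC = r"^([A-Za-z]+\d+\w*|\d+[A-Za-z]+\w*)([.,?]?)$"

for word in ["42", "42,", "420", "A320", "3M", "2024"]:
    print(word,
          "two-digit" if re.search(TWO_DIGIT, word) else "-",
          "alphanumeric" if re.search(ALPHANUMERIC, word) else "-")

# "\\1" in JSON is the backreference \1: the replacement is the first captured group.
print(re.sub(TWO_DIGIT, r"\1", "42,"))  # 42,
```

Here "42" and "42," match the two-digit pattern, "A320" and "3M" match the alphanumeric pattern, and "420" and "2024" match neither. In the alphanumeric pattern, trailing punctuation is captured by the second group, so a replacement of \1 alone would drop it if the replacement were ever applied to the text.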