Spreadsheet Formulas - Regular Expression Function Examples

The following examples demonstrate the use of Statistica regular expression functions:

Example 1 (Spreadsheet Function)

Replaces all non-alphanumeric characters to underscore in variable 1. This could be useful for cleaning variable text labels.

=RE_REPLACE(v1, "[^[:alnum:]]", "_", "g")

Example 2 (Case Selection Condition)

Matches observations where variable 1 matches "potato", "potatoe" (a common misspelling), "potatoes", and " potatos" (another common misspelling)

RE_MATCH(v1, "potatoe?s?", "i")

Example 3 (Case Selection Condition)

Matches any observation where variable 1 is a "*.txt" file located within a folder named either "Help Files" or "Hilfe Datei" within the folder "C:\My Documents\Emma" (Windows folder structure) or "/home/Emma" (*NIX folder path).

RE_MATCH(v1, "(c:\\My Documents\\Emma\\|/home/Emma/)(Hilfe[[:space:]]Datei|Help[[:space:]]Files)[\\/].*.txt", "i")

Valid observations could be:

  • /home/Emma/Help Files/man.txt
  • c:\my documents\emma\help files\examples.txt
  • /home/Emma/Hilfe Datei/manual.txt
  • c:\My Documents\EMMA\help files\documentation.txt

This could be useful for subsetting WebCrawler results.

Example 4 (Case Selection Condition)

Matches any observation where variable 1 contains the URL "www.codeguru" or "www.codeproject".

RE_SEARCH(v1, "www.code(guru|project).", "i")

This could be useful for subsetting WebCrawler results.

Example 5 (Case Selection Condition)

Matches any observation where variable 1 contains a line that begins with either "To: Stacey" or "From: Stacey". Note that "Stacey" could also be spelled "Staci" or "Stacy" and could have any number of spaces in front of it. Finally, there could potentially be a quotation mark in front of "Stacey", so we place a "\x022" in front to indicate this.

RE_SEARCH(v1, "^(To|From):[[:space:]]*\x022?Stace?(i|y)", "im")

This could be useful for subsetting emails pasted into a spreadsheet that will be analyzed by Text Miner.

Example 6 (Case Selection Condition)

Matches any observation where variable 1 matches "Stichprobengröße". Note that the romanizations of this German word will also be matched. That is to say, 'ß' and "ss"; will be interpreted as the same character, as will 'ö' and "oe".

RE_MATCH(v1, "Stichprobengr(oe|\x0F6)(ss|ß)e", "i")

Note: you can either enter extended ASCII letters as they appear (e.g. 'ö') or as their respective hexadecimal value (e.g. "\x0F6" ).

Example 7 (Case Selection Condition)

Matches any observation where variable 3 matches anything beginning with "fail".

v3 LIKE 'FAIL%'

This matches items such as "failed", "FAIL", or "failure".

See also: Spreadsheet Formulas - Overview, Spreadsheet Formulas - Syntax Summary, Spreadsheet Formulas - Examples, Spreadsheet Formulas - Predefined Functions.