Spreadsheet Formulas - Regular Expression Function Examples
The following examples demonstrate the use of Statistica regular expression functions:
Example 1 (Spreadsheet Function)
Replaces all non-alphanumeric characters to underscore in variable 1. This could be useful for cleaning variable text labels.
=RE_REPLACE(v1, "[^[:alnum:]]", "_", "g")
Example 2 (Case Selection Condition)
Matches observations where variable 1 matches "potato", "potatoe" (a common misspelling), "potatoes", and " potatos" (another common misspelling)
RE_MATCH(v1, "potatoe?s?", "i")
Example 3 (Case Selection Condition)
Matches any observation where variable 1 is a "*.txt" file located within a folder named either "Help Files" or "Hilfe Datei" within the folder "C:\My Documents\Emma" (Windows folder structure) or "/home/Emma" (*NIX folder path).
RE_MATCH(v1, "(c:\\My Documents\\Emma\\|/home/Emma/)(Hilfe[[:space:]]Datei|Help[[:space:]]Files)[\\/].*.txt", "i")
Valid observations could be:
- /home/Emma/Help Files/man.txt
- c:\my documents\emma\help files\examples.txt
- /home/Emma/Hilfe Datei/manual.txt
- c:\My Documents\EMMA\help files\documentation.txt
This could be useful for subsetting WebCrawler results.
Example 4 (Case Selection Condition)
Matches any observation where variable 1 contains the URL "www.codeguru" or "www.codeproject".
RE_SEARCH(v1, "www.code(guru|project).", "i")
This could be useful for subsetting WebCrawler results.
Example 5 (Case Selection Condition)
Matches any observation where variable 1 contains a line that begins with either "To: Stacey" or "From: Stacey". Note that "Stacey" could also be spelled "Staci" or "Stacy" and could have any number of spaces in front of it. Finally, there could potentially be a quotation mark in front of "Stacey", so we place a "\x022" in front to indicate this.
RE_SEARCH(v1, "^(To|From):[[:space:]]*\x022?Stace?(i|y)", "im")
This could be useful for subsetting emails pasted into a spreadsheet that will be analyzed by Text Miner.
Example 6 (Case Selection Condition)
Matches any observation where variable 1 matches "Stichprobengröße". Note that the romanizations of this German word will also be matched. That is to say, 'ß' and "ss"; will be interpreted as the same character, as will 'ö' and "oe".
RE_MATCH(v1, "Stichprobengr(oe|\x0F6)(ss|ß)e", "i")
Example 7 (Case Selection Condition)
Matches any observation where variable 3 matches anything beginning with "fail".
v3 LIKE 'FAIL%'
This matches items such as "failed", "FAIL", or "failure".
See also: Spreadsheet Formulas - Overview, Spreadsheet Formulas - Syntax Summary, Spreadsheet Formulas - Examples, Spreadsheet Formulas - Predefined Functions.