Filters Tab

Select the Filters tab in the Text Mining dialog box to specify the parameters that define which words are valid for indexing. The program creates an index of words and terms and selects a certain number of terms for further analyses and reporting (in the Results dialog box). More words can be indexed than are selected; the unselected words remain available for later selection, provided that the Keep unselected words in database for browsing option is selected on the Advanced tab. Otherwise, unselected words are discarded. Use the options on the Defaults tab to save or retrieve the settings for these options, and to set the defaults for future analyses.

The options on this tab enable you to prevent particular words from being indexed at all; words excluded in this way cannot later be re-selected for further analyses. For performance reasons, it is desirable to keep the list of indexed words as small as possible, particularly when indexing very large document collections.

Word length

Option Description
Min. Specify the minimum number of characters permissible in a word; words that are shorter than specified will not be indexed and will be excluded from the analysis.
Max. Specify the maximum number of characters permissible in a word; words that are longer than specified will not be indexed and will be excluded from the analysis.
Min stem length Specify the minimum number of characters permissible in an indexed word after stemming; words whose stems are shorter than specified will not be indexed and will be excluded from the analysis.
Min num of vowels Specify the minimum number of vowels permissible in a word; words with fewer vowels than specified will not be selected (or indexed, unless the Keep unselected words in database for browsing check box is selected on the Advanced tab), and will be excluded from the analysis.
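
The four word-length rules above can be illustrated with a minimal Python sketch. This is not the program's implementation: the function name, the default thresholds, and the vowel set (a, e, i, o, u) are assumptions made for illustration, and the stem is supplied by the caller because the stemming algorithm is not described here.

```python
# Assumption: only the five English vowels count; the program's actual
# vowel set is not documented in this section.
VOWELS = set("aeiou")


def passes_length_filters(word, stem, min_len=2, max_len=25,
                          min_stem_len=2, min_vowels=1):
    """Return True if a word satisfies all four word-length rules.

    All default thresholds are hypothetical, not the program's defaults.
    """
    # Min. / Max.: reject words that are too short or too long.
    if len(word) < min_len or len(word) > max_len:
        return False
    # Min stem length: reject words whose stem is too short.
    if len(stem) < min_stem_len:
        return False
    # Min num of vowels: count vowels case-insensitively.
    vowel_count = sum(1 for ch in word.lower() if ch in VOWELS)
    return vowel_count >= min_vowels
```

For example, `passes_length_filters("mining", "mine")` passes every rule, whereas a one-letter word fails the minimum length and a string without any of the assumed vowels fails the vowel rule.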

Maximum number of consecutive

Option Description
Consonants Specify the maximum number of consecutive consonants permissible in a word; words with more consecutive consonants than specified will not be indexed and will be excluded from the analysis.
Vowels Specify the maximum number of consecutive vowels permissible in a word; words with more consecutive vowels than specified will not be indexed and will be excluded from the analysis.
Duplicates Specify the maximum number of consecutive identical characters permissible in a word; words with more consecutive identical characters than specified will not be indexed and will be excluded from the analysis.
Punctuations Specify the maximum number of consecutive punctuation characters permissible in a word; words with more consecutive punctuation characters than specified will not be indexed and, hence, will be excluded from the analysis. This option interacts with the Characters for word option on the Characters tab: the punctuation characters specified there determine what counts as punctuation here. By default, the only punctuation character is "-" (the dash); if this parameter is set to 1, a word can contain single dashes but not two or more dashes in a row.
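
The consecutive-character limits above all amount to measuring the longest run of a given character class in a word. The following Python sketch shows one way to express that idea; it is an illustration under stated assumptions, not the program's code. The function names, the default limits, and the vowel set are hypothetical, and the punctuation set contains only the dash, as in the default described above.

```python
from itertools import groupby

VOWELS = set("aeiou")    # assumption: English vowels only
PUNCTUATION = set("-")   # the default described above: only the dash


def longest_run(word, predicate):
    """Length of the longest run of consecutive characters matching predicate."""
    best = current = 0
    for ch in word:
        current = current + 1 if predicate(ch) else 0
        best = max(best, current)
    return best


def longest_duplicate_run(word):
    """Length of the longest run of one identical character."""
    return max((len(list(group)) for _, group in groupby(word)), default=0)


def passes_consecutive_filters(word, max_consonants=5, max_vowels=4,
                               max_duplicates=2, max_punctuation=1):
    """Return True if no run exceeds its limit (limits here are hypothetical)."""
    w = word.lower()
    return (
        longest_run(w, lambda ch: ch.isalpha() and ch not in VOWELS) <= max_consonants
        and longest_run(w, lambda ch: ch in VOWELS) <= max_vowels
        and longest_duplicate_run(w) <= max_duplicates
        and longest_run(w, lambda ch: ch in PUNCTUATION) <= max_punctuation
    )
```

With the punctuation limit set to 1, "well-known" is accepted because each dash stands alone, while "well--known" is rejected because it contains two dashes in a row.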