Workspace Node: Text Mining - Specifications - Words Tab

In the Text Mining node dialog box, under the Specifications heading, select the Words tab to access options to "fine-tune" the words and phrases indexed for the final results.

See also the Introductory Overview.

Note: Selected and unselected words; indexed and non-indexed words. It is important to distinguish between selected and unselected words on the one hand and indexed and non-indexed words on the other. A word or term can be indexed in the (internal) database without being selected into the word list from which the final results (e.g., singular value decomposition) are computed. The options on this tab pertain to the indexing of words; for example, stop words specified on this tab are discarded and not indexed (and, hence, cannot be selected).
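The two stages described in this note can be pictured with a minimal Python sketch. It is not the program's internal implementation: the function names build_index and select_words are hypothetical, and the minimum-count selection rule is only an assumed stand-in for whatever selection criteria are applied after indexing.

```python
from collections import Counter

def build_index(documents, stop_words):
    """Stage 1 (indexing): count every word except the stop words."""
    index = Counter()
    for doc in documents:
        for word in doc.lower().split():
            if word not in stop_words:
                index[word] += 1
    return index

def select_words(index, min_count):
    """Stage 2 (selection): assumed rule, keep words seen at least min_count times."""
    return sorted(word for word, count in index.items() if count >= min_count)

docs = ["the cat sat", "the cat ran", "a dog ran"]
index = build_index(docs, stop_words={"the", "a"})
selected = select_words(index, min_count=2)
```

In this sketch, "sat" is indexed but not selected, while "the" is never indexed and therefore can never be selected.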
Element Name | Description
Phrases (word combinations treated as single word) | Select this check box to search for multiple words as phrases, so that the entire phrase is treated as a separate term during indexing (e.g., Microsoft Windows should be treated as a phrase, while Microsoft and Windows could also be indexed as separate terms). After you select this check box, the Edit and Select buttons will be enabled.
Edit | Click this button to display the Phrase Editor, where you can edit the list of phrases to be included for indexing (one phrase per line).
Select | Click this button to display the Open Phrase (Text) File dialog box, where you can locate and select a file containing the phrases for indexing. The file should be a simple text file with a single phrase per line.
Stop words (discarded, excluded from indexing) | Select this check box to exclude non-informative or non-diagnostic terms from indexing and, hence, from the analyses and final results. After you select this check box, the Edit and Select buttons will be enabled.
Edit | Click this button to display the Stop-Word Editor, where you can edit the list of stop words (one word or term per line).
Select | Click this button to display the Open Stop-Word (Text) File dialog box, where you can select a file that contains the list of stop words. For most languages, a default list is supplied (e.g., EnglishStopList.txt), including the most common words such as the English "a," "the," "also," etc. These files can be edited further via the Edit button (see above).
Inclusion words (words not in this list are discarded) | Select this check box to specify the words and terms that are to be indexed and included in the analyses. This option is useful when you want to use an a priori list of words or terms and count how frequently they occur in the input documents. After you select this check box, the Edit and Select buttons will be enabled.
Edit | Click this button to display the Inclusion Word Editor, where you can edit the word or term list for the analyses.
Select | Click this button to display the Open Inclusion Word (Text) File dialog box, where you can locate and select a file containing the words and terms that are to be indexed, selected, and included in the analyses. The file should be a simple text file, where each term or word is placed on a separate line.
Synonyms (replace, combine words) | Select this check box to specify words that are to be treated as synonyms during indexing and when computing results. For example, you could combine the words "supper" and "dinner" as synonyms, and count each as a reference to meals consumed in the late afternoon or evening. After you select this check box, the Edit and Select buttons will be enabled.
Edit | Click this button to display the Synonyms dialog box, where you can edit the synonym list (one set of synonyms per line).
Select | Click this button to display the Open Synonym (Text) File dialog box, where you can locate and select a file containing the synonym list for indexing.
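To illustrate how the four lists could interact during indexing, here is a rough Python sketch. It rests on several assumptions that this page does not confirm: the function names (parse_lines, parse_synonyms, index_terms) are hypothetical, the words within a synonym line are assumed to be comma-separated with the first word acting as the replacement, and the order of processing (phrases, then synonyms, then stop words, then inclusion words) is a guess. As a simplification, a matched phrase replaces its component words instead of being indexed alongside them.

```python
import re
from collections import Counter

def parse_lines(text):
    """Return the non-empty lines of a one-entry-per-line list, lowercased."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

def parse_synonyms(text):
    """Map each synonym to the first word on its line (comma-separated: an assumption)."""
    mapping = {}
    for line in parse_lines(text):
        words = [w.strip() for w in line.split(",") if w.strip()]
        for word in words[1:]:
            mapping[word] = words[0]
    return mapping

def index_terms(document, phrases=(), stop_words=(), inclusion=None, synonyms=None):
    """Count terms in one document after applying the four word lists."""
    text = document.lower()
    # Join each phrase with underscores so it survives tokenization as one term.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, lambda m: m.group(0).replace(" ", "_"), text)
    counts = Counter()
    for token in re.findall(r"[a-z0-9_]+", text):
        term = token.replace("_", " ")
        term = (synonyms or {}).get(term, term)  # combine synonyms
        if term in stop_words:                   # discard stop words
            continue
        if inclusion is not None and term not in inclusion:
            continue                             # discard words not on the inclusion list
        counts[term] += 1
    return counts

doc = "Microsoft Windows was installed before dinner, and the update ran after supper."
phrases = parse_lines("Microsoft Windows\n")
stop_words = set(parse_lines("the\nand\nwas\n"))
synonyms = parse_synonyms("dinner, supper\n")

counts = index_terms(doc, phrases, stop_words, synonyms=synonyms)
included = index_terms(doc, phrases, stop_words,
                       inclusion={"dinner", "microsoft windows"}, synonyms=synonyms)
```

Here counts treats "microsoft windows" as a single term and folds "supper" into "dinner", while included keeps only the two terms named on the inclusion list.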

Options / C / W. See Common Options.