Workspace Node: Elasticsearch Text Analysis - Results - Summary Tab

In the Elasticsearch Text Analysis node dialog box, under the Results heading, select the Summary tab to access the following options.

Option | Description
Frequency Measure
Inverse Document Frequency | Select this option to analyze and report inverse document frequencies. These are derived from the relative document frequency (df) of each word, so that words occurring in many documents are weighted down.
Raw | Select this option to operate on the raw word frequencies collected in the term-document index.
Frequency Outputs
Minimum % of files where word occurs | Use this option to specify, as a percentage, the minimum permissible document frequency for the analysis. Words that occur in fewer than the indicated percentage of documents are deemed non-diagnostic and are excluded from the analysis.
Maximum number of words to be selected | Use this option to specify, as an integer, the maximum number of indexed words to be selected for subsequent analysis. If the number of indexed terms exceeds this limit, the program trims the list by selecting the words with the highest document frequencies and, among words with equal document frequencies, those with the higher total occurrence count.
Frequency Matrix | Select this option to compute summary values for each word within each document. These are simple transformations of the original word frequencies, applied to obtain indices whose values and distributions (for example, of the words across the documents) are more meaningful.
Selected Words | Select this option to display the words extracted from the documents together with their frequencies: the overall word frequencies and the document frequencies (that is, the number of documents in which each word is found).
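
The word selection rules described above (minimum document percentage, then trimming to a maximum number of words) can be illustrated with a minimal Python sketch. This is not the node's actual implementation: the function names select_words and idf, the input format (one list of tokens per document), and the alphabetical tie-break of last resort are assumptions made for illustration. The idf helper uses the common textbook definition log(N / df), which may differ from the transformation the node applies.

```python
import math
from collections import Counter


def select_words(documents, min_pct=0.0, max_words=None):
    """Return (word, total count, document frequency) tuples for selected words.

    documents: list of token lists, one per document (assumed input format).
    min_pct:   minimum percentage of documents in which a word must occur.
    max_words: optional cap on the number of words kept.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return []

    doc_freq = Counter()     # number of documents containing each word
    total_count = Counter()  # overall occurrences of each word
    for tokens in documents:
        total_count.update(tokens)
        doc_freq.update(set(tokens))  # count each word once per document

    # Exclude words that occur in fewer than min_pct percent of documents.
    kept = [w for w in doc_freq if 100.0 * doc_freq[w] / n_docs >= min_pct]

    # Highest document frequency first; ties go to the higher total count.
    # The final alphabetical key is an assumption that keeps output stable.
    kept.sort(key=lambda w: (-doc_freq[w], -total_count[w], w))

    if max_words is not None:
        kept = kept[:max_words]
    return [(w, total_count[w], doc_freq[w]) for w in kept]


def idf(df, n_docs):
    """Textbook inverse document frequency, log(N / df) (assumed formula)."""
    return math.log(n_docs / df)


if __name__ == "__main__":
    docs = [
        ["error", "disk", "error"],
        ["error", "network"],
        ["disk", "network", "network", "network"],
        ["error", "timeout"],
    ]
    for word, count, df in select_words(docs, min_pct=50, max_words=2):
        print(word, count, df, round(idf(df, len(docs)), 3))
```

In this example, "timeout" occurs in only one of four documents (25%) and is dropped by the 50% threshold. "network" and "disk" both occur in two documents, so "network" is preferred because its total occurrence count is higher.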