Workspace Node: Text Mining - Results - Concept Extraction Tab

In the Text Mining node dialog box, under the Results heading, select the Concept Extraction tab to access options for performing singular value decomposition (SVD) on the term-document matrix. The decomposition is based on the selected words only, and it uses the word frequencies or transformed word frequencies currently selected in the Frequency measure group box on the Frequency Measure tab. Note that results are available for only one type of transformation of the word frequencies at a time: once you change your selection on the Frequency Measure tab, any previously computed singular value decomposition results are discarded.

The last computed SVD results for the currently selected words will be saved automatically in the Text Miner project’s database file (see the documentation for the Projects tab). Saved data can then be used in subsequent analyses to automatically index and score new documents. Thus, this data is essential for many applications of text mining (as discussed in the Introductory Overview), for example, to implement automatic mail filters or text routing systems.
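How saved SVD results can be used to score a new document may be easier to see with a small example. The sketch below uses NumPy rather than STATISTICA and applies the standard latent semantic indexing "fold-in" projection. The toy matrix, the variable names, the function score_document, and the choice of projection formula are all assumptions made for illustration; the scoring method that STATISTICA actually applies is described in its computational documentation.

```python
import numpy as np

# Hypothetical term-document matrix: 5 words (rows) x 4 documents (columns).
A = np.array([
    [2, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 1, 1, 1],
], dtype=float)

# Decompose once; these arrays stand in for the "saved" SVD results.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                          # number of concepts retained (assumed)
U_k, s_k = U[:, :k], s[:k]

def score_document(term_vector):
    """Project a word-frequency vector into the k-dimensional concept space."""
    return (U_k.T @ term_vector) / s_k

# A new document, expressed with the same five words in the same order.
new_doc = np.array([1, 2, 0, 0, 1], dtype=float)
new_scores = score_document(new_doc)
print(new_scores)

# Sanity check: folding in a document that was part of the original
# matrix reproduces that document's scores from the decomposition.
assert np.allclose(score_document(A[:, 0]), Vt[:k, 0])
```

Because the projection needs only the retained word coefficients and singular values, new documents can be scored without recomputing the decomposition, provided their word frequencies use the same words and the same frequency transformation.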

Singular value decomposition is an analytic tool for feature extraction that can be used to determine a few underlying "dimensions" that account for most of the common contents or "meaning" of the documents and words that were extracted. See also the section on latent semantic indexing in the Introductory Overview for additional details.
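The idea that a few dimensions can account for most of the content of a term-document matrix can be illustrated outside of STATISTICA. The following is a minimal sketch in NumPy, assuming a small made-up matrix in which rows are words and columns are documents; the data and variable names are hypothetical. It computes the singular values and the share of the total sum of squares attributable to each dimension, which is the kind of information a scree plot displays.

```python
import numpy as np

# Hypothetical term-document matrix: 6 words (rows) x 4 documents (columns).
# The first three words occur mostly in documents 1-2, the last three in 3-4.
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 2, 2],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Share of the total sum of squares carried by each dimension.
share = s ** 2 / np.sum(s ** 2)
cumulative = np.cumsum(share)

for i, (sv, c) in enumerate(zip(s, cumulative), start=1):
    print(f"dimension {i}: singular value {sv:.3f}, cumulative share {c:.3f}")
```

When the cumulative share levels off after the first few dimensions, the remaining dimensions add little, and retaining only the leading ones gives a compact description of the documents and words.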

For computational details of the statistics and results available on this tab, see Singular Value Decomposition in STATISTICA Text Mining and Document Retrieval.

Element Name: Description
Perform Singular Value Decomposition (SVD): Select this check box to compute the singular value decomposition of the term-document matrix (word occurrences or their transformations, as currently selected). When this check box is selected, the options for reviewing the coefficients and document scores become available.
Number of concepts to use: Limits the number of singular values to use; the value must be equal to or smaller than the number generated by the SVD calculations. This enables you to control the number of "dimensions of meaning" to consider.
Scree plot: Select this check box to create a scree plot of the singular values extracted from the term-document matrix. This plot helps determine how many singular values are informative and should be retained for subsequent analyses. Usually, the number of "informative" dimensions to retain is determined by locating the "elbow" in this plot, to the right of which one presumably finds only the factorial "scree" due to random noise.
Singular values: Select this check box to produce a results spreadsheet with the singular values as computed and selected.
Word - Coefficients: Select this check box to produce a results spreadsheet with the word coefficients. You can use the standard spreadsheet options to turn these results into an input spreadsheet, for example, to create 2D scatterplots for selected dimensions. Such scatterplots, when they contain labeled points (see also 2D Brushing), can be very useful for exploring the meaning of the dimensions into which the words and documents are mapped, i.e., for understanding the semantic space for the extracted words or terms and documents. See also the section on latent semantic indexing via singular value decomposition in the Introductory Overview.
Word - Importance: Select this check box to produce the word importance values computed from the singular value decomposition. As described in Singular Value Decomposition in STATISTICA Text Mining and Document Retrieval, the reported values (indices) are proportional to, and can be interpreted as, the extent to which the individual words are represented or reproduced by the dimensions extracted via singular value decomposition and, hence, how important the words are for defining the semantic space extracted by this technique.
Word - Residuals: Select this check box to produce the sums of squares of word residuals from the singular value decomposition. As described in Singular Value Decomposition in STATISTICA Text Mining and Document Retrieval, these values indicate how well each word is represented by the semantic space defined by the dimensions extracted via singular value decomposition (see also Latent Semantic Indexing).
Document scores: Select this check box to generate a results spreadsheet with the document scores. Like the word coefficients, these can be plotted in 2D or 3D scatterplots to aid in the interpretation of the semantic space defined by the extracted words and documents in the analysis.
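
To make the relationship among these results more concrete, the following NumPy sketch derives analogous quantities from a truncated SVD of a small made-up matrix. The word list, the data, and the variable names are hypothetical. The formulas for importance and residuals are assumptions chosen to match the descriptions above (sums of squares per word that are reproduced, or not reproduced, by the retained dimensions); the exact definitions and scaling used by STATISTICA are given in Singular Value Decomposition in STATISTICA Text Mining and Document Retrieval and may differ.

```python
import numpy as np

# Hypothetical words and term-document matrix (rows: words, columns: documents).
words = ["engine", "brake", "wheel", "soup", "bread", "salt"]
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                  # "number of concepts to use" (assumed)

singular_values = s[:k]
word_coefficients = U[:, :k]           # one row per word
document_scores = Vt[:k, :].T          # one row per document

# Assumed importance index: sum of squares of each word's frequencies
# that is reproduced by the k retained dimensions.
weighted = U * s                       # column j of U scaled by s[j]
word_importance = np.sum(weighted[:, :k] ** 2, axis=1)

# Assumed residuals: sum of squares of each word's frequencies that the
# rank-k reconstruction fails to reproduce.
A_k = weighted[:, :k] @ Vt[:k, :]
word_residuals = np.sum((A - A_k) ** 2, axis=1)

for w, imp, res in zip(words, word_importance, word_residuals):
    print(f"{w:8s} importance {imp:7.3f}  residual {res:7.3f}")
```

Under these assumptions, the importance and residual values of each word add up to that word's total sum of squares in the matrix, so a word with a large residual relative to its importance is poorly represented by the retained dimensions.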

Options / C. See Common Options.