Concept Extraction Tab
Select the Concept extraction tab of the Results dialog box to access options for performing singular value decomposition (SVD) on the term-document matrix. The decomposition uses only the selected words, with the word frequencies or transformed word frequencies as currently selected in the Frequency (importance/relevance measure) group box in the Text Miner Results dialog box.
Note that results are available for only one type of word frequency transformation at a time: once you change your selection in the Text Miner Results dialog box, any previously computed singular value decomposition results are discarded.
The last computed SVD results for the currently selected words will be saved automatically in the Text Miner project’s database file (see the documentation for the Project tab of the Text Mining dialog box). Saved data can then be used in subsequent analyses to automatically index and score new documents. Thus, this data is essential for many applications of text mining (as discussed in the Introductory Overview), for example, to implement automatic mail filters or text routing systems.
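The idea of scoring new documents from saved SVD results can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the program's internal procedure: the toy matrix `A`, the value of `k`, and the helper `score_new_document` are hypothetical, and the projection shown is the standard "fold-in" used in latent semantic indexing. It assumes the new document is described by the same words, in the same order, with the same frequency transformation as the original analysis.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows = selected words, columns = documents.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
])

# Thin SVD of the training matrix: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2            # number of concepts retained (assumed for illustration)
U_k = U[:, :k]   # word-side basis; this is the part that would need to be saved


def score_new_document(word_vector, basis):
    """Project a new document's word-frequency vector onto the saved concepts."""
    return np.asarray(word_vector, dtype=float) @ basis


# A new document described by the same four words, in the same order.
new_doc = np.array([1.0, 1.0, 0.0, 0.0])
new_scores = score_new_document(new_doc, U_k)
print(new_scores)
```

With this convention, projecting one of the original documents reproduces its score from the decomposition (scaled by the singular values), which is a convenient sanity check for a saved basis.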
Option | Description |
---|---|
Perform Singular Value Decomposition (SVD) | Starts computing the singular value decomposition of the term-document matrix (word occurrences or their transformations as currently selected in the Text Mining Results dialog box). When the computations are completed, the various options for reviewing the coefficients and document scores become available (not dimmed). |
Number of concepts to use | In this box, you can limit the number of singular values to use (it will be equal to or smaller than the number generated as a result of SVD calculations). This allows you to control the number of dimensions of meaning to consider. |
Scree plot | Click this button to create a scree plot of the singular values extracted from the term-document matrix. This plot is useful for determining how many singular values are informative and should be retained for subsequent analyses. Usually, the number of dimensions to retain is determined by locating the elbow in this plot, to the right of which one presumably finds only the factorial scree due to random noise. |
Singular values | Click this button to produce a results spreadsheet with the singular values as computed and selected. |
Word Coefficient | Click this button to produce a results spreadsheet with the word coefficients. You can use the standard spreadsheet options to turn these results into an input spreadsheet in order to, for example, create 2D scatterplots for selected dimensions. Such scatterplots, when they contain labeled points (see also 2D Brushing), can be very useful for exploring the meaning of the dimensions into which the words and documents are mapped, i.e., to understand the semantic space for the extracted words or terms and documents. See also the section on latent semantic indexing via singular value decomposition in the Introductory Overview. |
Importance | Click this button to display the word importance values computed from the singular value decomposition. As described in Singular Value Decomposition in Statistica Text Mining and Document Retrieval, the reported values (indices) are proportional to and can be interpreted as the extent to which the individual words are represented or reproduced by the dimensions extracted via singular value decomposition and, hence, how important the words are for defining the semantic space extracted by this technique. |
Residuals | Click this button to display the sums of squares of word residuals from the singular value decomposition. As described in Singular Value Decomposition in Statistica Text Mining and Document Retrieval, these values are related to the extent to which each word is represented well by the semantic space defined by the dimensions extracted via singular value decomposition (see also Latent Semantic Indexing). |
Document scores | Click this button to generate a results spreadsheet with the document scores; like the word coefficients, these can be plotted in 2D or 3D scatterplots to aid in the interpretation of the semantic space defined by the extracted words and documents in the analysis. |
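The quantities listed in the table can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the toy matrix, the word list, the choice of `k`, and in particular the scaling conventions. Word coefficients and document scores are shown scaled by the singular values, importance is taken as the sum of squares reproduced by the retained dimensions, and residuals as the remaining sum of squares; the program's exact formulas may differ (the text above says only that the reported importance values are proportional to the extent to which each word is reproduced).

```python
import numpy as np

# Hypothetical toy term-document matrix: rows = words, columns = documents.
words = ["svd", "matrix", "mail", "filter"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
# The values in s are what a scree plot would display.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # "Number of concepts to use" (assumed value for illustration)

# Assumed convention: coefficients and scores scaled by the singular values.
word_coefficients = U[:, :k] * s[:k]    # one row per word
document_scores = Vt[:k, :].T * s[:k]   # one row per document

# Rank-k reconstruction of the term-document matrix.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Per-word sum of squares reproduced by the k dimensions, and the remainder.
importance = np.sum(A_k ** 2, axis=1)
residual_ss = np.sum((A - A_k) ** 2, axis=1)

print("singular values:", np.round(s, 3))
for word, imp, res in zip(words, importance, residual_ss):
    print(f"{word:>7}: importance={imp:.3f}  residual SS={res:.3f}")
```

Under these conventions, each word's importance and residual sum of squares add up to its total sum of squares in the original matrix, so a word with a large residual is one that the retained dimensions represent poorly. The first two columns of `word_coefficients` or `document_scores` are what one would plot in a 2D scatterplot.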