Text Mining and Document Retrieval Overview

The Statistica Text and Document Mining module provides tools for processing unstructured (textual) information. Here, processing includes gathering, manipulating, storing, retrieving, and classifying recorded information. The information is made accessible to the various data mining (statistical and machine learning) algorithms available in the Statistica system. It can be extracted to derive summaries for the words, or to compute summaries for the documents based on the words they contain.
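As a rough illustration of the word and document summaries mentioned above, the following Python sketch counts words in a few short texts and derives simple per-word and per-document figures. It is not Statistica code and does not reflect how the module is implemented; the sample documents, the tokenize helper, and all variable names are assumptions made for this example only.

```python
import re
from collections import Counter

# Hypothetical sample documents (not taken from any Statistica data set).
documents = [
    "The engine stalls when the car is cold.",
    "Cold weather drains the battery.",
    "The battery and the engine were replaced.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# One word-count table per document.
doc_counts = [Counter(tokenize(d)) for d in documents]

# Sorted list of every distinct word found in any document.
vocabulary = sorted(set().union(*doc_counts))

# Document-by-word count matrix: one row per document, one column per word.
matrix = [[counts[word] for word in vocabulary] for counts in doc_counts]

# Word summaries: total occurrences, and number of documents containing the word.
word_totals = {w: sum(c[w] for c in doc_counts) for w in vocabulary}
doc_freq = {w: sum(1 for c in doc_counts if w in c) for w in vocabulary}

# Document summaries: number of word tokens in each document.
doc_lengths = [sum(c.values()) for c in doc_counts]

for word in vocabulary:
    print(f"{word:10s} total={word_totals[word]} documents={doc_freq[word]}")
print("tokens per document:", doc_lengths)
```

Once text is in this numeric form, each row of the matrix can be passed to ordinary statistical or machine learning methods like any other data record.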

The methods implemented in this module are described and discussed in great detail by Manning and Schütze (2002). For an in-depth treatment of these and related topics, as well as the history of this approach to text mining, we highly recommend that source. See also Miner, G., Elder, J., Hill, T., Nisbet, R., Delen, D., and Fast, A. (2012).

  • Text Mining Applications
  • Approaches to Text Mining
  • Issues and Considerations for "Numericizing"
  • Word Frequencies Transformation
  • Latent Semantic Indexing using Singular Value Decomposition
  • Incorporating Text Mining Results in Data Mining Projects