Text Mining and Document Retrieval Overview
The Statistica Text and Document Mining module provides tools for processing unstructured (textual) information. Here, processing information includes gathering, manipulating, storing, retrieving, and classifying recorded information. The information is made accessible to the various data mining (statistical and machine learning) algorithms available in the Statistica system. Information can be extracted to derive summaries for the words, or to compute summaries for the documents based on the words they contain.
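As a rough illustration of what "summaries for the words" and "summaries for the documents" can look like, the following minimal Python sketch counts words per document and then aggregates those counts over the whole collection. It is not Statistica code and does not reflect how the module is implemented; the function names (tokenize, document_term_counts, word_summaries), the simple letters-only tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import Counter
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and keep runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def document_term_counts(documents):
    # Per-document summary: one Counter of word frequencies for each document.
    return [Counter(tokenize(doc)) for doc in documents]


def word_summaries(per_doc_counts):
    # Per-word summaries across the collection:
    #   total    = how often each word occurs overall
    #   doc_freq = in how many documents each word occurs
    total = Counter()
    doc_freq = Counter()
    for counts in per_doc_counts:
        total.update(counts)
        doc_freq.update(counts.keys())
    return total, doc_freq


# Made-up sample documents, for illustration only.
documents = [
    "Text mining turns unstructured text into numbers.",
    "Document retrieval finds the documents that match a query.",
    "Mining documents for words yields word counts.",
]

per_doc = document_term_counts(documents)
total, doc_freq = word_summaries(per_doc)

print(per_doc[0].most_common(3))
print(total["mining"], doc_freq["mining"])
```

Numeric tables of this kind (documents by words) are the sort of structured input that statistical and machine learning algorithms can then work with.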
The methods implemented in this module are described and discussed in detail by Manning and Schütze (2002). For an in-depth treatment of these and related topics, as well as the history of this approach to text mining, we highly recommend that source. See also Miner, G., Elder, J., Hill, T., Nisbet, R., Delen, D., and Fast, A. (2012).