LDA Training and Model Evaluation Tips
When you use the LDA Trainer and LDA Predictor, the following guidelines can help you produce more meaningful results.
- Select the right n-grams to use. Ensure that the N-Gram Dictionary and the n-gram selection method are relevant to your corpus, for example by specifying or updating a customized Stop Words File in the N-Gram Dictionary Builder and by changing the n-gram selection method.
- Run the LDA long enough. It can require many iterations to obtain relevant topics.
- Try different parameters (number of topics, etc.) and evaluate log perplexity on a held-out sample.
- Building a good LDA model often requires many iterations and human feedback. Log perplexity is useful for relative comparisons between models or parameter settings, but its absolute value has little meaning on its own, and it does not necessarily correlate with human judgment of topic quality.
- Inspect the topics: Look at the highest-likelihood words in each topic. Do they sound like they form a cohesive topic, or just some random group of words?
- Inspect the topic assignments. Hold out a few random documents from training and see what topics LDA assigns to them. Manually inspect the documents and the top words in the assigned topics. Does it look like the topics really describe what the documents are actually about?
- Look at the density of the words in each topic. A topic whose constituent words all have low densities is most likely a weak topic.
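The checks above can be sketched outside the operators in a few lines of Python. This is only an illustration under stated assumptions, not the LDA Trainer's implementation: the function names (log_perplexity, top_words, top_mass), the toy topics, and the held-out documents are all hypothetical, and the topic-word and document-topic distributions are assumed to have been exported as plain Python structures. It shows a relative comparison of two models by per-token log perplexity on held-out documents (lower is better), a listing of the highest-likelihood words per topic, and the share of probability held by a topic's top words as a rough signal of a weak topic.

```python
import math


def log_perplexity(docs, doc_topic, topic_word, floor=1e-12):
    """Average negative log-likelihood per token (lower is better).

    docs:       list of token lists (held-out documents)
    doc_topic:  one list of topic proportions per document
    topic_word: one dict per topic, mapping word -> probability
    floor:      small value that avoids log(0) for unseen words (assumption)
    """
    total_log_prob = 0.0
    total_tokens = 0
    for tokens, theta in zip(docs, doc_topic):
        for word in tokens:
            # p(word | document) = sum over topics of theta[k] * phi[k][word]
            p = sum(w * phi.get(word, 0.0) for w, phi in zip(theta, topic_word))
            total_log_prob += math.log(max(p, floor))
            total_tokens += 1
    if total_tokens == 0:
        raise ValueError("held-out sample contains no tokens")
    return -total_log_prob / total_tokens


def top_words(topic_word, n=5):
    """Return the n highest-probability words of each topic."""
    return [sorted(phi, key=phi.get, reverse=True)[:n] for phi in topic_word]


def top_mass(phi, n=5):
    """Probability mass held by the n heaviest words of one topic."""
    return sum(sorted(phi.values(), reverse=True)[:n])


# Hypothetical held-out documents and their topic proportions.
held_out = [["goal", "team", "team"], ["vote", "party"]]
proportions = [[0.9, 0.1], [0.1, 0.9]]

# Two hypothetical models: one with focused topics, one with flat topics.
focused = [
    {"goal": 0.5, "team": 0.4, "vote": 0.1},
    {"vote": 0.5, "party": 0.4, "team": 0.1},
]
flat_topic = {"goal": 0.25, "team": 0.25, "vote": 0.25, "party": 0.25}
uniform = [dict(flat_topic), dict(flat_topic)]

focused_score = log_perplexity(held_out, proportions, focused)
uniform_score = log_perplexity(held_out, proportions, uniform)

print("focused model:", round(focused_score, 4))
print("uniform model:", round(uniform_score, 4))
print("top words:", top_words(focused, n=2))
print("top-2 mass, focused topic 0:", top_mass(focused[0], n=2))
print("top-2 mass, uniform topic 0:", top_mass(uniform[0], n=2))
```

Only the difference between the two scores is meaningful here, which matches the guidance that log perplexity is suited to relative comparisons. The top-word listing and the top-word mass are aids for manual review, and any cutoff for calling a topic weak has to be chosen by inspecting your own data.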
Copyright © Cloud Software Group, Inc. All rights reserved.