Example 2: Illustration of EM Clustering with a synthetic data set

Then click OK to apply these formulas. The results will look (approximately) as follows:

The first variable will contain the integer values 1 (cases 1 through 1000), 2 (cases 1001 through 2000), and 3 (cases 2001 through 3000). Variable 2 will contain normal random numbers with (approximate) means and standard deviations of 5 and 1 (cases 1 through 1000), 10 and 2 (cases 1001 through 2000), and 15 and 3 (cases 2001 through 3000). Variable 3 will contain Poisson random numbers with (approximate) parameter values of 5 (cases 1 through 1000), 10 (cases 1001 through 2000), and 15 (cases 2001 through 3000).
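
For readers who want to build a comparable data set outside the program, the following is a minimal Python sketch using NumPy. It is not the spreadsheet formulas referred to above; the seed and the names var1, var2, var3, and n_per_cluster are assumptions chosen for illustration, and the exact numbers will differ from those produced by the program.

```python
import numpy as np

# Fixed seed: an arbitrary choice so that the sketch is reproducible.
rng = np.random.default_rng(0)

n_per_cluster = 1000
means = [5, 10, 15]     # normal means for variable 2, one per cluster
sds = [1, 2, 3]         # normal standard deviations for variable 2
lambdas = [5, 10, 15]   # Poisson parameters for variable 3

# Variable 1: cluster label 1, 2, 3 for cases 1-1000, 1001-2000, 2001-3000.
var1 = np.repeat([1, 2, 3], n_per_cluster)
# Variable 2: normal random numbers, different mean and sd in each block.
var2 = np.concatenate([rng.normal(m, s, n_per_cluster) for m, s in zip(means, sds)])
# Variable 3: Poisson random numbers, different parameter in each block.
var3 = np.concatenate([rng.poisson(lam, n_per_cluster) for lam in lambdas])

# Per-cluster summaries; these should be close to the parameters above.
for k in (1, 2, 3):
    mask = var1 == k
    print(k, var2[mask].mean(), var2[mask].std(ddof=1), var3[mask].mean())
```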
Click on the EM tab and click the Select distributions button; then add variable 2 (Var2) to the list of Normal variables and variable 3 (Var3) to the list of Poisson variables.

Now, click OK to begin the analysis, and after a few seconds the Results dialog will be displayed.

Shown above are the results for both variables Var2 and Var3. The final parameter estimates for each cluster are given in the header of each graph, which specifies the distribution functions depicted in that graph. As you can see, the parameters that we "inserted" into the data by generating random numbers from known distributions (with different parameters for each of the three clusters) are reasonably well reproduced. In other words, the mixture of 3 normal and 3 Poisson distributions was successfully estimated from the data, and the clusters were extracted as expected. This example further illustrates the basic "mechanism" of the EM clustering algorithm, which is described in more detail in the Introductory Overview.
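
The estimation step can also be sketched in a few lines of Python. This is an illustrative EM loop written under stated assumptions, not the program's implementation: it assumes that Var2 and Var3 are independent within each cluster, starts from quantile-based initial values, and runs a fixed number of iterations instead of checking convergence. All names (x, y, mu, sigma, lam, weight, resp) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n = 1000
# Synthetic data as described in this example: x plays the role of Var2, y of Var3.
x = np.concatenate([rng.normal(m, s, n) for m, s in [(5, 1), (10, 2), (15, 3)]])
y = np.concatenate([rng.poisson(lam, n) for lam in (5, 10, 15)])

K = 3
# Starting values (an assumption of this sketch): spread over quantiles of the data.
mu = np.quantile(x, [0.2, 0.5, 0.8])
sigma = np.full(K, x.std())
lam = np.maximum(np.quantile(y, [0.2, 0.5, 0.8]), 0.5)
weight = np.full(K, 1.0 / K)

for _ in range(200):
    # E-step: log of (cluster weight * normal density * Poisson probability).
    log_norm = -0.5 * np.log(2 * np.pi * sigma**2) - (x[:, None] - mu) ** 2 / (2 * sigma**2)
    # The log(y!) term is the same for every cluster, so it cancels when the
    # responsibilities are normalized and is omitted here.
    log_pois = y[:, None] * np.log(lam) - lam
    log_joint = np.log(weight) + log_norm + log_pois
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_joint)
    resp /= resp.sum(axis=1, keepdims=True)  # responsibilities, rows sum to 1

    # M-step: responsibility-weighted parameter updates.
    nk = resp.sum(axis=0)
    weight = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    lam = (resp * y[:, None]).sum(axis=0) / nk

# Cluster numbering is arbitrary in EM; sort by the normal mean for reporting.
order = np.argsort(mu)
mu, sigma, lam, weight = mu[order], sigma[order], lam[order], weight[order]
print("normal means:", mu)
print("normal sds:  ", sigma)
print("Poisson lam: ", lam)
print("weights:     ", weight)
```

With well-separated starting values, the sorted estimates should land near the generating parameters, in the same sense that the graph headers reproduce them. Different seeds or starting values can give slightly different results.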