Data Preparation - Advanced Tab
For the Data Preparation step, the following options are available on the Advanced tab.
| Element Name | Description |
|---|---|
| Use Sample data | Use the options in this group box to create a new output data spreadsheet that is a sample of the input data set. Select the sampling method, and then click the More options button to view sampling options specific to the selected method. |
| Remove duplicate records (cases) | Select this check box to detect and remove duplicate records during the run and validation process. Select the Remove duplicate records (cases) check box and click the Duplicate records (cases) button to display the Select variables to define duplicate records dialog box. Use the single variable selection dialog box to select any number of variables; records that match on all selected variables are treated as duplicates. |
| Valid data range | You can use the options in this dialog box to specify a minimum and maximum value for each of the selected variables. Cases with values outside the specified range are treated as invalid data. Select the Valid data range check box and click the Valid data range button to display the Missing data and Invalid Case Definition dialog box. |
| Remove outlier | You can use the options in this dialog box to select the variables for outlier analysis and specify how to treat outliers once they are detected. The Set to boundary option (located in the Outlier and Extreme Value dialog box) iteratively recodes outliers and extreme values to the ±3 standard deviation limits. Select the Remove outlier check box and click the Outlier button to display the Outlier and Extreme Value dialog box. |
| Missing data | You can specify the algorithm used to handle missing data, such as mean substitution or casewise deletion, for each variable in the analysis. Select the Missing data check box and click the Missing data definition button to display the Missing data definition dialog box. |
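
To make the options above concrete, the following Python sketch shows one plausible reading of each operation: de-duplication on selected variables, a valid data range, iterative set-to-boundary recoding, mean substitution, and casewise deletion. It is an illustration only and not the product's implementation. All function names are hypothetical, representing invalid values as `None` is an assumption, and the stopping rule for the iterative recoding (repeat until no value changes, up to `max_iter` passes) is an assumption as well.

```python
from statistics import mean, stdev


def remove_duplicates(rows, key_vars):
    """Keep the first record for each distinct combination of key_vars."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[v] for v in key_vars)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def apply_valid_range(values, low, high):
    """Mark values outside [low, high] as invalid (None in this sketch)."""
    return [v if v is not None and low <= v <= high else None for v in values]


def set_to_boundary(values, n_sd=3.0, max_iter=100):
    """Repeatedly recode values beyond mean +/- n_sd * SD to that boundary.

    The limits are recomputed after every pass because recoding an
    outlier changes both the mean and the standard deviation.
    """
    current = list(values)
    for _ in range(max_iter):
        centre, spread = mean(current), stdev(current)
        low, high = centre - n_sd * spread, centre + n_sd * spread
        recoded = [min(max(v, low), high) for v in current]
        if recoded == current:  # nothing left outside the limits
            break
        current = recoded
    return current


def mean_substitution(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def casewise_deletion(rows, variables):
    """Drop every record that has a missing value in any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]
```

In this sketch, mean substitution keeps every case but reduces the variance of the filled variable, while casewise deletion keeps the observed values intact at the cost of fewer cases. That trade-off is a common reason to choose the method per variable.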