Cleansing Project Data
Combine a variety of functions such as data faceting and transforming to cleanse project data.
The patients project created in Creating a Dataset and a Project is used to show how to cleanse data.
Procedure
- Go to the patients project data page.
- From the PATNO column menu, click Facet > Text Facet.
- In the PATNO facet panel, click count.
Patient numbers 002, 003, and 006 have duplicates.
- Remove the duplicate rows or modify the duplicate values. Here, PATNO 003 and PATNO 006 have duplicate values, not duplicate rows, so for each of them either remove one of the rows or modify one of the duplicate patient numbers. Continue in the same way to remove the other duplicate rows.
- From the GENDER column menu, click Facet > Text Facet.
The GENDER column has invalid values: 2, X, and (blank).
- Remove the invalid value 2:
- From the VISIT column menu, click Facet > Text pattern facet.
Some dates are not in the MM/dd/yyyy format.
- Transform the invalid date formats to the MM/dd/yyyy format:
- Check the HR column data:
- From the HR column menu, click Facet > Numeric facet.
- Clear the Numeric check box to display results of non-numeric and blank data only.
Six matching rows are displayed.
- From the HR column menu, click Edit cells > Common transforms > Blank out cells.
- Flag all blank rows.
- Select the Numeric check box.
- Move the facet slider handles to select the ranges 10 - 40 and 120 - 910 in turn.
- Flag all the rows that fall into these two ranges.
- Click Reset to return to the project data page.
When you finish the transformation, all the rows with invalid and incomplete patient values are flagged.
- From the AE column menu, click Facet > Text Facet:
- On the toolbar, click Data > Edit rows > Remove all flagged rows.
- On the toolbar, click Data > Edit rows > Remove all validated errors rows.
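For readers who want to reason about the same checks outside the user interface, the procedure above can be sketched in plain Python. This is only an illustrative sketch, not how the product implements faceting: the sample rows, the set of valid GENDER values (M and F), the alternative date format, and all function names are assumptions made for the example. It also simplifies the procedure by flagging every row with an invalid GENDER, VISIT, or HR value, and it does not cover the AE column.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; the real patients dataset is not reproduced here.
rows = [
    {"PATNO": "001", "GENDER": "M", "VISIT": "11/11/1998", "HR": "88"},
    {"PATNO": "002", "GENDER": "F", "VISIT": "11/13/1998", "HR": "84"},
    {"PATNO": "002", "GENDER": "F", "VISIT": "11/13/1998", "HR": "84"},
    {"PATNO": "003", "GENDER": "X", "VISIT": "1998-10-21", "HR": "68"},
    {"PATNO": "004", "GENDER": "2", "VISIT": "01/01/1999", "HR": "NA"},
    {"PATNO": "005", "GENDER": "M", "VISIT": "05/07/1998", "HR": "900"},
]

VALID_GENDERS = {"M", "F"}        # assumption: the only valid values
TARGET_FORMAT = "%m/%d/%Y"        # MM/dd/yyyy in strptime notation
ALT_DATE_FORMATS = ["%Y-%m-%d"]   # assumption: one other format seen in the data


def duplicate_patnos(rows):
    """Return PATNO values that occur more than once (like a text facet sorted by count)."""
    counts = Counter(row["PATNO"] for row in rows)
    return sorted(patno for patno, n in counts.items() if n > 1)


def drop_exact_duplicates(rows):
    """Keep the first copy of each fully identical row."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def normalize_visit(value):
    """Return the date as MM/dd/yyyy, or None if no known format matches."""
    for fmt in [TARGET_FORMAT] + ALT_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    return None


def is_flagged(row):
    """Flag rows with an invalid GENDER, an unparseable VISIT, or a suspect HR."""
    if row["GENDER"] not in VALID_GENDERS:
        return True
    if normalize_visit(row["VISIT"]) is None:
        return True
    try:
        hr = float(row["HR"])
    except ValueError:
        return True  # non-numeric or blank heart rate
    # The two out-of-range bands selected in the numeric facet step.
    return 10 <= hr <= 40 or 120 <= hr <= 910


def cleanse(rows):
    """Drop exact duplicates and flagged rows, and normalize VISIT dates."""
    cleaned = []
    for row in drop_exact_duplicates(rows):
        if is_flagged(row):
            continue
        cleaned.append({**row, "VISIT": normalize_visit(row["VISIT"])})
    return cleaned
```

Calling duplicate_patnos(rows) lists the patient numbers that need attention, and cleanse(rows) returns only the rows that pass every check. Rows that share a PATNO but differ in other columns are deliberately left alone by drop_exact_duplicates, because deciding whether to remove one or correct its patient number is a manual judgment, as in the procedure.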
Copyright © Cloud Software Group, Inc. All Rights Reserved.