Cleansing Project Data

Combine a variety of functions such as data faceting and transforming to cleanse project data.

The created patients project is used to show how to cleanse data.

Procedure

  1. Go to the patients project data page.
  2. From the PATNO column menu, click Facet > Text Facet.
  3. In the PATNO facet panel, click count.
    Patient 2, 3, and 6 have duplicates.
  4. Remove the duplicate rows or modify the duplicate value:
    1. Click 2.
      Two rows of patient 2 with identical information are displayed.
    2. Flag either row.
    3. On the toolbar, click Data > Edit rows > Remove all flagged rows.
    Continue to remove other duplicate rows. In this example, PANTO 3 and PANT 6 have duplicate values, not duplicate rows, so either remove a duplicate row or modify one duplicate patient number.
  5. From the GENDER column menu, click Facet > Text Facet.
    The GENDER column contains three types of invalid values: 2, X, and (blank).
  6. Remove the invalid value 2:
    1. Click 2.
      One matching row is displayed.
    2. From the GENDER column menu, click Edit cells > Common transforms > Blank out cells to remove this value.
  7. From the VISIT column menu, click Facet > Text pattern facet.
    Some dates are not in the MM/dd/yyyy format.
  8. Transform the invalid date formats to the MM/dd/yyyy format:
    1. Click 99/99/99, and then update the value of VISIT, for example, 10/12/1998.
    2. Click AAAAAAAA, and then blank out the value of VISIT.
  9. Check the HR column data:
    1. Click the data type icon before the HR column header, and change its data type to Integer.
    2. From the HR column menu, click Facet > Numeric facet.
    3. Clear the Numeric check box to display results of non-numeric and blank cells only.
      Six matching rows are displayed.
    4. From the HR column menu, click Edit cells > Common transforms > Blank out cells.
    5. Flag all blank rows.
    6. Select the Numeric check box.
    7. Move to the ranges of 10 - 40 and 120 - 910 respectively.
    8. Flag all the rows that fall into these two ranges.
    9. Click Reset to return to the project data page.
    Apply the same transformation rules for the HR column to the SBP, DBP, and DX columns.

    When you finish the transformation, all the rows with invalid and incomplete patient values are flagged.

  10. From the AE column menu, click Facet > Text Facet:
    1. In the AE facet panel, click (blank), and then flag the rows with blank AE value.
    2. Click Reset to return to the project data page.
  11. On the toolbar, click Data > Edit rows > Remove all flagged rows.
  12. On the toolbar, click Data > Edit rows > Remove all validated errors rows.

Result

The patients project data is cleansed.

What to do next

Exporting Project Data