Cleansing Project Data

Combine a variety of functions such as data faceting and transforming to cleanse project data.

The patients project created in Creating a Dataset and a Project is used to show how to cleanse data.

Procedure

  1. Go to the patients project data page.
  2. From the PATNO column menu, click Facet > Text Facet.
  3. In the PATNO facet panel, click count.
    Patient number of 002, 003, and 006 have duplicates.
  4. Remove the duplicate rows or modify the duplicate value:
    1. Click 002.
      Two rows of patient 002 with identical information are displayed.
    2. Flag either row.
    3. On the toolbar, click Data > Edit rows > Remove all flagged rows.
    Continue to remove other duplicate rows. Here, PANTO 003 and PANT 006 have duplicate values, not duplicate rows, so either remove a duplicate row or modify one duplicate patient number.
  5. From the GENDER column menu, click Facet > Text Facet.
    The GENDER column has invalid values: 2, X, and (blank).
  6. Remove the invalid value 2:
    1. Click 2.
      One matching row is displayed.
    2. From the GENDER column menu, click Edit cells > Common transforms > Blank out cells to remove this value.
  7. From the VISIT column menu, click Facet > Text pattern facet.
    Some dates are not in the MM/dd/yyyy format.
  8. Transform the invalid date formats to the MM/dd/yyyy format:
    1. Click 99/99/99, and then update value of VISIT, for example, 10/12/1998.
    2. Click AAAAAAAA, and then blank out the value of VISIT.
  9. Check the HR column data:
    1. From the HR column menu, click Facet > Numeric facet.
    2. Clear the Numeric check box to display results of non-numeric and blank data only.
      Six matching rows are displayed.
    3. From the HR column menu, click Edit cells > Common transforms > Blank out cells.
    4. Flag all blank rows.
    5. Select the Numeric check box.
    6. Move between the ranges of 10 - 40 and 120 - 910 respectively.
    7. Flag all the rows that fall into these two ranges.
    8. Click Reset to return to the project data page.
    Apply the same transformation rules for the HR column to the SBP, DBP, and DX numeric columns.

    When you finish the transformation, all the rows with invalid and incomplete patient values are flagged.

  10. From the AE column, click Facet > Text Facet:
    1. In the AE facet panel, click (blank), and then flag the rows with blank AE value.
    2. Click Reset to return to the project data page.
  11. On the toolbar, click Data > Edit rows > Remove all flagged rows.
  12. On the toolbar, click Data > Edit rows > Remove all validated errors rows.

Result

The patients project data is cleansed.

What to do next

Exporting Project Data