Identifying PII data


    Article summary

    Document details

    Purpose

    To help you use data discovery to find PII data across an environment so that it can be removed.

    Audience

    Anyone who needs to ensure that their test data complies with privacy legislation and who wants to restrict access to users’ sensitive data.

    Requirements

    Access to the Curiosity Dashboard.

    Additional Links

    Video tutorial on PII Identification in the Curiosity Platform

    Why do you need to do this

    To identify data in your databases that could identify customers or is otherwise sensitive, such as account numbers or medical details.

    To do this, you first need to scan the database.

    Set up the Scan Database activity

    You will need to create a scanning activity and set up some classifiers.

    This process is detailed in the manual Data Discovery → Data Profiling [ADD LINK ONCE IT IS VISIBLE]

    You should end up with an activity like this example, where I have added the database connection and definition (1), then created three category lists:

    Regex (4): looks at the data in the columns and matches it against regular expressions to identify the data characteristics

    Name (3): looks at column names and data types

    Seed (2): examines particular data values
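    To make the difference between the three list types concrete, here is a minimal Python sketch of the general idea. It is not how the Curiosity Platform implements its classifiers: the patterns, seed values, tag names, the `classify_column` function and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical classifier lists, for illustration only.
# Regex: patterns matched against the data values in a column.
REGEX_CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}
# Name: patterns matched against the column name, plus allowed data types.
NAME_CLASSIFIERS = {
    "account_number": (
        re.compile(r"acc(oun)?t_?(no|num|number)", re.IGNORECASE),
        {"int", "bigint", "char", "varchar"},
    ),
}
# Seed: known example values looked up in the column data.
SEED_CLASSIFIERS = {
    "first_name": {"alice", "bob", "carol"},
}


def classify_column(column_name, data_type, values, threshold=0.8):
    """Return the set of PII tags suggested for a single column."""
    tags = set()

    # Name check: column name and data type only, no data values needed.
    for tag, (pattern, allowed_types) in NAME_CLASSIFIERS.items():
        if pattern.search(column_name) and data_type.lower() in allowed_types:
            tags.add(tag)

    # Normalise the sampled values and skip NULLs and blanks.
    sample = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not sample:
        return tags

    # Regex check: tag the column if enough values match the pattern.
    for tag, pattern in REGEX_CLASSIFIERS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            tags.add(tag)

    # Seed check: tag the column if enough values appear in the seed list.
    for tag, seeds in SEED_CLASSIFIERS.items():
        hits = sum(1 for v in sample if v.lower() in seeds)
        if hits / len(sample) >= threshold:
            tags.add(tag)

    return tags
```

    In this sketch, a column called `acct_no` of type `int` would be tagged by the name check alone, while a column with an uninformative name such as `col7` could still be tagged if enough of its values match a regular expression or a seed list. That is the reason for combining all three list types in one scan.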

    Then create a scan submit form (5).

    Finally, execute the scan (6) and wait for the job to finish.

    Note that on the configuration tab, you should click edit.

    Then make sure that the ‘Scanning list application id’ is set to the same application that was set in the components.

    Review the PII data

    Once the scan has completed, go to the data dictionary page (1), select the Definitions tab (2) and click on your database definition (3).

    Then open the versions list (1) and choose the latest version (2).

    Then expand the table lists (X) to view the columns (X) and see which are tagged as PII data (X).

    [SCREEN SHOT HERE  of expanded list of accounts and columns showing PII tags ONCE functionality on partners - Mark: column with PII tag, and ‘view diagram’ button - ideally with drop down list shown and Entity relationship diagram marked as well]

    You can also view the entity relationship diagram (X) by selecting it from the view diagram button (X).

    On the diagram, columns that contain PII data are marked with a shield symbol (X). This is the data that you will need to either mask or replace with generated data.

    [ENTITY Diagram screen shot here - with PII data highlighted]