Identifying PII data
Document details | |
Purpose | To help you use data discovery across an environment to remove any PII data. |
Audience | Anyone needing to ensure their test data is compliant with privacy legislation and who cares about restricting access to users’ sensitive data. |
Requirements | Access to the Curiosity Dashboard. |
Additional Links | Video tutorial on PII Identification in the Curiosity Platform |
Why do you need to do this
You need to find data in your databases that could identify customers or is otherwise sensitive, such as account numbers or medical details.
To do this, you first need to scan the database.
Set up the Scan Database activity
You will need to create a scanning activity and set up some classifiers.
This process is detailed in the manual Data Discovery → Data Profiling [ADD LINK ONCE IT IS VISIBLE]
You should end up with an activity like this example where I have added the database connection and definition (1), then created three category lists for:
Regex (4): which looks at the data in the columns and matches them with regular expressions to identify the data characteristics
Name (3): which looks at column names and data types
Seed (2): which examines particular data values
Then create a scan submit form (5)
Finally execute the scan (6) and wait for the job to finish.
Note that on the configuration tab, you should click edit, then make sure that the ‘Scanning list application id’ is set to the same application as was set in the components.
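To make the difference between the three category lists more concrete, here is a minimal Python sketch of how regex, name and seed classifiers could each tag a column. It is an illustration only, not the Curiosity Platform's implementation or API: the rule names, patterns, seed values, the `classify_column` function and the 0.8 match threshold are all assumptions chosen for the example.

```python
import re

# Hypothetical rules for illustration; not the platform's built-in lists.

# Regex classifiers: match the data values in a column against a pattern.
REGEX_RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

# Name classifiers: match the column name itself.
NAME_RULES = {
    "surname": re.compile(r"(sur|last)_?name", re.IGNORECASE),
    "date_of_birth": re.compile(r"(dob|birth)", re.IGNORECASE),
}

# Seed classifiers: compare data values against a list of known values.
SEED_VALUES = {
    "first_name": {"alice", "bob", "carol"},
}


def classify_column(name, values, threshold=0.8):
    """Return the set of PII tags suggested for one column.

    `threshold` is the assumed fraction of non-empty values that must
    match before a regex or seed tag is applied.
    """
    tags = set()

    # Name check: only the column name is inspected.
    for tag, pattern in NAME_RULES.items():
        if pattern.search(name):
            tags.add(tag)

    # Normalise values, skipping nulls and blanks.
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return tags

    # Regex check: fraction of values matching each pattern.
    for tag, pattern in REGEX_RULES.items():
        hits = sum(1 for v in cleaned if pattern.match(v))
        if hits / len(cleaned) >= threshold:
            tags.add(tag)

    # Seed check: fraction of values found in each seed list.
    for tag, seeds in SEED_VALUES.items():
        hits = sum(1 for v in cleaned if v.lower() in seeds)
        if hits / len(cleaned) >= threshold:
            tags.add(tag)

    return tags
```

For example, `classify_column("contact", ["a@b.com", "c@d.org"])` is tagged by the regex rule even though the column name gives nothing away, while `classify_column("last_name", ["Smith"])` is tagged by the name rule alone. This is why using all three lists together tends to catch more columns than any single one.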
Review the PII data
Once the scan has completed, you can go to the data dictionary page (1), select the Definitions tab (2) and click on your database definition (3).
Then you can open the versions list (1) and choose the latest version (2)
Then expand the table lists (X) to view the columns (X) and see which are tagged as PII data (X).
[SCREEN SHOT HERE of expanded list of accounts and columns showing PII tags ONCE functionality on partners - Mark: column with PII tag, and ‘view diagram’ button - ideally with drop down list shown and Entity relationship diagram marked as well]
You can also view the entity relationship diagram (X) by selecting it from the view diagram button (X).
On the diagram, columns that contain PII data are marked with a shield symbol (X). These are the columns that you will need to either mask or generate replacement data for.
[ENTITY Diagram screen shot here - with PII data highlighted]