Why do you need to do this?
To find sensitive data in your databases: data that could identify customers, or that is confidential in its own right, such as account numbers or medical details.
To do this, you first need to scan the database.
Set up the Scan Database activity
You will need to create a scanning activity and set up some classifiers.
This process is detailed in the manual under Data Discovery → Data Profiling [ADD LINK ONCE IT IS VISIBLE]
You should end up with an activity like this example, where I have added the database connection and definition (1), then created three category lists:
Regex (4): looks at the data in the columns and matches it against regular expressions to identify the data characteristics
Name (3): looks at column names and data types
Seed (2): examines particular data values
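To make the difference between the three category lists concrete, here is a minimal Python sketch of the idea. It is not the product's implementation: the rule names, patterns, seed values, the `classify_column` function and the 80% match threshold are all assumptions made up for illustration, and the data type check that Name classifiers also perform is left out for brevity.

```python
import re

# Hypothetical rules, for illustration only; the real category lists are
# configured in the scanning activity, not in code.
REGEX_RULES = {  # Regex: patterns applied to the data in a column
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "account_number": re.compile(r"^\d{8,12}$"),
}
NAME_RULES = {  # Name: patterns applied to the column name
    "date_of_birth": re.compile(r"(dob|birth)", re.IGNORECASE),
    "surname": re.compile(r"(surname|last_?name)", re.IGNORECASE),
}
SEED_VALUES = {  # Seed: known values looked up in the data
    "diagnosis": {"asthma", "diabetes", "hypertension"},
}


def classify_column(name, values, threshold=0.8):
    """Return the set of categories that a column appears to match."""
    tags = set()

    # Name check: only the column name is inspected.
    for category, pattern in NAME_RULES.items():
        if pattern.search(name):
            tags.add(category)

    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return tags

    # Regex check: tag the column if enough values match the pattern.
    for category, pattern in REGEX_RULES.items():
        hits = sum(1 for v in non_null if pattern.match(v))
        if hits / len(non_null) >= threshold:
            tags.add(category)

    # Seed check: tag the column if enough values are known seed values.
    for category, seeds in SEED_VALUES.items():
        hits = sum(1 for v in non_null if v.lower() in seeds)
        if hits / len(non_null) >= threshold:
            tags.add(category)

    return tags
```

In this sketch a column called `cust_dob` is tagged by its name alone, whereas a column called `contact` is only tagged as an email column if its values look like email addresses.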
Then create a scan submit form (5).
Finally, execute the scan (6) and wait for the job to finish.
Note that on the configuration tab, you should click edit,
then make sure that the ‘Scanning list application id’ is set to the same application that you set in the components.
Review the PII data
Once the scan has completed, go to the data dictionary page (1), select the Definitions tab (2) and click on your database definition (3).
Then open the versions list (1) and choose the latest version (2).
Then expand the table lists (X) to view the columns (X) and see which are tagged as PII data (X).
You can also view the entity relationship diagram (X) by selecting it from the view diagram button (X).
On the diagram, columns that contain PII data are marked with a shield symbol (X). These are the columns that you need to either mask or generate data for.