---
title: "Data Profiling"
slug: "data-profiling"
updated: 2025-08-05T08:45:52Z
published: 2025-08-05T08:45:52Z
canonical: "knowledge.curiositysoftware.ie/data-profiling"
---

# Data Profiling

Automatically locate, catalogue and audit sensitive information in your data, flagging it for removal before provisioning to less-secure test environments.

[Data Discovery - Data Profiling Tutorial | Enterprise Test Data](https://www.youtube.com/embed/BQ7bJbZiWTI)

- Detect and classify sensitive data such as Personally Identifiable Information and Protected Health Information.
- Understand where your most critical data is stored and how it’s being used to ensure compliance with regulations.

> Before you start Data Profiling, make sure you have completed the Data Catalogue walkthrough.

From the side ribbon, navigate to **Data Activities** → **Activity Explorer** and choose an Explorer folder. This sets the context in which the Data Profiling Activity will be saved.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-O3SLAWAX.png)

Click **+Add Activity** and choose **Scan Database**.

We will build this data activity and configure it to find sensitive data.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-FISWAEUQ.png)

Enter a **Name**, **Description** & **Server**. Click **Next Step** when ready.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-SHVZWOY0.png)

When ready, click **Finish → Go to Data Activity**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-MGI77IDI.png)

First, attach a **Default Database Connection** to the activity. Choose the connection you defined in the Data Catalogue tutorial.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-AQNT8EKE.png)

Click **OK** when you have selected a profile.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-11SB4G8U.png)

The connection will now display against the data activity.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-JIBEOYUZ.png)

Next, we need to attach a **Definition Version**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-3TBPCDMM.png)

Select a definition version to scan. Click **OK** when ready.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-9HC4TDBY.png)

We now need to define the type of profiling this job will execute. Click **Create Category List**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-78ZB19SN.png)

Provide the following:

- List name
- Type of list
- Description
- Application

In the example below, we’ve chosen the **StarterList**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-BJGLYMQC.png)

When ready, choose **Execute**. This runs a job that creates the list to use during profiling.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-SSJJCYLV.png)

The **Data List** will be created against the activity. If you click on the list, you can view the categories of data we will scan for.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-2A5GWFVV.png)

The **List Details** holds the type of information we will look for.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-A3DSMDY4.png)

We will add further lists to include as part of the **Scan**.

From the **Data Activity**, under **Actions**, click **Create Category List**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-MO20JVHC.png)

Choose the **Seedlist-StarterList** and provide a name for the list. It also requires a description and an application, selected from the drop-down.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-TET3C6JL.png)

> A job will trigger. Make sure to check that it completes.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-QYJTX2AG.png)

Repeat this action for **RegExStarterList**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-WRRGEU3L.png)

Each list provides a different type of search, from RegEx pattern analysis to searching for specific data.

**Data Lists** are searchable and editable from **Data Lists**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-Y5T9XRDI.png)

If you click on one of the lists, you can see and alter the types of records that are being searched. Below you can see the regular expression being used.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-5VOOTLER.png)
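To make the difference between a regular-expression list and a list of specific values more concrete, here is a minimal Python sketch of the general idea. It is not the product's implementation: the category names, patterns, seed values, the `classify_value` and `classify_column` helpers, and the 80% threshold are all assumptions invented for illustration, and they are not the contents of the StarterList, Seedlist-StarterList or RegExStarterList.

```python
import re

# Hypothetical category lists, invented for illustration only.
REGEX_CATEGORIES = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}
SEED_CATEGORIES = {
    "FIRST_NAME": {"alice", "bob", "carol"},
}


def classify_value(value):
    """Return the sorted category names that a single value matches."""
    text = value.strip()
    matches = set()
    # Pattern search: the whole value must match the regular expression.
    for name, pattern in REGEX_CATEGORIES.items():
        if pattern.fullmatch(text):
            matches.add(name)
    # Specific-data search: the value must appear in a list of known values.
    for name, seeds in SEED_CATEGORIES.items():
        if text.lower() in seeds:
            matches.add(name)
    return sorted(matches)


def classify_column(values, threshold=0.8):
    """Tag a column with each category matched by at least `threshold`
    of its non-empty sample values (the threshold is an assumption)."""
    sample = [v for v in values if v and v.strip()]
    if not sample:
        return []
    counts = {}
    for value in sample:
        for category in classify_value(value):
            counts[category] = counts.get(category, 0) + 1
    return sorted(c for c, n in counts.items() if n / len(sample) >= threshold)
```

Calling `classify_column` on a sample of column values returns the categories that would be candidates for tagging that column. Requiring a share of the sample to match, rather than a single hit, is one common way to limit false positives.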

Navigate back to the **Data Activity**. You should have 3 lists configured. You can edit these lists from the **Configuration** tab.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-TFDI6WE5.png)

**PROPERTY** shows some of the settings you can customize, for example including views in the scan, counting rows in tables, and finding distinct values. The default parameters are optimally configured, but you can alter them as your requirements dictate.

Click **Edit**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-CSLGOX3Q.png)

You can toggle values on and off. Click **OK** when finished.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-0QSK7G9T.png)
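The row-count and distinct-value options mentioned above correspond to simple aggregate queries. The sketch below shows what such metrics look like, using Python's built-in `sqlite3` module against an in-memory database. The `profile_table` helper and the `customers` table are hypothetical, and the queries the product actually runs against your database are not shown in this tutorial.

```python
import sqlite3


def profile_table(conn, table):
    """Return the row count and per-column distinct-value counts for one table."""
    # Identifiers cannot be bound as SQL parameters, so quote them defensively.
    quoted_table = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"SELECT COUNT(*) FROM {quoted_table}").fetchone()[0]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted_table})")]
    distinct = {}
    for column in columns:
        quoted_column = '"' + column.replace('"', '""') + '"'
        # COUNT(DISTINCT ...) ignores NULL values.
        distinct[column] = conn.execute(
            f"SELECT COUNT(DISTINCT {quoted_column}) FROM {quoted_table}"
        ).fetchone()[0]
    return {"table": table, "rows": rows, "distinct": distinct}


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a@example.com", "IE"), (2, "b@example.com", "IE"), (3, "c@example.com", "UK")],
)
profile = profile_table(conn, "customers")
```

A column whose distinct count equals the row count (such as `email` here) is likely to hold identifying values, which is one reason these metrics are useful alongside the category lists.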

From the **Data Activity**, we will now create a **Data Scanning Submit Form**, which will let us run the job.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-7I1ECUV1.png)

Click on the **Data Scanning Submit Form** action.

The form requires a **Name** & **Group**.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-VWMXYPHM.png)

> [!NOTE]
> The group can be an existing group from the **Self-Service Data** page or a new group.
> 
> If you are updating an existing process, pick it from the bottom drop-down list.

Click **Execute** when ready.

### Run the data profiling activity

We are now configured to run the job. It can be run manually, scheduled as part of a routine, or called via an API.

Click the **Play** icon to run the routine.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-93E8I7LX.png)

You can run the scan in several ways. Clicking **Execute** runs the job directly, while clicking the code icon in the bottom right presents some other options.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-FWHTHHXP.png)

Embed as **iFrame**. You can embed this job as an **iFrame** using the HTML provided. Users can then run the job from different locations and browsers.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-YAO4JIWR.png)

You can also run the job through an API request. **Submit** the job using the **POST** request, then **track** its progress and results using the **GET** options. You can also **Download** this code.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-GPNL40JV.png)
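The submit-then-track flow described above follows a common pattern: one call starts the job, and repeated status calls wait for it to finish. The sketch below shows that pattern in Python without assuming any endpoint. The `submit` and `get_status` callables are hypothetical stand-ins for the **POST** and **GET** requests shown in the dialog, and the status names in `TERMINAL_STATES` are assumptions, so replace them with the values your server actually returns.

```python
import time

# Assumed status names; check the real values returned by the GET request.
TERMINAL_STATES = {"Complete", "Error"}


def submit_and_wait(submit, get_status, poll_seconds=5.0,
                    timeout_seconds=600.0, sleep=time.sleep):
    """Submit a job, then poll until it reaches a terminal state.

    `submit` stands in for the POST request and must return a job id.
    `get_status` stands in for the GET request and must return a status string.
    Both are hypothetical callables supplied by the caller.
    """
    job_id = submit()
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return job_id, status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"job {job_id} still '{status}' after {waited:.0f}s"
            )
        # Wait before asking again so the server is not flooded with requests.
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing the two requests in as callables keeps the polling logic independent of the HTTP client and of the exact URLs, both of which should be taken from the generated code in the dialog rather than from this sketch.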

You can also call and dynamically instrument this job with constructed **Java** code.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-KNS0Y919.png)

When ready, you can run this job.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-TOLOSMFF.png)

> We’ve chosen to run the schema crawler to gather further metadata.

Click **Execute** if you’re happy that the **Connection ID to scan** is correct.

Follow the **Job** to check that it has finished and that the **Results** have been collected.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-6W85CQHX.png)

### Review the results

When the job completes, a new scan will be available against the **Data Definition**. Navigate to the **Data Definition** to view it.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-4MC747V2.png)

The **Scan #2** version will now hold the scanned PII information. Click on **Scan #2**.

> [!NOTE]
> The scan version holds the PII information we are looking for. Version #2 holds the schema details.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-PXP2NGWJ.png)

The scanned tables and columns will now have tags assigned to them, based on the information picked up during the scan.

![](https://cdn.document360.io/77f722a6-2d0a-49fa-8074-572515a6c4b8/Images/Documentation/image-EZ4BF482.png)

## Related

- [Data Discovery](/data-discovery.md)
