Data Subset

Document details
Purpose	To walk through the process of data subsetting, which reduces the amount of data moved into an environment.
Audience	Anyone interested in data subsetting, data masking or compliance.
Requirements	Access to the Curiosity Dashboard.
Additional Links	https://www.youtube.com/watch?v=CbL6Q8D4rrk

Extract only the data you need from a much larger database.

- Database Subsetting: Extract and mask only relevant data subsets.
- Targeted Data Extraction: Apply rules to isolate necessary data efficiently.
- Smaller, Manageable Datasets: Improve performance and focus on specific data needs.

Prerequisites:

To follow this documentation, you will need a Data Definition to have been created. Follow Data Discovery to set up a definition.

Configure Stored Query

The Data Subset will require a stored query to run against the underlying data store. Create a +New Query.

Click on Rules from the side ribbon → Criteria Explorer → Query tab → +New Query.

Add a Name, Type & Description for the Query.

On the Query tab, enter the Definition, Version, Schema, Table to query and the Query itself.

Click OK when ready.
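
As an illustration of what a stored query might contain, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the underlying data store. The customers table, its columns and the country filter are assumptions made for this example only; the actual query depends on your own schema and database.

```python
import sqlite3

# Hypothetical stand-in for the source data store.
# Table and column names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "UK"), (2, "Bo", "US"), (3, "Cy", "UK"), (4, "Di", "DE")],
)

# The kind of statement you might save as the stored query:
# it picks the rows the subset should start from.
STORED_QUERY = (
    "SELECT customer_id FROM customers "
    "WHERE country = 'UK' ORDER BY customer_id"
)

subset_ids = [row[0] for row in conn.execute(STORED_QUERY)]
print(subset_ids)
```

A narrow WHERE clause like this one is what keeps the resulting subset small; broadening the filter grows the subset accordingly.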

Create Data Activity

Click on the Activity Explorer in the side panel ribbon.

Provide a Name & Description.

Click Next Step → Finish when ready.

Attach a Definition Version

Provide the Definition & Version you wish to work with.

Click OK when ready.

Now Attach Default Database Connection.

Choose the Connection Profile for the definition version you selected.

On the configuration tab, we will Edit some of the properties.

The Target database is the location the data will be moved to.

The Source database is the location where the original data resides.

In this example environment, we have a database with two schemas. These schemas are identical. We will move data from public to subset.

In the configuration editor, set the Schema Mappings to ‘public, subset’ so that data moves from the public schema to the subset schema.

Click OK when ready.
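
To make the ‘public, subset’ mapping concrete, the sketch below reads the value as a source schema followed by a target schema and shows how a table name would be redirected. This is only one reading of the example in this walkthrough, not the product's own parser; the function names parse_schema_mapping and retarget are hypothetical.

```python
def parse_schema_mapping(mapping: str) -> tuple[str, str]:
    """Split a 'source, target' string into a (source, target) pair."""
    parts = [part.strip() for part in mapping.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'source, target', got {mapping!r}")
    return parts[0], parts[1]


def retarget(qualified_table: str, mapping: str) -> str:
    """Rewrite 'source.table' to 'target.table' according to the mapping."""
    source, target = parse_schema_mapping(mapping)
    schema, _, table = qualified_table.partition(".")
    if schema != source or not table:
        raise ValueError(f"{qualified_table!r} is not in schema {source!r}")
    return f"{target}.{table}"


print(parse_schema_mapping("public, subset"))
print(retarget("public.customers", "public, subset"))
```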

The configuration values should have updated.

Now we attach the Stored Query. Choose the query we created earlier and click OK when ready.

Next, we will configure which tables are included in our subset. Choose Create Process Model from the Definition Version drop-down list, then click the Play icon.

Choose the tables you wish to include. In this example, we have chosen Transactions, Accounts, Creditcards & Customers. Click Add Tables and, when ready, select Execute.
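
The idea behind subsetting several related tables can be sketched outside the tool. The example below is a conceptual illustration only, not how the product performs the subset: it uses Python's sqlite3 module, with SQLite's main schema standing in for public and an attached in-memory database standing in for subset. The columns, the sample rows and the relationships between the four tables are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "main" plays the role of the public schema; "subset" is the target schema.
conn.execute("ATTACH DATABASE ':memory:' AS subset")

# Hypothetical table layouts; the real columns will differ.
DDL = {
    "customers": "(customer_id INTEGER PRIMARY KEY, country TEXT)",
    "accounts": "(account_id INTEGER PRIMARY KEY, customer_id INTEGER)",
    "creditcards": "(card_id INTEGER PRIMARY KEY, customer_id INTEGER)",
    "transactions": "(transaction_id INTEGER PRIMARY KEY, account_id INTEGER)",
}
for table, columns in DDL.items():
    for schema in ("main", "subset"):
        conn.execute(f"CREATE TABLE {schema}.{table} {columns}")

conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "UK"), (2, "US"), (3, "UK")])
conn.executemany("INSERT INTO main.accounts VALUES (?, ?)",
                 [(10, 1), (20, 2), (30, 3)])
conn.executemany("INSERT INTO main.creditcards VALUES (?, ?)",
                 [(100, 1), (200, 2)])
conn.executemany("INSERT INTO main.transactions VALUES (?, ?)",
                 [(1000, 10), (2000, 20), (3000, 30), (4000, 10)])

# Copy parent rows first, then only the child rows that reference them,
# so every copied row still points at a row that exists in the subset.
conn.execute("INSERT INTO subset.customers "
             "SELECT * FROM main.customers WHERE country = 'UK'")
conn.execute("INSERT INTO subset.accounts SELECT * FROM main.accounts "
             "WHERE customer_id IN (SELECT customer_id FROM subset.customers)")
conn.execute("INSERT INTO subset.creditcards SELECT * FROM main.creditcards "
             "WHERE customer_id IN (SELECT customer_id FROM subset.customers)")
conn.execute("INSERT INTO subset.transactions SELECT * FROM main.transactions "
             "WHERE account_id IN (SELECT account_id FROM subset.accounts)")

counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM subset.{table}").fetchone()[0]
    for table in DDL
}
print(counts)
```

Copying in parent-to-child order is the design choice that matters here: filtering each child table against rows already in the target keeps the subset referentially consistent.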