Data Subset



    Article summary

    Document details

    Purpose

    To walk through the process of data subsetting, which reduces the amount of data moved into an environment.

    Audience

    Anyone interested in data subsetting, data masking, or compliance.

    Requirements

    Access to the Curiosity Dashboard.

    Additional Links

    https://www.youtube.com/watch?v=CbL6Q8D4rrk

    Extract only the data you need from a much larger database.

      • Database Subsetting: Extract and mask only relevant data subsets.

      • Targeted Data Extraction: Apply rules to isolate necessary data efficiently.

      • Smaller, Manageable Datasets: Improve performance and focus on specific data needs.

    Prerequisites:

    To follow this documentation through, you will need a Data Definition to have been created. Follow Data Discovery to set up a definition.

    Configure Stored Query

    The Data Subset requires a stored query to run against the underlying data store, so start by creating a new query.

    Click on Rules from the side ribbon → Criteria Explorer → Query tab → +New Query.

    Add a Name, Type & Description for the Query.

    On the Query tab, enter the Definition, Version, Schema, Table to query and the Query itself.

    Click OK when ready.
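    As an illustration of what the Query field might hold, the sketch below defines a small driving query and runs it against an in-memory SQLite database. It is not taken from the Curiosity Dashboard: the table name customers, the columns customer_id and country, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# Hypothetical driving query. The table and column names are assumptions,
# not values from the Curiosity Dashboard or the example environment.
STORED_QUERY = "SELECT customer_id FROM customers WHERE country = 'UK'"


def build_demo_db():
    """Create a tiny in-memory database to try the query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "UK"), (2, "Bo", "US"), (3, "Cy", "UK"), (4, "Di", "DE")],
    )
    return conn


def run_stored_query(conn, query):
    """Return the IDs selected by the driving query, in ascending order."""
    return sorted(row[0] for row in conn.execute(query))


if __name__ == "__main__":
    print(run_stored_query(build_demo_db(), STORED_QUERY))  # [1, 3]
```

    A narrow WHERE clause like this one keeps the subset small; the rows it selects determine which data is carried into the target.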

    Create Data Activity

    Click on the Activity Explorer in the side panel ribbon.

    Provide a Name & Description.

    Click Next Step → Finish, when ready.

    Attach a Definition Version

    Provide the Definition & Version you wish to work with.

    Click OK when ready.

    Now Attach Default Database Connection.

    Choose the Connection Profile for the definition version you selected.

    On the configuration tab, we will Edit some of the properties.

    The Target database is the location that the data will move to.

    The Source database is the location that the original data is located in.

    In this example environment, we have a database with two identical schemas, public and subset. We will move data from public to subset.

    In the configuration edit, set the Schema Mappings to ‘public, subset’, the source and target schemas we are moving data between.

    Click OK when ready.
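    One way to read the mapping value is as a source schema followed by a target schema. The sketch below assumes that reading, based only on the example of moving data from public to subset; the function names are hypothetical and the Dashboard's own parsing rules may differ.

```python
def parse_schema_mapping(mapping):
    """Split a 'source, target' string into a (source, target) tuple.

    Assumption: the value is exactly two comma-separated schema names.
    """
    parts = [part.strip() for part in mapping.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'source, target', got {mapping!r}")
    return parts[0], parts[1]


def retarget(qualified_name, mapping):
    """Rewrite 'source.table' to 'target.table' according to the mapping."""
    source, target = parse_schema_mapping(mapping)
    schema, sep, table = qualified_name.partition(".")
    if not sep or schema != source:
        raise ValueError(f"{qualified_name!r} is not in schema {source!r}")
    return f"{target}.{table}"


if __name__ == "__main__":
    print(parse_schema_mapping("public, subset"))        # ('public', 'subset')
    print(retarget("public.customers", "public, subset"))  # subset.customers
```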

    The configuration values should have updated.

    Now we attach the Stored Query we created earlier.

    Choose the query we created earlier. Click OK when ready.

    Next, configure which tables are included in the subset. Choose Create Process Model from the Definition Version drop-down list, then click the Play icon.

    Choose the tables you wish to include. I’ve chosen Transactions, Accounts, Creditcards & Customers. Click Add Tables and, when ready, select Execute.

    As a check, make sure the job completes without errors.
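    To make the idea behind the table selection concrete, here is a minimal sketch of subsetting across related tables using Python's sqlite3 module. It is not how the Curiosity Dashboard is implemented. The columns and the parent/child links between the four tables are assumptions for illustration, and two attached in-memory databases stand in for the public and subset schemas.

```python
import sqlite3

# Hypothetical columns; the real ones come from the Definition Version.
SCHEMA = {
    "customers": "customer_id INTEGER PRIMARY KEY, name TEXT",
    "accounts": "account_id INTEGER PRIMARY KEY, customer_id INTEGER",
    "creditcards": "card_id INTEGER PRIMARY KEY, customer_id INTEGER",
    "transactions": "txn_id INTEGER PRIMARY KEY, account_id INTEGER",
}
# Assumed links: (child table, foreign-key column, parent table), parents first.
LINKS = [
    ("accounts", "customer_id", "customers"),
    ("creditcards", "customer_id", "customers"),
    ("transactions", "account_id", "accounts"),
]


def make_db():
    """Build identical 'public' and 'subset' schemas and fill 'public'."""
    conn = sqlite3.connect(":memory:")
    for schema in ("public", "subset"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
        for table, cols in SCHEMA.items():
            conn.execute(f"CREATE TABLE {schema}.{table} ({cols})")
    rows = {
        "customers": [(1, "Ada"), (2, "Bo")],
        "accounts": [(10, 1), (20, 2)],
        "creditcards": [(100, 1), (200, 2)],
        "transactions": [(1000, 10), (1001, 10), (2000, 20)],
    }
    for table, data in rows.items():
        conn.executemany(f"INSERT INTO public.{table} VALUES (?, ?)", data)
    return conn


def run_subset(conn, driving_where):
    """Copy the driving customers, then only child rows that reference them.

    Identifiers and the WHERE text are interpolated directly, so this is
    only suitable for trusted, hard-coded input.
    """
    conn.execute(
        "INSERT INTO subset.customers "
        f"SELECT * FROM public.customers WHERE {driving_where}"
    )
    for child, fk, parent in LINKS:
        conn.execute(
            f"INSERT INTO subset.{child} SELECT * FROM public.{child} "
            f"WHERE {fk} IN (SELECT {fk} FROM subset.{parent})"
        )
    return {
        table: conn.execute(f"SELECT COUNT(*) FROM subset.{table}").fetchone()[0]
        for table in SCHEMA
    }


if __name__ == "__main__":
    print(run_subset(make_db(), "customer_id = 1"))
    # {'customers': 1, 'accounts': 1, 'creditcards': 1, 'transactions': 2}
```

    Copying parents before children keeps the target referentially consistent: every account, credit card, and transaction in the subset points at a row that was also copied.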

    As a last step, we will create a Submit Form.

    Fill in the Name of the Submit Process & Choose a group to add it to.

    You can now run this job by clicking the Play icon on the Server Process.

    Select ‘Subset’ from the Actions to perform drop-down list.

    The target schema should be empty before the job runs. Because the job moves data into the target environment, check that no data is already present there.
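    A quick way to confirm the target holds no data is to count rows in each table that will be populated. The sketch below shows one hedged approach using sqlite3 with an attached in-memory database standing in for the subset schema; the helper names and the demo tables are hypothetical, and against a real target you would use that database's own driver.

```python
import sqlite3


def non_empty_tables(conn, schema, tables):
    """Return {table: row_count} for each listed table that already has rows.

    Schema and table names are interpolated directly, so pass only
    trusted, hard-coded identifiers.
    """
    found = {}
    for table in tables:
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {schema}.{table}"
        ).fetchone()
        if count:
            found[table] = count
    return found


def make_demo_target():
    """Create an empty 'subset' schema with two placeholder tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS subset")
    for table in ("customers", "accounts"):
        conn.execute(f"CREATE TABLE subset.{table} (id INTEGER PRIMARY KEY)")
    return conn


if __name__ == "__main__":
    conn = make_demo_target()
    print(non_empty_tables(conn, "subset", ["customers", "accounts"]))  # {}
    conn.execute("INSERT INTO subset.accounts VALUES (1)")
    print(non_empty_tables(conn, "subset", ["customers", "accounts"]))  # {'accounts': 1}
```

    An empty result means the target is clear; any entry names a table that should be emptied before the job is submitted.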

    Click Execute to submit the job.

    When the job completes, it reports which data has been subset into the target environment.

