Data generation accelerators
Overview and links
Purpose | To help you understand how to use the data generation activity accelerators in the Enterprise Test Data environment |
Audience | Anyone needing to work with the data generation activity |
Requirements | Access to the Curiosity Dashboard |
Additional Links |
What are they and why would I use them?
The data generation activity has many potential uses and is highly configurable. To make it easier for you to set up common use cases, we have created a series of accelerators that do much of the work of configuring the data generation activity to carry out the accelerator task. They give you a good starting point which you can then adapt to your own use case, for example by limiting the data that is extracted or by masking some of that data.
Below is a list of the accelerators that are available.
Accelerator name | Action | Suggested use cases |
---|---|---|
Extract the Data to a CSV | Extracts data from selected database tables into CSV files | Extracting data for a third party |
Move the data from source to Target | Copies data from selected tables in one database to another database | Creating a clone of tables for testing |
Convert the RuleSet to a Upsert (update) any Matching Data between Schemas | Updates the values in one database with those from the source (reference) database | Updating data from a master data source |
This guide steps you through setting up and using these accelerators and suggests some variations that you can add to customise your data. Where there are differences between the accelerators, these are highlighted in the guide using collapsible sections.
Create a data generation activity
Prior to running the accelerator, you need to have created a data generation activity.
Follow the steps in the Synthetic Data Generation guide to create a data generation activity in which you have:
Used ‘Attach Default Database Connection’ (1) to attach the required database (2)
Used ‘Attach Definition Version’ (3) to attach the definition (4)
Used ‘Create New Rule Set’ (5) to create an appropriate rule set (or used ‘Attach Rule Set Version’). Note that at this stage you do not need to make any changes to this rule set, as the accelerator will configure it.
When you have created your data generation activity, click the Action ‘Rule Set Accelerators’ (7) to apply one.
Note that at this point, the rule set is basic, with no user defined parameters set. In this example, we have added 5 tables to the rule set.
Run the Accelerator
The ‘Data Activity - Rule Set Accelerators - Job Parameters’ dialog will be displayed, and the parameters available will depend on the accelerator option that you choose. These are listed in the collapsible sections for each option below.
Note that when the job is executed, it will open the Job Details page in a new browser tab, so you can review the progress and outcome; examples are shown in the sections below.
Extract the Data to a CSV
In the dialog, choose:
‘Extract the Data to a CSV’ from the drop-down option list (1)
the database from the ‘Extract the Data from here’ drop-down list (2)
Once it has been executed and completed, the job details will look something like the following screenshot
Move the data from source to Target
In the dialog, choose:
‘Move the data from source to Target’ from the drop-down option list (1)
the database to copy data from (2)
the database to put the data into (3)
the schema that the data should be written into (4)
Once it has been executed and completed, the job details will look something like the following screenshot
Convert the RuleSet to a Upsert (update) any Matching Data between Schemas
In the dialog, choose:
‘Convert the RuleSet to a Upsert (update) any Matching Data between Schemas’ from the drop-down option list (1)
the database to copy data from (2)
the target database (3)
the schema that the data will be updated in (4)
Once it has been executed and completed, the job details will look something like the following screenshot
Once the job has completed, you can refresh the rule set and confirm that it has been updated. You will see that user defined variables have been added and that the rules for the table columns have been updated.
Note that a number of user defined parameters have been set up.
These will be used in the data extraction, so there are variables which map to each of the tables, for example the Products table (1), and a count for each table (2).
If you check the list of tables:
The Rules for each column have been updated to reference the defined variables (1), (2).
Note also that the id fields for the tables have been set to Active (3). This ensures that the extracted data matches the data in the database; if you were extracting into another database this is particularly important, as it allows the records to be aligned.
Next you need to build the flow.
Create the activity flow
Click the Action: ‘Rebuild VIP Flow on Server’
This will start a job on the server and will open a new browser window to track the job details
Once it has completed, you can refresh the data activity tab and see that two new components have been added: the raw file and the flow (1).
Next, create the Submit form by clicking ‘Create Data Generation Submit Form’ (2).
Create the Submit form
This will display the ‘Data Activity - Create Data Generation Submit Form - Job Parameters’ dialog.
Set the ‘Type of Submit form to be created’ to ‘A Standard Data Generation in to a Database’ (1). Name your form (2) and its group (3) as appropriate. If you have already generated a form and want to update it, then you can use the ‘choose an existing Process and Update it’ (4) drop-down to select it. Clicking Execute starts the job and opens the Job Details in a new browser tab. Once it has completed successfully, you can refresh the Data Activity tab to see the new form (1).
Typically you will execute the form next (2)
Execute the form and review the output
This will show the ‘Reference tables to export - Job Parameters’ dialog. Click Execute to start the job; the job details will be shown in a new browser window.
Note that the dialog will look different and have different outcomes based on the type of accelerator you are using. These different dialogs and outcomes are described in the collapsible sections below.
Extract the Data to a CSV
Note that in this case you don't need to make any changes to the job parameters and can simply execute the form.
Once it completes successfully, you can download the results, which will contain the CSV files.
Then you can review them.
Move the data from source to Target
In this case, you should ensure that the ‘Create the Data in this Database’ and Target Schema fields are set correctly.
Once the job has completed, you can check that it ran correctly in the job details.
Then check the database to ensure that the data has been added. In this example we used pgAdmin to run a select on the table.
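If you want a quick check beyond the job details, a row count on one of the copied tables is usually enough. This is a minimal sketch assuming the ot.products table used elsewhere in this guide; replace the schema with the Target Schema you selected and run the query against the target database connection:
SELECT count(*) FROM ot.products;
Running the same count against the source database and comparing the two results confirms that all the rows were copied.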
Convert the RuleSet to a Upsert (update) any Matching Data between Schemas
In this case, note that ‘Enable Upserts’ is checked.
Once the job has completed, review the details to confirm that it ran successfully.
Then, you can check the database to confirm that values were updated.
Before executing the activity, note that the cost of the first product was 100.0
After executing the activity, the cost of the first product has been updated to 450.36, matching the value set in the source database.
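If you want to run the same check yourself, a simple select before and after the activity will show the value being aligned. This is a minimal sketch assuming the ot.products table and the standard_cost column shown in this guide; adjust the schema, column and ordering to match your own data:
SELECT product_id, standard_cost FROM ot.products ORDER BY product_id LIMIT 1;
Run it against the target schema before and after executing the activity; the cost should change from the old value to the one held in the source (reference) database.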
Use case variations
Masking some column data
If you want to hide some data, you can open the rule set and update the appropriate columns. In this example we hide the list price and standard cost.
We have replaced the generated rule for standard_cost and list_price with the function:
RandomHelper.Faker.Random.Int(10, 450)
which will replace the product cost and price with a random integer between 10 and 450.
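The two arguments are the minimum and maximum of the generated value, so you can tune the range to suit your data. For example, to keep the masked values in a three-digit range, the rule could be changed to (an illustrative variation, the exact bounds are up to you):
RandomHelper.Faker.Random.Int(100, 999)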
Once you have made a change to the rule set, you need to recreate the activity flow
Then you can execute the form and download the new results. In this case, the products.csv file now has random integers in the cost and price columns
Limiting the data from a table
By default, the accelerator will set up the activity to export all the data from each listed table. However, you may wish to limit that; for example, in our case there are a large number of products, so we want to limit the export to a sample.
The data that will be exported is defined in the appropriate User Defined Variable that the accelerator has set up. For example, if the table name is ‘products’ then the variable name will be ‘varExistingproducts’. To change the data that is exported, edit the User Defined Variables (1), select the appropriate variable (2), note that the current SQL selects all the records (3), and click the edit button (4) to change the variable.
Then the default value can be updated
In this case, we have changed it from:
varSourceRepo.Fetch(Of PostgresBIGAWSOTVersion6_646.products)("SELECT * FROM ot.products where 1 = 1")
to:
varSourceRepo.Fetch(Of PostgresBIGAWSOTVersion6_646.products)("SELECT * FROM ot.products where product_id <1000")
Note that the select is now limited to product_id < 1000. You can set your own where clause for the select, as needed.
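You are not limited to filtering on the id column; any where clause that is valid for the source table can be used. As an illustrative sketch using the list_price column from the masking example above (adapt the column and value to your own data), an export of only higher-value products might look like:
varSourceRepo.Fetch(Of PostgresBIGAWSOTVersion6_646.products)("SELECT * FROM ot.products where list_price > 100")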
Once you have made a change to the rule set, you need to recreate the activity flow
Then you can execute the form and download the new results. In this case, the products.csv file is now limited to products with product_id < 1000.
Appendix: A look at the rule set variables
If you take a closer look at the variables that are set up by the accelerator, you can see that some of them are linked to the submit form (1). This example is for the Upsert accelerator form, so there is ‘parEnableUpsert’, which is used to track whether the ‘Enable Upserts’ box is checked (see the Upsert submit form section).
There are also variables used to select the records from the database (2) and a count of those records (3). There will be one of each of these for every table that is being selected from the database.
As this example is from an Upsert, the records that are selected (4) are those which have a matching record in the target database.
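Putting this together for the products table used in this guide, the pattern is roughly as follows (the names other than ‘parEnableUpsert’ and ‘varExistingproducts’ are illustrative, so check your own rule set for the exact names):
parEnableUpsert | Submit form parameter tracking whether the ‘Enable Upserts’ box is checked |
varExistingproducts | The records selected from the source database, for example via "SELECT * FROM ot.products where 1 = 1" |
Count variable for products | The number of records returned for the products table (exact name will vary) |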