Required Curiosity tools

  • VIPExcel and VIP installed to run the Subsetting

Required assets

  • Run commands in a zipfile

(warning) Tools can be download via the cloud link we supply. An assets zipfile will also be supplied.


Create source and target databases

Prerequisites - a sample database as source database for creating a target database with the same DDL > It is assumed you have a source database already created. If not, ensure to:

use the sample AdventureWorksDW2016 database as the source database - the backup file supplied (Note: it is assumed you have VIP and VIPExcel installed) located in the VIPTDM folder:

C:\VIPTDM\SubsetCloneAndMaskRun\Subset\Training\AdventureWorksDW2016.bak

or https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15&tabs=ssms

Then run the DDL script to create a target database called AdventureWorksDW2016Subset. (Note: the DDL script can be run via SQL Server Management Studio) located in the VIPTDM folder:

C:\VIPTDM\SubsetCloneAndMaskRun\Subset\Training\AdventureWorksDW2016.bak

> Run the DDL scripts using SQL Server Management Studio.
https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server?view=sql-server-ver15

Create a Subset target database from the source and name it AdventureWorksDW2016Subset.

A word from our devs

Why use batch command scripts and not Test Modeller.io to run the Subsetting?
configuring the job involves editing a spreadsheet it is more convenient to run it using batch scripts


Run the flow script command

All the command files will be located in C:\VIPTDM\SubsetCloneAndMaskRun\Subset\Training\ Run the command PutFlowsInMemory. This loads the VIP flows for Subsetting into memory.

(warning) It takes a few minutes to load into memory and completed when the end message Completed loading interpreter appears. Keep the command window open throughout the whole Subsetting process! Running any of the commands again requires you to run the command PutFlowsInMemory.

Type in the database connection

Connect the SQL Server instance being used. To do so open and update the connection details in the Configuration tab in the source spreadsheet: C:\VIPTDM\SubsetCloneAndMaskRun\Subset\Training\SubsetAdventureWorks.xlsx In the folder use the spreadsheet SubsetAdventureWorks.xlsx which shows 4 worksheets with tabs: Configuration FoundCriteria SQLCriteria SoftKeys

(warning) Save the spreadsheet and remember to close it. The flow indicates ‘failed’ if the spreadsheet isn’t closed.

A word from our devs

In the example the driving conditions have been pre-populated…
Subsetting depends on a(the) driving condition(s) requirements. This is set in the tab SQLCriteria. Then as a preview here specify inactive ‘NO’ for the categories which need Subsetting. In this example the criteria DateKey >= %1 and DateKey <= %2 reduces (Subsets) the number of queried records in the database from 776286 to 36360.


Here’s the Subsetting process as a step-by-step:

▶ Start the Subsetting job

Run the command files sequentially

Note: Running each of the Subsetting flow commands updates the spreadsheet with new tabs and data

which is located in C:\VIPTDM\SubsetCloneAndMaskRun\Subset\Training\

(warning) A log file appears in a new folder called Output allowing issues to be tracked for troubleshooting.

Subset_1_MetaData.cmd

This adds the three additional worksheets with tabs Tables ForeignKeys IdentityColumns to the target spreadsheet.

Subset_2_Build.cmd

This adds another three worksheets with tabs RuleList ProcessModel HashCodes to the target spreadsheet.

A word from our devs

The ProcessModel consists of the supplied SQLCriteria, followed by key relationships which describe how to ‘crawl’ across the database satisfying parent to child and child to parent keys. So, depending on the data in the database and the key relationships crawling across the database to create the subset might in fact not result in any reduction in its size. In this case it may be necessary to edit the ProcessModel to disable some of the links used to crawl across the database. See Check Subset Results below.


Subset_3_DropKeys.cmd

With the Subsetting running, foreign key constraints may not be met but once subsetting is complete foreign keys will be valid and can be added to the target spreadsheet. (To do so run the Subset_5_AddKeys.cmd at the very end of the Subsetting process).

Subset_4_Subset.cmd

A tab called SubsetResults should show the database records have reduced (Subset). In the example the amount goes down from 776286 to 36360.

▶ Check Subset results

A word from our devs

Check the row counts in the SubsetResults tab to see if the size of the database has been reduced. If not you may need to amend the SQLCriteria supplied, in which case you should re-run Subset_2_Build.cmd. Alternatively edit the ProcessRules tab to inactivate some of the rules. In the example provided you could for example deactivate the FK_FactProductInventory_DimDate (DOWN) rule on and the FK_FactProductInventory_DimProduct (DOWN) rule.


(warning) Compare the original database against the records which have been Subset in SQL Server Management Studio.

SELECT COUNT (*)
FROM [AdventureWorksDW2016].[dbo].[FactProductInventory]