Introduction - Advanced Subsets

Running the Basic Subset Job creates a set of Subset Rules, in addition to a set of relationships that the Data Subset will need to retain or not. This information is stored in the Advanced Control Spreadsheet, where rules and relationships can be toggled on or off from the Advanced Control Spreadsheet.

You can thereby refine the Control Spreadsheet to create Subsets iteratively. This will include more or less data in your Data Subset, depending on which inter-related tables must be included to fulfil the specified relationships.

The process of running an iterative Subset involves the following actions:

  1. DROP: Drops the tables created by the PREPENV Action and deletes the data in your Target Database. You will only run this if something has gone wrong in a Subset, and you want to create a wholly new Data Subset.

    1. PREPENV: If you drop the tables, you will need to re-recreate tables before performing the next Subset.

  2. TRUNCATE: Deletes data from the Target Database or Schema.

  3. BUILDMODEL: Creates the rules to drive the Advanced Subset, based on the Control Spreadsheet.

  4. SUBSET: Writes the new Data Subset to the Staging Database.

These Actions are driven by the Advanced Control Spreadsheet. The sheets in the Advanced Control Spreadsheet are created and populated when the actions of a Basic Subset are performed. They can then be refined to run iterative Subsets.

This subsection of the Knowledge Base explains:

  1. How to refine the Advanced Control Sheet where needed in order to re-run an updated Subset job.

  2. How to perform the Actions involved in running an Advanced or Iterative Subset.


Overview of The Advanced Control Spreadsheet

The following sheets will have either been contained in the initial Basic Configuration Spreadsheet, or will have been created when the Basic Subset Actions were run.

Sheet Name

Contains

Editable?

Effect of Editing

Configuration

Displays the database connection, source and target databases and schemas, max rows, max iterations and substitution parameters.

Yes

See the previous Knowledge Base article on the Configuration sheet.

FOUNDCRITERIA

Displays any FoundCriteria, used to define when the Subset should finish.

Yes

See the previous Knowledge Base article on the FOUNDCRITERIA sheet.

SQLCRITERIA

Contains the criteria used to produce the Subset results.

Yes

See the previous Knowledge Base article on the SQLCRITERIA sheet.

ForeignKeys

Displays the Foreignkeys that have been read from the source database.

Yes

Toggle Foreign Keys that will be read in future Subsets.

SoftKeys

Displays any "Soft" Primary Keys that were set for the Subset.

Yes

See the previous Knowledge Base article on the SoftKeys sheet.

Tables

Displays the metadata that has been read from the source database.

Yes

Toggles the source tables from which metadata will be read during the Subset.

RuleList

This sheet contains the rules to drive subsetting. These rules are formulated automatically from the user-defined parameters, maintain consistency and referential integrity.

No

See the previous article on the RuleList sheet.

ProcessModel

This sheet contains the model used to drive the Subset. It is populated by information in the RuleList and SQLCriteria sheets.

Yes

See the previous article on the ProcessModel sheet.

SubsetResults

This sheet displays the number of rows during Staging and the number of rows added to each Schema.

No

See the previous article on the SubsetResults sheet.

HashCodes

The sheet holds hashcodes for the SQLCriteria, Tables, ForeignKeys and SoftKeys sheets. If any of these has changed since the last BUILDMODEL action was run, the ProcessModel sheet might then be out of date.

No

The automation uses the HashCodes sheet to validate that the database model is still valid and up-to-date.

Identity Columns

This sheet is populated with the Identity Columns retrieved from the Source Database.

No

These Identity Columns are used by subsetting actions to insert data into the target tables.

These sheets will all feature in the Control Sheet following a Basic Subset Job:


Refining The Control Spreadsheet

The Advanced Control Spreadsheet contains several editable sheets. These sheets should be edited before the Iterative Subset Actions are run or re-run.

You can edit some or all of the Sheets, refining your Subset in different ways.

A Subset can be refined by editing the "Active" cell in the following sheets:

  1. Tables

  2. ForeignKeys

  3. ProcessModel

Setting rows in these three Sheets to "Active" or not will exclude tables or relationships from your subset. You can often achieve the same result in a Data Subset in several different ways. For instance, if you turn off a Foreign Key relationship and re-run BUILDMODEL, that will exclude

rules that would otherwise have been created in the ProcessModel sheet. It will further exclude any corresponding Primary Keys. Alternatively, you could have toggled the Rule off in the ProcessModel sheet, or the Primary Key relationship in the Tables sheet.

The sheets defined as part of a Basic Subset can also be edited:

  1. Configuration

  2. FOUNDCRITERIA

  3. SQLCRITERIA

  4. ForeignKeys

All other sheets should be left unchanged.

Refining the Advanced Control Sheet enables iterative Subsetting, working until you have achieved the ideal data Subset.

This subsection of the Knowledge Base provides an overview of the Sheets that can be refined. The next subsection sets out how to run the Actions that will use the refined Control Spreadsheet.


The ProcessModel Sheet - Toggling Subset Rules

The ProcessModel sheet provides the rules to drive subsetting. The ProcessModel and RuleList sheets are populated during a Basic or Advanced Subset by the BUILDMODEL action.

You can switch the Subset Rules on or off from the ProcessModel sheet. Each rule is displayed by row, with additional information available for reference in the RuleList sheet. Simply set Active to "Yes" or "No" to filter rules.

The columns for each row are:

  1. ParentTable: The Parent Table name

  2. ChildTable: The Child Table name

  3. ParentColumn: The Parent Key Columns

  4. ChildColumn: The Child Key Columns

  5. Active: Set to "Yes" or "No" to include a Rule or not.

  6. ParentSchema: The Source Parent Schema name

  7. ChildSchema: The Source Child Schema name

  8. FKName: The Foreign Key name

  9. Direction: "DOWN" or "UP" indicates the relationship between ParentTable and ChildTable

  10. Level: The level in the hierarchical table relationships of the rule.

An example ProcessModel sheet appears as follows:


The Tables Sheet - Defining Metadata that will be read from the Source Database

The Tables Sheet is created by the GETMETADATA or TABLES action and defines which Primary Key constraints will be satisfied by the Data Subset.
The information contained in the Tables sheet is used in the following actions that can be performed post-Subset:

  1. Adding Primary Keys (ADDPEYS)

  2. Dropping Primary Keys (DROPPKEYS)

  3. The composite actions ADDKEYS and DROPKEYS which add or drop both Primary Keys and Foreign Keys.

  4. Validating that Primary Keys are unique (VALIDATEPKEYS)

If you update this Sheet, you will therefore add or drop the specified Primary Keys after a Subset.

Each row represents a different Primary Key constraint, and contains the following columns:

  1. Active: "Yes" or "No"

  2. TableName: The source table form which metadata will be read.

  3. SchemaName: The source schema name.

  4. NumberOfRows: The number of rows in the source database.

  5. PK_Name: The Primary Key name.

  6. PK_Columns: Primary Key Columns, separated by a semi-colon for composite keys.

  7. PK_Column_Types: Primary key column types, separated by a semi-colon if there is more than one.

  8. IsIdentity: Specifies whether the column is an identify column.

  9. InheritedBy: This cell is populated if the table is inherited from another table.

  10. InheritsFrom: This tcell is populated if the table inherits from another table.

  11. IsSoftKey: This cell displays "Yes" if the primary key was given in the SoftKeys sheet.

  12. NextValueFunction: If the INCREMENTKEYS action is run as part of a Data Clone, this cell provides a list of functions to use to increment the key columns.

Toggling Active "Yes" or "No" specifies whether the metadata will be read from the source table and schema.

Below is an example Tables sheet:

Note: The Primary Key constraints will not be included in the Data Subset by the core actions involved in running a Subset (GETMETADATA, PREPENV, BUILDMODEL, and SUBSET). To include the Primary Keys specified in this sheet, you must perform the Post-Subset action of ADDPKEYS or the composite action ADDKEYS. This should be performed after the SUBSET Action.


The ForeignKeys Sheet - Specifying which Foreign Keys to read from the Source Database

The ForeignKeys sheet displays the Foreign Keys that have been read from the source database. The ForeignKeys sheet information is populated by VIP using the GETMETADATA or GETKEYS action. This is run as part of a Basic Subset. The ForeignKeys sheet can then be edited.

The Foreign Keys determine which tables and rows are included in the Data Subset. The SUBSET. The BUILDMODEL Action automatically formulate rules for creating a Data Subset that will satisfy the relationships defined by Primary and Foreign keys. The SUBSET action will then "Crawl" up and down the tables collecting data until it produces a data set that satisfies all the hard and soft relationships.

The Foreign Key constraints will not themselves be included in the Data Subset by the core actions involved in running a Subset (GETMETADATA, PREPENV, BUILDMODEL, and SUBSET). To include the Foreign Keys specified in this sheet, you must perform the Post-Subset action of ADDFKEYS. This should be performed after the SUBSET Action and ADDPKEYS or the composite ADDKEYS Action. ADDFKEYS should be performed after the SUBSET Action and ADDPKEYS Action. Foreign Keys can also be validated post-Subset to check that they are all unique.

ForeignKeys can be toggled on and off from the ForeignKeys sheet, specifying whether Subset Rules will be built to satisfy a given Foreign Key. Each Row of the ForeignKeys sheet represents a different Foreign Key that has been read from the source database. The Columns are:

  1. Active: Setting Active to "Yes" or "No" includes or excludes a relationship.

  2. FK_Schema: The source Foreign Key schema.

  3. FK_Table: The Foreign Key table name.

  4. FK_Columns: The Column Names. For composite keys the column names are delimited by a semi-colon.

  5. PK_Schema: The source Primary Key schema

  6. PK_Table: Primary Key table name

  7. PK_Columns: The Primary Key Column Names. For composite keys the column names are delimited by a semi-colon.

  8. FK_Name: The Foreign Key name.

An example ForeignKeys sheet appears as follows:


Explanation of Remaining Sheets in the Control Spreadsheet

Several sheets in the Advanced Control Sheet should never be edited. However, they can be useful for reference when defining iterative subsets. Their contents are explained below.

The RuleList sheet

The RuleList sheet is created when the BUILDMODEL action is run. sheet contains the rules to drive subsetting.

The RuleList sheet should not be edited, but is useful for reference when editing the ProcessModel sheet.

Each Rule is reflected in a row. The columns contained for each rule are:

  1. Direction: "DOWN" or "UP" gives the direction of the relationship between Table and RefTable.

  2. Schema: The Source Schema name.

  3. Table: The Source Table name.

  4. Columns: The Key Column names in the Table.

  5. RefSchema: The Source Schema name.

  6. RefTable: The Source Table name.

  7. RefColumns: The Key Column names in the RefTable

  8. FK_Name: Foreign key name

An example RuleList sheet appears as follows:

SubsetResults

This sheet displays the results of the Subset. It is generated by the SUBSET action.

Each row represents a Table and Schema name. The SubsetResults sheet displays the total number of rows in the Staging Table, and the number added by the last SUBSET action.

The columns in the sheet are:

  1. SchemaName: The Staging Schema name.

  2. TableName: The Staging Table name.

  3. RowsInStaging: The total number of rows in the table during staging.

  4. RowsAdded: The number of rows added by the last SUBSET action.

An example SubsetResults sheet appears as follows:

HashCodes

The HashCodes sheets is used by VIP to ensure that the SUBSET action is only run if the control spreadsheet contains an up-to-date ProcessModel sheet.

The sheet holds hashcodes for the SQLCriteria, Tables, ForeignKeys and SoftKeys sheets. If any of these has changed since the last BUILDMODEL action was run, the ProcessModel sheet might then be out of date.

The BUILDMODEL and SUBSET actions use the codes in this sheet to check for any changes to these tables. If the SUBSET action detects that any of the SQLCriteria, Tables, ForeignKeys or SoftKeys have changed in any way since the last BUILDMODEL action, then a warning will be given in the log. The SUBSET action will not then run unless parOverrideBuildmodelRequirement has been set to True.

An example HashCodes sheet appears as follows:

TEST DATA AUTOMATION DATA SUBSETTING ADVANCED SUBSETS