What is Data Masking?

Data Masking is the process of masking or anonymizing a set of data that will be used as test data for an application under development. The primary motivation for masking data is usually to comply with data protection requirements. These requirements might be dictated by applicable privacy laws or might be governed by an organisation's internal policies. They often apply when production data refers to personally identifiable information (PII), as well as to commercially or financially sensitive information. As strict laws often prohibit the use/re-use of personal data, there is good reason to anonymize data before it is moved to test and development. Data Masking, a part of Test Data Automation* supports compliance in testing and development by removing sensitive data from data before it is moved to non-production environments.

*Test Data Automation (TDA) is Curiosity's complete test data solution set. Find out more about TDA here.

Data Masking Video

For a brief overview of the Data Masking using Test Data Automation, see:

Requirements and Installation

The required pre-requisites skills and experience for data masking with Test Data Automation are:

  • Knowledge of SQL and SQL Server Management software

  • Knowledge of Microsoft Excel

  • Knowledge of VIP

To use Data Masking from Test Data Automation, you will need an intermediate understanding of SQL. You will also need a good level of proficiency in Microsoft Excel, and an understanding of the database that you are masking. There are four pieces of prerequisite software for Data Masking:

  1. You must be running a 64-bit Windows machine. Windows 10 is strongly recommended and is required for the full range of functionality.

  2. The latest 64-bit version of Microsoft Excel.

  3. The latest version of VIP. Information on prerequisites and installation is in this article.

  4. The latest version of the VIP Execution Server. Prerequisites and installation can be found here.

Once you have installed the prerequisite software on a 64-bit Windows machine, you will require licensed access to the relevant TDM utilities. Contact your Curiosity representative or email if you need to request a license.

Supported database types

Data Masking from Test Data Automation supports the following database types:

  • MySQL

  • Postgres

  • SQL Server

  • Oracle

  • DB2

  • MariaDB

Note: In the interest of simplicity, this document uses an example MySQL Database, but Data Masking can be done with any of the Database formats listed above. If are using a Database type that is not supported yet, please ask us for a quote –

The Requisite Files needed for Data Masking

Data Masking is one of Test Data Automation's high-speed utilities. As set out in the process overview, a generic VIP Workflow performs the Data Masking Actions used during Data Masking. These Actions are configured by the user in the Control Spreadsheet. Each Action is customisable, with a range of parameters that can be defined.

You therefore require a number of files to run a Data Mask. The Control Spreadsheet and VIP Server Controller are used to configure a Data Mask iteratively. If running Actions via The VIP Server Controller, this spreadsheet will be the primary artefact that you work with.

Note: You can also parameterize actions in a Command Script and execute the actions from the command line rather than the VIP Server Controller. For this, a standard set of editable .CMD scripts are used. These are shipped with the VIP Server Controller installation. To run the automated Actions, you also require a set of automated VIP workflows and the associated files.

The Files
The main points for documentation is to refer to the right file path for things – so in the initial setup the DDL to create tables and functions is now in:


The preferred way to do masking is to use SQL Server by going to:


where you will see the following files:

| PutFlowsInMemory.cmd 
| Executor.cfg
| Mask_1_Prep.cmd
| Mask_2_Mask.cmd
| VIPMaskSQLServer.xlsx
| Mask_InsertLists.cmd

The steps for Masking using VIPExecutor (recommended) are:

  1. Run PutFlowsInMemory.cmd

  2. Edit VIPMaskSQLServer.xlsx so that it has your connection details

  3. Run Mask_1_Prep.cmd

  4. Manually set up masking rules5. Run Mask_2_Mask.cmd

If for some reason you would like to run Masking from the Command Line (not the recommended method), please see Data Masking from the Command Line section.

All generated output files will go to an Output folder:


Mask_InsertLists.cmd is the one-off environment preparation step for masking

Note: Files in C:\VIPTDM\SubsetCloneAndMask should not as a rule be changed, once the initial environment preparation has been done (ie tables and functions created).

Folder C:\VIPTDM\SubsetCloneAndMask contains

  • encrypted VIP flows

  • dlls

  • control spreadsheet examples (the "starting point" for each function)

  • batch script examples

  • VIPExecutor config examples

  • Table and UDF DDL

  • Regex file for masking

  • Seedlist .csvs

It is not possible to change any of the files with the exception of the following:

  • the naming used in the Table and UDF DDL

  • the regexes used in masking (C:\VIPTDM\SubsetCloneAndMask\DataScannerRules.csv)

  • the seedlists

The following requisite files (for MySQL DB) are therefore needed to define and perform the Data Masking Actions.

  1. A blank Control Spreadsheet that points the VIP workflow to the database that will be modelled and masked, and then configures the Masking Rules.

Note: The database used throughout this documentation is SampleCommerce which is like one that would be produced by an eCommerce site. It contains various orders, grouped by the customer who made the order. Each order has an order ID, and there are products associated with each order ID. Each product in turn has a supplier company associated with it.

If running Masking Actions via The VIP Server Controller, this is the primary or sometimes only artefact you will work with.

  1. Automated workflows used to perform the Masking Actions. These are made up of:

a. Two TDM VIP workflows that execute the Masking Actions in C:\VIPTDM\SubsetCloneAndMask:



b. The Dynamic Link Library (.DLL) file associated with the automated Mask workflows in C:\VIPTDM\SubsetCloneAndMask

i. VIPSubset.dll

c. The standard command script used to execute the Masking Actions. This script can be edited for each Action when running from the Command Line. You also can create and rename several scripts for several, easily re-usable Actions.

i. Mask_InsertLists.cmd in C:\VIPTDM\SubsetCloneAndMaskRun\Mask\MySQL\EnvironmentPrep

ii. Mask_1_Prep.cmd in C:\VIPTDM\SubsetCloneAndMaskRun

iii. Mask_2_Mask.cmd in C:\VIPTDM\SubsetCloneAndMaskRun

If running Masking Actions via The Command Line, you will edit and re-use these command scripts.

d. The PutFlowsInMemory.cmd and associated FlowCache.txt in C:\VIPTDM\SubsetCloneAndMaskRun. FlowCache.txt is used to tell the VIP Server Controller the location of VIP workflows to put "in memory". This improves the performance of the automation.

e. Folders containing additional files are in C:\VIPTDM\SubsetCloneAndMaskRun :

i. Output – This folder contains excel files showing pre and post masked values as well as the log files. This folder by default will be located in C:\VIPTDM\SubsetCloneAndMaskRun\Mask\SQLServer\Output. Note that this file path can be set as an argument value in the VIP Controller.

ii. DDL – This folder in C:\VIPTDM\SubsetCloneAndMask\EnvironmentPrep\DDL\MySQL contains four SQL scripts for Creating/Dropping and Tables and Functions:





Additionally, the SQLServer folder contains

VIPFunctions – contains scripts to create tables and user-defined functions used in masking (see Initial Setup below)

TestDB – contains SQL scripts to create and populate a simple example database.

iii. Seedlists* – This folder contains comma delimited files with lists of values for use in masking. For example LastName.csv contains a randomly generated list of 1,000 last names. Seedlist files can be found in C:\VIPTDM\SubsetCloneAndMask\EnvironmentPrep\Seedlists

*Seedlists : a list or set of lists containing members of a list type that will be used as source/original data for a process such as masking.

Note: All outputs (log file, MaskReport.txt, audit .csvs…) go to an Output folder: –C:\VIPTDM\SubsetCloneAndMaskRun\Mask\SQLServer\Outputas setup by logDir="C:\VIPTDM\SubsetCloneAndMaskRun\Mask\SQLServer\Output"

Files will be provided through a downloadable installer package. Instructions will be provided for how to unpack into the man folder etc.

Housekeeping tip: It is recommended that you create a folder for each individual Data Masking task. Maintaining individual folders per Masking enables better management of your TDM artefacts. It helps to avoid them being edited or lost, and also enables you to quickly and easily repeat TDM jobs in future.

You might additionally incorporate a file structure where TDM tasks are stored under folders for individual Databases or Schemas

Data Masking List of Steps