This Knowledge Base section describes in detail how to generate Synthetic Test Data, which is part of Test Data Automation.

Test Data Automation provides a simple, intuitive and largely automated approach to Test Data Generation. A high-speed workflow engine automatically builds a data model of the target database and produces an easy-to-use Excel control spreadsheet, in which a comprehensive set of Data Generation functions is defined. Event Hooks, pre-process variables and other features allow you to retain the referential integrity of complex data, so you can produce data for complex systems testing from simple configuration spreadsheets. High-speed automation then populates the target database with rich synthetic data for manual or automated testing.
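Retaining referential integrity means that every generated child row points at a parent row that actually exists. The Python sketch below illustrates the idea outside VIP: parent rows are generated first, and child rows draw their foreign keys only from those parents. The table names (customers, orders) and both functions are hypothetical illustrations, not part of Test Data Automation, which handles this through its own Event Hooks and pre-process variables.

```python
import random
from typing import List

# Hypothetical sketch: generate parent rows first so that every child row
# can reference a key that already exists (referential integrity).

def generate_customers(count: int) -> List[dict]:
    """Generate parent rows with sequential primary keys starting at 1."""
    return [{"customer_id": i, "name": f"Customer {i}"} for i in range(1, count + 1)]


def generate_orders(count: int, customers: List[dict], rng: random.Random) -> List[dict]:
    """Generate child rows whose foreign key points at an existing customer."""
    customer_ids = [c["customer_id"] for c in customers]
    return [
        {"order_id": i, "customer_id": rng.choice(customer_ids)}
        for i in range(1, count + 1)
    ]


rng = random.Random(42)  # fixed seed so repeated runs produce the same data
customers = generate_customers(5)
orders = generate_orders(20, customers, rng)

valid_ids = {c["customer_id"] for c in customers}
print(all(o["customer_id"] in valid_ids for o in orders))  # True
```

Generating in dependency order (parents before children) is one simple way to keep foreign keys valid; tables with more complex relationships need the same ordering applied across the whole model.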

…but first, a few questions

Why do I want to generate Synthetic Test Data?

Primarily, to test software systems and applications. Software is created with a set of assumptions and constraints in mind. Once it is developed, can we simply use it? Not so fast. We first need to test the system using data that was not created by its developers. VIP has a feature that lets you create 'synthetic' data, generated according to a set of predefined criteria.

Reasons why Synthetic Test Data is a good idea

  • Testers often spend many hours trying to find the correct test data for testing.

  • Data is often required to be consistent across multiple applications.

  • Many apparent bugs are actually caused by incorrect data being used in a test, not by the application.

  • Test Data often changes and becomes invalid for the specific test.

  • Testers often cannibalize each other's data.

  • Incorrect data destabilises automated testing and creates automated test failures.

  • Each tester hunts for their own data and there is little reuse of previous data finds.

  • Test data requires high volumes of storage and is time-consuming to provision. It is often quicker to synthesise data and provision it in parallel.

  • Last, but not least, synthetic data contains no real personal data, so it helps you comply with privacy and data protection laws and removes the risk of misusing personal data.

What will I gain from generating Synthetic Test Data?

You gain peace of mind that the system has been tested with synthetic data that is representative of production and that goes well beyond it in terms of test coverage.

How long will it take me to generate Synthetic Test Data?

If the application tables and their relationships have already been created, generating Synthetic Test Data should take only a short time (a matter of hours).

Are there any Prerequisites for generating Synthetic Test Data?

Yes, a Database should have already been created that models the application through a set of Tables and their relationships to one another.

Note: This document uses example-based walk-throughs of Data Generation using a Sample Commerce database model.

Using Test Modeller for Synthetic Data Generation

This section describes how to use Test Modeller to customize and run the Data Generation task. This method allows you to create many combinations of the variables and parameters used in Synthetic Data Generation, and is particularly suited to users who do not have a strong technical background.

Overview of the Method

  • Start with a VIP Flow for generating data. See Section XXXX for more information.

  • Load the Flow into Test Modeller.

  • Use the argument/parameter values from the VIP Flow to fill in and customize the forms in Test Modeller.

  • Generate synthetic data using Test Modeller.

Detailed steps

Overview and Step-by-Step Demo

End-to-End Demo Video

This video provides a full walkthrough of synthetic Test Data Generation for SQL Server, giving an end-to-end overview for both new and experienced users of Test Data Automation.

Process Overview

The aim of this process is to automatically produce artificial, or synthetic, test data. As described above, rigorous testing requires a variety of data to ensure that system functionality is robust, consistent and effective. Synthetic test data generation is a rapid and comprehensive way to create the data required to test an application.

The Synthetic Data Generation Process begins by generating 1) a VIP Database Model* and 2) a related Test Data Configuration Sheet. The Configuration Sheet specifies how to generate data for the fields in the database, using a set of predefined functions that generate rich data matching the Database Model.

*A VIP Database Model stores metadata about the database, which is used later to perform table-level database operations (e.g. Insert, Delete).
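To make the idea of a metadata model concrete, the sketch below reads table, column and foreign-key metadata from a database catalog. It is an illustration only, not how VIP builds its model: SQLite (which is not in the supported list) stands in for the target database so the example runs with Python's standard library, and the customers/orders schema is hypothetical.

```python
import sqlite3

# Illustrative only: an in-memory SQLite database stands in for the target
# database. The two-table schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
""")


def read_model(conn: sqlite3.Connection) -> dict:
    """Collect column names and foreign keys for every user table."""
    model = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Table names come from the catalog itself, not from user input.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        fks = [(row[3], row[2], row[4])
               for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
        model[table] = {"columns": columns, "foreign_keys": fks}
    return model


model = read_model(conn)
print(model["orders"]["foreign_keys"])  # [('customer_id', 'customers', 'customer_id')]
```

Each supported database type exposes similar information through its own catalog views, so a real model builder would query those instead.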

Once the Database Model and Configuration Sheet are generated, you can configure the Test Data Generation Spreadsheet to produce the type of test data required by the system under test. The Test Data Generation Sheet is an Excel spreadsheet that contains the data fields as well as the relationships between different data in the database, and it has access to ready-made functions that generate values for individual fields.
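Conceptually, each row of the configuration sheet pairs a field with a generator function and its arguments. The Python sketch below models that idea; the function names (Sequence, RandomInt, RandomFromList), their parameters and the product columns are hypothetical and do not reproduce the actual function library available in the sheet.

```python
import random
from typing import Callable

# Hypothetical generator functions. Each takes (rng, row_index, *args);
# the real Configuration Sheet provides its own library of functions.
def sequence(rng: random.Random, row: int, start: int) -> int:
    return start + row

def random_int(rng: random.Random, row: int, low: int, high: int) -> int:
    return rng.randint(low, high)  # inclusive of both bounds

def random_from_list(rng: random.Random, row: int, *choices: str) -> str:
    return rng.choice(choices)

FUNCTIONS: dict[str, Callable] = {
    "Sequence": sequence,
    "RandomInt": random_int,
    "RandomFromList": random_from_list,
}

# Each entry mirrors one sheet row: (column, function name, arguments).
CONFIG = [
    ("product_id", "Sequence", (1,)),
    ("category", "RandomFromList", ("Books", "Games", "Music")),
    ("price", "RandomInt", (5, 50)),
]

def generate_rows(config, count: int, seed: int = 0) -> list[dict]:
    """Apply the configured function for every column of every row."""
    rng = random.Random(seed)
    return [
        {col: FUNCTIONS[fn](rng, row, *args) for col, fn, args in config}
        for row in range(count)
    ]

for r in generate_rows(CONFIG, 3):
    print(r)
```

Keeping the configuration as data, separate from the functions, is what lets a non-programmer change the generated values by editing a spreadsheet rather than code.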

Event Hooks can be used to add custom functionality, such as business logic, to the generated data. For example, generating data for an e-commerce application might require a Total Value of Product Sales. This total requires a calculation: multiply the value of each Product by the number of units of that Product sold, then add the results together to determine the Total Amount of Products sold.
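The business logic in this example can be sketched as a small Python function. The hook name, its signature and the order/order-line data shapes are assumptions made for illustration; they are not the VIP Event Hook API. Decimal is used so that currency arithmetic is exact.

```python
from decimal import Decimal
from typing import List

# Hypothetical hook: name, signature and data shapes are assumptions,
# not the VIP Event Hook API.
def on_order_generated(order: dict, order_lines: List[dict]) -> dict:
    """Return a copy of the order with total = sum of unit_price x quantity."""
    total = sum(
        (line["unit_price"] * line["quantity"]
         for line in order_lines
         if line["order_id"] == order["order_id"]),
        Decimal("0"),
    )
    return {**order, "total_amount": total}


order = {"order_id": 1}
lines = [
    {"order_id": 1, "unit_price": Decimal("9.99"), "quantity": 2},
    {"order_id": 1, "unit_price": Decimal("4.50"), "quantity": 3},
    {"order_id": 2, "unit_price": Decimal("100.00"), "quantity": 1},
]
print(on_order_generated(order, lines)["total_amount"])  # 33.48
```

Only lines belonging to order 1 contribute: 9.99 x 2 + 4.50 x 3 = 19.98 + 13.50 = 33.48.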

Once the Test Data Configuration Sheet is complete, the VIP flow is generated from the configured sheet. This workflow combines the configured sheet with the Database Model.

Finally, execute the VIP workflow with the required number of records to generate the new Synthetic Test Data. The amount of test data is specified at this point in the process, so you can generate 10, 100 or 1000 records. More data can be generated by repeating the process.
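When data is generated in several runs, keys from later runs must not collide with keys from earlier ones. The Python sketch below shows one way to achieve this, assuming a key counter shared across runs; the make_run function and the row shape are hypothetical, and this document does not describe how VIP itself allocates keys between iterations.

```python
import itertools
from typing import Iterator, List

# Hypothetical sketch: each run generates `records` rows, taking primary
# keys from a counter shared across runs so repeated runs never collide.
def make_run(counter: Iterator[int], records: int, run_no: int) -> List[dict]:
    """Generate `records` rows tagged with the run that produced them."""
    return [{"id": next(counter), "run": run_no} for _ in range(records)]


counter = itertools.count(start=1)
all_rows: List[dict] = []
for run_no, records in enumerate((10, 100, 1000), start=1):
    all_rows.extend(make_run(counter, records, run_no))

print(len(all_rows))  # 1110
print(len({r["id"] for r in all_rows}) == len(all_rows))  # True
```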


Before the specific prerequisites for Synthetic Data Generation are discussed below, it is very useful to be familiar with VIP (Visual Integration Processor), created by Curiosity Software. If you are not familiar with VIP, please see the VIP documentation, which describes its purpose, functions and uses; otherwise, continue reading.

A Database should have already been created that models the application through a set of Tables and their relationships to one another. This is the target Database/schema/tables into which data will be generated.

Supported Database Types

  • SQL Server

  • Oracle

  • MySQL

  • Postgres

  • MS Access

  • MariaDB

To use Synthetic Test Data Generation, you will need a basic understanding of SQL and Microsoft Excel, and a good understanding of the underlying application database model.

The prerequisite hardware and software for Synthetic Test Data Generation are:

  1. You must be running a 64-bit Windows machine. Windows 10 is strongly recommended and is required for the full range of functionality.

  2. The latest 64-bit version of Microsoft Excel.

  3. The latest version of VIP RPA. Information on prerequisites and installation can be found here, and licensing information here.

  4. The latest version of the VIP Server Controller. Prerequisites and installation instructions can be found here.
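The first prerequisite can be checked with a short script before installing anything. The Python sketch below is a hypothetical helper, not a Curiosity tool: it checks only that the machine runs Windows on a 64-bit architecture (using the simple heuristic that 64-bit architecture names end in "64"), and does not verify the Windows version, Excel, VIP RPA or the VIP Server Controller.

```python
import platform
from typing import List, Optional

def check_prerequisites(system: Optional[str] = None,
                        machine: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    # Fall back to the values reported by the current machine.
    system = system or platform.system()     # e.g. 'Windows', 'Linux'
    machine = machine or platform.machine()  # e.g. 'AMD64', 'x86_64'
    problems = []
    if system != "Windows":
        problems.append(f"Windows is required (found {system})")
    # Heuristic: 64-bit architecture names end in '64'.
    if not machine.endswith("64"):
        problems.append(f"a 64-bit machine is required (found {machine})")
    return problems


for problem in check_prerequisites():
    print("Prerequisite not met:", problem)
```

The optional parameters make the check easy to test without needing a particular machine.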

Once you have installed the prerequisite software on a 64-bit Windows machine, you will require licensed access to the relevant Test Data Management (TDM) utilities.

Required Files in the Synthetic Data Generation Folder

  • These files are generated as part of the Register Data Model VIP action.

  • The Data Generation Configuration Sheet, DataGenConfigSampleCommerce.xlsx (the Excel file for the example in this documentation).

  • The dynamic link library (.dll) for the Flow file, which in this example is SampleCommerce.dll. If more than one data generation flow uses the same DLL, it should preferably be placed in the %programdata%\VIP\lib folder. For example, if a database has many tables and you generate data for one subset of those tables (x) in one flow and another subset (y) in a second flow, both flows can use the same model DLL, which should therefore go in the %programdata%\VIP\lib directory.

  • The VIP Flow file for the tables for which data is to be generated, which is created during the process.
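The DLL placement rule above can be expressed as a small Python function. This is a hypothetical helper for illustration: the assumption that a DLL used by a single flow stays in that flow's folder, and the example folder C:\DataGen\SampleCommerce, are not stated in this document. ntpath is used so that Windows path syntax is produced on any operating system, and C:\ProgramData is used only as a fallback when %programdata% is not set.

```python
import ntpath
import os
from typing import Optional

def model_dll_path(dll_name: str, flows_using_dll: int, flow_folder: str,
                   programdata: Optional[str] = None) -> str:
    """Choose where a model DLL should live (Windows path syntax)."""
    if flows_using_dll > 1:
        # Shared DLL: place it in %programdata%\VIP\lib so every flow finds it.
        base = programdata or os.environ.get("PROGRAMDATA", r"C:\ProgramData")
        return ntpath.join(base, "VIP", "lib", dll_name)
    # Assumption: a DLL used by a single flow stays in that flow's folder.
    return ntpath.join(flow_folder, dll_name)


print(model_dll_path("SampleCommerce.dll", 2, r"C:\DataGen\SampleCommerce",
                     programdata=r"C:\ProgramData"))
# C:\ProgramData\VIP\lib\SampleCommerce.dll
```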

To begin Synthetic Test Data Generation you need VIP.exe, which by default is located at C:\Program Files\Curiosity\Visual Integration Processor\VIP.exe. It is placed there when you run the installation files. For full instructions, see the Prerequisites and installation link.

It is a good idea to create a shortcut to VIP.exe on your desktop. Test Data Generation produces several files; their locations are described in the relevant sections of this document. In general, keep generated files in one location so they are easy to find.