White Paper

Test Data Management

Test data, if left untouched and unhandled, may interfere with the client’s business intelligence. The interference may be quite significant for a more dynamic site that undergoes frequent maintenance and upgrades and as a result has more test data on it. So what should we do?

By Kelvin Kam


No matter the project, and no matter how much work you’ve done in the test environment, additional testing will always be necessary in the live environment before and after the product’s launch date. On a website, for example, live testing typically involves performing searches, navigating pages, registering as a new customer, logging in to the customer account, putting through a purchase, and checking that the order details are all captured correctly. Inevitably, this creates test data in the client’s live production environment. Once the test has been performed successfully, that data is of no use to anyone and just sits there, cluttering up the system.

Can we avoid this? No, we probably can’t. So what are the options for getting rid of this data? Connecting to a test database that is a copy of the live one, even for a short period of time, is not really feasible: the site is live and may be experiencing genuine traffic at any point in time, which cannot be diverted to a test database. How about running scripts at regular intervals to delete the test data? Whatever the method, there is a very real need to remove such data. System administrators have access to user and transaction records and can see it. Some types of test data, such as product reviews, could even be seen on the website itself. This is clearly unacceptable.
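As a rough illustration of the scripted clean-up idea, the sketch below deletes test records from a database. Everything specific in it is an assumption made for the example: the `customers` and `orders` tables, their columns, and the convention that test accounts register with a reserved e-mail domain (`@qa.example.com`). It uses an in-memory SQLite database so that it can run on its own; a real clean-up job would run against the production schema, on a schedule, and only with the appropriate approvals.

```python
import sqlite3

# Assumption: test accounts are recognisable by a reserved e-mail domain.
TEST_EMAIL_SUFFIX = "@qa.example.com"


def delete_test_data(conn: sqlite3.Connection) -> int:
    """Delete orders and customers that belong to test accounts.

    Returns the total number of rows removed.
    """
    pattern = "%" + TEST_EMAIL_SUFFIX
    with conn:  # commits on success, rolls back if an error is raised
        # Remove dependent rows first so no order is left without a customer.
        orders_removed = conn.execute(
            "DELETE FROM orders WHERE customer_id IN "
            "(SELECT id FROM customers WHERE email LIKE ?)",
            (pattern,),
        ).rowcount
        customers_removed = conn.execute(
            "DELETE FROM customers WHERE email LIKE ?",
            (pattern,),
        ).rowcount
    return orders_removed + customers_removed


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with one real and one test customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            total REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'alice@shop-customer.com');
        INSERT INTO customers VALUES (2, 'tester1@qa.example.com');
        INSERT INTO orders VALUES (10, 1, 25.0);
        INSERT INTO orders VALUES (11, 2, 9.99);
        INSERT INTO orders VALUES (12, 2, 5.0);
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print("rows removed:", delete_test_data(demo))
```

Deleting the dependent orders before the customers keeps the data consistent, and wrapping both statements in one transaction means a failure part-way through leaves the live data untouched.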


We will need a good Test Data Management strategy, covering the following points:

Creating test data in live environments: Depending on the type of environment and the type of testing required (performance, functional or data warehouse), new data may have to be created. The strategy should define the exceptional cases in which projects may create new data, the type of tools required for data creation, and the approval mechanisms needed to authorize it.

Cleaning up test data after testing is complete: In order to test all the necessary scenarios, test data may have to be modified in certain cases, which can leave altered test data behind at the end of project testing. A clear methodology, with set guidelines, needs to be in place to define when and how test data is cleaned up after test completion. Depending on the prescribed data integrity guidelines, there will be instances when the altered test data cannot be, or should not be, cleaned up.

Managing data requirements of conflicting projects: When several projects need the same test data at the same time, clear guidelines and a mechanism need to be in place to decide which project should go into testing first, based on data dimension parameters, to avoid delays or soaring costs.

Controlling the provisioning of data to required projects: A smart data provisioning mechanism can resolve competing demands for data effectively. The control mechanism should apply to both newly created data and existing test data. It can be as simple as blocking a set of customers for a specific project or as complex as sampling based on multiple dimensions.

Data privacy and protection: There are several tools available in the market for this; however, the scale and volume of test data should decide the mechanism to be adopted. Where testing is outsourced or offshored, it is essential that the organization and the service provider jointly decide on the data privacy mechanism, rather than treating it as an afterthought or an individual priority.
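To make the simplest provisioning control mentioned above more concrete, namely blocking a set of customers for a specific project, here is a minimal sketch. The `CustomerPool` class, its method names, and the project names in the usage example are all hypothetical and exist only for illustration; a real mechanism would persist reservations and might sample on several dimensions rather than handing out customers in order.

```python
from typing import Dict, Iterable, Optional, Set


class CustomerPool:
    """Tracks which test customers are blocked (reserved) for which project."""

    def __init__(self, customer_ids: Iterable[str]) -> None:
        # None means the customer is free; otherwise the value is the owning project.
        self._owner: Dict[str, Optional[str]] = {cid: None for cid in customer_ids}

    def reserve(self, project: str, count: int) -> Set[str]:
        """Block `count` free customers for `project`, or fail without side effects."""
        free = [cid for cid, owner in self._owner.items() if owner is None]
        if len(free) < count:
            raise ValueError(
                f"{project} asked for {count} customers, only {len(free)} are free"
            )
        granted = set(free[:count])
        for cid in granted:
            self._owner[cid] = project
        return granted

    def release(self, project: str) -> int:
        """Free every customer held by `project`; returns how many were freed."""
        held = [cid for cid, owner in self._owner.items() if owner == project]
        for cid in held:
            self._owner[cid] = None
        return len(held)


if __name__ == "__main__":
    pool = CustomerPool(["c1", "c2", "c3"])
    print(pool.reserve("checkout-redesign", 2))
    print(pool.reserve("loyalty-scheme", 1))
    print(pool.release("checkout-redesign"))
```

Because a reservation either succeeds in full or raises an error, two projects can never end up modifying the same customer records, which is the conflict the guidelines above are meant to prevent.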


Overall, I think the way to create test data for a live environment has to be determined after considering the end-user concerns, functionalities, and reports specific to the application under test. All parties should agree on a defined list of test data of all possible types (user names, products, general textual matter, etc.) and decide to use only this predefined list when testing in live environments. The list can then be published across the organization and also be used to inform clients of the exercise. System administrators will be advised to ignore such data when processing reports or performing other administrative duties. However, it should be noted that this may not be an effective or efficient option for very large businesses.
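One way to picture the agreed list in practice is as a small shared registry that reporting code consults before counting anything. The sketch below assumes a hypothetical `Order` record and made-up user names and product codes; none of these come from a real system, and an actual list would be whatever the parties agree and publish.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical agreed list of test identifiers, published to everyone involved.
TEST_USERNAMES: FrozenSet[str] = frozenset({"qa_customer_01", "qa_customer_02"})
TEST_PRODUCT_CODES: FrozenSet[str] = frozenset({"TEST-SKU-0001"})


@dataclass(frozen=True)
class Order:
    order_id: int
    username: str
    product_code: str
    total: float


def is_test_order(order: Order) -> bool:
    """An order counts as test data if its user or its product is on the list."""
    return (
        order.username in TEST_USERNAMES
        or order.product_code in TEST_PRODUCT_CODES
    )


def reportable_orders(orders: Iterable[Order]) -> List[Order]:
    """Return only genuine orders, leaving out anything on the agreed list."""
    return [order for order in orders if not is_test_order(order)]


def revenue(orders: Iterable[Order]) -> float:
    """Total revenue with test orders excluded."""
    return sum(order.total for order in reportable_orders(orders))


if __name__ == "__main__":
    sample = [
        Order(1, "jane", "SKU-1", 40.0),
        Order(2, "qa_customer_01", "SKU-1", 15.0),
        Order(3, "bob", "TEST-SKU-0001", 5.0),
    ]
    print(revenue(sample))
```

Keeping the list in one place means reports and any later clean-up can apply the same definition of test data, instead of relying on each administrator to remember what to ignore.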

Test Data Management is a key component in ensuring high system and application quality and reliability. A good Test Data Management solution can quickly reduce inefficiencies while extracting greater value from test data by making it available in an organized, secure, consistent and controlled manner.