Efficient management of the data used for testing is essential to maximizing return on investment and to supporting testing efforts so they achieve the highest levels of success and coverage. If test data is hard to use and adapt, poorly represents the sampled source, or consumes excessive resources for preparation and maintenance, the negative impact on the desired outcome manifests quickly and continues to degrade the quality of results. To tip the balance toward positive results and improved returns, consider the process, potential challenges, and possible solutions involved in test data management (TDM).

The TDM Process

A tester cannot simply claim “there are probably defects” in a system without ever attempting to identify and report them. They must interact with the system and reproduce the potential defects they find. Similarly, a tester cannot provide adequate results without access to the relevant systems and an appropriate sample of the data the system uses. For data to return the most value, it must be managed using quality processes. The key phases involved in a TDM process are:

  • Planning
  • Analysis
  • Design
  • Build
  • Maintenance

Table of the test data management process:

Phase: Planning
  1. Assign a Test Data Manager
  2. Define data requirements and templates for data management
  3. Prepare documentation, including a list of tests and a data landscape reference
  4. Establish a service level agreement
  5. Set up the test data management team
  6. Get the appropriate plans and papers signed off

Phase: Analysis
  1. Perform initial setup and synchronization exercises, including data profiling for each individual data store and assignment/recording of version numbers for existing data in all environments
  2. Collect and consolidate data requirements
  3. Update project lists
  4. Analyze data requirements and the latest distribution log
  5. Assess gaps and the impact of data modification
  6. Define data security, backup, storage, and access policy
  7. Prepare reports

Phase: Design
  1. Decide on a strategy for data preparation
  2. Identify regions needing data to be loaded/refreshed
  3. Identify appropriate methods
  4. Identify data sources and providers
  5. Identify tools
  6. Prepare data distribution plans
  7. Prepare the coordination/communication plan
  8. Prepare the test activities plan
  9. Document the data plan

Phase: Build
  1. Execute plans
  2. Execute masking/de-identification where applicable
  3. Back up data
  4. Update logs

Phase: Maintenance
  1. Support change requests, unplanned data needs, and problems/incidents
  2. Prioritize requests where applicable
  3. Analyze requirements and consider whether they can be met from existing or modified current data, including data assigned to other projects
  4. Make required data modifications
  5. Back up new data
  6. Assign version markers and log them with an appropriate description
  7. Review the status of ongoing projects
  8. Run data profiling exercises
  9. Assess and address gaps
  10. Refresh data where needed
  11. Schedule and communicate maintenance
  12. If necessary, redirect requests
  13. Prepare documentation and reports
 

Tools

Quality tools promote quality results in any line of work, and TDM is no different. Links to useful tools are provided below.

 

Challenges

Many challenges can complicate the TDM process, such as masking sensitive data and resource consumption. An overlooked challenge can cause major setbacks. Common challenges of test data management include:

  • Additional time for data set up/management instead of actual testing
  • Additional administrative efforts in test data management
  • Additional expense including personnel and hardware
  • Inaccurate or difficult-to-access data negatively impacts testing
  • Sensitivity of private information (credit cards, medical records, etc.)
  • Storage required for test data
  • Potential for data loss
  • Use of real data versus fake data generated from scratch
  • Poorly communicated data requests result in inadequate data returns
  • Identification of data anomalies
  • Conflicting test priorities
  • Timely data reversions

 

Data masking and de-identification

Data masking and de-identification are essential to complying with privacy laws and standards. Several approaches allow realistic data to be used without compromising the confidentiality of sensitive data:

  • Remove all sensitive information, such as credit card or social security numbers. This may not always be the right method, because the remaining data may not accurately cover test requirements.
  • Generate fake data from scratch that fits the appropriate format. This can be time consuming to do by hand; however, an automated script can quickly generate the required data.
  • If the data must be returned to its original form, a reversible algorithm can be used in some circumstances to alter it. However, if the algorithm is known or discovered, the private data could be compromised.
  • Apply a numeric variance, such as +/- 10%, to change information (finance, demographics, etc.) just enough to make it untrue but still valid enough for appropriate use.
  • Encrypt the data. Encryption is an extensive approach, but it may not be as effective as it appears if access rights are carelessly given out.
  • Mask out displayed values by replacing characters, such as with XX or **. This allows systems to still use the data without making it available for easy access.
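Three of the approaches above (masking out displayed values, applying a numeric variance, and generating fake data in the appropriate format) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the choice to keep the last four characters visible, and the ddd-dd-dddd pattern are hypothetical choices for illustration, and the random module used here is not suitable where cryptographic strength is required.

```python
import random
from typing import Optional


def mask_out(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace all but the last `visible` letters/digits with mask_char.

    Separators such as dashes are kept so the format stays recognizable.
    """
    to_mask = max(sum(ch.isalnum() for ch in value) - visible, 0)
    masked = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def apply_variance(amount: float, pct: float = 0.10,
                   rng: Optional[random.Random] = None) -> float:
    """Shift a numeric value by a random factor within +/- pct."""
    rng = rng or random.Random()
    return round(amount * (1 + rng.uniform(-pct, pct)), 2)


def fake_id_number(rng: Optional[random.Random] = None) -> str:
    """Generate a made-up identifier in a ddd-dd-dddd pattern (assumed format)."""
    rng = rng or random.Random()
    digits = "".join(str(rng.randint(0, 9)) for _ in range(9))
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


if __name__ == "__main__":
    print(mask_out("4111-1111-1111-1234"))  # XXXX-XXXX-XXXX-1234
    print(apply_variance(1000.00))          # some value between 900.0 and 1100.0
    print(fake_id_number())                 # random digits in ddd-dd-dddd form
```

Passing a seeded random.Random instance makes the generated values repeatable, which can help when a test run needs to be reproduced against the same masked data.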

Solutions

Once the challenges have been reviewed, consider solutions that help mitigate their impact. Solutions to reduce challenge impact include:

  • Ensure connectivity of relevant parties before data setup
  • Define testing environments and data requirements clearly
  • Use smaller datasets that accurately sample full data coverage
  • Have involved parties meet and confirm that requirements are fully addressed
  • Back up data and assign versions
  • Log the versions with relevant details for quick reference and conversions
  • Assign data partitions to entire teams/projects, not to individual members
  • Maintain records of data distribution
  • Make unused data/partitions available for other relevant projects
  • Mask and de-identify sensitive information
  • Let the scope of the project define the masking tools, for complete and consistent masking with realistic representation
  • Decide on masking tools jointly with relevant parties
  • Use standard request and documentation templates
  • Refresh test data as needed, including periodic updates with new extracts, to accurately cover customer data
  • Keep a subset of metadata to accommodate changes
  • Schedule regular maintenance
  • Use tools that support row insertion and database editing with multilevel undo capabilities
  • Consider cloud storage (may violate privacy protection)
  • Outsource processes to expert companies
  • Network with other professionals
  • Use automation to expedite processes and lower resource cost, including:
      - Masking/de-identification of sensitive information
      - Comparisons between baseline and successive test runs
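As an illustration of the last automation item, a comparison between a baseline and a later test run might look like the sketch below. It assumes each run's results are available as a dictionary keyed by a record identifier; that structure and the function name compare_runs are hypothetical and chosen only for this example.

```python
def compare_runs(baseline: dict, current: dict) -> dict:
    """Report record IDs added, removed, or changed relative to the baseline."""
    base_ids, cur_ids = set(baseline), set(current)
    return {
        "added": sorted(cur_ids - base_ids),
        "removed": sorted(base_ids - cur_ids),
        # A record counts as changed when it exists in both runs with different values.
        "changed": sorted(k for k in base_ids & cur_ids if baseline[k] != current[k]),
    }


if __name__ == "__main__":
    baseline = {"order-1": 100.0, "order-2": 250.0, "order-3": 75.0}
    current = {"order-1": 100.0, "order-2": 260.0, "order-4": 40.0}
    print(compare_runs(baseline, current))
    # {'added': ['order-4'], 'removed': ['order-3'], 'changed': ['order-2']}
```

An empty report for all three keys indicates the run matches the baseline, which makes the check easy to wire into an automated pass/fail step.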

Summary

Efficient test data management (TDM) improves the quality of testing results. Improved results lead to an improved product and a higher return on investment. A process that understands and meets requirements, coupled with quality solutions to relevant challenges, will help provide the efficiency desired in TDM. Once TDM is optimized, increases in productivity, results, and profitability should quickly manifest, allowing more resources and focus to be devoted to continuing to deliver quality products and services.