
Case Study

Leveraging Client QA Through AI

When full test coverage is impossible, how does one determine where to focus? That’s exactly what our client needed to know.

Client Overview

The client’s web-based application generates reports; some are generated by paid subscribers, while others are free and bring in no revenue. Twenty different applications access nearly 200 different retail data sources. Customers typically select filters (the client calls them frequencies), such as time, duration, data source, server type, company, and department, and then run a saved report or customize a new report based on that metadata. Data quality is part of the company’s reputation.

Business Needs

Some customers have reported data problems in generated ad hoc reports. The client believes these stem from the data processing that happens after a report is requested, not from the quality of the collected metadata.

The challenge is that their customers generate about 4 million reports a year. The number of possible paths to generate these ad hoc reports is prohibitively large: manually testing every possible path would exceed 35 man-years. The problem: how does one choose an optimal subset from this 35-year-plus test case set? Any implementation of new features would also require a regression set to confirm the data quality of the reports. Someone needed to significantly narrow the testing focus.
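To see how quickly those paths multiply, consider a back-of-the-envelope calculation. The dimension sizes and effort estimates below are hypothetical assumptions for illustration, not the client’s actual figures, but they show how a handful of filter dimensions can add up to decades of manual effort:

```python
# Back-of-the-envelope path count with hypothetical dimension sizes --
# illustrative assumptions only, not the client's actual figures.
dimensions = {
    "application":      20,   # distinct applications
    "data_source":     200,   # retail data sources
    "frequency":         6,   # e.g. daily, weekly, monthly, ...
    "server_type":       4,
    "permission_level":  5,
}

total_paths = 1
for count in dimensions.values():
    total_paths *= count

minutes_per_manual_test = 10            # assumed effort to set up and verify one report
work_minutes_per_year = 220 * 8 * 60    # ~220 working days of 8 hours

person_years = total_paths * minutes_per_manual_test / work_minutes_per_year
print(f"Possible report paths: {total_paths:,}")                    # 480,000
print(f"Manual effort:         {person_years:,.0f} person-years")   # ~45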

The Qualitest Solution

We turned to pattern matching to see whether AI could recognize patterns in production behavior and narrow our focus. Our initial plan at Qualitest was to use production-driven test coverage and combinatorial analysis, which sits at the cutting edge of test design and test management. It is based on online analysis of data and metadata as indicators of functional usage gaps between environments, i.e., production vs. test. This approach leverages COTS visualization packages such as Weave or Tableau to provide a visual representation of data and metadata patterns, guiding test designers toward gaps in test coverage. This tool set offers significant benefits:

  • Online indication of gaps between how real-life users use a product and how it is being tested. This enables identifying gaps (hidden patterns) that would otherwise be impossible or cost-prohibitive to find.
  • A new basis for risk-based testing, driven by metadata factors such as whether functionality is revenue-generating or not, key-user-focused or not, and so on, resulting in improved test coverage, profit/revenue, and customer satisfaction.
  • A method to define subset test coverage (which can be used for regression or smoke testing) that is comprehensive and sufficient when the sheer number of functions and combinations is impossible or prohibitively expensive to cover in full (see the sketch that follows this list).
  • All three benefits point to emphasizing test cases that reflect real user behaviors, and de-emphasizing the behaviors users avoid.
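To make the combinatorial-analysis side of this concrete, one common reduction is pairwise (all-pairs) coverage: rather than running every full combination of filter values, select a much smaller suite in which every pair of values across any two dimensions still appears together at least once. The sketch below uses a simple greedy strategy and invented filter values; it is illustrative only, not the exact tooling used on this engagement:

```python
from itertools import combinations, product

# Hypothetical filter dimensions with invented values -- illustrative only,
# not the client's actual metadata.
filters = {
    "frequency":   ["daily", "weekly", "monthly"],
    "data_source": ["source_a", "source_b", "source_c", "source_d"],
    "server_type": ["app1", "app2"],
    "permission":  ["admin", "analyst", "viewer"],
}
names = list(filters)

# Every cross-dimension pair of values that a pairwise suite must cover.
uncovered = {
    frozenset([(a, va), (b, vb)])
    for a, b in combinations(names, 2)
    for va in filters[a]
    for vb in filters[b]
}

suite = []
while uncovered:
    # Greedy choice: the full combination covering the most uncovered pairs.
    # Brute force over the full product -- fine for a small illustration.
    best_combo, best_new = None, set()
    for combo in product(*(filters[n] for n in names)):
        pairs = {
            frozenset([(names[i], combo[i]), (names[j], combo[j])])
            for i, j in combinations(range(len(names)), 2)
        }
        new = pairs & uncovered
        if len(new) > len(best_new):
            best_combo, best_new = combo, new
    suite.append(dict(zip(names, best_combo)))
    uncovered -= best_new

full_size = 1
for values in filters.values():
    full_size *= len(values)

print(f"Full combination count: {full_size}")   # 3 * 4 * 2 * 3 = 72
print(f"Pairwise suite size:    {len(suite)}")  # typically 12-14 test cases
```

Even this toy example collapses 72 full combinations into roughly a dozen test cases; the production-usage analysis described below then decides which combinations deserve priority.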

We at Qualitest started with their front-end application. The test environment had a category named “Usage metrics” that actually generated a report of application usage, but it lacked the necessary granularity. That’s when we realized we needed back-end access to the data to understand its structures and various application dimensions, such as security, DB schema, suppression, and normalization.

Before we could start exploring, we needed to understand the overall structure: the DB schema, how the tables were related and joined, how information flowed between the tables, and how it flowed between the metadata and Sybase. That foundation let us begin our exploration, hunting for patterns. We began with a small subset: data from the last month, limited to a particular day of the week. With this production data, we filtered and analyzed the hits or counts in every category, along with the traffic, hours of activity, permission levels, data sources targeted, and the app servers and DB servers utilized.
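As a minimal sketch of that first profiling pass, assuming the production activity had been exported to a flat file (the column names below are our assumptions, not the client’s actual Sybase schema), the counting itself is straightforward:

```python
import pandas as pd

# Hypothetical export of production report activity; column names are
# assumptions for illustration, not the client's actual Sybase schema.
usage = pd.read_csv("production_report_log.csv", parse_dates=["requested_at"])

# Start small: a single day of interest within the most recent month.
day = usage[usage["requested_at"].dt.date == usage["requested_at"].dt.date.max()]

# Count hits along each dimension of interest.
for dim in ["report_category", "permission_level", "data_source",
            "app_server", "db_server"]:
    print(f"\nHits by {dim}:")
    print(day[dim].value_counts())

# Traffic by hour of day highlights the heaviest activity windows.
print("\nHits by hour:")
print(day["requested_at"].dt.hour.value_counts().sort_index())
```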

We expanded the analysis to a week, then to a month. The next step involved comparisons with test data. Because testing precedes production, we targeted the prior month, applying the same analysis approach for an apples-to-apples comparison.
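The comparison boils down to lining up usage shares per dimension value in each environment. A hedged sketch, again with assumed column names, of how production and QA distributions might be contrasted to surface under-tested values:

```python
import pandas as pd

# Assumed flat exports of report activity from each environment;
# column names are illustrative, not the client's schema.
prod = pd.read_csv("production_report_log.csv")
qa = pd.read_csv("qa_report_log.csv")

def usage_share(df: pd.DataFrame, dim: str) -> pd.Series:
    """Fraction of all report runs that used each value of a dimension."""
    return df[dim].value_counts(normalize=True)

for dim in ["report_category", "data_source", "permission_level", "app_server"]:
    comparison = pd.DataFrame({
        "production_share": usage_share(prod, dim),
        "qa_share": usage_share(qa, dim),
    }).fillna(0.0)

    # Flag values real users hit often but QA barely (or never) exercises.
    gaps = comparison[(comparison["production_share"] > 0.05) &
                      (comparison["qa_share"] < 0.01)]
    print(f"\nPotential coverage gaps for {dim}:")
    print(gaps.sort_values("production_share", ascending=False))
```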

In the end, Weave proved successful at finding the patterns we sought, analyzing more axes at once than people can. We generated a variety of bar charts to show how concentration patterns differed between the production and QA environments. Production usage metrics revealed the types of daily, weekly, and monthly reports as well as the filters used. The study covered report counts for different modules (representing different business names), usage of app servers and DB servers, different frequencies and data sources, access roles for permissions and security, and suppression use; for each of these areas, the emphasis diverged between production and QA. Our analysis showed what was never or barely tested in the QA environment, indicating coverage gaps to be filled.
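Weave handled the multi-axis analysis for us, but the basic production-versus-QA concentration chart can be reproduced with any plotting library. A simplified sketch with invented module names and shares:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented usage shares for illustration -- not the client's actual figures.
shares = pd.DataFrame({
    "production": [0.38, 0.27, 0.18, 0.12, 0.05],
    "qa":         [0.10, 0.05, 0.40, 0.30, 0.15],
}, index=["Module A", "Module B", "Module C", "Module D", "Module E"])

# Side-by-side bars make the concentration mismatch between environments obvious.
shares.plot(kind="bar", figsize=(8, 4))
plt.ylabel("Share of report runs")
plt.title("Report usage concentration: production vs. QA")
plt.tight_layout()
plt.savefig("production_vs_qa_usage.png")
```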

The analysis also surfaced performance concerns during heavy usage that were outside the initial project scope but easily visible in the observed data, which intrigued the client. Though total test coverage is not a realistic goal, significant quality gains were identified.

As we proceed, AI strategies will be used to map known bugs to gaps in coverage. Meanwhile, on the human side, engineers will investigate the sources of the data problems, considering Bad Calculations, Incorrect Data Segmentation, ETL Issues, Data Quality Issues, and Display/GUI Issues, among other possible causes. The test engineers will ultimately select several workflows to analyze in depth and will run pilot tests of various possible test automation solutions.
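That mapping work is still ahead of us, but one plausible starting point, assuming defects are logged with the same metadata dimensions as report runs (the file and field names below are hypothetical), is a simple join between known bugs and the under-tested combinations identified earlier:

```python
import pandas as pd

# Hypothetical inputs: known defects tagged with report metadata, and the
# under-tested dimension combinations surfaced by the gap analysis above.
bugs = pd.read_csv("known_bugs.csv")      # e.g. bug_id, data_source, report_category
gaps = pd.read_csv("coverage_gaps.csv")   # e.g. data_source, report_category, production_share, qa_share

# Bugs that fall inside a known coverage gap are the first candidates for new tests.
bugs_in_gaps = bugs.merge(gaps, on=["data_source", "report_category"], how="inner")

print(f"{len(bugs_in_gaps)} of {len(bugs)} known bugs map to under-tested combinations")
print(bugs_in_gaps[["bug_id", "data_source", "report_category", "production_share"]])
```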