The Testing Show: Usable Metrics

Transcript

Over the years, a variety of metrics have been gathered to measure and determine how well or how poorly the processes of software quality and software delivery have progressed. Sometimes these metrics are helpful. A lot of the time they are benign or irrelevant. Sometimes they can actually be hurtful or stymie progress, which defeats their purpose altogether.

Matt and Michael welcome Tom Cagley and Nausheen Sayed to discuss meaningful metrics: what they are, how to find them, and how to make them work for us. Along the way, they discuss some metrics train wrecks.

Panelists:

References:

Software Process and Measurement Cast
QTDashboard
Frederick Winslow Taylor
Key Performance Indicator (KPI)
Service Level Agreement (SLA)
SmartBear Connect 2018
Future Proofing Your Software: Design Inclusively
Building a Testing Framework from Scratch

Transcript:

MICHAEL LARSEN: Hello and welcome to the Testing Show, Episode… 61.

[Begin Intro Music]

This show is sponsored by QualiTest. QualiTest Software Testing and Business Assurance solutions offer an alternative by leveraging deep technology, business and industry-specific understanding to deliver solutions that align with clients’ business context. Clients also comment that QualiTest’s solutions have increased their trust in the software they release. Please visit QualiTestGroup.com to test beyond the obvious!

[End Intro]

MICHAEL LARSEN: Hello, and welcome to The Testing Show. Happy October, everybody. Glad to have you joining us. Thanks so much for being with us today. Without further ado, I would like to introduce our guests. Please welcome Mr. Tom Cagley?

TOM CAGLEY: Hello. Hello.

MICHAEL LARSEN: And, Nausheen Sayed?

NAUSHEEN SAYED: Hello, everyone.

MICHAEL LARSEN: With that, we’ll turn the time over to our regular MC, Mr. Matthew Heusser. Here’s Matt.

MATTHEW HEUSSER: Thank you, Michael. Welcome everybody. This week we wanted to talk about management and metrics. So, I invited Thomas. We’ve known each other for years. He actually has a podcast. I don’t know if I’d say it specializes in management. It’s definitely a management-quality podcast. How would you describe your podcast, Tom?

TOM CAGLEY: Well, to be honest with you, Matt, it is nondenominational. Tends to be more, yes, management oriented, but Agile, some technical side to it, and testing on occasion. So, a little bit of everything.

MATTHEW HEUSSER: Thomas is based in the Midwest and has a consulting company that is a little bit more (I’d say) traditional, but that might be changing. Tell us a little bit about that.

TOM CAGLEY: Well, actually, Matt, I have recently come out of the field. I decided, having been an Agile coach for about 12 years, that it was time to actually go back in, see how people really live, and reorient myself. So, I’m actually now working for Hyland Software in Cleveland, Ohio.

MATTHEW HEUSSER: Wow. You’ve gone back to practitioner. That’s fantastic. I didn’t know that.

TOM CAGLEY: Yeah. I get to test, write a little code on occasion, and help people in terms of being a senior Agile coach.

MATTHEW HEUSSER: Fantastic. We’ve also got Nausheen Sayed who is a product owner for QualiTest. Welcome, Nausheen.

NAUSHEEN SAYED: Thank you.

MATTHEW HEUSSER: You’re also sort of the evangelist for the QualiTest dashboard and tooling offerings?

NAUSHEEN SAYED: That’s right. As you already said, I’m the product manager for a test analytics system that we developed in-house at QualiTest. My role is more customer-centric. I meet with the customers and understand their needs and their requirements from a metrics perspective, because the tool is an analytics system that helps create visual reports and dashboards based on metrics and day-to-day reporting.

MATTHEW HEUSSER: Well, let’s jump into it then. Let’s talk specifically. We want to talk about how metrics can enable management, what good metrics look like, and that sort of thing. Sometimes it seems like a little bit of what my friend Marcus McCullough calls a “solution problemer.” What kind of problems do metrics solve, and who do they solve them for?

TOM CAGLEY: Let’s start in with the whole idea that metrics solve any problem. I mean, very frankly, I would suggest, Matt, that metrics provide information; human beings solve problems. With some luck, right, measurement and metrics provide us with the information that allows us to make decisions. If the measurement data doesn’t allow us to make decisions, it might look good on the wall, it might be a great number, but it’s meaningless and might be counterproductive, if for no other reason than that it costs money to make.

MICHAEL LARSEN: So, if I can ask you a quick question on this: Can you give an example of a good, what I’d like to call a “metrics train wreck?” I’m intimately familiar with that term, and I’ve lived with it myself. But, for those who may wonder, some good examples of abuse of metrics? Let’s start there, because I think a lot of people, when they hear the word “metrics,” they kind of cringe a little bit.

TOM CAGLEY: Well, anytime you say that you’re a “team-level organization” and that you’re “putting together teams/pods” (or any of the other words one might use for “teams,” like “guilds”) and then incent the measurement of individuals but try to hold teams accountable, you end up with a measurement train wreck. For example, there is a large bank on the east coast that has now gone back to the kind of wonderful measurement that probably harkens back to Frederick Taylor: they’re counting keystrokes. Very frankly, [LAUGHTER], what they have found (and disciplined some people for) is that some people actually wrote little pieces of code that emulated keystrokes so that they could go out and have a cigarette or go to the bathroom or get a cup of coffee. That’s a measurement train wreck.

MATTHEW HEUSSER: That’s crazy.

TOM CAGLEY: It’s not “crazy,” and there are organizations that are out there marketing (directly marketing) those kinds of services, “Let’s count keystrokes” because, very frankly, obviously, keystrokes are the natural precursor to productivity or innovation and that is nutcase stuff.

MATTHEW HEUSSER: Yeah. I mean, maybe that’s dysfunction, and that is a measurement which drives dysfunction, right? So then, the next question I would ask is: Are there different measures appropriate for different levels? Is what a director or a VP looks at going to be different than what would be helpful for a team?

TOM CAGLEY: Well, I’m sure Nausheen will have something to say about this, but I think the simple answer is, “Yes,” because each level in those organizations is making different decisions. Again, if measurements are used to create the information that lets you make decisions, which I think is probably a pretty appropriate thing to do in any organization that wants to make money, then I think you end up having different measures at different levels, and some of those might not feel right at each level.

NAUSHEEN SAYED: Absolutely. I agree. As we said when we started, the information is there, and it depends upon the person who is using it and how well it can drive or enable a decision or drive a change. Now, in dealing with customers on a daily basis, I see that there are different business cases. For example, if you are a business sponsor, you would like to see, you know, what is the ROI on your automation? How well is your team doing at building that automation and getting you that ROI? “What is my vendor utilization?” for example. “What is my resource utilization? Is the budget allocated to every project being utilized the way I want it to be?” These kinds of metrics make absolutely no sense to a technology executive, who might want to see, you know, failure by root cause, for example. He wants to see what the problem areas are in his application. What is the progress on day-to-day operational tasks, QA tasks, etcetera? Whereas, if you are a project manager, you are more interested in your day-to-day progress: “Where are we today compared to where we were yesterday?” Metrics like those. Then, if you are a team member, you just want to see, “How many passed or failed today? How many defects were raised today? How many were fixed today?” Etcetera. So, definitely, yes, different metrics for different kinds of roles.

MICHAEL LARSEN: So, have you seen that metrics… do these drive behavior or is this more of a report card kind of an approach? I guess, what I’m really asking is: We have the metrics and we’re looking for them, but how does that actually influence what happens day‑to‑day? Let’s go with that.

NAUSHEEN SAYED: So, let me give you an example of one such change that we were able to drive in one organization. For the analysis, we had data from production, and we had stats from the testing activities. When we analyzed the data and started scripting those metrics, we were able to trace all the production bugs to a specific area of the application under test, whereas testing in that particular area was really not very intensive. So, we were able to find some trends in the production data: Where is the activity in production? Where are the defects found in production? With the help of that, we were able to focus our testing extensively on only those areas which were more prone to failure, and I think that really drove a change in that organization. That’s just one example I’ve been through.
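The kind of analysis Nausheen describes, lining up where production defects land against where testing effort goes, can be sketched in a few lines of Python. This is a minimal illustration, not QualiTest’s tool: it assumes each production defect and each test case is a record tagged with a hypothetical `area` field, and the function name `rank_areas_by_production_defects` is made up for the example.

```python
from collections import Counter


def rank_areas_by_production_defects(production_defects, test_cases):
    """Rank application areas by production defect count.

    Each record is assumed to be a dict with a hypothetical "area" key.
    Returns (area, production_defects, test_cases) tuples, most defects
    first, so areas with many escapes but little testing stand out.
    """
    defects_per_area = Counter(d["area"] for d in production_defects)
    tests_per_area = Counter(t["area"] for t in test_cases)
    # Sort by defect count descending; break ties alphabetically for stable output.
    ranked = sorted(defects_per_area.items(), key=lambda item: (-item[1], item[0]))
    # Counter returns 0 for areas with no test cases at all.
    return [(area, count, tests_per_area[area]) for area, count in ranked]


# Illustrative data only.
production = [{"area": "billing"}] * 3 + [{"area": "search"}]
tests = [{"area": "login"}] * 5 + [{"area": "search"}] * 3 + [{"area": "billing"}]

for area, escaped, tested in rank_areas_by_production_defects(production, tests):
    print(f"{area}: {escaped} production defect(s), {tested} test case(s)")
```

With this made-up data, "billing" surfaces first: three production escapes against a single test case, the same shape of finding Nausheen describes.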

TOM CAGLEY: I think, similarly, the whole idea of throughput, and using throughput as a pretty standard measure, allows teams and teams of teams to predict, not with certainty or anything, but at least to get into the right ballpark, when they will be able to release with the right level of quality. That sort of measure, throughput or cycle time or the two in combination, allows you to have service-level agreements. It allows teams to step back and say, “Okay. We can commit and work to that commitment within certain levels of discretion.” As for report cards, I think the answer is, “Yes.” People do both of those. They make team-level decisions. They make release train sorts of decisions, if they’re using things like SAFe, with data, hopefully, and that gives them an understanding of where they’re going and lets them measure against that. But they also then turn around and start to use it as report cards, which makes people want to avoid the measurement, or at least tailor it so that they get a better grade. I think we all understand that there are economic consequences at the macro level, that CFO/CTO kind of level, for missing dates and things like that, but very frankly, when it gets to the person level, we hold people accountable. Very frankly, I don’t think we’ve changed organizations. They grade people, and when they grade people, they incent more behavior.
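One common way to turn historical throughput into the kind of ballpark release forecast Tom mentions is a Monte Carlo simulation: repeatedly replay randomly sampled past weeks until the backlog is empty, then read a percentile off the results. The sketch below assumes a backlog counted in items and a history of non-negative items-finished-per-week counts; the function name and defaults (10,000 trials, 85th percentile) are illustrative choices, not something prescribed in the episode.

```python
import math
import random


def forecast_weeks(backlog_items, weekly_throughput, trials=10_000, pct=85, seed=None):
    """Monte Carlo forecast of weeks needed to finish backlog_items.

    weekly_throughput is a list of non-negative item counts completed in past
    weeks. Each trial samples past weeks at random (with replacement) until
    the backlog is done. Returns the number of weeks that pct% of trials
    finished within.
    """
    if backlog_items <= 0:
        return 0
    if not any(t > 0 for t in weekly_throughput):
        raise ValueError("history needs at least one week with throughput > 0")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        remaining, weeks = backlog_items, 0
        while remaining > 0:
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    # Nearest-rank percentile over the simulated outcomes.
    return outcomes[math.ceil(pct * len(outcomes) / 100) - 1]


print(forecast_weeks(40, [3, 5, 2, 6, 4, 5], seed=7))
```

The result is a range statement ("85% of simulated runs finished within N weeks"), which fits Tom’s "not certainty, but the right ballpark" framing better than a single average would.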

MATTHEW HEUSSER: Yeah. That’s what I was going to ask about next, Tom: specifically, KPIs. So, there is a strong argument (well, there is a loud argument) that the way to run a business is by the numbers. You establish your KPIs, and then you measure and reward to your KPIs. Then, of course, you should tailor your team-level measurements, and your department-level and division-level ones, to aim towards those KPIs. My experience has been that there’s usually something missing in order for you to accomplish what looks like the report card, which is what you’ve alluded to. How do you deal with that disconnect? How do you approach senior management that says, “Metrics are great. We’re just going to measure to the KPIs and drive to them”?

TOM CAGLEY: Well, yes. From the measurement point of view, I think all of those steps are important. The problem is, it is never a straight shot. Every team has some level of disconnect. So, find those natural things that are part of the work. If I’m using Agile, I have to get stories done. I have to get bugs done. Those are part and parcel of how I work, so I can measure those. I can talk about them. I can look and say, “You know what? Eighty-five percent of the time, I get these things done in this many days or less.” I can use that Actionable Agile kind of approach to come up with a good SLA or a good KPI, but the fact is that that’s only 85% of the time. When things do happen, the context of each individual piece of work helps us understand and take action, whether at the team level or at a more macro level. That context tends to go away, or at least the ability to see it goes away, as the data percolates up. So the thing I think is important at that point is to help whoever is using this report card data understand that a knee-jerk reaction is almost always wrong. People believe in their hearts that their intuition is great, but 9 times out of 10 it’s poo-poo. To keep the business safe, you have to step back. You have to ask the right questions. If you don’t, this stuff doesn’t work. It isn’t helpful.
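Tom’s “eighty-five percent of the time, I get these things done in this many days or less” is a percentile of cycle times for completed work items. A minimal sketch, assuming cycle times are recorded in whole days per finished story or bug and using the nearest-rank percentile method (other percentile definitions can give slightly different answers); the function name is hypothetical.

```python
import math


def cycle_time_percentile(cycle_times_days, pct=85):
    """Return the smallest cycle time that at least pct% of items met.

    Uses the nearest-rank method: sort the cycle times and take the value
    at rank ceil(pct/100 * n).
    """
    if not cycle_times_days:
        raise ValueError("need at least one completed item")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(cycle_times_days)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Cycle times, in days, for ten recently finished items (illustrative data).
recent = [5, 1, 2, 13, 3, 8, 2, 4, 3, 5]
print(f"85% of items finished in {cycle_time_percentile(recent)} days or less")
```

With this data the sketch reports 8 days; the single 13-day item sits in the remaining 15%, which is exactly the kind of item whose individual context Tom says gets lost as numbers roll up.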

MATTHEW HEUSSER: Thanks, Tom.

MICHAEL LARSEN: All right. Awesome. I really appreciate the details here and getting into the nuts and bolts of this. So, one of the things that I’d like to ask about: we’re probably familiar with what are called “code smells.” You know, when you’re looking at code, you start to notice something and think, “Uh, there’s something wrong here.” You just get that gut sense. I’d like to talk a little bit about “metric smells,” if I can. There are a lot of things that just get put into a template, and people will say, “Oh, this is important. We need to look at this.” We’ve probably all been in this situation at one point or another: “This metric is important, and therefore we look towards it. We measure it, and we center all of our activities around it.” How do you know when you’ve come across a “metric smell”? These metrics seem important on the surface, but how do you start to know that a metric has gone a little bit bad?

NAUSHEEN SAYED: So, I think, again, one size does not fit all. I completely agree. Looking at a menu of metrics and picking and choosing, “Okay. This is what we need. This is what we need,” doesn’t work for everyone. That’s why I think some initial rounds of discussion around “What metrics work best for your organization?” are really, really important. You need to have the data to support those metrics as well. For example, there’s really no point in asking for metrics like defect leakage if the data isn’t there. I know it looks really pretty on paper: “Oh, we have defect leakage. We want to see defect leakage.” But when you don’t have any data that actually reflects production bugs, you won’t be able to create or even show metrics on defect leakage. So, again, I think having those initial discussions about “What metrics will work best, depending upon the way you work?” would really make a lot of difference.

TOM CAGLEY: I have one follow-on. A very simple method of determining whether a metric or a measure (they’re a little bit different) has gone bad is to find out the last time anybody did something based on it, made a decision based on it. If no one can remember, the answer is, “Stop collecting the data.”

NAUSHEEN SAYED: Makes sense, yeah.

MATTHEW HEUSSER: Oh, I love that. That’s one of my favorites within metrics, and I’ve actually done it, where we just stopped filling in the report card and no one noticed.

TOM CAGLEY: [LAUGHTER].

MATTHEW HEUSSER: For six months.

TOM CAGLEY: I think that’s a sign that something in that measurement system had gone awry.

MATTHEW HEUSSER: Or at least it’s not providing value. Maybe the answer is you need to crack some skulls and say, “Because this isn’t filled in, we didn’t know about this or that. At this moment, it’s not providing value.”

TOM CAGLEY: My experience, and the experience of others, would suggest that most measures have their biggest impact, in terms of finding something new and making decisions, in the first few months after you start to collect them. After that, they start to age. You need to start looking at different things, because you’ve found, to use an MBA term, the low-hanging fruit, and then the rest of it is hard. People don’t really like to make hard changes until they really have to.

MATTHEW HEUSSER: Yeah. I was talking to Ardita Karaj a while back, and she said, “Throw them away. Get a new one once they’ve stabilized,” and I think you’ve said it very well there. There is a diminishing-returns aspect. I do have one final question, and then we can do the Final Thoughts, and that’s application health. Frequently, people talk about measuring the health of the application, and I think they usually mean the customer experience in production, things like uptime. I kind of want to throw it out there, and I will start with Nausheen. Have you seen companies that have really effective performance monitoring or health measures, and what kind of things were they measuring?

NAUSHEEN SAYED: Yeah. I’ve seen companies measuring all sorts of stuff when it comes to the health of their application or their SDLC. A few examples: as I already said, “defect leakage,” the percentage of defects leaking into production. Whether it’s above or below a certain level shows whether things are healthy or unhealthy: red, amber, or green, that kind of status. Another one that we like to see is test coverage, and release readiness.
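One common way to compute defect leakage is the share of all known defects for a release that were found in production rather than before release. The sketch below uses that definition and maps the result to a red/amber/green status; the 5% and 10% thresholds are placeholders for illustration, since each organization sets its own.

```python
def defect_leakage_pct(pre_release_defects, production_defects):
    """Percentage of a release's defects that escaped to production."""
    total = pre_release_defects + production_defects
    if total == 0:
        return 0.0
    return 100.0 * production_defects / total


def rag_status(leakage_pct, green_below=5.0, amber_below=10.0):
    """Map a leakage percentage to red/amber/green (thresholds are assumptions)."""
    if leakage_pct < green_below:
        return "green"
    if leakage_pct < amber_below:
        return "amber"
    return "red"


for found_in_test, found_in_prod in [(98, 2), (95, 5), (80, 20)]:
    pct = defect_leakage_pct(found_in_test, found_in_prod)
    print(f"{pct:.1f}% leakage -> {rag_status(pct)}")
```

Note that the calculation takes production defect counts as a required input; without that data, as Nausheen points out earlier, the metric cannot be produced at all.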

MATTHEW HEUSSER: What does “release readiness” mean?

NAUSHEEN SAYED: So, release readiness is basically when you have 90-plus percent test coverage, you have resolved all your high-priority defects, and you know all your DevOps as well as all your testing tasks are green. You know, they are all done. That’s when you know you are ready to release or go live.
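Nausheen’s release-readiness definition is a set of gates that must all pass. A minimal sketch, assuming coverage is available as a percentage, high-priority defects as an open count, and task status as a mapping from (hypothetical) task names to "green"/"amber"/"red"; reporting every unmet gate, rather than just a yes/no, is a design choice made here for illustration.

```python
def release_readiness(coverage_pct, open_high_priority_defects, task_statuses,
                      min_coverage=90.0):
    """Return (ready, reasons); reasons lists every unmet criterion."""
    reasons = []
    if coverage_pct < min_coverage:
        reasons.append(f"test coverage {coverage_pct:.1f}% is below {min_coverage:.0f}%")
    if open_high_priority_defects > 0:
        reasons.append(f"{open_high_priority_defects} high-priority defect(s) still open")
    not_green = sorted(name for name, status in task_statuses.items() if status != "green")
    if not_green:
        reasons.append("tasks not green: " + ", ".join(not_green))
    # Ready only when no gate failed.
    return (not reasons, reasons)


ready, reasons = release_readiness(
    coverage_pct=87.5,
    open_high_priority_defects=1,
    task_statuses={"deploy pipeline": "green", "regression suite": "amber"},
)
print("ready to release" if ready else "not ready: " + "; ".join(reasons))
```

Listing the reasons keeps the context visible, so a "not ready" result points straight at what still needs attention.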

MATTHEW HEUSSER: So, would it be the percentage over time? Over what time period? Or is it a percentage of builds?

NAUSHEEN SAYED: No. It’s percentage over time, definitely. They want to see how they have been doing over time. What is the trend of their RAG status over time? Were they doing better previously, or do they need to change anything in order to improve those trends, etcetera?

MATTHEW HEUSSER: Yeah. That’s what I’m interested in when I say “application health.” It’s got to be normalized for the number of hits that you’ve got, but you can look at your number of 500 errors and how long it took you to serve pages and that sort of thing for a given period of time/sprint maybe and then compare it to see whether you’re doing better or worse over time. That’s more interesting to me than some arbitrary measure of, “We need to get 95% of something.” “Where did the 95% come from?” “It’s an industry standard.” “Well, I do this for a living, and I’ve never heard of that number before.” Tom, any thoughts on application health?
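Matt’s point about normalizing by traffic can be shown with a small calculation: divide server errors (HTTP 500s) by requests served in each period, then compare periods. The sketch assumes per-sprint totals of 500 responses and requests are already available from logs or monitoring; the per-10,000-requests scale and the function names are arbitrary choices for readability.

```python
def errors_per_10k(server_errors, total_requests):
    """Server errors per 10,000 requests for one period."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 10_000 * server_errors / total_requests


def error_rate_trend(periods):
    """periods: (label, server_errors, total_requests) tuples in time order.

    Returns (label, rate, change_from_previous) tuples; the first period's
    change is None because there is nothing to compare it to.
    """
    trend, previous = [], None
    for label, errors, requests in periods:
        rate = errors_per_10k(errors, requests)
        change = None if previous is None else rate - previous
        trend.append((label, rate, change))
        previous = rate
    return trend


sprints = [("sprint 1", 120, 400_000), ("sprint 2", 150, 600_000)]
for label, rate, change in error_rate_trend(sprints):
    print(label, rate, change)
```

In this made-up example, raw 500s went up from 120 to 150, but the normalized rate fell from 3.0 to 2.5 per 10,000 requests, so the application actually did better; comparing raw counts would have suggested the opposite.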

TOM CAGLEY: Again, I think Nausheen has described that in pretty good detail. I think we then have to twist it back and say, “Application health removes perspective.” If I’m a customer, I might have a slightly different perspective. I think what we’ve described is very operational, an inside-the-organization view of application health. So, the question then becomes, “What is important to the customer?” Getting fixes faster, or perhaps uptime, or perhaps the ability to process transactions. You know, I work in an organization that in essence takes and connects people’s data: organizations send in mass loads of paper, and we translate that into a form that lets you move from one end of a piece of data all the way through to the other. What’s important to them is, very frankly, that it does it and it does it correctly all the time, or at least as close to all the time as humanly possible, and when it doesn’t, that it’s fixable fast. I think it really depends on who you ask what application health really is.

MICHAEL LARSEN: How do we address (hope that this is something that we can, maybe, do a little short riff on) user experience? How do you actually gauge? What are some good ways? How do you metricize the really… un-metric-able?

TOM CAGLEY: I don’t necessarily think that it is un-metric-able. Very frankly, I think that’s why you have significant test labs that watch where people’s eyes go and what their reactions are. I spent a period of time, oh, late last year out on the west coast at Blizzard. Very frankly, they have huge labs where they bring people in, not only internal but external people, people who’ve never played their games before, and they put them in front of multiple monitors and let them play. They monitor every keystroke and every eye glance so that they can measure it and then track that back to satisfaction. I think it is metric-able. It’s just really hard. Very frankly, at least at the UX level, it means a lot of different things than what we classically would do in terms of test coverage or productivity or defects and things like that.

NAUSHEEN SAYED: I completely agree with what Tom said. There are testing labs and, you know, test activities that do the same. Again, the domain I am working in is basically reporting on what we have. It’s not a monitoring application.

MICHAEL LARSEN: So then, with that, let’s go ahead and let’s get to Closing Thoughts, some ways that we can summarize this, and of course we like to end our podcast with Shameless Self-Promotion, what y’all are working on, and how people can get in touch with you. So, let’s go ahead. Tom, just some thoughts to close up, and again how people can learn more about you, get to know what you are doing. You’ve got a podcast, promote it. [LAUGHTER].

TOM CAGLEY: Absolutely. I have no problem. Very frankly, you know, measurement isn’t something that we should push away. It’s something that we can use to make decisions. Very frankly, if you end up in a scenario where it does smell wrong or it’s being used in the wrong way, I think you have to get outside help to actually deal with that. From the point of view of self-promotion, I do have a podcast, the Software Process and Measurement Cast, SPaMCAST. I’ve been doing it for 12 years. It comes out every week on Sunday at approximately 5:00 p.m., and I have a blog at https://tcagley.wordpress.com. I look forward to hearing from anyone, and there are tons of ways to reach out to me. [email protected] is the easiest, and hey, let’s talk.

MICHAEL LARSEN: Awesome. Nausheen, any final thoughts, and how people can get ahold of you or get in touch with you?

NAUSHEEN SAYED: Absolutely. So, as final thoughts, I definitely want to say, never just blindly go for metrics and a reporting dashboard without having put sufficient thought into it. That’s very important. There are various tools on the market that are doing this, but it’s really important to have those one-on-one consultations with the customers, trying to understand each and every one’s requirements and, as you said, the definition of health. So, I think it’s really important to have those initial discussions before we go into finalizing any metrics or KPIs, and that’s exactly what we take pride in. We offer those consultations to all our customers, the data analytics, and the one-on-one consultations around metrics. If you want to know more, there’s definitely a lot of information on the QualiTest Group website where you can know more about the tool and the kind of services we offer around it.

MATTHEW HEUSSER: Okay. I think folks probably know the most about me. I am going to be in Boston for the SmartBear User Conference. I will be doing Lean Coffee at the end of October. So, if you get this and you’re in the Boston area, you know, you don’t have to pay for travel or a hotel or anything, drop on by. I’d love to see you. Anything, Michael?

MICHAEL LARSEN: By the time that this show goes live, I should be finished with the two talks that I am delivering at PNSQC. So, this will be kind of in the past, but they have been consuming a lot of my time lately. So, I’m going to mention them one more time. I am speaking at the Pacific NW Software Quality Conference on Accessibility and Inclusive Design and I am giving a workshop along with my friend, Bill Opsal, on Building A Testing Framework from Scratch – A Choose Your Own Adventure Game. So, expect that after I’m back from that conference, you’ll be hearing a lot more about both of those initiatives. I’ll be able to talk more about them, but right now they are consuming the vast majority, [LAUGHTER], of my time.

MATTHEW HEUSSER: Okay. Thanks, man. With that, I’m going to call it a wrap. So, thanks for being on the show. Nausheen, great to meet you.

NAUSHEEN SAYED: Thank you.

MATTHEW HEUSSER: Thanks, Tom.

TOM CAGLEY: As always, thank you.

MATTHEW HEUSSER: Pleasure. Michael, I’m sure we’ll be talking again very, very soon.

MICHAEL LARSEN: Absolutely. Thanks for having me.

MATTHEW HEUSSER: Thanks for listening, folks.

[Begin Outro]

MICHAEL LARSEN: That concludes this episode of The Testing Show. We also want to encourage you, our listeners, to give us a rating and a review on Apple Podcasts. Those ratings and reviews help raise the visibility of the show and let more people find us.

Also, we want to invite you to come join us on The Testing Show Slack Channel as a way to communicate about the show, talk to us about what you like and what you’d like to hear, and also to help us shape future shows. Please email us at TheTestingShow(at)QualitestGroup(dot)com and we will send you an invite to join the group.

The Testing Show is produced and edited by Michael Larsen, moderated by Matt Heusser, with frequent contributions from Perze Ababa, Jessica Ingrassellino, and Justin Rohrman, as well as our many featured guests who bring the topics and expertise to make the show happen.

Additionally, if you have questions you’d like to see addressed on The Testing Show, or if you would like to BE a guest on the podcast, please email us at TheTestingShow(at)qualitestgroup(dot)com.

Thanks for listening and we will see you again in November 2018.

[End Outro]

[END OF TRANSCRIPT]
