The Testing Show: Performance Assurance

Transcript

We’ve heard of Performance Testing but what is Performance Assurance?

In a One on One conversation, Matt Heusser talks with Leandro Melendez about the difference between Performance Testing and Performance Assurance and how the changing landscape of systems and infrastructure makes Performance Assurance an area worth paying attention to.

Panelists:

References:

Señor Performo (Leandro Melendez): LinkedIn, Facebook, Blog, and YouTube
PerfBytes Podcast and PerfBytes Español
Statement of Work
JMeter
NeoLoad
Load Ninja
Gatling
Locust
Application Performance Management
Performance Advisory Council, Neotys
SmartBear Connect 2020
STPCon Spring 2020 (San Diego)

Transcript:

Matthew Heusser (00:00):

Hello everyone. Welcome back to The Testing Show podcast. I’m your host, Matt Heusser, and this month we’re going to do something a little bit different. It’s a little bit more personal, a little bit more intimate. It’s a one on one conversation with Leandro Melendez who is the performance test lead for Qualitest and, Leandro, that means you both manage performance testing projects, but you’re also available and frequently deployed as a consultant. Did I get that right?

Leandro Melendez (00:31):

Hey Matt. Yes, indeed. I am on both sides: helping management, helping people, creating SOWs and doing some projections, but at the same time I’m in the field having fun with customers and helping them.

Matthew Heusser (00:47):

Great. And for the tech folks, a SOW is the Statement of Work, which defines what the consultant’s going to do before they do it. So at the end of the assignment we can say, did you do what you promised you were going to do? Now you’re also the host. I didn’t know about this. I knew about the PerfBytes podcast. You’re calling in from Mexico City now, but you’re also the host of the Español, Spanish edition of that podcast. How long have you been doing that and when does that come out?

Leandro Melendez (01:16):

It started more or less a year ago. It was in preparation for a year and a half, but getting some good equipment, learning some of the sound basics and everything took me about six months, creating an intro and all the work around it, so it has been a year already. I stopped Season One last December and I’m about to start Season Two anytime soon. I need to get working on the new intro. I wanted to change it a little bit, and anytime now we will have a Season Two for PerfBytes Español, hopefully as soon as possible.

Matthew Heusser (01:52):

Yeah, that’s great. The Testing Show, we don’t really have seasons, we just record it every single month, but that idea of a break every now and again. That seems interesting.

Leandro Melendez (02:01):

It sounded to me a little bit hip to make it season-wise. I noticed some other podcasts were doing it. It gives the audience a break, gives me a little bit of a break and enough time to come up with more topics, more things to say and to distribute to the audience in Español, in Spanish, because that’s what I want to focus on most: trying to help people that are trying to get into not only performance testing but performance testing topics where the language barrier could be a little bit of a hold-up and, well, translating some of the content and distributing it.

Matthew Heusser (02:38):

That’s something for us to think about, and I think you’ve explained it well. For the audience, though: although you are calling in from Mexico City today, you’ve recently been in El Segundo, California for Electronic Arts. You’ve done work in Chicago and San Diego, you were in Houston for some time working with Exxon Mobil, and in Saskatchewan, Canada before that. So you typically travel to client sites and do the work at the client sites. You’ve been around a lot of places, so my first question is going to be: what’s new and interesting in performance testing? You know, 15 years ago you would get LoadRunner or something and you would run it on your desktop and you’d be inside the data center and you’d say, yeah, that was good enough for anything but the heaviest of Amazon.com-type websites. That’s not how things are done today. How are things changing?

Leandro Melendez (03:32):

Oh well, that’s a pretty wide question because, as you mentioned, from the days when we were playing with WinRunner and LoadRunner there has been a whole evolution. Back then you were focusing mostly on trying to generate load through some of the tools and the monitors that you would get. Some of these tools were, to say it in a way, crafted to work integrated in the waterfall product delivery methodologies. Nowadays, agile and monitoring technologies are shaking the load testing and performance testing world a lot, and on top of that it is challenging for organizations to understand the difference between what is performance testing or performance assurance and what is just running load tests, which used to be the general approach initially. The changes have been many. The coming of microservices, distributed systems where not everything is just a single conversation between one server and another but multiple applications, CDNs, distributed networks, frameworks or solutions and, well, the cloud, not to forget the cloud: all of these pieces that have been moving in the last 10 or 15 years have made the practice considerably different from what it used to be in the LoadRunner-only load test days. You don’t have enough time to automate a certain amount of business processes, which would take several weeks, when you have just sprints to go quickly every two weeks. New releases, new features, new things are coming up frequently, so you gotta be prepared and you don’t care that much about load tests. They are still crucial, super important, but, as I said, performance testing and performance assurance are very different from load testing: you first try to think about ensuring that the performance of the solution is in good shape. That is something that I think should have happened in the waterfall days, but given the nature of the way that you were delivering the projects, you were not forced into knowing the performance.

Leandro Melendez (05:53):

I will give you a quick example. In the initial days of my career as a performance tester, a load tester, I would receive some business processes to be automated, and in the time that I was working through them, translating them into automations, I would notice these processes were seriously slow. Functional testing, unit testing, user acceptance testing: all of those had gone through already. Why has no one noticed that this thing is taking 10, 20, 30 seconds? Why am I even being asked to automate it? Some of those things have to be caught way earlier. Nowadays, in the agile methodologies, you just have two weeks, so you have to notice those things as soon as possible. Initially those were the bottlenecks that we were finding out while doing load tests, so the general process has been changing a lot as well. I will say the tools too: you mentioned LoadRunner; after it was acquired by Micro Focus, it has been going through some refactoring, some work being done on it, and some other tools have been improving. JMeter, which has also been around for a long time, is nowadays more robust, more stable. It gives you a better experience trying to pull those tests and is better oriented to do some of these microservice tests, together with NeoLoad, Load Ninja, Gatling, Locust. So on both hands, the free tools and the paid tools with support, we have a big set of choices that have been coming out. It’s pretty wide.

Matthew Heusser (07:36):

There’s a lot to unpack there. I certainly understand for this conversation I’m going to kind of assume a role, I guess you could call it devil’s advocate where I’m going to take a counter position and I’m going to ask you questions. Now I might not really even fully believe the counter position, but I’m asking it.

Leandro Melendez (07:55):

Trying to put me in a corner.

Matthew Heusser (07:56):

Yeah, yeah.

Leandro Melendez (07:57):

I like it.

Matthew Heusser (07:58):

It’s kind of fun. I heard a couple of things; specifically, you mentioned at the beginning content delivery networks and the cloud. The one thing I will say that’s undeniably clear, that I can’t argue against, is that there’s an explosion of tools, and the tools go a lot deeper and are a lot more customizable, both the open source and the commercial tools. But let’s start with the CDN and the cloud. My contrary position would be: okay, so the website is in the cloud. So what? It’s still just a website. I still just need to performance test it. I generate some load, I see how it handles under load. Nothing has changed. It’s all the same stuff; it’s just hosted in the cloud. Am I wrong?

Leandro Melendez (08:40):

Yes and no. That’s a good corner that you got me in. So the cloud gives you this possibility to leverage more resources with the elasticity that it gives you. It gives you a little bit the impression that the code that you just uploaded there is fine: you ran your performance tests, your load test, and it seemed to be working fine. You don’t have a reference point; you just see it working well, and the results are more or less what matter on a short-term basis. Why would I even keep performance testing if the cloud is elastic and it allows me to get good response times? I usually explain this impact with examples. I love to use examples to bring complex topics down. In the old days, where you had bare metal on-prem solutions and all your servers were at your home, it was a little bit like having a power generator in your house. It had a limited capacity. It had just so much gasoline, or it was even a bicycle that you were pedaling. That was the limit of power that you would get from it, and you would be careful to have a good TV that didn’t consume that much. You would have a good fridge that was more or less tuned or eco-friendly. Nowadays, with the cloud, it’s like we’ve got the power company giving us as much power as we need, and you can just plug in as many devices as you want. You can plug in your big TV, an industrial-size fridge, and you don’t care if it’s regulating electricity well or consuming too much of it, and you say it’s fine. I’m not seeing any problems. The performance is good. My fridge, my TV, all my equipment seems to be lighting up. My house is covered with Christmas lights, so I’m happy with it.

Leandro Melendez (10:31):

The side effects come a little bit after, when you receive your electricity bill. That happens as well with the cloud. If you’re not careful with what you’re putting over there, you might not see the performance degradation happening, but you will see it on the bill. You will see it in the amount of containers, servers or instances that you are having to consume to assure a similar amount of performance. The same happens with the bandwidth. If you don’t have CDNs well placed and well distributed, all the content gets sent from your central instance. Even if it’s in the cloud, you are generating unnecessary flows and consumption that you could avoid, and of course avoiding them would be cheaper: a better electricity bill, to follow the example. The impact might not be immediate. You could say, okay, my performance is good, but probably it is not, and you are only draining more power from your providers.

Leandro Melendez (11:28):

Most probably you will face the consequences after the bill comes. In the field I have had that experience: companies thinking that because they are in the cloud, they could just upload whatever as long as it works. But when the cloud provider’s bill came sometime afterwards, they noticed that what they were charging their customers was not more than the bill for some of those services. So some of these companies got into financial trouble. Often you don’t have this vision into the future of what is going to happen and you just drink, let’s say, all the Kool-Aid of “the cloud is almighty and all good”. You gotta be careful with that.

Matthew Heusser (12:07):

So you’re saying auto-scale might work, but it might consume so much resources that your costs actually go up. So if you’re a software as a service vendor charging 10 bucks a month per user, you might lose money on your utility fees.
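
The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration: the 10 dollars a month per user comes from the conversation, while the user count, the users served per instance and the instance price are made-up numbers, and the cost model (whole instances, nothing else on the bill) is a deliberate simplification.

```python
import math

def monthly_margin(users, price_per_user, users_per_instance, instance_cost):
    """Return (revenue, cloud_bill, margin) for one month.

    Simplified, assumed model: auto-scaling adds whole instances and each
    instance serves a fixed number of users. A real cloud bill has many
    more line items (bandwidth, storage, managed services).
    """
    instances = math.ceil(users / users_per_instance)
    revenue = users * price_per_user
    cloud_bill = instances * instance_cost
    return revenue, cloud_bill, revenue - cloud_bill

# Hypothetical figures: 1,000 users paying $10 a month, $150 per instance.
# Efficient code serves 50 users per instance; wasteful code serves only 10.
efficient = monthly_margin(1000, 10, users_per_instance=50, instance_cost=150)
wasteful = monthly_margin(1000, 10, users_per_instance=10, instance_cost=150)

print("efficient:", efficient)
print("wasteful:", wasteful)
```

With these assumed numbers, response times could look identical in both cases because auto-scale absorbs the difference; only the margin shows that the second version is losing money.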

Leandro Melendez (12:23):

Yeah, absolutely. You need to be careful as well about what you are putting up there in the cloud. There are some best practices that are also not being done, like the release steps or phases. Your QA environment could be on-prem, could be physical, so you get a better understanding of whether you’re uploading a good performing piece of code before you send it to the cloud. And the cloud providers are nowadays starting to open up a little bit so that you get monitors, more visibility into what is going on and how much you are consuming. Something also very big coming up is the APMs. That used to be application performance monitors. Now they are application performance management, where they can do a myriad of things that they couldn’t 5 or 10 years ago, and they give you way more visibility into your client environment, to the point that some of those will give you money-related dashboards and graphs to tell you not only that your CPU is doing this bad or this good, your RAM, your network, but how many, let’s say, dollars you are consuming in these processes. How efficient are you money-wise?

Matthew Heusser (13:34):

Sure. That makes a lot of sense to me. At the beginning you touched on this idea of performance, and when I hear performance, I think mostly: does it ever work at all for one user, does it ever perform? And load is: how does it scale when we get to multiple users? You see it takes 30 seconds just to test it by hand. Can you talk a little bit about performance assurance? That term is new to me as a discipline. I mean, I have my guesses about what that would mean, but let’s let you use your words.

Leandro Melendez (14:05):

Well, the performance assurance term, I use it often on my own. I started to use it a lot because it gives awareness to people that the main goal of performance testing is to assure that the user has a good experience, that they don’t sit there waiting for more than two or three seconds for some page to reply and, of course, that they don’t have downtime, that they don’t lose the possibility to conduct business. On top of all of that, we don’t want to lose customers: it’s very well known that when they have a poor performance experience, there’s a high probability that they will never ever come back to our system. And if it’s a corporate internal system, our users might not be encouraged to use it. “I don’t want to use the billing system. I don’t want to create invoices, because our system is so bad that I’m discouraged as an employee.”

Leandro Melendez (15:01):

So some of those situations are core to performance assurance. I started to call it assurance because it’s what we’re trying to do with performance testing, not only load tests. As you mentioned from the example that I gave at the beginning, it used to be so common that a single user doing a single click would take so long that that was the first threshold, to call it that in a way. You could have multiple gates to assure the performance, and at each level the amount of effort that you have to make to guarantee that it has good performance starts to grow. So if, at the developer’s code, you put some gateways, some measurements, some mechanisms of an APM as I mentioned, or internal code that can measure your developer’s code performance, that’s the quickest and the easiest place where you can detect it. As it goes further down the pipeline and reaches production and multiple users can use it, and you have a poorly performing piece of code that reaches that far, the effort grows, so it is better to start capturing it earlier. As I mentioned, not only with load tests at the very end, just before you’re about to release to production: problems should be caught as early as possible, as cheaply as possible and as quickly as possible.
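
One way to picture “internal code that can measure your developer’s code performance” is a small timing gate wrapped around a function. The sketch below is an assumption for illustration, not something described on the show: the decorator name `performance_gate`, the exception class and the two example functions are all hypothetical.

```python
import time
from functools import wraps

class PerformanceBudgetExceeded(AssertionError):
    """Raised when a gated function runs longer than its time budget."""

def performance_gate(max_seconds):
    """Decorator: fail loudly if the wrapped call exceeds max_seconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if elapsed > max_seconds:
                raise PerformanceBudgetExceeded(
                    f"{func.__name__} took {elapsed:.3f}s, "
                    f"budget is {max_seconds:.3f}s"
                )
            return result
        return wrapper
    return decorator

@performance_gate(max_seconds=0.5)
def build_invoice(items):
    # Hypothetical unit of developer code that fits its budget.
    return sum(items)

@performance_gate(max_seconds=0.01)
def slow_report():
    # Hypothetical slow operation: sleeps longer than its budget allows.
    time.sleep(0.05)
    return "done"
```

Calling `build_invoice([1, 2, 3])` returns normally, while `slow_report()` raises `PerformanceBudgetExceeded`, so a unit test run would flag the slow code before it ever reaches a load test.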

Leandro Melendez (16:19):

Yeah. Performance assurance, as I see it and as I try to communicate it, is when you start to go through those gateways of quality in terms of performance. You can see right away, for this new piece of code, how many seconds does it take, which is the core of performance testing. But at the same time, how is the RAM consumption? Is it running too many, let’s say, database queries? Is it optimized? Is it doing too many round trips? Is it well designed? Because even before code, you can have a system where the conception of the way that it works might not be the best performance-wise, and you wouldn’t even know it, even if you have good performance gateways. I’m going to give you a quick example that we had. There was this system that had multiple items that would be displayed, and you had to select them. Each one of these items would be a checkbox, and each checkbox click would trigger an action. So it’s super quick. It takes only two seconds to come back, but the user would have to click on a hundred checkboxes. So with each one of them taking two seconds, you would be 200 seconds just sitting there clicking on checkboxes, waiting for the system. Already in the design step, it was flawed. Another part of performance assurance is to have a performance engineer or even a test person in the requirement meetings saying, Hey, wouldn’t the design for this be better if we did it this way? Instead of just waiting two seconds on every click, select them all and then submit them, instead of one at a time. So it’s a huge universe where you could be improving, tuning, doing the performance optimization and assuring that it will be good for the user once it reaches production.
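
The checkbox example comes down to simple arithmetic, worked through below as a rough sketch. The two-second round trip and the hundred checkboxes are from the example; the assumption that a single batched submit costs about one round trip is a simplification added here.

```python
def wait_one_click_at_a_time(items, seconds_per_round_trip):
    """Total waiting time when every checkbox click triggers a round trip."""
    return items * seconds_per_round_trip

def wait_batched_submit(items, seconds_per_round_trip):
    """Total waiting time when all selections are submitted in one request.

    Assumes the single batched request costs about one round trip, which
    ignores any extra server time needed to process more items at once.
    """
    return seconds_per_round_trip if items > 0 else 0

per_click = wait_one_click_at_a_time(100, 2)
batched = wait_batched_submit(100, 2)

print(f"one click at a time: {per_click} seconds of waiting")
print(f"batched submit:      {batched} seconds of waiting")
```

At two seconds per click, a hundred checkboxes add up to 200 seconds of waiting, more than three minutes, even though every individual request looks fast. That is why a per-request response-time gate alone would not catch this kind of design flaw.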

Matthew Heusser (18:11):

Yeah, but I see it as two separate things. Someone told me once, years ago, most performance problems are visible when a single human being looks at it. Then you hit performance problems when you get to four users, which are usually like singleton problems, where things are passed around in memory and there’s only one copy of them, and then you have performance problems when you get to 50 or 60, where you actually hit a bottleneck like CPU or disk or memory. Separating those and saying that before it leaves the sprint it needs to at least pass the lower levels of that, that makes sense.

Leandro Melendez (18:45):

If I can add a little bit on that, that’s a good one as well to train your developers to be doing. I like to call them mini load tests, where you have, as you say, isolated some of the items in your solution, down to just a single item, and you automate a tiny load test on it. You know how it will respond to a load of four, five, 10 users, and you have another slightly smaller gateway passed. You know how that little piece did. And again, you train your developer to create some little bit of code that can load it, whether you use Postman, Java or even JMeter or LoadRunner. It’s super simple. Train and educate the developer: Hey, as soon as you finish your code, could you create a little call that, with a thread, with JMeter, with any of multiple possible solutions, stresses it a little bit before you check it in, so that we know that it can handle five, 10 users clicking it at the same time? And again, there you are also filtering some functional issues, because sometimes our developers say, Hey, I didn’t think about what would happen if 10 people clicked on the same item at the same time. All of these are not only performance assurance; you also have quality assurance, quality gateways, as the application keeps going on into production. And it doesn’t stop. Once you get into production, DevOps and ops land, you have a huge universe of things that you can also work through in terms of performance, from ops back to dev.
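
A mini load test of the kind described here can be very small. The following is a minimal sketch using only Python’s standard library rather than JMeter or Postman; the helper name `mini_load_test` and the `fake_endpoint` stand-in are hypothetical, and in practice the target would be a call into the piece of code or service the developer just finished.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def mini_load_test(target, users, calls_per_user=1):
    """Call target() from `users` concurrent threads and return all timings."""
    def one_user(user_index):
        timings = []
        for _ in range(calls_per_user):
            start = time.perf_counter()
            target()
            timings.append(time.perf_counter() - start)
        return timings

    # One worker thread per simulated user, all running at the same time.
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [t for user_timings in per_user for t in user_timings]

def fake_endpoint():
    # Hypothetical stand-in for the code under test; replace with a real call.
    time.sleep(0.01)

timings = mini_load_test(fake_endpoint, users=5, calls_per_user=3)
print(f"{len(timings)} calls, slowest {max(timings):.3f}s")
```

Because the calls really do run at the same time, a test like this can also surface the functional problems mentioned above, such as shared state that breaks when several users hit the same item at once.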

Matthew Heusser (20:26):

Yeah, sure. We’ve got monitoring and, I don’t know about you, but I find that the number of scenarios that I can do when I do load testing is limited, but the number of scenarios that customers will do in production is not. If we put monitoring in place, we can say, Hey, did you know that this thing, which isn’t in your load test, times out or takes a very, very long time? Go through the logs, grep and sort, and find the slowest operations. Then you can automate those and hand them to the devs and say, do you care that it takes 45 seconds for this to respond? Because if you don’t, that’s fine. We’re just monitoring it. But if you want to fix it, there’s your story. You’ve already got a test for it. Do you do that kind of work?
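
The “grep and sort” step can be sketched as a short script. Everything about the log format here is an assumption made for illustration (timestamp, HTTP method, path, response time in milliseconds, separated by spaces), and the sample lines are invented; a real access log would need its own parsing.

```python
from collections import defaultdict

# Invented sample in an assumed format: timestamp method path millis
SAMPLE_LOG = """\
2020-02-01T10:00:01 GET /invoices 120
2020-02-01T10:00:02 POST /annual-declaration 45000
2020-02-01T10:00:03 GET /invoices 180
2020-02-01T10:00:04 GET /search 2300
2020-02-01T10:00:05 GET /search 2700
"""

def slowest_operations(log_text, top=3):
    """Return (operation, worst_ms, calls) tuples, slowest first."""
    worst = defaultdict(int)
    calls = defaultdict(int)
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        _, method, path, millis = parts
        operation = f"{method} {path}"
        worst[operation] = max(worst[operation], int(millis))
        calls[operation] += 1
    ranked = sorted(worst, key=worst.get, reverse=True)
    return [(op, worst[op], calls[op]) for op in ranked[:top]]

for op, worst_ms, count in slowest_operations(SAMPLE_LOG):
    print(f"{op}: worst {worst_ms} ms over {count} calls")
```

The call count is reported next to the worst time because a 45-second operation that runs once matters differently from a slow one that runs constantly; the list is a starting point for the conversation with the developers, not a verdict.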

Leandro Melendez (21:07):

Yeah. Yeah. It’s something that you can detect right away without waiting for a load test to happen or for someone to automate the process. The users of your system are infinitely creative in the ways that they can use it, even in some ways that you could not have devised in your tests. And as you say, we notice there’s a process that takes 40-something seconds, three minutes, five minutes. The annual declaration process? Oh, okay. That happens only once a year, only one person triggers it and it’s for the whole company, so it’s perfectly fine. Or, Hey, this happens every five seconds and we really need to fix it. APM, application monitoring, is one of the first steps that every organization that has IT systems, which is most of them nowadays, should have. What if you don’t have an APM or something in place that tells you how you are doing and how much of it you are doing?

Leandro Melendez (22:08):

It’s pretty much like driving a car with a blindfold over your eyes. It’s a little bit like the Bird Box movie, where you don’t even know where you are going, where you have been, how much you have been doing. You’re driving a car without a speedometer, a fuel gauge, a temperature gauge. You don’t even know how your application is doing. And many might think “this is just for production, for when I’m already driving the car”, which it is not. You should have all those monitoring tools way before: in QA, pre-prod, even the dev environments, so that you can easily, quickly and (mostly) cheaply detect those issues. I haven’t been doing as many load tests as we used to do in the waterfall days, because many of these bottlenecks, issues and bad performers are being detected way before. And I like the way that you said it: just a human being thinking, seeing it.

Leandro Melendez (23:03):

You can tell it has poor performance. Many customers can say, I know, I have seen it, I’m aware, and I need you to create an automation for me to have a metric or an official measurement or report that tells me that. Well, with an APM, they give you all that information, and they can even ping your IT guy or your developers with an SMS. They can be a little bit intrusive, but they give you these alerts, this information, at the right moment, with enough tracing information that you can fix it, work it out, stop it, send it back, roll it back to the previous version. There’s a myriad of options that you get through these APMs. So yeah, I totally agree with you that these should be caught as early as possible. And frankly, it scandalizes me every time that I walk into an organization that has zero to very poor visibility into all of these things, which at the same time are treasure chests of information that you can get on your application.

Leandro Melendez (24:11):

How much are you spending on some areas of your solution? Which ones should you focus on most? Which ones should you stop paying so much attention to? Which ones give you the most revenue? Which ones should we be expanding? This is a little bit into the data intelligence area, but monitoring is one of the first steps. Many would say, Hey, all this monitoring, isn’t it heavy? Isn’t it an overload on our system? And I would say all the monitoring tools that you have on an airplane, a commercial passenger airplane, are heavy and consume a lot of extra gasoline, but I dare you to get onto a plane that doesn’t have them. Make sure that the plane that you’re flying on has all the monitors possible, to be safe or feel safe on it.

Matthew Heusser (25:01):

I definitely see how you’re speaking through metaphors, and I like metaphors. I think they can be very effective at communicating with people that aren’t in the day-to-day work. I hope you’re hearing this and you’re dealing with this at work. Listen to the podcast again and listen for the metaphors, because Leandro is really dropping some good terms to use to help make the work more familiar to people who are outside of it. So let’s talk a little bit before we go about you and what you’ve been up to. I know you’ve been doing a lot of writing and conference speaking. Where can people go to learn more about you and what you’re up to?

Leandro Melendez (25:41):

First, last year was a busy year for me in terms of public speaking and conferences. As I said, I started the podcast. I have been writing my blog, and I go around a little bit with the AKA of Señor Performo on the internet. I started with a blog where I try to explain through these metaphors and through these silly examples, which I think are usually the best way to communicate these complex topics to people who might not be so involved or technically deep diving into all these little pieces, so they need a better explanation. When someone starts speaking to you in gibberish and our lingo, it’s easy to lose people, so I like to have them engaged, to understand, to know what we’re talking about and why some of those decisions should be made, because they are business decisions. So that’s also what I’ve been doing through the public speaking, through the conferences.

Leandro Melendez (26:43):

Last year I was at the PAC from Neotys, some conferences here in Mexico, hands on testing. I went to STPCon in Boston. I went to Accelerate from Tricentis in Vienna, and at all of those I have been trying to share some of this knowledge translated into digestible bits of information. For the next year, or this year, I’m going to be at STPCon in San Diego. I might be at this year’s PAC from Neotys. It is in Santorini, Greece. I’ll also be explaining a little bit about how to integrate all this performance testing into an organization and how to grow it through these gradual gateways. As well, in April there’s the SmartBear Connect conference, which I’ll also be attending. So just in the short term, two or three conferences. The one that I am most excited about right now is STPCon, because I’ll be doing a full workshop where I’m going to be trying to explain through these types of examples how to pull automation for load testing and how to tackle the dreaded correlations, which are the most difficult task when you are trying to correlate.

Leandro Melendez (28:10):

I’m going to be doing it through a game of spies where you capture information and you try to decipher it to forge a message. So I’m trying to make it all fun and digestible. And as you mentioned, I am on the social networks, LinkedIn, Facebook, Twitter, even Instagram, as Señor Performo, S-R-P-E-R-F, srperf: Leandro Melendez on LinkedIn, @srperf on Twitter, Señor Performo on Facebook and on the YouTube channel. I also upload my presentations there when I get a chance to record them; they are on YouTube. You can just look for Señor Performo. There’s a channel there, and shoot me a message if you want to keep talking about performance testing.

Matthew Heusser (29:02):

Great. We’ll have all those hard links in the show notes, so if you’re listening to this on iTunes you can search for the show by number and title and then you can just click away and get all that information. Thanks for being on the show. There’s so much we could get into. I think what I’d rather do is have you back soon. If that works for you.

Leandro Melendez (29:25):

I would absolutely love it.

Matthew Heusser (29:27):

I may have to brush up on my Spanish to listen to your podcast. We’ll have a link.

Leandro Melendez (29:32):

And thanks a lot, Matt, for having me here.

Matthew Heusser (29:34):

All right. Thank you for coming.
