This is Part Two of our discussion with Alex Schladebeck and Joel Montvelisky.
We tackle regression testing and share a few stories from the trenches (and a ghostly Michael even makes a contribution to this topic), discuss the idea that perhaps continuous testing is a concept whose time really has come, and look to see if we can possibly break out of the “hardening” process at the end of sprints in favor of more testing up front, so that discoveries can actually be addressed sooner rather than pushed off to later.
Panelists:
References:
- SalesForceFoundation on GitHub
- CAST 2017 in Nashville
- Spring Online Test Conf, June 13-14, 2017
- State of Testing 2017
- Communication and Testing: Why You Have Been Wrong All Along!
- Bredex
- Jubula
- Will your tests stand the test of time? Patterns for automated UI tests
- Will your tests stand the test of time? Slides
- Eurostar 2017, Copenhagen
Transcript:
[What you are about to hear is Part Two of a Two-Part “The Testing Show” series on The State of the Testing Practice. If you have not listened to Part One, we encourage you to pause, go listen to Part One, and then come back and listen to this episode. Of course, you may enjoy the feeling of jumping into a story partway through… after all, that’s how many of us first experienced Star Wars. In any case, we hope you enjoy Part Two of The State of the Testing Practice with Alex Schladebeck and Joel Montvelisky]
MATTHEW HEUSSER: I wanted to cover one more topic, and that is the regression test process. How long does it typically take, and how often is it between releases? It’s so hard to measure, so maybe we can just give some data points. My last company it was, and at my current company it is, and hopefully that’s changing over time. For all my clients, I’m seeing releasing more often and finding ways to compress that time, and tooling is a part of that. Maybe we could give just a couple of examples.
JESSICA INGRASSELLINO: I can really quickly talk about what I built with bit.ly. I built a Python/Selenium suite and when it got to a point where it was testing enough things that were deemed worth running the thing in CI, which wasn’t long. It was probably like a two-minute test run, so the basic cases. We released multiple times a day. I had two things, and I had tests that were just tagged for running every single time we released something, and then I had the test that didn’t need to be run every single time, but could be run multiple times a day, so I wrote a Jenkins job for those tests asynchronously throughout the day. The main tests ran every time that we wanted to deploy code to production. We would deploy to production anywhere between four to ten times a day, depending on what was happening. The other caveat was that I got ten minutes, because unit tests and everything had to run. So, it really required some serious decisions to be made about what was absolutely critical to test before something went to production versus what needed to be tested but could wait, too. I actually had my whole suite, it was able to run in six minutes. It was pretty solid and pretty steady. We also had a lot of developers that worked on that as they were developing, if they changed something, they actually were responsible for making sure their change didn’t break the test. If it broke the front-end test, they had to fix it and commit the fix. That was a big help in maintenance, because I was the only tester. That’s kind of how we worked that out and ran that. It is something that, even though I am not there, I am aware that they are still using it at this point in time, and I’ve not been there since September. At least it has some short residual value.
At SalesForce, it’s a little bit different because we have a two-week release cadence, and I am not the main writer of the automated tests, so I’m actually the person who is responsible for manual, in-depth exploratory testing, so regression-wise, the current system is to run through a regression during release. We have a release to a sandbox environment, and then we have a week in that environment before we release to production. The first thing that I do is to run the automated tests and go through and look at things and test things that need my attention, which there are a few things that can’t be automated very well, so going in and doing those things before we push to sandbox. When we have something ready, but before we push it to that sandbox space. The reason that’s important is we are open and our customers all have access to that sandbox space. They can try it, they can play with it, that kind of a thing. So, we want to make sure that that’s, at least, the major pieces are working for them, and then during the week I do major exploratory testing when it’s in that environment, because that’s the closest thing to production that we have. Then it gets pushed to production. I’m looking to streamline that process and do as much of the testing as I can that’s possible to do in pre-production, but that involves a lot of different moves culturally. We’ve got great feedback from the developers, which is awesome. It’s just “try things”. Trying and failing and trying again, until we have a process that everybody’s feeling good about.
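[Editor’s note: The bit.ly setup Jessica describes above splits the suite by tag: a small set that runs on every deploy inside a fixed time budget, and a second set that runs on a schedule. The sketch below illustrates that idea in plain Python. It is not the actual bit.ly code; the tag names, the check functions, and the 600-second budget (her ten minutes) are assumptions made for illustration, and a real suite would more likely lean on its test runner’s own tagging feature.]

```python
import time

# Hypothetical registry standing in for a real test runner's tagging feature.
REGISTRY = []


def tagged(*tags):
    """Register a check function under one or more tags."""
    def decorator(fn):
        REGISTRY.append((fn, frozenset(tags)))
        return fn
    return decorator


@tagged("every_deploy")
def check_shorten_link():
    assert True  # placeholder for a critical browser check


@tagged("every_deploy")
def check_login():
    assert True  # placeholder for a critical browser check


@tagged("periodic")
def check_account_settings_page():
    assert True  # placeholder for a check that can wait for the scheduled job


def select(tag):
    """Return the registered checks carrying the given tag, in registration order."""
    return [fn for fn, tags in REGISTRY if tag in tags]


def run(tag, budget_seconds):
    """Run every check with the tag; return the names run and whether the run fit the budget."""
    start = time.monotonic()
    ran = []
    for fn in select(tag):
        fn()
        ran.append(fn.__name__)
    elapsed = time.monotonic() - start
    return ran, elapsed <= budget_seconds


if __name__ == "__main__":
    names, within_budget = run("every_deploy", budget_seconds=600)
    print(names, "within budget:", within_budget)
```

The deploy pipeline would call `run("every_deploy", ...)` and block the release on failure or on a blown budget, while a scheduled job would call `run("periodic", ...)` a few times a day.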
MATTHEW HEUSSER: Yeah, I think that that’s more common to see. We have a large system, with integrated parts and if you change one thing here, it might break something else there, so we need this regression testy process, so we release about every two weeks. Most of the companies I work with are somewhere in that window. Some can do it better and some can do it worse. Some of it is architecture, like they have a bunch of PHP pages that are all separated and isolated, and they can deploy as often as you want.
ALEX SCHLADEBECK: We’re in the interesting situation that a lot of the work we do is for large concerns, so even if we could release faster, our release cycles are dictated mostly by, “No, we want it then”, and “then” is in a couple of months, so… [laughter], there’s a customer side thing on that. What we are doing a lot of, though, is trying to make sure that even if the customer’s release isn’t necessarily planned for the next month or two, that we’re making sure that the test phase that the customer’s going to do anyway, because they’re big customers, that they don’t find anything that’s going to throw us back. What we’re doing a lot of is making sure that, yes, there’s automation going on at various levels; obviously, UI automation as well, and with that, we’ve been working more and more on getting developers and testers working together in the teams on things like that. Partially because one tester in a team, for example, is never going to be able to automate and test and do and check and be at every meeting and do everything against X developers, but also because it makes sense to have the whole team thinking about that. We’ve been doing a lot more in the role of the tester there’s a lot more coaching and strategizing and asking questions and showing and supporting to make sure that the whole team is working towards things like that. The other thing that we’ve been introducing is that, when we say “OK, here is the build that your people are going to be able to test”, we introduced exploratory testing days with the whole development team and with customers as well. This was something that I did a few workshops with our developers and testers; I think it was at the beginning of last year. Since then I’ve been noticing a lot more days like that going on, where the customers and the developers and the testers are all sitting together. That’s just fantastic to see.
It means the team is getting feedback on what kind of errors the customers still find even after we think we’ve tested it. The customers are seeing that we’re interested in getting their feedback and that we do things about the feedback. Of course, in the best-case scenario, they don’t find anything horrific and think “Hey, that was a really good job” and are then more likely to then maybe say “Okay, we’ll trust you to release quicker if we want a different feature or a new feature more quickly than we’d planned.” The release cycles in a lot of projects are still fairly long, but we tend to put things in place such that if that had to not be the case, we could release much more quickly than we do at the moment. In other stuff, on the open source project that we work on, we’ve moved from six-month release phases to being able to release after every sprint. That’s partially through more automation, but also doing a lot more exploratory testing, and also putting things in place that make it possible for us to rerelease quickly, even if we do find something. It’s not saying “Ahhh, someone else will find it”, it’s just doing the risk based “Well, we think we’d look at everything that could possibly go horribly wrong”, and now if something does go wrong that we don’t like, it’s OK because we can patch it quickly. Getting the feedback about whether it’s valuable or not is, in our context for this thing, more valuable than making sure that we got rid of every tiny little thing that we could possibly find. There we’re much more free to deal with the process how we like it, because it’s an open source tool and there’s no one customer behind it. The same principles apply. The theory is that if big customers with long release phases wanted to move to more frequent things, then we’d have the things in place for doing that. It’s a tough and long process, though.
I remember that not all of the developers were particularly thrilled about the idea of having some exploratory workshops, to put it mildly.
MATTHEW HEUSSER: Thanks, Alex. Justin?
JUSTIN ROHRMAN: This is something I do every day. I work on a UI automation project. I have a suite of tests that takes about two and a half hours to run. I run it on three different environments every night, so every morning I have a report back on what broke from the previous day’s changes. It’s pretty simple… well, it’s not simple. Maintaining a UI automation suite is kind of like committing the rest of your life to refactoring UI tests, but it works in this context, because the product is fairly legacy, and new changes tend to introduce new problems. At the same time, the UI doesn’t change significantly daily or weekly or monthly. That’s sort of what it’s like, and we release quarterly, so generally before a release we might do a little more hands-on investigation, but for me every day is hands-on exploration, based on the reports of the tests, so it’s not like I’m just looking at reports from a CI system every day. There’s actually a hands-on investigation into the software.
MATTHEW HEUSSER: Well, that’s interesting. The purpose of the test run, part of it, is to give you a jumping off point for you to look for problems.
JUSTIN ROHRMAN: Yeah, yeah, it’s not really automated testing, it’s more like a report triggering somebody to investigate a particular thing and do follow-up testing.
MATTHEW HEUSSER: That’s what it is very often, and we just don’t talk about it often enough.
JUSTIN ROHRMAN: Yeah.
MATTHEW HEUSSER: I seem to recall that you had a bunch of branches.
JUSTIN ROHRMAN: We have branches for just about every customer, and we also have branches for specific configurations for customers. The environment aspect of this is kind of complicated. We just select particular environments to run the test on, based on what’s important or what we’re working on at the time. Right now, it’s just three environments.
MATTHEW HEUSSER: Great, thanks. I’ve worked with a bunch of different companies. Most common with the larger ones, they customize the release test process to the product, so if it’s small, you can just test everything in an hour because it’s just a report generator. You can release it as often as you want, and if it’s longer, they usually develop some sort of coverage model, which I think we should talk about, and then fill in the coverage model until we run out of time. Sometimes the tools help us to fill that out more quickly. I wish Michael was here; we could talk about Socialtext. I left Socialtext in 2011. They still use the same testing framework. It was a combination of running unit tests and Selenium tests on all of the browsers. Some browsers can’t test all of the test cases because, doesn’t work, with a visual slideshow with exploration; rechecking some of the newest stories. That process typically took three days, and they could release once a week. I understand they have tightened that process up through the cloud. When you test a story now, they actually go and they run all of the sunshine test cases right before you test that story, so it finds breakages earlier.
MICHAEL LARSEN: Through the magic of podcast alchemy, I can join this conversation, too. Well, Okay, not really, but I am adding these comments in post. So yes, what Matt described is close. We have a long-lived automation framework that is very closely coupled with our product. It makes our testing process… interesting to say the least. It also means our automation framework has a unique dialect as compared to others. I wouldn’t even know how to begin to describe that, to be honest. We have three sets of tests that we run. We run a typical unit test suite. We have a series of integration tests that we refer to as “wikiDTests”. We also have end to end Selenium tests that we call “WikiQTests”. As you might guess from the names, the unit and WikiDtests are written by the programmers on the development team, and the wikiQtests are written by the testing team. These tests used to take us a day and a half to run serially, so we had a subset that we called “sunshine test cases” that were our sanity tests. If those failed, it was back to the drawing board. Now, we run all of the tests (that’s unit, wikiD and WikiQ) in parallel through Jenkins and using AWS. Each test runs in its entirety and it takes about an hour and twenty minutes. That means we have the capacity to do several builds each day, and push multiple times a day. More accurately, we tend to push daily or a few times a week to our staging environment, but we cut our regular releases once a month. In addition to pushing a completed build to a set of test servers, we run through what we call a “release burn down”. That also includes a range of exploratory tests we can consider that are not automated, and are more suggested than scripted. We know the areas we need to cover, but how we cover them is left up to us and to our own creativity. So far, the process works well. 
We do find that, because of such a long-lived framework, we do at times make a bunch of changes due to updating modules on our product, and those changes can be time consuming. Still, overall, the system has proven to be remarkably stable for over ten years. And now back to your regularly scheduled podcast.
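[Editor’s note: The move Michael describes, from running the unit, wikiD and wikiQ suites one after another to running them side by side, can be pictured with a small sketch. This is not Socialtext’s Jenkins/AWS setup; the function names and the fake suites (which just sleep for a fraction of a second) are assumptions made for illustration. The point it shows is that independent suites run concurrently take roughly as long as the slowest one, not the sum of all three.]

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_suites_in_parallel(suites):
    """Run each suite callable concurrently and return {suite name: passed}."""
    with ThreadPoolExecutor(max_workers=max(1, len(suites))) as pool:
        futures = {name: pool.submit(runner) for name, runner in suites.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_suite(seconds, passed=True):
    """Build a stand-in suite that sleeps, then reports a fixed result (hypothetical)."""
    def runner():
        time.sleep(seconds)
        return passed
    return runner


if __name__ == "__main__":
    suites = {
        "unit": fake_suite(0.1),
        "wikiD": fake_suite(0.2),
        "wikiQ": fake_suite(0.3),
    }
    start = time.monotonic()
    results = run_suites_in_parallel(suites)
    elapsed = time.monotonic() - start
    # Wall time tracks the slowest suite rather than the total of all three.
    print(results, f"elapsed: {elapsed:.2f}s")
```

In a real pipeline each runner would launch a suite on its own build agent or machine; threads are used here only to keep the sketch self-contained.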
JOEL MONTVELISKY: We actually do something… I think that it’s between what you’re saying that Michael is doing to what Alex is actually talking about of the risk based type of approach. We do release in PractiTest between two to three times a week. It really depends on what we’re releasing. We have a minimal amount of sanity that we need to run. We don’t have an extremely large amount of regression because, just as Alex said, we’re very good at understanding there’s actually something broke that was under the radar that we were actually testing to fix it very, very quickly. We’re also a SaaS product, and being a SaaS product that is constantly monitored, it makes it a lot easier. We do have a cycle that involves a lot of tester/developer on specific risk areas, meaning that whenever we are going to be releasing something, we look into “Okay, so what was done in there?” We do a very thorough risk analysis on what needs to be tested, and then we go about almost everything manually, because the automation is going to be doing the sanity, and it’s going to be checking for all of those sunny day scenarios, but most of the actual hard testing is going to be done by those guys who know that they need to release, and we usually have between half a day and a day to release it. It’s based a lot on risk testing, and it’s based on the fact that, if something goes bad, then usually we know how to fix it very, very quickly. Trying to look into other companies that I have worked with in the last six months or so, I was actually surprised seeing how much people are relying on still one week regression phases that are mostly done manually.
You can have people working on three week sprints, then the last week… they call it sometimes hardening, they call it sometimes, I don’t know, regression testing, but they’re still doing a lot of regression test manually, where you just have six guys just going over tons of scripts, manual scripts, just to make sure that they can release the product, and again, I thought that that was behind us, but I was surprised to still see it out there, at least so frequently.
MATTHEW HEUSSER: Yeah, I see that, too. What I like to do is come up with a coverage model, and then at least describing what we covered. I think most testers can get beyond scripts for anything but the most scientifically complex of software, and even then, the scripts should become “how to set up some specific problem and how to know a specific answer.” But yeah, real common. I’d like to move to just “testing all the time”. What would happen, if instead of a one week regression testing, we just tested all the time, and we just recognized that there’s no such thing as a candidate build any more? We’re going to find as many bugs as we can in the time that we have, and the release process is a day, and that’s just burning down the worst of the stuff, and we actually regression test every day for two hours. For a team like that place of low tooling and high failure demand, would that be a good idea?
ALEX SCHLADEBECK: I think that one of the things that would happen, and I’ve seen that teams that want to try and do this, where they’re saying “Okay, let’s do a hardening week at the end of the sprint”. It would be hard for them to realize that taking that away might mean that they might not get as many stories done in the sense of “development done”. And I use that with all of the air quotes that I can muster. There’s this idea, especially in teams that are working like that, “well, if it’s developed, it’s fine. It just now needs to be tested, and the testing is taking away necessary development time.” If you say “Okay, we’re going to test all of the time”, then that probably means more people are going to be testing, unless you want to test less, which would be another option but that’s, again, you’d have to approach that in a risk based way. That can be something that’s hard to sell to the team, maybe hard to sell to the product owner, possibly hard to sell to the customer, which I think is a very sad thing, but I think that’s one of the things that happened once the team has accepted that, and the customer’s realized that what they get out of that is probably better than just trying to test everything in the last week. That’s a good thing that can come out of that. We, actually, moved to “zero day release phases” in one team, and they worked very well. We now do a one day release phase, because in one of the zero day release phases, we released with a terrible, terrible blocker. It was an unknown unknown. More testing for things that we knew about would not have found it. We may have found it by accident in another way, but it was one of those “comes out of the blue, hits you, and sidelines you”. 
So now, that made us a bit more careful, so now we say “Okay, we are going to have a one day release phase where everyone, on a high level, looks at things to make sure that things are making sense, that we haven’t done anything really embarrassing, that hopefully, by exploring, something like that comes up again, and that we find it before we release.” That’s what can come out of that, and that’s beneficial to the customers, but it is terrifying to do that kind of release for the first time.
MATTHEW HEUSSER: Yeah. Oh, I agree. I was thinking about doing it the other way. So you usually have five days, you have forty hours of testing. Six people, but let’s ignore the six people.
ALEX SCHLADEBECK: Okay
MATTHEW HEUSSER: Forty hours per tester, and we’re flipping that up, so in week one and week two we get two hours a day. That is ten hours a week. And week three we do two hours a day, plus you have an eight hour regression test process. So if you did that, you’d actually get in twenty, twenty-eight, thirty-two hours of testing. Not quite as many as the forty you get for regression, but then basically you get, instead of two weeks of development and then bug fixing, you get two and a half weeks of development.
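[Editor’s note: Matt is tallying out loud here, so his running total is approximate. The sketch below simply adds up the inputs he states (two hours a day, five days a week, across a three-week sprint, plus one eight-hour regression pass) and compares them with the forty-hour hardening week. The constant names, and the option of letting the regression day replace that day’s two-hour slot, are our assumptions. Under either reading the per-tester total stays a little under forty hours, which is the point he is making.]

```python
HOURS_PER_DAY = 2              # daily regression slot Matt proposes
DAYS_PER_WEEK = 5
SPRINT_WEEKS = 3
RELEASE_REGRESSION_HOURS = 8   # the one-day release pass in week three
HARDENING_WEEK_HOURS = 40      # the traditional one-week regression phase


def continuous_testing_hours(regression_day_replaces_daily_slot=False):
    """Per-tester testing hours over the sprint under the 'test all the time' plan."""
    daily = HOURS_PER_DAY * DAYS_PER_WEEK * SPRINT_WEEKS
    if regression_day_replaces_daily_slot:
        # Assumption: the eight-hour day absorbs that day's two-hour slot.
        daily -= HOURS_PER_DAY
    return daily + RELEASE_REGRESSION_HOURS


if __name__ == "__main__":
    for replaces in (False, True):
        total = continuous_testing_hours(replaces)
        print(f"replaces daily slot={replaces}: {total} hours "
              f"vs {HARDENING_WEEK_HOURS} in a hardening week")
```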
ALEX SCHLADEBECK: I think that’s the theory. I think that the testing would find problems sooner that we realize that we need to talk more about.
MATTHEW HEUSSER: Yep.
ALEX SCHLADEBECK: I think that’s…
MATTHEW HEUSSER: And that’s good, right?
ALEX SCHLADEBECK: Yes, it is. Yes, I’m all for that, but I think that that would be the problem for the team, that they would go “No, this isn’t working because it’s costing us so much more time. It’s blocking us while we were trying to develop.”
MATTHEW HEUSSER: You’re paying for velocity earlier, right? You’re feeling the pain earlier. Having a testing phase allows you to defer pain and make the appearance of progress.
ALEX SCHLADEBECK: Yes, and it also allows you to then make the really, really tough decisions that basically are along the lines of “well, we don’t have any time to fix this, so we’re releasing anyway.”
MATTHEW HEUSSER: Yeah. Right.
ALEX SCHLADEBECK: If you find the error earlier, then you do have time to fix it, but you have to prioritize it over other things that might be happening.
MATTHEW HEUSSER: This would provide you with information faster, that was more accurate, that you could make more actionable decisions, which would not provide you with convenient corner cutting excuses that you get when you found the problems at the end that allow you to ship bad quality.
ALEX SCHLADEBECK: And I think that’s a good thing, but I think it’s very difficult for people to learn to do [laughter].
MATTHEW HEUSSER: Right, they’re, like, super awkward. People don’t like change, and they don’t like super awkward, and by the way, this strategy is one that I’ve recommended many times, and customers are reluctant to take it up, and I think that’s a big part of why. Before we go, what have you been up to lately and where can people hear more about you and what you’re doing? Jess and Justin are on the show all the time, but do you have anything new? Maybe Jess can point us to the open source code for Salesforce?
JESSICA INGRASSELLINO: Yeah, if you just search “SalesForce Foundation[i]” in GitHub, you will find all of the open repos. That includes all of the open projects that we have, as well as the test code for those open projects.
MATTHEW HEUSSER: Please, email Michael and me a link, we’ll get it in the show notes.
JESSICA INGRASSELLINO: Cool.
JUSTIN ROHRMAN: I’m still working on CAST 2017 in Nashville[ii]. The CFP is open until February 20th I believe. If you’re interested in speaking at CAST, you can find out about that at associationforsoftwaretesting.org [Editor’s note: CFP was extended to February 25, 2017, but will be closed by the time this podcast airs].
MATTHEW HEUSSER: Thanks, Joel?
JOEL MONTVELISKY: Well, actually, we had a couple of pretty cool projects. We had, back in November, The First Online Test Conf that we run. It was actually a pretty cool conference. Everything was online, and again what we tried to do differently was, instead of having a regular conference, that it’s also being transmitted where the speakers are actually mostly talking with the guys in front of them, then we did it online and we had a lot of different types of interactions, actually went pretty well. We got feedback and requests, so we’re actually planning already the second one[iii]. It’s going to be in June 2017. We’re running right now the State of Testing survey[iv] and then we’ll have the report. [Editor’s Note: survey is now closed, report being created, you can request to receive it when it is ready]. Other than that, preparing for conferences. I’m going to be speaking at STAREast, I think, in May[v], and I think there might be a couple of meetups, I think, in the UK over in February, and again just plain old working and trying to keep up to date with what’s going on outside.
MATTHEW HEUSSER: Okay, and Alex?
ALEX SCHLADEBECK: One of the things that we’ve just done that’s exciting here at Bredex[vi] is to release the new version of the open source tool that we work on called Jubula[vii]. That’s now available in 8.4, so that’s cool, and it is UI testing and we like it [laughter]. The other thing that I’m working on at the moment is, obviously, getting used to the new role. I think that… well, that’s a very personal thing, but it’s going to be some fun months learning what’s new and what stays the same. In terms of conferences, I’m going to be talking at The European Testing Conference in February in Finland[viii] which I’m totally excited about [Editor’s Note: Conference was held February 8-10, 2017. Alex’s slides for her presentation are in the show notes]. I’m at the OOP, which is the Object Oriented Programming Conference in Munich[ix], which is at the end of January, and I’m on the program committee for Eurostar[x]. I’m really excited to be able to work with some other cool people on the program committee for that.
MATTHEW HEUSSER: That’s a lot! Great, thanks everybody for being on the show. Any closing thoughts?
ALEX SCHLADEBECK: I enjoyed being here, thank you.
JOEL MONTVELISKY: Yeah, as well. Actually, it was quite a cool experience, thanks for inviting us.
MATTHEW HEUSSER: I really enjoy this. I think we actually got real, which is increasingly important in this sort of superficial and best practice-ey world, so thank you for your time and we’ll talk soon.
ALEX SCHLADEBECK: Okay, thanks Matt.
MATTHEW HEUSSER: Thanks, everybody.
JUSTIN ROHRMAN: Later.
JOEL MONTVELISKY: Bye, guys.
JESSICA INGRASSELLINO: Thank you. Bye.