The Testing Show: Episode 43: Machine Learning, Part 1
This is the first of a two-parter with Peter Varhol on both the promises and the hype surrounding AI and Machine Learning. Matt, Perze and Michael go down the rabbit hole on the Machine Learning topic with Peter as we try to wrap our heads around both the realities of Machine Learning, AI and the unique testing challenges such systems offer.
From Facebook’s Chatbots negotiating an agreement to systems making predictive suggestions in ways that are both intriguing and creepy, there is a lot to the machine learning puzzle that we are just starting to understand and also prepare ourselves to effectively test. Hint: the algorithms themselves are only part of the puzzle.
- Facebook AI Creates Its Own Language In Creepy Preview Of Our Potential Future
- Basic Concepts in Machine Learning
- Can we Build ‘Her’?: What Samantha Tells us About the Future of AI
- Natural Language Processing and Machine Learning
- Supervised Learning Algorithm
- Unsupervised Learning Algorithm
- Python Machine Learning (book)
- Regression Analysis
- Multivariate Analysis
- Correlation Coefficient
- Carina C. Zona: Consequences of an Insightful Algorithm | JSConf EU 2015
- Consequences of an Insightful Algorithm (Slideshare)
- Cognitive Bias in AI: Why Machine Learning Applications Are Like People
- Decision Tree Learning
MICHAEL LARSEN: Hello everyone, and welcome to The Testing Show. I’m Michael Larsen, your show producer, and today we’d like to welcome our guests, Perze Ababa?
PERZE ABABA: Hello, everyone.
MICHAEL LARSEN: Matthew Heusser?
MATTHEW HEUSSER: Hi. Welcome to the show.
MICHAEL LARSEN: And our special guest, Mr. Peter Varhol?
PETER VARHOL: Hello, Michael. Thank you. Hello, everybody else out there.
MICHAEL LARSEN: With that, we will turn the time over to Matt. Take it away.
MATTHEW HEUSSER: Hey, thanks Michael. So Peter, I’ve known for years and years. I think we probably met at STPCon, I’m guessing?
PETER VARHOL: We met at Agile Testing Days in—I’m going to say—2012?
MATTHEW HEUSSER: Agile Testing, yeah. Yeah. Yeah, in Germany. Right.
PETER VARHOL: I attended yours and Pete Walen’s Workshop.
MATTHEW HEUSSER: Yeah. I remember that now, and then we ran into each other again in Estonia for Nordic Testing Days two years later.
PETER VARHOL: We don’t seem to be able to avoid one another, Matt.
MATTHEW HEUSSER: We’ve been collaborating ever since. We’ve actually brought you in on a couple of different test projects and several writing projects too, and when we are tapped for work and we just can’t take on any more and AI or Machine Learning Projects come in, you are at the top of speed dial.
PETER VARHOL: I appreciate that, Matt.
MATTHEW HEUSSER: Yeah. So, what else? What else should the audience know about Peter? From what I know, he’s a really eclectic guy. He’s done a ton of stuff. He’s got an advanced degree in math, been a Math Department Chair. A ton of writing, more writing than me. He used to be an editor for—I think—TechTarget.
PETER VARHOL: I was. I’ve been editor—a full-time editor—on and off for 20 years. When I’m not full time, I freelance. But I was TechTarget’s Application Group, Media Group, Director. So, I directed all of the web properties that TechTarget has within the application development and testing space.
MATTHEW HEUSSER: Yeah. So, impressive guy. Lately, he’s been working on AI and ML, which is kind of the focus, what we want to talk about this episode. Perze has been looking at it for work. I’ve been looking at it for work. So, it should be a good episode. Let’s get started with The News. This news came out 1-1/2 months ago that Facebook had created these AI Chatbots that develop their own language and Facebook turned them off. Let’s start with that. Let’s start with the official story when it came out the first time through mainstream media. How did you react to that? What did you think about that? Anybody in the audience?
PETER VARHOL: Well Matt, I thought it was kind of cool. I think the key thing about Machine Learning is that it operates on the Feedback Principle. What we’re trying to do is to look at how we can produce better results by looking and comparing the results produced by the Machine Learning algorithms to the actual results and then feeding that back in to adjust the algorithms to produce incrementally better results. So, I thought that was kind of cool.
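As a rough illustration of the feedback loop Peter describes (compare predictions to actual results, then feed the difference back in as an adjustment), here is a minimal Python sketch. It assumes a straight-line model and uses plain gradient descent on made-up data; the function name, learning rate, and number of rounds are illustrative choices, not anything taken from Facebook's system.

```python
def fit_line_by_feedback(xs, ys, learning_rate=0.01, rounds=5000):
    """Fit y = slope * x + intercept by repeatedly comparing predictions to actual values."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(rounds):
        # Error = predicted value minus actual value, for every training point.
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Feed the errors back in as a small correction
        # (this is the gradient of the mean squared error).
        slope -= learning_rate * (2 / n) * sum(e * x for e, x in zip(errors, xs))
        intercept -= learning_rate * (2 / n) * sum(errors)
    return slope, intercept


# Made-up data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line_by_feedback(xs, ys)
print(round(slope, 2), round(intercept, 2))
```

Each pass through the loop is one round of "incrementally better results": the estimate starts at zero and drifts toward the slope and intercept that generated the data.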
PERZE ABABA: You know, the first thing that came to my head when I saw that headline was—if you guys remember—that movie Her. It’s a dystopian future where everybody has this personal operating system. You get to talk to it, and it learns about you. It pulls data from your e-mails, from whatever data that’s available for you, and then it has the ability to talk to you. To make the long story short, all the OS’s or AI’s ended up transcending and pretty much stopped talking to everyone because they only started talking to each other. So if you haven’t seen that, I think it’s an interesting movie. It kind of bombed in the Box Office, but anyway, [LAUGHTER], that’s the first thing that came to my mind.
MICHAEL LARSEN: You know, most of the posts that I saw and the commentary, it was really interesting just watching people react to it, “Concerned, amused, not entirely sure what to make of it,” and I think a lot of the commentary that I saw was around the actual chats and people trying to make sense of it. I remember seeing a few people that were just, “Oh, yeah. Why are we concerned about this when the conversation going on here is this bad,” and I think that was one of the things I focused on and I thought was interesting. Everybody was looking at it as a way of saying, “Well, because it’s not speaking proper English; therefore, it must be, in some way, defective, and it’s not going to work.” I was looking at it from the perspective of, “So that’s how they’re figuring out how to iterate things, or they have tasks that they need to do. The stringing of I’s is an example of iterating the number of tasks they were doing.” I thought it was pretty clever, to be honest. I was looking at that going, “I would not have come up with that as a way of communicating something, but in a short distance, you know, between two machines, it actually has an elegance to it.” It looked kind of cool.
It was more interesting to see people’s reactions to it, and the first reaction that people had was they dismissed it or they put it down. I don’t know if it was reflexive or if it was just, “We don’t understand this; therefore, it’s bad.” I just felt like I was the lone person looking at it from, “But there’s a neat little elegance here. Don’t you see that?” Nobody wanted to talk.
MATTHEW HEUSSER: Well, let’s dive into this. Right? There’s a couple of things as a journalist, and maybe Peter has a different perspective on this than I do. I’m a credentialed journalist. I’m a member of the American Society of Journalists and Authors, and my reaction to that post—which a few of you guys are in the, sort of, private discussion rooms where we were having these discussions—was, “This doesn’t make any sense.” The claim was they developed their own language, and my honest reaction was, “I doubt it.” Because words have semantic meaning and getting a computer to understand the meaning is really hard. Natural language processing, where you can take a sentence and break it apart into symbolic elements, that you can do, but to try to actually re-compose your own AI that exists that kind of creates its own text, from what I’ve seen, it’s mostly gobbledygook. It’s mostly pseudo-stuff that you run through a generator in order to—as a joke—get something published in the journal. It’s not that good. I would be surprised that Facebook would actually be able to create these Chatbots that had real semantic meaning. When they say they “invented their own language,” my honest thought was more like, “No. It’s more like they don’t understand English, so they devolve to nonsense.”
PETER VARHOL: Well Matt, you’re probably right, in that it’s probably not a language that can be widely applicable or could be generalizable among a larger population. It was more of a stimulus response sort of thing.
MATTHEW HEUSSER: Yeah. Now, the part that is interesting is the one example they have, and there’s just not a lot of detail. You go back to the original post by Facebook and there’s like one sentence in there about developing their own language and that was turned into entire articles that are almost all speculation. The one example they gave when they did the follow-up was, “Repetitive use of words to indicate the number of items that they were trying to negotiate for.” I just don’t know how AI would go there; and, if it did, how would its other? Because it was two different programs that were talking to each other, and it was a negotiation. So we’re trying to get one program as standardized, not-machine-learning, same input, same output, same results, and the other one is supposed to use ML. You run it through a million times and you see if you get better negotiations out of it. How was the second program—how was the standard program, the one that is just iterative, that is not learning—supposed to know that the other one means it wants seven items when it uses the word “I” seven times in a row? How was that supposed to happen? I don’t think it did.
PERZE ABABA: You know, I think what’s key was what Peter brought up earlier. It’s the feedback loop, right? If it really was based on a very specific learning algorithm, maybe it was a Supervised Learning that led into reinforcement learning with what it’s using. Right? So, there’s very specific decision processes that particular machine can make.
You can force it to remember something and we just stick to that because that’s all it knows. Right? But then, as that body of knowledge kind of gets bigger and wider, then this gobbledygook becomes not really gobbledygook.
MATTHEW HEUSSER: The problem is: I don’t think that they have a setting mechanism to talk to each other so that they could understand each other. I’m also not sure the way ML works. From what I’ve seen, ML doesn’t really work that way. If you’re talking about supervised and unsupervised algorithms, which we’re going to get into, it’s not going to, like, invent semantic meaning for something, some words, you give it. It just doesn’t make sense to me. I don’t think we have the whole story, and I wish Facebook would Open Source the code and we could figure it out because this just doesn’t make any sense.
PETER VARHOL: Well, even if you Open Source the code, the algorithms are likely so complex and the training is so intricate, probably few (if any) people could look at the code and say precisely what the end result is going to be and why we got a particular end result. That’s both a blessing and a curse of Machine Learning. It’s a blessing in that we can set up a series of algorithms and an iterative process to be able to achieve something we can’t do without that iterative feedback mechanism, but the curse is that when we get a result that seems to be reasonable or seems to be conclusive or seems to be something, we can’t go back through the code and trace out just why we got that result.
MATTHEW HEUSSER: Yeah. That’s a really important point, I think. Because of the way these things work, you could make a log of, “This guy said this and the other guy said that, and this guy said this and the other guy said that.” Then, they reach a decision point, and the agreement was 10 units of flour for $100.00. But, “Why did the first guy say these words?” There’s not a lot of ways to build logging in for us even to understand why he said, “I” seven times in a row. In psychology terms for AI, it’s very easy to build a behavioral model. But the cognitive model is problematic.
PETER VARHOL: As I said, “That’s both a blessing and a curse” with machine learning. We’re devising systems that, as you go in, look at the algorithms, you really can’t tell why we got a particular result, and that raises a larger issue. How do we know that result is reasonable? How do we know it’s correct? How do we know, Matt, that it’s meaningful?
MATTHEW HEUSSER: Yeah. I think that’s what I want to dig into next. For our audience, let’s tell them a little bit about what ML is and how it works, and then, “How do you test it?” I’m going to throw out a wrong and simplistic explanation of Machine Learning, which is a subset of artificial intelligence and then let Peter correct me, basically. Let’s say you go buy Machine Learning in Python, the book, and you read all about it. What you’re going to find is a lot of it is regression analysis. We used to do this in Excel.
You would plot a whole bunch of points and you would say, “Hey, Excel. Find me the curve that matches these points,” and then Excel would find you the curve. There’s a whole bunch of ways to do it. Once you’ve got that set up, you can now make informed guesses. So if that line, the curve, is the performance of your customers over time, the demand curve for your product, you can match that curve, and then you can say, “What’s it going to be tomorrow? What’s it going to be next week?” The problem with that kind of math is that there’s a bunch of different algorithms that do it. They’re all very straightforward iterative algorithms. You can actually pick the algorithm you want. So, if you’re trying to predict global warming, there’s an algorithm, which is nicknamed “the hockey stick” algorithm where you plug those numbers in and you always get a hockey stick. At the end, you get huge-massive growth—right?—based on a small amount of growth. Machine Learning is a little smarter than that, because it can go through and do that regression analysis and then it can go through again and re-plot it and see whether or not it was right. Then, based on that, draw a better line. Then, based on that, draw a better line. Then, based on that, draw a better line. It can also do multivariate analysis. So, you have a game like 20 Questions where your data isn’t structured. You plug a whole bunch of data into it, and the AI can start to play 20 Questions. It can build its own structure of what questions it should ask in what order based on the data that you feed it. It’s a little bit better than that, but really, in Python, a lot of what this is, is you’re trying to predict results based on past history. You have to get the data from a file somewhere. You have to read it into the right kind of structure, and then you have a function called, like, “Do all the ML things.” You pass the structure to the function, and now you can call that function and get your answer back.
It’s just regression analysis. It’s super easy. It’s not that big of a deal, and it’s not going to do the magic that we see in the Terminator movies. So, after that wrong, over-simplification, Peter, tell me how I’m wrong?
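The workflow Matt outlines (get data from a file, read it into a structure, hand it to a fitting function, then ask for a prediction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text stands in for "a file somewhere", the numbers are invented, and `load_points`, `fit_line`, and `predict` are hypothetical names rather than functions from any particular library. The fit is ordinary least squares for a straight line.

```python
import csv
import io

# Hypothetical stand-in for "a file somewhere": day number and units sold.
RAW_DATA = """day,units
1,12
2,15
3,19
4,22
5,24
"""


def load_points(text):
    """Read CSV text into the structure the fitting function expects: two lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return [float(r["day"]) for r in rows], [float(r["units"]) for r in rows]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Use the fitted line to make an informed guess for a new x."""
    slope, intercept = model
    return slope * x + intercept


days, units = load_points(RAW_DATA)
model = fit_line(days, units)
forecast = predict(model, 6)  # "What's it going to be tomorrow?"
print(f"Forecast for day 6: {forecast:.1f}")
```

A real project would swap the hand-written fit for a library routine, but the shape stays the same: load, structure, fit, predict.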
PETER VARHOL: Matt, you’re right. First of all, I find it amazing that techniques such as calculating correlation coefficients and regression analysis, which have been around for about 150 years now, have suddenly taken on Machine Learning connotations. The reason for that is that feedback loop or iterative loop or whatever you want to call it, the fact that we can come up with an approximation curve, whether it be linear, straight line, or it be an actual curve and then go and try to adjust that based on—we call it, “learning.” We call it, “training,” or—whatever you might want to do. I think that everyone who is listening to this who has graduated college has probably done correlation coefficients and regression analysis in at least one college-level course. So I think they know what you’re talking about here, at least in an abstract sense. What machine learning adds to that particular technique is that it adds that feedback loop or the iteration to say, “Okay. We’re holding back some of our data. Now, let’s run it through again and see if we can get closer and see just how far off we are.”
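To make the two ideas in Peter's answer concrete, the sketch below computes a Pearson correlation coefficient and then "holds back" part of the data to see how far off a fitted line is on points it never saw. The data values are invented, the 6/2 split is an arbitrary assumption, and the helper names are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Invented, roughly linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Hold back the last two points; train only on the first six.
train_x, train_y = xs[:6], ys[:6]
held_x, held_y = xs[6:], ys[6:]

slope, intercept = fit_line(train_x, train_y)

# "See just how far off we are" on data the fit never saw.
held_out_error = sum(
    abs((slope * x + intercept) - y) for x, y in zip(held_x, held_y)
) / len(held_x)

r = pearson_r(xs, ys)
print(f"r = {r:.3f}, mean held-out error = {held_out_error:.2f}")
```

If the held-out error is much larger than the error on the training points, the model has memorized its training data rather than learned the trend.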
MATTHEW HEUSSER: That’s a nice definite improvement, “Let’s go back through the data again.”
PETER VARHOL: Sure.
MATTHEW HEUSSER: “Run it ourselves and see.” I don’t want to minimize that.
PETER VARHOL: I think you really put your finger on it when you said, “Okay. So, we have a model. But, what if our problem domain changes over time?” That’s how our system is able to learn, and I think we’re getting into supervised versus unsupervised learning at that point. We can bring that up later, once again.
MICHAEL LARSEN: Carina C. Zona did a talk a couple of years ago called Consequences of an Insightful Algorithm. She has several versions of it on video that you can look at. It’s on YouTube. I will certainly put it in the Show Notes. I went back and I reviewed the slides that she had for this, and I thought it was interesting. Thinking about some of the challenges with Machine Learning and how we’re doing this, and some of the ways that data mining can actually fail us. She breaks it down. She makes an interesting little table here where she describes beyond the technical challenges, the regression analysis, and being able to go through and figure out that because of feedback, “Well, this is what this probably means”. You run into personally identifiable info being leaked. There are human complexity failures. You can actually run into false neutrality, making uncritical assumptions, moving fast, breaking things, shaming, accurate but not correct, and not really how valid research works. There’s a bunch of other ones in here too, “False neutrality.” This is probably the one that people have heard of. It’s the sense that, because of a person’s shopping habits, on Target’s site, a girl with a shared account started getting maternity clothing coupons and baby needs and the father who was receiving these ads was going, “Why are you sending maternity ads to my daughter? She’s not pregnant.” Well, it turns out that, after having this discussion, the daughter came out to her dad and said, “Actually, yes, I am.” Because of the, either, searches or things that she was purchasing or looking at, Target figured out that this girl was pregnant before she had even announced it. On one side that’s both interesting that it was able to intuit all of that and really creepy that it started advertising that and outing somebody before they ever had a chance to even discuss it.
PETER VARHOL: Michael, it’s funny that you bring that up. I’m actually trying to dig into a little bit and develop, maybe, some articles and presentations around a subject area that I’m calling, Cognitive Bias in AI. When we use data that is selected and generated by people, the end result is that our machines end up behaving a lot like people, at least within their problem domain. Let’s say that we, as developers and testers of AI and Machine Learning systems, go and select what data we believe is relevant to our problem domain. There are some situations like you’re describing where the algorithm is actually choosing or the designers of those systems have actually chosen what data it wants to focus on. These aren’t scientific systems or making scientific measurements or anything like that. These are people-based systems, and as a result, we are getting people-based biases into the Machine Learning process and the resulting conclusions probably have some people-oriented biases in them too.
MATTHEW HEUSSER: There’s a lot to unpack there. One of them, it could be as simple as, she belongs to a content network that shares what sites that she goes to, and she’s going to https://www.whattoexpect.com.
PETER VARHOL: https://www.babiesrus.com.
MATTHEW HEUSSER: https://www.babiesrus.com. “Should I breastfeed or not?” Like, all these kinds of things on the computer. It could be as dumb as, it doesn’t have to be Machine Learning. It could be as simple as, “People who bought these things also bought those things,” and let’s inject that into your Facebook. It could be as simple as that. The simple, “People who bought these things also bought those things. People who ranked this highly also ranked that highly.” It doesn’t really have to be Machine Learning. You can do that with a simple iterative algorithm. Where we get into Machine Learning is where we start doing that and then predicting it and then saying, “I think this person should be looking at these things. So, I’m going to go through and compare my predictions to what they actually got and use that to train my algorithm and run through that 300, 400, or 500 times until I can make very good predictions or better predictions.” So, let’s talk about the differences between supervised and unsupervised learning, Peter?
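The "people who bought these things also bought those things" approach Matt mentions really can be done with simple counting and no learning at all. Here is a minimal Python sketch, assuming a handful of invented orders; `also_bought` and `recommend` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations


def also_bought(orders):
    """Count how often each pair of items shows up in the same order."""
    pair_counts = Counter()
    for basket in orders:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts


def recommend(pair_counts, item, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = Counter({
        other: count
        for (first, other), count in pair_counts.items()
        if first == item
    })
    return [other for other, _ in scores.most_common(top_n)]


# Invented order history.
orders = [
    ["crib", "diapers", "bottle"],
    ["diapers", "bottle"],
    ["crib", "diapers"],
    ["bottle", "pacifier"],
]

pair_counts = also_bought(orders)
print(recommend(pair_counts, "crib", top_n=1))
```

Nothing here is trained or adjusted. It becomes Machine Learning in the sense Matt describes only once the recommendations are compared with what people actually bought and that comparison is fed back in to adjust the scoring.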
PETER VARHOL: Sure. Well, supervised learning is, we know the result, at least from our training data, and we’re trying to get an algorithm, trying to get a decision that is close to that known result as possible. Unsupervised learning is, “We’re trying to optimize something.” My favorite is airline tickets, and everybody gets frustrated. They log in, in the morning. They see a great price on an airline ticket, but they need to firm up some plans. So they log in later in the afternoon, the price has gone up by $200.00. That’s the airline—I’ll say—trying to optimize the revenue they get on each and every individual flight. In doing so, they are changing. As I understand it, airlines change their pricing tables based on what they believe will give them the maximum revenue on a plane. They change it during the week, three times a day. On weekends, Saturday and Sunday, they change it once a day. That is unsupervised learning. They know what they’re trying to optimize. In this case, it’s revenue. They don’t know what the exact answer is. They’re not going to come up with a mathematical optimization. They’re not going to come up with the exact peak of the curve or valley of the curve or whatever they’re looking for, but whatever they come up with is better than they can do by trying to price these things up manually.
MATTHEW HEUSSER: Okay. So, let me try to put that into my language and maybe between the three of us—we’ll ask Perze to do the same thing—we’ll come up with something for “the audience.” What I would say is that, I totally agree on the supervised. You have a training data set where you know what the right answer is. You can generate your little graph. It’s going to be a lot more complex than a simple graph. It could have an X and a Y and a Z axis. You can have all kinds of different data, right?
PETER VARHOL: You can have n-dimensional space, which really doesn’t exist in reality; but, when you have 15 different variables, what you’re looking at is a 15‑dimensional space.
MATTHEW HEUSSER: Right. You can’t map it, but a software project has more than three independent variables. Right?
PETER VARHOL: Sure.
MATTHEW HEUSSER: We know what the right answer is. We can run the algorithm. We can generate a graph, and then we can go and say, “Is this the right answer?” We can use what the actual data is and run it through again and again and again and again and again and again to evaluate how accurate it is and make it smarter. With unsupervised data, we’re just looking for patterns. I understand those kind of algorithms, [LAUGHTER], much less because they seem more like magic to me, “We’re doing data mining looking for interesting combinations.” So that the algorithm can say, “Hey. When these four variables are over here, here, and here. When this one is low and that one is high, when that one is in the middle and that one is high, then the fifth variable, which we’re trying to optimize (which is total profit) is higher. So, do more of that.” That would be what unsupervised learning is trying to find. Actually, that makes sense.
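One common way of "just looking for patterns" without known answers is clustering. The sketch below is a deliberately tiny k-means with two groups on one-dimensional, invented data. It is offered as one example of an unsupervised technique, not as the specific method the speakers have in mind, and it assumes the input contains at least two distinct values.

```python
def two_means(values, rounds=10):
    """Split 1-D values into two groups without any labels (k-means with k=2).

    Assumes `values` contains at least two distinct numbers.
    """
    low, high = min(values), max(values)  # starting guesses for the two centres
    for _ in range(rounds):
        # Assign every value to whichever centre is closer.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group, then repeat.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Invented measurements with no labels attached.
values = [1.0, 1.2, 0.8, 9.5, 10.1, 10.4]
group_low, group_high = two_means(values)
print(group_low, group_high)
```

No one told the algorithm which values belong together. The grouping falls out of the data, and it is then up to a person to decide whether the pattern means anything.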
PETER VARHOL: That sounds fun.
MATTHEW HEUSSER: It’s almost English. Perze, how does that square with your understanding or the words you would choose?
PERZE ABABA: What we’re looking into is, we’re not even jumping into the notion of Machine Learning, but we’re trying to apply this into (based on a given set of known variables or test ideas), “How can we automatically employ, pretty much in an automated checking space—how do we automatically employ that—in a given system that we have no prior understanding of what it is?” For example, pretty much, you’re looking at, from a decision‑tree‑based algorithm, which is probably a good example of Supervised Learning, I look at a system. I recognize all the objects within the system. Let’s say we’re looking at a webpage, and I know that a search object has this particular DIV and I can perform X amount of tests or checks based on that particular DIV; and, because I have an existing set of checks that I already have available, I can employ that into the system and see how it behaves. Right? So, I see a search box. I can employ, you know, cross-site scripting. I can employ a regular search. I have access to the API in the backend so I can actually confirm if the results match, you know, what’s on the database. One thing that I might not be able to employ is to have an idea of the relevance of the search results based on the search terms that I have because, as a human, I can employ value if I search for “apple” for example. I might be searching for the company or the fruit or, you know, something else. Like a dumb machine, pretty much, won’t be able to do that, but I believe from the unsupervised learning piece of things that you can start grading very specific values towards, “How did everyone else rate this particular search result?” I know there’s an already-existing amount of results that people have clicked into.
MATTHEW HEUSSER: That’s what I was going to say, and maybe Peter can correct us. You would need some form of third party to say, “Hey. These 15 links are Apple the Computer Company and nothing else is. They’re all apple the fruit,” and once you give that extra piece of data to the unsupervised process, then it could go and work its magic. Giving it a number of stars or a “yes” or “no” or somehow having the human rate the search results. You would need to do that to kick start the algorithm, I think.
PERZE ABABA: Yeah. I think at the end of it all is the idea of value in relation to the relevance of what you actually got out of the algorithm that you used. For supervised learning, it’s somewhat of a known space or even a known/unknown space that gives you the ability to dig into and assign very specific value to the output, if there is an output. For unsupervised, on the other hand, it just assigns a specific value so that we can then—as part of our analysis, after we look at the results—see, “Oh, that’s actually important.” That piece that it found is where we should look deeper. This is something that’s worth looking into.
PETER VARHOL: This is what e-commerce recommendation engines—the more sophisticated ones—are doing, and they’re still (I’ll say) not particularly accurate. But, you know, let’s say that you’re an Amazon or something like that or a Jet or something like that, and you manage through the use of the recommendation engine to increase your sales by 2 percent. For an Amazon, that’s like $2 billion. Is it worthwhile doing? Is it worthwhile incorporating Machine Learning to do that? The answer is, “Heck, yes.”
[END OF TRANSCRIPT]