How Artificial Intelligence and Machine Learning Will Change Test Automation in 2019 and Beyond

As 2019 begins, the domains of software testing and test automation are starting to see the first impacts of two technologies that will transform them fundamentally: Artificial Intelligence (AI) and Machine Learning (ML). AI will change the way you do business. This article will:

Analyze how AI will change the way that software testers work forever.
Give examples of how this new technology is being applied to testing today.
Review what’s coming in the near future and demonstrate why software testing is an ideal application for AI.

If you’re involved in software testing or app development at any level, you’ll want to read on.

Table of Contents

AI and Software Testing
AI Test Definitions
AI Screen and Element Identification
AI Test Step Sequencing
Prepare for the Future: Embrace AI

AI and Software Testing

Software testing is about to be transformed by Artificial Intelligence (AI) and Machine Learning (ML). While other aspects of software engineering have improved dramatically in the past decade, software testing still looks largely the same. Testing hasn’t changed much because testing requires human judgment, manual activities, domain knowledge, and empathy for the end user – all of these require human-level intelligence. AI is a way to build software that can replicate human-style judgment. The explosion of AI as a field, combined with affordable computing, means that AI can be applied to some of the most challenging aspects of automated testing and deliver AI-assisted testing for humans.

AI is a broad category of tech, including machine learning, where software is ‘trained’ to do basic tasks with or without human instruction. AI even includes work on Artificial General Intelligence (AGI) – the work to construct conscious - even superintelligent - machines.

Of all professions, software testing is the ripest to be automated with AI. Most applications of AI are really just pattern matching. Today, AI is being applied to many fields, from radiology, to driving cars, to making hair appointments over the phone. These applications take the output of the AI training process and apply it to specific problems, such as analyzing MRI scans, recognizing a stop sign and stopping a car, or composing conversations with hair stylists. Each of these problems consists of taking inputs and comparing the outputs to expected results. Testing is another matter altogether, though, because the basic processes of software testing are similar to the processes used to train AI in the first place. Testing is, fundamentally, the process of applying inputs to an application or system-under-test, observing the outputs, and checking those outputs against the expected values. In that sense, testing is an AI training problem. The good news is that, since the processes are so similar, all the money spent today on infrastructure and on researching AI training techniques is really an investment in AI test automation.

Training AI systems is very similar to testing. Instead of applying inputs to an application under test and measuring its outputs, AI training systems apply inputs to a neural network (or other model) and measure the outputs from that model. They then compare the output of the neural network with the expected value (the training data). This looks very similar to testing. In fact, it is testing.
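
The parallel can be made concrete with a small Python sketch. It is only an illustration: the function names, the toy "application" and the toy "model" are all assumptions made for this example and do not come from any real test or training framework. The same check loop serves both purposes.

```python
from typing import Callable, Iterable, Tuple


def check_outputs(system: Callable[[object], object],
                  cases: Iterable[Tuple[object, object]]) -> float:
    """Apply each input, observe the output, compare it with the expected value.

    Returns the fraction of cases whose output matched the expectation.
    """
    cases = list(cases)
    passed = sum(1 for given, expected in cases if system(given) == expected)
    return passed / len(cases)


# "Testing": the system is ordinary application code.
def to_upper(text: str) -> str:
    return text.upper()


test_cases = [("abc", "ABC"), ("Search", "SEARCH")]
pass_rate = check_outputs(to_upper, test_cases)


# "Training evaluation": the system is a model, the cases are labeled data.
def toy_model(button_text: str) -> str:
    # Hypothetical stand-in for a trained classifier.
    return "search_button" if "search" in button_text.lower() else "other"


labeled_data = [("Search", "search_button"), ("Cancel", "other"), ("Go", "search_button")]
accuracy = check_outputs(toy_model, labeled_data)

print(f"test pass rate: {pass_rate:.2f}, model accuracy: {accuracy:.2f}")
```

The toy model misses the "Go" button, which is the kind of mismatch a training system would use as a signal to adjust the model.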

Not only is AI training similar to testing, but most testing activities consist of quick visual inspections to determine what to do next or to determine whether the application’s functionality is correct. Modern AI is great at making such quick-twitch judgment calls. Even better, the AI can be trained on the judgment of not just one tester but on the collective wisdom of thousands of smart testers, with all that brainpower encoded in the machine.

Test automation is a combination of testing best practices and software development, so it is often as time consuming and expensive as application development. There are many types of testing. The most painful is UI (User Interface)-based functional regression testing, because the application is constantly changing and the test code needs to drive another application, not just call an API (Application Programming Interface) or a function, as in unit testing.

UI test automation is fundamentally composed of four major tasks:

Test definitions
Screen and element identification
Test step sequencing
State verification

Below, we explore how each of these aspects of test automation can be written with an AI-first approach. For each test automation task, we’ll also explore how AI improves the following aspects of testing:

Efficiency of test development
Reliability of test execution
Reduced cost of maintenance
Re-use of test artifacts across platforms and applications.

The sum of these improvements in test automation will usher in a new world of software testing. Ultimately, most test automation will be centralized thanks to re-use and AI. Most test cases will already be written for an app – before it is even implemented. Most importantly, AI-powered test automation means competitive benchmarking in quality is finally possible. AI won’t just make test automation better, faster, and cheaper, it will fundamentally revolutionize the testing profession and help standardize measures of quality as the same ‘test’ can now be performed on different platforms and even different applications.

AI Test Definitions

Manual test cases are often written in human language, as their purpose is to describe the test for people to read and execute. Most automated test case definitions, however, are written in procedural code: long, winding paths of poorly written Python or Java test scripts.

This approach is less than ideal:

Only programmers can create or modify the tests
Test code is difficult to write, test, and debug
Most of the code written is for test case setup and teardown – not the test.
Test code must be manually updated when the application changes

The astute reader may be thinking of the Cucumber test framework. On the surface, Cucumber tries to tackle this test definition problem so that tests can be constructed in a human-like language and then executed by a second system or model of the application. The reality is that Cucumber projects often fail because they still have the same problems as procedurally coded tests. Cucumber tests still need programmers to connect the human-like language of the test definition to executable code that drives the application. We’ll see below how AI can do all that magic for humans.
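
To show what that connecting code looks like, here is a minimal Python sketch of the glue layer. It does not use Cucumber or any real BDD library: the `step` decorator, the registry, and the in-memory `app_state` "application" are all hypothetical and exist only for illustration. The point is that every human-readable step still needs a hand-written function behind it.

```python
import re
from typing import Callable, Dict, List

# Hypothetical registry: step pattern -> Python function that implements it.
STEP_REGISTRY: Dict[str, Callable[..., None]] = {}

# Toy in-memory stand-in for the application under test.
app_state = {"screen": "home", "query": None, "results": []}


def step(pattern: str):
    """Register a function as the implementation of a human-readable step."""
    def register(func):
        STEP_REGISTRY[pattern] = func
        return func
    return register


@step(r'I am on the (\w+) screen')
def open_screen(name):
    app_state["screen"] = name


@step(r'I search for "(.+)"')
def search(query):
    app_state["query"] = query
    app_state["results"] = [f"{query} result {i}" for i in (1, 2)]


@step(r'I see results for "(.+)"')
def verify_results(query):
    assert app_state["results"], "no results returned"
    assert all(query in item for item in app_state["results"])


def run_scenario(lines: List[str]) -> None:
    """Match each line against the registry and call the matching function."""
    for line in lines:
        # Drop the leading Given/When/Then/And keyword before matching.
        text = re.sub(r'^(Given|When|Then|And)\s+', '', line.strip())
        for pattern, func in STEP_REGISTRY.items():
            match = re.fullmatch(pattern, text)
            if match:
                func(*match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {line!r}")


run_scenario([
    "Given I am on the search screen",
    'When I search for "shoes"',
    'Then I see results for "shoes"',
])
```

A step with no registered function, such as "When I filter by price", raises a `LookupError` until a programmer writes the missing glue.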

AI test cases need to be written in some sort of language. Ideally, these test definitions should be abstracted from the application’s implementation as far as possible to make the test case execution flexible and reusable. The test definition should also be human-readable so both machines and humans can work from the same test case definition and be free from the need to know or bother with programming languages for test creation or execution. At this point, the best candidate for such a test case definition format is the Abstract Intent Test (AIT) language, which conveniently has “AI” in its acronym. This format is an open standard and actively worked on by the AI for Software Testing Association (https://www.aitesting.org).

AIT borrows from the lessons of Cucumber and its Gherkin language, which is designed for human readability and is often used by designers or product managers to specify the functionality of an application. AIT just adds some syntactic sugar to the steps of a Gherkin scenario so that it is readily readable by machines. AIT enables test definitions to be as precise or as general as the test author likes. Steps that are left out, or obvious, are performed by the AI automatically. For those steps that are specific and necessary for the test, the AI will execute those steps exactly as declared by the test author.

AI-powered testing means that test cases are quick to write, require no programming knowledge, and can be executed across platforms and applications, just as a human would execute them. The same test case can deal with very different user interfaces, numbers of steps, platforms and apps.

Now that we can define the ideal test case for execution by AI (and humans), let’s get into the nuts and bolts of how to get AI to magically execute all these AIT test cases for us.

AI Screen and Element Identification

A key aspect of test automation is the ability to identify elements in an application. Test automation needs to find elements to interact with them via taps, swipes, and text input. Test automation also needs to find elements to verify the correct output or results of a test case: for example, finding the search result item and verifying that it is the expected result by examining the text description of the link.

When humans test an app, they can readily identify what type of screen it is, e.g. search, login, profile, etc. Humans can also do a very good job of determining if a button is a search button, if a text box is a search box, or if an image is a picture of a product.

People can even readily identify screens and elements in applications they have never seen before, even if the buttons are a bit larger, a different color, or in a different position on the page, because they have likely seen similar objects in other apps. Much like people are trained to recognize basic application screens and elements, we can teach machines to do the same.

To teach an AI how to classify a screen or element we need lots of training data. Training data is simply a large set of examples of, say, search buttons. Some may be small, some large, some red, some blue, but we need lots of examples. Many will have the word ‘search’, or ‘go’, in the text of the button. Most will be centered, or on the right-hand side of the application, and often in the middle or top near the search text box. We humans know this intuitively; we just need to give lots of examples to an ML model to ‘learn’ to recognize search buttons too.

So, how do we get this large corpus (set) of training data? There are three steps. First, write a crawler bot that will download thousands of applications (or browse websites) and take screenshots of everything. Second, break down the screenshots into individual images of buttons, text boxes, images, etc. Third, get the labels needed to train the AI. Labeling is the process of naming each individual image with a label such as ‘search_button’, ‘product_image’, or ‘shopping_cart_button’. We now have a set of training data to teach the machines to think like human testers.

How do you get labels for hundreds of thousands of images? Amazon’s Mechanical Turk is a common solution. You can pay people pennies per label. Simply send the service a list of images and define the job as ‘please pick one of the following that describe the picture’. Posting this type of job is pretty easy.

Now, armed with a bunch of training data, which is simply a large set of labeled images, we pass this data to a machine learning library such as TensorFlow or scikit-learn. The detailed mechanics of doing that are beyond the scope of this article but, basically, you just put all the images and labels into a giant array and call a function to, say, ‘train’. The AI starts by randomly guessing the correct label for each element. When it is right, it remembers how it was configured. When it is wrong, it reconfigures (changes its internal connections) and tries again. In practice, this happens hundreds of thousands of times until it can get most labels for elements correct.

A high-level view of the training process follows:

A randomly configured “brain” is generated
The system presents each image and its ‘feature values’ to the neural network and measures whether the network guesses the correct label.
If the label was correct, the network is ‘reinforced’ that the current configuration is a good one.
If the label was incorrect, the network is ‘changed’ in hopes that next time it will guess correctly.
The system repeats this process hundreds of thousands of times until the network has ‘learned’ to produce the correct results.
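
The loop above can be sketched in a few lines of plain Python. To keep it readable, this sketch swaps the neural network for a single perceptron, a much simpler model that follows the same rule: keep the configuration when the guess is right, adjust it when the guess is wrong. The three "feature values" and the six labeled examples are invented for illustration and are not real training data.

```python
import random

# Hypothetical feature values per element:
# (text says "search" or "go", shows a magnifier icon, sits next to a text box)
# Label 1 means "search_button", label 0 means "something else".
EXAMPLES = [
    ([1, 0, 1], 1),
    ([0, 1, 1], 1),
    ([1, 1, 1], 1),
    ([0, 0, 1], 0),
    ([0, 0, 0], 0),
    ([1, 0, 0], 0),
]


def predict(weights, bias, features):
    """Guess a label from the current configuration."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train(examples, epochs=1000, learning_rate=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1: start from a randomly configured "brain".
    weights = [rng.uniform(-1, 1) for _ in examples[0][0]]
    bias = rng.uniform(-1, 1)
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            # Step 2: present the example and compare the guess with the label.
            error = label - predict(weights, bias, features)
            if error != 0:
                # Step 4: wrong guess, so change the configuration.
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
            # Step 3: a correct guess leaves the configuration untouched.
        if mistakes == 0:
            break  # Step 5: stop once every example is labeled correctly.
    return weights, bias


weights, bias = train(EXAMPLES)
print([predict(weights, bias, features) for features, _ in EXAMPLES])
```

A real image classifier has vastly more parameters and uses gradient-based updates rather than this simple rule, but the shape of the loop is the same.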

After 24 hours of computation on high-end machines, the AI training system generates a neural network that can take an image as input and suggest the correct label for that screen or element, just like a human can. Perhaps better than a human. Testing the correctness of such a system is also out of scope for this article but, essentially, the trained “brain” is evaluated on a separate set of images that were not used in training, to see how accurately it performs on images it has never seen. This AI approach is known as “supervised learning” because we have supervised the learning of the system by giving it many labeled examples and letting the machine learn to reproduce the same output as humans.
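
The held-out evaluation described above might look like the following Python sketch. Everything here is an assumption made for illustration: the data is synthetic, each element is reduced to two made-up feature values, and a nearest-centroid classifier stands in for the neural network. What matters is the procedure: set some labeled examples aside, train on the rest, and measure accuracy only on the examples the model never saw.

```python
import random


def make_examples(label, center, count, rng):
    """Generate synthetic (features, label) pairs scattered around a center."""
    return [((center[0] + rng.uniform(-0.1, 0.1),
              center[1] + rng.uniform(-0.1, 0.1)), label)
            for _ in range(count)]


def fit_centroids(examples):
    """'Train' by averaging the feature values seen for each label."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def classify(centroids, point):
    """Pick the label whose centroid is closest to the point."""
    return min(centroids,
               key=lambda label: (point[0] - centroids[label][0]) ** 2
                                 + (point[1] - centroids[label][1]) ** 2)


def accuracy(centroids, examples):
    correct = sum(1 for point, label in examples
                  if classify(centroids, point) == label)
    return correct / len(examples)


rng = random.Random(42)
data = (make_examples("search_button", (0.8, 0.2), 20, rng)
        + make_examples("product_image", (0.2, 0.8), 20, rng))
rng.shuffle(data)

# Hold out 20% of the labeled examples; the model never sees them in training.
split = int(len(data) * 0.8)
train_set, test_set = data[:split], data[split:]

model = fit_centroids(train_set)
held_out_accuracy = accuracy(model, test_set)
print(f"accuracy on {len(test_set)} unseen examples: {held_out_accuracy:.2f}")
```

Real screenshots are far messier than these two tidy clusters, so real held-out accuracy will be lower and is the number worth watching.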

With all this work, we now have an alternative way of identifying parts of an application. Remember, traditional non-AI test automation finds elements by hard-coded searches in the application for a magic accessibility ID or other ID value, an XPath, or a CSS selector. AI-based element identification, however, finds elements much as the human brain does.

The AI-based approach to identifying elements is far more complex to set up but it has two distinct advantages versus traditional methods:

Speed: The engineering time it takes a human to write the code to identify an individual element can be several minutes. With an AI trained on thousands of different elements, the time to identify an element is less than a second and can be done at runtime. This represents a 10X development speed/cost improvement. Moreover, these AI classifiers can be shared between app teams, so only one or a few people on the planet need to build these classifiers and everyone else can benefit.
Robustness/Maintenance: A key problem in test automation today is that the applications are constantly changing. The color, size, location, or text of a search button may change and break test code, but to the eye of a human, or well-trained AI, the search button still looks like a search button, whereas traditional element identification will fail with even minor changes to the application. AI-based identification keeps working and doesn’t require maintenance.
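
The robustness difference can be illustrated with a toy Python sketch. The screens are plain dictionaries rather than a real UI tree, and `looks_like_search_button` is a hand-written heuristic used here only as a stand-in for a trained classifier's confidence score. All names are hypothetical.

```python
def find_by_id(elements, element_id):
    """Traditional lookup: succeeds only if the hard-coded ID still exists."""
    for element in elements:
        if element["id"] == element_id:
            return element
    raise LookupError(f"No element with id {element_id!r}")


def looks_like_search_button(element) -> float:
    """Stand-in for a trained classifier: returns a confidence from 0.0 to 1.0."""
    score = 0.0
    if element["type"] == "button":
        score += 0.5
    if element["text"].strip().lower() in {"search", "go", "find"}:
        score += 0.5
    return score


def find_by_appearance(elements, threshold=0.75):
    """AI-style lookup: pick the element that most resembles a search button."""
    best = max(elements, key=looks_like_search_button)
    if looks_like_search_button(best) < threshold:
        raise LookupError("Nothing on this screen looks like a search button")
    return best


# Version 1 of the screen, and version 2 after a redesign renamed the button.
screen_v1 = [
    {"id": "btn_search", "type": "button", "text": "Search"},
    {"id": "txt_query", "type": "textbox", "text": ""},
]
screen_v2 = [
    {"id": "searchSubmit", "type": "button", "text": "Go"},
    {"id": "txt_query", "type": "textbox", "text": ""},
]

print(find_by_id(screen_v1, "btn_search")["text"])
print(find_by_appearance(screen_v2)["id"])
```

Calling `find_by_id(screen_v2, "btn_search")` raises a `LookupError` after the redesign, while `find_by_appearance` still locates the button on both versions.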

It is worth noting that the trained AI is the collective knowledge of all people and testers that contributed to the training set of images and labels. When the AI is trained on hundreds of thousands of images with labels from hundreds of different people, it has seen more search buttons than most humans ever will in their lifetime. This means the AI might just be smarter than any single tester at identifying elements in applications.

AI Test Step Sequencing

A test case is a sequence of steps: a series of inputs and outputs. Find the search screen, enter text in the search box, click the search button, then verify the search results seem relevant to the query. We’ve seen how AI can be trained to identify the individual parts of an application such as search boxes and buttons; now we need to teach it to accomplish a task that is a series of steps.

Traditional test automation expresses test step sequences in procedural code such as Python or Java. Each step is hard-coded, step-by-step, to interact with elements in the application. Procedural code for test steps is problematic in three ways:

The test automator needs to know how to program. There are a limited number of competent programmers in the world and they are expensive. Generally speaking, test engineers are not the most experienced engineers, nor do they produce beautiful test code.
Time to develop tests. Programming is labor intensive. Programming is, ironically, manual in that each line of code must be hand crafted and it also needs to be tested.
Brittleness/maintenance. The biggest issue with procedural code is that if the flow/structure of the application changes, the test automation breaks. Often it breaks at the exact moment the team looks to the automation to verify that the application still works. A/B testing, redesign, interstitial dialogs, etc. can appear during execution and break the expectations of the procedural code.
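
The brittleness problem is easy to reproduce in a few lines of Python. `FakeApp` below is a hypothetical, deliberately tiny model of an application, not a real automation API. The scripted test hard-codes its steps, so an interstitial dialog it did not anticipate is enough to break it.

```python
class FakeApp:
    """Tiny hypothetical app: the current screen decides which actions work."""

    def __init__(self, show_promo: bool):
        self.screen = "home"
        self.show_promo = show_promo  # e.g. an A/B test or a marketing dialog
        self.results = []

    def tap(self, target: str) -> None:
        if self.screen == "home" and target == "search_tab":
            self.screen = "promo_dialog" if self.show_promo else "search"
        elif self.screen == "promo_dialog" and target == "dismiss":
            self.screen = "search"
        else:
            raise RuntimeError(f"Cannot tap {target!r} on screen {self.screen!r}")

    def type_text(self, target: str, text: str) -> None:
        if self.screen != "search" or target != "search_box":
            raise RuntimeError(f"Cannot type into {target!r} on screen {self.screen!r}")
        self.results = [f"{text} item"]


def scripted_search_test(app: FakeApp) -> None:
    """Procedural test: every step is hard-coded in a fixed order."""
    app.tap("search_tab")
    app.type_text("search_box", "shoes")
    assert app.results == ["shoes item"]


scripted_search_test(FakeApp(show_promo=False))  # passes on the expected flow

try:
    scripted_search_test(FakeApp(show_promo=True))
except RuntimeError as failure:
    print(f"Scripted test broke: {failure}")
```

The application itself still works in the second run. Only the test's assumption about the order of screens is wrong, which is exactly the maintenance burden described above.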

Prepare for the Future: Embrace AI

AI and machine learning are core technologies that can be applied in a nearly infinite array of ways to many different testing problems. Unsupervised learning is applied at Concur to automatically identify servers that are acting oddly. Ultimate Software is using AI to generate additional test cases by learning from existing test cases. King (of Candy Crush fame) uses AI to automatically test new level designs. Many testing vendors are actively figuring out how to integrate AI because they don’t want to be left behind. Others are so motivated that they claim to be using AI before they know what it means. All this is evidence that AI will transform software testing – whether we like it or not.

AI for software test automation is real, it's here, and it is running on real world apps today. Many testers will be in denial. Many testers are intimidated or confused about what “AI” really is. Many testers ask if their automation or manual jobs are in peril. Many will say “it can’t perform this edge case”, or that they are too invested in the procedural testing world to change. By analogy, cars were initially noisier, more complex, more expensive, couldn’t travel well on muddy roads, needed gas, and even ran over people, but you don’t see many horses around town anymore. AI is just as transformational a technology for the testing profession as cars were to transportation. We are in the early days of figuring out how to apply AI to testing but the revolution has begun. Regardless of what people want to believe, AI for software testing is here today and it will rapidly transform what we know as test automation in the coming years.