False positive predictions using Simple Linear Regression

Continuing our journey through stories from the software automation industry, this new chapter is dedicated to AI. In the initial phase, we will study 10 machine learning algorithms and explore their applicability in the software testing industry.

We will start with the Simple Linear Regression algorithm, since it is the easiest one to understand for an AI beginner.

What do we want from it? – To make predictions.

What can we predict in test automation? – This was actually quite a challenge: most of the time I had been thinking of AI as a new approach that would write my tests automatically while I just watched it work, so I could retire. That goal is still a long way off, and this algorithm does not seem useful for it. Still, I gave it a chance, and we found a real benefit: predicting the number of false positive tests.

As a use case: we have a data set with three columns:

  1. a weekly build
  2. total number of automated executed test cases
  3. number of failed test cases marked as false positive

Usually, false positive tests are caused by:

  • Framework issues
  • Connectivity problems
  • Bad test case design/implementation
  • etc…

As a QA automation engineer, you have to account for the effort spent on fixing false positives, and this is where Simple Linear Regression comes in: it can predict that number for you.

Introduction to Simple Linear Regression

The line for simple linear regression is defined by the following formula:

y = b0 + b1 * x

y - dependent variable
x - independent variable
b0 - intercept of the regression line
b1 - the slope

The goal is to predict the value of the variable y based on the known variable x. For our scenario, y is the number of false positive tests and x the total number of executed tests.
Please note that the focus of this article is on the algorithm's applicability and implementation, so we won't spend time on the math behind it. There are plenty of popular articles on the internet covering the theory, and I highly encourage you to read one.
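To see the line in action before we compute anything, here is a toy evaluation of the formula. The coefficient values below are made up for illustration only; the real ones for our dataset are estimated later.

```python
def predict(b0, b1, x):
    # y = b0 + b1 * x: the regression line evaluated at x
    return b0 + b1 * x

# Hypothetical coefficients: intercept 2.0, slope 0.1
print(predict(2.0, 0.1, 100))  # 12.0
```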

In terms of computing the regression coefficients b0 and b1, they can be easily estimated from the training dataset:

b1 = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum( (x(i) - mean(x))^2 )
b0 = mean(y) - b1 * mean(x)

Our dataset

Automated false positive tests

Build    Total nr. of tests    False failures
1.0.0    50                    5
1.0.1    60                    7
1.0.2    70                    5
1.0.3    85                    15
1.0.4    100                   11
1.1.0    200                   ?
1.2.0    400                   ?

In this table, we have recorded the total number of tests for several build versions, together with their false failures. Our goal is to predict the values in the cells marked with ? (builds 1.1.0 and 1.2.0).

Python implementation from scratch

Based on the table above, we are only interested in the last two columns: x is the total number of tests and y the number of false positive tests.

There are four steps to be implemented in order to achieve the prediction of y based on x.

Step 1 – Calculate the mean and the variance

# Estimate the mean and the variance for the input and output variables of the data set
# mean(x) = sum(x) / count(x)
# variance = sum( (x - mean(x))^2 )  -- the sum of squared differences from the mean

def mean(values):
    return sum(values) / float(len(values))

def variance(values, mean_value):
    return sum([(x - mean_value) ** 2 for x in values])
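As a quick sanity check, running these helpers on the total-test column of our dataset (the x values from the table above) gives:

```python
def mean(values):
    return sum(values) / float(len(values))

def variance(values, mean_value):
    return sum([(x - mean_value) ** 2 for x in values])

x = [50, 60, 70, 85, 100]  # total number of tests per build
x_mean = mean(x)
print(x_mean)               # 73.0
print(variance(x, x_mean))  # 1580.0
```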

Step 2 – Calculate the covariance

# Calculate the covariance: the sum of the products of each x value's
# difference from mean(x) and the corresponding y value's difference from mean(y).
# covariance = sum( (x(i) - mean(x)) * (y(i) - mean(y)) )

def covariance(x, mean_x, y, mean_y):
    covariance_ = 0.0
    for i in range(len(x)):
        covariance_ += (x[i] - mean_x) * (y[i] - mean_y)

    return covariance_
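Checking the covariance on both columns of our dataset (reusing the mean helper from Step 1):

```python
def mean(values):
    return sum(values) / float(len(values))

def covariance(x, mean_x, y, mean_y):
    covariance_ = 0.0
    for i in range(len(x)):
        covariance_ += (x[i] - mean_x) * (y[i] - mean_y)
    return covariance_

x = [50, 60, 70, 85, 100]  # total number of tests per build
y = [5, 7, 5, 15, 11]      # false failures per build
print(covariance(x, mean(x), y, mean(y)))  # 256.0 (up to float rounding)
```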

Step 3 – Estimate the coefficients

# Estimate the coefficients
# b1 = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum( (x(i) - mean(x))^2 )
# b1 = covariance(x, y) / variance(x)
# b0 = mean(y) - b1 * mean(x)

def coefficients(data_set):
    x = [row_[0] for row_ in data_set]
    y = [row_[1] for row_ in data_set]
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
    b0 = y_mean - b1 * x_mean
    return [b0, b1]
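Putting Steps 1–3 together on our dataset (the same function bodies as above, repeated here so the snippet runs standalone), the estimated coefficients come out to roughly b0 ≈ -3.228 and b1 ≈ 0.162:

```python
def mean(values):
    return sum(values) / float(len(values))

def variance(values, mean_value):
    return sum([(x - mean_value) ** 2 for x in values])

def covariance(x, mean_x, y, mean_y):
    covariance_ = 0.0
    for i in range(len(x)):
        covariance_ += (x[i] - mean_x) * (y[i] - mean_y)
    return covariance_

def coefficients(data_set):
    x = [row_[0] for row_ in data_set]
    y = [row_[1] for row_ in data_set]
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
    b0 = y_mean - b1 * x_mean
    return [b0, b1]

data = [[50, 5], [60, 7], [70, 5], [85, 15], [100, 11]]
b0, b1 = coefficients(data)
print(round(b0, 3), round(b1, 3))  # -3.228 0.162
```

A negative intercept simply means the fitted line crosses zero false failures somewhere above x = 0; only the predictions for realistic test counts matter.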

Step 4 – Get predictions

# Simple linear regression
def simple_linear_regression(train, test):
    predictions = list()
    b0, b1 = coefficients(train)
    for r in test:
        y = b0 + b1 * r[0]
        predictions.append(y)
    return predictions

Execution on our data set

simple_linear_regression([[50, 5], [60, 7], [70, 5], [85, 15], [100, 11]],
                         [[200, None], [400, None]])

Output (predictions filled into the table, truncated to whole tests):

Build    Total nr. of tests    False failures
1.0.0    50                    5
1.0.1    60                    7
1.0.2    70                    5
1.0.3    85                    15
1.0.4    100                   11
1.1.0    200                   29
1.2.0    400                   61

Conclusion

That's all about the simple linear regression algorithm and its use in the software automation industry. More than that, we've followed the implementation from scratch based on the math formulas, without touching SciPy or any other Python package.
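If you do happen to have NumPy installed, `numpy.polyfit` with degree 1 fits the same least-squares line and makes a convenient cross-check of the from-scratch coefficients (this snippet is an optional verification, not part of the original implementation):

```python
import numpy as np

x = [50, 60, 70, 85, 100]
y = [5, 7, 5, 15, 11]

# polyfit returns coefficients highest degree first: [slope, intercept]
b1, b0 = [float(c) for c in np.polyfit(x, y, 1)]
print(round(b0, 3), round(b1, 3))  # -3.228 0.162

# Same prediction for build 1.1.0 as the from-scratch version
pred = b0 + b1 * 200
print(round(pred))  # 29
```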

More details about the implementation can be found here.
