Continuing our journey through stories from the software automation industry, this new chapter is dedicated to AI. In this initial phase, we will study ten machine learning algorithms and explore their applicability in the software testing industry.

We will start with the Simple Linear Regression algorithm, as it is the easiest one to understand for an AI beginner.

What do we want from it? – To make predictions.

What can we predict in test automation? – Honestly, this was quite a challenge: most of the time I had been thinking of AI as a new approach that would write tests automatically and just work for me so I could retire. That is still a long way off, and this algorithm is not useful for that scenario. Still, I gave it a chance, and we found a real benefit: predicting the number of false positive tests.

As a use case: we have a data set with three columns:

- a weekly build
- total number of automated executed test cases
- number of failed test cases marked as false positive

Usually, false positive tests are caused by:

- Framework issues
- Connectivity problems
- Bad test case design/implementation
- etc…

As a QA automation engineer, you have to estimate the effort spent on fixing false positives, and this is where Simple Linear Regression can do the work for you.

## Introduction to Simple Linear Regression

The line for simple linear regression is defined by the following formula:

```
y = b0 + b1 * x

y  - dependent variable
x  - independent variable
b0 - intercept of the regression line
b1 - the slope
```

The goal is to predict the value of the y variable based on the known variable x. For our scenario, y would be the number of false positive tests and x the total number of executed tests.

Please note that the focus of this article is on the algorithm's applicability and implementation, so we won't spend time on the math behind it. There are plenty of popular articles on the internet covering it, so I highly encourage you to read one. Check my favorite.

The regression coefficients b0 and b1 can be easily estimated from the training dataset:

```
b1 = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum( (x(i) - mean(x))^2 )
b0 = mean(y) - b1 * mean(x)
```
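These two formulas can be sanity-checked with a tiny made-up series (the numbers below are illustrative only, not part of our dataset): for a perfectly linear series y = 2 * x, the slope should come out as 2 and the intercept as 0.

```python
x = [1, 2, 3]
y = [2, 4, 6]  # exactly y = 2 * x

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# b1 = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum((x(i) - mean(x))^2)
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
# b0 = mean(y) - b1 * mean(x)
b0 = mean_y - b1 * mean_x

print(b1, b0)  # 2.0 0.0
```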

## Our dataset

Build | Total nr. of tests | False failures |
---|---|---|
1.0.0 | 50 | 5 |
1.0.1 | 60 | 7 |
1.0.2 | 70 | 5 |
1.0.3 | 85 | 15 |
1.0.4 | 100 | 11 |
1.1.0 | 200 | ? |
1.2.0 | 400 | ? |

In this table, we have recorded the total number of tests for different build versions, together with their false failures. Our goal is to predict the values in the cells marked with ? (for builds 1.1.0 and 1.2.0).

## Python implementation from scratch

From the table above, we are only interested in the last two columns: x is the total number of tests and y the number of false failures.

There are four steps to be implemented in order to achieve the prediction of y based on x.

Step 1 – Calculate the mean and the variance

```
# Estimate the mean and the variance for the input and output variables of the data set
# mean(x) = sum(x)/count(x)
def mean(values):
    return sum(values) / float(len(values))

# Here "variance" is the sum of squared differences from the mean:
# variance = sum( (x - mean(x))^2 )
def variance(values, mean_value):
    return sum([(x - mean_value) ** 2 for x in values])
```
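As a quick cross-check against the standard library, using the x column of our dataset: note that variance() above returns the *sum* of squared deviations, not the averaged statistical variance, so it equals `statistics.pvariance` times the number of values.

```python
import statistics

x = [50, 60, 70, 85, 100]  # total number of tests per build, from the table above

print(statistics.mean(x))                # 73, same as mean(x) above
print(sum((v - 73) ** 2 for v in x))     # 1580, the sum of squared deviations
print(statistics.pvariance(x) * len(x))  # 1580, same quantity via the stdlib
```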

Step 2 – Calculate the covariance

```
# Calculate the covariance: the sum of the products of each x value's and
# y value's differences from their respective means.
# covariance = sum( (x(i) - mean(x)) * (y(i) - mean(y)) )
def covariance(x, mean_x, y, mean_y):
    covariance_ = 0.0
    for i in range(len(x)):
        covariance_ += (x[i] - mean_x) * (y[i] - mean_y)
    return covariance_
```
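Applying the same computation to the two columns of our dataset (inlined here so the snippet runs on its own) gives a covariance of 256.

```python
x = [50, 60, 70, 85, 100]  # total number of tests
y = [5, 7, 5, 15, 11]      # false failures

mean_x = sum(x) / len(x)   # 73.0
mean_y = sum(y) / len(y)   # 8.6

cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
print(cov)  # 256.0 (up to floating-point rounding)
```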

Step 3 – Estimate the coefficients

```
# Estimate the coefficients
# b1 = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum( (x(i) - mean(x))^2 )
# b1 = covariance(x, y) / variance(x)
# b0 = mean(y) - b1 * mean(x)
def coefficients(data_set):
    x = [row_[0] for row_ in data_set]
    y = [row_[1] for row_ in data_set]
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
    b0 = y_mean - b1 * x_mean
    return [b0, b1]
```
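With the coefficient formulas inlined (so this snippet runs on its own), our dataset yields roughly b1 = 0.162 and b0 = -3.2278:

```python
data_set = [[50, 5], [60, 7], [70, 5], [85, 15], [100, 11]]
x = [row[0] for row in data_set]
y = [row[1] for row in data_set]

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

# b1 = covariance(x, y) / variance(x)
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
# b0 = mean(y) - b1 * mean(x)
b0 = mean_y - b1 * mean_x

print(round(b0, 4), round(b1, 4))  # -3.2278 0.162
```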

Step 4 – Get predictions

```
# Simple linear regression: fit the line on train, then predict y for each test row
def simple_linear_regression(train, test):
    predictions = list()
    b0, b1 = coefficients(train)
    for r in test:
        y = b0 + b1 * r[0]
        predictions.append(y)
    return predictions
```

## Execution on our data set

```
predictions = simple_linear_regression(
    [[50, 5], [60, 7], [70, 5], [85, 15], [100, 11]],
    [[200, None], [400, None]])
```
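For reference, here is a condensed, self-contained version of the same pipeline (the four steps folded into one function), which shows the raw predictions that the table below truncates to whole tests:

```python
# Condensed version of the four steps above, for a standalone check.
def fit_predict(train, xs):
    x = [row[0] for row in train]
    y = [row[1] for row in train]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    return [b0 + b1 * v for v in xs]

preds = fit_predict([[50, 5], [60, 7], [70, 5], [85, 15], [100, 11]], [200, 400])
print([round(p, 2) for p in preds])  # [29.18, 61.58]
```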

Output:

Build | Total nr. of tests | False failures |
---|---|---|
1.0.0 | 50 | 5 |
1.0.1 | 60 | 7 |
1.0.2 | 70 | 5 |
1.0.3 | 85 | 15 |
1.0.4 | 100 | 11 |
1.1.0 | 200 | 29 |
1.2.0 | 400 | 61 |

## Conclusion

That’s all about the simple linear regression algorithm and its usage in the software automation industry. Moreover, we followed an implementation from scratch based on the math formulas, without even touching SciPy or any other Python package.

More details about the implementation can be found here.