Performance Metrics for Regression Algorithms

Machine learning is the field of study concerned with understanding and building methods that learn from data. A core objective of any machine learning algorithm is to generalize from experience. Generalization is the ability to perform accurately on new, unseen data after learning from a training data set.

Supervised learning algorithms build a mathematical model from a set of data that contains both the inputs and the desired outputs. The data used to train the model is known as training data and consists of training examples.

Sometimes, a **hypothesis** (model function) may have a low error on the training examples but still be inaccurate on new data because of overfitting. To evaluate the hypothesis properly, we can split the data into two datasets.

  • Training data, i.e., 70% of the data.

  • Testing data, i.e., the remaining 30% of the data.

The train_test_split function from the sklearn.model_selection module can be used to split the dataset into training and testing sets. We can then train the model on the training data and evaluate its performance on the testing data.

To learn more about regression analysis, you can read my article ML Algorithm: Linear Regression from scratch using gradient descent, where I build linear regression from scratch using Python and the NumPy library. In this article, we will use the scikit-learn library for model training and evaluation.

A regression algorithm predicts continuous output values, such as integers or floating-point values, in contrast to classification, which predicts a category or class label.

To measure the performance of a regression algorithm, we want to know how close the predictions are to the actual values, so we compute the difference between the observed and predicted values, i.e., the prediction error.

We will learn different metrics to measure the performance of regression algorithms using the metric functions of the scikit-learn library. For demonstration, we will use the diabetes dataset from the sklearn.datasets module.

Let’s start by importing the required libraries and the dataset. After that, we split the data into training and testing sets in a 70:30 ratio. Once the model is trained, we are ready to evaluate its performance using the metrics.

# Import libraries
import pandas as pd
import numpy as np

# Datasets
from sklearn.datasets import load_diabetes
# load data
X, y = load_diabetes(return_X_y=True, as_frame=True)

# To split the data
from sklearn.model_selection import train_test_split
# split the data into 70% of training data and 30% of testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    train_size=0.7, 
                                                    random_state=42)

# Model training
from sklearn.linear_model import LinearRegression
# Create the Linear Regression estimator
LR_estimator = LinearRegression()

# Fit the model on the training data
LR_estimator.fit(X_train, y_train)

# Predict the outputs for the testing data
y_preds = LR_estimator.predict(X_test)
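
Before computing any metric, it can help to eyeball a few predictions against the actual values. Here is a minimal sketch (the column names are arbitrary, chosen only for readability):

# Quick sanity check: place a few actual and predicted values side by side.
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_preds})
print(comparison.head())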

Types of Regression Metrics

  • Mean Absolute Error (MAE)

  • Mean Squared Error (MSE)

  • Root Mean Squared Error (RMSE)

  • R2 score

  • Adjusted R2 score

These are some of the most commonly used metrics to evaluate model performance. Many other regression metrics are available; you can find the full list supported by scikit-learn in its metrics documentation.

1. Mean Absolute Error (MAE)

Mean absolute error is the average of the absolute differences between the actual values and the values predicted by a regression model.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y_i}|$$

here,

n = number of observations

y_i = actual values

y_hat = predicted values

Key Points

  • The unit of the mean absolute error is the same as the unit of the response variable, which makes it easy to report. However, because it is scale-dependent, it cannot on its own be used to compare models whose response variables are measured on different scales.

  • The mean absolute error is relatively robust to outliers, which means that the metric is not heavily affected by a few extreme errors.

  • The mean absolute error is not differentiable at zero, which makes it less convenient than MSE as a loss function for gradient-based optimization.

# Import `mean_absolute_error` from the sklearn.metrics module.
from sklearn.metrics import mean_absolute_error

# Mean Absolute Error
mae = mean_absolute_error(y_test, y_preds)
print("Mean Absolute Error: %.3f" % mae)
Mean Absolute Error: 41.919
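
To connect the library call back to the formula, the same value can be reproduced by hand with NumPy (a minimal sketch):

# MAE by hand: mean of the absolute differences between actual and predicted values.
mae_manual = np.mean(np.abs(y_test - y_preds))
print("Mean Absolute Error (manual): %.3f" % mae_manual)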

2. Mean Squared Error (MSE)

Mean squared error is the average of the squared differences between the actual values and the values predicted by the regression model.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2$$

Key points

  • The unit of the mean squared error is the square of the unit of the response variable, which makes the value harder to interpret on its own; it is, however, a useful metric for comparing the performance of different estimators on the same data.

  • The mean squared error is sensitive to outliers. Because each error term is squared, large errors (outliers) are penalized much more heavily than small errors, as the toy example after this list illustrates.

  • It is a popular choice for a loss function since it is differentiable everywhere.

  • The best possible MSE is 0, which is rarely achievable in practice. MSE is always non-negative and decreases as the errors approach zero.
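
As a quick illustration of the sensitivity to outliers, here is a toy example with made-up numbers; one large error dominates MSE but barely moves MAE:

# Toy example (made-up numbers): one large error dominates MSE but not MAE.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10, 12, 11, 13, 12])
y_bad = np.array([11, 13, 10, 14, 30])    # the last prediction is far off

print(mean_absolute_error(y_true, y_bad))  # (1 + 1 + 1 + 1 + 18) / 5 = 4.4
print(mean_squared_error(y_true, y_bad))   # (1 + 1 + 1 + 1 + 324) / 5 = 65.6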

# Import `mean_squared_error` from the sklearn.metrics module.
from sklearn.metrics import mean_squared_error

# Mean Squared Error
mse = mean_squared_error(y_test, y_preds)
print("Mean Squared Error: %.3f" % mse)
Mean Squared Error: 2821.751

3. Root Mean Squared Error (RMSE)

Root mean squared error is an extension of MSE: it is the square root of the average squared difference between the actual values and the values predicted by the estimator.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2}$$

Key points

  • The unit of the root mean squared error is the same as the unit of the response variable, so it can be used to interpret the results or report performance.

  • Root mean squared error is sensitive to outliers. The effect of each error on RMSE is proportional to the size of the squared error; thus, larger errors have a disproportionately large effect on RMSE.

  • RMSE is always non-negative and decreases as the errors approach zero. An RMSE of 0 would indicate a perfect fit to the data, which is rarely possible in practice.

# for RMSE we will use 'mean_squared_error'.
from sklearn.metrics import mean_squared_error

# first compute MSE
mse = mean_squared_error(y_test, y_preds)
# To calculate RMSE, take the square root of MSE
rmse = np.sqrt(mse)
print("Root Mean Squared Error: %.3f" %rmse)
Root Mean Squared Error: 53.120
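
Depending on your scikit-learn version, RMSE can also be computed directly rather than taking the square root yourself. A hedged sketch covering both newer and older releases:

# Newer scikit-learn releases expose RMSE directly; older ones use the `squared` flag.
try:
    from sklearn.metrics import root_mean_squared_error
    rmse_direct = root_mean_squared_error(y_test, y_preds)
except ImportError:
    rmse_direct = mean_squared_error(y_test, y_preds, squared=False)
print("Root Mean Squared Error (direct): %.3f" % rmse_direct)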

4. R2 score

The R2 (R-squared) score is also known as the coefficient of determination. It measures the goodness of fit of a model, i.e., how well the regression predictions approximate the actual data. It helps you answer the question, “What percentage of the total variation in Y (the actual values) is explained by the regression line (the predicted values)?”

$$ss_{mean} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$ss_{RegLine} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2$$

$$R^2 = 1 - \frac{ss_{RegLine}}{ss_{mean}}$$

Key points

  • The R2 score is often more informative than other metrics, as the evaluation is expressed as a proportion (or percentage). An R2 score of 1 (100%) indicates that the regression predictions perfectly fit the data. For a reasonable model the score usually lies between 0 and 1, and a value closer to 1 indicates better performance.

  • If the residual sum of squares (ss_RegLine, the error term) is close to 0, the R2 score will be close to 1. If the residual sum of squares is close to the total sum of squares (ss_mean), the R2 score will be small or close to 0.

  • The R2 score can also be negative. This occurs when the estimator does not fit the data properly, for example when an ill-suited model is used; in that case, simply predicting the mean of the data would perform better than the fitted estimator.

from sklearn.metrics import r2_score

# r2 score
r_squared = r2_score(y_test, y_preds)
print("r-squared score: %0.3f" %(r_squared))
r-squared score: 0.477
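
To tie this back to the sum-of-squares definitions above, here is a minimal sketch computing the same score by hand:

# R2 by hand: 1 minus the ratio of residual to total sum of squares.
ss_regline = np.sum((y_test - y_preds) ** 2)       # residual sum of squares
ss_mean = np.sum((y_test - np.mean(y_test)) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_regline / ss_mean
print("r-squared score (manual): %0.3f" % r2_manual)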

5. Adjusted R2 score

A disadvantage of the R2 score is that it tends to increase as new variables or features are added. Even when we add an irrelevant variable, the R2 score might stay constant or increase, which can happen when an estimator overfits the data. To account for this, we can use the adjusted R2 score, which corrects for the number of independent variables (the degrees of freedom).

$$Adjusted R^2 = 1 - [\frac{(n-1)}{(n-k-1)} (1-R^2)]$$

here,

n=number of observations

k=number of independent variables.

R2=R squared score

The adjusted R2 score is always less than the R2 score, as it penalizes additional predictors and only rewards features that genuinely improve the performance of an estimator.

# Calculate adjusted R2_score using r2_score function.
from sklearn.metrics import r2_score

# r2 score
r_squared = r2_score(y_test, y_preds)

# compute the `adjusted R2` using above formula
n, k = X.shape 
Adj_Rsquared = 1 - ((1-r_squared)*(n-1)/(n-k-1))
print("Adjusted r-squared score: %0.3f" %(Adj_Rsquared))
Adjusted r-squared score: 0.465
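
Note that the snippet above takes n and k from the full dataset X. Whether to use the full-dataset or the test-split dimensions is a judgment call; if you prefer the adjustment to reflect the test split on which the R2 score was computed, you could instead use something like this (variable names are only illustrative):

# Alternative: adjust using the dimensions of the test split instead of the full dataset.
n_test, k = X_test.shape
adj_r2_test = 1 - ((1 - r_squared) * (n_test - 1) / (n_test - k - 1))
print("Adjusted r-squared (test split): %0.3f" % adj_r2_test)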

Conclusion

I hope you now understand the importance of performance metrics in model evaluation. This article explained some of the most common metrics for regression algorithms.

One important thing to note is that the selection of metrics for model evaluation depends on the type of data or the problem you’re trying to solve.

You can always try these metrics in combination with different algorithms and compare the results on validation and testing data.

Keep experimenting!

That’s it for now. Thank you for reading.😊
