The 7 Most Common Machine Learning Loss Functions (2024)

The loss function is a method of evaluating how well your machine learning algorithm models your data set. In other words, loss functions measure how good your model is at predicting the expected outcome.

The terms cost function and loss function refer to the same context (i.e. the training process that uses backpropagation to minimize the error between the actual and predicted outcome). We calculate the cost function as the average of all loss function values, whereas we calculate the loss function for each sample output compared to its actual value.

The loss function is directly related to the predictions of the model you’ve built. If your loss function value is low, your model will provide good results. The loss function (or rather, the cost function) you use to evaluate the model performance needs to be minimized to improve its performance.

What Are Loss Functions in Machine Learning?

Broadly speaking, loss functions fall into two major categories, corresponding to the two types of problems we come across in the real world: classification and regression. In classification problems, our task is to predict the respective probabilities of all the classes the problem deals with. In regression, our task is to predict a continuous value from a given set of independent features.

Notation for Loss Functions

  • n (or m) — the number of training samples
  • i — the ith training sample in a data set
  • y(i) — the actual value for the ith training sample
  • y_hat(i) — the predicted value for the ith training sample

Loss Functions for Classification

Types of Classification Losses

  1. Binary Cross-Entropy Loss / Log Loss
  2. Hinge Loss

1. Binary Cross-Entropy Loss / Log Loss

This is the most common loss function used in classification problems. The cross-entropy loss decreases as the predicted probability converges to the actual label. It measures the performance of a classification model whose predicted output is a probability value between 0 and 1.

When the number of classes is 2, it’s binary classification.

Loss = -(1/n) * Σ [ y(i) * log(y_hat(i)) + (1 - y(i)) * log(1 - y_hat(i)) ]

When the number of classes is more than 2, it’s multi-class classification.

Loss = -(1/n) * Σ_i Σ_c y(i, c) * log(y_hat(i, c)), summing over all classes c

We derive the cross-entropy loss formula from the likelihood function by taking its negative logarithm.
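To make the formula concrete, here's a minimal plain-Python sketch of binary cross-entropy (the function name and the clipping epsilon are my own choices, not from any particular library):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood over all samples.

    y_true holds 0/1 labels; y_pred holds predicted probabilities.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

As the predicted probabilities converge to the actual labels, the loss shrinks toward zero; confident wrong predictions are penalized heavily because log(p) blows up near p = 0.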

2. Hinge Loss

The second most common loss function used for classification problems and an alternative to the cross-entropy loss function is hinge loss, primarily developed for support vector machine (SVM) model evaluation.

Loss = (1/n) * Σ max(0, 1 - y(i) * y_hat(i))

Hinge loss penalizes wrong predictions as well as right predictions that are not confident. It's primarily used with SVM classifiers, with class labels encoded as -1 and 1, so make sure you convert any 0/1 class labels to -1/1 before using it.
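A plain-Python sketch of hinge loss (the function name is my own); labels are assumed to already be -1 or +1, and scores are the raw classifier outputs:

```python
def hinge_loss(y_true, scores):
    """Mean hinge loss over all samples; labels must be -1 or +1."""
    return sum(max(0.0, 1 - y * s) for y, s in zip(y_true, scores)) / len(y_true)
```

A correct prediction with margin y * s >= 1 contributes zero loss; a correct but unconfident prediction is penalized a little, and a wrong prediction is penalized a lot.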

Loss Functions for Regression

Types of Regression Losses

  1. Mean Square Error / Quadratic Loss / L2 Loss
  2. Mean Absolute Error / L1 Loss
  3. Huber Loss / Smooth Mean Absolute Error
  4. Log-Cosh Loss
  5. Quantile Loss

1. Mean Square Error / Quadratic Loss / L2 Loss

We define MSE loss function as the average of squared differences between the actual and the predicted value. It’s the most commonly used regression loss function.

MSE = (1/n) * Σ (y(i) - y_hat(i))²

The corresponding cost function is the mean of these squared errors (MSE). The MSE loss function penalizes the model for making large errors by squaring them, and this property makes it less robust to outliers. Therefore, you shouldn't use it if the data is prone to many outliers.
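A plain-Python sketch of MSE (function name mine) that makes the outlier sensitivity easy to see:

```python
def mse(y_true, y_pred):
    """Mean of squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
```

On errors of 1, 1 and 10, the single outlier contributes 100 of the 102 total squared error, so it dominates the loss.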

2. Mean Absolute Error / L1 Loss

We define MAE loss function as the average of absolute differences between the actual and the predicted value. It’s the second most commonly used regression loss function. It measures the average magnitude of errors in a set of predictions, without considering their directions.

MAE = (1/n) * Σ |y(i) - y_hat(i)|

The corresponding cost function is the mean of these absolute errors (MAE). The MAE loss function is more robust to outliers compared to the MSE loss function. Therefore, you should use it if the data is prone to many outliers.
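The corresponding plain-Python sketch (function name mine again):

```python
def mae(y_true, y_pred):
    """Mean of absolute differences between actual and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)
```

On the same errors of 1, 1 and 10, the outlier now contributes only 10 of the 12 total absolute error, which is why MAE is less swayed by outliers than MSE.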

3. Huber Loss / Smooth Mean Absolute Error

The Huber loss function combines the MSE and MAE loss functions: it approaches MSE as δ → 0 and MAE as δ → ∞. It is mean absolute error that becomes quadratic when the error is small. How small the error must be for the loss to become quadratic is controlled by a hyperparameter, δ (delta), that you can tune.

L_δ = (1/2) * (y(i) - y_hat(i))² if |y(i) - y_hat(i)| ≤ δ, and δ * |y(i) - y_hat(i)| - (1/2) * δ² otherwise

The choice of the delta value is critical because it determines what you're willing to consider an outlier. Hence, the Huber loss function can be less sensitive to outliers than the MSE loss function, depending on the hyperparameter value. Therefore, you can use the Huber loss function if the data is prone to outliers. Keep in mind that you might need to tune the hyperparameter delta, which is an iterative process.
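A plain-Python sketch of the piecewise definition (function name and default delta are my own choices):

```python
def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = abs(y - p)
        if err <= delta:
            total += 0.5 * err ** 2               # MSE-like region
        else:
            total += delta * (err - 0.5 * delta)  # MAE-like region
    return total / len(y_true)
```

Note that the two branches agree exactly at |error| = δ, so the loss stays continuous; an error of 10 with δ = 1 costs only 9.5 rather than the 50 a squared loss would charge.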

4. Log-Cosh Loss

The log-cosh loss function is defined as the logarithm of the hyperbolic cosine of the prediction error. It's another function used in regression tasks that's much smoother than MSE loss. It has all the advantages of Huber loss and, unlike Huber loss, it's twice differentiable everywhere. This matters because some learning algorithms, like XGBoost, use Newton's method to find the optimum and therefore need the second derivative (the Hessian).

Loss = (1/n) * Σ log(cosh(y_hat(i) - y(i)))

log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. This means that log-cosh works mostly like the mean squared error but is not as strongly affected by the occasional wildly incorrect prediction.
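A small plain-Python sketch (the helper name is mine) that also lets you check the two approximations above numerically:

```python
import math

def log_cosh(y_true, y_pred):
    """Mean log-cosh of the prediction errors."""
    return sum(math.log(math.cosh(p - y)) for y, p in zip(y_true, y_pred)) / len(y_true)
```

For a tiny error like 0.01 the loss is almost exactly x²/2, and for a large error like 20 it is almost exactly |x| - log(2), matching the two regimes described above.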

5. Quantile Loss

A quantile is the value below which a given fraction of samples in a group falls. Machine learning models work by minimizing (or maximizing) an objective function. As the name suggests, we apply the quantile regression loss function to predict quantiles. For a set of predictions, the loss is their average.

Loss_γ = Σ over i with y(i) < y_hat(i) of (γ - 1) * |y(i) - y_hat(i)| + Σ over i with y(i) ≥ y_hat(i) of γ * |y(i) - y_hat(i)|

The quantile loss function turns out to be useful when we're interested in predicting an interval instead of only point predictions.
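A plain-Python sketch of the quantile (pinball) loss for a single quantile γ (function and parameter names are my own):

```python
def quantile_loss(y_true, y_pred, gamma=0.5):
    """Mean pinball loss for quantile gamma in (0, 1).

    Under-predictions (y > y_hat) are weighted by gamma;
    over-predictions (y < y_hat) by (1 - gamma).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = y - p
        total += gamma * err if err >= 0 else (gamma - 1) * err
    return total / len(y_true)
```

With γ = 0.9, under-predicting is penalized nine times as heavily as over-predicting, which pushes the model toward the 90th percentile; with γ = 0.5, the loss is simply half of MAE, recovering the median.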

Why Loss Functions in Machine Learning Are Important

As mentioned, loss functions help gauge how a machine learning model is performing with its given data and how well it's able to predict an expected outcome. Many machine learning algorithms use loss functions in the optimization process during training to evaluate and improve their output accuracy. Minimizing a chosen loss function during optimization also helps determine the best model parameters for the given data.
