Understanding L1 and SmoothL1Loss

Som
6 min read · Aug 20, 2023

While training machine learning or deep learning models, choosing an appropriate loss function is a crucial step. Not only does the loss help our model learn, it also acts as an indicator of how the model is performing epoch by epoch. In this post, I’ll discuss what SmoothL1Loss is and where we can use it. What changes when we use SmoothL1Loss instead of L1 loss? How does it behave as the error increases or decreases?


Prerequisites

Before going in-depth on nn.SmoothL1Loss, we first have to understand:

What is L1 loss, how can we calculate it, and where can we use it?

What is L1 loss?

L1 loss is also known as mean absolute loss or mean absolute error (MAE). It’s simply the average of the absolute differences between the actual and predicted values.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - x_i \rvert$$

Here,
MAE = mean absolute error
y_i = predicted value
x_i = true value
n = number of values
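To make the formula concrete, here is a minimal sketch in plain Python (no PyTorch needed). The function name `mean_absolute_error` and the sample numbers are made up for illustration; they are not part of the original notebook.

```python
def mean_absolute_error(predictions, targets):
    """Average of |y_i - x_i| over all n pairs (hypothetical helper)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    n = len(predictions)
    return sum(abs(y_i - x_i) for y_i, x_i in zip(predictions, targets)) / n


# Made-up sample values: the absolute errors are 1, 0 and 2
predictions = [2.0, 4.0, 6.0]
targets = [1.0, 4.0, 8.0]
mae = mean_absolute_error(predictions, targets)
print(mae)  # (1 + 0 + 2) / 3 = 1.0
```

Each pair contributes its absolute error, and dividing by n turns the sum into an average, which is the quantity `torch.nn.L1Loss` reports with its default `reduction="mean"`.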

What does L1 loss or MAE indicate?

The mean absolute error indicates how far, on average, our model’s predictions deviate from the actual values. Let’s take an example and understand it further.

import torch
import numpy as np
from tqdm import tqdm
import plotly.express as px # for visualisation
import plotly.io as pio
pio.renderers.default = 'iframe'

class LinearRegressor(torch.nn.Module):
    """
    Building a simple linear regression model with pytorch
    """
    def __init__(self):
        super(LinearRegressor, self).__init__()
        self.l1 = torch.nn.Linear(1, 1)  # setting up linear layer of 1 input and 1 output

    def forward(self, x):
        out = self.l1(x)
        return out

Let’s understand the mean absolute error with some sample data

# Sample data: the integers from 1 to 19 as inputs and 20 times each input as the target.
# This is enough for understanding the loss function.
X = np.arange(1, 20)
y = X * 20

Now let’s build a model and see how the L1 loss varies across the epochs

model = LinearRegressor()
loss_function = torch.nn.L1Loss()

def train_model(model, loss_function):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
    losses = []
    for epoch in tqdm(range(200)):  # let's train this model for 200 epochs
        loss_output = []
        for x_val, y_val in zip(X, y):
            # input and target are both float32 so they match the model output
            inputs = torch.tensor([x_val], dtype=torch.float32)
            target = torch.tensor([y_val], dtype=torch.float32)
            preds = model(inputs)
            loss = loss_function(preds, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_output.append(loss.item())
        losses.append(np.mean(loss_output))  # mean loss over this epoch
        if epoch % 100 == 0:
            print(f"loss after {epoch}/200 epochs is {losses[-1]}")
    return losses

loss_output = train_model(model, loss_function)
px.line(y=loss_output, title="L1 loss across 200 epochs")

As we can see from the output, the L1 loss (i.e. MAE) can take any value in the range [0, inf). As our model learns from the training data, the loss decreases from 200 to 0.073. The model should now predict almost the right answer for a value that it has not seen during training. Here the L1 loss indicates that the model has learnt the linear nature of our training data. Let’s ask it to predict and measure the error of the predicted output.

Let’s try to predict for the values `20 and 21`; the ideal answers are `400 and 420`.

out1 = model(torch.tensor([20]).to(torch.float32))
out2 = model(torch.tensor([21]).to(torch.float32))
out1, out2
# targets are float tensors with the same shape as the predictions
loss_function(out1, torch.tensor([400.0])), loss_function(out2, torch.tensor([420.0]))

As we can see from the above outputs, the model predicted 400.5421 for 20 and 420.5483 for 21. If we calculate the `L1Loss` with our loss function, we get 0.5421 and 0.5483, which are the absolute differences.

Now let’s discuss SmoothL1Loss

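This loss switches its behaviour based on a threshold called beta. When the absolute error is less than beta, it uses a squared term, like the mean squared error loss. When the absolute error is greater than or equal to beta, it acts like the L1 loss shifted by a constant. Because large errors are penalised linearly rather than quadratically, it is less sensitive to outliers than the mean squared error loss, and because it is quadratic near zero, it stays smooth around zero, unlike the L1 loss.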

Logically, smooth L1 loss looks like this

if |prediction - ground truth| < beta:
    loss = 0.5 * (prediction - ground truth)^2 / beta
else:
    loss = |prediction - ground truth| - 0.5 * beta

In the default PyTorch implementation, the value of beta is 1.0, so the squared branch reduces to (prediction - ground truth)^2 / 2.
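The piecewise rule can be sketched in plain Python as follows. This is an illustrative re-implementation based on the formula documented for `torch.nn.SmoothL1Loss`, not PyTorch’s own code; the function name `smooth_l1_loss` and the sample values are assumptions made for this example, and the averaging mirrors the default `reduction="mean"`.

```python
def smooth_l1_loss(predictions, targets, beta=1.0):
    """Mean smooth L1 loss over paired values (illustrative sketch)."""
    if beta <= 0:
        raise ValueError("this sketch assumes beta > 0")
    per_item = []
    for prediction, target in zip(predictions, targets):
        abs_error = abs(prediction - target)
        if abs_error < beta:
            # quadratic branch: small errors are squared and scaled by 1 / beta
            per_item.append(0.5 * abs_error ** 2 / beta)
        else:
            # linear branch: L1 loss shifted down by beta / 2
            per_item.append(abs_error - 0.5 * beta)
    return sum(per_item) / len(per_item)


# Made-up values: absolute errors of 0.5 and 2.0
preds = [0.5, 3.0]
actuals = [0.0, 1.0]
print(smooth_l1_loss(preds, actuals))            # (0.125 + 1.5) / 2 = 0.8125
print(smooth_l1_loss(preds, actuals, beta=0.5))  # (0.25 + 1.75) / 2 = 1.0
```

With beta set to 0.5, the error of 0.5 is no longer below the threshold, so it moves from the quadratic branch to the linear one. A smaller beta makes the loss behave like L1 loss over a wider range of errors.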

Let’s observe this loss in action

Now we will train the same LinearRegressor model with SmoothL1Loss and compare the L1 and SmoothL1 losses across the epochs

model_1 = LinearRegressor()
smooth_loss_function = torch.nn.SmoothL1Loss()
smooth_loss_output = train_model(model_1,smooth_loss_function)

Let’s plot the output of smoothL1Loss across 200 epochs

fig = px.line(
    y=[loss_output, smooth_loss_output],
    title="Smooth_L1_loss across 200 epochs",
    labels={"y": "Loss", "x": "Epoch"},
    color_discrete_sequence=["blue", "green"],  # Optional custom colors
)
fig.data[0].name = "L1 Loss Output"
fig.data[1].name = "Smooth Loss Output"
fig.update_layout(legend_title_text="Loss Type")
fig.show()

Both curves are quite similar, but if we focus on the end of the graph, where the loss approaches 0, the L1 loss is approximately 0.54, whereas the SmoothL1 loss at the same points is reported as 0. For absolute errors below beta, SmoothL1Loss squares the error and halves it, so a small error produces an even smaller loss. This shows the less sensitive nature of the smooth L1 loss.
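To see why small errors almost vanish under smooth L1 loss, here is a small worked sketch in plain Python. The error values are made up for illustration and are not taken from the trained models; `smooth_l1` is a hypothetical helper that follows the piecewise rule with the default beta of 1.0.

```python
def smooth_l1(error, beta=1.0):
    """Smooth L1 loss for a single error value (illustrative helper)."""
    abs_error = abs(error)
    if abs_error < beta:
        return 0.5 * abs_error ** 2 / beta
    return abs_error - 0.5 * beta


# Hypothetical small errors, all below beta = 1.0
small_errors = [0.5, 0.1, 0.01]
rows = [(e, abs(e), smooth_l1(e)) for e in small_errors]
for error, l1_value, smooth_value in rows:
    print(f"error={error}  L1={l1_value}  SmoothL1={smooth_value:.6f}")
```

The L1 value shrinks in step with the error, while the smooth L1 value shrinks with the square of the error, so on a plot it flattens towards zero much earlier.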

Let’s compare the output from the model_1

out1 = model_1(torch.tensor([20]).to(torch.float32))
out2 = model_1(torch.tensor([21]).to(torch.float32))

out1, out2
# targets are float tensors with the same shape as the predictions
smooth_loss_function(out1, torch.tensor([400.0])), smooth_loss_function(out2, torch.tensor([420.0]))

In the above case, the output loss comes out as 0.

To measure how both losses behave as the predictions move away from the actual values, let’s create some dummy predictions

# Let's check how both losses behave as the error changes.
# Generate dummy predictions (sorted, uniform in [0, 2]) and actual values (all ones).
dummy_preds = torch.tensor(np.sort(np.random.uniform(0, 2, 100))).unsqueeze(1)
dummy_actuals = torch.tensor(np.ones(100)).unsqueeze(1)
fig = px.line(y=[dummy_preds.squeeze().numpy(), dummy_actuals.squeeze().numpy()], title="Sample data for understanding")
fig.data[0].name = "dummy predictions"
fig.data[1].name = "dummy actual values"
fig.update_layout(legend_title_text="Legend")
fig.show()

Let’s compute both the L1 and SmoothL1 losses from the above dummy data

# .item() turns each 0-dim loss tensor into a plain Python float for plotting
dummy_l1_loss = [loss_function(pred, act).item() for pred, act in zip(dummy_preds, dummy_actuals)]
dummy_smooth_l1_loss = [smooth_loss_function(pred, act).item() for pred, act in zip(dummy_preds, dummy_actuals)]
fig = px.line(
    y=[dummy_l1_loss, dummy_smooth_l1_loss],
    title="Dummy values losses for L1 and smooth L1",
    labels={"y": "Loss", "x": "Sample index (sorted by prediction)"},
    color_discrete_sequence=["blue", "green"],  # Optional custom colors
)
fig.data[0].name = "L1 Loss Output"
fig.data[1].name = "Smooth Loss Output"
fig.update_layout(legend_title_text="Loss Type")
fig.show()

The expected output value for all the predictions is 1, and for test purposes we have taken random floats from the range 0 to 2, so the absolute error varies from 0 to 1. Here we can see that the L1 loss grows in direct proportion to the error, whereas the smooth L1 loss is less sensitive, showing near-zero loss once the absolute error drops well below beta.
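The dummy data above only covers absolute errors up to 1. As a rough sketch of what happens beyond that point, the plain-Python snippet below sweeps a few hand-picked errors from 0 to 2 and prints the gap between the two losses. The helper `smooth_l1` and the error values are illustrative assumptions, with beta left at the default of 1.0.

```python
def smooth_l1(error, beta=1.0):
    """Smooth L1 loss for a single error value (illustrative helper)."""
    abs_error = abs(error)
    if abs_error < beta:
        return 0.5 * abs_error ** 2 / beta
    return abs_error - 0.5 * beta


# Hand-picked absolute errors on both sides of beta = 1.0
errors = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]
gaps = [abs(e) - smooth_l1(e) for e in errors]
for error, gap in zip(errors, gaps):
    print(f"error={error}  L1={abs(error)}  SmoothL1={smooth_l1(error)}  gap={gap}")
```

Below beta the gap keeps growing as the error grows, and from beta onwards it stays fixed at beta / 2. In other words, for large errors the two losses rise at the same rate, and the difference between them is concentrated in how they treat small errors.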

Notebook version for the same blog


thanks for reading :) follow for more!
have a good day 🤗
