Introduction to diffusion models


Recently, Hugging Face launched a course on diffusion models. This blog covers the basics of diffusion models: how they work, and how to use the 🤗 Diffusers library.

Photo by Marc Schulte on Unsplash

What are diffusion models?

Diffusion models are a type of generative model. They generate a diverse set of outputs that resemble the training data without being exact copies. Training is done iteratively: we add random noise to the training images, and the model learns to estimate how to go from a noisy image back to a clean, fully denoised one.

Training procedure for diffusion models:

  1. Load the data.
  2. Add noise to the images, in varying amounts.
  3. Feed the noisy versions to the model as inputs.
  4. Measure how well the model denoises them.
  5. Update the model weights based on that loss.

Generating new images with diffusion models:

We begin with a completely random input and update it a small amount at each step, based on the model's prediction.
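A minimal sketch of that reverse process in code (assuming a trained denoising model called `model`, a `device`, and the same DDPMScheduler used later in this post):

import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")

# Start from pure random noise (a batch of 8 RGB images at 32x32, chosen for illustration)
sample = torch.randn(8, 3, 32, 32).to(device)

for t in scheduler.timesteps:
    with torch.no_grad():
        # The model predicts the noise present in the current sample
        noise_pred = model(sample, t, return_dict=False)[0]
    # The scheduler removes a small amount of that noise to get the next sample
    sample = scheduler.step(noise_pred, t, sample).prev_sample

Each iteration nudges the sample a little closer to something that looks like the training data; after the final step, `sample` is a batch of generated images.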

Dreambooth

Stable Diffusion is a text-conditioned diffusion model. DreamBooth lets us fine-tune it into our own variant that knows about a specific face, object, or style.

The Hugging Face Diffusers API

  1. Pipelines: high-level classes that bundle everything needed for end-to-end generation (see the example after this list).
  2. Models: the network architectures (such as the UNet) that make the predictions.
  3. Schedulers: handle generating images from noise during inference, as well as generating the noisy images used for training.
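A pipeline is the quickest way to try all of this end to end. A minimal usage sketch (the model id below is an assumption; any unconditional DDPM checkpoint on the Hub works the same way):

from diffusers import DDPMPipeline

# Load a pretrained unconditional diffusion pipeline from the Hugging Face Hub
# (model id assumed here for illustration)
pipeline = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
pipeline.to("cuda")

# The pipeline wires the model and scheduler together and runs the full denoising loop
images = pipeline(batch_size=4).images
images[0].save("sample.png")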

DDPM (Denoising Diffusion Probabilistic Models) scheduler:
The DDPM noise scheduler produces the noisy images that are fed to the model during training; during inference, we use the model's predictions iteratively to remove the noise. The scheduler handles both sides of this procedure.

Adding noise to butterfly images.
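Roughly how those corrupted versions can be produced, assuming `clean_images` is a batch tensor of butterfly images loaded elsewhere:

import torch
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")

# One random noise tensor per image, plus a spread of timesteps so each image
# gets a different amount of corruption
noise = torch.randn(clean_images.shape)
timesteps = torch.linspace(0, 999, clean_images.shape[0]).long()

noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)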

Defining the model

We can define the model as a variant of the UNet architecture (see the figure below).

The model consists of several downsampling blocks that each halve the image resolution, followed by upsampling blocks that bring it back up to the original size.
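A sketch of such a model with diffusers' UNet2DModel (the block types and channel widths below are illustrative choices for small 32x32 images, not the only valid configuration):

from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=32,        # target image resolution
    in_channels=3,         # RGB input
    out_channels=3,        # the predicted noise has the same shape as the input
    layers_per_block=2,    # ResNet layers per UNet block
    block_out_channels=(64, 128, 128, 256),  # channels grow as the resolution halves
    down_block_types=(
        "DownBlock2D",
        "DownBlock2D",
        "AttnDownBlock2D",  # downsampling block with self-attention
        "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D",
        "AttnUpBlock2D",
        "UpBlock2D",
        "UpBlock2D",        # upsampling blocks restore the original size
    ),
)
model.to(device)  # assumes `device` is defined (e.g. "cuda")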

After defining the model, we can train it with a regular PyTorch training loop.

Training loop + adding noise to images

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# Set the noise scheduler
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2"
)

# Training loop
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

losses = []

for epoch in range(30):
    for step, batch in enumerate(train_dataloader):
        clean_images = batch["images"].to(device)
        # Sample noise to add to the images
        noise = torch.randn(clean_images.shape).to(clean_images.device)
        bs = clean_images.shape[0]

        # Sample a random timestep for each image
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
        ).long()

        # Add noise to the clean images according to the noise magnitude at each timestep
        noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

        # Get the model prediction
        noise_pred = model(noisy_images, timesteps, return_dict=False)[0]

        # Calculate the loss
        loss = F.mse_loss(noise_pred, noise)
        loss.backward()
        losses.append(loss.item())

        # Update the model parameters with the optimizer
        optimizer.step()
        optimizer.zero_grad()

    if (epoch + 1) % 5 == 0:
        loss_last_epoch = sum(losses[-len(train_dataloader):]) / len(train_dataloader)
        print(f"Epoch: {epoch + 1}, loss: {loss_last_epoch}")

After training the model for 30 epochs (as in the loop above), images similar to the training data can be generated.

Example:

Notebook links

Link to the Course GitHub

