Introduction to diffusion models


Recently, Hugging Face launched a course on diffusion models. This blog covers the basics of diffusion models: how they work, and how to use the 🤗 Diffusers library.

Photo by Marc Schulte on Unsplash

What are diffusion models?

Diffusion models are a type of generative model. They generate a diverse set of outputs that resemble the training data without being exact copies. Training is done iteratively: we add random noise to the training images, and the model learns to estimate how to go from a noisy image back to a clean, fully denoised one.

Training procedure for diffusion models:

  1. Load the data.
  2. Add noise to the images, in varying amounts.
  3. Feed the noisy versions to the model as inputs.
  4. Measure how well the model denoises them.
  5. Update the model weights based on that loss.

Generating new images with diffusion models:

We begin with a completely random input and update it a small amount at each step, based on the model's prediction.
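A minimal sketch of that reverse process in code (assuming a trained denoising model called `model`, a `device`, and the same DDPMScheduler used later in this post):

import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")

# Start from pure random noise (a batch of 8 RGB images at 32x32, chosen for illustration)
sample = torch.randn(8, 3, 32, 32).to(device)

for t in scheduler.timesteps:
    with torch.no_grad():
        # The model predicts the noise present in the current sample
        noise_pred = model(sample, t, return_dict=False)[0]
    # The scheduler removes a small amount of that noise to get the next sample
    sample = scheduler.step(noise_pred, t, sample).prev_sample

Each iteration nudges the sample a little closer to something that looks like the training data; after the final step, `sample` is a batch of generated images.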

Dreambooth

Stable Diffusion is a text-conditioned diffusion model. DreamBooth lets us fine-tune it into our own variant that knows about a specific face, object, or style.

The Hugging Face Diffusers API

  1. Pipelines: high-level classes that bundle everything needed for end-to-end generation (see the example after this list).
  2. Models: the network architectures (such as the UNet) that make the predictions.
  3. Schedulers: handle generating images from noise during inference, as well as generating the noisy images used for training.
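A pipeline is the quickest way to try all of this end to end. A minimal usage sketch (the model id below is an assumption; any unconditional DDPM checkpoint on the Hub works the same way):

from diffusers import DDPMPipeline

# Load a pretrained unconditional diffusion pipeline from the Hugging Face Hub
# (model id assumed here for illustration)
pipeline = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
pipeline.to("cuda")

# The pipeline wires the model and scheduler together and runs the full denoising loop
images = pipeline(batch_size=4).images
images[0].save("sample.png")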

DDPM (Denoising Diffusion Probabilistic Models) scheduler:
The DDPM noise scheduler produces the noisy images that are fed to the model during training; during inference, we use the model's predictions iteratively to remove the noise. The scheduler handles both sides of this procedure.

Adding noise to butterfly images.
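Roughly how those corrupted versions can be produced, assuming `clean_images` is a batch tensor of butterfly images loaded elsewhere:

import torch
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")

# One random noise tensor per image, plus a spread of timesteps so each image
# gets a different amount of corruption
noise = torch.randn(clean_images.shape)
timesteps = torch.linspace(0, 999, clean_images.shape[0]).long()

noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)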

Defining the model

We can define the model as a variant of the UNet architecture (see the figure below).

The model consists of several downsampling blocks that each halve the image resolution, followed by upsampling blocks that bring it back up to the original size.
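A sketch of such a model with diffusers' UNet2DModel (the block types and channel widths below are illustrative choices for small 32x32 images, not the only valid configuration):

from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=32,        # target image resolution
    in_channels=3,         # RGB input
    out_channels=3,        # the predicted noise has the same shape as the input
    layers_per_block=2,    # ResNet layers per UNet block
    block_out_channels=(64, 128, 128, 256),  # channels grow as the resolution halves
    down_block_types=(
        "DownBlock2D",
        "DownBlock2D",
        "AttnDownBlock2D",  # downsampling block with self-attention
        "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D",
        "AttnUpBlock2D",
        "UpBlock2D",
        "UpBlock2D",        # upsampling blocks restore the original size
    ),
)
model.to(device)  # assumes `device` is defined (e.g. "cuda")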

After defining the model, we can train it with a regular PyTorch training loop.

Training loop + adding noise to images

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# Set the noise scheduler
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2"
)

# Training loop
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

losses = []

for epoch in range(30):
    for step, batch in enumerate(train_dataloader):
        clean_images = batch["images"].to(device)
        # Sample noise to add to the images
        noise = torch.randn(clean_images.shape).to(clean_images.device)
        bs = clean_images.shape[0]

        # Sample a random timestep for each image
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
        ).long()

        # Add noise to the clean images according to the noise magnitude at each timestep
        noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

        # Get the model prediction
        noise_pred = model(noisy_images, timesteps, return_dict=False)[0]

        # Calculate the loss
        loss = F.mse_loss(noise_pred, noise)
        loss.backward()
        losses.append(loss.item())

        # Update the model parameters with the optimizer
        optimizer.step()
        optimizer.zero_grad()

    if (epoch + 1) % 5 == 0:
        loss_last_epoch = sum(losses[-len(train_dataloader):]) / len(train_dataloader)
        print(f"Epoch: {epoch + 1}, loss: {loss_last_epoch}")

After training the model for 30 epochs (as in the loop above), images similar to the training data can be generated.

Example:

Notebook links

Link to the Course GitHub

