Bookmarks
diffusion transformers
Metaphorically, you can think of Vision Transformers as the eyes of the system, able to understand and contextualize what it sees, while Stable Diffusion is the hand of the system, able to generate and manipulate images based on this understanding.
Flow Matching Guide and Code
Flow Matching (FM) is a recent framework for generative modeling that has
achieved state-of-the-art performance across various domains, including image,
video, audio, speech, and biological structures. This guide offers a
comprehensive and self-contained review of FM, covering its mathematical
foundations, design choices, and extensions. By also providing a PyTorch
package featuring relevant examples (e.g., image and text generation), this
work aims to serve as a resource for both novice and experienced researchers
interested in understanding, applying and further developing FM.
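The core training objective of Flow Matching can be sketched in a few lines. This is a minimal NumPy sketch, not the guide's PyTorch package: the linear interpolation path and the toy `perfect` model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x0, x1, t):
    """Conditional flow matching loss for a linear probability path.

    x0: noise samples, x1: data samples, t: times in [0, 1].
    The path x_t = (1 - t) * x0 + t * x1 has constant target
    velocity u = x1 - x0; the model regresses onto it.
    """
    t = t.reshape(-1, 1)                  # broadcast over features
    xt = (1.0 - t) * x0 + t * x1          # point on the path
    u = x1 - x0                           # target velocity field
    v = model(xt, t)                      # predicted velocity
    return np.mean((v - u) ** 2)

# Toy check: an oracle model that knows the target velocity has zero loss
x0 = rng.standard_normal((64, 2))
x1 = rng.standard_normal((64, 2)) + 5.0
t = rng.uniform(size=64)
perfect = lambda xt, tt: x1 - x0
print(flow_matching_loss(perfect, x0, x1, t))  # → 0.0
```

At sampling time, the learned velocity field is integrated from t = 0 to t = 1 with an ODE solver to transport noise to data.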
Tutorial on Diffusion Models for Imaging and Vision
The astonishing growth of generative tools in recent years has empowered many
exciting applications in text-to-image generation and text-to-video generation.
The underlying principle behind these generative tools is the concept of
diffusion, a particular sampling mechanism that overcomes shortcomings of
earlier generative approaches. The goal of this
tutorial is to discuss the essential ideas underlying the diffusion models. The
target audience of this tutorial includes undergraduate and graduate students
who are interested in doing research on diffusion models or applying these
models to solve other problems.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
The authors present a method for training large text-to-image diffusion models on a very low budget. They use a technique called deferred masking to minimize performance loss while reducing computational costs. Their approach achieves high-quality results at a fraction of the cost compared to existing models, demonstrating the potential for democratizing AI training.
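The deferred-masking idea can be sketched roughly as follows; this is an illustrative sketch under stated assumptions, not the paper's implementation — the `mix` patch-mixer and the toy averaging mixer are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def deferred_mask(patches, mix, keep_ratio=0.25):
    """Sketch of deferred masking: first mix information across all
    patches with a lightweight mixer, then drop most patches, so the
    surviving patches retain some signal from the masked ones while
    the expensive backbone only processes a fraction of them.

    patches: (num_patches, dim) array of patch embeddings.
    mix: lightweight patch-mixer applied before masking.
    """
    mixed = mix(patches)                              # mix BEFORE masking
    n = len(patches)
    keep = rng.choice(n, size=max(1, int(n * keep_ratio)), replace=False)
    return mixed[np.sort(keep)]                       # backbone sees only these

# Toy mixer: average each patch with the global mean of all patches
mix = lambda p: 0.5 * p + 0.5 * p.mean(axis=0, keepdims=True)
patches = rng.standard_normal((16, 8))
out = deferred_mask(patches, mix, keep_ratio=0.25)
print(out.shape)  # (4, 8)
```

The compute saving comes from the backbone seeing only the kept fraction of patches per step, while the pre-masking mixer limits the information lost.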
Step-by-Step Diffusion: An Elementary Tutorial
An elementary tutorial on diffusion models by Preetum Nakkiran, Arwen Bradley, Hattie Zhou, and Madhu Advani.
What are Diffusion Models?
Diffusion models gradually add noise to data and then learn to reverse the process to generate samples. Unlike other generative models, diffusion models use a fixed forward procedure and latent variables with the same dimensionality as the data. Training involves approximating the conditional reverse-process distributions, which simplifies to a tractable noise-prediction objective.
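The fixed forward procedure and the simplified noise-prediction objective can be sketched as follows; a minimal NumPy sketch assuming a linear variance schedule, with the lambda a stand-in for the denoising network.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal fraction

def q_sample(x0, t, eps):
    """Forward process: corrupt x0 to x_t in closed form,
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alphas_bar[t].reshape(-1, 1)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def simple_loss(model, x0):
    """Simplified DDPM objective: predict the noise that was added."""
    t = rng.integers(0, T, size=len(x0))
    eps = rng.standard_normal(x0.shape)
    xt = q_sample(x0, t, eps)
    return np.mean((model(xt, t) - eps) ** 2)

x0 = rng.standard_normal((64, 2))
print(simple_loss(lambda xt, t: np.zeros_like(xt), x0))  # roughly 1.0 for a zero model
```

A model that always predicts zero scores roughly the variance of the noise itself, which is the baseline a trained network must beat.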
Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model
The paper presents a simple and effective denoising-diffusion model called Iterative α-(de)Blending. It offers a user-friendly alternative to complex theories, making it accessible with basic calculus and probability knowledge. By iteratively blending and deblending samples, the model converges to a deterministic mapping, showing promising results in computer graphics applications.
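The iterative blending/deblending step can be sketched in a few lines; a minimal NumPy sketch in which the trained network is replaced by an oracle `D` for a single-point target, an assumption for illustration.

```python
import numpy as np

def iadb_sample(D, x0, steps=128):
    """Deterministic sampling sketch: repeatedly nudge the blended
    sample along the predicted (x1 - x0) direction with step d_alpha."""
    x = x0.copy()
    alphas = np.linspace(0.0, 1.0, steps + 1)
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        x = x + (a1 - a0) * D(x, a0)      # Euler step of size d_alpha
    return x

# Toy oracle: when the target distribution is a single point x1,
# the expected difference is D(x_a, a) = (x1 - x_a) / (1 - a),
# since x_a = (1 - a) * x0 + a * x1.
x1 = np.array([3.0, -1.0])
def oracle(x, a):
    return (x1 - x) / (1.0 - a)

print(iadb_sample(oracle, np.zeros(2)))  # converges to x1 = [3., -1.]
```

In the paper the oracle is replaced by a network trained to predict the expected difference between the blended pair, which is where the basic-calculus-and-probability framing comes from.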
How diffusion models work: the math from scratch
Diffusion models generate diverse high-resolution images and differ from previous generative methods. Cascade diffusion models and latent diffusion models scale generation to higher resolutions efficiently. Score-based generative models are closely related to diffusion models: both perturb data with noise and learn to reverse the perturbation to generate new samples.
The Annotated Diffusion Model
A neural network learns to denoise data by gradually removing noise. The process involves adding noise to an image and training the network to reverse that corruption. The network predicts the noise added to corrupted images at different time steps.
Defusing Diffusion Models
This post explains the concepts of forward and reverse diffusion processes in diffusion models. By understanding these processes, readers can train diffusion models to generate samples from target distributions effectively. Guided diffusion models are also discussed, showing how conditioning information can be used to guide the diffusion process for specific outcomes.
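The guidance idea is commonly realized as classifier-free guidance, where conditional and unconditional noise predictions are combined with a guidance scale. A minimal sketch; `eps_uncond` and `eps_cond` stand in for the two network outputs, and the linear extrapolation form used here is one common convention.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate past the conditional
    prediction by guidance scale w.

    w = 0 -> unconditional; w = 1 -> conditional;
    w > 1 -> the conditioning signal is amplified.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

eu = np.array([0.0, 0.0])   # unconditional prediction
ec = np.array([1.0, -1.0])  # conditional prediction
print(cfg_combine(eu, ec, 7.5))  # → [ 7.5 -7.5]
```

Larger scales push samples harder toward the conditioning information at some cost in diversity.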
The Illustrated Stable Diffusion
AI image generation with Stable Diffusion involves two main components: an image information creator, which runs the diffusion process, and an image decoder, which renders the final image. Diffusion models combine noise with powerful computer vision models to generate aesthetically pleasing images. Text embeddings can be incorporated to control what kind of image the model generates during the diffusion process.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
Diffusion Models and Associative Memories show surprising similarities in their mathematical underpinnings and goals, bridging traditional and modern AI research. This connection highlights the convergence of AI models towards memory-focused paradigms, emphasizing the importance of understanding Associative Memories in the field of computation. By exploring these parallels, researchers aim to enhance our comprehension of how models like Diffusion Models and Transformers operate in Deep Learning applications.
Diffusion Models (DMs) have set state-of-the-art benchmarks in generative modeling, but their mathematical descriptions can be complex. In this survey, the authors provide an overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), revealing a mathematical connection to Associative Memories (AMs). AMs are energy-based models that share similarities with denoising DMs but additionally admit a Lyapunov energy function, so denoising can be performed by gradient descent on that energy. The authors also summarize the 40-year history of energy-based AMs, starting with the Hopfield Network, and discuss future research directions for both AMs and DMs.
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)