Bookmarks
World Models
Can agents learn inside of their own dreams?
Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT
State-of-the-art image diffusion models take tens of seconds to process a single image. Video diffusion is even more challenging, demanding significant computational resources and incurring high costs.
Position: Model Collapse Does Not Mean What You Think
The proliferation of AI-generated content online has fueled concerns over "model collapse", a degradation in future generative models' performance when trained on synthetic data generated by earlier models. Industry leaders, premier research journals and popular science publications alike have prophesied catastrophic societal consequences stemming from model collapse. In this position piece, we contend this widespread narrative fundamentally misunderstands the scientific evidence. We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse. To assess how significantly different interpretations of model collapse threaten future generative models, we posit what we believe are realistic conditions for studying model collapse and then conduct a rigorous assessment of the literature's methodologies through this lens. While we leave room for reasonable disagreement, our analysis of research studies, weighted by how faithfully each study matches real-world conditions, leads us to conclude that certain predicted claims of model collapse rely on assumptions and conditions that poorly match real-world conditions, and in fact several prominent collapse scenarios are readily avoidable. Altogether, this position paper argues that model collapse has been warped from a nuanced multifaceted consideration into an oversimplified threat, and that the evidence suggests specific harms more likely under society's current trajectory have received disproportionately little attention.
diffusion transformers
Metaphorically, you can think of Vision Transformers as the eyes of the system, able to understand and contextualize what it sees, while Stable Diffusion is the hand of the system, able to generate and manipulate images based on this understanding.
Flow Matching Guide and Code
Flow Matching (FM) is a recent framework for generative modeling that has
achieved state-of-the-art performance across various domains, including image,
video, audio, speech, and biological structures. This guide offers a
comprehensive and self-contained review of FM, covering its mathematical
foundations, design choices, and extensions. By also providing a PyTorch
package featuring relevant examples (e.g., image and text generation), this
work aims to serve as a resource for both novice and experienced researchers
interested in understanding, applying and further developing FM.
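As a hedged sketch of the core idea (an illustration of the framework, not code from the guide or its PyTorch package): with the common linear (optimal-transport) conditional path, the Flow Matching regression target for a noise/data pair is simply x1 - x0.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_training_pair(x0, x1, t):
    """Linear conditional path x_t = (1 - t) * x0 + t * x1.
    The regression target for the velocity field is x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return x_t, target

x0 = rng.standard_normal(4)   # noise sample
x1 = rng.standard_normal(4)   # data sample
x_t, v = fm_training_pair(x0, x1, t=0.3)
# A model v_theta(x_t, t) would be trained with an MSE loss against v,
# then sampled by integrating dx/dt = v_theta(x, t) from t=0 to t=1.
```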
Genie 2: A large-scale foundation world model
Generating unlimited diverse training environments for future general agents
WilliamYi96/Awesome-Energy-Based-Models: A curated list of resources on energy-based models.
A curated list of resources on energy-based models. - WilliamYi96/Awesome-Energy-Based-Models
CBLL, Research Projects, Computational and Biological Learning Lab, Courant Institute, NYU
Yann LeCun's Web pages at NYU
yataobian/awesome-ebm: Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning)
Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning) - yataobian/awesome-ebm
Oasis: A Universe in a Transformer
Generating Worlds in Realtime
Tutorial on Diffusion Models for Imaging and Vision
The astonishing growth of generative tools in recent years has empowered many
exciting applications in text-to-image generation and text-to-video generation.
The underlying principle behind these generative tools is the concept of
diffusion, a particular sampling mechanism that has overcome some shortcomings
that were deemed difficult in the previous approaches. The goal of this
tutorial is to discuss the essential ideas underlying the diffusion models. The
target audience of this tutorial includes undergraduate and graduate students
who are interested in doing research on diffusion models or applying these
models to solve other problems.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
The authors present a method for training large text-to-image diffusion models on a very low budget. They use a technique called deferred masking to minimize performance loss while reducing computational costs. Their approach achieves high-quality results at a fraction of the cost compared to existing models, demonstrating the potential for democratizing AI training.
Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization: [CVPR 2022 Oral] Towards Layer-wise Image Vectorization
The text discusses a new method called LIVE for generating SVG images layer by layer to fit raster images. LIVE uses closed Bézier paths to learn visual concepts in a recursive manner. Installation instructions and references for the method are provided in the text.
Step-by-Step Diffusion: An Elementary Tutorial
A step-by-step tutorial on diffusion models by Preetum Nakkiran, Arwen Bradley, Hattie Zhou, and Madhu Advani.
What are Diffusion Models?
Diffusion models slowly add noise to data and then learn to reverse the process to create desired samples. Unlike other generative models, the forward process is fixed rather than learned, and the latent variables have the same high dimensionality as the data. Training a diffusion model involves approximating the conditioned probability distributions of the reverse process and simplifying the objective function.
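The fixed forward process described above has a convenient closed form: x_t can be sampled at any step without iterating. A minimal sketch, assuming a standard DDPM linear variance schedule (the schedule and names here are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative product \bar{alpha}_t

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
x_noisy = q_sample(x0, T - 1, eps)
# At t = T-1, alpha_bar is near zero: almost all signal has been
# replaced by noise, which is what the reverse process learns to undo.
```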
Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model
The paper presents a simple and effective denoising-diffusion model called Iterative α-(de)Blending. It offers a user-friendly alternative to complex theories, making it accessible with basic calculus and probability knowledge. By iteratively blending and deblending samples, the model converges to a deterministic mapping, showing promising results in computer graphics applications.
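A minimal sketch of the idea (the names and the oracle "network" below are illustrative stand-ins, not the authors' code): blending is plain linear interpolation, and the deterministic sampler repeatedly steps x_alpha along the predicted x1 - x0 direction.

```python
import numpy as np

rng = np.random.default_rng(0)

def blend(x0, x1, alpha):
    """alpha-blending: linear interpolation between two samples."""
    return (1.0 - alpha) * x0 + alpha * x1

def sample(x0, direction_fn, steps=100):
    """Deterministic sampler: step x_alpha along the predicted
    deblending direction D(x_alpha, alpha) ~= x1 - x0."""
    x, d_alpha = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + d_alpha * direction_fn(x, i * d_alpha)
    return x

# Toy oracle: when the direction is known exactly, iterating the
# blending/deblending steps maps the source x0 onto the target x1.
x0 = rng.standard_normal(4)
x1 = rng.standard_normal(4)
oracle = lambda x, a: x1 - x0
x_end = sample(x0, oracle)
```

In practice a trained network replaces the oracle, and the mapping converges to a deterministic transport between the two distributions.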
How diffusion models work: the math from scratch
Diffusion models generate diverse high-resolution images and differ from previous generative methods. Cascade diffusion models and latent diffusion models are used to scale generation to higher resolutions efficiently. Score-based generative models are closely related to diffusion models, likewise using noise perturbations to generate new samples.
The Annotated Diffusion Model
A neural network learns to denoise data by gradually removing noise. Training involves adding noise to an image and teaching the network to reverse that corruption: given a corrupted image and its time step, the network predicts the noise that was added.
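The training objective this describes can be sketched as follows (a toy stand-in model and NumPy instead of the post's PyTorch U-Net; the schedule is an assumed standard one):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def training_loss(model, x0):
    """Simplified DDPM loss: corrupt x0 at a random step t, then
    regress the model's output onto the noise that was added."""
    t = rng.integers(T)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - model(x_t, t)) ** 2)

# A perfect noise predictor would drive this loss to zero; a toy
# "model" that always outputs zeros leaves a loss near E[eps^2] = 1.
zero_model = lambda x_t, t: np.zeros_like(x_t)
loss = training_loss(zero_model, rng.standard_normal(512))
```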
Defusing Diffusion Models
This post explains the concepts of forward and reverse diffusion processes in diffusion models. By understanding these processes, readers can train diffusion models to generate samples from target distributions effectively. Guided diffusion models are also discussed, showing how conditioning information can be used to guide the diffusion process for specific outcomes.
The Illustrated Stable Diffusion
AI image generation with Stable Diffusion involves an image information creator and an image decoder. Diffusion models use noise and powerful computer vision models to generate aesthetically pleasing images. Text can be incorporated to control the type of image the model generates in the diffusion process.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
Diffusion Models and Associative Memories show surprising similarities in their mathematical underpinnings and goals, bridging traditional and modern AI research. This connection highlights the convergence of AI models towards memory-focused paradigms, emphasizing the importance of understanding Associative Memories in the field of computation. By exploring these parallels, researchers aim to enhance our comprehension of how models like Diffusion Models and Transformers operate in Deep Learning applications.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
Diffusion Models (DMs) have become increasingly dominant on generative benchmarks, but their mathematical descriptions can be complex. In this survey, the authors provide an overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), revealing a mathematical connection to Associative Memories (AMs). AMs are energy-based models that share similarities with denoising DMs, but they allow for the computation of a Lyapunov energy function and gradient descent to denoise data. The authors also summarize the 40-year history of energy-based AMs, starting with the Hopfield Network, and discuss future research directions for both AMs and DMs.
Pen and Paper Exercises in Machine Learning
This is a collection of (mostly) pen-and-paper exercises in machine learning.
The exercises are on the following topics: linear algebra, optimisation,
directed graphical models, undirected graphical models, expressive power of
graphical models, factor graphs and message passing, inference for hidden
Markov models, model-based learning (including ICA and unnormalised models),
sampling and Monte-Carlo integration, and variational inference.
MotionGPT: Human Motion as a Foreign Language
MotionGPT is a unified model for language and motion tasks, achieving top performance in text-driven motion generation. It combines natural language models with human motion tasks, benefiting fields like gaming and robotics. The model treats human motion like a foreign language, offering a versatile solution for diverse motion synthesis problems.
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)