Bookmarks
Category Theory: Lecture Notes and Online Books
The links below are to various freely (and legitimately!) available online mathematical resources for those interested in category theory at an elementary/intermediate level. There is a supplementary page, introductory readings for philosophers, with reading suggestions for those looking for the most accessible routes into category theory and/or links to philosophical discussions.
Towards a Categorical Foundation of Deep Learning: A Survey
The unprecedented pace of machine learning research has led to incredible advances, but also poses hard challenges. At present, the field lacks strong theoretical underpinnings, and many important achievements stem from ad hoc design choices which are hard to justify in principle and whose effectiveness often goes unexplained. Research debt is increasing and many papers are found not to be reproducible.
This thesis is a survey that covers some recent work attempting to study machine learning categorically. Category theory is a branch of abstract mathematics that has found successful applications in many fields, both inside and outside mathematics. Acting as a lingua franca of mathematics and science, category theory might be able to give a unifying structure to the field of machine learning. This could solve some of the aforementioned problems.
In this work, we mainly focus on the application of category theory to deep learning. Namely, we discuss the use of categorical optics to model gradient-based learning, the use of categorical algebras and integral transforms to link classical computer science to neural networks, the use of functors to link different layers of abstraction and preserve structure, and, finally, the use of string diagrams to provide detailed representations of neural network architectures.
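As a rough illustration of the optics idea mentioned above: a lens pairs a forward map with a backward map that carries gradients in the opposite direction, and composing lenses composes forward passes left to right and backward passes right to left, which is the shape of backpropagation. The sketch below is a minimal Python rendering of that pattern; the names `Lens`, `forward`, and `backward` are illustrative and not taken from the surveyed papers.

```python
# Minimal sketch: a "lens" bundles a forward pass with a backward pass.
# Composing lenses chains forward maps left-to-right and backward maps
# right-to-left, mirroring backpropagation.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Lens:
    forward: Callable[[Any], Any]        # x -> y
    backward: Callable[[Any, Any], Any]  # (x, dy) -> dx

    def compose(self, other: "Lens") -> "Lens":
        # self : A -> B, other : B -> C, result : A -> C
        def fwd(x):
            return other.forward(self.forward(x))
        def bwd(x, dz):
            y = self.forward(x)
            dy = other.backward(y, dz)
            return self.backward(x, dy)
        return Lens(fwd, bwd)

# Example: scaling by 3 followed by squaring; d/dx (3x)^2 = 18x.
scale = Lens(lambda x: 3 * x, lambda x, dy: 3 * dy)
square = Lens(lambda y: y * y, lambda y, dy: 2 * y * dy)
net = scale.compose(square)
print(net.forward(2.0))        # 36.0
print(net.backward(2.0, 1.0))  # 36.0 == 18 * 2
```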
Soft question: Deep learning and higher categories
Recently, I have stumbled upon certain articles and lecture videos that use category theory to explain certain aspects of machine learning or deep learning (e.g. Cats for AI and the paper An enriched
Algebraic Databases
Databases have been studied category-theoretically for decades. The database schema---whose purpose is to arrange high-level conceptual entities---is generally modeled as a category or sketch. The data itself, often called an instance, is generally modeled as a set-valued functor, assigning to each conceptual entity a set of examples. While mathematically elegant, these categorical models have typically struggled with representing concrete data such as integers or strings.
In the present work, we propose an extension of the set-valued functor model, making use of multisorted algebraic theories (a.k.a. Lawvere theories) to incorporate concrete data in a principled way. This also allows constraints and queries to make use of operations on data, such as multiplication or comparison of numbers, helping to bridge the gap between traditional databases and programming languages.
We also show how all of the components of our model---including schemas, instances, change-of-schema functors, and queries---fit into a single double categorical structure called a proarrow equipment (a.k.a. framed bicategory).
The categorical abstract machine
Cartesian closed categories have been shown by several authors to provide the right framework for the model theory of the λ-calculus. The second author…
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
We present our position on the elusive quest for a general-purpose framework
for specifying and studying deep learning architectures. Our opinion is that
the key attempts made so far lack a coherent bridge between specifying
constraints which models must satisfy and specifying their implementations.
Focusing on building such a bridge, we propose to apply category theory --
precisely, the universal algebra of monads valued in a 2-category of parametric
maps -- as a single theory elegantly subsuming both of these flavours of neural
network design. To defend our position, we show how this theory recovers
constraints induced by geometric deep learning, as well as implementations of
many architectures drawn from the diverse landscape of neural networks, such as
RNNs. We also illustrate how the theory naturally encodes many standard
constructs in computer science and automata theory.
Fundamental Components of Deep Learning: A category-theoretic approach
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems: parametricity and bidirectionality. We
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
Logic and linear algebra: an introduction
We give an introduction to logic tailored for algebraists, explaining how proofs in linear logic can be viewed as algorithms for constructing morphisms in symmetric closed monoidal categories with additional structure. This is made explicit by showing how to represent proofs in linear logic as linear maps between vector spaces. The interesting part of this vector space semantics is based on the cofree cocommutative coalgebra of Sweedler.
Logical Complexity of Proofs
If you cannot find proofs, talk about them. Robert Reckhow with his advisor Stephen Cook famously started the formal study of the complexity of proofs with their 1979 paper. They were interested in…
Richard Hamming - Wikipedia
Richard Wesley Hamming (February 11, 1915 – January 7, 1998) was an American mathematician whose work had many implications for computer engineering and telecommunications.
Category theory for scientists (Old version)
There are many books designed to introduce category theory to either a
mathematical audience or a computer science audience. In this book, our
audience is the broader scientific community. We attempt to show that category
theory can be applied throughout the sciences as a framework for modeling
phenomena and communicating results. In order to target the scientific
audience, this book is example-based rather than proof-based. For example,
monoids are framed in terms of agents acting on objects, sheaves are introduced
with primary examples coming from geography, and colored operads are discussed
in terms of their ability to model self-similarity.
A new version with solutions to exercises will be available through MIT
Press.
Category Theory usage in Algebraic Topology
First my question:
How much category theory should someone studying algebraic topology generally know?
Motivation: I am taking my first graduate course in algebraic topology next semester, and ...
Topos Theory in a Nutshell
Okay, you wanna know what a topos is? First I'll give you a hand-wavy vague explanation, then an actual definition, then a few consequences of this definition, and then some examples.
Proof Explorer
Inspired by Whitehead and Russell's monumental Principia Mathematica, the Metamath Proof Explorer has over 26,000 completely worked out proofs in its main sections (and over 41,000 counting "mathboxes", which are annexes where contributors can develop additional topics), starting from the very foundation that mathematics is built on and eventually arriving at familiar mathematical facts and beyond.
An Invitation to Applied Category Theory
Abstract page for arXiv paper 1803.05316: Seven Sketches in Compositionality: An Invitation to Applied Category Theory
An Invitation to Applied Category Theory
Cambridge Core - Programming Languages and Applied Logic - An Invitation to Applied Category Theory
Information Theory: A Tutorial Introduction
Shannon's mathematical theory of communication defines fundamental limits on
how much information can be transmitted between the different components of any
man-made or biological system. This paper is an informal but rigorous
introduction to the main ideas implicit in Shannon's theory. An annotated
reading list is provided for further reading.
How to get from high school math to cutting-edge ML/AI: a detailed 4-stage roadmap with links to the best learning resources that I’m aware of.
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning.
The Fast Track
In order to accelerate the development of prospective mathematical scientists, we have selected a series of textbooks one can study to reach expertise in mathematics and physics in the most efficient manner possible.
Your Starting Point!
The text discusses the concepts of three-dimensional objects and how they are represented in two dimensions for computer graphics. It explains the process of projecting 3D points onto a canvas to create images. The importance of geometry and mathematics in computer graphics, particularly in defining objects and creating images, is emphasized.
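To make the projection step concrete, here is a minimal sketch of the classic perspective divide, assuming a camera at the origin with the canvas one unit in front of it and points in front of the camera having positive depth; the function name and conventions are illustrative, not the article's.

```python
# Minimal perspective projection: divide x and y by depth to map a 3D point
# onto a canvas one unit in front of a camera placed at the origin.
def project(point):
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (x / z, y / z)

print(project((2.0, 1.0, 4.0)))  # (0.5, 0.25): farther points shrink toward the center
```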
Ray Tracing in One Weekend
"Ray Tracing in One Weekend" introduces readers to the concept of ray tracing through a step-by-step guide to creating a ray tracer that produces images. The document covers topics such as sending rays into the scene, ray-sphere intersection, shading, and reflection. It explains the mathematical aspects behind ray tracing, including formulas for sphere intersections and normal vectors. The guide progresses from creating a simple image of a sphere to more complex scenes, providing insights into the coding process and considerations for optimizing the rendering process.
immersivemath: Immersive Linear Algebra
This text introduces a book on linear algebra with chapters covering vectors, dot products, matrix operations, and more. It aims to help readers understand fundamental concepts and tools in linear algebra through clear explanations and examples. The book includes topics such as Gaussian elimination, determinants, rank, and eigenvalues.
Arithmetic functions
BQN's arithmetic functions mirror mathematical notation and apply element-wise to arrays. BQN supports basic arithmetic operations like addition, subtraction, multiplication, division, exponentiation, and root functions. Character arithmetic is a distinctive feature allowing manipulation of characters with symbols like + and -.
Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model
The paper presents a simple and effective denoising-diffusion model called Iterative α-(de)Blending. It offers a user-friendly alternative to complex theories, making it accessible with basic calculus and probability knowledge. By iteratively blending and deblending samples, the model converges to a deterministic mapping, showing promising results in computer graphics applications.
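Very roughly, the "blending" referred to here is plain linear interpolation between a sample x0 from one distribution and a sample x1 from another, x_α = (1 − α)x0 + αx1; deblending then steps α forward along a learned estimate of the direction x1 − x0. The sketch below only illustrates that iteration shape; `predict_direction` is a hypothetical stand-in, not the paper's model.

```python
import numpy as np

def blend(x0, x1, alpha):
    # Linear interpolation between a sample from each distribution.
    return (1.0 - alpha) * x0 + alpha * x1

def deblend(x, alphas, predict_direction):
    # Step alpha from 0 toward 1, each time moving along the predicted
    # direction (an estimate of x1 - x0). With a perfect predictor this
    # traces a deterministic path from p0 to p1.
    for a_cur, a_next in zip(alphas[:-1], alphas[1:]):
        x = x + (a_next - a_cur) * predict_direction(x, a_cur)
    return x

# Toy check: a predictor that always points at a fixed target recovers it exactly.
target = np.array([1.0, 2.0])
alphas = np.linspace(0.0, 1.0, 11)
print(deblend(np.zeros(2), alphas, lambda x, a: (target - x) / max(1.0 - a, 1e-8)))
```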
Linear Algebra Done Wrong
This text discusses properties of vector spaces and matrices, particularly focusing on bases and eigenvalues. It establishes that any linearly independent system of vectors can be completed to form a basis in a finite-dimensional vector space. Additionally, it explains that operators in inner product spaces have an upper triangular matrix representation under certain conditions.
Pattern Recognition and Machine Learning
The content discusses likelihood functions for Gaussian distributions, maximizing parameters using observed data, Bayesian model comparison, mixture density networks, and EM algorithm for Gaussian mixtures. It covers topics like posterior distributions, predictive distributions, graphical models, and variational inference. The material emphasizes probability distributions, optimization, and model comparison.
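As one concrete instance of the "maximizing parameters using observed data" theme: the maximum-likelihood estimates for a univariate Gaussian are the sample mean and the (biased, 1/N) sample variance. A quick numpy check, illustrative rather than taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)

# Maximum-likelihood estimates for a univariate Gaussian:
# mu_ML is the sample mean, sigma^2_ML the (1/N) sample variance.
mu_ml = data.mean()
var_ml = ((data - mu_ml) ** 2).mean()
print(mu_ml, np.sqrt(var_ml))  # close to the true values 3.0 and 2.0
```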
Revisiting Deep Learning as a Non-Equilibrium Process
The document discusses the nature of Deep Learning systems, highlighting differences from traditional machine learning systems and challenging common misconceptions. It emphasizes the complexity and non-convexity of Deep Learning, noting that optimization techniques alone cannot explain its success. The text critiques the field for lacking in-depth exploration of the true nature of Deep Learning, pointing out a tendency towards superficial explanations and reliance on celebrity figures rather than rigorous scientific inquiry. It delves into the use of Bayesian techniques, the role of noise, and the importance of architecture in Deep Learning, arguing for a deeper understanding of the underlying processes and the need for more precise language and theoretical exploration.
Dissipative Adaptation: The Origins of Life and Deep Learning
The document explores the concept of Dissipative Adaptation, drawing parallels between the emergence of life and the mechanisms of Deep Learning. It discusses the work of Jeremy England and his theory of non-equilibrium statistical mechanics known as Dissipative Adaptation, which explains the self-organizing behavior of Deep Learning. The text delves into how neural networks evolve through training, emphasizing the role of external observations in driving the system towards minimizing entropy. It contrasts the mechanisms of Dissipative Adaptation with current Deep Learning architectures, highlighting similarities in alignment of components to maximize energy dissipation or information gradient.
Re: [Fis] A PROPOSAL ABOUT THE DEFINITION OF INFORMATION
The email exchange discusses the concept of negative entropy and its implications in mathematics and thermodynamics. Sungchul Ji questions the validity of negative entropy based on the Third Law of Thermodynamics. Arturo Tozzi argues for the existence of negative entropy in certain cases and relates it to information theory and free energy.
Information
The text discusses the challenges and complexities of measuring and quantifying information, particularly in terms of storage capacity, compression, and entropy. It explores various examples, such as genome information, human sensory capabilities, and the information content of objects like water molecules and black holes. The relationship between information, entropy, and physical properties is also highlighted.
Landauer's principle
Landauer's principle is a physical principle that establishes the minimum energy consumption of computation. It states that irreversible changes in information stored in a computer dissipate a minimum amount of heat to the surroundings. The principle was proposed by Rolf Landauer in 1961 and states that the minimum energy needed to erase one bit of information is proportional to the temperature at which the system is operating. While the principle is widely accepted, it has faced challenges in recent years. However, it has been shown that Landauer's principle can be derived from the second law of thermodynamics and the entropy change associated with information gain.
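In numbers, the bound is E ≥ k_B·T·ln 2 per erased bit; at room temperature that comes to a few zeptojoules. A quick check, not taken from the article:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact since the 2019 SI redefinition)
T = 300.0           # room temperature, K

# Landauer limit: minimum heat dissipated to erase one bit of information.
E_min = k_B * T * math.log(2)
print(f"{E_min:.3e} J per bit")  # roughly 2.9e-21 J
```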
Bekenstein bound
The Bekenstein bound is an upper limit on the entropy or information that can be contained within a given finite region of space with a finite amount of energy. It implies that the information of a physical system must be finite if the region of space and energy are finite. The bound was derived from arguments involving black holes and has implications for thermodynamics and general relativity. It can be proven in the framework of quantum field theory and has applications in various fields, such as black hole thermodynamics and the study of human brains.
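For reference, the bound is usually written as follows, where R is the radius of a sphere enclosing the system and E its total energy (the standard form, not quoted from this page):

```latex
S \;\le\; \frac{2\pi k_B R E}{\hbar c}
```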
Numerical Recipes: The Art of Scientific Computing
The content provided is the table of contents for a book titled "Numerical Recipes: The Art of Scientific Computing, Third Edition." It includes various topics such as linear algebra, interpolation and extrapolation, integration of functions, evaluation of functions, special functions, random numbers, sorting and selection, root finding and nonlinear sets of equations, minimization or maximization of functions, eigensystems, and more.
Temperature as Joules per Bit
The paper suggests that temperature should be defined in terms of entropy, rather than vice versa. It argues that the current practice of measuring entropy in joules per kelvin is a historical artifact and proposes measuring entropy in bits instead. The paper also discusses the role of information in thermodynamics and the thermodynamic cost of erasure. It concludes by suggesting that entropy, not temperature, should have its own unit and that Boltzmann's constant should be dissolved.
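The proposal amounts to reading the thermodynamic definition of temperature with entropy counted in bits rather than in joules per kelvin; one bit corresponds to k_B·ln 2 in conventional units, so (a paraphrase of the idea, not the paper's notation):

```latex
T = \frac{\partial E}{\partial S},
\qquad
S_{\text{bits}} = \frac{S}{k_B \ln 2}
\;\Rightarrow\;
T_{\text{J/bit}} = (k_B \ln 2)\, T_{\text{K}}
```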
Deep Learning Course
This document provides resources for François Fleuret's deep-learning course at the University of Geneva. The course offers a thorough introduction to deep learning, with examples using the PyTorch framework. The materials include slides, recordings, and a virtual machine. The course covers topics such as machine learning objectives, tensor operations, automatic differentiation, gradient descent, and deep-learning techniques. The document also includes prerequisites for the course, such as knowledge of linear algebra, differential calculus, Python programming, and probability and statistics.
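As a flavour of the "tensor operations, automatic differentiation, gradient descent" part, here is a minimal PyTorch loop fitting a single linear layer by gradient descent; a generic sketch, not taken from the course materials:

```python
import torch

# Toy data: y = 2x + 1 with a little noise.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.05 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()   # autograd computes d(loss)/d(parameters)
    opt.step()        # gradient-descent update

print(model.weight.item(), model.bias.item())  # close to 2.0 and 1.0
```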
Pen and Paper Exercises in Machine Learning
This is a collection of (mostly) pen-and-paper exercises in machine learning.
The exercises are on the following topics: linear algebra, optimisation,
directed graphical models, undirected graphical models, expressive power of
graphical models, factor graphs and message passing, inference for hidden
Markov models, model-based learning (including ICA and unnormalised models),
sampling and Monte-Carlo integration, and variational inference.
Linear Algebra Review and Reference
Probability and Information Theory
In this chapter, the authors discuss probability theory and information theory. Probability theory is a mathematical framework for representing uncertain statements and is used in artificial intelligence for reasoning. Information theory, on the other hand, quantifies the amount of uncertainty in a probability distribution. The chapter explains various concepts, such as probability mass functions for discrete variables and probability density functions for continuous variables. It also introduces key ideas from information theory, such as entropy and mutual information. The authors provide examples and explanations to help readers understand these concepts.
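For instance, the entropy of a discrete distribution is H(p) = −Σ p(x) log p(x); a short numpy check of the fair-coin case (illustrative, not from the chapter):

```python
import numpy as np

def entropy(p, base=2):
    # Shannon entropy of a discrete distribution; 0 * log 0 is treated as 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * (np.log(p) / np.log(base))).sum()

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))  # about 0.47 bits
```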
Linear Algebra
Linear algebra is a fundamental topic in understanding and working with machine learning algorithms, especially deep learning algorithms. This chapter provides an introduction to scalars, vectors, matrices, and tensors, which are the key mathematical objects in linear algebra. It explains the concepts and notation used in linear algebra, such as matrix multiplication, transpose, identity and inverse matrices, and norms. The chapter also introduces special kinds of matrices and vectors, such as diagonal matrices, orthogonal matrices, and eigenvalues and eigenvectors. These concepts are important for analyzing and solving equations in machine learning.
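A few of the objects the chapter introduces, in numpy form (a generic illustration, not the book's code):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
x = np.array([1.0, 1.0])

print(A @ x)              # matrix-vector product: [2. 3.]
print(A.T)                # transpose
print(np.linalg.inv(A))   # inverse: diag(0.5, 1/3)
vals, vecs = np.linalg.eig(A)
print(vals)               # eigenvalues of a diagonal matrix are its diagonal entries
```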
Mathematics for Machine Learning
The Random Transformer
This blog post provides an end-to-end example of the math within a transformer model, with a focus on the encoder part. The goal is to understand how the model works, and to make it more manageable, simplifications are made and the dimensions of the model are reduced. The post recommends reading "The Illustrated Transformer" blog for a more intuitive explanation of the transformer model. The prerequisites for understanding the content include basic knowledge of linear algebra, machine learning, and deep learning. The post covers the math within a transformer model during inference, attention mechanisms, residual connections and layer normalization, and provides some code to scale it up.
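The core of the attention step the post works through is scaled dot-product attention, softmax(QKᵀ/√d)·V; a compact numpy version with dimensions chosen for illustration, not the post's:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # similarity of each query to each key
    return softmax(scores) @ V     # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, model dimension 8
K = rng.normal(size=(6, 8))  # 6 key/value positions
V = rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```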
Subcategories
- applications (15)
- computer_architecture (1)
- ethics (1)
- expert_systems (2)
- game_ai (5)
- knowledge_representation (4)
- machine_learning (324)
- natural_language_processing (3)
- planning_and_scheduling (2)
- robotics (2)
- software_development (1)
- theory (1)