Bookmarks
Categories
This seminar series seeks to promote
the learning and use of Category Theory by Machine Learning Researchers.
Modular Manifolds
A geometric framework for co-designing neural net optimizers with manifold constraints.
Why Momentum Really Works
We often think of optimization with momentum as a ball rolling down a hill. This isn't wrong, but there is much more to the story.
An elementary proof of a universal approximation theorem
In this short note, we give an elementary proof of a universal approximation
theorem for neural networks with three hidden layers and increasing,
continuous, bounded activation function. The result is weaker than the best
known results, but the proof is elementary in the sense that no machinery
beyond undergraduate analysis is used.
Fundamental Components of Deep Learning: A category-theoretic approach
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems: parametricity and bidirectionality. We
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new, categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.