Bookmarks

Continuous Thought Machines

Introducing Continuous Thought Machines: a new kind of neural network model that unfolds and uses neural dynamics as a powerful representation for thought.

Activation Atlas

By using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of the features the network has learned, revealing which concepts it typically represents.
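
A rough sketch of the feature-visualization idea the atlas builds on, assuming an activation-maximization loop in PyTorch is a fair stand-in for the article's feature inversion; the model, layer index, and channel below are purely illustrative:

```python
import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in model.parameters():
    p.requires_grad_(False)              # only the input image is optimized

layer_index, channel = 20, 33            # illustrative choices
img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    opt.zero_grad()
    acts = img
    for i, layer in enumerate(model):    # run the network up to the chosen layer
        acts = layer(acts)
        if i == layer_index:
            break
    loss = -acts[0, channel].mean()      # gradient ascent on the channel's activation
    loss.backward()
    opt.step()
```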

Greg Yang

I am currently developing a framework called Tensor Programs for understanding large neural networks.

Using neural nets to recognize handwritten digits

Neural networks can learn to recognize handwritten digits from examples rather than hand-coded rules. Sigmoid neurons provide the smooth, differentiable units that make this learning possible, and gradient descent is the standard method for adjusting a network's weights and biases during training.
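
A minimal sketch of the two ideas above, a sigmoid neuron and a gradient descent update, written in NumPy with illustrative values rather than the book's code:

```python
# A single sigmoid neuron trained with gradient descent on one example.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=3)          # weights of a 3-input neuron
b = 0.0                         # bias
x = np.array([0.5, 0.2, 0.9])   # one training input
y = 1.0                         # desired output
lr = 0.5                        # learning rate

for step in range(100):
    z = w @ x + b                    # weighted input
    a = sigmoid(z)                   # neuron output
    # Quadratic cost C = 0.5 * (a - y)^2; the chain rule gives dC/dz.
    delta = (a - y) * a * (1 - a)    # uses sigmoid'(z) = a * (1 - a)
    w -= lr * delta * x              # gradient descent step on the weights
    b -= lr * delta                  # and on the bias
```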

Binary Magic: Building BitNet 1.58bit Using PyTorch from Scratch

The post walks through building a 1.58-bit model called BitNet in PyTorch from scratch and argues that it can rival full-precision LLMs. Quantization, representing floating-point numbers with fewer bits, is explained as a way to speed up ML models and reduce their RAM consumption, at some cost in accuracy. BitNet differs from existing quantization approaches in that the model is trained from scratch with quantization in the loop, and the post presents its quantization algorithm and a PyTorch implementation. Experiments with the custom PyTorch implementations show the 2-bit and 1-bit variants performing on par with full-precision models, demonstrating the potential of this approach.
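
A rough sketch of the kind of ternary ("1.58-bit") weight quantization described above, assuming absmean scaling to {-1, 0, +1} and a straight-through estimator so the layer stays trainable; this illustrates the general idea, not the post's exact implementation:

```python
import torch

def quantize_weights_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Map weights to {-1, 0, +1} using an absmean scaling factor."""
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

class TernaryLinear(torch.nn.Linear):
    """Linear layer that uses ternary weights in the forward pass."""
    def forward(self, x):
        w_q, scale = quantize_weights_ternary(self.weight)
        # Straight-through estimator: quantized weights in the forward pass,
        # gradients flow to the full-precision weights in the backward pass.
        w_ste = self.weight + (w_q * scale - self.weight).detach()
        return torch.nn.functional.linear(x, w_ste, self.bias)

layer = TernaryLinear(64, 32)
out = layer(torch.randn(8, 64))   # drop-in replacement for nn.Linear
```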

Heatmaps and CNNs Using Fast.ai

The article discusses heatmaps, CNNs, and how the two relate in deep learning. It explains how Grad-CAM heatmaps are generated from the final convolutional layer of a CNN, shows how to create heatmaps using Adaptive Pooling layers, and covers interpreting top losses for model evaluation.
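
A minimal Grad-CAM sketch in plain PyTorch rather than fast.ai, assuming a torchvision resnet18 and taking its last residual block as the target layer; the input tensor stands in for a preprocessed image:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
target_layer = model.layer4[-1]    # last conv block of resnet18 (assumption)

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)    # placeholder for a preprocessed image
score = model(x)[0].max()          # score of the top predicted class
score.backward()

weights = grads["v"].mean(dim=(2, 3), keepdim=True)   # pool gradients per channel
cam = F.relu((weights * acts["v"]).sum(dim=1))        # weighted sum of activation maps
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear")[0, 0]
heatmap = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```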

Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks

A comprehensive survey by Vladimír Kunc and Jiří Kléma covering 400 activation functions for neural networks, with extensive URLs and DOIs for further reading and reference.

Understanding The Exploding and Vanishing Gradients Problem

The article explains the vanishing and exploding gradients problem in deep neural networks: during backpropagation, the gradients used to update the weights can shrink or grow exponentially with depth, causing learning to stall or become unstable. It examines why gradients vanish or explode and how this affects training, and it presents mitigations such as the ReLU activation function, careful weight initialization, and gradient clipping.
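
A short sketch of two of the mitigations mentioned above, He (Kaiming) initialization for ReLU layers and gradient-norm clipping, applied to a placeholder model and batch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 10))

# Initialization suited to ReLU: keeps activation variance roughly constant
# across layers, which helps gradients neither vanish nor explode.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
# Gradient clipping: rescale gradients if their global norm exceeds 1.0,
# so a single exploding gradient cannot destabilize the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```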

Subcategories