Bookmarks
Taking a Look at Compression Algorithms
Dissecting various compression algorithms.
attention is logarithmic, actually
Time complexity is a very bad model when working with parallelism; makes the case for work-depth analysis instead.
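As a toy illustration of the distinction (mine, not the post's): a tree reduction over n elements performs O(n) total additions (work) but only O(log n) sequential rounds (depth).

```python
def parallel_sum(xs):
    """Tree reduction: counts total additions (work) and rounds (depth)."""
    work = depth = 0
    while len(xs) > 1:
        # In one round, all adjacent pairs could be added in parallel.
        pairs = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
        carry = [xs[-1]] if len(xs) % 2 else []
        xs = pairs + carry
        work += len(pairs)
        depth += 1
    return xs[0], work, depth

print(parallel_sum(list(range(1024))))  # sum, work = 1023, depth = 10
```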
Are Transformers universal approximators of sequence-to-sequence functions?
Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models. Furthermore, using positional encodings, we circumvent the restriction of permutation equivariance, and show that Transformer models can universally approximate arbitrary continuous sequence-to-sequence functions on a compact domain. Interestingly, our proof techniques clearly highlight the different roles of the self-attention and the feed-forward layers in Transformers. In particular, we prove that fixed width self-attention layers can compute contextual mappings of the input sequences, playing a key role in the universal approximation property of Transformers. Based on this insight from our analysis, we consider other simpler alternatives to self-attention layers and empirically evaluate them.
Training Large Language Models to Reason in a Continuous Latent Space
Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
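A minimal sketch of the feedback loop as I read it; `model` is a hypothetical transformer callable mapping a sequence of input embeddings to hidden states, and the shapes are my assumptions, not the paper's code.

```python
import torch

def continuous_thoughts(model, input_embs, n_thoughts):
    # Instead of decoding each step into a word token, append the last
    # hidden state directly as the next input embedding, so reasoning
    # proceeds in continuous latent space rather than language space.
    embs = input_embs                         # (batch, seq, d_model)
    for _ in range(n_thoughts):
        hidden = model(embs)                  # (batch, seq, d_model)
        thought = hidden[:, -1:, :]           # last position's state
        embs = torch.cat([embs, thought], dim=1)
    return embs                               # inputs + continuous thoughts
```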
What Is ChatGPT Doing … and Why Does It Work?
Stephen Wolfram explores the broader picture of what's going on inside ChatGPT and why it produces meaningful text. Discusses models, training neural nets, embeddings, tokens, transformers, language syntax.
Programming Really Is Simple Mathematics
A re-construction of the fundamentals of programming as a small mathematical theory (PRISM) based on elementary set theory. Highlights:
- Zero axioms: no properties are assumed, all are proved (from standard set theory).
- A single concept covers both specifications and programs.
- Its definition involves only one relation and one set.
- Everything proceeds from three operations: choice, composition, and restriction (sketched below).
- These techniques suffice to derive the axioms of classic papers on the "laws of programming" as consequences and prove them mechanically.
- The ordinary subset operator suffices to define both the notion of program correctness and the concepts of specialization and refinement.
- From this basis, the theory deduces dozens of theorems characterizing important properties of programs and programming.
- All these theorems have been mechanically verified (using Isabelle/HOL); the proofs are available in a public repository.
This paper is a considerable extension and rewrite of an earlier contribution [arXiv:1507.00723].
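A hedged sketch of the three operations and subset-as-refinement, modeling a program or specification as a set of (input, output) pairs; these are the standard set-theoretic definitions one would expect, assumed rather than quoted from the paper.

```python
def choice(p, q):
    """Nondeterministic choice: either behaviour is allowed."""
    return p | q

def compose(p, q):
    """Sequential composition: run p, feed its output to q."""
    return {(a, c) for (a, b) in p for (b2, c) in q if b == b2}

def restrict(c, p):
    """Restriction of p to inputs drawn from the set c."""
    return {(a, b) for (a, b) in p if a in c}

def refines(p, q):
    """Refinement as the ordinary subset operator:
    p allows only behaviours that q allows."""
    return p <= q

double = {(n, 2 * n) for n in range(5)}
double_or_zero = double | {(n, 0) for n in range(5)}
print(refines(double, double_or_zero))  # True: double is more deterministic
```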
TLA+ is hard to learn
I'm a fan of the formal specification language TLA+. With TLA+, you can build models of programs or systems, which helps you reason about their behavior; it is particularly useful for reasoning about concurrent systems.
How hard is constraint programming?
Writing code with the Z3 SMT solver differs from typical programming because of its mixed programming model, not unlike CUDA for GPUs. Here's what to expect.
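For a taste of that model: you state constraints declaratively and ask the solver for a satisfying assignment rather than computing one yourself. A minimal z3py example (mine, not from the article):

```python
from z3 import Int, Solver, sat

x, y = Int('x'), Int('y')
s = Solver()
s.add(x + y == 10, x > y, y >= 0)   # constraints, not an algorithm

if s.check() == sat:                # the solver searches for a model
    m = s.model()
    print(m[x], m[y])               # any assignment satisfying all constraints
```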
Competitive Programming
This is the supporting web page for a book titled: "Competitive Programming 4: The Lower Bound of Programming Contests in the 2020s" written by Steven Halim, Felix Halim, and Suhendry Effendy.
Numerical Recipes
We are Numerical Recipes, one of the oldest continuously operating sites on the Internet.
Tiled Matrix Multiplication
Tiled matrix multiplication is a standard GPU optimization that cuts global-memory traffic by staging sub-blocks (tiles) of the input matrices in fast shared memory. Threads in a block cooperatively load each tile once and reuse it across many multiply-accumulate operations, so far fewer trips to global memory are needed. The technique is central to performance in graphics rendering and machine learning.
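A CPU analogue of the idea in NumPy; the tiles here play the role of shared-memory blocks, and the tile size is illustrative.

```python
import numpy as np

def tiled_matmul(A, B, tile=64):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)
    # Each output tile accumulates products of small A and B tiles, so a
    # tile is loaded once and reused many times, the same reuse a GPU
    # kernel gets by staging tiles in shared memory.
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(tiled_matmul(A, B), A @ B)
```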
Cache-Oblivious Algorithms
Cache-oblivious algorithms use processor caches efficiently without knowing cache parameters such as line size or capacity. By dividing a problem recursively, subproblems eventually fit in cache at every level of the memory hierarchy, so more of the computation happens in cache and less memory traffic is generated. This is especially valuable in parallel algorithms, where memory bandwidth is often the bottleneck.
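A classic instance of the recipe is cache-oblivious matrix transpose: recurse on the larger dimension until blocks are tiny, and they end up fitting every cache level without the code ever naming a cache size. A sketch; the base-case cutoff is arbitrary.

```python
import numpy as np

def co_transpose(src, dst):
    """Write src's transpose into dst by recursive halving."""
    n, m = src.shape
    if n * m <= 64:          # small block: fits comfortably in any cache
        dst[:] = src.T
        return
    if n >= m:               # split the larger dimension in half
        h = n // 2
        co_transpose(src[:h, :], dst[:, :h])
        co_transpose(src[h:, :], dst[:, h:])
    else:
        h = m // 2
        co_transpose(src[:, :h], dst[:h, :])
        co_transpose(src[:, h:], dst[h:, :])

a = np.arange(300.0).reshape(15, 20)
out = np.empty((20, 15))
co_transpose(a, out)
assert (out == a.T).all()
```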
Minimal Boolean Formulas
The post discusses how to compute the minimum number of AND and OR operators needed for Boolean functions with five variables. It describes the author's program that efficiently calculates this minimum for various functions while also improving algorithms for speed. The findings contribute to understanding the complexity of Boolean functions and their representations.
What is an Invariant?
Invariants are properties that hold true during the evolution of a system, helping to ensure correct behavior in programming. They can simplify reasoning about code, whether it’s for small algorithms or larger systems. By clearly defining invariants, programmers can create robust code and manage complex systems effectively.
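The canonical small example is a loop invariant; in the binary search below (my example, not the post's), the stated invariant is what makes the index arithmetic obviously correct.

```python
def binary_search(xs, target):
    """Return an index of target in sorted xs, or -1 if absent."""
    lo, hi = 0, len(xs)
    # Invariant: if target is in xs, its index lies in [lo, hi).
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1     # xs[..mid] are all < target
        elif xs[mid] > target:
            hi = mid         # xs[mid..] are all > target
        else:
            return mid
    return -1                # interval empty: invariant says not present

assert binary_search([1, 3, 5, 8, 13], 8) == 3
assert binary_search([1, 3, 5, 8, 13], 4) == -1
```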
Data Compression Explained
Data compression splits into modeling (predicting the data) and coding (turning those predictions into bits). Coding is essentially a solved problem: modern compressors use arithmetic coding, which gets within a fraction of a bit of the model's entropy. Simpler schemes such as Huffman coding and run-length encoding trade some compression ratio for speed and simplicity.
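Huffman coding is the simplest of the coding schemes mentioned and small enough to sketch: repeatedly merge the two least frequent subtrees, then read codes off the tree. A minimal sketch, ignoring serialization and decoding.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code mapping symbol -> bitstring."""
    # Heap entries: (frequency, tiebreak, tree); a tree is a symbol or a pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (t1, t2)))
        i += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_code("abracadabra"))  # frequent symbols get shorter codes
```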
The next fifty years
The text discusses the future of computing science over the next fifty years, emphasizing the importance of simplicity and elegance in design to prevent complexity. It highlights the close connection between program design and proof design, suggesting that advancements in program design can impact general mathematics. The author encourages embracing the opportunity to simplify processes and design systems that rely on formal mathematics.
Leslie Lamport
Leslie Lamport wrote several papers on verifying and specifying concurrent systems using TLA. He discovered algorithms through formal derivation and emphasized mechanical verification of concurrent algorithms. His work influenced the development of the TLAPS proof system.
Prospecting for Hash Functions
The text discusses the process of designing non-cryptographic integer hash functions, exploring different operations and constraints to create effective hash functions. It also compares various 32-bit hash functions and their bias levels, highlighting the search for high-quality hash functions with minimal bias for both 32-bit and 64-bit integers.
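The family being searched is xorshift-multiply style mixers. MurmurHash3's well-known 32-bit finalizer, a member of that family shown here for illustration rather than taken from the article, looks like this:

```python
MASK = 0xFFFFFFFF

def fmix32(x):
    """MurmurHash3's 32-bit finalizer: xor-shifts spread high bits
    down, odd multiplies spread low bits up."""
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & MASK
    x ^= x >> 13
    x = (x * 0xC2B2AE35) & MASK
    x ^= x >> 16
    return x

# Nearby inputs map to very different outputs (avalanche):
print(hex(fmix32(1)), hex(fmix32(2)))
```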
Nanosystems
This text is about a book called "Nanosystems" by K. Eric Drexler, which is considered groundbreaking in the field of molecular nanotechnology. The book explains how to create manufacturing systems at the molecular level and discusses the significant impact nanotechnology will have on various industries. Experts praise the book for providing a foundation for future research in molecular systems engineering and molecular manufacturing.
Understanding Machine Learning: From Theory to Algorithms
The textbook by Shai Shalev-Shwartz and Shai Ben-David, covering the theoretical foundations of machine learning (PAC learning, VC dimension, convexity) through to practical algorithms such as SVMs, boosting, and neural networks.
chrono-Compatible Low-Level Date Algorithms
The text explains algorithms for handling dates and determining leap years. It includes functions for calculating the last day of a month and converting dates between different calendar systems. The algorithms are designed to be efficient and accurate for various date calculations.
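The centerpiece is `days_from_civil`, which maps a proleptic Gregorian date to a day count in constant time; shifting the year so March comes first pushes the leap day to the end, which is what makes the day-of-year formula clean. A Python port of Hinnant's C++ original:

```python
def days_from_civil(y, m, d):
    """Days since 1970-01-01 from a proleptic Gregorian date.
    Python's // is floor division, so the era needs no sign fix-up."""
    y -= m <= 2                          # Jan/Feb count as the prior year
    era = y // 400                       # 400-year Gregorian cycles
    yoe = y - era * 400                  # year of era, in [0, 399]
    doy = (153 * (m + (-3 if m > 2 else 9)) + 2) // 5 + d - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468  # 719468 days from 0000-03-01 to epoch

assert days_from_civil(1970, 1, 1) == 0
assert days_from_civil(2000, 3, 1) == 11017
```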
Matrix multiplication in Mojo
Modular's documentation on implementing matrix multiplication in Mojo (docs.modular.com).
Matrix Multiplication on CPU
Marek Kolodziej's write-up on matrix multiplication on the CPU (marek.ai).
Numerical Recipes: The Art of Scientific Computing, Third Edition
Table of contents for the third edition, covering linear algebra, interpolation and extrapolation, integration of functions, evaluation of functions, special functions, random numbers, sorting and selection, root finding and nonlinear sets of equations, minimization or maximization of functions, eigensystems, and more.
Competitive Programmer's Handbook
The article discusses various algorithms and data structures used in computer programming, such as Kadane's algorithm, binary indexed trees, segment trees, Dijkstra's algorithm, and Floyd's algorithm. The author also explains concepts like successor graphs, index compression, and minimum spanning trees. The time complexity of each algorithm is also discussed.
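As a sample of the handbook's material, Kadane's algorithm (first on the list above) finds the maximum subarray sum in O(n); a sketch:

```python
def kadane(xs):
    """Maximum subarray sum in O(n): at each element, either extend
    the best subarray ending just before it, or start fresh."""
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

assert kadane([-1, 3, -2, 5, -6]) == 6   # subarray [3, -2, 5]
```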
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)