Bookmarks
How I don't use LLMs
(Header image: Manfred Mohr, Cubic Limit: P-197, 1977.) I enjoy shocking people by telling them I don’t use LLMs. This isn’t true, but it’s morally true for the reference clas...
CS336: Language Modeling from Scratch
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm in which a single general-purpose system addresses a range of downstream tasks.
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge introduces a new dataset for pre-training language models, containing 1 million documents filtered for quality. It aims to reduce the need for large computational resources while still achieving competitive performance on language tasks. The research shows that models pre-trained on MiniPile perform only slightly worse than those trained on much larger datasets.
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
Researchers trained a language model to play chess from game transcripts alone, without prior knowledge of the rules, and probed how it represents the board state. The model not only learned the board's layout but also estimated player skill, which helped it predict the next move. Intervening with a player-skill vector significantly improved the model's win rate.
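The board-state finding rests on probing the model's hidden activations. Here is a minimal sketch of that idea, assuming you already have a matrix of activations and per-square labels; both are random stand-ins below, and all names are hypothetical rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: `acts` would be hidden activations collected while
# the model reads games; `labels` the contents of one board square
# (0 = empty, 1 = white piece, 2 = black piece) at the same positions.
n, d, n_classes = 2000, 512, 3
acts = rng.normal(size=(n, d))
labels = rng.integers(0, n_classes, size=n)

# A linear probe: least-squares map from activations to one-hot square labels.
Y = np.eye(n_classes)[labels]
X = np.hstack([acts, np.ones((n, 1))])             # add a bias column
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = (X @ W).argmax(axis=1)
print("probe accuracy:", (pred == labels).mean())  # near chance on random data
```

On real activations, high probe accuracy is evidence that the board information is linearly decodable from the hidden states; on the random stand-ins above it stays near chance.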
Exploring architectures - Transformers II
The text explains how Transformers utilize queries, keys, and values to calculate self-attention weights for tokens. It details the process of obtaining the self-attention weights and generating output tokens through neural networks. The final steps involve calculating loss using cross-entropy and backpropagating to update the weight parameters.
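As a rough illustration of the query/key/value mechanics described above, here is a minimal single-head scaled dot-product self-attention in NumPy; the dimensions, random weights, and causal mask are illustrative assumptions rather than details from the post:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=True):
    """Single-head self-attention over token embeddings X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # project tokens into queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)         # similarity of every query with every key
    if causal:                              # block attention to future tokens
        T = X.shape[0]
        mask = np.tril(np.ones((T, T), dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)      # self-attention weights, each row sums to 1
    return weights @ V, weights             # each output is a weighted mix of values

# Tiny demo with random weights (illustration only, not a trained model).
rng = np.random.default_rng(0)
T, d_model, d_k = 5, 16, 8
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) * 0.1 for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                # (5, 8) (5, 5)
```

In a full Transformer these projections are learned: the outputs feed further layers, a cross-entropy loss is computed on the predicted tokens, and backpropagation updates Wq, Wk, and Wv along with the rest of the weights.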
Auto-Regressive Next-Token Predictors are Universal Learners
Simple linear next-token predictors can efficiently approximate any function computable by a Turing machine. Even basic models like linear networks and shallow multi-layer perceptrons show strong performance on tasks like text generation and arithmetic: by decomposing a problem into a sequence of next-token predictions, auto-regressive learning lets these simple models solve surprisingly complex tasks.
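To make "auto-regressive next-token prediction with a linear model" concrete, here is a toy sketch (my own construction, not the paper's): a single linear softmax layer trained to continue period-3 sequences, where the correct next token is simply the token three positions back, so a purely linear predictor can learn it and then generate by feeding its own outputs back in:

```python
import numpy as np

rng = np.random.default_rng(0)
V, CTX = 5, 4                 # vocabulary size, context-window length
STEPS, LR = 5000, 0.5

def one_hot_context(ctx):
    """Concatenate one-hot vectors for the last CTX tokens."""
    x = np.zeros(CTX * V)
    for i, t in enumerate(ctx):
        x[i * V + t] = 1.0
    return x

def sample_example():
    """Period-3 repeating sequences: the correct next token is the token
    three positions back, a lookup a purely linear model can represent."""
    pattern = rng.integers(0, V, size=3)
    seq = np.tile(pattern, 4)
    pos = rng.integers(CTX, len(seq))
    return one_hot_context(seq[pos - CTX:pos]), seq[pos]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

W = np.zeros((CTX * V, V))    # the entire model: one linear map to token logits
for _ in range(STEPS):        # plain SGD on the cross-entropy loss
    x, y = sample_example()
    p = softmax(x @ W)
    W -= LR * np.outer(x, p - np.eye(V)[y])

# Autoregressive rollout: feed the model's own predictions back in as context.
ctx = [1, 4, 2, 1]
generated = list(ctx)
for _ in range(8):
    nxt = int(np.argmax(one_hot_context(ctx) @ W))
    generated.append(nxt)
    ctx = ctx[1:] + [nxt]
print(generated)              # continues the 1 4 2 pattern: 1 4 2 1 4 2 ...
```

The paper's claim is of course far stronger, extending this kind of step-by-step prediction all the way to functions computable by Turing machines; the sketch only shows the mechanism of a linear model generating auto-regressively.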
Language Modeling Is Compression (arXiv:2309.10668)
This article discusses the relationship between language modeling and compression. The authors argue that large language models can be viewed as powerful compressors due to their impressive predictive capabilities. They demonstrate that these models can achieve state-of-the-art compression rates across different data modalities, such as images and audio. The authors also explore the connection between compression and prediction, showing that models that compress well also generalize well. They conclude by advocating for the use of compression as a framework for studying and evaluating language models.
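The prediction-compression link comes down to an identity: an arithmetic coder driven by a probabilistic model spends about -log2 p(symbol | context) bits per symbol, so the total compressed size equals the model's cumulative log loss. A small sketch with a toy adaptive character model (not the paper's setup, which plugs large pretrained models into an actual arithmetic coder):

```python
import math
from collections import Counter

def ideal_code_length_bits(text, alphabet):
    """Bits an arithmetic coder driven by an adaptive, Laplace-smoothed
    order-0 character model would need (up to a few bits of overhead):
    each symbol costs -log2 p(symbol | counts seen so far)."""
    counts = Counter({a: 1 for a in alphabet})   # Laplace prior: every symbol starts at 1
    total, bits = len(alphabet), 0.0
    for ch in text:
        bits += -math.log2(counts[ch] / total)   # prediction quality = code length
        counts[ch] += 1                          # update the model after "encoding" the symbol
        total += 1
    return bits

text = "the better the model predicts the text, the fewer bits it needs " * 8
alphabet = sorted(set(text))
bits = ideal_code_length_bits(text, alphabet)
print(f"raw: {8 * len(text)} bits, model: {bits:.0f} bits "
      f"({bits / len(text):.2f} bits/char)")
```

Swapping in a better predictor lowers the bits-per-character directly, which is the sense in which a strong language model is a strong compressor.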
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)