Bookmarks
How I don't use LLMs
(Header image: Manfred Mohr, Cubic Limit: P-197, 1977.) I enjoy shocking people by telling them I don’t use LLMs. This isn’t true, but it’s morally true for the reference clas...
CS336: Language Modeling from Scratch
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm in which a single general-purpose system addresses a range of downstream tasks.
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge introduces a new dataset for pre-training language models, containing 1 million documents filtered for quality. It aims to reduce the need for large computational resources while still achieving competitive performance on language tasks. The research shows that models pre-trained on MiniPile perform only slightly worse than those trained on much larger datasets.
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
Researchers trained a language model to play chess from game transcripts alone, without prior knowledge of the rules, and probed how it represents the board state. The model not only learned the board's layout but also estimated player skill, which helped it predict the next move. Intervening with a player-skill vector significantly improved the model's win rate.
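The board-state finding rests on probing the model's hidden activations. Here is a minimal sketch of that idea, assuming you already have a matrix of activations and per-square labels; both are random stand-ins below, and all names are hypothetical rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: `acts` would be hidden activations collected while
# the model reads games; `labels` the contents of one board square
# (0 = empty, 1 = white piece, 2 = black piece) at the same positions.
n, d, n_classes = 2000, 512, 3
acts = rng.normal(size=(n, d))
labels = rng.integers(0, n_classes, size=n)

# A linear probe: least-squares map from activations to one-hot square labels.
Y = np.eye(n_classes)[labels]
X = np.hstack([acts, np.ones((n, 1))])             # add a bias column
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = (X @ W).argmax(axis=1)
print("probe accuracy:", (pred == labels).mean())  # near chance on random data
```

On real activations, high probe accuracy is evidence that the board information is linearly decodable from the hidden states; on the random stand-ins above it stays near chance.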
Exploring architectures - Transformers II
The text explains how Transformers utilize queries, keys, and values to calculate self-attention weights for tokens. It details the process of obtaining the self-attention weights and generating output tokens through neural networks. The final steps involve calculating loss using cross-entropy and backpropagating to update the weight parameters.
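As a rough illustration of the query/key/value mechanics described above, here is a minimal single-head scaled dot-product self-attention in NumPy; the dimensions, random weights, and causal mask are illustrative assumptions rather than details from the post:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=True):
    """Single-head self-attention over token embeddings X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # project tokens into queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)         # similarity of every query with every key
    if causal:                              # block attention to future tokens
        T = X.shape[0]
        mask = np.tril(np.ones((T, T), dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)      # self-attention weights, each row sums to 1
    return weights @ V, weights             # each output is a weighted mix of values

# Tiny demo with random weights (illustration only, not a trained model).
rng = np.random.default_rng(0)
T, d_model, d_k = 5, 16, 8
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) * 0.1 for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                # (5, 8) (5, 5)
```

In a full Transformer these projections are learned: the outputs feed further layers, a cross-entropy loss is computed on the predicted tokens, and backpropagation updates Wq, Wk, and Wv along with the rest of the weights.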
Auto-Regressive Next-Token Predictors are Universal Learners
Simple linear next-token predictors can efficiently approximate any function computable by a Turing machine. Even basic models like linear networks and shallow multi-layer perceptrons show strong performance on tasks like text generation and arithmetic: by decomposing a problem into a sequence of next-token predictions, auto-regressive learning lets these simple models solve surprisingly complex tasks.
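To make "auto-regressive next-token prediction with a linear model" concrete, here is a toy sketch (my own construction, not the paper's): a single linear softmax layer trained to continue period-3 sequences, where the correct next token is simply the token three positions back, so a purely linear predictor can learn it and then generate by feeding its own outputs back in:

```python
import numpy as np

rng = np.random.default_rng(0)
V, CTX = 5, 4                 # vocabulary size, context-window length
STEPS, LR = 5000, 0.5

def one_hot_context(ctx):
    """Concatenate one-hot vectors for the last CTX tokens."""
    x = np.zeros(CTX * V)
    for i, t in enumerate(ctx):
        x[i * V + t] = 1.0
    return x

def sample_example():
    """Period-3 repeating sequences: the correct next token is the token
    three positions back, a lookup a purely linear model can represent."""
    pattern = rng.integers(0, V, size=3)
    seq = np.tile(pattern, 4)
    pos = rng.integers(CTX, len(seq))
    return one_hot_context(seq[pos - CTX:pos]), seq[pos]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

W = np.zeros((CTX * V, V))    # the entire model: one linear map to token logits
for _ in range(STEPS):        # plain SGD on the cross-entropy loss
    x, y = sample_example()
    p = softmax(x @ W)
    W -= LR * np.outer(x, p - np.eye(V)[y])

# Autoregressive rollout: feed the model's own predictions back in as context.
ctx = [1, 4, 2, 1]
generated = list(ctx)
for _ in range(8):
    nxt = int(np.argmax(one_hot_context(ctx) @ W))
    generated.append(nxt)
    ctx = ctx[1:] + [nxt]
print(generated)              # continues the 1 4 2 pattern: 1 4 2 1 4 2 ...
```

The paper's claim is of course far stronger, extending this kind of step-by-step prediction all the way to functions computable by Turing machines; the sketch only shows the mechanism of a linear model generating auto-regressively.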
Language Modeling Is Compression (arXiv:2309.10668)
This article discusses the relationship between language modeling and compression. The authors argue that large language models can be viewed as powerful compressors due to their impressive predictive capabilities. They demonstrate that these models can achieve state-of-the-art compression rates across different data modalities, such as images and audio. The authors also explore the connection between compression and prediction, showing that models that compress well also generalize well. They conclude by advocating for the use of compression as a framework for studying and evaluating language models.
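The prediction-compression link comes down to an identity: an arithmetic coder driven by a probabilistic model spends about -log2 p(symbol | context) bits per symbol, so the total compressed size equals the model's cumulative log loss. A small sketch with a toy adaptive character model (not the paper's setup, which plugs large pretrained models into an actual arithmetic coder):

```python
import math
from collections import Counter

def ideal_code_length_bits(text, alphabet):
    """Bits an arithmetic coder driven by an adaptive, Laplace-smoothed
    order-0 character model would need (up to a few bits of overhead):
    each symbol costs -log2 p(symbol | counts seen so far)."""
    counts = Counter({a: 1 for a in alphabet})   # Laplace prior: every symbol starts at 1
    total, bits = len(alphabet), 0.0
    for ch in text:
        bits += -math.log2(counts[ch] / total)   # prediction quality = code length
        counts[ch] += 1                          # update the model after "encoding" the symbol
        total += 1
    return bits

text = "the better the model predicts the text, the fewer bits it needs " * 8
alphabet = sorted(set(text))
bits = ideal_code_length_bits(text, alphabet)
print(f"raw: {8 * len(text)} bits, model: {bits:.0f} bits "
      f"({bits / len(text):.2f} bits/char)")
```

Swapping in a better predictor lowers the bits-per-character directly, which is the sense in which a strong language model is a strong compressor.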
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)