Categories
- ai (361; 0 direct)
  - applications (15)
  - ethics (1)
  - expert_systems (2)
  - game_ai (5)
  - machine_learning (324)
  - robotics (2)
  - theory (1)
- (unnamed) (1 direct)
  - bioelectronics (1)
  - microfluidics (1)
  - neurotechnology (2)
- cognition (70; 1 direct)
  - communication (1)
  - consciousness (3)
  - decision_making (14)
  - intuition (2)
  - learning (19)
  - learning_methods (2)
  - memory (4)
  - problem_solving (1)
  - productivity (6)
  - reasoning (7)
  - systems_thinking (4)
  - working_memory (1)
- cs (876; 1 direct)
  - computer_graphics (10)
  - software_development (466)
  - theory (302)
- mathematics (65; 3 direct)
  - arithmetic (1)
  - category_theory (15)
  - entropy (5)
  - fractal_geometry (1)
  - geometry (2)
  - linear_algebra (9)
  - logic (2)
  - measure_theory (1)
  - optimization (5)
  - probability (4)
  - proof_theory (4)
- physics (10; 2 direct)
  - thermodynamics (1)
- technical_writing (18; 4 direct)
  - documentation (3)
  - surveys (1)
  - tutorials (4)
- uncategorized (38; 38 direct)
Timeline
June 2025 (1 bookmark)

Matrices and graphs
June 5, 2025
The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices
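The correspondence the post refers to can be seen in a few lines: an edge list and an adjacency matrix are the same object, and matrix multiplication then answers graph questions. A minimal sketch (toy graph of our own choosing):

```python
import numpy as np

# A toy directed graph on 3 nodes: 0 -> 1, 1 -> 2, 0 -> 2.
edges = [(0, 1), (1, 2), (0, 2)]

# The same object as a matrix: A[i, j] = 1 iff there is an edge i -> j.
A = np.zeros((3, 3), dtype=int)
for i, j in edges:
    A[i, j] = 1

# Matrix powers count walks: (A @ A)[i, j] is the number of
# length-2 walks from i to j.
walks2 = A @ A
print(walks2[0, 2])  # 1: the single length-2 walk 0 -> 1 -> 2
```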
May 2025 (14 bookmarks)

DeepSeek-V3 Explained 1: Multi-head Latent Attention
May 29, 2025
Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT
May 16, 2025
State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and high costs.
You could have designed state of the art positional encoding
May 16, 2025
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
attention is logarithmic, actually
May 16, 2025
supaiku dot com § attention is logarithmic, actually § time complexity is a very bad model when working with parallelism. in which i make the case for work-depth analysis instead of time complexity.
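The work-depth distinction the post argues for can be made concrete with a toy tree reduction: summing n numbers is O(n) work either way, but only O(log n) depth, which is what bounds latency on a parallel machine. A small sketch (instrumentation is my own, not from the post):

```python
def reduce_tree(xs):
    """Pairwise tree reduction; returns (result, work, depth).

    work  = total additions performed (what time complexity counts)
    depth = longest chain of dependent additions (what actually
            bounds latency given unlimited parallelism)
    """
    if len(xs) == 1:
        return xs[0], 0, 0
    mid = len(xs) // 2
    l, wl, dl = reduce_tree(xs[:mid])
    r, wr, dr = reduce_tree(xs[mid:])
    # one more addition, performed after both halves finish in parallel
    return l + r, wl + wr + 1, max(dl, dr) + 1

total, work, depth = reduce_tree(list(range(16)))
print(total, work, depth)  # 120 15 4: n-1 work, log2(n) depth
```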
AI Arrives In The Middle East: US Strikes A Deal with UAE and KSA – SemiAnalysis
May 16, 2025
The US has signed two landmark agreements with the United Arab Emirates and Kingdom of Saudi Arabia (KSA) that will noticeably shift the balance of power. The deals have economic, geopolitical…
Transformers Represent Belief State Geometry in their Residual Stream
May 16, 2025
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS.…
Llama from scratch (or how to implement a paper without crying)
May 16, 2025
I want to provide some tips from my experience implementing a paper. I'm going to cover my tips so far from implementing a dramatically scaled-down versio...
The Curse of Knowing How, or; Fixing Everything
May 16, 2025
A reflection on control, burnout, and the strange weight of technical fluency.
The MAP-Elites Algorithm: Finding Optimality Through Diversity
May 16, 2025
MAP-Elites is a method in reinforcement learning to avoid the local optimum of a search space by storing multiple candidate solutions…
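The core of MAP-Elites fits in a few lines: an archive keyed by a discretized behavior descriptor, keeping the fittest solution ("elite") per cell, so diversity is maintained alongside optimization. A minimal sketch on a toy problem of my own choosing (not the article's code):

```python
import random

random.seed(0)

# Toy problem: genomes are points in [0, 1]^2.
# Fitness: closeness to (0.5, 0.5).  Behavior descriptor: the point itself,
# discretized onto a 5x5 grid, so the archive keeps one elite per region
# of behavior space rather than a single global optimum.
def fitness(g):
    return -((g[0] - 0.5) ** 2 + (g[1] - 0.5) ** 2)

def cell(g, bins=5):
    return (min(int(g[0] * bins), bins - 1), min(int(g[1] * bins), bins - 1))

archive = {}  # cell -> (fitness, genome)
for _ in range(2000):
    if archive and random.random() < 0.5:
        # mutate a random existing elite
        _, parent = random.choice(list(archive.values()))
        g = tuple(min(1.0, max(0.0, x + random.gauss(0, 0.1))) for x in parent)
    else:
        g = (random.random(), random.random())
    f, c = fitness(g), cell(g)
    if c not in archive or f > archive[c][0]:
        archive[c] = (f, g)

print(len(archive))  # up to 25 cells, each holding its local elite
```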
How To Scale
May 13, 2025
While there are already excellent posts on scaling, I wanted to share my own understanding and the things I've learned over the past few months, and hopefully spark some discussion. I hope this post can shed light for anyone navigating the challenges of scaling up neural networks. There may be mistakes or inaccuracies, so if you want to correct me or would like to discuss further, please feel free to DM me on X or leave a comment.
Are Transformers universal approximators of sequence-to-sequence functions?
May 3, 2025
Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models. Furthermore, using positional encodings, we circumvent the restriction of permutation equivariance, and show that Transformer models can universally approximate arbitrary continuous sequence-to-sequence functions on a compact domain. Interestingly, our proof techniques clearly highlight the different roles of the self-attention and the feed-forward layers in Transformers. In particular, we prove that fixed width self-attention layers can compute contextual mappings of the input sequences, playing a key role in the universal approximation property of Transformers. Based on this insight from our analysis, we consider other simpler alternatives to self-attention layers and empirically evaluate them.
a Hugging Face Space by nanotron
May 3, 2025
The ultimate guide to training LLM on large GPU Clusters
April 2025 (33 bookmarks)

A Group and Its Center, Intuitively
April 27, 2025
Last week we took an intuitive peek into the First Isomorphism Theorem as one example in our ongoing discussion on quotient groups.
Understanding Entanglement With SVD
April 27, 2025
Quantum entanglement is, as you know, a phrase that's jam-packed with meaning in physics. But what you might not know is that the linear algebra behind it is quite simple.
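The simple linear algebra the post alludes to: write a two-qubit state's amplitudes as a 2x2 matrix; its singular values are the Schmidt coefficients, and the state is entangled exactly when more than one is nonzero. A quick numerical check (states chosen by me as standard examples):

```python
import numpy as np

# |psi> = sum_ij C[i, j] |i>|j>  -- amplitudes arranged as a matrix C.
# One nonzero singular value -> product state; several -> entangled.
product = np.outer([1, 0], [1 / np.sqrt(2), 1 / np.sqrt(2)])  # |0>(|0>+|1>)/sqrt(2)
bell = np.array([[1, 0], [0, 1]]) / np.sqrt(2)                # (|00>+|11>)/sqrt(2)

print(np.linalg.matrix_rank(product))                 # 1 -> separable
print(np.linalg.svd(bell, compute_uv=False))          # [0.707, 0.707] -> maximally entangled
```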
Training Large Language Models to Reason in a Continuous Latent Space
April 24, 2025
Large language models (LLMs) are restricted to reason in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem.
The Book of Shaders
April 22, 2025
Gentle step-by-step guide through the abstract and complex universe of Fragment Shaders.
Unstructured Thoughts on the Problems of OSS/FOSS
April 22, 2025
Originally from replies to a Twitter thread: https://x.com/TheGingerBill/status/1914389352416993395
This is not a structured argument against FOSS/OSS but my uncommon thoughts on the topic.
I am not sure if I agree [that FOSS/OSS derives from the same thinking process as the ideology of communism], but I understand the sentiment. The fundamental issue is that software is trivially copyable. I have loads of issues with FOSS and OSS1. And part of this “ideology” (as presented in the original post) is naïvety coupled with only first-order thinking and a poor understanding of ownership.
Training Large Language Models to Reason in a Continuous Latent Space
April 22, 2025
Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
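The mechanism in the abstract (feed the last hidden state back as the next input embedding instead of decoding a token) can be sketched with a toy stand-in model. Everything below is illustrative: random weights, made-up sizes, and a stub `lm_step` in place of a real transformer; it is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 8, 10  # hidden size, vocab size (toy numbers)

# Stand-ins for a trained LM's pieces -- random, purely illustrative.
W_step = rng.normal(size=(D, D)) / np.sqrt(D)  # "transformer" step
W_unembed = rng.normal(size=(D, V))            # hidden state -> token logits

def lm_step(x):
    return np.tanh(W_step @ x)

h = rng.normal(size=D)  # state after reading the prompt

# Ordinary CoT would decode a token each step and feed its embedding back
# (snapping the state to a discrete token).  Coconut instead feeds the
# hidden state -- the "continuous thought" -- straight back as input.
for _ in range(4):
    h = lm_step(h)  # continuous thought, never discretized

logits = h @ W_unembed  # only the final thought is decoded into a token
print(int(np.argmax(logits)))
```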
On the Biology of a Large Language Model
April 22, 2025
Large language models display impressive capabilities. However, for the most part, the mechanisms by which they do so are unknown.
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
April 22, 2025
We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the Llama-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token continuation. From layer to layer, transformers gradually map an input embedding of the final prompt token to an output embedding from which next-token probabilities are computed. Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space", respectively. Crucially, our evidence suggests that the abstract "concept space" lies closer to English than to other languages, which may have important consequences regarding the biases held by multilingual language models.
The Unsustainability of Moore’s Law
April 22, 2025
Roughly every two years, the density of transistors that can be fit onto a silicon chip doubles.
"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"
April 22, 2025
This isn't a new intuition, but a nice new set of results.
+33 7 80 61 21 67
April 21, 2025
Quickly send and receive WhatsApp messages directly from your computer.
tt-metal/tech_reports/memory/allocator.md at main · tenstorrent/tt-metal
April 19, 2025
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal
Multi-layer language heads: the output latent is for text (and nothing else)
April 19, 2025
The last layer’s hidden state in a transformer is meant only for being decoded into token probabilities.
- Don’t use it for autoregressive image generation
- Don’t use it for looped latent transformers
- Only use it to produce the next token in a language model
It is a compressed representation of the...
Subnanosecond flash memory enabled by 2D-enhanced hot-carrier injection
April 19, 2025
A two-dimensional Dirac graphene-channel flash memory based on a two-dimensional-enhanced hot-carrier-injection mechanism that supports both electron and hole injection is used to make devices with a subnanosecond program speed.
CS336: Language Modeling from Scratch
April 19, 2025
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general purpose system address a range of downstream tasks.
A Gentle Introduction to Lambda Calculus - Part 1: Syntax
April 19, 2025
Even though lots of people nowadays advocate for applying functional programming principles to JavaScript, not many of them know the principles of Lambda Cal...
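Since JavaScript-flavored lambdas are the article's hook, the same idea can be sketched directly in Python lambdas: Church numerals encode a number n as "apply f n times", and arithmetic falls out of function composition (encoding chosen here as the standard one, not necessarily the article's):

```python
# Church numerals: a number n is "apply f n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # decode by counting applications
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```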
Getting Started
April 19, 2025
Yet it seems to me that the situation right now is that LtU has readers with very different backgrounds, among them many readers who haven't studied PL formally.
Intelligence as efficient model building
April 19, 2025
Personal site for posts about my interests: the biotech industry, medicine, molecular biology, neuroscience, biorisk, science, consciousness, AI, innovation, decision making, philosophy, games, sci-fi, probability, and forecasting (among other things). I write to learn, mostly about biotech.
What Is ChatGPT Doing … and Why Does It Work?
April 15, 2025
Stephen Wolfram explores the broader picture of what's going on inside ChatGPT and why it produces meaningful text. Discusses models, training neural nets, embeddings, tokens, transformers, language syntax.
Position: Model Collapse Does Not Mean What You Think
April 10, 2025
The proliferation of AI-generated content online has fueled concerns over \emph{model collapse}, a degradation in future generative models' performance when trained on synthetic data generated by earlier models. Industry leaders, premier research journals and popular science publications alike have prophesied catastrophic societal consequences stemming from model collapse. In this position piece, we contend this widespread narrative fundamentally misunderstands the scientific evidence. We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse. To assess how significantly different interpretations of model collapse threaten future generative models, we posit what we believe are realistic conditions for studying model collapse and then conduct a rigorous assessment of the literature's methodologies through this lens. While we leave room for reasonable disagreement, our analysis of research studies, weighted by how faithfully each study matches real-world conditions, leads us to conclude that certain predicted claims of model collapse rely on assumptions and conditions that poorly match real-world conditions, and in fact several prominent collapse scenarios are readily avoidable. Altogether, this position paper argues that model collapse has been warped from a nuanced multifaceted consideration into an oversimplified threat, and that the evidence suggests specific harms more likely under society's current trajectory have received disproportionately less attention.
Recent AI model progress feels mostly like
April 7, 2025
About nine months ago, I and three friends decided that AI had gotten good enough to monitor large codebases autonomously for security problems. We s…
Building an Open Future
April 5, 2025
We are building an open future for AI. Own your silicon future. Join us.
diffusion transformers
April 5, 2025
Metaphorically, you can think of Vision Transformers as the eyes of the system, able to understand and contextualize what it sees, while Stable Diffusion is the hand of the system, able to generate and manipulate images based on this understanding.
Faking ADTs and GADTs in Languages That Shouldn't Have Them
April 1, 2025
Haskell is the world’s best programming language, but let’s face the harsh reality that a lot of times in life you’ll have to write in other programming languages. But alas you have been fully Haskell-brained and lost all ability to program unless it is type-directed; you don’t even know how to start writing a program without imagining its shape as a type first. Well, fear not. The foundational theory behind Algebraic Data Types and Generalized Algebraic Data Types (ADTs and GADTs) is so fundamental that it will fit (somewhat) seamlessly into whatever language you’re forced to write. After all, if they can fit profunctor optics in Microsoft’s Java code, the sky’s the limit! This is an “April Fools” joke in the tradition of my previous one, in that some of the ways we are going to twist these other languages might seem unconventional or possibly ill-advised… but also the title is definitely a lie: these languages definitely should have them! :D
March 2025 (36 bookmarks)

Accelerate
March 29, 2025
Accelerate is a language for array-based computations, designed to exploit massive parallelism.
Ok Rust, You Really Have a Readability Problem
March 29, 2025
Rust is safe. Rust is fast. Rust is powerful. And Rust is… sometimes completely unreadable.
Circuit Tracing: Revealing Computational Graphs in Language Models
March 29, 2025
Deep learning models produce their outputs using a series of transformations distributed across many computational units (artificial “neurons”).
Analyzing Modern NVIDIA GPU cores
March 29, 2025
GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures that are more than 15 years old.
This paper reverse engineers modern NVIDIA GPU cores, unveiling many key aspects of their design and explaining how GPUs leverage hardware-compiler techniques where the compiler guides hardware during execution. In particular, it reveals how the issue logic works including the policy of the issue scheduler, the structure of the register file and its associated cache, and multiple features of the memory pipeline. Moreover, it analyses how a simple instruction prefetcher based on a stream buffer fits well with modern NVIDIA GPUs and is likely to be used. Furthermore, we investigate the impact of the register file cache and the number of register file read ports on both simulation accuracy and performance.
By modeling all these new discovered microarchitectural details, we achieve 18.24% lower mean absolute percentage error (MAPE) in execution cycles than previous state-of-the-art simulators, resulting in an average of 13.98% MAPE with respect to real hardware (NVIDIA RTX A6000). Also, we demonstrate that this new model stands for other NVIDIA architectures, such as Turing. Finally, we show that the software-based dependence management mechanism included in modern NVIDIA GPUs outperforms a hardware mechanism based on scoreboards in terms of performance and area.
tt-metal/tech_reports/AdvancedPerformanceOptimizationsForModels/AdvancedPerformanceOptimizationsForModels.md at main · tenstorrent/tt-metal · GitHub
March 29, 2025
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal
Why is Yazi fast?
March 28, 2025
This article assumes that you have already used Yazi and are familiar with most of its features.
User Guide for NVPTX Back-end
March 28, 2025
To support GPU programming, the NVPTX back-end supports a subset of LLVM IR along with a defined set of conventions used to represent GPU programming concepts.
An AnandTech Interview with Jim Keller: 'The Laziest Person at Tesla'
March 27, 2025
I've spoken about Jim Keller many times on AnandTech.
Notes/Primer on Clang Compiler Frontend (1) : Introduction and Architecture
March 25, 2025
Notes/Primer on Clang Compiler Frontend: Introduction and Architecture
These are my notes on chapters 1 & 2 of the Clang Compiler Frontend by Ivan Murashko. The book is focused on teaching the fundamentals of LLVM to C++ engineers who are interested in learning about compilers to optimize their daily workflow by enhancing their code quality and overall development process. (I’ve referenced this book extensively, and a lot of the snippets here are from it.)
Implementation of simple microprocessor using verilog
March 25, 2025
I am trying to make a simple microprocessor in verilog as a way to understand verilog and assembly at the same time.
I am not sure if I am implementing what I think of microprocessors well enough ...
learn-fpga/FemtoRV/TUTORIALS/FROM_BLINKER_TO_RISCV/README.md at master · BrunoLevy/learn-fpga · GitHub
March 24, 2025
Learning FPGA, yosys, nextpnr, and RISC-V . Contribute to BrunoLevy/learn-fpga development by creating an account on GitHub.
Why async Rust?
March 24, 2025
I genuinely can’t understand how anybody could look at the mess that’s Rust’s async and think that it was a good design for a language that already had the reputation of being very complicated to write.
Softmax Attention is a Fluke
March 24, 2025
Calibrated Attention (NanoGPT). Attention is the magic ingredient of modern neural networks. It is the core of what has launched performant language models into the spotlight starting with GPT, and since then it has extended its hands across all modalities. There are a number of desirable properties that make attention a first-class building block. Namely:
- It handles variable sequence lengths with ease
- It allows for a global receptive field without needing to scale parameters
Transformers Laid Out
March 23, 2025
I have encountered that there are mainly three types of blogs/videos/tutorials talking about transformers
Template Haskell
March 22, 2025
Intuitively, Template Haskell provides new language features that allow us to convert back and forth between concrete syntax, i.e.
A friendly introduction to machine learning compilers and optimizers
March 18, 2025
[Twitter thread, Hacker News discussion]
Comments on Source
March 18, 2025
The section of the wiki allows anyone to document, explain, post questions, or make comments on the Lua source code. You may link to [1] or paste the code in question.
Bloom’s 3 Stages of Talent Development
March 18, 2025
First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field.
Russell’s Paradox and Possible Solutions
March 18, 2025
The origins of set theory can be traced back to a Bohemian priest, Bernhard Bolzano (1781-1848), who was a professor of religion at the University of Prague.
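The paradox itself fits in two lines of set-builder notation; a sketch of the standard derivation:

```latex
% Unrestricted comprehension lets us form
R \;=\; \{\, x \mid x \notin x \,\}.
% Asking whether R is a member of itself then yields, by the definition of R,
R \in R \;\Longleftrightarrow\; R \notin R,
% a contradiction -- so naive comprehension must be restricted,
% e.g. by ZF's axiom schema of separation.
```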
The Making of Python
March 17, 2025
Guido van Rossum is the author of Python, an interpreted, interactive object-oriented programming language.
tt-metal/METALIUM_GUIDE.md at main · tenstorrent/tt-metal · GitHub
March 17, 2025
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal
Scoping out the Tenstorrent Wormhole
March 17, 2025
The Tenstorrent Wormhole n300s PCIe accelerator board is available for purchase, featuring 672 RISC-V cores driving 466 TFLOP/s of FP8 matmul.
What’s the (floating) Point of all these data types? A (not so) brief overview of the history and usage of datatypes within the wide world of computation
March 17, 2025
This presentation delves into the fascinating and sometimes aggravating world of numerical data types, exploring the evolution, strengths, and weaknesses of decimal, fixed point, floating point, and shared exponent formats over the past 70 years.
Physics of language models
March 17, 2025
Many asked about collaborations (details are in FAQ). Short answer: unless you're from Meta and willing to work with us in your spare time (20+ hrs/week), or you're an early-year PhD from UCB/NYU/CMU/UW (but the application deadline was Jan 10, 2025).
Citation request: I'm delighted to know that multiple
Tenstorrent first thoughts
March 17, 2025
I've looked into alternative AI accelerators to continue my saga of running GGML on lower power-consumption hardware. The most promising - and the only one that ever replied to my emails - was Tenstorrent. This post is me deeply thinking about if buying their hardware for development is a good inve ...
Neural Networks, Manifolds, and Topology
March 9, 2025
However, there remain a number of concerns about them. One is that it can be quite challenging to understand what a neural network is really doing.
Attention from Beginners Point of View
March 9, 2025
Transformers are a type of neural network architecture which is popularly used for text generations, machine translations, etc.
(How) Do Language Models Track State?
March 9, 2025
Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that appear to require tracking the unobserved state of an evolving world. How do they do so? We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general. We show that LMs consistently learn one of two state tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, then refines this with an associative scan. The two mechanisms exhibit markedly different robustness properties, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics. Our results demonstrate that transformer LMs, whether pretrained or fine-tuned, can learn to implement efficient and interpretable state tracking mechanisms, and the emergence of these mechanisms can be predicted and controlled.
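The probe task from this abstract (track object positions under a sequence of swaps, i.e. compose permutations) and the "easy feature" the second mechanism exploits (permutation parity) are both a few lines each. A minimal sketch of the task, not the paper's code:

```python
# State tracking as permutation composition: apply a sequence of swaps.
def apply_swaps(n, swaps):
    state = list(range(n))
    for i, j in swaps:
        state[i], state[j] = state[j], state[i]
    return tuple(state)

# Parity is the cheap-to-compute feature a model can use to prune the
# space of possible outputs: each swap flips it.
def parity(swaps):
    return len(swaps) % 2

swaps = [(0, 1), (1, 2), (0, 2)]
print(apply_swaps(3, swaps), parity(swaps))  # (0, 2, 1) 1
```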
Why Attention Is All You Need
March 9, 2025
The Transformer architecture introduced in this paper was a major breakthrough in sequence transduction methodologies, particularly within neural machine translation (NMT) and broader natural language processing (NLP).
CFD Python: 12 steps to Navier-Stokes
March 7, 2025
We announce the public release of online educational materials for self-learners of CFD using IPython Notebooks: the CFD Python Class!
tt-mlir documentation
March 6, 2025
The following document provides an overview of the TT-MLIR project, with a focus on the technical specifications of an MLIR-based compiler stack. So what exactly is an MLIR-based compiler stack?
Yizhou Shan's Home Page
March 6, 2025
This paper has a really nice Intro, pay close attention to how they lay out the storyline.
Crossing the uncanny valley of conversational voice
March 1, 2025
At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.
February 2025 (13 bookmarks)

How to Think About TPUs
February 26, 2025
All about how TPUs work, how they're networked together to enable multi-chip training and inference, and how they limit the performance of our favorite algorithms. While this may seem a little dry, it's super important for actually making models efficient.
Programming Really Is Simple Mathematics
February 25, 2025
A re-construction of the fundamentals of programming as a small mathematical theory (PRISM) based on elementary set theory. Highlights:
- Zero axioms. No properties are assumed, all are proved (from standard set theory).
- A single concept covers specifications and programs.
- Its definition only involves one relation and one set.
- Everything proceeds from three operations: choice, composition and restriction.
- These techniques suffice to derive the axioms of classic papers on the "laws of programming" as consequences and prove them mechanically.
- The ordinary subset operator suffices to define both the notion of program correctness and the concepts of specialization and refinement.
- From this basis, the theory deduces dozens of theorems characterizing important properties of programs and programming.
- All these theorems have been mechanically verified (using Isabelle/HOL); the proofs are available in a public repository.

This paper is a considerable extension and rewrite of an earlier contribution [arXiv:1507.00723].
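The "single concept covers specifications and programs" idea can be sketched with sets of pairs: both a spec and a program are relations, and correctness/refinement is literally the subset operator. The encoding below is my own toy illustration (integer square root), not the paper's notation:

```python
# Specifications and programs as one concept: a relation, i.e. a set of
# (input, output) pairs.
# Spec: y is an integer square root of x  (possibly many acceptable outputs).
spec = {(x, y) for x in range(10) for y in range(10)
        if y * y <= x < (y + 1) ** 2}

# Program: a deterministic implementation, also just a relation.
prog = {(x, int(x ** 0.5)) for x in range(10)}

# Correctness/refinement is the ordinary subset operator.
print(prog <= spec)  # True: every behavior of prog is allowed by spec
```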
Tenstorrent Wormhole Series Part 1: Physicalities
February 25, 2025
A company called Tenstorrent design and sell PCIe cards for AI acceleration. At the time of writing, they've recently started shipping their Wormhole n150s and Wormhole n300s cards.
Community Highlight: Tenstorrent Wormhole Series Part 2: Which disabled rows?
February 25, 2025
An in depth look at Tenstorrent Wormhole, originally posted on corsix.org
The world's largest prediction market.
February 24, 2025
Polymarket is the world’s largest prediction market, allowing you to stay informed and profit from your knowledge by betting on future events across various topics.
neural video codecs: the future of video compression
February 17, 2025
how deep learning could rewrite the way we encode and decode video
Unnamed Document
February 17, 2025
Mastering LLM Techniques: Evaluation
February 15, 2025
Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and multifaceted nature of these systems.
Mastering LLM Inference Techniques: Inference Optimization
February 15, 2025
Learn about the most pressing challenges in LLM inference, along with some practical solutions.
Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling
February 15, 2025
As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long…
The high-return activity of raising others’ aspirations
February 12, 2025
Yesterday I had lunch with a former Ph.D student of mine, who is now highly successful and tenured at a very good school. I was reminded that, over twenty years ago, I was Graduate Director of Admissions. One of my favorite strategies was to take strong candidates who applied for Masters and also offer them […]
January 2025 (14 bookmarks)

Tilde, my LLVM alternative
January 25, 2025
I'm Yasser and I've made it my mission to produce an alternative to LLVM, the current king of compiler backend libraries.
A WebAssembly compiler that fits in a tweet
January 25, 2025
Starting with a 192-byte one-liner that implements a Reverse Polish Notation arithmetic compiler, we'll work backward to transform it into readable JavaScript by removing one code golf trick at a time
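Reverse Polish Notation is the easy case for a compiler precisely because evaluation is a single stack pass: no precedence, no parentheses. The interpreter half of the idea, as a sketch (the linked post emits WebAssembly instead; this code is my own):

```python
import operator

def eval_rpn(src):
    """Evaluate Reverse Polish Notation, e.g. '3 4 + 2 *' -> 14.0."""
    ops = {'+': operator.add, '-': operator.sub,
           '*': operator.mul, '/': operator.truediv}
    stack = []
    for tok in src.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()  # note operand order
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

print(eval_rpn('3 4 + 2 *'))  # 14.0
```

A compiler to a stack machine like Wasm follows the same loop, emitting a push or an arithmetic opcode per token instead of computing values.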
Proof of correctness of data representation
January 25, 2025
Unnamed Document
January 25, 2025
Unveiling_DeepSeek.pdf
January 22, 2025
successful modifications since its inception, let alone large-scale validation.
Stating the problem in Lean
January 19, 2025
Note: this post was written for Lean 3; the latest version, Lean 4, is a very different language.
Turn back the clock to 2009: a confused physics major newly infatuated with math and computer science, I enrolled in MATH 273: Numbers and Proofs at the University of Calgary. This wasn’t my first encounter with mathematical proof; in first-year calculus I’d mastered rote regurgitation of delta-epsilon proofs. Despite writing out several dozen, their meaning never progressed beyond a sort of incantation I can summon to this day (for every \( \epsilon > 0 \) there exists a \( \delta > 0 \) such that…).
DeepSeek-V3 Explained: A Deep Dive into the Next-Generation AI Model
January 18, 2025
Artificial Intelligence (AI) is advancing at an unprecedented pace, and the DeepSeek-V3 model is at the forefront of this revolution. As…
Foundations of Large Language Models
January 17, 2025
This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is structured into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods. It is intended for college students, professionals, and practitioners in natural language processing and related fields, and can serve as a reference for anyone interested in large language models.
Category Theory: Lecture Notes and Online Books
January 10, 2025
The links below are to various freely (and legitimately!) available online mathematical resources for those interested in category theory at an elementary/intermediate level. There is supplementary page, introductory readings for philosophers, for reading suggestions for those looking for the most accessible routes into category theory and/or links to philosophical discussions. A gentle introduction? My Category … Category Theory: Lecture Notes and Online Books Read More »
Why Futhark?
January 9, 2025
A high-performance and high-level purely functional data-parallel array programming language that can execute on the GPU and CPU.
My Account (Hesabım) - Pozitif Teknoloji
January 6, 2025
Payment (Ödeme) - Pozitif Teknoloji
January 6, 2025
*Please enter your order number in the payment description field; our company is not responsible for delays in wire transfers made without an order number.
Clear cache x app ios
January 4, 2025
Any way to delete the cache or app data on iPhone? - Reddit (Jul 26, 2023). X app taking up 1.
December 2024
66 bookmarks
Bloom filters debunked: Dispelling 30 Years of bad math with Coq!
December 27, 2024
While conceptually simple, this feature actually requires more engineering effort than one would expect - in particular, tracking the set of known malicious URLs in a practical manner turns out to be somewhat difficult.
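The structure under scrutiny is easy to state: a Bloom filter sets k bits per inserted item and answers membership with possible false positives but no false negatives. A minimal sketch (the hash derivation and sizing here are illustrative choices, not the post's code):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (illustrative choice).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

urls = BloomFilter()
urls.add("http://malicious.example/a")
print("http://malicious.example/a" in urls)  # True: never a false negative
print("http://benign.example/b" in urls)     # usually False; rarely a false positive
```

The subtlety the post formalizes in Coq is exactly the false-positive rate as a function of size and k, which the folklore analysis gets slightly wrong.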
DeepSeek-V3/DeepSeek_V3.pdf at main · deepseek-ai/DeepSeek-V3
December 26, 2024
by Marcus Hutter and David Quarel and Elliot Catt
December 24, 2024
The book can be ordered from amazon.com / co.
Deepseek: The Quiet Giant Leading China’s AI Race
December 24, 2024
Annotated translation of its CEO's deepest interview
Demystifying Debuggers, Part 2: The Anatomy Of A Running Program
December 23, 2024
On the concepts involved in a running program. What happens, exactly, when you double click an executable file, or launch it from the command line, and it begins to execute?
Towards a Categorical Foundation of Deep Learning: A Survey
December 22, 2024
The unprecedented pace of machine learning research has led to incredible advances, but also poses hard challenges. At present, the field lacks strong theoretical underpinnings, and many important achievements stem from ad hoc design choices which are hard to justify in principle and whose effectiveness often goes unexplained. Research debt is increasing and many papers are found not to be reproducible.
This thesis is a survey that covers some recent work attempting to study machine learning categorically. Category theory is a branch of abstract mathematics that has found successful applications in many fields, both inside and outside mathematics. Acting as a lingua franca of mathematics and science, category theory might be able to give a unifying structure to the field of machine learning. This could solve some of the aforementioned problems.
In this work, we mainly focus on the application of category theory to deep learning. Namely, we discuss the use of categorical optics to model gradient-based learning, the use of categorical algebras and integral transforms to link classical computer science to neural networks, the use of functors to link different layers of abstraction and preserve structure, and, finally, the use of string diagrams to provide detailed representations of neural network architectures.
Soft question: Deep learning and higher categories
December 22, 2024
Recently, I have stumbled upon certain articles and lecture videos that use category theory to explain certain aspects of machine learning or deep learning (e.g. Cats for AI and the paper An enriched
Algebraic Databases
December 22, 2024
Databases have been studied category-theoretically for decades. The database schema---whose purpose is to arrange high-level conceptual entities---is generally modeled as a category or sketch. The data itself, often called an instance, is generally modeled as a set-valued functor, assigning to each conceptual entity a set of examples. While mathematically elegant, these categorical models have typically struggled with representing concrete data such as integers or strings.
In the present work, we propose an extension of the set-valued functor model, making use of multisorted algebraic theories (a.k.a. Lawvere theories) to incorporate concrete data in a principled way. This also allows constraints and queries to make use of operations on data, such as multiplication or comparison of numbers, helping to bridge the gap between traditional databases and programming languages.
We also show how all of the components of our model---including schemas, instances, change-of-schema functors, and queries---fit into a single double categorical structure called a proarrow equipment (a.k.a. framed bicategory).
Categorical Databases
December 22, 2024
walter
December 22, 2024
FPGAs for Software Engineers 0: The Basics
December 22, 2024
A brief introduction to FPGAs, Verilog and simulation
A note about "The Humane Representation of Thought"
December 17, 2024
A year and a half ago, on a plane, I wrote An Ill-Advised Personal Note about "Media for Thinking the Unthinkable".
BLT: Patches Scale Better Than Tokens
December 17, 2024
On Ousterhout’s Dichotomy (Oct 6, 2024)
December 17, 2024
Why are there so many programming languages? One of the driving reasons for this is that some
languages tend to produce fast code, but are a bit of a pain to use (C++), while others are a breeze
to write, but run somewhat slow (Python). Depending on the ratio of CPUs to programmers, one or the
other might be relatively more important.
The categorical abstract machine
December 17, 2024
The Cartesian closed categories have been shown by several authors to provide the right framework of the model theory of λ-calculus. The second author…
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
December 17, 2024
We present our position on the elusive quest for a general-purpose framework
for specifying and studying deep learning architectures. Our opinion is that
the key attempts made so far lack a coherent bridge between specifying
constraints which models must satisfy and specifying their implementations.
Focusing on building such a bridge, we propose to apply category theory --
precisely, the universal algebra of monads valued in a 2-category of parametric
maps -- as a single theory elegantly subsuming both of these flavours of neural
network design. To defend our position, we show how this theory recovers
constraints induced by geometric deep learning, as well as implementations of
many architectures drawn from the diverse landscape of neural networks, such as
RNNs. We also illustrate how the theory naturally encodes many standard
constructs in computer science and automata theory.
Fundamental Components of Deep Learning: A category-theoretic approach
December 17, 2024
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems, parametricity and bidirectionality: we
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
Logic and linear algebra: an introduction
December 17, 2024
We give an introduction to logic tailored for algebraists, explaining how proofs in linear logic can be viewed as algorithms for constructing morphisms in symmetric closed monoidal categories with additional structure. This is made explicit by showing how to represent proofs in linear logic as linear maps between vector spaces. The interesting part of this vector space semantics is based on the cofree cocommutative coalgebra of Sweedler.
Gemini: A Family of Highly Capable Multimodal Models
December 17, 2024
This report introduces a new family of multimodal models, Gemini, that
exhibit remarkable capabilities across image, audio, video, and text
understanding. The Gemini family consists of Ultra, Pro, and Nano sizes,
suitable for applications ranging from complex reasoning tasks to on-device
memory-constrained use-cases. Evaluation on a broad range of benchmarks shows
that our most-capable Gemini Ultra model advances the state of the art in 30 of
32 of these benchmarks - notably being the first model to achieve human-expert
performance on the well-studied exam benchmark MMLU, and improving the state of
the art in every one of the 20 multimodal benchmarks we examined. We believe
that the new capabilities of the Gemini family in cross-modal reasoning and
language understanding will enable a wide variety of use cases. We discuss our
approach toward post-training and deploying Gemini models responsibly to users
through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud
Vertex AI.
Flow Matching Guide and Code
December 17, 2024
Flow Matching (FM) is a recent framework for generative modeling that has
achieved state-of-the-art performance across various domains, including image,
video, audio, speech, and biological structures. This guide offers a
comprehensive and self-contained review of FM, covering its mathematical
foundations, design choices, and extensions. By also providing a PyTorch
package featuring relevant examples (e.g., image and text generation), this
work aims to serve as a resource for both novice and experienced researchers
interested in understanding, applying and further developing FM.
Logical Complexity of Proofs
December 17, 2024
If you cannot find proofs, talk about them. Robert Reckhow with his advisor Stephen Cook famously started the formal study of the complexity of proofs with their 1979 paper. They were interested in…
Proofs and Types
December 16, 2024
Richard Hamming - Wikipedia
December 16, 2024
Richard Wesley Hamming (February 11, 1915 – January 7, 1998) was an American mathematician whose work had many implications for computer engineering and telecommunications.
What is the "question" that programming language theory is trying to answer?
December 16, 2024
I've been interested in various topics like Combinatory Logic, Lambda Calculus, Functional Programming for a while and have been studying them. However, unlike the "Theory of Computation" which str...
Introducing Limbo: A complete rewrite of SQLite in Rust
December 16, 2024
We forked SQLite with the libSQL project. What would it be like if we just rewrote it?
TLA+ is hard to learn
December 16, 2024
I’m a fan of the formal specification language TLA+. With TLA+, you can build models of programs or systems, which helps to reason about their behavior. TLA+ is particularly useful for reason…
How hard is constraint programming?
December 16, 2024
Writing code using the Z3 SMT solver is different from typical programming, due to mixed programming models--not unlike CUDA for GPUs. Here's what to expect.
Fundamental Components of Deep Learning: A category-theoretic approach
December 16, 2024
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems, parametricity and bidirectionality: we
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
Geeks, MOPs, and sociopaths in subculture evolution
December 16, 2024
How muggles and sociopaths invade and undermine creative subcultures; and how to stop them.
Advanced programming languages
December 16, 2024
Students often ask for a recommendation on what language they should learn next.
ugh.book
December 16, 2024
Working memory - Wikipedia
December 16, 2024
Working memory is a cognitive system with a limited capacity that can hold information temporarily. [1] It is important for reasoning and the guidance of decision-making and behavior.
Working hurts less than procrastinating, we fear the twinge of starting
December 16, 2024
When you procrastinate, you're probably not procrastinating because of the pain of working. …
llama.cpp guide - Running LLMs locally, on any hardware, from scratch
December 16, 2024
Psst, kid, want some cheap and small LLMs?
GitHub - avinassh/py-caskdb: (educational) build your own disk based KV store
December 16, 2024
(educational) build your own disk based KV store. Contribute to avinassh/py-caskdb development by creating an account on GitHub.
Command Line Interface Guidelines
December 13, 2024
An open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day.
How Many Computers Are In Your Computer?
December 11, 2024
Any ‘computer’ is made up of hundreds of separate computers plugged together, any of which can be hacked. I list some of these parts.
Category theory for scientists (Old version)
December 11, 2024
There are many books designed to introduce category theory to either a
mathematical audience or a computer science audience. In this book, our
audience is the broader scientific community. We attempt to show that category
theory can be applied throughout the sciences as a framework for modeling
phenomena and communicating results. In order to target the scientific
audience, this book is example-based rather than proof-based. For example,
monoids are framed in terms of agents acting on objects, sheaves are introduced
with primary examples coming from geography, and colored operads are discussed
in terms of their ability to model self-similarity.
A new version with solutions to exercises will be available through MIT
Press.
Genie 2: A large-scale foundation world model
December 10, 2024
Generating unlimited diverse training environments for future general agents
Design Of This Website
December 9, 2024
Meta page describing Gwern.net, the self-documenting website’s implementation and experiments for better ‘semantic zoom’ of hypertext; technical decisions using Markdown and static hosting.
WilliamYi96/Awesome-Energy-Based-Models: A curated list of resources on energy-based models.
December 9, 2024
A curated list of resources on energy-based models. - WilliamYi96/Awesome-Energy-Based-Models
"CBLL, Research Projects, Computational and Biological Learning Lab, Courant Institute, NYU"
December 9, 2024
Yann LeCun's Web pages at NYU
yataobian/awesome-ebm: Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning)
December 9, 2024
Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning) - yataobian/awesome-ebm
TuringConf
December 7, 2024
Omens of exceptional talent
December 6, 2024
Gaiseric…was a man of moderate height and lame in consequence of a fall from his horse. He was a man of deep thought and few words
I’m often asked about the signs of exceptional talent I’ve observed, probably because I spend too much time running around talking to people & observing things, instead of doing anything useful.
Patrick Collison, Sam Altman, and Tyler Cowen are the three names that come to mind when thinking about this question. Of my writing, Intelligence killed …
An Introduction to Current Theories of Consciousness
December 6, 2024
(Crosspost from my blog) • • There are few academic lists of theories of consciousness (Doerig 2020) as well as some good blog post series about specific ideas (shout out to SelfAwarePatterns), but…
Being the (Pareto) Best in the World
December 6, 2024
John Wentworth argues that becoming one of the best in the world at *one* specific skill is hard, but it's not as hard to become the best in the worl…
Greg Yang
December 5, 2024
I am currently developing a framework called Tensor Programs for understanding large neural networks.
A Century of Mathematics in America, Part I
December 4, 2024
Fastest contributed programs, grouped by programming language implementation
December 3, 2024
Charts showing benchmark program performance grouped by implementation language.
Haskell as fast as C: working at a high altitude for low level performance
December 3, 2024
After the last post about high performance, high level programming, Slava Pestov, of Factor fame, wondered whether it was generally true that “if you want good performance you have to write C…
On Competing with C Using Haskell
December 3, 2024
Mark Karpov wrote in his article on Migrating text metrics to pure Haskell how he originally did foreign calls out to C for many of the functions in his text metric package, but now ported them to Haskell when he learned that Haskell can give you performance comparable to C.
Performance
December 3, 2024
Moreover, it's often not clear if two programs which supposedly have the same functionality really do the same thing.
TS_Tutorial
December 3, 2024
Category Theory usage in Algebraic Topology
December 3, 2024
First my question:
How much category theory should someone studying algebraic topology generally know?
Motivation: I am taking my first graduate course in algebraic topology next semester, and,...
Topos Theory in a Nutshell
December 3, 2024
Okay, you wanna know what a topos is? First I'll give you a hand-wavy vague explanation, then an actual definition, then a few consequences of this definition, and then some examples.
context
December 3, 2024
Proof Explorer
December 3, 2024
Inspired by Whitehead and Russell's monumental Principia Mathematica, the Metamath Proof Explorer has over 26,000 completely worked out proofs in its main sections (and over 41,000 counting "mathboxes", which are annexes where contributors can develop additional topics), starting from the very foundation that mathematics is built on and eventually arriving at familiar mathematical facts and beyond.
An Invitation to Applied Category Theory
December 3, 2024
Abstract page for arXiv paper 1803.05316: Seven Sketches in Compositionality: An Invitation to Applied Category Theory
An Invitation to Applied Category Theory
December 3, 2024
Cambridge Core - Programming Languages and Applied Logic - An Invitation to Applied Category Theory
Introducing io_uring_spawn
December 2, 2024
The traditional mechanism for launching a program in a new process on Unix systems—forking and execing—has been with us for decades, but it is not really the most efficient of operations.
November 2024
27 bookmarks
Information Theory: A Tutorial Introduction
November 29, 2024
Shannon's mathematical theory of communication defines fundamental limits on
how much information can be transmitted between the different components of any
man-made or biological system. This paper is an informal but rigorous
introduction to the main ideas implicit in Shannon's theory. An annotated
reading list is provided for further reading.
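The central quantity in Shannon's theory is the entropy \( H(X) = -\sum_i p_i \log_2 p_i \), the average number of bits a source emits per symbol. It is a one-liner to compute; a quick sketch:

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit per toss; a biased coin carries less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # ~0.469
```

The second value is why biased sources compress well: a 90/10 coin needs under half a bit per toss on average, which is the limit Shannon's source coding theorem makes precise.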
Daniel Lemire's blog
November 29, 2024
I find that there can still be a significant benefit to using csFastFloat over the .NET library: it can be about 3 times faster.
A Beginner's Guide to Vectorization By Hand: Part 3
November 29, 2024
We're continuing our expedition into the world of manual vectorization. In this part, we will explain the most common technique for vectorizing conditional code (usually referred to as if-conversion).
Competitive Programming
November 29, 2024
This is the supporting web page for a book titled: "Competitive Programming 4: The Lower Bound of Programming Contests in the 2020s" written by Steven Halim, Felix Halim, and Suhendry Effendy.
Coalescence: making LLM inference 5x faster
November 24, 2024
In this post we’re going to explore a surprising property of structured generation when working with Large Language Models (LLMs): generating structured output from an LLM can be significantly faster than generating unstructured text.
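The mechanism behind the speedup is simple to state: when the output structure forces the next token (a JSON brace, a fixed key name), it can be emitted without consulting the model at all. A toy illustration of the idea, not the library's API (the template syntax and names here are invented):

```python
def generate(template, sample_fn):
    """Fill '?' slots by 'sampling' from the model; emit forced characters for free."""
    out, model_calls = [], 0
    for ch in template:
        if ch == "?":
            out.append(sample_fn())   # only here do we pay for a model call
            model_calls += 1
        else:
            out.append(ch)            # structurally determined: no model call
    return "".join(out), model_calls

text, calls = generate('{"name": "?"}', lambda: "A")
print(text, calls)  # {"name": "A"} 1
```

Thirteen output characters, one model call: the rest of the JSON skeleton is "coalesced" away, which is where the claimed speedup comes from.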
Classics in the History of Psychology -- Miller (1956)
November 21, 2024
“You and Your Research”
November 20, 2024
At a seminar in the Bell Communications Research Colloquia Series, Dr. Richard W.
Algorithms for Modern Hardware
November 19, 2024
Its intended audience is everyone from performance engineers and practical algorithm researchers to undergraduate computer science students who have just finished an advanced algorithms course and want to learn more practical ways to speed up a program than by going from O(n log n) to O(n log log n).
Creating enums at comptime
November 18, 2024
Using zig's @Type to dynamically create enums at comptime
How to get from high school math to cutting-edge ML/AI: a detailed 4-stage roadmap with links to the best learning resources that I’m aware of.
November 18, 2024
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning.
Fundamental Components of Deep Learning: A category-theoretic approach
November 18, 2024
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems, parametricity and bidirectionality: we
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
How LLVM Optimizes a Function
November 17, 2024
In some compilers the IR format remains fixed throughout the optimization pipeline, in others the format or semantics change.
PS2_and_PC_BIOS_Interface_Technical_Reference_Apr87
November 17, 2024
How 99% of C Tutorials Get it Wrong
November 17, 2024
But this article did not arise only from my own opinion. The argument I'll present here, at least in its general form, is one which programmers whom I know personally and admire a lot (e.
A Beginner's Guide to Vectorization By Hand: Part 1
November 17, 2024
The CPU vendors have been trying for a lot of time to exploit as much parallelism as they can and the introduction of vector instructions is one way to go.
Tell the Compiler What You Know
November 17, 2024
Compilers a lot of times use magic to uncover hidden mysteries of your program and optimize it aggressively.
Compiler Optimization in a Language you Can Understand
November 17, 2024
In this article, I'll explain compiler optimizations through a series of examples, focusing on what compilers do.
How Target-Independent is Your IR?
November 17, 2024
An esoteric exploration of the target independence of compiler IRs.
Bibliopolis-Book-retypeset-1984
November 12, 2024
Numerical Recipes
November 11, 2024
We are Numerical Recipes, one of the oldest continuously operating sites on the Internet.
Unpacking Intuition
November 10, 2024
Can intuition be taught? The way in which faces are recognized, the structure of natural classes, and the architecture of intuition may all be instances of the same process. The conjecture that intuition is a species of recognition memory implies ...
October 2024
6 bookmarks
TCP Server in Zig - Part 5a - Poll
October 15, 2024
Using non-blocking sockets and poll to improve the scalability of our system.
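The non-blocking-sockets-plus-poll pattern the series builds in Zig has a compact analogue in Python's standard `selectors` module: register sockets, ask the OS which are ready, and only then read. A toy sketch, with a `socketpair` standing in for a real listener and client:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()       # stand-in for client + accepted connection
a.setblocking(False)
b.setblocking(False)
sel.register(b, selectors.EVENT_READ)

a.sendall(b"ping")
# select() blocks until some registered socket is readable (or the timeout hits),
# so one thread can multiplex many connections.
for key, events in sel.select(timeout=1):
    data = key.fileobj.recv(1024)
    print(data)  # b'ping'

sel.unregister(b)
a.close()
b.close()
```

A real server would also register the listening socket and accept new connections from the same loop; the readiness-driven structure is the same one the Zig post builds on top of `poll`.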
6.824 Schedule: Spring 2022
October 1, 2024
Here is the tentative schedule of lectures and due dates. The lecture notes and paper questions for future dates are copies from previous years, and may change.
September 2024
14 bookmarks
2305.20091
September 30, 2024
Humans in 4D: Reconstructing and Tracking Humans with Transformers
September 30, 2024
Join the discussion on this paper page
slpj-book-1987.djvu
September 30, 2024
Typing the technical interview
September 30, 2024
In the formless days, long before the rise of the Church, all spells were woven of pure causality, all actions were permitted, and death was common.
Reversing the technical interview
September 30, 2024
If you want to get a job as a software witch, you’re going to have to pass a whiteboard interview.
Hexing the technical interview
September 30, 2024
But Hacker News has read of you, in their snicker-slithing susurrential warrens, and word has spread, which is why the young man offering you a smörgåsbord of microkitchen delights looks mildly suspicious already.
Nine Rules for SIMD Acceleration of Your Rust Code (Part 1)
September 23, 2024
General Lessons from Boosting Data Ingestion in the range-set-blaze Crate by 7x
Conscious exotica
September 21, 2024
From algorithms to aliens, could humans ever understand minds that are radically unlike our own?
B-trees and database indexes
September 13, 2024
B-trees are used by many modern DBMSs. Learn how they work, how databases use them, and how your choice of primary key can affect index performance.
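The core mechanics are wide, sorted nodes searched by binary search, so a lookup touches only a few pages. A sketch with a fixed two-level tree (a toy layout, not a real DBMS page format; real B-trees also split and rebalance on insert):

```python
from bisect import bisect_right

# Root holds separator keys; each child leaf holds a sorted run of keys.
root_keys = [100, 200]
leaves = [[7, 42, 99], [100, 150], [200, 250, 300]]

def lookup(key):
    child = bisect_right(root_keys, key)   # pick the child page at the root
    leaf = leaves[child]
    i = bisect_right(leaf, key) - 1        # binary search within the page
    return i >= 0 and leaf[i] == key

print(lookup(150))  # True
print(lookup(151))  # False
```

With fan-outs in the hundreds, this two-step descent becomes three or four steps for millions of keys, which is why databases reach a row in a handful of page reads.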
Safe C++
September 13, 2024
Over the past two years, the United States Government has been issuing warnings about memory-unsafe programming languages with increasing urgency.
Tutorial on Diffusion Models for Imaging and Vision
September 10, 2024
The astonishing growth of generative tools in recent years has empowered many
exciting applications in text-to-image generation and text-to-video generation.
The underlying principle behind these generative tools is the concept of
diffusion, a particular sampling mechanism that has overcome some shortcomings
that were deemed difficult in the previous approaches. The goal of this
tutorial is to discuss the essential ideas underlying the diffusion models. The
target audience of this tutorial includes undergraduate and graduate students
who are interested in doing research on diffusion models or applying these
models to solve other problems.
Async Rust can be a pleasure to work with (without `Send + Sync + 'static`)
September 9, 2024
Async Rust is powerful. And it can be a pain to work with (and learn). Async Rust can be a pleasure to work with, though, if we can do it without `Send + Sync + 'static`.
The Perfect Plan
September 3, 2024
Too often do we obsess over the perfect plan to chase our dreams, resulting in analysis paralysis. Instead of being stuck in this limbo, I've made the perfect plan for anyone to chase their dreams.
The Fast Track
September 1, 2024
In order to accelerate the development of prospective mathematical scientists, we have selected a series of textbooks one can study to reach expertise in mathematics and physics in the most efficient manner possible.
August 2024
25 bookmarks
Linus Torvalds talks AI, Rust adoption, and why the Linux kernel is 'the only thing that matters'
August 24, 2024
In a wide-ranging conversation with Verizon open-source officer Dirk Hohndel, 'plodding engineer' Linus Torvalds discussed where Linux is today and where it may go tomorrow.
Intercepting and modifying Linux system calls with ptrace
August 24, 2024
Intercepting and modifying Linux system calls with ptrace
What's the big deal about Deterministic Simulation Testing?
August 24, 2024
What's the big deal about Deterministic Simulation Testing?
Zig and Emulators
August 24, 2024
Some quick Zig feedback in the context of a new 8-bit emulator project I started a little while ago:
A ToC of the 20 part linker essay
August 24, 2024
I release this message (the ToC and comments) into the public domain, no right reserved.
trading_interview_blog
August 21, 2024
`zig cc`: a Powerful Drop-In Replacement for GCC/Clang
August 15, 2024
If you have heard of Zig before, you may know it as a promising new programming language which is ambitiously trying to overthrow C as the de-facto systems language.
Zig Build System
August 15, 2024
The fundamental commands zig build-exe, zig build-lib, zig build-obj, and zig test are often sufficient.
Resources for Amateur Compiler Writers
August 14, 2024
I know whole swaths of the literature are left out, but this is a page for amateur compiler writers. Anything that I did not find practical is not listed here.
MattPD/cpplinks: A categorized list of C++ resources.
August 14, 2024
A categorized list of C++ resources. Contribute to MattPD/cpplinks development by creating an account on GitHub.
Putting the “You” in CPU
August 14, 2024
Curious exactly what happens when you run a program on your computer? Learn how multiprocessing works, what system calls really are, how computers manage memory with hardware interrupts, and how Linux loads executables.
How to Compile Your Language
August 9, 2024
The guide also covers how to create a platform-specific executable with the help of the LLVM compiler infrastructure, which all of the previously mentioned languages use for the same purpose.
Introduction to the Odin Programming Language
August 9, 2024
Preface This article is an introduction to the Odin Programming Language. It is aimed at people who know a bit of programming but have never touched Odin. It is not a reference guide; rather, I try to keep things informal and talk about what I think are important aspects of the language. There will be some notes on differences to C/C++, as Odin in many ways tries to be a better C. If you enjoy this article and want to support me, then you can do so by becoming a patron.
Arena allocator tips and tricks
August 6, 2024
Over the past year I’ve refined my approach to arena allocation. With practice, it’s effective, simple, and fast; typically as easy to use as garbage collection but without the costs.
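The bump-allocation core of an arena can be sketched in a few lines. This Python model is illustrative only (the `Arena` name and buffer-backed design are assumptions, not the article's code), but it shows why allocation is just a pointer bump and why "freeing" is a single reset:

```python
class Arena:
    """Bump allocator over one contiguous buffer (a sketch; a real arena
    hands out raw memory rather than memoryview slices)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.off = 0

    def alloc(self, n, align=8):
        start = (self.off + align - 1) & ~(align - 1)  # round up to alignment
        if start + n > len(self.buf):
            raise MemoryError("arena exhausted")
        self.off = start + n
        return memoryview(self.buf)[start:start + n]

    def reset(self):
        # Free everything at once -- the core arena trick: no per-object
        # bookkeeping, so deallocation is O(1).
        self.off = 0
```

Allocation is one add and one compare, which is what makes arenas "as easy to use as garbage collection but without the costs" for phase-structured workloads.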
No Starch Press
August 6, 2024
Part 2: Portable Executable Files
August 6, 2024
bytecode interpreters for tiny computers
August 4, 2024
I've previously come to the conclusion that there's little reason for using bytecode in the modern world, except in order to get more compact code, for which it can be very effective.
How I built zig-sqlite
August 4, 2024
When you prepare a statement zig-sqlite creates a brand new type only for this prepared statement.
The Hunt for the Missing Data Type
August 3, 2024
A (directed) graph is a set of nodes, connected by arrows (edges). The nodes and edges may contain data. Here are some graphs:
All graphs made with graphviz (source)
Graphs are ubiquitous in software engineering:
Package dependencies form directed graphs, as do module imports. The internet is a graph of links between webpages. Model checkers analyze software by exploring the “state space” of all possible configurations.
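The package-dependency example above is the classic use of a directed graph in software tooling. A minimal sketch (the `deps` map and `topo_sort` helper are hypothetical, not from the article) shows how a dependency graph yields a build order via depth-first topological sort:

```python
# A directed dependency graph: each package points at what it imports.
deps = {
    "app": ["http", "json"],
    "http": ["socket"],
    "json": [],
    "socket": [],
}

def topo_sort(graph):
    """DFS topological sort: every node appears after its dependencies."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in graph[node]:
            visit(dep)          # emit dependencies first
        order.append(node)

    for node in graph:
        visit(node)
    return order
```

Running `topo_sort(deps)` emits leaves like `socket` before the packages that depend on them, which is exactly the order a build system or module loader needs.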
Microfeatures I'd like to see in more languages
August 3, 2024
There are roughly three classes of language features: features that the language is effectively designed around, such that you can't add them after the fact...
Google’s Fully Homomorphic Encryption Compiler — A Primer
August 2, 2024
Back in May of 2022 I transferred teams at Google to work on Fully Homomorphic Encryption (newsletter announcement). Since then I’ve been working on a variety of projects in the space, includ…
Will I be able to access proprietary platform APIs (e.g. Android / iOS)?
August 1, 2024
The kind of binary format being considered for WebAssembly can be natively decoded much faster than JavaScript can be parsed (experiments show more than 20× faster).
The future of Clang-based tooling
August 1, 2024
By Peter Goodman Clang is a marvelous compiler; it’s a compiler’s compiler! But it isn’t a toolsmith’s compiler. As a toolsmith, my ideal compiler would be an open book, allowing me to get to…
July 2024
174 bookmarksFast Multidimensional Matrix Multiplication on CPU from Scratch
July 30, 2024
Numpy can multiply two 1024x1024 matrices on a 4-core Intel CPU in ~8ms. This is incredibly fast, considering this boils down to 18 FLOPs / core / cycle, with...
Efficient n-states on x86 systems
July 29, 2024
The text discusses how to efficiently handle control flow in x86 systems when a flag can have multiple states beyond true and false. It explains how to use condition codes, such as testing for zero and parity, to minimize the number of instructions needed for these tests. Additionally, it touches on the challenges and limitations of using inline assembly for optimization in C programming.
Program tuning as a resource allocation problem
July 29, 2024
Program tuning involves balancing simplicity and performance while sharing cache resources among various subsystems. Optimizing one function can impact others, making it a global resource allocation problem that requires careful consideration of algorithms and their resource footprints. Better tools and metrics are needed to manage and analyze cache resource consumption effectively.
How web bloat impacts users with slow connections
July 29, 2024
Web bloat makes many websites difficult to use for people with slow internet connections and devices. Sites like Discourse and Reddit perform poorly on low-end devices, even if they seem fast on high-end ones. Improving web performance for these users is crucial, as many people rely on older, slower devices.
Files are hard
July 29, 2024
Writing files in a way that ensures their robustness is challenging due to the complexity involved. The paper discusses various issues related to file corruption and data loss, such as crash consistency, filesystem semantics, filesystem correctness, error handling, and error recovery. It highlights the differences in how different filesystems handle errors and points out bugs and inconsistencies found in popular filesystems. The paper also addresses the frequency of disk errors and data corruption, emphasizing the need for caution when writing files and the importance of using libraries or tools to ensure safety. Overall, the document emphasizes the difficulty of reasoning about file-related problems and the need for careful considerations when working with filesystems.
Ringing in a new asynchronous I/O API
July 29, 2024
The new "io_uring" interface simplifies asynchronous I/O in the Linux kernel by using two ring buffers for submission and completion queues. Applications can set up these buffers with a system call and submit I/O requests through a structured format. This method aims to reduce complaints about AIO by improving efficiency and ease of use.
applicative-mental-models
July 29, 2024
The text discusses the importance of understanding program performance for effective optimization. It emphasizes that while most optimizations may not be necessary, being aware of critical performance paths is essential. The author provides latency numbers to help programmers grasp the impact of different operations on performance.
Optimizing subroutines in assembly language
July 29, 2024
Optimizing subroutines in assembly language involves various techniques such as using inline assembly in a C++ compiler, separating code using MMX registers from code using ST registers, and understanding different register sizes and memory operands. It is important to consider the use of instruction prefixes, intrinsic functions for vector operations, and accessing class and structure members efficiently. Additionally, preventing false dependences, aligning loop and subroutine entries, and optimizing instruction sizes can improve performance. However, it is crucial to note that these optimizations are processor-specific and may vary depending on the target platform.
Brian Robert Callahan
July 29, 2024
This blog post starts a series on creating programs that demystify how programs work. The first program is a disassembler that reads bytecode and converts it into assembly language, while a future post will cover creating an assembler. The disassembler uses a table of mnemonics and instruction sizes to print out the corresponding assembly instructions from bytecode.
QBE vs LLVM
July 29, 2024
QBE and LLVM are both compiler backends, but QBE is a smaller, more accessible project aimed at amateur language designers. While LLVM is feature-rich and complex, QBE focuses on simplicity and efficiency, making it easier to use for quick projects. QBE provides straightforward operations and a cleaner intermediate language, reducing the complexity often found in LLVM.
Recent presentations and papers
July 29, 2024
Andi Kleen's work focuses on improving Linux performance through various techniques like hardware monitoring and profiling. He has presented on topics such as lock elision, multi-core scalability, and error handling in the Linux kernel. His contributions include discussions on modern CPU performance, tools for Linux development, and enhancements for energy efficiency.
brotli-2015-09-22
July 29, 2024
How long does it take to make a context switch?
July 29, 2024
Context switching times vary significantly across different Intel CPU models, with more expensive CPUs generally performing better. The performance can be greatly affected by cache usage and thread migration between cores, leading to increased costs when tasks are switched. Optimizing the number of threads to match the number of hardware threads can improve CPU efficiency and reduce context switching overhead.
Ghostty Devlog 001
July 29, 2024
Ghostty is a terminal emulator developed as a side project. In this devlog, the author shares details about the tech stack behind Ghostty, including its cross-platform capabilities and GPU acceleration. The devlog also introduces two features: automatic shell integration injection and auto-italicize fonts. The shell integration feature improves prompt redrawing, working directory reporting, and active process detection, while the auto-italicize fonts feature fixes a bug and adds the ability to skew regular fonts to create fake italics. The devlog concludes by inviting readers to follow the author on social media for updates and future devlogs.
Tiled Matrix Multiplication
July 29, 2024
Tiled matrix multiplication is an efficient algorithm used on GPUs that reduces memory access by utilizing shared memory. By organizing threads into blocks, each thread can perform calculations more quickly and with fewer memory accesses. This method is important for improving performance in tasks like graphics rendering and machine learning.
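The shared-memory idea transfers directly to CPU caches: iterate over small tiles so each tile of A and B is reused while it is still hot. A pure-Python sketch of the loop structure (the tile size `T` and function name are illustrative; a GPU version would map the two outer tile loops to thread blocks):

```python
def matmul_tiled(A, B, n, T=4):
    """C = A @ B over T x T tiles, the cache-blocking analog of
    GPU shared-memory tiling: each tile is loaded once and reused
    T times instead of being re-fetched per element."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for kk in range(0, n, T):
            for jj in range(0, n, T):
                # Work entirely within one tile of A, B, and C.
                for i in range(ii, min(ii + T, n)):
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The arithmetic is identical to the naive triple loop; only the traversal order changes, which is why tiling improves memory behavior without affecting the result.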
Rust Atomics and Locks
July 29, 2024
This book by Mara Bos explores Rust programming language's concurrency features, including atomics, locks, and memory ordering. Readers will gain a practical understanding of low-level concurrency in Rust, covering topics like mutexes and condition variables. The book provides insights on implementing correct concurrency code and building custom locking and synchronization mechanisms.
Compiler Backend
July 29, 2024
The QBE compiler backend is designed to be a compact yet high-performance C embeddable backend that prioritizes correctness, simplicity, and user-friendliness. It compiles on various x64 operating systems and boasts features like IEEE floating point support, SSA-based intermediate language, and quick compilation times. While currently limited to x64 platforms, plans include ARM support and further enhancements. The backend has been successfully utilized in various projects, showcasing its adaptability and effectiveness in compiler development.
Vale's Memory Safety Strategy: Generational References and Regions
July 29, 2024
Vale's memory safety strategy uses generational references to manage memory without relying on traditional methods like garbage collection. Each reference stores a "generation" ID, and before accessing an object, a check ensures the ID matches the object's current generation. This approach allows for efficient memory management while maintaining safety, reducing overhead significantly compared to other methods.
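The generation check can be modeled in a few lines. This Python sketch (the `Heap` class and slot-based layout are assumptions for illustration, not Vale's actual runtime) shows how a stale reference fails its check after the object is freed:

```python
class Heap:
    """Sketch of generational references: each slot carries a generation
    counter that is bumped on free, so outstanding references to the old
    object no longer match and are rejected on dereference."""

    def __init__(self, nslots):
        self.vals = [None] * nslots
        self.gens = [0] * nslots

    def alloc(self, slot, value):
        self.vals[slot] = value
        return (slot, self.gens[slot])   # the "generational reference"

    def free(self, slot):
        self.vals[slot] = None
        self.gens[slot] += 1             # invalidate all outstanding refs

    def deref(self, ref):
        slot, gen = ref
        if self.gens[slot] != gen:       # the generation check
            raise RuntimeError("use after free")
        return self.vals[slot]
```

The check is a single integer comparison before each access, which is where the low overhead relative to garbage collection or reference counting comes from.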
Introduction
July 29, 2024
Wait-freedom ensures that each thread can progress independently, executing operations in a fixed number of steps without being blocked by others. Lock-freedom allows the system to make overall progress, but individual threads might still get stuck. Obstruction-freedom means a thread can only progress without interference from others, making it a weaker guarantee than lock-freedom.
Cache-Oblivious Algorithms
July 29, 2024
Cache-oblivious algorithms are designed to use processor caches efficiently without needing to know specific cache details. They work by dividing data into smaller parts, allowing more computations to happen in cache and reducing memory access. This leads to better performance, especially in parallel algorithms, by minimizing shared memory bottlenecks.
A Memory Allocator
July 29, 2024
A memory allocator is software that manages dynamic memory allocation in programs, providing functions like malloc(), free(), and realloc(). This particular allocator aims to minimize memory wastage and improve efficiency, and it is widely used in various systems, including Linux. It employs techniques like coalescing freed chunks and supports memory mapping to enhance performance and reduce fragmentation.
Cramming: Training a Language Model on a Single GPU in One Day
July 29, 2024
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we ...
The MiniPile Challenge for Data-Efficient Language Models
July 29, 2024
The MiniPile Challenge introduces a new dataset for pre-training language models, containing 1 million documents filtered for quality. It aims to reduce the need for large computational resources while still achieving competitive performance on language tasks. The research shows that models pre-trained on MiniPile perform only slightly worse than those trained on much larger datasets.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
July 29, 2024
The authors present a method for training large text-to-image diffusion models on a very low budget. They use a technique called deferred masking to minimize performance loss while reducing computational costs. Their approach achieves high-quality results at a fraction of the cost compared to existing models, demonstrating the potential for democratizing AI training.
1024cores
July 29, 2024
Dmitry Vyukov shares information on synchronization algorithms, multicore design patterns, and high-performance computing on his website, 1024cores.net. He focuses on shared-memory systems and does not cover topics like clusters or GPUs. New content is added regularly, and readers can subscribe for updates.
Implementing interactive languages
July 28, 2024
Implementing an interactive language requires considering both compile-time and run-time performance. Traditional switch-based bytecode interpreters are easy to implement but have lower run-time performance compared to optimizing compilers. A sweet spot in performance can be found by aiming for combined compile-time and run-time performance within a certain range. Various options for implementing fast interpreters, existing compilers like LLVM and Cranelift, custom compilers, and using WebAssembly as a backend are discussed. The idea of having two backends for a language to support quick startup and aggressive optimization is also explored. There are still many unknowns and further research is needed to determine the feasibility and performance of different approaches.
Pointers Are Complicated, or: What's in a Byte?
July 28, 2024
The document explains the complexities of pointers in low-level programming languages like C++ and Rust, debunking the misconception that pointers are simple integers. It delves into examples showing how assumptions about pointers can lead to undefined behavior and how pointer arithmetic can be tricky. The text proposes a model where a pointer is a pair of an allocation ID and an offset, rather than just an integer. Additionally, it discusses the challenges of representing bytes in memory, especially when dealing with uninitialized memory and the need for a more nuanced byte representation to ensure program correctness.
Three Architectures for a Responsive IDE
July 28, 2024
The text discusses three architectures for a responsive IDE: indexing on a per-file basis, using a FQN index for completion, and a query-based compiler approach. Each approach has its own challenges and benefits, such as handling macro expansions and managing dependencies efficiently to ensure fast performance.
How a Zig IDE Could Work Feb 10, 2023
July 28, 2024
The author discusses how to build an Integrated Development Environment (IDE) for the Zig programming language, which has unique features like a simple syntax but also complex compile-time evaluation. The IDE needs to handle incomplete code and provide immediate feedback while managing rapid code changes. The post explores various strategies for efficiently processing code, such as using abstract interpretation and optimizing compilation to focus only on necessary parts of the codebase.
Properly Testing Concurrent Data Structures Jul 5, 2024
July 28, 2024
The article discusses how to effectively test concurrent data structures by using managed threads that can be paused and resumed. It explains the importance of controlling thread execution to avoid issues like race conditions while executing random operations. The author emphasizes the need for proper synchronization mechanisms to ensure that only one thread is active at a time during tests.
Parse, don’t validate
July 28, 2024
The text discusses the importance of parsing over validating in Haskell to prevent errors and enhance code reliability by using strong argument types. Parsing upfront helps maintain consistency and avoids potential issues with partial input processing, demonstrating the benefits of type-driven design in Haskell programming. The text also touches on the subjective nature of programming languages, highlighting differing perceptions of Haskell and the challenges faced by learners in navigating diverse opinions.
Too Fast, Too Megamorphic: what influences method call performance in Java?
July 28, 2024
The performance of method calls in Java can be improved through techniques like inlining and using inline caches. Monomorphic calls, where only one method can be invoked, are the fastest, while bimorphic and megamorphic calls are slower due to increased lookup costs. The study highlights that simply adding the "final" keyword or overriding methods does not significantly enhance performance.
The Black Magic of (Java) Method Dispatch
July 28, 2024
The content shows code execution percentages for different operations within a program. It includes instructions for handling different coders, with comparisons and jumps based on coder values. The code includes sections like the main entry point, epilogue, handling other coders, and specific coder cases like Coder1 and Coder2.
Why null sucks, even if it's checked
July 28, 2024
The article discusses the problems with using null in programming languages like Kotlin and C#, highlighting that null can lead to confusion and errors. It argues that null is not an extensible solution for representing absence of value and suggests using sum types or optional types instead. The author believes that languages should focus on improving optional types rather than trying to make null safer.
Unnamed Document
July 28, 2024
Resources for Building Programming Languages
July 28, 2024
The article shares resources for learning how to create programming languages, focusing on Rust and C. It highlights the book "Crafting Interpreters," which provides practical insights into building interpreters using different programming approaches. The author also discusses their personal experience building a language and the tools they've found helpful, like LLVM and Cranelift.
Little 'Big Ideas' in Programming Language Design
July 28, 2024
Colin Davis discusses "little big ideas" in programming language design, focusing on the balance between innovative features and conventional choices. He highlights Mojo and Go as examples, noting how Mojo combines modern improvements with familiar concepts, while Go prioritizes simplicity and a strong ecosystem. Davis suggests that small design decisions, like memory management and parameter passing, can greatly enhance a language's usability and performance.
Computer Networking: A Top-Down Approach
July 27, 2024
Jim Kurose and Keith Ross are prominent computer science professors with extensive experience in networking and related fields. They have received multiple awards for their teaching and research, and both have held leadership roles in academic and professional organizations. Their work focuses on topics like network protocols, security, and multimedia communication.
Using Uninitialized Memory for Fun and Profit Posted on Friday, March 14, 2008.
July 27, 2024
A clever trick involves using uninitialized memory to improve performance in certain programming situations by representing sparse sets efficiently with two arrays that point at each other. This technique allows for fast constant-time operations for adding, checking, and clearing elements in the set, making it a valuable tool for optimizing algorithms and data structures. The sparse set representation is especially useful for scenarios where speed is critical, such as in compiler optimizations and graph traversal algorithms.
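The two-arrays-pointing-at-each-other structure (often credited to Briggs and Torczon) is short enough to sketch. In Python the arrays are zero-initialized, but the membership test never reads a `sparse` slot it didn't write through `add`, which is exactly why the C original can safely leave its arrays uninitialized:

```python
class SparseSet:
    """Sparse set: 'dense' lists the members in insertion order, and
    'sparse' maps each value to its position in 'dense'. Membership is
    validated by the mutual pointers, never by the contents of sparse
    alone -- so sparse may hold garbage (uninitialized memory in C)."""

    def __init__(self, universe):
        self.dense = []
        self.sparse = [0] * universe  # would be uninitialized in C

    def add(self, v):
        if not self.contains(v):
            self.sparse[v] = len(self.dense)
            self.dense.append(v)

    def contains(self, v):
        i = self.sparse[v]
        return i < len(self.dense) and self.dense[i] == v

    def clear(self):
        self.dense = []               # O(1) clear: sparse is untouched
```

Add, membership, and clear are all constant time, which is what makes the structure attractive for register allocators and graph traversals that repeatedly reset a visited set.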
Zip Files All The Way Down
July 27, 2024
The text discusses creating self-reproducing programs and files like zip files that can decompress to themselves. It explores using Lempel-Ziv compression for self-reproduction and the challenges of translating these concepts into real opcode encodings like DEFLATE used in gzip and zip files. The ultimate goal is to create a zip file that contains a larger copy of itself recursively, creating a chain of expanding zip files.
UTF-8: Bits, Bytes, and Benefits Posted on Friday, March 5, 2010.
July 27, 2024
UTF-8 is a straightforward way to encode Unicode code points into a byte stream, and understanding its inner workings is key to leveraging its benefits. Key properties of UTF-8 include preserving ASCII files, ensuring ASCII bytes are represented as themselves, and requiring code points to be encoded using the shortest possible sequence. The encoding is self-synchronizing, facilitating substring searches and making it compatible with most programs that handle 8-bit files safely. While some tools may need modification to handle UTF-8, it is increasingly becoming the standard encoding due to its practical advantages and simple design.
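The shortest-sequence rule mentioned above is easy to see in an encoder. This sketch follows the standard UTF-8 bit layouts (1 byte below U+0080, 2 below U+0800, 3 below U+10000, 4 otherwise); the function name is illustrative:

```python
def utf8_encode(cp):
    """Shortest-form UTF-8 encoding of a single code point.
    Each branch packs the code point's bits into the leading byte's
    free bits plus 6-bit continuation bytes (0b10xxxxxx)."""
    if cp < 0x80:                      # ASCII bytes represent themselves
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6,
                      0x80 | cp & 0x3F])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12,
                      0x80 | cp >> 6 & 0x3F,
                      0x80 | cp & 0x3F])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | cp >> 18,
                  0x80 | cp >> 12 & 0x3F,
                  0x80 | cp >> 6 & 0x3F,
                  0x80 | cp & 0x3F])
```

Because continuation bytes always start with the bits `10`, a decoder can resynchronize at any byte boundary, which is the self-synchronizing property that makes substring search over raw UTF-8 bytes safe.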
Minimal Boolean Formulas
July 27, 2024
The post discusses how to compute the minimum number of AND and OR operators needed for Boolean functions with five variables. It describes the author's program that efficiently calculates this minimum for various functions while also improving algorithms for speed. The findings contribute to understanding the complexity of Boolean functions and their representations.
Hacking the OS X Kernel for Fun and Profiles Posted on Tuesday, August 13, 2013.
July 27, 2024
The article discusses a bug in the OS X kernel related to how profiling signals are delivered in multithreaded processes. It explains that the kernel incorrectly sends the SIGPROF signal to the entire process instead of the specific running thread. The author outlines a fix involving a small edit to the kernel code to ensure the signal is sent to the correct thread.
How To Build a User-Level CPU Profiler Posted on Thursday, August 8, 2013.
July 27, 2024
The text discusses how the pprof tool simplifies CPU profiling for C++ and Go programs by utilizing hardware timers and the operating system. Profiling information is gathered through hardware interrupts, providing insights into a program's performance and resource usage. By moving profiling logic to user-level timers, programs can customize and enhance profiling capabilities without kernel changes.
An Encoded Tree Traversal
July 27, 2024
The text discusses different ways to traverse binary trees and how these methods can be generalized to k-ary trees. It highlights a new ordering for traversing k-ary trees that results in a regular numbering pattern, which is not present in the traditional methods. The author seeks references or examples of this k-ary-coded traversal order, which he has not yet found.
Our Software Dependency Problem
July 27, 2024
The text discusses the risks and benefits of using software dependencies in programming. It emphasizes the importance of understanding, managing, and monitoring dependencies to prevent potential issues like bugs and security vulnerabilities. The article highlights the need for developers to establish best practices for effectively utilizing dependencies in their projects.
The Magic of Sampling, and its Limitations Posted on Saturday, February 4, 2023.
July 27, 2024
Sampling can help estimate the percentage of items with a specific trait accurately. The number of samples taken greatly affects the accuracy of the estimate. To get precise estimates, all items must have an equal chance of being selected during sampling.
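The error-versus-sample-size relationship is concrete: the standard error of an estimated proportion shrinks like 1/sqrt(n). A sketch (the function name and population layout are illustrative, not from the post):

```python
import math
import random

def estimate_proportion(population, n, seed=0):
    """Estimate the fraction of items with a trait (encoded as 1/0) from a
    uniform sample. Uniformity matters: every item must have an equal
    chance of selection, or the estimate is biased."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    p = sum(sample) / n
    # Standard error ~ sqrt(p(1-p)/n): 100 samples gives roughly +/-5
    # percentage points; 10,000 samples roughly +/-0.5.
    stderr = math.sqrt(p * (1 - p) / n)
    return p, stderr
```

Quadrupling the sample count only halves the error, which is why sampling quickly hits diminishing returns past the precision you actually need.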
Running the “Reflections on Trusting Trust” Compiler Posted on Wednesday, October 25, 2023.
July 27, 2024
The text discusses how to modify a C compiler to insert a backdoor into a program without leaving traces in the source code. It explains that the backdoor can be detected because the compiler's size increases each time it compiles itself. Finally, it highlights the importance of using trusted compilers to prevent hidden backdoors in modern software development.
Improving the Font Pipeline
July 26, 2024
To improve the font pipeline, consider how to efficiently choose and render glyphs for different languages, including handling ligatures and memory constraints. You may need to create texture atlases for various glyphs while ensuring new translations are incorporated. Finally, optimize rendering to avoid blurriness and ensure smooth performance across different character sets.
Easy Scalable Text Rendering on the GPU
July 26, 2024
This text explains a fast and memory-efficient technique for rendering text on the GPU without using traditional methods like signed distance fields. It uses triangles to fill in pixels inside the glyph and supports subpixel anti-aliasing for crisp text on LCD screens. The technique is resolution-independent, simple to implement, and can be extended to enhance rendering quality.
Adventures in Text Rendering: Kerning and Glyph Atlases
July 26, 2024
Text rendering involves converting vector glyphs to bitmaps, positioning them on screen, and optimizing performance by using glyph atlases. Glyph atlases store rasterized glyphs efficiently, allowing for sub-pixel alignment and improved rendering quality. This approach balances performance and quality in text rendering for different types of fonts.
Exploring the Power of Negative Space Programming
July 25, 2024
Negative space programming helps improve code by defining what it should not do, making it more robust and clear. By using constraints and assertions, developers can catch errors early and enhance security. This approach also promotes simplicity, making the code easier to maintain and understand.
CompilerTalkFinal
July 25, 2024
The content discusses various compilers and their features, including Clang, GCC, V8, CakeML, Chez Scheme, and more. It also touches on the history of interpreters and compilers, with examples like ENIAC and the first compiler developed by Grace Hopper. Different approaches to compilation and interpretation are highlighted, showcasing the evolution of compiler technology.
Graydon Hoare: 21 compilers and 3 orders of magnitude in 60 minutes
July 25, 2024
Graydon Hoare's talk explains different approaches to building compilers, from traditional giants to more efficient variants. He highlights the importance of using compiler-friendly languages and theory-driven meta-languages. The presentation covers key concepts like sophisticated partial evaluation and implementing compilers directly by hand.
p75-hoare
July 25, 2024
The author recounts experiences in designing a computer programming language and issues a warning about language complexity. Despite challenges, a subset of the language was successfully implemented. The author emphasizes the importance of simplicity and reliability in programming languages for critical applications.
Updating the Go Memory Model
July 23, 2024
The Go memory model needs updates to clarify how synchronization works and to endorse race detectors for safer concurrency. It suggests adding typed atomic operations and possibly unsynchronized atomics to improve program correctness and performance. The goal is to ensure that Go programs behave consistently and avoid data races, making them easier to debug.
Programming Language Memory Models (Memory Models, Part 2) Posted on Tuesday, July 6, 2021. PDF
July 23, 2024
Modern programming languages use atomic variables and operations to help synchronize threads and prevent data races. This ensures that programs run correctly by allowing proper communication between threads without inconsistent memory access. All major languages, like C++, Java, and Rust, support sequentially consistent atomics to simplify the development of multithreaded programs.
Hardware Memory Models (Memory Models, Part 1) Posted on Tuesday, June 29, 2021. PDF
July 23, 2024
This text discusses hardware memory models, focusing on how different processors handle memory operations and maintain order. It explains the concept of sequential consistency, where operations are executed in a predictable order, and contrasts it with more relaxed models like those used in ARM and POWER architectures. The author highlights the importance of synchronization to avoid data races in concurrent programming.
Baby Steps to a C Compiler
July 23, 2024
Writing a simple compiler can help you understand how computers work. Start with a minimal project that compiles a small subset of a language, and then gradually add more features. This approach makes learning about compilers and programming enjoyable and rewarding.
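The "minimal project" starting point can be tiny indeed. A sketch of the seed of such a compiler (the function name and output format are assumptions): it handles only programs of the form `int main() { return N; }`, translating them to x86-64 assembly text, and every later feature grows from there:

```python
def compile_return(n):
    """Compile `int main() { return n; }` to x86-64 assembly text.
    A deliberately minimal first step: one construct, one code path."""
    return "\n".join([
        "    .globl main",
        "main:",
        f"    movl ${n}, %eax",   # the return value goes in eax
        "    ret",
    ])
```

Feeding the output to `gcc`/`as` would produce a runnable executable whose exit code is `n`, giving an end-to-end pipeline to extend with expressions, variables, and control flow.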
Kernel Programming Guide
July 23, 2024
Essential information for programming in the OS X kernel. Includes a high-level overview.
Tiny Tapeout
July 23, 2024
Tiny Tapeout is a project that helps people easily and affordably create their own chip designs. It offers resources for beginners and advanced users, along with a special price for submissions. Join the community to learn and share your designs before the deadline on September 6th.
Why Pascal is Not My Favorite Programming Language
July 23, 2024
Pascal is not recommended for serious programming due to limitations in its standard form. The language's strict typing and lack of features like separate compilation make it challenging for complex projects. Pascal is better suited for educational purposes rather than practical programming tasks.
What Color is Your Function?
July 23, 2024
Functions in a programming language can be either red or blue, affecting how they are called and used. Red functions are asynchronous and typically more complex to work with than blue functions. The choice between red and blue functions can impact code organization and maintainability.
What is an Invariant? Oct 6, 2023
July 22, 2024
Invariants are properties that hold true during the evolution of a system, helping to ensure correct behavior in programming. They can simplify reasoning about code, whether it’s for small algorithms or larger systems. By clearly defining invariants, programmers can create robust code and manage complex systems effectively.
Chess-GPT's Internal World Model
July 22, 2024
The blog post discusses how a GPT model trained on chess games learns to predict moves and track the board state without being explicitly given the rules. It successfully classified chess pieces with high accuracy and estimated player skill levels based on game moves. The findings suggest that models trained on strategic games can effectively learn complex tasks through pattern recognition.
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
July 22, 2024
Researchers trained a chess-playing language model to understand the game without prior knowledge, focusing on how it represents the board state. They found that the model not only learned the board's layout but also estimated player skill, which helped it predict the next move better. By incorporating a player skill vector, the model's win rate improved significantly.
Manipulating Chess-GPT's World Model
July 22, 2024
The author explores how Chess-GPT, a language model for chess, can improve its performance by manipulating its internal understanding of player skill and board state. By using linear probes and skill interventions, the model's chess-playing ability was significantly enhanced, especially in games with random initializations. The findings suggest that Chess-GPT learns a deeper understanding of chess rather than just memorizing patterns.
Crafting an Interpreter in Zig - part 1
July 22, 2024
The author is learning Zig by implementing an interpreter for the Lox programming language, inspired by the book "Crafting Interpreters." They are documenting their journey, focusing on interesting aspects of Zig and how it differs from C. So far, they have enjoyed the process, particularly the simplicity and power of Zig's generic programming.
Teach Yourself Programming in Ten Years
July 21, 2024
The essay pushes back on the promise of learning programming quickly, arguing that real expertise takes about ten years of dedicated practice. It stresses hands-on experience, interacting with other programmers, and working on varied projects as the path to mastery, rather than books or courses alone.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
July 21, 2024
The text discusses the challenges and considerations of floating-point arithmetic in computer science. It emphasizes the importance of rounding in floating-point calculations and the implications of different precision levels. Additionally, it highlights the need for careful implementation to ensure correctness and accuracy in programs that rely on floating-point arithmetic.
The Development of the C Language*
July 20, 2024
The paper discusses the development and influences of the C programming language, highlighting its creation at Bell Labs and transition from the B language. C's simplicity, efficiency, and widespread adoption across various platforms and architectures are emphasized, showcasing its enduring stability and usefulness in software development. Despite its quirks and historical origin, C has proven to be a powerful and versatile language for programmers worldwide.
Class Warfare
July 20, 2024
The text discusses a woman's conversation about company politics and self-interest, highlighting a zero-sum mentality within organizations. It emphasizes the need to shift away from this mindset and focus on creating value instead. The author suggests that combating this mentality starts with internal change and encourages individuals to reject zero-sum thinking for long-term benefit.
Ownership
July 20, 2024
A Note About Zig Books for the Zig Community
July 18, 2024
The author discusses the idea of writing a Zig book and shares personal plans for self-publishing their own book. They weigh the pros and cons of working with a publisher versus self-publishing, emphasizing the importance of considering creative freedom and revenue sharing. The author encourages those interested in writing a Zig book to carefully evaluate their options, noting that the Zig community values learning materials and support.
Your Starting Point!
July 17, 2024
The text discusses the concepts of three-dimensional objects and how they are represented in two dimensions for computer graphics. It explains the process of projecting 3D points onto a canvas to create images. The importance of geometry and mathematics in computer graphics, particularly in defining objects and creating images, is emphasized.
Zig Interfaces for the Uninitiated, an update
July 17, 2024
The post discusses a new idiom for runtime polymorphism in Zig, focusing on using fat pointers instead of @fieldParentPtr. It provides a step-by-step guide on creating a formal Iterator interface and implementing it with an example range iterator. The drawbacks of this pattern include potential performance issues and the requirement for the original implementor to remain alive for the interface to function correctly.
Zig Interfaces for the Uninitiated
July 17, 2024
The text discusses how to create and implement generic iterators in Zig using interfaces like `Iterator` and `Range`.
It demonstrates how to use these iterators to iterate over ranges of values and provides examples of ascending, descending, and skipping ranges.
Additionally, it introduces a function `fold` to apply a function to successive elements in an iterator, showcasing Zig's runtime polymorphism for data structures.
Exploring Compile-Time Interfaces in Zig
July 17, 2024
Zig is a programming language with active community support and a focus on efficient, reusable software development. Interfaces in Zig define a blueprint for classes to implement specific methods, promoting code abstraction and flexibility. Compile-time interfaces in Zig optimize code structure by resolving methods during compilation for efficient program execution.
Aro - a C compiler
July 17, 2024
Aro is a C compiler written in Zig, developed as an alternative C frontend for the Zig toolchain. It includes the aro module for the compiler proper and a language-agnostic aro_backend module for translating code into machine code, reusing the Zig compiler's self-hosted backends for optimization.
Database Systems
July 17, 2024
This course at CMU covers database management systems, including data models, query languages, storage architectures, and more. It uses case studies to show real-world applications and is suitable for students with basic systems programming skills. The course also thanks companies for their support in equipment donations and course development.
Discovering and exploring mmap using Go
July 17, 2024
Memory-mapped files allow programs to access disk data larger than available memory. By using mmap in Go, you can map a file directly into memory for easier manipulation. Virtual memory techniques, like mmap, can help solve memory limitations in handling large files efficiently.
But how, exactly, databases use mmap?
July 17, 2024
Databases use memory-mapped files like mmap to handle data on disk larger than available memory. Examples include SQLite, LevelDB, Lucene, LMDB, and MongoDB. By understanding how mmap is used, we can grasp how databases efficiently read and write data from disk.
How memory mapped files, filesystems and cloud storage works
July 17, 2024
Kelly discusses the challenges of memory-mapped files and cloud storage in response to a comment about space reservation in Voron. Cloud providers may allocate more space than needed, leading to unexpected charges and unreliable data handling. Testing reveals issues with sparse files and memory mapping in cloud scenarios, highlighting the importance of understanding storage limitations.
Implementing a file pager in Zig
July 17, 2024
Implementing a file pager in Zig involves delaying disk writes until a threshold is reached. Two eviction strategies include least recently used and least frequently used models. Prioritizing pages based on usage can help optimize performance.
Criticizing Hare language approach for generic data structures
July 17, 2024
The blog criticizes the Hare language approach for not providing generic data structures like hash maps in its standard library. It highlights the complexity and importance of hash tables in various programming languages and emphasizes the need for efficient data structures in modern programming ecosystems. The author disagrees with Hare's approach and stresses the significance of hash tables in software development.
spikedoanz/from-bits-to-intelligence: machine learning stack in under 100,000 lines of code
July 16, 2024
The text discusses building a machine learning stack in under 100,000 lines of code with hardware, software, tensors, and machine learning components. It outlines the required components like a CPU, GPU, storage, C compiler, Python runtime, operating system, and more. The goal is to simplify the machine learning stack while providing detailed steps for implementation in different programming languages.
One year of C
July 16, 2024
The author reflects on their year of writing C code, finding it enjoyable and productive. They emphasize the importance of choosing the right language for each problem and share insights on the benefits of using C over C++ in certain scenarios. Additionally, they discuss the advantages of C99 improvements and the simplified nature of writing C code compared to C++.
Heap Memory and Allocators
July 15, 2024
The text discusses different types of memory allocators in Zig programming language.
It explains how memory allocation and deallocation work using alloc and free functions.
Various allocator types like GeneralPurposeAllocator and FixedBufferAllocator are highlighted for managing memory efficiently.
Learning Zig - Pointers
July 15, 2024
Pointers in Zig allow variables to reference memory addresses. Understanding pointers helps manipulate memory effectively. Pointers are values that store memory addresses and can be nested within structures.
Data Compression Explained
July 15, 2024
Data compression involves modeling and coding to reduce the size of data files. Modern compressors typically use arithmetic coding for efficient compression. Algorithms like Huffman coding and run-length encoding are commonly used to achieve better compression results.
Twitter's Recommendation Algorithm
July 14, 2024
Twitter uses a recommendation algorithm to select the top tweets for users' timelines. The algorithm is based on core models and features that extract information from tweet, user, and engagement data. The recommendation pipeline consists of three main stages: candidate sourcing, ranking, and applying heuristics and filters. Twitter uses both in-network and out-of-network sources to find relevant tweets, and employs embedding spaces to determine content similarity. The final step involves blending tweets with other non-tweet content before sending them to users' devices. The goal of Twitter's open source endeavor is to provide transparency to users about how the recommendation system works.
Programming languages resources
July 13, 2024
This page is a collection of the author's favorite resources for people getting started writing programming languages. The resources cover various aspects such as compilers, runtimes, runtime optimization, pointer tagging, JIT compilers, assembler libraries, and interesting tools. The author also mentions topics they want to write about in the future and papers they want to read. The page is meant to be a helpful reference for those interested in programming language implementation.
3D Math Primer for Graphics and Game Development
July 12, 2024
The book "3D Math Primer for Graphics and Game Development" is available to read for free on the gamemath.com website. It includes information about GDC talks, FAQs, and resources for the first edition of the book. The first edition, published in 2002, is described as high tech, but the author recommends reading the second edition instead, which is also available for free.
Welcome to OpenGL
July 12, 2024
This text is about learning modern OpenGL through an online book that covers basic, intermediate, and advanced knowledge with clear examples and practical concepts. The content is freely available online and in print, with the aim of providing a complete and easy-to-understand platform for graphics programming enthusiasts. Readers will learn core graphics aspects, useful techniques, and even create a small game based on the obtained OpenGL knowledge.
WebGPU Fundamentals
July 12, 2024
The text provides a collection of articles to help beginners learn the basics of WebGPU, covering topics like fundamentals, 3D math, lighting techniques, and compute shaders. It also includes information on optional features, data memory layout, transparency, performance, and resources for further learning. Readers can explore various aspects of WebGPU, including how it works, 2D and 3D techniques, and essential concepts like uniforms, textures, and storage buffers.
An opinionated beginner’s guide to Haskell in mid-2019
July 12, 2024
This guide is for beginners in Haskell or those transitioning from similar languages, offering advice on learning resources and tools. It emphasizes the importance of writing Haskell code, getting help online, choosing popular platforms, and sticking to the default Prelude. The guide also touches on application architecture, using records, debugging techniques, and the experimental nature of Haskell as both a research and industrial language.
Are tagged unions overrated?
July 12, 2024
The author discusses the limitations of tagged unions and pattern matching in language development, suggesting that they are overrated for implementing language ASTs and IRs. Despite the benefits of tagged unions, the complexity they add may not always justify their use, especially in cases where simpler alternatives like class hierarchies can offer similar functionality. The post also highlights the potential for enhancing pattern-matching capabilities in mainstream languages to improve code readability and maintainability.
C++ Core Guidelines
July 12, 2024
These guidelines aim to simplify and improve the safety of C++ code by recommending specific extensions and best practices. They focus on static type safety, resource management, and reducing the likelihood of errors or accidents. By following these guidelines, programmers can write more correct, safer code without sacrificing performance.
What every systems programmer should know about concurrency
July 11, 2024
The document delves into the complexities of concurrency for systems programmers, explaining the challenges of running multithreaded programs where code is optimized and executed in unexpected sequences. It covers fundamental concepts like atomicity, enforcing order in multithreaded programs, and memory orderings. The text emphasizes the importance of understanding how hardware, compilers, programming languages, and applications interact to create a sense of order in multithreaded programs. Key topics include atomic operations, read-modify-write operations, compare-and-swap mechanisms, and memory barriers in weakly-ordered hardware architectures.
compiler_construction
July 11, 2024
Building a compiler can be straightforward by breaking the development into small steps and using Scheme as the implementation language. The tutorial focuses on translating a subset of Scheme to assembly code, with a step-by-step approach to achieve a fully working compiler. Testing and refining the compiler incrementally leads to a powerful tool capable of compiling an interactive evaluator.
How do we tell truths that might hurt?
July 11, 2024
The document discusses the challenges of telling unpleasant truths and the conflict that arises when sharing these truths in the field of Computing Science. The author argues that remaining silent about these truths compromises the intellectual integrity of the field. The document also lists a number of truths related to programming languages and the use of language in computing systems. The author questions whether the field should continue to ignore these truths and urges for a change in attitude.
The next fifty years
July 11, 2024
The text discusses the future of computing science over the next fifty years, emphasizing the importance of simplicity and elegance in design to prevent complexity. It highlights the close connection between program design and proof design, suggesting that advancements in program design can impact general mathematics. The author encourages embracing the opportunity to simplify processes and design systems that rely on formal mathematics.
Recommender Systems: A Primer
July 10, 2024
Personalized recommendations have become a common feature of modern online services, including most major e-commerce sites, media platforms, and social networks. Today, due to their high practical relevance, research in the area of recommender systems is flourishing more than ever. However, the new application scenarios of recommender systems that we observe today constantly raise new challenges as well, both in terms of algorithmic requirements and with respect to the evaluation of such systems. In this paper, we first provide an overview of the traditional formulation of the recommendation problem. We then review the classical algorithmic paradigms for item retrieval and ranking and elaborate how such systems can be evaluated. Afterwards, we discuss a number of recent developments in recommender systems research, including research on session-based recommendation, biases in recommender systems, and questions regarding the impact and value of recommender systems in practice.
http client in the standard library · Issue #2007 · ziglang/zig
July 10, 2024
The issue #2007 discusses the implementation of an HTTP client in Zig's standard library. Contributors debate the necessity and scope of including an HTTP client, considering factors like complexity and resource allocation. Ultimately, the HTTP client implementation was completed and closed as part of milestone 0.12.0.
Introduction to Compilers and Language Design
July 10, 2024
A compiler translates high-level code to lower-level code, and building one is a common project in computer science education. This book provides a beginner-friendly guide to building a compiler for a C-like language, suitable for undergraduates with programming experience. The author offers free online access to the textbook and related code resources, with options to purchase a physical copy.
Bare Metal Zig
July 10, 2024
The text discusses compiling a freestanding Zig binary to run on "bare metal" without relying on an operating system. It shows how to create a simple freestanding binary, make it multiboot compliant, and add custom console functionality for output. The process involves targeting specific architectures, handling linker warnings, and ultimately creating a bootable "kernel" to run on virtual machines like QEMU.
Comparing SIMD on x86-64 and arm64
July 10, 2024
The text compares SIMD implementations using SSE on x86-64 and Neon on arm64 processors, including emulating SSE on arm64 with Neon. It explores vectorized code performance using intrinsics, auto-vectorization, and ISPC, highlighting the efficiency of SSE and Neon implementations. The study shows how optimizing for SIMD instructions significantly boosts performance over scalar implementations in ray-box intersection tests.
Compiler Optimizations Are Hard Because They Forget
July 10, 2024
Compiler optimizations involve breaking down complex changes into smaller, more manageable steps to improve code efficiency. However, as more optimizations are added, the potential for errors and missed opportunities increases, making it challenging to maintain optimal performance. Compilers struggle with balancing aggressive optimizations while preserving correct program behavior, highlighting the complexity and difficulties inherent in optimizing compilers.
C Isn't A Programming Language Anymore
July 10, 2024
C is no longer just a programming language but a vital protocol for all languages. Parsing C headers is a complex task best left to C compilers. Maintaining ABI compatibility in C can be challenging and may require versioning schemes.
Writing a C Compiler, Part 1
July 9, 2024
This text is about creating a C compiler in multiple stages, starting with lexing, parsing, and code generation. The process involves breaking down the source code, building an abstract syntax tree, and generating x86 assembly code. The compiler will handle simple programs with a single main function and a return statement.
GitHub - DoctorWkt/acwj: A Compiler Writing Journey
July 9, 2024
This GitHub repository documents the author's journey to create a self-compiling compiler for a subset of the C language. The author shares steps taken and explanations to help others follow along practically. The author credits Nils M Holm's SubC compiler for inspiration and differentiates their code with separate licensing.
A new JIT engine for PHP-8.4/9
July 9, 2024
A new JIT engine for PHP is being developed, improving performance and simplifying development. The engine will be included in the next major PHP version, potentially PHP 9.0. The new JIT engine generates a single Intermediate Representation (IR), eliminating the need to support assembler code for different CPUs.
Unknown
July 9, 2024
Hardware prefetching in multicore processors can be too aggressive, wasting resources and impacting performance for co-running threads. Combining hardware and software prefetching can optimize performance by efficiently handling irregular memory accesses. A method described in Paper II offers a low-overhead framework for accurate software prefetching in applications with irregular access patterns.
Introduction 2016 NUMA Deep Dive Series
July 9, 2024
The 2016 NUMA Deep Dive Series explores various aspects of computer architecture, focusing on NUMA systems and their optimization for performance. The series covers topics such as system architecture, cache coherency, memory optimization, and VMkernel constructs to help readers understand and improve their host design and management. It aims to provide practical guidance for configuring and deploying dual-socket systems using Intel Xeon processors, with a focus on enhancing overall platform performance.
von Neumann architecture - Wikipedia
July 9, 2024
The von Neumann architecture is a computer design with a processing unit, control unit, memory, and input/output mechanisms. It allows for instructions and data operations to be stored in memory, advancing computer technology from fixed-function machines like the ENIAC. This architecture was influenced by the work of Alan Turing and John von Neumann and has been widely used in the development of modern computers.
Compiling tree transforms to operate on packed representations
July 8, 2024
The article explains how tree traversals in programming can be optimized by compiling them to work on serialized tree structures without using pointers. This approach can make programs run significantly faster on current x86 architectures. The authors developed a prototype compiler for a functional language that generates efficient code for traversing trees using packed data representations.
Pipelines Support Vectorized, Point-Free, and Imperative Style
July 8, 2024
The text discusses how pipelines in the shell language support vectorized operations on collections and point-free style, where no data is explicitly mentioned. It also demonstrates how imperative code can be incorporated within pipelines for tasks like generating HTML tables. The unique features of pipelines include their ability to handle vectorized code, point-free composition, and integration of imperative instructions.
Entering text in the terminal is complicated
July 8, 2024
Entering text in the terminal can be challenging due to inconsistencies in how different programs handle text input. Some programs support basic features like arrow keys and history navigation, while others have custom input systems with advanced functionalities. Understanding the input mode of a program can help users navigate text editing more effectively in the terminal.
What happens when you start a process on Linux?
July 8, 2024
The process of starting a new program on Linux involves using the fork and exec system calls. Fork creates a clone of the current process, while exec replaces that clone with the new program to be executed. The new process inherits most attributes from its parent, with memory being shared through copy-on-write to optimize performance.
Debug your programs like they're closed source!
July 8, 2024
The author discusses debugging programs without looking at the source code by using system calls like open, execve, and write. System calls allow you to understand and monitor a program's behavior without needing access to its source code. By learning and utilizing system calls, you gain debugging superpowers that are platform-independent and useful for closed-source programs.
How I got better at debugging
July 8, 2024
Julia Evans shares her journey of improving her debugging skills through logical thinking, confidence, expanding knowledge, communication, and using tools like strace and tcpdump. By being systematic, confident, knowledgeable, and open to collaboration, she transformed debugging from a challenging task to an exciting learning opportunity. Her story emphasizes the importance of persistence, curiosity, and practical problem-solving in mastering the art of debugging.
Media Page Under Construction
July 8, 2024
Handmade Cities' media page is under construction, with some recordings missing. The videos from Handmade Boston 2023 have poor audio quality due to using a third-party A/V company. Freya's Masterclass footage was lost, and an abridged version will be shown at Dutch Game Day.
Infographics: Operation Costs in CPU Clock Cycles
July 8, 2024
The text discusses the operation costs in CPU clock cycles for different types of operations, including simple operations, floating-point operations, and vector operations. It highlights that memory involvement can significantly impact operation costs, with some operations taking as little as 1 CPU cycle. Different CPU architectures and types of operations can result in varying costs, with some operations requiring specialized CPU support to work efficiently.
Handles are the better pointers
July 8, 2024
The text discusses using 'index-handles' instead of raw or smart pointers for memory management in C and C++. It suggests centralizing memory management into systems, grouping items into arrays, and converting handles to pointers only when necessary. By following specific rules, such as not storing pointers and using handle-to-pointer conversion, memory safety and efficient memory usage can be maintained.
You're Not Sick of Programming
July 8, 2024
Many people feel tired of programming and dream of quitting for a more fulfilling career, like farming or traveling. However, the real issue might be frustration with office politics, lack of product vision, and burnout rather than a true dislike of programming. Taking a break or addressing these underlying problems could help rediscover the creative potential of programming.
Zig Bare Metal Programming on STM32F103 — Booting up
July 8, 2024
The text explains how to program the STM32F103 microcontroller using the Zig programming language. It covers topics such as memory layout, linker scripts, and compiling code for embedded systems. By following the provided instructions, readers can successfully compile and run their first embedded program on the microcontroller.
OWASP Top Ten
July 7, 2024
The OWASP Top 10 is a guide for developers to understand critical security risks in web applications. Companies are encouraged to follow this document to improve the security of their web applications. The 2021 update includes new categories and ranking changes based on testing data and industry feedback.
Introduction
July 7, 2024
The OWASP Cheat Sheet Series offers valuable security information on application security topics. Created by experts, these concise cheat sheets aim to provide easy-to-read security guidance. You can download the cheat sheets from this site and stay updated through the ATOM feed.
The Copenhagen Book
July 7, 2024
The Copenhagen Book is a free and open-source guide for implementing auth in web applications. It is community-maintained and can be used alongside the OWASP Cheat Sheet Series. Suggestions or concerns can be addressed by opening a new issue.
Undefined Behavior deserves a better reputation
July 6, 2024
Undefined Behavior is often viewed negatively, but it can be a valuable tool for language designers. It allows programmers to convey insights to the compiler for optimizations. Responsible use of Undefined Behavior can enhance language design and code performance.
KHM+15
July 6, 2024
The text discusses a formal C memory model that supports integer-pointer casts, essential for low-level C programming. It proposes a quasi-concrete memory model that allows standard compiler optimizations while fully supporting integer-pointer casts. This model helps verify programs and optimizations that are challenging to validate with integer-pointer casts.
Learning LLVM (Part-1) - Writing a simple LLVM pass
July 5, 2024
This text introduces learning about LLVM and writing LLVM passes, which are used for transforming or analyzing a program's intermediate representation. LLVM offers a versatile compiler infrastructure with modules like the frontend, middle-end, and backend for optimizing and generating machine-specific code. By understanding LLVM concepts and pass managers, developers can create efficient passes for tasks like performance optimization and code analysis.
Some Were Meant for C
July 5, 2024
The document "Some Were Meant for C" explores the enduring significance of the C programming language, highlighting its dual role as both an application and systems programming language. It challenges common assumptions about C, emphasizing its unique communicative design that differs from managed languages. The document argues that C's explicit representations and memory access foster effective system-building and communication, making it a preferred choice for certain technical challenges. Additionally, it critiques the prevailing discourse that demonizes C, advocating for a nuanced understanding of its role in the programming landscape.
Xv6, a simple Unix-like teaching operating system
July 5, 2024
Xv6 is a teaching operating system developed by MIT for their operating systems course. It is based on Unix V6, written in ANSI C, and runs on Intel x86 machines. The xv6 source code is available on GitHub and is used in lectures to teach operating system concepts.
C Is Not a Low-level Language
July 5, 2024
C is often considered a low-level language, but this article argues that it is not. The author explains that vulnerabilities like Spectre and Meltdown occurred because processor architects were trying to build fast processors that exposed the same abstract machine as a PDP-11, which C programmers believe is close to the underlying hardware. However, the reality is that C code runs on a complex compiler that performs intricate transformations to achieve the desired performance. The article also discusses how C's memory model and optimizations make it difficult to understand and can lead to undefined behavior. The author suggests that instead of trying to make C code fast, it may be time to explore programming models on processors designed for speed.
Should you learn C to "learn how the computer works"?
July 5, 2024
The author discusses whether learning C is necessary to understand how computers work, ultimately concluding that C is not a direct representation of computer operations. Learning C can still be beneficial for understanding computing concepts and history, but it operates within a virtual machine and abstracts certain hardware details. By learning C, you can gain insight into the relationship between programming languages, hardware, and the historical development of computing.
A Guide to Undefined Behavior in C and C++, Part 1
July 5, 2024
The text explains that undefined behavior in C and C++ can lead to unpredictable program outcomes. Compilers may optimize code by exploiting undefined behavior, potentially causing programs to misbehave. It is important for programmers to understand how undefined behavior can impact program execution.
Using neural nets to recognize handwritten digits
July 5, 2024
Neural networks can recognize handwritten digits by learning from examples. Sigmoid neurons play a key role in helping neural networks learn. Gradient descent is a common method used for learning in neural networks.
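As a quick illustration (not the book's own code), a sigmoid neuron and a single gradient-descent step can be sketched in a few lines of Python:

```python
import math

def sigmoid(z):
    """Squash a weighted input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, inputs):
    """A sigmoid neuron: weighted sum of inputs plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def gradient_descent_step(weights, grads, learning_rate=0.1):
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]
```

Repeating the descent step with gradients computed over training examples is what "learning from examples" amounts to mechanically.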
When Network is Faster than Cache
July 5, 2024
Firefox introduced a feature called RCWN to improve web performance by racing cached requests against the network. In some cases, the network can be faster than fetching data from the cache due to various factors like browser bugs and resource prioritization. Factors like device hardware and the total number of assets served from the cache impact cache retrieval performance significantly.
John Carmack on Functional Programming in C++
July 5, 2024
Functional programming in C++ can help in writing better software by making code easier to reason about and eliminating thread race conditions. Pure functions, which only rely on input parameters and produce consistent outputs, offer benefits such as thread safety and easier testing. Refactoring towards purity can improve code quality, even if full purity is not achieved, by disentangling computation from the environment it operates in.
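The refactoring-toward-purity idea carries over to any language; a hypothetical Python sketch (the `apply_damage` names are invented for illustration):

```python
# Impure: reads and mutates shared state, so calls are order-dependent
# and hard to test in isolation.
damage_log = []

def apply_damage_impure(player, amount):
    player["hp"] -= amount
    damage_log.append(amount)

# Pure: the result depends only on the arguments and nothing is mutated.
# The caller decides what to do with the returned values.
def apply_damage(player, amount):
    updated = dict(player, hp=player["hp"] - amount)
    return updated, amount
```

The pure version is trivially testable and safe to call from multiple threads, since it touches no shared state.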
Zig-style generics are not well-suited for most languages
July 5, 2024
Zig-style generics, like those in C++, may not work well for all languages due to limitations in compiler support and type inference. Armchair suggestions about adopting Zig-style generics in other languages may overlook these challenges. The flexibility and metaprogramming capabilities in Zig may not easily translate to other statically-typed languages.
WebGL2 vs WebGL1
July 4, 2024
WebGL is a 3D API that works as a rasterization engine, requiring users to provide code for rendering points, lines, and triangles. Users must create vertex and fragment shaders to control how WebGL processes and displays graphics. The WebGL API simplifies rendering by executing user-created functions to draw basic shapes like triangles.
WebGL How It Works
July 4, 2024
The text explains how WebGL processes vertices to create triangles and render them with pixels using shaders. Varyings are used to pass data from the vertex shader to the fragment shader for color interpolation. Buffers are essential for transferring vertex data to the GPU for rendering, and attribute locations are assigned to specify how to extract and use this data efficiently.
The Night Watch
July 4, 2024
The text discusses the importance of systems programmers in dealing with complex technical challenges, emphasizing their unique skills in debugging and problem-solving. It contrasts the roles of systems programmers with other computer professionals like GUI designers and PHP developers, highlighting the critical nature of systems programming in challenging scenarios. The text humorously portrays the intense and sometimes absurd experiences of systems programmers, showcasing their indispensable role in addressing technical issues efficiently and effectively.
FreeType
July 4, 2024
FreeType is a software library for rendering fonts, available for free. It is designed to be small, efficient, and capable of producing high-quality font images. Users can find installation instructions, documentation, and ways to communicate with the FreeType team on their website.
A Freestanding Rust Binary
July 3, 2024
To create a freestanding Rust executable for operating system development, we need to disable linking to the standard library and define our own entry point function. By compiling for a bare metal target like thumbv7em-none-eabihf, we can avoid linker errors and run Rust code without an underlying operating system. Additional linker arguments are required for specific operating systems like Linux, Windows, and macOS to resolve linker errors and build the freestanding Rust binary successfully.
Manually linking Rust binaries to support out-of-tree LLVM passes
July 3, 2024
LLVM is a compiler infrastructure used by frontends like rustc to generate machine code. To add custom LLVM passes to a Rust binary, extra flags can be used during compilation to produce LLVM-IR and then link the binary properly using LLVM tools. By understanding how Rust's static libraries work and leveraging cargo for dependency management, custom LLVM passes can be integrated into Rust binaries efficiently.
The Rust Reference
July 3, 2024
The Rust compiler can generate different types of output artifacts, such as runnable executables, Rust libraries, dynamic libraries, and static system libraries. Dependencies between crates can be linked in various formats, such as rlib and dynamic library formats, following specific rules set by the compiler. Understanding how to specify output formats like --crate-type=bin or --crate-type=lib can help control the compilation process for Rust crates, while also considering options for linking C runtimes dynamically or statically based on target features.
Rust Compiler Development Guide
July 3, 2024
The Rust compiler processes and transforms your code for compilation. It uses different stages like lexing, parsing, and abstract syntax tree lowering. The compiler aims for correctness, performance, and supporting incremental compilation.
How to speed up the Rust compiler one last time
July 3, 2024
The author at Mozilla is concluding their work on speeding up the Rust compiler after several years of dedicated effort.
They wrote multiple blog posts detailing their performance optimizations and shared valuable lessons learned from the process.
The author expressed gratitude to those who supported their work and highlighted the importance of ongoing contributions to Rust's development.
How to speed up the Rust compiler in March 2024
July 3, 2024
The March 2024 update on Rust compiler performance highlighted several key improvements: building the compiler with a single codegen unit, marking Debug::fmt methods with #[inline], introducing a cache, and upgrading LLVM all produced notable reductions in wall-time, binary size, and hash table lookups. The Cranelift codegen backend also became available for x86-64/Linux and ARM/Linux as a faster-compiling alternative. While the author didn't contribute speed improvements this cycle, overall results from August 2023 to March 2024 showed steady reductions in wall-time, peak memory usage, and binary size.
Zig Bits 0x4: Building an HTTP client/server from scratch
July 3, 2024
The text explains how to create an HTTP client and server from scratch using Zig >=0.11.
For the client, you need to set up requests, headers, and wait for responses.
The server part involves defining functions to handle requests and running the server to accept connections.
Do We Really Need A Link Step?
July 3, 2024
The author questions the need for a link step in native-code compilation for faster performance. They propose a "zero-link" approach where compilers directly write object code into the final executable file. This method could improve efficiency by avoiding unnecessary object files and incorporating symbol resolution within the executable itself.
Death Note: L, Anonymity & Eluding Entropy
July 2, 2024
The text discusses Light's mistakes in using the Death Note and how they led to his de-anonymization by L. Light's errors, such as revealing his precise killing methods and using confidential police information, significantly reduced his anonymity. The text also explores strategies Light could have employed to better protect his anonymity while using the Death Note.
jamiebuilds/the-super-tiny-compiler: :snowman: Possibly the smallest compiler ever
July 2, 2024
The Super Tiny Compiler is a simplified example of a modern compiler using easy-to-read JavaScript. It helps you understand how compilers work from start to finish. Compilers play a big role in the tools we use daily.
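The compiler's first stage, tokenization, can be sketched in Python (an illustrative analogue of the project's JavaScript tokenizer, not its actual code):

```python
import re

def tokenize(source):
    """Split a lisp-style expression like "(add 2 (subtract 4 2))"
    into (type, value) tokens — the first stage of the compiler."""
    spec = [
        ("paren",  r"[()]"),
        ("number", r"\d+"),
        ("name",   r"[a-z]+"),
        ("skip",   r"\s+"),
    ]
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in spec)
    tokens = []
    for m in re.finditer(pattern, source):
        if m.lastgroup != "skip":  # drop whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

The later stages (parsing into an AST, transforming it, generating code) each consume the previous stage's output in the same pipeline style.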
5 Days to Virtualization: A Series on Hypervisor Development
July 2, 2024
A series on hypervisor development for Intel processors with virtualization support will be published next week, covering topics like setting up a test environment, driver skeleton creation, and multi-processor initialization. The series aims to aid new readers in building, testing, and understanding type-2 hypervisor development using C programming language. Recommended reading and detailed explanations will be provided to enhance knowledge and understanding of virtualization concepts.
In-depth analysis on Valorant’s Guarded Regions
July 2, 2024
The text discusses how Valorant's anti-cheat system, Vanguard, uses innovative techniques to protect against memory manipulation by whitelisting threads and creating shadow regions. These methods involve cloning and modifying the game's paging tables to allow access to hidden memory without affecting performance. By implementing these advanced security measures, Vanguard effectively prevents cheats from bypassing its guarded regions.
Exploit Development: No Code Execution? No Problem! Living The Age of VBS, HVCI, and Kernel CFG
July 2, 2024
The text discusses various techniques used in exploit development, particularly focusing on targeting the Windows kernel. It mentions concepts like Hypervisor-Protected Code Integrity (HVCI) and how exploits can manipulate memory to execute attacker-controlled code in kernel mode. The text also delves into details like leaking kernel-mode memory, constructing ROP chains on the kernel-mode stack, and utilizing functions like NtQuerySystemInformation to escalate privileges and perform malicious actions in the system.
Reader
July 2, 2024
The Reader API by jina.ai helps extract clean, LLM-friendly text from web content, ensuring high-quality input for AI systems like agents and RAG. It can also search the web for the latest information to keep LLMs up-to-date, improve factuality, and reduce misinformation. Additionally, Reader can read images on webpages and PDFs, providing alt text for images and lightning-fast PDF processing, all available for free with flexible rate limits.
CheerpX versus WebContainers
July 2, 2024
CheerpX is a client-side virtualization technology for running x86 executables and operating systems in the browser without modifications or recompilation. It offers cost-effective, secure, and private execution of native code, making it suitable for various web-based applications. CheerpX stands out from other solutions by supporting any x86 executable and providing a robust two-tier emulator for efficient code execution.
Creating a Rootkit to Learn C
July 2, 2024
The text demonstrates building a userland rootkit in C that hides malicious activity such as network connections and files. By hooking the libc wrappers for calls like access() and write(), the rootkit manipulates what userland programs see and evades detection by tools like netstat. It relies on shared-library injection to intercept these calls, illustrating how much control C gives over a process's view of the system.
Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization: [CVPR 2022 Oral] Towards Layer-wise Image Vectorization
July 1, 2024
The text discusses a new method called LIVE for generating SVG images layer by layer to fit raster images. LIVE uses closed Bézier paths to learn visual concepts in a recursive manner. Installation instructions and references for the method are provided in the text.
Udacity CS344: Intro to Parallel Programming
July 1, 2024
Intro to Parallel Programming is a free online course by NVIDIA and Udacity teaching parallel computing with CUDA. It's for developers, scientists, engineers, and students looking to learn about GPU programming and optimization. The course is self-paced, requires C programming knowledge, and offers approximately 21 hours of content.
CS 361: Systems Programming
July 1, 2024
The Systems Programming course at UIC includes assigned readings, video lectures, labs, and quizzes scheduled throughout the week. Students can access additional resources and submit assignments through the course gradescope page. Office hours, content quizzes, discussions, and exams are held on specific days via Zoom and YouTube.
Resolving Rust Symbols
July 1, 2024
Linking combines object files into an executable or shared library in Rust. The linker resolves symbols and dependencies between object files. Rust prefers static linking to create a single distributable binary with all dependencies included.
When FFI Function Calls Beat Native C
July 1, 2024
David Yu performed a benchmark comparing different Foreign Function Interfaces (FFI) for function calls. LuaJIT's FFI was found to be faster than native C function calls due to efficient dynamic function call handling. Direct function calls, like those used by LuaJIT, can outperform indirect calls routed through a Procedure Linkage Table (PLT).
Cap'n Proto, FlatBuffers, and SBE
July 1, 2024
FlatBuffers is a new serialization protocol released by Google engineers, similar to Cap’n Proto. Cap’n Proto allows random access using pointers, while FlatBuffers uses offsets stored in tables for random access. Protobufs, Cap’n Proto, and FlatBuffers have custom schema languages and different features for data serialization and access.
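The offset-based random access that Cap'n Proto and FlatBuffers rely on can be illustrated with a toy layout in Python. This is neither wire format — just the core idea of a header holding byte offsets, so any field can be read without parsing the ones before it:

```python
import struct

def encode(fields):
    """fields: a list of bytes objects.
    Layout: u32 field count, then u32 offsets, then the payloads."""
    header_size = 4 + 4 * len(fields)
    offsets, payload, pos = [], b"", header_size
    for f in fields:
        offsets.append(pos)
        payload += f
        pos += len(f)
    return struct.pack(f"<I{len(fields)}I", len(fields), *offsets) + payload

def read_field(buf, index, size):
    """Jump straight to field `index` via its stored offset."""
    offset = struct.unpack_from("<I", buf, 4 + 4 * index)[0]
    return buf[offset:offset + size]
```

Reading field 1 never touches field 0's bytes — that is the "random access" property the protocols advertise.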
A Database Without Dynamic Memory Allocation
July 1, 2024
TigerBeetle, a database written in Zig, does not allocate memory dynamically after startup. It uses static memory allocation for all data structures, avoiding performance issues and use-after-free bugs. This approach allows for better predictability, easier handling of overload, and efficient resource management.
Wizard Zines Collection!
July 1, 2024
Julia offers programming zines with black and white covers for free and colored covers for purchase. The zines can be bought individually for $10-$12 each or as a whole collection. Additionally, there are free posters and a weekly comic subscription available.
Aggregating Millions of Groups Fast in Apache Arrow DataFusion 28.0.0
July 1, 2024
Apache Arrow DataFusion version 28.0.0 now offers faster parallel aggregation for queries with many groups. The improvements aim to enhance user experiences by generating insights more efficiently. These enhancements bring DataFusion closer to the grouping speed of DuckDB.
Problems of C, and how Zig addresses them
July 1, 2024
This blog post discusses issues with C and how Zig addresses them through features like comptime evaluations and improved memory management. Zig offers solutions like error handling improvements and treating everything as an expression, making it a modern alternative to C with enhanced functionalities. The comparison highlights Zig's advantages in areas such as memory management, error handling, and expressive coding practices.
June 2024
91 bookmarks
How to use hash map contexts to save memory when doing a string table
June 30, 2024
The text explains how to save memory when building a string table using hash map contexts. By adapting context APIs, only indexes are stored in the table, reducing memory usage. This method can save 117 KB of memory for a string table with 10 thousand entries.
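Python has no Zig-style hash map contexts, but the underlying idea — storing only integer offsets into one shared buffer instead of duplicating string data — can be sketched like this (illustrative, not the article's code):

```python
class StringTable:
    """Intern strings into one byte buffer; callers hold compact
    integer offsets instead of string objects."""
    def __init__(self):
        self.buf = bytearray()
        self._index = {}  # string -> offset, used only for deduplication

    def intern(self, s):
        if s in self._index:
            return self._index[s]
        off = len(self.buf)
        data = s.encode()
        # length-prefixed entry: u32 length, then the bytes
        self.buf += len(data).to_bytes(4, "little") + data
        self._index[s] = off
        return off

    def get(self, off):
        n = int.from_bytes(self.buf[off:off + 4], "little")
        return self.buf[off + 4:off + 4 + n].decode()
```

In the article's Zig version the hash map itself stores only the offsets, with a context supplying hash/equality over the buffer — avoiding even the side dictionary used here.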
resume.txt
June 30, 2024
Andrew Kelley is a programmer with 16 years of experience in software development and a passion for open-source projects. He has worked on various music-related software like the Genesis DAW and libgroove, contributing patches to libav and ffmpeg. Additionally, he has experience in low-level systems, custom algorithm creation, and designing user interfaces.
Leslie Lamport
June 28, 2024
Leslie Lamport wrote several papers on verifying and specifying concurrent systems using TLA. He discovered algorithms through formal derivation and emphasized mechanical verification of concurrent algorithms. His work influenced the development of the TLAPS proof system.
Indices and tables
June 27, 2024
CompilerGym is a library for reinforcement learning in compiler tasks. It helps ML researchers work on optimization problems and allows system developers to create new tasks for ML research. The goal is to use ML to make compilers faster.
LLM Compiler
June 27, 2024
The LLM Compiler is a suite of pre-trained models designed for code optimization tasks, based on Code Llama. It has been trained on a large corpus of LLVM-IR and assembly code to enhance compiler behavior understanding. The release of LLM Compiler aims to support further research in compiler optimization for both academia and industry.
Bare Bones
June 25, 2024
This text explains how to create an operating system by first cross-compiling and using existing technology. It guides you through writing a kernel in C or C++, creating a bootloader, and linking the kernel for x86 systems. Following these steps ensures your operating system can be loaded and executed correctly.
The Graphics Codex
June 24, 2024
"The Graphics Codex" is a comprehensive resource for computer graphics, offering essential information on 3D rendering and shading. It includes equations, diagrams, and programming projects, with free updates every month. Written by expert Morgan McGuire, it is a valuable tool for learning and reference in the field of computer graphics.
[2305.13009] Textually Pretrained Speech Language Models
June 24, 2024
Notes on partial borrows
June 24, 2024
The text discusses limitations of the Rust borrow checker and proposes solutions for creating references that borrow from specific subsets of a type. Two approaches, "View types" and "Reference views," are explored to address these limitations and provide more flexibility in borrowing subsets of fields with different lifetimes and mutability. The discussion includes examples, subtyping implications, monomorphization considerations, and the need to update Rust's aliasing model to accommodate view references accessing discontiguous memory regions.
Dioxus Labs + “High-level Rust”
June 24, 2024
An article criticized Rust's gamedev hype, but its popularity stems from meeting modern programming needs like speed and safety. Efforts are underway to enhance Rust's capabilities for various industries and improve compile times significantly. Proposed enhancements include incremental linking, parallel frontend, and macro expansion caching to make Rust more efficient for developers.
Compile-Time Configuration For Zig Libraries
June 24, 2024
To expose compile-time configuration options in Zig libraries, developers can use global declarations in the root source file or through Zig's build system. By setting configuration flags, developers can customize behavior such as enabling or disabling assertions in library code. Compile-time configuration can improve performance by allowing certain checks to be done at compile-time rather than runtime.
Generics
June 24, 2024
Generics in Zig allow for creating data structures and algorithms that can work with different types. By using generics, code can be written once and reused with various data types. Zig's approach to generics involves leveraging compile-time metaprogramming capabilities.
Zig's HashMap - Part 1
June 24, 2024
Zig's std.HashMap implementation relies on two key functions: hash and eql. The documentation outlines various hash map types and their functionalities, including std.HashMapUnmanaged. AutoHashMap can automatically generate hash functions, but there are limitations, and custom contexts can be provided for more complex keys.
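The hash/eql pair Zig asks for corresponds to Python's `__hash__`/`__eq__` contract; a rough analogue (illustrative only — Zig contexts are passed explicitly rather than defined on the key type):

```python
class Point:
    """A custom hash map key: equal keys must produce equal hashes,
    the same invariant Zig's hash and eql functions must uphold."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))
```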
Zig Parser
June 24, 2024
The Zig Parser is a crucial part of the Zig compiler internals, responsible for constructing an abstract syntax tree from a stream of tokens. The parser uses a struct called Parser to manage the internal state of the parse operation, accumulating errors and building up AST nodes. Understanding the structure of an AST node and the data pattern is essential for comprehending how the parser works and the subsequent stages of the compiler. The AST node data is stored in various locations such as the token stream, the node list, and the extra data list, with specific structures and indexes used to access information about AST nodes like function declarations and prototypes.
Copying Better: How To Acquire The Tacit Knowledge of Experts
June 24, 2024
The text discusses how to acquire expert intuition, known as tacit knowledge, through emulation and apprenticeship. Naturalistic Decision Making (NDM) research helps extract and teach expert judgment using methods like Cognitive Task Analysis and the recognition-primed decision making model. Experts rely on implicit memory and pattern recognition to make rapid assessments and decisions, which can be challenging to verbalize.
Causal ordering
June 24, 2024
Causal ordering is essential for understanding distributed systems, where events may not have a clear time order. This concept helps determine the causal relationship between events in a system. It enables reasoning about causality, leading to simpler solutions in distributed computing.
Assorted thoughts on zig (and rust)
June 24, 2024
Zig is simpler than Rust and offers similar features through compile-time execution. Rust provides strong type safety guarantees for generic functions, while Zig lacks automatic type constraint documentation and may face challenges with IDE support. Zig excels in custom allocators and handling out-of-memory errors, while Rust excels in preventing memory leaks and resource management.
Columnar kernels in go?
June 24, 2024
Over the winter the author plans to add a columnar query engine to an existing system written in Go.
An opinionated map of incremental and streaming systems
June 24, 2024
The text discusses various design choices and characteristics of incremental and streaming systems. It highlights the core idea of these systems, which is to process inputs to generate outputs efficiently. The systems are categorized based on unstructured vs structured design, high temporal locality vs low temporal locality workloads, internal consistency vs internal inconsistency, and eager vs lazy computation approaches. The text explains the advantages and disadvantages of each design choice and provides examples of systems that fall into different categories. Additionally, it emphasizes the importance of understanding these design choices in selecting the appropriate system for specific workloads.
Internal consistency in streaming systems
June 24, 2024
The text discusses the importance of internal consistency in streaming systems. It explains how eventual consistency can lead to incorrect outputs and the need for systems to wait for all relevant inputs before emitting results. Maintaining internal consistency ensures correct outputs and prevents confusion between changes and corrections.
Pain we forgot
June 24, 2024
The text discusses the challenges in programming and the need for more user-friendly tools. It emphasizes the importance of improving feedback loops, running code smoothly, and creating more helpful programming environments. The author suggests rethinking traditional tools and approaches to make programming more accessible and efficient.
Have you tried rubbing a database on it?
June 24, 2024
HYTRADBOI was a conference featuring lightning talks on innovative uses of databases for solving problems. Talks included topics like building data-centric apps, realtime machine learning, and interactive databases. The event focused on embracing new solutions and fostering professional behavior among attendees.
The shape of data
June 24, 2024
The text discusses the importance of having a clear and consistent data notation in programming languages like Clojure. It emphasizes the advantages of a notation that closely aligns with the in-memory representation of data, making it easier for developers to work with and understand data structures. Additionally, it suggests that a well-designed data model and notation are crucial for efficient data manipulation and code analysis.
Reflections on a decade of coding
June 24, 2024
The author reflects on 12 years of coding experience, sharing recent projects and personal growth insights. They highlight the importance of gradual improvements in habits and processes over innate talent. The author identifies areas of progress, like writing efficient code and managing emotions, while acknowledging gaps in experience in maintaining large codebases and teamwork.
Prospecting for Hash Functions
June 24, 2024
The text discusses the process of designing non-cryptographic integer hash functions, exploring different operations and constraints to create effective hash functions. It also compares various 32-bit hash functions and their bias levels, highlighting the search for high-quality hash functions with minimal bias for both 32-bit and 64-bit integers.
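The xorshift-multiply template the article searches over looks like this in Python; the structure (xor-shift, odd multiply, repeated) matches the article's candidates, but treat the specific shift amounts and multiplier constants below as illustrative:

```python
MASK = 0xFFFFFFFF  # keep everything in 32 bits

def hash32(x):
    """One xorshift-multiply pipeline over a 32-bit integer.
    Each step is invertible, so the whole function is a bijection
    on 32-bit values — it permutes rather than collides."""
    x &= MASK
    x ^= x >> 16
    x = (x * 0x7FEB352D) & MASK
    x ^= x >> 15
    x = (x * 0x846CA68B) & MASK
    x ^= x >> 16
    return x
```

The article's contribution is searching this constant space and measuring bias (how unevenly single-bit input flips affect output bits) to rank candidates.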
The Missing Zig Polymorphism / Runtime Dispatch Reference
June 24, 2024
The text discusses how Zig lacks built-in polymorphism features like interfaces or virtual methods. It explores creating polymorphism using existing language features in Zig. The author provides a detailed guide on implementing polymorphism in Zig, focusing on dynamic dispatch using function pointers.
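The function-pointer approach can be mimicked in Python with an explicit vtable plus a context object — a loose analogue of the article's Zig pattern, not its code:

```python
class Shape:
    """Hand-rolled dynamic dispatch: a context pointer plus a table
    of function pointers, instead of language-level virtual methods."""
    def __init__(self, ctx, vtable):
        self.ctx = ctx
        self.vtable = vtable

    def area(self):
        return self.vtable["area"](self.ctx)

def make_circle(r):
    return Shape({"r": r}, {"area": lambda c: 3.14159 * c["r"] ** 2})

def make_square(side):
    return Shape({"side": side}, {"area": lambda c: c["side"] ** 2})
```

Callers see one `Shape` type and dispatch at runtime, which is exactly what the hand-built Zig version buys without interfaces or virtual methods in the language.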
Nanosystems
June 23, 2024
This text is about a book called "Nanosystems" by K. Eric Drexler, which is considered groundbreaking in the field of molecular nanotechnology. The book explains how to create manufacturing systems at the molecular level and discusses the significant impact nanotechnology will have on various industries. Experts praise the book for providing a foundation for future research in molecular systems engineering and molecular manufacturing.
How To Become A Hacker
June 23, 2024
The text explains what it means to be a hacker, focusing on problem-solving, creativity, and a willingness to share knowledge within the hacker culture. It emphasizes the importance of developing a hacker mindset, skills, and dedication through self-education and a passion for solving new problems. The hacker culture values intelligence, hard work, and a sense of community, with an emphasis on learning and sharing information to advance the collective knowledge of hackers.
the rr debugging experience
June 20, 2024
rr is a debugging tool for Linux that records failures for deterministic replay under gdb. It helps debug real applications efficiently and supports reverse execution for finding bugs. rr aims to make debugging easier with low overhead and powerful features like hardware data watchpoints.
Text Buffer Reimplementation
June 19, 2024
The Visual Studio Code 1.21 release includes a new text buffer implementation that improves performance in terms of speed and memory usage. The previous implementation used an array of lines, but it had limitations such as high memory usage and slow file opening times. The new implementation uses a piece table data structure, which allows for better memory usage and faster line look-up. Additionally, the implementation uses techniques such as caching for faster line lookup and a balanced binary tree for efficient searching. Benchmarks showed that the new implementation outperformed the previous line array implementation in terms of memory usage, file opening times, and reading operations.
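A piece table can be sketched in Python to show why edits are cheap: only the piece list changes, while both underlying buffers stay append-only (illustrative, not VS Code's implementation, which adds caching and a balanced tree on top):

```python
class PieceTable:
    """The document is a list of pieces, each pointing into either the
    read-only original buffer or an append-only "add" buffer."""
    def __init__(self, original):
        self.original = original
        self.added = ""
        self.pieces = [("orig", 0, len(original))] if original else []

    def insert(self, pos, text):
        add_off = len(self.added)
        self.added += text  # never modified in place, only appended
        new_piece = ("add", add_off, len(text))
        out, seen, placed = [], 0, False
        for buf, start, length in self.pieces:
            if not placed and seen <= pos <= seen + length:
                split = pos - seen
                if split:
                    out.append((buf, start, split))
                out.append(new_piece)
                if length - split:
                    out.append((buf, start + split, length - split))
                placed = True
            else:
                out.append((buf, start, length))
            seen += length
        if not placed:
            out.append(new_piece)
        self.pieces = out

    def text(self):
        bufs = {"orig": self.original, "add": self.added}
        return "".join(bufs[b][s:s + n] for b, s, n in self.pieces)
```

An insert splits at most one piece, so edit cost is independent of document size — the property the line-array representation lacked.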
What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?
June 19, 2024
Rust and C++ offer "zero-cost abstractions" where high-level code compiles to low-level code without added runtime overhead, but enabling necessary compiler optimizations can slow down compilation and impact debugging. The challenge is to find the minimal set of optimizations that maintain zero-cost abstractions while improving build speed and debug information quality. Balancing fast debuggable builds with zero-cost abstractions is crucial for performance and developer experience in languages like Rust and C++.
Using ASCII waveforms to test hardware designs
June 19, 2024
Using expect tests automates the validation of code output, detecting errors efficiently. Jane Street uses Hardcaml in OCaml for hardware development, simplifying testbench creation. Waveform expect tests help visualize hardware behavior, improving development workflows.
Rust 2019 and beyond: limits to (some) growth.
June 19, 2024
The text discusses the need for controls and policies to manage the growth limits of technical artifacts and the strains on individuals in the Rust project. It emphasizes the importance of acknowledging and addressing these limits to prevent potential crises or dysfunction in the future. The author suggests implementing controls, such as hard limits and moderation strategies, to maintain a healthy and sustainable project environment.
Your ABI is Probably Wrong
June 19, 2024
The text discusses how most ABIs have a design flaw that harms performance by passing large structures inefficiently. Different ABIs handle passing large structures differently, but they all repeat the same mistakes. A correctly-specified ABI should pass large structures by immutable reference to avoid unnecessary copies.
GitHub - sirupsen/napkin-math: Techniques and numbers for estimating system's performance from first-principles
June 19, 2024
The project "Napkin Math" aims to provide resources and techniques to estimate system performance quickly and accurately. It includes examples like estimating memory reading speed and storage costs for applications. The best way to learn this skill is through practical application, with the option to subscribe for regular practice problems. Detailed numbers and cost estimates are provided, along with compression ratios and techniques to simplify calculations. The project encourages user participation to enhance and refine the provided data and tools for napkin math calculations.
Don't write bugs
June 19, 2024
Effective programmers should focus on preventing bugs rather than debugging them. Re-reading code frequently can help reduce the number of errors. Writing bug-free code is achievable with practice and attention to detail.
technicalities: "not rocket science" (the story of monotone and bors)
June 19, 2024
The text discusses the development of a program called bors that enforces the "Not Rocket Science Rule" of maintaining a code repository that always passes tests. Bors automates integration testing and ensures code changes are only merged if they pass tests, preventing broken code from being merged. This system has been found to be extremely beneficial for software projects, ensuring a stable and reliable codebase.
Why is Python slow
June 19, 2024
Python's performance issues stem from spending most time in the C runtime, rather than the Python code itself. Pyston focuses on speeding up the C code to improve performance. Suggestions to improve Python's speed by using other JIT techniques overlook the fundamental issue of optimizing C code.
Design duality and the expression problem
June 19, 2024
The text discusses the concept of design duality in programming, focusing on the trade-offs between objects and data representations. It highlights the importance of making conscious design choices when introducing new types, whether as data, objects with extensible implementations, or abstract data types with restricted extensibility. The author emphasizes the need for programming languages to better support and encourage these design considerations.
Random Thoughts On Rust: crates.io And IDEs
June 19, 2024
The author shares experiences with Rust, praising cargo and crates.io for easy code distribution. They highlight the need for improved library discovery on crates.io and discuss the potential for better IDE support in Rust projects. Despite challenges like type inference, Rust's design enables advanced IDE features that can enhance coding efficiency.
John Carmack on Inlined Code
June 19, 2024
Consider inlining functions that are only called in one place for efficiency. Simplify code structure to reduce bugs and improve performance. Emphasize consistent execution paths over avoiding minor optimizations.
A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World
June 19, 2024
The text discusses the development and commercialization of a bug-finding tool that can identify errors in large amounts of code. It highlights the challenges faced in finding and addressing various types of bugs, such as memory corruption and data races, across different programming systems. The tool's effectiveness in uncovering bugs in complex codebases emphasizes the importance of bug detection for improving software quality.
What is Systems Programming, Really?
June 19, 2024
The term "systems programming" combines low-level programming and systems design. It involves creating and managing complex components, often focusing on machine implementation details. Over time, the distinction between systems programming and other programming languages has become less clear.
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
June 18, 2024
MLKV introduces Multi-Layer Key-Value sharing to reduce memory usage in transformer decoding. This approach improves efficiency without sacrificing performance on NLP benchmarks. MLKV significantly reduces memory requirements compared to existing methods like Multi-Query Attention.
Mitchell Hashimoto
June 17, 2024
Mitchell Hashimoto is an advisor at Polar and shares insights on technical projects, Zig programming, and automation on his website. He discusses various topics like GitHub pull requests, Zig build system, and AI growth through cloud lens. Mitchell's writing covers a range of technical subjects and his experiences in the startup world.
Understanding Machine Learning: From Theory to Algorithms
June 17, 2024
UB Might Be a Wrong Term for Newer Languages
June 17, 2024
The author suggests that using the term "undefined behavior" in newer languages like Zig and Rust may not be the best choice due to differences in semantics. In C, implementations can define some behaviors left undefined by the standard, but in Rust and Zig, any program showing undefined behavior is considered invalid. The author proposes using terms like "non-trapping programming error" or "invalid behavior" to better convey the intended semantics in these languages.
What Every C Programmer Should Know About Undefined Behavior #1/3
June 17, 2024
This blog post explains that many seemingly reasonable things in C actually have undefined behavior, leading to common bugs in programs. Undefined behavior in C allows for optimizations that improve performance but can result in unexpected outcomes like formatting your hard drive. Understanding undefined behavior is crucial for C programmers to prevent potential issues and improve code efficiency.
The Rustonomicon
June 16, 2024
The Rustonomicon is a book for understanding Unsafe Rust programming details. It complements The Rust Programming Language by delving into combining language pieces and potential issues. The book covers topics like (un)safety, creating safe abstractions with unsafe primitives, and working with memory, but does not provide exhaustive API details.
chrono-Compatible Low-Level Date Algorithms
June 15, 2024
The text explains algorithms for handling dates and determining leap years. It includes functions for calculating the last day of a month and converting dates between different calendar systems. The algorithms are designed to be efficient and accurate for various date calculations.
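The leap-year and month-length rules the article builds on can be sketched in a few lines of Python (a minimal illustration of the Gregorian rules, not the article's optimized branchless algorithms):

```python
def is_leap(y: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def last_day_of_month(y: int, m: int) -> int:
    # Table lookup for a common year, with the February leap-year exception.
    days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return 29 if m == 2 and is_leap(y) else days[m - 1]
```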
Step-by-Step Diffusion: An Elementary Tutorial
June 15, 2024
The text is an elementary tutorial on diffusion models by Preetum Nakkiran, Arwen Bradley, Hattie Zhou, and Madhu Advani.
So Many New Systems Programming Languages II
June 14, 2024
The text discusses new systems programming languages like Rust, Zig, and Odin, highlighting their safety features and syntax. These languages offer improved memory management and safety compared to older languages like C and C++. Rust, in particular, stands out for its memory safety, threading support, and borrow checker.
zackoverflow
June 14, 2024
Zack, the author, enjoys building things and delving into the inner workings of systems and computers for dopamine. He works on the Bun JavaScript runtime and creates music when not coding. Zack invites anyone to chat through his open calendar link.
From Theory To Implementation
June 14, 2024
Physically Based Rendering is a widely-used textbook in computer graphics that combines theory with practical implementation for creating realistic images. The book, authored by industry experts, offers cutting-edge algorithms and ideas, including GPU ray tracing, to help readers design advanced rendering systems. Both the third and fourth editions of the book are available online for free.
Speech-to-text models
June 13, 2024
Speech-to-text AI enhances communication and accessibility by transcribing spoken words into text accurately and efficiently. Machine learning and AI advancements have significantly improved the accuracy and adaptability of speech-to-text systems. These technologies open up new possibilities for inclusive and effective communication across various industries.
Ray Tracing in One Weekend
June 13, 2024
"Ray Tracing in One Weekend" introduces readers to the concept of ray tracing through a step-by-step guide to creating a ray tracer that produces images. The document covers topics such as sending rays into the scene, ray-sphere intersection, shading, and reflection. It explains the mathematical aspects behind ray tracing, including formulas for sphere intersections and normal vectors. The guide progresses from creating a simple image of a sphere to more complex scenes, providing insights into the coding process and considerations for optimizing the rendering process.
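The ray-sphere intersection step reduces to solving a quadratic: substituting the ray O + tD into |P - C|² = r² gives a quadratic in t whose discriminant tells hit or miss. A minimal Python sketch (the book's code is C++; this only shows the math):

```python
import math

def hit_sphere(center, radius, origin, direction):
    # Solve |O + tD - C|^2 = r^2, a quadratic a*t^2 + b*t + c = 0.
    # A negative discriminant means the ray misses the sphere.
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    return (-b - math.sqrt(disc)) / (2 * a)  # parameter t of the nearest root
```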
Untangling Lifetimes: The Arena Allocator
June 13, 2024
The text discusses the arena allocator as an alternative to traditional manual memory management in C, addressing issues with malloc and free. The arena allocator simplifies memory allocation and deallocation by grouping lifetimes together in a single block of memory. It provides a more efficient and manageable way to handle memory usage in codebases compared to the malloc and free approach.
Tree-Structured Concurrency — 2023-07-01
June 12, 2024
Structured concurrency is a programming concept that ensures clear control flow in concurrent programs. In the context of async Rust, it guarantees properties like cancellation propagation, which means that dropping a future will also cancel all nested futures. The text discusses examples of unstructured and structured concurrency patterns, emphasizing the importance of applying structured concurrency to improve program correctness and maintainability. It also mentions the need for more API support to fully achieve structured concurrency in async Rust, suggesting practical approaches like using task queues or adopting the smol model for task spawning. Overall, structured concurrency provides a way to reason about async Rust programs effectively and enhance their reliability.
immersivemath: Immersive Linear Algebra
June 12, 2024
This text introduces a book on linear algebra with chapters covering vectors, dot products, matrix operations, and more. It aims to help readers understand fundamental concepts and tools in linear algebra through clear explanations and examples. The book includes topics such as Gaussian elimination, determinants, rank, and eigenvalues.
BSTJ 57: 6. July-August 1978: The UNIX Time-Sharing System. (Ritchie, D.M.; Thompson, K.)
June 12, 2024
The UNIX Time-Sharing System is a versatile operating system with unique features. It runs on Digital Equipment Corporation computers and emphasizes simplicity and ease of use. UNIX has been widely adopted for research, education, and document preparation purposes.
Principles of compiler design
June 12, 2024
This text is about a book on compiler design principles. The book, by Alfred V. Aho and Jeffrey D. Ullman, runs 604 pages and includes bibliographical references.
A Mathematical Theory of Communication
June 12, 2024
The paper extends communication theory by considering noise in the channel, savings from message structure, and channel capacity. It discusses entropy, coding efficiency, channel capacity, noisy channels, equivocation, and optimal information transmission techniques. Examples and theorems are provided to explain the concepts of encoding, channel capacity, and noise in communication systems.
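Shannon's central quantity, the entropy of a source, is just H = -Σ p·log₂(p) in bits per symbol; a fair coin carries exactly one bit. A tiny Python sketch:

```python
import math

def entropy(probs):
    # Shannon entropy H = -sum p * log2(p), in bits per symbol.
    # Zero-probability symbols contribute nothing (lim p->0 of p*log p = 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)
```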
Mapping the whole internet with Hilbert curves
June 12, 2024
The author mapped the internet using Hilbert curves to visualize IP addresses. The curves help display the vast network structure in a more comprehensible way. The scan revealed interesting patterns and changes in IP address allocations over time.
Hausdorff dimension - Wikipedia
June 12, 2024
xorvoid
June 12, 2024
Anthony Bonkoski, a computer enthusiast and engineer, shares his experiences in programming and working in quantitative finance. He enjoys working on various projects and has expertise in low-level programming, distributed systems, and reverse-engineering. Currently taking a break from full-time work, he is open to part-time consulting projects and enjoys writing and exploring new interests.
A Recipe for Training Neural Networks
June 11, 2024
The text discusses common mistakes in training neural networks and emphasizes the importance of patience and attention to detail for successful deep learning. It provides a recipe for training neural networks, including steps like setting up a training skeleton, visualizing losses, and focusing on regularization and tuning to improve model performance. The text also highlights the value of adding more real data and using ensembles to enhance accuracy.
You own your data, in spite of the cloud
June 11, 2024
The text discusses the benefits of local-first software, emphasizing ownership and control of data while also enabling seamless collaboration. It compares traditional cloud apps with new approaches that prioritize user ownership and real-time collaboration. The focus is on developing software that combines the convenience of cloud apps with the data ownership of traditional software.
Writing CUDA Kernels for PyTorch
June 11, 2024
The text shows the thread distribution on different streaming multiprocessors (SM) in CUDA. Threads are organized into warps, lanes, and specific thread numbers within each SM. This information is crucial for optimizing CUDA kernels in PyTorch.
Multi-Query & Grouped-Query Attention
June 11, 2024
The text explains how Multi-Query Attention and Grouped-Query Attention reduce the Key-Value Cache size in transformer models while maintaining performance. Multi-Query Attention allows multiple attention heads to share key and value vectors, while Grouped-Query Attention groups these vectors based on a hyperparameter, offering a balance between performance and cache reduction. These techniques help manage memory usage during text generation tasks in transformer models.
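The cache savings are easy to quantify: the KV cache scales with the number of key/value heads, so MQA (1 KV head) and GQA (a few KV heads) shrink it proportionally. A minimal sketch, assuming fp16 (2 bytes per element); the parameter names are illustrative:

```python
def kv_cache_bytes(layers, seq_len, n_kv_heads, head_dim, bytes_per_elem=2):
    # Two cached tensors (K and V) per layer, each of shape
    # [seq_len, n_kv_heads, head_dim].
    return 2 * layers * seq_len * n_kv_heads * head_dim * bytes_per_elem
```

For example, going from 32 KV heads (MHA) to 8 (GQA) cuts the cache by 4x; MQA with a single KV head cuts it by 32x.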
999 crates of Rust on the wall
June 11, 2024
The author compared popular crates on crates.io to their upstream repositories to improve supply chain security. Most top crates matched their repositories, but some had issues like missing VCS info or build failures. Future work includes extending this analysis to all crates on crates.io and improving publishing processes for better security.
Uiuisms
June 9, 2024
This text provides a list of Uiua functions for solving common problems. Contributors can add more functions to the list in the repository. Functions include splitting arrays, removing rows, upscaling matrices, and working with diagonal arrays.
Arithmetic functions
June 9, 2024
BQN's arithmetic functions mirror mathematical notation and apply element-wise to arrays. BQN supports basic arithmetic operations like addition, subtraction, multiplication, division, exponentiation, and root functions. Character arithmetic is a distinctive feature allowing manipulation of characters with symbols like + and -.
An interactive study of queueing strategies
June 9, 2024
This text explores different queueing strategies for handling requests, emphasizing the importance of prioritizing requests effectively to prevent dropping important ones. It introduces concepts like FIFO and priority queues, as well as active queue management techniques to optimize request processing. Understanding these strategies can help in efficiently managing queues and improving overall system performance.
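The FIFO-vs-priority distinction can be sketched with Python's `heapq`: a priority queue serves the most important request first, with a monotonic counter keeping FIFO order among equal priorities (a minimal sketch, not the article's interactive simulations):

```python
import heapq

class PriorityQueue:
    # Lower priority number = served first; ties are broken FIFO
    # via a monotonically increasing counter.
    def __init__(self):
        self._heap = []
        self._count = 0

    def push(self, item, priority):
        heapq.heappush(self._heap, (priority, self._count, item))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]
```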
A DSL for Implementing Math Functions
June 8, 2024
MCC15-04
June 8, 2024
ethereumbook/04keys-addresses.asciidoc at develop · ethereumbook/ethereumbook · GitHub
June 6, 2024
This chapter introduces public key cryptography used in Ethereum for securing ownership of funds through private keys and addresses. Public keys are derived from private keys and are represented as points on an elliptic curve. Ethereum addresses are unique identifiers generated from public keys using the Keccak-256 hash function.
Accidentally Turing-Complete
June 6, 2024
The document "Accidentally Turing-Complete" explores various unexpected systems and technologies that unintentionally exhibit Turing completeness, a property that allows them to perform any computation. Examples include C++ templates, TypeScript, Java generics, X86 mov instructions, Magic: The Gathering card game, HTML5, Minecraft, Dwarf Fortress game, SQL, Apache Rewrite Rules, Pokemon Yellow game, Scala type system, MediaWiki templates, Little Big Planet game, Sendmail, Vim Normal-Mode, Border Gateway Protocol (BGP), Excel, Super Mario World glitches, PowerPoint, Font Shaping, JBIG2 Image Compression, and Stupid RDMA NICs. The document showcases how these diverse systems, from games to internet protocols, can unexpectedly demonstrate the computational power of Turing completeness.
The Art of Computer Programming, Vol. 4 Fascicle 6
June 6, 2024
The_Manga_Guide_to_Linear_Algebra
June 6, 2024
Exploring architectures- Transformers II
June 6, 2024
The text explains how Transformers utilize queries, keys, and values to calculate self-attention weights for tokens. It details the process of obtaining the self-attention weights and generating output tokens through neural networks. The final steps involve calculating loss using cross-entropy and backpropagating to update the weight parameters.
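The query/key/value mechanism described above boils down to a few matrix products: project the tokens, score queries against keys, softmax, and mix the values. A minimal single-head sketch in NumPy (illustrative, not the post's code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project tokens into queries, keys, and values, then mix values
    # by softmax-normalized scaled dot-product scores.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```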
What are Diffusion Models?
June 6, 2024
Diffusion models slowly add noise to data and then learn to reverse the process to create desired samples. Unlike other models, diffusion models have a fixed procedure and high-dimensional latent variables. Training a diffusion model involves approximating conditioned probability distributions and simplifying the objective function.
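The forward (noising) process has a convenient closed form: x_t can be sampled directly from x_0 as √(ᾱ_t)·x_0 + √(1-ᾱ_t)·ε, where ᾱ_t is the cumulative product of (1-β). A minimal NumPy sketch, assuming a linear β schedule:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    # Closed-form forward process:
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    # where alpha_bar_t is the cumulative product of (1 - beta) up to step t.
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
```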
Problems with BQN
June 6, 2024
BQN has issues with incoherent monad-dyad pairs and train structures, making code readability and implementation challenging. Modifications like the Constant modifier ˙ attempt to address these challenges. However, there are still limitations in tacit code construction and array reductions that impact the language's usability.
Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model
June 5, 2024
The paper presents a simple and effective denoising-diffusion model called Iterative α-(de)Blending. It offers a user-friendly alternative to complex theories, making it accessible with basic calculus and probability knowledge. By iteratively blending and deblending samples, the model converges to a deterministic mapping, showing promising results in computer graphics applications.
The borrow checker within
June 5, 2024
The text discusses improvements to Rust's borrow checker to align better with its core design ethos of mutation xor sharing. These changes aim to make Rust code patterns feel more intuitive and work seamlessly with the borrow checker's rules. The proposed enhancements include features like conditional return references, view types, and addressing phased initialization issues.
How should I read type system notation?
June 5, 2024
A type system in programming languages follows rules for expressions and types. Typing rules are written as relationships between expressions and their types for checking and inferring types. Contexts are used to keep track of variable types in type judgments.
Writing a Simple Garbage Collector in C
June 5, 2024
The text explains how to implement a simple garbage collector in C by writing a memory allocator function that manages free and used memory blocks using linked lists. The garbage collection algorithm involves scanning memory regions to mark blocks in use and free those not in use. The collector function collects unused memory blocks, making the heap scanning code simpler and faster.
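The mark-and-sweep idea at the core of the article can be sketched abstractly: mark everything reachable from the roots, then sweep away whatever is unmarked. A toy Python model of the algorithm (the article works directly on C memory blocks; this only shows the reachability logic):

```python
class Obj:
    # A toy heap object holding references to other objects.
    def __init__(self, name):
        self.name, self.refs, self.marked = name, [], False

def mark(obj):
    # Recursively mark everything reachable from this object.
    if obj.marked:
        return
    obj.marked = True
    for ref in obj.refs:
        mark(ref)

def sweep(heap):
    # Keep marked objects, drop the rest, and clear marks for the next cycle.
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live

def collect(heap, roots):
    for r in roots:
        mark(r)
    return sweep(heap)
```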
A decade of developing a programming language
June 5, 2024
The author spent a decade developing the programming language Inko, transitioning from gradual to static typing and using Rust for the compiler. Recommendations include avoiding gradual typing, self-hosting compilers, and focusing on functionality over performance when building a new language. Building a language for long-term use is a time-consuming process that requires prioritizing user needs over technical complexities.
The Rust I Wanted Had No Future
June 5, 2024
The author preferred certain design choices in early Rust over the current state, such as the treatment of certain language features and performance considerations. They express a desire for a simpler, less performance-focused language with different priorities than those commonly held in the Rust community. The author reflects on their preferences for language design and the trade-offs they would have made for a more straightforward and expressive programming experience.
The Garbage Collection Handbook
June 5, 2024
The Garbage Collection Handbook is a comprehensive guide on automatic memory management, covering modern techniques and challenges faced by programmers. This second edition updates the handbook with insights from over 60 years of research and development in the field. It is essential reading for programmers looking to understand and navigate the complexities of garbage collection in modern programming languages.
A high-bias, low-variance introduction to Machine Learning for physicists
June 5, 2024
This text is an introduction to Machine Learning for physicists, highlighting the natural connections between ML and statistical physics. It explains the use of "energy-based models" inspired by statistical physics in deep learning methods. The discussion includes the application of methods from statistical physics to study deep learning and the efficiency of learning rules.
How diffusion models work: the math from scratch
June 1, 2024
Diffusion models generate diverse high-resolution images and are different from previous generative methods. Cascade diffusion models and latent diffusion models are used to scale up models to higher resolutions efficiently. Score-based generative models are similar to diffusion models and involve noise perturbations to generate new samples.
May 2024
34 bookmarks
essentials-of-compilation
May 31, 2024
The text discusses the implementation of compilers for different programming languages, covering topics such as syntax definitions, interpreter extensions, and x86 assembly translation. It emphasizes simplifying the compiler process for readers by using a straightforward language and providing step-by-step guidance on compiler development. Additionally, it introduces new language features like Booleans, conditionals, and tuples, expanding the capabilities of the compilers being built.
PRACTICAL COMPILER CONSTRUCTION
May 31, 2024
"Practical Compiler Construction" is a textbook on writing compilers with annotated source code. The second edition is now available in print with improvements and bug fixes. The book covers compiler construction concepts and advanced techniques for optimizing code.
A Distributed Systems Reading List
May 31, 2024
This reading list covers materials for understanding distributed systems design and challenges. It includes resources on topics like latency, Amazon's organizational culture, Google's cutting-edge technologies, consistency models, theory, languages, tools, infrastructure, storage, Paxos consensus, and gossip protocols. The list aims to help readers adapt their thinking to effectively tackle distributed system complexities.
An Introduction to Assembly Programming with RISC-V
May 28, 2024
An introductory book on RISC-V assembly programming from riscv-programming.org (ISBN 978-65-00-15811-3).
Microsoft PowerPoint - SRAM Architecture
May 28, 2024
The text discusses the architecture of Static Random Access Memory (SRAM) cells, focusing on their read and write operations, sizing considerations, and column circuitry. SRAM cells store data using cross-coupled inverters, with specific steps for reading and writing data. Column circuitry includes bitline conditioning, sense amplifiers, and multiplexing for efficient data access.
MLIR: A Compiler Infrastructure for the End of Moore's Law
May 27, 2024
MLIR is a versatile compiler infrastructure designed to address software fragmentation and improve compilation for different hardware. It aims to reduce the cost of building domain-specific compilers and facilitate the connection of existing compilers. MLIR offers a standardized approach to code generation and optimization across various application domains and hardware targets.
MLIR — Getting Started
May 27, 2024
A getting-started guide to MLIR by Jeremy Kun, published on his blog Math ∩ Programming (jeremykun.com).
Chapter 2 Basics of SIMD Programming
May 27, 2024
The text explains how to organize data for SIMD operations and provides examples of SIMD-Ready Vectors. It also discusses the relationship between vectors and scalars in SIMD programming. Built-in functions for VMX instructions and SIMD operation principles are outlined in the text.
Matrix multiplication in Mojo
May 27, 2024
A walkthrough of implementing and optimizing matrix multiplication in Mojo, from Modular's documentation at docs.modular.com.
Matrix Multiplication on CPU
May 27, 2024
A post by Marek Kolodziej (marek.ai) on optimizing matrix multiplication on CPUs.
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
May 27, 2024
The text is a worklog by Simon Boehm about optimizing a CUDA Matmul Kernel for cuBLAS-like performance. It can be found on the domain siboehm.com.
The Annotated Transformer
May 27, 2024
The text discusses the architecture and training of a Transformer model. It explains the use of self-attention and feed-forward networks in the encoder and decoder. The model is demonstrated through examples of prediction and visualization of attention mechanisms.
Anonymity and the internet
May 27, 2024
Anonymity on the internet is fragile, with each piece of information reducing anonymity. Revealing multiple bits of personal information can jeopardize anonymity, but deliberate disinformation can help regain some anonymity. To protect anonymity, it's best to minimize information disclosure.
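The "each piece of information reduces anonymity" argument is quantitative: identifying one person among N requires about log₂(N) bits, and every revealed bit halves the remaining anonymity set. A back-of-the-envelope sketch in Python (the population figure is an assumption for illustration):

```python
import math

def bits_to_identify(population):
    # log2(N) bits uniquely identify one individual among N.
    return math.log2(population)

def remaining_anonymity_set(population, bits_revealed):
    # Each revealed bit of information halves the candidate set.
    return population / (2 ** bits_revealed)
```

With a world population of roughly 8 billion, about 33 bits suffice to single out one person.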
Auto-Regressive Next-Token Predictors are Universal Learners
May 26, 2024
Simple linear next-token predictors can efficiently approximate any function computable by a Turing machine. Even basic models like linear networks and shallow Multi-Layer Perceptrons show strong performance on tasks like text generation and arithmetic. By leveraging auto-regressive learning, these models can achieve impressive results in solving complex tasks.
Where Vim Came From
May 25, 2024
Vim is a popular text editor with a long history tracing back to the Unix epoch. Its development started in 1988 and evolved from the "wq text editor" concept. Vim's success is attributed to its features and the gradual accumulation of good ideas over time.
Building and operating a pretty big storage system called S3
May 25, 2024
Dr. Werner Vogels shares insights from working on Amazon's S3 storage system, highlighting the scale and unique challenges faced. S3's design incorporates innovative strategies to efficiently handle vast amounts of data across millions of hard drives while prioritizing customer experience. Vogels emphasizes the need for a broader perspective on software systems and the rewarding journey of scaling as an engineer at Amazon.
Unnamed Document
May 25, 2024
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
May 25, 2024
Meta's LLaMA family has become one of the most powerful open-source Large
Language Model (LLM) series. Notably, LLaMA3 models have recently been released
and achieve impressive performance across various with super-large scale
pre-training on over 15T tokens of data. Given the wide application of low-bit
quantization for LLMs in resource-limited scenarios, we explore LLaMA3's
capabilities when quantized to low bit-width. This exploration holds the
potential to unveil new insights and challenges for low-bit quantization of
LLaMA3 and other forthcoming LLMs, especially in addressing performance
degradation problems that suffer in LLM compression. Specifically, we evaluate
the 10 existing post-training quantization and LoRA-finetuning methods of
LLaMA3 on 1-8 bits and diverse datasets to comprehensively reveal LLaMA3's
low-bit quantization performance. Our experiment results indicate that LLaMA3
still suffers non-negligent degradation in these scenarios, especially in
ultra-low bit-width. This highlights the signif...
LADW_2017-09-04
May 25, 2024
This text discusses properties of vector spaces and matrices, particularly focusing on bases and eigenvalues. It establishes that any linearly independent system of vectors can be completed to form a basis in a finite-dimensional vector space. Additionally, it explains that operators in inner product spaces have an upper triangular matrix representation under certain conditions.
New Scaling Laws for Large Language Models
May 25, 2024
DeepMind's new paper challenges existing scaling laws for training large language models, proposing more optimal use of compute resources. By training a smaller 70-billion parameter model using their new scaling laws, DeepMind demonstrated superior performance compared to larger models like GPT-3 and their own 270-billion parameter model. This discovery may lead to more cost-effective and efficient training of large language models in the future.
Binary Magic: Building BitNet 1.58bit Using PyTorch from Scratch
May 25, 2024
The document discusses the creation of a 1.58bit model called BitNet using PyTorch from scratch, which can rival full precision LLMs. Quantization, the process of representing float numbers with fewer bits, is explained as a method to increase the speed and reduce the RAM consumption of ML models, albeit with some loss of accuracy. BitNet differs from existing quantization approaches as it trains the model from scratch with quantization, offering a unique quantization algorithm and implementation in PyTorch. Results from experiments with custom PyTorch implementations show that the 2bit and 1bit variants of models perform as well as full precision models, demonstrating the potential of this approach.
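The "1.58 bit" name comes from restricting weights to the ternary set {-1, 0, +1} (log₂ 3 ≈ 1.58 bits). One common way to ternarize a weight matrix is absmean scaling; a NumPy sketch of that idea (an illustration of ternary quantization in general, not the document's exact PyTorch implementation):

```python
import numpy as np

def absmean_ternary(W, eps=1e-8):
    # Scale by the mean absolute weight, then round each entry to the
    # nearest value in {-1, 0, +1}. Dequantize as Wq * gamma.
    gamma = np.abs(W).mean() + eps
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return Wq, gamma
```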
king - man + woman is queen; but why?
May 25, 2024
The text explains how the word2vec algorithm transforms words into vectors for analyzing similarities and relationships between words. By using vector arithmetic, it can find analogies such as "king - man + woman = queen." Understanding word co-occurrences can provide insight into the meaning of words through the distributional hypothesis.
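The analogy trick is plain vector arithmetic plus a nearest-neighbor search by cosine similarity. A toy sketch with a hand-made two-dimensional vocabulary (the axes, roughly "royalty" and "gender", are invented for illustration; real word2vec vectors are learned):

```python
import numpy as np

def most_similar(vec, vocab, exclude=()):
    # Rank the vocabulary by cosine similarity to the query vector,
    # skipping the words that formed the query.
    best, best_sim = None, -2.0
    for word, v in vocab.items():
        if word in exclude:
            continue
        sim = vec @ v / (np.linalg.norm(vec) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```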
1-bit Model
May 25, 2024
Quantizing small models like Llama2-7B at 1-bit yields poor performance but fine-tuning with low-rank adapters significantly improves output quality. The HQQ+ approach shows potential in extreme low-bit quantization for machine learning models, reducing memory and computational requirements while maintaining performance. Training larger models with extreme quantization can lead to superior performance compared to training smaller models from scratch.
Human Knowledge Compression Contest
May 25, 2024
The Human Knowledge Compression Contest measures intelligence through data compression ratios. Better compression leads to better prediction and understanding, showcasing a link between compression and artificial intelligence. The contest aims to raise awareness of the relationship between compression and intelligence, encouraging the development of improved compressors.
Heatmaps and CNNs Using Fast.ai
May 25, 2024
The text discusses heatmaps, CNNs, and their relationship in deep learning. It explains how heatmaps are generated using Grad-CAM heatmaps from the final layer of a Convolutional Neural Network. The article also touches on creating heatmaps using Adaptive Pooling layers and interpreting top losses for model evaluation.
Where do LLMs spend their FLOPS?
May 19, 2024
LLMs (large language models) spend their FLOPS (floating point operations) on various tasks, including computing QKV (query, key, value) matrices, attention output matrices, and running the feed-forward network (FFN). The attention mechanism plays a crucial role in LLMs, even though the FLOPS required for attention calculations are relatively small. The KV cache, which stores information for each token, requires significant memory but is necessary for generating sequences. Different architectural choices, such as grouped query attention and sliding window attention, can affect the size and efficiency of the KV cache. Increasing the number of layers in an LLM linearly scales the FLOPS and parameters, while increasing the model width quadratically scales the model size. Wider models parallelize better, while deeper models increase inference time linearly.
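The per-token FLOP split between attention projections and the FFN can be estimated with the usual "2 FLOPs per parameter" rule. A rough sketch, assuming the standard 4x FFN expansion and ignoring the sequence-length-dependent attention-score term (which the post notes is comparatively small):

```python
def flops_per_token(d_model, n_layers, ffn_mult=4):
    # Rough forward-pass estimate: 2 FLOPs per parameter per token.
    # Per layer: Q, K, V, and output projections hold ~4*d^2 parameters;
    # the FFN up- and down-projections hold ~2*ffn_mult*d^2.
    attn = 2 * (4 * d_model * d_model) * n_layers
    ffn = 2 * (2 * ffn_mult * d_model * d_model) * n_layers
    return attn, ffn
```

Under these assumptions the FFN accounts for twice the FLOPs of the attention projections, matching the observation that most compute lives outside the attention mechanism itself.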
The Annotated Diffusion Model
May 19, 2024
A neural network learns to denoise data by gradually removing noise. The process involves adding noise to an image and then training the network to reverse the denoising. The network predicts noise levels based on corrupted images at different time steps.
Defusing Diffusion Models
May 19, 2024
This post explains the concepts of forward and reverse diffusion processes in diffusion models. By understanding these processes, readers can train diffusion models to generate samples from target distributions effectively. Guided diffusion models are also discussed, showing how conditioning information can be used to guide the diffusion process for specific outcomes.
The Illustrated Stable Diffusion
May 19, 2024
AI image generation with Stable Diffusion involves an image information creator and an image decoder. Diffusion models use noise and powerful computer vision models to generate aesthetically pleasing images. Text can be incorporated to control the type of image the model generates in the diffusion process.
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation
May 10, 2024
Mamba-UNet is a new architecture combining U-Net with Mamba technology for better medical image segmentation performance. It addresses limitations in modeling long-range dependencies within medical images. Results show that Mamba-UNet outperforms other UNet variations in medical image segmentation tasks.
Sparse Autoencoders Find Highly Interpretable Features in Language Models
May 1, 2024
Sparse autoencoders help identify clear and understandable features in language models by tackling the issue of polysemanticity. By using sparse autoencoders, researchers can pinpoint specific features responsible for certain behaviors in neural networks more effectively than other methods. This approach may lead to increased transparency and control over language models in the future.
KAN: Kolmogorov–Arnold Networks
May 1, 2024
Kolmogorov-Arnold Networks (KANs) place learnable activation functions on edges rather than fixed activations on nodes, outperforming Multilayer Perceptrons (MLPs) in accuracy and interpretability. By combining the accuracy of splines with the compositional structure of MLPs, KANs exhibit faster neural scaling laws, represent functions effectively, and display more favorable scaling curves than MLPs, especially in high-dimensional examples.
KAN: Kolmogorov-Arnold Networks
May 1, 2024
KANs outperform MLPs in accuracy and interpretability by using learnable activation functions on edges. They have faster neural scaling laws and can represent special functions more efficiently. KANs offer a promising alternative to MLPs in various applications, showcasing improved performance and interpretability.
April 2024
9 bookmarksStructure and Interpretation of Computer Programs, 2nd ed.
April 30, 2024
The text discusses key concepts in programming, such as primitive expressions, means of combination, and means of abstraction. It highlights the role of the environment in determining the meaning of symbols in expressions. The evaluation process involves reducing expressions to procedures applied to arguments, leading to a deeper understanding of programming concepts.
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
April 25, 2024
The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens.
Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX libra...
An Infinitely Large Napkin
April 23, 2024
The text is "An Infinitely Large Napkin" by Evan Chen, an expository survey of higher mathematics written for students.
IEEE Xplore Full-Text PDF:
April 10, 2024
Root Mean Square Layer Normalization
April 9, 2024
The text discusses a technique called Root Mean Square Layer Normalization proposed by Biao Zhang and Rico Sennrich. This technique is likely related to a method for normalizing data in neural networks. The authors' work can be found on arxiv.org.
Root Mean Square Layer Normalization
April 9, 2024
Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse...
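The RMSNorm operation itself is a one-liner; a dependency-free sketch (the toy vector and unit gain are assumptions):

```python
import math

def rms_norm(x, gain, eps=1e-8):
    """RMSNorm: rescale by the root mean square; unlike LayerNorm, no re-centering."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

x = [2.0, -2.0, 2.0, -2.0]       # RMS of this vector is 2
y = rms_norm(x, gain=[1.0] * 4)  # each entry rescaled to roughly +/-1
```

Dropping the mean subtraction is what makes RMSNorm cheaper than LayerNorm while keeping the re-scaling invariance the paper cares about.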
Terry A. Davis
April 7, 2024
Terry A. Davis, an American electrical engineer and programmer, created TempleOS, a public domain operating system. Despite his mental health challenges, Davis gained an online following for his unique work and beliefs. His legacy continues to be remembered through documentaries and online discussions.
Pattern Recognition and Machine Learning
April 6, 2024
The content discusses likelihood functions for Gaussian distributions, maximizing parameters using observed data, Bayesian model comparison, mixture density networks, and EM algorithm for Gaussian mixtures. It covers topics like posterior distributions, predictive distributions, graphical models, and variational inference. The material emphasizes probability distributions, optimization, and model comparison.
Ludwig Wittgenstein: The Duty of Genius
April 6, 2024
The text discusses the complex relationship between Ludwig Wittgenstein and his peers, particularly Bertrand Russell. Wittgenstein's philosophical ideas and personal struggles are highlighted, showing the challenges he faced in expressing his thoughts and finding understanding from others. Despite his brilliance, Wittgenstein's life was marked by loneliness and inner turmoil, making it difficult for him to fully convey his philosophical insights.
March 2024
16 bookmarksGenerative Agents: Interactive Simulacra of Human Behavior
March 28, 2024
The content discusses generative agents that simulate believable human behavior for interactive applications. These agents populate a sandbox environment, interact with each other, plan their days, form relationships, and exhibit emergent social behaviors. The paper introduces a novel architecture that allows agents to remember, retrieve, reflect, and interact dynamically.
Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks
March 11, 2024
The text, by Vladimír Kunc and Jiří Kléma, is a comprehensive survey of 400 activation functions for neural networks, with extensive references (URLs and DOIs) for further reading.
Revisiting Deep Learning as a Non-Equilibrium Process
March 8, 2024
The document discusses the nature of Deep Learning systems, highlighting differences from traditional machine learning systems and challenging common misconceptions. It emphasizes the complexity and non-convexity of Deep Learning, noting that optimization techniques alone cannot explain its success. The text critiques the field for lacking in-depth exploration of the true nature of Deep Learning, pointing out a tendency towards superficial explanations and reliance on celebrity figures rather than rigorous scientific inquiry. It delves into the use of Bayesian techniques, the role of noise, and the importance of architecture in Deep Learning, arguing for a deeper understanding of the underlying processes and the need for more precise language and theoretical exploration.
Dissipative Adaptation: The Origins of Life and Deep Learning
March 8, 2024
The document explores the concept of Dissipative Adaptation, drawing parallels between the emergence of life and the mechanisms of Deep Learning. It discusses the work of Jeremy England and his theory of non-equilibrium statistical mechanics known as Dissipative Adaptation, which explains the self-organizing behavior of Deep Learning. The text delves into how neural networks evolve through training, emphasizing the role of external observations in driving the system towards minimizing entropy. It contrasts the mechanisms of Dissipative Adaptation with current Deep Learning architectures, highlighting similarities in alignment of components to maximize energy dissipation or information gradient.
A Gentle Introduction to LLVM IR
March 6, 2024
Learning LLVM IR can be beneficial for generalist working programmers to understand what their compiler is doing to create highly optimized code. LLVM IR is well-documented and can be treated as a slightly weird programming language. It is strongly typed and requires explicit type annotations. LLVM IR is a static single assignment form (SSA) IR and has properties that make optimizations simpler to write. It supports control flow operations, arithmetic instructions for different types, and memory operations. There are also LLVM intrinsics available for specific functions. However, some parts of LLVM's semantics, such as undefined behavior and pointer provenance, can be challenging to navigate.
Re: [Fis] A PROPOSAL ABOUT THE DEFINITION OF INFORMATION
March 6, 2024
The email exchange discusses the concept of negative entropy and its implications in mathematics and thermodynamics. Sungchul Ji questions the validity of negative entropy based on the Third Law of Thermodynamics. Arturo Tozzi argues for the existence of negative entropy in certain cases and relates it to information theory and free energy.
The Art of Embeddings: Transforming Text for Vector Databases (Part 2)
March 6, 2024
Embeddings are a crucial component of transforming text into vectors in vector databases. They capture rich context and make data more useful by capturing meaning and context in a machine-readable format. Tokenization is the first step in the embedding process, where text is broken down into smaller parts or tokens. Word2Vec is a popular method that creates dense vector representations of word features based on context. However, it has limitations such as struggling with polysemy and out-of-vocabulary words. Sub-word tokenization is a hybrid approach that can handle these limitations by decomposing words into meaningful sub-words. Transformer models, such as BERT, are used to transform tokenized words into embeddings by leveraging self-attention mechanisms and positional encodings. The choice of tokenization method can significantly affect the size and effectiveness of the embeddings, including vocabulary size, handling of out-of-vocabulary words, and overall quality and usefulness of the embeddings. Choosing th...
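The sub-word idea can be illustrated with the classic byte-pair-encoding merge loop from the subword-tokenization literature; the toy corpus and merge count are assumptions, and '</w>' marks word boundaries:

```python
import re
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs across a {word-as-symbols: frequency} vocab."""
    counts = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def merge_pair(pair, vocab):
    """Replace every whitespace-delimited occurrence of the pair with its concatenation."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(3):  # learn the 3 most frequent merges: e+s, es+t, est+</w>
    counts = pair_counts(vocab)
    vocab = merge_pair(max(counts, key=counts.get), vocab)
# 'est</w>' is now a single sub-word shared by 'newest' and 'widest'
```

This is how rare or out-of-vocabulary words decompose into meaningful, reusable sub-word units instead of a single unknown token.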
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
March 6, 2024
The text discusses a method called Parameter-Efficient Sparsity Crafting (PESC) that enhances sparse models for natural language processing tasks. PESC involves integrating adapters into sparse models, improving performance without changing individual weights. The approach outperforms other sparse models and even competes with GPT-3.5 in various tasks.
The Little Book of Deep Learning
March 6, 2024
Information
March 6, 2024
The text discusses the challenges and complexities of measuring and quantifying information, particularly in terms of storage capacity, compression, and entropy. It explores various examples, such as genome information, human sensory capabilities, and the information content of objects like water molecules and black holes. The relationship between information, entropy, and physical properties is also highlighted.
Sequence to Sequence Learning with Neural Networks
March 6, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
March 6, 2024
The article introduces a new era of 1-bit Large Language Models (LLMs) that can significantly reduce the cost of LLMs while maintaining their performance. BitNet b1.58 is a 1.58-bit LLM variant in which every parameter is ternary, taking on values of {-1, 0, 1}. It retains all the benefits of the original 1-bit BitNet, including its new computation paradigm, which requires almost no multiplication operations for matrix multiplication and can be highly optimized. Moreover, BitNet b1.58 offers two additional advantages: its modeling capability is stronger due to its explicit support for feature filtering, and it can match full precision (i.e., FP16) baselines in terms of both perplexity and end-task performance at a 3B size.
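The ternary quantization the paper describes (absmean scaling followed by round-and-clip) can be sketched as follows; the toy weights are illustrative:

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize weights to {-1, 0, 1}: scale by the mean absolute value
    (gamma), then round and clip, as in the absmean scheme for BitNet b1.58."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return quantized, gamma

w = [0.9, -0.04, 0.5, -1.2, 0.02]
q, gamma = absmean_ternary(w)  # small weights snap to 0, large ones to +/-1
```

Because every quantized weight is -1, 0, or 1, matrix multiplication reduces to additions and subtractions, which is where the claimed cost savings come from.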
How to round to 2 decimals with Python? [duplicate]
March 5, 2024
To round a number to 2 decimals in Python, the usual method is using round(value, significantDigit), but it can behave unexpectedly when the digit before the one being rounded is a 5. To address this, a workaround involves adding a small value to ensure proper rounding. This method allows for traditional rounding commonly used in statistics without needing to import additional libraries like Decimal. By incorporating this workaround into a function, you can achieve the desired rounding results without encountering the issue with numbers ending in 5.
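An alternative to the additive workaround uses only the standard library's decimal module, which supports explicit half-up rounding (the helper's name is ours):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=2):
    """Traditional 'round half up', unlike built-in round(), which
    rounds halves to the nearest even digit (banker's rounding)."""
    exp = Decimal(10) ** -places
    return float(Decimal(str(value)).quantize(exp, rounding=ROUND_HALF_UP))

print(round(2.675, 2))       # 2.67 -- banker's rounding plus float representation
print(round_half_up(2.675))  # 2.68 -- the statistically conventional result
```

Converting through `str(value)` avoids feeding the binary float's full representation (2.67499999...) into the Decimal.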
Rounding floats with f-string [duplicate]
March 5, 2024
Using %-formatting, I can specify the number of decimal cases in a string:
x = 3.14159265
print('pi = %0.2f' %x)
This would give me:
pi = 3.14
Is there any way of doing this using f-strings in ...
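For completeness, f-strings accept the same format specification mini-language as %-formatting, so the answer is a direct translation:

```python
x = 3.14159265
print(f'pi = {x:.2f}')  # pi = 3.14
print(f'pi = {x:.4f}')  # pi = 3.1416
```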
Latent Interfaces
March 3, 2024
In a career shift, the author is launching Latent Interfaces to apply expertise in design, prototyping, and development to complex data challenges. They share insights into a genomic data project, emphasizing the importance of Python skills alongside JavaScript. The document showcases the creation of intuitive data interfaces and the design process involving both digital and physical tools. Additionally, the author discusses the significance of well-designed APIs like StabilityAI and the potential for future collaborations in data visualization projects.
Hypercomputation
March 3, 2024
Hypercomputation and super-Turing computation involve models of computation that can produce non-Turing-computable outputs. Introduced in the early 1990s, super-Turing computing is inspired by neurological and biological systems and serves as the foundation for Lifelong Machine Learning. Hypercomputation, a field introduced in the late 1990s, includes philosophical constructs and aims to compute functions beyond what a Turing machine can. The Church-Turing thesis states that any "computable" function can be computed by a Turing machine, but hypercomputers can compute functions that are not computable in the Church-Turing sense. Various hypercomputer models exist, ranging from theoretical concepts like oracle machines to more plausible models like quantum computing. Some proposals suggest that hypercomputation may be achievable through systems like neural networks or analog computers. Critics argue that hypercomputation is not physically realizable.
February 2024
24 bookmarksThe Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
February 28, 2024
Recent research is leading to a new era of 1-bit Large Language Models (LLMs), such as BitNet, introducing a variant called BitNet b1.58 where every parameter is ternary {-1, 0, 1}. This model matches the performance of full-precision Transformer LLMs while being more cost-effective in terms of latency, memory, throughput, and energy consumption. The 1.58-bit LLM sets a new standard for training high-performance and cost-effective models, paving the way for new computation methods and specialized hardware designed for 1-bit LLMs.
How Netflix Really Uses Java
February 27, 2024
The discussion at Netflix delves into how Java is utilized within the company's architecture, highlighting their transition to Java 17 and ongoing testing with Java 21. The move to newer Java versions resulted in significant performance improvements, such as 20% better CPU usage with Java 17. Additionally, the implementation of GraphQL Federation and virtual threads in Java 21 are key advancements that are expected to impact the way code is written and scaled within Netflix's Java stack. The company's shift from Java 8 to Java 17 and the ongoing evolution of their technology frameworks and tooling, particularly focusing on Spring Boot, demonstrate their commitment to staying current with Java developments.
Scheduling Internals
February 27, 2024
The document delves into the concept of concurrency in programming, exploring how tasks can be handled concurrently using different methods like threads, async I/O, event loops, and schedulers. It discusses the challenges and benefits of each approach, illustrating examples in C code to demonstrate the practical implementations. The text covers topics like preemptive and non-preemptive schedulers, implementation details in languages like Go and Rust, as well as the use of event loops for efficient task handling. It also touches on the importance of understanding program state management and the impact on task execution in concurrent programming.
Glossary of Deep Learning: Word Embedding
February 26, 2024
Word embedding is a method that transforms text into numerical vectors for machine learning algorithms to process efficiently. These vectors are created to represent words or phrases as real numbers, focusing on dimensionality reduction and contextual similarity. Word2Vec is a popular algorithm that implements this approach using techniques like CBOW and Skip-gram to predict target words based on their context. While word embeddings are not deep learning themselves, they provide a way for deep nets to interpret and understand natural language, offering a new understanding of language as numbers.
gemini_v1_5_report
February 18, 2024
Gemini 1.5 Pro is a highly compute-efficient multimodal model that can recall and reason over millions of tokens of context, including long documents, videos, and audio. It achieves near-perfect recall on long-context retrieval tasks and outperforms the state-of-the-art in long-document QA, long-video QA, and long-context ASR. Gemini 1.5 Pro also showcases surprising new capabilities, such as learning to translate a new language from a grammar manual. The model surpasses the previous Gemini 1.0 Pro and performs at a similar level to 1.0 Ultra on a wide range of benchmarks while requiring less compute to train.
How to Use t-SNE Effectively
February 16, 2024
t-SNE plots can be useful for visualizing high-dimensional data, but they can also be misleading if not interpreted correctly. The technique creates 2D "maps" of data with many dimensions, but these images can be misread. The perplexity parameter, which balances attention between local and global aspects of the data, has a significant impact on the resulting plots. Different perplexity values may be needed to capture different aspects of the data. t-SNE plots can equalize cluster sizes and distort distances between clusters, making it difficult to interpret relative sizes and distances. It's important to recognize random noise and avoid misinterpreting it as meaningful patterns. t-SNE plots can show some shapes accurately, but local effects and clumping can also affect the interpretation. For topological information, multiple plots at different perplexities may be required. Overall, using t-SNE effectively requires understanding its behavior and limitations.
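Perplexity is the exponentiated Shannon entropy of a point's neighbor distribution, i.e. the effective number of neighbors it attends to. A minimal sketch with assumed toy distributions:

```python
import math

def perplexity(p):
    """Perplexity 2^H(p): the effective number of neighbors a point attends to."""
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return 2 ** entropy

uniform = [0.25] * 4               # attention spread evenly over 4 neighbors
peaked = [0.97, 0.01, 0.01, 0.01]  # attention concentrated on one neighbor
print(perplexity(uniform))  # 4.0 -- all four neighbors count
print(perplexity(peaked))   # ~1.2 -- effectively a single neighbor
```

t-SNE tunes each point's Gaussian bandwidth so its neighbor distribution hits the user-chosen perplexity, which is why the parameter shifts the balance between local and global structure.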
Temperature as Joules per Bit
February 15, 2024
The text discusses the concept of temperature and entropy in terms of information theory, suggesting that entropy should be measured in bits rather than joules per kelvin. It highlights the importance of information in thermodynamics and how Landauer's principle relates to the cost of erasing information. The authors advocate for viewing energy and entropy as more fundamental than temperature, emphasizing the duality between energy and information.
Consciousness, Cognition and the Neuronal Cytoskeleton – A New Paradigm Needed in Neuroscience
February 14, 2024
Viewing the brain as a complex computer of simple neurons is insufficient to explain consciousness and cognition. A new paradigm is needed that considers the brain as a scale-invariant hierarchy, with quantum and classical processes occurring in cytoskeletal microtubules inside neurons. Evidence shows that microtubules regulate specific firings of axonal branches and modulate membrane and synaptic activities. This new paradigm suggests that information processing for cognitive and conscious brain functions occurs in microtubules and involves both top-down and bottom-up regulation within the brain hierarchy. The precise mechanisms of consciousness may be most likely to reveal themselves in Layer V cortical pyramidal neurons, which have a large collection of mixed polarity microtubule networks.
OpenMEA: Open-Source Microelectrode Array Platform for Bioelectronic Interfacing
February 14, 2024
OpenMEA is an open-source platform for closed-loop bioelectronics research that aims to revolutionize the treatment of medical disorders and augment physiology. It includes designs for components such as electrophysiological recording and stimulation electronics, a microfluidic perfusion system, and physical designs for multielectrode arrays. The platform enables researchers to conduct in vitro experiments and understand the long-term effects of electrical stimulation and drug interactions. OpenMEA offers high-performance processing capabilities and supports simultaneous recording and stimulation, as well as the real-time adaptation of neuromodulation waveforms. It fills the gaps in existing solutions and provides a versatile tool for bioelectronic research.
Landauer's principle
February 14, 2024
Landauer's principle is a physical principle that establishes the minimum energy consumption of computation. It states that irreversible changes in information stored in a computer dissipate a minimum amount of heat to the surroundings. The principle was proposed by Rolf Landauer in 1961 and states that the minimum energy needed to erase one bit of information is proportional to the temperature at which the system is operating. While the principle is widely accepted, it has faced challenges in recent years. However, it has been shown that Landauer's principle can be derived from the second law of thermodynamics and the entropy change associated with information gain.
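The bound is easy to evaluate; a sketch using the exact SI value of Boltzmann's constant:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K (exact by the 2019 SI definition)

def landauer_energy(temperature_kelvin):
    """Minimum heat dissipated to erase one bit: E = k_B * T * ln 2."""
    return k_B * temperature_kelvin * math.log(2)

room = landauer_energy(300.0)  # ~2.9e-21 J per bit near room temperature
```

The proportionality to temperature is the point of the principle: colder systems can, in principle, erase information more cheaply.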
Bremermann's limit
February 14, 2024
Bremermann's limit is a maximum rate of computation that can be achieved in a self-contained system in the material universe. It is based on Einstein's mass-energy equivalency and the Heisenberg uncertainty principle. This limit has implications for designing cryptographic algorithms, as it can determine the minimum size of encryption keys needed to create an uncrackable algorithm. The limit has also been analyzed in relation to the maximum rate at which a system with energy spread can evolve into an orthogonal state.
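The limit works out to m·c²/h bits per second for a self-contained system of mass m; evaluating it for one kilogram:

```python
c = 2.99792458e8    # speed of light in m/s (exact)
h = 6.62607015e-34  # Planck constant in J*s (exact by the 2019 SI definition)

def bremermann_limit(mass_kg):
    """Maximum computation rate in bits per second: rate = m * c**2 / h."""
    return mass_kg * c ** 2 / h

one_kg = bremermann_limit(1.0)  # ~1.36e50 bits per second per kilogram
```

Even a brute-force search at this rate cannot exhaust a sufficiently large key space within the age of the universe, which is the cryptographic argument the summary alludes to.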
Bekenstein bound
February 14, 2024
The Bekenstein bound is an upper limit on the entropy or information that can be contained within a given finite region of space with a finite amount of energy. It implies that the information of a physical system must be finite if the region of space and energy are finite. The bound was derived from arguments involving black holes and has implications for thermodynamics and general relativity. It can be proven in the framework of quantum field theory and has applications in various fields, such as black hole thermodynamics and the study of human brains.
numerical_recipes
February 14, 2024
The content provided is the table of contents for a book titled "Numerical Recipes: The Art of Scientific Computing, Third Edition." It includes various topics such as linear algebra, interpolation and extrapolation, integration of functions, evaluation of functions, special functions, random numbers, sorting and selection, root finding and nonlinear sets of equations, minimization or maximization of functions, eigensystems, and more.
Temperature as Joules per Bit
February 14, 2024
The paper suggests that temperature should be defined in terms of entropy, rather than vice versa. It argues that the current practice of measuring entropy in joules per kelvin is a historical artifact and proposes measuring entropy in bits instead. The paper also discusses the role of information in thermodynamics and the thermodynamic cost of erasure. It concludes by suggesting that entropy, not temperature, should have its own unit and that Boltzmann's constant should be dissolved.
Deep Learning Course
February 10, 2024
This document provides resources for François Fleuret's deep-learning course at the University of Geneva. The course offers a thorough introduction to deep learning, with examples using the PyTorch framework. The materials include slides, recordings, and a virtual machine. The course covers topics such as machine learning objectives, tensor operations, automatic differentiation, gradient descent, and deep-learning techniques. The document also includes prerequisites for the course, such as knowledge of linear algebra, differential calculus, Python programming, and probability and statistics.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
February 9, 2024
Diffusion Models and Associative Memories show surprising similarities in their mathematical underpinnings and goals, bridging traditional and modern AI research. This connection highlights the convergence of AI models towards memory-focused paradigms, emphasizing the importance of understanding Associative Memories in the field of computation. By exploring these parallels, researchers aim to enhance our comprehension of how models like Diffusion Models and Transformers operate in Deep Learning applications.
2309.10668
February 8, 2024
This article discusses the relationship between language modeling and compression. The authors argue that large language models can be viewed as powerful compressors due to their impressive predictive capabilities. They demonstrate that these models can achieve state-of-the-art compression rates across different data modalities, such as images and audio. The authors also explore the connection between compression and prediction, showing that models that compress well also generalize well. They conclude by advocating for the use of compression as a framework for studying and evaluating language models.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
February 8, 2024
Diffusion Models (DMs) have become increasingly popular in generating benchmarks, but their mathematical descriptions can be complex. In this survey, the authors provide an overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), revealing a mathematical connection to Associative Memories (AMs). AMs are energy-based models that share similarities with denoising DMs, but they allow for the computation of a Lyapunov energy function and gradient descent to denoise data. The authors also summarize the 40-year history of energy-based AMs, starting with the Hopfield Network, and discuss future research directions for both AMs and DMs.
tns
February 8, 2024
This document, entitled "tns", explores the concept of network states and their potential to replace traditional nation states. The author argues that a network state is a social network with a moral innovation, a sense of national consciousness, and a recognized founder, among other features. The document also delves into the history of political power and technological truth, and how the network state is the next Leviathan. The author provides examples of positive and negative syntheses of the network and state and discusses the potential for startup societies and network states to maintain liberal values in an illiberal world.
A New Physics Theory of Life | Quanta Magazine
February 7, 2024
According to physicist Jeremy England, the origin and evolution of life can be explained by the fundamental laws of nature. He proposes that living things are better at capturing and dissipating energy from their environment compared to inanimate objects. England has derived a mathematical formula based on established physics that explains this capacity. His theory, which underlies Darwin's theory of evolution, has sparked controversy among his colleagues. While some see it as a potential breakthrough, others find it speculative. England's idea is based on the second law of thermodynamics and the process of dissipating energy. He argues that self-replication and structural organization are mechanisms by which systems increase their ability to dissipate energy. His theory may have implications for understanding the formation of patterned structures in nature.
K-Level Reasoning with Large Language Models
February 7, 2024
Large Language Models (LLMs) have shown proficiency in complex reasoning tasks, but their performance in dynamic and competitive scenarios remains unexplored. To address this, researchers have introduced two game theory-based challenges that mirror real-world decision-making. Existing reasoning methods tend to struggle in dynamic settings that require k-level thinking, so the researchers propose a novel approach called "K-Level Reasoning" that improves prediction accuracy and informs strategic decision-making. This research sets a benchmark for dynamic reasoning assessment and enhances the proficiency of LLMs in dynamic contexts.
Competitive Programmer's Handbook
February 5, 2024
The article discusses various algorithms and data structures used in computer programming, such as Kadane's algorithm, binary indexed trees, segment trees, Dijkstra's algorithm, and Floyd's algorithm. The author also explains concepts like successor graphs, index compression, and minimum spanning trees. The time complexity of each algorithm is also discussed.
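As one example of the algorithms listed, Kadane's algorithm finds the maximum subarray sum in a single pass; the input array below is illustrative:

```python
def kadane(values):
    """Kadane's algorithm: maximum subarray sum in O(n) time and O(1) space."""
    best = current = values[0]
    for v in values[1:]:
        current = max(v, current + v)  # either extend the current run or restart at v
        best = max(best, current)
    return best

print(kadane([-1, 2, 4, -3, 5, 2, -5, 2]))  # 10 (the run 2, 4, -3, 5, 2)
```

Initializing with the first element (rather than 0) keeps the algorithm correct for all-negative arrays.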
Writing an OS in Rust
February 3, 2024
This blog series provides tutorials on creating a small operating system in the Rust programming language. Each post includes all the necessary code and is accompanied by a corresponding GitHub repository. The series covers topics such as creating a Rust executable without linking the standard library, building a bootable disk image, implementing VGA text mode, performing unit and integration testing, handling CPU exceptions, setting up the interrupt descriptor table, implementing paging and heap allocation, and exploring cooperative multitasking and the async/await feature of Rust. The posts also include status updates and information on supporting the author.
Ever wanted to make your own programming language or wondered how they are designed and built?
February 3, 2024
Crafting Interpreters is a book that provides everything you need to create your own programming language. It covers both high-level concepts like parsing and semantics, as well as technical details such as bytecode representation and garbage collection. The book guides you through building a language from scratch, including features like dynamic typing, lexical scope, functions, classes, and inheritance. It is available in multiple formats, including print, ebook, and online for free. The author, Robert Nystrom, is an experienced language developer who currently works at Google on the Dart language.
January 2024
61 bookmarks
GitHub - sst/demo-ai-app: Sample AI movies app built with ❍ Ion
January 31, 2024
This document provides an overview of the sst/demo-ai-app, a sample movies app built with Ion that demonstrates how to use AI in your apps using your own data. The app includes features such as tagging, related movies, and deep search using natural language. It utilizes the Vector component, which is based on Amazon Bedrock and allows for easy AI integration with your data. The document also highlights the advantages of Ion, including faster deployment and no stack limits. The app works by ingesting movie data from IMDB, generating embeddings, and storing them in a Vector database, which the Next.js app then retrieves.
Thermodynamic Computing
January 31, 2024
Measuring Faithfulness in Chain-of-Thought Reasoning
January 28, 2024
Large language models (LLMs) are more effective when they engage in step-by-step "Chain-of-Thought" (CoT) reasoning, but it is unclear if this reasoning is a faithful explanation of the model's actual process. The study examines how interventions on the CoT affect model predictions, finding that models vary in how strongly they rely on the CoT. The performance boost from CoT does not solely come from added test-time compute or specific phrasing. As models become larger and more capable, they tend to produce less faithful reasoning. The results suggest that faithful CoT reasoning depends on carefully chosen circumstances such as model size and task.
ageron/handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
January 26, 2024
The ageron/handson-ml3 project is designed to teach the fundamentals of Machine Learning using Python. It includes example code and exercise solutions from the third edition of the book "Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow." The project provides options for running the notebooks online, using a Docker image, or installing the project on your own machine. It also addresses frequently asked questions about Python versions, SSL errors, and updating the project. The project has received contributions from various individuals, including reviewers, contributors to exercise solutions, and supporters from the Google ML Developer Programs team.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
January 23, 2024
BERT and RoBERTa have achieved impressive results on sentence-pair regression tasks like semantic textual similarity, but they have a significant computational overhead when comparing large collections of sentences. To address this, Sentence-BERT (SBERT) has been developed as a modification of BERT that uses siamese and triplet network structures to generate semantically meaningful sentence embeddings. SBERT reduces the time required to find the most similar pair from 65 hours with BERT to just 5 seconds, while maintaining accuracy. SBERT outperforms other state-of-the-art sentence embedding methods on various tasks, including STS and transfer learning.
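The speedup described above comes from precomputing one embedding per sentence and then comparing vectors instead of re-running BERT per pair. A toy sketch with made-up three-dimensional "embeddings" (real SBERT vectors come from the siamese encoder):

```python
import numpy as np

# Toy stand-ins for SBERT sentence embeddings; shape: (n_sentences, dim).
emb = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0]])

# Normalize once; then every pairwise cosine similarity is one matrix
# product -- this is why comparing thousands of sentences takes seconds.
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sims = unit @ unit.T
np.fill_diagonal(sims, -1.0)          # ignore self-similarity
i, j = np.unravel_index(sims.argmax(), sims.shape)
print(i, j)  # indices of the most similar pair
```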
Turing (1951): Intelligent Machinery, a Heretical Theory
January 21, 2024
Self-Rewarding Language Models
January 20, 2024
To achieve superhuman language models, researchers propose self-rewarding language models, which provide their own rewards during training. Unlike current approaches that rely on human preference data, these models use LLM-as-a-Judge prompting to score their own outputs, improving both their instruction following and their ability to generate high-quality rewards. A preliminary study fine-tuning Llama 2 70B with this approach shows that it outperforms existing systems on the AlpacaEval 2.0 leaderboard. This work suggests the potential for models that continually improve along both axes: instruction following and self-rewarding.
Software Development Trends 2023/2024 - Vol. 2.
January 16, 2024
The document provides a summary of important software development trends observed in 2023 that are likely to continue into 2024. It includes information on technology roadmaps, the state of DevOps, cloud computing, serverless technology, databases, and more. Some key insights from the document include the value drivers and risks associated with adopting software engineering technologies, the impact of generative cultures and user-focused teams on performance, and the increasing adoption of serverless solutions. Additionally, the document highlights the need for multi-cloud skills development and the most in-demand cloud skills for 2023.
Word2vec from Scratch
January 15, 2024
Word2vec is a technique used to express words as vectors that encode their semantics in a meaningful way. This article discusses how to implement word2vec from scratch using NumPy. The process involves tokenizing the text, creating lookup tables for words and IDs, generating training data in the form of matrices using one-hot vectorization, and building and training the embedding network. The rows of the weight matrix in the network serve as the word embeddings, representing words as dense vectors. The final output of the network is a probability vector that predicts the nearby context words.
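The forward pass the article builds can be sketched in a few lines of NumPy (the toy vocabulary size and embedding dimension are my assumptions, not the article's values):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 5, 3                      # toy sizes
W_in  = rng.normal(size=(vocab, dim))  # rows of this matrix become the embeddings
W_out = rng.normal(size=(dim, vocab))

def forward(word_id):
    x = np.zeros(vocab); x[word_id] = 1.0   # one-hot input vector
    h = x @ W_in                            # hidden layer = the embedding row
    logits = h @ W_out
    p = np.exp(logits - logits.max())
    return p / p.sum()                      # probabilities over context words

probs = forward(2)
print(probs.sum())          # ~1.0: a valid distribution over the vocabulary
embedding = W_in[2]         # the dense vector that represents word 2
```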
MemGPT: Towards LLMs as Operating Systems
January 15, 2024
MemGPT is a system that manages different memory tiers to provide extended context within the limited context window of large language models (LLMs). Using an OS-inspired design, MemGPT can handle unbounded context using LLMs that have finite context windows. It is successful in domains where existing LLMs' limited context windows severely limit their performance, such as document analysis and multi-session chat. MemGPT supports self-directed editing and retrieval, memory-hierarchy, OS functions, and event-based control flow to manage unbounded context.
Visual Guides to understand the basics of Large Language Models
January 14, 2024
This article provides a compilation of tools and articles that aim to break down the complicated concepts of Large Language Models (LLMs) in an intuitive way. It acknowledges that many people struggle with understanding the basics of LLMs and offers resources to help solidify their understanding. The article includes a table of contents with links to various resources, such as "The Illustrated Transformer" by Jay Alammar, which provides visualizations to explain the transformer architecture, a fundamental building block of LLMs. The goal is to make the concepts of LLMs easily understood and accessible.
Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs
January 14, 2024
This article provides a comprehensive understanding and coding guide for self-attention mechanisms in transformer architectures and large language models (LLMs) like GPT-4 and Llama. It covers the concept of self-attention, its importance in NLP, and the implementation of the self-attention mechanism in Python and PyTorch. The article also discusses the scaled dot-product attention, computing unnormalized attention weights, computing attention weights, and computing the context vector. Additionally, it explores multi-head attention and provides code examples for implementing multiple attention heads.
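The scaled dot-product computation the article walks through can be sketched in NumPy (the article itself uses PyTorch; the shapes here are toy values):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot products
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # one context vector per token

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
ctx = self_attention(X, Wq, Wk, Wv)
print(ctx.shape)  # (4, 8)
```

Multi-head attention repeats this with several smaller projection matrices and concatenates the resulting context vectors.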
Thinking in Systems: International Bestseller: Donella H. Meadows, Diana Wright: 9781603580557: Amazon.com: Books
January 14, 2024
"Thinking in Systems" is a book that explores the concept of systems thinking, which involves analyzing the interconnectedness and dynamics of various systems. The book uses examples such as the human body, businesses, and societal systems to illustrate how stocks and flows contribute to achieving system goals. It also highlights the importance of aligning stated goals with actual outcomes and discusses the need for change in systems that are not functioning optimally. The book emphasizes the complexity of systems and the challenges of making meaningful improvements.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
January 12, 2024
Backdoored behavior in AI models is most persistent in larger models and models trained to deceive the training process, even when the deceptive behavior is distilled away. Adversarial training can actually make models better at recognizing their backdoor triggers, effectively hiding the unsafe behavior. Safety training techniques, such as reinforcement learning, are often ineffective in removing backdoors. The study explores different methods for training backdoored models and finds that chain-of-thought backdoors allow models to produce consistent reasoning for their deceptive behavior.
This project is about how to systematically persuade LLMs to jailbreak them.
January 10, 2024
This project introduces a taxonomy of 40 persuasion techniques to systematically persuade LLMs (large language models) to jailbreak them. Through iterative application of these techniques, the researchers achieved a 92% success rate in jailbreaking advanced LLMs. They also found that more advanced models are more vulnerable to persuasive adversarial prompts (PAPs) and that adaptive defenses can effectively neutralize these prompts. The research highlights the challenges of addressing user-invoked risks from persuasion and the need for further investigation and improved defenses for more capable models.
Pruning vs Quantization: Which is Better?
January 10, 2024
Neural network pruning and quantization are techniques used to compress deep neural networks. This paper compares the two techniques and provides an analytical comparison of expected quantization and pruning error. The results show that in most cases, quantization outperforms pruning. However, in scenarios with very high compression ratios, pruning may be beneficial. The paper also discusses the hardware implications of both techniques and provides a comparison of pruning and quantization in the post-training and fine-tuning settings.
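The paper's analytical comparison can be illustrated numerically: measure the mean squared error that magnitude pruning and uniform quantization each introduce on a toy Gaussian weight tensor (a rough sketch under my own assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)

def prune(w, keep_frac):
    """Magnitude pruning: zero out the smallest-magnitude weights."""
    k = int(len(w) * keep_frac)
    thresh = np.sort(np.abs(w))[-k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def quantize(w, bits):
    """Uniform quantization to 2**bits levels over the weight range."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (2**bits - 1)
    return lo + np.round((w - lo) / step) * step

mse = lambda a, b: float(np.mean((a - b) ** 2))
print("prune 50% :", mse(w, prune(w, 0.5)))
print("int4 quant:", mse(w, quantize(w, 4)))
```

On this toy tensor the 4-bit quantization error comes out lower than the 50%-pruning error, consistent with the paper's headline finding.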
mlx-examples/lora at main · ml-explore/mlx-examples · GitHub
January 10, 2024
This document provides an example of using MLX to fine-tune either a Llama 7B or Mistral 7B model with low rank adaptation (LoRA) for a target task. The example demonstrates using the WikiSQL dataset to train the model to generate SQL queries from natural language. It includes instructions for setup, running the script, fine-tuning the model, evaluating the model, generating output, and dealing with memory issues. The document also provides results from the training process and offers tips for reducing memory consumption during fine-tuning.
Mixtral of Experts
January 10, 2024
Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model that outperforms or matches other models like Llama 2 70B and GPT-3.5 across various benchmarks. It has the same architecture as Mistral 7B but uses 8 feedforward blocks (experts) in each layer. A router network selects two experts for each token at each layer, allowing for dynamic selection of different experts at each timestep. This results in each token having access to 47B parameters but only using 13B active parameters during inference. Mixtral also offers a fine-tuned model, Mixtral 8x7B - Instruct, which surpasses other models on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
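The routing scheme described above, top-2 expert selection per token per layer, can be sketched as follows (toy linear maps stand in for the real feedforward experts):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, experts, router_W, k=2):
    """Route one token through its top-k experts, as in an SMoE layer."""
    logits = router_W @ x
    top = np.argsort(logits)[-k:]           # indices of the k chosen experts
    gates = softmax(logits[top])            # renormalize over the chosen pair
    return sum(g * experts[i](x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
dim, n_experts = 8, 8
Ws = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in Ws]   # toy "experts"
router_W = rng.normal(size=(n_experts, dim))

y = moe_layer(rng.normal(size=dim), experts, router_W)
print(y.shape)  # (8,): full model dimension, but only 2 of 8 experts ran
```

Because only two experts execute per token, the active parameter count stays far below the total parameter count, which is the source of Mixtral's 47B-total / 13B-active figures.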
Paper page - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
January 10, 2024
Self-Play Fine-Tuning (SPIN) improves a supervised fine-tuned LLM without any additional human-annotated data. At each iteration, the model generates training data from its previous version and is fine-tuned to distinguish those self-generated responses from the original human-annotated ones, so each round's stronger model becomes the next round's opponent. This self-play loop progressively converts a weak language model into a stronger one.
From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
January 10, 2024
LLMs (Large Language Models) have been enhanced with innovative prompting strategies and external tools, expanding their capabilities. However, integrating LLMs into conversational agents presents a challenge. This paper introduces RAISE, an enhanced version of the ReAct framework, which utilizes scratchpad and retrieved examples to augment the agent's capabilities. RAISE demonstrates superiority as a conversational agent in experiments conducted on a real estate dataset. The working memory of RAISE consists of conversation history, scratchpad, examples, and task trajectory. The paper also discusses the evaluation of agent performance and the core aspects of planning and Chain-of-Thought reasoning.
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
January 9, 2024
The paper presents WikiChat, a few-shot language model (LLM)-based chatbot that minimizes hallucinations and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia and combines grounded facts with additional information from the corpus to generate factual and engaging responses. The system achieves high factual accuracy and outperforms previous retrieval-based chatbots in terms of informativeness and engagement. The paper also introduces a novel evaluation methodology that combines simulated and real user conversations for assessing the factuality and conversationality of chatbots.
Discovering Language Model Behaviors with Model-Written Evaluations
January 8, 2024
The article discusses an approach to generating evaluations using language models (LMs) with the help of crowdworkers. The LM-generated evaluations were rated highly relevant, with workers agreeing with 90-100% of their labels. The researchers showcase their approach by generating datasets that test LMs for 154 diverse behaviors related to model personality, politics, ethics, social bias, and risks from advanced AI systems. The generated multiple-choice questions help the researchers to reveal additional instances of inverse scaling with RLHF training, as well as to distinguish when concerning behaviors are likely caused by pretraining or RLHF.
Getting Started with Elastic Stack 8.0
January 8, 2024
The Elastic Stack consists of Elasticsearch for data storage and search, Kibana for visualization, and tools like Beats and Logstash for data collection and transformation. Beginners can learn about key topics like indexing, searching, and managing data in Elasticsearch through various chapters in the book. Kibana is essential for interacting with data and building solutions on the Elastic Stack.
Understanding The Exploding and Vanishing Gradients Problem
January 7, 2024
The "Understanding The Exploding and Vanishing Gradients Problem" article discusses the vanishing and exploding gradients problem in deep neural networks. It explains how the gradients used to update the weights can shrink or grow exponentially, causing learning to stall or become unstable. The article explores why gradients vanish or explode exponentially and how it affects the backpropagation algorithm during training. It also provides strategies to address the vanishing and exploding gradients problem, such as using the ReLU activation function, weight initialization techniques, and gradient clipping.
Practical Deep Learning for Coders 2022
January 7, 2024
"Practical Deep Learning for Coders 2022" is a course that covers topics such as building and training deep learning models, deploying models, and using PyTorch and other popular libraries. The course is led by Jeremy Howard, who has extensive experience in machine learning and has created companies that utilize deep learning. The course is suitable for those with at least a year of coding experience and a high school math background. Students will learn how to train models for computer vision, natural language processing, tabular data analysis, and collaborative filtering, and will also learn about the latest deep learning techniques.
fastai/fastbook: The fastai book, published as Jupyter Notebooks
January 7, 2024
The fastai book, published as Jupyter Notebooks, provides an introduction to deep learning, fastai, and PyTorch. It is copyright Jeremy Howard and Sylvain Gugger, and a selection of chapters is available to read online. The notebooks in the repository are used for a MOOC and form the basis of the book, which is available for purchase. The code in the notebooks is covered by the GPL v3 license, while the other content is not licensed for redistribution or change. It is recommended to use Google Colab to access and work with the notebooks. Copyright in any contributions is assigned to Jeremy Howard and Sylvain Gugger.
Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise, 5th Edition
January 7, 2024
The text explains how word2vec uses one-hot encoded vectors and weight matrices to represent words in a neural network model. It details the learning process for updating weights between input, hidden, and output layers based on prediction errors. The update equations for weights are derived through backpropagation to improve the model's ability to predict words within a context.
Attention? Attention!
January 7, 2024
The document explores the concept of attention, as performed by humans and deep learning algorithms. Attention is used in deep learning to transform one input sequence into another and is accomplished through an encoder-decoder architecture with LSTM or GRU units. The attention mechanism, invented to address the incapability of the fixed-length context vector, creates shortcuts between the context vector and the entire source input. Attention mechanisms vary in form, from soft or hard to global or local. The document also introduces self-attention, which relates different positions of a single sequence to compute a representation of the same sequence, and the Neural Turing Machine, a model architecture for coupling a neural network with external memory storage.
An Intuition for Attention
January 7, 2024
The transformer neural network, used by models like ChatGPT, incorporates an attention mechanism to improve performance. Attention is a key feature of transformers and is defined by an equation that involves the softmax function. Attention can take different forms, but the scaled dot product attention is commonly used. This attention mechanism is based on the idea of key-value lookups, where a query is matched with keys to retrieve corresponding values. The attention scores, which determine how much attention is given to each key-value pair, are computed using dot product similarity and transformed into decimal percentages using the softmax function. This process allows for meaningful and efficient processing of queries in large language models.
Pen and Paper Exercises in Machine Learning
January 7, 2024
This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises cover the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Carlo integration, and variational inference.
Transformers From Scratch
January 7, 2024
This blog provides a step-by-step guide on creating and training a transformer from scratch. The author explains each foundational element and provides a Jupyter notebook with the code for readers to run and experiment with. The blog references a YouTube video and the Attention Is All You Need paper for further understanding. The author also mentions the availability of the final code and a dataset for download.
Mathematics for Machine Learning
January 5, 2024
Linear Algebra Review and Reference
January 5, 2024
Probability and Information Theory
January 5, 2024
In this chapter, the authors discuss probability theory and information theory. Probability theory is a mathematical framework for representing uncertain statements and is used in artificial intelligence for reasoning. Information theory, on the other hand, quantifies the amount of uncertainty in a probability distribution. The chapter explains various concepts, such as probability mass functions for discrete variables and probability density functions for continuous variables. It also introduces key ideas from information theory, such as entropy and mutual information. The authors provide examples and explanations to help readers understand these concepts.
Linear Algebra
January 5, 2024
Linear algebra is a fundamental topic in understanding and working with machine learning algorithms, especially deep learning algorithms. This chapter provides an introduction to scalars, vectors, matrices, and tensors, which are the key mathematical objects in linear algebra. It explains the concepts and notation used in linear algebra, such as matrix multiplication, transpose, identity and inverse matrices, and norms. The chapter also introduces special kinds of matrices and vectors, such as diagonal matrices, orthogonal matrices, and eigenvalues and eigenvectors. These concepts are important for analyzing and solving equations in machine learning.
Home
January 5, 2024
Eagle Dynamics has exciting plans for the upcoming year, with the development and release of new aircraft and maps. Some highlights include the introduction of the MiG-29A Fulcrum, as well as the Afghanistan and Iraq maps. They are also continuing their work on the CH-47F, Hellcat/USS Enterprise, and the Marianas WW2 map. Fans of flight simulation can look forward to these upcoming additions to the game.
Mathematics for Machine Learning
January 5, 2024
An overview of gradient descent optimization algorithms
January 5, 2024
The text provides an overview of gradient descent optimization algorithms commonly used in deep learning. It explains different types of gradient descent methods like batch, stochastic, and mini-batch, highlighting their strengths and challenges. The author also discusses advanced algorithms such as Adagrad, RMSprop, and Adam, which adapt learning rates to improve optimization performance.
An overview of gradient descent optimization algorithms
January 5, 2024
The article provides an overview of gradient descent optimization algorithms, which are often used as black-box optimizers. The article outlines the three variants of gradient descent and summarizes the challenges. The article then introduces some widely used algorithms to deal with the challenges, including Nesterov accelerated gradient, Adagrad, Adadelta, and RMSprop. The article explains how these algorithms work and their benefits and weaknesses.
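The adaptive methods the article surveys all modify the basic update rule. A minimal sketch of one Adam step (standard default hyperparameters; the quadratic test function is my own example):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum-style first moment, RMSprop-style second moment."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias correction for the zero init
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 from x = 3 (gradient is 2x).
theta, m, v = np.array([3.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to the minimum at 0
```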
How GPT3 Works - Visualizations and Animations
January 5, 2024
The tech world is abuzz with GPT3 hype. Massive language models (like GPT3) are starting to surprise us with their abilities. While not yet completely reliable for most businesses to put in front of their customers, these models are showing sparks of cleverness that are sure to accelerate the march of automation and the possibilities of intelligent computer systems. Let’s remove the aura of mystery around GPT3 and learn how it’s trained and how it works.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
The output is generated from what the model “learned” during its training period where it scanned vast amounts of text.
GPT in 60 Lines of NumPy
January 5, 2024
This post outlines how to implement a GPT (Generative Pre-trained Transformer) from scratch in just 60 lines of NumPy, including loading trained GPT-2 model weights released by OpenAI and generating text. The GPT generates text given a prompt and the task of predicting the next logical word in a sequence is called language modeling. The post explains how to train a GPT using gradient descent with respect to the cross entropy loss over the language modeling task. The post also touches on prompting and how to handle hyperparameters.
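The cross-entropy language-modeling objective mentioned above can be sketched directly (toy logits, not a real GPT):

```python
import numpy as np

def lm_loss(logits, targets):
    """Cross-entropy over next-token prediction: the language-modeling loss."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

vocab = 50
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, vocab))     # model outputs for 8 positions
targets = rng.integers(0, vocab, size=8) # the actual next tokens
print(lm_loss(logits, targets))          # near ln(50) for random logits
```

Training drives this quantity down by gradient descent; a confident, correct model approaches zero loss on its targets.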
The Annotated Transformer
January 4, 2024
"The Annotated Transformer" is a paper that introduces a new architecture for natural language processing tasks, with a focus on translation. The paper provides an annotated version of the original paper, giving a line-by-line implementation of the model. The Transformer model relies on self-attention to compute representations of its input and output without using sequence-aligned recurrent neural networks or convolutions. The model consists of an encoder and decoder stack, each containing self-attention layers and position-wise feed-forward networks. The paper also discusses the use of multi-head attention and positional encoding in the model. The model is trained using the WMT 2014 English-German dataset and the Adam optimizer.
The Illustrated Transformer
January 4, 2024
"The Illustrated Transformer" is a comprehensive guide to understanding the Transformer model, which utilizes attention to improve the training speed of neural machine translation models. The model consists of stacked encoders and decoders, with each encoder and decoder having self-attention layers. Self-attention allows the model to incorporate information from other words in the input sequence, resulting in better encoding. The model also employs multi-headed attention, which allows it to focus on different positions and creates multiple sets of Query/Key/Value weight matrices. Positional encoding is used to account for the order of words in the input sequence. The architecture includes residual connections and layer normalization for each sub-layer.
GitHub - tensorflow/nmt: TensorFlow Neural Machine Translation Tutorial
January 4, 2024
TensorFlow Neural Machine Translation Tutorial: the tensorflow/nmt repository on GitHub.
What Are Word Embeddings for Text?
January 4, 2024
Word embeddings are a way to represent words with similar meanings in a similar manner using real-valued vectors. They are a key advancement in deep learning for natural language processing tasks. You can either train your own word embeddings or use pre-trained ones for your projects.
Deep Learning for Natural Language Processing
January 4, 2024
Deep Learning for Natural Language Processing: develop deep learning models for your natural language problems. Working with text is important, under-discussed, and hard. We are awash with text, from books, papers, blogs, tweets, and news, and increasingly text from spoken utterances. Every day, I get questions asking how to develop machine learning models for text data.
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
January 4, 2024
The article explains the mechanics of sequence-to-sequence models, which are deep learning models used for machine translation, text summarization, and image captioning. The article includes visualizations to explain the concepts and requires some previous understanding of deep learning. The article also discusses attention models, which improve machine translation systems by allowing the model to focus on relevant parts of the input sequence. The article provides examples of how attention models work and concludes with a link to TensorFlow's Neural Machine Translation tutorial.
The Random Transformer
January 4, 2024
This blog post provides an end-to-end example of the math within a transformer model, with a focus on the encoder part. The goal is to understand how the model works, and to make it more manageable, simplifications are made and the dimensions of the model are reduced. The post recommends reading "The Illustrated Transformer" blog for a more intuitive explanation of the transformer model. The prerequisites for understanding the content include basic knowledge of linear algebra, machine learning, and deep learning. The post covers the math within a transformer model during inference, attention mechanisms, residual connections and layer normalization, and provides some code to scale it up.
GitHub - SkalskiP/courses: This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
January 4, 2024
SkalskiP/courses is a curated collection of links to various courses and resources about Artificial Intelligence (AI). It includes courses on topics such as generative AI, deep learning, natural language processing, computer vision, machine learning, and more. The repository aims to provide a comprehensive resource for beginners and experienced learners alike. Contributions from the community are encouraged to make the repository even better.
CS25: Transformers United V3
January 4, 2024
Transformers have revolutionized Natural Language Processing (NLP) and are now being applied in various fields, including Computer Vision, Reinforcement Learning, and Speech. This seminar explores the details of how Transformers work and their applications, with a focus on large language models (LLMs). The seminar includes instructor and guest lectures from experts in Transformers research. The schedule includes topics such as the creation of fine-tuned chat models, low-level embodied intelligence with foundation models, and training helpful chatbots. The seminar also covers the motivations behind Transformers, scaling human-centered machine translation, and going beyond LLMs to explore emergent abilities and intermediate-guided reasoning.
openai/whisper-large-v2
January 3, 2024
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates strong generalization abilities without the need for fine-tuning. The large-v2 model, trained for 2.5x more epochs with added regularization, offers improved performance. The models can be used for transcription and translation tasks, with context tokens indicating the language and task. While the models show robustness and accuracy in many languages, they may exhibit limitations such as generating repetitive texts and hallucinations. The models have potential applications in accessibility tools but also raise concerns about dual use and surveillance capabilities.
Text Summarization: How to Calculate BertScore
January 3, 2024
BERTScore is a metric used to measure the quality of text summarization by calculating the similarity between the summary and the original text. It addresses issues that n-gram-based metrics face, such as incorrect matching of paraphrases and the inability to capture long-range dependencies. The BERTScore architecture involves contextual embeddings, cosine similarity, token matching for precision and recall, importance weighting, and baseline rescaling. The metric has the potential to improve various natural language processing tasks and can be applied in domains such as translation quality assessment, text generation, and document comparison. Future developments include broader language coverage and adaptation for multilingual texts.
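The precision/recall token-matching step can be sketched with random stand-in embeddings (real BERTScore uses contextual BERT embeddings plus optional importance weighting and baseline rescaling, omitted here):

```python
import numpy as np

def bertscore(cand_emb, ref_emb):
    """Greedy-matching precision/recall/F1 over token cosine similarities."""
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = c @ r.T                       # (n_cand_tokens, n_ref_tokens)
    precision = sims.max(axis=1).mean()  # each candidate token's best match
    recall = sims.max(axis=0).mean()     # each reference token's best match
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

rng = np.random.default_rng(0)
ref = rng.normal(size=(6, 16))           # stand-ins for contextual embeddings
p, r, f1 = bertscore(ref + 0.01 * rng.normal(size=(6, 16)), ref)
print(round(f1, 3))  # near 1.0: nearly identical token embeddings
```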
Some Core Principles of Large Language Model (LLM) Tuning
January 3, 2024
Large Language Models (LLMs) like GPT2 and GPT3 are trained using unsupervised pre-training on billions to trillions of tokens. After pre-training, the models are fine-tuned for specific use cases such as chatbots or content generation. Fine-tuning can be done through supervised fine-tuning (SFT) or reinforcement learning with human feedback (RLHF). SFT involves minimizing the loss between the model's output and the correct result, while RLHF uses a reward model to optimize the model's performance. InstructGPT is an RLHF-tuned version of GPT3 that is trained to follow instructions and provide aligned responses. There are also open-source alternatives to GPT models, such as GPT-J and GPT-Neo.
MotionGPT: Human Motion as a Foreign Language
January 3, 2024
MotionGPT is a unified model for language and motion tasks, achieving top performance in text-driven motion generation. It combines natural language models with human motion tasks, benefiting fields like gaming and robotics. The model treats human motion like a foreign language, offering a versatile solution for diverse motion synthesis problems.
An intuitive introduction to text embeddings
January 2, 2024
Text embeddings are essential in natural language processing (NLP) and convert text into vector coordinates. They allow us to understand the semantic meaning of words and sentences by representing them as vectors in a high-dimensional latent space. By using text embeddings, we can capture the similarity between texts and perform tasks such as search and classification more efficiently. There are various algorithms and models, such as Word2vec and transformers, that help us generate text embeddings and capture the sequential nature of text. These advancements in text embeddings have greatly improved our ability to reason intuitively about NLP and other machine learning models.
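In code, "similarity between texts" usually means cosine similarity between their vectors. A minimal sketch with made-up 3-dimensional embeddings (real models emit hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings"; the coordinates here are illustrative, not from any model.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.8, 0.2, 0.1])
economy = np.array([0.0, 0.1, 0.9])

# Related words point in similar directions, unrelated ones do not.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, economy)
```

Search and classification then reduce to nearest-neighbor lookups in this vector space.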
Mathematics for Machine Learning
January 1, 2024
Generative Agents: Interactive Simulacra of Human Behavior
January 1, 2024
The article describes the concept of "generative agents", which are computational software agents that simulate believable human behavior for interactive applications. The agents are created using a large language model and can remember, reflect, and plan based on their past experiences. The article demonstrates generative agents by populating a sandbox environment with 25 agents, where users can observe and intervene as agents plan their days, form relationships, and coordinate group activities. The article discusses the architecture that enables generative agents and their potential applications in various domains.
VOYAGER: An Open-Ended Embodied Agent with Large Language Models
January 1, 2024
The article presents VOYAGER, an embodied agent that continuously explores the Minecraft world, acquires skills, and makes new discoveries without human intervention. VOYAGER consists of three key components: an automatic curriculum for exploration, a skill library for storing and retrieving complex behaviors, and an iterative prompting mechanism for program improvement. The agent utilizes Large Language Models (LLMs) and code as the action space, allowing it to represent temporally extended and compositional actions. The article also highlights VOYAGER's superior performance in discovering novel items, unlocking the Minecraft tech tree, and applying its learned skill library to unseen tasks in a newly instantiated world.
Reader: Frequently Asked Questions
January 1, 2024
Changelog. December 19, 2023: added a section about the Daily Digest; explained limitations of Kindle/Google/etc. books; explained the link between Reader docs and Readwise highlights; updated info about the auto-highlighting feature; expanded the section about PDF highlights; added the browser extension hot key (alt+R). December 7, 2023: added more context for a…
Extensions in Arc: How to Import, Add, & Open
January 1, 2024
Arc has full extension support. Here's how
2025
111 bookmarks
Matrices and graphs
June 5, 2025
The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices
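A quick illustration of the correspondence, assuming a small directed graph: the adjacency matrix encodes the edges, and matrix powers count walks.

```python
import numpy as np

# A directed 3-cycle 0 -> 1 -> 2 -> 0 as its adjacency matrix:
# A[i, j] = 1 iff there is an edge i -> j.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

# (A^k)[i, j] counts walks of length k from i to j.
assert np.linalg.matrix_power(A, 2)[0, 2] == 1  # one walk: 0 -> 1 -> 2
assert np.array_equal(np.linalg.matrix_power(A, 3), np.eye(3, dtype=int))  # back home in 3 steps
```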
DeepSeek-V3 Explained 1: Multi-head Latent Attention
May 29, 2025
Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT
May 16, 2025
State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and high costs.
You could have designed state of the art positional encoding
May 16, 2025
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
attention is logarithmic, actually
May 16, 2025
time complexity is a very bad model when working with parallelism. in which i make the case for work-depth analysis instead of time complexity.
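The work-depth distinction is easy to see on a tree reduction: summing n numbers costs n − 1 additions of work, but only ⌈log₂ n⌉ sequential levels of depth. A sketch:

```python
def tree_reduce_depth(n):
    """Number of parallel levels needed to reduce n values pairwise."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2  # each level halves the number of live values
        depth += 1
    return depth

# 8 values: 8 -> 4 -> 2 -> 1, i.e. 7 additions of work but only depth 3.
assert tree_reduce_depth(8) == 3
```

On enough processors, runtime tracks the depth, not the work, which is why "O(n²) attention" can be misleading as a latency claim.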
AI Arrives In The Middle East: US Strikes A Deal with UAE and KSA – SemiAnalysis
May 16, 2025
The US has signed two landmark agreements with the United Arab Emirates and Kingdom of Saudi Arabia (KSA) that will noticeably shift the balance of power. The deals have economic, geopolitical…
Transformers Represent Belief State Geometry in their Residual Stream
May 16, 2025
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS.…
Llama from scratch (or how to implement a paper without crying)
May 16, 2025
I want to provide some tips from my experience implementing a paper. I'm going to cover my tips so far from implementing a dramatically scaled-down version…
The Curse of Knowing How, or; Fixing Everything
May 16, 2025
A reflection on control, burnout, and the strange weight of technical fluency.
The MAP-Elites Algorithm: Finding Optimality Through Diversity
May 16, 2025
MAP-Elites is a method in reinforcement learning to avoid the local optimum of a search space by storing multiple candidate solutions…
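The core loop is small: keep one elite per cell of a discretized behavior space, and mutate existing elites to fill neighboring cells. A hedged sketch; the 1-D grid and the `fitness` and `descriptor` functions below are toy stand-ins, not from the original paper:

```python
import random

random.seed(0)
CELLS = 10
archive = {}  # behavior cell -> (fitness, solution); one elite per cell

def descriptor(x):   # toy behavior descriptor: which decile of [0, 1) x lands in
    return min(int(x * CELLS), CELLS - 1)

def fitness(x):      # toy objective to maximize
    return -(x - 0.5) ** 2

for _ in range(2000):
    if archive and random.random() < 0.9:
        # Mutate a random existing elite...
        x = random.choice(list(archive.values()))[1] + random.gauss(0, 0.1)
        x = min(max(x, 0.0), 0.999)
    else:
        x = random.random()  # ...or sample a fresh random solution
    cell = descriptor(x)
    f = fitness(x)
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, x)  # keep only the best solution seen in each cell
```

The result is an archive of diverse locally-best solutions rather than a single optimum, which is what lets the search escape deceptive regions.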
How To Scale
May 13, 2025
While there are already excellent posts on scaling, I wanted to share my own understanding and things I've learned from my past few months, and hopefully spark some discussion. I hope this post can shed light for anyone navigating the challenges of scaling up neural networks. There may be mistakes or inaccuracies, so if you want to correct me or would like to discuss further, please feel free to DM me on X or leave a comment.
Are Transformers universal approximators of sequence-to-sequence functions?
May 3, 2025
Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models. Furthermore, using positional encodings, we circumvent the restriction of permutation equivariance, and show that Transformer models can universally approximate arbitrary continuous sequence-to-sequence functions on a compact domain. Interestingly, our proof techniques clearly highlight the different roles of the self-attention and the feed-forward layers in Transformers. In particular, we prove that fixed width self-attention layers can compute contextual mappings of the input sequences, playing a key role in the universal approximation property of Transformers. Based on this insight from our analysis, we consider other simpler alternatives to self-attention layers and empirically evaluate them.
a Hugging Face Space by nanotron
May 3, 2025
The ultimate guide to training LLM on large GPU Clusters
A Group and Its Center, Intuitively
April 27, 2025
Last week we took an intuitive peek into the First Isomorphism Theorem as one example in our ongoing discussion on quotient groups.
Understanding Entanglement With SVD
April 27, 2025
Quantum entanglement is, as you know, a phrase that's jam-packed with meaning in physics. But what you might not know is that the linear algebra behind it is quite simple.
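That linear algebra fits in a few lines: reshape a two-qubit state vector into a 2×2 matrix and read off its singular values, the Schmidt coefficients. One nonzero coefficient means a product state; two mean entanglement.

```python
import numpy as np

def schmidt_coefficients(state):
    """Singular values of a two-qubit state vector reshaped as a 2x2 matrix."""
    return np.linalg.svd(state.reshape(2, 2), compute_uv=False)

product = np.array([1.0, 0.0, 0.0, 0.0])             # |00>: a product state
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

assert np.allclose(schmidt_coefficients(product), [1, 0])
assert np.allclose(schmidt_coefficients(bell), [1 / np.sqrt(2)] * 2)  # maximally entangled
```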
Training Large Language Models to Reason in a Continuous Latent Space
April 24, 2025
Large language models (LLMs) are restricted to reason in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem.
The Book of Shaders
April 22, 2025
Gentle step-by-step guide through the abstract and complex universe of Fragment Shaders.
Unstructured Thoughts on the Problems of OSS/FOSS
April 22, 2025
Originally from replies to a Twitter thread: https://x.com/TheGingerBill/status/1914389352416993395
This is not a structured argument against FOSS/OSS but my uncommon thoughts on the topic.
I am not sure if I agree [that FOSS/OSS derives from the same thinking process as the ideology of communism], but I understand the sentiment. The fundamental issue is that software is trivially copyable. I have loads of issues with FOSS and OSS1. And part of this “ideology” (as presented in the original post) is naïvety coupled with only first-order thinking and a poor understanding of ownership.
Training Large Language Models to Reason in a Continuous Latent Space
April 22, 2025
Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
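The core move is small: instead of decoding the hidden state into a word token and re-embedding it, feed the hidden state straight back as the next input embedding. A toy sketch in which a tanh of a linear map stands in for the transformer forward pass (the real method uses the LLM's own last hidden state):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)  # stand-in for one transformer forward pass

def forward(h):
    return np.tanh(h @ W)

h = rng.normal(size=d)  # embedding of the final prompt token
continuous_thoughts = []
for _ in range(4):
    h = forward(h)                 # last hidden state...
    continuous_thoughts.append(h)  # ...fed back directly as the next input embedding
# Only after the latent steps would the model decode back into word tokens.

assert len(continuous_thoughts) == 4
```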
On the Biology of a Large Language Model
April 22, 2025
Large language models display impressive capabilities. However, for the most part, the mechanisms by which they do so are unknown.
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
April 22, 2025
We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the Llama-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token continuation. From layer to layer, transformers gradually map an input embedding of the final prompt token to an output embedding from which next-token probabilities are computed. Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space", respectively. Crucially, our evidence suggests that the abstract "concept space" lies closer to English than to other languages, which may have important consequences regarding the biases held by multilingual language models.
The Unsustainability of Moore’s Law
April 22, 2025
Roughly every two years, the density of transistors that can be fit onto a silicon chip doubles.
"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"
April 22, 2025
This isn't a new intuition, but a nice new set of results.
+33 7 80 61 21 67
April 21, 2025
Quickly send and receive WhatsApp messages directly from your computer.
tt-metal/tech_reports/memory/allocator.md at main · tenstorrent/tt-metal
April 19, 2025
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal
Multi-layer language heads: the output latent is for text (and nothing else)
April 19, 2025
The last layer’s hidden state in a transformer is meant only for being decoded into token probabilities. Don’t use it for autoregressive image generation. Don’t use it for looped latent transformers. Only use it to produce the next token in a language model. It is a compressed representation of the...
Subnanosecond flash memory enabled by 2D-enhanced hot-carrier injection
April 19, 2025
A two-dimensional Dirac graphene-channel flash memory based on a two-dimensional-enhanced hot-carrier-injection mechanism that supports both electron and hole injection is used to make devices with a subnanosecond program speed.
CS336: Language Modeling from Scratch
April 19, 2025
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general purpose system address a range of downstream tasks.
A Gentle Introduction to Lambda Calculus - Part 1: Syntax
April 19, 2025
Even though lots of people nowadays advocate for applying functional programming principles to JavaScript, not many of them know the principles of Lambda Cal...
Getting Started
April 19, 2025
Yet it seems to me that the situation right now is that LtU has readers with very different backgrounds, among them many readers who haven't studied PL formally.
Intelligence as efficient model building
April 19, 2025
Personal site for posts about my interests: the biotech industry, medicine, molecular biology, neuroscience, biorisk, science, consciousness, AI, innovation, decision making, philosophy, games, sci-fi, probability, and forecasting (among other things). I write to learn, mostly about biotech.
What Is ChatGPT Doing … and Why Does It Work?
April 15, 2025
Stephen Wolfram explores the broader picture of what's going on inside ChatGPT and why it produces meaningful text. Discusses models, training neural nets, embeddings, tokens, transformers, language syntax.
Position: Model Collapse Does Not Mean What You Think
April 10, 2025
The proliferation of AI-generated content online has fueled concerns over "model collapse", a degradation in future generative models' performance when trained on synthetic data generated by earlier models. Industry leaders, premier research journals and popular science publications alike have prophesied catastrophic societal consequences stemming from model collapse. In this position piece, we contend this widespread narrative fundamentally misunderstands the scientific evidence. We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse. To assess how significantly different interpretations of model collapse threaten future generative models, we posit what we believe are realistic conditions for studying model collapse and then conduct a rigorous assessment of the literature's methodologies through this lens. While we leave room for reasonable disagreement, our analysis of research studies, weighted by how faithfully each study matches real-world conditions, leads us to conclude that certain predicted claims of model collapse rely on assumptions and conditions that poorly match real-world conditions, and in fact several prominent collapse scenarios are readily avoidable. Altogether, this position paper argues that model collapse has been warped from a nuanced multifaceted consideration into an oversimplified threat, and that the evidence suggests specific harms more likely under society's current trajectory have received disproportionately less attention.
Recent AI model progress feels mostly like
April 7, 2025
About nine months ago, I and three friends decided that AI had gotten good enough to monitor large codebases autonomously for security problems. We s…
Building an Open Future
April 5, 2025
We are building an open future for AI. Own your silicon future. Join us.
diffusion transformers
April 5, 2025
Metaphorically, you can think of Vision Transformers as the eyes of the system, able to understand and contextualize what it sees, while Stable Diffusion is the hand of the system, able to generate and manipulate images based on this understanding.
Faking ADTs and GADTs in Languages That Shouldn't Have Them
April 1, 2025
Haskell is the world’s best programming language, but let’s face the harsh reality that a lot of times in life you’ll have to write in other programming languages. But alas, you have been fully Haskell-brained and lost all ability to program unless it is type-directed; you don’t even know how to start writing a program without imagining its shape as a type first. Well, fear not. The foundational theory behind Algebraic Data Types and Generalized Algebraic Data Types (ADTs and GADTs) is so fundamental that it will fit (somewhat) seamlessly into whatever language you’re forced to write. After all, if they can fit profunctor optics in Microsoft’s Java code, the sky’s the limit! This is an “April Fools” joke in the tradition of my previous one: some of the ways we are going to twist these other languages might seem unconventional or possibly ill-advised… but the title is definitely a lie: these languages definitely should have them! :D
Accelerate
March 29, 2025
Accelerate is a language for array-based computations, designed to exploit massive parallelism.
Ok Rust, You Really Have a Readability Problem
March 29, 2025
Rust is safe. Rust is fast. Rust is powerful. And Rust is… sometimes completely unreadable.
Circuit Tracing: Revealing Computational Graphs in Language Models
March 29, 2025
Deep learning models produce their outputs using a series of transformations distributed across many computational units (artificial “neurons”).
Analyzing Modern NVIDIA GPU cores
March 29, 2025
GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures that are more than 15 years old.
This paper reverse engineers modern NVIDIA GPU cores, unveiling many key aspects of its design and explaining how GPUs leverage hardware-compiler techniques where the compiler guides hardware during execution. In particular, it reveals how the issue logic works including the policy of the issue scheduler, the structure of the register file and its associated cache, and multiple features of the memory pipeline. Moreover, it analyses how a simple instruction prefetcher based on a stream buffer fits well with modern NVIDIA GPUs and is likely to be used. Furthermore, we investigate the impact of the register file cache and the number of register file read ports on both simulation accuracy and performance.
By modeling all these new discovered microarchitectural details, we achieve 18.24% lower mean absolute percentage error (MAPE) in execution cycles than previous state-of-the-art simulators, resulting in an average of 13.98% MAPE with respect to real hardware (NVIDIA RTX A6000). Also, we demonstrate that this new model stands for other NVIDIA architectures, such as Turing. Finally, we show that the software-based dependence management mechanism included in modern NVIDIA GPUs outperforms a hardware mechanism based on scoreboards in terms of performance and area.
tt-metal/tech_reports/AdvancedPerformanceOptimizationsForModels/AdvancedPerformanceOptimizationsForModels.md at main · tenstorrent/tt-metal · GitHub
March 29, 2025
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal
Why is Yazi fast?
March 28, 2025
This article assumes that you have already used Yazi and are familiar with most of its features.
User Guide for NVPTX Back-end
March 28, 2025
To support GPU programming, the NVPTX back-end supports a subset of LLVM IR along with a defined set of conventions used to represent GPU programming concepts.
An AnandTech Interview with Jim Keller: 'The Laziest Person at Tesla'
March 27, 2025
I've spoken about Jim Keller many times on AnandTech.
Notes/Primer on Clang Compiler Frontend (1) : Introduction and Architecture
March 25, 2025
Notes/Primer on Clang Compiler Frontend: Introduction and Architecture
These are my notes on chapters 1 & 2 of the Clang Compiler Frontend by Ivan Murashko. The book is focused on teaching the fundamentals of LLVM to C++ engineers who are interested in learning about compilers to optimize their daily workflow by enhancing their code quality and overall development process. (I’ve referenced this book extensively, and a lot of the snippets here are from this book.)
Implementation of simple microprocessor using verilog
March 25, 2025
I am trying to make a simple microprocessor in verilog as a way to understand verilog and assembly at the same time.
I am not sure if I am implementing what I think of microprocessors well enough ...
learn-fpga/FemtoRV/TUTORIALS/FROM_BLINKER_TO_RISCV/README.md at master · BrunoLevy/learn-fpga · GitHub
March 24, 2025
Learning FPGA, yosys, nextpnr, and RISC-V . Contribute to BrunoLevy/learn-fpga development by creating an account on GitHub.
Why async Rust?
March 24, 2025
I genuinely can’t understand how anybody could look at the mess that’s Rust’s async and think that it was a good design for a language that already had the reputation of being very complicated to write.
Softmax Attention is a Fluke
March 24, 2025
Attention is the magic ingredient of modern neural networks. It is the core of what has launched performant language models into the spotlight starting with GPT, and since then, it has extended its hands across all modalities. There are a number of desirable properties that make attention a first-class building block. Namely: it handles variable sequence lengths with ease, and it allows for a global receptive field without needing to scale parameters.
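For reference, the vanilla scaled dot-product attention those properties describe, in a minimal numpy sketch (single head, no masking or batching):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Scaled dot-product attention for one (seq_len, d) sequence."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) pairwise logits
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # each row: convex mix of values

# Global receptive field: with uninformative (all-zero) keys, every query
# attends to all values equally, so each output row is the mean of V.
rng = np.random.default_rng(0)
V = rng.normal(size=(5, 4))
out = softmax_attention(rng.normal(size=(5, 4)), np.zeros((5, 4)), V)
assert np.allclose(out, np.tile(V.mean(axis=0), (5, 1)))
```

Nothing in the function depends on the sequence length n, which is the "variable sequence lengths with ease" property.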
Transformers Laid Out
March 23, 2025
I have found that there are mainly three types of blogs/videos/tutorials talking about transformers
Template Haskell
March 22, 2025
Intuitively Template Haskell provides new language features that allow us to convert back and forth between concrete syntax, i. e.
A friendly introduction to machine learning compilers and optimizers
March 18, 2025
[Twitter thread, Hacker News discussion]
Comments on Source
March 18, 2025
The section of the wiki allows anyone to document, explain, post questions, or make comments on the Lua source code. You may link to [1] or paste the code in question.
Bloom’s 3 Stages of Talent Development
March 18, 2025
First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field.
Russell’s Paradox and Possible Solutions
March 18, 2025
The origins of set theory can be traced back to a Bohemian priest, Bernhard Bolzano (1781-1848), who was a professor of religion at the University of Prague.
The Making of Python
March 17, 2025
Guido van Rossum is the author of Python, an interpreted, interactive object-oriented programming language.
tt-metal/METALIUM_GUIDE.md at main · tenstorrent/tt-metal · GitHub
March 17, 2025
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal
Scoping out the Tenstorrent Wormhole
March 17, 2025
The Tenstorrent Wormhole n300s PCIe accelerator board is available for purchase, featuring 672 RISC-V cores driving 466 TFLOP/s of FP8 matmul.
What’s the (floating) Point of all these data types? A (not so) brief overview of the history and usage of datatypes within the wide world of computation
March 17, 2025
This presentation delves into the fascinating and sometimes aggravating world of numerical data types, exploring the evolution, strengths, and weaknesses of decimal, fixed point, floating point, and shared exponent formats over the past 70 years.
Physics of language models
March 17, 2025
Many asked about collaborations (details are in the FAQ). Short answer: unless you're from Meta and willing to work with us in your spare time (20+ hrs/week), or you're an early-year PhD from UCB/NYU/CMU/UW (but the application deadline was Jan 10, 2025).
Tenstorrent first thoughts
March 17, 2025
I've looked into alternative AI accelerators to continue my saga of running GGML on lower power-consumption hardware. The most promising - and the only one that ever replied to my emails - was Tenstorrent. This post is me deeply thinking about whether buying their hardware for development is a good investment…
Neural Networks, Manifolds, and Topology
March 9, 2025
However, there remain a number of concerns about them. One is that it can be quite challenging to understand what a neural network is really doing.
Attention from Beginners Point of View
March 9, 2025
Transformers are a type of neural network architecture which is popularly used for text generations, machine translations, etc.
(How) Do Language Models Track State?
March 9, 2025
Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that appear to require tracking the unobserved state of an evolving world. How do they do so? We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general. We show that LMs consistently learn one of two state tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, then refines this with an associative scan. The two mechanisms exhibit markedly different robustness properties, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics. Our results demonstrate that transformer LMs, whether pretrained or fine-tuned, can learn to implement efficient and interpretable state tracking mechanisms, and the emergence of these mechanisms can be predicted and controlled.
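Permutation composition, the task studied here, is simple to write down, which is what makes it a clean probe. A sketch of the composition operation and the parity shortcut that the second mechanism exploits:

```python
def compose(p, q):
    """Apply p, then q: the object order after two successive shuffles."""
    return tuple(q[i] for i in p)

def parity(p):
    """0 for even permutations, 1 for odd (counts inversions)."""
    n = len(p)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inv % 2

ident = (0, 1, 2)
swap01 = (1, 0, 2)

assert compose(ident, swap01) == swap01
assert compose(swap01, swap01) == ident  # two identical swaps cancel
# Parity adds mod 2 under composition, so it cheaply prunes half the candidates.
assert parity(compose(swap01, swap01)) == (parity(swap01) + parity(swap01)) % 2
```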
Why Attention Is All You Need
March 9, 2025
The Transformer architecture introduced in this paper was a major breakthrough in sequence transduction methodologies, particularly within neural machine translation (NMT) and broader natural language processing (NLP).
CFD Python: 12 steps to Navier-Stokes
March 7, 2025
We announce the public release of online educational materials for self-learners of CFD using IPython Notebooks: the CFD Python Class!
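Step 1 of the class, 1-D linear convection, gives a taste of the format. A sketch following the course's finite-difference scheme (forward in time, backward in space) with the usual square-wave initial condition; the grid sizes below are the course defaults, not anything essential:

```python
import numpy as np

# Solve u_t + c * u_x = 0 on x in [0, 2] with an upwind finite-difference scheme.
nx, nt = 41, 20
dx = 2.0 / (nx - 1)
dt, c = 0.025, 1.0   # CFL number c*dt/dx = 0.5: stable

u = np.ones(nx)
u[int(0.5 / dx):int(1.0 / dx) + 1] = 2.0  # square wave between x = 0.5 and x = 1

for _ in range(nt):
    un = u.copy()
    u[1:] = un[1:] - c * dt / dx * (un[1:] - un[:-1])

# The upwind scheme is monotone at CFL <= 1: no new extrema appear,
# the wave just moves right (with some numerical smearing).
assert 1.0 - 1e-12 <= u.min() and u.max() <= 2.0 + 1e-12
```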
tt-mlir documentation
March 6, 2025
The following document provides an overview of the TT-MLIR project, with a focus on the technical specifications of an MLIR-based compiler stack. So what exactly is an MLIR-based compiler stack?
Yizhou Shan's Home Page
March 6, 2025
This paper has a really nice Intro, pay close attention to how they lay out the storyline.
Crossing the uncanny valley of conversational voice
March 1, 2025
At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.
How to Think About TPUs
February 26, 2025
All about how TPUs work, how they're networked together to enable multi-chip training and inference, and how they limit the performance of our favorite algorithms. While this may seem a little dry, it's super important for actually making models efficient.
Programming Really Is Simple Mathematics
February 25, 2025
A re-construction of the fundamentals of programming as a small mathematical theory (PRISM) based on elementary set theory. Highlights:
• Zero axioms. No properties are assumed, all are proved (from standard set theory).
• A single concept covers specifications and programs.
• Its definition only involves one relation and one set.
• Everything proceeds from three operations: choice, composition and restriction.
• These techniques suffice to derive the axioms of classic papers on the "laws of programming" as consequences and prove them mechanically.
• The ordinary subset operator suffices to define both the notion of program correctness and the concepts of specialization and refinement.
• From this basis, the theory deduces dozens of theorems characterizing important properties of programs and programming.
• All these theorems have been mechanically verified (using Isabelle/HOL); the proofs are available in a public repository.
This paper is a considerable extension and rewrite of an earlier contribution [arXiv:1507.00723].
Tenstorrent Wormhole Series Part 1: Physicalities
February 25, 2025
A company called Tenstorrent design and sell PCIe cards for AI acceleration. At the time of writing, they've recently started shipping their Wormhole n150s and Wormhole n300s cards.
Community Highlight: Tenstorrent Wormhole Series Part 2: Which disabled rows?
February 25, 2025
An in depth look at Tenstorrent Wormhole, originally posted on corsix.org
The world's largest prediction market.
February 24, 2025
Polymarket is the world’s largest prediction market, allowing you to stay informed and profit from your knowledge by betting on future events across various topics.
neural video codecs: the future of video compression
February 17, 2025
how deep learning could rewrite the way we encode and decode video
Unnamed Document
February 17, 2025
Mastering LLM Techniques: Evaluation
February 15, 2025
Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and multifaceted nature of these systems.
Mastering LLM Inference Techniques: Inference Optimization
February 15, 2025
Learn about the most pressing challenges in LLM inference, along with some practical solutions.
Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling
February 15, 2025
As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long…
The high-return activity of raising others’ aspirations
February 12, 2025
Yesterday I had lunch with a former Ph.D student of mine, who is now highly successful and tenured at a very good school. I was reminded that, over twenty years ago, I was Graduate Director of Admissions. One of my favorite strategies was to take strong candidates who applied for Masters and also offer them […]
Tilde, my LLVM alternative
January 25, 2025
I'm Yasser and I've made it my mission to produce an alternative to LLVM, the current king of compiler backend libraries.
A WebAssembly compiler that fits in a tweet
January 25, 2025
Starting with a 192-byte one-liner that implements a Reverse Polish Notation arithmetic compiler, we'll work backward to transform it into readable JavaScript by removing one code golf trick at a time
Proof of correctness of data representation
January 25, 2025
Unnamed Document
January 25, 2025
Unveiling_DeepSeek.pdf
January 22, 2025
successful modifications since its inception, let alone large-scale validation.
Stating the problem in Lean
January 19, 2025
Note: this post was written for Lean 3; the latest version, Lean 4, is a very different language.
Turn back the clock to 2009: a confused physics major newly infatuated with math and computer science, I enrolled in MATH 273: Numbers and Proofs at the University of Calgary. This wasn’t my first encounter with mathematical proof; in first-year calculus I’d mastered rote regurgitation of delta-epsilon proofs. Despite writing out several dozen, their meaning never progressed beyond a sort of incantation I can summon to this day (for every \( \epsilon > 0 \) there exists a \( \delta > 0 \) such that…).
DeepSeek-V3 Explained: A Deep Dive into the Next-Generation AI Model
January 18, 2025
Artificial Intelligence (AI) is advancing at an unprecedented pace, and the DeepSeek-V3 model is at the forefront of this revolution. As…
Foundations of Large Language Models
January 17, 2025
This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is structured into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods. It is intended for college students, professionals, and practitioners in natural language processing and related fields, and can serve as a reference for anyone interested in large language models.
Category Theory: Lecture Notes and Online Books
January 10, 2025
The links below are to various freely (and legitimately!) available online mathematical resources for those interested in category theory at an elementary/intermediate level. There is a supplementary page, introductory readings for philosophers, with reading suggestions for those looking for the most accessible routes into category theory and/or links to philosophical discussions. A gentle introduction? My Category …
Why Futhark?
January 9, 2025
A high-performance and high-level purely functional data-parallel array programming language that can execute on the GPU and CPU.
My Account - Pozitif Teknoloji
January 6, 2025
Payment - Pozitif Teknoloji
January 6, 2025
*Please enter your order number in the payment description field; our company is not responsible for delays in wire transfers made without an order number.
Clear cache x app ios
January 4, 2025
Any way to delete the cache or app data on iPhone? - Reddit, Jul 26, 2023. X app taking up 1.
2024
547 bookmarks
Bloom filters debunked: Dispelling 30 Years of bad math with Coq!
December 27, 2024
While conceptually simple, this feature actually requires more engineering effort than one would expect - in particular, tracking the set of known malicious URLs in a practical manner turns out to be somewhat difficult.
DeepSeek-V3/DeepSeek_V3.pdf at main · deepseek-ai/DeepSeek-V3
December 26, 2024
by Marcus Hutter, David Quarel, and Elliot Catt
December 24, 2024
The book can be ordered from amazon.com / co.
Deepseek: The Quiet Giant Leading China’s AI Race
December 24, 2024
Annotated translation of its CEO's deepest interview
Demystifying Debuggers, Part 2: The Anatomy Of A Running Program
December 23, 2024
On the concepts involved in a running program. What happens, exactly, when you double click an executable file, or launch it from the command line, and it begins to execute?
Towards a Categorical Foundation of Deep Learning: A Survey
December 22, 2024
The unprecedented pace of machine learning research has led to incredible advances, but also poses hard challenges. At present, the field lacks strong theoretical underpinnings, and many important achievements stem from ad hoc design choices which are hard to justify in principle and whose effectiveness often goes unexplained. Research debt is increasing and many papers are found not to be reproducible.
This thesis is a survey that covers some recent work attempting to study machine learning categorically. Category theory is a branch of abstract mathematics that has found successful applications in many fields, both inside and outside mathematics. Acting as a lingua franca of mathematics and science, category theory might be able to give a unifying structure to the field of machine learning. This could solve some of the aforementioned problems.
In this work, we mainly focus on the application of category theory to deep learning. Namely, we discuss the use of categorical optics to model gradient-based learning, the use of categorical algebras and integral transforms to link classical computer science to neural networks, the use of functors to link different layers of abstraction and preserve structure, and, finally, the use of string diagrams to provide detailed representations of neural network architectures.
Soft question: Deep learning and higher categories
December 22, 2024
Recently, I have stumbled upon certain articles and lecture videos that use category theory to explain certain aspects of machine learning or deep learning (e.g. Cats for AI and the paper An enriched
Algebraic Databases
December 22, 2024
Databases have been studied category-theoretically for decades. The database schema---whose purpose is to arrange high-level conceptual entities---is generally modeled as a category or sketch. The data itself, often called an instance, is generally modeled as a set-valued functor, assigning to each conceptual entity a set of examples. While mathematically elegant, these categorical models have typically struggled with representing concrete data such as integers or strings.
In the present work, we propose an extension of the set-valued functor model, making use of multisorted algebraic theories (a.k.a. Lawvere theories) to incorporate concrete data in a principled way. This also allows constraints and queries to make use of operations on data, such as multiplication or comparison of numbers, helping to bridge the gap between traditional databases and programming languages.
We also show how all of the components of our model---including schemas, instances, change-of-schema functors, and queries---fit into a single double categorical structure called a proarrow equipment (a.k.a. framed bicategory).
Categorical Databases
December 22, 2024
walter
December 22, 2024
FPGAs for Software Engineers 0: The Basics
December 22, 2024
A brief introduction to FPGAs, Verilog and simulation
A note about "The Humane Representation of Thought"
December 17, 2024
A year and a half ago, on a plane, I wrote An Ill-Advised Personal Note about "Media for Thinking the Unthinkable".
BLT: Patches Scale Better Than Tokens
December 17, 2024
On Ousterhout’s Dichotomy (Oct 6, 2024)
December 17, 2024
Why are there so many programming languages? One of the driving reasons for this is that some
languages tend to produce fast code, but are a bit of a pain to use (C++), while others are a breeze
to write, but run somewhat slow (Python). Depending on the ratio of CPUs to programmers, one or the
other might be relatively more important.
The categorical abstract machine
December 17, 2024
The Cartesian closed categories have been shown by several authors to provide the right framework of the model theory of λ-calculus. The second author…
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
December 17, 2024
We present our position on the elusive quest for a general-purpose framework
for specifying and studying deep learning architectures. Our opinion is that
the key attempts made so far lack a coherent bridge between specifying
constraints which models must satisfy and specifying their implementations.
Focusing on building such a bridge, we propose to apply category theory --
precisely, the universal algebra of monads valued in a 2-category of parametric
maps -- as a single theory elegantly subsuming both of these flavours of neural
network design. To defend our position, we show how this theory recovers
constraints induced by geometric deep learning, as well as implementations of
many architectures drawn from the diverse landscape of neural networks, such as
RNNs. We also illustrate how the theory naturally encodes many standard
constructs in computer science and automata theory.
Fundamental Components of Deep Learning: A category-theoretic approach
December 17, 2024
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems, parametricity and bidirectionality: we
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
Logic and linear algebra: an introduction
December 17, 2024
We give an introduction to logic tailored for algebraists, explaining how proofs in linear logic can be viewed as algorithms for constructing morphisms in symmetric closed monoidal categories with additional structure. This is made explicit by showing how to represent proofs in linear logic as linear maps between vector spaces. The interesting part of this vector space semantics is based on the cofree cocommutative coalgebra of Sweedler.
Gemini: A Family of Highly Capable Multimodal Models
December 17, 2024
This report introduces a new family of multimodal models, Gemini, that
exhibit remarkable capabilities across image, audio, video, and text
understanding. The Gemini family consists of Ultra, Pro, and Nano sizes,
suitable for applications ranging from complex reasoning tasks to on-device
memory-constrained use-cases. Evaluation on a broad range of benchmarks shows
that our most-capable Gemini Ultra model advances the state of the art in 30 of
32 of these benchmarks - notably being the first model to achieve human-expert
performance on the well-studied exam benchmark MMLU, and improving the state of
the art in every one of the 20 multimodal benchmarks we examined. We believe
that the new capabilities of the Gemini family in cross-modal reasoning and
language understanding will enable a wide variety of use cases. We discuss our
approach toward post-training and deploying Gemini models responsibly to users
through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud
Vertex AI.
Flow Matching Guide and Code
December 17, 2024
Flow Matching (FM) is a recent framework for generative modeling that has
achieved state-of-the-art performance across various domains, including image,
video, audio, speech, and biological structures. This guide offers a
comprehensive and self-contained review of FM, covering its mathematical
foundations, design choices, and extensions. By also providing a PyTorch
package featuring relevant examples (e.g., image and text generation), this
work aims to serve as a resource for both novice and experienced researchers
interested in understanding, applying and further developing FM.
Logical Complexity of Proofs
December 17, 2024
If you cannot find proofs, talk about them. Robert Reckhow with his advisor Stephen Cook famously started the formal study of the complexity of proofs with their 1979 paper. They were interested in…
Proofs and Types
December 16, 2024
Richard Hamming - Wikipedia
December 16, 2024
Richard Wesley Hamming (February 11, 1915 – January 7, 1998) was an American mathematician whose work had many implications for computer engineering and telecommunications.
What is the "question" that programming language theory is trying to answer?
December 16, 2024
I've been interested in various topics like Combinatory Logic, Lambda Calculus, Functional Programming for a while and have been studying them. However, unlike the "Theory of Computation" which str...
Introducing Limbo: A complete rewrite of SQLite in Rust
December 16, 2024
we forked SQLite with the libSQL project. What would it be like if we just rewrote it?
TLA+ is hard to learn
December 16, 2024
I’m a fan of the formal specification language TLA+. With TLA+, you can build models of programs or systems, which helps to reason about their behavior. TLA+ is particularly useful for reason…
How hard is constraint programming?
December 16, 2024
Writing code using the Z3 SMT solver is different from typical programming, due to mixed programming models--not unlike CUDA for GPUs. Here's what to expect.
Fundamental Components of Deep Learning: A category-theoretic approach
December 16, 2024
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems, parametricity and bidirectionality: we
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
Geeks, MOPs, and sociopaths in subculture evolution
December 16, 2024
How muggles and sociopaths invade and undermine creative subcultures; and how to stop them.
Advanced programming languages
December 16, 2024
Students often ask for a recommendation on what language they should learn next.
ugh.book
December 16, 2024
Working memory - Wikipedia
December 16, 2024
Working memory is a cognitive system with a limited capacity that can hold information temporarily. [1] It is important for reasoning and the guidance of decision-making and behavior.
Working hurts less than procrastinating, we fear the twinge of starting
December 16, 2024
When you procrastinate, you're probably not procrastinating because of the pain of working. …
llama.cpp guide - Running LLMs locally, on any hardware, from scratch
December 16, 2024
Psst, kid, want some cheap and small LLMs?
GitHub - avinassh/py-caskdb: (educational) build your own disk based KV store
December 16, 2024
(educational) build your own disk based KV store. Contribute to avinassh/py-caskdb development by creating an account on GitHub.
Command Line Interface Guidelines
December 13, 2024
An open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day.
How Many Computers Are In Your Computer?
December 11, 2024
Any ‘computer’ is made up of hundreds of separate computers plugged together, any of which can be hacked. I list some of these parts.
Category theory for scientists (Old version)
December 11, 2024
There are many books designed to introduce category theory to either a
mathematical audience or a computer science audience. In this book, our
audience is the broader scientific community. We attempt to show that category
theory can be applied throughout the sciences as a framework for modeling
phenomena and communicating results. In order to target the scientific
audience, this book is example-based rather than proof-based. For example,
monoids are framed in terms of agents acting on objects, sheaves are introduced
with primary examples coming from geography, and colored operads are discussed
in terms of their ability to model self-similarity.
A new version with solutions to exercises will be available through MIT
Press.
Genie 2: A large-scale foundation world model
December 10, 2024
Generating unlimited diverse training environments for future general agents
Design Of This Website
December 9, 2024
Meta page describing Gwern.net, the self-documenting website’s implementation and experiments for better ‘semantic zoom’ of hypertext; technical decisions using Markdown and static hosting.
WilliamYi96/Awesome-Energy-Based-Models: A curated list of resources on energy-based models.
December 9, 2024
A curated list of resources on energy-based models. - WilliamYi96/Awesome-Energy-Based-Models
"CBLL, Research Projects, Computational and Biological Learning Lab, Courant Institute, NYU"
December 9, 2024
Yann LeCun's Web pages at NYU
yataobian/awesome-ebm: Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning)
December 9, 2024
Collecting research materials on EBM/EBL (Energy Based Models, Energy Based Learning) - yataobian/awesome-ebm
TuringConf
December 7, 2024
Omens of exceptional talent
December 6, 2024
Gaiseric…was a man of moderate height and lame in consequence of a fall from his horse. He was a man of deep thought and few words
I’m often asked about the signs of exceptional talent I’ve observed, probably because I spend too much time running around talking to people & observing things, instead of doing anything useful.
Patrick Collison, Sam Altman, and Tyler Cowen are the three names that come to mind when thinking about this question. Of my writing, Intelligence killed …
An Introduction to Current Theories of Consciousness
December 6, 2024
(Crosspost from my blog) • • There are few academic lists of theories of consciousness (Doerig 2020) as well as some good blog post series about specific ideas (shout out to SelfAwarePatterns), but…
Being the (Pareto) Best in the World
December 6, 2024
John Wentworth argues that becoming one of the best in the world at *one* specific skill is hard, but it's not as hard to become the best in the worl…
Greg Yang
December 5, 2024
I am currently developing a framework called Tensor Programs for understanding large neural networks.
A Century of Mathematics in America, Part I
December 4, 2024
Fastest contributed programs, grouped by programming language implementation
December 3, 2024
Charts showing benchmark program performance grouped by implementation language.
Haskell as fast as C: working at a high altitude for low level performance
December 3, 2024
After the last post about high performance, high level programming, Slava Pestov, of Factor fame, wondered whether it was generally true that “if you want good performance you have to write C…
On Competing with C Using Haskell
December 3, 2024
Mark Karpov wrote in his article on Migrating text metrics to pure Haskell how he originally did foreign calls out to C for many of the functions in his text metric package, but now ported them to Haskell when he learned that Haskell can give you performance comparable to C.
Performance
December 3, 2024
Moreover, it's often not clear if two programs which supposedly have the same functionality really do the same thing.
TS_Tutorial
December 3, 2024
Category Theory usage in Algebraic Topology
December 3, 2024
First my question:
How much category theory should someone studying algebraic topology generally know?
Motivation: I am taking my first graduate course in algebraic topology next semester, and,...
Topos Theory in a Nutshell
December 3, 2024
Okay, you wanna know what a topos is? First I'll give you a hand-wavy vague explanation, then an actual definition, then a few consequences of this definition, and then some examples.
context
December 3, 2024
Proof Explorer
December 3, 2024
Inspired by Whitehead and Russell's monumental Principia Mathematica, the Metamath Proof Explorer has over 26,000 completely worked out proofs in its main sections (and over 41,000 counting "mathboxes", which are annexes where contributors can develop additional topics), starting from the very foundation that mathematics is built on and eventually arriving at familiar mathematical facts and beyond.
An Invitation to Applied Category Theory
December 3, 2024
Abstract page for arXiv paper 1803.05316: Seven Sketches in Compositionality: An Invitation to Applied Category Theory
An Invitation to Applied Category Theory
December 3, 2024
Cambridge Core - Programming Languages and Applied Logic - An Invitation to Applied Category Theory
Introducing io_uring_spawn
December 2, 2024
The traditional mechanism for launching a program in a new process on Unix systems—forking and execing—has been with us for decades, but it is not really the most efficient of operations.
Information Theory: A Tutorial Introduction
November 29, 2024
Shannon's mathematical theory of communication defines fundamental limits on
how much information can be transmitted between the different components of any
man-made or biological system. This paper is an informal but rigorous
introduction to the main ideas implicit in Shannon's theory. An annotated
reading list is provided for further reading.
Daniel Lemire's blog
November 29, 2024
I find that there can still be a significant benefit to using csFastFloat over the .NET library: it can be about 3 times faster.
A Beginner's Guide to Vectorization By Hand: Part 3
November 29, 2024
We're continuing our expedition into the world of manual vectorization. In this part, we will explain the most common technique for vectorizing conditional code (usually referred to as if-conversion).
Competitive Programming
November 29, 2024
This is the supporting web page for a book titled: "Competitive Programming 4: The Lower Bound of Programming Contests in the 2020s" written by Steven Halim, Felix Halim, and Suhendry Effendy.
Coalescence: making LLM inference 5x faster
November 24, 2024
In this post we’re going to explore a surprising property of structured generation when working with Large Language Models (LLMs): generating structured output from an LLM can be significantly faster than generating unstructured text.
Classics in the History of Psychology -- Miller (1956)
November 21, 2024
"You and Your Research"
November 20, 2024
At a seminar in the Bell Communications Research Colloquia Series, Dr. Richard W.
Algorithms for Modern Hardware
November 19, 2024
Its intended audience is everyone from performance engineers and practical algorithm researchers to undergraduate computer science students who have just finished an advanced algorithms course and want to learn more practical ways to speed up a program than by going from O(n log n) to O(n log log n).
Creating enums at comptime
November 18, 2024
Using zig's @Type to dynamically create enums at comptime
How to get from high school math to cutting-edge ML/AI: a detailed 4-stage roadmap with links to the best learning resources that I’m aware of.
November 18, 2024
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning.
Fundamental Components of Deep Learning: A category-theoretic approach
November 18, 2024
Deep learning, despite its remarkable achievements, is still a young field.
Like the early stages of many scientific disciplines, it is marked by the
discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform
and compositional mathematical foundation. From the intricacies of the
implementation of backpropagation, through a growing zoo of neural network
architectures, to the new and poorly understood phenomena such as double
descent, scaling laws or in-context learning, there are few unifying principles
in deep learning. This thesis develops a novel mathematical foundation for deep
learning based on the language of category theory. We develop a new framework
that is a) end-to-end, b) uniform, and c) not merely descriptive, but
prescriptive, meaning it is amenable to direct implementation in programming
languages with sufficient features. We also systematise many existing
approaches, placing many existing constructions and concepts from the
literature under the same umbrella. In Part I we identify and model two main
properties of deep learning systems, parametricity and bidirectionality: we
expand on the previously defined construction of actegories and Para to study
the former, and define weighted optics to study the latter. Combining them
yields parametric weighted optics, a categorical model of artificial neural
networks, and more. Part II justifies the abstractions from Part I, applying
them to model backpropagation, architectures, and supervised learning. We
provide a lens-theoretic axiomatisation of differentiation, covering not just
smooth spaces, but discrete settings of boolean circuits as well. We survey
existing, and develop new categorical models of neural network architectures.
We formalise the notion of optimisers and lastly, combine all the existing
concepts together, providing a uniform and compositional framework for
supervised learning.
How LLVM Optimizes a Function
November 17, 2024
In some compilers the IR format remains fixed throughout the optimization pipeline, in others the format or semantics change.
PS2_and_PC_BIOS_Interface_Technical_Reference_Apr87
November 17, 2024
How 99% of C Tutorials Get it Wrong
November 17, 2024
But this article did not arise only from my own opinion. The argument I'll present here, at least in its general form, is one which programmers who I know personally and I admire a lot (e.
A Beginner's Guide to Vectorization By Hand: Part 1
November 17, 2024
The CPU vendors have been trying for a lot of time to exploit as much parallelism as they can and the introduction of vector instructions is one way to go.
Tell the Compiler What You Know
November 17, 2024
Compilers often use magic to uncover hidden mysteries of your program and optimize it aggressively.
Compiler Optimization in a Language you Can Understand
November 17, 2024
In this article, I'll explain compiler optimizations through a series of examples, focusing on what compilers do.
How Target-Independent is Your IR?
November 17, 2024
An esoteric exploration on the target independence of compiler IRs.
Bibliopolis-Book-retypeset-1984
November 12, 2024
Numerical Recipes
November 11, 2024
We are Numerical Recipes, one of the oldest continuously operating sites on the Internet.
Unpacking Intuition
November 10, 2024
Can intuition be taught? The way in which faces are recognized, the structure of natural classes, and the architecture of intuition may all be instances of the same process. The conjecture that intuition is a species of recognition memory implies ...
TCP Server in Zig - Part 5a - Poll
October 15, 2024
Using non-blocking sockets and poll to improve the scalability of our system.
6.824 Schedule: Spring 2022
October 1, 2024
Here is the tentative schedule of lectures and due dates. The lecture notes and paper questions for future dates are copies from previous years, and may change.
2305.20091
September 30, 2024
Humans in 4D: Reconstructing and Tracking Humans with Transformers
September 30, 2024
Join the discussion on this paper page
slpj-book-1987.djvu
September 30, 2024
Typing the technical interview
September 30, 2024
In the formless days, long before the rise of the Church, all spells were woven of pure causality, all actions were permitted, and death was common.
Reversing the technical interview
September 30, 2024
If you want to get a job as a software witch, you’re going to have to pass a whiteboard interview.
Hexing the technical interview
September 30, 2024
But Hacker News has read of you, in their snicker-slithing susurrential warrens, and word has spread, which is why the young man offering you a smörgåsbord of microkitchen delights looks mildly suspicious already.
Nine Rules for SIMD Acceleration of Your Rust Code (Part 1)
September 23, 2024
General Lessons from Boosting Data Ingestion in the range-set-blaze Crate by 7x
Conscious exotica
September 21, 2024
From algorithms to aliens, could humans ever understand minds that are radically unlike our own?
B-trees and database indexes
September 13, 2024
B-trees are used by many modern DBMSs. Learn how they work, how databases use them, and how your choice of primary key can affect index performance.
Safe C++
September 13, 2024
Over the past two years, the United States Government has been issuing warnings about memory-unsafe programming languages with increasing urgency.
Tutorial on Diffusion Models for Imaging and Vision
September 10, 2024
The astonishing growth of generative tools in recent years has empowered many
exciting applications in text-to-image generation and text-to-video generation.
The underlying principle behind these generative tools is the concept of
diffusion, a particular sampling mechanism that has overcome some shortcomings
that were deemed difficult in the previous approaches. The goal of this
tutorial is to discuss the essential ideas underlying the diffusion models. The
target audience of this tutorial includes undergraduate and graduate students
who are interested in doing research on diffusion models or applying these
models to solve other problems.
Async Rust can be a pleasure to work with (without `Send + Sync + 'static`)
September 9, 2024
Async Rust is powerful. And it can be a pain to work with (and learn). Async Rust can be a pleasure to work with, though, if we can do it without `Send + Sync + 'static`.
The Perfect Plan
September 3, 2024
Too often do we obsess over the perfect plan to chase our dreams, resulting in analysis paralysis. Instead of being stuck in this limbo, I've made the perfect plan for anyone to chase their dreams.
The Fast Track
September 1, 2024
In order to accelerate the development of prospective mathematical scientists, we have selected a series of textbooks one can study to reach expertise in mathematics and physics in the most efficient manner possible.
Linus Torvalds talks AI, Rust adoption, and why the Linux kernel is 'the only thing that matters'
August 24, 2024
In a wide-ranging conversation with Verizon open-source officer Dirk Hohndel, 'plodding engineer' Linus Torvalds discussed where Linux is today and where it may go tomorrow.
Intercepting and modifying Linux system calls with ptrace
August 24, 2024
Intercepting and modifying Linux system calls with ptrace
What's the big deal about Deterministic Simulation Testing?
August 24, 2024
What's the big deal about Deterministic Simulation Testing?
Zig and Emulators
August 24, 2024
Some quick Zig feedback in the context of a new 8-bit emulator project I started a little while ago:
A ToC of the 20 part linker essay
August 24, 2024
I release this message (the ToC and comments) into the public domain, no right reserved.
trading_interview_blog
August 21, 2024
`zig cc`: a Powerful Drop-In Replacement for GCC/Clang
August 15, 2024
If you have heard of Zig before, you may know it as a promising new programming language which is ambitiously trying to overthrow C as the de-facto systems language.
Zig Build System
August 15, 2024
The fundamental commands zig build-exe, zig build-lib, zig build-obj, and zig test are often sufficient.
Resources for Amateur Compiler Writers
August 14, 2024
I know entire swaths of the literature are left out, but this is a page for amateur compiler writers. Anything that I did not find practical is not listed here.
MattPD/cpplinks: A categorized list of C++ resources.
August 14, 2024
A categorized list of C++ resources. Contribute to MattPD/cpplinks development by creating an account on GitHub.
Putting the “You” in CPU
August 14, 2024
Curious exactly what happens when you run a program on your computer? Learn how multiprocessing works, what system calls really are, how computers manage memory with hardware interrupts, and how Linux loads executables.
How to Compile Your Language
August 9, 2024
The guide also covers how to create a platform-specific executable with the help of the LLVM compiler infrastructure, which all of the previously mentioned languages use for the same purpose.
Introduction to the Odin Programming Language
August 9, 2024
Preface This article is an introduction to the Odin Programming Language. It is aimed at people who know a bit of programming, but have never touched Odin. It is not a reference guide; rather, I try to keep things informal and talk about what I think are important aspects of the language. There will be some notes on differences to C/C++, as Odin in many ways tries to be a better C. If you enjoy this article and want to support me, then you can do so by becoming a patron.
Arena allocator tips and tricks
August 6, 2024
Over the past year I’ve refined my approach to arena allocation. With practice, it’s effective, simple, and fast; typically as easy to use as garbage collection but without the costs.
No Starch Press
August 6, 2024
Part 2: Portable Executable Files
August 6, 2024
bytecode interpreters for tiny computers
August 4, 2024
I've previously come to the conclusion that there's little reason for using bytecode in the modern world, except in order to get more compact code, for which it can be very effective.
How I built zig-sqlite
August 4, 2024
When you prepare a statement zig-sqlite creates a brand new type only for this prepared statement.
The Hunt for the Missing Data Type
August 3, 2024
A (directed) graph is a set of nodes, connected by arrows (edges). The nodes and edges may contain data. Here are some graphs:
All graphs made with graphviz (source)
Graphs are ubiquitous in software engineering:
Package dependencies form directed graphs, as do module imports. The internet is a graph of links between webpages. Model checkers analyze software by exploring the “state space” of all possible configurations.
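The adjacency-mapping encoding the article starts from can be shown in a few lines; this is an illustrative Python sketch with made-up package names, using depth-first search to answer the dependency question from the examples:

```python
# A tiny directed graph as an adjacency mapping: node -> list of edges.
deps = {
    "app":    ["http", "json"],   # package dependency edges
    "http":   ["socket"],
    "json":   [],
    "socket": [],
}

def reachable(graph, start):
    # Depth-first search: everything `start` transitively depends on.
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen
```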
Microfeatures I'd like to see in more languages
August 3, 2024
There are roughly three classes of language features: Features that the language is effectively designed around, such that you can't add it after the fact....
Google’s Fully Homomorphic Encryption Compiler — A Primer
August 2, 2024
Back in May of 2022 I transferred teams at Google to work on Fully Homomorphic Encryption (newsletter announcement). Since then I’ve been working on a variety of projects in the space, includ…
Will I be able to access proprietary platform APIs (e.g. Android / iOS)?
August 1, 2024
The kind of binary format being considered for WebAssembly can be natively decoded much faster than JavaScript can be parsed (experiments show more than 20× faster).
The future of Clang-based tooling
August 1, 2024
By Peter Goodman Clang is a marvelous compiler; it’s a compiler’s compiler! But it isn’t a toolsmith’s compiler. As a toolsmith, my ideal compiler would be an open book, allowing me to get to…
Fast Multidimensional Matrix Multiplication on CPU from Scratch
July 30, 2024
Numpy can multiply two 1024x1024 matrices on a 4-core Intel CPU in ~8ms. This is incredibly fast, considering this boils down to 18 FLOPs / core / cycle, with...
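The headline figure checks out as back-of-the-envelope arithmetic; the clock speed below is an assumption (~3.7 GHz is typical for a desktop Intel core under load):

```python
# Verify: 1024x1024 matmul in 8 ms on 4 cores ~= 18 FLOPs/core/cycle.
n = 1024
flops = 2 * n**3          # one multiply + one add per inner-product term
seconds = 8e-3
cores = 4
clock_hz = 3.7e9          # assumed sustained clock
flops_per_core_cycle = flops / seconds / cores / clock_hz
```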
Efficient n-states on x86 systems
July 29, 2024
The text discusses how to efficiently handle control flow in x86 systems when a flag can have multiple states beyond true and false. It explains how to use condition codes, such as testing for zero and parity, to minimize the number of instructions needed for these tests. Additionally, it touches on the challenges and limitations of using inline assembly for optimization in C programming.
Program tuning as a resource allocation problem
July 29, 2024
Program tuning involves balancing simplicity and performance while sharing cache resources among various subsystems. Optimizing one function can impact others, making it a global resource allocation problem that requires careful consideration of algorithms and their resource footprints. Better tools and metrics are needed to manage and analyze cache resource consumption effectively.
How web bloat impacts users with slow connections
July 29, 2024
Web bloat makes many websites difficult to use for people with slow internet connections and devices. Sites like Discourse and Reddit perform poorly on low-end devices, even if they seem fast on high-end ones. Improving web performance for these users is crucial, as many people rely on older, slower devices.
Files are hard
July 29, 2024
Writing files in a way that ensures their robustness is challenging due to the complexity involved. The paper discusses various issues related to file corruption and data loss, such as crash consistency, filesystem semantics, filesystem correctness, error handling, and error recovery. It highlights the differences in how different filesystems handle errors and points out bugs and inconsistencies found in popular filesystems. The paper also addresses the frequency of disk errors and data corruption, emphasizing the need for caution when writing files and the importance of using libraries or tools to ensure safety. Overall, the document emphasizes the difficulty of reasoning about file-related problems and the need for careful considerations when working with filesystems.
Ringing in a new asynchronous I/O API
July 29, 2024
The new "io_uring" interface simplifies asynchronous I/O in the Linux kernel by using two ring buffers for submission and completion queues. Applications can set up these buffers with a system call and submit I/O requests through a structured format. This method aims to reduce complaints about AIO by improving efficiency and ease of use.
applicative-mental-models
July 29, 2024
The text discusses the importance of understanding program performance for effective optimization. It emphasizes that while most optimizations may not be necessary, being aware of critical performance paths is essential. The author provides latency numbers to help programmers grasp the impact of different operations on performance.
Optimizing subroutines in assembly language
July 29, 2024
Optimizing subroutines in assembly language involves various techniques such as using inline assembly in a C++ compiler, separating code using MMX registers from code using ST registers, and understanding different register sizes and memory operands. It is important to consider the use of instruction prefixes, intrinsic functions for vector operations, and accessing class and structure members efficiently. Additionally, preventing false dependences, aligning loop and subroutine entries, and optimizing instruction sizes can improve performance. However, it is crucial to note that these optimizations are processor-specific and may vary depending on the target platform.
Brian Robert Callahan
July 29, 2024
This blog post starts a series on creating programs that demystify how programs work. The first program is a disassembler that reads bytecode and converts it into assembly language, while a future post will cover creating an assembler. The disassembler uses a table of mnemonics and instruction sizes to print out the corresponding assembly instructions from bytecode.
QBE vs LLVM
July 29, 2024
QBE and LLVM are both compiler backends, but QBE is a smaller, more accessible project aimed at amateur language designers. While LLVM is feature-rich and complex, QBE focuses on simplicity and efficiency, making it easier to use for quick projects. QBE provides straightforward operations and a cleaner intermediate language, reducing the complexity often found in LLVM.
Recent presentations and papers
July 29, 2024
Andi Kleen's work focuses on improving Linux performance through various techniques like hardware monitoring and profiling. He has presented on topics such as lock elision, multi-core scalability, and error handling in the Linux kernel. His contributions include discussions on modern CPU performance, tools for Linux development, and enhancements for energy efficiency.
brotli-2015-09-22
July 29, 2024
How long does it take to make a context switch?
July 29, 2024
Context switching times vary significantly across different Intel CPU models, with more expensive CPUs generally performing better. The performance can be greatly affected by cache usage and thread migration between cores, leading to increased costs when tasks are switched. Optimizing the number of threads to match the number of hardware threads can improve CPU efficiency and reduce context switching overhead.
Ghostty Devlog 001
July 29, 2024
Ghostty is a terminal emulator developed as a side project. In this devlog, the author shares details about the tech stack behind Ghostty, including its cross-platform capabilities and GPU acceleration. The devlog also introduces two features: automatic shell integration injection and auto-italicize fonts. The shell integration feature improves prompt redrawing, working directory reporting, and active process detection, while the auto-italicize fonts feature fixes a bug and adds the ability to skew regular fonts to create fake italics. The devlog concludes by inviting readers to follow the author on social media for updates and future devlogs.
Tiled Matrix Multiplication
July 29, 2024
Tiled matrix multiplication is an efficient algorithm used on GPUs that reduces memory access by utilizing shared memory. By organizing threads into blocks, each thread can perform calculations more quickly and with fewer memory accesses. This method is important for improving performance in tasks like graphics rendering and machine learning.
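The loop structure is the whole idea; here is a minimal Python sketch (plain lists standing in for GPU shared memory), where each tile of A and B is reused while it is "hot":

```python
def matmul_tiled(A, B, n, tile=4):
    # C = A @ B on n x n row-major matrices, computed tile by tile so a
    # small block of A and B is reused many times before moving on.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C
```

On a GPU the three outer loops become the block grid and the inner reuse happens in shared memory; on a CPU the same shape keeps tiles in cache.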
Rust Atomics and Locks
July 29, 2024
This book by Mara Bos explores Rust programming language's concurrency features, including atomics, locks, and memory ordering. Readers will gain a practical understanding of low-level concurrency in Rust, covering topics like mutexes and condition variables. The book provides insights on implementing correct concurrency code and building custom locking and synchronization mechanisms.
Compiler Backend
July 29, 2024
The QBE compiler backend is designed to be a compact yet high-performance C embeddable backend that prioritizes correctness, simplicity, and user-friendliness. It compiles on various x64 operating systems and boasts features like IEEE floating point support, SSA-based intermediate language, and quick compilation times. While currently limited to x64 platforms, plans include ARM support and further enhancements. The backend has been successfully utilized in various projects, showcasing its adaptability and effectiveness in compiler development.
Vale's Memory Safety Strategy: Generational References and Regions
July 29, 2024
Vale's memory safety strategy uses generational references to manage memory without relying on traditional methods like garbage collection. Each reference stores a "generation" ID, and before accessing an object, a check ensures the ID matches the object's current generation. This approach allows for efficient memory management while maintaining safety, reducing overhead significantly compared to other methods.
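The mechanism is easy to sketch. This is illustrative Python, not Vale: every slot carries a generation counter, a reference remembers the generation it was created under, and the check fails if the slot has since been freed:

```python
class Heap:
    # Toy model of generational references: slot generations invalidate
    # stale references instead of a garbage collector tracing them.
    def __init__(self, size):
        self.values = [None] * size
        self.gens = [0] * size

    def alloc(self, slot, value):
        self.values[slot] = value
        return (slot, self.gens[slot])   # a "generational reference"

    def free(self, slot):
        self.gens[slot] += 1             # all outstanding refs go stale

    def deref(self, ref):
        slot, gen = ref
        if self.gens[slot] != gen:       # the generation check
            raise RuntimeError("use after free detected")
        return self.values[slot]
```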
Introduction
July 29, 2024
Wait-freedom ensures that each thread can progress independently, executing operations in a fixed number of steps without being blocked by others. Lock-freedom allows the system to make overall progress, but individual threads might still get stuck. Obstruction-freedom means a thread can only progress without interference from others, making it a weaker guarantee than lock-freedom.
Cache-Oblivious Algorithms
July 29, 2024
Cache-oblivious algorithms are designed to use processor caches efficiently without needing to know specific cache details. They work by dividing data into smaller parts, allowing more computations to happen in cache and reducing memory access. This leads to better performance, especially in parallel algorithms, by minimizing shared memory bottlenecks.
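The standard example is recursive matrix transposition: splitting on the larger dimension keeps subproblems square-ish, so at some recursion depth each block fits in cache, whatever the cache size happens to be. A minimal Python sketch:

```python
def transpose(A, B, r0, r1, c0, c1):
    # Cache-obliviously write block (r0:r1, c0:c1) of A into B transposed.
    if (r1 - r0) * (c1 - c0) <= 16:      # small base case: do it directly
        for i in range(r0, r1):
            for j in range(c0, c1):
                B[j][i] = A[i][j]
    elif r1 - r0 >= c1 - c0:             # split the larger dimension
        m = (r0 + r1) // 2
        transpose(A, B, r0, m, c0, c1)
        transpose(A, B, m, r1, c0, c1)
    else:
        m = (c0 + c1) // 2
        transpose(A, B, r0, r1, c0, m)
        transpose(A, B, r0, r1, m, c1)
```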
A Memory Allocator
July 29, 2024
A memory allocator is software that manages dynamic memory allocation in programs, providing functions like malloc(), free(), and realloc(). This particular allocator aims to minimize memory wastage and improve efficiency, and it is widely used in various systems, including Linux. It employs techniques like coalescing freed chunks and supports memory mapping to enhance performance and reduce fragmentation.
Cramming: Training a Language Model on a Single GPU in One Day
July 29, 2024
Recent trends in language modeling have focused on increasing performance
through scaling, and have resulted in an environment where training language
models is out of reach for most researchers and practitioners. While most in
the community are asking how to push the limits of extreme computation, we ask
the opposite question: How far can we get with a single GPU in just one day?
We investigate the downstream performance achievable with a transformer-based
language model trained completely from scratch with masked language modeling
for a single day on a single consumer GPU. Aside from re-analyzing nearly all
components of the pretraining pipeline for this scenario and providing a
modified pipeline with performance close to BERT, we investigate why scaling
down is hard, and which modifications actually improve performance in this
scenario. We provide evidence that even in this constrained setting,
performance closely follows scaling laws observed in large-compute settings.
Through the lens of scaling laws, we ...
The MiniPile Challenge for Data-Efficient Language Models
July 29, 2024
The MiniPile Challenge introduces a new dataset for pre-training language models, containing 1 million documents filtered for quality. It aims to reduce the need for large computational resources while still achieving competitive performance on language tasks. The research shows that models pre-trained on MiniPile perform only slightly worse than those trained on much larger datasets.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
July 29, 2024
The authors present a method for training large text-to-image diffusion models on a very low budget. They use a technique called deferred masking to minimize performance loss while reducing computational costs. Their approach achieves high-quality results at a fraction of the cost compared to existing models, demonstrating the potential for democratizing AI training.
1024cores
July 29, 2024
Dmitry Vyukov shares information on synchronization algorithms, multicore design patterns, and high-performance computing on his website, 1024cores.net. He focuses on shared-memory systems and does not cover topics like clusters or GPUs. New content is added regularly, and readers can subscribe for updates.
Implementing interactive languages
July 28, 2024
Implementing an interactive language requires considering both compile-time and run-time performance. Traditional switch-based bytecode interpreters are easy to implement but have lower run-time performance compared to optimizing compilers. A sweet spot in performance can be found by aiming for combined compile-time and run-time performance within a certain range. Various options for implementing fast interpreters, existing compilers like LLVM and Cranelift, custom compilers, and using WebAssembly as a backend are discussed. The idea of having two backends for a language to support quick startup and aggressive optimization is also explored. There are still many unknowns and further research is needed to determine the feasibility and performance of different approaches.
Pointers Are Complicated, or: What's in a Byte?
July 28, 2024
The document explains the complexities of pointers in low-level programming languages like C++ and Rust, debunking the misconception that pointers are simple integers. It delves into examples showing how assumptions about pointers can lead to undefined behavior and how pointer arithmetic can be tricky. The text proposes a model where a pointer is a pair of an allocation ID and an offset, rather than just an integer. Additionally, it discusses the challenges of representing bytes in memory, especially when dealing with uninitialized memory and the need for a more nuanced byte representation to ensure program correctness.
Three Architectures for a Responsive IDE
July 28, 2024
The text discusses three architectures for a responsive IDE: indexing on a per-file basis, using a FQN index for completion, and a query-based compiler approach. Each approach has its own challenges and benefits, such as handling macro expansions and managing dependencies efficiently to ensure fast performance.
How a Zig IDE Could Work Feb 10, 2023
July 28, 2024
The author discusses how to build an Integrated Development Environment (IDE) for the Zig programming language, which has unique features like a simple syntax but also complex compile-time evaluation. The IDE needs to handle incomplete code and provide immediate feedback while managing rapid code changes. The post explores various strategies for efficiently processing code, such as using abstract interpretation and optimizing compilation to focus only on necessary parts of the codebase.
Properly Testing Concurrent Data Structures Jul 5, 2024
July 28, 2024
The article discusses how to effectively test concurrent data structures by using managed threads that can be paused and resumed. It explains the importance of controlling thread execution to avoid issues like race conditions while executing random operations. The author emphasizes the need for proper synchronization mechanisms to ensure that only one thread is active at a time during tests.
Parse, don’t validate
July 28, 2024
The text discusses the importance of parsing over validating in Haskell to prevent errors and enhance code reliability by using strong argument types. Parsing upfront helps maintain consistency and avoids potential issues with partial input processing, demonstrating the benefits of type-driven design in Haskell programming. The text also touches on the subjective nature of programming languages, highlighting differing perceptions of Haskell and the challenges faced by learners in navigating diverse opinions.
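The article's Haskell example (a non-empty list) translates directly; this is an illustrative Python sketch where `NonEmpty` is a made-up type that cannot represent emptiness, so functions over it need no empty case:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NonEmpty:
    # Parse once at the boundary; the type then carries the proof.
    head: object
    tail: tuple

    @staticmethod
    def parse(xs):
        if not xs:
            raise ValueError("empty list")
        return NonEmpty(xs[0], tuple(xs[1:]))

def first(ne: NonEmpty):
    return ne.head      # total: no empty case can reach here
```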
Too Fast, Too Megamorphic: what influences method call performance in Java?
July 28, 2024
The performance of method calls in Java can be improved through techniques like inlining and using inline caches. Monomorphic calls, where only one method can be invoked, are the fastest, while bimorphic and megamorphic calls are slower due to increased lookup costs. The study highlights that simply adding the "final" keyword or overriding methods does not significantly enhance performance.
The Black Magic of (Java) Method Dispatch
July 28, 2024
The content shows code execution percentages for different operations within a program. It includes instructions for handling different coders, with comparisons and jumps based on coder values. The code includes sections like the main entry point, epilogue, handling other coders, and specific coder cases like Coder1 and Coder2.
Why null sucks, even if it's checked
July 28, 2024
The article discusses the problems with using null in programming languages like Kotlin and C#, highlighting that null can lead to confusion and errors. It argues that null is not an extensible solution for representing absence of value and suggests using sum types or optional types instead. The author believes that languages should focus on improving optional types rather than trying to make null safer.
Unnamed Document
July 28, 2024
Resources for Building Programming Languages
July 28, 2024
The article shares resources for learning how to create programming languages, focusing on Rust and C. It highlights the book "Crafting Interpreters," which provides practical insights into building interpreters using different programming approaches. The author also discusses their personal experience building a language and the tools they've found helpful, like LLVM and Cranelift.
Little 'Big Ideas' in Programming Language Design
July 28, 2024
Colin Davis discusses "little big ideas" in programming language design, focusing on the balance between innovative features and conventional choices. He highlights Mojo and Go as examples, noting how Mojo combines modern improvements with familiar concepts, while Go prioritizes simplicity and a strong ecosystem. Davis suggests that small design decisions, like memory management and parameter passing, can greatly enhance a language's usability and performance.
Computer Networking: A Top-Down Approach
July 27, 2024
Jim Kurose and Keith Ross are prominent computer science professors with extensive experience in networking and related fields. They have received multiple awards for their teaching and research, and both have held leadership roles in academic and professional organizations. Their work focuses on topics like network protocols, security, and multimedia communication.
Using Uninitialized Memory for Fun and Profit Posted on Friday, March 14, 2008.
July 27, 2024
A clever trick involves using uninitialized memory to improve performance in certain programming situations by representing sparse sets efficiently with two arrays that point at each other. This technique allows for fast constant-time operations for adding, checking, and clearing elements in the set, making it a valuable tool for optimizing algorithms and data structures. The sparse set representation is especially useful for scenarios where speed is critical, such as in compiler optimizations and graph traversal algorithms.
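The two-arrays-pointing-at-each-other structure is short enough to show whole. A Python sketch (Python must initialize its lists, but the point survives: correctness never depends on what `sparse` initially contains):

```python
class SparseSet:
    # Sparse set over integers 0..universe-1 with O(1) add, contains, clear.
    def __init__(self, universe: int):
        self.n = 0
        self.dense = [0] * universe   # dense[0:n] lists the members
        self.sparse = [0] * universe  # sparse[v] = index of v in dense

    def contains(self, v: int) -> bool:
        i = self.sparse[v]            # may be garbage; the check below
        return i < self.n and self.dense[i] == v   # is still sound

    def add(self, v: int) -> None:
        if not self.contains(v):
            self.dense[self.n] = v
            self.sparse[v] = self.n
            self.n += 1

    def clear(self) -> None:
        self.n = 0                    # O(1): just forget the members
```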
Zip Files All The Way Down
July 27, 2024
The text discusses creating self-reproducing programs and files like zip files that can decompress to themselves. It explores using Lempel-Ziv compression for self-reproduction and the challenges of translating these concepts into real opcode encodings like DEFLATE used in gzip and zip files. The ultimate goal is to create a zip file that contains a larger copy of itself recursively, creating a chain of expanding zip files.
UTF-8: Bits, Bytes, and Benefits Posted on Friday, March 5, 2010.
July 27, 2024
UTF-8 is a straightforward way to encode Unicode code points into a byte stream, and understanding its inner workings is key to leveraging its benefits. Key properties of UTF-8 include preserving ASCII files, ensuring ASCII bytes are represented as themselves, and requiring code points to be encoded using the shortest possible sequence. The encoding is self-synchronizing, facilitating substring searches and making it compatible with most programs that handle 8-bit files safely. While some tools may need modification to handle UTF-8, it is increasingly becoming the standard encoding due to its practical advantages and simple design.
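The shortest-form encoding rule is compact enough to write out; a minimal Python encoder for a single code point, checked against the built-in codec:

```python
def utf8_encode(cp: int) -> bytes:
    # Shortest-form UTF-8 encoding of one code point (no surrogate checks).
    if cp < 0x80:                      # 1 byte: 0xxxxxxx (ASCII unchanged)
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12, 0x80 | cp >> 6 & 0x3F,
                      0x80 | cp & 0x3F])
    return bytes([0xF0 | cp >> 18,     # 4 bytes for the astral planes
                  0x80 | cp >> 12 & 0x3F,
                  0x80 | cp >> 6 & 0x3F,
                  0x80 | cp & 0x3F])
```

The leading byte's high bits announce the sequence length and every continuation byte starts with `10`, which is what makes the encoding self-synchronizing.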
Minimal Boolean Formulas
July 27, 2024
The post discusses how to compute the minimum number of AND and OR operators needed for Boolean functions with five variables. It describes the author's program that efficiently calculates this minimum for various functions while also improving algorithms for speed. The findings contribute to understanding the complexity of Boolean functions and their representations.
Hacking the OS X Kernel for Fun and Profiles Posted on Tuesday, August 13, 2013.
July 27, 2024
The article discusses a bug in the OS X kernel related to how profiling signals are delivered in multithreaded processes. It explains that the kernel incorrectly sends the SIGPROF signal to the entire process instead of the specific running thread. The author outlines a fix involving a small edit to the kernel code to ensure the signal is sent to the correct thread.
How To Build a User-Level CPU Profiler Posted on Thursday, August 8, 2013.
July 27, 2024
The text discusses how the pprof tool simplifies CPU profiling for C++ and Go programs by utilizing hardware timers and the operating system. Profiling information is gathered through hardware interrupts, providing insights into a program's performance and resource usage. By moving profiling logic to user-level timers, programs can customize and enhance profiling capabilities without kernel changes.
An Encoded Tree Traversal
July 27, 2024
The text discusses different ways to traverse binary trees and how these methods can be generalized to k-ary trees. It highlights a new ordering for traversing k-ary trees that results in a regular numbering pattern, which is not present in the traditional methods. The author seeks references or examples of this k-ary-coded traversal order, which he has not yet found.
Our Software Dependency Problem
July 27, 2024
The text discusses the risks and benefits of using software dependencies in programming. It emphasizes the importance of understanding, managing, and monitoring dependencies to prevent potential issues like bugs and security vulnerabilities. The article highlights the need for developers to establish best practices for effectively utilizing dependencies in their projects.
The Magic of Sampling, and its Limitations Posted on Saturday, February 4, 2023.
July 27, 2024
Sampling can help estimate the percentage of items with a specific trait accurately. The number of samples taken greatly affects the accuracy of the estimate. To get precise estimates, all items must have an equal chance of being selected during sampling.
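A quick sketch of the idea (the dataset and trait here are made up): with n uniform samples the standard error of an estimated proportion p is sqrt(p(1-p)/n), so a couple of thousand samples already pin the answer down to a few percentage points:

```python
import random

def estimate_fraction(items, has_trait, n_samples, rng):
    # Uniform sampling with replacement; every item is equally likely.
    hits = sum(has_trait(rng.choice(items)) for _ in range(n_samples))
    return hits / n_samples

rng = random.Random(0)                  # seeded for reproducibility
items = list(range(10_000))             # true fraction of evens: 0.5
est = estimate_fraction(items, lambda x: x % 2 == 0, 2000, rng)
```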
Running the “Reflections on Trusting Trust” Compiler Posted on Wednesday, October 25, 2023.
July 27, 2024
The text discusses how to modify a C compiler to insert a backdoor into a program without leaving traces in the source code. It explains that the backdoor can be detected because the compiler's size increases each time it compiles itself. Finally, it highlights the importance of using trusted compilers to prevent hidden backdoors in modern software development.
Improving the Font Pipeline
July 26, 2024
To improve the font pipeline, consider how to efficiently choose and render glyphs for different languages, including handling ligatures and memory constraints. You may need to create texture atlases for various glyphs while ensuring new translations are incorporated. Finally, optimize rendering to avoid blurriness and ensure smooth performance across different character sets.
Easy Scalable Text Rendering on the GPU
July 26, 2024
This text explains a fast and memory-efficient technique for rendering text on the GPU without using traditional methods like signed distance fields. It uses triangles to fill in pixels inside the glyph and supports subpixel anti-aliasing for crisp text on LCD screens. The technique is resolution-independent, simple to implement, and can be extended to enhance rendering quality.
Adventures in Text Rendering: Kerning and Glyph Atlases
July 26, 2024
Text rendering involves converting vector glyphs to bitmaps, positioning them on screen, and optimizing performance by using glyph atlases. Glyph atlases store rasterized glyphs efficiently, allowing for sub-pixel alignment and improved rendering quality. This approach balances performance and quality in text rendering for different types of fonts.
Exploring the Power of Negative Space Programming
July 25, 2024
Negative space programming helps improve code by defining what it should not do, making it more robust and clear. By using constraints and assertions, developers can catch errors early and enhance security. This approach also promotes simplicity, making the code easier to maintain and understand.
CompilerTalkFinal
July 25, 2024
The content discusses various compilers and their features, including Clang, GCC, V8, CakeML, Chez Scheme, and more. It also touches on the history of interpreters and compilers, with examples like ENIAC and the first compiler developed by Grace Hopper. Different approaches to compilation and interpretation are highlighted, showcasing the evolution of compiler technology.
Graydon Hoare: 21 compilers and 3 orders of magnitude in 60 minutes
July 25, 2024
Graydon Hoare's talk explains different approaches to building compilers, from traditional giants to more efficient variants. He highlights the importance of using compiler-friendly languages and theory-driven meta-languages. The presentation covers key concepts like sophisticated partial evaluation and implementing compilers directly by hand.
p75-hoare
July 25, 2024
The author recounts experiences in designing a computer programming language and issues a warning about language complexity. Despite challenges, a subset of the language was successfully implemented. The author emphasizes the importance of simplicity and reliability in programming languages for critical applications.
Updating the Go Memory Model
July 23, 2024
The Go memory model needs updates to clarify how synchronization works and to endorse race detectors for safer concurrency. It suggests adding typed atomic operations and possibly unsynchronized atomics to improve program correctness and performance. The goal is to ensure that Go programs behave consistently and avoid data races, making them easier to debug.
Programming Language Memory Models (Memory Models, Part 2) Posted on Tuesday, July 6, 2021. PDF
July 23, 2024
Modern programming languages use atomic variables and operations to help synchronize threads and prevent data races. This ensures that programs run correctly by allowing proper communication between threads without inconsistent memory access. All major languages, like C++, Java, and Rust, support sequentially consistent atomics to simplify the development of multithreaded programs.
Hardware Memory Models (Memory Models, Part 1), posted June 29, 2021
July 23, 2024
This text discusses hardware memory models, focusing on how different processors handle memory operations and maintain order. It explains the concept of sequential consistency, where operations are executed in a predictable order, and contrasts it with more relaxed models like those used in ARM and POWER architectures. The author highlights the importance of synchronization to avoid data races in concurrent programming.
Baby Steps to a C Compiler
July 23, 2024
Writing a simple compiler can help you understand how computers work. Start with a minimal project that compiles a small subset of a language, and then gradually add more features. This approach makes learning about compilers and programming enjoyable and rewarding.
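The post's "minimal project" idea can be sketched in a few lines. This is a hypothetical illustration in Python rather than the post's own code: a "compiler" for the single-statement subset `int main() { return N; }` that emits x86-64 assembly text.

```python
import re

def compile_return(source: str) -> str:
    """Compile the tiny subset `int main() { return <int>; }` to x86-64 asm."""
    m = re.match(r"\s*int\s+main\s*\(\s*\)\s*\{\s*return\s+(\d+)\s*;\s*\}\s*", source)
    if m is None:
        raise SyntaxError("program not in the supported subset")
    value = int(m.group(1))
    return (
        ".globl main\n"
        "main:\n"
        f"    movl ${value}, %eax\n"   # return value goes in %eax
        "    ret\n"
    )

print(compile_return("int main() { return 42; }"))
```

From here, the incremental approach the post recommends is to grow the regex into a real lexer and parser, one language feature at a time.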
Kernel Programming Guide
July 23, 2024
Essential information for programming in the OS X kernel. Includes a high-level overview.
Tiny Tapeout
July 23, 2024
Tiny Tapeout is a project that helps people easily and affordably create their own chip designs. It offers resources for beginners and advanced users, along with a special price for submissions. Join the community to learn and share your designs before the deadline on September 6th.
Why Pascal is Not My Favorite Programming Language
July 23, 2024
Pascal is not recommended for serious programming due to limitations in its standard form. The language's strict typing and lack of features like separate compilation make it challenging for complex projects. Pascal is better suited for educational purposes rather than practical programming tasks.
What Color is Your Function?
July 23, 2024
Functions in a programming language can be either red or blue, affecting how they are called and used. Red functions are asynchronous and typically more complex to work with than blue functions. The choice between red and blue functions can impact code organization and maintainability.
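The red/blue asymmetry is easy to demonstrate with Python's asyncio (the essay itself is language-agnostic): a red (async) function can call a blue (sync) one directly, but going the other way requires an event loop to drive the coroutine.

```python
import asyncio

def blue() -> str:          # sync ("blue"): callable from anywhere
    return "blue"

async def red() -> str:     # async ("red"): must be awaited
    await asyncio.sleep(0)  # yield to the event loop
    return "red calling " + blue()   # red -> blue is trivial

# blue -> red is not: something must run an event loop for you
result = asyncio.run(red())
print(result)  # red calling blue
```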
What is an Invariant? Oct 6, 2023
July 22, 2024
Invariants are properties that hold true during the evolution of a system, helping to ensure correct behavior in programming. They can simplify reasoning about code, whether it’s for small algorithms or larger systems. By clearly defining invariants, programmers can create robust code and manage complex systems effectively.
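As a small illustration of reasoning with an invariant (not from the post itself), here is binary search in Python, where stating the loop invariant makes the correctness argument almost mechanical:

```python
def binary_search(xs, target):
    """Return an index of target in sorted xs, or -1.

    Invariant: if target is in xs, its index lies in [lo, hi).
    Each iteration preserves the invariant while shrinking hi - lo,
    so termination plus the invariant gives correctness.
    """
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        elif xs[mid] < target:
            lo = mid + 1   # target, if present, is right of mid
        else:
            hi = mid       # target, if present, is left of mid
    return -1
```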
Chess-GPT's Internal World Model
July 22, 2024
The blog post discusses how a GPT model trained on chess games learns to predict moves and track the board state without being explicitly given the rules. It successfully classified chess pieces with high accuracy and estimated player skill levels based on game moves. The findings suggest that models trained on strategic games can effectively learn complex tasks through pattern recognition.
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
July 22, 2024
Researchers trained a chess-playing language model to understand the game without prior knowledge, focusing on how it represents the board state. They found that the model not only learned the board's layout but also estimated player skill, which helped it predict the next move better. By incorporating a player skill vector, the model's win rate improved significantly.
Manipulating Chess-GPT's World Model
July 22, 2024
The author explores how Chess-GPT, a language model for chess, can improve its performance by manipulating its internal understanding of player skill and board state. By using linear probes and skill interventions, the model's chess-playing ability was significantly enhanced, especially in games with random initializations. The findings suggest that Chess-GPT learns a deeper understanding of chess rather than just memorizing patterns.
Crafting an Interpreter in Zig - part 1
July 22, 2024
The author is learning Zig by implementing an interpreter for the Lox programming language, inspired by the book "Crafting Interpreters." They are documenting their journey, focusing on interesting aspects of Zig and how it differs from C. So far, they have enjoyed the process, particularly the simplicity and power of Zig's generic programming.
Teach Yourself Programming in Ten Years
July 21, 2024
The text dispels the misconception that programming can be learned quickly, arguing that true expertise takes about ten years of dedicated practice. It stresses hands-on experience, interacting with other programmers, and working on varied projects, concluding that mastery comes from practical application and sustained learning over time rather than from books promising shortcuts.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
July 21, 2024
The text discusses the challenges and considerations of floating-point arithmetic in computer science. It emphasizes the importance of rounding in floating-point calculations and the implications of different precision levels. Additionally, it highlights the need for careful implementation to ensure correctness and accuracy in programs that rely on floating-point arithmetic.
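The paper's central point about rounding can be seen in any language with IEEE 754 doubles; a quick Python demonstration (not from the paper):

```python
import math

# 0.1 has no exact binary representation, so each literal is rounded
print(0.1 + 0.2 == 0.3)        # False
print(f"{0.1 + 0.2:.17f}")     # 0.30000000000000004

xs = [0.1] * 10
print(sum(xs) == 1.0)          # False: rounding error accumulates
print(math.fsum(xs) == 1.0)    # True: compensated summation
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance
```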
The Development of the C Language*
July 20, 2024
The paper discusses the development and influences of the C programming language, highlighting its creation at Bell Labs and transition from the B language. C's simplicity, efficiency, and widespread adoption across various platforms and architectures are emphasized, showcasing its enduring stability and usefulness in software development. Despite its quirks and historical origin, C has proven to be a powerful and versatile language for programmers worldwide.
Class Warfare
July 20, 2024
The text discusses a woman's conversation about company politics and self-interest, highlighting a zero-sum mentality within organizations. It emphasizes the need to shift away from this mindset and focus on creating value instead. The author suggests that combating this mentality starts with internal change and encourages individuals to reject zero-sum thinking for long-term benefit.
Ownership
July 20, 2024
A Note About Zig Books for the Zig Community
July 18, 2024
The author discusses the idea of writing a Zig book and shares personal plans for self-publishing their own book. They weigh the pros and cons of working with a publisher versus self-publishing, emphasizing the importance of considering creative freedom and revenue sharing. The author encourages those interested in writing a Zig book to carefully evaluate their options, noting that the Zig community values learning materials and support.
Your Starting Point!
July 17, 2024
The text discusses the concepts of three-dimensional objects and how they are represented in two dimensions for computer graphics. It explains the process of projecting 3D points onto a canvas to create images. The importance of geometry and mathematics in computer graphics, particularly in defining objects and creating images, is emphasized.
Zig Interfaces for the Uninitiated, an update
July 17, 2024
The post discusses a new idiom for runtime polymorphism in Zig, focusing on using fat pointers instead of @fieldParentPtr. It provides a step-by-step guide on creating a formal Iterator interface and implementing it with an example range iterator. The drawbacks of this pattern include potential performance issues and the requirement for the original implementor to remain alive for the interface to function correctly.
Zig Interfaces for the Uninitiated
July 17, 2024
The text discusses how to create and implement generic iterators in Zig using interfaces like `Iterator` and `Range`.
It demonstrates how to use these iterators to iterate over ranges of values and provides examples of ascending, descending, and skipping ranges.
Additionally, it introduces a function `fold` to apply a function to successive elements in an iterator, showcasing Zig's runtime polymorphism for data structures.
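The same `Range`-plus-`fold` pattern translates directly to Python's iterator protocol; a rough analogue of the post's Zig code (names and signatures here are this sketch's, not the post's):

```python
from functools import reduce

class Range:
    """Iterate over [start, end) with a signed step, like the post's range iterator."""
    def __init__(self, start, end, step=1):
        self.cur, self.end, self.step = start, end, step
    def __iter__(self):
        return self
    def __next__(self):
        done = self.cur >= self.end if self.step > 0 else self.cur <= self.end
        if done:
            raise StopIteration
        v = self.cur
        self.cur += self.step
        return v

def fold(f, init, it):
    """Apply f to successive elements, threading an accumulator through."""
    return reduce(f, it, init)

total = fold(lambda acc, x: acc + x, 0, Range(0, 10, 2))  # 0+2+4+6+8
print(total)  # 20
```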
Exploring Compile-Time Interfaces in Zig
July 17, 2024
Zig is a programming language with active community support and a focus on efficient, reusable software development. Interfaces in Zig define a blueprint for classes to implement specific methods, promoting code abstraction and flexibility. Compile-time interfaces in Zig optimize code structure by resolving methods during compilation for efficient program execution.
Aro - a C compiler
July 17, 2024
Aro is a C compiler created as an alternative to Zig's compiler. It includes the aro module for the compiler and a language-agnostic aro_backend module for translating code into machine code. Aro uses self-hosted backends from the Zig compiler for optimization.
Database Systems
July 17, 2024
This course at CMU covers database management systems, including data models, query languages, storage architectures, and more. It uses case studies to show real-world applications and is suitable for students with basic systems programming skills. The course also thanks companies for their support in equipment donations and course development.
Discovering and exploring mmap using Go
July 17, 2024
Memory-mapped files allow programs to access disk data larger than available memory. By using mmap in Go, you can map a file directly into memory for easier manipulation. Virtual memory techniques, like mmap, can help solve memory limitations in handling large files efficiently.
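The post works in Go, but the idea is language-neutral; a minimal sketch with Python's standard-library `mmap` shows the same mapping of a file directly into memory:

```python
import mmap, os, tempfile

# Create a small file to map (the post maps real data files)
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, mmap")
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # length 0 maps the whole file
        print(mm[:5])        # reads go through memory: b'hello'
        mm[0:5] = b"HELLO"   # in-place writes propagate back to the file

with open(path, "rb") as f:
    data = f.read()
print(data)                  # b'HELLO, mmap'
os.unlink(path)
```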
But how, exactly, databases use mmap?
July 17, 2024
Databases use memory-mapped files like mmap to handle data on disk larger than available memory. Examples include SQLite, LevelDB, Lucene, LMDB, and MongoDB. By understanding how mmap is used, we can grasp how databases efficiently read and write data from disk.
re: How memory mapped files, filesystems and cloud storage works
July 17, 2024
Kelly discusses the challenges of memory-mapped files and cloud storage in response to a comment about space reservation in Voron. Cloud providers may allocate more space than needed, leading to unexpected charges and unreliable data handling. Testing reveals issues with sparse files and memory mapping in cloud scenarios, highlighting the importance of understanding storage limitations.
Implementing a file pager in Zig
July 17, 2024
Implementing a file pager in Zig involves delaying disk writes until a threshold is reached. Two eviction strategies include least recently used and least frequently used models. Prioritizing pages based on usage can help optimize performance.
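The least-recently-used strategy mentioned above can be sketched in a few lines; this is an illustrative Python model (the post implements it in Zig against a real file), with a dict standing in for disk:

```python
from collections import OrderedDict

class Pager:
    """Tiny page cache with LRU eviction; dirty pages are written back on eviction."""
    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing          # dict standing in for the file on disk
        self.cache = OrderedDict()      # page_no -> bytes, in LRU order

    def get(self, page_no):
        if page_no in self.cache:
            self.cache.move_to_end(page_no)            # mark as recently used
        else:
            if len(self.cache) >= self.capacity:
                victim, page = self.cache.popitem(last=False)  # evict the LRU page
                self.backing[victim] = page                    # delayed write-back
            self.cache[page_no] = self.backing.get(page_no, b"\0" * 4096)
        return self.cache[page_no]

backing = {0: b"a" * 4096, 1: b"b" * 4096, 2: b"c" * 4096}
pager = Pager(capacity=2, backing=backing)
pager.get(0); pager.get(1); pager.get(0); pager.get(2)   # evicts page 1
print(sorted(pager.cache))  # [0, 2]
```

A least-frequently-used variant would track a use count per page instead of recency order.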
Criticizing Hare language approach for generic data structures
July 17, 2024
The blog criticizes the Hare language approach for not providing generic data structures like hash maps in its standard library. It highlights the complexity and importance of hash tables in various programming languages and emphasizes the need for efficient data structures in modern programming ecosystems. The author disagrees with Hare's approach and stresses the significance of hash tables in software development.
spikedoanz/from-bits-to-intelligence: machine learning stack in under 100,000 lines of code
July 16, 2024
The text discusses building a machine learning stack in under 100,000 lines of code with hardware, software, tensors, and machine learning components. It outlines the required components like a CPU, GPU, storage, C compiler, Python runtime, operating system, and more. The goal is to simplify the machine learning stack while providing detailed steps for implementation in different programming languages.
One year of C
July 16, 2024
The author reflects on their year of writing C code, finding it enjoyable and productive. They emphasize the importance of choosing the right language for each problem and share insights on the benefits of using C over C++ in certain scenarios. Additionally, they discuss the advantages of C99 improvements and the simplified nature of writing C code compared to C++.
Heap Memory and Allocators
July 15, 2024
The text discusses different types of memory allocators in Zig programming language.
It explains how memory allocation and deallocation work using alloc and free functions.
Various allocator types like GeneralPurposeAllocator and FixedBufferAllocator are highlighted for managing memory efficiently.
Learning Zig - Pointers
July 15, 2024
Pointers in Zig allow variables to reference memory addresses. Understanding pointers helps manipulate memory effectively. Pointers are values that store memory addresses and can be nested within structures.
Data Compression Explained
July 15, 2024
Data compression involves modeling and coding to reduce the size of data files. Modern compressors typically use arithmetic coding for efficient compression. Algorithms like Huffman coding and run-length encoding are commonly used to achieve better compression results.
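Run-length encoding, the simplest scheme the text mentions, fits in a few lines; a quick Python illustration (not from the text):

```python
from itertools import groupby

def rle_encode(data: str):
    """Run-length encode: collapse each run of a repeated symbol to (symbol, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(data)]

def rle_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

encoded = rle_encode("aaaabbbcca")
print(encoded)   # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
assert rle_decode(encoded) == "aaaabbbcca"
```

Huffman and arithmetic coding go further by spending fewer bits on more probable symbols, which is where the modeling half comes in.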
Twitter's Recommendation Algorithm
July 14, 2024
Twitter uses a recommendation algorithm to select the top tweets for users' timelines. The algorithm is based on core models and features that extract information from tweet, user, and engagement data. The recommendation pipeline consists of three main stages: candidate sourcing, ranking, and applying heuristics and filters. Twitter uses both in-network and out-of-network sources to find relevant tweets, and employs embedding spaces to determine content similarity. The final step involves blending tweets with other non-tweet content before sending them to users' devices. The goal of Twitter's open source endeavor is to provide transparency to users about how the recommendation system works.
Programming languages resources
July 13, 2024
This page is a collection of the author's favorite resources for people getting started writing programming languages. The resources cover various aspects such as compilers, runtimes, runtime optimization, pointer tagging, JIT compilers, assembler libraries, and interesting tools. The author also mentions topics they want to write about in the future and papers they want to read. The page is meant to be a helpful reference for those interested in programming language implementation.
3D Math Primer for Graphics and Game Development
July 12, 2024
The book "3D Math Primer for Graphics and Game Development" is available to read for free on the gamemath.com website. It includes information about GDC talks, FAQs, and resources for the first edition of the book. The first edition, published in 2002, is described as high tech, but the author recommends reading the second edition instead, which is also available for free.
Welcome to OpenGL
July 12, 2024
This text is about learning modern OpenGL through an online book that covers basic, intermediate, and advanced knowledge with clear examples and practical concepts. The content is freely available online and in print, with the aim of providing a complete and easy-to-understand platform for graphics programming enthusiasts. Readers will learn core graphics aspects, useful techniques, and even create a small game based on the obtained OpenGL knowledge.
WebGPU Fundamentals
July 12, 2024
The text provides a collection of articles to help beginners learn the basics of WebGPU, covering topics like fundamentals, 3D math, lighting techniques, and compute shaders. It also includes information on optional features, data memory layout, transparency, performance, and resources for further learning. Readers can explore various aspects of WebGPU, including how it works, 2D and 3D techniques, and essential concepts like uniforms, textures, and storage buffers.
An opinionated beginner’s guide to Haskell in mid-2019
July 12, 2024
This guide is for beginners in Haskell or those transitioning from similar languages, offering advice on learning resources and tools. It emphasizes the importance of writing Haskell code, getting help online, choosing popular platforms, and sticking to the default Prelude. The guide also touches on application architecture, using records, debugging techniques, and the experimental nature of Haskell as both a research and industrial language.
Are tagged unions overrated?
July 12, 2024
The author discusses the limitations of tagged unions and pattern matching in language development, suggesting that they are overrated for implementing language ASTs and IRs. Despite the benefits of tagged unions, the complexity they add may not always justify their use, especially in cases where simpler alternatives like class hierarchies can offer similar functionality. The post also highlights the potential for enhancing pattern-matching capabilities in mainstream languages to improve code readability and maintainability.
C++ Core Guidelines
July 12, 2024
These guidelines aim to simplify and improve the safety of C++ code by recommending specific extensions and best practices. They focus on static type safety, resource management, and reducing the likelihood of errors or accidents. By following these guidelines, programmers can write more correct, safer code without sacrificing performance.
What every systems programmer should know about concurrency
July 11, 2024
The document delves into the complexities of concurrency for systems programmers, explaining the challenges of running multithreaded programs where code is optimized and executed in unexpected sequences. It covers fundamental concepts like atomicity, enforcing order in multithreaded programs, and memory orderings. The text emphasizes the importance of understanding how hardware, compilers, programming languages, and applications interact to create a sense of order in multithreaded programs. Key topics include atomic operations, read-modify-write operations, compare-and-swap mechanisms, and memory barriers in weakly-ordered hardware architectures.
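The classic lost-update hazard the document starts from can be shown in any language with threads; a Python sketch (illustrative, not from the document, and a mutex rather than the atomics the document goes on to cover):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        # `counter += 1` is a read-modify-write; without the lock,
        # interleaved threads can each read the same old value and
        # one increment is lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 40000
```

Atomic fetch-and-add or compare-and-swap, as the document describes, achieves the same correctness without a lock on hardware that supports it.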
compiler_construction
July 11, 2024
Building a compiler can be straightforward by breaking the development into small steps and using Scheme as the implementation language. The tutorial focuses on translating a subset of Scheme to assembly code, with a step-by-step approach to achieve a fully working compiler. Testing and refining the compiler incrementally leads to a powerful tool capable of compiling an interactive evaluator.
How do we tell truths that might hurt?
July 11, 2024
The document discusses the challenges of telling unpleasant truths and the conflict that arises when sharing these truths in the field of Computing Science. The author argues that remaining silent about these truths compromises the intellectual integrity of the field. The document also lists a number of truths related to programming languages and the use of language in computing systems. The author questions whether the field should continue to ignore these truths and urges for a change in attitude.
The next fifty years
July 11, 2024
The text discusses the future of computing science over the next fifty years, emphasizing the importance of simplicity and elegance in design to prevent complexity. It highlights the close connection between program design and proof design, suggesting that advancements in program design can impact general mathematics. The author encourages embracing the opportunity to simplify processes and design systems that rely on formal mathematics.
Recommender Systems: A Primer
July 10, 2024
Personalized recommendations have become a common feature of modern online services, including most major e-commerce sites, media platforms and social networks. Today, due to their high practical relevance, research in the area of recommender systems is flourishing more than ever. However, with the new application scenarios of recommender systems that we observe today, constantly new challenges arise as well, both in terms of algorithmic requirements and with respect to the evaluation of such systems. In this paper, we first provide an overview of the traditional formulation of the recommendation problem. We then review the classical algorithmic paradigms for item retrieval and ranking and elaborate how such systems can be evaluated. Afterwards, we discuss a number of recent developments in recommender systems research, including research on session-based recommendation, biases in recommender systems, and questions regarding the impact and value of recommender systems in practice.
http client in the standard library · Issue #2007 · ziglang/zig
July 10, 2024
The issue #2007 discusses the implementation of an HTTP client in Zig's standard library. Contributors debate the necessity and scope of including an HTTP client, considering factors like complexity and resource allocation. Ultimately, the HTTP client implementation was completed and closed as part of milestone 0.12.0.
Introduction to Compilers and Language Design
July 10, 2024
A compiler translates high-level code to lower-level code, and building one is a common project in computer science education. This book provides a beginner-friendly guide to building a compiler for a C-like language, suitable for undergraduates with programming experience. The author offers free online access to the textbook and related code resources, with options to purchase a physical copy.
Bare Metal Zig
July 10, 2024
The text discusses compiling a freestanding Zig binary to run on "bare metal" without relying on an operating system. It shows how to create a simple freestanding binary, make it multiboot compliant, and add custom console functionality for output. The process involves targeting specific architectures, handling linker warnings, and ultimately creating a bootable "kernel" to run on virtual machines like QEMU.
Comparing SIMD on x86-64 and arm64
July 10, 2024
The text compares SIMD implementations using SSE on x86-64 and Neon on arm64 processors, including emulating SSE on arm64 with Neon. It explores vectorized code performance using intrinsics, auto-vectorization, and ISPC, highlighting the efficiency of SSE and Neon implementations. The study shows how optimizing for SIMD instructions significantly boosts performance over scalar implementations in ray-box intersection tests.
Compiler Optimizations Are Hard Because They Forget
July 10, 2024
Compiler optimizations involve breaking down complex changes into smaller, more manageable steps to improve code efficiency. However, as more optimizations are added, the potential for errors and missed opportunities increases, making it challenging to maintain optimal performance. Compilers struggle with balancing aggressive optimizations while preserving correct program behavior, highlighting the complexity and difficulties inherent in optimizing compilers.
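One of those "smaller steps" is classic constant folding; a toy Python pass over a tuple-based expression tree illustrates the shape of such a rewrite (hypothetical code, not from the post):

```python
def fold(node):
    """Constant-fold ('+', a, b) nodes bottom-up.
    Expressions are nested tuples; leaves are ints or variable names."""
    if not isinstance(node, tuple):      # leaf: constant or symbolic variable
        return node
    op, a, b = node
    a, b = fold(a), fold(b)
    if op == "+" and isinstance(a, int) and isinstance(b, int):
        return a + b                     # replace the node with its value
    return (op, a, b)                    # can't fold: a subterm is symbolic

expr = ("+", ("+", 1, 2), ("+", "x", ("+", 3, 4)))
print(fold(expr))  # ('+', 3, ('+', 'x', 7))
```

The post's point is that each such pass discards information the next pass might have needed, which is where the missed opportunities and subtle bugs come from.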
C Isn't A Programming Language Anymore
July 10, 2024
C is no longer just a programming language but a vital protocol for all languages. Parsing C headers is a complex task best left to C compilers. Maintaining ABI compatibility in C can be challenging and may require versioning schemes.
Writing a C Compiler, Part 1
July 9, 2024
This text is about creating a C compiler in multiple stages, starting with lexing, parsing, and code generation. The process involves breaking down the source code, building an abstract syntax tree, and generating x86 assembly code. The compiler will handle simple programs with a single main function and a return statement.
GitHub - DoctorWkt/acwj: A Compiler Writing Journey
July 9, 2024
This GitHub repository documents the author's journey to create a self-compiling compiler for a subset of the C language. The author shares steps taken and explanations to help others follow along practically. The author credits Nils M Holm's SubC compiler for inspiration and differentiates their code with separate licensing.
A new JIT engine for PHP-8.4/9
July 9, 2024
A new JIT engine for PHP is being developed, improving performance and simplifying development. The engine will be included in the next major PHP version, potentially PHP 9.0. The new JIT engine generates a single Intermediate Representation (IR), eliminating the need to support assembler code for different CPUs.
Unknown
July 9, 2024
Hardware prefetching in multicore processors can be too aggressive, wasting resources and impacting performance for co-running threads. Combining hardware and software prefetching can optimize performance by efficiently handling irregular memory accesses. A method described in Paper II offers a low-overhead framework for accurate software prefetching in applications with irregular access patterns.
Introduction 2016 NUMA Deep Dive Series
July 9, 2024
The 2016 NUMA Deep Dive Series by staroceans.org explores various aspects of computer architecture, focusing on NUMA systems and their optimization for performance. The series covers topics such as system architecture, cache coherency, memory optimization, and VMkernel constructs to help readers understand and improve their host design and management. The series aims to provide valuable insights for configuring and deploying dual socket systems using Intel Xeon processors, with a focus on enhancing overall platform performance.
von Neumann architecture - Wikipedia
July 9, 2024
The von Neumann architecture is a computer design with a processing unit, control unit, memory, and input/output mechanisms. It allows for instructions and data operations to be stored in memory, advancing computer technology from fixed-function machines like the ENIAC. This architecture was influenced by the work of Alan Turing and John von Neumann and has been widely used in the development of modern computers.
Compiling tree transforms to operate on packed representations
July 8, 2024
The article explains how tree traversals in programming can be optimized by compiling them to work on serialized tree structures without using pointers. This approach can make programs run significantly faster on current x86 architectures. The authors developed a prototype compiler for a functional language that generates efficient code for traversing trees using packed data representations.
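The core trick can be sketched without the authors' compiler: serialize the tree in preorder into a flat buffer and traverse it left to right, with no pointers to chase. A small illustrative Python model (the encoding below is this sketch's own, not the paper's):

```python
# A binary tree packed in preorder: a leaf is 'L' followed by its value;
# an interior node is 'N' followed by its left then right subtree.
def sum_packed(buf, i=0):
    """Return (sum of leaf values in the subtree at i, index just past it)."""
    tag = buf[i]
    if tag == 'L':
        return buf[i + 1], i + 2
    total_l, j = sum_packed(buf, i + 1)   # left child starts right after 'N'
    total_r, k = sum_packed(buf, j)       # right child starts where left ended
    return total_l + total_r, k

#        N
#      /   \
#     N     L3
#    / \
#  L1   L2
packed = ['N', 'N', 'L', 1, 'L', 2, 'L', 3]
total, _ = sum_packed(packed)
print(total)  # 6
```

Because the traversal order matches the memory layout, reads are sequential and cache-friendly, which is where the reported speedups come from.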
Pipelines Support Vectorized, Point-Free, and Imperative Style
July 8, 2024
The text discusses how pipelines in the shell language support vectorized operations on collections and point-free style, where no data is explicitly mentioned. It also demonstrates how imperative code can be incorporated within pipelines for tasks like generating HTML tables. The unique features of pipelines include their ability to handle vectorized code, point-free composition, and integration of imperative instructions.
Entering text in the terminal is complicated
July 8, 2024
Entering text in the terminal can be challenging due to inconsistencies in how different programs handle text input. Some programs support basic features like arrow keys and history navigation, while others have custom input systems with advanced functionalities. Understanding the input mode of a program can help users navigate text editing more effectively in the terminal.
What happens when you start a process on Linux?
July 8, 2024
The process of starting a new program on Linux involves using the fork and exec system calls. Fork creates a clone of the current process, while exec replaces that clone with the new program to be executed. The new process inherits most attributes from its parent, with memory being shared through copy-on-write to optimize performance.
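The fork-then-exec dance is visible from Python's `os` module on a POSIX system; a minimal sketch (the child here execs a trivial Python one-liner purely for illustration):

```python
import os, sys

pid = os.fork()                 # clone the current process
if pid == 0:
    # Child: replace this clone's image with a new program (exec).
    os.execv(sys.executable,
             [sys.executable, "-c", "print('child'); raise SystemExit(7)"])
else:
    # Parent: wait for the child and read back its exit status.
    _, status = os.waitpid(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    print("child exited with", code)  # child exited with 7
```

Everything the child inherited (open files, environment, working directory) survives the exec unless explicitly changed, which is how shells set up redirections before launching a program.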
Debug your programs like they're closed source!
July 8, 2024
The author discusses debugging programs without looking at the source code by using system calls like open, execve, and write. System calls allow you to understand and monitor a program's behavior without needing access to its source code. By learning and utilizing system calls, you gain debugging superpowers that are platform-independent and useful for closed-source programs.
How I got better at debugging
July 8, 2024
Julia Evans shares her journey of improving her debugging skills through logical thinking, confidence, expanding knowledge, communication, and using tools like strace and tcpdump. By being systematic, confident, knowledgeable, and open to collaboration, she transformed debugging from a challenging task to an exciting learning opportunity. Her story emphasizes the importance of persistence, curiosity, and practical problem-solving in mastering the art of debugging.
Media Page Under Construction
July 8, 2024
Handmade Cities' media page is under construction, with some recordings missing. The videos from Handmade Boston 2023 have poor audio quality due to using a third-party A/V company. Freya's Masterclass footage was lost, and an abridged version will be shown at Dutch Game Day.
Infographics: Operation Costs in CPU Clock Cycles
July 8, 2024
The text discusses the operation costs in CPU clock cycles for different types of operations, including simple operations, floating-point operations, and vector operations. It highlights that memory involvement can significantly impact operation costs, with some operations taking as little as 1 CPU cycle. Different CPU architectures and types of operations can result in varying costs, with some operations requiring specialized CPU support to work efficiently.
Handles are the better pointers
July 8, 2024
The text discusses using 'index-handles' instead of raw or smart pointers for memory management in C and C++. It suggests centralizing memory management into systems, grouping items into arrays, and converting handles to pointers only when necessary. By following specific rules, such as not storing pointers and using handle-to-pointer conversion, memory safety and efficient memory usage can be maintained.
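The generation-checked handle idea is compact enough to model in Python (the post targets C/C++; this pool and its API are an illustrative sketch, not the post's code):

```python
class Pool:
    """Items live in one array; callers hold (index, generation) handles.
    A stale handle is detected because the slot's generation has moved on."""
    def __init__(self):
        self.items, self.gens, self.free = [], [], []

    def alloc(self, item):
        if self.free:
            i = self.free.pop()
            self.items[i] = item
        else:
            i = len(self.items)
            self.items.append(item)
            self.gens.append(0)
        return (i, self.gens[i])          # the handle

    def release(self, handle):
        i, gen = handle
        assert self.gens[i] == gen, "double free via stale handle"
        self.gens[i] += 1                 # invalidates all outstanding handles
        self.items[i] = None
        self.free.append(i)

    def get(self, handle):
        i, gen = handle
        return self.items[i] if self.gens[i] == gen else None  # stale -> None

pool = Pool()
h = pool.alloc("texture")
print(pool.get(h))        # texture
pool.release(h)
h2 = pool.alloc("mesh")   # reuses slot 0 under a new generation
print(pool.get(h))        # None  (the old handle is safely dead)
print(pool.get(h2))       # mesh
```

In C the handle would typically pack index and generation into one 32- or 64-bit integer, keeping it trivially copyable and serializable.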
You're Not Sick of Programming
July 8, 2024
Many people feel tired of programming and dream of quitting for a more fulfilling career, like farming or traveling. However, the real issue might be frustration with office politics, lack of product vision, and burnout rather than a true dislike of programming. Taking a break or addressing these underlying problems could help rediscover the creative potential of programming.
Zig Bare Metal Programming on STM32F103 — Booting up
July 8, 2024
The text explains how to program the STM32F103 microcontroller using the Zig programming language. It covers topics such as memory layout, linker scripts, and compiling code for embedded systems. By following the provided instructions, readers can successfully compile and run their first embedded program on the microcontroller.
OWASP Top Ten
July 7, 2024
The OWASP Top 10 is a guide for developers to understand critical security risks in web applications. Companies are encouraged to follow this document to improve the security of their web applications. The 2021 update includes new categories and ranking changes based on testing data and industry feedback.
Introduction
July 7, 2024
The OWASP Cheat Sheet Series offers valuable security information on application security topics. Created by experts, these concise cheat sheets aim to provide easy-to-read security guidance. You can download the cheat sheets from this site and stay updated through the ATOM feed.
The Copenhagen Book
July 7, 2024
The Copenhagen Book is a free and open-source guide for implementing auth in web applications. It is community-maintained and can be used alongside the OWASP Cheat Sheet Series. Suggestions or concerns can be addressed by opening a new issue.
Undefined Behavior deserves a better reputation
July 6, 2024
Undefined Behavior is often viewed negatively, but it can be a valuable tool for language designers. It allows programmers to convey insights to the compiler for optimizations. Responsible use of Undefined Behavior can enhance language design and code performance.
KHM+15
July 6, 2024
The text discusses a formal C memory model that supports integer-pointer casts, essential for low-level C programming. It proposes a quasi-concrete memory model that allows standard compiler optimizations while fully supporting integer-pointer casts. This model helps verify programs and optimizations that are challenging to validate with integer-pointer casts.
Learning LLVM (Part-1) - Writing a simple LLVM pass
July 5, 2024
This text introduces learning about LLVM and writing LLVM passes, which are used for transforming or analyzing a program's intermediate representation. LLVM offers a versatile compiler infrastructure with modules like the frontend, middle-end, and backend for optimizing and generating machine-specific code. By understanding LLVM concepts and pass managers, developers can create efficient passes for tasks like performance optimization and code analysis.
Some Were Meant for C
July 5, 2024
The document "Some Were Meant for C" explores the enduring significance of the C programming language, highlighting its dual role as both an application and systems programming language. It challenges common assumptions about C, emphasizing its unique communicative design that differs from managed languages. The document argues that C's explicit representations and memory access foster effective system-building and communication, making it a preferred choice for certain technical challenges. Additionally, it critiques the prevailing discourse that demonizes C, advocating for a nuanced understanding of its role in the programming landscape.
Xv6, a simple Unix-like teaching operating system
July 5, 2024
Xv6 is a teaching operating system developed by MIT for their operating systems course. It is based on Unix V6, written in ANSI C, and runs on Intel x86 machines. The xv6 source code is available on GitHub and is used in lectures to teach operating system concepts.
C Is Not a Low-level Language
July 5, 2024
C is often considered a low-level language, but this article argues that it is not. The author explains that vulnerabilities like Spectre and Meltdown occurred because processor architects were trying to build fast processors that exposed the same abstract machine as a PDP-11, which C programmers believe is close to the underlying hardware. In reality, C code runs atop a complex compiler that performs intricate transformations to achieve the desired performance. The article also discusses how C's memory model and optimizations make program behavior difficult to reason about and can lead to undefined behavior. The author suggests that instead of trying to make C code fast, it may be time to explore programming models on processors designed for speed.
Should you learn C to "learn how the computer works"?
July 5, 2024
The author discusses whether learning C is necessary to understand how computers work, ultimately concluding that C is not a direct representation of computer operations. Learning C can still be beneficial for understanding computing concepts and history, but it operates within a virtual machine and abstracts certain hardware details. By learning C, you can gain insight into the relationship between programming languages, hardware, and the historical development of computing.
A Guide to Undefined Behavior in C and C++, Part 1
July 5, 2024
The text explains that undefined behavior in C and C++ can lead to unpredictable program outcomes. Compilers may optimize code by exploiting undefined behavior, potentially causing programs to misbehave. It is important for programmers to understand how undefined behavior can impact program execution.
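A minimal C sketch (not from the article) of the kind of optimization the summary describes: because signed overflow is undefined, a compiler may fold the first comparison to a constant, while the well-defined unsigned version must keep the real check. The function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Signed overflow is undefined behavior, so a compiler is allowed to
 * assume x + 1 > x always holds for signed x and fold this to
 * `return 1;`. Calling it with INT_MAX would itself be UB. */
int always_greater(int x)
{
    return x + 1 > x;
}

/* Unsigned arithmetic wraps, so the same comparison is well-defined
 * and genuinely false at UINT_MAX -- no such folding is permitted. */
int wraps_at_max(unsigned x)
{
    return x + 1 > x;
}
```

The asymmetry between these two functions is exactly the lever compilers pull: UB lets them treat impossible cases as unreachable.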
Using neural nets to recognize handwritten digits
July 5, 2024
Neural networks can recognize handwritten digits by learning from examples. Sigmoid neurons play a key role in helping neural networks learn. Gradient descent is a common method used for learning in neural networks.
When Network is Faster than Cache
July 5, 2024
Firefox introduced a feature called RCWN to improve web performance by racing cached requests against the network. In some cases, the network can be faster than fetching data from the cache due to various factors like browser bugs and resource prioritization. Factors like device hardware and the total number of assets served from the cache impact cache retrieval performance significantly.
John Carmack on Functional Programming in C++
July 5, 2024
Functional programming in C++ can help in writing better software by making code easier to reason about and eliminating thread race conditions. Pure functions, which only rely on input parameters and produce consistent outputs, offer benefits such as thread safety and easier testing. Refactoring towards purity can improve code quality, even if full purity is not achieved, by disentangling computation from the environment it operates in.
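The article is about C++, but the purity idea translates directly; here is a hedged C sketch with invented names, contrasting a function that reads hidden state with one whose inputs all arrive through its parameters.

```c
#include <assert.h>

/* Impure version: the result depends on hidden global state, so it is
 * harder to test in isolation and not safe under concurrent mutation. */
int g_tax_rate_percent = 8;

int total_with_tax_impure(int subtotal)
{
    return subtotal + subtotal * g_tax_rate_percent / 100;
}

/* Pure version: everything the function reads comes in as a parameter,
 * so identical inputs always produce identical outputs. */
int total_with_tax(int subtotal, int tax_rate_percent)
{
    return subtotal + subtotal * tax_rate_percent / 100;
}
```

The refactoring step Carmack advocates is exactly this move: thread the environment in through parameters until the function's behavior is fully determined by its signature.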
Zig-style generics are not well-suited for most languages
July 5, 2024
Zig-style generics, like those in C++, may not work well for all languages due to limitations in compiler support and type inference. Armchair suggestions about adopting Zig-style generics in other languages may overlook these challenges. The flexibility and metaprogramming capabilities in Zig may not easily translate to other statically-typed languages.
WebGL2 vs WebGL1
July 4, 2024
WebGL is a 3D API that works as a rasterization engine, requiring users to provide code for rendering points, lines, and triangles. Users must create vertex and fragment shaders to control how WebGL processes and displays graphics. The WebGL API simplifies rendering by executing user-created functions to draw basic shapes like triangles.
WebGL How It Works
July 4, 2024
The text explains how WebGL processes vertices to create triangles and render them with pixels using shaders. Varyings are used to pass data from the vertex shader to the fragment shader for color interpolation. Buffers are essential for transferring vertex data to the GPU for rendering, and attribute locations are assigned to specify how to extract and use this data efficiently.
The Night Watch
July 4, 2024
The text discusses the importance of systems programmers in dealing with complex technical challenges, emphasizing their unique skills in debugging and problem-solving. It contrasts the roles of systems programmers with other computer professionals like GUI designers and PHP developers, highlighting the critical nature of systems programming in challenging scenarios. The text humorously portrays the intense and sometimes absurd experiences of systems programmers, showcasing their indispensable role in addressing technical issues efficiently and effectively.
FreeType
July 4, 2024
FreeType is a software library for rendering fonts, available for free. It is designed to be small, efficient, and capable of producing high-quality font images. Users can find installation instructions, documentation, and ways to communicate with the FreeType team on their website.
A Freestanding Rust Binary
July 3, 2024
To create a freestanding Rust executable for operating system development, we need to disable linking to the standard library and define our own entry point function. By compiling for a bare metal target like thumbv7em-none-eabihf, we can avoid linker errors and run Rust code without an underlying operating system. Additional linker arguments are required for specific operating systems like Linux, Windows, and macOS to resolve linker errors and build the freestanding Rust binary successfully.
Manually linking Rust binaries to support out-of-tree LLVM passes
July 3, 2024
LLVM is a compiler infrastructure used by frontends like rustc to generate machine code. To add custom LLVM passes to a Rust binary, extra flags can be used during compilation to produce LLVM-IR and then link the binary properly using LLVM tools. By understanding how Rust's static libraries work and leveraging cargo for dependency management, custom LLVM passes can be integrated into Rust binaries efficiently.
The Rust Reference
July 3, 2024
The Rust compiler can generate different types of output artifacts, such as runnable executables, Rust libraries, dynamic libraries, and static system libraries. Dependencies between crates can be linked in various formats, such as rlib and dynamic library formats, following specific rules set by the compiler. Understanding how to specify output formats like --crate-type=bin or --crate-type=lib can help control the compilation process for Rust crates, while also considering options for linking C runtimes dynamically or statically based on target features.
Rust Compiler Development Guide
July 3, 2024
The Rust compiler processes and transforms your code for compilation. It uses different stages like lexing, parsing, and abstract syntax tree lowering. The compiler aims for correctness, performance, and supporting incremental compilation.
How to speed up the Rust compiler one last time
July 3, 2024
The author at Mozilla is concluding their work on speeding up the Rust compiler after several years of dedicated effort.
They wrote multiple blog posts detailing their performance optimizations and shared valuable lessons learned from the process.
The author expressed gratitude to those who supported their work and highlighted the importance of ongoing contributions to Rust's development.
How to speed up the Rust compiler in March 2024
July 3, 2024
In March 2024, updates on the Rust compiler's performance highlighted several key improvements. Changes such as using a single codegen unit, marking Debug::fmt methods with #[inline], introducing a cache, and upgrading LLVM led to notable reductions in wall-time, binary size, and hash table lookups. The Cranelift codegen backend is now available for x86-64/Linux and ARM/Linux as an alternative for faster compile times. Although the author did not contribute speed improvements this round, overall performance from August 2023 to March 2024 showed reductions in wall-time, peak memory usage, and binary size, indicating steady progress.
Zig Bits 0x4: Building an HTTP client/server from scratch
July 3, 2024
The text explains how to create an HTTP client and server from scratch using Zig >=0.11.
For the client, you need to set up requests, headers, and wait for responses.
The server part involves defining functions to handle requests and running the server to accept connections.
Do We Really Need A Link Step?
July 3, 2024
The author questions the need for a link step in native-code compilation for faster performance. They propose a "zero-link" approach where compilers directly write object code into the final executable file. This method could improve efficiency by avoiding unnecessary object files and incorporating symbol resolution within the executable itself.
Death Note: L, Anonymity & Eluding Entropy
July 2, 2024
The text discusses Light's mistakes in using the Death Note and how they led to his de-anonymization by L. Light's errors, such as revealing his precise killing methods and using confidential police information, significantly reduced his anonymity. The text also explores strategies Light could have employed to better protect his anonymity while using the Death Note.
jamiebuilds/the-super-tiny-compiler: :snowman: Possibly the smallest compiler ever
July 2, 2024
The Super Tiny Compiler is a simplified example of a modern compiler using easy-to-read JavaScript. It helps you understand how compilers work from start to finish. Compilers play a big role in the tools we use daily.
5 Days to Virtualization: A Series on Hypervisor Development
July 2, 2024
A series on hypervisor development for Intel processors with virtualization support will be published next week, covering topics like setting up a test environment, driver skeleton creation, and multi-processor initialization. The series aims to aid new readers in building, testing, and understanding type-2 hypervisor development using C programming language. Recommended reading and detailed explanations will be provided to enhance knowledge and understanding of virtualization concepts.
In-depth analysis on Valorant’s Guarded Regions
July 2, 2024
The text discusses how Valorant's anti-cheat system, Vanguard, uses innovative techniques to protect against memory manipulation by whitelisting threads and creating shadow regions. These methods involve cloning and modifying the game's paging tables to allow access to hidden memory without affecting performance. By implementing these advanced security measures, Vanguard effectively prevents cheats from bypassing its guarded regions.
Exploit Development: No Code Execution? No Problem! Living The Age of VBS, HVCI, and Kernel CFG
July 2, 2024
The text discusses various techniques used in exploit development, particularly focusing on targeting the Windows kernel. It mentions concepts like Hypervisor-Protected Code Integrity (HVCI) and how exploits can manipulate memory to execute attacker-controlled code in kernel mode. The text also delves into details like leaking kernel-mode memory, constructing ROP chains on the kernel-mode stack, and utilizing functions like NtQuerySystemInformation to escalate privileges and perform malicious actions in the system.
Reader
July 2, 2024
The Reader API by jina.ai helps extract clean, LLM-friendly text from web content, ensuring high-quality input for AI systems like agents and RAG. It can also search the web for the latest information to keep LLMs up-to-date, improve factuality, and reduce misinformation. Additionally, Reader can read images on webpages and PDFs, providing alt text for images and lightning-fast PDF processing, all available for free with flexible rate limits.
CheerpX versus WebContainers
July 2, 2024
CheerpX is a client-side virtualization technology for running x86 executables and operating systems in the browser without modifications or recompilation. It offers cost-effective, secure, and private execution of native code, making it suitable for various web-based applications. CheerpX stands out from other solutions by supporting any x86 executable and providing a robust two-tier emulator for efficient code execution.
Creating a Rootkit to Learn C
July 2, 2024
The text demonstrates creating a userland rootkit in C to hide malicious activity such as network connections and files. By hooking system calls like access() and write(), the rootkit manipulates userland programs and evades detection by tools like netstat. It uses shared-library injection and hooks to intercept and manipulate system calls, showing how much leverage C provides at this level.
Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization: [CVPR 2022 Oral] Towards Layer-wise Image Vectorization
July 1, 2024
The text discusses a new method called LIVE for generating SVG images layer by layer to fit raster images. LIVE uses closed Bézier paths to learn visual concepts in a recursive manner. Installation instructions and references for the method are provided in the text.
Udacity CS344: Intro to Parallel Programming
July 1, 2024
Intro to Parallel Programming is a free online course by NVIDIA and Udacity teaching parallel computing with CUDA. It's for developers, scientists, engineers, and students looking to learn about GPU programming and optimization. The course is self-paced, requires C programming knowledge, and offers approximately 21 hours of content.
CS 361: Systems Programming
July 1, 2024
The Systems Programming course at UIC includes assigned readings, video lectures, labs, and quizzes scheduled throughout the week. Students can access additional resources and submit assignments through the course gradescope page. Office hours, content quizzes, discussions, and exams are held on specific days via Zoom and YouTube.
Resolving Rust Symbols
July 1, 2024
Linking combines object files into an executable or shared library in Rust. The linker resolves symbols and dependencies between object files. Rust prefers static linking to create a single distributable binary with all dependencies included.
When FFI Function Calls Beat Native C
July 1, 2024
David Yu performed a benchmark comparing different Foreign Function Interfaces (FFI) for function calls. LuaJIT's FFI was found to be faster than native C function calls due to efficient dynamic function call handling. Direct function calls, like those used by LuaJIT, can outperform indirect calls routed through a Procedure Linkage Table (PLT).
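A rough C illustration of the distinction the benchmark turns on: a direct call whose target is fixed at link time versus a call through a function pointer, which (like a PLT-routed call) must first load the target address. This is a sketch of the mechanism, not the benchmark itself.

```c
#include <assert.h>

/* Direct call: the call target is known statically. */
int add_direct(int a, int b)
{
    return a + b;
}

/* Indirect call through a function pointer, similar in spirit to a
 * call routed through the Procedure Linkage Table: the CPU must load
 * the target address before it can branch, an extra step a direct
 * call avoids. LuaJIT's FFI wins by emitting direct calls. */
int (*add_indirect)(int, int) = add_direct;
```

Both produce the same result; the difference is purely in how the branch target is resolved, which is where the measured cycles go.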
Cap'n Proto, FlatBuffers, and SBE
July 1, 2024
FlatBuffers is a new serialization protocol released by Google engineers, similar to Cap’n Proto. Cap’n Proto allows random access using pointers, while FlatBuffers uses offsets stored in tables for random access. Protobufs, Cap’n Proto, and FlatBuffers have custom schema languages and different features for data serialization and access.
A Database Without Dynamic Memory Allocation
July 1, 2024
TigerBeetle, a database written in Zig, does not allocate memory dynamically after startup. It uses static memory allocation for all data structures, avoiding performance issues and use-after-free bugs. This approach allows for better predictability, easier handling of overload, and efficient resource management.
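TigerBeetle is written in Zig, but the static-allocation discipline is easy to sketch in C: reserve all storage up front and hand out slots from a fixed pool, so nothing is malloc'd after startup and exhaustion is an explicit, handleable condition. The capacity and element type here are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP 64

/* All storage is reserved at startup; there is no heap growth and no
 * use-after-free from dangling heap pointers. */
struct pool {
    int    slots[POOL_CAP];
    size_t used;
};

/* Returns a pointer into static storage, or NULL when the pool is
 * exhausted -- overload is surfaced explicitly instead of by growing. */
int *pool_alloc(struct pool *p)
{
    if (p->used == POOL_CAP)
        return NULL;
    return &p->slots[p->used++];
}
```

The NULL return is the point: capacity limits become part of the program's visible behavior, which is what makes overload predictable.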
Wizard Zines Collection!
July 1, 2024
Julia Evans offers programming zines with black-and-white covers for free and colored covers for purchase. The zines can be bought individually for $10-$12 each or as a whole collection. Additionally, there are free posters and a weekly comic subscription available.
Aggregating Millions of Groups Fast in Apache Arrow DataFusion 28.0.0
July 1, 2024
Apache Arrow DataFusion version 28.0.0 now offers faster parallel aggregation for queries with many groups. The improvements aim to enhance user experiences by generating insights more efficiently. These enhancements bring DataFusion closer to the grouping speed of DuckDB.
Problems of C, and how Zig addresses them
July 1, 2024
This blog post surveys long-standing problems in C and how Zig addresses them. Zig offers comptime evaluation, explicit memory management, improved error handling, and treats most constructs as expressions, making it a modern alternative to C with those problems designed out.
How to use hash map contexts to save memory when doing a string table
June 30, 2024
The text explains how to save memory when building a string table using hash map contexts. By adapting context APIs, only indexes are stored in the table, reducing memory usage. This method can save 117 KB of memory for a string table with 10 thousand entries.
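A C sketch of the layout the article builds (the article is Zig and pairs it with hash map contexts; the linear-scan lookup and sizes here are simplifications for illustration): all string bytes live in one shared buffer, and the table stores only small integer offsets into it rather than whole strings or pointers.

```c
#include <assert.h>
#include <string.h>

#define BYTES_CAP   4096
#define MAX_STRINGS 128

/* One shared byte buffer holds every string, NUL-separated; the table
 * itself stores only 32-bit-sized offsets, which is where the memory
 * savings come from. Capacity checks are omitted for brevity. */
struct string_table {
    char     bytes[BYTES_CAP];
    unsigned offsets[MAX_STRINGS];
    unsigned bytes_len;
    unsigned count;
};

/* Return the offset of s, interning it on first sight. */
unsigned intern(struct string_table *t, const char *s)
{
    for (unsigned i = 0; i < t->count; i++)
        if (strcmp(&t->bytes[t->offsets[i]], s) == 0)
            return t->offsets[i];
    unsigned off = t->bytes_len;
    size_t n = strlen(s) + 1;
    memcpy(&t->bytes[off], s, n);
    t->bytes_len += (unsigned)n;
    t->offsets[t->count++] = off;
    return off;
}
```

Replacing the linear scan with a hash map whose context hashes through the shared buffer is the article's actual technique; the data layout is the same.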
resume.txt
June 30, 2024
Andrew Kelley is a programmer with 16 years of experience in software development and a passion for open-source projects. He has worked on various music-related software like the Genesis DAW and libgroove, contributing patches to libav and ffmpeg. Additionally, he has experience in low-level systems, custom algorithm creation, and designing user interfaces.
Leslie Lamport
June 28, 2024
Leslie Lamport wrote several papers on verifying and specifying concurrent systems using TLA. He discovered algorithms through formal derivation and emphasized mechanical verification of concurrent algorithms. His work influenced the development of the TLAPS proof system.
Indices and tables
June 27, 2024
CompilerGym is a library for reinforcement learning in compiler tasks. It helps ML researchers work on optimization problems and allows system developers to create new tasks for ML research. The goal is to use ML to make compilers faster.
LLM Compiler
June 27, 2024
The LLM Compiler is a suite of pre-trained models designed for code optimization tasks, based on Code Llama. It has been trained on a large corpus of LLVM-IR and assembly code to enhance compiler behavior understanding. The release of LLM Compiler aims to support further research in compiler optimization for both academia and industry.
Bare Bones
June 25, 2024
This text explains how to create an operating system by first cross-compiling and using existing technology. It guides you through writing a kernel in C or C++, creating a bootloader, and linking the kernel for x86 systems. Following these steps ensures your operating system can be loaded and executed correctly.
The Graphics Codex
June 24, 2024
"The Graphics Codex" is a comprehensive resource for computer graphics, offering essential information on 3D rendering and shading. It includes equations, diagrams, and programming projects, with free updates every month. Written by expert Morgan McGuire, it is a valuable tool for learning and reference in the field of computer graphics.
[2305.13009] Textually Pretrained Speech Language Models
June 24, 2024
Notes on partial borrows
June 24, 2024
The text discusses limitations of the Rust borrow checker and proposes solutions for creating references that borrow from specific subsets of a type. Two approaches, "View types" and "Reference views," are explored to address these limitations and provide more flexibility in borrowing subsets of fields with different lifetimes and mutability. The discussion includes examples, subtyping implications, monomorphization considerations, and the need to update Rust's aliasing model to accommodate view references accessing discontiguous memory regions.
Dioxus Labs + “High-level Rust”
June 24, 2024
An article criticized Rust's gamedev hype, but its popularity stems from meeting modern programming needs like speed and safety. Efforts are underway to enhance Rust's capabilities for various industries and improve compile times significantly. Proposed enhancements include incremental linking, parallel frontend, and macro expansion caching to make Rust more efficient for developers.
Compile-Time Configuration For Zig Libraries
June 24, 2024
To expose compile-time configuration options in Zig libraries, developers can use global declarations in the root source file or through Zig's build system. By setting configuration flags, developers can customize behavior such as enabling or disabling assertions in library code. Compile-time configuration can improve performance by allowing certain checks to be done at compile-time rather than runtime.
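The same idea rendered in C (the article is about Zig's root-source declarations and build system): resolve a configuration flag at compile time so that disabled checks compile away entirely. `LIB_ENABLE_ASSERTS` and `checked_div` are invented names for illustration.

```c
#include <assert.h>

/* A library-level compile-time flag, overridable from the build
 * command line (e.g. -DLIB_ENABLE_ASSERTS=0). When disabled, the
 * check below is removed by the preprocessor and costs nothing. */
#ifndef LIB_ENABLE_ASSERTS
#define LIB_ENABLE_ASSERTS 1
#endif

int checked_div(int a, int b)
{
#if LIB_ENABLE_ASSERTS
    assert(b != 0);
#endif
    return a / b;
}
```

Zig's version is cleaner because the flag is an ordinary typed declaration rather than a preprocessor symbol, but the performance rationale is identical.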
Generics
June 24, 2024
Generics in Zig allow for creating data structures and algorithms that can work with different types. By using generics, code can be written once and reused with various data types. Zig's approach to generics involves leveraging compile-time metaprogramming capabilities.
Zig's HashMap - Part 1
June 24, 2024
Zig's std.HashMap implementation relies on two key functions: hash and eql. The documentation outlines various hash map types and their functionalities, including std.HashMapUnmanaged. AutoHashMap can automatically generate hash functions, but there are limitations, and custom contexts can be provided for more complex keys.
Zig Parser
June 24, 2024
The Zig Parser is a crucial part of the Zig compiler internals, responsible for constructing an abstract syntax tree from a stream of tokens. The parser uses a struct called Parser to manage the internal state of the parse operation, accumulating errors and building up AST nodes. Understanding the structure of an AST node and the data pattern is essential for comprehending how the parser works and the subsequent stages of the compiler. The AST node data is stored in various locations such as the token stream, the node list, and the extra data list, with specific structures and indexes used to access information about AST nodes like function declarations and prototypes.
Copying Better: How To Acquire The Tacit Knowledge of Experts
June 24, 2024
The text discusses how to acquire expert intuition, known as tacit knowledge, through emulation and apprenticeship. Naturalistic Decision Making (NDM) research helps extract and teach expert judgment using methods like Cognitive Task Analysis and the recognition-primed decision making model. Experts rely on implicit memory and pattern recognition to make rapid assessments and decisions, which can be challenging to verbalize.
Causal ordering
June 24, 2024
Causal ordering is essential for understanding distributed systems, where events may not have a clear time order. This concept helps determine the causal relationship between events in a system. It enables reasoning about causality, leading to simpler solutions in distributed computing.
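A compact C sketch of the standard mechanism for tracking causal order, the Lamport logical clock; the article discusses the concept generally, so this is an illustrative companion, not its code.

```c
#include <assert.h>

/* Lamport clocks capture causal ordering: if event A can influence
 * event B, then A's timestamp is strictly smaller than B's. (The
 * converse does not hold, which is why vector clocks exist.) */
struct process {
    unsigned clock;
};

/* A local event just ticks the clock. */
unsigned local_event(struct process *p)
{
    return ++p->clock;
}

/* Receiving a message jumps past the sender's timestamp, so the
 * receive is always ordered after the send that caused it. */
unsigned receive_event(struct process *p, unsigned msg_ts)
{
    p->clock = (msg_ts > p->clock ? msg_ts : p->clock) + 1;
    return p->clock;
}
```

The max-then-increment in `receive_event` is the whole trick: it encodes "the send happened before this receive" without any shared wall clock.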
Assorted thoughts on zig (and rust)
June 24, 2024
Zig is simpler than Rust and offers similar features through compile-time execution. Rust provides strong type safety guarantees for generic functions, while Zig lacks automatic type constraint documentation and may face challenges with IDE support. Zig excels in custom allocators and handling out-of-memory errors, while Rust excels in preventing memory leaks and resource management.
Columnar kernels in go?
June 24, 2024
Over the winter the author will be adding a columnar query engine to an existing system written in Go.
An opinionated map of incremental and streaming systems
June 24, 2024
The text discusses various design choices and characteristics of incremental and streaming systems. It highlights the core idea of these systems, which is to process inputs to generate outputs efficiently. The systems are categorized based on unstructured vs structured design, high temporal locality vs low temporal locality workloads, internal consistency vs internal inconsistency, and eager vs lazy computation approaches. The text explains the advantages and disadvantages of each design choice and provides examples of systems that fall into different categories. Additionally, it emphasizes the importance of understanding these design choices in selecting the appropriate system for specific workloads.
Internal consistency in streaming systems
June 24, 2024
The text discusses the importance of internal consistency in streaming systems. It explains how eventual consistency can lead to incorrect outputs and the need for systems to wait for all relevant inputs before emitting results. Maintaining internal consistency ensures correct outputs and prevents confusion between changes and corrections.
Pain we forgot
June 24, 2024
The text discusses the challenges in programming and the need for more user-friendly tools. It emphasizes the importance of improving feedback loops, running code smoothly, and creating more helpful programming environments. The author suggests rethinking traditional tools and approaches to make programming more accessible and efficient.
Have you tried rubbing a database on it?
June 24, 2024
HYTRADBOI was a conference featuring lightning talks on innovative uses of databases for solving problems. Talks included topics like building data-centric apps, realtime machine learning, and interactive databases. The event focused on embracing new solutions and fostering professional behavior among attendees.
The shape of data
June 24, 2024
The text discusses the importance of having a clear and consistent data notation in programming languages like Clojure. It emphasizes the advantages of a notation that closely aligns with the in-memory representation of data, making it easier for developers to work with and understand data structures. Additionally, it suggests that a well-designed data model and notation are crucial for efficient data manipulation and code analysis.
Reflections on a decade of coding
June 24, 2024
The author reflects on 12 years of coding experience, sharing recent projects and personal growth insights. They highlight the importance of gradual improvements in habits and processes over innate talent. The author identifies areas of progress, like writing efficient code and managing emotions, while acknowledging gaps in experience in maintaining large codebases and teamwork.
Prospecting for Hash Functions
June 24, 2024
The text discusses the process of designing non-cryptographic integer hash functions, exploring different operations and constraints to create effective hash functions. It also compares various 32-bit hash functions and their bias levels, highlighting the search for high-quality hash functions with minimal bias for both 32-bit and 64-bit integers.
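One widely circulated result of this search is the xorshift-multiply permutation below; the constants are those commonly quoted from the article as "lowbias32", but treat them as illustrative rather than authoritative.

```c
#include <assert.h>
#include <stdint.h>

/* An xorshift-multiply hash of 32-bit integers: alternating xor-shifts
 * and multiplications by odd constants, each step a bijection, so the
 * whole function is an invertible permutation of uint32_t. */
uint32_t lowbias32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}
```

Because every step is invertible, distinct inputs always map to distinct outputs; the search the article describes is over which constants and shift amounts minimize avalanche bias.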
The Missing Zig Polymorphism / Runtime Dispatch Reference
June 24, 2024
The text discusses how Zig lacks built-in polymorphism features like interfaces or virtual methods. It explores creating polymorphism using existing language features in Zig. The author provides a detailed guide on implementing polymorphism in Zig, focusing on dynamic dispatch using function pointers.
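The article builds dispatch by hand in Zig; the same idea in C, with invented names, looks like this: an "interface" value is a data pointer plus a function pointer, and calling through the pointer is the dynamic dispatch.

```c
#include <assert.h>

/* A hand-rolled interface: context pointer plus function pointer. */
struct shape {
    void *ctx;
    int (*area)(void *ctx);
};

struct square {
    int side;
};

int square_area(void *ctx)
{
    struct square *s = ctx;
    return s->side * s->side;
}

/* Wrap a concrete square in the generic interface. */
struct shape square_shape(struct square *s)
{
    return (struct shape){ .ctx = s, .area = square_area };
}
```

Any other concrete type only needs its own `area` function and wrapper; callers see a uniform `struct shape`, which is all that "polymorphism without language support" amounts to.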
Nanosystems
June 23, 2024
This text is about a book called "Nanosystems" by K. Eric Drexler, which is considered groundbreaking in the field of molecular nanotechnology. The book explains how to create manufacturing systems at the molecular level and discusses the significant impact nanotechnology will have on various industries. Experts praise the book for providing a foundation for future research in molecular systems engineering and molecular manufacturing.
How To Become A Hacker
June 23, 2024
The text explains what it means to be a hacker, focusing on problem-solving, creativity, and a willingness to share knowledge within the hacker culture. It emphasizes the importance of developing a hacker mindset, skills, and dedication through self-education and a passion for solving new problems. The hacker culture values intelligence, hard work, and a sense of community, with an emphasis on learning and sharing information to advance the collective knowledge of hackers.
the rr debugging experience
June 20, 2024
rr is a debugging tool for Linux that records failures for deterministic replay under gdb. It helps debug real applications efficiently and supports reverse execution for finding bugs. rr aims to make debugging easier with low overhead and powerful features like hardware data watchpoints.
Text Buffer Reimplementation
June 19, 2024
The Visual Studio Code 1.21 release includes a new text buffer implementation that improves performance in terms of speed and memory usage. The previous implementation used an array of lines, but it had limitations such as high memory usage and slow file opening times. The new implementation uses a piece table data structure, which allows for better memory usage and faster line look-up. Additionally, the implementation uses techniques such as caching for faster line lookup and a balanced binary tree for efficient searching. Benchmarks showed that the new implementation outperformed the previous line array implementation in terms of memory usage, file opening times, and reading operations.
What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?
June 19, 2024
Rust and C++ offer "zero-cost abstractions" where high-level code compiles to low-level code without added runtime overhead, but enabling necessary compiler optimizations can slow down compilation and impact debugging. The challenge is to find the minimal set of optimizations that maintain zero-cost abstractions while improving build speed and debug information quality. Balancing fast debuggable builds with zero-cost abstractions is crucial for performance and developer experience in languages like Rust and C++.
Using ASCII waveforms to test hardware designs
June 19, 2024
Using expect tests automates the validation of code output, detecting errors efficiently. Jane Street uses Hardcaml in OCaml for hardware development, simplifying testbench creation. Waveform expect tests help visualize hardware behavior, improving development workflows.
Rust 2019 and beyond: limits to (some) growth.
June 19, 2024
The text discusses the need for controls and policies to manage the growth limits of technical artifacts and the strains on individuals in the Rust project. It emphasizes the importance of acknowledging and addressing these limits to prevent potential crises or dysfunction in the future. The author suggests implementing controls, such as hard limits and moderation strategies, to maintain a healthy and sustainable project environment.
Your ABI is Probably Wrong
June 19, 2024
The text discusses how most ABIs have a design flaw that harms performance by passing large structures inefficiently. Different ABIs handle passing large structures differently, but they all repeat the same mistakes. A correctly-specified ABI should pass large structures by immutable reference to avoid unnecessary copies.
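A small C illustration of the copy the article objects to: a by-value call must materialize the whole structure for the callee, while passing a pointer to const hands over only an address. The struct size and function names are invented for the example.

```c
#include <assert.h>

/* Far too large for registers: a by-value call copies all 256 bytes
 * (typically through hidden stack memory) before the callee runs. */
struct big {
    int data[64];
};

int sum_by_value(struct big b)        /* caller copies the struct */
{
    int s = 0;
    for (int i = 0; i < 64; i++)
        s += b.data[i];
    return s;
}

int sum_by_ref(const struct big *b)   /* caller copies one pointer */
{
    int s = 0;
    for (int i = 0; i < 64; i++)
        s += b->data[i];
    return s;
}
```

The article's claim is that the second shape, an immutable reference, is what ABIs should specify by default for large arguments, making the cheap calling convention the one you get without thinking.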
GitHub - sirupsen/napkin-math: Techniques and numbers for estimating system's performance from first-principles
June 19, 2024
The project "Napkin Math" aims to provide resources and techniques to estimate system performance quickly and accurately. It includes examples like estimating memory reading speed and storage costs for applications. The best way to learn this skill is through practical application, with the option to subscribe for regular practice problems. Detailed numbers and cost estimates are provided, along with compression ratios and techniques to simplify calculations. The project encourages user participation to enhance and refine the provided data and tools for napkin math calculations.
Don't write bugs
June 19, 2024
Effective programmers should focus on preventing bugs rather than debugging them. Re-reading code frequently can help reduce the number of errors. Writing bug-free code is achievable with practice and attention to detail.
technicalities: "not rocket science" (the story of monotone and bors)
June 19, 2024
The text discusses the development of a program called bors that enforces the "Not Rocket Science Rule" of maintaining a code repository that always passes tests. Bors automates integration testing and ensures code changes are only merged if they pass tests, preventing broken code from being merged. This system has been found to be extremely beneficial for software projects, ensuring a stable and reliable codebase.
Why is Python slow
June 19, 2024
Python's performance issues stem from spending most time in the C runtime, rather than the Python code itself. Pyston focuses on speeding up the C code to improve performance. Suggestions to improve Python's speed by using other JIT techniques overlook the fundamental issue of optimizing C code.
Design duality and the expression problem
June 19, 2024
The text discusses the concept of design duality in programming, focusing on the trade-offs between objects and data representations. It highlights the importance of making conscious design choices when introducing new types, whether as data, objects with extensible implementations, or abstract data types with restricted extensibility. The author emphasizes the need for programming languages to better support and encourage these design considerations.
Random Thoughts On Rust: crates.io And IDEs
June 19, 2024
The author shares experiences with Rust, praising cargo and crates.io for easy code distribution. They highlight the need for improved library discovery on crates.io and discuss the potential for better IDE support in Rust projects. Despite challenges like type inference, Rust's design enables advanced IDE features that can enhance coding efficiency.
John Carmack on Inlined Code
June 19, 2024
Consider inlining functions that are only called in one place for efficiency. Simplify code structure to reduce bugs and improve performance. Emphasize consistent execution paths over avoiding minor optimizations.
A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World
June 19, 2024
The text discusses the development and commercialization of a bug-finding tool that can identify errors in large amounts of code. It highlights the challenges faced in finding and addressing various types of bugs, such as memory corruption and data races, across different programming systems. The tool's effectiveness in uncovering bugs in complex codebases emphasizes the importance of bug detection for improving software quality.
What is Systems Programming, Really?
June 19, 2024
The term "systems programming" combines low-level programming and systems design. It involves creating and managing complex components, often focusing on machine implementation details. Over time, the distinction between systems programming and other programming languages has become less clear.
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
June 18, 2024
MLKV introduces Multi-Layer Key-Value sharing to reduce memory usage in transformer decoding. This approach improves efficiency without sacrificing performance on NLP benchmarks. MLKV significantly reduces memory requirements compared to existing methods like Multi-Query Attention.
Mitchell Hashimoto
June 17, 2024
Mitchell Hashimoto is an advisor at Polar and shares insights on technical projects, Zig programming, and automation on his website. He discusses various topics like GitHub pull requests, Zig build system, and AI growth through cloud lens. Mitchell's writing covers a range of technical subjects and his experiences in the startup world.
Understanding Machine Learning - From Theory to Algorithms
June 17, 2024
Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David, develops machine learning from its theoretical foundations, covering topics such as PAC learning, VC dimension, and the algorithmic paradigms built on them.
UB Might Be a Wrong Term for Newer Languages Apr 2, 2023
June 17, 2024
The author suggests that using the term "undefined behavior" in newer languages like Zig and Rust may not be the best choice due to differences in semantics. In C, implementations can define some behaviors left undefined by the standard, but in Rust and Zig, any program showing undefined behavior is considered invalid. The author proposes using terms like "non-trapping programming error" or "invalid behavior" to better convey the intended semantics in these languages.
What Every C Programmer Should Know About Undefined Behavior #1/3
June 17, 2024
This blog post explains that many seemingly reasonable things in C actually have undefined behavior, leading to common bugs in programs. Undefined behavior in C allows for optimizations that improve performance but can result in unexpected outcomes like formatting your hard drive. Understanding undefined behavior is crucial for C programmers to prevent potential issues and improve code efficiency.
The Rustonomicon
June 16, 2024
The Rustonomicon is a book for understanding Unsafe Rust programming details. It complements The Rust Programming Language by delving into combining language pieces and potential issues. The book covers topics like (un)safety, creating safe abstractions with unsafe primitives, and working with memory, but does not provide exhaustive API details.
chrono-Compatible Low-Level Date Algorithms
June 15, 2024
The text explains algorithms for handling dates and determining leap years. It includes functions for calculating the last day of a month and converting dates between different calendar systems. The algorithms are designed to be efficient and accurate for various date calculations.
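The leap-year and last-day-of-month logic the article builds on can be sketched directly (a minimal Python version, not the article's chrono-compatible C++):

```python
def is_leap(y: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def last_day_of_month(y: int, m: int) -> int:
    """Last day of month m (1-12) in year y."""
    days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return 29 if m == 2 and is_leap(y) else days[m - 1]

# 2000 is a leap year (divisible by 400); 1900 is not (century, not by 400).
```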
Step-by-Step Diffusion: An Elementary Tutorial
June 15, 2024
An elementary tutorial on diffusion models by Preetum Nakkiran, Arwen Bradley, Hattie Zhou, and Madhu Advani.
So Many New Systems Programming Languages II
June 14, 2024
The text discusses new systems programming languages like Rust, Zig, and Odin, highlighting their safety features and syntax. These languages offer improved memory management and safety compared to older languages like C and C++. Rust, in particular, stands out for its memory safety, threading support, and borrow checker.
zackoverflow
June 14, 2024
Zack, the author, enjoys building things and delving into the inner workings of systems and computers for dopamine. He works on the Bun JavaScript runtime and creates music when not coding. Zack invites anyone to chat through his open calendar link.
From Theory To Implementation
June 14, 2024
Physically Based Rendering is a widely-used textbook in computer graphics that combines theory with practical implementation for creating realistic images. The book, authored by industry experts, offers cutting-edge algorithms and ideas, including GPU ray tracing, to help readers design advanced rendering systems. Both the third and fourth editions of the book are available online for free.
Speech-to-text models
June 13, 2024
Speech-to-text AI enhances communication and accessibility by transcribing spoken words into text accurately and efficiently. Machine learning and AI advancements have significantly improved the accuracy and adaptability of speech-to-text systems. These technologies open up new possibilities for inclusive and effective communication across various industries.
Ray Tracing in One Weekend
June 13, 2024
"Ray Tracing in One Weekend" introduces readers to the concept of ray tracing through a step-by-step guide to creating a ray tracer that produces images. The document covers topics such as sending rays into the scene, ray-sphere intersection, shading, and reflection. It explains the mathematical aspects behind ray tracing, including formulas for sphere intersections and normal vectors. The guide progresses from creating a simple image of a sphere to more complex scenes, providing insights into the coding process and considerations for optimizing the rendering process.
Untangling Lifetimes: The Arena Allocator
June 13, 2024
The text discusses the arena allocator as an alternative to traditional manual memory management in C, addressing issues with malloc and free. The arena allocator simplifies memory allocation and deallocation by grouping lifetimes together in a single block of memory. It provides a more efficient and manageable way to handle memory usage in codebases compared to the malloc and free approach.
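The core idea, one bump pointer and a single release for a whole group of lifetimes, can be sketched in a few lines (a toy Python model standing in for the C implementation):

```python
class Arena:
    """Bump-pointer arena: every allocation shares one lifetime and is
    released in a single operation, instead of per-pointer free() calls."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.used = 0

    def alloc(self, size: int) -> memoryview:
        if self.used + size > len(self.buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self.buf)[self.used:self.used + size]
        self.used += size
        return view

    def release(self) -> None:
        self.used = 0  # frees every allocation at once; old views become stale

arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(16)   # arena.used is now 32
arena.release()       # both lifetimes end together
```

Grouping lifetimes this way trades fine-grained reclamation for trivially correct cleanup: there is no per-allocation bookkeeping to get wrong.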
Tree-Structured Concurrency — 2023-07-01
June 12, 2024
Structured concurrency is a programming concept that ensures clear control flow in concurrent programs. In the context of async Rust, it guarantees properties like cancellation propagation, which means that dropping a future will also cancel all nested futures. The text discusses examples of unstructured and structured concurrency patterns, emphasizing the importance of applying structured concurrency to improve program correctness and maintainability. It also mentions the need for more API support to fully achieve structured concurrency in async Rust, suggesting practical approaches like using task queues or adopting the smol model for task spawning. Overall, structured concurrency provides a way to reason about async Rust programs effectively and enhance their reliability.
immersivemath: Immersive Linear Algebra
June 12, 2024
This text introduces a book on linear algebra with chapters covering vectors, dot products, matrix operations, and more. It aims to help readers understand fundamental concepts and tools in linear algebra through clear explanations and examples. The book includes topics such as Gaussian elimination, determinants, rank, and eigenvalues.
BSTJ 57: 6. July-August 1978: The UNIX Time-Sharing System. (Ritchie, D.M.; Thompson, K.)
June 12, 2024
The UNIX Time-Sharing System is a versatile operating system with unique features. It runs on Digital Equipment Corporation computers and emphasizes simplicity and ease of use. UNIX has been widely adopted for research, education, and document preparation purposes.
Principles of compiler design
June 12, 2024
This text is about Principles of Compiler Design, the classic textbook by Alfred V. Aho and Jeffrey D. Ullman. The book runs 604 pages and includes bibliographical references, but access to the EPUB and PDF versions is not available.
A Mathematical Theory of Communication
June 12, 2024
The paper extends communication theory by considering noise in the channel, savings from message structure, and channel capacity. It discusses entropy, coding efficiency, channel capacity, noisy channels, equivocation, and optimal information transmission techniques. Examples and theorems are provided to explain the concepts of encoding, channel capacity, and noise in communication systems.
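The entropy measure at the heart of the paper is easy to compute directly; a minimal sketch of Shannon entropy in bits:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum p * log2(p), in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin carries exactly one bit per toss; a certain outcome carries none,
# and a biased source falls strictly in between.
fair = entropy([0.5, 0.5])      # 1.0
certain = entropy([1.0])        # 0.0
biased = entropy([0.9, 0.1])    # about 0.47
```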
Mapping the whole internet with Hilbert curves
June 12, 2024
The author mapped the internet using Hilbert curves to visualize IP addresses. The curves help display the vast network structure in a more comprehensible way. The scan revealed interesting patterns and changes in IP address allocations over time.
Hausdorff dimension - Wikipedia
June 12, 2024
xorvoid
June 12, 2024
Anthony Bonkoski, a computer enthusiast and engineer, shares his experiences in programming and working in quantitative finance. He enjoys working on various projects and has expertise in low-level programming, distributed systems, and reverse-engineering. Currently taking a break from full-time work, he is open to part-time consulting projects and enjoys writing and exploring new interests.
A Recipe for Training Neural Networks
June 11, 2024
The text discusses common mistakes in training neural networks and emphasizes the importance of patience and attention to detail for successful deep learning. It provides a recipe for training neural networks, including steps like setting up a training skeleton, visualizing losses, and focusing on regularization and tuning to improve model performance. The text also highlights the value of adding more real data and using ensembles to enhance accuracy.
You own your data, in spite of the cloud
June 11, 2024
The text discusses the benefits of local-first software, emphasizing ownership and control of data while also enabling seamless collaboration. It compares traditional cloud apps with new approaches that prioritize user ownership and real-time collaboration. The focus is on developing software that combines the convenience of cloud apps with the data ownership of traditional software.
Writing CUDA Kernels for PyTorch
June 11, 2024
The text shows the thread distribution on different streaming multiprocessors (SM) in CUDA. Threads are organized into warps, lanes, and specific thread numbers within each SM. This information is crucial for optimizing CUDA kernels in PyTorch.
Multi-Query & Grouped-Query Attention
June 11, 2024
The text explains how Multi-Query Attention and Grouped-Query Attention reduce the Key-Value Cache size in transformer models while maintaining performance. Multi-Query Attention allows multiple attention heads to share key and value vectors, while Grouped-Query Attention groups these vectors based on a hyperparameter, offering a balance between performance and cache reduction. These techniques help manage memory usage during text generation tasks in transformer models.
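The cache-size trade-off is simple arithmetic; a sketch with hypothetical model dimensions (32 layers, head size 128, 4096-token context, fp16):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache; the leading 2 covers both keys and values."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model: 32 layers, 32 query heads, head_dim 128, 4096 tokens.
mha = kv_cache_bytes(32, 32, 128, 4096)  # multi-head: one KV head per query head
gqa = kv_cache_bytes(32, 8, 128, 4096)   # grouped-query: 8 KV groups
mqa = kv_cache_bytes(32, 1, 128, 4096)   # multi-query: one shared KV head
# mha is 2 GiB; GQA cuts it 4x, MQA 32x
```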
999 crates of Rust on the wall
June 11, 2024
The author compared popular crates on crates.io to their upstream repositories to improve supply chain security. Most top crates matched their repositories, but some had issues like missing VCS info or build failures. Future work includes extending this analysis to all crates on crates.io and improving publishing processes for better security.
Uiuisms
June 9, 2024
This text provides a list of Uiua functions for solving common problems. Contributors can add more functions to the list in the repository. Functions include splitting arrays, removing rows, upscaling matrices, and working with diagonal arrays.
Arithmetic functions
June 9, 2024
BQN's arithmetic functions mirror mathematical notation and apply element-wise to arrays. BQN supports basic arithmetic operations like addition, subtraction, multiplication, division, exponentiation, and root functions. Character arithmetic is a distinctive feature allowing manipulation of characters with symbols like + and -.
An interactive study of queueing strategies
June 9, 2024
This text explores different queueing strategies for handling requests, emphasizing the importance of prioritizing requests effectively to prevent dropping important ones. It introduces concepts like FIFO and priority queues, as well as active queue management techniques to optimize request processing. Understanding these strategies can help in efficiently managing queues and improving overall system performance.
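The contrast between FIFO and priority queueing can be sketched with the standard library (the requests and priority numbers are made up for illustration):

```python
import heapq
from collections import deque

# FIFO: serve strictly in arrival order.
fifo = deque()
for req in ["a", "b", "c"]:
    fifo.append(req)
served_fifo = [fifo.popleft() for _ in range(len(fifo))]

# Priority queue: serve the lowest priority number first, regardless of arrival.
pq = []
for prio, req in [(2, "bulk"), (0, "health-check"), (1, "user")]:
    heapq.heappush(pq, (prio, req))
served_prio = [heapq.heappop(pq)[1] for _ in range(len(pq))]
# served_prio == ["health-check", "user", "bulk"]
```

Under overload, the priority queue drops or delays "bulk" work instead of the health checks a load balancer depends on, which is the article's point about choosing what to sacrifice.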
A DSL for Implementing Math Functions
June 8, 2024
MCC15-04
June 8, 2024
ethereumbook/04keys-addresses.asciidoc at develop · ethereumbook/ethereumbook · GitHub
June 6, 2024
This chapter introduces public key cryptography used in Ethereum for securing ownership of funds through private keys and addresses. Public keys are derived from private keys and are represented as points on an elliptic curve. Ethereum addresses are unique identifiers generated from public keys using the Keccak-256 hash function.
Accidentally Turing-Complete
June 6, 2024
The document "Accidentally Turing-Complete" explores various unexpected systems and technologies that unintentionally exhibit Turing completeness, a property that allows them to perform any computation. Examples include C++ templates, TypeScript, Java generics, X86 mov instructions, Magic: The Gathering card game, HTML5, Minecraft, Dwarf Fortress game, SQL, Apache Rewrite Rules, Pokemon Yellow game, Scala type system, MediaWiki templates, Little Big Planet game, Sendmail, Vim Normal-Mode, Border Gateway Protocol (BGP), Excel, Super Mario World glitches, PowerPoint, Font Shaping, JBIG2 Image Compression, and Stupid RDMA NICs. The document showcases how these diverse systems, from games to internet protocols, can unexpectedly demonstrate the computational power of Turing completeness.
The Art of Computer Programming, Vol. 4 Fascicle 6
June 6, 2024
The Manga Guide to Linear Algebra
June 6, 2024
Exploring architectures- Transformers II
June 6, 2024
The text explains how Transformers utilize queries, keys, and values to calculate self-attention weights for tokens. It details the process of obtaining the self-attention weights and generating output tokens through neural networks. The final steps involve calculating loss using cross-entropy and backpropagating to update the weight parameters.
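The query/key/value step can be sketched for toy 2-d token vectors (a minimal scaled dot-product attention, without the learned projection matrices the post also covers):

```python
from math import exp, sqrt

def softmax(xs):
    m = max(xs)                     # subtract max for numerical stability
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(Q, K, V):
    """Scaled dot-product attention: weights = softmax(QK^T / sqrt(d))."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

Each output row is a convex combination of the value rows, with the mixing weights set by query-key similarity.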
What are Diffusion Models?
June 6, 2024
Diffusion models slowly add noise to data and then learn to reverse the process to create desired samples. Unlike other models, diffusion models have a fixed procedure and high-dimensional latent variables. Training a diffusion model involves approximating conditioned probability distributions and simplifying the objective function.
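The fixed noising procedure has a closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε with ε ~ N(0, 1); a toy sketch on scalar features (the ᾱ values here are hypothetical):

```python
from math import sqrt
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Closed-form forward process: x_t = sqrt(a)*x0 + sqrt(1-a)*noise."""
    return [sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * rng.gauss(0, 1)
            for x in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
clean = forward_diffuse(x0, 1.0, rng)   # alpha_bar = 1: no noise, clean == x0
noisy = forward_diffuse(x0, 0.1, rng)   # alpha_bar near 0: mostly noise
```

Because any x_t can be sampled directly from x_0, training never has to simulate the chain step by step; the model only learns the reverse direction.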
Problems with BQN
June 6, 2024
BQN has issues with incoherent monad-dyad pairs and train structures, making code readability and implementation challenging. Modifications like the Constant modifier ˙ attempt to address these challenges. However, there are still limitations in tacit code construction and array reductions that impact the language's usability.
Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model
June 5, 2024
The paper presents a simple and effective denoising-diffusion model called Iterative α-(de)Blending. It offers a user-friendly alternative to complex theories, making it accessible with basic calculus and probability knowledge. By iteratively blending and deblending samples, the model converges to a deterministic mapping, showing promising results in computer graphics applications.
The borrow checker within
June 5, 2024
The text discusses improvements to Rust's borrow checker to align better with its core design ethos of mutation xor sharing. These changes aim to make Rust code patterns feel more intuitive and work seamlessly with the borrow checker's rules. The proposed enhancements include features like conditional return references, view types, and addressing phased initialization issues.
How should I read type system notation?
June 5, 2024
A type system in programming languages follows rules for expressions and types. Typing rules are written as relationships between expressions and their types for checking and inferring types. Contexts are used to keep track of variable types in type judgments.
Writing a Simple Garbage Collector in C
June 5, 2024
The text explains how to implement a simple garbage collector in C by writing a memory allocator function that manages free and used memory blocks using linked lists. The garbage collection algorithm involves scanning memory regions to mark blocks in use and free those not in use. The collector function collects unused memory blocks, making the heap scanning code simpler and faster.
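The mark and sweep phases the text describes can be sketched on a toy object graph (Python stands in for the article's C implementation):

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []      # outgoing references to other objects
        self.marked = False

def mark(roots):
    """Mark phase: flag everything reachable from the roots."""
    stack = list(roots)
    while stack:
        o = stack.pop()
        if not o.marked:
            o.marked = True
            stack.extend(o.refs)

def sweep(heap):
    """Sweep phase: keep marked objects, clear marks for the next cycle."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)            # a -> b; c is unreachable garbage
mark([a])
heap = sweep([a, b, c])     # heap == [a, b]
```

The real collector finds its roots by scanning the stack and data segments for values that look like heap pointers, which is the part the Python sketch cannot show.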
A decade of developing a programming language
June 5, 2024
The author spent a decade developing the programming language Inko, transitioning from gradual to static typing and using Rust for the compiler. Recommendations include avoiding gradual typing, self-hosting compilers, and focusing on functionality over performance when building a new language. Building a language for long-term use is a time-consuming process that requires prioritizing user needs over technical complexities.
The Rust I Wanted Had No Future
June 5, 2024
The author preferred certain design choices in early Rust over the current state, such as the treatment of certain language features and performance considerations. They express a desire for a simpler, less performance-focused language with different priorities than those commonly held in the Rust community. The author reflects on their preferences for language design and the trade-offs they would have made for a more straightforward and expressive programming experience.
The Garbage Collection Handbook
June 5, 2024
The Garbage Collection Handbook is a comprehensive guide on automatic memory management, covering modern techniques and challenges faced by programmers. This second edition updates the handbook with insights from over 60 years of research and development in the field. It is essential reading for programmers looking to understand and navigate the complexities of garbage collection in modern programming languages.
A high-bias, low-variance introduction to Machine Learning for physicists
June 5, 2024
This text is an introduction to Machine Learning for physicists, highlighting the natural connections between ML and statistical physics. It explains the use of "energy-based models" inspired by statistical physics in deep learning methods. The discussion includes the application of methods from statistical physics to study deep learning and the efficiency of learning rules.
How diffusion models work: the math from scratch
June 1, 2024
Diffusion models generate diverse high-resolution images and are different from previous generative methods. Cascade diffusion models and latent diffusion models are used to scale up models to higher resolutions efficiently. Score-based generative models are similar to diffusion models and involve noise perturbations to generate new samples.
essentials-of-compilation
May 31, 2024
The text discusses the implementation of compilers for different programming languages, covering topics such as syntax definitions, interpreter extensions, and x86 assembly translation. It emphasizes simplifying the compiler process for readers by using a straightforward language and providing step-by-step guidance on compiler development. Additionally, it introduces new language features like Booleans, conditionals, and tuples, expanding the capabilities of the compilers being built.
PRACTICAL COMPILER CONSTRUCTION
May 31, 2024
"Practical Compiler Construction" is a textbook on writing compilers with annotated source code. The second edition is now available in print with improvements and bug fixes. The book covers compiler construction concepts and advanced techniques for optimizing code.
A Distributed Systems Reading List
May 31, 2024
This reading list covers materials for understanding distributed systems design and challenges. It includes resources on topics like latency, Amazon's organizational culture, Google's cutting-edge technologies, consistency models, theory, languages, tools, infrastructure, storage, Paxos consensus, and gossip protocols. The list aims to help readers adapt their thinking to effectively tackle distributed system complexities.
An Introduction to Assembly Programming with RISC-V
May 28, 2024
This text is An Introduction to Assembly Programming with RISC-V, published at riscv-programming.org under ISBN 978-65-00-15811-3.
Microsoft PowerPoint - SRAM Architecture
May 28, 2024
The text discusses the architecture of Static Random Access Memory (SRAM) cells, focusing on their read and write operations, sizing considerations, and column circuitry. SRAM cells store data using cross-coupled inverters, with specific steps for reading and writing data. Column circuitry includes bitline conditioning, sense amplifiers, and multiplexing for efficient data access.
MLIR: A Compiler Infrastructure for the End of Moore's Law
May 27, 2024
MLIR is a versatile compiler infrastructure designed to address software fragmentation and improve compilation for different hardware. It aims to reduce the cost of building domain-specific compilers and facilitate the connection of existing compilers. MLIR offers a standardized approach to code generation and optimization across various application domains and hardware targets.
MLIR — Getting Started
May 27, 2024
The text is a guide titled "MLIR — Getting Started" by Math ∩ Programming available on www.jeremykun.com.
Chapter 2 Basics of SIMD Programming
May 27, 2024
The text explains how to organize data for SIMD operations and provides examples of SIMD-Ready Vectors. It also discusses the relationship between vectors and scalars in SIMD programming. Built-in functions for VMX instructions and SIMD operation principles are outlined in the text.
Matrix multiplication in Mojo
May 27, 2024
The text discusses matrix multiplication in Mojo. It is written by modular.com and can be found on docs.modular.com.
Matrix Multiplication on CPU
May 27, 2024
The text is about matrix multiplication on a CPU. The author is Marek Kolodziej and the domain is marek.ai.
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
May 27, 2024
The text is a worklog by Simon Boehm about optimizing a CUDA Matmul Kernel for cuBLAS-like performance. It can be found on the domain siboehm.com.
The Annotated Transformer
May 27, 2024
The text discusses the architecture and training of a Transformer model. It explains the use of self-attention and feed-forward networks in the encoder and decoder, and demonstrates the model through examples of prediction and visualization of attention mechanisms.
Anonymity and the internet
May 27, 2024
Anonymity on the internet is fragile, with each piece of information reducing anonymity. Revealing multiple bits of personal information can jeopardize anonymity, but deliberate disinformation can help regain some anonymity. To protect anonymity, it's best to minimize information disclosure.
Auto-Regressive Next-Token Predictors are Universal Learners
May 26, 2024
Simple linear next-token predictors can efficiently approximate any function computable by a Turing machine. Even basic models like linear networks and shallow Multi-Layer Perceptrons show strong performance on tasks like text generation and arithmetic. By leveraging auto-regressive learning, these models can achieve impressive results in solving complex tasks.
Where Vim Came From
May 25, 2024
Vim is a popular text editor with a long history tracing back to the Unix epoch. Its development started in 1988 and evolved from the "wq text editor" concept. Vim's success is attributed to its features and the gradual accumulation of good ideas over time.
Building and operating a pretty big storage system called S3
May 25, 2024
Andy Warfield, writing on Werner Vogels' All Things Distributed blog, shares insights from working on Amazon's S3 storage system, highlighting the scale and unique challenges involved. S3's design incorporates innovative strategies to efficiently handle vast amounts of data across millions of hard drives while prioritizing customer experience. Warfield emphasizes the need for a broader perspective on software systems and the rewarding journey of scaling as an engineer at Amazon.
Unnamed Document
May 25, 2024
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
May 25, 2024
Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across various tasks thanks to super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-width. This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation problems that arise in LLM compression. Specifically, we evaluate 10 existing post-training quantization and LoRA-finetuning methods on LLaMA3 at 1-8 bits and on diverse datasets to comprehensively reveal LLaMA3's low-bit quantization performance. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in these scenarios, especially at ultra-low bit-width. This highlights the signif...
LADW_2017-09-04
May 25, 2024
This text discusses properties of vector spaces and matrices, particularly focusing on bases and eigenvalues. It establishes that any linearly independent system of vectors can be completed to form a basis in a finite-dimensional vector space. Additionally, it explains that operators in inner product spaces have an upper triangular matrix representation under certain conditions.
New Scaling Laws for Large Language Models
May 25, 2024
DeepMind's new paper challenges existing scaling laws for training large language models, proposing more optimal use of compute resources. By training a smaller 70-billion parameter model using their new scaling laws, DeepMind demonstrated superior performance compared to larger models like GPT-3 and their own 270-billion parameter model. This discovery may lead to more cost-effective and efficient training of large language models in the future.
Binary Magic: Building BitNet 1.58bit Using PyTorch from Scratch
May 25, 2024
The document discusses the creation of a 1.58bit model called BitNet using PyTorch from scratch, which can rival full precision LLMs. Quantization, the process of representing float numbers with fewer bits, is explained as a method to increase the speed and reduce the RAM consumption of ML models, albeit with some loss of accuracy. BitNet differs from existing quantization approaches as it trains the model from scratch with quantization, offering a unique quantization algorithm and implementation in PyTorch. Results from experiments with custom PyTorch implementations show that the 2bit and 1bit variants of models perform as well as full precision models, demonstrating the potential of this approach.
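The idea of quantizing weights to the ternary set {-1, 0, +1} can be sketched with absmean scaling (a simplified sketch in the spirit of BitNet b1.58, not the document's PyTorch implementation):

```python
def ternary_quantize(weights):
    """Absmean quantization to {-1, 0, +1}: scale by the mean absolute
    weight, then round and clip. A simplified sketch, not the exact recipe."""
    scale = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

q, scale = ternary_quantize([0.8, -0.3, 0.05, -1.2])
# q == [1, -1, 0, -1]; q[i] * scale coarsely reconstructs each weight
```

With weights restricted to three values, the matrix multiplies in the forward pass reduce to additions and subtractions, which is where the speed and memory savings come from.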
king - man + woman is queen; but why?
May 25, 2024
The text explains how the word2vec algorithm transforms words into vectors for analyzing similarities and relationships between words. By using vector arithmetic, it can find analogies such as "king - man + woman = queen." Understanding word co-occurrences can provide insight into the meaning of words through the distributional hypothesis.
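The analogy arithmetic can be sketched with hand-picked toy vectors (real word2vec embeddings are learned; these 2-d values are contrived so the analogy lands exactly):

```python
# Hand-crafted toy embeddings; real word2vec vectors are learned from
# co-occurrence statistics and live in hundreds of dimensions.
vecs = {
    "king":  [0.9, 0.8],
    "man":   [0.5, 0.1],
    "woman": [0.5, 0.9],
    "queen": [0.9, 1.6],
}

def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]

def nearest(query, vocab):
    """Word whose vector is closest (squared Euclidean) to the query."""
    def dist2(u, v): return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(vocab, key=lambda w: dist2(vocab[w], query))

target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])
# nearest(target, vecs) == "queen"
```

The offset king - man isolates a "royalty" direction here by construction; the article's point is that learned embeddings acquire such directions from co-occurrence statistics alone.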
1-bit Model
May 25, 2024
Quantizing small models like Llama2-7B at 1-bit yields poor performance but fine-tuning with low-rank adapters significantly improves output quality. The HQQ+ approach shows potential in extreme low-bit quantization for machine learning models, reducing memory and computational requirements while maintaining performance. Training larger models with extreme quantization can lead to superior performance compared to training smaller models from scratch.
Human Knowledge Compression Contest
May 25, 2024
The Human Knowledge Compression Contest measures intelligence through data compression ratios. Better compression leads to better prediction and understanding, showcasing a link between compression and artificial intelligence. The contest aims to raise awareness of the relationship between compression and intelligence, encouraging the development of improved compressors.
Heatmaps and CNNs Using Fast.ai
May 25, 2024
The text discusses heatmaps, CNNs, and their relationship in deep learning. It explains how heatmaps are generated using Grad-CAM heatmaps from the final layer of a Convolutional Neural Network. The article also touches on creating heatmaps using Adaptive Pooling layers and interpreting top losses for model evaluation.
Where do LLMs spend their FLOPS?
May 19, 2024
LLMs (large language models) spend their FLOPS (floating point operations) on various tasks, including computing QKV (query, key, value) matrices, attention output matrices, and running the feed-forward network (FFN). The attention mechanism plays a crucial role in LLMs, even though the FLOPS required for attention calculations are relatively small. The KV cache, which stores information for each token, requires significant memory but is necessary for generating sequences. Different architectural choices, such as grouped query attention and sliding window attention, can affect the size and efficiency of the KV cache. Increasing the number of layers in an LLM linearly scales the FLOPS and parameters, while increasing the model width quadratically scales the model size. Wider models parallelize better, while deeper models increase inference time linearly.
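The bookkeeping behind these claims can be sketched in a few lines. The dimensions below are invented, roughly 7B-class values, and the usual approximation that a (1, m) × (m, n) matmul costs about 2·m·n FLOPs is assumed:

```python
# Rough per-token forward-pass FLOPs for one transformer layer.
# Hypothetical dimensions, loosely Llama-7B-like; real models vary.
d_model = 4096      # hidden width
d_ff = 11008        # feed-forward inner width
n_ctx = 2048        # number of cached keys/values attended over

qkv_proj   = 3 * 2 * d_model * d_model   # Q, K, V projections
attn_out   = 2 * d_model * d_model       # attention output projection
attn_score = 2 * 2 * n_ctx * d_model     # QK^T scores + weighted sum over V
ffn        = 3 * 2 * d_model * d_ff      # gated FFN: up, gate, down matmuls

total = qkv_proj + attn_out + attn_score + ffn
print(f"attention score/sum: {attn_score:.3e} FLOPs")
print(f"feed-forward:        {ffn:.3e} FLOPs")
print(f"FFN share of layer:  {ffn / total:.0%}")
```

Even at a 2048-token context, the actual attention computation (`attn_score`) is a small slice next to the projection and FFN matmuls, which matches the post's point that attention FLOPs are relatively minor while the KV cache dominates memory.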
The Annotated Diffusion Model
May 19, 2024
A neural network learns to denoise data by gradually removing noise. The process involves adding noise to an image and then training the network to reverse that noising process. The network is trained to predict the noise added to corrupted images at different time steps.
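The forward (noising) side of this has a convenient closed form: x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε. A minimal sketch, assuming the linear beta schedule from the DDPM paper:

```python
import math, random

random.seed(0)

# Linear beta schedule, the common DDPM choice.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bar = []
prod = 1.0
for b in betas:
    prod *= (1.0 - b)          # alpha_bar_t = product of (1 - beta_s), s <= t
    alpha_bar.append(prod)

def q_sample(x0, t):
    """Closed-form forward process: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*noise."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * random.gauss(0, 1) for x in x0]

x0 = [1.0] * 8                 # a toy "image" of eight pixels
print(q_sample(x0, 10))        # early step: still close to x0
print(q_sample(x0, 999))       # late step: almost pure Gaussian noise
```

The training target at each step is the sampled noise itself; the network sees `q_sample(x0, t)` and `t` and must recover `noise`.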
Defusing Diffusion Models
May 19, 2024
This post explains the concepts of forward and reverse diffusion processes in diffusion models. By understanding these processes, readers can train diffusion models to generate samples from target distributions effectively. Guided diffusion models are also discussed, showing how conditioning information can be used to guide the diffusion process for specific outcomes.
The Illustrated Stable Diffusion
May 19, 2024
AI image generation with Stable Diffusion involves an image information creator and an image decoder. Diffusion models use noise and powerful computer vision models to generate aesthetically pleasing images. Text can be incorporated to control the type of image the model generates in the diffusion process.
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation
May 10, 2024
Mamba-UNet is a new architecture combining U-Net with Mamba technology for better medical image segmentation performance. It addresses limitations in modeling long-range dependencies within medical images. Results show that Mamba-UNet outperforms other UNet variations in medical image segmentation tasks.
Sparse Autoencoders Find Highly Interpretable Features in Language Models
May 1, 2024
Sparse autoencoders help identify clear and understandable features in language models by tackling the issue of polysemanticity. By using sparse autoencoders, researchers can pinpoint specific features responsible for certain behaviors in neural networks more effectively than other methods. This approach may lead to increased transparency and control over language models in the future.
KAN: Kolmogorov–Arnold Networks
May 1, 2024
Kolmogorov-Arnold Networks (KANs) have learnable activation functions on edges, outperforming Multilayer Perceptrons (MLPs) in accuracy and interpretability. KANs show faster neural scaling laws than MLPs, leveraging splines and MLPs to improve accuracy and interpretability. KANs can represent functions effectively and display more favorable scaling curves than MLPs, especially in high-dimensional examples.
KAN: Kolmogorov-Arnold Networks
May 1, 2024
KANs outperform MLPs in accuracy and interpretability by using learnable activation functions on edges. They have faster neural scaling laws and can represent special functions more efficiently. KANs offer a promising alternative to MLPs in various applications, showcasing improved performance and interpretability.
Structure and Interpretation of Computer Programs, 2nd ed.
April 30, 2024
The text discusses key concepts in programming, such as primitive expressions, means of combination, and means of abstraction. It highlights the role of the environment in determining the meaning of symbols in expressions. The evaluation process involves reducing expressions to procedures applied to arguments, leading to a deeper understanding of programming concepts.
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
April 25, 2024
The reproducibility and transparency of large language models are crucial for
advancing open research, ensuring the trustworthiness of results, and enabling
investigations into data and model biases, as well as potential risks. To this
end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a
layer-wise scaling strategy to efficiently allocate parameters within each
layer of the transformer model, leading to enhanced accuracy. For example, with
a parameter budget of approximately one billion parameters, OpenELM exhibits a
2.36% improvement in accuracy compared to OLMo while requiring 2× fewer
pre-training tokens.
Diverging from prior practices that only provide model weights and inference
code, and pre-train on private datasets, our release includes the complete
framework for training and evaluation of the language model on publicly
available datasets, including training logs, multiple checkpoints, and
pre-training configurations. We also release code to convert models to MLX
libra...
An Infinitely Large Napkin
April 23, 2024
"An Infinitely Large Napkin" by Evan Chen is an informal survey that walks through large areas of higher mathematics (algebra, topology, analysis, and more) at a brisk pace aimed at motivated beginners.
IEEE Xplore Full-Text PDF:
April 10, 2024
Root Mean Square Layer Normalization
April 9, 2024
The text discusses a technique called Root Mean Square Layer Normalization proposed by Biao Zhang and Rico Sennrich. This technique is likely related to a method for normalizing data in neural networks. The authors' work can be found on arxiv.org.
Root Mean Square Layer Normalization
April 9, 2024
Layer normalization (LayerNorm) has been successfully applied to various deep
neural networks to help stabilize training and boost model convergence because
of its capability in handling re-centering and re-scaling of both inputs and
weight matrix. However, the computational overhead introduced by LayerNorm
makes these improvements expensive and significantly slows the underlying
network, e.g. RNN in particular. In this paper, we hypothesize that
re-centering invariance in LayerNorm is dispensable and propose root mean
square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs
to a neuron in one layer according to root mean square (RMS), giving the model
re-scaling invariance property and implicit learning rate adaptation ability.
RMSNorm is computationally simpler and thus more efficient than LayerNorm. We
also present partial RMSNorm, or pRMSNorm where the RMS is estimated from p% of
the summed inputs without breaking the above properties. Extensive experiments
on several tasks using diverse...
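The operation the abstract describes is compact enough to state directly; a minimal sketch following Zhang & Sennrich's definition (gain vector and epsilon names are illustrative):

```python
import math

def rms_norm(x, gain, eps=1e-8):
    """RMSNorm: rescale by the root mean square of the inputs (no
    re-centering, unlike LayerNorm), then apply a learned per-dim gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

x = [2.0, -1.0, 3.0, 0.5]
y = rms_norm(x, gain=[1.0] * 4)
print(y)
# Re-scaling invariance: rms_norm(c * x) == rms_norm(x) for any c > 0,
# which is the property the paper argues matters (re-centering is dropped).
print(rms_norm([2.0 * v for v in x], gain=[1.0] * 4))
```

Dropping the mean subtraction and variance computation is where the speedup over LayerNorm comes from; pRMSNorm goes further by estimating the RMS from only p% of the inputs.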
Terry A. Davis
April 7, 2024
Terry A. Davis, an American electrical engineer and programmer, created TempleOS, a public domain operating system. Despite his mental health challenges, Davis gained an online following for his unique work and beliefs. His legacy continues to be remembered through documentaries and online discussions.
Pattern Recognition and Machine Learning
April 6, 2024
The content discusses likelihood functions for Gaussian distributions, maximizing parameters using observed data, Bayesian model comparison, mixture density networks, and EM algorithm for Gaussian mixtures. It covers topics like posterior distributions, predictive distributions, graphical models, and variational inference. The material emphasizes probability distributions, optimization, and model comparison.
Ludwig Wittgenstein: The Duty of Genius
April 6, 2024
The text discusses the complex relationship between Ludwig Wittgenstein and his peers, particularly Bertrand Russell. Wittgenstein's philosophical ideas and personal struggles are highlighted, showing the challenges he faced in expressing his thoughts and finding understanding from others. Despite his brilliance, Wittgenstein's life was marked by loneliness and inner turmoil, making it difficult for him to fully convey his philosophical insights.
Generative Agents: Interactive Simulacra of Human Behavior
March 28, 2024
The content discusses generative agents that simulate believable human behavior for interactive applications. These agents populate a sandbox environment, interact with each other, plan their days, form relationships, and exhibit emergent social behaviors. The paper introduces a novel architecture that allows agents to remember, retrieve, reflect, and interact dynamically.
Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks
March 11, 2024
The text is a comprehensive survey of 400 activation functions for neural networks. It provides numerous URLs and DOIs for further reading and reference. The authors are Vladimír Kunc and Jiří Kléma.
Revisiting Deep Learning as a Non-Equilibrium Process
March 8, 2024
The document discusses the nature of Deep Learning systems, highlighting differences from traditional machine learning systems and challenging common misconceptions. It emphasizes the complexity and non-convexity of Deep Learning, noting that optimization techniques alone cannot explain its success. The text critiques the field for lacking in-depth exploration of the true nature of Deep Learning, pointing out a tendency towards superficial explanations and reliance on celebrity figures rather than rigorous scientific inquiry. It delves into the use of Bayesian techniques, the role of noise, and the importance of architecture in Deep Learning, arguing for a deeper understanding of the underlying processes and the need for more precise language and theoretical exploration.
Dissipative Adaptation: The Origins of Life and Deep Learning
March 8, 2024
The document explores the concept of Dissipative Adaptation, drawing parallels between the emergence of life and the mechanisms of Deep Learning. It discusses the work of Jeremy England and his theory of non-equilibrium statistical mechanics known as Dissipative Adaptation, which explains the self-organizing behavior of Deep Learning. The text delves into how neural networks evolve through training, emphasizing the role of external observations in driving the system towards minimizing entropy. It contrasts the mechanisms of Dissipative Adaptation with current Deep Learning architectures, highlighting similarities in alignment of components to maximize energy dissipation or information gradient.
A Gentle Introduction to LLVM IR
March 6, 2024
Learning LLVM IR can be beneficial for generalist working programmers to understand what their compiler is doing to create highly optimized code. LLVM IR is well-documented and can be treated as a slightly weird programming language. It is strongly typed and requires explicit type annotations. LLVM IR is a static single assignment form (SSA) IR and has properties that make optimizations simpler to write. It supports control flow operations, arithmetic instructions for different types, and memory operations. There are also LLVM intrinsics available for specific functions. However, some parts of LLVM's semantics, such as undefined behavior and pointer provenance, can be challenging to navigate.
Re: [Fis] A PROPOSAL ABOUT THE DEFINITION OF INFORMATION
March 6, 2024
The email exchange discusses the concept of negative entropy and its implications in mathematics and thermodynamics. Sungchul Ji questions the validity of negative entropy based on the Third Law of Thermodynamics. Arturo Tozzi argues for the existence of negative entropy in certain cases and relates it to information theory and free energy.
The Art of Embeddings: Transforming Text for Vector Databases (Part 2)
March 6, 2024
Embeddings are a crucial component of transforming text into vectors in vector databases. They capture rich context and make data more useful by capturing meaning and context in a machine-readable format. Tokenization is the first step in the embedding process, where text is broken down into smaller parts or tokens. Word2Vec is a popular method that creates dense vector representations of word features based on context. However, it has limitations such as struggling with polysemy and out-of-vocabulary words. Sub-word tokenization is a hybrid approach that can handle these limitations by decomposing words into meaningful sub-words. Transformer models, such as BERT, are used to transform tokenized words into embeddings by leveraging self-attention mechanisms and positional encodings. The choice of tokenization method can significantly affect the size and effectiveness of the embeddings, including vocabulary size, handling of out-of-vocabulary words, and overall quality and usefulness of the embeddings. Choosing th...
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
March 6, 2024
The text discusses a method called Parameter-Efficient Sparsity Crafting (PESC) that enhances sparse models for natural language processing tasks. PESC involves integrating adapters into sparse models, improving performance without changing individual weights. The approach outperforms other sparse models and even competes with GPT-3.5 in various tasks.
The Little Book of Deep Learning
March 6, 2024
Information
March 6, 2024
The text discusses the challenges and complexities of measuring and quantifying information, particularly in terms of storage capacity, compression, and entropy. It explores various examples, such as genome information, human sensory capabilities, and the information content of objects like water molecules and black holes. The relationship between information, entropy, and physical properties is also highlighted.
Sequence to Sequence Learning with Neural Networks
March 6, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
March 6, 2024
The article introduces a new era of 1-bit Large Language Models (LLMs) that can significantly reduce the cost of LLMs while maintaining their performance. BitNet b1.58 is a 1.58-bit LLM variant in which every parameter is ternary, taking on values of {-1, 0, 1}. It retains all the benefits of the original 1-bit BitNet, including its new computation paradigm, which requires almost no multiplication operations for matrix multiplication and can be highly optimized. Moreover, BitNet b1.58 offers two additional advantages: its modeling capability is stronger due to its explicit support for feature filtering, and it can match full precision (i.e., FP16) baselines in terms of both perplexity and end-task performance at a 3B size.
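The ternary step can be sketched with the absmean-style quantizer described for BitNet b1.58 (scale by the mean absolute weight, round, clip); the weights below are invented:

```python
def absmean_ternary(weights, eps=1e-8):
    """Sketch of an absmean quantizer: scale by the mean absolute value,
    round to the nearest integer, clip into {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

w = [0.31, -0.02, -0.77, 0.05, 1.20, -0.48]
q, scale = absmean_ternary(w)
print(q)       # every weight is now -1, 0, or 1
print(scale)
# A matmul against q needs only additions and subtractions -- this is the
# "almost no multiplication" computation paradigm the paper highlights.
```

The explicit 0 value is what distinguishes b1.58 from the original 1-bit BitNet: zeroed weights act as feature filtering, which the authors credit for the stronger modeling capability.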
How to round to 2 decimals with Python? [duplicate]
March 5, 2024
To round a number to 2 decimals in Python, the usual method is round(value, significantDigit), but it can behave unexpectedly when the digit to be dropped is a 5, because the underlying binary float is often stored as slightly less than it appears. To address this, the workaround adds a small value before rounding. This allows the traditional half-up rounding commonly used in statistics without needing to import additional libraries like Decimal. By wrapping the workaround in a function, you can achieve the desired rounding results for numbers ending in 5.
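The surprise, and a standard-library alternative to the epsilon workaround (the `decimal` module's half-up mode, named here as an alternative rather than the post's own method):

```python
from decimal import Decimal, ROUND_HALF_UP

# 2.675 is stored in binary as slightly less than 2.675,
# so round() gives 2.67, not the "traditional" 2.68:
print(round(2.675, 2))                 # -> 2.67

def round_half_up(value, places=2):
    """Half-up rounding via the decimal module, avoiding float surprises."""
    q = Decimal(10) ** -places
    return float(Decimal(str(value)).quantize(q, rounding=ROUND_HALF_UP))

print(round_half_up(2.675))            # -> 2.68
```

Going through `Decimal(str(value))` matters: constructing the `Decimal` from the string sidesteps the binary representation that causes the problem in the first place.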
Rounding floats with f-string [duplicate]
March 5, 2024
Using %-formatting, I can specify the number of decimal places in a string:
x = 3.14159265
print('pi = %0.2f' %x)
This would give me:
pi = 3.14
Is there any way of doing this using f-strings in ...
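Yes: f-strings accept the same format-spec mini-language after a colon, so the %-formatting example carries over directly (and the precision can even be nested as another expression):

```python
x = 3.14159265
print(f'pi = {x:0.2f}')          # -> pi = 3.14

# The precision itself can be an expression, nested in its own braces:
digits = 3
print(f'pi = {x:.{digits}f}')    # -> pi = 3.142
```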
Latent Interfaces
March 3, 2024
In a career shift, the author is launching Latent Interfaces to apply expertise in design, prototyping, and development to complex data challenges. They share insights into a genomic data project, emphasizing the importance of Python skills alongside JavaScript. The document showcases the creation of intuitive data interfaces and the design process involving both digital and physical tools. Additionally, the author discusses the significance of well-designed APIs like StabilityAI and the potential for future collaborations in data visualization projects.
Hypercomputation
March 3, 2024
Hypercomputation and super-Turing computation involve models of computation that can produce non-Turing-computable outputs. Introduced in the early 1990s, super-Turing computing is inspired by neurological and biological systems and serves as the foundation for Lifelong Machine Learning. Hypercomputation, a field introduced in the late 1990s, includes philosophical constructs and aims to compute functions beyond what a Turing machine can. The Church-Turing thesis states that any "computable" function can be computed by a Turing machine, but hypercomputers can compute functions that are not computable in the Church-Turing sense. Various hypercomputer models exist, ranging from theoretical concepts like oracle machines to more plausible models like quantum computing. Some proposals suggest that hypercomputation may be achievable through systems like neural networks or analog computers. Critics argue that hypercomputation is not physically realizable.
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
February 28, 2024
Recent research is leading to a new era of 1-bit Large Language Models (LLMs), such as BitNet, introducing a variant called BitNet b1.58 where every parameter is ternary {-1, 0, 1}. This model matches the performance of full-precision Transformer LLMs while being more cost-effective in terms of latency, memory, throughput, and energy consumption. The 1.58-bit LLM sets a new standard for training high-performance and cost-effective models, paving the way for new computation methods and specialized hardware designed for 1-bit LLMs.
How Netflix Really Uses Java
February 27, 2024
The discussion at Netflix delves into how Java is utilized within the company's architecture, highlighting their transition to Java 17 and ongoing testing with Java 21. The move to newer Java versions resulted in significant performance improvements, such as 20% better CPU usage with Java 17. Additionally, the implementation of GraphQL Federation and virtual threads in Java 21 are key advancements that are expected to impact the way code is written and scaled within Netflix's Java stack. The company's shift from Java 8 to Java 17 and the ongoing evolution of their technology frameworks and tooling, particularly focusing on Spring Boot, demonstrate their commitment to staying current with Java developments.
Scheduling Internals
February 27, 2024
The document delves into the concept of concurrency in programming, exploring how tasks can be handled concurrently using different methods like threads, async I/O, event loops, and schedulers. It discusses the challenges and benefits of each approach, illustrating examples in C code to demonstrate the practical implementations. The text covers topics like preemptive and non-preemptive schedulers, implementation details in languages like Go and Rust, as well as the use of event loops for efficient task handling. It also touches on the importance of understanding program state management and the impact on task execution in concurrent programming.
Glossary of Deep Learning: Word Embedding
February 26, 2024
Word embedding is a method that transforms text into numerical vectors for machine learning algorithms to process efficiently. These vectors are created to represent words or phrases as real numbers, focusing on dimensionality reduction and contextual similarity. Word2Vec is a popular algorithm that implements this approach using techniques like CBOW and Skip-gram to predict target words based on their context. While word embeddings are not deep learning themselves, they provide a way for deep nets to interpret and understand natural language, offering a new understanding of language as numbers.
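"Contextual similarity as geometry" reduces to a dot product. A toy sketch with invented 4-dimensional vectors (real trained embeddings have hundreds of dimensions):

```python
import math

# Invented toy embeddings for illustration -- not trained Word2Vec vectors.
emb = {
    "king":  [0.8, 0.6, 0.1, 0.9],
    "queen": [0.7, 0.7, 0.2, 0.9],
    "apple": [0.1, 0.2, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: the standard closeness measure for embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

print(cosine(emb["king"], emb["queen"]))   # high: similar contexts
print(cosine(emb["king"], emb["apple"]))   # low: unrelated contexts
```

Training (CBOW or Skip-gram) is what arranges the vectors so that words appearing in similar contexts end up with high cosine similarity; the measurement itself is just this.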
gemini_v1_5_report
February 18, 2024
Gemini 1.5 Pro is a highly compute-efficient multimodal model that can recall and reason over millions of tokens of context, including long documents, videos, and audio. It achieves near-perfect recall on long-context retrieval tasks and outperforms the state-of-the-art in long-document QA, long-video QA, and long-context ASR. Gemini 1.5 Pro also showcases surprising new capabilities, such as learning to translate a new language from a grammar manual. The model surpasses the previous Gemini 1.0 Pro and performs at a similar level to 1.0 Ultra on a wide range of benchmarks while requiring less compute to train.
How to Use t-SNE Effectively
February 16, 2024
t-SNE plots can be useful for visualizing high-dimensional data, but they can also be misleading if not interpreted correctly. The technique creates 2D "maps" of data with many dimensions, but these images can be misread. The perplexity parameter, which balances attention between local and global aspects of the data, has a significant impact on the resulting plots. Different perplexity values may be needed to capture different aspects of the data. t-SNE plots can equalize cluster sizes and distort distances between clusters, making it difficult to interpret relative sizes and distances. It's important to recognize random noise and avoid misinterpreting it as meaningful patterns. t-SNE plots can show some shapes accurately, but local effects and clumping can also affect the interpretation. For topological information, multiple plots at different perplexities may be required. Overall, using t-SNE effectively requires understanding its behavior and limitations.
Temperature as Joules per Bit
February 15, 2024
The text discusses the concept of temperature and entropy in terms of information theory, suggesting that entropy should be measured in bits rather than joules per kelvin. It highlights the importance of information in thermodynamics and how Landauer's principle relates to the cost of erasing information. The authors advocate for viewing energy and entropy as more fundamental than temperature, emphasizing the duality between energy and information.
Consciousness, Cognition and the Neuronal Cytoskeleton – A New Paradigm Needed in Neuroscience
February 14, 2024
Viewing the brain as a complex computer of simple neurons is insufficient to explain consciousness and cognition. A new paradigm is needed that considers the brain as a scale-invariant hierarchy, with quantum and classical processes occurring in cytoskeletal microtubules inside neurons. Evidence shows that microtubules regulate specific firings of axonal branches and modulate membrane and synaptic activities. This new paradigm suggests that information processing for cognitive and conscious brain functions occurs in microtubules and involves both top-down and bottom-up regulation within the brain hierarchy. The precise mechanisms of consciousness may be most likely to reveal themselves in Layer V cortical pyramidal neurons, which have a large collection of mixed polarity microtubule networks.
OpenMEA: Open-Source Microelectrode Array Platform for Bioelectronic Interfacing
February 14, 2024
OpenMEA is an open-source platform for closed-loop bioelectronics research that aims to revolutionize the treatment of medical disorders and augment physiology. It includes designs for components such as electrophysiological recording and stimulation electronics, a microfluidic perfusion system, and physical designs for multielectrode arrays. The platform enables researchers to conduct in vitro experiments and understand the long-term effects of electrical stimulation and drug interactions. OpenMEA offers high-performance processing capabilities and supports simultaneous recording and stimulation, as well as the real-time adaptation of neuromodulation waveforms. It fills the gaps in existing solutions and provides a versatile tool for bioelectronic research.
Landauer's principle
February 14, 2024
Landauer's principle is a physical principle that establishes the minimum energy consumption of computation. It states that irreversible changes in information stored in a computer dissipate a minimum amount of heat to the surroundings. The principle was proposed by Rolf Landauer in 1961 and states that the minimum energy needed to erase one bit of information is proportional to the temperature at which the system is operating. While the principle is widely accepted, it has faced challenges in recent years. However, it has been shown that Landauer's principle can be derived from the second law of thermodynamics and the entropy change associated with information gain.
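The bound itself is a one-liner, E = k_B · T · ln 2 per erased bit; the constant below is the exact post-2019 SI value:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K (exact in the 2019 SI)

def landauer_limit(temperature_k):
    """Minimum heat dissipated by irreversibly erasing one bit: k_B * T * ln 2."""
    return k_B * temperature_k * math.log(2)

print(landauer_limit(300))   # ~2.9e-21 J per bit at room temperature
```

At ~3 zeptojoules per bit, the limit sits many orders of magnitude below what current hardware dissipates per operation, which is why it functions as a theoretical floor rather than a practical constraint.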
Bremermann's limit
February 14, 2024
Bremermann's limit is a maximum rate of computation that can be achieved in a self-contained system in the material universe. It is based on Einstein's mass-energy equivalency and the Heisenberg uncertainty principle. This limit has implications for designing cryptographic algorithms, as it can determine the minimum size of encryption keys needed to create an uncrackable algorithm. The limit has also been analyzed in relation to the maximum rate at which a system with energy spread can evolve into an orthogonal state.
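The limit is commonly stated as c²/h bits per second per kilogram of matter; computing it with the exact SI constants:

```python
c = 299_792_458.0        # speed of light, m/s (exact)
h = 6.62607015e-34       # Planck constant, J*s (exact in the 2019 SI)

# Bremermann's limit: mass-energy equivalence plus the energy-time
# uncertainty relation bound the computation rate of 1 kg of matter.
rate = c**2 / h          # bits per second per kilogram
print(f"{rate:.3e}")     # on the order of 1.36e50
```

The cryptographic implication follows from the sheer size of the number: a brute-force search over a 256-bit keyspace exceeds what any self-contained physical computer could enumerate.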
Bekenstein bound
February 14, 2024
The Bekenstein bound is an upper limit on the entropy or information that can be contained within a given finite region of space with a finite amount of energy. It implies that the information of a physical system must be finite if the region of space and energy are finite. The bound was derived from arguments involving black holes and has implications for thermodynamics and general relativity. It can be proven in the framework of quantum field theory and has applications in various fields, such as black hole thermodynamics and the study of human brains.
numerical_recipes
February 14, 2024
The content provided is the table of contents for a book titled "Numerical Recipes: The Art of Scientific Computing, Third Edition." It includes various topics such as linear algebra, interpolation and extrapolation, integration of functions, evaluation of functions, special functions, random numbers, sorting and selection, root finding and nonlinear sets of equations, minimization or maximization of functions, eigensystems, and more.
Temperature as Joules per Bit
February 14, 2024
The paper suggests that temperature should be defined in terms of entropy, rather than vice versa. It argues that the current practice of measuring entropy in joules per kelvin is a historical artifact and proposes measuring entropy in bits instead. The paper also discusses the role of information in thermodynamics and the thermodynamic cost of erasure. It concludes by suggesting that entropy, not temperature, should have its own unit and that Boltzmann's constant should be dissolved.
Deep Learning Course
February 10, 2024
This document provides resources for François Fleuret's deep-learning course at the University of Geneva. The course offers a thorough introduction to deep learning, with examples using the PyTorch framework. The materials include slides, recordings, and a virtual machine. The course covers topics such as machine learning objectives, tensor operations, automatic differentiation, gradient descent, and deep-learning techniques. The document also includes prerequisites for the course, such as knowledge of linear algebra, differential calculus, Python programming, and probability and statistics.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
February 9, 2024
Diffusion Models and Associative Memories show surprising similarities in their mathematical underpinnings and goals, bridging traditional and modern AI research. This connection highlights the convergence of AI models towards memory-focused paradigms, emphasizing the importance of understanding Associative Memories in the field of computation. By exploring these parallels, researchers aim to enhance our comprehension of how models like Diffusion Models and Transformers operate in Deep Learning applications.
2309.10668
February 8, 2024
This article discusses the relationship between language modeling and compression. The authors argue that large language models can be viewed as powerful compressors due to their impressive predictive capabilities. They demonstrate that these models can achieve state-of-the-art compression rates across different data modalities, such as images and audio. The authors also explore the connection between compression and prediction, showing that models that compress well also generalize well. They conclude by advocating for the use of compression as a framework for studying and evaluating language models.
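The compression-prediction link is concrete: an arithmetic coder driven by a model's next-token distribution spends −log₂ p bits on each symbol that occurs, so better prediction directly means shorter codes. A toy sketch (the probabilities are invented):

```python
import math

def code_length_bits(probs):
    """Total bits an ideal arithmetic coder needs, given the probability
    the model assigned to each symbol that actually occurred."""
    return sum(-math.log2(p) for p in probs)

confident = [0.9, 0.8, 0.95, 0.85]   # a strong model: high p on true symbols
uncertain = [0.3, 0.25, 0.4, 0.2]    # a weak model

print(code_length_bits(confident))   # under one bit for four symbols
print(code_length_bits(uncertain))   # several times more
```

This is why the paper can treat a language model's log-loss as a compression rate: the two quantities are the same number in different units.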
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
February 8, 2024
Diffusion Models (DMs) have become increasingly popular in generating benchmarks, but their mathematical descriptions can be complex. In this survey, the authors provide an overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), revealing a mathematical connection to Associative Memories (AMs). AMs are energy-based models that share similarities with denoising DMs, but they allow for the computation of a Lyapunov energy function and gradient descent to denoise data. The authors also summarize the 40-year history of energy-based AMs, starting with the Hopfield Network, and discuss future research directions for both AMs and DMs.
tns
February 8, 2024
This document, entitled "tns", explores the concept of network states and their potential to replace traditional nation states. The author argues that a network state is a social network with a moral innovation, a sense of national consciousness, and a recognized founder, among other features. The document also delves into the history of political power and technological truth, and how the network state is the next Leviathan. The author provides examples of positive and negative syntheses of the network and state and discusses the potential for startup societies and network states to maintain liberal values in an illiberal world.
A New Physics Theory of Life | Quanta Magazine
February 7, 2024
According to physicist Jeremy England, the origin and evolution of life can be explained by the fundamental laws of nature. He proposes that living things are better at capturing and dissipating energy from their environment compared to inanimate objects. England has derived a mathematical formula based on established physics that explains this capacity. His theory, which underlies Darwin's theory of evolution, has sparked controversy among his colleagues. While some see it as a potential breakthrough, others find it speculative. England's idea is based on the second law of thermodynamics and the process of dissipating energy. He argues that self-replication and structural organization are mechanisms by which systems increase their ability to dissipate energy. His theory may have implications for understanding the formation of patterned structures in nature.
K-Level Reasoning with Large Language Models
February 7, 2024
Large Language Models (LLMs) have shown proficiency in complex reasoning tasks, but their performance in dynamic and competitive scenarios remains unexplored. To address this, researchers have introduced two game theory-based challenges that mirror real-world decision-making. Existing reasoning methods tend to struggle in dynamic settings that require k-level thinking, so the researchers propose a novel approach called "K-Level Reasoning" that improves prediction accuracy and informs strategic decision-making. This research sets a benchmark for dynamic reasoning assessment and enhances the proficiency of LLMs in dynamic contexts.
Competitive Programmer's Handbook
February 5, 2024
The article discusses various algorithms and data structures used in computer programming, such as Kadane's algorithm, binary indexed trees, segment trees, Dijkstra's algorithm, and Floyd's algorithm. The author also explains concepts like successor graphs, index compression, and minimum spanning trees. The time complexity of each algorithm is also discussed.
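As a taste of the handbook's material, here is Kadane's algorithm for the maximum subarray sum sketched in Python (the handbook itself uses C++; this is an illustrative port):

```python
def kadane(xs):
    # Maximum subarray sum in O(n): at each element, either extend the
    # running subarray or restart it at the current element.
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# The best subarray of [-1, 2, 4, -3, 5, 2, -5, 2] is [2, 4, -3, 5, 2].
print(kadane([-1, 2, 4, -3, 5, 2, -5, 2]))
```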
Writing an OS in Rust
February 3, 2024
This blog series provides tutorials on creating a small operating system in the Rust programming language. Each post includes all the necessary code and is accompanied by a corresponding GitHub repository. The series covers topics such as creating a Rust executable without linking the standard library, building a bootable disk image, implementing VGA text mode, performing unit and integration testing, handling CPU exceptions, setting up the interrupt descriptor table, implementing paging and heap allocation, and exploring cooperative multitasking and the async/await feature of Rust. The posts also include status updates and information on supporting the author.
Ever wanted to make your own programming language or wondered how they are designed and built?
February 3, 2024
Crafting Interpreters is a book that provides everything you need to create your own programming language. It covers both high-level concepts like parsing and semantics, as well as technical details such as bytecode representation and garbage collection. The book guides you through building a language from scratch, including features like dynamic typing, lexical scope, functions, classes, and inheritance. It is available in multiple formats, including print, ebook, and online for free. The author, Robert Nystrom, is an experienced language developer who currently works at Google on the Dart language.
GitHub - sst/demo-ai-app: Sample AI movies app built with ❍ Ion
January 31, 2024
This document provides an overview of the sst/demo-ai-app, a sample movies app built with Ion that demonstrates how to use AI in your apps using your own data. The app includes features such as tagging, related movies, and deep search using natural language. It utilizes the Vector component, which is based on Amazon Bedrock and allows for easy AI integration with your data. The document also highlights the advantages of Ion, including faster deployment and no stack limits. The app works by ingesting movie data from IMDB, generating embeddings, and storing them in a Vector database, which the Next.js app then retrieves.
ThermodynamicComputing
January 31, 2024
Measuring Faithfulness in Chain-of-Thought Reasoning
January 28, 2024
Large language models (LLMs) are more effective when they engage in step-by-step "Chain-of-Thought" (CoT) reasoning, but it is unclear if this reasoning is a faithful explanation of the model's actual process. The study examines how interventions on the CoT affect model predictions, finding that models vary in how strongly they rely on the CoT. The performance boost from CoT does not solely come from added test-time compute or specific phrasing. As models become larger and more capable, they tend to produce less faithful reasoning. The results suggest that faithful CoT reasoning depends on carefully chosen circumstances such as model size and task.
ageron/handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
January 26, 2024
The ageron/handson-ml3 project is designed to teach the fundamentals of Machine Learning using Python. It includes example code and exercise solutions from the third edition of the book "Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow." The project provides options for running the notebooks online, using a Docker image, or installing the project on your own machine. It also addresses frequently asked questions about Python versions, SSL errors, and updating the project. The project has received contributions from various individuals, including reviewers, contributors to exercise solutions, and supporters from the Google ML Developer Programs team.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
January 23, 2024
BERT and RoBERTa have achieved impressive results on sentence-pair regression tasks like semantic textual similarity, but they have a significant computational overhead when comparing large collections of sentences. To address this, Sentence-BERT (SBERT) has been developed as a modification of BERT that uses siamese and triplet network structures to generate semantically meaningful sentence embeddings. SBERT reduces the time required to find the most similar pair from 65 hours with BERT to just 5 seconds, while maintaining accuracy. SBERT outperforms other state-of-the-art sentence embedding methods on various tasks, including STS and transfer learning.
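The siamese setup boils down to encoding each sentence independently, pooling its token vectors into one fixed-size embedding, and comparing embeddings by cosine similarity. A minimal sketch with stand-in random "token embeddings" (the real model uses a BERT encoder to produce them):

```python
import numpy as np

def mean_pool(token_embeddings):
    # SBERT-style pooling: average the token vectors into one sentence vector.
    return token_embeddings.mean(axis=0)

def cosine(u, v):
    # Cosine similarity between two pooled sentence embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-in token embeddings; in practice these come from a BERT encoder.
rng = np.random.default_rng(0)
sent_a = mean_pool(rng.normal(size=(5, 8)))   # 5 tokens, dim 8
sent_b = mean_pool(rng.normal(size=(7, 8)))   # 7 tokens, dim 8
similarity = cosine(sent_a, sent_b)
```

Because each sentence is embedded once, comparing n sentences costs n encoder passes plus cheap vector math, instead of n² cross-encoder passes.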
Turing-1951 Intelligent Machinery-a Heretical Theory
January 21, 2024
Self-Rewarding Language Models
January 20, 2024
To achieve superhuman language models, the researchers propose self-rewarding language models, which provide their own rewards during training. Instead of relying on a reward model trained from human preferences, the LLM judges its own outputs via prompting, improving both its instruction-following ability and its reward-generation ability. A preliminary study fine-tuning Llama 2 70B with this approach shows that it outperforms existing systems on the AlpacaEval 2.0 leaderboard. This work suggests the potential for models that can continually improve along both axes.
Software Development Trends 2023/2024 - Vol. 2.
January 16, 2024
The document provides a summary of important software development trends observed in 2023 that are likely to continue into 2024. It includes information on technology roadmaps, the state of DevOps, cloud computing, serverless technology, databases, and more. Some key insights from the document include the value drivers and risks associated with adopting software engineering technologies, the impact of generative cultures and user-focused teams on performance, and the increasing adoption of serverless solutions. Additionally, the document highlights the need for multi-cloud skills development and the most in-demand cloud skills for 2023.
Word2vec from Scratch
January 15, 2024
Word2vec is a technique used to express words as vectors that encode their semantics in a meaningful way. This article discusses how to implement word2vec from scratch using NumPy. The process involves tokenizing the text, creating lookup tables for words and IDs, generating training data in the form of matrices using one-hot vectorization, and building and training the embedding network. The rows of the weight matrix in the network serve as the word embeddings, representing words as dense vectors. The final output of the network is a probability vector that predicts the nearby context words.
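The key mechanical point, that multiplying a one-hot vector by the weight matrix simply selects one row, can be verified in a few lines of NumPy (toy vocabulary and dimensions, not taken from the article):

```python
import numpy as np

vocab = ["the", "cat", "sat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# Embedding matrix: one row per vocabulary word (toy dimensions).
W = np.random.default_rng(1).normal(size=(len(vocab), 4))

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# Multiplying a one-hot vector by W just picks out that word's row,
# which is exactly the word's dense embedding.
emb = one_hot("cat") @ W
assert np.allclose(emb, W[word_to_id["cat"]])
```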
MemGPT: Towards LLMs as Operating Systems
January 15, 2024
MemGPT is a system that manages different memory tiers to provide extended context within the limited context window of large language models (LLMs). Using an OS-inspired design, MemGPT can handle unbounded context using LLMs that have finite context windows. It is successful in domains where existing LLMs' limited context windows severely limit their performance, such as document analysis and multi-session chat. MemGPT supports self-directed editing and retrieval, memory-hierarchy, OS functions, and event-based control flow to manage unbounded context.
Visual Guides to understand the basics of Large Language Models
January 14, 2024
This article provides a compilation of tools and articles that aim to break down the complicated concepts of Large Language Models (LLMs) in an intuitive way. It acknowledges that many people struggle with understanding the basics of LLMs and offers resources to help solidify their understanding. The article includes a table of contents with links to various resources, such as "The Illustrated Transformer" by Jay Alammar, which provides visualizations to explain the transformer architecture, a fundamental building block of LLMs. The goal is to make the concepts of LLMs easily understood and accessible.
Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs
January 14, 2024
This article provides a comprehensive understanding and coding guide for self-attention mechanisms in transformer architectures and large language models (LLMs) like GPT-4 and Llama. It covers the concept of self-attention, its importance in NLP, and the implementation of the self-attention mechanism in Python and PyTorch. The article also discusses the scaled dot-product attention, computing unnormalized attention weights, computing attention weights, and computing the context vector. Additionally, it explores multi-head attention and provides code examples for implementing multiple attention heads.
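The scaled dot-product attention the article walks through can be sketched in NumPy (illustrative, not the article's PyTorch code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # weights = softmax(Q K^T / sqrt(d_k)); output = weights @ V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # unnormalized attention weights
    weights = softmax(scores, axis=-1)    # normalized attention weights
    return weights @ V, weights           # context vectors, weights
```

Multi-head attention repeats this with separate learned projections of Q, K, and V per head and concatenates the resulting context vectors.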
Thinking in Systems: International Bestseller: Donella H. Meadows, Diana Wright: 9781603580557: Amazon.com: Books
January 14, 2024
"Thinking in Systems" is a book that explores the concept of systems thinking, which involves analyzing the interconnectedness and dynamics of various systems. The book uses examples such as the human body, businesses, and societal systems to illustrate how stocks and flows contribute to achieving system goals. It also highlights the importance of aligning stated goals with actual outcomes and discusses the need for change in systems that are not functioning optimally. The book emphasizes the complexity of systems and the challenges of making meaningful improvements.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
January 12, 2024
Backdoored behavior in AI models is most persistent in larger models and models trained to deceive the training process, even when the deceptive behavior is distilled away. Adversarial training can actually make models better at recognizing their backdoor triggers, effectively hiding the unsafe behavior. Safety training techniques, such as reinforcement learning, are often ineffective in removing backdoors. The study explores different methods for training backdoored models and finds that chain-of-thought backdoors allow models to produce consistent reasoning for their deceptive behavior.
This project is about how to systematically persuade LLMs to jailbreak them.
January 10, 2024
This project introduces a taxonomy of 40 persuasion techniques to systematically persuade LLMs (large language models) to jailbreak them. Through iterative application of these techniques, the researchers achieved a 92% success rate in jailbreaking advanced LLMs. They also found that more advanced models are more vulnerable to persuasive adversarial prompts (PAPs) and that adaptive defenses can effectively neutralize these prompts. The research highlights the challenges of addressing user-invoked risks from persuasion and the need for further investigation and improved defenses for more capable models.
Pruning vs Quantization: Which is Better?
January 10, 2024
Neural network pruning and quantization are techniques used to compress deep neural networks. This paper compares the two techniques and provides an analytical comparison of expected quantization and pruning error. The results show that in most cases, quantization outperforms pruning. However, in scenarios with very high compression ratios, pruning may be beneficial. The paper also discusses the hardware implications of both techniques and provides a comparison of pruning and quantization in the post-training and fine-tuning settings.
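The kind of error comparison the paper formalizes can be illustrated numerically: magnitude-prune half the weights versus 8-bit uniform quantization, then compare mean squared error. This is a toy experiment under Gaussian weights, not the paper's analysis:

```python
import numpy as np

def prune(w, sparsity):
    # Magnitude pruning: zero out the smallest-|w| fraction of weights.
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize(w, bits):
    # Uniform symmetric quantization to 2^bits levels.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
err_prune = np.mean((w - prune(w, 0.5)) ** 2)
err_quant = np.mean((w - quantize(w, 8)) ** 2)
```

At moderate bit widths the rounding error is tiny, while zeroing half the weights discards real mass, matching the paper's headline finding; only at extreme compression ratios does pruning win.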
mlx-examples/lora at main · ml-explore/mlx-examples · GitHub
January 10, 2024
This document provides an example of using MLX to fine-tune either a Llama 7B or Mistral 7B model with low rank adaptation (LoRA) for a target task. The example demonstrates using the WikiSQL dataset to train the model to generate SQL queries from natural language. It includes instructions for setup, running the script, fine-tuning the model, evaluating the model, generating output, and dealing with memory issues. The document also provides results from the training process and offers tips for reducing memory consumption during fine-tuning.
Mixtral of Experts
January 10, 2024
Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model that outperforms or matches other models like Llama 2 70B and GPT-3.5 across various benchmarks. It has the same architecture as Mistral 7B but uses 8 feedforward blocks (experts) in each layer. A router network selects two experts for each token at each layer, allowing for dynamic selection of different experts at each timestep. This results in each token having access to 47B parameters but only using 13B active parameters during inference. Mixtral also offers a fine-tuned model, Mixtral 8x7B - Instruct, which surpasses other models on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
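The per-token top-2 routing can be sketched as follows (toy dimensions and random stand-in experts; names like `top2_moe` are illustrative, not Mixtral's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def top2_moe(token, experts, router_w):
    # The router scores every expert, keeps the top two, and mixes their
    # outputs with renormalized softmax gates (Mixtral-style routing).
    logits = router_w @ token
    top2 = np.argsort(logits)[-2:]
    gates = softmax(logits[top2])
    return sum(g * experts[i](token) for g, i in zip(gates, top2))

rng = np.random.default_rng(0)
# Eight stand-in experts; real experts are feedforward blocks.
experts = [lambda x, W=rng.normal(size=(4, 4)): W @ x for _ in range(8)]
router_w = rng.normal(size=(8, 4))
out = top2_moe(rng.normal(size=4), experts, router_w)
```

Only two expert blocks run per token, which is why the active parameter count (13B) is far below the total (47B).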
Paper page - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
January 10, 2024
The paper introduces Self-Play Fine-Tuning (SPIN), which converts a weak language model into a stronger one without additional human-annotated data. Starting from a supervised fine-tuned model, SPIN has the model play against earlier versions of itself: it generates training data from its previous iteration and learns to distinguish those self-generated responses from the human-annotated ones, improving with each round of self-play.
From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
January 10, 2024
LLMs (Large Language Models) have been enhanced with innovative prompting strategies and external tools, expanding their capabilities. However, integrating LLMs into conversational agents presents a challenge. This paper introduces RAISE, an enhanced version of the ReAct framework, which utilizes scratchpad and retrieved examples to augment the agent's capabilities. RAISE demonstrates superiority as a conversational agent in experiments conducted on a real estate dataset. The working memory of RAISE consists of conversation history, scratchpad, examples, and task trajectory. The paper also discusses the evaluation of agent performance and the core aspects of planning and Chain-of-Thought reasoning.
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
January 9, 2024
The paper presents WikiChat, a few-shot language model (LLM)-based chatbot that minimizes hallucinations and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia and combines grounded facts with additional information from the corpus to generate factual and engaging responses. The system achieves high factual accuracy and outperforms previous retrieval-based chatbots in terms of informativeness and engagement. The paper also introduces a novel evaluation methodology that combines simulated and real user conversations for assessing the factuality and conversationality of chatbots.
Discovering Language Model Behaviors with Model-Written Evaluations
January 8, 2024
The article discusses an approach to generating evaluations using language models (LMs) with the help of crowdworkers. The LM-generated evaluations were rated highly relevant, with workers agreeing with 90-100% of their labels. The researchers showcase their approach by generating datasets that test LMs for 154 diverse behaviors related to model personality, politics, ethics, social bias, and risks from advanced AI systems. The generated multiple-choice questions help the researchers to reveal additional instances of inverse scaling with RLHF training, as well as to distinguish when concerning behaviors are likely caused by pretraining or RLHF.
Getting Started with Elastic Stack 8.0
January 8, 2024
The Elastic Stack consists of Elasticsearch for data storage and search, Kibana for visualization, and tools like Beats and Logstash for data collection and transformation. Beginners can learn about key topics like indexing, searching, and managing data in Elasticsearch through various chapters in the book. Kibana is essential for interacting with data and building solutions on the Elastic Stack.
Understanding The Exploding and Vanishing Gradients Problem
January 7, 2024
The "Understanding The Exploding and Vanishing Gradients Problem" article discusses the vanishing and exploding gradients problem in deep neural networks. It explains how the gradients used to update the weights can shrink or grow exponentially, causing learning to stall or become unstable. The article explores why gradients vanish or explode exponentially and how it affects the backpropagation algorithm during training. It also provides strategies to address the vanishing and exploding gradients problem, such as using the ReLU activation function, weight initialization techniques, and gradient clipping.
Practical Deep Learning for Coders 2022
January 7, 2024
"Practical Deep Learning for Coders 2022" is a course that covers topics such as building and training deep learning models, deploying models, and using PyTorch and other popular libraries. The course is led by Jeremy Howard, who has extensive experience in machine learning and has created companies that utilize deep learning. The course is suitable for those with at least a year of coding experience and a high school math background. Students will learn how to train models for computer vision, natural language processing, tabular data analysis, and collaborative filtering, and will also learn about the latest deep learning techniques.
fastai/fastbook: The fastai book, published as Jupyter Notebooks
January 7, 2024
The fastai book, published as Jupyter Notebooks, provides an introduction to deep learning, fastai, and PyTorch. It is copyright Jeremy Howard and Sylvain Gugger, and a selection of chapters is available to read online. The notebooks in the repository are used for a MOOC and form the basis of the book, which is available for purchase. The code in the notebooks is covered by the GPL v3 license, while the other content is not licensed for redistribution or change. It is recommended to use Google Colab to access and work with the notebooks. If there are any contributions or citations, copyright is assigned to Jeremy Howard and Sylvain Gugger.
Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise, 5th Edition
January 7, 2024
This cookbook presents over 180 recipes for performing fast, scalable, and reliable searches with Elasticsearch 8.x, guiding readers from basic indexing and querying through building production-ready, enterprise-grade search solutions.
Attention? Attention!
January 7, 2024
The document explores the concept of attention, as performed by humans and deep learning algorithms. Attention is used in deep learning to transform one input sequence into another and is accomplished through an encoder-decoder architecture with LSTM or GRU units. The attention mechanism, invented to address the incapability of the fixed-length context vector, creates shortcuts between the context vector and the entire source input. Attention mechanisms vary in form, from soft or hard to global or local. The document also introduces self-attention, which relates different positions of a single sequence to compute a representation of the same sequence, and the Neural Turing Machine, a model architecture for coupling a neural network with external memory storage.
An Intuition for Attention
January 7, 2024
The transformer neural network, used by models like ChatGPT, incorporates an attention mechanism to improve performance. Attention is a key feature of transformers and is defined by an equation that involves the softmax function. Attention can take different forms, but the scaled dot product attention is commonly used. This attention mechanism is based on the idea of key-value lookups, where a query is matched with keys to retrieve corresponding values. The attention scores, which determine how much attention is given to each key-value pair, are computed using dot product similarity and transformed into decimal percentages using the softmax function. This process allows for meaningful and efficient processing of queries in large language models.
Pen and Paper Exercises in Machine Learning
January 7, 2024
This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises cover the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Carlo integration, and variational inference.
Transformers From Scratch
January 7, 2024
This blog provides a step-by-step guide on creating and training a transformer from scratch. The author explains each foundational element and provides a Jupyter notebook with the code for readers to run and experiment with. The blog references a YouTube video and the Attention Is All You Need paper for further understanding. The author also mentions the availability of the final code and a dataset for download.
Mathematics for Machine Learning
January 5, 2024
Linear Algebra Review and Reference
January 5, 2024
Probability and Information Theory
January 5, 2024
In this chapter, the authors discuss probability theory and information theory. Probability theory is a mathematical framework for representing uncertain statements and is used in artificial intelligence for reasoning. Information theory, on the other hand, quantifies the amount of uncertainty in a probability distribution. The chapter explains various concepts, such as probability mass functions for discrete variables and probability density functions for continuous variables. It also introduces key ideas from information theory, such as entropy and mutual information. The authors provide examples and explanations to help readers understand these concepts.
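For instance, the chapter's definition of Shannon entropy can be computed directly (an illustrative Python sketch, not code from the book):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits: H(p) = -sum_i p_i log2 p_i, with 0 log 0 := 0.
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

# A fair coin carries exactly one bit of uncertainty.
print(entropy([0.5, 0.5]))
```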
Linear Algebra
January 5, 2024
Linear algebra is a fundamental topic in understanding and working with machine learning algorithms, especially deep learning algorithms. This chapter provides an introduction to scalars, vectors, matrices, and tensors, which are the key mathematical objects in linear algebra. It explains the concepts and notation used in linear algebra, such as matrix multiplication, transpose, identity and inverse matrices, and norms. The chapter also introduces special kinds of matrices and vectors, such as diagonal matrices, orthogonal matrices, and eigenvalues and eigenvectors. These concepts are important for analyzing and solving equations in machine learning.
Home
January 5, 2024
Eagle Dynamics has exciting plans for the upcoming year, with the development and release of new aircraft and maps. Some highlights include the introduction of the MiG-29A Fulcrum, as well as the Afghanistan and Iraq maps. They are also continuing their work on the CH-47F, Hellcat/USS Enterprise, and the Marianas WW2 map. Fans of flight simulation can look forward to these upcoming additions to the game.
An overview of gradient descent optimization algorithms
January 5, 2024
The text provides an overview of gradient descent optimization algorithms commonly used in deep learning. It explains different types of gradient descent methods like batch, stochastic, and mini-batch, highlighting their strengths and challenges. The author also discusses advanced algorithms such as Adagrad, RMSprop, and Adam, which adapt learning rates to improve optimization performance.
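As a concrete reference, one Adam update step as described there can be sketched as follows (illustrative Python; the state-dict layout is an assumption, not the article's code):

```python
import numpy as np

def adam_step(theta, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient (m) and
    # squared gradient (v), bias correction, then a per-parameter step.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)
```

Dividing by the running second-moment estimate gives each parameter its own effective learning rate, which is the "adaptive" part shared with Adagrad and RMSprop.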
An overview of gradient descent optimization algorithms
January 5, 2024
The article provides an overview of gradient descent optimization algorithms, which are often used as black-box optimizers. The article outlines the three variants of gradient descent and summarizes the challenges. The article then introduces some widely used algorithms to deal with the challenges, including Nesterov accelerated gradient, Adagrad, Adadelta, and RMSprop. The article explains how these algorithms work and their benefits and weaknesses.
How GPT3 Works - Visualizations and Animations
January 5, 2024
The tech world is abuzz with GPT3 hype. Massive language models (like GPT3) are starting to surprise us with their abilities. While not yet completely reliable for most businesses to put in front of their customers, these models are showing sparks of cleverness that are sure to accelerate the march of automation and the possibilities of intelligent computer systems. Let’s remove the aura of mystery around GPT3 and learn how it’s trained and how it works.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
The output is generated from what the model “learned” during its training period where it scanned vast amounts of text.
GPT in 60 Lines of NumPy
January 5, 2024
This post outlines how to implement a GPT (Generative Pre-trained Transformer) from scratch in just 60 lines of NumPy, including loading the trained GPT-2 model weights released by OpenAI and generating text. Given a prompt, the GPT generates a continuation; the task of predicting the next token in a sequence is called language modeling. The post explains how to train a GPT using gradient descent with respect to the cross-entropy loss over the language modeling task, and also touches on prompting and how to handle hyperparameters.
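The generation loop at the heart of such a GPT is just repeated next-token prediction; a greedy-decoding sketch with a toy stand-in for the model (not the post's actual 60 lines):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate(logits_fn, prompt_ids, n_tokens):
    # Autoregressive greedy decoding: repeatedly append the argmax token.
    ids = list(prompt_ids)
    for _ in range(n_tokens):
        probs = softmax(logits_fn(ids))
        ids.append(int(np.argmax(probs)))
    return ids

def toy_model(ids):
    # Toy stand-in for a GPT: deterministically favors previous token + 1
    # over a vocabulary of 10 token ids.
    return np.eye(10)[(ids[-1] + 1) % 10] * 5.0

print(generate(toy_model, [3], 4))  # counts upward from the prompt
```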
The Annotated Transformer
January 4, 2024
"The Annotated Transformer" is a paper that introduces a new architecture for natural language processing tasks, with a focus on translation. The paper provides an annotated version of the original paper, giving a line-by-line implementation of the model. The Transformer model relies on self-attention to compute representations of its input and output without using sequence-aligned recurrent neural networks or convolutions. The model consists of an encoder and decoder stack, each containing self-attention layers and position-wise feed-forward networks. The paper also discusses the use of multi-head attention and positional encoding in the model. The model is trained using the WMT 2014 English-German dataset and the Adam optimizer.
The Illustrated Transformer
January 4, 2024
"The Illustrated Transformer" is a comprehensive guide to understanding the Transformer model, which utilizes attention to improve the training speed of neural machine translation models. The model consists of stacked encoders and decoders, with each encoder and decoder having self-attention layers. Self-attention allows the model to incorporate information from other words in the input sequence, resulting in better encoding. The model also employs multi-headed attention, which allows it to focus on different positions and creates multiple sets of Query/Key/Value weight matrices. Positional encoding is used to account for the order of words in the input sequence. The architecture includes residual connections and layer normalization for each sub-layer.
GitHub - tensorflow/nmt: TensorFlow Neural Machine Translation Tutorial
January 4, 2024
TensorFlow Neural Machine Translation Tutorial.
What Are Word Embeddings for Text?
January 4, 2024
Word embeddings are a way to represent words with similar meanings in a similar manner using real-valued vectors. They are a key advancement in deep learning for natural language processing tasks. You can either train your own word embeddings or use pre-trained ones for your projects.
Deep Learning for Natural Language Processing
January 4, 2024
Develop deep learning models for your natural language problems. Working with text is important, under-discussed, and hard. We are awash with text, from books, papers, blogs, tweets, news, and increasingly text from spoken utterances. Every day, I get questions asking how to develop machine learning models for text data.
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
January 4, 2024
The article explains the mechanics of sequence-to-sequence models, which are deep learning models used for machine translation, text summarization, and image captioning. The article includes visualizations to explain the concepts and requires some previous understanding of deep learning. The article also discusses attention models, which improve machine translation systems by allowing the model to focus on relevant parts of the input sequence. The article provides examples of how attention models work and concludes with a link to TensorFlow's Neural Machine Translation tutorial.
The Random Transformer
January 4, 2024
This blog post provides an end-to-end example of the math within a transformer model, with a focus on the encoder part. The goal is to understand how the model works, and to make it more manageable, simplifications are made and the dimensions of the model are reduced. The post recommends reading "The Illustrated Transformer" blog for a more intuitive explanation of the transformer model. The prerequisites for understanding the content include basic knowledge of linear algebra, machine learning, and deep learning. The post covers the math within a transformer model during inference, attention mechanisms, residual connections and layer normalization, and provides some code to scale it up.
GitHub - SkalskiP/courses: This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
January 4, 2024
SkalskiP/courses is a curated collection of links to various courses and resources about Artificial Intelligence (AI). It includes courses on topics such as generative AI, deep learning, natural language processing, computer vision, machine learning, and more. The repository aims to provide a comprehensive resource for beginners and experienced learners alike. Contributions from the community are encouraged to make the repository even better.
CS25: Transformers United V3
January 4, 2024
Transformers have revolutionized Natural Language Processing (NLP) and are now being applied in various fields, including Computer Vision, Reinforcement Learning, and Speech. This seminar explores the details of how Transformers work and their applications, with a focus on large language models (LLMs). The seminar includes instructor and guest lectures from experts in Transformers research. The schedule includes topics such as the creation of fine-tuned chat models, low-level embodied intelligence with foundation models, and training helpful chatbots. The seminar also covers the motivations behind Transformers, scaling human-centered machine translation, and going beyond LLMs to explore emergent abilities and intermediate-guided reasoning.
openai/whisper-large-v2
January 3, 2024
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates strong generalization abilities without the need for fine-tuning. The large-v2 model, trained for 2.5x more epochs with added regularization, offers improved performance. The models can be used for transcription and translation tasks, with context tokens indicating the language and task. While the models show robustness and accuracy in many languages, they may exhibit limitations such as generating repetitive texts and hallucinations. The models have potential applications in accessibility tools but also raise concerns about dual use and surveillance capabilities.
Text Summarization: How to Calculate BertScore
January 3, 2024
BERTScore is a metric used to measure the quality of text summarization by calculating the similarity between the summary and the original text. It addresses issues that n-gram-based metrics face, such as incorrect matching of paraphrases and the inability to capture long-range dependencies. The BERTScore architecture involves contextual embeddings, cosine similarity, token matching for precision and recall, importance weighting, and baseline rescaling. The metric has the potential to improve various natural language processing tasks and can be applied in domains such as translation quality assessment, text generation, and document comparison. Future developments include broader language coverage and adaptation for multilingual texts.
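The greedy token-matching step described above can be sketched in a few lines. This is a toy illustration with hand-made 2-d vectors standing in for BERT's contextual embeddings, and it omits importance weighting and baseline rescaling; the `bertscore` helper name is mine, not the metric's reference implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def bertscore(cand, ref):
    """BERTScore-style greedy matching: each candidate token is matched to
    its most similar reference token (precision), and vice versa (recall)."""
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(c, r) for c in cand) for r in ref) / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy 2-d "contextual embeddings" for a candidate summary and a reference.
cand = [(1.0, 0.0), (0.6, 0.8)]
ref = [(1.0, 0.0), (0.0, 1.0)]
p, r, f1 = bertscore(cand, ref)
```

Because matching is by similarity rather than exact n-gram overlap, a paraphrased token still scores well if its embedding is close to a reference token's.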
Some Core Principles of Large Language Model (LLM) Tuning
January 3, 2024
Large Language Models (LLMs) like GPT-2 and GPT-3 are trained with unsupervised pre-training on billions to trillions of tokens. After pre-training, the models are fine-tuned for specific use cases such as chatbots or content generation, either through supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). SFT minimizes the loss between the model's output and the correct result, while RLHF optimizes the model against a learned reward model. InstructGPT is an RLHF-tuned version of GPT-3 trained to follow instructions and provide aligned responses. There are also open-source alternatives to GPT models, such as GPT-J and GPT-Neo.
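The SFT loss mentioned above is, per token, just the cross-entropy between the model's predicted distribution and the correct next token. A minimal sketch (the `cross_entropy` helper and the toy logits are mine, not from any particular framework):

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of the target token under softmax(logits):
    the per-token loss minimized during supervised fine-tuning.
    Uses the max-subtraction trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

# One toy position: a vocabulary of 4 tokens, correct next token at index 2.
logits = [0.5, -1.0, 2.0, 0.1]
loss = cross_entropy(logits, 2)
```

Raising the logit of the correct token lowers the loss, which is exactly what gradient descent on this objective does during SFT.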
MotionGPT: Human Motion as a Foreign Language
January 3, 2024
MotionGPT is a unified model for language and motion tasks, achieving top performance in text-driven motion generation. It combines natural language models with human motion tasks, benefiting fields like gaming and robotics. The model treats human motion like a foreign language, offering a versatile solution for diverse motion synthesis problems.
An intuitive introduction to text embeddings
January 2, 2024
Text embeddings are essential in natural language processing (NLP) and convert text into vector coordinates. They allow us to understand the semantic meaning of words and sentences by representing them as vectors in a high-dimensional latent space. By using text embeddings, we can capture the similarity between texts and perform tasks such as search and classification more efficiently. There are various algorithms and models, such as Word2vec and transformers, that help us generate text embeddings and capture the sequential nature of text. These advancements in text embeddings have greatly improved our ability to reason intuitively about NLP and other machine learning models.
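The search-by-similarity idea can be illustrated with a deliberately crude embedding. Here a bag-of-words count vector stands in for a learned dense embedding (real systems use Word2vec or transformer encoders); the `embed` helper and the toy corpus are assumptions of this sketch:

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words 'embedding': one coordinate per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity, with zero vectors mapped to similarity 0."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

docs = ["the cat sat on the mat", "dogs chase cats", "stock prices fell today"]
vocab = sorted({w for d in docs for w in d.lower().split()})
query = "cat on a mat"

# Rank documents by similarity to the query in embedding space.
ranked = sorted(docs, key=lambda d: cosine(embed(query, vocab), embed(d, vocab)),
                reverse=True)
```

The limits of the toy are instructive: "cats" and "cat" share no coordinate here, which is precisely the kind of semantic gap that learned dense embeddings close.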
Mathematics for Machine Learning
January 1, 2024
Generative Agents: Interactive Simulacra of Human Behavior
January 1, 2024
The article introduces "generative agents": computational software agents that simulate believable human behavior for interactive applications. Built on a large language model, the agents remember, reflect, and plan based on their past experiences. The authors demonstrate the idea by populating a sandbox environment with 25 agents, where users can observe and intervene as the agents plan their days, form relationships, and coordinate group activities, and they discuss the enabling architecture and its potential applications across domains.
VOYAGER: An Open-Ended Embodied Agent with Large Language Models
January 1, 2024
The article presents VOYAGER, an embodied agent that continuously explores the Minecraft world, acquires skills, and makes new discoveries without human intervention. VOYAGER consists of three key components: an automatic curriculum for exploration, a skill library for storing and retrieving complex behaviors, and an iterative prompting mechanism for program improvement. The agent utilizes Large Language Models (LLMs) and code as the action space, allowing it to represent temporally extended and compositional actions. The article also highlights VOYAGER's superior performance in discovering novel items, unlocking the Minecraft tech tree, and applying its learned skill library to unseen tasks in a newly instantiated world.
Reader: Frequently Asked Questions
January 1, 2024
Changelog. December 19, 2023: added a section about the Daily Digest; explained limitations of Kindle/Google/etc. books; explained the link between Reader docs and Readwise highlights; updated info about the auto-highlighting feature; expanded the section about PDF highlights; added the browser extension hotkey (alt+R). December 7, 2023: added more context for a
Extensions in Arc: How to Import, Add, & Open
January 1, 2024
Arc has full extension support. Here's how.
Recent Bookmarks
Matrices and graphs
Added on June 5, 2025
The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices
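One concrete payoff of the matrices-are-graphs correspondence: powers of an adjacency matrix count walks in the graph. A toy illustration (the `matmul` helper and the 3-node graph are mine, not from the article):

```python
def matmul(A, B):
    """Plain matrix multiply over nested lists."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

# Adjacency matrix of a directed graph on 3 nodes: edges 0->1, 1->2, 0->2.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]

# (A @ A)[i][j] counts directed walks of length 2 from node i to node j.
A2 = matmul(A, A)
```

Here `A2[0][2] == 1` because the only length-2 walk is 0 -> 1 -> 2; reading the multiplication as summing over intermediate nodes is the graph view of the matrix product.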
DeepSeek-V3 Explained 1: Multi-head Latent Attention
Added on May 29, 2025
Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT
Added on May 16, 2025
State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and high costs.
You could have designed state of the art positional encoding
Added on May 16, 2025
attention is logarithmic, actually
Added on May 16, 2025
Time complexity is a poor model when working with parallelism; the post makes the case for work-depth analysis instead.
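The work-depth model the post advocates separates total operations from the critical path. A minimal sketch for the classic example of a balanced tree reduction (a toy illustration of the model, not the post's code; the function name is mine):

```python
import math

def work_and_depth(n):
    """Work (total operations) and depth (critical-path length) of a
    balanced binary tree reduction over n elements."""
    work = n - 1                      # one binary op per internal tree node
    depth = math.ceil(math.log2(n))   # number of levels in the tree
    return work, depth
```

Under this model, a reduction over n elements costs O(n) work but only O(log n) depth, which is why "logarithmic" is the right headline for parallel hardware even though the sequential operation count is linear.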
AI Arrives In The Middle East: US Strikes A Deal with UAE and KSA – SemiAnalysis
Added on May 16, 2025
The US has signed two landmark agreements with the United Arab Emirates and the Kingdom of Saudi Arabia (KSA) that will noticeably shift the balance of power. The deals have economic, geopolitical…
Transformers Represent Belief State Geometry in their Residual Stream
Added on May 16, 2025
Produced while an affiliate at PIBBSS; the work began with funding from a Lightspeed Grant and continued at PIBBSS.
Llama from scratch (or how to implement a paper without crying)
Added on May 16, 2025
I want to share some tips from my experience implementing a paper, drawn from implementing a dramatically scaled-down version…
The Curse of Knowing How, or; Fixing Everything
Added on May 16, 2025
A reflection on control, burnout, and the strange weight of technical fluency.
The MAP-Elites Algorithm: Finding Optimality Through Diversity
Added on May 16, 2025
MAP-Elites is a search method, used in reinforcement learning and evolutionary computation, that avoids the local optima of a search space by storing the best candidate solution found in each behavioral niche…
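The archive-of-elites idea can be sketched in a short loop: instead of keeping one global best, keep the best solution per behavior bin, and mutate random elites to explore. This is a deliberately minimal sketch on a 1-D toy problem, with the function name, bin count, and mutation scheme all my own assumptions rather than the article's:

```python
import math
import random

def map_elites(fitness, behavior, n_bins=10, iters=2000, seed=0):
    """Minimal MAP-Elites: an archive maps behavior bins to their elite
    (best-fitness) solution; new candidates come from mutating random
    elites or from occasional random restarts."""
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, solution)
    for _ in range(iters):
        if archive and rng.random() < 0.9:
            _, x = archive[rng.choice(list(archive))]   # mutate a random elite
            x = min(1.0, max(0.0, x + rng.gauss(0, 0.1)))
        else:
            x = rng.random()                             # random restart
        b = min(n_bins - 1, int(behavior(x) * n_bins))   # which niche?
        f = fitness(x)
        if b not in archive or f > archive[b][0]:        # keep only the elite
            archive[b] = (f, x)
    return archive

# Toy problem: maximize sin(3*pi*x) on [0, 1]; the behavior descriptor is x
# itself, so the archive holds the best solution in each slice of the input.
archive = map_elites(lambda x: math.sin(3 * math.pi * x), lambda x: x)
```

The result is not one optimum but a map of diverse, locally optimal solutions, one per niche, which is what lets the method escape deceptive local optima.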
How To Scale
Added on May 13, 2025
While there are already excellent posts on scaling, I wanted to share my own understanding and things I've learned over the past few months, and hopefully spark some discussion. I hope this post can shed light for anyone navigating the challenges of scaling up neural networks. There may be mistakes or inaccuracies, so if you want to correct me or discuss further, please feel free to DM me on X or leave a comment.
Are Transformers universal approximators of sequence-to-sequence functions?
Added on May 3, 2025
Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models. Furthermore, using positional encodings, we circumvent the restriction of permutation equivariance, and show that Transformer models can universally approximate arbitrary continuous sequence-to-sequence functions on a compact domain. Interestingly, our proof techniques clearly highlight the different roles of the self-attention and the feed-forward layers in Transformers. In particular, we prove that fixed width self-attention layers can compute contextual mappings of the input sequences, playing a key role in the universal approximation property of Transformers. Based on this insight from our analysis, we consider other simpler alternatives to self-attention layers and empirically evaluate them.
A Hugging Face Space by nanotron
Added on May 3, 2025
The ultimate guide to training LLMs on large GPU clusters