Bookmarks

Notes/Primer on Clang Compiler Frontend (1): Introduction and Architecture

These are my notes on chapters 1 & 2 of Clang Compiler Frontend by Ivan Murashko. The book focuses on teaching the fundamentals of LLVM to C++ engineers who are interested in learning about compilers to optimize their daily workflow, enhancing their code quality and overall development process. (I've referenced this book extensively, and a lot of the snippets here are from it.)

Implementation of simple microprocessor using verilog

I am trying to make a simple microprocessor in verilog as a way to understand verilog and assembly at the same time. I am not sure if I am implementing what I think of microprocessors well enough ...

learn-fpga/FemtoRV/TUTORIALS/FROM_BLINKER_TO_RISCV/README.md at master · BrunoLevy/learn-fpga · GitHub

Learning FPGA, yosys, nextpnr, and RISC-V . Contribute to BrunoLevy/learn-fpga development by creating an account on GitHub.

Why async Rust?

I genuinely can’t understand how anybody could look at the mess that’s Rust’s async and think that it was a good design for a language that already had the reputation of being very complicated to write.

Softmax Attention is a Fluke

Calibrated Attention. Attention is the magic ingredient of modern neural networks. It is the core of what launched performant language models into the spotlight, starting with GPT, and it has since extended its hands across all modalities. A number of desirable properties make attention a first-class building block, namely: it handles variable sequence lengths with ease, and it allows for a global receptive field without needing to scale parameters.
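The scaled dot-product attention these posts discuss can be sketched in a few lines of pure Python. This is a toy single-query version, not the batched matrix form real implementations use:

```python
import math

def attention(q, keys, values):
    """Toy scaled dot-product attention for a single query vector.
    q: list[float]; keys, values: list of vectors of the same width."""
    d = len(q)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    # Softmax turns the scores into a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the weighted average of the value vectors. Note that the
    # sequence length never appears in any parameter shape, which is why
    # attention handles variable-length inputs for free.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

Every position attends to every other position through `weights`, which is the "global receptive field" property the blurb mentions.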

Transformers Laid Out

I have encountered that there are mainly three types of blogs/videos/tutorials talking about transformers

Template Haskell

Intuitively, Template Haskell provides new language features that allow us to convert back and forth between concrete syntax, i.e.

A friendly introduction to machine learning compilers and optimizers

[Twitter thread, Hacker News discussion]

Comments on Source

This section of the wiki allows anyone to document, explain, post questions, or make comments on the Lua source code. You may link to [1] or paste the code in question.

Bloom’s 3 Stages of Talent Development

First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field.

The Making of Python

Guido van Rossum is the author of Python, an interpreted, interactive object-oriented programming language.

tt-metal/METALIUM_GUIDE.md at main · tenstorrent/tt-metal · GitHub

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model. - tenstorrent/tt-metal

Scoping out the Tenstorrent Wormhole

The Tenstorrent Wormhole n300s PCIe accelerator board is available for purchase, featuring 672 RISC-V cores driving 466 TFLOP/s of FP8 matmul.

What’s the (floating) Point of all these data types? A (not so) brief overview of the history and usage of datatypes within the wide world of computation

This presentation delves into the fascinating and sometimes aggravating world of numerical data types, exploring the evolution, strengths, and weaknesses of decimal, fixed point, floating point, and shared exponent formats over the past 70 years.

Physics of language models

Many asked about collaborations (details are in the FAQ). Short answer: unless you're from Meta and willing to work with us in your spare time (20+ hrs/week), or you're an early-year PhD from UCB/NYU/CMU/UW (but the application deadline was Jan 10, 2025). Citation request: I'm delighted to know that multiple

Tenstorrent first thoughts

I've looked into alternative AI accelerators to continue my saga of running GGML on lower power-consumption hardware. The most promising - and the only one that ever replied to my emails - was Tenstorrent. This post is me deeply thinking about if buying their hardware for development is a good inve ...

Why Attention Is All You Need

The Transformer architecture introduced in this paper was a major breakthrough in sequence transduction methodologies, particularly within neural machine translation (NMT) and broader natural language processing (NLP).

CFD Python: 12 steps to Navier-Stokes

We announce the public release of online educational materials for self-learners of CFD using IPython Notebooks: the CFD Python Class!

tt-mlir documentation

The following document provides an overview of the TT-MLIR project, with a focus on the technical specifications of an MLIR-based compiler stack. So what exactly is an MLIR-based compiler stack?

Tutorials

Multi-Level IR Compiler Framework

How to Think About TPUs

All about how TPUs work, how they're networked together to enable multi-chip training and inference, and how they limit the performance of our favorite algorithms. While this may seem a little dry, it's super important for actually making models efficient.

Programming Really Is Simple Mathematics

A re-construction of the fundamentals of programming as a small mathematical theory (PRISM) based on elementary set theory. Highlights: • Zero axioms. No properties are assumed, all are proved (from standard set theory). • A single concept covers specifications and programs. • Its definition only involves one relation and one set. • Everything proceeds from three operations: choice, composition and restriction. • These techniques suffice to derive the axioms of classic papers on the "laws of programming" as consequences and prove them mechanically. • The ordinary subset operator suffices to define both the notion of program correctness and the concepts of specialization and refinement. • From this basis, the theory deduces dozens of theorems characterizing important properties of programs and programming. • All these theorems have been mechanically verified (using Isabelle/HOL); the proofs are available in a public repository. This paper is a considerable extension and rewrite of an earlier contribution [arXiv:1507.00723].

Tenstorrent Wormhole Series Part 1: Physicalities

A company called Tenstorrent designs and sells PCIe cards for AI acceleration. At the time of writing, they've recently started shipping their Wormhole n150s and Wormhole n300s cards.

Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling

As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long…

Build Your Own Text Editor

The text editor is antirez’s kilo, with some changes.

Tilde, my LLVM alternative

I'm Yasser and I've made it my mission to produce an alternative to LLVM, the current king of compiler backend libraries.

A WebAssembly compiler that fits in a tweet

Starting with a 192-byte one-liner that implements a Reverse Polish Notation arithmetic compiler, we'll work backward to transform it into readable JavaScript by removing one code golf trick at a time
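The kind of RPN arithmetic the post compiles needs no precedence rules or parentheses; a stack is the whole parser. A minimal Python evaluator (my own sketch, not the article's JavaScript) looks like:

```python
def rpn(expr):
    """Evaluate a Reverse Polish Notation expression such as "3 4 + 2 *".
    Operands push onto a stack; operators pop their two arguments off it."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()   # right operand was pushed last
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()
```

Because the operator always follows its operands, the evaluation order is the compilation order, which is what makes an RPN compiler small enough to golf into a tweet.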

Why Futhark?

A high-performance and high-level purely functional data-parallel array programming language that can execute on the GPU and CPU.

Ödeme - Pozitif Teknoloji

*Please enter your order number in the payment description; our company is not responsible for delays in wire transfers made without an order number.

Bloom filters debunked: Dispelling 30 Years of bad math with Coq!

While conceptually simple, this feature actually requires more engineering effort than one would expect - in particular, tracking the set of known malicious URLs in a practical manner turns out to be somewhat difficult.
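The data structure at issue, a Bloom filter tracking a set of known-bad URLs, can be sketched in Python. This is a minimal toy using salted calls to the built-in hash(), not a production hash family:

```python
class BloomFilter:
    """Minimal Bloom filter: k salted hashes set/test bits in an m-bit array.
    Lookups may report false positives but never false negatives."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, item):
        # k pseudo-independent bit positions for this item.
        return [hash((salt, item)) % self.m for salt in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))
```

The subtle math the post debunks lives in choosing m and k for a target false-positive rate; the structure itself is this small.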

by Marcus Hutter and David Quarel and Elliot Catt

The book can be ordered from amazon.com / co.

The Double-E Infix Expression Parsing Method

Topic in Programming Models

Demystifying Debuggers, Part 2: The Anatomy Of A Running Program

On the concepts involved in a running program. What happens, exactly, when you double click an executable file, or launch it from the command line, and it begins to execute?

Algebraic Databases

Databases have been studied category-theoretically for decades. The database schema---whose purpose is to arrange high-level conceptual entities---is generally modeled as a category or sketch. The data itself, often called an instance, is generally modeled as a set-valued functor, assigning to each conceptual entity a set of examples. While mathematically elegant, these categorical models have typically struggled with representing concrete data such as integers or strings. In the present work, we propose an extension of the set-valued functor model, making use of multisorted algebraic theories (a.k.a. Lawvere theories) to incorporate concrete data in a principled way. This also allows constraints and queries to make use of operations on data, such as multiplication or comparison of numbers, helping to bridge the gap between traditional databases and programming languages. We also show how all of the components of our model---including schemas, instances, change-of-schema functors, and queries---fit into a single double categorical structure called a proarrow equipment (a.k.a. framed bicategory).

FPGAs for Software Engineers 0: The Basics

A brief introduction to FPGAs, Verilog and simulation

Data-Oriented Design

BLT: Patches Scale Better Than Tokens

On Ousterhout’s Dichotomy Oct 6, 2024

Why are there so many programming languages? One of the driving reasons for this is that some languages tend to produce fast code, but are a bit of a pain to use (C++), while others are a breeze to write, but run somewhat slow (Python). Depending on the ratio of CPUs to programmers, one or the other might be relatively more important.

The categorical abstract machine

The Cartesian closed categories have been shown by several authors to provide the right framework of the model theory of λ-calculus. The second author…

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.

Fundamental Components of Deep Learning: A category-theoretic approach

Deep learning, despite its remarkable achievements, is still a young field. Like the early stages of many scientific disciplines, it is marked by the discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform and compositional mathematical foundation. From the intricacies of the implementation of backpropagation, through a growing zoo of neural network architectures, to the new and poorly understood phenomena such as double descent, scaling laws or in-context learning, there are few unifying principles in deep learning. This thesis develops a novel mathematical foundation for deep learning based on the language of category theory. We develop a new framework that is a) end-to-end, b) uniform, and c) not merely descriptive, but prescriptive, meaning it is amenable to direct implementation in programming languages with sufficient features. We also systematise many existing approaches, placing many existing constructions and concepts from the literature under the same umbrella. In Part I we identify and model two main properties of deep learning systems, parametricity and bidirectionality: we expand on the previously defined construction of actegories and Para to study the former, and define weighted optics to study the latter. Combining them yields parametric weighted optics, a categorical model of artificial neural networks, and more. Part II justifies the abstractions from Part I, applying them to model backpropagation, architectures, and supervised learning. We provide a lens-theoretic axiomatisation of differentiation, covering not just smooth spaces, but discrete settings of boolean circuits as well. We survey existing, and develop new categorical models of neural network architectures. We formalise the notion of optimisers and lastly, combine all the existing concepts together, providing a uniform and compositional framework for supervised learning.

Logical Complexity of Proofs

If you cannot find proofs, talk about them. Robert Reckhow, with his advisor Stephen Cook, famously started the formal study of the complexity of proofs with their 1979 paper. They were interested in…

Richard Hamming - Wikipedia

Richard Wesley Hamming (February 11, 1915 – January 7, 1998) was an American mathematician whose work had many implications for computer engineering and telecommunications.

What is the "question" that programming language theory is trying to answer?

I've been interested in various topics like Combinatory Logic, Lambda Calculus, Functional Programming for a while and have been studying them. However, unlike the "Theory of Computation" which str...

Introducing Limbo: A complete rewrite of SQLite in Rust

we forked SQLite with the libSQL project. What would it be like if we just rewrote it?

TLA+ is hard to learn

I’m a fan of the formal specification language TLA+. With TLA+, you can build models of programs or systems, which helps to reason about their behavior. TLA+ is particularly useful for reason…

How hard is constraint programming?

Writing code using the Z3 SMT solver is different from typical programming, due to mixed programming models--not unlike CUDA for GPUs. Here's what to expect.

Geeks, MOPs, and sociopaths in subculture evolution

How muggles and sociopaths invade and undermine creative subcultures; and how to stop them.

Advanced programming languages

Students often ask for a recommendation on what language they should learn next.

llama.cpp guide - Running LLMs locally, on any hardware, from scratch

Psst, kid, want some cheap and small LLMs?

GitHub - avinassh/py-caskdb: (educational) build your own disk based KV store

(educational) build your own disk based KV store. Contribute to avinassh/py-caskdb development by creating an account on GitHub.

Command Line Interface Guidelines

An open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day.

How Many Computers Are In Your Computer?

Any ‘computer’ is made up of hundreds of separate computers plugged together, any of which can be hacked. I list some of these parts.

Design Of This Website

Meta page describing Gwern.net, the self-documenting website’s implementation and experiments for better ‘semantic zoom’ of hypertext; technical decisions using Markdown and static hosting.

Being the (Pareto) Best in the World

John Wentworth argues that becoming one of the best in the world at *one* specific skill is hard, but it's not as hard to become the best in the worl…

Some questions

Google's approach to email

Fastest contributed programs, grouped by programming language implementation

Charts showing benchmark program performance grouped by implementation language.

Haskell as fast as C: working at a high altitude for low level performance

After the last post about high performance, high level programming, Slava Pestov, of Factor fame, wondered whether it was generally true that “if you want good performance you have to write C…

On Competing with C Using Haskell

Mark Karpov wrote in his article on Migrating text metrics to pure Haskell how he originally did foreign calls out to C for many of the functions in his text metric package, but now ported them to Haskell when he learned that Haskell can give you performance comparable to C.

Performance

Moreover, it's often not clear if two programs which supposedly have the same functionality really do the same thing.

Proof Explorer

Inspired by Whitehead and Russell's monumental Principia Mathematica, the Metamath Proof Explorer has over 26,000 completely worked out proofs in its main sections (and over 41,000 counting "mathboxes", which are annexes where contributors can develop additional topics), starting from the very foundation that mathematics is built on and eventually arriving at familiar mathematical facts and beyond.

An Invitation to Applied Category Theory

Abstract page for arXiv paper 1803.05316: Seven Sketches in Compositionality: An Invitation to Applied Category Theory

An Invitation to Applied Category Theory

Cambridge Core - Programming Languages and Applied Logic - An Invitation to Applied Category Theory

Introducing io_uring_spawn

The traditional mechanism for launching a program in a new process on Unix systems—forking and execing—has been with us for decades, but it is not really the most efficient of operations.

Daniel Lemire's blog

I find that there can still be a significant benefit to using csFastFloat over the .NET library: it can be about 3 times faster.

A Beginner's Guide to Vectorization By Hand: Part 3

We're continuing our expedition into the world of manual vectorization. In this part, we will explain the most common technique for vectorizing conditional code (usually referred to as if-conversion).
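The if-conversion idea is to evaluate both arms of a branch for every lane, then blend the results with a mask, so no data-dependent branch remains. It can be modeled scalar-by-scalar in Python; this is a sketch of the transformation, not actual SIMD code:

```python
def if_converted(xs):
    """Branchy original: y = x * 2 if x > 0 else -x.
    If-converted form: compute both arms for every element,
    then select via a 0/1 mask, exactly what a SIMD blend does."""
    out = []
    for x in xs:
        mask = 1 if x > 0 else 0        # vector compare produces a mask
        then_val = x * 2                # both arms are computed...
        else_val = -x
        out.append(mask * then_val + (1 - mask) * else_val)  # ...then blended
    return out
```

The payoff on real hardware is that every lane executes the same instruction stream, so the loop vectorizes even though the source code branches.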

Competitive Programming

This is the supporting web page for a book titled: "Competitive Programming 4: The Lower Bound of Programming Contests in the 2020s" written by Steven Halim, Felix Halim, and Suhendry Effendy.

MOND←TECH MAGAZINE

This is a website, which means it sometimes goes offline

Algorithms for Modern Hardware

Its intended audience is everyone from performance engineers and practical algorithm researchers to undergraduate computer science students who have just finished an advanced algorithms course and want to learn more practical ways to speed up a program than by going from O(n log n) to O(n log log n).

Creating enums at comptime

Using zig's @Type to dynamically create enums at comptime

Zig's new declaration literals

A look at Zig's new declaration literals

Zig's (.{}){} syntax

A look at some unfriendly Zig syntax

How LLVM Optimizes a Function

In some compilers the IR format remains fixed throughout the optimization pipeline, in others the format or semantics change.

How 99% of C Tutorials Get it Wrong

But this article did not arise only from my own opinion. The argument I'll present here, at least in its general form, is one which programmers who I know personally and I admire a lot (e.

A Beginner's Guide to Vectorization By Hand: Part 1

CPU vendors have been trying for a long time to exploit as much parallelism as they can, and the introduction of vector instructions is one way to go.

Tell the Compiler What You Know

Compilers often use magic to uncover the hidden mysteries of your program and optimize it aggressively.

Compiler Optimization in a Language you Can Understand

In this article, I'll explain compiler optimizations through a series of examples, focusing on what compilers do.

How Target-Independent is Your IR?

An esoteric exploration on the target independence of compiler IRs.

Numerical Recipes

We are Numerical Recipes, one of the oldest continuously operating sites on the Internet.

For Beginners

Occasional writings about Haskell.

Oasis: A Universe in a Transformer

Generating Worlds in Realtime

A Fat Pointer Library

libCello Official Website

TCP Server in Zig - Part 5a - Poll

Using non-blocking sockets and poll to improve the scalability of our system.

6.824 Schedule: Spring 2022

Here is the tentative schedule of lectures and due dates. The lecture notes and paper questions for future dates are copies from previous years, and may change.

Typing the technical interview

In the formless days, long before the rise of the Church, all spells were woven of pure causality, all actions were permitted, and death was common.

Reversing the technical interview

If you want to get a job as a software witch, you’re going to have to pass a whiteboard interview.

Hexing the technical interview

But Hacker News has read of you, in their snicker-slithing susurrential warrens, and word has spread, which is why the young man offering you a smörgåsbord of microkitchen delights looks mildly suspicious already.

Nine Rules for SIMD Acceleration of Your Rust Code (Part 1)

General Lessons from Boosting Data Ingestion in the range-set-blaze Crate by 7x

B-trees and database indexes

B-trees are used by many modern DBMSs. Learn how they work, how databases use them, and how your choice of primary key can affect index performance.

Safe C++

Over the past two years, the United States Government has been issuing warnings about memory-unsafe programming languages with increasing urgency.

Async Rust can be a pleasure to work with (without `Send + Sync + 'static`)

Async Rust is powerful. And it can be a pain to work with (and learn). Async Rust can be a pleasure to work with, though, if we can do it without `Send + Sync + 'static`.

The Perfect Plan

Too often do we obsess over the perfect plan to chase our dreams, resulting in analysis paralysis. Instead of being stuck in this limbo, I've made the perfect plan for anyone to chase their dreams.

Zig's BoundedArray

A quick look at how and why to use Zig's BoundedArray.

Linus Torvalds talks AI, Rust adoption, and why the Linux kernel is 'the only thing that matters'

In a wide-ranging conversation with Verizon open-source officer Dirk Hohndel, 'plodding engineer' Linus Torvalds discussed where Linux is today and where it may go tomorrow.

Intercepting and modifying Linux system calls with ptrace

What's the big deal about Deterministic Simulation Testing?

Zig and Emulators

Some quick Zig feedback in the context of a new 8-bit emulator project I started a little while ago:

Linkers part 1

I’ve been working on and off on a new linker.

A ToC of the 20 part linker essay

I release this message (the ToC and comments) into the public domain, no right reserved.

Trading interview blog

`zig cc`: a Powerful Drop-In Replacement for GCC/Clang

If you have heard of Zig before, you may know it as a promising new programming language which is ambitiously trying to overthrow C as the de-facto systems language.

Zig Build System

The fundamental commands zig build-exe, zig build-lib, zig build-obj, and zig test are often sufficient.

Resources for Amateur Compiler Writers

I know complete pans of the literature are left out, but this is a page for amateur compiler writers. Anything that I did not find practical is not listed here.

Putting the “You” in CPU

Curious exactly what happens when you run a program on your computer? Learn how multiprocessing works, what system calls really are, how computers manage memory with hardware interrupts, and how Linux loads executables.

How to Compile Your Language

The guide also covers how to create a platform-specific executable with the help of the LLVM compiler infrastructure, which all of the previously mentioned languages use for the same purpose.

Introduction to the Odin Programming Language

Preface: This article is an introduction to the Odin Programming Language. It is aimed at people who know a bit of programming but have never touched Odin. It is not a reference guide; rather, I try to keep things informal and talk about what I think are important aspects of the language. There will be some notes on differences to C/C++, as Odin in many ways tries to be a better C. If you enjoy this article and want to support me, you can do so by becoming a patron.

Arena allocator tips and tricks

Over the past year I’ve refined my approach to arena allocation. With practice, it’s effective, simple, and fast; typically as easy to use as garbage collection but without the costs.

No Starch Press

Part 2: Portable Executable Files

bytecode interpreters for tiny computers

I've previously come to the conclusion that there's little reason for using bytecode in the modern world, except in order to get more compact code, for which it can be very effective.
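The compactness argument is easy to see concretely. In this toy stack machine (a hypothetical ISA of my own, not from the post), every instruction is one byte, plus a one-byte operand for pushes:

```python
# Opcodes for a hypothetical toy stack ISA: one byte each.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3

def run(code):
    """Interpret bytecode for a tiny stack machine."""
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(code[pc]); pc += 1   # operand byte follows
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == HALT:
            return stack.pop()

# (2 + 3) * 4 encodes into just nine bytes:
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
```

Nine bytes for a full expression is the kind of density that makes bytecode attractive on tiny computers, even though a native-code or threaded-code interpreter would run faster.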

How I built zig-sqlite

When you prepare a statement zig-sqlite creates a brand new type only for this prepared statement.

The Hunt for the Missing Data Type

A (directed) graph is a set of nodes, connected by arrows (edges). The nodes and edges may contain data. Here are some graphs: All graphs made with graphviz (source) Graphs are ubiquitous in software engineering: Package dependencies form directed graphs, as do module imports. The internet is a graph of links between webpages. Model checkers analyze software by exploring the “state space” of all possible configurations.
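The adjacency-dict representation and the state-space traversal the blurb mentions fit in a few lines of Python. This is a minimal sketch; a real model checker adds state hashing, work queues, and property checks:

```python
def reachable(graph, start):
    """All nodes reachable from `start` in a directed graph given as an
    adjacency dict {node: [successor, ...]}. This is the same traversal a
    model checker uses to explore the state space of configurations."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return seen
```

That graphs reduce to a dict of lists this easily is part of the article's point: the "missing data type" is missing from standard libraries, not hard to express.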

Microfeatures I'd like to see in more languages

There are roughly three classes of language features: Features that the language is effectively designed around, such that you can't add it after the fact....

Google’s Fully Homomorphic Encryption Compiler — A Primer

Back in May of 2022 I transferred teams at Google to work on Fully Homomorphic Encryption (newsletter announcement). Since then I’ve been working on a variety of projects in the space, includ…

Will I be able to access proprietary platform APIs (e.g. Android / iOS)?

The kind of binary format being considered for WebAssembly can be natively decoded much faster than JavaScript can be parsed (experiments show more than 20× faster).

The future of Clang-based tooling

By Peter Goodman Clang is a marvelous compiler; it’s a compiler’s compiler! But it isn’t a toolsmith’s compiler. As a toolsmith, my ideal compiler would be an open book, allowing me to get to…

Fast Multidimensional Matrix Multiplication on CPU from Scratch

Numpy can multiply two 1024x1024 matrices on a 4-core Intel CPU in ~8 ms. This is incredibly fast, considering this boils down to 18 FLOPs / core / cycle, with...

Efficient n-states on x86 systems

The text discusses how to efficiently handle control flow in x86 systems when a flag can have multiple states beyond true and false. It explains how to use condition codes, such as testing for zero and parity, to minimize the number of instructions needed for these tests. Additionally, it touches on the challenges and limitations of using inline assembly for optimization in C programming.

Program tuning as a resource allocation problem

Program tuning involves balancing simplicity and performance while sharing cache resources among various subsystems. Optimizing one function can impact others, making it a global resource allocation problem that requires careful consideration of algorithms and their resource footprints. Better tools and metrics are needed to manage and analyze cache resource consumption effectively.

How web bloat impacts users with slow connections

Web bloat makes many websites difficult to use for people with slow internet connections and devices. Sites like Discourse and Reddit perform poorly on low-end devices, even if they seem fast on high-end ones. Improving web performance for these users is crucial, as many people rely on older, slower devices.

Files are hard

Writing files in a way that ensures their robustness is challenging due to the complexity involved. The paper discusses various issues related to file corruption and data loss, such as crash consistency, filesystem semantics, filesystem correctness, error handling, and error recovery. It highlights the differences in how different filesystems handle errors and points out bugs and inconsistencies found in popular filesystems. The paper also addresses the frequency of disk errors and data corruption, emphasizing the need for caution when writing files and the importance of using libraries or tools to ensure safety. Overall, the document emphasizes the difficulty of reasoning about file-related problems and the need for careful considerations when working with filesystems.

Ringing in a new asynchronous I/O API

The new "io_uring" interface simplifies asynchronous I/O in the Linux kernel by using two ring buffers for submission and completion queues. Applications can set up these buffers with a system call and submit I/O requests through a structured format. This method aims to reduce complaints about AIO by improving efficiency and ease of use.

Optimizing subroutines in assembly language

Optimizing subroutines in assembly language involves various techniques such as using inline assembly in a C++ compiler, separating code using MMX registers from code using ST registers, and understanding different register sizes and memory operands. It is important to consider the use of instruction prefixes, intrinsic functions for vector operations, and accessing class and structure members efficiently. Additionally, preventing false dependences, aligning loop and subroutine entries, and optimizing instruction sizes can improve performance. However, it is crucial to note that these optimizations are processor-specific and may vary depending on the target platform.

Brian Robert Callahan

This blog post starts a series on creating programs that demystify how programs work. The first program is a disassembler that reads bytecode and converts it into assembly language, while a future post will cover creating an assembler. The disassembler uses a table of mnemonics and instruction sizes to print out the corresponding assembly instructions from bytecode.
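The table-driven scheme the post describes can be sketched like this (opcodes borrowed from the Intel 8080 for illustration; the post's actual table and target may differ):

```python
# Mnemonic and total instruction size (opcode byte + operand bytes).
TABLE = {0x00: ("nop", 1), 0x3E: ("mvi a", 2), 0xC3: ("jmp", 3)}

def disassemble(code):
    """Table-driven disassembly: look up each opcode's mnemonic and size,
    format any operand bytes, then advance the program counter by size."""
    lines, pc = [], 0
    while pc < len(code):
        mnemonic, size = TABLE[code[pc]]
        operands = code[pc + 1 : pc + size]
        lines.append(" ".join([mnemonic] + [f"{b:#04x}" for b in operands]))
        pc += size
    return lines
```

The instruction-size column is what lets the disassembler stay in sync with instruction boundaries without understanding any semantics.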

QBE vs LLVM

QBE and LLVM are both compiler backends, but QBE is a smaller, more accessible project aimed at amateur language designers. While LLVM is feature-rich and complex, QBE focuses on simplicity and efficiency, making it easier to use for quick projects. QBE provides straightforward operations and a cleaner intermediate language, reducing the complexity often found in LLVM.

Recent presentations and papers

Andi Kleen's work focuses on improving Linux performance through various techniques like hardware monitoring and profiling. He has presented on topics such as lock elision, multi-core scalability, and error handling in the Linux kernel. His contributions include discussions on modern CPU performance, tools for Linux development, and enhancements for energy efficiency.

How long does it take to make a context switch?

Context switching times vary significantly across different Intel CPU models, with more expensive CPUs generally performing better. The performance can be greatly affected by cache usage and thread migration between cores, leading to increased costs when tasks are switched. Optimizing the number of threads to match the number of hardware threads can improve CPU efficiency and reduce context switching overhead.

Ghostty Devlog 001

Ghostty is a terminal emulator developed as a side project. In this devlog, the author shares details about the tech stack behind Ghostty, including its cross-platform capabilities and GPU acceleration. The devlog also introduces two features: automatic shell integration injection and auto-italicize fonts. The shell integration feature improves prompt redrawing, working directory reporting, and active process detection, while the auto-italicize fonts feature fixes a bug and adds the ability to skew regular fonts to create fake italics. The devlog concludes by inviting readers to follow the author on social media for updates and future devlogs.

Tiled Matrix Multiplication

Tiled matrix multiplication is an efficient algorithm used on GPUs that reduces memory access by utilizing shared memory. By organizing threads into blocks, each thread can perform calculations more quickly and with fewer memory accesses. This method is important for improving performance in tasks like graphics rendering and machine learning.
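
The article is about GPU shared memory, but the blocking idea itself is language-agnostic. A minimal pure-Python sketch (not the article's code — on a CPU the same tiling improves cache reuse, with the tile playing the role of shared memory):

```python
def matmul_tiled(A, B, tile=2):
    """Multiply square matrices A and B (lists of lists) in tile-sized blocks.

    Each (i0, j0) block of the result is accumulated one k-tile at a time,
    so the working set stays small -- the locality idea that shared memory
    exploits on a GPU, and that caches exploit on a CPU."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        s = C[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

The result is identical to the naive triple loop for any tile size; only the memory access pattern changes.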

Rust Atomics and Locks

This book by Mara Bos explores Rust programming language's concurrency features, including atomics, locks, and memory ordering. Readers will gain a practical understanding of low-level concurrency in Rust, covering topics like mutexes and condition variables. The book provides insights on implementing correct concurrency code and building custom locking and synchronization mechanisms.

Compiler Backend

The QBE compiler backend is designed to be a compact yet high-performance C embeddable backend that prioritizes correctness, simplicity, and user-friendliness. It compiles on various x64 operating systems and boasts features like IEEE floating point support, SSA-based intermediate language, and quick compilation times. While currently limited to x64 platforms, plans include ARM support and further enhancements. The backend has been successfully utilized in various projects, showcasing its adaptability and effectiveness in compiler development.

Vale's Memory Safety Strategy: Generational References and Regions

Vale's memory safety strategy uses generational references to manage memory without relying on traditional methods like garbage collection. Each reference stores a "generation" ID, and before accessing an object, a check ensures the ID matches the object's current generation. This approach allows for efficient memory management while maintaining safety, reducing overhead significantly compared to other methods.
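
A toy Python sketch of the generation check (not Vale's implementation — slot reuse and the `None`-means-free convention are simplifications for illustration):

```python
class Heap:
    """Toy generational-reference heap: each slot carries a generation
    counter that is bumped on free, so stale references are caught."""
    def __init__(self, size):
        self.slots = [None] * size   # None marks a free slot (sketch only)
        self.gens = [0] * size

    def alloc(self, value):
        for i, v in enumerate(self.slots):
            if v is None:
                self.slots[i] = value
                return (i, self.gens[i])   # a "generational reference"
        raise MemoryError("heap full")

    def free(self, ref):
        i, _ = ref
        self.slots[i] = None
        self.gens[i] += 1                  # invalidate outstanding refs

    def deref(self, ref):
        i, gen = ref
        if self.gens[i] != gen:            # the generation check
            raise RuntimeError("use after free detected")
        return self.slots[i]
```

A dangling reference to a freed (and possibly reused) slot fails the generation check instead of silently reading the new occupant.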

Introduction

Wait-freedom ensures that every thread makes progress on its own, completing each operation in a bounded number of steps regardless of what other threads do. Lock-freedom guarantees that the system as a whole makes progress, though individual threads may starve. Obstruction-freedom guarantees progress only for a thread that runs without interference from others, making it the weakest of the three guarantees.

Cache-Oblivious Algorithms

Cache-oblivious algorithms are designed to use processor caches efficiently without needing to know specific cache details. They work by dividing data into smaller parts, allowing more computations to happen in cache and reducing memory access. This leads to better performance, especially in parallel algorithms, by minimizing shared memory bottlenecks.

A Memory Allocator

A memory allocator is software that manages dynamic memory allocation in programs, providing functions like malloc(), free(), and realloc(). This particular allocator aims to minimize memory wastage and improve efficiency, and it is widely used in various systems, including Linux. It employs techniques like coalescing freed chunks and supports memory mapping to enhance performance and reduce fragmentation.

1024cores

Dmitry Vyukov shares information on synchronization algorithms, multicore design patterns, and high-performance computing on his website, 1024cores.net. He focuses on shared-memory systems and does not cover topics like clusters or GPUs. New content is added regularly, and readers can subscribe for updates.

Implementing interactive languages

Implementing an interactive language requires balancing compile-time and run-time performance. Traditional switch-based bytecode interpreters are easy to implement but run far slower than code from an optimizing compiler, while heavyweight compilers take too long to be interactive; the sweet spot keeps combined compile-time plus run-time within an acceptable budget. The post surveys the options: fast interpreters, existing backends like LLVM and Cranelift, custom compilers, and WebAssembly as a backend. It also explores giving a language two backends — one for quick startup, one for aggressive optimization — and notes that many unknowns remain about the feasibility and performance of each approach.

Pointers Are Complicated, or: What's in a Byte?

The document explains the complexities of pointers in low-level programming languages like C++ and Rust, debunking the misconception that pointers are simple integers. It delves into examples showing how assumptions about pointers can lead to undefined behavior and how pointer arithmetic can be tricky. The text proposes a model where a pointer is a pair of an allocation ID and an offset, rather than just an integer. Additionally, it discusses the challenges of representing bytes in memory, especially when dealing with uninitialized memory and the need for a more nuanced byte representation to ensure program correctness.

Three Architectures for a Responsive IDE

The text discusses three architectures for a responsive IDE: indexing on a per-file basis, using a FQN index for completion, and a query-based compiler approach. Each approach has its own challenges and benefits, such as handling macro expansions and managing dependencies efficiently to ensure fast performance.

How a Zig IDE Could Work Feb 10, 2023

The author discusses how to build an Integrated Development Environment (IDE) for the Zig programming language, which has unique features like a simple syntax but also complex compile-time evaluation. The IDE needs to handle incomplete code and provide immediate feedback while managing rapid code changes. The post explores various strategies for efficiently processing code, such as using abstract interpretation and optimizing compilation to focus only on necessary parts of the codebase.

Properly Testing Concurrent Data Structures Jul 5, 2024

The article discusses how to effectively test concurrent data structures by using managed threads that can be paused and resumed. It explains the importance of controlling thread execution to avoid issues like race conditions while executing random operations. The author emphasizes the need for proper synchronization mechanisms to ensure that only one thread is active at a time during tests.

Parse, don’t validate

The text discusses the importance of parsing over validating in Haskell to prevent errors and enhance code reliability by using strong argument types. Parsing upfront helps maintain consistency and avoids potential issues with partial input processing, demonstrating the benefits of type-driven design in Haskell programming. The text also touches on the subjective nature of programming languages, highlighting differing perceptions of Haskell and the challenges faced by learners in navigating diverse opinions.

Too Fast, Too Megamorphic: what influences method call performance in Java?

The performance of method calls in Java can be improved through techniques like inlining and using inline caches. Monomorphic calls, where only one method can be invoked, are the fastest, while bimorphic and megamorphic calls are slower due to increased lookup costs. The study highlights that simply adding the "final" keyword or overriding methods does not significantly enhance performance.

The Black Magic of (Java) Method Dispatch

The post walks through annotated assembly output, showing what percentage of execution time lands on each instruction. The hot path dispatches on a coder value via comparisons and jumps, with separate sections for the main entry point, the epilogue, the specific coder cases like Coder1 and Coder2, and a fallback for other coders.

Why null sucks, even if it's checked

The article discusses the problems with using null in programming languages like Kotlin and C#, highlighting that null can lead to confusion and errors. It argues that null is not an extensible solution for representing absence of value and suggests using sum types or optional types instead. The author believes that languages should focus on improving optional types rather than trying to make null safer.

Resources for Building Programming Languages

The article shares resources for learning how to create programming languages, focusing on Rust and C. It highlights the book "Crafting Interpreters," which provides practical insights into building interpreters using different programming approaches. The author also discusses their personal experience building a language and the tools they've found helpful, like LLVM and Cranelift.

Little 'Big Ideas' in Programming Language Design

Colin Davis discusses "little big ideas" in programming language design, focusing on the balance between innovative features and conventional choices. He highlights Mojo and Go as examples, noting how Mojo combines modern improvements with familiar concepts, while Go prioritizes simplicity and a strong ecosystem. Davis suggests that small design decisions, like memory management and parameter passing, can greatly enhance a language's usability and performance.

Computer Networking: A Top-Down Approach

Jim Kurose and Keith Ross are prominent computer science professors with extensive experience in networking and related fields. They have received multiple awards for their teaching and research, and both have held leadership roles in academic and professional organizations. Their work focuses on topics like network protocols, security, and multimedia communication.

Using Uninitialized Memory for Fun and Profit Posted on Friday, March 14, 2008.

A clever trick involves using uninitialized memory to improve performance in certain programming situations by representing sparse sets efficiently with two arrays that point at each other. This technique allows for fast constant-time operations for adding, checking, and clearing elements in the set, making it a valuable tool for optimizing algorithms and data structures. The sparse set representation is especially useful for scenarios where speed is critical, such as in compiler optimizations and graph traversal algorithms.
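
The structure translates directly to Python, though the payoff there is only the O(1) clear — Python cannot leave the arrays truly uninitialized, so zero-fill stands in for garbage (a sketch of the Briggs/Torczon representation, not the post's C code):

```python
class SparseSet:
    """Sparse set over 0..universe-1. `sparse` may hold stale values for
    absent members; membership is validated against `dense`, which is why
    uninitialized memory would be safe here in C."""
    def __init__(self, universe):
        self.dense = [0] * universe
        self.sparse = [0] * universe   # stands in for uninitialized memory
        self.n = 0

    def add(self, x):
        if not self.contains(x):
            self.dense[self.n] = x
            self.sparse[x] = self.n
            self.n += 1

    def contains(self, x):
        i = self.sparse[x]
        return i < self.n and self.dense[i] == x

    def clear(self):
        self.n = 0                     # O(1): no need to touch the arrays
```

Add, membership test, and clear are all constant time; iteration over members is just `dense[:n]`.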

Zip Files All The Way Down

The text discusses creating self-reproducing programs and files like zip files that can decompress to themselves. It explores using Lempel-Ziv compression for self-reproduction and the challenges of translating these concepts into real opcode encodings like DEFLATE used in gzip and zip files. The ultimate goal is to create a zip file that contains a larger copy of itself recursively, creating a chain of expanding zip files.

UTF-8: Bits, Bytes, and Benefits Posted on Friday, March 5, 2010.

UTF-8 is a straightforward way to encode Unicode code points into a byte stream, and understanding its inner workings is key to leveraging its benefits. Key properties of UTF-8 include preserving ASCII files, ensuring ASCII bytes are represented as themselves, and requiring code points to be encoded using the shortest possible sequence. The encoding is self-synchronizing, facilitating substring searches and making it compatible with most programs that handle 8-bit files safely. While some tools may need modification to handle UTF-8, it is increasingly becoming the standard encoding due to its practical advantages and simple design.
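
The encoding rules fit in a dozen lines. A hand-rolled sketch (for illustration; real code should use the built-in codecs, and this version omits validation of surrogates and out-of-range inputs):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, using the shortest form as UTF-8 requires."""
    if cp < 0x80:                                    # 1 byte: ASCII maps to itself
        return bytes([cp])
    if cp < 0x800:                                   # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6,
                      0x80 | cp & 0x3F])
    if cp < 0x10000:                                 # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12,
                      0x80 | cp >> 6 & 0x3F,
                      0x80 | cp & 0x3F])
    return bytes([0xF0 | cp >> 18,                   # 4 bytes for the rest
                  0x80 | cp >> 12 & 0x3F,
                  0x80 | cp >> 6 & 0x3F,
                  0x80 | cp & 0x3F])
```

Every continuation byte starts with the bits `10`, which is what makes the encoding self-synchronizing: a decoder dropped into the middle of a stream can always find the next character boundary.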

Minimal Boolean Formulas

The post discusses how to compute the minimum number of AND and OR operators needed for Boolean functions with five variables. It describes the author's program that efficiently calculates this minimum for various functions while also improving algorithms for speed. The findings contribute to understanding the complexity of Boolean functions and their representations.

Hacking the OS X Kernel for Fun and Profiles Posted on Tuesday, August 13, 2013.

The article discusses a bug in the OS X kernel related to how profiling signals are delivered in multithreaded processes. It explains that the kernel incorrectly sends the SIGPROF signal to the entire process instead of the specific running thread. The author outlines a fix involving a small edit to the kernel code to ensure the signal is sent to the correct thread.

How To Build a User-Level CPU Profiler Posted on Thursday, August 8, 2013.

The text discusses how the pprof tool simplifies CPU profiling for C++ and Go programs by utilizing hardware timers and the operating system. Profiling information is gathered through hardware interrupts, providing insights into a program's performance and resource usage. By moving profiling logic to user-level timers, programs can customize and enhance profiling capabilities without kernel changes.

An Encoded Tree Traversal

The text discusses different ways to traverse binary trees and how these methods can be generalized to k-ary trees. It highlights a new ordering for traversing k-ary trees that results in a regular numbering pattern, which is not present in the traditional methods. The author seeks references or examples of this k-ary-coded traversal order, which he has not yet found.

Our Software Dependency Problem

The text discusses the risks and benefits of using software dependencies in programming. It emphasizes the importance of understanding, managing, and monitoring dependencies to prevent potential issues like bugs and security vulnerabilities. The article highlights the need for developers to establish best practices for effectively utilizing dependencies in their projects.

The Magic of Sampling, and its Limitations Posted on Saturday, February 4, 2023.

Sampling can help estimate the percentage of items with a specific trait accurately. The number of samples taken greatly affects the accuracy of the estimate. To get precise estimates, all items must have an equal chance of being selected during sampling.
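
The sample-size effect is just the standard error formula. A small sketch (standard normal-approximation statistics, not code from the post):

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% normal-approximation margin of error for an estimated proportion.

    Accuracy improves only with the square root of n: to halve the
    margin you need four times as many samples."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)
```

At the worst case p = 0.5, 100 samples give roughly a ±10-point margin; 10,000 samples are needed to get it down to ±1 point.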

Running the “Reflections on Trusting Trust” Compiler Posted on Wednesday, October 25, 2023.

The text discusses how to modify a C compiler to insert a backdoor into a program without leaving traces in the source code. It explains that the backdoor can be detected because the compiler's size increases each time it compiles itself. Finally, it highlights the importance of using trusted compilers to prevent hidden backdoors in modern software development.

Improving the Font Pipeline

To improve the font pipeline, consider how to efficiently choose and render glyphs for different languages, including handling ligatures and memory constraints. You may need to create texture atlases for various glyphs while ensuring new translations are incorporated. Finally, optimize rendering to avoid blurriness and ensure smooth performance across different character sets.

Easy Scalable Text Rendering on the GPU

This text explains a fast and memory-efficient technique for rendering text on the GPU without using traditional methods like signed distance fields. It uses triangles to fill in pixels inside the glyph and supports subpixel anti-aliasing for crisp text on LCD screens. The technique is resolution-independent, simple to implement, and can be extended to enhance rendering quality.

Adventures in Text Rendering: Kerning and Glyph Atlases

Text rendering involves converting vector glyphs to bitmaps, positioning them on screen, and optimizing performance by using glyph atlases. Glyph atlases store rasterized glyphs efficiently, allowing for sub-pixel alignment and improved rendering quality. This approach balances performance and quality in text rendering for different types of fonts.

Exploring the Power of Negative Space Programming

Negative space programming helps improve code by defining what it should not do, making it more robust and clear. By using constraints and assertions, developers can catch errors early and enhance security. This approach also promotes simplicity, making the code easier to maintain and understand.
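
A small sketch of the style (an invented example, not from the article — note that Python strips `assert` under `-O`, so production code might prefer explicit raises):

```python
def apply_discount(price_cents: int, percent: int) -> int:
    # Negative space: state what must never happen, and fail fast if it does.
    assert price_cents >= 0, "price cannot be negative"
    assert 0 <= percent <= 100, "discount must be a percentage"
    result = price_cents * (100 - percent) // 100
    assert 0 <= result <= price_cents, "discount must not increase the price"
    return result
```

The assertions document the function's contract in executable form: a caller passing a 150% discount is stopped at the boundary instead of corrupting state downstream.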

CompilerTalkFinal

The content discusses various compilers and their features, including Clang, GCC, V8, CakeML, Chez Scheme, and more. It also touches on the history of interpreters and compilers, with examples like ENIAC and the first compiler developed by Grace Hopper. Different approaches to compilation and interpretation are highlighted, showcasing the evolution of compiler technology.

Graydon Hoare: 21 compilers and 3 orders of magnitude in 60 minutes

Graydon Hoare's talk explains different approaches to building compilers, from traditional giants to more efficient variants. He highlights the importance of using compiler-friendly languages and theory-driven meta-languages. The presentation covers key concepts like sophisticated partial evaluation and implementing compilers directly by hand.

p75-hoare

The author recounts experiences in designing a computer programming language and issues a warning about language complexity. Despite challenges, a subset of the language was successfully implemented. The author emphasizes the importance of simplicity and reliability in programming languages for critical applications.

Updating the Go Memory Model

The Go memory model needs updates to clarify how synchronization works and to endorse race detectors for safer concurrency. It suggests adding typed atomic operations and possibly unsynchronized atomics to improve program correctness and performance. The goal is to ensure that Go programs behave consistently and avoid data races, making them easier to debug.

Programming Language Memory Models (Memory Models, Part 2) Posted on Tuesday, July 6, 2021. PDF

Modern programming languages use atomic variables and operations to help synchronize threads and prevent data races. This ensures that programs run correctly by allowing proper communication between threads without inconsistent memory access. All major languages, like C++, Java, and Rust, support sequentially consistent atomics to simplify the development of multithreaded programs.

Hardware Memory Models (Memory Models, Part 1) Posted on Tuesday, June 29, 2021. PDF

This text discusses hardware memory models, focusing on how different processors handle memory operations and maintain order. It explains the concept of sequential consistency, where operations are executed in a predictable order, and contrasts it with more relaxed models like those used in ARM and POWER architectures. The author highlights the importance of synchronization to avoid data races in concurrent programming.

Baby Steps to a C Compiler

Writing a simple compiler can help you understand how computers work. Start with a minimal project that compiles a small subset of a language, and then gradually add more features. This approach makes learning about compilers and programming enjoyable and rewarding.

Kernel Programming Guide

Essential information for programming in the OS X kernel. Includes a high-level overview.

Tiny Tapeout

Tiny Tapeout is a project that helps people easily and affordably create their own chip designs. It offers resources for beginners and advanced users, along with a special price for submissions. Join the community to learn and share your designs before the deadline on September 6th.

Why Pascal is Not My Favorite Programming Language

Pascal is not recommended for serious programming due to limitations in its standard form. The language's strict typing and lack of features like separate compilation make it challenging for complex projects. Pascal is better suited for educational purposes rather than practical programming tasks.

What Color is Your Function?

Functions in a programming language can be either red or blue, affecting how they are called and used. Red functions are asynchronous and typically more complex to work with than blue functions. The choice between red and blue functions can impact code organization and maintainability.

What is an Invariant? Oct 6, 2023

Invariants are properties that hold true during the evolution of a system, helping to ensure correct behavior in programming. They can simplify reasoning about code, whether it’s for small algorithms or larger systems. By clearly defining invariants, programmers can create robust code and manage complex systems effectively.
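
A classic small example of reasoning with an invariant (an illustration in Python, not code from the post):

```python
def binary_search(xs, target):
    """Loop invariant: if target is in the sorted list xs, its index
    lies in the half-open range [lo, hi). The invariant holds initially,
    each branch preserves it, and when lo == hi it pins down the answer."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1   # everything below mid+1 is < target
        else:
            hi = mid       # everything from mid upward is >= target
    return lo if lo < len(xs) and xs[lo] == target else -1
```

Stating the invariant up front is what makes the off-by-one choices (`mid + 1` vs `mid`, `<` vs `<=`) mechanical rather than guesswork.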

Crafting an Interpreter in Zig - part 1

The author is learning Zig by implementing an interpreter for the Lox programming language, inspired by the book "Crafting Interpreters." They are documenting their journey, focusing on interesting aspects of Zig and how it differs from C. So far, they have enjoyed the process, particularly the simplicity and power of Zig's generic programming.

What Every Computer Scientist Should Know About Floating-Point Arithmetic

The text discusses the challenges and considerations of floating-point arithmetic in computer science. It emphasizes the importance of rounding in floating-point calculations and the implications of different precision levels. Additionally, it highlights the need for careful implementation to ensure correctness and accuracy in programs that rely on floating-point arithmetic.
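
The canonical demonstration takes three lines in any language with IEEE 754 doubles (Python here):

```python
import math

# Neither 0.1 nor 0.2 is exactly representable in binary floating point,
# so their rounded sum differs from the rounded representation of 0.3.
total = 0.1 + 0.2
exact_equal = (total == 0.3)          # False
close_enough = math.isclose(total, 0.3)  # True: compare with a tolerance
```

This is why equality comparison of floats is almost always a bug, and why the paper's discussion of rounding modes and precision matters in practice.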

The Development of the C Language*

The paper discusses the development and influences of the C programming language, highlighting its creation at Bell Labs and transition from the B language. C's simplicity, efficiency, and widespread adoption across various platforms and architectures are emphasized, showcasing its enduring stability and usefulness in software development. Despite its quirks and historical origin, C has proven to be a powerful and versatile language for programmers worldwide.

Ownership

Your Starting Point!

The text discusses the concepts of three-dimensional objects and how they are represented in two dimensions for computer graphics. It explains the process of projecting 3D points onto a canvas to create images. The importance of geometry and mathematics in computer graphics, particularly in defining objects and creating images, is emphasized.

Zig Interfaces for the Uninitiated, an update

The post discusses a new idiom for runtime polymorphism in Zig, focusing on using fat pointers instead of @fieldParentPtr. It provides a step-by-step guide on creating a formal Iterator interface and implementing it with an example range iterator. The drawbacks of this pattern include potential performance issues and the requirement for the original implementor to remain alive for the interface to function correctly.

Zig Interfaces for the Uninitiated

The text discusses how to create and implement generic iterators in Zig using interfaces like `Iterator` and `Range`. It demonstrates how to use these iterators to iterate over ranges of values and provides examples of ascending, descending, and skipping ranges. Additionally, it introduces a function `fold` to apply a function to successive elements in an iterator, showcasing Zig's runtime polymorphism for data structures.

Exploring Compile-Time Interfaces in Zig

Zig is a programming language with an active community and a focus on efficient, reusable software development. Interfaces in Zig define a set of methods that a type must implement, promoting abstraction and flexibility. Compile-time interfaces resolve those methods during compilation, so the dispatch costs nothing at run time.

Aro - a C compiler

Aro is a C compiler created as an alternative to Zig's compiler. It includes the aro module for the compiler and a language-agnostic aro_backend module for translating code into machine code. Aro uses self-hosted backends from the Zig compiler for optimization.

Database Systems

This course at CMU covers database management systems, including data models, query languages, storage architectures, and more. It uses case studies to show real-world applications and is suitable for students with basic systems programming skills. The course also thanks companies for their support in equipment donations and course development.

Discovering and exploring mmap using Go

Memory-mapped files allow programs to access disk data larger than available memory. By using mmap in Go, you can map a file directly into memory for easier manipulation. Virtual memory techniques, like mmap, can help solve memory limitations in handling large files efficiently.
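
The article uses Go, but Python's standard library exposes the same system facility. A minimal sketch with `mmap` (the temp-file dance is scaffolding, not part of the technique):

```python
import mmap
import os
import tempfile

# Write a small file, then map it into memory and read/modify it in place.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, mmap")
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:    # length 0 = map the whole file
        first = mm[:5]                      # file bytes addressed like a buffer
        mm[0:5] = b"HELLO"                  # writes go back through the mapping
        mm.flush()

with open(path, "rb") as f:
    data = f.read()
os.remove(path)
```

The key property is that `mm` is sliceable like a `bytearray` while actually being backed by the page cache, so the OS pages data in and out on demand rather than the program reading the whole file.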

But how, exactly, databases use mmap?

Databases use memory-mapped files like mmap to handle data on disk larger than available memory. Examples include SQLite, LevelDB, Lucene, LMDB, and MongoDB. By understanding how mmap is used, we can grasp how databases efficiently read and write data from disk.

re: How memory mapped files, filesystems and cloud storage works

Kelly discusses the challenges of memory-mapped files and cloud storage in response to a comment about space reservation in Voron. Cloud providers may allocate more space than needed, leading to unexpected charges and unreliable data handling. Testing reveals issues with sparse files and memory mapping in cloud scenarios, highlighting the importance of understanding storage limitations.

Implementing a file pager in Zig

Implementing a file pager in Zig involves delaying disk writes until a threshold is reached. Two eviction strategies include least recently used and least frequently used models. Prioritizing pages based on usage can help optimize performance.
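
The LRU half of that can be sketched with an ordered dict (an illustration of the eviction policy only — the real pager in the post also batches dirty-page writes, which is omitted here):

```python
from collections import OrderedDict

class LRUPageCache:
    """Least-recently-used eviction: pages are kept in access order and
    the coldest page is evicted when the cache exceeds capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # page_id -> data, oldest first

    def get(self, page_id):
        if page_id not in self.pages:
            return None                     # page fault: caller reads from disk
        self.pages.move_to_end(page_id)     # mark as most recently used
        return self.pages[page_id]

    def put(self, page_id, data):
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            evicted, _ = self.pages.popitem(last=False)  # drop the LRU page
            return evicted                  # caller flushes it if dirty
        return None
```

Swapping the policy for least-frequently-used would mean tracking an access count per page instead of relying on insertion order.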

Criticizing Hare language approach for generic data structures

The blog criticizes the Hare language approach for not providing generic data structures like hash maps in its standard library. It highlights the complexity and importance of hash tables in various programming languages and emphasizes the need for efficient data structures in modern programming ecosystems. The author disagrees with Hare's approach and stresses the significance of hash tables in software development.

spikedoanz/from-bits-to-intelligence: machine learning stack in under 100,000 lines of code

The text discusses building a machine learning stack in under 100,000 lines of code with hardware, software, tensors, and machine learning components. It outlines the required components like a CPU, GPU, storage, C compiler, Python runtime, operating system, and more. The goal is to simplify the machine learning stack while providing detailed steps for implementation in different programming languages.

One year of C

The author reflects on their year of writing C code, finding it enjoyable and productive. They emphasize the importance of choosing the right language for each problem and share insights on the benefits of using C over C++ in certain scenarios. Additionally, they discuss the advantages of C99 improvements and the simplified nature of writing C code compared to C++.

Heap Memory and Allocators

The text discusses different types of memory allocators in Zig programming language. It explains how memory allocation and deallocation work using alloc and free functions. Various allocator types like GeneralPurposeAllocator and FixedBufferAllocator are highlighted for managing memory efficiently.

Pointers

Pointers in Zig allow variables to reference memory addresses. Understanding pointers helps manipulate memory effectively. Pointers are values that store memory addresses and can be nested within structures.

Learning Zig - Pointers

Emulator 101

A detailed, step by step guide to writing an emulator

Data Compression Explained

Data compression involves modeling and coding to reduce the size of data files. Modern compressors typically use arithmetic coding for efficient compression. Algorithms like Huffman coding and run-length encoding are commonly used to achieve better compression results.
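
Run-length encoding is the simplest of those algorithms and fits in a few lines. A sketch (byte-oriented, with runs capped at 255 so the count fits in one byte; not from the linked document):

```python
def rle_encode(data: bytes) -> bytes:
    """Run-length encoding: each run becomes a (count, byte) pair."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1                       # extend the current run
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)
```

RLE only wins on data with long runs (it can double the size of run-free input), which is exactly why practical compressors pair a better model with arithmetic or Huffman coding.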

Twitter's Recommendation Algorithm

Twitter uses a recommendation algorithm to select the top tweets for users' timelines. The algorithm is based on core models and features that extract information from tweet, user, and engagement data. The recommendation pipeline consists of three main stages: candidate sourcing, ranking, and applying heuristics and filters. Twitter uses both in-network and out-of-network sources to find relevant tweets, and employs embedding spaces to determine content similarity. The final step involves blending tweets with other non-tweet content before sending them to users' devices. The goal of Twitter's open source endeavor is to provide transparency to users about how the recommendation system works.

Programming languages resources

This page is a collection of the author's favorite resources for people getting started writing programming languages. The resources cover various aspects such as compilers, runtimes, runtime optimization, pointer tagging, JIT compilers, assembler libraries, and interesting tools. The author also mentions topics they want to write about in the future and papers they want to read. The page is meant to be a helpful reference for those interested in programming language implementation.

3D Math Primer for Graphics and Game Development

The book "3D Math Primer for Graphics and Game Development" is available to read for free on the gamemath.com website. It includes information about GDC talks, FAQs, and resources for the first edition of the book. The first edition, published in 2002, is described as high tech, but the author recommends reading the second edition instead, which is also available for free.

Welcome to OpenGL

This text is about learning modern OpenGL through an online book that covers basic, intermediate, and advanced knowledge with clear examples and practical concepts. The content is freely available online and in print, with the aim of providing a complete and easy-to-understand platform for graphics programming enthusiasts. Readers will learn core graphics aspects, useful techniques, and even create a small game based on the obtained OpenGL knowledge.

WebGPU Fundamentals

The text provides a collection of articles to help beginners learn the basics of WebGPU, covering topics like fundamentals, 3D math, lighting techniques, and compute shaders. It also includes information on optional features, data memory layout, transparency, performance, and resources for further learning. Readers can explore various aspects of WebGPU, including how it works, 2D and 3D techniques, and essential concepts like uniforms, textures, and storage buffers.

An opinionated beginner’s guide to Haskell in mid-2019

This guide is for beginners in Haskell or those transitioning from similar languages, offering advice on learning resources and tools. It emphasizes the importance of writing Haskell code, getting help online, choosing popular platforms, and sticking to the default Prelude. The guide also touches on application architecture, using records, debugging techniques, and the experimental nature of Haskell as both a research and industrial language.

Are tagged unions overrated?

The author discusses the limitations of tagged unions and pattern matching in language development, suggesting that they are overrated for implementing language ASTs and IRs. Despite the benefits of tagged unions, the complexity they add may not always justify their use, especially in cases where simpler alternatives like class hierarchies can offer similar functionality. The post also highlights the potential for enhancing pattern-matching capabilities in mainstream languages to improve code readability and maintainability.

C++ Core Guidelines

These guidelines aim to simplify and improve the safety of C++ code by recommending specific extensions and best practices. They focus on static type safety, resource management, and reducing the likelihood of errors or accidents. By following these guidelines, programmers can write more correct, safer code without sacrificing performance.

What every systems programmer should know about concurrency

The document delves into the complexities of concurrency for systems programmers, explaining the challenges of running multithreaded programs where code is optimized and executed in unexpected sequences. It covers fundamental concepts like atomicity, enforcing order in multithreaded programs, and memory orderings. The text emphasizes the importance of understanding how hardware, compilers, programming languages, and applications interact to create a sense of order in multithreaded programs. Key topics include atomic operations, read-modify-write operations, compare-and-swap mechanisms, and memory barriers in weakly-ordered hardware architectures.

compiler_construction

Building a compiler can be straightforward by breaking the development into small steps and using Scheme as the implementation language. The tutorial focuses on translating a subset of Scheme to assembly code, with a step-by-step approach to achieve a fully working compiler. Testing and refining the compiler incrementally leads to a powerful tool capable of compiling an interactive evaluator.

How do we tell truths that might hurt?

The document discusses the challenges of telling unpleasant truths and the conflict that arises when sharing these truths in the field of Computing Science. The author argues that remaining silent about these truths compromises the intellectual integrity of the field. The document also lists a number of truths related to programming languages and the use of language in computing systems. The author questions whether the field should continue to ignore these truths and urges for a change in attitude.

The next fifty years

The text discusses the future of computing science over the next fifty years, emphasizing the importance of simplicity and elegance in design to prevent complexity. It highlights the close connection between program design and proof design, suggesting that advancements in program design can impact general mathematics. The author encourages embracing the opportunity to simplify processes and design systems that rely on formal mathematics.

Recommender Systems: A Primer

Personalized recommendations have become a common feature of modern online services, including most major e-commerce sites, media platforms and social networks. Today, due to their high practical relevance, research in the area of recommender systems is flourishing more than ever. However, with the new application scenarios of recommender systems that we observe today, constantly new challenges arise as well, both in terms of algorithmic requirements and with respect to the evaluation of such systems. In this paper, we first provide an overview of the traditional formulation of the recommendation problem. We then review the classical algorithmic paradigms for item retrieval and ranking and elaborate how such systems can be evaluated. Afterwards, we discuss a number of recent developments in recommender systems research, including research on session-based recommendation, biases in recommender systems, and questions regarding the impact and value of recommender systems in practice.

http client in the standard library · Issue #2007 · ziglang/zig

Issue #2007 discusses the implementation of an HTTP client in Zig's standard library. Contributors debate the necessity and scope of including an HTTP client, considering factors like complexity and resource allocation. Ultimately, the HTTP client implementation was completed and the issue closed as part of milestone 0.12.0.

Introduction to Compilers and Language Design

A compiler translates high-level code to lower-level code, and building one is a common project in computer science education. This book provides a beginner-friendly guide to building a compiler for a C-like language, suitable for undergraduates with programming experience. The author offers free online access to the textbook and related code resources, with options to purchase a physical copy.

Bare Metal Zig

The text discusses compiling a freestanding Zig binary to run on "bare metal" without relying on an operating system. It shows how to create a simple freestanding binary, make it multiboot compliant, and add custom console functionality for output. The process involves targeting specific architectures, handling linker warnings, and ultimately creating a bootable "kernel" to run on virtual machines like QEMU.

Comparing SIMD on x86-64 and arm64

The text compares SIMD implementations using SSE on x86-64 and Neon on arm64 processors, including emulating SSE on arm64 with Neon. It explores vectorized code performance using intrinsics, auto-vectorization, and ISPC, highlighting the efficiency of SSE and Neon implementations. The study shows how optimizing for SIMD instructions significantly boosts performance over scalar implementations in ray-box intersection tests.

Compiler Optimizations Are Hard Because They Forget

Compiler optimizations involve breaking down complex changes into smaller, more manageable steps to improve code efficiency. However, as more optimizations are added, the potential for errors and missed opportunities increases, making it challenging to maintain optimal performance. Compilers struggle with balancing aggressive optimizations while preserving correct program behavior, highlighting the complexity and difficulties inherent in optimizing compilers.

C Isn't A Programming Language Anymore

C is no longer just a programming language but a vital protocol for all languages. Parsing C headers is a complex task best left to C compilers. Maintaining ABI compatibility in C can be challenging and may require versioning schemes.

Writing a C Compiler, Part 1

This text is about creating a C compiler in multiple stages, starting with lexing, parsing, and code generation. The process involves breaking down the source code, building an abstract syntax tree, and generating x86 assembly code. The compiler will handle simple programs with a single main function and a return statement.

GitHub - DoctorWkt/acwj: A Compiler Writing Journey

This GitHub repository documents the author's journey to create a self-compiling compiler for a subset of the C language. The author shares steps taken and explanations to help others follow along practically. The author credits Nils M Holm's SubC compiler for inspiration and differentiates their code with separate licensing.

A new JIT engine for PHP-8.4/9

A new JIT engine for PHP is being developed, improving performance and simplifying development. The engine will be included in the next major PHP version, potentially PHP 9.0. The new JIT engine generates a single Intermediate Representation (IR), eliminating the need to support assembler code for different CPUs.

Unknown

Hardware prefetching in multicore processors can be too aggressive, wasting resources and impacting performance for co-running threads. Combining hardware and software prefetching can optimize performance by efficiently handling irregular memory accesses. A method described in Paper II offers a low-overhead framework for accurate software prefetching in applications with irregular access patterns.

Introduction 2016 NUMA Deep Dive Series

The 2016 NUMA Deep Dive Series by staroceans.org explores various aspects of computer architecture, focusing on NUMA systems and their optimization for performance. The series covers topics such as system architecture, cache coherency, memory optimization, and VMkernel constructs to help readers understand and improve their host design and management. The series aims to provide valuable insights for configuring and deploying dual socket systems using Intel Xeon processors, with a focus on enhancing overall platform performance.

von Neumann architecture - Wikipedia

The von Neumann architecture is a computer design with a processing unit, control unit, memory, and input/output mechanisms. It allows for instructions and data operations to be stored in memory, advancing computer technology from fixed-function machines like the ENIAC. This architecture was influenced by the work of Alan Turing and John von Neumann and has been widely used in the development of modern computers.

Compiling tree transforms to operate on packed representations

The article explains how tree traversals in programming can be optimized by compiling them to work on serialized tree structures without using pointers. This approach can make programs run significantly faster on current x86 architectures. The authors developed a prototype compiler for a functional language that generates efficient code for traversing trees using packed data representations.

Pipelines Support Vectorized, Point-Free, and Imperative Style

The text discusses how pipelines in the shell language support vectorized operations on collections and point-free style, where no data is explicitly mentioned. It also demonstrates how imperative code can be incorporated within pipelines for tasks like generating HTML tables. The unique features of pipelines include their ability to handle vectorized code, point-free composition, and integration of imperative instructions.

Entering text in the terminal is complicated

Entering text in the terminal can be challenging due to inconsistencies in how different programs handle text input. Some programs support basic features like arrow keys and history navigation, while others have custom input systems with advanced functionalities. Understanding the input mode of a program can help users navigate text editing more effectively in the terminal.

What happens when you start a process on Linux?

The process of starting a new program on Linux involves using the fork and exec system calls. Fork creates a clone of the current process, while exec replaces that clone with the new program to be executed. The new process inherits most attributes from its parent, with memory being shared through copy-on-write to optimize performance.

Debug your programs like they're closed source!

The author discusses debugging programs without looking at the source code by using system calls like open, execve, and write. System calls allow you to understand and monitor a program's behavior without needing access to its source code. By learning and utilizing system calls, you gain debugging superpowers that are platform-independent and useful for closed-source programs.

How I got better at debugging

Julia Evans shares her journey of improving her debugging skills through logical thinking, confidence, expanding knowledge, communication, and using tools like strace and tcpdump. By being systematic, confident, knowledgeable, and open to collaboration, she transformed debugging from a challenging task to an exciting learning opportunity. Her story emphasizes the importance of persistence, curiosity, and practical problem-solving in mastering the art of debugging.

Media Page Under Construction

Handmade Cities' media page is under construction, with some recordings missing. The videos from Handmade Boston 2023 have poor audio quality due to using a third-party A/V company. Freya's Masterclass footage was lost, and an abridged version will be shown at Dutch Game Day.

Infographics: Operation Costs in CPU Clock Cycles

The text discusses the operation costs in CPU clock cycles for different types of operations, including simple operations, floating-point operations, and vector operations. It highlights that simple register-to-register operations can take a single CPU cycle or less, while anything involving memory can cost orders of magnitude more. Costs also vary across CPU architectures, and some operations require specialized CPU support to work efficiently.

Handles are the better pointers

The text discusses using 'index-handles' instead of raw or smart pointers for memory management in C and C++. It suggests centralizing memory management into systems, grouping items into arrays, and converting handles to pointers only when necessary. By following specific rules, such as not storing pointers and using handle-to-pointer conversion, memory safety and efficient memory usage can be maintained.

You're Not Sick of Programming

Many people feel tired of programming and dream of quitting for a more fulfilling career, like farming or traveling. However, the real issue might be frustration with office politics, lack of product vision, and burnout rather than a true dislike of programming. Taking a break or addressing these underlying problems could help rediscover the creative potential of programming.

Zig Bare Metal Programming on STM32F103 — Booting up

The text explains how to program the STM32F103 microcontroller using the Zig programming language. It covers topics such as memory layout, linker scripts, and compiling code for embedded systems. By following the provided instructions, readers can successfully compile and run their first embedded program on the microcontroller.

OWASP Top Ten

The OWASP Top 10 is a guide for developers to understand critical security risks in web applications. Companies are encouraged to follow this document to improve the security of their web applications. The 2021 update includes new categories and ranking changes based on testing data and industry feedback.

Introduction

The OWASP Cheat Sheet Series offers valuable security information on application security topics. Created by experts, these concise cheat sheets aim to provide easy-to-read security guidance. You can download the cheat sheets from this site and stay updated through the ATOM feed.

The Copenhagen Book

The Copenhagen Book is a free and open-source guide for implementing auth in web applications. It is community-maintained and can be used alongside the OWASP Cheat Sheet Series. Suggestions or concerns can be addressed by opening a new issue.

Undefined Behavior deserves a better reputation

Undefined Behavior is often viewed negatively, but it can be a valuable tool for language designers. It allows programmers to convey insights to the compiler for optimizations. Responsible use of Undefined Behavior can enhance language design and code performance.

KHM+15

The text discusses a formal C memory model that supports integer-pointer casts, essential for low-level C programming. It proposes a quasi-concrete memory model that allows standard compiler optimizations while fully supporting integer-pointer casts. This model helps verify programs and optimizations that are challenging to validate with integer-pointer casts.

Learning LLVM (Part-1) - Writing a simple LLVM pass

This text introduces learning about LLVM and writing LLVM passes, which are used for transforming or analyzing a program's intermediate representation. LLVM offers a versatile compiler infrastructure with modules like the frontend, middle-end, and backend for optimizing and generating machine-specific code. By understanding LLVM concepts and pass managers, developers can create efficient passes for tasks like performance optimization and code analysis.

Some Were Meant for C

The document "Some Were Meant for C" explores the enduring significance of the C programming language, highlighting its dual role as both an application and systems programming language. It challenges common assumptions about C, emphasizing its unique communicative design that differs from managed languages. The document argues that C's explicit representations and memory access foster effective system-building and communication, making it a preferred choice for certain technical challenges. Additionally, it critiques the prevailing discourse that demonizes C, advocating for a nuanced understanding of its role in the programming landscape.

Xv6, a simple Unix-like teaching operating system

Xv6 is a teaching operating system developed by MIT for their operating systems course. It is based on Unix V6, written in ANSI C, and runs on Intel x86 machines. The xv6 source code is available on GitHub and is used in lectures to teach operating system concepts.

C Is Not a Low-level Language

C is often considered a low-level language, but this article argues that it is not. The author explains that vulnerabilities like Spectre and Meltdown occurred because processor architects were trying to build fast processors that exposed the same abstract machine as a PDP-11, which C programmers believe is close to the underlying hardware. However, the reality is that C code runs on a complex compiler that performs intricate transformations to achieve the desired performance. The article also discusses how C's memory model and optimizations make it difficult to understand and can lead to undefined behavior. The author suggests that instead of trying to make C code fast, it may be time to explore programming models on processors designed for speed.

Should you learn C to "learn how the computer works"?

The author discusses whether learning C is necessary to understand how computers work, ultimately concluding that C is not a direct representation of computer operations. Learning C can still be beneficial for understanding computing concepts and history, but it operates within a virtual machine and abstracts certain hardware details. By learning C, you can gain insight into the relationship between programming languages, hardware, and the historical development of computing.

A Guide to Undefined Behavior in C and C++, Part 1

The text explains that undefined behavior in C and C++ can lead to unpredictable program outcomes. Compilers may optimize code by exploiting undefined behavior, potentially causing programs to misbehave. It is important for programmers to understand how undefined behavior can impact program execution.

When Network is Faster than Cache

Firefox introduced a feature called RCWN to improve web performance by racing cached requests against the network. In some cases, the network can be faster than fetching data from the cache due to various factors like browser bugs and resource prioritization. Factors like device hardware and the total number of assets served from the cache impact cache retrieval performance significantly.

John Carmack on Functional Programming in C++

Functional programming in C++ can help in writing better software by making code easier to reason about and eliminating thread race conditions. Pure functions, which only rely on input parameters and produce consistent outputs, offer benefits such as thread safety and easier testing. Refactoring towards purity can improve code quality, even if full purity is not achieved, by disentangling computation from the environment it operates in.

Zig-style generics are not well-suited for most languages

Zig-style generics, like those in C++, may not work well for all languages due to limitations in compiler support and type inference. Armchair suggestions about adopting Zig-style generics in other languages may overlook these challenges. The flexibility and metaprogramming capabilities in Zig may not easily translate to other statically-typed languages.

WebGL2 vs WebGL1

WebGL is a 3D API that works as a rasterization engine, requiring users to provide code for rendering points, lines, and triangles. Users must create vertex and fragment shaders to control how WebGL processes and displays graphics. The WebGL API simplifies rendering by executing user-created functions to draw basic shapes like triangles.

WebGL How It Works

The text explains how WebGL processes vertices to create triangles and render them with pixels using shaders. Varyings are used to pass data from the vertex shader to the fragment shader for color interpolation. Buffers are essential for transferring vertex data to the GPU for rendering, and attribute locations are assigned to specify how to extract and use this data efficiently.

The_Night_Watch

The text discusses the importance of systems programmers in dealing with complex technical challenges, emphasizing their unique skills in debugging and problem-solving. It contrasts the roles of systems programmers with other computer professionals like GUI designers and PHP developers, highlighting the critical nature of systems programming in challenging scenarios. The text humorously portrays the intense and sometimes absurd experiences of systems programmers, showcasing their indispensable role in addressing technical issues efficiently and effectively.

FreeType

FreeType is a software library for rendering fonts, available for free. It is designed to be small, efficient, and capable of producing high-quality font images. Users can find installation instructions, documentation, and ways to communicate with the FreeType team on their website.

A Freestanding Rust Binary

To create a freestanding Rust executable for operating system development, we need to disable linking to the standard library and define our own entry point function. By compiling for a bare metal target like thumbv7em-none-eabihf, we can avoid linker errors and run Rust code without an underlying operating system. Additional linker arguments are required for specific operating systems like Linux, Windows, and macOS to resolve linker errors and build the freestanding Rust binary successfully.

Manually linking Rust binaries to support out-of-tree LLVM passes

LLVM is a compiler infrastructure used by frontends like rustc to generate machine code. To add custom LLVM passes to a Rust binary, extra flags can be used during compilation to produce LLVM-IR and then link the binary properly using LLVM tools. By understanding how Rust's static libraries work and leveraging cargo for dependency management, custom LLVM passes can be integrated into Rust binaries efficiently.

The Rust Reference

The Rust compiler can generate different types of output artifacts, such as runnable executables, Rust libraries, dynamic libraries, and static system libraries. Dependencies between crates can be linked in various formats, such as rlib and dynamic library formats, following specific rules set by the compiler. Understanding how to specify output formats like --crate-type=bin or --crate-type=lib can help control the compilation process for Rust crates, while also considering options for linking C runtimes dynamically or statically based on target features.

Rust Compiler Development Guide

The Rust compiler processes and transforms your code for compilation. It uses different stages like lexing, parsing, and abstract syntax tree lowering. The compiler aims for correctness, performance, and supporting incremental compilation.

How to speed up the Rust compiler one last time

The author at Mozilla is concluding their work on speeding up the Rust compiler after several years of dedicated effort. They wrote multiple blog posts detailing their performance optimizations and shared valuable lessons learned from the process. The author expressed gratitude to those who supported their work and highlighted the importance of ongoing contributions to Rust's development.

How to speed up the Rust compiler in March 2024

In March 2024, updates on the Rust compiler's performance highlighted several key improvements. Changes like using a single codegen unit, marking Debug::fmt methods with #[inline], introducing a cache, and upgrading LLVM versions led to notable reductions in wall-time, binary size, and hash table lookups. Additionally, the availability of the Cranelift codegen backend for x86-64/Linux and ARM/Linux offers an alternative for faster compile times. While the author didn't contribute to speed improvements this time, overall performance from August 2023 to March 2024 showed reductions in wall-time, peak memory usage, and binary size, indicating steady progress in enhancing the Rust compiler's efficiency.

Zig Bits 0x4: Building an HTTP client/server from scratch

The text explains how to create an HTTP client and server from scratch using Zig >=0.11. For the client, you need to set up requests, headers, and wait for responses. The server part involves defining functions to handle requests and running the server to accept connections.

Do We Really Need A Link Step?

The author questions the need for a link step in native-code compilation for faster performance. They propose a "zero-link" approach where compilers directly write object code into the final executable file. This method could improve efficiency by avoiding unnecessary object files and incorporating symbol resolution within the executable itself.

jamiebuilds/the-super-tiny-compiler: :snowman: Possibly the smallest compiler ever

The Super Tiny Compiler is a simplified example of a modern compiler using easy-to-read JavaScript. It helps you understand how compilers work from start to finish. Compilers play a big role in the tools we use daily.

5 Days to Virtualization: A Series on Hypervisor Development

A series on hypervisor development for Intel processors with virtualization support will be published next week, covering topics like setting up a test environment, driver skeleton creation, and multi-processor initialization. The series aims to aid new readers in building, testing, and understanding type-2 hypervisor development using C programming language. Recommended reading and detailed explanations will be provided to enhance knowledge and understanding of virtualization concepts.

In-depth analysis on Valorant’s Guarded Regions

The text discusses how Valorant's anti-cheat system, Vanguard, uses innovative techniques to protect against memory manipulation by whitelisting threads and creating shadow regions. These methods involve cloning and modifying the game's paging tables to allow access to hidden memory without affecting performance. By implementing these advanced security measures, Vanguard effectively prevents cheats from bypassing its guarded regions.

Exploit Development: No Code Execution? No Problem! Living The Age of VBS, HVCI, and Kernel CFG

The text discusses various techniques used in exploit development, particularly focusing on targeting the Windows kernel. It mentions concepts like Hypervisor-Protected Code Integrity (HVCI) and how exploits can manipulate memory to execute attacker-controlled code in kernel mode. The text also delves into details like leaking kernel-mode memory, constructing ROP chains on the kernel-mode stack, and utilizing functions like NtQuerySystemInformation to escalate privileges and perform malicious actions in the system.

CheerpX versus WebContainers

CheerpX is a client-side virtualization technology for running x86 executables and operating systems in the browser without modifications or recompilation. It offers cost-effective, secure, and private execution of native code, making it suitable for various web-based applications. CheerpX stands out from other solutions by supporting any x86 executable and providing a robust two-tier emulator for efficient code execution.

Creating a Rootkit to Learn C

The text demonstrates creating a userland rootkit in C to hide malicious activities like network connections and files. By hooking into system calls like access() and write(), the rootkit can manipulate userland programs and evade detection by tools like netstat. The rootkit uses shared library injections and hooks to intercept and manipulate system calls, showcasing the power of C for malicious activities.

Picsart-AI-Research/LIVE-Layerwise-Image-Vectorization: [CVPR 2022 Oral] Towards Layer-wise Image Vectorization

The text discusses a new method called LIVE for generating SVG images layer by layer to fit raster images. LIVE uses closed bezier paths to learn visual concepts in a recursive manner. Installation instructions and references for the method are provided in the text.

Udacity CS344: Intro to Parallel Programming

Intro to Parallel Programming is a free online course by NVIDIA and Udacity teaching parallel computing with CUDA. It's for developers, scientists, engineers, and students looking to learn about GPU programming and optimization. The course is self-paced, requires C programming knowledge, and offers approximately 21 hours of content.

CS 361: Systems Programming

The Systems Programming course at UIC includes assigned readings, video lectures, labs, and quizzes scheduled throughout the week. Students can access additional resources and submit assignments through the course gradescope page. Office hours, content quizzes, discussions, and exams are held on specific days via Zoom and YouTube.

Resolving Rust Symbols

Linking combines object files into an executable or shared library in Rust. The linker resolves symbols and dependencies between object files. Rust prefers static linking to create a single distributable binary with all dependencies included.

When FFI Function Calls Beat Native C

David Yu performed a benchmark comparing different Foreign Function Interfaces (FFI) for function calls. LuaJIT's FFI was found to be faster than native C function calls due to efficient dynamic function call handling. Direct function calls, like those used by LuaJIT, can outperform indirect calls routed through a Procedure Linkage Table (PLT).

Cap'n Proto, FlatBuffers, and SBE

FlatBuffers is a new serialization protocol released by Google engineers, similar to Cap’n Proto. Cap’n Proto allows random access using pointers, while FlatBuffers uses offsets stored in tables for random access. Protobufs, Cap’n Proto, and FlatBuffers have custom schema languages and different features for data serialization and access.

A Database Without Dynamic Memory Allocation

TigerBeetle, a database written in Zig, does not allocate memory dynamically after startup. It uses static memory allocation for all data structures, avoiding performance issues and use-after-free bugs. This approach allows for better predictability, easier handling of overload, and efficient resource management.

Wizard Zines Collection!

Julia offers programming zines with black and white covers for free and colored covers for purchase. The zines can be bought individually for $10-$12 each or as a whole collection. Additionally, there are free posters and a weekly comic subscription available.

Problems of C, and how Zig addresses them

This blog post discusses issues with C and how Zig addresses them through features like comptime evaluations and improved memory management. Zig offers solutions like error handling improvements and treating everything as an expression, making it a modern alternative to C with enhanced functionalities. The comparison highlights Zig's advantages in areas such as memory management, error handling, and expressive coding practices.

How to use hash map contexts to save memory when doing a string table

The text explains how to save memory when building a string table using hash map contexts. By adapting context APIs, only indexes are stored in the table, reducing memory usage. This method can save 117 KB of memory for a string table with 10,000 entries.
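The idea generalizes beyond Zig. As a rough Python sketch (not the article's code): keep every string in one append-only byte buffer and have the lookup table store only integer offsets, with hashing and equality reading back through the buffer — the role Zig's hash map contexts play.

```python
class StringTable:
    """Intern strings into one byte buffer; the table stores only integer
    offsets, never the strings themselves (illustrative sketch)."""

    def __init__(self):
        self.buf = bytearray()   # all string bytes, NUL-terminated
        self.index = {}          # hash -> list of offsets (open chaining)

    def _read(self, off):
        # Recover a string from its offset; strings end at the next NUL.
        end = self.buf.index(0, off)
        return bytes(self.buf[off:end])

    def intern(self, s: bytes) -> int:
        h = hash(s)
        for off in self.index.get(h, []):
            if self._read(off) == s:      # eql: compare through the buffer
                return off
        off = len(self.buf)
        self.buf += s + b"\x00"
        self.index.setdefault(h, []).append(off)  # store only the offset
        return off
```

Interning the same string twice returns the same offset, so duplicate keys cost nothing beyond the index entry.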

resume.txt

Andrew Kelley is a programmer with 16 years of experience in software development and a passion for open-source projects. He has worked on various music-related software like the Genesis DAW and libgroove, contributing patches to libav and ffmpeg. Additionally, he has experience in low-level systems, custom algorithm creation, and designing user interfaces.

Leslie Lamport

Leslie Lamport wrote several papers on verifying and specifying concurrent systems using TLA. He discovered algorithms through formal derivation and emphasized mechanical verification of concurrent algorithms. His work influenced the development of the TLAPS proof system.

Indices and tables

CompilerGym is a library for reinforcement learning in compiler tasks. It helps ML researchers work on optimization problems and allows system developers to create new tasks for ML research. The goal is to use ML to make compilers faster.

LLM Compiler

The LLM Compiler is a suite of pre-trained models designed for code optimization tasks, based on Code Llama. It has been trained on a large corpus of LLVM-IR and assembly code to enhance compiler behavior understanding. The release of LLM Compiler aims to support further research in compiler optimization for both academia and industry.

Bare Bones

This text explains how to start operating system development by setting up a cross-compiler and building on existing technology. It guides you through writing a minimal kernel in C or C++, booting it with a bootloader, and linking the kernel for x86 systems. Following these steps ensures your operating system can be loaded and executed correctly.

The Graphics Codex

"The Graphics Codex" is a comprehensive resource for computer graphics, offering essential information on 3D rendering and shading. It includes equations, diagrams, and programming projects, with free updates every month. Written by expert Morgan McGuire, it is a valuable tool for learning and reference in the field of computer graphics.

Notes on partial borrows

The text discusses limitations of the Rust borrow checker and proposes solutions for creating references that borrow from specific subsets of a type. Two approaches, "View types" and "Reference views," are explored to address these limitations and provide more flexibility in borrowing subsets of fields with different lifetimes and mutability. The discussion includes examples, subtyping implications, monomorphization considerations, and the need to update Rust's aliasing model to accommodate view references accessing discontiguous memory regions.

Dioxus Labs + “High-level Rust”

An article criticized Rust's gamedev hype, but its popularity stems from meeting modern programming needs like speed and safety. Efforts are underway to enhance Rust's capabilities for various industries and improve compile times significantly. Proposed enhancements include incremental linking, parallel frontend, and macro expansion caching to make Rust more efficient for developers.

Compile-Time Configuration For Zig Libraries

To expose compile-time configuration options in Zig libraries, developers can use global declarations in the root source file or through Zig's build system. By setting configuration flags, developers can customize behavior such as enabling or disabling assertions in library code. Compile-time configuration can improve performance by allowing certain checks to be done at compile-time rather than runtime.

Generics

Generics in Zig allow for creating data structures and algorithms that can work with different types. By using generics, code can be written once and reused with various data types. Zig's approach to generics involves leveraging compile-time metaprogramming capabilities.

Zig's HashMap - Part 1

Zig's std.HashMap implementation relies on two key functions: hash and eql. The documentation outlines various hash map types and their functionalities, including std.HashMapUnmanaged. AutoHashMap can automatically generate hash functions, but there are limitations, and custom contexts can be provided for more complex keys.
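A custom context is essentially a hash and an eql function supplied alongside the key type. A hypothetical Python analogue wraps the key in a type that carries its own hashing and equality rules — here, ASCII case-insensitive lookup:

```python
class CaseInsensitiveKey:
    """Analogue of a custom hash-map context: the key carries its own
    hash and eql rules (here: case-insensitive comparison)."""

    def __init__(self, s: str):
        self.s = s

    def __hash__(self):
        return hash(self.s.lower())               # plays the role of hash()

    def __eq__(self, other):
        return self.s.lower() == other.s.lower()  # plays the role of eql()

# Lookups now ignore case, without normalizing the stored keys.
headers = {CaseInsensitiveKey("Content-Type"): "text/html"}
```

This is the same division of labor as Zig's contexts: the map's storage and probing stay generic while the context decides what "same key" means.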

Zig Parser

The Zig Parser is a crucial part of the Zig compiler internals, responsible for constructing an abstract syntax tree from a stream of tokens. The parser uses a struct called Parser to manage the internal state of the parse operation, accumulating errors and building up AST nodes. Understanding the structure of an AST node and the data pattern is essential for comprehending how the parser works and the subsequent stages of the compiler. The AST node data is stored in various locations such as the token stream, the node list, and the extra data list, with specific structures and indexes used to access information about AST nodes like function declarations and prototypes.

Causal ordering

Causal ordering is essential for understanding distributed systems, where events may not have a clear time order. This concept helps determine the causal relationship between events in a system. It enables reasoning about causality, leading to simpler solutions in distributed computing.
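One standard way to make the causal relationship concrete is a vector clock: event a happened-before event b iff a's clock is less than or equal to b's component-wise and the two differ; otherwise the events are concurrent. A small sketch (illustrative, not from the article):

```python
def happens_before(a: dict, b: dict) -> bool:
    """Vector-clock test: a -> b iff a <= b component-wise and a != b."""
    keys = set(a) | set(b)
    return all(a.get(k, 0) <= b.get(k, 0) for k in keys) and a != b

# p1 does an event, sends a message; p2 receives it and does an event;
# p3 acts independently and is concurrent with both.
e1 = {"p1": 1}
e2 = {"p1": 1, "p2": 1}
e3 = {"p3": 1}
```

If neither `happens_before(x, y)` nor `happens_before(y, x)` holds, the events are concurrent and no time order between them is meaningful.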

Assorted thoughts on zig (and rust)

Zig is simpler than Rust and offers similar features through compile-time execution. Rust provides strong type safety guarantees for generic functions, while Zig lacks automatic type constraint documentation and may face challenges with IDE support. Zig excels in custom allocators and handling out-of-memory errors, while Rust excels in preventing memory leaks and resource management.

Columnar kernels in go?

Over the winter I'm going to be adding a columnar query engine to an existing system written in Go.

An opinionated map of incremental and streaming systems

The text discusses various design choices and characteristics of incremental and streaming systems. It highlights the core idea of these systems, which is to process inputs to generate outputs efficiently. The systems are categorized based on unstructured vs structured design, high temporal locality vs low temporal locality workloads, internal consistency vs internal inconsistency, and eager vs lazy computation approaches. The text explains the advantages and disadvantages of each design choice and provides examples of systems that fall into different categories. Additionally, it emphasizes the importance of understanding these design choices in selecting the appropriate system for specific workloads.

Internal consistency in streaming systems

The text discusses the importance of internal consistency in streaming systems. It explains how eventual consistency can lead to incorrect outputs and the need for systems to wait for all relevant inputs before emitting results. Maintaining internal consistency ensures correct outputs and prevents confusion between changes and corrections.

Pain we forgot

The text discusses the challenges in programming and the need for more user-friendly tools. It emphasizes the importance of improving feedback loops, running code smoothly, and creating more helpful programming environments. The author suggests rethinking traditional tools and approaches to make programming more accessible and efficient.

The shape of data

The text discusses the importance of having a clear and consistent data notation in programming languages like Clojure. It emphasizes the advantages of a notation that closely aligns with the in-memory representation of data, making it easier for developers to work with and understand data structures. Additionally, it suggests that a well-designed data model and notation are crucial for efficient data manipulation and code analysis.

Reflections on a decade of coding

The author reflects on 12 years of coding experience, sharing recent projects and personal growth insights. They highlight the importance of gradual improvements in habits and processes over innate talent. The author identifies areas of progress, like writing efficient code and managing emotions, while acknowledging gaps in experience in maintaining large codebases and teamwork.

Prospecting for Hash Functions

The text discusses the process of designing non-cryptographic integer hash functions, exploring different operations and constraints to create effective hash functions. It also compares various 32-bit hash functions and their bias levels, highlighting the search for high-quality hash functions with minimal bias for both 32-bit and 64-bit integers.
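The functions explored in the article are built from alternating xorshift and multiply steps over 32-bit integers. MurmurHash3's well-known finalizer is one instance of this shape (shown here for illustration; it is not the article's lowest-bias result):

```python
MASK = 0xFFFFFFFF  # keep arithmetic in 32 bits, as C's uint32_t would

def fmix32(h: int) -> int:
    """MurmurHash3's 32-bit finalizer: xorshift / multiply / xorshift /
    multiply / xorshift -- the construction space the article searches."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h
```

The shifts spread high bits into low positions and the odd multiplicative constants spread low bits upward; bias is measured by how far each input bit flip is from flipping each output bit with probability 1/2.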

The Missing Zig Polymorphism / Runtime Dispatch Reference

The text discusses how Zig lacks built-in polymorphism features like interfaces or virtual methods. It explores creating polymorphism using existing language features in Zig. The author provides a detailed guide on implementing polymorphism in Zig, focusing on dynamic dispatch using function pointers.

Nanosystems

This text is about a book called "Nanosystems" by K. Eric Drexler, which is considered groundbreaking in the field of molecular nanotechnology. The book explains how to create manufacturing systems at the molecular level and discusses the significant impact nanotechnology will have on various industries. Experts praise the book for providing a foundation for future research in molecular systems engineering and molecular manufacturing.

How To Become A Hacker

The text explains what it means to be a hacker, focusing on problem-solving, creativity, and a willingness to share knowledge within the hacker culture. It emphasizes the importance of developing a hacker mindset, skills, and dedication through self-education and a passion for solving new problems. The hacker culture values intelligence, hard work, and a sense of community, with an emphasis on learning and sharing information to advance the collective knowledge of hackers.

the rr debugging experience

rr is a debugging tool for Linux that records failures for deterministic replay under gdb. It helps debug real applications efficiently and supports reverse execution for finding bugs. rr aims to make debugging easier with low overhead and powerful features like hardware data watchpoints.

Text Buffer Reimplementation

The Visual Studio Code 1.21 release includes a new text buffer implementation that improves performance in terms of speed and memory usage. The previous implementation used an array of lines, but it had limitations such as high memory usage and slow file opening times. The new implementation uses a piece table data structure, which allows for better memory usage and faster line look-up. Additionally, the implementation uses techniques such as caching for faster line lookup and a balanced binary tree for efficient searching. Benchmarks showed that the new implementation outperformed the previous line array implementation in terms of memory usage, file opening times, and reading operations.
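A piece table represents the document as a sequence of (buffer, start, length) pieces over two append-only buffers, so edits never move existing text. A minimal Python sketch of insertion (illustrative only; VS Code's implementation adds a balanced tree and line caches on top of this):

```python
class PieceTable:
    """Minimal piece table: an 'orig' buffer holds the loaded file,
    an 'add' buffer collects every insertion."""

    def __init__(self, text: str):
        self.original = text
        self.add = ""
        self.pieces = [("orig", 0, len(text))] if text else []

    def _buf(self, which: str) -> str:
        return self.original if which == "orig" else self.add

    def insert(self, pos: int, text: str):
        new_piece = ("add", len(self.add), len(text))
        self.add += text                      # append-only: nothing moves
        out, offset, placed = [], 0, False
        for which, start, length in self.pieces:
            if not placed and offset <= pos <= offset + length:
                split = pos - offset          # split the piece at pos
                if split:
                    out.append((which, start, split))
                out.append(new_piece)
                if length - split:
                    out.append((which, start + split, length - split))
                placed = True
            else:
                out.append((which, start, length))
            offset += length
        if not placed:                        # insert at end of document
            out.append(new_piece)
        self.pieces = out

    def text(self) -> str:
        return "".join(self._buf(w)[s:s + n] for w, s, n in self.pieces)
```

Each insert costs one piece split rather than shifting the rest of the file, which is where the speed and memory wins over a line array come from.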

What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?

Rust and C++ offer "zero-cost abstractions" where high-level code compiles to low-level code without added runtime overhead, but enabling necessary compiler optimizations can slow down compilation and impact debugging. The challenge is to find the minimal set of optimizations that maintain zero-cost abstractions while improving build speed and debug information quality. Balancing fast debuggable builds with zero-cost abstractions is crucial for performance and developer experience in languages like Rust and C++.

Using ASCII waveforms to test hardware designs

Using expect tests automates the validation of code output, detecting errors efficiently. Jane Street uses Hardcaml in OCaml for hardware development, simplifying testbench creation. Waveform expect tests help visualize hardware behavior, improving development workflows.

Rust 2019 and beyond: limits to (some) growth.

The text discusses the need for controls and policies to manage the growth limits of technical artifacts and the strains on individuals in the Rust project. It emphasizes the importance of acknowledging and addressing these limits to prevent potential crises or dysfunction in the future. The author suggests implementing controls, such as hard limits and moderation strategies, to maintain a healthy and sustainable project environment.

Your ABI is Probably Wrong

The text discusses how most ABIs have a design flaw that harms performance by passing large structures inefficiently. Different ABIs handle passing large structures differently, but they all repeat the same mistakes. A correctly-specified ABI should pass large structures by immutable reference to avoid unnecessary copies.

GitHub - sirupsen/napkin-math: Techniques and numbers for estimating system's performance from first-principles

The project "Napkin Math" aims to provide resources and techniques to estimate system performance quickly and accurately. It includes examples like estimating memory reading speed and storage costs for applications. The best way to learn this skill is through practical application, with the option to subscribe for regular practice problems. Detailed numbers and cost estimates are provided, along with compression ratios and techniques to simplify calculations. The project encourages user participation to enhance and refine the provided data and tools for napkin math calculations.

Don't write bugs

Effective programmers should focus on preventing bugs rather than debugging them. Re-reading code frequently can help reduce the number of errors. Writing bug-free code is achievable with practice and attention to detail.

technicalities: "not rocket science" (the story of monotone and bors)

The text discusses the development of a program called bors that enforces the "Not Rocket Science Rule" of maintaining a code repository that always passes tests. Bors automates integration testing and ensures code changes are only merged if they pass tests, preventing broken code from being merged. This system has been found to be extremely beneficial for software projects, ensuring a stable and reliable codebase.

Why is Python slow

Python's performance issues stem from spending most time in the C runtime, rather than the Python code itself. Pyston focuses on speeding up the C code to improve performance. Suggestions to improve Python's speed by using other JIT techniques overlook the fundamental issue of optimizing C code.

Design duality and the expression problem

The text discusses the concept of design duality in programming, focusing on the trade-offs between objects and data representations. It highlights the importance of making conscious design choices when introducing new types, whether as data, objects with extensible implementations, or abstract data types with restricted extensibility. The author emphasizes the need for programming languages to better support and encourage these design considerations.

Random Thoughts On Rust: crates.io And IDEs

The author shares experiences with Rust, praising cargo and crates.io for easy code distribution. They highlight the need for improved library discovery on crates.io and discuss the potential for better IDE support in Rust projects. Despite challenges like type inference, Rust's design enables advanced IDE features that can enhance coding efficiency.

John Carmack on Inlined Code

Consider inlining functions that are only called in one place for efficiency. Simplify code structure to reduce bugs and improve performance. Emphasize consistent execution paths over avoiding minor optimizations.

A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World

The text discusses the development and commercialization of a bug-finding tool that can identify errors in large amounts of code. It highlights the challenges faced in finding and addressing various types of bugs, such as memory corruption and data races, across different programming systems. The tool's effectiveness in uncovering bugs in complex codebases emphasizes the importance of bug detection for improving software quality.

What is Systems Programming, Really?

The term "systems programming" combines low-level programming and systems design. It involves creating and managing complex components, often focusing on machine implementation details. Over time, the distinction between systems programming and other programming languages has become less clear.

Mitchell Hashimoto

Mitchell Hashimoto is an advisor at Polar and shares insights on technical projects, Zig programming, and automation on his website. He discusses various topics like GitHub pull requests, the Zig build system, and AI's growth viewed through a cloud-computing lens. Mitchell's writing covers a range of technical subjects and his experiences in the startup world.

UB Might Be a Wrong Term for Newer Languages Apr 2, 2023

The author suggests that using the term "undefined behavior" in newer languages like Zig and Rust may not be the best choice due to differences in semantics. In C, implementations can define some behaviors left undefined by the standard, but in Rust and Zig, any program showing undefined behavior is considered invalid. The author proposes using terms like "non-trapping programming error" or "invalid behavior" to better convey the intended semantics in these languages.

What Every C Programmer Should Know About Undefined Behavior #1/3

This blog post explains that many seemingly reasonable things in C actually have undefined behavior, leading to common bugs in programs. Undefined behavior in C allows for optimizations that improve performance but can result in unexpected outcomes like formatting your hard drive. Understanding undefined behavior is crucial for C programmers to prevent potential issues and improve code efficiency.

The Rustonomicon

The Rustonomicon is a book for understanding Unsafe Rust programming details. It complements The Rust Programming Language by delving into combining language pieces and potential issues. The book covers topics like (un)safety, creating safe abstractions with unsafe primitives, and working with memory, but does not provide exhaustive API details.

chrono-Compatible Low-Level Date Algorithms

The text explains algorithms for handling dates and determining leap years. It includes functions for calculating the last day of a month and converting dates between different calendar systems. The algorithms are designed to be efficient and accurate for various date calculations.
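The building blocks are small and exact. For example, leap-year and last-day-of-month logic (a sketch in Python; the article's code is chrono-compatible C++):

```python
def is_leap(y: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

def last_day_of_month(y: int, m: int) -> int:
    """Month lengths are fixed except February, which depends on the year."""
    days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return 29 if m == 2 and is_leap(y) else days[m - 1]
```

Note the century exception: 1900 was not a leap year, but 2000 was.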

So Many New Systems Programming Languages II

The text discusses new systems programming languages like Rust, Zig, and Odin, highlighting their safety features and syntax. These languages offer improved memory management and safety compared to older languages like C and C++. Rust, in particular, stands out for its memory safety, threading support, and borrow checker.

zackoverflow

Zack, the author, enjoys building things and delving into the inner workings of systems and computers for dopamine. He works on the Bun JavaScript runtime and creates music when not coding. Zack invites anyone to chat through his open calendar link.

From Theory To Implementation

Physically Based Rendering is a widely-used textbook in computer graphics that combines theory with practical implementation for creating realistic images. The book, authored by industry experts, offers cutting-edge algorithms and ideas, including GPU ray tracing, to help readers design advanced rendering systems. Both the third and fourth editions of the book are available online for free.

Ray Tracing in One Weekend

"Ray Tracing in One Weekend" introduces readers to the concept of ray tracing through a step-by-step guide to creating a ray tracer that produces images. The document covers topics such as sending rays into the scene, ray-sphere intersection, shading, and reflection. It explains the mathematical aspects behind ray tracing, including formulas for sphere intersections and normal vectors. The guide progresses from creating a simple image of a sphere to more complex scenes, providing insights into the coding process and considerations for optimizing the rendering process.
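The core ray-sphere test reduces to a quadratic: substituting the ray O + tD into |P - C|² = r² and solving for t via the discriminant. A Python sketch of that step:

```python
import math

def hit_sphere(center, radius, origin, direction):
    """Solve |O + tD - C|^2 = r^2 for t; return the nearest t >= 0, or None.
    A negative discriminant means the ray misses the sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / (2 * a)   # smaller root = nearer hit
    return t if t >= 0 else None
```

From the hit point, the surface normal is just (P - C) / r, which is what the shading steps in the guide build on.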

Tree-Structured Concurrency — 2023-07-01

Structured concurrency is a programming concept that ensures clear control flow in concurrent programs. In the context of async Rust, it guarantees properties like cancellation propagation, which means that dropping a future will also cancel all nested futures. The text discusses examples of unstructured and structured concurrency patterns, emphasizing the importance of applying structured concurrency to improve program correctness and maintainability. It also mentions the need for more API support to fully achieve structured concurrency in async Rust, suggesting practical approaches like using task queues or adopting the smol model for task spawning. Overall, structured concurrency provides a way to reason about async Rust programs effectively and enhance their reliability.

BSTJ 57: 6. July-August 1978: The UNIX Time-Sharing System. (Ritchie, D.M.; Thompson, K.)

The UNIX Time-Sharing System is a versatile operating system with unique features. It runs on Digital Equipment Corporation computers and emphasizes simplicity and ease of use. UNIX has been widely adopted for research, education, and document preparation purposes.

Mapping the whole internet with Hilbert curves

The author mapped the internet using Hilbert curves to visualize IP addresses. The curves help display the vast network structure in a more comprehensible way. The scan revealed interesting patterns and changes in IP address allocations over time.

xorvoid

Anthony Bonkoski, a computer enthusiast and engineer, shares his experiences in programming and working in quantitative finance. He enjoys working on various projects and has expertise in low-level programming, distributed systems, and reverse-engineering. Currently taking a break from full-time work, he is open to part-time consulting projects and enjoys writing and exploring new interests.

You own your data, in spite of the cloud

The text discusses the benefits of local-first software, emphasizing ownership and control of data while also enabling seamless collaboration. It compares traditional cloud apps with new approaches that prioritize user ownership and real-time collaboration. The focus is on developing software that combines the convenience of cloud apps with the data ownership of traditional software.

Writing CUDA Kernels for PyTorch

The text shows the thread distribution on different streaming multiprocessors (SM) in CUDA. Threads are organized into warps, lanes, and specific thread numbers within each SM. This information is crucial for optimizing CUDA kernels in PyTorch.

999 crates of Rust on the wall

The author compared popular crates on crates.io to their upstream repositories to improve supply chain security. Most top crates matched their repositories, but some had issues like missing VCS info or build failures. Future work includes extending this analysis to all crates on crates.io and improving publishing processes for better security.

Uiuisms

This text provides a list of Uiua functions for solving common problems. Contributors can add more functions to the list in the repository. Functions include splitting arrays, removing rows, upscaling matrices, and working with diagonal arrays.

Arithmetic functions

BQN's arithmetic functions mirror mathematical notation and apply element-wise to arrays. BQN supports basic arithmetic operations like addition, subtraction, multiplication, division, exponentiation, and root functions. Character arithmetic is a distinctive feature allowing manipulation of characters with symbols like + and -.

An interactive study of queueing strategies

This text explores different queueing strategies for handling requests, emphasizing the importance of prioritizing requests effectively to prevent dropping important ones. It introduces concepts like FIFO and priority queues, as well as active queue management techniques to optimize request processing. Understanding these strategies can help in efficiently managing queues and improving overall system performance.
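A minimal sketch of the priority idea (illustrative, not the article's code): serve the most important waiting request first, and when the queue is full, evict the least important queued entry rather than blindly dropping the newcomer.

```python
import heapq

class BoundedPriorityQueue:
    """Bounded queue that serves high-priority requests first and sheds
    the lowest-priority work when full (larger number = more important)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []  # min-heap on priority: items[0] is least important

    def offer(self, priority: int, item) -> bool:
        if len(self.items) < self.capacity:
            heapq.heappush(self.items, (priority, item))
            return True
        if self.items and self.items[0][0] < priority:
            heapq.heapreplace(self.items, (priority, item))  # evict lowest
            return True
        return False  # newcomer is the least important: drop it

    def take(self):
        best = max(self.items)             # serve most important first
        self.items.remove(best)
        heapq.heapify(self.items)
        return best[1]
```

Compare with plain FIFO tail drop, where a burst of low-value requests can crowd out the one request you most needed to handle.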

ethereumbook/04keys-addresses.asciidoc at develop · ethereumbook/ethereumbook · GitHub

This chapter introduces public key cryptography used in Ethereum for securing ownership of funds through private keys and addresses. Public keys are derived from private keys and are represented as points on an elliptic curve. Ethereum addresses are unique identifiers generated from public keys using the Keccak-256 hash function.

Accidentally Turing-Complete

The document "Accidentally Turing-Complete" explores various unexpected systems and technologies that unintentionally exhibit Turing completeness, a property that allows them to perform any computation. Examples include C++ templates, TypeScript, Java generics, X86 mov instructions, Magic: The Gathering card game, HTML5, Minecraft, Dwarf Fortress game, SQL, Apache Rewrite Rules, Pokemon Yellow game, Scala type system, MediaWiki templates, Little Big Planet game, Sendmail, Vim Normal-Mode, Border Gateway Protocol (BGP), Excel, Super Mario World glitches, PowerPoint, Font Shaping, JBIG2 Image Compression, and Stupid RDMA NICs. The document showcases how these diverse systems, from games to internet protocols, can unexpectedly demonstrate the computational power of Turing completeness.

Problems with BQN

BQN has issues with incoherent monad-dyad pairs and train structures, making code readability and implementation challenging. Modifications like the Constant modifier ˙ attempt to address these challenges. However, there are still limitations in tacit code construction and array reductions that impact the language's usability.

Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model

The paper presents a simple and effective denoising-diffusion model called Iterative α-(de)Blending. It offers a user-friendly alternative to complex theories, making it accessible with basic calculus and probability knowledge. By iteratively blending and deblending samples, the model converges to a deterministic mapping, showing promising results in computer graphics applications.

The borrow checker within

The text discusses improvements to Rust's borrow checker to align better with its core design ethos of mutation xor sharing. These changes aim to make Rust code patterns feel more intuitive and work seamlessly with the borrow checker's rules. The proposed enhancements include features like conditional return references, view types, and addressing phased initialization issues.

How should I read type system notation?

A type system in programming languages follows rules for expressions and types. Typing rules are written as relationships between expressions and their types for checking and inferring types. Contexts are used to keep track of variable types in type judgments.

Writing a Simple Garbage Collector in C

Summary: The text explains how to implement a simple garbage collector in C by writing a memory allocator function that manages free and used memory blocks using linked lists. The garbage collection algorithm involves scanning memory regions to mark blocks in use and free those not in use. The collector function collects unused memory blocks, making the heap scanning code simpler and faster.
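The mark phase is a graph traversal from the roots; the sweep keeps what was marked and frees the rest. A language-neutral sketch in Python (the article implements this in C by walking malloc's block headers and scanning the stack and data segments for root pointers):

```python
class Obj:
    """A heap object with outgoing references and a mark bit."""
    def __init__(self, name: str):
        self.name = name
        self.refs = []        # pointers to other objects
        self.marked = False

def collect(heap, roots):
    """Mark everything reachable from the roots, then sweep the rest."""
    stack = list(roots)
    while stack:              # mark: depth-first over the object graph
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)
    live = [o for o in heap if o.marked]   # sweep: unmarked = garbage
    for o in live:
        o.marked = False      # reset mark bits for the next cycle
    return live
```

Anything not reachable from a root, even if it still references other objects, is reclaimed, which is what distinguishes tracing collection from reference counting.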

A decade of developing a programming language

The author spent a decade developing the programming language Inko, transitioning from gradual to static typing and using Rust for the compiler. Recommendations include avoiding gradual typing, self-hosting compilers, and focusing on functionality over performance when building a new language. Building a language for long-term use is a time-consuming process that requires prioritizing user needs over technical complexities.

The Rust I Wanted Had No Future

The author preferred certain design choices in early Rust over the current state, such as the treatment of certain language features and performance considerations. They express a desire for a simpler, less performance-focused language with different priorities than those commonly held in the Rust community. The author reflects on their preferences for language design and the trade-offs they would have made for a more straightforward and expressive programming experience.

The Garbage Collection Handbook

The Garbage Collection Handbook is a comprehensive guide on automatic memory management, covering modern techniques and challenges faced by programmers. This second edition updates the handbook with insights from over 60 years of research and development in the field. It is essential reading for programmers looking to understand and navigate the complexities of garbage collection in modern programming languages.

PRACTICAL COMPILER CONSTRUCTION

"Practical Compiler Construction" is a textbook on writing compilers with annotated source code. The second edition is now available in print with improvements and bug fixes. The book covers compiler construction concepts and advanced techniques for optimizing code.

A Distributed Systems Reading List

This reading list covers materials for understanding distributed systems design and challenges. It includes resources on topics like latency, Amazon's organizational culture, Google's cutting-edge technologies, consistency models, theory, languages, tools, infrastructure, storage, Paxos consensus, and gossip protocols. The list aims to help readers adapt their thinking to effectively tackle distributed system complexities.

An Introduction to Assembly Programming with RISC-V

This text is an introduction to assembly programming with RISC-V, published at riscv-programming.org (ISBN 978-65-00-15811-3).

MLIR — Getting Started

A getting-started guide to MLIR from Jeremy Kun's Math ∩ Programming blog (jeremykun.com).

Chapter 2 Basics of SIMD Programming

The text explains how to organize data for SIMD operations and provides examples of SIMD-Ready Vectors. It also discusses the relationship between vectors and scalars in SIMD programming. Built-in functions for VMX instructions and SIMD operation principles are outlined in the text.

Matrix multiplication in Mojo

A guide to matrix multiplication in Mojo, published by Modular at docs.modular.com.

Matrix Multiplication on CPU

An article on optimizing matrix multiplication on CPUs, by Marek Kolodziej (marek.ai).

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog

A worklog by Simon Boehm on optimizing a CUDA matmul kernel toward cuBLAS-like performance (siboehm.com).

Anonymity and the internet

Anonymity on the internet is fragile, with each piece of information reducing anonymity. Revealing multiple bits of personal information can jeopardize anonymity, but deliberate disinformation can help regain some anonymity. To protect anonymity, it's best to minimize information disclosure.
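The arithmetic behind this is simple: singling out one person among N requires log₂(N) bits of information, and each revealed attribute that is true of about half the population removes roughly one bit. A sketch:

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person in a population."""
    return math.log2(population)

def anonymity_set(population: int, revealed_bits: float) -> float:
    """Expected size of the crowd you still hide in after revealing
    the given number of bits (each bit halves the set on average)."""
    return population / 2 ** revealed_bits
```

Against a world population of about 8 billion, roughly 33 bits suffice to identify someone uniquely, which is why a handful of narrow attributes (city, employer, hobby) erodes anonymity so fast.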

Where Vim Came From

Vim is a popular text editor with a long history tracing back to the Unix epoch. Its development started in 1988 and evolved from the "wq text editor" concept. Vim's success is attributed to its features and the gradual accumulation of good ideas over time.

Building and operating a pretty big storage system called S3

Dr. Werner Vogels shares insights from working on Amazon's S3 storage system, highlighting the scale and unique challenges faced. S3's design incorporates innovative strategies to efficiently handle vast amounts of data across millions of hard drives while prioritizing customer experience. Vogels emphasizes the need for a broader perspective on software systems and the rewarding journey of scaling as an engineer at Amazon.

LADW_2017-09-04

This text discusses properties of vector spaces and matrices, particularly focusing on bases and eigenvalues. It establishes that any linearly independent system of vectors can be completed to form a basis in a finite-dimensional vector space. Additionally, it explains that operators in inner product spaces have an upper triangular matrix representation under certain conditions.

New Scaling Laws for Large Language Models

DeepMind's new paper challenges existing scaling laws for training large language models, proposing more optimal use of compute resources. By training a smaller 70-billion parameter model using their new scaling laws, DeepMind demonstrated superior performance compared to larger models like GPT-3 and their own 280-billion parameter Gopher model. This discovery may lead to more cost-effective and efficient training of large language models in the future.

king - man + woman is queen; but why?

The text explains how the word2vec algorithm transforms words into vectors for analyzing similarities and relationships between words. By using vector arithmetic, it can find analogies such as "king - man + woman = queen." Understanding word co-occurrences can provide insight into the meaning of words through the distributional hypothesis.
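The analogy arithmetic is easy to reproduce. Below is a minimal sketch with made-up 3-dimensional vectors (real word2vec embeddings have hundreds of dimensions; these values are purely illustrative), ranking candidate words by cosine similarity to king − man + woman:

```python
import numpy as np

# Toy 3-dimensional embeddings — illustrative values only, not real word2vec output.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.9, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.0, 0.9]),
}

# The analogy query: take king, remove the "man" direction, add the "woman" direction.
target = vecs["king"] - vecs["man"] + vecs["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Exclude the query words themselves, as word2vec demos conventionally do.
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vecs[w], target))
print(best)  # queen
```

With real embeddings one would load a trained model (e.g. via gensim) instead of hand-writing vectors, but the arithmetic is exactly this.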

1-bit Model

Quantizing small models like Llama2-7B at 1-bit yields poor performance, but fine-tuning with low-rank adapters significantly improves output quality. The HQQ+ approach shows potential in extreme low-bit quantization for machine learning models, reducing memory and computational requirements while maintaining performance. Training larger models with extreme quantization can lead to superior performance compared to training smaller models from scratch.

Human Knowledge Compression Contest

The Human Knowledge Compression Contest measures intelligence through data compression ratios. Better compression leads to better prediction and understanding, showcasing a link between compression and artificial intelligence. The contest aims to raise awareness of the relationship between compression and intelligence, encouraging the development of improved compressors.

Where do LLMs spend their FLOPS?

LLMs (large language models) spend their FLOPS (floating point operations) on various tasks, including computing QKV (query, key, value) matrices, attention output matrices, and running the feed-forward network (FFN). The attention mechanism plays a crucial role in LLMs, even though the FLOPS required for attention calculations are relatively small. The KV cache, which stores information for each token, requires significant memory but is necessary for generating sequences. Different architectural choices, such as grouped query attention and sliding window attention, can affect the size and efficiency of the KV cache. Increasing the number of layers in an LLM linearly scales the FLOPS and parameters, while increasing the model width quadratically scales the model size. Wider models parallelize better, while deeper models increase inference time linearly.
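The depth-versus-width claim follows from back-of-the-envelope transformer arithmetic. A hedged sketch using the common approximation of roughly 12·d_model² weights per layer (attention projections plus FFN, ignoring embeddings):

```python
def approx_params(n_layers, d_model):
    # Standard back-of-the-envelope transformer estimate (embeddings ignored):
    # each layer has ~4*d^2 attention weights (Q, K, V, output projections)
    # plus ~8*d^2 FFN weights (two d x 4d matrices), so ~12*d^2 per layer.
    return 12 * n_layers * d_model ** 2

base = approx_params(32, 4096)
print(approx_params(64, 4096) / base)  # 2.0 — doubling depth doubles size
print(approx_params(32, 8192) / base)  # 4.0 — doubling width quadruples it
```

Since per-token FLOPS track parameter count, this is exactly the linear-in-depth, quadratic-in-width scaling the post describes.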

The Illustrated Stable Diffusion

AI image generation with Stable Diffusion involves an image information creator and an image decoder. Diffusion models use noise and powerful computer vision models to generate aesthetically pleasing images. Text can be incorporated to control the type of image the model generates in the diffusion process.

Structure and Interpretation of Computer Programs, 2nd ed.

The text discusses key concepts in programming, such as primitive expressions, means of combination, and means of abstraction. It highlights the role of the environment in determining the meaning of symbols in expressions. The evaluation process involves reducing expressions to procedures applied to arguments, leading to a deeper understanding of programming concepts.

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX libra...

IEEE Xplore Full-Text PDF:

Terry A. Davis

Terry A. Davis, an American electrical engineer and programmer, created TempleOS, a public domain operating system. Despite his mental health challenges, Davis gained an online following for his unique work and beliefs. His legacy continues to be remembered through documentaries and online discussions.

Generative Agents: Interactive Simulacra of Human Behavior

The content discusses generative agents that simulate believable human behavior for interactive applications. These agents populate a sandbox environment, interact with each other, plan their days, form relationships, and exhibit emergent social behaviors. The paper introduces a novel architecture that allows agents to remember, retrieve, reflect, and interact dynamically.

A Gentle Introduction to LLVM IR

Learning LLVM IR can be beneficial for generalist working programmers to understand what their compiler is doing to create highly optimized code. LLVM IR is well-documented and can be treated as a slightly weird programming language. It is strongly typed and requires explicit type annotations. LLVM IR is a static single assignment form (SSA) IR and has properties that make optimizations simpler to write. It supports control flow operations, arithmetic instructions for different types, and memory operations. There are also LLVM intrinsics available for specific functions. However, some parts of LLVM's semantics, such as undefined behavior and pointer provenance, can be challenging to navigate.

How to round to 2 decimals with Python? [duplicate]

To round a number to 2 decimals in Python, the usual method is round(value, significantDigit), but it can behave unexpectedly when the digit before the one being rounded is a 5, because many decimal values have no exact binary floating-point representation. To address this, a workaround adds a tiny value before rounding so the result matches the traditional half-up rounding commonly used in statistics, without needing to import additional libraries like Decimal. Wrapping this workaround in a function gives the desired results for numbers ending in 5.
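A minimal sketch of the epsilon workaround (the function name round_half_up and the epsilon size are my own choices for illustration, not from the original answer):

```python
def round_half_up(value, digits=2):
    # Nudge the value by a tiny epsilon so numbers whose float representation
    # falls just below .5 (e.g. 2.675 stored as 2.67499999...) still round up,
    # mimicking the "traditional" rounding taught in statistics.
    epsilon = 10 ** -(digits + 5)
    return round(value + epsilon, digits)

print(round(2.675, 2))        # 2.67 — the float 2.675 is actually 2.67499999...
print(round_half_up(2.675))   # 2.68
```

Note that this nudge assumes positive inputs; negative values would need the epsilon subtracted instead, and exact decimal work is still better served by the decimal module.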

Rounding floats with f-string [duplicate]

Using %-formatting, I can specify the number of decimal places in a string:

x = 3.14159265
print('pi = %0.2f' % x)

This would give me: pi = 3.14. Is there any way of doing this using f-strings in ...
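f-strings accept the same format specifiers after a colon, so the %-formatting example translates directly:

```python
x = 3.14159265
print('pi = %0.2f' % x)   # %-formatting: pi = 3.14
print(f'pi = {x:.2f}')    # f-string equivalent: pi = 3.14
print(f'pi = {x:.3f}')    # three decimal places: pi = 3.142
```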

Latent Interfaces

In a career shift, the author is launching Latent Interfaces to apply expertise in design, prototyping, and development to complex data challenges. They share insights into a genomic data project, emphasizing the importance of Python skills alongside JavaScript. The document showcases the creation of intuitive data interfaces and the design process involving both digital and physical tools. Additionally, the author discusses the significance of well-designed APIs like StabilityAI and the potential for future collaborations in data visualization projects.

Hypercomputation

Hypercomputation and super-Turing computation involve models of computation that can produce non-Turing-computable outputs. Introduced in the early 1990s, super-Turing computing is inspired by neurological and biological systems and serves as the foundation for Lifelong Machine Learning. Hypercomputation, a field introduced in the late 1990s, includes philosophical constructs and aims to compute functions beyond what a Turing machine can. The Church-Turing thesis states that any "computable" function can be computed by a Turing machine, but hypercomputers can compute functions that are not computable in the Church-Turing sense. Various hypercomputer models exist, ranging from theoretical concepts like oracle machines to more plausible models like quantum computing. Some proposals suggest that hypercomputation may be achievable through systems like neural networks or analog computers. Critics argue that hypercomputation is not physically realizable.

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Recent research is leading to a new era of 1-bit Large Language Models (LLMs), such as BitNet, introducing a variant called BitNet b1.58 where every parameter is ternary {-1, 0, 1}. This model matches the performance of full-precision Transformer LLMs while being more cost-effective in terms of latency, memory, throughput, and energy consumption. The 1.58-bit LLM sets a new standard for training high-performance and cost-effective models, paving the way for new computation methods and specialized hardware designed for 1-bit LLMs.

How Netflix Really Uses Java

The discussion at Netflix delves into how Java is utilized within the company's architecture, highlighting their transition to Java 17 and ongoing testing with Java 21. The move to newer Java versions resulted in significant performance improvements, such as 20% better CPU usage with Java 17. Additionally, the implementation of GraphQL Federation and virtual threads in Java 21 are key advancements that are expected to impact the way code is written and scaled within Netflix's Java stack. The company's shift from Java 8 to Java 17 and the ongoing evolution of their technology frameworks and tooling, particularly focusing on Spring Boot, demonstrate their commitment to staying current with Java developments.

Scheduling Internals

The document delves into the concept of concurrency in programming, exploring how tasks can be handled concurrently using different methods like threads, async I/O, event loops, and schedulers. It discusses the challenges and benefits of each approach, illustrating examples in C code to demonstrate the practical implementations. The text covers topics like preemptive and non-preemptive schedulers, implementation details in languages like Go and Rust, as well as the use of event loops for efficient task handling. It also touches on the importance of understanding program state management and the impact on task execution in concurrent programming.

Bremermann's limit

Bremermann's limit is the maximum rate of computation that can be achieved in a self-contained system in the material universe. It is derived from Einstein's mass-energy equivalence and the Heisenberg uncertainty principle. This limit has implications for designing cryptographic algorithms, as it can determine the minimum size of encryption keys needed to create an uncrackable algorithm. The limit has also been analyzed in relation to the maximum rate at which a system with a given energy spread can evolve into an orthogonal state.
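The commonly quoted numerical value follows directly from those two ingredients; a sketch of the standard back-of-the-envelope derivation, treating each quantum state transition as one bit:

```latex
\nu_{\max} \;=\; \frac{m c^2}{h}
\;\approx\; \frac{1\,\mathrm{kg}\times\left(2.998\times10^{8}\,\mathrm{m/s}\right)^2}{6.626\times10^{-34}\,\mathrm{J\,s}}
\;\approx\; 1.36\times10^{50}\ \text{bits per second per kilogram}
```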

numerical_recipes

The content provided is the table of contents for a book titled "Numerical Recipes: The Art of Scientific Computing, Third Edition." It includes various topics such as linear algebra, interpolation and extrapolation, integration of functions, evaluation of functions, special functions, random numbers, sorting and selection, root finding and nonlinear sets of equations, minimization or maximization of functions, eigensystems, and more.

2309.10668

This article discusses the relationship between language modeling and compression. The authors argue that large language models can be viewed as powerful compressors due to their impressive predictive capabilities. They demonstrate that these models can achieve state-of-the-art compression rates across different data modalities, such as images and audio. The authors also explore the connection between compression and prediction, showing that models that compress well also generalize well. They conclude by advocating for the use of compression as a framework for studying and evaluating language models.

Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories

Diffusion Models (DMs) have come to dominate many generation benchmarks, but their mathematical descriptions can be complex. In this survey, the authors provide an overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), revealing a mathematical connection to Associative Memories (AMs). AMs are energy-based models that share similarities with denoising DMs, but they allow for the computation of a Lyapunov energy function and gradient descent to denoise data. The authors also summarize the 40-year history of energy-based AMs, starting with the Hopfield Network, and discuss future research directions for both AMs and DMs.

K-Level Reasoning with Large Language Models

Large Language Models (LLMs) have shown proficiency in complex reasoning tasks, but their performance in dynamic and competitive scenarios remains unexplored. To address this, researchers have introduced two game theory-based challenges that mirror real-world decision-making. Existing reasoning methods tend to struggle in dynamic settings that require k-level thinking, so the researchers propose a novel approach called "K-Level Reasoning" that improves prediction accuracy and informs strategic decision-making. This research sets a benchmark for dynamic reasoning assessment and enhances the proficiency of LLMs in dynamic contexts.

Competitive Programmer's Handbook

The article discusses various algorithms and data structures used in computer programming, such as Kadane's algorithm, binary indexed trees, segment trees, Dijkstra's algorithm, and Floyd's algorithm. The author also explains concepts like successor graphs, index compression, and minimum spanning trees. The time complexity of each algorithm is also discussed.

Writing an OS in Rust

This blog series provides tutorials on creating a small operating system in the Rust programming language. Each post includes all the necessary code and is accompanied by a corresponding GitHub repository. The series covers topics such as creating a Rust executable without linking the standard library, building a bootable disk image, implementing VGA text mode, performing unit and integration testing, handling CPU exceptions, setting up the interrupt descriptor table, implementing paging and heap allocation, and exploring cooperative multitasking and the async/await feature of Rust. The posts also include status updates and information on supporting the author.

Ever wanted to make your own programming language or wondered how they are designed and built?

Crafting Interpreters is a book that provides everything you need to create your own programming language. It covers both high-level concepts like parsing and semantics, as well as technical details such as bytecode representation and garbage collection. The book guides you through building a language from scratch, including features like dynamic typing, lexical scope, functions, classes, and inheritance. It is available in multiple formats, including print, ebook, and online for free. The author, Robert Nystrom, is an experienced language developer who currently works at Google on the Dart language.

ThermodynamicComputing

Software Development Trends 2023/2024 - Vol. 2.

The document provides a summary of important software development trends observed in 2023 that are likely to continue into 2024. It includes information on technology roadmaps, the state of DevOps, cloud computing, serverless technology, databases, and more. Some key insights from the document include the value drivers and risks associated with adopting software engineering technologies, the impact of generative cultures and user-focused teams on performance, and the increasing adoption of serverless solutions. Additionally, the document highlights the need for multi-cloud skills development and the most in-demand cloud skills for 2023.

MemGPT: Towards LLMs as Operating Systems

MemGPT is a system that manages different memory tiers to provide extended context within the limited context window of large language models (LLMs). Using an OS-inspired design, MemGPT can handle unbounded context using LLMs that have finite context windows. It is successful in domains where existing LLMs' limited context windows severely limit their performance, such as document analysis and multi-session chat. MemGPT supports self-directed editing and retrieval, memory-hierarchy, OS functions, and event-based control flow to manage unbounded context.

This project is about how to systematically persuade LLMs to jailbreak them.

This project introduces a taxonomy of 40 persuasion techniques to systematically persuade LLMs (large language models) to jailbreak them. Through iterative application of these techniques, the researchers achieved a 92% success rate in jailbreaking advanced LLMs. They also found that more advanced models are more vulnerable to persuasive adversarial prompts (PAPs) and that adaptive defenses can effectively neutralize these prompts. The research highlights the challenges of addressing user-invoked risks from persuasion and the need for further investigation and improved defenses for more capable models.

(2) Home

Eagle Dynamics has exciting plans for the upcoming year, with the development and release of new aircraft and maps. Some highlights include the introduction of the MiG-29A Fulcrum, as well as the Afghanistan and Iraq maps. They are also continuing their work on the CH-47F, Hellcat/USS Enterprise, and the Marianas WW2 map. Fans of flight simulation can look forward to these upcoming additions to the game.

Mathematics for Machine Learning

Tensor2Tensor Intro

Generative Agents: Interactive Simulacra of Human Behavior

The article describes the concept of "generative agents", which are computational software agents that simulate believable human behavior for interactive applications. The agents are created using a large language model and can remember, reflect, and plan based on their past experiences. The article demonstrates generative agents by populating a sandbox environment with 25 agents, where users can observe and intervene as agents plan their days, form relationships, and coordinate group activities. The article discusses the architecture that enables generative agents and their potential applications in various domains.

Extensions in Arc: How to Import, Add, & Open

Arc has full extension support. Here's how

Subcategories