Bookmarks
World Models
Can agents learn inside of their own dreams?
The MAP-Elites Algorithm: Finding Optimality Through Diversity
MAP-Elites is a quality-diversity search algorithm (often used in reinforcement learning and neuroevolution) that avoids getting trapped in a single local optimum by maintaining an archive of elite solutions, one per niche of a behavioral feature space, rather than a single best candidate…
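A minimal, illustrative MAP-Elites sketch follows; the fitness function, behavior descriptor, and mutation operator are placeholder assumptions, not taken from the linked article.

```python
# Minimal MAP-Elites sketch: one elite per archive cell, candidates compete
# only against the elite of their own behavioral niche.
import random

GRID = 10          # niches per behavior dimension
GENOME_LEN = 8
ITERATIONS = 10_000

def fitness(x):
    # Placeholder objective: maximize the sum of the genome.
    return sum(x)

def descriptor(x):
    # Placeholder behavior descriptor: (mean, spread), discretized into a cell.
    mean = sum(x) / len(x)
    spread = max(x) - min(x)
    return (min(int(mean * GRID), GRID - 1),
            min(int(spread * GRID), GRID - 1))

def random_genome():
    return [random.random() for _ in range(GENOME_LEN)]

def mutate(x):
    return [min(1.0, max(0.0, g + random.gauss(0, 0.1))) for g in x]

archive = {}  # cell -> (fitness, genome): one elite per niche

for i in range(ITERATIONS):
    # Seed the archive with random solutions, then mutate existing elites.
    if i < 100 or not archive:
        candidate = random_genome()
    else:
        candidate = mutate(random.choice(list(archive.values()))[1])
    cell = descriptor(candidate)
    f = fitness(candidate)
    # Keep the candidate only if its niche is empty or it beats the current elite.
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, candidate)

print(f"filled niches: {len(archive)}, "
      f"best fitness: {max(f for f, _ in archive.values()):.3f}")
```

The key design point is the archive: because candidates only compete within their own niche, diverse behaviors are preserved while each niche is still pushed toward higher fitness.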
"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"
This isn't a new intuition, but a nice new set of results.
by Marcus Hutter, David Quarel, and Elliot Catt
The book can be ordered from Amazon.
Genie 2: A large-scale foundation world model
Generating unlimited diverse training environments for future general agents
CompilerGym
CompilerGym is a library for applying reinforcement learning to compiler optimization tasks. It lets ML researchers work on compiler optimization problems and lets systems developers expose new optimization tasks for ML research, with the goal of using ML to make compilers faster.
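A hedged sketch of CompilerGym's gym-style usage; the environment ID, benchmark, and space names below are recalled from the project's documentation and may differ between versions.

```python
import compiler_gym

env = compiler_gym.make(
    "llvm-v0",                          # LLVM phase-ordering environment
    benchmark="cbench-v1/qsort",        # program to optimize
    observation_space="Autophase",      # compact numeric IR feature vector
    reward_space="IrInstructionCount",  # reward: reduction in IR instruction count
)
env.reset()
for _ in range(10):
    # Each action applies one LLVM optimization pass to the program.
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
env.close()
```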
Self-Rewarding Language Models
To move toward superhuman language models, the authors argue that the training signal cannot be capped by human performance; they propose Self-Rewarding Language Models, in which the model provides its own rewards during training. Instead of relying on a fixed reward model trained from human preferences, the model judges its own outputs via LLM-as-a-Judge prompting, and iterative training improves both its instruction-following ability and the quality of the rewards it assigns itself. A preliminary study fine-tuning Llama 2 70B this way yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard. This work suggests the possibility of models that continually improve along both axes.
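A schematic sketch of one self-rewarding training iteration as described above; `generate`, `judge_score`, and `dpo_update` are hypothetical callables supplied by the caller, not functions from any released codebase.

```python
def self_rewarding_iteration(model, prompts, generate, judge_score, dpo_update,
                             n_candidates=4):
    """One iteration: generate -> self-judge -> build preference pairs -> DPO."""
    preference_pairs = []
    for prompt in prompts:
        # 1. The model generates several candidate responses per prompt.
        candidates = [generate(model, prompt) for _ in range(n_candidates)]
        # 2. The same model scores each candidate via an LLM-as-a-Judge prompt.
        scores = [judge_score(model, prompt, c) for c in candidates]
        # 3. The best- and worst-scored candidates form a preference pair.
        best = candidates[scores.index(max(scores))]
        worst = candidates[scores.index(min(scores))]
        preference_pairs.append((prompt, best, worst))
    # 4. Train on the self-generated preferences (the paper uses iterative DPO).
    return dpo_update(model, preference_pairs)
```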
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Self-Play Fine-Tuning (SPIN, arxiv.org/abs/2401.01335) starts from a supervised fine-tuned model and has the LLM generate training data from its own previous iterations: the model is refined by learning to distinguish its self-generated responses from the original human-annotated SFT data. Repeating this self-play loop progressively strengthens a weak model without requiring additional human-labeled data.
Some Core Principles of Large Language Model (LLM) Tuning
Large Language Models (LLMs) like GPT-2 and GPT-3 are trained with unsupervised pre-training on billions to trillions of tokens. After pre-training, the models are fine-tuned for specific use cases such as chatbots or content generation. Fine-tuning can be done through supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). SFT minimizes a supervised loss (typically cross-entropy) between the model's outputs and reference responses, while RLHF fits a reward model to human preference data and then optimizes the model's outputs against that reward. InstructGPT is an RLHF-tuned version of GPT-3 trained to follow instructions and give aligned responses. There are also open-source alternatives to the GPT models, such as GPT-J and GPT-Neo.
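A toy illustration of the SFT objective described above: cross-entropy between next-token predictions and the reference response tokens. The stand-in linear "model" and tensor shapes are assumptions for illustration only, not any specific library's API.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 12, 4
model = torch.nn.Linear(vocab_size, vocab_size)    # stand-in for an LLM

tokens = torch.randint(0, vocab_size, (batch, seq_len))
inputs = F.one_hot(tokens[:, :-1], vocab_size).float()
targets = tokens[:, 1:]                             # next-token targets

logits = model(inputs)                              # (batch, seq_len-1, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                     # gradients for the SFT update
print(f"SFT cross-entropy loss: {loss.item():.3f}")
```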
VOYAGER: An Open-Ended Embodied Agent with Large Language Models
The paper presents VOYAGER, an embodied agent that continuously explores the Minecraft world, acquires skills, and makes new discoveries without human intervention. VOYAGER consists of three key components: an automatic curriculum for exploration, a skill library for storing and retrieving complex behaviors, and an iterative prompting mechanism for program improvement. The agent uses Large Language Models (LLMs) with code as its action space, which lets it represent temporally extended and compositional actions. The paper also highlights VOYAGER's strong performance relative to prior LLM-based agents in discovering novel items, unlocking the Minecraft tech tree, and applying its learned skill library to unseen tasks in a newly instantiated world.
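A schematic sketch of the outer loop implied by the three components above; every function and the skill-library interface named here is a hypothetical placeholder, not the paper's actual implementation.

```python
def voyager_loop(llm, env, skill_library, propose_task, write_program,
                 run_in_minecraft, max_tasks=100, max_retries=4):
    """Outer loop: curriculum -> code generation -> execution -> skill storage."""
    for _ in range(max_tasks):
        # 1. Automatic curriculum: the LLM proposes the next exploration task
        #    given the agent's current state and existing skills.
        task = propose_task(llm, env, skill_library)
        feedback = ""
        for _ in range(max_retries):
            # 2. Code as action space: the LLM writes an executable program,
            #    retrieving relevant stored skills from the library.
            skills = skill_library.retrieve(task)
            program = write_program(llm, task, skills, feedback)
            # 3. Iterative prompting: execution errors and environment feedback
            #    are fed back to the LLM to refine the program.
            success, feedback = run_in_minecraft(env, program)
            if success:
                # 4. Verified programs are added to the skill library for reuse.
                skill_library.add(task, program)
                break
    return skill_library
```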
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)