Bookmarks
World Models
Can agents learn inside of their own dreams?
The MAP-Elites Algorithm: Finding Optimality Through Diversity
MAP-Elites is a quality-diversity search algorithm (often used in reinforcement learning and neuroevolution) that avoids getting trapped in a single local optimum by maintaining an archive of elite solutions, one per niche of a behavioral feature space, rather than a single best candidate…
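A minimal, illustrative MAP-Elites sketch follows; the fitness function, behavior descriptor, and mutation operator are placeholder assumptions, not taken from the linked article.

```python
# Minimal MAP-Elites sketch: one elite per archive cell, candidates compete
# only against the elite of their own behavioral niche.
import random

GRID = 10          # niches per behavior dimension
GENOME_LEN = 8
ITERATIONS = 10_000

def fitness(x):
    # Placeholder objective: maximize the sum of the genome.
    return sum(x)

def descriptor(x):
    # Placeholder behavior descriptor: (mean, spread), discretized into a cell.
    mean = sum(x) / len(x)
    spread = max(x) - min(x)
    return (min(int(mean * GRID), GRID - 1),
            min(int(spread * GRID), GRID - 1))

def random_genome():
    return [random.random() for _ in range(GENOME_LEN)]

def mutate(x):
    return [min(1.0, max(0.0, g + random.gauss(0, 0.1))) for g in x]

archive = {}  # cell -> (fitness, genome): one elite per niche

for i in range(ITERATIONS):
    # Seed the archive with random solutions, then mutate existing elites.
    if i < 100 or not archive:
        candidate = random_genome()
    else:
        candidate = mutate(random.choice(list(archive.values()))[1])
    cell = descriptor(candidate)
    f = fitness(candidate)
    # Keep the candidate only if its niche is empty or it beats the current elite.
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, candidate)

print(f"filled niches: {len(archive)}, "
      f"best fitness: {max(f for f, _ in archive.values()):.3f}")
```

The key design point is the archive: because candidates only compete within their own niche, diverse behaviors are preserved while each niche is still pushed toward higher fitness.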
"Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?"
This isn't a new intuition, but a nice new set of results.
by Marcus Hutter, David Quarel, and Elliot Catt
The book can be ordered from Amazon.
Genie 2: A large-scale foundation world model
Generating unlimited diverse training environments for future general agents
CompilerGym
CompilerGym is a library for applying reinforcement learning to compiler optimization tasks. It lets ML researchers work on compiler optimization problems and lets systems developers expose new optimization tasks for ML research, with the goal of using ML to make compilers faster.
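A hedged sketch of CompilerGym's gym-style usage; the environment ID, benchmark, and space names below are recalled from the project's documentation and may differ between versions.

```python
import compiler_gym

env = compiler_gym.make(
    "llvm-v0",                          # LLVM phase-ordering environment
    benchmark="cbench-v1/qsort",        # program to optimize
    observation_space="Autophase",      # compact numeric IR feature vector
    reward_space="IrInstructionCount",  # reward: reduction in IR instruction count
)
env.reset()
for _ in range(10):
    # Each action applies one LLVM optimization pass to the program.
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
env.close()
```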
Self-Rewarding Language Models
To move toward superhuman language models, the authors argue that the training signal cannot be capped by human performance; they propose Self-Rewarding Language Models, in which the model provides its own rewards during training. Instead of relying on a fixed reward model trained from human preferences, the model judges its own outputs via LLM-as-a-Judge prompting, and iterative training improves both its instruction-following ability and the quality of the rewards it assigns itself. A preliminary study fine-tuning Llama 2 70B this way yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard. This work suggests the possibility of models that continually improve along both axes.
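A schematic sketch of one self-rewarding training iteration as described above; `generate`, `judge_score`, and `dpo_update` are hypothetical callables supplied by the caller, not functions from any released codebase.

```python
def self_rewarding_iteration(model, prompts, generate, judge_score, dpo_update,
                             n_candidates=4):
    """One iteration: generate -> self-judge -> build preference pairs -> DPO."""
    preference_pairs = []
    for prompt in prompts:
        # 1. The model generates several candidate responses per prompt.
        candidates = [generate(model, prompt) for _ in range(n_candidates)]
        # 2. The same model scores each candidate via an LLM-as-a-Judge prompt.
        scores = [judge_score(model, prompt, c) for c in candidates]
        # 3. The best- and worst-scored candidates form a preference pair.
        best = candidates[scores.index(max(scores))]
        worst = candidates[scores.index(min(scores))]
        preference_pairs.append((prompt, best, worst))
    # 4. Train on the self-generated preferences (the paper uses iterative DPO).
    return dpo_update(model, preference_pairs)
```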
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Self-Play Fine-Tuning (SPIN, arxiv.org/abs/2401.01335) starts from a supervised fine-tuned model and has the LLM generate training data from its own previous iterations: the model is refined by learning to distinguish its self-generated responses from the original human-annotated SFT data. Repeating this self-play loop progressively strengthens a weak model without requiring additional human-labeled data.
Some Core Principles of Large Language Model (LLM) Tuning
Large Language Models (LLMs) like GPT-2 and GPT-3 are trained with unsupervised pre-training on billions to trillions of tokens. After pre-training, the models are fine-tuned for specific use cases such as chatbots or content generation. Fine-tuning can be done through supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). SFT minimizes a supervised loss (typically cross-entropy) between the model's outputs and reference responses, while RLHF fits a reward model to human preference data and then optimizes the model's outputs against that reward. InstructGPT is an RLHF-tuned version of GPT-3 trained to follow instructions and give aligned responses. There are also open-source alternatives to the GPT models, such as GPT-J and GPT-Neo.
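A toy illustration of the SFT objective described above: cross-entropy between next-token predictions and the reference response tokens. The stand-in linear "model" and tensor shapes are assumptions for illustration only, not any specific library's API.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 12, 4
model = torch.nn.Linear(vocab_size, vocab_size)    # stand-in for an LLM

tokens = torch.randint(0, vocab_size, (batch, seq_len))
inputs = F.one_hot(tokens[:, :-1], vocab_size).float()
targets = tokens[:, 1:]                             # next-token targets

logits = model(inputs)                              # (batch, seq_len-1, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                     # gradients for the SFT update
print(f"SFT cross-entropy loss: {loss.item():.3f}")
```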
VOYAGER: An Open-Ended Embodied Agent with Large Language Models
The paper presents VOYAGER, an embodied agent that continuously explores the Minecraft world, acquires skills, and makes new discoveries without human intervention. VOYAGER consists of three key components: an automatic curriculum for exploration, a skill library for storing and retrieving complex behaviors, and an iterative prompting mechanism for program improvement. The agent uses Large Language Models (LLMs) with code as its action space, which lets it represent temporally extended and compositional actions. The paper also highlights VOYAGER's strong performance relative to prior LLM-based agents in discovering novel items, unlocking the Minecraft tech tree, and applying its learned skill library to unseen tasks in a newly instantiated world.
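A schematic sketch of the outer loop implied by the three components above; every function and the skill-library interface named here is a hypothetical placeholder, not the paper's actual implementation.

```python
def voyager_loop(llm, env, skill_library, propose_task, write_program,
                 run_in_minecraft, max_tasks=100, max_retries=4):
    """Outer loop: curriculum -> code generation -> execution -> skill storage."""
    for _ in range(max_tasks):
        # 1. Automatic curriculum: the LLM proposes the next exploration task
        #    given the agent's current state and existing skills.
        task = propose_task(llm, env, skill_library)
        feedback = ""
        for _ in range(max_retries):
            # 2. Code as action space: the LLM writes an executable program,
            #    retrieving relevant stored skills from the library.
            skills = skill_library.retrieve(task)
            program = write_program(llm, task, skills, feedback)
            # 3. Iterative prompting: execution errors and environment feedback
            #    are fed back to the LLM to refine the program.
            success, feedback = run_in_minecraft(env, program)
            if success:
                # 4. Verified programs are added to the skill library for reuse.
                skill_library.add(task, program)
                break
    return skill_library
```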
Subcategories
- applications (9)
- compression (9)
- computer_vision (8)
- deep_learning (94)
- ethics (2)
- generative_models (25)
- interpretability (17)
- natural_language_processing (24)
- optimization (7)
- recommendation (2)
- reinforcement_learning (11)
- supervised_learning (1)