Reinforcement Learning for LLMs

1. Basics. RLHF: Reinforcement Learning from Human Feedback. SFT: Supervised Fine-Tuning. RL trains neural networks through trial and error. When finetuning a language model with RLHF, the model produces some text, then receives a score (reward) from a human annotator that captures the quality of that text. We then use RL to finetune the language model to generate outputs with high scores. In this setting, we cannot use supervised learning with a loss function that trains the language model to maximize human preferences. This is because there is no easy way to explain the score a human gives or to connect it mathematically to the output of the neural network. In other words, we cannot backpropagate a loss applied to this score through the rest of the neural network. Doing so would require differentiating (i.e., computing the gradient of) the system that generates the score, which is a human subjectively evaluating the generated text. ...

Jun-02-2025 · 27 min · 5602 words · jamesnulliu

CUDA Programming Notes | 01: Memory Coalescing

Introduction to memory coalescing with Nsight Compute.

Mar-16-2025 · 2 min · 969 words · jamesnulliu

Arithmetic Intensity Estimation of Large Language Models

This blog post discusses the arithmetic intensity of large language models and how it affects their performance.

Mar-13-2025 · 5 min · 889 words · jamesnulliu

A Brief Talk on Speculative Decoding

A brief talk on speculative decoding in large language models.

Feb-21-2025 · 8 min · 1691 words · jamesnulliu

WSL is All You Need

How I work with WSL.

Jan-10-2025 · 4 min · 700 words · jamesnulliu

Create A LibTorch Project

How to create a LibTorch project.

Dec-23-2024 · 7 min · 1337 words · jamesnulliu

My vimrc

My Vim configuration.

Nov-21-2024 · 2 min · 413 words · jamesnulliu

VSCode: Debug C++

This post shows how to configure launch.json in VSCode for debugging C++.

Oct-09-2024 · 2 min · 400 words · jamesnulliu

VSCode: Debug Python

This post shows how to configure launch.json in VSCode for debugging Python.

Oct-09-2024 · 1 min · 173 words · jamesnulliu

Render Mathematics in Hugo

This post shows how to render mathematics in Hugo.

Oct-08-2024 · 2 min · 413 words · jamesnulliu