Arithmetic Intensity Estimation of Large Language Models

How to estimate the arithmetic intensity of large language models, and how it affects their performance.

Mar-13-2025 · 5 min · 890 words · jamesnulliu

A Brief Talk on Speculative Decoding

A brief talk on speculative decoding in large language models.

Feb-21-2025 · 8 min · 1690 words · jamesnulliu

APC, SD, and SF

Explanation of Automatic Prefix Caching (APC), Speculative Decoding (SD), and Split Fuse (SF).

Jan-13-2025 · 2 min · 394 words · jamesnulliu

Create A LibTorch Project

How to create a LibTorch project.

Dec-23-2024 · 7 min · 1306 words · jamesnulliu

Dive into Paged Attention

Dive into the paged attention mechanism of vLLM.

Oct-07-2024 · 11 min · 5109 words · jamesnulliu

A Simple PyTorch Training Pipeline

How to build a simple PyTorch training pipeline.

Jun-30-2024 · 4 min · 714 words · jamesnulliu