Arithmetic Intensity Estimation of Large Language Models

This blog post discusses the arithmetic intensity of large language models and how it affects their performance.

Mar-13-2025 · 5 min · 890 words · jamesnulliu

A Brief Talk on Speculative Decoding

A brief talk on speculative decoding in large language models.

Feb-21-2025 · 8 min · 1690 words · jamesnulliu

Dive into Paged Attention

Dive into the paged attention mechanism of vLLM.

Oct-07-2024 · 11 min · 5109 words · jamesnulliu