Arithmetic Intensity Estimation of Large Language Models

This blog post discusses the arithmetic intensity of large language models and how it affects their performance.

Mar-13-2025 · 5 min · 890 words · jamesnulliu

A Brief Talk on Speculative Decoding

A brief talk on speculative decoding in large language models.

Feb-21-2025 · 8 min · 1690 words · jamesnulliu

APC, SD, and SF

Explanation of Automatic Prefix Caching (APC), Speculative Decoding (SD), and Split Fuse (SF).

Jan-13-2025 · 2 min · 394 words · jamesnulliu

WSL is All You Need

How I work with WSL.

Jan-10-2025 · 3 min · 452 words · jamesnulliu

Create A LibTorch Project

How to create a LibTorch project.

Dec-23-2024 · 7 min · 1306 words · jamesnulliu

My vimrc

My Vim configuration.

Nov-21-2024 · 2 min · 413 words · jamesnulliu

VSCode: Debug C++

This post shows how to configure launch.json in VSCode for debugging C++.

Oct-09-2024 · 2 min · 400 words · jamesnulliu

VSCode: Debug Python

This post shows how to configure launch.json in VSCode for debugging Python.

Oct-09-2024 · 1 min · 173 words · jamesnulliu

Render Mathematics in Hugo

This post shows how to render mathematics in Hugo.

Oct-08-2024 · 2 min · 413 words · jamesnulliu

Dive into Paged Attention

Dive into the paged attention mechanism of vLLM.

Oct-07-2024 · 11 min · 5109 words · jamesnulliu