Arithmetic Intensity Estimation of Large Language Models
This blog post discusses the arithmetic intensity of large language models and how it affects their performance.
A brief talk on speculative decoding in large language models.
Explanation of Automatic Prefix Caching (APC), Speculative Decoding (SD), and Split Fuse (SF).
How I work with WSL.
How to create a LibTorch project.
My Vim configuration.
This post shows how to configure launch.json in VSCode for debugging C++.
This post shows how to configure launch.json in VSCode for debugging Python.
This post shows how to render mathematics in Hugo.
Dive into the paged attention mechanism of vLLM.