Arithmetic Intensity Estimation of Large Language Models
This blog post discusses the arithmetic intensity of large language models and how it affects their performance.
A brief talk on speculative decoding in large language models.
Explanation of Automatic Prefix Caching (APC), Speculative Decoding (SD), and Split Fuse (SF).
How to create a LibTorch project.
Dive into the paged attention mechanism of vLLM.
How to build a simple PyTorch training pipeline.