- Teamwork: Git & GitHub + CI/CD
- Environment:
- Docker & Singularity (Apptainer)
- Linux (Debian & Ubuntu & Rocky Linux & Fedora)
- Windows & WSL2 (caveat: CUDA pinned memory behaves differently under WSL2)
- CMake + vcpkg
- pip/conda/uv
- IDE: VSCode (Linux/Windows) & VS (Windows)
- Others: Bash & PowerShell & Vim & tmux
2. Languages & Frameworks
- Languages: C & C++ & CUDA & Python
- DL frameworks: PyTorch & LibTorch & Triton
- Inference: vLLM & SGLang (see the offline-generation sketch after this list)
- SFT: TRL & Unsloth
- RLHF: verl
- Compute graphs & compiler IR: MLIR
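A minimal offline-generation sketch with vLLM's Python API, to make the inference bullet concrete. The model id and sampling settings here are placeholders, and field names can shift between vLLM releases, so treat this as a sketch rather than a reference.

```python
from vllm import LLM, SamplingParams

# Placeholder model id; any HF-compatible checkpoint is loaded the same way.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Offline batched generation; vLLM applies continuous batching and a paged KV cache internally.
outputs = llm.generate(["Explain paged attention in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```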
4. Concepts
4.1. NLP
- Inference:
- TP (Megatron)
- Quantization: PTQ & QAT & GPTQ
- Pruning: Unstructured & Structured
- PagedAttention & FlashAttention & MQA & GQA & MLA & DeepSeek Sparse Attention (DSA)
- Prefix Caching
- Continuous Batching
- Chunked Prefill
- Speculative Decoding
- Sampling: Top-k & Top-p & Temperature & Beam Search (see the sampling sketch at the end of this section)
- Training:
- Pretraining
- SFT
- RLHF: PPO & GRPO & DAPO
- PEFT: LoRA & Prefix Tuning & P-Tuning & Prompt Tuning (see the LoRA sketch at the end of this section)
- Efficiency:
- Position Embedding:
- Cluster:
- Apptainer + Slurm + Docker + Module
- torchrun & DeepSpeed & Accelerate & bitsandbytes (see the all-reduce sketch at the end of this section)
- NCCL & Gloo & MPI
- Ray
- InfiniBand
- Evaluation: HPCG
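To make the sampling bullet under Inference concrete, here is a small PyTorch sketch of temperature, top-k, and top-p (nucleus) sampling over a single logit vector. The function name and defaults are illustrative, not tied to any particular framework; beam search is omitted.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_k: int = 0, top_p: float = 1.0) -> int:
    """Sample one token id from a 1-D logit vector (illustrative helper)."""
    # Temperature: rescale logits before softmax (lower -> sharper distribution).
    logits = logits / max(temperature, 1e-5)

    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches p, dropping everything outside that nucleus.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        outside_nucleus = torch.cumsum(probs, dim=-1) - probs > top_p
        sorted_logits[outside_nucleus] = float("-inf")
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# Toy usage over an 8-token vocabulary.
next_id = sample_next_token(torch.randn(8), temperature=0.7, top_k=5, top_p=0.9)
```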
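For the PEFT bullet, a minimal LoRA sketch: a frozen pretrained nn.Linear wrapped with a trainable low-rank update y = W x + (alpha / r) * B(A(x)). The class name, rank, and scaling defaults are illustrative, not the PEFT library's API.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained Linear plus a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # freeze W (and bias); only A and B are trained
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)    # update starts at zero, so behavior is unchanged at init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))

# Toy usage: wrap a 768x768 projection; gradients flow only through A and B.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```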
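For the cluster bullets (torchrun, NCCL/Gloo), a hedged all-reduce sketch with torch.distributed; the script name and process count are placeholders. torchrun supplies RANK / LOCAL_RANK / WORLD_SIZE / MASTER_ADDR, so the script only picks a backend and calls the collective.

```python
# Launch (placeholder file name): torchrun --nproc_per_node=2 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)

    if backend == "nccl":
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        device = torch.device(f"cuda:{local_rank}")
    else:
        device = torch.device("cpu")

    # Each rank contributes a different vector; after all_reduce every rank holds the sum.
    x = torch.ones(4, device=device) * (dist.get_rank() + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: {x.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```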