Reading Notes: “GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints”, Feb 28, 2025
Reading Notes: “Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning”, Feb 23, 2025