Reading Notes: “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness” (Mar 8, 2025)
Reading Notes: “Efficient Memory Management for Large Language Model Serving with PagedAttention” (Mar 7, 2025)
Reading Notes: “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer” (Mar 2, 2025)
Reading Notes: “GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints” (Feb 28, 2025)