Boomerang distillation is a phenomenon in LLMs in which distilling a teacher model into a student model makes it possible to reconstruct intermediate-sized models by incorporating teacher layers into the ...
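
As a rough illustration of the layer-incorporation idea, the toy sketch below treats models as plain lists of layer names. It rests on assumptions not stated in the text: that each student layer stands in for one contiguous block of teacher layers, and that blocks are swapped back starting from the last layer. The helper `interpolate_layers` and all layer names are hypothetical, and no real weights or training are involved.

```python
from typing import List, Sequence


def interpolate_layers(
    student_layers: Sequence[str],
    teacher_blocks: Sequence[Sequence[str]],
    num_patched: int,
) -> List[str]:
    """Hypothetical helper: swap the last `num_patched` student layers
    for the teacher blocks they are assumed to stand in for."""
    if len(student_layers) != len(teacher_blocks):
        raise ValueError("need exactly one teacher block per student layer")
    if not 0 <= num_patched <= len(student_layers):
        raise ValueError("num_patched is out of range")

    keep = len(student_layers) - num_patched
    layers = list(student_layers[:keep])   # student layers left untouched
    for block in teacher_blocks[keep:]:    # teacher layers swapped back in
        layers.extend(block)
    return layers


# Toy setup (assumed): an 8-layer teacher and a 4-layer student, where
# student layer i stands in for teacher layers 2i and 2i + 1.
teacher = [f"T{i}" for i in range(8)]
teacher_blocks = [teacher[i:i + 2] for i in range(0, 8, 2)]
student = [f"S{i}" for i in range(4)]

# Depth of each intermediate model as more student layers are patched.
sizes = [len(interpolate_layers(student, teacher_blocks, k)) for k in range(5)]
print(sizes)
```

In this toy setup, patching zero layers returns the student unchanged and patching all of them returns the full teacher stack, with every depth in between reachable without further training steps in the sketch itself.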