Boomerang distillation is a phenomenon in large language models (LLMs) in which distilling a teacher model into a student model makes it possible to reconstruct intermediate-sized models by incorporating teacher layers into the ...
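The excerpt is truncated, so the exact construction is not given here. The following is a minimal sketch of the general idea of "incorporating teacher layers into the student". It assumes that each student layer corresponds to a contiguous block of teacher layers (the `block_map` below is a hypothetical mapping) and that layers can be treated as opaque placeholders. Swapping some student layers for their teacher blocks yields a model whose depth lies between the student's and the teacher's. `build_intermediate` is an illustrative helper, not the authors' method or any library's API.

```python
from typing import List, Sequence, Set, TypeVar

Layer = TypeVar("Layer")


def build_intermediate(
    student_layers: Sequence[Layer],
    teacher_layers: Sequence[Layer],
    block_map: Sequence[range],
    swap: Set[int],
) -> List[Layer]:
    """Hypothetical helper: replace each student layer whose index is in `swap`
    with the block of teacher layers it is *assumed* to correspond to."""
    if len(block_map) != len(student_layers):
        raise ValueError("block_map needs one teacher block per student layer")
    model: List[Layer] = []
    for i, layer in enumerate(student_layers):
        if i in swap:
            # Insert the whole (assumed) teacher block in place of student layer i.
            model.extend(teacher_layers[j] for j in block_map[i])
        else:
            # Keep the distilled student layer unchanged.
            model.append(layer)
    return model


# Toy example: an 8-layer "teacher" and a 4-layer "student", where student
# layer i is assumed to stand in for teacher layers 2i and 2i+1.
teacher = [f"T{j}" for j in range(8)]
student = [f"S{i}" for i in range(4)]
blocks = [range(2 * i, 2 * i + 2) for i in range(4)]

hybrid = build_intermediate(student, teacher, blocks, swap={1, 3})
print(hybrid)  # ['S0', 'T2', 'T3', 'S2', 'T6', 'T7']
```

Swapping no layers returns the student, and swapping every layer returns the full teacher stack. Each intermediate choice of `swap` gives a different model size in between.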