Distillation Scaling Laws

We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks associated with large-scale distillation by enabling compute-optimal allocation for both the teacher and student to maximize student performance. We provide compute-optimal distillation recipes for two key …

AI at light speed: How glass fibers could replace silicon brains

Imagine supercomputers that think with light instead of electricity. That s the breakthrough two European research teams have made, demonstrating how intense laser pulses through ultra-thin glass fibers can perform AI-like computations thousands of times faster than traditional electronics. Their system doesn t just break speed records it achieves near state-of-the-art results in tasks like …