The Machine Learning Practitioner’s Guide to Speculative Decoding

Large language models generate text one token at a time.