Implementing Speculative and Contrastive Decoding

Large language models comprise billions of parameters (weights). For each token it generates, the model must perform computationally expensive calculations across all of these parameters. A large language model accepts a sentence, or sequence of tokens, and produces a probability distribution over the most likely next token. Thus,…
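To make the idea of "a probability distribution over the next token" concrete, here is a minimal sketch. The vocabulary and logit values are invented for illustration; a real model would produce logits over tens of thousands of tokens, but the final softmax step is the same.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and the raw scores (logits) a model might
# assign to each candidate next token for some prompt.
vocab = ["mat", "dog", "roof", "moon"]
logits = [3.2, 0.5, 1.1, -0.7]

probs = softmax(logits)
next_token_distribution = dict(zip(vocab, probs))
```

The resulting probabilities sum to 1, and the token with the largest logit receives the most mass; decoding strategies differ only in how they pick a token from (or modify) this distribution.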
