In the last few years, a major focus in language modelling has been on improving performance by increasing the number of parameters in transformer-based models. This approach has led to impressive results and state-of-the-art performance across many natural language processing tasks. We also pursued this line of research at DeepMind and recently showcased Gopher, a 280-billion parameter model.
