The rise of the MoEs
Image by the author — Generated with DALL-E
Since the release of Mixtral-8x7B by Mistral AI, there has been renewed interest in mixture of experts (MoE) models. In this architecture, the model is split into expert sub-networks, and only a few of them are selected and activated by a router network during inference. MoEs…
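To make the routing idea concrete, below is a minimal sketch of a top-k routed MoE layer in PyTorch. The class name, layer sizes, expert definition, and top_k value are illustrative assumptions for this sketch, not Mixtral's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sketch of a mixture-of-experts layer with top-k routing.
    Dimensions and expert structure are illustrative, not Mixtral's code."""

    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert
        self.router = nn.Linear(dim, num_experts)
        # Experts: independent feed-forward sub-networks
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (num_tokens, dim)
        scores = self.router(x)                # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts only
        out = torch.zeros_like(x)
        # Only the top_k selected experts are evaluated for each token
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(10, 512)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

Because only top_k of the num_experts feed-forward blocks run for each token, an MoE model can hold far more parameters than it actually activates on any single forward pass.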
