In Depth
Mixture-of-experts (MoE) models can be very large in total parameters but activate only a fraction of them for any given input, making them faster and cheaper to run than dense models of comparable capability. GPT-4 and Mixtral are believed to use MoE architectures.
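To make the sparse-activation idea concrete, here is a minimal sketch of a top-k gated MoE layer in PyTorch. The expert count, layer sizes, and top_k value are illustrative assumptions, not the configuration of GPT-4 or Mixtral: a router scores every expert per token, but only the top-k experts actually run, which is where the compute savings over a dense layer come from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k gated mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router produces one score per expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network; together they hold
        # most of the layer's parameters.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                              # (num_tokens, num_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)              # normalize over chosen experts only
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest are skipped.
        for e, expert in enumerate(self.experts):
            token_rows, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_rows.numel() == 0:
                continue
            out[token_rows] += weights[token_rows, slot].unsqueeze(-1) * expert(x[token_rows])
        return out

# Usage: 4 tokens routed through 8 experts, with 2 experts active per token.
layer = TopKMoE(d_model=16, d_hidden=32)
tokens = torch.randn(4, 16)
print(layer(tokens).shape)  # torch.Size([4, 16])
```

With 8 experts and top_k=2, each token touches only a quarter of the expert parameters per forward pass, even though all 8 experts count toward the model's total size.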