Mixtral 8x22B is a sparse Mixture-of-Experts (SMoE) model built from eight 22B-parameter experts; because the router sends each token through only two experts per layer, just 39B of its 141B total parameters are active at inference. At launch, Mistral billed it as the most performant open model.
65,536 tokens
$2
$6
$15
$0.19
Old MoE model; superseded by Mistral Large
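The dollar figures above are easiest to read as USD per million tokens. As a minimal sketch of the cost arithmetic, the snippet below assumes the $2 and $6 figures are the per-1M-token input and output rates (Mistral's launch list price for open-mixtral-8x22b); the rate constants and the example token counts are illustrative assumptions, not values taken from the table.

```python
# Rough per-request cost estimate for Mixtral 8x22B.
# Assumption: $2 per 1M input (prompt) tokens, $6 per 1M output (completion)
# tokens, matching Mistral's launch list price; adjust if your provider differs.

INPUT_USD_PER_M = 2.00    # assumed $ per 1M input tokens
OUTPUT_USD_PER_M = 6.00   # assumed $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the assumed rates."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Example: a request that fills most of the 65,536-token context window
print(f"${request_cost(60_000, 8_000):.4f}")  # -> $0.1680
```

Under these assumed rates, even a near-full-context request stays well under a dollar, which is the practical upshot of the per-million pricing listed above.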