An 8x7B sparse Mixture-of-Experts (SMoE) model. Uses 12.9B active parameters out of 46.7B total (see the routing sketch after this table).
32,768 tokens
$0.70
$15
$0.19
Older small MoE model; superseded by Mixtral 8x22B
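The active-vs-total parameter split comes from sparse routing: per layer, each token is processed by only 2 of the 8 experts, so most expert weights are untouched on any given forward pass. Below is a minimal back-of-the-envelope sketch in Python; the top-2-of-8 routing and the 46.7B/12.9B figures come from Mistral's published description, and the uniform-expert-size assumption is an illustration, not something stated in this table.

```python
# Back-of-the-envelope breakdown of Mixtral 8x7B's parameter counts.
# Assumptions (not from the table above): the router activates 2 of 8
# experts per token per layer, and all experts are the same size.

TOTAL_PARAMS = 46.7e9   # all experts plus shared (non-expert) weights
ACTIVE_PARAMS = 12.9e9  # shared weights plus the 2 routed experts
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2

# total  = shared + NUM_EXPERTS       * per_expert
# active = shared + EXPERTS_PER_TOKEN * per_expert
# Subtracting the two equations isolates the per-expert size.
per_expert = (TOTAL_PARAMS - ACTIVE_PARAMS) / (NUM_EXPERTS - EXPERTS_PER_TOKEN)
shared = TOTAL_PARAMS - NUM_EXPERTS * per_expert

print(f"per-expert params:        {per_expert / 1e9:.2f}B")  # ~5.63B
print(f"shared params:            {shared / 1e9:.2f}B")      # ~1.63B
print(f"active params per token:  {(shared + EXPERTS_PER_TOKEN * per_expert) / 1e9:.1f}B")  # 12.9B
```

The same arithmetic explains why inference cost tracks the ~12.9B active parameters rather than the full 46.7B: only the shared weights and the two selected experts participate in each token's computation.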