Standard MoE (baseline): learns syntactic / function-word clusters

EMO (two-level MoE): learns topical / semantic clusters
