Training a 1-trillion-parameter model
Update: 2025-09-04
Description
Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism