V3 Architecture
MLA: Multi-head Latent Attention
DeepSeekMoE: 256 routed experts, 1 shared expert
MTP: Multi-token Prediction
FP8 training