Fixed Gemma4 MoE weight tying where lm_head drifted from the active embed_tokens during training.

Merged PR fixing weight tying on the Gemma4 MoE path: the language-model replacement orphaned the lm_head-to-embed_tokens link, so the tied weights drifted during training. The fix re-ties lm_head to the active embed_tokens.
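The failure mode can be illustrated with a minimal, framework-free sketch. Everything here (`Param`, `Layer`, `Model`, `replace_language_model`, `retie`, `sgd_step`) is a hypothetical stand-in, not the NeMo Automodel or Gemma code; the only point is that tying means both layers hold the same weight object, and swapping in a new embedding layer silently breaks that sharing until it is re-established.

```python
class Param:
    """Hypothetical stand-in for a trainable weight tensor."""
    def __init__(self, values):
        self.values = list(values)


class Layer:
    """Hypothetical layer that just holds a weight."""
    def __init__(self, weight):
        self.weight = weight


class Model:
    def __init__(self):
        self.embed_tokens = Layer(Param([0.1, 0.2]))
        # Tied: lm_head shares the very same Param object.
        self.lm_head = Layer(self.embed_tokens.weight)


def is_tied(model):
    """Tied means identity, not merely equal values."""
    return model.lm_head.weight is model.embed_tokens.weight


def replace_language_model(model, new_embed):
    """Swap in a new embedding layer; lm_head still points at the old weight."""
    model.embed_tokens = new_embed


def retie(model):
    """Point lm_head at whichever embed_tokens is currently active."""
    model.lm_head.weight = model.embed_tokens.weight


def sgd_step(param, grad, lr=0.5):
    """Toy in-place update applied to a single Param."""
    param.values = [v - lr * g for v, g in zip(param.values, grad)]
```

In this sketch, an update applied to `embed_tokens.weight` after `replace_language_model` no longer reaches `lm_head.weight`, which is the drift described above; calling `retie` restores the shared object. In PyTorch the analogous check would be whether the two modules hold the same `nn.Parameter` object.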

  • NeMo Automodel
  • Gemma
  • MoE
  • weight-tying

Resolved ImportError: AnthropicOutputConfig in nightly images by busting BuildKit layer cache with wheel SHA-256 checksums.

Merged PR fixing stale wheel installs in nightly Docker images. On warm BuildKit agents the bind-mount install step was served from cache and skipped, leaving an old wheel in place; copying only wheel.sha256 into the build busts the cache whenever the wheel changes, without bloating the image. Added a regression test for the Anthropic protocol exports.

  • vLLM
  • Docker
  • BuildKit
  • CI