
Minor follow-up to transformer changes

Description

Follow-up to !241 (merged).

  • Switch GN3 to flash-varlen attention (use torch-meff as the default)
  • Remove registers from the SaltModel class
  • Update register initialisation to match the paper
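Varlen attention consumes a packed layout instead of a padded batch. The sketch below assumes PyTorch and a boolean padding mask where `True` marks padded positions. It prepends register tokens to each sequence, packs the batch into cumulative sequence lengths (`cu_seqlens`) plus a maximum length, and checks a slow per-sequence reference against padded, masked attention. The helpers `add_registers`, `pack_varlen` and `varlen_attention_reference` are hypothetical and are not salt's API. The register initialisation is a placeholder, not the one from the paper. A real flash-varlen backend would replace the reference loop with a single fused kernel call.

```python
import torch
import torch.nn.functional as F


def add_registers(x, pad_mask, registers):
    """Prepend the same register tokens to every sequence (hypothetical helper).

    x:         (batch, seq, dim) padded inputs
    pad_mask:  (batch, seq) bool, True marks padding (assumed convention)
    registers: (num_registers, dim) register embeddings
    """
    batch = x.shape[0]
    regs = registers.unsqueeze(0).expand(batch, -1, -1)
    # Registers are never padding, so extend the mask with False entries
    reg_mask = torch.zeros(batch, registers.shape[0], dtype=torch.bool, device=x.device)
    return torch.cat([regs, x], dim=1), torch.cat([reg_mask, pad_mask], dim=1)


def pack_varlen(x, pad_mask):
    """Drop padding and return (packed, cu_seqlens, max_seqlen) in the varlen layout."""
    valid = ~pad_mask
    seqlens = valid.sum(dim=1, dtype=torch.int32)
    # cu_seqlens[i]:cu_seqlens[i + 1] indexes sequence i inside the packed tensor
    zero = torch.zeros(1, dtype=torch.int32, device=x.device)
    cu_seqlens = torch.cat([zero, torch.cumsum(seqlens, dim=0, dtype=torch.int32)])
    packed = x[valid]  # (total_tokens, dim), kept in batch order
    return packed, cu_seqlens, int(seqlens.max())


def varlen_attention_reference(q, k, v, cu_seqlens):
    """Slow reference: attend within each packed segment separately.

    q, k, v: (total_tokens, heads, head_dim)
    """
    out = torch.zeros_like(q)
    bounds = cu_seqlens.tolist()
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end == start:
            continue  # skip empty sequences
        # SDPA expects (..., seq, head_dim), so move heads ahead of the token axis
        qs, ks, vs = (t[start:end].transpose(0, 1) for t in (q, k, v))
        out[start:end] = F.scaled_dot_product_attention(qs, ks, vs).transpose(0, 1)
    return out


# Toy batch: two sequences of lengths 5 and 2, padded to length 5
torch.manual_seed(0)
batch, seq, heads, head_dim = 2, 5, 2, 4
dim = heads * head_dim
x = torch.randn(batch, seq, dim)
pad_mask = torch.tensor([[False] * 5, [False, False, True, True, True]])
# Placeholder init; in a model this would be an nn.Parameter with the paper's init
registers = 0.02 * torch.randn(3, dim)

x, pad_mask = add_registers(x, pad_mask, registers)
packed, cu_seqlens, max_seqlen = pack_varlen(x, pad_mask)
print(cu_seqlens.tolist(), max_seqlen)  # [0, 8, 13] 8

qkv = packed.view(-1, heads, head_dim)  # reuse tokens as q, k and v for brevity
out_packed = varlen_attention_reference(qkv, qkv, qkv, cu_seqlens)

# Same computation on the padded batch with a key-padding mask (True = attend)
xh = x.view(batch, -1, heads, head_dim).transpose(1, 2)
attn_mask = (~pad_mask)[:, None, None, :]
out_padded = F.scaled_dot_product_attention(xh, xh, xh, attn_mask=attn_mask).transpose(1, 2)
print(torch.allclose(out_packed, out_padded[~pad_mask], atol=1e-5))
```

Packing means no attention work is spent on padded positions. This is the main motivation for varlen kernels when sequence lengths vary widely within a batch.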

cc @mleigh

Review checklist:

  • CI Passing
  • Comments addressed
  • Source branch is up to date with target
Edited by Samuel Van Stroud
