Draft: Global Salt refactoring

This MR breaks backward compatibility with pre-GN3 models (like GN1 and GN2*)

Summary

This MR introduces a global refactoring of salt/models. It aims to remove old, unsupported code and migrate all legacy models to the same transformer backend used in GN3.

Models refactoring

  • Remove GATv2 attention completely.
  • Replace the old transformer and attention modules with their analogs from transformer_v2.
    • muP options are supported.
    • Edges are supported via a separate EdgeAttention class.
  • Remove CrossAttentionPooling; it wasn't used anywhere.
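A common way to support edges in a separate attention class is to add an edge-derived bias to the attention scores before the softmax. This is a minimal NumPy sketch of that idea; the function name `edge_attention` and the additive-bias formulation are assumptions for illustration, not the actual EdgeAttention implementation in salt:

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def edge_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                   edge_bias: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention with an additive edge bias.

    q, k, v: (n, d) per-node queries, keys, values.
    edge_bias: (n, n) scores derived from edge features
               (zeros recover plain self-attention).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1]) + edge_bias
    return softmax(scores, axis=-1) @ v
```

With `edge_bias = 0` this reduces to ordinary self-attention, so the edge path can be switched off without changing the node path.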

Backward compatibility policy

  • In all configs, replace TransformerEncoder with Transformer.
  • Match num_layers and num_heads.
  • Attention: flash-varlen.
  • Norm: pre-norm with LayerNorm.
  • Dense: gated=False.
  • Attention dropout: not supported.
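The policy above can be sketched as a small migration helper. The config keys here (`class_path`, `init_args`, `attn_type`, `dense_gated`) and the new class path are assumptions for illustration; the real salt config schema may use different names:

```python
def migrate_encoder_config(old: dict) -> dict:
    """Rewrite a legacy TransformerEncoder config block into the new
    Transformer form, following the backward compatibility policy.
    Key names are hypothetical, not the actual salt schema."""
    if not old.get("class_path", "").endswith("TransformerEncoder"):
        return old  # nothing to migrate
    init = old.get("init_args", {})
    if init.get("dropout"):
        # attention dropout is not supported by the new backend
        raise ValueError("attention dropout is not supported after the refactor")
    return {
        "class_path": "salt.models.Transformer",  # assumed new class path
        "init_args": {
            "num_layers": init["num_layers"],  # matched 1:1
            "num_heads": init["num_heads"],    # matched 1:1
            "attn_type": "flash-varlen",       # attention backend
            "norm": "pre",                     # pre-norm with LayerNorm
            "dense_gated": False,              # gated=False
        },
    }
```

Any legacy config carrying a nonzero attention dropout must be cleaned up by hand, since there is no equivalent option to map it to.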

Conformity

Edited by Dmitrii Kobylianskii
