Draft: Global Salt refactoring
This MR breaks backward compatibility with pre-GN3 models (like GN1 and GN2*).
## Summary
This MR introduces a global refactoring of `salt/models`.
It aims to remove old, unsupported code and migrate all legacy models to the same transformer backend used in GN3.
## Model refactoring
- Remove `GATv2` attention completely
- Replace the old `transformer` and `attention` modules with their analogs from `transformer_v2`
  - `muP` options are supported
  - `edges` are supported via a separate `EdgeAttention` class (see the sketch after this list)
- Remove `CrossAttentionPooling`, as it wasn't used anywhere
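For orientation, here is a minimal sketch of what the attention/edge split could look like: a plain multi-head self-attention module plus a separate `EdgeAttention` that folds pairwise edge features into the attention scores as an additive per-head bias. This is an illustrative assumption about the design, not salt's actual API; class names, signatures, and tensor layouts are hypothetical.

```python
# Hypothetical sketch only: names and shapes are illustrative, not salt's API.
import torch
from torch import nn


class EdgeAttention(nn.Module):
    """Project pairwise edge features to one additive attention bias per head."""

    def __init__(self, edge_dim: int, num_heads: int):
        super().__init__()
        self.proj = nn.Linear(edge_dim, num_heads)

    def forward(self, edges: torch.Tensor) -> torch.Tensor:
        # edges: (batch, seq, seq, edge_dim) -> bias: (batch, heads, seq, seq)
        return self.proj(edges).permute(0, 3, 1, 2)


class Attention(nn.Module):
    """Multi-head self-attention with an optional edge-feature bias."""

    def __init__(self, dim: int, num_heads: int, edge_dim: int | None = None):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.edge_attn = EdgeAttention(edge_dim, num_heads) if edge_dim else None

    def forward(self, x: torch.Tensor, edges: torch.Tensor | None = None) -> torch.Tensor:
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (batch, seq, dim) -> (batch, heads, seq, head_dim)
        q, k, v = (
            t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )
        # the edge bias enters as an additive float mask on the attention scores
        bias = None
        if self.edge_attn is not None and edges is not None:
            bias = self.edge_attn(edges)
        out = nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=bias)
        return self.out(out.transpose(1, 2).reshape(b, s, d))
```

Keeping the edge handling in its own class leaves the core attention a thin wrapper over fused kernels, which is presumably what lets a fast backend like `flash-varlen` run unchanged when no edges are present. `muP` scaling (initialization and attention-temperature adjustments) is omitted from this sketch.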
## Backward compatibility policy
- In all configs, replace `TransformerEncoder` with `Transformer` (a migration sketch follows this list)
- Match `num_layers` and `num_heads`
- Attention: `flash-varlen`
- Norm: `pre` with `LayerNorm`
- Dense: `gated=False`
- Attention dropout: not supported
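To make the mapping concrete, here is the policy above written as a small config transform. The dict schema (`class_path`/`init_args`) and the key names `attention`, `norm`, and `gated` are assumptions for illustration; the exact keys should be taken from the real salt configs.

```python
# Hypothetical sketch only: key names are illustrative, not salt's exact schema.

def migrate_encoder_config(old: dict) -> dict:
    """Rewrite a pre-GN3 TransformerEncoder config for the new Transformer."""
    args = old["init_args"]
    return {
        "class_path": "Transformer",  # was: TransformerEncoder
        "init_args": {
            # num_layers and num_heads carry over unchanged
            "num_layers": args["num_layers"],
            "num_heads": args["num_heads"],
            # fixed choices under the compatibility policy
            "attention": "flash-varlen",
            "norm": "pre",   # pre-norm with LayerNorm
            "gated": False,  # non-gated dense layers
            # any old attention dropout is intentionally dropped:
            # the new backend does not support it
        },
    }


old_cfg = {
    "class_path": "TransformerEncoder",
    "init_args": {"num_layers": 8, "num_heads": 4, "dropout": 0.1},
}
print(migrate_encoder_config(old_cfg))
```

Since attention dropout is unsupported, configs that relied on it will train slightly differently after migration; at inference, dropout is inactive anyway.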
## Conformity
- Documentation Updated?
- Development Guidelines followed?
- Pipeline Passes?