Draft: muP transfer of parameters between models of different complexity

Maxence Draguet requested to merge mdraguet/salt:max-adding-muP into main

This MR adapts the salt codebase to support muP (maximal update parametrisation) transfer. The technique makes it possible to optimise hyperparameters on a model of lower complexity than the target model. Provided the network is correctly parametrised and initialised, the performance ranking of hyperparameter choices on the low-complexity model should match that on the high-complexity one. Furthermore, for a given architecture, the muP-parametrised network should perform as well as or better than the same network in the (current) standard parametrisation.
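As a rough illustration of the transfer idea above, the sketch below shows how a learning rate tuned on a narrow model could be carried over to wider models when training with Adam. It is not the code added in this MR: the function names, the widths and the tuned value are hypothetical, and the rule shown (dividing the learning rate of matrix-like hidden weights by the width multiplier while leaving vector-like parameters such as biases untouched) is a simplified reading of the muP prescription. The initialisation and output-layer scaling that muP also requires are not shown.

```python
def width_multiplier(width: int, base_width: int) -> float:
    """Ratio between the width of the current model and that of the tuned base model."""
    return width / base_width


def mup_adam_lr(base_lr: float, width: int, base_width: int, matrix_like: bool) -> float:
    """Adam learning rate for one parameter group under a simplified muP-style rule.

    Assumption: matrix-like parameters (hidden weights, whose two dimensions both
    grow with the width) get the tuned learning rate divided by the width
    multiplier; vector-like parameters (e.g. biases) keep the tuned value.
    """
    if matrix_like:
        return base_lr / width_multiplier(width, base_width)
    return base_lr


# Hypothetical values: a learning rate tuned on a base model of width 128.
TUNED_LR = 1e-3
BASE_WIDTH = 128

for width in (128, 256, 1024):
    hidden_lr = mup_adam_lr(TUNED_LR, width, BASE_WIDTH, matrix_like=True)
    bias_lr = mup_adam_lr(TUNED_LR, width, BASE_WIDTH, matrix_like=False)
    print(f"width={width}: hidden-weight lr={hidden_lr:.2e}, bias lr={bias_lr:.2e}")
```

With this kind of rule the hyperparameter search only has to be run once, on the base width; the values for wider models are derived from it rather than tuned again.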

This work is based on this paper and uses this GitHub repository and the muP package.