Extend functionality for attention network output size in CADS
Currently the attention network outputs a single node, which can be
- softmaxed over, or
- left as a value between 0 and 1.
It would be good to extend this to multiple output nodes. These can then be separately handled with softmax, or not, or fed to downstream tasks/auxiliary losses.
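A minimal sketch of the three handling options for a multi-node score tensor (the shapes and variable names here are illustrative assumptions, not the current API):

```python
import tensorflow as tf

# Hypothetical shapes: batch of 2 sets, N=4 elements each, K=3 attention output nodes.
scores = tf.random.normal((2, 4, 3))

# Option 1: softmax over the set axis, per output node (weights sum to 1 per node).
softmaxed = tf.nn.softmax(scores, axis=1)

# Option 2: squash each node independently into (0, 1), as the single-node case does now.
sigmoided = tf.nn.sigmoid(scores)

# Option 3: leave the scores raw, e.g. to feed a downstream task or auxiliary loss.
raw = scores
```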
Required changes go in:
- `tf_tools/layers.py`: allow for a node output size in the Attention model; it will be safest to still output a tensor of shape (batch, N, 1) in the case of only one node.
- `tf_tools/layers.py`: accommodate the changes to the Attention output in the AttentionPooling layer. Remove the expand_dims, as this should now be done by default in the Attention network itself. The current matrix multiplication will handle the 1->many case.
- `tf_tools/models.py`: pass options through to the layers when building. Perhaps add more flexibility in (conditional) deep sets for attention outputs and how they interact with the sets output.