Draft: FullTransformerDecoder
This adds the FullTransformerDecoder class, analogous to the FullTransformerEncoder but based on cross-attention rather than only self-attention.
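
For reviewers less familiar with the distinction, here is a minimal sketch of how cross-attention differs from self-attention. It is not the code in this PR: the function names (`attention`, `self_attention`, `cross_attention`) and the `target`/`memory` inputs are hypothetical, and it assumes plain scaled dot-product attention with no learned projections, multiple heads, residual connections, or normalization.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors.

    Returns one output vector per query: a softmax-weighted average of
    `values`, with weights given by query-key similarity.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def self_attention(x):
    # Encoder-style: queries, keys and values all come from the same sequence.
    return attention(x, x, x)


def cross_attention(target, memory):
    # Decoder-style: queries come from the target sequence, while keys and
    # values come from a second sequence (hypothetically, the encoder output).
    return attention(target, memory, memory)


# Hypothetical toy inputs: 3 target positions, 2 memory positions, width 2.
target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[0.5, 0.5], [2.0, -1.0]]
out = cross_attention(target, memory)
print(len(out), len(out[0]))  # one output vector per target position
```

The output has one vector per target position, while each vector is a weighted mix of the memory vectors. That is the property a decoder needs: the output length follows the target, and the content is drawn from the other sequence.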