Enhancement: Implement PyTorch's native Transformer model for reference
Although it has evolved since the original 2017 architecture, PyTorch's native Transformer is a good baseline for comparing speed and memory requirements, since its implementation is (most likely) optimized in C++/CUDA code.
The task is to implement a simple adapter inside optimus/models (probably named PyTorchTransformer), whose __init__() method initializes a PyTorch Transformer, and whose forward() method delegates to the wrapped module's forward().
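A minimal sketch of what such an adapter could look like. The class name and constructor parameters follow the naming suggested above, but everything else (hyperparameter defaults, argument names) is illustrative, not a finished design:

```python
import torch
import torch.nn as nn


class PyTorchTransformer(nn.Module):
    """Thin adapter around PyTorch's built-in nn.Transformer.

    Hypothetical sketch: the location (optimus/models) and signature
    are assumptions based on the proposal in this issue.
    """

    def __init__(self, d_model=512, nhead=8, num_layers=6,
                 dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
        )

    def forward(self, src, tgt):
        # nn.Transformer expects (seq_len, batch, d_model) inputs by default
        return self.transformer(src, tgt)
```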
Edit: This class may not be exactly what we want, since it contains both an encoder and a decoder (for translation). We only want the decoder part, so using only the TransformerDecoder is probably more appropriate. This, however, likely requires separate Embedding and positional encoding modules, plus a separate Linear layer at the end to project back to the vocabulary.
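One way the decoder-only variant could be assembled — a hedged sketch, not a final design. All names and hyperparameters are illustrative. Note that nn.TransformerDecoder expects a cross-attention memory input, so (as PyTorch's own language-modelling example does) a decoder-only stack can instead be built from nn.TransformerEncoder plus a causal mask, which is equivalent to a decoder without cross-attention:

```python
import math
import torch
import torch.nn as nn


class DecoderOnlyTransformer(nn.Module):
    """Decoder-only language model sketch: embedding + sinusoidal
    positional encoding + self-attention stack + output projection.

    Hypothetical: this is one plausible shape for the adapter
    discussed in this issue, not an existing implementation.
    """

    def __init__(self, vocab_size, d_model=512, nhead=8,
                 num_layers=6, dropout=0.1, max_len=5000):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)

        # Fixed sinusoidal positional encodings, precomputed once
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

        layer = nn.TransformerEncoderLayer(d_model, nhead, dropout=dropout)
        self.layers = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (seq_len, batch) integer ids
        seq_len = tokens.size(0)
        x = self.embedding(tokens) * math.sqrt(self.d_model)
        x = x + self.pe[:seq_len].unsqueeze(1)
        # Causal mask: each position may attend only to earlier positions
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        x = self.layers(x, mask=mask)
        return self.out(x)  # (seq_len, batch, vocab_size) logits
```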
A good starting point is probably the PyTorch Transformer language-modelling example, which covers almost everything we need (there are a few things our framework does differently, so those parts will need to be adapted).