Commit 9c126a3a authored by Alexandru-Mihai GHERGHESCU

Add gradient checkpointing papers

parent db8f2c94
@@ -29,6 +29,13 @@ useful for someone to get up-to-date with the NLP research of today.
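- ["Effective Approaches to Attention-based Neural Machine Translation" (Luong
  et al. - sep. 2015)](https://arxiv.org/abs/1508.04025) - another of the
  foundational attention papers, introducing global and local attention for NMT
  along with multiplicative (dot-product style) alignment scores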
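- ["Training Deep Nets with Sublinear Memory Cost" (Chen et al. - apr.
  2016)](https://arxiv.org/abs/1604.06174) - popularizes gradient checkpointing
  for deep learning: only some activations are kept during the forward pass and
  the rest are recomputed during backpropagation, cutting activation memory to
  O(√n) for an n-layer network at the cost of roughly one extra forward pass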
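- ["Memory-Efficient Backpropagation Through Time" (Gruslys et al. - jun.
  2016)](https://arxiv.org/abs/1606.03401) - extends gradient checkpointing to
  RNNs, using dynamic programming to decide which intermediate results to cache
  during backpropagation through time (BPTT) so training fits a fixed memory
  budget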
- ["Layer Normalization" (Ba et al. - jul.
  2016)](https://arxiv.org/abs/1607.06450) - normalizes each example's
  activations across its hidden units (rather than across the batch, as Batch
  Normalization does); used after every sub-layer of the original Transformer
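The store-versus-recompute trade-off behind the checkpointing papers can be sketched in plain Python. This is a toy illustration, not Chen et al.'s implementation: it assumes a chain of n hypothetical scalar layers `y = tanh(w * x)` with the loss taken to be the final output, keeps only the input of every `isqrt(n)`-th layer as a checkpoint, and recomputes each segment's activations just before backpropagating through it. The helper names (`layer_forward`, `backprop_full`, `backprop_checkpointed`) are made up for this example.

```python
import math

def layer_forward(w, x):
    # Hypothetical toy layer: y = tanh(w * x).
    return math.tanh(w * x)

def layer_backward(w, x, grad_y):
    # Returns (dL/dx, dL/dw) given the layer input x and upstream dL/dy.
    # Recomputes y from x for simplicity.
    y = math.tanh(w * x)
    d = grad_y * (1.0 - y * y)
    return d * w, d * x

def backprop_full(weights, x0):
    # Plain backprop: store every activation (n + 1 values).
    acts = [x0]
    for w in weights:
        acts.append(layer_forward(w, acts[-1]))
    grad, grads_w = 1.0, [0.0] * len(weights)  # toy loss L = final output
    for i in reversed(range(len(weights))):
        grad, grads_w[i] = layer_backward(weights[i], acts[i], grad)
    return acts[-1], grads_w, len(acts)

def backprop_checkpointed(weights, x0, segment=None):
    # Checkpointed backprop: keep only each segment's input, recompute the rest.
    n = len(weights)
    segment = segment or max(1, math.isqrt(n))
    checkpoints, x = {}, x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer_forward(w, x)
    loss, grad, grads_w = x, 1.0, [0.0] * n
    peak = len(checkpoints)  # number of activations held at once
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Recompute the inputs of layers start .. end-1 from the checkpoint.
        acts = [checkpoints[start]]
        for i in range(start, end - 1):
            acts.append(layer_forward(weights[i], acts[-1]))
        peak = max(peak, len(checkpoints) + len(acts))
        for i in reversed(range(start, end)):
            grad, grads_w[i] = layer_backward(weights[i], acts[i - start], grad)
    return loss, grads_w, peak

weights = [0.5 + 0.1 * i for i in range(16)]
loss_full, grads_full, mem_full = backprop_full(weights, 0.3)
loss_ckpt, grads_ckpt, mem_ckpt = backprop_checkpointed(weights, 0.3)
print(mem_full, mem_ckpt, grads_full == grads_ckpt)  # → 17 8 True
```

With 16 layers the full version holds 17 activations, while the checkpointed one peaks at 8 (4 checkpoints plus one recomputed 4-layer segment) and yields the same gradients; the price is one extra forward pass over each segment. In practice frameworks handle this for you, e.g. PyTorch's `torch.utils.checkpoint.checkpoint`.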
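Layer normalization computes its statistics over the H hidden units of a single example: μ = (1/H) Σ aᵢ and σ = sqrt((1/H) Σ (aᵢ − μ)²), then outputs g ⊙ (a − μ) / σ + b with a learned gain g and bias b (the paper additionally wraps this in a nonlinearity f, omitted here). A minimal plain-Python sketch follows; `layer_norm` is a hypothetical helper name, and the small `eps` added to the variance is a common numerical-stability convention assumed here, not part of the paper's equations.

```python
import math

def layer_norm(a, gain, bias, eps=1e-5):
    # Statistics are taken over the hidden units of ONE example, not the batch.
    h = len(a)
    mu = sum(a) / h
    var = sum((x - mu) ** 2 for x in a) / h
    sigma = math.sqrt(var + eps)  # eps: assumed stability term
    # Normalize, then apply the per-unit learned gain and bias.
    return [g * (x - mu) / sigma + b for x, g, b in zip(a, gain, bias)]

out = layer_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [0.0] * 4)
print([round(v, 3) for v in out])  # → [-1.342, -0.447, 0.447, 1.342]
```

Because the statistics do not depend on other examples, the same computation is used at training and inference time and works with any batch size, which is part of why it suits sequence models such as the Transformer.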