- ["QLoRA: Efficient Finetuning of Quantized LLMs" (Dettmers et al. - may.
2023)](https://arxiv.org/abs/2305.14314) - builds on top of LoRA, and further
quantizes models to 4-bit to reduce memory usage

## Tokenization
- ["Neural Machine Translation of Rare Words with Subword Units" (Sennrich et
al. - jun. 2016)](https://arxiv.org/abs/1508.07909) - introduces BPE
(Byte-Pair Encoding), as a solution for encoding subword units
- ["Subword Regularization: Improving Neural Network Translation Models with
Multiple Subword Candidates" (Kudo - apr.
2018)](https://arxiv.org/abs/1804.10959) - the unigram language model, as an
improvement over BPE; both are used today, though
- ["SentencePiece: A simple and language independent subword tokenizer and
detokenizer for Neural Text Processing" (Kudo et al. - aug.
2018)](https://arxiv.org/abs/1808.06226) - Google's SentencePiece tokenizer,
a tokenization method with API's in C++ and Python, which can use either
unigrams or BPE as its encoding method
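
As a rough illustration of what the Sennrich et al. algorithm does, here is a
minimal, self-contained sketch of the BPE merge-learning loop; the toy corpus,
the number of merges, and the `</w>` end-of-word marker are invented for this
example:

```python
from collections import Counter

def merge_word(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(corpus, num_merges):
    """`corpus` maps tuples of symbols to word frequencies; returns the learned merges."""
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # the most frequent pair gets merged
        merges.append(best)
        corpus = {merge_word(s, best): f for s, f in corpus.items()}
    return merges

# Toy corpus (word -> frequency), with "</w>" marking the end of a word.
corpus = {
    ("l", "o", "w", "</w>"): 5,
    ("l", "o", "w", "e", "r", "</w>"): 2,
    ("n", "e", "w", "e", "s", "t", "</w>"): 6,
    ("w", "i", "d", "e", "s", "t", "</w>"): 3,
}
print(learn_bpe(corpus, 4))  # e.g. [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
```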
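
The SentencePiece Python API makes the train/encode round trip very short. A
minimal sketch (the corpus file name, vocabulary size, and sample sentence are
placeholders, and `pip install sentencepiece` is assumed):

```python
import sentencepiece as spm

# Train a subword model on a plain-text corpus (one sentence per line);
# "corpus.txt" and the settings below are placeholder values.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="sp",        # writes sp.model and sp.vocab
    vocab_size=8000,
    model_type="unigram",     # or "bpe"
)

sp = spm.SentencePieceProcessor(model_file="sp.model")
print(sp.encode("This is a test.", out_type=str))  # subword pieces
print(sp.encode("This is a test.", out_type=int))  # token ids
```
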
## Mixed precision training
- ["Mixed Precision Training" (Micikevicious et al. (Nvidia) - oct.
2017)](https://arxiv.org/abs/1710.03740) - introduces mixed precision training
- ["Mixed-Precision Training of Deep Neural Networks"
(Nvidia)](https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/)
- 2017 Nvidia blogpost (accompanying the paper above) about mixed-precision
training, implemented on Nvidia GPU's
- ["Mixed-Precision Training of Deep Neural Networks" video
(Nvidia)](https://on-demand.gputechconf.com/gtc/2019/video/_/S9143/) - the
video accompanying the above blogpost/paper
- ["Train with Mixed Precision"
(Nvidia)](https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html)
- documentation on training models with Nvidia GPU's using mixed precision
- ["What Every User Should Know About Mixed Precision Training in PyTorch"
(PyTorch)](https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/)
- intro to mixed precision in PyTorch
- ["Automatic Mixed Precision"
(PyTorch)](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html) - a
short recipe on how to add mixed precision in PyTorch
- ["CUDA Automatic Mixed Precision examples"
(PyTorch)](https://pytorch.org/docs/stable/notes/amp_examples.html) - examples
of using mixed precision in PyTorch
- ["Mixed precision training"
(fast.ai)](https://docs.fast.ai/callback.fp16.html) - this one is another good
explanation of how mixed precision training works in theory
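
As a concrete reference for the PyTorch links above, here is a minimal sketch
of the standard `autocast` + `GradScaler` training-loop pattern; the model,
data, and hyperparameters are placeholder values, and a CUDA GPU is assumed:

```python
import torch

# Placeholder model, data, and optimizer; only the AMP pattern itself
# follows the PyTorch recipe linked above.
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()       # scales the loss to avoid fp16 gradient underflow

for step in range(10):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass runs selected ops in fp16
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()          # backward pass on the scaled loss
    scaler.step(optimizer)                 # unscales gradients; skips the step on inf/nan
    scaler.update()                        # adjusts the loss scale for the next step
```

The scaler multiplies the loss before `backward()` so that small fp16
gradients don't underflow to zero, then unscales the gradients before the
optimizer step.
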
## Benchmarks
- [MultiPL-E](https://nuprl.github.io/MultiPL-E/) - a benchmark for
evaluating code generation models across many programming languages, built
by translating HumanEval and MBPP