- ["QLoRA: Efficient Finetuning of Quantized LLMs" (Dettmers et al. - may.
2023)](https://arxiv.org/abs/2305.14314) - builds on top of LoRA, and further
quantizes models to 4-bit to reduce memory usage

## Tokenization
- ["Neural Machine Translation of Rare Words with Subword Units" (Sennrich et
al. - jun. 2016)](https://arxiv.org/abs/1508.07909) - introduces BPE
(Byte-Pair Encoding), as a solution for encoding subword units
- ["Subword Regularization: Improving Neural Network Translation Models with
Multiple Subword Candidates" (Kudo - apr.
2018)](https://arxiv.org/abs/1804.10959) - the unigram language model, as an
improvement over BPE; both are used today, though
- ["SentencePiece: A simple and language independent subword tokenizer and
detokenizer for Neural Text Processing" (Kudo et al. - aug.
2018)](https://arxiv.org/abs/1808.06226) - Google's SentencePiece tokenizer,
a tokenization method with API's in C++ and Python, which can use either
unigrams or BPE as its encoding method
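
As a rough illustration of what the Sennrich et al. algorithm does, here is a
minimal, self-contained sketch of the BPE merge-learning loop; the toy corpus,
the number of merges, and the `</w>` end-of-word marker are invented for this
example:

```python
from collections import Counter

def merge_word(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(corpus, num_merges):
    """`corpus` maps tuples of symbols to word frequencies; returns the learned merges."""
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # the most frequent pair gets merged
        merges.append(best)
        corpus = {merge_word(s, best): f for s, f in corpus.items()}
    return merges

# Toy corpus (word -> frequency), with "</w>" marking the end of a word.
corpus = {
    ("l", "o", "w", "</w>"): 5,
    ("l", "o", "w", "e", "r", "</w>"): 2,
    ("n", "e", "w", "e", "s", "t", "</w>"): 6,
    ("w", "i", "d", "e", "s", "t", "</w>"): 3,
}
print(learn_bpe(corpus, 4))  # e.g. [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
```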
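
The SentencePiece Python API makes the train/encode round trip very short. A
minimal sketch (the corpus file name, vocabulary size, and sample sentence are
placeholders, and `pip install sentencepiece` is assumed):

```python
import sentencepiece as spm

# Train a subword model on a plain-text corpus (one sentence per line);
# "corpus.txt" and the settings below are placeholder values.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="sp",        # writes sp.model and sp.vocab
    vocab_size=8000,
    model_type="unigram",     # or "bpe"
)

sp = spm.SentencePieceProcessor(model_file="sp.model")
print(sp.encode("This is a test.", out_type=str))  # subword pieces
print(sp.encode("This is a test.", out_type=int))  # token ids
```
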
## Mixed precision training
- ["Mixed Precision Training" (Micikevicious et al. (Nvidia) - oct.
2017)](https://arxiv.org/abs/1710.03740) - introduces mixed precision training
- ["Mixed-Precision Training of Deep Neural Networks"
(Nvidia)](https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/)
- 2017 Nvidia blogpost (accompanying the paper above) about mixed-precision
training, implemented on Nvidia GPU's
- ["Mixed-Precision Training of Deep Neural Networks" video
(Nvidia)](https://on-demand.gputechconf.com/gtc/2019/video/_/S9143/) - the
video accompanying the above blogpost/paper
- ["Train with Mixed Precision"
(Nvidia)](https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html)
- documentation on training models with Nvidia GPU's using mixed precision
- ["What Every User Should Know About Mixed Precision Training in PyTorch"
(PyTorch)](https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/)
- intro to mixed precision in PyTorch
- ["Automatic Mixed Precision"
(PyTorch)](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html) - a
short recipe on how to add mixed precision in PyTorch
- ["CUDA Automatic Mixed Precision examples"
(PyTorch)](https://pytorch.org/docs/stable/notes/amp_examples.html) - examples
of using mixed precision in PyTorch
- ["Mixed precision training"
(fast.ai)](https://docs.fast.ai/callback.fp16.html) - this one is another good
explanation of how mixed precision training works in theory
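
As a concrete reference for the PyTorch links above, here is a minimal sketch
of the standard `autocast` + `GradScaler` training-loop pattern; the model,
data, and hyperparameters are placeholder values, and a CUDA GPU is assumed:

```python
import torch

# Placeholder model, data, and optimizer; only the AMP pattern itself
# follows the PyTorch recipe linked above.
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()       # scales the loss to avoid fp16 gradient underflow

for step in range(10):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass runs selected ops in fp16
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()          # backward pass on the scaled loss
    scaler.step(optimizer)                 # unscales gradients; skips the step on inf/nan
    scaler.update()                        # adjusts the loss scale for the next step
```

The scaler multiplies the loss before `backward()` so that small fp16
gradients don't underflow to zero, then unscales the gradients before the
optimizer step.
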
## Benchmarks
- [MultiPL-E](https://nuprl.github.io/MultiPL-E/) - a benchmark for
evaluating code generation models across many programming languages, built
by translating HumanEval and MBPP