From 7333a1387db9589746e62de47134cb4ef07fcfaa Mon Sep 17 00:00:00 2001
From: Alexandru Gherghescu <gherghescu_alex1@yahoo.ro>
Date: Sun, 31 Dec 2023 17:45:03 +0200
Subject: [PATCH] Add tokenization and mixed precision sections

---
 doc/llm.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/doc/llm.md b/doc/llm.md
index f694a46..fbdc170 100644
--- a/doc/llm.md
+++ b/doc/llm.md
@@ -47,6 +47,47 @@
   2023)](https://arxiv.org/abs/2305.14314) - builds on top of LoRA, and further
   quantizes models to 4-bit, to reduce memory usage
 
+## Tokenization
+
+- ["Neural Machine Translation of Rare Words with Subword Units" (Sennrich et
+  al. - jun. 2016)](https://arxiv.org/abs/1508.07909) - introduces BPE
+  (Byte-Pair Encoding), as a solution for encoding subword units
+- ["Subword Regularization: Improving Neural Network Translation Models with
+  Multiple Subword Candidates" (Kudo - apr.
+  2018)](https://arxiv.org/abs/1804.10959) - introduces the unigram language
+  model as an improvement over BPE; both remain in common use today
+- ["SentencePiece: A simple and language independent subword tokenizer and
+  detokenizer for Neural Text Processing" (Kudo et al. - aug.
+  2018)](https://arxiv.org/abs/1808.06226) - Google's SentencePiece tokenizer,
+  with APIs in C++ and Python, which can use either the unigram model or BPE
+  as its encoding method
+
+## Mixed precision training
+- ["Mixed Precision Training" (Micikevicius et al. (Nvidia) - oct.
+  2017)](https://arxiv.org/abs/1710.03740) - introduces mixed precision training
+- ["Mixed-Precision Training of Deep Neural Networks"
+  (Nvidia)](https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/)
+  - 2017 Nvidia blog post (accompanying the paper above) about mixed precision
+  training on Nvidia GPUs
+- ["Mixed-Precision Training of Deep Neural Networks" video
+  (Nvidia)](https://on-demand.gputechconf.com/gtc/2019/video/_/S9143/) - the
+  video accompanying the above blog post and paper
+- ["Train with Mixed Precision"
+  (Nvidia)](https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html)
+  - documentation on training models on Nvidia GPUs using mixed precision
+- ["What Every User Should Know About Mixed Precision Training in PyTorch"
+  (PyTorch)](https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/)
+  - an intro to mixed precision training in PyTorch
+- ["Automatic Mixed Precision"
+  (PyTorch)](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html) - a
+  short recipe on adding mixed precision to a PyTorch training loop
+- ["CUDA Automatic Mixed Precision examples"
+  (PyTorch)](https://pytorch.org/docs/stable/notes/amp_examples.html) - examples
+  of using mixed precision in PyTorch
+- ["Mixed precision training"
+  (fast.ai)](https://docs.fast.ai/callback.fp16.html) - another good
+  explanation of how mixed precision training works in theory
+
 ## Benchmarks
 
 - [MultiPL-E](https://nuprl.github.io/MultiPL-E/)
--
GitLab
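As a quick illustration of the SentencePiece API referenced in the Tokenization section above, here is a minimal sketch using its Python bindings; the corpus file, model prefix and vocabulary size are illustrative placeholders, not values used anywhere in this repository.

```python
# Minimal SentencePiece sketch; "corpus.txt" and the "bpe" model prefix are
# hypothetical placeholders for a plain-text corpus (one sentence per line).
import sentencepiece as spm

# Train a small BPE model on the corpus.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="bpe", vocab_size=8000, model_type="bpe"
)

# Load the trained model and tokenize / detokenize a sentence.
sp = spm.SentencePieceProcessor(model_file="bpe.model")
pieces = sp.encode("Neural machine translation of rare words", out_type=str)
ids = sp.encode("Neural machine translation of rare words")
print(pieces)          # subword pieces, e.g. ['▁Neural', '▁machine', ...]
print(sp.decode(ids))  # detokenizes the ids back to the original text
```

Switching `model_type` to `"unigram"` (the library default) selects the unigram language model from the Kudo 2018 paper instead of BPE.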
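Similarly, the PyTorch material listed under the mixed precision section boils down to a short training-loop pattern built around `autocast` and `GradScaler`. Below is a minimal sketch of that pattern, assuming a CUDA-capable GPU; the model, data and hyperparameters are placeholders, not part of the referenced docs.

```python
# Minimal mixed precision training loop sketch with PyTorch AMP.
# Assumes a CUDA GPU is available; model/data/hyperparameters are placeholders.
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

for step in range(10):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad()
    with autocast():                   # run the forward pass in half precision where safe
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()      # backward pass on the scaled loss
    scaler.step(optimizer)             # unscales gradients, then runs optimizer.step()
    scaler.update()                    # adjusts the scale factor for the next iteration
```

The scaler multiplies the loss before `backward()` so that small fp16 gradients do not underflow, then unscales them before the optimizer step; this is the loss-scaling trick described in the Micikevicius et al. paper.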