From 7333a1387db9589746e62de47134cb4ef07fcfaa Mon Sep 17 00:00:00 2001
From: Alexandru Gherghescu <gherghescu_alex1@yahoo.ro>
Date: Sun, 31 Dec 2023 17:45:03 +0200
Subject: [PATCH] Add tokenization and mixed precision sections

---
 doc/llm.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/doc/llm.md b/doc/llm.md
index f694a46..fbdc170 100644
--- a/doc/llm.md
+++ b/doc/llm.md
@@ -47,6 +47,47 @@
   2023)](https://arxiv.org/abs/2305.14314) - builds on top of LoRA, and further
   quantizes models to 4-bit, to reduce memory usage
 
+## Tokenization
+
+- ["Neural Machine Translation of Rare Words with Subword Units" (Sennrich et
+  al. - jun. 2016)](https://arxiv.org/abs/1508.07909) - introduces BPE
+  (Byte-Pair Encoding), as a solution for encoding subword units
+- ["Subword Regularization: Improving Neural Network Translation Models with
+  Multiple Subword Candidates" (Kudo - apr.
+  2018)](https://arxiv.org/abs/1804.10959) - the unigram language model, as an
+  improvement over BPE; both are used today, though
+- ["SentencePiece: A simple and language independent subword tokenizer and
+  detokenizer for Neural Text Processing" (Kudo et al. - aug.
+  2018)](https://arxiv.org/abs/1808.06226) - Google's SentencePiece tokenizer,
+  a tokenization method with API's in C++ and Python, which can use either
+  unigrams or BPE as its encoding method
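+
+As a quick illustration, below is a minimal sketch of training and querying a
+SentencePiece BPE model through its Python API (the corpus file `corpus.txt`
+and the vocabulary size are placeholder assumptions, not taken from the papers
+above):
+
+```python
+# Minimal SentencePiece sketch; assumes `pip install sentencepiece` and a
+# plain-text training corpus at corpus.txt (both placeholder assumptions).
+import sentencepiece as spm
+
+# Train a small BPE model; switch model_type to 'unigram' for the unigram LM.
+spm.SentencePieceTrainer.train(
+    input='corpus.txt',
+    model_prefix='tok',
+    vocab_size=8000,
+    model_type='bpe',
+)
+
+# Load the trained model and round-trip a sentence.
+sp = spm.SentencePieceProcessor(model_file='tok.model')
+text = 'Tokenization splits rare words into subword units.'
+print(sp.encode(text, out_type=str))  # subword pieces
+print(sp.decode(sp.encode(text)))     # decoded back to the original text
+```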
+
+## Mixed precision training
+
+- ["Mixed Precision Training" (Micikevicious et al. (Nvidia) - oct.
+  2017)](https://arxiv.org/abs/1710.03740) - introduces mixed precision training
+- ["Mixed-Precision Training of Deep Neural Networks"
+  (Nvidia)](https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/)
+  - 2017 Nvidia blogpost (accompanying the paper above) about mixed-precision
+    training, implemented on Nvidia GPU's
+- ["Mixed-Precision Training of Deep Neural Networks" video
+  (Nvidia)](https://on-demand.gputechconf.com/gtc/2019/video/_/S9143/) - the
+  video accompanying the above blogpost/paper
+- ["Train with Mixed Precision"
+  (Nvidia)](https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html)
+  - documentation on training models with Nvidia GPU's using mixed precision
+- ["What Every User Should Know About Mixed Precision Training in PyTorch"
+  (PyTorch)](https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/)
+  - intro to mixed precision in PyTorch
+- ["Automatic Mixed Precision"
+  (PyTorch)](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html) - a
+  short recipe on how to add mixed precision in PyTorch
+- ["CUDA Automatic Mixed Precision examples"
+  (PyTorch)](https://pytorch.org/docs/stable/notes/amp_examples.html) - examples
+  of using mixed precision in PyTorch
+- ["Mixed precision training"
+  (fast.ai)](https://docs.fast.ai/callback.fp16.html) - this one is another good
+  explanation of how mixed precision training works in theory
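+
+To make the PyTorch recipes above concrete, here is a minimal sketch of a
+mixed precision training step with `torch.cuda.amp` (the tiny model, optimizer
+and random data are placeholder assumptions):
+
+```python
+# Minimal mixed precision training step with torch.cuda.amp; the model,
+# optimizer and random data below are placeholder assumptions.
+import torch
+
+model = torch.nn.Linear(512, 512).cuda()
+optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
+scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow
+
+for step in range(10):
+    x = torch.randn(32, 512, device='cuda')
+    target = torch.randn(32, 512, device='cuda')
+
+    optimizer.zero_grad()
+    # The forward pass runs in mixed precision (fp16 where safe, fp32 elsewhere).
+    with torch.cuda.amp.autocast():
+        loss = torch.nn.functional.mse_loss(model(x), target)
+
+    # Backward on the scaled loss, then unscale, step and update the scaler.
+    scaler.scale(loss).backward()
+    scaler.step(optimizer)
+    scaler.update()
+```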
+
 ## Benchmarks
 
 - [MultiPL-E](https://nuprl.github.io/MultiPL-E/)
-- 
GitLab