[Why multi-head self attention works: math, intuitions and 10+1 hidden insights - AI Summer](https://theaisummer.com/self-attention/) - Really good article
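  For quick reference while reading, a minimal NumPy sketch of the multi-head self-attention the article dissects (my own simplification: no masking, biases, or dropout; shapes and weight names are assumptions, not the article's code):

  ```python
  import numpy as np

  def softmax(x, axis=-1):
      x = x - x.max(axis=axis, keepdims=True)
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
      """X: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
      seq_len, d_model = X.shape
      d_head = d_model // n_heads
      # Project, then split the model dimension into n_heads independent heads.
      Q = (X @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
      K = (X @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
      V = (X @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
      # Scaled dot-product attention per head: softmax(Q K^T / sqrt(d_head)) V.
      scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
      out = softmax(scores, axis=-1) @ V                   # (n_heads, seq_len, d_head)
      # Concatenate heads and apply the output projection.
      return out.transpose(1, 0, 2).reshape(seq_len, d_model) @ Wo
  ```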
[Retentive network: a successor to transformer for large language models (Aug. 2023) - Microsoft Research](https://arxiv.org/abs/2307.08621) - Paper introducing the retention mechanism as a replacement for attention in Transformers; there is a decent overview of the paper in a Medium post [here](https://medium.com/ai-fusion-labs/retentive-networks-retnet-explained-the-much-awaited-transformers-killer-is-here-6c17e3e8add8)
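  A toy NumPy sketch of the core idea (my simplification: it omits the paper's xPos-style rotation, group norm, multi-scale decays and gating, and only shows that the parallel form used for training and the recurrent form used for O(1)-per-token inference compute the same output):

  ```python
  import numpy as np

  def retention_parallel(Q, K, V, gamma):
      """Parallel (training) form: (Q K^T ⊙ D) V with decay mask D[n, m] = gamma**(n-m) for n >= m, else 0."""
      idx = np.arange(Q.shape[0])
      D = np.where(idx[:, None] >= idx[None, :],
                   gamma ** (idx[:, None] - idx[None, :]).astype(float), 0.0)
      return (Q @ K.T * D) @ V

  def retention_recurrent(Q, K, V, gamma):
      """Recurrent (inference) form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n; constant-size state."""
      S = np.zeros((K.shape[1], V.shape[1]))
      outs = []
      for q, k, v in zip(Q, K, V):
          S = gamma * S + np.outer(k, v)
          outs.append(q @ S)
      return np.stack(outs)

  # Both forms agree; the recurrence is what removes the growing KV cache of standard attention.
  rng = np.random.default_rng(0)
  Q, K, V = rng.normal(size=(3, 8, 16))
  assert np.allclose(retention_parallel(Q, K, V, 0.9), retention_recurrent(Q, K, V, 0.9))
  ```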
## Systems
[Llama.cpp 30B runs with only 6GB of RAM now (CPU)](https://github.com/ggerganov/llama.cpp/discussions/638#discussioncomment-5492916)