- Dec 03, 2023
Vlad-Andrei BĂDOIU (78692) authored
-
- Nov 29, 2023
Alexandru-Mihai GHERGHESCU authored
Since only the first process communicates with the user, the keyboard interrupt will not reach the other processes, so they will still crash regardless of what we catch. A more complex mechanism would be needed to transmit the user's intention to close the app to the other processes.
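A minimal sketch of one such mechanism, assuming a torch.distributed process group that is already initialized; the `should_stop` helper and its call pattern are assumptions for illustration, not the repository's actual code:

```python
import torch
import torch.distributed as dist

def should_stop(interrupted: bool) -> bool:
    """Agree across all ranks on whether to shut down.

    Rank 0 passes interrupted=True after catching KeyboardInterrupt; the
    broadcast overwrites every other rank's flag with rank 0's value, so
    all processes reach the same decision.
    """
    flag = torch.tensor([float(interrupted)])
    dist.broadcast(flag, src=0)
    return bool(flag.item())
```

Rank 0 would wrap its user-facing loop in try/except KeyboardInterrupt and pass the result here once per iteration, while the other ranks always pass False; all ranks then exit together instead of crashing.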
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Since the tokenizer model is just a 500 KB file, we can easily add it here and avoid dealing with it later.
-
Alexandru-Mihai GHERGHESCU authored
Previously, the script just crashed without informing the user.
-
Alexandru-Mihai GHERGHESCU authored
-
- Nov 14, 2023
ruanslv authored
Fix key-value caching for seqlen != 1 (Issue #899)
-
- Nov 13, 2023
flu0r1ne authored
Update and add comments about the shapes of the key and value matrices in the attention component. E.g., the second dimension has length seqlen + cache_len, not seqlen as previously stated.
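A small self-contained illustration of the corrected shapes; the sizes and names (bsz, cache_len, seqlen, n_heads, head_dim) are hypothetical:

```python
import torch

bsz, cache_len, seqlen = 2, 5, 3      # hypothetical sizes
n_heads, head_dim = 8, 64

cached_k = torch.randn(bsz, cache_len, n_heads, head_dim)
new_k = torch.randn(bsz, seqlen, n_heads, head_dim)

# After the cache is joined with the newly projected keys, the second
# dimension has length seqlen + cache_len, not seqlen:
keys = torch.cat([cached_k, new_k], dim=1)
assert keys.shape == (bsz, cache_len + seqlen, n_heads, head_dim)
```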
-
Alex authored
Update names for consistency with code
Co-authored-by: ruanslv <ruanslv@gmail.com>
-
- Nov 10, 2023
Joseph Spisak authored
Update README.md
-
Joseph Spisak authored
-
- Nov 08, 2023
Suraj Subramanian authored
-
Suraj Subramanian authored
-
- Nov 03, 2023
flu0r1ne authored
This commit fixes a bug in the key-value caching. Currently, a square attention mask is misapplied to the scores matrix despite not matching its shape, which results in a runtime error. In a correct implementation, the decoder mask needs to describe how the new seq_len tokens interact with all the cached tokens. That is, the attention mask needs to be of shape (seq_len, total_len), indicating how the token at row i (representing token i + cached_len in the transformer model) attends to token j. Accordingly, the matrix needs to mask entries where j > cached_len + i. This patch horizontally prepends (seq_len, cached_len) zeros to an upper-triangular mask of size (seq_len, seq_len) to form the (seq_len, total_len) mask.
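A minimal sketch of this mask construction in PyTorch, with hypothetical sizes; it illustrates the idea of the patch rather than reproducing the actual diff:

```python
import torch

seq_len, cached_len = 3, 4            # hypothetical sizes
total_len = cached_len + seq_len

# Upper-triangular part for the seq_len new tokens: new token i may not
# attend to a later new token.
tri = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

# Zero columns for the cached tokens: every new token attends to all of
# them. The combined mask is -inf exactly where j > cached_len + i.
mask = torch.hstack([torch.zeros(seq_len, cached_len), tri])
assert mask.shape == (seq_len, total_len)
```

Adding this (seq_len, total_len) mask to the scores matrix, which has the same shape during incremental decoding, avoids the shape mismatch that previously caused the runtime error.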
-
- Nov 02, 2023
Joseph Spisak authored
Correct the typo "bug," to "bug", in README.md
-
JacobHelwig authored
-
Joseph Spisak authored
Delete FAQ.md
-
Joseph Spisak authored
-
Joseph Spisak authored
Update README.md
-
Joseph Spisak authored
-
- Oct 18, 2023
Suraj Subramanian authored
-
- Oct 16, 2023
Joseph Spisak authored
Update issue templates
-
Suraj Subramanian authored
-
- Oct 15, 2023
Joseph Spisak authored
[closes #858] Change "Content Length" to "Context Length" in MODEL_CARD.md
-
yonashub authored
-
- Oct 11, 2023
Joseph Spisak authored
FAQ updates
-
Joseph Spisak authored
Made some small fixes and added some context.
-
sekyondaMeta authored
-
sekyondaMeta authored
-
- Sep 29, 2023
samuelselvan authored
Add the "--continue" flag to wget for the model binary in order to resume the download
-
- Sep 26, 2023
Joseph Spisak authored
Update README.md
-
Joseph Spisak authored
Updated the Meta AI mention to just Meta.
-
- Sep 23, 2023
Kieren authored
-
- Sep 21, 2023
Joseph Spisak authored
Update MODEL_CARD.md
-
Joseph Spisak authored
Update FAQ.md
-