Move to HuggingFace tokenizers
Drop the SentencePiece tokenizers in favor of HuggingFace's tokenizers library, which has a much nicer interface, is written in Rust, is parallelizable, and integrates better with the wider ecosystem. Switching to HuggingFace tokenizers should not affect performance at all.
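As a rough illustration of the interface this commit moves to, the sketch below loads a `tokenizer.json` from a `local_tokenizers/<name>/` directory and encodes a batch of strings. It assumes the files are read with `Tokenizer.from_file` from the `tokenizers` package; the actual `inference.py` may load them differently (for example through `transformers`). The helper names `load_local_tokenizer`, `write_toy_tokenizer` and `demo`, and the toy vocabulary, are hypothetical and exist only so the example runs without the real tokenizer files.

```python
import tempfile
from pathlib import Path
from typing import List

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace


def load_local_tokenizer(name: str, root: str = "local_tokenizers") -> Tokenizer:
    """Load <root>/<name>/tokenizer.json (hypothetical helper, not from the commit)."""
    return Tokenizer.from_file(str(Path(root) / name / "tokenizer.json"))


def write_toy_tokenizer(root: str, name: str = "toy") -> None:
    """Write a tiny word-level tokenizer so the sketch runs without the real files."""
    vocab = {"hello": 0, "world": 1, "[UNK]": 2}  # toy vocabulary, assumption
    tok = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
    tok.pre_tokenizer = Whitespace()  # split on whitespace and punctuation
    target = Path(root) / name
    target.mkdir(parents=True, exist_ok=True)
    tok.save(str(target / "tokenizer.json"))


def demo() -> List[List[int]]:
    """Round-trip the toy tokenizer through disk and encode a small batch."""
    with tempfile.TemporaryDirectory() as root:
        write_toy_tokenizer(root)
        tok = load_local_tokenizer("toy", root=root)
        # encode_batch hands the whole list to the Rust core, which may
        # tokenize the inputs in parallel.
        encodings = tok.encode_batch(["hello world", "hello there"])
        return [enc.ids for enc in encodings]
```

With the real files in place, something like `load_local_tokenizer("gpt2")` would be the equivalent call, again assuming the directory layout listed in this commit.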
Parent: ed936b00
Changed files:
- inference.py: 9 additions, 10 deletions
- llama32K.model: 0 additions, 0 deletions
- local_tokenizers/gemma/config.json: 27 additions, 0 deletions
- local_tokenizers/gemma/special_tokens_map.json: 34 additions, 0 deletions
- local_tokenizers/gemma/tokenizer.json: 838657 additions, 0 deletions
- local_tokenizers/gemma/tokenizer.model: 0 additions, 0 deletions
- local_tokenizers/gemma/tokenizer_config.json: 1516 additions, 0 deletions
- local_tokenizers/gpt2/config.json: 31 additions, 0 deletions
- local_tokenizers/gpt2/merges.txt: 50001 additions, 0 deletions
- local_tokenizers/gpt2/tokenizer.json: 1 addition, 0 deletions
- local_tokenizers/gpt2/tokenizer_config.json: 1 addition, 0 deletions
- local_tokenizers/gpt2/vocab.json: 1 addition, 0 deletions
- local_tokenizers/llama2/config.json: 25 additions, 0 deletions
- local_tokenizers/llama2/special_tokens_map.json: 23 additions, 0 deletions
- local_tokenizers/llama2/tokenizer.json: 93391 additions, 0 deletions
- local_tokenizers/llama2/tokenizer_config.json: 35 additions, 0 deletions
- local_tokenizers/llama3/config.json: 27 additions, 0 deletions
- local_tokenizers/llama3/special_tokens_map.json: 4 additions, 0 deletions
- local_tokenizers/llama3/tokenizer.json: 410563 additions, 0 deletions
- local_tokenizers/llama3/tokenizer_config.json: 2061 additions, 0 deletions