Move to HuggingFace tokenizers
Drop the SentencePiece tokenizers in favor of HuggingFace tokenizers. The HuggingFace library has a much nicer interface, is written in Rust, can encode in parallel, and integrates better with the rest of the HuggingFace ecosystem. Switching should not affect performance at all.
Changed files:
- inference.py: 9 additions, 10 deletions
- llama32K.model: 0 additions, 0 deletions
- local_tokenizers/gemma/config.json: 27 additions, 0 deletions
- local_tokenizers/gemma/special_tokens_map.json: 34 additions, 0 deletions
- local_tokenizers/gemma/tokenizer.json: 838657 additions, 0 deletions
- local_tokenizers/gemma/tokenizer.model: 0 additions, 0 deletions
- local_tokenizers/gemma/tokenizer_config.json: 1516 additions, 0 deletions
- local_tokenizers/gpt2/config.json: 31 additions, 0 deletions
- local_tokenizers/gpt2/merges.txt: 50001 additions, 0 deletions
- local_tokenizers/gpt2/tokenizer.json: 1 addition, 0 deletions
- local_tokenizers/gpt2/tokenizer_config.json: 1 addition, 0 deletions
- local_tokenizers/gpt2/vocab.json: 1 addition, 0 deletions
- local_tokenizers/llama2/config.json: 25 additions, 0 deletions
- local_tokenizers/llama2/special_tokens_map.json: 23 additions, 0 deletions
- local_tokenizers/llama2/tokenizer.json: 93391 additions, 0 deletions
- local_tokenizers/llama2/tokenizer_config.json: 35 additions, 0 deletions
- local_tokenizers/llama3/config.json: 27 additions, 0 deletions
- local_tokenizers/llama3/special_tokens_map.json: 4 additions, 0 deletions
- local_tokenizers/llama3/tokenizer.json: 410563 additions, 0 deletions
- local_tokenizers/llama3/tokenizer_config.json: 2061 additions, 0 deletions
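Each local_tokenizers/<name>/ directory listed above ships a serialized tokenizer.json, which the HuggingFace tokenizers package can load directly. The following is a minimal sketch, assuming the code runs from the repository root and the tokenizers package is installed. The helper names tokenizer_path and load_tokenizer are hypothetical and are not taken from inference.py. Tokenizer.from_file, encode, decode and encode_batch are part of the tokenizers Python API.

```python
from pathlib import Path

# Hypothetical root and name list, mirroring the local_tokenizers/ layout in this commit.
TOKENIZER_ROOT = Path("local_tokenizers")
KNOWN_TOKENIZERS = ("gemma", "gpt2", "llama2", "llama3")


def tokenizer_path(name: str, root: Path = TOKENIZER_ROOT) -> Path:
    """Return the path to the tokenizer.json for one of the bundled tokenizers."""
    if name not in KNOWN_TOKENIZERS:
        raise ValueError(f"unknown tokenizer {name!r}; expected one of {KNOWN_TOKENIZERS}")
    return root / name / "tokenizer.json"


def load_tokenizer(name: str, root: Path = TOKENIZER_ROOT):
    """Load a HuggingFace tokenizers.Tokenizer from its serialized JSON file."""
    # Imported lazily so tokenizer_path works even without the package installed.
    from tokenizers import Tokenizer

    return Tokenizer.from_file(str(tokenizer_path(name, root)))


# Example usage (hypothetical; requires the files from this commit on disk):
#   tok = load_tokenizer("llama2")
#   ids = tok.encode("Hello, world!").ids   # list of token ids
#   text = tok.decode(ids)                  # back to a string
#   batch = tok.encode_batch(["a", "b"])    # batch encoding runs in Rust, in parallel
```

Keeping the path lookup separate from the load makes it easy to fail early on a misspelled tokenizer name before touching the filesystem.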