Add inference code
- Jan 26, 2024
Alexandru-Mihai GHERGHESCU authored
Output model tokens per second at the end of inference.
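A minimal sketch of how such a throughput measurement might look, assuming a token-by-token generation loop; `generate` and `model.generate_next_token` are hypothetical names, not the repository's actual API.

```python
import time

# Hypothetical generation loop; `model.generate_next_token` is an assumed
# helper, not the repository's actual API.
def generate(model, input_ids, max_new_tokens):
    start = time.perf_counter()
    for _ in range(max_new_tokens):
        next_token = model.generate_next_token(input_ids)
        input_ids.append(next_token)
    elapsed = time.perf_counter() - start
    # Report throughput once, at the end of inference.
    print(f"{max_new_tokens / elapsed:.2f} tokens/s")
    return input_ids
```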
Alexandru-Mihai GHERGHESCU authored
This allows the inference code to start up with a prompt instead of waiting for user input on stdin, which makes scripting easier and is useful for batch generation, benchmarking, etc.
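A minimal sketch of what the prompt handling might look like, assuming an argparse-based CLI; the `--prompt` flag name is an assumption, not necessarily the repository's actual interface.

```python
import argparse
import sys

parser = argparse.ArgumentParser(description="Run model inference")
# Assumed flag name; the actual option in the repository may differ.
parser.add_argument("--prompt", default=None,
                    help="starting prompt; read from stdin if omitted")
args = parser.parse_args()

# Fall back to stdin only when no prompt was supplied, so scripted and
# batch invocations never block waiting for interactive input.
if args.prompt is not None:
    prompt = args.prompt
else:
    prompt = sys.stdin.readline().rstrip("\n")
```

Scripted use would then look like `python inference.py --prompt "Once upon a time"`, which is what makes batch generation and benchmarking straightforward.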
- Jan 25, 2024
Alexandru-Mihai GHERGHESCU authored
Inference example code. At the moment, the code simply loads a model state file and generates text from it. Parameters such as the maximum sequence length, whether training used fp16, and which tokenizer was used for training need to be passed manually by the user (there's a lot of room for error here). To be improved.

Merges changes from !14
Closes !14
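A sketch of the manual setup the commit describes, assuming a PyTorch state dict; every name and value below is illustrative, and each setting must be supplied by hand to match the training run.

```python
import torch

# All values below are illustrative assumptions; each one must be passed
# manually and must match training exactly (hence the room for error).
max_seq_len = 2048                  # maximum sequence length
use_fp16 = True                     # whether training used fp16
tokenizer_path = "tokenizer.model"  # tokenizer used for training

state = torch.load("model_state.pth", map_location="cpu")
# model = build_model(max_seq_len=max_seq_len, fp16=use_fp16)  # hypothetical
# model.load_state_dict(state)
```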