Fix tokenizer typos, add newlines
Fix some issues in the tokenizer.
Mainly, fix a problem where newlines were absent from the tokenizer model. As a result, all whitespace was silently collapsed and newlines were never emitted as tokens. This caused issues for datasets such as wikitext103, where newlines delimit titles and body text.
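The failure mode can be sketched with a minimal, hypothetical whitespace tokenizer (not the actual tokenizer model in this repo): splitting on arbitrary whitespace discards newlines, while matching them explicitly preserves document structure.

```python
import re

text = "= Title =\nBody text here."

# Buggy behavior: str.split() collapses all whitespace, including '\n',
# so the title/body boundary is lost.
buggy_tokens = text.split()

# Fixed behavior: match '\n' as its own token so it survives tokenization.
fixed_tokens = re.findall(r"\n|\S+", text)

assert "\n" not in buggy_tokens
assert "\n" in fixed_tokens
```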
Re-train the wikitext tokenizers with newlines included in the model.