Feature: Data collator for training

Not completely sure this is needed just yet, since chunking currently always drops the last batch. Implementing a data collator would mean the framework never has to throw data away: any leftover data that does not fill a batch could be padded out with padding tokens instead of being discarded.
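
A rough sketch of the idea, not a proposal for the final API. The names `chunk_keep_remainder` and `collate_with_padding`, the padding id `0`, and the attention mask output are all assumptions made for illustration; the framework's actual chunking code and tokenizer may look different.

```python
from typing import Dict, List, Sequence

PAD_TOKEN_ID = 0  # assumption: the real id would come from the tokenizer


def chunk_keep_remainder(tokens: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Split tokens into chunks, keeping the short final chunk instead of dropping it."""
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def collate_with_padding(
    sequences: Sequence[Sequence[int]],
    max_length: int,
    pad_token_id: int = PAD_TOKEN_ID,
) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to max_length and record which positions are real."""
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in sequences:
        if len(seq) > max_length:
            raise ValueError(f"sequence of length {len(seq)} exceeds max_length={max_length}")
        n_pad = max_length - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the loss should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


if __name__ == "__main__":
    tokens = list(range(1, 11))  # 10 tokens, not divisible by 4
    chunks = chunk_keep_remainder(tokens, chunk_size=4)
    batch = collate_with_padding(chunks, max_length=4)
    print(batch["input_ids"])
    print(batch["attention_mask"])
```

With 10 tokens and a chunk size of 4, the last chunk holds only 2 tokens. Rather than being dropped, it is padded to length 4, and the mask lets training skip the padded positions.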