Unverified Commit 7aa99b4a authored by Alexandru-Mihai GHERGHESCU

Fix bad calculation for number of batches

There was a corner case in which the shape of the dataset's predictions
`y` was incorrect, because the number of batches was miscalculated.

This happened when `batch_len` was exactly divisible by `seq_len`. The
predictions are simply the text shifted to the right by one, so the last
batch needs one extra column at the end, and in this case that column
does not exist.

Fix the above issue by decrementing the number of available batches by
1 when `batch_len` is exactly divisible by `seq_len`.
parent 4ab91bcf
1 merge request: !11 Fix a number of issues with the infrastructure, no major rework
@@ -51,8 +51,24 @@ class _OptimusDL(Iterable):
         print(f"Done. Took {time.time() - start:.2f}s.")
-        self.num_batches = torch.cat(self._data,
-                                     dim=-1).shape[0] // self.bs // self.seq_len
+        # pre-calculate the number of batches in the dataset
+        # Note: there's a special case we need to be careful about; since the
+        # predictions are simply the inputs shifted to the right by one value;
+        # there's a case when the dataset ends before we can get these
+        # shifted-right predictions; this occurs iff `batch_len % seq_len == 0`;
+        # to avoid this, we have to be explicit about the available number of
+        # batches (by simply subtracting 1 from the total number of available
+        # batches)
+        dataset_stream_len = 0
+        for sample in self._data:
+            dataset_stream_len += len(sample)
+        batch_len = dataset_stream_len // self.bs
+        self.num_batches = batch_len // self.seq_len
+        if batch_len % self.seq_len == 0:
+            self.num_batches -= 1
     def _process_data_before_iter(self):
         data = self._data
......
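
The arithmetic in the hunk above can be checked in isolation. The following is a minimal sketch, not the project's code: `count_batches` and `windows` are hypothetical helper names, plain Python lists stand in for tensors, and a single row of the batched stream is assumed. It mirrors the new calculation and shows that, without the decrement, the last target window comes up one token short when `batch_len` is a multiple of `seq_len`.

```python
def count_batches(sample_lens, bs, seq_len):
    """Mirror of the new calculation in the diff (hypothetical helper).

    sample_lens: lengths of the tokenized samples in the dataset.
    Edge cases such as an empty dataset are not handled here.
    """
    dataset_stream_len = sum(sample_lens)
    batch_len = dataset_stream_len // bs
    num_batches = batch_len // seq_len
    # When batch_len is an exact multiple of seq_len, the last window has
    # no token left over to serve as its final shifted target, so drop it.
    if batch_len % seq_len == 0:
        num_batches -= 1
    return num_batches


def windows(row, seq_len, num_batches):
    """Yield (x, y) pairs from one row; y is x shifted by one position."""
    for i in range(num_batches):
        start = i * seq_len
        x = row[start:start + seq_len]
        y = row[start + 1:start + seq_len + 1]
        yield x, y


if __name__ == "__main__":
    row = list(range(12))  # batch_len = 12, an exact multiple of seq_len = 4
    old_count = len(row) // 4  # the pre-fix count: one window too many
    new_count = count_batches([len(row)], bs=1, seq_len=4)
    for label, n in (("old", old_count), ("new", new_count)):
        y_lens = [len(y) for _, y in windows(row, 4, n)]
        print(label, "num_batches =", n, "target lengths =", y_lens)
```

When `batch_len` is not a multiple of `seq_len`, the leftover tokens already supply the extra target, so the count is left unchanged in that case.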