Unverified Commit 4ab91bcf authored by Alexandru-Mihai GHERGHESCU

Ignore last batches when calculating final train loss

This is a visual change only: it affects what the trainer reports as the
final training loss, nothing else.

It is not clear the previous value was accurate anyway, since with
gradient accumulation the optimizer does not step on every batch.

For a big enough dataset, this should not have any impact at all.

The final loss value is now reported from the last computation of the
loss, which correctly takes gradient accumulation into account.
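
As a rough illustration (a minimal sketch, not the project's actual
Trainer code; all names here are hypothetical), this is how a loss
reported at optimizer-step boundaries behaves under gradient
accumulation: trailing batches that never trigger a step simply do not
enter the last reported value.

    import math

    def train_epoch(batches, loss_for, grad_accum_steps=4):
        # loss_for(batch) is a stand-in for the forward pass returning a
        # scalar loss; the optimizer steps once per accumulation window.
        total_loss = 0.0
        last_reported = float('nan')
        for i, batch in enumerate(batches, start=1):
            total_loss += loss_for(batch)
            if i % grad_accum_steps == 0:
                # optimizer.step() would happen here; log the
                # window-averaged loss and reset the accumulator
                last_reported = total_loss / grad_accum_steps
                total_loss = 0.0
        # leftover batches (fewer than grad_accum_steps) never produce a
        # step, so they are ignored in the final loss and perplexity
        return last_reported, math.exp(last_reported)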
parent a092db0a
Merge request !11: Fix a number of issues with the infrastructure, no major rework
@@ -155,10 +155,6 @@ class Trainer():
                        f"~{self.ms_per_batch:.2f} ms/batch | " \
                        f" lr: {lr:.7f}"
-        # account for last batches when computing average train loss
-        self.train_loss = total_loss / (len(self.dl.train) % est_interval - 1)
-        self.train_ppl = math.exp(self.train_loss)
         pb.on_iter_end()

     def _do_epoch_validate(self):
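
For context on why the removed average was suspect (beyond the
gradient-accumulation point in the commit message): the divisor
len(self.dl.train) % est_interval - 1 can reach 0 or -1 at boundary
batch counts. A quick check with made-up numbers (illustration only,
not project code):

    for n_batches, est_interval in [(100, 10), (101, 10), (105, 10)]:
        print(n_batches % est_interval - 1)
    # -> -1 (sign flip), 0 (division by zero), 4 (as intended)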