
Fix a number of issues with the infrastructure, no major rework

Merged Alexandru-Mihai GHERGHESCU requested to merge fix/general_small_fixes into main
  1. Jan 24, 2024
      Fix final training loss calculation, fix estimation interval · 64302265
      Alexandru-Mihai GHERGHESCU authored
      Visual change: correctly display the final training loss.
      
      The final training loss didn't account for gradient accumulation, and
      was therefore reported as much smaller than it actually was.
      
      Also fix the estimation interval, which was likewise miscalculated due
      to gradient accumulation.
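      A minimal sketch of the scaling involved (the helper name and the
      accumulation factor are illustrative assumptions, not the project's
      actual code): with gradient accumulation over N micro-batches, each
      loss is divided by N before the backward pass, so the value shown to
      the user must be multiplied back by N to reflect the real loss.

      ```python
      ACCUM_STEPS = 4  # assumed accumulation factor, for illustration only

      def reported_loss(scaled_loss: float, accum_steps: int = ACCUM_STEPS) -> float:
          """Recover the true per-batch loss from the value that was
          divided by accum_steps for the backward pass."""
          return scaled_loss * accum_steps
      ```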
      Fix bad calculation for number of batches · 7aa99b4a
      Alexandru-Mihai GHERGHESCU authored
      There was a corner case where the shape of the predictions y of the
      dataset was incorrect, because the number of batches was
      miscalculated.
      
      This happened when `batch_len` was exactly divisible by `seq_len`:
      the predictions, which are simply the text shifted one position to
      the right, lack the extra column at the end.
      
      Fix the issue by decrementing the number of available batches by 1
      when `batch_len` is exactly divisible by `seq_len`.
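      The fix above can be sketched as follows (the function name is
      hypothetical; the repository's actual code may differ): the targets y
      are the inputs shifted right by one token, so a final batch that ends
      exactly at the data boundary has no extra token to shift into, and
      one fewer batch is usable.

      ```python
      def num_batches(batch_len: int, seq_len: int) -> int:
          """Number of (x, y) batches available, where y is x shifted
          right by one token.  When batch_len divides evenly by seq_len,
          the last batch has no extra column for the shifted targets,
          so one fewer batch is available."""
          n = batch_len // seq_len
          if batch_len % seq_len == 0:
              n -= 1
          return n
      ```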
      Ignore last batches when calculating final train loss · 4ab91bcf
      Alexandru-Mihai GHERGHESCU authored
      Visual change. This only changes what the trainer reports as the
      final training loss.
      
      The previous value was likely inaccurate anyway, since gradient
      accumulation prevents the optimizer from stepping on every batch.
      
      For a large enough dataset, this should have no noticeable impact.
      
      The final loss value is now reported from the last loss calculation,
      correctly taking gradient accumulation into account as well.
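      A hedged sketch of the reporting logic described above (the function
      name and signature are assumptions for illustration): the final loss
      is taken from the last complete accumulation window, and any trailing
      batches that never triggered an optimizer step are ignored.

      ```python
      def final_train_loss(batch_losses: list[float], accum_steps: int) -> float:
          """Average the losses of the last *complete* accumulation
          window, ignoring trailing batches that never produced an
          optimizer step."""
          n_complete = (len(batch_losses) // accum_steps) * accum_steps
          window = batch_losses[n_complete - accum_steps : n_complete]
          return sum(window) / accum_steps
      ```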
  2. Jan 22, 2024