    4ab91bcf
    Ignore last batches when calculating final train loss · 4ab91bcf
    Alexandru-Mihai GHERGHESCU authored
    Visual change. This only changes what the trainer reports as the final
    training loss.
    
    Not quite sure the previous value was accurate anyway, since with
    gradient accumulation the optimizer does not step on every batch.
    
    For a big enough dataset, this should not have any impact at all.
    
    The final loss value is now reported from the last calculation of the
    loss, which correctly takes gradient accumulation into account.
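
The diff itself is not shown here, so the following is only a minimal sketch of the idea, assuming the reported loss is the mean over the last complete gradient accumulation window and that trailing batches which never trigger an optimizer step are dropped. The function name `final_train_loss` and its parameters are hypothetical and are not taken from the trainer's code.

```python
def final_train_loss(batch_losses, accumulation_steps):
    """Mean loss of the last complete accumulation window (hypothetical helper).

    Trailing batches that do not fill a whole window are ignored, on the
    assumption that no optimizer step is taken for them.
    """
    if accumulation_steps < 1:
        raise ValueError("accumulation_steps must be >= 1")

    # Number of batches that belong to complete accumulation windows.
    complete = len(batch_losses) // accumulation_steps * accumulation_steps
    if complete == 0:
        return None  # no window was ever completed, so nothing to report

    # Average only over the last complete window.
    window = batch_losses[complete - accumulation_steps:complete]
    return sum(window) / accumulation_steps


# Five batches with accumulation over 2 batches: the fifth batch is ignored.
losses = [4.0, 3.0, 2.0, 1.0, 0.5]
print(final_train_loss(losses, accumulation_steps=2))  # prints 1.5
```

With a large dataset the dropped tail is at most `accumulation_steps - 1` batches, which is why the reported value should barely change.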