  1. Jun 10, 2024
    • Fix inference code · cb1a7974
      Alexandru-Mihai GHERGHESCU authored
      This should now work with any PyTorch model (Optimus is the example
      given in the source code), as well as any HuggingFace model (the code
      was adjusted to be independent of the model source).
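      A minimal sketch of what model-source-independent greedy decoding could
      look like; the helper name and the tokenizer interface are assumptions,
      not the repository's actual inference API:

          import torch

          @torch.no_grad()
          def generate(model, tokenizer, prompt, max_new_tokens=50):
              # Only assumes the model maps token ids to logits, so a plain
              # PyTorch nn.Module (e.g. Optimus) or a HuggingFace model both work.
              ids = torch.tensor([tokenizer.encode(prompt)])
              for _ in range(max_new_tokens):
                  out = model(ids)
                  # HuggingFace models return an output object with a .logits field.
                  logits = out.logits if hasattr(out, 'logits') else out
                  next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
                  ids = torch.cat([ids, next_id], dim=-1)
              return tokenizer.decode(ids[0].tolist())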
  2. Jun 04, 2024
  3. Jun 03, 2024
    • Move to HuggingFace datasets · ed936b00
      Alexandru-Mihai GHERGHESCU authored
      This should be much easier to work with, as we don't have to build a
      separate dataset each time. The HuggingFace datasets library also offers
      useful functionality which we can use without loss of performance.
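      A minimal sketch of loading a dataset through the HuggingFace datasets
      library; the dataset name here is only an example, not necessarily the
      one used in the repository:

          from datasets import load_dataset

          # Downloads and caches the dataset; subsequent loads reuse the cache.
          ds = load_dataset('wikitext', 'wikitext-103-raw-v1', split='train')
          print(ds[0]['text'])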
    • Change quotes style · cb1a3343
      Alexandru-Mihai GHERGHESCU authored
      Use double quotes for docstrings and for f-strings which contain single
      quotes inside; use single quotes everywhere else.
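      A small illustration of the convention described above (the function
      itself is made up):

          GREETING = 'hello'                              # plain strings: single quotes

          def greet(name):
              """Return a greeting for the given name."""   # docstrings: double quotes
              return f"{GREETING} {name}, how's it going?"  # f-string with a single quote inside: double quotes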
    • Add gradient checkpointing option to Optimus · 8247f4a4
      Alexandru-Mihai GHERGHESCU authored
      Gradient (or activation) checkpointing trades extra compute for saved
      memory. Overall, this should make it easier to train large models on
      not-so-large hardware.

      Add checkpointing to every layer (same as HuggingFace), as opposed to
      every 2-3 layers, since 1) it is the easiest to implement, and 2) it has
      the best balance between memory and compute.
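      A minimal sketch of what per-layer checkpointing can look like in plain
      PyTorch; the class and flag names are illustrative, not Optimus' actual
      API:

          import torch.nn as nn
          from torch.utils.checkpoint import checkpoint

          class TransformerStack(nn.Module):
              def __init__(self, layers, gradient_checkpointing=False):
                  super().__init__()
                  self.layers = nn.ModuleList(layers)
                  self.gradient_checkpointing = gradient_checkpointing

              def forward(self, x):
                  for layer in self.layers:
                      if self.gradient_checkpointing and self.training:
                          # Don't store this layer's activations; recompute
                          # them during the backward pass instead.
                          x = checkpoint(layer, x, use_reentrant=False)
                      else:
                          x = layer(x)
                  return x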
    • Add PyTorch built-in SDPA to Optimus · 70ccb523
      Alexandru-Mihai GHERGHESCU authored
      Add PyTorch's built-in scaled dot-product attention (SDPA) to Optimus.
      This automatically uses Flash Attention 2 or memory-efficient attention
      if the hardware supports it; if it doesn't, it falls back to the manual
      implementation.

      Training should be much faster with this; memory usage should also be
      around half of what it was before.
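      A minimal sketch of the SDPA call; the shapes are illustrative
      (batch, n_heads, seq_len, head_dim):

          import torch
          import torch.nn.functional as F

          q = torch.randn(1, 8, 128, 64)
          k = torch.randn(1, 8, 128, 64)
          v = torch.randn(1, 8, 128, 64)

          # On supported GPUs and dtypes, PyTorch dispatches this to the
          # Flash Attention 2 / memory-efficient kernels; otherwise it runs
          # the plain math implementation.
          out = F.scaled_dot_product_attention(q, k, v, is_causal=True)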
    • README cleanup · 209826e4
      Alexandru-Mihai GHERGHESCU authored
    • Move Optimus configuration into separate config class · a91b0d2a
      Alexandru-Mihai GHERGHESCU authored
      This should be much nicer to work with, since every option/setting of
      the model can be controlled through a dataclass; the config can also be
      created easily from a JSON file.

      Set a naming scheme for the Optimus model, similar to HuggingFace
      models.
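      A minimal sketch of a dataclass-based config that can be built from a
      JSON file; the field names are made up, not Optimus' actual settings:

          import json
          from dataclasses import dataclass

          @dataclass
          class OptimusConfig:
              vocab_size: int = 32000
              hidden_size: int = 768
              n_layers: int = 12
              n_heads: int = 12

              @classmethod
              def from_json_file(cls, path):
                  with open(path) as f:
                      return cls(**json.load(f))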
  4. Feb 16, 2024
  5. Feb 15, 2024
    • Merge branch 'feature/fp16' into 'main' · 0faac554
      Vlad-Andrei BĂDOIU (78692) authored
      Add fp16 mixed precision training
      
      See merge request !17
    • Adjust optimizer epsilon value for AMP · 8579fc15
      Alexandru-Mihai GHERGHESCU authored
      Pick a better default epsilon value. Since this value should never touch
      the fp16 gradients in mixed precision training (the optimizer should
      only ever work on the master fp32 copy of the model), it didn't strictly
      need to be changed. However, in pure fp16 training, any epsilon value
      lower than 1e-7 would simply underflow to 0, making it useless.

      Although the framework doesn't directly support the latter case, an
      epsilon value of 1e-7 seems like a better default for both AMP and
      normal training.
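      A quick check of the underflow behaviour, assuming standard IEEE
      float16 (whose smallest positive value is roughly 6e-8):

          import torch

          # The common optimizer default of 1e-8 flushes to zero in fp16,
          # while 1e-7 survives as the nearest representable value.
          print(torch.tensor(1e-8, dtype=torch.float16))  # tensor(0., dtype=torch.float16)
          print(torch.tensor(1e-7, dtype=torch.float16))  # tensor(1.1921e-07, dtype=torch.float16)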
    • Add fp16 mixed precision training · 6db26eb1
      Alexandru-Mihai GHERGHESCU authored
      This should give training a theoretical 2x speedup in time (though in
      practice that's usually not the case), with close to no loss in
      performance.

      The interface allows the user to choose between mixed precision training
      and no mixed precision, which falls back to normal float32 precision.

      CPU support for training has been dropped, as training on CPU takes
      (with or without mixed precision) much, much longer than on GPUs, and
      it's not really an alternative anyone considers. With the addition of
      mixed precision, supporting both CPU and GPU would complicate things too
      much.
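      A minimal sketch of an AMP training step in PyTorch; the model,
      optimizer and loss function are assumed to exist, and this is not the
      repository's actual trainer code:

          import torch

          scaler = torch.cuda.amp.GradScaler()

          def train_step(model, optimizer, loss_fn, x, y):
              optimizer.zero_grad()
              with torch.cuda.amp.autocast():  # run the forward pass in fp16 where safe
                  loss = loss_fn(model(x), y)
              scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
              scaler.step(optimizer)           # unscales grads; skips the step on inf/nan
              scaler.update()
              return loss.item()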
  6. Jan 30, 2024
  7. Jan 29, 2024
  8. Jan 28, 2024
  9. Jan 26, 2024
  10. Jan 25, 2024
  11. Jan 24, 2024