Feature: training visualization (WandB, TensorBoard)
Use WandB or TensorBoard to visualize training metrics. One option is a mechanism similar to HuggingFace's, where these packages sit behind a generic interface and can therefore be enabled or disabled easily.
The visualizations should cover metrics such as memory usage, loss, learning rate decay, and computation / communication time.
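
To make the proposal concrete, here is a minimal sketch of what such a generic interface could look like. All names (`MetricLogger`, `CompositeLogger`, `build_loggers`, the `report_to` list) are hypothetical and only loosely modeled on the `report_to` option in HuggingFace's `TrainingArguments`; this is not an existing API in this project. The WandB and TensorBoard backends import their packages lazily, so neither needs to be installed unless it is actually enabled.

```python
from typing import Dict, List, Protocol, Tuple


class MetricLogger(Protocol):
    """Generic interface that every backend implements (hypothetical name)."""

    def log(self, metrics: Dict[str, float], step: int) -> None: ...
    def close(self) -> None: ...


class InMemoryLogger:
    """Dependency-free backend, useful for tests and dry runs."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, float]]] = []

    def log(self, metrics: Dict[str, float], step: int) -> None:
        self.records.append((step, dict(metrics)))

    def close(self) -> None:
        pass


class TensorBoardLogger:
    def __init__(self, log_dir: str = "runs") -> None:
        # Imported lazily so torch/tensorboard are only needed when enabled.
        from torch.utils.tensorboard import SummaryWriter
        self._writer = SummaryWriter(log_dir=log_dir)

    def log(self, metrics: Dict[str, float], step: int) -> None:
        for name, value in metrics.items():
            self._writer.add_scalar(name, value, global_step=step)

    def close(self) -> None:
        self._writer.close()


class WandbLogger:
    def __init__(self, project: str = "training") -> None:
        # Imported lazily so wandb is only needed when enabled.
        import wandb
        self._wandb = wandb
        self._run = wandb.init(project=project)

    def log(self, metrics: Dict[str, float], step: int) -> None:
        self._wandb.log(metrics, step=step)

    def close(self) -> None:
        self._run.finish()


class CompositeLogger:
    """Fans each call out to every enabled backend."""

    def __init__(self, backends: List[MetricLogger]) -> None:
        self.backends = backends

    def log(self, metrics: Dict[str, float], step: int) -> None:
        for backend in self.backends:
            backend.log(metrics, step)

    def close(self) -> None:
        for backend in self.backends:
            backend.close()


# Assumed registry of backend names; extend as needed.
_BACKENDS = {
    "memory": InMemoryLogger,
    "tensorboard": TensorBoardLogger,
    "wandb": WandbLogger,
}


def build_loggers(report_to: List[str]) -> CompositeLogger:
    """Build a logger from backend names; an empty list disables logging."""
    unknown = [name for name in report_to if name not in _BACKENDS]
    if unknown:
        raise ValueError(f"Unknown logging backend(s): {unknown}")
    return CompositeLogger([_BACKENDS[name]() for name in report_to])
```

With something like this, the training loop would only ever call `logger.log({"loss": ..., "lr": ..., "mem_used_mb": ..., "comm_time_s": ...}, step=step)`, and switching backends would be a configuration change such as `build_loggers(["wandb", "tensorboard"])` or `build_loggers([])`.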