NetSys / Optimus Prime · Merge requests · !11
Fix a number of issues with the infrastructure, no major rework
Merged · Alexandru-Mihai GHERGHESCU requested to merge fix/general_small_fixes into main · 1 year ago
Fix a number of issues present in the code:
- type issues
- README.md issues
- variable initializations
- model issues
- training loss / number of batches calculation issues
optimus/trainer.py (+5 −5)
@@ -102,12 +102,12 @@ class Trainer():
     def _do_epoch_train(self):
         self.model.train()  # put model in training mode

-        # compute average train loss, train ppl and ms/batch every ~200 batches
-        # (depending on gradient accumulation steps), or every 10% of training
-        # dataset (whichever is smaller)
+        # compute average train loss, train ppl and ms/batch every ~200 batches,
+        # or every 10% of training dataset (whichever is smaller), rounded to
+        # gradient accumulation steps
         self.ms_per_batch = 0.
         total_loss = 0.
-        est_interval = int(max(min(200 // self.grad_acc_steps, 0.1 * len(self.dl.train)), 1)) * self.grad_acc_steps
+        est_interval = int(max(min(200, 0.1 * len(self.dl.train)), 1)) // self.grad_acc_steps * self.grad_acc_steps

         start_time = time.time()

         # progress bar for batches
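The est_interval change affects how the reporting interval interacts with the 10%-of-dataset cap: the old expression applied the cap before multiplying back by grad_acc_steps, so the resulting interval could overshoot it, while the new expression caps first and then rounds the count down to a multiple of grad_acc_steps. A small standalone sketch with made-up values (grad_acc_steps = 8, a 1000-batch training set) illustrates the difference:

# Sketch comparing the old and new est_interval formulas from this hunk.
# grad_acc_steps and train_batches are illustrative values, not project settings.
grad_acc_steps = 8
train_batches = 1000            # stands in for len(self.dl.train)

# old: divide the ~200-batch target by the accumulation steps, cap, then
# multiply back -> the cap is applied to the divided value, so the final
# interval can exceed 10% of the dataset
old = int(max(min(200 // grad_acc_steps, 0.1 * train_batches), 1)) * grad_acc_steps

# new: cap at ~200 batches or 10% of the dataset first, then round that
# count down to a multiple of grad_acc_steps
new = int(max(min(200, 0.1 * train_batches), 1)) // grad_acc_steps * grad_acc_steps

print(old, new)   # 200 96 -> old reports every 20% of the dataset, new stays under 10%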
@@ -145,7 +145,7 @@ class Trainer():
             # update train loss, train ppl and estimated ms/batch
             if (i + 1) % est_interval == 0:
                 self.ms_per_batch = (time.time() - start_time) * 1000 / est_interval
-                self.train_loss = total_loss / est_interval
+                self.train_loss = (total_loss * self.grad_acc_steps) / est_interval
                 self.train_ppl = math.exp(self.train_loss)
                 total_loss = 0.
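The train_loss change makes sense if the training loop divides each micro-batch loss by grad_acc_steps before adding it to total_loss; that loop is not part of this diff, so treat it as an assumption here. Under that assumption, multiplying total_loss back by grad_acc_steps restores the per-batch loss before perplexity is computed, as in this sketch with made-up numbers:

import math

# Sketch of why total_loss is multiplied back by grad_acc_steps before averaging.
# Assumes each micro-batch loss is scaled by 1/grad_acc_steps when accumulated
# (a common gradient-accumulation pattern; the accumulation code is not shown
# in this diff). All numbers are illustrative.
grad_acc_steps = 4
est_interval = 8                      # batches per reporting interval
per_batch_loss = 2.0                  # pretend every micro-batch has this loss

# what total_loss would hold after one reporting interval under that assumption
total_loss = sum(per_batch_loss / grad_acc_steps for _ in range(est_interval))

old_train_loss = total_loss / est_interval                      # 0.5, understates the loss
new_train_loss = (total_loss * grad_acc_steps) / est_interval   # 2.0, matches per_batch_loss

print(old_train_loss, new_train_loss, math.exp(new_train_loss))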