utils.schedulers
Module for custom LRScheduler classes
Classes
| Name | Description |
|---|---|
| InterpolatingLogScheduler | A scheduler that interpolates learning rates in a logarithmic fashion |
| JaggedLRRestartScheduler | Wraps another scheduler to apply per-lora-restart learning rate warmups. |
| RexLR | Reflected Exponential (REX) learning rate scheduler. |
InterpolatingLogScheduler
```python
utils.schedulers.InterpolatingLogScheduler(
    optimizer,
    num_steps,
    min_lr,
    max_lr,
    last_epoch=-1,
)
```

A scheduler that interpolates learning rates in a logarithmic fashion.
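A minimal usage sketch, assuming the constructor signature above and the standard PyTorch scheduler stepping convention; the optimizer choice, import path, and hyperparameter values are illustrative only:

```python
import torch

from utils.schedulers import InterpolatingLogScheduler

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Interpolate the learning rate logarithmically between min_lr and max_lr
# over num_steps scheduler steps.
scheduler = InterpolatingLogScheduler(
    optimizer, num_steps=100, min_lr=1e-6, max_lr=1e-3
)

for _ in range(5):
    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())  # inspect the interpolated learning rate
```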
JaggedLRRestartScheduler
```python
utils.schedulers.JaggedLRRestartScheduler(
    optimizer,
    inner_schedule,
    jagged_restart_steps,
    jagged_restart_warmup_steps,
    jagged_restart_anneal_steps=1,
    min_lr_scale=0.001,
)
```

Wraps another scheduler to apply per-lora-restart learning rate warmups.
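A rough sketch of wrapping an inner schedule. Treating `inner_schedule` as an already-constructed scheduler instance is an assumption (the signature above does not spell out its expected type), and the restart and warmup step counts are illustrative:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

from utils.schedulers import JaggedLRRestartScheduler

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Assumed: the wrapper modulates this inner schedule with periodic re-warmups.
inner = CosineAnnealingLR(optimizer, T_max=1_000)

scheduler = JaggedLRRestartScheduler(
    optimizer,
    inner_schedule=inner,
    jagged_restart_steps=200,         # re-warm the LR every 200 steps
    jagged_restart_warmup_steps=20,   # warmup length after each restart
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()
```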
RexLR
```python
utils.schedulers.RexLR(
    optimizer,
    max_lr,
    min_lr,
    total_steps=0,
    num_warmup_steps=0,
    last_step=0,
)
```

Reflected Exponential (REX) learning rate scheduler.
- Original implementation: https://github.com/IvanVassi/REX_LR
- Original license: Apache 2.0
- Based on: https://arxiv.org/abs/2107.04197
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | torch.optim.Optimizer | The optimizer to schedule the learning rate for. | required |
| max_lr | float | The maximum learning rate. | required |
| min_lr | float | The minimum learning rate. | required |
| total_steps | int | The total number of training steps. | 0 |
| num_warmup_steps | int | The number of warmup steps. | 0 |
| last_step | int | The index of the last step. | 0 |
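A minimal usage sketch, assuming the standard PyTorch scheduler stepping convention; the optimizer choice and step counts are illustrative:

```python
import torch

from utils.schedulers import RexLR

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Warm up for the first 100 steps, then follow the REX decay
# from max_lr down to min_lr over the remaining steps.
scheduler = RexLR(
    optimizer,
    max_lr=1e-3,
    min_lr=1e-5,
    total_steps=1_000,
    num_warmup_steps=100,
)

for _ in range(1_000):
    optimizer.step()   # backward pass / gradient computation elided
    scheduler.step()
```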
Functions
| Name | Description |
|---|---|
| get_cosine_schedule_with_min_lr | Create a learning rate schedule with linear warmup followed by cosine annealing to a minimum learning rate. |
| get_cosine_schedule_with_quadratic_warmup | Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer and 0, after a quadratic warmup period. |
| get_cosine_schedule_with_warmup_decay_constant | Implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf) |
get_cosine_schedule_with_min_lr
```python
utils.schedulers.get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    min_lr_ratio=0.0,
)
```

Create a learning rate schedule which has:

- linear warmup from 0 -> `max_lr` over `num_warmup_steps`
- cosine learning rate annealing from `max_lr` -> `min_lr` over `num_training_steps`
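A minimal usage sketch; the returned scheduler is assumed to follow the usual `LambdaLR`-style stepping convention, and the step counts and ratio are illustrative:

```python
import torch

from utils.schedulers import get_cosine_schedule_with_min_lr

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Linear warmup for 50 steps, then cosine annealing down to 10% of the peak LR.
scheduler = get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps=50,
    num_training_steps=1_000,
    min_lr_ratio=0.1,
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()
```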
get_cosine_schedule_with_quadratic_warmup
```python
utils.schedulers.get_cosine_schedule_with_quadratic_warmup(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    num_cycles=0.5,
    last_epoch=-1,
)
```

Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer and 0, after a warmup period during which it increases quadratically between 0 and the initial lr set in the optimizer.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | [~torch.optim.Optimizer] | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | int | The number of steps for the warmup phase. | required |
| num_training_steps | int | The total number of training steps. | required |
| num_cycles | float, optional | The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| last_epoch | int, optional | The index of the last epoch when resuming training. | -1 |
Return
torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
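A short sketch that constructs the returned `LambdaLR` and prints the warmup portion of the curve; the optimizer choice and step counts are illustrative:

```python
import torch

from utils.schedulers import get_cosine_schedule_with_quadratic_warmup

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

scheduler = get_cosine_schedule_with_quadratic_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=100,
)

# During warmup the LR grows towards the optimizer's initial lr,
# then follows a half-cosine decay to 0 over the remaining steps.
for step in range(10):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())
```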
get_cosine_schedule_with_warmup_decay_constant
```python
utils.schedulers.get_cosine_schedule_with_warmup_decay_constant(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    constant_lr_ratio,
    min_lr_ratio,
    num_cycles=0.5,
    last_epoch=-1,
)
```

Implementation of "Continual Pre-Training of Large Language Models: How to (re)warm your model?" (https://arxiv.org/pdf/2308.04014.pdf).

Create a schedule with a learning rate that, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer, decreases following the values of the cosine function from the initial lr to `min_lr_ratio` times the initial lr until `num_training_steps * constant_lr_ratio` steps, and then stays constant at that minimum value.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | [~torch.optim.Optimizer] | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | int | The number of steps for the warmup phase. | required |
| num_training_steps | int | The total number of training steps. | required |
| constant_lr_ratio | float | The fraction of num_training_steps during which the learning rate decreases following the cosine function. | required |
| min_lr_ratio | float | The ratio of the maximum learning rate that the cosine function decays to as the minimum learning rate. | required |
| num_cycles | float, optional | The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| last_epoch | int, optional | The index of the last epoch when resuming training. | -1 |
Return
torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
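A minimal usage sketch; with the illustrative values below, the cosine phase would end at step 800 (`num_training_steps * constant_lr_ratio`), after which the learning rate stays at 10% of the peak:

```python
import torch

from utils.schedulers import get_cosine_schedule_with_warmup_decay_constant

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

scheduler = get_cosine_schedule_with_warmup_decay_constant(
    optimizer,
    num_warmup_steps=100,      # linear warmup
    num_training_steps=1_000,
    constant_lr_ratio=0.8,     # cosine decay until 80% of training
    min_lr_ratio=0.1,          # then hold at 10% of the peak LR
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()
```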