utils.schedulers
Module for custom LRScheduler classes
Classes
| Name | Description |
|---|---|
| InterpolatingLogScheduler | A scheduler that interpolates learning rates in a logarithmic fashion |
| JaggedLRRestartScheduler | Wraps another scheduler to apply per-lora-restart learning rate warmups. |
| RexLR | Reflected Exponential (REX) learning rate scheduler. |
InterpolatingLogScheduler
```python
utils.schedulers.InterpolatingLogScheduler(
    optimizer,
    num_steps,
    min_lr,
    max_lr,
    last_epoch=-1,
)
```

A scheduler that interpolates learning rates in a logarithmic fashion.
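A minimal usage sketch, assuming the constructor signature above and the standard PyTorch scheduler stepping convention; the optimizer choice, import path, and hyperparameter values are illustrative only:

```python
import torch

from utils.schedulers import InterpolatingLogScheduler

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Interpolate the learning rate logarithmically between min_lr and max_lr
# over num_steps scheduler steps.
scheduler = InterpolatingLogScheduler(
    optimizer, num_steps=100, min_lr=1e-6, max_lr=1e-3
)

for _ in range(5):
    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())  # inspect the interpolated learning rate
```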
JaggedLRRestartScheduler
```python
utils.schedulers.JaggedLRRestartScheduler(
    optimizer,
    inner_schedule,
    jagged_restart_steps,
    jagged_restart_warmup_steps,
    jagged_restart_anneal_steps=1,
    min_lr_scale=0.001,
)
```

Wraps another scheduler to apply per-lora-restart learning rate warmups.
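A rough sketch of wrapping an inner schedule. Treating `inner_schedule` as an already-constructed scheduler instance is an assumption (the signature above does not spell out its expected type), and the restart and warmup step counts are illustrative:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

from utils.schedulers import JaggedLRRestartScheduler

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Assumed: the wrapper modulates this inner schedule with periodic re-warmups.
inner = CosineAnnealingLR(optimizer, T_max=1_000)

scheduler = JaggedLRRestartScheduler(
    optimizer,
    inner_schedule=inner,
    jagged_restart_steps=200,         # re-warm the LR every 200 steps
    jagged_restart_warmup_steps=20,   # warmup length after each restart
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()
```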
RexLR
```python
utils.schedulers.RexLR(
    optimizer,
    max_lr,
    min_lr,
    total_steps=0,
    num_warmup_steps=0,
    last_step=0,
)
```

Reflected Exponential (REX) learning rate scheduler.
- Original implementation: https://github.com/IvanVassi/REX_LR
- Original license: Apache 2.0
- Based on: https://arxiv.org/abs/2107.04197
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | torch.optim.Optimizer | The optimizer to schedule the learning rate for. | required |
| max_lr | float | The maximum learning rate. | required |
| min_lr | float | The minimum learning rate. | required |
| total_steps | int | The total number of training steps. | 0 |
| num_warmup_steps | int | The number of warmup steps. | 0 |
| last_step | int | The index of the last step. | 0 |
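A minimal usage sketch, assuming the standard PyTorch scheduler stepping convention; the optimizer choice and step counts are illustrative:

```python
import torch

from utils.schedulers import RexLR

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Warm up for the first 100 steps, then follow the REX decay
# from max_lr down to min_lr over the remaining steps.
scheduler = RexLR(
    optimizer,
    max_lr=1e-3,
    min_lr=1e-5,
    total_steps=1_000,
    num_warmup_steps=100,
)

for _ in range(1_000):
    optimizer.step()   # backward pass / gradient computation elided
    scheduler.step()
```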
Functions
| Name | Description |
|---|---|
| get_cosine_schedule_with_min_lr | Create a learning rate schedule with linear warmup followed by cosine annealing to a minimum learning rate. |
| get_cosine_schedule_with_quadratic_warmup | Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer and 0, after a quadratic warmup period. |
| get_cosine_schedule_with_warmup_decay_constant | Implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf) |
get_cosine_schedule_with_min_lr
```python
utils.schedulers.get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    min_lr_ratio=0.0,
)
```

Create a learning rate schedule which has:

- linear warmup from 0 -> `max_lr` over `num_warmup_steps`
- cosine learning rate annealing from `max_lr` -> `min_lr` over `num_training_steps`
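A minimal usage sketch; the returned scheduler is assumed to follow the usual `LambdaLR`-style stepping convention, and the step counts and ratio are illustrative:

```python
import torch

from utils.schedulers import get_cosine_schedule_with_min_lr

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Linear warmup for 50 steps, then cosine annealing down to 10% of the peak LR.
scheduler = get_cosine_schedule_with_min_lr(
    optimizer,
    num_warmup_steps=50,
    num_training_steps=1_000,
    min_lr_ratio=0.1,
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()
```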
get_cosine_schedule_with_quadratic_warmup
```python
utils.schedulers.get_cosine_schedule_with_quadratic_warmup(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    num_cycles=0.5,
    last_epoch=-1,
)
```

Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer and 0, after a warmup period during which it increases quadratically between 0 and the initial lr set in the optimizer.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | [~torch.optim.Optimizer] | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | int | The number of steps for the warmup phase. | required |
| num_training_steps | int | The total number of training steps. | required |
| num_cycles | float, optional | The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| last_epoch | int, optional | The index of the last epoch when resuming training. | -1 |
Return
torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
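A short sketch that constructs the returned `LambdaLR` and prints the warmup portion of the curve; the optimizer choice and step counts are illustrative:

```python
import torch

from utils.schedulers import get_cosine_schedule_with_quadratic_warmup

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

scheduler = get_cosine_schedule_with_quadratic_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=100,
)

# During warmup the LR grows towards the optimizer's initial lr,
# then follows a half-cosine decay to 0 over the remaining steps.
for step in range(10):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())
```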
get_cosine_schedule_with_warmup_decay_constant
```python
utils.schedulers.get_cosine_schedule_with_warmup_decay_constant(
    optimizer,
    num_warmup_steps,
    num_training_steps,
    constant_lr_ratio,
    min_lr_ratio,
    num_cycles=0.5,
    last_epoch=-1,
)
```

Implementation of "Continual Pre-Training of Large Language Models: How to (re)warm your model?" (https://arxiv.org/pdf/2308.04014.pdf).

Create a schedule with a learning rate that, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer, decreases following the values of the cosine function from the initial lr to `min_lr_ratio` times the initial lr until `num_training_steps * constant_lr_ratio` steps, and then stays constant at that minimum value.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | [~torch.optim.Optimizer] | The optimizer for which to schedule the learning rate. | required |
| num_warmup_steps | int | The number of steps for the warmup phase. | required |
| num_training_steps | int | The total number of training steps. | required |
| constant_lr_ratio | float | The fraction of num_training_steps during which the learning rate decreases following the cosine function. | required |
| min_lr_ratio | float | The ratio of the maximum learning rate that the cosine function decays to as the minimum learning rate. | required |
| num_cycles | float, optional | The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). | 0.5 |
| last_epoch | int, optional | The index of the last epoch when resuming training. | -1 |
Return
torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
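A minimal usage sketch; with the illustrative values below, the cosine phase would end at step 800 (`num_training_steps * constant_lr_ratio`), after which the learning rate stays at 10% of the peak:

```python
import torch

from utils.schedulers import get_cosine_schedule_with_warmup_decay_constant

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

scheduler = get_cosine_schedule_with_warmup_decay_constant(
    optimizer,
    num_warmup_steps=100,      # linear warmup
    num_training_steps=1_000,
    constant_lr_ratio=0.8,     # cosine decay until 80% of training
    min_lr_ratio=0.1,          # then hold at 10% of the peak LR
)

for _ in range(1_000):
    optimizer.step()
    scheduler.step()
```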