common.datasets
Dataset loading utilities.
Classes
| Name | Description |
|---|---|
| TrainDatasetMeta | Dataclass with fields for training and validation datasets and metadata. |
TrainDatasetMeta
common.datasets.TrainDatasetMeta(
train_dataset,
eval_dataset=None,
total_num_steps=None,
)
Dataclass with fields for training and validation datasets and metadata.
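The following is a minimal sketch of a dataclass with the same three fields. It may help when reasoning about what callers of the loaders receive. The class name `TrainDatasetMetaSketch` and the field annotations (`Any`, `Optional[int]`) are assumptions for illustration; the actual class may annotate its fields differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrainDatasetMetaSketch:
    """Illustrative stand-in mirroring the documented fields of TrainDatasetMeta."""

    # Required: the dataset used for training.
    train_dataset: Any
    # Optional: validation/evaluation dataset; None when no eval split exists.
    eval_dataset: Optional[Any] = None
    # Optional: total number of training steps computed for the run (assumed int).
    total_num_steps: Optional[int] = None


meta = TrainDatasetMetaSketch(train_dataset=["example 1", "example 2"])
print(meta.eval_dataset, meta.total_num_steps)  # None None
```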
Functions
| Name | Description |
|---|---|
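| load_datasets | Loads one or more training or evaluation datasets, calling axolotl.utils.data.prepare_datasets. |
| load_preference_datasets | Loads one or more training or evaluation datasets for RL training using paired preference data, calling axolotl.utils.data.rl.prepare_preference_datasets. |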
| sample_dataset | Randomly sample num_samples samples with replacement from dataset. |
load_datasets
common.datasets.load_datasets(cfg, cli_args=None, debug=False)
Loads one or more training or evaluation datasets, calling axolotl.utils.data.prepare_datasets. Optionally logs debug information.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
| cli_args | PreprocessCliArgs \| TrainerCliArgs \| None | Command-specific CLI arguments. | None |
| debug | bool | Whether to print the tokenization of a sample. This option is duplicated in cfg and cli_args but is kept because it is used in our Colab notebooks. | False |
Returns
| Name | Type | Description |
|---|---|---|
| | TrainDatasetMeta | Dataclass with fields for training and evaluation datasets and the computed total_num_steps. |
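A hypothetical sketch of the control flow the description implies: delegate to a preparation function, optionally print a sample when `debug` is set, and wrap the results in a dataclass. `hypothetical_prepare_datasets` is a toy stand-in for `axolotl.utils.data.prepare_datasets`, not its real behavior; its config keys (`train_texts`, `eval_texts`, `batch_size`, `num_epochs`) and its step-count formula are invented for illustration, and printing a raw sample stands in for the tokenization printout.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class TrainDatasetMetaSketch:
    train_dataset: Any
    eval_dataset: Optional[Any] = None
    total_num_steps: Optional[int] = None


def hypothetical_prepare_datasets(cfg: dict) -> Tuple[list, Optional[list], int]:
    """Toy stand-in for axolotl.utils.data.prepare_datasets (NOT its real logic)."""
    train = [{"text": t} for t in cfg.get("train_texts", [])]
    # An empty eval list is reported as None, matching the dataclass default.
    evals = [{"text": t} for t in cfg.get("eval_texts", [])] or None
    # Invented step count: ceil(len(train) / batch_size) batches per epoch.
    batch_size = cfg.get("batch_size", 1)
    steps = -(-len(train) // batch_size) * cfg.get("num_epochs", 1)
    return train, evals, steps


def load_datasets_sketch(
    cfg: dict,
    debug: bool = False,
    prepare: Callable[[dict], Tuple[list, Optional[list], int]] = hypothetical_prepare_datasets,
) -> TrainDatasetMetaSketch:
    train, evals, total_steps = prepare(cfg)
    if debug and train:
        # Stand-in for the documented debug output (tokenization of a sample).
        print("sample:", train[0])
    return TrainDatasetMetaSketch(
        train_dataset=train, eval_dataset=evals, total_num_steps=total_steps
    )


meta = load_datasets_sketch({"train_texts": ["a", "b", "c"], "batch_size": 2}, debug=True)
print(meta.total_num_steps)  # 2
```

Injecting `prepare` as a parameter keeps the sketch testable without any real data pipeline.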
load_preference_datasets
common.datasets.load_preference_datasets(cfg, cli_args=None)
Loads one or more training or evaluation datasets for RL training using paired preference data, calling axolotl.utils.data.rl.prepare_preference_datasets. Optionally logs debug information.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
| cli_args | PreprocessCliArgs \| TrainerCliArgs \| None | Command-specific CLI arguments. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | TrainDatasetMeta | Dataclass with fields for training and evaluation datasets and the computed total_num_steps. |
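To illustrate what "paired preference data" means, the sketch below assumes each record holds a prompt together with a preferred (`chosen`) and a dispreferred (`rejected`) response, a layout common in DPO-style training. The field names and the helper `validate_preference_pairs` are assumptions, not part of this module; the columns axolotl actually expects may depend on the configured RL method and dataset type.

```python
from typing import Dict, List

# Assumed field names for a paired preference record (hypothetical).
REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def validate_preference_pairs(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: keep records with a prompt and two distinct responses."""
    valid = []
    for rec in records:
        has_all = all(rec.get(k) for k in REQUIRED_KEYS)
        if has_all and rec["chosen"] != rec["rejected"]:
            valid.append(rec)
    return valid


records = [
    {"prompt": "2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Capital of France?", "chosen": "Paris", "rejected": "Paris"},  # identical pair
    {"prompt": "Hi", "chosen": "Hello!"},  # missing the rejected response
]
print(len(validate_preference_pairs(records)))  # 1
```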
sample_dataset
common.datasets.sample_dataset(dataset, num_samples)
Randomly sample num_samples samples with replacement from dataset.
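Sampling with replacement means indices can repeat, so num_samples may exceed the dataset size. A minimal stdlib sketch, assuming a list-like dataset; the function name and the `seed` parameter are hypothetical additions for reproducibility (the documented signature has no seed).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(
    dataset: Sequence[T], num_samples: int, seed: Optional[int] = None
) -> List[T]:
    """Draw num_samples items uniformly at random, with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # Each index is drawn independently, so the same item can be picked repeatedly.
    indices = [rng.randrange(len(dataset)) for _ in range(num_samples)]
    return [dataset[i] for i in indices]


data = ["a", "b", "c"]
picked = sample_with_replacement(data, 5, seed=0)
print(len(picked))  # 5
```

If the dataset is a Hugging Face `datasets.Dataset` (an assumption here), the analogous step would pass the index list to `Dataset.select`.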