utils.data.sft
Data handling specific to SFT.
Functions
| Name | Description |
|---|---|
| prepare_datasets | Prepare training and evaluation datasets based on configuration. |
prepare_datasets
utils.data.sft.prepare_datasets(cfg, tokenizer, processor=None)
Prepare training and evaluation datasets based on configuration.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
| tokenizer | PreTrainedTokenizer | Tokenizer to use for processing text. | required |
| processor | ProcessorMixin \| None | Optional processor for multimodal datasets. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | tuple[IterableDataset \| Dataset, Dataset \| None, int, list[Prompter \| None]] | Tuple of (train_dataset, eval_dataset, total_steps, prompters). |
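
The return value is a four-element tuple in which the evaluation dataset and individual prompters may be None, and the training dataset may be an IterableDataset with no length. The following is a minimal sketch of unpacking it defensively. `summarize_prepared` is a hypothetical helper, not part of the library, and the sample tuple is a stand-in shaped like the documented return rather than real output. The import path in the trailing comment is also an assumption.

```python
def summarize_prepared(result: tuple) -> dict:
    """Unpack the documented 4-tuple into a labeled summary (hypothetical helper)."""
    train_dataset, eval_dataset, total_steps, prompters = result
    return {
        # eval_dataset is documented as Dataset | None.
        "has_eval": eval_dataset is not None,
        "total_steps": total_steps,
        # Entries in `prompters` may be None, so count only real prompters.
        "num_prompters": sum(p is not None for p in prompters),
        # An iterable-style dataset may not define __len__; report None then.
        "train_size": len(train_dataset) if hasattr(train_dataset, "__len__") else None,
    }


# Stand-in values shaped like the documented return; not real library output.
fake_result = (["row0", "row1", "row2"], None, 10, [None, object()])
summary = summarize_prepared(fake_result)
print(summary)

# With the real function, the call might look like this (import path assumed):
# from axolotl.utils.data.sft import prepare_datasets
# summary = summarize_prepared(prepare_datasets(cfg, tokenizer))
```

Checking `eval_dataset is not None` before use matters because the Returns table allows the evaluation split to be absent.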