# API Reference

## Core

Core functionality for training; a JSONL round-trip sketch follows the table.

| Module | Description |
|---|---|
| train | Prepare and train a model on a dataset; can also run inference with a model or merge a LoRA adapter. |
| evaluate | Module for evaluating models. |
| datasets | Module containing dataset functionality. |
| convert | Module containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes. |
| prompt_tokenizers | Module containing the PromptTokenizingStrategy and Prompter classes. |
| logging_config | Common logging module for axolotl. |
| core.builders.base | Base class for trainer builders. |
| core.builders.causal | Builder for causal trainers. |
| core.builders.rl | Builder for RLHF trainers. |
| core.training_args | Extra axolotl-specific training arguments. |
| core.chat.messages | Internal representations of chat messages. |
| core.chat.format.chatml | ChatML transformation functions for MessageContents. |
| core.chat.format.llama3x | Llama 3.x chat formatting functions for MessageContents. |
| core.chat.format.shared | Shared functions for format transforms. |
| core.datasets.chat | Chat dataset module. |
| core.datasets.transforms.chat_builder | Builds a transform that maps a dataset row into the internal chat representation. |
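
The convert module's split into reader, parser, and serializer roles can be illustrated with a plain standard-library JSONL round trip (the file names and the transform are placeholders; none of axolotl's classes are used here):

```python
import json
from pathlib import Path

src, dst = Path("raw.jsonl"), Path("converted.jsonl")
src.write_text('{"text": "  hello world  "}\n')  # tiny fixture so the sketch runs

# Read line by line, parse each JSON record, transform it, and serialize it
# back out -- the same reader/parser/serializer split the convert module encodes.
with src.open() as reader, dst.open("w") as writer:
    for line in reader:
        row = json.loads(line)                 # JSON parsing
        row["text"] = row["text"].strip()      # a trivial transform
        writer.write(json.dumps(row) + "\n")   # JSONL serialization

print(dst.read_text())  # {"text": "hello world"}
```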

## CLI

Command-line interface; a usage sketch follows the table.

| Module | Description |
|---|---|
| cli.main | Click CLI definitions for various axolotl commands. |
| cli.train | CLI to run training on a model. |
| cli.evaluate | CLI to run evaluation on a model. |
| cli.args | Module for axolotl CLI command arguments. |
| cli.art | Axolotl ASCII logo utils. |
| cli.checks | Various checks for the Axolotl CLI. |
| cli.config | Configuration loading and processing. |
| cli.delinearize_llama4 | CLI tool to delinearize quantized/linearized Llama 4 models. |
| cli.inference | CLI to run inference on a trained model. |
| cli.merge_lora | CLI to merge a trained LoRA into a base model. |
| cli.merge_sharded_fsdp_weights | CLI to merge sharded FSDP model checkpoints into a single combined checkpoint. |
| cli.preprocess | CLI to run preprocessing of a dataset. |
| cli.quantize | CLI to quantize a model post-training using torchao. |
| cli.vllm_serve | CLI to start the vLLM server for online RL. |
| cli.cloud.base | Base class for cloud platforms used from the CLI. |
| cli.cloud.modal_ | Modal cloud support for the CLI. |
| cli.utils | Init for the axolotl.cli.utils module. |
| cli.utils.args | Utilities for axolotl CLI args. |
| cli.utils.fetch | Utilities for the axolotl fetch CLI command. |
| cli.utils.load | Utilities for model, tokenizer, etc. loading. |
| cli.utils.sweeps | Utilities for handling sweeps over configs for the axolotl train CLI command. |
| cli.utils.train | Utilities for the axolotl train CLI command. |
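
A typical end-to-end flow through these commands, sketched with subprocess (the subcommand names come from the modules above; exact flags vary by version, so check `axolotl <command> --help`):

```python
import subprocess

config = "config.yml"  # placeholder path to an axolotl YAML config

# Tokenize/preprocess the dataset ahead of time, then train on it.
subprocess.run(["axolotl", "preprocess", config], check=True)
subprocess.run(["axolotl", "train", config], check=True)

# After a LoRA training run, fold the adapter into the base model
# (see `axolotl merge-lora --help` for the available options).
subprocess.run(["axolotl", "merge-lora", config], check=True)
```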

## Trainers

Training implementations; a sampler sketch follows the table.

| Module | Description |
|---|---|
| core.trainers.base | Module for customized trainers. |
| core.trainers.trl | Module for TRL RL trainers. |
| core.trainers.mamba | Module for the Mamba trainer. |
| core.trainers.dpo.trainer | DPO trainer for axolotl. |
| core.trainers.grpo.trainer | Axolotl GRPO trainers (with and without sequence parallelism handling). |
| core.trainers.grpo.sampler | Repeat random sampler (similar to the one implemented in TRL). |
| core.trainers.utils | Utils for Axolotl trainers. |
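
The GRPO sampler's job is to repeat each dataset index so that a group of generations can be scored per prompt. A minimal sketch of that idea (not axolotl's implementation):

```python
import random
from typing import List

def repeat_random_indices(n: int, repeats: int, seed: int = 0) -> List[int]:
    """Shuffle dataset indices, then repeat each one `repeats` times in place,
    so consecutive entries share a prompt, as GRPO-style training needs."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return [i for i in order for _ in range(repeats)]

print(repeat_random_indices(4, repeats=2))  # e.g. [2, 2, 0, 0, 3, 3, 1, 1]
```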

## Model Loading

Functionality for loading and patching models, tokenizers, etc.; a minimal loading sketch follows the table.

| Module | Description |
|---|---|
| loaders.model | Model loader class implementation for loading, configuring, and patching various models. |
| loaders.tokenizer | Tokenizer loading functionality and associated utils. |
| loaders.processor | Processor loading functionality for multi-modal models. |
| loaders.adapter | Adapter loading functionality, including LoRA / QLoRA and associated utils. |
| loaders.patch_manager | Patch manager class implementation to complement axolotl.loaders.ModelLoader. |
| loaders.constants | Shared constants for the axolotl.loaders module. |
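
Under the hood, the loaders wrap a flow similar to this plain transformers + peft sketch (the model id and LoRA hyperparameters are illustrative only; axolotl's ModelLoader adds patching, quantization, and config handling on top):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B"  # example model id

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Attach a LoRA adapter, roughly what loaders.adapter does for `adapter: lora`.
lora = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```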

## Mixins

Mixin classes for augmenting trainers.

| Module | Description |
|---|---|
| core.trainers.mixins.optimizer | Module for the Axolotl trainer optimizer mixin. |
| core.trainers.mixins.rng_state_loader | Temporary fix/override for a bug in resuming from checkpoint. |
| core.trainers.mixins.scheduler | Module for the Axolotl trainer scheduler mixin. |

## Context Managers

Context managers for altering trainer behaviors.

| Module | Description |
|---|---|
| utils.ctx_managers.sequence_parallel | Module for the Axolotl trainer sequence parallelism manager and utilities. |

## Prompt Strategies

Prompt formatting strategies; a chat-template sketch follows the table.

| Module | Description |
|---|---|
| prompt_strategies.base | Module for base dataset transform strategies. |
| prompt_strategies.chat_template | HF chat templates prompt strategy. |
| prompt_strategies.alpaca_chat | Module for Alpaca prompt strategy classes. |
| prompt_strategies.alpaca_instruct | Module loading the AlpacaInstructPromptTokenizingStrategy class. |
| prompt_strategies.alpaca_w_system | Prompt strategies loader for Alpaca instruction datasets with system prompts. |
| prompt_strategies.user_defined | User-defined prompts configured via the YAML config. |
| prompt_strategies.llama2_chat | Prompt strategy for finetuning Llama 2 chat models. |
| prompt_strategies.completion | Basic completion text. |
| prompt_strategies.input_output | Module for plain input/output prompt pairs. |
| prompt_strategies.stepwise_supervised | Module for stepwise (process-supervised) datasets, typically including a prompt and reasoning traces. |
| prompt_strategies.metharme | Module containing the MetharmePromptTokenizingStrategy and MetharmePrompter classes. |
| prompt_strategies.orcamini | Prompt strategy for finetuning Orca Mini (v2) models. |
| prompt_strategies.pygmalion | Module containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter classes. |
| prompt_strategies.messages.chat | Chat dataset wrapping strategy for the new internal message representations. |
| prompt_strategies.dpo.chat_template | DPO prompt strategies for using tokenizer chat templates. |
| prompt_strategies.dpo.llama3 | DPO strategies for the Llama 3 chat template. |
| prompt_strategies.dpo.chatml | DPO strategies for ChatML. |
| prompt_strategies.dpo.zephyr | DPO strategies for Zephyr. |
| prompt_strategies.dpo.user_defined | User-defined DPO strategies. |
| prompt_strategies.dpo.passthrough | Passthrough (zero-processing) DPO prompt strategy. |
| prompt_strategies.kto.llama3 | KTO strategies for the Llama 3 chat template. |
| prompt_strategies.kto.chatml | KTO strategies for ChatML. |
| prompt_strategies.kto.user_defined | User-defined KTO strategies. |
| prompt_strategies.orpo.chat_template | ChatML prompt tokenization strategy for ORPO. |
| prompt_strategies.bradley_terry.llama3 | ChatML transforms mapping datasets with system/input/chosen/rejected fields to the Llama 3 chat template. |
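
The chat_template strategy family reduces to rendering a messages list through the tokenizer's chat template, then masking non-trainable tokens out of the labels. A minimal illustration (the model id is an example; label masking is omitted):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # example

# A row in the "messages" format many of the strategies above consume.
row = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ]
}

# Render the conversation with the tokenizer's built-in chat template.
text = tokenizer.apply_chat_template(row["messages"], tokenize=False)
print(text)
```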

## Kernels

Low-level performance optimizations; reference math for the activation kernels follows the table.

| Module | Description |
|---|---|
| kernels.lora | Module for definition of Low-Rank Adaptation (LoRA) Triton kernels. |
| kernels.geglu | Module for definition of GEGLU Triton kernels. |
| kernels.swiglu | Module for definition of SwiGLU Triton kernels. |
| kernels.quantize | Dequantization utilities for bitsandbytes integration. |
| kernels.utils | Utilities for axolotl.kernels submodules. |
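
For reference, the math the GEGLU and SwiGLU kernels fuse is small enough to state in plain PyTorch (the GELU approximation used by the actual Triton kernels may differ):

```python
import torch
import torch.nn.functional as F

def swiglu(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    # SwiGLU: SiLU(gate) * up -- the computation kernels.swiglu fuses.
    return F.silu(gate) * up

def geglu(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    # GEGLU: GELU(gate) * up -- the computation kernels.geglu fuses.
    return F.gelu(gate, approximate="tanh") * up

x = torch.randn(2, 8)
print(swiglu(x, x).shape, geglu(x, x).shape)
```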

## Monkey Patches

Runtime patches for model optimizations; a mask-expansion sketch follows the table.

| Module | Description |
|---|---|
| monkeypatch.llama_attn_hijack_flash | Flash attention monkey patch for Llama models. |
| monkeypatch.llama_attn_hijack_xformers | Code copied from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py, with some adjustments. |
| monkeypatch.mistral_attn_hijack_flash | Flash attention monkey patch for Mistral models. |
| monkeypatch.multipack | Multipack patching for v2 of sample packing. |
| monkeypatch.relora | Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune. |
| monkeypatch.llama_expand_mask | Expands the binary attention mask per section 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf. |
| monkeypatch.lora_kernels | Module for patching custom LoRA Triton kernels and torch.autograd functions. |
| monkeypatch.utils | Shared utils for the monkey patches. |
| monkeypatch.btlm_attn_hijack_flash | Flash attention monkey patch for the Cerebras BTLM model. |
| monkeypatch.llama_patch_multipack | Patched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention. |
| monkeypatch.stablelm_attn_hijack_flash | PyTorch StableLM Epoch model. |
| monkeypatch.trainer_fsdp_optim | Fix for FSDP optimizer save in the trainer with transformers 4.47.0. |
| monkeypatch.transformers_fa_utils | See https://github.com/huggingface/transformers/pull/35834. |
| monkeypatch.unsloth_ | Module for patching with Unsloth optimizations. |
| monkeypatch.data.batch_dataset_fetcher | Monkey patches for the dataset fetcher to handle batches of packed indexes. |
| monkeypatch.mixtral | Patches to support multipack for Mixtral. |
| monkeypatch.gradient_checkpointing.offload_cpu | CPU-offloaded gradient checkpointing. |
| monkeypatch.gradient_checkpointing.offload_disk | DISCO - DIsk-based Storage and Checkpointing with Optimized prefetching. |
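
The effect of mask expansion for packed sequences can be sketched in a few lines: given per-token sample ids, each token may attend only within its own sample, yielding a block-diagonal mask (padding handling omitted):

```python
import torch

# A packed sequence where token i belongs to sample ids[0, i].
ids = torch.tensor([[1, 1, 1, 2, 2, 3]])

# Tokens attend only within their own sample: a block-diagonal boolean mask,
# which is the effect the expand-mask patch produces for packed batches.
allowed = ids.unsqueeze(1) == ids.unsqueeze(2)  # (batch, seq, seq)
print(allowed[0].int())
```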

## Utils

Utility functions; a DictDefault sketch follows the table.

| Module | Description |
|---|---|
| utils.tokenization | Module for tokenization utilities. |
| utils.chat_templates | Functionality for selecting chat templates based on user choices. |
| utils.lora | Module to get the state dict of a merged LoRA model. |
| utils.model_shard_quant | Module to handle loading a model on the CPU/meta device for FSDP. |
| utils.bench | Benchmarking and measurement utilities. |
| utils.freeze | Module to freeze/unfreeze parameters by name. |
| utils.trainer | Module containing the Trainer class and related functions. |
| utils.schedulers | Module for custom LRScheduler classes. |
| utils.distributed | Utilities for distributed functionality. |
| utils.dict | Module containing the DictDefault class. |
| utils.optimizers.adopt | ADOPT optimizer, copied from https://github.com/iShohei220/adopt. |
| utils.data.streaming | Data handling specific to streaming datasets. |
| utils.data.sft | Data handling specific to SFT. |
| utils.quantization | Utilities for quantization, including QAT and PTQ, using torchao. |
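
DictDefault is the config object passed throughout axolotl; its key behavior is attribute-style access that returns None for missing keys instead of raising. A concept sketch only, not the real implementation:

```python
class DictDefaultSketch(dict):
    """Concept sketch of utils.dict.DictDefault: attribute access that yields
    None for missing keys (the real class is more featureful)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; fall back to the dict.
        return self.get(name)

cfg = DictDefaultSketch(micro_batch_size=2)
print(cfg.micro_batch_size)  # 2
print(cfg.flash_attention)   # None -- missing keys don't raise
```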

## Schemas

Pydantic data models for Axolotl config; a validation sketch follows the table.

| Module | Description |
|---|---|
| utils.schemas.config | Module with Pydantic models for configuration. |
| utils.schemas.model | Pydantic models for model input/output, etc. configuration. |
| utils.schemas.training | Pydantic models for training hyperparameters. |
| utils.schemas.datasets | Pydantic models for datasets-related configuration. |
| utils.schemas.peft | Pydantic models for PEFT-related configuration. |
| utils.schemas.trl | Pydantic models for TRL trainer configuration. |
| utils.schemas.multimodal | Pydantic models for multimodal-related configuration. |
| utils.schemas.integrations | Pydantic models for Axolotl integrations. |
| utils.schemas.enums | Enums for Axolotl input config. |
| utils.schemas.utils | Utilities for Axolotl Pydantic models. |
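
The schema modules exist so config mistakes fail fast at validation time, before any GPU work starts. An illustrative Pydantic model with field names mirroring common axolotl config keys (this is not axolotl's actual schema):

```python
from pydantic import BaseModel, ValidationError

class PeftConfigSketch(BaseModel):
    """Illustrative stand-in for the schema modules above."""
    lora_r: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.0

try:
    PeftConfigSketch(lora_r="not-a-number")  # bad value is rejected immediately
except ValidationError as err:
    print(err)
```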

## Integrations

Third-party integrations and extensions; a plugin-shape sketch follows the table.

| Module | Description |
|---|---|
| integrations.base | Base class for all plugins. |
| integrations.cut_cross_entropy.args | Module for handling Cut Cross Entropy input arguments. |
| integrations.grokfast.optimizer | Optimizer implementation for the Grokfast integration. |
| integrations.kd.trainer | Knowledge distillation (KD) trainer. |
| integrations.liger.args | Module for handling Liger input arguments. |
| integrations.lm_eval.args | Module for handling lm-eval harness input arguments. |
| integrations.spectrum.args | Module for handling Spectrum input arguments. |
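
Plugins hook into the training lifecycle around model loading and trainer construction. A hypothetical plugin shape for illustration only; the real BasePlugin in integrations.base defines the actual hook names and lifecycle:

```python
class PluginSketch:
    """Hypothetical plugin, for shape only -- not axolotl's BasePlugin."""

    def pre_model_load(self, cfg):
        # Hypothetical hook: inspect or adjust config before the model is built.
        return cfg

    def post_model_load(self, cfg, model):
        # Hypothetical hook: patch or wrap the freshly loaded model.
        return model

plugin = PluginSketch()
print(plugin.pre_model_load({"base_model": "example"}))
```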

## Common

Common utilities and shared functionality.

| Module | Description |
|---|---|
| common.architectures | Common architecture-specific constants. |
| common.const | Various shared constants. |
| common.datasets | Dataset loading utilities. |

## Models

Custom model implementations.

| Module | Description |
|---|---|
| models.mamba.modeling_mamba | Custom Mamba model implementation. |

## Data Processing

Data processing utilities; a packing sketch follows the table.

| Module | Description |
|---|---|
| utils.collators.core | Basic shared collator constants. |
| utils.collators.batching | Data collators for axolotl to pad labels and position_ids for packed sequences. |
| utils.collators.mamba | Collators for Mamba. |
| utils.collators.mm_chat | Collators for multi-modal chat messages and packing. |
| utils.samplers.multipack | Multipack batch sampler: an efficient batch sampler for packing variable-length sequences. |
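
The multipack sampler's core idea is bin packing: fill each max-length context with several short samples instead of padding. A first-fit-decreasing sketch of the concept (the real sampler also balances bins across ranks and preserves batch semantics):

```python
from typing import List

def pack_greedy(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Place each sequence (longest first) into the first bin with room."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins: List[List[int]] = []   # packed groups of sample indices
    space: List[int] = []        # remaining capacity per bin
    for i in order:
        for b, free in enumerate(space):
            if lengths[i] <= free:
                bins[b].append(i)
                space[b] -= lengths[i]
                break
        else:
            bins.append([i])
            space.append(max_tokens - lengths[i])
    return bins

print(pack_greedy([900, 700, 300, 100], max_tokens=1024))  # [[0, 3], [1, 2]]
```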

## Callbacks

Training callbacks; a perplexity sketch follows the table.

| Module | Description |
|---|---|
| utils.callbacks.perplexity | Callback to calculate perplexity as an evaluation metric. |
| utils.callbacks.profiler | HF Trainer callback for creating PyTorch profiling snapshots. |
| utils.callbacks.lisa | Module for LISA (layerwise importance sampling). |
| utils.callbacks.mlflow_ | MLflow module for trainer callbacks. |
| utils.callbacks.comet_ | Comet module for trainer callbacks. |
| utils.callbacks.qat | QAT callback for the HF causal trainer. |
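
The perplexity callback reports exp(mean negative log-likelihood) over next-token predictions; a self-contained sketch of that computation (shapes and values are dummies):

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """exp(mean NLL) over shifted next-token predictions."""
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    nll = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
    return math.exp(nll.item())

logits = torch.randn(1, 8, 32)          # (batch, seq, vocab) dummy values
labels = torch.randint(0, 32, (1, 8))
print(perplexity(logits, labels))
```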