utils.model_shard_quant
Module for loading a model onto the CPU or meta device for FSDP.
Functions
| Name | Description |
|---|---|
| load_and_quantize | Loads value tensor into submodule of module, optionally skipping skip_names and converting to dtype. |
load_and_quantize
utils.model_shard_quant.load_and_quantize(
    module,
    name,
    value,
    device=None,
    dtype=None,
    skip_names=None,
    to_cpu=False,
    to_meta=False,
    verbose=False,
    quant_method='bnb',
)
Loads value tensor into submodule of module, optionally skipping skip_names and converting to dtype.
Quantizes Params4bit parameters on device, then places them on “cpu” if to_cpu=True or on “meta” if to_meta=True.
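
The loading behaviour described above can be illustrated with a small sketch. This is not the library's implementation: load_value_sketch is a hypothetical stand-in that performs no quantization and ignores verbose and quant_method. It assumes that name is a dotted path such as "proj.weight", that skip_names entries are matched as substrings of name, and that to_meta takes precedence over to_cpu; none of these details are confirmed by the text above.

```python
import torch
import torch.nn as nn


def load_value_sketch(module, name, value, device=None, dtype=None,
                      skip_names=None, to_cpu=False, to_meta=False):
    """Hypothetical, simplified stand-in; no quantization is performed."""
    skip_names = skip_names or []

    # Split "proj.weight" into the submodule path "proj" and the
    # parameter name "weight" (assumed naming scheme).
    module_key, _, value_key = name.rpartition(".")
    submodule = module.get_submodule(module_key)

    # Convert to dtype unless the name matches an entry in skip_names
    # (substring matching is an assumption of this sketch).
    if dtype is not None and not any(s in name for s in skip_names):
        value = value.to(dtype)

    # Choose the final placement: meta, CPU, or the requested device.
    if to_meta:
        value = value.to("meta")
    elif to_cpu:
        value = value.to("cpu")
    elif device is not None:
        value = value.to(device)

    # Replace the existing parameter, keeping its requires_grad flag.
    old = getattr(submodule, value_key)
    setattr(submodule, value_key,
            nn.Parameter(value, requires_grad=old.requires_grad))


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)
        self.norm = nn.LayerNorm(2)


model = Tiny()

# Converted to bfloat16 and kept on the CPU.
load_value_sketch(model, "proj.weight", torch.ones(2, 4),
                  dtype=torch.bfloat16, skip_names=["norm"], to_cpu=True)

# Name matches skip_names, so the dtype is left alone; placed on meta.
load_value_sketch(model, "norm.weight", torch.ones(2),
                  dtype=torch.bfloat16, skip_names=["norm"], to_meta=True)
```

Placing a tensor on the meta device keeps only its shape and dtype, which is why this option is useful when only one FSDP rank needs to hold the real weights in memory.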