kernels.quantize
Dequantization utilities for bitsandbytes integration.
Functions
| Name | Description |
|---|---|
| dequantize | Fast NF4 dequantization using bitsandbytes CUDA kernels. |
dequantize
kernels.quantize.dequantize(W, quant_state=None, out=None)

Fast NF4 dequantization using bitsandbytes CUDA kernels.
Performs efficient dequantization of weights from NF4 format using bitsandbytes’ optimized CUDA implementations. Supports both the legacy list format and the newer QuantState format.
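For orientation, here is a minimal sketch of the documented behavior expressed through bitsandbytes’ public `dequantize_4bit` entry point rather than the raw CUDA kernels; the wrapper name `dequantize_sketch` is illustrative, not part of the module:

```python
import torch
import bitsandbytes.functional as F

def dequantize_sketch(W: torch.Tensor, quant_state=None, out=None) -> torch.Tensor:
    # No quantization metadata: return W unchanged, mirroring the
    # documented behavior for quant_state=None.
    if quant_state is None:
        return W
    # bitsandbytes' public NF4 path; it dispatches internally to the
    # optimized CUDA blockwise dequantization kernels.
    return F.dequantize_4bit(W, quant_state=quant_state, out=out, quant_type="nf4")
```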
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| W | torch.Tensor | Quantized weight tensor to dequantize | required |
| quant_state | QuantState \| list \| None | Quantization state containing the metadata needed for dequantization. Can be either a QuantState object or the legacy list format. If None, W is returned unchanged. | None |
| out | torch.Tensor \| None | Optional output tensor for storing the dequantized result. Must match the expected shape and dtype if provided. | None |
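For context, a round trip through the function, assuming the packed tensor and QuantState come from bitsandbytes’ `quantize_4bit` (shapes and device here are arbitrary):

```python
import torch
import bitsandbytes.functional as F
from kernels.quantize import dequantize

# Quantize an fp16 weight to NF4; quantize_4bit returns the packed
# uint8 tensor together with a QuantState carrying the metadata.
W = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
W_nf4, quant_state = F.quantize_4bit(W, quant_type="nf4")

# Dequantize back to fp16 with the function documented here.
W_deq = dequantize(W_nf4, quant_state=quant_state)
assert W_deq.dtype == torch.float16 and W_deq.shape == W.shape
```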
Returns
| Name | Type | Description |
|---|---|---|
|  | torch.Tensor | Dequantized tensor in the specified dtype (fp16 or bf16). Will be transposed if the input W was transposed. |
Raises
| Name | Type | Description |
|---|---|---|
|  | AssertionError | If the provided output tensor doesn’t match the expected shape or dtype. |
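The shape/dtype checks behind that AssertionError can be pictured as below; this is an assumed form for illustration, and `check_out_buffer` is a hypothetical helper, not part of the module:

```python
import torch
from bitsandbytes.functional import QuantState

def check_out_buffer(out: torch.Tensor, quant_state: QuantState) -> None:
    # Assumed sketch: a caller-supplied `out` buffer must already match
    # the dequantized shape and dtype recorded in the QuantState.
    assert out.shape == quant_state.shape, "out has the wrong shape"
    assert out.dtype == quant_state.dtype, "out has the wrong dtype"
```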
Note
Uses CUDA streams for better performance when available (bitsandbytes versions newer than 0.43.3).
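How the current CUDA stream might be picked up for such a kernel launch is sketched below; the `ctypes` handling is an assumed illustration of passing a raw `cudaStream_t` handle to a C-level kernel, not the module’s actual code:

```python
import ctypes
import torch

if torch.cuda.is_available():
    # The raw cudaStream_t handle for the current device's stream.
    stream = torch.cuda.current_stream()
    stream_handle = ctypes.c_void_p(stream.cuda_stream)
    print(stream_handle)  # handed to the C kernel in newer bitsandbytes builds (assumed)
```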