Finetune LLM with Axolotl on Intel GPU#

Axolotl is a popular tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures. You can now use ipex-llm as an accelerated backend for Axolotl running on Intel GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex, or Max).

See the demo of finetuning LLaMA2-7B on Intel Arc GPU below.


0. Prerequisites#

IPEX-LLM’s support for Axolotl v0.4.0 is available only on Linux. We recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).

Visit the guide Install IPEX-LLM on Linux with Intel GPU and follow its Install Intel GPU Driver and Install oneAPI sections to install the GPU driver and Intel® oneAPI Base Toolkit 2024.0.
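
As a quick sanity check of the operating-system recommendation above, the following sketch reads VERSION_ID from /etc/os-release and compares it against 20.04. The helper name ubuntu_version_ok is hypothetical, and the script checks only the OS release, not the GPU driver or the oneAPI installation.

```shell
# Hypothetical helper: succeeds if the given Ubuntu version (e.g. "22.04") is 20.04 or later.
ubuntu_version_ok() {
    _version="$1"
    _major="${_version%%.*}"
    _rest="${_version#*.}"
    _minor="${_rest%%.*}"
    [ "$_major" -gt 20 ] || { [ "$_major" -eq 20 ] && [ "$_minor" -ge 4 ]; }
}

# Report on the current machine; /etc/os-release is absent on non-Linux systems.
if [ -r /etc/os-release ]; then
    . /etc/os-release
    if [ "${ID:-}" = "ubuntu" ] && ubuntu_version_ok "${VERSION_ID:-0.0}"; then
        echo "Ubuntu ${VERSION_ID}: meets the recommendation"
    else
        echo "${PRETTY_NAME:-unknown OS}: outside the recommended Ubuntu 20.04+ range"
    fi
else
    echo "No /etc/os-release found; this guide targets Linux"
fi
```

The comparison uses integer tests on the major and minor parts, so it assumes a VERSION_ID of the form major.minor.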

1. Install IPEX-LLM for Axolotl#

Create a new conda env, and install ipex-llm[xpu].

conda create -n axolotl python=3.11
conda activate axolotl
# install ipex-llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url

Install axolotl v0.4.0 from git.

# install axolotl v0.4.0
git clone
cd axolotl
# replace requirements.txt
rm requirements.txt
wget -O requirements.txt
pip install -e .
pip install transformers==4.36.0
# pin datasets to avoid the PosixPath TypeError (axolotl #1544)
pip install datasets==2.15.0
# prepare axolotl entrypoints

After the installation, you should have created a conda environment, named axolotl for instance, for running Axolotl commands with IPEX-LLM.

2. Example: Finetune Llama-2-7B with Axolotl#

The following example finetunes Llama-2-7B on the alpaca_2k_test dataset using LoRA and QLoRA.

Note that you don’t need to write any code in this example.

| Model | Dataset | Finetune method |
|---|---|---|
| Llama-2-7B | alpaca_2k_test | LoRA (Low-Rank Adaptation) |
| Llama-2-7B | alpaca_2k_test | QLoRA (Quantized Low-Rank Adaptation) |

For more technical details, please refer to Llama 2, LoRA and QLoRA.

2.1 Download Llama-2-7B and alpaca_2k_test#

By default, Axolotl automatically downloads models and datasets from Hugging Face. Please ensure you have logged in to Hugging Face.

huggingface-cli login

If you prefer to work offline, download Llama-2-7B and alpaca_2k_test in advance, then set HF_HUB_OFFLINE=1 to avoid connecting to Hugging Face.
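
For the offline case, a minimal sketch of setting the variable in the current shell before launching training. It only affects processes started from this shell session.

```shell
# Tell the Hugging Face Hub client to use local files only and skip network requests.
export HF_HUB_OFFLINE=1
echo "HF_HUB_OFFLINE=${HF_HUB_OFFLINE}"
```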


2.2 Set Environment Variables#


This step is required if oneAPI was installed via APT or the offline installer. Skip this step if oneAPI was installed via PIP.

Configure oneAPI variables by running the following command:

source /opt/intel/oneapi/

Configure accelerate so that training does not run on the CPU. You can download a default_config.yaml that sets use_cpu: false.

mkdir -p  ~/.cache/huggingface/accelerate/
wget -O ~/.cache/huggingface/accelerate/default_config.yaml

As an alternative, you can configure accelerate interactively based on your requirements.

accelerate config

Answer NO to the prompt Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:.

After finishing accelerate config, check that use_cpu is disabled (i.e., use_cpu: false) in the accelerate config file (~/.cache/huggingface/accelerate/default_config.yaml).
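
The check described above can be scripted. The sketch below is a plain-text match, not a YAML parser; the helper name accelerate_cpu_disabled is hypothetical, and it assumes the setting is written literally as use_cpu: false on its own line.

```shell
# Hypothetical helper: succeeds only if the given accelerate config contains "use_cpu: false".
accelerate_cpu_disabled() {
    grep -Eq '^[[:space:]]*use_cpu:[[:space:]]*false[[:space:]]*$' "$1"
}

config="${HOME}/.cache/huggingface/accelerate/default_config.yaml"
if [ -f "$config" ] && accelerate_cpu_disabled "$config"; then
    echo "OK: use_cpu is false in $config"
else
    echo "Check $config: it is missing or does not set use_cpu: false"
fi
```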

2.3 LoRA finetune#

Prepare lora.yml for Axolotl LoRA finetuning. You can download a template from GitHub.

If you are using an offline model and dataset stored locally, modify the model path and dataset path in lora.yml. Otherwise, keep them unchanged.

# Please change to local path if model is offline, e.g., /path/to/model/Llama-2-7b-hf
base_model: NousResearch/Llama-2-7b-hf
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca

Modify LoRA parameters such as lora_r and lora_alpha as needed.

adapter: lora

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
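
In the standard LoRA formulation, the low-rank update is scaled by lora_alpha / lora_r, so the values above give 16 / 32 = 0.5. As an illustration, the sketch below reads both keys from a config file and prints the ratio. The helper name lora_scale is hypothetical, and it assumes each key appears once as an unindented key: value line.

```shell
# Hypothetical helper: print lora_alpha / lora_r for the given config file.
lora_scale() {
    awk -F': *' '
        $1 == "lora_r"     { r = $2 }
        $1 == "lora_alpha" { alpha = $2 }
        END {
            if (r > 0) print alpha / r
            else exit 1   # lora_r missing or not a positive number
        }
    ' "$1"
}

# Example using the values shown above, written to a temporary file.
example_cfg="$(mktemp)"
printf 'adapter: lora\nlora_r: 32\nlora_alpha: 16\n' > "$example_cfg"
lora_scale "$example_cfg"
rm -f "$example_cfg"
```

Changing lora_r without changing lora_alpha therefore also changes the effective scale of the adapter update.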

Launch LoRA training with the following command.

accelerate launch lora.yml

In Axolotl v0.4.0, you can also launch training with one of the entrypoint scripts prepared during installation instead of -m axolotl.cli.train.

accelerate launch lora.yml

2.4 QLoRA finetune#

Prepare qlora.yml for QLoRA finetuning. You can download a template from GitHub.

If you are using an offline model and dataset stored locally, modify the model path and dataset path in qlora.yml. Otherwise, keep them unchanged.

# Please change to local path if model is offline, e.g., /path/to/model/Llama-2-7b-hf
base_model: NousResearch/Llama-2-7b-hf
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca

Modify QLoRA parameters such as lora_r and lora_alpha as needed.

adapter: qlora

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

Launch QLoRA training with the following command.

accelerate launch qlora.yml

In Axolotl v0.4.0, you can also launch training with one of the entrypoint scripts prepared during installation instead of -m axolotl.cli.train.

accelerate launch qlora.yml

3. Finetune Llama-3-8B (Experimental)#

Warning: this section will install axolotl main (796a085) for new features, e.g., Llama-3-8B.

3.1 Install Axolotl main in conda#

Axolotl main has many new dependencies. Please set up a new conda env for this version.

conda create -n llm python=3.11
conda activate llm
# install axolotl main
git clone
cd axolotl && git checkout 796a085
pip install -e .
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url
# install transformers etc
pip install accelerate==0.23.0
# pin datasets to avoid the PosixPath TypeError (axolotl #1544)
pip install datasets==2.15.0
pip install transformers==4.37.0

Configure accelerate and oneAPI as described in Set Environment Variables (Section 2.2).

3.2 Alpaca QLoRA#

This section is based on the axolotl Llama-3 QLoRA example.

Prepare llama3-qlora.yml for QLoRA finetuning. You can download a template from GitHub.

If you are using an offline model and dataset stored locally, modify the model path and dataset path in llama3-qlora.yml. Otherwise, keep them unchanged.

# Please change to local path if model is offline, e.g., /path/to/model/Meta-Llama-3-8B
base_model: meta-llama/Meta-Llama-3-8B
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_subset_1
  - path: aaditya/alpaca_subset_1
    type: alpaca

Modify QLoRA parameters such as lora_r and lora_alpha as needed.

adapter: qlora

sequence_len: 256
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

Launch QLoRA training with the following command.

accelerate launch llama3-qlora.yml

You can also use an alternative entrypoint script instead of -m axolotl.cli.train.

accelerate launch llama3-qlora.yml

Expected output

{'loss': 0.237, 'learning_rate': 1.2254711850265387e-06, 'epoch': 3.77}
{'loss': 0.6068, 'learning_rate': 1.1692453482951115e-06, 'epoch': 3.77}
{'loss': 0.2926, 'learning_rate': 1.1143322458989303e-06, 'epoch': 3.78}
{'loss': 0.2475, 'learning_rate': 1.0607326072295087e-06, 'epoch': 3.78}
{'loss': 0.1531, 'learning_rate': 1.008447144232094e-06, 'epoch': 3.79}
{'loss': 0.1799, 'learning_rate': 9.57476551396197e-07, 'epoch': 3.79}
{'loss': 0.2724, 'learning_rate': 9.078215057463868e-07, 'epoch': 3.79}
{'loss': 0.2534, 'learning_rate': 8.594826668332445e-07, 'epoch': 3.8}
{'loss': 0.3388, 'learning_rate': 8.124606767246579e-07, 'epoch': 3.8}
{'loss': 0.3867, 'learning_rate': 7.667561599972505e-07, 'epoch': 3.81}
{'loss': 0.2108, 'learning_rate': 7.223697237281668e-07, 'epoch': 3.81}
{'loss': 0.0792, 'learning_rate': 6.793019574868775e-07, 'epoch': 3.82}


4. Troubleshooting#

TypeError: PosixPath#

Error message: TypeError: argument of type 'PosixPath' is not iterable

This issue is related to axolotl #1544. It can be fixed by downgrading datasets to 2.15.0.

pip install datasets==2.15.0

RuntimeError: out of device memory#

Error message: RuntimeError: Allocation is out of device memory on current platform.

This issue is caused by running out of GPU memory. Please reduce lora_r or micro_batch_size in qlora.yml or lora.yml, or reduce the amount of data used in training.
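
As an illustration of the first remedy, the sketch below rewrites an unindented key: value line in a config file. The helper name set_yaml_value and the values 16 and 1 are hypothetical; suitable values depend on the available GPU memory. It assumes the key is already present and that the value contains no "/" characters.

```shell
# Hypothetical helper: replace the value of an unindented "key: value" line in a YAML file.
set_yaml_value() {
    _file="$1"; _key="$2"; _value="$3"
    _tmp="$(mktemp)"
    # Write the edited copy to a temporary file, then copy it back to keep file permissions.
    sed "s/^${_key}:.*/${_key}: ${_value}/" "$_file" > "$_tmp" \
        && cat "$_tmp" > "$_file"
    rm -f "$_tmp"
}

# Example: lower the LoRA rank and the micro-batch size in qlora.yml, if it exists here.
if [ -f qlora.yml ]; then
    set_yaml_value qlora.yml lora_r 16
    set_yaml_value qlora.yml micro_batch_size 1
fi
```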


OSError: cannot open shared object file#

Error message: OSError: cannot open shared object file: No such file or directory

The oneAPI environment is not set correctly. Please refer to Set Environment Variables (Section 2.2).
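
To narrow this error down, it can help to check whether the library named in the message is reachable through LD_LIBRARY_PATH, which a oneAPI environment script typically extends. The sketch below is a generic check: the helper name find_on_ld_path is hypothetical, and libexample.so is a placeholder for the file name printed in your error message. The dynamic loader also consults other locations, so a miss here is only a hint.

```shell
# Hypothetical helper: print the first match for a library file name on LD_LIBRARY_PATH.
find_on_ld_path() {
    _lib="$1"
    _old_ifs="$IFS"
    IFS=':'
    for _dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$_dir" ] && [ -e "$_dir/$_lib" ]; then
            IFS="$_old_ifs"
            echo "$_dir/$_lib"
            return 0
        fi
    done
    IFS="$_old_ifs"
    return 1
}

# Replace libexample.so with the library name from the error message.
if found="$(find_on_ld_path libexample.so)"; then
    echo "found: $found"
else
    echo "not found on LD_LIBRARY_PATH; re-run the oneAPI setup step in this shell"
fi
```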