Fine-tuning LLMs with 7B or more parameters requires substantial hardware resources. One option is to build an on-premise computer with powerful and costly GPUs. The other option is to use cloud environments, including free services like Colab and Kaggle, and paid services like Replicate and Paperspace. These environments offer Jupyter notebooks in which you can run your LLM fine-tuning code. However, they have constraints and limitations that need to be considered, such as the maximum amount of time that a notebook can run.
This article presents eight tricks for working with such cloud environments. You will learn how to inspect the cloud environment, pin library and binary versions, prevent data logging to external providers, save and export training results, and prevent session timeouts.
This article originally appeared at my blog admantium.com.
Overview
The tricks cover the following aspects:
Inspect
- Hardware Specification
- Library and Binary Versions
Setup
- Library Version Pinning
- Binary Version Pinning and Execution
- Prevent Data Logging to External Providers
Execution
- Periodically Save Training Artifacts
- Manually Export Training Artifacts
- Prevent Session Timeout
Inspect
Hardware Specification
To see the details of the available hardware, use the following script:
# Source: https://www.kaggle.com/code/lukicdarkoo/kaggle-machine-specification-cpu-gpu-ram-os
import subprocess

from GPUtil import showUtilization as gpu_usage

def run(command):
    # Run a shell command and print its standard output
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
    out, err = process.communicate()
    print(out.decode('utf-8').strip())

print('# CPU')
run('cat /proc/cpuinfo | egrep -m 1 "^model name"')
run('cat /proc/cpuinfo | egrep -m 1 "^cpu MHz"')
run('cat /proc/cpuinfo | egrep -m 1 "^cpu cores"')
print('# RAM')
run('cat /proc/meminfo | egrep "^MemTotal"')
print('# OS')
run('uname -a')
print('# GPU')
run('nvidia-smi')
# Print the compact GPU and memory utilization table
gpu_usage()
Example output from Kaggle:
# CPU
model name : Intel(R) Xeon(R) CPU @ 2.00GHz
cpu MHz : 2000.174
cpu cores : 2
# RAM
MemTotal: 32880784 kB
# OS
Linux 5b1f6e39bcfd 5.15.133+ #1 SMP Tue Dec 19 13:14:11 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
# GPU
Fri Mar 29 09:49:48 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P100-PCIE-16GB Off | 00000000:00:04.0 Off | 0 |
| N/A 34C P0 25W / 250W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
| ID | GPU | MEM |
------------------
| 0 | 0% | 0% |
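If you want to use the specification inside your code, for example to choose a batch size, it helps to have it as a data structure instead of printed text. The following is a minimal sketch using only the Python standard library; the function name `collect_specs` and the dictionary keys are my own choices for illustration, and it assumes a Linux machine for the RAM lookup.

```python
import os
import platform
import shutil
import subprocess


def collect_specs():
    """Return a dict describing CPU count, RAM, OS, and GPU availability."""
    specs = {
        "cpu_count": os.cpu_count(),  # logical CPUs, not physical cores
        "os": platform.platform(),
        "mem_total_kb": None,
        "gpu": None,
    }
    # /proc/meminfo only exists on Linux; keep None on other systems.
    try:
        with open("/proc/meminfo") as handle:
            for line in handle:
                if line.startswith("MemTotal"):
                    specs["mem_total_kb"] = int(line.split()[1])
                    break
    except OSError:
        pass
    # Query nvidia-smi only if the binary is on the PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0:
            specs["gpu"] = result.stdout.strip() or None
    return specs


print(collect_specs())
```

On a machine without an NVIDIA GPU, the `gpu` entry simply stays `None`, so the same cell can run on CPU-only sessions.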
Library and Binary Versions
To see all installed libraries in your notebook, run this:
!python --version
# Python 3.10.13
!conda list
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
absl-py 1.4.0 pypi_0 pypi
accelerate 0.28.0 pypi_0 pypi
# ....
transformers 4.38.2 pypi_0 pypi
# ....
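When you only care about a handful of libraries, you can also query their versions from Python. This is a small sketch based on the standard library module `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("transformers", "accelerate"):
    print(name, installed_version(name))
```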
To see which other binaries are installed:
!find /usr/bin -executable|sort
/usr/bin
/usr/bin/7z
/usr/bin/7za
/usr/bin/7zr
/usr/bin/X11
/usr/bin/apt
/usr/bin/bash
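To check for a few specific binaries instead of scanning the whole listing, `shutil.which` from the standard library can be used. A minimal sketch; the helper name `find_binaries` and the list of binaries are only examples.

```python
import shutil


def find_binaries(names):
    """Map each binary name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


print(find_binaries(["7z", "bash", "conda", "nvidia-smi"]))
```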
Setup
Library Version Pinning
Most cloud provider notebooks come with pre-installed libraries, and most published notebooks simply install the latest version of all dependencies. That works at the time the notebook is published. If you try a notebook that is six months old, chances are high that it no longer runs.
Software libraries evolve, including changes to their APIs, available parameters, and return types. Therefore, it is crucial to apply strict version pinning in your projects so that what runs today still runs six months from now.
Here is an example from my Kaggle LLM Fine-Tuning notebook:
# Transformers installation
!pip install -U transformers==4.30 tensorflow==2.15
!pip install accelerate==0.27.2 peft==0.10.0 bitsandbytes==0.43.0 trl==0.8.1 datasets==2.1.0
!pip install einops==0.7.0 fsspec==2024.2.0
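Because the environment may already contain other versions, it can be worth verifying after installation that the pins actually took effect. The following sketch compares a dictionary of expected versions against what is installed. The names `PINNED`, `_lookup`, and `find_mismatches` are hypothetical, and the comparison is a plain string match, so a pin like `4.30` would not match an installed `4.30.0`.

```python
from importlib import metadata

# A few of the pins from the install commands above.
PINNED = {
    "accelerate": "0.27.2",
    "peft": "0.10.0",
    "trl": "0.8.1",
}


def _lookup(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=_lookup):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


print(find_mismatches(PINNED))
```

An empty dictionary means all listed pins are satisfied. Any other result tells you which package to reinstall before starting a long training run.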
Binary Version Pinning and Execution
Some projects require you to use a specific version of an installed binary, such as Python.
An internet search reveals a plethora of methods, some dating back several years, including Linux install commands, pipx, pyenv, and conda.
In environments where conda is available, you can install a specific Python version as shown:
!conda create -n py3.8 -y \
&& source /opt/conda/bin/activate py3.8 \
&& conda install python=3.8 -y \
&& python --version
When using this specific binary, keep in mind that each shell command in a Jupyter notebook is essentially a one-off command: an environment activated in one command is no longer active in the next. Therefore, you need to activate the environment at the start of every command and chain the follow-up commands together, like this:
!source /opt/conda/bin/activate py3.8 \
&& python --version \
&& cd llm-evaluation \
&& pip install -r requirements.txt
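If you repeat this pattern often, a tiny helper can build the chained command line for you. This is only a sketch: the helper name `in_env` is hypothetical, and the activate path is the one used in the commands above, which may differ in other environments.

```python
import shlex

ACTIVATE = "/opt/conda/bin/activate"  # path taken from the commands above


def in_env(env_name, commands):
    """Build one shell line that activates env_name and then runs each command."""
    steps = [f"source {ACTIVATE} {shlex.quote(env_name)}"] + list(commands)
    return " && ".join(steps)


line = in_env(
    "py3.8",
    ["python --version", "cd llm-evaluation", "pip install -r requirements.txt"],
)
print(line)
```

The resulting string can then be executed in one go, for example with `subprocess.run(["bash", "-c", line], check=True)`.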
Prevent Data Logging to External Providers
Some cloud environments automatically capture telemetry data and send it to external providers.
On Kaggle, the wandb library is pre-installed and is invoked automatically during training. If you do not need it, you can uninstall it with this command:
!pip uninstall wandb -y
Alternatively, you can set an environment variable that switches wandb to offline mode, so that runs are only logged locally:
import os
os.environ["WANDB_MODE"] = "offline"
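When using the Hugging Face Trainer, disable all reporting integrations with this:
args = TrainingArguments(
    ...
    # Use the string "none"; in many transformers versions, passing None falls back to all installed integrations
    report_to="none",
)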
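The environment variables can also be set together at the top of the notebook, before any training library is imported. A minimal sketch, assuming the `disabled` mode of wandb and the `HF_HUB_DISABLE_TELEMETRY` variable of the huggingface_hub library; neither is part of my original notebook, so check them against the versions you have installed.

```python
import os

# Set these before importing wandb, transformers, or huggingface_hub.
os.environ["WANDB_MODE"] = "disabled"         # assumption: wandb calls become no-ops
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"  # assumption: turns off Hub usage telemetry

for key in ("WANDB_MODE", "HF_HUB_DISABLE_TELEMETRY"):
    print(key, "=", os.environ[key])
```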
Execution
Periodically Save Training Artifacts
Some cloud environments do not guarantee how long a session keeps running. Therefore, you should save training results automatically and periodically.
With the Hugging Face Trainer, use this:
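from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama-7b-qlora-instruct",
    # Write a checkpoint after every step; raise this value to reduce disk usage
    save_steps=1,
)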
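The Trainer writes each checkpoint into a subdirectory named `checkpoint-<step>` inside `output_dir`. After a session was stopped, you need the most recent of these directories to continue. The following sketch finds it with the standard library only; the helper name `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path

CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint directory with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for entry in root.iterdir():
        match = CHECKPOINT_PATTERN.match(entry.name)
        if match and entry.is_dir():
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare by numeric step so that checkpoint-80 ranks above checkpoint-9.
    return max(candidates, key=lambda item: item[0])[1]


print(latest_checkpoint("./llama-7b-qlora-instruct"))
```

The returned path can be passed to the Trainer as `trainer.train(resume_from_checkpoint=str(path))` to continue from that state.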
With TensorFlow, you need to create a Checkpoint and a CheckpointManager object, and call the manager's save method inside the training loop.
# Source: https://www.tensorflow.org/guide/checkpoint
import tensorflow as tf

# opt, net, iterator, and train_step are defined as in the TensorFlow guide
ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator)
manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)

def train_and_checkpoint(net, manager):
    #...
    for _ in range(50):
        example = next(iterator)
        loss = train_step(net, example, opt)
        ckpt.step.assign_add(1)
        # Save after every step; only the 3 most recent checkpoints are kept
        save_path = manager.save()
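Saving after every step can be slow for large models. An alternative is to save based on elapsed wall-clock time, which fits well with session limits that are also measured in time. The following is a framework-independent sketch; the class name `TimedSaver` and its parameters are hypothetical and not part of any library.

```python
import time


class TimedSaver:
    """Call save_fn whenever at least interval_s seconds passed since the last save."""

    def __init__(self, save_fn, interval_s, clock=time.monotonic):
        self.save_fn = save_fn
        self.interval_s = interval_s
        self.clock = clock
        self.last_save = clock()

    def maybe_save(self):
        """Save if the interval has elapsed; return True when a save happened."""
        now = self.clock()
        if now - self.last_save >= self.interval_s:
            self.save_fn()
            self.last_save = now
            return True
        return False
```

In the training loop, you would create `saver = TimedSaver(manager.save, 600)` once and call `saver.maybe_save()` after each step, which writes a checkpoint roughly every ten minutes.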
Manually Export Training Artifacts
Output data resides in the virtual machine instance of the cloud provider. To get this data out, you have several provider-specific and provider-agnostic solutions, listed here from most generic to most specific:
Download via GUI
Some environments offer an option to download files from a dedicated directory path. First, create a zip file via a shell command, e.g. !zip -r file.zip "/kaggle/working/llama-7b-qlora-instruct/checkpoint-80". Second, download this zip file.
In Colab, you can access the file explorer via the GUI, or you can trigger a download dialog by executing this snippet:
from google.colab import files
files.download(zipfile_name)
In Kaggle, you can also use the GUI, or open a clickable link with this code:
from IPython.display import FileLink, display
display(FileLink(zipfile_name))
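If the zip binary is not available in your environment, the archive can also be created from Python. This is a minimal sketch using `shutil.make_archive` from the standard library; the helper name `zip_directory` is hypothetical.

```python
import shutil
from pathlib import Path


def zip_directory(source_dir, archive_stem):
    """Zip the contents of source_dir into <archive_stem>.zip and return its path."""
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"not a directory: {source}")
    # make_archive appends the .zip extension itself and returns the final file name.
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(source))
```

For example, `zipfile_name = zip_directory("/kaggle/working/llama-7b-qlora-instruct/checkpoint-80", "/kaggle/working/checkpoint-80")` produces a file that the download snippets can use. Write the archive to a location outside of the directory you are zipping, so that the archive does not try to include itself.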
Upload to Cloud Storage
Another option is to upload the results to a cloud storage repository. This requires handing access credentials to the environment, so only do it if you trust the environment.
For accessing Google Drive from Colab, use the following snippet. It shows an inline prompt that starts an interactive login and then mounts the drive at the specified mount point.
from google.colab import drive
drive.mount('/content/gdrive')
For accessing Amazon S3 cloud storage, use the boto3 library:
import boto3

# file_name, bucket, and object_name need to be set to your own values
s3_client = boto3.client('s3')
response = s3_client.upload_file(file_name, bucket, object_name)
Prevent Session Timeout
Most cloud environments have an idle timeout: after a certain period in which you do not interact with the page, the environment is stopped and your results are lost. The key is to simulate browser interaction with a script. Open the browser console, then run the following script:
document.body.addEventListener('click', () => {
    console.log("click");
});

const click = () => {
    const simulate = new MouseEvent('click', {
        view: window,
        bubbles: true,
        cancelable: true,
        clientX: 100,
    });
    document.body.dispatchEvent(simulate);
};

const sleep = (delay) => new Promise((resolve) => setTimeout(resolve, delay));

const repeatedClick = async () => {
    while (true) {
        click();
        await sleep(60000);
    }
};

repeatedClick();
This will keep the session active even when you un-focus the browser window.
Conclusion
When fine-tuning or evaluating LLMs in cloud environments, several restrictions apply. This blog post presents a set of tricks and best practices to make these environments more robust for your projects. You learned how to inspect the hardware, libraries, and binaries, how to apply strict version pinning, how to save results periodically and automatically, and how to prevent session timeouts.