Llama
Table of contents
1. Introduction
1.1. performance 4/8/16 bit
1.2. 4-bit Model Requirements for LLaMA
Model | Model Size | Minimum Total VRAM | Card examples | RAM/Swap to Load |
---|---|---|---|---|
LLaMA-7B | 3.5GB | 6GB | GTX 1660, 2060, AMD 5700xt, RTX 3050, 3060 | 16 GB |
LLaMA-13B | 6.5GB | 10GB | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 32 GB |
LLaMA-30B | 15.8GB | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 64 GB |
LLaMA-65B | 31.2GB | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada | 128 GB |
1.3. 8-bit Model Requirements for LLaMA
Model | VRAM Used | Minimum Total VRAM | Card examples | RAM/Swap to Load |
---|---|---|---|---|
LLaMA-7B | 9.2GB | 10GB | 3060 12GB, RTX 3080 10GB, RTX 3090 | 24 GB |
LLaMA-13B | 16.3GB | 20GB | RTX 3090 Ti, RTX 4090 | 32GB |
LLaMA-30B | 36GB | 40GB | A6000 48GB, A100 40GB | 64GB |
LLaMA-65B | 74GB | 80GB | A100 80GB | 128GB |
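The size figures in both tables roughly track parameter count × bytes per weight; a back-of-the-envelope sketch (this helper is illustrative, not part of any tool mentioned here):

```python
def approx_weight_size_gb(params_billions: float, bits: int) -> float:
    """Rough on-disk size of the weights alone: params x (bits / 8) bytes."""
    return params_billions * bits / 8

# 4-bit LLaMA-7B: 7e9 params x 0.5 bytes ~= 3.5 GB, matching the table
print(approx_weight_size_gb(7, 4))   # 3.5
print(approx_weight_size_gb(13, 4))  # 6.5
print(approx_weight_size_gb(65, 4))  # 32.5 (table: 31.2GB; real checkpoints vary a bit)
```

actual runtime VRAM is higher than the raw weight size because of activations and the KV cache, which is why the "Minimum Total VRAM" column is always larger.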
1.4. downloading the correct models
The original leaked weights won’t work. You need the “HFv2” (HuggingFace version 2) converted model weights. You can get them by using this torrent or this magnet link.
The WRONG original leaked weights have filenames that look like:
consolidated.00.pth
consolidated.01.pth
The CORRECT “HF Converted” weights have filenames that look like:
pytorch_model-00001-of-00033.bin
pytorch_model-00002-of-00033.bin
pytorch_model-00003-of-00033.bin
pytorch_model-00004-of-00033.bin
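if you're unsure which set you have, the filename pattern alone is enough to tell them apart; a small illustrative check (not part of the webui):

```python
import re

def is_hf_converted(filename: str) -> bool:
    """HF-converted shards are named pytorch_model-XXXXX-of-XXXXX.bin."""
    return re.fullmatch(r"pytorch_model-\d{5}-of-\d{5}\.bin", filename) is not None

print(is_hf_converted("pytorch_model-00001-of-00033.bin"))  # True
print(is_hf_converted("consolidated.00.pth"))               # False
```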
now place the model folders into the models subfolder of text-generation-webui:
loeken@the-machine:~/Projects/text-generation-webui$ tree models/decapoda-research_llama-7b-hf/
models/decapoda-research_llama-7b-hf/
├── config.json
├── generation_config.json
├── huggingface-metadata.txt
├── pytorch_model-00001-of-00033.bin
├── pytorch_model-00002-of-00033.bin
├── pytorch_model-00003-of-00033.bin
├── pytorch_model-00004-of-00033.bin
├── pytorch_model-00005-of-00033.bin
├── pytorch_model-00006-of-00033.bin
├── pytorch_model-00007-of-00033.bin
├── pytorch_model-00008-of-00033.bin
├── pytorch_model-00009-of-00033.bin
├── pytorch_model-00010-of-00033.bin
├── pytorch_model-00011-of-00033.bin
├── pytorch_model-00012-of-00033.bin
├── pytorch_model-00013-of-00033.bin
├── pytorch_model-00014-of-00033.bin
├── pytorch_model-00015-of-00033.bin
├── pytorch_model-00016-of-00033.bin
├── pytorch_model-00017-of-00033.bin
├── pytorch_model-00018-of-00033.bin
├── pytorch_model-00019-of-00033.bin
├── pytorch_model-00020-of-00033.bin
├── pytorch_model-00021-of-00033.bin
├── pytorch_model-00022-of-00033.bin
├── pytorch_model-00023-of-00033.bin
├── pytorch_model-00024-of-00033.bin
├── pytorch_model-00025-of-00033.bin
├── pytorch_model-00026-of-00033.bin
├── pytorch_model-00027-of-00033.bin
├── pytorch_model-00028-of-00033.bin
├── pytorch_model-00029-of-00033.bin
├── pytorch_model-00030-of-00033.bin
├── pytorch_model-00031-of-00033.bin
├── pytorch_model-00032-of-00033.bin
├── pytorch_model-00033-of-00033.bin
├── pytorch_model.bin.index.json
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
└── tokenizer.model
0 directories, 41 files
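a partially downloaded model will fail to load, so it can be worth verifying that every shard named in pytorch_model.bin.index.json is actually on disk. A hypothetical helper, assuming the standard HuggingFace index layout with a "weight_map" key:

```python
import json
from pathlib import Path

def missing_shards(model_dir: str) -> set:
    """Return shard filenames listed in the index but missing from model_dir."""
    index_path = Path(model_dir) / "pytorch_model.bin.index.json"
    index = json.loads(index_path.read_text())
    # weight_map maps tensor names to shard files, e.g. "pytorch_model-00001-of-00033.bin"
    needed = set(index["weight_map"].values())
    present = {p.name for p in Path(model_dir).glob("*.bin")}
    return needed - present
```

an empty result means all shards listed in the index are present.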
alternatively you can use the HuggingFace ones: https://huggingface.co/decapoda-research
2. text-generation-webui
https://github.com/oobabooga/text-generation-webui is a nice dashboard that allows you to load various models and make them generate output, and you can also use it to train models.
2.1 Ubuntu 22.04
2.1.0. youtube video
A video walking you through the setup can be found here:
2.1.1. update the drivers
in the “software updater” update the drivers to the latest version of the proprietary driver.
2.1.2. reboot
reboot to switch to the new driver
2.1.3. install docker
sudo apt update
sudo apt-get install curl
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-compose -y
sudo usermod -aG docker $USER
newgrp docker
2.1.4. docker & container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64 /" | \
sudo tee /etc/apt/sources.list.d/nvidia.list > /dev/null
sudo apt update
sudo apt install nvidia-docker2 nvidia-container-runtime -y
sudo systemctl restart docker
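before moving on, it's worth checking that containers can actually see the GPU; the CUDA image tag below is just an example, any image that ships nvidia-smi works:

```shell
# should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```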
2.1.5. clone the repo
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
2.1.6. prepare models
download and place the models inside the models folder. tested with:
4bit https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617 https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105
8bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789
2.1.7. prepare .env file
edit .env values to your needs.
cp .env.example .env
nano .env
2.1.8. startup docker container
docker-compose up --build
2.2. Manjaro
manjaro/arch is similar to ubuntu, just the dependency installation is more convenient
2.2.1 update the drivers
sudo mhwd -a pci nonfree 0300
2.2.2 reboot
reboot
2.2.3 docker & container toolkit
yay -S docker docker-compose buildkit gcc nvidia-docker
sudo usermod -aG docker $USER
newgrp docker
sudo systemctl restart docker # required by nvidia-container-runtime
2.2.4 continue with ubuntu task
continue at 2.1.5. (clone the repo)
2.3. Windows
2.3.0. youtube video
A video walking you through the setup can be found here:
2.3.1. choco package manager
install the package manager (https://chocolatey.org/):
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
2.3.2. install drivers/dependencies
choco install nvidia-display-driver cuda git docker-desktop
2.3.3. install wsl
wsl --install
2.3.4. reboot
after the reboot, enter a username/password in wsl
2.3.5. git clone && startup
clone the repo and edit .env values to your needs.
cd Desktop
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
copy .env.example .env
notepad .env
2.3.6. prepare models
download and place the models inside the models folder. tested with:
4bit https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617 https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105
8bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789
2.3.7. startup
docker-compose up