Table of contents

1. Introduction

1.1. performance of 4/8/16 bit compared

1.2. 4-bit Model Requirements for LLaMA

Model      Model Size  Minimum Total VRAM  Card examples                                                              RAM/Swap to Load
LLaMA-7B   3.5 GB      6 GB                GTX 1660, RTX 2060, AMD 5700 XT, RTX 3050, RTX 3060                        16 GB
LLaMA-13B  6.5 GB      10 GB               AMD 6900 XT, RTX 2060 12 GB, RTX 3060 12 GB, RTX 3080, A2000               32 GB
LLaMA-30B  15.8 GB     20 GB               RTX 3080 20 GB, A4500, A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100     64 GB
LLaMA-65B  31.2 GB     40 GB               A100 40 GB, 2x RTX 3090, 2x RTX 4090, A40, RTX A6000, RTX 8000, Titan Ada  128 GB
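
The "Model Size" column tracks roughly 0.5 bytes per parameter for 4-bit weights. A quick sketch of that rule of thumb (an approximation only; real files also store quantization scales, so the table's numbers differ slightly):

```shell
# Rough 4-bit weight size: ~0.5 bytes per parameter (approximation;
# actual file sizes vary by quantization scheme)
for b in 7 13 30 65; do
  awk -v p="$b" 'BEGIN { printf "LLaMA-%dB: ~%.1f GB\n", p, p * 0.5 }'
done
```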

1.3. 8-bit Model Requirements for LLaMA

Model      VRAM Used  Minimum Total VRAM  Card examples                              RAM/Swap to Load
LLaMA-7B   9.2 GB     10 GB               RTX 3060 12 GB, RTX 3080 10 GB, RTX 3090   24 GB
LLaMA-13B  16.3 GB    20 GB               RTX 3090 Ti, RTX 4090                      32 GB
LLaMA-30B  36 GB      40 GB               A6000 48 GB, A100 40 GB                    64 GB
LLaMA-65B  74 GB      80 GB               A100 80 GB                                 128 GB
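
To compare your card against the "Minimum Total VRAM" column, you can read the free memory that nvidia-smi reports (in MiB) and check it against the table. A minimal sketch with a hypothetical vram_ok helper:

```shell
# Hypothetical helper: does FREE_MIB cover REQUIRED_GB from the table above?
vram_ok() {  # usage: vram_ok FREE_MIB REQUIRED_GB
  [ "$1" -ge $(( $2 * 1024 )) ]
}

# Example: feed it the value printed by
#   nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
vram_ok 24576 20 && echo "enough VRAM for LLaMA-13B in 8-bit"
```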

1.4. downloading the correct models

The original leaked weights won’t work. You need the “HFv2” (HuggingFace version 2) converted model weights. You can get them by using this torrent or this magnet link.

The WRONG original leaked weights have filenames that look like:

consolidated.00.pth
consolidated.01.pth

The CORRECT “HF Converted” weights have filenames that look like:

pytorch_model-00001-of-00033.bin
pytorch_model-00002-of-00033.bin
pytorch_model-00003-of-00033.bin
pytorch_model-00004-of-00033.bin
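
A quick way to check which format a downloaded folder is in (a sketch; adjust the models/ path to wherever you keep your weights):

```shell
# Sharded pytorch_model-*-of-*.bin files => HF-converted;
# consolidated.*.pth files => original leaked format
if ls models/*/pytorch_model-*-of-*.bin >/dev/null 2>&1; then
  echo "HF-converted weights found"
elif ls models/*/consolidated.*.pth >/dev/null 2>&1; then
  echo "original leaked format - needs conversion"
fi
```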

Now place the model folder inside the models subfolder:

loeken@the-machine:~/Projects/text-generation-webui$ tree models/decapoda-research_llama-7b-hf/
models/decapoda-research_llama-7b-hf/
├── config.json
├── generation_config.json
├── huggingface-metadata.txt
├── pytorch_model-00001-of-00033.bin
├── pytorch_model-00002-of-00033.bin
├── pytorch_model-00003-of-00033.bin
├── pytorch_model-00004-of-00033.bin
├── pytorch_model-00005-of-00033.bin
├── pytorch_model-00006-of-00033.bin
├── pytorch_model-00007-of-00033.bin
├── pytorch_model-00008-of-00033.bin
├── pytorch_model-00009-of-00033.bin
├── pytorch_model-00010-of-00033.bin
├── pytorch_model-00011-of-00033.bin
├── pytorch_model-00012-of-00033.bin
├── pytorch_model-00013-of-00033.bin
├── pytorch_model-00014-of-00033.bin
├── pytorch_model-00015-of-00033.bin
├── pytorch_model-00016-of-00033.bin
├── pytorch_model-00017-of-00033.bin
├── pytorch_model-00018-of-00033.bin
├── pytorch_model-00019-of-00033.bin
├── pytorch_model-00020-of-00033.bin
├── pytorch_model-00021-of-00033.bin
├── pytorch_model-00022-of-00033.bin
├── pytorch_model-00023-of-00033.bin
├── pytorch_model-00024-of-00033.bin
├── pytorch_model-00025-of-00033.bin
├── pytorch_model-00026-of-00033.bin
├── pytorch_model-00027-of-00033.bin
├── pytorch_model-00028-of-00033.bin
├── pytorch_model-00029-of-00033.bin
├── pytorch_model-00030-of-00033.bin
├── pytorch_model-00031-of-00033.bin
├── pytorch_model-00032-of-00033.bin
├── pytorch_model-00033-of-00033.bin
├── pytorch_model.bin.index.json
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
└── tokenizer.model

0 directories, 41 files

Alternatively, you can use the converted weights published on Hugging Face: https://huggingface.co/decapoda-research

2. text-generation-webui

https://github.com/oobabooga/text-generation-webui is a nice dashboard that lets you load various models, generate text with them, and also train models.

2.1 Ubuntu 22.04

2.1.0. youtube video

A video walking you through the setup can be found here:

oobabooga text-generation-webui setup in docker on ubuntu 22.04

2.1.1. update the drivers

In the “Software Updater”, update the drivers to the latest version of the proprietary driver.

2.1.2. reboot

Reboot to switch to the new driver.

2.1.3. install docker

sudo apt update
sudo apt-get install curl
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-compose -y
sudo usermod -aG docker $USER
newgrp docker

2.1.4. nvidia docker & container toolkit

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64 /" | \
sudo tee /etc/apt/sources.list.d/nvidia.list > /dev/null 
sudo apt update
sudo apt install nvidia-docker2 nvidia-container-runtime -y
sudo systemctl restart docker

2.1.5. clone the repo

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui

2.1.6. prepare models

Download the models and place them inside the models folder. Tested with:

4-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617 and https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105

8-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789

2.1.7. prepare .env file

Edit the .env values to your needs:

cp .env.example .env
nano .env
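
For reference, the values you will typically want to touch look like the following. The exact variable names come from .env.example in your checkout of the repo; treat the entries below as illustrative, not authoritative:

```
# Illustrative .env entries - confirm the exact names in .env.example
TORCH_CUDA_ARCH_LIST=8.6      # CUDA compute capability of your GPU (8.6 = RTX 30-series)
CLI_ARGS=--listen --chat      # flags passed to the web UI inside the container
```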

2.1.8. startup docker container

docker-compose up --build

2.2. Manjaro

Manjaro/Arch is similar to Ubuntu; only the dependency installation is more convenient.

2.2.1 update the drivers

sudo mhwd -a pci nonfree 0300

2.2.2 reboot

reboot

2.2.3 docker & container toolkit

yay -S docker docker-compose buildkit gcc nvidia-docker
sudo usermod -aG docker $USER
newgrp docker
sudo systemctl restart docker # required by nvidia-container-runtime

2.2.4 continue with ubuntu task

Continue at 2.1.5. clone the repo.

2.3. Windows

2.3.0. youtube video

A video walking you through the setup can be found here: oobabooga text-generation-webui setup in docker on windows 11

2.3.1. choco package manager

Install the Chocolatey package manager (https://chocolatey.org/):

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

2.3.2. install drivers/dependencies

choco install nvidia-display-driver cuda git docker-desktop

2.3.3. install wsl

wsl --install

2.3.4. reboot

After the reboot, enter a username/password in WSL.

2.3.5. git clone && startup

Clone the repo and edit the .env values to your needs:

cd Desktop
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
copy .env.example .env
notepad .env

2.3.6. prepare models

Download the models and place them inside the models folder. Tested with:

4-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617 and https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105

8-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789

2.3.7. startup

docker-compose up