Table of contents

1. Introduction

1.1. performance of 4/8/16 bit compared

1.2. 4-bit Model Requirements for LLaMA

Model      Model Size  Minimum Total VRAM  Card examples                                                              RAM/Swap to Load
LLaMA-7B   3.5 GB      6 GB                GTX 1660, RTX 2060, AMD 5700 XT, RTX 3050, RTX 3060                        16 GB
LLaMA-13B  6.5 GB      10 GB               AMD 6900 XT, RTX 2060 12 GB, RTX 3060 12 GB, RTX 3080, A2000               32 GB
LLaMA-30B  15.8 GB     20 GB               RTX 3080 20 GB, A4500, A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100     64 GB
LLaMA-65B  31.2 GB     40 GB               A100 40 GB, 2x RTX 3090, 2x RTX 4090, A40, RTX A6000, RTX 8000, Titan Ada  128 GB
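
The "Model Size" column tracks roughly 0.5 bytes per parameter for 4-bit weights. A quick sketch of that rule of thumb (an approximation only; real files also store quantization scales, so the table's numbers differ slightly):

```shell
# Rough 4-bit weight size: ~0.5 bytes per parameter (approximation;
# actual file sizes vary by quantization scheme)
for b in 7 13 30 65; do
  awk -v p="$b" 'BEGIN { printf "LLaMA-%dB: ~%.1f GB\n", p, p * 0.5 }'
done
```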

1.3. 8-bit Model Requirements for LLaMA

Model      VRAM Used  Minimum Total VRAM  Card examples                              RAM/Swap to Load
LLaMA-7B   9.2 GB     10 GB               RTX 3060 12 GB, RTX 3080 10 GB, RTX 3090   24 GB
LLaMA-13B  16.3 GB    20 GB               RTX 3090 Ti, RTX 4090                      32 GB
LLaMA-30B  36 GB      40 GB               A6000 48 GB, A100 40 GB                    64 GB
LLaMA-65B  74 GB      80 GB               A100 80 GB                                 128 GB
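
To compare your card against the "Minimum Total VRAM" column, you can read the free memory that nvidia-smi reports (in MiB) and check it against the table. A minimal sketch with a hypothetical vram_ok helper:

```shell
# Hypothetical helper: does FREE_MIB cover REQUIRED_GB from the table above?
vram_ok() {  # usage: vram_ok FREE_MIB REQUIRED_GB
  [ "$1" -ge $(( $2 * 1024 )) ]
}

# Example: feed it the value printed by
#   nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
vram_ok 24576 20 && echo "enough VRAM for LLaMA-13B in 8-bit"
```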

1.4. downloading the correct models

The original leaked weights won’t work. You need the “HFv2” (HuggingFace version 2) converted model weights. You can get them by using this torrent or this magnet link.

The WRONG original leaked weights have filenames that look like:

consolidated.00.pth
consolidated.01.pth

The CORRECT “HF Converted” weights have filenames that look like:

pytorch_model-00001-of-00033.bin
pytorch_model-00002-of-00033.bin
pytorch_model-00003-of-00033.bin
pytorch_model-00004-of-00033.bin
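
A quick way to check which format a downloaded folder is in (a sketch; adjust the models/ path to wherever you keep your weights):

```shell
# Sharded pytorch_model-*-of-*.bin files => HF-converted;
# consolidated.*.pth files => original leaked format
if ls models/*/pytorch_model-*-of-*.bin >/dev/null 2>&1; then
  echo "HF-converted weights found"
elif ls models/*/consolidated.*.pth >/dev/null 2>&1; then
  echo "original leaked format - needs conversion"
fi
```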

Now place the model folder inside the models subfolder:

loeken@the-machine:~/Projects/text-generation-webui$ tree models/decapoda-research_llama-7b-hf/
models/decapoda-research_llama-7b-hf/
├── config.json
├── generation_config.json
├── huggingface-metadata.txt
├── pytorch_model-00001-of-00033.bin
├── pytorch_model-00002-of-00033.bin
├── pytorch_model-00003-of-00033.bin
├── pytorch_model-00004-of-00033.bin
├── pytorch_model-00005-of-00033.bin
├── pytorch_model-00006-of-00033.bin
├── pytorch_model-00007-of-00033.bin
├── pytorch_model-00008-of-00033.bin
├── pytorch_model-00009-of-00033.bin
├── pytorch_model-00010-of-00033.bin
├── pytorch_model-00011-of-00033.bin
├── pytorch_model-00012-of-00033.bin
├── pytorch_model-00013-of-00033.bin
├── pytorch_model-00014-of-00033.bin
├── pytorch_model-00015-of-00033.bin
├── pytorch_model-00016-of-00033.bin
├── pytorch_model-00017-of-00033.bin
├── pytorch_model-00018-of-00033.bin
├── pytorch_model-00019-of-00033.bin
├── pytorch_model-00020-of-00033.bin
├── pytorch_model-00021-of-00033.bin
├── pytorch_model-00022-of-00033.bin
├── pytorch_model-00023-of-00033.bin
├── pytorch_model-00024-of-00033.bin
├── pytorch_model-00025-of-00033.bin
├── pytorch_model-00026-of-00033.bin
├── pytorch_model-00027-of-00033.bin
├── pytorch_model-00028-of-00033.bin
├── pytorch_model-00029-of-00033.bin
├── pytorch_model-00030-of-00033.bin
├── pytorch_model-00031-of-00033.bin
├── pytorch_model-00032-of-00033.bin
├── pytorch_model-00033-of-00033.bin
├── pytorch_model.bin.index.json
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
└── tokenizer.model

0 directories, 41 files

Alternatively, you can use the converted weights published on Hugging Face: https://huggingface.co/decapoda-research

2. text-generation-webui

https://github.com/oobabooga/text-generation-webui is a nice dashboard that lets you load various models, generate text with them, and also train models.

2.1 Ubuntu 22.04

2.1.0. youtube video

A video walking you through the setup can be found here:

oobabooga text-generation-webui setup in docker on ubuntu 22.04

2.1.1. update the drivers

In the “Software Updater”, update the drivers to the latest version of the proprietary driver.

2.1.2. reboot

Reboot to switch to the new driver.

2.1.3. install docker

sudo apt update
sudo apt-get install curl
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-compose -y
sudo usermod -aG docker $USER
newgrp docker

2.1.4. nvidia docker & container toolkit

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/amd64 /" | \
sudo tee /etc/apt/sources.list.d/nvidia.list > /dev/null 
sudo apt update
sudo apt install nvidia-docker2 nvidia-container-runtime -y
sudo systemctl restart docker

2.1.5. clone the repo

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui

2.1.6. prepare models

Download the models and place them inside the models folder. Tested with:

4-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617 and https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105

8-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789

2.1.7. prepare .env file

Edit the .env values to your needs:

cp .env.example .env
nano .env
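
For reference, the values you will typically want to touch look like the following. The exact variable names come from .env.example in your checkout of the repo; treat the entries below as illustrative, not authoritative:

```
# Illustrative .env entries - confirm the exact names in .env.example
TORCH_CUDA_ARCH_LIST=8.6      # CUDA compute capability of your GPU (8.6 = RTX 30-series)
CLI_ARGS=--listen --chat      # flags passed to the web UI inside the container
```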

2.1.8. startup docker container

docker-compose up --build

2.2. Manjaro

Manjaro/Arch is similar to Ubuntu; only the dependency installation is more convenient.

2.2.1 update the drivers

sudo mhwd -a pci nonfree 0300

2.2.2 reboot

reboot

2.2.3 docker & container toolkit

yay -S docker docker-compose buildkit gcc nvidia-docker
sudo usermod -aG docker $USER
newgrp docker
sudo systemctl restart docker # required by nvidia-container-runtime

2.2.4 continue with ubuntu task

Continue at 2.1.5. clone the repo.

2.3. Windows

2.3.0. youtube video

A video walking you through the setup can be found here: oobabooga text-generation-webui setup in docker on windows 11

2.3.1. choco package manager

Install the Chocolatey package manager (https://chocolatey.org/):

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

2.3.2. install drivers/dependencies

choco install nvidia-display-driver cuda git docker-desktop

2.3.3. install wsl

wsl --install

2.3.4. reboot

After the reboot, enter a username/password in WSL.

2.3.5. git clone && startup

Clone the repo and edit the .env values to your needs:

cd Desktop
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
copy .env.example .env
notepad .env

2.3.6. prepare models

Download the models and place them inside the models folder. Tested with:

4-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617 and https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105

8-bit: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789

2.3.7. startup

docker-compose up