Proxmox update cluster from 3 to 4

The new Proxmox was released on 05.10.2015 (see http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_4.0). You probably ended up here by typing “Proxmox update cluster from 3 to 4” into Google. In this article I will outline how I updated my “poor man’s cluster”. My Proxmox cluster currently consists of 5 servers hosted at 3 datacentres. I will try to add screenshots etc. because most people I know prefer a hands-on description rather than the technical writeups.

According to the migration paper, which can be found at https://pve.proxmox.com/wiki/Upgrade_from_3.x_to_4.0:

Proxmox VE 4.0 introduces major new features, therefore the upgrade must be carefully planned and tested. Depending on your existing configuration, several manual steps are required, including some downtime. NEVER start the upgrade process without a valid backup and without testing the same in a test lab setup.

Major upgrades for V4.0:

  • OpenVZ is removed, a conversion via backup/restore to LXC is needed
  • New corosync version, therefore clusters have to be re-established
  • New HA manager (replacing RGmanager, involving a complete HA re-configuration)

Generally speaking there are two possibilities to move from 3.x to 4.0:

  • In-place upgrade via apt, step by step
  • New installation on new hardware (and restore VMs from backup) – safest way.

In order to keep the downtime minimal I will try the “In-place upgrade via apt, step by step” guide. So let's dive in.

Note that a mixed cluster of v3 and v4 nodes is not possible.

Why bother to update?

There are a lot of reasons to update to the new version, but the key ones that motivated me are:

  • New Kernel Versions
  • Put OpenVZ to rest
  • Introduction of LXC
  • High Availability (HA) that supposedly works out of the box

 

Quick summary of steps for proxmox update cluster

Step 1 – 5 servers in the cluster; remove all VMs from server1 and move them over to the other servers.

Step 2 – Change the DNS of all VMs that ran on server1 to point to the new locations.

Step 3 – Update server1 to the new Proxmox 4.0 (remove it from the current cluster).

Step 4 – Create a new cluster on server1 using Proxmox 4.0.

Step 5 – Import the VMs that previously ran on server1 into the new cluster on server1.

Step 6 – Repeat steps 1–5 for every other server.

 

1 Migrate all OpenVZ containers to KVMs/LXC

OpenVZ is not available for kernels above 2.6.32, therefore a migration is necessary. Linux Container (LXC) technology is available in all mainline Linux kernels and is the future-proof technology introduced in the Proxmox VE 4.x series. You have a few options here:

  • Move the applications that are running inside OpenVZ containers over to KVMs
  • Convert the OpenVZ containers to LXC containers – tutorial (a quick sketch of that route follows below)
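
For the conversion route the rough flow is: dump the container on the old node, then restore that dump as an LXC container on the upgraded node. A minimal sketch, assuming a container with CTID 101 and the default dump directory – adjust IDs, paths and storage to your setup:

Code:
# on the Proxmox 3.x node: dump the OpenVZ container
vzdump 101 -compress gzip -dumpdir /var/lib/vz/dump
# later, on the upgraded 4.x node: restore that dump as an LXC container
pct restore 101 /var/lib/vz/dump/vzdump-openvz-101-<timestamp>.tar.gz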


2 Backup all existing systems

I hope you already have an automated system to back up your virtual machines regularly; if not, create a backup of each VM now – in case of a zombie apocalypse you should be prepared. You can create backups easily using the Proxmox web interface; the default backup location for KVMs on your servers is /var/lib/vz/dump. Most server providers offer a few 100 GB of backup space accessible via NFS/FTP etc., and this backup space is ideal for storing your backups.
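
If you prefer the shell over the web interface, vzdump does the same job and can write straight to any mounted backup space. A sketch, assuming VMID 100 and the default dump directory:

Code:
# snapshot-mode backup of VM 100, lzo-compressed, written to the local dump directory
vzdump 100 -mode snapshot -compress lzo -dumpdir /var/lib/vz/dump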


3 Clean up your Test-Server

From the 5 servers I have running I have physical access to 1 of them; this will be the server I'll be testing the update from 3 to 4 on. The server name of that server is Berlin1. The other servers are located in datacentres, because they need more traffic than my local ISPs can handle. All servers are connected via OpenVPN to allow multicast to be sent between the servers.

proxmox current setup

So my first task is to move all virtual machines running on this server to other servers in my cluster, so these VMs remain online and I can test the update process on Berlin1 without affecting anybody else. Then, once I know how the update works, I can roll out these changes on the other production systems during off-peak hours (yes, night shift! 🙂).

I do keep regular backups of my Berlin1 node on other servers, so for me it was faster to use the last backups of these and import them on the other servers. For you it might be faster to simply migrate the KVMs from one server to another. So now my Berlin1 node is free of any VMs – as naked as it hasn't been since setting it up last year.

No more VMs on Berlin1

4 One last check before take off

Another peek at the Proxmox upgrade wiki tells us to ensure that we have met the preconditions for the in-place upgrade (a few shell checks that cover most of this list follow below):

  • upgraded to latest V3.4 version ( Check )

proxmox_version

  • reliable access to all configured storages ( Check: A copy of each VM on the Server and 2 redundant off-site backups )
  • healthy cluster ( Check )

Proxmox Cluster Status

  • no VM or CT running (note: VM live migration from 3.4 to 4.0 node or vice versa NOT possible) ( Check )
  • valid backup of all OpenVZ containers (needed for the conversion to LXC) ( Check )
  • valid backup of all VM (only needed if something goes wrong) ( Check )
  • Correct repository configuration (accessible both wheezy and jessie) ( Check )
  • at least 1GB free disk space at root mount point ( Check )

Space Free
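
Most of these points can be checked quickly from a shell on the node. A sketch of the checks I would run before continuing (adjust to your own storages and guests):

Code:
pveversion      # should report a pve-manager 3.4 version
pvecm status    # cluster healthy and quorate?
qm list         # no KVM guests still running on this node
vzlist          # no OpenVZ containers still running
df -h /         # at least 1GB free on the root mount point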


5 Remove Proxmox VE 3.x packages

Since I use the same apt repository lists I have to ensure all servers have the same software installed, so we'll start by bringing the system fully up to date:

Code:
apt-get update && apt-get dist-upgrade

Then we remove Proxmox VE 3.x packages in order to avoid dependency errors

Code:
apt-get remove proxmox-ve-2.6.32 pve-manager corosync-pve openais-pve redhat-cluster-pve pve-cluster pve-firmware


6 Update the repository

We adapt the repository locations, pointing everything to jessie, and update the apt database:

Code:
sed -i 's/wheezy/jessie/g' /etc/apt/sources.list
sed -i 's/wheezy/jessie/g' /etc/apt/sources.list.d/pve-enterprise.list
apt-get update

Ceph:

In case a Ceph server is used: Ceph repositories for jessie can be found at http://download.ceph.com, therefore /etc/apt/sources.list.d/ceph.list will contain

Code:
deb http://download.ceph.com/debian-hammer jessie main
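
One way to write that file in a single command (skip this entirely if you do not run Ceph):

Code:
echo "deb http://download.ceph.com/debian-hammer jessie main" > /etc/apt/sources.list.d/ceph.list
apt-get update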


7 Install the new Kernel

Important:

A new version of OpenSSH will be installed. By default this newer version does not allow root logins with a password, so make sure you have a non-root user to log in with later. This update process also removes the server from the cluster, so if you plan on using the SSH keys that were added when you created the cluster, this will not work. To add a user simply connect to your server and use “adduser newuser”.
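
A minimal sketch of preparing that fallback login, assuming the hypothetical user name newuser and that you also want sudo rights for it:

Code:
adduser newuser            # create the non-root user and set a password
apt-get install sudo -y    # sudo is not always present on a bare node
adduser newuser sudo       # let the new user become root via sudo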

 

First you search for the new kernel by using

Code:
apt-cache search pve-kernel-4

Then you install the latest version, in my example 4.2.3-2:

Code:
apt-get install pve-kernel-4.2.3-2-pve pve-firmware -y

proxmox update cluster

The actual proxmox update cluster command.

So all that's left to do is run the dist-upgrade command to turn our Debian wheezy into a Debian jessie:

Code:
apt-get dist-upgrade -y

Install bridge-utils to ensure the bridge will come back:

Code:
apt-get install bridge-utils -y

And finally reboot the server into the new kernel version:

Code:
reboot

After the reboot we check if we are now running the proper kernel by running

Code:
uname -a

And then we remove the old kernel

Code:
apt-get remove pve-kernel-2.6.32-* -y

and install our beloved Proxmox tools

Code:
apt-get install proxmox-ve

proxmox from another node in the cluster

It is not possible to mix Proxmox VE 3.x (and earlier) with Proxmox VE 4.0 in the same cluster.

Due to the new corosync 2.x the cluster has to be re-established, so we can remove berlin1 from the old cluster.
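
A sketch of taking the node out of the old cluster, run on one of the remaining 3.x nodes (berlin1 is my node name, use your own):

Code:
pvecm delnode berlin1    # remove berlin1 from the old 3.x cluster
pvecm nodes              # verify it is gone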

del_berlin1_node_from_old_cluster

 

8 Create new Cluster

After the installation we can log in and see that the cluster is there, but neither working nor healthy.

Cluster is there but not

Now we can create a new Cluster with our first Server by simply typing

Code:
pvecm create <clustername>

Make sure you use the same cluster name as you had before

Create new Cluster

So, you remember the beginning: my Berlin1 had been running roughly 20 virtual machines, which I moved over to other hosts, so right now these VMs are running on the other hosts. In the next step I will import all VMs back to Berlin1, shut down the clones on the other servers one by one and update DNS records so we can slowly migrate users over. (A good tip for this: lower your DNS TTL during all this moving back and forth.) But eventually we want to get all the servers back into the newly created cluster – and to create/join a cluster, each Proxmox server has to have no VMs running.

 

9 Cleanup entries from old Cluster

proxmox_after_update

At this moment it was initially a bit unclear to me whether that means the old configuration was ported over to the new cluster. After checking pvecm nodes and pvecm status, I came to the conclusion that the other nodes are not actually linked to this server anymore and that I only have leftover data inside my /etc/pve/nodes folder. So I changed to that directory and removed my old nodes (proxmox1, 2, 5 and 6).
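
A sketch of that cleanup, using the node names from my cluster – double-check pvecm status first so you only delete directories of nodes that really are no longer linked to this server:

Code:
pvecm nodes       # list the nodes known to the new cluster
pvecm status      # check quorum and membership
cd /etc/pve/nodes
rm -rf proxmox1 proxmox2 proxmox5 proxmox6    # leftover directories of the old 3.x nodes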

The web UI then updated and now only shows the one node.

just_berlin_but_still_red

We don't use Windows here – but we'll do a reboot anyway to let it recover without having to track down much here. After the server returns, the light is green:

Berlin1 after reboot, all green

So my plan was to create a new VM, load the last backup for the first VM into it, and repeat those steps for every VM.

But we are greeted by the lovely error message: “cluster not ready. no quorum? (500)”

create_new_vm_no_quorum

So I SSH into the server and make sure cman is running, which it turns out didn't start properly:

restart_cman
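
For reference: if a freshly created single-node cluster keeps complaining about missing quorum even after the cluster services are back up, a common workaround (not the route I took here, the restart was enough for me) is to tell the node that a single vote is sufficient:

Code:
pvecm expected 1    # lower the expected vote count so the lone node gets quorum
pvecm status        # should now report the node as quorate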

Now that the quorum is there I can create the VM and start importing the last backups:

proxmox_import_kvm_backup
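
Restoring can be done through the web interface as shown above, or from the shell with qmrestore. A sketch, assuming a hypothetical backup file for VMID 101 on local storage:

Code:
# restore the KVM backup archive as VM 101 onto the local storage
qmrestore /var/lib/vz/dump/vzdump-qemu-101-<timestamp>.vma.lzo 101 -storage local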

Next on the road: create new VMs for each of the backups and start importing. Coffee++ time…

10 Do it on the next servers

We do the same on the other servers in the cluster, with one small difference: instead of pvecm create IME we join the existing cluster using:

Code:
pvecm add berlin1 -force

After running these commands on proxmox6.internetz.local the node initially showed up red, but after one reboot we are back in business:

2servers_done
