LXCs and the problems with migration in Proxmox

2025-03-28
Updated: 2025-03-28

~ TLDR: I used HA to keep a VM and an LXC running in a three-node Proxmox cluster, but the LXC didn't work out.

The Proxmox cluster is a testing environment, and I was trying out HA for the first time. I wanted to keep a VM with a PostgreSQL database running, as well as an LXC with Forgejo. For that I set up a Gluster distributed storage across the nodes and put the DB VM's disk there. The LXC, unfortunately, was on local-lvm. I added both of them to HA and turned off the second node (which hosted the VM and the LXC). They seemed to migrate properly to another node, but when the VM started running again, the LXC didn't. Then I realized I had done something I shouldn't have. local-lvm is storage local to each node, so when that node went down the CT's disk went down with it, and HA could only move the CT's configuration. So I searched for what was wrong and how to fix it. The first thing was this error:

Requesting HA start for CT 102
service 'ct:102' in error state, must be disabled and fixed first
TASK ERROR: command 'ha-manager set ct:102 --state started' failed: exit code 255
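The error message already says what to do: the service has to be disabled before it can be fixed. Besides the GUI, the same step can be done from a shell on any cluster node. Here is a minimal sketch. `disable_ha_service` is a hypothetical wrapper name, while `ha-manager set ... --state disabled` and `ha-manager status` are the actual commands.

```bash
#!/usr/bin/env bash
# Sketch: shell equivalent of setting an HA resource's request state to "disabled" in the GUI.
# Assumption: run as root on any node of the cluster.
set -eu

disable_ha_service() {
    local sid="$1"   # HA service ID, e.g. ct:102
    # HA will not start a service in the "error" state until it is disabled and fixed.
    ha-manager set "$sid" --state disabled
    # Show the HA stack's view of all services to confirm the new state.
    ha-manager status
}

# Usage: disable_ha_service ct:102
# After fixing the CT, request it to run again with:
#   ha-manager set ct:102 --state started
```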

So I went to HA and changed the Request State for CT:102 to disabled. Then I turned the second node back on and tried to migrate the VM and the CT back. The DB VM migrated properly, but the CT gave another error:

task started by HA resource agent
2025-03-28 12:14:27 starting migration of CT 102 to node 'pve22' (192.108.1.2)
2025-03-28 12:14:27 found local volume 'local-lvm:vm-102-disk-0' (in current VM config)
2025-03-28 12:14:27 ERROR: storage migration for 'local-lvm:vm-102-disk-0' to storage 'local-lvm' failed - no such logical volume pve/vm-102-disk-0
2025-03-28 12:14:27 aborting phase 1 - cleanup resources
2025-03-28 12:14:27 ERROR: found stale volume copy 'local-lvm:vm-102-disk-0' on node 'pve22'
2025-03-28 12:14:27 start final cleanup
2025-03-28 12:14:27 ERROR: migration aborted (duration 00:00:01): storage migration for 'local-lvm:vm-102-disk-0' to storage 'local-lvm' failed - no such logical volume pve/vm-102-disk-0
TASK ERROR: migration aborted
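The "no such logical volume" error makes sense once you remember that local-lvm is per node. The CT's config had moved, but its disk pve/vm-102-disk-0 was still only on the node it came from. Below is a small sketch to confirm which nodes actually hold a given LV. It assumes root SSH between the nodes, which a Proxmox cluster sets up, and node addresses that you pass in yourself. `nodes_with_lv` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print every node (from a list you supply) that has the given LVM logical volume.
# Assumptions: root SSH works between cluster nodes; node names/IPs are passed as arguments.
set -eu

nodes_with_lv() {
    local lv="$1"; shift   # e.g. pve/vm-102-disk-0 (volume group / logical volume)
    local node
    for node in "$@"; do
        # lvs exits non-zero when the logical volume does not exist on that node.
        if ssh "root@$node" lvs "$lv" >/dev/null 2>&1; then
            echo "$node"
        fi
    done
}

# Usage: nodes_with_lv pve/vm-102-disk-0 192.108.1.2 192.108.1.3
```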

So I thought the issue -might- be solved by copying the CT volume to that node, and then it -could- be migrated back. After a bit of searching I found this thread and followed it:

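ssh pve-node2
# confirm that vm-102-disk-0 is still on this node's local-lvm
pvesm list local-lvm
# stream the disk over SSH and import it into local-lvm on the node that now holds CT 102
pvesm export local-lvm:vm-102-disk-0 raw+size - | ssh root@192.108.1.3 pvesm import local-lvm:vm-102-disk-0 raw+size -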
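The same pipe can be wrapped in a function with `pipefail` enabled. That way a failing `pvesm export` makes the whole command fail, instead of only the exit status of `ssh` counting. This sketch uses the same assumptions as the commands above. It runs on the node that has the volume, and the target (192.108.1.3 here) has a storage with the same ID. `copy_volume` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: copy a volume to another node with pvesm export/import over SSH.
# Assumptions: run as root on the node holding the volume; the target has a storage with the same ID.
set -euo pipefail   # pipefail: the pipeline fails if either pvesm export or the remote import fails

copy_volume() {
    local volid="$1"    # e.g. local-lvm:vm-102-disk-0
    local target="$2"   # e.g. 192.108.1.3
    # raw+size is the same export format used in the original command.
    pvesm export "$volid" raw+size - | ssh "root@$target" pvesm import "$volid" raw+size -
}

# Usage: copy_volume local-lvm:vm-102-disk-0 192.108.1.3
```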

Then I tested whether the CT could start, and it worked! To migrate it back I used the Proxmox GUI, but I had a little surprise:

task started by HA resource agent
2025-03-28 12:57:39 starting migration of CT 102 to node 'pve22' (192.108.1.2)
2025-03-28 12:57:39 found local volume 'local-lvm:vm-102-disk-0' (in current VM config)
2025-03-28 12:57:40 volume pve/vm-102-disk-0 already exists - importing with a different name
2025-03-28 12:57:40   Logical volume "vm-102-disk-1" created.

It didn't reuse the old volume vm-102-disk-0 but created a new one, vm-102-disk-1. I left it as is and just deleted the old volume, which was still sitting on pve22: pvesm free local-lvm:vm-102-disk-0
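Freeing a volume cannot be undone, so before running `pvesm free` on a leftover volume, check that the container no longer references it. In this hedged sketch, `safe_free` is a hypothetical helper. It assumes you run it as root on the node that holds the stale volume (pve22 here) and that every volume the CT uses, including `unusedN` entries, shows up in `pct config` output.

```bash
#!/usr/bin/env bash
# Sketch: free a volume only if the container's config does not reference it.
# Assumptions: run as root on the node holding the volume; `safe_free` is not a Proxmox command.
set -eu

safe_free() {
    local vmid="$1" volid="$2" config
    # Abort if the config cannot be read, rather than treating it as "not referenced".
    config="$(pct config "$vmid")" || return 1
    # -F: match the volume ID literally; -w: do not let ...disk-0 match ...disk-01.
    if printf '%s\n' "$config" | grep -qFw -- "$volid"; then
        echo "refusing: $volid is still referenced by CT $vmid" >&2
        return 1
    fi
    pvesm free "$volid"
}

# Usage on pve22: safe_free 102 local-lvm:vm-102-disk-0
```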

That's it, I hope someone finds it useful! Don't put an HA container on local-lvm and try to migrate it like that; you only lose time for nothing!
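To avoid ending up here in the first place, check where a container's root disk lives before adding it to HA. This is a minimal sketch that assumes a Proxmox node with `pct` available. `rootfs_storage` and `check_ha_ready` are hypothetical helpers. You pass in the list of node-local storages yourself (it defaults to `local local-lvm`); nothing is detected automatically.

```bash
#!/usr/bin/env bash
# Sketch: flag containers whose root disk sits on node-local storage before adding them to HA.
# Assumption: the storage IDs treated as node-local are supplied by the caller; nothing is auto-detected.
set -eu

rootfs_storage() {
    # `pct config <vmid>` prints a line such as:
    #   rootfs: local-lvm:vm-102-disk-0,size=8G
    # Print only the storage ID (the text before the first ":" of the volume).
    pct config "$1" | sed -n 's/^rootfs: \([^:,]*\):.*/\1/p'
}

check_ha_ready() {
    local vmid="$1"
    local local_storages="${2:-local local-lvm}"   # hypothetical default list
    local storage s
    storage="$(rootfs_storage "$vmid")"
    if [ -z "$storage" ]; then
        echo "CT $vmid: no rootfs line found" >&2
        return 1
    fi
    for s in $local_storages; do   # unquoted on purpose: split the list on spaces
        if [ "$storage" = "$s" ]; then
            echo "CT $vmid: rootfs is on node-local storage '$storage'; HA cannot bring the disk along"
            return 1
        fi
    done
    echo "CT $vmid: rootfs is on '$storage'"
}

# Usage: check_ha_ready 102
```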

Cheers!