Bug #1255
VMs do not resume anymore after oned was stopped
| Status: | Closed | Start date: | 04/25/2012 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | Javi Fontan | % Done: | 0% |
| Category: | Drivers - Auth | | |
| Target version: | Release 3.8 | | |
| Resolution: | fixed | Pull request: | |
| Affected Versions: | OpenNebula 3.2 | | |
Description
Hi,
after shutting down and restarting oned (the complete frontend/host had to be rebooted), none of the suspended VMs could be resumed again; all ended up in the FAILED state.
Resubmitting (with an actual backup of the system disk) also fails, complaining that the domain already exists with a uuid.
...
Wed Apr 25 13:42:27 2012 [VMM][I]: Generating deployment file: /var/lib/one/1/deployment.34
Wed Apr 25 13:42:27 2012 [VMM][I]: ExitCode: 0
Wed Apr 25 13:42:27 2012 [VMM][I]: Successfully execute network driver operation: pre.
Wed Apr 25 13:42:27 2012 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/1/images/deployment.34 atlas1 1 atlas1
Wed Apr 25 13:42:27 2012 [VMM][I]: error: Failed to create domain from /var/lib/one/1/images/deployment.34
Wed Apr 25 13:42:27 2012 [VMM][I]: error: operation failed: domain 'one-1' already exists with uuid 2d10604f-b64b-e613-e708-af1725e80240
Wed Apr 25 13:42:27 2012 [VMM][E]: Could not create domain from /var/lib/one/1/images/deployment.34
Wed Apr 25 13:42:27 2012 [VMM][I]: ExitCode: 255
Wed Apr 25 13:42:27 2012 [VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Apr 25 13:42:27 2012 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/1/images/deployment.34
Wed Apr 25 13:42:27 2012 [DiM][I]: New VM state is FAILED
Associated revisions
bug #1255: make sure the VM is undefined before deploying it
Revert "bug #1255: make sure the VM is undefined before deploying it"
The command had the path of the deployment.0 as parameter,
that was totally wrong. Reverting as it fails and does not
make the command any better.
This reverts commit 1284b7aa18bd57a157f1794e779dfc83523517c0.
Feature #1255: Add a SecurityGroup VNM driver that can handle a pool of
Security Groups
(cherry picked from commit 8f877a54b418c4d032f987bee9885efaa0ac4440)
History
#1 Updated by Jochem Ippers about 9 years ago
P.S.:
I could resume suspended VMs on a different OpenNebula host that was also restarted.
#2 Updated by Ruben S. Montero about 9 years ago
- Target version changed from Release 3.8 to Release 3.6
#3 Updated by Ruben S. Montero about 9 years ago
- Assignee set to Javi Fontan
#4 Updated by Ruben S. Montero about 9 years ago
- Status changed from New to Assigned
#5 Updated by Javi Fontan about 9 years ago
- Target version changed from Release 3.6 to Release 3.8
Do you have logs from the resume failure? It seems that the resume action started the VM but then somehow failed.
A possible solution is to force the cancellation of a VM that failed to resume, so that you can resubmit it. But we want to find out what is causing the resume failure, as a oned/machine reboot should not cause this problem.
#6 Updated by Jochem Ippers about 9 years ago
Hi Javi,
as far as I remember, there were no other OpenNebula messages to be found at the time, and libvirt had not logged anything either.
But, my fault, I just found one line in the qemu logs (/var/log/libvirt/qemu/*) of the VMs that I overlooked when I filed this bug report.
After logging the long qemu start command line, the log directly says:
load of migration failed
And if you google for "qemu restore load of migration failed", you find a problem with migration, but also cases where you "can't restore a guest saved by an older qemu":
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/87570
And I think I had updated the virtualization packages on that machine beforehand.
So I guess this bug should just be closed (unless someone else has experienced this problem without updating kvm/qemu).
Kind regards
Jochem
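For anyone who wants to check their own hosts for the same symptom, here is a minimal Python sketch that scans the qemu logs for the message quoted above. The directory and the message come from this comment; the function name `find_failed_restores` and the assumption that the per-VM logs end in `.log` are illustrative only.

```python
from pathlib import Path

# Message quoted in the comment above.
MARKER = "load of migration failed"


def find_failed_restores(log_dir="/var/log/libvirt/qemu"):
    """Return {log name: [line numbers]} for logs that contain MARKER.

    Assumption: each VM has its own log file ending in ".log".
    """
    hits = {}
    for log in sorted(Path(log_dir).glob("*.log")):
        lines = log.read_text(errors="replace").splitlines()
        matches = [n for n, line in enumerate(lines, 1) if MARKER in line]
        if matches:
            hits[log.stem] = matches
    return hits


if __name__ == "__main__":
    for name, line_numbers in find_failed_restores().items():
        print(f"{name}: restore failed (log lines {line_numbers})")
```

A hit only says that qemu refused to load the saved state; whether a package update is the cause still has to be checked by hand.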
#7 Updated by Javi Fontan almost 9 years ago
- Status changed from Assigned to Closed
- Resolution set to fixed
Make sure the VM is undefined before creating it.
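The idea behind this fix can be sketched in Python as follows. This is not the actual driver change (the deploy driver seen in the log is a script under /var/tmp/one/vmm/kvm/); the helper names, the `qemu:///system` URI and the injectable `run` parameter are assumptions made for illustration. Only the standard `virsh` subcommands `dominfo`, `undefine` and `create` are used.

```python
import subprocess

# Assumption: the local system libvirt daemon is the target.
LIBVIRT_URI = "qemu:///system"


def virsh(*args):
    """Build a virsh command line for the assumed libvirt URI."""
    return ["virsh", "--connect", LIBVIRT_URI, *args]


def deploy(domain, deployment_file, run=subprocess.run):
    """Undefine a leftover domain definition, then create the domain.

    `run` is injectable so the logic can be exercised without libvirt.
    Returns the exit code of `virsh create`.
    """
    # dominfo exits non-zero when libvirt does not know the domain.
    known = run(virsh("dominfo", domain), capture_output=True).returncode == 0
    if known:
        # Drop the stale definition so `create` does not hit
        # "domain ... already exists with uuid ...".
        run(virsh("undefine", domain), capture_output=True)
    return run(virsh("create", deployment_file), capture_output=True).returncode


if __name__ == "__main__":
    # Values taken from the log in the description.
    print(deploy("one-1", "/var/lib/one/1/images/deployment.34"))
```

Note that `undefine` only removes a persistent definition. A domain that is still running would have to be stopped first (for example with `virsh destroy`), which this sketch does not attempt.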