Bug #1255: VMs do not resume anymore after oned was stopped - OpenNebula

Added by Jochem Ippers about 9 years ago. Updated almost 9 years ago.

Status: Closed
Start date: 04/25/2012
Priority: Normal
Due date:
Assignee: Javi Fontan
% Done:
Category: Drivers - Auth
Target version: Release 3.8
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 3.2

Description

Hi,
after shutting down and restarting oned (the complete frontend/host had to be rebooted), none of the suspended VMs could be resumed; all of them ended up in the failed state.
Resubmitting (with an actual backup of the system disk) also fails, complaining that the domain UUID already exists.

...
Wed Apr 25 13:42:27 2012 [VMM][I]: Generating deployment file: /var/lib/one/1/deployment.34
Wed Apr 25 13:42:27 2012 [VMM][I]: ExitCode: 0
Wed Apr 25 13:42:27 2012 [VMM][I]: Successfully execute network driver operation: pre.
Wed Apr 25 13:42:27 2012 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /var/lib/one/1/images/deployment.34 atlas1 1 atlas1
Wed Apr 25 13:42:27 2012 [VMM][I]: error: Failed to create domain from /var/lib/one/1/images/deployment.34
Wed Apr 25 13:42:27 2012 [VMM][I]: error: operation failed: domain 'one-1' already exists with uuid 2d10604f-b64b-e613-e708-af1725e80240
Wed Apr 25 13:42:27 2012 [VMM][E]: Could not create domain from /var/lib/one/1/images/deployment.34
Wed Apr 25 13:42:27 2012 [VMM][I]: ExitCode: 255
Wed Apr 25 13:42:27 2012 [VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Apr 25 13:42:27 2012 [VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one/1/images/deployment.34
Wed Apr 25 13:42:27 2012 [DiM][I]: New VM state is FAILED
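
The decisive line in the log above is the libvirt error saying that domain 'one-1' already exists. As a minimal sketch (not part of OpenNebula; the function name `find_existing_domain` is hypothetical), that error could be picked out of a VM log like this:

```python
import re

# Matches the libvirt error text seen in the log above.
_EXISTS_RE = re.compile(
    r"domain '(?P<name>[^']+)' already exists with uuid (?P<uuid>[0-9a-fA-F-]{36})"
)


def find_existing_domain(log_text):
    """Return (name, uuid) from the first 'already exists' error, or None."""
    match = _EXISTS_RE.search(log_text)
    if match is None:
        return None
    return match.group("name"), match.group("uuid")
```

Fed the log excerpt above, this should yield the domain name and UUID that libvirt still has registered, which is the leftover definition blocking the new deployment.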

Associated revisions

Revision 1284b7aa
Added by Javi Fontan almost 9 years ago

bug #1255: make sure the VM is undefined before deploying it

Revision 252a5932
Added by Javi Fontan over 8 years ago

Revert "bug #1255: make sure the VM is undefined before deploying it"

The command had the path of the deployment.0 as parameter,
that was totally wrong. Reverting as it fails and does not
make the command any better.

This reverts commit 1284b7aa18bd57a157f1794e779dfc83523517c0.

Revision ffd3f1e9
Added by Jaime Melis almost 7 years ago

Feature #1255: Add a SecurityGroup VNM driver that can handle a pool of
Security Groups

(cherry picked from commit 8f877a54b418c4d032f987bee9885efaa0ac4440)

History

#1 Updated by Jochem Ippers about 9 years ago

P.S.:
I could resume suspended VMs on a different OpenNebula host that was also restarted.

#2 Updated by Ruben S. Montero about 9 years ago

Target version changed from Release 3.8 to Release 3.6

#3 Updated by Ruben S. Montero about 9 years ago

Assignee set to Javi Fontan

#4 Updated by Ruben S. Montero about 9 years ago

Status changed from New to Assigned

#5 Updated by Javi Fontan about 9 years ago

Target version changed from Release 3.6 to Release 3.8

Do you have logs from the resume failure? It seems that the resume action started the VM but somehow failed.

A possible solution is to force a cancellation of a VM that failed to resume, so that you can resubmit it. But we want to check what is causing the resume failure, as an oned/machine reboot should not cause this problem.

#6 Updated by Jochem Ippers about 9 years ago

Hi Javi,
as far as I remember, there were no other OpenNebula messages to be found, and no libvirt messages were produced at that time.
But, my fault, I just found one line in the qemu logs (/var/log/libvirt/qemu/*) of the VMs that I overlooked when I filed this bug report.
After logging the long qemu start command line, it directly says:

load of migration failed

And if you google for "qemu restore load of migration failed", it finds a problem with migration but also cases where you "can't restore a guest saved by an older qemu":
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/87570
Well, and I think I had updated the virtualization packages on that machine beforehand.

So I guess this bug should just be closed (if no one else has experienced such a problem without updating kvm/qemu).

Kind regards
Jochem
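
For anyone who wants to check their own hosts for the same symptom, here is a rough sketch that looks for the "load of migration failed" line in the qemu logs mentioned above. The function name `logs_with_failed_restore` is hypothetical, and the default path is simply the one quoted in this comment; other distributions may log elsewhere.

```python
import glob

MARKER = "load of migration failed"


def logs_with_failed_restore(pattern="/var/log/libvirt/qemu/*"):
    """Return the sorted paths of log files that contain the marker line."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as handle:
                if any(MARKER in line for line in handle):
                    hits.append(path)
        except OSError:
            continue  # unreadable file or a directory; skip it
    return hits
```

A non-empty result would suggest that the saved state could not be loaded by the installed qemu, which fits the package update described above.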

#7 Updated by Javi Fontan almost 9 years ago

Status changed from Assigned to Closed
Resolution set to fixed

Make sure the VM is undefined before creating it.
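
As an illustration of that idea only (this is not the committed driver change, and `deploy_clean` and its `run` parameter are hypothetical names), the sequence could be sketched with the standard `virsh dominfo`, `virsh undefine` and `virsh create` commands. The sketch assumes the leftover domain is defined but not running; a domain that is still active would have to be stopped first.

```python
import subprocess


def deploy_clean(domain, deployment_file, run=subprocess.run):
    """Remove a stale definition of `domain`, then create it from the file."""
    # `virsh dominfo` exits non-zero when libvirt does not know the domain.
    known = run(["virsh", "dominfo", domain], capture_output=True).returncode == 0
    if known:
        # Drop the leftover definition so that `create` does not clash with it.
        run(["virsh", "undefine", domain], capture_output=True)
    result = run(["virsh", "create", deployment_file], capture_output=True)
    return result.returncode == 0
```

The `run` parameter is there only so that the command sequence can be exercised without a libvirt host; in real use the default `subprocess.run` would invoke `virsh` directly.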
