Bug #1165

Wrong monitoring of VM when in paused state always leading to failed state

Added by Stefan Catargiu over 9 years ago. Updated about 9 years ago.

Status: Closed
Start date: 03/14/2012
Priority: Normal
Due date:
Assignee: Javi Fontan
% Done: 0%
Category: Drivers - Auth
Target version: Release 3.6
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 3.4

Description

OpenNebula has a bug in the monitoring system.
If a VM goes into the "paused" state (libvirt suspend) from outside OpenNebula, ONE marks it as suspended.
When you then try to "resume" it from ONE, it always goes into the failed state, because ONE assumes that a checkpoint file exists.
Even if the machine is later resumed from outside ONE via libvirt, it is no longer monitored and remains in the failed state.
When the machine is paused, ONE should list it as being in an unknown state or, better yet, future versions of ONE should add support for a new state.

I have attached a file that shows exactly this problem.

one_bug.txt (1.33 KB) Stefan Catargiu, 03/14/2012 10:30 AM
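
The failure path described above can be sketched roughly as follows. This is only an illustration, not OpenNebula code: the `resume` function, the state strings, and the temporary directory are assumptions made for the example. It shows why a VM that was paused outside ONE (and therefore never had a checkpoint written) ends up failed on resume, while a VM reported as unknown would not trigger a restore at all.

```python
import os
import tempfile


def resume(state, checkpoint_path):
    """Return the state after a resume request (hypothetical helper)."""
    if state != "SUSPENDED":
        # In this sketch only suspended VMs are restored from a checkpoint.
        return state
    # A restore needs the checkpoint file written by a suspend that
    # was run through OpenNebula itself.
    if os.path.isfile(checkpoint_path):
        return "RUNNING"
    return "FAILED"


with tempfile.TemporaryDirectory() as vm_dir:
    checkpoint = os.path.join(vm_dir, "checkpoint")

    # Paused outside OpenNebula: no checkpoint was ever written.
    externally_paused = resume("SUSPENDED", checkpoint)

    # Suspended through OpenNebula: the checkpoint exists.
    open(checkpoint, "w").close()
    suspended_by_one = resume("SUSPENDED", checkpoint)

    # With the requested behaviour the VM is UNKNOWN, so no restore is tried.
    reported_unknown = resume("UNKNOWN", checkpoint)
```

Under these assumptions, only the externally paused VM takes the failing branch, which matches the behaviour reported in this ticket.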

Associated revisions

Revision ccff2483
Added by Javi Fontan over 9 years ago

bug #1165: when a VM is paused the new state is unknown (xen and kvm)

Revision 7c1401a8
Added by Javi Fontan about 9 years ago

bug #1165: when a VM is paused the new state is unknown (xen and kvm)
(cherry picked from commit ccff248315a9917e975b873283b70b535be0c3e3)

Revision a7ab6e4c
Added by Javi Fontan about 9 years ago

bug #1165: set correct state value for unknown state for vmm/poll

Revision 3d1713f1
Added by Javi Fontan about 9 years ago

bug #1165: set correct state value for unknown state for vmm/poll
(cherry picked from commit a7ab6e4ce8e3ee304de85fc1fb31246c3a998582)

History

#1 Updated by Stefan Catargiu over 9 years ago

Log file from the VM:

Wed Mar 14 11:10:16 2012 [VMM][I]: ExitCode: 0
Wed Mar 14 11:10:16 2012 [VMM][I]: VM running but new state from monitor is PAUSED.
Wed Mar 14 11:10:16 2012 [LCM][I]: VM is suspended.
Wed Mar 14 11:10:16 2012 [DiM][I]: New VM state is SUSPENDED
Wed Mar 14 11:10:57 2012 [DiM][I]: New VM state is ACTIVE.
Wed Mar 14 11:10:57 2012 [LCM][I]: Restoring VM
Wed Mar 14 11:10:57 2012 [LCM][I]: New state is BOOT
Wed Mar 14 11:10:57 2012 [VMM][I]: ExitCode: 0
Wed Mar 14 11:10:57 2012 [VMM][I]: Successfully execute network driver operation: pre.
Wed Mar 14 11:10:57 2012 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /var/lib/one//0/images/checkpoint host_1 0 host_1
Wed Mar 14 11:10:57 2012 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /var/lib/one//0/images/checkpoint" failed.
Wed Mar 14 11:10:57 2012 [VMM][E]: restore: error: Failed to restore domain from /var/lib/one//0/images/checkpoint
Wed Mar 14 11:10:57 2012 [VMM][I]: error: Failed to create file '/var/lib/one//0/images/checkpoint': No such file or directory
Wed Mar 14 11:10:57 2012 [VMM][E]: Could not restore from /var/lib/one//0/images/checkpoint
Wed Mar 14 11:10:57 2012 [VMM][I]: ExitCode: 1
Wed Mar 14 11:10:57 2012 [VMM][I]: Failed to execute virtualization driver operation: restore.
Wed Mar 14 11:10:57 2012 [VMM][E]: Error restoring VM: Could not restore from /var/lib/one//0/images/checkpoint
Wed Mar 14 11:10:57 2012 [DiM][I]: New VM state is FAILED
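
To make the sequence of state changes in a log like this easier to read, the state lines can be pulled out with a small script. The sketch below is an assumption-laden helper, not part of OpenNebula: the regular expression is written only against the line shapes visible in this log, and `transitions` is a hypothetical name.

```python
import re

# A few lines copied from the log above.
LOG = """\
Wed Mar 14 11:10:16 2012 [VMM][I]: VM running but new state from monitor is PAUSED.
Wed Mar 14 11:10:16 2012 [DiM][I]: New VM state is SUSPENDED
Wed Mar 14 11:10:57 2012 [DiM][I]: New VM state is ACTIVE.
Wed Mar 14 11:10:57 2012 [LCM][I]: New state is BOOT
Wed Mar 14 11:10:57 2012 [DiM][I]: New VM state is FAILED
"""

# Matches "New VM state is X" (DiM) and "New state is X" (LCM) lines.
STATE_RE = re.compile(r"\[(DiM|LCM)\]\[I\]: New (?:VM )?state is (\w+)")


def transitions(log_text):
    """Return (component, state) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in STATE_RE.finditer(log_text)]


states = transitions(LOG)
for component, state in states:
    print(component, state)
```

For this excerpt the extracted sequence is SUSPENDED, ACTIVE, BOOT, FAILED, which is the path from the external pause to the failed restore.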

#2 Updated by Jaime Melis over 9 years ago

This should be fixed here:
https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L174
https://github.com/OpenNebula/one/blob/master/src/vmm/VirtualMachineManagerDriver.cc#L583

When libvirt reports the VM as paused, the VM should be moved to the UNKNOWN state, not the SUSPENDED state.
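
As a rough sketch of that mapping (written in Python for illustration, not taken from poll_xen_kvm.rb), the monitoring side could translate the hypervisor's domain state as shown below. The function name `monitor_state` and the returned labels are assumptions; the actual state values used by the poll script are not shown in this ticket. Only the paused-to-unknown rule comes from the comment above.

```python
def monitor_state(libvirt_state):
    """Map a libvirt domain state string to an illustrative monitor state."""
    state = libvirt_state.strip().lower()
    if state == "running":
        return "ACTIVE"
    if state == "paused":
        # Proposed behaviour: a pause done outside ONE is reported as
        # UNKNOWN rather than SUSPENDED, so no checkpoint restore is tried.
        return "UNKNOWN"
    if state == "crashed":
        return "FAILED"
    # Anything unrecognised is also treated as UNKNOWN in this sketch.
    return "UNKNOWN"


paused_result = monitor_state("paused")
running_result = monitor_state("running")
```

Treating unrecognised states as unknown is a design choice of this sketch; it avoids guessing a lifecycle transition from a state the driver does not understand.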

#3 Updated by Javi Fontan over 9 years ago

  • Category set to Drivers - Auth
  • Status changed from New to Assigned
  • Assignee set to Javi Fontan
  • Target version set to Release 3.4

#4 Updated by Javi Fontan over 9 years ago

  • Status changed from Assigned to Closed
  • Resolution set to fixed

#5 Updated by Stefan Catargiu about 9 years ago

This still seems to be broken, and now it is a bit worse.
When I suspend the VM manually, OpenNebula never changes the state from running, neither to unknown nor, as it did before the "fix", to suspended.

#6 Updated by Ruben S. Montero about 9 years ago

  • Status changed from Closed to New
  • Target version changed from Release 3.4 to Release 3.6
  • Affected Versions OpenNebula 3.4 added

#7 Updated by Javi Fontan about 9 years ago

  • Status changed from New to Closed
