Bug #1165
Wrong monitoring of VM when in paused state always leading to failed state
Status: | Closed | Start date: | 03/14/2012
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | Javi Fontan | % Done: | 0%
Category: | Drivers - Auth | |
Target version: | Release 3.6 | |
Resolution: | fixed | Pull request: |
Affected Versions: | OpenNebula 3.4 | |
Description
OpenNebula has a bug in the monitoring system.
If a VM is put into the "paused" state (libvirt suspend) from outside OpenNebula, ONE marks it as suspended.
When you then try to "resume" it from ONE, it always goes into the failed state, because ONE assumes that a checkpoint file exists.
Even if the machine is later resumed from outside ONE via libvirt, it is no longer monitored and remains in the failed state.
When the machine is paused, ONE should list it as being in an unknown state or, better yet, future versions of ONE should add support for a new state.
I have attached a file that shows exactly this problem.
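The failure path described above can be modelled as a small decision function. This is only a sketch for illustration: `plan_resume`, the state strings, and the returned action labels are hypothetical and are not taken from the OpenNebula code base. It shows why a VM that was merely paused through libvirt cannot be resumed by restoring a checkpoint, since no checkpoint was ever written.

```python
import os
import tempfile


def plan_resume(one_state: str, checkpoint_path: str) -> str:
    """Model what a resume request leads to (hypothetical, not ONE's real logic)."""
    if one_state != "SUSPENDED":
        return "nothing to resume"
    if os.path.isfile(checkpoint_path):
        # A real suspend wrote a checkpoint, so a restore can succeed.
        return "restore from checkpoint"
    # The VM was only paused outside ONE: no checkpoint file exists,
    # so the restore command fails and the VM ends up FAILED.
    return "restore fails -> FAILED"


if __name__ == "__main__":
    # Use a path inside a fresh temporary directory so it is guaranteed to be missing.
    missing = os.path.join(tempfile.mkdtemp(), "checkpoint")
    print(plan_resume("SUSPENDED", missing))
```

Under this model, the only way to avoid the failed restore is to not report the externally paused VM as suspended in the first place.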
Associated revisions
bug #1165: when a VM is paused the new state is unknown (xen and kvm)
bug #1165: when a VM is paused the new state is unknown (xen and kvm)
(cherry picked from commit ccff248315a9917e975b873283b70b535be0c3e3)
bug #1165: set correct state value for unknown state for vmm/poll
bug #1165: set correct state value for unknown state for vmm/poll
(cherry picked from commit a7ab6e4ce8e3ee304de85fc1fb31246c3a998582)
History
#1 Updated by Stefan Catargiu over 9 years ago
Log file from the VM:
Wed Mar 14 11:10:16 2012 [VMM][I]: ExitCode: 0
Wed Mar 14 11:10:16 2012 [VMM][I]: VM running but new state from monitor is PAUSED.
Wed Mar 14 11:10:16 2012 [LCM][I]: VM is suspended.
Wed Mar 14 11:10:16 2012 [DiM][I]: New VM state is SUSPENDED
Wed Mar 14 11:10:57 2012 [DiM][I]: New VM state is ACTIVE.
Wed Mar 14 11:10:57 2012 [LCM][I]: Restoring VM
Wed Mar 14 11:10:57 2012 [LCM][I]: New state is BOOT
Wed Mar 14 11:10:57 2012 [VMM][I]: ExitCode: 0
Wed Mar 14 11:10:57 2012 [VMM][I]: Successfully execute network driver operation: pre.
Wed Mar 14 11:10:57 2012 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /var/lib/one//0/images/checkpoint host_1 0 host_1
Wed Mar 14 11:10:57 2012 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /var/lib/one//0/images/checkpoint" failed.
Wed Mar 14 11:10:57 2012 [VMM][E]: restore: error: Failed to restore domain from /var/lib/one//0/images/checkpoint
Wed Mar 14 11:10:57 2012 [VMM][I]: error: Failed to create file '/var/lib/one//0/images/checkpoint': No such file or directory
Wed Mar 14 11:10:57 2012 [VMM][E]: Could not restore from /var/lib/one//0/images/checkpoint
Wed Mar 14 11:10:57 2012 [VMM][I]: ExitCode: 1
Wed Mar 14 11:10:57 2012 [VMM][I]: Failed to execute virtualization driver operation: restore.
Wed Mar 14 11:10:57 2012 [VMM][E]: Error restoring VM: Could not restore from /var/lib/one//0/images/checkpoint
Wed Mar 14 11:10:57 2012 [DiM][I]: New VM state is FAILED
#2 Updated by Jaime Melis over 9 years ago
This should be fixed here:
https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L174
https://github.com/OpenNebula/one/blob/master/src/vmm/VirtualMachineManagerDriver.cc#L583
When libvirt reports the VM as paused, the VM should be moved to the UNKNOWN state, not the SUSPENDED state.
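A minimal sketch of the proposed mapping, written in Python for illustration only (the linked poll script is Ruby, and this is not its actual code). The function name `monitor_state` and the constants `ACTIVE`, `FAILED`, and `UNKNOWN` are hypothetical; the input strings are assumed to be the domain states printed by `virsh domstate`.

```python
# Hypothetical monitor states; the names are illustrative and do not
# reproduce the encoding used by the real OpenNebula poll script.
ACTIVE = "ACTIVE"
FAILED = "FAILED"
UNKNOWN = "UNKNOWN"

# Domain states treated as "the VM is alive" (assumed from `virsh domstate`).
_ACTIVE_STATES = {"running", "idle", "in shutdown"}


def monitor_state(libvirt_state: str) -> str:
    """Translate a libvirt domain state string into a monitor state."""
    state = libvirt_state.strip().lower()
    if state in _ACTIVE_STATES:
        return ACTIVE
    if state == "crashed":
        return FAILED
    # "paused" deliberately falls through to UNKNOWN instead of being
    # reported as suspended, so no checkpoint restore is attempted later.
    return UNKNOWN


if __name__ == "__main__":
    for reported in ("running", "paused", "crashed"):
        print(reported, "->", monitor_state(reported))
```

Mapping every unrecognised state to `UNKNOWN` is a design choice of this sketch: it keeps the VM eligible for monitoring, so the state can return to active once the VM is resumed outside ONE.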
#3 Updated by Javi Fontan over 9 years ago
- Category set to Drivers - Auth
- Status changed from New to Assigned
- Assignee set to Javi Fontan
- Target version set to Release 3.4
#4 Updated by Javi Fontan over 9 years ago
- Status changed from Assigned to Closed
- Resolution set to fixed
#5 Updated by Stefan Catargiu about 9 years ago
This still seems to be broken, and it is now a bit worse.
When I suspend the VM manually, OpenNebula never changes the state from running to unknown or, as it did before the "fix", to suspended.
#6 Updated by Ruben S. Montero about 9 years ago
- Status changed from Closed to New
- Target version changed from Release 3.4 to Release 3.6
- Affected Versions OpenNebula 3.4 added
#7 Updated by Javi Fontan about 9 years ago
- Status changed from New to Closed