Bug #3741

Snapshot revert stops working after some iterations

Added by Joaquin Rinaudo over 6 years ago. Updated about 6 years ago.

Status: Closed
Start date: 04/08/2015
Priority: None
Due date:
Assignee: -
% Done: 0%
Category: Drivers - VM
Target version: Release 4.14
Resolution: worksforme
Pull request:
Affected Versions: OpenNebula 4.10

Description

Hi, I'm running ONE 4.10.2. I have a VM with a qcow2 image that runs a job every 5 minutes. After each job finishes, the VM is reverted to a saved snapshot. After several successful reverts, the VM log shows the following:

Wed Apr 8 11:27:24 2015 [Z0][VMM][I]: VM Snapshot successfully created.
Wed Apr 8 11:28:19 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:29:15 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:30:08 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:31:04 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:32:00 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:32:56 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:33:49 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:34:45 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:34:45 2015 [Z0][VMM][I]: VM running but monitor state is POWEROFF
Wed Apr 8 11:34:45 2015 [Z0][DiM][I]: New VM state is POWEROFF
Wed Apr 8 11:35:35 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Wed Apr 8 11:35:35 2015 [Z0][LCM][I]: New VM state is RUNNING

When this happens, the snapshot is lost because the VM state was changed to POWEROFF.
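
To spot this pattern in a VM log, one option is to look for a POWEROFF monitor report that carries the same timestamp as a snapshot revert. A minimal Python sketch, assuming the log line layout shown above; the function name `spurious_poweroffs` and the same-timestamp heuristic are illustrative assumptions, not part of OpenNebula:

```python
import re

# Matches lines such as:
# Wed Apr 8 11:34:45 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ [\d:]+ \d{4}) "
    r"\[\w+\]\[\w+\]\[\w\]: (?P<msg>.*)$"
)

REVERTED = "VM Snapshot successfully reverted."
POWEROFF = "VM running but monitor state is POWEROFF"


def spurious_poweroffs(log_text):
    """Return timestamps where a POWEROFF report coincides with a revert.

    Heuristic (an assumption): both messages share the same timestamp.
    """
    reverted_at = set()
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like VM log entries
        if match["msg"] == REVERTED:
            reverted_at.add(match["ts"])
        elif match["msg"] == POWEROFF and match["ts"] in reverted_at:
            hits.append(match["ts"])
    return hits
```

Calling `spurious_poweroffs(open(path).read())` on a VM log file returns a list of timestamps; an empty list means the pattern was not found.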

Regards,
Joaquín


Related issues

Related to Bug #3740: VM snapshots not visible after powercycle (poweroff / res... Closed 04/08/2015

History

#1 Updated by Ruben S. Montero over 6 years ago

  • Related to Bug #3740: VM snapshots not visible after powercycle (poweroff / resume) added

#2 Updated by Ruben S. Montero over 6 years ago

Hi Joaquin

It seems that the monitor probe sends information while the VM is snapshotting. oned then assumes the VM has been powered off and clears the snapshots in its internal data. When the VM is found again it is moved back to RUNNING, but the snapshots are gone.

This will be solved by preserving snapshots during a powercycle. Until then, try to tune the timing of the action and the monitoring cycle (although playing with timing in a distributed system is not a good idea...).

Alternatively, we can look at changing the monitor probe to report the VM as active while snapshotting.

Thanks

#3 Updated by Ruben S. Montero over 6 years ago

  • Category set to Drivers - VM
  • Status changed from Pending to New
  • Target version set to Release 4.14

#4 Updated by Joaquin Rinaudo over 6 years ago

Hi, can the alternative solution of reporting the VM as active while snapshotting be implemented before the 4.14 release?
Unfortunately, tuning the timing of the action isn't possible, since the idea is to revert as soon as the job has finished (and the job duration differs from run to run). I could pad the jobs with idle waits so that they line up with the monitoring cycle, but that's not a great solution performance-wise.

Thanks

#5 Updated by Ruben S. Montero about 6 years ago

Joaquin Rinaudo wrote:

Hi, can the alternative solution of reporting the VM as active while snapshotting be implemented before the 4.14 release?
Unfortunately, tuning the timing of the action isn't possible, since the idea is to revert as soon as the job has finished (and the job duration differs from run to run). I could pad the jobs with idle waits so that they line up with the monitoring cycle, but that's not a great solution performance-wise.

Thanks

We need to find out what state a VM is in while snapshotting. Basically, the probe executes a virsh list and a dominfo. You may try to hack the state here:

https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L68

and here:

https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L271
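
For experimenting outside the probe, the dominfo call can be reproduced in a few lines. This is a Python sketch rather than the probe's Ruby code; `parse_dominfo`, `domain_state` and the `runner` parameter are hypothetical names, and the virsh arguments assume a local qemu:///system connection:

```python
import subprocess


def parse_dominfo(text):
    """Parse `virsh dominfo` output into a dict keyed by field name."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


def _run_virsh(cmd):
    """Run a command and return its stdout; raises if virsh fails."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def domain_state(name, runner=_run_virsh):
    """Return the State field reported by `virsh dominfo` for a domain.

    `runner` is injectable so the parsing can be tried without libvirt.
    """
    cmd = ["virsh", "--connect", "qemu:///system", "--readonly",
           "dominfo", name]
    return parse_dominfo(runner(cmd)).get("State")
```

Polling `domain_state("one-220")` in a loop while a snapshot is created or reverted should show which state the domain reports during the operation.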

#6 Updated by Joaquin Rinaudo about 6 years ago

When snapshotting (both taking and reverting a snapshot) the state is set to paused.

virsh --connect qemu:///system --readonly dominfo one-220

Id: 2
Name: one-220
UUID: 87892a7b-f829-bb85-6d99-60b1a2d40547
OS Type: hvm
State: paused
CPU(s): 1
CPU time: 622218.4s
Max memory: 2097152 KiB
Used memory: 2097152 KiB
Persistent: no
Autostart: disable
Managed save: no
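
Since the domain reports paused for the duration of a snapshot operation, one possible approach is for the monitoring side to count paused as active. A minimal Python sketch of such a mapping; the labels and the `monitor_state` helper are hypothetical, and this is not the code from the OpenNebula probe:

```python
# Hypothetical labels for what the monitor would report to the core.
ACTIVE = "active"
POWEROFF = "poweroff"
ERROR = "error"
UNKNOWN = "unknown"

# Keys are domain state strings as printed by virsh.
STATE_MAP = {
    "running": ACTIVE,
    "idle": ACTIVE,
    "paused": ACTIVE,       # reported while a snapshot is taken or reverted
    "in shutdown": ACTIVE,  # still present on the host
    "shut off": POWEROFF,
    "crashed": ERROR,
}


def monitor_state(virsh_state):
    """Translate a virsh state string into a monitor label."""
    return STATE_MAP.get(virsh_state.strip().lower(), UNKNOWN)
```

With this mapping, `monitor_state("paused")` yields the active label, so a monitoring pass that lands in the middle of a revert would not report the VM as powered off.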

The commit https://github.com/OpenNebula/one/commit/87cef75a8e42cf4f74a05160255dfb7796690bf7#diff-650e58e49530115caad6187b7283aa25L266 already fixed the issue.

Joaquín

#7 Updated by Ruben S. Montero about 6 years ago

  • Status changed from New to Closed
  • Resolution set to worksforme

Great! Closing this then...
