Bug #4048

Disk snapshots that fail delete the existing snapshot instead of the new (failed) snapshot

Added by Roy Keene over 5 years ago. Updated over 5 years ago.

Status: Closed
Start date: 10/10/2015
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Drivers - Storage
Target version: Release 4.14.2
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 4.14

Description

The rollback action for a failed disk snapshot creation deletes the existing (base) snapshot rather than the snapshot that failed to be created.

src/vmm/VirtualMachineManager.cc (lines 2156-2164):

    rc = tm->snapshot_transfer_command( vm, "SNAP_CREATE", os);

    snap_cmd = os.str();

    os.str("");

    rc += tm->snapshot_transfer_command( vm, "SNAP_DELETE", os);

    snap_cmd_rollback = os.str();

The snapshot_transfer_command() function (in src/tm/TransferManager.cc) inserts the snap_id value and eventually passes it on to the transfer manager.

However, since the snap_id value is computed before any action is taken, the snap_id passed to "SNAP_DELETE" is the same one passed to "SNAP_CREATE".

The result is that if you try to create a snapshot of disk 0's snapshot 0 and it fails, disk 0 snapshot 0 is deleted, while the new snapshot (disk 0 snapshot 1, which refers to disk 0 snapshot 0) is left behind, broken and on its own.

I'm really not sure how this was ever expected to work.
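
The failure mode described above can be reproduced in a small toy model. This is only a sketch under stated assumptions: ToyDisk, Command, run() and create_with_prebuilt_rollback() are hypothetical names, not OpenNebula code, and the driver behaviour (creating on top of snap_id writes a new file snap_id + 1) is assumed from the qcow2 behaviour reported here.

```cpp
#include <set>
#include <string>

// Toy model of one disk's snapshot files. Hypothetical, not OpenNebula code.
struct ToyDisk {
    std::set<int> files;       // snapshot ids that exist on storage
    bool fail_create = false;  // force SNAP_CREATE to report failure
};

// A pre-built command: the snap_id is fixed when the command is constructed.
struct Command {
    std::string action;
    int snap_id;
};

// Assumed driver behaviour: SNAP_CREATE on snap_id writes a new file
// snap_id + 1 that refers to snap_id; SNAP_DELETE removes exactly snap_id.
bool run(ToyDisk& d, const Command& c) {
    if (c.action == "SNAP_CREATE") {
        d.files.insert(c.snap_id + 1);
        return !d.fail_create;  // failure is reported after the file exists
    }
    if (c.action == "SNAP_DELETE") {
        d.files.erase(c.snap_id);
        return true;
    }
    return false;
}

// Mirrors the reported flow: both commands are built from the same snap_id
// before anything runs, and the rollback is executed if the create fails.
void create_with_prebuilt_rollback(ToyDisk& d, int snap_id) {
    const Command create{"SNAP_CREATE", snap_id};
    const Command rollback{"SNAP_DELETE", snap_id};
    if (!run(d, create)) {
        run(d, rollback);  // removes the base snapshot, not the new file
    }
}
```

Starting from a disk that holds only snapshot 0 with fail_create set, calling create_with_prebuilt_rollback(d, 0) leaves the model with snapshot 1 present and snapshot 0 gone, which matches the outcome reported in this issue.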

Associated revisions

Revision bcf061f1
Added by Ruben S. Montero over 5 years ago

bug #4048: Removed snap_cmd_rollback to prevent VM disk corruption

Revision cc863873
Added by Ruben S. Montero over 5 years ago

bug #4048: Removed snap_cmd_rollback to prevent VM disk corruption

(cherry picked from commit bcf061f1ec4a068fd1e42d49882dfe4ce017ef5e)

Revision a35b4d48
Added by Javi Fontan over 5 years ago

bug #4048: take out TM ROLLBACK from snap create

Revision 5f63b5dd
Added by Javi Fontan over 5 years ago

bug #4048: take out TM ROLLBACK from snap create

(cherry picked from commit a35b4d483c1abb6e1c40aa8f1f72d407152be16b)

History

#1 Updated by Ruben S. Montero over 5 years ago

  • Category set to Drivers - Storage
  • Status changed from Pending to New
  • Assignee set to Javi Fontan
  • Target version set to Release 4.14.2

Hi Roy,

Thanks for the feedback. This may have escaped the integration tests because each driver handles the snap_id logic in a different way. Just to confirm, are you using the qcow2 drivers?

Cheers

#2 Updated by Roy Keene over 5 years ago

Yes, this was observed specifically in the qcow2 driver -- however the Transfer Manager always sends the SNAP_DELETE for the same snap_id as was sent for the SNAP_CREATE, without any way for SNAP_DELETE to know what SNAP_CREATE did (since at the time SNAP_DELETE is constructed, SNAP_CREATE has not been run yet).

#3 Updated by Ruben S. Montero over 5 years ago

  • Assignee changed from Javi Fontan to Ruben S. Montero

#4 Updated by Ruben S. Montero over 5 years ago

  • Status changed from New to Closed
  • Resolution set to fixed

Remove the rollback operation from core. The rollback operation must be addressed in the snap_create operation as there are multiple possible failure points that may need different rollback operations.
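
As a rough illustration of handling rollback inside the create operation itself, the sketch below records an undo action after each step that succeeds and unwinds only those on failure. It is a toy model, not the actual driver code: ToyStorage, unwind() and snap_create_with_local_rollback() are hypothetical names, the three steps are invented for the example, and fail_at exists only to simulate the different failure points.

```cpp
#include <functional>
#include <set>
#include <vector>

// Toy storage state. Hypothetical, not the OpenNebula driver interface.
struct ToyStorage {
    std::set<int> files;  // snapshot files that exist
    int active = 0;       // snapshot id the disk currently points at
};

// Run the recorded undo actions in reverse order of registration.
static void unwind(std::vector<std::function<void()>>& undo) {
    for (auto it = undo.rbegin(); it != undo.rend(); ++it) {
        (*it)();
    }
    undo.clear();
}

// fail_at selects which step fails (0 = none); each failure point rolls
// back only the steps that actually completed before it.
bool snap_create_with_local_rollback(ToyStorage& s, int base_id, int fail_at) {
    std::vector<std::function<void()>> undo;
    const int new_id = base_id + 1;
    const int old_active = s.active;

    // Step 1: create the new snapshot file on top of the base.
    if (fail_at == 1) { unwind(undo); return false; }  // nothing to undo yet
    s.files.insert(new_id);
    undo.push_back([&s, new_id] { s.files.erase(new_id); });

    // Step 2: point the disk at the new snapshot.
    if (fail_at == 2) { unwind(undo); return false; }
    s.active = new_id;
    undo.push_back([&s, old_active] { s.active = old_active; });

    // Step 3: any final bookkeeping; a failure here undoes steps 2 and 1.
    if (fail_at == 3) { unwind(undo); return false; }

    return true;
}
```

In this model a failure at any step leaves the base snapshot in place, because the operation only undoes what it knows it created.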
