Backlog #3989

persistent image in ERROR status after VM delete

Added by Anton Todorov almost 6 years ago. Updated almost 4 years ago.

Status: Closed
Start date: 09/18/2015
Priority: Normal
Due date:
Assignee: Jaime Melis
% Done: 0%
Category: Drivers - VM
Target version: Release 5.4

Description

How to reproduce:
1. import image from marketplace
2. mark imported image as persistent
3. select the persistent image to be used by the template
4. deploy VM with the above template
5. wait for running state (optional)
6. delete the VM

Result:
In the Images tab, the image is in ERROR status, but it should be READY.

If you select the image and choose "Enable", it goes back to READY status.
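For reference, the reproduction steps and the workaround map roughly to the following CLI calls (a minimal sketch assuming a 5.x-style CLI; the app, image, datastore and VM IDs and the template name are placeholders):

  # 1-2. export an image from the marketplace and mark it persistent
  onemarketapp export <app_id> myimage --datastore <image_ds_id>
  oneimage persistent <image_id>

  # 3. create a template that uses the persistent image
  onetemplate create --name mytemplate --cpu 1 --memory 512 --disk <image_id>

  # 4-6. deploy the VM, optionally wait for RUNNING, then delete it
  onetemplate instantiate mytemplate
  onevm delete <vm_id>          # in 5.x: onevm recover --delete <vm_id>

  # the image is now in ERROR; it has to be re-enabled by hand
  oneimage show <image_id>      # STATE shows ERROR
  oneimage enable <image_id>    # back to READY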

Attached:
<VM_id>.log
part of oned.log for the relevant time period

The logs indicate a problem in the VMM, so I am categorizing the issue as VM-related.
The target version should be the second beta (4.13.85), but it is not in the target version list, and I think the issue is present in the master branch too.

16.log (1.92 KB) Anton Todorov, 09/18/2015 03:46 PM

oned.log (15.7 KB) Anton Todorov, 09/18/2015 03:46 PM

History

#1 Updated by Ruben S. Montero almost 6 years ago

  • Target version changed from 79 to Release 4.14

#2 Updated by Carlos Martín almost 6 years ago

  • Priority changed from High to Normal
  • Target version deleted (Release 4.14)

Hi,

The image is set to error on purpose. After a delete operation there isn't a proper epilog phase, so the image may be left in an inconsistent state. If the file was directly linked, the file system may be corrupted; if the image was not linked, the changes made during the VM lifecycle are not saved back to the datastore.

So in any case the image is not released cleanly, as it would be by a VM that is shut down. The ERROR state plus the enable action gives us a way to notify the image's owner that it may need to be checked for errors.

We'll leave the ticket open to discuss if this mechanism needs to change, but if anything needs to be done it will be after 4.14.

Please let us know your thoughts.
Cheers.

#3 Updated by Carlos Martín almost 6 years ago

  • Tracker changed from Bug to Backlog
  • Target version set to Release 5.0

Moving to the backlog because nothing is actually broken.

#4 Updated by Sachin Agarwal over 5 years ago

Hi,

I have the same issue. I thought it was a bug until I came across this ticket. In my case I am using the saturniscsi driver for persistent images, and when I delete the VM the image goes into the "err" state until I run oneimage enable.

The reasoning behind the decision does hold weight, although end-users will have an additional step to perform before re-using the persistent images.

Best,
Sachin

#5 Updated by Anton Todorov over 5 years ago

IMHO there is already an inconsistency in the image statuses: almost all current image operations on a running VM produce disk images in no better state, yet those are not marked as 'error'. The most noticeable examples are disk "Save-as" and disk "Snapshot".

Kind Regards,
Anton Todorov

#6 Updated by Ruben S. Montero over 5 years ago

  • Tracker changed from Backlog to Feature

#7 Updated by Ruben S. Montero over 5 years ago

  • Status changed from Pending to New

#8 Updated by Jaime Melis over 5 years ago

Hi Anton, can you elaborate a bit more on your last message? What do you mean by:

almost all current image operations on a running VM produce disk images in no better state, yet those are not marked as 'error'. The most noticeable examples are disk "Save-as" and disk "Snapshot".

#9 Updated by Anton Todorov over 5 years ago

Hi Jaime,
If I recall correctly, here are the cases I mean:

1. When you do a disk save-as on a running VM, the consistency of the cloned image is the same as if the VM process had been killed during the Delete operation.

2. When I was writing this, live snapshots were not so well developed, but again: without live snapshots, when you snapshot a VM disk while the VM is running, you get inconsistent data in the snapshot.

In both cases the data cached in the VM kernel is missing (furthermore, MariaDB caches its DB files, which should additionally be flushed via the qemu-ga, but that is another topic ;) ).

In both examples the resulting disks are in READY status, although their data consistency is no better. IMO, if the disk of a deleted VM is put in ERROR status because it has unflushed data, then the above cases should end up in ERROR status too.

You can test the consistency with fsck on the cloned/snapshotted disk. If there was active I/O during image creation you will see errors; thankfully, in most cases fsck can fix them.
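As an illustration, such a check could look like this (a minimal sketch assuming a raw image that contains a single ext4 filesystem without a partition table; the datastore path and image file name are placeholders):

  # attach the cloned/snapshotted image to a loop device
  losetup --find --show /var/lib/one/datastores/1/<image_file>   # prints e.g. /dev/loop0

  # read-only check first (-n: report problems, change nothing)
  fsck.ext4 -n /dev/loop0

  # if errors are reported, repair and detach
  fsck.ext4 -y /dev/loop0
  losetup -d /dev/loop0

  # to avoid the inconsistency in the first place, the guest filesystems could be
  # frozen via the qemu guest agent before taking the snapshot, e.g.
  # virsh domfsfreeze <domain> ... virsh domfsthaw <domain>

(For a partitioned disk image you would need losetup -P or kpartx to expose the partitions first.)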

Cheers,
Anton

#10 Updated by Ruben S. Montero about 5 years ago

  • Assignee set to Jaime Melis

#11 Updated by Ruben S. Montero about 5 years ago

  • Tracker changed from Feature to Backlog

#12 Updated by Ruben S. Montero almost 5 years ago

  • Target version changed from Release 5.0 to Release 5.4

#13 Updated by Javi Fontan almost 4 years ago

  • Status changed from New to Closed

The delete operation should only be used when there is a problem with the machine, and in 5.x delete is part of recover, which only the admin can use.
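In 5.x terms that would roughly be (a sketch; the VM ID is a placeholder):

  onevm recover --delete <vm_id>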
