Bug #4878

Ceph persistent image "loses" changes

Added by Taavi K over 4 years ago. Updated over 4 years ago.

Status: Closed
Start date: 10/23/2016
Priority: Normal
Due date:
Assignee: Jaime Melis
% Done: 0%
Category: Drivers - Storage
Target version: Release 5.2.1
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 5.2

Description

A persistent Ceph image loses recent changes if it has previously been used as non-persistent.

Steps to reproduce:

1) Create a persistent empty datablock (id 42), mount it in a VM, and add some files (see the creation sketch after the output below):

mkfs.ext4 /dev/vdb
mkdir /mnt/d1
mount /dev/vdb /mnt/d1/; cd /mnt/d1/; touch test1; touch test2; touch test3

Verify in the Ceph cluster: rbd -p one --id libvirt du

NAME         PROVISIONED    USED
one-42             1024M  73728k
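
For reference, the persistent datablock in step 1 can be created from the OpenNebula CLI with something along these lines (the datastore and image names here are only illustrative, not necessarily the ones used in this setup):

oneimage create --datastore ceph_ds --name test-datablock --type DATABLOCK --size 1024
oneimage persistent 42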

2) Terminate the VM

3) Make the image non-persistent and boot a VM with it (see the CLI sketch after the output below)

Verify in the Ceph cluster: rbd -p one --id libvirt du

NAME         PROVISIONED    USED
one-42@snap        1024M 73728k <-- snapshot of original
one-42             1024M      0 <-- original image
one-42-623-1       1024M      0 <-- new VM non-persistent image, child of the snapshot
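
For reference, the persistence toggles in steps 3, 5 and 6 are plain OpenNebula CLI calls; a minimal sketch for image 42:

oneimage nonpersistent 42    # steps 3 and 6: deployments then clone from one-42@snap
oneimage persistent 42       # step 5: the VM then writes to one-42 directly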

4) So far the created files are all intact; terminate the VM.

Verify in the Ceph cluster: rbd -p one --id libvirt du

NAME         PROVISIONED    USED
one-42@snap        1024M 73728k <-- snapshot of original still present 
one-42             1024M      0

5) Make the image persistent again and boot a VM with it. Remove some files:

rm test2 test3

Verify in the Ceph cluster: rbd -p one --id libvirt du

NAME         PROVISIONED    USED
one-42@snap        1024M 73728k <-- old snapshot of original (with 3 files)
one-42             1024M 12288k <-- running persistent image (now with 1 file)

6) Terminate the VM and make the image non-persistent

7) Boot a new VM with the image (which should now have 1 file) and list the directory: ls -al /mnt/d1

drwxr-xr-x  3 root root  4096 Oct 23 16:10 .
drwxr-xr-x 22 root root  4096 Oct  5 21:50 ..
drwx------  2 root root 16384 Oct 23 16:08 lost+found
-rw-r--r--  1 root root     0 Oct 23 16:10 test1
-rw-r--r--  1 root root     0 Oct 23 16:10 *test2* <-- file deleted in step 5
-rw-r--r--  1 root root     0 Oct 23 16:10 *test3* <-- file deleted in step 5

Verify in the Ceph cluster: rbd -p one --id libvirt du

NAME         PROVISIONED    USED
one-42@snap        1024M 73728k <-- old snapshot with 3 files
one-42             1024M 12288k <-- original image with 1 file
one-42-625-1       1024M  4096k <-- latest image, clone of the snapshot on the first line (with 3 files)

Verify that the new clone is in fact based on the snapshot:
rbd --id libvirt children one/one-42@snap

one/one-42-625-1
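
The same relationship can also be checked from the clone side; rbd info on the clone prints a parent field pointing back at the snapshot:

rbd --id libvirt -p one info one-42-625-1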

So now, whenever I boot the image as non-persistent, the old snapshot (3 files) is used for cloning. When I boot it as persistent, the original/latest image (1 file) is used.

I think the rbd snapshot needs to be removed after we terminate the last running VM that uses the non-persistent image.
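
A manual cleanup along those lines would be the standard rbd sequence (pool and names as in this report; the snapshot is protected as a clone parent, so it has to be unprotected first, and that is only possible once no clone depends on it any more):

rbd --id libvirt children one/one-42@snap        # must return nothing
rbd --id libvirt snap unprotect one/one-42@snap
rbd --id libvirt snap rm one/one-42@snap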

Associated revisions

Revision 5764fe97
Added by Jaime Melis over 4 years ago

B #4878: Ceph persistent image "loses" changes

Revision 831cc1f8
Added by Jaime Melis over 4 years ago

B #4878: Ceph persistent image "loses" changes

(cherry picked from commit 5764fe974a19a56c39810a8a4e4c6fca018255d7)

Revision 2a9a6ac2
Added by Jaime Melis over 4 years ago

B #4878: Ceph persistent image "loses" changes

History

#1 Updated by Ruben S. Montero over 4 years ago

Are you sure you are using 5.2? The @snap is removed for persistent images, take a look here:

https://github.com/OpenNebula/one/blob/master/src/tm_mad/ceph/ln#L73

Do you have any errors when instantiating the VM after making the image persistent?
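
For context, a simplified sketch of the behaviour that code path is expected to implement (this is not the actual driver code; the PERSISTENT variable and the hard-coded names are only illustrative):

# if the image is persistent, drop the clone base left over from non-persistent use
if [ "$PERSISTENT" = "YES" ]; then
    rbd --id libvirt snap unprotect one/one-42@snap
    rbd --id libvirt snap rm one/one-42@snap
fi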

#2 Updated by Taavi K over 4 years ago

Ruben S. Montero wrote:

Are you sure you are using 5.2? The @snap is removed for persistent images, take a look here:

https://github.com/OpenNebula/one/blob/master/src/tm_mad/ceph/ln#L73

Do you have any errors when instantiating the VM after making the image persistent?

Yes, it's 5.2.
Can't find any errors either.

#3 Updated by Taavi K over 4 years ago

oned.log reports success converting the image to persistent, but the snapshot is still there:

[Z0][ReM][D]: Req:2032 UID:0 ImageInfo invoked , 42
[Z0][ReM][D]: Req:2032 UID:0 ImageInfo result SUCCESS, "<IMAGE><ID>42</ID><U..." 
[Z0][ReM][D]: Req:1680 UID:0 ImagePersistent invoked , 42, true
[Z0][ReM][D]: Req:1680 UID:0 ImagePersistent result SUCCESS, 42
[Z0][ReM][D]: Req:5056 UID:0 ImageInfo invoked , 42
[Z0][ReM][D]: Req:5056 UID:0 ImageInfo result SUCCESS, "<IMAGE><ID>42</ID><U..." 

The Ceph cluster still has the snapshot:

rbd --id libvirt -p one info one-42

rbd image 'one-42':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.217ed81238e1f29
        format: 2
        features: layering
        flags:

rbd --id libvirt -p one info one-42@snap

rbd image 'one-42':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.217ed81238e1f29
        format: 2
        features: layering
        flags:
        protected: True

oned -v

Copyright 2002-2016, OpenNebula Project, OpenNebula Systems

OpenNebula 5.2.0 is distributed and licensed for use under the terms of the
Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).

Ceph cluster version 10.2.3 and relevant auth permissions:

client.libvirt
        key: *removed*
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=one, allow rwx pool=one_ssd

#4 Updated by Ruben S. Montero over 4 years ago

Note that the snapshot will be removed the next time the image is used, not right after the persistent change.
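
A quick way to confirm that is to list the snapshots again after the next deployment of the image (names as in this report):

rbd --id libvirt -p one snap ls one-42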

#5 Updated by Taavi K over 4 years ago

The snapshot removal still does not happen after the persistent image is used.

In step 5 of the first post the snapshot should be deleted, but it's still there.

#6 Updated by Ruben S. Montero over 4 years ago

I cannot reproduce this; I will try in another Ceph cluster and update the issue.

#8 Updated by Jaime Melis over 4 years ago

  • Resolution set to worksforme

I'm sorry, but I have also tried to replicate the issue and it works for me. The code seems alright; as Rubén explained, the snapshot is removed when the VM is instantiated.

Could it perhaps be a problem with an improper umount, without running sync or something like that?
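
For completeness, a clean detach inside the guest before terminating the VM would be something like (mount point as used in this report):

cd /
sync
umount /mnt/d1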

Closing with worksforme.

Please reopen if you have any additional info.

#9 Updated by Jaime Melis over 4 years ago

  • Status changed from Pending to Closed

#10 Updated by Jaime Melis over 4 years ago

  • Status changed from Closed to Assigned
  • Assignee set to Jaime Melis
  • Target version set to Release 5.4

Actually reopening... I followed your instructions again and I've seen the bug.

Thanks for the detailed explanation!

#11 Updated by Taavi K over 4 years ago

Sure :)

#12 Updated by Jaime Melis over 4 years ago

  • Status changed from Assigned to Closed
  • Resolution changed from worksforme to fixed

Fixed

#13 Updated by Tino Vázquez over 4 years ago

  • Target version changed from Release 5.4 to Release 5.2.1
