Feature #1068

Implement RBD disk images

Added by Grzegorz Kocur over 9 years ago. Updated over 8 years ago.

Status: Closed
Start date: 01/19/2012
Priority: Normal
Due date:
Assignee: Jaime Melis
% Done: 0%
Category: Drivers - Auth
Target version: Release 4.0
Resolution: fixed
Pull request:

Description

RBD allows striping a VM block device over objects stored in the Ceph distributed object store. This provides shared block storage that facilitates VM migration between hosts. It also improves the speed and availability of images. It is based on a Ceph cluster.
Although Ceph is still under development, it would be great to use it with OpenNebula once it is finished/stable.

RBD is supported by libvirt.

More info:
http://ceph.newdream.net/wiki/QEMU-RBD
http://libvirt.org/formatdomain.html#elementsDisks

opennebula-rbd.tar.gz - RBD source patches and MAD scripts (8.91 KB) Bill Campbell, 10/26/2012 09:24 PM

opennebula-rbd-fix.tar.gz - Fixed Image.h and attach_disk (10 KB) Bill Campbell, 10/31/2012 12:43 PM

opennebula4.0-rbd-format2-transfer-datastore-drivers.tar.gz (3.86 KB) Bill Campbell, 03/04/2013 08:37 PM

Associated revisions

Revision 555da993
Added by Jaime Melis over 8 years ago

Feature #1068: The core supports RBD images

Revision a64f2337
Added by Jaime Melis over 8 years ago

Feature #1068: Add the datastore drivers for Ceph

Revision 20df320d
Added by Jaime Melis over 8 years ago

Feature #1068: The onedatastore command now shows the DISK_TYPE

Revision f52d7dbe
Added by Jaime Melis over 8 years ago

Feature #1068: rbd names inside the name attribute of the disk's section in the libvirt domain file

Revision 3642a8eb
Added by Jaime Melis over 8 years ago

Feature #1068: add Ceph TMs

Revision 9a19cbfd
Added by Jaime Melis over 8 years ago

Feature #1068: add ceph to oned.conf

Revision b73efa59
Added by Jaime Melis over 8 years ago

Feature #1068: add ceph to install.sh

Revision e1e6b15e
Added by Ruben S. Montero over 8 years ago

feature #1068: Change default staging dir to /var/tmp

Revision 85eeb4b2
Added by Jaime Melis over 8 years ago

Feature #1068: Fix unnecessary use of sudo in the Ceph drivers

Revision 1f6b7ff7
Added by Jaime Melis over 8 years ago

Feature #1068: Fix unnecessary use of sudo in the Ceph drivers (cherry picked from commit 85eeb4b240881f1918b37ab2ee7c201e311cd474)

History

#1 Updated by Cristian M about 9 years ago

RBD is now ready for production use. I use it with OpenStack, and it would be a good idea to be able to use it with OpenNebula as well.

#2 Updated by Bill Campbell over 8 years ago

This has been tested on Ubuntu Server 12.04.1 for OpenNebula/Ceph/KVM Hosts

I've been working on an RBD driver for OpenNebula and have come up with the attached. It is based on version 3.8 and has been tested with versions 3.8 and 3.8.1. The driver is a combination of modified bits of vmm/LibVirtDriverKVM.cc, datastore/Datastores.cc, and include/Image.h, plus a modification of the Sunstone public/js/plugins/datastores-tab.js to allow selection of the RBD types. Datastore and Transfer Manager drivers and a modified attach_disk VMM/KVM MAD are included as well. I hope this helps not only the community but also the OpenNebula developers in supporting a very promising distributed storage platform for cloud architectures.

All VM functions are working minus hotplug/attaching of Volatile disks. It functions, but creates a file in the shared system DS (as designed). I believe an option to select the datastore when creating a hotplug volatile disk would help facilitate that particular function. The existing image hotplug works just fine.

A couple of notable configuration necessities/assumptions:

--The datastore NEEDS to have the same name as the RADOS/RBD pool you create. Otherwise this will NOT work. The MADs key off the datastore name to generate the appropriate disk configuration in the libvirt domain definition file. You can have multiple RADOS pools, but you cannot clone between them using the default datastore 'clone' MAD. This should be simple to implement; I recommend adding source/destination datastore logic to the clone script.

--It is assumed that ceph.conf and the keyring are stored in defined locations on the KVM hosts as well. This makes the KVM hosts aware of the cluster configuration, and the installed keyring provides the appropriate authentication. I know this is not optimal, but implementing the Ceph cluster hostnames and keyrings within the libvirt domain definition file seemed rather complicated. I'm not a C programmer, so I hope somebody can provide some input and help develop this part.

--The configuration file needs the appropriate additions to the DS_MADS and TM_MADS sections to add 'rbd' as an available driver.

--When creating the datastore, select "RBD" for Datastore Manager, Transfer Manager, and Disk Type. I've modified the Sunstone interface to make these options available.
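
To illustrate the first two points, here is a minimal sketch of the kind of libvirt disk element such a driver might generate. It is not taken from the attached patches: the function name rbd_disk_xml, its arguments, and the example pool/image names are hypothetical. It assumes the datastore name equals the RADOS pool name and that ceph.conf and the keyring sit in their default locations on the KVM host, so no monitor hosts or auth secrets are written into the element.

```shell
#!/bin/bash
# Sketch only: print a libvirt <disk> element for an RBD-backed image.
# rbd_disk_xml and its arguments are hypothetical names for illustration.
rbd_disk_xml() {
    local pool="$1"    # datastore name, assumed identical to the RADOS pool name
    local image="$2"   # RBD image name inside that pool
    local target="$3"  # guest device name, e.g. vda

    # libvirt describes RBD volumes as network disks with protocol 'rbd';
    # the source name has the form <pool>/<image>.
    cat <<EOF
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='${pool}/${image}'/>
  <target dev='${target}' bus='virtio'/>
</disk>
EOF
}

# Example: image "one-12" in pool "one", exposed to the guest as vda
rbd_disk_xml one one-12 vda
```

Because the pool name is taken straight from the datastore name, a datastore named differently from its pool would produce a source name that does not exist in the cluster, which is consistent with the naming requirement above.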

I'm by no means a programmer/developer, and I'm sure my methods aren't particularly optimal, so please feel free to critique, pull apart, rewrite, etc. etc.

Hope this helps!

#3 Updated by Bill Campbell over 8 years ago

Forgot to mention the OpenNebula server does NOT need to be able to see the Ceph cluster (helpful if using a separate storage network), but you WILL need to share out (I used NFS) /var/lib/one/datastores/0 (the system datastore). This helps in communicating particular bits of disk information between ONE FE and the KVM hosts.

#4 Updated by Ruben S. Montero over 8 years ago

  • Target version set to Release 4.0

#5 Updated by Bill Campbell over 8 years ago

While testing deployment/recompilation on a separate instance, I realized that I had made some last-minute modifications to the code (it was referencing the disk type as "network", since that is how libvirt sees the disk, but I changed it to RBD to be more descriptive) and had not provided the updated Image.h and attach_disk VMM script. These are now included.

#7 Updated by Vladislav Gorbunov over 8 years ago

remotes/tm/rbd/delete deletes persistent images on my RBD cluster. Here is the patch that fixes it:

7d46
< DST_PERSISTENT=$( cat /var/lib/one/${VMID}/${DST_BASENAME}.index | grep -c "image" )
50,51c49,63
< if [ $DST_PERSISTENT -eq 1 ]; then
<     log "Image is persistent.  Skipping deletion." 
---
> DISK_ID=$(echo "$DST_PATH" | $AWK -F. '{print $NF}')
> 
> XPATH="/var/lib/one/remotes/datastore/xpath.rb --stdin" 
> 
> unset i XPATH_ELEMENTS
> 
> while IFS= read -r -d '' element; do
>     XPATH_ELEMENTS[i++]="$element" 
> done < <(onevm show -x $VMID| $XPATH \
>                     /VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/PERSISTENT)
> 
> DST_PERSISTENT="${XPATH_ELEMENTS[0]}" 
> 
> if [ "$DST_PERSISTENT" = "YES" ]; then
>     log "Image is persistent - $DST_PERSISTENT.  Skipping deletion." 

#8 Updated by Vladislav Gorbunov over 8 years ago

and the patch for remotes/tm/rbd/mvds

73c73,86
< if [ $SRC_PERSISTENT -eq 1 ]; then
---
> DISK_ID=$(echo "$DST_PATH" | $AWK -F. '{print $NF}')
> 
> XPATH="/var/lib/one/remotes/datastore/xpath.rb --stdin" 
> 
> unset i XPATH_ELEMENTS
> 
> while IFS= read -r -d '' element; do
>     XPATH_ELEMENTS[i++]="$element" 
> done < <(onevm show -x $VMID| $XPATH \
>                     /VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/PERSISTENT)
> 
> DST_PERSISTENT="${XPATH_ELEMENTS[0]}" 
> 
> if [ "$DST_PERSISTENT" = "YES" ]; then

#9 Updated by Bill Campbell over 8 years ago

Vladislav, thanks for the help with the TM driver! Looking at this gives me a better understanding of how XPATH works, so I thank you. While implementing/testing your changes, I noticed that in the mvds script the DISK_ID needs to be sourced from $SRC_PATH and not $DST_PATH (the DISK_ID kept coming up empty, so the script attempted to save the image back to the datastore and failed).

#10 Updated by Vladislav Gorbunov over 8 years ago

You are right. DISK_ID on the mvds script needs to be sourced from $SRC_PATH. Thank you!
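
A minimal sketch of the correction agreed on above, for reference. It reuses the awk expression from the patch; the helper name disk_id_from_path and the example path are hypothetical, and the path layout (a file ending in ".<disk id>") is an assumption.

```shell
#!/bin/bash
# Sketch only: derive DISK_ID from the source path, as agreed for mvds.
# disk_id_from_path is a hypothetical helper; it returns whatever follows
# the last "." in the path, using the same awk expression as the patch.
disk_id_from_path() {
    echo "$1" | awk -F. '{print $NF}'
}

# In mvds the VM disk is the source, so DISK_ID comes from SRC_PATH.
# Example path only; the real value is passed in by the TM driver.
SRC_PATH="/var/lib/one/datastores/0/42/disk.1"
DISK_ID=$(disk_id_from_path "$SRC_PATH")
echo "DISK_ID=$DISK_ID"   # prints "DISK_ID=1"
```

The resulting DISK_ID is what the xpath query in the patch uses to look up the PERSISTENT attribute of the matching disk.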

#11 Updated by Jaime Melis over 8 years ago

  • Status changed from New to Assigned
  • Assignee set to Jaime Melis

Thank you all for helping with this feature: ideas, patches and comments.

We have merged the new feature with the 'master' branch in OpenNebula. It would be great if you could test it and let us know what you think.

The documentation is here: http://opennebula.org/documentation:rel4.0:ceph_ds

thanks again!

#12 Updated by Jaime Melis over 8 years ago

  • Status changed from Assigned to Closed
  • Resolution set to fixed

I'm closing the ticket for the moment. Feel free to reopen if you have any suggestions.

#13 Updated by Bill Campbell over 8 years ago

Hello,
Thanks for implementing the driver in the 4.0 release. I'm not sure if this will make the release, but attached are updated DS and TM drivers for Ceph RBD volumes that take advantage of Ceph's RBD Format 2 images. This allows Copy-on-Write snapshots for non-persistent images, so VMs can be deployed rapidly without additional copy operations. Format 2 has some additional snapshot capabilities as well, but those have not been added to these drivers.

QEMU-IMG cannot import format 2 images, so we have to skip using it. I've modified the driver to convert any uploaded image to raw format and then import it into the RBD cluster. Other than that, operation is pretty much the same, with the exception that non-persistent images deploy a LOT faster now!
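
The workflow described above could look roughly like the following sketch. It is not the attached driver code: the function name rbd_format2_plan, the snapshot name "snap", and all paths and image names are hypothetical. The function only prints the commands rather than running them, so it can be tried without a Ceph cluster. The option is spelled --image-format here; older rbd releases used --format, so check the rbd version in use.

```shell
#!/bin/bash
# Sketch only: print (not run) the commands for importing an uploaded image
# as RBD format 2 and deploying a non-persistent VM disk as a COW clone.
# rbd_format2_plan and all argument names are hypothetical.
rbd_format2_plan() {
    local src="$1"     # uploaded image file (any format qemu-img can read)
    local pool="$2"    # RADOS pool / datastore name
    local image="$3"   # name of the base image in the pool
    local clone="$4"   # name of the per-VM clone
    local raw="${src}.raw"

    # 1. Convert the upload to raw, since qemu-img is not used for the import.
    echo "qemu-img convert -O raw ${src} ${raw}"
    # 2. Import the raw file as a format 2 image.
    echo "rbd import --image-format 2 ${raw} ${pool}/${image}"
    # 3. Snapshot and protect the base image so it can be cloned.
    echo "rbd snap create ${pool}/${image}@snap"
    echo "rbd snap protect ${pool}/${image}@snap"
    # 4. Non-persistent deployment: a copy-on-write clone, no full copy.
    echo "rbd clone ${pool}/${image}@snap ${pool}/${clone}"
}

# Example invocation with made-up names
rbd_format2_plan /var/tmp/upload.qcow2 one one-12 one-12-vm7
```

Step 4 is where the speed-up would come from: the clone shares data with the protected snapshot, so no image data is copied at deployment time.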

#14 Updated by liu tao over 8 years ago

mark
