Feature #1068
Implement RBD disk images
Status: | Closed | Start date: | 01/19/2012
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | Jaime Melis | % Done: | 0%
Category: | Drivers - Auth | |
Target version: | Release 4.0 | |
Resolution: | fixed | Pull request: |
Description
RBD allows striping a VM block device over objects stored in the Ceph distributed object store. This provides shared block storage that facilitates VM migration between hosts, and it also improves the speed and availability of images. It is based on a Ceph cluster.
Although Ceph is still under development, it would be great to be able to use it with OpenNebula once it is finished and stable.
RBD is supported by libvirt.
More info:
http://ceph.newdream.net/wiki/QEMU-RBD
http://libvirt.org/formatdomain.html#elementsDisks
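As a quick illustration of what this enables on the hypervisor side (a sketch only; the pool and image names are arbitrary examples, not OpenNebula conventions), an RBD image is created with the rbd tool and can then be addressed by QEMU, and therefore by libvirt, through the rbd: protocol instead of a local file:

    # create a 10 GB image in a pool named "one" (both names are placeholders)
    rbd create one/vm-disk-0 --size 10240
    # QEMU built with RBD support can open it directly over the network
    qemu-img info rbd:one/vm-disk-0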
Associated revisions
Feature #1068: The core supports RBD images
Feature #1068: Add the datastore drivers for Ceph
Feature #1068: The onedatastore command now shows the DISK_TYPE
Feature #1068: rbd names inside the name attribute of the disk's section in the libvirt domain file
Feature #1068: add Ceph TMs
Feature #1068: add ceph to oned.conf
Feature #1068: add ceph to install.sh
feature #1068: Change default staging dir to /var/tmp
Feature #1068: Fix unnecessary use of sudo in the Ceph drivers
Feature #1068: Fix unnecessary use of sudo in the Ceph drivers (cherry picked from commit 85eeb4b240881f1918b37ab2ee7c201e311cd474)
History
#1 Updated by Cristian M about 9 years ago
Currently RBD is ready to use in production. I use it with OpenStack, and it would be a good idea to be able to use it with OpenNebula as well.
#2 Updated by Bill Campbell over 8 years ago
- File opennebula-rbd.tar.gz added
This has been tested on Ubuntu Server 12.04.1 for the OpenNebula, Ceph, and KVM hosts.
I've been working on an RBD driver for OpenNebula and have come up with the attached. It is based on version 3.8 and tested with version 3.8.1. The driver is a combination of some modified bits of vmm/LibVirtDriverKVM.cc, datastore/Datastores.cc, and include/Image.h, plus a modification of the Sunstone public/js/plugins/datastores-tab.js to allow selection of the RBD types. A Datastore driver, a Transfer Manager driver, and a modified attach_disk VMM/KVM MAD are included as well. I hope this is of help not only to the community but also to the OpenNebula developers in supporting a very promising distributed storage platform for cloud architectures.
All VM functions work except hotplugging/attaching volatile disks. That does function, but it creates a file in the shared system datastore (as designed). I believe an option to select the datastore when creating a hotplug volatile disk would help with that particular case. The existing image hotplug works just fine.
A couple of notable configuration necessities/assumptions:
--The datastore NEEDS to be named the same as the RADOS/RBD pool you create; otherwise this will NOT work (see the sketch after this list). The MADs key off the datastore name to generate the appropriate disk configuration in the libvirt domain definition file. You can have multiple RADOS pools, but you cannot clone between them using the default datastore 'clone' MAD. This should be simple to implement; I recommend adding Source/Destination datastore logic to the clone script.
--It is assumed that ceph.conf and the keyring are stored in defined locations on the KVM hosts as well. This makes the KVM hosts aware of the cluster configuration, and the installed keyring provides the appropriate authentication. I know this is not optimal, but implementing the Ceph cluster hostnames and keyrings within the libvirt domain definition file seemed rather complicated. I'm not a C programmer, so I hope somebody can provide some input and help develop this part.
--The configuration file needs the appropriate additions to the DS_MAD and TM_MAD sections to add 'rbd' as an available driver.
--When creating the datastore, select "RBD" for the Datastore Manager, Transfer Manager, and Disk Type. I've modified the Sunstone interface to make these options available.
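To make the naming requirement concrete, here is a minimal, hypothetical datastore definition for this contributed driver (the file name rbd_ds.conf, the pool name "one", and the "rbd" driver values are illustrative assumptions based on the notes above, not shipped defaults):

    # rbd_ds.conf -- hypothetical datastore template for the contributed RBD drivers
    # NAME must match the name of an existing RADOS/RBD pool
    NAME      = one
    DS_MAD    = rbd
    TM_MAD    = rbd
    DISK_TYPE = RBD

It would then be registered with "onedatastore create rbd_ds.conf", or equivalently through the modified Sunstone dialog.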
I'm by no means a programmer/developer, and I'm sure my methods aren't particularly optimal, so please feel free to critique, pull apart, rewrite, etc. etc.
Hope this helps!
#3 Updated by Bill Campbell over 8 years ago
Forgot to mention: the OpenNebula server does NOT need to be able to see the Ceph cluster (helpful if using a separate storage network), but you WILL need to share out /var/lib/one/datastores/0 (the system datastore); I used NFS. This helps in communicating particular bits of disk information between the ONE front-end and the KVM hosts.
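For anyone reproducing this, a minimal sketch of that NFS share (the front-end host name and the network range below are placeholders, not part of the contribution):

    # on the OpenNebula front-end: export the system datastore
    echo "/var/lib/one/datastores/0 192.168.1.0/24(rw,sync,no_subtree_check)" >> /etc/exports
    exportfs -ra

    # on each KVM host: mount it at the same path
    mount -t nfs frontend:/var/lib/one/datastores/0 /var/lib/one/datastores/0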
#4 Updated by Ruben S. Montero over 8 years ago
- Target version set to Release 4.0
#5 Updated by Bill Campbell over 8 years ago
While testing deployment/recompilation on a separate instance, I realized that some last-minute code modifications (I had been referencing the disk type as "network", since that is how libvirt sees the disk, but changed it to RBD to be more descriptive) were made without providing the updated Image.h and attach_disk VMM script. These are now included.
#6 Updated by Bill Campbell over 8 years ago
- File opennebula-rbd-fix.tar.gz added
#7 Updated by Vladislav Gorbunov over 8 years ago
remotes/tm/rbd/delete deletes persistent images on my RBD cluster. Here is the patch that fixes it:
7d46
< DST_PERSISTENT=$( cat /var/lib/one/${VMID}/${DST_BASENAME}.index | grep -c "image" )
50,51c49,63
< if [ $DST_PERSISTENT -eq 1 ]; then
<     log "Image is persistent. Skipping deletion."
---
> DISK_ID=$(echo "$DST_PATH" | $AWK -F. '{print $NF}')
>
> XPATH="/var/lib/one/remotes/datastore/xpath.rb --stdin"
>
> unset i XPATH_ELEMENTS
>
> while IFS= read -r -d '' element; do
>     XPATH_ELEMENTS[i++]="$element"
> done < <(onevm show -x $VMID| $XPATH \
>     /VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/PERSISTENT)
>
> DST_PERSISTENT="${XPATH_ELEMENTS[0]}"
>
> if [ "$DST_PERSISTENT" = "YES" ]; then
>     log "Image is persistent - $DST_PERSISTENT. Skipping deletion."
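Pulled out of the diff for readability, the persistence check amounts to the following (a sketch only; it assumes the usual TM-script environment, i.e. VMID and DST_PATH already set and the xpath.rb helper from the OpenNebula remotes):

    # extract the disk id from the destination path and ask oned whether that disk is persistent
    DISK_ID=$(echo "$DST_PATH" | awk -F. '{print $NF}')
    while IFS= read -r -d '' element; do PERSISTENT="$element"; done < <(
        onevm show -x $VMID | /var/lib/one/remotes/datastore/xpath.rb --stdin \
            "/VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/PERSISTENT")
    [ "$PERSISTENT" = "YES" ] && echo "Image is persistent. Skipping deletion."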
#8 Updated by Vladislav Gorbunov over 8 years ago
and the patch for remotes/tm/rbd/mvds:
73c73,86
< if [ $SRC_PERSISTENT -eq 1 ]; then
---
> DISK_ID=$(echo "$DST_PATH" | $AWK -F. '{print $NF}')
>
> XPATH="/var/lib/one/remotes/datastore/xpath.rb --stdin"
>
> unset i XPATH_ELEMENTS
>
> while IFS= read -r -d '' element; do
>     XPATH_ELEMENTS[i++]="$element"
> done < <(onevm show -x $VMID| $XPATH \
>     /VM/TEMPLATE/DISK[DISK_ID=$DISK_ID]/PERSISTENT)
>
> DST_PERSISTENT="${XPATH_ELEMENTS[0]}"
>
> if [ "$DST_PERSISTENT" = "YES" ]; then
#9 Updated by Bill Campbell over 8 years ago
Vladislav, thanks for the help with the TM driver! Looking at this gives me a better understanding of how XPATH works, so thank you. While implementing/testing your changes I noticed that in the mvds script the DISK_ID needs to be sourced from $SRC_PATH and not $DST_PATH (it kept coming up empty, so the script attempted to save the image back to the datastore and failed).
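In other words, the corrected extraction in mvds would be the following (same logic as the patch in note #8, with only the source of the disk id changed; $AWK is the variable the TM framework already provides):

    # take the disk id from the source path, not the destination path
    DISK_ID=$(echo "$SRC_PATH" | $AWK -F. '{print $NF}')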
#10 Updated by Vladislav Gorbunov over 8 years ago
You are right. DISK_ID on the mvds script needs to be sourced from $SRC_PATH. Thank you!
#11 Updated by Jaime Melis over 8 years ago
- Status changed from New to Assigned
- Assignee set to Jaime Melis
Thank you all for helping with this feature: ideas, patches and comments.
We have merged the new feature into the 'master' branch of OpenNebula. It would be great if you could test it and let us know what you think.
The documentation is here: http://opennebula.org/documentation:rel4.0:ceph_ds
thanks again!
#12 Updated by Jaime Melis over 8 years ago
- Status changed from Assigned to Closed
- Resolution set to fixed
I'm closing the ticket for the moment. Feel free to reopen if you have any suggestions.
#13 Updated by Bill Campbell over 8 years ago
Hello,
Thanks for implementing the driver in the 4.0 release. I'm not sure if this will make the release, but attached are updated DS and TM drivers for Ceph RBD volumes that take advantage of Ceph's RBD format 2 images. This enables copy-on-write snapshots for non-persistent images, allowing rapid deployment of VMs without additional copy operations. Format 2 has some additional snapshot capability as well, but that has not been added to these drivers.
qemu-img cannot import format 2 images, so we have to skip using it for the import. I've modified the driver to convert any uploaded image to raw format and then import it into the RBD cluster. Outside of that, operation is pretty much the same, except that non-persistent images now deploy a LOT faster!
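A sketch of the register path described above in plain rbd / qemu-img commands (pool, image, and snapshot names are purely illustrative, and the exact flags depend on the Ceph version; the attached driver scripts wrap this logic):

    # qemu-img cannot write format 2 images, so convert the upload to raw first
    qemu-img convert -O raw uploaded.img disk.raw
    # import the raw file into the pool as an RBD format 2 image
    rbd import --image-format 2 disk.raw one/one-123
    # non-persistent VMs can then be deployed as copy-on-write clones of a protected snapshot
    rbd snap create one/one-123@base
    rbd snap protect one/one-123@base
    rbd clone one/one-123@base one/one-123-vm-7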
#14 Updated by liu tao over 8 years ago
mark