Bug #5563

ceph rbd image attached to existing VM via attach_disk is missing additional ceph monitors

Added by Tobias Fischer 10 months ago. Updated 10 months ago.

Status: Pending
Start date: 11/17/2017
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: Drivers - VM
Target version: -
Resolution:
Pull request:
Affected Versions: OpenNebula 5.4.1

Description

Hello,

Any additional Ceph RBD image attached to an existing VM via attach_disk is missing the additional Ceph monitors. Example:

The first disk (defined via the template) has the following XML definition:

<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback'/>
<auth username='XXX'>
<secret type='ceph' uuid='XXX'/>
</auth>
<source protocol='rbd' name='POOL/IMAGE'>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
</source>

Any additional Ceph image attached afterwards has the following XML definition:

<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<auth username='XXX'>
<secret type='ceph' uuid='XXX'/>
</auth>
<source protocol='rbd' name='POOL/IMAGE'>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
</source>

Although the corresponding datastore has 3 Ceph monitors defined in OpenNebula, the attach script only uses the first one, so the other 2 monitors are missing from the disk definition.
This is a serious problem: if the first Ceph monitor goes down, the attached disk becomes unusable inside the VM. With all 3 monitors defined, the client could switch to one of the other 2.
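
The difference between the two definitions can be checked by counting the `<host>` elements in the disk XML. The following is only a rough sketch: `count_rbd_hosts` is a hypothetical helper, not part of OpenNebula, and the sample XML uses placeholder addresses. It counts every `<host name=` occurrence on stdin, so on a full domain dump it would sum the hosts of all network disks.

```shell
#!/bin/bash
# Hypothetical helper: count <host name=...> elements read from stdin.
count_rbd_hosts() {
    grep -o "<host name=" | wc -l | tr -d ' '
}

# Sample definition of a hot-attached disk (placeholder address).
ATTACHED_XML="<source protocol='rbd' name='POOL/IMAGE'>
<host name='10.0.0.1' port='6789'/>
</source>"

# Sample definition of a template-defined disk (placeholder addresses).
TEMPLATE_XML="<source protocol='rbd' name='POOL/IMAGE'>
<host name='10.0.0.1' port='6789'/>
<host name='10.0.0.2' port='6789'/>
<host name='10.0.0.3' port='6789'/>
</source>"

ATTACHED_COUNT=$(printf '%s\n' "$ATTACHED_XML" | count_rbd_hosts)
TEMPLATE_COUNT=$(printf '%s\n' "$TEMPLATE_XML" | count_rbd_hosts)

echo "attached disk monitors: $ATTACHED_COUNT"
echo "template disk monitors: $TEMPLATE_COUNT"
```

On a hypervisor, the same helper could be fed from `virsh dumpxml <domain>`, assuming the libvirt CLI is available there.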

The code that builds the disk definition, including the Ceph monitors of the datastore the image is attached from, seems to be in
remotes/vmm/kvm/attach_disk:

cat <<EOF > $ATTACH_FILE
<disk type='$TYPE_XML' device='$DEVICE'>
<driver name='qemu' type='$DRIVER' $CACHE $DISCARD/>
<source $TYPE_SOURCE='$SOURCE' $SOURCE_ARGS>
$SOURCE_HOST
</source>
$AUTH
<target dev='$TARGET'/>
$READONLY
</disk>
EOF

The variable containing the Ceph monitors is $SOURCE_HOST.
It is set in remotes/scripts_common.sh, in these two functions:
function get_source_xml
function get_disk_information

Unfortunately I am not sure where exactly the error is located.
Any help is appreciated. Thanks :-)

Best,
tobi

History

#1 Updated by Tobias Fischer 10 months ago

Here is how I fixed it:

remotes/scripts_common.sh:
function get_source_xml {
    HOSTS=""
    for host in $1 ; do
        BCK_IFS=$IFS
        IFS=':'

        unset k HOST_PARTS SOURCE_HOST
        for part in $host ; do
            HOST_PARTS[k++]="$part"
        done
        SOURCE_HOST="$SOURCE_HOST<host name='${HOST_PARTS[0]}'"
        if [ -n "${HOST_PARTS[1]}" ]; then
            SOURCE_HOST="$SOURCE_HOST port='${HOST_PARTS[1]}'"
        fi
        SOURCE_HOST="$SOURCE_HOST/>"
        HOSTS+=$SOURCE_HOST
        IFS=$BCK_IFS
    done
    echo "$HOSTS"
}
So I added the following lines to that function:

HOSTS=""
HOSTS+=$SOURCE_HOST
echo "$HOSTS"

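In addition, in
function get_disk_information {

SOURCE_HOST=$(get_source_xml "$CEPH_HOST")

I added quotes around $CEPH_HOST. Without them the monitor list is word-split before the call, and get_source_xml only sees the first monitor as $1.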
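
The effect of the missing quotes can be reproduced in isolation. This is a minimal sketch, not the OpenNebula code: `list_hosts` is a hypothetical stand-in that, like get_source_xml, only iterates over `$1`, and the monitor addresses are placeholders. It assumes CEPH_HOST holds a space-separated list of monitors.

```shell
#!/bin/bash
# Hypothetical stand-in for get_source_xml: it only looks at its first argument.
list_hosts() {
    local host out=""
    for host in $1; do
        out+="[$host]"
    done
    echo "$out"
}

# Placeholder monitor list (assumed to be space separated).
CEPH_HOST="10.0.0.1:6789 10.0.0.2:6789 10.0.0.3:6789"

# Unquoted: the shell splits the list into three arguments, so $1 is
# only the first monitor and the other two are silently dropped.
UNQUOTED=$(list_hosts $CEPH_HOST)

# Quoted: the whole list arrives as $1 and the loop sees every monitor.
QUOTED=$(list_hosts "$CEPH_HOST")

echo "unquoted: $UNQUOTED"
echo "quoted:   $QUOTED"
```

With the default IFS, the unquoted call yields one entry and the quoted call yields three, which matches the one-monitor versus three-monitor disk definitions described in the report.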
