Feature #4139

Improve system datastore monitoring

Added by Jaime Melis over 5 years ago. Updated about 5 years ago.

Status: Closed
Start date: 11/06/2015
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 80%
Category: Drivers - Monitor
Target version: Release 5.0
Resolution: fixed
Pull request:

Description

It currently monitors all the directories, not just the ssh-based ones.


Related issues

Duplicates Feature #3981: generic disk polling for block devices Closed 09/12/2015

Associated revisions

Revision cb6b6f26
Added by Javi Fontan about 5 years ago

bug #4366: get disk info in tm/shared monitor

Revision 7c06afc7
Added by Javi Fontan about 5 years ago

feature #4139: do not monitor file based disks in vm poll

Revision 3cb5cab0
Added by Javi Fontan about 5 years ago

feature #4139: get disk info for ssh in monitor_ds

Revision de891f3f
Added by Javi Fontan about 5 years ago

feature #4139: create .monitor file in ssh_make_path

Revision 563afe9b
Added by Javi Fontan about 5 years ago

feature #4139: create ssh directories with .monitor

Revision 468eadd5
Added by Javi Fontan about 5 years ago

feature #4139: bug in ssh_male_path

Revision 12f36729
Added by Javi Fontan about 5 years ago

feature #4139: create .monitor in qcow2 and shared tm's

Revision bb926c11
Added by Javi Fontan about 5 years ago

feature #4139: monitor local ds disks with specific probes

Revision 0e8b3835
Added by Javi Fontan about 5 years ago

feature #4139: add missing ssh/monitor_ds

Revision 20a39214
Added by Javi Fontan about 5 years ago

feature #4139: bug in monitor_ds.sh

Revision 48a714a8
Added by Javi Fontan about 5 years ago

feature #4139: monitor local ds disks with specific probes

Revision ea3b780e
Added by Javi Fontan about 5 years ago

feature #4139: add missing ssh/monitor_ds

Revision 91b45a92
Added by Javi Fontan about 5 years ago

feature #4139: bug in monitor_ds.sh

History

#1 Updated by Ruben S. Montero over 5 years ago

  • Tracker changed from Backlog to Feature
  • Status changed from Pending to New

#2 Updated by Ruben S. Montero about 5 years ago

  • Assignee set to Javi Fontan

#3 Updated by Javi Fontan about 5 years ago

  • Subject changed from Improve SSH system datastore monitoring to Improve system datastore monitoring

Move VM disk monitoring to tm drivers.

#5 Updated by Javi Fontan about 5 years ago

  • Status changed from New to Closed
  • Resolution set to fixed

#6 Updated by Anton Todorov about 5 years ago

Hi,

I have been following the development on this topic and would like to share some thoughts.

Correct me if I am wrong, but the monitoring looks like this:
IMAGE_DATASTORE - monitored via datastore/driver/monitor on the FE
SYSTEM_DATASTORE - monitored depending on the ".monitor" flag in the datastore root:
if there is ".monitor" - monitored via im/remotes/kvm-probes.d/monitor_ds.sh (the SSH case)
if ".monitor" is missing - monitored on the FE via tm/driver/monitor (not called from the core yet?)
VM_DISKS
if CEPH, they are monitored on the remotes via im/kvm-probes.d/poll (I assume this will be moved to tm/ceph/monitor?)

I am worried because I still can't figure out how to properly hook into such a scheme without patching it.

Here is an example:
  • IMAGE DATASTORE on shared storage (storpool) - the images are on the storage and attached to the remote hosts as block devices with symlinks to DATASTORE_LOCATION/DS_ID/VM_ID/disk.N
  • SYSTEM DATASTORE with TM_MAD="ssh"

In the above example the im/monitor_ds.sh will fail to monitor the VM disk because it has no idea how to proceed.

It is becoming more complicated when the SYSTEM DS is on StorPool too (at least the context and volatile disks)...

As I do not know how the monitoring is planned, please allow me to share an idea for a solution:

For each DATASTORE, the im/monitor_ds.sh script would call all tm/*/monitor_vm scripts with the DS_ID and the VM XML as arguments. Each script would process the DS and, if it finds a disk it recognizes attached to the VM, report its size, snapshots, etc. and output that data. If the script is not suitable, it would just exit silently. This way it will work for all disks in the VM, even if a mix of different storage backends is involved.

Something like:
<...>
for DS_ID in {each DS}; do
    for VM_ID in {each VM}; do
        VM_XML=$(virsh dumpxml "$VM_ID" | base64 -w0 -)
        echo -n "VM=[ID=$VM_ID,POLL=\""
        for tm_mad in tm/*; do
            "$tm_mad/monitor_vm.sh" "$DS_ID" "$VM_ID" "$VM_XML"
        done
        echo "\"]"
    done
done

So you can simplify im/poll even further - the RBD disks will be processed by the TM_MAD script, etc...

Something like the above can be implemented on the FE when monitoring the SYSTEM DS from there.

I am open to discussing this further if needed.

Kind Regards,
Anton Todorov

#7 Updated by Ruben S. Montero about 5 years ago

Anton Todorov wrote:

Hi,

I have been following the development on this topic and would like to share some thoughts.

Correct me if I am wrong, but the monitoring looks like this:
IMAGE_DATASTORE - monitored via datastore/driver/monitor on the FE
SYSTEM_DATASTORE - monitored depending on the ".monitor" flag in the datastore root:
if there is ".monitor" - monitored via im/remotes/kvm-probes.d/monitor_ds.sh (the SSH case)
if ".monitor" is missing - monitored on the FE via tm/driver/monitor (not called from the core yet?)

Yes, this is being called by the core.

VM_DISKS
if CEPH, they are monitored on the remotes via im/kvm-probes.d/poll (I assume this will be moved to tm/ceph/monitor?)

This is going to be removed as no relevant information is obtained.

I am worried because I still can't figure out how to properly hook into such a scheme without patching it.

There are basically two use cases:

  1. Shared Datastores. These should be monitored through a process initiated on the front-end (usually using the BRIDGE_LIST). The idea is not to perform the same operation on all the hosts in the cluster to obtain the same information; that is terrible in terms of performance for several storage backends. Just add the information in the datastore/monitor script for the overall datastore capacity and, if it is a system DS, use tm/monitor to also report information about disk usage. Note also that this is interesting only if the virtual vs. physical size is relevant.
  2. Distributed Datastores. Each host has a fraction of the datastore (e.g. a distributed pool of local disks, as in the ssh datastore). In this case the information is different for each host and has to be gathered through the monitoring system. The stock tm ssh driver creates a .monitor file as a hint to identify ssh datastores, indicating that the directory contains an ssh datastore that has to be monitored by the probe.
Here is an example:
  • IMAGE DATASTORE on shared storage (storpool) - the images are on the storage and attached to the remote hosts as block devices with symlinks to DATASTORE_LOCATION/DS_ID/VM_ID/disk.N
  • SYSTEM DATASTORE with TM_MAD="ssh"

In the above example the im/monitor_ds.sh will fail to monitor the VM disk because it has no idea how to proceed.

It is becoming more complicated when the SYSTEM DS is on StorPool too (at least the context and volatile disks)...

I would say that this is the case of a shared datastore, i.e. the information can be obtained from a single point. datastore/monitor and tm/monitor will be the places to hook. In your case ssh is not actually being used, right? Note also that you could use a mix of the two, i.e. information from the host and from a "central" location.

As I do not know how the monitoring is planned, please allow me to share an idea for a solution:

For each DATASTORE, the im/monitor_ds.sh script would call all tm/*/monitor_vm scripts with the DS_ID and the VM XML as arguments. Each script would process the DS and, if it finds a disk it recognizes attached to the VM, report its size, snapshots, etc. and output that data. If the script is not suitable, it would just exit silently. This way it will work for all disks in the VM, even if a mix of different storage backends is involved.

Something like:
<...>
for DS_ID in {each DS}; do
    for VM_ID in {each VM}; do
        VM_XML=$(virsh dumpxml "$VM_ID" | base64 -w0 -)
        echo -n "VM=[ID=$VM_ID,POLL=\""
        for tm_mad in tm/*; do
            "$tm_mad/monitor_vm.sh" "$DS_ID" "$VM_ID" "$VM_XML"
        done
        echo "\"]"
    done
done

So you can simplify im/poll even further - the RBD disks will be processed by the TM_MAD script, etc...

Something like the above can be implemented on the FE when monitoring the SYSTEM DS from there.

I am open to discussing this further if needed.

The idea behind the restructuring of the monitoring information is not to gather the same information multiple times in shared storage systems.

Kind Regards,
Anton Todorov

#8 Updated by Anton Todorov about 5 years ago

Hi Ruben,

Thank you for the answer.

As I am trying to figure out the entire monitoring flow, please allow me to summarize it and clarify my understanding. Correct me if I am wrong somewhere.

We are talking about two types of capacity:

1. Datastore capacity (total, used, free). Mostly used for statistics and for the scheduler to know where enough free space is available to deploy a VM.
1.1 for IMAGE datastore: datastore/monitor called locally on the frontend. I assume it is for each datastore separately?
1.2 for SYSTEM datastore:
1.2.1 if SHARED: tm/monitor called in DATASTORE context by the frontend. Again - is it for each datastore separately?
1.2.2 if SSH: reported by remote host probe im/kvm-probes.d/monitor_ds.sh

2. VM disk capacity (used) and their snapshots (used), if available. For statistics and for the scheduler to find enough room to migrate the VM.
2.1 PERSISTENT and NON-PERSISTENT images:
2.1.1 if SHARED: tm/monitor called in VMDISK context - for each VM or each DISK?
2.1.2 if SSH: im/kvm-probes.d/monitor_ds.sh - how will monitor_ds.sh know which tm/monitor to query?
2.2 VOLATILE images (and optionally the CONTEXT ISO):
2.2.1 if SHARED: tm/monitor called in VMDISK context on the frontend (and SYSTEM_DS context). For each VM or each DISK?
2.2.2 if SSH: im/kvm-probes.d/monitor_ds.sh - how will monitor_ds.sh know which tm/monitor to query?

In your case ssh is not actually being used, right?

My understanding is that the SYSTEM datastore holds two kinds of capacity:
1. Non filesystem dependent - the VOLATILE disks, the CONTEXT ISO and all other data that can be stored on a block device
2. Filesystem dependent - the datastore XML and the checkpoint file. The checkpoint file is a tricky one - it can be imported into the datastore, but it must be extracted onto the filesystem to be used by libvirt (the time to import/extract is close to the time to copy over a 10G network).

So if the filesystem for the SYSTEM datastore is shared - no ssh is used. But if the SYSTEM datastore is not mounted on a shared filesystem, the use of SSH is mandatory for at least two reasons:
A) To copy all non-datastore-related files, like:
- the VM deploy XML,
- for non-live migration, the checkpoint file (importing/exporting the file in the datastore takes about the same time as copying it over a 10G network)
- all other possible files placed by other drivers
B) For datastore migration of the VM - again, to move the files to the other datastore path.

Note also that you could use a mix of the two, i.e. information from the host and from a "central" location.

This point is not clear to me. Could you elaborate a little bit?
Will tm/monitor be called (on the frontend) with the argument IMAGE_ID for the PERSISTENT images and with the arguments IMAGE_ID, VM_ID and VM_DISK_ID for NON-PERSISTENT images?

We have a real case - a customer with a setup as follows: StorPool for both the IMAGE and SYSTEM datastores (context and volatile disks) and no shared filesystem on the SYSTEM datastore. The current setup (on 4.14) is monitored as follows:
* IMAGE datastore capacity monitored with datastore/monitor
* SYSTEM datastore capacity monitored with im/kvm-probes.d/monitor_ds.sh (patched to query the right tool)
* VM_DISKs usage monitored with im/kvm-probes.d/poll (as you know, patched to work with block devices)

What will the monitoring be in v5? I assume that:
IMAGE datastore capacity with datastore/monitor
SYSTEM datastore capacity with im/kvm-probes.d/monitor_ds.sh and/or (somehow) with tm_mad/monitor (*)
VM_DISKs usage again with tm_mad/monitor (*)

(*) it looks like there are two different tasks for tm/monitor - reporting datastore capacity and VM disk usage?

I have the feeling that there is something that I can't understand...

Kind Regards,
Anton Todorov

#9 Updated by Javi Fontan about 5 years ago

Anton Todorov wrote:

1. Datastore capacity (total, used, free). Mostly used for statistics and for the scheduler to know where enough free space is available to deploy a VM.
1.1 for IMAGE datastore: datastore/monitor called locally on the frontend. I assume it is for each datastore separately?
1.2 for SYSTEM datastore:
1.2.1 if SHARED: tm/monitor called in DATASTORE context by the frontend. Again - is it for each datastore separately?
1.2.2 if SSH: reported by remote host probe im/kvm-probes.d/monitor_ds.sh

Yes, those monitor scripts are executed per datastore.

2. VM disk capacity (used) and their snapshots (used), if available. For statistics and for the scheduler to find enough room to migrate the VM.
2.1 PERSISTENT and NON-PERSISTENT images:
2.1.1 if SHARED: tm/monitor called in VMDISK context - for each VM or each DISK?
2.1.2 if SSH: im/kvm-probes.d/monitor_ds.sh - how will monitor_ds.sh know which tm/monitor to query?
2.2 VOLATILE images (and optionally the CONTEXT ISO):
2.2.1 if SHARED: tm/monitor called in VMDISK context on the frontend (and SYSTEM_DS context). For each VM or each DISK?
2.2.2 if SSH: im/kvm-probes.d/monitor_ds.sh - how will monitor_ds.sh know which tm/monitor to query?

Right now monitor_ds.sh is self-contained and only reports datastore size, usage and free space, plus the file-based disks:

https://github.com/OpenNebula/one/blob/master/src/im_mad/remotes/common.d/monitor_ds.sh#L61
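
For reference, the datastore-capacity part of that probe boils down to something like the following (a simplified sketch rather than the shipped script; the exact output wrapping is illustrative):

    # Sketch: per-datastore capacity reporting, assuming the usual
    # $DATASTORE_LOCATION/<DS_ID> layout (illustrative, not the real probe).
    for dir in "$DATASTORE_LOCATION"/*; do
        ds_id=$(basename "$dir")
        echo "$ds_id" | grep -qE '^[0-9]+$' || continue

        read -r TOTAL_MB USED_MB FREE_MB <<< \
            "$(df -B1M -P "$dir" 2>/dev/null | tail -n 1 | awk '{print $2, $3, $4}')"

        echo "DS = [ ID = $ds_id, USED_MB = ${USED_MB:-0}, TOTAL_MB = ${TOTAL_MB:-0}, FREE_MB = ${FREE_MB:-0} ]"
    done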

In your case ssh is not actually being used, right?

My understanding is that the SYSTEM datastore holds two kinds of capacity:
1. Non filesystem dependent - the VOLATILE disks, the CONTEXT ISO and all other data that can be stored on a block device
2. Filesystem dependent - the datastore XML and the checkpoint file. The checkpoint file is a tricky one - it can be imported into the datastore, but it must be extracted onto the filesystem to be used by libvirt (the time to import/extract is close to the time to copy over a 10G network).

So if the filesystem for the SYSTEM datastore is shared - no ssh is used. But if the SYSTEM datastore is not mounted on a shared filesystem, the use of SSH is mandatory for at least two reasons:
A) To copy all non-datastore-related files, like:
- the VM deploy XML,
- for non-live migration, the checkpoint file (importing/exporting the file in the datastore takes about the same time as copying it over a 10G network)
- all other possible files placed by other drivers
B) For datastore migration of the VM - again, to move the files to the other datastore path.

You're right.

Note also that you could use a mix of the two, i.e. information from the host and from a "central" location.

This point is not clear to me. Could you elaborate a little bit?
Will tm/monitor be called (on the frontend) with the argument IMAGE_ID for the PERSISTENT images and with the arguments IMAGE_ID, VM_ID and VM_DISK_ID for NON-PERSISTENT images?

tm/monitor is called once per shared system datastore. It outputs datastore info (size, used and free) plus information on all VM disks. This is then merged into the VM information by the core.

https://github.com/OpenNebula/one/blob/master/src/tm_mad/fs_lvm/monitor#L120
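
For illustration, the output of such a monitor script looks roughly like this (the values are made up; the VM/POLL wrapping follows the format used in the examples throughout this thread):

    USED_MB=10240
    FREE_MB=92160
    TOTAL_MB=102400
    VM=[ID=35,POLL="DISK_SIZE=[ID=0,SIZE=2048] DISK_SIZE=[ID=1,SIZE=512] SNAPSHOT_SIZE=[ID=0,DISK_ID=0,SIZE=1024]"]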

We have a real case - a customer with a setup as follows: StorPool for both the IMAGE and SYSTEM datastores (context and volatile disks) and no shared filesystem on the SYSTEM datastore. The current setup (on 4.14) is monitored as follows:
* IMAGE datastore capacity monitored with datastore/monitor
* SYSTEM datastore capacity monitored with im/kvm-probes.d/monitor_ds.sh (patched to query the right tool)
* VM_DISKs usage monitored with im/kvm-probes.d/poll (as you know, patched to work with block devices)

What will the monitoring be in v5? I assume that:
IMAGE datastore capacity with datastore/monitor
SYSTEM datastore capacity with im/kvm-probes.d/monitor_ds.sh and/or (somehow) with tm_mad/monitor (*)
VM_DISKs usage again with tm_mad/monitor (*)

(*) it looks like there are two different tasks for tm/monitor - reporting datastore capacity and VM disk usage?

I have the feeling that there is something that I can't understand...

This last use case is trickier and Ruben may shed more light here. My understanding is that you may need to retrieve the volume information stored in StorPool using tm/monitor, and the volatile disk information (stored as files in the system datastore) using monitor_ds. Alternatively, we can find a way for monitor_ds to check whether the disk is a block device and use another command (instead of du) to get the volume size.
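
That check could be as simple as the following sketch (illustrative only; blockdev is just one possible tool for the block-device branch and may need the right privileges on the node):

    disk_path="${vmdir}/${disk}"
    real_path=$(readlink -f "$disk_path")

    if [ -b "$real_path" ]; then
        # block device (e.g. a symlink to a storage volume): du reports nothing
        # useful here, so ask the kernel for the device size instead
        disk_size=$(( $(blockdev --getsize64 "$real_path") / 1048576 ))
    else
        # regular file: keep using du as before
        disk_size=$(du -mL "$disk_path" | awk '{print $1}')
    fi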

#10 Updated by Anton Todorov about 5 years ago

Right now monitor_ds.sh is self-contained and only reports datastore size, usage and free space, plus the file-based disks:

https://github.com/OpenNebula/one/blob/master/src/im_mad/remotes/common.d/monitor_ds.sh#L61

...

tm/monitor is called once per shared system datastore. It outputs datastore info (size, used and free) plus information on all VM disks. This is then merged into the VM information by the core.

I am almost sure that the current tm/lvm/monitor will work only for NON-PERSISTENT images, as the VM_ID and DISK_ID are encoded in their names.

As our addon closely follows your naming convention, it suffers from the same issue - the PERSISTENT images don't have the VM and disk ID encoded in their names.

There is no way for a script that queries only the datastore backend to report which PERSISTENT image is mapped to which VM and disk ID :(
The only (indirect) way is to resolve the VM disk symlink to the device in the VM folder on the hypervisor.

So if tm/monitor is called on a SHARED SYSTEM datastore, I can recurse through the datastore path, follow the symlinks and report the right values.

But for im/monitor_ds.sh I would like to suggest extracting the disk detail reporting into separate script(s), like:

    for vm in $vms; do
        vmdir="${dir}/${vm}" 
        disks=$(ls "$vmdir" | grep '^disk\.[0-9]\+$')

        [ -z "$disks" ] && continue

        echo -n "VM=[ID=$vm,POLL=\"" 

        for helper in $(ls $(dirname $0)/monitor_ds.d/*); do
            source "$helper" "$vmdir" 
        done
        echo "\"]" 
    done

... and disk details in im/kvm-probes.d/monitor_ds.d/qcow2_helper

        for disk in $disks; do
            disk_id="$(echo "$disk" | cut -d. -f2)" 
            disk_size="$(du -mL "${vmdir}/${disk}" | awk '{print $1}')" 

            [ -n "$disk_size" ] || continue

            snap_dir="${vmdir}/${disk}.snap" 

            echo -n "DISK_SIZE=[ID=${disk_id},SIZE=${disk_size}] " 

            if [ -e "$snap_dir" ]; then
                snaps="$(ls "$snap_dir" | grep '^[0-9]$')" 

                for snap in $snaps; do
                    snap_size="$(du -mL "${snap_dir}/${snap}" | awk '{print $1}')" 
                    echo -n "SNAPSHOT_SIZE=[ID=${snap},DISK_ID=${disk_id},SIZE=${snap_size}] " 
                done
            fi
        done

This way each datastore backend (LVM, iSCSI, etc.) will drop a script there that tests the disks and reports only if it "recognizes" the disk.

I believe this is the way to open up the reporting without needing to patch monitor_ds.sh...

Kind Regards,
Anton Todorov

#11 Updated by Javi Fontan about 5 years ago

  • Status changed from Closed to New
  • % Done changed from 0 to 80
  • Resolution deleted (fixed)

You're right. We are going to review this method to include specific monitoring for non FS datastores. This will go as is in the beta release (for timing reasons) but will be fixed for the final.

We will be updating this issue.

#12 Updated by Ruben S. Montero about 5 years ago

  • Related to Feature #3981: generic disk polling for block devices added

#13 Updated by Ruben S. Montero about 5 years ago

  • Related to deleted (Feature #3981: generic disk polling for block devices)

#14 Updated by Ruben S. Montero about 5 years ago

  • Duplicates Feature #3981: generic disk polling for block devices added

#15 Updated by Javi Fontan about 5 years ago

We've been thinking about the system you propose but we are not very happy with having several scripts executed for the same datastore. It can happen that two of them are triggered and both report disk information.

Right now the way monitor_ds decides whether to monitor the disks is by checking if a file named .monitor exists in that datastore:

https://github.com/OpenNebula/one/blob/master/src/im_mad/remotes/common.d/monitor_ds.sh#L44

    # Skip if datastore is not marked for monitoring (ssh)
    [ -e "${dir}/.monitor" ] || continue

This file is generated when the directory is created by the tm driver using ssh_make_path:

https://github.com/OpenNebula/one/blob/master/src/mad/sh/scripts_common.sh#L413-L434

# Creates path ($2) at $1. If third parameter is "monitor" creates the
# file ".monitor" in the directory. Used for ssh disk monitoring
function ssh_make_path
{
    SSH_EXEC_ERR=`$SSH $1 sh -s 2>&1 1>/dev/null <<EOF
set -e
if [ ! -d $2 ]; then
   mkdir -p $2
   if [ "monitor" = "$3" ]; then
       touch "\$(dirname $2)/.monitor" 
   fi
fi
EOF`
    SSH_EXEC_RC=$?

    if [ $SSH_EXEC_RC -ne 0 ]; then
        error_message "Error creating directory $2 at $1: $SSH_EXEC_ERR" 

        exit $SSH_EXEC_RC
    fi
}

Called by the ssh driver:

https://github.com/OpenNebula/one/blob/master/src/tm_mad/ssh/clone#L56

ssh_make_path $DST_HOST $DST_DIR "monitor" 

So the idea is to change this file to contain the name of the tm driver that knows how to monitor the files in the datastore. For the ssh driver we will change the call to:

ssh_make_path $DST_HOST $DST_DIR "ssh" 

ssh_make_path will be modified to write this value into .monitor. The code that gets disk information will be moved to a new tm script called monitor_ds. This way each non-shared datastore will call its own disk monitoring code.
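
On the probe side, the dispatch then reduces to reading the driver name from .monitor and calling that driver's monitor_ds with the datastore path. A minimal sketch of the mechanism (not the final probe code):

    for dir in "$DATASTORE_LOCATION"/*; do
        # skip datastores not marked for node-local monitoring
        [ -e "${dir}/.monitor" ] || continue

        tm_mad=$(cat "${dir}/.monitor")    # e.g. "ssh" or "storpool"
        "/var/tmp/one/remotes/tm/${tm_mad}/monitor_ds" "$dir"
    done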

#16 Updated by Anton Todorov about 5 years ago

We've been thinking about the system you propose but we are not very happy with having several scripts executed for the same datastore. It can happen that two of them are triggered and both report disk information.

I understand that it is risky, but each script can `break` out of the loop when it succeeds. I am already using this approach in my patch for im/monitor_ds.sh:

for ds in $dirs; do
    echo $ds | grep -q -E '^[0123456789]+$' || continue

    dir=$DATASTORE_LOCATION/$ds

    USED_MB=$(df -B1M -P $dir 2>/dev/null | tail -n 1 | awk '{print $3}')
    TOTAL_MB=$(df -B1M -P $dir 2>/dev/null | tail -n 1 | awk '{print $2}')
    FREE_MB=$(df -B1M -P $dir 2>/dev/null | tail -n 1 | awk '{print $4}')

    USED_MB=${USED_MB:-"0"}
    TOTAL_MB=${TOTAL_MB:-"0"}
    FREE_MB=${FREE_MB:-"0"}

    [ -f ../../datastore/storpool/monitor_ds.sh ] && source ../../datastore/storpool/monitor_ds.sh

And in our monitor_ds.sh (https://github.com/OpenNebula/addon-storpool/blob/master/datastore/storpool/monitor_ds.sh#L65) I am using `continue` to iterate the "parent" loop. So the loop over all the drivers can `break` on a result.

The reason I insist on such an algorithm is that I think it is possible to have disks from different datastores, with different drivers, attached to the same VM.

So here is another idea to consider. On each disk action (clone (NON-PERSISTENT), ln (PERSISTENT), mkfs (VOLATILE), {pre,post}migrate, mv, etc.) create a file recording the tm_mad for it. For example:

echo "tm_mad_name" >/var/lib/one/datastores/<SYSTEM_DS_ID>/<VM_ID>/disk.0.mad

The above could be an extension to your idea: if there is no 'disk.N.mad', use the tm_mad from the .monitor file. This way it will work only for the drivers that implement it.

What do you think?

Kind Regards,
Anton Todorov

#17 Updated by Ruben S. Montero about 5 years ago

Hi Anton,

So after discussing this with the team...

A VM can only include two types of disks: volatile (created by the system DS scripts) and images (either persistent or not, created by the image DS scripts).

For storpool, the image DS scripts will just include something like

ssh_make_path $DST_HOST $DST_DIR "storpool"

in their clone/ln operations.

This will execute the custom storpool monitor script; from there I think it should be easy to test each disk (e.g. file) to find out whether it is a volatile disk or a storpool device.
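
A per-disk test along those lines could be as simple as the following rough sketch (it assumes, as described earlier in this thread, that storpool disks show up as symlinks to block devices while volatile disks are plain files):

    for disk in $(ls "$vmdir" | grep '^disk\.[0-9]\+$'); do
        target=$(readlink -f "${vmdir}/${disk}")

        if [ -b "$target" ]; then
            : # storpool device: query the storage backend for its real size
        elif [ -f "$target" ]; then
            : # volatile (file-based) disk: du works as in the earlier examples
        fi
    done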

The drawback of this approach is that common code to monitor volatile disks probably cannot be reused. But we think it is easier to distribute and develop new drivers if all the logic is self-contained, and we also prevent upstream changes from breaking other drivers. Dealing with per-disk monitor scripts would also probably require more changes in the rest of the OpenNebula core.

So the only problem I see is with 2 different Image Datastores working with the same system DS (which would need to be SSH, non-shared). In that case you would have 3 types of disks (e.g. storpool, ssh and other). That will be a limitation of the current approach, but I am not sure this use case is even possible (most storage backends now use their own system DS: ceph, lvm...).

#18 Updated by Anton Todorov about 5 years ago

Hi Ruben,

The suggested approach will definitely work when the system and image datastores use the same tm_mad, and with the suggested change it will definitely work for different system and image datastores. The only reason for my suggestion is to open the interface so users can use more than one image datastore - for any (theoretical) reason.

So if I understand correctly, we will have:

For IMAGES_DS:
  • ds/monitor
    - report the images datastore metrics: USED_MB, FREE_MB, TOTAL_MB
For shared(nfs) SYSTEM_DS:
  • tm/monitor
    - report the SYSTEM datastore metrics: USED_MB, FREE_MB, TOTAL_MB
    - report per VM disk statistics for the VOLATILE disks: VM=[ID,POLL="DISK_SIZE=[ID,SIZE] SNAPSHOT_SIZE=[ID,DISK_ID,SIZE] ..."]
    - call ../$(<.monitor)/monitor_ds to report IMAGES_DS disks: VM=[ID,POLL="DISK_SIZE=[ID,SIZE] SNAPSHOT_SIZE=[ID,DISK_ID,SIZE] ..."] <<< I am not sure it is correct guess
For non-shared(ssh) SYSTEM_DS:
  • tm/monitor is called by im_mad on the remote host
    - report the SYSTEM datastore metrics: USED_MB, FREE_MB, TOTAL_MB
    - report per VM disk statistics for the VOLATILE disks: VM=[ID,POLL="DISK_SIZE=[ID,SIZE] SNAPSHOT_SIZE=[ID,DISK_ID,SIZE]"]
    - call ../$(<.monitor)/monitor_ds to report IMAGES_DS disks: VM=[ID,POLL="DISK_SIZE=[ID,SIZE] SNAPSHOT_SIZE=[ID,DISK_ID,SIZE]"]

Besides the speculative guess (marked with "<<<") I am OK with the above scenario. In this case monitor_ds will loop over the VM disks, and I can implement in our add-on an extension to support different IMAGES_DS by checking for a hint about which TM_MAD to call.

I was thinking that the tm/monitor script could walk all the VM disks and call ../images_tm_mad/monitor_ds.sh $DISK_ID $VM_ID $SYSTEM_DS_ID... something like:

# in tm/monitor
for disk_file in $(ls | grep '^disk\.[0-9]\+$'); do
    disk_id="${disk_file#disk.}"
    if [ -f "$disk_file.mad" ]; then
        tm_mad=$(<"$disk_file.mad")
    else
        tm_mad=$(<.monitor)
    fi
    # This script will print the DISK_SIZE=[ID,SIZE] and optionally the
    # SNAPSHOT_SIZE=[ID,DISK_ID,SIZE] strings for the given disk
    "../$tm_mad/monitor_disk.sh" "$VM_ID" "$disk_file" # optionally $SYSTEM_DS_PATH
done

The above code will work for all drivers by using .monitor, as in your suggestion, but it will be open for extension if the addon developer wants...

And I think that then the .monitor file could not be used as a flag to distinguish shared/non-shared SYSTEM_DS, though.

What do you think?

I am OK with your latest proposal, but I believe it could easily be made a little bit more open.

Kind Regards,
Anton Todorov

#19 Updated by Ruben S. Montero about 5 years ago

Hi Anton

The process is as follows.

Image Datastores

Exactly as you described it.

Shared System Datastores

These datastores are monitored once from a single point (either the front-end or one of the storage bridges in ``BRIDGE_LIST``). This prevents overloading the storage with all the nodes querying it at the same time (e.g. NFS and Ceph present problems with this otherwise inefficient operation).

The driver plugin ``<tm_mad>/monitor`` will report two things:

  • Total storage metrics for the datastore (``USED_MB`` ``FREE_MB`` ``TOTAL_MB`` )
  • Disk usage metrics (all disks: volatile, persistent and non-persistent)

SSH System Datastore

Nodes have access to both shared and non-shared datastores. Non-shared SSH datastores are labeled by including a .monitor file in the datastore directory, so only those datastores are monitored remotely. The datastore is monitored with ``<tm_mad>/monitor_ds``, where ``tm_mad`` is obtained by the probes reading the .monitor file.

The plugins <tm_mad>/monitor_ds + kvm-probes.d/monitor_ds.sh will report two things:

  • Total storage metrics for the datastore (``USED_MB`` ``FREE_MB`` ``TOTAL_MB``)
  • Disk usage metrics (all disks: volatile, persistent and non-persistent)

Note that:

  • ``.monitor`` will only be present in SSH datastores that are to be monitored on the nodes.
  • System Datastores that need to be monitored on the nodes will need to provide a ``monitor_ds`` script and not the ``monitor`` one. This is to prevent errors and to avoid invoking the shared mechanism for local datastores.

The monitor_ds script.

The monitor_ds.sh probe from the IM, if the ``.monitor`` file is present (e.g. ``/var/lib/one/datastores/100/.monitor``), will use its contents to execute the corresponding driver script, in the form ``/var/tmp/one/remotes/tm/$(cat .monitor)/monitor_ds /var/lib/one/datastores/100/``. Note that the argument is the datastore path and not the VM or VM disk.

The script is responsible for getting the information for all the disks of all the VMs in the datastore on that node.

Extensions

A custom datastore working with the SSH datastore will need to include the disk information of the VMs on each node (see the sketch after this list). In that case it:

  • May use as a base the loop structure provided in ``tm/ssh/monitor_ds``
  • Must include the information of each disk of each VM. Volatile disks (e.g. the file disk.0) can use the du command in ssh/monitor_ds as an example
  • Other disks need to include a reference in the VM directory (for example, the clone/ln operation of the datastore can store the internal reference in disk.i for later use)
  • If the datastore supports disk snapshots, it can also include snapshot sizes. Again, the internal structure used to store the snapshot information is driver dependent, but enough references should be made available in the VM directory (e.g. qcow2 snapshots keep a .snap directory where .snap/0 refers to the first disk snapshot).
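
Putting the bullets above together, a custom ``<tm_mad>/monitor_ds`` skeleton could look roughly like this (a sketch based on the loop used in the earlier examples; backend_disk_size is a hypothetical helper that queries the storage backend using the references left in the VM directory):

    #!/bin/bash
    # Sketch of a custom <tm_mad>/monitor_ds. Argument: the datastore path.
    DS_PATH="$1"

    for vmdir in "$DS_PATH"/*; do
        vm_id=$(basename "$vmdir")
        echo "$vm_id" | grep -qE '^[0-9]+$' || continue

        echo -n "VM=[ID=$vm_id,POLL=\""

        for disk in $(ls "$vmdir" | grep '^disk\.[0-9]\+$'); do
            disk_id="${disk#disk.}"
            # hypothetical helper; file-based disks could simply use du -mL
            size=$(backend_disk_size "${vmdir}/${disk}")
            echo -n "DISK_SIZE=[ID=${disk_id},SIZE=${size}] "
        done

        echo "\"]"
    done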

I think your proposal is somewhat captured in the current implementation described above. The loop, however, is left to the custom monitor_ds script, as it may need to retrieve additional information from the contents of the datastore.

Note that in this case the datastore is distributed, and the VM directories in the datastore are used as the communication mechanism between the TM operations and the IM (monitoring) ones.

Thanks again for your thorough and useful feedback! :)

Ruben

#20 Updated by Anton Todorov about 5 years ago

Hi Ruben,

Thank you for the detailed explanation.

I'll begin drafting the scripts while waiting for the related commits to be pushed to GitHub.

Kind Regards,
Anton Todorov

#21 Updated by Ruben S. Montero about 5 years ago

Hi Anton,

Just merged the branch into master to speed things up.

Cheers

#22 Updated by Ruben S. Montero about 5 years ago

  • Assignee changed from Javi Fontan to Ruben S. Montero

#23 Updated by Ruben S. Montero about 5 years ago

  • Status changed from New to Closed
  • Resolution set to fixed
