Backlog #2912

/var/lib/one/remotes/im/kvm-probes.d/monitor_ds.sh check for LVM datastores is sloppy

Added by Bill Cole over 5 years ago. Updated about 5 years ago.

Status: New
Start date: 05/12/2014
Priority: Normal
Due date:
Assignee: Jaime Melis
% Done: 0%
Category: Drivers - Storage
Estimated time: 1.00 hour
Target version: -

Description

The current code runs a constructed "sudo vgdisplay[...]" command for each numeric subdirectory of $DATASTORE_LOCATION on each host, based only on whether a 'vgdisplay' executable exists. On hosts where LVM is installed but used only for local storage, the 'vgdisplay' commands still run but produce nothing useful. If the host is configured to log 'sudo' use, those commands also cause substantial log noise.

To avoid pointless 'sudo vgdisplay' commands on hosts with no CLVM datastores and protect against the unlikely edge case of collision with an environment variable, replace this line:

PATH=$PATH:/sbin:/bin:/usr/sbin:/usr/bin which vgdisplay &> /dev/null

with these:

unset LVM_SIZE_CMD
grep "${LVM_VG_PREFIX}" /etc/mtab >/dev/null

This will still result in some useless size checks for non-existent VGs on hosts with mixed datastore types, a problem which could be fixed by checking for specific VGs inside the DS loop instead.
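The per-datastore check suggested above could be sketched roughly as follows. This is only an illustration, not the driver's code: the helper names `vg_in_mtab` and `lvm_datastores` are hypothetical, the default prefix `vg-one-` is an assumption, and the match is against the literal VG name, so a device-mapper style path with doubled hyphens would need separate handling.

```shell
#!/bin/bash
# Assumed default; the real script defines LVM_VG_PREFIX itself.
LVM_VG_PREFIX="${LVM_VG_PREFIX:-vg-one-}"

# Hypothetical helper: succeed only if the VG for datastore ID $1
# is named in the mtab-style file $2 (default /etc/mtab).
vg_in_mtab() {
    local ds_id="$1" mtab="${2:-/etc/mtab}"
    # Require a non-digit (or end of line) after the ID so that
    # "vg-one-1" does not also match "vg-one-10".
    grep -Eq -- "${LVM_VG_PREFIX}${ds_id}([^0-9]|\$)" "$mtab"
}

# Hypothetical helper: print the IDs of the numeric subdirectories
# of $1 whose VG name appears in the mtab-style file $2.
lvm_datastores() {
    local ds_location="$1" mtab="${2:-/etc/mtab}" dir id
    for dir in "$ds_location"/*/; do
        id="$(basename "$dir")"
        # Skip anything that is not a purely numeric directory name.
        case "$id" in ''|*[!0-9]*) continue ;; esac
        vg_in_mtab "$id" "$mtab" && echo "$id"
    done
    return 0
}
```

With a check like this, the size command would only need to be built for the IDs that `lvm_datastores "$DATASTORE_LOCATION"` prints.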

History

#1 Updated by Jaime Melis over 5 years ago

  • Tracker changed from Bug to Feature
  • Category set to Drivers - Storage
  • Status changed from Pending to New
  • Assignee set to Jaime Melis
  • Target version set to Release 4.8

Thanks for the patch. It makes a lot of sense.

I also like the idea in the final comment. Extracting the list of existing VGs from /etc/mtab would be a nice improvement to this driver.

Programmed for OpenNebula 4.8.

I've changed it to Feature since it's not a Bug.

#2 Updated by Ruben S. Montero about 5 years ago

  • Tracker changed from Feature to Bug

#3 Updated by Jaime Melis about 5 years ago

  • Tracker changed from Bug to Feature
  • Target version deleted (Release 4.8)
  • Resolution set to wontfix

We've been evaluating this patch further, and although we like the idea, there's a problem that we need to think about first.

The problem is that before an image is registered into the LVM datastore, the grep on /etc/mtab will not return anything, so the probe will report an invalid amount of free space, which can cause further problems.

I've been trying to investigate if it's possible to detect without 'sudo' permissions if there's a "vg-one" VG, but haven't been able to. stracing the vgs command does not reveal a way to find it out...

We also thought about replacing the vgdisplay check with "sudo vgs vg-one", thereby enabling the check of available LVM space only if that VG is present. However, this also has a big drawback: we would be hardcoding the name of the VG, which means that you could only have one LVM-based datastore per cluster, which is unacceptable.

Setting it to wontfix until we have a clear way to overcome these limitations.

Bill, we'd love your input on this :)

#4 Updated by Jaime Melis about 5 years ago

  • Tracker changed from Feature to Backlog

#5 Updated by Bill Cole about 5 years ago

I don't have any CLVM systems to look at for comparison, but on machines with LVM local storage and defined LVs, there are world-readable symlinks to device mapper nodes at /dev/<vgname>/<lvname> whether or not they are (or ever have been) mounted. Unfortunately, if there are no LVs defined in a VG, it has no directory in /dev and the only indication I can find on a machine that such a VG (in this case named 'demo') exists is this line in /var/log/boot.log:

Setting up Logical Volume Management: 0 logical volume(s) in volume group "demo" now active

And while that happens to exist and is world-readable on the CentOS 6 machines I've got handy, I've no idea how universal it is and obviously that line relies on the VG existing at boot time. If CLVM behaves similarly, there may in fact be no reliable way of telling whether a VG exists if it has no LVs. One way around that could be for OpenNebula to create a trivial LV in each LVM DS solely to get hosts to create the /dev/<vgname> directory with one entry in it.
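The /dev/<vgname> observation above suggests a check that needs no 'sudo'. The following is a minimal sketch only: the function name `vg_visible_in_dev` is hypothetical, it detects only VGs that have at least one LV, and whether CLVM hosts populate /dev the same way is unverified.

```shell
#!/bin/bash
# Hypothetical helper: succeed if <dev_root>/<vg_name> exists and
# contains at least one entry. dev_root defaults to /dev and is a
# parameter only so the check can be exercised against a test tree.
vg_visible_in_dev() {
    local vg_name="$1" dev_root="${2:-/dev}" entry
    [ -d "$dev_root/$vg_name" ] || return 1
    for entry in "$dev_root/$vg_name"/*; do
        # -e follows symlinks; -L also accepts a dangling symlink.
        # An unmatched glob stays literal and fails both tests.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            return 0
        fi
    done
    return 1
}
```

A VG with no LVs has no directory to find, so a failure here means "not visible", not "does not exist".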

Another alternative (which is in the larger realm of monitoring design) would be to provide the monitor scripts with a means of determining the scope of what they should expect to find on a particular host or across the environment as a whole to avoid attempts to monitor resources that don't exist. I'm not thinking of a full replication of the DB to hosts, just something like a manifest file or set of files that monitoring scripts could use to know what they should actually be looking for. I should add a warning to that: even after re-reading the docs and many of the scripts, I'm unclear on precisely which datastores are monitored with monitor_ds.sh and which are monitored by the datastore/[dstype]/monitor scripts and how results from multiple hosts (and the frontend?) are reconciled for shared datastores. I am fairly certain that in my production environment where we currently use only NFS-shared datastores, I could remove monitor_ds.sh altogether, but that would create an ongoing maintainability issue.
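To make the manifest idea concrete, here is one way a probe might consume such a file. Everything in this sketch is an assumption for illustration: OpenNebula defines no such manifest, and the file format, the `lvm` type keyword, and the function name `manifest_lvm_vgs` are all hypothetical.

```shell
#!/bin/bash
# Hypothetical manifest format, one datastore per line:
#   <datastore id> <type> [<vg name>]
# Blank lines and lines starting with '#' are ignored.
#
# Print the VG names of all datastores marked as type "lvm".
manifest_lvm_vgs() {
    local manifest="$1" id type vg
    # No readable manifest: report nothing rather than guess.
    [ -r "$manifest" ] || return 0
    while read -r id type vg; do
        case "$id" in ''|'#'*) continue ;; esac
        if [ "$type" = "lvm" ] && [ -n "$vg" ]; then
            echo "$vg"
        fi
    done < "$manifest"
    return 0
}
```

A probe could then run its size check only for the VGs this prints, instead of for every numeric subdirectory.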

#6 Updated by Jaime Melis about 5 years ago

We still don't have a clear view of how to add this feature. It's high on the priority list, however.
We need to find a solution that works on all the supported platforms.
