Bug #3393
Multiple probes running on a node
Status: | Closed | Start date: | 12/03/2014
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | Javi Fontan | % Done: | 0%
Category: | Drivers - Monitor | |
Target version: | Release 4.10.2 | |
Resolution: | fixed | Pull request: |
Affected Versions: | OpenNebula 4.8 | |
Description
Hello,
From the mailing list:
One of our ONE 4.8 nodes has tons of monitor_ds.sh probes running.
All the du processes are in uninterruptible sleep:
5903 ? Ss 0:00 \_ sshd: oneadmin [priv]
5905 ? S 0:00 \_ sshd: oneadmin@notty
5906 ? Ss 0:00 \_ bash -c if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor; else exit 42; fi
5907 ? S 0:00 \_ /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor
5914 ? S 0:00 \_ /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor
5915 ? S 0:00 \_ /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor
5920 ? S 0:00 \_ /bin/bash ./collectd-client_control.sh kvm /var/lib/one//datastores 4124 60 8 igor
5927 ? S 0:00 \_ /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 60 8 igor
5934 ? S 0:00 \_ /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 60 8 igor
5935 ? S 0:00 \_ /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 60 8 igor
5988 ? S 0:00 \_ /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 60 8 igor
5991 ? S 0:00 \_ /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 60 8 igor
5992 ? D 0:00 \_ du -sLm /var/lib/one//datastores
5993 ? S 0:00 \_ cut -f1
I think that instead of running the probes again and again, which adds more contention on the disk, ONE should report the node as ERROR.
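One way to keep probes from piling up while an earlier run is still stuck is to guard each run with a non-blocking lock. The following is only a sketch of that idea, not how OpenNebula behaves: the function name run_probe_once, the lock file path, and the exit code 75 are all made up for illustration, and it assumes flock from util-linux is installed on the node.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of OpenNebula): run a command only if no
# earlier instance still holds the lock file.
run_probe_once() {
    local lock="$1"
    shift
    (
        # -n: fail immediately instead of waiting for the lock
        if ! flock -n 9; then
            echo "probe still running, skipping" >&2
            exit 75   # arbitrary exit code chosen for this sketch
        fi
        # The command inherits file descriptor 9, so the lock stays held
        # for as long as the command (for example a hung du) is alive.
        "$@"
    ) 9>"$lock"
}
```

A call such as `run_probe_once /tmp/monitor_ds.lock ./monitor_ds.sh ...` would then return 75 instead of starting a second du while the first one is still blocked, and the caller could use that return code to flag the host.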
Related issues
History
#1 Updated by EOLE Team over 6 years ago
Note that I disabled the node with onehost disable igor, but OpenNebula continues to start probes on it.
#2 Updated by Ruben S. Montero over 6 years ago
- Status changed from Pending to New
- Target version set to Release 4.12
#3 Updated by Ruben S. Montero over 6 years ago
- Related to Bug #2783: Storage size calculation for Datastore using du added
#4 Updated by Ruben S. Montero over 6 years ago
- Target version changed from Release 4.12 to Release 4.10.2
#5 Updated by Ruben S. Montero over 6 years ago
- Assignee set to Javi Fontan
#6 Updated by Javi Fontan over 6 years ago
That tree of processes is normal. The first time a node is monitored, or when no information has been received from it for a certain amount of time, this happens:
- Check if remotes are in the host and copy them if they are not
- Start run probes for <driver> (this is what starts that tree)
- Start collectd client
- Run probes to collect data the first time
After that you will only see the collectd client process and the probes run from it. The problem here is that du takes a long time to execute and hangs.
We have added a patch that replaces du with df, which should fix this problem. The commit is 4ec69144c9561d9200f0578ae9a4701b4dbeb5a7
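To illustrate why the switch helps: du has to walk the whole directory tree and stat every file, while df only asks the filesystem for its counters. The snippet below is a minimal sketch of a df-based size check, not the code from the commit. The function name ds_usage_mb and the output variable names are assumptions made for this example. Note that df reports figures for the whole filesystem holding the directory, not for the directory alone.

```shell
#!/bin/bash
# Hypothetical helper (not the code from the commit): print total, used and
# free space in MB for the filesystem that holds the given directory.
ds_usage_mb() {
    local dir="$1"
    # -P: POSIX output format, one line per filesystem
    # -k: sizes in 1024-byte blocks, converted to MB by awk
    df -P -k "$dir" | awk 'NR == 2 {
        printf "TOTAL_MB=%d USED_MB=%d FREE_MB=%d\n", $2 / 1024, $3 / 1024, $4 / 1024
    }'
}
```

For example, `ds_usage_mb /var/lib/one/datastores` prints a single line with the three values.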
#7 Updated by Ruben S. Montero over 6 years ago
- Status changed from New to Closed
- Resolution set to fixed
We'll reopen it if needed.