Bug #3393

Multiple probes running on a node

Added by EOLE Team over 6 years ago. Updated over 6 years ago.

Status: Closed
Start date: 12/03/2014
Priority: Normal
Due date:
Assignee: Javi Fontan
% Done: 0%
Category: Drivers - Monitor
Target version: Release 4.10.2
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 4.8

Description

Hello,

From the mailing list:

One of our ONE 4.8 nodes has tons of monitor_ds.sh probes running.

All the du processes are in uninterruptible sleep (state D):

 5903 ? Ss 0:00  \_ sshd: oneadmin [priv]
 5905 ? S  0:00      \_ sshd: oneadmin@notty
 5906 ? Ss 0:00          \_ bash -c if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor; else                              exit 42; fi
 5907 ? S  0:00              \_ /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor
 5914 ? S  0:00                  \_ /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor
 5915 ? S  0:00                      \_ /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 8 igor
 5920 ? S  0:00                          \_ /bin/bash ./collectd-client_control.sh kvm /var/lib/one//datastores 4124 60 8 igor
 5927 ? S  0:00                              \_ /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 60 8 igor
 5934 ? S  0:00                                  \_ /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 60 8 igor
 5935 ? S  0:00                                      \_ /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 60 8 igor
 5988 ? S  0:00                                          \_ /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 60 8 igor
 5991 ? S  0:00                                              \_ /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 60 8 igor
 5992 ? D  0:00                                                  \_ du -sLm /var/lib/one//datastores
 5993 ? S  0:00                                                  \_ cut -f1
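
For anyone checking a node for the same symptom, here is a minimal sketch that lists processes in uninterruptible sleep. The function names `filter_d_state` and `list_d_state` are made up for illustration and are not part of OpenNebula; the only assumptions are a POSIX-style `ps` and `awk` on the node.

```shell
#!/bin/bash
# Keep only the lines whose second field (the process state) starts with "D",
# i.e. processes in uninterruptible sleep. Expects "PID STAT COMMAND" lines.
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Print PID, state and command line for every process, without a header,
# then keep the ones in state D.
list_d_state() {
    ps -e -o pid= -o stat= -o args= | filter_d_state
}

list_d_state
```

An empty output means no process is currently stuck in state D.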

I think that, instead of running the probes again and again and adding more contention on the disk, ONE should report the node as ERROR.
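
As an illustration of the "do not pile up" idea only (this is not what OpenNebula does, and not the fix that was eventually applied), a probe could be guarded with a lock on the node so that a new run is skipped while an earlier one is still stuck. The sketch assumes `flock` from util-linux is installed; `LOCK_FILE` and `run_probe_once` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical lock file; any path writable by the probe user would do.
LOCK_FILE="${LOCK_FILE:-/tmp/monitor_ds.lock}"

# Run the given command only if no earlier run still holds the lock.
# "flock -n" gives up immediately with a non-zero status instead of waiting,
# so a hung probe keeps later ones from stacking up behind it.
# Note: a non-zero status can also come from the command itself.
run_probe_once() {
    flock -n "$LOCK_FILE" "$@"
}

run_probe_once echo "probe ran" || echo "probe skipped or failed" >&2
```

The lock is tied to the running command, so it stays held for as long as a hung du is alive and is released automatically when that process finally exits.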


Related issues

Related to Bug #2783: Storage size calculation for Datastore using du Closed 03/19/2014

History

#1 Updated by EOLE Team over 6 years ago

Note that I disabled the node with onehost disable igor, but OpenNebula continues to start probes on it.

#2 Updated by Ruben S. Montero over 6 years ago

  • Status changed from Pending to New
  • Target version set to Release 4.12

#3 Updated by Ruben S. Montero over 6 years ago

  • Related to Bug #2783: Storage size calculation for Datastore using du added

#4 Updated by Ruben S. Montero over 6 years ago

  • Target version changed from Release 4.12 to Release 4.10.2

#5 Updated by Ruben S. Montero over 6 years ago

  • Assignee set to Javi Fontan

#6 Updated by Javi Fontan over 6 years ago

That tree of processes is normal. The first time a node is monitored, or when no information has been received from it for a certain amount of time, OpenNebula does the following:

  • Check if the remotes are on the host and copy them if they are not
  • Start run_probes for <driver> (this is what starts that tree)
  • Start the collectd client
  • Run the probes to collect data the first time

After that you will only see the collectd client process and the probes it runs. The problem here is that du takes a long time to execute and hangs.

We have added a patch that replaces du with df, which should fix this problem. The commit is 4ec69144c9561d9200f0578ae9a4701b4dbeb5a7.
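
To show the general idea of the du-to-df swap, here is a rough sketch. It is not the code from the commit above; the function name `used_mb` is invented for illustration, and only POSIX `df -Pk` plus `awk` are assumed.

```shell
#!/bin/bash
# Print the used space, in MB, of the filesystem that holds the given path.
# df asks the filesystem for its counters instead of walking every file,
# which is what makes du slow on large or busy datastores.
used_mb() {
    # -P: one line per filesystem in the portable format
    # -k: sizes in 1024-byte blocks
    # Line 2 is the data line; field 3 is the "Used" column.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $3 / 1024 }'
}

used_mb "${1:-/}"
```

Keep in mind that df reports usage for the whole filesystem, not for the directory itself, so the numbers only match du when the datastore has a filesystem of its own. The field positions also assume a device name without spaces.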

#7 Updated by Ruben S. Montero over 6 years ago

  • Status changed from New to Closed
  • Resolution set to fixed

We'll reopen it if needed.
