Feature #4206
Poll script on host should skip failed checks
Status: | Closed | Start date: | 11/26/2015
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | Javi Fontan | % Done: | 0%
Category: | Drivers - Monitor | |
Target version: | Release 5.0 | |
Resolution: | fixed | Pull request: |
Description
When the poll script on a host encounters an error, no information is sent to the frontend. Example:
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: 2015-11-26 08:45:34.511115 7f6590276840 0 monclient(hunting): authenticate timed out after 300
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: 2015-11-26 08:45:34.511178 7f6590276840 0 librados: client.libvirt authentication error (110) Connection timed out
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: rbd: couldn't connect to the cluster!
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: ../../vmm/kvm/poll:349:in `block in get_disk_usage': undefined method `text' for nil:NilClass (NoMethodError)
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from /usr/lib/ruby/1.9.1/rexml/element.rb:905:in `block in each'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from /usr/lib/ruby/1.9.1/rexml/xpath.rb:67:in `each'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from /usr/lib/ruby/1.9.1/rexml/xpath.rb:67:in `each'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from /usr/lib/ruby/1.9.1/rexml/element.rb:905:in `each'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from ../../vmm/kvm/poll:329:in `get_disk_usage'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from ../../vmm/kvm/poll:145:in `block in get_all_vm_info'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from ../../vmm/kvm/poll:129:in `each'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from ../../vmm/kvm/poll:129:in `get_all_vm_info'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from ../../vmm/kvm/poll:856:in `print_all_vm_template'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: from ../../vmm/kvm/poll:908:in `<main>'
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][E]: Error executing poll.sh
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][E]: Error executing collectd-client_control.sh
Nov 26 08:45:34 test-oned2 oned8488: [Z0][InM][I]: ExitCode: 1
In this case the (test) Ceph cluster is unavailable, which results in a "blackout" of all VMs running on the host. Environments that depend on accounting info from ONE will lose billing info. Instead of failing completely, I would suggest that the poll script report the successfully collected metrics and emit an error or warning for the failed check.
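To illustrate the suggestion, here is a minimal sketch of per-check error handling. It is not the code in vmm/kvm/poll: `collect_metrics`, the check names, and the metric values are hypothetical and only show the idea of keeping the results that succeeded while recording a warning for each check that raised.

```ruby
# Hypothetical sketch: names below are illustrative, not taken from vmm/kvm/poll.

# Run every check; keep the results of those that succeed and record a
# warning for each one that raises, instead of aborting the whole poll.
def collect_metrics(checks)
  metrics  = {}
  warnings = []

  checks.each do |name, check|
    begin
      metrics.merge!(check.call)
    rescue StandardError => e
      warnings << "#{name}: #{e.class}: #{e.message}"
    end
  end

  [metrics, warnings]
end

# Example checks with made-up values. The disk check fails the same way as
# in the log above: a method is called on nil.
checks = {
  'cpu'  => -> { { 'CPU' => 12.5 } },
  'disk' => -> { { 'DISK_SIZE' => nil.text } },
  'net'  => -> { { 'NETRX' => 1024, 'NETTX' => 2048 } }
}

metrics, warnings = collect_metrics(checks)

# Collected metrics go to STDOUT, warnings to STDERR.
metrics.each { |key, value| puts "#{key}=#{value}" }
warnings.each { |w| warn "WARNING: check failed: #{w}" }
```

With this shape the script can still exit successfully and report CPU and network data even when the disk check cannot reach the storage backend.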
Associated revisions
feature #4206: do not crash getting disk info in poll
feature #4206: do not crash getting disk info in poll
(cherry picked from commit 7d6f91a369cdd31a788d8d44cece531357fa7fb3)
feature #4206: do not crash getting disk info in poll
(cherry picked from commit 7d6f91a369cdd31a788d8d44cece531357fa7fb3)
History
#1 Updated by Ruben S. Montero over 5 years ago
- Category set to Drivers - Monitor
- Target version set to Release 5.0
#2 Updated by Ruben S. Montero over 5 years ago
- Status changed from Pending to New
#3 Updated by Javi Fontan over 5 years ago
- Assignee set to Javi Fontan
The drivers only log STDERR when the command fails. This should be changed so that messages written to STDERR during a successful execution (warnings?) also appear in the log files.
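A rough sketch of that change, assuming a Unix shell is available. This is not the actual driver code: `run_and_log` is a hypothetical helper, and the `[W]` level tag for STDERR output of a successful command is an assumption made for illustration.

```ruby
require 'open3'

# Hypothetical helper, not the actual OpenNebula driver code.
# Runs a command and returns its STDOUT plus log lines. STDERR is always
# logged: at error level when the command fails, at warning level
# (assumed tag "[W]") when it succeeds.
def run_and_log(command)
  stdout, stderr, status = Open3.capture3(command)
  level = status.success? ? 'W' : 'E'

  log = stderr.lines.map { |line| "[#{level}]: #{line.chomp}" }
  log << "[I]: ExitCode: #{status.exitstatus}"

  [stdout, log]
end

# A command that succeeds but still writes a warning to STDERR.
stdout, log = run_and_log("echo 'CPU=12.5'; echo 'disk check failed' >&2")
puts stdout
log.each { |line| warn line }
```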
#4 Updated by Ruben S. Montero about 5 years ago
- Status changed from New to Closed
- Resolution set to fixed