Bug #4796
When failing over HA controllers, hypervisor collectd probes do not switch to new controller
Status: | Closed | Start date: | 09/19/2016 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | Javi Fontan | % Done: | 0% |
Category: | Core & System | | |
Target version: | Release 5.4 | | |
Resolution: | fixed | Pull request: | |
Affected Versions: | OpenNebula 5.0 | | |
Description
After a failover of HA controllers, the collectd monitoring probes on the hypervisors do not switch to sending data to the new controller. The new controller is still able to monitor the hypervisors (apparently via the SSH pull that happens when oned has not received data for a while), but the collectd-client.rb processes running on the hypervisors keep sending data to the old controller's IP. It looks like the monitoring scripts only make sure that collectd-client.rb is running; they do not restart it.
The fix is to run onehost sync (or to offline/online the hosts). (Tested for a few hours; the probes did not switch over automatically in that time.) This is running with IM_MAD kvm udp-push.
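The behaviour described above can be sketched as a small decision function. This is only an illustration of the reported problem, not the committed fix and not OpenNebula code: the names PusherState, target_ip and next_action are hypothetical, and the sketch assumes the monitoring pass can find out which controller IP the running push client was started with.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PusherState:
    """Hypothetical view of the push client running on a hypervisor."""
    running: bool
    target_ip: Optional[str]  # controller IP the client was started with


def next_action(state: PusherState, active_controller_ip: str) -> str:
    """Decide what an active monitoring pass should do with the push client.

    Returns "start", "restart", or "keep".
    """
    if not state.running:
        # Nothing is pushing data yet.
        return "start"
    if state.target_ip != active_controller_ip:
        # Running, but still pushing to the controller that was active
        # before the failover: the case reported in this issue.
        return "restart"
    return "keep"
```

A check that only looks at whether the client is running corresponds to skipping the second branch, which matches the reported symptom: the client stays up and keeps pushing to the old controller.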
Related issues
Associated revisions
B #4796: restart collectd one active monitorization
B #4796: restart collectd one active monitorization
(cherry picked from commit 5db34212ef837b5314205aa89cedd3f4c229418a)
History
#1 Updated by Ruben S. Montero almost 5 years ago
- Related to Feature #4809: Simplify HA management in OpenNebula added
#2 Updated by Ruben S. Montero almost 5 years ago
- Category set to Core & System
- Target version set to Release 5.4
#3 Updated by Kristian Feldsam almost 5 years ago
In a standard clustered HA setup, you should have one floating IP on which collectd, oned, sunstone, nginx, etc. listen. When the active node fails, the second node gets the floating IP and continues running.
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_add_a_resource.html
So you probably have a problem in your HA setup if collectd gets a different IP.
#4 Updated by Ruben S. Montero about 4 years ago
- Assignee set to Jaime Melis
#5 Updated by Javi Fontan almost 4 years ago
- Status changed from Pending to Closed
- Assignee changed from Jaime Melis to Javi Fontan
- Resolution set to fixed
Fixed both in master and one-5.2