When failing over HA controllers, hypervisor collectd probes do not switch to new controller
|Assignee:||Javi Fontan|
|Category:||Core & System|
|Target version:||Release 5.4|
|Affected Versions:||OpenNebula 5.0|
After a failover of HA controllers, the collectd monitoring probes on the hypervisors do not switch to sending data to the new controller. The new controller is still able to monitor the hypervisors (apparently via the SSH pull that happens when oned has not received data for a while), but the collectd-client.rb processes running on the hypervisors keep sending data to the old controller's IP. It looks like the monitoring scripts only check that collectd-client.rb is running; they do not restart it.
The workaround is to run onehost sync (or to offline/online the hosts). Tested for a few hours; the probes did not switch over automatically. This is running with the kvm IM_MAD in udp-push mode.
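To make the suspected mechanism concrete, here is a toy model in Python. It is not OpenNebula code: `PushClient`, `ensure_running`, `ensure_current` and the two addresses are hypothetical names invented for illustration. It assumes the push client fixes its destination once at startup, which is what the behaviour described above suggests but which has not been confirmed against the probe source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushClient:
    """Hypothetical stand-in for a collectd-client.rb process."""
    destination: str      # controller address fixed when the client starts
    running: bool = True


def ensure_running(client: Optional[PushClient], controller: str) -> PushClient:
    """Liveness-only check: start a client only if none is running."""
    if client is None or not client.running:
        return PushClient(destination=controller)
    return client  # already running, so the old destination is kept


def ensure_current(client: Optional[PushClient], controller: str) -> PushClient:
    """Stricter check: also replace a client whose destination is stale."""
    if client is None or not client.running or client.destination != controller:
        return PushClient(destination=controller)
    return client


OLD, NEW = "192.0.2.10", "192.0.2.11"  # example addresses, not from the report

client = ensure_running(None, OLD)      # probe started by the first controller
client = ensure_running(client, NEW)    # after failover, liveness check only
print(client.destination)               # 192.0.2.10

client = ensure_current(client, NEW)    # stricter check replaces the client
print(client.destination)               # 192.0.2.11
```

In this model the liveness-only check never notices the stale destination, so the client has to be restarted by some other means, which is what the manual workaround appears to do.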
#3 Updated by Kristian Feldsam over 4 years ago
In a standard clustered HA setup, you should have one floating IP on which collectd, oned, Sunstone, nginx, etc. all listen. When the active node fails, the second node takes over the floating IP and continues running.
So you probably have a problem in your HA setup if collectd ends up on a different IP.