Feature #4809

Simplify HA management in OpenNebula

Added by Ruben S. Montero about 4 years ago. Updated over 3 years ago.

Status: Closed
Start date: 09/21/2016
Priority: Normal
Due date:
Assignee: Vlastimil Holer
% Done: 0%
Category: Core & System
Target version: Release 5.4
Resolution: fixed
Pull request:

Description

This issue is to simplify the process of setting up OpenNebula services in HA. Currently the system is designed to be compatible with classical Linux cluster suites. In practice, this imposes too many requirements (e.g. the need for reliable fencing) and leads to deployments that are not easy to maintain.


Related issues

Related to Bug #4796: When failing over HA controllers, hypervisor collectd pro... Closed 09/19/2016

Associated revisions

Revision e1cb2c92
Added by Ruben S. Montero over 3 years ago

F #4809: Add a base class for extended template attributes

Revision d03ed158
Added by Ruben S. Montero over 3 years ago

F #4809: Add zone server list to zone data

Revision ebc810f4
Added by Ruben S. Montero over 3 years ago

F #4809: Add Server to Zone API call

Revision c360b015
Added by Ruben S. Montero over 3 years ago

F #4809: AddServer zone API in federated setups

Revision cd580714
Added by Ruben S. Montero over 3 years ago

F #4809: Delete servers from zone

Revision a6d4ab3c
Added by Ruben S. Montero over 3 years ago

F #4809: Update Sql database interface to include read/write/bootstrap
operations

Revision e4848c55
Added by Ruben S. Montero over 3 years ago

F #4809: Overwrite SqlDB db_exec_* methods in DB implementations

Revision bae57600
Added by Ruben S. Montero over 3 years ago

F #4809: Some notes on the implementation of the log replication for
zone servers

Revision c8981e82
Added by Ruben S. Montero over 3 years ago

F #4809: Some implementation files. Fix compilation issues

Revision 116425fc
Added by Ruben S. Montero over 3 years ago

F #4809: Template for the LogDBManager

Revision dd0598aa
Added by Ruben S. Montero over 3 years ago

F #4809: Work on log management and replication

Revision bca17f4e
Added by Ruben S. Montero over 3 years ago

F #4809: Update SqlDB method name. DO NOT replicate monitoring data

Revision ed0d64a2
Added by Ruben S. Montero over 3 years ago

F #4809: Updated log structure

Revision b26e5a71
Added by Ruben S. Montero over 3 years ago

F #4809: Replication logic

Revision f2039e02
Added by Ruben S. Montero over 3 years ago

F #4809: Added LogDBManager to Zone server

Revision 034e2316
Added by Ruben S. Montero over 3 years ago

F #4809: Start replication on new exec_wr calls

Revision 0db5478b
Added by Ruben S. Montero over 3 years ago

F #4809: Notify clients on majority replication

Revision 0c8299f1
Added by Ruben S. Montero over 3 years ago

feature #4809: Bootstrap LogDB tables in DB, initialize next_index

Revision 4c577126
Added by Ruben S. Montero over 3 years ago

F #4809: Improve replication logic, new RaftManager to control server
states

Revision 7c479b8e
Added by Ruben S. Montero over 3 years ago

F #4809: Restructure Raft implementation

Revision 900c37fd
Added by Ruben S. Montero over 3 years ago

F #4809: Solves some bugs

Revision cc08fd98
Added by Ruben S. Montero over 3 years ago

F #4809: replicate logic for leader

Revision 88513088
Added by Ruben S. Montero over 3 years ago

F #4809: ReplicateLog client logic

Revision 613ec634
Added by Ruben S. Montero over 3 years ago

F #4809: Fix replication logic

Revision 6f976c1e
Added by Ruben S. Montero over 3 years ago

F #4809: Make some Raft events synchronous with replication threads

Revision dd51b583
Added by Ruben S. Montero over 3 years ago

F #4809: DO NOT monitor marketplaces, nor datastores in followers

Revision 5441a2f8
Added by Ruben S. Montero over 3 years ago

F #4809: Log retention system

Revision 3be9f38d
Added by Ruben S. Montero over 3 years ago

F #4809: ActionManager can set timers in ns. Adjust Raft manager to ms timers. Include Heartbeat control from leader and follower. XML-RPC calls are now async. RaftManager db calls are now decoupled from pool classes.

Revision 071d0d84
Added by Ruben S. Montero over 3 years ago

F #4809: start/stop replica threads when adding/deleting servers

Revision 3f9638be
Added by Ruben S. Montero over 3 years ago

F #4809: Configure raft timers through oned.conf

Revision 2a695bc8
Added by Ruben S. Montero over 3 years ago

F #4809: Persist raft state. Implement Vote API call

Revision c6a7500d
Added by Ruben S. Montero over 3 years ago

F #4809: Adds vote requests from candidate, fix some bugs

Revision 03c5698a
Added by Ruben S. Montero over 3 years ago

F #4809: Fix several issues accessing the log records

Revision 9e1fd142
Added by Ruben S. Montero over 3 years ago

F #4809: Debug election process

Revision c14d648e
Added by Ruben S. Montero over 3 years ago

F #4809: Method to query raft status on servers. Updated CLI to make
use of it

Revision c8f77ab8
Added by Ruben S. Montero over 3 years ago

F #4809: Remove cache from core pools

Revision 9d397346
Added by Ruben S. Montero over 3 years ago

F #4809: Keep last_index counters internally. Commit log records on
heartbeats

Revision 503b2835
Added by Ruben S. Montero over 3 years ago

F #4809: Update API internal name to match public xml-rpc names. Do not log
heartbeat/replicate log entries

Revision c1317ed6
Added by Ruben S. Montero over 3 years ago

F #4809: Store the leader_id for the term

Revision 59cf651d
Added by Ruben S. Montero over 3 years ago

F #4809: API methods leader_only attribute. List, Info and Raft methods
are not leader only.

Revision 590b3548
Added by Ruben S. Montero over 3 years ago

F #4809: Forward request to leader

Revision ac76ab45
Added by Ruben S. Montero over 3 years ago

F #4809: Reload ACL rule cache when a server becomes zone leader

Revision 5f6a627b
Added by Ruben S. Montero over 3 years ago

F #4809: Do not schedule VMs on non-leader servers

Revision 1bcfc2bd
Added by Ruben S. Montero over 3 years ago

F #4809: Less verbose output

Revision 741a55d1
Added by Ruben S. Montero over 3 years ago

F #4809: Generic replication serves. Solve minor bugs in election
process

Revision b38874a0
Added by Ruben S. Montero over 3 years ago

F #4809: Moved zone server list to ZonePool. Added FedLogDB class

Revision faa46d9c
Added by Ruben S. Montero over 3 years ago

F #4809: fix compilation

Revision cfe3b415
Added by Jaime Melis over 3 years ago

F #4809: Handle properly if only one server in the zone

Revision 5e92ec0a
Added by Ruben S. Montero over 3 years ago

F #4809: Use int for object IDs. Added base federation replica manager

Revision 85b6ed25
Added by Jaime Melis over 3 years ago

F #4809: Implement Hooks on RAFT events

Revision 94052286
Added by Jaime Melis over 3 years ago

F #4809: Move the FT hooks to its own folder

Revision 680d49a2
Added by Jaime Melis over 3 years ago

F #4809: Add RAFT hooks

Revision 2676d8b7
Added by Ruben S. Montero over 3 years ago

feature #4809: Start the federation replica manager with OpenNebula

Revision 4291e4a8
Added by Ruben S. Montero over 3 years ago

F #4809: Do not start replica threads for own zone

Revision 46d4e232
Added by Ruben S. Montero over 3 years ago

F #4809: Fix timers in federation replica manager

Revision c5a82aba
Added by Ruben S. Montero over 3 years ago

F #4809: Update server list on election

Revision c5f54f81
Added by Ruben S. Montero over 3 years ago

F #4809: First version of replication thread for federated zones

Revision 4a780941
Added by Ruben S. Montero over 3 years ago

F #4809: Refresh zone list on server add/remove to keep an updated list
of servers

Revision 31ba4d4d
Added by Ruben S. Montero over 3 years ago

F #4809: Fix some bugs when replicating log entries to slave zones

Revision afca14d7
Added by Ruben S. Montero over 3 years ago

F #4809: Initialize last_index of federated log when starting replica
threads

Revision a216bb9e
Added by Ruben S. Montero over 3 years ago

F #4809: Raft servers / zone slave lists updated on server-add in the
right place (leaders/masters)

Revision 98d26e9e
Added by Ruben S. Montero over 3 years ago

F #4809: Do not accept empty records. Increase default xml-rpc timeouts

Revision c8d88de0
Added by Ruben S. Montero over 3 years ago

F #4809: Do not double-unlock raft mutex in timer. Fix update of
heartbeat counter

Revision 8488320e
Added by Javi Fontan over 3 years ago

F #4809: add migrator for logdb and fed_logdb tables

Revision 02b229d7
Added by Ruben S. Montero over 3 years ago

F #4809: Fix cache issues when updating PCI devices on VM

Revision 3d9686e3
Added by Ruben S. Montero over 3 years ago

F #4809: Fix history update for non-cache vm objects

Revision 9d763ef6
Added by Jaime Melis over 3 years ago

F #4809: Fully working vip hooks

Revision 54c1fbaa
Added by Jaime Melis over 3 years ago

F #4809: Pass arguments to the raft hooks

Revision 75b8889e
Added by Ruben S. Montero over 3 years ago

F #4809: Fix cache issue when the VM outdated set is modified

Revision 234734a3
Added by Ruben S. Montero over 3 years ago

F #4809: Better query to purge log records

Revision f12fdfae
Added by Ruben S. Montero over 3 years ago

F #4809: Compress DB commands.

Revision 50880bb2
Added by Ruben S. Montero over 3 years ago

F #4809: Do not write log in solo mode, update instead of insert for
timestamps

Revision 50d0a6b4
Added by Jaime Melis over 3 years ago

F #4809: sql is a reserved keyword in MySQL

Revision ab63767e
Added by Jaime Melis over 3 years ago

F #4809: Tool to backup/restore the database (with federated option)

Revision d3c7a07a
Added by Ruben S. Montero over 3 years ago

F #4809: Check log records are properly loaded from DB

Revision 654b384b
Added by Ruben S. Montero over 3 years ago

F #4809: Do not use log after unlocking objects

Revision e80386ef
Added by Ruben S. Montero over 3 years ago

F #4809: Better timeouts for client xmlrpc-c

Revision b768f896
Added by Ruben S. Montero over 3 years ago

F #4809: Execute follower hook on startup

Revision 0a2f18a7
Added by Ruben S. Montero over 3 years ago

F #4809: Better heartbeat management. A separated thread is created for
each follower

Revision 953a7f16
Added by Ruben S. Montero over 3 years ago

f #4809: Do not include log info on raft.status when operating in solo
mode

Revision bd4040d3
Added by Ruben S. Montero over 3 years ago

F #4809: Do not bootstrap DB before upgrading

Revision 486a4856
Added by Anton Todorov over 3 years ago

F #4809: rename sql to sqlcmd in database_schema.rb

'sql' is reserved word in MariaDB

Revision 169d153e
Added by Jaime Melis over 3 years ago

Merge pull request #324 from atodorov-storpool/patch-16

F #4809: rename sql to sqlcmd in database_schema.rb

Revision 978d1538
Added by Ruben S. Montero over 3 years ago

F #4809: Load Hook manager earlier to be able to execute raft hooks on
start

Revision 585cc130
Added by Jaime Melis over 3 years ago

F #4809: Typo in the name of the hook

Revision 6d61d510
Added by Ruben S. Montero over 3 years ago

F #4809: Fix deadlock when stopping replica threads

Revision f4ec2c07
Added by Ruben S. Montero over 3 years ago

F #4809: Fix error when deleting server from zone in standalone mode

Revision 2f5a3185
Added by Ruben S. Montero over 3 years ago

F #4809: Start/Stop zone replication threads when adding/deleting zones

Revision 97474282
Added by Ruben S. Montero over 3 years ago

F #4809: sync zone update in server add when adding a HA follower in a
federation (for 2 server corner case). Check for empty records in
federated log replication

Revision dff203fe
Added by Ruben S. Montero over 3 years ago

F #4809: Adding missing unlock()

Revision d00e3798
Added by Ruben S. Montero over 3 years ago

F #4809: Add missing migrators

Revision 0888ae48
Added by Ruben S. Montero over 3 years ago

F #4809: Fix log rewrite. Disable curl timeouts.

Revision c98457ae
Added by Ruben S. Montero over 3 years ago

F #4809: Better cancellation of replica threads

Revision 468a104b
Added by Ruben S. Montero over 3 years ago

F #4809: Enable curl timeouts

Revision 83d158d6
Added by Ruben S. Montero over 3 years ago

F #4809: Fix memory leak after disabling the pool cache

Revision a2c5a4cb
Added by Ruben S. Montero over 3 years ago

F #4809: Fix index in PoolSQL. Update interface in all classes

Revision e4280206
Added by Ruben S. Montero over 3 years ago

F #4809: Fix replication on failed zones, adjust timeout for replication

Revision a09c6d48
Added by Jaime Melis over 3 years ago

F #4809: Update VM in DB after generating context

This is an issue now because the cache has been
removed.

Revision 47050433
Added by Ruben S. Montero over 3 years ago

F #4809: Do not start Raft timer in solo mode

Revision 15001cb1
Added by Ruben S. Montero over 3 years ago

F #4809: Fix ns retries

Revision ca2a1a42
Added by Ruben S. Montero over 3 years ago

F #4809: Fix non-persistent state of Hosts

Revision 3224a504
Added by Jaime Melis over 3 years ago

F #4809: Cleanup VIP if oned dies

Revision 8e19b566
Added by Jaime Melis over 3 years ago

F #4809: Cleanup script

Revision cc6cc460
Added by Ruben S. Montero over 3 years ago

F #4809: Add replicated log index information on server zones

Revision 509d98df
Added by Jaime Melis over 3 years ago

F #4809: Send gratuitous ARP to update VIP

Revision d5d6cb96
Added by Ruben S. Montero over 3 years ago

F #4809: Get fed index from the DB (needed by followers in HA). Use Zone
ENDPOINT to replicate log instead of server list. Fix bug when replicate
fails in a zone.

Revision dbc47c98
Added by Ruben S. Montero over 3 years ago

F #4809: Remove unneeded update_zone calls when adding a server to a
zone

Revision 192ce9db
Added by Ruben S. Montero over 3 years ago

F #4809: Fix replica log for federation

Revision 4f8195d2
Added by Ruben S. Montero over 3 years ago

F #4809: Better management of last index of federated log

Revision 0c9272b6
Added by Ruben S. Montero over 3 years ago

F #4809: Fix divergence problems when replicating the fed log in an HA
zone

Revision 361409e7
Added by Ruben S. Montero over 3 years ago

F #4809: Compress federated log

Revision c6286c03
Added by Ruben S. Montero over 3 years ago

F #4809: Safer purge function for RAFT log

Revision 80d08166
Added by Ruben S. Montero over 3 years ago

F #4809: Re-design replicated log structure

Revision 899b460a
Added by Ruben S. Montero over 3 years ago

F #4809: Enable federation of solo Zones

Revision fab2a07f
Added by Ruben S. Montero over 3 years ago

F #4809: Log information to debug federated zones with HA clusters. THIS
COMMIT IS MEANT TO BE REVERTED

Revision 50ab087c
Added by Ruben S. Montero over 3 years ago

F #4809: Update onedb backup federated backup utility

Revision 6f4a45c8
Added by Ruben S. Montero over 3 years ago

Revert "F #4809: Log information to debug federated zones with HA clusters. THIS"

This reverts commit fab2a07f74f55528631fa5b6159e80c1fa884637.

Revision 6d24617e
Added by Ruben S. Montero over 3 years ago

F #4809: Pre-allocate lastoid to prevent stale id's in the pool in case
of leader failure

Revision fcf08d42
Added by Ruben S. Montero over 3 years ago

F #4809: Update migrator. There is no longer need to add servers to a
zone to configure a federation if not using HA

Revision 87b5e5cb
Added by Ruben S. Montero over 3 years ago

F #4809: Re-design replicated log structure

Revision 215bc0df
Added by Ruben S. Montero over 3 years ago

F #4809: Enable federation of solo Zones

Revision 7b22d875
Added by Ruben S. Montero over 3 years ago

F #4809: Log information to debug federated zones with HA clusters. THIS
COMMIT IS MEANT TO BE REVERTED

Revision 25b48b1e
Added by Ruben S. Montero over 3 years ago

F #4809: Update onedb backup federated backup utility

Revision 3378c9a2
Added by Ruben S. Montero over 3 years ago

Revert "F #4809: Log information to debug federated zones with HA clusters. THIS"

This reverts commit fab2a07f74f55528631fa5b6159e80c1fa884637.

Revision cfd29830
Added by Ruben S. Montero over 3 years ago

F #4809: Pre-allocate lastoid to prevent stale id's in the pool in case
of leader failure

Revision 031da2d3
Added by Ruben S. Montero over 3 years ago

F #4809: Update migrator. There is no longer need to add servers to a
zone to configure a federation if not using HA

Revision b315b710
Added by Ruben S. Montero over 3 years ago

F #4809: Fix race condition updating next index in federated log.

History

#1 Updated by Ruben S. Montero about 4 years ago

  • Related to Bug #4796: When failing over HA controllers, hypervisor collectd probes do not switch to new controller added

#2 Updated by Ruben S. Montero over 3 years ago

  • Assignee set to Ruben S. Montero

#3 Updated by Ruben S. Montero over 3 years ago

  • Assignee changed from Ruben S. Montero to Vlastimil Holer

#4 Updated by Ruben S. Montero over 3 years ago

  • Status changed from New to Closed
  • Resolution set to fixed
