Making reservations in parallel using onevnet causes oned to slow down and then hang
|Assignee:||Ruben S. Montero||% Done:|
|Category:||Core & System|
|Target version:||Release 5.0|
|Affected Versions:||OpenNebula 4.12, OpenNebula 4.14|
When running multiple onevnet reserve commands concurrently, the first one succeeds but the rest fail with Net::ReadTimeout. After that, any onevnet command or Sunstone virtual network operation times out. Operations that do not involve virtual networks get progressively slower from this point, until OpenNebula eventually has to be restarted, after which it is usable once more.
We are able to reproduce this consistently with:
onevnet reserve vnet -n deleteme-1 -s 1 -a 0 -i 10.20.37.70 & onevnet reserve vnet -n deleteme-2 -s 1 -a 0 -i 10.20.37.71 & onevnet reserve vnet -n deleteme-3 -s 1 -a 0 -i 10.20.37.72
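For anyone who wants to vary the number of concurrent requests, the one-liner above can be wrapped in a small helper. This is only a sketch: the function name reserve_parallel and the ONEVNET and VNET variables are hypothetical, introduced here for illustration, and the onevnet flags are copied unchanged from the command above. Unlike the one-liner, it sends every request to the background and then waits for all of them.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenNebula.
# ONEVNET can be overridden (e.g. ONEVNET=echo) to print the commands
# instead of running them; VNET is the network to reserve from.
ONEVNET="${ONEVNET:-onevnet}"
VNET="${VNET:-vnet}"

# reserve_parallel COUNT FIRST_OCTET
# Launches COUNT single-address reservations at once, using
# 10.20.37.FIRST_OCTET, 10.20.37.FIRST_OCTET+1, ... as in the report.
reserve_parallel() {
    count=$1
    first=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        n=$((i + 1))
        ip="10.20.37.$((first + i))"
        # Same flags as the original reproduction command.
        "$ONEVNET" reserve "$VNET" -n "deleteme-$n" -s 1 -a 0 -i "$ip" &
        i=$((i + 1))
    done
    # Block until every background request has returned or timed out.
    wait
}
```

Calling reserve_parallel 3 70 should issue the same three requests as the command above; setting ONEVNET=echo first gives a dry run that only prints them.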
These operations did work previously on our install. We have run onedb fsck, which revealed the following inconsistencies:
User 11 quotas: VNet 58 LEASES_USED has 1 is 0
Group 100 quotas: VNet 58 LEASES_USED has 1 is 0
Total errors found: 2
Total errors repaired: 2
Total errors unrepaired: 0
Network 58 has now been deleted, although vestigial quotas for it remain associated with some of our groups and users. We are unable to use onegroup quota and oneuser quota to remove these. The problem still occurs with newly created virtual networks.
#2 Updated by Richard Stevenson over 4 years ago
Thanks for getting back to me.
As you can see above, we are asking for one address (and interface) per request, and specifying the IP explicitly as we do so. I would therefore think that the specific resource we are requesting is either available or not; there is no scope to fulfil that request and have an overflow. The detail for issue #4419 is rather brief. I'll happily check further if you can point me to more detail on the issue.