Bug #4516

Making reservations in parallel using onevnet causes oned to slow down and then hang

Added by Richard Stevenson over 4 years ago. Updated over 4 years ago.

Status: Closed
Start date: 06/02/2016
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Core & System
Target version: Release 5.0
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 4.12, OpenNebula 4.14

Description

When running multiple onevnet reserve commands concurrently, the first one succeeds but the rest fail with Net::ReadTimeout. After that, any onevnet commands or Sunstone virtual network operations will time out. Other, non-virtual-network operations get progressively slower after this point, eventually requiring OpenNebula to be restarted, at which point it is usable once more.

We are able to reproduce this consistently with:

onevnet reserve vnet -n deleteme-1 -s 1 -a 0 -i 10.20.37.70 &
onevnet reserve vnet -n deleteme-2 -s 1 -a 0 -i 10.20.37.71 &
onevnet reserve vnet -n deleteme-3 -s 1 -a 0 -i 10.20.37.72
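Until a fixed release is available, one possible workaround (our assumption, not something suggested in the ticket) is to serialize the reserve calls through a single flock(1) advisory lock, so oned only ever processes one reservation at a time even when the calls are launched in the background. `reserve_one` below is a hypothetical stand-in for the real onevnet invocation:

```shell
# Sketch of a workaround, assuming the deadlock only triggers when two
# reservations run concurrently: funnel every reserve through one lock file.
LOCK=/tmp/onevnet-reserve.lock

reserve_one() {
    # Stand-in for the real call on a live system:
    #   onevnet reserve vnet -n "deleteme-$1" -s 1 -a 0 -i "$2"
    echo "reserved deleteme-$1 at $2"
}

for i in 1 2 3; do
    (
        flock 9                               # block until we hold the lock
        reserve_one "$i" "10.20.37.$((69 + i))"
    ) 9>"$LOCK" &
done
wait
```

This keeps the `&` backgrounding from the repro above but guarantees the reservations reach oned one at a time.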

These operations did work previously on our install. We have run onedb fsck, which revealed the following inconsistencies:

User 11 quotas: VNet 58    LEASES_USED has 1     is    0
Group 100 quotas: VNet 58    LEASES_USED has 1     is    0
Total errors found: 2
Total errors repaired: 2
Total errors unrepaired: 0

Network 58 has now been deleted, although vestigial quotas for it remain associated with some of our groups and users. We are unable to use onegroup quota and oneuser quota to remove these. The problem still occurs with newly created virtual networks.

Associated revisions

Revision 57976f61
Added by Ruben S. Montero over 4 years ago

Bug #4516: Fix deadlock with concurrent reservations on the same vnet.
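The commit message points to a lock-ordering deadlock. As a general illustration only (not OpenNebula's actual code, which is C++ inside oned), two code paths that acquire the same pair of locks in opposite orders can each end up holding one lock while waiting forever for the other; imposing a single global acquisition order removes the circular wait:

```python
import threading

vnet_lock = threading.Lock()    # e.g. the parent vnet's mutex
quota_lock = threading.Lock()   # e.g. a user/group quota mutex

# Deadlock-prone pattern: one path takes vnet_lock then quota_lock while
# another takes quota_lock then vnet_lock. With unlucky timing, each thread
# holds one lock and blocks forever on the other.

# Fix: every path acquires the locks in one fixed order, vnet before quota.
def reserve(results, name):
    with vnet_lock:             # same order on every code path...
        with quota_lock:        # ...so no circular wait can form
            results.append(name)

results = []
threads = [threading.Thread(target=reserve, args=(results, f"deleteme-{i}"))
           for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # -> ['deleteme-1', 'deleteme-2', 'deleteme-3']
```

All three concurrent "reservations" complete because no thread can ever hold the second lock while waiting for the first.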

History

#1 Updated by Ruben S. Montero over 4 years ago

Hi Richard,

Could you check whether you are somehow being hit by #4419? That is, do you have any AR with an address overflow? I am not sure, but that could possibly lock some ARs...

Thanks

#2 Updated by Richard Stevenson over 4 years ago

Hi Ruben,

Thanks for getting back to me.

As you can see above, we are asking for one address (and interface) per request, and specifying the IP explicitly as we do so. I would therefore think that the specific resource we are requesting is either available or not; there is no scope to fulfil that request and have an overflow. The detail for issue #4419 is rather brief. I'll happily check further if you can point me to more detail on the issue.

Cheers,
Richard

#3 Updated by Ruben S. Montero over 4 years ago

  • Category set to Core & System
  • Status changed from Pending to New
  • Assignee set to Ruben S. Montero
  • Target version set to Release 5.0

Hi Richard,

I can confirm this issue, we are working on it.

Thanks for reporting!

#4 Updated by Richard Stevenson over 4 years ago

Thanks for confirming, glad you can reproduce the bug. I'd be much obliged if you'd keep the ticket up-to-date as and when you know more about the nature of the issue.

#5 Updated by Ruben S. Montero over 4 years ago

This is now fixed in master and ready for 5.0

#6 Updated by Ruben S. Montero over 4 years ago

  • Status changed from New to Closed
  • Resolution set to fixed
  • Affected Versions OpenNebula 4.8 added

#7 Updated by Ruben S. Montero over 4 years ago

  • Affected Versions OpenNebula 4.12, OpenNebula 4.14 added
