Bug #4516

Making reservations in parallel using onevnet causes oned to slow down and then hang

Added by Richard Stevenson over 4 years ago. Updated over 4 years ago.

Status: Closed
Start date: 06/02/2016
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Core & System
Target version: Release 5.0
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 4.12, OpenNebula 4.14

Description

When running multiple onevnet reserve commands concurrently, the first one succeeds but the rest fail with Net::ReadTimeout. After that, any onevnet commands or Sunstone virtual network operations will time out. Other, non-virtual-network operations get progressively slower after this point, eventually requiring OpenNebula to be restarted, at which point it is usable once more.

We are able to reproduce this consistently with:

onevnet reserve vnet -n deleteme-1 -s 1 -a 0 -i 10.20.37.70 &
onevnet reserve vnet -n deleteme-2 -s 1 -a 0 -i 10.20.37.71 &
onevnet reserve vnet -n deleteme-3 -s 1 -a 0 -i 10.20.37.72
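Until a fixed release is available, one possible workaround (our assumption, not something suggested in the ticket) is to serialize the reserve calls through a single flock(1) advisory lock, so oned only ever processes one reservation at a time even when the calls are launched in the background. `reserve_one` below is a hypothetical stand-in for the real onevnet invocation:

```shell
# Sketch of a workaround, assuming the deadlock only triggers when two
# reservations run concurrently: funnel every reserve through one lock file.
LOCK=/tmp/onevnet-reserve.lock

reserve_one() {
    # Stand-in for the real call on a live system:
    #   onevnet reserve vnet -n "deleteme-$1" -s 1 -a 0 -i "$2"
    echo "reserved deleteme-$1 at $2"
}

for i in 1 2 3; do
    (
        flock 9                               # block until we hold the lock
        reserve_one "$i" "10.20.37.$((69 + i))"
    ) 9>"$LOCK" &
done
wait
```

This keeps the `&` backgrounding from the repro above but guarantees the reservations reach oned one at a time.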

These operations did work previously on our install. We have run onedb fsck, which revealed the following inconsistencies:

User 11 quotas: VNet 58    LEASES_USED has 1     is    0
Group 100 quotas: VNet 58    LEASES_USED has 1     is    0
Total errors found: 2
Total errors repaired: 2
Total errors unrepaired: 0

Network 58 has now been deleted, although vestigial quotas for it remain associated with some of our groups and users. We are unable to use onegroup quota and oneuser quota to remove these. The problem still occurs with newly created virtual networks.

Associated revisions

Revision 57976f61
Added by Ruben S. Montero over 4 years ago

Bug #4516: Fix deadlock with concurrent reservations on the same vnet.
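The commit message points to a lock-ordering deadlock. As a general illustration only (not OpenNebula's actual code, which is C++ inside oned), two code paths that acquire the same pair of locks in opposite orders can each end up holding one lock while waiting forever for the other; imposing a single global acquisition order removes the circular wait:

```python
import threading

vnet_lock = threading.Lock()    # e.g. the parent vnet's mutex
quota_lock = threading.Lock()   # e.g. a user/group quota mutex

# Deadlock-prone pattern: one path takes vnet_lock then quota_lock while
# another takes quota_lock then vnet_lock. With unlucky timing, each thread
# holds one lock and blocks forever on the other.

# Fix: every path acquires the locks in one fixed order, vnet before quota.
def reserve(results, name):
    with vnet_lock:             # same order on every code path...
        with quota_lock:        # ...so no circular wait can form
            results.append(name)

results = []
threads = [threading.Thread(target=reserve, args=(results, f"deleteme-{i}"))
           for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # -> ['deleteme-1', 'deleteme-2', 'deleteme-3']
```

All three concurrent "reservations" complete because no thread can ever hold the second lock while waiting for the first.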

History

#1 Updated by Ruben S. Montero over 4 years ago

Hi Richard,

Could you check whether you are somehow being hit by #4419? That is, do you have any AR with an address overflow? I am not sure, but that could possibly lock some ARs...

Thanks

#2 Updated by Richard Stevenson over 4 years ago

Hi Ruben,

Thanks for getting back to me.

As you can see above, we are asking for one address (and interface) per request, and specifying the IP explicitly as we do so. I would therefore think that the specific resource we are requesting is either available or not; there is no scope to fulfil that request and have an overflow. The detail for issue #4419 is rather brief. I'll happily check further if you can point me to more detail on the issue.

Cheers,
Richard

#3 Updated by Ruben S. Montero over 4 years ago

  • Category set to Core & System
  • Status changed from Pending to New
  • Assignee set to Ruben S. Montero
  • Target version set to Release 5.0

Hi Richard,

I can confirm this issue, we are working on it.

Thanks for reporting!

#4 Updated by Richard Stevenson over 4 years ago

Thanks for confirming, glad you can reproduce the bug. I'd be much obliged if you'd keep the ticket up-to-date as and when you know more about the nature of the issue.

#5 Updated by Ruben S. Montero over 4 years ago

This is now fixed in master and ready for 5.0

#6 Updated by Ruben S. Montero over 4 years ago

  • Status changed from New to Closed
  • Resolution set to fixed
  • Affected Versions OpenNebula 4.8 added

#7 Updated by Ruben S. Montero over 4 years ago

  • Affected Versions OpenNebula 4.12, OpenNebula 4.14 added
