Bug #5195

Race condition when VMs statistics are gathered from vCenter and a VM is deleted

Added by Miguel Ángel Álvarez Cabrerizo over 3 years ago. Updated over 3 years ago.

Status: Closed
Start date: 06/16/2017
Priority: Normal
Due date:
Assignee: Miguel Ángel Álvarez Cabrerizo
% Done: 0%
Category: vCenter
Target version: Release 5.4
Resolution: fixed
Pull request:
Affected Versions: Development

Description

The vCenter Performance Manager is used to retrieve statistics such as nettx, netrx and disk IOPS. When a host is polled, OpenNebula asks the Performance Manager for information not only about VMs deployed by OpenNebula but also about wild VMs.

OpenNebula builds a query that contains the vCenter VM objects and their references. While that query is being launched, one or more VMs may be deleted by a terminate action. When vCenter receives the query, some of the referenced VM objects no longer exist, so vCenter cannot provide their information and fails with the exception "ManagedObjectNotFound: The object has already been deleted or has not been completely created". In other words, a VM included in the Performance Manager query was deleted after the query was launched.

Associated revisions

Revision 8d9bada5
Added by Miguel Ángel Álvarez Cabrerizo over 3 years ago

B #5195: Fix vCenter stats gather may fail if VM is deleted while metrics are gathered

History

#1 Updated by Miguel Ángel Álvarez Cabrerizo over 3 years ago

  • Subject changed from "Race condition when" to "Race condition when VMs statistics are gathered from vCenter and a VM is deleted"

Unfortunately, if a single VM in the query no longer exists, the whole Performance Manager query fails. The alternative, running the perf queries one VM at a time, is quite inefficient. So a rescue {} is added: if the Performance Manager query fails, we assume that the statistics are 0.

    stats = pm.retrieve_stats(
        vm_objects,
        ['net.transmitted', 'net.bytesRx', 'net.bytesTx', 'net.received',
         'virtualDisk.numberReadAveraged', 'virtualDisk.numberWriteAveraged',
         'virtualDisk.read', 'virtualDisk.write'],
        { max_samples: max_samples }
    ) rescue {}
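To illustrate how that fallback behaves, here is a minimal, self-contained sketch. It does not use the real vSphere bindings: StubPerfManager, the ManagedObjectNotFound class defined here, gather_stats and metric_total are all hypothetical names made up for this example, and the shape of the returned stats hash is an assumption. The point is only that the rescue modifier turns a failed batch query into an empty hash, and that a lookup against that empty hash can then default to 0.

```ruby
# Hypothetical stand-in for the vCenter fault; the real exception class
# comes from the vSphere bindings and is not reproduced here.
class ManagedObjectNotFound < StandardError; end

# Hypothetical stub: fails the whole batch if any requested VM is gone,
# mimicking the behaviour described in this issue.
class StubPerfManager
  def initialize(existing_vms)
    @existing_vms = existing_vms
  end

  def retrieve_stats(vm_objects, metrics, _opts = {})
    missing = vm_objects - @existing_vms
    unless missing.empty?
      raise ManagedObjectNotFound,
            'The object has already been deleted or has not been completely created'
    end

    # Assumed result shape: { vm => { metrics: { metric_name => [samples] } } }
    vm_objects.each_with_object({}) do |vm, stats|
      stats[vm] = { metrics: metrics.each_with_object({}) { |m, h| h[m] = [1] } }
    end
  end
end

# Same pattern as the fix: any StandardError yields an empty hash.
def gather_stats(pm, vm_objects, max_samples)
  stats = pm.retrieve_stats(
    vm_objects,
    ['net.bytesRx', 'net.bytesTx'],
    { max_samples: max_samples }
  ) rescue {}
  stats
end

# With an empty stats hash every metric falls back to 0.
def metric_total(stats, vm, metric)
  samples = stats.dig(vm, :metrics, metric)
  samples ? samples.sum : 0
end
```

Note that the rescue modifier catches every StandardError, not only the deleted-object fault, so in this sketch any other failure of the query is also silently reported as 0.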

The VCENTER_LAST_PERF_POLL attribute records the last time vCenter statistics were successfully retrieved for VMs and determines how many data samples are queried. It is not updated if the perf query fails. Therefore, the next poll retrieves all the samples that could not be used, so the statistics data is not lost; it is only aggregated into a different time slot.
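The bookkeeping described above can be sketched roughly as follows. This is not the driver's code: SAMPLE_INTERVAL, samples_to_request and poll_once are hypothetical names, timestamps are plain integers in seconds, and the 20-second sampling interval is an assumption made only for the illustration. The sketch shows the one property that matters here: the last-poll timestamp advances only when the query succeeds, so a failed poll makes the next one ask for a larger window of samples.

```ruby
# Assumed sampling interval in seconds, for illustration only.
SAMPLE_INTERVAL = 20

# Number of samples needed to cover the time since the last successful poll.
def samples_to_request(last_perf_poll, now)
  elapsed = now - last_perf_poll
  [(elapsed / SAMPLE_INTERVAL.to_f).ceil, 1].max
end

# Runs one poll. The block receives max_samples and performs the query.
# Returns [stats, last_perf_poll]; the timestamp only advances on success.
def poll_once(last_perf_poll, now)
  max_samples = samples_to_request(last_perf_poll, now)
  stats = yield(max_samples)
  [stats, now]
rescue StandardError
  # Query failed: report empty stats and keep the old timestamp so the
  # next poll covers the samples that were missed this time.
  [{}, last_perf_poll]
end
```

For example, if a poll at t=60 fails and the previous successful poll was at t=0, the poll at t=120 is computed against t=0 again and therefore requests the samples for the whole 120 seconds rather than only the last 60.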

#2 Updated by Miguel Ángel Álvarez Cabrerizo over 3 years ago

The fix should prevent VMs from being moved to the UNKNOWN state when a VM is deleted while the metrics are being gathered.
