Add quota for VCPU
|Category:||Core & System|
To keep users from the problematic practice of running VMs with high VCPU/CPU ratios, it would be helpful to have a VCPU quota in addition to the existing CPU quota. This would be particularly useful where hosts are loaded near their practical limits with interdependent VMs. We (CipherSpace) have seen a user effectively take down a host by launching multiple VMs with an aggregate VCPU/CPU ratio over 4 and enough aggregate CPU to consume all unreserved CPU time. Such a deployment appears functional under light load, but when all of the VMs need more CPU at once (e.g. as interconnected pieces of a modular application), the host can enter a failure mode characterized by very high context-switching rates (>100k/sec) and very slow response overall. This can cascade into a UDP monitoring failure and a fallback to SSH monitoring, which adds to the load and also fails. Ultimately, if the HOST_HOOK is enabled, this can trigger a migration attempt. We have seen that migration fail in a way that leaves VMs running on two hosts at once, accessing the same shared images and corrupting their filesystems.
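To make the aggregate VCPU/CPU ratio concrete, here is a minimal Python sketch of the arithmetic. The `VM` class, the `aggregate_ratio` helper, and the sample values are hypothetical and only illustrate the calculation; they are not taken from the incident described above or from any existing API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class VM:
    """Hypothetical record of one VM's requested capacity."""
    name: str
    cpu: float  # share of physical CPU requested (e.g. 0.5 = half a core)
    vcpu: int   # number of virtual CPUs exposed to the guest


def aggregate_ratio(vms: Iterable[VM]) -> float:
    """Return total VCPU divided by total CPU across the given VMs."""
    vms = list(vms)
    total_cpu = sum(vm.cpu for vm in vms)
    total_vcpu = sum(vm.vcpu for vm in vms)
    if total_cpu <= 0:
        raise ValueError("total CPU must be positive to compute a ratio")
    return total_vcpu / total_cpu


# Illustrative values only: 10 VCPUs backed by 2.0 CPU.
vms = [VM("app", 0.5, 4), VM("db", 1.0, 4), VM("cache", 0.5, 2)]
ratio = aggregate_ratio(vms)
print(f"aggregate VCPU/CPU = {ratio}")
```

With these sample values the guests together see five times as many virtual CPUs as the physical CPU share reserved for them, which is the kind of overcommitment a CPU quota alone does not catch.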
We currently handle this risk by cautioning users against creating VMs with VCPU/CPU > 2, but that rule oversimplifies the risk and depends on users behaving prudently. A quota on total active VCPU per customer, set according to the specific nature of their VM usage, would model the risk more accurately and give users more flexibility within an enforced limit.
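As a rough illustration of the requested behavior, the sketch below checks a new VM against both a CPU quota and a VCPU quota before admitting it. The names `Usage`, `Quota`, and `can_deploy` are hypothetical, and the logic is an assumption about how such a check might work, not a description of the existing quota code.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical totals for a customer's currently active VMs."""
    cpu: float
    vcpu: int


@dataclass
class Quota:
    """Hypothetical per-customer limits; vcpu is the proposed addition."""
    cpu: float
    vcpu: int


def can_deploy(usage: Usage, quota: Quota, req_cpu: float, req_vcpu: int) -> bool:
    """Admit a VM only if it fits within both the CPU and the VCPU quota."""
    cpu_ok = usage.cpu + req_cpu <= quota.cpu
    vcpu_ok = usage.vcpu + req_vcpu <= quota.vcpu
    return cpu_ok and vcpu_ok


# Illustrative values only.
quota = Quota(cpu=4.0, vcpu=8)
usage = Usage(cpu=2.0, vcpu=6)

# Fits the CPU quota but would push active VCPU to 10, above the limit of 8.
print(can_deploy(usage, quota, req_cpu=0.5, req_vcpu=4))

# Fits both limits: active VCPU would be exactly 8.
print(can_deploy(usage, quota, req_cpu=0.5, req_vcpu=2))
```

Enforcing the limit on the customer's total rather than per VM leaves users free to choose the VCPU/CPU ratio of individual VMs, as long as their total active VCPU stays within the limit.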