Driver operation journaling
Category: Core & System
High-level design:
- Add a unique ID to each driver operation (e.g. DEPLOY <vm_id>).
- The core will keep track of the pending operations, both in memory and in the DB.
- When a new operation is started, all the pending ones are either cancelled or tagged to be ignored when (and if) they return.
- At boot time, oned will check the pending operations and try to recover: either retry them, or assume a failed operation result.
Rationale:
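The design above can be illustrated with a minimal in-memory sketch. It is not oned code: the `Operation` and `Journal` names, the per-VM scope of "pending ones", and the choice to assume failure on recovery are all assumptions made for illustration, and DB persistence is left out.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Operation:
    """One journaled driver operation (hypothetical structure)."""
    op_id: int
    action: str
    vm_id: int
    host: str
    ignored: bool = False


class Journal:
    """Hypothetical in-memory journal; a real one would also write to the DB."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # op_id -> Operation

    def start(self, action, vm_id, host):
        # Tag earlier pending operations on the same VM so that their
        # results are ignored when (and if) they return.
        for op in self.pending.values():
            if op.vm_id == vm_id:
                op.ignored = True
        op = Operation(next(self._ids), action, vm_id, host)
        self.pending[op.op_id] = op
        return op.op_id

    def finish(self, op_id, result):
        # Drop results for unknown or superseded operations.
        op = self.pending.pop(op_id, None)
        if op is None or op.ignored:
            return None
        return (op.vm_id, op.host, result)

    def recover(self):
        # Boot-time recovery, assuming a failed result for every operation
        # that is still pending (retrying would be the other option).
        failed = [(op.vm_id, op.host, "FAILURE")
                  for op in self.pending.values() if not op.ignored]
        self.pending.clear()
        return failed
```

With this sketch, a second `start("DEPLOY", 7, "hostB")` makes a later `finish()` for the first deploy return `None`, so the stale result never reaches the state machine.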
Let's say a VM is being deployed to Host A. It is in BOOT, waiting for a DEPLOY SUCCESS/FAILURE.
The user gets tired of waiting, executes a onevm resubmit, and the scheduler deploys the VM to Host B.
Now the first deploy action, the one sent to Host A, finishes and returns a DEPLOY SUCCESS. Since oned cannot tell which deploy the result belongs to, it wrongly concludes that the VM was successfully deployed to Host B.
This can happen with different states and operations, causing incorrect state transitions and inconsistent Host capacity (CPU and memory) counters.
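The scenario can be reproduced with a toy model of a core that matches results by VM ID only. `NaiveCore` and its methods are hypothetical and heavily simplified; they only mimic the behaviour described above and are not taken from oned.

```python
class NaiveCore:
    """Toy model: driver results carry only the VM ID (hypothetical)."""

    def __init__(self):
        self.vm_host = {}                  # vm_id -> host the core believes in
        self.vm_state = {}                 # vm_id -> state name
        self.host_cpu = {"A": 0, "B": 0}   # CPU reserved on each host

    def deploy(self, vm_id, host, cpu):
        self.vm_host[vm_id] = host
        self.vm_state[vm_id] = "BOOT"
        self.host_cpu[host] += cpu

    def resubmit(self, vm_id, cpu):
        # Release the capacity of the previous host and go back to PENDING.
        host = self.vm_host.pop(vm_id)
        self.host_cpu[host] -= cpu
        self.vm_state[vm_id] = "PENDING"

    def on_deploy_result(self, vm_id, result):
        # No operation ID: any result for a VM in BOOT is accepted.
        if self.vm_state.get(vm_id) == "BOOT":
            self.vm_state[vm_id] = "RUNNING" if result == "SUCCESS" else "FAILED"


core = NaiveCore()
core.deploy(1, "A", cpu=1)            # first deploy, to Host A
core.resubmit(1, cpu=1)               # user gets tired of waiting
core.deploy(1, "B", cpu=1)            # scheduler picks Host B
core.on_deploy_result(1, "SUCCESS")   # late result from Host A arrives
```

After the last call the model records the VM as RUNNING on Host B with no CPU reserved on Host A, although the successful deploy happened on Host A.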