Bug #4

Wrong Failure Handling

Added by Ruben S. Montero about 13 years ago. Updated about 13 years ago.

Status: Closed
Start date:
Priority: High
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Core & System
Target version: Release 1.0
Resolution: fixed
Pull request:
Affected Versions:

Description

When a migration or other action is performed, VM failure should be detected through the monitor command:

  • Trigger a monitor action when something fails
  • Rollback actions needed (remove history records...)
  • Include state monitoring in VM drivers

Associated revisions

Revision 92b17d4e
Added by Ruben S. Montero about 13 years ago

Simplified history handling in VM
Solved some history timer issues
Improved life-cycle (address ticket #4)
Solved a couple of deadlocks in some RequestManager methods

git-svn-id: http://svn.opennebula.org/trunk@9 3034c82b-c49b-4eb3-8279-a7acafdc01c0

Revision 06fd90a2
Added by Ruben S. Montero about 13 years ago

Improved life-cycle (address ticket #4), new action when monitor returns paused or error states. Reason attribute for history records is now updated

git-svn-id: http://svn.opennebula.org/trunk@10 3034c82b-c49b-4eb3-8279-a7acafdc01c0

History

#1 Updated by Ruben S. Montero about 13 years ago

Changes in the OpenNebula core include:

  • The failure events of the DispatchManager will be unified. The LifeCycleManager (LCM) will send just one failure notification when an unrecoverable failure occurs.
  • When the VirtualMachineManager (VMM) triggers a failure event, the LCM will:
    • Return to RUNNING state
    • Trigger a Monitor action on the VMM
    • If the failure occurs during a migration, add a new record with the original resource to the history (reason of migration = ERROR). Capacity and running VMs will be adjusted for both hosts.
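
The failure path described above can be sketched as follows. This is only an illustration in Python, not the OpenNebula core code: the `Host` and `VM` classes, the `on_vmm_failure` function, and the `monitor` callback are all hypothetical names. It also assumes that the host counters were already moved to the target when the migration started, and that the ERROR reason is stored on the newly added record, which is one possible reading of the note.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical host model: only the running-VM counter is tracked here.
    name: str
    running_vms: int = 0


@dataclass
class VM:
    # Hypothetical VM model: a state string and a list of history records.
    state: str
    history: list = field(default_factory=list)


def on_vmm_failure(vm, monitor, source=None, target=None):
    """Handle a failure event, following the three steps listed above."""
    # 1. The VM returns to the RUNNING state.
    vm.state = "RUNNING"

    # 2. A monitor action is triggered so the real VM state gets polled.
    monitor(vm)

    # 3. Only for a failed migration: record the original host again and
    #    undo the accounting (assumed to have been moved to the target).
    if source is not None and target is not None:
        vm.history.append({"host": source.name, "reason": "ERROR"})
        target.running_vms -= 1
        source.running_vms += 1


# Usage: a migration from host-a to host-b fails.
polled = []
host_a = Host("host-a", running_vms=0)
host_b = Host("host-b", running_vms=1)
vm = VM(state="MIGRATE")
on_vmm_failure(vm, polled.append, source=host_a, target=host_b)
```

After the call, the VM is back in RUNNING, one monitor action has been requested, and the counters reflect that the VM is still on the original host.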

#2 Updated by Javi Fontan about 13 years ago

VM state monitoring is implemented in r8. It is returned in the STATE variable. The codes are as follows:
  • a: alive, xen states r, b, s
  • p: paused, xen state p
  • e: error, any other xen state
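
A minimal sketch of the mapping listed above, written in Python for illustration. The function and constant names are hypothetical, and the sketch assumes the driver has already reduced the Xen output to a single state letter; it is not the actual driver code from r8.

```python
# Xen state letters that count as "alive" and "paused", per the list above.
ALIVE_XEN_STATES = {"r", "b", "s"}
PAUSED_XEN_STATES = {"p"}


def monitor_state(xen_state):
    """Map a single Xen state letter to the monitor code a, p or e."""
    if xen_state in ALIVE_XEN_STATES:
        return "a"
    if xen_state in PAUSED_XEN_STATES:
        return "p"
    # Any other Xen state is reported as an error.
    return "e"


def format_state(xen_state):
    """Build the STATE variable as it would appear in the monitor output."""
    return "STATE=" + monitor_state(xen_state)
```

For example, `format_state("b")` yields `STATE=a`, while an unlisted state such as `c` yields `STATE=e`.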

#3 Updated by Ruben S. Montero about 13 years ago

The DispatchManager events have been simplified. Also, the life-cycle of the VM has been modified so that when a failure occurs it returns to the RUNNING state (changeset r9). Still missing:
  • Handle state callbacks from polling actions
  • If a VM does not exist, the driver should return an error.

#4 Updated by Ruben S. Montero about 13 years ago

Changeset r10 includes new actions for monitor callbacks (paused & error).
Also the reason attribute for history records is now updated. I'll leave the ticket open till we test the whole life-cycle transitions.

#5 Updated by Javi Fontan about 13 years ago

Replying to [comment:4 ruben]:

The DispatchManager events have been simplified. Also, the life-cycle of the VM has been modified so that when a failure occurs it returns to the RUNNING state (changeset r9). Still missing:
  • Handle state callbacks from polling actions
  • If a VM does not exist, the driver should return an error.

The driver now returns STATE=d if the VM does not exist. Added in r16.
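
On the receiving side, reading the STATE variable could look roughly like the Python sketch below. It assumes the monitor output is a line of whitespace-separated KEY=VALUE tokens, which this ticket does not confirm; `parse_state`, `describe_state`, and the `USEDCPU` key in the usage note are hypothetical.

```python
# Meaning of each STATE code mentioned in this ticket.
STATE_MEANINGS = {
    "a": "alive",
    "p": "paused",
    "e": "error",
    "d": "VM does not exist",
}


def parse_state(monitor_output):
    """Return the STATE code from a monitor line, or None if it is absent."""
    for token in monitor_output.split():
        key, sep, value = token.partition("=")
        if key == "STATE" and sep:
            return value
    return None


def describe_state(monitor_output):
    """Translate the STATE code into a readable description."""
    code = parse_state(monitor_output)
    return STATE_MEANINGS.get(code, "unknown")
```

With these assumptions, `describe_state("STATE=d")` gives "VM does not exist", and a line such as `USEDCPU=0.5 STATE=a` is read as "alive".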

#6 Updated by Ruben S. Montero about 13 years ago

  • Resolution set to fixed

All changes committed and some preliminary tests performed. I am closing this ticket.
