Bug #1952

Possible upgrade issue with host zombie VM detection

Added by Simon Boulet about 8 years ago. Updated about 8 years ago.

Status: Closed
Start date: 04/24/2013
Priority: Normal
Due date:
Assignee: Carlos Martín
% Done: 0%
Category: -
Target version: -
Resolution: worksforme
Pull request:
Affected Versions: OpenNebula 4.0

Description

I can't tell if this is a real bug or if something went wrong with my Dev environment. It turns out that VMs created earlier (apparently before I upgraded my environment to 4.0-beta) were missing from the host_pool <VMS> XML list. This caused the missing VMs to be improperly flagged as "zombies" by OpenNebula. The VMs were still in the RUNNING state, but I noticed their monitoring information wasn't being updated.

I did find some references to the host_pool <VMS> XML list back in the 3.8.0_to_3.8.1.rb DB migration script. However, my Dev environment was last reinstalled straight from 3.8.1.

sqlite> select * from db_versioning;
0|3.8.1|1352405388|OpenNebula 3.8.1 daemon bootstrap
1|3.8.3|1360125767|Database migrated from 3.8.1 to 3.8.3 (OpenNebula 3.8.3) by onedb command.
2|3.9.80|1362753500|Database migrated from 3.8.3 to 3.9.80 (OpenNebula 3.9.80) by onedb command.
3|3.9.90|1366832639|Database migrated from 3.9.80 to 3.9.90 (OpenNebula 3.9.90) by onedb command.
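
For anyone wanting to check their own DB for the same symptom, something along these lines should print what each host currently has in its <VMS> element. This is only a sketch: it assumes an SQLite backend at /var/lib/one/one.db, that host_pool stores each host's XML in a body column, and that the deployed VM IDs live under HOST/VMS/ID; adjust the names if your schema differs.

require 'sqlite3'          # gem install sqlite3
require 'rexml/document'   # stdlib

DB_PATH = '/var/lib/one/one.db'   # assumed SQLite backend location

db = SQLite3::Database.new(DB_PATH)

# Print the VM IDs each host lists in its <VMS> element.
# A RUNNING VM whose ID is missing here is what gets reported as a zombie.
db.execute('SELECT oid, name, body FROM host_pool') do |oid, name, body|
  doc = REXML::Document.new(body)
  ids = doc.elements.to_a('HOST/VMS/ID').map(&:text)
  puts "Host #{oid} (#{name}): VMS = [#{ids.join(', ')}]"
end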

Running an onedb fsck fixed the issue.

Perhaps rebuilding the <VMS> XML list in the 4.0 migrator would be a good idea to ensure the user doesn't end up with false zombies.
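
To illustrate the idea, here is a rough sketch (not the actual onedb migrator code) of how a migrator step could recompute each host's <VMS> list from the VMs' history records and write the host bodies back. The table and element names below (vm_pool, body, state, VM/HISTORY_RECORDS/HISTORY/HID) and the DONE state value are assumptions about the 3.x/4.0 schema.

require 'sqlite3'
require 'rexml/document'

db = SQLite3::Database.new('/var/lib/one/one.db')   # assumed SQLite backend

# Map host ID -> IDs of the VMs deployed there, taken from each non-DONE
# VM's last history record (state 6 = DONE is an assumption here).
host_vms = Hash.new { |h, k| h[k] = [] }

db.execute('SELECT oid, body FROM vm_pool WHERE state <> 6').each do |vm_id, body|
  doc = REXML::Document.new(body)
  hid = doc.elements.to_a('VM/HISTORY_RECORDS/HISTORY/HID').last
  host_vms[hid.text.to_i] << vm_id unless hid.nil?
end

# Rebuild each host's <VMS> element from that map and write the body back.
db.execute('SELECT oid, body FROM host_pool').each do |host_id, body|
  doc  = REXML::Document.new(body)
  host = doc.root                       # <HOST> element

  old = host.elements['VMS']
  host.delete_element(old) if old

  vms = host.add_element('VMS')
  host_vms[host_id].sort.each { |id| vms.add_element('ID').text = id.to_s }

  new_body = ''
  doc.write(new_body)
  db.execute('UPDATE host_pool SET body = ? WHERE oid = ?', [new_body, host_id])
end

In any case, running onedb fsck after the upgrade repairs the same inconsistency, which is what fixed it for me.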

Also, may I suggest adding a log message to warn when a zombie (or wild) VM is detected.

Thanks

Simon

History

#1 Updated by Carlos Martín about 8 years ago

  • Assignee set to Carlos Martín
  • Target version set to Release 4.0

#2 Updated by Carlos Martín about 8 years ago

Hi,

I have just tested this, and it works for me. I installed 3.8.1, launched a VM and waited until it was RUNNING, and then upgraded to 4.0-rc2.
The host has the correct VM ID inside the <VMS> element both before and after the upgrade.

If you still have the backup, can you check if the host had the VM ID before the upgrade?
Maybe it was a bug in 3.8...
Was the VM in the RUNNING state?
Did you perform any operations that might have triggered a bug in the core? Something like a failed migration...

About rebuilding the VMS element in the migrator: we decided to include the fsck as the next step right after the upgrade, see http://opennebula.org/documentation:rel4.0:upgrade#check_db_consistency

Cheers

#3 Updated by Carlos Martín about 8 years ago

  • Target version changed from Release 4.0 to Release 4.2

#4 Updated by Simon Boulet about 8 years ago

Hi Carlos

It's quite possible that this issue came from a bug I introduced while developing my drivers or the various patches I submitted to OpenNebula (I sometimes run into segfaults, etc., that could leave the DB corrupted). I wasn't able to reproduce this on my end with any new VMs. I think suggesting the onedb fsck in the upgrade documentation is a good idea.

You can close this issue :)

Simon

#5 Updated by Carlos Martín about 8 years ago

  • Status changed from New to Closed
  • Target version deleted (Release 4.2)
  • Resolution set to worksforme

Great! Thanks anyway for reporting it; better safe than sorry.
