Bug #1952
Possible upgrade issue with host zombie VM detection
| Status: | Closed | Start date: | 04/24/2013 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | Carlos Martín | % Done: | 0% |
| Category: | - | | |
| Target version: | - | | |
| Resolution: | worksforme | Pull request: | |
| Affected Versions: | OpenNebula 4.0 | | |
Description
I can't tell if this is a real bug or if something went wrong with my Dev environment. It turns out VMs that were previously created (apparently before I upgraded my environment to 4.0-beta) were missing from the host_pool <VMS> XML list. This caused the missing VMs to be improperly flagged as "zombies" by OpenNebula. The VMs were still in the RUNNING state, but I noticed their monitoring information wasn't being updated.
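The check described above can be sketched roughly like this: a VM reported by the hypervisor whose ID is missing from the host's <VMS> element gets flagged as a zombie. The XML layout and the function below are illustrative stand-ins, not the actual OpenNebula core code:

```python
# Minimal sketch (hypothetical, not the core implementation): flag as
# "zombie" any monitored VM ID that is absent from the host's <VMS> list.
import xml.etree.ElementTree as ET

# Trimmed, hypothetical host_pool body for one host.
host_body = """
<HOST>
  <ID>0</ID>
  <VMS>
    <ID>41</ID>
    <ID>42</ID>
  </VMS>
</HOST>
"""

def find_zombies(host_xml, monitored_vm_ids):
    """Return monitored VM IDs not present in the host's <VMS> list."""
    root = ET.fromstring(host_xml)
    known = {int(e.text) for e in root.findall("./VMS/ID")}
    return sorted(set(monitored_vm_ids) - known)

# VM 43 is running on the hypervisor but absent from <VMS>: a "zombie".
print(find_zombies(host_body, [41, 42, 43]))  # → [43]
```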
I did find some references to the host_pool <VMS> XML list back in the 3.8.0_to_3.8.1.rb DB migration script. However, my Dev environment was last reinstalled straight from 3.8.1.
sqlite> select * from db_versioning;
0|3.8.1|1352405388|OpenNebula 3.8.1 daemon bootstrap
1|3.8.3|1360125767|Database migrated from 3.8.1 to 3.8.3 (OpenNebula 3.8.3) by onedb command.
2|3.9.80|1362753500|Database migrated from 3.8.3 to 3.9.80 (OpenNebula 3.9.80) by onedb command.
3|3.9.90|1366832639|Database migrated from 3.9.80 to 3.9.90 (OpenNebula 3.9.90) by onedb command.
Running an onedb fsck fixed the issue.
Perhaps rebuilding the <VMS> XML list in the 4.0 migrator would be a good idea to ensure the user doesn't end up with false zombies.
May I also suggest adding logging to warn when a zombie (or wild) VM is detected.
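What the suggested rebuild would amount to, roughly: regenerate each host's <VMS> list from the VM pool's own placement records, so a stale host entry can't produce false zombies. The data layout here is hypothetical, not the actual onedb migrator code:

```python
# Rough sketch (hypothetical data layout, not the onedb internals):
# rebuild a host's <VMS> element from the VM pool's placement records.
import xml.etree.ElementTree as ET

# (vm_id, host_id) pairs for VMs currently placed on a host.
vm_placements = [(41, 0), (42, 0), (43, 1)]

def rebuild_vms_element(host_id, placements):
    """Build a fresh <VMS> element listing the VMs placed on host_id."""
    vms = ET.Element("VMS")
    for vm_id, hid in placements:
        if hid == host_id:
            ET.SubElement(vms, "ID").text = str(vm_id)
    return ET.tostring(vms, encoding="unicode")

print(rebuild_vms_element(0, vm_placements))
# → <VMS><ID>41</ID><ID>42</ID></VMS>
```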
Thanks
Simon
History
#1 Updated by Carlos Martín about 8 years ago
- Assignee set to Carlos Martín
- Target version set to Release 4.0
#2 Updated by Carlos Martín about 8 years ago
Hi,
I have just tested this, and it works for me. I've installed 3.8.1, launched a VM that was RUNNING, and then upgraded to 4.0-rc2.
The host has the correct ID inside the <VMS> element before and after the upgrade.
If you still have the backup, can you check if the host had the VM ID before the upgrade?
Maybe it was a bug in 3.8...
Was the VM in the RUNNING state?
Did you perform any operations that may have triggered a bug in the core? Something like a failed migration...
About rebuilding the VMS element in the migrator, we decided to include the fsck as the next step right after the upgrade, see http://opennebula.org/documentation:rel4.0:upgrade#check_db_consistency
Cheers
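One way to answer the question above is to open the pre-upgrade DB backup and check whether the VM ID appears inside the host's <VMS> element. In this sketch an in-memory database stands in for the backup; the real one.db keeps each host's XML in the host_pool table's body column (schema names are an assumption based on the 3.8 series):

```python
# Sketch (assumed 3.8-era schema: host_pool with oid/body columns): check
# whether a given VM ID is listed in a host's <VMS> element in a DB backup.
import sqlite3
import xml.etree.ElementTree as ET

# In-memory stand-in for the pre-upgrade one.db backup.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE host_pool (oid INTEGER, name TEXT, body TEXT)")
db.execute(
    "INSERT INTO host_pool VALUES (0, 'node01', "
    "'<HOST><ID>0</ID><VMS><ID>41</ID></VMS></HOST>')"
)

def host_has_vm(conn, host_id, vm_id):
    """True if vm_id is listed in the host's <VMS> element."""
    (body,) = conn.execute(
        "SELECT body FROM host_pool WHERE oid = ?", (host_id,)
    ).fetchone()
    ids = {int(e.text) for e in ET.fromstring(body).findall("./VMS/ID")}
    return vm_id in ids

print(host_has_vm(db, 0, 41))  # → True
print(host_has_vm(db, 0, 99))  # → False
```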
#3 Updated by Carlos Martín about 8 years ago
- Target version changed from Release 4.0 to Release 4.2
#4 Updated by Simon Boulet about 8 years ago
Hi Carlos
It's quite possible that this issue came from a bug I introduced while developing my drivers or the various patches I submitted to OpenNebula (I sometimes run into segfaults, etc. that could corrupt the DB). I wasn't able to reproduce this on my end with any new VMs. I think suggesting the onedb fsck in the upgrade documentation is a good idea.
You can close this issue :)
Simon
#5 Updated by Carlos Martín about 8 years ago
- Status changed from New to Closed
- Target version deleted (Release 4.2)
- Resolution set to worksforme
Great! Thanks anyway for reporting it, better safe than sorry.