Bug #4640
live migrate with 'shared' SYSTEM_DS not working properly
| Status: | Closed | Start date: | 07/14/2016 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | Javi Fontan | % Done: | 0% |
| Category: | Drivers - Storage | | |
| Target version: | Release 5.2 | | |
| Resolution: | fixed | Pull request: | |
| Affected Versions: | OpenNebula 5.0 | | |
Description
Hi,
There are some bugs in the premigrate and postmigrate scripts in /tm/shared/.
I am not sure how to address the issue: as a bug or as a new feature proposal.
I've reworked the scripts in the following commit (not opened as a pull request because I feel a discussion is needed first):
https://github.com/atodorov-storpool/one/commit/4a402f74e1670dc642e2345901ded87b8bb144db
I am copying the body of the commit message here.
The change was inspired by the live-migrate scripts for SYSTEM_DS with `shared` TM_MAD.
After some rework I figured out that it could be made into a generic helper.
The proposed change: tm_common.sh gets a new function, migrate_other(), receiving all arguments from the caller script.
The function parses the TEMPLATE blob and exposes BASH arrays:
DISK_ID_ARRAY - array of all VM DISK_IDs
TM_MAD_ARRAY - array of TM_MADs for each DISK_ID
CLONE_ARRAY - array of CLONE values for each DISK_ID
The function also exposes the DISK_ID of the CONTEXT disk as CONTEXT_DISK_ID.
The function checks for an extra argument to prevent recursion loops.
For each TM_MAD that is not the current one, the corresponding script is called
with the same arguments plus one additional argument, to mark that it is not
called in the SYSTEM_DS context.
Usage:
migrate_other "$@"
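As a hedged illustration of the parsing step described above: the real helper works on the (base64-encoded) XML template with proper tooling, while this sketch uses a naive grep/sed parse, and the element names are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hedged sketch of the array-building part of the proposed migrate_other()
# helper. The grep/sed parse and the element names are illustrative
# assumptions, not the actual OpenNebula driver code.

parse_vm_template() {
    local xml="$1"

    # One entry per matching element, in document order.
    mapfile -t DISK_ID_ARRAY < <(grep -o '<DISK_ID>[^<]*</DISK_ID>' <<<"$xml" | sed 's/<[^>]*>//g')
    mapfile -t TM_MAD_ARRAY  < <(grep -o '<TM_MAD>[^<]*</TM_MAD>'   <<<"$xml" | sed 's/<[^>]*>//g')
    mapfile -t CLONE_ARRAY   < <(grep -o '<CLONE>[^<]*</CLONE>'     <<<"$xml" | sed 's/<[^>]*>//g')
}
```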
The function is appended to the following TM_MADs:
ceph
fs_lvm
qcow2
shared
ssh
The general idea is to allow live migration of VMs with a mix of disks using any of the above TM_MADs (different datastores).
For example, if we have a VM with the following disks:
disk.0 - ceph
disk.1 - ceph (context)
disk.2 - ceph
disk.3 - fs_lvm
disk.4 - qcow2
In the above scenario, when live migration is issued, the following scripts will be called:
ceph/premigrate <args>
fs_lvm/premigrate <args> "ceph"
qcow2/premigrate <args> "ceph"
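The fan-out above can be sketched as a small dispatch loop. The function name and the plain-string arguments are illustrative, not the driver's actual interface: the current driver ("ceph" here) handles its own disks, and every other TM_MAD is invoked exactly once with the extra marker argument.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the per-TM_MAD dispatch; not the real driver code.

dispatch_premigrate() {
    local current="$1"; shift   # TM_MAD of the SYSTEM_DS driver running now
    local seen=" $current "     # already-handled TM_MADs, space-delimited
    local tm
    for tm in "$@"; do          # one TM_MAD per VM disk
        # Call each distinct TM_MAD once, skipping the current driver.
        case "$seen" in *" $tm "*) continue ;; esac
        seen="$seen$tm "
        echo "$tm/premigrate <args> $current"
    done
}
```

For the disk layout above, `dispatch_premigrate ceph ceph ceph ceph fs_lvm qcow2` would print the fs_lvm and qcow2 calls only.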
As I am most probably missing something, I am open to discussion :)
I am not sure about the target version either - the current 5.0.1 script is broken, but the bulk of the proposed change may be too large for a point release...
Kind Regards,
Anton Todorov
Associated revisions
bug #4640: disable pre/postmigrate for shared tm
bug #4640: disable pre/postmigrate for shared tm
(cherry picked from commit 56fff9184f3784bc3cec16ca9bb04febe92d74d7)
History
#1 Updated by Jaime Melis almost 5 years ago
- Assignee set to Javi Fontan
#2 Updated by Javi Fontan almost 5 years ago
What's the error you're seeing in pre/post migrate?
We are going to review the change you're proposing, but I think it's better to do it in 5.2 instead of 5.0.2.
#3 Updated by Anton Todorov almost 5 years ago
Javi Fontan wrote:
What's the error you're seeing in pre/post migrate?
The current scripts use the wrong variable ($TMS vs $TM) in the path to the {pre,post}migrate scripts, so they are never called.
But even if they were called, they would fail because they expect 6 arguments, not 3.
So the error shows up in the migration step, because the VM disk symlinks/files are not created or attached at all.
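A hypothetical reconstruction of that failure mode follows; the variable names and paths are illustrative, not the exact 5.0.1 code.

```shell
#!/usr/bin/env bash
# Sketch of the bug class described above: the script builds the path to
# the other driver's premigrate hook from the wrong (unset) variable.
TM="ceph"   # the variable that actually holds the disk's TM_MAD
TMS=""      # the misspelled variable the script referenced instead

broken="/var/lib/one/remotes/tm/${TMS}/premigrate"  # -> .../tm//premigrate
fixed="/var/lib/one/remotes/tm/${TM}/premigrate"    # -> .../tm/ceph/premigrate

# The broken path never exists, so the guarded call is silently skipped
# and the other driver's premigrate hook never runs:
if [ -x "$broken" ]; then
    "$broken" "$@"
fi
```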
We are going to review the change you're proposing but I think it's better do it in 5.2 instead of 5.0.2.
It is not a trivial change, so I agree that 5.2 is better than 5.0.2.
In brief, the change is that in addition to the SYSTEM_DS call, there is a call for each additional TM_MAD associated with a VM disk, passing the same arguments plus one extra argument to mark that it is not called in the root SYSTEM_DS context (meaning: "do not check other TM_MADs and run their scripts too").
The presumption is that the {pre,post}migrate scripts are smart enough to process only the disks that are related to them. In our addon we first check whether the disk's TM_MAD is `storpool` and do work only on disks that are on StorPool.
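The guard described here might look like the following sketch; the array contents and the `storpool` driver name are illustrative assumptions, not the addon's actual code.

```shell
#!/usr/bin/env bash
# Hedged sketch of the "process only your own disks" convention.
# Example per-disk data, as the proposed helper would expose it:
DISK_ID_ARRAY=(0 2 3 4)
TM_MAD_ARRAY=(storpool ceph storpool qcow2)

process_own_disks() {
    local mine="$1" i
    for i in "${!DISK_ID_ARRAY[@]}"; do
        # Skip disks managed by a different TM driver.
        [ "${TM_MAD_ARRAY[i]}" = "$mine" ] || continue
        echo "migrating disk ${DISK_ID_ARRAY[i]} via $mine"
    done
}
```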
Kind Regards,
Anton Todorov
#4 Updated by Javi Fontan almost 5 years ago
- Target version changed from Release 5.0.2 to Release 5.2
You're right. These scripts are wrong in so many ways. I've been trying to figure out how to fix them in a non-intrusive way, but the very way of calling the other TM scripts is broken.
I am going to disable these scripts for 5.0.2 as they can do more harm than help and we will check your suggestions on how to fix this for 5.2.0.
Thanks Anton.
#5 Updated by Javi Fontan almost 5 years ago
I am reviewing the change linked from your GitHub repository and it seems a good way to go. We will check what must be changed in the rest of the scripts and test that everything works the same. Right now we are testing the linked commit.
I'll update this issue with the results.
#6 Updated by Javi Fontan almost 5 years ago
- Status changed from Pending to Closed
- Resolution set to fixed
Merged to master and tests are OK. Closing this ticket. Another one can be opened if we find problems.
Thanks!