Bug #4640

live migrate with 'shared' SYSTEMD_DS not working properly

Added by Anton Todorov almost 5 years ago. Updated almost 5 years ago.

Status:ClosedStart date:07/14/2016
Priority:NormalDue date:
Assignee:Javi Fontan% Done:

0%

Category:Drivers - Storage
Target version:Release 5.2
Resolution:fixed Pull request:
Affected Versions:OpenNebula 5.0

Description

Hi,

There are some bugs in premigrate and postmigrate scripts in /tm/shared/

I am not sure how to address the issue - as a bug or as new feature proposal

I've reworked the scripts, in the following commit (not added as pull request because I am feeling that a discussion is needed first)

https://github.com/atodorov-storpool/one/commit/4a402f74e1670dc642e2345901ded87b8bb144db

I am copying here the body of the commit message

The change was inspired by the live-migrate scripts for SYSTEM_DS with `shared` TM_MAD

After some rework I've figured out that it could be made as some sort of generic one.

The proposed changes
  • tm_common.sh - new function migrate_other() receiving all arguments from the caller script
    The function parses the TEMPLATE blob and exposes BASH arrays:
    DISK_ID_ARRAY - array of all VM DISK_ID
    TM_MAD_ARRAY - array of TM_MADs for each DISK_ID
    CLONE_ARRAY - aray of all CLONE values for each DISK_ID
    The function exposes DISK_ID of the CONTEXT disk too as CONTEXT_DISK_ID
    The function check for extra argument to prevent recursion loops
    For each TM_MAD it it is not current one the corresponding script is called
    with same arguments but one additional - to mark that it is called not from
    in the SYSTEM_DS context
Usage:
migrate_common "$@"

The function is appended to the following TM_MADs:
ceph
fs_lvm
qcow2
shared
ssh

The General idea is to allow live-migration of VMs with mix of disks with any of the above TM_MADs (different datastores)

For example if we have VM with the following disks
disk.0 ceph
disk.1 ceph (context)
disk.2 - ceph
disk.3 - fs_lvm
disk.4 - qcow2

In the above scenario when live migration is issed the following scripts will be called

ceph/premigrate <args>
fs_lvm/premigrate <args> "ceph"
qcow2/premigrate <args> "ceph"

As most probably I am missing something I am open for discussion :)

Not sure for the target version too - the current 5.0.1 script is broken but can't decide the majority of the proposed change...

Kind Regards,
Anton Todorov

Associated revisions

Revision 56fff918
Added by Javi Fontan almost 5 years ago

bug #4640: disable pre/postmigrate for shared tm

Revision 0cc04f9a
Added by Javi Fontan almost 5 years ago

bug #4640: disable pre/postmigrate for shared tm

(cherry picked from commit 56fff9184f3784bc3cec16ca9bb04febe92d74d7)

History

#1 Updated by Jaime Melis almost 5 years ago

  • Assignee set to Javi Fontan

#2 Updated by Javi Fontan almost 5 years ago

What's the error you're seeing in pre/post migrate?

We are going to review the change you're proposing but I think it's better do it in 5.2 instead of 5.0.2.

#3 Updated by Anton Todorov almost 5 years ago

Javi Fontan wrote:

What's the error you're seeing in pre/post migrate?

The current scripts has wrong variable in the path ($TMS vs $TM) to the {pre,post}migrate so they are not called.
But even if they were called they will fail because they expect 6 arguments not 3.

So the error is in the migration step because the VM disk symlinks/files are not created or attached at all

We are going to review the change you're proposing but I think it's better do it in 5.2 instead of 5.0.2.

I agree that it is not trivial change so I agree that 5.2 is better than 5.0.2.
In brief the change is that in addition of the SYSTEM_DS, there is a call for each additional TM_MAD associated with VM disk, passing same argumets plus one to mark that it is not called in the root SYSTEM_DS context(so "do not check other tm_mads and run their scripts too").
The presumption is that the {pre,post}migrate scripts are smart enough to process only the disks that are related to them. In our addon we are first checking is the disk's TM_MAD = `storpool` and doing work only on disks that are on StorPool.

Kind Regards,
Anton Todorov

#4 Updated by Javi Fontan almost 5 years ago

  • Target version changed from Release 5.0.2 to Release 5.2

You're right. These scripts are wrong in some many ways. I've been trying to figure out how to fix them in a non intrusive way but the very way of calling the other TM scripts is broken.

I am going to disable these scripts for 5.0.2 as they can do more harm than help and we will check your suggestions on how to fix this for 5.2.0.

Thanks Anton.

#5 Updated by Javi Fontan almost 5 years ago

I am reviewing the change linked in your github repository and seems a good way to go. We will check what must be changed in the rest of the scripts and test that everything works the same. Right now testing the linked commit.

I'll update this issue with the results.

#6 Updated by Javi Fontan almost 5 years ago

  • Status changed from Pending to Closed
  • Resolution set to fixed

Merged to master and tests are OK. Closing this ticket. Another one can be opened if we find problems.

Thanks!

Also available in: Atom PDF