Bug #4969

Ceph driver for system datastore ignores SHARED=YES

Added by kvaps kvaps over 4 years ago. Updated about 4 years ago.

Status: Closed
Start date: 01/09/2017
Priority: Normal
Due date:
Assignee: kvaps kvaps
% Done: 100%
Category: Drivers - Storage
Target version: Release 5.4
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 5.2

Description

If you use Ceph as the system datastore and /var/lib/one is shared between your nodes, the `premigrate`, `postmigrate` and `mv` scripts of the ceph TM driver remove the VM folder /var/lib/one/datastores/{datastoreid}/{vmid}, along with files such as `deployment.0`, `disk.1` and `checkpoint`.
This causes an error when the VM is saved (`save`) for `suspend` or `migrate`:
```
Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/migrate 'one-48' 'c13n1' 'c15n1' 48 c15n1
migrate: Command "virsh --connect qemu:///system migrate --live one-48 qemu+ssh://c13n1/system" failed: error: Cannot access storage file '/var/lib/one//datastores/100/48/disk.1' (as uid:9869, gid:9869): No such file or directory
Could not migrate one-48 to c13n1
ExitCode: 1
```
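
To illustrate the behaviour this report asks for, here is a minimal sketch of a cleanup step that leaves the VM folder alone when the datastore directory is shared. It is not the code from the actual patch: the function name `cleanup_vm_dir` and its two arguments are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the OpenNebula ceph TM driver.
# $1: VM directory, e.g. /var/lib/one/datastores/100/48
# $2: value of SHARED for the datastore ("YES" or "NO")
cleanup_vm_dir() {
    vm_dir="$1"
    shared="$2"

    # Refuse to run with an empty path so we never call rm on "".
    [ -n "$vm_dir" ] || return 1

    case "$shared" in
        [Yy][Ee][Ss])
            # Shared filesystem: the other host sees the same directory,
            # so deployment.0, disk.1, checkpoint etc. must stay in place.
            echo "skip: $vm_dir is on shared storage"
            return 0
            ;;
    esac

    # Not shared: the local copy is stale after the move and can go.
    rm -rf -- "$vm_dir"
    echo "removed: $vm_dir"
}
```

Called as `cleanup_vm_dir /var/lib/one/datastores/100/48 YES`, the sketch keeps the folder; with `NO` it removes it.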

History

#1 Updated by kvaps kvaps over 4 years ago

  • % Done changed from 80 to 100

#2 Updated by Jaime Melis over 4 years ago

  • Target version set to Release 5.4

Thanks for the ticket! We'll review and study it as soon as possible.

#3 Updated by Stefan Kooman over 4 years ago

I was bitten by this as well. I applied the patch from @kvaps and it works well. Thanks!

#4 Updated by kvaps kvaps over 4 years ago

Please note before merging my changes:
I think we must change the default value SHARED = "YES" in TM_MAD_CONF for ceph to SHARED = "NO" in oned.conf.
Otherwise, my fix may break clouds that use the default configuration for the ceph TM driver.

Also, I think we need a way to override TM_MAD_CONF options in oned.conf.
Please review this comment:
https://github.com/OpenNebula/one/pull/182#discussion_r95556917
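
For illustration, the default change proposed here would look roughly like the following `TM_MAD_CONF` entry in oned.conf. Only the `SHARED` value comes from this discussion; the other attributes are assumed from a typical ceph entry and may differ in your version.

```
# Sketch only: SHARED is the value under discussion (the default is "YES").
# LN_TARGET and CLONE_TARGET are assumptions; check your own oned.conf.
TM_MAD_CONF = [
    NAME = "ceph", LN_TARGET = "NONE", CLONE_TARGET = "SELF",
    SHARED = "NO"
]
```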

#5 Updated by Anton Todorov over 4 years ago

Hi,

I've already hit the case where a TM_MAD is shared for IMAGE but not for SYSTEM (ssh) with our addon.

Here is my request #5061 addressing the matter.

Kind Regards,
Anton Todorov

#6 Updated by Vlastimil Holer about 4 years ago

  • Status changed from Pending to Closed

Hello,

Thank you for the bug report. There was indeed a problem with live migrations when the datastore directory is on a shared filesystem; it was recently fixed by Anton's patch https://github.com/OpenNebula/one/pull/160, mentioned in bug #4924.

I'm going to close the bug as fixed; feel free to reopen it if you have any doubts.

Best regards,
Vlastimil Holer
