Bug #4969
Ceph driver for system datastore ignores SHARED=YES
| Status: | Closed | Start date: | 01/09/2017 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | kvaps kvaps | % Done: | 100% |
| Category: | Drivers - Storage | | |
| Target version: | Release 5.4 | | |
| Resolution: | fixed | Pull request: | |
| Affected Versions: | OpenNebula 5.2 | | |
Description
If you use Ceph as the system datastore and your /var/lib/one is shared between your nodes, the `premigrate`, `postmigrate` and `mv` scripts from the ceph TM driver remove the VM folder /var/lib/one/datastores/{datastoreid}/{vmid}, along with files like `deployment.0`, `disk.1` and `checkpoint`.
This leads to the following error when you `save` the VM for `suspend` or `migrate`:
```
Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/migrate 'one-48' 'c13n1' 'c15n1' 48 c15n1
migrate: Command "virsh --connect qemu:///system migrate --live one-48 qemu+ssh://c13n1/system" failed: error: Cannot access storage file '/var/lib/one//datastores/100/48/disk.1' (as uid:9869, gid:9869): No such file or directory
Could not migrate one-48 to c13n1
ExitCode: 1
```
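The failure above suggests that a guard is needed before the VM folder is deleted. The following is only a minimal sketch of that idea, not the actual driver code or the patch from the pull request: the function name `remove_vm_dir_unless_shared` and its two arguments are hypothetical, and it assumes the caller already knows whether the datastore is marked as shared.

```shell
# Hypothetical helper, NOT part of the OpenNebula ceph TM driver.
# $1 = VM directory, e.g. /var/lib/one/datastores/{datastoreid}/{vmid}
# $2 = value of the SHARED flag for the datastore ("YES" or "NO")
remove_vm_dir_unless_shared() {
    local vm_dir="$1"
    local shared="$2"

    if [ "$shared" = "YES" ]; then
        # Shared filesystem: every node sees the same directory, so
        # deleting it would also remove the files the VM still needs.
        echo "skip: $vm_dir is on a shared datastore"
        return 0
    fi

    # Assumption: on a non-shared datastore each node has its own copy,
    # so the local directory can be removed.
    rm -rf -- "$vm_dir"
    echo "removed: $vm_dir"
}
```

Called with `SHARED` set to `YES`, the helper leaves `disk.1` and the other files in place; called with `NO`, it removes the directory.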
History
#1 Updated by kvaps kvaps over 4 years ago
- % Done changed from 80 to 100
My pull request:
https://github.com/OpenNebula/one/pull/182
#2 Updated by Jaime Melis over 4 years ago
- Target version set to Release 5.4
Thanks for the ticket! We'll review it as soon as possible.
#3 Updated by Stefan Kooman over 4 years ago
I was bitten by this as well. I applied the patch from @kvaps and it works well. Thanks!
#4 Updated by kvaps kvaps over 4 years ago
Please note before merging my changes:
I think we must change the default value SHARED = "YES" in TM_MAD_CONF for ceph to SHARED = "NO" in oned.conf.
Otherwise, my fix may break clouds that use the default configuration for the ceph TM driver.
Also, I think we need a way to override TM_MAD_CONF options in oned.conf.
Please review this comment:
https://github.com/OpenNebula/one/pull/182#discussion_r95556917
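For illustration, the proposed default change would touch the ceph entry of TM_MAD_CONF in oned.conf roughly as follows. This is an abbreviated sketch: only NAME and SHARED come from this ticket, and the remaining attributes of the entry are deliberately left out and should stay as they are in your file.

```
# Abbreviated: other attributes of the ceph entry are omitted here.
TM_MAD_CONF = [
    NAME   = "ceph",
    SHARED = "NO"
]
```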
#5 Updated by Anton Todorov over 4 years ago
Hi,
I've already hit the case where a TM_MAD is shared for IMAGE but not for SYSTEM (ssh) with our addon.
My request #5061 addresses the matter.
Kind Regards,
Anton Todorov
#6 Updated by Vlastimil Holer about 4 years ago
- Status changed from Pending to Closed
Hello,
Thank you for the bug report. There really was a problem with live migrations when the datastore directory is on a shared filesystem, and it was recently fixed by Anton's patch https://github.com/OpenNebula/one/pull/160, mentioned in bug #4924.
I'm going to close the bug as fixed. Feel free to reopen it if you have any doubts.
Best regards,
Vlastimil Holer