Bug #5131

migration failure due to Ceph system datastore being misused as an fs datastore

Added by Arnaud Abélard about 4 years ago. Updated almost 4 years ago.

Status: Closed
Start date: 04/25/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Drivers - Storage
Target version: -
Resolution: worksforme
Pull request:
Affected Versions: OpenNebula 5.2

Description

I have two Ceph datastores: an image datastore (id 100) and a system datastore (id 101):

root@one-ctrl-1:~# onedatastore show 101
DATASTORE 101 INFORMATION                                                       
ID             : 101                 
NAME           : ceph-system         
USER           : oneadmin            
GROUP          : oneadmin            
CLUSTERS       : 0                   
TYPE           : SYSTEM              
DS_MAD         : -                   
TM_MAD         : ceph                
BASE PATH      : /var/lib/one//datastores/101
DISK_TYPE      : RBD                 
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 56.7T               
FREE:          : 54.1T               
USED:          : 2.6T                
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : uma                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
BRIDGE_LIST="iaas-vm-1.u07.univ-nantes.prive iaas-vm-2.u07.univ-nantes.prive iaas-vm-3.u07.univ-nantes.prive iaas-vm-4.u07.univ-nantes.prive iaas-vm-5.u07.univ-nantes.prive" 
CEPH_HOST="172.20.107.54:6789 172.20.106.54:6789 172.20.108.54:6789" 
CEPH_SECRET="6f5cab54-404b-4c63-b883-65ae350be8e7" 
CEPH_USER="opennebula" 
DATASTORE_CAPACITY_CHECK="YES" 
DISK_TYPE="RBD" 
DS_MIGRATE="NO" 
POOL_NAME="opennebula" 
RESTRICTED_DIRS="/" 
SAFE_DIRS="/var/tmp" 
SHARED="YES" 
TM_MAD="ceph" 
TYPE="SYSTEM_DS" 

When trying to migrate a VM, the migration fails because OpenNebula tries to use the Ceph system datastore as if it were a mounted filesystem datastore:

Tue Apr 25 15:46:16 2017 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/save 'one-425' '/var/lib/one//datastores/101/425/checkpoint' 'iaas-vm-4.u07.univ-nantes.prive' 425 iaas-vm-4.u07.univ-nantes.prive
Tue Apr 25 15:46:21 2017 [Z0][VMM][E]: save: Command "virsh --connect qemu:///system save one-425 /var/lib/one//datastores/101/425/checkpoint" failed: error: Failed to save domain one-425 to /var/lib/one//datastores/101/425/checkpoint
Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: error: operation failed: domain save job: unexpectedly failed
Tue Apr 25 15:46:21 2017 [Z0][VMM][E]: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: ExitCode: 1
Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: Failed to execute virtualization driver operation: save.
Tue Apr 25 15:46:21 2017 [Z0][VMM][E]: Error saving VM state: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
Tue Apr 25 15:46:21 2017 [Z0][VM][I]: New LCM state is RUNNING
Tue Apr 25 15:46:21 2017 [Z0][LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).
Tue Apr 25 16:05:16 2017 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/save 'one-425' '/var/lib/one//datastores/101/425/checkpoint' 'iaas-vm-4.u07.univ-nantes.prive' 425 iaas-vm-4.u07.univ-nantes.prive
Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: bash: line 2: cannot create temp file for here-document: No space left on device
Tue Apr 25 16:05:16 2017 [Z0][VMM][E]: save: Command "virsh --connect qemu:///system save one-425 /var/lib/one//datastores/101/425/checkpoint" failed: error: Failed to save domain one-425 to /var/lib/one//datastores/101/425/checkpoint
Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: error: operation failed: domain save job: unexpectedly failed
Tue Apr 25 16:05:16 2017 [Z0][VMM][E]: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: ExitCode: 1
Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: Failed to execute virtualization driver operation: save.
Tue Apr 25 16:05:16 2017 [Z0][VMM][E]: Error saving VM state: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
Tue Apr 25 16:05:16 2017 [Z0][VM][I]: New LCM state is RUNNING
Tue Apr 25 16:05:16 2017 [Z0][LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).

There is no filesystem mounted on /var/lib/one//datastores/101/, since datastore 101 is a Ceph datastore; "/var/lib/one//datastores/101/" is just a local path on the host.
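
For reference, this is easy to verify directly on the hypervisor (the hostname and paths below are simply taken from the log above, adjust as needed):

root@iaas-vm-4:~# findmnt -T /var/lib/one/datastores/101    # which filesystem actually backs this path
root@iaas-vm-4:~# df -h /var/lib/one/datastores/101         # space available for the checkpoint file

If findmnt reports a plain local filesystem rather than anything Ceph-backed, the checkpoint is written to the host's local disk, and df shows how much room libvirt has for the RAM dump, which is what runs out here.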

History

#1 Updated by Anton Todorov about 4 years ago

Hi,

IMHO this is expected behavior.

There are still supported Linux distributions that ship a virtualization stack (libvirt) which cannot read the checkpoint file directly from a block device. The driver therefore writes the RAM dump to the local file system first and then imports it into a Ceph volume. On resume, or when the VM is started on another host, the checkpoint file is extracted from the Ceph volume again and handed to libvirt to start the VM. For this to work you must have enough free space on the hypervisor node.
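
Conceptually the round trip looks roughly like this (command names and the checkpoint image name are only illustrative of the idea, not the exact TM driver scripts):

# source host: libvirt can only dump RAM to a plain file, so the driver saves locally first
virsh --connect qemu:///system save one-425 /var/lib/one/datastores/101/425/checkpoint
# the local file is then imported into the Ceph pool backing the system datastore
rbd --id opennebula import /var/lib/one/datastores/101/425/checkpoint opennebula/one-425-checkpoint
# destination host: the checkpoint is exported back to a local file before the VM is resumed
rbd --id opennebula export opennebula/one-425-checkpoint /var/lib/one/datastores/101/425/checkpoint
virsh --connect qemu:///system restore /var/lib/one/datastores/101/425/checkpoint

The intermediate file lives under /var/lib/one/datastores/<ds_id>/<vm_id>/ on the node's local disk, so there must be roughly the VM's RAM size free there.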

Kind Regards,
Anton Todorov

#2 Updated by Javi Fontan almost 4 years ago

  • Category set to Drivers - Storage
  • Status changed from Pending to Closed
  • Resolution set to worksforme

As Anton said, this is the expected behavior.
