Bug #4681

VM start blocked after "Generating deployment file"

Added by Jimmy Goffaux almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Start date: 07/25/2016
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Core & System
Target version: Release 5.0.2
Resolution: worksforme
Pull request:
Affected Versions: OpenNebula 5.0

Description

Hello,

I am running OpenNebula 5.0.1-1 (upgraded from 4.14).

My problem:

After the version upgrade, I lost a hypervisor (kernel panic) and had to power-cycle it.

When I tried to restart the VMs on that hypervisor, they were all blocked:

Mon Jul 18 10:26:19 2016 [Z0][VM][I]: New LCM state is UNKNOWN
Mon Jul 18 10:43:18 2016 [Z0][VM][I]: New LCM state is BOOT_UNKNOWN
Mon Jul 18 10:43:18 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1959/deployment.1
Mon Jul 18 10:55:57 2016 [Z0][VM][I]: New LCM state is UNKNOWN
Mon Jul 18 10:59:33 2016 [Z0][VM][I]: New LCM state is RUNNING
Mon Jul 18 10:59:46 2016 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Mon Jul 18 10:59:46 2016 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Mon Jul 18 10:59:46 2016 [Z0][VM][I]: New state is POWEROFF
Mon Jul 18 10:59:46 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Mon Jul 18 11:11:07 2016 [Z0][VM][I]: New state is ACTIVE
Mon Jul 18 11:11:07 2016 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Mon Jul 18 11:11:07 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1959/deployment.2
Mon Jul 18 17:06:41 2016 [Z0][VM][I]: New state is POWEROFF
Mon Jul 18 17:06:41 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Mon Jul 18 17:21:30 2016 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE_POWEROFF
Mon Jul 18 17:21:30 2016 [Z0][VM][I]: New state is ACTIVE
Mon Jul 18 17:21:34 2016 [Z0][VM][I]: New state is POWEROFF
Mon Jul 18 17:21:34 2016 [Z0][VM][I]: New LCM state is LCM_INIT

To work around the problem, I had to migrate the VMs to another hypervisor and reinstall the affected one.
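The migration itself was done with the standard onevm CLI, roughly as sketched below (the target host ID is a placeholder, not the real one):

# Cold-migrate the powered-off VM to another host (host ID is illustrative)
onevm migrate 1959 <target_host_id>
# Follow the state transitions afterwards
onevm show 1959 | grep STATE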

Thu Jul 21 23:26:24 2016 [Z0][VM][I]: New state is POWEROFF
Thu Jul 21 23:26:24 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Thu Jul 21 23:26:37 2016 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE_POWEROFF
Thu Jul 21 23:26:37 2016 [Z0][VM][I]: New state is ACTIVE
Thu Jul 21 23:26:39 2016 [Z0][VM][I]: New state is POWEROFF
Thu Jul 21 23:26:39 2016 [Z0][VM][I]: New LCM state is LCM_INIT
Thu Jul 21 23:27:34 2016 [Z0][VM][I]: New state is ACTIVE
Thu Jul 21 23:27:34 2016 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Thu Jul 21 23:27:34 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/1959/deployment.6
Thu Jul 21 23:27:36 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Thu Jul 21 23:27:36 2016 [Z0][VMM][I]: ExitCode: 0

Unfortunately, following an incident on the Ceph cluster, I had to restart all my VMs on all my nodes.

However, none of the old VMs that were migrated to the new hypervisors had this problem.

Once I had identified the problem, I tried to create a new VM on the affected hypervisor and hit the same issue, but not on the other hypervisors.

Servers: Ubuntu 14.04 and Ubuntu 16.04

Do you have any idea?
Many thanks

History

#1 Updated by Ruben S. Montero almost 5 years ago

This is probably caused by an old version of the drivers on the hypervisors.

Can you try onehost sync -f?

#2 Updated by Jimmy Goffaux almost 5 years ago

I have run the command:

oneadmin@ih-prd-oneadm01:~$ onehost sync -f
[.....]
* Adding ih-prd-onenode05 to upgrade
Jul 26 10:48:16 ih-prd-onenode05 sshd[19474]: Accepted publickey for oneadmin from A.B.C.D port 45721 ssh2: RSA SHA256:blabla
Jul 26 10:48:16 ih-prd-onenode05 sshd[19474]: pam_unix(sshd:session): session opened for user oneadmin by (uid=0)
Jul 26 10:48:16 ih-prd-onenode05 systemd-logind[2416]: New session 23 of user oneadmin.
Jul 26 10:48:18 ih-prd-onenode05 sshd[19557]: Received disconnect from A.B.C.D port 45721:11: disconnected by user
Jul 26 10:48:18 ih-prd-onenode05 sshd[19557]: Disconnected from A.B.C.D port 45721
Jul 26 10:48:18 ih-prd-onenode05 sshd[19474]: pam_unix(sshd:session): session closed for user oneadmin
Jul 26 10:48:18 ih-prd-onenode05 systemd-logind[2416]: Removed session 23.
Jul 26 10:48:25 ih-prd-onenode05 systemd-logind[2416]: Removed session 1.

And created a VM:

Tue Jul 26 10:48:55 2016 [Z0][HKM][I]: Success executing Hook: IPPONHOSTING_HOOK: Outgoing update query:.
Tue Jul 26 10:49:42 2016 [Z0][VM][I]: New state is ACTIVE
Tue Jul 26 10:49:42 2016 [Z0][VM][I]: New LCM state is PROLOG
Tue Jul 26 10:49:44 2016 [Z0][VM][I]: New LCM state is BOOT
Tue Jul 26 10:49:44 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/2083/deployment.0

#3 Updated by Ruben S. Montero almost 5 years ago

And the VM is not booting?

You can further debug this:

1.- Check the deployment file: does it look OK?

2.- Go to the hypervisor and try to start the VM directly with virsh create deployment.0. Check whether there is anything useful in the libvirtd/QEMU log files (see the sketch below).
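A minimal sketch of step 2, run on the hypervisor; the system-datastore path and the one-2083 domain name are the usual defaults, so adjust the IDs if yours differ:

# On the hypervisor, as oneadmin: the deployment file is normally copied
# into the VM directory of the system datastore (DS id below is a placeholder)
cd /var/lib/one/datastores/<system_ds_id>/2083
virsh --connect qemu:///system create deployment.0

# Look for errors in the libvirt/QEMU log for this domain
sudo tail -n 50 /var/log/libvirt/qemu/one-2083.log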

#4 Updated by Jimmy Goffaux almost 5 years ago

Yes, the VM is not booting.

The deployment file appears to be okay (/var/lib/one/vms/2083/deployment.0, checked as oneadmin).

The directory /var/lib/one/datastores/ID_DS/VM_ID has not even been created (checked as oneadmin).
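Roughly what I checked, for reference (the datastore ID is a placeholder, and xmllint is only used if it is installed):

# On the front-end: the generated deployment file parses as XML
xmllint --noout /var/lib/one/vms/2083/deployment.0 && echo "deployment.0 parses OK"

# On the hypervisor: the VM directory under the system datastore is missing
ls -ld /var/lib/one/datastores/<system_ds_id>/2083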

Yet the "XXX" server knows to connect:

oneadmin@ih-prd-oneadm01:~$ ssh ih-prd-onenode05
Warning: Permanently added 'ih-prd-onenode05,A.B.C.D' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-31-generic x86_64)
[...]

Last login: Mon Jul 18 15:25:25 2016 from A.B.C.D
oneadmin@ih-prd-onenode05:~$

#5 Updated by Ruben S. Montero almost 5 years ago

So it seems that somehow the prolog step is not working for you. In that step /var/lib/one/datastores/ID_DS/VM_ID is created, so please double-check the image and system datastores, and the host configuration used to access those datastores. Also, oned.log should include messages for VM 2083 and its prolog. Could you check for any error or log message related to the prolog of this VM (disks copied, directories created, etc.)?
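For example, something along these lines on the front-end, assuming the default log locations:

# Per-VM log: prolog and transfer manager messages for this VM
grep -iE 'prolog|error' /var/log/one/2083.log

# Main daemon log: anything for VM 2083 mentioning the TM or an error
grep '2083' /var/log/one/oned.log | grep -iE 'tm|error|fail'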

#6 Updated by Jimmy Goffaux almost 5 years ago

I think I have a permissions problem on the datastore:
- The permissions are set correctly in /var/lib/one (chown -R oneadmin:oneadmin).
- I can create VMs on other hypervisors in the same cluster, with the same datastores (Ceph) and virtual networks.
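The ownership check I did on the node, roughly (run as root; the recursive chown had already been applied):

# Ownership of the OpenNebula tree on the node
ls -ld /var/lib/one /var/lib/one/datastores

# Re-apply ownership recursively if anything looks off
chown -R oneadmin:oneadmin /var/lib/one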

I find no errors in the logs:

Tue Jul 26 10:48:52 2016 [Z0][ReM][D]: Req:6320 UID:41 TemplateInstantiate result SUCCESS, 2083
Tue Jul 26 10:48:53 2016 [Z0][ReM][D]: Req:5568 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:48:53 2016 [Z0][ReM][D]: Req:5568 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:48:54 2016 [Z0][ReM][D]: Req:8752 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:48:54 2016 [Z0][ReM][D]: Req:8752 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:48:54 2016 [Z0][ReM][D]: Req:7232 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:48:54 2016 [Z0][ReM][D]: Req:7232 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:48:55 2016 [Z0][ReM][D]: Req:7152 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:48:55 2016 [Z0][ReM][D]: Req:7152 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:48:55 2016 [Z0][ReM][D]: Req:80 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:48:55 2016 [Z0][ReM][D]: Req:80 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:48:55 2016 [Z0][HKM][D]: Message received: EXECUTE SUCCESS 2083 IPPONHOSTING_HOOK: Outgoing update query:
Tue Jul 26 10:48:55 2016 [Z0][HKM][D]: Message received: test-2083.one.ippon-hosting.net. 0 ANY    A    
Tue Jul 26 10:48:55 2016 [Z0][HKM][D]: Message received: test-2083.one.ippon-hosting.net. 60 IN    A    10.0.51.30
Tue Jul 26 10:48:55 2016 [Z0][HKM][D]: Message received: one-2083.one.ippon-hosting.net.    0 ANY    A    
Tue Jul 26 10:48:55 2016 [Z0][HKM][D]: Message received: one-2083.one.ippon-hosting.net.    60 IN    A    10.0.51.30
Tue Jul 26 10:49:00 2016 [Z0][ReM][D]: Req:7936 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:49:00 2016 [Z0][ReM][D]: Req:7936 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:49:04 2016 [Z0][ReM][D]: Req:6000 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:49:04 2016 [Z0][ReM][D]: Req:6000 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:49:05 2016 [Z0][ReM][D]: Req:7616 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:49:05 2016 [Z0][ReM][D]: Req:7616 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:49:12 2016 [Z0][ReM][D]: Req:3664 UID:0 VirtualMachineUpdateTemplate invoked , 2083, "<TEMPLATE><DESCRIPTI..." 
Tue Jul 26 10:49:12 2016 [Z0][ReM][D]: Req:3664 UID:0 VirtualMachineUpdateTemplate result SUCCESS, 2083
Tue Jul 26 10:49:20 2016 [Z0][ReM][D]: Req:8880 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:49:20 2016 [Z0][ReM][D]: Req:8880 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:49:42 2016 [Z0][ReM][D]: Req:4144 UID:0 VirtualMachineUpdateTemplate invoked , 2083, "<TEMPLATE><DESCRIPTI..." 
Tue Jul 26 10:49:42 2016 [Z0][ReM][D]: Req:4144 UID:0 VirtualMachineUpdateTemplate result SUCCESS, 2083
Tue Jul 26 10:49:42 2016 [Z0][ReM][D]: Req:1712 UID:0 VirtualMachineDeploy invoked , 2083, 16, false, 108
Tue Jul 26 10:49:42 2016 [Z0][DiM][D]: Deploying VM 2083
Tue Jul 26 10:49:42 2016 [Z0][ReM][D]: Req:1712 UID:0 VirtualMachineDeploy result SUCCESS, 2083
Tue Jul 26 10:49:43 2016 [Z0][ReM][D]: Req:1936 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:49:43 2016 [Z0][ReM][D]: Req:1936 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:49:44 2016 [Z0][TM][D]: Message received: TRANSFER SUCCESS 2083 -
Tue Jul 26 10:50:19 2016 [Z0][ReM][D]: Req:8912 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:50:19 2016 [Z0][ReM][D]: Req:8912 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:50:20 2016 [Z0][ReM][D]: Req:8448 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:50:20 2016 [Z0][ReM][D]: Req:8448 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:51:57 2016 [Z0][ReM][D]: Req:304 UID:41 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:51:57 2016 [Z0][ReM][D]: Req:304 UID:41 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:51:57 2016 [Z0][ReM][D]: Req:1664 UID:41 VirtualMachineMonitoring invoked , 2083
Tue Jul 26 10:51:59 2016 [Z0][ReM][D]: Req:8416 UID:41 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:51:59 2016 [Z0][ReM][D]: Req:8416 UID:41 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:51:59 2016 [Z0][ReM][D]: Req:4016 UID:41 VirtualMachineMonitoring invoked , 2083
Tue Jul 26 10:52:01 2016 [Z0][ReM][D]: Req:8352 UID:41 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:52:01 2016 [Z0][ReM][D]: Req:8352 UID:41 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:52:01 2016 [Z0][ReM][D]: Req:6416 UID:41 VirtualMachineMonitoring invoked , 2083
Tue Jul 26 10:52:05 2016 [Z0][ReM][D]: Req:2576 UID:41 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:52:05 2016 [Z0][ReM][D]: Req:2576 UID:41 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:52:05 2016 [Z0][ReM][D]: Req:3280 UID:41 VirtualMachineMonitoring invoked , 2083
Tue Jul 26 10:52:05 2016 [Z0][ReM][D]: Req:7280 UID:41 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:52:05 2016 [Z0][ReM][D]: Req:7280 UID:41 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:52:05 2016 [Z0][ReM][D]: Req:1808 UID:41 VirtualMachineMonitoring invoked , 2083
Tue Jul 26 10:54:50 2016 [Z0][ReM][D]: Req:9408 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:54:50 2016 [Z0][ReM][D]: Req:9408 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 
Tue Jul 26 10:54:50 2016 [Z0][ReM][D]: Req:3008 UID:0 VirtualMachineInfo invoked , 2083
Tue Jul 26 10:54:50 2016 [Z0][ReM][D]: Req:3008 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>2083</ID><UI..." 

#7 Updated by Jimmy Goffaux almost 5 years ago

I have rebooted the oneadmin server and now it's okay...

I think the admin server had been in a weird state since the update.
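For anyone hitting this later: before a full reboot, it may be enough to just restart the OpenNebula services on the front-end (a sketch assuming the standard Ubuntu packages with their systemd units):

# On the front-end, as root
systemctl restart opennebula            # oned
systemctl restart opennebula-sunstone   # Sunstone web UI, if used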

I cannot change the status of the ticket.

You can close it. Sorry for the wasted time.

#8 Updated by Ruben S. Montero almost 5 years ago

  • Status changed from Pending to Closed
  • Resolution set to worksforme

Thanks for letting us know
