Bug #1704

Rare failure to deploy VM

Added by Steffen Neumann over 8 years ago. Updated about 8 years ago.

Status:ClosedStart date:12/20/2012
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:Drivers - VM
Target version:-
Resolution:worksforme Pull request:
Affected Versions:OpenNebula 3.8

Description

Hi,

we have a running ONE-3.8.1 setup here. Master and nodes are runnung Ubuntu12.04,
with ONE compiled from source. Shared storage via NFS, stock ubuntu12.04 libvirt on the nodes.

What happens: we submitted a number of VMs, including VMID 365 and 366. The disks
are copied into var/datastores/0/365/disk.* but after that the SSH connection
to the node-blc18 dies, and ONE considers both VMs as failed.
In reality, libvirt was able to deploy and start the VM 366 successfully,
but ONE never saw that.

One effect is that the node-blc18 has a wrong count for availability,
and is over-subscribed. We have attached some log snippets.

Somehow the deployment needs to be "atomic", or wrapped in
a kind of transaction which (analogous to databases) either succeeds
or fails.

Yours,
Steffen

plain text document attachment (libvirtd.log on node-blc18)

2012-12-12 10:56:06.623+0000: 1539: info : libvirt version: 0.9.8
2012-12-12 10:56:06.623+0000: 1539: error : virNetMessageDecodeLength:140 : packet 16801276 bytes received from server too large, want 262144
2012-12-12 10:56:06.853+0000: 1543: error : virCommandHandshakeWait:2369 : Unable to wait for child process: Bad file descriptor
2012-12-12 12:23:57.040+0000: 1539: error : qemuMonitorIO:603 : internal error End of file from monitor
2012-12-12 12:23:59.276+0000: 1539: error : qemuMonitorIO:603 : internal error End of file from monitor
2012-12-12 12:24:23.181+0000: 1539: error : qemuMonitorIO:603 : internal error End of file from monitor

plain text document attachment (one-365.log)

2012-12-12 10:56:06.847+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 46080 -smp 12,sockets=12,cores=1,threads=1 -name one-365 -uuid 986f1048-f29c-18d3-352f-929fff8c2a91 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-365.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/cumulus/one/cur/var//datastores/0/365/disk.0,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/cumulus/one/cur/var//datastores/0/365/disk.1,if=none,id=drive-ide0-0-1,format=raw -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -drive file=/cumulus/one/cur/var//datastores/0/365/disk.2,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=02:00:0a:16:02:2d,bus=pci.0,addr=0x3 -netdev tap,fd=19,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=02:00:0a:17:02:0e,bus=pci.0,addr=0x4 -usb -vnc 0.0.0.0:365 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
2012-12-12 10:56:06.853+0000: shutting down

plain text document attachment (one-366.log)

2012-12-12 10:56:10.088+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 46080 -smp 12,sockets=12,cores=1,threads=1 -name one-366 -uuid 3a39d6f2-fcf7-0e39-89e5-19580432af59 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-366.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/cumulus/one/cur/var//datastores/0/366/disk.0,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/cumulus/one/cur/var//datastores/0/366/disk.1,if=none,id=drive-ide0-0-1,format=raw -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -drive file=/cumulus/one/cur/var//datastores/0/366/disk.2,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=02:00:0a:16:02:2e,bus=pci.0,addr=0x3 -netdev tap,fd=19,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=02:00:0a:17:02:0f,bus=pci.0,addr=0x4 -usb -vnc 0.0.0.0:366 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
qemu: terminating on signal 15 from pid 1539
2012-12-12 12:24:23.181+0000: shutting down

plain text document attachment (vm365.log)

Wed Dec 12 11:55:33 2012 [DiM][I]: New VM state is ACTIVE.
Wed Dec 12 11:55:33 2012 [LCM][I]: New VM state is PROLOG.
Wed Dec 12 11:56:05 2012 [TM][I]: clone: Cloning /cumulus/one/cur/var/datastores/1/539e973e5c346fa09df6b138ddec2af2 in node-blc18:/cumulus/one/cur/var//datastores/0/365/disk.0
Wed Dec 12 11:56:05 2012 [TM][I]: ExitCode: 0
Wed Dec 12 11:56:24 2012 [TM][I]: clone: Cloning /cumulus/one/cur/var/datastores/1/62f489936c7f3eb5742f903a1cb3e75d in node-blc18:/cumulus/one/cur/var//datastores/0/365/disk.1
Wed Dec 12 11:56:24 2012 [TM][I]: ExitCode: 0
Wed Dec 12 11:56:55 2012 [TM][I]: context: Generating context block device at node-blc18:/cumulus/one/cur/var//datastores/0/365/disk.2
Wed Dec 12 11:56:55 2012 [TM][I]: ExitCode: 0
Wed Dec 12 11:56:55 2012 [LCM][I]: New VM state is BOOT
Wed Dec 12 11:56:55 2012 [VMM][I]: Generating deployment file: /cumulus/one/cur/var/365/deployment.0
Wed Dec 12 11:57:06 2012 [VMM][I]: ExitCode: 0
Wed Dec 12 11:57:06 2012 [VMM][I]: Successfully execute network driver operation: pre.
Wed Dec 12 11:57:09 2012 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /cumulus/one/cur/var//datastores/0/365/deployment.0 node-blc18 365 node-blc18
Wed Dec 12 11:57:09 2012 [VMM][I]: error: Failed to create domain from /cumulus/one/cur/var//datastores/0/365/deployment.0
Wed Dec 12 11:57:09 2012 [VMM][I]: error: Unable to wait for child process: Bad file descriptor
Wed Dec 12 11:57:09 2012 [VMM][E]: Could not create domain from /cumulus/one/cur/var//datastores/0/365/deployment.0
Wed Dec 12 11:57:09 2012 [VMM][I]: ExitCode: 255
Wed Dec 12 11:57:09 2012 [VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Dec 12 11:57:09 2012 [VMM][E]: Error deploying virtual machine: Could not create domain from /cumulus/one/cur/var//datastores/0/365/deployment.0
Wed Dec 12 11:57:09 2012 [DiM][I]: New VM state is FAILED

plain text document attachment (vm366.log)

Wed Dec 12 11:55:33 2012 [DiM][I]: New VM state is ACTIVE.
Wed Dec 12 11:55:33 2012 [LCM][I]: New VM state is PROLOG.
Wed Dec 12 11:56:07 2012 [TM][I]: clone: Cloning /cumulus/one/cur/var/datastores/1/539e973e5c346fa09df6b138ddec2af2 in node-blc18:/cumulus/one/cur/var//datastores/0/366/disk.0
Wed Dec 12 11:56:07 2012 [TM][I]: ExitCode: 0
Wed Dec 12 11:56:25 2012 [TM][I]: clone: Cloning /cumulus/one/cur/var/datastores/1/62f489936c7f3eb5742f903a1cb3e75d in node-blc18:/cumulus/one/cur/var//datastores/0/366/disk.1
Wed Dec 12 11:56:25 2012 [TM][I]: ExitCode: 0
Wed Dec 12 11:56:56 2012 [TM][I]: context: Generating context block device at node-blc18:/cumulus/one/cur/var//datastores/0/366/disk.2
Wed Dec 12 11:56:56 2012 [TM][I]: ExitCode: 0
Wed Dec 12 11:56:56 2012 [LCM][I]: New VM state is BOOT
Wed Dec 12 11:56:56 2012 [VMM][I]: Generating deployment file: /cumulus/one/cur/var/366/deployment.0
Wed Dec 12 11:57:07 2012 [VMM][I]: ExitCode: 0
Wed Dec 12 11:57:07 2012 [VMM][I]: Successfully execute network driver operation: pre.
Wed Dec 12 11:57:07 2012 [VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy /cumulus/one/cur/var//datastores/0/366/deployment.0 node-blc18 366 node-blc18
Wed Dec 12 11:57:07 2012 [VMM][I]: error: Failed to create domain from /cumulus/one/cur/var//datastores/0/366/deployment.0
Wed Dec 12 11:57:07 2012 [VMM][I]: error: End of file while reading data: Input/output error
Wed Dec 12 11:57:07 2012 [VMM][E]: Could not create domain from /cumulus/one/cur/var//datastores/0/366/deployment.0
Wed Dec 12 11:57:07 2012 [VMM][I]: ExitCode: 255
Wed Dec 12 11:57:07 2012 [VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Dec 12 11:57:07 2012 [VMM][E]: Error deploying virtual machine: Could not create domain from /cumulus/one/cur/var//datastores/0/366/deployment.0
Wed Dec 12 11:57:07 2012 [DiM][I]: New VM state is FAILED

History

#1 Updated by Ruben S. Montero about 8 years ago

  • Category set to Drivers - VM

#2 Updated by Ruben S. Montero about 8 years ago

  • Status changed from New to Closed
  • Resolution set to worksforme

So if the VMs are successfully deployed but the driver returns a failure this can be tackled by the ZOMBIE attribute. There is not much that can be done if a failure is reported by the driver.

Note that disks transfers are made separately, and all the actions needed to deploy a VM are atomic (networking...)

Closing this as worksforme, will be reopened if more information can be used to reproduce this.

Also available in: Atom PDF