Bug #148

one stops monitoring VM after deployment

Added by Marlon Nerling almost 12 years ago. Updated over 11 years ago.

Status: Closed
Start date: 09/10/2009
Priority: High
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Core & System
Target version: Release 1.2
Resolution: worksforme
Pull request:
Affected Versions:

Description

one stops monitoring the VM after deployment: the status stays at BOOT, although the machine is up and running.
If I update the DB manually with "update vm_pool set lcm_state=3 where oid=1766", then onevm list shows the VM as running, but only once; the next time it is back at boot.

Nothing has changed in the application since last week, when it was working correctly.

oned -v

Copyright 2002-2009, Distributed Systems Architecture Group,
Universidad Complutense de Madrid (dsa-research.org).

OpenNebula release 1.2 (2009/2/6) is distributed and licensed for use under the terms of the
Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).

Associated revisions

Revision 2565f29b
Added by Tino Vázquez over 4 years ago

Merge pull request #148 from n40lab/F4584_enhanced_storagedrs

F5484 Enhance StorageDRS allowing VM deployment

History

#1 Updated by Marlon Nerling almost 12 years ago

Thu Sep 10 13:05:46 2009 [VMM][I]: Warning: Permanently added '172.22.253.254' (RSA) to the list of known hosts.
Thu Sep 10 13:05:46 2009 [VMM][I]: libvir: QEMU error : Domain not found
Thu Sep 10 13:05:46 2009 [VMM][I]: error: failed to get domain '-'
Thu Sep 10 13:05:46 2009 [VMM][I]: ExitCode: 1
Thu Sep 10 13:05:46 2009 [VMM][I]: VM running but it was not found. Assuming it is done.

But:

abatesting:/var/www/abatesting/DB/classes# ssh 172.22.253.254 virsh list
Warning: Permanently added '172.22.253.254' (RSA) to the list of known hosts.
Id Name State
----------------------------------
61 one-1776 running

#2 Updated by Marlon Nerling almost 12 years ago

I think I found it:

echo 'select * from vm_pool where oid=1776 ;' | sqlite3 /var/lib/one/one.db
1776|0|1252580746|1776|6|0|1252576564|1252580802||0|0|0|0

Do you see? It does not register the deploy_id.

Could it be that one is having problems with the DB?
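The same check can be run across the whole pool instead of one VM at a time. This is only a minimal sketch: the column name `deploy_id` is an assumption (the comments above confirm only `oid` and `lcm_state` as vm_pool columns), and the function name `vms_missing_deploy_id` is made up for illustration.

```python
import sqlite3


def vms_missing_deploy_id(conn):
    """Return the oids of vm_pool rows whose deploy_id is NULL or empty.

    Assumes vm_pool has a column named deploy_id (not confirmed in this report).
    """
    cur = conn.execute(
        "SELECT oid FROM vm_pool "
        "WHERE deploy_id IS NULL OR deploy_id = '' "
        "ORDER BY oid"
    )
    return [row[0] for row in cur.fetchall()]


# Possible usage against the database path quoted above, opened read-only
# so that nothing oned relies on is modified:
#   conn = sqlite3.connect("file:/var/lib/one/one.db?mode=ro", uri=True)
#   print(vms_missing_deploy_id(conn))
```

Opening the file read-only is a deliberate choice here, since writing to the DB by hand while oned is running is what produced the flip-flopping state described in the report.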

#3 Updated by Marlon Nerling almost 12 years ago

grep 1776 /var/log/one/oned.log

Thu Sep 10 13:05:46 2009 [VMM][I]: Monitoring VM 1776.
Thu Sep 10 13:05:46 2009 [VMM][D]: Message received: LOG - 1776 Warning: Permanently added '172.22.253.254' (RSA) to the list of known hosts.
Thu Sep 10 13:05:46 2009 [VMM][D]: Message received: LOG - 1776 libvir: QEMU error : Domain not found
Thu Sep 10 13:05:46 2009 [VMM][D]: Message received: LOG - 1776 error: failed to get domain '-'
Thu Sep 10 13:05:46 2009 [VMM][D]: Message received: LOG - 1776 ExitCode: 1
Thu Sep 10 13:05:46 2009 [VMM][D]: Message received: POLL SUCCESS 1776 STATE=d
Thu Sep 10 13:05:46 2009 [TM][D]: Message received: LOG - 1776 tm_delete.sh: Source: 172.22.253.254:/var/lib/one//1776/images
Thu Sep 10 13:05:46 2009 [TM][D]: Message received: LOG - 1776 tm_delete.sh: Destiny: 172.22.253.254:/var/lib/one//1776/images
Thu Sep 10 13:05:46 2009 [TM][D]: Message received: LOG - 1776 tm_delete.sh: Executed "ssh 172.22.253.254 mkdir -p /var/lib/one//1776".
Thu Sep 10 13:05:46 2009 [TM][D]: Message received: LOG - 1776 tm_delete.sh: Holding 172.22.253.254:/var/lib/one//1776/images back
Thu Sep 10 13:06:42 2009 [TM][D]: Message received: LOG - 1776 tm_delete.sh: Executed "ssh 172.22.253.254 cp -a /var/lib/one//1776/images /one-master//var/lib/one//1776".
Thu Sep 10 13:06:42 2009 [TM][D]: Message received: LOG - 1776 tm_delete.sh: Executed "ssh 172.22.253.254 rm -rf /var/lib/one//1776".
Thu Sep 10 13:06:42 2009 [TM][D]: Message received: TRANSFER SUCCESS 1776 -

Log of another VM:
grep 1777 /var/log/one/oned.log
Thu Sep 10 13:14:40 2009 [DiM][D]: Deploying VM 1777
Thu Sep 10 13:14:40 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: SOURCE : abatesting:/var/lib/one/1770/images/disk.0
Thu Sep 10 13:14:40 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: DESTINATION: 172.22.0.6:/var/lib/one//1777/images/disk.0
Thu Sep 10 13:14:40 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Creating directory /var/lib/one//1777/images
Thu Sep 10 13:14:40 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 mkdir -p /var/lib/one//1777/images".
Thu Sep 10 13:14:40 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 chmod a+w /var/lib/one//1777/images".
Thu Sep 10 13:14:40 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Cloning /var/lib/one/1770/images/disk.0 to 172.22.0.6:/var/lib/one//1777/images/disk.0
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 cp -L /one-master//var/lib/one/1770/images/disk.0 /var/lib/one//1777/images/disk.0".
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 chmod a+rw -R /var/lib/one//1777/images/disk.0".
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: SOURCE : abatesting:/var/lib/one/1770/images/disk.1
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: DESTINATION: 172.22.0.6:/var/lib/one//1777/images/disk.1
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Creating directory /var/lib/one//1777/images
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 mkdir -p /var/lib/one//1777/images".
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 chmod a+w /var/lib/one//1777/images".
Thu Sep 10 13:14:47 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Cloning /var/lib/one/1770/images/disk.1 to 172.22.0.6:/var/lib/one//1777/images/disk.1
Thu Sep 10 13:16:31 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 cp -L /one-master//var/lib/one/1770/images/disk.1 /var/lib/one//1777/images/disk.1".
Thu Sep 10 13:16:31 2009 [TM][D]: Message received: LOG - 1777 tm_clone.sh: Executed "ssh 172.22.0.6 chmod a+rw -R /var/lib/one//1777/images/disk.1".
Thu Sep 10 13:16:31 2009 [TM][D]: Message received: TRANSFER SUCCESS 1777 -
Thu Sep 10 13:16:31 2009 [VMM][D]: Message received: LOG - 1777 Command: scp /var/lib/one/1777/deployment.0 172.22.0.6:/var/lib/one//1777/images/deployment.0
Thu Sep 10 13:16:31 2009 [VMM][D]: Message received: LOG - 1777 Copy success
Thu Sep 10 13:16:31 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:16:31 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:16:31 2009 [VMM][D]: Message received: DEPLOY SUCCESS 1777 one-1777
Thu Sep 10 13:16:46 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:16:46 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:16:46 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:16:46 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
Thu Sep 10 13:17:16 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:17:16 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:17:16 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:17:16 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
Thu Sep 10 13:17:46 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:17:46 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:17:46 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:17:46 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
Thu Sep 10 13:18:16 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:18:16 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:18:16 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:18:16 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
Thu Sep 10 13:18:46 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:18:46 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:18:46 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:18:46 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
Thu Sep 10 13:19:16 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:19:16 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:19:16 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:19:16 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
Thu Sep 10 13:19:46 2009 [VMM][I]: Monitoring VM 1777.
Thu Sep 10 13:19:46 2009 [VMM][D]: Message received: LOG - 1777 Warning: Permanently added '172.22.0.6' (RSA) to the list of known hosts.
Thu Sep 10 13:19:46 2009 [VMM][D]: Message received: LOG - 1777 ExitCode: 0
Thu Sep 10 13:19:46 2009 [VMM][D]: Message received: POLL SUCCESS 1777 USEDMEMORY=2096128 STATE=a
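To compare the two excerpts mechanically, a small script can pull the DEPLOY SUCCESS and POLL SUCCESS messages out of oned.log and group them by VM. This is a rough sketch that assumes the message layout seen in the lines above; the function names `summarize` and `polled_without_deploy` are hypothetical and not part of OpenNebula.

```python
import re

# Patterns modelled on the oned.log lines quoted in this report.
DEPLOY_RE = re.compile(r"Message received: DEPLOY SUCCESS (\d+) (\S+)")
POLL_RE = re.compile(r"Message received: POLL SUCCESS (\d+)\b.*\bSTATE=(\w)")


def summarize(lines):
    """Map VM id -> {"deploy_id": str or None, "states": [poll states in order]}."""
    vms = {}
    for line in lines:
        m = DEPLOY_RE.search(line)
        if m:
            entry = vms.setdefault(m.group(1), {"deploy_id": None, "states": []})
            entry["deploy_id"] = m.group(2)
            continue
        m = POLL_RE.search(line)
        if m:
            entry = vms.setdefault(m.group(1), {"deploy_id": None, "states": []})
            entry["states"].append(m.group(2))
    return vms


def polled_without_deploy(vms):
    """VM ids that were polled although no DEPLOY SUCCESS line was seen for them."""
    return sorted(
        vm for vm, info in vms.items()
        if info["states"] and info["deploy_id"] is None
    )


# Possible usage:
#   with open("/var/log/one/oned.log") as log:
#       print(polled_without_deploy(summarize(log)))
```

Fed the two excerpts as pasted, this would list 1776 (polled with STATE=d, no DEPLOY SUCCESS line in the excerpt shown) and not 1777, which has both a deploy ID and STATE=a polls.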

#4 Updated by Ruben S. Montero over 11 years ago

  • Status changed from New to Closed
  • Resolution set to worksforme

OpenNebula only monitors VMs in the Running state; this could be caused by a missing callback from the driver. Also, the VM life-cycle has been changed to improve the behavior reported in the comments above, so it is now harder to end up with a zombie VM (not running according to OpenNebula but actually running in the hypervisor).
