Bug #4284

Scheduler stops working during night

Added by Tobias Fischer over 5 years ago. Updated over 5 years ago.

Status: Closed
Start date: 01/11/2016
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Scheduler
Target version: Release 5.0
Resolution: duplicate
Pull request:
Affected Versions: OpenNebula 4.14
Description

Hello,

We observed that the scheduler stops working during the night: the next morning, VMs stay in the PENDING state. After a scheduler restart (one stop-sched followed by one start-sched), everything works again as expected. As a workaround, we have cron do this every 5 minutes.
We're running the controller (OpenNebula 4.14.0) on Debian 8.2 with systemd.
Could systemd be the problem? Or maybe logrotate or a cron job?

Thanks!

Best,
Tobias

oned.log (13.6 MB) Tobias Fischer, 01/19/2016 11:12 AM

sched.log (2.17 KB) Tobias Fischer, 01/19/2016 11:12 AM


Related issues

Related to Bug #3390: mm_sched stops scheduling new vms Closed 12/01/2014

History

#1 Updated by Ruben S. Montero over 5 years ago

Could you enable core dumps? We could then take a look at the core and see what happened.

#2 Updated by Tobias Fischer over 5 years ago

Hello Ruben,

See the attached logs. Actually, we had no problems while generating the logs. We will keep observing and update the report if we find something.

Thanks for the help.

Best,
Tobias

#3 Updated by Armin Deliomini over 5 years ago

Hi Ruben,

We seem to have the same problem in our setup since the last maintenance upgrade to 4.14.2, on Ubuntu 14.04.3 running kernel 4.2.0. Since that Ubuntu version uses Upstart, the problem should not be connected to systemd.

I'll collect the logs and upload them if I find something related.

Ciao, Armin

#4 Updated by Ruben S. Montero over 5 years ago

The best thing would be to connect to the scheduler and dump the stack, something like:


gdb `which mm_sched` `pgrep mm_sched`
(gdb) thread apply all bt

Thanks
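
For anyone who wants to capture the same stack dump from a script (for example right before a scheduled restart), here is a minimal sketch that runs the gdb commands above non-interactively. It assumes gdb and pgrep are installed, that exactly one mm_sched process is running, and that the calling user is allowed to attach to it. The function names build_gdb_command and dump_scheduler_stacks are made up for this example.

```python
import shutil
import subprocess


def build_gdb_command(binary, pid):
    """Build a non-interactive gdb invocation that prints every thread's backtrace."""
    return [
        "gdb",
        "-batch",                      # run the -ex commands, then exit
        "-ex", "thread apply all bt",  # same command as typed at the gdb prompt
        binary,                        # executable, so gdb can resolve symbols
        str(pid),                      # process to attach to
    ]


def dump_scheduler_stacks(process_name="mm_sched"):
    """Attach to the running scheduler and return gdb's output as text."""
    binary = shutil.which(process_name)
    if binary is None:
        raise RuntimeError(f"{process_name} not found in PATH")

    # -x restricts pgrep to processes whose name matches exactly.
    pids = subprocess.run(
        ["pgrep", "-x", process_name], capture_output=True, text=True
    ).stdout.split()
    if len(pids) != 1:
        raise RuntimeError(
            f"expected exactly one {process_name} process, found {len(pids)}"
        )

    result = subprocess.run(
        build_gdb_command(binary, pids[0]), capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    print(dump_scheduler_stacks())
```

The process is only paused while gdb is attached and resumes when gdb exits, so this should be safe to run against a hung scheduler before restarting it.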

#5 Updated by Tobias Fischer over 5 years ago

Hi Ruben,

Today the scheduler stopped working: new VMs were stuck in the PENDING state. After a one stop-sched and one start-sched, the VMs got deployed. This is the output of gdb `which mm_sched` `pgrep mm_sched` and thread apply all bt:

Thread 2 (Thread 0x7f226d8b4700 (LWP 14086)):
#0 0x00007f2272d18389 in pselect () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f22721fa237 in ?? () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client.so.3
#2 0x00007f22721fa701 in ?? () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client.so.3
#3 0x00007f22721facc7 in ?? () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client.so.3
#4 0x00007f2273d93c3c in xmlrpc_c::clientXmlTransport_http::call(xmlrpc_c::carriageParm*, std::string const&, std::string*) ()
from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#5 0x00007f2273d937fe in xmlrpc_c::client_xml::call(xmlrpc_c::carriageParm*, std::string const&, xmlrpc_c::paramList const&, xmlrpc_c::rpcOutcome*) ()
from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#6 0x00007f2273d9693c in xmlrpc_c::rpc::call(xmlrpc_c::client*, xmlrpc_c::carriageParm*) () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#7 0x00007f2273d9875c in xmlrpc_c::clientSimple::call(std::string, std::string, std::string, xmlrpc_c::value*, ...) ()
from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#8 0x000000000042a577 in VirtualMachinePoolXML::load_info (this=0x1074830, result=...) at src/scheduler/src/pool/VirtualMachinePoolXML.cc:128
#9 0x000000000041757e in PoolXML::set_up (this=0x1074830) at src/scheduler/include/PoolXML.h:61
#10 0x000000000042b2ea in VirtualMachineActionsPoolXML::set_up (this=0x1074830) at src/scheduler/src/pool/VirtualMachinePoolXML.cc:259
#11 0x0000000000416599 in Scheduler::do_action (this=0x7ffe7315fd10, name="ACTION_TIMER", args=0x0) at src/scheduler/src/sched/Scheduler.cc:1355
#12 0x00000000004429a7 in ActionManager::loop (this=0x7ffe7315fdf8, timer=20, timer_args=0x0) at src/common/ActionManager.cc:111
#13 0x000000000041179a in scheduler_action_loop (arg=0x7ffe7315fd10) at src/scheduler/src/sched/Scheduler.cc:92
#14 0x00007f227380c0a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f2272d1f04d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7f22747e2740 (LWP 14083)):
#0 0x00007f2273813609 in do_sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f2273813693 in sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x0000000000412970 in Scheduler::start (this=0x7ffe7315fd10) at src/scheduler/src/sched/Scheduler.cc:397
#3 0x000000000040d4bd in main (argc=1, argv=0x7ffe7315ffc8) at src/scheduler/src/sched/mm_sched.cc:72

tried a 2nd time:

Thread 2 (Thread 0x7f226d8b4700 (LWP 14086)):
#0 0x00007f2272d18389 in pselect () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f22721fa237 in ?? () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client.so.3
#2 0x00007f22721fa701 in ?? () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client.so.3
#3 0x00007f22721facc7 in ?? () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client.so.3
#4 0x00007f2273d93c3c in xmlrpc_c::clientXmlTransport_http::call(xmlrpc_c::carriageParm*, std::string const&, std::string*) ()
from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#5 0x00007f2273d937fe in xmlrpc_c::client_xml::call(xmlrpc_c::carriageParm*, std::string const&, xmlrpc_c::paramList const&, xmlrpc_c::rpcOutcome*) ()
from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#6 0x00007f2273d9693c in xmlrpc_c::rpc::call(xmlrpc_c::client*, xmlrpc_c::carriageParm*) () from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#7 0x00007f2273d9875c in xmlrpc_c::clientSimple::call(std::string, std::string, std::string, xmlrpc_c::value*, ...) ()
from /usr/lib/x86_64-linux-gnu/libxmlrpc_client++.so.8
#8 0x000000000042a577 in VirtualMachinePoolXML::load_info (this=0x1074830, result=...) at src/scheduler/src/pool/VirtualMachinePoolXML.cc:128
#9 0x000000000041757e in PoolXML::set_up (this=0x1074830) at src/scheduler/include/PoolXML.h:61
#10 0x000000000042b2ea in VirtualMachineActionsPoolXML::set_up (this=0x1074830) at src/scheduler/src/pool/VirtualMachinePoolXML.cc:259
#11 0x0000000000416599 in Scheduler::do_action (this=0x7ffe7315fd10, name="ACTION_TIMER", args=0x0) at src/scheduler/src/sched/Scheduler.cc:1355
#12 0x00000000004429a7 in ActionManager::loop (this=0x7ffe7315fdf8, timer=20, timer_args=0x0) at src/common/ActionManager.cc:111
#13 0x000000000041179a in scheduler_action_loop (arg=0x7ffe7315fd10) at src/scheduler/src/sched/Scheduler.cc:92
#14 0x00007f227380c0a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f2272d1f04d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7f22747e2740 (LWP 14083)):
#0 0x00007f2273813609 in do_sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f2273813693 in sigwait () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x0000000000412970 in Scheduler::start (this=0x7ffe7315fd10) at src/scheduler/src/sched/Scheduler.cc:397
#3 0x000000000040d4bd in main (argc=1, argv=0x7ffe7315ffc8) at src/scheduler/src/sched/mm_sched.cc:72

hope this helps...

Best,
Tobi
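
Both dumps above show the same picture: Thread 2 sits in pselect underneath xmlrpc_c::clientSimple::call, reached from VirtualMachinePoolXML::load_info, while Thread 1 waits in sigwait. That is consistent with the scheduler thread waiting on an XML-RPC reply, although the trace alone does not prove the root cause. To compare several dumps quickly, a small parser like the sketch below can help. It only assumes the gdb output format shown above, and the helper names summarize_backtrace and innermost_frames are made up for this example.

```python
import re

# "Thread 2 (Thread 0x7f226d8b4700 (LWP 14086)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# "#8 0x000000000042a577 in VirtualMachinePoolXML::load_info (this=...) at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def summarize_backtrace(text):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = THREAD_RE.match(line)
        if match:
            current = int(match.group(1))
            threads[current] = []
            continue
        match = FRAME_RE.match(line)
        if match and current is not None:
            threads[current].append(match.group(2))
        # Wrapped "from /usr/lib/..." continuation lines match neither
        # pattern and are skipped.
    return threads


def innermost_frames(text):
    """Return {thread_number: name of frame #0} for every thread with frames."""
    return {
        tid: frames[0]
        for tid, frames in summarize_backtrace(text).items()
        if frames
    }


def same_position(first_dump, second_dump):
    """True if every thread shows the same call chain in both dumps."""
    return summarize_backtrace(first_dump) == summarize_backtrace(second_dump)
```

Feeding the two dumps from this comment to same_position should show whether the threads moved between the two attempts. Identical call chains suggest the thread is stuck rather than just slow.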

#6 Updated by Ruben S. Montero over 5 years ago

Hi Tobias,

Were these outputs obtained while the scheduler was not working or after restarting it?

#7 Updated by Tobias Fischer over 5 years ago

Hi Ruben,

The outputs were obtained while the scheduler was not working, so before the restart.

Best,
Tobi

#8 Updated by Ruben S. Montero over 5 years ago

  • Target version set to Release 5.0

#9 Updated by Michal Leinweber over 5 years ago

There is already an older issue for this: http://dev.opennebula.org/issues/3390

#10 Updated by Ruben S. Montero over 5 years ago

  • Status changed from Pending to Closed
  • Resolution set to duplicate

#11 Updated by Ruben S. Montero over 5 years ago

  • Related to Bug #3390: mm_sched stops scheduling new vms added
