Bug #353

mm_sched crashes with segmentation fault

Added by Stefan Freitag almost 11 years ago. Updated almost 11 years ago.

Status: Closed
Start date: 09/22/2010
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Scheduler
Target version: -
Resolution: fixed
Pull request:
Affected Versions:

Description

Dear all,

I had to re-install a lot of VMs in my system and noticed that mm_sched stopped assigning the VMs to physical nodes after some time. Running "ps aux" revealed that the mm_sched process was no longer running.

To deal with this issue I decided to run mm_sched manually, supplying the parameters that would usually be set by the "one" script (which starts the ONE daemon and the scheduler). I also tried to tweak some values and ran:

oneadmin@one:/opt/one/bin> mm_sched -p 2633 -t 30 -m 600 -d 3 -h 1

After a few seconds or minutes I got this output:

Segmentation fault

I have also attached an strace run of mm_sched showing the same result.

Could you please have a look at this?
(I am running a ONE 2.0beta git snapshot from 29.07.2010)

Cheers,
Stefan

debug.txt - strace for mm_sched (76.1 KB) Stefan Freitag, 09/22/2010 09:01 AM

Associated revisions

Revision 1daa959f
Added by Javi Fontan almost 4 years ago

Merge pull request #353 from gladhorn/master

Fix authentication for users with non-ascii chars in their names

History

#1 Updated by Stefan Freitag almost 11 years ago

Hello Ruben,

just an update on the situation. After some investigation, the problem was imo caused by a "bad" template file that contained the line

REQUIREMENTS=HOSTNAME = ""

Do you know if this could lead to the seg fault in the scheduler?

Cheers
Stefan

#2 Updated by Carlos Martín almost 11 years ago

Hi Stefan,

I've tried to reproduce the error submitting this template:

NAME = test

REQUIREMENTS = HOSTNAME = "" 

But all I get is this error from the Request Manager:

$ onevm create template.one 
Error: [VirtualMachineAllocate] Error trying to PARSE VM TEMPLATE Returned error code [1].. Reason: syntax error, unexpected EQUAL, expecting $end or VARIABLE at line 3, columns 37:40

This means that the malformed template is never inserted in the DB, and the scheduler never gets a chance to read it.

Could you send us the complete template you used?

Regards,
Carlos.

#3 Updated by Stefan Freitag almost 11 years ago

Hello Carlos,

I think you missed some quotes in the REQUIREMENTS section.
Here is the template I used to create the VM:

=== snip ===
NAME = udo-wn117
VCPU = 8
MEMORY = 13312

OS = [
bootloader = "/root/bin/domUloader.py"
]

RAW = [ type="xen", data="bootargs=\"--verbose --entry=xvda1\"" ]

DISK = [
source = "/mnt/gridconfig/images/workernode/wn_sl54_x86_64_2.img",
target = "xvda",
readonly = "no" ]

DISK = [
type = swap,
size = 1024,
target = "xvdb",
readonly = "no" ]

DISK = [
type="block",
clone="yes",
source="/dev/cciss/c0d0p4",
target = "xvdc",
readonly = "no" ]

NIC = [NETWORK="DGRZRWorkernodes", IP = X.XXX.XXX.XXX ]

REQUIREMENTS = "HOSTNAME = \"\""
RANK = FREECPU

=== ===

Doing

  onevm create udo-wn117.template

results in

  onevm show 346
    VIRTUAL MACHINE 346 INFORMATION
    ID : 346
    NAME : udo-wn117
    STATE : PENDING
    LCM_STATE : LCM_INIT
    START TIME : 09/24 15:25:58
    END TIME : -
    DEPLOY ID: :

VIRTUAL MACHINE TEMPLATE
DISK=[
DISK_ID=0,
READONLY=no,
SOURCE=/mnt/gridconfig/images/workernode/wn_sl54_x86_64_2.img,
TARGET=xvda ]
DISK=[
DISK_ID=1,
READONLY=no,
SIZE=1024,
TARGET=xvdb,
TYPE=swap ]
DISK=[
CLONE=yes,
DISK_ID=2,
READONLY=no,
SOURCE=/dev/cciss/c0d0p4,
TARGET=xvdc,
TYPE=block ]
MEMORY=13312
NAME=udo-wn117
NIC=[
BRIDGE=eth0,
IP=XXX.XXX.XXX.XXX,
MAC=YY:YY:YY:YY:YY:YY,
NETWORK=DGRZRWorkernodes,
NETWORK_ID=0 ]
OS=[
BOOTLOADER=/root/bin/domUloader.py ]
RANK=FREECPU
RAW=[
DATA=bootargs="--verbose --entry=xvda1",
TYPE=xen ]
REQUIREMENTS=HOSTNAME = ""
VCPU=8
VMID=346
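
The only difference between the two templates in this thread is the quoting: the unquoted form is rejected by the template parser, while the quoted form with escaped inner quotes is accepted and stored as HOSTNAME = "". As a rough illustration of that unescaping step (not OpenNebula's actual parser; `unquote_template_value` is a hypothetical helper, and the handling is simplified to a single-line attribute), the transformation can be sketched in Python:

```python
def unquote_template_value(raw: str) -> str:
    """Hypothetical helper: strip the outer double quotes of a template
    value and turn escaped inner quotes (\\") back into plain quotes."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        inner = raw[1:-1]
        return inner.replace('\\"', '"')
    # Unquoted values (e.g. RANK = FREECPU) are returned unchanged.
    return raw


# The REQUIREMENTS line from the template above, written as a raw string
# so the backslashes are kept literally.
line = r'REQUIREMENTS = "HOSTNAME = \"\""'

# Split at the first '=' only; the value itself contains another '='.
name, _, value = line.partition("=")
stored = unquote_template_value(value)

print(f"{name.strip()}={stored}")
```

Under these assumptions the printed line has the same form as the REQUIREMENTS entry in the onevm show output, which is the expression the scheduler later has to evaluate.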

Cheers
Stefan

#4 Updated by Carlos Martín almost 11 years ago

  • Status changed from New to Closed
  • Resolution set to fixed

Hi Stefan,

You were right, there was a bug in the scheduler parser that led to a segmentation fault when it tried to compare an empty string. It should be solved by this commit:

http://dev.opennebula.org/projects/opennebula/repository/revisions/3448f8698603eaf50db47113315815da4e93f0d5

Thank you for your feedback!!
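
For anyone who wants to reason about this class of bug, here is a minimal Python sketch of a string-equality requirement check that treats an empty literal as an ordinary value. It is an illustration only and not the code from the commit above: the real scheduler is a separate native program with a richer expression grammar, Python itself would not segfault here, and `matches_requirement` together with the plain-equality semantics are assumptions made for this example.

```python
import re
from typing import Mapping, Optional

# Simplified, assumed grammar: ATTRIBUTE = "literal" (the literal may be empty).
_EXPR = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')


def matches_requirement(expr: str, host: Mapping[str, Optional[str]]) -> bool:
    """Hypothetical evaluator for a single string-equality requirement."""
    m = _EXPR.match(expr)
    if m is None:
        raise ValueError(f"unsupported requirement: {expr!r}")
    attr, wanted = m.group(1), m.group(2)

    actual = host.get(attr)
    if actual is None:
        # A missing attribute never matches; do not pass it on to the
        # comparison as if it were a string.
        return False

    # An empty literal is compared like any other string.
    return actual == wanted


hosts = [
    {"HOSTNAME": "node01"},
    {"HOSTNAME": ""},
    {},  # host without a HOSTNAME attribute
]
results = [matches_requirement('HOSTNAME = ""', h) for h in hosts]
print(results)
```

The design point is that both the empty operand and the absent attribute are handled explicitly before the comparison, so a requirement such as HOSTNAME = "" simply fails to match named hosts instead of reaching code that does not expect it.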
