Bug #4682

cloud-config crashes OpenNebula

Added by Rachel Chen over 4 years ago. Updated over 4 years ago.

Status: Closed
Start date: 07/25/2016
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Resolution: worksforme
Pull request:
Affected Versions: OpenNebula 5.0

Description

Environment:

root@tetra-oned:~# uname -a
Linux tetra-oned 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux
root@tetra-oned:~# oned -v
Copyright 2002-2016, OpenNebula Project, OpenNebula Systems        

OpenNebula 5.0.0 is distributed and licensed for use under the terms of the
Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).

Steps to reproduce:
1. Build an actually usable CoreOS image for OpenNebula (https://github.com/zllovesuki/coreos-opennebula-image)
2. Use this template to provision:

CONTEXT = [
  USER_DATA = "$USER_DATA",
  NETWORK = "YES",
  SET_HOSTNAME = "$NAME",
  SSH_PUBLIC_KEY = "$USER[SSH_PUBLIC_KEY]" ]
CPU = "2" 
DISK = [
  IMAGE = "CoreOS Stable 1068.8.0" ]
FEATURES = [
  ACPI = "yes",
  APIC = "yes",
  HYPERV = "yes",
  PAE = "yes" ]
GRAPHICS = [
  LISTEN = "0.0.0.0",
  TYPE = "VNC" ]
HYPERVISOR = "kvm" 
MEMORY = "8192" 
NIC = [
  NETWORK = "NAT" ]
NIC = [
  NETWORK = "External" ]
NIC_DEFAULT = [
  MODEL = "virtio" ]
OS = [
  ARCH = "x86_64" ]
USER_INPUTS = [
  USER_DATA = "M|text|cloud-config" ]
VCPU = "4" 

Problems:
Provisioning a VM with the cloud-config below crashes OpenNebula. The kernel log shows oned segfaulting:

Jul 25 05:27:12 tetra-oned kernel: [63985.419425] oned[5217]: segfault at 20 ip 0000000000421f8e sp 00007f23ffffc430 error 4 in oned[400000+2a4000]

The cloud-config used:

#cloud-config

coreos:
  units:
    - name: etcd2.service
      runtime: true
      drop-ins:
        - name: 10-oem.conf
          content: |
            [Service]
            Environment=ETCD_ELECTION_TIMEOUT=1200
    - name: reload.service
      command: start
      content: |
        [Unit]
        Description=reload systemd

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl daemon-reload
    - name: start.service
      command: start
      content: |
        [Unit]
        Description=start etcd2

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl start etcd2

  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new?size=3
    discovery: "https://discovery.etcd.io/aeaa1d21e3ea14bca24071f8a7a029f4" 
    advertise-client-urls: "http://$NIC[IP, NETWORK=\"External\"]:2379" 
    initial-advertise-peer-urls: "http://$NIC[IP, NETWORK=\"NAT\"]:2380" 
    listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001" 
    listen-peer-urls: "http://$NIC[IP, NETWORK=\"NAT\"]:2380,http://$NIC[IP, NETWORK=\"NAT\"]:7001" 

I suspect that OpenNebula tries to substitute the $NIC variables when creating the context ISO, but fails for some reason.

History

#1 Updated by Ruben S. Montero over 4 years ago

Hi Jerry

I cannot reproduce this in 5.0.2. Could you try with that version? That said, I don't think anything has changed in context generation since 5.0.0.

This is what I'm getting:

 onevm show 2
VIRTUAL MACHINE 2 INFORMATION                                                   
ID                  : 2                   
NAME                : test-2              
USER                : ruben               
GROUP               : oneadmin            
STATE               : PENDING             
LCM_STATE           : LCM_INIT            
RESCHED             : No                  
START TIME          : 08/04 10:52:24      
END TIME            : -                   
DEPLOY ID           : -                   

VIRTUAL MACHINE MONITORING                                                      

PERMISSIONS                                                                     
OWNER               : um-                 
GROUP               : ---                 
OTHER               : ---                 

VM DISKS                                                                        
 ID DATASTORE  TARGET IMAGE                               SIZE      TYPE SAVE
  0 -          hda    CONTEXT                             -/-       -       -

VM NICS                                                                         
 ID NETWORK              BRIDGE       IP              MAC               PCI_ID  
  0 NAT                  br1          172.16.0.10     02:00:ac:10:00:0a
  1 External             br0          10.0.0.1        02:00:0a:00:00:01

SECURITY                                                                        

NIC_ID NETWORK                   SECURITY_GROUPS                                
     0 NAT                       0
     1 External                  0

SECURITY GROUP   TYPE     PROTOCOL NETWORK                       RANGE          
  ID NAME                          VNET START             SIZE                  
   0 default     OUTBOUND ALL
   0 default     INBOUND  ALL

USER TEMPLATE                                                                   
HYPERVISOR="kvm" 
SCHED_MESSAGE="Thu Aug  4 10:52:27 2016 : No hosts enabled to run VMs" 
USER_DATA="#cloud-config

coreos:
  units:
    - name: etcd2.service
      runtime: true
      drop-ins:
        - name: 10-oem.conf
          content: |
            [Service]
            Environment=ETCD_ELECTION_TIMEOUT=1200
    - name: reload.service
      command: start
      content: |
        [Unit]
        Description=reload systemd

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl daemon-reload
    - name: start.service
      command: start
      content: |
        [Unit]
        Description=start etcd2

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl start etcd2

  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new?size=3
    discovery: \"https://discovery.etcd.io/aeaa1d21e3ea14bca24071f8a7a029f4\" 
    advertise-client-urls: \"http://$NIC[IP, NETWORK=\"External\"]:2379\" 
    initial-advertise-peer-urls: \"http://$NIC[IP, NETWORK=\"NAT\"]:2380\" 
    listen-client-urls: \"http://0.0.0.0:2379,http://0.0.0.0:4001\" 
    listen-peer-urls: \"http://$NIC[IP, NETWORK=\"NAT\"]:2380,http://$NIC[IP, NETWORK=\"NAT\"]:7001\" " 
USER_INPUTS=[
  USER_DATA="M|text|cloud-config" ]

VIRTUAL MACHINE TEMPLATE                                                        
AUTOMATIC_DS_REQUIREMENTS="\"CLUSTERS/ID\" @> 0" 
AUTOMATIC_REQUIREMENTS="(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES)" 
CONTEXT=[
  DISK_ID="0",
  ETH0_CONTEXT_FORCE_IPV4="",
  ETH0_DNS="",
  ETH0_GATEWAY="",
  ETH0_GATEWAY6="",
  ETH0_IP="172.16.0.10",
  ETH0_IP6="",
  ETH0_IP6_ULA="",
  ETH0_MAC="02:00:ac:10:00:0a",
  ETH0_MASK="",
  ETH0_MTU="",
  ETH0_NETWORK="",
  ETH0_SEARCH_DOMAIN="",
  ETH0_VLAN_ID="",
  ETH0_VROUTER_IP="",
  ETH0_VROUTER_IP6="",
  ETH0_VROUTER_MANAGEMENT="",
  ETH1_CONTEXT_FORCE_IPV4="",
  ETH1_DNS="",
  ETH1_GATEWAY="",
  ETH1_GATEWAY6="",
  ETH1_IP="10.0.0.1",
  ETH1_IP6="",
  ETH1_IP6_ULA="",
  ETH1_MAC="02:00:0a:00:00:01",
  ETH1_MASK="",
  ETH1_MTU="",
  ETH1_NETWORK="",
  ETH1_SEARCH_DOMAIN="",
  ETH1_VLAN_ID="",
  ETH1_VROUTER_IP="",
  ETH1_VROUTER_IP6="",
  ETH1_VROUTER_MANAGEMENT="",
  NETWORK="YES",
  SET_HOSTNAME="test-2",
  SSH_PUBLIC_KEY="",
  TARGET="hda",
  USER_DATA="#cloud-config

coreos:
  units:
    - name: etcd2.service
      runtime: true
      drop-ins:
        - name: 10-oem.conf
          content: |
            [Service]
            Environment=ETCD_ELECTION_TIMEOUT=1200
    - name: reload.service
      command: start
      content: |
        [Unit]
        Description=reload systemd

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl daemon-reload
    - name: start.service
      command: start
      content: |
        [Unit]
        Description=start etcd2

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl start etcd2

  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new?size=3
    discovery: \"https://discovery.etcd.io/aeaa1d21e3ea14bca24071f8a7a029f4\" 
    advertise-client-urls: \"http://$NIC[IP, NETWORK=\"External\"]:2379\" 
    initial-advertise-peer-urls: \"http://$NIC[IP, NETWORK=\"NAT\"]:2380\" 
    listen-client-urls: \"http://0.0.0.0:2379,http://0.0.0.0:4001\" 
    listen-peer-urls: \"http://$NIC[IP, NETWORK=\"NAT\"]:2380,http://$NIC[IP, NETWORK=\"NAT\"]:7001\" " ]
CPU="2" 
FEATURES=[
  ACPI="yes",
  APIC="yes",
  HYPERV="yes",
  PAE="yes" ]
GRAPHICS=[
  LISTEN="0.0.0.0",
  TYPE="VNC" ]
MEMORY="8192" 
OS=[
  ARCH="x86_64" ]
TEMPLATE_ID="0" 
VCPU="4" 
VMID="2" 

Note that there is no double substitution: once USER_DATA=$USER_DATA is resolved, the result is not resolved again, so the $NIC[...] references inside the cloud-config stay literal.
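
To illustrate what "no double substitution" means, here is a minimal Python sketch of single-pass variable resolution. It is not OpenNebula's template parser: the `resolve_once` function, the `$NAME` pattern, and the sample values are all hypothetical and only model the behaviour seen in the output above.

```python
import re

# Hypothetical, simplified model of single-pass substitution.
# This is NOT OpenNebula's parser; every name here is illustrative.
VAR = re.compile(r"\$([A-Z_]+)")


def resolve_once(template, values):
    """Replace each $NAME in `template` exactly once.

    re.sub scans only the original template string, so any text that
    arrives through a replacement value is never scanned again.
    """
    return VAR.sub(lambda m: values.get(m.group(1), m.group(0)), template)


values = {
    # The user input itself contains something that looks like a variable.
    "USER_DATA": 'advertise-client-urls: "http://$NIC[IP]:2379"',
    # If a second pass happened, this marker would show up in the result.
    "NIC": "SHOULD-NOT-APPEAR",
}

result = resolve_once("$USER_DATA", values)
print(result)
```

In this model the `$NIC[IP]` text carried in by `USER_DATA` comes out unchanged, which matches the USER_DATA value shown in the CONTEXT section above.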

#2 Updated by Ruben S. Montero over 4 years ago

Could you get a backtrace of the crash with gdb, for example from a core dump retrieved with coredumpctl or a similar tool?
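
Until a backtrace is available, the kernel log line from the report already carries some information. The Python sketch below pulls out the fields; the `parse_segfault` helper and its regular expression are assumptions written for this one log line, not a general parser. A fault address of 0x20 is close to zero, which is consistent with (but does not prove) a member access through a null pointer, and error code 4 denotes a user-mode read of an unmapped page.

```python
import re

# The log line exactly as posted in the report.
LINE = (
    "Jul 25 05:27:12 tetra-oned kernel: [63985.419425] oned[5217]: "
    "segfault at 20 ip 0000000000421f8e sp 00007f23ffffc430 error 4 "
    "in oned[400000+2a4000]"
)

# Illustrative pattern for this message layout (an assumption, not a spec).
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract the faulting address, instruction pointer and mapping offset."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        "ip": ip,
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
    }


info = parse_segfault(LINE)
print(info["object"], hex(info["fault_address"]), hex(info["ip"]), hex(info["offset"]))
```

With debug symbols for the same oned build, the instruction pointer (or the offset, depending on how the binary is linked) could be handed to a symbolizer to find the crashing function; that step is not shown here.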

#3 Updated by Rachel Chen over 4 years ago

Interesting. I could not reproduce the crash on my home lab OpenNebula. Maybe it is an isolated incident caused by a problem somewhere in my system. I will close this instead.

#4 Updated by Rachel Chen over 4 years ago

Well, could you please close this?

#5 Updated by Ruben S. Montero over 4 years ago

  • Status changed from Pending to Closed
  • Resolution set to worksforme

Thanks!!!
