Bug #3733

xmlrpc-c / scheduler problems with big XML-Sets

Added by Robert Waffen over 3 years ago. Updated over 3 years ago.

Status: Closed
Start date: 04/01/2015
Priority: High
Due date: -
Assignee: -
% Done: 0%
Category: Scheduler
Target version: Release 4.12.1
Resolution: fixed
Pull request: -
Affected Versions: OpenNebula 4.10

Description

We are experiencing strange behavior with the scheduler and the xmlrpc-c library.

We have one OpenNebula 4.10 instance on CentOS 6 which is managing 10 servers and 1200 VMs. We are extending and want to go up to 2000 VMs.
At the moment we can't go beyond 1300 VMs. Any additional VMs end up in pending state and are not deployed.
We use XML-RPC to communicate with ONe. It seems we get XML sets larger than 10 MB, and with those the xmlrpc-c library breaks.

The log then says something like this:

  • /var/log/one/sched.log
    Tue Mar 31 16:31:20 2015 [Z0][VM][E]: Exception raised: Response XML from server is not valid XML-RPC response.  Unable to find XML-RPC response in what server sent back.  Not valid XML.  XML parsing failed
    Tue Mar 31 16:31:20 2015 [Z0][POOL][E]: Could not retrieve pool info from ONE
    
These are the RPC Options we set in our installation:
  • oned.conf
    MAX_CONN           = 240
    MAX_CONN_BACKLOG   = 480
    KEEPALIVE_TIMEOUT  = 150
    KEEPALIVE_MAX_CONN = 300
    TIMEOUT            = 150
    #temp enabled for debugging
    RPC_LOG            = YES
    MESSAGE_SIZE       = 1073741824
    

Associated revisions

Revision 04a97cc7
Added by gschmidt over 3 years ago

Bug #3733: xmlrpc-c / scheduler problems with big XML-Sets - switched from xmlParseMemory to xmlReadMemory; xmlReadMemory allows parameter XML_PARSE_HUGE which adds support for files >10MB

(cherry picked from commit b98a2cdc67c5330b0dbc6ba63e67e90fb956996d)

Revision 537e152a
Added by gschmidt over 3 years ago

Bug #3733: xmlrpc-c / scheduler problems with big XML-Sets - switched from xmlParseMemory to xmlReadMemory; xmlReadMemory allows parameter XML_PARSE_HUGE which adds support for files >10MB

(cherry picked from commit b98a2cdc67c5330b0dbc6ba63e67e90fb956996d)

History

#1 Updated by Ruben S. Montero over 3 years ago

Hi Robert,

There is also a MESSAGE_SIZE setting in sched.conf for the client side; could you double-check that one?

#2 Updated by Robert Waffen over 3 years ago

Hi Ruben...

Our sched.conf looks like this:

MESSAGE_SIZE = 1073741824
ONED_PORT = 2633
SCHED_INTERVAL = 30
MAX_VM       = 5000
MAX_DISPATCH = 30
MAX_HOST     = 1
LIVE_RESCHEDS  = 0

DEFAULT_SCHED = [
    policy = 1
]

DEFAULT_DS_SCHED = [
   policy = 1
]

LOG = [
  system      = "file",
  debug_level = 3
]

#3 Updated by Ruben S. Montero over 3 years ago

OK, that should be more than enough for thousands of VMs. Do other client tools work, like onevm list and onehost list?

#4 Updated by Robert Waffen over 3 years ago

Yes, onevm and other commands work, but onevm needs up to 8 seconds to execute.

#5 Updated by Ruben S. Montero over 3 years ago

That's also too much; you are probably missing some Ruby gems. Try running install_gems again to install nokogiri etc.
As for the scheduler problem, we need to reproduce this...

#6 Updated by Ruben S. Montero over 3 years ago

Hi Robert,

I'm now able to reproduce this, will keep you posted.

Cheers

#7 Updated by Robert Waffen over 3 years ago

Hi Ruben,

Oh, cool... we thought maybe we were using the XML-RPC API wrong,
but since you can reproduce the behavior, this seems to be a real bug.

#8 Updated by Robert Waffen over 3 years ago

Colleagues of mine have tested something:
there may be a problem with libxml2, which has enforced a hard 10 MB limit on text nodes since version 2.7.3.

The following should address the problem:

diff --git a/src/xml/ObjectXML.cc b/src/xml/ObjectXML.cc

index 446a0a2..e57c999 100644
--- a/src/xml/ObjectXML.cc
+++ b/src/xml/ObjectXML.cc
@@ -564,7 +564,7 @@ int ObjectXML::validate_xml(const string &xml_doc)

 void ObjectXML::xml_parse(const string &xml_doc)
 {
-    xml = xmlParseMemory (xml_doc.c_str(),xml_doc.length());
+    xml = xmlReadMemory (xml_doc.c_str(),xml_doc.length(),0,0,XML_PARSE_HUGE);

     if (xml == 0)
     {

what do you think?

#9 Updated by Robert Waffen over 3 years ago

We have tested this now; we will open a pull request.

#10 Updated by Ruben S. Montero over 3 years ago

I can confirm that the problem is with the libxml2 parsing; however, I think this error is happening in the libxmlrpc-c parsing, here:

http://sourceforge.net/p/xmlrpc-c/code/HEAD/tree/advanced/src/xmlrpc_libxml2.c#l452

(the "XML parsing failed" string is the one output at the end of the error message).

I was looking for a way to pass XML_PARSE_HUGE to libxmlrpc-c, or to set that option globally for parsers, but I could not find an API call for either :( If this is not an option, we can easily override the call method of the client XML-RPC class to set the option.

Thanks for the help

#11 Updated by Gerald Schmidt over 3 years ago

You are absolutely right. For our test we patched libxml2 with options |= XML_PARSE_HUGE.

We can confirm that this solves the problem, but as you say, this fix requires either a change in xmlrpc_libxml2 (perhaps adding an options parameter to xml_parse?) or a patched libxml2 (our rather ugly interim fix).

#12 Updated by Ruben S. Montero over 3 years ago

  • Status changed from Pending to Closed
  • Target version set to Release 4.12.1
  • Resolution set to fixed

So the final solution is two-fold:

1.- Apply the patch sent by Robert (already in master and 4.12)

2.- Use XML_PARSE_HUGE in the xmlrpc-c client. This patch is applied to the libxmlrpc-c used to build the OpenNebula packages. We are already linking statically against a compiled xmlrpc-c library because some important distributions ship a very old version or compile against insecure XML parsers.

I'm closing this; the fix will be released in 4.12.1.

Thanks guys for the feedback :)
