Bug #187

OpenNebula does not allocate VMs evenly across hosts

Added by Marlon Nerling over 11 years ago. Updated over 11 years ago.

Status: Closed
Start date: 01/06/2010
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Resolution: worksforme
Pull request:
Affected Versions:

Description

We have a host farm of 12 servers, all with the same hardware.
Per day we clone/work with sometimes 30 VMs, rarely more than 20 at the same time.
We notice 5 of them are NEVER used at all.
7 of these hosts are being used, every day, all days. Some of them with 4 machines (1 VCPU per VM), some with 1, others with 2.
I tried to delete and recreate these hosts; no luck, they are still unused.
Somehow I am beginning to believe OpenNebula does not have an optimal host selection algorithm.

I will attach some statistics and, in the next few days, a patch for this behavior.

Cheers

Associated revisions

Revision 3e8d3850
Added by German Gutierrez over 4 years ago

Fix: Missing programname in syslog output (#187)

History

#1 Updated by Marlon Nerling over 11 years ago

Marlon Nerling wrote:

We have a host farm of 12 servers, all with the same hardware.
Per day we clone/work with sometimes 30 VMs, rarely more than 20 at the same time.
We notice 5 of them are NEVER used at all.

OK... NEVER was an overstatement! But it is still a problem for me.

7 of these hosts are being used, every day, all days. Some of them with 4 machines (1 VCPU per VM), some with 1, others with 2.
I tried to delete and recreate these hosts; no luck, they are still unused.
Somehow I am beginning to believe OpenNebula does not have an optimal host selection algorithm.

I will attach some statistics and, in the next few days, a patch for this behavior.

Cheers

#2 Updated by Marlon Nerling over 11 years ago

Some statistics:

As you can see, OpenNebula allocates 5 VMs to ?..?.2 (each with 1 VCPU and between 2 GB and 4 GB of memory):

nerling@??????????????????:~/one-1.3.80/src$ onevm list
ID USER NAME STAT CPU MEM HOSTNAME TIME
2531 root 2003_DE3 runn 0 2096128 ?..?.2 39 04:53:54
2720 root 200332DE dele 0 3072000 ?..?.2 25 05:11:16
2886 root xp32DE runn 0 3072000 ?..?.13 03 23:59:29
2888 root win764DE runn 0 4117504 ?..?.12 03 21:30:58
2911 root hardy-x8 runn 0 3072000 ?..?.9 02 02:01:52
2934 root hardy-x8 runn 0 3072000 ?..?.10 00 23:22:24
2951 root win764DE runn 0 4117504 ?..?.5 00 03:17:18
2960 root Txp32DE unkn 0 3072000 ?..?.7 00 00:58:24
2962 root 200332DE runn 0 3072000 ?..?.2 00 00:10:38
2963 root vista64D runn 0 4117504 ?..?.2 00 00:10:13
2964 root win764DE runn 0 4117504 ?..?.4 00 00:09:49
2965 root vista64D prol 0 0 ?..?.2 00 00:02:23
2966 root xp32DE prol 0 0 ?..?.7 00 00:01:24

At the same time ?..?.11, ?..?.8, ?..?.6 and ?..?.3 have no VMs at all, while others have only 1 or 2!

nerling@??????????????????:~/one-1.3.80/src$ onehost list
HID NAME RVM TCPU FCPU ACPU TMEM FMEM STAT
30 ?..?.2 4 400 32 32 1647421 6809820 on
32 ?..?.5 1 400 388 388 1647421 1213251 on
34 ?..?.7 2 400 160 160 1647421 1628625 on
38 ?..?.11 0 400 400 400 1647421 1634499 on
39 ?..?.12 2 400 382 382 1647421 1201402 on
41 ?..?.6 0 400 400 400 1647421 1632049 on
43 ?..?.13 1 400 396 396 1647421 1320783 on
44 ?..?.3 0 400 400 400 1647421 1634944 on
45 ?..?.7 0 400 256 256 1647421 1631358 on
46 ?..?.8 0 400 400 400 1647421 1634518 on
47 ?..?.9 1 400 378 378 1647421 1328256 on
48 ?..?.10 1 400 379 379 1647421 1019601 on
49 ?..?.4 1 400 294 294 1647421 1199256 on

#3 Updated by Marlon Nerling over 11 years ago

I was not able to understand the code well enough to propose a patch for this issue.
For a start, I do not understand how and where in the source the host is selected.
My idea starts with:

select hid from host_shares ORDER BY running_vms ASC;

Note: select * from host_pool where oid in ( select hid from host_shares ORDER BY running_vms ASC ) will not follow the nested ORDER BY!! Maybe this is our whole problem?!
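
For reference, standard SQL does not carry an ORDER BY inside an IN (...) subquery out to the outer result; the ordering has to be applied to the outer query itself, for example through a join. A minimal sketch, assuming host_pool.oid matches host_shares.hid as in the query above:

-- order the joined rows directly; an ORDER BY inside the IN (...) subquery has no effect on the outer result
SELECT p.*
FROM host_pool p
JOIN host_shares s ON s.hid = p.oid
ORDER BY s.running_vms ASC;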

#4 Updated by Ruben S. Montero over 11 years ago

  • Status changed from New to Closed
  • Resolution set to worksforme

Hi,

The host selection process is performed by the scheduler using the REQUIREMENTS and RANK attributes. The algorithm works as follows:

  • First, those hosts that do not meet the VM requirements (REQUIREMENTS attribute) or do not have enough resources (available CPU and memory) to run the VM are filtered out.
  • The RANK expression is then evaluated for each host in this list using the information gathered by the monitor drivers. Any variable reported by the monitor drivers can be included in the RANK expression.
  • Hosts with a higher rank are used first to allocate VMs.

So try, for example, to load-balance VMs with RANK=FREECPU (hosts with more FREECPU come first), or add a new probe that reports the RUNNING_VMS of a host and set RANK=-RUNNING_VMS to "stripe" VMs across hosts...
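
For illustration, the two suggestions map onto VM template attributes roughly as in this sketch (the REQUIREMENTS expression and the RUNNING_VMS variable are assumptions here; they only work if the monitoring probes actually report those values):

REQUIREMENTS = "FREECPU > 100"   # filter step: keep only hosts with at least one free CPU
RANK         = "FREECPU"         # rank step: prefer hosts with more free CPU
# or, once a probe reports RUNNING_VMS for each host:
# RANK = "- RUNNING_VMS"         # prefer hosts with fewer running VMs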

Ruben

#5 Updated by Marlon Nerling over 11 years ago

Yes, this is it.
I must reread the documentation...
Thanks very much, Ruben.

#6 Updated by Marlon Nerling over 11 years ago

I'm using RANK = -RUNNING_VMS (OpenNebula version 1.3.80).
But... in the long run, it does not seem to work, as you can see below:
1) I submitted a VM (id=3683), then 30 seconds later another (id=3684), and both VMs were deployed on the host 172.22.0.21.
2) The VM 3686 was deployed on the host 172.22.0.20, even though that host already had two running VMs (ids 3664 and 3669), while at the same time our hosts 3, 4, 5, 8, 9, 14, 15, 16, 17, 18 and 19 were empty.

Maybe I should be using RANK = - RUNNING_VMS (with a space) instead of RANK = -RUNNING_VMS? I will try that too.

One more question: can I use more than one expression in RANK, like RANK = FREECPU - RUNNING_VMS?

3043 root 200332DE runn 0 3072000 172.22.0.2 48 22:00:37
3318 root 200832EN stop 0 3072000 172.22.0.6 26 22:00:37
3340 root win764DE dele 0 4117504 172.22.0.10 26 03:34:28
3349 root xp32DE runn 0 3072000 172.22.0.6 25 22:01:44
3489 root vista64D runn 0 4117504 172.22.0.11 15 22:23:05
3522 root vista64D runn 0 4117504 172.22.0.8 13 19:25:13
3664 root vista64D stop 0 4117504 172.22.0.20 02 03:39:29
3665 root win764DE fail 0 4117504 172.22.0.11 01 05:56:38
3666 root win764DE runn 0 4117504 172.22.0.2 02 01:40:22
3669 root hardy-x6 stop 0 4117504 172.22.0.20 01 22:44:08
3676 root vista_DE runn 0 1048576 172.22.0.11 00 22:18:58
3680 root 200332DE runn 0 3072000 172.22.0.2 00 21:47:34
3681 root win764DE runn 0 4117504 172.22.0.19 00 19:45:17
3682 root win764DE runn 0 4117504 172.22.0.2 00 19:43:43
3683 root Thardy-x unkn 0 4117504 172.22.0.21 00 05:49:06
3684 root Txp32DE unkn 0 3072000 172.22.0.21 00 05:48:46
3685 root Tvista64 unkn 0 4117504 172.22.0.7 00 05:48:26
3686 root Twin764D runn 0 4117504 172.22.0.20 00 05:48:06
3687 root T200332D unkn 0 3072000 172.22.0.17 00 05:47:47
3690 root win764DE runn 0 4117504 172.22.0.12 00 00:05:31

Thanks very much for your attention!

#7 Updated by Ruben S. Montero over 11 years ago

Hi Marlon,

check issue #196

Cheers
