Feature #1770

Image decompression and reported size in OpenNebula

Added by Simon Boulet almost 7 years ago. Updated over 5 years ago.

Status: Closed
Start date: 02/19/2013
Priority: Normal
Due date: -
Assignee: -
% Done: 0%
Category: Core & System
Target version: Release 4.4
Resolution: fixed
Pull request: -

Description

Currently, when registering an image:

1- The user submits a new image (using oneimage create).
2- OpenNebula calls the DS driver STAT action to obtain the size of the image. That size is checked against the OpenNebula datastore quota, and the image is registered.
3- The image is set to the LOCKED state.
4- OpenNebula calls the LN driver CP action, which does a number of things but, most importantly, calls tar to decompress the image (unless the datastore is NO_DECOMPRESS).
5- If the CP action is successful, the image changes to the READY state.

The problem is that the actual size (amount of space) used by the uncompressed image is much larger than the size originally reported by the STAT action.

root@OpenNebula-Test:~# sudo -u oneadmin -i oneimage show 1 | grep -e SOURCE -e SIZE
SOURCE : /var/lib/one/datastores/1/7cdbf23f732b0de66a7ea91770fcbf27
SIZE : 212M

But since the image was decompressed in the datastore, the actual amount of space consumed is 627M.

root@OpenNebula-Test:~# du -hs /var/lib/one/datastores/1/7cdbf23f732b0de66a7ea91770fcbf27
627M /var/lib/one/datastores/1/7cdbf23f732b0de66a7ea91770fcbf27

Ideally, I think OpenNebula should report the image size as the full 627M, not the size of the initial compressed archive (212M).

I came up with 2 solutions:

1- Adapt the DS driver STAT action to report the real uncompressed size in the first place (e.g. using gunzip -l to report the uncompressed size). This would work for local images, but not for remote images (e.g. coming from the marketplace): wget is currently used to obtain the remote size before downloading the image, and I couldn't think of any way to obtain the uncompressed size of a remote archive without first downloading it.

2- Alternatively, OpenNebula could run a second STAT action on the image before switching it to the READY state, update the image SIZE accordingly, and bump the user quota. I think that would be feasible with small, minimally disruptive changes to OpenNebula.
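For local images, option 1 can be sketched as a small helper. This is illustrative only (stat_size_mb is a hypothetical name, not the real driver code), and it rounds sizes up to whole MiB:

```shell
#!/bin/sh
# Sketch of a STAT-style size helper for local images (illustrative only).
# Prints the size OpenNebula should account for, in MB.
stat_size_mb() {
    src="$1"
    case "$src" in
        *.gz)
            # Uncompressed size from the gzip trailer, via gzip -l.
            # Note: this field is 32-bit, so it wraps for images over 4 GiB.
            bytes=$(gzip -l "$src" | awk 'NR==2 {print $2}')
            ;;
        *)
            # Plain file: on-disk size (GNU stat, with a BSD fallback).
            bytes=$(stat -c %s "$src" 2>/dev/null || stat -f %z "$src")
            ;;
    esac
    # Round up to a whole MiB.
    echo $(( (bytes + 1048575) / 1048576 ))
}
```

As the description notes, this only helps for local paths; a remote URL would still have to be downloaded first.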

I'd like to hear what you guys think before proposing a patch.

Simon


Related issues

Related to Feature #1662: Update image size of persistent images Closed 11/19/2012
Related to Feature #2489: Let set SIZE for Images using SOURCE Closed 11/22/2013

Associated revisions

Revision f69e86f2
Added by Ruben S. Montero about 6 years ago

feature #1770: Apparent size of qcow images reported by fs_size function

Revision 61484386
Added by Ruben S. Montero about 6 years ago

feature #1770: Apparent size of qcow images reported by fs_size function

(cherry picked from commit f69e86f2bb29d819ec85c402e5a1e1bbfc2647de)

Revision c1ff1fd1
Added by Jaime Melis almost 6 years ago

Feature #1770: Handle non-uniform 'file -b' output in distributions

Revision d6438fb6
Added by Jaime Melis almost 6 years ago

Feature #1770: Handle non-uniform 'file -b' output in distributions

(cherry picked from commit c1ff1fd11b1992601c86c3eedfbb4c7c080a398d)

History

#1 Updated by Ruben S. Montero almost 7 years ago

Totally agree, we first observed this when registering an image from the marketplace. I prefer the second option. But thinking about it, since we always do CP+STAT, why not merge both? CP would then report both SOURCE and SIZE back to OpenNebula. (If no size is reported back, we can fall back to the one obtained from the first STAT; I am thinking of custom datastore drivers.) This way there is no need to execute another operation...

#2 Updated by Simon Boulet almost 7 years ago

I like the idea of CP reporting the "final" size of an image, with the second size parameter optional for drivers that don't report it (in which case the size initially reported by STAT is simply kept). I might look at this in 2-3 weeks.

#3 Updated by Ruben S. Montero over 6 years ago

  • Category set to Drivers - Storage

#4 Updated by Ruben S. Montero over 6 years ago

  • Category changed from Drivers - Storage to Core & System

#5 Updated by Ruben S. Montero over 6 years ago

  • Target version set to Release 4.2

#6 Updated by Ruben S. Montero over 6 years ago

This also happens with other formats, like qcow2.

So we need to improve STAT and keep track of file size changes.

#7 Updated by Ruben S. Montero over 6 years ago

  • Target version changed from Release 4.2 to Release 4.4

#8 Updated by Ruben S. Montero about 6 years ago

  • Target version deleted (Release 4.4)

#9 Updated by Ruben S. Montero about 6 years ago

  • Target version set to Release 4.4

This already works for gzip files. Support for qcow images still needs to be added.
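For qcow images the relevant figure is the virtual (apparent) size, which qemu-img can report without decompressing or copying anything. A minimal sketch, assuming qemu-img is installed (the actual fs_size implementation may differ):

```shell
#!/bin/sh
# Sketch: report a qcow2 image's virtual (apparent) size in bytes,
# falling back to the on-disk size for other formats. Illustrative only.
image_size_bytes() {
    if qemu-img info "$1" 2>/dev/null | grep -q '^file format: qcow2'; then
        # Virtual size from qemu-img's JSON output (jq would be cleaner).
        qemu-img info --output=json "$1" \
            | sed -n 's/.*"virtual-size": *\([0-9]*\).*/\1/p'
    else
        stat -c %s "$1" 2>/dev/null || stat -f %z "$1"
    fi
}
```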

#10 Updated by Ruben S. Montero about 6 years ago

  • Status changed from New to Closed
  • Resolution set to fixed

Fixed in master.

#11 Updated by Rolandas Naujikas about 6 years ago

There are still problems with compressed qcow2 images, like the hadoop image from the OpenNebula Marketplace.
It is really 10GB, but after import it is shown as 1.3GB.

#12 Updated by Rolandas Naujikas about 6 years ago

Also, when creating/registering an image with only the SOURCE attribute, its size is always 0MB.
That is probably because PATH is missing in this case.

#13 Updated by Ruben S. Montero about 6 years ago

Rolandas Naujikas wrote:

There are still problems with compressed qcow2 images, like the hadoop image from the OpenNebula Marketplace.
It is really 10GB, but after import it is shown as 1.3GB.

This is likely a problem with the marketplace image itself. Note that we do not decompress the file to get the actual size: marketplace images take their size from either the JSON document or the HTTP headers.

Locally we cannot support nested formats, so the stat operation does not require decompressing the whole image.

#14 Updated by Ruben S. Montero about 6 years ago

Rolandas Naujikas wrote:

Also, when creating/registering an image with only the SOURCE attribute, its size is always 0MB.
That is probably because PATH is missing in this case.

Yes, when using SOURCE we do not call the stat script. Note that using SOURCE should be a restricted operation, as it bypasses all OpenNebula checks. In fact, it is listed as restricted in oned.conf:

IMAGE_RESTRICTED_ATTR = "SOURCE" 

But you may still need to set a size on the image, so probably the best option is to let the admin set SIZE when using SOURCE. Like:

NAME=admin_image
SOURCE=/dev/hdb
SIZE=10240

What do you think?

#15 Updated by Rolandas Naujikas about 6 years ago

I agree, letting SIZE be set along with SOURCE would be a solution.
But for the moment it doesn't work (I tried on one-4.4).

#16 Updated by Rolandas Naujikas about 6 years ago

Ruben S. Montero wrote:

Rolandas Naujikas wrote:

There are still problems with compressed qcow2 images, like the hadoop image from the OpenNebula Marketplace.
It is really 10GB, but after import it is shown as 1.3GB.

This is likely a problem with the marketplace image itself. Note that we do not decompress the file to get the actual size: marketplace images take their size from either the JSON document or the HTTP headers.

Locally we cannot support nested formats, so the stat operation does not require decompressing the whole image.

So let's at least put the right size in the Marketplace metadata.

#17 Updated by Ruben S. Montero about 6 years ago

You are right, this is not yet implemented; I've opened issue #2489 for it. This will be ready for 4.4.

Thanks for your feedback and testing!

#18 Updated by Rolandas Naujikas almost 6 years ago

I just got a gzip'ed file from http://www.cloudbase.it/ws2012r2/ registered with an incorrect size:

$ gzip -l windows_server_2012_r2_standard_eval_kvm_20131117.qcow2.gz 
  compressed uncompressed  ratio uncompressed_name
  4831004561      2883584 -99.9% windows_server_2012_r2_standard_eval_kvm_20131117.qcow2

OpenNebula registered it with SIZE=3MB.
I re-gzipped it (to check whether the problem was in the creator's OS/gzip) and it shows the same.

#19 Updated by Ruben S. Montero almost 6 years ago

Rolandas Naujikas wrote:

I just got gzip'ed file http://www.cloudbase.it/ws2012r2/ with incorrect size registered:
[...]
OpenNebula registered it with SIZE=3MB.
I re-gzipped it (to check whether the problem was in the creator's OS/gzip) and it shows the same.

Note that nested formats are not currently supported. If you register a qcow image, its virtual size will be used; if you register a gzip image, the uncompressed size will be used. So if a gzipped qcow image is registered, the uncompressed qcow size (not the virtual size) will be used.

#20 Updated by Rolandas Naujikas almost 6 years ago

The problem here is not the qcow2 format, but invalid data in the gzip -l output.

#21 Updated by Ruben S. Montero almost 6 years ago

Rolandas Naujikas wrote:

The problem here is not the qcow2 format, but invalid data in the gzip -l output.

OpenNebula uses gzip -l; are you getting different output when running it directly?

#22 Updated by Rolandas Naujikas almost 6 years ago

$ gzip -l windows_server_2012_r2_standard_eval_kvm_20131117.qcow2.gz 
  compressed uncompressed  ratio uncompressed_name
  4831004561      2883584 -99.9% windows_server_2012_r2_standard_eval_kvm_20131117.qcow2

So we cannot trust gzip -l output.
The problem is probably the 32-bit size field in the gzip trailer (ISIZE), which stores the uncompressed size modulo 2^32, so it wraps for files over 4 GiB.
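Since the gzip format only stores a 32-bit uncompressed size, the one fully reliable way to get the real size is to stream-decompress and count bytes (one decompression pass, no temporary file). A sketch, with gzip_true_size as an illustrative name:

```shell
#!/bin/sh
# Reliable uncompressed size of a gzip file: decompress to a pipe and count.
# Works for files over 4 GiB, unlike the ISIZE field that gzip -l reports.
gzip_true_size() {
    # Arithmetic expansion strips the leading spaces some wc versions print.
    echo $(( $(gzip -dc "$1" | wc -c) ))
}
```

The trade-off is CPU time: the file is decompressed once just to measure it, which is exactly what the stat driver currently avoids.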

#23 Updated by Stuart Longland over 5 years ago

I'm finding this issue when importing a Microsoft Hyper-V image.

stuartl@rikishi ~ $ ls -lh /tmp/win2012de.vhd 
-rw-r--r-- 1 stuartl stuartl 7.5G Apr  2 12:20 /tmp/win2012de.vhd
stuartl@rikishi ~ $ qemu-img info /tmp/win2012de.vhd 
image: /tmp/win2012de.vhd
file format: vpc
virtual size: 40G (42949017600 bytes)
disk size: 7.5G

I'm using a custom datastore driver which I'm developing, using Ceph as the back-end. I've handled collecting the size from the raw image in my handler for the 'cp' action:

# constants for rounding a byte count to whole MiB (1 MiB = 1048576 bytes)
BYTES_PER_MB=1048576
BYTES_PER_512KB=524288
REGISTER_CMD=$(cat <<EOF
    set -e

    # determine size of image in bytes
    size=\$( $QEMU_IMG info --output=json "${TMP_DST}" | \
        grep virtual-size | cut -f 2 -d: | cut -f 1 -d , )
    # convert this to MB for ceph, rounding to the nearest MB
    size_mb=\$(( ( \$size + $BYTES_PER_512KB ) / $BYTES_PER_MB ))

    # create rbd
    $SUDO $RBD create --image-format 2 --size \${size_mb} $RBD_SOURCE
    # map rbd device
    $SUDO $RBD map $RBD_SOURCE
    # import image
    $SUDO $QEMU_IMG convert $TMP_DST /dev/rbd/$RBD_SOURCE
    # unmap rbd device
    $SUDO $RBD unmap /dev/rbd/$RBD_SOURCE

    # remove original
    $RM -f $TMP_DST
EOF
)

(And yes, things are a little … ugly.)

However, I've got no idea how to tell OpenNebula that the image is actually 40GB, not the 7.5GB originally detected.
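If the protocol discussed in comment #1 (CP reporting both SOURCE and SIZE back to OpenNebula) applies to your version, the driver could print the final size on its stdout after the copy. The snippet below is a hypothetical sketch of that idea: report_size and mb_ceil are illustrative names, and the exact output format OpenNebula expects should be checked against your version's driver documentation.

```shell
#!/bin/sh
# Hypothetical sketch: after the copy, report the final (virtual) size back.
mb_ceil() {
    # Round a byte count up to whole MiB (1 MiB = 1048576 bytes).
    echo $(( ($1 + 1048575) / 1048576 ))
}

report_size() {
    # usage: report_size <registered-source> <image-file>
    # Virtual size in bytes, taken from qemu-img's JSON output.
    virtual_bytes=$(qemu-img info --output=json "$2" \
        | sed -n 's/.*"virtual-size": *\([0-9]*\).*/\1/p')
    echo "$1 $(mb_ceil "$virtual_bytes")"
}
```

For the image above, mb_ceil on 42949017600 bytes yields 40960 MB, i.e. the 40G virtual size rather than the 7.5G on-disk size.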
