Re: QEMU RBD is slow with QCOW2 images

From: Stefano Garzarella <sgarzare@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Peter Lieven <pl@kamp.de>,
	dillaman@redhat.com, qemu-devel <qemu-devel@nongnu.org>,
	qemu-block <qemu-block@nongnu.org>
Subject: Re: QEMU RBD is slow with QCOW2 images
Date: Thu, 4 Mar 2021 12:12:51 +0100
Message-ID: <20210304111251.2ernxss627lllwqa@steredhat>
In-Reply-To: <YEC1nQPYf4e5o8/j@redhat.com>

On Thu, Mar 04, 2021 at 10:25:33AM +0000, Daniel P. Berrangé wrote:
>On Thu, Mar 04, 2021 at 09:55:40AM +0100, Stefano Garzarella wrote:
>> On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
>> > On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >
>> > > Hi Jason,
>> > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD,
>> > > writing data is very slow compared to a raw file.
>> > >
>> > > Comparing raw vs QCOW2 image creation with RBD, I found that we use a
>> > > different object size: for the raw file I see '4 MiB objects', for QCOW2
>> > > I see '64 KiB objects', as reported in comment 14 [2].
>> > > This is probably the main cause of the slowness: forcing a 4 MiB object
>> > > size in the code for QCOW2 as well increased the speed a lot.
>> > >
>> > > Looking more closely, I discovered that for raw files we call rbd_create()
>> > > with obj_order = 0 (if the 'cluster_size' option is not defined), so the
>> > > default object size is used.
>> > > For QCOW2, instead, we use obj_order = 16, since the default
>> > > 'cluster_size' defined for QCOW2 is 64 KiB.
>> > >
>> > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>> > > size, since in qcow2_co_create_opts() we remove 'cluster_size' from
>> > > QemuOpts by calling qemu_opts_to_qdict_filtered().
>> > > For some reason that I have yet to understand, after this removal
>> > > QemuOpts still contains the qcow2 default value of 'cluster_size'
>> > > (64 KiB), which is then used in qemu_rbd_co_create_opts().
>> > >
>> > > At this point my questions are:
>> > > Does it make sense to use the same cluster_size as qcow2 as object_size
>> > > in RBD?
>> >
>> > No, not really. But it also doesn't really make any sense to put a
>> > QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
>> > does not put QCOW2 images on RBD, it converts QCOW2 images into raw
>> > images to store in RBD.
>>
>> Yes, that was my doubt, thanks for the confirmation.
>>
>> Daniel (+CC) also confirmed the same thing to me but, for completeness, he
>> added that there is a case where OpenStack could use qcow2 on RBD. That
>> case uses the in-kernel RBD, so the QEMU RBD driver is not involved.
>>
>> >
>> > > If we want to keep the 2 options separated, how can it be done? Should
>> > > we rename the option in block/rbd.c?
>> >
>> > You can already pass overrides to the RBD block driver by just
>> > appending them after the
>> > "rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
>> > that could be re-used.
>>
>> I see, we should extend qemu_rbd_parse_filename() to support it.
>
>We shouldn't really be extending the legacy filename syntax.
>If we need extra options we want them in the QAPI schema for
>blockdev.

Got it.

I'm still a bit confused about how QemuOpts are handled between format 
and protocol drivers.

It seems that in this case the protocol tries to access some information 
from the format (BLOCK_OPT_CLUSTER_SIZE).

Since the format removes this information from the QemuOpts passed to
the protocol, the protocol ends up with the format's default value, even
if a different value was specified.

Is it correct for a protocol to access BLOCK_OPT_CLUSTER_SIZE?
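
To make the numbers concrete, here is a minimal Python sketch of the
behaviour described above. It is a toy model, not QEMU code: the function
and constant names are hypothetical, and the fallback step only mimics what
I observe, since I have not yet understood why the default survives in
QemuOpts.

```python
# Toy model of the behaviour discussed in this thread; not QEMU code.
# All names below are hypothetical and chosen only for illustration.

QCOW2_DEFAULT_CLUSTER_SIZE = 64 * 1024  # 64 KiB, the qcow2 default cited above


def obj_order_for(size_bytes):
    """Return log2(size_bytes) for a power-of-two size.

    0 is passed through unchanged, standing for 'use the RBD default
    object size', as in the raw case described above.
    """
    if size_bytes == 0:
        return 0
    if size_bytes & (size_bytes - 1):
        raise ValueError("object size must be a power of two")
    return size_bytes.bit_length() - 1


def protocol_cluster_size(user_opts, format_default):
    """Mimic the observed lookup on the protocol side.

    The format layer strips the user's 'cluster_size' before the protocol
    layer sees the options, so the lookup falls back to the format default.
    """
    opts_for_protocol = {k: v for k, v in user_opts.items()
                         if k != "cluster_size"}
    return opts_for_protocol.get("cluster_size", format_default)


# The user asks for 2 MiB clusters, but the protocol side still sees 64 KiB.
seen = protocol_cluster_size({"cluster_size": 2 * 1024 * 1024},
                             QCOW2_DEFAULT_CLUSTER_SIZE)
order = obj_order_for(seen)
```

In this model, 'seen' is 65536 and 'order' is 16, which matches the
'64 KiB objects' reported for QCOW2, while a 4 MiB object size would
correspond to order 22.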

>
>> Maybe if we don't want to support this configuration, we should print some
>> warning messages.
>
>Note these are separate layers in the QEMU block layer. qcow2 is a format
>driver, while RBD is a protocol driver. QEMU lets any format driver be
>run on top of any protocol driver in general. In practice there are
>certain combinations that are more sane and commonly used than others,
>but QEMU doesn't document this or assign a support level to each pairing
>right now.

Thanks for the clarification,
Stefano
