From: Kevin Wolf <kwolf@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Peter Lieven <pl@kamp.de>, Jason Dillaman <dillaman@redhat.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org, armbru@redhat.com
Subject: Re: QEMU RBD is slow with QCOW2 images
Date: Thu, 4 Mar 2021 15:59:17 +0100 [thread overview]
Message-ID: <20210304145917.GF9607@merkur.fritz.box> (raw)
In-Reply-To: <20210304140829.4tfdrd2mhqa4o76h@steredhat>
Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
> On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
> > Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
> > > Hi Jason,
> > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > writing data is very slow compared to a raw file.
> > >
> > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > different object size, for the raw file I see '4 MiB objects', for
> > > QCOW2 I
> > > see '64 KiB objects' as reported on comment 14 [2].
> > > This should be the main issue of slowness, indeed forcing in the code 4 MiB
> > > object size also for QCOW2 increased the speed a lot.
> > >
> > > Looking better I discovered that for raw files, we call rbd_create() with
> > > obj_order = 0 (if 'cluster_size' options is not defined), so the default
> > > object size is used.
> > > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> > > defined for QCOW2, is 64 KiB.
> >
> > Hm, the QemuOpts-based image creation is messy, but why does the rbd
> > driver even see the cluster_size option?
> >
> > The first thing qcow2_co_create_opts() does is splitting the passed
> > QemuOpts into options it will process on the qcow2 layer and options
> > that are passed to the protocol layer. So if you pass a cluster_size
> > option, qcow2 should take it for itself and not pass it to rbd.
> >
> > If it is passed to rbd, I think that's a bug in the qcow2 driver.
>
> IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
> BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.
>
> Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
> properly returns a NULL pointer, but then we call find_default_by_name()
> that returns the default value of qcow2 format (64k).
Ugh, I see why. We're passing the protocol driver a QemuOpts that was
created for a QemuOptsList with the qcow2 default, not for its own
QemuOptsList. This is wrong.
Note that the QemuOptsList is not qcow2_create_opts itself, but a list
that is created with qemu_opts_append() to combine qcow2 and rbd options
into a new QemuOptsList. For overlapping options, the format wins.
I don't think you can change the QemuOptsList of an existing QemuOpts,
nor is there a clone operation that could just copy all options into a
new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
hack^Wsolution would be converting to QDict and back...
> > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> > > QemuOpts calling qemu_opts_to_qdict_filtered().
> > > For some reason that I have yet to understand, after this deletion, however
> > > remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
> > > that it's used in qemu_rbd_co_create_opts()
> >
> > So it seems you came to a similar conclusion. We need to find out where
> > the 64k come from and just fix that so that rbd uses its default.
>
> Yes, I tried debugging above, but I'm not sure how to fix it.
>
> Maybe a new parameter in qemu_opt_get_size_helper() to prevent it from
> looking for the default value.
> Or we should prevent the default value from being added to the
> opts->list->desc, but that part is still not very clear to me.
opts->list is already wrong, I think this is what we need to fix.
> > > At this point my doubts are:
> > > Does it make sense to use the same cluster_size as qcow2 as object_size in
> > > RBD?
> > > If we want to keep the 2 options separated, how can it be done? Should we
> > > rename the option in block/rbd.c?
> >
> > My lazy answer is that you could just use QMP blockdev-create, where you
> > create layer by layer separately.
> >
> > What could possibly be done for the QemuOpts is using the dotted syntax
> > like for opening, so you could specify file.cluster_size=... for the
> > protocol layer (or data_file.cluster_size=... for the external data
> > file etc.)
>
> This would be cool :-)
I'm almost sure that compatibility will make this more complicated than
it sounds, but we could have a try.
Kevin
next prev parent reply other threads:[~2021-03-04 15:11 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-03 17:40 QEMU RBD is slow with QCOW2 images Stefano Garzarella
2021-03-03 18:47 ` Jason Dillaman
2021-03-03 21:26 ` Peter Lieven
2021-03-04 8:58 ` Stefano Garzarella
2021-03-04 8:55 ` Stefano Garzarella
2021-03-04 10:25 ` Daniel P. Berrangé
2021-03-04 11:12 ` Stefano Garzarella
2021-03-04 11:15 ` Daniel P. Berrangé
2021-03-04 12:05 ` Kevin Wolf
2021-03-04 14:08 ` Stefano Garzarella
2021-03-04 14:59 ` Kevin Wolf [this message]
2021-03-04 17:32 ` Stefano Garzarella
2021-03-05 9:16 ` Kevin Wolf
2021-03-05 9:44 ` Stefano Garzarella
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210304145917.GF9607@merkur.fritz.box \
--to=kwolf@redhat.com \
--cc=armbru@redhat.com \
--cc=dillaman@redhat.com \
--cc=pl@kamp.de \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=sgarzare@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).