qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Peter Lieven <pl@kamp.de>, Jason Dillaman <dillaman@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org, armbru@redhat.com
Subject: Re: QEMU RBD is slow with QCOW2 images
Date: Fri, 5 Mar 2021 10:16:41 +0100	[thread overview]
Message-ID: <20210305091641.GA5155@merkur.fritz.box> (raw)
In-Reply-To: <20210304173254.3qid3tm26eq6yweg@steredhat>

Am 04.03.2021 um 18:32 hat Stefano Garzarella geschrieben:
> On Thu, Mar 04, 2021 at 03:59:17PM +0100, Kevin Wolf wrote:
> > Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
> > > On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
> > > > Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
> > > > > Hi Jason,
> > > > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > > > writing data is very slow compared to a raw file.
> > > > >
> > > > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > > > different object size, for the raw file I see '4 MiB objects', for
> > > > > QCOW2 I
> > > > > see '64 KiB objects' as reported on comment 14 [2].
> > > > > This should be the main issue of slowness, indeed forcing in the code 4 MiB
> > > > > object size also for QCOW2 increased the speed a lot.
> > > > >
> > > > > Looking better I discovered that for raw files, we call rbd_create() with
> > > > > obj_order = 0 (if 'cluster_size' options is not defined), so the default
> > > > > object size is used.
> > > > > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> > > > > defined for QCOW2, is 64 KiB.
> > > >
> > > > Hm, the QemuOpts-based image creation is messy, but why does the rbd
> > > > driver even see the cluster_size option?
> > > >
> > > > The first thing qcow2_co_create_opts() does is splitting the passed
> > > > QemuOpts into options it will process on the qcow2 layer and options
> > > > that are passed to the protocol layer. So if you pass a cluster_size
> > > > option, qcow2 should take it for itself and not pass it to rbd.
> > > >
> > > > If it is passed to rbd, I think that's a bug in the qcow2 driver.
> > > 
> > > IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
> > > BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.
> > > 
> > > Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
> > > properly returns a NULL pointer, but then we call find_default_by_name()
> > > that returns the default value of qcow2 format (64k).
> > 
> > Ugh, I see why. We're passing the protocol driver a QemuOpts that was
> > created for a QemuOptsList with the qcow2 default, not for its own
> > QemuOptsList. This is wrong.
> > 
> > Note that the QemuOptsList is not qcow2_create_opts itself, but a list
> > that is created with qemu_opts_append() to combine qcow2 and rbd options
> > into a new QemuOptsList. For overlapping options, the format wins.
> > 
> > I don't think you can change the QemuOptsList of an existing QemuOpts,
> > nor is there a clone operation that could just copy all options into a
> > new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
> > hack^Wsolution would be converting to QDict and back...
> 
> Do you mean something like this? (I'll send a proper patch when everything
> is a little clearer to me :-)
> 
> diff --git a/block.c b/block.c
> index a1f3cecd75..74b02b32dc 100644
> --- a/block.c
> +++ b/block.c
> @@ -671,13 +671,33 @@ out:
>  int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>  {
>      BlockDriver *drv;
> +    QemuOpts *new_opts;
> +    QDict *qdict;
> +    int ret;
> 
>      drv = bdrv_find_protocol(filename, true, errp);
>      if (drv == NULL) {
>          return -ENOENT;
>      }
> 
> -    return bdrv_create(drv, filename, opts, errp);
> +    if (!drv->create_opts) {
> +        error_setg(errp, "Driver '%s' does not support image creation",
> +                   drv->format_name);
> +        return -ENOTSUP;
> +    }
> +
> +    qdict = qemu_opts_to_qdict(opts, NULL);
> +    new_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
> +    if (new_opts == NULL) {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ret = bdrv_create(drv, filename, new_opts, errp);
> +out:
> +    qemu_opts_del(new_opts);
> +    qobject_unref(qdict);
> +    return ret;
>  }

Something like this, yes. Does it work for you?

Of course, in the real patch it could use a comment why we're doing
these seemingly redundant conversions.

Kevin



  reply	other threads:[~2021-03-05  9:18 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-03 17:40 QEMU RBD is slow with QCOW2 images Stefano Garzarella
2021-03-03 18:47 ` Jason Dillaman
2021-03-03 21:26   ` Peter Lieven
2021-03-04  8:58     ` Stefano Garzarella
2021-03-04  8:55   ` Stefano Garzarella
2021-03-04 10:25     ` Daniel P. Berrangé
2021-03-04 11:12       ` Stefano Garzarella
2021-03-04 11:15         ` Daniel P. Berrangé
2021-03-04 12:05 ` Kevin Wolf
2021-03-04 14:08   ` Stefano Garzarella
2021-03-04 14:59     ` Kevin Wolf
2021-03-04 17:32       ` Stefano Garzarella
2021-03-05  9:16         ` Kevin Wolf [this message]
2021-03-05  9:44           ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210305091641.GA5155@merkur.fritz.box \
    --to=kwolf@redhat.com \
    --cc=armbru@redhat.com \
    --cc=dillaman@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).