From: Kevin Wolf <kwolf@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Josh Durgin <jdurgin@redhat.com>,
	dillaman@redhat.com, qemu-devel <qemu-devel@nongnu.org>,
	qemu-block <qemu-block@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
Date: Tue, 7 May 2019 11:43:50 +0200
Message-ID: <20190507094350.GE5808@localhost.localdomain>
In-Reply-To: <20190506095031.kffsp76faaqhkdr2@steredhat>

On 06.05.2019 at 11:50, Stefano Garzarella wrote:
> On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> > On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > RBD APIs don't allow us to write more than the size set with
> > > rbd_create() or rbd_resize().
> > > In order to support growing images (eg. qcow2), we resize the
> > > image before write operations that exceed the current size.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > v2:
> > >   - use bs->total_sectors instead of adding a new field [Kevin]
> > >   - resize the image only during write operations [Kevin]
> > >     for read operations, bdrv_aligned_preadv() already handles reads
> > >     that exceed the length returned by bdrv_getlength(), so IMHO we
> > >     don't need to handle this in the rbd driver
> > > ---
> > >  block/rbd.c | 14 +++++++++++++-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/rbd.c b/block/rbd.c
> > > index 0c549c9935..613e8f4982 100644
> > > --- a/block/rbd.c
> > > +++ b/block/rbd.c
> > > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > >      }
> > >
> > >      switch (cmd) {
> > > -    case RBD_AIO_WRITE:
> > > +    case RBD_AIO_WRITE: {
> > > +        /*
> > > +         * RBD APIs don't allow us to write more than actual size, so in order
> > > +         * to support growing images, we resize the image before write
> > > +         * operations that exceed the current size.
> > > +         */
> > > +        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
> > 
> > When will "bs->total_sectors" be refreshed to represent the correct
> > current size? You wouldn't want a future write, whose extent is
> > greater than the original image size but less than that of a previous
> > I/O that expanded the image, to attempt to shrink the image.
> > 
> 
> Good point!
> IIUC it can happen, because in bdrv_aligned_pwritev() we do these
> steps:
> 1. call bdrv_driver_pwritev(), which invokes "drv->bdrv_aio_pwritev" and
>    then waits by calling "qemu_coroutine_yield()"
> 2. call bdrv_co_write_req_finish(), which updates "bs->total_sectors"
> 
> Between steps 1 and 2, another request may be executed, and then the
> issue that you described can occur.
> 
> The solutions that I have in mind are:
> a. Add a variable in the BDRVRBDState to track the latest resize.

This would work and be relatively simple.
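
A minimal sketch of what option a could look like, assuming a
hypothetical image_size field in the driver state. The resize call is
mocked so the example compiles on its own; MockImage, SketchRBDState,
mock_rbd_resize() and sketch_grow_for_write() are made-up names, not
QEMU or librbd API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the RBD image; not the librbd image handle. */
typedef struct MockImage {
    uint64_t size;          /* size the image currently has */
    int resize_calls;       /* how many resizes were issued */
} MockImage;

/* Mock resize: sets the size unconditionally, so it could also shrink. */
static int mock_rbd_resize(MockImage *image, uint64_t size)
{
    image->size = size;
    image->resize_calls++;
    return 0;
}

/* Hypothetical driver state; image_size is the field option a would add. */
typedef struct SketchRBDState {
    MockImage *image;
    uint64_t image_size;    /* latest size set by this driver */
} SketchRBDState;

/*
 * Grow the image before a write that ends past the tracked size.
 * The tracked size is updated as soon as the resize succeeds, so a
 * later, smaller write that starts before the first one completes
 * compares against the new size and never shrinks the image.
 */
static int sketch_grow_for_write(SketchRBDState *s, uint64_t off,
                                 uint64_t size)
{
    uint64_t end = off + size;
    int r;

    assert(end >= off);     /* caller must not pass an overflowing range */

    if (end <= s->image_size) {
        return 0;           /* already large enough */
    }

    r = mock_rbd_resize(s->image, end);
    if (r < 0) {
        return r;
    }
    s->image_size = end;
    return 0;
}
```

The point of the separate field is that it does not depend on when
the block layer refreshes bs->total_sectors.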

> b. Call rbd_get_size() before rbd_resize() to make sure we don't shrink
>    the image.

I'm not sure if rbd_get_size() involves network traffic or other
significant complexity. If so, I'd definitely avoid it.

> c. Update "bs->total_sectors" after the rbd_resize(), but I'm not
>    sure that is allowed.
> 
> @Jason, @Kevin Do you have any advice?

We need to make sure to run everything that bdrv_co_write_req_finish()
does for resizing an image:

    bs->total_sectors = end_sector;
    bdrv_parent_cb_resize(bs);
    bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS);

Just duplicating that code wouldn't be good; if something is added there
later, we'd probably forget to update rbd, too. So I think your solution
c would at least involve refactoring the above code into a separate
function that can be called from rbd.
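
Such a refactoring could look roughly like the sketch below. The types
and callbacks are simplified stand-ins so that the example compiles on
its own, and the helper name sketch_bdrv_update_size() is made up for
illustration; it is not an existing block layer function:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9    /* 512-byte sectors, like BDRV_SECTOR_BITS */

/* Simplified stand-in for BlockDriverState. */
typedef struct SketchBDS {
    int64_t total_sectors;
    int parent_resize_notifications;  /* effect of the parent callback */
    int64_t dirty_bitmap_bytes;       /* effect of the bitmap truncate */
} SketchBDS;

/* Stand-in for bdrv_parent_cb_resize(). */
static void sketch_parent_cb_resize(SketchBDS *bs)
{
    bs->parent_resize_notifications++;
}

/* Stand-in for bdrv_dirty_bitmap_truncate(). */
static void sketch_dirty_bitmap_truncate(SketchBDS *bs, int64_t bytes)
{
    bs->dirty_bitmap_bytes = bytes;
}

/*
 * Hypothetical shared helper: the one place that performs all three
 * steps, so the generic write path and a driver such as rbd cannot
 * drift apart when another step is added later.
 */
static void sketch_bdrv_update_size(SketchBDS *bs, int64_t end_sector)
{
    assert(end_sector >= 0);
    bs->total_sectors = end_sector;
    sketch_parent_cb_resize(bs);
    sketch_dirty_bitmap_truncate(
        bs, end_sector * ((int64_t)1 << SKETCH_SECTOR_BITS));
}
```

Both callers would then go through the helper instead of open-coding
the three statements.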

But solution a might actually be the simplest. In this case, sorry for
giving you bad advice in v1 of the patch.

Kevin

