qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Sam Li <faithilikerun@gmail.com>
Cc: qemu-devel@nongnu.org, stefanha@redhat.com,
	Hanna Reitz <hreitz@redhat.com>,
	dmitry.fomichev@wdc.com, qemu-block@nongnu.org,
	dlemoal@kernel.org
Subject: Re: [PATCH v2] block/file-posix: optimize append write
Date: Tue, 22 Oct 2024 00:11:28 +0200	[thread overview]
Message-ID: <ZxbRkOg6wB37NAxj@redhat.com> (raw)
In-Reply-To: <CAAAx-8KtfoM+gVRzUjCCLY8vEG=5vcoRGcxBzDDG6DArC8L7tA@mail.gmail.com>

Am 21.10.2024 um 15:21 hat Sam Li geschrieben:
> Kevin Wolf <kwolf@redhat.com> 于2024年10月18日周五 16:37写道:
> >
> > Am 04.10.2024 um 12:41 hat Sam Li geschrieben:
> > > When the file-posix driver emulates append write, it holds the lock
> > > whenever accessing wp, which limits the IO queue depth to one.
> > >
> > > The write IO flow can be optimized to allow concurrent writes. The lock
> > > is held in two cases:
> > > 1. Assumed that the write IO succeeds, update the wp before issuing the
> > > write.
> > > 2. If the write IO fails, report that zone and use the reported value
> > > as the current wp.
> >
> > What happens with the concurrent writes that started later and may not
> > have completed yet? Can we really just reset to the reported value
> > before all other requests have completed, too?
> >
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > ---
> > >  block/file-posix.c | 57 ++++++++++++++++++++++++++++++----------------
> > >  1 file changed, 38 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 90fa54352c..a65a23cb2c 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2482,18 +2482,46 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
> > >      BDRVRawState *s = bs->opaque;
> > >      RawPosixAIOData acb;
> > >      int ret;
> > > -    uint64_t offset = *offset_ptr;
> > > +    uint64_t end_offset, end_zone, offset = *offset_ptr;
> > > +    uint64_t *wp;
> >
> > Without CONFIG_BLKZONED, these are unused variables and break the build.
> >
> > They are only used in the first CONFIG_BLKZONED block, so you could just
> > declare them locally there.
> 
> Thanks! Will do.
> 
> >
> > >
> > >      if (fd_open(bs) < 0)
> > >          return -EIO;
> > >  #if defined(CONFIG_BLKZONED)
> > >      if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
> > >          bs->bl.zoned != BLK_Z_NONE) {
> > > -        qemu_co_mutex_lock(&bs->wps->colock);
> > > -        if (type & QEMU_AIO_ZONE_APPEND) {
> > > -            int index = offset / bs->bl.zone_size;
> > > -            offset = bs->wps->wp[index];
> > > +        BlockZoneWps *wps = bs->wps;
> > > +        int index = offset / bs->bl.zone_size;
> > > +
> > > +        qemu_co_mutex_lock(&wps->colock);
> >
> > This is preexisting, but what is the reason for using coroutine locks
> > here? There doesn't seem to be any code running under the lock that can
> > yield, so a normal mutex might be more efficient.
> 
> Using coroutine locks is to avoid blocking in coroutines. QemuMutex
> blocks the thread when the lock is held instead of yielding.

Right, but usually you have to wait only for a very short time until the
mutex is released again and CoMutexes are more expensive then.

You absolutely do need to use CoMutex when the coroutine can yield in
the critical section, but if it can't, the CoMutex is often worse.

Though as I said here...

> > Hm... Looking a bit closer, get_zones_wp() could probably be a
> > coroutine_fn and call hdev_co_ioctl() instead of calling ioctl()
> > directly in order to avoid blocking.

...we should probably use a coroutine version of ioctl() instead of
the blocking one, and then you do need the CoMutex.

> > > +        wp = &wps->wp[index];
> >
> > Also preexisting, but who guarantees that index is within the bounds? A
> > stacked block driver may try to grow the file size.
> 
> Right. It needs to be checked if it's over nr_zones.

Can you send a separate fix for this, please? (Can be both in one
patch series, though.)

> >
> > > +        if (!BDRV_ZT_IS_CONV(*wp)) {
> > > +            if (type & QEMU_AIO_WRITE && offset != *wp) {
> > > +                error_report("write offset 0x%" PRIx64 " is not equal to the wp"
> > > +                             " of zone[%d] 0x%" PRIx64 "", offset, index, *wp);
> >
> > We can't error_report() in an I/O path that can be triggered by the
> > guest, it could result in unbounded log file growth.
> 
> Those error messages show in the err path and are good for debugging
> zoned device emulation.
> 
> I was wondering if there is a better approach to print errors? Use
> error_report_once() to reduce the log?

If it's for debugging, I think trace points are best.

Kevin



      reply	other threads:[~2024-10-21 22:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-04 10:41 [PATCH v2] block/file-posix: optimize append write Sam Li
2024-10-18 14:37 ` Kevin Wolf
2024-10-20  1:03   ` Damien Le Moal
2024-10-21 11:08     ` Kevin Wolf
2024-10-21 12:32       ` Damien Le Moal
2024-10-21 18:13         ` Stefan Hajnoczi
2024-10-22  1:56           ` Damien Le Moal
2024-10-21 13:21   ` Sam Li
2024-10-21 22:11     ` Kevin Wolf [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZxbRkOg6wB37NAxj@redhat.com \
    --to=kwolf@redhat.com \
    --cc=dlemoal@kernel.org \
    --cc=dmitry.fomichev@wdc.com \
    --cc=faithilikerun@gmail.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).