Re: [PATCH] block/file-posix: optimize append write

From: Damien Le Moal <dlemoal@kernel.org>
To: Sam Li <faithilikerun@gmail.com>, qemu-devel@nongnu.org
Cc: stefanha@redhat.com, Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, dmitry.fomichev@wdc.com,
	Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH] block/file-posix: optimize append write
Date: Tue, 1 Oct 2024 09:20:46 +0900
Message-ID: <1bf857b1-d8ee-4fa5-ae63-074a1533b5f6@kernel.org>
In-Reply-To: <20240929160359.581141-1-faithilikerun@gmail.com>

On 9/30/24 01:03, Sam Li wrote:
> When the file-posix driver emulates append write, it holds the lock
> whenever accessing wp, which limits the IO queue depth to one.
> 
> The write IO flow can be optimized to allow concurrent writes. The lock
> is held in two cases:
> 1. Assumed that the write IO succeeds, update the wp before issuing the
> write.
> 2. If the write IO fails, report that zone and use the reported value
> as the current wp.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/file-posix.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ff928b5e85..64a57fadb1 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2489,11 +2489,19 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
>  #if defined(CONFIG_BLKZONED)
>      if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
>          bs->bl.zoned != BLK_Z_NONE) {
> -        qemu_co_mutex_lock(&bs->wps->colock);
> +        BlockZoneWps *wps = bs->wps;
> +        int index = offset / bs->bl.zone_size;

A blank line after this declaration (to separate declarations from code) would
make this code more readable...

> +        qemu_co_mutex_lock(&wps->colock);
> +        uint64_t *wp = &wps->wp[index];
>          if (type & QEMU_AIO_ZONE_APPEND) {
> -            int index = offset / bs->bl.zone_size;
> -            offset = bs->wps->wp[index];
> +            offset = *wp;
> +            *offset_ptr = offset;
> +        }
> +        /* Advance the wp if needed */
> +        if (offset + bytes > *wp) {

Why the if? offset MUST be equal to wp for regular writes, and for zone append
we do not need to check the offset at all, since it was just set from wp. So
advancing the wp should be unconditional.

BUT! Where are the checks for "zone is full" and "offset == wp" for write
operations? These must be done while holding the zone lock.
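
To illustrate the point, here is a minimal standalone sketch of doing the
checks and the wp update under the lock. It is not QEMU code: ZoneModel,
zone_reserve(), the zone_cap field and the pthread mutex standing in for
wps->colock are all hypothetical, the error codes are only illustrative, and
conventional zones are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device write pointer state. */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of wps->colock */
    uint64_t zone_size;     /* bytes between two zone starts */
    uint64_t zone_cap;      /* writable bytes in each zone */
    uint64_t *wp;           /* absolute write pointer of each zone */
} ZoneModel;

/*
 * Validate a write or zone append and reserve its range, all under the
 * lock. On success, *offset is the start of the write and the wp has
 * been advanced unconditionally. Error codes are illustrative only.
 */
static int zone_reserve(ZoneModel *m, uint64_t *offset, uint64_t bytes,
                        bool is_append)
{
    uint64_t index, zone_end;
    uint64_t *wp;
    int ret = 0;

    assert(m->zone_size > 0 && m->zone_cap <= m->zone_size);
    index = *offset / m->zone_size;
    zone_end = index * m->zone_size + m->zone_cap;
    wp = &m->wp[index];

    pthread_mutex_lock(&m->lock);
    if (is_append) {
        /* Zone append: the device picks the offset, no check needed. */
        *offset = *wp;
    } else if (*offset != *wp) {
        /* Regular write: must start exactly at the write pointer. */
        ret = -EINVAL;
        goto out;
    }
    if (*wp >= zone_end || bytes > zone_end - *wp) {
        /* Zone is full, or the write would go past its capacity. */
        ret = -ENOSPC;
        goto out;
    }
    /* Checks passed: advance the wp without any further condition. */
    *wp += bytes;
out:
    pthread_mutex_unlock(&m->lock);
    return ret;
}
```

With the checks done this way, the "if (offset + bytes > *wp)" test has nothing
left to guard against: any request that reaches the update starts at wp.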

> +            *wp = offset + bytes;
>          }
> +        qemu_co_mutex_unlock(&bs->wps->colock);
>      }
>  #endif
>  
> @@ -2540,28 +2548,19 @@ out:
>  #if defined(CONFIG_BLKZONED)
>      if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
>          bs->bl.zoned != BLK_Z_NONE) {
> -        BlockZoneWps *wps = bs->wps;
>          if (ret == 0) {
> -            uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
> -            if (!BDRV_ZT_IS_CONV(*wp)) {
> -                if (type & QEMU_AIO_ZONE_APPEND) {
> -                    *offset_ptr = *wp;
> -                    trace_zbd_zone_append_complete(bs, *offset_ptr
> -                        >> BDRV_SECTOR_BITS);
> -                }
> -                /* Advance the wp if needed */
> -                if (offset + bytes > *wp) {
> -                    *wp = offset + bytes;
> -                }
> +            if (type & QEMU_AIO_ZONE_APPEND) {
> +                trace_zbd_zone_append_complete(bs, *offset_ptr
> +                    >> BDRV_SECTOR_BITS);
>              }
>          } else {
> +            qemu_co_mutex_lock(&bs->wps->colock);
>              /*
>               * write and append write are not allowed to cross zone boundaries
>               */
>              update_zones_wp(bs, s->fd, offset, 1);
> +            qemu_co_mutex_unlock(&bs->wps->colock);
>          }
> -
> -        qemu_co_mutex_unlock(&wps->colock);
>      }
>  #endif
>      return ret;


-- 
Damien Le Moal
Western Digital Research
