From: Sam Li <faithilikerun@gmail.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
stefanha@redhat.com, Hanna Reitz <hreitz@redhat.com>,
dmitry.fomichev@wdc.com, qemu-block@nongnu.org,
dlemoal@kernel.org, Sam Li <faithilikerun@gmail.com>
Subject: [PATCH v2] block/file-posix: optimize append write
Date: Fri, 4 Oct 2024 12:41:23 +0200
Message-ID: <20241004104123.236457-1-faithilikerun@gmail.com>
When the file-posix driver emulates append write, it takes the wps lock
before issuing the write and releases it only after the write completes,
which limits the IO queue depth to one.

The write IO flow can be optimized to allow concurrent writes. With this
change, the lock is held only in two cases:
1. Before issuing the write: assuming that the write IO will succeed,
   check the offset against the wp and advance the wp.
2. After a failed write: report that zone and use the reported value
   as the current wp.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
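For readers who want the locking scheme in isolation, here is a minimal
standalone sketch of the two cases described above. It is not the QEMU
code: ZoneWps, zone_wps_init(), zone_reserve() and zone_resync() are
hypothetical names, a pthread mutex stands in for the CoMutex, the zone
geometry is a made-up constant, conventional zones are ignored, and the
"reported" wp is passed in by the caller instead of coming from a zone
report.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define NR_ZONES  4                  /* hypothetical geometry */
#define ZONE_SIZE UINT64_C(4096)

/* Hypothetical stand-in for BlockZoneWps: one write pointer per zone. */
typedef struct {
    pthread_mutex_t lock;            /* plays the role of wps->colock */
    uint64_t wp[NR_ZONES];
} ZoneWps;

/* Start with every zone empty: each wp sits at the start of its zone. */
static void zone_wps_init(ZoneWps *wps)
{
    pthread_mutex_init(&wps->lock, NULL);
    for (uint64_t i = 0; i < NR_ZONES; i++) {
        wps->wp[i] = i * ZONE_SIZE;
    }
}

/*
 * Case 1: reserve the range before the write is issued.
 * For an append, *offset is replaced by the current wp of the zone.
 * For a regular write, *offset must already equal the wp.
 * The lock is dropped before the caller issues the actual I/O, so
 * several writes can be in flight at once.
 */
static int zone_reserve(ZoneWps *wps, uint64_t *offset, uint64_t bytes,
                        int is_append)
{
    uint64_t index = *offset / ZONE_SIZE;
    uint64_t start, end_zone;
    int ret = 0;

    if (index >= NR_ZONES) {
        return -EINVAL;
    }
    end_zone = (index + 1) * ZONE_SIZE;

    pthread_mutex_lock(&wps->lock);
    start = is_append ? wps->wp[index] : *offset;
    if (start != wps->wp[index] || start + bytes > end_zone) {
        ret = -EINVAL;               /* misaligned or crosses the zone end */
    } else {
        wps->wp[index] = start + bytes;   /* advance the wp optimistically */
        *offset = start;
    }
    pthread_mutex_unlock(&wps->lock);
    return ret;
}

/*
 * Case 2: the write failed, so the optimistic wp may be wrong.
 * Overwrite it with the value the device reports for that zone.
 */
static void zone_resync(ZoneWps *wps, uint64_t index, uint64_t reported_wp)
{
    assert(index < NR_ZONES);
    pthread_mutex_lock(&wps->lock);
    wps->wp[index] = reported_wp;
    pthread_mutex_unlock(&wps->lock);
}
```

A caller would run zone_reserve(), issue the write without holding the
lock, and call zone_resync() only if the write returns an error. The
trade-off is that a failed write can leave the cached wp ahead of the
device until the resync runs.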
block/file-posix.c | 57 ++++++++++++++++++++++++++++++----------------
1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 90fa54352c..a65a23cb2c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2482,18 +2482,46 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
BDRVRawState *s = bs->opaque;
RawPosixAIOData acb;
int ret;
- uint64_t offset = *offset_ptr;
+ uint64_t end_offset, end_zone, offset = *offset_ptr;
+ uint64_t *wp;
if (fd_open(bs) < 0)
return -EIO;
#if defined(CONFIG_BLKZONED)
if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
bs->bl.zoned != BLK_Z_NONE) {
- qemu_co_mutex_lock(&bs->wps->colock);
- if (type & QEMU_AIO_ZONE_APPEND) {
- int index = offset / bs->bl.zone_size;
- offset = bs->wps->wp[index];
+ BlockZoneWps *wps = bs->wps;
+ int index = offset / bs->bl.zone_size;
+
+ qemu_co_mutex_lock(&wps->colock);
+ wp = &wps->wp[index];
+ if (!BDRV_ZT_IS_CONV(*wp)) {
+ if (type & QEMU_AIO_WRITE && offset != *wp) {
+ error_report("write offset 0x%" PRIx64 " is not equal to the wp"
+ " of zone[%d] 0x%" PRIx64 "", offset, index, *wp);
+ qemu_co_mutex_unlock(&wps->colock);
+ return -EINVAL;
+ }
+
+ if (type & QEMU_AIO_ZONE_APPEND) {
+ offset = *wp;
+ *offset_ptr = offset;
+ }
+
+ end_offset = offset + bytes;
+ end_zone = (index + 1) * bs->bl.zone_size;
+ if (end_offset > end_zone) {
+ error_report("write exceeds zone boundary with end_offset "
+ "%" PRIu64 ", end_zone %" PRIu64 "",
+ end_offset, end_zone);
+ qemu_co_mutex_unlock(&wps->colock);
+ return -EINVAL;
+ }
+
+ /* Advance the wp */
+ *wp = end_offset;
}
+ qemu_co_mutex_unlock(&wps->colock);
}
#endif
@@ -2540,28 +2568,19 @@ out:
#if defined(CONFIG_BLKZONED)
if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
bs->bl.zoned != BLK_Z_NONE) {
- BlockZoneWps *wps = bs->wps;
if (ret == 0) {
- uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
- if (!BDRV_ZT_IS_CONV(*wp)) {
- if (type & QEMU_AIO_ZONE_APPEND) {
- *offset_ptr = *wp;
- trace_zbd_zone_append_complete(bs, *offset_ptr
- >> BDRV_SECTOR_BITS);
- }
- /* Advance the wp if needed */
- if (offset + bytes > *wp) {
- *wp = offset + bytes;
- }
+ if (type & QEMU_AIO_ZONE_APPEND) {
+ trace_zbd_zone_append_complete(bs, *offset_ptr
+ >> BDRV_SECTOR_BITS);
}
} else {
+ qemu_co_mutex_lock(&bs->wps->colock);
/*
* write and append write are not allowed to cross zone boundaries
*/
update_zones_wp(bs, s->fd, offset, 1);
+ qemu_co_mutex_unlock(&bs->wps->colock);
}
-
- qemu_co_mutex_unlock(&wps->colock);
}
#endif
return ret;
--
2.34.1