qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update
@ 2025-04-03  8:16 Pinku Deb Nath
  2025-04-03  8:16 ` [PATCH v2 1/2] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA Pinku Deb Nath
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Pinku Deb Nath @ 2025-04-03  8:16 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-block, qemu-devel, Pinku Deb Nath

Testing with "-t writeback" works for turning on enable_write_cache.
I renamed the function to qemu_pwritev_fua() and fixed the typos.

I moved the handle_aiocb_flush() call into qemu_pwritev_fua() and
removed it from the previous TODO section. Initially I thought
of passing only aiocb, but I was not sure whether I could
derive buf from aiocb, so I added iovec and iovcnt arguments
to qemu_pwritev_fua().

For handling buf in handle_aiocb_rw_linear(), I created an iovec
and passed a reference to it. I assumed that there will be only one
buffer/iovec, so I passed 1 for iovcnt.

Signed-off-by: Pinku Deb Nath <prantoran@gmail.com>

Pinku Deb Nath (2):
  block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA
  block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update

 block/file-posix.c | 54 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 12 deletions(-)

-- 
2.43.0




* [PATCH v2 1/2] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA
  2025-04-03  8:16 [PATCH v2 0/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update Pinku Deb Nath
@ 2025-04-03  8:16 ` Pinku Deb Nath
  2025-04-03  8:16 ` [PATCH v2 2/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update Pinku Deb Nath
  2025-04-03 16:08 ` [PATCH v2 0/2] " Stefan Hajnoczi
  2 siblings, 0 replies; 5+ messages in thread
From: Pinku Deb Nath @ 2025-04-03  8:16 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-block, qemu-devel, Pinku Deb Nath

Force Unit Access (FUA) is an optimization where a disk write with the
flag set is persisted to disk immediately instead of potentially
remaining in the disk's write cache. This commit addresses the TODO
about using pwritev2() with RWF_DSYNC in the thread pool section of
raw_co_prw() when pwritev2() with RWF_DSYNC is available on the host,
which is always the case for Linux kernel >= 4.7. The intent for FUA is
indicated with the BDRV_REQ_FUA flag. The old code paths are preserved
in case BDRV_REQ_FUA is off or pwritev2() with RWF_DSYNC is not available.

During testing, I observed that BDRV_REQ_FUA is always turned on
when blk->enable_write_cache is not set in block/block-backend.c, so
I commented out this section while testing:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/block-backend.c?ref_type=heads#L1432-1434

Signed-off-by: Pinku Deb Nath <prantoran@gmail.com>
---
 block/file-posix.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 56d1972d15..34de816eab 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -229,6 +229,7 @@ typedef struct RawPosixAIOData {
             unsigned long op;
         } zone_mgmt;
     };
+    BdrvRequestFlags flags;
 } RawPosixAIOData;
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -1674,6 +1675,16 @@ qemu_pwritev(int fd, const struct iovec *iov, int nr_iov, off_t offset)
     return pwritev(fd, iov, nr_iov, offset);
 }
 
+static ssize_t
+qemu_pwrite_fua(int fd, const struct iovec *iov, int nr_iov, off_t offset)
+{
+#ifdef RWF_DSYNC
+    return pwritev2(fd, iov, nr_iov, offset, RWF_DSYNC);
+#else
+    return pwritev2(fd, iov, nr_iov, offset, 0);
+#endif
+}
+
 #else
 
 static bool preadv_present = false;
@@ -1698,10 +1709,15 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
 
     len = RETRY_ON_EINTR(
         (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
-            qemu_pwritev(aiocb->aio_fildes,
-                           aiocb->io.iov,
-                           aiocb->io.niov,
-                           aiocb->aio_offset) :
+            (aiocb->flags &  BDRV_REQ_FUA) ?
+                qemu_pwrite_fua(aiocb->aio_fildes,
+                                aiocb->io.iov,
+                                aiocb->io.niov,
+                                aiocb->aio_offset) :
+                qemu_pwritev(aiocb->aio_fildes,
+                            aiocb->io.iov,
+                            aiocb->io.niov,
+                            aiocb->aio_offset) :
             qemu_preadv(aiocb->aio_fildes,
                           aiocb->io.iov,
                           aiocb->io.niov,
@@ -1727,10 +1743,17 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
 
     while (offset < aiocb->aio_nbytes) {
         if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
-            len = pwrite(aiocb->aio_fildes,
-                         (const char *)buf + offset,
-                         aiocb->aio_nbytes - offset,
-                         aiocb->aio_offset + offset);
+            if (aiocb->flags & BDRV_REQ_FUA) {
+                len = qemu_pwrite_fua(aiocb->aio_fildes,
+                                    aiocb->io.iov,
+                                    aiocb->io.niov,
+                                    aiocb->aio_offset);
+            } else {
+                len = pwrite(aiocb->aio_fildes,
+                            (const char *)buf + offset,
+                            aiocb->aio_nbytes - offset,
+                            aiocb->aio_offset + offset);
+            }
         } else {
             len = pread(aiocb->aio_fildes,
                         buf + offset,
@@ -2539,14 +2562,17 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
             .iov            = qiov->iov,
             .niov           = qiov->niov,
         },
+        .flags          = flags,
     };
 
     assert(qiov->size == bytes);
     ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
+#ifndef RWD_DSYNC
     if (ret == 0 && (flags & BDRV_REQ_FUA)) {
         /* TODO Use pwritev2() instead if it's available */
         ret = raw_co_flush_to_disk(bs);
     }
+#endif
     goto out; /* Avoid the compiler err of unused label */
 
 out:
-- 
2.43.0




* [PATCH v2 2/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update
  2025-04-03  8:16 [PATCH v2 0/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update Pinku Deb Nath
  2025-04-03  8:16 ` [PATCH v2 1/2] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA Pinku Deb Nath
@ 2025-04-03  8:16 ` Pinku Deb Nath
  2025-04-03 15:58   ` Stefan Hajnoczi
  2025-04-03 16:08 ` [PATCH v2 0/2] " Stefan Hajnoczi
  2 siblings, 1 reply; 5+ messages in thread
From: Pinku Deb Nath @ 2025-04-03  8:16 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: qemu-block, qemu-devel, Pinku Deb Nath

Testing with "-t writeback" works for turning on enable_write_cache.
I renamed the function to qemu_pwritev_fua() and fixed the typos.

I moved the handle_aiocb_flush() call into qemu_pwritev_fua() and
removed it from the previous TODO section. Initially I thought
of passing only aiocb, but I was not sure whether I could
derive buf from aiocb, so I added iovec and iovcnt arguments
to qemu_pwritev_fua().

For handling buf in handle_aiocb_rw_linear(), I created an iovec
and passed a reference to it. I assumed that there will be only one
buffer/iovec, so I passed 1 for iovcnt.

Signed-off-by: Pinku Deb Nath <prantoran@gmail.com>
---
 block/file-posix.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 34de816eab..4fffd49318 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1676,12 +1676,24 @@ qemu_pwritev(int fd, const struct iovec *iov, int nr_iov, off_t offset)
 }
 
 static ssize_t
-qemu_pwrite_fua(int fd, const struct iovec *iov, int nr_iov, off_t offset)
+qemu_pwritev_fua(const RawPosixAIOData *aiocb, struct iovec *iov, int iovcnt)
 {
 #ifdef RWF_DSYNC
-    return pwritev2(fd, iov, nr_iov, offset, RWF_DSYNC);
+    return pwritev2(aiocb->aio_fildes,
+                    iov,
+                    iovcnt,
+                    aiocb->aio_offset,
+                    RWF_DSYNC);
 #else
-    return pwritev2(fd, iov, nr_iov, offset, 0);
+    ssize_t len = pwritev2(aiocb->aio_fildes,
+                        iov,
+                        iovcnt,
+                        aiocb->aio_offset,
+                        0);
+    if (len == 0) {
+        len = handle_aiocb_flush(aiocb);
+    }
+    return len;
 #endif
 }
 
@@ -1710,10 +1722,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
     len = RETRY_ON_EINTR(
         (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
             (aiocb->flags &  BDRV_REQ_FUA) ?
-                qemu_pwrite_fua(aiocb->aio_fildes,
-                                aiocb->io.iov,
-                                aiocb->io.niov,
-                                aiocb->aio_offset) :
+                qemu_pwritev_fua(aiocb, aiocb->io.iov, aiocb->io.niov) :
                 qemu_pwritev(aiocb->aio_fildes,
                             aiocb->io.iov,
                             aiocb->io.niov,
@@ -1744,10 +1753,11 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
     while (offset < aiocb->aio_nbytes) {
         if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
             if (aiocb->flags & BDRV_REQ_FUA) {
-                len = qemu_pwrite_fua(aiocb->aio_fildes,
-                                    aiocb->io.iov,
-                                    aiocb->io.niov,
-                                    aiocb->aio_offset);
+                struct iovec iov = {
+                    .iov_base = buf,
+                    .iov_len = aiocb->aio_nbytes - offset,
+                };
+                len = qemu_pwritev_fua(aiocb, &iov, 1);
             } else {
                 len = pwrite(aiocb->aio_fildes,
                             (const char *)buf + offset,
@@ -2567,12 +2577,6 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
 
     assert(qiov->size == bytes);
     ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
-#ifndef RWD_DSYNC
-    if (ret == 0 && (flags & BDRV_REQ_FUA)) {
-        /* TODO Use pwritev2() instead if it's available */
-        ret = raw_co_flush_to_disk(bs);
-    }
-#endif
     goto out; /* Avoid the compiler err of unused label */
 
 out:
-- 
2.43.0




* Re: [PATCH v2 2/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update
  2025-04-03  8:16 ` [PATCH v2 2/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update Pinku Deb Nath
@ 2025-04-03 15:58   ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2025-04-03 15:58 UTC (permalink / raw)
  To: Pinku Deb Nath; +Cc: Kevin Wolf, qemu-block, qemu-devel

On Thu, Apr 03, 2025 at 01:16:33AM -0700, Pinku Deb Nath wrote:
> Testing with "-t writeback" works for turning on enable_write_cache.
> I renamed the function to qemu_pwritev_fua() and fixed the typos.
> 
> I moved the handle_aiocb_flush() call into qemu_pwritev_fua() and
> removed it from the previous TODO section. Initially I thought
> of passing only aiocb, but I was not sure whether I could
> derive buf from aiocb, so I added iovec and iovcnt arguments
> to qemu_pwritev_fua().
> 
> For handling buf in handle_aiocb_rw_linear(), I created an iovec
> and passed a reference to it. I assumed that there will be only one
> buffer/iovec, so I passed 1 for iovcnt.
> 
> Signed-off-by: Pinku Deb Nath <prantoran@gmail.com>
> ---
>  block/file-posix.c | 38 +++++++++++++++++++++-----------------
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 34de816eab..4fffd49318 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1676,12 +1676,24 @@ qemu_pwritev(int fd, const struct iovec *iov, int nr_iov, off_t offset)
>  }
>  
>  static ssize_t
> -qemu_pwrite_fua(int fd, const struct iovec *iov, int nr_iov, off_t offset)
> +qemu_pwritev_fua(const RawPosixAIOData *aiocb, struct iovec *iov, int iovcnt)
>  {
>  #ifdef RWF_DSYNC
> -    return pwritev2(fd, iov, nr_iov, offset, RWF_DSYNC);
> +    return pwritev2(aiocb->aio_fildes,
> +                    iov,
> +                    iovcnt,
> +                    aiocb->aio_offset,
> +                    RWF_DSYNC);
>  #else
> -    return pwritev2(fd, iov, nr_iov, offset, 0);
> +    ssize_t len = pwritev2(aiocb->aio_fildes,
> +                        iov,
> +                        iovcnt,
> +                        aiocb->aio_offset,
> +                        0);

On a non-Linux host pwritev2(2) will not exist. Please take a look at
how qemu_preadv() is integrated (including the !CONFIG_PREADV case) and
decide on a solution that works on non-Linux hosts.
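
One possible shape for such a solution is sketched below. It is a minimal, hypothetical helper (pwritev_fua_sketch() is not a QEMU function) and not the approach QEMU settled on: it uses pwritev2() with RWF_DSYNC when the headers expose it, and otherwise falls back to pwritev() followed by fdatasync(). It assumes the host provides pwritev() and fdatasync(), and that ENOSYS or EOPNOTSUPP is what an older kernel reports for a missing syscall or flag. QEMU itself relies on configure-time checks such as CONFIG_PREADV, which this sketch does not reproduce.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* glibc: exposes pwritev2() and RWF_DSYNC */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>          /* open(), for callers that create the fd */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU: write iov at offset and return
 * only after the data is durable. Returns the number of bytes written,
 * or -1 with errno set.
 */
static ssize_t pwritev_fua_sketch(int fd, const struct iovec *iov,
                                  int iovcnt, off_t offset)
{
    ssize_t len;

    assert(iovcnt > 0);

#if defined(__linux__) && defined(RWF_DSYNC)
    /* Fast path: one syscall with per-write O_DSYNC semantics. */
    len = pwritev2(fd, iov, iovcnt, offset, RWF_DSYNC);
    if (len >= 0 || (errno != ENOSYS && errno != EOPNOTSUPP)) {
        return len;
    }
    /* Assumption: these errno values mean "not supported here". */
#endif

    /* Portable path: plain vectored write, then flush file data. */
    len = pwritev(fd, iov, iovcnt, offset);
    if (len > 0 && fdatasync(fd) < 0) {
        return -1;
    }
    return len;
}
```

Note that the fallback flushes all dirty data of the file, not just this request, so it is a correctness fallback rather than an equivalent fast path.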

> +    if (len == 0) {
> +        len = handle_aiocb_flush(aiocb);
> +    }
> +    return len;
>  #endif
>  }
>  
> @@ -1710,10 +1722,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
>      len = RETRY_ON_EINTR(
>          (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
>              (aiocb->flags &  BDRV_REQ_FUA) ?
> -                qemu_pwrite_fua(aiocb->aio_fildes,
> -                                aiocb->io.iov,
> -                                aiocb->io.niov,
> -                                aiocb->aio_offset) :
> +                qemu_pwritev_fua(aiocb, aiocb->io.iov, aiocb->io.niov) :
>                  qemu_pwritev(aiocb->aio_fildes,
>                              aiocb->io.iov,
>                              aiocb->io.niov,
> @@ -1744,10 +1753,11 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
>      while (offset < aiocb->aio_nbytes) {
>          if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
>              if (aiocb->flags & BDRV_REQ_FUA) {
> -                len = qemu_pwrite_fua(aiocb->aio_fildes,
> -                                    aiocb->io.iov,
> -                                    aiocb->io.niov,
> -                                    aiocb->aio_offset);
> +                struct iovec iov = {
> +                    .iov_base = buf,
> +                    .iov_len = aiocb->aio_nbytes - offset,
> +                };
> +                len = qemu_pwritev_fua(aiocb, &iov, 1);

The else branch takes offset into account. Here aiocb is passed in
assuming it's the first iteration of the while (offset <
aiocb->aio_nbytes) loop. On subsequent iterations the wrong values will
be used because offset has changed.

Perhaps it's easier to pass in the individual parameters (fd, offset,
etc) instead of passing in aiocb.
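
To illustrate the point about offset, here is a minimal sketch of a linear write loop in which every iteration derives both the buffer pointer and the file offset from the bytes already written. The names fua_write_once() and write_linear_fua() are hypothetical, and fua_write_once() is only a stand-in (pwritev() plus fdatasync()) for whichever FUA primitive ends up being used; it assumes both calls exist on the host.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* glibc: exposes pwritev() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>          /* open(), for callers that create the fd */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for the FUA write; takes plain parameters. */
static ssize_t fua_write_once(int fd, const struct iovec *iov, int iovcnt,
                              off_t offset)
{
    ssize_t len = pwritev(fd, iov, iovcnt, offset);

    if (len > 0 && fdatasync(fd) < 0) {
        return -1;
    }
    return len;
}

/*
 * Write nbytes from buf starting at file offset base. After a short
 * write, the next iteration resumes at buf + done and base + done.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t write_linear_fua(int fd, char *buf, size_t nbytes, off_t base)
{
    size_t done = 0;

    while (done < nbytes) {
        struct iovec iov = {
            .iov_base = buf + done,        /* skip bytes already written */
            .iov_len  = nbytes - done,     /* only the remainder */
        };
        ssize_t len = fua_write_once(fd, &iov, 1, base + (off_t)done);

        if (len < 0) {
            if (errno == EINTR) {
                continue;                  /* retry the same range */
            }
            return -1;
        }
        if (len == 0) {
            break;                         /* no progress; stop early */
        }
        done += (size_t)len;
    }
    return (ssize_t)done;
}
```

Passing fd and offset explicitly keeps the helper independent of aiocb, so the caller's loop state is the single source of truth for how far the write has progressed.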

>              } else {
>                  len = pwrite(aiocb->aio_fildes,
>                              (const char *)buf + offset,
> @@ -2567,12 +2577,6 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
>  
>      assert(qiov->size == bytes);
>      ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
> -#ifndef RWD_DSYNC
> -    if (ret == 0 && (flags & BDRV_REQ_FUA)) {
> -        /* TODO Use pwritev2() instead if it's available */
> -        ret = raw_co_flush_to_disk(bs);
> -    }
> -#endif
>      goto out; /* Avoid the compiler err of unused label */
>  
>  out:
> -- 
> 2.43.0
> 


* Re: [PATCH v2 0/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update
  2025-04-03  8:16 [PATCH v2 0/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update Pinku Deb Nath
  2025-04-03  8:16 ` [PATCH v2 1/2] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA Pinku Deb Nath
  2025-04-03  8:16 ` [PATCH v2 2/2] [PATCH] block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update Pinku Deb Nath
@ 2025-04-03 16:08 ` Stefan Hajnoczi
  2 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2025-04-03 16:08 UTC (permalink / raw)
  To: Pinku Deb Nath; +Cc: Kevin Wolf, qemu-block, qemu-devel

On Thu, Apr 03, 2025 at 01:16:31AM -0700, Pinku Deb Nath wrote:
> Testing with "-t writeback" works for turning on enable_write_cache.
> I renamed the function to qemu_pwritev_fua() and fixed the typos.
> 
> I moved the handle_aiocb_flush() call into qemu_pwritev_fua() and
> removed it from the previous TODO section. Initially I thought
> of passing only aiocb, but I was not sure whether I could
> derive buf from aiocb, so I added iovec and iovcnt arguments
> to qemu_pwritev_fua().
> 
> For handling buf in handle_aiocb_rw_linear(), I created an iovec
> and passed a reference to it. I assumed that there will be only one
> buffer/iovec, so I passed 1 for iovcnt.
> 
> Signed-off-by: Pinku Deb Nath <prantoran@gmail.com>
> 
> Pinku Deb Nath (2):
>   block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA
>   block/file-posix.c: Use pwritev2() with RWF_DSYNC for FUA - update
> 
>  block/file-posix.c | 54 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 42 insertions(+), 12 deletions(-)

Thanks for sending this updated patch series. Please squash changes in
the future instead of appending them as separate commits. This means
editing previous commits (e.g. git rebase -i master) so that they
contain changes made after code review.

So if commit 1 is '+ printf("foo\n")', then instead of adding commit 2
to add a semi-colon to the end of the line, just edit the commit so it
is '+ printf("foo\n");' in v2 of your patch.

One reason to squash changes is so that git-bisect(1) works. Without
squashing, there will be intermediate commits that are broken and maybe
don't even compile. git-bisect(1) is only usable when each commit
compiles and passes tests.

Reviewers also tend to prefer to see the final state of commits so they
don't have to review every incremental edit that was made (often
replacing code they already reviewed). It saves them time.

Thanks,
Stefan

