qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org, pbonzini@redhat.com, afaria@redhat.com,
	hreitz@redhat.com, qemu-devel@nongnu.org
Subject: Re: [PATCH 1/5] file-posix: Support FUA writes
Date: Mon, 10 Mar 2025 18:41:58 +0800	[thread overview]
Message-ID: <20250310104158.GA359802@fedora> (raw)
In-Reply-To: <20250307221634.71951-2-kwolf@redhat.com>

On Fri, Mar 07, 2025 at 11:16:30PM +0100, Kevin Wolf wrote:
> Until now, FUA was always emulated with a separate flush after the write
> for file-posix. The overhead of processing a second request can reduce
> performance significantly for a guest disk that has disabled the write
> cache, especially if the host disk is already write through, too, and
> the flush isn't actually doing anything.
> 
> Advertise support for REQ_FUA in write requests and implement it for
> Linux AIO and io_uring using the RWF_DSYNC flag for write requests. The
> thread pool still performs a separate fdatasync() call. This can be
> improved later by using the pwritev2() syscall if available.
> 
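For readers following along, the difference between the emulated path and RWF_DSYNC can be sketched with plain syscalls (a simplified illustration under assumed Linux/glibc semantics, not the patch's actual code; the function names are made up):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* Old behaviour: FUA emulated with a separate flush after the write. */
static ssize_t emulated_fua_pwrite(int fd, const void *buf, size_t len,
                                   off_t off)
{
    ssize_t ret = pwrite(fd, buf, len, off);
    if (ret < 0) {
        return ret;
    }
    /* Second request: durability comes from an explicit fdatasync(). */
    if (fdatasync(fd) < 0) {
        return -1;
    }
    return ret;
}

/* New behaviour: one request with per-write durability via RWF_DSYNC. */
static ssize_t fua_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    return pwritev2(fd, &iov, 1, off, RWF_DSYNC);
}
```

With RWF_DSYNC the kernel provides the durability guarantee in a single submission, which is what lets Linux AIO and io_uring avoid the second request.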
> As an example, this is how fio numbers can be improved in some scenarios
> with this patch (all using virtio-blk with cache=directsync on an nvme
> block device for the VM, fio with ioengine=libaio,direct=1,sync=1):
> 
>                               | old           | with FUA support
> ------------------------------+---------------+-------------------
> bs=4k, iodepth=1, numjobs=1   |  45.6k iops   |  56.1k iops
> bs=4k, iodepth=1, numjobs=16  | 183.3k iops   | 236.0k iops
> bs=4k, iodepth=16, numjobs=1  | 258.4k iops   | 311.1k iops
> 
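For reproducibility, the quoted parameters correspond to a fio job along these lines (a hypothetical reconstruction: the filename and rw pattern are guesses, only ioengine/direct/sync/bs/iodepth/numjobs are from the message):

```ini
; hypothetical fio job matching the parameters quoted above
[fua-test]
filename=/dev/vdb
ioengine=libaio
direct=1
sync=1
rw=randwrite
bs=4k
iodepth=16
numjobs=1
```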
> However, not all scenarios are clear wins. On another, slower disk I saw
> little to no improvement. In fact, in two corner-case scenarios I even
> observed a regression, which I consider acceptable:
> 
> 1. On slow host disks in a write through cache mode, when the guest is
>    using virtio-blk in a separate iothread so that polling can be
>    enabled, and each completion is quickly followed up with a new
>    request (so that polling gets it), it can happen that enabling FUA
>    makes things slower: the very fast no-op flush we used to issue
>    counted as a success for the adaptive polling algorithm, so it kept
>    polling. Without it, only the slow write request remains, which
>    disables polling. This is a problem in the polling algorithm itself
>    and will be fixed later in this series.
> 
> 2. With a high queue depth, it can be beneficial to have flush requests
>    for another reason: The optimisation in bdrv_co_flush() that flushes
>    only once per write generation acts as a synchronisation mechanism
>    that lets all requests complete at the same time. This can result in
>    better batching and if the disk is very fast (I only saw this with a
>    null_blk backend), this can make up for the overhead of the flush and
>    improve throughput. In theory, we could optionally introduce a
>    similar artificial latency in the normal completion path to achieve
>    the same kind of completion batching. This is not implemented in this
>    series.
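The interaction described in point 1 is easier to see as a sketch of a generic adaptive polling loop (my own simplified model, not QEMU's actual aio-posix code; the names and the grow/shrink policy are invented for illustration):

```c
#include <stdbool.h>

typedef struct {
    long poll_ns;   /* current polling window; 0 means polling disabled */
    long max_ns;    /* upper bound configured by the user */
} PolledEvent;

/* Grow the window on a polling success, shrink it on a miss. */
static void adjust_polling_time(PolledEvent *ev, bool poll_succeeded)
{
    if (poll_succeeded) {
        if (ev->poll_ns == 0) {
            ev->poll_ns = 1;
        } else if (ev->poll_ns < ev->max_ns) {
            ev->poll_ns *= 2;
            if (ev->poll_ns > ev->max_ns) {
                ev->poll_ns = ev->max_ns;
            }
        }
    } else {
        ev->poll_ns /= 2;
    }
}
```

A fast-completing no-op flush feeds `poll_succeeded = true` into such a loop and keeps the window open; once FUA removes the flush, only the slow write feeds the loop and the window collapses to zero.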
> 
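Point 2 refers to the write-generation optimisation; the idea can be modelled roughly as follows (a toy model of the behaviour described above, not bdrv_co_flush()'s real implementation or field names):

```c
typedef struct {
    unsigned write_gen;    /* bumped on every completed write */
    unsigned flushed_gen;  /* generation covered by the last flush */
    unsigned flush_count;  /* how many real flushes were issued */
} BlockState;

static void complete_write(BlockState *bs)
{
    bs->write_gen++;
}

/* Flush only once per write generation: if nothing was written since the
 * last flush, return immediately instead of issuing another one. */
static void co_flush(BlockState *bs)
{
    if (bs->flushed_gen == bs->write_gen) {
        return;            /* already durable, skip the flush */
    }
    bs->flush_count++;     /* stand-in for the actual fdatasync() */
    bs->flushed_gen = bs->write_gen;
}
```

Because every in-flight request hitting co_flush() after the same write generation completes together with the one flush that was actually issued, the flush doubles as a synchronisation point and batches completions.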
> Compatibility is not a concern for io_uring, which has supported
> RWF_DSYNC from the start. Linux AIO gained support in Linux 4.13 and
> libaio 0.3.111. The kernel version is not a problem on any supported
> build platform, so runtime checks are unnecessary. However, openSUSE
> still ships an older libaio version that would break the build, so we
> must detect this at build time.
> 
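The build-time detection presumably amounts to compiling a small probe of this shape (my guess at the kind of test program a meson compiler check would use; the actual check in meson.build may differ):

```c
/* If libaio is too old to provide io_prep_pwritev2() (added in libaio
 * 0.3.111), this fails to compile and the feature is disabled. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>

int main(void)
{
    struct iocb cb;
    struct iovec iov = { 0 };
    io_prep_pwritev2(&cb, -1, &iov, 1, 0, RWF_DSYNC);
    return 0;
}
```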
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/raw-aio.h |  8 ++++++--
>  block/file-posix.c      | 26 ++++++++++++++++++--------
>  block/io_uring.c        | 13 ++++++++-----
>  block/linux-aio.c       | 24 +++++++++++++++++++++---
>  meson.build             |  4 ++++
>  5 files changed, 57 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



Thread overview: 14+ messages
2025-03-07 22:16 [PATCH 0/5] block: Improve writethrough performance Kevin Wolf
2025-03-07 22:16 ` [PATCH 1/5] file-posix: Support FUA writes Kevin Wolf
2025-03-10 10:41   ` Stefan Hajnoczi [this message]
2025-03-07 22:16 ` [PATCH 2/5] block/io: Ignore FUA with cache.no-flush=on Kevin Wolf
2025-03-10 10:42   ` Stefan Hajnoczi
2025-03-07 22:16 ` [PATCH 3/5] aio: Create AioPolledEvent Kevin Wolf
2025-03-10 10:55   ` Stefan Hajnoczi
2025-03-07 22:16 ` [PATCH 4/5] aio-posix: Factor out adjust_polling_time() Kevin Wolf
2025-03-10 10:55   ` Stefan Hajnoczi
2025-03-07 22:16 ` [PATCH 5/5] aio-posix: Separate AioPolledEvent per AioHandler Kevin Wolf
2025-03-10 10:55   ` Stefan Hajnoczi
2025-03-10 11:11     ` Kevin Wolf
2025-03-11  2:18       ` Stefan Hajnoczi
2025-03-10 10:55 ` [PATCH 0/5] block: Improve writethrough performance Stefan Hajnoczi
