From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Brian Foster <bfoster@redhat.com>, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: allow concurrent unaligned dio overwrites
Date: Fri, 24 Feb 2023 10:10:40 +0530 [thread overview]
Message-ID: <87v8jrekvr.fsf@doe.com> (raw)
In-Reply-To: <20230223191626.263331-1-bfoster@redhat.com>
Brian Foster <bfoster@redhat.com> writes:
> We've had reports of significant performance regression of sub-block
> (unaligned) direct writes due to the added exclusivity restrictions
> in ext4. The purpose of the exclusivity requirement for unaligned
> direct writes is to avoid data corruption caused by unserialized
> partial block zeroing in the iomap dio layer across overlapping
> writes.
>
> XFS has similar requirements for the same underlying reasons, yet
> doesn't suffer the extreme performance regression that ext4 does.
> The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY
> mode, which allows for optimistic submission of concurrent unaligned
> I/O and kicks back writes that require partial block zeroing such
> that they can be submitted in a safe, exclusive context. Since ext4
> already performs most of these checks pre-submission, it can support
> something similar without necessarily relying on the iomap flag and
> associated retry mechanism.
>
> Update the dio write submission path to allow concurrent submission
> of unaligned direct writes that are purely overwrite and so will not
> require block zeroing. To improve readability of the various related
> checks, move the unaligned I/O handling down into
> ext4_dio_write_checks(), where the dio draining and force wait logic
> can immediately follow the locking requirement checks. Finally, the
> IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a
> precaution should the ext4 overwrite logic ever become inconsistent
> with the zeroing expectations of iomap dio.
>
> The performance improvement of sub-block direct write I/O is shown
> in the following fio test on a 64xcpu guest vm:
>
> Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting
> --overwrite=1 --thread --size=10G --filename=/mnt/fio
> --readwrite=write --ramp_time=10s --runtime=60s --numjobs=8
> --blocksize=2k --iodepth=256 --allow_file_create=0
>
> v6.2: write: IOPS=4328, BW=8724KiB/s
> v6.2 (patched): write: IOPS=801k, BW=1565MiB/s
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> Hi all,
>
> This survives a couple fstests regression runs (with 4k and 2k block
> sizes) and cleans up the code a bit from the RFC, taking a suggestion
> from Ritesh to move some of the checks into ext4_dio_write_checks().
Thanks for working on the suggestion and rebasing on top.
I liked the way this patch has turned out to be. It's very clear now.
Thanks again for the optimization!!
Looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
next prev parent reply other threads:[~2023-02-24 4:41 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-23 19:16 [PATCH] ext4: allow concurrent unaligned dio overwrites Brian Foster
2023-02-24 4:40 ` Ritesh Harjani [this message]
2023-03-07 11:50 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v8jrekvr.fsf@doe.com \
--to=ritesh.list@gmail.com \
--cc=bfoster@redhat.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.