linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: linux-ext4@vger.kernel.org
Subject: [PATCH RFC] ext4: allow concurrent unaligned dio overwrites
Date: Fri, 10 Feb 2023 09:59:54 -0500	[thread overview]
Message-ID: <20230210145954.277611-1-bfoster@redhat.com> (raw)

Hi all,

We've had a customer report a significant performance regression of
sub-block (unaligned) direct writes between a couple distro kernels
(that span a large range of upstream releases). I've not bisected
upstream to narrow down to specific commit(s), but the regression
appears to correspond with added concurrency restrictions of
unaligned dio in ext4. Obviously this user should ideally move to a
configuration that minimizes unaligned I/O, but while looking into
this we also observed that XFS performs noticeably better with the
same workload, even though it has the same general unaligned dio
constraints.

The difference appears to be the use of IOMAP_DIO_OVERWRITE_ONLY in
XFS, which allows optimistic concurrent submission of unaligned
direct I/O under shared locking. I.e., if the dio turns out to be
something other than a pure overwrite that may require block
zeroing, iomap kicks the request back with -EAGAIN so it can be
resubmitted with appropriate exclusivity.

I initially prototyped this same sort of logic on ext4, but on
further inspection realized that ext4 seems to already check for dio
overwrites in ext4_dio_write_checks(). Therefore ISTM that since
ext4 already knows when a dio is purely overwrite, it can safely
submit unaligned dios concurrently where it knows zeroing is not
required, and then fall back to exclusive submission otherwise.

This RFC prototypes something along those lines using ilock_shared
as a proxy for non-overwrite (since non-overwrite always means
non-shared locking). Based on the following fio test against a
prewritten (i.e. no unwritten extents) file, on an 8xcpu kvm guest,
using default ext4 options:

fio --name=test --ioengine=libaio --direct=1 --group_reporting
  --overwrite=1 --thread --size=10G --filename=/mnt/fio
  --readwrite=write --ramp_time=10s --runtime=60s --numjobs=8
  --blocksize=2k --iodepth=256 --allow_file_create=0

... performance goes from something like ~1350 iops / 2.7 MB/s on a
v6.1 kernel to +350k iops / +700MB/s on a patched v6.2.0-rc7 kernel.
The latter is much more closely aligned to what I see from the same
test against XFS.

This also survives an initial fstests regression run, though it does
leave at least a couple open questions I can think of:

1. Do we care to be explicit about overwrites and perhaps plumb
   through an 'overwrite' flag from ext4_dio_write_checks()?
2. Do we want to use DIO_OVERWRITE_ONLY and assume iomap will never
   kick back an overwrite only I/O, or perhaps include retry logic
   similar to XFS? That may be superfluous, but it's not much
   additional  code either.

Thoughts on any of this? If there's consensus I can followup with a v1
with a proper implementation, commit log, code comment updates, etc.

Not-Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/ext4/file.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 7ac0a81bd371..bb41520f89d0 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -493,15 +493,14 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
 	bool extend = false, unaligned_io = false;
 	bool ilock_shared = true;
+	unsigned int dio_flags = 0;
 
 	/*
 	 * We initially start with shared inode lock unless it is
 	 * unaligned IO which needs exclusive lock anyways.
 	 */
-	if (ext4_unaligned_io(inode, from, offset)) {
+	if (ext4_unaligned_io(inode, from, offset))
 		unaligned_io = true;
-		ilock_shared = false;
-	}
 	/*
 	 * Quick check here without any i_rwsem lock to see if it is extending
 	 * IO. A more reliable check is done in ext4_dio_write_checks() with
@@ -563,9 +562,6 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	 * below inode_dio_wait() may anyway become a no-op, since we start
 	 * with exclusive lock.
 	 */
-	if (unaligned_io)
-		inode_dio_wait(inode);
-
 	if (extend) {
 		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
 		if (IS_ERR(handle)) {
@@ -582,11 +578,18 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ext4_journal_stop(handle);
 	}
 
-	if (ilock_shared)
+	if (ilock_shared) {
 		iomap_ops = &ext4_iomap_overwrite_ops;
+		if (unaligned_io)
+			dio_flags = IOMAP_DIO_OVERWRITE_ONLY;
+	} else if (unaligned_io || extend) {
+		dio_flags = IOMAP_DIO_FORCE_WAIT;
+		if (unaligned_io)
+			inode_dio_wait(inode);
+	}
 	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
-			   (unaligned_io || extend) ? IOMAP_DIO_FORCE_WAIT : 0,
-			   NULL, 0);
+			   dio_flags, NULL, 0);
+	WARN_ON_ONCE(ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT));
 	if (ret == -ENOTBLK)
 		ret = 0;
 
-- 
2.39.1


             reply	other threads:[~2023-02-10 14:59 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-10 14:59 Brian Foster [this message]
2023-02-15 15:04 ` [PATCH RFC] ext4: allow concurrent unaligned dio overwrites Ritesh Harjani
2023-02-15 15:56   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230210145954.277611-1-bfoster@redhat.com \
    --to=bfoster@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).