From: Baokun Li <libaokun@linux.alibaba.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
yi.zhang@huawei.com, ojaswin@linux.ibm.com,
ritesh.list@gmail.com, peng_wang@linux.alibaba.com
Subject: [PATCH 0/2] ext4: allow more DIO writes under shared i_rwsem
Date: Fri, 12 Jun 2026 00:34:39 +0800
Message-ID: <20260611163441.2431805-1-libaokun@linux.alibaba.com>
Hi all,
This series relaxes the i_rwsem requirements of ext4_dio_write_iter()
so that more direct I/O writes can proceed under the shared lock.
It continues the work started by Peng Wang's RFC [1]; I'm taking
over this effort going forward.
ext4_dio_write_checks() currently calls ext4_overwrite_io() to decide
whether the shared lock is sufficient. Its single ext4_map_blocks()
lookup only sees the first contiguous extent of the same type, which
forces the exclusive lock for two cases that are actually safe under
the shared lock (see individual patches for the full safety
argument):
1. Aligned writes spanning multiple already-allocated extents (e.g.
   written + unwritten, or two discontiguous written extents).
2. Unaligned writes whose head/tail partial blocks land on written
   extents but whose fully covered middle blocks include holes or
   unwritten extents.
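
The limitation behind both cases can be illustrated with a toy userspace
model. This is only a sketch, not the ext4 code: toy_map_blocks(),
toy_overwrite_check() and toy_all_allocated() are hypothetical stand-ins,
and the per-block state array is an assumption made to keep the example
self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-block state; a stand-in for real extent status, not ext4's types. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };

/*
 * Hypothetical model of a single mapping lookup: starting at block 'start',
 * return how many of the next 'want' blocks share the state of the first
 * block, and report that state. Physical contiguity is not modelled.
 */
size_t toy_map_blocks(const enum blk_state *blks, size_t nblks,
		      size_t start, size_t want, enum blk_state *state)
{
	size_t n = 0;

	assert(start < nblks);
	*state = blks[start];
	while (n < want && start + n < nblks && blks[start + n] == *state)
		n++;
	return n;
}

/*
 * Model of a one-lookup overwrite check: it succeeds only if the first
 * run alone covers the whole range with allocated blocks.
 */
bool toy_overwrite_check(const enum blk_state *blks, size_t nblks,
			 size_t start, size_t want)
{
	enum blk_state state;
	size_t got = toy_map_blocks(blks, nblks, start, want, &state);

	return got == want && state != BLK_HOLE;
}

/* What case 1 describes: every block in the range is already allocated. */
bool toy_all_allocated(const enum blk_state *blks, size_t nblks,
		       size_t start, size_t want)
{
	for (size_t i = 0; i < want; i++)
		if (start + i >= nblks || blks[start + i] == BLK_HOLE)
			return false;
	return true;
}
```

With blocks laid out as written, written, unwritten, unwritten,
toy_all_allocated() holds for the whole range while toy_overwrite_check()
fails, because the first run stops at the written/unwritten boundary.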
Patch 1 skips the ext4_overwrite_io() pre-check entirely for aligned
non-extending writes, letting them proceed under the shared lock
regardless of extent state.
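
As a rough illustration of the condition Patch 1 keys on, the predicate
below checks alignment and non-extension in userspace. It is a sketch
only: toy_aligned_non_extending() is a hypothetical name, and comparing
against a single file_size value is an assumption; the exact "extending"
test in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: true when a write is block-aligned at both ends
 * and ends at or before the current file size. 'blocksize' must be a
 * power of two.
 */
bool toy_aligned_non_extending(uint64_t pos, uint64_t len,
			       uint64_t file_size, uint32_t blocksize)
{
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	/* Aligned: neither the offset nor the length has low bits set. */
	bool aligned = ((pos | len) & (blocksize - 1)) == 0;
	/* Extending: the write would end past the current end of file. */
	bool extending = pos + len > file_size;

	return aligned && !extending;
}
```

In this model an 8K write at offset 8K into a 1M file qualifies, while a
write starting at offset 512, or one starting exactly at end of file,
does not.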
Patch 2 replaces ext4_overwrite_io() with ext4_dio_needs_zeroing(),
which directly answers the question driving the lock decision. It
checks only the head and tail partial blocks (at most two
ext4_map_blocks() calls), and ignores the state of middle blocks.
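
The head/tail idea can be sketched in userspace as follows. This is a toy
model under stated assumptions, not the function from the patch:
toy_dio_needs_zeroing() and the per-block state array are hypothetical,
and "needs zeroing" is modelled simply as "a partially covered block is
not in the written state".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-block state; a stand-in for real extent status, not ext4's types. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };

/*
 * Toy model: blks[i] is the state of logical block i. Returns true if the
 * write [pos, pos + len) partially covers a block that is not written,
 * meaning the rest of that block would have to be zeroed. Only the head
 * and tail blocks are inspected; fully covered middle blocks are ignored.
 */
bool toy_dio_needs_zeroing(const enum blk_state *blks, uint64_t nblks,
			   uint64_t pos, uint64_t len, uint32_t blocksize)
{
	assert(len > 0 && blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	uint64_t end = pos + len;
	uint64_t head = pos / blocksize;
	uint64_t tail = (end - 1) / blocksize;

	assert(tail < nblks);

	/* Head block is partial if the write starts mid-block. */
	if (pos % blocksize != 0 && blks[head] != BLK_WRITTEN)
		return true;
	/* Tail block is partial if the write ends mid-block. */
	if (end % blocksize != 0 && blks[tail] != BLK_WRITTEN)
		return true;
	return false;
}
```

An aligned write never reaches either state check in this model, so it
reports no zeroing whatever the block states are; an unaligned write
costs at most two state lookups.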
Testing
=======
"kvm-xfstests -c ext4/all -g auto" passes with no new failures.
Performance
===========
Hardware: /dev/sda (rotational disk, ~1 GB/s sustained write)
Filesystem: ext4 default mkfs
Test 1: aligned 8K DIO writes spanning written+unwritten extent
boundaries. Each thread writes its own 1G region sequentially; the
file is rebuilt between runs so every block is written exactly once.
Metric: IOPS.
JOBS       base   +patch 1   +patch 1+2   speedup
----   --------   --------   ----------   -------
   1     42,322     43,329       43,087     1.02x
   2     68,516     70,677       66,958     1.03x
   4     62,489     97,072      101,468     1.62x
   8     58,701    110,819      113,679     1.94x
  16     58,569    116,392      115,272     1.97x
  32     60,860    117,244      119,621     1.97x
Wall time at JOBS=32: 69.2s (base) -> 35.4s (patched), 1.96x faster.
Test 2: unaligned DIO writes (14336 bytes at +512 within each 16K
stripe). Each stripe is laid out as [written][unwritten][unwritten]
[written], so the head and tail partial blocks land on written
extents but the middle is unwritten. Metric: IOPS.
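
The geometry of one such write can be checked with a few lines of
userspace C. This is a sketch assuming a 4 KiB block size (not stated
above, but consistent with four extents per 16K stripe); struct toy_span
and toy_span_of() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Block-level footprint of a byte range; names are illustrative only. */
struct toy_span {
	uint64_t first_blk;   /* block containing the first byte written */
	uint64_t last_blk;    /* block containing the last byte written */
	uint32_t head_skip;   /* bytes of first_blk before the write starts */
	uint32_t tail_skip;   /* bytes of last_blk after the write ends */
};

struct toy_span toy_span_of(uint64_t pos, uint64_t len, uint32_t blocksize)
{
	assert(len > 0 && blocksize > 0);

	uint64_t end = pos + len;
	struct toy_span s;

	s.first_blk = pos / blocksize;
	s.last_blk = (end - 1) / blocksize;
	s.head_skip = (uint32_t)(pos % blocksize);
	/* Zero when the write ends exactly on a block boundary. */
	s.tail_skip = (uint32_t)((blocksize - end % blocksize) % blocksize);
	return s;
}
```

For pos = 512 and len = 14336 with 4096-byte blocks, the write touches
blocks 0 through 3, leaves the first 512 bytes of block 0 and the last
1536 bytes of block 3 untouched, and fully covers blocks 1 and 2.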
JOBS       base   +patch 1   +patch 1+2   speedup
----   --------   --------   ----------   -------
   1     15,547     15,975       17,381     1.12x
   2     15,910     14,808       34,172     2.15x
   4     15,014     14,828       57,567     3.83x
   8     15,022     14,648       81,947     5.46x
  16     14,586     14,262       99,126     6.80x
  32     14,047     13,809       92,519     6.59x
Wall time at JOBS=32: 149.3s (base) -> 22.7s (patched), 6.58x faster.
In test 2, patch 1 alone has no effect (the differences are within
noise) because it only touches the aligned write path. Patch 2
introduces ext4_dio_needs_zeroing(), which identifies precisely when
partial block zeroing is required, allowing the shared lock for the
much larger set of unaligned writes that don't actually trigger zeroing.
Comments and questions are, as always, welcome.
Thanks,
Baokun
[1]: https://patch.msgid.link/20260607124935.6168-1-peng_wang@linux.alibaba.com
Baokun Li (2):
  ext4: skip overwrite check for aligned non-extending DIO writes
  ext4: base unaligned DIO lock decision on partial block zeroing

 fs/ext4/file.c | 132 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 89 insertions(+), 43 deletions(-)
--
2.43.7