From: Baokun Li <libaokun@linux.alibaba.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	yi.zhang@huawei.com, ojaswin@linux.ibm.com,
	ritesh.list@gmail.com, peng_wang@linux.alibaba.com
Subject: [PATCH v2 2/8] ext4: drain in-flight DIO before buffered write fallback
Date: Thu, 18 Jun 2026 20:57:29 +0800	[thread overview]
Message-ID: <20260618125735.4156639-3-libaokun@linux.alibaba.com> (raw)
In-Reply-To: <20260618125735.4156639-1-libaokun@linux.alibaba.com>

generic/746 started failing intermittently on ext3 (no-extent inodes).
The test triggers 'Page cache invalidation failure on direct I/O'
warnings, and a subsequent fsync returns -EIO. Adding a 50ms delay
between ext4_buffered_write_iter() and filemap_write_and_wait_range()
in ext4_dio_write_iter() makes the race almost always reproducible.

On no-extent inodes, DIO writes to holes cannot use unwritten extents,
so ext4_iomap_alloc() leaves m_flags=0 and ext4_map_blocks() returns 0.
The iomap layer then returns -ENOTBLK, causing fallback to buffered I/O.
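To make the fallback condition concrete, here is a minimal userspace C
model of the decision described above. It is not ext4 code:
model_inode, model_map_blocks(), model_iomap_alloc() and
MODEL_CREATE_UNWRITTEN are hypothetical stand-ins. The model assumes a
write inside i_size, where only extent inodes may cover a hole with an
unwritten extent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the DIO mapping decision; not ext4 code.
 * Assumes the write lies inside i_size.
 */
#define MODEL_CREATE_UNWRITTEN 0x1	/* stand-in "allocate unwritten" flag */

struct model_inode {
	bool has_extents;	/* false for ext3-style (no-extent) inodes */
};

/* Stand-in block mapper: returns the number of blocks mapped; a hole is
 * only allocated (and reported as mapped) when create flags are set. */
static int model_map_blocks(bool is_hole, int create_flags)
{
	if (!is_hole)
		return 1;
	return create_flags ? 1 : 0;
}

/* Only extent inodes can cover a hole with an unwritten extent, so a
 * no-extent inode leaves m_flags at 0 and the hole stays unmapped. */
static int model_iomap_alloc(const struct model_inode *inode, bool is_hole)
{
	int m_flags;
	int ret;

	assert(inode != NULL);
	m_flags = inode->has_extents ? MODEL_CREATE_UNWRITTEN : 0;
	ret = model_map_blocks(is_hole, m_flags);

	/* Nothing mapped and nothing may be created: tell the caller to
	 * fall back to buffered I/O. */
	if (!m_flags && !ret)
		return -ENOTBLK;
	return ret;
}
```

A DIO write over the hole of a no-extent inode therefore gets -ENOTBLK,
while the same write on an extent inode is mapped and stays on the DIO
path.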

The fallback path in ext4_dio_write_iter() calls
ext4_buffered_write_iter(), which dirties the page cache, and then
flushes and invalidates the written range. However, there is an
unprotected window between ext4_buffered_write_iter() returning (with
the inode lock already released) and the subsequent flush and
invalidate.

Concurrent async DIO completions from other threads can run
kiocb_invalidate_post_direct_write() during this window. If the pages
have been dirtied again by the fallback buffered write, the post-write
invalidation finds dirty pages, emits the warning and records -EIO in
mapping->wb_err via errseq_set(); a later fsync then reports it.
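The errseq_set(-EIO) step is what turns a warning into an fsync failure.
Below is a simplified userspace sketch of those semantics; the struct and
function names are hypothetical, and it does not reproduce the kernel's
errseq_t bit layout or its special handling of errors that nobody has
seen yet.

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified, hypothetical model of errseq_t semantics: a writer records
 * an error and bumps a sequence number; each opener samples the sequence
 * and later learns about any error recorded since its sample.
 */
struct model_errseq {
	unsigned int seq;	/* bumped on every recorded error */
	int err;		/* most recent error, 0 if none */
};

/* Analogue of errseq_set(&mapping->wb_err, err). */
static void model_errseq_set(struct model_errseq *eseq, int err)
{
	assert(err < 0);
	eseq->err = err;
	eseq->seq++;
}

/* Analogue of sampling the mapping's error state at open() time. */
static unsigned int model_errseq_sample(const struct model_errseq *eseq)
{
	return eseq->seq;
}

/* Analogue of the check fsync performs: report an error recorded since
 * *since, then advance *since so the same error is reported only once. */
static int model_errseq_check_and_advance(const struct model_errseq *eseq,
					  unsigned int *since)
{
	if (*since == eseq->seq)
		return 0;
	*since = eseq->seq;
	return eseq->err;
}
```

The important property is that the error is sticky per opener: once the
stale page cache warning records -EIO, the next fsync on an already-open
file reports it even though the data itself may have been written.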

Consider a file with two 4k blocks: [hole][written]. kworker A does a
DIO write to the written block, while kworker B does a DIO write
spanning both:

  kworker A (4k DIO, allocated block)    kworker B (8k DIO, fallback)
  -----------------------------------    ----------------------------
  inode_lock_shared()                    inode_lock_shared()
  iomap_dio_rw():                        iomap_dio_rw():
    kiocb_invalidate_pages -> clean        iomap_begin -> -ENOTBLK
    submit_bio (async)                     dio->size = 0
  inode_unlock_shared()                  inode_unlock_shared()

  [bio pending in block layer]           /* fallback: lock released */
                                         ext4_buffered_write_iter()
                                           inode_lock(exclusive)
                                           generic_perform_write()
                                             -> dirty pages [0, 8k]
                                           inode_unlock(exclusive)

                                         /* pages dirty, no lock */
  [bio completes]                        filemap_write_and_wait_range()
  iomap_dio_complete()                     -> flush dirty pages
    kiocb_invalidate_post_direct_write() invalidate_mapping_pages()
      invalidate_inode_pages2_range()
      -> finds dirty page!
      -> dio_warn_stale_pagecache()
      -> errseq_set(-EIO)
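The ordering in the diagram can be replayed deterministically. In this
hypothetical single-threaded model, one dirty flag stands for the page
cache over [0, 8k), each function is one step from the diagram, and
replay_unfixed() runs them in the racy order. None of these names exist
in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded replay of the racy interleaving.
 * One flag stands for the page cache over [0, 8k).
 */
struct model_mapping {
	bool dirty;	/* any dirty page cached over [0, 8k)? */
	int wb_err;	/* stand-in for mapping->wb_err: 0 or -EIO */
	int warnings;	/* stale page cache warnings emitted */
};

/* kworker A, before submit_bio: kiocb_invalidate_pages() analogue. */
static void a_invalidate_before_submit(struct model_mapping *m)
{
	m->dirty = false;
}

/* kworker B, fallback: the buffered write dirties [0, 8k). */
static void b_buffered_write(struct model_mapping *m)
{
	m->dirty = true;
}

/* kworker A, bio completion: post-write invalidation. A dirty page cannot
 * be dropped, so warn and record -EIO for a later fsync. */
static void a_post_write_invalidate(struct model_mapping *m)
{
	if (m->dirty) {
		m->warnings++;
		m->wb_err = -EIO;
	}
}

/* kworker B, after the buffered write: flush, then invalidate. */
static void b_flush_and_invalidate(struct model_mapping *m)
{
	m->dirty = false;
}

/* Replay the steps in the order shown in the diagram. */
static struct model_mapping replay_unfixed(void)
{
	struct model_mapping m = { false, 0, 0 };

	a_invalidate_before_submit(&m);	/* A: range clean, bio in flight */
	b_buffered_write(&m);		/* B: lock dropped, pages dirtied */
	a_post_write_invalidate(&m);	/* A: completes inside B's window */
	b_flush_and_invalidate(&m);	/* B: flush comes too late for A */
	return m;
}
```

Note that the page cache ends up clean; the only lasting trace of the
race is the recorded -EIO, which is why the failure surfaces at fsync.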

This issue can be triggered through normal I/O paths, not only by
userspace deliberately issuing overlapping DIO writes. For example,
generic/746 uses a loop device, where multiple kworkers issue
concurrent I/O to the backing file. Additionally, when block_size <
folio_size, non-overlapping DIO writes that share a large folio can
also trigger the race.
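A small arithmetic sketch of the large folio case follows. It assumes
naturally aligned folios of a single fixed size (the real page cache can
mix folio sizes), and ranges_share_folio() is a hypothetical helper that
checks whether two byte ranges touch a common folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: do two non-empty byte ranges touch at least one
 * common folio, assuming every folio is folio_size bytes (a power of two)
 * and aligned to its size within the file?
 */
static bool ranges_share_folio(uint64_t off1, uint64_t len1,
			       uint64_t off2, uint64_t len2,
			       uint64_t folio_size)
{
	assert(len1 > 0 && len2 > 0);
	assert(folio_size && (folio_size & (folio_size - 1)) == 0);

	/* First and last folio index covered by each range. */
	uint64_t first1 = off1 / folio_size;
	uint64_t last1 = (off1 + len1 - 1) / folio_size;
	uint64_t first2 = off2 / folio_size;
	uint64_t last2 = (off2 + len2 - 1) / folio_size;

	/* Do the folio index intervals [first, last] intersect? */
	return first1 <= last2 && first2 <= last1;
}
```

For example, with 4k blocks and 16k folios, DIO writes to [0, 4k) and
[4k, 8k) do not overlap, yet both fall in folio 0, so invalidation for
one can run into pages dirtied on behalf of the other.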

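Add inode_dio_wait() in ext4_buffered_write_iter() before
generic_perform_write() to drain all in-flight DIO. Every DIO write
invalidates the cached pages in its range before submitting I/O (via
kiocb_invalidate_pages()), and every buffered write now waits for all
in-flight DIO to complete before dirtying pages (via inode_dio_wait()).
Since ext4_buffered_write_iter() holds the inode lock exclusively, no
new DIO can start in between, which eliminates the race.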
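The effect of the fix can be sketched with the same kind of toy model.
Here dio_count stands in for inode->i_dio_count, and
model_buffered_write_fixed() drains it before dirtying pages. Because
the model is single-threaded, waiting is modelled by running the pending
completions rather than by sleeping; all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the fix. dio_count stands in for
 * inode->i_dio_count; "waiting" is modelled by completing pending DIO.
 */
struct model_inode_state {
	int dio_count;	/* in-flight DIO writes */
	bool dirty;	/* dirty pages cached over the range */
	int wb_err;	/* 0 or -EIO */
};

/* DIO submit: write back and drop cached pages, then count as in flight. */
static void model_dio_submit(struct model_inode_state *s)
{
	s->dirty = false;
	s->dio_count++;
}

/* DIO completion: post-write invalidation, then drop the count. */
static void model_dio_complete(struct model_inode_state *s)
{
	assert(s->dio_count > 0);
	if (s->dirty)
		s->wb_err = -EIO;	/* the stale page cache case */
	s->dio_count--;
}

/* Analogue of inode_dio_wait(): return once no DIO is in flight. */
static void model_dio_wait(struct model_inode_state *s)
{
	while (s->dio_count > 0)
		model_dio_complete(s);	/* stands in for sleeping */
}

/* Buffered write with the fix applied: drain DIO, then dirty pages. */
static void model_buffered_write_fixed(struct model_inode_state *s)
{
	model_dio_wait(s);
	s->dirty = true;
}
```

Both orderings are now safe: a DIO submitted before the buffered write
completes before any page is dirtied, and a DIO submitted afterwards
flushes and drops the dirty pages before it starts.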

Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure")
Suggested-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/d1adcf7c-c276-458d-9cac-68a4410f7626@gmail.com
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
---
 fs/ext4/file.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index eb1a323962b1..9f9bc0b13772 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -313,6 +313,12 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 	if (ret <= 0)
 		goto out;
 
+	/*
+	 * Prevent concurrent DIO and BIO to the same file range.
+	 * Wait for all in-flight DIO to complete before dirtying pages.
+	 */
+	inode_dio_wait(inode);
+
 	ret = generic_perform_write(iocb, from);
 
 out:
-- 
2.43.7



Thread overview: 16+ messages
2026-06-18 12:57 [PATCH v2 0/8] ext4: allow more DIO writes under shared i_rwsem Baokun Li
2026-06-18 12:57 ` [PATCH v2 1/8] ext4: prevent sleeping allocation in NOWAIT write path Baokun Li
2026-06-18 13:52   ` Jan Kara
2026-06-18 12:57 ` Baokun Li [this message]
2026-06-18 13:54   ` [PATCH v2 2/8] ext4: drain in-flight DIO before buffered write fallback Jan Kara
2026-06-18 12:57 ` [PATCH v2 3/8] ext4: skip overwrite check for aligned non-extending DIO writes Baokun Li
2026-06-18 12:57 ` [PATCH v2 4/8] ext4: base unaligned DIO lock decision on partial block zeroing Baokun Li
2026-06-18 12:57 ` [PATCH v2 5/8] ext4: use kiocb_modified instead of file_modified in DIO/DAX write path Baokun Li
2026-06-18 13:56   ` Jan Kara
2026-06-18 12:57 ` [PATCH v2 6/8] ext4: return -EAGAIN from ext4_map_blocks() in NOWAIT cache miss Baokun Li
2026-06-18 14:09   ` Jan Kara
2026-06-18 15:51     ` Baokun Li
2026-06-18 12:57 ` [PATCH v2 7/8] ext4: handle IOMAP_NOWAIT in ext4_iomap_begin() with cache-only lookup Baokun Li
2026-06-18 14:09   ` Jan Kara
2026-06-18 12:57 ` [PATCH v2 8/8] ext4: handle IOCB_NOWAIT in ext4_dio_needs_zeroing() " Baokun Li
2026-06-18 14:10   ` Jan Kara
