All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chao Shi <coshi036@gmail.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, Chao Shi <coshi036@gmail.com>,
	Sungwoo Kim <iam@sung-woo.kim>, Dave Tian <daveti@purdue.edu>,
	Weidong Zhu <weizhu@fiu.edu>
Subject: [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears
Date: Sat, 25 Apr 2026 22:01:37 -0400	[thread overview]
Message-ID: <20260426020137.1221985-1-coshi036@gmail.com> (raw)

A WARN_ON_ONCE(!buffer_uptodate(bh)) in mark_buffer_dirty() is reachable
from the buffered write path on a block device when the underlying
device returns I/O errors at high density.  Reproduced by fuzzing an
NVMe controller (FEMU) that returns crafted error completions for a
sustained workload from /dev/nvme0n1.

The race is:

  CPU A: block_commit_write (folio lock held)        CPU B: end_buffer_async_read
    set_buffer_uptodate(bh);
                                                       clear_buffer_uptodate(bh);
    mark_buffer_dirty(bh);  /* WARN fires */

The contract documented at set_buffer_uptodate() in
include/linux/buffer_head.h:140 already states:

  "Any other serialization (with IO errors or whatever that might
   clear the bit) has to come from other state (eg BH_Lock)."

block_commit_write() and the buffer_new() branch in
__block_write_begin_int() violate this contract: they hold the folio
lock but not BH_Lock when calling set_buffer_uptodate() immediately
followed by mark_buffer_dirty().  Take BH_Lock around the pair so the
documented serialization holds.

The race is the same family as 558d6450c775 ("ext4: fix
WARN_ON_ONCE(!buffer_uptodate) after an error writing the
superblock"), which addressed the ext4 superblock-specific case via
state recovery.  No equivalent recovery hook exists in the generic
block_commit_write() path, so apply BH_Lock instead.

WARN stack:

  RIP: mark_buffer_dirty+0x4c2/0x560 fs/buffer.c:1183
  Call Trace:
    block_commit_write          fs/buffer.c
    block_write_end             fs/buffer.c
    iomap_write_end             fs/iomap/buffered-io.c
    iomap_file_buffered_write
    blkdev_buffered_write       block/fops.c
    blkdev_write_iter
    vfs_write
    __x64_sys_write

Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---

Notes for reviewers (RFC):

1. lock_buffer() in the buffered-write end path: in the steady state
   the bh should be unlocked when block_commit_write() runs
   (block_write_begin already waited for any RMW read), so contention
   should be rare.  fio numbers TBD; happy to defer until measured if
   that is the bar.

2. Several other call sites have the same set_buffer_uptodate(bh)
   immediately followed by mark_buffer_dirty(bh) pattern without
   BH_Lock:

     fs/nilfs2/mdt.c:60
     fs/ocfs2/alloc.c:6840
     fs/ocfs2/aops.c:655
     fs/exfat/fatent.c:408
     fs/exfat/misc.c:168
     fs/ufs/ialloc.c:146
     fs/ufs/inode.c:1076, 1088
     fs/ufs/balloc.c:324
     fs/jfs/super.c:770
     fs/ext2/super.c:1594
     fs/ntfs3/fsntfs.c:1096, 1491
     fs/ntfs3/bitmap.c:755, 796, 1387

   Most look like one-shot init / metadata paths where concurrent IO
   completion on the same bh is unlikely, but I have not audited each.
   Per-fs follow-ups by respective maintainers, or one tree-wide
   series?

3. Reproducer is currently fuzzer-only (FEMU + syz-executor).  A
   minimal C reproducer using dm-flakey for read-error injection is
   in progress.

 fs/buffer.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d7f84e77d2..bc4fad93392 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2041,9 +2041,16 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
 			if (buffer_new(bh)) {
 				clean_bdev_bh_alias(bh);
 				if (folio_test_uptodate(folio)) {
+					/*
+					 * See block_commit_write() for why we
+					 * must hold BH_Lock around set_uptodate
+					 * + mark_dirty.
+					 */
+					lock_buffer(bh);
 					clear_buffer_new(bh);
 					set_buffer_uptodate(bh);
 					mark_buffer_dirty(bh);
+					unlock_buffer(bh);
 					continue;
 				}
 				if (block_end > to || block_start < from)
@@ -2104,8 +2111,20 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
 			if (!buffer_uptodate(bh))
 				partial = true;
 		} else {
+			/*
+			 * Per the contract documented at set_buffer_uptodate()
+			 * (include/linux/buffer_head.h), callers must hold
+			 * BH_Lock to serialize against concurrent clears of
+			 * BH_Uptodate.  Holding only the folio lock is not
+			 * sufficient: a concurrent end_buffer_async_read() on
+			 * a previously failed read can clear BH_Uptodate
+			 * between set_buffer_uptodate() and mark_buffer_dirty(),
+			 * tripping the WARN_ON_ONCE in mark_buffer_dirty().
+			 */
+			lock_buffer(bh);
 			set_buffer_uptodate(bh);
 			mark_buffer_dirty(bh);
+			unlock_buffer(bh);
 		}
 		if (buffer_new(bh))
 			clear_buffer_new(bh);
-- 
2.43.0


             reply	other threads:[~2026-04-26  2:01 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-26  2:01 Chao Shi [this message]
2026-04-26  2:42 ` [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears Matthew Wilcox
2026-04-27 11:24   ` Jan Kara
2026-04-27 13:46     ` Matthew Wilcox
2026-04-27 15:48       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260426020137.1221985-1-coshi036@gmail.com \
    --to=coshi036@gmail.com \
    --cc=brauner@kernel.org \
    --cc=daveti@purdue.edu \
    --cc=iam@sung-woo.kim \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=weizhu@fiu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.