[RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Chao Shi <coshi036@gmail.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, Chao Shi <coshi036@gmail.com>,
	Sungwoo Kim <iam@sung-woo.kim>, Dave Tian <daveti@purdue.edu>,
	Weidong Zhu <weizhu@fiu.edu>
Subject: [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears
Date: Sat, 25 Apr 2026 22:01:37 -0400	[thread overview]
Message-ID: <20260426020137.1221985-1-coshi036@gmail.com> (raw)

A WARN_ON_ONCE(!buffer_uptodate(bh)) in mark_buffer_dirty() is reachable
from the buffered write path on a block device when the underlying
device returns I/O errors at high density.  Reproduced by fuzzing an
NVMe controller (FEMU) that returns crafted error completions for a
sustained workload from /dev/nvme0n1.

The race is:

  CPU A: block_commit_write (folio lock held)        CPU B: end_buffer_async_read
    set_buffer_uptodate(bh);
                                                       clear_buffer_uptodate(bh);
    mark_buffer_dirty(bh);  /* WARN fires */

The contract documented at set_buffer_uptodate() in
include/linux/buffer_head.h:140 already states:

  "Any other serialization (with IO errors or whatever that might
   clear the bit) has to come from other state (eg BH_Lock)."

block_commit_write() and the buffer_new() branch in
__block_write_begin_int() violate this contract: they hold the folio
lock but not BH_Lock when calling set_buffer_uptodate() immediately
followed by mark_buffer_dirty().  Take BH_Lock around the pair so the
documented serialization holds.

The race is the same family as 558d6450c775 ("ext4: fix
WARN_ON_ONCE(!buffer_uptodate) after an error writing the
superblock"), which addressed the ext4 superblock-specific case via
state recovery.  No equivalent recovery hook exists in the generic
block_commit_write() path, so apply BH_Lock instead.

WARN stack:

  RIP: mark_buffer_dirty+0x4c2/0x560 fs/buffer.c:1183
  Call Trace:
    block_commit_write          fs/buffer.c
    block_write_end             fs/buffer.c
    iomap_write_end             fs/iomap/buffered-io.c
    iomap_file_buffered_write
    blkdev_buffered_write       block/fops.c
    blkdev_write_iter
    vfs_write
    __x64_sys_write

Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---

Notes for reviewers (RFC):

1. lock_buffer() in the buffered-write end path: in the steady state
   the bh should be unlocked when block_commit_write() runs
   (block_write_begin already waited for any RMW read), so contention
   should be rare.  fio numbers TBD; happy to defer until measured if
   that is the bar.

2. Several other call sites have the same set_buffer_uptodate(bh)
   immediately followed by mark_buffer_dirty(bh) pattern without
   BH_Lock:

     fs/nilfs2/mdt.c:60
     fs/ocfs2/alloc.c:6840
     fs/ocfs2/aops.c:655
     fs/exfat/fatent.c:408
     fs/exfat/misc.c:168
     fs/ufs/ialloc.c:146
     fs/ufs/inode.c:1076, 1088
     fs/ufs/balloc.c:324
     fs/jfs/super.c:770
     fs/ext2/super.c:1594
     fs/ntfs3/fsntfs.c:1096, 1491
     fs/ntfs3/bitmap.c:755, 796, 1387

   Most look like one-shot init / metadata paths where concurrent IO
   completion on the same bh is unlikely, but I have not audited each.
   Per-fs follow-ups by respective maintainers, or one tree-wide
   series?

3. Reproducer is currently fuzzer-only (FEMU + syz-executor).  A
   minimal C reproducer using dm-flakey for read-error injection is
   in progress.

 fs/buffer.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d7f84e77d2..bc4fad93392 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2041,9 +2041,16 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
 			if (buffer_new(bh)) {
 				clean_bdev_bh_alias(bh);
 				if (folio_test_uptodate(folio)) {
+					/*
+					 * See block_commit_write() for why we
+					 * must hold BH_Lock around set_uptodate
+					 * + mark_dirty.
+					 */
+					lock_buffer(bh);
 					clear_buffer_new(bh);
 					set_buffer_uptodate(bh);
 					mark_buffer_dirty(bh);
+					unlock_buffer(bh);
 					continue;
 				}
 				if (block_end > to || block_start < from)
@@ -2104,8 +2111,20 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
 			if (!buffer_uptodate(bh))
 				partial = true;
 		} else {
+			/*
+			 * Per the contract documented at set_buffer_uptodate()
+			 * (include/linux/buffer_head.h), callers must hold
+			 * BH_Lock to serialize against concurrent clears of
+			 * BH_Uptodate.  Holding only the folio lock is not
+			 * sufficient: a concurrent end_buffer_async_read() on
+			 * a previously failed read can clear BH_Uptodate
+			 * between set_buffer_uptodate() and mark_buffer_dirty(),
+			 * tripping the WARN_ON_ONCE in mark_buffer_dirty().
+			 */
+			lock_buffer(bh);
 			set_buffer_uptodate(bh);
 			mark_buffer_dirty(bh);
+			unlock_buffer(bh);
 		}
 		if (buffer_new(bh))
 			clear_buffer_new(bh);
-- 
2.43.0

next             reply	other threads:[~2026-04-26  2:01 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-26  2:01 Chao Shi [this message]
2026-04-26  2:42 ` [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears Matthew Wilcox
2026-04-27 11:24   ` Jan Kara
2026-04-27 13:46     ` Matthew Wilcox
2026-04-27 15:48       ` Jan Kara

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:4d7f84e77d dfblob:bc4fad9339 )
 OR (
bs:"[RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260426020137.1221985-1-coshi036@gmail.com \
    --to=coshi036@gmail.com \
    --cc=brauner@kernel.org \
    --cc=daveti@purdue.edu \
    --cc=iam@sung-woo.kim \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=weizhu@fiu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox