From: Chao Shi
To: Alexander Viro, Christian Brauner, linux-fsdevel@vger.kernel.org
Cc: Jan Kara, linux-kernel@vger.kernel.org, Chao Shi, Sungwoo Kim, Dave Tian, Weidong Zhu
Subject: [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears
Date: Sat, 25 Apr 2026 22:01:37 -0400
Message-ID: <20260426020137.1221985-1-coshi036@gmail.com>

A WARN_ON_ONCE(!buffer_uptodate(bh)) in mark_buffer_dirty() is reachable
from the buffered write path on a block device when the underlying device
returns I/O errors at a high rate. It was reproduced by fuzzing an emulated
NVMe controller (FEMU) that returns crafted error completions during a
sustained workload on /dev/nvme0n1.

The race is:

  CPU A: block_commit_write           CPU B: end_buffer_async_read
         (folio lock held)
  set_buffer_uptodate(bh);
                                      clear_buffer_uptodate(bh);
  mark_buffer_dirty(bh);
    /* WARN fires */

The contract documented at set_buffer_uptodate() in
include/linux/buffer_head.h:140 already states:

  "Any other serialization (with IO errors or whatever that might clear
  the bit) has to come from other state (eg BH_Lock)."

block_commit_write() and the buffer_new() branch in
__block_write_begin_int() violate this contract: they hold the folio lock,
but not BH_Lock, when calling set_buffer_uptodate() immediately followed by
mark_buffer_dirty(). Take BH_Lock around the pair so that the documented
serialization holds.
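The interleaving argument above can be checked mechanically. The following is a minimal userspace sketch, not kernel code: enum step, check_race() and the other names are hypothetical, an integer owner field stands in for BH_Lock, and CPU B is assumed to hold the lock around clear_buffer_uptodate() (modelling a buffer locked at read submission and unlocked by the completion handler). It enumerates every order in which the two CPUs' steps can run and counts how many reach mark_buffer_dirty() with the uptodate bit clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Atomic steps in the model; names are hypothetical, not kernel APIs. */
enum step { LOCK, UNLOCK, SET_UPTODATE, CLEAR_UPTODATE, MARK_DIRTY };

struct state {
	bool uptodate;	/* models BH_Uptodate */
	int owner;	/* models BH_Lock: -1 free, 0 = CPU A, 1 = CPU B */
	bool warned;	/* MARK_DIRTY ran with uptodate clear */
};

struct model_result {
	int feasible;	/* interleavings that can run to completion */
	int warned;	/* feasible interleavings that tripped the WARN */
};

/* Apply one step for @cpu; return false if LOCK would have to wait. */
static bool apply(struct state *s, int cpu, enum step st)
{
	switch (st) {
	case LOCK:
		if (s->owner != -1)
			return false;
		s->owner = cpu;
		break;
	case UNLOCK:
		assert(s->owner == cpu);
		s->owner = -1;
		break;
	case SET_UPTODATE:
		s->uptodate = true;
		break;
	case CLEAR_UPTODATE:
		s->uptodate = false;
		break;
	case MARK_DIRTY:
		/* Mirrors WARN_ON_ONCE(!buffer_uptodate(bh)). */
		if (!s->uptodate)
			s->warned = true;
		break;
	}
	return true;
}

/* Depth-first walk over every order in which the two step lists can run. */
static void explore(struct state s, const enum step *a, size_t na,
		    const enum step *b, size_t nb, struct model_result *r)
{
	struct state next;

	if (na == 0 && nb == 0) {
		r->feasible++;
		if (s.warned)
			r->warned++;
		return;
	}
	next = s;
	if (na > 0 && apply(&next, 0, a[0]))
		explore(next, a + 1, na - 1, b, nb, r);
	next = s;
	if (nb > 0 && apply(&next, 1, b[0]))
		explore(next, a, na, b + 1, nb - 1, r);
}

/* CPU A = block_commit_write(), CPU B = failed end_buffer_async_read(). */
struct model_result check_race(bool a_takes_bh_lock)
{
	static const enum step a_plain[] = { SET_UPTODATE, MARK_DIRTY };
	static const enum step a_locked[] = { LOCK, SET_UPTODATE, MARK_DIRTY, UNLOCK };
	/* Assumption: the read holds the buffer lock until completion unlocks it. */
	static const enum step b[] = { LOCK, CLEAR_UPTODATE, UNLOCK };
	struct state s0 = { .uptodate = false, .owner = -1, .warned = false };
	struct model_result r = { 0, 0 };

	if (a_takes_bh_lock)
		explore(s0, a_locked, ARRAY_SIZE(a_locked), b, ARRAY_SIZE(b), &r);
	else
		explore(s0, a_plain, ARRAY_SIZE(a_plain), b, ARRAY_SIZE(b), &r);
	return r;
}
```

With CPU A unlocked, check_race(false) finds 10 feasible orders, 4 of which place the clear between the set and the dirty; check_race(true) leaves only the two fully serialized orders, neither of which warns. The model covers only the window between the two calls; whether such a read can actually be in flight while CPU A holds the folio lock is outside its scope.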
The race belongs to the same family as 558d6450c775 ("ext4: fix
WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock"),
which addressed the ext4 superblock-specific case via state recovery. No
equivalent recovery hook exists in the generic block_commit_write() path,
so take BH_Lock instead.

WARN stack:

  RIP: mark_buffer_dirty+0x4c2/0x560 fs/buffer.c:1183
  Call Trace:
   block_commit_write fs/buffer.c
   block_write_end fs/buffer.c
   iomap_write_end fs/iomap/buffered-io.c
   iomap_file_buffered_write
   blkdev_buffered_write block/fops.c
   blkdev_write_iter
   vfs_write
   __x64_sys_write

Found by FuzzNvme (Syzkaller with the FEMU fuzzing framework).

Acked-by: Sungwoo Kim
Acked-by: Dave Tian
Acked-by: Weidong Zhu
Signed-off-by: Chao Shi
---
Notes for reviewers (RFC):

1. lock_buffer() in the buffered-write end path: in the steady state the
   bh should be unlocked when block_commit_write() runs (block_write_begin()
   has already waited for any read-modify-write read), so contention should
   be rare. fio numbers are TBD; I am happy to hold the patch until they are
   measured if that is the bar.

2. Several other call sites have the same pattern of set_buffer_uptodate(bh)
   immediately followed by mark_buffer_dirty(bh) without BH_Lock:

     fs/nilfs2/mdt.c:60
     fs/ocfs2/alloc.c:6840
     fs/ocfs2/aops.c:655
     fs/exfat/fatent.c:408
     fs/exfat/misc.c:168
     fs/ufs/ialloc.c:146
     fs/ufs/inode.c:1076, 1088
     fs/ufs/balloc.c:324
     fs/jfs/super.c:770
     fs/ext2/super.c:1594
     fs/ntfs3/fsntfs.c:1096, 1491
     fs/ntfs3/bitmap.c:755, 796, 1387

   Most look like one-shot init or metadata paths where concurrent I/O
   completion on the same bh is unlikely, but I have not audited each.
   Should these be per-fs follow-ups by the respective maintainers, or one
   tree-wide series?

3. The reproducer is currently fuzzer-only (FEMU + syz-executor). A minimal
   C reproducer using dm-flakey for read-error injection is in progress.
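A first pass over the call sites in note 2 could be purely textual. The sketch below is hypothetical: find_unlocked_pairs(), calls_lock_buffer() and the two-line LOOKBACK window are illustrative choices, not existing tooling. It flags a line calling set_buffer_uptodate() when the next line calls mark_buffer_dirty() and no lock_buffer() call appears within the two lines above. It does not parse C, so it cannot see locks taken further up the call chain and will miss other layouts; every hit still needs manual review.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: lock_buffer() within two lines above counts as "held". */
#define LOOKBACK 2

/* True if @line calls lock_buffer() itself (not unlock_buffer() etc.). */
static bool calls_lock_buffer(const char *line)
{
	const char *p = line;

	while ((p = strstr(p, "lock_buffer(")) != NULL) {
		if (p == line ||
		    !(isalnum((unsigned char)p[-1]) || p[-1] == '_'))
			return true;
		p++;
	}
	return false;
}

/*
 * Hypothetical audit helper: store in @out (up to @max_out entries) the
 * index of each line that calls set_buffer_uptodate() when the next line
 * calls mark_buffer_dirty() and no lock_buffer() call appears within
 * LOOKBACK lines above. Returns the total number of such pairs, which may
 * exceed @max_out.
 */
size_t find_unlocked_pairs(const char *const *lines, size_t nlines,
			   size_t *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 0; i + 1 < nlines; i++) {
		bool locked = false;

		if (!strstr(lines[i], "set_buffer_uptodate(") ||
		    !strstr(lines[i + 1], "mark_buffer_dirty("))
			continue;
		for (size_t k = 1; k <= LOOKBACK && k <= i; k++)
			if (calls_lock_buffer(lines[i - k]))
				locked = true;
		if (!locked) {
			if (found < max_out)
				out[found] = i;
			found++;
		}
	}
	return found;
}
```

The two-line window is chosen so that the patched __block_write_begin_int() branch, where clear_buffer_new() sits between lock_buffer() and set_buffer_uptodate(), is not flagged. An unlock_buffer() line above the pair is deliberately not treated as holding the lock.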
 fs/buffer.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d7f84e77d2..bc4fad93392 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2041,9 +2041,16 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
 			if (buffer_new(bh)) {
 				clean_bdev_bh_alias(bh);
 				if (folio_test_uptodate(folio)) {
+					/*
+					 * See block_commit_write() for why we
+					 * must hold BH_Lock around set_uptodate
+					 * + mark_dirty.
+					 */
+					lock_buffer(bh);
 					clear_buffer_new(bh);
 					set_buffer_uptodate(bh);
 					mark_buffer_dirty(bh);
+					unlock_buffer(bh);
 					continue;
 				}
 				if (block_end > to || block_start < from)
@@ -2104,8 +2111,20 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
 			if (!buffer_uptodate(bh))
 				partial = true;
 		} else {
+			/*
+			 * Per the contract documented at set_buffer_uptodate()
+			 * (include/linux/buffer_head.h), callers must hold
+			 * BH_Lock to serialize against concurrent clears of
+			 * BH_Uptodate. Holding only the folio lock is not
+			 * sufficient: a concurrent end_buffer_async_read() on
+			 * a previously failed read can clear BH_Uptodate
+			 * between set_buffer_uptodate() and mark_buffer_dirty(),
+			 * tripping the WARN_ON_ONCE in mark_buffer_dirty().
+			 */
+			lock_buffer(bh);
 			set_buffer_uptodate(bh);
 			mark_buffer_dirty(bh);
+			unlock_buffer(bh);
 		}
 		if (buffer_new(bh))
 			clear_buffer_new(bh);
-- 
2.43.0