From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Qu Wenruo <wqu@suse.com>, David Sterba <dsterba@suse.com>,
Sasha Levin <sashal@kernel.org>,
clm@fb.com, josef@toxicpanda.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.14 40/54] btrfs: reject out-of-band dirty folios during writeback
Date: Thu, 3 Apr 2025 15:01:55 -0400
Message-ID: <20250403190209.2675485-40-sashal@kernel.org>
In-Reply-To: <20250403190209.2675485-1-sashal@kernel.org>
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit 7ca3e84980ef6484a5c6f004aa180b61ce0c37d9 ]
[OUT-OF-BAND DIRTY FOLIOS]
An out-of-band dirty folio is a folio that was marked dirty without
notifying the filesystem.
This can lead to various problems, including but not limited to:
- No folio::private to track per block status
- No proper space reserved for such a dirty folio
[HISTORY IN BTRFS]
This used to be a problem related to get_user_pages(), but with the
introduction of pin_user_pages*(), we should no longer hit such
cases.
In btrfs, we have a long history of catching such out-of-band dirty
folios by:
- Marking the folio ordered during delayed allocation
- Checking the folio ordered flag during writeback
If the folio has no ordered flag, it did not go through delayed
allocation, so it is definitely an out-of-band one.
If we get one, we go through COW fixup, which re-dirties the folio
with proper handling in another workqueue.
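The detection scheme above can be modelled in userspace as follows.
This is only a sketch: struct model_folio and the model_* helpers are
hypothetical names, not kernel interfaces, and the real
btrfs_writepage_cow_fixup() does considerably more (it queues the
fixup work rather than just returning an error code).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a folio; not the kernel's struct folio. */
struct model_folio {
	bool dirty;
	bool ordered;	/* set by the fs during delayed allocation */
};

/* Dirtying that goes through the fs: delalloc marks the folio ordered. */
static void model_fs_dirty(struct model_folio *f)
{
	assert(f != NULL);
	f->dirty = true;
	f->ordered = true;
}

/* Out-of-band dirtying: the dirty flag is set behind the fs's back. */
static void model_oob_dirty(struct model_folio *f)
{
	assert(f != NULL);
	f->dirty = true;
}

/*
 * Writeback-time check, modelled on the description above:
 * returns 0 if the folio went through delayed allocation, or
 * -EAGAIN if it has to be handed to the COW fixup path.
 */
static int model_writepage_check(const struct model_folio *f)
{
	assert(f != NULL);
	if (f->ordered)
		return 0;
	return -EAGAIN;
}
```

In this model, a folio dirtied via model_fs_dirty() passes the check,
while one dirtied via model_oob_dirty() is reported with -EAGAIN.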
[PROBLEMS OF COW-FIXUP]
Such a workaround is a blocker for our migration to iomap (it requires
extra flags to track whether a folio was dirtied by the fs or not), and
I'd argue it's not data checksum safe: if a folio can be marked dirty
without informing the fs, its content can also change at any time.
But with the introduction of pin_user_pages*() during the v5.8 merge
window, such out-of-band dirty folios should be treated as a bug.
Ext4 has treated such cases by warning and erroring out even before
pin_user_pages*().
Furthermore, there is already evidence that such folio ordered flag
tracking can be broken by incorrect error handling; see the commit
messages of the following commits:
06f364284794 ("btrfs: do proper folio cleanup when cow_file_range() failed")
c2b47df81c8e ("btrfs: do proper folio cleanup when run_delalloc_nocow() failed")
[FIXES]
Unlike btrfs, ext4 and xfs (iomap) never bother handling such
out-of-band dirty folios.
- Ext4 just warns and errors out
- Iomap always follows the folio/block dirty flags
And there is nothing really COW specific here; xfs supports COW too.
Here we take one step towards ext4 by warning and erroring out.
But since the COW fixup has been there from the beginning, we keep
the old behavior for non-experimental builds, and only enable the new
warning for experimental builds until we're 100% sure and can remove
COW fixup.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/extent_io.c | 28 +++++++++++++++++++++++++++-
 fs/btrfs/inode.c     | 15 +++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b2fae67f8fa34..1fb1b54bc856c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1442,12 +1442,14 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 	       start + len <= folio_start + folio_size(folio));
 
 	ret = btrfs_writepage_cow_fixup(folio);
-	if (ret) {
+	if (ret == -EAGAIN) {
 		/* Fixup worker will requeue */
 		folio_redirty_for_writepage(bio_ctrl->wbc, folio);
 		folio_unlock(folio);
 		return 1;
 	}
+	if (ret < 0)
+		return ret;
 
 	for (cur = start; cur < start + len; cur += fs_info->sectorsize)
 		set_bit((cur - folio_start) >> fs_info->sectorsize_bits, &range_bitmap);
@@ -1551,6 +1553,30 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 	 * The proper bitmap can only be initialized until writepage_delalloc().
 	 */
 	bio_ctrl->submit_bitmap = (unsigned long)-1;
+
+	/*
+	 * If the page is dirty but without private set, it's marked dirty
+	 * without informing the fs.
+	 * Nowadays that is a bug, since the introduction of
+	 * pin_user_pages*().
+	 *
+	 * So here we check if the page has private set to rule out such
+	 * case.
+	 * But we also have a long history of relying on the COW fixup,
+	 * so here we only enable this check for experimental builds until
+	 * we're sure it's safe.
+	 */
+	if (IS_ENABLED(CONFIG_BTRFS_EXPERIMENTAL) &&
+	    unlikely(!folio_test_private(folio))) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err_rl(fs_info,
+	"root %lld ino %llu folio %llu is marked dirty without notifying the fs",
+			     inode->root->root_key.objectid,
+			     btrfs_ino(inode), folio_pos(folio));
+		ret = -EUCLEAN;
+		goto done;
+	}
+
 	ret = set_folio_extent_mapped(folio);
 	if (ret < 0)
 		goto done;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 38756f8cef463..f877e531fd073 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2877,6 +2877,21 @@ int btrfs_writepage_cow_fixup(struct folio *folio)
 	if (folio_test_ordered(folio))
 		return 0;
 
+	/*
+	 * For experimental build, we error out instead of EAGAIN.
+	 *
+	 * We should not hit such out-of-band dirty folios anymore.
+	 */
+	if (IS_ENABLED(CONFIG_BTRFS_EXPERIMENTAL)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err_rl(fs_info,
+	"root %lld ino %llu folio %llu is marked dirty without notifying the fs",
+			     BTRFS_I(inode)->root->root_key.objectid,
+			     btrfs_ino(BTRFS_I(inode)),
+			     folio_pos(folio));
+		return -EUCLEAN;
+	}
+
 	/*
 	 * folio_checked is set below when we create a fixup worker for this
 	 * folio, don't try to create another one if we're already
--
2.39.5
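
As a reading aid, the return-value handling changed by the hunks above
can be modelled in plain C. This is a simplified sketch under stated
assumptions, not the kernel code: the model_* names and the
`experimental` parameter are hypothetical (the latter stands in for
IS_ENABLED(CONFIG_BTRFS_EXPERIMENTAL)), the WARN_ON() and the
rate-limited error message are omitted, and the non-experimental fixup
path is reduced to returning -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* EUCLEAN is Linux-specific; assume the Linux value if it is missing. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* What the modelled writeback path does with the folio. */
enum model_action {
	MODEL_WRITE,	/* folio is fine, continue with writeback */
	MODEL_REQUEUE,	/* redirty the folio, fixup worker takes over */
	MODEL_FAIL,	/* propagate the error to the caller */
};

/* Stand-in for the check added to btrfs_writepage_cow_fixup(). */
static int model_cow_fixup(bool ordered, bool experimental)
{
	if (ordered)
		return 0;
	if (experimental)
		return -EUCLEAN;	/* new behavior: error out */
	return -EAGAIN;			/* old behavior: COW fixup */
}

/* Stand-in for the caller-side handling in extent_writepage_io(). */
static enum model_action model_writepage_io(bool ordered, bool experimental,
					    int *err)
{
	int ret = model_cow_fixup(ordered, experimental);

	assert(err != NULL);
	*err = 0;
	if (ret == -EAGAIN)
		return MODEL_REQUEUE;
	if (ret < 0) {
		*err = ret;
		return MODEL_FAIL;
	}
	return MODEL_WRITE;
}
```

In this model, an out-of-band dirty folio is requeued on a
non-experimental build and fails with -EUCLEAN on an experimental one;
a folio with the ordered flag set is written back in both cases.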