* [PATCH v4 1/1] iomap: fix race when reading in all bytes of a folio
2025-10-31 21:13 [PATCH v4 0/1] vfs-6.19.iomap commit 51311f045375 fixup Joanne Koong
@ 2025-10-31 21:13 ` Joanne Koong
2025-11-05 11:57 ` [PATCH v4 0/1] vfs-6.19.iomap commit 51311f045375 fixup Christian Brauner
1 sibling, 0 replies; 3+ messages in thread
From: Joanne Koong @ 2025-10-31 21:13 UTC (permalink / raw)
To: brauner; +Cc: bfoster, hch, djwong, linux-fsdevel
There is a race where, if all bytes in a folio need to be read in and
the filesystem finishes reading them before the call to
iomap_read_end(), then bytes_accounted in iomap_read_end() will be 0 and
the subsequent "ifs->read_bytes_pending -= bytes_accounted" will also
leave ifs->read_bytes_pending at 0, which triggers an extra
folio_end_read() call. This extra folio_end_read() unlocks the folio for
a second time, which sets the lock bit on the folio, resulting in a
permanent lockup.
Fix this by adding a +1 bias when doing the initialization for
ifs->read_bytes_pending. The bias will be subtracted in
iomap_read_end(). This ensures the read has not been ended on the folio
when iomap_read_end() is called, even if the IO helper has finished
reading in the entire folio.
Additionally, add some comments to clarify how this logic works.
Fixes: 51311f045375 ("iomap: track pending read bytes more optimally")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 36ee3290669a..6ae031ac8058 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -358,12 +358,42 @@ static void iomap_read_init(struct folio *folio)
if (ifs) {
size_t len = folio_size(folio);
+ /*
+ * ifs->read_bytes_pending is used to track how many bytes are
+ * read in asynchronously by the IO helper. We need to track
+ * this so that we can know when the IO helper has finished
+ * reading in all the necessary ranges of the folio and can end
+ * the read.
+ *
+ * Increase ->read_bytes_pending by the folio size to start, and
+ * add a +1 bias. We'll subtract the bias and any uptodate /
+ * zeroed ranges that did not require IO in iomap_read_end()
+ * after we're done processing the folio.
+ *
+ * We do this because otherwise, we would have to increment
+ * ifs->read_bytes_pending every time a range in the folio needs
+ * to be read in, which can get expensive since the spinlock
+ * needs to be held whenever modifying ifs->read_bytes_pending.
+ *
+ * We add the bias to ensure the read has not been ended on the
+ * folio when iomap_read_end() is called, even if the IO helper
+ * has already finished reading in the entire folio.
+ */
spin_lock_irq(&ifs->state_lock);
- ifs->read_bytes_pending += len;
+ ifs->read_bytes_pending += len + 1;
spin_unlock_irq(&ifs->state_lock);
}
}
+/*
+ * This ends IO if no bytes were submitted to an IO helper.
+ *
+ * Otherwise, this calibrates ifs->read_bytes_pending to represent only the
+ * submitted bytes (see comment in iomap_read_init()). If all bytes submitted
+ * have already been completed by the IO helper, then this will end the read.
+ * Else the IO helper will end the read after all submitted ranges have been
+ * read.
+ */
static void iomap_read_end(struct folio *folio, size_t bytes_pending)
{
struct iomap_folio_state *ifs;
@@ -381,7 +411,13 @@ static void iomap_read_end(struct folio *folio, size_t bytes_pending)
ifs = folio->private;
if (ifs) {
bool end_read, uptodate;
- size_t bytes_accounted = folio_size(folio) - bytes_pending;
+ /*
+ * Subtract any bytes that were initially accounted to
+ * read_bytes_pending but skipped for IO.
+ * The +1 accounts for the bias we added in iomap_read_init().
+ */
+ size_t bytes_accounted = folio_size(folio) + 1 -
+ bytes_pending;
spin_lock_irq(&ifs->state_lock);
ifs->read_bytes_pending -= bytes_accounted;
--
2.47.3