Linux EXT4 FS development
 help / color / mirror / Atom feed
* [PATCH v2 0/3] jbd2/ext4/ocfs2: lockless jinode dirty range
@ 2026-02-19 11:46 Li Chen
  2026-02-19 11:46 ` [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Li Chen @ 2026-02-19 11:46 UTC (permalink / raw)
  To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox
  Cc: Andreas Dilger, Joel Becker, Joseph Qi, linux-kernel

This series makes the jbd2_inode dirty range tracking safe for lockless
reads in jbd2 and filesystem callbacks used by ext4 and ocfs2.

Some paths access jinode fields without holding journal->j_list_lock
(e.g. fast commit helpers and ordered truncate helpers). v1 used READ_ONCE()
on i_dirty_start/end, but Matthew pointed out that loff_t can be torn on
32-bit platforms, and Jan suggested storing the dirty range in PAGE_SIZE
units as pgoff_t.

With this series, jbd2 stores i_dirty_start/end as pgoff_t and uses
READ_ONCE()/WRITE_ONCE() for lockless access. ext4 and ocfs2 convert the
page-based dirty range back to byte offsets for writeback.

This is based on Jan's suggestion in the review of the ext4 jinode
publication race fix. [1]

Changes since v1:
- Store i_dirty_start/end in PAGE_SIZE units (pgoff_t) to avoid torn loads on
  32-bit (pointed out by Matthew, suggested by Jan).
- Use WRITE_ONCE() for i_dirty_* / i_flags updates in jbd2 (per Jan).
- Drop pointless READ_ONCE() on i_vfs_inode in jbd2_wait_inode_data (per Jan).
- Convert ext4/ocfs2 callbacks to translate page range to byte offsets.

[1]: https://lore.kernel.org/all/4jxwogttddiaoqbstlgou5ox6zs27ngjjz5ukrxafm2z5ijxod@so4eqnykiegj/

v1: https://lore.kernel.org/all/20260130031232.60780-1-me@linux.beauty/

Li Chen (3):
  jbd2: store jinode dirty range in PAGE_SIZE units
  ext4: use READ_ONCE for lockless jinode reads
  ocfs2: use READ_ONCE for lockless jinode reads

 fs/ext4/inode.c       | 12 ++++++--
 fs/ext4/super.c       | 19 +++++++++----
 fs/jbd2/commit.c      | 65 ++++++++++++++++++++++++++++++++++---------
 fs/jbd2/journal.c     |  3 +-
 fs/jbd2/transaction.c | 20 ++++++++-----
 fs/ocfs2/journal.c    | 13 +++++++--
 include/linux/jbd2.h  | 17 +++++++----
 7 files changed, 113 insertions(+), 36 deletions(-)

-- 
2.52.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units
  2026-02-19 11:46 [PATCH v2 0/3] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
@ 2026-02-19 11:46 ` Li Chen
  2026-02-19 21:00   ` Andreas Dilger
  2026-02-19 11:46 ` [PATCH v2 2/3] ext4: use READ_ONCE for lockless jinode reads Li Chen
  2026-02-19 11:46 ` [PATCH v2 3/3] ocfs2: " Li Chen
  2 siblings, 1 reply; 7+ messages in thread
From: Li Chen @ 2026-02-19 11:46 UTC (permalink / raw)
  To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox, Jan Kara, linux-kernel
  Cc: Li Chen

jbd2_inode fields are updated under journal->j_list_lock, but some paths
read them without holding the lock (e.g. fast commit helpers and ordered
truncate helpers).

READ_ONCE() alone is not sufficient for i_dirty_start/end as they are
loff_t and 32-bit platforms can observe torn loads. Store the dirty range
in PAGE_SIZE units as pgoff_t so lockless readers can take non-torn
snapshots.

Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for
the dirty range and i_flags to match the existing lockless access pattern.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v1:
- Store i_dirty_start/end in PAGE_SIZE units (pgoff_t) to avoid torn loads on
  32-bit (pointed out by Matthew, suggested by Jan).
- Use WRITE_ONCE() for i_dirty_* / i_flags updates (per Jan).
- Drop pointless READ_ONCE() on i_vfs_inode in jbd2_wait_inode_data (per Jan).

 fs/jbd2/commit.c      | 65 ++++++++++++++++++++++++++++++++++---------
 fs/jbd2/journal.c     |  3 +-
 fs/jbd2/transaction.c | 20 ++++++++-----
 include/linux/jbd2.h  | 17 +++++++----
 4 files changed, 78 insertions(+), 27 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 7203d2d2624d..d98f4dbde695 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -180,7 +180,13 @@ static int journal_wait_on_commit_record(journal_t *journal,
 /* Send all the data buffers related to an inode */
 int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)
 {
-	if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
+	unsigned long flags;
+
+	if (!jinode)
+		return 0;
+
+	flags = READ_ONCE(jinode->i_flags);
+	if (!(flags & JI_WRITE_DATA))
 		return 0;
 
 	trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
@@ -191,12 +197,35 @@ EXPORT_SYMBOL(jbd2_submit_inode_data);
 
 int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
 {
-	if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
-		!jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
+	struct address_space *mapping;
+	struct inode *inode;
+	unsigned long flags;
+	pgoff_t start, end;
+	loff_t start_byte, end_byte;
+
+	if (!jinode)
+		return 0;
+
+	flags = READ_ONCE(jinode->i_flags);
+	if (!(flags & JI_WAIT_DATA))
+		return 0;
+
+	inode = jinode->i_vfs_inode;
+	if (!inode)
+		return 0;
+
+	mapping = inode->i_mapping;
+	start = READ_ONCE(jinode->i_dirty_start);
+	end = READ_ONCE(jinode->i_dirty_end);
+	if (end == JBD2_INODE_DIRTY_RANGE_NONE)
+		return 0;
+	start_byte = (loff_t)start << PAGE_SHIFT;
+	end_byte = ((loff_t)end << PAGE_SHIFT) + PAGE_SIZE - 1;
+
+	if (!mapping)
 		return 0;
 	return filemap_fdatawait_range_keep_errors(
-		jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
-		jinode->i_dirty_end);
+		mapping, start_byte, end_byte);
 }
 EXPORT_SYMBOL(jbd2_wait_inode_data);
 
@@ -218,7 +247,8 @@ static int journal_submit_data_buffers(journal_t *journal,
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
 		if (!(jinode->i_flags & JI_WRITE_DATA))
 			continue;
-		jinode->i_flags |= JI_COMMIT_RUNNING;
+		WRITE_ONCE(jinode->i_flags,
+			   jinode->i_flags | JI_COMMIT_RUNNING);
 		spin_unlock(&journal->j_list_lock);
 		/* submit the inode data buffers. */
 		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
@@ -229,7 +259,8 @@ static int journal_submit_data_buffers(journal_t *journal,
 		}
 		spin_lock(&journal->j_list_lock);
 		J_ASSERT(jinode->i_transaction == commit_transaction);
-		jinode->i_flags &= ~JI_COMMIT_RUNNING;
+		WRITE_ONCE(jinode->i_flags,
+			   jinode->i_flags & ~JI_COMMIT_RUNNING);
 		smp_mb();
 		wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
 	}
@@ -240,10 +271,17 @@ static int journal_submit_data_buffers(journal_t *journal,
 int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
 {
 	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+	pgoff_t start = READ_ONCE(jinode->i_dirty_start);
+	pgoff_t end = READ_ONCE(jinode->i_dirty_end);
+	loff_t start_byte, end_byte;
+
+	if (end == JBD2_INODE_DIRTY_RANGE_NONE)
+		return 0;
+	start_byte = (loff_t)start << PAGE_SHIFT;
+	end_byte = ((loff_t)end << PAGE_SHIFT) + PAGE_SIZE - 1;
 
 	return filemap_fdatawait_range_keep_errors(mapping,
-						   jinode->i_dirty_start,
-						   jinode->i_dirty_end);
+						   start_byte, end_byte);
 }
 
 /*
@@ -262,7 +300,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
 		if (!(jinode->i_flags & JI_WAIT_DATA))
 			continue;
-		jinode->i_flags |= JI_COMMIT_RUNNING;
+		WRITE_ONCE(jinode->i_flags, jinode->i_flags | JI_COMMIT_RUNNING);
 		spin_unlock(&journal->j_list_lock);
 		/* wait for the inode data buffers writeout. */
 		if (journal->j_finish_inode_data_buffers) {
@@ -272,7 +310,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 		}
 		cond_resched();
 		spin_lock(&journal->j_list_lock);
-		jinode->i_flags &= ~JI_COMMIT_RUNNING;
+		WRITE_ONCE(jinode->i_flags, jinode->i_flags & ~JI_COMMIT_RUNNING);
 		smp_mb();
 		wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
 	}
@@ -288,8 +326,9 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 				&jinode->i_transaction->t_inode_list);
 		} else {
 			jinode->i_transaction = NULL;
-			jinode->i_dirty_start = 0;
-			jinode->i_dirty_end = 0;
+			WRITE_ONCE(jinode->i_dirty_start, 0);
+			WRITE_ONCE(jinode->i_dirty_end,
+				   JBD2_INODE_DIRTY_RANGE_NONE);
 		}
 	}
 	spin_unlock(&journal->j_list_lock);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c973162d5b31..9a7477c54dcb 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -3021,7 +3021,7 @@ void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode)
 	jinode->i_vfs_inode = inode;
 	jinode->i_flags = 0;
 	jinode->i_dirty_start = 0;
-	jinode->i_dirty_end = 0;
+	jinode->i_dirty_end = JBD2_INODE_DIRTY_RANGE_NONE;
 	INIT_LIST_HEAD(&jinode->i_list);
 }
 
@@ -3178,4 +3178,3 @@ MODULE_DESCRIPTION("Generic filesystem journal-writing module");
 MODULE_LICENSE("GPL");
 module_init(journal_init);
 module_exit(journal_exit);
-
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index dca4b5d8aaaa..bbe47be6c73c 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2646,6 +2646,7 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
 {
 	transaction_t *transaction = handle->h_transaction;
 	journal_t *journal;
+	pgoff_t start, end;
 
 	if (is_handle_aborted(handle))
 		return -EROFS;
@@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
 	jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
 			transaction->t_tid);
 
+	start = (pgoff_t)(start_byte >> PAGE_SHIFT);
+	end = (pgoff_t)(end_byte >> PAGE_SHIFT);
+
 	spin_lock(&journal->j_list_lock);
-	jinode->i_flags |= flags;
+	WRITE_ONCE(jinode->i_flags, jinode->i_flags | flags);
 
-	if (jinode->i_dirty_end) {
-		jinode->i_dirty_start = min(jinode->i_dirty_start, start_byte);
-		jinode->i_dirty_end = max(jinode->i_dirty_end, end_byte);
+	if (jinode->i_dirty_end != JBD2_INODE_DIRTY_RANGE_NONE) {
+		WRITE_ONCE(jinode->i_dirty_start,
+			   min(jinode->i_dirty_start, start));
+		WRITE_ONCE(jinode->i_dirty_end,
+			   max(jinode->i_dirty_end, end));
 	} else {
-		jinode->i_dirty_start = start_byte;
-		jinode->i_dirty_end = end_byte;
+		WRITE_ONCE(jinode->i_dirty_start, start);
+		WRITE_ONCE(jinode->i_dirty_end, end);
 	}
 
 	/* Is inode already attached where we need it? */
@@ -2739,7 +2745,7 @@ int jbd2_journal_begin_ordered_truncate(journal_t *journal,
 	int ret = 0;
 
 	/* This is a quick check to avoid locking if not necessary */
-	if (!jinode->i_transaction)
+	if (!READ_ONCE(jinode->i_transaction))
 		goto out;
 	/* Locks are here just to force reading of recent values, it is
 	 * enough that the transaction was not committing before we started
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a53a00d36228..81eb58ddc126 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -390,6 +390,8 @@ static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 /* Wait for outstanding data writes for this inode before commit */
 #define JI_WAIT_DATA (1 << __JI_WAIT_DATA)
 
+#define JBD2_INODE_DIRTY_RANGE_NONE	((pgoff_t)-1)
+
 /**
  * struct jbd2_inode - The jbd_inode type is the structure linking inodes in
  * ordered mode present in a transaction so that we can sync them during commit.
@@ -431,18 +433,23 @@ struct jbd2_inode {
 	/**
 	 * @i_dirty_start:
 	 *
-	 * Offset in bytes where the dirty range for this inode starts.
+	 * Dirty range start in PAGE_SIZE units.
+	 *
+	 * The dirty range is empty if @i_dirty_end is set to
+	 * %JBD2_INODE_DIRTY_RANGE_NONE.
+	 *
 	 * [j_list_lock]
 	 */
-	loff_t i_dirty_start;
+	pgoff_t i_dirty_start;
 
 	/**
 	 * @i_dirty_end:
 	 *
-	 * Inclusive offset in bytes where the dirty range for this inode
-	 * ends. [j_list_lock]
+	 * Dirty range end in PAGE_SIZE units (inclusive).
+	 *
+	 * [j_list_lock]
 	 */
-	loff_t i_dirty_end;
+	pgoff_t i_dirty_end;
 };
 
 struct jbd2_revoke_table_s;
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] ext4: use READ_ONCE for lockless jinode reads
  2026-02-19 11:46 [PATCH v2 0/3] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
  2026-02-19 11:46 ` [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
@ 2026-02-19 11:46 ` Li Chen
  2026-02-19 11:46 ` [PATCH v2 3/3] ocfs2: " Li Chen
  2 siblings, 0 replies; 7+ messages in thread
From: Li Chen @ 2026-02-19 11:46 UTC (permalink / raw)
  To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox, Andreas Dilger, linux-kernel
  Cc: Li Chen

ext4 journal commit callbacks access jbd2_inode fields such as
i_transaction and i_dirty_start/end without holding journal->j_list_lock.

Use READ_ONCE() for these reads to match the lockless access pattern.
Convert the dirty range from PAGE_SIZE units back to byte offsets before
passing it to writeback.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v1:
- Convert the jinode dirty range from PAGE_SIZE units (pgoff_t) back to byte
  offsets before passing it to writeback.

 fs/ext4/inode.c | 12 ++++++++++--
 fs/ext4/super.c | 19 ++++++++++++++-----
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d99296d7315f..5ec60580a2d0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3033,11 +3033,19 @@ static int ext4_writepages(struct address_space *mapping,
 
 int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode)
 {
+	pgoff_t dirty_start = READ_ONCE(jinode->i_dirty_start);
+	pgoff_t dirty_end = READ_ONCE(jinode->i_dirty_end);
+	loff_t range_start, range_end;
+
+	if (dirty_end == JBD2_INODE_DIRTY_RANGE_NONE)
+		return 0;
+	range_start = (loff_t)dirty_start << PAGE_SHIFT;
+	range_end = ((loff_t)dirty_end << PAGE_SHIFT) + PAGE_SIZE - 1;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = LONG_MAX,
-		.range_start = jinode->i_dirty_start,
-		.range_end = jinode->i_dirty_end,
+		.range_start = range_start,
+		.range_end = range_end,
 	};
 	struct mpage_da_data mpd = {
 		.inode = jinode->i_vfs_inode,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 69eb63dde983..27062c8ad60a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -521,6 +521,7 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode,
 {
 	struct buffer_head *bh, *head;
 	struct journal_head *jh;
+	transaction_t *trans = READ_ONCE(jinode->i_transaction);
 
 	bh = head = folio_buffers(folio);
 	do {
@@ -539,7 +540,7 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode,
 		 */
 		jh = bh2jh(bh);
 		if (buffer_dirty(bh) ||
-		    (jh && (jh->b_transaction != jinode->i_transaction ||
+		    (jh && (jh->b_transaction != trans ||
 			    jh->b_next_transaction)))
 			return true;
 	} while ((bh = bh->b_this_page) != head);
@@ -550,12 +551,20 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode,
 static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
 {
 	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+	pgoff_t dirty_start = READ_ONCE(jinode->i_dirty_start);
+	pgoff_t dirty_end = READ_ONCE(jinode->i_dirty_end);
+	loff_t range_start, range_end;
+
+	if (dirty_end == JBD2_INODE_DIRTY_RANGE_NONE)
+		return 0;
+	range_start = (loff_t)dirty_start << PAGE_SHIFT;
+	range_end = ((loff_t)dirty_end << PAGE_SHIFT) + PAGE_SIZE - 1;
 	struct writeback_control wbc = {
-		.sync_mode =  WB_SYNC_ALL,
+		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = LONG_MAX,
-		.range_start = jinode->i_dirty_start,
-		.range_end = jinode->i_dirty_end,
-        };
+		.range_start = range_start,
+		.range_end = range_end,
+	};
 	struct folio *folio = NULL;
 	int error;
 
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] ocfs2: use READ_ONCE for lockless jinode reads
  2026-02-19 11:46 [PATCH v2 0/3] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
  2026-02-19 11:46 ` [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
  2026-02-19 11:46 ` [PATCH v2 2/3] ext4: use READ_ONCE for lockless jinode reads Li Chen
@ 2026-02-19 11:46 ` Li Chen
  2 siblings, 0 replies; 7+ messages in thread
From: Li Chen @ 2026-02-19 11:46 UTC (permalink / raw)
  To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox, Joel Becker, Joseph Qi, linux-kernel
  Cc: Li Chen

ocfs2 journal commit callback reads jbd2_inode dirty range fields without
holding journal->j_list_lock.

Use READ_ONCE() for these reads to match the lockless access pattern.
Convert the dirty range from PAGE_SIZE units back to byte offsets before
passing it to writeback.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v1:
- Convert the jinode dirty range from PAGE_SIZE units (pgoff_t) back to byte
  offsets before passing it to writeback.

 fs/ocfs2/journal.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 85239807dec7..0b40383ebcdf 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -902,8 +902,17 @@ int ocfs2_journal_alloc(struct ocfs2_super *osb)
 
 static int ocfs2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
 {
-	return filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
-			jinode->i_dirty_start, jinode->i_dirty_end);
+	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+	pgoff_t dirty_start = READ_ONCE(jinode->i_dirty_start);
+	pgoff_t dirty_end = READ_ONCE(jinode->i_dirty_end);
+	loff_t start_byte, end_byte;
+
+	if (dirty_end == JBD2_INODE_DIRTY_RANGE_NONE)
+		return 0;
+	start_byte = (loff_t)dirty_start << PAGE_SHIFT;
+	end_byte = ((loff_t)dirty_end << PAGE_SHIFT) + PAGE_SIZE - 1;
+
+	return filemap_fdatawrite_range(mapping, start_byte, end_byte);
 }
 
 int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units
  2026-02-19 11:46 ` [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
@ 2026-02-19 21:00   ` Andreas Dilger
  2026-02-20  6:43     ` Li Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2026-02-19 21:00 UTC (permalink / raw)
  To: Li Chen
  Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox, Jan Kara, linux-kernel

On Feb 19, 2026, at 04:46, Li Chen <me@linux.beauty> wrote:
> 
> jbd2_inode fields are updated under journal->j_list_lock, but some paths
> read them without holding the lock (e.g. fast commit helpers and ordered
> truncate helpers).
> 
> READ_ONCE() alone is not sufficient for i_dirty_start/end as they are
> loff_t and 32-bit platforms can observe torn loads. Store the dirty range
> in PAGE_SIZE units as pgoff_t so lockless readers can take non-torn
> snapshots.

When making semantic changes like this, it is best to change the variable
names as well, so that breaks compilation if bisection happens to land
between these patches.  Otherwise, that could cause some random behavior
if jbd2 is treating these as pages, but ext4/ocfs2 are treating them as
bytes or vice versa.

Something like i_dirty_{start,end} -> i_dirty_{start,end}_page would make
this very clear what the units are.

To avoid breakage between the patches (which is desirable to avoid problems
with automated bisection) you should make an initial patch with wrappers to
access these values and convert ext4/ocfs2 to use them:

static inline loff_t jbd2_jinode_dirty_start(struct jbd2_inode *jinode)
{
	return jinode->i_dirty_start;
}

static inline loff_t jbd2_jinode_dirty_end(struct jbd2_inode *jinode)
{
	return jinode->i_dirty_end;
}

then change this in the jbd2 patch at the end, which would then be self-contained:

static inline loff_t jbd2_jinode_dirty_start(struct jbd2_inode *jinode)
{
	return (loff_t)jinode->i_dirty_start_page << PAGE_SHIFT;
}

static inline loff_t jbd2_jinode_dirty_end(struct jbd2_inode *jinode)
{
	return ((loff_t)jinode->i_dirty_end_page << PAGE_SHIFT) + ~PAGE_MASK;
}


Cheers, Andreas

> Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for
> the dirty range and i_flags to match the existing lockless access pattern.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Li Chen <me@linux.beauty>
> ---
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units
  2026-02-19 21:00   ` Andreas Dilger
@ 2026-02-20  6:43     ` Li Chen
  2026-02-23  1:11       ` Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Li Chen @ 2026-02-20  6:43 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox, Jan Kara, linux-kernel

Hi Andreas,

Thanks a lot for your review!

 ---- On Fri, 20 Feb 2026 05:00:13 +0800  Andreas Dilger <adilger@dilger.ca> wrote --- 
 > On Feb 19, 2026, at 04:46, Li Chen <me@linux.beauty> wrote:
 > > 
 > > jbd2_inode fields are updated under journal->j_list_lock, but some paths
 > > read them without holding the lock (e.g. fast commit helpers and ordered
 > > truncate helpers).
 > > 
 > > READ_ONCE() alone is not sufficient for i_dirty_start/end as they are
 > > loff_t and 32-bit platforms can observe torn loads. Store the dirty range
 > > in PAGE_SIZE units as pgoff_t so lockless readers can take non-torn
 > > snapshots.
 > 
 > When making semantic changes like this, it is best to change the variable
 > names as well, so that breaks compilation if bisection happens to land
 > between these patches.  Otherwise, that could cause some random behavior
 > if jbd2 is treating these as pages, but ext4/ocfs2 are treating them as
 > bytes or vice versa.
 > 
 > Something like i_dirty_{start,end} -> i_dirty_{start,end}_page would make
 > this very clear what the units are.

Agreed. I’ll make the units explicit in the field names (e.g. *_page).

 > To avoid breakage between the patches (which is desirable to avoid problems
 > with automated bisection) you should make an initial patch with wrappers to
 > access these values and convert ext4/ocfs2 to use them:
 > 
 > static inline loff_t jbd2_jinode_dirty_start(struct jbd2_inode *jinode)
 > {
 >     return jinode->i_dirty_start;
 > }
 > 
 > static inline loff_t jbd2_jinode_dirty_end(struct jbd2_inode *jinode)
 > {
 >     return jinode->i_dirty_end;
 > }
 > 
 > then change this in the jbd2 patch at the end, which would then be self-contained:
 > 
 > static inline loff_t jbd2_jinode_dirty_start(struct jbd2_inode *jinode)
 > {
 >     return (loff_t)jinode->i_dirty_start_page << PAGE_SHIFT;
 > }
 > 
 > static inline loff_t jbd2_jinode_dirty_end(struct jbd2_inode *jinode)
 > {
 >     return ((loff_t)jinode->i_dirty_end_page << PAGE_SHIFT) + ~PAGE_MASK;
 > }


Agreed as well. I’ll add an accessor and switch ext4/ocfs2 over to it first,
Then do the internal representation change later.

I plan to use a single helper that returns the (start,end) pair in
bytes:

static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode,
loff_t *start, loff_t *end)
{
    pgoff_t start_page = READ_ONCE(jinode->i_dirty_start_page);
    pgoff_t end_page = READ_ONCE(jinode->i_dirty_end_page);

    if (end_page == JBD2_INODE_DIRTY_RANGE_NONE)
      return false;

    *start = (loff_t)start_page << PAGE_SHIFT;
    *end = ((loff_t)end_page << PAGE_SHIFT) + PAGE_SIZE - 1;
    return true;

}

I think this is a bit easier to use correctly than separate start/end helpers
(keeps start/end together, and the end-of-page conversion lives in one place).

Does that sound OK, or would you rather see separate
jbd2_jinode_dirty_start()/jbd2_jinode_dirty_end() helpers?

Regards,
Li​


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units
  2026-02-20  6:43     ` Li Chen
@ 2026-02-23  1:11       ` Andreas Dilger
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Dilger @ 2026-02-23  1:11 UTC (permalink / raw)
  To: Li Chen
  Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
	Matthew Wilcox, Jan Kara, linux-kernel

On Feb 19, 2026, at 23:43, Li Chen <me@linux.beauty> wrote:
> 
> Hi Andreas,
> 
> Thanks a lot for your review!
> 
> ---- On Fri, 20 Feb 2026 05:00:13 +0800  Andreas Dilger <adilger@dilger.ca> wrote ---
>> On Feb 19, 2026, at 04:46, Li Chen <me@linux.beauty> wrote:
>>> 
>>> jbd2_inode fields are updated under journal->j_list_lock, but some paths
>>> read them without holding the lock (e.g. fast commit helpers and ordered
>>> truncate helpers).
>>> 
>>> READ_ONCE() alone is not sufficient for i_dirty_start/end as they are
>>> loff_t and 32-bit platforms can observe torn loads. Store the dirty range
>>> in PAGE_SIZE units as pgoff_t so lockless readers can take non-torn
>>> snapshots.
>> 
>> When making semantic changes like this, it is best to change the variable
>> names as well, so that breaks compilation if bisection happens to land
>> between these patches.  Otherwise, that could cause some random behavior
>> if jbd2 is treating these as pages, but ext4/ocfs2 are treating them as
>> bytes or vice versa.
>> 
>> Something like i_dirty_{start,end} -> i_dirty_{start,end}_page would make
>> this very clear what the units are.
> 
> Agreed. I’ll make the units explicit in the field names (e.g. *_page).
> 
>> To avoid breakage between the patches (which is desirable to avoid problems
>> with automated bisection) you should make an initial patch with wrappers to
>> access these values and convert ext4/ocfs2 to use them:
>> 
>> static inline loff_t jbd2_jinode_dirty_start(struct jbd2_inode *jinode)
>> {
>>    return jinode->i_dirty_start;
>> }
>> 
>> static inline loff_t jbd2_jinode_dirty_end(struct jbd2_inode *jinode)
>> {
>>    return jinode->i_dirty_end;
>> }
>> 
>> then change this in the jbd2 patch at the end, which would then be self-contained:
>> 
>> static inline loff_t jbd2_jinode_dirty_start(struct jbd2_inode *jinode)
>> {
>>    return (loff_t)jinode->i_dirty_start_page << PAGE_SHIFT;
>> }
>> 
>> static inline loff_t jbd2_jinode_dirty_end(struct jbd2_inode *jinode)
>> {
>>    return ((loff_t)jinode->i_dirty_end_page << PAGE_SHIFT) + ~PAGE_MASK;
>> }
> 
> 
> Agreed as well. I’ll add an accessor and switch ext4/ocfs2 over to it first,
> Then do the internal representation change later.
> 
> I plan to use a single helper that returns the (start,end) pair in
> bytes:
> 
> static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode,
> loff_t *start, loff_t *end)
> {
>    pgoff_t start_page = READ_ONCE(jinode->i_dirty_start_page);
>    pgoff_t end_page = READ_ONCE(jinode->i_dirty_end_page);
> 
>    if (end_page == JBD2_INODE_DIRTY_RANGE_NONE)
>      return false;
> 
>    *start = (loff_t)start_page << PAGE_SHIFT;
>    *end = ((loff_t)end_page << PAGE_SHIFT) + PAGE_SIZE - 1;
>    return true;
> 
> }
> 
> I think this is a bit easier to use correctly than separate start/end helpers
> (keeps start/end together, and the end-of-page conversion lives in one place).
> 
> Does that sound OK, or would you rather see separate
> jbd2_jinode_dirty_start()/jbd2_jinode_dirty_end() helpers?

This is fine with me.  I had only proposed the other option as a "typical"
interface for such fields.  If start/end are always used together then it
is fine that there is only one function to get these fields, and one to
set them.

Cheers, Andreas





^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-02-23  1:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-19 11:46 [PATCH v2 0/3] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
2026-02-19 11:46 ` [PATCH v2 1/3] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
2026-02-19 21:00   ` Andreas Dilger
2026-02-20  6:43     ` Li Chen
2026-02-23  1:11       ` Andreas Dilger
2026-02-19 11:46 ` [PATCH v2 2/3] ext4: use READ_ONCE for lockless jinode reads Li Chen
2026-02-19 11:46 ` [PATCH v2 3/3] ocfs2: " Li Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox