* [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range
@ 2026-02-24 9:24 Li Chen
2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw)
To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel
Cc: Andreas Dilger, Joel Becker, Joseph Qi, linux-kernel
This series makes the jbd2_inode dirty range tracking safe for lockless
reads in jbd2 and filesystem callbacks used by ext4 and ocfs2.
Some paths access jinode fields without holding journal->j_list_lock
(e.g. fast commit helpers and ordered truncate helpers). v1 used READ_ONCE()
on i_dirty_start/end, but Matthew pointed out that loff_t can be torn on
32-bit platforms, and Jan suggested storing the dirty range in PAGE_SIZE
units as pgoff_t.
With this series, jbd2 stores the dirty range as page indexes and uses
READ_ONCE()/WRITE_ONCE() for lockless access. ext4 and ocfs2 use the new
jbd2_jinode_get_dirty_range() accessor which converts the page-based range
back to byte offsets for writeback.
This is based on Jan's suggestion in the review of the ext4 jinode
publication race fix. [1]
Changes since v2:
- Add jbd2_jinode_get_dirty_range() accessor and convert ext4/ocfs2 to use it
before switching the underlying representation (per Andreas).
- Rename the dirty range fields to i_dirty_start_page/end_page to make the
PAGE_SIZE units explicit and avoid silent unit mismatches when bisecting.
Changes since v1:
- Store i_dirty_start/end in PAGE_SIZE units (pgoff_t) to avoid torn loads on
32-bit (pointed out by Matthew, suggested by Jan).
- Use WRITE_ONCE() for i_dirty_* / i_flags updates in jbd2 (per Jan).
- Drop pointless READ_ONCE() on i_vfs_inode in jbd2_wait_inode_data (per Jan).
- Convert ext4/ocfs2 callbacks to translate page range to byte offsets.
[1]: https://lore.kernel.org/all/4jxwogttddiaoqbstlgou5ox6zs27ngjjz5ukrxafm2z5ijxod@so4eqnykiegj/
v2: https://lore.kernel.org/all/20260219114645.778338-1-me@linux.beauty/
v1: https://lore.kernel.org/all/20260130031232.60780-1-me@linux.beauty/
Li Chen (4):
jbd2: add jinode dirty range accessors
ext4: use jbd2 jinode dirty range accessor
ocfs2: use jbd2 jinode dirty range accessor
jbd2: store jinode dirty range in PAGE_SIZE units
fs/ext4/inode.c | 10 ++++++--
fs/ext4/super.c | 16 +++++++++----
fs/jbd2/commit.c | 56 +++++++++++++++++++++++++++++++++----------
fs/jbd2/journal.c | 5 ++--
fs/jbd2/transaction.c | 20 ++++++++++------
fs/ocfs2/journal.c | 9 +++++--
include/linux/jbd2.h | 35 +++++++++++++++++++++------
7 files changed, 112 insertions(+), 39 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/4] jbd2: add jinode dirty range accessors
2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
@ 2026-02-24 9:24 ` Li Chen
2026-03-02 18:07 ` Jan Kara
2026-02-24 9:24 ` [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor Li Chen
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw)
To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Jan Kara, linux-kernel
Cc: Andreas Dilger, Li Chen
Provide a helper to fetch jinode dirty ranges in bytes. This lets
filesystem callbacks avoid depending on the internal representation,
preparing for a later conversion to page units.
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v2:
- New patch: add jbd2_jinode_get_dirty_range() helper.
include/linux/jbd2.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a53a00d36228c..64392baf5f4b4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -445,6 +445,20 @@ struct jbd2_inode {
loff_t i_dirty_end;
};
+static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode,
+ loff_t *start, loff_t *end)
+{
+ loff_t start_byte = jinode->i_dirty_start;
+ loff_t end_byte = jinode->i_dirty_end;
+
+ if (!end_byte)
+ return false;
+
+ *start = start_byte;
+ *end = end_byte;
+ return true;
+}
+
struct jbd2_revoke_table_s;
/**
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor
2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen
@ 2026-02-24 9:24 ` Li Chen
2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen
2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
3 siblings, 0 replies; 10+ messages in thread
From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw)
To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Andreas Dilger, linux-kernel
Cc: Li Chen
ext4 journal commit callbacks access jbd2_inode dirty range fields without
holding journal->j_list_lock.
Use jbd2_jinode_get_dirty_range() to get the range in bytes, and read
i_transaction with READ_ONCE() in the redirty check.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v2:
- Use jbd2_jinode_get_dirty_range() instead of direct i_dirty_* reads.
- Drop per-caller page->byte conversion (now handled by the accessor).
fs/ext4/inode.c | 10 ++++++++--
fs/ext4/super.c | 16 +++++++++++-----
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index da96db5f23450..f87d35bda9276 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3031,17 +3031,23 @@ static int ext4_writepages(struct address_space *mapping,
int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
+ loff_t range_start, range_end;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
- .range_start = jinode->i_dirty_start,
- .range_end = jinode->i_dirty_end,
};
struct mpage_da_data mpd = {
.inode = jinode->i_vfs_inode,
.wbc = &wbc,
.can_map = 0,
};
+
+ if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end))
+ return 0;
+
+ wbc.range_start = range_start;
+ wbc.range_end = range_end;
+
return ext4_do_writepages(&mpd);
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 69eb63dde9839..685951cd58394 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -521,6 +521,7 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode,
{
struct buffer_head *bh, *head;
struct journal_head *jh;
+ transaction_t *trans = READ_ONCE(jinode->i_transaction);
bh = head = folio_buffers(folio);
do {
@@ -539,7 +540,7 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode,
*/
jh = bh2jh(bh);
if (buffer_dirty(bh) ||
- (jh && (jh->b_transaction != jinode->i_transaction ||
+ (jh && (jh->b_transaction != trans ||
jh->b_next_transaction)))
return true;
} while ((bh = bh->b_this_page) != head);
@@ -550,15 +551,20 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode,
static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+ loff_t range_start, range_end;
struct writeback_control wbc = {
- .sync_mode = WB_SYNC_ALL,
+ .sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
- .range_start = jinode->i_dirty_start,
- .range_end = jinode->i_dirty_end,
- };
+ };
struct folio *folio = NULL;
int error;
+ if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end))
+ return 0;
+
+ wbc.range_start = range_start;
+ wbc.range_end = range_end;
+
/*
* writeback_iter() already checks for dirty pages and calls
* folio_clear_dirty_for_io(), which we want to write protect the
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 3/4] ocfs2: use jbd2 jinode dirty range accessor
2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen
2026-02-24 9:24 ` [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor Li Chen
@ 2026-02-24 9:24 ` Li Chen
2026-03-02 18:07 ` Jan Kara
2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
3 siblings, 1 reply; 10+ messages in thread
From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw)
To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Joel Becker, Joseph Qi, linux-kernel
Cc: Li Chen
ocfs2 journal commit callback reads jbd2_inode dirty range fields without
holding journal->j_list_lock.
Use jbd2_jinode_get_dirty_range() to get the range in bytes.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v2:
- Use jbd2_jinode_get_dirty_range() instead of direct i_dirty_* reads.
- Drop per-caller page->byte conversion (now handled by the accessor).
fs/ocfs2/journal.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 85239807dec78..68c2f567e6e1b 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -902,8 +902,13 @@ int ocfs2_journal_alloc(struct ocfs2_super *osb)
static int ocfs2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
- return filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
- jinode->i_dirty_start, jinode->i_dirty_end);
+ struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+ loff_t range_start, range_end;
+
+ if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end))
+ return 0;
+
+ return filemap_fdatawrite_range(mapping, range_start, range_end);
}
int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units
2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
` (2 preceding siblings ...)
2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen
@ 2026-02-24 9:24 ` Li Chen
2026-03-02 18:27 ` Jan Kara
3 siblings, 1 reply; 10+ messages in thread
From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw)
To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Jan Kara, linux-kernel
Cc: Li Chen
jbd2_inode fields are updated under journal->j_list_lock, but some paths
read them without holding the lock (e.g. fast commit helpers and ordered
truncate helpers).
READ_ONCE() alone is not sufficient for the dirty range fields when they
are stored as loff_t because 32-bit platforms can observe torn loads.
Store the dirty range in PAGE_SIZE units as pgoff_t instead.
Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the
dirty range and i_flags to match the existing lockless access pattern.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes since v2:
- Rename i_dirty_start/end to i_dirty_start_page/end_page.
- Use jbd2_jinode_get_dirty_range() for byte conversions in commit paths.
fs/jbd2/commit.c | 56 +++++++++++++++++++++++++++++++++----------
fs/jbd2/journal.c | 5 ++--
fs/jbd2/transaction.c | 20 ++++++++++------
include/linux/jbd2.h | 31 ++++++++++++++----------
4 files changed, 77 insertions(+), 35 deletions(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 7203d2d2624d7..514f204aa1db1 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -180,7 +180,13 @@ static int journal_wait_on_commit_record(journal_t *journal,
/* Send all the data buffers related to an inode */
int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)
{
- if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
+ unsigned long flags;
+
+ if (!jinode)
+ return 0;
+
+ flags = READ_ONCE(jinode->i_flags);
+ if (!(flags & JI_WRITE_DATA))
return 0;
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
@@ -191,12 +197,30 @@ EXPORT_SYMBOL(jbd2_submit_inode_data);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
{
- if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
- !jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
+ struct address_space *mapping;
+ struct inode *inode;
+ unsigned long flags;
+ loff_t start_byte, end_byte;
+
+ if (!jinode)
+ return 0;
+
+ flags = READ_ONCE(jinode->i_flags);
+ if (!(flags & JI_WAIT_DATA))
+ return 0;
+
+ inode = jinode->i_vfs_inode;
+ if (!inode)
+ return 0;
+
+ mapping = inode->i_mapping;
+ if (!mapping)
+ return 0;
+
+ if (!jbd2_jinode_get_dirty_range(jinode, &start_byte, &end_byte))
return 0;
return filemap_fdatawait_range_keep_errors(
- jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
- jinode->i_dirty_end);
+ mapping, start_byte, end_byte);
}
EXPORT_SYMBOL(jbd2_wait_inode_data);
@@ -218,7 +242,8 @@ static int journal_submit_data_buffers(journal_t *journal,
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
if (!(jinode->i_flags & JI_WRITE_DATA))
continue;
- jinode->i_flags |= JI_COMMIT_RUNNING;
+ WRITE_ONCE(jinode->i_flags,
+ jinode->i_flags | JI_COMMIT_RUNNING);
spin_unlock(&journal->j_list_lock);
/* submit the inode data buffers. */
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
@@ -229,7 +254,8 @@ static int journal_submit_data_buffers(journal_t *journal,
}
spin_lock(&journal->j_list_lock);
J_ASSERT(jinode->i_transaction == commit_transaction);
- jinode->i_flags &= ~JI_COMMIT_RUNNING;
+ WRITE_ONCE(jinode->i_flags,
+ jinode->i_flags & ~JI_COMMIT_RUNNING);
smp_mb();
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
}
@@ -240,10 +266,13 @@ static int journal_submit_data_buffers(journal_t *journal,
int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
{
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+ loff_t start_byte, end_byte;
+
+ if (!jbd2_jinode_get_dirty_range(jinode, &start_byte, &end_byte))
+ return 0;
return filemap_fdatawait_range_keep_errors(mapping,
- jinode->i_dirty_start,
- jinode->i_dirty_end);
+ start_byte, end_byte);
}
/*
@@ -262,7 +291,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
if (!(jinode->i_flags & JI_WAIT_DATA))
continue;
- jinode->i_flags |= JI_COMMIT_RUNNING;
+ WRITE_ONCE(jinode->i_flags, jinode->i_flags | JI_COMMIT_RUNNING);
spin_unlock(&journal->j_list_lock);
/* wait for the inode data buffers writeout. */
if (journal->j_finish_inode_data_buffers) {
@@ -272,7 +301,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
}
cond_resched();
spin_lock(&journal->j_list_lock);
- jinode->i_flags &= ~JI_COMMIT_RUNNING;
+ WRITE_ONCE(jinode->i_flags, jinode->i_flags & ~JI_COMMIT_RUNNING);
smp_mb();
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
}
@@ -288,8 +317,9 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
&jinode->i_transaction->t_inode_list);
} else {
jinode->i_transaction = NULL;
- jinode->i_dirty_start = 0;
- jinode->i_dirty_end = 0;
+ WRITE_ONCE(jinode->i_dirty_start_page, 0);
+ WRITE_ONCE(jinode->i_dirty_end_page,
+ JBD2_INODE_DIRTY_RANGE_NONE);
}
}
spin_unlock(&journal->j_list_lock);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c973162d5b316..eb26c3088a164 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -3020,8 +3020,8 @@ void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode)
jinode->i_next_transaction = NULL;
jinode->i_vfs_inode = inode;
jinode->i_flags = 0;
- jinode->i_dirty_start = 0;
- jinode->i_dirty_end = 0;
+ jinode->i_dirty_start_page = 0;
+ jinode->i_dirty_end_page = JBD2_INODE_DIRTY_RANGE_NONE;
INIT_LIST_HEAD(&jinode->i_list);
}
@@ -3178,4 +3178,3 @@ MODULE_DESCRIPTION("Generic filesystem journal-writing module");
MODULE_LICENSE("GPL");
module_init(journal_init);
module_exit(journal_exit);
-
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index dca4b5d8aaaa3..f5226b6d47d24 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2646,6 +2646,7 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
{
transaction_t *transaction = handle->h_transaction;
journal_t *journal;
+ pgoff_t start_page, end_page;
if (is_handle_aborted(handle))
return -EROFS;
@@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
transaction->t_tid);
+ start_page = (pgoff_t)(start_byte >> PAGE_SHIFT);
+ end_page = (pgoff_t)(end_byte >> PAGE_SHIFT);
+
spin_lock(&journal->j_list_lock);
- jinode->i_flags |= flags;
+ WRITE_ONCE(jinode->i_flags, jinode->i_flags | flags);
- if (jinode->i_dirty_end) {
- jinode->i_dirty_start = min(jinode->i_dirty_start, start_byte);
- jinode->i_dirty_end = max(jinode->i_dirty_end, end_byte);
+ if (jinode->i_dirty_end_page != JBD2_INODE_DIRTY_RANGE_NONE) {
+ WRITE_ONCE(jinode->i_dirty_start_page,
+ min(jinode->i_dirty_start_page, start_page));
+ WRITE_ONCE(jinode->i_dirty_end_page,
+ max(jinode->i_dirty_end_page, end_page));
} else {
- jinode->i_dirty_start = start_byte;
- jinode->i_dirty_end = end_byte;
+ WRITE_ONCE(jinode->i_dirty_start_page, start_page);
+ WRITE_ONCE(jinode->i_dirty_end_page, end_page);
}
/* Is inode already attached where we need it? */
@@ -2739,7 +2745,7 @@ int jbd2_journal_begin_ordered_truncate(journal_t *journal,
int ret = 0;
/* This is a quick check to avoid locking if not necessary */
- if (!jinode->i_transaction)
+ if (!READ_ONCE(jinode->i_transaction))
goto out;
/* Locks are here just to force reading of recent values, it is
* enough that the transaction was not committing before we started
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 64392baf5f4b4..4fffbd13d38d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -390,6 +390,8 @@ static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
/* Wait for outstanding data writes for this inode before commit */
#define JI_WAIT_DATA (1 << __JI_WAIT_DATA)
+#define JBD2_INODE_DIRTY_RANGE_NONE ((pgoff_t)-1)
+
/**
* struct jbd2_inode - The jbd_inode type is the structure linking inodes in
* ordered mode present in a transaction so that we can sync them during commit.
@@ -429,33 +431,38 @@ struct jbd2_inode {
unsigned long i_flags;
/**
- * @i_dirty_start:
+ * @i_dirty_start_page:
+ *
+ * Dirty range start in PAGE_SIZE units.
+ *
+ * The dirty range is empty if @i_dirty_end_page is set to
+ * %JBD2_INODE_DIRTY_RANGE_NONE.
*
- * Offset in bytes where the dirty range for this inode starts.
* [j_list_lock]
*/
- loff_t i_dirty_start;
+ pgoff_t i_dirty_start_page;
/**
- * @i_dirty_end:
+ * @i_dirty_end_page:
*
- * Inclusive offset in bytes where the dirty range for this inode
- * ends. [j_list_lock]
+ * Dirty range end in PAGE_SIZE units (inclusive).
+ *
+ * [j_list_lock]
*/
- loff_t i_dirty_end;
+ pgoff_t i_dirty_end_page;
};
static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode,
loff_t *start, loff_t *end)
{
- loff_t start_byte = jinode->i_dirty_start;
- loff_t end_byte = jinode->i_dirty_end;
+ pgoff_t start_page = READ_ONCE(jinode->i_dirty_start_page);
+ pgoff_t end_page = READ_ONCE(jinode->i_dirty_end_page);
- if (!end_byte)
+ if (end_page == JBD2_INODE_DIRTY_RANGE_NONE)
return false;
- *start = start_byte;
- *end = end_byte;
+ *start = (loff_t)start_page << PAGE_SHIFT;
+ *end = ((loff_t)end_page << PAGE_SHIFT) + PAGE_SIZE - 1;
return true;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/4] ocfs2: use jbd2 jinode dirty range accessor
2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen
@ 2026-03-02 18:07 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2026-03-02 18:07 UTC (permalink / raw)
To: Li Chen
Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Joel Becker, Joseph Qi, linux-kernel
On Tue 24-02-26 17:24:32, Li Chen wrote:
> ocfs2 journal commit callback reads jbd2_inode dirty range fields without
> holding journal->j_list_lock.
> Use jbd2_jinode_get_dirty_range() to get the range in bytes.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Li Chen <me@linux.beauty>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> Changes since v2:
> - Use jbd2_jinode_get_dirty_range() instead of direct i_dirty_* reads.
> - Drop per-caller page->byte conversion (now handled by the accessor).
>
> fs/ocfs2/journal.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index 85239807dec78..68c2f567e6e1b 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -902,8 +902,13 @@ int ocfs2_journal_alloc(struct ocfs2_super *osb)
>
> static int ocfs2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
> {
> - return filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
> - jinode->i_dirty_start, jinode->i_dirty_end);
> + struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
> + loff_t range_start, range_end;
> +
> + if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end))
> + return 0;
> +
> + return filemap_fdatawrite_range(mapping, range_start, range_end);
> }
>
> int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
> --
> 2.52.0
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/4] jbd2: add jinode dirty range accessors
2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen
@ 2026-03-02 18:07 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2026-03-02 18:07 UTC (permalink / raw)
To: Li Chen
Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Jan Kara, linux-kernel, Andreas Dilger
On Tue 24-02-26 17:24:30, Li Chen wrote:
> Provide a helper to fetch jinode dirty ranges in bytes. This lets
> filesystem callbacks avoid depending on the internal representation,
> preparing for a later conversion to page units.
>
> Suggested-by: Andreas Dilger <adilger@dilger.ca>
> Signed-off-by: Li Chen <me@linux.beauty>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> Changes since v2:
> - New patch: add jbd2_jinode_get_dirty_range() helper.
>
> include/linux/jbd2.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index a53a00d36228c..64392baf5f4b4 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -445,6 +445,20 @@ struct jbd2_inode {
> loff_t i_dirty_end;
> };
>
> +static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode,
> + loff_t *start, loff_t *end)
> +{
> + loff_t start_byte = jinode->i_dirty_start;
> + loff_t end_byte = jinode->i_dirty_end;
> +
> + if (!end_byte)
> + return false;
> +
> + *start = start_byte;
> + *end = end_byte;
> + return true;
> +}
> +
> struct jbd2_revoke_table_s;
>
> /**
> --
> 2.52.0
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units
2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
@ 2026-03-02 18:27 ` Jan Kara
2026-03-03 9:01 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2026-03-02 18:27 UTC (permalink / raw)
To: Li Chen
Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Jan Kara, linux-kernel
On Tue 24-02-26 17:24:33, Li Chen wrote:
> jbd2_inode fields are updated under journal->j_list_lock, but some paths
> read them without holding the lock (e.g. fast commit helpers and ordered
> truncate helpers).
>
> READ_ONCE() alone is not sufficient for the dirty range fields when they
> are stored as loff_t because 32-bit platforms can observe torn loads.
> Store the dirty range in PAGE_SIZE units as pgoff_t instead.
>
> Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the
> dirty range and i_flags to match the existing lockless access pattern.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Li Chen <me@linux.beauty>
...
> @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
> jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
> transaction->t_tid);
>
> + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT);
> + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT);
MAX_LFS_SIZE on 32-bit is ULONG_MAX << PAGE_SHIFT and that's maximum file
size. So we could do here end_page = DIV_ROUND_UP(end_byte, PAGE_SIZE) and
just use end_page as exclusive end of a range to flush and get rid of
special JBD2_INODE_DIRTY_RANGE_NONE value.
The problem with the scheme you use is that files of MAX_LFS_SIZE would be
treated as having empty flush range...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units
2026-03-02 18:27 ` Jan Kara
@ 2026-03-03 9:01 ` Jan Kara
2026-03-06 5:44 ` Li Chen
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2026-03-03 9:01 UTC (permalink / raw)
To: Li Chen
Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel,
Jan Kara, linux-kernel
On Mon 02-03-26 19:27:13, Jan Kara wrote:
> On Tue 24-02-26 17:24:33, Li Chen wrote:
> > jbd2_inode fields are updated under journal->j_list_lock, but some paths
> > read them without holding the lock (e.g. fast commit helpers and ordered
> > truncate helpers).
> >
> > READ_ONCE() alone is not sufficient for the dirty range fields when they
> > are stored as loff_t because 32-bit platforms can observe torn loads.
> > Store the dirty range in PAGE_SIZE units as pgoff_t instead.
> >
> > Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the
> > dirty range and i_flags to match the existing lockless access pattern.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Li Chen <me@linux.beauty>
> ...
> > @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
> > jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
> > transaction->t_tid);
> >
> > + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT);
> > + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT);
>
> MAX_LFS_SIZE on 32-bit is ULONG_MAX << PAGE_SHIFT and that's maximum file
> size. So we could do here end_page = DIV_ROUND_UP(end_byte, PAGE_SIZE) and
> just use end_page as exclusive end of a range to flush and get rid of
> special JBD2_INODE_DIRTY_RANGE_NONE value.
>
> The problem with the scheme you use is that files of MAX_LFS_SIZE would be
> treated as having empty flush range...
Or perhaps even easier might be to set start_page to ULONG_MAX and end_page
to 0 and use start_page > end_page as an indication of empty range.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units
2026-03-03 9:01 ` Jan Kara
@ 2026-03-06 5:44 ` Li Chen
0 siblings, 0 replies; 10+ messages in thread
From: Li Chen @ 2026-03-06 5:44 UTC (permalink / raw)
To: Jan Kara
Cc: Theodore Ts'o, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara,
linux-kernel
Hi Jan,
---- On Tue, 03 Mar 2026 17:01:28 +0800 Jan Kara <jack@suse.cz> wrote ---
> On Mon 02-03-26 19:27:13, Jan Kara wrote:
> > On Tue 24-02-26 17:24:33, Li Chen wrote:
> > > jbd2_inode fields are updated under journal->j_list_lock, but some paths
> > > read them without holding the lock (e.g. fast commit helpers and ordered
> > > truncate helpers).
> > >
> > > READ_ONCE() alone is not sufficient for the dirty range fields when they
> > > are stored as loff_t because 32-bit platforms can observe torn loads.
> > > Store the dirty range in PAGE_SIZE units as pgoff_t instead.
> > >
> > > Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the
> > > dirty range and i_flags to match the existing lockless access pattern.
> > >
> > > Suggested-by: Jan Kara <jack@suse.cz>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Li Chen <me@linux.beauty>
> > ...
> > > @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
> > > jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
> > > transaction->t_tid);
> > >
> > > + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT);
> > > + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT);
> >
> > MAX_LFS_SIZE on 32-bit is ULONG_MAX << PAGE_SHIFT and that's maximum file
> > size. So we could do here end_page = DIV_ROUND_UP(end_byte, PAGE_SIZE) and
> > just use end_page as exclusive end of a range to flush and get rid of
> > special JBD2_INODE_DIRTY_RANGE_NONE value.
> >
> > The problem with the scheme you use is that files of MAX_LFS_SIZE would be
> > treated as having empty flush range...
>
> Or perhaps even easier might be to set start_page to ULONG_MAX and end_page
> to 0 and use start_page > end_page as an indication of empty range.
Great idea! Thanks for the detailed review.
I will update the range handling to avoid the sentinel collision.
Regards,
Li
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-03-06 5:44 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen
2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen
2026-03-02 18:07 ` Jan Kara
2026-02-24 9:24 ` [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor Li Chen
2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen
2026-03-02 18:07 ` Jan Kara
2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen
2026-03-02 18:27 ` Jan Kara
2026-03-03 9:01 ` Jan Kara
2026-03-06 5:44 ` Li Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox