* [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range
@ 2026-02-24 9:24 Li Chen
2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw)
To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel
Cc: Andreas Dilger, Joel Becker, Joseph Qi, linux-kernel
This series makes the jbd2_inode dirty range tracking safe for lockless
reads in jbd2 and filesystem callbacks used by ext4 and ocfs2.
Some paths access jinode fields without holding journal->j_list_lock
(e.g. fast commit helpers and ordered truncate helpers). v1 used READ_ONCE()
on i_dirty_start/end, but Matthew pointed out that loff_t can be torn on
32-bit platforms, and Jan suggested storing the dirty range in PAGE_SIZE
units as pgoff_t.
With this series, jbd2 stores the dirty range as page indexes and uses
READ_ONCE()/WRITE_ONCE() for lockless access. ext4 and ocfs2 use the new
jbd2_jinode_get_dirty_range() accessor which converts the page-based range
back to byte offsets for writeback.
This is based on Jan's suggestion in the review of the ext4 jinode
publication race fix. [1]
Changes since v2:
- Add jbd2_jinode_get_dirty_range() accessor and convert ext4/ocfs2 to use it
before switching the underlying representation (per Andreas).
- Rename the dirty range fields to i_dirty_start_page/end_page to make the
PAGE_SIZE units explicit and avoid silent unit mismatches when bisecting.
Changes since v1:
- Store i_dirty_start/end in PAGE_SIZE units (pgoff_t) to avoid torn loads on
32-bit (pointed out by Matthew, suggested by Jan).
- Use WRITE_ONCE() for i_dirty_* / i_flags updates in jbd2 (per Jan).
- Drop pointless READ_ONCE() on i_vfs_inode in jbd2_wait_inode_data (per Jan).
- Convert ext4/ocfs2 callbacks to translate page range to byte offsets.
[1]: https://lore.kernel.org/all/4jxwogttddiaoqbstlgou5ox6zs27ngjjz5ukrxafm2z5ijxod@so4eqnykiegj/
v2: https://lore.kernel.org/all/20260219114645.778338-1-me@linux.beauty/
v1: https://lore.kernel.org/all/20260130031232.60780-1-me@linux.beauty/
Li Chen (4):
jbd2: add jinode dirty range accessors
ext4: use jbd2 jinode dirty range accessor
ocfs2: use jbd2 jinode dirty range accessor
jbd2: store jinode dirty range in PAGE_SIZE units
fs/ext4/inode.c | 10 ++++++--
fs/ext4/super.c | 16 +++++++++----
fs/jbd2/commit.c | 56 +++++++++++++++++++++++++++++++++----------
fs/jbd2/journal.c | 5 ++--
fs/jbd2/transaction.c | 20 ++++++++++------
fs/ocfs2/journal.c | 9 +++++--
include/linux/jbd2.h | 35 +++++++++++++++++++++------
7 files changed, 112 insertions(+), 39 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v3 1/4] jbd2: add jinode dirty range accessors 2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen @ 2026-02-24 9:24 ` Li Chen 2026-03-02 18:07 ` Jan Kara 2026-02-24 9:24 ` [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor Li Chen ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw) To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara, linux-kernel Cc: Andreas Dilger, Li Chen Provide a helper to fetch jinode dirty ranges in bytes. This lets filesystem callbacks avoid depending on the internal representation, preparing for a later conversion to page units. Suggested-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Li Chen <me@linux.beauty> --- Changes since v2: - New patch: add jbd2_jinode_get_dirty_range() helper. include/linux/jbd2.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index a53a00d36228c..64392baf5f4b4 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -445,6 +445,20 @@ struct jbd2_inode { loff_t i_dirty_end; }; +static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode, + loff_t *start, loff_t *end) +{ + loff_t start_byte = jinode->i_dirty_start; + loff_t end_byte = jinode->i_dirty_end; + + if (!end_byte) + return false; + + *start = start_byte; + *end = end_byte; + return true; +} + struct jbd2_revoke_table_s; /** -- 2.52.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/4] jbd2: add jinode dirty range accessors 2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen @ 2026-03-02 18:07 ` Jan Kara 0 siblings, 0 replies; 10+ messages in thread From: Jan Kara @ 2026-03-02 18:07 UTC (permalink / raw) To: Li Chen Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara, linux-kernel, Andreas Dilger On Tue 24-02-26 17:24:30, Li Chen wrote: > Provide a helper to fetch jinode dirty ranges in bytes. This lets > filesystem callbacks avoid depending on the internal representation, > preparing for a later conversion to page units. > > Suggested-by: Andreas Dilger <adilger@dilger.ca> > Signed-off-by: Li Chen <me@linux.beauty> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > Changes since v2: > - New patch: add jbd2_jinode_get_dirty_range() helper. > > include/linux/jbd2.h | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h > index a53a00d36228c..64392baf5f4b4 100644 > --- a/include/linux/jbd2.h > +++ b/include/linux/jbd2.h > @@ -445,6 +445,20 @@ struct jbd2_inode { > loff_t i_dirty_end; > }; > > +static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode, > + loff_t *start, loff_t *end) > +{ > + loff_t start_byte = jinode->i_dirty_start; > + loff_t end_byte = jinode->i_dirty_end; > + > + if (!end_byte) > + return false; > + > + *start = start_byte; > + *end = end_byte; > + return true; > +} > + > struct jbd2_revoke_table_s; > > /** > -- > 2.52.0 -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor 2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen 2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen @ 2026-02-24 9:24 ` Li Chen 2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen 2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen 3 siblings, 0 replies; 10+ messages in thread From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw) To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Andreas Dilger, linux-kernel Cc: Li Chen ext4 journal commit callbacks access jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes, and read i_transaction with READ_ONCE() in the redirty check. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> --- Changes since v2: - Use jbd2_jinode_get_dirty_range() instead of direct i_dirty_* reads. - Drop per-caller page->byte conversion (now handled by the accessor). fs/ext4/inode.c | 10 ++++++++-- fs/ext4/super.c | 16 +++++++++++----- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index da96db5f23450..f87d35bda9276 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3031,17 +3031,23 @@ static int ext4_writepages(struct address_space *mapping, int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode) { + loff_t range_start, range_end; struct writeback_control wbc = { .sync_mode = WB_SYNC_ALL, .nr_to_write = LONG_MAX, - .range_start = jinode->i_dirty_start, - .range_end = jinode->i_dirty_end, }; struct mpage_da_data mpd = { .inode = jinode->i_vfs_inode, .wbc = &wbc, .can_map = 0, }; + + if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end)) + return 0; + + wbc.range_start = range_start; + wbc.range_end = range_end; + return ext4_do_writepages(&mpd); } diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 69eb63dde9839..685951cd58394 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -521,6 +521,7 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode, { struct buffer_head *bh, *head; struct journal_head *jh; + transaction_t *trans = READ_ONCE(jinode->i_transaction); bh = head = folio_buffers(folio); do { @@ -539,7 +540,7 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode, */ jh = bh2jh(bh); if (buffer_dirty(bh) || - (jh && (jh->b_transaction != jinode->i_transaction || + (jh && (jh->b_transaction != trans || jh->b_next_transaction))) return true; } while ((bh = bh->b_this_page) != head); @@ -550,15 +551,20 @@ static bool ext4_journalled_writepage_needs_redirty(struct jbd2_inode *jinode, static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode) { struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t range_start, range_end; struct writeback_control wbc = { - .sync_mode = WB_SYNC_ALL, + .sync_mode = WB_SYNC_ALL, .nr_to_write = LONG_MAX, - .range_start = jinode->i_dirty_start, - .range_end = jinode->i_dirty_end, - }; + }; struct folio *folio = NULL; int error; + if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end)) + return 0; + + wbc.range_start = range_start; + wbc.range_end = range_end; + /* * writeback_iter() already checks for dirty pages and calls * folio_clear_dirty_for_io(), which we want to write protect the -- 2.52.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 3/4] ocfs2: use jbd2 jinode dirty range accessor 2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen 2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen 2026-02-24 9:24 ` [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor Li Chen @ 2026-02-24 9:24 ` Li Chen 2026-03-02 18:07 ` Jan Kara 2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen 3 siblings, 1 reply; 10+ messages in thread From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw) To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Joel Becker, Joseph Qi, linux-kernel Cc: Li Chen ocfs2 journal commit callback reads jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> --- Changes since v2: - Use jbd2_jinode_get_dirty_range() instead of direct i_dirty_* reads. - Drop per-caller page->byte conversion (now handled by the accessor). fs/ocfs2/journal.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 85239807dec78..68c2f567e6e1b 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -902,8 +902,13 @@ int ocfs2_journal_alloc(struct ocfs2_super *osb) static int ocfs2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) { - return filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping, - jinode->i_dirty_start, jinode->i_dirty_end); + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t range_start, range_end; + + if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end)) + return 0; + + return filemap_fdatawrite_range(mapping, range_start, range_end); } int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty) -- 2.52.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/4] ocfs2: use jbd2 jinode dirty range accessor 2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen @ 2026-03-02 18:07 ` Jan Kara 0 siblings, 0 replies; 10+ messages in thread From: Jan Kara @ 2026-03-02 18:07 UTC (permalink / raw) To: Li Chen Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Joel Becker, Joseph Qi, linux-kernel On Tue 24-02-26 17:24:32, Li Chen wrote: > ocfs2 journal commit callback reads jbd2_inode dirty range fields without > holding journal->j_list_lock. > Use jbd2_jinode_get_dirty_range() to get the range in bytes. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Li Chen <me@linux.beauty> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > Changes since v2: > - Use jbd2_jinode_get_dirty_range() instead of direct i_dirty_* reads. > - Drop per-caller page->byte conversion (now handled by the accessor). > > fs/ocfs2/journal.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > index 85239807dec78..68c2f567e6e1b 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -902,8 +902,13 @@ int ocfs2_journal_alloc(struct ocfs2_super *osb) > > static int ocfs2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) > { > - return filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping, > - jinode->i_dirty_start, jinode->i_dirty_end); > + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; > + loff_t range_start, range_end; > + > + if (!jbd2_jinode_get_dirty_range(jinode, &range_start, &range_end)) > + return 0; > + > + return filemap_fdatawrite_range(mapping, range_start, range_end); > } > > int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty) > -- > 2.52.0 -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units 2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen ` (2 preceding siblings ...) 2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen @ 2026-02-24 9:24 ` Li Chen 2026-03-02 18:27 ` Jan Kara 3 siblings, 1 reply; 10+ messages in thread From: Li Chen @ 2026-02-24 9:24 UTC (permalink / raw) To: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara, linux-kernel Cc: Li Chen jbd2_inode fields are updated under journal->j_list_lock, but some paths read them without holding the lock (e.g. fast commit helpers and ordered truncate helpers). READ_ONCE() alone is not sufficient for the dirty range fields when they are stored as loff_t because 32-bit platforms can observe torn loads. Store the dirty range in PAGE_SIZE units as pgoff_t instead. Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the dirty range and i_flags to match the existing lockless access pattern. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> --- Changes since v2: - Rename i_dirty_start/end to i_dirty_start_page/end_page. - Use jbd2_jinode_get_dirty_range() for byte conversions in commit paths. fs/jbd2/commit.c | 56 +++++++++++++++++++++++++++++++++---------- fs/jbd2/journal.c | 5 ++-- fs/jbd2/transaction.c | 20 ++++++++++------ include/linux/jbd2.h | 31 ++++++++++++++---------- 4 files changed, 77 insertions(+), 35 deletions(-) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 7203d2d2624d7..514f204aa1db1 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -180,7 +180,13 @@ static int journal_wait_on_commit_record(journal_t *journal, /* Send all the data buffers related to an inode */ int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode) { - if (!jinode || !(jinode->i_flags & JI_WRITE_DATA)) + unsigned long flags; + + if (!jinode) + return 0; + + flags = READ_ONCE(jinode->i_flags); + if (!(flags & JI_WRITE_DATA)) return 0; trace_jbd2_submit_inode_data(jinode->i_vfs_inode); @@ -191,12 +197,30 @@ EXPORT_SYMBOL(jbd2_submit_inode_data); int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode) { - if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) || - !jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping) + struct address_space *mapping; + struct inode *inode; + unsigned long flags; + loff_t start_byte, end_byte; + + if (!jinode) + return 0; + + flags = READ_ONCE(jinode->i_flags); + if (!(flags & JI_WAIT_DATA)) + return 0; + + inode = jinode->i_vfs_inode; + if (!inode) + return 0; + + mapping = inode->i_mapping; + if (!mapping) + return 0; + + if (!jbd2_jinode_get_dirty_range(jinode, &start_byte, &end_byte)) return 0; return filemap_fdatawait_range_keep_errors( - jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start, - jinode->i_dirty_end); + mapping, start_byte, end_byte); } EXPORT_SYMBOL(jbd2_wait_inode_data); @@ -218,7 +242,8 @@ static int journal_submit_data_buffers(journal_t *journal, list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) { if (!(jinode->i_flags & JI_WRITE_DATA)) continue; - jinode->i_flags |= JI_COMMIT_RUNNING; + WRITE_ONCE(jinode->i_flags, + jinode->i_flags | JI_COMMIT_RUNNING); spin_unlock(&journal->j_list_lock); /* submit the inode data buffers. */ trace_jbd2_submit_inode_data(jinode->i_vfs_inode); @@ -229,7 +254,8 @@ static int journal_submit_data_buffers(journal_t *journal, } spin_lock(&journal->j_list_lock); J_ASSERT(jinode->i_transaction == commit_transaction); - jinode->i_flags &= ~JI_COMMIT_RUNNING; + WRITE_ONCE(jinode->i_flags, + jinode->i_flags & ~JI_COMMIT_RUNNING); smp_mb(); wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING); } @@ -240,10 +266,13 @@ static int journal_submit_data_buffers(journal_t *journal, int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) { struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t start_byte, end_byte; + + if (!jbd2_jinode_get_dirty_range(jinode, &start_byte, &end_byte)) + return 0; return filemap_fdatawait_range_keep_errors(mapping, - jinode->i_dirty_start, - jinode->i_dirty_end); + start_byte, end_byte); } /* @@ -262,7 +291,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal, list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) { if (!(jinode->i_flags & JI_WAIT_DATA)) continue; - jinode->i_flags |= JI_COMMIT_RUNNING; + WRITE_ONCE(jinode->i_flags, jinode->i_flags | JI_COMMIT_RUNNING); spin_unlock(&journal->j_list_lock); /* wait for the inode data buffers writeout. */ if (journal->j_finish_inode_data_buffers) { @@ -272,7 +301,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal, } cond_resched(); spin_lock(&journal->j_list_lock); - jinode->i_flags &= ~JI_COMMIT_RUNNING; + WRITE_ONCE(jinode->i_flags, jinode->i_flags & ~JI_COMMIT_RUNNING); smp_mb(); wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING); } @@ -288,8 +317,9 @@ static int journal_finish_inode_data_buffers(journal_t *journal, &jinode->i_transaction->t_inode_list); } else { jinode->i_transaction = NULL; - jinode->i_dirty_start = 0; - jinode->i_dirty_end = 0; + WRITE_ONCE(jinode->i_dirty_start_page, 0); + WRITE_ONCE(jinode->i_dirty_end_page, + JBD2_INODE_DIRTY_RANGE_NONE); } } spin_unlock(&journal->j_list_lock); diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index c973162d5b316..eb26c3088a164 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -3020,8 +3020,8 @@ void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode) jinode->i_next_transaction = NULL; jinode->i_vfs_inode = inode; jinode->i_flags = 0; - jinode->i_dirty_start = 0; - jinode->i_dirty_end = 0; + jinode->i_dirty_start_page = 0; + jinode->i_dirty_end_page = JBD2_INODE_DIRTY_RANGE_NONE; INIT_LIST_HEAD(&jinode->i_list); } @@ -3178,4 +3178,3 @@ MODULE_DESCRIPTION("Generic filesystem journal-writing module"); MODULE_LICENSE("GPL"); module_init(journal_init); module_exit(journal_exit); - diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index dca4b5d8aaaa3..f5226b6d47d24 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -2646,6 +2646,7 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode, { transaction_t *transaction = handle->h_transaction; journal_t *journal; + pgoff_t start_page, end_page; if (is_handle_aborted(handle)) return -EROFS; @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode, jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino, transaction->t_tid); + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT); + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT); + spin_lock(&journal->j_list_lock); - jinode->i_flags |= flags; + WRITE_ONCE(jinode->i_flags, jinode->i_flags | flags); - if (jinode->i_dirty_end) { - jinode->i_dirty_start = min(jinode->i_dirty_start, start_byte); - jinode->i_dirty_end = max(jinode->i_dirty_end, end_byte); + if (jinode->i_dirty_end_page != JBD2_INODE_DIRTY_RANGE_NONE) { + WRITE_ONCE(jinode->i_dirty_start_page, + min(jinode->i_dirty_start_page, start_page)); + WRITE_ONCE(jinode->i_dirty_end_page, + max(jinode->i_dirty_end_page, end_page)); } else { - jinode->i_dirty_start = start_byte; - jinode->i_dirty_end = end_byte; + WRITE_ONCE(jinode->i_dirty_start_page, start_page); + WRITE_ONCE(jinode->i_dirty_end_page, end_page); } /* Is inode already attached where we need it? */ @@ -2739,7 +2745,7 @@ int jbd2_journal_begin_ordered_truncate(journal_t *journal, int ret = 0; /* This is a quick check to avoid locking if not necessary */ - if (!jinode->i_transaction) + if (!READ_ONCE(jinode->i_transaction)) goto out; /* Locks are here just to force reading of recent values, it is * enough that the transaction was not committing before we started diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 64392baf5f4b4..4fffbd13d38d4 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -390,6 +390,8 @@ static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh) /* Wait for outstanding data writes for this inode before commit */ #define JI_WAIT_DATA (1 << __JI_WAIT_DATA) +#define JBD2_INODE_DIRTY_RANGE_NONE ((pgoff_t)-1) + /** * struct jbd2_inode - The jbd_inode type is the structure linking inodes in * ordered mode present in a transaction so that we can sync them during commit. @@ -429,33 +431,38 @@ struct jbd2_inode { unsigned long i_flags; /** - * @i_dirty_start: + * @i_dirty_start_page: + * + * Dirty range start in PAGE_SIZE units. + * + * The dirty range is empty if @i_dirty_end_page is set to + * %JBD2_INODE_DIRTY_RANGE_NONE. * - * Offset in bytes where the dirty range for this inode starts. * [j_list_lock] */ - loff_t i_dirty_start; + pgoff_t i_dirty_start_page; /** - * @i_dirty_end: + * @i_dirty_end_page: * - * Inclusive offset in bytes where the dirty range for this inode - * ends. [j_list_lock] + * Dirty range end in PAGE_SIZE units (inclusive). + * + * [j_list_lock] */ - loff_t i_dirty_end; + pgoff_t i_dirty_end_page; }; static inline bool jbd2_jinode_get_dirty_range(const struct jbd2_inode *jinode, loff_t *start, loff_t *end) { - loff_t start_byte = jinode->i_dirty_start; - loff_t end_byte = jinode->i_dirty_end; + pgoff_t start_page = READ_ONCE(jinode->i_dirty_start_page); + pgoff_t end_page = READ_ONCE(jinode->i_dirty_end_page); - if (!end_byte) + if (end_page == JBD2_INODE_DIRTY_RANGE_NONE) return false; - *start = start_byte; - *end = end_byte; + *start = (loff_t)start_page << PAGE_SHIFT; + *end = ((loff_t)end_page << PAGE_SHIFT) + PAGE_SIZE - 1; return true; } -- 2.52.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units 2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen @ 2026-03-02 18:27 ` Jan Kara 2026-03-03 9:01 ` Jan Kara 0 siblings, 1 reply; 10+ messages in thread From: Jan Kara @ 2026-03-02 18:27 UTC (permalink / raw) To: Li Chen Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara, linux-kernel On Tue 24-02-26 17:24:33, Li Chen wrote: > jbd2_inode fields are updated under journal->j_list_lock, but some paths > read them without holding the lock (e.g. fast commit helpers and ordered > truncate helpers). > > READ_ONCE() alone is not sufficient for the dirty range fields when they > are stored as loff_t because 32-bit platforms can observe torn loads. > Store the dirty range in PAGE_SIZE units as pgoff_t instead. > > Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the > dirty range and i_flags to match the existing lockless access pattern. > > Suggested-by: Jan Kara <jack@suse.cz> > Reviewed-by: Jan Kara <jack@suse.cz> > Signed-off-by: Li Chen <me@linux.beauty> ... > @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode, > jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino, > transaction->t_tid); > > + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT); > + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT); MAX_LFS_SIZE on 32-bit is ULONG_MAX << PAGE_SHIFT and that's maximum file size. So we could do here end_page = DIV_ROUND_UP(end_byte, PAGE_SIZE) and just use end_page as exclusive end of a range to flush and get rid of special JBD2_INODE_DIRTY_RANGE_NONE value. The problem with the scheme you use is that files of MAX_LFS_SIZE would be treated as having empty flush range... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units 2026-03-02 18:27 ` Jan Kara @ 2026-03-03 9:01 ` Jan Kara 2026-03-06 5:44 ` Li Chen 0 siblings, 1 reply; 10+ messages in thread From: Jan Kara @ 2026-03-03 9:01 UTC (permalink / raw) To: Li Chen Cc: Theodore Ts'o, Jan Kara, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara, linux-kernel On Mon 02-03-26 19:27:13, Jan Kara wrote: > On Tue 24-02-26 17:24:33, Li Chen wrote: > > jbd2_inode fields are updated under journal->j_list_lock, but some paths > > read them without holding the lock (e.g. fast commit helpers and ordered > > truncate helpers). > > > > READ_ONCE() alone is not sufficient for the dirty range fields when they > > are stored as loff_t because 32-bit platforms can observe torn loads. > > Store the dirty range in PAGE_SIZE units as pgoff_t instead. > > > > Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the > > dirty range and i_flags to match the existing lockless access pattern. > > > > Suggested-by: Jan Kara <jack@suse.cz> > > Reviewed-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Li Chen <me@linux.beauty> > ... > > @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode, > > jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino, > > transaction->t_tid); > > > > + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT); > > + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT); > > MAX_LFS_SIZE on 32-bit is ULONG_MAX << PAGE_SHIFT and that's maximum file > size. So we could do here end_page = DIV_ROUND_UP(end_byte, PAGE_SIZE) and > just use end_page as exclusive end of a range to flush and get rid of > special JBD2_INODE_DIRTY_RANGE_NONE value. > > The problem with the scheme you use is that files of MAX_LFS_SIZE would be > treated as having empty flush range... Or perhaps even easier might be to set start_page to ULONG_MAX and end_page to 0 and use start_page > end_page as an indication of empty range. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units 2026-03-03 9:01 ` Jan Kara @ 2026-03-06 5:44 ` Li Chen 0 siblings, 0 replies; 10+ messages in thread From: Li Chen @ 2026-03-06 5:44 UTC (permalink / raw) To: Jan Kara Cc: Theodore Ts'o, Mark Fasheh, linux-ext4, ocfs2-devel, Jan Kara, linux-kernel Hi Jan, ---- On Tue, 03 Mar 2026 17:01:28 +0800 Jan Kara <jack@suse.cz> wrote --- > On Mon 02-03-26 19:27:13, Jan Kara wrote: > > On Tue 24-02-26 17:24:33, Li Chen wrote: > > > jbd2_inode fields are updated under journal->j_list_lock, but some paths > > > read them without holding the lock (e.g. fast commit helpers and ordered > > > truncate helpers). > > > > > > READ_ONCE() alone is not sufficient for the dirty range fields when they > > > are stored as loff_t because 32-bit platforms can observe torn loads. > > > Store the dirty range in PAGE_SIZE units as pgoff_t instead. > > > > > > Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the > > > dirty range and i_flags to match the existing lockless access pattern. > > > > > > Suggested-by: Jan Kara <jack@suse.cz> > > > Reviewed-by: Jan Kara <jack@suse.cz> > > > Signed-off-by: Li Chen <me@linux.beauty> > > ... > > > @@ -2654,15 +2655,20 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode, > > > jbd2_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino, > > > transaction->t_tid); > > > > > > + start_page = (pgoff_t)(start_byte >> PAGE_SHIFT); > > > + end_page = (pgoff_t)(end_byte >> PAGE_SHIFT); > > > > MAX_LFS_SIZE on 32-bit is ULONG_MAX << PAGE_SHIFT and that's maximum file > > size. So we could do here end_page = DIV_ROUND_UP(end_byte, PAGE_SIZE) and > > just use end_page as exclusive end of a range to flush and get rid of > > special JBD2_INODE_DIRTY_RANGE_NONE value. > > > > The problem with the scheme you use is that files of MAX_LFS_SIZE would be > > treated as having empty flush range... > > Or perhaps even easier might be to set start_page to ULONG_MAX and end_page > to 0 and use start_page > end_page as an indication of empty range. Great idea! Thanks for the detailed review. I will update the range handling to avoid the sentinel collision. Regards, Li ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-03-06 5:44 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-24 9:24 [PATCH v3 0/4] jbd2/ext4/ocfs2: lockless jinode dirty range Li Chen 2026-02-24 9:24 ` [PATCH v3 1/4] jbd2: add jinode dirty range accessors Li Chen 2026-03-02 18:07 ` Jan Kara 2026-02-24 9:24 ` [PATCH v3 2/4] ext4: use jbd2 jinode dirty range accessor Li Chen 2026-02-24 9:24 ` [PATCH v3 3/4] ocfs2: " Li Chen 2026-03-02 18:07 ` Jan Kara 2026-02-24 9:24 ` [PATCH v3 4/4] jbd2: store jinode dirty range in PAGE_SIZE units Li Chen 2026-03-02 18:27 ` Jan Kara 2026-03-03 9:01 ` Jan Kara 2026-03-06 5:44 ` Li Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox