* [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion @ 2026-03-26 14:26 Heming Zhao 2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao 0 siblings, 1 reply; 5+ messages in thread From: Heming Zhao @ 2026-03-26 14:26 UTC (permalink / raw) To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su For easier merging, this patch is based on Joseph's patch [1]. v3->v4: Remove [patch 2/2] as the revert operation is incorrect. v2->v3: Following the discussion, use 'batch' and 'handle' to control restarting the jbd2 transaction. v1->v2: following the review comments, restore the i_size update code in ocfs2_dio_end_io_write(). the runtime of the test script [2]. real 1m49.100s user 0m0.303s sys 0m22.672s [1]: https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t [2]: https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32 Heming Zhao (1): ocfs2: split transactions in dio completion to avoid credit exhaustion fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 28 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-26 14:26 [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao @ 2026-03-26 14:26 ` Heming Zhao 2026-03-27 1:42 ` Joseph Qi 0 siblings, 1 reply; 5+ messages in thread From: Heming Zhao @ 2026-03-26 14:26 UTC (permalink / raw) To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su During ocfs2 dio operations, JBD2 may report warnings via following call trace: ocfs2_dio_end_io_write ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ocfs2_try_to_merge_extent ocfs2_extend_rotate_transaction ocfs2_extend_trans jbd2__journal_restart start_this_handle output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to handle extent in a batch of transaction. Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should only be removed from the orphan list after the extent tree update is complete. this ensures that if a crash occurs in the middle of extent tree updates, we won't leave stale blocks beyond EOF. Finally, thanks to Jans and Joseph for providing the bug fix prototype and suggestions. Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Heming Zhao <heming.zhao@suse.com> --- fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 28 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 09146b43d1f0..60f1b607022f 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -37,6 +37,8 @@ #include "namei.h" #include "sysfile.h" +#define OCFS2_DIO_MARK_EXTENT_BATCH 200 + static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode, struct ocfs2_alloc_context *meta_ac = NULL; handle_t *handle = NULL; loff_t end = offset + bytes; - int ret = 0, credits = 0; + int ret = 0, credits = 0, batch = 0; ocfs2_init_dealloc_ctxt(&dealloc); @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, goto out; } - /* Delete orphan before acquire i_rwsem. */ - if (dwc->dw_orphaned) { - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); - - end = end > i_size_read(inode) ? end : 0; - - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, - !!end, end); - if (ret < 0) - mlog_errno(ret); - } - down_write(&oi->ip_alloc_sem); di = (struct ocfs2_dinode *)di_bh->b_data; @@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode, credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); - handle = ocfs2_start_trans(osb, credits); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - mlog_errno(ret); - goto unlock; - } - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, - OCFS2_JOURNAL_ACCESS_WRITE); - if (ret) { - mlog_errno(ret); - goto commit; - } - list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { + if (!handle) { + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + mlog_errno(ret); + handle = NULL; + break; + } + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) { + mlog_errno(ret); + break; + } + } ret = ocfs2_assure_trans_credits(handle, credits); if (ret < 0) { mlog_errno(ret); @@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode, mlog_errno(ret); break; } + + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) { + ocfs2_commit_trans(osb, handle); + handle = NULL; + batch = 0; + } } if (end > i_size_read(inode)) { + if (!handle) { + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + mlog_errno(ret); + goto unlock; + } + } ret = ocfs2_set_inode_size(handle, inode, di_bh, end); if (ret < 0) mlog_errno(ret); } -commit: - ocfs2_commit_trans(osb, handle); + if (handle) + ocfs2_commit_trans(osb, handle); + unlock: up_write(&oi->ip_alloc_sem); + + /* everything looks good, let's start the cleanup */ + if (dwc->dw_orphaned) { + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); + + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (ret < 0) + mlog_errno(ret); + } ocfs2_inode_unlock(inode, 1); brelse(di_bh); out: -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao @ 2026-03-27 1:42 ` Joseph Qi 2026-03-27 3:02 ` Heming Zhao 0 siblings, 1 reply; 5+ messages in thread From: Joseph Qi @ 2026-03-27 1:42 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara Hi, On 3/26/26 10:26 PM, Heming Zhao wrote: > During ocfs2 dio operations, JBD2 may report warnings via following call trace: > ocfs2_dio_end_io_write > ocfs2_mark_extent_written > ocfs2_change_extent_flag > ocfs2_split_extent > ocfs2_try_to_merge_extent > ocfs2_extend_rotate_transaction > ocfs2_extend_trans > jbd2__journal_restart > start_this_handle > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() > to handle extent in a batch of transaction. > > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > only be removed from the orphan list after the extent tree update is complete. > this ensures that if a crash occurs in the middle of extent tree updates, we > won't leave stale blocks beyond EOF. > > Finally, thanks to Jans and Joseph for providing the bug fix prototype and > suggestions. > > Suggested-by: Jan Kara <jack@suse.cz> > Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> > Reviewed-by: Jan Kara <jack@suse.cz> > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > --- > fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++------------------- > 1 file changed, 44 insertions(+), 28 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index 09146b43d1f0..60f1b607022f 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -37,6 +37,8 @@ > #include "namei.h" > #include "sysfile.h" > > +#define OCFS2_DIO_MARK_EXTENT_BATCH 200 > + > static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, > struct buffer_head *bh_result, int create) > { > @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > struct ocfs2_alloc_context *meta_ac = NULL; > handle_t *handle = NULL; > loff_t end = offset + bytes; > - int ret = 0, credits = 0; > + int ret = 0, credits = 0, batch = 0; > > ocfs2_init_dealloc_ctxt(&dealloc); > > @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > goto out; > } > > - /* Delete orphan before acquire i_rwsem. */ > - if (dwc->dw_orphaned) { > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > - > - end = end > i_size_read(inode) ? end : 0; > - > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > - !!end, end); > - if (ret < 0) > - mlog_errno(ret); > - } > - > down_write(&oi->ip_alloc_sem); > di = (struct ocfs2_dinode *)di_bh->b_data; > > @@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > > - handle = ocfs2_start_trans(osb, credits); > - if (IS_ERR(handle)) { > - ret = PTR_ERR(handle); > - mlog_errno(ret); > - goto unlock; > - } > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > - OCFS2_JOURNAL_ACCESS_WRITE); > - if (ret) { > - mlog_errno(ret); > - goto commit; > - } > - > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > + if (!handle) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + mlog_errno(ret); > + handle = NULL; > + break; > + } > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > + OCFS2_JOURNAL_ACCESS_WRITE); > + if (ret) { > + mlog_errno(ret); > + break; > + } > + } > ret = ocfs2_assure_trans_credits(handle, credits); > if (ret < 0) { > mlog_errno(ret); > @@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > mlog_errno(ret); > break; > } > + > + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) { > + ocfs2_commit_trans(osb, handle); > + handle = NULL; > + batch = 0; > + } > } > > if (end > i_size_read(inode)) { I still don't think it is a good idea to update inode size in case error. The original logic behaves inconsistent, if ocfs2_start_trans() and ocfs2_journal_access_di() fails, it won't update inode size, but if ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do. So let's make it behave consistent by both checking 'ret' here. Other looks fine. Joseph > + if (!handle) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + mlog_errno(ret); > + goto unlock; > + } > + } > ret = ocfs2_set_inode_size(handle, inode, di_bh, end); > if (ret < 0) > mlog_errno(ret); > } > -commit:> - ocfs2_commit_trans(osb, handle); > + if (handle) > + ocfs2_commit_trans(osb, handle); > + > unlock: > up_write(&oi->ip_alloc_sem); > + > + /* everything looks good, let's start the cleanup */ > + if (dwc->dw_orphaned) { > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > + > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); > + if (ret < 0) > + mlog_errno(ret); > + } > ocfs2_inode_unlock(inode, 1); > brelse(di_bh); > out: ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-27 1:42 ` Joseph Qi @ 2026-03-27 3:02 ` Heming Zhao 2026-03-27 3:12 ` Joseph Qi 0 siblings, 1 reply; 5+ messages in thread From: Heming Zhao @ 2026-03-27 3:02 UTC (permalink / raw) To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote: > Hi, > > On 3/26/26 10:26 PM, Heming Zhao wrote: > > During ocfs2 dio operations, JBD2 may report warnings via following call trace: > > ocfs2_dio_end_io_write > > ocfs2_mark_extent_written > > ocfs2_change_extent_flag > > ocfs2_split_extent > > ocfs2_try_to_merge_extent > > ocfs2_extend_rotate_transaction > > ocfs2_extend_trans > > jbd2__journal_restart > > start_this_handle > > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > > > > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() > > to handle extent in a batch of transaction. > > > > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > > only be removed from the orphan list after the extent tree update is complete. > > this ensures that if a crash occurs in the middle of extent tree updates, we > > won't leave stale blocks beyond EOF. > > > > Finally, thanks to Jans and Joseph for providing the bug fix prototype and > > suggestions. > > > > Suggested-by: Jan Kara <jack@suse.cz> > > Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> > > Reviewed-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > > --- > > fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++------------------- > > 1 file changed, 44 insertions(+), 28 deletions(-) > > > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > > index 09146b43d1f0..60f1b607022f 100644 > > --- a/fs/ocfs2/aops.c > > +++ b/fs/ocfs2/aops.c > > @@ -37,6 +37,8 @@ > > #include "namei.h" > > ... ... <--- snip ---> > > + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) { > > + ocfs2_commit_trans(osb, handle); > > + handle = NULL; > > + batch = 0; > > + } > > } > > > > if (end > i_size_read(inode)) { > > I still don't think it is a good idea to update inode size in case error. > The original logic behaves inconsistent, if ocfs2_start_trans() and > ocfs2_journal_access_di() fails, it won't update inode size, but if > ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do. > So let's make it behave consistent by both checking 'ret' here. > > Other looks fine. > > Joseph After sending the v4 patch, I did some tests and identified the key of the performance degradation in the dynamic credit adjustment patch. I found that by using half of the jbd2_max value, the time consumption returned to the same level as the "direct loop" style. key code change: (base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch) int jbd2_max = (journal->j_max_transaction_buffers - journal->j_transaction_overhead_buffers) >> 2; my test env: - 10G ocfs2 volume - DIO 2G file, write with 64MB block (--bs=64M). observations: - ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block: - using jbd2_max (5449): ~9 times. - using half value (2724): ~20 times. - v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times time consumption: jbd2_max:5449 case real 2m9.346s user 0m0.277s sys 0m22.748s half jbd2_max:2724 case real 1m49.400s user 0m0.343s sys 0m22.734s v4 patch: real 1m49.650s user 0m0.337s sys 0m22.780s the reason (I guess): The original patch only performed a "commit and start new trans" when jbd2 credits were nearly exhausted. I suspect contention between the commit flush operation and the jbd2 operations in the dio end path, leading to high latency. By using half the jbd2_max value, trigger commits more frequently, which appears to reduce jbd2 contention and racing. Finally, I suggest we use the dynamic credit extension patch for v5. This version only adds logic to ocfs2_assure_trans_credits() and does not affect i_size changes during error cases. [1]: searching "0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch" in following url: - https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32 ----- For easy of discussion, I past the full dynamic credits adjustment patch (with half jbd2_max) below. ``` Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path --- fs/ocfs2/aops.c | 31 +++++++++++++++++-------------- fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++---- fs/ocfs2/journal.h | 5 +++-- 3 files changed, 50 insertions(+), 20 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 7a65d5a36a3e..9cdecfa0ea0c 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode, handle_t *handle = NULL; loff_t end = offset + bytes; int ret = 0, credits = 0; + /* do the same job of jbd2_max_user_trans_buffers() */ + journal_t *journal = osb->journal->j_journal; + int jbd2_max = (journal->j_max_transaction_buffers - + journal->j_transaction_overhead_buffers) >> 2; ocfs2_init_dealloc_ctxt(&dealloc); @@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, goto out; } - /* Delete orphan before acquire i_rwsem. */ - if (dwc->dw_orphaned) { - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); - - end = end > i_size_read(inode) ? end : 0; - - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, - !!end, end); - if (ret < 0) - mlog_errno(ret); - } - down_write(&oi->ip_alloc_sem); di = (struct ocfs2_dinode *)di_bh->b_data; @@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode, } list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { - ret = ocfs2_assure_trans_credits(handle, credits); - if (ret < 0) { + ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max); + if (ret == 1) { + goto unlock; + } else if (ret < 0) { mlog_errno(ret); break; } @@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode, ocfs2_commit_trans(osb, handle); unlock: up_write(&oi->ip_alloc_sem); + + if (dwc->dw_orphaned) { + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); + + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (ret < 0) + mlog_errno(ret); + } + ocfs2_inode_unlock(inode, 1); brelse(di_bh); out: diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index e5f58ff2175f..57ad69d19494 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks) * credits. Similar notes regarding data consistency and locking implications * as for ocfs2_extend_trans() apply here. */ -int ocfs2_assure_trans_credits(handle_t *handle, int nblocks) +int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle, + struct buffer_head *bh, int nblocks, int jbd2_max) { + handle_t *handle = *i_handle; int old_nblks = jbd2_handle_buffer_credits(handle); + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + int ret, wanted; + int plan = nblocks << 1; /* Double the credits to prevent boundary issues */ trace_ocfs2_assure_trans_credits(old_nblks); - if (old_nblks >= nblocks) - return 0; - return ocfs2_extend_trans(handle, nblocks - old_nblks); + + wanted = old_nblks + plan; + + if (wanted < jbd2_max) { + if (old_nblks < nblocks) + return ocfs2_extend_trans(handle, nblocks - old_nblks); + else + return 0; + } + + ocfs2_commit_trans(osb, handle); + handle = ocfs2_start_trans(osb, plan); + if (IS_ERR(handle)) { + mlog_errno(PTR_ERR(handle)); + return 1; /* caller must not commit trans for this error */ + } + + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) + mlog_errno(ret); + + *i_handle = handle; + return ret; } /* diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h index 6397170f302f..08a76ffb870a 100644 --- a/fs/ocfs2/journal.h +++ b/fs/ocfs2/journal.h @@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, int ocfs2_commit_trans(struct ocfs2_super *osb, handle_t *handle); int ocfs2_extend_trans(handle_t *handle, int nblocks); -int ocfs2_assure_trans_credits(handle_t *handle, - int nblocks); +int ocfs2_assure_trans_credits(struct inode *inode, + handle_t **i_handle, struct buffer_head *bh, + int nblocks, int jbd2_max); int ocfs2_allocate_extend_trans(handle_t *handle, int thresh); -- 2.43.0 ``` - Heming ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-27 3:02 ` Heming Zhao @ 2026-03-27 3:12 ` Joseph Qi 0 siblings, 0 replies; 5+ messages in thread From: Joseph Qi @ 2026-03-27 3:12 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara On 3/27/26 11:02 AM, Heming Zhao wrote: > On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote: >> Hi, >> >> On 3/26/26 10:26 PM, Heming Zhao wrote: >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: >>> ocfs2_dio_end_io_write >>> ocfs2_mark_extent_written >>> ocfs2_change_extent_flag >>> ocfs2_split_extent >>> ocfs2_try_to_merge_extent >>> ocfs2_extend_rotate_transaction >>> ocfs2_extend_trans >>> jbd2__journal_restart >>> start_this_handle >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 >>> >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() >>> to handle extent in a batch of transaction. >>> >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should >>> only be removed from the orphan list after the extent tree update is complete. >>> this ensures that if a crash occurs in the middle of extent tree updates, we >>> won't leave stale blocks beyond EOF. >>> >>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and >>> suggestions. >>> >>> Suggested-by: Jan Kara <jack@suse.cz> >>> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> >>> Reviewed-by: Jan Kara <jack@suse.cz> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>> --- >>> fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++------------------- >>> 1 file changed, 44 insertions(+), 28 deletions(-) >>> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>> index 09146b43d1f0..60f1b607022f 100644 >>> --- a/fs/ocfs2/aops.c >>> +++ b/fs/ocfs2/aops.c >>> @@ -37,6 +37,8 @@ >>> #include "namei.h" >>> ... ... <--- snip ---> >>> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) { >>> + ocfs2_commit_trans(osb, handle); >>> + handle = NULL; >>> + batch = 0; >>> + } >>> } >>> >>> if (end > i_size_read(inode)) { >> >> I still don't think it is a good idea to update inode size in case error. >> The original logic behaves inconsistent, if ocfs2_start_trans() and >> ocfs2_journal_access_di() fails, it won't update inode size, but if >> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do. >> So let's make it behave consistent by both checking 'ret' here. >> >> Other looks fine. >> >> Joseph > > After sending the v4 patch, I did some tests and identified the key of the > performance degradation in the dynamic credit adjustment patch. > I found that by using half of the jbd2_max value, the time consumption returned > to the same level as the "direct loop" style. > > key code change: > (base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch) > int jbd2_max = (journal->j_max_transaction_buffers - > journal->j_transaction_overhead_buffers) >> 2; > > my test env: > - 10G ocfs2 volume > - DIO 2G file, write with 64MB block (--bs=64M). > > observations: > - ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block: > - using jbd2_max (5449): ~9 times. > - using half value (2724): ~20 times. > - v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times > > time consumption: > > jbd2_max:5449 case > real 2m9.346s > user 0m0.277s > sys 0m22.748s > > half jbd2_max:2724 case > real 1m49.400s > user 0m0.343s > sys 0m22.734s > > v4 patch: > real 1m49.650s > user 0m0.337s > sys 0m22.780s > > > the reason (I guess): > The original patch only performed a "commit and start new trans" when jbd2 > credits were nearly exhausted. I suspect contention between the commit flush > operation and the jbd2 operations in the dio end path, leading to high latency. > By using half the jbd2_max value, trigger commits more frequently, which appears > to reduce jbd2 contention and racing. > > Finally, I suggest we use the dynamic credit extension patch for v5. This > version only adds logic to ocfs2_assure_trans_credits() and does not affect > i_size changes during error cases. > Ummm... I think v4 looks simpler and direct. For i_size update logic, I think the existing logic looks wried. Take a look at ext4, it also doesn't do this if converting unwritten extents fails. Joseph ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-27 3:12 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-26 14:26 [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao 2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao 2026-03-27 1:42 ` Joseph Qi 2026-03-27 3:02 ` Heming Zhao 2026-03-27 3:12 ` Joseph Qi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox