* [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid @ 2026-03-12 16:27 Heming Zhao 2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao 0 siblings, 1 reply; 13+ messages in thread From: Heming Zhao @ 2026-03-12 16:27 UTC (permalink / raw) To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel v1->v2: following the review comments, restore the i_size update code in ocfs2_dio_end_io_write(). For easier merging, this patch is based on Joseph's patch: https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t Heming Zhao (1): ocfs2: split transactions in dio completion to avoid credit exhaustion fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- 1 file changed, 29 insertions(+), 29 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao @ 2026-03-12 16:27 ` Heming Zhao 2026-03-13 2:04 ` Joseph Qi 2026-03-13 2:27 ` Joseph Qi 0 siblings, 2 replies; 13+ messages in thread From: Heming Zhao @ 2026-03-12 16:27 UTC (permalink / raw) To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel During ocfs2 dio operations, JBD2 may report warnings via following call trace: ocfs2_dio_end_io_write ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ocfs2_try_to_merge_extent ocfs2_extend_rotate_transaction ocfs2_extend_trans jbd2__journal_restart start_this_handle output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to handle each extent in a separate transaction. Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should only be removed from the orphan list after the extent tree update is complete. this ensures that if a crash occurs in the middle of extent tree updates, we won't leave stale blocks beyond EOF. This patch also removes the only call to ocfs2_assure_trans_credits(), which was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to insufficient transaction credits"). Finally, thanks to Jans for providing the bug fix prototype and suggestions. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Heming Zhao <heming.zhao@suse.com> --- fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 09146b43d1f0..91997b330d39 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, goto out; } - /* Delete orphan before acquire i_rwsem. */ - if (dwc->dw_orphaned) { - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); - - end = end > i_size_read(inode) ? end : 0; - - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, - !!end, end); - if (ret < 0) - mlog_errno(ret); - } - down_write(&oi->ip_alloc_sem); di = (struct ocfs2_dinode *)di_bh->b_data; @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); - handle = ocfs2_start_trans(osb, credits); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - mlog_errno(ret); - goto unlock; - } - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, - OCFS2_JOURNAL_ACCESS_WRITE); - if (ret) { - mlog_errno(ret); - goto commit; - } - list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { - ret = ocfs2_assure_trans_credits(handle, credits); - if (ret < 0) { + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); mlog_errno(ret); break; } + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) { + mlog_errno(ret); + ocfs2_commit_trans(osb, handle); + break; + } ret = ocfs2_mark_extent_written(inode, &et, handle, ue->ue_cpos, 1, ue->ue_phys, meta_ac, &dealloc); if (ret < 0) { mlog_errno(ret); + ocfs2_commit_trans(osb, handle); break; } + ocfs2_commit_trans(osb, handle); } if (end > i_size_read(inode)) { + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + mlog_errno(ret); + goto unlock; + } ret = ocfs2_set_inode_size(handle, inode, di_bh, end); if (ret < 0) mlog_errno(ret); + ocfs2_commit_trans(osb, handle); } -commit: - ocfs2_commit_trans(osb, handle); + unlock: up_write(&oi->ip_alloc_sem); + + /* everything looks good, let's start the cleanup */ + if (dwc->dw_orphaned) { + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); + + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (ret < 0) + mlog_errno(ret); + } ocfs2_inode_unlock(inode, 1); brelse(di_bh); out: -- 2.43.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao @ 2026-03-13 2:04 ` Joseph Qi 2026-03-13 3:17 ` Heming Zhao 2026-03-13 2:27 ` Joseph Qi 1 sibling, 1 reply; 13+ messages in thread From: Joseph Qi @ 2026-03-13 2:04 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara Almost looks fine, minor updates below. On 3/13/26 12:27 AM, Heming Zhao wrote: > During ocfs2 dio operations, JBD2 may report warnings via following call trace: > ocfs2_dio_end_io_write > ocfs2_mark_extent_written > ocfs2_change_extent_flag > ocfs2_split_extent > ocfs2_try_to_merge_extent > ocfs2_extend_rotate_transaction > ocfs2_extend_trans > jbd2__journal_restart > start_this_handle > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > handle each extent in a separate transaction. > > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > only be removed from the orphan list after the extent tree update is complete. > this ensures that if a crash occurs in the middle of extent tree updates, we > won't leave stale blocks beyond EOF. > > This patch also removes the only call to ocfs2_assure_trans_credits(), which > was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > insufficient transaction credits"). > > Finally, thanks to Jans for providing the bug fix prototype and suggestions. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > --- > fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > 1 file changed, 29 insertions(+), 29 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index 09146b43d1f0..91997b330d39 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > goto out; > } > > - /* Delete orphan before acquire i_rwsem. */ > - if (dwc->dw_orphaned) { > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > - > - end = end > i_size_read(inode) ? end : 0; > - > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > - !!end, end); > - if (ret < 0) > - mlog_errno(ret); > - } > - > down_write(&oi->ip_alloc_sem); > di = (struct ocfs2_dinode *)di_bh->b_data; > > @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > > - handle = ocfs2_start_trans(osb, credits); > - if (IS_ERR(handle)) { > - ret = PTR_ERR(handle); > - mlog_errno(ret); > - goto unlock; > - } > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > - OCFS2_JOURNAL_ACCESS_WRITE); > - if (ret) { > - mlog_errno(ret); > - goto commit; > - } > - > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > - ret = ocfs2_assure_trans_credits(handle, credits); > - if (ret < 0) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > mlog_errno(ret); > break; I'd rather goto unlock directly without update i_size in case error. > } > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > + OCFS2_JOURNAL_ACCESS_WRITE); > + if (ret) { > + mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > + break; Ditto. > + } > ret = ocfs2_mark_extent_written(inode, &et, handle, > ue->ue_cpos, 1, > ue->ue_phys, > meta_ac, &dealloc); > if (ret < 0) { > mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > break; Ditto. BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() to save the extra call in case error. Thanks, Joseph > } > + ocfs2_commit_trans(osb, handle); > } > > if (end > i_size_read(inode)) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + mlog_errno(ret); > + goto unlock; > + } > ret = ocfs2_set_inode_size(handle, inode, di_bh, end); > if (ret < 0) > mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > } > -commit: > - ocfs2_commit_trans(osb, handle); > + > unlock: > up_write(&oi->ip_alloc_sem); > + > + /* everything looks good, let's start the cleanup */ > + if (dwc->dw_orphaned) { > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > + > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); > + if (ret < 0) > + mlog_errno(ret); > + } > ocfs2_inode_unlock(inode, 1); > brelse(di_bh); > out: ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-13 2:04 ` Joseph Qi @ 2026-03-13 3:17 ` Heming Zhao 2026-03-13 4:53 ` Joseph Qi 0 siblings, 1 reply; 13+ messages in thread From: Heming Zhao @ 2026-03-13 3:17 UTC (permalink / raw) To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: > Almost looks fine, minor updates below. > > On 3/13/26 12:27 AM, Heming Zhao wrote: > > During ocfs2 dio operations, JBD2 may report warnings via following call trace: > > ocfs2_dio_end_io_write > > ocfs2_mark_extent_written > > ocfs2_change_extent_flag > > ocfs2_split_extent > > ocfs2_try_to_merge_extent > > ocfs2_extend_rotate_transaction > > ocfs2_extend_trans > > jbd2__journal_restart > > start_this_handle > > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > > > > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > > handle each extent in a separate transaction. > > > > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > > only be removed from the orphan list after the extent tree update is complete. > > this ensures that if a crash occurs in the middle of extent tree updates, we > > won't leave stale blocks beyond EOF. > > > > This patch also removes the only call to ocfs2_assure_trans_credits(), which > > was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > > insufficient transaction credits"). > > > > Finally, thanks to Jans for providing the bug fix prototype and suggestions. > > > > Suggested-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > > --- > > fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > > 1 file changed, 29 insertions(+), 29 deletions(-) > > > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > > index 09146b43d1f0..91997b330d39 100644 > > --- a/fs/ocfs2/aops.c > > +++ b/fs/ocfs2/aops.c > > @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > goto out; > > } > > > > - /* Delete orphan before acquire i_rwsem. */ > > - if (dwc->dw_orphaned) { > > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > > - > > - end = end > i_size_read(inode) ? end : 0; > > - > > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > > - !!end, end); > > - if (ret < 0) > > - mlog_errno(ret); > > - } > > - > > down_write(&oi->ip_alloc_sem); > > di = (struct ocfs2_dinode *)di_bh->b_data; > > > > @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > > > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > > > > - handle = ocfs2_start_trans(osb, credits); > > - if (IS_ERR(handle)) { > > - ret = PTR_ERR(handle); > > - mlog_errno(ret); > > - goto unlock; > > - } > > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > > - OCFS2_JOURNAL_ACCESS_WRITE); > > - if (ret) { > > - mlog_errno(ret); > > - goto commit; > > - } > > - > > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > > - ret = ocfs2_assure_trans_credits(handle, credits); > > - if (ret < 0) { > > + handle = ocfs2_start_trans(osb, credits); > > + if (IS_ERR(handle)) { > > + ret = PTR_ERR(handle); > > mlog_errno(ret); > > break; > > I'd rather goto unlock directly without update i_size in case error. agree > > > } > > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > > + OCFS2_JOURNAL_ACCESS_WRITE); > > + if (ret) { > > + mlog_errno(ret); > > + ocfs2_commit_trans(osb, handle); > > + break; > > Ditto. agree > > > + } > > ret = ocfs2_mark_extent_written(inode, &et, handle, > > ue->ue_cpos, 1, > > ue->ue_phys, > > meta_ac, &dealloc); > > if (ret < 0) { > > mlog_errno(ret); > > + ocfs2_commit_trans(osb, handle); > > break; > > Ditto. The existing code still updates i_size even if ocfs2_mark_extent_written() returns an error. I am not certain whether updating i_size in this case is correct, but I prefer to maintain the original logic for now. Does that seem reasonable to you? > BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() > to save the extra call in case error. agree > > Thanks, > Joseph > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count is about 12107 ~ 12457. Regarding the transaction splitting in the unwritten handling loop: this patch introduces a minor performance regression. I would like to merge some of the transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. What do you think of this approach? Heming > > } > > + ocfs2_commit_trans(osb, handle); > > } > > > > if (end > i_size_read(inode)) { > > + handle = ocfs2_start_trans(osb, credits); > > + if (IS_ERR(handle)) { > > + ret = PTR_ERR(handle); > > + mlog_errno(ret); > > + goto unlock; > > + } > > ret = ocfs2_set_inode_size(handle, inode, di_bh, end); > > if (ret < 0) > > mlog_errno(ret); > > + ocfs2_commit_trans(osb, handle); > > } > > -commit: > > - ocfs2_commit_trans(osb, handle); > > + > > unlock: > > up_write(&oi->ip_alloc_sem); > > + > > + /* everything looks good, let's start the cleanup */ > > + if (dwc->dw_orphaned) { > > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > > + > > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); > > + if (ret < 0) > > + mlog_errno(ret); > > + } > > ocfs2_inode_unlock(inode, 1); > > brelse(di_bh); > > out: > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-13 3:17 ` Heming Zhao @ 2026-03-13 4:53 ` Joseph Qi 2026-03-13 5:35 ` Joseph Qi 2026-03-18 6:40 ` Heming Zhao 0 siblings, 2 replies; 13+ messages in thread From: Joseph Qi @ 2026-03-13 4:53 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara On 3/13/26 11:17 AM, Heming Zhao wrote: > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: >> Almost looks fine, minor updates below. >> >> On 3/13/26 12:27 AM, Heming Zhao wrote: >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: >>> ocfs2_dio_end_io_write >>> ocfs2_mark_extent_written >>> ocfs2_change_extent_flag >>> ocfs2_split_extent >>> ocfs2_try_to_merge_extent >>> ocfs2_extend_rotate_transaction >>> ocfs2_extend_trans >>> jbd2__journal_restart >>> start_this_handle >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 >>> >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to >>> handle each extent in a separate transaction. >>> >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should >>> only be removed from the orphan list after the extent tree update is complete. >>> this ensures that if a crash occurs in the middle of extent tree updates, we >>> won't leave stale blocks beyond EOF. >>> >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to >>> insufficient transaction credits"). >>> >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. >>> >>> Suggested-by: Jan Kara <jack@suse.cz> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>> --- >>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- >>> 1 file changed, 29 insertions(+), 29 deletions(-) >>> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>> index 09146b43d1f0..91997b330d39 100644 >>> --- a/fs/ocfs2/aops.c >>> +++ b/fs/ocfs2/aops.c >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>> goto out; >>> } >>> >>> - /* Delete orphan before acquire i_rwsem. */ >>> - if (dwc->dw_orphaned) { >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); >>> - >>> - end = end > i_size_read(inode) ? end : 0; >>> - >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, >>> - !!end, end); >>> - if (ret < 0) >>> - mlog_errno(ret); >>> - } >>> - >>> down_write(&oi->ip_alloc_sem); >>> di = (struct ocfs2_dinode *)di_bh->b_data; >>> >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>> >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); >>> >>> - handle = ocfs2_start_trans(osb, credits); >>> - if (IS_ERR(handle)) { >>> - ret = PTR_ERR(handle); >>> - mlog_errno(ret); >>> - goto unlock; >>> - } >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>> - OCFS2_JOURNAL_ACCESS_WRITE); >>> - if (ret) { >>> - mlog_errno(ret); >>> - goto commit; >>> - } >>> - >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { >>> - ret = ocfs2_assure_trans_credits(handle, credits); >>> - if (ret < 0) { >>> + handle = ocfs2_start_trans(osb, credits); >>> + if (IS_ERR(handle)) { >>> + ret = PTR_ERR(handle); >>> mlog_errno(ret); >>> break; >> >> I'd rather goto unlock directly without update i_size in case error. > > agree >> >>> } >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>> + OCFS2_JOURNAL_ACCESS_WRITE); >>> + if (ret) { >>> + mlog_errno(ret); >>> + ocfs2_commit_trans(osb, handle); >>> + break; >> >> Ditto. > > agree >> >>> + } >>> ret = ocfs2_mark_extent_written(inode, &et, handle, >>> ue->ue_cpos, 1, >>> ue->ue_phys, >>> meta_ac, &dealloc); >>> if (ret < 0) { >>> mlog_errno(ret); >>> + ocfs2_commit_trans(osb, handle); >>> break; >> >> Ditto. > > The existing code still updates i_size even if ocfs2_mark_extent_written() > returns an error. I am not certain whether updating i_size in this case is > correct, but I prefer to maintain the original logic for now. > Does that seem reasonable to you? > Since it returns 0 for unwritten extents, it looks fine. >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() >> to save the extra call in case error. > > agree > > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count > is about 12107 ~ 12457. > Regarding the transaction splitting in the unwritten handling loop: this patch > introduces a minor performance regression. I would like to merge some of the > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. > What do you think of this approach? > Seems we can limit max credit for a single transaction here? If exceeds, restart a new one. Joseph, Thanks ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-13 4:53 ` Joseph Qi @ 2026-03-13 5:35 ` Joseph Qi 2026-03-18 6:40 ` Heming Zhao 1 sibling, 0 replies; 13+ messages in thread From: Joseph Qi @ 2026-03-13 5:35 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara On 3/13/26 12:53 PM, Joseph Qi wrote: > > > On 3/13/26 11:17 AM, Heming Zhao wrote: >> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: >>> Almost looks fine, minor updates below. >>> >>> On 3/13/26 12:27 AM, Heming Zhao wrote: >>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: >>>> ocfs2_dio_end_io_write >>>> ocfs2_mark_extent_written >>>> ocfs2_change_extent_flag >>>> ocfs2_split_extent >>>> ocfs2_try_to_merge_extent >>>> ocfs2_extend_rotate_transaction >>>> ocfs2_extend_trans >>>> jbd2__journal_restart >>>> start_this_handle >>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 >>>> >>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to >>>> handle each extent in a separate transaction. >>>> >>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should >>>> only be removed from the orphan list after the extent tree update is complete. >>>> this ensures that if a crash occurs in the middle of extent tree updates, we >>>> won't leave stale blocks beyond EOF. >>>> >>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which >>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to >>>> insufficient transaction credits"). >>>> >>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. >>>> >>>> Suggested-by: Jan Kara <jack@suse.cz> >>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>>> --- >>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- >>>> 1 file changed, 29 insertions(+), 29 deletions(-) >>>> >>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>>> index 09146b43d1f0..91997b330d39 100644 >>>> --- a/fs/ocfs2/aops.c >>>> +++ b/fs/ocfs2/aops.c >>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>>> goto out; >>>> } >>>> >>>> - /* Delete orphan before acquire i_rwsem. */ >>>> - if (dwc->dw_orphaned) { >>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); >>>> - >>>> - end = end > i_size_read(inode) ? end : 0; >>>> - >>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, >>>> - !!end, end); >>>> - if (ret < 0) >>>> - mlog_errno(ret); >>>> - } >>>> - >>>> down_write(&oi->ip_alloc_sem); >>>> di = (struct ocfs2_dinode *)di_bh->b_data; >>>> >>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>>> >>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); >>>> >>>> - handle = ocfs2_start_trans(osb, credits); >>>> - if (IS_ERR(handle)) { >>>> - ret = PTR_ERR(handle); >>>> - mlog_errno(ret); >>>> - goto unlock; >>>> - } >>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>>> - OCFS2_JOURNAL_ACCESS_WRITE); >>>> - if (ret) { >>>> - mlog_errno(ret); >>>> - goto commit; >>>> - } >>>> - >>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { >>>> - ret = ocfs2_assure_trans_credits(handle, credits); >>>> - if (ret < 0) { >>>> + handle = ocfs2_start_trans(osb, credits); >>>> + if (IS_ERR(handle)) { >>>> + ret = PTR_ERR(handle); >>>> mlog_errno(ret); >>>> break; >>> >>> I'd rather goto unlock directly without update i_size in case error. >> >> agree >>> >>>> } >>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>>> + OCFS2_JOURNAL_ACCESS_WRITE); >>>> + if (ret) { >>>> + mlog_errno(ret); >>>> + ocfs2_commit_trans(osb, handle); >>>> + break; >>> >>> Ditto. >> >> agree >>> >>>> + } >>>> ret = ocfs2_mark_extent_written(inode, &et, handle, >>>> ue->ue_cpos, 1, >>>> ue->ue_phys, >>>> meta_ac, &dealloc); >>>> if (ret < 0) { >>>> mlog_errno(ret); >>>> + ocfs2_commit_trans(osb, handle); >>>> break; >>> >>> Ditto. >> >> The existing code still updates i_size even if ocfs2_mark_extent_written() >> returns an error. I am not certain whether updating i_size in this case is >> correct, but I prefer to maintain the original logic for now. >> Does that seem reasonable to you? >> > > Since it returns 0 for unwritten extents, it looks fine. > Think it a bit more, I think it would be more acceptable to behave consistent here. e.g. succeeds in the first round and fails to start transaction in the second, now it won't update i_size. Joseph ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-13 4:53 ` Joseph Qi 2026-03-13 5:35 ` Joseph Qi @ 2026-03-18 6:40 ` Heming Zhao 2026-03-19 1:53 ` Joseph Qi 2026-03-24 6:00 ` Heming Zhao 1 sibling, 2 replies; 13+ messages in thread From: Heming Zhao @ 2026-03-18 6:40 UTC (permalink / raw) To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote: > > > On 3/13/26 11:17 AM, Heming Zhao wrote: > > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: > >> Almost looks fine, minor updates below. > >> > >> On 3/13/26 12:27 AM, Heming Zhao wrote: > >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: > >>> ocfs2_dio_end_io_write > >>> ocfs2_mark_extent_written > >>> ocfs2_change_extent_flag > >>> ocfs2_split_extent > >>> ocfs2_try_to_merge_extent > >>> ocfs2_extend_rotate_transaction > >>> ocfs2_extend_trans > >>> jbd2__journal_restart > >>> start_this_handle > >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > >>> > >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > >>> handle each extent in a separate transaction. > >>> > >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > >>> only be removed from the orphan list after the extent tree update is complete. > >>> this ensures that if a crash occurs in the middle of extent tree updates, we > >>> won't leave stale blocks beyond EOF. > >>> > >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which > >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > >>> insufficient transaction credits"). > >>> > >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. > >>> > >>> Suggested-by: Jan Kara <jack@suse.cz> > >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> > >>> --- > >>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > >>> 1 file changed, 29 insertions(+), 29 deletions(-) > >>> > >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > >>> index 09146b43d1f0..91997b330d39 100644 > >>> --- a/fs/ocfs2/aops.c > >>> +++ b/fs/ocfs2/aops.c > >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > >>> goto out; > >>> } > >>> > >>> - /* Delete orphan before acquire i_rwsem. */ > >>> - if (dwc->dw_orphaned) { > >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > >>> - > >>> - end = end > i_size_read(inode) ? end : 0; > >>> - > >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > >>> - !!end, end); > >>> - if (ret < 0) > >>> - mlog_errno(ret); > >>> - } > >>> - > >>> down_write(&oi->ip_alloc_sem); > >>> di = (struct ocfs2_dinode *)di_bh->b_data; > >>> > >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > >>> > >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > >>> > >>> - handle = ocfs2_start_trans(osb, credits); > >>> - if (IS_ERR(handle)) { > >>> - ret = PTR_ERR(handle); > >>> - mlog_errno(ret); > >>> - goto unlock; > >>> - } > >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > >>> - OCFS2_JOURNAL_ACCESS_WRITE); > >>> - if (ret) { > >>> - mlog_errno(ret); > >>> - goto commit; > >>> - } > >>> - > >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > >>> - ret = ocfs2_assure_trans_credits(handle, credits); > >>> - if (ret < 0) { > >>> + handle = ocfs2_start_trans(osb, credits); > >>> + if (IS_ERR(handle)) { > >>> + ret = PTR_ERR(handle); > >>> mlog_errno(ret); > >>> break; > >> > >> I'd rather goto unlock directly without update i_size in case error. > > > > agree > >> > >>> } > >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > >>> + OCFS2_JOURNAL_ACCESS_WRITE); > >>> + if (ret) { > >>> + mlog_errno(ret); > >>> + ocfs2_commit_trans(osb, handle); > >>> + break; > >> > >> Ditto. > > > > agree > >> > >>> + } > >>> ret = ocfs2_mark_extent_written(inode, &et, handle, > >>> ue->ue_cpos, 1, > >>> ue->ue_phys, > >>> meta_ac, &dealloc); > >>> if (ret < 0) { > >>> mlog_errno(ret); > >>> + ocfs2_commit_trans(osb, handle); > >>> break; > >> > >> Ditto. > > > > The existing code still updates i_size even if ocfs2_mark_extent_written() > > returns an error. I am not certain whether updating i_size in this case is > > correct, but I prefer to maintain the original logic for now. > > Does that seem reasonable to you? > > > > Since it returns 0 for unwritten extents, it looks fine. > > >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() > >> to save the extra call in case error. > > > > agree > > > > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count > > is about 12107 ~ 12457. > > Regarding the transaction splitting in the unwritten handling loop: this patch > > introduces a minor performance regression. I would like to merge some of the > > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. > > What do you think of this approach? > > > > Seems we can limit max credit for a single transaction here? > If exceeds, restart a new one. > > Joseph, > Thanks sorry for the late reply, I spend some time running the tests. I have created two different types of patches (see below) to address this issue. time consumption results: ocfs2-dynamic-... patch real 2m9.900s user 0m0.333s sys 0m22.687s %200 of ocfs2-split-trans-... patch real 1m48.901s user 0m0.306s sys 0m23.019s %500 of ocfs2-split-trans-... patch real 1m49.350s user 0m0.299s sys 0m22.718s As shown above, the fixed "500% mode" is faster than the dynamic style. Which approach do you prefer? --- the test script: ``` #!/bin/bash # thanks Wolfgang DISK="vdc" MOUNTPOINT="/mnt/ocfs2" TEST_FILE="$MOUNTPOINT/test.bin" if lsblk | grep "$DISK.*$MOUNTPOINT"; then umount $MOUNTPOINT fi echo "------mkfs & mount----------------" mkfs.ocfs2 -b 4K -C 4K --cluster-stack=pcmk -N 2 /dev/$DISK --cluster-name=hacluster mount.ocfs2 -t ocfs2 /dev/$DISK $MOUNTPOINT sleep 1 echo "------fallocate----------------" fallocate -l 2G "$TEST_FILE" sleep 1 echo "------setup file----------------" fio --name=setup --filename="$TEST_FILE" --rw=randwrite --bs=4k \ --ioengine=libaio --iodepth=64 --size=2G --io_size=512M \ --direct=0 --end_fsync=1 --minimal sync sleep 3 echo "-----start test-----------------" time fio --name=tst --filename="$TEST_FILE" --rw=write --bs=64M \ --ioengine=libaio --iodepth=16 --size=2G --direct=1 ``` --- patch: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path --- fs/ocfs2/aops.c | 31 +++++++++++++++++-------------- fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++---- fs/ocfs2/journal.h | 5 +++-- 3 files changed, 50 insertions(+), 20 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 7a65d5a36a3e..d9be765662dc 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode, handle_t *handle = NULL; loff_t end = offset + bytes; int ret = 0, credits = 0; + /* do the same job of jbd2_max_user_trans_buffers() */ + journal_t *journal = osb->journal->j_journal; + int jbd2_max = journal->j_max_transaction_buffers - + journal->j_transaction_overhead_buffers; ocfs2_init_dealloc_ctxt(&dealloc); @@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, goto out; } - /* Delete orphan before acquire i_rwsem. */ - if (dwc->dw_orphaned) { - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); - - end = end > i_size_read(inode) ? end : 0; - - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, - !!end, end); - if (ret < 0) - mlog_errno(ret); - } - down_write(&oi->ip_alloc_sem); di = (struct ocfs2_dinode *)di_bh->b_data; @@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode, } list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { - ret = ocfs2_assure_trans_credits(handle, credits); - if (ret < 0) { + ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max); + if (ret == 1) { + goto unlock; + } else if (ret < 0) { mlog_errno(ret); break; } @@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode, ocfs2_commit_trans(osb, handle); unlock: up_write(&oi->ip_alloc_sem); + + if (dwc->dw_orphaned) { + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); + + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (ret < 0) + mlog_errno(ret); + } + ocfs2_inode_unlock(inode, 1); brelse(di_bh); out: diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index e5f58ff2175f..57ad69d19494 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks) * credits. Similar notes regarding data consistency and locking implications * as for ocfs2_extend_trans() apply here. */ -int ocfs2_assure_trans_credits(handle_t *handle, int nblocks) +int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle, + struct buffer_head *bh, int nblocks, int jbd2_max) { + handle_t *handle = *i_handle; int old_nblks = jbd2_handle_buffer_credits(handle); + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + int ret, wanted; + int plan = nblocks << 1; /* Double the credits to prevent boundary issues */ trace_ocfs2_assure_trans_credits(old_nblks); - if (old_nblks >= nblocks) - return 0; - return ocfs2_extend_trans(handle, nblocks - old_nblks); + + wanted = old_nblks + plan; + + if (wanted < jbd2_max) { + if (old_nblks < nblocks) + return ocfs2_extend_trans(handle, nblocks - old_nblks); + else + return 0; + } + + ocfs2_commit_trans(osb, handle); + handle = ocfs2_start_trans(osb, plan); + if (IS_ERR(handle)) { + mlog_errno(PTR_ERR(handle)); + return 1; /* caller must not commit trans for this error */ + } + + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) + mlog_errno(ret); + + *i_handle = handle; + return ret; } /* diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h index 6397170f302f..08a76ffb870a 100644 --- a/fs/ocfs2/journal.h +++ b/fs/ocfs2/journal.h @@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, int ocfs2_commit_trans(struct ocfs2_super *osb, handle_t *handle); int ocfs2_extend_trans(handle_t *handle, int nblocks); -int ocfs2_assure_trans_credits(handle_t *handle, - int nblocks); +int ocfs2_assure_trans_credits(struct inode *inode, + handle_t **i_handle, struct buffer_head *bh, + int nblocks, int jbd2_max); int ocfs2_allocate_extend_trans(handle_t *handle, int thresh); -- 2.43.0 ---------- patch: 0001-ocfs2-split-trans-in-end-dio-to-avoid-credit-exhaust.patch Subject: [PATCH] ocfs2: split trans in end dio to avoid credit exhaustion --- fs/ocfs2/aops.c | 77 +++++++++++++++++++++++++++++-------------------- 1 file changed, 45 insertions(+), 32 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 7a65d5a36a3e..9053f1ee587a 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2279,6 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode, handle_t *handle = NULL; loff_t end = offset + bytes; int ret = 0, credits = 0; + int cnt = 0, trans_start = 0, mod = 500; ocfs2_init_dealloc_ctxt(&dealloc); @@ -2295,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, goto out; } - /* Delete orphan before acquire i_rwsem. */ - if (dwc->dw_orphaned) { - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); - - end = end > i_size_read(inode) ? end : 0; - - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, - !!end, end); - if (ret < 0) - mlog_errno(ret); - } - down_write(&oi->ip_alloc_sem); di = (struct ocfs2_dinode *)di_bh->b_data; @@ -2327,44 +2316,68 @@ static int ocfs2_dio_end_io_write(struct inode *inode, credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); - handle = ocfs2_start_trans(osb, credits); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - mlog_errno(ret); - goto unlock; - } - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, - OCFS2_JOURNAL_ACCESS_WRITE); - if (ret) { - mlog_errno(ret); - goto commit; - } - list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { - ret = ocfs2_assure_trans_credits(handle, credits); - if (ret < 0) { - mlog_errno(ret); - break; + if ((cnt++ % mod) == 0) { + trans_start = 1; + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + mlog_errno(ret); + goto unlock; + } + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) { + mlog_errno(ret); + goto unlock; + } } ret = ocfs2_mark_extent_written(inode, &et, handle, ue->ue_cpos, 1, ue->ue_phys, meta_ac, &dealloc); + if ((cnt % mod) == 0) { + ocfs2_commit_trans(osb, handle); + if (ret < 0) { + mlog_errno(ret); + break; + } + trans_start = 0; + } + } + + //commit if above loop doesn't do + if (trans_start) { + ocfs2_commit_trans(osb, handle); if (ret < 0) { mlog_errno(ret); - break; } } if (end > i_size_read(inode)) { + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + mlog_errno(ret); + goto unlock; + } ret = ocfs2_set_inode_size(handle, inode, di_bh, end); if (ret < 0) mlog_errno(ret); + ocfs2_commit_trans(osb, handle); } -commit: - ocfs2_commit_trans(osb, handle); + unlock: up_write(&oi->ip_alloc_sem); + + /* everything looks good, let's start the cleanup */ + if (dwc->dw_orphaned) { + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); + + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (ret < 0) + mlog_errno(ret); + } ocfs2_inode_unlock(inode, 1); brelse(di_bh); out: -- 2.43.0 Thanks, Heming ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-18 6:40 ` Heming Zhao @ 2026-03-19 1:53 ` Joseph Qi 2026-03-24 7:18 ` Heming Zhao 2026-03-25 3:31 ` Heming Zhao 2026-03-24 6:00 ` Heming Zhao 1 sibling, 2 replies; 13+ messages in thread From: Joseph Qi @ 2026-03-19 1:53 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su On 3/18/26 2:40 PM, Heming Zhao wrote: > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote: >> >> >> On 3/13/26 11:17 AM, Heming Zhao wrote: >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: >>>> Almost looks fine, minor updates below. >>>> >>>> On 3/13/26 12:27 AM, Heming Zhao wrote: >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: >>>>> ocfs2_dio_end_io_write >>>>> ocfs2_mark_extent_written >>>>> ocfs2_change_extent_flag >>>>> ocfs2_split_extent >>>>> ocfs2_try_to_merge_extent >>>>> ocfs2_extend_rotate_transaction >>>>> ocfs2_extend_trans >>>>> jbd2__journal_restart >>>>> start_this_handle >>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 >>>>> >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to >>>>> handle each extent in a separate transaction. >>>>> >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should >>>>> only be removed from the orphan list after the extent tree update is complete. >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we >>>>> won't leave stale blocks beyond EOF. >>>>> >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to >>>>> insufficient transaction credits"). >>>>> >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. >>>>> >>>>> Suggested-by: Jan Kara <jack@suse.cz> >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>>>> --- >>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- >>>>> 1 file changed, 29 insertions(+), 29 deletions(-) >>>>> >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>>>> index 09146b43d1f0..91997b330d39 100644 >>>>> --- a/fs/ocfs2/aops.c >>>>> +++ b/fs/ocfs2/aops.c >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>>>> goto out; >>>>> } >>>>> >>>>> - /* Delete orphan before acquire i_rwsem. */ >>>>> - if (dwc->dw_orphaned) { >>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); >>>>> - >>>>> - end = end > i_size_read(inode) ? end : 0; >>>>> - >>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, >>>>> - !!end, end); >>>>> - if (ret < 0) >>>>> - mlog_errno(ret); >>>>> - } >>>>> - >>>>> down_write(&oi->ip_alloc_sem); >>>>> di = (struct ocfs2_dinode *)di_bh->b_data; >>>>> >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>>>> >>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); >>>>> >>>>> - handle = ocfs2_start_trans(osb, credits); >>>>> - if (IS_ERR(handle)) { >>>>> - ret = PTR_ERR(handle); >>>>> - mlog_errno(ret); >>>>> - goto unlock; >>>>> - } >>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>>>> - OCFS2_JOURNAL_ACCESS_WRITE); >>>>> - if (ret) { >>>>> - mlog_errno(ret); >>>>> - goto commit; >>>>> - } >>>>> - >>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { >>>>> - ret = ocfs2_assure_trans_credits(handle, credits); >>>>> - if (ret < 0) { >>>>> + handle = ocfs2_start_trans(osb, credits); >>>>> + if (IS_ERR(handle)) { >>>>> + ret = PTR_ERR(handle); >>>>> mlog_errno(ret); >>>>> break; >>>> >>>> I'd rather goto unlock directly without update i_size in case error. >>> >>> agree >>>> >>>>> } >>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>>>> + OCFS2_JOURNAL_ACCESS_WRITE); >>>>> + if (ret) { >>>>> + mlog_errno(ret); >>>>> + ocfs2_commit_trans(osb, handle); >>>>> + break; >>>> >>>> Ditto. >>> >>> agree >>>> >>>>> + } >>>>> ret = ocfs2_mark_extent_written(inode, &et, handle, >>>>> ue->ue_cpos, 1, >>>>> ue->ue_phys, >>>>> meta_ac, &dealloc); >>>>> if (ret < 0) { >>>>> mlog_errno(ret); >>>>> + ocfs2_commit_trans(osb, handle); >>>>> break; >>>> >>>> Ditto. >>> >>> The existing code still updates i_size even if ocfs2_mark_extent_written() >>> returns an error. I am not certain whether updating i_size in this case is >>> correct, but I prefer to maintain the original logic for now. >>> Does that seem reasonable to you? >>> >> >> Since it returns 0 for unwritten extents, it looks fine. >> >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() >>>> to save the extra call in case error. >>> >>> agree >>> >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count >>> is about 12107 ~ 12457. >>> Regarding the transaction splitting in the unwritten handling loop: this patch >>> introduces a minor performance regression. I would like to merge some of the >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. >>> What do you think of this approach? >>> >> >> Seems we can limit max credit for a single transaction here? >> If exceeds, restart a new one. >> >> Joseph, >> Thanks > > sorry for the late reply, I spend some time running the tests. > I have created two different types of patches (see below) to address this issue. > > time consumption results: > > ocfs2-dynamic-... patch > real 2m9.900s > user 0m0.333s > sys 0m22.687s > > %200 of ocfs2-split-trans-... patch > real 1m48.901s > user 0m0.306s > sys 0m23.019s > > %500 of ocfs2-split-trans-... patch > real 1m49.350s > user 0m0.299s > sys 0m22.718s > > As shown above, the fixed "500% mode" is faster than the dynamic style. > Which approach do you prefer? > I'd prefer solution B, it looks simpler to directly set a commit batch. Some comments for the formal patch: 1. I don't think you have to use 'cnt + trans_start + mod' to control the batch commit, just check 'batch + handle' seems enough. 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH, 3. Do not update i_size in case error. 4. Correctly handle commit transaction in case error. Thanks, Joseph ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-19 1:53 ` Joseph Qi @ 2026-03-24 7:18 ` Heming Zhao 2026-03-25 3:31 ` Heming Zhao 1 sibling, 0 replies; 13+ messages in thread From: Heming Zhao @ 2026-03-24 7:18 UTC (permalink / raw) To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su Sorry, I just found this mail. Please ignore my previous mail. On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote: > > > On 3/18/26 2:40 PM, Heming Zhao wrote: > > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote: > >> > >> > >> On 3/13/26 11:17 AM, Heming Zhao wrote: > >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: > >>>> Almost looks fine, minor updates below. > >>>> > >>>> On 3/13/26 12:27 AM, Heming Zhao wrote: > >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: > >>>>> ocfs2_dio_end_io_write > >>>>> ocfs2_mark_extent_written > >>>>> ocfs2_change_extent_flag > >>>>> ocfs2_split_extent > >>>>> ocfs2_try_to_merge_extent > >>>>> ocfs2_extend_rotate_transaction > >>>>> ocfs2_extend_trans > >>>>> jbd2__journal_restart > >>>>> start_this_handle > >>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > >>>>> > >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > >>>>> handle each extent in a separate transaction. > >>>>> > >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > >>>>> only be removed from the orphan list after the extent tree update is complete. > >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we > >>>>> won't leave stale blocks beyond EOF. > >>>>> > >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which > >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > >>>>> insufficient transaction credits"). > >>>>> > >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. > >>>>> > >>>>> Suggested-by: Jan Kara <jack@suse.cz> > >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> > >>>>> --- > >>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > >>>>> 1 file changed, 29 insertions(+), 29 deletions(-) > >>>>> > >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > >>>>> index 09146b43d1f0..91997b330d39 100644 > >>>>> --- a/fs/ocfs2/aops.c > >>>>> +++ b/fs/ocfs2/aops.c > >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > >>>>> goto out; > >>>>> } > >>>>> > >>>>> - /* Delete orphan before acquire i_rwsem. */ > >>>>> - if (dwc->dw_orphaned) { > >>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > >>>>> - > >>>>> - end = end > i_size_read(inode) ? end : 0; > >>>>> - > >>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > >>>>> - !!end, end); > >>>>> - if (ret < 0) > >>>>> - mlog_errno(ret); > >>>>> - } > >>>>> - > >>>>> down_write(&oi->ip_alloc_sem); > >>>>> di = (struct ocfs2_dinode *)di_bh->b_data; > >>>>> > >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > >>>>> > >>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > >>>>> > >>>>> - handle = ocfs2_start_trans(osb, credits); > >>>>> - if (IS_ERR(handle)) { > >>>>> - ret = PTR_ERR(handle); > >>>>> - mlog_errno(ret); > >>>>> - goto unlock; > >>>>> - } > >>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > >>>>> - OCFS2_JOURNAL_ACCESS_WRITE); > >>>>> - if (ret) { > >>>>> - mlog_errno(ret); > >>>>> - goto commit; > >>>>> - } > >>>>> - > >>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > >>>>> - ret = ocfs2_assure_trans_credits(handle, credits); > >>>>> - if (ret < 0) { > >>>>> + handle = ocfs2_start_trans(osb, credits); > >>>>> + if (IS_ERR(handle)) { > >>>>> + ret = PTR_ERR(handle); > >>>>> mlog_errno(ret); > >>>>> break; > >>>> > >>>> I'd rather goto unlock directly without update i_size in case error. > >>> > >>> agree > >>>> > >>>>> } > >>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > >>>>> + OCFS2_JOURNAL_ACCESS_WRITE); > >>>>> + if (ret) { > >>>>> + mlog_errno(ret); > >>>>> + ocfs2_commit_trans(osb, handle); > >>>>> + break; > >>>> > >>>> Ditto. > >>> > >>> agree > >>>> > >>>>> + } > >>>>> ret = ocfs2_mark_extent_written(inode, &et, handle, > >>>>> ue->ue_cpos, 1, > >>>>> ue->ue_phys, > >>>>> meta_ac, &dealloc); > >>>>> if (ret < 0) { > >>>>> mlog_errno(ret); > >>>>> + ocfs2_commit_trans(osb, handle); > >>>>> break; > >>>> > >>>> Ditto. > >>> > >>> The existing code still updates i_size even if ocfs2_mark_extent_written() > >>> returns an error. I am not certain whether updating i_size in this case is > >>> correct, but I prefer to maintain the original logic for now. > >>> Does that seem reasonable to you? > >>> > >> > >> Since it returns 0 for unwritten extents, it looks fine. > >> > >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() > >>>> to save the extra call in case error. > >>> > >>> agree > >>> > >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count > >>> is about 12107 ~ 12457. > >>> Regarding the transaction splitting in the unwritten handling loop: this patch > >>> introduces a minor performance regression. I would like to merge some of the > >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. > >>> What do you think of this approach? > >>> > >> > >> Seems we can limit max credit for a single transaction here? > >> If exceeds, restart a new one. > >> > >> Joseph, > >> Thanks > > > > sorry for the late reply, I spend some time running the tests. > > I have created two different types of patches (see below) to address this issue. > > > > time consumption results: > > > > ocfs2-dynamic-... patch > > real 2m9.900s > > user 0m0.333s > > sys 0m22.687s > > > > %200 of ocfs2-split-trans-... patch > > real 1m48.901s > > user 0m0.306s > > sys 0m23.019s > > > > %500 of ocfs2-split-trans-... patch > > real 1m49.350s > > user 0m0.299s > > sys 0m22.718s > > > > As shown above, the fixed "500% mode" is faster than the dynamic style. > > Which approach do you prefer? > > > I'd prefer solution B, it looks simpler to directly set a commit batch. > Some comments for the formal patch: > 1. I don't think you have to use 'cnt + trans_start + mod' to control > the batch commit, just check 'batch + handle' seems enough. > 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH, > 3. Do not update i_size in case error. > 4. Correctly handle commit transaction in case error. > > Thanks, > Joseph > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-19 1:53 ` Joseph Qi 2026-03-24 7:18 ` Heming Zhao @ 2026-03-25 3:31 ` Heming Zhao 2026-03-25 6:33 ` Joseph Qi 1 sibling, 1 reply; 13+ messages in thread From: Heming Zhao @ 2026-03-25 3:31 UTC (permalink / raw) To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote: > > > On 3/18/26 2:40 PM, Heming Zhao wrote: > > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote: > >> > >> > >> On 3/13/26 11:17 AM, Heming Zhao wrote: > >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: > >>>> Almost looks fine, minor updates below. > >>>> > >>>> On 3/13/26 12:27 AM, Heming Zhao wrote: > >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: > >>>>> ocfs2_dio_end_io_write > >>>>> ocfs2_mark_extent_written > >>>>> ocfs2_change_extent_flag > >>>>> ocfs2_split_extent > >>>>> ocfs2_try_to_merge_extent > >>>>> ocfs2_extend_rotate_transaction > >>>>> ocfs2_extend_trans > >>>>> jbd2__journal_restart > >>>>> start_this_handle > >>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > >>>>> > >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > >>>>> handle each extent in a separate transaction. > >>>>> > >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > >>>>> only be removed from the orphan list after the extent tree update is complete. > >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we > >>>>> won't leave stale blocks beyond EOF. > >>>>> > >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which > >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > >>>>> insufficient transaction credits"). > >>>>> > >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. > >>>>> > >>>>> Suggested-by: Jan Kara <jack@suse.cz> > >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> > >>>>> --- > >>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > >>>>> 1 file changed, 29 insertions(+), 29 deletions(-) > >>>>> > >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > >>>>> index 09146b43d1f0..91997b330d39 100644 > >>>>> --- a/fs/ocfs2/aops.c > >>>>> +++ b/fs/ocfs2/aops.c > >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > >>>>> goto out; > >>>>> } > >>>>> > >>>>> - /* Delete orphan before acquire i_rwsem. */ > >>>>> - if (dwc->dw_orphaned) { > >>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > >>>>> - > >>>>> - end = end > i_size_read(inode) ? end : 0; > >>>>> - > >>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > >>>>> - !!end, end); > >>>>> - if (ret < 0) > >>>>> - mlog_errno(ret); > >>>>> - } > >>>>> - > >>>>> down_write(&oi->ip_alloc_sem); > >>>>> di = (struct ocfs2_dinode *)di_bh->b_data; > >>>>> > >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > >>>>> > >>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > >>>>> > >>>>> - handle = ocfs2_start_trans(osb, credits); > >>>>> - if (IS_ERR(handle)) { > >>>>> - ret = PTR_ERR(handle); > >>>>> - mlog_errno(ret); > >>>>> - goto unlock; > >>>>> - } > >>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > >>>>> - OCFS2_JOURNAL_ACCESS_WRITE); > >>>>> - if (ret) { > >>>>> - mlog_errno(ret); > >>>>> - goto commit; > >>>>> - } > >>>>> - > >>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > >>>>> - ret = ocfs2_assure_trans_credits(handle, credits); > >>>>> - if (ret < 0) { > >>>>> + handle = ocfs2_start_trans(osb, credits); > >>>>> + if (IS_ERR(handle)) { > >>>>> + ret = PTR_ERR(handle); > >>>>> mlog_errno(ret); > >>>>> break; > >>>> > >>>> I'd rather goto unlock directly without update i_size in case error. > >>> > >>> agree > >>>> > >>>>> } > >>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > >>>>> + OCFS2_JOURNAL_ACCESS_WRITE); > >>>>> + if (ret) { > >>>>> + mlog_errno(ret); > >>>>> + ocfs2_commit_trans(osb, handle); > >>>>> + break; > >>>> > >>>> Ditto. > >>> > >>> agree > >>>> > >>>>> + } > >>>>> ret = ocfs2_mark_extent_written(inode, &et, handle, > >>>>> ue->ue_cpos, 1, > >>>>> ue->ue_phys, > >>>>> meta_ac, &dealloc); > >>>>> if (ret < 0) { > >>>>> mlog_errno(ret); > >>>>> + ocfs2_commit_trans(osb, handle); > >>>>> break; > >>>> > >>>> Ditto. > >>> > >>> The existing code still updates i_size even if ocfs2_mark_extent_written() > >>> returns an error. I am not certain whether updating i_size in this case is > >>> correct, but I prefer to maintain the original logic for now. > >>> Does that seem reasonable to you? > >>> > >> > >> Since it returns 0 for unwritten extents, it looks fine. > >> > >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() > >>>> to save the extra call in case error. > >>> > >>> agree > >>> > >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count > >>> is about 12107 ~ 12457. > >>> Regarding the transaction splitting in the unwritten handling loop: this patch > >>> introduces a minor performance regression. I would like to merge some of the > >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. > >>> What do you think of this approach? > >>> > >> > >> Seems we can limit max credit for a single transaction here? > >> If exceeds, restart a new one. > >> > >> Joseph, > >> Thanks > > > > sorry for the late reply, I spend some time running the tests. > > I have created two different types of patches (see below) to address this issue. > > > > time consumption results: > > > > ocfs2-dynamic-... patch > > real 2m9.900s > > user 0m0.333s > > sys 0m22.687s > > > > %200 of ocfs2-split-trans-... patch > > real 1m48.901s > > user 0m0.306s > > sys 0m23.019s > > > > %500 of ocfs2-split-trans-... patch > > real 1m49.350s > > user 0m0.299s > > sys 0m22.718s > > > > As shown above, the fixed "500% mode" is faster than the dynamic style. > > Which approach do you prefer? > > > I'd prefer solution B, it looks simpler to directly set a commit batch. > Some comments for the formal patch: > 1. I don't think you have to use 'cnt + trans_start + mod' to control > the batch commit, just check 'batch + handle' seems enough. > 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH, > 3. Do not update i_size in case error. > 4. Correctly handle commit transaction in case error. > > Thanks, > Joseph > I hope I understand your point correctly. creating and setting OCFS2_DIO_MARK_EXTENT_BATCH to 800, corresponding to the %200 used in my previous 'cnt+trans_start+mod' patch. the time consumption: real 1m50.508s user 0m0.365s sys 0m22.829s ``` diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 7a65d5a36a3e..4e3a51e86e32 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -37,6 +37,8 @@ #include "namei.h" #include "sysfile.h" +#define OCFS2_DIO_MARK_EXTENT_BATCH 800 + static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { @@ -2279,6 +2281,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode, handle_t *handle = NULL; loff_t end = offset + bytes; int ret = 0, credits = 0; + int trans_err = 0, wanted; ocfs2_init_dealloc_ctxt(&dealloc); @@ -2295,18 +2298,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, goto out; } - /* Delete orphan before acquire i_rwsem. */ - if (dwc->dw_orphaned) { - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); - - end = end > i_size_read(inode) ? end : 0; - - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, - !!end, end); - if (ret < 0) - mlog_errno(ret); - } - down_write(&oi->ip_alloc_sem); di = (struct ocfs2_dinode *)di_bh->b_data; @@ -2341,11 +2332,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, } list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { - ret = ocfs2_assure_trans_credits(handle, credits); - if (ret < 0) { - mlog_errno(ret); - break; - } ret = ocfs2_mark_extent_written(inode, &et, handle, ue->ue_cpos, 1, ue->ue_phys, @@ -2354,17 +2340,46 @@ static int ocfs2_dio_end_io_write(struct inode *inode, mlog_errno(ret); break; } - } - if (end > i_size_read(inode)) { - ret = ocfs2_set_inode_size(handle, inode, di_bh, end); - if (ret < 0) - mlog_errno(ret); + wanted = jbd2_handle_buffer_credits(handle) + (credits << 2); + if (wanted > OCFS2_DIO_MARK_EXTENT_BATCH) { + ocfs2_commit_trans(osb, handle); + handle = ocfs2_start_trans(osb, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + mlog_errno(ret); + trans_err = 1; + break; + } + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) { + mlog_errno(ret); + break; + } + } } + + if (!trans_err) { + if (end > i_size_read(inode)) { + ret = ocfs2_set_inode_size(handle, inode, di_bh, end); + if (ret < 0) + mlog_errno(ret); + } commit: - ocfs2_commit_trans(osb, handle); + ocfs2_commit_trans(osb, handle); + } unlock: up_write(&oi->ip_alloc_sem); + + /* everything looks good, let's start the cleanup */ + if (dwc->dw_orphaned) { + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); + + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (ret < 0) + mlog_errno(ret); + } ocfs2_inode_unlock(inode, 1); brelse(di_bh); out: ``` if this approach is acceptable, I will send v3. - Heming ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-25 3:31 ` Heming Zhao @ 2026-03-25 6:33 ` Joseph Qi 0 siblings, 0 replies; 13+ messages in thread From: Joseph Qi @ 2026-03-25 6:33 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su On 3/25/26 11:31 AM, Heming Zhao wrote: > On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote: >> >> >> On 3/18/26 2:40 PM, Heming Zhao wrote: >>> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 3/13/26 11:17 AM, Heming Zhao wrote: >>>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: >>>>>> Almost looks fine, minor updates below. >>>>>> >>>>>> On 3/13/26 12:27 AM, Heming Zhao wrote: >>>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: >>>>>>> ocfs2_dio_end_io_write >>>>>>> ocfs2_mark_extent_written >>>>>>> ocfs2_change_extent_flag >>>>>>> ocfs2_split_extent >>>>>>> ocfs2_try_to_merge_extent >>>>>>> ocfs2_extend_rotate_transaction >>>>>>> ocfs2_extend_trans >>>>>>> jbd2__journal_restart >>>>>>> start_this_handle >>>>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 >>>>>>> >>>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to >>>>>>> handle each extent in a separate transaction. >>>>>>> >>>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should >>>>>>> only be removed from the orphan list after the extent tree update is complete. >>>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we >>>>>>> won't leave stale blocks beyond EOF. >>>>>>> >>>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which >>>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to >>>>>>> insufficient transaction credits"). >>>>>>> >>>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. >>>>>>> >>>>>>> Suggested-by: Jan Kara <jack@suse.cz> >>>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>>>>>> --- >>>>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- >>>>>>> 1 file changed, 29 insertions(+), 29 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>>>>>> index 09146b43d1f0..91997b330d39 100644 >>>>>>> --- a/fs/ocfs2/aops.c >>>>>>> +++ b/fs/ocfs2/aops.c >>>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>>>>>> goto out; >>>>>>> } >>>>>>> >>>>>>> - /* Delete orphan before acquire i_rwsem. */ >>>>>>> - if (dwc->dw_orphaned) { >>>>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); >>>>>>> - >>>>>>> - end = end > i_size_read(inode) ? end : 0; >>>>>>> - >>>>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, >>>>>>> - !!end, end); >>>>>>> - if (ret < 0) >>>>>>> - mlog_errno(ret); >>>>>>> - } >>>>>>> - >>>>>>> down_write(&oi->ip_alloc_sem); >>>>>>> di = (struct ocfs2_dinode *)di_bh->b_data; >>>>>>> >>>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, >>>>>>> >>>>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); >>>>>>> >>>>>>> - handle = ocfs2_start_trans(osb, credits); >>>>>>> - if (IS_ERR(handle)) { >>>>>>> - ret = PTR_ERR(handle); >>>>>>> - mlog_errno(ret); >>>>>>> - goto unlock; >>>>>>> - } >>>>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>>>>>> - OCFS2_JOURNAL_ACCESS_WRITE); >>>>>>> - if (ret) { >>>>>>> - mlog_errno(ret); >>>>>>> - goto commit; >>>>>>> - } >>>>>>> - >>>>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { >>>>>>> - ret = ocfs2_assure_trans_credits(handle, credits); >>>>>>> - if (ret < 0) { >>>>>>> + handle = ocfs2_start_trans(osb, credits); >>>>>>> + if (IS_ERR(handle)) { >>>>>>> + ret = PTR_ERR(handle); >>>>>>> mlog_errno(ret); >>>>>>> break; >>>>>> >>>>>> I'd rather goto unlock directly without update i_size in case error. >>>>> >>>>> agree >>>>>> >>>>>>> } >>>>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, >>>>>>> + OCFS2_JOURNAL_ACCESS_WRITE); >>>>>>> + if (ret) { >>>>>>> + mlog_errno(ret); >>>>>>> + ocfs2_commit_trans(osb, handle); >>>>>>> + break; >>>>>> >>>>>> Ditto. >>>>> >>>>> agree >>>>>> >>>>>>> + } >>>>>>> ret = ocfs2_mark_extent_written(inode, &et, handle, >>>>>>> ue->ue_cpos, 1, >>>>>>> ue->ue_phys, >>>>>>> meta_ac, &dealloc); >>>>>>> if (ret < 0) { >>>>>>> mlog_errno(ret); >>>>>>> + ocfs2_commit_trans(osb, handle); >>>>>>> break; >>>>>> >>>>>> Ditto. >>>>> >>>>> The existing code still updates i_size even if ocfs2_mark_extent_written() >>>>> returns an error. I am not certain whether updating i_size in this case is >>>>> correct, but I prefer to maintain the original logic for now. >>>>> Does that seem reasonable to you? >>>>> >>>> >>>> Since it returns 0 for unwritten extents, it looks fine. >>>> >>>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() >>>>>> to save the extra call in case error. >>>>> >>>>> agree >>>>> >>>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count >>>>> is about 12107 ~ 12457. >>>>> Regarding the transaction splitting in the unwritten handling loop: this patch >>>>> introduces a minor performance regression. I would like to merge some of the >>>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. >>>>> What do you think of this approach? >>>>> >>>> >>>> Seems we can limit max credit for a single transaction here? >>>> If exceeds, restart a new one. >>>> >>>> Joseph, >>>> Thanks >>> >>> sorry for the late reply, I spend some time running the tests. >>> I have created two different types of patches (see below) to address this issue. >>> >>> time consumption results: >>> >>> ocfs2-dynamic-... patch >>> real 2m9.900s >>> user 0m0.333s >>> sys 0m22.687s >>> >>> %200 of ocfs2-split-trans-... patch >>> real 1m48.901s >>> user 0m0.306s >>> sys 0m23.019s >>> >>> %500 of ocfs2-split-trans-... patch >>> real 1m49.350s >>> user 0m0.299s >>> sys 0m22.718s >>> >>> As shown above, the fixed "500% mode" is faster than the dynamic style. >>> Which approach do you prefer? >>> >> I'd prefer solution B, it looks simpler to directly set a commit batch. >> Some comments for the formal patch: >> 1. I don't think you have to use 'cnt + trans_start + mod' to control >> the batch commit, just check 'batch + handle' seems enough. >> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH, >> 3. Do not update i_size in case error. >> 4. Correctly handle commit transaction in case error. >> >> Thanks, >> Joseph >> > > I hope I understand your point correctly. > creating and setting OCFS2_DIO_MARK_EXTENT_BATCH to 800, corresponding to the > %200 used in my previous 'cnt+trans_start+mod' patch. > Seems you've proposed a new solution? In your previous mail, I think it is: list_for_each_entry() { if (!handle) ocfs2_start_trans(); ocfs2_journal_access_di() ocfs2_mark_extent_written() if (batch == OCFS2_DIO_MARK_EXTENT_BATCH) { ocfs2_commit_trans(); handle = NULL; } } Joseph ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-18 6:40 ` Heming Zhao 2026-03-19 1:53 ` Joseph Qi @ 2026-03-24 6:00 ` Heming Zhao 1 sibling, 0 replies; 13+ messages in thread From: Heming Zhao @ 2026-03-24 6:00 UTC (permalink / raw) To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su ping... On Wed, Mar 18, 2026 at 02:40:45PM +0800, Heming Zhao wrote: > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote: > > > > > > On 3/13/26 11:17 AM, Heming Zhao wrote: > > > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote: > > >> Almost looks fine, minor updates below. > > >> > > >> On 3/13/26 12:27 AM, Heming Zhao wrote: > > >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace: > > >>> ocfs2_dio_end_io_write > > >>> ocfs2_mark_extent_written > > >>> ocfs2_change_extent_flag > > >>> ocfs2_split_extent > > >>> ocfs2_try_to_merge_extent > > >>> ocfs2_extend_rotate_transaction > > >>> ocfs2_extend_trans > > >>> jbd2__journal_restart > > >>> start_this_handle > > >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > > >>> > > >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > > >>> handle each extent in a separate transaction. > > >>> > > >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > > >>> only be removed from the orphan list after the extent tree update is complete. > > >>> this ensures that if a crash occurs in the middle of extent tree updates, we > > >>> won't leave stale blocks beyond EOF. > > >>> > > >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which > > >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > > >>> insufficient transaction credits"). > > >>> > > >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions. > > >>> > > >>> Suggested-by: Jan Kara <jack@suse.cz> > > >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> > > >>> --- > > >>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > > >>> 1 file changed, 29 insertions(+), 29 deletions(-) > > >>> > > >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > > >>> index 09146b43d1f0..91997b330d39 100644 > > >>> --- a/fs/ocfs2/aops.c > > >>> +++ b/fs/ocfs2/aops.c > > >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > >>> goto out; > > >>> } > > >>> > > >>> - /* Delete orphan before acquire i_rwsem. */ > > >>> - if (dwc->dw_orphaned) { > > >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > > >>> - > > >>> - end = end > i_size_read(inode) ? end : 0; > > >>> - > > >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > > >>> - !!end, end); > > >>> - if (ret < 0) > > >>> - mlog_errno(ret); > > >>> - } > > >>> - > > >>> down_write(&oi->ip_alloc_sem); > > >>> di = (struct ocfs2_dinode *)di_bh->b_data; > > >>> > > >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > >>> > > >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > > >>> > > >>> - handle = ocfs2_start_trans(osb, credits); > > >>> - if (IS_ERR(handle)) { > > >>> - ret = PTR_ERR(handle); > > >>> - mlog_errno(ret); > > >>> - goto unlock; > > >>> - } > > >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > > >>> - OCFS2_JOURNAL_ACCESS_WRITE); > > >>> - if (ret) { > > >>> - mlog_errno(ret); > > >>> - goto commit; > > >>> - } > > >>> - > > >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > > >>> - ret = ocfs2_assure_trans_credits(handle, credits); > > >>> - if (ret < 0) { > > >>> + handle = ocfs2_start_trans(osb, credits); > > >>> + if (IS_ERR(handle)) { > > >>> + ret = PTR_ERR(handle); > > >>> mlog_errno(ret); > > >>> break; > > >> > > >> I'd rather goto unlock directly without update i_size in case error. > > > > > > agree > > >> > > >>> } > > >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > > >>> + OCFS2_JOURNAL_ACCESS_WRITE); > > >>> + if (ret) { > > >>> + mlog_errno(ret); > > >>> + ocfs2_commit_trans(osb, handle); > > >>> + break; > > >> > > >> Ditto. > > > > > > agree > > >> > > >>> + } > > >>> ret = ocfs2_mark_extent_written(inode, &et, handle, > > >>> ue->ue_cpos, 1, > > >>> ue->ue_phys, > > >>> meta_ac, &dealloc); > > >>> if (ret < 0) { > > >>> mlog_errno(ret); > > >>> + ocfs2_commit_trans(osb, handle); > > >>> break; > > >> > > >> Ditto. > > > > > > The existing code still updates i_size even if ocfs2_mark_extent_written() > > > returns an error. I am not certain whether updating i_size in this case is > > > correct, but I prefer to maintain the original logic for now. > > > Does that seem reasonable to you? > > > > > > > Since it returns 0 for unwritten extents, it looks fine. > > > > >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written() > > >> to save the extra call in case error. > > > > > > agree > > > > > > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count > > > is about 12107 ~ 12457. > > > Regarding the transaction splitting in the unwritten handling loop: this patch > > > introduces a minor performance regression. I would like to merge some of the > > > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations. > > > What do you think of this approach? > > > > > > > Seems we can limit max credit for a single transaction here? > > If exceeds, restart a new one. > > > > Joseph, > > Thanks > > sorry for the late reply, I spend some time running the tests. > I have created two different types of patches (see below) to address this issue. > > time consumption results: > > ocfs2-dynamic-... patch > real 2m9.900s > user 0m0.333s > sys 0m22.687s > > %200 of ocfs2-split-trans-... patch > real 1m48.901s > user 0m0.306s > sys 0m23.019s > > %500 of ocfs2-split-trans-... patch > real 1m49.350s > user 0m0.299s > sys 0m22.718s > > As shown above, the fixed "500% mode" is faster than the dynamic style. > Which approach do you prefer? > > --- > the test script: > ``` > #!/bin/bash > # thanks Wolfgang > > DISK="vdc" > MOUNTPOINT="/mnt/ocfs2" > TEST_FILE="$MOUNTPOINT/test.bin" > > if lsblk | grep "$DISK.*$MOUNTPOINT"; then > umount $MOUNTPOINT > fi > echo "------mkfs & mount----------------" > mkfs.ocfs2 -b 4K -C 4K --cluster-stack=pcmk -N 2 /dev/$DISK --cluster-name=hacluster > mount.ocfs2 -t ocfs2 /dev/$DISK $MOUNTPOINT > > sleep 1 > echo "------fallocate----------------" > fallocate -l 2G "$TEST_FILE" > > sleep 1 > echo "------setup file----------------" > fio --name=setup --filename="$TEST_FILE" --rw=randwrite --bs=4k \ > --ioengine=libaio --iodepth=64 --size=2G --io_size=512M \ > --direct=0 --end_fsync=1 --minimal > > sync > sleep 3 > > echo "-----start test-----------------" > time fio --name=tst --filename="$TEST_FILE" --rw=write --bs=64M \ > --ioengine=libaio --iodepth=16 --size=2G --direct=1 > ``` > > --- > patch: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch > Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path > > --- > fs/ocfs2/aops.c | 31 +++++++++++++++++-------------- > fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++---- > fs/ocfs2/journal.h | 5 +++-- > 3 files changed, 50 insertions(+), 20 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index 7a65d5a36a3e..d9be765662dc 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > handle_t *handle = NULL; > loff_t end = offset + bytes; > int ret = 0, credits = 0; > + /* do the same job of jbd2_max_user_trans_buffers() */ > + journal_t *journal = osb->journal->j_journal; > + int jbd2_max = journal->j_max_transaction_buffers - > + journal->j_transaction_overhead_buffers; > > ocfs2_init_dealloc_ctxt(&dealloc); > > @@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > goto out; > } > > - /* Delete orphan before acquire i_rwsem. */ > - if (dwc->dw_orphaned) { > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > - > - end = end > i_size_read(inode) ? end : 0; > - > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > - !!end, end); > - if (ret < 0) > - mlog_errno(ret); > - } > - > down_write(&oi->ip_alloc_sem); > di = (struct ocfs2_dinode *)di_bh->b_data; > > @@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > } > > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > - ret = ocfs2_assure_trans_credits(handle, credits); > - if (ret < 0) { > + ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max); > + if (ret == 1) { > + goto unlock; > + } else if (ret < 0) { > mlog_errno(ret); > break; > } > @@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > ocfs2_commit_trans(osb, handle); > unlock: > up_write(&oi->ip_alloc_sem); > + > + if (dwc->dw_orphaned) { > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > + > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); > + if (ret < 0) > + mlog_errno(ret); > + } > + > ocfs2_inode_unlock(inode, 1); > brelse(di_bh); > out: > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > index e5f58ff2175f..57ad69d19494 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks) > * credits. Similar notes regarding data consistency and locking implications > * as for ocfs2_extend_trans() apply here. > */ > -int ocfs2_assure_trans_credits(handle_t *handle, int nblocks) > +int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle, > + struct buffer_head *bh, int nblocks, int jbd2_max) > { > + handle_t *handle = *i_handle; > int old_nblks = jbd2_handle_buffer_credits(handle); > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + int ret, wanted; > + int plan = nblocks << 1; /* Double the credits to prevent boundary issues */ > > trace_ocfs2_assure_trans_credits(old_nblks); > - if (old_nblks >= nblocks) > - return 0; > - return ocfs2_extend_trans(handle, nblocks - old_nblks); > + > + wanted = old_nblks + plan; > + > + if (wanted < jbd2_max) { > + if (old_nblks < nblocks) > + return ocfs2_extend_trans(handle, nblocks - old_nblks); > + else > + return 0; > + } > + > + ocfs2_commit_trans(osb, handle); > + handle = ocfs2_start_trans(osb, plan); > + if (IS_ERR(handle)) { > + mlog_errno(PTR_ERR(handle)); > + return 1; /* caller must not commit trans for this error */ > + } > + > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh, > + OCFS2_JOURNAL_ACCESS_WRITE); > + if (ret) > + mlog_errno(ret); > + > + *i_handle = handle; > + return ret; > } > > /* > diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h > index 6397170f302f..08a76ffb870a 100644 > --- a/fs/ocfs2/journal.h > +++ b/fs/ocfs2/journal.h > @@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, > int ocfs2_commit_trans(struct ocfs2_super *osb, > handle_t *handle); > int ocfs2_extend_trans(handle_t *handle, int nblocks); > -int ocfs2_assure_trans_credits(handle_t *handle, > - int nblocks); > +int ocfs2_assure_trans_credits(struct inode *inode, > + handle_t **i_handle, struct buffer_head *bh, > + int nblocks, int jbd2_max); > int ocfs2_allocate_extend_trans(handle_t *handle, > int thresh); > > -- > 2.43.0 > > ---------- > patch: 0001-ocfs2-split-trans-in-end-dio-to-avoid-credit-exhaust.patch > Subject: [PATCH] ocfs2: split trans in end dio to avoid credit exhaustion > > --- > fs/ocfs2/aops.c | 77 +++++++++++++++++++++++++++++-------------------- > 1 file changed, 45 insertions(+), 32 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index 7a65d5a36a3e..9053f1ee587a 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -2279,6 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > handle_t *handle = NULL; > loff_t end = offset + bytes; > int ret = 0, credits = 0; > + int cnt = 0, trans_start = 0, mod = 500; > > ocfs2_init_dealloc_ctxt(&dealloc); > > @@ -2295,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > goto out; > } > > - /* Delete orphan before acquire i_rwsem. */ > - if (dwc->dw_orphaned) { > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > - > - end = end > i_size_read(inode) ? end : 0; > - > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > - !!end, end); > - if (ret < 0) > - mlog_errno(ret); > - } > - > down_write(&oi->ip_alloc_sem); > di = (struct ocfs2_dinode *)di_bh->b_data; > > @@ -2327,44 +2316,68 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > > - handle = ocfs2_start_trans(osb, credits); > - if (IS_ERR(handle)) { > - ret = PTR_ERR(handle); > - mlog_errno(ret); > - goto unlock; > - } > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > - OCFS2_JOURNAL_ACCESS_WRITE); > - if (ret) { > - mlog_errno(ret); > - goto commit; > - } > - > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > - ret = ocfs2_assure_trans_credits(handle, credits); > - if (ret < 0) { > - mlog_errno(ret); > - break; > + if ((cnt++ % mod) == 0) { > + trans_start = 1; > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + mlog_errno(ret); > + goto unlock; > + } > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > + OCFS2_JOURNAL_ACCESS_WRITE); > + if (ret) { > + mlog_errno(ret); > + goto unlock; > + } > } > ret = ocfs2_mark_extent_written(inode, &et, handle, > ue->ue_cpos, 1, > ue->ue_phys, > meta_ac, &dealloc); > + if ((cnt % mod) == 0) { > + ocfs2_commit_trans(osb, handle); > + if (ret < 0) { > + mlog_errno(ret); > + break; > + } > + trans_start = 0; > + } > + } > + > + //commit if above loop doesn't do > + if (trans_start) { > + ocfs2_commit_trans(osb, handle); > if (ret < 0) { > mlog_errno(ret); > - break; > } > } > > if (end > i_size_read(inode)) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + mlog_errno(ret); > + goto unlock; > + } > ret = ocfs2_set_inode_size(handle, inode, di_bh, end); > if (ret < 0) > mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > } > -commit: > - ocfs2_commit_trans(osb, handle); > + > unlock: > up_write(&oi->ip_alloc_sem); > + > + /* everything looks good, let's start the cleanup */ > + if (dwc->dw_orphaned) { > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > + > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); > + if (ret < 0) > + mlog_errno(ret); > + } > ocfs2_inode_unlock(inode, 1); > brelse(di_bh); > out: > -- > 2.43.0 > > > Thanks, > Heming ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion 2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao 2026-03-13 2:04 ` Joseph Qi @ 2026-03-13 2:27 ` Joseph Qi 1 sibling, 0 replies; 13+ messages in thread From: Joseph Qi @ 2026-03-13 2:27 UTC (permalink / raw) To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara On 3/13/26 12:27 AM, Heming Zhao wrote: > During ocfs2 dio operations, JBD2 may report warnings via following call trace: > ocfs2_dio_end_io_write > ocfs2_mark_extent_written > ocfs2_change_extent_flag > ocfs2_split_extent > ocfs2_try_to_merge_extent > ocfs2_extend_rotate_transaction > ocfs2_extend_trans > jbd2__journal_restart > start_this_handle > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 > > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to > handle each extent in a separate transaction. > > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should > only be removed from the orphan list after the extent tree update is complete. > this ensures that if a crash occurs in the middle of extent tree updates, we > won't leave stale blocks beyond EOF. > > This patch also removes the only call to ocfs2_assure_trans_credits(), which > was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to > insufficient transaction credits"). > > Finally, thanks to Jans for providing the bug fix prototype and suggestions. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > --- > fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++------------------------- > 1 file changed, 29 insertions(+), 29 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index 09146b43d1f0..91997b330d39 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > goto out; > } > > - /* Delete orphan before acquire i_rwsem. */ > - if (dwc->dw_orphaned) { > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > - > - end = end > i_size_read(inode) ? end : 0; > - > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, > - !!end, end); > - if (ret < 0) > - mlog_errno(ret); > - } > - > down_write(&oi->ip_alloc_sem); > di = (struct ocfs2_dinode *)di_bh->b_data; > > @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode, > > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); > > - handle = ocfs2_start_trans(osb, credits); > - if (IS_ERR(handle)) { > - ret = PTR_ERR(handle); > - mlog_errno(ret); > - goto unlock; > - } > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > - OCFS2_JOURNAL_ACCESS_WRITE); > - if (ret) { > - mlog_errno(ret); > - goto commit; > - } > - > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { > - ret = ocfs2_assure_trans_credits(handle, credits); Since now no any caller for ocfs2_assure_trans_credits(), we may remove it in PATCH 2 of this series. Thanks, Joseph > - if (ret < 0) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > mlog_errno(ret); > break; > } > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, > + OCFS2_JOURNAL_ACCESS_WRITE); > + if (ret) { > + mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > + break; > + } > ret = ocfs2_mark_extent_written(inode, &et, handle, > ue->ue_cpos, 1, > ue->ue_phys, > meta_ac, &dealloc); > if (ret < 0) { > mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > break; > } > + ocfs2_commit_trans(osb, handle); > } > > if (end > i_size_read(inode)) { > + handle = ocfs2_start_trans(osb, credits); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + mlog_errno(ret); > + goto unlock; > + } > ret = ocfs2_set_inode_size(handle, inode, di_bh, end); > if (ret < 0) > mlog_errno(ret); > + ocfs2_commit_trans(osb, handle); > } > -commit: > - ocfs2_commit_trans(osb, handle); > + > unlock: > up_write(&oi->ip_alloc_sem); > + > + /* everything looks good, let's start the cleanup */ > + if (dwc->dw_orphaned) { > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); > + > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); > + if (ret < 0) > + mlog_errno(ret); > + } > ocfs2_inode_unlock(inode, 1); > brelse(di_bh); > out: ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-03-25 6:33 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao 2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao 2026-03-13 2:04 ` Joseph Qi 2026-03-13 3:17 ` Heming Zhao 2026-03-13 4:53 ` Joseph Qi 2026-03-13 5:35 ` Joseph Qi 2026-03-18 6:40 ` Heming Zhao 2026-03-19 1:53 ` Joseph Qi 2026-03-24 7:18 ` Heming Zhao 2026-03-25 3:31 ` Heming Zhao 2026-03-25 6:33 ` Joseph Qi 2026-03-24 6:00 ` Heming Zhao 2026-03-13 2:27 ` Joseph Qi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox