* [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid
@ 2026-03-12 16:27 Heming Zhao
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
0 siblings, 1 reply; 13+ messages in thread
From: Heming Zhao @ 2026-03-12 16:27 UTC (permalink / raw)
To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel
v1->v2:
following the review comments, restore the i_size update code in
ocfs2_dio_end_io_write().
For easier merging, this patch is based on Joseph's patch:
https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t
Heming Zhao (1):
ocfs2: split transactions in dio completion to avoid credit exhaustion
fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
1 file changed, 29 insertions(+), 29 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao
@ 2026-03-12 16:27 ` Heming Zhao
2026-03-13 2:04 ` Joseph Qi
2026-03-13 2:27 ` Joseph Qi
0 siblings, 2 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-12 16:27 UTC (permalink / raw)
To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel
During ocfs2 dio operations, JBD2 may report warnings via following call trace:
ocfs2_dio_end_io_write
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
ocfs2_try_to_merge_extent
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2__journal_restart
start_this_handle
output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
handle each extent in a separate transaction.
Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
only be removed from the orphan list after the extent tree update is complete.
this ensures that if a crash occurs in the middle of extent tree updates, we
won't leave stale blocks beyond EOF.
This patch also removes the only call to ocfs2_assure_trans_credits(), which
was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
insufficient transaction credits").
Finally, thanks to Jans for providing the bug fix prototype and suggestions.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 09146b43d1f0..91997b330d39 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto unlock;
- }
- ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret) {
- mlog_errno(ret);
- goto commit;
- }
-
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
- ret = ocfs2_assure_trans_credits(handle, credits);
- if (ret < 0) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
mlog_errno(ret);
break;
}
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ ocfs2_commit_trans(osb, handle);
+ break;
+ }
ret = ocfs2_mark_extent_written(inode, &et, handle,
ue->ue_cpos, 1,
ue->ue_phys,
meta_ac, &dealloc);
if (ret < 0) {
mlog_errno(ret);
+ ocfs2_commit_trans(osb, handle);
break;
}
+ ocfs2_commit_trans(osb, handle);
}
if (end > i_size_read(inode)) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto unlock;
+ }
ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
if (ret < 0)
mlog_errno(ret);
+ ocfs2_commit_trans(osb, handle);
}
-commit:
- ocfs2_commit_trans(osb, handle);
+
unlock:
up_write(&oi->ip_alloc_sem);
+
+ /* everything looks good, let's start the cleanup */
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
@ 2026-03-13 2:04 ` Joseph Qi
2026-03-13 3:17 ` Heming Zhao
2026-03-13 2:27 ` Joseph Qi
1 sibling, 1 reply; 13+ messages in thread
From: Joseph Qi @ 2026-03-13 2:04 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara
Almost looks fine, minor updates below.
On 3/13/26 12:27 AM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
> ocfs2_mark_extent_written
> ocfs2_change_extent_flag
> ocfs2_split_extent
> ocfs2_try_to_merge_extent
> ocfs2_extend_rotate_transaction
> ocfs2_extend_trans
> jbd2__journal_restart
> start_this_handle
> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> handle each extent in a separate transaction.
>
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
>
> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> insufficient transaction credits").
>
> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> 1 file changed, 29 insertions(+), 29 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..91997b330d39 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>
> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>
> - handle = ocfs2_start_trans(osb, credits);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto unlock;
> - }
> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> - OCFS2_JOURNAL_ACCESS_WRITE);
> - if (ret) {
> - mlog_errno(ret);
> - goto commit;
> - }
> -
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> - ret = ocfs2_assure_trans_credits(handle, credits);
> - if (ret < 0) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> mlog_errno(ret);
> break;
I'd rather goto unlock directly without update i_size in case error.
> }
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret) {
> + mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> + break;
Ditto.
> + }
> ret = ocfs2_mark_extent_written(inode, &et, handle,
> ue->ue_cpos, 1,
> ue->ue_phys,
> meta_ac, &dealloc);
> if (ret < 0) {
> mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> break;
Ditto.
BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
to save the extra call in case error.
Thanks,
Joseph
> }
> + ocfs2_commit_trans(osb, handle);
> }
>
> if (end > i_size_read(inode)) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> if (ret < 0)
> mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> }
> -commit:
> - ocfs2_commit_trans(osb, handle);
> +
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + /* everything looks good, let's start the cleanup */
> + if (dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out:
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-13 2:04 ` Joseph Qi
@ 2026-03-13 2:27 ` Joseph Qi
1 sibling, 0 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-13 2:27 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara
On 3/13/26 12:27 AM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
> ocfs2_mark_extent_written
> ocfs2_change_extent_flag
> ocfs2_split_extent
> ocfs2_try_to_merge_extent
> ocfs2_extend_rotate_transaction
> ocfs2_extend_trans
> jbd2__journal_restart
> start_this_handle
> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> handle each extent in a separate transaction.
>
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
>
> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> insufficient transaction credits").
>
> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> 1 file changed, 29 insertions(+), 29 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..91997b330d39 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>
> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>
> - handle = ocfs2_start_trans(osb, credits);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto unlock;
> - }
> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> - OCFS2_JOURNAL_ACCESS_WRITE);
> - if (ret) {
> - mlog_errno(ret);
> - goto commit;
> - }
> -
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> - ret = ocfs2_assure_trans_credits(handle, credits);
Since now no any caller for ocfs2_assure_trans_credits(), we may remove
it in PATCH 2 of this series.
Thanks,
Joseph
> - if (ret < 0) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> mlog_errno(ret);
> break;
> }
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret) {
> + mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> + break;
> + }
> ret = ocfs2_mark_extent_written(inode, &et, handle,
> ue->ue_cpos, 1,
> ue->ue_phys,
> meta_ac, &dealloc);
> if (ret < 0) {
> mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> break;
> }
> + ocfs2_commit_trans(osb, handle);
> }
>
> if (end > i_size_read(inode)) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> if (ret < 0)
> mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> }
> -commit:
> - ocfs2_commit_trans(osb, handle);
> +
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + /* everything looks good, let's start the cleanup */
> + if (dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out:
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-13 2:04 ` Joseph Qi
@ 2026-03-13 3:17 ` Heming Zhao
2026-03-13 4:53 ` Joseph Qi
0 siblings, 1 reply; 13+ messages in thread
From: Heming Zhao @ 2026-03-13 3:17 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara
On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> Almost looks fine, minor updates below.
>
> On 3/13/26 12:27 AM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > ocfs2_dio_end_io_write
> > ocfs2_mark_extent_written
> > ocfs2_change_extent_flag
> > ocfs2_split_extent
> > ocfs2_try_to_merge_extent
> > ocfs2_extend_rotate_transaction
> > ocfs2_extend_trans
> > jbd2__journal_restart
> > start_this_handle
> > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> > handle each extent in a separate transaction.
> >
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > this ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> >
> > This patch also removes the only call to ocfs2_assure_trans_credits(), which
> > was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> > insufficient transaction credits").
> >
> > Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> > fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> > 1 file changed, 29 insertions(+), 29 deletions(-)
> >
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..91997b330d39 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > goto out;
> > }
> >
> > - /* Delete orphan before acquire i_rwsem. */
> > - if (dwc->dw_orphaned) {
> > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > -
> > - end = end > i_size_read(inode) ? end : 0;
> > -
> > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> > - !!end, end);
> > - if (ret < 0)
> > - mlog_errno(ret);
> > - }
> > -
> > down_write(&oi->ip_alloc_sem);
> > di = (struct ocfs2_dinode *)di_bh->b_data;
> >
> > @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >
> > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >
> > - handle = ocfs2_start_trans(osb, credits);
> > - if (IS_ERR(handle)) {
> > - ret = PTR_ERR(handle);
> > - mlog_errno(ret);
> > - goto unlock;
> > - }
> > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > - OCFS2_JOURNAL_ACCESS_WRITE);
> > - if (ret) {
> > - mlog_errno(ret);
> > - goto commit;
> > - }
> > -
> > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> > - ret = ocfs2_assure_trans_credits(handle, credits);
> > - if (ret < 0) {
> > + handle = ocfs2_start_trans(osb, credits);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > mlog_errno(ret);
> > break;
>
> I'd rather goto unlock directly without update i_size in case error.
agree
>
> > }
> > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > + OCFS2_JOURNAL_ACCESS_WRITE);
> > + if (ret) {
> > + mlog_errno(ret);
> > + ocfs2_commit_trans(osb, handle);
> > + break;
>
> Ditto.
agree
>
> > + }
> > ret = ocfs2_mark_extent_written(inode, &et, handle,
> > ue->ue_cpos, 1,
> > ue->ue_phys,
> > meta_ac, &dealloc);
> > if (ret < 0) {
> > mlog_errno(ret);
> > + ocfs2_commit_trans(osb, handle);
> > break;
>
> Ditto.
The existing code still updates i_size even if ocfs2_mark_extent_written()
returns an error. I am not certain whether updating i_size in this case is
correct, but I prefer to maintain the original logic for now.
Does that seem reasonable to you?
> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> to save the extra call in case error.
agree
>
> Thanks,
> Joseph
>
I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
is about 12107 ~ 12457.
Regarding the transaction splitting in the unwritten handling loop: this patch
introduces a minor performance regression. I would like to merge some of the
transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
What do you think of this approach?
Heming
> > }
> > + ocfs2_commit_trans(osb, handle);
> > }
> >
> > if (end > i_size_read(inode)) {
> > + handle = ocfs2_start_trans(osb, credits);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > + mlog_errno(ret);
> > + goto unlock;
> > + }
> > ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> > if (ret < 0)
> > mlog_errno(ret);
> > + ocfs2_commit_trans(osb, handle);
> > }
> > -commit:
> > - ocfs2_commit_trans(osb, handle);
> > +
> > unlock:
> > up_write(&oi->ip_alloc_sem);
> > +
> > + /* everything looks good, let's start the cleanup */
> > + if (dwc->dw_orphaned) {
> > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > +
> > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> > + if (ret < 0)
> > + mlog_errno(ret);
> > + }
> > ocfs2_inode_unlock(inode, 1);
> > brelse(di_bh);
> > out:
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-13 3:17 ` Heming Zhao
@ 2026-03-13 4:53 ` Joseph Qi
2026-03-13 5:35 ` Joseph Qi
2026-03-18 6:40 ` Heming Zhao
0 siblings, 2 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-13 4:53 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara
On 3/13/26 11:17 AM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>> Almost looks fine, minor updates below.
>>
>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>> ocfs2_dio_end_io_write
>>> ocfs2_mark_extent_written
>>> ocfs2_change_extent_flag
>>> ocfs2_split_extent
>>> ocfs2_try_to_merge_extent
>>> ocfs2_extend_rotate_transaction
>>> ocfs2_extend_trans
>>> jbd2__journal_restart
>>> start_this_handle
>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>> handle each extent in a separate transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>> insufficient transaction credits").
>>>
>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@suse.cz>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>> 1 file changed, 29 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..91997b330d39 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>> goto out;
>>> }
>>>
>>> - /* Delete orphan before acquire i_rwsem. */
>>> - if (dwc->dw_orphaned) {
>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>> -
>>> - end = end > i_size_read(inode) ? end : 0;
>>> -
>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>> - !!end, end);
>>> - if (ret < 0)
>>> - mlog_errno(ret);
>>> - }
>>> -
>>> down_write(&oi->ip_alloc_sem);
>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>
>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>
>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>
>>> - handle = ocfs2_start_trans(osb, credits);
>>> - if (IS_ERR(handle)) {
>>> - ret = PTR_ERR(handle);
>>> - mlog_errno(ret);
>>> - goto unlock;
>>> - }
>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>> - if (ret) {
>>> - mlog_errno(ret);
>>> - goto commit;
>>> - }
>>> -
>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>> - ret = ocfs2_assure_trans_credits(handle, credits);
>>> - if (ret < 0) {
>>> + handle = ocfs2_start_trans(osb, credits);
>>> + if (IS_ERR(handle)) {
>>> + ret = PTR_ERR(handle);
>>> mlog_errno(ret);
>>> break;
>>
>> I'd rather goto unlock directly without update i_size in case error.
>
> agree
>>
>>> }
>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>> + if (ret) {
>>> + mlog_errno(ret);
>>> + ocfs2_commit_trans(osb, handle);
>>> + break;
>>
>> Ditto.
>
> agree
>>
>>> + }
>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>> ue->ue_cpos, 1,
>>> ue->ue_phys,
>>> meta_ac, &dealloc);
>>> if (ret < 0) {
>>> mlog_errno(ret);
>>> + ocfs2_commit_trans(osb, handle);
>>> break;
>>
>> Ditto.
>
> The existing code still updates i_size even if ocfs2_mark_extent_written()
> returns an error. I am not certain whether updating i_size in this case is
> correct, but I prefer to maintain the original logic for now.
> Does that seem reasonable to you?
>
Since it returns 0 for unwritten extents, it looks fine.
>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>> to save the extra call in case error.
>
> agree
>
> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> is about 12107 ~ 12457.
> Regarding the transaction splitting in the unwritten handling loop: this patch
> introduces a minor performance regression. I would like to merge some of the
> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> What do you think of this approach?
>
Seems we can limit max credit for a single transaction here?
If exceeds, restart a new one.
Joseph,
Thanks
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-13 4:53 ` Joseph Qi
@ 2026-03-13 5:35 ` Joseph Qi
2026-03-18 6:40 ` Heming Zhao
1 sibling, 0 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-13 5:35 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara
On 3/13/26 12:53 PM, Joseph Qi wrote:
>
>
> On 3/13/26 11:17 AM, Heming Zhao wrote:
>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>> Almost looks fine, minor updates below.
>>>
>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>> ocfs2_dio_end_io_write
>>>> ocfs2_mark_extent_written
>>>> ocfs2_change_extent_flag
>>>> ocfs2_split_extent
>>>> ocfs2_try_to_merge_extent
>>>> ocfs2_extend_rotate_transaction
>>>> ocfs2_extend_trans
>>>> jbd2__journal_restart
>>>> start_this_handle
>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>
>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>> handle each extent in a separate transaction.
>>>>
>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>> only be removed from the orphan list after the extent tree update is complete.
>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>> won't leave stale blocks beyond EOF.
>>>>
>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>> insufficient transaction credits").
>>>>
>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>
>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>> ---
>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>> 1 file changed, 29 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index 09146b43d1f0..91997b330d39 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>> goto out;
>>>> }
>>>>
>>>> - /* Delete orphan before acquire i_rwsem. */
>>>> - if (dwc->dw_orphaned) {
>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>> -
>>>> - end = end > i_size_read(inode) ? end : 0;
>>>> -
>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>> - !!end, end);
>>>> - if (ret < 0)
>>>> - mlog_errno(ret);
>>>> - }
>>>> -
>>>> down_write(&oi->ip_alloc_sem);
>>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>
>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>
>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>
>>>> - handle = ocfs2_start_trans(osb, credits);
>>>> - if (IS_ERR(handle)) {
>>>> - ret = PTR_ERR(handle);
>>>> - mlog_errno(ret);
>>>> - goto unlock;
>>>> - }
>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>>> - if (ret) {
>>>> - mlog_errno(ret);
>>>> - goto commit;
>>>> - }
>>>> -
>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>> - ret = ocfs2_assure_trans_credits(handle, credits);
>>>> - if (ret < 0) {
>>>> + handle = ocfs2_start_trans(osb, credits);
>>>> + if (IS_ERR(handle)) {
>>>> + ret = PTR_ERR(handle);
>>>> mlog_errno(ret);
>>>> break;
>>>
>>> I'd rather goto unlock directly without update i_size in case error.
>>
>> agree
>>>
>>>> }
>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>>> + if (ret) {
>>>> + mlog_errno(ret);
>>>> + ocfs2_commit_trans(osb, handle);
>>>> + break;
>>>
>>> Ditto.
>>
>> agree
>>>
>>>> + }
>>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>> ue->ue_cpos, 1,
>>>> ue->ue_phys,
>>>> meta_ac, &dealloc);
>>>> if (ret < 0) {
>>>> mlog_errno(ret);
>>>> + ocfs2_commit_trans(osb, handle);
>>>> break;
>>>
>>> Ditto.
>>
>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>> returns an error. I am not certain whether updating i_size in this case is
>> correct, but I prefer to maintain the original logic for now.
>> Does that seem reasonable to you?
>>
>
> Since it returns 0 for unwritten extents, it looks fine.
>
Think it a bit more, I think it would be more acceptable to behave
consistent here.
e.g. succeeds in the first round and fails to start transaction in the
second, now it won't update i_size.
Joseph
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-13 4:53 ` Joseph Qi
2026-03-13 5:35 ` Joseph Qi
@ 2026-03-18 6:40 ` Heming Zhao
2026-03-19 1:53 ` Joseph Qi
2026-03-24 6:00 ` Heming Zhao
1 sibling, 2 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-18 6:40 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su
On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>
>
> On 3/13/26 11:17 AM, Heming Zhao wrote:
> > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> >> Almost looks fine, minor updates below.
> >>
> >> On 3/13/26 12:27 AM, Heming Zhao wrote:
> >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> >>> ocfs2_dio_end_io_write
> >>> ocfs2_mark_extent_written
> >>> ocfs2_change_extent_flag
> >>> ocfs2_split_extent
> >>> ocfs2_try_to_merge_extent
> >>> ocfs2_extend_rotate_transaction
> >>> ocfs2_extend_trans
> >>> jbd2__journal_restart
> >>> start_this_handle
> >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>
> >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>> handle each extent in a separate transaction.
> >>>
> >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>> only be removed from the orphan list after the extent tree update is complete.
> >>> this ensures that if a crash occurs in the middle of extent tree updates, we
> >>> won't leave stale blocks beyond EOF.
> >>>
> >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> >>> insufficient transaction credits").
> >>>
> >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >>>
> >>> Suggested-by: Jan Kara <jack@suse.cz>
> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>> ---
> >>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >>> 1 file changed, 29 insertions(+), 29 deletions(-)
> >>>
> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>> index 09146b43d1f0..91997b330d39 100644
> >>> --- a/fs/ocfs2/aops.c
> >>> +++ b/fs/ocfs2/aops.c
> >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> goto out;
> >>> }
> >>>
> >>> - /* Delete orphan before acquire i_rwsem. */
> >>> - if (dwc->dw_orphaned) {
> >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>> -
> >>> - end = end > i_size_read(inode) ? end : 0;
> >>> -
> >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>> - !!end, end);
> >>> - if (ret < 0)
> >>> - mlog_errno(ret);
> >>> - }
> >>> -
> >>> down_write(&oi->ip_alloc_sem);
> >>> di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>
> >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>
> >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>
> >>> - handle = ocfs2_start_trans(osb, credits);
> >>> - if (IS_ERR(handle)) {
> >>> - ret = PTR_ERR(handle);
> >>> - mlog_errno(ret);
> >>> - goto unlock;
> >>> - }
> >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> - OCFS2_JOURNAL_ACCESS_WRITE);
> >>> - if (ret) {
> >>> - mlog_errno(ret);
> >>> - goto commit;
> >>> - }
> >>> -
> >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>> - ret = ocfs2_assure_trans_credits(handle, credits);
> >>> - if (ret < 0) {
> >>> + handle = ocfs2_start_trans(osb, credits);
> >>> + if (IS_ERR(handle)) {
> >>> + ret = PTR_ERR(handle);
> >>> mlog_errno(ret);
> >>> break;
> >>
> >> I'd rather goto unlock directly without update i_size in case error.
> >
> > agree
> >>
> >>> }
> >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> + OCFS2_JOURNAL_ACCESS_WRITE);
> >>> + if (ret) {
> >>> + mlog_errno(ret);
> >>> + ocfs2_commit_trans(osb, handle);
> >>> + break;
> >>
> >> Ditto.
> >
> > agree
> >>
> >>> + }
> >>> ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>> ue->ue_cpos, 1,
> >>> ue->ue_phys,
> >>> meta_ac, &dealloc);
> >>> if (ret < 0) {
> >>> mlog_errno(ret);
> >>> + ocfs2_commit_trans(osb, handle);
> >>> break;
> >>
> >> Ditto.
> >
> > The existing code still updates i_size even if ocfs2_mark_extent_written()
> > returns an error. I am not certain whether updating i_size in this case is
> > correct, but I prefer to maintain the original logic for now.
> > Does that seem reasonable to you?
> >
>
> Since it returns 0 for unwritten extents, it looks fine.
>
> >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> >> to save the extra call in case error.
> >
> > agree
> >
> > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> > is about 12107 ~ 12457.
> > Regarding the transaction splitting in the unwritten handling loop: this patch
> > introduces a minor performance regression. I would like to merge some of the
> > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> > What do you think of this approach?
> >
>
> Seems we can limit max credit for a single transaction here?
> If exceeds, restart a new one.
>
> Joseph,
> Thanks
sorry for the late reply, I spend some time running the tests.
I have created two different types of patches (see below) to address this issue.
time consumption results:
ocfs2-dynamic-... patch
real 2m9.900s
user 0m0.333s
sys 0m22.687s
%200 of ocfs2-split-trans-... patch
real 1m48.901s
user 0m0.306s
sys 0m23.019s
%500 of ocfs2-split-trans-... patch
real 1m49.350s
user 0m0.299s
sys 0m22.718s
As shown above, the fixed "500% mode" is faster than the dynamic style.
Which approach do you prefer?
---
the test script:
```
#!/bin/bash
# thanks Wolfgang
DISK="vdc"
MOUNTPOINT="/mnt/ocfs2"
TEST_FILE="$MOUNTPOINT/test.bin"
if lsblk | grep "$DISK.*$MOUNTPOINT"; then
umount $MOUNTPOINT
fi
echo "------mkfs & mount----------------"
mkfs.ocfs2 -b 4K -C 4K --cluster-stack=pcmk -N 2 /dev/$DISK --cluster-name=hacluster
mount.ocfs2 -t ocfs2 /dev/$DISK $MOUNTPOINT
sleep 1
echo "------fallocate----------------"
fallocate -l 2G "$TEST_FILE"
sleep 1
echo "------setup file----------------"
fio --name=setup --filename="$TEST_FILE" --rw=randwrite --bs=4k \
--ioengine=libaio --iodepth=64 --size=2G --io_size=512M \
--direct=0 --end_fsync=1 --minimal
sync
sleep 3
echo "-----start test-----------------"
time fio --name=tst --filename="$TEST_FILE" --rw=write --bs=64M \
--ioengine=libaio --iodepth=16 --size=2G --direct=1
```
---
patch: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch
Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path
---
fs/ocfs2/aops.c | 31 +++++++++++++++++--------------
fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
fs/ocfs2/journal.h | 5 +++--
3 files changed, 50 insertions(+), 20 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..d9be765662dc 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
handle_t *handle = NULL;
loff_t end = offset + bytes;
int ret = 0, credits = 0;
+ /* do the same job of jbd2_max_user_trans_buffers() */
+ journal_t *journal = osb->journal->j_journal;
+ int jbd2_max = journal->j_max_transaction_buffers -
+ journal->j_transaction_overhead_buffers;
ocfs2_init_dealloc_ctxt(&dealloc);
@@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
}
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
- ret = ocfs2_assure_trans_credits(handle, credits);
- if (ret < 0) {
+ ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
+ if (ret == 1) {
+ goto unlock;
+ } else if (ret < 0) {
mlog_errno(ret);
break;
}
@@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
ocfs2_commit_trans(osb, handle);
unlock:
up_write(&oi->ip_alloc_sem);
+
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
+
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e5f58ff2175f..57ad69d19494 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
* credits. Similar notes regarding data consistency and locking implications
* as for ocfs2_extend_trans() apply here.
*/
-int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
+int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
+ struct buffer_head *bh, int nblocks, int jbd2_max)
{
+ handle_t *handle = *i_handle;
int old_nblks = jbd2_handle_buffer_credits(handle);
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ int ret, wanted;
+ int plan = nblocks << 1; /* Double the credits to prevent boundary issues */
trace_ocfs2_assure_trans_credits(old_nblks);
- if (old_nblks >= nblocks)
- return 0;
- return ocfs2_extend_trans(handle, nblocks - old_nblks);
+
+ wanted = old_nblks + plan;
+
+ if (wanted < jbd2_max) {
+ if (old_nblks < nblocks)
+ return ocfs2_extend_trans(handle, nblocks - old_nblks);
+ else
+ return 0;
+ }
+
+ ocfs2_commit_trans(osb, handle);
+ handle = ocfs2_start_trans(osb, plan);
+ if (IS_ERR(handle)) {
+ mlog_errno(PTR_ERR(handle));
+ return 1; /* caller must not commit trans for this error */
+ }
+
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret)
+ mlog_errno(ret);
+
+ *i_handle = handle;
+ return ret;
}
/*
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 6397170f302f..08a76ffb870a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb,
int ocfs2_commit_trans(struct ocfs2_super *osb,
handle_t *handle);
int ocfs2_extend_trans(handle_t *handle, int nblocks);
-int ocfs2_assure_trans_credits(handle_t *handle,
- int nblocks);
+int ocfs2_assure_trans_credits(struct inode *inode,
+ handle_t **i_handle, struct buffer_head *bh,
+ int nblocks, int jbd2_max);
int ocfs2_allocate_extend_trans(handle_t *handle,
int thresh);
--
2.43.0
----------
patch: 0001-ocfs2-split-trans-in-end-dio-to-avoid-credit-exhaust.patch
Subject: [PATCH] ocfs2: split trans in end dio to avoid credit exhaustion
---
fs/ocfs2/aops.c | 77 +++++++++++++++++++++++++++++--------------------
1 file changed, 45 insertions(+), 32 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..9053f1ee587a 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
handle_t *handle = NULL;
loff_t end = offset + bytes;
int ret = 0, credits = 0;
+ int cnt = 0, trans_start = 0, mod = 500;
ocfs2_init_dealloc_ctxt(&dealloc);
@@ -2295,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2327,44 +2316,68 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto unlock;
- }
- ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret) {
- mlog_errno(ret);
- goto commit;
- }
-
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
- ret = ocfs2_assure_trans_credits(handle, credits);
- if (ret < 0) {
- mlog_errno(ret);
- break;
+ if ((cnt++ % mod) == 0) {
+ trans_start = 1;
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto unlock;
+ }
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ goto unlock;
+ }
}
ret = ocfs2_mark_extent_written(inode, &et, handle,
ue->ue_cpos, 1,
ue->ue_phys,
meta_ac, &dealloc);
+ if ((cnt % mod) == 0) {
+ ocfs2_commit_trans(osb, handle);
+ if (ret < 0) {
+ mlog_errno(ret);
+ break;
+ }
+ trans_start = 0;
+ }
+ }
+
+ //commit if above loop doesn't do
+ if (trans_start) {
+ ocfs2_commit_trans(osb, handle);
if (ret < 0) {
mlog_errno(ret);
- break;
}
}
if (end > i_size_read(inode)) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto unlock;
+ }
ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
if (ret < 0)
mlog_errno(ret);
+ ocfs2_commit_trans(osb, handle);
}
-commit:
- ocfs2_commit_trans(osb, handle);
+
unlock:
up_write(&oi->ip_alloc_sem);
+
+ /* everything looks good, let's start the cleanup */
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
--
2.43.0
Thanks,
Heming
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-18 6:40 ` Heming Zhao
@ 2026-03-19 1:53 ` Joseph Qi
2026-03-24 7:18 ` Heming Zhao
2026-03-25 3:31 ` Heming Zhao
2026-03-24 6:00 ` Heming Zhao
1 sibling, 2 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-19 1:53 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su
On 3/18/26 2:40 PM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>>
>>
>> On 3/13/26 11:17 AM, Heming Zhao wrote:
>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>>> Almost looks fine, minor updates below.
>>>>
>>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>>> ocfs2_dio_end_io_write
>>>>> ocfs2_mark_extent_written
>>>>> ocfs2_change_extent_flag
>>>>> ocfs2_split_extent
>>>>> ocfs2_try_to_merge_extent
>>>>> ocfs2_extend_rotate_transaction
>>>>> ocfs2_extend_trans
>>>>> jbd2__journal_restart
>>>>> start_this_handle
>>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>>
>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>>> handle each extent in a separate transaction.
>>>>>
>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>>> only be removed from the orphan list after the extent tree update is complete.
>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>>> won't leave stale blocks beyond EOF.
>>>>>
>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>>> insufficient transaction credits").
>>>>>
>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>>
>>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>>> ---
>>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>> 1 file changed, 29 insertions(+), 29 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index 09146b43d1f0..91997b330d39 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>> goto out;
>>>>> }
>>>>>
>>>>> - /* Delete orphan before acquire i_rwsem. */
>>>>> - if (dwc->dw_orphaned) {
>>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>>> -
>>>>> - end = end > i_size_read(inode) ? end : 0;
>>>>> -
>>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>>> - !!end, end);
>>>>> - if (ret < 0)
>>>>> - mlog_errno(ret);
>>>>> - }
>>>>> -
>>>>> down_write(&oi->ip_alloc_sem);
>>>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>>
>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>
>>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>>
>>>>> - handle = ocfs2_start_trans(osb, credits);
>>>>> - if (IS_ERR(handle)) {
>>>>> - ret = PTR_ERR(handle);
>>>>> - mlog_errno(ret);
>>>>> - goto unlock;
>>>>> - }
>>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> - if (ret) {
>>>>> - mlog_errno(ret);
>>>>> - goto commit;
>>>>> - }
>>>>> -
>>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>>> - ret = ocfs2_assure_trans_credits(handle, credits);
>>>>> - if (ret < 0) {
>>>>> + handle = ocfs2_start_trans(osb, credits);
>>>>> + if (IS_ERR(handle)) {
>>>>> + ret = PTR_ERR(handle);
>>>>> mlog_errno(ret);
>>>>> break;
>>>>
>>>> I'd rather goto unlock directly without update i_size in case error.
>>>
>>> agree
>>>>
>>>>> }
>>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> + if (ret) {
>>>>> + mlog_errno(ret);
>>>>> + ocfs2_commit_trans(osb, handle);
>>>>> + break;
>>>>
>>>> Ditto.
>>>
>>> agree
>>>>
>>>>> + }
>>>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>> ue->ue_cpos, 1,
>>>>> ue->ue_phys,
>>>>> meta_ac, &dealloc);
>>>>> if (ret < 0) {
>>>>> mlog_errno(ret);
>>>>> + ocfs2_commit_trans(osb, handle);
>>>>> break;
>>>>
>>>> Ditto.
>>>
>>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>>> returns an error. I am not certain whether updating i_size in this case is
>>> correct, but I prefer to maintain the original logic for now.
>>> Does that seem reasonable to you?
>>>
>>
>> Since it returns 0 for unwritten extents, it looks fine.
>>
>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>>>> to save the extra call in case error.
>>>
>>> agree
>>>
>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
>>> is about 12107 ~ 12457.
>>> Regarding the transaction splitting in the unwritten handling loop: this patch
>>> introduces a minor performance regression. I would like to merge some of the
>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
>>> What do you think of this approach?
>>>
>>
>> Seems we can limit max credit for a single transaction here?
>> If exceeds, restart a new one.
>>
>> Joseph,
>> Thanks
>
> sorry for the late reply, I spend some time running the tests.
> I have created two different types of patches (see below) to address this issue.
>
> time consumption results:
>
> ocfs2-dynamic-... patch
> real 2m9.900s
> user 0m0.333s
> sys 0m22.687s
>
> %200 of ocfs2-split-trans-... patch
> real 1m48.901s
> user 0m0.306s
> sys 0m23.019s
>
> %500 of ocfs2-split-trans-... patch
> real 1m49.350s
> user 0m0.299s
> sys 0m22.718s
>
> As shown above, the fixed "500% mode" is faster than the dynamic style.
> Which approach do you prefer?
>
I'd prefer solution B, it looks simpler to directly set a commit batch.
Some comments for the formal patch:
1. I don't think you have to use 'cnt + trans_start + mod' to control
the batch commit, just check 'batch + handle' seems enough.
2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
3. Do not update i_size in case error.
4. Correctly handle commit transaction in case error.
Thanks,
Joseph
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-18 6:40 ` Heming Zhao
2026-03-19 1:53 ` Joseph Qi
@ 2026-03-24 6:00 ` Heming Zhao
1 sibling, 0 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-24 6:00 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su
ping...
On Wed, Mar 18, 2026 at 02:40:45PM +0800, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> >
> >
> > On 3/13/26 11:17 AM, Heming Zhao wrote:
> > > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> > >> Almost looks fine, minor updates below.
> > >>
> > >> On 3/13/26 12:27 AM, Heming Zhao wrote:
> > >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > >>> ocfs2_dio_end_io_write
> > >>> ocfs2_mark_extent_written
> > >>> ocfs2_change_extent_flag
> > >>> ocfs2_split_extent
> > >>> ocfs2_try_to_merge_extent
> > >>> ocfs2_extend_rotate_transaction
> > >>> ocfs2_extend_trans
> > >>> jbd2__journal_restart
> > >>> start_this_handle
> > >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> > >>>
> > >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> > >>> handle each extent in a separate transaction.
> > >>>
> > >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > >>> only be removed from the orphan list after the extent tree update is complete.
> > >>> this ensures that if a crash occurs in the middle of extent tree updates, we
> > >>> won't leave stale blocks beyond EOF.
> > >>>
> > >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> > >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> > >>> insufficient transaction credits").
> > >>>
> > >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> > >>>
> > >>> Suggested-by: Jan Kara <jack@suse.cz>
> > >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > >>> ---
> > >>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> > >>> 1 file changed, 29 insertions(+), 29 deletions(-)
> > >>>
> > >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > >>> index 09146b43d1f0..91997b330d39 100644
> > >>> --- a/fs/ocfs2/aops.c
> > >>> +++ b/fs/ocfs2/aops.c
> > >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > >>> goto out;
> > >>> }
> > >>>
> > >>> - /* Delete orphan before acquire i_rwsem. */
> > >>> - if (dwc->dw_orphaned) {
> > >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > >>> -
> > >>> - end = end > i_size_read(inode) ? end : 0;
> > >>> -
> > >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> > >>> - !!end, end);
> > >>> - if (ret < 0)
> > >>> - mlog_errno(ret);
> > >>> - }
> > >>> -
> > >>> down_write(&oi->ip_alloc_sem);
> > >>> di = (struct ocfs2_dinode *)di_bh->b_data;
> > >>>
> > >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > >>>
> > >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> > >>>
> > >>> - handle = ocfs2_start_trans(osb, credits);
> > >>> - if (IS_ERR(handle)) {
> > >>> - ret = PTR_ERR(handle);
> > >>> - mlog_errno(ret);
> > >>> - goto unlock;
> > >>> - }
> > >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > >>> - OCFS2_JOURNAL_ACCESS_WRITE);
> > >>> - if (ret) {
> > >>> - mlog_errno(ret);
> > >>> - goto commit;
> > >>> - }
> > >>> -
> > >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> > >>> - ret = ocfs2_assure_trans_credits(handle, credits);
> > >>> - if (ret < 0) {
> > >>> + handle = ocfs2_start_trans(osb, credits);
> > >>> + if (IS_ERR(handle)) {
> > >>> + ret = PTR_ERR(handle);
> > >>> mlog_errno(ret);
> > >>> break;
> > >>
> > >> I'd rather goto unlock directly without update i_size in case error.
> > >
> > > agree
> > >>
> > >>> }
> > >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > >>> + OCFS2_JOURNAL_ACCESS_WRITE);
> > >>> + if (ret) {
> > >>> + mlog_errno(ret);
> > >>> + ocfs2_commit_trans(osb, handle);
> > >>> + break;
> > >>
> > >> Ditto.
> > >
> > > agree
> > >>
> > >>> + }
> > >>> ret = ocfs2_mark_extent_written(inode, &et, handle,
> > >>> ue->ue_cpos, 1,
> > >>> ue->ue_phys,
> > >>> meta_ac, &dealloc);
> > >>> if (ret < 0) {
> > >>> mlog_errno(ret);
> > >>> + ocfs2_commit_trans(osb, handle);
> > >>> break;
> > >>
> > >> Ditto.
> > >
> > > The existing code still updates i_size even if ocfs2_mark_extent_written()
> > > returns an error. I am not certain whether updating i_size in this case is
> > > correct, but I prefer to maintain the original logic for now.
> > > Does that seem reasonable to you?
> > >
> >
> > Since it returns 0 for unwritten extents, it looks fine.
> >
> > >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> > >> to save the extra call in case error.
> > >
> > > agree
> > >
> > > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> > > is about 12107 ~ 12457.
> > > Regarding the transaction splitting in the unwritten handling loop: this patch
> > > introduces a minor performance regression. I would like to merge some of the
> > > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> > > What do you think of this approach?
> > >
> >
> > Seems we can limit max credit for a single transaction here?
> > If exceeds, restart a new one.
> >
> > Joseph,
> > Thanks
>
> sorry for the late reply, I spend some time running the tests.
> I have created two different types of patches (see below) to address this issue.
>
> time consumption results:
>
> ocfs2-dynamic-... patch
> real 2m9.900s
> user 0m0.333s
> sys 0m22.687s
>
> %200 of ocfs2-split-trans-... patch
> real 1m48.901s
> user 0m0.306s
> sys 0m23.019s
>
> %500 of ocfs2-split-trans-... patch
> real 1m49.350s
> user 0m0.299s
> sys 0m22.718s
>
> As shown above, the fixed "500% mode" is faster than the dynamic style.
> Which approach do you prefer?
>
> ---
> the test script:
> ```
> #!/bin/bash
> # thanks Wolfgang
>
> DISK="vdc"
> MOUNTPOINT="/mnt/ocfs2"
> TEST_FILE="$MOUNTPOINT/test.bin"
>
> if lsblk | grep "$DISK.*$MOUNTPOINT"; then
> umount $MOUNTPOINT
> fi
> echo "------mkfs & mount----------------"
> mkfs.ocfs2 -b 4K -C 4K --cluster-stack=pcmk -N 2 /dev/$DISK --cluster-name=hacluster
> mount.ocfs2 -t ocfs2 /dev/$DISK $MOUNTPOINT
>
> sleep 1
> echo "------fallocate----------------"
> fallocate -l 2G "$TEST_FILE"
>
> sleep 1
> echo "------setup file----------------"
> fio --name=setup --filename="$TEST_FILE" --rw=randwrite --bs=4k \
> --ioengine=libaio --iodepth=64 --size=2G --io_size=512M \
> --direct=0 --end_fsync=1 --minimal
>
> sync
> sleep 3
>
> echo "-----start test-----------------"
> time fio --name=tst --filename="$TEST_FILE" --rw=write --bs=64M \
> --ioengine=libaio --iodepth=16 --size=2G --direct=1
> ```
>
> ---
> patch: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch
> Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path
>
> ---
> fs/ocfs2/aops.c | 31 +++++++++++++++++--------------
> fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
> fs/ocfs2/journal.h | 5 +++--
> 3 files changed, 50 insertions(+), 20 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 7a65d5a36a3e..d9be765662dc 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> handle_t *handle = NULL;
> loff_t end = offset + bytes;
> int ret = 0, credits = 0;
> + /* do the same job of jbd2_max_user_trans_buffers() */
> + journal_t *journal = osb->journal->j_journal;
> + int jbd2_max = journal->j_max_transaction_buffers -
> + journal->j_transaction_overhead_buffers;
>
> ocfs2_init_dealloc_ctxt(&dealloc);
>
> @@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> }
>
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> - ret = ocfs2_assure_trans_credits(handle, credits);
> - if (ret < 0) {
> + ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
> + if (ret == 1) {
> + goto unlock;
> + } else if (ret < 0) {
> mlog_errno(ret);
> break;
> }
> @@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> ocfs2_commit_trans(osb, handle);
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + if (dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> +
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out:
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index e5f58ff2175f..57ad69d19494 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
> * credits. Similar notes regarding data consistency and locking implications
> * as for ocfs2_extend_trans() apply here.
> */
> -int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
> +int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
> + struct buffer_head *bh, int nblocks, int jbd2_max)
> {
> + handle_t *handle = *i_handle;
> int old_nblks = jbd2_handle_buffer_credits(handle);
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> + int ret, wanted;
> + int plan = nblocks << 1; /* Double the credits to prevent boundary issues */
>
> trace_ocfs2_assure_trans_credits(old_nblks);
> - if (old_nblks >= nblocks)
> - return 0;
> - return ocfs2_extend_trans(handle, nblocks - old_nblks);
> +
> + wanted = old_nblks + plan;
> +
> + if (wanted < jbd2_max) {
> + if (old_nblks < nblocks)
> + return ocfs2_extend_trans(handle, nblocks - old_nblks);
> + else
> + return 0;
> + }
> +
> + ocfs2_commit_trans(osb, handle);
> + handle = ocfs2_start_trans(osb, plan);
> + if (IS_ERR(handle)) {
> + mlog_errno(PTR_ERR(handle));
> + return 1; /* caller must not commit trans for this error */
> + }
> +
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret)
> + mlog_errno(ret);
> +
> + *i_handle = handle;
> + return ret;
> }
>
> /*
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 6397170f302f..08a76ffb870a 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb,
> int ocfs2_commit_trans(struct ocfs2_super *osb,
> handle_t *handle);
> int ocfs2_extend_trans(handle_t *handle, int nblocks);
> -int ocfs2_assure_trans_credits(handle_t *handle,
> - int nblocks);
> +int ocfs2_assure_trans_credits(struct inode *inode,
> + handle_t **i_handle, struct buffer_head *bh,
> + int nblocks, int jbd2_max);
> int ocfs2_allocate_extend_trans(handle_t *handle,
> int thresh);
>
> --
> 2.43.0
>
> ----------
> patch: 0001-ocfs2-split-trans-in-end-dio-to-avoid-credit-exhaust.patch
> Subject: [PATCH] ocfs2: split trans in end dio to avoid credit exhaustion
>
> ---
> fs/ocfs2/aops.c | 77 +++++++++++++++++++++++++++++--------------------
> 1 file changed, 45 insertions(+), 32 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 7a65d5a36a3e..9053f1ee587a 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2279,6 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> handle_t *handle = NULL;
> loff_t end = offset + bytes;
> int ret = 0, credits = 0;
> + int cnt = 0, trans_start = 0, mod = 500;
>
> ocfs2_init_dealloc_ctxt(&dealloc);
>
> @@ -2295,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2327,44 +2316,68 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>
> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>
> - handle = ocfs2_start_trans(osb, credits);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto unlock;
> - }
> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> - OCFS2_JOURNAL_ACCESS_WRITE);
> - if (ret) {
> - mlog_errno(ret);
> - goto commit;
> - }
> -
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> - ret = ocfs2_assure_trans_credits(handle, credits);
> - if (ret < 0) {
> - mlog_errno(ret);
> - break;
> + if ((cnt++ % mod) == 0) {
> + trans_start = 1;
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret) {
> + mlog_errno(ret);
> + goto unlock;
> + }
> }
> ret = ocfs2_mark_extent_written(inode, &et, handle,
> ue->ue_cpos, 1,
> ue->ue_phys,
> meta_ac, &dealloc);
> + if ((cnt % mod) == 0) {
> + ocfs2_commit_trans(osb, handle);
> + if (ret < 0) {
> + mlog_errno(ret);
> + break;
> + }
> + trans_start = 0;
> + }
> + }
> +
> + //commit if above loop doesn't do
> + if (trans_start) {
> + ocfs2_commit_trans(osb, handle);
> if (ret < 0) {
> mlog_errno(ret);
> - break;
> }
> }
>
> if (end > i_size_read(inode)) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> if (ret < 0)
> mlog_errno(ret);
> + ocfs2_commit_trans(osb, handle);
> }
> -commit:
> - ocfs2_commit_trans(osb, handle);
> +
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + /* everything looks good, let's start the cleanup */
> + if (dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out:
> --
> 2.43.0
>
>
> Thanks,
> Heming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-19 1:53 ` Joseph Qi
@ 2026-03-24 7:18 ` Heming Zhao
2026-03-25 3:31 ` Heming Zhao
1 sibling, 0 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-24 7:18 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su
Sorry, I just found this mail. Please ignore my previous mail.
On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote:
>
>
> On 3/18/26 2:40 PM, Heming Zhao wrote:
> > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 3/13/26 11:17 AM, Heming Zhao wrote:
> >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> >>>> Almost looks fine, minor updates below.
> >>>>
> >>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
> >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> >>>>> ocfs2_dio_end_io_write
> >>>>> ocfs2_mark_extent_written
> >>>>> ocfs2_change_extent_flag
> >>>>> ocfs2_split_extent
> >>>>> ocfs2_try_to_merge_extent
> >>>>> ocfs2_extend_rotate_transaction
> >>>>> ocfs2_extend_trans
> >>>>> jbd2__journal_restart
> >>>>> start_this_handle
> >>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>>>
> >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>>>> handle each extent in a separate transaction.
> >>>>>
> >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>>>> only be removed from the orphan list after the extent tree update is complete.
> >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
> >>>>> won't leave stale blocks beyond EOF.
> >>>>>
> >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> >>>>> insufficient transaction credits").
> >>>>>
> >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >>>>>
> >>>>> Suggested-by: Jan Kara <jack@suse.cz>
> >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>>>> ---
> >>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >>>>> 1 file changed, 29 insertions(+), 29 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>>>> index 09146b43d1f0..91997b330d39 100644
> >>>>> --- a/fs/ocfs2/aops.c
> >>>>> +++ b/fs/ocfs2/aops.c
> >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>> goto out;
> >>>>> }
> >>>>>
> >>>>> - /* Delete orphan before acquire i_rwsem. */
> >>>>> - if (dwc->dw_orphaned) {
> >>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>>>> -
> >>>>> - end = end > i_size_read(inode) ? end : 0;
> >>>>> -
> >>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>>>> - !!end, end);
> >>>>> - if (ret < 0)
> >>>>> - mlog_errno(ret);
> >>>>> - }
> >>>>> -
> >>>>> down_write(&oi->ip_alloc_sem);
> >>>>> di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>>>
> >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>>
> >>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>>>
> >>>>> - handle = ocfs2_start_trans(osb, credits);
> >>>>> - if (IS_ERR(handle)) {
> >>>>> - ret = PTR_ERR(handle);
> >>>>> - mlog_errno(ret);
> >>>>> - goto unlock;
> >>>>> - }
> >>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> - OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> - if (ret) {
> >>>>> - mlog_errno(ret);
> >>>>> - goto commit;
> >>>>> - }
> >>>>> -
> >>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>>>> - ret = ocfs2_assure_trans_credits(handle, credits);
> >>>>> - if (ret < 0) {
> >>>>> + handle = ocfs2_start_trans(osb, credits);
> >>>>> + if (IS_ERR(handle)) {
> >>>>> + ret = PTR_ERR(handle);
> >>>>> mlog_errno(ret);
> >>>>> break;
> >>>>
> >>>> I'd rather goto unlock directly without update i_size in case error.
> >>>
> >>> agree
> >>>>
> >>>>> }
> >>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> + OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> + if (ret) {
> >>>>> + mlog_errno(ret);
> >>>>> + ocfs2_commit_trans(osb, handle);
> >>>>> + break;
> >>>>
> >>>> Ditto.
> >>>
> >>> agree
> >>>>
> >>>>> + }
> >>>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>>>> ue->ue_cpos, 1,
> >>>>> ue->ue_phys,
> >>>>> meta_ac, &dealloc);
> >>>>> if (ret < 0) {
> >>>>> mlog_errno(ret);
> >>>>> + ocfs2_commit_trans(osb, handle);
> >>>>> break;
> >>>>
> >>>> Ditto.
> >>>
> >>> The existing code still updates i_size even if ocfs2_mark_extent_written()
> >>> returns an error. I am not certain whether updating i_size in this case is
> >>> correct, but I prefer to maintain the original logic for now.
> >>> Does that seem reasonable to you?
> >>>
> >>
> >> Since it returns 0 for unwritten extents, it looks fine.
> >>
> >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> >>>> to save the extra call in case error.
> >>>
> >>> agree
> >>>
> >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> >>> is about 12107 ~ 12457.
> >>> Regarding the transaction splitting in the unwritten handling loop: this patch
> >>> introduces a minor performance regression. I would like to merge some of the
> >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> >>> What do you think of this approach?
> >>>
> >>
> >> Seems we can limit max credit for a single transaction here?
> >> If exceeds, restart a new one.
> >>
> >> Joseph,
> >> Thanks
> >
> > sorry for the late reply, I spend some time running the tests.
> > I have created two different types of patches (see below) to address this issue.
> >
> > time consumption results:
> >
> > ocfs2-dynamic-... patch
> > real 2m9.900s
> > user 0m0.333s
> > sys 0m22.687s
> >
> > %200 of ocfs2-split-trans-... patch
> > real 1m48.901s
> > user 0m0.306s
> > sys 0m23.019s
> >
> > %500 of ocfs2-split-trans-... patch
> > real 1m49.350s
> > user 0m0.299s
> > sys 0m22.718s
> >
> > As shown above, the fixed "500% mode" is faster than the dynamic style.
> > Which approach do you prefer?
> >
> I'd prefer solution B, it looks simpler to directly set a commit batch.
> Some comments for the formal patch:
> 1. I don't think you have to use 'cnt + trans_start + mod' to control
> the batch commit, just check 'batch + handle' seems enough.
> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
> 3. Do not update i_size in case error.
> 4. Correctly handle commit transaction in case error.
>
> Thanks,
> Joseph
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-19 1:53 ` Joseph Qi
2026-03-24 7:18 ` Heming Zhao
@ 2026-03-25 3:31 ` Heming Zhao
2026-03-25 6:33 ` Joseph Qi
1 sibling, 1 reply; 13+ messages in thread
From: Heming Zhao @ 2026-03-25 3:31 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su
On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote:
>
>
> On 3/18/26 2:40 PM, Heming Zhao wrote:
> > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 3/13/26 11:17 AM, Heming Zhao wrote:
> >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> >>>> Almost looks fine, minor updates below.
> >>>>
> >>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
> >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> >>>>> ocfs2_dio_end_io_write
> >>>>> ocfs2_mark_extent_written
> >>>>> ocfs2_change_extent_flag
> >>>>> ocfs2_split_extent
> >>>>> ocfs2_try_to_merge_extent
> >>>>> ocfs2_extend_rotate_transaction
> >>>>> ocfs2_extend_trans
> >>>>> jbd2__journal_restart
> >>>>> start_this_handle
> >>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>>>
> >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>>>> handle each extent in a separate transaction.
> >>>>>
> >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>>>> only be removed from the orphan list after the extent tree update is complete.
> >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
> >>>>> won't leave stale blocks beyond EOF.
> >>>>>
> >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> >>>>> insufficient transaction credits").
> >>>>>
> >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >>>>>
> >>>>> Suggested-by: Jan Kara <jack@suse.cz>
> >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>>>> ---
> >>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >>>>> 1 file changed, 29 insertions(+), 29 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>>>> index 09146b43d1f0..91997b330d39 100644
> >>>>> --- a/fs/ocfs2/aops.c
> >>>>> +++ b/fs/ocfs2/aops.c
> >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>> goto out;
> >>>>> }
> >>>>>
> >>>>> - /* Delete orphan before acquire i_rwsem. */
> >>>>> - if (dwc->dw_orphaned) {
> >>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>>>> -
> >>>>> - end = end > i_size_read(inode) ? end : 0;
> >>>>> -
> >>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>>>> - !!end, end);
> >>>>> - if (ret < 0)
> >>>>> - mlog_errno(ret);
> >>>>> - }
> >>>>> -
> >>>>> down_write(&oi->ip_alloc_sem);
> >>>>> di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>>>
> >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>>
> >>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>>>
> >>>>> - handle = ocfs2_start_trans(osb, credits);
> >>>>> - if (IS_ERR(handle)) {
> >>>>> - ret = PTR_ERR(handle);
> >>>>> - mlog_errno(ret);
> >>>>> - goto unlock;
> >>>>> - }
> >>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> - OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> - if (ret) {
> >>>>> - mlog_errno(ret);
> >>>>> - goto commit;
> >>>>> - }
> >>>>> -
> >>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>>>> - ret = ocfs2_assure_trans_credits(handle, credits);
> >>>>> - if (ret < 0) {
> >>>>> + handle = ocfs2_start_trans(osb, credits);
> >>>>> + if (IS_ERR(handle)) {
> >>>>> + ret = PTR_ERR(handle);
> >>>>> mlog_errno(ret);
> >>>>> break;
> >>>>
> >>>> I'd rather goto unlock directly without update i_size in case error.
> >>>
> >>> agree
> >>>>
> >>>>> }
> >>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> + OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> + if (ret) {
> >>>>> + mlog_errno(ret);
> >>>>> + ocfs2_commit_trans(osb, handle);
> >>>>> + break;
> >>>>
> >>>> Ditto.
> >>>
> >>> agree
> >>>>
> >>>>> + }
> >>>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>>>> ue->ue_cpos, 1,
> >>>>> ue->ue_phys,
> >>>>> meta_ac, &dealloc);
> >>>>> if (ret < 0) {
> >>>>> mlog_errno(ret);
> >>>>> + ocfs2_commit_trans(osb, handle);
> >>>>> break;
> >>>>
> >>>> Ditto.
> >>>
> >>> The existing code still updates i_size even if ocfs2_mark_extent_written()
> >>> returns an error. I am not certain whether updating i_size in this case is
> >>> correct, but I prefer to maintain the original logic for now.
> >>> Does that seem reasonable to you?
> >>>
> >>
> >> Since it returns 0 for unwritten extents, it looks fine.
> >>
> >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> >>>> to save the extra call in case error.
> >>>
> >>> agree
> >>>
> >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> >>> is about 12107 ~ 12457.
> >>> Regarding the transaction splitting in the unwritten handling loop: this patch
> >>> introduces a minor performance regression. I would like to merge some of the
> >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> >>> What do you think of this approach?
> >>>
> >>
> >> Seems we can limit max credit for a single transaction here?
> >> If exceeds, restart a new one.
> >>
> >> Joseph,
> >> Thanks
> >
> > sorry for the late reply, I spend some time running the tests.
> > I have created two different types of patches (see below) to address this issue.
> >
> > time consumption results:
> >
> > ocfs2-dynamic-... patch
> > real 2m9.900s
> > user 0m0.333s
> > sys 0m22.687s
> >
> > %200 of ocfs2-split-trans-... patch
> > real 1m48.901s
> > user 0m0.306s
> > sys 0m23.019s
> >
> > %500 of ocfs2-split-trans-... patch
> > real 1m49.350s
> > user 0m0.299s
> > sys 0m22.718s
> >
> > As shown above, the fixed "500% mode" is faster than the dynamic style.
> > Which approach do you prefer?
> >
> I'd prefer solution B, it looks simpler to directly set a commit batch.
> Some comments for the formal patch:
> 1. I don't think you have to use 'cnt + trans_start + mod' to control
> the batch commit, just check 'batch + handle' seems enough.
> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
> 3. Do not update i_size in case error.
> 4. Correctly handle commit transaction in case error.
>
> Thanks,
> Joseph
>
I hope I understand your point correctly.
creating and setting OCFS2_DIO_MARK_EXTENT_BATCH to 800, corresponding to the
%200 used in my previous 'cnt+trans_start+mod' patch.
the time consumption:
real 1m50.508s
user 0m0.365s
sys 0m22.829s
```
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..4e3a51e86e32 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -37,6 +37,8 @@
#include "namei.h"
#include "sysfile.h"
+#define OCFS2_DIO_MARK_EXTENT_BATCH 800
+
static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
@@ -2279,6 +2281,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
handle_t *handle = NULL;
loff_t end = offset + bytes;
int ret = 0, credits = 0;
+ int trans_err = 0, wanted;
ocfs2_init_dealloc_ctxt(&dealloc);
@@ -2295,18 +2298,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2341,11 +2332,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
}
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
- ret = ocfs2_assure_trans_credits(handle, credits);
- if (ret < 0) {
- mlog_errno(ret);
- break;
- }
ret = ocfs2_mark_extent_written(inode, &et, handle,
ue->ue_cpos, 1,
ue->ue_phys,
@@ -2354,17 +2340,46 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
mlog_errno(ret);
break;
}
- }
- if (end > i_size_read(inode)) {
- ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
- if (ret < 0)
- mlog_errno(ret);
+ wanted = jbd2_handle_buffer_credits(handle) + (credits << 2);
+ if (wanted > OCFS2_DIO_MARK_EXTENT_BATCH) {
+ ocfs2_commit_trans(osb, handle);
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ trans_err = 1;
+ break;
+ }
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ break;
+ }
+ }
}
+
+ if (!trans_err) {
+ if (end > i_size_read(inode)) {
+ ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
commit:
- ocfs2_commit_trans(osb, handle);
+ ocfs2_commit_trans(osb, handle);
+ }
unlock:
up_write(&oi->ip_alloc_sem);
+
+ /* everything looks good, let's start the cleanup */
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
```
if this approach is acceptable, I will send v3.
- Heming
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-25 3:31 ` Heming Zhao
@ 2026-03-25 6:33 ` Joseph Qi
0 siblings, 0 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-25 6:33 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su
On 3/25/26 11:31 AM, Heming Zhao wrote:
> On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote:
>>
>>
>> On 3/18/26 2:40 PM, Heming Zhao wrote:
>>> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 3/13/26 11:17 AM, Heming Zhao wrote:
>>>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>>>>> Almost looks fine, minor updates below.
>>>>>>
>>>>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>>>>> ocfs2_dio_end_io_write
>>>>>>> ocfs2_mark_extent_written
>>>>>>> ocfs2_change_extent_flag
>>>>>>> ocfs2_split_extent
>>>>>>> ocfs2_try_to_merge_extent
>>>>>>> ocfs2_extend_rotate_transaction
>>>>>>> ocfs2_extend_trans
>>>>>>> jbd2__journal_restart
>>>>>>> start_this_handle
>>>>>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>>>>
>>>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>>>>> handle each extent in a separate transaction.
>>>>>>>
>>>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>>>>> only be removed from the orphan list after the extent tree update is complete.
>>>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>>>>> won't leave stale blocks beyond EOF.
>>>>>>>
>>>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>>>>> insufficient transaction credits").
>>>>>>>
>>>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>>>>
>>>>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>>>>> ---
>>>>>>> fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>>>> 1 file changed, 29 insertions(+), 29 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>>> index 09146b43d1f0..91997b330d39 100644
>>>>>>> --- a/fs/ocfs2/aops.c
>>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>>> goto out;
>>>>>>> }
>>>>>>>
>>>>>>> - /* Delete orphan before acquire i_rwsem. */
>>>>>>> - if (dwc->dw_orphaned) {
>>>>>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>>>>> -
>>>>>>> - end = end > i_size_read(inode) ? end : 0;
>>>>>>> -
>>>>>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>>>>> - !!end, end);
>>>>>>> - if (ret < 0)
>>>>>>> - mlog_errno(ret);
>>>>>>> - }
>>>>>>> -
>>>>>>> down_write(&oi->ip_alloc_sem);
>>>>>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>>>>
>>>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>>>
>>>>>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>>>>
>>>>>>> - handle = ocfs2_start_trans(osb, credits);
>>>>>>> - if (IS_ERR(handle)) {
>>>>>>> - ret = PTR_ERR(handle);
>>>>>>> - mlog_errno(ret);
>>>>>>> - goto unlock;
>>>>>>> - }
>>>>>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>>>>>> - if (ret) {
>>>>>>> - mlog_errno(ret);
>>>>>>> - goto commit;
>>>>>>> - }
>>>>>>> -
>>>>>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>>>>> - ret = ocfs2_assure_trans_credits(handle, credits);
>>>>>>> - if (ret < 0) {
>>>>>>> + handle = ocfs2_start_trans(osb, credits);
>>>>>>> + if (IS_ERR(handle)) {
>>>>>>> + ret = PTR_ERR(handle);
>>>>>>> mlog_errno(ret);
>>>>>>> break;
>>>>>>
>>>>>> I'd rather goto unlock directly without update i_size in case error.
>>>>>
>>>>> agree
>>>>>>
>>>>>>> }
>>>>>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>>>>>> + if (ret) {
>>>>>>> + mlog_errno(ret);
>>>>>>> + ocfs2_commit_trans(osb, handle);
>>>>>>> + break;
>>>>>>
>>>>>> Ditto.
>>>>>
>>>>> agree
>>>>>>
>>>>>>> + }
>>>>>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>>>> ue->ue_cpos, 1,
>>>>>>> ue->ue_phys,
>>>>>>> meta_ac, &dealloc);
>>>>>>> if (ret < 0) {
>>>>>>> mlog_errno(ret);
>>>>>>> + ocfs2_commit_trans(osb, handle);
>>>>>>> break;
>>>>>>
>>>>>> Ditto.
>>>>>
>>>>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>>>>> returns an error. I am not certain whether updating i_size in this case is
>>>>> correct, but I prefer to maintain the original logic for now.
>>>>> Does that seem reasonable to you?
>>>>>
>>>>
>>>> Since it returns 0 for unwritten extents, it looks fine.
>>>>
>>>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>>>>>> to save the extra call in case error.
>>>>>
>>>>> agree
>>>>>
>>>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
>>>>> is about 12107 ~ 12457.
>>>>> Regarding the transaction splitting in the unwritten handling loop: this patch
>>>>> introduces a minor performance regression. I would like to merge some of the
>>>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
>>>>> What do you think of this approach?
>>>>>
>>>>
>>>> Seems we can limit max credit for a single transaction here?
>>>> If exceeds, restart a new one.
>>>>
>>>> Joseph,
>>>> Thanks
>>>
>>> sorry for the late reply, I spend some time running the tests.
>>> I have created two different types of patches (see below) to address this issue.
>>>
>>> time consumption results:
>>>
>>> ocfs2-dynamic-... patch
>>> real 2m9.900s
>>> user 0m0.333s
>>> sys 0m22.687s
>>>
>>> %200 of ocfs2-split-trans-... patch
>>> real 1m48.901s
>>> user 0m0.306s
>>> sys 0m23.019s
>>>
>>> %500 of ocfs2-split-trans-... patch
>>> real 1m49.350s
>>> user 0m0.299s
>>> sys 0m22.718s
>>>
>>> As shown above, the fixed "500% mode" is faster than the dynamic style.
>>> Which approach do you prefer?
>>>
>> I'd prefer solution B, it looks simpler to directly set a commit batch.
>> Some comments for the formal patch:
>> 1. I don't think you have to use 'cnt + trans_start + mod' to control
>> the batch commit, just check 'batch + handle' seems enough.
>> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
>> 3. Do not update i_size in case error.
>> 4. Correctly handle commit transaction in case error.
>>
>> Thanks,
>> Joseph
>>
>
> I hope I understand your point correctly.
> creating and setting OCFS2_DIO_MARK_EXTENT_BATCH to 800, corresponding to the
> %200 used in my previous 'cnt+trans_start+mod' patch.
>
Seems you've proposed a new solution?
In your previous mail, I think it is:
list_for_each_entry() {
if (!handle)
ocfs2_start_trans();
ocfs2_journal_access_di()
ocfs2_mark_extent_written()
if (batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
ocfs2_commit_trans();
handle = NULL;
}
}
Joseph
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-03-25 6:33 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-13 2:04 ` Joseph Qi
2026-03-13 3:17 ` Heming Zhao
2026-03-13 4:53 ` Joseph Qi
2026-03-13 5:35 ` Joseph Qi
2026-03-18 6:40 ` Heming Zhao
2026-03-19 1:53 ` Joseph Qi
2026-03-24 7:18 ` Heming Zhao
2026-03-25 3:31 ` Heming Zhao
2026-03-25 6:33 ` Joseph Qi
2026-03-24 6:00 ` Heming Zhao
2026-03-13 2:27 ` Joseph Qi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox