public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid
@ 2026-03-12 16:27 Heming Zhao
  2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
  0 siblings, 1 reply; 13+ messages in thread
From: Heming Zhao @ 2026-03-12 16:27 UTC (permalink / raw)
  To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel


v1->v2:
following the review comments, restore the i_size update code in
ocfs2_dio_end_io_write().

For easier merging, this patch is based on Joseph's patch:
https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t 

Heming Zhao (1):
  ocfs2: split transactions in dio completion to avoid credit exhaustion

 fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao
@ 2026-03-12 16:27 ` Heming Zhao
  2026-03-13  2:04   ` Joseph Qi
  2026-03-13  2:27   ` Joseph Qi
  0 siblings, 2 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-12 16:27 UTC (permalink / raw)
  To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel

During ocfs2 dio operations, JBD2 may report warnings via following call trace:
ocfs2_dio_end_io_write
 ocfs2_mark_extent_written
  ocfs2_change_extent_flag
   ocfs2_split_extent
    ocfs2_try_to_merge_extent
     ocfs2_extend_rotate_transaction
      ocfs2_extend_trans
       jbd2__journal_restart
        start_this_handle
         output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449

To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
handle each extent in a separate transaction.

Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
only be removed from the orphan list after the extent tree update is complete.
this ensures that if a crash occurs in the middle of extent tree updates, we
won't leave stale blocks beyond EOF.

This patch also removes the only call to ocfs2_assure_trans_credits(), which
was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
insufficient transaction credits").

Finally, thanks to Jans for providing the bug fix prototype and suggestions.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 09146b43d1f0..91997b330d39 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		goto out;
 	}
 
-	/* Delete orphan before acquire i_rwsem. */
-	if (dwc->dw_orphaned) {
-		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		end = end > i_size_read(inode) ? end : 0;
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
-				!!end, end);
-		if (ret < 0)
-			mlog_errno(ret);
-	}
-
 	down_write(&oi->ip_alloc_sem);
 	di = (struct ocfs2_dinode *)di_bh->b_data;
 
@@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 
 	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
 
-	handle = ocfs2_start_trans(osb, credits);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		mlog_errno(ret);
-		goto unlock;
-	}
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto commit;
-	}
-
 	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
-		ret = ocfs2_assure_trans_credits(handle, credits);
-		if (ret < 0) {
+		handle = ocfs2_start_trans(osb, credits);
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
 			mlog_errno(ret);
 			break;
 		}
+		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+					      OCFS2_JOURNAL_ACCESS_WRITE);
+		if (ret) {
+			mlog_errno(ret);
+			ocfs2_commit_trans(osb, handle);
+			break;
+		}
 		ret = ocfs2_mark_extent_written(inode, &et, handle,
 						ue->ue_cpos, 1,
 						ue->ue_phys,
 						meta_ac, &dealloc);
 		if (ret < 0) {
 			mlog_errno(ret);
+			ocfs2_commit_trans(osb, handle);
 			break;
 		}
+		ocfs2_commit_trans(osb, handle);
 	}
 
 	if (end > i_size_read(inode)) {
+		handle = ocfs2_start_trans(osb, credits);
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
+			mlog_errno(ret);
+			goto unlock;
+		}
 		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
 		if (ret < 0)
 			mlog_errno(ret);
+		ocfs2_commit_trans(osb, handle);
 	}
-commit:
-	ocfs2_commit_trans(osb, handle);
+
 unlock:
 	up_write(&oi->ip_alloc_sem);
+
+	/* everything looks good, let's start the cleanup */
+	if (dwc->dw_orphaned) {
+		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		if (ret < 0)
+			mlog_errno(ret);
+	}
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 out:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
@ 2026-03-13  2:04   ` Joseph Qi
  2026-03-13  3:17     ` Heming Zhao
  2026-03-13  2:27   ` Joseph Qi
  1 sibling, 1 reply; 13+ messages in thread
From: Joseph Qi @ 2026-03-13  2:04 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara

Almost looks fine, minor updates below.

On 3/13/26 12:27 AM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
>  ocfs2_mark_extent_written
>   ocfs2_change_extent_flag
>    ocfs2_split_extent
>     ocfs2_try_to_merge_extent
>      ocfs2_extend_rotate_transaction
>       ocfs2_extend_trans
>        jbd2__journal_restart
>         start_this_handle
>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> 
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> handle each extent in a separate transaction.
> 
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
> 
> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> insufficient transaction credits").
> 
> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>  1 file changed, 29 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..91997b330d39 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  		goto out;
>  	}
>  
> -	/* Delete orphan before acquire i_rwsem. */
> -	if (dwc->dw_orphaned) {
> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> -		end = end > i_size_read(inode) ? end : 0;
> -
> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> -				!!end, end);
> -		if (ret < 0)
> -			mlog_errno(ret);
> -	}
> -
>  	down_write(&oi->ip_alloc_sem);
>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>  
> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  
>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>  
> -	handle = ocfs2_start_trans(osb, credits);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> -		mlog_errno(ret);
> -		goto unlock;
> -	}
> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> -	if (ret) {
> -		mlog_errno(ret);
> -		goto commit;
> -	}
> -
>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> -		ret = ocfs2_assure_trans_credits(handle, credits);
> -		if (ret < 0) {
> +		handle = ocfs2_start_trans(osb, credits);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
>  			mlog_errno(ret);
>  			break;

I'd rather goto unlock directly without update i_size in case error.

>  		}
> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> +					      OCFS2_JOURNAL_ACCESS_WRITE);
> +		if (ret) {
> +			mlog_errno(ret);
> +			ocfs2_commit_trans(osb, handle);
> +			break;

Ditto.

> +		}
>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>  						ue->ue_cpos, 1,
>  						ue->ue_phys,
>  						meta_ac, &dealloc);
>  		if (ret < 0) {
>  			mlog_errno(ret);
> +			ocfs2_commit_trans(osb, handle);
>  			break;

Ditto.
BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
to save the extra call in case error.

Thanks,
Joseph

>  		}
> +		ocfs2_commit_trans(osb, handle);
>  	}
>  
>  	if (end > i_size_read(inode)) {
> +		handle = ocfs2_start_trans(osb, credits);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
> +			mlog_errno(ret);
> +			goto unlock;
> +		}
>  		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
>  		if (ret < 0)
>  			mlog_errno(ret);
> +		ocfs2_commit_trans(osb, handle);
>  	}
> -commit:
> -	ocfs2_commit_trans(osb, handle);
> +
>  unlock:
>  	up_write(&oi->ip_alloc_sem);
> +
> +	/* everything looks good, let's start the cleanup */
> +	if (dwc->dw_orphaned) {
> +		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> +		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> +		if (ret < 0)
> +			mlog_errno(ret);
> +	}
>  	ocfs2_inode_unlock(inode, 1);
>  	brelse(di_bh);
>  out:


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
  2026-03-13  2:04   ` Joseph Qi
@ 2026-03-13  2:27   ` Joseph Qi
  1 sibling, 0 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-13  2:27 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara



On 3/13/26 12:27 AM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
>  ocfs2_mark_extent_written
>   ocfs2_change_extent_flag
>    ocfs2_split_extent
>     ocfs2_try_to_merge_extent
>      ocfs2_extend_rotate_transaction
>       ocfs2_extend_trans
>        jbd2__journal_restart
>         start_this_handle
>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> 
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> handle each extent in a separate transaction.
> 
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
> 
> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> insufficient transaction credits").
> 
> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>  1 file changed, 29 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..91997b330d39 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  		goto out;
>  	}
>  
> -	/* Delete orphan before acquire i_rwsem. */
> -	if (dwc->dw_orphaned) {
> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> -		end = end > i_size_read(inode) ? end : 0;
> -
> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> -				!!end, end);
> -		if (ret < 0)
> -			mlog_errno(ret);
> -	}
> -
>  	down_write(&oi->ip_alloc_sem);
>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>  
> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  
>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>  
> -	handle = ocfs2_start_trans(osb, credits);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> -		mlog_errno(ret);
> -		goto unlock;
> -	}
> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> -	if (ret) {
> -		mlog_errno(ret);
> -		goto commit;
> -	}
> -
>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> -		ret = ocfs2_assure_trans_credits(handle, credits);

Since now no any caller for ocfs2_assure_trans_credits(), we may remove
it in PATCH 2 of this series.

Thanks,
Joseph

> -		if (ret < 0) {
> +		handle = ocfs2_start_trans(osb, credits);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
>  			mlog_errno(ret);
>  			break;
>  		}
> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> +					      OCFS2_JOURNAL_ACCESS_WRITE);
> +		if (ret) {
> +			mlog_errno(ret);
> +			ocfs2_commit_trans(osb, handle);
> +			break;
> +		}
>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>  						ue->ue_cpos, 1,
>  						ue->ue_phys,
>  						meta_ac, &dealloc);
>  		if (ret < 0) {
>  			mlog_errno(ret);
> +			ocfs2_commit_trans(osb, handle);
>  			break;
>  		}
> +		ocfs2_commit_trans(osb, handle);
>  	}
>  
>  	if (end > i_size_read(inode)) {
> +		handle = ocfs2_start_trans(osb, credits);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
> +			mlog_errno(ret);
> +			goto unlock;
> +		}
>  		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
>  		if (ret < 0)
>  			mlog_errno(ret);
> +		ocfs2_commit_trans(osb, handle);
>  	}
> -commit:
> -	ocfs2_commit_trans(osb, handle);
> +
>  unlock:
>  	up_write(&oi->ip_alloc_sem);
> +
> +	/* everything looks good, let's start the cleanup */
> +	if (dwc->dw_orphaned) {
> +		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> +		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> +		if (ret < 0)
> +			mlog_errno(ret);
> +	}
>  	ocfs2_inode_unlock(inode, 1);
>  	brelse(di_bh);
>  out:


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-13  2:04   ` Joseph Qi
@ 2026-03-13  3:17     ` Heming Zhao
  2026-03-13  4:53       ` Joseph Qi
  0 siblings, 1 reply; 13+ messages in thread
From: Heming Zhao @ 2026-03-13  3:17 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara

On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> Almost looks fine, minor updates below.
> 
> On 3/13/26 12:27 AM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > ocfs2_dio_end_io_write
> >  ocfs2_mark_extent_written
> >   ocfs2_change_extent_flag
> >    ocfs2_split_extent
> >     ocfs2_try_to_merge_extent
> >      ocfs2_extend_rotate_transaction
> >       ocfs2_extend_trans
> >        jbd2__journal_restart
> >         start_this_handle
> >          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> > 
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> > handle each extent in a separate transaction.
> > 
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > this ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> > 
> > This patch also removes the only call to ocfs2_assure_trans_credits(), which
> > was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> > insufficient transaction credits").
> > 
> > Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> > 
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >  1 file changed, 29 insertions(+), 29 deletions(-)
> > 
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..91997b330d39 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >  		goto out;
> >  	}
> >  
> > -	/* Delete orphan before acquire i_rwsem. */
> > -	if (dwc->dw_orphaned) {
> > -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > -
> > -		end = end > i_size_read(inode) ? end : 0;
> > -
> > -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> > -				!!end, end);
> > -		if (ret < 0)
> > -			mlog_errno(ret);
> > -	}
> > -
> >  	down_write(&oi->ip_alloc_sem);
> >  	di = (struct ocfs2_dinode *)di_bh->b_data;
> >  
> > @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >  
> >  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >  
> > -	handle = ocfs2_start_trans(osb, credits);
> > -	if (IS_ERR(handle)) {
> > -		ret = PTR_ERR(handle);
> > -		mlog_errno(ret);
> > -		goto unlock;
> > -	}
> > -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > -				      OCFS2_JOURNAL_ACCESS_WRITE);
> > -	if (ret) {
> > -		mlog_errno(ret);
> > -		goto commit;
> > -	}
> > -
> >  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> > -		ret = ocfs2_assure_trans_credits(handle, credits);
> > -		if (ret < 0) {
> > +		handle = ocfs2_start_trans(osb, credits);
> > +		if (IS_ERR(handle)) {
> > +			ret = PTR_ERR(handle);
> >  			mlog_errno(ret);
> >  			break;
> 
> I'd rather goto unlock directly without update i_size in case error.

agree
> 
> >  		}
> > +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > +					      OCFS2_JOURNAL_ACCESS_WRITE);
> > +		if (ret) {
> > +			mlog_errno(ret);
> > +			ocfs2_commit_trans(osb, handle);
> > +			break;
> 
> Ditto.

agree
> 
> > +		}
> >  		ret = ocfs2_mark_extent_written(inode, &et, handle,
> >  						ue->ue_cpos, 1,
> >  						ue->ue_phys,
> >  						meta_ac, &dealloc);
> >  		if (ret < 0) {
> >  			mlog_errno(ret);
> > +			ocfs2_commit_trans(osb, handle);
> >  			break;
> 
> Ditto.

The existing code still updates i_size even if ocfs2_mark_extent_written()
returns an error. I am not certain whether updating i_size in this case is
correct, but I prefer to maintain the original logic for now.
Does that seem reasonable to you?

> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> to save the extra call in case error.

agree
> 
> Thanks,
> Joseph
> 

I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
is about 12107 ~ 12457.
Regarding the transaction splitting in the unwritten handling loop: this patch
introduces a minor performance regression. I would like to merge some of the
transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
What do you think of this approach?

Heming
> >  		}
> > +		ocfs2_commit_trans(osb, handle);
> >  	}
> >  
> >  	if (end > i_size_read(inode)) {
> > +		handle = ocfs2_start_trans(osb, credits);
> > +		if (IS_ERR(handle)) {
> > +			ret = PTR_ERR(handle);
> > +			mlog_errno(ret);
> > +			goto unlock;
> > +		}
> >  		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> >  		if (ret < 0)
> >  			mlog_errno(ret);
> > +		ocfs2_commit_trans(osb, handle);
> >  	}
> > -commit:
> > -	ocfs2_commit_trans(osb, handle);
> > +
> >  unlock:
> >  	up_write(&oi->ip_alloc_sem);
> > +
> > +	/* everything looks good, let's start the cleanup */
> > +	if (dwc->dw_orphaned) {
> > +		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > +
> > +		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> > +		if (ret < 0)
> > +			mlog_errno(ret);
> > +	}
> >  	ocfs2_inode_unlock(inode, 1);
> >  	brelse(di_bh);
> >  out:
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-13  3:17     ` Heming Zhao
@ 2026-03-13  4:53       ` Joseph Qi
  2026-03-13  5:35         ` Joseph Qi
  2026-03-18  6:40         ` Heming Zhao
  0 siblings, 2 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-13  4:53 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara



On 3/13/26 11:17 AM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>> Almost looks fine, minor updates below.
>>
>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>> ocfs2_dio_end_io_write
>>>  ocfs2_mark_extent_written
>>>   ocfs2_change_extent_flag
>>>    ocfs2_split_extent
>>>     ocfs2_try_to_merge_extent
>>>      ocfs2_extend_rotate_transaction
>>>       ocfs2_extend_trans
>>>        jbd2__journal_restart
>>>         start_this_handle
>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>> handle each extent in a separate transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>> insufficient transaction credits").
>>>
>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@suse.cz>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..91997b330d39 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>  		goto out;
>>>  	}
>>>  
>>> -	/* Delete orphan before acquire i_rwsem. */
>>> -	if (dwc->dw_orphaned) {
>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>> -
>>> -		end = end > i_size_read(inode) ? end : 0;
>>> -
>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>> -				!!end, end);
>>> -		if (ret < 0)
>>> -			mlog_errno(ret);
>>> -	}
>>> -
>>>  	down_write(&oi->ip_alloc_sem);
>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>>>  
>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>  
>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>  
>>> -	handle = ocfs2_start_trans(osb, credits);
>>> -	if (IS_ERR(handle)) {
>>> -		ret = PTR_ERR(handle);
>>> -		mlog_errno(ret);
>>> -		goto unlock;
>>> -	}
>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
>>> -	if (ret) {
>>> -		mlog_errno(ret);
>>> -		goto commit;
>>> -	}
>>> -
>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
>>> -		if (ret < 0) {
>>> +		handle = ocfs2_start_trans(osb, credits);
>>> +		if (IS_ERR(handle)) {
>>> +			ret = PTR_ERR(handle);
>>>  			mlog_errno(ret);
>>>  			break;
>>
>> I'd rather goto unlock directly without update i_size in case error.
> 
> agree
>>
>>>  		}
>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
>>> +		if (ret) {
>>> +			mlog_errno(ret);
>>> +			ocfs2_commit_trans(osb, handle);
>>> +			break;
>>
>> Ditto.
> 
> agree
>>
>>> +		}
>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>  						ue->ue_cpos, 1,
>>>  						ue->ue_phys,
>>>  						meta_ac, &dealloc);
>>>  		if (ret < 0) {
>>>  			mlog_errno(ret);
>>> +			ocfs2_commit_trans(osb, handle);
>>>  			break;
>>
>> Ditto.
> 
> The existing code still updates i_size even if ocfs2_mark_extent_written()
> returns an error. I am not certain whether updating i_size in this case is
> correct, but I prefer to maintain the original logic for now.
> Does that seem reasonable to you?
> 

Since it returns 0 for unwritten extents, it looks fine.

>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>> to save the extra call in case error.
> 
> agree
> 
> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> is about 12107 ~ 12457.
> Regarding the transaction splitting in the unwritten handling loop: this patch
> introduces a minor performance regression. I would like to merge some of the
> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> What do you think of this approach?
> 

Seems we can limit max credit for a single transaction here?
If exceeds, restart a new one.

Joseph,
Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-13  4:53       ` Joseph Qi
@ 2026-03-13  5:35         ` Joseph Qi
  2026-03-18  6:40         ` Heming Zhao
  1 sibling, 0 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-13  5:35 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara



On 3/13/26 12:53 PM, Joseph Qi wrote:
> 
> 
> On 3/13/26 11:17 AM, Heming Zhao wrote:
>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>> Almost looks fine, minor updates below.
>>>
>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>> ocfs2_dio_end_io_write
>>>>  ocfs2_mark_extent_written
>>>>   ocfs2_change_extent_flag
>>>>    ocfs2_split_extent
>>>>     ocfs2_try_to_merge_extent
>>>>      ocfs2_extend_rotate_transaction
>>>>       ocfs2_extend_trans
>>>>        jbd2__journal_restart
>>>>         start_this_handle
>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>
>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>> handle each extent in a separate transaction.
>>>>
>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>> only be removed from the orphan list after the extent tree update is complete.
>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>> won't leave stale blocks beyond EOF.
>>>>
>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>> insufficient transaction credits").
>>>>
>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>
>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>> ---
>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index 09146b43d1f0..91997b330d39 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>  		goto out;
>>>>  	}
>>>>  
>>>> -	/* Delete orphan before acquire i_rwsem. */
>>>> -	if (dwc->dw_orphaned) {
>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>> -
>>>> -		end = end > i_size_read(inode) ? end : 0;
>>>> -
>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>> -				!!end, end);
>>>> -		if (ret < 0)
>>>> -			mlog_errno(ret);
>>>> -	}
>>>> -
>>>>  	down_write(&oi->ip_alloc_sem);
>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>  
>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>  
>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>  
>>>> -	handle = ocfs2_start_trans(osb, credits);
>>>> -	if (IS_ERR(handle)) {
>>>> -		ret = PTR_ERR(handle);
>>>> -		mlog_errno(ret);
>>>> -		goto unlock;
>>>> -	}
>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
>>>> -	if (ret) {
>>>> -		mlog_errno(ret);
>>>> -		goto commit;
>>>> -	}
>>>> -
>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
>>>> -		if (ret < 0) {
>>>> +		handle = ocfs2_start_trans(osb, credits);
>>>> +		if (IS_ERR(handle)) {
>>>> +			ret = PTR_ERR(handle);
>>>>  			mlog_errno(ret);
>>>>  			break;
>>>
>>> I'd rather goto unlock directly without update i_size in case error.
>>
>> agree
>>>
>>>>  		}
>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
>>>> +		if (ret) {
>>>> +			mlog_errno(ret);
>>>> +			ocfs2_commit_trans(osb, handle);
>>>> +			break;
>>>
>>> Ditto.
>>
>> agree
>>>
>>>> +		}
>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>  						ue->ue_cpos, 1,
>>>>  						ue->ue_phys,
>>>>  						meta_ac, &dealloc);
>>>>  		if (ret < 0) {
>>>>  			mlog_errno(ret);
>>>> +			ocfs2_commit_trans(osb, handle);
>>>>  			break;
>>>
>>> Ditto.
>>
>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>> returns an error. I am not certain whether updating i_size in this case is
>> correct, but I prefer to maintain the original logic for now.
>> Does that seem reasonable to you?
>>
> 
> Since it returns 0 for unwritten extents, it looks fine.
> 
Think it a bit more, I think it would be more acceptable to behave
consistent here.
e.g. succeeds in the first round and fails to start transaction in the
second, now it won't update i_size.

Joseph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-13  4:53       ` Joseph Qi
  2026-03-13  5:35         ` Joseph Qi
@ 2026-03-18  6:40         ` Heming Zhao
  2026-03-19  1:53           ` Joseph Qi
  2026-03-24  6:00           ` Heming Zhao
  1 sibling, 2 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-18  6:40 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su

On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> 
> 
> On 3/13/26 11:17 AM, Heming Zhao wrote:
> > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> >> Almost looks fine, minor updates below.
> >>
> >> On 3/13/26 12:27 AM, Heming Zhao wrote:
> >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> >>> ocfs2_dio_end_io_write
> >>>  ocfs2_mark_extent_written
> >>>   ocfs2_change_extent_flag
> >>>    ocfs2_split_extent
> >>>     ocfs2_try_to_merge_extent
> >>>      ocfs2_extend_rotate_transaction
> >>>       ocfs2_extend_trans
> >>>        jbd2__journal_restart
> >>>         start_this_handle
> >>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>
> >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>> handle each extent in a separate transaction.
> >>>
> >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>> only be removed from the orphan list after the extent tree update is complete.
> >>> this ensures that if a crash occurs in the middle of extent tree updates, we
> >>> won't leave stale blocks beyond EOF.
> >>>
> >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> >>> insufficient transaction credits").
> >>>
> >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >>>
> >>> Suggested-by: Jan Kara <jack@suse.cz>
> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>> ---
> >>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >>>  1 file changed, 29 insertions(+), 29 deletions(-)
> >>>
> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>> index 09146b43d1f0..91997b330d39 100644
> >>> --- a/fs/ocfs2/aops.c
> >>> +++ b/fs/ocfs2/aops.c
> >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>  		goto out;
> >>>  	}
> >>>  
> >>> -	/* Delete orphan before acquire i_rwsem. */
> >>> -	if (dwc->dw_orphaned) {
> >>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>> -
> >>> -		end = end > i_size_read(inode) ? end : 0;
> >>> -
> >>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>> -				!!end, end);
> >>> -		if (ret < 0)
> >>> -			mlog_errno(ret);
> >>> -	}
> >>> -
> >>>  	down_write(&oi->ip_alloc_sem);
> >>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>  
> >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>  
> >>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>  
> >>> -	handle = ocfs2_start_trans(osb, credits);
> >>> -	if (IS_ERR(handle)) {
> >>> -		ret = PTR_ERR(handle);
> >>> -		mlog_errno(ret);
> >>> -		goto unlock;
> >>> -	}
> >>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> >>> -	if (ret) {
> >>> -		mlog_errno(ret);
> >>> -		goto commit;
> >>> -	}
> >>> -
> >>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>> -		ret = ocfs2_assure_trans_credits(handle, credits);
> >>> -		if (ret < 0) {
> >>> +		handle = ocfs2_start_trans(osb, credits);
> >>> +		if (IS_ERR(handle)) {
> >>> +			ret = PTR_ERR(handle);
> >>>  			mlog_errno(ret);
> >>>  			break;
> >>
> >> I'd rather goto unlock directly without update i_size in case error.
> > 
> > agree
> >>
> >>>  		}
> >>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
> >>> +		if (ret) {
> >>> +			mlog_errno(ret);
> >>> +			ocfs2_commit_trans(osb, handle);
> >>> +			break;
> >>
> >> Ditto.
> > 
> > agree
> >>
> >>> +		}
> >>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>>  						ue->ue_cpos, 1,
> >>>  						ue->ue_phys,
> >>>  						meta_ac, &dealloc);
> >>>  		if (ret < 0) {
> >>>  			mlog_errno(ret);
> >>> +			ocfs2_commit_trans(osb, handle);
> >>>  			break;
> >>
> >> Ditto.
> > 
> > The existing code still updates i_size even if ocfs2_mark_extent_written()
> > returns an error. I am not certain whether updating i_size in this case is
> > correct, but I prefer to maintain the original logic for now.
> > Does that seem reasonable to you?
> > 
> 
> Since it returns 0 for unwritten extents, it looks fine.
> 
> >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> >> to save the extra call in case error.
> > 
> > agree
> > 
> > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> > is about 12107 ~ 12457.
> > Regarding the transaction splitting in the unwritten handling loop: this patch
> > introduces a minor performance regression. I would like to merge some of the
> > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> > What do you think of this approach?
> > 
> 
> Seems we can limit max credit for a single transaction here?
> If exceeds, restart a new one.
> 
> Joseph,
> Thanks

sorry for the late reply, I spend some time running the tests.
I have created two different types of patches (see below) to address this issue.

time consumption results:

ocfs2-dynamic-... patch
real    2m9.900s
user    0m0.333s
sys     0m22.687s

%200 of ocfs2-split-trans-... patch
real    1m48.901s
user    0m0.306s
sys     0m23.019s

%500 of ocfs2-split-trans-... patch
real    1m49.350s
user    0m0.299s
sys     0m22.718s

As shown above, the fixed "500% mode" is faster than the dynamic style.
Which approach do you prefer?

---
the test script:
```
#!/bin/bash
# thanks Wolfgang

DISK="vdc"
MOUNTPOINT="/mnt/ocfs2"
TEST_FILE="$MOUNTPOINT/test.bin"

if lsblk | grep "$DISK.*$MOUNTPOINT"; then
    umount $MOUNTPOINT
fi
echo "------mkfs & mount----------------"
mkfs.ocfs2 -b 4K -C 4K --cluster-stack=pcmk -N 2 /dev/$DISK --cluster-name=hacluster
mount.ocfs2 -t ocfs2 /dev/$DISK $MOUNTPOINT

sleep 1
echo "------fallocate----------------"
fallocate -l 2G "$TEST_FILE"

sleep 1
echo "------setup file----------------"
fio --name=setup --filename="$TEST_FILE" --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=64 --size=2G --io_size=512M \
    --direct=0 --end_fsync=1 --minimal

sync
sleep 3

echo "-----start test-----------------"
time fio --name=tst --filename="$TEST_FILE" --rw=write --bs=64M \
    --ioengine=libaio --iodepth=16 --size=2G --direct=1
```

---
patch: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch
Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path

---
 fs/ocfs2/aops.c    | 31 +++++++++++++++++--------------
 fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
 fs/ocfs2/journal.h |  5 +++--
 3 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..d9be765662dc 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
 	int ret = 0, credits = 0;
+	/* do the same job of jbd2_max_user_trans_buffers() */
+	journal_t *journal = osb->journal->j_journal;
+	int jbd2_max = journal->j_max_transaction_buffers -
+			journal->j_transaction_overhead_buffers;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		goto out;
 	}
 
-	/* Delete orphan before acquire i_rwsem. */
-	if (dwc->dw_orphaned) {
-		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		end = end > i_size_read(inode) ? end : 0;
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
-				!!end, end);
-		if (ret < 0)
-			mlog_errno(ret);
-	}
-
 	down_write(&oi->ip_alloc_sem);
 	di = (struct ocfs2_dinode *)di_bh->b_data;
 
@@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	}
 
 	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
-		ret = ocfs2_assure_trans_credits(handle, credits);
-		if (ret < 0) {
+		ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
+		if (ret == 1) {
+			goto unlock;
+		} else if (ret < 0) {
 			mlog_errno(ret);
 			break;
 		}
@@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	ocfs2_commit_trans(osb, handle);
 unlock:
 	up_write(&oi->ip_alloc_sem);
+
+	if (dwc->dw_orphaned) {
+		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		if (ret < 0)
+			mlog_errno(ret);
+	}
+
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 out:
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e5f58ff2175f..57ad69d19494 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
  * credits. Similar notes regarding data consistency and locking implications
  * as for ocfs2_extend_trans() apply here.
  */
-int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
+int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
+				struct buffer_head *bh, int nblocks, int jbd2_max)
 {
+	handle_t *handle = *i_handle;
 	int old_nblks = jbd2_handle_buffer_credits(handle);
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	int ret, wanted;
+	int plan = nblocks << 1; /* Double the credits to prevent boundary issues */
 
 	trace_ocfs2_assure_trans_credits(old_nblks);
-	if (old_nblks >= nblocks)
-		return 0;
-	return ocfs2_extend_trans(handle, nblocks - old_nblks);
+
+	wanted = old_nblks + plan;
+
+	if (wanted < jbd2_max) {
+		if (old_nblks < nblocks)
+			return ocfs2_extend_trans(handle, nblocks - old_nblks);
+		else
+			return 0;
+	}
+
+	ocfs2_commit_trans(osb, handle);
+	handle = ocfs2_start_trans(osb, plan);
+	if (IS_ERR(handle)) {
+		mlog_errno(PTR_ERR(handle));
+		return 1; /* caller must not commit trans for this error */
+	}
+
+	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
+			OCFS2_JOURNAL_ACCESS_WRITE);
+	if (ret)
+		mlog_errno(ret);
+
+	*i_handle = handle;
+	return ret;
 }
 
 /*
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 6397170f302f..08a76ffb870a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -244,8 +244,9 @@ handle_t		    *ocfs2_start_trans(struct ocfs2_super *osb,
 int			     ocfs2_commit_trans(struct ocfs2_super *osb,
 						handle_t *handle);
 int			     ocfs2_extend_trans(handle_t *handle, int nblocks);
-int			     ocfs2_assure_trans_credits(handle_t *handle,
-						int nblocks);
+int			     ocfs2_assure_trans_credits(struct inode *inode,
+					handle_t **i_handle, struct buffer_head *bh,
+					int nblocks, int jbd2_max);
 int			     ocfs2_allocate_extend_trans(handle_t *handle,
 						int thresh);
  
--
2.43.0

---------- 
patch: 0001-ocfs2-split-trans-in-end-dio-to-avoid-credit-exhaust.patch
Subject: [PATCH] ocfs2: split trans in end dio to avoid credit exhaustion

---
 fs/ocfs2/aops.c | 77 +++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..9053f1ee587a 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
 	int ret = 0, credits = 0;
+	int cnt = 0, trans_start = 0, mod = 500;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2295,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		goto out;
 	}
 
-	/* Delete orphan before acquire i_rwsem. */
-	if (dwc->dw_orphaned) {
-		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		end = end > i_size_read(inode) ? end : 0;
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
-				!!end, end);
-		if (ret < 0)
-			mlog_errno(ret);
-	}
-
 	down_write(&oi->ip_alloc_sem);
 	di = (struct ocfs2_dinode *)di_bh->b_data;
 
@@ -2327,44 +2316,68 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 
 	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
 
-	handle = ocfs2_start_trans(osb, credits);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		mlog_errno(ret);
-		goto unlock;
-	}
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto commit;
-	}
-
 	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
-		ret = ocfs2_assure_trans_credits(handle, credits);
-		if (ret < 0) {
-			mlog_errno(ret);
-			break;
+		if ((cnt++ % mod) == 0) {
+			trans_start = 1;
+			handle = ocfs2_start_trans(osb, credits);
+			if (IS_ERR(handle)) {
+				ret = PTR_ERR(handle);
+				mlog_errno(ret);
+				goto unlock;
+			}
+			ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+					OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto unlock;
+			}
 		}
 		ret = ocfs2_mark_extent_written(inode, &et, handle,
 						ue->ue_cpos, 1,
 						ue->ue_phys,
 						meta_ac, &dealloc);
+		if ((cnt % mod) == 0) {
+			ocfs2_commit_trans(osb, handle);
+			if (ret < 0) {
+				mlog_errno(ret);
+				break;
+			}
+			trans_start = 0;
+		}
+	}
+
+	//commit if above loop doesn't do
+	if (trans_start) {
+		ocfs2_commit_trans(osb, handle);
 		if (ret < 0) {
 			mlog_errno(ret);
-			break;
 		}
 	}
 
 	if (end > i_size_read(inode)) {
+		handle = ocfs2_start_trans(osb, credits);
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
+			mlog_errno(ret);
+			goto unlock;
+		}
 		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
 		if (ret < 0)
 			mlog_errno(ret);
+		ocfs2_commit_trans(osb, handle);
 	}
-commit:
-	ocfs2_commit_trans(osb, handle);
+
 unlock:
 	up_write(&oi->ip_alloc_sem);
+
+	/* everything looks good, let's start the cleanup */
+	if (dwc->dw_orphaned) {
+		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		if (ret < 0)
+			mlog_errno(ret);
+	}
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 out:
--
2.43.0


Thanks,
Heming

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-18  6:40         ` Heming Zhao
@ 2026-03-19  1:53           ` Joseph Qi
  2026-03-24  7:18             ` Heming Zhao
  2026-03-25  3:31             ` Heming Zhao
  2026-03-24  6:00           ` Heming Zhao
  1 sibling, 2 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-19  1:53 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su



On 3/18/26 2:40 PM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>>
>>
>> On 3/13/26 11:17 AM, Heming Zhao wrote:
>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>>> Almost looks fine, minor updates below.
>>>>
>>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>>> ocfs2_dio_end_io_write
>>>>>  ocfs2_mark_extent_written
>>>>>   ocfs2_change_extent_flag
>>>>>    ocfs2_split_extent
>>>>>     ocfs2_try_to_merge_extent
>>>>>      ocfs2_extend_rotate_transaction
>>>>>       ocfs2_extend_trans
>>>>>        jbd2__journal_restart
>>>>>         start_this_handle
>>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>>
>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>>> handle each extent in a separate transaction.
>>>>>
>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>>> only be removed from the orphan list after the extent tree update is complete.
>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>>> won't leave stale blocks beyond EOF.
>>>>>
>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>>> insufficient transaction credits").
>>>>>
>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>>
>>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>>> ---
>>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index 09146b43d1f0..91997b330d39 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>  		goto out;
>>>>>  	}
>>>>>  
>>>>> -	/* Delete orphan before acquire i_rwsem. */
>>>>> -	if (dwc->dw_orphaned) {
>>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>>> -
>>>>> -		end = end > i_size_read(inode) ? end : 0;
>>>>> -
>>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>>> -				!!end, end);
>>>>> -		if (ret < 0)
>>>>> -			mlog_errno(ret);
>>>>> -	}
>>>>> -
>>>>>  	down_write(&oi->ip_alloc_sem);
>>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>>  
>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>  
>>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>>  
>>>>> -	handle = ocfs2_start_trans(osb, credits);
>>>>> -	if (IS_ERR(handle)) {
>>>>> -		ret = PTR_ERR(handle);
>>>>> -		mlog_errno(ret);
>>>>> -		goto unlock;
>>>>> -	}
>>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> -	if (ret) {
>>>>> -		mlog_errno(ret);
>>>>> -		goto commit;
>>>>> -	}
>>>>> -
>>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
>>>>> -		if (ret < 0) {
>>>>> +		handle = ocfs2_start_trans(osb, credits);
>>>>> +		if (IS_ERR(handle)) {
>>>>> +			ret = PTR_ERR(handle);
>>>>>  			mlog_errno(ret);
>>>>>  			break;
>>>>
>>>> I'd rather goto unlock directly without update i_size in case error.
>>>
>>> agree
>>>>
>>>>>  		}
>>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> +		if (ret) {
>>>>> +			mlog_errno(ret);
>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>> +			break;
>>>>
>>>> Ditto.
>>>
>>> agree
>>>>
>>>>> +		}
>>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>>  						ue->ue_cpos, 1,
>>>>>  						ue->ue_phys,
>>>>>  						meta_ac, &dealloc);
>>>>>  		if (ret < 0) {
>>>>>  			mlog_errno(ret);
>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>>  			break;
>>>>
>>>> Ditto.
>>>
>>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>>> returns an error. I am not certain whether updating i_size in this case is
>>> correct, but I prefer to maintain the original logic for now.
>>> Does that seem reasonable to you?
>>>
>>
>> Since it returns 0 for unwritten extents, it looks fine.
>>
>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>>>> to save the extra call in case error.
>>>
>>> agree
>>>
>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
>>> is about 12107 ~ 12457.
>>> Regarding the transaction splitting in the unwritten handling loop: this patch
>>> introduces a minor performance regression. I would like to merge some of the
>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
>>> What do you think of this approach?
>>>
>>
>> Seems we can limit max credit for a single transaction here?
>> If exceeds, restart a new one.
>>
>> Joseph,
>> Thanks
> 
> sorry for the late reply, I spend some time running the tests.
> I have created two different types of patches (see below) to address this issue.
> 
> time consumption results:
> 
> ocfs2-dynamic-... patch
> real    2m9.900s
> user    0m0.333s
> sys     0m22.687s
> 
> %200 of ocfs2-split-trans-... patch
> real    1m48.901s
> user    0m0.306s
> sys     0m23.019s
> 
> %500 of ocfs2-split-trans-... patch
> real    1m49.350s
> user    0m0.299s
> sys     0m22.718s
> 
> As shown above, the fixed "500% mode" is faster than the dynamic style.
> Which approach do you prefer?
> 
I'd prefer solution B, it looks simpler to directly set a commit batch.
Some comments for the formal patch:
1. I don't think you have to use 'cnt + trans_start + mod' to control
the batch commit, just check 'batch + handle' seems enough.
2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
3. Do not update i_size in case error.
4. Correctly handle commit transaction in case error.

Thanks,
Joseph



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-18  6:40         ` Heming Zhao
  2026-03-19  1:53           ` Joseph Qi
@ 2026-03-24  6:00           ` Heming Zhao
  1 sibling, 0 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-24  6:00 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su

ping...

On Wed, Mar 18, 2026 at 02:40:45PM +0800, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> > 
> > 
> > On 3/13/26 11:17 AM, Heming Zhao wrote:
> > > On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> > >> Almost looks fine, minor updates below.
> > >>
> > >> On 3/13/26 12:27 AM, Heming Zhao wrote:
> > >>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > >>> ocfs2_dio_end_io_write
> > >>>  ocfs2_mark_extent_written
> > >>>   ocfs2_change_extent_flag
> > >>>    ocfs2_split_extent
> > >>>     ocfs2_try_to_merge_extent
> > >>>      ocfs2_extend_rotate_transaction
> > >>>       ocfs2_extend_trans
> > >>>        jbd2__journal_restart
> > >>>         start_this_handle
> > >>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> > >>>
> > >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> > >>> handle each extent in a separate transaction.
> > >>>
> > >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > >>> only be removed from the orphan list after the extent tree update is complete.
> > >>> this ensures that if a crash occurs in the middle of extent tree updates, we
> > >>> won't leave stale blocks beyond EOF.
> > >>>
> > >>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> > >>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> > >>> insufficient transaction credits").
> > >>>
> > >>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> > >>>
> > >>> Suggested-by: Jan Kara <jack@suse.cz>
> > >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > >>> ---
> > >>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> > >>>  1 file changed, 29 insertions(+), 29 deletions(-)
> > >>>
> > >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > >>> index 09146b43d1f0..91997b330d39 100644
> > >>> --- a/fs/ocfs2/aops.c
> > >>> +++ b/fs/ocfs2/aops.c
> > >>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > >>>  		goto out;
> > >>>  	}
> > >>>  
> > >>> -	/* Delete orphan before acquire i_rwsem. */
> > >>> -	if (dwc->dw_orphaned) {
> > >>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > >>> -
> > >>> -		end = end > i_size_read(inode) ? end : 0;
> > >>> -
> > >>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> > >>> -				!!end, end);
> > >>> -		if (ret < 0)
> > >>> -			mlog_errno(ret);
> > >>> -	}
> > >>> -
> > >>>  	down_write(&oi->ip_alloc_sem);
> > >>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
> > >>>  
> > >>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > >>>  
> > >>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> > >>>  
> > >>> -	handle = ocfs2_start_trans(osb, credits);
> > >>> -	if (IS_ERR(handle)) {
> > >>> -		ret = PTR_ERR(handle);
> > >>> -		mlog_errno(ret);
> > >>> -		goto unlock;
> > >>> -	}
> > >>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > >>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> > >>> -	if (ret) {
> > >>> -		mlog_errno(ret);
> > >>> -		goto commit;
> > >>> -	}
> > >>> -
> > >>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> > >>> -		ret = ocfs2_assure_trans_credits(handle, credits);
> > >>> -		if (ret < 0) {
> > >>> +		handle = ocfs2_start_trans(osb, credits);
> > >>> +		if (IS_ERR(handle)) {
> > >>> +			ret = PTR_ERR(handle);
> > >>>  			mlog_errno(ret);
> > >>>  			break;
> > >>
> > >> I'd rather goto unlock directly without update i_size in case error.
> > > 
> > > agree
> > >>
> > >>>  		}
> > >>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > >>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
> > >>> +		if (ret) {
> > >>> +			mlog_errno(ret);
> > >>> +			ocfs2_commit_trans(osb, handle);
> > >>> +			break;
> > >>
> > >> Ditto.
> > > 
> > > agree
> > >>
> > >>> +		}
> > >>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
> > >>>  						ue->ue_cpos, 1,
> > >>>  						ue->ue_phys,
> > >>>  						meta_ac, &dealloc);
> > >>>  		if (ret < 0) {
> > >>>  			mlog_errno(ret);
> > >>> +			ocfs2_commit_trans(osb, handle);
> > >>>  			break;
> > >>
> > >> Ditto.
> > > 
> > > The existing code still updates i_size even if ocfs2_mark_extent_written()
> > > returns an error. I am not certain whether updating i_size in this case is
> > > correct, but I prefer to maintain the original logic for now.
> > > Does that seem reasonable to you?
> > > 
> > 
> > Since it returns 0 for unwritten extents, it looks fine.
> > 
> > >> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> > >> to save the extra call in case error.
> > > 
> > > agree
> > > 
> > > I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> > > is about 12107 ~ 12457.
> > > Regarding the transaction splitting in the unwritten handling loop: this patch
> > > introduces a minor performance regression. I would like to merge some of the
> > > transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> > > What do you think of this approach?
> > > 
> > 
> > Seems we can limit max credit for a single transaction here?
> > If exceeds, restart a new one.
> > 
> > Joseph,
> > Thanks
> 
> sorry for the late reply, I spend some time running the tests.
> I have created two different types of patches (see below) to address this issue.
> 
> time consumption results:
> 
> ocfs2-dynamic-... patch
> real    2m9.900s
> user    0m0.333s
> sys     0m22.687s
> 
> %200 of ocfs2-split-trans-... patch
> real    1m48.901s
> user    0m0.306s
> sys     0m23.019s
> 
> %500 of ocfs2-split-trans-... patch
> real    1m49.350s
> user    0m0.299s
> sys     0m22.718s
> 
> As shown above, the fixed "500% mode" is faster than the dynamic style.
> Which approach do you prefer?
> 
> ---
> the test script:
> ```
> #!/bin/bash
> # thanks Wolfgang
> 
> DISK="vdc"
> MOUNTPOINT="/mnt/ocfs2"
> TEST_FILE="$MOUNTPOINT/test.bin"
> 
> if lsblk | grep "$DISK.*$MOUNTPOINT"; then
>     umount $MOUNTPOINT
> fi
> echo "------mkfs & mount----------------"
> mkfs.ocfs2 -b 4K -C 4K --cluster-stack=pcmk -N 2 /dev/$DISK --cluster-name=hacluster
> mount.ocfs2 -t ocfs2 /dev/$DISK $MOUNTPOINT
> 
> sleep 1
> echo "------fallocate----------------"
> fallocate -l 2G "$TEST_FILE"
> 
> sleep 1
> echo "------setup file----------------"
> fio --name=setup --filename="$TEST_FILE" --rw=randwrite --bs=4k \
>     --ioengine=libaio --iodepth=64 --size=2G --io_size=512M \
>     --direct=0 --end_fsync=1 --minimal
> 
> sync
> sleep 3
> 
> echo "-----start test-----------------"
> time fio --name=tst --filename="$TEST_FILE" --rw=write --bs=64M \
>     --ioengine=libaio --iodepth=16 --size=2G --direct=1
> ```
> 
> ---
> patch: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch
> Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path
> 
> ---
>  fs/ocfs2/aops.c    | 31 +++++++++++++++++--------------
>  fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
>  fs/ocfs2/journal.h |  5 +++--
>  3 files changed, 50 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 7a65d5a36a3e..d9be765662dc 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	handle_t *handle = NULL;
>  	loff_t end = offset + bytes;
>  	int ret = 0, credits = 0;
> +	/* do the same job of jbd2_max_user_trans_buffers() */
> +	journal_t *journal = osb->journal->j_journal;
> +	int jbd2_max = journal->j_max_transaction_buffers -
> +			journal->j_transaction_overhead_buffers;
>  
>  	ocfs2_init_dealloc_ctxt(&dealloc);
>  
> @@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  		goto out;
>  	}
>  
> -	/* Delete orphan before acquire i_rwsem. */
> -	if (dwc->dw_orphaned) {
> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> -		end = end > i_size_read(inode) ? end : 0;
> -
> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> -				!!end, end);
> -		if (ret < 0)
> -			mlog_errno(ret);
> -	}
> -
>  	down_write(&oi->ip_alloc_sem);
>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>  
> @@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	}
>  
>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> -		ret = ocfs2_assure_trans_credits(handle, credits);
> -		if (ret < 0) {
> +		ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
> +		if (ret == 1) {
> +			goto unlock;
> +		} else if (ret < 0) {
>  			mlog_errno(ret);
>  			break;
>  		}
> @@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	ocfs2_commit_trans(osb, handle);
>  unlock:
>  	up_write(&oi->ip_alloc_sem);
> +
> +	if (dwc->dw_orphaned) {
> +		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> +		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> +		if (ret < 0)
> +			mlog_errno(ret);
> +	}
> +
>  	ocfs2_inode_unlock(inode, 1);
>  	brelse(di_bh);
>  out:
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index e5f58ff2175f..57ad69d19494 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
>   * credits. Similar notes regarding data consistency and locking implications
>   * as for ocfs2_extend_trans() apply here.
>   */
> -int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
> +int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
> +				struct buffer_head *bh, int nblocks, int jbd2_max)
>  {
> +	handle_t *handle = *i_handle;
>  	int old_nblks = jbd2_handle_buffer_credits(handle);
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	int ret, wanted;
> +	int plan = nblocks << 1; /* Double the credits to prevent boundary issues */
>  
>  	trace_ocfs2_assure_trans_credits(old_nblks);
> -	if (old_nblks >= nblocks)
> -		return 0;
> -	return ocfs2_extend_trans(handle, nblocks - old_nblks);
> +
> +	wanted = old_nblks + plan;
> +
> +	if (wanted < jbd2_max) {
> +		if (old_nblks < nblocks)
> +			return ocfs2_extend_trans(handle, nblocks - old_nblks);
> +		else
> +			return 0;
> +	}
> +
> +	ocfs2_commit_trans(osb, handle);
> +	handle = ocfs2_start_trans(osb, plan);
> +	if (IS_ERR(handle)) {
> +		mlog_errno(PTR_ERR(handle));
> +		return 1; /* caller must not commit trans for this error */
> +	}
> +
> +	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
> +			OCFS2_JOURNAL_ACCESS_WRITE);
> +	if (ret)
> +		mlog_errno(ret);
> +
> +	*i_handle = handle;
> +	return ret;
>  }
>  
>  /*
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 6397170f302f..08a76ffb870a 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -244,8 +244,9 @@ handle_t		    *ocfs2_start_trans(struct ocfs2_super *osb,
>  int			     ocfs2_commit_trans(struct ocfs2_super *osb,
>  						handle_t *handle);
>  int			     ocfs2_extend_trans(handle_t *handle, int nblocks);
> -int			     ocfs2_assure_trans_credits(handle_t *handle,
> -						int nblocks);
> +int			     ocfs2_assure_trans_credits(struct inode *inode,
> +					handle_t **i_handle, struct buffer_head *bh,
> +					int nblocks, int jbd2_max);
>  int			     ocfs2_allocate_extend_trans(handle_t *handle,
>  						int thresh);
>   
> --
> 2.43.0
> 
> ---------- 
> patch: 0001-ocfs2-split-trans-in-end-dio-to-avoid-credit-exhaust.patch
> Subject: [PATCH] ocfs2: split trans in end dio to avoid credit exhaustion
> 
> ---
>  fs/ocfs2/aops.c | 77 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 45 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 7a65d5a36a3e..9053f1ee587a 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2279,6 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	handle_t *handle = NULL;
>  	loff_t end = offset + bytes;
>  	int ret = 0, credits = 0;
> +	int cnt = 0, trans_start = 0, mod = 500;
>  
>  	ocfs2_init_dealloc_ctxt(&dealloc);
>  
> @@ -2295,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  		goto out;
>  	}
>  
> -	/* Delete orphan before acquire i_rwsem. */
> -	if (dwc->dw_orphaned) {
> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> -		end = end > i_size_read(inode) ? end : 0;
> -
> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> -				!!end, end);
> -		if (ret < 0)
> -			mlog_errno(ret);
> -	}
> -
>  	down_write(&oi->ip_alloc_sem);
>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>  
> @@ -2327,44 +2316,68 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  
>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>  
> -	handle = ocfs2_start_trans(osb, credits);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> -		mlog_errno(ret);
> -		goto unlock;
> -	}
> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> -	if (ret) {
> -		mlog_errno(ret);
> -		goto commit;
> -	}
> -
>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> -		ret = ocfs2_assure_trans_credits(handle, credits);
> -		if (ret < 0) {
> -			mlog_errno(ret);
> -			break;
> +		if ((cnt++ % mod) == 0) {
> +			trans_start = 1;
> +			handle = ocfs2_start_trans(osb, credits);
> +			if (IS_ERR(handle)) {
> +				ret = PTR_ERR(handle);
> +				mlog_errno(ret);
> +				goto unlock;
> +			}
> +			ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> +					OCFS2_JOURNAL_ACCESS_WRITE);
> +			if (ret) {
> +				mlog_errno(ret);
> +				goto unlock;
> +			}
>  		}
>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>  						ue->ue_cpos, 1,
>  						ue->ue_phys,
>  						meta_ac, &dealloc);
> +		if ((cnt % mod) == 0) {
> +			ocfs2_commit_trans(osb, handle);
> +			if (ret < 0) {
> +				mlog_errno(ret);
> +				break;
> +			}
> +			trans_start = 0;
> +		}
> +	}
> +
> +	//commit if above loop doesn't do
> +	if (trans_start) {
> +		ocfs2_commit_trans(osb, handle);
>  		if (ret < 0) {
>  			mlog_errno(ret);
> -			break;
>  		}
>  	}
>  
>  	if (end > i_size_read(inode)) {
> +		handle = ocfs2_start_trans(osb, credits);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
> +			mlog_errno(ret);
> +			goto unlock;
> +		}
>  		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
>  		if (ret < 0)
>  			mlog_errno(ret);
> +		ocfs2_commit_trans(osb, handle);
>  	}
> -commit:
> -	ocfs2_commit_trans(osb, handle);
> +
>  unlock:
>  	up_write(&oi->ip_alloc_sem);
> +
> +	/* everything looks good, let's start the cleanup */
> +	if (dwc->dw_orphaned) {
> +		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> +		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> +		if (ret < 0)
> +			mlog_errno(ret);
> +	}
>  	ocfs2_inode_unlock(inode, 1);
>  	brelse(di_bh);
>  out:
> --
> 2.43.0
> 
> 
> Thanks,
> Heming

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-19  1:53           ` Joseph Qi
@ 2026-03-24  7:18             ` Heming Zhao
  2026-03-25  3:31             ` Heming Zhao
  1 sibling, 0 replies; 13+ messages in thread
From: Heming Zhao @ 2026-03-24  7:18 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su

Sorry, I just found this mail. Please ignore my previous mail.

On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote:
> 
> 
> On 3/18/26 2:40 PM, Heming Zhao wrote:
> > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 3/13/26 11:17 AM, Heming Zhao wrote:
> >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> >>>> Almost looks fine, minor updates below.
> >>>>
> >>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
> >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> >>>>> ocfs2_dio_end_io_write
> >>>>>  ocfs2_mark_extent_written
> >>>>>   ocfs2_change_extent_flag
> >>>>>    ocfs2_split_extent
> >>>>>     ocfs2_try_to_merge_extent
> >>>>>      ocfs2_extend_rotate_transaction
> >>>>>       ocfs2_extend_trans
> >>>>>        jbd2__journal_restart
> >>>>>         start_this_handle
> >>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>>>
> >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>>>> handle each extent in a separate transaction.
> >>>>>
> >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>>>> only be removed from the orphan list after the extent tree update is complete.
> >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
> >>>>> won't leave stale blocks beyond EOF.
> >>>>>
> >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> >>>>> insufficient transaction credits").
> >>>>>
> >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >>>>>
> >>>>> Suggested-by: Jan Kara <jack@suse.cz>
> >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>>>> ---
> >>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>>>> index 09146b43d1f0..91997b330d39 100644
> >>>>> --- a/fs/ocfs2/aops.c
> >>>>> +++ b/fs/ocfs2/aops.c
> >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>>  		goto out;
> >>>>>  	}
> >>>>>  
> >>>>> -	/* Delete orphan before acquire i_rwsem. */
> >>>>> -	if (dwc->dw_orphaned) {
> >>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>>>> -
> >>>>> -		end = end > i_size_read(inode) ? end : 0;
> >>>>> -
> >>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>>>> -				!!end, end);
> >>>>> -		if (ret < 0)
> >>>>> -			mlog_errno(ret);
> >>>>> -	}
> >>>>> -
> >>>>>  	down_write(&oi->ip_alloc_sem);
> >>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>>>  
> >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>>  
> >>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>>>  
> >>>>> -	handle = ocfs2_start_trans(osb, credits);
> >>>>> -	if (IS_ERR(handle)) {
> >>>>> -		ret = PTR_ERR(handle);
> >>>>> -		mlog_errno(ret);
> >>>>> -		goto unlock;
> >>>>> -	}
> >>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> -	if (ret) {
> >>>>> -		mlog_errno(ret);
> >>>>> -		goto commit;
> >>>>> -	}
> >>>>> -
> >>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
> >>>>> -		if (ret < 0) {
> >>>>> +		handle = ocfs2_start_trans(osb, credits);
> >>>>> +		if (IS_ERR(handle)) {
> >>>>> +			ret = PTR_ERR(handle);
> >>>>>  			mlog_errno(ret);
> >>>>>  			break;
> >>>>
> >>>> I'd rather goto unlock directly without update i_size in case error.
> >>>
> >>> agree
> >>>>
> >>>>>  		}
> >>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> +		if (ret) {
> >>>>> +			mlog_errno(ret);
> >>>>> +			ocfs2_commit_trans(osb, handle);
> >>>>> +			break;
> >>>>
> >>>> Ditto.
> >>>
> >>> agree
> >>>>
> >>>>> +		}
> >>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>>>>  						ue->ue_cpos, 1,
> >>>>>  						ue->ue_phys,
> >>>>>  						meta_ac, &dealloc);
> >>>>>  		if (ret < 0) {
> >>>>>  			mlog_errno(ret);
> >>>>> +			ocfs2_commit_trans(osb, handle);
> >>>>>  			break;
> >>>>
> >>>> Ditto.
> >>>
> >>> The existing code still updates i_size even if ocfs2_mark_extent_written()
> >>> returns an error. I am not certain whether updating i_size in this case is
> >>> correct, but I prefer to maintain the original logic for now.
> >>> Does that seem reasonable to you?
> >>>
> >>
> >> Since it returns 0 for unwritten extents, it looks fine.
> >>
> >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> >>>> to save the extra call in case error.
> >>>
> >>> agree
> >>>
> >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> >>> is about 12107 ~ 12457.
> >>> Regarding the transaction splitting in the unwritten handling loop: this patch
> >>> introduces a minor performance regression. I would like to merge some of the
> >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> >>> What do you think of this approach?
> >>>
> >>
> >> Seems we can limit max credit for a single transaction here?
> >> If exceeds, restart a new one.
> >>
> >> Joseph,
> >> Thanks
> > 
> > sorry for the late reply, I spend some time running the tests.
> > I have created two different types of patches (see below) to address this issue.
> > 
> > time consumption results:
> > 
> > ocfs2-dynamic-... patch
> > real    2m9.900s
> > user    0m0.333s
> > sys     0m22.687s
> > 
> > %200 of ocfs2-split-trans-... patch
> > real    1m48.901s
> > user    0m0.306s
> > sys     0m23.019s
> > 
> > %500 of ocfs2-split-trans-... patch
> > real    1m49.350s
> > user    0m0.299s
> > sys     0m22.718s
> > 
> > As shown above, the fixed "500% mode" is faster than the dynamic style.
> > Which approach do you prefer?
> > 
> I'd prefer solution B, it looks simpler to directly set a commit batch.
> Some comments for the formal patch:
> 1. I don't think you have to use 'cnt + trans_start + mod' to control
> the batch commit, just check 'batch + handle' seems enough.
> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
> 3. Do not update i_size in case error.
> 4. Correctly handle commit transaction in case error.
> 
> Thanks,
> Joseph
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-19  1:53           ` Joseph Qi
  2026-03-24  7:18             ` Heming Zhao
@ 2026-03-25  3:31             ` Heming Zhao
  2026-03-25  6:33               ` Joseph Qi
  1 sibling, 1 reply; 13+ messages in thread
From: Heming Zhao @ 2026-03-25  3:31 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su

On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote:
> 
> 
> On 3/18/26 2:40 PM, Heming Zhao wrote:
> > On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 3/13/26 11:17 AM, Heming Zhao wrote:
> >>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
> >>>> Almost looks fine, minor updates below.
> >>>>
> >>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
> >>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> >>>>> ocfs2_dio_end_io_write
> >>>>>  ocfs2_mark_extent_written
> >>>>>   ocfs2_change_extent_flag
> >>>>>    ocfs2_split_extent
> >>>>>     ocfs2_try_to_merge_extent
> >>>>>      ocfs2_extend_rotate_transaction
> >>>>>       ocfs2_extend_trans
> >>>>>        jbd2__journal_restart
> >>>>>         start_this_handle
> >>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>>>
> >>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>>>> handle each extent in a separate transaction.
> >>>>>
> >>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>>>> only be removed from the orphan list after the extent tree update is complete.
> >>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
> >>>>> won't leave stale blocks beyond EOF.
> >>>>>
> >>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
> >>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
> >>>>> insufficient transaction credits").
> >>>>>
> >>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
> >>>>>
> >>>>> Suggested-by: Jan Kara <jack@suse.cz>
> >>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>>>> ---
> >>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
> >>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>>>> index 09146b43d1f0..91997b330d39 100644
> >>>>> --- a/fs/ocfs2/aops.c
> >>>>> +++ b/fs/ocfs2/aops.c
> >>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>>  		goto out;
> >>>>>  	}
> >>>>>  
> >>>>> -	/* Delete orphan before acquire i_rwsem. */
> >>>>> -	if (dwc->dw_orphaned) {
> >>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>>>> -
> >>>>> -		end = end > i_size_read(inode) ? end : 0;
> >>>>> -
> >>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>>>> -				!!end, end);
> >>>>> -		if (ret < 0)
> >>>>> -			mlog_errno(ret);
> >>>>> -	}
> >>>>> -
> >>>>>  	down_write(&oi->ip_alloc_sem);
> >>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>>>  
> >>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>>>  
> >>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>>>  
> >>>>> -	handle = ocfs2_start_trans(osb, credits);
> >>>>> -	if (IS_ERR(handle)) {
> >>>>> -		ret = PTR_ERR(handle);
> >>>>> -		mlog_errno(ret);
> >>>>> -		goto unlock;
> >>>>> -	}
> >>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> -	if (ret) {
> >>>>> -		mlog_errno(ret);
> >>>>> -		goto commit;
> >>>>> -	}
> >>>>> -
> >>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
> >>>>> -		if (ret < 0) {
> >>>>> +		handle = ocfs2_start_trans(osb, credits);
> >>>>> +		if (IS_ERR(handle)) {
> >>>>> +			ret = PTR_ERR(handle);
> >>>>>  			mlog_errno(ret);
> >>>>>  			break;
> >>>>
> >>>> I'd rather goto unlock directly without update i_size in case error.
> >>>
> >>> agree
> >>>>
> >>>>>  		}
> >>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
> >>>>> +		if (ret) {
> >>>>> +			mlog_errno(ret);
> >>>>> +			ocfs2_commit_trans(osb, handle);
> >>>>> +			break;
> >>>>
> >>>> Ditto.
> >>>
> >>> agree
> >>>>
> >>>>> +		}
> >>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>>>>  						ue->ue_cpos, 1,
> >>>>>  						ue->ue_phys,
> >>>>>  						meta_ac, &dealloc);
> >>>>>  		if (ret < 0) {
> >>>>>  			mlog_errno(ret);
> >>>>> +			ocfs2_commit_trans(osb, handle);
> >>>>>  			break;
> >>>>
> >>>> Ditto.
> >>>
> >>> The existing code still updates i_size even if ocfs2_mark_extent_written()
> >>> returns an error. I am not certain whether updating i_size in this case is
> >>> correct, but I prefer to maintain the original logic for now.
> >>> Does that seem reasonable to you?
> >>>
> >>
> >> Since it returns 0 for unwritten extents, it looks fine.
> >>
> >>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
> >>>> to save the extra call in case error.
> >>>
> >>> agree
> >>>
> >>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
> >>> is about 12107 ~ 12457.
> >>> Regarding the transaction splitting in the unwritten handling loop: this patch
> >>> introduces a minor performance regression. I would like to merge some of the
> >>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
> >>> What do you think of this approach?
> >>>
> >>
> >> Seems we can limit max credit for a single transaction here?
> >> If exceeds, restart a new one.
> >>
> >> Joseph,
> >> Thanks
> > 
> > sorry for the late reply, I spend some time running the tests.
> > I have created two different types of patches (see below) to address this issue.
> > 
> > time consumption results:
> > 
> > ocfs2-dynamic-... patch
> > real    2m9.900s
> > user    0m0.333s
> > sys     0m22.687s
> > 
> > %200 of ocfs2-split-trans-... patch
> > real    1m48.901s
> > user    0m0.306s
> > sys     0m23.019s
> > 
> > %500 of ocfs2-split-trans-... patch
> > real    1m49.350s
> > user    0m0.299s
> > sys     0m22.718s
> > 
> > As shown above, the fixed "500% mode" is faster than the dynamic style.
> > Which approach do you prefer?
> > 
> I'd prefer solution B, it looks simpler to directly set a commit batch.
> Some comments for the formal patch:
> 1. I don't think you have to use 'cnt + trans_start + mod' to control
> the batch commit, just check 'batch + handle' seems enough.
> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
> 3. Do not update i_size in case error.
> 4. Correctly handle commit transaction in case error.
> 
> Thanks,
> Joseph
> 

I hope I understand your point correctly.
creating and setting OCFS2_DIO_MARK_EXTENT_BATCH to 800, corresponding to the
%200 used in my previous 'cnt+trans_start+mod' patch.

the time consumption:
real    1m50.508s
user    0m0.365s
sys     0m22.829s

```
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..4e3a51e86e32 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -37,6 +37,8 @@
 #include "namei.h"
 #include "sysfile.h"
 
+#define OCFS2_DIO_MARK_EXTENT_BATCH 800
+
 static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
 				   struct buffer_head *bh_result, int create)
 {
@@ -2279,6 +2281,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
 	int ret = 0, credits = 0;
+	int trans_err = 0, wanted;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2295,18 +2298,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		goto out;
 	}
 
-	/* Delete orphan before acquire i_rwsem. */
-	if (dwc->dw_orphaned) {
-		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		end = end > i_size_read(inode) ? end : 0;
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
-				!!end, end);
-		if (ret < 0)
-			mlog_errno(ret);
-	}
-
 	down_write(&oi->ip_alloc_sem);
 	di = (struct ocfs2_dinode *)di_bh->b_data;
 
@@ -2341,11 +2332,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	}
 
 	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
-		ret = ocfs2_assure_trans_credits(handle, credits);
-		if (ret < 0) {
-			mlog_errno(ret);
-			break;
-		}
 		ret = ocfs2_mark_extent_written(inode, &et, handle,
 						ue->ue_cpos, 1,
 						ue->ue_phys,
@@ -2354,17 +2340,46 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 			mlog_errno(ret);
 			break;
 		}
-	}
 
-	if (end > i_size_read(inode)) {
-		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
-		if (ret < 0)
-			mlog_errno(ret);
+		wanted = jbd2_handle_buffer_credits(handle) + (credits << 2);
+		if (wanted > OCFS2_DIO_MARK_EXTENT_BATCH) {
+			ocfs2_commit_trans(osb, handle);
+			handle = ocfs2_start_trans(osb, credits);
+			if (IS_ERR(handle)) {
+				ret = PTR_ERR(handle);
+				mlog_errno(ret);
+				trans_err = 1;
+				break;
+			}
+			ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+					OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				break;
+			}
+		}
 	}
+
+	if (!trans_err) {
+		if (end > i_size_read(inode)) {
+			ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
+			if (ret < 0)
+				mlog_errno(ret);
+		}
 commit:
-	ocfs2_commit_trans(osb, handle);
+		ocfs2_commit_trans(osb, handle);
+	}
 unlock:
 	up_write(&oi->ip_alloc_sem);
+
+	/* everything looks good, let's start the cleanup */
+	if (dwc->dw_orphaned) {
+		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		if (ret < 0)
+			mlog_errno(ret);
+	}
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 out:
```

if this approach is acceptable, I will send v3. 

- Heming

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-25  3:31             ` Heming Zhao
@ 2026-03-25  6:33               ` Joseph Qi
  0 siblings, 0 replies; 13+ messages in thread
From: Joseph Qi @ 2026-03-25  6:33 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, Jan Kara, glass.su



On 3/25/26 11:31 AM, Heming Zhao wrote:
> On Thu, Mar 19, 2026 at 09:53:16AM +0800, Joseph Qi wrote:
>>
>>
>> On 3/18/26 2:40 PM, Heming Zhao wrote:
>>> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>>>>
>>>>
>>>> On 3/13/26 11:17 AM, Heming Zhao wrote:
>>>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>>>>> Almost looks fine, minor updates below.
>>>>>>
>>>>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>>>>> ocfs2_dio_end_io_write
>>>>>>>  ocfs2_mark_extent_written
>>>>>>>   ocfs2_change_extent_flag
>>>>>>>    ocfs2_split_extent
>>>>>>>     ocfs2_try_to_merge_extent
>>>>>>>      ocfs2_extend_rotate_transaction
>>>>>>>       ocfs2_extend_trans
>>>>>>>        jbd2__journal_restart
>>>>>>>         start_this_handle
>>>>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>>>>
>>>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>>>>> handle each extent in a separate transaction.
>>>>>>>
>>>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>>>>> only be removed from the orphan list after the extent tree update is complete.
>>>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>>>>> won't leave stale blocks beyond EOF.
>>>>>>>
>>>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>>>>> insufficient transaction credits").
>>>>>>>
>>>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>>>>
>>>>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>>>>> ---
>>>>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>>> index 09146b43d1f0..91997b330d39 100644
>>>>>>> --- a/fs/ocfs2/aops.c
>>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>>>  		goto out;
>>>>>>>  	}
>>>>>>>  
>>>>>>> -	/* Delete orphan before acquire i_rwsem. */
>>>>>>> -	if (dwc->dw_orphaned) {
>>>>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>>>>> -
>>>>>>> -		end = end > i_size_read(inode) ? end : 0;
>>>>>>> -
>>>>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>>>>> -				!!end, end);
>>>>>>> -		if (ret < 0)
>>>>>>> -			mlog_errno(ret);
>>>>>>> -	}
>>>>>>> -
>>>>>>>  	down_write(&oi->ip_alloc_sem);
>>>>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>>>>  
>>>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>>>  
>>>>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>>>>  
>>>>>>> -	handle = ocfs2_start_trans(osb, credits);
>>>>>>> -	if (IS_ERR(handle)) {
>>>>>>> -		ret = PTR_ERR(handle);
>>>>>>> -		mlog_errno(ret);
>>>>>>> -		goto unlock;
>>>>>>> -	}
>>>>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>>>> -	if (ret) {
>>>>>>> -		mlog_errno(ret);
>>>>>>> -		goto commit;
>>>>>>> -	}
>>>>>>> -
>>>>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
>>>>>>> -		if (ret < 0) {
>>>>>>> +		handle = ocfs2_start_trans(osb, credits);
>>>>>>> +		if (IS_ERR(handle)) {
>>>>>>> +			ret = PTR_ERR(handle);
>>>>>>>  			mlog_errno(ret);
>>>>>>>  			break;
>>>>>>
>>>>>> I'd rather goto unlock directly without update i_size in case error.
>>>>>
>>>>> agree
>>>>>>
>>>>>>>  		}
>>>>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>>>> +		if (ret) {
>>>>>>> +			mlog_errno(ret);
>>>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>>>> +			break;
>>>>>>
>>>>>> Ditto.
>>>>>
>>>>> agree
>>>>>>
>>>>>>> +		}
>>>>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>>>>  						ue->ue_cpos, 1,
>>>>>>>  						ue->ue_phys,
>>>>>>>  						meta_ac, &dealloc);
>>>>>>>  		if (ret < 0) {
>>>>>>>  			mlog_errno(ret);
>>>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>>>>  			break;
>>>>>>
>>>>>> Ditto.
>>>>>
>>>>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>>>>> returns an error. I am not certain whether updating i_size in this case is
>>>>> correct, but I prefer to maintain the original logic for now.
>>>>> Does that seem reasonable to you?
>>>>>
>>>>
>>>> Since it returns 0 for unwritten extents, it looks fine.
>>>>
>>>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>>>>>> to save the extra call in case error.
>>>>>
>>>>> agree
>>>>>
>>>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
>>>>> is about 12107 ~ 12457.
>>>>> Regarding the transaction splitting in the unwritten handling loop: this patch
>>>>> introduces a minor performance regression. I would like to merge some of the
>>>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
>>>>> What do you think of this approach?
>>>>>
>>>>
>>>> Seems we can limit max credit for a single transaction here?
>>>> If exceeds, restart a new one.
>>>>
>>>> Joseph,
>>>> Thanks
>>>
>>> sorry for the late reply, I spend some time running the tests.
>>> I have created two different types of patches (see below) to address this issue.
>>>
>>> time consumption results:
>>>
>>> ocfs2-dynamic-... patch
>>> real    2m9.900s
>>> user    0m0.333s
>>> sys     0m22.687s
>>>
>>> %200 of ocfs2-split-trans-... patch
>>> real    1m48.901s
>>> user    0m0.306s
>>> sys     0m23.019s
>>>
>>> %500 of ocfs2-split-trans-... patch
>>> real    1m49.350s
>>> user    0m0.299s
>>> sys     0m22.718s
>>>
>>> As shown above, the fixed "500% mode" is faster than the dynamic style.
>>> Which approach do you prefer?
>>>
>> I'd prefer solution B, it looks simpler to directly set a commit batch.
>> Some comments for the formal patch:
>> 1. I don't think you have to use 'cnt + trans_start + mod' to control
>> the batch commit, just check 'batch + handle' seems enough.
>> 2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
>> 3. Do not update i_size in case error.
>> 4. Correctly handle commit transaction in case error.
>>
>> Thanks,
>> Joseph
>>
> 
> I hope I understand your point correctly.
> creating and setting OCFS2_DIO_MARK_EXTENT_BATCH to 800, corresponding to the
> %200 used in my previous 'cnt+trans_start+mod' patch.
> 

Seems you've proposed a new solution?

In your previous mail, I think it is:

list_for_each_entry() {
	if (!handle)
		ocfs2_start_trans();
	ocfs2_journal_access_di()
	ocfs2_mark_extent_written()
	if (batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
		ocfs2_commit_trans();
		handle = NULL;
	}
}

Joseph


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-03-25  6:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-13  2:04   ` Joseph Qi
2026-03-13  3:17     ` Heming Zhao
2026-03-13  4:53       ` Joseph Qi
2026-03-13  5:35         ` Joseph Qi
2026-03-18  6:40         ` Heming Zhao
2026-03-19  1:53           ` Joseph Qi
2026-03-24  7:18             ` Heming Zhao
2026-03-25  3:31             ` Heming Zhao
2026-03-25  6:33               ` Joseph Qi
2026-03-24  6:00           ` Heming Zhao
2026-03-13  2:27   ` Joseph Qi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox