public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
@ 2026-03-26 14:26 Heming Zhao
  2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Heming Zhao @ 2026-03-26 14:26 UTC (permalink / raw)
  To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su

For easier merging, this patch is based on Joseph's patch [1].

v3->v4:
Remove [patch 2/2] as the revert operation is incorrect.

v2->v3:
Following the discussion, use 'batch' and 'handle' to control
restarting the jbd2 transaction.

v1->v2:
following the review comments, restore the i_size update code in
ocfs2_dio_end_io_write().

the runtime of the test script [2].
real    1m49.100s
user    0m0.303s
sys     0m22.672s

[1]:
https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t 
[2]:
https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32

Heming Zhao (1):
  ocfs2: split transactions in dio completion to avoid credit exhaustion

 fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 28 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-26 14:26 [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
@ 2026-03-26 14:26 ` Heming Zhao
  2026-03-27  1:42   ` Joseph Qi
  0 siblings, 1 reply; 5+ messages in thread
From: Heming Zhao @ 2026-03-26 14:26 UTC (permalink / raw)
  To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su

During ocfs2 dio operations, JBD2 may report warnings via following call trace:
ocfs2_dio_end_io_write
 ocfs2_mark_extent_written
  ocfs2_change_extent_flag
   ocfs2_split_extent
    ocfs2_try_to_merge_extent
     ocfs2_extend_rotate_transaction
      ocfs2_extend_trans
       jbd2__journal_restart
        start_this_handle
         output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449

To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
to handle extent in a batch of transaction.

Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
only be removed from the orphan list after the extent tree update is complete.
this ensures that if a crash occurs in the middle of extent tree updates, we
won't leave stale blocks beyond EOF.

Finally, thanks to Jans and Joseph for providing the bug fix prototype and
suggestions.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 09146b43d1f0..60f1b607022f 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -37,6 +37,8 @@
 #include "namei.h"
 #include "sysfile.h"
 
+#define OCFS2_DIO_MARK_EXTENT_BATCH 200
+
 static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
 				   struct buffer_head *bh_result, int create)
 {
@@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	struct ocfs2_alloc_context *meta_ac = NULL;
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
-	int ret = 0, credits = 0;
+	int ret = 0, credits = 0, batch = 0;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		goto out;
 	}
 
-	/* Delete orphan before acquire i_rwsem. */
-	if (dwc->dw_orphaned) {
-		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		end = end > i_size_read(inode) ? end : 0;
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
-				!!end, end);
-		if (ret < 0)
-			mlog_errno(ret);
-	}
-
 	down_write(&oi->ip_alloc_sem);
 	di = (struct ocfs2_dinode *)di_bh->b_data;
 
@@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 
 	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
 
-	handle = ocfs2_start_trans(osb, credits);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		mlog_errno(ret);
-		goto unlock;
-	}
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto commit;
-	}
-
 	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
+		if (!handle) {
+			handle = ocfs2_start_trans(osb, credits);
+			if (IS_ERR(handle)) {
+				ret = PTR_ERR(handle);
+				mlog_errno(ret);
+				handle = NULL;
+				break;
+			}
+			ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+					OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				break;
+			}
+		}
 		ret = ocfs2_assure_trans_credits(handle, credits);
 		if (ret < 0) {
 			mlog_errno(ret);
@@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 			mlog_errno(ret);
 			break;
 		}
+
+		if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
+			ocfs2_commit_trans(osb, handle);
+			handle = NULL;
+			batch = 0;
+		}
 	}
 
 	if (end > i_size_read(inode)) {
+		if (!handle) {
+			handle = ocfs2_start_trans(osb, credits);
+			if (IS_ERR(handle)) {
+				ret = PTR_ERR(handle);
+				mlog_errno(ret);
+				goto unlock;
+			}
+		}
 		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
 		if (ret < 0)
 			mlog_errno(ret);
 	}
-commit:
-	ocfs2_commit_trans(osb, handle);
+	if (handle)
+		ocfs2_commit_trans(osb, handle);
+
 unlock:
 	up_write(&oi->ip_alloc_sem);
+
+	/* everything looks good, let's start the cleanup */
+	if (dwc->dw_orphaned) {
+		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		if (ret < 0)
+			mlog_errno(ret);
+	}
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 out:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao
@ 2026-03-27  1:42   ` Joseph Qi
  2026-03-27  3:02     ` Heming Zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Joseph Qi @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara

Hi,

On 3/26/26 10:26 PM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
>  ocfs2_mark_extent_written
>   ocfs2_change_extent_flag
>    ocfs2_split_extent
>     ocfs2_try_to_merge_extent
>      ocfs2_extend_rotate_transaction
>       ocfs2_extend_trans
>        jbd2__journal_restart
>         start_this_handle
>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> 
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
> to handle extent in a batch of transaction.
> 
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
> 
> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> suggestions.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 44 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..60f1b607022f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -37,6 +37,8 @@
>  #include "namei.h"
>  #include "sysfile.h"
>  
> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> +
>  static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
>  				   struct buffer_head *bh_result, int create)
>  {
> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  	struct ocfs2_alloc_context *meta_ac = NULL;
>  	handle_t *handle = NULL;
>  	loff_t end = offset + bytes;
> -	int ret = 0, credits = 0;
> +	int ret = 0, credits = 0, batch = 0;
>  
>  	ocfs2_init_dealloc_ctxt(&dealloc);
>  
> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  		goto out;
>  	}
>  
> -	/* Delete orphan before acquire i_rwsem. */
> -	if (dwc->dw_orphaned) {
> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> -		end = end > i_size_read(inode) ? end : 0;
> -
> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> -				!!end, end);
> -		if (ret < 0)
> -			mlog_errno(ret);
> -	}
> -
>  	down_write(&oi->ip_alloc_sem);
>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>  
> @@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  
>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>  
> -	handle = ocfs2_start_trans(osb, credits);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> -		mlog_errno(ret);
> -		goto unlock;
> -	}
> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> -				      OCFS2_JOURNAL_ACCESS_WRITE);
> -	if (ret) {
> -		mlog_errno(ret);
> -		goto commit;
> -	}
> -
>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> +		if (!handle) {
> +			handle = ocfs2_start_trans(osb, credits);
> +			if (IS_ERR(handle)) {
> +				ret = PTR_ERR(handle);
> +				mlog_errno(ret);
> +				handle = NULL;
> +				break;
> +			}
> +			ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> +					OCFS2_JOURNAL_ACCESS_WRITE);
> +			if (ret) {
> +				mlog_errno(ret);
> +				break;
> +			}
> +		}
>  		ret = ocfs2_assure_trans_credits(handle, credits);
>  		if (ret < 0) {
>  			mlog_errno(ret);
> @@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>  			mlog_errno(ret);
>  			break;
>  		}
> +
> +		if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> +			ocfs2_commit_trans(osb, handle);
> +			handle = NULL;
> +			batch = 0;
> +		}
>  	}
>  
>  	if (end > i_size_read(inode)) {

I still don't think it is a good idea to update inode size in case error.
The original logic behaves inconsistent, if ocfs2_start_trans() and
ocfs2_journal_access_di() fails, it won't update inode size, but if
ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
So let's make it behave consistent by both checking 'ret' here.

Other looks fine.

Joseph


> +		if (!handle) {
> +			handle = ocfs2_start_trans(osb, credits);
> +			if (IS_ERR(handle)) {
> +				ret = PTR_ERR(handle);
> +				mlog_errno(ret);
> +				goto unlock;
> +			}
> +		}
>  		ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
>  		if (ret < 0)
>  			mlog_errno(ret);
>  	}
> -commit:> -	ocfs2_commit_trans(osb, handle);
> +	if (handle)
> +		ocfs2_commit_trans(osb, handle);
> +
>  unlock:
>  	up_write(&oi->ip_alloc_sem);
> +
> +	/* everything looks good, let's start the cleanup */
> +	if (dwc->dw_orphaned) {
> +		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> +		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> +		if (ret < 0)
> +			mlog_errno(ret);
> +	}
>  	ocfs2_inode_unlock(inode, 1);
>  	brelse(di_bh);
>  out:


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-27  1:42   ` Joseph Qi
@ 2026-03-27  3:02     ` Heming Zhao
  2026-03-27  3:12       ` Joseph Qi
  0 siblings, 1 reply; 5+ messages in thread
From: Heming Zhao @ 2026-03-27  3:02 UTC (permalink / raw)
  To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara

On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote:
> Hi,
> 
> On 3/26/26 10:26 PM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > ocfs2_dio_end_io_write
> >  ocfs2_mark_extent_written
> >   ocfs2_change_extent_flag
> >    ocfs2_split_extent
> >     ocfs2_try_to_merge_extent
> >      ocfs2_extend_rotate_transaction
> >       ocfs2_extend_trans
> >        jbd2__journal_restart
> >         start_this_handle
> >          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> > 
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
> > to handle extent in a batch of transaction.
> > 
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > this ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> > 
> > Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> > suggestions.
> > 
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
> >  1 file changed, 44 insertions(+), 28 deletions(-)
> > 
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..60f1b607022f 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -37,6 +37,8 @@
> >  #include "namei.h"
> > ... ... <--- snip --->
> > +		if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> > +			ocfs2_commit_trans(osb, handle);
> > +			handle = NULL;
> > +			batch = 0;
> > +		}
> >  	}
> >  
> >  	if (end > i_size_read(inode)) {
> 
> I still don't think it is a good idea to update inode size in case error.
> The original logic behaves inconsistent, if ocfs2_start_trans() and
> ocfs2_journal_access_di() fails, it won't update inode size, but if
> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
> So let's make it behave consistent by both checking 'ret' here.
> 
> Other looks fine.
> 
> Joseph

After sending the v4 patch, I did some tests and identified the key of the
performance degradation in the dynamic credit adjustment patch.
I found that by using half of the jbd2_max value, the time consumption returned
to the same level as the "direct loop" style. 

key code change:
(base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch)
        int jbd2_max = (journal->j_max_transaction_buffers -
                        journal->j_transaction_overhead_buffers) >> 2;

my test env:
- 10G ocfs2 volume
- DIO 2G file, write with 64MB block (--bs=64M).

observations:
- ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block:
  - using jbd2_max (5449):  ~9 times.
  - using half value (2724): ~20 times.
- v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times

time consumption:

jbd2_max:5449 case
real    2m9.346s
user    0m0.277s
sys     0m22.748s

half jbd2_max:2724 case
real    1m49.400s                                                                              
user    0m0.343s                                                                               
sys     0m22.734s

v4 patch:
real    1m49.650s
user    0m0.337s
sys     0m22.780s


the reason (I guess):
The original patch only performed a "commit and start new trans" when jbd2
credits were nearly exhausted. I suspect contention between the commit flush
operation and the jbd2 operations in the dio end path, leading to high latency.
By using half the jbd2_max value, trigger commits more frequently, which appears
to reduce jbd2 contention and racing.

Finally, I suggest we use the dynamic credit extension patch for v5. This
version only adds logic to ocfs2_assure_trans_credits() and does not affect
i_size changes during error cases.

[1]:
searching "0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch"
in following url:
- https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32

-----
For easy of discussion, I past the full dynamic credits adjustment patch
(with half jbd2_max) below.

```
Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path

---
 fs/ocfs2/aops.c    | 31 +++++++++++++++++--------------
 fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
 fs/ocfs2/journal.h |  5 +++--
 3 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..9cdecfa0ea0c 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	handle_t *handle = NULL;
 	loff_t end = offset + bytes;
 	int ret = 0, credits = 0;
+	/* do the same job of jbd2_max_user_trans_buffers() */
+	journal_t *journal = osb->journal->j_journal;
+	int jbd2_max = (journal->j_max_transaction_buffers -
+			journal->j_transaction_overhead_buffers) >> 2;
 
 	ocfs2_init_dealloc_ctxt(&dealloc);
 
@@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 		goto out;
 	}
 
-	/* Delete orphan before acquire i_rwsem. */
-	if (dwc->dw_orphaned) {
-		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
-		end = end > i_size_read(inode) ? end : 0;
-
-		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
-				!!end, end);
-		if (ret < 0)
-			mlog_errno(ret);
-	}
-
 	down_write(&oi->ip_alloc_sem);
 	di = (struct ocfs2_dinode *)di_bh->b_data;
 
@@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	}
 
 	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
-		ret = ocfs2_assure_trans_credits(handle, credits);
-		if (ret < 0) {
+		ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
+		if (ret == 1) {
+			goto unlock;
+		} else if (ret < 0) {
 			mlog_errno(ret);
 			break;
 		}
@@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
 	ocfs2_commit_trans(osb, handle);
 unlock:
 	up_write(&oi->ip_alloc_sem);
+
+	if (dwc->dw_orphaned) {
+		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+		if (ret < 0)
+			mlog_errno(ret);
+	}
+
 	ocfs2_inode_unlock(inode, 1);
 	brelse(di_bh);
 out:
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e5f58ff2175f..57ad69d19494 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
  * credits. Similar notes regarding data consistency and locking implications
  * as for ocfs2_extend_trans() apply here.
  */
-int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
+int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
+				struct buffer_head *bh, int nblocks, int jbd2_max)
 {
+	handle_t *handle = *i_handle;
 	int old_nblks = jbd2_handle_buffer_credits(handle);
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	int ret, wanted;
+	int plan = nblocks << 1; /* Double the credits to prevent boundary issues */
 
 	trace_ocfs2_assure_trans_credits(old_nblks);
-	if (old_nblks >= nblocks)
-		return 0;
-	return ocfs2_extend_trans(handle, nblocks - old_nblks);
+
+	wanted = old_nblks + plan;
+
+	if (wanted < jbd2_max) {
+		if (old_nblks < nblocks)
+			return ocfs2_extend_trans(handle, nblocks - old_nblks);
+		else
+			return 0;
+	}
+
+	ocfs2_commit_trans(osb, handle);
+	handle = ocfs2_start_trans(osb, plan);
+	if (IS_ERR(handle)) {
+		mlog_errno(PTR_ERR(handle));
+		return 1; /* caller must not commit trans for this error */
+	}
+
+	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
+			OCFS2_JOURNAL_ACCESS_WRITE);
+	if (ret)
+		mlog_errno(ret);
+
+	*i_handle = handle;
+	return ret;
 }
 
 /*
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 6397170f302f..08a76ffb870a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -244,8 +244,9 @@ handle_t		    *ocfs2_start_trans(struct ocfs2_super *osb,
 int			     ocfs2_commit_trans(struct ocfs2_super *osb,
 						handle_t *handle);
 int			     ocfs2_extend_trans(handle_t *handle, int nblocks);
-int			     ocfs2_assure_trans_credits(handle_t *handle,
-						int nblocks);
+int			     ocfs2_assure_trans_credits(struct inode *inode,
+					handle_t **i_handle, struct buffer_head *bh,
+					int nblocks, int jbd2_max);
 int			     ocfs2_allocate_extend_trans(handle_t *handle,
 						int thresh);
 
-- 
2.43.0

```

- Heming

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
  2026-03-27  3:02     ` Heming Zhao
@ 2026-03-27  3:12       ` Joseph Qi
  0 siblings, 0 replies; 5+ messages in thread
From: Joseph Qi @ 2026-03-27  3:12 UTC (permalink / raw)
  To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara



On 3/27/26 11:02 AM, Heming Zhao wrote:
> On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote:
>> Hi,
>>
>> On 3/26/26 10:26 PM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>> ocfs2_dio_end_io_write
>>>  ocfs2_mark_extent_written
>>>   ocfs2_change_extent_flag
>>>    ocfs2_split_extent
>>>     ocfs2_try_to_merge_extent
>>>      ocfs2_extend_rotate_transaction
>>>       ocfs2_extend_trans
>>>        jbd2__journal_restart
>>>         start_this_handle
>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
>>> to handle extent in a batch of transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
>>> suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@suse.cz>
>>> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>>  fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
>>>  1 file changed, 44 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..60f1b607022f 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -37,6 +37,8 @@
>>>  #include "namei.h"
>>> ... ... <--- snip --->
>>> +		if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
>>> +			ocfs2_commit_trans(osb, handle);
>>> +			handle = NULL;
>>> +			batch = 0;
>>> +		}
>>>  	}
>>>  
>>>  	if (end > i_size_read(inode)) {
>>
>> I still don't think it is a good idea to update inode size in case error.
>> The original logic behaves inconsistent, if ocfs2_start_trans() and
>> ocfs2_journal_access_di() fails, it won't update inode size, but if
>> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
>> So let's make it behave consistent by both checking 'ret' here.
>>
>> Other looks fine.
>>
>> Joseph
> 
> After sending the v4 patch, I did some tests and identified the key of the
> performance degradation in the dynamic credit adjustment patch.
> I found that by using half of the jbd2_max value, the time consumption returned
> to the same level as the "direct loop" style. 
> 
> key code change:
> (base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch)
>         int jbd2_max = (journal->j_max_transaction_buffers -
>                         journal->j_transaction_overhead_buffers) >> 2;
> 
> my test env:
> - 10G ocfs2 volume
> - DIO 2G file, write with 64MB block (--bs=64M).
> 
> observations:
> - ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block:
>   - using jbd2_max (5449):  ~9 times.
>   - using half value (2724): ~20 times.
> - v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times
> 
> time consumption:
> 
> jbd2_max:5449 case
> real    2m9.346s
> user    0m0.277s
> sys     0m22.748s
> 
> half jbd2_max:2724 case
> real    1m49.400s                                                                              
> user    0m0.343s                                                                               
> sys     0m22.734s
> 
> v4 patch:
> real    1m49.650s
> user    0m0.337s
> sys     0m22.780s
> 
> 
> the reason (I guess):
> The original patch only performed a "commit and start new trans" when jbd2
> credits were nearly exhausted. I suspect contention between the commit flush
> operation and the jbd2 operations in the dio end path, leading to high latency.
> By using half the jbd2_max value, trigger commits more frequently, which appears
> to reduce jbd2 contention and racing.
> 
> Finally, I suggest we use the dynamic credit extension patch for v5. This
> version only adds logic to ocfs2_assure_trans_credits() and does not affect
> i_size changes during error cases.
> 
Ummm... I think v4 looks simpler and direct.
For i_size update logic, I think the existing logic looks wried. Take a
look at ext4, it also doesn't do this if converting unwritten extents
fails.

Joseph
 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-27  3:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-26 14:26 [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao
2026-03-27  1:42   ` Joseph Qi
2026-03-27  3:02     ` Heming Zhao
2026-03-27  3:12       ` Joseph Qi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox