* [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
@ 2026-03-26 14:26 Heming Zhao
2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao
0 siblings, 1 reply; 5+ messages in thread
From: Heming Zhao @ 2026-03-26 14:26 UTC (permalink / raw)
To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su
For easier merging, this patch is based on Joseph's patch [1].
v3->v4:
Remove [patch 2/2] as the revert operation is incorrect.
v2->v3:
Following the discussion, use 'batch' and 'handle' to control
restarting the jbd2 transaction.
v1->v2:
following the review comments, restore the i_size update code in
ocfs2_dio_end_io_write().
the runtime of the test script [2].
real 1m49.100s
user 0m0.303s
sys 0m22.672s
[1]:
https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t
[2]:
https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32
Heming Zhao (1):
ocfs2: split transactions in dio completion to avoid credit exhaustion
fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
1 file changed, 44 insertions(+), 28 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-26 14:26 [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
@ 2026-03-26 14:26 ` Heming Zhao
2026-03-27 1:42 ` Joseph Qi
0 siblings, 1 reply; 5+ messages in thread
From: Heming Zhao @ 2026-03-26 14:26 UTC (permalink / raw)
To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su
During ocfs2 dio operations, JBD2 may report warnings via following call trace:
ocfs2_dio_end_io_write
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
ocfs2_try_to_merge_extent
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2__journal_restart
start_this_handle
output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
to handle extent in a batch of transaction.
Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
only be removed from the orphan list after the extent tree update is complete.
this ensures that if a crash occurs in the middle of extent tree updates, we
won't leave stale blocks beyond EOF.
Finally, thanks to Jans and Joseph for providing the bug fix prototype and
suggestions.
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 09146b43d1f0..60f1b607022f 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -37,6 +37,8 @@
#include "namei.h"
#include "sysfile.h"
+#define OCFS2_DIO_MARK_EXTENT_BATCH 200
+
static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
@@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
struct ocfs2_alloc_context *meta_ac = NULL;
handle_t *handle = NULL;
loff_t end = offset + bytes;
- int ret = 0, credits = 0;
+ int ret = 0, credits = 0, batch = 0;
ocfs2_init_dealloc_ctxt(&dealloc);
@@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto unlock;
- }
- ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret) {
- mlog_errno(ret);
- goto commit;
- }
-
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
+ if (!handle) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ handle = NULL;
+ break;
+ }
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ break;
+ }
+ }
ret = ocfs2_assure_trans_credits(handle, credits);
if (ret < 0) {
mlog_errno(ret);
@@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
mlog_errno(ret);
break;
}
+
+ if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
+ ocfs2_commit_trans(osb, handle);
+ handle = NULL;
+ batch = 0;
+ }
}
if (end > i_size_read(inode)) {
+ if (!handle) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto unlock;
+ }
+ }
ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
if (ret < 0)
mlog_errno(ret);
}
-commit:
- ocfs2_commit_trans(osb, handle);
+ if (handle)
+ ocfs2_commit_trans(osb, handle);
+
unlock:
up_write(&oi->ip_alloc_sem);
+
+ /* everything looks good, let's start the cleanup */
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao
@ 2026-03-27 1:42 ` Joseph Qi
2026-03-27 3:02 ` Heming Zhao
0 siblings, 1 reply; 5+ messages in thread
From: Joseph Qi @ 2026-03-27 1:42 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
Hi,
On 3/26/26 10:26 PM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> ocfs2_dio_end_io_write
> ocfs2_mark_extent_written
> ocfs2_change_extent_flag
> ocfs2_split_extent
> ocfs2_try_to_merge_extent
> ocfs2_extend_rotate_transaction
> ocfs2_extend_trans
> jbd2__journal_restart
> start_this_handle
> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
> to handle extent in a batch of transaction.
>
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> this ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
>
> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> suggestions.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
> fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
> 1 file changed, 44 insertions(+), 28 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..60f1b607022f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -37,6 +37,8 @@
> #include "namei.h"
> #include "sysfile.h"
>
> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> +
> static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
> struct buffer_head *bh_result, int create)
> {
> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> struct ocfs2_alloc_context *meta_ac = NULL;
> handle_t *handle = NULL;
> loff_t end = offset + bytes;
> - int ret = 0, credits = 0;
> + int ret = 0, credits = 0, batch = 0;
>
> ocfs2_init_dealloc_ctxt(&dealloc);
>
> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2326,20 +2316,22 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>
> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>
> - handle = ocfs2_start_trans(osb, credits);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto unlock;
> - }
> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> - OCFS2_JOURNAL_ACCESS_WRITE);
> - if (ret) {
> - mlog_errno(ret);
> - goto commit;
> - }
> -
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> + if (!handle) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + handle = NULL;
> + break;
> + }
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret) {
> + mlog_errno(ret);
> + break;
> + }
> + }
> ret = ocfs2_assure_trans_credits(handle, credits);
> if (ret < 0) {
> mlog_errno(ret);
> @@ -2353,17 +2345,41 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> mlog_errno(ret);
> break;
> }
> +
> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> + ocfs2_commit_trans(osb, handle);
> + handle = NULL;
> + batch = 0;
> + }
> }
>
> if (end > i_size_read(inode)) {
I still don't think it is a good idea to update inode size in case error.
The original logic behaves inconsistent, if ocfs2_start_trans() and
ocfs2_journal_access_di() fails, it won't update inode size, but if
ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
So let's make it behave consistent by both checking 'ret' here.
Other looks fine.
Joseph
> + if (!handle) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> + }
> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> if (ret < 0)
> mlog_errno(ret);
> }
> -commit:> - ocfs2_commit_trans(osb, handle);
> + if (handle)
> + ocfs2_commit_trans(osb, handle);
> +
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + /* everything looks good, let's start the cleanup */
> + if (dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out:
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-27 1:42 ` Joseph Qi
@ 2026-03-27 3:02 ` Heming Zhao
2026-03-27 3:12 ` Joseph Qi
0 siblings, 1 reply; 5+ messages in thread
From: Heming Zhao @ 2026-03-27 3:02 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote:
> Hi,
>
> On 3/26/26 10:26 PM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following call trace:
> > ocfs2_dio_end_io_write
> > ocfs2_mark_extent_written
> > ocfs2_change_extent_flag
> > ocfs2_split_extent
> > ocfs2_try_to_merge_extent
> > ocfs2_extend_rotate_transaction
> > ocfs2_extend_trans
> > jbd2__journal_restart
> > start_this_handle
> > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
> > to handle extent in a batch of transaction.
> >
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > this ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> >
> > Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> > suggestions.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> > fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
> > 1 file changed, 44 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..60f1b607022f 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -37,6 +37,8 @@
> > #include "namei.h"
> > ... ... <--- snip --->
> > + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> > + ocfs2_commit_trans(osb, handle);
> > + handle = NULL;
> > + batch = 0;
> > + }
> > }
> >
> > if (end > i_size_read(inode)) {
>
> I still don't think it is a good idea to update inode size in case error.
> The original logic behaves inconsistent, if ocfs2_start_trans() and
> ocfs2_journal_access_di() fails, it won't update inode size, but if
> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
> So let's make it behave consistent by both checking 'ret' here.
>
> Other looks fine.
>
> Joseph
After sending the v4 patch, I did some tests and identified the key of the
performance degradation in the dynamic credit adjustment patch.
I found that by using half of the jbd2_max value, the time consumption returned
to the same level as the "direct loop" style.
key code change:
(base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch)
int jbd2_max = (journal->j_max_transaction_buffers -
journal->j_transaction_overhead_buffers) >> 2;
my test env:
- 10G ocfs2 volume
- DIO 2G file, write with 64MB block (--bs=64M).
observations:
- ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block:
- using jbd2_max (5449): ~9 times.
- using half value (2724): ~20 times.
- v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times
time consumption:
jbd2_max:5449 case
real 2m9.346s
user 0m0.277s
sys 0m22.748s
half jbd2_max:2724 case
real 1m49.400s
user 0m0.343s
sys 0m22.734s
v4 patch:
real 1m49.650s
user 0m0.337s
sys 0m22.780s
the reason (I guess):
The original patch only performed a "commit and start new trans" when jbd2
credits were nearly exhausted. I suspect contention between the commit flush
operation and the jbd2 operations in the dio end path, leading to high latency.
By using half the jbd2_max value, trigger commits more frequently, which appears
to reduce jbd2 contention and racing.
Finally, I suggest we use the dynamic credit extension patch for v5. This
version only adds logic to ocfs2_assure_trans_credits() and does not affect
i_size changes during error cases.
[1]:
searching "0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch"
in following url:
- https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32
-----
For easy of discussion, I past the full dynamic credits adjustment patch
(with half jbd2_max) below.
```
Subject: [PATCH] ocfs2: dynamic extending jbd2 credits during dio end path
---
fs/ocfs2/aops.c | 31 +++++++++++++++++--------------
fs/ocfs2/journal.c | 34 ++++++++++++++++++++++++++++++----
fs/ocfs2/journal.h | 5 +++--
3 files changed, 50 insertions(+), 20 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7a65d5a36a3e..9cdecfa0ea0c 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2279,6 +2279,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
handle_t *handle = NULL;
loff_t end = offset + bytes;
int ret = 0, credits = 0;
+ /* do the same job of jbd2_max_user_trans_buffers() */
+ journal_t *journal = osb->journal->j_journal;
+ int jbd2_max = (journal->j_max_transaction_buffers -
+ journal->j_transaction_overhead_buffers) >> 2;
ocfs2_init_dealloc_ctxt(&dealloc);
@@ -2295,18 +2299,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2341,8 +2333,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
}
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
- ret = ocfs2_assure_trans_credits(handle, credits);
- if (ret < 0) {
+ ret = ocfs2_assure_trans_credits(inode, &handle, di_bh, credits, jbd2_max);
+ if (ret == 1) {
+ goto unlock;
+ } else if (ret < 0) {
mlog_errno(ret);
break;
}
@@ -2365,6 +2359,15 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
ocfs2_commit_trans(osb, handle);
unlock:
up_write(&oi->ip_alloc_sem);
+
+ if (dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
+
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index e5f58ff2175f..57ad69d19494 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -474,14 +474,40 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks)
* credits. Similar notes regarding data consistency and locking implications
* as for ocfs2_extend_trans() apply here.
*/
-int ocfs2_assure_trans_credits(handle_t *handle, int nblocks)
+int ocfs2_assure_trans_credits(struct inode *inode, handle_t **i_handle,
+ struct buffer_head *bh, int nblocks, int jbd2_max)
{
+ handle_t *handle = *i_handle;
int old_nblks = jbd2_handle_buffer_credits(handle);
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ int ret, wanted;
+ int plan = nblocks << 1; /* Double the credits to prevent boundary issues */
trace_ocfs2_assure_trans_credits(old_nblks);
- if (old_nblks >= nblocks)
- return 0;
- return ocfs2_extend_trans(handle, nblocks - old_nblks);
+
+ wanted = old_nblks + plan;
+
+ if (wanted < jbd2_max) {
+ if (old_nblks < nblocks)
+ return ocfs2_extend_trans(handle, nblocks - old_nblks);
+ else
+ return 0;
+ }
+
+ ocfs2_commit_trans(osb, handle);
+ handle = ocfs2_start_trans(osb, plan);
+ if (IS_ERR(handle)) {
+ mlog_errno(PTR_ERR(handle));
+ return 1; /* caller must not commit trans for this error */
+ }
+
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret)
+ mlog_errno(ret);
+
+ *i_handle = handle;
+ return ret;
}
/*
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 6397170f302f..08a76ffb870a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -244,8 +244,9 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb,
int ocfs2_commit_trans(struct ocfs2_super *osb,
handle_t *handle);
int ocfs2_extend_trans(handle_t *handle, int nblocks);
-int ocfs2_assure_trans_credits(handle_t *handle,
- int nblocks);
+int ocfs2_assure_trans_credits(struct inode *inode,
+ handle_t **i_handle, struct buffer_head *bh,
+ int nblocks, int jbd2_max);
int ocfs2_allocate_extend_trans(handle_t *handle,
int thresh);
--
2.43.0
```
- Heming
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-03-27 3:02 ` Heming Zhao
@ 2026-03-27 3:12 ` Joseph Qi
0 siblings, 0 replies; 5+ messages in thread
From: Joseph Qi @ 2026-03-27 3:12 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
On 3/27/26 11:02 AM, Heming Zhao wrote:
> On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote:
>> Hi,
>>
>> On 3/26/26 10:26 PM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>> ocfs2_dio_end_io_write
>>> ocfs2_mark_extent_written
>>> ocfs2_change_extent_flag
>>> ocfs2_split_extent
>>> ocfs2_try_to_merge_extent
>>> ocfs2_extend_rotate_transaction
>>> ocfs2_extend_trans
>>> jbd2__journal_restart
>>> start_this_handle
>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
>>> to handle extent in a batch of transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
>>> suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@suse.cz>
>>> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>> fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 44 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..60f1b607022f 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -37,6 +37,8 @@
>>> #include "namei.h"
>>> ... ... <--- snip --->
>>> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
>>> + ocfs2_commit_trans(osb, handle);
>>> + handle = NULL;
>>> + batch = 0;
>>> + }
>>> }
>>>
>>> if (end > i_size_read(inode)) {
>>
>> I still don't think it is a good idea to update inode size in case error.
>> The original logic behaves inconsistent, if ocfs2_start_trans() and
>> ocfs2_journal_access_di() fails, it won't update inode size, but if
>> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
>> So let's make it behave consistent by both checking 'ret' here.
>>
>> Other looks fine.
>>
>> Joseph
>
> After sending the v4 patch, I did some tests and identified the key of the
> performance degradation in the dynamic credit adjustment patch.
> I found that by using half of the jbd2_max value, the time consumption returned
> to the same level as the "direct loop" style.
>
> key code change:
> (base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch)
> int jbd2_max = (journal->j_max_transaction_buffers -
> journal->j_transaction_overhead_buffers) >> 2;
>
> my test env:
> - 10G ocfs2 volume
> - DIO 2G file, write with 64MB block (--bs=64M).
>
> observations:
> - ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block:
> - using jbd2_max (5449): ~9 times.
> - using half value (2724): ~20 times.
> - v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times
>
> time consumption:
>
> jbd2_max:5449 case
> real 2m9.346s
> user 0m0.277s
> sys 0m22.748s
>
> half jbd2_max:2724 case
> real 1m49.400s
> user 0m0.343s
> sys 0m22.734s
>
> v4 patch:
> real 1m49.650s
> user 0m0.337s
> sys 0m22.780s
>
>
> the reason (I guess):
> The original patch only performed a "commit and start new trans" when jbd2
> credits were nearly exhausted. I suspect contention between the commit flush
> operation and the jbd2 operations in the dio end path, leading to high latency.
> By using half the jbd2_max value, trigger commits more frequently, which appears
> to reduce jbd2 contention and racing.
>
> Finally, I suggest we use the dynamic credit extension patch for v5. This
> version only adds logic to ocfs2_assure_trans_credits() and does not affect
> i_size changes during error cases.
>
Ummm... I think v4 looks simpler and direct.
For i_size update logic, I think the existing logic looks wried. Take a
look at ext4, it also doesn't do this if converting unwritten extents
fails.
Joseph
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-27 3:12 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-26 14:26 [PATCH v4 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-26 14:26 ` [PATCH v4 1/1] " Heming Zhao
2026-03-27 1:42 ` Joseph Qi
2026-03-27 3:02 ` Heming Zhao
2026-03-27 3:12 ` Joseph Qi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox