* [PATCH v6 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
@ 2026-04-02 2:26 Heming Zhao
2026-04-02 2:26 ` [PATCH v6 1/1] " Heming Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Heming Zhao @ 2026-04-02 2:26 UTC (permalink / raw)
To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su
For easier merging, this patch is based on Joseph's patch [1].
v5->v6:
add a check for handle before updating i_size.
v4->v5:
change update inode size and remove orphan logic.
add above change in commit log.
v3->v4:
Remove [patch 2/2] as the revert operation is incorrect.
v2->v3:
Following the discussion, use 'batch' and 'handle' to control
restarting the jbd2 transaction.
v1->v2:
following the review comments, restore the i_size update code in
ocfs2_dio_end_io_write().
the runtime of the test script [2].
real 1m48.148s
user 0m0.344s
sys 0m22.684s
[1]:
https://lore.kernel.org/ocfs2-devel/46yilbaq5z5x6gdfdpoa6lprf6sf3gbxriuku2odje4kx4bovf@jd735cphfutz/T/#t
[2]:
https://lore.kernel.org/ocfs2-devel/75f89a17-213b-42a0-a30e-d52fb2d077a6@linux.alibaba.com/T/#mbe2b5f52ee249178e1ad4c76d964de2dc818eb32
Heming Zhao (1):
ocfs2: split transactions in dio completion to avoid credit exhaustion
fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
1 file changed, 43 insertions(+), 28 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v6 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-04-02 2:26 [PATCH v6 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
@ 2026-04-02 2:26 ` Heming Zhao
2026-04-02 7:07 ` Joseph Qi
0 siblings, 1 reply; 6+ messages in thread
From: Heming Zhao @ 2026-04-02 2:26 UTC (permalink / raw)
To: joseph.qi, jack; +Cc: Heming Zhao, ocfs2-devel, linux-kernel, glass.su
During ocfs2 dio operations, JBD2 may report warnings via following
call trace:
ocfs2_dio_end_io_write
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
ocfs2_try_to_merge_extent
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2__journal_restart
start_this_handle
output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
handle extents in a batch of transaction.
Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
only be removed from the orphan list after the extent tree update is complete.
This ensures that if a crash occurs in the middle of extent tree updates, we
won't leave stale blocks beyond EOF.
This patch also changes the logic for updating the inode size and removing
orphan, making it similar to ext4_dio_write_end_io(). Both operations are
performed only when everything looks good.
Finally, thanks to Jans and Joseph for providing the bug fix prototype and
suggestions.
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
1 file changed, 43 insertions(+), 28 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 09146b43d1f0..0c88f751ffd5 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -37,6 +37,8 @@
#include "namei.h"
#include "sysfile.h"
+#define OCFS2_DIO_MARK_EXTENT_BATCH 200
+
static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
@@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
struct ocfs2_alloc_context *meta_ac = NULL;
handle_t *handle = NULL;
loff_t end = offset + bytes;
- int ret = 0, credits = 0;
+ int ret = 0, credits = 0, batch = 0;
ocfs2_init_dealloc_ctxt(&dealloc);
@@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
goto out;
}
- /* Delete orphan before acquire i_rwsem. */
- if (dwc->dw_orphaned) {
- BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
-
- end = end > i_size_read(inode) ? end : 0;
-
- ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
- !!end, end);
- if (ret < 0)
- mlog_errno(ret);
- }
-
down_write(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2326,24 +2316,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto unlock;
- }
- ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret) {
- mlog_errno(ret);
- goto commit;
- }
-
list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
+ if (!handle) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto unlock;
+ }
+ ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ goto commit;
+ }
+ }
ret = ocfs2_assure_trans_credits(handle, credits);
if (ret < 0) {
mlog_errno(ret);
- break;
+ goto commit;
}
ret = ocfs2_mark_extent_written(inode, &et, handle,
ue->ue_cpos, 1,
@@ -2351,19 +2342,43 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
meta_ac, &dealloc);
if (ret < 0) {
mlog_errno(ret);
- break;
+ goto commit;
+ }
+
+ if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
+ ocfs2_commit_trans(osb, handle);
+ handle = NULL;
+ batch = 0;
}
}
if (end > i_size_read(inode)) {
+ if (!handle) {
+ handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto unlock;
+ }
+ }
ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
if (ret < 0)
mlog_errno(ret);
}
+
commit:
ocfs2_commit_trans(osb, handle);
unlock:
up_write(&oi->ip_alloc_sem);
+
+ /* everything looks good, let's start the cleanup */
+ if (!ret && dwc->dw_orphaned) {
+ BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
+
+ ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
+ if (ret < 0)
+ mlog_errno(ret);
+ }
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
out:
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v6 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-04-02 2:26 ` [PATCH v6 1/1] " Heming Zhao
@ 2026-04-02 7:07 ` Joseph Qi
2026-04-02 7:43 ` Heming Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2026-04-02 7:07 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
On 4/2/26 10:26 AM, Heming Zhao wrote:
> During ocfs2 dio operations, JBD2 may report warnings via following
> call trace:
> ocfs2_dio_end_io_write
> ocfs2_mark_extent_written
> ocfs2_change_extent_flag
> ocfs2_split_extent
> ocfs2_try_to_merge_extent
> ocfs2_extend_rotate_transaction
> ocfs2_extend_trans
> jbd2__journal_restart
> start_this_handle
> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>
> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> handle extents in a batch of transaction.
>
> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> only be removed from the orphan list after the extent tree update is complete.
> This ensures that if a crash occurs in the middle of extent tree updates, we
> won't leave stale blocks beyond EOF.
>
> This patch also changes the logic for updating the inode size and removing
> orphan, making it similar to ext4_dio_write_end_io(). Both operations are
> performed only when everything looks good.
>
> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> suggestions.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
> fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
> 1 file changed, 43 insertions(+), 28 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 09146b43d1f0..0c88f751ffd5 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -37,6 +37,8 @@
> #include "namei.h"
> #include "sysfile.h"
>
> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> +
> static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
> struct buffer_head *bh_result, int create)
> {
> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> struct ocfs2_alloc_context *meta_ac = NULL;
> handle_t *handle = NULL;
> loff_t end = offset + bytes;
> - int ret = 0, credits = 0;
> + int ret = 0, credits = 0, batch = 0;
>
> ocfs2_init_dealloc_ctxt(&dealloc);
>
> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - /* Delete orphan before acquire i_rwsem. */
> - if (dwc->dw_orphaned) {
> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> -
> - end = end > i_size_read(inode) ? end : 0;
> -
> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> - !!end, end);
> - if (ret < 0)
> - mlog_errno(ret);
> - }
> -
> down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> @@ -2326,24 +2316,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>
> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>
> - handle = ocfs2_start_trans(osb, credits);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto unlock;
> - }
> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> - OCFS2_JOURNAL_ACCESS_WRITE);
> - if (ret) {
> - mlog_errno(ret);
> - goto commit;
> - }
> -
> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> + if (!handle) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> + OCFS2_JOURNAL_ACCESS_WRITE);
> + if (ret) {
> + mlog_errno(ret);
> + goto commit;
> + }
> + }
> ret = ocfs2_assure_trans_credits(handle, credits);
> if (ret < 0) {
> mlog_errno(ret);
> - break;
> + goto commit;
> }
> ret = ocfs2_mark_extent_written(inode, &et, handle,
> ue->ue_cpos, 1,
> @@ -2351,19 +2342,43 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> meta_ac, &dealloc);
> if (ret < 0) {
> mlog_errno(ret);
> - break;
> + goto commit;
> + }
> +
> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> + ocfs2_commit_trans(osb, handle);
> + handle = NULL;
> + batch = 0;
> }
> }
>
> if (end > i_size_read(inode)) {
> + if (!handle) {
> + handle = ocfs2_start_trans(osb, credits);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + mlog_errno(ret);
> + goto unlock;
> + }
> + }
> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> if (ret < 0)
> mlog_errno(ret);
> }
> +
> commit:
> ocfs2_commit_trans(osb, handle);
Seems checking 'handle' is still missing here.
Joseph
> unlock:
> up_write(&oi->ip_alloc_sem);
> +
> + /* everything looks good, let's start the cleanup */
> + if (!ret && dwc->dw_orphaned) {
> + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> +
> + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> + if (ret < 0)
> + mlog_errno(ret);
> + }
> ocfs2_inode_unlock(inode, 1);
> brelse(di_bh);
> out:
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v6 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-04-02 7:07 ` Joseph Qi
@ 2026-04-02 7:43 ` Heming Zhao
2026-04-02 7:48 ` Joseph Qi
0 siblings, 1 reply; 6+ messages in thread
From: Heming Zhao @ 2026-04-02 7:43 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
On Thu, Apr 02, 2026 at 03:07:10PM +0800, Joseph Qi wrote:
>
>
> On 4/2/26 10:26 AM, Heming Zhao wrote:
> > During ocfs2 dio operations, JBD2 may report warnings via following
> > call trace:
> > ocfs2_dio_end_io_write
> > ocfs2_mark_extent_written
> > ocfs2_change_extent_flag
> > ocfs2_split_extent
> > ocfs2_try_to_merge_extent
> > ocfs2_extend_rotate_transaction
> > ocfs2_extend_trans
> > jbd2__journal_restart
> > start_this_handle
> > output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >
> > To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> > handle extents in a batch of transaction.
> >
> > Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> > only be removed from the orphan list after the extent tree update is complete.
> > This ensures that if a crash occurs in the middle of extent tree updates, we
> > won't leave stale blocks beyond EOF.
> >
> > This patch also changes the logic for updating the inode size and removing
> > orphan, making it similar to ext4_dio_write_end_io(). Both operations are
> > performed only when everything looks good.
> >
> > Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> > suggestions.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> > fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
> > 1 file changed, 43 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> > index 09146b43d1f0..0c88f751ffd5 100644
> > --- a/fs/ocfs2/aops.c
> > +++ b/fs/ocfs2/aops.c
> > @@ -37,6 +37,8 @@
> > #include "namei.h"
> > #include "sysfile.h"
> >
> > +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> > +
> > static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
> > struct buffer_head *bh_result, int create)
> > {
> > @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > struct ocfs2_alloc_context *meta_ac = NULL;
> > handle_t *handle = NULL;
> > loff_t end = offset + bytes;
> > - int ret = 0, credits = 0;
> > + int ret = 0, credits = 0, batch = 0;
> >
> > ocfs2_init_dealloc_ctxt(&dealloc);
> >
> > @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > goto out;
> > }
> >
> > - /* Delete orphan before acquire i_rwsem. */
> > - if (dwc->dw_orphaned) {
> > - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > -
> > - end = end > i_size_read(inode) ? end : 0;
> > -
> > - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> > - !!end, end);
> > - if (ret < 0)
> > - mlog_errno(ret);
> > - }
> > -
> > down_write(&oi->ip_alloc_sem);
> > di = (struct ocfs2_dinode *)di_bh->b_data;
> >
> > @@ -2326,24 +2316,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >
> > credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >
> > - handle = ocfs2_start_trans(osb, credits);
> > - if (IS_ERR(handle)) {
> > - ret = PTR_ERR(handle);
> > - mlog_errno(ret);
> > - goto unlock;
> > - }
> > - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > - OCFS2_JOURNAL_ACCESS_WRITE);
> > - if (ret) {
> > - mlog_errno(ret);
> > - goto commit;
> > - }
> > -
> > list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> > + if (!handle) {
> > + handle = ocfs2_start_trans(osb, credits);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > + mlog_errno(ret);
> > + goto unlock;
> > + }
> > + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> > + OCFS2_JOURNAL_ACCESS_WRITE);
> > + if (ret) {
> > + mlog_errno(ret);
> > + goto commit;
> > + }
> > + }
> > ret = ocfs2_assure_trans_credits(handle, credits);
> > if (ret < 0) {
> > mlog_errno(ret);
> > - break;
> > + goto commit;
> > }
> > ret = ocfs2_mark_extent_written(inode, &et, handle,
> > ue->ue_cpos, 1,
> > @@ -2351,19 +2342,43 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> > meta_ac, &dealloc);
> > if (ret < 0) {
> > mlog_errno(ret);
> > - break;
> > + goto commit;
> > + }
> > +
> > + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> > + ocfs2_commit_trans(osb, handle);
> > + handle = NULL;
> > + batch = 0;
> > }
> > }
> >
> > if (end > i_size_read(inode)) {
> > + if (!handle) {
> > + handle = ocfs2_start_trans(osb, credits);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > + mlog_errno(ret);
> > + goto unlock;
> > + }
> > + }
> > ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> > if (ret < 0)
> > mlog_errno(ret);
> > }
> > +
> > commit:
> > ocfs2_commit_trans(osb, handle);
>
> Seems checking 'handle' is still missing here.
>
> Joseph
I have a different view.
In the list_for_each_entry(), if it finishes because the condition
(++batch == OCFS2_DIO_MARK_EXTENT_BATCH) is met, the handle will be null.
However, it will be initialized later within the i_size update block.
In my view, the handle must be non-null before entering ocfs2_commit_trans().
- Heming
>
> > unlock:
> > up_write(&oi->ip_alloc_sem);
> > +
> > + /* everything looks good, let's start the cleanup */
> > + if (!ret && dwc->dw_orphaned) {
> > + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> > +
> > + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0);
> > + if (ret < 0)
> > + mlog_errno(ret);
> > + }
> > ocfs2_inode_unlock(inode, 1);
> > brelse(di_bh);
> > out:
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v6 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-04-02 7:43 ` Heming Zhao
@ 2026-04-02 7:48 ` Joseph Qi
2026-04-02 7:54 ` Heming Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2026-04-02 7:48 UTC (permalink / raw)
To: Heming Zhao; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
On 4/2/26 3:43 PM, Heming Zhao wrote:
> On Thu, Apr 02, 2026 at 03:07:10PM +0800, Joseph Qi wrote:
>>
>>
>> On 4/2/26 10:26 AM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following
>>> call trace:
>>> ocfs2_dio_end_io_write
>>> ocfs2_mark_extent_written
>>> ocfs2_change_extent_flag
>>> ocfs2_split_extent
>>> ocfs2_try_to_merge_extent
>>> ocfs2_extend_rotate_transaction
>>> ocfs2_extend_trans
>>> jbd2__journal_restart
>>> start_this_handle
>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>> handle extents in a batch of transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> This ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> This patch also changes the logic for updating the inode size and removing
>>> orphan, making it similar to ext4_dio_write_end_io(). Both operations are
>>> performed only when everything looks good.
>>>
>>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
>>> suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@suse.cz>
>>> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>> fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 43 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..0c88f751ffd5 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -37,6 +37,8 @@
>>> #include "namei.h"
>>> #include "sysfile.h"
>>>
>>> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
>>> +
>>> static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
>>> struct buffer_head *bh_result, int create)
>>> {
>>> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>> struct ocfs2_alloc_context *meta_ac = NULL;
>>> handle_t *handle = NULL;
>>> loff_t end = offset + bytes;
>>> - int ret = 0, credits = 0;
>>> + int ret = 0, credits = 0, batch = 0;
>>>
>>> ocfs2_init_dealloc_ctxt(&dealloc);
>>>
>>> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>> goto out;
>>> }
>>>
>>> - /* Delete orphan before acquire i_rwsem. */
>>> - if (dwc->dw_orphaned) {
>>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>> -
>>> - end = end > i_size_read(inode) ? end : 0;
>>> -
>>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>> - !!end, end);
>>> - if (ret < 0)
>>> - mlog_errno(ret);
>>> - }
>>> -
>>> down_write(&oi->ip_alloc_sem);
>>> di = (struct ocfs2_dinode *)di_bh->b_data;
>>>
>>> @@ -2326,24 +2316,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>
>>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>
>>> - handle = ocfs2_start_trans(osb, credits);
>>> - if (IS_ERR(handle)) {
>>> - ret = PTR_ERR(handle);
>>> - mlog_errno(ret);
>>> - goto unlock;
>>> - }
>>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> - OCFS2_JOURNAL_ACCESS_WRITE);
>>> - if (ret) {
>>> - mlog_errno(ret);
>>> - goto commit;
>>> - }
>>> -
>>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>> + if (!handle) {
>>> + handle = ocfs2_start_trans(osb, credits);
>>> + if (IS_ERR(handle)) {
>>> + ret = PTR_ERR(handle);
>>> + mlog_errno(ret);
>>> + goto unlock;
>>> + }
>>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>> + OCFS2_JOURNAL_ACCESS_WRITE);
>>> + if (ret) {
>>> + mlog_errno(ret);
>>> + goto commit;
>>> + }
>>> + }
>>> ret = ocfs2_assure_trans_credits(handle, credits);
>>> if (ret < 0) {
>>> mlog_errno(ret);
>>> - break;
>>> + goto commit;
>>> }
>>> ret = ocfs2_mark_extent_written(inode, &et, handle,
>>> ue->ue_cpos, 1,
>>> @@ -2351,19 +2342,43 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>> meta_ac, &dealloc);
>>> if (ret < 0) {
>>> mlog_errno(ret);
>>> - break;
>>> + goto commit;
>>> + }
>>> +
>>> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
>>> + ocfs2_commit_trans(osb, handle);
>>> + handle = NULL;
>>> + batch = 0;
>>> }
>>> }
>>>
>>> if (end > i_size_read(inode)) {
>>> + if (!handle) {
>>> + handle = ocfs2_start_trans(osb, credits);
>>> + if (IS_ERR(handle)) {
>>> + ret = PTR_ERR(handle);
>>> + mlog_errno(ret);
>>> + goto unlock;
>>> + }
>>> + }
>>> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
>>> if (ret < 0)
>>> mlog_errno(ret);
>>> }
>>> +
>>> commit:
>>> ocfs2_commit_trans(osb, handle);
>>
>> Seems checking 'handle' is still missing here.
>>
>> Joseph
>
> I have a different view.
> In the list_for_each_entry(), if it finishes because the condition
> (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) is met, the handle will be null.
> However, it will be initialized later within the i_size update block.
> In my view, the handle must be non-null before entering ocfs2_commit_trans().
>
Ummm... inode size update is conditional, right?
Thanks,
Joseph
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v6 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
2026-04-02 7:48 ` Joseph Qi
@ 2026-04-02 7:54 ` Heming Zhao
0 siblings, 0 replies; 6+ messages in thread
From: Heming Zhao @ 2026-04-02 7:54 UTC (permalink / raw)
To: Joseph Qi; +Cc: ocfs2-devel, linux-kernel, glass.su, Jan Kara
On Thu, Apr 02, 2026 at 03:48:36PM +0800, Joseph Qi wrote:
>
>
> On 4/2/26 3:43 PM, Heming Zhao wrote:
> > On Thu, Apr 02, 2026 at 03:07:10PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 4/2/26 10:26 AM, Heming Zhao wrote:
> >>> During ocfs2 dio operations, JBD2 may report warnings via following
> >>> call trace:
> >>> ocfs2_dio_end_io_write
> >>> ocfs2_mark_extent_written
> >>> ocfs2_change_extent_flag
> >>> ocfs2_split_extent
> >>> ocfs2_try_to_merge_extent
> >>> ocfs2_extend_rotate_transaction
> >>> ocfs2_extend_trans
> >>> jbd2__journal_restart
> >>> start_this_handle
> >>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
> >>>
> >>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
> >>> handle extents in a batch of transaction.
> >>>
> >>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
> >>> only be removed from the orphan list after the extent tree update is complete.
> >>> This ensures that if a crash occurs in the middle of extent tree updates, we
> >>> won't leave stale blocks beyond EOF.
> >>>
> >>> This patch also changes the logic for updating the inode size and removing
> >>> orphan, making it similar to ext4_dio_write_end_io(). Both operations are
> >>> performed only when everything looks good.
> >>>
> >>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
> >>> suggestions.
> >>>
> >>> Suggested-by: Jan Kara <jack@suse.cz>
> >>> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> >>> Reviewed-by: Jan Kara <jack@suse.cz>
> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> >>> ---
> >>> fs/ocfs2/aops.c | 71 ++++++++++++++++++++++++++++++-------------------
> >>> 1 file changed, 43 insertions(+), 28 deletions(-)
> >>>
> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> >>> index 09146b43d1f0..0c88f751ffd5 100644
> >>> --- a/fs/ocfs2/aops.c
> >>> +++ b/fs/ocfs2/aops.c
> >>> @@ -37,6 +37,8 @@
> >>> #include "namei.h"
> >>> #include "sysfile.h"
> >>>
> >>> +#define OCFS2_DIO_MARK_EXTENT_BATCH 200
> >>> +
> >>> static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
> >>> struct buffer_head *bh_result, int create)
> >>> {
> >>> @@ -2277,7 +2279,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> struct ocfs2_alloc_context *meta_ac = NULL;
> >>> handle_t *handle = NULL;
> >>> loff_t end = offset + bytes;
> >>> - int ret = 0, credits = 0;
> >>> + int ret = 0, credits = 0, batch = 0;
> >>>
> >>> ocfs2_init_dealloc_ctxt(&dealloc);
> >>>
> >>> @@ -2294,18 +2296,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> goto out;
> >>> }
> >>>
> >>> - /* Delete orphan before acquire i_rwsem. */
> >>> - if (dwc->dw_orphaned) {
> >>> - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> >>> -
> >>> - end = end > i_size_read(inode) ? end : 0;
> >>> -
> >>> - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
> >>> - !!end, end);
> >>> - if (ret < 0)
> >>> - mlog_errno(ret);
> >>> - }
> >>> -
> >>> down_write(&oi->ip_alloc_sem);
> >>> di = (struct ocfs2_dinode *)di_bh->b_data;
> >>>
> >>> @@ -2326,24 +2316,25 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>>
> >>> credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
> >>>
> >>> - handle = ocfs2_start_trans(osb, credits);
> >>> - if (IS_ERR(handle)) {
> >>> - ret = PTR_ERR(handle);
> >>> - mlog_errno(ret);
> >>> - goto unlock;
> >>> - }
> >>> - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> - OCFS2_JOURNAL_ACCESS_WRITE);
> >>> - if (ret) {
> >>> - mlog_errno(ret);
> >>> - goto commit;
> >>> - }
> >>> -
> >>> list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
> >>> + if (!handle) {
> >>> + handle = ocfs2_start_trans(osb, credits);
> >>> + if (IS_ERR(handle)) {
> >>> + ret = PTR_ERR(handle);
> >>> + mlog_errno(ret);
> >>> + goto unlock;
> >>> + }
> >>> + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> >>> + OCFS2_JOURNAL_ACCESS_WRITE);
> >>> + if (ret) {
> >>> + mlog_errno(ret);
> >>> + goto commit;
> >>> + }
> >>> + }
> >>> ret = ocfs2_assure_trans_credits(handle, credits);
> >>> if (ret < 0) {
> >>> mlog_errno(ret);
> >>> - break;
> >>> + goto commit;
> >>> }
> >>> ret = ocfs2_mark_extent_written(inode, &et, handle,
> >>> ue->ue_cpos, 1,
> >>> @@ -2351,19 +2342,43 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> >>> meta_ac, &dealloc);
> >>> if (ret < 0) {
> >>> mlog_errno(ret);
> >>> - break;
> >>> + goto commit;
> >>> + }
> >>> +
> >>> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
> >>> + ocfs2_commit_trans(osb, handle);
> >>> + handle = NULL;
> >>> + batch = 0;
> >>> }
> >>> }
> >>>
> >>> if (end > i_size_read(inode)) {
> >>> + if (!handle) {
> >>> + handle = ocfs2_start_trans(osb, credits);
> >>> + if (IS_ERR(handle)) {
> >>> + ret = PTR_ERR(handle);
> >>> + mlog_errno(ret);
> >>> + goto unlock;
> >>> + }
> >>> + }
> >>> ret = ocfs2_set_inode_size(handle, inode, di_bh, end);
> >>> if (ret < 0)
> >>> mlog_errno(ret);
> >>> }
> >>> +
> >>> commit:
> >>> ocfs2_commit_trans(osb, handle);
> >>
> >> Seems checking 'handle' is still missing here.
> >>
> >> Joseph
> >
> > I have a different view.
> > In the list_for_each_entry(), if it finishes because the condition
> > (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) is met, the handle will be null.
> > However, it will be initialized later within the i_size update block.
> > In my view, the handle must be non-null before entering ocfs2_commit_trans().
> >
>
> Ummm... inode size update is conditional, right?
>
> Thanks,
> Joseph
>
How stupid of me! understand your point now.
- Heming
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-04-02 7:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-02 2:26 [PATCH v6 0/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-04-02 2:26 ` [PATCH v6 1/1] " Heming Zhao
2026-04-02 7:07 ` Joseph Qi
2026-04-02 7:43 ` Heming Zhao
2026-04-02 7:48 ` Joseph Qi
2026-04-02 7:54 ` Heming Zhao
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.