All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Heming Zhao <heming.zhao@suse.com>
Cc: ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	Jan Kara <jack@suse.cz>,
	glass.su@suse.com
Subject: Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
Date: Thu, 19 Mar 2026 09:53:16 +0800	[thread overview]
Message-ID: <a108b18a-e27e-424c-80f1-c19ea2c30685@linux.alibaba.com> (raw)
In-Reply-To: <x7tv6dzqhiwcvld2cct4olou5oioeis5k4nlpr6oglpch2skvr@duochardclqx>



On 3/18/26 2:40 PM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>>
>>
>> On 3/13/26 11:17 AM, Heming Zhao wrote:
>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>>> Almost looks fine, minor updates below.
>>>>
>>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>>> ocfs2_dio_end_io_write
>>>>>  ocfs2_mark_extent_written
>>>>>   ocfs2_change_extent_flag
>>>>>    ocfs2_split_extent
>>>>>     ocfs2_try_to_merge_extent
>>>>>      ocfs2_extend_rotate_transaction
>>>>>       ocfs2_extend_trans
>>>>>        jbd2__journal_restart
>>>>>         start_this_handle
>>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>>
>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>>> handle each extent in a separate transaction.
>>>>>
>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>>> only be removed from the orphan list after the extent tree update is complete.
>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>>> won't leave stale blocks beyond EOF.
>>>>>
>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>>> insufficient transaction credits").
>>>>>
>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>>
>>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>>> ---
>>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index 09146b43d1f0..91997b330d39 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>  		goto out;
>>>>>  	}
>>>>>  
>>>>> -	/* Delete orphan before acquire i_rwsem. */
>>>>> -	if (dwc->dw_orphaned) {
>>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>>> -
>>>>> -		end = end > i_size_read(inode) ? end : 0;
>>>>> -
>>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>>> -				!!end, end);
>>>>> -		if (ret < 0)
>>>>> -			mlog_errno(ret);
>>>>> -	}
>>>>> -
>>>>>  	down_write(&oi->ip_alloc_sem);
>>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>>  
>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>  
>>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>>  
>>>>> -	handle = ocfs2_start_trans(osb, credits);
>>>>> -	if (IS_ERR(handle)) {
>>>>> -		ret = PTR_ERR(handle);
>>>>> -		mlog_errno(ret);
>>>>> -		goto unlock;
>>>>> -	}
>>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> -	if (ret) {
>>>>> -		mlog_errno(ret);
>>>>> -		goto commit;
>>>>> -	}
>>>>> -
>>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
>>>>> -		if (ret < 0) {
>>>>> +		handle = ocfs2_start_trans(osb, credits);
>>>>> +		if (IS_ERR(handle)) {
>>>>> +			ret = PTR_ERR(handle);
>>>>>  			mlog_errno(ret);
>>>>>  			break;
>>>>
>>>> I'd rather goto unlock directly without update i_size in case error.
>>>
>>> agree
>>>>
>>>>>  		}
>>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> +		if (ret) {
>>>>> +			mlog_errno(ret);
>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>> +			break;
>>>>
>>>> Ditto.
>>>
>>> agree
>>>>
>>>>> +		}
>>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>>  						ue->ue_cpos, 1,
>>>>>  						ue->ue_phys,
>>>>>  						meta_ac, &dealloc);
>>>>>  		if (ret < 0) {
>>>>>  			mlog_errno(ret);
>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>>  			break;
>>>>
>>>> Ditto.
>>>
>>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>>> returns an error. I am not certain whether updating i_size in this case is
>>> correct, but I prefer to maintain the original logic for now.
>>> Does that seem reasonable to you?
>>>
>>
>> Since it returns 0 for unwritten extents, it looks fine.
>>
>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>>>> to save the extra call in case error.
>>>
>>> agree
>>>
>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
>>> is about 12107 ~ 12457.
>>> Regarding the transaction splitting in the unwritten handling loop: this patch
>>> introduces a minor performance regression. I would like to merge some of the
>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
>>> What do you think of this approach?
>>>
>>
>> Seems we can limit max credit for a single transaction here?
>> If exceeds, restart a new one.
>>
>> Joseph,
>> Thanks
> 
> sorry for the late reply, I spend some time running the tests.
> I have created two different types of patches (see below) to address this issue.
> 
> time consumption results:
> 
> ocfs2-dynamic-... patch
> real    2m9.900s
> user    0m0.333s
> sys     0m22.687s
> 
> %200 of ocfs2-split-trans-... patch
> real    1m48.901s
> user    0m0.306s
> sys     0m23.019s
> 
> %500 of ocfs2-split-trans-... patch
> real    1m49.350s
> user    0m0.299s
> sys     0m22.718s
> 
> As shown above, the fixed "500% mode" is faster than the dynamic style.
> Which approach do you prefer?
> 
I'd prefer solution B, it looks simpler to directly set a commit batch.
Some comments for the formal patch:
1. I don't think you have to use 'cnt + trans_start + mod' to control
the batch commit, just check 'batch + handle' seems enough.
2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
3. Do not update i_size in case error.
4. Correctly handle commit transaction in case error.

Thanks,
Joseph



  reply	other threads:[~2026-03-19  1:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-13  2:04   ` Joseph Qi
2026-03-13  3:17     ` Heming Zhao
2026-03-13  4:53       ` Joseph Qi
2026-03-13  5:35         ` Joseph Qi
2026-03-18  6:40         ` Heming Zhao
2026-03-19  1:53           ` Joseph Qi [this message]
2026-03-24  7:18             ` Heming Zhao
2026-03-25  3:31             ` Heming Zhao
2026-03-25  6:33               ` Joseph Qi
2026-03-24  6:00           ` Heming Zhao
2026-03-13  2:27   ` Joseph Qi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a108b18a-e27e-424c-80f1-c19ea2c30685@linux.alibaba.com \
    --to=joseph.qi@linux.alibaba.com \
    --cc=glass.su@suse.com \
    --cc=heming.zhao@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.