public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Heming Zhao <heming.zhao@suse.com>
Cc: ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	Jan Kara <jack@suse.cz>,
	glass.su@suse.com
Subject: Re: [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
Date: Thu, 19 Mar 2026 09:53:16 +0800	[thread overview]
Message-ID: <a108b18a-e27e-424c-80f1-c19ea2c30685@linux.alibaba.com> (raw)
In-Reply-To: <x7tv6dzqhiwcvld2cct4olou5oioeis5k4nlpr6oglpch2skvr@duochardclqx>



On 3/18/26 2:40 PM, Heming Zhao wrote:
> On Fri, Mar 13, 2026 at 12:53:32PM +0800, Joseph Qi wrote:
>>
>>
>> On 3/13/26 11:17 AM, Heming Zhao wrote:
>>> On Fri, Mar 13, 2026 at 10:04:11AM +0800, Joseph Qi wrote:
>>>> Almost looks fine, minor updates below.
>>>>
>>>> On 3/13/26 12:27 AM, Heming Zhao wrote:
>>>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>>>> ocfs2_dio_end_io_write
>>>>>  ocfs2_mark_extent_written
>>>>>   ocfs2_change_extent_flag
>>>>>    ocfs2_split_extent
>>>>>     ocfs2_try_to_merge_extent
>>>>>      ocfs2_extend_rotate_transaction
>>>>>       ocfs2_extend_trans
>>>>>        jbd2__journal_restart
>>>>>         start_this_handle
>>>>>          output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>>>
>>>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
>>>>> handle each extent in a separate transaction.
>>>>>
>>>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>>>> only be removed from the orphan list after the extent tree update is complete.
>>>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>>>> won't leave stale blocks beyond EOF.
>>>>>
>>>>> This patch also removes the only call to ocfs2_assure_trans_credits(), which
>>>>> was introduced by commit be346c1a6eeb ("ocfs2: fix DIO failure due to
>>>>> insufficient transaction credits").
>>>>>
>>>>> Finally, thanks to Jans for providing the bug fix prototype and suggestions.
>>>>>
>>>>> Suggested-by: Jan Kara <jack@suse.cz>
>>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>>> ---
>>>>>  fs/ocfs2/aops.c | 58 ++++++++++++++++++++++++-------------------------
>>>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index 09146b43d1f0..91997b330d39 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -2294,18 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>  		goto out;
>>>>>  	}
>>>>>  
>>>>> -	/* Delete orphan before acquire i_rwsem. */
>>>>> -	if (dwc->dw_orphaned) {
>>>>> -		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
>>>>> -
>>>>> -		end = end > i_size_read(inode) ? end : 0;
>>>>> -
>>>>> -		ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh,
>>>>> -				!!end, end);
>>>>> -		if (ret < 0)
>>>>> -			mlog_errno(ret);
>>>>> -	}
>>>>> -
>>>>>  	down_write(&oi->ip_alloc_sem);
>>>>>  	di = (struct ocfs2_dinode *)di_bh->b_data;
>>>>>  
>>>>> @@ -2326,44 +2314,56 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>>  
>>>>>  	credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list);
>>>>>  
>>>>> -	handle = ocfs2_start_trans(osb, credits);
>>>>> -	if (IS_ERR(handle)) {
>>>>> -		ret = PTR_ERR(handle);
>>>>> -		mlog_errno(ret);
>>>>> -		goto unlock;
>>>>> -	}
>>>>> -	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> -				      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> -	if (ret) {
>>>>> -		mlog_errno(ret);
>>>>> -		goto commit;
>>>>> -	}
>>>>> -
>>>>>  	list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
>>>>> -		ret = ocfs2_assure_trans_credits(handle, credits);
>>>>> -		if (ret < 0) {
>>>>> +		handle = ocfs2_start_trans(osb, credits);
>>>>> +		if (IS_ERR(handle)) {
>>>>> +			ret = PTR_ERR(handle);
>>>>>  			mlog_errno(ret);
>>>>>  			break;
>>>>
>>>> I'd rather goto unlock directly without update i_size in case error.
>>>
>>> agree
>>>>
>>>>>  		}
>>>>> +		ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
>>>>> +					      OCFS2_JOURNAL_ACCESS_WRITE);
>>>>> +		if (ret) {
>>>>> +			mlog_errno(ret);
>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>> +			break;
>>>>
>>>> Ditto.
>>>
>>> agree
>>>>
>>>>> +		}
>>>>>  		ret = ocfs2_mark_extent_written(inode, &et, handle,
>>>>>  						ue->ue_cpos, 1,
>>>>>  						ue->ue_phys,
>>>>>  						meta_ac, &dealloc);
>>>>>  		if (ret < 0) {
>>>>>  			mlog_errno(ret);
>>>>> +			ocfs2_commit_trans(osb, handle);
>>>>>  			break;
>>>>
>>>> Ditto.
>>>
>>> The existing code still updates i_size even if ocfs2_mark_extent_written()
>>> returns an error. I am not certain whether updating i_size in this case is
>>> correct, but I prefer to maintain the original logic for now.
>>> Does that seem reasonable to you?
>>>
>>
>> Since it returns 0 for unwritten extents, it looks fine.
>>
>>>> BTW, we can move ocfs2_commit_trans() rightly after ocfs2_mark_extent_written()
>>>> to save the extra call in case error.
>>>
>>> agree
>>>
>>> I ran a DIO test using 64MB chunk size (e.g.: fio --bs=64M), the dwc->dw_zero_count
>>> is about 12107 ~ 12457.
>>> Regarding the transaction splitting in the unwritten handling loop: this patch
>>> introduces a minor performance regression. I would like to merge some of the
>>> transaction calls. e.g.: by starting a new transaction every 100 or 200 iterations.
>>> What do you think of this approach?
>>>
>>
>> Seems we can limit max credit for a single transaction here?
>> If exceeds, restart a new one.
>>
>> Joseph,
>> Thanks
> 
> sorry for the late reply, I spend some time running the tests.
> I have created two different types of patches (see below) to address this issue.
> 
> time consumption results:
> 
> ocfs2-dynamic-... patch
> real    2m9.900s
> user    0m0.333s
> sys     0m22.687s
> 
> %200 of ocfs2-split-trans-... patch
> real    1m48.901s
> user    0m0.306s
> sys     0m23.019s
> 
> %500 of ocfs2-split-trans-... patch
> real    1m49.350s
> user    0m0.299s
> sys     0m22.718s
> 
> As shown above, the fixed "500% mode" is faster than the dynamic style.
> Which approach do you prefer?
> 
I'd prefer solution B, it looks simpler to directly set a commit batch.
Some comments for the formal patch:
1. I don't think you have to use 'cnt + trans_start + mod' to control
the batch commit, just check 'batch + handle' seems enough.
2. Define a macro for the batch, e.g. OCFS2_DIO_MARK_EXTENT_BATCH,
3. Do not update i_size in case error.
4. Correctly handle commit transaction in case error.

Thanks,
Joseph



  reply	other threads:[~2026-03-19  1:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12 16:27 [PATCH v2 0/1] ocfs2: split transactions in dio completion to avoid Heming Zhao
2026-03-12 16:27 ` [PATCH v2 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion Heming Zhao
2026-03-13  2:04   ` Joseph Qi
2026-03-13  3:17     ` Heming Zhao
2026-03-13  4:53       ` Joseph Qi
2026-03-13  5:35         ` Joseph Qi
2026-03-18  6:40         ` Heming Zhao
2026-03-19  1:53           ` Joseph Qi [this message]
2026-03-24  7:18             ` Heming Zhao
2026-03-25  3:31             ` Heming Zhao
2026-03-25  6:33               ` Joseph Qi
2026-03-24  6:00           ` Heming Zhao
2026-03-13  2:27   ` Joseph Qi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a108b18a-e27e-424c-80f1-c19ea2c30685@linux.alibaba.com \
    --to=joseph.qi@linux.alibaba.com \
    --cc=glass.su@suse.com \
    --cc=heming.zhao@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox