From: yangerkun <yangerkun@huawei.com>
To: yangerkun <yangerkun@huaweicloud.com>,
	Dave Chinner <david@fromorbit.com>
Cc: <djwong@kernel.org>, <dchinner@redhat.com>, <sandeen@redhat.com>,
	<linux-xfs@vger.kernel.org>, <yukuai3@huawei.com>
Subject: Re: [PATCH] xfs: fix deadlock when set label online
Date: Fri, 11 Aug 2023 14:21:15 +0800
Message-ID: <ebfee92e-c8fa-c9bc-6093-7d8a06e48c88@huawei.com>
In-Reply-To: <4139563b-8918-d89a-c926-4155228a12dc@huaweicloud.com>



On 2023/6/30 10:19, yangerkun wrote:
> 
> 
>> On 2023/6/30 6:24, Dave Chinner wrote:
>> On Thu, Jun 29, 2023 at 07:55:10PM +0800, yangerkun wrote:
>>> On 2023/6/29 7:10, Dave Chinner wrote:
>>>> On Tue, Jun 27, 2023 at 04:42:41PM +0800, yangerkun wrote:
>>>>> On 2023/6/27 5:45, Dave Chinner wrote:
>>>>>> On Mon, Jun 26, 2023 at 09:15:42PM +0800, yangerkun wrote:
>>>>>>> From: yangerkun <yangerkun@huawei.com>
>>>>>>>
>>>>>>> The combined use of xfs_trans_bhold and xfs_trans_set_sync in
>>>>>>> xfs_sync_sb_buf can trigger a deadlock if a shutdown happens
>>>>>>> concurrently. xlog_ioend_work will first try to unpin the sb buf
>>>>>>> (getting stuck in xfs_buf_lock) before it wakes xfs_sync_sb_buf.
>>>>>>> However, xfs_sync_sb_buf never gets the chance to unlock the sb
>>>>>>> buf until it is woken by xlog_ioend_work.
>>>>>>>
>>>>>>> xfs_sync_sb_buf
>>>>>>>      xfs_trans_getsb // lock sb buf
>>>>>>>      xfs_trans_bhold // sb buf stays locked until commit succeeds
>>>>>>>      xfs_trans_commit
>>>>>>>      ...
>>>>>>>        xfs_log_force_seq
>>>>>>>          xlog_force_lsn
>>>>>>>            xlog_wait_on_iclog
>>>>>>>              xlog_wait(&iclog->ic_force_wait... // shutdown happened
>>>>>>>      xfs_buf_relse // unlock sb buf
>>>>>>>
>>>>>>> xlog_ioend_work
>>>>>>>      xlog_force_shutdown
>>>>>>>        xlog_state_shutdown_callbacks
>>>>>>>          xlog_cil_process_committed
>>>>>>>            xlog_cil_committed
>>>>>>>            ...
>>>>>>>            xfs_buf_item_unpin
>>>>>>>              xfs_buf_lock // deadlock
>>>>>>>          wake_up_all(&iclog->ic_force_wait)
>>>>>>>
>>>>>>> xfs_ioc_setlabel uses xfs_sync_sb_buf to make sure userspace
>>>>>>> sees the sb change immediately. We can simply call
>>>>>>> xfs_ail_push_all_sync to do this and fix the deadlock at the
>>>>>>> same time.
>>>>>>
>>>>>> Why is this deadlock specific to the superblock buffer?
>>>>>
>>>>> Hi Dave,
>>>>>
>>>>> Thanks a lot for your review! We found this problem while doing
>>>>> some code reading (which helped us fix another growfs bug), and
>>>>> then reproduced it easily by setting the label online frequently
>>>>> while injecting IO errors at the same time.
>>>>
>>>> Right, I know how it can be triggered; that's not actually my
>>>> concern...
>>>>
>>>>>> Can't any buffer that is held locked over a synchronous transaction
>>>>>> commit deadlock during a shutdown like this?
>>>>>
>>>>> After checking all the places that use xfs_trans_bhold, it seems
>>>>> xfs_sync_sb_buf is the only offender that combines xfs_trans_bhold
>>>>> and xfs_trans_set_sync (I'm not familiar with xfs yet, so my code
>>>>> check may have missed something)...
>>>>
>>>> Yes, I can also see that. But my concern is that this change only
>>>> addresses the symptom, but leaves the underlying deadlock unsolved.
>>>>
>>>> Indeed, this isn't xfs_trans_commit() I'm worried about here; it's
>>>> the call to xfs_log_force(mp, XFS_LOG_SYNC) or
>>>> xfs_log_force_seq(XFS_LOG_SYNC) with a buffer held locked that I'm
>>>> worried about.
>>>>
>>>> i.e. We have a buffer in the CIL (from a previous transaction) that
>>>> we currently hold locked while we call xfs_log_force(XFS_LOG_SYNC).
>>>> If a shutdown occurs while we are waiting for journal IO completion
>>>> to occur, then xlog_ioend_work() will attempt to lock the buffer and
>>>> deadlock, right?
>>>>
>>>> e.g. I'm thinking of things like busy extent flushing (hold AGF +
>>>> AGFL + AG btree blocks locked when we call xfs_log_force()) could
>>>> also be vulnerable to the same deadlock...
>>>
>>> You mean something like xfs_allocbt_alloc_block (which calls
>>> xfs_log_force to flush busy extents while keeping the agf locked at
>>> the same time)?
>>>
>>> We call xfs_log_force(mp, XFS_LOG_SYNC) after locking the agf and
>>> before xfs_trans_commit. That seems ok, since xfs_buf_item_unpin
>>> will not call xfs_buf_lock while bli_refcount is still elevated
>>> (once we hold the locked agf, bli_refcount is incremented in
>>> _xfs_trans_bjoin and stays held until xfs_trans_commit succeeds
>>> (clean agf item) or .iop_unpin runs (dirty agf item, called from
>>> xlog_ioend_work), which can happen after xfs_trans_commit too)...
>>
>> Again, I gave an example of the class of issue I'm worried about.
>> Again, you chased the one example given through, but haven't
>> mentioned a thing about all the other code paths that lead to
>> xfs_log_force(SYNC) that might hold buffers locked that I didn't
>> mention.
>>
>> I don't want to have to ask every person who proposes a fix about
>> every possible code path the bug may manifest in -one at a time-.  I
>> use examples to point you in the right direction for further
>> analysis of the rest of the code base, not because that's the only
>> thing I want checked. Please use your initiative to look at all the
>> callers of xfs_log_force(SYNC) and determine if they are all safe or
>> whether there are landmines lurking or even more bugs of a similar
>> sort.
> 
> Hi Dave,
> 
> Thank you very much for pointing this out! I'm sorry for not doing a
> comprehensive investigation into whether any other place can trigger
> the bug too...
> 
>>
>> When we learn about a new issue, this is the sort of audit work that
>> is necessary to determine the scope of the issue. We need to perform
>> such audits because they direct the scope of the fix necessary. We
>> are not interested in slapping a band-aid fix over the symptom that
>> was reported - that only leads to more band-aid fixes as the same
>> issue appears in other places.
> 
> Yes, I agree, and thanks for your advice; it will really help me
> avoid a band-aid fix that only leads to more band-aid fixes, so I can
> contribute better!
> 
>>
>> Now we know there is a lock ordering problem in this code, so before
>> we attempt to fix it we need to know how widespread it is, what the
>> impact is, how different code paths avoid it, etc. That requires a
>> code audit to determine, and that requires looking at all the paths
>> into xfs_log_force(XFS_LOG_SYNC) to determine if they are safe or
>> not and documenting that.
>>
>> Yes, it's more work *right now* than slapping a quick band-aid fix
>> over it, but it's much less work in the long run for us and we don't
>> have to keep playing whack-a-mole because we fixed it the right way
>> the first time.
>>
> 
> I will try to look at all paths into xfs_log_force(XFS_LOG_SYNC) or
> xfs_log_force_seq(XFS_LOG_SYNC) and check whether they are safe.
> Thanks again for your advice!
> 
> Thanks,
> Yang Erkun.
> 
>> -Dave.
> 

Hi, Dave,

Sorry for the late reply; I was quite busy last month, and it also took
me a long time to check whether all the callers of
xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) are safe. I'm not familiar
with xfs yet, so if there's anything wrong with the description below,
please point it out!

The approach I chose was to check whether we ever call
xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) between
xfs_buf_lock/xfs_buf_trylock and xfs_buf_unlock in the same thread
context, and, where that happens, whether it can trigger the bug. (I
have checked the other log items' .iop_unpin methods; it seems only the
xfs_buf item can trigger the problem, since it may try to lock the buf
in xfs_buf_item_unpin. Also, calling xfs_buf_lock plus
xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) in one thread context and
xfs_buf_unlock in another is safe, since that unlock does not wait for
the log force to return.)

I divide the callers of xfs_buf_lock/xfs_buf_trylock into two
categories:

1. The xfs_buf later joins the tp (xfs_trans_bjoin increments
.bli_refcount)

a. xfs_trans_bjoin increments .bli_refcount
b. xfs_buf_item_pin increments .bli_refcount when the xfs_buf's item is
dirty
c. xfs_buf_item_committing decrements .bli_refcount whether the item is
dirty or not, and normally unlocks the xfs_buf, or keeps the xfs_buf
locked when XFS_BLI_HOLD is set
d. xfs_buf_item_unpin decrements .bli_refcount, and it won't call
xfs_buf_lock while another .bli_refcount reference exists
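
So for a dirty buffer joined to a tp, the refcount flow is roughly (a
simplified sketch from my reading; the exact call sites may differ):

    xfs_trans_bjoin(tp, bp);    // .bli_refcount++
    ...                         // buffer logged, item dirtied
    xfs_buf_item_pin(lip);      // .bli_refcount++ at commit time
    ...
    xfs_buf_item_unpin(lip);    // .bli_refcount--; skips xfs_buf_lock
                                //   while another reference is still held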

xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) can happen before we commit
the tp. For example, xfs_create first reads and locks the agi buf, then
calls xfs_dir_createname, which may trigger an agfl fixup; the block
allocation may see a busy extent and call xfs_log_force(SYNC) to flush
the busy extent's journal entries. This won't trigger the problem,
since xfs_trans_bjoin keeps another .bli_refcount reference.
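
Roughly the following chain, as far as I can trace it (I may have some
of the intermediate calls wrong):

xfs_create
  xfs_ialloc_read_agi // read & lock agi buf, joined to tp
  xfs_dir_createname
    ...
      xfs_alloc_fix_freelist // agfl fixup
        xfs_alloc_ag_vextent_*
          xfs_extent_busy_flush
            xfs_log_force(mp, XFS_LOG_SYNC) // safe: tp holds .bli_refcount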

xfs_log_force_seq(SYNC) can also happen when we commit the tp because
XFS_TRANS_SYNC is set (see __xfs_trans_commit). The only way to trigger
the deadlock here is to have also called xfs_trans_bhold; otherwise we
unlock the xfs_buf in xfs_buf_item_committing. The case this patch
tries to fix is the only place that combines xfs_trans_bhold with
XFS_TRANS_SYNC.
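
That deadlocking combination is essentially what xfs_sync_sb_buf does
(abridged from my reading of the current code):

    error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, 0, 0, 0, &tp);
    bp = xfs_trans_getsb(tp);     // locks the sb buf, joins it to the tp
    xfs_log_sb(tp);
    xfs_trans_bhold(tp, bp);      // keep the sb buf locked across commit
    xfs_trans_set_sync(tp);       // XFS_TRANS_SYNC
    error = xfs_trans_commit(tp); // __xfs_trans_commit ->
                                  //   xfs_log_force_seq(SYNC) waits here
    xfs_buf_relse(bp);            // never reached once a shutdown makes
                                  //   xfs_buf_item_unpin() want the lock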

After the tp commits, an xfs_buf only stays locked because of
xfs_trans_bhold, and with an XFS_TRANS_PERM_LOG_RES tp we may trigger
another commit, so xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) can
happen again. But that is safe too, since we first rejoin the xfs_buf
to the tp, which protects us.
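
i.e. the usual hold-and-roll pattern (a sketch; exact call sites vary):

    xfs_trans_bhold(tp, bp);      // keep bp locked across the roll
    error = xfs_trans_roll(&tp);  // commits the old tp, gets a new one
    xfs_trans_bjoin(tp, bp);      // rejoin first, so .bli_refcount is
                                  //   held again before any later sync
                                  //   commit or log force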

2. The xfs_buf does not join the tp

xfs_buf_readahead_map
xfs_buf_read
xfs_buf_get
xfs_buf_incore
xfs_buf_delwri_cancel
xfs_buf_delwri_submit_buffers
xfs_buf_delwri_pushbuf
xfs_buf_item_unpin
xfs_iflush_shutdown_abort
xfs_log_quiesce
xlog_do_recover
xfs_freesb
xfs_add_incompat_log_feature
xfs_clear_incompat_log_features

Most of the cases above are IO paths. Where xfs_buf_unlock is called
from another thread context, they are safe; where xfs_buf_unlock is
called from the same thread context, they are safe too, since we never
trigger xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) between xfs_buf_lock
and xfs_buf_unlock.


From the above, it seems only xfs_sync_sb_buf can trigger this
deadlock... And I would prefer to add a comment to xfs_trans_bhold
noting that combining xfs_trans_bhold with
xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) can deadlock...
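
Something like the following, above xfs_trans_bhold (just a draft; the
wording is up for discussion):

    /*
     * A buffer held via xfs_trans_bhold() stays locked across
     * xfs_trans_commit().  Do not combine this with a synchronous
     * commit (xfs_trans_set_sync()) or any xfs_log_force(XFS_LOG_SYNC)
     * issued while the buffer is still locked: if a shutdown races in,
     * xfs_buf_item_unpin() called from xlog_ioend_work() may need the
     * buffer lock and deadlock.
     */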


Dave, sorry again for the late reply; I look forward to your response!

Thanks,
Yang Erkun.
