From: yangerkun <yangerkun@huaweicloud.com>
To: Dave Chinner <david@fromorbit.com>
Cc: djwong@kernel.org, dchinner@redhat.com, sandeen@redhat.com,
linux-xfs@vger.kernel.org, yangerkun@huawei.com,
yukuai3@huawei.com
Subject: Re: [PATCH] xfs: fix deadlock when set label online
Date: Fri, 30 Jun 2023 10:19:53 +0800
Message-ID: <4139563b-8918-d89a-c926-4155228a12dc@huaweicloud.com>
In-Reply-To: <ZJ4EkyxoxDYmf8rv@dread.disaster.area>
On 2023/6/30 6:24, Dave Chinner wrote:
> On Thu, Jun 29, 2023 at 07:55:10PM +0800, yangerkun wrote:
>> On 2023/6/29 7:10, Dave Chinner wrote:
>>> On Tue, Jun 27, 2023 at 04:42:41PM +0800, yangerkun wrote:
>>>> On 2023/6/27 5:45, Dave Chinner wrote:
>>>>> On Mon, Jun 26, 2023 at 09:15:42PM +0800, yangerkun wrote:
>>>>>> From: yangerkun <yangerkun@huawei.com>
>>>>>>
>>>>>> The combined use of xfs_trans_bhold and xfs_trans_set_sync in xfs_sync_sb_buf
>>>>>> can trigger a deadlock if a shutdown happens concurrently. xlog_ioend_work
>>>>>> must first unpin the sb buffer (which gets stuck in xfs_buf_lock) before it
>>>>>> can wake up xfs_sync_sb_buf. However, xfs_sync_sb_buf never gets the chance
>>>>>> to unlock the sb buffer until it is woken by xlog_ioend_work.
>>>>>>
>>>>>> xfs_sync_sb_buf
>>>>>>   xfs_trans_getsb            // lock sb buf
>>>>>>   xfs_trans_bhold            // sb buf stays locked until the commit succeeds
>>>>>>   xfs_trans_commit
>>>>>>     ...
>>>>>>     xfs_log_force_seq
>>>>>>       xlog_force_lsn
>>>>>>         xlog_wait_on_iclog
>>>>>>           xlog_wait(&iclog->ic_force_wait...  // shutdown happens here
>>>>>>   xfs_buf_relse              // unlock sb buf: never reached
>>>>>>
>>>>>> xlog_ioend_work
>>>>>>   xlog_force_shutdown
>>>>>>     xlog_state_shutdown_callbacks
>>>>>>       xlog_cil_process_committed
>>>>>>         xlog_cil_committed
>>>>>>           ...
>>>>>>           xfs_buf_item_unpin
>>>>>>             xfs_buf_lock     // deadlock
>>>>>>       wake_up_all(&iclog->ic_force_wait)
>>>>>>
>>>>>> xfs_ioc_setlabel uses xfs_sync_sb_buf to make sure userspace sees the
>>>>>> superblock change immediately. We can simply call xfs_ail_push_all_sync
>>>>>> to achieve the same thing and fix the deadlock at the same time.
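
To make the wait cycle in the two traces above concrete, here is a minimal
userspace model of the same ordering, written with plain pthreads rather
than the real XFS code. The comments map each step back to the traces;
everything else is just illustration, not kernel API:

/* gcc -pthread deadlock_model.c && ./a.out   -- hangs by design */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t sb_buf_lock = PTHREAD_MUTEX_INITIALIZER;  /* the sb buffer lock */
static pthread_mutex_t force_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ic_force_wait = PTHREAD_COND_INITIALIZER;  /* iclog force waitqueue */
static int log_done;

/* Models xfs_sync_sb_buf(): hold the sb buffer locked across a sync commit. */
static void *committer(void *arg)
{
	pthread_mutex_lock(&sb_buf_lock);        /* xfs_trans_getsb + xfs_trans_bhold */

	pthread_mutex_lock(&force_lock);
	while (!log_done)                        /* xlog_wait(&iclog->ic_force_wait) */
		pthread_cond_wait(&ic_force_wait, &force_lock);
	pthread_mutex_unlock(&force_lock);

	pthread_mutex_unlock(&sb_buf_lock);      /* xfs_buf_relse(): never reached */
	return NULL;
}

/* Models xlog_ioend_work() on shutdown: unpin buffers, then wake log waiters. */
static void *log_ioend(void *arg)
{
	sleep(1);                                /* let the committer take the lock first */

	pthread_mutex_lock(&sb_buf_lock);        /* xfs_buf_item_unpin -> xfs_buf_lock: blocks forever */
	pthread_mutex_unlock(&sb_buf_lock);

	pthread_mutex_lock(&force_lock);
	log_done = 1;
	pthread_cond_broadcast(&ic_force_wait);  /* wake_up_all(): never reached */
	pthread_mutex_unlock(&force_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, committer, NULL);
	pthread_create(&b, NULL, log_ioend, NULL);
	pthread_join(a, NULL);                   /* hangs: each thread waits on the other */
	pthread_join(b, NULL);
	printf("no deadlock\n");                 /* never printed */
	return 0;
}

Thread A can only be woken by thread B, but thread B must first take the
lock thread A is sitting on, so neither makes progress.
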
>>>>>
>>>>> Why is this deadlock specific to the superblock buffer?
>>>>
>>>> Hi Dave,
>>>>
>>>> Thanks a lot for your review! We found this problem while doing some code
>>>> reading (which helped us fix another growfs bug), and then reproduced it
>>>> easily by setting the label online frequently with I/O error injection
>>>> happening at the same time.
>>>
>>> Right, I know how it can be triggered; that's not actually my
>>> concern...
>>>
>>>>> Can't any buffer that is held locked over a synchronous transaction
>>>>> commit deadlock during a shutdown like this?
>>>>
>>>> After checking every place that uses xfs_trans_bhold, it seems xfs_sync_sb_buf
>>>> is the only culprit that combines xfs_trans_bhold with xfs_trans_set_sync
>>>> (I'm not familiar with xfs yet, so my code check may have missed something)...
>>>
>>> Yes, I can also see that. But my concern is that this change only
>>> addresses the symptom, but leaves the underlying deadlock unsolved.
>>>
>>> Indeed, this isn't xfs_trans_commit() I'm worried about here; it's
>>> the call to xfs_log_force(mp, XFS_LOG_SYNC) or
>>> xfs_log_force_seq(XFS_LOG_SYNC) with a buffer held locked that I'm
>>> worried about.
>>>
>>> i.e. We have a buffer in the CIL (from a previous transaction) that
>>> we currently hold locked while we call xfs_log_force(XFS_LOG_SYNC).
>>> If a shutdown occurs while we are waiting for journal IO completion
>>> to occur, then xlog_ioend_work() will attempt to lock the buffer and
>>> deadlock, right?
>>>
>>> e.g. I'm thinking of things like busy extent flushing (hold AGF +
>>> AGFL + AG btree blocks locked when we call xfs_log_force()) could
>>> also be vulnerable to the same deadlock...
>>
>> You mean something like xfs_allocbt_alloc_block (which calls xfs_log_force
>> to flush busy extents while keeping the AGF locked at the same time)?
>>
>> There we call xfs_log_force(mp, XFS_LOG_SYNC) after locking the AGF and
>> before xfs_trans_commit. That seems OK, because xfs_buf_item_unpin will not
>> call xfs_buf_lock while bli_refcount is still elevated (once we hold the
>> locked AGF, bli_refcount is incremented in _xfs_trans_bjoin and stays held
>> until either xfs_trans_commit succeeds (clean AGF item) or .iop_unpin is
>> called from xlog_ioend_work (dirty AGF item), which can also happen after
>> xfs_trans_commit)...
>
> Again, I gave an example of the class of issue I'm worried about.
> Again, you chased the one example given through, but haven't
> mentioned a thing about all the other code paths that lead to
> xfs_log_force(SYNC) that might hold buffers locked that I didn't
> mention.
>
> I don't want to have to ask every person who proposes a fix about
> every possible code path the bug may manifest in -one at a time-. I
> use examples to point you in the right direction for further
> analysis of the rest of the code base, not because that's the only
> thing I want checked. Please use your initiative to look at all the
> callers of xfs_log_force(SYNC) and determine if they are all safe or
> whether there are landmines lurking or even more bugs of a similar
> sort.
Hi Dave,
Thank you very much for pointing this out! I'm sorry for not doing a
comprehensive investigation into whether any other places can trigger
this bug as well...
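
For what it's worth, this is the unpin logic my AGF reasoning above relied
on, as a condensed paraphrase of my reading of fs/xfs/xfs_buf_item.c (the
names are from the kernel, but the body here is simplified, not the
verbatim code):

/*
 * Condensed paraphrase of xfs_buf_item_unpin(): the blocking
 * xfs_buf_lock() is only taken when bli_refcount drops to zero, so a
 * transaction that still has the buffer joined (elevated refcount)
 * never reaches it from this path.
 */
static void buf_item_unpin_sketch(struct xfs_buf_log_item *bip, int remove)
{
	struct xfs_buf	*bp = bip->bli_buf;
	int		stale = bip->bli_flags & XFS_BLI_STALE;
	int		freed = atomic_dec_and_test(&bip->bli_refcount);

	if (freed && stale) {
		xfs_buf_lock(bp);	/* last reference to a stale buffer */
		/* ... tear down the stale buffer ... */
	} else if (freed && remove) {
		xfs_buf_lock(bp);	/* shutdown abort: the lock that deadlocks above */
		/* ... fail the buffer I/O ... */
	}
	/* !freed: some transaction still references the item, no locking here */
}

If that reading is right, the AGF case is safe only because the transaction
still owns a bli_refcount reference while it sleeps in xfs_log_force(), and
the audit should confirm whether every other caller holding locked buffers
has the same protection.
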
>
> When we learn about a new issue, this is the sort of audit work that
> is necessary to determine the scope of the issue. We need to perform
> such audits because they direct the scope of the fix necessary. We
> are not interested in slapping a band-aid fix over the symptom that
> was reported - that only leads to more band-aid fixes as the same
> issue appears in other places.
Yes, I agree, and thanks for your advice. It will really help me avoid
band-aid fixes that only lead to more band-aid fixes, so I can
contribute better!
>
> Now we know there is a lock ordering problem in this code, so before
> we attempt to fix it we need to know how widespread it is, what the
> impact is, how different code paths avoid it, etc. That requires a
> code audit to determine, and that requires looking at all the paths
> into xfs_log_force(XFS_LOG_SYNC) to determine if they are safe or
> not and documenting that.
>
> Yes, it's more work *right now* than slapping a quick band-aid fix
> over it, but it's much less work in the long run for us and we don't
> have to keep playing whack-a-mole because we fixed it the right way
> the first time.
>
I will try to look at all the paths into xfs_log_force(XFS_LOG_SYNC) or
xfs_log_force_seq(XFS_LOG_SYNC) and check whether each one is safe or
not. Thanks again for your advice!
Thanks,
Yang Erkun.
> -Dave.