From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EC0BEB64DD for ; Fri, 11 Aug 2023 06:21:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231423AbjHKGVX (ORCPT ); Fri, 11 Aug 2023 02:21:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54656 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229706AbjHKGVW (ORCPT ); Fri, 11 Aug 2023 02:21:22 -0400 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1FA32D48 for ; Thu, 10 Aug 2023 23:21:19 -0700 (PDT) Received: from kwepemi500022.china.huawei.com (unknown [172.30.72.54]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4RMYYZ4dhRzkX3M; Fri, 11 Aug 2023 14:20:02 +0800 (CST) Received: from [10.174.177.210] (10.174.177.210) by kwepemi500022.china.huawei.com (7.221.188.64) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Fri, 11 Aug 2023 14:21:16 +0800 Message-ID: Date: Fri, 11 Aug 2023 14:21:15 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1 Subject: Re: [PATCH] xfs: fix deadlock when set label online To: yangerkun , Dave Chinner CC: , , , , References: <20230626131542.3711391-1-yangerkun@huaweicloud.com> <4d6ee3b3-6d4b-ddb6-eb8e-e04a7e0c1ab0@huaweicloud.com> <4139563b-8918-d89a-c926-4155228a12dc@huaweicloud.com> From: yangerkun In-Reply-To: <4139563b-8918-d89a-c926-4155228a12dc@huaweicloud.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.177.210] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To kwepemi500022.china.huawei.com (7.221.188.64) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org 在 2023/6/30 10:19, yangerkun 写道: > > > 在 2023/6/30 6:24, Dave Chinner 写道: >> On Thu, Jun 29, 2023 at 07:55:10PM +0800, yangerkun wrote: >>> 在 2023/6/29 7:10, Dave Chinner 写道: >>>> On Tue, Jun 27, 2023 at 04:42:41PM +0800, yangerkun wrote: >>>>> 在 2023/6/27 5:45, Dave Chinner 写道: >>>>>> On Mon, Jun 26, 2023 at 09:15:42PM +0800, yangerkun wrote: >>>>>>> From: yangerkun >>>>>>> >>>>>>> Combine use of xfs_trans_hold and xfs_trans_set_sync in >>>>>>> xfs_sync_sb_buf >>>>>>> can trigger a deadlock once shutdown happened concurrently. >>>>>>> xlog_ioend_work >>>>>>> will first unpin the sb(which stuck with xfs_buf_lock), then wakeup >>>>>>> xfs_sync_sb_buf. However, xfs_sync_sb_buf never get the chance to >>>>>>> unlock >>>>>>> sb until been wakeup by xlog_ioend_work. >>>>>>> >>>>>>> xfs_sync_sb_buf >>>>>>>      xfs_trans_getsb // lock sb buf >>>>>>>      xfs_trans_bhold // sb buf keep lock until success commit >>>>>>>      xfs_trans_commit >>>>>>>      ... >>>>>>>        xfs_log_force_seq >>>>>>>          xlog_force_lsn >>>>>>>            xlog_wait_on_iclog >>>>>>>              xlog_wait(&iclog->ic_force_wait... // shutdown happened >>>>>>>      xfs_buf_relse // unlock sb buf >>>>>>> >>>>>>> xlog_ioend_work >>>>>>>      xlog_force_shutdown >>>>>>>        xlog_state_shutdown_callbacks >>>>>>>          xlog_cil_process_committed >>>>>>>            xlog_cil_committed >>>>>>>            ... >>>>>>>            xfs_buf_item_unpin >>>>>>>              xfs_buf_lock // deadlock >>>>>>>          wake_up_all(&iclog->ic_force_wait) >>>>>>> >>>>>>> xfs_ioc_setlabel use xfs_sync_sb_buf to make sure userspace will >>>>>>> see the >>>>>>> change for sb immediately. We can simply call >>>>>>> xfs_ail_push_all_sync to >>>>>>> do this and sametime fix the deadlock. >>>>>> >>>>>> Why is this deadlock specific to the superblock buffer? >>>>> >>>>> Hi Dave, >>>>> >>>>> Thanks a lot for your revirew! We find this problem when do some code >>>>> reading(which can help us to fix another growfs bug). And then >>>>> reproduce it >>>>> easily when we set label online frequently with IO error inject at the >>>>> sametime. >>>> >>>> Right, I know how it can be triggered; that's not actually my >>>> concern... >>>> >>>>>> Can't any buffer that is held locked over a synchronous transaction >>>>>> commit deadlock during a shutdown like this? >>>>> >>>>> After check all place use xfs_buf_bhold, it seems xfs_sync_sb_buf >>>>> is the >>>>> only convict that combine use xfs_trans_hold and >>>>> xfs_trans_set_sync(I'm not >>>>> familiar with xfs yet, so I may have some problems with my code >>>>> check)... >>>> >>>> Yes, I can also see that. But my concern is that this change only >>>> addresses the symptom, but leaves the underlying deadlock unsolved. >>>> >>>> Indeed, this isn't xfs_trans_commit() I'm worried about here; it's >>>> the call to xfs_log_force(mp, XFS_LOG_SYNC) or >>>> xfs_log_force_seq(XFS_LOG_SYNC) with a buffer held locked that I'm >>>> worried about. >>>> >>>> i.e. We have a buffer in the CIL (from a previous transaction) that >>>> we currently hold locked while we call xfs_log_force(XFS_LOG_SYNC). >>>> If a shutdown occurs while we are waiting for journal IO completion >>>> to occur, then xlog_ioend_work() will attempt to lock the buffer and >>>> deadlock, right? >>>> >>>> e.g. I'm thinking of things like busy extent flushing (hold AGF + >>>> AGFL + AG btree blocks locked when we call xfs_log_force()) could >>>> also be vulnerable to the same deadlock... >>> >>> You mean something like xfs_allocbt_alloc_block(call xfs_log_force to >>> flush busy extent which keep agf locked sametime)? >>> >>> We call xfs_log_force(mp, XFS_LOG_SYNC) after lock agf and before >>> xfs_trans_commit. It seems ok since xfs_buf_item_unpin will not call >>> xfs_buf_lock because bli_refcount still keep active(once we hold locked >>> agf, the bli_refcount will inc in _xfs_trans_bjoin, and keep it until >>> xfs_trans_commit success(clean agf item) or .iop_unpin(dirty agf item, >>> call from xlog_ioend_work) which can be called after xfs_trans_commit >>> too)... >> >> Again, I gave an example of the class of issue I'm worried about. >> Again, you chased the one example given through, but haven't >> mentioned a thing about all the other code paths that lead to >> xfs_log_force(SYNC) that might hold buffers locked that I didn't >> mention. >> >> I don't want to have to ask every person who proposes a fix about >> every possible code path the bug may manifest in -one at a time-.  I >> use examples to point you in the right direction for further >> analysis of the rest of the code base, not because that's the only >> thing I want checked. Please use your initiative to look at all the >> callers of xfs_log_force(SYNC) and determine if they are all safe or >> whether there are landmines lurked or even more bugs of a similar >> sort. > > Hi Dave, > > Thank you very much for pointing this out! I'm so sorry for the lack of > awareness of a comprehensive investigation does there any other place > can trigger the bug too... > >> >> When we learn about a new issue, this is the sort of audit work that >> is necessary to determine the scope of the issue. We need to perform >> such audits because they direct the scope of the fix necessary. We >> are not interested in slapping a band-aid fix over the symptom that >> was reported - that only leads to more band-aid fixes as the same >> issue appears in other places. > > Yes, agree with you and thanks for your advise, it can really help me to > forbid a band-aid fix however leads to more band-aid fixes, so can > contribute better! > >> >> Now we know there is a lock ordering problem in this code, so before >> we attempt to fix it we need to know how widespread it is, what the >> impact is, how different code paths avoid it, etc. That requires a >> code audit to determine, and that requires looking at all the paths >> into xfs_log_force(XFS_LOG_SYNC) to determine if they are safe or >> not and documenting that. >> >> Yes, it's more work *right now* than slapping a quick band-aid fix >> over it, but it's much less work in the long run for us and we don't >> have to keep playing whack-a-mole because we fixed it the right way >> the first time. >> > > I will try to look all paths into xfs_log_force(XFS_LOG_SYNC) or > xfs_log_force_seq(XFS_LOG_SYNC) to check if it's safe or not. Thanks > again for your advise! > > Thanks, > Yang Erkun. > >> -Dave. > Hi, Dave, Sorry for the late reply, I was quiet busy last month and it also took me long time to check does all the callers of xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) was safe. I'm not familiar with xfs yet, so if there's anything wrong with the description below, please point it out! The logic I choose was to check will we call xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) between xfs_buf_lock/xfs_buf_trylock and xfs_buf_unlock at the same thread context(I have check other item's .iop_unpin, it seems only xfs_buf item can trigger the problem since it will try to lock the buf in xfs_buf_item_unpin; besides, different thread context call for xfs_buf_lock&xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) and xfs_buf_unlock is safe too since this unlock will not wait until xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) success return), and once it happend, will we trigger the bug too? I divide the logic of calling xfs_buf_lock/xfs_buf_trylock into two categories: 1. Later the xfs_buf will join the tp(xfs_trans_bjoin will inc .bli_refcount) a. xfs_trans_bjoin will inc .bli_refcount b. xfs_buf_item_pin will inc .bli_refcount when the item of xfs_buf was dirty c. xfs_buf_item_committing will dec .bli_refcount no matter the item was dirty or not, and normally it will unlock the xfs_buf, or keep the xfs_buf locked when we see XFS_BLI_HOLD d. xfs_buf_item_unpin will dec .bli_refcount, and it won't call xfs_buf_lock when another .bli_refcount exist xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) can happend before we commit the tp(like xfs_create, it will first read&lock agi buf, then call xfs_dir_createname, which may trigger agfl fixup, the block allocation may see the busy extent and call xfs_log_force(SYNC) to flush the busy extent journal). It won't trigger the problem since xfs_trans_bjoin will keep another .bli_refcount. xfs_log_force_seq(SYNC) can happend when we commit the tp since we see XFS_TRANS_SYNC(see __xfs_trans_commit), the only case we can trigger the deadlock was that we have combine called xfs_trans_bhold, or we will unlock xfs_buf in xfs_buf_item_committing. The case which this patch try to fix was the only case combine call for xfs_trans_bhold and set XFS_TRANS_SYNC. After commit tp, xfs_buf will only keep locked because of xfs_trans_bhold, and once there is a XFS_TRANS_PERM_LOG_RES tp, we may trigger another commit, then xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) can happend too. But it is safe too since we will first rejoin xfs_buf to tp which help protect us. 2. The xfs_buf won't join the tp xfs_buf_readahead_map xfs_buf_read xfs_buf_get xfs_buf_incore xfs_buf_delwri_cancel xfs_buf_delwri_submit_buffers xfs_buf_delwri_pushbuf xfs_buf_item_unpin xfs_iflush_shutdown_abort xfs_log_quiesce xlog_do_recover xfs_freesb xfs_add_incompat_log_feature xfs_clear_incompat_log_features Most case above was io, for xfs_buf_unlock called from another thread context, it is safe; for xfs_buf_unlock called from same thread context, it is safe too since we won't trigger xfs_log_force(SYNC)/xfs_log_force_seq(SYNC) between xfs_buf_lock and xfs_buf_unlock. From above, it seems only xfs_sync_sb_buf can trigger this deadlock... And I prefer to add some comments to xfs_trans_bhold to notice that there is a bug when combine use xfs_trans_bhold and xfs_log_force(SYNC)/xfs_log_force_seq(SYNC)... Dave, sorry again for the late reply, and look forward to your reply! Thanks, Yang Erkun.