From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from fout-b5-smtp.messagingengine.com (fout-b5-smtp.messagingengine.com [202.12.124.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 771A21A841B for ; Mon, 5 May 2025 17:19:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.148 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746465573; cv=none; b=LKLWm1yNZHgT9dggJKKCW5rEmm3Ulab/CJQHJwQGjYqfRS/YF04YJ0E9mCJupNbHPgzaVRRTIYq7lEGcsmT+x9gprWOFfolMxYDNV6p2bSNuph3Vs2DUIbc5gUQ1UhkBwbL2vn6heVsSSLWlT+UdroAnXyEM+EYYPJgHc2n3l3g= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746465573; c=relaxed/simple; bh=CcM3ELPv3CJmNXkr83Gn+5EiKnkRXmbPnEjwWcgy9rs=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=fCijcdy0xAnegL8PgpXFDZAjcWtBUNoQu02Tccmb/YHPpYLvSv9s3oglEPzsQnc//4rNN+jroOGu5JuRJpsorKoxjhgLnKZmoXdIT4mp9wIYP7HBxcD5lwYCFeBpE/wi6ILwWMfXlO7taqOuDl6IVx7MVMg36kLkgHWok/jpHtI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=bur.io; spf=pass smtp.mailfrom=bur.io; dkim=pass (2048-bit key) header.d=bur.io header.i=@bur.io header.b=iWsD7gNM; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=LRc6B7Jn; arc=none smtp.client-ip=202.12.124.148 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=bur.io Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bur.io Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bur.io header.i=@bur.io header.b="iWsD7gNM"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="LRc6B7Jn" Received: from phl-compute-05.internal (phl-compute-05.phl.internal [10.202.2.45]) by mailfout.stl.internal (Postfix) with ESMTP id 7A0681140217; Mon, 5 May 2025 13:19:29 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-05.internal (MEProxy); Mon, 05 May 2025 13:19:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1746465569; x=1746551969; bh=YGbkHepHbX VeLojwHTi/aBSrAQ11O97htRMbmtAnWE8=; b=iWsD7gNM6OTRzbkzQ1gzj3dLiD O4V4S/5XroyvDUNJtWXW4akCVAl90d2rNMQ6bftuVRxs4V16uNKBM7OW4lvO2YdD VyRoE0bceJgZ5+7CFmaNdv7cJCViwBwTUKBQ7UFoZkFcKa3dlN71/D58iTcAyx18 26NriikkbxWAi8qpWuTgyRN9Pvo2w1rVABGvzM2Uu3pSVOOHHlK4qc6qe17tXsoP uhHRo1oAVzyipfeeV/9Uz7Fv7eYGgXVdhuGPNY1XWt9O2H1SBY9GqR/PIyWa31/e OoWdR/aSH/50MpB7YVuws2s3Oiwcw2KeGWsM5W9LmqYa58FEgk1OnNc5Mtsw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t= 1746465569; x=1746551969; bh=YGbkHepHbXVeLojwHTi/aBSrAQ11O97htRM bmtAnWE8=; b=LRc6B7JnvYRyAyhRf4sbwM5FERQZqn/NaHJcLTOHFJhDKCrzg81 UuqU1n2YJNy5plpCarx+VcwWJZ81e6jPfjWtk5Jy5MLK1iHKxoPjqk0B3Wvb0gXR 4e9jrgC9J4kl+b12NiGAMYTq9eOou/BMWLmW2k5RQpVOsJ4vltcpBBIub2qxcmip 0ms5weJyHpp1rt6ZcXjRsTp/fttp6TovEI+1hgx+iMjFureIbYeKInvOFNmMVN3D D9g91R0EjjiOK4G3G5ERIHxEFGmWZ000S3w8ytlRsKi0+vNml/6lK6mQZqfckRRx 3Dv/cEw6LlG1cp1B7LmPah555KPTxWpCNPQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvkeduieekucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucenucfjughrpeffhf fvvefukfhfgggtuggjsehttdertddttddvnecuhfhrohhmpeeuohhrihhsuceuuhhrkhho vhcuoegsohhrihhssegsuhhrrdhioheqnecuggftrfgrthhtvghrnhepvdetgfejtdffgf efheeifeduudfgledugfdugedvledvtedtieffteeiheefieevnecuffhomhgrihhnpeho phgvnhhsuhhsvgdrohhrghdpshhushgvrdgtohhmnecuvehluhhsthgvrhfuihiivgeptd enucfrrghrrghmpehmrghilhhfrhhomhepsghorhhishessghurhdrihhopdhnsggprhgt phhtthhopedvpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehfughmrghnrghnrg eskhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinhhugidqsghtrhhfshesvhhgvghr rdhkvghrnhgvlhdrohhrgh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 5 May 2025 13:19:28 -0400 (EDT) Date: Mon, 5 May 2025 10:20:15 -0700 From: Boris Burkov To: fdmanana@kernel.org Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH 1/2] btrfs: fix discard worker infinite loop after disabling discard Message-ID: <20250505172015.GA3436025@zen.localdomain> References: <363580a4634fb10d27f167814d9c8fcdb1a89e51.1746460035.git.fdmanana@suse.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <363580a4634fb10d27f167814d9c8fcdb1a89e51.1746460035.git.fdmanana@suse.com> On Mon, May 05, 2025 at 04:49:55PM +0100, fdmanana@kernel.org wrote: > From: Filipe Manana > > If the discard worker is running and there's currently only one block > group, that block group is a data block group, it's in the unused block > groups discard list and is being used (it got an extent allocated from it > after becoming unused), the worker can end up in an infinite loop if a > transaction abort happens or the async discard is disabled (during remount > or unmount for example). > > This happens like this: > > 1) Task A, the discard worker, is at peek_discard_list() and > find_next_block_group() returns block group X; > > 2) Block group X is in the unused block groups discard list (its discard > index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past > it become an unused block group and was added to that list, but then > later it got an extent allocated from it, so its ->used counter is not > zero anymore; > > 3) The current transaction is aborted by task B and we end up at > __btrfs_handle_fs_error() in the transaction abort path, where we call > btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from > fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode > (setting SB_RDONLY in the super block's s_flags field); > > 4) Task A calls __add_to_discard_list() with the goal of moving the block > group from the unused block groups discard list into another discard > list, but at __add_to_discard_list() we end up doing nothing because > btrfs_run_discard_work() returns false, since the super block has > SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set > anymore in fs_info->flags. So block group X remains in the unused block > groups discard list; > > 5) Task A then does a goto into the 'again' label, calls > find_next_block_group() again we gets block group X again. Then it > repeats the previous steps over and over since there are not other > block groups in the discard lists and block group X is never moved > out of the unused block groups discard list since > btrfs_run_discard_work() keeps return false and there > __add_to_discard_list() doesn't move block group X out of that discard > list. > > When this happens we can get a soft lockup report like this: > > [ 71.957163][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97] > [ 71.957169][ C0] Modules linked in: xfs af_packet rfkill (...) > [ 71.957236][ C0] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa > [ 71.957241][ C0] Tainted: [W]=WARN > [ 71.957242][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 > [ 71.957244][ C0] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] > [ 71.957307][ C0] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] > [ 71.957355][ C0] Code: c1 01 48 83 (...) > [ 71.957357][ C0] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 > [ 71.957359][ C0] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 > [ 71.957360][ C0] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad > [ 71.957361][ C0] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 > [ 71.957362][ C0] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 > [ 71.957363][ C0] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 > [ 71.957368][ C0] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000 > [ 71.957369][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 71.957370][ C0] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0 > [ 71.957373][ C0] PKRU: 55555554 > [ 71.957374][ C0] Call Trace: > [ 71.957377][ C0] > [ 71.957381][ C0] process_one_work+0x17e/0x330 > [ 71.957386][ C0] worker_thread+0x2ce/0x3f0 > [ 71.957389][ C0] ? __pfx_worker_thread+0x10/0x10 > [ 71.957391][ C0] kthread+0xef/0x220 > [ 71.957394][ C0] ? __pfx_kthread+0x10/0x10 > [ 71.957397][ C0] ret_from_fork+0x34/0x50 > [ 71.957401][ C0] ? __pfx_kthread+0x10/0x10 > [ 71.957403][ C0] ret_from_fork_asm+0x1a/0x30 > [ 71.957410][ C0] > [ 71.957412][ C0] Kernel panic - not syncing: softlockup: hung tasks > [ 71.987152][ C0] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa > [ 71.989063][ C0] Tainted: [W]=WARN, [L]=SOFTLOCKUP > [ 71.989934][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 > [ 71.991832][ C0] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] > [ 71.992925][ C0] Call Trace: > [ 71.993601][ C0] > [ 71.994202][ C0] dump_stack_lvl+0x5a/0x80 > [ 71.994923][ C0] panic+0x10b/0x2da > [ 71.995582][ C0] watchdog_timer_fn.cold+0x9a/0xa1 > [ 71.996366][ C0] ? __pfx_watchdog_timer_fn+0x10/0x10 > [ 71.997148][ C0] __hrtimer_run_queues+0x132/0x2a0 > [ 71.997913][ C0] hrtimer_interrupt+0xff/0x230 > [ 71.998630][ C0] __sysvec_apic_timer_interrupt+0x55/0x100 > [ 71.999435][ C0] sysvec_apic_timer_interrupt+0x6c/0x90 > [ 72.000208][ C0] > [ 72.000732][ C0] > [ 72.001250][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > [ 72.002056][ C0] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] > [ 72.002950][ C0] Code: c1 01 48 83 (...) > [ 72.005191][ C0] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 > [ 72.006011][ C0] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 > [ 72.006978][ C0] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad > [ 72.007939][ C0] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 > [ 72.008895][ C0] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 > [ 72.009861][ C0] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 > [ 72.010825][ C0] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f] > [ 72.011978][ C0] process_one_work+0x17e/0x330 > [ 72.012663][ C0] worker_thread+0x2ce/0x3f0 > [ 72.013325][ C0] ? __pfx_worker_thread+0x10/0x10 > [ 72.014044][ C0] kthread+0xef/0x220 > [ 72.014647][ C0] ? __pfx_kthread+0x10/0x10 > [ 72.015302][ C0] ret_from_fork+0x34/0x50 > [ 72.015939][ C0] ? __pfx_kthread+0x10/0x10 > [ 72.016592][ C0] ret_from_fork_asm+0x1a/0x30 > [ 72.017281][ C0] > [ 72.017931][ C0] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 72.019478][ C0] Rebooting in 90 seconds.. > > So fix this by making sure we move a block group out of the unused block > groups discard list when calling __add_to_discard_list(). > > Fixes: 2bee7eb8bb81 ("btrfs: discard one region at a time in async discard") Fix LGTM, thanks. I think it wouldn't hurt to also add another layer of defense of bailing out of the again: loop in peek_discard_list() if btrfs_run_discard_work() returns false. OTOH, that might wallpaper over some other bug like this one in a way that bites us. Reviewed-by: Boris Burkov > Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012 > Signed-off-by: Filipe Manana > --- > fs/btrfs/discard.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > index d6eef4bd9e9d..de23c4b3515e 100644 > --- a/fs/btrfs/discard.c > +++ b/fs/btrfs/discard.c > @@ -94,8 +94,6 @@ static void __add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, > struct btrfs_block_group *block_group) > { > lockdep_assert_held(&discard_ctl->lock); > - if (!btrfs_run_discard_work(discard_ctl)) > - return; > > if (list_empty(&block_group->discard_list) || > block_group->discard_index == BTRFS_DISCARD_INDEX_UNUSED) { > @@ -118,6 +116,9 @@ static void add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, > if (!btrfs_is_block_group_data_only(block_group)) > return; > > + if (!btrfs_run_discard_work(discard_ctl)) > + return; > + > spin_lock(&discard_ctl->lock); > __add_to_discard_list(discard_ctl, block_group); > spin_unlock(&discard_ctl->lock); > @@ -244,6 +245,18 @@ static struct btrfs_block_group *peek_discard_list( > block_group->used != 0) { > if (btrfs_is_block_group_data_only(block_group)) { > __add_to_discard_list(discard_ctl, block_group); > + /* > + * The block group must have been moved to other > + * discard list even if discard was disabled in > + * the meantime or a transaction abort happened, > + * otherwise we can end up in an infinite loop, > + * always jumping into the 'again' label and > + * keep getting this block group over and over > + * in case there are no other block groups in > + * the discard lists. > + */ > + ASSERT(block_group->discard_index != > + BTRFS_DISCARD_INDEX_UNUSED); > } else { > list_del_init(&block_group->discard_list); > btrfs_put_block_group(block_group); > -- > 2.47.2 >