public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Boris Burkov <boris@bur.io>,
	Daniel Vacek <neelx@suse.com>, Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: [PATCH 5.15 31/59] btrfs: fix discard worker infinite loop after disabling discard
Date: Tue, 20 May 2025 15:50:22 +0200	[thread overview]
Message-ID: <20250520125755.095235604@linuxfoundation.org> (raw)
In-Reply-To: <20250520125753.836407405@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit 54db6d1bdd71fa90172a2a6aca3308bbf7fa7eb5 upstream.

If the discard worker is running and there's currently only one block
group, that block group is a data block group, it's in the unused block
groups discard list and is being used (it got an extent allocated from it
after becoming unused), the worker can end up in an infinite loop if a
transaction abort happens or the async discard is disabled (during remount
or unmount for example).

This happens like this:

1) Task A, the discard worker, is at peek_discard_list() and
   find_next_block_group() returns block group X;

2) Block group X is in the unused block groups discard list (its discard
   index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past
   it become an unused block group and was added to that list, but then
   later it got an extent allocated from it, so its ->used counter is not
   zero anymore;

3) The current transaction is aborted by task B and we end up at
   __btrfs_handle_fs_error() in the transaction abort path, where we call
   btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from
   fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode
   (setting SB_RDONLY in the super block's s_flags field);

4) Task A calls __add_to_discard_list() with the goal of moving the block
   group from the unused block groups discard list into another discard
   list, but at __add_to_discard_list() we end up doing nothing because
   btrfs_run_discard_work() returns false, since the super block has
   SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set
   anymore in fs_info->flags. So block group X remains in the unused block
   groups discard list;

5) Task A then does a goto into the 'again' label, calls
   find_next_block_group() again we gets block group X again. Then it
   repeats the previous steps over and over since there are not other
   block groups in the discard lists and block group X is never moved
   out of the unused block groups discard list since
   btrfs_run_discard_work() keeps returning false and therefore
   __add_to_discard_list() doesn't move block group X out of that discard
   list.

When this happens we can get a soft lockup report like this:

  [71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97]
  [71.957] Modules linked in: xfs af_packet rfkill (...)
  [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G        W          6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa
  [71.957] Tainted: [W]=WARN
  [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs]
  [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs]
  [71.957] Code: c1 01 48 83 (...)
  [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246
  [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000
  [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad
  [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080
  [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860
  [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000
  [71.957] FS:  0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000
  [71.957] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0
  [71.957] PKRU: 55555554
  [71.957] Call Trace:
  [71.957]  <TASK>
  [71.957]  process_one_work+0x17e/0x330
  [71.957]  worker_thread+0x2ce/0x3f0
  [71.957]  ? __pfx_worker_thread+0x10/0x10
  [71.957]  kthread+0xef/0x220
  [71.957]  ? __pfx_kthread+0x10/0x10
  [71.957]  ret_from_fork+0x34/0x50
  [71.957]  ? __pfx_kthread+0x10/0x10
  [71.957]  ret_from_fork_asm+0x1a/0x30
  [71.957]  </TASK>
  [71.957] Kernel panic - not syncing: softlockup: hung tasks
  [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G        W    L     6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa
  [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP
  [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs]
  [71.992] Call Trace:
  [71.993]  <IRQ>
  [71.994]  dump_stack_lvl+0x5a/0x80
  [71.994]  panic+0x10b/0x2da
  [71.995]  watchdog_timer_fn.cold+0x9a/0xa1
  [71.996]  ? __pfx_watchdog_timer_fn+0x10/0x10
  [71.997]  __hrtimer_run_queues+0x132/0x2a0
  [71.997]  hrtimer_interrupt+0xff/0x230
  [71.998]  __sysvec_apic_timer_interrupt+0x55/0x100
  [71.999]  sysvec_apic_timer_interrupt+0x6c/0x90
  [72.000]  </IRQ>
  [72.000]  <TASK>
  [72.001]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
  [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs]
  [72.002] Code: c1 01 48 83 (...)
  [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246
  [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000
  [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad
  [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080
  [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860
  [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000
  [72.010]  ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f]
  [72.011]  process_one_work+0x17e/0x330
  [72.012]  worker_thread+0x2ce/0x3f0
  [72.013]  ? __pfx_worker_thread+0x10/0x10
  [72.014]  kthread+0xef/0x220
  [72.014]  ? __pfx_kthread+0x10/0x10
  [72.015]  ret_from_fork+0x34/0x50
  [72.015]  ? __pfx_kthread+0x10/0x10
  [72.016]  ret_from_fork_asm+0x1a/0x30
  [72.017]  </TASK>
  [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  [72.019] Rebooting in 90 seconds..

So fix this by making sure we move a block group out of the unused block
groups discard list when calling __add_to_discard_list().

Fixes: 2bee7eb8bb81 ("btrfs: discard one region at a time in async discard")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/discard.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -78,8 +78,6 @@ static void __add_to_discard_list(struct
 				  struct btrfs_block_group *block_group)
 {
 	lockdep_assert_held(&discard_ctl->lock);
-	if (!btrfs_run_discard_work(discard_ctl))
-		return;
 
 	if (list_empty(&block_group->discard_list) ||
 	    block_group->discard_index == BTRFS_DISCARD_INDEX_UNUSED) {
@@ -102,6 +100,9 @@ static void add_to_discard_list(struct b
 	if (!btrfs_is_block_group_data_only(block_group))
 		return;
 
+	if (!btrfs_run_discard_work(discard_ctl))
+		return;
+
 	spin_lock(&discard_ctl->lock);
 	__add_to_discard_list(discard_ctl, block_group);
 	spin_unlock(&discard_ctl->lock);
@@ -233,6 +234,18 @@ again:
 		    block_group->used != 0) {
 			if (btrfs_is_block_group_data_only(block_group)) {
 				__add_to_discard_list(discard_ctl, block_group);
+				/*
+				 * The block group must have been moved to other
+				 * discard list even if discard was disabled in
+				 * the meantime or a transaction abort happened,
+				 * otherwise we can end up in an infinite loop,
+				 * always jumping into the 'again' label and
+				 * keep getting this block group over and over
+				 * in case there are no other block groups in
+				 * the discard lists.
+				 */
+				ASSERT(block_group->discard_index !=
+				       BTRFS_DISCARD_INDEX_UNUSED);
 			} else {
 				list_del_init(&block_group->discard_list);
 				btrfs_put_block_group(block_group);



  parent reply	other threads:[~2025-05-20 13:54 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-20 13:49 [PATCH 5.15 00/59] 5.15.184-rc1 review Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 01/59] platform/x86: asus-wmi: Fix wlan_ctrl_by_user detection Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 02/59] tracing: probes: Fix a possible race in trace_probe_log APIs Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 03/59] iio: adc: ad7768-1: Fix insufficient alignment of timestamp Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 04/59] iio: chemical: sps30: use aligned_s64 for timestamp Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 05/59] RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 06/59] nfs: handle failure of nfs_get_lock_context in unlock path Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 07/59] spi: loopback-test: Do not split 1024-byte hexdumps Greg Kroah-Hartman
2025-05-20 13:49 ` [PATCH 5.15 08/59] net_sched: Flush gso_skb list too during ->change() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 09/59] net: cadence: macb: Fix a possible deadlock in macb_halt_tx Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 10/59] net: dsa: sja1105: discard incoming frames in BR_STATE_LISTENING Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 11/59] ALSA: sh: SND_AICA should depend on SH_DMA_API Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 12/59] qlcnic: fix memory leak in qlcnic_sriov_channel_cfg_cmd() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 13/59] NFSv4/pnfs: Reset the layout state after a layoutreturn Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 14/59] x86,nospec: Simplify {JMP,CALL}_NOSPEC Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 15/59] x86/speculation: Simplify and make CALL_NOSPEC consistent Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 16/59] x86/speculation: Add a conditional CS prefix to CALL_NOSPEC Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 17/59] x86/speculation: Remove the extra #ifdef around CALL_NOSPEC Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 18/59] Documentation: x86/bugs/its: Add ITS documentation Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 19/59] x86/its: Enumerate Indirect Target Selection (ITS) bug Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 20/59] x86/its: Add support for ITS-safe indirect thunk Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 21/59] x86/alternative: Optimize returns patching Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 22/59] x86/alternatives: Remove faulty optimization Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 23/59] x86/its: Add support for ITS-safe return thunk Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 24/59] x86/its: Enable Indirect Target Selection mitigation Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 25/59] x86/its: Add "vmexit" option to skip mitigation on some CPUs Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 26/59] x86/its: Align RETs in BHB clear sequence to avoid thunking Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 27/59] x86/its: Use dynamic thunks for indirect branches Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 28/59] x86/its: Fix build errors when CONFIG_MODULES=n Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 29/59] x86/its: FineIBT-paranoid vs ITS Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 30/59] dmaengine: Revert "dmaengine: dmatest: Fix dmatest waiting less when interrupted" Greg Kroah-Hartman
2025-05-20 13:50 ` Greg Kroah-Hartman [this message]
2025-05-20 13:50 ` [PATCH 5.15 32/59] ACPI: PPTT: Fix processor subtable walk Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 33/59] ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 34/59] ALSA: usb-audio: Add sample rate quirk for Audioengine D1 Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 35/59] ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 36/59] ftrace: Fix preemption accounting for stacktrace trigger command Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 37/59] ftrace: Fix preemption accounting for stacktrace filter command Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 38/59] tracing: samples: Initialize trace_array_printk() with the correct function Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 39/59] phy: Fix error handling in tegra_xusb_port_init Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 40/59] phy: renesas: rcar-gen3-usb2: Set timing registers only once Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 41/59] wifi: mt76: disable napi on driver removal Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 42/59] dmaengine: ti: k3-udma: Add missing locking Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 43/59] dmaengine: ti: k3-udma: Use cap_mask directly from dma_device structure instead of a local copy Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 44/59] dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 45/59] dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 46/59] block: fix direct io NOWAIT flag not work Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 47/59] clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 48/59] usb: typec: ucsi: displayport: Fix deadlock Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 49/59] usb: typec: altmodes/displayport: create sysfs nodes as drivers default device attribute group Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 50/59] usb: typec: fix potential array underflow in ucsi_ccg_sync_control() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 51/59] usb: typec: fix pm usage counter imbalance " Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 52/59] selftests/mm: compaction_test: support platform with huge mount of memory Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 53/59] sctp: add mutual exclusion in proc_sctp_do_udp_port() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 54/59] btrfs: dont BUG_ON() when 0 reference count at btrfs_lookup_extent_info() Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 55/59] btrfs: do not clean up repair bio if submit fails Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 56/59] netfilter: nf_tables: pass nft_chain to destroy function, not nft_ctx Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 57/59] netfilter: nf_tables: wait for rcu grace period on net_device removal Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 58/59] netfilter: nf_tables: do not defer rule destruction via call_rcu Greg Kroah-Hartman
2025-05-20 13:50 ` [PATCH 5.15 59/59] ice: arfs: fix use-after-free when freeing @rx_cpu_rmap Greg Kroah-Hartman
2025-05-20 18:19 ` [PATCH 5.15 00/59] 5.15.184-rc1 review Florian Fainelli
2025-05-20 22:46 ` Shuah Khan
2025-05-21  1:53 ` Ron Economos
2025-05-21  3:16 ` Vijayendra Suman
2025-05-21  8:30 ` Jon Hunter
2025-05-21 12:39 ` Naresh Kamboju
2025-05-21 18:54 ` Mark Brown
2025-05-21 19:10 ` Alexandre Chartre
2025-05-21 21:25   ` Pawan Gupta
2025-05-22  5:09 ` Hardik Garg
2025-05-23  9:25 ` Guenter Roeck
2025-05-27 19:31   ` Richard Narron
2025-05-28  0:55     ` Pawan Gupta
2025-05-29  4:49       ` Richard Narron
2025-05-29 17:40         ` Pawan Gupta
2025-05-30  5:11           ` Greg Kroah-Hartman
2025-05-30  5:21             ` Pawan Gupta
2025-05-30  6:04             ` Pawan Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250520125755.095235604@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=boris@bur.io \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=neelx@suse.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox