public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Mingzhe Zou <mingzhe.zou@easystack.cn>,
	Coly Li <colyli@suse.de>, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5.15 48/69] bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race
Date: Thu, 30 Nov 2023 16:22:45 +0000	[thread overview]
Message-ID: <20231130162134.654684267@linuxfoundation.org> (raw)
In-Reply-To: <20231130162133.035359406@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mingzhe Zou <mingzhe.zou@easystack.cn>

commit 2faac25d7958c4761bb8cec54adb79f806783ad6 upstream.

We get a kernel crash about "unable to handle kernel paging request":

```dmesg
[368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5
[368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1
[368033.032012] Oops: 0003 [#1] SMP PTI
[368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-147.5.1.es8_24.x86_64 #1
[368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017
[368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
[368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0
03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93
ad <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02
[368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082
[368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3
[368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68
[368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff
[368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70
[368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[368033.032038] FS:  0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000
[368033.032039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0
[368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b
[368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[368033.032046] Call Trace:
[368033.032054]  _raw_spin_lock_irqsave+0x32/0x40
[368033.032061]  __wake_up_common_lock+0x63/0xc0
[368033.032073]  ? bch_ptr_invalid+0x10/0x10 [bcache]
[368033.033502]  bch_dirty_init_thread+0x14c/0x160 [bcache]
[368033.033511]  ? read_dirty_submit+0x60/0x60 [bcache]
[368033.033516]  kthread+0x112/0x130
[368033.033520]  ? kthread_flush_work_fn+0x10/0x10
[368033.034505]  ret_from_fork+0x35/0x40
```

The crash occurred when call wake_up(&state->wait), and then we want
to look at the value in the state. However, bch_sectors_dirty_init()
is not found in the stack of any task. Since state is allocated on
the stack, we guess that bch_sectors_dirty_init() has exited, causing
bch_dirty_init_thread() to be unable to handle kernel paging request.

In order to verify this idea, we added some printing information during
wake_up(&state->wait). We find that "wake up" is printed twice, however
we only expect the last thread to wake up once.

```dmesg
[  994.641004] alcache: bch_dirty_init_thread() wake up
[  994.641018] alcache: bch_dirty_init_thread() wake up
[  994.641523] alcache: bch_sectors_dirty_init() init exit
```

There is a race. If bch_sectors_dirty_init() exits after the first wake
up, the second wake up will trigger this bug("unable to handle kernel
paging request").

Proceed as follows:

bch_sectors_dirty_init
    kthread_run ==============> bch_dirty_init_thread(bch_dirtcnt[0])
            ...                         ...
    atomic_inc(&state.started)          ...
            ...                         ...
    atomic_read(&state.enough)          ...
            ...                 atomic_set(&state->enough, 1)
    kthread_run ======================================================> bch_dirty_init_thread(bch_dirtcnt[1])
            ...                 atomic_dec_and_test(&state->started)            ...
    atomic_inc(&state.started)          ...                                     ...
            ...                 wake_up(&state->wait)                           ...
    atomic_read(&state.enough)                                          atomic_dec_and_test(&state->started)
            ...                                                                 ...
    wait_event(state.wait, atomic_read(&state.started) == 0)                    ...
    return                                                                      ...
                                                                        wake_up(&state->wait)

We believe it is very common to wake up twice if there is no dirty, but
crash is an extremely low probability event. It's hard for us to reproduce
this issue. We attached and detached continuously for a week, with a total
of more than one million attaches and only one crash.

Putting atomic_inc(&state.started) before kthread_run() can avoid waking
up twice.

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc:  <stable@vger.kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/bcache/writeback.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -1004,17 +1004,18 @@ void bch_sectors_dirty_init(struct bcach
 		if (atomic_read(&state.enough))
 			break;
 
+		atomic_inc(&state.started);
 		state.infos[i].state = &state;
 		state.infos[i].thread =
 			kthread_run(bch_dirty_init_thread, &state.infos[i],
 				    "bch_dirtcnt[%d]", i);
 		if (IS_ERR(state.infos[i].thread)) {
 			pr_err("fails to run thread bch_dirty_init[%d]\n", i);
+			atomic_dec(&state.started);
 			for (--i; i >= 0; i--)
 				kthread_stop(state.infos[i].thread);
 			goto out;
 		}
-		atomic_inc(&state.started);
 	}
 
 out:



  parent reply	other threads:[~2023-11-30 16:34 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30 16:21 [PATCH 5.15 00/69] 5.15.141-rc1 review Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 5.15 01/69] afs: Fix afs_server_list to be cleaned up with RCU Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 5.15 02/69] afs: Make error on cell lookup failure consistent with OpenAFS Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 03/69] drm/panel: boe-tv101wum-nl6: Fine tune the panel power sequence Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 04/69] drm/panel: auo,b101uan08.3: " Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 05/69] drm/panel: simple: Fix Innolux G101ICE-L01 bus flags Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 06/69] drm/panel: simple: Fix Innolux G101ICE-L01 timings Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 07/69] wireguard: use DEV_STATS_INC() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 08/69] octeontx2-pf: Fix memory leak during interface down Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 09/69] ata: pata_isapnp: Add missing error check for devm_ioport_map() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 10/69] drm/rockchip: vop: Fix color for RGB888/BGR888 format on VOP full Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 11/69] HID: core: store the unique system identifier in hid_device Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 12/69] HID: fix HID device resource race between HID core and debugging support Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 13/69] ipv4: Correct/silence an endian warning in __ip_do_redirect Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 14/69] net: usb: ax88179_178a: fix failed operations during ax88179_reset Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 15/69] net/smc: avoid data corruption caused by decline Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 16/69] arm/xen: fix xen_vcpu_info allocation alignment Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 17/69] octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 18/69] amd-xgbe: handle corner-case during sfp hotplug Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 19/69] amd-xgbe: handle the corner-case during tx completion Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 20/69] amd-xgbe: propagate the correct speed and duplex status Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 21/69] net: axienet: Fix check for partial TX checksum Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 22/69] afs: Return ENOENT if no cell DNS record can be found Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 23/69] afs: Fix file locking on R/O volumes to operate in local mode Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 24/69] nvmet: nul-terminate the NQNs passed in the connect command Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 25/69] USB: dwc3: qcom: fix resource leaks on probe deferral Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 26/69] USB: dwc3: qcom: fix ACPI platform device leak Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 27/69] lockdep: Fix block chain corruption Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 28/69] MIPS: KVM: Fix a build warning about variable set but not used Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 29/69] media: camss: Replace hard coded value with parameter Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 30/69] media: camss: sm8250: Virtual channels for CSID Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 31/69] media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3 Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 32/69] media: qcom: camss: Fix csid-gen2 for test pattern generator Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 33/69] ext4: add a new helper to check if es must be kept Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 34/69] ext4: factor out __es_alloc_extent() and __es_free_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 35/69] ext4: use pre-allocated es in __es_insert_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 36/69] ext4: use pre-allocated es in __es_remove_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 37/69] ext4: using nofail preallocation in ext4_es_remove_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 38/69] ext4: using nofail preallocation in ext4_es_insert_delayed_block() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 39/69] ext4: using nofail preallocation in ext4_es_insert_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 40/69] ext4: fix slab-use-after-free " Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 41/69] ext4: make sure allocate pending entry not fail Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 42/69] tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 43/69] proc: sysctl: prevent aliased sysctls from getting passed to init Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 44/69] ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 45/69] swiotlb-xen: provide the "max_mapping_size" method Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 46/69] bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 47/69] md: fix bi_status reporting in md_end_clone_io Greg Kroah-Hartman
2023-11-30 16:22 ` Greg Kroah-Hartman [this message]
2023-11-30 16:22 ` [PATCH 5.15 49/69] io_uring/fs: consider link->flags when getting path for LINKAT Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 50/69] s390/dasd: protect device queue against concurrent access Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 51/69] USB: serial: option: add Luat Air72*U series products Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 52/69] hv_netvsc: Fix race of register_netdevice_notifier and VF register Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 53/69] hv_netvsc: Mark VF as slave before exposing it to user-mode Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 54/69] dm-delay: fix a race between delay_presuspend and delay_bio Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 55/69] bcache: check return value from btree_node_alloc_replacement() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 56/69] bcache: prevent potential division by zero error Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 57/69] bcache: fixup init dirty data errors Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 58/69] bcache: fixup lock c->root error Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 59/69] usb: cdnsp: Fix deadlock issue during using NCM gadget Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 60/69] USB: serial: option: add Fibocom L7xx modules Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 61/69] USB: serial: option: fix FM101R-GL defines Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 62/69] USB: serial: option: dont claim interface 4 for ZTE MF290 Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 63/69] usb: typec: tcpm: Skip hard reset when in error recovery Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 64/69] USB: dwc2: write HCINT with INTMASK applied Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 65/69] usb: dwc3: Fix default mode initialization Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 66/69] usb: dwc3: set the dma max_seg_size Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 67/69] USB: dwc3: qcom: fix software node leak on probe errors Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 68/69] USB: dwc3: qcom: fix wakeup after probe deferral Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 69/69] io_uring: fix off-by one bvec index Greg Kroah-Hartman
2023-11-30 17:21 ` [PATCH 5.15 00/69] 5.15.141-rc1 review Daniel Díaz
2023-11-30 17:44   ` Guenter Roeck
2023-11-30 18:11     ` Daniel Díaz
2023-11-30 18:56       ` Guenter Roeck
2023-12-01  8:21       ` Greg Kroah-Hartman
2023-12-01  9:35         ` Francis Laniel
2023-12-01  9:44           ` Greg Kroah-Hartman
2023-12-01 14:34             ` Daniel Díaz
2023-12-01 23:04               ` Greg Kroah-Hartman
2023-12-04 14:55                 ` Daniel Díaz
2023-12-01  6:31     ` Harshit Mogalapalli
2023-11-30 22:27   ` Pavel Machek
2023-11-30 18:57 ` Florian Fainelli
2023-12-01  0:08 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231130162134.654684267@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=colyli@suse.de \
    --cc=mingzhe.zou@easystack.cn \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox