From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0BB9C433F5 for ; Fri, 11 Feb 2022 11:38:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348083AbiBKLiz (ORCPT ); Fri, 11 Feb 2022 06:38:55 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:52722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233008AbiBKLiy (ORCPT ); Fri, 11 Feb 2022 06:38:54 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1F62381 for ; Fri, 11 Feb 2022 03:38:52 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 84153210FA; Fri, 11 Feb 2022 11:38:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1644579531; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GM6QXq1pOPt3sfeAPFjE9toIQi4lMULa/opYJbQ7qwM=; b=Y5oBgkt+Y3P7dL8wp8JJqX1VeFHkWRQH9ExMsUBYpaP8EznndPU9x+sAEJytaZWu2lJiFe RUfpS8tGch6Quu6Ex0nFcYBi5d5zpvuTISqxc/VBgD/yv6KTGb+pWyPDRo/+oWy8Fz/k9X 3LPxuAOrMIoK6sp5ySU9UPMUg75kTr4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1644579531; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GM6QXq1pOPt3sfeAPFjE9toIQi4lMULa/opYJbQ7qwM=; b=2080AK4nrnwogQxwJDy8RlVkgFQWY4pHkD8ypkVlCl+t6AhOTptm+LITSMKqNA+TFhfY4X B7rqK4OBjS4+gHDQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 76E5613C03; Fri, 11 Feb 2022 11:38:50 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id rmQAEMpKBmLXVAAAMHmgww (envelope-from ); Fri, 11 Feb 2022 11:38:50 +0000 Message-ID: <551d44a7-8e87-2100-e395-04de8e1e45c3@suse.de> Date: Fri, 11 Feb 2022 19:38:48 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.5.1 Subject: Re: [PATCH] bcache: fixup multiple threads crash Content-Language: en-US To: Zou Mingzhe Cc: zoumingzhe@qq.com, linux-bcache@vger.kernel.org References: <20220211063915.4101-1-mingzhe.zou@easystack.cn> <8c7314c0-4df3-5bda-f17c-7ddc15de7ffc@suse.de> <214c96f8-dc8e-5511-ec85-40d945cf3a7c@easystack.cn> From: Coly Li In-Reply-To: <214c96f8-dc8e-5511-ec85-40d945cf3a7c@easystack.cn> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-bcache@vger.kernel.org On 2/11/22 4:57 PM, Zou Mingzhe wrote: > > 在 2022/2/11 16:39, Coly Li 写道: >> On 2/11/22 2:39 PM, mingzhe.zou@easystack.cn wrote: >>> From: Mingzhe Zou >>> >>> When multiple threads to check btree nodes in parallel, the main >>> thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag: >>> >>> wait_event_interruptible(check_state->wait, >>> atomic_read(&check_state->started) == 0 || >>>                           test_bit(CACHE_SET_IO_DISABLE, &c->flags)); >>> >>> However, the bch_btree_node_read and bch_btree_node_read_done >>> maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE >>> will be set. If the flag already set, the main thread return >>> error. At the same time, maybe some threads still running and >>> read NULL pointer, the kernel will crash. >> >> Hi Mingzhe, >> >> Could you please explain a bit more about "read NULL pointer"? Which >> NULL pointer might be read in the above condition? >> >> Thanks. >> >> Coly Li > > Hi ColyLi: > > > This is dmesg output information: > > [956549.478889] bcache: bch_cache_set_error() bcache: error on > c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8: > [956549.492265] unsupported bset version at bucket 78226, block 0, > 282468165 keys > [956549.492266] bcache: bch_cache_set_error() , disabling caching What happens here? It seems the btree node is corrupted with unexpected bset version. > > [956549.531298] bcache: conditional_stop_bcache_device() > stop_when_cache_set_failed of escache2 is "auto" and cache is dirty, > stop it to avoid potential data corruption. > [956549.557162] bcache: cached_dev_detach_finish() Caching disabled > for sdc1 > [956549.570292] bcache: bcache_device_free() escache2 stopped > [956549.603434] bcache: cache_set_free() Cache set > c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 unregistered > [956583.000094] bcache: bch_cache_set_error() bcache: error on > 4f45b0b1-32ea-4d83-a2b1-caa169a95b35: > [956583.013567] unsupported bset version at bucket 25452, block 0, > 208758840 keys > [956583.013569] bcache: bch_cache_set_error() , disabling caching Similar corrupted btree node here. The corrupted btree node should be the first thing to look into. > > [956583.053296] bcache: conditional_stop_bcache_device() > stop_when_cache_set_failed of escache1 is "auto" and cache is dirty, > stop it to avoid potential data corruption. > [956583.079968] bcache: cached_dev_detach_finish() Caching disabled > for sdd1 > [956583.093539] bcache: bcache_device_free() escache1 stopped > [956583.126677] bcache: cache_set_free() Cache set > 4f45b0b1-32ea-4d83-a2b1-caa169a95b35 unregistered > [956589.522067]  sdc: sdc1 > [956590.291748] bcache: register_bdev() registered backing device sdc1 > [956590.307925] bcache: cache_alloc() set > c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 with btree_buckets: 256, free: > 128, free_inc: 11264, heap: 22528 > > [956590.406700] bcache: bch_cache_set_error() bcache: error on > c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8: > [956590.411369] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE > already set > [956590.412453] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE > already set > [956590.412455] bcache: bch_cache_set_error() bcache: error on > c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8: > [956590.412456] io error reading bucket 81152 > [956590.412456] bcache: bch_cache_set_error() , disabling caching > > [956590.412578] BUG: unable to handle kernel NULL pointer dereference > at 0000000000000010 > [956590.412594] PGD 0 P4D 0 > [956590.412624] Oops: 0000 [#1] SMP PTI > [956590.412645] CPU: 12 PID: 55477 Comm: bch_btrchk[9] Kdump: loaded > Tainted: G           OE    --------- -  - > 4.18.0-147.5.1.el8_1.5es.16.x86_64 #1 > [956590.412646] Hardware name: Tsinghua Tongfang THTF Chaoqiang > Server/072T6D, BIOS 2.4.3 01/17/2017 > [956590.412859] RIP: 0010:bch_btree_node_read_done+0x1e9/0x400 [escache] > [956590.412868] Code: 11 00 00 45 89 d3 0f b7 c0 41 c1 e3 09 49 89 c6 > 41 0f af d3 49 d3 ee 4c 89 f1 48 63 d2 49 8d 1c 10 45 39 f1 0f 83 da > 00 00 00 <4d> 8b 78 10 4c 39 7b 10 0f 85 d4 01 00 00 44 8b 43 18 41 83 > f8 01 > [956590.412870] RSP: 0018:ffffac308ec4fd58 EFLAGS: 00010283 > [956590.412882] RAX: 0000000000000200 RBX: 0000000000035600 RCX: > 0000000000000200 > [956590.412883] RDX: 0000000000035600 RSI: 00000000000001ab RDI: > ffff911fcac20000 > [956590.412889] RBP: ffff911f090f7000 R08: 0000000000000000 R09: > 00000000000001ab > [956590.412890] R10: 0000000000000001 R11: 0000000000000200 R12: > ffff910fb3238000 > [956590.412891] R13: 90135c78b99e07f5 R14: 0000000000000200 R15: > ffff9119994b4620 > [956590.412896] FS:  0000000000000000(0000) GS:ffff91107f980000(0000) > knlGS:0000000000000000 > [956590.412898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [956590.412902] bcache: cache_set_free() Cache set > c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 unregistered > [956590.412911] CR2: 0000000000000010 CR3: 0000001534e0a002 CR4: > 00000000003626e0 > [956590.412925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [956590.412926] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [956590.412928] Call Trace: > [956590.412941]  bch_btree_node_read+0xe7/0x1a0 [escache] > [956590.412953]  ? bch_keybuf_init+0x60/0x60 [escache] > [956590.412976]  ? bch_ptr_invalid+0x10/0x10 [escache] > [956590.413002]  btree_node_prefetch+0x6e/0x90 [escache] > [956590.413025]  bch_btree_check_thread+0x15c/0x270 [escache] > [956590.413089]  ? finish_task_switch+0xd7/0x2b0 > [956590.413099]  ? bch_btree_check_recurse+0x1b0/0x1b0 [escache] > [956590.413131]  kthread+0x112/0x130 > [956590.413139]  ? kthread_flush_work_fn+0x10/0x10 > [956590.413164]  ret_from_fork+0x35/0x40 > [956590.413191] Modules linked in: iscsi_target_mod target_core_mod > vhost_net vhost tap tun tcp_diag udp_diag inet_diag unix_diag ext4 > mbcache jbd2 nbd sch_htb act_police cls_u32 sch_ingress xt_connmark > xt_CHECKSUM ip_set ip_gre ip_tunnel gre ebtable_filter ebtables > ip6table_filter nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp > nf_conntrack_proto_gre veth nf_conntrack_netlink nfnetlink > ip6table_nat ip6_tables overlay iscsi_tcp libiscsi_tcp libiscsi > scsi_transport_iscsi vxlan ip6_udp_tunnel udp_tunnel openvswitch > nf_nat_ipv6 nf_conncount ipt_REJECT nf_reject_ipv4 iptable_filter > xt_conntrack iptable_mangle xt_statistic xt_nat ipt_MASQUERADE > xt_comment xt_addrtype xt_recent xt_mark iptable_nat nf_nat_ipv4 > nf_nat iptable_raw vfat fat intel_rapl sb_edac x86_pkg_temp_thermal > mxm_wmi intel_powerclamp coretemp > [956590.413654]  kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul > ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf pcspkr > mei_me mei joydev lpc_ich sg ipmi_si acpi_power_meter wmi escache nfsd > nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 > auth_rpcgss br_netfilter bridge nfs_acl lockd grace stp llc sunrpc > nf_conntrack ip_tables xfs sd_mod drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm ixgbe nvme drm ahci libahci > libata nvme_core igb mdio dca i2c_algo_bit megaraid_sas(OE) dm_mirror > dm_region_hash dm_log dm_mod rbd libceph dns_resolver ipmi_watchdog > ipmi_devintf ipmi_msghandler drbd_transport_tcp(OE) drbd(OE) libcrc32c > crc32c_intel > [956590.413942] CR2: 0000000000000010 > > I see some print "bcache: bch_cache_set_error() bcache: error on", but > I don't have any more information on how to go wrong. > > Then the 'bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already > set' mean the CACHE_SET_IO_DISABLE flag has been set and the main > thread will return error. > > At the same time, some threads is running bch_btree_check and read > NULL pointer. > As I asked before, can you tell which NULL pointer was read to trigger the NULL pointer deference ? Then I can evaluate whether the fix is efficient. [snipped] >>> This patch change the event wait condition, the main thread must >>> wait for all threads to stop. >>> >>> Signed-off-by: Mingzhe Zou >>> --- >>>   drivers/md/bcache/btree.c     | 6 ++++-- >>>   drivers/md/bcache/writeback.c | 6 ++++-- >>>   2 files changed, 8 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c >>> index 88c573eeb598..ad9f16689419 100644 >>> --- a/drivers/md/bcache/btree.c >>> +++ b/drivers/md/bcache/btree.c >>> @@ -2060,9 +2060,11 @@ int bch_btree_check(struct cache_set *c) >>>           } >>>       } >>>   +    /* >>> +     * Must wait for all threads to stop. >>> +     */ >>>       wait_event_interruptible(check_state->wait, >>> -                 atomic_read(&check_state->started) == 0 || >>> -                  test_bit(CACHE_SET_IO_DISABLE, &c->flags)); >>> +                 atomic_read(&check_state->started) == 0); >>>         for (i = 0; i < check_state->total_threads; i++) { >>>           if (check_state->infos[i].result) { >>> diff --git a/drivers/md/bcache/writeback.c >>> b/drivers/md/bcache/writeback.c >>> index c7560f66dca8..68d3dd6b4f11 100644 >>> --- a/drivers/md/bcache/writeback.c >>> +++ b/drivers/md/bcache/writeback.c >>> @@ -998,9 +998,11 @@ void bch_sectors_dirty_init(struct >>> bcache_device *d) >>>           } >>>       } >>>   +    /* >>> +     * Must wait for all threads to stop. >>> +     */ >>>       wait_event_interruptible(state->wait, >>> -         atomic_read(&state->started) == 0 || >>> -         test_bit(CACHE_SET_IO_DISABLE, &c->flags)); >>> +         atomic_read(&state->started) == 0); >>>     out: >>>       kfree(state); >> >>