From: Coly Li <colyli@suse.de>
To: Zou Mingzhe <mingzhe.zou@easystack.cn>
Cc: zoumingzhe@qq.com, linux-bcache@vger.kernel.org
Subject: Re: [PATCH] bcache: fixup multiple threads crash
Date: Fri, 11 Feb 2022 19:38:48 +0800 [thread overview]
Message-ID: <551d44a7-8e87-2100-e395-04de8e1e45c3@suse.de> (raw)
In-Reply-To: <214c96f8-dc8e-5511-ec85-40d945cf3a7c@easystack.cn>
On 2/11/22 4:57 PM, Zou Mingzhe wrote:
>
> 在 2022/2/11 16:39, Coly Li 写道:
>> On 2/11/22 2:39 PM, mingzhe.zou@easystack.cn wrote:
>>> From: Mingzhe Zou <mingzhe.zou@easystack.cn>
>>>
>>> When multiple threads to check btree nodes in parallel, the main
>>> thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:
>>>
>>> wait_event_interruptible(check_state->wait,
>>> atomic_read(&check_state->started) == 0 ||
>>> test_bit(CACHE_SET_IO_DISABLE, &c->flags));
>>>
>>> However, the bch_btree_node_read and bch_btree_node_read_done
>>> maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
>>> will be set. If the flag already set, the main thread return
>>> error. At the same time, maybe some threads still running and
>>> read NULL pointer, the kernel will crash.
>>
>> Hi Mingzhe,
>>
>> Could you please explain a bit more about "read NULL pointer"? Which
>> NULL pointer might be read in the above condition?
>>
>> Thanks.
>>
>> Coly Li
>
> Hi ColyLi:
>
>
> This is dmesg output information:
>
> [956549.478889] bcache: bch_cache_set_error() bcache: error on
> c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8:
> [956549.492265] unsupported bset version at bucket 78226, block 0,
> 282468165 keys
> [956549.492266] bcache: bch_cache_set_error() , disabling caching
What happens here? It seems the btree node is corrupted with unexpected
bset version.
>
> [956549.531298] bcache: conditional_stop_bcache_device()
> stop_when_cache_set_failed of escache2 is "auto" and cache is dirty,
> stop it to avoid potential data corruption.
> [956549.557162] bcache: cached_dev_detach_finish() Caching disabled
> for sdc1
> [956549.570292] bcache: bcache_device_free() escache2 stopped
> [956549.603434] bcache: cache_set_free() Cache set
> c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 unregistered
> [956583.000094] bcache: bch_cache_set_error() bcache: error on
> 4f45b0b1-32ea-4d83-a2b1-caa169a95b35:
> [956583.013567] unsupported bset version at bucket 25452, block 0,
> 208758840 keys
> [956583.013569] bcache: bch_cache_set_error() , disabling caching
Similar corrupted btree node here.
The corrupted btree node should be the first thing to look into.
>
> [956583.053296] bcache: conditional_stop_bcache_device()
> stop_when_cache_set_failed of escache1 is "auto" and cache is dirty,
> stop it to avoid potential data corruption.
> [956583.079968] bcache: cached_dev_detach_finish() Caching disabled
> for sdd1
> [956583.093539] bcache: bcache_device_free() escache1 stopped
> [956583.126677] bcache: cache_set_free() Cache set
> 4f45b0b1-32ea-4d83-a2b1-caa169a95b35 unregistered
> [956589.522067] sdc: sdc1
> [956590.291748] bcache: register_bdev() registered backing device sdc1
> [956590.307925] bcache: cache_alloc() set
> c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 with btree_buckets: 256, free:
> 128, free_inc: 11264, heap: 22528
>
> [956590.406700] bcache: bch_cache_set_error() bcache: error on
> c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8:
> [956590.411369] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE
> already set
> [956590.412453] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE
> already set
> [956590.412455] bcache: bch_cache_set_error() bcache: error on
> c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8:
> [956590.412456] io error reading bucket 81152
> [956590.412456] bcache: bch_cache_set_error() , disabling caching
>
> [956590.412578] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000010
> [956590.412594] PGD 0 P4D 0
> [956590.412624] Oops: 0000 [#1] SMP PTI
> [956590.412645] CPU: 12 PID: 55477 Comm: bch_btrchk[9] Kdump: loaded
> Tainted: G OE --------- - -
> 4.18.0-147.5.1.el8_1.5es.16.x86_64 #1
> [956590.412646] Hardware name: Tsinghua Tongfang THTF Chaoqiang
> Server/072T6D, BIOS 2.4.3 01/17/2017
> [956590.412859] RIP: 0010:bch_btree_node_read_done+0x1e9/0x400 [escache]
> [956590.412868] Code: 11 00 00 45 89 d3 0f b7 c0 41 c1 e3 09 49 89 c6
> 41 0f af d3 49 d3 ee 4c 89 f1 48 63 d2 49 8d 1c 10 45 39 f1 0f 83 da
> 00 00 00 <4d> 8b 78 10 4c 39 7b 10 0f 85 d4 01 00 00 44 8b 43 18 41 83
> f8 01
> [956590.412870] RSP: 0018:ffffac308ec4fd58 EFLAGS: 00010283
> [956590.412882] RAX: 0000000000000200 RBX: 0000000000035600 RCX:
> 0000000000000200
> [956590.412883] RDX: 0000000000035600 RSI: 00000000000001ab RDI:
> ffff911fcac20000
> [956590.412889] RBP: ffff911f090f7000 R08: 0000000000000000 R09:
> 00000000000001ab
> [956590.412890] R10: 0000000000000001 R11: 0000000000000200 R12:
> ffff910fb3238000
> [956590.412891] R13: 90135c78b99e07f5 R14: 0000000000000200 R15:
> ffff9119994b4620
> [956590.412896] FS: 0000000000000000(0000) GS:ffff91107f980000(0000)
> knlGS:0000000000000000
> [956590.412898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [956590.412902] bcache: cache_set_free() Cache set
> c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 unregistered
> [956590.412911] CR2: 0000000000000010 CR3: 0000001534e0a002 CR4:
> 00000000003626e0
> [956590.412925] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [956590.412926] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [956590.412928] Call Trace:
> [956590.412941] bch_btree_node_read+0xe7/0x1a0 [escache]
> [956590.412953] ? bch_keybuf_init+0x60/0x60 [escache]
> [956590.412976] ? bch_ptr_invalid+0x10/0x10 [escache]
> [956590.413002] btree_node_prefetch+0x6e/0x90 [escache]
> [956590.413025] bch_btree_check_thread+0x15c/0x270 [escache]
> [956590.413089] ? finish_task_switch+0xd7/0x2b0
> [956590.413099] ? bch_btree_check_recurse+0x1b0/0x1b0 [escache]
> [956590.413131] kthread+0x112/0x130
> [956590.413139] ? kthread_flush_work_fn+0x10/0x10
> [956590.413164] ret_from_fork+0x35/0x40
> [956590.413191] Modules linked in: iscsi_target_mod target_core_mod
> vhost_net vhost tap tun tcp_diag udp_diag inet_diag unix_diag ext4
> mbcache jbd2 nbd sch_htb act_police cls_u32 sch_ingress xt_connmark
> xt_CHECKSUM ip_set ip_gre ip_tunnel gre ebtable_filter ebtables
> ip6table_filter nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp
> nf_conntrack_proto_gre veth nf_conntrack_netlink nfnetlink
> ip6table_nat ip6_tables overlay iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi vxlan ip6_udp_tunnel udp_tunnel openvswitch
> nf_nat_ipv6 nf_conncount ipt_REJECT nf_reject_ipv4 iptable_filter
> xt_conntrack iptable_mangle xt_statistic xt_nat ipt_MASQUERADE
> xt_comment xt_addrtype xt_recent xt_mark iptable_nat nf_nat_ipv4
> nf_nat iptable_raw vfat fat intel_rapl sb_edac x86_pkg_temp_thermal
> mxm_wmi intel_powerclamp coretemp
> [956590.413654] kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf pcspkr
> mei_me mei joydev lpc_ich sg ipmi_si acpi_power_meter wmi escache nfsd
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
> auth_rpcgss br_netfilter bridge nfs_acl lockd grace stp llc sunrpc
> nf_conntrack ip_tables xfs sd_mod drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops ttm ixgbe nvme drm ahci libahci
> libata nvme_core igb mdio dca i2c_algo_bit megaraid_sas(OE) dm_mirror
> dm_region_hash dm_log dm_mod rbd libceph dns_resolver ipmi_watchdog
> ipmi_devintf ipmi_msghandler drbd_transport_tcp(OE) drbd(OE) libcrc32c
> crc32c_intel
> [956590.413942] CR2: 0000000000000010
>
> I see some print "bcache: bch_cache_set_error() bcache: error on", but
> I don't have any more information on how to go wrong.
>
> Then the 'bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already
> set' mean the CACHE_SET_IO_DISABLE flag has been set and the main
> thread will return error.
>
> At the same time, some threads is running bch_btree_check and read
> NULL pointer.
>
As I asked before, can you tell which NULL pointer was read to trigger
the NULL pointer deference ? Then I can evaluate whether the fix is
efficient.
[snipped]
>>> This patch change the event wait condition, the main thread must
>>> wait for all threads to stop.
>>>
>>> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
>>> ---
>>> drivers/md/bcache/btree.c | 6 ++++--
>>> drivers/md/bcache/writeback.c | 6 ++++--
>>> 2 files changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>> index 88c573eeb598..ad9f16689419 100644
>>> --- a/drivers/md/bcache/btree.c
>>> +++ b/drivers/md/bcache/btree.c
>>> @@ -2060,9 +2060,11 @@ int bch_btree_check(struct cache_set *c)
>>> }
>>> }
>>> + /*
>>> + * Must wait for all threads to stop.
>>> + */
>>> wait_event_interruptible(check_state->wait,
>>> - atomic_read(&check_state->started) == 0 ||
>>> - test_bit(CACHE_SET_IO_DISABLE, &c->flags));
>>> + atomic_read(&check_state->started) == 0);
>>> for (i = 0; i < check_state->total_threads; i++) {
>>> if (check_state->infos[i].result) {
>>> diff --git a/drivers/md/bcache/writeback.c
>>> b/drivers/md/bcache/writeback.c
>>> index c7560f66dca8..68d3dd6b4f11 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -998,9 +998,11 @@ void bch_sectors_dirty_init(struct
>>> bcache_device *d)
>>> }
>>> }
>>> + /*
>>> + * Must wait for all threads to stop.
>>> + */
>>> wait_event_interruptible(state->wait,
>>> - atomic_read(&state->started) == 0 ||
>>> - test_bit(CACHE_SET_IO_DISABLE, &c->flags));
>>> + atomic_read(&state->started) == 0);
>>> out:
>>> kfree(state);
>>
>>
next prev parent reply other threads:[~2022-02-11 11:38 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-11 6:39 [PATCH] bcache: fixup multiple threads crash mingzhe.zou
2022-02-11 8:39 ` Coly Li
2022-02-11 8:57 ` Zou Mingzhe
2022-02-11 11:38 ` Coly Li [this message]
2022-02-12 12:25 ` Zou Mingzhe
2022-02-14 7:46 ` Zou Mingzhe
2022-02-23 9:10 ` Coly Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=551d44a7-8e87-2100-e395-04de8e1e45c3@suse.de \
--to=colyli@suse.de \
--cc=linux-bcache@vger.kernel.org \
--cc=mingzhe.zou@easystack.cn \
--cc=zoumingzhe@qq.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox