From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: "xiyan@cestc.cn" <xiyan@cestc.cn>,
linux-rdma <linux-rdma@vger.kernel.org>
Cc: markzhang <markzhang@nvidia.com>, phaddad <phaddad@nvidia.com>,
leon <leon@kernel.org>, "yuan.liu" <yuan.liu@cestc.cn>,
zhangsiyao <zhangsiyao@cestc.cn>, peizhiwei <peizhiwei@cestc.cn>,
eaujames <eaujames@ddn.com>, ssmirnov <ssmirnov@whamcloud.com>
Subject: Re: [BUG] rdma_cm: unable to handle kernel NULL pointer dereference in process_one_work when disconnect
Date: Tue, 17 Dec 2024 13:50:36 +0100
Message-ID: <a3bbcb1d-00bc-4218-85da-291bf368770d@linux.dev>
In-Reply-To: <2024121711472494997716@cestc.cn>
On 17.12.24 04:47, xiyan@cestc.cn wrote:
> Hello RDMA Community,
> While testing the RoCEv2 feature of the Lustre file system, we encountered a crash related to ARP updates. Preliminary analysis suggests that the issue may be kernel-related, and it is also observed in an NVMe-oF environment. We would appreciate your assistance. Details of the issue (LU-18364) are below.
> Thanks.
>
> The Lustre client and server are deployed in VMs. The VMs use the network card in PF pass-through mode.
> 【OS】
> VM Version: qemu-kvm-7.0.0
> OS Version: Rocky 8.10
> Kernel Version: 4.18.0-553.el8_10.x86_64
>
> 【Network Card】
> Client:
> MLX CX6 1*100G RoCE v2
> MLNX_OFED_LINUX-23.10-3.2.2.0-rhel8.10-x86_64
> firmware-version: 22.35.2000 (MT_0000000359)
>
> Server:
> MLX CX6 1*100G RoCE v2
> MLNX_OFED_LINUX-23.10-3.2.2.0-rhel8.10-x86_64
> firmware-version: 22.35.2000 (MT_0000000359)
>
> 【Kernel Commit】
> [PATCH rdma-next v2 2/2] RDMA/core: Add a netevent notifier to cma - Leon Romanovsky
> https://lore.kernel.org/all/bb255c9e301cd50b905663b8e73f7f5133d0e4c5.1654601342.git.leonro@nvidia.com/
>
> 【Lustre Issue】
> LU-18364:https://jira.whamcloud.com/browse/LU-18364
> LU-18275:https://jira.whamcloud.com/browse/LU-18275
>
> 【Problem Reproduction Steps】
> We have found steps that reproduce the crash reliably:
> 1. Use only one network card, without bonding.
> 2. Run a vdbench read/write test case on the Lustre client.
> 3. Trigger an ARP update for a Lustre server IP address on the Lustre client.
>
> For example, if the Lustre client IP is 192.168.122.220 and the Lustre server IP is 192.168.122.115, run "arp -s 192.168.122.115 10:71:fc:69:92:b8 && arp -d 192.168.122.115" on 192.168.122.220, where 10:71:fc:69:92:b8 is a wrong MAC address.
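
For anyone scripting step 3, the ARP flap quoted above could be wrapped roughly as follows. This is only a sketch: the function names, the `rounds` parameter, and the dry-run default are assumptions made for illustration and are not part of the report. The IP and MAC values are the ones from the example above. With dry_run=False it needs root and the net-tools `arp` binary.

```python
import subprocess


def arp_flap_commands(server_ip, bogus_mac):
    """Build the two commands from step 3: set a static entry with a
    wrong MAC for the server IP, then delete that entry again."""
    return [
        ["arp", "-s", server_ip, bogus_mac],
        ["arp", "-d", server_ip],
    ]


def arp_flap(server_ip, bogus_mac, rounds=1, dry_run=True):
    """Run (or, by default, only list) the ARP flap `rounds` times.

    Returns the command lines in the order they were issued.
    """
    issued = []
    for _ in range(rounds):
        for cmd in arp_flap_commands(server_ip, bogus_mac):
            if not dry_run:
                # check=True stops at the first failure, like `&&` in the report.
                subprocess.run(cmd, check=True)
            issued.append(" ".join(cmd))
    return issued


if __name__ == "__main__":
    # Dry run: print what would be executed on the Lustre client.
    for line in arp_flap("192.168.122.115", "10:71:fc:69:92:b8"):
        print(line)
```

Repeating the flap several times (rounds > 1) while the I/O test is running may make the probabilistic crash easier to hit, but that is a guess rather than something measured here.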
>
> The crash stack is below:
> KERNEL: /usr/lib/debug/lib/modules/4.18.0-553.el8_10.x86_64/vmlinux [TAINTED]
> DUMPFILE: vmcore [PARTIAL DUMP]
> CPUS: 20
> DATE: Tue Dec 3 14:58:41 CST 2024
> UPTIME: 00:06:20
> LOAD AVERAGE: 10.14, 2.56, 0.86
> TASKS: 1076
> NODENAME: rocky8vm3
> RELEASE: 4.18.0-553.el8_10.x86_64
> VERSION: #1 SMP Fri May 24 13:05:10 UTC 2024
> MACHINE: x86_64 (2599 Mhz)
> MEMORY: 31.4 GB
> PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
> PID: 607
> COMMAND: "kworker/u40:28"
> TASK: ff1e34360b6e0000 [THREAD_INFO: ff1e34360b6e0000]
> CPU: 1
> STATE: TASK_RUNNING (PANIC)
> crash> bt
> PID: 607 TASK: ff1e34360b6e0000 CPU: 1 COMMAND: "kworker/u40:28"
> #0 [ff4de14b444cbbc0] machine_kexec at ffffffff9c46f2d3
> #1 [ff4de14b444cbc18] __crash_kexec at ffffffff9c5baa5a
> #2 [ff4de14b444cbcd8] crash_kexec at ffffffff9c5bb991
> #3 [ff4de14b444cbcf0] oops_end at ffffffff9c42d811
> #4 [ff4de14b444cbd10] no_context at ffffffff9c481cf3
> #5 [ff4de14b444cbd68] __bad_area_nosemaphore at ffffffff9c48206c
> #6 [ff4de14b444cbdb0] do_page_fault at ffffffff9c482cf7
> #7 [ff4de14b444cbde0] page_fault at ffffffff9d0011ae
> [exception RIP: process_one_work+46]
> RIP: ffffffff9c51944e RSP: ff4de14b444cbe98 RFLAGS: 00010046
> RAX: 0000000000000000 RBX: ff1e34360734b5d8 RCX: dead000000000200
> RDX: 000000010001393f RSI: ff1e34360734b5d8 RDI: ff1e343ca7eed5c0
> RBP: ff1e343600019400 R8: ff1e343d37c73bb8 R9: 0000005885358800
> R10: 0000000000000000 R11: ff1e343d37c71dc4 R12: 0000000000000000
> R13: ff1e343600019420 R14: ff1e3436000194d0 R15: ff1e343ca7eed5c0
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #8 [ff4de14b444cbed8] worker_thread at ffffffff9c5197e0
> #9 [ff4de14b444cbf10] kthread at ffffffff9c520e04
> #10 [ff4de14b444cbf50] ret_from_fork at ffffffff9d00028f
>
> Another stack is below:
> [ 1656.060089] list_del corruption. next->prev should be ff4880c9d81b8d48, but was ff4880ccfb2d45e0

This looks like a memory corruption problem. The root cause of such
corruption can be very complicated. Since you can reproduce the problem,
perhaps some eBPF tools can help you find the root cause.

Zhu Yanjun

> [ 1656.060536] ------------[ cut here ]------------
> [ 1656.060538] kernel BUG at lib/list_debug.c:56!
> [ 1656.060738] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1656.060872] CPU: 4 PID: 606 Comm: kworker/u40:27 Kdump: loaded Tainted: GF OE -------- - - 4.18.0-553.el8_10.x86_64 #1
> [ 1656.061130] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module+el8.9.0+1408+7b966129 04/01/2014
> [ 1656.061261] Workqueue: mlx5_cmd_0000:11:00.0 cmd_work_handler [mlx5_core]
> [ 1656.061457] RIP: 0010:__list_del_entry_valid.cold.1+0x20/0x48
> [ 1656.061586] Code: 45 d4 99 e8 5e 52 c7 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 00 46 d4 99 e8 4a 52 c7 ff 0f 0b 48 c7 c7 b0 46 d4 99 e8 3c 52 c7 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 70 46 d4 99 e8 28 52 c7 ff 0f 0b
> [ 1656.061846] RSP: 0018:ff650559444dfe90 EFLAGS: 00010046
> [ 1656.061974] RAX: 0000000000000054 RBX: ff4880c9d81b8d40 RCX: 0000000000000000
> [ 1656.062103] RDX: 0000000000000000 RSI: ff4880cf9731e698 RDI: ff4880cf9731e698
> [ 1656.062238] RBP: ff4880c840019400 R08: 0000000000000000 R09: c0000000ffff7fff
> [ 1656.062366] R10: 0000000000000001 R11: ff650559444dfcb0 R12: ff4880c862647b00
> [ 1656.062492] R13: ff4880c879326540 R14: 0000000000000000 R15: ff4880c9d81b8d48
> [ 1656.062619] FS: 0000000000000000(0000) GS:ff4880cf97300000(0000) knlGS:0000000000000000
> [ 1656.062745] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1656.062868] CR2: 000055cc1af6b000 CR3: 000000084b610006 CR4: 0000000000771ee0
> [ 1656.062996] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1656.063127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1656.063250] PKRU: 55555554
> KERNEL: /usr/lib/debug/lib/modules/4.18.0-553.el8_10.x86_64/vmlinux [TAINTED]
> DUMPFILE: vmcore [PARTIAL DUMP]
> CPUS: 20
> DATE: Fri Nov 29 17:37:31 CST 2024
> UPTIME: 00:27:35
> LOAD AVERAGE: 350.47, 237.79, 163.91
> TASKS: 1106
> NODENAME: rocky8vm3
> RELEASE: 4.18.0-553.el8_10.x86_64
> VERSION: #1 SMP Fri May 24 13:05:10 UTC 2024
> MACHINE: x86_64 (2599 Mhz)
> MEMORY: 31.4 GB
> PANIC: "kernel BUG at lib/list_debug.c:56!"
> PID: 606
> COMMAND: "kworker/u40:27"
> TASK: ff4880c8793f8000 [THREAD_INFO: ff4880c8793f8000]
> CPU: 4
> STATE: TASK_RUNNING (PANIC)
> crash> bt
> PID: 606 TASK: ff4880c8793f8000 CPU: 4 COMMAND: "kworker/u40:27"
> #0 [ff650559444dfc28] machine_kexec at ffffffff98a6f2d3
> #1 [ff650559444dfc80] __crash_kexec at ffffffff98bbaa5a
> #2 [ff650559444dfd40] crash_kexec at ffffffff98bbb991
> #3 [ff650559444dfd58] oops_end at ffffffff98a2d811
> #4 [ff650559444dfd78] do_trap at ffffffff98a29a27
> #5 [ff650559444dfdc0] do_invalid_op at ffffffff98a2a766
> #6 [ff650559444dfde0] invalid_op at ffffffff99600da4
> [exception RIP: __list_del_entry_valid.cold.1+32]
> RIP: ffffffff98ef8f98 RSP: ff650559444dfe90 RFLAGS: 00010046
> RAX: 0000000000000054 RBX: ff4880c9d81b8d40 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ff4880cf9731e698 RDI: ff4880cf9731e698
> RBP: ff4880c840019400 R8: 0000000000000000 R9: c0000000ffff7fff
> R10: 0000000000000001 R11: ff650559444dfcb0 R12: ff4880c862647b00
> R13: ff4880c879326540 R14: 0000000000000000 R15: ff4880c9d81b8d48
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #7 [ff650559444dfe90] process_one_work at ffffffff98b19557
> #8 [ff650559444dfed8] worker_thread at ffffffff98b197e0
> #9 [ff650559444dff10] kthread at ffffffff98b20e04
> #10 [ff650559444dff50] ret_from_fork at ffffffff9960028f
>
> This bug seems to be in the rdma_cm module on the MOFED/kernel side, so we tried to reproduce the crash on an NVMe-oF node:
> 1. Connect the NVMe-oF disk: run "nvme connect -t rdma -n "nqn.2014-08.org.nvmexpress:67240ebd3fa63ca3" -a 192.168.122.30 -s 4421"
> 2. Run a dd write/read test case, for example, "dd if=/dev/nvme0n17 of=./test bs=32K count=102400 oflag=direct"
> 3. Trigger an ARP update: run "arp -s 192.168.122.112 10:71:fe:69:93:b8 && arp -d 192.168.122.112" on the NVMe-oF client.
> 4. The crash is reproduced.
>
> The issue may involve the following key points:
> 1. The RDMA module receives multiple network events simultaneously.
> 2. We have observed that during normal ARP updates, one or more events may be generated, making this issue probabilistic.
> 3. When both ARP update events and connection termination (conn disconnect) events are received at the same time, it triggers issue LU-18275.
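
To make the "list_del corruption. next->prev should be ..." message quoted above more concrete, here is a small user-space model of the kernel's circular doubly linked list and its list_del sanity check. It illustrates one hypothetical way overlapping events could corrupt a pending list: an entry is re-initialised and queued a second time while it is still linked. Whether that is what happens in rdma_cm is not established in this thread. The class and function names are illustrative only, and this is not the kernel code.

```python
class Node:
    """A list_head-like node; a fresh node points to itself."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def init_list_head(node):
    # Models INIT_LIST_HEAD(): forget any existing links.
    node.next = node
    node.prev = node


def list_add_tail(new, head):
    # Same pointer updates as the kernel helper, without debug checks.
    prev, nxt = head.prev, head
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new


def list_del(entry):
    # Models the two neighbour checks done by the list debug code.
    prev, nxt = entry.prev, entry.next
    if prev.next is not entry:
        raise RuntimeError(
            f"list_del corruption. prev->next should be {entry.name}, "
            f"but was {prev.next.name}")
    if nxt.prev is not entry:
        raise RuntimeError(
            f"list_del corruption. next->prev should be {entry.name}, "
            f"but was {nxt.prev.name}")
    nxt.prev = prev
    prev.next = nxt
    init_list_head(entry)


def requeue_while_pending():
    """Queue A, B, C; re-initialise and re-queue B while it is pending;
    then remove A, as a worker taking the first entry would."""
    head, a, b, c = Node("head"), Node("A"), Node("B"), Node("C")
    for node in (a, b, c):
        list_add_tail(node, head)
    init_list_head(b)        # second event resets B's links...
    list_add_tail(b, head)   # ...and queues it again
    try:
        list_del(a)
    except RuntimeError as err:
        return str(err)
    return "no corruption detected"


if __name__ == "__main__":
    print(requeue_while_pending())
```

In this model the second add itself looks valid from the tail's point of view, and the inconsistency only surfaces later when a neighbouring entry is removed. That would be consistent with a crash showing up in process_one_work rather than at the point where the event arrived, but again this is a sketch under the stated assumption, not a diagnosis.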