Re: [BUG] rdma_cm: unable to handle kernel NULL pointer dereference in process_one_work when disconnect

Linux RDMA and InfiniBand development

From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: "xiyan@cestc.cn" <xiyan@cestc.cn>,
	linux-rdma <linux-rdma@vger.kernel.org>
Cc: markzhang <markzhang@nvidia.com>, phaddad <phaddad@nvidia.com>,
	leon <leon@kernel.org>, "yuan.liu" <yuan.liu@cestc.cn>,
	zhangsiyao <zhangsiyao@cestc.cn>, peizhiwei <peizhiwei@cestc.cn>,
	eaujames <eaujames@ddn.com>, ssmirnov <ssmirnov@whamcloud.com>
Subject: Re: [BUG] rdma_cm: unable to handle kernel NULL pointer dereference in process_one_work when disconnect
Date: Tue, 17 Dec 2024 13:50:36 +0100
Message-ID: <a3bbcb1d-00bc-4218-85da-291bf368770d@linux.dev>
In-Reply-To: <2024121711472494997716@cestc.cn>

On 17.12.24 04:47, xiyan@cestc.cn wrote:
> Hello RDMA Community,
> While testing the RoCEv2 feature of the Lustre file system, we encountered a crash related to ARP updates. Preliminary analysis suggests that the issue may be kernel-related, and it is also observed in an NVMe-oF environment. We would appreciate your assistance. Detailed information on the issue (LU-18364) is below.
> Thanks.
> 
> The Lustre client and server are deployed in VMs. The VMs use the network card in PF pass-through mode.
> 【OS】
> VM Version: qemu-kvm-7.0.0
> OS Version: Rocky 8.10
> Kernel Version: 4.18.0-553.el8_10.x86_64
> 
> 【Network Card】
> Client:
> MLX CX6 1*100G RoCE v2
> MLNX_OFED_LINUX-23.10-3.2.2.0-rhel8.10-x86_64
> firmware-version: 22.35.2000 (MT_0000000359)
> 
> Server:
> MLX CX6 1*100G RoCE v2
> MLNX_OFED_LINUX-23.10-3.2.2.0-rhel8.10-x86_64
> firmware-version: 22.35.2000 (MT_0000000359)
> 
> 【Kernel Commit】
> [PATCH rdma-next v2 2/2] RDMA/core: Add a netevent notifier to cma - Leon Romanovsky
> https://lore.kernel.org/all/bb255c9e301cd50b905663b8e73f7f5133d0e4c5.1654601342.git.leonro@nvidia.com/
> 
> 【Lustre Issue】
> LU-18364: https://jira.whamcloud.com/browse/LU-18364
> LU-18275: https://jira.whamcloud.com/browse/LU-18275
> 
> 【Problem Reproduction Steps】
> We have found a reliable way to reproduce the crash:
> 1. Use only one network card, without bonding.
> 2. Run a vdbench read/write test case on the Lustre client.
> 3. Construct an ARP update for a Lustre server IP address on the Lustre client.
>
> For example, if the Lustre client IP is 192.168.122.220 and the Lustre server IP is 192.168.122.115, run "arp -s 192.168.122.115 10:71:fc:69:92:b8 && arp -d 192.168.122.115" on 192.168.122.220, where 10:71:fc:69:92:b8 is a wrong MAC address.
> 
> The crash stack is below:
>        KERNEL: /usr/lib/debug/lib/modules/4.18.0-553.el8_10.x86_64/vmlinux  [TAINTED]
>      DUMPFILE: vmcore  [PARTIAL DUMP]
>          CPUS: 20
>          DATE: Tue Dec  3 14:58:41 CST 2024
>        UPTIME: 00:06:20
> LOAD AVERAGE: 10.14, 2.56, 0.86
>         TASKS: 1076
>      NODENAME: rocky8vm3
>       RELEASE: 4.18.0-553.el8_10.x86_64
>       VERSION: #1 SMP Fri May 24 13:05:10 UTC 2024
>       MACHINE: x86_64  (2599 Mhz)
>        MEMORY: 31.4 GB
>         PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
>           PID: 607
>       COMMAND: "kworker/u40:28"
>          TASK: ff1e34360b6e0000  [THREAD_INFO: ff1e34360b6e0000]
>           CPU: 1
>         STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 607      TASK: ff1e34360b6e0000  CPU: 1    COMMAND: "kworker/u40:28"
>   #0 [ff4de14b444cbbc0] machine_kexec at ffffffff9c46f2d3
>   #1 [ff4de14b444cbc18] __crash_kexec at ffffffff9c5baa5a
>   #2 [ff4de14b444cbcd8] crash_kexec at ffffffff9c5bb991
>   #3 [ff4de14b444cbcf0] oops_end at ffffffff9c42d811
>   #4 [ff4de14b444cbd10] no_context at ffffffff9c481cf3
>   #5 [ff4de14b444cbd68] __bad_area_nosemaphore at ffffffff9c48206c
>   #6 [ff4de14b444cbdb0] do_page_fault at ffffffff9c482cf7
>   #7 [ff4de14b444cbde0] page_fault at ffffffff9d0011ae
>      [exception RIP: process_one_work+46]
>      RIP: ffffffff9c51944e  RSP: ff4de14b444cbe98  RFLAGS: 00010046
>      RAX: 0000000000000000  RBX: ff1e34360734b5d8  RCX: dead000000000200
>      RDX: 000000010001393f  RSI: ff1e34360734b5d8  RDI: ff1e343ca7eed5c0
>      RBP: ff1e343600019400   R8: ff1e343d37c73bb8   R9: 0000005885358800
>      R10: 0000000000000000  R11: ff1e343d37c71dc4  R12: 0000000000000000
>      R13: ff1e343600019420  R14: ff1e3436000194d0  R15: ff1e343ca7eed5c0
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #8 [ff4de14b444cbed8] worker_thread at ffffffff9c5197e0
>   #9 [ff4de14b444cbf10] kthread at ffffffff9c520e04
> #10 [ff4de14b444cbf50] ret_from_fork at ffffffff9d00028f
> 
> Another stack is below:
> [ 1656.060089] list_del corruption. next->prev should be ff4880c9d81b8d48, but was ff4880ccfb2d45e0

This looks like a memory corruption problem. The root cause of this
kind of corruption can be very hard to pin down. Since you can
reproduce the problem, eBPF tools may help you find the root cause.

Zhu Yanjun
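
As one possible starting point for the eBPF suggestion, the sketch below prints a small bpftrace program that logs each work item when it is queued and when it starts executing, so that events for the same work pointer can be correlated around the ARP update. It is an untested sketch: it assumes bpftrace is installed and that the running kernel exposes the workqueue:workqueue_queue_work and workqueue:workqueue_execute_start tracepoints. The helper name wq_trace_program and the output format are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a bpftrace program that logs workqueue activity.
# Assumes the workqueue tracepoints expose the "work" and "function" fields.
wq_trace_program() {
    cat <<'EOF'
tracepoint:workqueue:workqueue_queue_work
{
    printf("%s queue work=0x%llx fn=%s\n", comm, (uint64)args->work, ksym(args->function));
}

tracepoint:workqueue:workqueue_execute_start
{
    printf("%s exec  work=0x%llx fn=%s\n", comm, (uint64)args->work, ksym(args->function));
}
EOF
}
```

To try it, save the program with "wq_trace_program > wq.bt" and run "bpftrace wq.bt" as root while reproducing the problem. The output is likely to be very verbose on a busy system, so filtering on the handler name may be needed.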

> [ 1656.060536] ------------[ cut here ]------------
> [ 1656.060538] kernel BUG at lib/list_debug.c:56!
> [ 1656.060738] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1656.060872] CPU: 4 PID: 606 Comm: kworker/u40:27 Kdump: loaded Tainted: GF          OE     -------- -  - 4.18.0-553.el8_10.x86_64 #1
> [ 1656.061130] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module+el8.9.0+1408+7b966129 04/01/2014
> [ 1656.061261] Workqueue: mlx5_cmd_0000:11:00.0 cmd_work_handler [mlx5_core]
> [ 1656.061457] RIP: 0010:__list_del_entry_valid.cold.1+0x20/0x48
> [ 1656.061586] Code: 45 d4 99 e8 5e 52 c7 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 00 46 d4 99 e8 4a 52 c7 ff 0f 0b 48 c7 c7 b0 46 d4 99 e8 3c 52 c7 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 70 46 d4 99 e8 28 52 c7 ff 0f 0b
> [ 1656.061846] RSP: 0018:ff650559444dfe90 EFLAGS: 00010046
> [ 1656.061974] RAX: 0000000000000054 RBX: ff4880c9d81b8d40 RCX: 0000000000000000
> [ 1656.062103] RDX: 0000000000000000 RSI: ff4880cf9731e698 RDI: ff4880cf9731e698
> [ 1656.062238] RBP: ff4880c840019400 R08: 0000000000000000 R09: c0000000ffff7fff
> [ 1656.062366] R10: 0000000000000001 R11: ff650559444dfcb0 R12: ff4880c862647b00
> [ 1656.062492] R13: ff4880c879326540 R14: 0000000000000000 R15: ff4880c9d81b8d48
> [ 1656.062619] FS:  0000000000000000(0000) GS:ff4880cf97300000(0000) knlGS:0000000000000000
> [ 1656.062745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1656.062868] CR2: 000055cc1af6b000 CR3: 000000084b610006 CR4: 0000000000771ee0
> [ 1656.062996] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1656.063127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1656.063250] PKRU: 55555554
>        KERNEL: /usr/lib/debug/lib/modules/4.18.0-553.el8_10.x86_64/vmlinux  [TAINTED]
>      DUMPFILE: vmcore  [PARTIAL DUMP]
>          CPUS: 20
>          DATE: Fri Nov 29 17:37:31 CST 2024
>        UPTIME: 00:27:35
> LOAD AVERAGE: 350.47, 237.79, 163.91
>         TASKS: 1106
>      NODENAME: rocky8vm3
>       RELEASE: 4.18.0-553.el8_10.x86_64
>       VERSION: #1 SMP Fri May 24 13:05:10 UTC 2024
>       MACHINE: x86_64  (2599 Mhz)
>        MEMORY: 31.4 GB
>         PANIC: "kernel BUG at lib/list_debug.c:56!"
>           PID: 606
>       COMMAND: "kworker/u40:27"
>          TASK: ff4880c8793f8000  [THREAD_INFO: ff4880c8793f8000]
>           CPU: 4
>         STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 606      TASK: ff4880c8793f8000  CPU: 4    COMMAND: "kworker/u40:27"
>   #0 [ff650559444dfc28] machine_kexec at ffffffff98a6f2d3
>   #1 [ff650559444dfc80] __crash_kexec at ffffffff98bbaa5a
>   #2 [ff650559444dfd40] crash_kexec at ffffffff98bbb991
>   #3 [ff650559444dfd58] oops_end at ffffffff98a2d811
>   #4 [ff650559444dfd78] do_trap at ffffffff98a29a27
>   #5 [ff650559444dfdc0] do_invalid_op at ffffffff98a2a766
>   #6 [ff650559444dfde0] invalid_op at ffffffff99600da4
>      [exception RIP: __list_del_entry_valid.cold.1+32]
>      RIP: ffffffff98ef8f98  RSP: ff650559444dfe90  RFLAGS: 00010046
>      RAX: 0000000000000054  RBX: ff4880c9d81b8d40  RCX: 0000000000000000
>      RDX: 0000000000000000  RSI: ff4880cf9731e698  RDI: ff4880cf9731e698
>      RBP: ff4880c840019400   R8: 0000000000000000   R9: c0000000ffff7fff
>      R10: 0000000000000001  R11: ff650559444dfcb0  R12: ff4880c862647b00
>      R13: ff4880c879326540  R14: 0000000000000000  R15: ff4880c9d81b8d48
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #7 [ff650559444dfe90] process_one_work at ffffffff98b19557
>   #8 [ff650559444dfed8] worker_thread at ffffffff98b197e0
>   #9 [ff650559444dff10] kthread at ffffffff98b20e04
> #10 [ff650559444dff50] ret_from_fork at ffffffff9960028f
> 
> This bug appears to be in the rdma_cm module on the MOFED/kernel side, so we tried to reproduce the crash on an NVMe-oF node:
> 1. Connect the NVMe-oF disk: run "nvme connect -t rdma -n "nqn.2014-08.org.nvmexpress:67240ebd3fa63ca3" -a 192.168.122.30 -s 4421".
> 2. Run a dd write/read test case, for example "dd if=/dev/nvme0n17 of=./test bs=32K count=102400 oflag=direct".
> 3. Construct an ARP update: run "arp -s 192.168.122.112 10:71:fe:69:93:b8 && arp -d 192.168.122.112" on the NVMe-oF client.
> 4. The crash is reproduced.
> 
> The issue may involve the following key points:
> 1. The RDMA module receives multiple network events simultaneously.
> 2. We have observed that during normal ARP updates, one or more events may be generated, making this issue probabilistic.
> 3. When both ARP update events and connection termination (conn disconnect) events are received at the same time, it triggers issue LU-18275.
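
The ARP update step used in both reproduction procedures above can be wrapped in a small helper. This is a minimal sketch: only the two arp commands come from the report, while the function names arp_flap and arp_flap_repeat and the DRY_RUN switch are hypothetical. Because the report describes the issue as probabilistic, the second function simply repeats the update a given number of times.

```shell
#!/bin/sh
# Sketch of the ARP update from the reproduction steps: install a bogus static
# neighbour entry for the peer, then delete it again.
# DRY_RUN defaults to 1, so the commands are only printed unless DRY_RUN=0.

arp_flap() {
    peer_ip=$1      # address of the Lustre server or NVMe-oF target
    bogus_mac=$2    # deliberately wrong MAC address
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "arp -s $peer_ip $bogus_mac"
        echo "arp -d $peer_ip"
    else
        # Needs root and disrupts traffic to the peer.
        arp -s "$peer_ip" "$bogus_mac" && arp -d "$peer_ip"
    fi
}

# Repeat the update COUNT times, stopping at the first failure.
arp_flap_repeat() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        arp_flap "$@" || return 1
        i=$((i + 1))
    done
}
```

For example, "arp_flap_repeat 10 192.168.122.115 10:71:fc:69:92:b8" prints the ten command pairs, and setting DRY_RUN=0 in the environment executes them instead. Run it only on a test client while the I/O workload is active.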

Thread overview: 3+ messages
2024-12-17  3:47 [BUG] rdma_cm: unable to handle kernel NULL pointer dereference in process_one_work when disconnect xiyan
2024-12-17 12:50 ` Zhu Yanjun [this message]
     [not found] <2024121711374552593113@cestc.cn>
2024-12-19 11:34 ` Leon Romanovsky
