* [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Lin Ma @ 2024-12-20 15:32 UTC
  To: jgg, leon, cmeiohas, michaelgur, linux-rdma, linux-kernel

Hello maintainers,

Our fuzzer identified an interesting reentrancy bug that can hang the
kernel. The crash log is shown below:

[   32.616575][ T2983]
[   32.617000][ T2983] ============================================
[   32.617879][ T2983] WARNING: possible recursive locking detected
[   32.618759][ T2983] 6.1.70 #1 Not tainted
[   32.619362][ T2983] --------------------------------------------
[   32.620248][ T2983] hello.elf/2983 is trying to acquire lock:
[   32.621084][ T2983] ffffffff91978ff8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv+0x30f/0x990
[   32.624234][ T2983]
[   32.624234][ T2983] but task is already holding lock:
[   32.625237][ T2983] ffffffff91978ff8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv+0x30f/0x990
[   32.626562][ T2983]
[   32.626562][ T2983] other info that might help us debug this:
[   32.627648][ T2983]  Possible unsafe locking scenario:
[   32.627648][ T2983]
[   32.633708][ T2983]        CPU0
[   32.634184][ T2983]        ----
[   32.634646][ T2983]   lock(&rdma_nl_types[idx].sem);
[   32.635433][ T2983]   lock(&rdma_nl_types[idx].sem);
[   32.636155][ T2983]
[   32.636155][ T2983]  *** DEADLOCK ***
[   32.636155][ T2983]
[   32.637236][ T2983]  May be due to missing lock nesting notation
[   32.637236][ T2983]
[   32.638408][ T2983] 2 locks held by hello.elf/2983:
[   32.639135][ T2983]  #0: ffffffff91978ff8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv+0x30f/0x990
[   32.640605][ T2983]  #1: ffff888103f8f690 (nlk_cb_mutex-RDMA){+.+.}-{3:3}, at: netlink_dump+0xd3/0xc60
[   32.641981][ T2983]
[   32.641981][ T2983] stack backtrace:
[   32.642833][ T2983] CPU: 0 PID: 2983 Comm: hello.elf Not tainted 6.1.70 #1
[   32.643830][ T2983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   32.645243][ T2983] Call Trace:
[   32.645735][ T2983]  <TASK>
[   32.646197][ T2983]  dump_stack_lvl+0x177/0x231
[   32.646901][ T2983]  ? nf_tcp_handle_invalid+0x605/0x605
[   32.647705][ T2983]  ? panic+0x725/0x725
[   32.648350][ T2983]  validate_chain+0x4dd0/0x6010
[   32.649080][ T2983]  ? reacquire_held_locks+0x5a0/0x5a0
[   32.649864][ T2983]  ? mark_lock+0x94/0x320
[   32.650506][ T2983]  ? lockdep_hardirqs_on_prepare+0x3fd/0x760
[   32.651376][ T2983]  ? print_irqtrace_events+0x210/0x210
[   32.652182][ T2983]  ? mark_lock+0x94/0x320
[   32.652825][ T2983]  __lock_acquire+0x12ad/0x2010
[   32.653541][ T2983]  lock_acquire+0x1b4/0x490
[   32.654211][ T2983]  ? rdma_nl_rcv+0x30f/0x990
[   32.654891][ T2983]  ? __might_sleep+0xd0/0xd0
[   32.655569][ T2983]  ? __lock_acquire+0x12ad/0x2010
[   32.656316][ T2983]  ? read_lock_is_recursive+0x10/0x10
[   32.657109][ T2983]  down_read+0x42/0x2d0
[   32.657723][ T2983]  ? rdma_nl_rcv+0x30f/0x990
[   32.658400][ T2983]  rdma_nl_rcv+0x30f/0x990
[   32.659132][ T2983]  ? rdma_nl_net_init+0x160/0x160
[   32.659847][ T2983]  ? netlink_lookup+0x30/0x200
[   32.660519][ T2983]  ? __netlink_lookup+0x2a/0x6d0
[   32.661214][ T2983]  ? netlink_lookup+0x30/0x200
[   32.661880][ T2983]  ? netlink_lookup+0x30/0x200
[   32.662545][ T2983]  netlink_unicast+0x74b/0x8c0
[   32.663215][ T2983]  rdma_nl_unicast+0x4b/0x60
[   32.663852][ T2983]  iwpm_send_hello+0x1d8/0x350
[   32.664525][ T2983]  ? iwpm_mapinfo_available+0x130/0x130
[   32.665295][ T2983]  ? iwpm_parse_nlmsg+0x124/0x260
[   32.665995][ T2983]  iwpm_hello_cb+0x1e1/0x2e0
[   32.666638][ T2983]  ? netlink_dump+0x236/0xc60
[   32.667294][ T2983]  ? iwpm_mapping_error_cb+0x3e0/0x3e0
[   32.668064][ T2983]  netlink_dump+0x592/0xc60
[   32.668706][ T2983]  ? netlink_lookup+0x200/0x200
[   32.669381][ T2983]  ? __netlink_lookup+0x2a/0x6d0
[   32.670073][ T2983]  ? netlink_lookup+0x30/0x200
[   32.670731][ T2983]  ? netlink_lookup+0x30/0x200
[   32.671411][ T2983]  __netlink_dump_start+0x54e/0x710
[   32.672220][ T2983]  rdma_nl_rcv+0x753/0x990
[   32.672846][ T2983]  ? rdma_nl_net_init+0x160/0x160
[   32.673538][ T2983]  ? iwpm_mapping_error_cb+0x3e0/0x3e0
[   32.674316][ T2983]  ? netlink_deliver_tap+0x2e/0x1b0
[   32.675106][ T2983]  ? net_generic+0x1e/0x240
[   32.675778][ T2983]  ? netlink_deliver_tap+0x2e/0x1b0
[   32.676553][ T2983]  netlink_unicast+0x74b/0x8c0
[   32.677262][ T2983]  netlink_sendmsg+0x882/0xb90
[   32.677969][ T2983]  ? netlink_getsockopt+0x550/0x550
[   32.678732][ T2983]  ? aa_sock_msg_perm+0x94/0x150
[   32.679465][ T2983]  ? bpf_lsm_socket_sendmsg+0x5/0x10
[   32.680243][ T2983]  ? security_socket_sendmsg+0x7c/0xa0
[   32.681048][ T2983]  __sys_sendto+0x456/0x5b0
[   32.681724][ T2983]  ? __ia32_sys_getpeername+0x80/0x80
[   32.682510][ T2983]  ? __lock_acquire+0x2010/0x2010
[   32.683241][ T2983]  ? lockdep_hardirqs_on_prepare+0x3fd/0x760
[   32.684134][ T2983]  ? fd_install+0x5c/0x4f0
[   32.684794][ T2983]  ? print_irqtrace_events+0x210/0x210
[   32.685608][ T2983]  __x64_sys_sendto+0xda/0xf0
[   32.686298][ T2983]  do_syscall_64+0x45/0x90
[   32.686955][ T2983]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   32.687810][ T2983] RIP: 0033:0x440624
[   32.691944][ T2983] RSP: 002b:00007ffc82f3ee48 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   32.693136][ T2983] RAX: ffffffffffffffda RBX: 0000000000400400 RCX: 0000000000440624
[   32.694264][ T2983] RDX: 0000000000000018 RSI: 00007ffc82f3ee80 RDI: 0000000000000003
[   32.695387][ T2983] RBP: 00007ffc82f3fe90 R08: 000000000047df08 R09: 000000000000000c
[   32.696621][ T2983] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403990
[   32.697693][ T2983] R13: 0000000000000000 R14: 00000000006a6018 R15: 0000000000000000
[   32.698774][ T2983]  </TASK>

In a nutshell, the callback function for the RDMA_NL_IWPM_HELLO command,
iwpm_hello_cb, can itself call rdma_nl_unicast(), re-entering the netlink
receive path; these repeated calls may cause a deadlock and potentially
harm the kernel.

I am not familiar with the internal workings of the callback mechanism or
with how iwpmd uses it, so I am uncertain whether this reentrancy is
expected behavior. If it is, perhaps a reference counter should be used
instead of an rw_semaphore, as sketched below. If not, a proper
reentrancy check should be implemented.
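
Purely for illustration, the reference-counter idea could take a shape
like the sketch below. All names here are made up, and it glosses over
details such as re-checking cb_table after the increment:

static atomic_t nl_users = ATOMIC_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(nl_drain_wq);

/* receive path: reentrant by construction; a nested hello message
 * sent from a dump callback simply increments the count again */
static void nl_dispatch_sketch(void)
{
	atomic_inc(&nl_users);
	/* ... look up cb_table and run the handler ... */
	if (atomic_dec_and_test(&nl_users))
		wake_up(&nl_drain_wq);
}

/* unregister path: unpublish the table, then wait for in-flight
 * dispatchers to drain instead of taking the semaphore in write mode */
static void nl_unregister_sketch(void)
{
	/* rdma_nl_types[idx].cb_table = NULL; */
	wait_event(nl_drain_wq, atomic_read(&nl_users) == 0);
}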

Regards,
Lin


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Leon Romanovsky @ 2024-12-24  9:29 UTC
  To: Lin Ma; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

On Fri, Dec 20, 2024 at 11:32:34PM +0800, Lin Ma wrote:
> Hello maintainers,
> 
> Our fuzzer identified an interesting reentrancy bug that can hang the
> kernel. The crash log is shown below:
>
> [...]
>
> In a nutshell, the callback function for the RDMA_NL_IWPM_HELLO command,
> iwpm_hello_cb, can itself call rdma_nl_unicast(), re-entering the netlink
> receive path; these repeated calls may cause a deadlock and potentially
> harm the kernel.
> 
> I am not familiar with the internal workings of the callback mechanism or
> with how iwpmd uses it, so I am uncertain whether this reentrancy is
> expected behavior. If it is, perhaps a reference counter should be used
> instead of an rw_semaphore. If not, a proper reentrancy check should be
> implemented.

I don't fully understand the lockdep report here. We use down_read(),
which is reentrancy-safe.

Thanks

> 
> Regards,
> Lin


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Lin Ma @ 2024-12-24 10:51 UTC
  To: Leon Romanovsky; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

Hello Leon,

> 
> I don't fully understand the lockdep report here. We use down_read(),
> which is reentrancy-safe.
> 

Really? To my knowledge, down_read() itself will not trigger a locking
error, but the scenario below will lead to a deadlock, and that is why
this WARNING is raised.

   CPU0                CPU1
   ----                ----
down_read()[1]
                  down_write()[2]
down_read()[3]

If the CPU1 thread does not exist, CPU0 will run smoothly (however, it
will keep looping, and the PoC cannot be killed by any signal, causing a
denial of service).

When CPU1 calls down_write() at [2], it waits for the read lock taken at
[1] to be released. Readers queue behind the pending writer, so the
nested down_read() at [3] in turn waits for [2] to complete, leading to
a deadlock.
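
For what it's worth, the same ordering can be reproduced in userspace
with a writer-preferring rwlock (a GNU extension); the kernel's
rw_semaphore queues readers behind a pending writer in the same way.
A minimal sketch, assuming glibc:

// gcc rwdemo.c -o rwdemo -lpthread
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock;

static void *writer(void *arg)
{
    (void)arg;
    sleep(1);                     /* let main() take [1] first */
    pthread_rwlock_wrlock(&lock); /* [2] blocks until [1] is released */
    pthread_rwlock_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_rwlockattr_t attr;
    pthread_t t;

    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setkind_np(&attr,
            PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(&lock, &attr);

    pthread_create(&t, NULL, writer, NULL);
    pthread_rwlock_rdlock(&lock);  /* [1] */
    sleep(2);                      /* writer is now queued at [2] */
    pthread_rwlock_rdlock(&lock);  /* [3] queues behind [2]: deadlock */
    fprintf(stderr, "never reached\n");
    return 0;
}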

Please let me know whether I have understood this correctly.

Thanks,
Lin


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Leon Romanovsky @ 2024-12-24 14:11 UTC
  To: Lin Ma; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

On Tue, Dec 24, 2024 at 06:51:27PM +0800, Lin Ma wrote:
> Hello Leon,
> 
> > 
> > I don't fully understand the lockdep report here. We use down_read(),
> > which is reentrancy-safe.
> > 
> 
> Really? To my knowledge, down_read() itself will not trigger a locking
> error, but the scenario below will lead to a deadlock, and that is why
> this WARNING is raised.
> 
>    CPU0                CPU1
>    ----                ----
> down_read()[1]
>                   down_write()[2]
> down_read()[3]
> 
> If the CPU1 thread does not exist, CPU0 will run smoothly (however, it
> will keep looping, and the PoC cannot be killed by any signal, causing a
> denial of service).
> 
> When CPU1 calls down_write() at [2], it waits for the read lock taken at
> [1] to be released. Readers queue behind the pending writer, so the
> nested down_read() at [3] in turn waits for [2] to complete, leading to
> a deadlock.
> 
> Please let me know whether I have understood this correctly.

The thing is that down_write() is called only when we are unregistering
the module that handles these netlink messages. It shouldn't happen in
practice.

Thanks

> 
> Thanks,
> Lin


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Lin Ma @ 2024-12-24 16:16 UTC
  To: Leon Romanovsky; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

Hello Leon,

> > Please let me know whether I have understood this correctly.
> 
> The thing is that down_write() is called only when we are unregistering
> the module that handles these netlink messages. It shouldn't happen in
> practice.
> 

I acknowledge that this is a low-probability event. However, the race
condition still exists; otherwise, these read and write semaphores would
not be necessary at all. Why not just remove all of them, then?

Moreover, I find that even without the deadlock, this reentrant message
hangs the kernel in a state that cannot be killed, with logs like the
ones below (lockdep disabled, tested on the latest Ubuntu kernel):

[2187983.899998] INFO: task poc.elf:1717021 blocked for more than 122 seconds.
[2187983.900049]       Not tainted 6.8.0-49-generic #49~22.04.1-Ubuntu
[2187983.900057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2187983.900063] task:poc.elf       state:D stack:0     pid:1717021 tgid:1717021 ppid:1716834 flags:0x00004006
[2187983.900087] Call Trace:
[2187983.900094]  <TASK>
[2187983.900355]  __schedule+0x27c/0x6a0
[2187983.900430]  schedule+0x33/0x110
[2187983.900442]  schedule_preempt_disabled+0x15/0x30
[2187983.900454]  __mutex_lock.constprop.0+0x3f8/0x7a0
[2187983.900476]  __mutex_lock_slowpath+0x13/0x20
[2187983.900486]  mutex_lock+0x3c/0x50
[2187983.900493]  __netlink_dump_start+0x76/0x2a0
[2187983.900552]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.900673]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.900699]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.900802]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.900898]  netlink_unicast+0x1b0/0x2a0
[2187983.900911]  rdma_nl_unicast+0x49/0x70 [ib_core]
[2187983.901005]  iwpm_send_hello+0xfd/0x150 [iw_cm]
[2187983.901030]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
[2187983.901052]  netlink_dump+0x1c0/0x340
[2187983.901065]  __netlink_dump_start+0x1ef/0x2a0
[2187983.901077]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.901219]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.901245]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.901344]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.901437]  netlink_unicast+0x1b0/0x2a0
[2187983.901449]  rdma_nl_unicast+0x49/0x70 [ib_core]
[2187983.901544]  iwpm_send_hello+0xfd/0x150 [iw_cm]
[2187983.901567]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
[2187983.901589]  netlink_dump+0x1c0/0x340
[2187983.901602]  __netlink_dump_start+0x1ef/0x2a0
[2187983.901613]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.901707]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.901731]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.901830]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.901922]  netlink_unicast+0x1b0/0x2a0
[2187983.901933]  netlink_sendmsg+0x214/0x470
[2187983.901946]  __sys_sendto+0x21b/0x230
[2187983.901992]  __x64_sys_sendto+0x24/0x40
[2187983.902002]  x64_sys_call+0x1fc0/0x24b0
[2187983.902023]  do_syscall_64+0x81/0x170
[2187983.902059]  ? security_file_alloc+0x5f/0xf0
[2187983.902079]  ? alloc_empty_file+0x85/0x130
[2187983.902140]  ? alloc_file+0x9b/0x170
[2187983.902150]  ? alloc_file_pseudo+0x9e/0x100
[2187983.902163]  ? restore_fpregs_from_fpstate+0x3d/0xd0
[2187983.902197]  ? switch_fpu_return+0x55/0xf0
[2187983.902208]  ? syscall_exit_to_user_mode+0x83/0x260
[2187983.902229]  ? do_syscall_64+0x8d/0x170
[2187983.902240]  ? irqentry_exit+0x43/0x50
[2187983.902249]  ? clear_bhb_loop+0x15/0x70
[2187983.902293]  ? clear_bhb_loop+0x15/0x70
[2187983.902302]  ? clear_bhb_loop+0x15/0x70
[2187983.902311]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[2187983.902319] RIP: 0033:0x440624
[2187983.902582] RSP: 002b:00007ffcfa4b29f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[2187983.902592] RAX: ffffffffffffffda RBX: 0000000000400400 RCX: 0000000000440624
[2187983.902598] RDX: 0000000000000018 RSI: 00007ffcfa4b2a30 RDI: 0000000000000003
[2187983.902604] RBP: 00007ffcfa4b3a40 R08: 000000000047df08 R09: 000000000000000c
[2187983.902609] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403990
[2187983.902614] R13: 0000000000000000 R14: 00000000006a6018 R15: 0000000000000000

That is why I am quite sure this is a bug that requires fixing.
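
In essence, once the nesting gets deep enough, the task blocks on a
mutex that is already held further up its own call stack, and
mutex_lock() sleeps in TASK_UNINTERRUPTIBLE, which is why no signal can
kill it. An illustrative reduction (not the actual kernel functions):

static DEFINE_MUTEX(cb_mutex);          /* stands in for nlk_cb_mutex */

static void hello_cb(void);

static void dump_start(void)            /* __netlink_dump_start(), roughly */
{
	mutex_lock(&cb_mutex);          /* re-entry blocks here, in D state */
	hello_cb();                     /* callback runs with the mutex held */
	mutex_unlock(&cb_mutex);
}

static void hello_cb(void)              /* iwpm_hello_cb(), roughly */
{
	/* iwpm_send_hello() -> rdma_nl_unicast() loops the reply back
	 * into the receive path, which starts another dump */
	dump_start();
}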

Thanks
Lin


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Leon Romanovsky @ 2024-12-24 19:26 UTC
  To: Lin Ma; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

On Wed, Dec 25, 2024 at 12:16:45AM +0800, Lin Ma wrote:
> Hello Leon,
> 
> > > Please let me know whether I have understood this correctly.
> > 
> > The thing is that down_write() is called only when we are unregistering
> > the module that handles these netlink messages. It shouldn't happen in
> > practice.
> > 
> 
> I acknowledge that this is a low-probability event. However, the race
> condition still exists; otherwise, these read and write semaphores would
> not be necessary at all. Why not just remove all of them, then?

Netlink input and module removal are different paths that can run in
parallel; it is exactly this race that the semaphore protects against.
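
(For anyone following along, the protected pattern is roughly the one
below, paraphrased from drivers/infiniband/core/netlink.c; run_callback()
stands in for the dispatch logic, and details vary by kernel version.)

/* receive path, per message (rdma_nl_rcv_msg(), roughly) */
down_read(&rdma_nl_types[index].sem);
cb_table = rdma_nl_types[index].cb_table;
if (cb_table)   /* may already be NULL after unregister */
	err = run_callback(cb_table, op, skb, nlh);
up_read(&rdma_nl_types[index].sem);

/* module unregister path (rdma_nl_unregister(), roughly) */
down_write(&rdma_nl_types[index].sem);
rdma_nl_types[index].cb_table = NULL;
up_write(&rdma_nl_types[index].sem);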

Do you have a reproducer for that?

> [...]


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Lin Ma @ 2024-12-25  1:58 UTC
  To: Leon Romanovsky; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 156 bytes --]


> 
> Do you have a reproducer for that?
> 

Yep, I have attached the PoC code; please enable CONFIG_INFINIBAND
for testing.

Thanks
By the way, Merry Christmas~


[-- Attachment #2: poc.c --]
[-- Type: text/plain, Size: 1584 bytes --]

// gcc poc.c -static -o poc.elf -lmnl
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#include <libmnl/libmnl.h>

#define PAGE_SIZE 0x1000
// helpers and constants below are mirrored from the kernel uapi header
// include/uapi/rdma/rdma_netlink.h
#define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
#define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
#define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
#define RDMA_NL_IWCM (2)
#define IWPM_NLA_HELLO_ABI_VERSION (1)

enum
{
    RDMA_NL_IWPM_REG_PID = 0,
    RDMA_NL_IWPM_ADD_MAPPING,
    RDMA_NL_IWPM_QUERY_MAPPING,
    RDMA_NL_IWPM_REMOVE_MAPPING,
    RDMA_NL_IWPM_REMOTE_INFO,
    RDMA_NL_IWPM_HANDLE_ERR,
    RDMA_NL_IWPM_MAPINFO,
    RDMA_NL_IWPM_MAPINFO_NUM,
    RDMA_NL_IWPM_HELLO,
    RDMA_NL_IWPM_NUM_OPS
};

int main(int argc, char const *argv[])
{
    struct mnl_socket *sock;
    struct nlmsghdr *nlh;
    char buf[PAGE_SIZE];
    int err;

    sock = mnl_socket_open(NETLINK_RDMA);
    if (sock == NULL)
    {
        perror("mnl_socket_open");
        exit(-1);
    }

    // build a RDMA_NL_IWPM_HELLO request; the kernel handles this op via
    // the netlink dump machinery, which is where the reentrancy starts
    nlh = mnl_nlmsg_put_header(buf);
    nlh->nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_IWCM, RDMA_NL_IWPM_HELLO);
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    nlh->nlmsg_seq = 1;
    nlh->nlmsg_pid = 0;

    // static const struct nla_policy hello_policy[IWPM_NLA_HELLO_MAX] = {
    //     [IWPM_NLA_HELLO_ABI_VERSION]     = { .type = NLA_U16 }
    // };
    mnl_attr_put_u16(nlh, IWPM_NLA_HELLO_ABI_VERSION, 3); // expected by iwpm_parse_nlmsg()

    // the kernel processes the message synchronously in our context, so
    // this call never returns once the receive path self-deadlocks
    err = mnl_socket_sendto(sock, buf, nlh->nlmsg_len);
    if (err < 0)
    {
        perror("mnl_socket_sendto");
        exit(-1);
    }
    return 0;
}


* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Leon Romanovsky @ 2024-12-30 18:28 UTC
  To: Lin Ma; +Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

On Wed, Dec 25, 2024 at 09:58:35AM +0800, Lin Ma wrote:
> 
> > 
> > Do you have a reproducer for that?
> > 
> 
> Yep, I have attached the PoC code; please enable CONFIG_INFINIBAND
> for testing.

Thanks a lot for the repro. I wonder why the iWARP folks never complained
about it. Anyway, I have a local fix, but I need to test it before
sending; I will do so after the New Year holidays.

Thanks again.

> [...]



* Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Leon Romanovsky @ 2025-01-08 15:14 UTC
  To: Lin Ma, Potnuri Bharat Teja, Bernard Metzler, Michal Kalderon
  Cc: jgg, cmeiohas, michaelgur, linux-rdma, linux-kernel

On Mon, Dec 30, 2024 at 08:28:28PM +0200, Leon Romanovsky wrote:
> On Wed, Dec 25, 2024 at 09:58:35AM +0800, Lin Ma wrote:
> > 
> > > 
> > > Do you have a reproducer for that?
> > > 
> > 
> > Yep, I have attached the PoC code; please enable CONFIG_INFINIBAND
> > for testing.
> 
> Thanks a lot for the repro. I wonder why the iWARP folks never complained
> about it. Anyway, I have a local fix, but I need to test it before
> sending; I will do so after the New Year holidays.

I was wrong; there is no simple fix for this issue.
The root cause of these lockdep warnings is nested locking in iWARP:
IWCM uses dump callbacks as if they were doit ones. See the following
FIXME:

  184         /* FIXME: Convert IWCM to properly handle doit callbacks */
  185         if ((nlh->nlmsg_flags & NLM_F_DUMP) || index == RDMA_NL_IWCM) {
  186                 struct netlink_dump_control c = {
  187                         .dump = cb_table[op].dump,
  188                 };
  189                 if (c.dump)
  190                         err = netlink_dump_start(skb->sk, skb, nlh, &c);

In our case,
 cb_table[op].dump ->
 	iwpm_hello_cb ->
		iwpm_send_hello ->
			rdma_nl_unicast() <---- this should not be called from a dump callback.

The right and only viable solution is to convert all of IWCM to use
.doit callbacks. Would any iWARP developer/user volunteer for such a
conversion?
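
To make the ask concrete, the registration side of such a conversion
might look roughly like the sketch below; iwpm_hello_doit is
hypothetical, and the real work is rewriting each iwpm_*_cb handler to
take (skb, nlh, extack) instead of a struct netlink_callback:

/* today: hello is wired up as a dump callback */
[RDMA_NL_IWPM_HELLO] = { .dump = iwpm_hello_cb },

/* after the conversion: a plain doit handler, dispatched directly from
 * rdma_nl_rcv_msg() without netlink_dump_start() or nlk_cb_mutex, so a
 * reply sent from inside it no longer re-enters the dump machinery */
[RDMA_NL_IWPM_HELLO] = { .doit = iwpm_hello_doit },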

Thanks

> [...]
