* [bug report] rdma: rtnl_lock deadlock?
From: Jiangyiwen @ 2019-08-07 2:21 UTC
To: bvanassche, dledford, jgg
Cc: linux-rdma, target-devel, yebiaoxiang, Xiexiangyou
Hello,
I found a scenario that may cause an rtnl_lock deadlock, as follows:
1. CPU1 takes rtnl_lock and waits for a kworker to finish.
CPU1 takes rtnl_lock before calling unregister_netdevice_queue() and
then, in srpt_remove_one(), waits for sport->work (function
srpt_refresh_port_work) to finish.
[<0>] __switch_to+0x94/0xe8
[<0>] __flush_work+0x128/0x280
[<0>] __cancel_work_timer+0x13c/0x1b0
[<0>] cancel_work_sync+0x24/0x30
[<0>] srpt_remove_one+0xf0/0x530 [ib_srpt]
[<0>] ib_unregister_device+0x124/0x230 [ib_core]
[<0>] rxe_unregister_device+0x30/0x40 [rdma_rxe]
[<0>] rxe_remove+0x20/0x50 [rdma_rxe]
[<0>] rxe_notify+0xe8/0x150 [rdma_rxe]
[<0>] notifier_call_chain+0x5c/0xa0
[<0>] raw_notifier_call_chain+0x3c/0x50
[<0>] call_netdevice_notifiers_info+0x3c/0x80
[<0>] rollback_registered_many+0x35c/0x568
[<0>] rollback_registered+0x68/0xb0
[<0>] unregister_netdevice_queue+0xc0/0x110
[<0>] __tun_detach+0x25c/0x2a0 [tun]
[<0>] tun_chr_close+0x30/0x60 [tun]
[<0>] __fput+0xa4/0x1e0
[<0>] ____fput+0x20/0x30
[<0>] task_work_run+0xc0/0xf8
[<0>] do_notify_resume+0x12c/0x138
[<0>] work_pending+0x8/0x10
[<0>] 0xffffffffffffffff
2. CPU2 runs sport->work and waits for rxe->usdev_lock.
CPU2 runs the work item (sport->work, function srpt_refresh_port_work)
and waits for rxe->usdev_lock in rxe_query_port().
[<0>] __switch_to+0x94/0xe8
[<0>] rxe_query_port+0x6c/0xd0 [rdma_rxe]
[<0>] ib_query_port+0x84/0x120 [ib_core]
[<0>] srpt_refresh_port+0xa4/0x1b8 [ib_srpt]
[<0>] srpt_refresh_port_work+0x20/0x30 [ib_srpt]
[<0>] process_one_work+0x1b4/0x3f8
[<0>] worker_thread+0x54/0x470
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x18
[<0>] 0xffffffffffffffff
3. CPU3 takes rxe->usdev_lock and waits for rtnl_lock.
CPU3 runs the ib_cache_task work, takes rxe->usdev_lock, and then
waits for rtnl_lock to be released.
[<0>] __switch_to+0x94/0xe8
[<0>] rtnl_lock+0x1c/0x28
[<0>] ib_get_eth_speed+0x78/0x1c0 [ib_core]
[<0>] rxe_query_port+0x80/0xd0 [rdma_rxe]
[<0>] ib_query_port+0x84/0x120 [ib_core]
[<0>] ib_cache_update.part.7+0x74/0x388 [ib_core]
[<0>] ib_cache_task+0x68/0x80 [ib_core]
[<0>] process_one_work+0x1b4/0x3f8
[<0>] worker_thread+0x54/0x470
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x18
[<0>] 0xffffffffffffffff
So a deadlock results: CPU1 waits for the work on CPU2 to finish,
CPU2 waits for CPU3 to release rxe->usdev_lock, and CPU3 waits
for CPU1 to release rtnl_lock.
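The circular wait described above can be written down as a small wait-for graph. This is only an illustrative model built from the three stack traces, not kernel code; the `WAITS_FOR` table and the `find_cycle` helper are hypothetical names introduced for the sketch.

```python
# Illustrative model only: each entry maps a waiter to
# (the resource it is blocked on, the CPU that holds or runs that resource).
# Resource and function names are taken from the stack traces above.
WAITS_FOR = {
    "CPU1": ("sport->work", "CPU2"),      # cancel_work_sync() in srpt_remove_one()
    "CPU2": ("rxe->usdev_lock", "CPU3"),  # blocked in rxe_query_port()
    "CPU3": ("rtnl_lock", "CPU1"),        # blocked in ib_get_eth_speed()
}


def find_cycle(waits_for, start):
    """Follow waiter -> holder edges from `start`.

    Returns the cycle as a list of waiters (first element repeated at the
    end), or None if the chain ends without revisiting a waiter.
    """
    path = []
    seen = set()
    node = start
    while node in waits_for and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for[node][1]  # move to whoever holds the resource
    if node in seen:
        return path[path.index(node):] + [node]
    return None


cycle = find_cycle(WAITS_FOR, "CPU1")
```

Removing any one edge from the table (for example, if rxe_query_port() did not need rtnl_lock while holding rxe->usdev_lock) makes `find_cycle` return None, which is the property any fix has to establish.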
I don't know how to solve it.
Thanks,
Yiwen.
* Re: [bug report] rdma: rtnl_lock deadlock?
From: Jason Gunthorpe @ 2019-08-07 12:10 UTC
To: Jiangyiwen
Cc: bvanassche, dledford, linux-rdma, target-devel, yebiaoxiang,
Xiexiangyou
On Wed, Aug 07, 2019 at 10:21:11AM +0800, Jiangyiwen wrote:
> Hello,
>
> I found a scenario that may cause an rtnl_lock deadlock, as follows:
>
> 1. CPU1 takes rtnl_lock and waits for a kworker to finish.
> CPU1 takes rtnl_lock before calling unregister_netdevice_queue() and
> then, in srpt_remove_one(), waits for sport->work (function
> srpt_refresh_port_work) to finish.
This is an old kernel; this issue has been fixed.
Jason
* Re: [bug report] rdma: rtnl_lock deadlock?
From: Jiangyiwen @ 2019-08-08 1:59 UTC
To: Jason Gunthorpe
Cc: bvanassche, dledford, linux-rdma, target-devel, yebiaoxiang,
Xiexiangyou
Hi Jason,
On 2019/8/7 20:10, Jason Gunthorpe wrote:
> On Wed, Aug 07, 2019 at 10:21:11AM +0800, Jiangyiwen wrote:
>> Hello,
>>
>> I found a scenario that may cause an rtnl_lock deadlock, as follows:
>>
>> 1. CPU1 takes rtnl_lock and waits for a kworker to finish.
>> CPU1 takes rtnl_lock before calling unregister_netdevice_queue() and
>> then, in srpt_remove_one(), waits for sport->work (function
>> srpt_refresh_port_work) to finish.
> This is an old kernel; this issue has been fixed.
>
> Jason
>
Thank you for your reply. Can you tell me the commit id?
The kernel version I am using is Linux 4.19.36.
Thanks,
Yiwen.