netdev.vger.kernel.org archive mirror
* [BUG] net: mellanox: mlx4: possible deadlock in mlx4_xdp_set() and mlx4_en_reset_config()
@ 2022-02-07 15:16 Jia-Ju Bai
  2022-02-09 10:21 ` Tariq Toukan
  0 siblings, 1 reply; 3+ messages in thread
From: Jia-Ju Bai @ 2022-02-07 15:16 UTC (permalink / raw)
  To: tariqt, davem, kuba; +Cc: netdev, linux-rdma, linux-kernel

Hello,

My static analysis tool reports a possible deadlock in the mlx4 driver 
in Linux 5.16:

mlx4_xdp_set()
   mutex_lock(&mdev->state_lock); --> Line 2778 (Lock A)
   mlx4_en_try_alloc_resources()
     mlx4_en_alloc_resources()
       mlx4_en_destroy_tx_ring()
         mlx4_qp_free()
           wait_for_completion(&qp->free); --> Line 528 (Wait X)

mlx4_en_reset_config()
   mutex_lock(&mdev->state_lock); --> Line 3522 (Lock A)
   mlx4_en_try_alloc_resources()
     mlx4_en_alloc_resources()
       mlx4_en_destroy_tx_ring()
         mlx4_qp_free()
           complete(&qp->free); --> Line 527 (Wake X)

When mlx4_xdp_set() is executed, "Wait X" is performed while holding 
"Lock A". If mlx4_en_reset_config() is executed at this time, it cannot 
perform "Wake X" to wake up "Wait X" in mlx4_xdp_set(), because "Lock A" 
is already held by mlx4_xdp_set(), causing a possible deadlock.

I am not quite sure whether this possible problem is real and how to fix 
it if it is real.
Any feedback would be appreciated, thanks :)


Best wishes,
Jia-Ju Bai


* Re: [BUG] net: mellanox: mlx4: possible deadlock in mlx4_xdp_set() and mlx4_en_reset_config()
  2022-02-07 15:16 [BUG] net: mellanox: mlx4: possible deadlock in mlx4_xdp_set() and mlx4_en_reset_config() Jia-Ju Bai
@ 2022-02-09 10:21 ` Tariq Toukan
  2022-02-09 11:32   ` Jia-Ju Bai
  0 siblings, 1 reply; 3+ messages in thread
From: Tariq Toukan @ 2022-02-09 10:21 UTC (permalink / raw)
  To: Jia-Ju Bai, tariqt, davem, kuba; +Cc: netdev, linux-rdma, linux-kernel



On 2/7/2022 5:16 PM, Jia-Ju Bai wrote:
> Hello,
> 
> My static analysis tool reports a possible deadlock in the mlx4 driver 
> in Linux 5.16:
> 

Hi Jia-Ju,
Thanks for your email.

Which static analysis tool do you use? Is it a standard one?

> mlx4_xdp_set()
>    mutex_lock(&mdev->state_lock); --> Line 2778 (Lock A)
>    mlx4_en_try_alloc_resources()
>      mlx4_en_alloc_resources()
>        mlx4_en_destroy_tx_ring()
>          mlx4_qp_free()
>            wait_for_completion(&qp->free); --> Line 528 (Wait X)

The refcount_dec_and_test(&qp->refcount) in mlx4_qp_free() pairs with 
refcount_set(&qp->refcount, 1); in mlx4_qp_alloc().
mlx4_qp_event() increments the refcount before running 
qp->event(qp, event_type); and decrements it afterwards, to protect the 
qp from being freed while the handler runs.
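
For readers following the thread, the pairing described above can be sketched in userspace roughly as follows. This is a single-threaded model, not the driver code: struct qp_model and the qp_model_* functions are hypothetical names, and a plain int and bool stand in for the kernel's refcount_t and struct completion. Per the line numbers in the report, complete() (line 527) and wait_for_completion() (line 528) sit next to each other in mlx4_qp_free(), which is what qp_model_free() tries to mirror.

```c
/* Userspace model of the refcount/completion pairing; not kernel code.
 * All names here are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>

struct qp_model {
	int refcount;   /* stands in for refcount_t qp->refcount */
	bool free_done; /* stands in for struct completion qp->free */
};

/* Models mlx4_qp_alloc(): the initial reference is set to 1. */
static void qp_model_alloc(struct qp_model *qp)
{
	qp->refcount = 1;
	qp->free_done = false;
}

/* Drop one reference; the last put signals the completion. */
static void qp_model_put(struct qp_model *qp)
{
	if (--qp->refcount == 0)
		qp->free_done = true;
}

/* Models the start of mlx4_qp_event(): pin the qp while its handler runs. */
static void qp_model_event_begin(struct qp_model *qp)
{
	qp->refcount++;
}

/* Models the end of mlx4_qp_event(): unpin after the handler returns. */
static void qp_model_event_end(struct qp_model *qp)
{
	qp_model_put(qp);
}

/* Models mlx4_qp_free(): drop the initial reference, then report whether
 * the wait that follows would return immediately (true) or would block
 * until an in-flight event handler drops its reference (false). */
static bool qp_model_free(struct qp_model *qp)
{
	qp_model_put(qp);
	return qp->free_done;
}
```

In this model, a free with no event in flight completes on its own, and a free that races with an event handler only waits for that handler to drop its reference on the same qp.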

> 
> mlx4_en_reset_config()
>    mutex_lock(&mdev->state_lock); --> Line 3522 (Lock A)
>    mlx4_en_try_alloc_resources()
>      mlx4_en_alloc_resources()
>        mlx4_en_destroy_tx_ring()
>          mlx4_qp_free()
>            complete(&qp->free); --> Line 527 (Wake X)
> 
> When mlx4_xdp_set() is executed, "Wait X" is performed while holding "Lock 
> A". If mlx4_en_reset_config() is executed at this time, it cannot 
> perform "Wake X" to wake up "Wait X" in mlx4_xdp_set(), because "Lock A" 
> is already held by mlx4_xdp_set(), causing a possible deadlock.
> 
> I am not quite sure whether this possible problem is real and how to fix 
> it if it is real.
> Any feedback would be appreciated, thanks :)
> 

Not possible.
These are two different qps, each maintaining its own instance of the 
refcount and the completion, following the behavior I described above.
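
As an illustration of the point about separate instances, here is a minimal single-threaded sketch. It is an assumption-laden model, not the driver code: struct qp_model, qp_model_alloc(), qp_model_free() and reconfigure_under_lock() are hypothetical names, a bool stands in for mdev->state_lock, and the model cannot demonstrate real concurrency. It only shows that each free path depends on the state of its own qp and does not need the other path to run.

```c
/* Userspace model of two independent qps; not kernel code.
 * All names here are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>

struct qp_model {
	int refcount;   /* stands in for refcount_t qp->refcount */
	bool free_done; /* stands in for struct completion qp->free */
};

/* Each qp starts with its own reference count and its own completion. */
static void qp_model_alloc(struct qp_model *qp)
{
	qp->refcount = 1;
	qp->free_done = false;
}

/* Drop the initial reference and signal this qp's completion if it was
 * the last one. Returns true if the wait that follows would not block. */
static bool qp_model_free(struct qp_model *qp)
{
	if (--qp->refcount == 0)
		qp->free_done = true;
	return qp->free_done;
}

/* Models either reconfiguration path: take the lock, free one qp,
 * release the lock. Returns true if the free finished while the lock
 * was held, without help from any other lock holder. */
static bool reconfigure_under_lock(bool *state_lock, struct qp_model *qp)
{
	bool finished;

	assert(!*state_lock); /* models mutex_lock(&mdev->state_lock) */
	*state_lock = true;
	finished = qp_model_free(qp);
	*state_lock = false;  /* models mutex_unlock(&mdev->state_lock) */
	return finished;
}
```

Freeing one qp in this model leaves the other qp's refcount and completion untouched, so neither path waits on a wake-up that only the other path could deliver.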
> 
> Best wishes,
> Jia-Ju Bai

Thanks,
Tariq


* Re: [BUG] net: mellanox: mlx4: possible deadlock in mlx4_xdp_set() and mlx4_en_reset_config()
  2022-02-09 10:21 ` Tariq Toukan
@ 2022-02-09 11:32   ` Jia-Ju Bai
  0 siblings, 0 replies; 3+ messages in thread
From: Jia-Ju Bai @ 2022-02-09 11:32 UTC (permalink / raw)
  To: Tariq Toukan, tariqt, davem, kuba; +Cc: netdev, linux-rdma, linux-kernel



On 2022/2/9 18:21, Tariq Toukan wrote:
>
>
> On 2/7/2022 5:16 PM, Jia-Ju Bai wrote:
>> Hello,
>>
>> My static analysis tool reports a possible deadlock in the mlx4 
>> driver in Linux 5.16:
>>
>
> Hi Jia-Ju,
> Thanks for your email.
>
> Which static analysis tool do you use? Is it a standard one?

Hi Tariq,

Thanks for the reply and explanation :)
I developed this tool by myself, based on LLVM.

>
>> mlx4_xdp_set()
>>    mutex_lock(&mdev->state_lock); --> Line 2778 (Lock A)
>>    mlx4_en_try_alloc_resources()
>>      mlx4_en_alloc_resources()
>>        mlx4_en_destroy_tx_ring()
>>          mlx4_qp_free()
>>            wait_for_completion(&qp->free); --> Line 528 (Wait X)
>
> The refcount_dec_and_test(&qp->refcount) in mlx4_qp_free() pairs with 
> refcount_set(&qp->refcount, 1); in mlx4_qp_alloc().
> mlx4_qp_event() increments the refcount before running 
> qp->event(qp, event_type); and decrements it afterwards, to protect the 
> qp from being freed while the handler runs.
>
>>
>> mlx4_en_reset_config()
>>    mutex_lock(&mdev->state_lock); --> Line 3522 (Lock A)
>>    mlx4_en_try_alloc_resources()
>>      mlx4_en_alloc_resources()
>>        mlx4_en_destroy_tx_ring()
>>          mlx4_qp_free()
>>            complete(&qp->free); --> Line 527 (Wake X)
>>
>> When mlx4_xdp_set() is executed, "Wait X" is performed while holding 
>> "Lock A". If mlx4_en_reset_config() is executed at this time, it 
>> cannot perform "Wake X" to wake up "Wait X" in mlx4_xdp_set(), because 
>> "Lock A" is already held by mlx4_xdp_set(), causing a possible 
>> deadlock.
>>
>> I am not quite sure whether this possible problem is real and how to 
>> fix it if it is real.
>> Any feedback would be appreciated, thanks :)
>>
>
> Not possible.
> These are two different qps, each maintaining its own instance of the 
> refcount and the completion, following the behavior I described above.

Okay, "these are two different qps" should be the reason for this false 
positive, and my tool cannot identify this in static analysis...


Best wishes,
Jia-Ju Bai
