Netdev List
* bnxt_en: suspicious RCU usage in bnxt_fw_reset_task() qdisc path in 7.1-rc6
@ 2026-06-02 15:33 Breno Leitao
  2026-06-02 17:49 ` Stanislav Fomichev
  2026-06-02 17:54 ` Jakub Kicinski
  0 siblings, 2 replies; 4+ messages in thread
From: Breno Leitao @ 2026-06-02 15:33 UTC
  To: michael.chan, pavan.chebbi, stfomichev, kuba
  Cc: kernel-team, netdev, linux-kernel

I am hitting a "suspicious RCU usage" lockdep splat from bnxt_en
during a firmware reset on v7.1-rc6 (e43ffb69e043) with PROVE_RCU
and PROVE_LOCKING enabled.

The firmware reset path re-opens the device from bnxt_fw_reset_task()
holding only the netdev instance lock (&dev->lock).  bnxt_open() ->
__bnxt_open_nic() -> bnxt_set_real_num_queues() ->
netif_set_real_num_tx_queues() then walks the qdisc tree in
dev_qdisc_change_real_num_tx(), which still dereferences dev->qdisc
with rtnl_dereference() and therefore expects rtnl_lock to be held:

  void dev_qdisc_change_real_num_tx(struct net_device *dev,
                                    unsigned int new_real_tx)
  {
        struct Qdisc *qdisc = rtnl_dereference(dev->qdisc);
        ...
  }

Since only the instance lock is held in this path, lockdep complains.
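
To make the failing check concrete, here is a minimal user-space sketch. In
the kernel, rtnl_dereference() wraps rcu_dereference_protected() with a
condition that tests whether rtnl is held. Every name prefixed with model_
below is hypothetical and exists only for this illustration; none of it is
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's definitions. */
struct model_locks {
	bool rtnl_held;      /* stands in for rtnl_lock */
	bool instance_held;  /* stands in for the netdev instance lock */
};

struct model_dev {
	int *qdisc;          /* stands in for dev->qdisc */
	int splats;          /* counts simulated "suspicious RCU usage" warnings */
};

/* Models rcu_dereference_protected(p, cond): warn when cond is false. */
static int *model_deref_protected(struct model_dev *dev, bool cond)
{
	assert(dev != NULL);
	if (!cond)
		dev->splats++;   /* lockdep would print the splat here */
	return dev->qdisc;       /* the pointer is still returned either way */
}

/* Models rtnl_dereference(): the condition looks only at rtnl. */
static int *model_rtnl_dereference(struct model_dev *dev,
				   const struct model_locks *held)
{
	return model_deref_protected(dev, held->rtnl_held);
}
```

In this model, holding the instance lock makes no difference to the check,
which matches the splat: &dev->lock appears in the held-locks list, yet the
warning still fires.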

Splat
-----

  =============================
  WARNING: suspicious RCU usage
  7.1.0 #1 Tainted: G            E
  -----------------------------
  net/sched/sch_generic.c:1416 suspicious rcu_dereference_protected() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  3 locks held by kworker/u208:1/13:
   #0: ((wq_completion)bnxt_pf_wq){+.+.}-{0:0}, at: process_scheduled_works+0x8f7/0x13b0
   #1: ((work_completion)(&(&bp->fw_reset_task)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x917/0x13b0
   #2: (&dev->lock){+.+.}-{4:4}, at: bnxt_fw_reset_task+0x7ed/0x1e80
  stack backtrace:
  CPU: 38 UID: 0 PID: 13 Comm: kworker/u208:1 Tainted: G            E
  Hardware name: Wiwynn Delta Lake MP/Delta Lake-Class1, BIOS Y3DL405 11/21/2025
  Workqueue: bnxt_pf_wq bnxt_fw_reset_task
   <TASK>
   dump_stack_lvl+0x69/0xa0
   lockdep_rcu_suspicious+0x13f/0x1d0
   dev_qdisc_change_real_num_tx+0x54/0xe0
   netif_set_real_num_tx_queues+0x4ed/0xa80
   __bnxt_open_nic+0x9cb/0x3490
   ? bnxt_hwrm_if_change+0x4fd/0x620
   bnxt_open+0x1cb/0x370
   bnxt_fw_reset_task+0x80d/0x1e80
   process_scheduled_works+0x9c1/0x13b0
   worker_thread+0x90d/0xd20
   kthread+0x320/0x3f0
   ret_from_fork+0x2b6/0xb00
   ret_from_fork_asm+0x11/0x20
   </TASK>

Bisect / analysis
-----------------

This looks like a regression from:

  850d9248d2ea ("Revert "bnxt_en: bring back rtnl_lock() in the
                  bnxt_open() path"")

That revert dropped rtnl_lock() from the bnxt_open() paths and left
bnxt_fw_reset_task() holding only the instance lock around bnxt_open().

I can send a patch restoring rtnl_lock() around the bnxt_open() call in
bnxt_fw_reset_task() (re-taking rtnl before the instance lock, matching the
pre-revert code), but I wanted to report it first to make sure I am heading in
the right direction.
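
The lock ordering in that proposal can be sketched in user space as follows.
This is a model under stated assumptions, not the driver change itself: all
model_ names are hypothetical, and the rule that rtnl is the outer lock is
taken from the "rtnl before the instance lock" wording above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two locks and of what lockdep would report. */
struct model_state {
	bool rtnl_held;
	bool instance_held;
	int splats;            /* simulated "suspicious RCU usage" warnings */
	int order_violations;  /* rtnl taken while the instance lock is held */
};

static void model_rtnl_lock(struct model_state *s)
{
	assert(!s->rtnl_held);
	if (s->instance_held)
		s->order_violations++;  /* assumed rule: rtnl is the outer lock */
	s->rtnl_held = true;
}

static void model_rtnl_unlock(struct model_state *s)
{
	assert(s->rtnl_held);
	s->rtnl_held = false;
}

static void model_instance_lock(struct model_state *s)
{
	assert(!s->instance_held);
	s->instance_held = true;
}

static void model_instance_unlock(struct model_state *s)
{
	assert(s->instance_held);
	s->instance_held = false;
}

/* Stands in for the open path that ends in the rtnl-checked dereference. */
static void model_open(struct model_state *s)
{
	assert(s->instance_held);
	if (!s->rtnl_held)
		s->splats++;
}

/* Reset path as reported: only the instance lock is held. */
static void model_reset_instance_only(struct model_state *s)
{
	model_instance_lock(s);
	model_open(s);
	model_instance_unlock(s);
}

/* Proposed reset path: rtnl first, then the instance lock. */
static void model_reset_rtnl_then_instance(struct model_state *s)
{
	model_rtnl_lock(s);
	model_instance_lock(s);
	model_open(s);
	model_instance_unlock(s);
	model_rtnl_unlock(s);
}
```

Running the first variant on a zeroed state records one simulated splat; the
second records none and no ordering violation.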

Please let me know how you would like to proceed.

Thanks,
Breno



* Re: bnxt_en: suspicious RCU usage in bnxt_fw_reset_task() qdisc path in 7.1-rc6
From: Stanislav Fomichev @ 2026-06-02 17:49 UTC
  To: Breno Leitao
  Cc: michael.chan, pavan.chebbi, stfomichev, kuba, kernel-team, netdev,
	linux-kernel

On 06/02, Breno Leitao wrote:
> I can send a patch restoring rtnl_lock() around the bnxt_open() call in
> bnxt_fw_reset_task() (re-taking rtnl before the instance lock, matching the
> pre-revert code), but I wanted to report it first to make sure I am in the
> right direction.
>
> Please let me know how you would like to proceed.

I hope (but need to verify) that at this point most of the paths that assign
qdisc are under the ops lock as well. So the alternative might be to use
netdev_ops_lock_dereference() from Jakub's recent posting:
https://lore.kernel.org/netdev/20260528231637.251822-1-kuba@kernel.org/t/#m3de56e1d53f86b2c1b12ca29d1617dd6635e78cb
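
As a rough illustration of that alternative, the user-space sketch below
contrasts a dereference check tied to rtnl with one tied to the ops lock. It
assumes, purely for illustration, that a helper like
netdev_ops_lock_dereference() is satisfied by the ops (instance) lock; the
helper's real definition is not shown in this thread, and every model_ name
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the kernel's types or helpers. */
struct model_locks {
	bool rtnl_held;
	bool ops_held;       /* stands in for the netdev ops/instance lock */
};

struct model_dev {
	int *qdisc;          /* stands in for dev->qdisc */
	int splats;          /* simulated lockdep warnings */
};

/* Shared body: warn when the caller-supplied condition is false. */
static int *model_deref(struct model_dev *dev, bool cond)
{
	assert(dev != NULL);
	if (!cond)
		dev->splats++;
	return dev->qdisc;
}

/* Check tied to rtnl, as in the current code path. */
static int *model_rtnl_dereference(struct model_dev *dev,
				   const struct model_locks *held)
{
	return model_deref(dev, held->rtnl_held);
}

/* ASSUMPTION: a check tied to the ops lock instead of rtnl. */
static int *model_ops_lock_dereference(struct model_dev *dev,
				       const struct model_locks *held)
{
	return model_deref(dev, held->ops_held);
}
```

Swapping the check only silences the warning. It is safe only if every
writer of the pointer also holds the ops lock, which is the part that still
needs verifying.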


* Re: bnxt_en: suspicious RCU usage in bnxt_fw_reset_task() qdisc path in 7.1-rc6
From: Jakub Kicinski @ 2026-06-02 17:54 UTC
  To: Breno Leitao
  Cc: michael.chan, pavan.chebbi, stfomichev, kernel-team, netdev,
	linux-kernel

On Tue, 2 Jun 2026 08:33:31 -0700 Breno Leitao wrote:
> This looks like a regression from:
> 
>   850d9248d2ea ("Revert "bnxt_en: bring back rtnl_lock() in the
>                   bnxt_open() path"")
> 
> That revert dropped rtnl_lock() from the bnxt_open() paths and left
> bnxt_fw_reset_task() holding only the instance lock around bnxt_open().
> 
> I can send a patch restoring rtnl_lock() around the bnxt_open() call in
> bnxt_fw_reset_task() (re-taking rtnl before the instance lock, matching the
> pre-revert code), but I wanted to report it first to make sure I am in the
> right direction.

Yes, we need to revert the revert. Per my recent ethtool locking series,
netif_set_real_num_tx_queues() needs rtnl_lock for now.


* Re: bnxt_en: suspicious RCU usage in bnxt_fw_reset_task() qdisc path in 7.1-rc6
From: Stanislav Fomichev @ 2026-06-02 18:53 UTC
  To: Breno Leitao
  Cc: michael.chan, pavan.chebbi, stfomichev, kuba, kernel-team, netdev,
	linux-kernel

On 06/02, Stanislav Fomichev wrote:
> I hope (but need to verify) that at this point most of the paths that assign
> qdisc are under the ops lock as well. So the alternative might be to use
> netdev_ops_lock_dereference() from Jakub's recent posting:
> https://lore.kernel.org/netdev/20260528231637.251822-1-kuba@kernel.org/t/#m3de56e1d53f86b2c1b12ca29d1617dd6635e78cb

Agree with Jakub, this is more than just the assignment :-(
