netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
@ 2025-05-12  6:37 Michael Chan
  2025-05-12 14:20 ` Stanislav Fomichev
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Chan @ 2025-05-12  6:37 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, kuba, pabeni, andrew+netdev, pavan.chebbi,
	andrew.gospodarek, sdf, Kalesh AP

RTNL assertion failed in netif_set_real_num_tx_queues() in the
error recovery path:

RTNL: assertion failed at net/core/dev.c (3178)
WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210

Call Trace:
 <TASK>
 ? __pfx_bnxt_msix+0x10/0x10 [bnxt_en]
 __bnxt_open_nic+0x1ef/0xb20 [bnxt_en]
 bnxt_open+0xda/0x130 [bnxt_en]
 bnxt_fw_reset_task+0x21f/0x780 [bnxt_en]
 process_scheduled_works+0x9d/0x400

Bring back the rtnl_lock() for now in bnxt_fw_reset_task().

Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 86a5de44b6f3..8df602663e0d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -14960,15 +14960,17 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		bp->fw_reset_state = BNXT_FW_RESET_STATE_OPENING;
 		fallthrough;
 	case BNXT_FW_RESET_STATE_OPENING:
-		while (!netdev_trylock(bp->dev)) {
+		while (!rtnl_trylock()) {
 			bnxt_queue_fw_reset_work(bp, HZ / 10);
 			return;
 		}
+		netdev_lock(bp->dev);
 		rc = bnxt_open(bp->dev);
 		if (rc) {
 			netdev_err(bp->dev, "bnxt_open() failed during FW reset\n");
 			bnxt_fw_reset_abort(bp, rc);
 			netdev_unlock(bp->dev);
+			rtnl_unlock();
 			goto ulp_start;
 		}
 
@@ -14988,6 +14990,7 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 			bnxt_dl_health_fw_status_update(bp, true);
 		}
 		netdev_unlock(bp->dev);
+		rtnl_unlock();
 		bnxt_ulp_start(bp, 0);
 		bnxt_reenable_sriov(bp);
 		netdev_lock(bp->dev);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
  2025-05-12  6:37 [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task() Michael Chan
@ 2025-05-12 14:20 ` Stanislav Fomichev
  2025-05-12 22:08   ` Michael Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-05-12 14:20 UTC (permalink / raw)
  To: Michael Chan
  Cc: davem, netdev, edumazet, kuba, pabeni, andrew+netdev,
	pavan.chebbi, andrew.gospodarek, sdf, Kalesh AP

On 05/11, Michael Chan wrote:
> RTNL assertion failed in netif_set_real_num_tx_queues() in the
> error recovery path:
> 
> RTNL: assertion failed at net/core/dev.c (3178)
> WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210
> 
> Call Trace:
>  <TASK>
>  ? __pfx_bnxt_msix+0x10/0x10 [bnxt_en]
>  __bnxt_open_nic+0x1ef/0xb20 [bnxt_en]
>  bnxt_open+0xda/0x130 [bnxt_en]
>  bnxt_fw_reset_task+0x21f/0x780 [bnxt_en]
>  process_scheduled_works+0x9d/0x400
> 
> Bring back the rtnl_lock() for now in bnxt_fw_reset_task().
> 
> Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL")
> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 86a5de44b6f3..8df602663e0d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -14960,15 +14960,17 @@ static void bnxt_fw_reset_task(struct work_struct *work)
>  		bp->fw_reset_state = BNXT_FW_RESET_STATE_OPENING;
>  		fallthrough;
>  	case BNXT_FW_RESET_STATE_OPENING:
> -		while (!netdev_trylock(bp->dev)) {
> +		while (!rtnl_trylock()) {
>  			bnxt_queue_fw_reset_work(bp, HZ / 10);
>  			return;
>  		}
> +		netdev_lock(bp->dev);
>  		rc = bnxt_open(bp->dev);
>  		if (rc) {
>  			netdev_err(bp->dev, "bnxt_open() failed during FW reset\n");
>  			bnxt_fw_reset_abort(bp, rc);
>  			netdev_unlock(bp->dev);
> +			rtnl_unlock();
>  			goto ulp_start;
>  		}
>  
> @@ -14988,6 +14990,7 @@ static void bnxt_fw_reset_task(struct work_struct *work)
>  			bnxt_dl_health_fw_status_update(bp, true);
>  		}
>  		netdev_unlock(bp->dev);
> +		rtnl_unlock();
>  		bnxt_ulp_start(bp, 0);
>  		bnxt_reenable_sriov(bp);
>  		netdev_lock(bp->dev);
> -- 
> 2.30.1
> 

Will the following work instead? netdev_ops_assert_locked should take
care of asserting either ops lock or rtnl lock depending on the device
properties.

diff --git a/net/core/dev.c b/net/core/dev.c
index c9013632296f..d8d29729c685 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3177,7 +3177,6 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
 
 	if (dev->reg_state == NETREG_REGISTERED ||
 	    dev->reg_state == NETREG_UNREGISTERING) {
-		ASSERT_RTNL();
 		netdev_ops_assert_locked(dev);
 
 		rc = netdev_queue_update_kobjects(dev, dev->real_num_tx_queues,
@@ -3227,7 +3226,6 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
 		return -EINVAL;
 
 	if (dev->reg_state == NETREG_REGISTERED) {
-		ASSERT_RTNL();
 		netdev_ops_assert_locked(dev);
 
 		rc = net_rx_queue_update_kobjects(dev, dev->real_num_rx_queues,

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
  2025-05-12 14:20 ` Stanislav Fomichev
@ 2025-05-12 22:08   ` Michael Chan
  2025-05-12 23:43     ` Stanislav Fomichev
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Chan @ 2025-05-12 22:08 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, netdev, edumazet, kuba, pabeni, andrew+netdev,
	pavan.chebbi, andrew.gospodarek, sdf, Kalesh AP

[-- Attachment #1: Type: text/plain, Size: 1397 bytes --]

On Mon, May 12, 2025 at 7:20 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> Will the following work instead? netdev_ops_assert_locked should take
> care of asserting either ops lock or rtnl lock depending on the device
> properties.

It works for netif_set_real_num_tx_queues() but I also need to replace
the ASSERT_RTNL() with netdev_ops_assert_locked(dev) in
__udp_tunnel_nic_reset_ntf().

With the additional change, it works well with repeated NIC resets.
Thanks for the suggestion.

>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c9013632296f..d8d29729c685 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3177,7 +3177,6 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
>
>         if (dev->reg_state == NETREG_REGISTERED ||
>             dev->reg_state == NETREG_UNREGISTERING) {
> -               ASSERT_RTNL();
>                 netdev_ops_assert_locked(dev);
>
>                 rc = netdev_queue_update_kobjects(dev, dev->real_num_tx_queues,
> @@ -3227,7 +3226,6 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
>                 return -EINVAL;
>
>         if (dev->reg_state == NETREG_REGISTERED) {
> -               ASSERT_RTNL();
>                 netdev_ops_assert_locked(dev);
>
>                 rc = net_rx_queue_update_kobjects(dev, dev->real_num_rx_queues,

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
  2025-05-12 22:08   ` Michael Chan
@ 2025-05-12 23:43     ` Stanislav Fomichev
  2025-05-13  0:26       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-05-12 23:43 UTC (permalink / raw)
  To: Michael Chan
  Cc: davem, netdev, edumazet, kuba, pabeni, andrew+netdev,
	pavan.chebbi, andrew.gospodarek, sdf, Kalesh AP

On 05/12, Michael Chan wrote:
> On Mon, May 12, 2025 at 7:20 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > Will the following work instead? netdev_ops_assert_locked should take
> > care of asserting either ops lock or rtnl lock depending on the device
> > properties.
> 
> It works for netif_set_real_num_tx_queues() but I also need to replace
> the ASSERT_RTNL() with netdev_ops_assert_locked(dev) in
> __udp_tunnel_nic_reset_ntf().

Sounds good!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
  2025-05-12 23:43     ` Stanislav Fomichev
@ 2025-05-13  0:26       ` Jakub Kicinski
  2025-05-13  1:14         ` Stanislav Fomichev
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-05-13  0:26 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Michael Chan, davem, netdev, edumazet, pabeni, andrew+netdev,
	pavan.chebbi, andrew.gospodarek, sdf, Kalesh AP

On Mon, 12 May 2025 16:43:12 -0700 Stanislav Fomichev wrote:
> On 05/12, Michael Chan wrote:
> > On Mon, May 12, 2025 at 7:20 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:  
> > > Will the following work instead? netdev_ops_assert_locked should take
> > > care of asserting either ops lock or rtnl lock depending on the device
> > > properties.  
> > 
> > It works for netif_set_real_num_tx_queues() but I also need to replace
> > the ASSERT_RTNL() with netdev_ops_assert_locked(dev) in
> > __udp_tunnel_nic_reset_ntf().  
> 
> Sounds good!

Mm... To me it sounds concerning. UDP tunnel port tracking doesn't have
any locks, it depends on RTNL. Are y'all sure we can just drop the
ASSERT_RTNL() and nothing will blow up? Or did I misunderstand?

I'd go with Michael's patch for net and revisit in net-next if you're
filling bold.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
  2025-05-13  0:26       ` Jakub Kicinski
@ 2025-05-13  1:14         ` Stanislav Fomichev
  2025-05-13  5:41           ` Michael Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-05-13  1:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Michael Chan, davem, netdev, edumazet, pabeni, andrew+netdev,
	pavan.chebbi, andrew.gospodarek, sdf, Kalesh AP

On 05/12, Jakub Kicinski wrote:
> On Mon, 12 May 2025 16:43:12 -0700 Stanislav Fomichev wrote:
> > On 05/12, Michael Chan wrote:
> > > On Mon, May 12, 2025 at 7:20 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:  
> > > > Will the following work instead? netdev_ops_assert_locked should take
> > > > care of asserting either ops lock or rtnl lock depending on the device
> > > > properties.  
> > > 
> > > It works for netif_set_real_num_tx_queues() but I also need to replace
> > > the ASSERT_RTNL() with netdev_ops_assert_locked(dev) in
> > > __udp_tunnel_nic_reset_ntf().  
> > 
> > Sounds good!
> 
> Mm... To me it sounds concerning. UDP tunnel port tracking doesn't have
> any locks, it depends on RTNL. Are y'all sure we can just drop the
> ASSERT_RTNL() and nothing will blow up? Or did I misunderstand?
> 
> I'd go with Michael's patch for net and revisit in net-next if you're
> filling bold.

Good point. But in this case, we need to cover more? bnxt_open from
bnxt_resume and the callers of bnxt_reset?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task()
  2025-05-13  1:14         ` Stanislav Fomichev
@ 2025-05-13  5:41           ` Michael Chan
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Chan @ 2025-05-13  5:41 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
	pavan.chebbi, andrew.gospodarek, sdf, Kalesh AP

[-- Attachment #1: Type: text/plain, Size: 543 bytes --]

On Mon, May 12, 2025 at 6:14 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 05/12, Jakub Kicinski wrote:
> > I'd go with Michael's patch for net and revisit in net-next if you're
> > filling bold.
>
> Good point. But in this case, we need to cover more? bnxt_open from
> bnxt_resume and the callers of bnxt_reset?

You are right.  We haven't tested suspend/resume, PCI AER, and TX
timeout yet and these will have the same problem.  Let me expand on
the original fix and get these code paths tested tomorrow.  Thanks.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-05-13  5:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-12  6:37 [PATCH net] bnxt_en: bring back rtnl_lock() in bnxt_fw_reset_task() Michael Chan
2025-05-12 14:20 ` Stanislav Fomichev
2025-05-12 22:08   ` Michael Chan
2025-05-12 23:43     ` Stanislav Fomichev
2025-05-13  0:26       ` Jakub Kicinski
2025-05-13  1:14         ` Stanislav Fomichev
2025-05-13  5:41           ` Michael Chan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).