netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
@ 2024-08-23  9:44 Souradeep Chakrabarti
  2024-08-23 21:12 ` Haiyang Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Souradeep Chakrabarti @ 2024-08-23  9:44 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, yury.norov, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma
  Cc: schakrabarti, Souradeep Chakrabarti

Currently napi_disable() gets called during rxq and txq cleanup,
even before napi is enabled and hrtimer is initialized. It causes
kernel panic.

? page_fault_oops+0x136/0x2b0
  ? page_counter_cancel+0x2e/0x80
  ? do_user_addr_fault+0x2f2/0x640
  ? refill_obj_stock+0xc4/0x110
  ? exc_page_fault+0x71/0x160
  ? asm_exc_page_fault+0x27/0x30
  ? __mmdrop+0x10/0x180
  ? __mmdrop+0xec/0x180
  ? hrtimer_active+0xd/0x50
  hrtimer_try_to_cancel+0x2c/0xf0
  hrtimer_cancel+0x15/0x30
  napi_disable+0x65/0x90
  mana_destroy_rxq+0x4c/0x2f0
  mana_create_rxq.isra.0+0x56c/0x6d0
  ? mana_uncfg_vport+0x50/0x50
  mana_alloc_queues+0x21b/0x320
  ? skb_dequeue+0x5f/0x80

Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V2 -> V1:
Addressed the comment on cleaning up napi for the queues,
where queue creation was successful.
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 22 +++++++++++--------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 39f56973746d..7448085fd49e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1872,10 +1872,11 @@ static void mana_destroy_txq(struct mana_port_context *apc)
 
 	for (i = 0; i < apc->num_queues; i++) {
 		napi = &apc->tx_qp[i].tx_cq.napi;
-		napi_synchronize(napi);
-		napi_disable(napi);
-		netif_napi_del(napi);
-
+		if (napi->dev == apc->ndev) {
+			napi_synchronize(napi);
+			napi_disable(napi);
+			netif_napi_del(napi);
+		}
 		mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);
 
 		mana_deinit_cq(apc, &apc->tx_qp[i].tx_cq);
@@ -2023,14 +2024,17 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
 
 	napi = &rxq->rx_cq.napi;
 
-	if (validate_state)
-		napi_synchronize(napi);
+	if (napi->dev == apc->ndev) {
 
-	napi_disable(napi);
+		if (validate_state)
+			napi_synchronize(napi);
 
-	xdp_rxq_info_unreg(&rxq->xdp_rxq);
+		napi_disable(napi);
 
-	netif_napi_del(napi);
+		netif_napi_del(napi);
+	}
+
+	xdp_rxq_info_unreg(&rxq->xdp_rxq);
 
 	mana_destroy_wq_obj(apc, GDMA_RQ, rxq->rxobj);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
  2024-08-23  9:44 [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup Souradeep Chakrabarti
@ 2024-08-23 21:12 ` Haiyang Zhang
  2024-08-24  3:52 ` Shradha Gupta
  2024-08-27 20:26 ` Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Haiyang Zhang @ 2024-08-23 21:12 UTC (permalink / raw)
  To: Souradeep Chakrabarti, KY Srinivasan, wei.liu@kernel.org,
	Dexuan Cui, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Long Li, yury.norov@gmail.com,
	leon@kernel.org, cai.huoqing@linux.dev,
	ssengar@linux.microsoft.com, vkuznets@redhat.com,
	tglx@linutronix.de, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Souradeep Chakrabarti



> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> Sent: Friday, August 23, 2024 5:44 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>;
> yury.norov@gmail.com; leon@kernel.org; cai.huoqing@linux.dev;
> ssengar@linux.microsoft.com; vkuznets@redhat.com; tglx@linutronix.de;
> linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: Souradeep Chakrabarti <schakrabarti@microsoft.com>; Souradeep
> Chakrabarti <schakrabarti@linux.microsoft.com>
> Subject: [PATCH V2 net] net: mana: Fix error handling in
> mana_create_txq/rxq's NAPI cleanup
> 
> Currently napi_disable() gets called during rxq and txq cleanup,
> even before napi is enabled and hrtimer is initialized. It causes
> kernel panic.
> 
> ? page_fault_oops+0x136/0x2b0
>   ? page_counter_cancel+0x2e/0x80
>   ? do_user_addr_fault+0x2f2/0x640
>   ? refill_obj_stock+0xc4/0x110
>   ? exc_page_fault+0x71/0x160
>   ? asm_exc_page_fault+0x27/0x30
>   ? __mmdrop+0x10/0x180
>   ? __mmdrop+0xec/0x180
>   ? hrtimer_active+0xd/0x50
>   hrtimer_try_to_cancel+0x2c/0xf0
>   hrtimer_cancel+0x15/0x30
>   napi_disable+0x65/0x90
>   mana_destroy_rxq+0x4c/0x2f0
>   mana_create_rxq.isra.0+0x56c/0x6d0
>   ? mana_uncfg_vport+0x50/0x50
>   mana_alloc_queues+0x21b/0x320
>   ? skb_dequeue+0x5f/0x80
> 
> Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

Please add:
Cc: stable@vger.kernel.org

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
  2024-08-23  9:44 [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup Souradeep Chakrabarti
  2024-08-23 21:12 ` Haiyang Zhang
@ 2024-08-24  3:52 ` Shradha Gupta
  2024-08-27 20:26 ` Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Shradha Gupta @ 2024-08-24  3:52 UTC (permalink / raw)
  To: Souradeep Chakrabarti
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, yury.norov, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, schakrabarti

On Fri, Aug 23, 2024 at 02:44:29AM -0700, Souradeep Chakrabarti wrote:
> Currently napi_disable() gets called during rxq and txq cleanup,
> even before napi is enabled and hrtimer is initialized. It causes
> kernel panic.
> 
> ? page_fault_oops+0x136/0x2b0
>   ? page_counter_cancel+0x2e/0x80
>   ? do_user_addr_fault+0x2f2/0x640
>   ? refill_obj_stock+0xc4/0x110
>   ? exc_page_fault+0x71/0x160
>   ? asm_exc_page_fault+0x27/0x30
>   ? __mmdrop+0x10/0x180
>   ? __mmdrop+0xec/0x180
>   ? hrtimer_active+0xd/0x50
>   hrtimer_try_to_cancel+0x2c/0xf0
>   hrtimer_cancel+0x15/0x30
>   napi_disable+0x65/0x90
>   mana_destroy_rxq+0x4c/0x2f0
>   mana_create_rxq.isra.0+0x56c/0x6d0
>   ? mana_uncfg_vport+0x50/0x50
>   mana_alloc_queues+0x21b/0x320
>   ? skb_dequeue+0x5f/0x80
> 
> Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V1:
> Addressed the comment on cleaning up napi for the queues,
> where queue creation was successful.
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 22 +++++++++++--------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 39f56973746d..7448085fd49e 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1872,10 +1872,11 @@ static void mana_destroy_txq(struct mana_port_context *apc)
>  
>  	for (i = 0; i < apc->num_queues; i++) {
>  		napi = &apc->tx_qp[i].tx_cq.napi;
> -		napi_synchronize(napi);
> -		napi_disable(napi);
> -		netif_napi_del(napi);
> -
> +		if (napi->dev == apc->ndev) {
> +			napi_synchronize(napi);
> +			napi_disable(napi);
> +			netif_napi_del(napi);
> +		}
>  		mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);
>  
>  		mana_deinit_cq(apc, &apc->tx_qp[i].tx_cq);
> @@ -2023,14 +2024,17 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
>  
>  	napi = &rxq->rx_cq.napi;
>  
> -	if (validate_state)
> -		napi_synchronize(napi);
> +	if (napi->dev == apc->ndev) {
>  
> -	napi_disable(napi);
> +		if (validate_state)
> +			napi_synchronize(napi);
>  
> -	xdp_rxq_info_unreg(&rxq->xdp_rxq);
> +		napi_disable(napi);
>  
> -	netif_napi_del(napi);
> +		netif_napi_del(napi);
> +	}
> +
> +	xdp_rxq_info_unreg(&rxq->xdp_rxq);
>  
>  	mana_destroy_wq_obj(apc, GDMA_RQ, rxq->rxobj);

Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
>  
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
  2024-08-23  9:44 [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup Souradeep Chakrabarti
  2024-08-23 21:12 ` Haiyang Zhang
  2024-08-24  3:52 ` Shradha Gupta
@ 2024-08-27 20:26 ` Jakub Kicinski
  2024-08-29  8:43   ` Souradeep Chakrabarti
  2 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2024-08-27 20:26 UTC (permalink / raw)
  To: Souradeep Chakrabarti
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, pabeni, longli,
	yury.norov, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, schakrabarti

On Fri, 23 Aug 2024 02:44:29 -0700 Souradeep Chakrabarti wrote:
> @@ -2023,14 +2024,17 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
>  
>  	napi = &rxq->rx_cq.napi;
>  
> -	if (validate_state)
> -		napi_synchronize(napi);
> +	if (napi->dev == apc->ndev) {
>  
> -	napi_disable(napi);
> +		if (validate_state)
> +			napi_synchronize(napi);
>  
> -	xdp_rxq_info_unreg(&rxq->xdp_rxq);
> +		napi_disable(napi);
>  
> -	netif_napi_del(napi);
> +		netif_napi_del(napi);
> +	}
> +
> +	xdp_rxq_info_unreg(&rxq->xdp_rxq);

Please don't use internal core state as a crutch for your cleanup.

IDK what "validate_state" stands for, but it gives you all the info you
need on Rx. On Rx NAPI registration happens as the last stage of rxq
activation, once nothing can fail. And the "cleanup" path calls destroy
with validate_state=false. The only other caller passes true.

So you can rewrite this as:

	if (validate_state) { /* rename it maybe? */
		napi_disable(napi);
		...
	}
	xdp_rxq_info_unreg(&rxq->xdp_rxq);

You can take similar approach with Tx. Pass a bool which tells the
destroy function whether NAPI has been registered.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
  2024-08-27 20:26 ` Jakub Kicinski
@ 2024-08-29  8:43   ` Souradeep Chakrabarti
  0 siblings, 0 replies; 5+ messages in thread
From: Souradeep Chakrabarti @ 2024-08-29  8:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, pabeni, longli,
	yury.norov, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, schakrabarti

On Tue, Aug 27, 2024 at 01:26:37PM -0700, Jakub Kicinski wrote:
> On Fri, 23 Aug 2024 02:44:29 -0700 Souradeep Chakrabarti wrote:
> > @@ -2023,14 +2024,17 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
> >  
> >  	napi = &rxq->rx_cq.napi;
> >  
> > -	if (validate_state)
> > -		napi_synchronize(napi);
> > +	if (napi->dev == apc->ndev) {
> >  
> > -	napi_disable(napi);
> > +		if (validate_state)
> > +			napi_synchronize(napi);
> >  
> > -	xdp_rxq_info_unreg(&rxq->xdp_rxq);
> > +		napi_disable(napi);
> >  
> > -	netif_napi_del(napi);
> > +		netif_napi_del(napi);
> > +	}
> > +
> > +	xdp_rxq_info_unreg(&rxq->xdp_rxq);
> 
> Please don't use internal core state as a crutch for your cleanup.
> 
> IDK what "validate_state" stands for, but it gives you all the info you
> need on Rx. On Rx NAPI registration happens as the last stage of rxq
> activation, once nothing can fail. And the "cleanup" path calls destroy
> with validate_state=false. The only other caller passes true.
> 
> So you can rewrite this as:
> 
> 	if (validate_state) { /* rename it maybe? */
> 		napi_disable(napi);
> 		...
> 	}
> 	xdp_rxq_info_unreg(&rxq->xdp_rxq);
> 
> You can take similar approach with Tx. Pass a bool which tells the
> destroy function whether NAPI has been registered.
Thanks Jakub for the suggestion. I have changed the implementation
in the V3. I have added a new txq and rxq structure attribute to check
that per queue napi is initialized.
The use of a local flag like validate_state will not be possible with
current design of txq destroy function, as it uses the hole vport
and loops for all the queues for that port.
-
Souradeep
> -- 
> pw-bot: cr

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-08-29  8:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-23  9:44 [PATCH V2 net] net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup Souradeep Chakrabarti
2024-08-23 21:12 ` Haiyang Zhang
2024-08-24  3:52 ` Shradha Gupta
2024-08-27 20:26 ` Jakub Kicinski
2024-08-29  8:43   ` Souradeep Chakrabarti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).