netdev.vger.kernel.org archive mirror
* [net v1] net: wwan: t7xx: Fix napi rx poll issue
@ 2025-05-15  3:17 Jinjian Song
  2025-05-16  0:52 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Jinjian Song @ 2025-05-15  3:17 UTC (permalink / raw)
  To: chandrashekar.devegowda, chiranjeevi.rapolu, haijun.liu,
	m.chetan.kumar, ricardo.martinez, loic.poulain, ryazanov.s.a,
	johannes, davem, edumazet, kuba, pabeni
  Cc: linux-kernel, netdev, linux-doc, angelogioacchino.delregno,
	linux-arm-kernel, matthias.bgg, corbet, linux-mediatek, helgaas,
	danielwinkler, andrew+netdev, horms, Jinjian Song

When the driver handles NAPI rx polling requests, the netdev might
already have been released by the dellink logic triggered by a disconnect
operation on the user plane. However, the skb processing in the polling
path still uses the now-invalid netdev, which causes a panic.

BUG: kernel NULL pointer dereference, address: 00000000000000f1
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:dev_gro_receive+0x3a/0x620
[...]
Call Trace:
 <IRQ>
 ? __die_body+0x68/0xb0
 ? page_fault_oops+0x379/0x3e0
 ? exc_page_fault+0x4f/0xa0
 ? asm_exc_page_fault+0x22/0x30
 ? __pfx_t7xx_ccmni_recv_skb+0x10/0x10 [mtk_t7xx (HASH:1400 7)]
 ? dev_gro_receive+0x3a/0x620
 napi_gro_receive+0xad/0x170
 t7xx_ccmni_recv_skb+0x48/0x70 [mtk_t7xx (HASH:1400 7)]
 t7xx_dpmaif_napi_rx_poll+0x590/0x800 [mtk_t7xx (HASH:1400 7)]
 net_rx_action+0x103/0x470
 irq_exit_rcu+0x13a/0x310
 sysvec_apic_timer_interrupt+0x56/0x90
 </IRQ>

Fixes: 5545b7b9f294 ("net: wwan: t7xx: Add NAPI support")
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
---
 drivers/net/wwan/t7xx/t7xx_netdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
index 91fa082e9cab..2116ff81728b 100644
--- a/drivers/net/wwan/t7xx/t7xx_netdev.c
+++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
@@ -324,6 +324,7 @@ static void t7xx_ccmni_wwan_dellink(void *ctxt, struct net_device *dev, struct l
 	if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
 		return;
 
+	ctlb->ccmni_inst[if_id] = NULL;
 	unregister_netdevice(dev);
 }
 
-- 
2.34.1



* Re: [net v1] net: wwan: t7xx: Fix napi rx poll issue
  2025-05-15  3:17 [net v1] net: wwan: t7xx: Fix napi rx poll issue Jinjian Song
@ 2025-05-16  0:52 ` Jakub Kicinski
  2025-05-16  7:30   ` Jinjian Song
  2025-05-16 15:48   ` Jakub Kicinski
  0 siblings, 2 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-05-16  0:52 UTC (permalink / raw)
  To: Jinjian Song
  Cc: chandrashekar.devegowda, chiranjeevi.rapolu, haijun.liu,
	m.chetan.kumar, ricardo.martinez, loic.poulain, ryazanov.s.a,
	johannes, davem, edumazet, pabeni, linux-kernel, netdev,
	linux-doc, angelogioacchino.delregno, linux-arm-kernel,
	matthias.bgg, corbet, linux-mediatek, helgaas, danielwinkler,
	andrew+netdev, horms

On Thu, 15 May 2025 11:17:42 +0800 Jinjian Song wrote:
> diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
> index 91fa082e9cab..2116ff81728b 100644
> --- a/drivers/net/wwan/t7xx/t7xx_netdev.c
> +++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
> @@ -324,6 +324,7 @@ static void t7xx_ccmni_wwan_dellink(void *ctxt, struct net_device *dev, struct l
>  	if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
>  		return;
>  
> +	ctlb->ccmni_inst[if_id] = NULL;
>  	unregister_netdevice(dev);

I don't see any synchronization between this write and NAPI processing.
Is this safe? NAPI can be at any point of processing as we set the ptr
to NULL
-- 
pw-bot: cr


* Re: [net v1] net: wwan: t7xx: Fix napi rx poll issue
  2025-05-16  0:52 ` Jakub Kicinski
@ 2025-05-16  7:30   ` Jinjian Song
  2025-05-16 15:48   ` Jakub Kicinski
  1 sibling, 0 replies; 6+ messages in thread
From: Jinjian Song @ 2025-05-16  7:30 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, angelogioacchino.delregno, chandrashekar.devegowda,
	chiranjeevi.rapolu, corbet, danielwinkler, davem, edumazet,
	haijun.liu, helgaas, horms, jinjian.song, johannes,
	linux-arm-kernel, linux-doc, linux-kernel, linux-mediatek,
	loic.poulain, m.chetan.kumar, matthias.bgg, netdev, pabeni,
	ricardo.martinez, ryazanov.s.a

>On Thu, 15 May 2025 11:17:42 +0800 Jinjian Song wrote:
>> diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
>> index 91fa082e9cab..2116ff81728b 100644
>> --- a/drivers/net/wwan/t7xx/t7xx_netdev.c
>> +++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
>> @@ -324,6 +324,7 @@ static void t7xx_ccmni_wwan_dellink(void *ctxt, struct net_device *dev, struct l
>>  	if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
>>  		return;
>>  
>> +	ctlb->ccmni_inst[if_id] = NULL;
>>  	unregister_netdevice(dev);
>
>I don't see any synchronization between this write and NAPI processing.
>Is this safe? NAPI can be at any point of processing as we set the ptr
>to NULL

This panic occurred in a scenario with frequent disconnects and
reconnects of the WWAN cellular connection from the UI.
I debugged the panic with gdb and found that it is caused by an invalid
net_device in this call chain:
1.-> t7xx_dpmaif_napi_rx_poll
2.-> t7xx_ccmni_recv_skb
3.-> napi_gro_receive
4.-> dev_gro_receive
5.-> netif_elide_gro
On one path, the net_device used in step 5 is invalid, so the "dev->features .." access panics.
This net_device is passed in from t7xx_ccmni_recv_skb:
void t7xx_ccmni_recv_skb(...) {
  [...]
  
  ccmni = ccmni_ctlb->ccmni_inst[netif_id];
  if (!ccmni) {
    dev_kfree_skb(skb);
    return;
  }
  
  net_dev = ccmni->dev;
  skb->dev = net_dev;
  [...]
  napi_gro_receive(napi, skb);
  [...]
}

On the other path, WWAN disconnect -> wwan_ops.dellink -> t7xx_ccmni_wwan_dellink
-> unregister_netdevice(dev).
After that the netdevice is invalid, so t7xx_dpmaif_napi_rx_poll must not use it any more.
I set ccmni_inst[if_id] = NULL at the same time as the netdevice becomes invalid.
It seems that a check is made every time ccmni_inst[x] is used in the driver,
and the synchronization between the two paths might already be done when NAPI triggers
polling via napi_schedule and when WWAN triggers dellink.
So this should be safe.

Jinjian,
Best Regards.
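
Annotation (not part of the original message): the two paths described
above can be modeled in user space roughly as follows. This is only a
sketch under stated assumptions: the struct layouts, MAX_IF, the
"registered" flag, fake_dellink() and fake_recv_skb() are hypothetical
stand-ins, not the driver's real definitions. Only the NULL check on
ccmni_inst[netif_id] mirrors the code quoted in the message. The model is
single-threaded, so it shows the effect of clearing the slot but says
nothing about concurrent access.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_IF 4 /* hypothetical number of interface slots */

enum rx_result { RX_DELIVERED, RX_DROPPED, RX_STALE_DEV };

/* Hypothetical stand-ins for net_device, t7xx_ccmni and the control block. */
struct fake_netdev { bool registered; };
struct fake_ccmni { struct fake_netdev *dev; };
struct fake_ctlb { struct fake_ccmni *ccmni_inst[MAX_IF]; };

/* dellink path; clear_slot == true models the one-line v1 change. */
void fake_dellink(struct fake_ctlb *ctlb, int if_id, bool clear_slot)
{
	struct fake_ccmni *ccmni = ctlb->ccmni_inst[if_id];

	if (!ccmni)
		return;
	if (clear_slot)
		ctlb->ccmni_inst[if_id] = NULL;
	/* Stands in for unregister_netdevice(dev). */
	ccmni->dev->registered = false;
}

/* rx path; models the slot lookup and NULL check in t7xx_ccmni_recv_skb(). */
enum rx_result fake_recv_skb(struct fake_ctlb *ctlb, int netif_id)
{
	struct fake_ccmni *ccmni = ctlb->ccmni_inst[netif_id];

	if (!ccmni)
		return RX_DROPPED; /* driver: dev_kfree_skb(skb) */
	if (!ccmni->dev->registered)
		return RX_STALE_DEV; /* driver: stale dev would reach napi_gro_receive() */
	return RX_DELIVERED;
}
```

In this model, calling fake_dellink() with clear_slot == false leaves the
slot pointing at the released device, so a later fake_recv_skb() reports
RX_STALE_DEV; with clear_slot == true the same call reports RX_DROPPED.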




* Re: [net v1] net: wwan: t7xx: Fix napi rx poll issue
  2025-05-16  0:52 ` Jakub Kicinski
  2025-05-16  7:30   ` Jinjian Song
@ 2025-05-16 15:48   ` Jakub Kicinski
  2025-05-20  7:05     ` Jinjian Song
  2025-05-20 22:39     ` Jakub Kicinski
  1 sibling, 2 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-05-16 15:48 UTC (permalink / raw)
  To: Jinjian Song
  Cc: andrew+netdev, angelogioacchino.delregno, chandrashekar.devegowda,
	chiranjeevi.rapolu, corbet, danielwinkler, davem, edumazet,
	haijun.liu, helgaas, horms, johannes, linux-arm-kernel, linux-doc,
	linux-kernel, linux-mediatek, loic.poulain, m.chetan.kumar,
	matthias.bgg, netdev, pabeni, ricardo.martinez, ryazanov.s.a

On Fri, 16 May 2025 15:30:38 +0800 Jinjian Song wrote:
> It seems that a check is made every time ccmni_inst[x] is used in the driver,
> and the synchronization between the two paths might already be done when NAPI triggers
> polling via napi_schedule and when WWAN triggers dellink.

Synchronization is about ensuring that the condition validated
by the if() remains true for as long as necessary.
You need to wrap the read with READ_ONCE() and the write with WRITE_ONCE().
The rest is fine because netdev unregister syncs against NAPIs in flight.
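
Annotation (not part of the original message): a user-space sketch of the
access pattern suggested above. The real READ_ONCE() and WRITE_ONCE() come
from the kernel headers; the two macros below are simplified volatile-cast
stand-ins that assume GCC or Clang (__typeof__). The names slot_table,
slot_clear() and slot_get_dev() are hypothetical. What the sketch tries to
show is that the reader loads the slot pointer exactly once into a local
and uses only that snapshot, so the value tested by the if() is the value
dereferenced afterwards.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption: GCC or Clang). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define NSLOTS 4 /* hypothetical table size */

struct model_dev { int ifindex; };
struct model_inst { struct model_dev *dev; };
struct slot_table { struct model_inst *inst[NSLOTS]; };

/* Writer side (dellink path): publish NULL with a single marked store. */
void slot_clear(struct slot_table *t, int id)
{
	WRITE_ONCE(t->inst[id], NULL);
}

/* Reader side (NAPI poll path): one marked load, then use only the local. */
struct model_dev *slot_get_dev(struct slot_table *t, int id)
{
	struct model_inst *inst = READ_ONCE(t->inst[id]);

	if (!inst)
		return NULL; /* slot already cleared: caller drops the packet */
	return inst->dev; /* uses the snapshot, never re-reads t->inst[id] */
}
```

The model only illustrates the marked single load and store. Keeping the
pointed-to object alive while the reader uses its snapshot is, per the
message above, covered by netdev unregister syncing against NAPIs in
flight.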


* Re: [net v1] net: wwan: t7xx: Fix napi rx poll issue
  2025-05-16 15:48   ` Jakub Kicinski
@ 2025-05-20  7:05     ` Jinjian Song
  2025-05-20 22:39     ` Jakub Kicinski
  1 sibling, 0 replies; 6+ messages in thread
From: Jinjian Song @ 2025-05-20  7:05 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, angelogioacchino.delregno, chandrashekar.devegowda,
	chiranjeevi.rapolu, corbet, danielwinkler, davem, edumazet,
	haijun.liu, helgaas, horms, jinjian.song, johannes,
	linux-arm-kernel, linux-doc, linux-kernel, linux-mediatek,
	loic.poulain, m.chetan.kumar, matthias.bgg, netdev, pabeni,
	ricardo.martinez, ryazanov.s.a, liuqf

>On Fri, 16 May 2025 15:30:38 +0800 Jinjian Song wrote:
>> It seems that a check is made every time ccmni_inst[x] is used in the driver,
>> and the synchronization between the two paths might already be done when NAPI triggers
>> polling via napi_schedule and when WWAN triggers dellink.
>
>Synchronization is about ensuring that the condition validated
>by the if() remains true for as long as necessary.
>You need to wrap the read with READ_ONCE() and the write with WRITE_ONCE().
>The rest is fine because netdev unregister syncs against NAPIs in flight.
>

Hi Jakub,
  I think I got your point.
  I can use the atomic_t usage member of struct t7xx_ccmni for synchronization.
  
  static void t7xx_ccmni_wwan_dellink(...) {

  [...]

  if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
    return;

  unregister_netdevice(dev);

  //Add here: use this variable (ccmni->usage) for synchronization

  if (atomic_read(&ccmni->usage) == 0)
     ccmni == NULL;

  }

  How about this modification?

Thanks.

Jinjian,
Best Regards.


* Re: [net v1] net: wwan: t7xx: Fix napi rx poll issue
  2025-05-16 15:48   ` Jakub Kicinski
  2025-05-20  7:05     ` Jinjian Song
@ 2025-05-20 22:39     ` Jakub Kicinski
  1 sibling, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-05-20 22:39 UTC (permalink / raw)
  To: Jinjian Song
  Cc: andrew+netdev, angelogioacchino.delregno, chandrashekar.devegowda,
	chiranjeevi.rapolu, corbet, danielwinkler, davem, edumazet,
	haijun.liu, helgaas, horms, johannes, linux-arm-kernel, linux-doc,
	linux-kernel, linux-mediatek, loic.poulain, m.chetan.kumar,
	matthias.bgg, netdev, pabeni, ricardo.martinez, ryazanov.s.a,
	liuqf

On Tue, 20 May 2025 15:05:34 +0800 Jinjian Song wrote:
> >Synchronization is about ensuring that the condition validated
> >by the if() remains true for as long as necessary.
> >You need to wrap the read with READ_ONCE() and the write with WRITE_ONCE().
> >The rest is fine because netdev unregister syncs against NAPIs in flight.
> >  
> 
> Hi Jakub,
>   I think I got your point.
>   I can use the atomic_t usage member of struct t7xx_ccmni for synchronization.
>   
>   static void t7xx_ccmni_wwan_dellink(...) {
> 
>   [...]
> 
>   if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
>     return;
> 
>   unregister_netdevice(dev);
> 
>   //Add here: use this variable (ccmni->usage) for synchronization
> 
>   if (atomic_read(&ccmni->usage) == 0)
>      ccmni == NULL;
> 
>   }
> 
>   How about this modification?

Just use READ_ONCE() / WRITE_ONCE() on the pointer as I suggested.

