linux-kernel.vger.kernel.org archive mirror
* [PATCH] net: clear offline CPU backlog.state in dev_cpu_dead()
@ 2025-07-23  8:38 wangyongyong
  2025-07-24  6:45 ` wangyongyong
  0 siblings, 1 reply; 2+ messages in thread
From: wangyongyong @ 2025-07-23  8:38 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, linux-kernel, wangyongyong

From: wangyongyong <wangyongyong@gztozed.com>

When a packet is enqueued to a remote CPU's backlog queue via
enqueue_to_backlog(), the following race with CPU hotplug can occur:

1. Source CPU sets NAPI_STATE_SCHED on target CPU's softnet_data->backlog.state
2. Source CPU raises NET_RX_SOFTIRQ to schedule NAPI polling
3. Target CPU is taken offline before the IPI arrives
4. dev_cpu_dead() fails to clear NAPI_STATE_SCHED because backlog isn't in poll_list

This results in:
- A stale NAPI_STATE_SCHED flag in the offline CPU's backlog.state
- When the target CPU comes back online, the stale NAPI_STATE_SCHED flag
  prevents the backlog from being added to poll_list, so packet
  processing stalls

Signed-off-by: wangyongyong <wangyongyong@gztozed.com>
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index be97c440ecd5..fd92ab79c02a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12385,6 +12385,7 @@ static int dev_cpu_dead(unsigned int oldcpu)
 		else
 			____napi_schedule(sd, napi);
 	}
+	oldsd->backlog.state &= NAPIF_STATE_THREADED;
 
 	raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_enable();
-- 
2.25.1
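
The stale-flag effect described in the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `model_backlog`, `model_schedule()`, `model_stale_flag_demo()` and the `MODEL_F_*` bit values are hypothetical names invented for illustration, and the model is single-threaded, whereas the kernel manipulates the state word with atomic bit operations. It shows a scheduler that refuses to queue an entry whose SCHED bit is already set, and how masking the state down to the THREADED bit, as the one-line change does, lets scheduling succeed again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this model; the real ones are defined by the kernel. */
#define MODEL_F_SCHED    (1UL << 0)
#define MODEL_F_THREADED (1UL << 1)

/* Hypothetical stand-in for a per-CPU backlog entry. */
struct model_backlog {
	unsigned long state;
	int on_poll_list;	/* 1 once the entry has been queued for polling */
};

/*
 * Models the gate in front of scheduling: the entry is queued only if the
 * SCHED bit was previously clear. Single-threaded, so no atomics are used.
 */
static bool model_schedule(struct model_backlog *b)
{
	if (b->state & MODEL_F_SCHED)
		return false;	/* already marked scheduled: nothing is queued */
	b->state |= MODEL_F_SCHED;
	b->on_poll_list = 1;
	return true;
}

/* Returns 0 if the model behaves as the commit message describes. */
static int model_stale_flag_demo(void)
{
	/* Stale state: SCHED is set, but the entry is not on any poll list. */
	struct model_backlog b = {
		.state = MODEL_F_SCHED | MODEL_F_THREADED,
		.on_poll_list = 0,
	};

	assert(!b.on_poll_list);

	if (model_schedule(&b))		/* the stale flag must block scheduling */
		return 1;
	if (b.on_poll_list)
		return 2;

	b.state &= MODEL_F_THREADED;	/* same masking as the proposed change */

	if (!model_schedule(&b))	/* scheduling must now succeed */
		return 3;
	if (!b.on_poll_list)
		return 4;
	if (!(b.state & MODEL_F_THREADED))	/* THREADED bit is preserved */
		return 5;
	return 0;
}
```

Calling `model_stale_flag_demo()` from a test harness or a `main()` exercises both halves: the first scheduling attempt is rejected because of the stale bit, and the second succeeds once the state has been masked.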



* Re: [PATCH] net: clear offline CPU backlog.state in dev_cpu_dead()
  2025-07-23  8:38 [PATCH] net: clear offline CPU backlog.state in dev_cpu_dead() wangyongyong
@ 2025-07-24  6:45 ` wangyongyong
  0 siblings, 0 replies; 2+ messages in thread
From: wangyongyong @ 2025-07-24  6:45 UTC (permalink / raw)
  To: wangyongyong; +Cc: davem, edumazet, horms, kuba, linux-kernel, netdev, pabeni

Hi all,

I apologize for missing the earlier discussion of this issue (https://lore.kernel.org/netdev/b3ecb218932daa656a796cfa6e9e62b9.squirrel@www.codeaurora.org/).
While working on this, I encountered what appeared to be the same problem, but I now realize that the earlier discussion already covers this case.
I should have researched more thoroughly before sending the patch.
I appreciate the time spent reviewing my patch, and I'm sorry for any inconvenience caused by the duplicate submission.

Best regards.

