* [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
@ 2026-06-17 9:20 Yun Zhou
2026-06-17 12:49 ` Maxime Chevallier
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Yun Zhou @ 2026-06-17 9:20 UTC (permalink / raw)
To: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
bigeasy, clrkwllms, rostedt
Cc: netdev, linux-kernel, linux-rt-devel, yun.zhou
On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
network traffic, the IRQ can enter suspend with desc->depth == 1
(masked by the oneshot mechanism between handler invocations).
During suspend, the kernel increments depth to 2 and masks the
interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
due to IRQCHIP_MASK_ON_SUSPEND). On resume, depth is decremented
back to 1, but since it does not reach 0, the unmask is never
called. The MPIC CPU routing remains cleared, permanently disabling
interrupt delivery.
Fix by freeing the IRQ in suspend and re-requesting it in resume.
This ensures a clean IRQ state (depth=0, proper hardware routing)
on every resume cycle, regardless of the pre-suspend depth. This
follows the approach used by other drivers (e.g. igb).
Fixes: 9768b45ceb0b ("net: mvneta: support suspend and resume")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v2:
- Move request_irq before cpuhp registration in resume (matching
mvneta_open ordering) so that failure does not leave cpuhp
callbacks registered on a non-functional device.
- On request_irq failure, call netif_device_detach() to prevent
further traffic on the dead interface.
drivers/net/ethernet/marvell/mvneta.c | 29 +++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b4a845f04c05..02ea867d07a3 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5826,6 +5826,20 @@ static int mvneta_suspend(struct device *device)
mvneta_stop_dev(pp);
rtnl_unlock();
+ /* Release IRQ to avoid stale MPIC mask state on resume.
+ * On PREEMPT_RT, forced-threaded oneshot IRQs may leave the
+ * interrupt masked (depth>0) at suspend time. This prevents
+ * resume_device_irqs() from restoring the MPIC CPU routing,
+ * permanently disabling the interrupt. Re-requesting the IRQ
+ * on resume guarantees a clean state.
+ */
+ if (pp->neta_armada3700)
+ free_irq(dev->irq, pp);
+ else {
+ on_each_cpu(mvneta_percpu_disable, pp, true);
+ free_percpu_irq(dev->irq, pp->ports);
+ }
+
for (queue = 0; queue < rxq_number; queue++) {
struct mvneta_rx_queue *rxq = &pp->rxqs[queue];
@@ -5892,6 +5906,21 @@ static int mvneta_resume(struct device *device)
mvneta_txq_hw_init(pp, txq);
}
+ /* Re-request IRQ (see comment in mvneta_suspend) */
+ if (pp->neta_armada3700) {
+ err = request_irq(dev->irq, mvneta_isr, 0, dev->name, pp);
+ } else {
+ err = request_percpu_irq(dev->irq, mvneta_percpu_isr,
+ dev->name, pp->ports);
+ if (!err)
+ on_each_cpu(mvneta_percpu_enable, pp, true);
+ }
+ if (err) {
+ netdev_err(dev, "cannot request irq %d\n", dev->irq);
+ netif_device_detach(dev);
+ return err;
+ }
+
if (!pp->neta_armada3700) {
spin_lock(&pp->lock);
pp->is_stopped = false;
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
2026-06-17 9:20 [PATCH v2] net: mvneta: free/request IRQ across suspend/resume Yun Zhou
@ 2026-06-17 12:49 ` Maxime Chevallier
2026-06-18 9:03 ` Zhou, Yun
2026-06-18 8:39 ` Sebastian Andrzej Siewior
2026-06-18 9:21 ` sashiko-bot
2 siblings, 1 reply; 6+ messages in thread
From: Maxime Chevallier @ 2026-06-17 12:49 UTC (permalink / raw)
To: Yun Zhou, marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba,
pabeni, bigeasy, clrkwllms, rostedt
Cc: netdev, linux-kernel, linux-rt-devel
Hi,
On 6/17/26 11:20, Yun Zhou wrote:
> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
> network traffic, the IRQ can enter suspend with desc->depth == 1
> (masked by the oneshot mechanism between handler invocations).
>
> During suspend, the kernel increments depth to 2 and masks the
> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
> due to IRQCHIP_MASK_ON_SUSPEND). On resume, depth is decremented
> back to 1, but since it does not reach 0, the unmask is never
> called. The MPIC CPU routing remains cleared, permanently disabling
> interrupt delivery.
>
> Fix by freeing the IRQ in suspend and re-requesting it in resume.
> This ensures a clean IRQ state (depth=0, proper hardware routing)
> on every resume cycle, regardless of the pre-suspend depth. This
> follows the approach used by other drivers (e.g. igb).
This description makes it sound like it's not really a mvneta problem,
but rather a broader effect from preempt-rt / irq management / suspend
interactions.
Is this the expected way to deal with that ?
Maxime
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
2026-06-17 9:20 [PATCH v2] net: mvneta: free/request IRQ across suspend/resume Yun Zhou
2026-06-17 12:49 ` Maxime Chevallier
@ 2026-06-18 8:39 ` Sebastian Andrzej Siewior
2026-06-18 9:14 ` Zhou, Yun
2026-06-18 9:21 ` sashiko-bot
2 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-18 8:39 UTC (permalink / raw)
To: Yun Zhou
Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
clrkwllms, rostedt, netdev, linux-kernel, linux-rt-devel
On 2026-06-17 17:20:28 [+0800], Yun Zhou wrote:
> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
There is also the `threadirqs' option.
> network traffic, the IRQ can enter suspend with desc->depth == 1
> (masked by the oneshot mechanism between handler invocations).
That would be irq_desc::depth.
> During suspend, the kernel increments depth to 2 and masks the
> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
> due to IRQCHIP_MASK_ON_SUSPEND).
The interrupt should be masked while the depth counter goes 0->1, no?
> On resume, depth is decremented
> back to 1, but since it does not reach 0, the unmask is never
> called. The MPIC CPU routing remains cleared, permanently disabling
> interrupt delivery.
But why not? In my naive assumption, we get into suspend with
irq_desc::depth = 2 and the threaded should be woken up. Once the
treaded handler is done the counter should decrement by one. Then again
during resume reaching 0 leading to the unmask. If the thread handler is
frozen and defrosted on resume then it should still happen but in
different order.
Something is missing here based on my naive assumption.
> Fix by freeing the IRQ in suspend and re-requesting it in resume.
> This ensures a clean IRQ state (depth=0, proper hardware routing)
> on every resume cycle, regardless of the pre-suspend depth. This
> follows the approach used by other drivers (e.g. igb).
The igb shutdowns the device entirely, not just freeing the IRQ.
> Fixes: 9768b45ceb0b ("net: mvneta: support suspend and resume")
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Sebastian
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
2026-06-17 12:49 ` Maxime Chevallier
@ 2026-06-18 9:03 ` Zhou, Yun
0 siblings, 0 replies; 6+ messages in thread
From: Zhou, Yun @ 2026-06-18 9:03 UTC (permalink / raw)
To: Maxime Chevallier, marcin.s.wojtas, andrew+netdev, davem,
edumazet, kuba, pabeni, bigeasy, clrkwllms, rostedt
Cc: netdev, linux-kernel, linux-rt-devel
On 6/17/26 20:49, Maxime Chevallier wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> Hi,
>
> On 6/17/26 11:20, Yun Zhou wrote:
>> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
>> network traffic, the IRQ can enter suspend with desc->depth == 1
>> (masked by the oneshot mechanism between handler invocations).
>>
>> During suspend, the kernel increments depth to 2 and masks the
>> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
>> due to IRQCHIP_MASK_ON_SUSPEND). On resume, depth is decremented
>> back to 1, but since it does not reach 0, the unmask is never
>> called. The MPIC CPU routing remains cleared, permanently disabling
>> interrupt delivery.
>>
>> Fix by freeing the IRQ in suspend and re-requesting it in resume.
>> This ensures a clean IRQ state (depth=0, proper hardware routing)
>> on every resume cycle, regardless of the pre-suspend depth. This
>> follows the approach used by other drivers (e.g. igb).
> This description makes it sound like it's not really a mvneta problem,
> but rather a broader effect from preempt-rt / irq management / suspend
> interactions.
>
> Is this the expected way to deal with that ?
>
You were right to question this. After deeper investigation, I found
that the original analysis was incorrect.
The real root cause is entirely within the mvneta driver:
mvneta_percpu_isr() calls disable_percpu_irq() to mask the MPIC percpu
IRQ before scheduling NAPI. The corresponding enable_percpu_irq() is
called in napi_complete_done(). If suspend occurs during active NAPI
polling (between disable and enable), the MPIC percpu IRQ remains
masked after resume — mvneta_start_dev() only restores the NIC-level
INTR_NEW_MASK register, not the irqchip-level per-CPU mask.
The fix is a one-liner: call on_each_cpu(mvneta_percpu_enable) in the
resume path to ensure the MPIC percpu IRQ is unmasked. I will send a
v3 with the correct fix and updated description.
The previous free_irq/request_irq approach happened to work as a
side-effect (request_percpu_irq → enable_percpu_irq restores the mask),
but it was fixing the symptom rather than the actual cause.
Thank you very much for your rigorous review,
Yun
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
2026-06-18 8:39 ` Sebastian Andrzej Siewior
@ 2026-06-18 9:14 ` Zhou, Yun
0 siblings, 0 replies; 6+ messages in thread
From: Zhou, Yun @ 2026-06-18 9:14 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
clrkwllms, rostedt, netdev, linux-kernel, linux-rt-devel
On 6/18/26 16:39, Sebastian Andrzej Siewior wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> On 2026-06-17 17:20:28 [+0800], Yun Zhou wrote:
>> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
> There is also the `threadirqs' option.
>
>> network traffic, the IRQ can enter suspend with desc->depth == 1
>> (masked by the oneshot mechanism between handler invocations).
> That would be irq_desc::depth.
>
>> During suspend, the kernel increments depth to 2 and masks the
>> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
>> due to IRQCHIP_MASK_ON_SUSPEND).
> The interrupt should be masked while the depth counter goes 0->1, no?
>
>> On resume, depth is decremented
>> back to 1, but since it does not reach 0, the unmask is never
>> called. The MPIC CPU routing remains cleared, permanently disabling
>> interrupt delivery.
> But why not? In my naive assumption, we get into suspend with
> irq_desc::depth = 2 and the threaded should be woken up. Once the
> treaded handler is done the counter should decrement by one. Then again
> during resume reaching 0 leading to the unmask. If the thread handler is
> frozen and defrosted on resume then it should still happen but in
> different order.
>
> Something is missing here based on my naive assumption.
>
>> Fix by freeing the IRQ in suspend and re-requesting it in resume.
>> This ensures a clean IRQ state (depth=0, proper hardware routing)
>> on every resume cycle, regardless of the pre-suspend depth. This
>> follows the approach used by other drivers (e.g. igb).
> The igb shutdowns the device entirely, not just freeing the IRQ.
You are right. The original analysis was wrong — mvneta uses
request_percpu_irq() which sets IRQF_NO_SUSPEND, so the PM framework
never touches this IRQ. The depth never changes from 1.
The actual root cause is simpler: mvneta_percpu_isr() calls
disable_percpu_irq() before scheduling NAPI, and enable_percpu_irq()
is called in napi_complete_done(). If suspend hits during active NAPI
polling, the MPIC percpu IRQ stays masked after resume because
mvneta_start_dev() doesn't restore it.
Will send a v3 with the correct one-liner fix (enable_percpu_irq in
the resume path). Apologies for the incorrect analysis.
BR,
Yun
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
2026-06-17 9:20 [PATCH v2] net: mvneta: free/request IRQ across suspend/resume Yun Zhou
2026-06-17 12:49 ` Maxime Chevallier
2026-06-18 8:39 ` Sebastian Andrzej Siewior
@ 2026-06-18 9:21 ` sashiko-bot
2 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-18 9:21 UTC (permalink / raw)
To: Yun Zhou; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Missing rollback in `mvneta_resume()` error path causes kernel panic and deadlock on subsequent teardown.
--
commit 3e974147b6189ed308e9434d784dbcd4c3ed5e87
Author: Yun Zhou <yun.zhou@windriver.com>
net: mvneta: free/request IRQ across suspend/resume
This commit fixes an issue on PREEMPT_RT where the mvneta IRQ handler is
left masked during suspend, causing interrupt delivery to be permanently
disabled upon resume. It addresses this by explicitly freeing the IRQ
during suspend and re-requesting it during resume.
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
[ ... ]
> @@ -5885,6 +5899,21 @@ static int mvneta_resume(struct device *device)
> mvneta_txq_hw_init(pp, txq);
> }
>
> + /* Re-request IRQ (see comment in mvneta_suspend) */
> + if (pp->neta_armada3700) {
> + err = request_irq(dev->irq, mvneta_isr, 0, dev->name, pp);
> + } else {
> + err = request_percpu_irq(dev->irq, mvneta_percpu_isr,
> + dev->name, pp->ports);
> + if (!err)
> + on_each_cpu(mvneta_percpu_enable, pp, true);
> + }
> + if (err) {
> + netdev_err(dev, "cannot request irq %d\n", dev->irq);
> + netif_device_detach(dev);
> + return err;
> + }
> +
[Severity: High]
Does this error path leave the device in an inconsistent state that will
cause a crash during teardown?
If request_irq() fails, this returns early without updating pp->is_stopped
and without re-adding the CPU hotplug instances. However, the interface
remains logically UP.
If the interface is subsequently brought down (e.g., via ip link set down),
mvneta_stop() will be called. This leads to mvneta_stop_dev(), which
invokes napi_disable(). Since NAPI was already disabled in
mvneta_suspend() and never re-enabled here, will this cause a deadlock
waiting for NAPIF_STATE_SCHED to clear?
Additionally, for non-Armada3700 devices, mvneta_stop() will call:
mvneta_stop() {
...
cpuhp_state_remove_instance_nocalls(online_hpstate,
&pp->node_online);
...
}
Since the nodes were already removed during suspend and never re-added,
could this invoke hlist_del() on a poisoned pointer and cause an immediate
kernel panic?
Finally, since the IRQ was never acquired, will the subsequent free_irq()
or free_percpu_irq() in mvneta_stop() trigger a kernel WARNING?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260617092028.1722407-1-yun.zhou@windriver.com?part=1
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-06-18 9:21 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-17 9:20 [PATCH v2] net: mvneta: free/request IRQ across suspend/resume Yun Zhou
2026-06-17 12:49 ` Maxime Chevallier
2026-06-18 9:03 ` Zhou, Yun
2026-06-18 8:39 ` Sebastian Andrzej Siewior
2026-06-18 9:14 ` Zhou, Yun
2026-06-18 9:21 ` sashiko-bot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.