[PATCH v3] net: mvneta: re-enable percpu interrupt on resume

Netdev List
 help / color / mirror / Atom feed

* [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
@ 2026-06-18 10:43 Yun Zhou
  2026-06-18 12:51 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Yun Zhou @ 2026-06-18 10:43 UTC (permalink / raw)
  To: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, bigeasy
  Cc: netdev, linux-kernel, yun.zhou

On Armada XP (non-armada3700), mvneta uses percpu interrupts where
the ISR (mvneta_percpu_isr) calls disable_percpu_irq() to mask the
MPIC percpu IRQ, then schedules NAPI. NAPI poll completion calls
enable_percpu_irq() to unmask.

If suspend occurs while NAPI is actively polling (between
disable_percpu_irq in the ISR and enable_percpu_irq in
napi_complete_done), the MPIC percpu interrupt remains masked.
mvneta_stop_dev/mvneta_start_dev do not manage the percpu IRQ
enable state -- they only control mvneta's own INTR_NEW_MASK register.

After resume, the MPIC percpu IRQ stays masked permanently: the
network hardware generates interrupts (INTR_NEW_CAUSE != 0) but the
CPU never receives them (irq count stops incrementing), causing a
complete loss of network connectivity.

Fix by calling on_each_cpu(mvneta_percpu_enable) after
mvneta_start_dev() in the resume path, ensuring the MPIC percpu
IRQ is always unmasked regardless of the pre-suspend state.

Fixes: 12bb03b436da ("net: mvneta: Handle per-cpu interrupts")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v3:
  - Dropped the free_irq/request_irq approach (incorrect root cause).
  - Instead, call on_each_cpu(mvneta_percpu_enable) in the resume path
    to ensure the MPIC percpu IRQ is unmasked, matching mvneta_open().
  - Updated commit message with correct root cause analysis.

v2:
  - Move request_irq before cpuhp registration in resume (matching
    mvneta_open ordering) so that failure does not leave cpuhp
    callbacks registered on a non-functional device.
  - On request_irq failure, call netif_device_detach() to prevent
    further traffic on the dead interface.

 drivers/net/ethernet/marvell/mvneta.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b4a845f04c05..5ef79e70e319 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5907,6 +5907,9 @@ static int mvneta_resume(struct device *device)
 	rtnl_unlock();
 	mvneta_set_rx_mode(dev);

+	if (!pp->neta_armada3700)
+		on_each_cpu(mvneta_percpu_enable, pp, true);
+
 	return 0;
 }
 #endif
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
  2026-06-18 10:43 [PATCH v3] net: mvneta: re-enable percpu interrupt on resume Yun Zhou
@ 2026-06-18 12:51 ` Sebastian Andrzej Siewior
  2026-06-18 13:56   ` Zhou, Yun
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-18 12:51 UTC (permalink / raw)
  To: Yun Zhou
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, netdev, linux-kernel

On 2026-06-18 18:43:51 [+0800], Yun Zhou wrote:
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -5907,6 +5907,9 @@ static int mvneta_resume(struct device *device)
>  	rtnl_unlock();
>  	mvneta_set_rx_mode(dev);
>  
> +	if (!pp->neta_armada3700)
> +		on_each_cpu(mvneta_percpu_enable, pp, true);
> +
>  	return 0;
>  }
>  #endif

This does not look symmetrical. I wouldn't mind if mvneta_suspend()
would have the matching disable but this isn't the case. 
But if the thread is idle then you have one enable too many, don't you?
Well you have the NAPI callback which does disable on the local CPU and
this resume which enables it on every CPU. So this does not look right.

The interesting question is what happens to the enable_percpu_irq() from
the mvneta_poll(). Is it lost? And if so, how/ why?

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
  2026-06-18 12:51 ` Sebastian Andrzej Siewior
@ 2026-06-18 13:56   ` Zhou, Yun
  2026-06-18 15:04     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Zhou, Yun @ 2026-06-18 13:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, netdev, linux-kernel, yun.zhou


On 6/18/2026 8:51 PM, Sebastian Andrzej Siewior wrote:
> 
> On 2026-06-18 18:43:51 [+0800], Yun Zhou wrote:
>> --- a/drivers/net/ethernet/marvell/mvneta.c
>> +++ b/drivers/net/ethernet/marvell/mvneta.c
>> @@ -5907,6 +5907,9 @@ static int mvneta_resume(struct device *device)
>>        rtnl_unlock();
>>        mvneta_set_rx_mode(dev);
>>
>> +     if (!pp->neta_armada3700)
>> +             on_each_cpu(mvneta_percpu_enable, pp, true);
>> +
>>        return 0;
>>   }
>>   #endif
> 
> This does not look symmetrical. I wouldn't mind if mvneta_suspend()
> would have the matching disable but this isn't the case.
> But if the thread is idle then you have one enable too many, don't you?
> Well you have the NAPI callback which does disable on the local CPU and
> this resume which enables it on every CPU. So this does not look right.
> 

The enable in resume is intentionally unconditional and idempotent
(writing MPIC_INT_CLEAR_MASK on an already unmasked IRQ is a no-op).

> The interesting question is what happens to the enable_percpu_irq() from
> the mvneta_poll(). Is it lost? And if so, how/ why?
> 

The enable_percpu_irq() from mvneta_poll is not "lost" — it never
gets a chance to execute. The sequence is:

1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
2. PM freezes kthreads (on PREEMPT_RT, softirq runs in kthread)
3. NAPI poll cannot run → enable_percpu_irq() is never called
4. mvneta_stop_dev → napi_disable(): cancels the scheduled poll
     but does NOT execute the completion path (no enable_percpu_irq)
5. Resume → napi_enable(): resets NAPI state but MPIC stays masked

The unconditional enable in resume covers this case. When NAPI was
idle at suspend time, the extra enable is harmless.

BR,
Yun


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
  2026-06-18 13:56   ` Zhou, Yun
@ 2026-06-18 15:04     ` Sebastian Andrzej Siewior
  2026-06-18 15:34       ` Maxime Chevallier
  2026-06-18 15:45       ` Zhou, Yun
  0 siblings, 2 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-18 15:04 UTC (permalink / raw)
  To: Zhou, Yun
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, netdev, linux-kernel

On 2026-06-18 21:56:12 [+0800], Zhou, Yun wrote:
> 
> > But if the thread is idle then you have one enable too many, don't you?
> > Well you have the NAPI callback which does disable on the local CPU and
> > this resume which enables it on every CPU. So this does not look right.
> > 
> 
> The enable in resume is intentionally unconditional and idempotent
> (writing MPIC_INT_CLEAR_MASK on an already unmasked IRQ is a no-op).
> 
> > The interesting question is what happens to the enable_percpu_irq() from
> > the mvneta_poll(). Is it lost? And if so, how/ why?
> > 
> 
> The enable_percpu_irq() from mvneta_poll is not "lost" — it never
> gets a chance to execute. The sequence is:
> 
> 1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
> 2. PM freezes kthreads (on PREEMPT_RT, softirq runs in kthread)

But napi_schedule() runs in ksoftird, doesn't it? The per-CPU IRQs are
not threaded. That does not look optimal.

> 3. NAPI poll cannot run → enable_percpu_irq() is never called
> 4. mvneta_stop_dev → napi_disable(): cancels the scheduled poll
>     but does NOT execute the completion path (no enable_percpu_irq)

napi_schedule() sets NAPIF_STATE_SCHED.
napi_disable() sets NAPI_STATE_DISABLE and waits until NAPIF_STATE_SCHED
is cleared. So if NAPIF_STATE_SCHED was set then enable_percpu_irq()
will be invoked unless it leaves somewhere early.
However, if DISABLED was already set then it disables the IRQ source but
does not schedule NAPI. This is probably what happens.

> 5. Resume → napi_enable(): resets NAPI state but MPIC stays masked
> 
> The unconditional enable in resume covers this case. When NAPI was
> idle at suspend time, the extra enable is harmless.

There is no desc::depth counting here, that got me confused. But that
per-CPU irq is not optimal. Is this a SoC limitation or a design choice?
Having a NAPI instance with IRQ per queue and those configured and
spread among CPUs should cause less trouble and is what others do.
In fact is the only net driver using per-CPU interrupts.

> BR,
> Yun

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
  2026-06-18 15:04     ` Sebastian Andrzej Siewior
@ 2026-06-18 15:34       ` Maxime Chevallier
  2026-06-18 15:45       ` Zhou, Yun
  1 sibling, 0 replies; 7+ messages in thread
From: Maxime Chevallier @ 2026-06-18 15:34 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Zhou, Yun
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel

Hi,

On 6/18/26 17:04, Sebastian Andrzej Siewior wrote:
> On 2026-06-18 21:56:12 [+0800], Zhou, Yun wrote:
>>
>>> But if the thread is idle then you have one enable too many, don't you?
>>> Well you have the NAPI callback which does disable on the local CPU and
>>> this resume which enables it on every CPU. So this does not look right.
>>>
>>
>> The enable in resume is intentionally unconditional and idempotent
>> (writing MPIC_INT_CLEAR_MASK on an already unmasked IRQ is a no-op).
>>
>>> The interesting question is what happens to the enable_percpu_irq() from
>>> the mvneta_poll(). Is it lost? And if so, how/ why?
>>>
>>
>> The enable_percpu_irq() from mvneta_poll is not "lost" — it never
>> gets a chance to execute. The sequence is:
>>
>> 1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
>> 2. PM freezes kthreads (on PREEMPT_RT, softirq runs in kthread)
> 
> But napi_schedule() runs in ksoftird, doesn't it? The per-CPU IRQs are
> not threaded. That does not look optimal.
> 
>> 3. NAPI poll cannot run → enable_percpu_irq() is never called
>> 4. mvneta_stop_dev → napi_disable(): cancels the scheduled poll
>>     but does NOT execute the completion path (no enable_percpu_irq)
> 
> napi_schedule() sets NAPIF_STATE_SCHED.
> napi_disable() sets NAPI_STATE_DISABLE and waits until NAPIF_STATE_SCHED
> is cleared. So if NAPIF_STATE_SCHED was set then enable_percpu_irq()
> will be invoked unless it leaves somewhere early.
> However, if DISABLED was already set then it disables the IRQ source but
> does not schedule NAPI. This is probably what happens.
> 
>> 5. Resume → napi_enable(): resets NAPI state but MPIC stays masked
>>
>> The unconditional enable in resume covers this case. When NAPI was
>> idle at suspend time, the extra enable is harmless.
> 
> There is no desc::depth counting here, that got me confused. But that
> per-CPU irq is not optimal. Is this a SoC limitation or a design choice?

I _think_ that on mvneta, the mapping is done by assigning queues to CPUs
directly, and then you have per-cpu register banks to handle the interrupts :

https://elixir.bootlin.com/linux/v7.1/source/drivers/net/ethernet/marvell/mvneta.c#L138

This seems to be confirmed by some comments here [1] :

static void mvneta_percpu_mask_interrupt(void *arg)
{
	struct mvneta_port *pp = arg;

	/* All the queue are masked, but actually only the ones
	 * mapped to this CPU will be masked
	 */
	mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
	mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
	mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
}

These registers will control different queue's interrupt behaviour depending
on which CPU executes that code...

[1] : https://elixir.bootlin.com/linux/v7.1/source/drivers/net/ethernet/marvell/mvneta.c#L1441

This may be why the design ended-up that way. I'm not saying this is ideal
though :)

The percpu interrupt mechanism isn't used on armada 3700 (hence all the special
conditions) because IIRC the interrupt routing is flawed for the network part, and
all interrupts end-up on CPU0 anyways...

Maxime


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
  2026-06-18 15:04     ` Sebastian Andrzej Siewior
  2026-06-18 15:34       ` Maxime Chevallier
@ 2026-06-18 15:45       ` Zhou, Yun
  2026-06-18 15:53         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 7+ messages in thread
From: Zhou, Yun @ 2026-06-18 15:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, netdev, linux-kernel



On 6/18/2026 11:04 PM, Sebastian Andrzej Siewior wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
> 
> On 2026-06-18 21:56:12 [+0800], Zhou, Yun wrote:
>>
>> The enable_percpu_irq() from mvneta_poll is not "lost" — it never
>> gets a chance to execute. The sequence is:
>>
>> 1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
>> 2. PM freezes kthreads (on PREEMPT_RT, softirq runs in kthread)
> 
> But napi_schedule() runs in ksoftird, doesn't it? The per-CPU IRQs are
> not threaded. That does not look optimal.
> 
Correct, percpu IRQs are not threaded. net_rx_action (which calls 
mvneta_poll and ultimately enable_percpu_irq) runs in softirq context 
via ksoftirqd.

>> 3. NAPI poll cannot run → enable_percpu_irq() is never called
>> 4. mvneta_stop_dev → napi_disable(): cancels the scheduled poll
>>      but does NOT execute the completion path (no enable_percpu_irq)
> 
> napi_schedule() sets NAPIF_STATE_SCHED.
> napi_disable() sets NAPI_STATE_DISABLE and waits until NAPIF_STATE_SCHED
> is cleared. So if NAPIF_STATE_SCHED was set then enable_percpu_irq()
> will be invoked unless it leaves somewhere early.
> However, if DISABLED was already set then it disables the IRQ source but
> does not schedule NAPI. This is probably what happens.

Your analysis makes perfect sense, and that scenario is indeed much more
likely. It looks like I'll need to update the commit message accordingly.

>> 5. Resume → napi_enable(): resets NAPI state but MPIC stays masked
>>
>> The unconditional enable in resume covers this case. When NAPI was
>> idle at suspend time, the extra enable is harmless.
> 
> There is no desc::depth counting here, that got me confused. But that
> per-CPU irq is not optimal. Is this a SoC limitation or a design choice?
> Having a NAPI instance with IRQ per queue and those configured and
> spread among CPUs should cause less trouble and is what others do.
> In fact is the only net driver using per-CPU interrupts.
> 
It is a SoC limitation. Armada XP's MPIC provides a single shared
interrupt for the ethernet controller with per-CPU masking for
interrupt steering — there are no per-queue MSI vectors. The percpu
IRQ model was the only way to distribute interrupt handling across
CPUs given this hardware constraint.

Newer Marvell SoCs (armada3700+) moved away from this model and use
standard IRQs, which is why the armada3700 path does not have this
problem.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
  2026-06-18 15:45       ` Zhou, Yun
@ 2026-06-18 15:53         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-18 15:53 UTC (permalink / raw)
  To: Zhou, Yun
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, netdev, linux-kernel

On 2026-06-18 23:45:51 [+0800], Zhou, Yun wrote:
> > Having a NAPI instance with IRQ per queue and those configured and
> > spread among CPUs should cause less trouble and is what others do.
> > In fact is the only net driver using per-CPU interrupts.
> > 
> It is a SoC limitation. Armada XP's MPIC provides a single shared
> interrupt for the ethernet controller with per-CPU masking for
> interrupt steering — there are no per-queue MSI vectors. The percpu
> IRQ model was the only way to distribute interrupt handling across
> CPUs given this hardware constraint.

Is this a hardware constraint or more a software design choice? From the
other comment it read like it could be changed.
There is nothing wrong to provide 4 interrupts for a device from the
device-tree and then allocate and request all four. This requires that
SMP affinity is supported properly in order to spread it across CPU. You
would also be able to reduce the amount of queues/ interrupts via
ethtool if you would like to isolate a CPU for NOHZ reasons.

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-18 15:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18 10:43 [PATCH v3] net: mvneta: re-enable percpu interrupt on resume Yun Zhou
2026-06-18 12:51 ` Sebastian Andrzej Siewior
2026-06-18 13:56   ` Zhou, Yun
2026-06-18 15:04     ` Sebastian Andrzej Siewior
2026-06-18 15:34       ` Maxime Chevallier
2026-06-18 15:45       ` Zhou, Yun
2026-06-18 15:53         ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox