public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG
@ 2026-02-20  4:11 Siddharth Vadapalli
  2026-02-20  4:11 ` [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-20  4:11 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller
  Cc: stable, netdev, linux-kernel, linux-arm-kernel, srk, s-vadapalli

Hello,

This series fixes the warning:
    Unbalanced enable for IRQ ...
for the CPSW and ICSSG drivers.

Under heavy traffic and in an SMP environment the warning shows up after
a relatively long time. The issue occurs due to the order in which the
variable 'irq_disabled' is set and the function disable_irq_nosync() is
invoked.

I have examined other drivers and they follow the right order which is
to invoke disable_irq_nosync() before setting 'irq_disabled' (or its
equivalent variable).

The first patch is for the CPSW driver and it has two Fixes tags since
the code change associated with the fix is for a recent commit while
the incorrect order was first introduced by a much older commit.

The second and third patches are for the ICSSG driver. Although they
are both for the same driver and could be squashed, I chose to split
them since they fix different commits and need to be backported as
Fixes for the respective commits.

Regards,
Siddharth.

Siddharth Vadapalli (3):
  net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX
    IRQ
  net: ethernet: ti: icssg_common: set irq_disabled after disabling TX
    IRQ
  net: ethernet: ti: icssg_common: set irq_disabled after disabling RX
    IRQ

 drivers/net/ethernet/ti/am65-cpsw-nuss.c     | 2 +-
 drivers/net/ethernet/ti/icssg/icssg_common.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.51.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ
  2026-02-20  4:11 [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Siddharth Vadapalli
@ 2026-02-20  4:11 ` Siddharth Vadapalli
  2026-02-24  2:48   ` Jakub Kicinski
  2026-02-20  4:11 ` [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ Siddharth Vadapalli
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-20  4:11 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller
  Cc: stable, netdev, linux-kernel, linux-arm-kernel, srk, s-vadapalli

The 'irq_disabled' variable indicates the current state of the RX IRQ and
is used by the RX NAPI handler to determine whether the IRQ should be
enabled.

Currently, 'irq_disabled' is set before actually disabling the IRQ by
invoking disable_irq_nosync(). In an SMP environment, this leads to a race
condition wherein the processor taking the interrupt sets 'irq_disabled'
while another processor executing a previous instance of the RX NAPI
handler sees 'irq_disabled' set and invokes enable_irq() before the RX IRQ
is actually disabled by disable_irq_nosync(). This results in the following
warning:
	Unbalanced enable for IRQ ...

Fix this by disabling the RX IRQ using disable_irq_nosync() before setting
'irq_disabled'.

Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
Fixes: 47bfc4d128de ("net: ti: am65-cpsw-nuss: fix RX IRQ state after .ndo_stop()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
 drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 5924db6be3fe..8785ab40f157 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -1570,8 +1570,8 @@ static irqreturn_t am65_cpsw_nuss_rx_irq(int irq, void *dev_id)
 {
 	struct am65_cpsw_rx_flow *flow = dev_id;
 
-	flow->irq_disabled = true;
 	disable_irq_nosync(irq);
+	flow->irq_disabled = true;
 	napi_schedule(&flow->napi_rx);
 
 	return IRQ_HANDLED;
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-20  4:11 [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Siddharth Vadapalli
  2026-02-20  4:11 ` [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
@ 2026-02-20  4:11 ` Siddharth Vadapalli
  2026-02-24  2:48   ` Jakub Kicinski
  2026-02-20  4:11 ` [PATCH net 3/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-20  4:11 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller
  Cc: stable, netdev, linux-kernel, linux-arm-kernel, srk, s-vadapalli

The 'irq_disabled' variable indicates the current state of the TX IRQ and
is used by the TX NAPI handler to determine whether the IRQ should be
enabled.

Currently, 'irq_disabled' is set before actually disabling the IRQ by
invoking disable_irq_nosync(). In an SMP environment, this leads to a race
condition wherein the processor taking the interrupt sets 'irq_disabled'
while another processor executing a previous instance of the TX NAPI
handler sees 'irq_disabled' set and invokes enable_irq() before the TX IRQ
is actually disabled by disable_irq_nosync(). This results in the following
warning:
	Unbalanced enable for IRQ ...

Fix this by disabling the TX IRQ using disable_irq_nosync() before setting
'irq_disabled'.

Fixes: 8756ef2eb078 ("net: ti: icssg-prueth: Add AF_XDP zero copy for TX")
Cc: <stable@vger.kernel.org>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
 drivers/net/ethernet/ti/icssg/icssg_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index 0cf9dfe0fa36..24716c8d7f75 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -348,8 +348,8 @@ static irqreturn_t prueth_tx_irq(int irq, void *dev_id)
 {
 	struct prueth_tx_chn *tx_chn = dev_id;
 
-	tx_chn->irq_disabled = true;
 	disable_irq_nosync(irq);
+	tx_chn->irq_disabled = true;
 	napi_schedule(&tx_chn->napi_tx);
 
 	return IRQ_HANDLED;
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 3/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling RX IRQ
  2026-02-20  4:11 [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Siddharth Vadapalli
  2026-02-20  4:11 ` [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
  2026-02-20  4:11 ` [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ Siddharth Vadapalli
@ 2026-02-20  4:11 ` Siddharth Vadapalli
  2026-02-20 10:00 ` [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Malladi, Meghana
  2026-02-23 17:39 ` Simon Horman
  4 siblings, 0 replies; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-20  4:11 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller
  Cc: stable, netdev, linux-kernel, linux-arm-kernel, srk, s-vadapalli

The 'irq_disabled' variable indicates the current state of the RX IRQ and
is used by the RX NAPI handler to determine whether the IRQ should be
enabled.

Currently, 'irq_disabled' is set before actually disabling the IRQ by
invoking disable_irq_nosync(). In an SMP environment, this leads to a race
condition wherein the processor taking the interrupt sets 'irq_disabled'
while another processor executing a previous instance of the RX NAPI
handler sees 'irq_disabled' set and invokes enable_irq() before the RX IRQ
is actually disabled by disable_irq_nosync(). This results in the following
warning:
	Unbalanced enable for IRQ ...

Fix this by disabling the RX IRQ using disable_irq_nosync() before setting
'irq_disabled'.

Fixes: 7a64bb388df3 ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
Cc: <stable@vger.kernel.org>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
 drivers/net/ethernet/ti/icssg/icssg_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index 24716c8d7f75..a512a1317c59 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -1385,8 +1385,8 @@ irqreturn_t prueth_rx_irq(int irq, void *dev_id)
 {
 	struct prueth_emac *emac = dev_id;
 
-	emac->rx_chns.irq_disabled = true;
 	disable_irq_nosync(irq);
+	emac->rx_chns.irq_disabled = true;
 	napi_schedule(&emac->napi_rx);
 
 	return IRQ_HANDLED;
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG
  2026-02-20  4:11 [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Siddharth Vadapalli
                   ` (2 preceding siblings ...)
  2026-02-20  4:11 ` [PATCH net 3/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
@ 2026-02-20 10:00 ` Malladi, Meghana
  2026-02-23 17:39 ` Simon Horman
  4 siblings, 0 replies; 16+ messages in thread
From: Malladi, Meghana @ 2026-02-20 10:00 UTC (permalink / raw)
  To: Siddharth Vadapalli, andrew+netdev, davem, edumazet, kuba, pabeni,
	danishanwar, rogerq, horms, mwalle, nm, v-singh1, vadim.fedorenko,
	matthias.schiffer, vigneshr, jacob.e.keller
  Cc: stable, netdev, linux-kernel, linux-arm-kernel, srk



On 2/20/2026 9:41 AM, Siddharth Vadapalli wrote:
> Hello,
> 
> This series fixes the warning:
>      Unbalanced enable for IRQ ...
> for the CPSW and ICSSG drivers.
> 
> Under heavy traffic and in an SMP environment the warning shows up after
> a relatively long time. The issue occurs due to the order in which the
> variable 'irq_disabled' is set and the function disable_irq_nosync() is
> invoked.
> 
> I have examined other drivers and they follow the right order which is
> to invoke disable_irq_nosync() before setting 'irq_disabled' (or its
> equivalent variable).
> 
> The first patch is for the CPSW driver and it has two Fixes tags since
> the code change associated with the fix is for a recent commit while
> the incorrect order was first introduced by a much older commit.
> 
> The second and third patches are for the ICSSG driver. Although they
> are both for the same driver and could be squashed, I chose to split
> them since they fix different commits and need to be backported as
> Fixes for the respective commits.
> 
> Regards,
> Siddharth.
> 
> Siddharth Vadapalli (3):
>    net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX
>      IRQ
>    net: ethernet: ti: icssg_common: set irq_disabled after disabling TX
>      IRQ
>    net: ethernet: ti: icssg_common: set irq_disabled after disabling RX
>      IRQ
> 
>   drivers/net/ethernet/ti/am65-cpsw-nuss.c     | 2 +-
>   drivers/net/ethernet/ti/icssg/icssg_common.c | 4 ++--
>   2 files changed, 3 insertions(+), 3 deletions(-)
> 

Reviewed-by: Meghana Malladi <m-malladi@ti.com>

-- 
Thanks,
Meghana Malladi


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG
  2026-02-20  4:11 [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Siddharth Vadapalli
                   ` (3 preceding siblings ...)
  2026-02-20 10:00 ` [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Malladi, Meghana
@ 2026-02-23 17:39 ` Simon Horman
  4 siblings, 0 replies; 16+ messages in thread
From: Simon Horman @ 2026-02-23 17:39 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, danishanwar, rogerq,
	mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk

On Fri, Feb 20, 2026 at 09:41:56AM +0530, Siddharth Vadapalli wrote:
> Hello,
> 
> This series fixes the warning:
>     Unbalanced enable for IRQ ...
> for the CPSW and ICSSG drivers.
> 
> Under heavy traffic and in an SMP environment the warning shows up after
> a relatively long time. The issue occurs due to the order in which the
> variable 'irq_disabled' is set and the function disable_irq_nosync() is
> invoked.
> 
> I have examined other drivers and they follow the right order which is
> to invoke disable_irq_nosync() before setting 'irq_disabled' (or its
> equivalent variable).
> 
> The first patch is for the CPSW driver and it has two Fixes tags since
> the code change associated with the fix is for a recent commit while
> the incorrect order was first introduced by a much older commit.
> 
> The second and third patches are for the ICSSG driver. Although they
> are both for the same driver and could be squashed, I chose to split
> them since they fix different commits and need to be backported as
> Fixes for the respective commits.

For the series,

Reviewed-by: Simon Horman <horms@kernel.org>

...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ
  2026-02-20  4:11 ` [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
@ 2026-02-24  2:48   ` Jakub Kicinski
  2026-02-24  5:10     ` Siddharth Vadapalli
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-02-24  2:48 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk

On Fri, 20 Feb 2026 09:41:57 +0530 Siddharth Vadapalli wrote:
> The 'irq_disabled' variable indicates the current state of the RX IRQ and
> is used by the RX NAPI handler to determine whether the IRQ should be
> enabled.
> 
> Currently, 'irq_disabled' is set before actually disabling the IRQ by
> invoking disable_irq_nosync(). In an SMP environment, this leads to a race
> condition wherein the processor taking the interrupt sets 'irq_disabled'
> while another processor executing a previous instance of the RX NAPI
> handler sees 'irq_disabled' set and invokes enable_irq() before the RX IRQ
> is actually disabled by disable_irq_nosync(). This results in the following
> warning:
> 	Unbalanced enable for IRQ ...
> 
> Fix this by disabling the RX IRQ using disable_irq_nosync() before setting
> 'irq_disabled'.

I'm not sure this is enough. The IRQ enable/disable serves as barriers
so the ordering is sane. I think the problem is that there are multiple
paths for Rx which may schedule NAPI, not just the path from the IRQ
handler. If the state changes have to be atomic you need a lock.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-20  4:11 ` [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ Siddharth Vadapalli
@ 2026-02-24  2:48   ` Jakub Kicinski
  2026-02-24 12:24     ` Siddharth Vadapalli
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-02-24  2:48 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk

On Fri, 20 Feb 2026 09:41:58 +0530 Siddharth Vadapalli wrote:
> The 'irq_disabled' variable indicates the current state of the TX IRQ and
> is used by the TX NAPI handler to determine whether the IRQ should be
> enabled.
> 
> Currently, 'irq_disabled' is set before actually disabling the IRQ by
> invoking disable_irq_nosync(). In an SMP environment, this leads to a race
> condition wherein the processor taking the interrupt sets 'irq_disabled'
> while another processor executing a previous instance of the TX NAPI
> handler sees 'irq_disabled' set and invokes enable_irq() before the TX IRQ
> is actually disabled by disable_irq_nosync(). This results in the following
> warning:
> 	Unbalanced enable for IRQ ...

AFAICT the flow on the Tx bug is not buggy, owner ship of the IRQ
vector passes handler -> NAPI -> timer. I don't see how those can
race.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ
  2026-02-24  2:48   ` Jakub Kicinski
@ 2026-02-24  5:10     ` Siddharth Vadapalli
  2026-02-24 23:54       ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-24  5:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk, s-vadapalli

On Mon, 2026-02-23 at 18:48 -0800, Jakub Kicinski wrote:
> On Fri, 20 Feb 2026 09:41:57 +0530 Siddharth Vadapalli wrote:
> > The 'irq_disabled' variable indicates the current state of the RX IRQ and
> > is used by the RX NAPI handler to determine whether the IRQ should be
> > enabled.
> > 
> > Currently, 'irq_disabled' is set before actually disabling the IRQ by
> > invoking disable_irq_nosync(). In an SMP environment, this leads to a race
> > condition wherein the processor taking the interrupt sets 'irq_disabled'
> > while another processor executing a previous instance of the RX NAPI
> > handler sees 'irq_disabled' set and invokes enable_irq() before the RX IRQ
> > is actually disabled by disable_irq_nosync(). This results in the following
> > warning:
> > 	Unbalanced enable for IRQ ...
> > 
> > Fix this by disabling the RX IRQ using disable_irq_nosync() before setting
> > 'irq_disabled'.
> 
> I'm not sure this is enough. The IRQ enable/disable serves as barriers
> so the ordering is sane. I think the problem is that there are multiple
> paths for Rx which may schedule NAPI, not just the path from the IRQ
> handler. If the state changes have to be atomic you need a lock.

I assume that state changes will be atomic since 'irq_disabled' is set in
the HARD
IRQ handler. Only the processor taking the interrupt will be able to update
the state.
Other processors can read the variable during their execution of the
previous NAPI
RX handlers. Only when 'irq_disabled' is set they will invoke
'enable_irq()'. So the race
condition that you are referring to, is possible during the 'enable_irq()'
path:

	In am65_cpsw_nuss_rx_poll() => The RX NAPI Handler
		...
		if (flow->irq_disabled) {
			flow->irq_disabled = false;
			...
			enable_irq();

CPU1 sees irq_disabled being 'true' and before it updates it to 'false', if
CPU2 also sees irq_disabled
being 'true', both CPU1 and CPU2 will enter the IF-condition and eventually
invoke enable_irq().

Please let me know if this is what you were referring to. I will use atomic
APIs at all places to update
'irq_disabled'.

Regards,
Siddharth.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-24  2:48   ` Jakub Kicinski
@ 2026-02-24 12:24     ` Siddharth Vadapalli
  2026-02-24 23:49       ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-24 12:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk, s-vadapalli

On Mon, 2026-02-23 at 18:48 -0800, Jakub Kicinski wrote:
> On Fri, 20 Feb 2026 09:41:58 +0530 Siddharth Vadapalli wrote:
> > The 'irq_disabled' variable indicates the current state of the TX IRQ and
> > is used by the TX NAPI handler to determine whether the IRQ should be
> > enabled.
> > 
> > Currently, 'irq_disabled' is set before actually disabling the IRQ by
> > invoking disable_irq_nosync(). In an SMP environment, this leads to a race
> > condition wherein the processor taking the interrupt sets 'irq_disabled'
> > while another processor executing a previous instance of the TX NAPI
> > handler sees 'irq_disabled' set and invokes enable_irq() before the TX IRQ
> > is actually disabled by disable_irq_nosync(). This results in the following
> > warning:
> > 	Unbalanced enable for IRQ ...
> 
> AFAICT the flow on the Tx bug is not buggy, owner ship of the IRQ
> vector passes handler -> NAPI -> timer. I don't see how those can
> race.

The issue is seen in practice. Interrupt coalescing (hrtimer) isn't used.
The call sequence leading to the warning is:
	net_rx_action
		__napi_poll
			NAPI TX Handler
It does seem strange that the 'net_rx_action' leads to the NAPI TX Handler.
However, it is exactly this path that causes the warning, and it is due to
this that we could end up in the following situation:

			CPU0						
				CPU1
		----------------
								--------------
1.	TX HARD IRQ Handler entered				NAPI TX
Handler is running
2.	irq_disabled is
set							Sees irq_disabled being set
3.	Starts executing disable_irq_nosync()		Invokes
enable_irq() for TX IRQ before its really disabled
									
			[UNBALANCED IRQ WARNING IS DISPLAYED]

Regards,
Siddharth.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-24 12:24     ` Siddharth Vadapalli
@ 2026-02-24 23:49       ` Jakub Kicinski
  2026-02-25 11:31         ` Siddharth Vadapalli
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-02-24 23:49 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk

On Tue, 24 Feb 2026 17:54:18 +0530 Siddharth Vadapalli wrote:
> 			CPU0						
> 				CPU1
> 		----------------
> 								--------------
> 1.	TX HARD IRQ Handler entered				NAPI TX
> Handler is running
> 2.	irq_disabled is
> set							Sees irq_disabled being set
> 3.	Starts executing disable_irq_nosync()		Invokes
> enable_irq() for TX IRQ before its really disabled

Could you resend your last email fixing the line wrap issue? It's very
hard to read as it arrived on the list.

From what I gather you're concerned about the case when hard IRQ and
NAPI run in parallel. But I don't see how that could ever happen for
Tx (there are some complexities like netpoll and busy poll but those
will return false from napi_complete_done()).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ
  2026-02-24  5:10     ` Siddharth Vadapalli
@ 2026-02-24 23:54       ` Jakub Kicinski
  2026-02-25 11:12         ` Siddharth Vadapalli
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-02-24 23:54 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk

On Tue, 24 Feb 2026 10:40:05 +0530 Siddharth Vadapalli wrote:
> CPU1 sees irq_disabled being 'true' and before it updates it to 'false', if
> CPU2 also sees irq_disabled
> being 'true', both CPU1 and CPU2 will enter the IF-condition and eventually
> invoke enable_irq().

I think the races are just between NAPI and the HARD IRQ context.
There can only be one NAPI scheduled for a queue, I assume.

> Please let me know if this is what you were referring to. I will use atomic
> APIs at all places to update
> 'irq_disabled'.

I recommend a spin lock, unless you can measure as significant
difference. Locks and atomics have similar cost on many CPUs.
And juggling local state, IRQ state, and NAPI state atomically
will get tricky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ
  2026-02-24 23:54       ` Jakub Kicinski
@ 2026-02-25 11:12         ` Siddharth Vadapalli
  0 siblings, 0 replies; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-25 11:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk, s-vadapalli

On 25/02/26 05:24, Jakub Kicinski wrote:
> On Tue, 24 Feb 2026 10:40:05 +0530 Siddharth Vadapalli wrote:
>> CPU1 sees irq_disabled being 'true' and before it updates it to 'false', if
>> CPU2 also sees irq_disabled
>> being 'true', both CPU1 and CPU2 will enter the IF-condition and eventually
>> invoke enable_irq().
> 
> I think the races are just between NAPI and the HARD IRQ context.
> There can only be one NAPI scheduled for a queue, I assume.

Yes. An already executing RX NAPI Handler (scheduled via net_rx_action) 
sees 'irq_disabled' set by the HARD IRQ handler. The RX NAPI Handler then 
executes 'enable_irq()' for the RX IRQ before it is actually disabled by 
the HARD IRQ handler using disable_irq_nosync().

> 
>> Please let me know if this is what you were referring to. I will use atomic
>> APIs at all places to update
>> 'irq_disabled'.
> 
> I recommend a spin lock, unless you can measure as significant
> difference. Locks and atomics have similar cost on many CPUs.
> And juggling local state, IRQ state, and NAPI state atomically
> will get tricky.

Updates to 'irq_disabled' are performed by:

	1. Hard IRQ Handler sets irq_disabled to true.
	   => Since there can be only one IRQ for a given RX Queue,
	    we can be certain that there is no race w.r.t. setting it
	    to true.
	2. NAPI RX Handler sets irq_disabled to false if currently true.
	   => This is the part I am unsure of but if a single instance
	    of the NAPI RX Handler will be scheduled in an SMP
	    environment as well, there won't be a race between
	    multiple processors as the following will cannot happen
	    simultaneously on multiple CPUs running the RX Handler:
		irq_disabled = false;
		enable_irq();

Regards,
Siddharth.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-24 23:49       ` Jakub Kicinski
@ 2026-02-25 11:31         ` Siddharth Vadapalli
  2026-02-26  0:09           ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-25 11:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk, s-vadapalli

On 25/02/26 05:19, Jakub Kicinski wrote:
> On Tue, 24 Feb 2026 17:54:18 +0530 Siddharth Vadapalli wrote:
>> 			CPU0						
>> 				CPU1
>> 		----------------
>> 								--------------
>> 1.	TX HARD IRQ Handler entered				NAPI TX
>> Handler is running
>> 2.	irq_disabled is
>> set							Sees irq_disabled being set
>> 3.	Starts executing disable_irq_nosync()		Invokes
>> enable_irq() for TX IRQ before its really disabled
> 
> Could you resend your last email fixing the line wrap issue? It's very
> hard to read as it arrived on the list.

I have changed my email client now to fix the line wrap issue. Sorry for 
the inconvenience cause. I am repeating my earlier message below with 
proper formatting. It is in response to the following statement:

 > AFAICT the flow on the Tx bug is not buggy, owner ship of the IRQ
 > vector passes handler -> NAPI -> timer. I don't see how those can
 > race.

The issue is seen in practice. Interrupt coalescing (hrtimer) isn't used.
The call sequence leading to the warning is:

	net_rx_action
		__napi_poll
			NAPI TX Handler

It does seem strange that the 'net_rx_action' leads to the NAPI TX Handler.
However, it is exactly this path that causes the warning, and it is due to
this that we could end up in the following situation:

                 CPU0                             CPU1
    -----------------------------      -----------------------------
1. TX HARD IRQ Handler entered         NAPI TX Handler is running
2. irq_disabled is set to true         Sees irq_disabled being true
3. Calls disable_irq_nosync()          Calls enable_irq()
4. Enters disable_irq_nosync()         [WARNING: Unbalanced enable for IRQ]

> 
>  From what I gather you're concerned about the case when hard IRQ and
> NAPI run in parallel. But I don't see how that could ever happen for
> Tx (there are some complexities like netpoll and busy poll but those
> will return false from napi_complete_done()).

Weirdly, net_rx_action() is leading to NAPI TX Handler.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-25 11:31         ` Siddharth Vadapalli
@ 2026-02-26  0:09           ` Jakub Kicinski
  2026-02-26 11:34             ` Siddharth Vadapalli
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-02-26  0:09 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk

On Wed, 25 Feb 2026 17:01:31 +0530 Siddharth Vadapalli wrote:
> 	net_rx_action
> 		__napi_poll
> 			NAPI TX Handler
> 
> It does seem strange that the 'net_rx_action' leads to the NAPI TX Handler.

For historic reason rx_action runs all NAPI, it's fine.

> However, it is exactly this path that causes the warning, and it is due to
> this that we could end up in the following situation:
> 
>                  CPU0                             CPU1
>     -----------------------------      -----------------------------
> 1. TX HARD IRQ Handler entered         NAPI TX Handler is running
> 2. irq_disabled is set to true         Sees irq_disabled being true
> 3. Calls disable_irq_nosync()          Calls enable_irq()
> 4. Enters disable_irq_nosync()         [WARNING: Unbalanced enable for IRQ]

Right, but for Tx NAPI is only scheduled from the IRQ so this is not
possible. For Rx yes, AFAICT there are paths in the driver which
schedule the Rx NAPI (AF_XDP?). But Tx NAPI seemed to have only
been scheduled by IRQ. And if that's the case the NAPI can't run
until CPU0's IRQ handler calls napi_schedule().

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ
  2026-02-26  0:09           ` Jakub Kicinski
@ 2026-02-26 11:34             ` Siddharth Vadapalli
  0 siblings, 0 replies; 16+ messages in thread
From: Siddharth Vadapalli @ 2026-02-26 11:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, danishanwar, rogerq,
	horms, mwalle, nm, v-singh1, vadim.fedorenko, matthias.schiffer,
	vigneshr, m-malladi, jacob.e.keller, stable, netdev, linux-kernel,
	linux-arm-kernel, srk, s-vadapalli

On 26/02/26 05:39, Jakub Kicinski wrote:
> On Wed, 25 Feb 2026 17:01:31 +0530 Siddharth Vadapalli wrote:
>> 	net_rx_action
>> 		__napi_poll
>> 			NAPI TX Handler
>>
>> It does seem strange that the 'net_rx_action' leads to the NAPI TX Handler.
> 
> For historic reason rx_action runs all NAPI, it's fine.
> 
>> However, it is exactly this path that causes the warning, and it is due to
>> this that we could end up in the following situation:
>>
>>                   CPU0                             CPU1
>>      -----------------------------      -----------------------------
>> 1. TX HARD IRQ Handler entered         NAPI TX Handler is running
>> 2. irq_disabled is set to true         Sees irq_disabled being true
>> 3. Calls disable_irq_nosync()          Calls enable_irq()
>> 4. Enters disable_irq_nosync()         [WARNING: Unbalanced enable for IRQ]
> 
> Right, but for Tx NAPI is only scheduled from the IRQ so this is not

But the NAPI TX Handler running on CPU1 at step-1 is a previously queued 
instance and not the one corresponding to the TX HARD IRQ Handler running 
on CPU0 in step-1.

> possible. For Rx yes, AFAICT there are paths in the driver which
> schedule the Rx NAPI (AF_XDP?). But Tx NAPI seemed to have only
> been scheduled by IRQ. And if that's the case the NAPI can't run
> until CPU0's IRQ handler calls napi_schedule().

Wouldn't net_rx_action periodically schedule the NAPI TX Handler? Please 
note the following:
1. The call trace shows net_rx_action scheduling the NAPI TX Handler.
2. Since CPU0 entered the TX HARD IRQ Handler for the TX Completion 
Interrupt, it indicates that the TX Interrupt was indeed enabled at that 
point in time. Hence, a previous instance of the NAPI TX Handler has 
already invoked 'enable_irq()' and enabled the TX Interrupt.
3. Given that 'another' previous instance of the NAPI TX Handler is running 
even though the TX Interrupt is already enabled (given that CPU0 takes the 
interrupt), it indicates that it has been scheduled by some entity.
4. The entity that seems to schedule the NAPI TX Handler despite TX 
Interrupt already being enabled by a previous run of the NAPI TX Handler 
seems to be 'net_rx_action'.


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-02-26 11:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-20  4:11 [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Siddharth Vadapalli
2026-02-20  4:11 ` [PATCH net 1/3] net: ethernet: ti: am65-cpsw-nuss: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
2026-02-24  2:48   ` Jakub Kicinski
2026-02-24  5:10     ` Siddharth Vadapalli
2026-02-24 23:54       ` Jakub Kicinski
2026-02-25 11:12         ` Siddharth Vadapalli
2026-02-20  4:11 ` [PATCH net 2/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling TX IRQ Siddharth Vadapalli
2026-02-24  2:48   ` Jakub Kicinski
2026-02-24 12:24     ` Siddharth Vadapalli
2026-02-24 23:49       ` Jakub Kicinski
2026-02-25 11:31         ` Siddharth Vadapalli
2026-02-26  0:09           ` Jakub Kicinski
2026-02-26 11:34             ` Siddharth Vadapalli
2026-02-20  4:11 ` [PATCH net 3/3] net: ethernet: ti: icssg_common: set irq_disabled after disabling RX IRQ Siddharth Vadapalli
2026-02-20 10:00 ` [PATCH net 0/3] Fix Unbalanced IRQ Enable for CPSW and ICSSG Malladi, Meghana
2026-02-23 17:39 ` Simon Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox