[PATCH] net: macb: fix race condition in TX queue initialization

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] net: macb: fix race condition in TX queue initialization
@ 2026-02-18 10:43 Ingo Rohloff
  2026-02-19 22:43 ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Rohloff @ 2026-02-18 10:43 UTC (permalink / raw)
  To: nicolas.ferre, claudiu.beznea; +Cc: kuba, netdev, Ingo Rohloff

Commit bf9cf80cab81e ("net: macb: Fix tx/rx malfunction after phy link
down and up") initializes queue->tx_head and queue->tx_tail to zero
without holding queue->tx_ptr_lock.

As established in commit 99537d5c476c ("net: macb: Relocate
mog_init_rings() callback from macb_mac_link_up() to macb_open()"),
macb_start_xmit() and macb_mac_link_up() can execute concurrently.
Consequently, modifying the TX pointers without locking creates a race
condition.

To resolve this, this patch ensures that the netif tx queues are stopped
before the device is registered via register_netdev().  The TX queues are
now enabled only at the end of macb_mac_link_up() rather than during
macb_open(). This avoids concurrent execution of macb_mac_link_up()
and macb_start_xmit() thus avoiding the race condition.

This is consistent with the behavior of macb_mac_link_down(),
which also stops the tx queues.

Signed-off-by: Ingo Rohloff <ingo.rohloff@lauterbach.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 43cd013bb70e..2fe0e843627f 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2973,8 +2973,6 @@ static int macb_open(struct net_device *dev)
 	if (err)
 		goto phy_off;

-	netif_tx_start_all_queues(dev);
-
 	if (bp->ptp_info)
 		bp->ptp_info->ptp_init(dev);

@@ -5607,6 +5605,7 @@ static int macb_probe(struct platform_device *pdev)
 	if (err)
 		goto err_out_phy_exit;

+	netif_tx_stop_all_queues(dev);
 	netif_carrier_off(dev);

 	err = register_netdev(dev);
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: macb: fix race condition in TX queue initialization
  2026-02-18 10:43 [PATCH] net: macb: fix race condition in TX queue initialization Ingo Rohloff
@ 2026-02-19 22:43 ` Jakub Kicinski
  2026-02-20 14:27   ` Ingo Rohloff
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2026-02-19 22:43 UTC (permalink / raw)
  To: Ingo Rohloff; +Cc: nicolas.ferre, claudiu.beznea, netdev

On Wed, 18 Feb 2026 11:43:25 +0100 Ingo Rohloff wrote:
> Commit bf9cf80cab81e ("net: macb: Fix tx/rx malfunction after phy link
> down and up") initializes queue->tx_head and queue->tx_tail to zero
> without holding queue->tx_ptr_lock.

That commit does look quite buggy, but I'm not sure whether the fix is
effective. netif_tx_start_all_queues() and netif_tx_stop_all_queues()
do not take locks or wait for anything either. I don't see how they can
provide us with synchronization guarantees. Of course if we start with
queues stopped - the _chances_ of concurrent xmit and link_up are much
lower, but if the link happens to flap rapidly while xmit is happening
we're back to the same race.

Should we take queue->tx_ptr_lock in link_up, appropriately?
The patch you posted looks like a common sense improvement, so we can
apply it as well. But the fix should be locking and fixing whatever
questionable dark arts the link_up handler is performing on the queues
(we are resetting the pointers without freeing outstanding skbs!?).

When you repost please make sure to include a Fixes tag on the fix.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: macb: fix race condition in TX queue initialization
  2026-02-19 22:43 ` Jakub Kicinski
@ 2026-02-20 14:27   ` Ingo Rohloff
  2026-02-20 23:54     ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Rohloff @ 2026-02-20 14:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: nicolas.ferre, claudiu.beznea, netdev

Subject: Re: [PATCH] net: macb: fix race condition in TX queue initialization
Date: Thu, 19 Feb 2026 14:43:38 -0800

> > On Wed, 18 Feb 2026 11:43:25 +0100 Ingo Rohloff wrote:
> > Commit bf9cf80cab81e ("net: macb: Fix tx/rx malfunction after phy link
> > down and up") initializes queue->tx_head and queue->tx_tail to zero
> > without holding queue->tx_ptr_lock.

> That commit does look quite buggy, but I'm not sure whether the fix is
> effective.

Yes: I myself think that what I posted is definitely not enough.

I am not sure if I can come up with a real fix though:

Looking at ethernet drivers which use both:
   * phylink_mac_ops->mac_link_down()
   * net_device_ops->ndo_open()
the most commonly used strategy seems to be:
* When net_device_ops->ndo_open() is called:
  - Setup ring buffers
  - Call netif_tx_start_all_queues()
* DO NOT call netif_tx_stop_all_queues() or similar when
  phylink_mac_ops->mac_link_down() is called
* DO NOT re-initialize ring buffers when
  phylink_mac_ops->mac_link_up() is called.

So: Leave the ring buffers alone in
phylink_mac_ops->mac_link_up() and phylink_mac_ops->mac_link_down().

The driver in macb_main.c seems to be one of the exceptions
which calls netif_tx_stop_all_queues() in macb_mac_link_down() and
netif_tx_wake_all_queues() in macb_mac_link_up().

My problem is:
I don't understand why the OTHER drivers don't do it like this.
What's the expectation of the hardware, if I fill the TX ring buffer
with TX skb's even if the link is down?
Is the hardware supposed to throw all these TX entries away
and still produce completion or error(?) interrupts?
Or is this handled, because drivers which use "phylink",
automatically call netif_carrier_on() and netif_carrier_off() via
"phylink" when the link changes up/down state?

If you know the expected behavior:
Could you teach me what's expected ?

If I understand the expected behavior I might be able to come up
with an idea what the macb driver is supposed to do :-)

> But the fix should be locking and fixing whatever
> questionable dark arts the link_up handler is performing on the queues
> (we are resetting the pointers without freeing outstanding skbs!?).

Yeah: The last one also caught my eye; I think this might produce
an ever growing bunch of outstanding skb's.

I think the reason why the macb driver is unusual is,
because the hardware itself resets the pointers into the
TX ring buffer, when you disable transmit; which means you somehow
have to clean up the TX ring buffer after a link down
(if you disable transmit).

Most of the other drivers I have looked at, somehow seem to expect
that this cleanup happens automagically; but I have no clue why or
how this cleanup happens for other hardware.

I will be very thankful for any input what's the expected behavior!

so long
  Ingo

PS:
Just for reference,
I looked into the following drivers which use both:
    ->ndo_open() and ->mac_link_down()
not many surprisingly(?):

ethernet/altera/altera_tse_main.c
ethernet/atheros/ag71xx.c
ethernet/cadence/macb_main.c
ethernet/freescale/enetc/enetc4_pf.c
ethernet/freescale/enetc/enetc_pf.c
ethernet/freescale/fs_enet/fs_enet-main.c
ethernet/freescale/ucc_geth.c
ethernet/marvell/mvneta.c
ethernet/marvell/mvpp2/mvpp2_main.c
ethernet/marvell/prestera/prestera_main.c
ethernet/mediatek/mtk_eth_soc.c
ethernet/microchip/lan743x_main.c
ethernet/mscc/ocelot_net.c
ethernet/stmicro/stmmac/stmmac_main.c
ethernet/ti/am65-cpsw-nuss.c
ethernet/xilinx/xilinx_axienet_main.c

Drivers which stop netif tx queues in mac_link_down()
Note: Seems like these still start
with enabled tx queues in ->ndo_open() ?

   ethernet/microchip/lan743x_main.c
   ethernet/marvell/mvpp2/mvpp2_main.c
   ethernet/ti/am65-cpsw-nuss.c

Drivers which DO NOT stop netif tx queues in mac_link_down()

   ethernet/altera/altera_tse_main.c
   ethernet/atheros/ag71xx.c
   ethernet/freescale/enetc/enetc.c
   ethernet/marvell/mvneta.c
   ethernet/marvell/prestera/prestera_main.c
   ethernet/mediatek/mtk_eth_soc.c
   ethernet/mscc/ocelot_net.c
   ethernet/stmicro/stmmac/stmmac_main.c
   ethernet/xilinx/xilinx_axienet_main.c

And:
   ethernet/freescale/ucc_geth.c
Which seems to be another potential start_xmit/link_up race,
albeit with fancy handling. Look for usage of ugeth_quiesce()

-------------------------------------------------------------------------
Dipl.-Inform.
Ingo ROHLOFF
Senior Staff Embedded Systems Engineering
phone +49 8102 9876-142 - ingo.rohloff@lauterbach.com

Lauterbach Engineering GmbH & Co. KG
Altlaufstrasse 40, 85635 Hoehenkirchen-Siegertsbrunn, GERMANY
www.lauterbach.com

Registered Office: Hoehenkirchen-Siegertsbrunn, Germany,
Local Court: Munich, HRA 87406, VAT ID: DE246770537,
Managing Directors: Lothar Lauterbach, Stephan Lauterbach, Dr. Thomas Ullmann

-------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: macb: fix race condition in TX queue initialization
  2026-02-20 14:27   ` Ingo Rohloff
@ 2026-02-20 23:54     ` Jakub Kicinski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2026-02-20 23:54 UTC (permalink / raw)
  To: Ingo Rohloff; +Cc: nicolas.ferre, claudiu.beznea, netdev

On Fri, 20 Feb 2026 15:27:23 +0100 Ingo Rohloff wrote:
> Subject: Re: [PATCH] net: macb: fix race condition in TX queue initialization
> Date: Thu, 19 Feb 2026 14:43:38 -0800
> 
> > > On Wed, 18 Feb 2026 11:43:25 +0100 Ingo Rohloff wrote:
> > > Commit bf9cf80cab81e ("net: macb: Fix tx/rx malfunction after phy link
> > > down and up") initializes queue->tx_head and queue->tx_tail to zero
> > > without holding queue->tx_ptr_lock.  
> 
> > That commit does look quite buggy, but I'm not sure whether the fix is
> > effective.  
> 
> Yes: I myself think that what I posted is definitely not enough.
> 
> I am not sure if I can come up with a real fix though:
> 
> Looking at ethernet drivers which use both:
>    * phylink_mac_ops->mac_link_down()
>    * net_device_ops->ndo_open()
> the most commonly used strategy seems to be:
> * When net_device_ops->ndo_open() is called:
>   - Setup ring buffers
>   - Call netif_tx_start_all_queues()
> * DO NOT call netif_tx_stop_all_queues() or similar when
>   phylink_mac_ops->mac_link_down() is called
> * DO NOT re-initialize ring buffers when
>   phylink_mac_ops->mac_link_up() is called.
> 
> So: Leave the ring buffers alone in
> phylink_mac_ops->mac_link_up() and phylink_mac_ops->mac_link_down().
> 
> The driver in macb_main.c seems to be one of the exceptions
> which calls netif_tx_stop_all_queues() in macb_mac_link_down() and
> netif_tx_wake_all_queues() in macb_mac_link_up().
> 
> My problem is:
> I don't understand why the OTHER drivers don't do it like this.
> What's the expectation of the hardware, if I fill the TX ring buffer
> with TX skb's even if the link is down?
> Is the hardware supposed to throw all these TX entries away
> and still produce completion or error(?) interrupts?
> Or is this handled, because drivers which use "phylink",
> automatically call netif_carrier_on() and netif_carrier_off() via
> "phylink" when the link changes up/down state?
> 
> 
> If you know the expected behavior:
> Could you teach me what's expected ?
> 
> If I understand the expected behavior I might be able to come up
> with an idea what the macb driver is supposed to do :-)

Before I tell you what I think - a disclosure that I have limited
experience with embedded devices, so take what I say with a grain
of salt. I mostly work on integrated NIC designs, day to day.

Normally the MAC has a "flush" bit of some sort which makes it consume
all the packets sent to the device / placed on the xmit queues.
With this mode enabled the device completes the transfers to the host
as if the send was successful, but the packets go nowhere.
This is necessary because we don't want to suddenly send a burst of
potentially extremely old packets when the link finally comes back up.

The driver is not expected to stop the Tx rings when link goes down.

I suppose macb either lacks such flush functionality or it's broken?
That'd explain why it tries to reset the ring manually.

Note that for integrated NICs the link is usually managed by FW running
in the NIC, so you won't see the driver do anything in response to the
link down.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-02-20 23:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-18 10:43 [PATCH] net: macb: fix race condition in TX queue initialization Ingo Rohloff
2026-02-19 22:43 ` Jakub Kicinski
2026-02-20 14:27   ` Ingo Rohloff
2026-02-20 23:54     ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox