* [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
@ 2026-06-12  9:01 Andrea della Porta
  2026-06-12  9:45 ` Théo Lebrun
  2026-06-12 12:23 ` Nicolai Buchwitz
  0 siblings, 2 replies; 9+ messages in thread
From: Andrea della Porta @ 2026-06-12  9:01 UTC (permalink / raw)
  To: netdev, Theo Lebrun, Andrea della Porta, Nicolas Ferre,
	Claudiu Beznea, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, linux-arm-kernel,
	linux-rpi-kernel
  Cc: Lukasz Raczylo, Steffen Jaeckel

From: Lukasz Raczylo <lukasz@raczylo.com>

The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
the TX queue.
While the exact root cause is not yet fully understood, it is likely
related to a hardware issue where a TSTART write to the NCR register
is missed, preventing the transmission from being kicked off.

Implement a timeout callback to handle TX queue stalls, triggering the
existing restart mechanism to recover.

Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa21244e83..615da65d5d68d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4522,6 +4522,16 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	}
 }
 
+static void macb_tx_timeout(struct net_device *dev, unsigned int q)
+{
+	struct macb *bp = netdev_priv(dev);
+
+	if (net_ratelimit())
+		netdev_err(dev, "TX stall detected, re-kicking TSTART\n");
+	dev->stats.tx_errors++;
+	macb_tx_restart(&bp->queues[q]);
+}
+
 static const struct net_device_ops macb_netdev_ops = {
 	.ndo_open		= macb_open,
 	.ndo_stop		= macb_close,
@@ -4540,6 +4550,7 @@ static const struct net_device_ops macb_netdev_ops = {
 	.ndo_hwtstamp_set	= macb_hwtstamp_set,
 	.ndo_hwtstamp_get	= macb_hwtstamp_get,
 	.ndo_setup_tc		= macb_setup_tc,
+	.ndo_tx_timeout		= macb_tx_timeout,
 };
 
 /* Configure peripheral capabilities according to device tree
-- 
2.35.3



* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12  9:01 [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write Andrea della Porta
@ 2026-06-12  9:45 ` Théo Lebrun
  2026-06-12 12:40   ` Andrea della Porta
  2026-06-12 12:23 ` Nicolai Buchwitz
  1 sibling, 1 reply; 9+ messages in thread
From: Théo Lebrun @ 2026-06-12  9:45 UTC (permalink / raw)
  To: Andrea della Porta, netdev, Nicolas Ferre, Claudiu Beznea,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-kernel, linux-arm-kernel, linux-rpi-kernel
  Cc: Lukasz Raczylo, Steffen Jaeckel

Hello Andrea,

On Fri Jun 12, 2026 at 11:01 AM CEST, Andrea della Porta wrote:
> From: Lukasz Raczylo <lukasz@raczylo.com>
>
> The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
> the TX queue.
> While the exact root cause is not yet fully understood, it is likely
> related to a hardware issue where a TSTART write to the NCR register
> is missed, preventing the transmission from being kicked off.
>
> Implement a timeout callback to handle TX queue stalls, triggering the
> existing restart mechanism to recover.
>
> Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
> Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
> Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
> Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
> Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
> Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index a12aa21244e83..615da65d5d68d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -4522,6 +4522,16 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  	}
>  }
>  
> +static void macb_tx_timeout(struct net_device *dev, unsigned int q)
> +{
> +	struct macb *bp = netdev_priv(dev);
> +
> +	if (net_ratelimit())
> +		netdev_err(dev, "TX stall detected, re-kicking TSTART\n");

Is this standard? It looks odd.

> +	dev->stats.tx_errors++;

I am surprised by this. `tx_errors` would ideally be one per packet that
didn't get sent. Here we increment it once per queue that stalled.

I have a series to address stats issue (and use netdev_stat_ops API).
It is a follow-up to this:
https://lore.kernel.org/netdev/20260428-macb-drop-tx-v2-0-647f5199d8df@bootlin.com/

Also this is per-device shared data and we access it without
synchronisation.

Let's drop this increment.

> +	macb_tx_restart(&bp->queues[q]);
> +}

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12  9:01 [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write Andrea della Porta
  2026-06-12  9:45 ` Théo Lebrun
@ 2026-06-12 12:23 ` Nicolai Buchwitz
  2026-06-12 12:51   ` Andrea della Porta
  1 sibling, 1 reply; 9+ messages in thread
From: Nicolai Buchwitz @ 2026-06-12 12:23 UTC (permalink / raw)
  To: Andrea della Porta
  Cc: netdev, Theo Lebrun, Nicolas Ferre, Claudiu Beznea, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, linux-arm-kernel, linux-rpi-kernel, Lukasz Raczylo,
	Steffen Jaeckel

Hi Andrea

On 12.6.2026 11:01, Andrea della Porta wrote:
> From: Lukasz Raczylo <lukasz@raczylo.com>
> 
> The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
> the TX queue.
> While the exact root cause is not yet fully understood, it is likely
> related to a hardware issue where a TSTART write to the NCR register
> is missed, preventing the transmission from being kicked off.
> 
> Implement a timeout callback to handle TX queue stalls, triggering the
> existing restart mechanism to recover.
> 
> Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
> Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
> Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
> Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
> Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
> Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index a12aa21244e83..615da65d5d68d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -4522,6 +4522,16 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  	}
>  }
> 
> +static void macb_tx_timeout(struct net_device *dev, unsigned int q)
> +{
> +	struct macb *bp = netdev_priv(dev);
> +
> +	if (net_ratelimit())

Do we need the net_ratelimit() check (and message) here? AFAIU the 
watchdog core already prints a message for every timeout.

> +		netdev_err(dev, "TX stall detected, re-kicking TSTART\n");
> +	dev->stats.tx_errors++;
> +	macb_tx_restart(&bp->queues[q]);
> +}
> +
>  static const struct net_device_ops macb_netdev_ops = {
>  	.ndo_open		= macb_open,
>  	.ndo_stop		= macb_close,
> @@ -4540,6 +4550,7 @@ static const struct net_device_ops macb_netdev_ops = {
>  	.ndo_hwtstamp_set	= macb_hwtstamp_set,
>  	.ndo_hwtstamp_get	= macb_hwtstamp_get,
>  	.ndo_setup_tc		= macb_setup_tc,
> +	.ndo_tx_timeout		= macb_tx_timeout,

The commit message describes it as RP1 specific, but it gets applied to 
all other variants?

>  };
> 
>  /* Configure peripheral capabilities according to device tree

Thanks
Nicolai


* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12  9:45 ` Théo Lebrun
@ 2026-06-12 12:40   ` Andrea della Porta
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea della Porta @ 2026-06-12 12:40 UTC (permalink / raw)
  To: Théo Lebrun
  Cc: Andrea della Porta, netdev, Nicolas Ferre, Claudiu Beznea,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-kernel, linux-arm-kernel, linux-rpi-kernel,
	Lukasz Raczylo, Steffen Jaeckel

Hi Theo,

On 11:45 Fri 12 Jun     , Théo Lebrun wrote:
> Hello Andrea,
> 
> On Fri Jun 12, 2026 at 11:01 AM CEST, Andrea della Porta wrote:
> > From: Lukasz Raczylo <lukasz@raczylo.com>
> >
> > The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
> > the TX queue.
> > While the exact root cause is not yet fully understood, it is likely
> > related to a hardware issue where a TSTART write to the NCR register
> > is missed, preventing the transmission from being kicked off.
> >
> > Implement a timeout callback to handle TX queue stalls, triggering the
> > existing restart mechanism to recover.
> >
> > Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
> > Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
> > Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
> > Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
> > Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
> > Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
> > Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> > ---
> >  drivers/net/ethernet/cadence/macb_main.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> > index a12aa21244e83..615da65d5d68d 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -4522,6 +4522,16 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
> >  	}
> >  }
> >  
> > +static void macb_tx_timeout(struct net_device *dev, unsigned int q)
> > +{
> > +	struct macb *bp = netdev_priv(dev);
> > +
> > +	if (net_ratelimit())
> > +		netdev_err(dev, "TX stall detected, re-kicking TSTART\n");
> 
> Is this standard? It looks odd.

I found it used in other drivers; it's the closest thing I found for
rate-limiting net-related output. As Nicolai suggested, the core already
prints a message on timeout, so I will drop those two lines.

> 
> > +	dev->stats.tx_errors++;
> 
> I am surprised by this. `tx_errors` would ideally be one per packet that
> didn't get sent. Here we increment it once per queue that stalled.
> 
> I have a series to address stats issue (and use netdev_stat_ops API).
> It is a follow-up to this:
> https://lore.kernel.org/netdev/20260428-macb-drop-tx-v2-0-647f5199d8df@bootlin.com/
> 
> Also this is per-device shared data and we access it without
> synchronisation.
> 
> Let's drop this increment.

Agreed.

Thanks,
Andrea

> 
> > +	macb_tx_restart(&bp->queues[q]);
> > +}
> 
> Regards,
> 
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
> 
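
For reference, with both review comments applied (the rate-limited
message and the tx_errors increment dropped), the callback would reduce
to roughly the shape below. This is a userspace sketch, not the driver
code: the struct definitions, MOCK_NUM_QUEUES and the restarts counter
are stand-ins invented for illustration, and macb_tx_restart() is a stub
in place of the driver's existing restart helper.

```c
/* Userspace stand-ins for the kernel types; only what the sketch needs. */
#define MOCK_NUM_QUEUES 4	/* hypothetical queue count */

struct macb_queue {
	unsigned int restarts;	/* hypothetical counter, for observation only */
};

struct macb {
	struct macb_queue queues[MOCK_NUM_QUEUES];
};

struct net_device {
	struct macb priv;	/* stand-in for the netdev private area */
};

/* Mock of netdev_priv(): returns the driver-private data of the device. */
static struct macb *netdev_priv(struct net_device *dev)
{
	return &dev->priv;
}

/* Stub for the driver's existing restart helper; here it only counts calls. */
static void macb_tx_restart(struct macb_queue *queue)
{
	queue->restarts++;
}

/*
 * Timeout callback as it would look after review: no extra log line
 * (the watchdog core already reports the timeout) and no tx_errors
 * increment, just a restart of the queue that stalled.
 */
static void macb_tx_timeout(struct net_device *dev, unsigned int q)
{
	struct macb *bp = netdev_priv(dev);

	macb_tx_restart(&bp->queues[q]);
}
```

Only the queue index passed in by the watchdog is restarted; the other
queues are left untouched.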


* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12 12:23 ` Nicolai Buchwitz
@ 2026-06-12 12:51   ` Andrea della Porta
  2026-06-12 12:53     ` Nicolai Buchwitz
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea della Porta @ 2026-06-12 12:51 UTC (permalink / raw)
  To: Nicolai Buchwitz
  Cc: Andrea della Porta, netdev, Theo Lebrun, Nicolas Ferre,
	Claudiu Beznea, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, linux-arm-kernel,
	linux-rpi-kernel, Lukasz Raczylo, Steffen Jaeckel

Hi Nicolai,

On 14:23 Fri 12 Jun     , Nicolai Buchwitz wrote:
> Hi Andrea
> 
> On 12.6.2026 11:01, Andrea della Porta wrote:
> > From: Lukasz Raczylo <lukasz@raczylo.com>
> > 
> > The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
> > the TX queue.
> > While the exact root cause is not yet fully understood, it is likely
> > related to a hardware issue where a TSTART write to the NCR register
> > is missed, preventing the transmission from being kicked off.
> > 
> > Implement a timeout callback to handle TX queue stalls, triggering the
> > existing restart mechanism to recover.
> > 
> > Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
> > Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
> > Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
> > Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
> > Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
> > Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
> > Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> > ---
> >  drivers/net/ethernet/cadence/macb_main.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> > index a12aa21244e83..615da65d5d68d 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -4522,6 +4522,16 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
> >  	}
> >  }
> > 
> > +static void macb_tx_timeout(struct net_device *dev, unsigned int q)
> > +{
> > +	struct macb *bp = netdev_priv(dev);
> > +
> > +	if (net_ratelimit())
> 
> Do we need the net_ratelimit() check (and message) here? AFAIU the watchdog
> core already prints a message for every timeout.

Correct. I'll drop those two lines.

> 
> > +		netdev_err(dev, "TX stall detected, re-kicking TSTART\n");
> > +	dev->stats.tx_errors++;
> > +	macb_tx_restart(&bp->queues[q]);
> > +}
> > +
> >  static const struct net_device_ops macb_netdev_ops = {
> >  	.ndo_open		= macb_open,
> >  	.ndo_stop		= macb_close,
> > @@ -4540,6 +4550,7 @@ static const struct net_device_ops macb_netdev_ops = {
> >  	.ndo_hwtstamp_set	= macb_hwtstamp_set,
> >  	.ndo_hwtstamp_get	= macb_hwtstamp_get,
> >  	.ndo_setup_tc		= macb_setup_tc,
> > +	.ndo_tx_timeout		= macb_tx_timeout,
> 
> The commit message describes it as RP1 specific, but it gets applied to all
> other variants?

I've seen this issue only on the Raspberry Pi 5, but AFAIK it
could also affect other MACB blocks connected through PCIe, so it
may be widespread (even though it would probably have been noticed
already). The original driver defines no timeout callback, which is
much like pretending that the issue causing the timeout will go away
on its own without doing anything (whatever the cause or the specific
hw is). So in my opinion we can just extend that to all MACB.
Or maybe we should execute the restart conditionally on
.compatible = "raspberrypi,rp1-gem"?

Thanks,
Andrea

> 
> >  };
> > 
> >  /* Configure peripheral capabilities according to device tree
> 
> Thanks
> Nicolai
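
The conditional variant floated above could look roughly like the sketch
below. It is a userspace mock, not the driver code: the compatible field
on struct macb, the macb_needs_tx_rekick() helper, MOCK_NUM_QUEUES and
the restarts counter are all hypothetical. In the driver the check would
more likely go through the device tree match data or
of_device_is_compatible() rather than a stored string.

```c
#include <stdbool.h>
#include <string.h>

#define MOCK_NUM_QUEUES 4	/* hypothetical queue count */

struct macb_queue {
	unsigned int restarts;	/* hypothetical counter, for observation only */
};

struct macb {
	const char *compatible;	/* hypothetical: DT compatible of this instance */
	struct macb_queue queues[MOCK_NUM_QUEUES];
};

struct net_device {
	struct macb priv;	/* stand-in for the netdev private area */
};

/* Mock of netdev_priv(): returns the driver-private data of the device. */
static struct macb *netdev_priv(struct net_device *dev)
{
	return &dev->priv;
}

/* Stub for the driver's existing restart helper; here it only counts calls. */
static void macb_tx_restart(struct macb_queue *queue)
{
	queue->restarts++;
}

/* Hypothetical helper: true only for the RP1 variant of the block. */
static bool macb_needs_tx_rekick(const struct macb *bp)
{
	return bp->compatible &&
	       strcmp(bp->compatible, "raspberrypi,rp1-gem") == 0;
}

static void macb_tx_timeout(struct net_device *dev, unsigned int q)
{
	struct macb *bp = netdev_priv(dev);

	/* Other variants keep the previous behaviour: no action on timeout. */
	if (!macb_needs_tx_rekick(bp))
		return;

	macb_tx_restart(&bp->queues[q]);
}
```

The trade-off is the one raised in the question: gating keeps the change
scoped to the hardware where the stall was observed, while leaving every
other variant without any recovery on timeout.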


* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12 12:51   ` Andrea della Porta
@ 2026-06-12 12:53     ` Nicolai Buchwitz
  2026-06-12 13:03       ` Andrea della Porta
  0 siblings, 1 reply; 9+ messages in thread
From: Nicolai Buchwitz @ 2026-06-12 12:53 UTC (permalink / raw)
  To: Andrea della Porta
  Cc: netdev, Theo Lebrun, Nicolas Ferre, Claudiu Beznea, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, linux-arm-kernel, linux-rpi-kernel, Lukasz Raczylo,
	Steffen Jaeckel

Hi Andrea

On 12.6.2026 14:51, Andrea della Porta wrote:

> [...]

>> 
>> The commit message describes it as RP1 specific, but it gets applied
>> to all other variants?
>
> I've seen this issue only on the Raspberry Pi 5, but AFAIK it
> could also affect other MACB blocks connected through PCIe, so it
> may be widespread (even though it would probably have been noticed
> already). The original driver defines no timeout callback, which is
> much like pretending that the issue causing the timeout will go away
> on its own without doing anything (whatever the cause or the specific
> hw is). So in my opinion we can just extend that to all MACB.
> Or maybe we should execute the restart conditionally on
> .compatible = "raspberrypi,rp1-gem"?

I just observed the issue once, but other people reported that it
happens more frequently. If we can narrow down a reproducer, it would
be good to test on other blocks too (like EyeQ at Théo's).

So maybe you can come up with a good repro for this issue?

Thanks,
Nicolai


* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12 12:53     ` Nicolai Buchwitz
@ 2026-06-12 13:03       ` Andrea della Porta
  2026-06-12 14:28         ` Théo Lebrun
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea della Porta @ 2026-06-12 13:03 UTC (permalink / raw)
  To: Nicolai Buchwitz
  Cc: Andrea della Porta, netdev, Theo Lebrun, Nicolas Ferre,
	Claudiu Beznea, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, linux-arm-kernel,
	linux-rpi-kernel, Lukasz Raczylo, Steffen Jaeckel

Hi Nicolai,

On 14:53 Fri 12 Jun     , Nicolai Buchwitz wrote:
> Hi Andrea
> 
> On 12.6.2026 14:51, Andrea della Porta wrote:
> 
> > [...]
> 
> > > 
> > > The commit message describes it as RP1 specific, but it gets applied
> > > to all other variants?
> >
> > I've seen this issue only on the Raspberry Pi 5, but AFAIK it
> > could also affect other MACB blocks connected through PCIe, so it
> > may be widespread (even though it would probably have been noticed
> > already). The original driver defines no timeout callback, which is
> > much like pretending that the issue causing the timeout will go away
> > on its own without doing anything (whatever the cause or the specific
> > hw is). So in my opinion we can just extend that to all MACB.
> > Or maybe we should execute the restart conditionally on
> > .compatible = "raspberrypi,rp1-gem"?
>
> I just observed the issue once, but other people reported that it
> happens more frequently. If we can narrow down a reproducer, it would
> be good to test on other blocks too (like EyeQ at Théo's).
>
> So maybe you can come up with a good repro for this issue?

Sure, it happens quite often during bulk data transfers, at least
on my RPi5.
It can be reproduced with the following, issued from the DUT:

  iperf -c <SERVER_IP> -P 10 -t 3000 -w 4M -i 1

plus, of course, the matching command on the server side: iperf -s.

It usually happens a couple of times within a few hours.

Regards,
Andrea

> 
> Thanks,
> Nicolai


* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12 13:03       ` Andrea della Porta
@ 2026-06-12 14:28         ` Théo Lebrun
  2026-06-12 14:30           ` Théo Lebrun
  0 siblings, 1 reply; 9+ messages in thread
From: Théo Lebrun @ 2026-06-12 14:28 UTC (permalink / raw)
  To: Andrea della Porta, Nicolai Buchwitz
  Cc: netdev, Nicolas Ferre, Claudiu Beznea, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, linux-arm-kernel, linux-rpi-kernel, Lukasz Raczylo,
	Steffen Jaeckel

On Fri Jun 12, 2026 at 3:03 PM CEST, Andrea della Porta wrote:
> On 14:53 Fri 12 Jun     , Nicolai Buchwitz wrote:
>> On 12.6.2026 14:51, Andrea della Porta wrote:
>> > > The commit message describes it as RP1 specific, but it gets applied
>> > > to all other variants?
>> >
>> > I've seen this issue only on the Raspberry Pi 5, but AFAIK it
>> > could also affect other MACB blocks connected through PCIe, so it
>> > may be widespread (even though it would probably have been noticed
>> > already). The original driver defines no timeout callback, which is
>> > much like pretending that the issue causing the timeout will go away
>> > on its own without doing anything (whatever the cause or the specific
>> > hw is). So in my opinion we can just extend that to all MACB.
>> > Or maybe we should execute the restart conditionally on
>> > .compatible = "raspberrypi,rp1-gem"?
>>
>> I just observed the issue once, but other people reported that it
>> happens more frequently. If we can narrow down a reproducer, it would
>> be good to test on other blocks too (like EyeQ at Théo's).
>>
>> So maybe you can come up with a good repro for this issue?
>
> Sure, it happens quite often during bulk data transfers, at least
> on my RPi5.
> It can be reproduced with the following, issued from the DUT:
>
>   iperf -c <SERVER_IP> -P 10 -t 3000 -w 4M -i 1
>
> plus, of course, the matching command on the server side: iperf -s.
>
> It usually happens a couple of times within a few hours.

Thanks for the reproducer command; I'll run it next week.
I'd be surprised if it reproduced on hardware that isn't the Pi 5.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [PATCH] net: macb: add TX stall timeout callback to recover from lost TSTART write
  2026-06-12 14:28         ` Théo Lebrun
@ 2026-06-12 14:30           ` Théo Lebrun
  0 siblings, 0 replies; 9+ messages in thread
From: Théo Lebrun @ 2026-06-12 14:30 UTC (permalink / raw)
  To: Andrea della Porta, Nicolai Buchwitz
  Cc: netdev, Nicolas Ferre, Claudiu Beznea, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, linux-arm-kernel, linux-rpi-kernel, Lukasz Raczylo,
	Steffen Jaeckel

On Fri Jun 12, 2026 at 4:28 PM CEST, Théo Lebrun wrote:
> On Fri Jun 12, 2026 at 3:03 PM CEST, Andrea della Porta wrote:
>> On 14:53 Fri 12 Jun     , Nicolai Buchwitz wrote:
>>> On 12.6.2026 14:51, Andrea della Porta wrote:
>>> > > The commit message describes it as RP1 specific, but it gets applied
>>> > > to all other variants?
>>> >
>>> > I've seen this issue only on the Raspberry Pi 5, but AFAIK it
>>> > could also affect other MACB blocks connected through PCIe, so it
>>> > may be widespread (even though it would probably have been noticed
>>> > already). The original driver defines no timeout callback, which is
>>> > much like pretending that the issue causing the timeout will go away
>>> > on its own without doing anything (whatever the cause or the specific
>>> > hw is). So in my opinion we can just extend that to all MACB.
>>> > Or maybe we should execute the restart conditionally on
>>> > .compatible = "raspberrypi,rp1-gem"?
>>>
>>> I just observed the issue once, but other people reported that it
>>> happens more frequently. If we can narrow down a reproducer, it would
>>> be good to test on other blocks too (like EyeQ at Théo's).
>>>
>>> So maybe you can come up with a good repro for this issue?
>>
>> Sure, it happens quite often during bulk data transfers, at least
>> on my RPi5.
>> It can be reproduced with the following, issued from the DUT:
>>
>>   iperf -c <SERVER_IP> -P 10 -t 3000 -w 4M -i 1
>>
>> plus, of course, the matching command on the server side: iperf -s.
>>
>> It usually happens a couple of times within a few hours.
>
> Thanks for the reproducer command; I'll run it next week.
> I'd be surprised if it reproduced on hardware that isn't the Pi 5.

Sorry for the two-step message. I forgot to mention I'd prefer to have
the timeout callback on all platforms: don't reserve it for Pi 5.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

