[PATCH net-next] net: stmmac: enable RPS and RBU interrupts

public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed

* [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
@ 2026-04-10 13:07 Russell King (Oracle)
  2026-04-12 14:01 ` Maxime Chevallier
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-10 13:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, netdev,
	Paolo Abeni, Sam Edwards

Enable receive process stopped and receive buffer unavailable
interrupts, so that the statistic counters can be updated.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Since we are seeing receive buffer exhaustion on several platforms,
let's enable the interrupts so the statistics we publish via ethtool -S
actually work to aid diagnosis. I've been in two minds about whether
to send this patch, but given the problems with stmmac at the moment,
I think it should be merged.

 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
index af6580332d49..43b036d4e95b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
@@ -99,6 +99,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
 #define DMA_CHAN_INTR_ENA_NIE_4_10	BIT(15)
 #define DMA_CHAN_INTR_ENA_AIE_4_10	BIT(14)
 #define DMA_CHAN_INTR_ENA_FBE		BIT(12)
+#define DMA_CHAN_INTR_ENA_RPS		BIT(8)
+#define DMA_CHAN_INTR_ENA_RBU		BIT(7)
 #define DMA_CHAN_INTR_ENA_RIE		BIT(6)
 #define DMA_CHAN_INTR_ENA_TIE		BIT(0)
 
@@ -107,6 +109,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
 					 DMA_CHAN_INTR_ENA_TIE)
 
 #define DMA_CHAN_INTR_ABNORMAL		(DMA_CHAN_INTR_ENA_AIE | \
+					 DMA_CHAN_INTR_ENA_RPS | \
+					 DMA_CHAN_INTR_ENA_RBU | \
 					 DMA_CHAN_INTR_ENA_FBE)
 /* DMA default interrupt mask for 4.00 */
 #define DMA_CHAN_INTR_DEFAULT_MASK	(DMA_CHAN_INTR_NORMAL | \
@@ -117,6 +121,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
 					 DMA_CHAN_INTR_ENA_TIE)
 
 #define DMA_CHAN_INTR_ABNORMAL_4_10	(DMA_CHAN_INTR_ENA_AIE_4_10 | \
+					 DMA_CHAN_INTR_ENA_RPS | \
+					 DMA_CHAN_INTR_ENA_RBU | \
 					 DMA_CHAN_INTR_ENA_FBE)
 /* DMA default interrupt mask for 4.10a */
 #define DMA_CHAN_INTR_DEFAULT_MASK_4_10	(DMA_CHAN_INTR_NORMAL_4_10 | \
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
@ 2026-04-12 14:01 ` Maxime Chevallier
  2026-04-12 14:23   ` Russell King (Oracle)
  2026-04-13 18:02 ` Jakub Kicinski
  2026-04-13 22:00 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 20+ messages in thread
From: Maxime Chevallier @ 2026-04-12 14:01 UTC (permalink / raw)
  To: Russell King (Oracle), Andrew Lunn
  Cc: Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, netdev,
	Paolo Abeni, Sam Edwards

Hi Russell,

On 10/04/2026 15:07, Russell King (Oracle) wrote:
> Enable receive process stopped and receive buffer unavailable
> interrupts, so that the statistic counters can be updated.
> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Since we are seeing receive buffer exhaustion on several platforms,
> let's enable the interrupts so the statistics we publish via ethtool -S
> actually work to aid diagnosis. I've been in two minds about whether
> to send this patch, but given the problems with stmmac at the moment,
> I think it should be merged.

Looks like my reply to your original RFC was lost in limbo as the review/test tags are missing.

Here's my original answer :


It works, I can indeed see the stats get properly updated on imx8mp 🙂

There's one downside to it though, which is that as soon as we hit a situation
where we don't have RX bufs available, this patchs has a tendancy to make things
worse as we'll trigger interrupts for each packet we receive and that we can't
process, making it even longer for queues to be refilled.

It shows on iperf3 with small packets :

---- Before patch, 17% packet loss on UDP 56 bytes packets -----------------

# iperf3 -u -b 0 -l 56 -c 192.168.2.1 -R
Connecting to host 192.168.2.1, port 5201
Reverse mode, remote host 192.168.2.1 is sending
[  5] local 192.168.2.18 port 47851 connected to 192.168.2.1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  10.7 MBytes  90.0 Mbits/sec  0.003 ms  48550/249650 (19%)  
[  5]   1.00-2.00   sec  11.3 MBytes  95.0 Mbits/sec  0.003 ms  41881/253832 (16%)  
[  5]   2.00-3.00   sec  11.3 MBytes  94.9 Mbits/sec  0.002 ms  42060/253913 (17%)  
[  5]   3.00-4.00   sec  11.3 MBytes  95.1 Mbits/sec  0.003 ms  41499/253785 (16%)  
[  5]   4.00-5.00   sec  11.3 MBytes  94.6 Mbits/sec  0.003 ms  42663/253787 (17%)  
[  5]   5.00-6.00   sec  11.3 MBytes  94.9 Mbits/sec  0.006 ms  41976/253719 (17%)  
[  5]   6.00-7.00   sec  11.3 MBytes  94.5 Mbits/sec  0.003 ms  43133/253999 (17%)  
[  5]   7.00-8.00   sec  11.3 MBytes  95.0 Mbits/sec  0.004 ms  41442/253579 (16%)  
[  5]   8.00-9.00   sec  11.4 MBytes  95.2 Mbits/sec  0.004 ms  41518/254131 (16%)  
[  5]   9.00-10.00  sec  11.2 MBytes  94.3 Mbits/sec  0.006 ms  43580/254143 (17%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   135 MBytes   114 Mbits/sec  0.000 ms  0/0 (0%)  sender
[  5]   0.00-10.00  sec   112 MBytes  94.3 Mbits/sec  0.006 ms  428302/2534538 (17%)  receiver

iperf Done.
# ethtool -S eth1 | grep rx_buf_unav_irq
     rx_buf_unav_irq: 0

---- After patch, 22% packet loss on UDP 56 bytes packets ----------------------

# iperf3 -u -b 0 -l 56 -c 192.168.2.1 -R
Connecting to host 192.168.2.1, port 5201
Reverse mode, remote host 192.168.2.1 is sending
[  5] local 192.168.2.18 port 42121 connected to 192.168.2.1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  10.3 MBytes  85.8 Mbits/sec  0.004 ms  55146/247172 (22%)  
[  5]   1.00-2.00   sec  10.6 MBytes  89.1 Mbits/sec  0.003 ms  54699/253355 (22%)  
[  5]   2.00-3.00   sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  55231/253887 (22%)  
[  5]   3.00-4.00   sec  10.6 MBytes  88.9 Mbits/sec  0.003 ms  55138/253602 (22%)  
[  5]   4.00-5.00   sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  54938/253722 (22%)  
[  5]   5.00-6.00   sec  10.6 MBytes  88.9 Mbits/sec  0.003 ms  55273/253580 (22%)  
[  5]   6.00-7.00   sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  55202/253986 (22%)  
[  5]   7.00-8.00   sec  10.6 MBytes  89.1 Mbits/sec  0.003 ms  55047/253958 (22%)  
[  5]   8.00-9.00   sec  10.6 MBytes  88.9 Mbits/sec  0.003 ms  55612/254140 (22%)  
[  5]   9.00-10.00  sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  55683/254403 (22%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   135 MBytes   113 Mbits/sec  0.000 ms  0/0 (0%)  sender
[  5]   0.00-10.00  sec   106 MBytes  88.7 Mbits/sec  0.003 ms  551969/2531805 (22%)  receiver

iperf Done.
# ethtool -S eth1 | grep rx_buf_unav_irq
     rx_buf_unav_irq: 30624


So clearly there are pros and cons with this, but I don't want to fall into the
"let's not break microbenchmarks" pitfall.

I personnaly find the stat useful, and that having the stat visible to user
but stuck at 0 is misleading so,

Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>


Maxime

> 
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
> index af6580332d49..43b036d4e95b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
> @@ -99,6 +99,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
>  #define DMA_CHAN_INTR_ENA_NIE_4_10	BIT(15)
>  #define DMA_CHAN_INTR_ENA_AIE_4_10	BIT(14)
>  #define DMA_CHAN_INTR_ENA_FBE		BIT(12)
> +#define DMA_CHAN_INTR_ENA_RPS		BIT(8)
> +#define DMA_CHAN_INTR_ENA_RBU		BIT(7)
>  #define DMA_CHAN_INTR_ENA_RIE		BIT(6)
>  #define DMA_CHAN_INTR_ENA_TIE		BIT(0)
>  
> @@ -107,6 +109,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
>  					 DMA_CHAN_INTR_ENA_TIE)
>  
>  #define DMA_CHAN_INTR_ABNORMAL		(DMA_CHAN_INTR_ENA_AIE | \
> +					 DMA_CHAN_INTR_ENA_RPS | \
> +					 DMA_CHAN_INTR_ENA_RBU | \
>  					 DMA_CHAN_INTR_ENA_FBE)
>  /* DMA default interrupt mask for 4.00 */
>  #define DMA_CHAN_INTR_DEFAULT_MASK	(DMA_CHAN_INTR_NORMAL | \
> @@ -117,6 +121,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
>  					 DMA_CHAN_INTR_ENA_TIE)
>  
>  #define DMA_CHAN_INTR_ABNORMAL_4_10	(DMA_CHAN_INTR_ENA_AIE_4_10 | \
> +					 DMA_CHAN_INTR_ENA_RPS | \
> +					 DMA_CHAN_INTR_ENA_RBU | \
>  					 DMA_CHAN_INTR_ENA_FBE)
>  /* DMA default interrupt mask for 4.10a */
>  #define DMA_CHAN_INTR_DEFAULT_MASK_4_10	(DMA_CHAN_INTR_NORMAL_4_10 | \



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-12 14:01 ` Maxime Chevallier
@ 2026-04-12 14:23   ` Russell King (Oracle)
  2026-04-13  1:42     ` Sam Edwards
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-12 14:23 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32,
	netdev, Paolo Abeni, Sam Edwards

On Sun, Apr 12, 2026 at 04:01:59PM +0200, Maxime Chevallier wrote:
> Hi Russell,
> 
> On 10/04/2026 15:07, Russell King (Oracle) wrote:
> > Enable receive process stopped and receive buffer unavailable
> > interrupts, so that the statistic counters can be updated.
> > 
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> > Since we are seeing receive buffer exhaustion on several platforms,
> > let's enable the interrupts so the statistics we publish via ethtool -S
> > actually work to aid diagnosis. I've been in two minds about whether
> > to send this patch, but given the problems with stmmac at the moment,
> > I think it should be merged.
> 
> Looks like my reply to your original RFC was lost in limbo as the review/test tags are missing.

Thanks. Unfortunately, I can't run iperf3 against stmmac on the Jetson
NX because stmmac just totally screws itself (at the first RBU, the
receive side irrevocably collapses.)

Against i.MX6 (which is limited to around 480Mbps,) it's recoverable by
taking the interface down and back up a couple of times.

Against x86 (which will saturate the link) its pretty much
irrecoverable without entire system reboot - if one tries the down+up,
we then get arm-smmu errors because it seems that, despite stmmac being
reset, it still attempts to access a previous receive buffer from
before the down/up sometime after the up. Moreover, transmit stops
working - packets get queued but they are never processed by the
hardware. This is a scenario that I can only rarely test myself (as
it depends on my physical location.)

As the dwmac 5.0 core receive path seems to lock up after the first
RBU, I never see more than one of those at a time.

Right now, I consider this pretty much unsolvable - I've spent quite
some time looking at it and trying various approaches, nothing seems
to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
paths does seem to improve the situation a little with the 480Mbps
case, because I think it means that we're reading the descriptors in
a more timely manner after the hardware has updated them.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-12 14:23   ` Russell King (Oracle)
@ 2026-04-13  1:42     ` Sam Edwards
  2026-04-13  7:24       ` Russell King (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-13  1:42 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, netdev, Paolo Abeni

On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> As the dwmac 5.0 core receive path seems to lock up after the first
> RBU, I never see more than one of those at a time.
>
> Right now, I consider this pretty much unsolvable - I've spent quite
> some time looking at it and trying various approaches, nothing seems
> to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> paths does seem to improve the situation a little with the 480Mbps
> case, because I think it means that we're reading the descriptors in
> a more timely manner after the hardware has updated them.

Hey Russell,

I'd like to repro this but I currently can't boot net-next. My issue
is the same as [1], and the patch to fix it [2] isn't yet committed
anywhere apparently.

This prevents my Jetson Xavier NX from starting at all (and after
enough attempts, corrupts eMMC); I'm surprised you're not suffering
the same effects. But because this bug lives in the IOMMU subsystem
(and it has somewhat inconsistent effects), perhaps this is just a
different way it manifests? Could you confirm whether your dwmac hang
happens with IOMMU disabled, and/or with [1] reverted or [2] applied?

I'm using a defconfig build and a fairly minimal cmdline (just
console=, root=, and rootwait).

Cheers,
Sam

[1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
[2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13  1:42     ` Sam Edwards
@ 2026-04-13  7:24       ` Russell King (Oracle)
  2026-04-13  7:28         ` Russell King (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13  7:24 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, netdev, Paolo Abeni

On Sun, Apr 12, 2026 at 06:42:04PM -0700, Sam Edwards wrote:
> On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > As the dwmac 5.0 core receive path seems to lock up after the first
> > RBU, I never see more than one of those at a time.
> >
> > Right now, I consider this pretty much unsolvable - I've spent quite
> > some time looking at it and trying various approaches, nothing seems
> > to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> > paths does seem to improve the situation a little with the 480Mbps
> > case, because I think it means that we're reading the descriptors in
> > a more timely manner after the hardware has updated them.
> 
> Hey Russell,
> 
> I'd like to repro this but I currently can't boot net-next. My issue
> is the same as [1], and the patch to fix it [2] isn't yet committed
> anywhere apparently.
> 
> This prevents my Jetson Xavier NX from starting at all (and after
> enough attempts, corrupts eMMC); I'm surprised you're not suffering
> the same effects. But because this bug lives in the IOMMU subsystem
> (and it has somewhat inconsistent effects), perhaps this is just a
> different way it manifests? Could you confirm whether your dwmac hang
> happens with IOMMU disabled, and/or with [1] reverted or [2] applied?
> 
> I'm using a defconfig build and a fairly minimal cmdline (just
> console=, root=, and rootwait).
> 
> Cheers,
> Sam
> 
> [1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
> [2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/

In the second link, there is this sub-thread:

https://lore.kernel.org/all/ee2c2044-e329-4cdd-ac35-9365824d3677@arm.com/

which was committed into -rc as:

7e0548525abd iommu: Ensure .iotlb_sync is called correctly

which does fix IOMMU problems which caused net-next which reports itself
as v7.0-rc6 failing to boot with ext4 errors. See:

https://lore.kernel.org/r/adZTGOjjJrVJOcT8@shell.armlinux.org.uk

which resulted in it being merged into v7.0-rc7 just before Thursday's
net tree merge. Due to the way net-next is operated, that means that
net-next on Thursday evening gained this fix.

Involving Linus in the problem meant he was aware of it, and explaining
how netdev works allowed him to delay the merging of the net tree to
ensure net-next gained the fix.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13  7:24       ` Russell King (Oracle)
@ 2026-04-13  7:28         ` Russell King (Oracle)
  0 siblings, 0 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13  7:28 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, netdev, Paolo Abeni

On Mon, Apr 13, 2026 at 08:24:59AM +0100, Russell King (Oracle) wrote:
> On Sun, Apr 12, 2026 at 06:42:04PM -0700, Sam Edwards wrote:
> > On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > As the dwmac 5.0 core receive path seems to lock up after the first
> > > RBU, I never see more than one of those at a time.
> > >
> > > Right now, I consider this pretty much unsolvable - I've spent quite
> > > some time looking at it and trying various approaches, nothing seems
> > > to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> > > paths does seem to improve the situation a little with the 480Mbps
> > > case, because I think it means that we're reading the descriptors in
> > > a more timely manner after the hardware has updated them.
> > 
> > Hey Russell,
> > 
> > I'd like to repro this but I currently can't boot net-next. My issue
> > is the same as [1], and the patch to fix it [2] isn't yet committed
> > anywhere apparently.
> > 
> > This prevents my Jetson Xavier NX from starting at all (and after
> > enough attempts, corrupts eMMC); I'm surprised you're not suffering
> > the same effects. But because this bug lives in the IOMMU subsystem
> > (and it has somewhat inconsistent effects), perhaps this is just a
> > different way it manifests? Could you confirm whether your dwmac hang
> > happens with IOMMU disabled, and/or with [1] reverted or [2] applied?
> > 
> > I'm using a defconfig build and a fairly minimal cmdline (just
> > console=, root=, and rootwait).
> > 
> > Cheers,
> > Sam
> > 
> > [1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
> > [2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/
> 
> In the second link, there is this sub-thread:
> 
> https://lore.kernel.org/all/ee2c2044-e329-4cdd-ac35-9365824d3677@arm.com/
> 
> which was committed into -rc as:
> 
> 7e0548525abd iommu: Ensure .iotlb_sync is called correctly
> 
> which does fix IOMMU problems which caused net-next which reports itself
> as v7.0-rc6 failing to boot with ext4 errors. See:
> 
> https://lore.kernel.org/r/adZTGOjjJrVJOcT8@shell.armlinux.org.uk
> 
> which resulted in it being merged into v7.0-rc7 just before Thursday's
> net tree merge. Due to the way net-next is operated, that means that
> net-next on Thursday evening gained this fix.
> 
> Involving Linus in the problem meant he was aware of it, and explaining
> how netdev works allowed him to delay the merging of the net tree to
> ensure net-next gained the fix.

I'll also state what I've stated previously about the iperf3 problem:
it seems to go back a long time, certainly before I started cleaning
up the stmmac driver which is now well over a year ago.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
  2026-04-12 14:01 ` Maxime Chevallier
@ 2026-04-13 18:02 ` Jakub Kicinski
  2026-04-13 18:49   ` Russell King (Oracle)
  2026-04-13 22:00 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-04-13 18:02 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
	Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
	Sam Edwards

On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> Since we are seeing receive buffer exhaustion on several platforms,
> let's enable the interrupts so the statistics we publish via ethtool -S
> actually work to aid diagnosis. I've been in two minds about whether
> to send this patch, but given the problems with stmmac at the moment,
> I think it should be merged.

Sorry for a under-research response but wasn't there are person trying
to fix the OOM starvation issue? Who was supposed to add a timer?
Is your problem also OOM related or do you suspect something else?

Firing interrupts when Rx fill ring runs dry (which IIUC this patches
dies?) is not a good idea.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13 18:02 ` Jakub Kicinski
@ 2026-04-13 18:49   ` Russell King (Oracle)
  2026-04-13 20:50     ` Jakub Kicinski
  2026-04-13 21:54     ` Sam Edwards
  0 siblings, 2 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13 18:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
	Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
	Sam Edwards

On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > Since we are seeing receive buffer exhaustion on several platforms,
> > let's enable the interrupts so the statistics we publish via ethtool -S
> > actually work to aid diagnosis. I've been in two minds about whether
> > to send this patch, but given the problems with stmmac at the moment,
> > I think it should be merged.
> 
> Sorry for a under-research response but wasn't there are person trying
> to fix the OOM starvation issue? Who was supposed to add a timer?
> Is your problem also OOM related or do you suspect something else?

It is not OOM related. I have this patch applied:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 131ea887bedc..614d0e10e3e6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)

 		if (!buf->page) {
 			buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
-			if (!buf->page)
+			if (!buf->page) {
+				netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
 				break;
+			}
 		}

 		if (priv->sph_active && !buf->sec_page) {
 			buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
-			if (!buf->sec_page)
+			if (!buf->sec_page) {
+				netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
 				break;
+			}

 			buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
 		}

and it is silent, so we are not suffering starvation of buffers.

However, the hardware hangs during iperf3, and because it triggers the
MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
unmanaged switches that always use flow-control between them (there's no
way not to) it takes down the entire network - as we've discussed
before. So, this problem is pretty fatal to the *entire* network.

With this patch, the existing statistical counters for this condition
are incremented, and thus users can use ethtool -S to see what happened
and report whether they are seeing the same issue.

Without this patch applied, there are no diagnostics from stmmac that
report what the state is. ethtool -d doesn't list the appropriate
registers (as I suspect part of the problem is the number of queues
is somewhat dynamic - userspace can change that configuration through
ethtool).

Thus, one has to resort to using devmem2 to find out what's happened.
That's not user friendly.

For me, devmem2 shows:

Channel 0 status register:
Value at address 0x02491160: 0x00000484
bit 10: ETI early transmit interrupt - set
bit 9 : RWT receive watchdog - clear
bit 8 : RPS receieve process stopped - clear
bit 7 : RBU receive buffer unavailable - set
bit 6 : RI  receive interrupt - clear
bit 2 : TBU transmit buffer unavailable - set
bit 1 : TPS transmit process stopped - clear
bit 0 : TI  transmit interrupt - clear

Debug status register:
Value at address 0x0249100c: 0x00006300
TPS[3:0] = 6 = Suspended, Tx descriptor unavailable or Tx buffer
		underflow
RPS[3:0] = 3 = Running, waiting for Rx packet

Metal Queue 0 debug register:
Value at address 0x02490d38: 0x002e0020
PRXQ[13:0] = 0x2e = 46 packets in receive queue
RXQSTS[1:0] = 2 = Rx queue fill-level above flow-control activate
		threshold
RRCSTS[1:0] = 0 = Rx Queue Read Controller State = Idle

> Firing interrupts when Rx fill ring runs dry (which IIUC this patches
> dies?) is not a good idea.

Well, I'm thinking that at least on some platforms, such as the Jetson
Xavier NX, unless a different solution can be found, we need the RBU
interrupt to fire off a reset of the stmmac IP when this happens to
reduce the PAUSE frame flood on the network.

If we can't do that, then I think stmmac on these platforms needs to be
marked with CONFIG_BROKEN because right now there doesn't seem to be any
other viable solution.

My intention with this patch is merely to start collecting the already
existing statistics so other users can start seeing whether they are
hitting the same or similar problem. If we're not prepared to do that,
then we should delete the useless statistics from ethtool -S, but I
suspect they're now part of the UAPI, even though without this patch
they will remain stedfastly stuck at zero.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13 18:49   ` Russell King (Oracle)
@ 2026-04-13 20:50     ` Jakub Kicinski
  2026-04-13 20:53       ` Russell King (Oracle)
  2026-04-13 21:54     ` Sam Edwards
  1 sibling, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-04-13 20:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
	Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
	Sam Edwards

On Mon, 13 Apr 2026 19:49:46 +0100 Russell King (Oracle) wrote:
> > Firing interrupts when Rx fill ring runs dry (which IIUC this patches
> > dies?) is not a good idea.  
> 
> Well, I'm thinking that at least on some platforms, such as the Jetson
> Xavier NX, unless a different solution can be found, we need the RBU
> interrupt to fire off a reset of the stmmac IP when this happens to
> reduce the PAUSE frame flood on the network.
> 
> If we can't do that, then I think stmmac on these platforms needs to be
> marked with CONFIG_BROKEN because right now there doesn't seem to be any
> other viable solution.
> 
> My intention with this patch is merely to start collecting the already
> existing statistics so other users can start seeing whether they are
> hitting the same or similar problem. If we're not prepared to do that,
> then we should delete the useless statistics from ethtool -S, but I
> suspect they're now part of the UAPI, even though without this patch
> they will remain stedfastly stuck at zero.

Understood, thanks for the extra context. And the statistic we are
talking about is rx_buf_unav_irq ?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13 20:50     ` Jakub Kicinski
@ 2026-04-13 20:53       ` Russell King (Oracle)
  0 siblings, 0 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13 20:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
	Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
	Sam Edwards

On Mon, Apr 13, 2026 at 01:50:18PM -0700, Jakub Kicinski wrote:
> On Mon, 13 Apr 2026 19:49:46 +0100 Russell King (Oracle) wrote:
> > > Firing interrupts when Rx fill ring runs dry (which IIUC this patches
> > > dies?) is not a good idea.  
> > 
> > Well, I'm thinking that at least on some platforms, such as the Jetson
> > Xavier NX, unless a different solution can be found, we need the RBU
> > interrupt to fire off a reset of the stmmac IP when this happens to
> > reduce the PAUSE frame flood on the network.
> > 
> > If we can't do that, then I think stmmac on these platforms needs to be
> > marked with CONFIG_BROKEN because right now there doesn't seem to be any
> > other viable solution.
> > 
> > My intention with this patch is merely to start collecting the already
> > existing statistics so other users can start seeing whether they are
> > hitting the same or similar problem. If we're not prepared to do that,
> > then we should delete the useless statistics from ethtool -S, but I
> > suspect they're now part of the UAPI, even though without this patch
> > they will remain stedfastly stuck at zero.
> 
> Understood, thanks for the extra context. And the statistic we are
> talking about is rx_buf_unav_irq ?

Yes, correct.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13 18:49   ` Russell King (Oracle)
  2026-04-13 20:50     ` Jakub Kicinski
@ 2026-04-13 21:54     ` Sam Edwards
  2026-04-14 14:13       ` Russell King (Oracle)
  1 sibling, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-13 21:54 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Mon, Apr 13, 2026, 11:49 Russell King (Oracle) <linux@armlinux.org.uk> wrote:
>
> On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> > On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > > Since we are seeing receive buffer exhaustion on several platforms,
> > > let's enable the interrupts so the statistics we publish via ethtool -S
> > > actually work to aid diagnosis. I've been in two minds about whether
> > > to send this patch, but given the problems with stmmac at the moment,
> > > I think it should be merged.
> >
> > Sorry for a under-research response but wasn't there are person trying
> > to fix the OOM starvation issue? Who was supposed to add a timer?
> > Is your problem also OOM related or do you suspect something else?
>
> It is not OOM related. I have this patch applied:
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 131ea887bedc..614d0e10e3e6 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
>
>                 if (!buf->page) {
>                         buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> -                       if (!buf->page)
> +                       if (!buf->page) {
> +                               netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
>                                 break;
> +                       }
>                 }
>
>                 if (priv->sph_active && !buf->sec_page) {
>                         buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> -                       if (!buf->sec_page)
> +                       if (!buf->sec_page) {
> +                               netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
>                                 break;
> +                       }
>
>                         buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
>                 }
>
> and it is silent, so we are not suffering starvation of buffers.
>
> However, the hardware hangs during iperf3, and because it triggers the
> MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
> unmanaged switches that always use flow-control between them (there's no
> way not to) it takes down the entire network - as we've discussed
> before. So, this problem is pretty fatal to the *entire* network.
>
> With this patch, the existing statistical counters for this condition
> are incremented, and thus users can use ethtool -S to see what happened
> and report whether they are seeing the same issue.
>
> Without this patch applied, there are no diagnostics from stmmac that
> report what the state is. ethtool -d doesn't list the appropriate
> registers (as I suspect part of the problem is the number of queues
> is somewhat dynamic - userspace can change that configuration through
> ethtool).
>
> Thus, one has to resort to using devmem2 to find out what's happened.
> That's not user friendly.
>
> For me, devmem2 shows:
>
> Channel 0 status register:
> Value at address 0x02491160: 0x00000484
> bit 10: ETI early transmit interrupt - set
> bit 9 : RWT receive watchdog - clear
> bit 8 : RPS receieve process stopped - clear
> bit 7 : RBU receive buffer unavailable - set
> bit 6 : RI  receive interrupt - clear
> bit 2 : TBU transmit buffer unavailable - set
> bit 1 : TPS transmit process stopped - clear
> bit 0 : TI  transmit interrupt - clear
>
> Debug status register:
> Value at address 0x0249100c: 0x00006300
> TPS[3:0] = 6 = Suspended, Tx descriptor unavailable or Tx buffer
>                 underflow
> RPS[3:0] = 3 = Running, waiting for Rx packet
>
> Metal Queue 0 debug register:
> Value at address 0x02490d38: 0x002e0020
> PRXQ[13:0] = 0x2e = 46 packets in receive queue
> RXQSTS[1:0] = 2 = Rx queue fill-level above flow-control activate
>                 threshold
> RRCSTS[1:0] = 0 = Rx Queue Read Controller State = Idle
>
> > Firing interrupts when Rx fill ring runs dry (which IIUC this patches
> > dies?) is not a good idea.
>
> Well, I'm thinking that at least on some platforms, such as the Jetson
> Xavier NX, unless a different solution can be found, we need the RBU
> interrupt to fire off a reset of the stmmac IP when this happens to
> reduce the PAUSE frame flood on the network.

Hi Russell,

Should that reset trigger be RPS, not RBU? My understanding of these
status bits is RBU is just "RxDMA has failed to take a frame from the
RxFIFO" while RPS is "the RxFIFO is full." That would make RBU our
critical threshold to start proactively refilling, and RPS the "too
late, we lose" threshold.

Thinking aloud: Do you suppose the RxDMA waits for a wakeup signal
sent whenever a frame is added to RxFIFO? That might explain why the
former never recovers once the latter is full: a manual wakeup needs
to be sent whenever we resolve RBU. Does the .enable_dma_reception()
op need to be implemented for dwmac5, or have you tried that already?

>
> If we can't do that, then I think stmmac on these platforms needs to be
> marked with CONFIG_BROKEN because right now there doesn't seem to be any
> other viable solution.
>
> My intention with this patch is merely to start collecting the already
> existing statistics so other users can start seeing whether they are
> hitting the same or similar problem. If we're not prepared to do that,
> then we should delete the useless statistics from ethtool -S, but I
> suspect they're now part of the UAPI, even though without this patch
> they will remain stedfastly stuck at zero.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-13 21:54     ` Sam Edwards
@ 2026-04-14 14:13       ` Russell King (Oracle)
  2026-04-15  1:19         ` Russell King (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-14 14:13 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

Hi Sam,

Most of this email was written this morning, but I didn't have a chance
to finish nor send it due to how busy I am.

I had also written a separate reply last night with detailed results of
what I was seeing but didn't/haven't got around to sending it. Not
currently sure whether I saved it as draft or got rid of it yet.

On Mon, Apr 13, 2026 at 02:54:30PM -0700, Sam Edwards wrote:
> On Mon, Apr 13, 2026, 11:49 Russell King (Oracle) <linux@armlinux.org.uk> wrote:
> >
> > On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> > > On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > > > Since we are seeing receive buffer exhaustion on several platforms,
> > > > let's enable the interrupts so the statistics we publish via ethtool -S
> > > > actually work to aid diagnosis. I've been in two minds about whether
> > > > to send this patch, but given the problems with stmmac at the moment,
> > > > I think it should be merged.
> > >
> > > Sorry for a under-research response but wasn't there are person trying
> > > to fix the OOM starvation issue? Who was supposed to add a timer?
> > > Is your problem also OOM related or do you suspect something else?
> >
> > It is not OOM related. I have this patch applied:
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 131ea887bedc..614d0e10e3e6 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
> >
> >                 if (!buf->page) {
> >                         buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> > -                       if (!buf->page)
> > +                       if (!buf->page) {
> > +                               netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
> >                                 break;
> > +                       }
> >                 }
> >
> >                 if (priv->sph_active && !buf->sec_page) {
> >                         buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> > -                       if (!buf->sec_page)
> > +                       if (!buf->sec_page) {
> > +                               netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
> >                                 break;
> > +                       }
> >
> >                         buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
> >                 }
> >
> > and it is silent, so we are not suffering starvation of buffers.
> >
> > However, the hardware hangs during iperf3, and because it triggers the
> > MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
> > unmanaged switches that always use flow-control between them (there's no
> > way not to) it takes down the entire network - as we've discussed
> > before. So, this problem is pretty fatal to the *entire* network.
> >
> > With this patch, the existing statistical counters for this condition
> > are incremented, and thus users can use ethtool -S to see what happened
> > and report whether they are seeing the same issue.
> >
> > Without this patch applied, there are no diagnostics from stmmac that
> > report what the state is. ethtool -d doesn't list the appropriate
> > registers (as I suspect part of the problem is the number of queues
> > is somewhat dynamic - userspace can change that configuration through
> > ethtool).
> >
> > Thus, one has to resort to using devmem2 to find out what's happened.
> > That's not user friendly.
> >
> > For me, devmem2 shows:
> >
> > Channel 0 status register:
> > Value at address 0x02491160: 0x00000484
> > bit 10: ETI early transmit interrupt - set
> > bit 9 : RWT receive watchdog - clear
> > bit 8 : RPS receieve process stopped - clear
> > bit 7 : RBU receive buffer unavailable - set
> > bit 6 : RI  receive interrupt - clear
> > bit 2 : TBU transmit buffer unavailable - set
> > bit 1 : TPS transmit process stopped - clear
> > bit 0 : TI  transmit interrupt - clear
> 
> Should that reset trigger be RPS, not RBU? My understanding of these
> status bits is RBU is just "RxDMA has failed to take a frame from the
> RxFIFO" while RPS is "the RxFIFO is full." That would make RBU our
> critical threshold to start proactively refilling, and RPS the "too
> late, we lose" threshold.

That's a fine theory, but look at the channel 0 status register above,
noting that any interrupts that are raised but not enabled remain set.
RPS is not set, so RPS is not being raised, only RBU when this
condition occurs.

> Thinking aloud: Do you suppose the RxDMA waits for a wakeup signal
> sent whenever a frame is added to RxFIFO? That might explain why the
> former never recovers once the latter is full: a manual wakeup needs
> to be sent whenever we resolve RBU. Does the .enable_dma_reception()
> op need to be implemented for dwmac5, or have you tried that already?

I've not found anything in the closest documentation I have. The Xavier
is Synopsys IP v5.0, whereas i.MX8M is v5.1 - and v5.1 compared to
previous versions reads the same for statements concerning recovering
from a RBU condition:

"In ring mode, the application should advance the Receive Descriptor
Tail Pointer register of a channel. This bit is set only when the DMA
owns the previous Rx descriptor."

I've tried expanding what happens when RBU fires, dumping some of the
receive state and the receive ring:

[   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245 last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64

cur_rx == dirty_rx _should_ mean that we fully refilled the ring. These
are their values at the point the RBU interrupt fires.

last_cur_rx and last_dirty_rx are the values of cur_rx/dirty_rx when
stmmac_rx() was last entered.

last_cur_rx_post is the value of cur_rx when stmmac_rx() finished
looping but before we have refilled the ring.

count is the value of count just before stmmac_rx() returns, budget is
the limit at that point.

The patch that prints errors should we fail to allocate a buffer is in
place, none of those errors fire, so we are fully repopulating the ring
each time stmmac_rx() runs.

[   55.766785] RX descriptor ring:
[   55.766802] 000 [0x0000007fffffe000]: 0x0 0x12 0x0 0x340105ee
[   55.766826] 001 [0x0000007fffffe010]: 0x0 0x12 0x0 0x340105ee
[   55.766843] 002 [0x0000007fffffe020]: 0x0 0x12 0x0 0x340105ee
[   55.766860] 003 [0x0000007fffffe030]: 0x0 0x12 0x0 0x340105ee
...
[   55.772205] 308 [0x0000007ffffff340]: 0x0 0x12 0x0 0x340105ee
[   55.772221] 309 [0x0000007ffffff350]: 0x0 0x12 0x0 0x340105ee
[   55.772237] 310 [0x0000007ffffff360]: 0x0 0x12 0x0 0x340105ee
[   55.772253] 311 [0x0000007ffffff370]: 0x0 0x12 0x0 0x340105ee
[   55.772268] 312 [0x0000007ffffff380]: 0x0 0x12 0x0 0x340105ee
[   55.772284] 313 [0x0000007ffffff390]: 0x0 0x12 0x0 0x340105ee
[   55.772300] 314 [0x0000007ffffff3a0]: 0x0 0x12 0x0 0x340105ee
[   55.772315] 315 [0x0000007ffffff3b0]: 0x0 0x12 0x0 0x340105ee
...
[   55.775539] 511 [0x0000007ffffffff0]: 0x0 0x12 0x0 0x340105ee

Every ring entry contains the same RDES3 value, so it really is
completely full at the point RBU fires (bit 31 clear means software
owns the descriptor, and it's basically saying first/last segment,
RDES1 valid, buffer 1 length of 1518.

The Rx tail pointer register contains 0xfffff3a0 which is entry 314.
The current receive descriptor address is also 0xfffff3a0. Note that
these values were obtained some time after the RBU interrupt fired
(due to the time taken for devmem2 to access every stmmac register -
I have a script that dumps the entire stmmac register state via
devmem2.)

The other thing to note is that when looking at debugfs
stmmaceth/eth0/descriptor* (or whatever it's called, I don't have the
NX powered to look at the moment, and I didn't take a copy of it last
night) all tne descriptor entries are fully repopulated with buffers
and owned by the hardware.

I've tried using devmem2 to write to the rx tail pointer to kick it
back into action, but that changes nothing. I've tried writing the
next descriptor value and previous descriptor value, but that appears
to have no effect, it stedfastly remains stuck - and as that is the
documented recovery from RBU and there's no "receive demand" register
listed in dwmac v4 or v5 documentation, there seems to be no other
documented way.

The debug registers that I provided in my previous email suggest that
the MAC is waiting for a packet, and MTL's descriptor reader is idle
(I'm guessing it would only briefly change when the tail pointer is
updated.)

Note that I have augmented the driver with more dma_rmb() + dma_wmb()
in stmmac_rx(), dwmac4_wrback_get_rx_status(), and stmmac_rx_refill()
to ensure that reads and writes to the descriptor ring are correctly
ordered. While this generally allows iperf3 to run for a few more
seconds, it doesn't solve the problem - it is very rare for iperf3
to actually complete before stmmac has taken down my entire network.

I have noticed that on some occasions I see a small number of RBU
interrupts before it falls over.

I'm not going to have much time to look at this today due to further
appointments (I also didn't yesterday - only an hour in the morning
and a bit more time late in the evening/night.) I should have more
time during the rest of the week... but that may change.

From the above, it looks like NAPI/stmmac driver isn't keeping up with
the packet flow coming from an i.MX6 platform (which is limited to
around 470Mbps due to internal SoC bus limitations.)

I'll also mention that stmmac falls apart even more if I run iperf3 -c
-R against an x86 machine that is capable of saturating the network,
so much so that the arm-smmu IOMMU throws errors even after the stmmac
hardware has been soft-reset for addresses that were in the ring
*prior* to the soft-reset occuring (stmmac is soft-reset each time the
netdev is brought up.) The only recovery from that is to reboot -
down/up the interface just spews more IOMMU errors. I don't have the
details of that to hand and I don't have enough time to re-run that
test this morning. From what I remember, the transmit side also stops
processing descriptors (one can see them accumulate in the debugfs
file,) which eventually leads to the netdev watchdog firing.

It currently looks like the stmmac v5 EQoS IP works fine only under
light packet loads. If one puts any stress on it, then the hardware
totally falls apart. This may point to an issue with the AXI bus
configuration that is specific to this platform, but that requires
further investigation.

I'll mention again, in case anyone's forgotten, that these problems
pre-date any of my cleanups I've made to stmmac. From what I remember
they are reproducible with the kernels that are supplied as part of
the nVidia BSP. Again, as I don't have access to the nVidia platform
at the moment, I can't include the details in this email.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-14 14:13       ` Russell King (Oracle)
@ 2026-04-15  1:19         ` Russell King (Oracle)
  2026-04-15  2:12           ` Sam Edwards
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-15  1:19 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
survives iperf3 -c -R to the imx6.

Dumping the registers and comparing, and then forcing the RQS and TQS
values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
*256 = 36864 ytes) respectively seems to solve the problem. Under
net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
all four of the MTL receive operation mode registers, but only the
first MTL transmit operation mode register. However, DMA channels 1-3
aren't initialised.

net-next derives them from:

        unsigned int tqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_tx_chan_op_mode() and

        unsigned int rqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_rx_chan_op_mode().

Now, according to the DMA capabilities:

        Number of Additional RX channel: 4
        Number of Additional TX channel: 4
        Number of Additional RX queues: 4
        Number of Additional TX queues: 4
        TX Fifo Size: 65536
        RX Fifo Size: 65536

However:

# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             4
TX:             4
Other:          0
Combined:       0
Current hardware settings:
RX:             1
TX:             1
Other:          0
Combined:       0

So, we end up allocating the entire 64K of the tx and rx FIFO to one
queue in net-next.

Looking back at 5.10, I don't see any code that would account for these
values being programmed for TQS and RQS, it looks like the calculations
are basically the same as we have today.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-15  1:19         ` Russell King (Oracle)
@ 2026-04-15  2:12           ` Sam Edwards
  2026-04-15 12:43             ` Russell King (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-15  2:12 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> survives iperf3 -c -R to the imx6.

Hi Russell,

Aw, you beat me to it! I was about to report that 5.10.104-tegra is
unaffected. And my iperf3 server is a multi-GbE amd64 machine.

> Dumping the registers and comparing, and then forcing the RQS and TQS
> values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> *256 = 36864 ytes) respectively seems to solve the problem. Under
> net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> all four of the MTL receive operation mode registers, but only the
> first MTL transmit operation mode register. However, DMA channels 1-3
> aren't initialised.

Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
than that, so when the DMA suffers a momentary hiccup, the FIFOs are
allowed to overflow, putting the hardware in a bad state.

Though I suspect this is only half of the problem: do you still see
RBUs? Everything you've shared so far suggests the DMA failures are
_not_ because the rx ring is drying up. My gut's telling me the DMA
unit is encountering an AXI error, triggering RBU plus some kind of
recovery behavior, and the recovery takes the DMA offline long enough
for the FIFO to overflow (without triggering RPS because the RQS
threshold is unreachable).

It seems that the problem happens less frequently on my test setup
when I boot with iommu.passthrough=1 but that could be my imagination.
But if the hardware remains stable with RQS and TQS set correctly, I
don't feel an urgent need to dig deeper. :)

> Looking back at 5.10, I don't see any code that would account for these
> values being programmed for TQS and RQS, it looks like the calculations
> are basically the same as we have today.

Note that Nvidia have their own "nvethernet" driver for their vendor
kernel, which appears to pick the FIFO sizes from hardcoded tables in
its eqos_configure_mtl_queue() [1] function.

Cheers,
Sam

[1] https://github.com/proski/nvethernet/blob/main/nvethernetrm/osi/core/eqos_core.c#L263

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-15  2:12           ` Sam Edwards
@ 2026-04-15 12:43             ` Russell King (Oracle)
  2026-04-15 17:38               ` Sam Edwards
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-15 12:43 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > survives iperf3 -c -R to the imx6.
> 
> Hi Russell,
> 
> Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> 
> > Dumping the registers and comparing, and then forcing the RQS and TQS
> > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > all four of the MTL receive operation mode registers, but only the
> > first MTL transmit operation mode register. However, DMA channels 1-3
> > aren't initialised.
> 
> Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> allowed to overflow, putting the hardware in a bad state.
> 
> Though I suspect this is only half of the problem: do you still see
> RBUs? Everything you've shared so far suggests the DMA failures are
> _not_ because the rx ring is drying up.

Yes. Note that RBUs will happen not because of DMA failures, but if
the kernel fails to keep up with the packet rate. RBU means "we read
the next descriptor, and it wasn't owned by hardware".

> > Looking back at 5.10, I don't see any code that would account for these
> > values being programmed for TQS and RQS, it looks like the calculations
> > are basically the same as we have today.
> 
> Note that Nvidia have their own "nvethernet" driver for their vendor
> kernel, which appears to pick the FIFO sizes from hardcoded tables in
> its eqos_configure_mtl_queue() [1] function.

That has:

	const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
		{ FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
		  FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
		{ FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
		  FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
	};
	const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
		{ FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
		  FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
		{ FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
		  FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
	};

where each of those values is the RQS/TQS value to use in KiB:

#define FIFO_SZ(x)		((((x) * 1024U) / 256U) - 1U)

This doesn't correspond with the values I'm seeing programmed into
the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
(36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
above from a quick look, but they're not in the right place!

For example, tx_fifo_sz[] doesn't contain an entry for 36KiB.
rx_fifo_sz[0][0..3] looks plausible.

It's certainly not a case of misreading the register values, this is
what devmem2 said:

Value at address 0x02490d00: 0x008f000a
Value at address 0x02490d30: 0x02379eb0

where TQS is bits 24:16 of the register at offset 0xd00 - which is
0x8f, and RQS is bits 29:20 of the register at 0xd30, which is
0x23.

Now, as for FIFO sizes, if we sum up all the entries, then we
get:

SUM(rx_fifo_size[0][]) = 60KiB
SUM(rx_fifo_size[1][]) = 64KiB
SUM(tx_fifo_size[0][]) = 60KiB
SUM(tx_fifo_size[1][]) = 64KiB

From what I gather in core_local.h, l_mac_ver contains one of three
values - 0 = Legacy EQOS, 1 = Orin EQOS, 2 = Orin MGBE, and which
set of values is selected by bit 0 of that. Decoding this further,
Legacy EQOS is IP version v5.0, Orin EQOS is v5.3, and Orin MGBE
is v3.1 and v4.0.

So, I wonder whether there's something in "Legacy EQOS" that consumes
4KiB of FIFO that isn't documented in iMX8M (IP v5.1).

Is anyone aware of public SoC documentation that covers the v5.0 IP
version?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-15 12:43             ` Russell King (Oracle)
@ 2026-04-15 17:38               ` Sam Edwards
  2026-04-15 19:37                 ` Russell King (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-15 17:38 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > > survives iperf3 -c -R to the imx6.
> >
> > Hi Russell,
> >
> > Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> > unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> >
> > > Dumping the registers and comparing, and then forcing the RQS and TQS
> > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > > all four of the MTL receive operation mode registers, but only the
> > > first MTL transmit operation mode register. However, DMA channels 1-3
> > > aren't initialised.
> >
> > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> > than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> > allowed to overflow, putting the hardware in a bad state.
> >
> > Though I suspect this is only half of the problem: do you still see
> > RBUs? Everything you've shared so far suggests the DMA failures are
> > _not_ because the rx ring is drying up.
>
> Yes. Note that RBUs will happen not because of DMA failures, but if
> the kernel fails to keep up with the packet rate. RBU means "we read
> the next descriptor, and it wasn't owned by hardware".

Are you speaking from observation, documentation, or understanding?
I'd define RBU the same way, but you reported:

```
[   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer
unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245
last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64

cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...]
[...]
Every ring entry contains the same RDES3 value, so it really is
completely full at the point RBU fires (bit 31 clear means software
owns the descriptor, and it's basically saying first/last segment,
RDES1 valid, buffer 1 length of 1518.
```

It would seem* that the kernel isn't really failing to keep up with
the packet rate. If RBU is firing with a ring that's not even close to
empty, that tells me there's another way for it to fire. So I suspect
the hardware designers implemented it to mean:
"We couldn't read the next descriptor, _or_ it wasn't owned by hardware."

(* However, if bit 31 is clear everywhere, wouldn't that mean the ring
is actually completely depleted, not full? If count==budget, wouldn't
that mean the whole ring hasn't been visited, so we only refilled 64
entries and not necessarily the entire ring? Maybe the kernel isn't
keeping up after all.)

> That has:
>
>         const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
>                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
>                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
>                 { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
>                   FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
>         };
>         const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
>                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
>                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
>                 { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
>                   FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
>         };
>
> where each of those values is the RQS/TQS value to use in KiB:
>
> #define FIFO_SZ(x)              ((((x) * 1024U) / 256U) - 1U)
>
> This doesn't correspond with the values I'm seeing programmed into
> the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> above from a quick look, but they're not in the right place!

True, but:
a) I doubt 5.10.216-tegra includes exactly the same version of the
driver found in this random GitHub mirror. (My intent was only to
point out that they don't use 5.10's stmmac; I should have been more
clear that I wasn't trying to link the same version, sorry!)
b) This is vendor code; I don't know how good their testing/review
process is. It might not run the way it looks. The intent seems to be
for RQS > TQS (which makes intuitive sense), but as you're seeing the
registers programmed the other way 'round, they might have gotten them
subtly mixed up.

> Now, as for FIFO sizes, if we sum up all the entries, then we
> get:
>
> SUM(rx_fifo_size[0][]) = 60KiB
> SUM(rx_fifo_size[1][]) = 64KiB
> SUM(tx_fifo_size[0][]) = 60KiB
> SUM(tx_fifo_size[1][]) = 64KiB

I follow the math with 64KiB, but surely the 60KiB should be
9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply
shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
perhaps the FIFO size is an elaboration parameter? That would mean
this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
instantiation of it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-15 17:38               ` Sam Edwards
@ 2026-04-15 19:37                 ` Russell King (Oracle)
  2026-04-15 20:50                   ` Sam Edwards
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-15 19:37 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Wed, Apr 15, 2026 at 10:38:29AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> > > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> > > <linux@armlinux.org.uk> wrote:
> > > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > > > survives iperf3 -c -R to the imx6.
> > >
> > > Hi Russell,
> > >
> > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> > > unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> > >
> > > > Dumping the registers and comparing, and then forcing the RQS and TQS
> > > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > > > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > > > all four of the MTL receive operation mode registers, but only the
> > > > first MTL transmit operation mode register. However, DMA channels 1-3
> > > > aren't initialised.
> > >
> > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> > > than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> > > allowed to overflow, putting the hardware in a bad state.
> > >
> > > Though I suspect this is only half of the problem: do you still see
> > > RBUs? Everything you've shared so far suggests the DMA failures are
> > > _not_ because the rx ring is drying up.
> >
> > Yes. Note that RBUs will happen not because of DMA failures, but if
> > the kernel fails to keep up with the packet rate. RBU means "we read
> > the next descriptor, and it wasn't owned by hardware".
> 
> Are you speaking from observation, documentation, or understanding?

Observation.

> I'd define RBU the same way, but you reported:

It's not a question about how I define RBU - this is defined by Synopsys
and I'm using it *exactly* that way as stated in the documentation.

"This bit indicates that the host owns the Next Descriptor in the
Receive List and the DMA cannot acquire it. The Receive Process is
suspended. ... This bit is set only when the previous Receive
Descriptor is owned by the DMA."

In other words, DMA has processed the previous receive descriptor which
_was_ owned by the hardware, written back to clear the OWN bit, and
then fetches the next descriptor and finds that the OWN bit is also
clear.

> 
> ```
> [   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer
> unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245
> last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64
> 
> cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...]
> [...]
> Every ring entry contains the same RDES3 value, so it really is
> completely full at the point RBU fires (bit 31 clear means software
> owns the descriptor, and it's basically saying first/last segment,
> RDES1 valid, buffer 1 length of 1518.
> ```

Right, because the _last_ time stmmac_rx() was called, the ring was
completely refilled (as it always is for me).

There are two scenarios that what I'm seeing may happen.

1) The ring was fully refilled, but before stmmac_rx() is next
   executed, all descriptors end up being consumed due to the rate
   at which packets are being received. Thus, the hardware encounters
   a descriptor that has OWN=0

2) The kernel has been slow to respond to packets that have been
   received, and because of the NAPI throttling stmmac_rx() to only
   process 64 descriptors at a time, we are falling way behind the
   hardware position. Eventually, the hardware catches up with
   the point at which stmmac_rx_refill() is repopulating the receive
   descriptors, and encounters a descriptor that has OWN=0.

For (2), for example, let's take the example which you've quoted from
me.

stmmac_rx() gets called, and cur_rx = dirty_rx = 245. We're limited to
a count of 64 meaning we're not going to process more than 64 entries
no matter how far ahead the hardware is. Let's say the hardware is at
e.g. descriptor 400 at this point.

stmmac_rx() runs, processing descriptors. It works its way up to entry
309, at which point count == limit, so it stops, and we now have
cur_rx = 309, dirty_rx = 245.

The next thing stmmac_rx() does is call stmmac_rx_refill(). This looks
at the difference, and calculates how many entries need to be
repopulated. stmmac_rx_dirty() returns 64, as that's the number of
entries between dirty_rx and the updated cur_rx. It populates those
entries.

At this point, dirty_rx = 309. All well and good. However, during that
process, packet reception hasn't stopped, and let's say it's now at
descriptor 500.

In that scenario, we're consuming 100 descriptors, but only repopulating
64 descriptors. As this continues, the hardware is slowly catching up
with point in the ring that stmmac_rx_refill() is repopulating the
descriptors.

When it does catch up, it will encounter a descriptor with OWN=0, which
will fire the RBU interrupt.

At this point, my debug dumps the state of the ring. If the RBU was
raised when stmmac_rx()/stmmac_rx_refill() was not running, _and_ we
are always successfully refilling all the entries that stmmac_rx()
processed, then cur_rx will equal dirty_rx, even when the hardware
could be way ahead of cur_rx. Neither of these indexes have any
relevance to where the hardware actually is in the ring.

The dump of the ring state *clearly* shows that all descriptors have
a RDES3 value which indicates that every single descriptor is not
hardware owned at this point (since RBU has been raised, the receive
process is suspended, so hardware is no longer changing the ring.)

> It would seem* that the kernel isn't really failing to keep up with
> the packet rate. If RBU is firing with a ring that's not even close to
> empty, that tells me there's another way for it to fire. So I suspect
> the hardware designers implemented it to mean:
> "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> 
> (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> is actually completely depleted, not full? If count==budget, wouldn't
> that mean the whole ring hasn't been visited, so we only refilled 64
> entries and not necessarily the entire ring? Maybe the kernel isn't
> keeping up after all.)

Ah, I think that's where our terminology differs.

You seem to define full as "populated with empty buffers". I define
full to mean "the hardware has filled every buffer with a packet that
it has received and handed it over to software to process." Note even
the terminology there - filling buffers with data. That ultimately
ends up filling the ring, and when completely filled, it is full.

I think of buffers like buckets. If a buffer contains no data, it
is empty. If a buffer contains data, it has been filled or is full.
Apply that to a list of buffers and you get the same thing. Many
ethernet driver documentation uses this same terminology, so I
thought it would be widely understood.

> > That has:
> >
> >         const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
> >                   FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
> >         };
> >         const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
> >                   FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
> >         };
> >
> > where each of those values is the RQS/TQS value to use in KiB:
> >
> > #define FIFO_SZ(x)              ((((x) * 1024U) / 256U) - 1U)
> >
> > This doesn't correspond with the values I'm seeing programmed into
> > the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> > (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> > above from a quick look, but they're not in the right place!
> 
> True, but:
> a) I doubt 5.10.216-tegra includes exactly the same version of the
> driver found in this random GitHub mirror. (My intent was only to
> point out that they don't use 5.10's stmmac; I should have been more
> clear that I wasn't trying to link the same version, sorry!)
> b) This is vendor code; I don't know how good their testing/review
> process is. It might not run the way it looks. The intent seems to be
> for RQS > TQS (which makes intuitive sense), but as you're seeing the
> registers programmed the other way 'round, they might have gotten them
> subtly mixed up.
> 
> > Now, as for FIFO sizes, if we sum up all the entries, then we
> > get:
> >
> > SUM(rx_fifo_size[0][]) = 60KiB
> > SUM(rx_fifo_size[1][]) = 64KiB
> > SUM(tx_fifo_size[0][]) = 60KiB
> > SUM(tx_fifo_size[1][]) = 64KiB
> 
> I follow the math with 64KiB, but surely the 60KiB should be
> 9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply
> shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
> perhaps the FIFO size is an elaboration parameter? That would mean
> this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
> instantiation of it.

Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
trying to do anything.

However, I've tested with 0x7f in both fields, and it still falls flat
on its face. I've also tried other values, but because I had to unplug
the laptop from the nvidia board to use the laptop portably due to the
medical emergency situation, that caused screen to quit, so I've lost
all that. Chaos reigns supreme here :/

So, I'm not sure we understand what's going on - I don't think it's that
the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
results in some kind of throttling that prevents the condition which
hangs the hardware.

I'm not getting as much time as I'd like to really test out scenarios
due to everything that is going on, and honestly I feel like just
writing this week off now and giving up.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-15 19:37                 ` Russell King (Oracle)
@ 2026-04-15 20:50                   ` Sam Edwards
  2026-04-16  0:02                     ` Russell King (Oracle)
  0 siblings, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-15 20:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> It's not a question about how I define RBU - this is defined by Synopsys
> and I'm using it *exactly* that way as stated in the documentation.
>
> "This bit indicates that the host owns the Next Descriptor in the
> Receive List and the DMA cannot acquire it. The Receive Process is
> suspended. ... This bit is set only when the previous Receive
> Descriptor is owned by the DMA."
>
> In other words, DMA has processed the previous receive descriptor which
> _was_ owned by the hardware, written back to clear the OWN bit, and
> then fetches the next descriptor and finds that the OWN bit is also
> clear.

I'm only trying to leave open the possibility that the Synopsys
technical writer and the hardware implementation team weren't
communicating clearly. We already have a situation where RPS isn't
behaving as documented (even if that's likely just hardware
misconfiguration), so while I'm currently pretty sure RBU carries no
other (actual) meaning than "DMA caught up to OWN=0," I'm only about
75% confident.

> > It would seem* that the kernel isn't really failing to keep up with
> > the packet rate. If RBU is firing with a ring that's not even close to
> > empty, that tells me there's another way for it to fire. So I suspect
> > the hardware designers implemented it to mean:
> > "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> >
> > (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> > is actually completely depleted, not full? If count==budget, wouldn't
> > that mean the whole ring hasn't been visited, so we only refilled 64
> > entries and not necessarily the entire ring? Maybe the kernel isn't
> > keeping up after all.)
>
> Ah, I think that's where our terminology differs.
>
> You seem to define full as "populated with empty buffers". I define
> full to mean "the hardware has filled every buffer with a packet that
> it has received and handed it over to software to process." Note even
> the terminology there - filling buffers with data. That ultimately
> ends up filling the ring, and when completely filled, it is full.
>
> I think of buffers like buckets. If a buffer contains no data, it
> is empty. If a buffer contains data, it has been filled or is full.
> Apply that to a list of buffers and you get the same thing. Many
> ethernet driver documentation uses this same terminology, so I
> thought it would be widely understood.

Ah okay, I was beginning to suspect the same. In my defense: though I
also think of buffers in the same way, this driver calls the process
of supplying empty buffers "refilling," which is also the terminology
we've both been using throughout this exchange, and when something is
"completely refilled" I generally call it "full." But I'm realizing
now that the bidirectional (submissions+completions) nature of this
ring means that "full" and "empty" aren't really well-defined
concepts. I'll try to read more carefully (and switch to saying
"completely dirty" and "completely clean") going forward.

So the kernel is able to supply clean buffers without issue, but it
somehow falls behind the incoming packet rate and the DMA is left with
a completely dirty ring. I agree that stmmac_rx() is therefore just
not running fast enough: either it's got really bad scheduler jitter
for the ~6.3ms minimum it takes for 512x full-sized Ethernet frames to
arrive from the PHY (your scenario 1), or -- more likely -- the NAPI
budgets gradually fall behind the hardware (your scenario 2).

> Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
> trying to do anything.
>
> However, I've tested with 0x7f in both fields, and it still falls flat
> on its face. I've also tried other values, but because I had to unplug
> the laptop from the nvidia board to use the laptop portably due to the
> medical emergency situation, that caused screen to quit, so I've lost
> all that. Chaos reigns supreme here :/

I'm sorry to hear about that, please prioritize you/yours and don't
feel like you owe me speedy replies.

> So, I'm not sure we understand what's going on - I don't think it's that
> the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
> results in some kind of throttling that prevents the condition which
> hangs the hardware.

I'll try playing with the FIFO configuration on my end to learn:
a) If a suitably-configured FIFO size makes the RPS status arrive as documented
b) If I can safely fill the FIFO slowly (by manually stalling the
driver and adding frames one at a time) and have it drain on resume
c) Whether the TQS value can be adjusted independently of this
problem's prevalence
d) The maximum RQS value that allows the problem to happen

> I'm not getting as much time as I'd like to really test out scenarios
> due to everything that is going on, and honestly I feel like just
> writing this week off now and giving up.

I have the same hardware, observe the same issue, and find this
interesting enough to keep plugging away at it. I would have no hard
feelings if you left me alone with this problem for a bit. :)

Be well,
Sam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-15 20:50                   ` Sam Edwards
@ 2026-04-16  0:02                     ` Russell King (Oracle)
  0 siblings, 0 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-16  0:02 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Wed, Apr 15, 2026 at 01:50:53PM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > It's not a question about how I define RBU - this is defined by Synopsys
> > and I'm using it *exactly* that way as stated in the documentation.
> >
> > "This bit indicates that the host owns the Next Descriptor in the
> > Receive List and the DMA cannot acquire it. The Receive Process is
> > suspended. ... This bit is set only when the previous Receive
> > Descriptor is owned by the DMA."
> >
> > In other words, DMA has processed the previous receive descriptor which
> > _was_ owned by the hardware, written back to clear the OWN bit, and
> > then fetches the next descriptor and finds that the OWN bit is also
> > clear.
> 
> I'm only trying to leave open the possibility that the Synopsys
> technical writer and the hardware implementation team weren't
> communicating clearly. We already have a situation where RPS isn't
> behaving as documented (even if that's likely just hardware
> misconfiguration), so while I'm currently pretty sure RBU carries no
> other (actual) meaning than "DMA caught up to OWN=0," I'm only about
> 75% confident.

It doesn't make sense for RPS to be set though. RPS is "Receive Process
Stopped" and it's documented as being raised when the receive process
enters the stopped state.

If we look at the DMA Debug Status 0 register at 0x100c, then this
gives us a four bit bitfield for channels 0, 1 and 2. Further channels
are in 0x1010. I've added code to dump these when RBU occurs:

dwc-eth-dwmac 2490000.ethernet eth0: debug status: 0x00006400 0x00000000

bits 11:8 are RPS0, which indiciates that the DMA channel 0 receive
process state is "Suspended (Rx Descriptor Unavailable)". If this were
0, then it would be "Stopped (Reset or Stop Receive Command issued)".

So, RPS isn't being raised because the process state isn't entering
the stopped state, which makes sense - because we haven't issued a
stop command, nor have we caused a reset, and the documented recovery
from this condition is to merely advance the tail pointer, rather than
issuing a command to re-start the receive process.

When this is done (because stmmac_rx() continues to periodically run
because of NAPI) RPS0 does change back to 3 "Running (Waiting for Rx
packet)" but it seems that although there are packets waiting to be
written out, that never happens (the Queue 0 Receive Debug register
indicates that there are packets in the receive queue, the receive
queue fill level is above the flow control activate threshold, and
the MAC itself hammers the network with pause frames as a result.)

Thus, I think that the fact that RPS isn't being signalled is entirely
reasonable and consistent with the available documentation.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
  2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
  2026-04-12 14:01 ` Maxime Chevallier
  2026-04-13 18:02 ` Jakub Kicinski
@ 2026-04-13 22:00 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-04-13 22:00 UTC (permalink / raw)
  To: Russell King
  Cc: andrew, alexandre.torgue, andrew+netdev, davem, edumazet, kuba,
	linux-arm-kernel, linux-stm32, netdev, pabeni, cfsworks

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 14:07:51 +0100 you wrote:
> Enable receive process stopped and receive buffer unavailable
> interrupts, so that the statistic counters can be updated.
> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Since we are seeing receive buffer exhaustion on several platforms,
> let's enable the interrupts so the statistics we publish via ethtool -S
> actually work to aid diagnosis. I've been in two minds about whether
> to send this patch, but given the problems with stmmac at the moment,
> I think it should be merged.
> 
> [...]

Here is the summary with links:
  - [net-next] net: stmmac: enable RPS and RBU interrupts
    https://git.kernel.org/netdev/net-next/c/1b9707e6f1a9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-04-16  0:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
2026-04-12 14:01 ` Maxime Chevallier
2026-04-12 14:23   ` Russell King (Oracle)
2026-04-13  1:42     ` Sam Edwards
2026-04-13  7:24       ` Russell King (Oracle)
2026-04-13  7:28         ` Russell King (Oracle)
2026-04-13 18:02 ` Jakub Kicinski
2026-04-13 18:49   ` Russell King (Oracle)
2026-04-13 20:50     ` Jakub Kicinski
2026-04-13 20:53       ` Russell King (Oracle)
2026-04-13 21:54     ` Sam Edwards
2026-04-14 14:13       ` Russell King (Oracle)
2026-04-15  1:19         ` Russell King (Oracle)
2026-04-15  2:12           ` Sam Edwards
2026-04-15 12:43             ` Russell King (Oracle)
2026-04-15 17:38               ` Sam Edwards
2026-04-15 19:37                 ` Russell King (Oracle)
2026-04-15 20:50                   ` Sam Edwards
2026-04-16  0:02                     ` Russell King (Oracle)
2026-04-13 22:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox