* [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
@ 2026-04-10 13:07 Russell King (Oracle)
2026-04-12 14:01 ` Maxime Chevallier
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-10 13:07 UTC (permalink / raw)
To: Andrew Lunn
Cc: Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, linux-arm-kernel, linux-stm32, netdev,
Paolo Abeni, Sam Edwards
Enable receive process stopped and receive buffer unavailable
interrupts, so that the statistic counters can be updated.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Since we are seeing receive buffer exhaustion on several platforms,
let's enable the interrupts so the statistics we publish via ethtool -S
actually work to aid diagnosis. I've been in two minds about whether
to send this patch, but given the problems with stmmac at the moment,
I think it should be merged.
drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
index af6580332d49..43b036d4e95b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
@@ -99,6 +99,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
 #define DMA_CHAN_INTR_ENA_NIE_4_10      BIT(15)
 #define DMA_CHAN_INTR_ENA_AIE_4_10      BIT(14)
 #define DMA_CHAN_INTR_ENA_FBE           BIT(12)
+#define DMA_CHAN_INTR_ENA_RPS           BIT(8)
+#define DMA_CHAN_INTR_ENA_RBU           BIT(7)
 #define DMA_CHAN_INTR_ENA_RIE           BIT(6)
 #define DMA_CHAN_INTR_ENA_TIE           BIT(0)
 
@@ -107,6 +109,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
                                          DMA_CHAN_INTR_ENA_TIE)
 
 #define DMA_CHAN_INTR_ABNORMAL          (DMA_CHAN_INTR_ENA_AIE | \
+                                         DMA_CHAN_INTR_ENA_RPS | \
+                                         DMA_CHAN_INTR_ENA_RBU | \
                                          DMA_CHAN_INTR_ENA_FBE)
 /* DMA default interrupt mask for 4.00 */
 #define DMA_CHAN_INTR_DEFAULT_MASK      (DMA_CHAN_INTR_NORMAL | \
@@ -117,6 +121,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
                                          DMA_CHAN_INTR_ENA_TIE)
 
 #define DMA_CHAN_INTR_ABNORMAL_4_10     (DMA_CHAN_INTR_ENA_AIE_4_10 | \
+                                         DMA_CHAN_INTR_ENA_RPS | \
+                                         DMA_CHAN_INTR_ENA_RBU | \
                                          DMA_CHAN_INTR_ENA_FBE)
 /* DMA default interrupt mask for 4.10a */
 #define DMA_CHAN_INTR_DEFAULT_MASK_4_10 (DMA_CHAN_INTR_NORMAL_4_10 | \
--
2.47.3
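
Editorial sketch (not part of the patch): the effect of the change on the 4.10a abnormal-interrupt enable mask can be worked out from the bit positions in the hunks above. The helper names abnormal_4_10_before() and abnormal_4_10_after() are hypothetical and exist only for this illustration; the BIT() macro is redefined locally so the snippet builds outside the kernel.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (UINT32_C(1) << (n))

/* Bit positions copied from the dwmac4_dma.h hunks in the patch. */
#define DMA_CHAN_INTR_ENA_AIE_4_10 BIT(14)
#define DMA_CHAN_INTR_ENA_FBE      BIT(12)
#define DMA_CHAN_INTR_ENA_RPS      BIT(8)
#define DMA_CHAN_INTR_ENA_RBU      BIT(7)

/* Hypothetical helper: DMA_CHAN_INTR_ABNORMAL_4_10 as it was before the patch. */
static inline uint32_t abnormal_4_10_before(void)
{
    return DMA_CHAN_INTR_ENA_AIE_4_10 | DMA_CHAN_INTR_ENA_FBE;
}

/* Hypothetical helper: the same mask with RPS and RBU added by the patch. */
static inline uint32_t abnormal_4_10_after(void)
{
    return DMA_CHAN_INTR_ENA_AIE_4_10 | DMA_CHAN_INTR_ENA_RPS |
           DMA_CHAN_INTR_ENA_RBU | DMA_CHAN_INTR_ENA_FBE;
}
```

Under these definitions the mask goes from 0x5000 to 0x5180, so the only difference is the two new enable bits (0x180).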
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
@ 2026-04-12 14:01 ` Maxime Chevallier
2026-04-12 14:23 ` Russell King (Oracle)
2026-04-13 18:02 ` Jakub Kicinski
2026-04-13 22:00 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 20+ messages in thread
From: Maxime Chevallier @ 2026-04-12 14:01 UTC (permalink / raw)
To: Russell King (Oracle), Andrew Lunn
Cc: Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, linux-arm-kernel, linux-stm32, netdev,
Paolo Abeni, Sam Edwards

Hi Russell,

On 10/04/2026 15:07, Russell King (Oracle) wrote:
> Enable receive process stopped and receive buffer unavailable
> interrupts, so that the statistic counters can be updated.
>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Since we are seeing receive buffer exhaustion on several platforms,
> let's enable the interrupts so the statistics we publish via ethtool -S
> actually work to aid diagnosis. I've been in two minds about whether
> to send this patch, but given the problems with stmmac at the moment,
> I think it should be merged.

Looks like my reply to your original RFC was lost in limbo as the
review/test tags are missing. Here's my original answer:

It works, I can indeed see the stats get properly updated on imx8mp 🙂

There's one downside to it though, which is that as soon as we hit a
situation where we don't have RX bufs available, this patch has a
tendency to make things worse as we'll trigger interrupts for each
packet we receive and that we can't process, making it even longer for
queues to be refilled.

It shows on iperf3 with small packets:

---- Before patch, 17% packet loss on UDP 56 bytes packets -----------------
# iperf3 -u -b 0 -l 56 -c 192.168.2.1 -R
Connecting to host 192.168.2.1, port 5201
Reverse mode, remote host 192.168.2.1 is sending
[  5] local 192.168.2.18 port 47851 connected to 192.168.2.1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  10.7 MBytes  90.0 Mbits/sec  0.003 ms  48550/249650 (19%)
[  5]   1.00-2.00   sec  11.3 MBytes  95.0 Mbits/sec  0.003 ms  41881/253832 (16%)
[  5]   2.00-3.00   sec  11.3 MBytes  94.9 Mbits/sec  0.002 ms  42060/253913 (17%)
[  5]   3.00-4.00   sec  11.3 MBytes  95.1 Mbits/sec  0.003 ms  41499/253785 (16%)
[  5]   4.00-5.00   sec  11.3 MBytes  94.6 Mbits/sec  0.003 ms  42663/253787 (17%)
[  5]   5.00-6.00   sec  11.3 MBytes  94.9 Mbits/sec  0.006 ms  41976/253719 (17%)
[  5]   6.00-7.00   sec  11.3 MBytes  94.5 Mbits/sec  0.003 ms  43133/253999 (17%)
[  5]   7.00-8.00   sec  11.3 MBytes  95.0 Mbits/sec  0.004 ms  41442/253579 (16%)
[  5]   8.00-9.00   sec  11.4 MBytes  95.2 Mbits/sec  0.004 ms  41518/254131 (16%)
[  5]   9.00-10.00  sec  11.2 MBytes  94.3 Mbits/sec  0.006 ms  43580/254143 (17%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   135 MBytes   114 Mbits/sec  0.000 ms  0/0 (0%)  sender
[  5]   0.00-10.00  sec   112 MBytes  94.3 Mbits/sec  0.006 ms  428302/2534538 (17%)  receiver

iperf Done.

# ethtool -S eth1 | grep rx_buf_unav_irq
     rx_buf_unav_irq: 0

---- After patch, 22% packet loss on UDP 56 bytes packets ----------------------
# iperf3 -u -b 0 -l 56 -c 192.168.2.1 -R
Connecting to host 192.168.2.1, port 5201
Reverse mode, remote host 192.168.2.1 is sending
[  5] local 192.168.2.18 port 42121 connected to 192.168.2.1 port 5201
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  10.3 MBytes  85.8 Mbits/sec  0.004 ms  55146/247172 (22%)
[  5]   1.00-2.00   sec  10.6 MBytes  89.1 Mbits/sec  0.003 ms  54699/253355 (22%)
[  5]   2.00-3.00   sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  55231/253887 (22%)
[  5]   3.00-4.00   sec  10.6 MBytes  88.9 Mbits/sec  0.003 ms  55138/253602 (22%)
[  5]   4.00-5.00   sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  54938/253722 (22%)
[  5]   5.00-6.00   sec  10.6 MBytes  88.9 Mbits/sec  0.003 ms  55273/253580 (22%)
[  5]   6.00-7.00   sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  55202/253986 (22%)
[  5]   7.00-8.00   sec  10.6 MBytes  89.1 Mbits/sec  0.003 ms  55047/253958 (22%)
[  5]   8.00-9.00   sec  10.6 MBytes  88.9 Mbits/sec  0.003 ms  55612/254140 (22%)
[  5]   9.00-10.00  sec  10.6 MBytes  89.0 Mbits/sec  0.003 ms  55683/254403 (22%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   135 MBytes   113 Mbits/sec  0.000 ms  0/0 (0%)  sender
[  5]   0.00-10.00  sec   106 MBytes  88.7 Mbits/sec  0.003 ms  551969/2531805 (22%)  receiver

iperf Done.

# ethtool -S eth1 | grep rx_buf_unav_irq
     rx_buf_unav_irq: 30624

So clearly there are pros and cons with this, but I don't want to fall
into the "let's not break microbenchmarks" pitfall.

I personally find the stat useful, and think that having the stat
visible to users but stuck at 0 is misleading, so:

Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Maxime

>
> drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
> index af6580332d49..43b036d4e95b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
> @@ -99,6 +99,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
>  #define DMA_CHAN_INTR_ENA_NIE_4_10      BIT(15)
>  #define DMA_CHAN_INTR_ENA_AIE_4_10      BIT(14)
>  #define DMA_CHAN_INTR_ENA_FBE           BIT(12)
> +#define DMA_CHAN_INTR_ENA_RPS           BIT(8)
> +#define DMA_CHAN_INTR_ENA_RBU           BIT(7)
>  #define DMA_CHAN_INTR_ENA_RIE           BIT(6)
>  #define DMA_CHAN_INTR_ENA_TIE           BIT(0)
>
> @@ -107,6 +109,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
>                                           DMA_CHAN_INTR_ENA_TIE)
>
>  #define DMA_CHAN_INTR_ABNORMAL          (DMA_CHAN_INTR_ENA_AIE | \
> +                                         DMA_CHAN_INTR_ENA_RPS | \
> +                                         DMA_CHAN_INTR_ENA_RBU | \
>                                           DMA_CHAN_INTR_ENA_FBE)
>  /* DMA default interrupt mask for 4.00 */
>  #define DMA_CHAN_INTR_DEFAULT_MASK      (DMA_CHAN_INTR_NORMAL | \
> @@ -117,6 +121,8 @@ static inline u32 dma_chanx_base_addr(const struct dwmac4_addrs *addrs,
>                                           DMA_CHAN_INTR_ENA_TIE)
>
>  #define DMA_CHAN_INTR_ABNORMAL_4_10     (DMA_CHAN_INTR_ENA_AIE_4_10 | \
> +                                         DMA_CHAN_INTR_ENA_RPS | \
> +                                         DMA_CHAN_INTR_ENA_RBU | \
>                                           DMA_CHAN_INTR_ENA_FBE)
>  /* DMA default interrupt mask for 4.10a */
>  #define DMA_CHAN_INTR_DEFAULT_MASK_4_10 (DMA_CHAN_INTR_NORMAL_4_10 | \

^ permalink raw reply [flat|nested] 20+ messages in thread
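
Editorial sketch (not part of the thread): the loss percentages in the two iperf3 receiver summaries above follow directly from the Lost/Total columns. The function loss_percent() is a hypothetical helper written for this note, not something iperf3 or the driver provides.

```c
/*
 * Percentage of datagrams lost, given iperf3's Lost/Total counts.
 * Returns 0 for an empty run to avoid dividing by zero.
 */
static double loss_percent(unsigned long lost, unsigned long total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)lost / (double)total;
}
```

With the receiver totals quoted above, loss_percent(428302, 2534538) is roughly 16.9 and loss_percent(551969, 2531805) is roughly 21.8, which iperf3 rounds to the 17% and 22% shown, a difference of about five percentage points in this one test.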
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-12 14:01 ` Maxime Chevallier
@ 2026-04-12 14:23 ` Russell King (Oracle)
2026-04-13 1:42 ` Sam Edwards
0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-12 14:23 UTC (permalink / raw)
To: Maxime Chevallier
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, netdev,
Paolo Abeni, Sam Edwards

On Sun, Apr 12, 2026 at 04:01:59PM +0200, Maxime Chevallier wrote:
> Hi Russell,
>
> On 10/04/2026 15:07, Russell King (Oracle) wrote:
> > Enable receive process stopped and receive buffer unavailable
> > interrupts, so that the statistic counters can be updated.
> >
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> > Since we are seeing receive buffer exhaustion on several platforms,
> > let's enable the interrupts so the statistics we publish via ethtool -S
> > actually work to aid diagnosis. I've been in two minds about whether
> > to send this patch, but given the problems with stmmac at the moment,
> > I think it should be merged.
>
> Looks like my reply to your original RFC was lost in limbo as the
> review/test tags are missing.

Thanks.

Unfortunately, I can't run iperf3 against stmmac on the Jetson NX
because stmmac just totally screws itself (at the first RBU, the
receive side irrevocably collapses.)

Against i.MX6 (which is limited to around 480Mbps,) it's recoverable by
taking the interface down and back up a couple of times.

Against x86 (which will saturate the link) it's pretty much
irrecoverable without entire system reboot - if one tries the down+up,
we then get arm-smmu errors because it seems that, despite stmmac being
reset, it still attempts to access a previous receive buffer from
before the down/up sometime after the up. Moreover, transmit stops
working - packets get queued but they are never processed by the
hardware.

This is a scenario that I can only rarely test myself (as it depends on
my physical location.)

As the dwmac 5.0 core receive path seems to lock up after the first
RBU, I never see more than one of those at a time.

Right now, I consider this pretty much unsolvable - I've spent quite
some time looking at it and trying various approaches, nothing seems
to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
paths does seem to improve the situation a little with the 480Mbps
case, because I think it means that we're reading the descriptors in
a more timely manner after the hardware has updated them.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-12 14:23 ` Russell King (Oracle)
@ 2026-04-13 1:42 ` Sam Edwards
2026-04-13 7:24 ` Russell King (Oracle)
0 siblings, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-13 1:42 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
linux-stm32, netdev, Paolo Abeni

On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> As the dwmac 5.0 core receive path seems to lock up after the first
> RBU, I never see more than one of those at a time.
>
> Right now, I consider this pretty much unsolvable - I've spent quite
> some time looking at it and trying various approaches, nothing seems
> to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> paths does seem to improve the situation a little with the 480Mbps
> case, because I think it means that we're reading the descriptors in
> a more timely manner after the hardware has updated them.

Hey Russell,

I'd like to repro this but I currently can't boot net-next. My issue
is the same as [1], and the patch to fix it [2] isn't yet committed
anywhere apparently.

This prevents my Jetson Xavier NX from starting at all (and after
enough attempts, corrupts eMMC); I'm surprised you're not suffering
the same effects. But because this bug lives in the IOMMU subsystem
(and it has somewhat inconsistent effects), perhaps this is just a
different way it manifests? Could you confirm whether your dwmac hang
happens with IOMMU disabled, and/or with [1] reverted or [2] applied?

I'm using a defconfig build and a fairly minimal cmdline (just
console=, root=, and rootwait).

Cheers,
Sam

[1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
[2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-13 1:42 ` Sam Edwards
@ 2026-04-13 7:24 ` Russell King (Oracle)
2026-04-13 7:28 ` Russell King (Oracle)
0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13 7:24 UTC (permalink / raw)
To: Sam Edwards
Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
linux-stm32, netdev, Paolo Abeni

On Sun, Apr 12, 2026 at 06:42:04PM -0700, Sam Edwards wrote:
> On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > As the dwmac 5.0 core receive path seems to lock up after the first
> > RBU, I never see more than one of those at a time.
> >
> > Right now, I consider this pretty much unsolvable - I've spent quite
> > some time looking at it and trying various approaches, nothing seems
> > to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> > paths does seem to improve the situation a little with the 480Mbps
> > case, because I think it means that we're reading the descriptors in
> > a more timely manner after the hardware has updated them.
>
> Hey Russell,
>
> I'd like to repro this but I currently can't boot net-next. My issue
> is the same as [1], and the patch to fix it [2] isn't yet committed
> anywhere apparently.
>
> This prevents my Jetson Xavier NX from starting at all (and after
> enough attempts, corrupts eMMC); I'm surprised you're not suffering
> the same effects. But because this bug lives in the IOMMU subsystem
> (and it has somewhat inconsistent effects), perhaps this is just a
> different way it manifests? Could you confirm whether your dwmac hang
> happens with IOMMU disabled, and/or with [1] reverted or [2] applied?
>
> I'm using a defconfig build and a fairly minimal cmdline (just
> console=, root=, and rootwait).
>
> Cheers,
> Sam
>
> [1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
> [2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/

In the second link, there is this sub-thread:

https://lore.kernel.org/all/ee2c2044-e329-4cdd-ac35-9365824d3677@arm.com/

which was committed into -rc as:

7e0548525abd iommu: Ensure .iotlb_sync is called correctly

which does fix IOMMU problems which caused net-next which reports itself
as v7.0-rc6 failing to boot with ext4 errors. See:

https://lore.kernel.org/r/adZTGOjjJrVJOcT8@shell.armlinux.org.uk

which resulted in it being merged into v7.0-rc7 just before Thursday's
net tree merge. Due to the way net-next is operated, that means that
net-next on Thursday evening gained this fix.

Involving Linus in the problem meant he was aware of it, and explaining
how netdev works allowed him to delay the merging of the net tree to
ensure net-next gained the fix.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-13 7:24 ` Russell King (Oracle)
@ 2026-04-13 7:28 ` Russell King (Oracle)
0 siblings, 0 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13 7:28 UTC (permalink / raw)
To: Sam Edwards
Cc: Maxime Chevallier, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
linux-stm32, netdev, Paolo Abeni

On Mon, Apr 13, 2026 at 08:24:59AM +0100, Russell King (Oracle) wrote:
> On Sun, Apr 12, 2026 at 06:42:04PM -0700, Sam Edwards wrote:
> > On Sun, Apr 12, 2026 at 7:23 AM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > As the dwmac 5.0 core receive path seems to lock up after the first
> > > RBU, I never see more than one of those at a time.
> > >
> > > Right now, I consider this pretty much unsolvable - I've spent quite
> > > some time looking at it and trying various approaches, nothing seems
> > > to fix it. However, adding dma_rmb() in the descriptor cleanup/refill
> > > paths does seem to improve the situation a little with the 480Mbps
> > > case, because I think it means that we're reading the descriptors in
> > > a more timely manner after the hardware has updated them.
> >
> > Hey Russell,
> >
> > I'd like to repro this but I currently can't boot net-next. My issue
> > is the same as [1], and the patch to fix it [2] isn't yet committed
> > anywhere apparently.
> >
> > This prevents my Jetson Xavier NX from starting at all (and after
> > enough attempts, corrupts eMMC); I'm surprised you're not suffering
> > the same effects. But because this bug lives in the IOMMU subsystem
> > (and it has somewhat inconsistent effects), perhaps this is just a
> > different way it manifests? Could you confirm whether your dwmac hang
> > happens with IOMMU disabled, and/or with [1] reverted or [2] applied?
> >
> > I'm using a defconfig build and a fairly minimal cmdline (just
> > console=, root=, and rootwait).
> >
> > Cheers,
> > Sam
> >
> > [1] https://lore.kernel.org/all/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com/
> > [2] https://lore.kernel.org/all/0-v1-664d3acaabb9+78b-iommu_gather_always_jgg@nvidia.com/
>
> In the second link, there is this sub-thread:
>
> https://lore.kernel.org/all/ee2c2044-e329-4cdd-ac35-9365824d3677@arm.com/
>
> which was committed into -rc as:
>
> 7e0548525abd iommu: Ensure .iotlb_sync is called correctly
>
> which does fix IOMMU problems which caused net-next which reports itself
> as v7.0-rc6 failing to boot with ext4 errors. See:
>
> https://lore.kernel.org/r/adZTGOjjJrVJOcT8@shell.armlinux.org.uk
>
> which resulted in it being merged into v7.0-rc7 just before Thursday's
> net tree merge. Due to the way net-next is operated, that means that
> net-next on Thursday evening gained this fix.
>
> Involving Linus in the problem meant he was aware of it, and explaining
> how netdev works allowed him to delay the merging of the net tree to
> ensure net-next gained the fix.

I'll also state what I've stated previously about the iperf3 problem:
it seems to go back a long time, certainly before I started cleaning
up the stmmac driver which is now well over a year ago.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
2026-04-12 14:01 ` Maxime Chevallier
@ 2026-04-13 18:02 ` Jakub Kicinski
2026-04-13 18:49 ` Russell King (Oracle)
2026-04-13 22:00 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-04-13 18:02 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
Sam Edwards

On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> Since we are seeing receive buffer exhaustion on several platforms,
> let's enable the interrupts so the statistics we publish via ethtool -S
> actually work to aid diagnosis. I've been in two minds about whether
> to send this patch, but given the problems with stmmac at the moment,
> I think it should be merged.

Sorry for an under-researched response but wasn't there a person trying
to fix the OOM starvation issue? Who was supposed to add a timer?
Is your problem also OOM related or do you suspect something else?

Firing interrupts when Rx fill ring runs dry (which IIUC this patch
does?) is not a good idea.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-13 18:02 ` Jakub Kicinski
@ 2026-04-13 18:49 ` Russell King (Oracle)
2026-04-13 20:50 ` Jakub Kicinski
2026-04-13 21:54 ` Sam Edwards
0 siblings, 2 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13 18:49 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
Sam Edwards

On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > Since we are seeing receive buffer exhaustion on several platforms,
> > let's enable the interrupts so the statistics we publish via ethtool -S
> > actually work to aid diagnosis. I've been in two minds about whether
> > to send this patch, but given the problems with stmmac at the moment,
> > I think it should be merged.
>
> Sorry for an under-researched response but wasn't there a person trying
> to fix the OOM starvation issue? Who was supposed to add a timer?
> Is your problem also OOM related or do you suspect something else?

It is not OOM related.

I have this patch applied:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 131ea887bedc..614d0e10e3e6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 
                 if (!buf->page) {
                         buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
-                        if (!buf->page)
+                        if (!buf->page) {
+                                netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
                                 break;
+                        }
                 }
 
                 if (priv->sph_active && !buf->sec_page) {
                         buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
-                        if (!buf->sec_page)
+                        if (!buf->sec_page) {
+                                netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
                                 break;
+                        }
 
                         buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
                 }

and it is silent, so we are not suffering starvation of buffers.

However, the hardware hangs during iperf3, and because it triggers the
MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
unmanaged switches that always use flow-control between them (there's no
way not to) it takes down the entire network - as we've discussed
before. So, this problem is pretty fatal to the *entire* network.

With this patch, the existing statistical counters for this condition
are incremented, and thus users can use ethtool -S to see what happened
and report whether they are seeing the same issue.

Without this patch applied, there are no diagnostics from stmmac that
report what the state is. ethtool -d doesn't list the appropriate
registers (as I suspect part of the problem is the number of queues
is somewhat dynamic - userspace can change that configuration through
ethtool).

Thus, one has to resort to using devmem2 to find out what's happened.
That's not user friendly.

For me, devmem2 shows:

Channel 0 status register:
Value at address 0x02491160: 0x00000484
 bit 10: ETI early transmit interrupt - set
 bit 9 : RWT receive watchdog - clear
 bit 8 : RPS receive process stopped - clear
 bit 7 : RBU receive buffer unavailable - set
 bit 6 : RI  receive interrupt - clear
 bit 2 : TBU transmit buffer unavailable - set
 bit 1 : TPS transmit process stopped - clear
 bit 0 : TI  transmit interrupt - clear

Debug status register:
Value at address 0x0249100c: 0x00006300
 TPS[3:0] = 6 = Suspended, Tx descriptor unavailable or Tx buffer
                underflow
 RPS[3:0] = 3 = Running, waiting for Rx packet

MTL Queue 0 debug register:
Value at address 0x02490d38: 0x002e0020
 PRXQ[13:0] = 0x2e = 46 packets in receive queue
 RXQSTS[1:0] = 2 = Rx queue fill-level above flow-control activate
                   threshold
 RRCSTS[1:0] = 0 = Rx Queue Read Controller State = Idle

> Firing interrupts when Rx fill ring runs dry (which IIUC this patch
> does?) is not a good idea.

Well, I'm thinking that at least on some platforms, such as the Jetson
Xavier NX, unless a different solution can be found, we need the RBU
interrupt to fire off a reset of the stmmac IP when this happens to
reduce the PAUSE frame flood on the network.

If we can't do that, then I think stmmac on these platforms needs to be
marked with CONFIG_BROKEN because right now there doesn't seem to be any
other viable solution.

My intention with this patch is merely to start collecting the already
existing statistics so other users can start seeing whether they are
hitting the same or similar problem. If we're not prepared to do that,
then we should delete the useless statistics from ethtool -S, but I
suspect they're now part of the UAPI, even though without this patch
they will remain steadfastly stuck at zero.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related [flat|nested] 20+ messages in thread
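
Editorial sketch (not part of the thread): the three register values from the devmem2 output in the message above can be cross-checked with a small decoder. The channel status bit numbers are the ones listed in that message. The field offsets used for the debug status register (TPS at bits 15:12, RPS at bits 11:8) and the queue debug register (PRXQ at bits 29:16, RXQSTS at bits 5:4, RRCSTS at bits 2:1) are assumptions inferred from the dumped values, and every macro and helper name here is hypothetical rather than taken from the driver.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (UINT32_C(1) << (n))

/* Channel status bits, numbered as in the devmem2 listing above. */
#define CH_STATUS_ETI BIT(10)
#define CH_STATUS_RWT BIT(9)
#define CH_STATUS_RPS BIT(8)
#define CH_STATUS_RBU BIT(7)
#define CH_STATUS_RI  BIT(6)
#define CH_STATUS_TBU BIT(2)
#define CH_STATUS_TPS BIT(1)
#define CH_STATUS_TI  BIT(0)

/* Extract a width-bit field starting at bit 'shift' (width must be < 32). */
static inline uint32_t reg_field(uint32_t reg, unsigned int shift,
                                 unsigned int width)
{
    return (reg >> shift) & ((UINT32_C(1) << width) - 1);
}

/* Assumed layout of the debug status register for channel 0. */
static inline uint32_t dbg_tps0(uint32_t dbg) { return reg_field(dbg, 12, 4); }
static inline uint32_t dbg_rps0(uint32_t dbg) { return reg_field(dbg, 8, 4); }

/* Assumed layout of the queue 0 debug register. */
static inline uint32_t rxq_prxq(uint32_t v)   { return reg_field(v, 16, 14); }
static inline uint32_t rxq_rxqsts(uint32_t v) { return reg_field(v, 4, 2); }
static inline uint32_t rxq_rrcsts(uint32_t v) { return reg_field(v, 1, 2); }
```

Under these assumptions, 0x00000484 is exactly ETI | RBU | TBU, 0x00006300 gives TPS = 6 and RPS = 3, and 0x002e0020 gives PRXQ = 46, RXQSTS = 2 and RRCSTS = 0, matching the annotations in the message.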
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-13 18:49 ` Russell King (Oracle)
@ 2026-04-13 20:50 ` Jakub Kicinski
2026-04-13 20:53 ` Russell King (Oracle)
2026-04-13 21:54 ` Sam Edwards
1 sibling, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-04-13 20:50 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
Sam Edwards

On Mon, 13 Apr 2026 19:49:46 +0100 Russell King (Oracle) wrote:
> > Firing interrupts when Rx fill ring runs dry (which IIUC this patch
> > does?) is not a good idea.
>
> Well, I'm thinking that at least on some platforms, such as the Jetson
> Xavier NX, unless a different solution can be found, we need the RBU
> interrupt to fire off a reset of the stmmac IP when this happens to
> reduce the PAUSE frame flood on the network.
>
> If we can't do that, then I think stmmac on these platforms needs to be
> marked with CONFIG_BROKEN because right now there doesn't seem to be any
> other viable solution.
>
> My intention with this patch is merely to start collecting the already
> existing statistics so other users can start seeing whether they are
> hitting the same or similar problem. If we're not prepared to do that,
> then we should delete the useless statistics from ethtool -S, but I
> suspect they're now part of the UAPI, even though without this patch
> they will remain steadfastly stuck at zero.

Understood, thanks for the extra context. And the statistic we are
talking about is rx_buf_unav_irq ?

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-13 20:50 ` Jakub Kicinski
@ 2026-04-13 20:53 ` Russell King (Oracle)
0 siblings, 0 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-13 20:53 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
Eric Dumazet, linux-arm-kernel, linux-stm32, netdev, Paolo Abeni,
Sam Edwards

On Mon, Apr 13, 2026 at 01:50:18PM -0700, Jakub Kicinski wrote:
> On Mon, 13 Apr 2026 19:49:46 +0100 Russell King (Oracle) wrote:
> > > Firing interrupts when Rx fill ring runs dry (which IIUC this patch
> > > does?) is not a good idea.
> >
> > Well, I'm thinking that at least on some platforms, such as the Jetson
> > Xavier NX, unless a different solution can be found, we need the RBU
> > interrupt to fire off a reset of the stmmac IP when this happens to
> > reduce the PAUSE frame flood on the network.
> >
> > If we can't do that, then I think stmmac on these platforms needs to be
> > marked with CONFIG_BROKEN because right now there doesn't seem to be any
> > other viable solution.
> >
> > My intention with this patch is merely to start collecting the already
> > existing statistics so other users can start seeing whether they are
> > hitting the same or similar problem. If we're not prepared to do that,
> > then we should delete the useless statistics from ethtool -S, but I
> > suspect they're now part of the UAPI, even though without this patch
> > they will remain steadfastly stuck at zero.
>
> Understood, thanks for the extra context. And the statistic we are
> talking about is rx_buf_unav_irq ?

Yes, correct.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-13 18:49 ` Russell King (Oracle)
2026-04-13 20:50 ` Jakub Kicinski
@ 2026-04-13 21:54 ` Sam Edwards
2026-04-14 14:13 ` Russell King (Oracle)
1 sibling, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-13 21:54 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Mon, Apr 13, 2026, 11:49 Russell King (Oracle) <linux@armlinux.org.uk> wrote:
>
> On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> > On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > > Since we are seeing receive buffer exhaustion on several platforms,
> > > let's enable the interrupts so the statistics we publish via ethtool -S
> > > actually work to aid diagnosis. I've been in two minds about whether
> > > to send this patch, but given the problems with stmmac at the moment,
> > > I think it should be merged.
> >
> > Sorry for an under-researched response but wasn't there a person trying
> > to fix the OOM starvation issue? Who was supposed to add a timer?
> > Is your problem also OOM related or do you suspect something else?
>
> It is not OOM related. I have this patch applied:
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 131ea887bedc..614d0e10e3e6 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
>
>                  if (!buf->page) {
>                          buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> -                        if (!buf->page)
> +                        if (!buf->page) {
> +                                netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
>                                  break;
> +                        }
>                  }
>
>                  if (priv->sph_active && !buf->sec_page) {
>                          buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> -                        if (!buf->sec_page)
> +                        if (!buf->sec_page) {
> +                                netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
>                                  break;
> +                        }
>
>                          buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
>                  }
>
> and it is silent, so we are not suffering starvation of buffers.
>
> However, the hardware hangs during iperf3, and because it triggers the
> MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
> unmanaged switches that always use flow-control between them (there's no
> way not to) it takes down the entire network - as we've discussed
> before. So, this problem is pretty fatal to the *entire* network.
>
> With this patch, the existing statistical counters for this condition
> are incremented, and thus users can use ethtool -S to see what happened
> and report whether they are seeing the same issue.
>
> Without this patch applied, there are no diagnostics from stmmac that
> report what the state is. ethtool -d doesn't list the appropriate
> registers (as I suspect part of the problem is the number of queues
> is somewhat dynamic - userspace can change that configuration through
> ethtool).
>
> Thus, one has to resort to using devmem2 to find out what's happened.
> That's not user friendly.
>
> For me, devmem2 shows:
>
> Channel 0 status register:
> Value at address 0x02491160: 0x00000484
>  bit 10: ETI early transmit interrupt - set
>  bit 9 : RWT receive watchdog - clear
>  bit 8 : RPS receive process stopped - clear
>  bit 7 : RBU receive buffer unavailable - set
>  bit 6 : RI  receive interrupt - clear
>  bit 2 : TBU transmit buffer unavailable - set
>  bit 1 : TPS transmit process stopped - clear
>  bit 0 : TI  transmit interrupt - clear
>
> Debug status register:
> Value at address 0x0249100c: 0x00006300
>  TPS[3:0] = 6 = Suspended, Tx descriptor unavailable or Tx buffer
>                 underflow
>  RPS[3:0] = 3 = Running, waiting for Rx packet
>
> MTL Queue 0 debug register:
> Value at address 0x02490d38: 0x002e0020
>  PRXQ[13:0] = 0x2e = 46 packets in receive queue
>  RXQSTS[1:0] = 2 = Rx queue fill-level above flow-control activate
>                    threshold
>  RRCSTS[1:0] = 0 = Rx Queue Read Controller State = Idle
>
> > Firing interrupts when Rx fill ring runs dry (which IIUC this patch
> > does?) is not a good idea.
>
> Well, I'm thinking that at least on some platforms, such as the Jetson
> Xavier NX, unless a different solution can be found, we need the RBU
> interrupt to fire off a reset of the stmmac IP when this happens to
> reduce the PAUSE frame flood on the network.

Hi Russell,

Should that reset trigger be RPS, not RBU? My understanding of these
status bits is RBU is just "RxDMA has failed to take a frame from the
RxFIFO" while RPS is "the RxFIFO is full." That would make RBU our
critical threshold to start proactively refilling, and RPS the "too
late, we lose" threshold.

Thinking aloud: Do you suppose the RxDMA waits for a wakeup signal sent
whenever a frame is added to RxFIFO? That might explain why the former
never recovers once the latter is full: a manual wakeup needs to be
sent whenever we resolve RBU. Does the .enable_dma_reception() op need
to be implemented for dwmac5, or have you tried that already?

>
> If we can't do that, then I think stmmac on these platforms needs to be
> marked with CONFIG_BROKEN because right now there doesn't seem to be any
> other viable solution.
>
> My intention with this patch is merely to start collecting the already
> existing statistics so other users can start seeing whether they are
> hitting the same or similar problem. If we're not prepared to do that,
> then we should delete the useless statistics from ethtool -S, but I
> suspect they're now part of the UAPI, even though without this patch
> they will remain steadfastly stuck at zero.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts 2026-04-13 21:54 ` Sam Edwards @ 2026-04-14 14:13 ` Russell King (Oracle) 2026-04-15 1:19 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2026-04-14 14:13 UTC (permalink / raw) To: Sam Edwards Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet, moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE, linux-stm32, Linux Network Development Mailing List, Paolo Abeni Hi Sam, Most of this email was written this morning, but I didn't have a chance to finish nor send it due to how busy I am. I had also written a separate reply last night with detailed results of what I was seeing but didn't/haven't got around to sending it. Not currently sure whether I saved it as draft or got rid of it yet. On Mon, Apr 13, 2026 at 02:54:30PM -0700, Sam Edwards wrote: > On Mon, Apr 13, 2026, 11:49 Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > > > On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote: > > > On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote: > > > > Since we are seeing receive buffer exhaustion on several platforms, > > > > let's enable the interrupts so the statistics we publish via ethtool -S > > > > actually work to aid diagnosis. I've been in two minds about whether > > > > to send this patch, but given the problems with stmmac at the moment, > > > > I think it should be merged. > > > > > > Sorry for a under-research response but wasn't there are person trying > > > to fix the OOM starvation issue? Who was supposed to add a timer? > > > Is your problem also OOM related or do you suspect something else? > > > > It is not OOM related. 
I have this patch applied: > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index 131ea887bedc..614d0e10e3e6 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue) > > > > if (!buf->page) { > > buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp); > > - if (!buf->page) > > + if (!buf->page) { > > + netdev_err(priv->dev, "q%u: no buffer 1\n", queue); > > break; > > + } > > } > > > > if (priv->sph_active && !buf->sec_page) { > > buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp); > > - if (!buf->sec_page) > > + if (!buf->sec_page) { > > + netdev_err(priv->dev, "q%u: no buffer 2\n", queue); > > break; > > + } > > > > buf->sec_addr = page_pool_get_dma_addr(buf->sec_page); > > } > > > > and it is silent, so we are not suffering starvation of buffers. > > > > However, the hardware hangs during iperf3, and because it triggers the > > MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116 > > unmanaged switches that always use flow-control between them (there's no > > way not to) it takes down the entire network - as we've discussed > > before. So, this problem is pretty fatal to the *entire* network. > > > > With this patch, the existing statistical counters for this condition > > are incremented, and thus users can use ethtool -S to see what happened > > and report whether they are seeing the same issue. > > > > Without this patch applied, there are no diagnostics from stmmac that > > report what the state is. ethtool -d doesn't list the appropriate > > registers (as I suspect part of the problem is the number of queues > > is somewhat dynamic - userspace can change that configuration through > > ethtool). > > > > Thus, one has to resort to using devmem2 to find out what's happened. 
> > That's not user friendly. > > > > For me, devmem2 shows: > > > > Channel 0 status register: > > Value at address 0x02491160: 0x00000484 > > bit 10: ETI early transmit interrupt - set > > bit 9 : RWT receive watchdog - clear > > bit 8 : RPS receieve process stopped - clear > > bit 7 : RBU receive buffer unavailable - set > > bit 6 : RI receive interrupt - clear > > bit 2 : TBU transmit buffer unavailable - set > > bit 1 : TPS transmit process stopped - clear > > bit 0 : TI transmit interrupt - clear > > Should that reset trigger be RPS, not RBU? My understanding of these > status bits is RBU is just "RxDMA has failed to take a frame from the > RxFIFO" while RPS is "the RxFIFO is full." That would make RBU our > critical threshold to start proactively refilling, and RPS the "too > late, we lose" threshold. That's a fine theory, but look at the channel 0 status register above, noting that any interrupts that are raised but not enabled remain set. RPS is not set, so RPS is not being raised, only RBU when this condition occurs. > Thinking aloud: Do you suppose the RxDMA waits for a wakeup signal > sent whenever a frame is added to RxFIFO? That might explain why the > former never recovers once the latter is full: a manual wakeup needs > to be sent whenever we resolve RBU. Does the .enable_dma_reception() > op need to be implemented for dwmac5, or have you tried that already? I've not found anything in the closest documentation I have. The Xavier is Synopsys IP v5.0, whereas i.MX8M is v5.1 - and v5.1 compared to previous versions reads the same for statements concerning recovering from a RBU condition: "In ring mode, the application should advance the Receive Descriptor Tail Pointer register of a channel. This bit is set only when the DMA owns the previous Rx descriptor." 
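
For readers following the status-register argument, a minimal sketch of the decode is given here. It takes the bit positions only from the devmem2 dump quoted above; the macro and function names are illustrative and are not taken from the stmmac driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit positions as listed in the quoted devmem2 dump; names are illustrative. */
#define CH_STATUS_ETI (1u << 10) /* early transmit interrupt */
#define CH_STATUS_RWT (1u << 9)  /* receive watchdog */
#define CH_STATUS_RPS (1u << 8)  /* receive process stopped */
#define CH_STATUS_RBU (1u << 7)  /* receive buffer unavailable */
#define CH_STATUS_RI  (1u << 6)  /* receive interrupt */
#define CH_STATUS_TBU (1u << 2)  /* transmit buffer unavailable */
#define CH_STATUS_TPS (1u << 1)  /* transmit process stopped */
#define CH_STATUS_TI  (1u << 0)  /* transmit interrupt */

struct status_bit {
	uint32_t mask;
	const char *name;
};

static const struct status_bit ch_status_bits[] = {
	{ CH_STATUS_ETI, "ETI" }, { CH_STATUS_RWT, "RWT" },
	{ CH_STATUS_RPS, "RPS" }, { CH_STATUS_RBU, "RBU" },
	{ CH_STATUS_RI,  "RI"  }, { CH_STATUS_TBU, "TBU" },
	{ CH_STATUS_TPS, "TPS" }, { CH_STATUS_TI,  "TI"  },
};

/* Write the names of all set bits into buf, separated by spaces. */
static void chan_status_describe(uint32_t status, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(ch_status_bits) / sizeof(ch_status_bits[0]); i++) {
		int n;

		if (!(status & ch_status_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", ch_status_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated, stop here */
		used += (size_t)n;
	}
}

/* True when RBU is latched but RPS is not, the case described above. */
static int rbu_without_rps(uint32_t status)
{
	return (status & CH_STATUS_RBU) && !(status & CH_STATUS_RPS);
}
```

Passing 0x00000484 to chan_status_describe() yields the three set bits from the dump (ETI, RBU, TBU), and rbu_without_rps() is true for that value.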
I've tried expanding what happens when RBU fires, dumping some of the
receive state and the receive ring:

[ 55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245 last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64

cur_rx == dirty_rx _should_ mean that we fully refilled the ring.
These are their values at the point the RBU interrupt fires.

last_cur_rx and last_dirty_rx are the values of cur_rx/dirty_rx when
stmmac_rx() was last entered. last_cur_rx_post is the value of cur_rx
when stmmac_rx() finished looping but before we have refilled the
ring. count is the value of count just before stmmac_rx() returns,
budget is the limit at that point.

The patch that prints errors should we fail to allocate a buffer is
in place and none of those errors fire, so we are fully repopulating
the ring each time stmmac_rx() runs.

[ 55.766785] RX descriptor ring:
[ 55.766802] 000 [0x0000007fffffe000]: 0x0 0x12 0x0 0x340105ee
[ 55.766826] 001 [0x0000007fffffe010]: 0x0 0x12 0x0 0x340105ee
[ 55.766843] 002 [0x0000007fffffe020]: 0x0 0x12 0x0 0x340105ee
[ 55.766860] 003 [0x0000007fffffe030]: 0x0 0x12 0x0 0x340105ee
...
[ 55.772205] 308 [0x0000007ffffff340]: 0x0 0x12 0x0 0x340105ee
[ 55.772221] 309 [0x0000007ffffff350]: 0x0 0x12 0x0 0x340105ee
[ 55.772237] 310 [0x0000007ffffff360]: 0x0 0x12 0x0 0x340105ee
[ 55.772253] 311 [0x0000007ffffff370]: 0x0 0x12 0x0 0x340105ee
[ 55.772268] 312 [0x0000007ffffff380]: 0x0 0x12 0x0 0x340105ee
[ 55.772284] 313 [0x0000007ffffff390]: 0x0 0x12 0x0 0x340105ee
[ 55.772300] 314 [0x0000007ffffff3a0]: 0x0 0x12 0x0 0x340105ee
[ 55.772315] 315 [0x0000007ffffff3b0]: 0x0 0x12 0x0 0x340105ee
...
[ 55.775539] 511 [0x0000007ffffffff0]: 0x0 0x12 0x0 0x340105ee

Every ring entry contains the same RDES3 value, so it really is
completely full at the point RBU fires (bit 31 clear means software
owns the descriptor, and it's basically saying first/last segment,
RDES1 valid, buffer 1 length of 1518.)
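
The RDES3 value in the dump can be picked apart with a small sketch. Only bit 31 (OWN) is stated explicitly in the text above; the positions used here for first/last descriptor, RDES1 valid, and the 15-bit packet length are assumptions based on the dwmac4 write-back descriptor layout as commonly described, and the names are illustrative rather than the driver's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed dwmac4 write-back RDES3 layout; only OWN (bit 31) is from the text. */
#define RDES3_OWN     (1u << 31) /* set: hardware owns the descriptor */
#define RDES3_FD      (1u << 29) /* first descriptor of the packet */
#define RDES3_LD      (1u << 28) /* last descriptor of the packet */
#define RDES3_RS1V    (1u << 26) /* RDES1 status is valid */
#define RDES3_PL_MASK 0x7fffu    /* packet length, bits 14:0 */

struct rdes3_info {
	bool hw_owned;
	bool first;
	bool last;
	bool rdes1_valid;
	unsigned int length;
};

/* Split a raw RDES3 word into the fields discussed in the thread. */
static struct rdes3_info rdes3_decode(uint32_t rdes3)
{
	struct rdes3_info info = {
		.hw_owned    = (rdes3 & RDES3_OWN) != 0,
		.first       = (rdes3 & RDES3_FD) != 0,
		.last        = (rdes3 & RDES3_LD) != 0,
		.rdes1_valid = (rdes3 & RDES3_RS1V) != 0,
		.length      = rdes3 & RDES3_PL_MASK,
	};

	return info;
}
```

Under those assumptions, rdes3_decode(0x340105ee) reports a software-owned descriptor holding a complete 1518-byte packet, which agrees with the reading given above.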
The Rx tail pointer register contains 0xfffff3a0 which is entry 314.
The current receive descriptor address is also 0xfffff3a0. Note that
these values were obtained some time after the RBU interrupt fired
(due to the time taken for devmem2 to access every stmmac register -
I have a script that dumps the entire stmmac register state via
devmem2.)

The other thing to note is that when looking at debugfs
stmmaceth/eth0/descriptor* (or whatever it's called, I don't have
the NX powered to look at the moment, and I didn't take a copy of it
last night) all the descriptor entries are fully repopulated with
buffers and owned by the hardware.

I've tried using devmem2 to write to the rx tail pointer to kick it
back into action, but that changes nothing. I've tried writing the
next descriptor value and previous descriptor value, but that appears
to have no effect, it steadfastly remains stuck - and as that is the
documented recovery from RBU and there's no "receive demand" register
listed in dwmac v4 or v5 documentation, there seems to be no other
documented way.

The debug registers that I provided in my previous email suggest that
the MAC is waiting for a packet, and MTL's descriptor reader is idle
(I'm guessing it would only briefly change when the tail pointer is
updated.)

Note that I have augmented the driver with more dma_rmb() + dma_wmb()
in stmmac_rx(), dwmac4_wrback_get_rx_status(), and stmmac_rx_refill()
to ensure that reads and writes to the descriptor ring are correctly
ordered. While this generally allows iperf3 to run for a few more
seconds, it doesn't solve the problem - it is very rare for iperf3 to
actually complete before stmmac has taken down my entire network.

I have noticed that on some occasions I see a small number of RBU
interrupts before it falls over.

I'm not going to have much time to look at this today due to further
appointments (I also didn't yesterday - only an hour in the morning
and a bit more time late in the evening/night.)
I should have more time during the rest of the week... but that may
change.

From the above, it looks like the NAPI/stmmac driver isn't keeping up
with the packet flow coming from an i.MX6 platform (which is limited
to around 470Mbps due to internal SoC bus limitations.)

I'll also mention that stmmac falls apart even more if I run
iperf3 -c -R against an x86 machine that is capable of saturating the
network, so much so that the arm-smmu IOMMU throws errors even after
the stmmac hardware has been soft-reset for addresses that were in
the ring *prior* to the soft-reset occurring (stmmac is soft-reset
each time the netdev is brought up.) The only recovery from that is
to reboot - down/up the interface just spews more IOMMU errors. I
don't have the details of that to hand and I don't have enough time
to re-run that test this morning. From what I remember, the transmit
side also stops processing descriptors (one can see them accumulate
in the debugfs file,) which eventually leads to the netdev watchdog
firing.

It currently looks like the stmmac v5 EQoS IP works fine only under
light packet loads. If one puts any stress on it, then the hardware
totally falls apart. This may point to an issue with the AXI bus
configuration that is specific to this platform, but that requires
further investigation.

I'll mention again, in case anyone's forgotten, that these problems
pre-date any of the cleanups I've made to stmmac. From what I
remember they are reproducible with the kernels that are supplied as
part of the nVidia BSP. Again, as I don't have access to the nVidia
platform at the moment, I can't include the details in this email.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-14 14:13 ` Russell King (Oracle)
@ 2026-04-15 1:19 ` Russell King (Oracle)
2026-04-15 2:12 ` Sam Edwards
0 siblings, 1 reply; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-15 1:19 UTC (permalink / raw)
To: Sam Edwards
Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE, linux-stm32,
Linux Network Development Mailing List, Paolo Abeni

Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
survives iperf3 -c -R to the imx6.

Dumping the registers and comparing, and then forcing the RQS and TQS
values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
*256 = 36864 bytes) respectively seems to solve the problem. Under
net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
all four of the MTL receive operation mode registers, but only the
first MTL transmit operation mode register. However, DMA channels 1-3
aren't initialised.

net-next derives them from:

	unsigned int tqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_tx_chan_op_mode() and

	unsigned int rqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_rx_chan_op_mode().

Now, according to the DMA capabilities:

	Number of Additional RX channel: 4
	Number of Additional TX channel: 4
	Number of Additional RX queues: 4
	Number of Additional TX queues: 4
	TX Fifo Size: 65536
	RX Fifo Size: 65536

However:

# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 4
TX: 4
Other: 0
Combined: 0
Current hardware settings:
RX: 1
TX: 1
Other: 0
Combined: 0

So, we end up allocating the entire 64K of the tx and rx FIFO to one
queue in net-next.
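
The RQS/TQS arithmetic above can be checked with a short sketch. The encode step mirrors the `fifosz / 256 - 1` expression quoted from net-next; the helper names are illustrative, and the even-split helper is purely hypothetical (it is not claimed to be what either kernel does).

```c
/* Encode a FIFO size in bytes as an RQS/TQS field value (units of 256 bytes, minus one). */
static unsigned int fifo_bytes_to_qs(unsigned int fifosz)
{
	return fifosz / 256 - 1;
}

/* Decode an RQS/TQS field value back to a FIFO size in bytes. */
static unsigned int qs_to_fifo_bytes(unsigned int qs)
{
	return (qs + 1) * 256;
}

/* Hypothetical: field value if the total FIFO were split evenly across nqueues. */
static unsigned int qs_even_split(unsigned int total_bytes, unsigned int nqueues)
{
	return fifo_bytes_to_qs(total_bytes / nqueues);
}
```

With these helpers, 0x23 decodes to 9216 bytes, 0x8f to 36864 bytes, and a single queue given the whole 65536-byte FIFO encodes as 0xff, matching the figures in the message.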
Looking back at 5.10, I don't see any code that would account for these
values being programmed for TQS and RQS, it looks like the calculations
are basically the same as we have today.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts 2026-04-15 1:19 ` Russell King (Oracle) @ 2026-04-15 2:12 ` Sam Edwards 2026-04-15 12:43 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Sam Edwards @ 2026-04-15 2:12 UTC (permalink / raw) To: Russell King (Oracle) Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet, moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE, linux-stm32, Linux Network Development Mailing List, Paolo Abeni On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel > survives iperf3 -c -R to the imx6. Hi Russell, Aw, you beat me to it! I was about to report that 5.10.104-tegra is unaffected. And my iperf3 server is a multi-GbE amd64 machine. > Dumping the registers and comparing, and then forcing the RQS and TQS > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144, > *256 = 36864 ytes) respectively seems to solve the problem. Under > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.) > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs > all four of the MTL receive operation mode registers, but only the > first MTL transmit operation mode register. However, DMA channels 1-3 > aren't initialised. Wow, great! I wonder if the problem is that the MTL FIFOs are smaller than that, so when the DMA suffers a momentary hiccup, the FIFOs are allowed to overflow, putting the hardware in a bad state. Though I suspect this is only half of the problem: do you still see RBUs? Everything you've shared so far suggests the DMA failures are _not_ because the rx ring is drying up. 
My gut's telling me the DMA unit is encountering an AXI error,
triggering RBU plus some kind of recovery behavior, and the recovery
takes the DMA offline long enough for the FIFO to overflow (without
triggering RPS because the RQS threshold is unreachable). It seems
that the problem happens less frequently on my test setup when I boot
with iommu.passthrough=1 but that could be my imagination.

But if the hardware remains stable with RQS and TQS set correctly, I
don't feel an urgent need to dig deeper. :)

> Looking back at 5.10, I don't see any code that would account for these
> values being programmed for TQS and RQS, it looks like the calculations
> are basically the same as we have today.

Note that Nvidia have their own "nvethernet" driver for their vendor
kernel, which appears to pick the FIFO sizes from hardcoded tables in
its eqos_configure_mtl_queue() [1] function.

Cheers,
Sam

[1] https://github.com/proski/nvethernet/blob/main/nvethernetrm/osi/core/eqos_core.c#L263
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts 2026-04-15 2:12 ` Sam Edwards @ 2026-04-15 12:43 ` Russell King (Oracle) 2026-04-15 17:38 ` Sam Edwards 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2026-04-15 12:43 UTC (permalink / raw) To: Sam Edwards Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet, moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE, linux-stm32, Linux Network Development Mailing List, Paolo Abeni On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote: > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle) > <linux@armlinux.org.uk> wrote: > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel > > survives iperf3 -c -R to the imx6. > > Hi Russell, > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is > unaffected. And my iperf3 server is a multi-GbE amd64 machine. > > > Dumping the registers and comparing, and then forcing the RQS and TQS > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144, > > *256 = 36864 ytes) respectively seems to solve the problem. Under > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.) > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs > > all four of the MTL receive operation mode registers, but only the > > first MTL transmit operation mode register. However, DMA channels 1-3 > > aren't initialised. > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller > than that, so when the DMA suffers a momentary hiccup, the FIFOs are > allowed to overflow, putting the hardware in a bad state. > > Though I suspect this is only half of the problem: do you still see > RBUs? Everything you've shared so far suggests the DMA failures are > _not_ because the rx ring is drying up. Yes. Note that RBUs will happen not because of DMA failures, but if the kernel fails to keep up with the packet rate. 
RBU means "we read the next descriptor, and it wasn't owned by hardware". > > Looking back at 5.10, I don't see any code that would account for these > > values being programmed for TQS and RQS, it looks like the calculations > > are basically the same as we have today. > > Note that Nvidia have their own "nvethernet" driver for their vendor > kernel, which appears to pick the FIFO sizes from hardcoded tables in > its eqos_configure_mtl_queue() [1] function. That has: const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = { { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) }, { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) }, }; const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = { { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) }, { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) }, }; where each of those values is the RQS/TQS value to use in KiB: #define FIFO_SZ(x) ((((x) * 1024U) / 256U) - 1U) This doesn't correspond with the values I'm seeing programmed into the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143 (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables above from a quick look, but they're not in the right place! For example, tx_fifo_sz[] doesn't contain an entry for 36KiB. rx_fifo_sz[0][0..3] looks plausible. It's certainly not a case of misreading the register values, this is what devmem2 said: Value at address 0x02490d00: 0x008f000a Value at address 0x02490d30: 0x02379eb0 where TQS is bits 24:16 of the register at offset 0xd00 - which is 0x8f, and RQS is bits 29:20 of the register at 0xd30, which is 0x23. 
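
The field extraction described above can be reproduced with a small sketch. The bit ranges (TQS in bits 24:16 of the register at offset 0xd00, RQS in bits 29:20 of the register at 0xd30) and the FIFO_SZ() macro are taken from the message; field_get() is an illustrative helper, not a kernel API.

```c
#include <stdint.h>

/* Quoted from the nvethernet tables: size in KiB to RQS/TQS field value. */
#define FIFO_SZ(x) ((((x) * 1024U) / 256U) - 1U)

/* Extract bits hi..lo (inclusive) from a 32-bit register value. */
static uint32_t field_get(uint32_t reg, unsigned int hi, unsigned int lo)
{
	unsigned int width = hi - lo + 1;
	uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);

	return (reg >> lo) & mask;
}
```

Applied to the two devmem2 values, field_get(0x008f000a, 24, 16) gives 0x8f, which equals FIFO_SZ(36U), and field_get(0x02379eb0, 29, 20) gives 0x23, which equals FIFO_SZ(9U).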
Now, as for FIFO sizes, if we sum up all the entries, then we get: SUM(rx_fifo_size[0][]) = 60KiB SUM(rx_fifo_size[1][]) = 64KiB SUM(tx_fifo_size[0][]) = 60KiB SUM(tx_fifo_size[1][]) = 64KiB From what I gather in core_local.h, l_mac_ver contains one of three values - 0 = Legacy EQOS, 1 = Orin EQOS, 2 = Orin MGBE, and which set of values is selected by bit 0 of that. Decoding this further, Legacy EQOS is IP version v5.0, Orin EQOS is v5.3, and Orin MGBE is v3.1 and v4.0. So, I wonder whether there's something in "Legacy EQOS" that consumes 4KiB of FIFO that isn't documented in iMX8M (IP v5.1). Is anyone aware of public SoC documentation that covers the v5.0 IP version? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts 2026-04-15 12:43 ` Russell King (Oracle) @ 2026-04-15 17:38 ` Sam Edwards 2026-04-15 19:37 ` Russell King (Oracle) 0 siblings, 1 reply; 20+ messages in thread From: Sam Edwards @ 2026-04-15 17:38 UTC (permalink / raw) To: Russell King (Oracle) Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet, moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE, linux-stm32, Linux Network Development Mailing List, Paolo Abeni On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote: > > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle) > > <linux@armlinux.org.uk> wrote: > > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel > > > survives iperf3 -c -R to the imx6. > > > > Hi Russell, > > > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is > > unaffected. And my iperf3 server is a multi-GbE amd64 machine. > > > > > Dumping the registers and comparing, and then forcing the RQS and TQS > > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144, > > > *256 = 36864 ytes) respectively seems to solve the problem. Under > > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.) > > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs > > > all four of the MTL receive operation mode registers, but only the > > > first MTL transmit operation mode register. However, DMA channels 1-3 > > > aren't initialised. > > > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller > > than that, so when the DMA suffers a momentary hiccup, the FIFOs are > > allowed to overflow, putting the hardware in a bad state. > > > > Though I suspect this is only half of the problem: do you still see > > RBUs? 
Everything you've shared so far suggests the DMA failures are > > _not_ because the rx ring is drying up. > > Yes. Note that RBUs will happen not because of DMA failures, but if > the kernel fails to keep up with the packet rate. RBU means "we read > the next descriptor, and it wasn't owned by hardware". Are you speaking from observation, documentation, or understanding? I'd define RBU the same way, but you reported: ``` [ 55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245 last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64 cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...] [...] Every ring entry contains the same RDES3 value, so it really is completely full at the point RBU fires (bit 31 clear means software owns the descriptor, and it's basically saying first/last segment, RDES1 valid, buffer 1 length of 1518. ``` It would seem* that the kernel isn't really failing to keep up with the packet rate. If RBU is firing with a ring that's not even close to empty, that tells me there's another way for it to fire. So I suspect the hardware designers implemented it to mean: "We couldn't read the next descriptor, _or_ it wasn't owned by hardware." (* However, if bit 31 is clear everywhere, wouldn't that mean the ring is actually completely depleted, not full? If count==budget, wouldn't that mean the whole ring hasn't been visited, so we only refilled 64 entries and not necessarily the entire ring? Maybe the kernel isn't keeping up after all.) 
> That has: > > const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = { > { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), > FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) }, > { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), > FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) }, > }; > const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = { > { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), > FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) }, > { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), > FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) }, > }; > > where each of those values is the RQS/TQS value to use in KiB: > > #define FIFO_SZ(x) ((((x) * 1024U) / 256U) - 1U) > > This doesn't correspond with the values I'm seeing programmed into > the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143 > (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables > above from a quick look, but they're not in the right place! True, but: a) I doubt 5.10.216-tegra includes exactly the same version of the driver found in this random GitHub mirror. (My intent was only to point out that they don't use 5.10's stmmac; I should have been more clear that I wasn't trying to link the same version, sorry!) b) This is vendor code; I don't know how good their testing/review process is. It might not run the way it looks. The intent seems to be for RQS > TQS (which makes intuitive sense), but as you're seeing the registers programmed the other way 'round, they might have gotten them subtly mixed up. > Now, as for FIFO sizes, if we sum up all the entries, then we > get: > > SUM(rx_fifo_size[0][]) = 60KiB > SUM(rx_fifo_size[1][]) = 64KiB > SUM(tx_fifo_size[0][]) = 60KiB > SUM(tx_fifo_size[1][]) = 64KiB I follow the math with 64KiB, but surely the 60KiB should be 9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply shifts with smaller FIFOs. 
Since dwmac is licensed as a soft IP core, perhaps the FIFO size is an
elaboration parameter? That would mean this isn't an issue with dwmac
5.0 broadly, but with Nvidia's specific instantiation of it.
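
The per-row totals under discussion can be recomputed with a minimal sketch. The KiB values are copied from the FIFO_SZ() arguments in the tables quoted earlier in the thread; the array and function names here are illustrative.

```c
#define NUM_QUEUES 8

/* KiB arguments of FIFO_SZ() from the quoted rx_fifo_sz[] and tx_fifo_sz[] tables. */
static const unsigned int rx_fifo_kib[2][NUM_QUEUES] = {
	{ 9, 9, 9, 9, 1, 1, 1, 1 },
	{ 36, 2, 2, 2, 2, 2, 2, 16 },
};
static const unsigned int tx_fifo_kib[2][NUM_QUEUES] = {
	{ 9, 9, 9, 9, 1, 1, 1, 1 },
	{ 8, 8, 8, 8, 8, 8, 8, 8 },
};

/* Sum one row of per-queue FIFO sizes, in KiB. */
static unsigned int sum_kib(const unsigned int *row)
{
	unsigned int total = 0;

	for (int i = 0; i < NUM_QUEUES; i++)
		total += row[i];
	return total;
}
```

Row 0 of both tables sums to 40 KiB and row 1 of both sums to 64 KiB.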
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts 2026-04-15 17:38 ` Sam Edwards @ 2026-04-15 19:37 ` Russell King (Oracle) 2026-04-15 20:50 ` Sam Edwards 0 siblings, 1 reply; 20+ messages in thread From: Russell King (Oracle) @ 2026-04-15 19:37 UTC (permalink / raw) To: Sam Edwards Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller, Eric Dumazet, moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE, linux-stm32, Linux Network Development Mailing List, Paolo Abeni On Wed, Apr 15, 2026 at 10:38:29AM -0700, Sam Edwards wrote: > On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle) > <linux@armlinux.org.uk> wrote: > > > > On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote: > > > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle) > > > <linux@armlinux.org.uk> wrote: > > > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel > > > > survives iperf3 -c -R to the imx6. > > > > > > Hi Russell, > > > > > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is > > > unaffected. And my iperf3 server is a multi-GbE amd64 machine. > > > > > > > Dumping the registers and comparing, and then forcing the RQS and TQS > > > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144, > > > > *256 = 36864 ytes) respectively seems to solve the problem. Under > > > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.) > > > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs > > > > all four of the MTL receive operation mode registers, but only the > > > > first MTL transmit operation mode register. However, DMA channels 1-3 > > > > aren't initialised. > > > > > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller > > > than that, so when the DMA suffers a momentary hiccup, the FIFOs are > > > allowed to overflow, putting the hardware in a bad state. 
> > > > > > Though I suspect this is only half of the problem: do you still see > > > RBUs? Everything you've shared so far suggests the DMA failures are > > > _not_ because the rx ring is drying up. > > > > Yes. Note that RBUs will happen not because of DMA failures, but if > > the kernel fails to keep up with the packet rate. RBU means "we read > > the next descriptor, and it wasn't owned by hardware". > > Are you speaking from observation, documentation, or understanding? Observation. > I'd define RBU the same way, but you reported: It's not a question about how I define RBU - this is defined by Synopsys and I'm using it *exactly* that way as stated in the documentation. "This bit indicates that the host owns the Next Descriptor in the Receive List and the DMA cannot acquire it. The Receive Process is suspended. ... This bit is set only when the previous Receive Descriptor is owned by the DMA." In other words, DMA has processed the previous receive descriptor which _was_ owned by the hardware, written back to clear the OWN bit, and then fetches the next descriptor and finds that the OWN bit is also clear. > > ``` > [ 55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer > unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245 > last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64 > > cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...] > [...] > Every ring entry contains the same RDES3 value, so it really is > completely full at the point RBU fires (bit 31 clear means software > owns the descriptor, and it's basically saying first/last segment, > RDES1 valid, buffer 1 length of 1518. > ``` Right, because the _last_ time stmmac_rx() was called, the ring was completely refilled (as it always is for me). There are two scenarios that what I'm seeing may happen. 1) The ring was fully refilled, but before stmmac_rx() is next executed, all descriptors end up being consumed due to the rate at which packets are being received. 
Thus, the hardware encounters a descriptor that has OWN=0.

2) The kernel has been slow to respond to packets that have been
received, and because of the NAPI throttling stmmac_rx() to only
process 64 descriptors at a time, we are falling way behind the
hardware position. Eventually, the hardware catches up with the point
at which stmmac_rx_refill() is repopulating the receive descriptors,
and encounters a descriptor that has OWN=0.

For (2), for example, let's take the example which you've quoted from
me. stmmac_rx() gets called, and cur_rx = dirty_rx = 245. We're limited
to a count of 64 meaning we're not going to process more than 64
entries no matter how far ahead the hardware is. Let's say the hardware
is at e.g. descriptor 400 at this point.

stmmac_rx() runs, processing descriptors. It works its way up to entry
309, at which point count == limit, so it stops, and we now have
cur_rx = 309, dirty_rx = 245.

The next thing stmmac_rx() does is call stmmac_rx_refill(). This looks
at the difference, and calculates how many entries need to be
repopulated. stmmac_rx_dirty() returns 64, as that's the number of
entries between dirty_rx and the updated cur_rx. It populates those
entries. At this point, dirty_rx = 309.

All well and good. However, during that process, packet reception
hasn't stopped, and let's say it's now at descriptor 500.

In that scenario, we're consuming 100 descriptors, but only
repopulating 64 descriptors. As this continues, the hardware is slowly
catching up with the point in the ring that stmmac_rx_refill() is
repopulating the descriptors. When it does catch up, it will encounter
a descriptor with OWN=0, which will fire the RBU interrupt. At this
point, my debug dumps the state of the ring.

If the RBU was raised when stmmac_rx()/stmmac_rx_refill() was not
running, _and_ we are always successfully refilling all the entries
that stmmac_rx() processed, then cur_rx will equal dirty_rx, even when
the hardware could be way ahead of cur_rx.
Neither of these indexes has any relevance to where the hardware
actually is in the ring.

The dump of the ring state *clearly* shows that all descriptors have a
RDES3 value which indicates that every single descriptor is not
hardware owned at this point (since RBU has been raised, the receive
process is suspended, so hardware is no longer changing the ring.)

> It would seem* that the kernel isn't really failing to keep up with
> the packet rate. If RBU is firing with a ring that's not even close to
> empty, that tells me there's another way for it to fire. So I suspect
> the hardware designers implemented it to mean:
> "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
>
> (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> is actually completely depleted, not full? If count==budget, wouldn't
> that mean the whole ring hasn't been visited, so we only refilled 64
> entries and not necessarily the entire ring? Maybe the kernel isn't
> keeping up after all.)

Ah, I think that's where our terminology differs.

You seem to define full as "populated with empty buffers". I define
full to mean "the hardware has filled every buffer with a packet that
it has received and handed it over to software to process." Note even
the terminology there - filling buffers with data. That ultimately
ends up filling the ring, and when completely filled, it is full.

I think of buffers like buckets. If a buffer contains no data, it
is empty. If a buffer contains data, it has been filled or is full.
Apply that to a list of buffers and you get the same thing. Much
Ethernet driver documentation uses this same terminology, so I
thought it would be widely understood.

> > That has:
> >
> > const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >         { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >           FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >         { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
> >           FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
> > };
> > const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >         { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >           FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >         { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
> >           FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
> > };
> >
> > where each of those values is the RQS/TQS value to use in KiB:
> >
> > #define FIFO_SZ(x) ((((x) * 1024U) / 256U) - 1U)
> >
> > This doesn't correspond with the values I'm seeing programmed into
> > the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> > (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> > above from a quick look, but they're not in the right place!
>
> True, but:
> a) I doubt 5.10.216-tegra includes exactly the same version of the
> driver found in this random GitHub mirror. (My intent was only to
> point out that they don't use 5.10's stmmac; I should have been more
> clear that I wasn't trying to link the same version, sorry!)
> b) This is vendor code; I don't know how good their testing/review
> process is. It might not run the way it looks. The intent seems to be
> for RQS > TQS (which makes intuitive sense), but as you're seeing the
> registers programmed the other way 'round, they might have gotten them
> subtly mixed up.
>
> > Now, as for FIFO sizes, if we sum up all the entries, then we
> > get:
> >
> > SUM(rx_fifo_size[0][]) = 60KiB
> > SUM(rx_fifo_size[1][]) = 64KiB
> > SUM(tx_fifo_size[0][]) = 60KiB
> > SUM(tx_fifo_size[1][]) = 64KiB
>
> I follow the math with 64KiB, but surely the 60KiB should be
> 9+9+9+9+1+1+1+1=40KiB?
> This seems to me that the "legacy EQOS" simply
> shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
> perhaps the FIFO size is an elaboration parameter? That would mean
> this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
> instantiation of it.

Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
trying to do anything.

However, I've tested with 0x7f in both fields, and it still falls flat
on its face. I've also tried other values, but because I had to unplug
the laptop from the nvidia board to use the laptop portably due to the
medical emergency situation, that caused screen to quit, so I've lost
all that. Chaos reigns supreme here :/

So, I'm not sure we understand what's going on - I don't think it's that
the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
results in some kind of throttling that prevents the condition which
hangs the hardware.

I'm not getting as much time as I'd like to really test out scenarios
due to everything that is going on, and honestly I feel like just
writing this week off now and giving up.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
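Scenario (2) in the message above can be sketched as a small model. This is only an illustration, not driver code: the function name polls_until_rbu() and its parameters are hypothetical, positions are kept as ever-increasing counters instead of wrapped ring indices, and it assumes the hardware consumes a fixed number of descriptors between polls while the driver always refills everything it processed. The 64-descriptor budget and the "100 consumed versus 64 repopulated" rates come from the message; the 512-entry ring size is the figure used elsewhere in the thread.

```c
#include <stdint.h>

/*
 * Toy model of the RX ring (hypothetical, for illustration only).
 * Returns the 1-based poll number at which the modelled hardware needs a
 * descriptor it does not own (the RBU condition), or -1 if that does not
 * happen within max_polls.
 */
int polls_until_rbu(uint64_t ring_size, uint64_t budget,
                    uint64_t hw_per_poll, int max_polls)
{
    uint64_t hw_pos = 0;   /* descriptors written back by hardware */
    uint64_t cur_rx = 0;   /* descriptors processed by the driver */
    uint64_t dirty_rx = 0; /* descriptors handed back to hardware */

    for (int poll = 1; poll <= max_polls; poll++) {
        /* Hardware owns everything up to one full ring past dirty_rx. */
        uint64_t owned_end = dirty_rx + ring_size;

        if (hw_pos + hw_per_poll > owned_end)
            return poll; /* hardware reached a descriptor with OWN=0 */
        hw_pos += hw_per_poll;

        /* One NAPI poll: process at most 'budget' completed entries... */
        uint64_t pending = hw_pos - cur_rx;
        uint64_t done = pending < budget ? pending : budget;

        cur_rx += done;
        /* ...then refill all of them, so cur_rx == dirty_rx afterwards. */
        dirty_rx = cur_rx;
    }
    return -1;
}
```

Under these assumptions, polls_until_rbu(512, 64, 100, 1000) returns 13: the hardware gains 36 descriptors per poll on the refill point until it runs out of entries it owns. In the model cur_rx equals dirty_rx after every poll, so when RBU fires between polls the two indexes are equal even though the hardware is far ahead of both. When the arrival rate does not exceed the budget, the model never reports RBU.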
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-15 19:37 ` Russell King (Oracle)
@ 2026-04-15 20:50 ` Sam Edwards
2026-04-16 0:02 ` Russell King (Oracle)
0 siblings, 1 reply; 20+ messages in thread
From: Sam Edwards @ 2026-04-15 20:50 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> It's not a question about how I define RBU - this is defined by Synopsys
> and I'm using it *exactly* that way as stated in the documentation.
>
> "This bit indicates that the host owns the Next Descriptor in the
> Receive List and the DMA cannot acquire it. The Receive Process is
> suspended. ... This bit is set only when the previous Receive
> Descriptor is owned by the DMA."
>
> In other words, DMA has processed the previous receive descriptor which
> _was_ owned by the hardware, written back to clear the OWN bit, and
> then fetches the next descriptor and finds that the OWN bit is also
> clear.

I'm only trying to leave open the possibility that the Synopsys
technical writer and the hardware implementation team weren't
communicating clearly. We already have a situation where RPS isn't
behaving as documented (even if that's likely just hardware
misconfiguration), so while I'm currently pretty sure RBU carries no
other (actual) meaning than "DMA caught up to OWN=0," I'm only about
75% confident.

> > It would seem* that the kernel isn't really failing to keep up with
> > the packet rate. If RBU is firing with a ring that's not even close to
> > empty, that tells me there's another way for it to fire. So I suspect
> > the hardware designers implemented it to mean:
> > "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> >
> > (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> > is actually completely depleted, not full? If count==budget, wouldn't
> > that mean the whole ring hasn't been visited, so we only refilled 64
> > entries and not necessarily the entire ring? Maybe the kernel isn't
> > keeping up after all.)
>
> Ah, I think that's where our terminology differs.
>
> You seem to define full as "populated with empty buffers". I define
> full to mean "the hardware has filled every buffer with a packet that
> it has received and handed it over to software to process." Note even
> the terminology there - filling buffers with data. That ultimately
> ends up filling the ring, and when completely filled, it is full.
>
> I think of buffers like buckets. If a buffer contains no data, it
> is empty. If a buffer contains data, it has been filled or is full.
> Apply that to a list of buffers and you get the same thing. Much
> Ethernet driver documentation uses this same terminology, so I
> thought it would be widely understood.

Ah okay, I was beginning to suspect the same. In my defense: though I
also think of buffers in the same way, this driver calls the process of
supplying empty buffers "refilling," which is also the terminology
we've both been using throughout this exchange, and when something is
"completely refilled" I generally call it "full." But I'm realizing now
that the bidirectional (submissions+completions) nature of this ring
means that "full" and "empty" aren't really well-defined concepts. I'll
try to read more carefully (and switch to saying "completely dirty" and
"completely clean") going forward.

So the kernel is able to supply clean buffers without issue, but it
somehow falls behind the incoming packet rate and the DMA is left with
a completely dirty ring.
I agree that stmmac_rx() is therefore just not running fast enough:
either it's got really bad scheduler jitter for the ~6.3ms minimum it
takes for 512x full-sized Ethernet frames to arrive from the PHY (your
scenario 1), or -- more likely -- the NAPI budgets gradually fall
behind the hardware (your scenario 2).

> Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
> trying to do anything.
>
> However, I've tested with 0x7f in both fields, and it still falls flat
> on its face. I've also tried other values, but because I had to unplug
> the laptop from the nvidia board to use the laptop portably due to the
> medical emergency situation, that caused screen to quit, so I've lost
> all that. Chaos reigns supreme here :/

I'm sorry to hear about that, please prioritize you/yours and don't
feel like you owe me speedy replies.

> So, I'm not sure we understand what's going on - I don't think it's that
> the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
> results in some kind of throttling that prevents the condition which
> hangs the hardware.

I'll try playing with the FIFO configuration on my end to learn:
a) If a suitably-configured FIFO size makes the RPS status arrive as
documented
b) If I can safely fill the FIFO slowly (by manually stalling the
driver and adding frames one at a time) and have it drain on resume
c) Whether the TQS value can be adjusted independently of this
problem's prevalence
d) The maximum RQS value that allows the problem to happen

> I'm not getting as much time as I'd like to really test out scenarios
> due to everything that is going on, and honestly I feel like just
> writing this week off now and giving up.

I have the same hardware, observe the same issue, and find this
interesting enough to keep plugging away at it. I would have no hard
feelings if you left me alone with this problem for a bit. :)

Be well,
Sam

^ permalink raw reply [flat|nested] 20+ messages in thread
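The "~6.3ms minimum" figure in the message above can be reproduced with a short calculation. The sketch below is an assumption-laden illustration, not something taken from the driver: it assumes a 1 Gbit/s link, 1518-byte frames (the buffer length mentioned earlier in the thread), and the standard Ethernet per-frame overhead of 8 bytes of preamble/SFD plus a 12-byte inter-frame gap. The function name ring_fill_time_ns() is hypothetical.

```c
#include <stdint.h>

/*
 * Time, in nanoseconds, for 'frames' back-to-back frames of 'frame_bytes'
 * each (including FCS) to arrive on a link running at 'link_bps' bits per
 * second. Hypothetical helper for illustration only.
 */
uint64_t ring_fill_time_ns(uint64_t frames, uint64_t frame_bytes,
                           uint64_t link_bps)
{
    /* Preamble + SFD (8 bytes) and minimum inter-frame gap (12 bytes). */
    const uint64_t overhead_bytes = 8 + 12;
    uint64_t bits = frames * (frame_bytes + overhead_bytes) * 8;

    return bits * 1000000000ULL / link_bps;
}
```

With these assumptions, ring_fill_time_ns(512, 1518, 1000000000ULL) gives 6299648 ns, that is, roughly 6.3 ms for the hardware to consume a 512-entry ring at line rate.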
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-15 20:50 ` Sam Edwards
@ 2026-04-16 0:02 ` Russell King (Oracle)
0 siblings, 0 replies; 20+ messages in thread
From: Russell King (Oracle) @ 2026-04-16 0:02 UTC (permalink / raw)
To: Sam Edwards
Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
linux-stm32, Linux Network Development Mailing List, Paolo Abeni

On Wed, Apr 15, 2026 at 01:50:53PM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > It's not a question about how I define RBU - this is defined by Synopsys
> > and I'm using it *exactly* that way as stated in the documentation.
> >
> > "This bit indicates that the host owns the Next Descriptor in the
> > Receive List and the DMA cannot acquire it. The Receive Process is
> > suspended. ... This bit is set only when the previous Receive
> > Descriptor is owned by the DMA."
> >
> > In other words, DMA has processed the previous receive descriptor which
> > _was_ owned by the hardware, written back to clear the OWN bit, and
> > then fetches the next descriptor and finds that the OWN bit is also
> > clear.
>
> I'm only trying to leave open the possibility that the Synopsys
> technical writer and the hardware implementation team weren't
> communicating clearly. We already have a situation where RPS isn't
> behaving as documented (even if that's likely just hardware
> misconfiguration), so while I'm currently pretty sure RBU carries no
> other (actual) meaning than "DMA caught up to OWN=0," I'm only about
> 75% confident.

It doesn't make sense for RPS to be set though. RPS is "Receive Process
Stopped" and it's documented as being raised when the receive process
enters the stopped state.

If we look at the DMA Debug Status 0 register at 0x100c, then this
gives us a four bit bitfield for channels 0, 1 and 2.
Further channels are in 0x1010. I've added code to dump these when RBU
occurs:

dwc-eth-dwmac 2490000.ethernet eth0: debug status: 0x00006400 0x00000000

bits 11:8 are RPS0, which indicates that the DMA channel 0 receive
process state is "Suspended (Rx Descriptor Unavailable)". If this were
0, then it would be "Stopped (Reset or Stop Receive Command issued)".

So, RPS isn't being raised because the process state isn't entering
the stopped state, which makes sense - because we haven't issued a
stop command, nor have we caused a reset, and the documented recovery
from this condition is to merely advance the tail pointer, rather than
issuing a command to re-start the receive process.

When this is done (because stmmac_rx() continues to periodically run
because of NAPI) RPS0 does change back to 3 "Running (Waiting for Rx
packet)" but it seems that although there are packets waiting to be
written out, that never happens (the Queue 0 Receive Debug register
indicates that there are packets in the receive queue, the receive
queue fill level is above the flow control activate threshold, and the
MAC itself hammers the network with pause frames as a result.)

Thus, I think that the fact that RPS isn't being signalled is entirely
reasonable and consistent with the available documentation.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply [flat|nested] 20+ messages in thread
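The decoding of the debug status dump in the message above can be sketched as follows. This is a minimal illustration, not the author's debug code: the macro and function names are hypothetical, and only the states named in the message are given text. The value 4 is mapped to "Suspended (Rx Descriptor Unavailable)" because that is what bits 11:8 of the quoted dump (0x00006400) contain and how the message interprets them.

```c
#include <stdint.h>

/* RPS0 occupies bits 11:8 of DMA Debug Status 0 (per the message). */
#define DBG_STS0_RPS0_SHIFT 8    /* hypothetical macro name */
#define DBG_STS0_RPS0_MASK  0xfu /* hypothetical macro name */

/* Extract the channel 0 receive process state field. */
unsigned int dbg_sts0_rps0(uint32_t debug_status0)
{
    return (debug_status0 >> DBG_STS0_RPS0_SHIFT) & DBG_STS0_RPS0_MASK;
}

/* Name only the states that the thread itself identifies. */
const char *rps_state_name(unsigned int rps)
{
    switch (rps) {
    case 0:
        return "Stopped (Reset or Stop Receive Command issued)";
    case 3:
        return "Running (Waiting for Rx packet)";
    case 4:
        return "Suspended (Rx Descriptor Unavailable)";
    default:
        return "(state not named in this thread)";
    }
}
```

For the dumped value, dbg_sts0_rps0(0x00006400) yields 4, which matches the "Suspended" reading given in the message and differs from the 0 that a stopped receive process would show.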
* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
2026-04-12 14:01 ` Maxime Chevallier
2026-04-13 18:02 ` Jakub Kicinski
@ 2026-04-13 22:00 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-04-13 22:00 UTC (permalink / raw)
To: Russell King
Cc: andrew, alexandre.torgue, andrew+netdev, davem, edumazet, kuba,
linux-arm-kernel, linux-stm32, netdev, pabeni, cfsworks

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 14:07:51 +0100 you wrote:
> Enable receive process stopped and receive buffer unavailable
> interrupts, so that the statistic counters can be updated.
>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Since we are seeing receive buffer exhaustion on several platforms,
> let's enable the interrupts so the statistics we publish via ethtool -S
> actually work to aid diagnosis. I've been in two minds about whether
> to send this patch, but given the problems with stmmac at the moment,
> I think it should be merged.
>
> [...]

Here is the summary with links:
- [net-next] net: stmmac: enable RPS and RBU interrupts
https://git.kernel.org/netdev/net-next/c/1b9707e6f1a9

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2026-04-16 0:03 UTC | newest]

Thread overview: 20+ messages
2026-04-10 13:07 [PATCH net-next] net: stmmac: enable RPS and RBU interrupts Russell King (Oracle)
2026-04-12 14:01 ` Maxime Chevallier
2026-04-12 14:23 ` Russell King (Oracle)
2026-04-13 1:42 ` Sam Edwards
2026-04-13 7:24 ` Russell King (Oracle)
2026-04-13 7:28 ` Russell King (Oracle)
2026-04-13 18:02 ` Jakub Kicinski
2026-04-13 18:49 ` Russell King (Oracle)
2026-04-13 20:50 ` Jakub Kicinski
2026-04-13 20:53 ` Russell King (Oracle)
2026-04-13 21:54 ` Sam Edwards
2026-04-14 14:13 ` Russell King (Oracle)
2026-04-15 1:19 ` Russell King (Oracle)
2026-04-15 2:12 ` Sam Edwards
2026-04-15 12:43 ` Russell King (Oracle)
2026-04-15 17:38 ` Sam Edwards
2026-04-15 19:37 ` Russell King (Oracle)
2026-04-15 20:50 ` Sam Edwards
2026-04-16 0:02 ` Russell King (Oracle)
2026-04-13 22:00 ` patchwork-bot+netdevbpf