From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Alexandre Torgue <alexandre.torgue@foss.st.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-stm32@st-md-mailman.stormreply.com, netdev@vger.kernel.org,
	Paolo Abeni <pabeni@redhat.com>, Sam Edwards <cfsworks@gmail.com>
Subject: Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
Date: Mon, 13 Apr 2026 19:49:46 +0100
Message-ID: <ad06yiZZbLC9k3jY@shell.armlinux.org.uk>
In-Reply-To: <20260413110222.49fc3759@kernel.org>

On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > Since we are seeing receive buffer exhaustion on several platforms,
> > let's enable the interrupts so the statistics we publish via ethtool -S
> > actually work to aid diagnosis. I've been in two minds about whether
> > to send this patch, but given the problems with stmmac at the moment,
> > I think it should be merged.
> 
> Sorry for an under-researched response but wasn't there a person trying
> to fix the OOM starvation issue? Who was supposed to add a timer?
> Is your problem also OOM related or do you suspect something else?

It is not OOM related. I have this patch applied:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 131ea887bedc..614d0e10e3e6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 
 		if (!buf->page) {
 			buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
-			if (!buf->page)
+			if (!buf->page) {
+				netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
 				break;
+			}
 		}
 
 		if (priv->sph_active && !buf->sec_page) {
 			buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
-			if (!buf->sec_page)
+			if (!buf->sec_page) {
+				netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
 				break;
+			}
 
 			buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
 		}

and it is silent, so we are not suffering starvation of buffers.

However, the hardware hangs during iperf3. Because the hang triggers the
MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
unmanaged switches that always use flow control between them (there's no
way to disable it), it takes down the entire network, as we've discussed
before. So, this problem is pretty fatal to the *entire* network.

With this patch, the existing statistical counters for this condition
are incremented, and thus users can use ethtool -S to see what happened
and report whether they are seeing the same issue.

Without this patch applied, there are no diagnostics from stmmac that
report what the state is. ethtool -d doesn't list the appropriate
registers (I suspect partly because the number of queues is somewhat
dynamic: userspace can change that configuration through ethtool).

Thus, one has to resort to using devmem2 to find out what's happened.
That's not user friendly.

For me, devmem2 shows:

Channel 0 status register:
Value at address 0x02491160: 0x00000484
bit 10: ETI early transmit interrupt - set
bit 9 : RWT receive watchdog - clear
bit 8 : RPS receive process stopped - clear
bit 7 : RBU receive buffer unavailable - set
bit 6 : RI  receive interrupt - clear
bit 2 : TBU transmit buffer unavailable - set
bit 1 : TPS transmit process stopped - clear
bit 0 : TI  transmit interrupt - clear
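
For anyone wanting to repeat this decode on their own devmem2 output,
here is a minimal userspace sketch. It is not driver code. The bit
positions and abbreviations are only the ones listed above; the names
chan_status_bits and decode_chan_status() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions and abbreviations as listed in the decode above. */
struct status_bit {
	unsigned int bit;
	const char *name;
};

static const struct status_bit chan_status_bits[] = {
	{ 10, "ETI" }, { 9, "RWT" }, { 8, "RPS" }, { 7, "RBU" },
	{ 6, "RI" }, { 2, "TBU" }, { 1, "TPS" }, { 0, "TI" },
};

/*
 * Write the names of all set bits into buf, space separated, highest
 * bit first. Returns the number of names written (hypothetical helper).
 */
static int decode_chan_status(uint32_t val, char *buf, size_t len)
{
	size_t i, used = 0;
	int count = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(chan_status_bits) / sizeof(chan_status_bits[0]); i++) {
		int n;

		if (!(val & (UINT32_C(1) << chan_status_bits[i].bit)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", chan_status_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */

		used += (size_t)n;
		count++;
	}

	return count;
}
```

Calling decode_chan_status(0x00000484, buf, sizeof(buf)) with a 64-byte
buffer gives "ETI RBU TBU", matching the bits marked as set above.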

Debug status register:
Value at address 0x0249100c: 0x00006300
TPS[3:0] = 6 = Suspended, Tx descriptor unavailable or Tx buffer
		underflow
RPS[3:0] = 3 = Running, waiting for Rx packet

MTL Queue 0 debug register:
Value at address 0x02490d38: 0x002e0020
PRXQ[13:0] = 0x2e = 46 packets in receive queue
RXQSTS[1:0] = 2 = Rx queue fill-level above flow-control activate
		threshold
RRCSTS[1:0] = 0 = Rx Queue Read Controller State = Idle
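
The field extraction for the two debug registers can be sketched the
same way. Note the assumptions: the shifts below (TPS at bits 15:12,
RPS at 11:8, PRXQ at 29:16, RXQSTS at 5:4, RRCSTS at 2:1) are inferred
from the raw values and decoded fields shown above, not copied from a
databook, so check them against your core's documentation. The struct
and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a right-aligned field of 'width' bits starting at bit 'shift'. */
static uint32_t get_field(uint32_t val, unsigned int shift, unsigned int width)
{
	assert(width > 0 && width < 32 && shift + width <= 32);
	return (val >> shift) & ((UINT32_C(1) << width) - 1);
}

struct dma_debug {
	uint32_t tps;	/* Tx process state, channel 0 */
	uint32_t rps;	/* Rx process state, channel 0 */
};

struct rxq_debug {
	uint32_t prxq;		/* packets in receive queue */
	uint32_t rxqsts;	/* Rx queue fill-level status */
	uint32_t rrcsts;	/* Rx queue read controller state */
};

/* Assumed layout: TPS in bits 15:12, RPS in bits 11:8. */
static struct dma_debug decode_dma_debug(uint32_t val)
{
	struct dma_debug d = {
		.tps = get_field(val, 12, 4),
		.rps = get_field(val, 8, 4),
	};
	return d;
}

/* Assumed layout: PRXQ in bits 29:16, RXQSTS in 5:4, RRCSTS in 2:1. */
static struct rxq_debug decode_rxq_debug(uint32_t val)
{
	struct rxq_debug d = {
		.prxq = get_field(val, 16, 14),
		.rxqsts = get_field(val, 4, 2),
		.rrcsts = get_field(val, 1, 2),
	};
	return d;
}
```

With the values above, decode_dma_debug(0x00006300) gives tps = 6 and
rps = 3, and decode_rxq_debug(0x002e0020) gives prxq = 46, rxqsts = 2
and rrcsts = 0.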

> Firing interrupts when Rx fill ring runs dry (which IIUC this patch
> does?) is not a good idea.

Well, I'm thinking that at least on some platforms, such as the Jetson
Xavier NX, unless a different solution can be found, we need the RBU
interrupt to fire off a reset of the stmmac IP when this happens to
reduce the PAUSE frame flood on the network.

If we can't do that, then I think stmmac on these platforms needs to be
marked with CONFIG_BROKEN because right now there doesn't seem to be any
other viable solution.

My intention with this patch is merely to start collecting the already
existing statistics so other users can start seeing whether they are
hitting the same or similar problem. If we're not prepared to do that,
then we should delete the useless statistics from ethtool -S, but I
suspect they're now part of the UAPI, even though without this patch
they will remain steadfastly stuck at zero.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
