From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Lukasz Raczylo" <lukasz@raczylo.com>, <netdev@vger.kernel.org>
Cc: "Nicolas Ferre" <nicolas.ferre@microchip.com>,
"Claudiu Beznea" <claudiu.beznea@tuxon.dev>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-rpi-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Thu, 14 May 2026 12:31:02 +0200 [thread overview]
Message-ID: <DIIBWIZCWTLY.P8H6UNUWM22K@bootlin.com> (raw)
In-Reply-To: <cover.1777064117.git.lukasz@raczylo.com>
Hello Lukasz,
On Sat Apr 25, 2026 at 12:38 AM CEST, Lukasz Raczylo wrote:
> This series proposes three candidate fixes for the silent TX stall
> observed on Raspberry Pi 5 (BCM2712 SoC, Cadence GEM via RP1 PCIe
> south bridge). The bug has been reported, with reproducers, at:
I've taken over the MACB driver maintenance following Clandiu & Nicolas'
work. I have read with curiosity your series and links attached
(though I skimmed over parts because there's been a lot of discussion).
Have you moved forward since your initial post? I've seen your
still-working-on-it message from May 8th on the rpi kernel PR. I have a
few remarks and/or questions:
- You still think the two fix patches solve it? Any clearer picture of
which of the two patches inbetween [1/3] or[2/3] fixes it? Does
[3/3] ever trigger on your targets?
- Can you clarify the exact symptoms? I've seen a few contradictory
facts. Two I remember:
- You say here it is a Tx stall but I've seen messages in the linked
threads that say explicitely broken Tx & Rx.
https://github.com/cilium/cilium/issues/43198#issue-3706713821
- You say here link down/up fixes it, but there is a comment that
says they unload/reload the module (rtheobald). They don't say
explicitely that link down/up doesn't work for them though, but
someone before in the thread recommended link down/up. Another
one says "Only power cycle recovers the node" (lexfrei).
- Also, some messages point out disabling TSO / SG / EEE helped it.
Any comment on that? It would help point fingers.
- Some comments are about DT props missing. Is that lead dead now?
- I've seen no mention of the bug having been reproduced on upstream
kernel (?). What does the rpi kernel bring to the table that makes
everyone use it?
- Anything was found to increase the reproducibility of the bug? If it
was then a bisect could be made possible, as I've seen mentions that
it didn't appear on some older kernels.
Now about the patches:
> Reading the current driver we identified three plausible races
> between driver and hardware, each of which could independently
> produce the observed behaviour. We did not determine which is the
> actual root cause -- that likely requires either BCM2712/RP1
> documentation we do not have, or dynamic tracing of the driver
> during an in-situ stall. The series therefore attempts to close
> all three, with each commit message stating which specific race
> that patch is targeting.
>
> Patch 1/3 -- flush PCIe posted write after TSTART doorbell.
> Writes to NCR are posted PCIe writes and may not reach the MAC
> before the driver returns. If the TSTART doorbell is lost, no
> TX starts, no TCOMP arrives, and the ring goes quiescent. A
> read-back of NCR after the write is a standard read-after-write
> PCIe flush.
- Makes sense, but only on MACB mounted over PCI, which is not the
majority.
- IDK if we can do better than a readl(NCR) on all platforms.
- I am surprised it is the only writel() that needs to be flushed?
> Patch 2/3 -- re-check ISR after IER re-enable in macb_tx_poll().
> An existing comment in macb_tx_poll() notes that completions
> raised while TCOMP is masked do not re-fire when IER is
> re-enabled, and mitigates the window with macb_tx_complete_pending(),
> which inspects driver-visible ring state only (after rmb()). On
> PCIe-attached parts the descriptor DMA write that sets TX_USED
> can remain in flight when that check runs; the rmb() orders CPU
> writes but does not retire peripheral DMA. Reading ISR directly
> after IER re-enable addresses this in two ways: (a) the MMIO read
> is an architected PCIe read barrier for prior DMA writes, so a
> subsequent macb_tx_complete_pending() sees up-to-date TX_USED
> state; (b) it directly observes a pending TCOMP bit if the
> hardware has one set. Either signal reschedules NAPI.
This will not fly because ISR might be read-to-clear.
See macb_queue_isr_clear() and how it is used. So we cannot re-read ISR
safely on those platforms.
> Patch 3/3 -- TX stall watchdog. Defence-in-depth. If patches
> 1 and 2 close the races we identified, this patch performs a
> single spin_lock_irqsave/unlock and a branch per queue per
> second with no other effect. If a further race remains that we
> have not identified, it invokes the driver's own existing
> macb_tx_restart(), which already verifies that TBQP is behind
> tx_head before re-asserting TSTART. We include this patch
> because we have empirically observed multi-minute stalls on this
> hardware; we are willing to drop it if the preference is for
> 1 and 2 to stand alone.
Good idea, but that is what ndo_tx_timeout() is meant for no? It is a
mechanism that is not specific to our HW so that should be implemented
at the subsystem level, and it looks like it already is. :-)
We are aware of a few software scheduling races that we plan on fixing.
If your above patches ended up not fixing the issue, you could look
into those.
https://lore.kernel.org/netdev/DHIT9TPJQJ46.21A89R5UAFXVH@bootlin.com/
Thanks!
--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
prev parent reply other threads:[~2026-05-14 10:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17 ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30 ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DIIBWIZCWTLY.P8H6UNUWM22K@bootlin.com \
--to=theo.lebrun@bootlin.com \
--cc=andrew+netdev@lunn.ch \
--cc=claudiu.beznea@tuxon.dev \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rpi-kernel@lists.infradead.org \
--cc=lukasz@raczylo.com \
--cc=netdev@vger.kernel.org \
--cc=nicolas.ferre@microchip.com \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox