From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Lukasz Raczylo" <lukasz@raczylo.com>, <netdev@vger.kernel.org>
Cc: "Nicolas Ferre" <nicolas.ferre@microchip.com>,
"Claudiu Beznea" <claudiu.beznea@tuxon.dev>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-rpi-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Thu, 14 May 2026 12:31:02 +0200 [thread overview]
Message-ID: <DIIBWIZCWTLY.P8H6UNUWM22K@bootlin.com> (raw)
In-Reply-To: <cover.1777064117.git.lukasz@raczylo.com>
Hello Lukasz,
On Sat Apr 25, 2026 at 12:38 AM CEST, Lukasz Raczylo wrote:
> This series proposes three candidate fixes for the silent TX stall
> observed on Raspberry Pi 5 (BCM2712 SoC, Cadence GEM via RP1 PCIe
> south bridge). The bug has been reported, with reproducers, at:
I've taken over the MACB driver maintenance following Clandiu & Nicolas'
work. I have read with curiosity your series and links attached
(though I skimmed over parts because there's been a lot of discussion).
Have you moved forward since your initial post? I've seen your
still-working-on-it message from May 8th on the rpi kernel PR. I have a
few remarks and/or questions:
- You still think the two fix patches solve it? Any clearer picture of
which of the two patches inbetween [1/3] or[2/3] fixes it? Does
[3/3] ever trigger on your targets?
- Can you clarify the exact symptoms? I've seen a few contradictory
facts. Two I remember:
- You say here it is a Tx stall but I've seen messages in the linked
threads that say explicitely broken Tx & Rx.
https://github.com/cilium/cilium/issues/43198#issue-3706713821
- You say here link down/up fixes it, but there is a comment that
says they unload/reload the module (rtheobald). They don't say
explicitely that link down/up doesn't work for them though, but
someone before in the thread recommended link down/up. Another
one says "Only power cycle recovers the node" (lexfrei).
- Also, some messages point out disabling TSO / SG / EEE helped it.
Any comment on that? It would help point fingers.
- Some comments are about DT props missing. Is that lead dead now?
- I've seen no mention of the bug having been reproduced on upstream
kernel (?). What does the rpi kernel bring to the table that makes
everyone use it?
- Anything was found to increase the reproducibility of the bug? If it
was then a bisect could be made possible, as I've seen mentions that
it didn't appear on some older kernels.
Now about the patches:
> Reading the current driver we identified three plausible races
> between driver and hardware, each of which could independently
> produce the observed behaviour. We did not determine which is the
> actual root cause -- that likely requires either BCM2712/RP1
> documentation we do not have, or dynamic tracing of the driver
> during an in-situ stall. The series therefore attempts to close
> all three, with each commit message stating which specific race
> that patch is targeting.
>
> Patch 1/3 -- flush PCIe posted write after TSTART doorbell.
> Writes to NCR are posted PCIe writes and may not reach the MAC
> before the driver returns. If the TSTART doorbell is lost, no
> TX starts, no TCOMP arrives, and the ring goes quiescent. A
> read-back of NCR after the write is a standard read-after-write
> PCIe flush.
- Makes sense, but only on MACB mounted over PCI, which is not the
majority.
- IDK if we can do better than a readl(NCR) on all platforms.
- I am surprised it is the only writel() that needs to be flushed?
> Patch 2/3 -- re-check ISR after IER re-enable in macb_tx_poll().
> An existing comment in macb_tx_poll() notes that completions
> raised while TCOMP is masked do not re-fire when IER is
> re-enabled, and mitigates the window with macb_tx_complete_pending(),
> which inspects driver-visible ring state only (after rmb()). On
> PCIe-attached parts the descriptor DMA write that sets TX_USED
> can remain in flight when that check runs; the rmb() orders CPU
> writes but does not retire peripheral DMA. Reading ISR directly
> after IER re-enable addresses this in two ways: (a) the MMIO read
> is an architected PCIe read barrier for prior DMA writes, so a
> subsequent macb_tx_complete_pending() sees up-to-date TX_USED
> state; (b) it directly observes a pending TCOMP bit if the
> hardware has one set. Either signal reschedules NAPI.
This will not fly because ISR might be read-to-clear.
See macb_queue_isr_clear() and how it is used. So we cannot re-read ISR
safely on those platforms.
> Patch 3/3 -- TX stall watchdog. Defence-in-depth. If patches
> 1 and 2 close the races we identified, this patch performs a
> single spin_lock_irqsave/unlock and a branch per queue per
> second with no other effect. If a further race remains that we
> have not identified, it invokes the driver's own existing
> macb_tx_restart(), which already verifies that TBQP is behind
> tx_head before re-asserting TSTART. We include this patch
> because we have empirically observed multi-minute stalls on this
> hardware; we are willing to drop it if the preference is for
> 1 and 2 to stand alone.
Good idea, but that is what ndo_tx_timeout() is meant for no? It is a
mechanism that is not specific to our HW so that should be implemented
at the subsystem level, and it looks like it already is. :-)
We are aware of a few software scheduling races that we plan on fixing.
If your above patches ended up not fixing the issue, you could look
into those.
https://lore.kernel.org/netdev/DHIT9TPJQJ46.21A89R5UAFXVH@bootlin.com/
Thanks!
--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
next prev parent reply other threads:[~2026-05-14 10:31 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17 ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30 ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun [this message]
2026-05-14 21:51 ` Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 " Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after TSTART doorbell (PCIe-only) Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
2026-05-15 9:53 ` [PATCH net-next] net: macb: fix build of TX stall watchdog by replacing undefined netdev_warn_ratelimited Lukasz Raczylo
2026-05-15 12:47 ` Andrew Lunn
2026-05-15 13:08 ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DIIBWIZCWTLY.P8H6UNUWM22K@bootlin.com \
--to=theo.lebrun@bootlin.com \
--cc=andrew+netdev@lunn.ch \
--cc=claudiu.beznea@tuxon.dev \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rpi-kernel@lists.infradead.org \
--cc=lukasz@raczylo.com \
--cc=netdev@vger.kernel.org \
--cc=nicolas.ferre@microchip.com \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.