From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Lukasz Raczylo" <lukasz@raczylo.com>, <netdev@vger.kernel.org>
Cc: "Andrea della Porta" <andrea.porta@suse.com>,
"Nicolas Ferre" <nicolas.ferre@microchip.com>,
"Claudiu Beznea" <claudiu.beznea@tuxon.dev>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-rpi-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Sat, 16 May 2026 11:36:46 +0200 [thread overview]
Message-ID: <DIK002QFFNBY.31C3KUX2SQC6W@bootlin.com> (raw)
In-Reply-To: <20260514215129.33559-1-lukasz@raczylo.com>
Hello Lukasz,
On Thu May 14, 2026 at 11:51 PM CEST, Lukasz Raczylo wrote:
> Andrea, Théo --
>
> Thanks both. Replying to multiple points in one mail since they
> intersect.
Hum, you forgot some parts of my email to address. The good thing with
interleaved replies is that you see what you need to reply to.
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#use-trimmed-interleaved-replies-in-email-discussions
No big deal.
> First a correction to the v1 cover: the "zero events post-patch"
> claim was true only at the user-space watchdog visibility level.
> With patch 3's warn made unconditional (which is what Andrea's
> review re-tested with on v6.19.10), kernel-level evidence on my
> side now matches what Andrea saw -- patches 1 and 2 are partial,
> patch 3 is empirically the load-bearing fix on this platform.
> The v2 cover acknowledges this and reframes.
[...]
> ## Théo -- the four questions on the cover
>
> Welcome to macb maintenance. Quick answers; raw 1 Hz traces,
> dmesg dumps, and event logs available on request for any of
> these.
Thanks!
> 1. Tx-only or Tx+Rx broken?
[...]
ACK, makes sense
> 2. Recovery: link down/up vs module reload vs power cycle?
[...]
ACK, makes sense
> 3. TSO / SG / EEE disabling helped some reporters?
>
> Mixed picture, with one gap in my matrix.
>
> Tested fleet-wide since 2026-04-24 as baseline before this
> series:
> * EEE off (--set-eee end0 eee off + advertise 0x0)
> * TSO off (-K tso off)
> * GSO off (-K gso off)
> * RX/TX rings 4096/2048 (default 512)
> * IRQ affinity moved off CPU0 to CPU3
> * CPU governor schedutil (default) -- earlier tested
> performance, no change
> * qdisc fq -> pfifo_fast
>
> With all of those, the stall still fires. Pre-patch
> fleet rate was multiple per hour with these knobs already
> applied.
>
> Untested by me: TSO + SG (scatter-gather, NETIF_F_SG) off
> *together*. That is the specific combination rtheobald
> (cilium#43198 comment 4188846955) and the launchpad
> commenter (#34) report as masking the stall -- "must be
> both, not just one". I tested TSO + GSO, not TSO + SG.
>
> Both patch 1 (PCIe-fabric race on TSTART) and patch 2
> (peripheral DMA retirement race on TX_USED) are
> consistent with descriptor-fragment-path interactions
> that SG-off would mask, so the workaround being real
> isn't surprising. Closing the race rather than masking
> it should still be the right thing for mainline. Happy
> to canary-test TSO+SG-off on one node if you want the
> data point before v2 review.
Honestly I was surprised by that comment and doubted it. I don't see how
those could be related. Don't bother testing the full matrix too much.
> 4. cdns,*-max-pipe DT props -- lead dead?
[...]
ACK makes sense
> ## My own audit -- two issues v2 fixes
>
> Worth disclosing before someone else catches them.
>
> * Patch 2 (v1) reads ISR in macb_tx_poll(), masks off
> everything except TCOMP, and discards the rest.
> raspberrypi_rp1_config does not set
> MACB_CAPS_ISR_CLEAR_ON_WRITE -- the driver assumes
> read-clear ISR semantics on RP1, and macb_interrupt()
> relies on processing every bit it reads in one pass for
> that case to be correct. My v1 patch breaks that contract:
> any RCOMP / ROVR / TXUBR / etc. set in ISR at the moment of
> my read is silently consumed and never processed. RCOMP
> being lost is the worst case (RX completion never scheduled
> until something else asserts the line). Race window is
> ~200-500 ns per macb_tx_poll completion; significant at
> moderate-to-heavy RX load on a level-triggered IRQ where
> the consumed bit drops the line before GIC delivery.
Careful, you mention your raspberrypi_rp1_config doesn't set
MACB_CAPS_ISR_CLEAR_ON_WRITE but there is a catch: we expect that cap
to come from runtime discovery, see macb_configure_caps() and
register DCFG1/0x0280.
My understanding is that read-to-clear is only for old variants and all
modern MACB/GEM instances use write-to-clear.
I don't expect your RP1 HW uses R2C; my past comment was about making
sure you support both W2C and R2C to avoid breaking the driver for
some.
We have `dev_dbg(..., "Cadence caps 0x%08x\n", bp->caps)` to dump caps,
including the runtime detected ones.
> v2 patch 2 drops the ISR read entirely and substitutes
> (void)queue_readl(queue, IMR). IMR is the read-only mask
> mirror, no side effects, still flushes prior peripheral
> DMA writes via PCIe ordering. Loses the "directly sample
> latched TCOMP" half of v1's claim; keeps the PCIe-barrier
> half (which is the half that actually addresses the
> documented race in the existing macb_tx_complete_pending()
> rmb() comment).
>
> * Patch 3 (v1) boot-time false positive described above to
> Andrea. v2 fixes:
>
> - netif_carrier_ok() gate -- no carrier, no completion is
> possible, don't fire.
>
> - tracks tail movement via a bool tx_stall_tail_moved set
> by macb_tx_complete() under tx_ptr_lock when tail
> advances, cleared by the watchdog tick on the same
> lock. Form suggested by pelwell when he reviewed the
> same series anchored against rpi-6.18.y at
> raspberrypi/linux#7340 (merged 2026-05-08); 13 days of
> fleet runtime in this form since 2026-05-02 across 24
> nodes. Folded into mainline v2 patch 3 directly rather
> than carried as a fix-up.
>
> - netdev_warn_once -> netdev_warn_ratelimited per
> Andrea's ask -- operator visibility doesn't disappear
> after the first fire.
>
>
> ## v2 follows
>
> Sending [PATCH net-next v2 0/3] threaded under the v1 cover
> shortly. Plan to drop "RFC" from the subject prefix (~3 weeks
> production runtime + the rpi in-tree merge tip the balance
> toward a regular PATCH); happy to revert to RFC if you'd
> prefer the iterative-review cadence.
[...]
So you skimmed over two parts of my initial comment:
- Upstream kernel? From your messages it seems you only build test.
My question is two-fold:
- (1) Can we confirm the bug reproduces on upstream?
- (2) For my personal knowledge, can you tell me why everyone uses
the Raspberry Pi vendor kernel? I see an upstream DT for rpi5.
I'll boot one on my desk next week (hopefully).
- .ndo_tx_timeout() is probably the right approach for [3/3].
---
My recommandation would be this:
Let's get a .ndo_tx_timeout() implementation merged that will mitigate
the issue. If you can prove [1/3] and/or [2/3] are useful, then we can
get those in as well. Otherwise we drop them and keep digging to find
the Tx stall reason(s).
Those are bug-fixes, HW is broken-ish for a good chunk of users, so they
should target net and not net-next. You need a `Fixes:` trailer, it
will probably date back a long way.
Thanks Lukasz!
--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
next prev parent reply other threads:[~2026-05-16 9:37 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17 ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30 ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun
2026-05-14 21:51 ` Lukasz Raczylo
2026-05-16 9:36 ` Théo Lebrun [this message]
2026-05-14 21:54 ` [PATCH net-next v2 " Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after TSTART doorbell (PCIe-only) Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
2026-05-15 9:53 ` [PATCH net-next] net: macb: fix build of TX stall watchdog by replacing undefined netdev_warn_ratelimited Lukasz Raczylo
2026-05-15 12:47 ` Andrew Lunn
2026-05-15 13:08 ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DIK002QFFNBY.31C3KUX2SQC6W@bootlin.com \
--to=theo.lebrun@bootlin.com \
--cc=andrea.porta@suse.com \
--cc=andrew+netdev@lunn.ch \
--cc=claudiu.beznea@tuxon.dev \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rpi-kernel@lists.infradead.org \
--cc=lukasz@raczylo.com \
--cc=netdev@vger.kernel.org \
--cc=nicolas.ferre@microchip.com \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox