From: Lukasz Raczylo <lukasz@raczylo.com>
To: netdev@vger.kernel.org
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>,
Claudiu Beznea <claudiu.beznea@tuxon.dev>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Sat, 25 Apr 2026 22:48:25 +0100
Message-ID: <20260425214825.31390-1-lukasz@raczylo.com>
In-Reply-To: <cover.1777064117.git.lukasz@raczylo.com>
A follow-up runtime data point on this series.

Fleet state at 2026-04-25 21:46 UTC:
* Patched uptime (since staggered rollout 2026-04-24 18:10-19:20 UTC):
  - shortest: 26h 26m (last master upgraded)
  - longest: 27h 34m (canary)
  - cumulative across 24 nodes: ~651 node-hours
* Macb-attributable event counts (out-of-band userspace watchdog;
  the [tx-stall] detector watches
  /sys/class/net/end0/statistics/tx_packets + qdisc backlog every
  1 s and would have fired ip link down/up if any node's TX path
  froze):
  - RECOVER trigger=tx-stall (actual stalls caught): 0
  - partial [tx-stall] markers (transient 1 s freezes): 0
* Separately: 40 RECOVER events with trigger=ping fired in this
  window across the fleet, attributable to a brief upstream-network
  outage (gateway / switch event); the nodes lost ping to gateway,
  VIP, and NAS within seconds of each other, then recovered. These
  are unrelated to the macb hang the patch series targets;
  distinguishing them from a real TX stall is exactly what the
  trigger= tag in the watchdog log is for.
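
For readers who want to reproduce the [tx-stall] check, here is a
minimal sketch of the decision logic described above. It is not the
watchdog running on the fleet. The three-sample threshold, the
StallDetector class and the helper names are assumptions made for
illustration; only the sysfs path and the 1 s sampling interval come
from this mail. A sample counts as frozen when tx_packets has not
advanced while the qdisc still holds queued packets.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

IFACE = "end0"      # interface name from the sysfs path quoted in the mail
STALL_SAMPLES = 3   # assumption: frozen 1 s samples required before recovery


def read_tx_packets(iface: str = IFACE) -> int:
    """Read the TX packet counter from sysfs."""
    path = Path(f"/sys/class/net/{iface}/statistics/tx_packets")
    return int(path.read_text())


def parse_backlog_packets(tc_output: str) -> int:
    """Sum the packet counts from 'backlog <bytes>b <pkts>p' lines.

    Input is the text printed by `tc -s qdisc show dev <iface>`.
    With nested qdiscs the sum may double count; the detector only
    cares whether the result is nonzero.
    """
    return sum(int(p) for p in re.findall(r"backlog\s+\S+\s+(\d+)p", tc_output))


@dataclass
class StallDetector:
    threshold: int = STALL_SAMPLES
    last_tx: Optional[int] = None
    frozen: int = 0

    def observe(self, tx_packets: int, backlog_pkts: int) -> Optional[str]:
        """Feed one 1 s sample; return 'partial', 'recover', or None."""
        stalled = (
            self.last_tx is not None
            and tx_packets == self.last_tx   # counter did not move
            and backlog_pkts > 0             # yet packets are waiting
        )
        self.last_tx = tx_packets
        if not stalled:
            self.frozen = 0
            return None
        self.frozen += 1
        if self.frozen >= self.threshold:
            self.frozen = 0
            return "recover"   # caller would log trigger=tx-stall and bounce the link
        return "partial"       # transient freeze, logged as a [tx-stall] marker
```

An idle interface (counter static, backlog empty) never counts as
frozen, which keeps quiet nodes from triggering a link bounce. A
ping-based check would be a separate code path with its own trigger=
value, so the two kinds of event stay distinguishable in the log.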
At the pre-patch rate referenced in the cover letter (50 stalls in
95 node-hours observed in our 2026-04-24 14:00-18:10 UTC reference
window, ~0.53 per node-hour), the projected stall count in
651 node-hours is on the order of 342; observed is 0.
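
The projection is plain linear scaling of the reference-window rate.
The sketch below reproduces it and also checks that the ~651
node-hour figure is consistent with 24 nodes at the shortest and
longest uptimes listed above. The variable and function names are
illustrative, and it assumes the stall rate would have stayed
constant without the patches.

```python
PRE_STALLS = 50            # stalls in the pre-patch reference window
PRE_NODE_HOURS = 95        # node-hours in that window
PATCHED_NODE_HOURS = 651   # cumulative patched node-hours
NODES = 24


def projected_stalls(stalls: float, ref_hours: float, new_hours: float) -> float:
    """Scale an observed stall count linearly to a new exposure time."""
    return stalls / ref_hours * new_hours


rate = PRE_STALLS / PRE_NODE_HOURS
projected = projected_stalls(PRE_STALLS, PRE_NODE_HOURS, PATCHED_NODE_HOURS)

# Bounds on cumulative uptime if every node had the shortest (26h 26m)
# or the longest (27h 34m) patched uptime.
low = NODES * (26 + 26 / 60)
high = NODES * (27 + 34 / 60)

print(f"pre-patch rate: {rate:.3f} stalls per node-hour")
print(f"projected stalls in {PATCHED_NODE_HOURS} node-hours: {projected:.1f}")
print(f"node-hour bounds for {NODES} nodes: {low:.1f} .. {high:.1f}")
```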
Same observability runs forward; will reply again after a full week
of uptime unless something changes.
--
Lukasz Raczylo
Thread overview: 5+ messages
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-04-25 21:48 ` Lukasz Raczylo [this message]