[RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
@ 2026-04-24 22:38 Lukasz Raczylo
  2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Lukasz Raczylo @ 2026-04-24 22:38 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel,
	linux-arm-kernel, linux-rpi-kernel

Hi netdev, Nicolas, Claudiu, linux-rpi,

This series proposes three candidate fixes for the silent TX stall
observed on Raspberry Pi 5 (BCM2712 SoC, Cadence GEM via RP1 PCIe
south bridge).  The bug has been reported, with reproducers, at:

  * https://github.com/cilium/cilium/issues/43198
  * https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877

Cilium #43198 reports reproduction on linux-raspi 6.17.0-1004, and
explicitly notes reproduction with both Cilium/eBPF and
Calico/nftables dataplanes (i.e. not CNI-specific).  We observe the
same failure mode on kernel 6.18.24 built from raspberrypi/linux
rpi-6.18.y @ f2f68e79f16f, across a 24-node Raspberry Pi 5 fleet.
The 6.17/6.18 commonality and the two-CNI reproduction together
put the root cause below the packet-scheduling layer, in the macb
driver or the RP1 PCIe path.

Observed symptoms (our side, 6.18.24; consistent with the linked
reports):

  * queue->tx_tail stops advancing at a single second;
  * /sys/class/net/<iface>/statistics/tx_packets stops incrementing;
  * qdisc backlog grows past zero; netif_stop_subqueue() is called;
  * RX counters continue advancing; the MAC IRQ line continues to
    fire (RX completions are handled);
  * no kernel log line is produced for the duration of the stall;
  * dev_watchdog does not fire: macb_netdev_ops has no
    .ndo_tx_timeout, and our reading is that trans_start is kept
    fresh by successful xmit prior to the ring filling;
  * recovery on our side has required `ip link set <iface> down/up`
    via an out-of-band watchdog DaemonSet.

Reading the current driver we identified three plausible races
between driver and hardware, each of which could independently
produce the observed behaviour.  We did not determine which is the
actual root cause -- that likely requires either BCM2712/RP1
documentation we do not have, or dynamic tracing of the driver
during an in-situ stall.  The series therefore attempts to close
all three, with each commit message stating which specific race
that patch is targeting.

  Patch 1/3 -- flush PCIe posted write after TSTART doorbell.
  Writes to NCR are posted PCIe writes and may not reach the MAC
  before the driver returns.  If the TSTART doorbell is lost, no
  TX starts, no TCOMP arrives, and the ring goes quiescent.  A
  read-back of NCR after the write is a standard read-after-write
  PCIe flush.

  Patch 2/3 -- re-check ISR after IER re-enable in macb_tx_poll().
  An existing comment in macb_tx_poll() notes that completions
  raised while TCOMP is masked do not re-fire when IER is
  re-enabled, and mitigates the window with macb_tx_complete_pending(),
  which inspects driver-visible ring state only (after rmb()).  On
  PCIe-attached parts the descriptor DMA write that sets TX_USED
  can remain in flight when that check runs; the rmb() orders CPU
  writes but does not retire peripheral DMA.  Reading ISR directly
  after IER re-enable addresses this in two ways: (a) the MMIO read
  is an architected PCIe read barrier for prior DMA writes, so a
  subsequent macb_tx_complete_pending() sees up-to-date TX_USED
  state; (b) it directly observes a pending TCOMP bit if the
  hardware has one set.  Either signal reschedules NAPI.

  Patch 3/3 -- TX stall watchdog.  Defence-in-depth.  If patches
  1 and 2 close the races we identified, this patch performs a
  single spin_lock_irqsave/unlock and a branch per queue per
  second with no other effect.  If a further race remains that we
  have not identified, it invokes the driver's own existing
  macb_tx_restart(), which already verifies that TBQP is behind
  tx_head before re-asserting TSTART.  We include this patch
  because we have empirically observed multi-minute stalls on this
  hardware; we are willing to drop it if the preference is for
  1 and 2 to stand alone.

Status and testing:

  * Apply-tested against Linux net-next HEAD (this series is
    generated from it) and against raspberrypi/linux rpi-6.18.y @
    f2f68e79f16f (the fork our fleet runs): all three apply
    cleanly on net-next; the rpi fork carries an additional local
    `bool tx_pending` field on `struct macb_queue` that is not in
    mainline, so we maintain a small rebased patch 3 hunk for it.
  * Build-tested: the series compiles cleanly as part of our Talos
    image build pipeline on arm64.
  * Runtime-tested, early signal: ~4 h 20 min of post-patch uptime
    on the canary node, ~3 h 15 min on the slowest (last master to
    upgrade), ~95 node-hours cumulative across the 24-node fleet at
    the time this cover letter was written.  During that window the
    fleet-wide counts are zero RECOVER events, zero `[tx-stall]`
    partial markers (an out-of-band userspace detector that records
    even transient one-second freezes that recover before the
    3-second threshold), and zero ping-failure markers.  Pre-patch
    reference window (2026-04-24 14:00-18:10 UTC, when proper
    monitoring was in place) observed multiple stalls per hour at
    fleet level; at that rate we would expect on the order of 50
    stalls in 95 node-hours, actual is zero.  We will follow up
    with a 24 h and a 1-week data point as the same observability
    runs forward; the direction so far is consistent with patches
    1 and 2 closing the underlying race(s) and patch 3 correctly
    being a no-op on healthy hardware.

The series does not depend on any other in-flight work we are
aware of.  Happy to split, rebase, or drop individual patches on
feedback.  All three are independently revertable.

Lukasz Raczylo (3):
  net: macb: flush PCIe posted write after TSTART doorbell
  net: macb: re-check ISR after IER re-enable in macb_tx_poll
  net: macb: add TX stall watchdog as defence-in-depth safety net

 drivers/net/ethernet/cadence/macb.h      |  5 ++
 drivers/net/ethernet/cadence/macb_main.c | 99 +++++++++++++++++++++---
 2 files changed, 94 insertions(+), 10 deletions(-)

-- 
2.53.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell
  2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
@ 2026-04-24 22:38 ` Lukasz Raczylo
  2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Lukasz Raczylo @ 2026-04-24 22:38 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel,
	linux-arm-kernel, linux-rpi-kernel

macb_start_xmit() and macb_tx_restart() kick transmission by
OR-ing MACB_BIT(TSTART) into NCR.  On PCIe-attached macb instances
(BCM2712 + RP1 PCIe south bridge on Raspberry Pi 5 is the setup we
have in front of us), writes to NCR are posted PCIe writes: they
are not guaranteed to reach the device before the issuing CPU
returns.  If the TSTART doorbell does not reach the MAC, no TX
begins, no TCOMP completion arrives, and the ring remains
quiescent without any kernel-visible indication.

Note that the raspberrypi/linux vendor fork carries a local patch
around the TSTART site (a queue->tx_pending breadcrumb that is
promoted to queue->txubr_pending by the next TCOMP interrupt,
triggering macb_tx_restart()).  That workaround makes the loss
recoverable under traffic, but it cannot help if TCOMP itself is
not raised because no TX started -- which is exactly the case we
are targeting here.  The handshake is not present in mainline.

Add a read-back of NCR after each TSTART write in macb_start_xmit()
and macb_tx_restart().  The read is an architected PCIe read
barrier for earlier posted writes on the same path; it ensures the
doorbell has reached the MAC before the functions return.

We do not yet have direct hardware evidence that TSTART is being
lost on the RP1 path (that would require a PCIe protocol analyser,
or at minimum a before/after counter on queue->tx_stall_last_tail
with and without this patch applied in isolation).  This patch is
one of a three-patch series ("candidate fixes for silent TX stall
on BCM2712/RP1"); see the cover letter for context.  We have
verified the series compiles and applies cleanly against mainline
HEAD and against raspberrypi/linux rpi-6.18.y @ f2f68e79f16f;
runtime verification is pending.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa2124..b6cca55ad 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1922,6 +1922,13 @@ static void macb_tx_restart(struct macb_queue *queue)

 	spin_lock(&bp->lock);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush the PCIe posted-write queue so the TSTART doorbell
+	 * reliably reaches the MAC.  Without this, the write can sit
+	 * in the fabric and the MAC never advances, causing a silent
+	 * TX stall.
+	 */
+	(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);

 out_tx_ptr_unlock:
@@ -2560,6 +2567,11 @@ static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock(&bp->lock);
 	macb_tx_lpi_wake(bp);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush the PCIe posted-write queue; see the comment in
+	 * macb_tx_restart() for the reasoning.
+	 */
+	(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);

 	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll
  2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
  2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
@ 2026-04-24 22:38 ` Lukasz Raczylo
  2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
  2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
  3 siblings, 0 replies; 5+ messages in thread
From: Lukasz Raczylo @ 2026-04-24 22:38 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel,
	linux-arm-kernel, linux-rpi-kernel

macb_tx_poll() runs with TCOMP masked, drains the TX ring, then
calls napi_complete_done() and re-enables TCOMP via IER.  An
existing comment in the function notes:

	/* Packet completions only seem to propagate to raise
	 * interrupts when interrupts are enabled at the time, so if
	 * packets were sent while interrupts were disabled,
	 * they will not cause another interrupt to be generated when
	 * interrupts are re-enabled.
	 */

and mitigates this by calling macb_tx_complete_pending(), which
inspects driver-visible ring state (descriptor->ctrl, after rmb())
and reschedules NAPI if a completion is observable in memory.

On PCIe-attached parts (BCM2712 + RP1 on Raspberry Pi 5 is the
setup we have in front of us), the descriptor DMA write that sets
TX_USED may not have retired to system memory at the point
macb_tx_complete_pending() runs.  The rmb() synchronises the CPU
view of earlier CPU writes; it is not sufficient to retire an
in-flight peripheral DMA write.  Under that ordering the in-memory
descriptor can still read TX_USED=0 when the hardware has in fact
completed the frame; the check returns false; NAPI exits; the
quirk above prevents the re-enabled IER from re-firing; the ring
goes quiescent.

Add an explicit ISR read after the IER write.  The MMIO read
serves two independent purposes:

  (1) It is an architected PCIe read barrier for earlier
      peripheral-originated DMA writes on the same path, so a
      subsequent macb_tx_complete_pending() observes any TX_USED
      write that was in flight at the time of the barrier.

  (2) It samples the hardware ISR directly, so a TCOMP bit that
      the hardware set while TCOMP was masked is visible here,
      independently of whether the descriptor DMA has retired.

If either signal indicates pending work, reschedule NAPI via the
same path as the existing check.

This patch addresses one of three candidate races for the silent
TX stall described in the cover letter.  Whether it is sufficient
by itself, or whether it requires the PCIe posted-write flush in
patch 1/3 to cover the observed behaviour, we have not yet
verified at runtime.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 28 +++++++++++++++---------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index b6cca55ad..ea231b1c5 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1973,17 +1973,25 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 	if (work_done < budget && napi_complete_done(napi, work_done)) {
 		queue_writel(queue, IER, MACB_BIT(TCOMP));
 
-		/* Packet completions only seem to propagate to raise
-		 * interrupts when interrupts are enabled at the time, so if
-		 * packets were sent while interrupts were disabled,
-		 * they will not cause another interrupt to be generated when
-		 * interrupts are re-enabled.
-		 * Check for this case here to avoid losing a wakeup. This can
-		 * potentially race with the interrupt handler doing the same
-		 * actions if an interrupt is raised just after enabling them,
-		 * but this should be harmless.
+		/*
+		 * TCOMP events that fire while the interrupt is masked do
+		 * not re-fire when IER is re-enabled.  Catch this two ways
+		 * to avoid losing a wakeup:
+		 *
+		 *   (1) Read ISR -- catches completions the hardware flagged
+		 *       but that we did not see as an interrupt.  The MMIO
+		 *       read doubles as a PCIe read barrier, flushing any
+		 *       in-flight descriptor TX_USED DMA writes into memory.
+		 *   (2) macb_tx_complete_pending() inspects the ring after
+		 *       that flush, catching a descriptor whose TX_USED is
+		 *       now visible as a result of the barrier.
+		 *
+		 * This can race with the interrupt handler taking the same
+		 * path if an interrupt fires just after the IER write;
+		 * rescheduling NAPI in that case is harmless.
 		 */
-		if (macb_tx_complete_pending(queue)) {
+		if ((queue_readl(queue, ISR) & MACB_BIT(TCOMP)) ||
+		    macb_tx_complete_pending(queue)) {
 			queue_writel(queue, IDR, MACB_BIT(TCOMP));
 			macb_queue_isr_clear(bp, queue, MACB_BIT(TCOMP));
 			netdev_vdbg(bp->dev, "TX poll: packets pending, reschedule\n");
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net
  2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
  2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
  2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
@ 2026-04-24 22:38 ` Lukasz Raczylo
  2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
  3 siblings, 0 replies; 5+ messages in thread
From: Lukasz Raczylo @ 2026-04-24 22:38 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel,
	linux-arm-kernel, linux-rpi-kernel

Patches 1/3 and 2/3 address two candidate races that could lead
to a TCOMP completion being missed on PCIe-attached macb
instances.  This patch adds a defence-in-depth safety net, in
case a further race remains that we have not identified.

The watchdog is a per-queue delayed_work that runs once per
second.  It snapshots queue->tx_tail; if the ring is non-empty
(queue->tx_head != queue->tx_tail) and tx_tail has not advanced
since the previous tick, it calls macb_tx_restart().

No new recovery logic is introduced.  macb_tx_restart() already
exists in this file, is correctly locked (tx_ptr_lock, bp->lock),
and verifies that the hardware's TBQP is behind the driver's
head index before re-asserting TSTART.  On a healthy ring it is
a no-op at the hardware level; the watchdog only supplies the
missing trigger.

On a healthy queue the per-tick cost is one spin_lock_irqsave()
/ spin_unlock_irqrestore() and one branch.  The delayed_work is
only scheduled between macb_open() and macb_close(), and is
cancelled synchronously on close.

Context for submission: on our 24-node Raspberry Pi 5 fleet,
before this series, an out-of-band user-space watchdog
(monitoring tx_packets from /sys/class/net/.../statistics and
toggling the link down/up when it froze) was required to keep
nodes usable.  We include this kernel-side watchdog as a cleaner
in-kernel equivalent for any residual stall that patches 1 and
2 do not cover.  We are willing to drop this patch if the view
is that 1 and 2 should stand alone.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb.h      |  5 ++
 drivers/net/ethernet/cadence/macb_main.c | 59 ++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 2de56017e..9115f2b47 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1278,6 +1278,11 @@ struct macb_queue {
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
 	bool			txubr_pending;
+
+	/* TX stall watchdog -- see macb_tx_stall_watchdog() in macb_main.c */
+	struct delayed_work	tx_stall_watchdog_work;
+	unsigned int		tx_stall_last_tail;
+
 	struct napi_struct	napi_tx;
 
 	dma_addr_t		rx_ring_dma;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index ea231b1c5..ea2306ef7 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2002,6 +2002,59 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
+#define MACB_TX_STALL_INTERVAL_MS	1000
+
+/*
+ * TX stall watchdog.
+ *
+ * Defence-in-depth against lost TCOMP interrupts.  macb already has a
+ * recovery chain (tx_pending -> txubr_pending -> macb_tx_restart())
+ * that fires on TCOMP; if TCOMP itself is lost the TX ring stalls
+ * silently until something else kicks TSTART.  This watchdog runs
+ * once per second per queue, snapshots tx_tail, and calls
+ * macb_tx_restart() if the ring is non-empty and tx_tail has not
+ * advanced since the previous tick.
+ *
+ * macb_tx_restart() already checks the hardware's TBQP against the
+ * driver's head index before re-asserting TSTART, so on a healthy
+ * ring this is a no-op at the hardware level.  The watchdog only
+ * adds the missing trigger.
+ */
+static void macb_tx_stall_watchdog(struct work_struct *work)
+{
+	struct macb_queue *queue = container_of(to_delayed_work(work),
+						struct macb_queue,
+						tx_stall_watchdog_work);
+	struct macb *bp = queue->bp;
+	unsigned int cur_tail, cur_head;
+	bool stalled = false;
+	unsigned long flags;
+
+	if (!netif_running(bp->dev))
+		return;
+
+	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
+	cur_tail = queue->tx_tail;
+	cur_head = queue->tx_head;
+	if (cur_head != cur_tail &&
+	    cur_tail == queue->tx_stall_last_tail)
+		stalled = true;
+	else
+		queue->tx_stall_last_tail = cur_tail;
+	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
+
+	if (stalled) {
+		netdev_warn_once(bp->dev,
+				 "TX stall detected on queue %u (tail=%u head=%u); re-kicking TSTART\n",
+				 (unsigned int)(queue - bp->queues),
+				 cur_tail, cur_head);
+		macb_tx_restart(queue);
+	}
+
+	schedule_delayed_work(&queue->tx_stall_watchdog_work,
+			      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
+}
+
 static void macb_hresp_error_task(struct work_struct *work)
 {
 	struct macb *bp = from_work(bp, work, hresp_err_bh_work);
@@ -3190,6 +3243,9 @@ static int macb_open(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		napi_enable(&queue->napi_rx);
 		napi_enable(&queue->napi_tx);
+		queue->tx_stall_last_tail = queue->tx_tail;
+		schedule_delayed_work(&queue->tx_stall_watchdog_work,
+				      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
 	}
 
 	macb_init_hw(bp);
@@ -3240,6 +3296,7 @@ static int macb_close(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		napi_disable(&queue->napi_rx);
 		napi_disable(&queue->napi_tx);
+		cancel_delayed_work_sync(&queue->tx_stall_watchdog_work);
 		netdev_tx_reset_queue(netdev_get_tx_queue(dev, q));
 	}
 
@@ -4802,6 +4859,8 @@ static int macb_init_dflt(struct platform_device *pdev)
 		}
 
 		INIT_WORK(&queue->tx_error_task, macb_tx_error_task);
+		INIT_DELAYED_WORK(&queue->tx_stall_watchdog_work,
+				  macb_tx_stall_watchdog);
 		q++;
 	}
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
  2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
                   ` (2 preceding siblings ...)
  2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
@ 2026-04-25 21:48 ` Lukasz Raczylo
  3 siblings, 0 replies; 5+ messages in thread
From: Lukasz Raczylo @ 2026-04-25 21:48 UTC (permalink / raw)
  To: netdev
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel,
	linux-arm-kernel

A follow-up runtime data point on this series.

Fleet state at 2026-04-25 21:46 UTC:

  * Patched uptime (since staggered rollout 2026-04-24 18:10-19:20 UTC):
    - shortest: 26h 26m   (last master upgraded)
    - longest:  27h 34m   (canary)
    - cumulative across 24 nodes: ~651 node-hours

  * Macb-attributable event counts (out-of-band userspace watchdog;
    the [tx-stall] detector watches /sys/class/net/end0/statistics/
    tx_packets + qdisc backlog every 1 s and would have fired
    ip link down/up if any node's TX path froze):
    - RECOVER trigger=tx-stall (actual stalls caught):    0
    - partial [tx-stall] markers (transient 1 s freezes): 0

  * Separately: 40 RECOVER events with trigger=ping fired in this
    window across the fleet, attributable to a brief upstream-network
    outage (gateway / switch event); each node simultaneously lost ping
    to gateway, VIP, and NAS within seconds of each other, then
    recovered.  These are unrelated to the macb hang the patch series
    targets — distinguishing them from a real TX stall is exactly what
    the trigger= tag in the watchdog log is for.

At the pre-patch rate referenced in the cover letter (50 stalls in
95 node-hours observed in our 2026-04-24 14:00-18:10 UTC reference
window, ~0.5 per node-hour), the projected stall count in
651 node-hours is on the order of 342;
observed is 0.

Same observability runs forward; will reply again after a full week
of uptime unless something changes.

--
Lukasz Raczylo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-04-25 21:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox