Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Lukasz Raczylo <lukasz@raczylo.com>
To: netdev@vger.kernel.org
Cc: Theo Lebrun <theo.lebrun@bootlin.com>,
	Andrea della Porta <andrea.porta@suse.com>,
	Nicolas Ferre <nicolas.ferre@microchip.com>,
	Claudiu Beznea <claudiu.beznea@tuxon.dev>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-rpi-kernel@lists.infradead.org
Subject: [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts
Date: Thu, 14 May 2026 22:54:59 +0100	[thread overview]
Message-ID: <20260514215459.36109-4-lukasz@raczylo.com> (raw)
In-Reply-To: <20260514215459.36109-1-lukasz@raczylo.com>

On PCIe-attached macb instances (BCM2712 + RP1 PCIe south bridge
on Raspberry Pi 5 is the case I have in front of me), a TCOMP
interrupt can be lost: the TSTART doorbell can be lost in the
posted-write fabric (addressed by an earlier patch), or the
descriptor TX_USED DMA write can be observed late by the driver
(also addressed earlier).  When that happens the TX ring stalls
silently until something else kicks TSTART.

Add a per-queue delayed_work that runs once per second.  It
detects forward progress on the TX completion path via a per-queue
bool tx_stall_tail_moved that macb_tx_complete() sets when tx_tail
advances and the watchdog clears on each tick.  If the ring is
non-empty (queue->tx_head != queue->tx_tail) and the flag is
unset when the tick runs, the watchdog calls the existing
macb_tx_restart() to re-assert TSTART.

The bool form (rather than a tx_tail snapshot) sidesteps any
concern about ring-index aliasing between ticks and is the form
suggested by Phil Elwell when reviewing the same series anchored
against rpi-6.18.y at raspberrypi/linux#7340 (merged 2026-05-08).

No new recovery logic is introduced.  macb_tx_restart() already
exists in this file, is correctly locked (tx_ptr_lock, bp->lock),
and verifies that the hardware's TBQP is behind the driver's head
index before re-asserting TSTART.  On a healthy ring it is a
no-op at the hardware level.

Cost on a healthy queue: one spin_lock_irqsave / spin_unlock and
two field assignments per tick.  The delayed_work is only
scheduled between macb_open() and macb_close() and is cancelled
synchronously on close.

A netif_carrier_ok() gate at the top of the tick skips the stall
check when there is no carrier (no completion is possible without
a link), eliminating a boot-time false positive where queue->tx_head
can advance from kernel-queued packets between macb_open() and link
autoneg completion, while tx_tail stays unchanged because no TCOMPs
have arrived yet.

netdev_warn_ratelimited() is used rather than netdev_warn_once() so
operators can count occurrences across the lifetime of the netdev.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Link: https://github.com/raspberrypi/linux/pull/7340
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb.h      | 10 ++++
 drivers/net/ethernet/cadence/macb_main.c | 72 ++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index ce9037f9e..75df0f75b 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1282,6 +1282,16 @@ struct macb_queue {
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
 	bool			txubr_pending;
+
+	/* TX stall watchdog -- see macb_tx_stall_watchdog() in macb_main.c.
+	 * tx_stall_tail_moved is set by macb_tx_complete() when tx_tail
+	 * advances and cleared by the watchdog tick on each pass (both
+	 * under tx_ptr_lock).  Using a bool sidesteps any ring-index
+	 * aliasing concern between ticks.
+	 */
+	struct delayed_work	tx_stall_watchdog_work;
+	bool			tx_stall_tail_moved;
+
 	struct napi_struct	napi_tx;
 
 	dma_addr_t		rx_ring_dma;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index f7fa9e7ad..8245c69e1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1473,6 +1473,8 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 				  packets, bytes);
 
 	queue->tx_tail = tail;
+	if (packets)
+		queue->tx_stall_tail_moved = true;
 	if (__netif_subqueue_stopped(bp->dev, queue_index) &&
 	    CIRC_CNT(queue->tx_head, queue->tx_tail,
 		     bp->tx_ring_size) <= MACB_TX_WAKEUP_THRESH(bp))
@@ -2003,6 +2005,70 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
+#define MACB_TX_STALL_INTERVAL_MS	1000
+
+/*
+ * TX stall watchdog.
+ *
+ * Recovers from lost TCOMP interrupts on PCIe-attached macb
+ * instances.  macb already has a recovery chain
+ * (txubr_pending -> macb_tx_restart()) that fires on TCOMP; if
+ * TCOMP itself is lost the TX ring stalls silently until something
+ * else kicks TSTART.  This watchdog runs once per second per queue
+ * and calls macb_tx_restart() if the ring is non-empty and
+ * tx_tail has not advanced since the previous tick.
+ *
+ * Movement is tracked via the tx_stall_tail_moved boolean rather
+ * than a tx_tail snapshot, sidestepping any ring-index aliasing
+ * concern.  The bool is set by macb_tx_complete() when tx_tail
+ * advances and cleared here on each tick; both writes are under
+ * tx_ptr_lock so no atomic is required.
+ *
+ * macb_tx_restart() already checks the hardware's TBQP against
+ * the driver's head index before re-asserting TSTART, so on a
+ * healthy ring this is a no-op at the hardware level.  The
+ * watchdog only supplies the missing trigger.
+ */
+static void macb_tx_stall_watchdog(struct work_struct *work)
+{
+	struct macb_queue *queue = container_of(to_delayed_work(work),
+						struct macb_queue,
+						tx_stall_watchdog_work);
+	struct macb *bp = queue->bp;
+	unsigned int cur_tail, cur_head;
+	bool stalled = false;
+	unsigned long flags;
+
+	if (!netif_running(bp->dev))
+		return;
+
+	/* No carrier => no completion is possible.  Skip the check
+	 * but keep the watchdog ticking for when carrier comes up.
+	 */
+	if (!netif_carrier_ok(bp->dev))
+		goto reschedule;
+
+	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
+	cur_tail = queue->tx_tail;
+	cur_head = queue->tx_head;
+	if (cur_head != cur_tail && !queue->tx_stall_tail_moved)
+		stalled = true;
+	queue->tx_stall_tail_moved = false;
+	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
+
+	if (stalled) {
+		netdev_warn_ratelimited(bp->dev,
+					"TX stall detected on queue %u (tail=%u head=%u); re-kicking TSTART\n",
+					(unsigned int)(queue - bp->queues),
+					cur_tail, cur_head);
+		macb_tx_restart(queue);
+	}
+
+reschedule:
+	schedule_delayed_work(&queue->tx_stall_watchdog_work,
+			      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
+}
+
 static void macb_hresp_error_task(struct work_struct *work)
 {
 	struct macb *bp = from_work(bp, work, hresp_err_bh_work);
@@ -3192,6 +3258,9 @@ static int macb_open(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		napi_enable(&queue->napi_rx);
 		napi_enable(&queue->napi_tx);
+		queue->tx_stall_tail_moved = true;
+		schedule_delayed_work(&queue->tx_stall_watchdog_work,
+				      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
 	}
 
 	macb_init_hw(bp);
@@ -3242,6 +3311,7 @@ static int macb_close(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		napi_disable(&queue->napi_rx);
 		napi_disable(&queue->napi_tx);
+		cancel_delayed_work_sync(&queue->tx_stall_watchdog_work);
 		netdev_tx_reset_queue(netdev_get_tx_queue(dev, q));
 	}
 
@@ -4804,6 +4874,8 @@ static int macb_init_dflt(struct platform_device *pdev)
 		}
 
 		INIT_WORK(&queue->tx_error_task, macb_tx_error_task);
+		INIT_DELAYED_WORK(&queue->tx_stall_watchdog_work,
+				  macb_tx_stall_watchdog);
 		q++;
 	}
 
-- 
2.54.0



      parent reply	other threads:[~2026-05-14 21:55 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17   ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30   ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun
2026-05-14 21:51 ` Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 " Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after TSTART doorbell (PCIe-only) Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check Lukasz Raczylo
2026-05-14 21:54   ` Lukasz Raczylo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514215459.36109-4-lukasz@raczylo.com \
    --to=lukasz@raczylo.com \
    --cc=andrea.porta@suse.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=claudiu.beznea@tuxon.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rpi-kernel@lists.infradead.org \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.ferre@microchip.com \
    --cc=pabeni@redhat.com \
    --cc=theo.lebrun@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox