Linux-ARM-Kernel Archive on lore.kernel.org
From: Lukasz Raczylo <lukasz@raczylo.com>
To: netdev@vger.kernel.org
Cc: Theo Lebrun <theo.lebrun@bootlin.com>,
	Andrea della Porta <andrea.porta@suse.com>,
	Nicolas Ferre <nicolas.ferre@microchip.com>,
	Claudiu Beznea <claudiu.beznea@tuxon.dev>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-rpi-kernel@lists.infradead.org
Subject: [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check
Date: Thu, 14 May 2026 22:54:58 +0100
Message-ID: <20260514215459.36109-3-lukasz@raczylo.com>
In-Reply-To: <20260514215459.36109-1-lukasz@raczylo.com>

macb_tx_poll() runs with TCOMP masked, drains the TX ring, then
calls napi_complete_done() and re-enables TCOMP via IER.  An
existing comment in the function notes that completions raised
while TCOMP is masked do not re-fire on IER re-enable, and
mitigates this by calling macb_tx_complete_pending(), which
inspects driver-visible ring state (descriptor->ctrl, after
rmb()) and reschedules NAPI if a completion is observable in
memory.

On PCIe-attached parts (BCM2712 + RP1 PCIe south bridge on
Raspberry Pi 5 is the case I have in front of me), the
descriptor DMA write that sets TX_USED may not have retired to
system memory at the point macb_tx_complete_pending() runs.  The
rmb() only orders the CPU's own loads against one another; it
does not force an in-flight peripheral DMA write to retire.
Under that ordering the in-memory descriptor can still read
TX_USED=0 when the hardware has in fact completed the frame;
the check returns false; NAPI exits; the quirk above prevents
the re-enabled IER from re-firing; the ring goes quiescent.

Add a side-effect-free MMIO read between the IER write and the
macb_tx_complete_pending() check.  The read functions as an
architected PCIe read barrier for earlier peripheral-originated
DMA writes on the same path, so any in-flight TX_USED update
retires to system memory before the descriptor read.

The register chosen is IMR (the read-only interrupt mask
mirror); reading it has no side effects on either read-clear or
W1C ISR silicon (it is not the ISR), and the read still flushes
prior DMA writes via the PCIe completion-ordering guarantee.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 6879f3458..f7fa9e7ad 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1984,6 +1984,14 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 		 * actions if an interrupt is raised just after enabling them,
 		 * but this should be harmless.
 		 */
+		/*
+		 * PCIe read barrier: flush any in-flight peripheral DMA
+		 * writes (descriptor TX_USED updates) so the subsequent
+		 * macb_tx_complete_pending() check observes them.  IMR is
+		 * the read-only interrupt mask mirror; the read has no
+		 * side effects on either read-clear or W1C ISR silicon.
+		 */
+		(void)queue_readl(queue, IMR);
 		if (macb_tx_complete_pending(queue)) {
 			queue_writel(queue, IDR, MACB_BIT(TCOMP));
 			macb_queue_isr_clear(bp, queue, MACB_BIT(TCOMP));
-- 
2.54.0


