From: Lukasz Raczylo
To: netdev@vger.kernel.org
Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-rpi-kernel@lists.infradead.org
Subject: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Fri, 24 Apr 2026 23:38:30 +0100

Hi netdev, Nicolas, Claudiu, linux-rpi,

This series proposes three candidate fixes for the silent TX stall
observed on Raspberry Pi 5 (BCM2712 SoC, Cadence GEM via RP1 PCIe south
bridge). The bug has been reported, with reproducers, at:

  * https://github.com/cilium/cilium/issues/43198
  * https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877

Cilium #43198 reports reproduction on linux-raspi 6.17.0-1004, and
explicitly notes reproduction with both Cilium/eBPF and Calico/nftables
dataplanes (i.e. not CNI-specific). We observe the same failure mode on
kernel 6.18.24 built from raspberrypi/linux rpi-6.18.y @ f2f68e79f16f,
across a 24-node Raspberry Pi 5 fleet. The 6.17/6.18 commonality and the
two-CNI reproduction together put the root cause below the
packet-scheduling layer, in the macb driver or the RP1 PCIe path.

Observed symptoms (our side, 6.18.24; consistent with the linked
reports):

  * queue->tx_tail stops advancing at a single second;
  * /sys/class/net//statistics/tx_packets stops incrementing;
  * qdisc backlog grows past zero; netif_stop_subqueue() is called;
  * RX counters continue advancing; the MAC IRQ line continues to fire
    (RX completions are handled);
  * no kernel log line is produced for the duration of the stall;
  * dev_watchdog does not fire: macb_netdev_ops has no .ndo_tx_timeout,
    and our reading is that trans_start is kept fresh by successful
    xmit prior to the ring filling;
  * recovery on our side has required `ip link set down/up` via an
    out-of-band watchdog DaemonSet.

Reading the current driver we identified three plausible races between
driver and hardware, each of which could independently produce the
observed behaviour. We did not determine which is the actual root cause
-- that likely requires either BCM2712/RP1 documentation we do not have,
or dynamic tracing of the driver during an in-situ stall. The series
therefore attempts to close all three, with each commit message stating
which specific race that patch is targeting.

Patch 1/3 -- flush PCIe posted write after TSTART doorbell.

Writes to NCR are posted PCIe writes and may not reach the MAC before
the driver returns. If the TSTART doorbell is lost, no TX starts, no
TCOMP arrives, and the ring goes quiescent. A read-back of NCR after the
write is a standard read-after-write PCIe flush.

Patch 2/3 -- re-check ISR after IER re-enable in macb_tx_poll().

An existing comment in macb_tx_poll() notes that completions raised
while TCOMP is masked do not re-fire when IER is re-enabled, and
mitigates the window with macb_tx_complete_pending(), which inspects
driver-visible ring state only (after rmb()). On PCIe-attached parts the
descriptor DMA write that sets TX_USED can remain in flight when that
check runs; the rmb() orders CPU reads but does not retire peripheral
DMA.
Reading ISR directly after IER re-enable addresses this in two ways:
(a) the MMIO read is an architected PCIe read barrier for prior DMA
writes, so a subsequent macb_tx_complete_pending() sees up-to-date
TX_USED state; (b) it directly observes a pending TCOMP bit if the
hardware has one set. Either signal reschedules NAPI.

Patch 3/3 -- TX stall watchdog.

Defence-in-depth. If patches 1 and 2 close the races we identified, this
patch performs a single spin_lock_irqsave/unlock and a branch per queue
per second with no other effect. If a further race remains that we have
not identified, it invokes the driver's own existing macb_tx_restart(),
which already verifies that TBQP is behind tx_head before re-asserting
TSTART. We include this patch because we have empirically observed
multi-minute stalls on this hardware; we are willing to drop it if the
preference is for 1 and 2 to stand alone.

Status and testing:

  * Apply-tested against Linux net-next HEAD (this series is generated
    from it) and against raspberrypi/linux rpi-6.18.y @ f2f68e79f16f
    (the fork our fleet runs): all three apply cleanly on net-next; the
    rpi fork carries an additional local `bool tx_pending` field on
    `struct macb_queue` that is not in mainline, so we maintain a small
    rebased patch 3 hunk for it.
  * Build-tested: the series compiles cleanly as part of our Talos image
    build pipeline on arm64.
  * Runtime-tested, early signal: ~4 h 20 min of post-patch uptime on
    the canary node, ~3 h 15 min on the slowest (last master to
    upgrade), ~95 node-hours cumulative across the 24-node fleet at the
    time this cover letter was written. During that window the
    fleet-wide counts are zero RECOVER events, zero `[tx-stall]` partial
    markers (an out-of-band userspace detector that records even
    transient one-second freezes that recover before the 3-second
    threshold), and zero ping-failure markers.
    Pre-patch reference window (2026-04-24 14:00-18:10 UTC, when proper
    monitoring was in place) observed multiple stalls per hour at fleet
    level; at that rate we would expect on the order of 50 stalls in
    95 node-hours, but the actual count is zero. We will follow up with
    a 24 h and a 1-week data point as the same observability runs
    forward; the direction so far is consistent with patches 1 and 2
    closing the underlying race(s) and patch 3 correctly being a no-op
    on healthy hardware.

The series does not depend on any other in-flight work we are aware of.
Happy to split, rebase, or drop individual patches on feedback. All
three are independently revertable.

Lukasz Raczylo (3):
  net: macb: flush PCIe posted write after TSTART doorbell
  net: macb: re-check ISR after IER re-enable in macb_tx_poll
  net: macb: add TX stall watchdog as defence-in-depth safety net

 drivers/net/ethernet/cadence/macb.h      |  5 ++
 drivers/net/ethernet/cadence/macb_main.c | 99 +++++++++++++++++++++---
 2 files changed, 94 insertions(+), 10 deletions(-)

-- 
2.53.0