Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Lukasz Raczylo" <lukasz@raczylo.com>, <netdev@vger.kernel.org>
Cc: "Andrea della Porta" <andrea.porta@suse.com>,
	"Nicolas Ferre" <nicolas.ferre@microchip.com>,
	"Claudiu Beznea" <claudiu.beznea@tuxon.dev>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-rpi-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Sat, 16 May 2026 11:36:46 +0200	[thread overview]
Message-ID: <DIK002QFFNBY.31C3KUX2SQC6W@bootlin.com> (raw)
In-Reply-To: <20260514215129.33559-1-lukasz@raczylo.com>

Hello Lukasz,

On Thu May 14, 2026 at 11:51 PM CEST, Lukasz Raczylo wrote:
> Andrea, Théo --
>
> Thanks both.  Replying to multiple points in one mail since they
> intersect.

Hum, you forgot some parts of my email to address. The good thing with
interleaved replies is that you see what you need to reply to.
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#use-trimmed-interleaved-replies-in-email-discussions

No big deal.

> First a correction to the v1 cover: the "zero events post-patch"
> claim was true only at the user-space watchdog visibility level.
> With patch 3's warn made unconditional (which is what Andrea's
> review re-tested with on v6.19.10), kernel-level evidence on my
> side now matches what Andrea saw -- patches 1 and 2 are partial,
> patch 3 is empirically the load-bearing fix on this platform.
> The v2 cover acknowledges this and reframes.

[...]

> ## Théo -- the four questions on the cover
>
> Welcome to macb maintenance.  Quick answers; raw 1 Hz traces,
> dmesg dumps, and event logs available on request for any of
> these.

Thanks!

>   1. Tx-only or Tx+Rx broken?
[...]

ACK, makes sense

>   2. Recovery: link down/up vs module reload vs power cycle?
[...]

ACK, makes sense

>   3. TSO / SG / EEE disabling helped some reporters?
>
>      Mixed picture, with one gap in my matrix.
>
>      Tested fleet-wide since 2026-04-24 as baseline before this
>      series:
>        * EEE off (--set-eee end0 eee off + advertise 0x0)
>        * TSO off (-K tso off)
>        * GSO off (-K gso off)
>        * RX/TX rings 4096/2048 (default 512)
>        * IRQ affinity moved off CPU0 to CPU3
>        * CPU governor schedutil (default) -- earlier tested
>          performance, no change
>        * qdisc fq -> pfifo_fast
>
>      With all of those, the stall still fires.  Pre-patch
>      fleet rate was multiple per hour with these knobs already
>      applied.
>
>      Untested by me: TSO + SG (scatter-gather, NETIF_F_SG) off
>      *together*.  That is the specific combination rtheobald
>      (cilium#43198 comment 4188846955) and the launchpad
>      commenter (#34) report as masking the stall -- "must be
>      both, not just one".  I tested TSO + GSO, not TSO + SG.
>
>      Both patch 1 (PCIe-fabric race on TSTART) and patch 2
>      (peripheral DMA retirement race on TX_USED) are
>      consistent with descriptor-fragment-path interactions
>      that SG-off would mask, so the workaround being real
>      isn't surprising.  Closing the race rather than masking
>      it should still be the right thing for mainline.  Happy
>      to canary-test TSO+SG-off on one node if you want the
>      data point before v2 review.

Honestly I was surprised by that comment and doubted it. I don't see how
those could be related. Don't bother testing the full matrix too much.

>   4. cdns,*-max-pipe DT props -- lead dead?
[...]

ACK makes sense

> ## My own audit -- two issues v2 fixes
>
> Worth disclosing before someone else catches them.
>
>   * Patch 2 (v1) reads ISR in macb_tx_poll(), masks off
>     everything except TCOMP, and discards the rest.
>     raspberrypi_rp1_config does not set
>     MACB_CAPS_ISR_CLEAR_ON_WRITE -- the driver assumes
>     read-clear ISR semantics on RP1, and macb_interrupt()
>     relies on processing every bit it reads in one pass for
>     that case to be correct.  My v1 patch breaks that contract:
>     any RCOMP / ROVR / TXUBR / etc. set in ISR at the moment of
>     my read is silently consumed and never processed.  RCOMP
>     being lost is the worst case (RX completion never scheduled
>     until something else asserts the line).  Race window is
>     ~200-500 ns per macb_tx_poll completion; significant at
>     moderate-to-heavy RX load on a level-triggered IRQ where
>     the consumed bit drops the line before GIC delivery.

Careful, you mention your raspberrypi_rp1_config doesn't set
MACB_CAPS_ISR_CLEAR_ON_WRITE but there is a catch: we expect that cap
to come from runtime discovery, see macb_configure_caps() and
register DCFG1/0x0280.

My understanding is that read-to-clear is only for old variants and all
modern MACB/GEM instances use write-to-clear.

I don't expect your RP1 HW uses R2C; my past comment was about making
sure you support both W2C and R2C to avoid breaking the driver for
some.

We have `dev_dbg(..., "Cadence caps 0x%08x\n", bp->caps)` to dump caps,
including the runtime detected ones.

>     v2 patch 2 drops the ISR read entirely and substitutes
>     (void)queue_readl(queue, IMR).  IMR is the read-only mask
>     mirror, no side effects, still flushes prior peripheral
>     DMA writes via PCIe ordering.  Loses the "directly sample
>     latched TCOMP" half of v1's claim; keeps the PCIe-barrier
>     half (which is the half that actually addresses the
>     documented race in the existing macb_tx_complete_pending()
>     rmb() comment).
>
>   * Patch 3 (v1) boot-time false positive described above to
>     Andrea.  v2 fixes:
>
>       - netif_carrier_ok() gate -- no carrier, no completion is
>         possible, don't fire.
>
>       - tracks tail movement via a bool tx_stall_tail_moved set
>         by macb_tx_complete() under tx_ptr_lock when tail
>         advances, cleared by the watchdog tick on the same
>         lock.  Form suggested by pelwell when he reviewed the
>         same series anchored against rpi-6.18.y at
>         raspberrypi/linux#7340 (merged 2026-05-08); 13 days of
>         fleet runtime in this form since 2026-05-02 across 24
>         nodes.  Folded into mainline v2 patch 3 directly rather
>         than carried as a fix-up.
>
>       - netdev_warn_once -> netdev_warn_ratelimited per
>         Andrea's ask -- operator visibility doesn't disappear
>         after the first fire.
>
>
> ## v2 follows
>
> Sending [PATCH net-next v2 0/3] threaded under the v1 cover
> shortly.  Plan to drop "RFC" from the subject prefix (~3 weeks
> production runtime + the rpi in-tree merge tip the balance
> toward a regular PATCH); happy to revert to RFC if you'd
> prefer the iterative-review cadence.
[...]

So you skimmed over two parts of my initial comment:

 - Upstream kernel? From your messages it seems you only build test.
   My question is two-fold:

    - (1) Can we confirm the bug reproduces on upstream?

    - (2) For my personal knowledge, can you tell me why everyone uses
      the Raspberry Pi vendor kernel? I see an upstream DT for rpi5.
      I'll boot one on my desk next week (hopefully).

 - .ndo_tx_timeout() is probably the right approach for [3/3].

---

My recommandation would be this:

Let's get a .ndo_tx_timeout() implementation merged that will mitigate
the issue. If you can prove [1/3] and/or [2/3] are useful, then we can
get those in as well. Otherwise we drop them and keep digging to find
the Tx stall reason(s).

Those are bug-fixes, HW is broken-ish for a good chunk of users, so they
should target net and not net-next. You need a `Fixes:` trailer, it
will probably date back a long way.

Thanks Lukasz!

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



  reply	other threads:[~2026-05-16  9:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17   ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30   ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun
2026-05-14 21:51 ` Lukasz Raczylo
2026-05-16  9:36   ` Théo Lebrun [this message]
2026-05-14 21:54 ` [PATCH net-next v2 " Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after TSTART doorbell (PCIe-only) Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
2026-05-15  9:53     ` [PATCH net-next] net: macb: fix build of TX stall watchdog by replacing undefined netdev_warn_ratelimited Lukasz Raczylo
2026-05-15 12:47       ` Andrew Lunn
2026-05-15 13:08     ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DIK002QFFNBY.31C3KUX2SQC6W@bootlin.com \
    --to=theo.lebrun@bootlin.com \
    --cc=andrea.porta@suse.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=claudiu.beznea@tuxon.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rpi-kernel@lists.infradead.org \
    --cc=lukasz@raczylo.com \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.ferre@microchip.com \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox