All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Lukasz Raczylo" <lukasz@raczylo.com>, <netdev@vger.kernel.org>
Cc: "Andrea della Porta" <andrea.porta@suse.com>,
	"Nicolas Ferre" <nicolas.ferre@microchip.com>,
	"Claudiu Beznea" <claudiu.beznea@tuxon.dev>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-rpi-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Date: Sat, 16 May 2026 11:36:46 +0200	[thread overview]
Message-ID: <DIK002QFFNBY.31C3KUX2SQC6W@bootlin.com> (raw)
In-Reply-To: <20260514215129.33559-1-lukasz@raczylo.com>

Hello Lukasz,

On Thu May 14, 2026 at 11:51 PM CEST, Lukasz Raczylo wrote:
> Andrea, Théo --
>
> Thanks both.  Replying to multiple points in one mail since they
> intersect.

Hum, you forgot some parts of my email to address. The good thing with
interleaved replies is that you see what you need to reply to.
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#use-trimmed-interleaved-replies-in-email-discussions

No big deal.

> First a correction to the v1 cover: the "zero events post-patch"
> claim was true only at the user-space watchdog visibility level.
> With patch 3's warn made unconditional (which is what Andrea's
> review re-tested with on v6.19.10), kernel-level evidence on my
> side now matches what Andrea saw -- patches 1 and 2 are partial,
> patch 3 is empirically the load-bearing fix on this platform.
> The v2 cover acknowledges this and reframes.

[...]

> ## Théo -- the four questions on the cover
>
> Welcome to macb maintenance.  Quick answers; raw 1 Hz traces,
> dmesg dumps, and event logs available on request for any of
> these.

Thanks!

>   1. Tx-only or Tx+Rx broken?
[...]

ACK, makes sense

>   2. Recovery: link down/up vs module reload vs power cycle?
[...]

ACK, makes sense

>   3. TSO / SG / EEE disabling helped some reporters?
>
>      Mixed picture, with one gap in my matrix.
>
>      Tested fleet-wide since 2026-04-24 as baseline before this
>      series:
>        * EEE off (--set-eee end0 eee off + advertise 0x0)
>        * TSO off (-K tso off)
>        * GSO off (-K gso off)
>        * RX/TX rings 4096/2048 (default 512)
>        * IRQ affinity moved off CPU0 to CPU3
>        * CPU governor schedutil (default) -- earlier tested
>          performance, no change
>        * qdisc fq -> pfifo_fast
>
>      With all of those, the stall still fires.  Pre-patch
>      fleet rate was multiple per hour with these knobs already
>      applied.
>
>      Untested by me: TSO + SG (scatter-gather, NETIF_F_SG) off
>      *together*.  That is the specific combination rtheobald
>      (cilium#43198 comment 4188846955) and the launchpad
>      commenter (#34) report as masking the stall -- "must be
>      both, not just one".  I tested TSO + GSO, not TSO + SG.
>
>      Both patch 1 (PCIe-fabric race on TSTART) and patch 2
>      (peripheral DMA retirement race on TX_USED) are
>      consistent with descriptor-fragment-path interactions
>      that SG-off would mask, so the workaround being real
>      isn't surprising.  Closing the race rather than masking
>      it should still be the right thing for mainline.  Happy
>      to canary-test TSO+SG-off on one node if you want the
>      data point before v2 review.

Honestly I was surprised by that comment and doubted it. I don't see how
those could be related. Don't bother testing the full matrix too much.

>   4. cdns,*-max-pipe DT props -- lead dead?
[...]

ACK makes sense

> ## My own audit -- two issues v2 fixes
>
> Worth disclosing before someone else catches them.
>
>   * Patch 2 (v1) reads ISR in macb_tx_poll(), masks off
>     everything except TCOMP, and discards the rest.
>     raspberrypi_rp1_config does not set
>     MACB_CAPS_ISR_CLEAR_ON_WRITE -- the driver assumes
>     read-clear ISR semantics on RP1, and macb_interrupt()
>     relies on processing every bit it reads in one pass for
>     that case to be correct.  My v1 patch breaks that contract:
>     any RCOMP / ROVR / TXUBR / etc. set in ISR at the moment of
>     my read is silently consumed and never processed.  RCOMP
>     being lost is the worst case (RX completion never scheduled
>     until something else asserts the line).  Race window is
>     ~200-500 ns per macb_tx_poll completion; significant at
>     moderate-to-heavy RX load on a level-triggered IRQ where
>     the consumed bit drops the line before GIC delivery.

Careful, you mention your raspberrypi_rp1_config doesn't set
MACB_CAPS_ISR_CLEAR_ON_WRITE but there is a catch: we expect that cap
to come from runtime discovery, see macb_configure_caps() and
register DCFG1/0x0280.

My understanding is that read-to-clear is only for old variants and all
modern MACB/GEM instances use write-to-clear.

I don't expect your RP1 HW uses R2C; my past comment was about making
sure you support both W2C and R2C to avoid breaking the driver for
some.

We have `dev_dbg(..., "Cadence caps 0x%08x\n", bp->caps)` to dump caps,
including the runtime detected ones.

>     v2 patch 2 drops the ISR read entirely and substitutes
>     (void)queue_readl(queue, IMR).  IMR is the read-only mask
>     mirror, no side effects, still flushes prior peripheral
>     DMA writes via PCIe ordering.  Loses the "directly sample
>     latched TCOMP" half of v1's claim; keeps the PCIe-barrier
>     half (which is the half that actually addresses the
>     documented race in the existing macb_tx_complete_pending()
>     rmb() comment).
>
>   * Patch 3 (v1) boot-time false positive described above to
>     Andrea.  v2 fixes:
>
>       - netif_carrier_ok() gate -- no carrier, no completion is
>         possible, don't fire.
>
>       - tracks tail movement via a bool tx_stall_tail_moved set
>         by macb_tx_complete() under tx_ptr_lock when tail
>         advances, cleared by the watchdog tick on the same
>         lock.  Form suggested by pelwell when he reviewed the
>         same series anchored against rpi-6.18.y at
>         raspberrypi/linux#7340 (merged 2026-05-08); 13 days of
>         fleet runtime in this form since 2026-05-02 across 24
>         nodes.  Folded into mainline v2 patch 3 directly rather
>         than carried as a fix-up.
>
>       - netdev_warn_once -> netdev_warn_ratelimited per
>         Andrea's ask -- operator visibility doesn't disappear
>         after the first fire.
>
>
> ## v2 follows
>
> Sending [PATCH net-next v2 0/3] threaded under the v1 cover
> shortly.  Plan to drop "RFC" from the subject prefix (~3 weeks
> production runtime + the rpi in-tree merge tip the balance
> toward a regular PATCH); happy to revert to RFC if you'd
> prefer the iterative-review cadence.
[...]

So you skimmed over two parts of my initial comment:

 - Upstream kernel? From your messages it seems you only build test.
   My question is two-fold:

    - (1) Can we confirm the bug reproduces on upstream?

    - (2) For my personal knowledge, can you tell me why everyone uses
      the Raspberry Pi vendor kernel? I see an upstream DT for rpi5.
      I'll boot one on my desk next week (hopefully).

 - .ndo_tx_timeout() is probably the right approach for [3/3].

---

My recommandation would be this:

Let's get a .ndo_tx_timeout() implementation merged that will mitigate
the issue. If you can prove [1/3] and/or [2/3] are useful, then we can
get those in as well. Otherwise we drop them and keep digging to find
the Tx stall reason(s).

Those are bug-fixes, HW is broken-ish for a good chunk of users, so they
should target net and not net-next. You need a `Fixes:` trailer, it
will probably date back a long way.

Thanks Lukasz!

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



  reply	other threads:[~2026-05-16  9:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17   ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30   ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun
2026-05-14 21:51 ` Lukasz Raczylo
2026-05-16  9:36   ` Théo Lebrun [this message]
2026-05-14 21:54 ` [PATCH net-next v2 " Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after TSTART doorbell (PCIe-only) Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check Lukasz Raczylo
2026-05-14 21:54   ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
2026-05-15  9:53     ` [PATCH net-next] net: macb: fix build of TX stall watchdog by replacing undefined netdev_warn_ratelimited Lukasz Raczylo
2026-05-15 12:47       ` Andrew Lunn
2026-05-15 13:08     ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DIK002QFFNBY.31C3KUX2SQC6W@bootlin.com \
    --to=theo.lebrun@bootlin.com \
    --cc=andrea.porta@suse.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=claudiu.beznea@tuxon.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rpi-kernel@lists.infradead.org \
    --cc=lukasz@raczylo.com \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.ferre@microchip.com \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.