From: Herve Codina <herve.codina@bootlin.com>
To: Daniel Machon <daniel.machon@microchip.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	Steen Hegelund <steen.hegelund@microchip.com>,
	<UNGLinuxDriver@microchip.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Arnd Bergmann <arnd@arndb.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<bpf@vger.kernel.org>
Subject: Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
Date: Fri, 27 Mar 2026 11:33:37 +0100
Message-ID: <20260327113337.0368eea3@bootlin.com>
In-Reply-To: <20260326154833.jp6rx5x2rlpmwrg3@DEN-DL-M70577>

Hi Daniel,

On Thu, 26 Mar 2026 16:48:33 +0100
Daniel Machon <daniel.machon@microchip.com> wrote:

...

> 
> As I remembered, doing rmmod on the lan966x_switch followed by modprobe
> lan966x_switch works fine. This is because neither the switch core, nor the FDMA
> engine is reset, so they remain in sync.
> 
> When the lan966x_pci module is removed and reloaded (what you did), the DT
> overlay is re-applied, which causes the reset controller
> (reset-microchip-sparx5) to re-probe. During probe, it performs a GCB soft reset
> that resets the switch core, but protects the CPU domain from the reset. The
> FDMA engine is part of the CPU domain, so it is not reset.
> 
> This leaves the switch core in a reset state while the FDMA
> retains state from the previous driver instance. When the switch driver
> subsequently probes and activates the FDMA channels, the two are out of
> sync, and the FDMA immediately reports extraction errors.
> 
> There's actually an FDMA register called NRESET that resets the FDMA controller
> state. Calling this in the FDMA init path causes traffic to work correctly on
> lan966x_pci reload, but it does not get rid of the FDMA splats you posted above.
> They get queued up between the switch core reset, in the reset controller, and
> the FDMA enabling. I tried different approaches to drain or flush queues, but
> they won't go away entirely.
> 
> The only thing that seems to work consistently is to *not* do the soft reset in
> the reset controller for the PCI path. The soft reset is actually the problem:
> it only resets the switch core while protecting the CPU domain (including FDMA),
> causing a desync.
> 
> A simple fix could be (in reset-microchip-sparx5.c):
> 
> +static bool mchp_reset_is_pci(struct device *dev)
> +{
> +	for (dev = dev->parent; dev; dev = dev->parent) {
> +		if (dev_is_pci(dev))
> +			return true;
> +	}
> +	return false;
> +}
> 
> -	/* Issue the reset very early, our actual reset callback is a noop. */
> -	err = sparx5_switch_reset(ctx);
> -	if (err)
> -		return err;
> +	/* Issue the reset very early, our actual reset callback is a noop.
> +	 *
> +	 * On the PCI path, skip the reset. The endpoint is already in
> +	 * power-on reset state on the first probe. On subsequent probes
> +	 * (after driver reload), resetting the switch core while the FDMA
> +	 * retains state (CPU domain is protected from the soft reset)
> +	 * causes the two to go out of sync, leading to FDMA extraction
> +	 * errors.
> +	 */
> +	if (!mchp_reset_is_pci(&pdev->dev)) {
> +		err = sparx5_switch_reset(ctx);
> +		if (err)
> +			return err;
> +	}
> 
> Could you test it and see if it helps the problem on your side.
> 

I have tested it on my ARM and x86 systems. It fixes the lan966x_pci module
unloading / reloading issue.

However, another regression is present. After a reboot without a power
cycle, the board does not work (tested on both my ARM and x86 systems).

According to your explanation, this makes sense.

IMHO, the problem is that we cannot make the assumption that "The endpoint
is already in power-on reset state on the first probe". That's not true
when you just call the reboot command.

Best regards,
Hervé
