public inbox for linux-arm-kernel@lists.infradead.org
From: Brian Norris <briannorris@chromium.org>
To: Hongxing Zhu <hongxing.zhu@nxp.com>
Cc: "manivannan.sadhasivam@oss.qualcomm.com"
	<manivannan.sadhasivam@oss.qualcomm.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
	"Oliver O'Halloran" <oohall@gmail.com>,
	"Will Deacon" <will@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Manivannan Sadhasivam" <mani@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Philipp Zabel" <p.zabel@pengutronix.de>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-arm-msm@vger.kernel.org" <linux-arm-msm@vger.kernel.org>,
	"linux-rockchip@lists.infradead.org"
	<linux-rockchip@lists.infradead.org>,
	"Niklas Cassel" <cassel@kernel.org>,
	"Wilfred Mallawa" <wilfred.mallawa@wdc.com>,
	"Krishna Chaitanya Chundru" <krishna.chundru@oss.qualcomm.com>,
	"Lukas Wunner" <lukas@wunner.de>,
	"Wilson Ding" <dingwei@marvell.com>,
	"Miles Chen" <minhuachen@google.com>
Subject: Re: [PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a platform specific way
Date: Wed, 8 Apr 2026 18:58:34 -0700
Message-ID: <adcHylFjFjhHT-tP@google.com>
In-Reply-To: <AS8PR04MB883389FD2A016F9E02756B048C49A@AS8PR04MB8833.eurprd04.prod.outlook.com>

Hi Richard and Mani,

For the record, I've been using a form of an earlier version of this
patchset in my environment for some time now, and I've run across
problems that *might* relate to what Richard is reporting, but I'm not
quite sure at the moment. Details below.

On Wed, Mar 25, 2026 at 07:06:49AM +0000, Hongxing Zhu wrote:
> Hi Mani:
> I've accidentally encountered a new issue based on the reset root port patch-set.
> After performing a few hot-reset operations, the PCIe link enters a continuous up/down cycling pattern.
> 
> I found that calling pci_reset_secondary_bus() first in pcibios_reset_secondary_bus() appears to resolve this issue.
> Have you experienced a similar problem?
> 
> "
> ...
> [  141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [  142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [  142.092038] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> ...
> "
> 
> Platform: i.MX95 EVK board plus local Root Ports reset supports based on the #1 and #2 patches of v7 patch-set.
> Notes of the logs:
> - One Gen3 NVME device is connected.
> - "./memtool 4c341058=0;./memtool 4c341058=1;" is used to toggle the LTSSM_EN bit to trigger the link down.
> - Toggle BIT6 of Bridge Control Register to trigger hot reset by "./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;"
> - The Root Port reset patches works correctly at first.
> However, after several hot-reset triggers, the link enters a repeated down/up cycling state.
> 
> Logs:
> [    3.553188] imx6q-pcie 4c300000.pcie: host bridge /soc/pcie@4c300000 ranges:
> [    3.560308] imx6q-pcie 4c300000.pcie:       IO 0x006ff00000..0x006fffffff -> 0x0000000000
> [    3.568525] imx6q-pcie 4c300000.pcie:      MEM 0x0910000000..0x091fffffff -> 0x0010000000
> [    3.577314] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 == cpu 0x60100000
> [    3.796029] imx6q-pcie 4c300000.pcie: iATU: unroll T, 128 ob, 128 ib, align 4K, limit 1024G
> [    4.003746] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [    4.009553] imx6q-pcie 4c300000.pcie: PCI host bridge to bus 0000:00
> root@imx95evk:~#
> root@imx95evk:~#
> root@imx95evk:~# ./memtool 4c341058=0;./memtool 4c341058=1;
> Writing 32-bit value 0x0 to address 0x4C341058
> Writing 32-bit value 0x1 to address 0x4C341058
> [   87.265348] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d01) link down detected
> [   87.273106] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [   87.281264] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [   87.289245] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> root@imx95evk:~#
> [   87.514216] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [   87.702968] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [   87.834983] pcieport 0000:00:00.0: Root Port has been reset
> [   87.840714] pcieport 0000:00:00.0: AER: device recovery failed
> [   87.846592] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> [   87.855947] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring

I've seen this same line ("bridge configuration invalid") before, and I
believe that's because the saved state (pci_save_state(); more about
this below) is invalid -- it contains 0 values in places where they
should be non-zero. So when those values are restored
(pci_restore_state()), we get confused.
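
A minimal user-space sketch of why restoring all-zero bus numbers would
trigger that message. It is modeled on the bus-number sanity check that
the PCI core is understood to apply when scanning a bridge; the exact
condition is an assumption here, and bridge_buses_valid() is a
hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): decode the 32-bit value read
 * from the bridge's bus-number register (primary in bits 7:0, secondary
 * in bits 15:8, subordinate in bits 23:16) and check that it is sane
 * for a bridge that sits on parent_bus.
 */
static bool bridge_buses_valid(uint32_t buses, uint8_t parent_bus)
{
	uint8_t primary = buses & 0xff;
	uint8_t secondary = (buses >> 8) & 0xff;
	uint8_t subordinate = (buses >> 16) & 0xff;

	/* Assumed check: a mismatch here is treated as "invalid". */
	if (primary != parent_bus)
		return false;
	if (secondary <= parent_bus)
		return false;
	if (secondary > subordinate)
		return false;

	return true;
}
```

With a typical Root Port value such as 0x00ff0100 (primary 0, secondary
1, subordinate 0xff) on bus 0, the helper returns true. With an all-zero
value it returns false because the secondary bus is not above the parent
bus, which is consistent with the "[bus 00-00]" in the log.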

I believe we've pinned down one reason this invalid state occurs -- it's
because of an automatic (mis)feature in the DesignWare PCIe hardware.
Specifically, it's because of what the controller does during a surprise
link-down error.

From the DesignWare docs:

  "[...] during normal operation, the link might fail and go down. After
  this link-down event, the controller requests the DWC_pcie_clkrst.v
  module to hot-reset the controller. There is no difference in the
  handling of a link-down reset or a hot reset; the controller asserts
  the link_req_rst_not output requesting the DWC_pcie_clkrst.v module to
  reset the controller."

Some of the adjacent documentation suggests (and local testing confirms)
that this automatic reset also resets various DBI (i.e., PCIe config
space) registers. There also doesn't seem to be a good way to completely
stop this automatic reset -- the docs mention some SW methods to prevent
the reset, but they all seem racy or incomplete.

Anyway, I think this implies that patch 1 is somewhat wrong [1]. It
includes some code like this:

		pci_save_state(dev);
		ret = host->reset_root_port(host, dev);
		if (ret)
			pci_err(dev, "Failed to reset Root Port: %d\n", ret);
		else
			/* Now restore it on success */
			pci_restore_state(dev);

That first line (pci_save_state()) is prone to saving invalid state,
depending on whether the link-down event has finished flushing and
resetting the controller yet or not. The resulting impact is a bit hard
to judge, depending on what (mis)configuration you end up with.

I also noticed commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times") was merged recently. With that change, I
believe it is now safe to perform pci_restore_state() even without
pci_save_state() here.

So ... can we remove pci_save_state() from
pcibios_reset_secondary_bus()? Might that help? It sounds like my above
observations *may* match Richard's reports, but I'm not sure. And
anyway, the documented hardware behavior is racy, so it's hard to
propose a foolproof solution.
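
To make the race and the proposal concrete, here is a toy user-space
model, not kernel code. Every name in it (struct toy_port and the toy_*
and recover_* functions) is hypothetical, and it assumes that a
known-good snapshot already exists before the link goes down, which is
what the commit mentioned above appears to provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy Root Port: only the bus-number register of config space. */
struct toy_port {
	uint32_t buses;		/* live register value */
	uint32_t saved_buses;	/* most recent snapshot */
};

/* Assumption: a snapshot is taken once while the port is healthy. */
static void toy_port_init(struct toy_port *p, uint32_t buses)
{
	p->buses = buses;
	p->saved_buses = buses;
}

/* Stand-in for pci_save_state(): snapshot whatever is live right now. */
static void toy_save_state(struct toy_port *p)
{
	p->saved_buses = p->buses;
}

/* Stand-in for pci_restore_state(): write the snapshot back. */
static void toy_restore_state(struct toy_port *p)
{
	p->buses = p->saved_buses;
}

/* Stand-in for any reset that clears the registers. */
static void toy_reset(struct toy_port *p)
{
	p->buses = 0;
}

/* Ordering from the quoted snippet: save, reset, restore. */
static uint32_t recover_with_save(struct toy_port *p, bool auto_reset_done)
{
	if (auto_reset_done)
		toy_reset(p);	/* hardware link-down reset won the race */
	toy_save_state(p);	/* may capture zeros */
	toy_reset(p);		/* platform Root Port reset */
	toy_restore_state(p);
	return p->buses;
}

/* Proposed ordering: rely on the earlier snapshot, do not save here. */
static uint32_t recover_without_save(struct toy_port *p, bool auto_reset_done)
{
	if (auto_reset_done)
		toy_reset(p);
	toy_reset(p);
	toy_restore_state(p);
	return p->buses;
}
```

In this model, recover_with_save() returns the original value only when
the automatic reset has not yet happened, and returns 0 otherwise.
recover_without_save() returns the original value in both cases. The
model says nothing about whether the real hardware leaves other state
inconsistent.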

Brian

[1] At least, for DesignWare controllers.

> [   87.864423] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> 
> root@imx95evk:~#
> root@imx95evk:~# cat /proc/interrupts | grep lnk;
> 273:          2          0          0          0          0          0    GICv3 342 Level     PCIe PME, lnk_notify
> root@imx95evk:~#
> root@imx95evk:~#
> root@imx95evk:~# ./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;
> Writing 32-bit value 0x4001FF to address 0x4C30003C
> Writing 32-bit value 0x1FF to address 0x4C30003C
> [  107.028086] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d00) link down detected
> [  107.037018] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [  107.045137] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [  107.053332] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> root@imx95evk:~#
> [  107.282146] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [  107.470801] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [  107.602823] pcieport 0000:00:00.0: Root Port has been reset
> [  107.608601] pcieport 0000:00:00.0: AER: device recovery failed
> [  107.614497] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> [  107.623805] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [  107.632281] pci_bus 0000:01: busn_res: [bus 01] end is updated to 01
> 
> root@imx95evk:~#
> root@imx95evk:~# cat /proc/interrupts | grep lnk;
> 273:          4          0          0          0          0          0    GICv3 342 Level     PCIe PME, lnk_notify
> root@imx95evk:~#
> root@imx95evk:~# ./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;
> Writing 32-bit value 0x4001FF to address 0x4C30003C
> Writing 32-bit value 0x1FF to address 0x4C30003C
> [  133.424041] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d00) link down detected
> [  133.432954] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [  133.441106] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [  133.449309] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> root@imx95evk:~#
> [  133.677824] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [  133.870414] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [  134.002534] pcieport 0000:00:00.0: Root Port has been reset
> [  134.008307] pcieport 0000:00:00.0: AER: device recovery failed
> [  134.014193] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> [  134.023418] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [  134.031881] pci_bus 0000:01: busn_res: [bus 01] end is updated to 01
> 
> root@imx95evk:~# ./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;
> Writing 32-bit value 0x4001FF to address 0x4C30003C
> Writing 32-bit value 0x1FF to address 0x4C30003C
> [  140.149713] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d00) link down detected
> [  140.158614] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [  140.166779] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [  140.174981] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> root@imx95evk:~#
> [  140.401605] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [  140.590491] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [  140.596206] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> 
> root@imx95evk:~#
> [  141.630311] pcieport 0000:00:00.0: Data Link Layer Link Active not set in 100 msec
> [  141.637950] pcieport 0000:00:00.0: Failed to reset Root Port: -25
> [  141.644095] pcieport 0000:00:00.0: AER: subordinate device reset failed
> [  141.650883] pcieport 0000:00:00.0: AER: device recovery failed
> [  141.656784] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [  141.663520] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [  141.670271] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> [  141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [  142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [  142.092038] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> [  143.126273] pcieport 0000:00:00.0: Data Link Layer Link Active not set in 100 msec
> [  143.133919] pcieport 0000:00:00.0: Failed to reset Root Port: -25
> [  143.140052] pcieport 0000:00:00.0: AER: subordinate device reset failed
> [  143.146747] pcieport 0000:00:00.0: AER: device recovery failed
> [  143.152604] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [  143.159314] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [  143.166022] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> [  143.389723] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [  143.582294] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [  143.587996] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> 
> 
> Thanks.
> Best Regards
> Richard Zhu


