Date: Wed, 8 Apr 2026 18:58:34 -0700
From: Brian Norris
To: Hongxing Zhu
Cc: "manivannan.sadhasivam@oss.qualcomm.com", Bjorn Helgaas,
 Mahesh J Salgaonkar, Oliver O'Halloran, Will Deacon,
 Lorenzo Pieralisi, Krzysztof Wilczyński, Manivannan Sadhasivam,
 Rob Herring, Heiko Stuebner, Philipp Zabel,
 "linux-pci@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "linuxppc-dev@lists.ozlabs.org",
 "linux-arm-kernel@lists.infradead.org",
 "linux-arm-msm@vger.kernel.org",
 "linux-rockchip@lists.infradead.org", Niklas Cassel, Wilfred Mallawa,
 Krishna Chaitanya Chundru, Lukas Wunner, Wilson Ding, Miles Chen
Subject: Re: [PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a platform specific way
References: <20260310-pci-port-reset-v7-0-9dd00ccc25ab@oss.qualcomm.com>

Hi Richard and Mani,

For the record, I've been using a form of an earlier version of this
patchset in my environment for some time now, and I've run across
problems that *might* relate to what Richard is reporting, but I'm not
quite sure at the moment. Details below.

On Wed, Mar 25, 2026 at 07:06:49AM +0000, Hongxing Zhu wrote:
> Hi Mani:
> I've accidentally encountered a new issue based on the reset root port patch-set.
> After performing a few hot-reset operations, the PCIe link enters a continuous up/down cycling pattern.
>
> I found that calling pci_reset_secondary_bus() first in pcibios_reset_secondary_bus() appears to resolve this issue.
> Have you experienced a similar problem?
>
> "
> ...
> [ 141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 142.092038] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> ...
> "
>
> Platform: i.MX95 EVK board plus local Root Ports reset supports based on the #1 and #2 patches of v7 patch-set.
> Notes of the logs:
> - One Gen3 NVME device is connected.
> - "./memtool 4c341058=0;./memtool 4c341058=1;" is used to toggle the LTSSM_EN bit to trigger the link down.
> - Toggle BIT6 of Bridge Control Register to trigger hot reset by "./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;"
> - The Root Port reset patches works correctly at first.
>   However, after several hot-reset triggers, the link enters a repeated down/up cycling state.
>
> Logs:
> [ 3.553188] imx6q-pcie 4c300000.pcie: host bridge /soc/pcie@4c300000 ranges:
> [ 3.560308] imx6q-pcie 4c300000.pcie: IO 0x006ff00000..0x006fffffff -> 0x0000000000
> [ 3.568525] imx6q-pcie 4c300000.pcie: MEM 0x0910000000..0x091fffffff -> 0x0010000000
> [ 3.577314] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 == cpu 0x60100000
> [ 3.796029] imx6q-pcie 4c300000.pcie: iATU: unroll T, 128 ob, 128 ib, align 4K, limit 1024G
> [ 4.003746] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 4.009553] imx6q-pcie 4c300000.pcie: PCI host bridge to bus 0000:00
> root@imx95evk:~#
> root@imx95evk:~#
> root@imx95evk:~# ./memtool 4c341058=0;./memtool 4c341058=1; Writing 32-bit value 0x0 to address 0x4C341058
> Writing 32-bit v
> [ 87.265348] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d01) link down detected
> alue 0x1 to adder
> [ 87.273106] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> ss 0x4C341058
> [ 87.281264] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [ 87.289245] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> root@imx95evk:~# [ 87.514216] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 87.702968] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 87.834983] pcieport 0000:00:00.0: Root Port has been reset
> [ 87.840714] pcieport 0000:00:00.0: AER: device recovery failed
> [ 87.846592] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> [ 87.855947] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring

I've seen this same line ("bridge configuration invalid") before, and I
believe that's because the saved state (pci_save_state(); more about
this below) is invalid -- it contains 0 values in places where they
should be non-zero. So when those values are restored
(pci_restore_state()), we get confused.

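As a rough userspace sketch of that failure mode (not kernel code; every
fake_* name below is invented for illustration, and the validity check
is a simplification of what the PCI core does), consider a bridge whose
bus-number registers get zeroed by some reset before the state is saved:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a bridge's bus-number registers. */
struct fake_bridge {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

/* Hypothetical stand-in for the saved config-space snapshot. */
struct fake_saved_state {
	struct fake_bridge regs;
};

/* Models a reset that clears the registers behind software's back. */
static void fake_hw_auto_reset(struct fake_bridge *b)
{
	memset(b, 0, sizeof(*b));
}

/* Snapshot whatever the registers hold right now, valid or not. */
static void fake_save_state(const struct fake_bridge *b,
			    struct fake_saved_state *s)
{
	s->regs = *b;
}

/* Write the snapshot back into the registers. */
static void fake_restore_state(struct fake_bridge *b,
			       const struct fake_saved_state *s)
{
	*b = s->regs;
}

/* Simplified sanity check: the secondary bus must sit below the bridge. */
static int fake_bridge_config_valid(const struct fake_bridge *b)
{
	return b->secondary > b->primary && b->subordinate >= b->secondary;
}

/*
 * Returns 1 if the bridge ends up with a sane configuration, 0 if not.
 * save_after_hw_reset selects whether a second save happens after the
 * registers were already cleared.
 */
static int fake_recover(int save_after_hw_reset)
{
	struct fake_bridge b = { .primary = 0, .secondary = 1, .subordinate = 1 };
	struct fake_saved_state s;

	fake_save_state(&b, &s);	/* good snapshot taken earlier */
	fake_hw_auto_reset(&b);		/* registers are now all zero */

	if (save_after_hw_reset)
		fake_save_state(&b, &s);	/* overwrites the good snapshot */

	fake_restore_state(&b, &s);
	return fake_bridge_config_valid(&b);
}
```

Under these assumptions, fake_recover(1) restores an all-zero snapshot
(the "[bus 00-00]" shape seen in the log) and returns 0, while
fake_recover(0) restores the earlier snapshot and returns 1.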
I believe we've pinned down one reason this invalid state occurs -- it's
because of an automatic (mis)feature in the DesignWare PCIe hardware.
Specifically, it's because of what the controller does during a
surprise link-down error. From the DesignWare docs:

  "[...] during normal operation, the link might fail and go down. After
  this link-down event, the controller requests the DWC_pcie_clkrst.v
  module to hot-reset the controller. There is no difference in the
  handling of a link-down reset or a hot reset; the controller asserts
  the link_req_rst_not output requesting the DWC_pcie_clkrst.v module to
  reset the controller."

Some of the adjacent documentation suggests (and local testing
confirms) that this automatic reset will also reset various DBI (i.e.,
PCIe config space) registers.

It also seems as if there's not really a good way to completely stop
this automatic reset -- the docs mention some SW methods to prevent the
reset, but they all seem racy or incomplete.

Anyway, I think this implies that patch 1 is somewhat wrong [1]. It
includes some code like this:

	pci_save_state(dev);

	ret = host->reset_root_port(host, dev);
	if (ret)
		pci_err(dev, "Failed to reset Root Port: %d\n", ret);
	else
		/* Now restore it on success */
		pci_restore_state(dev);

That first line (pci_save_state()) is prone to saving invalid state,
depending on whether the link-down event has finished flushing and
resetting the controller yet or not. The resulting impact is a bit hard
to judge, depending on what (mis)configuration you end up with.

I also noticed commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times") was merged recently. With that change, I
believe it is now safe to perform pci_restore_state() even without
pci_save_state() here.

So ... can we remove pci_save_state() from
pcibios_reset_secondary_bus()? Might that help?

It sounds like my above observations *may* match Richard's reports, but
I'm not sure.
And anyway, the documented hardware behavior is racy, so it's hard to
propose a foolproof solution.

Brian

[1] At least, for DesignWare controllers.

> [ 87.864423] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>
> root@imx95evk:~#
> root@imx95evk:~# cat /proc/interrupts | grep lnk;
> 273: 2 0 0 0 0 0 GICv3 342 Level PCIe PME, lnk_notify
> root@imx95evk:~#
> root@imx95evk:~#
> root@imx95evk:~# ./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff; Writing 32-bit va
> [ 107.028086] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d00) link down detected lue 0x4001FF to a
> [ 107.037018] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down ddress 0x4C30003C
> [ 107.045137] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
>
> Writing 32-bit
> [ 107.053332] pci 0000:01:00.0: AER: can't recover (no error_detected callback) value 0x1FF to address 0x4C30003C root@imx95evk:~#
> [ 107.282146] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 107.470801] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 107.602823] pcieport 0000:00:00.0: Root Port has been reset
> [ 107.608601] pcieport 0000:00:00.0: AER: device recovery failed
> [ 107.614497] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> [ 107.623805] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 107.632281] pci_bus 0000:01: busn_res: [bus 01] end is updated to 01
>
> root@imx95evk:~#
> root@imx95evk:~# cat /proc/interrupts | grep lnk;
> 273: 4 0 0 0 0 0 GICv3 342 Level PCIe PME, lnk_notify
> root@imx95evk:~#
> root@imx95evk:~# ./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff; Writing 32-bit va
> [ 133.424041] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d00) link down detected lue 0x4001FF to a
> [ 133.432954] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down ddress 0x4C30003C
> [ 133.441106] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
>
> Writing 32-bit
> [ 133.449309] pci 0000:01:00.0: AER: can't recover (no error_detected callback) value 0x1FF to address 0x4C30003C root@imx95evk:~#
> [ 133.677824] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 133.870414] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 134.002534] pcieport 0000:00:00.0: Root Port has been reset
> [ 134.008307] pcieport 0000:00:00.0: AER: device recovery failed
> [ 134.014193] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> [ 134.023418] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 134.031881] pci_bus 0000:01: busn_res: [bus 01] end is updated to 01
>
> root@imx95evk:~# ./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff; Writing 32-bit va
> [ 140.149713] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d00) link down detected lue 0x4001FF to a
> [ 140.158614] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down ddress 0x4C30003C
> [ 140.166779] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [ 140.174981] pci 0000:01:00.0: AER: can't recover (no error_detected callback) Writing 32-bit value 0x1FF to address 0x4C30003C root@imx95evk:~#
> [ 140.401605] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 140.590491] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 140.596206] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
>
> root@imx95evk:~#
> [ 141.630311] pcieport 0000:00:00.0: Data Link Layer Link Active not set in 100 msec
> [ 141.637950] pcieport 0000:00:00.0: Failed to reset Root Port: -25
> [ 141.644095] pcieport 0000:00:00.0: AER: subordinate device reset failed
> [ 141.650883] pcieport 0000:00:00.0: AER: device recovery failed
> [ 141.656784] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [ 141.663520] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [ 141.670271] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> [ 141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 142.092038] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> [ 143.126273] pcieport 0000:00:00.0: Data Link Layer Link Active not set in 100 msec
> [ 143.133919] pcieport 0000:00:00.0: Failed to reset Root Port: -25
> [ 143.140052] pcieport 0000:00:00.0: AER: subordinate device reset failed
> [ 143.146747] pcieport 0000:00:00.0: AER: device recovery failed
> [ 143.152604] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> [ 143.159314] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> [ 143.166022] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> [ 143.389723] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> [ 143.582294] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> [ 143.587996] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
>
>
> Thanks.
> Best Regards
> Richard Zhu