Re: [EXTERNAL] Re: [PATCH 1/1] PCI: armada8k: Add link-down handle

From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
To: Wilson Ding <dingwei@marvell.com>
Cc: "cassel@kernel.org" <cassel@kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"lpieralisi@kernel.org" <lpieralisi@kernel.org>,
	"thomas.petazzoni@bootlin.com" <thomas.petazzoni@bootlin.com>,
	"kw@linux.com" <kw@linux.com>,
	"robh@kernel.org" <robh@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Sanghoon Lee <salee@marvell.com>
Subject: Re: [EXTERNAL] Re: [PATCH 1/1] PCI: armada8k: Add link-down handle
Date: Sat, 8 Feb 2025 16:06:18 +0530
Message-ID: <20250208103618.2binrjgry7ghoavc@thinkpad>
In-Reply-To: <BY3PR18MB46738F5857319F9637FA5050A7F12@BY3PR18MB4673.namprd18.prod.outlook.com>

On Fri, Feb 07, 2025 at 06:46:22PM +0000, Wilson Ding wrote:
> 
> 
> > -----Original Message-----
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Sent: Friday, February 7, 2025 9:58 AM
> > To: Wilson Ding <dingwei@marvell.com>; cassel@kernel.org
> > Cc: Bjorn Helgaas <helgaas@kernel.org>; lpieralisi@kernel.org;
> > thomas.petazzoni@bootlin.com; kw@linux.com; robh@kernel.org;
> > bhelgaas@google.com; linux-pci@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Sanghoon Lee
> > <salee@marvell.com>
> > Subject: [EXTERNAL] Re: [PATCH 1/1] PCI: armada8k: Add link-down handle
> > 
> > + Niklas (who was interested in link down handling)
> > 
> > On Sat, Feb 01, 2025 at 11:05:56PM +0000, Wilson Ding wrote:
> > > > On November 13, 2024 3:02:55 AM GMT+05:30, Bjorn Helgaas
> > > > <mailto:helgaas@kernel.org> wrote:
> > > > >In subject:
> > > > >
> > > > >  PCI: armada8k: Add link-down handling
> > > > >
> > > > >On Mon, Nov 11, 2024 at 10:48:13PM -0800, Jenishkumar Maheshbhai Patel wrote:
> > > > >> In PCIE ISR routine caused by RST_LINK_DOWN we schedule work to
> > > > >> handle the link-down procedure.
> > > > >> Link-down procedure will:
> > > > >> 1. Remove PCIe bus
> > > > >> 2. Reset the MAC
> > > > >> 3. Reconfigure link back up
> > > > >> 4. Rescan PCIe bus
> > > > >
> > > > >s/PCIE/PCIe/
> > > > >
> > > > >Rewrap to fill 75 columns.
> > > > >
> > > > >I assume this basically removes a Root Port (and the hierarchy
> > > > >below it) if the link goes down, and then resets the MAC and tries
> > > > >to bring up the link and enumerate the hierarchy again.
> > > > >
> > > > >No other drivers do this, so why does armada8k need it?  Is this to
> > > > >work around some unreliable link?
> > > >
> > > > Certainly Qcom IPs have this same feature and I was also looking to
> > > > implement it. But the link down should not be handled like this in
> > > > the controller driver.
> > > >
> > > > Instead, it should be tied to bus reset in the core and the reset
> > > > should be done through a callback implemented in the controller
> > > > drivers. This way, the reset cannot happen behind the back of the
> > > > PCI core and client drivers.
> > > >
> > > > That said, the Link down IRQ received by this driver should also be
> > > > propagated back to the PCI core and the core should then call the
> > > > callback to reset the bus that I mentioned above.
> > > >
> > >
> > > It's more than a workaround for an unreliable link. A few customers
> > > may have this kind of application: an independent power supply to the
> > > device with a dedicated reset GPIO wired to PERST#. This way, a power
> > > cycle or warm reset of the RC and the EP has no impact on the other.
> > > However, it may leave the PCI driver unaware of the link going down
> > > when an unexpected power down or reset occurs on the device.
> > > We cannot assume the link will recover soon. Worse, the driver may
> > > continue to access the device, which may hang the bus.
> > > Since the device is no longer present on the bus, it's better to
> > > remove it. Besides, the only way to bring the link back up is to
> > > reset the MAC, which restarts the LTSSM state machine.
> > >
> > > Well, we also noticed that no other driver does this. I agree it is
> > > not necessary if the power cycle or warm reset of the device is done
> > > gracefully: the user can remove the device prior to the power
> > > cycle/reset and rescan after the link is recovered. However, an
> > > unexpected power down is still possible.
> > > Please enlighten me if there is a better approach to handling such
> > > an unexpected link down.
> > >
> > 
> > There is no issue in retraining the link. My concern is that the retrain
> > should not happen autonomously in the controller driver. The PCI core
> > should be informed of it. More below.
> > 
> 
> Do you mean:
> - pass the link down/up events to the PCI core
> - have the PCI core remove the device or hierarchy upon link down
> - initiate link retraining in the PCI core by calling the platform retrain callbacks
> - rescan the bus once the link is recovered
> 

Yeah. This is what I came up with quickly:

```
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b6536ed599c3..561eeb464220 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -706,6 +706,33 @@ void pci_free_host_bridge(struct pci_host_bridge *bridge)
 }
 EXPORT_SYMBOL(pci_free_host_bridge);
 
+void pci_host_bridge_handle_link_down(struct pci_host_bridge *bridge)
+{
+       struct pci_bus *bus = bridge->bus;
+       struct pci_dev *child, *tmp;
+       int ret;
+
+       pci_lock_rescan_remove();
+
+       /* Knock the devices off bus since we cannot access them */
+       list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
+               pci_stop_and_remove_bus_device(child);
+
+       /* Now retrain the link in a controller specific way to bring it back */
+       if (bus->ops->retrain_link) {
+               ret = bus->ops->retrain_link(bus);
+               if (ret) {
+                       dev_err(&bridge->dev, "Failed to retrain the link!\n");
+                       pci_unlock_rescan_remove();
+                       return;
+               }
+       }
+
+       pci_rescan_bus(bus);
+       pci_unlock_rescan_remove();
+}
+EXPORT_SYMBOL(pci_host_bridge_handle_link_down);
+
 /* Indexed by PCI_X_SSTATUS_FREQ (secondary bus mode and frequency) */
 static const unsigned char pcix_bus_speed[] = {
        PCI_SPEED_UNKNOWN,              /* 0 */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 47b31ad724fa..1c6f18a51bdd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -637,6 +637,7 @@ struct pci_host_bridge *pci_alloc_host_bridge(size_t priv);
 struct pci_host_bridge *devm_pci_alloc_host_bridge(struct device *dev,
                                                   size_t priv);
 void pci_free_host_bridge(struct pci_host_bridge *bridge);
+void pci_host_bridge_handle_link_down(struct pci_host_bridge *bridge);
 struct pci_host_bridge *pci_find_host_bridge(struct pci_bus *bus);
 
 void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
@@ -804,6 +805,7 @@ struct pci_ops {
        void __iomem *(*map_bus)(struct pci_bus *bus, unsigned int devfn, int where);
        int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
        int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
+       int (*retrain_link)(struct pci_bus *bus);
 };
 
 /*
```

Your controller driver has to call pci_host_bridge_handle_link_down() when it
receives the link down event. Make the IRQ handler threaded if it is not
already, since the helper takes the rescan/remove lock and can sleep. You
should also populate the pci_ops::retrain_link() callback with the function
that retrains the broken link. Once the retrain succeeds, the helper rescans
the bus to enumerate the devices again.
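
To make the calling sequence a bit more concrete, here is a rough user-space
model of the flow, not kernel code. Everything prefixed with model_ is a
hypothetical stand-in for the real structures (struct pci_bus, struct pci_ops),
and unlike the void helper in the diff, the model returns an error code so the
failure path is easy to observe:

```c
/*
 * User-space model only: every type and function here is a hypothetical
 * stand-in for illustration, not the kernel API.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_bus;

/* Stand-in for struct pci_ops with the proposed retrain_link() hook. */
struct model_ops {
	int (*retrain_link)(struct model_bus *bus);
};

/* Stand-in for struct pci_bus. */
struct model_bus {
	const struct model_ops *ops;
	size_t ndevs;	/* devices currently known to the "core" */
	size_t present;	/* devices that answer once the link is up */
	bool link_up;
};

/* Models the core helper: remove devices, retrain, then rescan. */
static int model_handle_link_down(struct model_bus *bus)
{
	int ret;

	/* Devices are unreachable while the link is down, so drop them. */
	bus->ndevs = 0;

	/* Ask the controller driver to bring the link back, if it can. */
	if (bus->ops && bus->ops->retrain_link) {
		ret = bus->ops->retrain_link(bus);
		if (ret) {
			fprintf(stderr, "Failed to retrain the link!\n");
			return ret;
		}
	}

	/* Rescan: devices are only found if the link actually came up. */
	bus->ndevs = bus->link_up ? bus->present : 0;
	return 0;
}

/* Hypothetical controller callback: MAC reset worked, link is back. */
static int model_retrain_ok(struct model_bus *bus)
{
	bus->link_up = true;
	return 0;
}

/* Hypothetical controller callback: the link stays down. */
static int model_retrain_fail(struct model_bus *bus)
{
	bus->link_up = false;
	return -1;
}
```

With model_retrain_ok() installed the devices reappear after the call, and with
model_retrain_fail() the bus stays empty, which is the behaviour the helper in
the diff is aiming for.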

I do have plans to hook this retrain callback into one of the bus_reset()
functions in the future so that we can bring the link back while doing a bus
level reset (uncorrectable AER errors and such). But this will do the job for now.

I will send a series on Monday with Qcom driver as a reference.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

