* [RFC] PCI: pwrctrl and link-up dependencies
@ 2025-04-08 19:59 Brian Norris
  2025-04-08 21:26 ` Brian Norris
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Brian Norris @ 2025-04-08 19:59 UTC (permalink / raw)
  To: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han,
	Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński
  Cc: Krishna Chaitanya Chundru, Rob Herring, linux-pci, linux-kernel,
	linux-arm-msm, dmitry.baryshkov, Tsai Sung-Fu

TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
often run before pwrctrl gets involved. I'm exploring options to resolve
this.

Hi all,

I'm currently looking at reworking how some (currently out-of-tree, but I'm
hoping to change that) pcie-designware based drivers integrate power sequencing
for their endpoint devices, as well as the corresponding start_link()
functionality.

For power sequencing, drivers/pci/pwrctrl/ looks like a very good start at what
we need, since we have various device-specific regulators, GPIOs, and
sequencing requirements, which we'd prefer not to encode directly in the
controller driver.

For link startup, pcie-designware-host.c currently
(a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
(b) waits for the link training to complete.
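
Concretely, the tail end of dw_pcie_host_init() does roughly the following
(paraphrased sketch, not the literal upstream code; details vary by kernel
version):

  if (!dw_pcie_link_up(pci)) {
          ret = dw_pcie_start_link(pci);  /* (a): dw_pcie::ops::start_link() */
          if (ret)
                  goto err_out;
  }

  /* (b): errors ignored, the link may come up later */
  if (!pp->use_linkup_irq)
          dw_pcie_wait_for_link(pci);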

However, (b) will fail if the other end of the link is not powered up --
e.g., if the appropriate pwrctrl driver has not yet loaded, or its
device hasn't finished probing. Today, this can mean the designware
driver will either fail to probe, or at least waste time for a condition
that we can't achieve (link up), depending on the HW/driver
implementation.

I'm wondering how any designware-based platforms (on which I believe pwrctrl
was developed) actually support this, and how I should look to integrate
additional platforms/drivers. From what I can tell, the only way things would
work today would either be if:
(1) a given platform uses the dw_pcie_rp::use_linkup_irq==true functionality,
    which means pcie-designware-host will only start the link, but not wait for
    training to succeed. (And presumably the controller will receive its
    link-up IRQ after power sequencing is done, at which point both pwrctrl and
    the IRQ may rescan the PCI bus.) Or:
(2) pci/pwrctrl sequencing only brings up some non-critical power rails for the
    device in question, so link-up can actually succeed even without
    pwrctrl.

My guess is that (1) is the case, and specifically that the relevant folks are
using the pcie-qcom.c, with its "global" IRQ used for link-up events.

So how should I replicate this on other designware-based platforms? I suppose
if the platform in question has a link-up IRQ, I can imitate (1). But what if
it doesn't? (Today, we don't validate and utilize a link-up IRQ, although it's
possible there is one available. Additionally, we use various retry mechanisms
today, which don't trivially fit into this framework, as we won't know when
precisely to retry, if power sequencing is controlled entirely by some other
entity.)

Would it make sense to introduce some sort of pwrctrl -> start_link()
dependency? For example, I see similar work done in this series [1], for
slightly different reasons. In short, that series adds new
pci_ops::{start,stop}_link() callbacks, and teaches a single pwrctrl driver to
stop and restart the bridge link before/after powering things up.
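
From my reading of [1], the shape is roughly the following (signatures
paraphrased, so the actual series may differ in detail):

  struct pci_ops {
          /* ... existing config accessors, etc. ... */
          int (*stop_link)(struct pci_bus *bus);
          int (*start_link)(struct pci_bus *bus);
  };

with the tc956x pwrctrl driver calling stop_link(), doing its device-specific
power-up/configuration, and then calling start_link() again.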

I also see that Manivannan has a proposal out [2] to add semi-generic
link-down + retraining support to core code. It treads somewhat similar
ground, and I could even imagine that its pci_ops::retrain_link()
callback could be reimplemented in terms of the aforementioned
pci_ops::{start,stop}_link(), or possibly vice versa.

Any thoughts here? Sorry for a lot of text and no patch, but I didn't just want
to start off by throwing a 3rd set of patches on top of the existing ones that
tread similar ground[1][2].

Regards,
Brian

[1] [PATCH v4 00/10] PCI: Enable Power and configure the TC956x PCIe switch
https://lore.kernel.org/linux-pci/20250225-qps615_v4_1-v4-0-e08633a7bdf8@oss.qualcomm.com/

[PATCH v4 03/10] PCI: Add new start_link() & stop_link function ops
https://lore.kernel.org/linux-pci/20250225-qps615_v4_1-v4-3-e08633a7bdf8@oss.qualcomm.com/

[...]
[PATCH v4 08/10] PCI: pwrctrl: Add power control driver for tc956x
https://lore.kernel.org/linux-pci/20250225-qps615_v4_1-v4-8-e08633a7bdf8@oss.qualcomm.com/

[2] [PATCH 0/2] PCI: Add support for handling link down event from host bridge drivers
https://lore.kernel.org/linux-pci/20250221172309.120009-1-manivannan.sadhasivam@linaro.org/


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-08 19:59 [RFC] PCI: pwrctrl and link-up dependencies Brian Norris
@ 2025-04-08 21:26 ` Brian Norris
  2025-04-14 11:07   ` Manivannan Sadhasivam
  2025-04-10  9:59 ` Niklas Cassel
  2025-04-14 10:57 ` Manivannan Sadhasivam
  2 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2025-04-08 21:26 UTC (permalink / raw)
  To: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han,
	Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński
  Cc: Krishna Chaitanya Chundru, Rob Herring, linux-pci, linux-kernel,
	linux-arm-msm, dmitry.baryshkov, Tsai Sung-Fu, Jim Quinlan,
	Nicolas Saenz Julienne, Florian Fainelli,
	Broadcom internal kernel review list

+ adding pcie-brcmstb.c folks

On Tue, Apr 08, 2025 at 12:59:39PM -0700, Brian Norris wrote:
> TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
> often run before pwrctrl gets involved. I'm exploring options to resolve
> this.

Apologies if a quick self-reply is considered noisy or rude, but I
nearly forgot that I previously was looking at "pwrctrl"-like
functionality and noticed that drivers/pci/controller/pcie-brcmstb.c has
had a portion of the same "pwrctrl" functionality for some time (commit
93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
regulators")).

Notably, it performs its power sequencing before starting its link, for
(I believe) exactly the same reasons as I mention below. While I'm sure
it could theoretically be nice for them to be able to use
drivers/pci/pwrctrl/, I expect they cannot today, for the same reasons.

Brian

P.S. My original post is here, in case you're interested:
https://lore.kernel.org/linux-pci/Z_WAKDjIeOjlghVs@google.com/
I also leave the rest of the message intact below.

> Hi all,
> 
> I'm currently looking at reworking how some (currently out-of-tree, but I'm
> hoping to change that) pcie-designware based drivers integrate power sequencing
> for their endpoint devices, as well as the corresponding start_link()
> functionality.
> 
> For power sequencing, drivers/pci/pwrctrl/ looks like a very good start at what
> we need, since we have various device-specific regulators, GPIOs, and
> sequencing requirements, which we'd prefer not to encode directly in the
> controller driver.
> 
> For link startup, pcie-designware-host.c currently
> (a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
> (b) waits for the link training to complete.
> 
> However, (b) will fail if the other end of the link is not powered up --
> e.g., if the appropriate pwrctrl driver has not yet loaded, or its
> device hasn't finished probing. Today, this can mean the designware
> driver will either fail to probe, or at least waste time for a condition
> that we can't achieve (link up), depending on the HW/driver
> implementation.
> 
> I'm wondering how any designware-based platforms (on which I believe pwrctrl
> was developed) actually support this, and how I should look to integrate
> additional platforms/drivers. From what I can tell, the only way things would
> work today would either be if:
> (1) a given platform uses the dw_pcie_rp::use_linkup_irq==true functionality,
>     which means pcie-designware-host will only start the link, but not wait for
>     training to succeed. (And presumably the controller will receive its
>     link-up IRQ after power sequencing is done, at which point both pwrctrl and
>     the IRQ may rescan the PCI bus.) Or:
> (2) pci/pwrctrl sequencing only brings up some non-critical power rails for the
>     device in question, so link-up can actually succeed even without
>     pwrctrl.
> 
> My guess is that (1) is the case, and specifically that the relevant folks are
> using the pcie-qcom.c, with its "global" IRQ used for link-up events.
> 
> So how should I replicate this on other designware-based platforms? I suppose
> if the platform in question has a link-up IRQ, I can imitate (1). But what if
> it doesn't? (Today, we don't validate and utilize a link-up IRQ, although it's
> possible there is one available. Additionally, we use various retry mechanisms
> today, which don't trivially fit into this framework, as we won't know when
> precisely to retry, if power sequencing is controlled entirely by some other
> entity.)
> 
> Would it make sense to introduce some sort of pwrctrl -> start_link()
> dependency? For example, I see similar work done in this series [1], for
> slightly different reasons. In short, that series adds new
> pci_ops::{start,stop}_link() callbacks, and teaches a single pwrctrl driver to
> stop and restart the bridge link before/after powering things up.
> 
> I also see that Manivannan has a proposal out [2] to add semi-generic
> link-down + retraining support to core code. It treads somewhat similar
> ground, and I could even imagine that its pci_ops::retrain_link()
> callback could be reimplemented in terms of the aforementioned
> pci_ops::{start,stop}_link(), or possibly vice versa.
> 
> Any thoughts here? Sorry for a lot of text and no patch, but I didn't just want
> to start off by throwing a 3rd set of patches on top of the existing ones that
> tread similar ground[1][2].
> 
> Regards,
> Brian
> 
> [1] [PATCH v4 00/10] PCI: Enable Power and configure the TC956x PCIe switch
> https://lore.kernel.org/linux-pci/20250225-qps615_v4_1-v4-0-e08633a7bdf8@oss.qualcomm.com/
> 
> [PATCH v4 03/10] PCI: Add new start_link() & stop_link function ops
> https://lore.kernel.org/linux-pci/20250225-qps615_v4_1-v4-3-e08633a7bdf8@oss.qualcomm.com/
> 
> [...]
> [PATCH v4 08/10] PCI: pwrctrl: Add power control driver for tc956x
> https://lore.kernel.org/linux-pci/20250225-qps615_v4_1-v4-8-e08633a7bdf8@oss.qualcomm.com/
> 
> [2] [PATCH 0/2] PCI: Add support for handling link down event from host bridge drivers
> https://lore.kernel.org/linux-pci/20250221172309.120009-1-manivannan.sadhasivam@linaro.org/


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-08 19:59 [RFC] PCI: pwrctrl and link-up dependencies Brian Norris
  2025-04-08 21:26 ` Brian Norris
@ 2025-04-10  9:59 ` Niklas Cassel
  2025-04-14 10:57 ` Manivannan Sadhasivam
  2 siblings, 0 replies; 12+ messages in thread
From: Niklas Cassel @ 2025-04-10  9:59 UTC (permalink / raw)
  To: Brian Norris
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han,
	Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

On Tue, Apr 08, 2025 at 12:59:36PM -0700, Brian Norris wrote:
> I also see that Manivannan has a proposal out [2] to add semi-generic
> link-down + retraining support to core code. It treads somewhat similar
> ground, and I could even imagine that its pci_ops::retrain_link()
> callback could be reimplemented in terms of the aforementioned
> pci_ops::{start,stop}_link(), or possibly vice versa.

Just want to point out that there is a newer version of this series here:
https://lore.kernel.org/linux-pci/20250404-pcie-reset-slot-v1-0-98952918bf90@linaro.org/T/#t


Kind regards,
Niklas


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-08 19:59 [RFC] PCI: pwrctrl and link-up dependencies Brian Norris
  2025-04-08 21:26 ` Brian Norris
  2025-04-10  9:59 ` Niklas Cassel
@ 2025-04-14 10:57 ` Manivannan Sadhasivam
  2025-04-14 23:24   ` Brian Norris
  2 siblings, 1 reply; 12+ messages in thread
From: Manivannan Sadhasivam @ 2025-04-14 10:57 UTC (permalink / raw)
  To: Brian Norris
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

On Tue, Apr 08, 2025 at 12:59:36PM -0700, Brian Norris wrote:
> TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
> often run before pwrctrl gets involved. I'm exploring options to resolve
> this.
> 
> Hi all,
> 
> I'm currently looking at reworking how some (currently out-of-tree, but I'm
> hoping to change that) pcie-designware based drivers integrate power sequencing
> for their endpoint devices, as well as the corresponding start_link()
> functionality.
> 
> For power sequencing, drivers/pci/pwrctrl/ looks like a very good start at what
> we need, since we have various device-specific regulators, GPIOs, and
> sequencing requirements, which we'd prefer not to encode directly in the
> controller driver.
> 

The naming is a bit confusing, but power sequencing and power control are two
different yet related frameworks. Power sequencing drivers
(drivers/power/sequencing) are targeted at devices with a complex resource
topology that are often accessed by more than one driver, like Wi-Fi + BT
combo PCIe cards. On the other hand, power control (drivers/pci/pwrctrl)
drivers are used to control power to PCIe slots/cards with a simple resource
topology.

> For link startup, pcie-designware-host.c currently
> (a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
> (b) waits for the link training to complete.
> 
> However, (b) will fail if the other end of the link is not powered up --
> e.g., if the appropriate pwrctrl driver has not yet loaded, or its
> device hasn't finished probing. Today, this can mean the designware
> driver will either fail to probe,

This is not correct. The DWC driver will start the LTSSM and wait for the
link to come up if the platform has no way of detecting link-up. But it will
not fail if the link doesn't come up; it will just continue, hoping for the
link to come up later. The LTSSM would be in the Detect.Quiet/Active state
until a link partner is found:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c#n558

> or at least waste time for a condition
> that we can't achieve (link up), depending on the HW/driver
> implementation.
> 

Unfortunately we cannot avoid this waiting time as we don't know if a device is
attached to the bus or not. The 1s wait time predates my involvement with DWC
drivers.

> I'm wondering how any designware-based platforms (on which I believe pwrctrl
> was developed) actually support this, and how I should look to integrate
> additional platforms/drivers. From what I can tell, the only way things would
> work today would either be if:
> (1) a given platform uses the dw_pcie_rp::use_linkup_irq==true functionality,
>     which means pcie-designware-host will only start the link, but not wait for
>     training to succeed. (And presumably the controller will receive its
>     link-up IRQ after power sequencing is done, at which point both pwrctrl and
>     the IRQ may rescan the PCI bus.) Or:
> (2) pci/pwrctrl sequencing only brings up some non-critical power rails for the
>     device in question, so link-up can actually succeed even without
>     pwrctrl.
> 

Again, failing to detect link up will not fail the probe. I don't know how you
derived this conclusion. Even the PCIe spec itself is clear that the link should
stay in Detect.Quiet until it has found the link partner. So failing the probe
means we are introducing a dependency on the devices, which would be bizarre.
Imagine how a controller could ever support hotplug then.

> My guess is that (1) is the case, and specifically that the relevant folks are
> using the pcie-qcom.c, with its "global" IRQ used for link-up events.
> 

We only recently added support for 'Link Up' event through 'global_irq' in the
controller driver. And this was done to avoid waiting for link up during probe
(which is what you/your colleagues also want to avoid I believe). But the
problem in your case is that you are completely skipping the LTSSM and relying
on custom userspace tooling to bring up the device and start LTSSM once done.

> So how should I replicate this on other designware-based platforms? I suppose
> if the platform in question has a link-up IRQ, I can imitate (1). But what if
> it doesn't? (Today, we don't validate and utilize a link-up IRQ, although it's
> possible there is one available. Additionally, we use various retry mechanisms
> today, which don't trivially fit into this framework, as we won't know when
> precisely to retry, if power sequencing is controlled entirely by some other
> entity.)
> 
> Would it make sense to introduce some sort of pwrctrl -> start_link()
> dependency? For example, I see similar work done in this series [1], for
> slightly different reasons. In short, that series adds new
> pci_ops::{start,stop}_link() callbacks, and teaches a single pwrctrl driver to
> stop and restart the bridge link before/after powering things up.
> 

This switch has a crazy requirement for configuring it through I2C. The I2C
configuration has to be done before starting LTSSM. So we disable LTSSM first
since it was enabled way before, then do I2C config and then start LTSSM again.

> I also see that Manivannan has a proposal out [2] to add semi-generic
> link-down + retraining support to core code. It treads somewhat similar
> ground, and I could even imagine that its pci_ops::retrain_link()
> callback could be reimplemented in terms of the aforementioned
> pci_ops::{start,stop}_link(), or possibly vice versa.
> 

Retrain work is mostly to bring up a broken link, which is completely different
from what you are trying to achieve.

> Any thoughts here? Sorry for a lot of text and no patch, but I didn't just want
> to start off by throwing a 3rd set of patches on top of the existing ones that
> tread similar ground[1][2].
> 

No problem. If you want to use pwrctrl in your platform and get rid of the
custom userspace tooling, I'm all in for it. But for that, I need to understand
your controller design first. All I heard so far is, "we want to skip LTSSM and
let our tool take care of it".

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-08 21:26 ` Brian Norris
@ 2025-04-14 11:07   ` Manivannan Sadhasivam
  2025-04-14 23:16     ` Brian Norris
  2025-04-15 18:12     ` Jim Quinlan
  0 siblings, 2 replies; 12+ messages in thread
From: Manivannan Sadhasivam @ 2025-04-14 11:07 UTC (permalink / raw)
  To: Brian Norris
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu, Jim Quinlan, Nicolas Saenz Julienne,
	Florian Fainelli, Broadcom internal kernel review list

On Tue, Apr 08, 2025 at 02:26:24PM -0700, Brian Norris wrote:
> + adding pcie-brcmstb.c folks
> 
> On Tue, Apr 08, 2025 at 12:59:39PM -0700, Brian Norris wrote:
> > TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
> > often run before pwrctrl gets involved. I'm exploring options to resolve
> > this.
> 
> Apologies if a quick self-reply is considered noisy or rude, but I
> nearly forgot that I previously was looking at "pwrctrl"-like
> functionality and noticed that drivers/pci/controller/pcie-brcmstb.c has
> had a portion of the same "pwrctrl" functionality for some time (commit
> 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
> regulators")).
> 

Yes, the goal of the pwrctrl driver is to get rid of this clutter from the
controller drivers.

> Notably, it performs its power sequencing before starting its link, for
> (I believe) exactly the same reasons as I mention below. While I'm sure
> it could theoretically be nice for them to be able to use
> drivers/pci/pwrctrl/, I expect they cannot today, for the same reasons.
> 

If you look into brcm_pcie_add_bus(), they are ignoring the return value of
brcm_pcie_start_link() precisely for the reason I explained in the previous
thread. However, they do check for it in brcm_pcie_resume_noirq() which looks
like a bug as the controller will fail to resume from system suspend if no
devices are connected.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-14 11:07   ` Manivannan Sadhasivam
@ 2025-04-14 23:16     ` Brian Norris
  2025-04-15 18:12     ` Jim Quinlan
  1 sibling, 0 replies; 12+ messages in thread
From: Brian Norris @ 2025-04-14 23:16 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu, Jim Quinlan, Nicolas Saenz Julienne,
	Florian Fainelli, Broadcom internal kernel review list

On Mon, Apr 14, 2025 at 04:37:23PM +0530, Manivannan Sadhasivam wrote:
> If you look into brcm_pcie_add_bus(), they are ignoring the return value of
> brcm_pcie_start_link() precisely for the reason I explained in the previous
> thread. However, they do check for it in brcm_pcie_resume_noirq() which looks
> like a bug as the controller will fail to resume from system suspend if no
> devices are connected.

Ah, I think I was actually looking more at brcm_pcie_resume_noirq(), and
didn't notice that their add_bus() callback ignored errors.

But I think pcie-brcmstb.c still handles things that pwrctrl does not,
and I'm interested in those things. I'll reply more to your other
response, where there's more details.

Thanks,
Brian


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-14 10:57 ` Manivannan Sadhasivam
@ 2025-04-14 23:24   ` Brian Norris
  2025-04-15  5:32     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2025-04-14 23:24 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

Hi Manivannan,

On Mon, Apr 14, 2025 at 04:27:35PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Apr 08, 2025 at 12:59:36PM -0700, Brian Norris wrote:
> > TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
> > often run before pwrctrl gets involved. I'm exploring options to resolve
> > this.
> > 
> > Hi all,
> > 
> > I'm currently looking at reworking how some (currently out-of-tree, but I'm
> > hoping to change that) pcie-designware based drivers integrate power sequencing
> > for their endpoint devices, as well as the corresponding start_link()
> > functionality.
> > 
> > For power sequencing, drivers/pci/pwrctrl/ looks like a very good start at what
> > we need, since we have various device-specific regulators, GPIOs, and
> > sequencing requirements, which we'd prefer not to encode directly in the
> > controller driver.
> > 
> 
> The naming is a bit confusing,

+1

> but power sequencing and power control are two
> different yet related frameworks. Power sequencing drivers
> (drivers/power/sequencing) are targeted at devices with a complex resource
> topology that are often accessed by more than one driver, like Wi-Fi + BT
> combo PCIe cards. On the other hand, power control (drivers/pci/pwrctrl)
> drivers are used to control power to PCIe slots/cards with a simple resource
> topology.

Sure, I get the difference. There can be "sequencing" in the pwrctrl
area too though, because there can be PMICs involved even in a single
PCIe device (i.e., non-shared, not needing "pwrseq" framework) which
require multiple steps (e.g., 2 GPIOs) to power up. Apologies if my
mention of "sequencing" is unclear, but I believe everything I'm
concerned about is in pwrctrl not pwrseq.

> > For link startup, pcie-designware-host.c currently
> > (a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
> > (b) waits for the link training to complete.
> > 
> > However, (b) will fail if the other end of the link is not powered up --
> > e.g., if the appropriate pwrctrl driver has not yet loaded, or its
> > device hasn't finished probing. Today, this can mean the designware
> > driver will either fail to probe,
> 
> This is not correct.

That depends on the implementation of start_link(). But I suppose the
intention is that start_link() only "starts" and doesn't care where
things go from there. (IOW, my local start_link() implementation is
probably wrong at the moment, as it performs some poll/retry steps too.)
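
For the record, I now gather that a "correct" start_link() should do nothing
more than something like this, where all the my_* names are stand-ins for our
out-of-tree platform glue:

  static int my_pcie_start_link(struct dw_pcie *pci)
  {
          struct my_pcie *priv = to_my_pcie(pci);

          /* Kick the LTSSM and return; no polling, no retries */
          my_appl_setbits(priv, MY_APPL_PCIE_CTRL, MY_LTSSM_EN);
          return 0;
  }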

> The DWC driver will start the LTSSM and wait for the
> link to come up if the platform has no way of detecting link-up. But it will
> not fail if the link doesn't come up; it will just continue, hoping for the
> link to come up later. The LTSSM would be in the Detect.Quiet/Active state
> until a link partner is found:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c#n558

I'd still consider logging an error a failure of sorts though, even if
we don't fail the probe().

> > or at least waste time for a condition
> > that we can't achieve (link up), depending on the HW/driver
> > implementation.
> > 
> 
> Unfortunately we cannot avoid this waiting time as we don't know if a device is
> attached to the bus or not. The 1s wait time predates my involvement with DWC
> drivers.

I don't really love that answer. It means that any DWC-based platform
that needs pwrctrl and doesn't set use_linkup_irq==true will waste 1
second per PCIe controller. While it's hard to make guarantees about old
and/or unloved drivers, I'd like to think I can do better on new ones.

> > I'm wondering how any designware-based platforms (on which I believe pwrctrl
> > was developed) actually support this, and how I should look to integrate
> > additional platforms/drivers. From what I can tell, the only way things would
> > work today would either be if:
> > (1) a given platform uses the dw_pcie_rp::use_linkup_irq==true functionality,
> >     which means pcie-designware-host will only start the link, but not wait for
> >     training to succeed. (And presumably the controller will receive its
> >     link-up IRQ after power sequencing is done, at which point both pwrctrl and
> >     the IRQ may rescan the PCI bus.) Or:
> > (2) pci/pwrctrl sequencing only brings up some non-critical power rails for the
> >     device in question, so link-up can actually succeed even without
> >     pwrctrl.
> > 
> 
> Again, failing to detect link up will not fail the probe. I don't know how you
> derived this conclusion. Even the PCIe spec itself is clear that the link should
> stay in Detect.Quiet until it has found the link partner. So failing the probe
> means we are introducing a dependency on the devices, which would be bizarre.
> Imagine how a controller could ever support hotplug then.

I think you're over-fixating on my mention of probe failure. Consider
the lesser statement that was paired along with it: always wasting 1
second per controller polling for something that will never happen. It
feels backwards and wasteful.

One of my key questions: if I don't have a link-up IRQ, how can I avoid
this waste? pcie-brcmstb avoids that waste today (for the common case
where there is, in fact, a device connected), and it would be a
regression for it to start using pwrctrl tomorrow.

(Side note: I also just noticed pcie-tegra194.c does the same.)

> > My guess is that (1) is the case, and specifically that the relevant folks are
> > using the pcie-qcom.c, with its "global" IRQ used for link-up events.
> > 
> 
> We only recently added support for 'Link Up' event through 'global_irq' in the
> controller driver. And this was done to avoid waiting for link up during probe

You're kind of reinforcing my question: you don't like the waste, so
you're adding link-up IRQ support -- is that really the only way?

(My initial thought: no, it's not. We know when pwrctrl has done its
thing -- why should we bother polling for link state before that? But
that's easier said than done, when pwrctrl is optional and highly
abstracted away from the DWC driver...)

> (which is what you/your colleagues also want to avoid I believe). But the
> problem in your case is that you are completely skipping the LTSSM and relying
> on custom userspace tooling to bring up the device and start LTSSM once done.

I assume you're talking about this thread:
https://lore.kernel.org/linux-pci/20240112093006.2832105-1-ajayagarwal@google.com/
[PATCH v5] PCI: dwc: Wait for link up only if link is started

I'm very aware of that thread, and the userspace tooling that underlies
it. I also am well aware that this is not how upstream should work, and
that's really why I'm here at all -- I'm trying to rewrite how we do
things, including our link-up and PMIC strategy.

So yes, that thread does provide some historical context for where I am,
but no, it doesn't really describe what I'm doing or asking about today.

> > Would it make sense to introduce some sort of pwrctrl -> start_link()
> > dependency? For example, I see similar work done in this series [1], for
> > slightly different reasons. In short, that series adds new
> > pci_ops::{start,stop}_link() callbacks, and teaches a single pwrctrl driver to
> > stop and restart the bridge link before/after powering things up.
> > 
> 
> This switch has a crazy requirement for configuring it through I2C. The I2C
> configuration has to be done before starting LTSSM. So we disable LTSSM first
> since it was enabled way before, then do I2C config and then start LTSSM again.

OK, thanks for the notice.

> > I also see that Manivannan has a proposal out [2] to add semi-generic
> > link-down + retraining support to core code. It treads somewhat similar
> > ground, and I could even imagine that its pci_ops::retrain_link()
> > callback could be reimplemented in terms of the aforementioned
> > pci_ops::{start,stop}_link(), or possibly vice versa.
> > 
> 
> Retrain work is mostly to bring up a broken link, which is completely different
> from what you are trying to achieve.

OK. Thanks for the clarification.

One reason I highlight these patch sets is because they add more cases
of "PCI core" to "host bridge/controller driver" dependencies
specifically around link management, which otherwise has very little
precedent. I wasn't sure if that was by requirement, or simply because
people haven't tried to support these things.

> > Any thoughts here? Sorry for a lot of text and no patch, but I didn't just want
> > to start off by throwing a 3rd set of patches on top of the existing ones that
> > tread similar ground[1][2].
> > 
> 
> No problem. If you want to use pwrctrl in your platform and get rid of the
> custom userspace tooling, I'm all in for it. But for that, I need to understand
> your controller design first. All I heard so far is, "we want to skip LTSSM and
> let our tool take care of it".

Please consider that last sentence "dead" or "totally not any plan of
record." It may be how we do things privately today, but it's not
anything I expect the upstream community to even think about. (Feel free
to CC me if it comes up again. It's hard for me to speak for everyone at
my employer, but I can probably convince them it's a bad idea if
needed.)

Regarding the controller design: frankly, I don't think my controller
does anything all that revolutionary in this space [0]. All of my
questions today can be asked (from what I can tell) of existing upstream
controller drivers. I'm mostly trying to understand the expected driver
design here, and that includes teasing apart what is "stuff done in
'old' drivers, but isn't recommended", and "what is currently
unimplemented in new stuff" (like pwrctrl [1]), and where do my
expectations fit in between that.

For instance, poking around a bit I come up with this question: when
using pci/pwrctrl, how does one ensure timing requirements around, e.g.,
power stability vs PERST# deassertion are met? When looking at a pwrctrl
driver like drivers/pci/pwrctrl/slot.c, the process looks too simple:

(0) host bridge probably already started its LTSSM, deasserted PERST#
(1) slot.c powers the slot
(2) pci_pwrctrl_device_set_ready() -> rescan_work_func() rescans the bus

Notably, there's no enforced delay between (1) and (2).
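
That is, paraphrasing slot.c from memory (the exact code may differ, but the
point stands):

  static int pci_pwrctrl_slot_probe(struct platform_device *pdev)
  {
          struct device *dev = &pdev->dev;
          ...
          /* (1): power the slot */
          ret = regulator_bulk_enable(slot->num_supplies, slot->supplies);
          if (ret)
                  return dev_err_probe(dev, ret, "failed to enable slot\n");

          /* (2): mark ready -> rescan_work_func(), with no delay in between */
          return devm_pci_pwrctrl_device_set_ready(dev, &slot->ctx);
  }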

Reading the PCIe CEM, it seems we're violating some specification bits,
like:

  2.2. PERST# Signal
  [...] On power-up, the de-assertion of PERST# is delayed 100 ms
  (TPVPERL) from the power rails achieving specified operating limits.
  [...]

There are references to this in various implementations (e.g.,
tegra_pcie_enable_slot_regulators() and brcm_pcie_start_link() --
although I suspect the latter is applying the wrong ordering).

Additionally, the CEM also seems to suggest we have PERST# ordering wrong. It
should also come between (1) and (2), not at (0).

And finally (for now), I don't understand how we have any guarantee that
step (2) is useful. Even if we've already started the LTSSM in (0), we
have no idea if the link is actually Active by the time we hit (2), and
so rescanning may not actually discover the device. And if that scan
fails ... then when do we trigger another pci_rescan_bus()? Only if the
implementation has a "link-up" IRQ?

Unless I'm misunderstanding, these concerns all suggest we need some
host-bridge hook in between (1) and (2), and existing pwrctrl users are
operating somewhat outside the specification, or are relying on
something more subtle that I'm missing.
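
For illustration only (this hook and its name are invented, not a concrete
proposal), I'm imagining something shaped like:

  struct pci_host_bridge {
          /* ... existing fields ... */

          /*
           * Hypothetical: called by pwrctrl after slot power is stable and
           * before rescanning, so the bridge can observe T_PVPERL, deassert
           * PERST#, and (re)start the link.
           */
          int (*slot_powered)(struct pci_host_bridge *bridge);
  };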

Regards,
Brian

[0] That's not to dodge your question. I can try to describe relevant
    details if they would help. But I don't think they would right now.

[1] And if I were to propose my driver upstream eventually, I bet you
    wouldn't want me reimplementing pwrctrl in my driver, just because
    pwrctrl is missing features AFAICT.


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-14 23:24   ` Brian Norris
@ 2025-04-15  5:32     ` Manivannan Sadhasivam
  2025-04-15 18:24       ` Brian Norris
  0 siblings, 1 reply; 12+ messages in thread
From: Manivannan Sadhasivam @ 2025-04-15  5:32 UTC (permalink / raw)
  To: Brian Norris
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

On Mon, Apr 14, 2025 at 04:24:38PM -0700, Brian Norris wrote:
> Hi Manivannan,
> 
> On Mon, Apr 14, 2025 at 04:27:35PM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Apr 08, 2025 at 12:59:36PM -0700, Brian Norris wrote:
> > > TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
> > > often run before pwrctrl gets involved. I'm exploring options to resolve
> > > this.
> > > 
> > > Hi all,
> > > 
> > > I'm currently looking at reworking how some (currently out-of-tree, but I'm
> > > hoping to change that) pcie-designware based drivers integrate power sequencing
> > > for their endpoint devices, as well as the corresponding start_link()
> > > functionality.
> > > 
> > > For power sequencing, drivers/pci/pwrctrl/ looks like a very good start at what
> > > we need, since we have various device-specific regulators, GPIOs, and
> > > sequencing requirements, which we'd prefer not to encode directly in the
> > > controller driver.
> > > 
> > 
> > The naming is a bit confusing,
> 
> +1
> 
> > but power sequencing and power control are two
> > different yet related frameworks. Power sequencing drivers
> > (drivers/power/sequencing) are targeted at devices with a complex resource
> > topology that are often accessed by more than one driver, like Wi-Fi + BT
> > combo PCIe cards. On the other hand, power control (drivers/pci/pwrctrl)
> > drivers are used to control power to PCIe slots/cards with a simple resource
> > topology.
> 
> Sure, I get the difference. There can be "sequencing" in the pwrctrl
> area too though, because there can be PMICs involved even in a single
> PCIe device (i.e., non-shared, not needing "pwrseq" framework) which
> require multiple steps (e.g., 2 GPIOs) to power up. Apologies if my
> mention of "sequencing" is unclear, but I believe everything I'm
> concerned about is in pwrctrl not pwrseq.
> 

Right.

> > > For link startup, pcie-designware-host.c currently
> > > (a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
> > > (b) waits for the link training to complete.
> > > 
> > > However, (b) will fail if the other end of the link is not powered up --
> > > e.g., if the appropriate pwrctrl driver has not yet loaded, or its
> > > device hasn't finished probing. Today, this can mean the designware
> > > driver will either fail to probe,
> > 
> > This is not correct.
> 
> That depends on the implementation of start_link(). But I suppose the
> intention is that start_link() only "starts" and doesn't care where
> things go from there. (IOW, my local start_link() implementation is
> probably wrong at the moment, as it performs some poll/retry steps too.)
> 

Yes, that's why I said it was incorrect. The callback is supposed to just start
the link and not wait for anything else.

> > The DWC driver will start the LTSSM and wait for the
> > link to come up if the platform has no way of detecting link-up. But it will
> > not fail if the link doesn't come up; it will just continue, hoping for the
> > link to come up later. The LTSSM would be in the Detect.Quiet/Active state
> > until a link partner is found:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c#n558
> 
> I'd still consider logging an error a failure of sorts though, even if
> we don't fail the probe().
> 

We do print a log now. In fact, it is a dev_info():

"Phy link never came up"

> > > or at least waste time for a condition
> > > that we can't achieve (link up), depending on the HW/driver
> > > implementation.
> > > 
> > 
> > Unfortunately we cannot avoid this waiting time as we don't know if a device is
> > attached to the bus or not. The 1s wait time predates my involvement with DWC
> > drivers.
> 
> I don't really love that answer. It means that any DWC-based platform
> that needs pwrctrl and doesn't set use_linkup_irq==true will waste 1
> second per PCIe controller. While it's hard to make guarantees about old
> and/or unloved drivers, I'd like to think I can do better on new ones.
> 

Even I'd like to avoid the 1s delay. But the problem is how you would know
whether a device is attached to the bus or not. The delay is to account for
the fact that the link may take up to 1s to come up after starting the LTSSM.
So if we do not wait for that period, there is a chance that we would report
a false negative status and the enumeration would fail.

> > > I'm wondering how any designware-based platforms (on which I believe pwrctrl
> > > was developed) actually support this, and how I should look to integrate
> > > additional platforms/drivers. From what I can tell, the only way things would
> > > work today would either be if:
> > > (1) a given platform uses the dw_pcie_rp::use_linkup_irq==true functionality,
> > >     which means pcie-designware-host will only start the link, but not wait for
> > >     training to succeed. (And presumably the controller will receive its
> > >     link-up IRQ after power sequencing is done, at which point both pwrctrl and
> > >     the IRQ may rescan the PCI bus.) Or:
> > > (2) pci/pwrctrl sequencing only brings up some non-critical power rails for the
> > >     device in question, so link-up can actually succeed even without
> > >     pwrctrl.
> > > 
> > 
> > Again, failing to detect link up will not fail the probe. I don't know how you
> > derived this conclusion. Even the PCIe spec itself is clear that the link should
> > stay in Detect.Quiet until it has found the link partner. So failing the probe
> > means we are introducing a dependency on the devices, which would be bizarre.
> > Imagine how a controller could ever support hotplug then.
> 
> I think you're over-fixating on my mention of probe failure. Consider
> the lesser statement that was paired along with it: always wasting 1
> second per controller polling for something that will never happen. It
> feels backwards and wasteful.
> 

Again, I do get your point. But tell me how a controller can reliably detect
that there is a device attached to the bus. Only on your Android setup do you
know for sure that the device won't be there during probe. So you consider
the 1s wait a waste of time, and that is fair. But what if the same
controller is used on another platform which is not Android, or the endpoint
device is powered on during probe itself without relying on userspace?

> One of my key questions: if I don't have a link-up IRQ, how can I avoid
> this waste? pcie-brcmstb avoids that waste today (for the common case
> where there is, in fact, a device connected), and it would be a
> regression for it to start using pwrctrl tomorrow.
> 

Why are you tying pwrctrl to this designware driver behavior? The two are
unrelated. Even if you don't use pwrctrl and use the controller driver to
bring up the device, the 1s delay would still apply (if there is no device).

pcie-brcmstb driver indeed wastes time. It is not 1s but just 100ms. But that
driver is for only one vendor. In the case of DWC, the driver has to work with
multiple vendors. But again, I do not know how this 1s delay came up. Maybe we
could try to reduce it to 500ms or so, but for that I need confirmation from
someone like Lorenzo who knows the history.

> (Side note: I also just noticed pcie-tegra194.c does the same.)
> 
> > > My guess is that (1) is the case, and specifically that the relevant folks are
> > > using the pcie-qcom.c, with its "global" IRQ used for link-up events.
> > > 
> > 
> > We only recently added support for 'Link Up' event through 'global_irq' in the
> > controller driver. And this was done to avoid waiting for link up during probe
> 
> You're kind of reinforcing my question: you don't like the waste, so
> you're adding link-up IRQ support -- is that really the only way?
> 

I don't know. But so far I haven't seen any other sensible way which is generic.

> (My initial thought: no, it's not. We know when pwrctrl has done its
> thing -- why should we bother polling for link state before that? But
> that's easier said than done, when pwrctrl is optional and highly
> abstracted away from the DWC driver...)
> 

Oh well... this is where you got it wrong. pwrctrl drivers are only probed
just before enumeration, by design (which is way after starting the link). As
of v6.15-rc1, before we try to enumerate any device, we check if there is any
device defined in DT which requires a power supply. If so, we create a
platform device (or pwrctrl device) and let the pwrctrl driver bind to it and
power up the device. In that case, we also do not proceed to scan the bus
further and skip the hierarchy, because the pwrctrl driver will rescan the
bus once it has finished powering up the device.
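
Roughly, paraphrasing the v6.15-rc1 scan path from memory (see
pci_pwrctrl_create_device(); helper names may not be exact):

  np = of_pci_find_child_device(dev_of_node(&bus->dev), devfn);
  if (np && of_pci_supply_present(np)) {
          /*
           * Create the pwrctrl platform device and skip scanning this
           * device for now; the pwrctrl driver rescans the bus once it
           * has powered up the device.
           */
  }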

> > (which is what you/your colleagues also want to avoid I believe). But the
> > problem in your case is that you are completely skipping the LTSSM and relying
> > on custom userspace tooling to bring up the device and start LTSSM once done.
> 
> I assume you're talking about this thread:
> https://lore.kernel.org/linux-pci/20240112093006.2832105-1-ajayagarwal@google.com/
> [PATCH v5] PCI: dwc: Wait for link up only if link is started
> 

Yes!

> I'm very aware of that thread, and the userspace tooling that underlies
> it. I also am well aware that this is not how upstream should work, and
> that's really why I'm here at all -- I'm trying to rewrite how we do
> things, including our link-up and PMIC strategy.
> 

I'm really happy to hear this :)

> So yes, that thread does provide some historical context for where I am,
> but no, it doesn't really describe what I'm doing or asking about today.
> 
> > > Would it make sense to introduce some sort of pwrctrl -> start_link()
> > > dependency? For example, I see similar work done in this series [1], for
> > > slightly different reasons. In short, that series adds new
> > > pci_ops::{start,stop}_link() callbacks, and teaches a single pwrctrl driver to
> > > stop and restart the bridge link before/after powering things up.
> > > 
> > 
> > This switch has a crazy requirement for configuring it through I2C. The I2C
> > configuration has to be done before starting LTSSM. So we disable LTSSM first
> > since it was enabled way before, then do I2C config and then start LTSSM again.
> 
> OK, thanks for the notice.
> 
> > > I also see that Manivannan has a proposal out [2] to add semi-generic
> > > link-down + retraining support to core code. It treads somewhat similar
> > > ground, and I could even imagine that its pci_ops::retrain_link()
> > > callback could be reimplemented in terms of the aforementioned
> > > pci_ops::{start,stop}_link(), or possibly vice versa.
> > > 
> > 
> > Retrain work is mostly to bring up a broken link, which is completely different
> > from what you are trying to achieve.
> 
> OK. Thanks for the clarification.
> 
> One reason I highlight these patch sets is because they add more cases
> of "PCI core" to "host bridge/controller driver" dependencies
> specifically around link management, which otherwise has very little
> precedent. I wasn't sure if that was by requirement, or simply because
> people haven't tried to support these things.
> 

The callbacks are introduced only because of the hardware requirements of the
switch.

> > > Any thoughts here? Sorry for a lot of text and no patch, but I didn't just want
> > > to start off by throwing a 3rd set of patches on top of the existing ones that
> > > tread similar ground[1][2].
> > > 
> > 
> > No problem. If you want to use pwrctrl in your platform and get rid of the
> > custom userspace tooling, I'm all in for it. But for that, I need to understand
> > your controller design first. All I heard so far is, "we want to skip LTSSM and
> > let our tool take care of it".
> 
> Please consider that last sentence "dead" or "totally not any plan of
> record." It may be how we do things privately today, but it's not
> anything I expect the upstream community to even think about. (Feel free
> to CC me if it comes up again. It's hard for me to speak for everyone at
> my employer, but I can probably convince them it's a bad idea if
> needed.)
> 

Sure thing. Thanks for understanding. The thread kept coming up once in a
while and I had to repeat every time why the idea is so bad and won't scale.
Finally someone understood it.

> Regarding the controller design: frankly, I don't think my controller
> does anything all that revolutionary in this space [0]. All of my
> questions today can be asked (from what I can tell) of existing upstream
> controller drivers. I'm mostly trying to understand the expected driver
> design here, and that includes teasing apart what is "stuff done in
> 'old' drivers, but isn't recommended", and "what is currently
> unimplemented in new stuff" (like pwrctrl [1]), and where do my
> expectations fit in between that.
> 
> For instance, poking around a bit I come up with this question: when
> using pci/pwrctrl, how does one ensure timing requirements around, e.g.,
> power stability vs PERST# deassertion are met? When looking at a pwrctrl
> driver like drivers/pci/pwrctrl/slot.c, the process looks too simple:
> 
> (0) host bridge probably already started its LTSSM, deasserted PERST#
> (1) slot.c powers the slot
> (2) pci_pwrctrl_device_set_ready() -> rescan_work_func() rescans the bus
> 
> Notably, there's no enforced delay between (1) and (2).
> 
> Reading the PCIe CEM, it seems we're violating some specification bits,
> like:
> 
>   2.2. PERST# Signal
>   [...] On power-up, the de-assertion of PERST# is delayed 100 ms
>   (TPVPERL) from the power rails achieving specified operating limits.
>   [...]
> 
> There are references to this in various implementations (e.g.,
> tegra_pcie_enable_slot_regulators() and brcm_pcie_start_link() --
> although I suspect the latter is applying the wrong ordering).
> 
> Additionally, the CEM also seems to suggest we have PERST# ordering wrong. It
> should also come between (1) and (2), not at (0).
> 

You are absolutely right! Currently, we follow the timing requirement while
deasserting PERST# in the controller drivers. But once we power on the slot,
we do not touch PERST# and it just happens to work.

We may need to introduce another callback that toggles PERST# so that we can use
it while powering up the device.
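
Something along these lines, perhaps (illustrative only; the callback name is
invented, though PCIE_T_PVPERL_MS does already exist in drivers/pci/pci.h):

  ret = regulator_bulk_enable(slot->num_supplies, slot->supplies);
  if (ret)
          return ret;

  msleep(PCIE_T_PVPERL_MS);       /* CEM: 100 ms from power to PERST# */
  bridge->deassert_perst(bridge); /* hypothetical new callback */

  return devm_pci_pwrctrl_device_set_ready(dev, &slot->ctx);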

> And finally (for now), I don't understand how we have any guarantee that
> step (2) is useful. Even if we've already started the LTSSM in (0), we
> have no idea if the link is actually Active by the time we hit (2), and
> so rescanning may not actually discover the device. And if that scan
> fails ... then when do we trigger another pci_rescan_bus()? Only if the
> implementation has a "link-up" IRQ?
> 

As I said above, we do not enumerate the device if it has a devicetree node
with supplies. That's why we need (2). Otherwise, the device won't be
enumerated at all, unless userspace does the rescan (which defeats the
purpose of pwrctrl).

> Unless I'm misunderstanding, these concerns all suggest we need some
> host-bridge hook in between (1) and (2), and existing pwrctrl users are
> operating somewhat outside the specification, or are relying on
> something more subtle that I'm missing.
> 

Yes, feel free to submit patches for toggling PERST#. Or let me know otherwise,
I'll do it.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-14 11:07   ` Manivannan Sadhasivam
  2025-04-14 23:16     ` Brian Norris
@ 2025-04-15 18:12     ` Jim Quinlan
  1 sibling, 0 replies; 12+ messages in thread
From: Jim Quinlan @ 2025-04-15 18:12 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Brian Norris, Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han,
	Lorenzo Pieralisi, Krzysztof Wilczyński,
	Krishna Chaitanya Chundru, Rob Herring, linux-pci, linux-kernel,
	linux-arm-msm, dmitry.baryshkov, Tsai Sung-Fu,
	Nicolas Saenz Julienne, Florian Fainelli,
	Broadcom internal kernel review list

On Mon, Apr 14, 2025 at 7:07 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Tue, Apr 08, 2025 at 02:26:24PM -0700, Brian Norris wrote:
> > + adding pcie-brcmstb.c folks
> >
> > On Tue, Apr 08, 2025 at 12:59:39PM -0700, Brian Norris wrote:
> > > TL;DR: PCIe link-up may depend on pwrctrl; however, link-startup is
> > > often run before pwrctrl gets involved. I'm exploring options to resolve
> > > this.
> >
> > Apologies if a quick self-reply is considered noisy or rude, but I
> > nearly forgot that I previously was looking at "pwrctrl"-like
> > functionality and noticed that drivers/pci/controller/pcie-brcmstb.c has
> > had a portion of the same "pwrctrl" functionality for some time (commit
> > 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage
> > regulators")).
> >
>
> Yes, the goal of the pwrctrl driver is to get rid of this clutter from the
> controller drivers.
>
> > Notably, it performs its power sequencing before starting its link, for
> > (I believe) exactly the same reasons as I mention below. While I'm sure
> > it could theoretically be nice for them to be able to use
> > drivers/pci/pwrctrl/, I expect they cannot today, for the same reasons.
> >
>
> If you look into brcm_pcie_add_bus(), they are ignoring the return value of
> brcm_pcie_start_link() precisely for the reason I explained in the previous
> thread. However, they do check for it in brcm_pcie_resume_noirq() which looks
> like a bug as the controller will fail to resume from system suspend if no
> devices are connected.

The reason our brcm_pcie_add_bus() does not check/return an error is
that the caller will invoke a WARN() on our error, or at least that
was the case when the commit was submitted. We want to be good
citizens. Also, for our driver, if the PCIe link-up fails, the
probe() fails. There is no subsequent suspend/resume possible. We do
not support PCIe hotplug in HW or SW.

Regards,
Jim Quinlan
Broadcom STB

>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-15  5:32     ` Manivannan Sadhasivam
@ 2025-04-15 18:24       ` Brian Norris
  2025-04-16  5:59         ` Manivannan Sadhasivam
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2025-04-15 18:24 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

Hi,

On Tue, Apr 15, 2025 at 11:02:14AM +0530, Manivannan Sadhasivam wrote:
> On Mon, Apr 14, 2025 at 04:24:38PM -0700, Brian Norris wrote:
> > On Mon, Apr 14, 2025 at 04:27:35PM +0530, Manivannan Sadhasivam wrote:
> > > On Tue, Apr 08, 2025 at 12:59:36PM -0700, Brian Norris wrote:
> > > > For link startup, pcie-designware-host.c currently
> > > > (a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
> > > > (b) waits for the link training to complete.
> > > > 
> > > > However, (b) will fail if the other end of the link is not powered up --
> > > > e.g., if the appropriate pwrctrl driver has not yet loaded, or its
> > > > device hasn't finished probing. Today, this can mean the designware
> > > > driver will either fail to probe,
> > > 
> > > This is not correct.
> > 
> > That depends on the implementation of start_link(). But I suppose the
> > intention is that start_link() only "starts" and doesn't care where
> > things go from there. (IOW, my local start_link() implementation is
> > probably wrong at the moment, as it performs some poll/retry steps too.)
> > 
> 
> The callback is supposed to just start
> the link and not wait for anything else.

Ack, thanks. I've learned something.

> > > > or at least waste time for a condition
> > > > that we can't achieve (link up), depending on the HW/driver
> > > > implementation.
> > > > 
> > > 
> > > Unfortunately we cannot avoid this waiting time as we don't know if a device is
> > > attached to the bus or not. The 1s wait time predates my involvement with DWC
> > > drivers.
> > 
> > I don't really love that answer. It means that any DWC-based platform
> > that needs pwrctrl and doesn't set use_linkup_irq==true will waste 1
> > second per PCIe controller. While it's hard to make guarantees about old
> > and/or unloved drivers, I'd like to think I can do better on new ones.
> > 
> 
> Even I'd like to avoid the 1s delay. But the problem is how you would know
> whether a device is attached to the bus or not. The delay is to account for
> the fact that the link may take up to 1s to come up after starting the LTSSM.
> So if we do not wait for that period, there is a chance that we would report
> a false negative status and the enumeration would fail.

I understand there are cases we won't know, if we don't have
hotplug/presence-detect wiring. But for the cases we do know, I think it's a
cop-out to say "we can't handle it." See below.

> > Consider
> > the lesser statement that was paired along with it: always wasting 1
> > second per controller polling for something that will never happen. It
> > feels backwards and wasteful.
> > 
> 
> Again, I do get your point. But tell me how a controller can reliably detect
> that there is a device attached to the bus. Only on your Android setup do you
> know for sure that the device won't be there during probe. So you consider
> the 1s wait a waste of time, and that is fair. But what if the same
> controller is used on another platform which is not Android, or the endpoint
> device is powered on during probe itself without relying on userspace?

This has nothing to do with Android.

IMO, we have 3 main categories of setups that we should primarily care
about:

(1) hotplug is supported, and PRSNT1#/PRSNT2# are wired
(2) hotplug is not supported, but a device is present and is already
    powered.
(3) hotplug is not supported, but a device is present; the device
    requires external power (i.e., pwrctrl / "subdevice regulators" /
    etc., should be involved)

AFAICT, we don't have much of (1). But if we did, we should also be able
to avoid initial delays, as we can reliably detect presence, and only
wait for training when we know it should succeed. (Or even better,
handle it async via an interrupt.)
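
To illustrate (1): presence detect there is just a Slot Status read. A
rough, untested fragment -- "root_port" is a hypothetical pci_dev for the
bridge, while the accessor and register macros are the existing ones:

  u16 sltsta;

  pcie_capability_read_word(root_port, PCI_EXP_SLTSTA, &sltsta);
  if (sltsta & PCI_EXP_SLTSTA_PDS) {
          /* Card present: link training is worth waiting for. */
  }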

For (2), we're also OK. The initial polling delay is likely to be much
less than 1 second.

For (3) ... all non-pwrctrl drivers (pcie-brcmstb.c, pci-tegra.c,
pcie-tegra194.c, pcie-rockchip-host.c, ...) power things up before they
configure ports, start LTSSM, and have any expectation of detecting a
link. If a device is there, they again should commonly find it in much
less than 1 second.

However, when *using* pwrctrl, we have the ordering all wrong (IMO), and
so we eat needless delay. We *will not* successfully bring the link up,
and we *won't* find the device. This smells like a design problem, where
we have failed to plumb the information we already have available.

I think you're too worried about a case (4): that hotplug is not
supported, and a device is not present.

IMO, (4) should mostly be handled by simply disabling the unused
controller in device tree, or living with the timeouts. If a platform
doesn't support hotplug, then you can't expect optimal behavior for
unplugged devices.

I'm not complaining about (4). I'm complaining about (3) with pwrctrl.

> > One of my key questions: if I don't have a link-up IRQ, how can I avoid
> > this waste? pcie-brcmstb avoids that waste today (for the common case
> > where there is, in fact, a device connected), and it would be a
> > regression for it to start using pwrctrl tomorrow.
> > 
> 
> Why are you tying pwrctrl to this designware driver behavior? The two are
> unrelated. Even if you don't use pwrctrl and use the controller driver to
> bring up the device, the 1s delay would still apply (if there is no device).

We might be talking past each other. Per above, I think we can do better
with (1)-(3). But you're bringing up (4). Problem (3) exists for all
drivers, although it's more acute with DWC, and I happen to be using it.
I also think it's indicative of larger design and ordering problems in
pwrctrl.

> The pcie-brcmstb driver indeed wastes time, though it is only 100ms, not 1s.
> But that driver is for only one vendor; in the case of DWC, the driver has to
> work with multiple vendors. Again, I do not know how this 1s delay came about.
> Maybe we could try to reduce it to 500ms or so, but for that I need
> confirmation from someone like Lorenzo who knows the history.
> 
> > (Side note: I also just noticed pcie-tegra194.c does the same.)
> > 
> > > > My guess is that (1) is the case, and specifically that the relevant folks are
> > > > using the pcie-qcom.c, with its "global" IRQ used for link-up events.
> > > > 
> > > 
> > > We only recently added support for 'Link Up' event through 'global_irq' in the
> > > controller driver. And this was done to avoid waiting for link up during probe
> > 
> > You're kind of reinforcing my question: you don't like the waste, so
> > you're adding link-up IRQ support -- is that really the only way?
> > 
> 
> I don't know. But so far I haven't seen any other sensible way which is generic.
> 
> > (My initial thought: no, it's not. We know when pwrctrl has done its
> > thing -- why should we bother polling for link state before that? But
> > that's easier said than done, when pwrctrl is optional and highly
> > abstracted away from the DWC driver...)
> > 
> 
> Oh well... this is where you got it wrong. By design, pwrctrl drivers are only
> probed just before enumeration (which is way after starting the link). As of
> v6.15-rc1, before we try to enumerate any device, we check if there is any
> device defined in DT which requires a power supply. If so, we create a platform
> device (or pwrctrl device) and let the pwrctrl driver bind to it and power up
> the device. In that case, we also do not proceed to scan the bus further and
> skip that hierarchy, because the pwrctrl driver will rescan the bus once it has
> finished powering up the device.

It sounds like you're saying "it's the way that it is, because of the
way that it is." I understand how it is currently structured, but I'm
saying that I think pwrctrl is hooked in at the wrong place. It looks
cute and clean, but it has the ordering wrong.

IMO, we should allow pwrctrl to power things up earlier, so that
controller drivers have a better chance of hitting the optimal cases
(case (3) above) properly. (That's also how every pre-pwrctrl driver
does things, and I think it's for good reason.)

That would also resolve my PERST# and other timing questions, because
the controller driver would better know when pwrctrl is finished, and so
can better handle PERST# and any necessary delays.

I agree this might be more difficult to do in a "generic" way (per your
above language), depending on your definition of generic. But IMO, it's
important to prioritize doing things correctly, even if it's slightly
less cute.

As an example less-cute way of doing pwrctrl: expose a wrapped version
of pci_pwrctrl_create_device() such that drivers can call it earlier. If
there is a pwrctrl device created, that means a driver should not yet
wait for link-up -- it should defer that until the relevant pwrctrl is
marked "ready". (There are likely other problems to solve in here too,
but this is just an initial sketch. And to be clear, I suspect this
doesn't fit your notion of "generic", because it requires each driver to
adapt to it.)
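
To make that sketch slightly more concrete -- completely untested, and
every name below except pci_pwrctrl_create_device() and
dw_pcie_wait_for_link() is invented for illustration:

  /* In a DWC host-init path, before any link-up wait: */
  pwrctrl = pci_pwrctrl_create_device_early(pp->bridge);  /* invented wrapper */
  if (IS_ERR(pwrctrl))
          return PTR_ERR(pwrctrl);

  if (pwrctrl) {
          /*
           * A pwrctrl device covers this port: don't poll for link-up
           * now; finish that from a callback once pwrctrl is "ready".
           */
          pci_pwrctrl_on_ready(pwrctrl, dw_pcie_pwrctrl_ready, pp);  /* invented */
  } else {
          /* No pwrctrl involved; today's behavior is fine. */
          dw_pcie_wait_for_link(pci);
  }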

> > Regarding the controller design: frankly, I don't think my controller
> > does anything all that revolutionary in this space [0]. All of my
> > questions today can be asked (from what I can tell) of existing upstream
> > controller drivers. I'm mostly trying to understand the expected driver
> > design here, and that includes teasing apart what is "stuff done in
> > 'old' drivers, but isn't recommended", and "what is currently
> > unimplemented in new stuff" (like pwrctrl [1]), and where do my
> > expectations fit in between that.
> > 
> > For instance, poking around a bit, I came up with this question: when
> > using pci/pwrctrl, how does one ensure timing requirements around, e.g.,
> > power stability vs PERST# deassertion are met? When looking at a pwrctrl
> > driver like drivers/pci/pwrctrl/slot.c, the process looks too simple:
> > 
> > (0) host bridge probably already started its LTSSM, deasserted PERST#
> > (1) slot.c powers the slot
> > (2) pci_pwrctrl_device_set_ready() -> rescan_work_func() rescans the bus
> > 
> > Notably, there's no enforced delay between (1) and (2).
> > 
> > Reading the PCIe CEM, it seems we're violating some specification bits,
> > like:
> > 
> >   2.2. PERST# Signal
> >   [...] On power-up, the de-assertion of PERST# is delayed 100 ms
> >   (TPVPERL) from the power rails achieving specified operating limits.
> >   [...]
> > 
> > There are references to this in various implementations (e.g.,
> > tegra_pcie_enable_slot_regulators() and brcm_pcie_start_link() --
> > although I suspect the latter is applying the wrong ordering).
> > 
> > Additionally, CEM also seems to suggest we have PERST# ordering wrong. It
> > should also come between (1) and (2), not at (0).
> > 
> 
> You are absolutely right! Currently, we follow the timing requirement while
> deasserting PERST# in the controller drivers. But once we power on the slot,
> we do not touch PERST#, and it just happens to work.
> 
> We may need to introduce another callback that toggles PERST# so that we can use
> it while powering up the device.
> 
> > And finally (for now), I don't understand how we have any guarantee that
> > step (2) is useful. Even if we've already started the LTSSM in (0), we
> > have no idea if the link is actually Active by the time we hit (2), and
> > so rescanning may not actually discover the device. And if that scan
> > fails ... then when do we trigger another pci_rescan_bus()? Only if the
> > implementation has a "link-up" IRQ?
> > 
> 
> As I said above, we do not enumerate the device if it has a devicetree node with
> supplies. So that's why we need (2). Otherwise, the device won't be enumerated
> at all, unless userspace does the rescan (which defeats the purpose of pwrctrl).

But you haven't addressed one of the concerns in my paragraph: how do we
know the link is up by the time we hit the pwrctrl-initiated
pci_rescan_bus()? We haven't gone back to ask the host bridge if it's up
yet. We're just hoping we get lucky.

IOW, the pwrctrl sequence should be something like:

 (1) power up the slot
 (1)(a) delay, per spec
 (1)(b) deassert PERST#
 (1)(c) wait for link up
 (2) rescan bus

Currently, we skip all of (1)(a)-(c). We're probably lucky that (1)(b)'s
ordering doesn't matter all the time, as long as we did it earlier. And
we're lucky that there are natural delays in software such that the lack
of (1)(a) and (1)(c) isn't significant.
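
In code form, that might look roughly like the following (untested sketch:
regulator_bulk_enable(), msleep(), PCIE_T_PVPERL_MS and the rescan helpers
exist today, while the two bridge hooks and the struct fields are invented):

  static void pci_pwrctrl_power_up(struct pci_pwrctrl *pwrctrl)
  {
          /* (1) power up the slot */
          if (regulator_bulk_enable(pwrctrl->num_supplies, pwrctrl->supplies))
                  return;

          /* (1)(a) T_PVPERL: 100 ms from power stable to PERST# deassert */
          msleep(PCIE_T_PVPERL_MS);

          /* (1)(b) invented hook: have the host bridge deassert PERST# */
          pci_pwrctrl_perst_deassert(pwrctrl->bridge);

          /* (1)(c) invented hook: have the host bridge wait for link-up */
          if (pci_pwrctrl_wait_for_link(pwrctrl->bridge))
                  return;

          /* (2) only now is the rescan guaranteed to find the device */
          pci_lock_rescan_remove();
          pci_rescan_bus(pwrctrl->bus);
          pci_unlock_rescan_remove();
  }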

> > Unless I'm misunderstanding, these concerns all suggest we need some
> > host-bridge hook in between (1) and (2), and existing pwrctrl users are
> > operating somewhat outside the specification, or are relying on
> > something more subtle that I'm missing.
> > 
> 
> Yes, feel free to submit patches for toggling PERST#. Otherwise, let me know
> and I'll do it.

Per the above, I think pwrctrl has multiple ordering design problems, so
I haven't yet decided what kind of patch(es) I might submit. Feel free
to CC me if you tackle it before I do though.

Regards,
Brian


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-15 18:24       ` Brian Norris
@ 2025-04-16  5:59         ` Manivannan Sadhasivam
  2025-04-16 17:14           ` Brian Norris
  0 siblings, 1 reply; 12+ messages in thread
From: Manivannan Sadhasivam @ 2025-04-16  5:59 UTC (permalink / raw)
  To: Brian Norris
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

On Tue, Apr 15, 2025 at 11:24:39AM -0700, Brian Norris wrote:
> Hi,
> 
> On Tue, Apr 15, 2025 at 11:02:14AM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Apr 14, 2025 at 04:24:38PM -0700, Brian Norris wrote:
> > > On Mon, Apr 14, 2025 at 04:27:35PM +0530, Manivannan Sadhasivam wrote:
> > > > On Tue, Apr 08, 2025 at 12:59:36PM -0700, Brian Norris wrote:
> > > > > For link startup, pcie-designware-host.c currently
> > > > > (a) starts the link via platform-specific means (dw_pcie::ops::start_link()) and
> > > > > (b) waits for the link training to complete.
> > > > > 
> > > > > However, (b) will fail if the other end of the link is not powered up --
> > > > > e.g., if the appropriate pwrctrl driver has not yet loaded, or its
> > > > > device hasn't finished probing. Today, this can mean the designware
> > > > > driver will either fail to probe,
> > > > 
> > > > This is not correct.
> > > 
> > > That depends on the implementation of start_link(). But I suppose the
> > > intention is that start_link() only "starts" and doesn't care where
> > > things go from there. (IOW, my local start_link() implementation is
> > > probably wrong at the moment, as it performs some poll/retry steps too.)
> > > 
> > 
> > The callback is supposed to just start
> > the link and not wait for anything else.
> 
> Ack, thanks. I've learned something.
> 
> > > > > or at least waste time for a condition
> > > > > that we can't achieve (link up), depending on the HW/driver
> > > > > implementation.
> > > > > 
> > > > 
> > > > Unfortunately we cannot avoid this waiting time as we don't know if a device is
> > > > attached to the bus or not. The 1s wait time predates my involvement with DWC
> > > > drivers.
> > > 
> > > I don't really love that answer. It means that any DWC-based platform
> > > that needs pwrctrl and doesn't set use_linkup_irq==true will waste 1
> > > second per PCIe controller. While it's hard to make guarantees about old
> > > and/or unloved drivers, I'd like to think I can do better on new ones.
> > > 
> > 
> > Even I'd like to avoid the 1s delay. But the problem is knowing whether a
> > device is attached to the bus or not. The delay is there to account for the
> > fact that the link may take up to 1s to come up after starting LTSSM. So if we
> > do not wait for that period, there is a chance that we would report a false
> > negative link status and enumeration would fail.
> 
> I understand there are cases where we won't know, if we don't have
> hotplug/presence-detect wiring. But for the cases where we do know, I
> think it's a cop-out to say "we can't handle it." See below.
> 
> > > Consider
> > > the lesser statement that was paired along with it: always wasting 1
> > > second per controller polling for something that will never happen. It
> > > feels backwards and wasteful.
> > > 
> > 
> > Again, I do get your point. But tell me how a controller can reliably detect
> > that there is a device attached to the bus. Only on your Android setup do you
> > know for sure that the device won't be there during probe. So you consider the
> > 1s wait a waste of time, which is fair. But what if the same controller is used
> > in another platform that is not Android, or the endpoint device is powered on
> > during probe itself without relying on userspace?
> 
> This has nothing to do with Android.
> 
> IMO, we have 3 main categories of setups that we should primarily care
> about:
> 
> (1) hotplug is supported, and PRSNT1#/PRSNT2# are wired
> (2) hotplug is not supported, but a device is present and is already
>     powered.
> (3) hotplug is not supported, but a device is present; the device
>     requires external power (i.e., pwrctrl / "subdevice regulators" /
>     etc., should be involved)
> 
> AFAICT, we don't have much of (1). But if we did, we should also be able
> to avoid initial delays, as we can reliably detect presence, and only
> wait for training when we know it should succeed. (Or even better,
> handle it async via an interrupt.)
> 
> For (2), we're also OK. The initial polling delay is likely to be much
> less than 1 second.
> 
> For (3) ... all non-pwrctrl drivers (pcie-brcmstb.c, pci-tegra.c,
> pcie-tegra194.c, pcie-rockchip-host.c, ...) power things up before they
> configure ports, start LTSSM, and have any expectation of detecting a
> link. If a device is there, they again should commonly find it in much
> less than 1 second.
> 
> However, when *using* pwrctrl, we have the ordering all wrong (IMO), and
> so we eat needless delay. We *will not* successfully bring the link up,
> and we *won't* find the device. This smells like a design problem, where
> we have failed to plumb the information we already have available.
> 

I don't disagree :)

> I think you're too worried about a case (4): that hotplug is not
> supported, and a device is not present.
> 
> IMO, (4) should mostly be handled by simply disabling the unused
> controller in device tree, or living with the timeouts. If a platform
> doesn't support hotplug, then you can't expect optimal behavior for
> unplugged devices.
> 
> I'm not complaining about (4). I'm complaining about (3) with pwrctrl.
> 

Ok!

> > > One of my key questions: if I don't have a link-up IRQ, how can I avoid
> > > this waste? pcie-brcmstb avoids that waste today (for the common case
> > > where there is, in fact, a device connected), and it would be a
> > > regression for it to start using pwrctrl tomorrow.
> > > 
> > 
> > Why are you tying pwrctrl to this designware driver behavior? The two are
> > unrelated. Even if you don't use pwrctrl and use the controller driver to
> > bring up the device, the 1s delay would still apply (if there is no device).
> 
> We might be talking past each other. Per above, I think we can do better
> with (1)-(3). But you're bringing up (4). Problem (3) exists for all
> drivers, although it's more acute with DWC, and I happen to be using it.
> I also think it's indicative of larger design and ordering problems in
> pwrctrl.
> 

Now I get what you are saying.

> > The pcie-brcmstb driver indeed wastes time, though it is only 100ms, not 1s.
> > But that driver is for only one vendor; in the case of DWC, the driver has to
> > work with multiple vendors. Again, I do not know how this 1s delay came about.
> > Maybe we could try to reduce it to 500ms or so, but for that I need
> > confirmation from someone like Lorenzo who knows the history.
> > 
> > > (Side note: I also just noticed pcie-tegra194.c does the same.)
> > > 
> > > > > My guess is that (1) is the case, and specifically that the relevant folks are
> > > > > using the pcie-qcom.c, with its "global" IRQ used for link-up events.
> > > > > 
> > > > 
> > > > We only recently added support for 'Link Up' event through 'global_irq' in the
> > > > controller driver. And this was done to avoid waiting for link up during probe
> > > 
> > > You're kind of reinforcing my question: you don't like the waste, so
> > > you're adding link-up IRQ support -- is that really the only way?
> > > 
> > 
> > I don't know. But so far I haven't seen any other sensible way which is generic.
> > 
> > > (My initial thought: no, it's not. We know when pwrctrl has done its
> > > thing -- why should we bother polling for link state before that? But
> > > that's easier said than done, when pwrctrl is optional and highly
> > > abstracted away from the DWC driver...)
> > > 
> > 
> > Oh well... this is where you got it wrong. By design, pwrctrl drivers are only
> > probed just before enumeration (which is way after starting the link). As of
> > v6.15-rc1, before we try to enumerate any device, we check if there is any
> > device defined in DT which requires a power supply. If so, we create a platform
> > device (or pwrctrl device) and let the pwrctrl driver bind to it and power up
> > the device. In that case, we also do not proceed to scan the bus further and
> > skip that hierarchy, because the pwrctrl driver will rescan the bus once it has
> > finished powering up the device.
> 
> It sounds like you're saying "it's the way that it is, because of the
> way that it is." I understand how it is currently structured, but I'm
> saying that I think pwrctrl is hooked in at the wrong place. It looks
> cute and clean, but it has the ordering wrong.
> 
> IMO, we should allow pwrctrl to power things up earlier, so that
> controller drivers have a better chance of hitting the optimal cases
> (case (3) above) properly. (That's also how every pre-pwrctrl driver
> does things, and I think it's for good reason.)
> 
> That would also resolve my PERST# and other timing questions, because
> the controller driver would better know when pwrctrl is finished, and so
> can better handle PERST# and any necessary delays.
> 
> I agree this might be more difficult to do in a "generic" way (per your
> above language), depending on your definition of generic. But IMO, it's
> important to prioritize doing things correctly, even if it's slightly
> less cute.
> 
> As an example less-cute way of doing pwrctrl: expose a wrapped version
> of pci_pwrctrl_create_device() such that drivers can call it earlier. If
> there is a pwrctrl device created, that means a driver should not yet
> wait for link-up -- it should defer that until the relevant pwrctrl is
> marked "ready". (There are likely other problems to solve in here too,
> but this is just an initial sketch. And to be clear, I suspect this
> doesn't fit your notion of "generic", because it requires each driver to
> adapt to it.)
> 

This is what I initially had in mind, but then I opted for a solution that
allowed the pwrctrl devices to be created in the PCI core itself, without
any modifications to the controller drivers.

But I totally agree with you that, as things stand, we don't have any
control over PERST#, and that should be fixed.
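
Something like a new host-bridge-level callback, perhaps (illustrative
only; the hook and the driver wiring below are invented, while
gpiod_set_value_cansleep() is real):

  /* In struct pci_host_bridge: */
  void (*perst_set)(struct pci_host_bridge *bridge, bool assert);

  /* A controller driver wires it to whatever PERST# GPIO it already has: */
  static void dw_plat_perst_set(struct pci_host_bridge *bridge, bool assert)
  {
          struct dw_plat_pcie *plat = bridge->sysdata;        /* driver private */

          gpiod_set_value_cansleep(plat->perst_gpio, assert); /* hypothetical field */
  }

pwrctrl could then call bridge->perst_set() between powering the slot and
rescanning, with the T_PVPERL delay enforced in between.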

> > > Regarding the controller design: frankly, I don't think my controller
> > > does anything all that revolutionary in this space [0]. All of my
> > > questions today can be asked (from what I can tell) of existing upstream
> > > controller drivers. I'm mostly trying to understand the expected driver
> > > design here, and that includes teasing apart what is "stuff done in
> > > 'old' drivers, but isn't recommended", and "what is currently
> > > unimplemented in new stuff" (like pwrctrl [1]), and where do my
> > > expectations fit in between that.
> > > 
> > > For instance, poking around a bit, I came up with this question: when
> > > using pci/pwrctrl, how does one ensure timing requirements around, e.g.,
> > > power stability vs PERST# deassertion are met? When looking at a pwrctrl
> > > driver like drivers/pci/pwrctrl/slot.c, the process looks too simple:
> > > 
> > > (0) host bridge probably already started its LTSSM, deasserted PERST#
> > > (1) slot.c powers the slot
> > > (2) pci_pwrctrl_device_set_ready() -> rescan_work_func() rescans the bus
> > > 
> > > Notably, there's no enforced delay between (1) and (2).
> > > 
> > > Reading the PCIe CEM, it seems we're violating some specification bits,
> > > like:
> > > 
> > >   2.2. PERST# Signal
> > >   [...] On power-up, the de-assertion of PERST# is delayed 100 ms
> > >   (TPVPERL) from the power rails achieving specified operating limits.
> > >   [...]
> > > 
> > > There are references to this in various implementations (e.g.,
> > > tegra_pcie_enable_slot_regulators() and brcm_pcie_start_link() --
> > > although I suspect the latter is applying the wrong ordering).
> > > 
> > > Additionally, CEM also seems to suggest we have PERST# ordering wrong. It
> > > should also come between (1) and (2), not at (0).
> > > 
> > 
> > You are absolutely right! Currently, we follow the timing requirement while
> > deasserting PERST# in the controller drivers. But once we power on the slot,
> > we do not touch PERST#, and it just happens to work.
> > 
> > We may need to introduce another callback that toggles PERST# so that we can use
> > it while powering up the device.
> > 
> > > And finally (for now), I don't understand how we have any guarantee that
> > > step (2) is useful. Even if we've already started the LTSSM in (0), we
> > > have no idea if the link is actually Active by the time we hit (2), and
> > > so rescanning may not actually discover the device. And if that scan
> > > fails ... then when do we trigger another pci_rescan_bus()? Only if the
> > > implementation has a "link-up" IRQ?
> > > 
> > 
> > As I said above, we do not enumerate the device if it has a devicetree node with
> > supplies. So that's why we need (2). Otherwise, the device won't be enumerated
> > at all, unless userspace does the rescan (which defeats the purpose of pwrctrl).
> 
> But you haven't addressed one of the concerns in my paragraph: how do we
> know the link is up by the time we hit the pwrctrl-initiated
> pci_rescan_bus()? We haven't gone back to ask the host bridge if it's up
> yet. We're just hoping we get lucky.
> 
> IOW, the pwrctrl sequence should be something like:
> 
>  (1) power up the slot
>  (1)(a) delay, per spec
>  (1)(b) deassert PERST#
>  (1)(c) wait for link up
>  (2) rescan bus
> 
> Currently, we skip all of (1)(a)-(c). We're probably lucky that (1)(b)'s
> ordering doesn't matter all the time, as long as we did it earlier. And
> we're lucky that there are natural delays in software such that the lack
> of (1)(a) and (1)(c) isn't significant.
> 

Let me go back to the drawing board and come up with a proposal. There are
at least a couple of ways to fix this issue, and I need to pick the least
intrusive one.

Thanks for reporting it, appreciated!

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [RFC] PCI: pwrctrl and link-up dependencies
  2025-04-16  5:59         ` Manivannan Sadhasivam
@ 2025-04-16 17:14           ` Brian Norris
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Norris @ 2025-04-16 17:14 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bartosz Golaszewski, Bjorn Helgaas, Jingoo Han, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Krishna Chaitanya Chundru, Rob Herring,
	linux-pci, linux-kernel, linux-arm-msm, dmitry.baryshkov,
	Tsai Sung-Fu

Hi Manivannan,

On Wed, Apr 16, 2025 at 11:29:32AM +0530, Manivannan Sadhasivam wrote:
> On Tue, Apr 15, 2025 at 11:24:39AM -0700, Brian Norris wrote:
> > We might be talking past each other. Per above, I think we can do better
> > with (1)-(3). But you're bringing up (4). Problem (3) exists for all
> > drivers, although it's more acute with DWC, and I happen to be using it.
> > I also think it's indicative of larger design and ordering problems in
> > pwrctrl.
> > 
> 
> Now I get what you are saying.

Great! I probably didn't include all my thoughts in the first place, but
then, my first email was already long enough :)

> > As an example less-cute way of doing pwrctrl: expose a wrapped version
> > of pci_pwrctrl_create_device() such that drivers can call it earlier. If
> > there is a pwrctrl device created, that means a driver should not yet
> > wait for link-up -- it should defer that until the relevant pwrctrl is
> > marked "ready". (There are likely other problems to solve in here too,
> > but this is just an initial sketch. And to be clear, I suspect this
> > doesn't fit your notion of "generic", because it requires each driver to
> > adapt to it.)
> > 
> 
> This is what I initially had in mind, but then I opted for a solution that
> allowed the pwrctrl devices to be created in the PCI core itself, without
> any modifications to the controller drivers.
> 
> But I totally agree with you that, as things stand, we don't have any
> control over PERST#, and that should be fixed.

Yeah, if we have to handle PERST# and its timing, then we have to touch
essentially every driver anyway, I think. So it's definitely a chance to
go a (slightly) different direction.

(Side note: I think this is potentially a chance to solve the odd I2C
pwrctrl problem I linked in my original post with the same set of hooks.
If a controller driver can know when pwrctrl is finished, then it can
also defer its LTSSM until after that point.

I doubt this will be the last set of "odd" HW where additional
platform-specific dependencies need to be inserted between pwrctrl and
PCI enumeration.)
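
As a strawman for that deferral -- untested, and the notifier, its event
and the pwrctrl_nb field are all invented, while the DWC helpers are the
existing ones:

  static int dw_pcie_pwrctrl_notify(struct notifier_block *nb,
                                    unsigned long event, void *data)
  {
          struct dw_pcie_rp *pp = container_of(nb, struct dw_pcie_rp, pwrctrl_nb);
          struct dw_pcie *pci = to_dw_pcie_from_pp(pp);

          if (event == PCI_PWRCTRL_READY) {       /* invented event */
                  /* Kick the LTSSM only now that the endpoint is powered */
                  if (!pci->ops->start_link(pci) &&
                      !dw_pcie_wait_for_link(pci))
                          pci_rescan_bus(pp->bridge->bus);
          }

          return NOTIFY_OK;
  }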

> > IOW, the pwrctl sequence should be something like:
> > 
> >  (1) power up the slot
> >  (1)(a) delay, per spec
> >  (1)(b) deassert PERST#
> >  (1)(c) wait for link up
> >  (2) rescan bus
> > 
> > Currently, we skip all of (1)(a)-(c). We're probably lucky that (1)(b)'s
> > ordering doesn't matter all the time, as long as we did it earlier. And
> > we're lucky that there are natural delays in software such that lack of
> > (1)(a) and (1)(c) aren't significant.
> > 
> 
> Let me go back to the drawing board and come up with a proposal. There are
> at least a couple of ways to fix this issue, and I need to pick the least
> intrusive one.

That's kind of you. Let me know if I can help at all. Or at least CC me
on any updates you have.

> Thanks for reporting it, appreciated!

Thanks for walking through it with me!

Brian
