linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Lukas Wunner <lukas@wunner.de>,
	Keith Busch <keith.busch@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Matthias Andree <matthias.andree@gmx.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
Date: Tue, 29 Oct 2019 15:27:09 -0500	[thread overview]
Message-ID: <20191029202708.GA38926@google.com> (raw)
In-Reply-To: <20191029111520.GE2593@lahna.fi.intel.com>

On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > The related hardware event is resume in this case. Can you point
> > > me to the actual point where you want me to put this?
> > 
> > "Resume" is a Linux software concept, so of course the PCIe spec
> > doesn't say anything about it.  The spec talks about delays
> > related to resets and device power and link state transitions, so
> > somehow we have to connect the Linux delay with those hardware
> > events.
> > 
> > Since we're talking about a transition from D3cold, this has to be
> > done via something external to the device such as power
> > regulators.  For ACPI systems that's probably hidden inside _PS0
> > or something similar.  That's opaque, but at least it's a hook
> > that says "here's where we put the device into D0".  I suggested
> > acpi_pci_set_power_state() as a possibility since I think that's
> > the lowest-level point where we have the pci_dev so we know the
> > current state and the new state.
> 
> I looked at how we could use acpi_pci_set_power_state() but I don't
> think it is possible because it is likely that only the root port
> has the power resource that is used to bring the link to L2 or L3.
> However, we would need to repeat the delay for each downstream/root
> port if there are multiple PCIe switches in the topology.

OK, I think I understand why that's a problem (correct me if I'm
wrong):

  We call pci_pm_resume_noirq() for every device, but it only calls
  acpi_pci_set_power_state() for devices that have _PS0 or _PR0
  methods.  So if the delay is in acpi_pci_set_power_state() and we
  have A -> B -> C where only A has _PS0, we would delay for the link
  to B to come up, but not for the link to C.

I do see that we do need both delays.  In acpi_pci_set_power_state()
when we transition A from D3cold->D0, I assume that single _PS0
evaluation on A causes B to transition from D3cold->D3hot, which in
turn causes C to transition from D3cold->D3hot.  Is that your
understanding, too?

We do know that topology in acpi_pci_set_power_state(), since we have
the pci_dev for A, so it seems conceivable that we could descend the
hierarchy and delay for each level.

If the delay is in pci_pm_resume_noirq() (as in your patch), what
happens with a switch with several Downstream Ports?  I assume that
all the Downstream Ports start their transition out of D3cold
basically simultaneously, so we probably don't need N delays, do we?
It seems a little messy to optimize this in pci_pm_resume_noirq().

The outline of the pci_pm_resume_noirq() part of this patch is:

  pci_pm_resume_noirq
    if (!dev->skip_bus_pm ...)   # <-- condition 1
      pci_pm_default_resume_early
        pci_power_up
          if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
            platform_pci_set_power_state
              pci_platform_pm->set_state
                acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
                  acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
+   if (d3cold)                  # <-- condition 2
+     pci_bridge_wait_for_secondary_bus

Another thing that niggles at me here is that the condition for
calling pci_bridge_wait_for_secondary_bus() is completely different
than the condition for changing the power state.  If we didn't change
the power state, there's no reason to wait, is there?

The outline of the pci_pm_runtime_resume() part of this patch is:

  pci_pm_runtime_resume
    pci_restore_standard_config
      if (dev->current_state != PCI_D0)
        pci_set_power_state(PCI_D0)
          __pci_start_power_transition
            pci_platform_power_transition
              if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
                platform_pci_set_power_state
                  pci_platform_pm->set_state
                    acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
                      acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
              pci_raw_set_power_state
          __pci_complete_power_transition
+   if (d3cold)
+     pci_bridge_wait_for_secondary_bus

In this part, the power state change is inside
pci_restore_standard_config(), which calls pci_set_power_state().
There are many other callers of pci_set_power_state(); can we be sure
that none of them need a delay?

> > But it seems that at least some ACPI firmware doesn't do those
> > delays, so I guess our only alternatives are to always do it in
> > the OS or have some sort of blacklist.  And it doesn't really seem
> > practical to maintain a blacklist.
> 
> I really think this is crystal clear:

I am agreeing with you that the OS needs to do the delays.

> The OS is always responsible for the delays described in the PCIe
> spec.

If the ACPI spec contained this statement, it would be useful, but I
haven't seen it.  It's certainly true that some combination of
firmware and the OS is responsible for the delays :)

> However, if the platform implements some of them say in _ON or _PS0
> methods then it can notify the OS about this by using the _DSM so
> the OS does not need to duplicate all of them.

That makes good sense, but there are other reasons for using that
_DSM, e.g., firmware may know that MID or similar devices are not
really PCI devices and don't need delays anywhere.  So the existence
of the _DSM by itself doesn't convince me that the OS is responsible
for the delays.

> > > > In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
> > > > specifically to do a reset, so No_Soft_Reset is false.
> > > > Doesn't 6.6.1 say we need at least 100ms here?
> > > 
> > > No since it does not go into D3cold. It just "reset" the thing
> > > if it happens to do internal reset after D3hot -> D0.
> > 
> > Sec 5.8, Figure 5-18 says D3hot->D0uninitialized is a "Soft
> > Reset", which unfortunately is not defined.
> > 
> > My guess is that in sec 5.9, Table 5-13, the 10ms delay is for the
> > D3hot->D0active (i.e., No_Soft_Reset=1) transition, and the
> > D3hot->D0uninitialized (i.e., No_Soft_Reset=0) that does a "soft
> > reset" (whatever that is) probably requires more and we should
> > handle it like a conventional reset to be safe.
> 
> I think it simply means the device functional context is lost (there
> is more in section 5.3.1.4). Linux handles this properly already
> (well at least according the minimum timings required by the spec)
> and restores the context accordingly after it has waited for the
> 10ms.
> 
> It is the D3cold (where links go to L2 or L3) where we really need
> the delays so that the link gets properly trained before we start
> poking the downstream device.

I'm already speculating above, so I don't think I can contribute
anything useful here.

  reply	other threads:[~2019-10-29 20:27 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 20:58         ` Marc MERLIN
2020-09-07 23:51           ` Karol Herbst
2020-09-08  0:29             ` Marc MERLIN
     [not found]               ` <20200529180315.GA18804@merlins.org>
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2020-12-29 15:51                 ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas [this message]
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191029202708.GA38926@google.com \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=matthias.andree@gmx.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mr.nuke.me@gmail.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=pmenzel@molgen.mpg.de \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).