From: Ethan Zhao <etzhao1900@gmail.com>
To: Brian Norris <briannorris@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH] PCI/sysfs: Ensure devices are powered for config reads
Date: Thu, 21 Aug 2025 20:41:28 +0800 [thread overview]
Message-ID: <048bd3c4-887c-4d17-9636-354cc626afa3@gmail.com> (raw)
In-Reply-To: <aKaK4WS0pY0Nb2yi@google.com>
On 8/21/2025 10:56 AM, Brian Norris wrote:
> On Thu, Aug 21, 2025 at 08:54:52AM +0800, Ethan Zhao wrote:
>> On 8/21/2025 1:26 AM, Brian Norris wrote:
>>> From: Brian Norris <briannorris@google.com>
>>>
>>> max_link_speed, max_link_width, current_link_speed, current_link_width,
>>> secondary_bus_number, and subordinate_bus_number all access config
>>> registers, but they don't check the runtime PM state. If the device is
>>> in D3cold, we may see -EINVAL or even bogus values.
>> My understanding, if your device is in D3cold, returning of -EINVAL is
>> the right behavior.
>
> That's not the guaranteed result though. Some hosts don't properly
> return PCIBIOS_DEVICE_NOT_FOUND, for one. But also, it's racy -- because
> we don't even try to hold a pm_runtime reference, the device could
> possibly enter D3cold while we're in the middle of reading from it. If
> you're lucky, that'll get you a completion timeout and an all-1's
> result, and we'll return a garbage result.
>
> So if we want to purposely not resume the device and retain "I can't
> give you what you asked for" behavior, we'd at least need a
> pm_runtime_get_noresume() or similar.
I understand you just want the stable result of these caps, meanwhile
you don't want the side effect either.>
>>> Wrap these access in pci_config_pm_runtime_{get,put}() like most of the
>>> rest of the similar sysfs attributes.
>>>
>>> Fixes: 56c1af4606f0 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Brian Norris <briannorris@google.com>
>>> Signed-off-by: Brian Norris <briannorris@chromium.org>
>>> ---
>>>
>>> drivers/pci/pci-sysfs.c | 32 +++++++++++++++++++++++++++++---
>>> 1 file changed, 29 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>> index 5eea14c1f7f5..160df897dc5e 100644
>>> --- a/drivers/pci/pci-sysfs.c
>>> +++ b/drivers/pci/pci-sysfs.c
>>> @@ -191,9 +191,16 @@ static ssize_t max_link_speed_show(struct device *dev,
>>> struct device_attribute *attr, char *buf)
>>> {
>>> struct pci_dev *pdev = to_pci_dev(dev);
>>> + ssize_t ret;
>>> +
>>> + pci_config_pm_runtime_get(pdev);
>> This function would potentially change the power state of device,
>> that would be a complex process, beyond the meaning of
>> max_link_speed_show(), given the semantics of these functions (
>> max_link_speed_show()/max_link_width_show()/current_link_speed_show()/
>> ....),
>> this cannot be done !
>
> What makes this different than the 'config' attribute (i.e., "read
> config register")? Why shouldn't that just return -EINVAL? I don't
> really buy your reasoning -- "it's a complex process" is not a reason
It is a reason to know there is side effect to be taken into account.>
not to do something. The user asked for the link speed; why not give it?
> If the user wanted to know if the device was powered, they could check
> the 'power_state' attribute instead.
>
> (Side note: these attributes don't show up anywhere in Documentation/,
> so it's also a bit hard to declare "best" semantics for them.)
>
> To flip this question around a bit: if I have a system that aggressively
> suspends devices when there's no recent activity, how am I supposed to
> check what the link speed is? Probabilistically hammer the file while
> hoping some other activity wakes the device, so I can find the small
> windows of time where it's RPM_ACTIVE? Disable runtime_pm for the device
> while I check?
Hold a PM reference by pci_config_pm_runtime_get() and then write some
data to the PCIe config space, no objection.
To know about the linkspeed etc capabilities/not status, how about
creating a cached version of these caps, no need to change their
power state.
If there is aggressive power saving requirement, and the polling
of these caps will make up wakeup/poweron bugs.
Thanks,
Ethan
>
> Brian
next prev parent reply other threads:[~2025-08-21 12:41 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-20 17:26 [PATCH] PCI/sysfs: Ensure devices are powered for config reads Brian Norris
2025-08-21 0:54 ` Ethan Zhao
2025-08-21 2:56 ` Brian Norris
2025-08-21 12:41 ` Ethan Zhao [this message]
2025-08-21 15:28 ` Brian Norris
2025-08-22 1:11 ` Ethan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=048bd3c4-887c-4d17-9636-354cc626afa3@gmail.com \
--to=etzhao1900@gmail.com \
--cc=bhelgaas@google.com \
--cc=briannorris@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).