From: Alex Williamson <alex@shazbot.org>
To: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Cc: bhelgaas@google.com, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, alex@shazbot.org
Subject: Re: [PATCH v2] PCI: Force PM reset for Qualcomm devices with NoSoftRst+
Date: Mon, 11 May 2026 13:36:43 -0600
Message-ID: <20260511133643.73a16e69@shazbot.org>
In-Reply-To: <20260511122622.35311-1-jtornosm@redhat.com>
On Mon, 11 May 2026 14:26:21 +0200
Jose Ignacio Tornos Martinez <jtornosm@redhat.com> wrote:
>
> > What does the reset_method sysfs attribute report for these devices on an
> > unpatched kernel?
> The kernel we use doesn't have CONFIG_PCI_RESET_SYSFS enabled,
> so reset_method is not available. However, I can provide the actual
> behavior observed through testing and dmesg logs.
What kernel is this? I don't find any reference to such a Kconfig
option.
> > I'd tend to expect these are single-function devices where bus reset
> > would be available as a function level reset.
> Yes, these are single-function devices (PCI header type 00).
> For example, here's the ath11k device: lspci -xxx -s 0000:03:00.0 | head -2
> 03:00.0 Network controller: Qualcomm Technologies, Inc QCNFA765
> 00: cb 17 03 11 06 05 10 00 01 00 80 02 10 00 00 00
>                                               ^^
> Header type (offset 0x0e): 00 (single-function)
>
> > I'm very suspicious that this is just masking an underlying issue
> > relative to bus reset for these devices
> Yes, you are right, there is an underlying bus reset issue. Let me explain
> what I have observed through the testing:
> Testing showed no reset is performed at all. During both VM startup and
> virsh reset operations, there are no reset-related messages in dmesg.
> The reset hierarchy returns -ENOTTY at each step:
> - No FLR (device doesn't advertise it)
> - PM reset returns -ENOTTY (NoSoftRst+ flag)
> - Bus reset apparently not attempted
Bus reset should be used for function level reset of a single function
device unless either the downstream port or the endpoint is quirked to
prevent it. I don't see any such quirk for 17cb:1103. What's the ID
of the root port?
> When testing the suggested quirk_no_flr() approach (which worked for
> mt7925e), dmesg shows secondary bus reset is attempted:
> vfio-pci 0000:06:00.0: enabling device (0000 -> 0002)
> vfio-pci 0000:06:00.0: resetting
> pcieport 0000:00:1c.4: unlocked secondary bus reset via: __pci_reset_function_locked
> vfio-pci 0000:06:00.0: reset done
> However, the device becomes unresponsive after this:
> lspci -vvvvvvvvvvvv -s 0000:03:00.0
> 03:00.0 Network controller: Qualcomm Technologies, Inc (rev ff) (prog-if ff)
> !!! Unknown header type 7f
> And all config space reads return 0xFF, indicating the device is not
> responding after bus reset.
> Using PM reset (D3hot->D0) instead, the reset succeeds and the device works correctly
> through multiple VM lifecycles (startup, virsh reset, shutdown/restart).
>
> > especially if we haven't actually verified the device state is
> > actually reset on transition back to D0
> The verification is functional: with our patch, the device successfully
> initializes in the guest after VM reset operations, and continues working
> through multiple reset cycles. Without a working reset (default kernel),
> WiFi devices (ath11k, ath12k) cannot be reused after VM termination, and
> modem devices (SDX62/SDX65) fail to initialize even on first VM assignment.
>
> Summary:
> You're correct that there's a bus reset issue: SBR breaks these devices.
> The question is whether we should:
> 1. Investigate why SBR breaks these single-function devices
Then why aren't we setting quirks to use quirk_no_bus_reset() for these
devices?
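The evidence that would back such a quirk is the all-ones config space quoted earlier. Below is a hypothetical userspace helper for checking it, assuming input in the "lspci -xxx" row format shown above. It decodes vendor ID, device ID and header layout from the first 16 config bytes and flags a function whose reads all return 0xff. Masking the header type with 0x7f is also why an all-ones read surfaces in lspci as "Unknown header type 7f".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parse the first row of "lspci -xxx" ("00: b0 b1 ... b15") into 16 bytes. */
static bool parse_lspci_row(const char *row, uint8_t out[16])
{
	unsigned int off, b[16];

	assert(row != NULL && out != NULL);
	if (sscanf(row, "%x: %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x",
		   &off, &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &b[6], &b[7],
		   &b[8], &b[9], &b[10], &b[11], &b[12], &b[13], &b[14], &b[15]) != 17
	    || off != 0)
		return false;
	for (int i = 0; i < 16; i++) {
		if (b[i] > 0xff)
			return false;
		out[i] = (uint8_t)b[i];
	}
	return true;
}

/* Config space registers are little-endian. */
static uint16_t vendor_id(const uint8_t *cfg) { return (uint16_t)(cfg[0x00] | cfg[0x01] << 8); }
static uint16_t device_id(const uint8_t *cfg) { return (uint16_t)(cfg[0x02] | cfg[0x03] << 8); }

/* Header layout: bits 6:0 of offset 0x0e (bit 7 is the multi-function flag). */
static uint8_t header_layout(const uint8_t *cfg) { return cfg[0x0e] & 0x7f; }

/* A function that no longer responds typically reads back as all ones. */
static bool reads_all_ones(const uint8_t *cfg, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (cfg[i] != 0xff)
			return false;
	return true;
}
```

On the healthy dump this yields 17cb:1103 with layout 0; on the post-SBR state it yields layout 0x7f and revision 0xff, matching the "(rev ff)" in the quoted lspci output.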
> 2. Use PM reset which demonstrably works
> Option 1 may involve firmware-level investigation, while the PM reset
> approach provides a working solution.
> This situation is similar to existing quirks: quirk_no_flr() works around
> devices with broken FLR implementations. Here we're working around devices
> that incorrectly advertise NoSoftRst+ (preventing PM reset) while SBR doesn't
> work properly.
> I'm open to your guidance on the best path forward.
Proving that an advertised reset method doesn't work is much easier
than proving an unadvertised reset method does work. What's being
proposed here effectively ignores 1) while asserting that 2) then
works. Does 2) work only because it prevents the fall-through to 1),
which is known broken, or does it have merit on its own? I can't tell.
Whether supported in your kernel or not, the mainline kernel does also
have support for modifying reset method priorities through sysfs (the
reset_method attribute), so the fall-through order assumed here isn't
necessarily what everyone will experience.
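A small model of why the order matters: the loop below walks a space-separated priority list in the style of the reset_method attribute and picks the first method the device supports. Only flr, pm and bus are modelled, and the names are hypothetical. The kernel knows more methods and only falls through on -ENOTTY, so treat this as an illustration, not the kernel's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the kernel's reset methods. */
enum reset_method { RM_NONE = -1, RM_FLR, RM_PM, RM_BUS, RM_COUNT };
static const char *const rm_names[RM_COUNT] = { "flr", "pm", "bus" };

/*
 * Walk a space-separated priority list (reset_method style) and return the
 * first method the device supports, or RM_NONE if none applies.
 */
static enum reset_method pick_reset(const char *order, const bool supported[RM_COUNT])
{
	char buf[64];

	assert(order != NULL && strlen(order) < sizeof buf);
	strcpy(buf, order);
	for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
		for (int m = 0; m < RM_COUNT; m++)
			if (strcmp(tok, rm_names[m]) == 0 && supported[m])
				return (enum reset_method)m;
	return RM_NONE;
}
```

With an "flr pm bus" order, a device patched to allow PM reset lands on pm, but a user who has put bus ahead of pm still gets the known-broken SBR.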
I would start with disabling the reset methods that are known broken,
FLR and bus reset. Test whether that results in reliable behavior.
If that's still not as reliable as you're seeing by adding the
transition through D3hot, then I'd be open to the discussion of whether
these devices do in fact need a device specific reset or quirk to PM
reset (and everywhere else that tests PCI_PM_CTRL_NO_SOFT_RESET).
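The NoSoftRst test itself is small. Below is a simplified userspace model of the probe step of a PM reset, plus the PMCSR values written for a D3hot to D0 cycle. The bit values match include/uapi/linux/pci_regs.h. The ignore_no_soft_reset flag is a hypothetical stand-in for the quirk under discussion, and the real pci_pm_reset() also checks the current power state and waits for the device to become ready.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* PM Control/Status register bits (values from include/uapi/linux/pci_regs.h). */
#define PCI_PM_CTRL_STATE_MASK    0x0003 /* current power state, D0..D3hot */
#define PCI_PM_CTRL_NO_SOFT_RESET 0x0008 /* no internal reset on D3hot->D0 */
#define PM_STATE_D0    0x0
#define PM_STATE_D3HOT 0x3

/*
 * Simplified probe: may a D3hot->D0 cycle be used as a reset?
 * ignore_no_soft_reset is a hypothetical quirk flag, not a kernel field.
 */
static int pm_reset_probe(bool has_pm_cap, uint16_t pmcsr, bool ignore_no_soft_reset)
{
	if (!has_pm_cap)
		return -ENOTTY;
	if ((pmcsr & PCI_PM_CTRL_NO_SOFT_RESET) && !ignore_no_soft_reset)
		return -ENOTTY; /* device says D3hot->D0 does not reset it */
	return 0;
}

/* PMCSR value requesting the given power state, other bits unchanged. */
static uint16_t pmcsr_set_state(uint16_t pmcsr, uint16_t state)
{
	assert(state <= PCI_PM_CTRL_STATE_MASK);
	return (uint16_t)((pmcsr & ~PCI_PM_CTRL_STATE_MASK) | state);
}
```

For a PMCSR of 0x0008 (D0, NoSoftRst+) the probe returns -ENOTTY unless the hypothetical quirk flag is set; the cycle then writes 0x000b (D3hot) followed by 0x0008 (D0).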
The previous patch[1] proposed a device specific reset passing the
device through D3cold. This muddies the waters a bit because D3cold
will actually power off the device causing a reset, but the ability to
enter D3cold depends on the platform, not the device. We can't tell
from the code what state the device actually entered there.
OTOH, the quirk proposed here would only achieve D3hot. Are the BAR
values preserved or cleared immediately after transition to D0? If
cleared, that could provide supporting evidence that NoSoftRst is
actually misrepresented by the device. If not, we're really just
looking at a heuristic that an internal reset might be occurring, but
only the vendor could confirm. Thanks,
Alex
PS - D3cold might be an interesting reset method that could be
implemented for single function endpoints in slots that support it.
[1] https://lore.kernel.org/all/20260507142916.392983-1-jtornosm@redhat.com/