Linux wireless drivers development
* [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
@ 2026-06-12 14:26 Jose Ignacio Tornos Martinez
  2026-06-12 15:12 ` Alex Williamson
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jose Ignacio Tornos Martinez @ 2026-06-12 14:26 UTC (permalink / raw)
  To: bhelgaas, alex
  Cc: jjohnson, mani, linux-pci, linux-wireless, ath11k, ath12k, mhi,
	linux-kernel, Jose Ignacio Tornos Martinez

Some Qualcomm PCIe devices (WCN6855/WCN7850 WiFi cards, SDX62/SDX65 modems)
lack working reset methods for VFIO passthrough scenarios. These devices
have no FLR capability, advertise NoSoftRst+ (blocking PM reset), and have
broken bus reset.

The problem manifests in VFIO passthrough scenarios:

- WCN6855 WiFi card (17cb:1103): Normal VM operation works fine, including
  clean shutdown/reboot. However, when the VM terminates uncleanly
  (crash, force-off), VFIO attempts to reset the device before it can
  be assigned to another VM. Without a working reset method, the device
  remains in an undefined state, preventing reuse.

- WCN7850 WiFi card (17cb:1107): Same behavior as WCN6855.

- SDX62/SDX65 5G modems (17cb:0308): Never successfully initialize even
  on first VM assignment without proper reset capability.

Add device-specific reset entries for these Qualcomm devices using D3hot
power cycling. Testing shows that despite advertising NoSoftRst+, D3hot
transition provides sufficient reset for VFIO reuse, particularly after
unexpected VM termination. While not a complete reset (BARs preserved),
it provides the only viable reset mechanism for these devices.

Testing was performed on desktop platforms with M.2 WiFi and modem cards
using M.2-to-PCIe adapters, including extensive force-reset cycling to
verify stability.

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
---
v9:
  - Complete redesign based on maintainer feedback (Alex Williamson, Bjorn
    Helgaas, Rafael Wysocki): dropped general d3cold infrastructure entirely
    and now just a single patch: the proven D3hot reset for specific
    Qualcomm devices (device-specific reset)
  - Previous v8 patch 1/3 (general d3cold) dropped: concerns about ACPI
    portability, bridge issues, runtime PM, and lack of _PR3 hardware for
    testing.
  - Previous v8 patch 3/3 (quirk_no_bus_reset) already merged for v7.2
v8: https://lore.kernel.org/all/20260609163649.319755-1-jtornosm@redhat.com/

 drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 431c021d7414..bac1edb6c2dc 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4240,6 +4240,41 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
 	return 0;
 }
 
+/*
+ * Device-specific reset method for certain Qualcomm devices via D3hot power
+ * cycle.
+ *
+ * These specific Qualcomm devices lack FLR capability, advertise NoSoftRst+
+ * (blocking PM reset), and have broken bus reset. Despite advertising
+ * NoSoftRst+, testing shows that D3hot transition provides sufficient reset
+ * for VFIO reuse, particularly after unexpected VM termination where the
+ * device would otherwise remain in an undefined state. While not a complete
+ * reset (BARs are preserved), it provides the only viable reset mechanism for
+ * these devices in the commented situations.
+ */
+static int reset_qualcomm_d3hot(struct pci_dev *dev, bool probe)
+{
+	int ret;
+
+	if (probe)
+		return 0;
+
+	if (dev->current_state != PCI_D0)
+		return -EINVAL;
+
+	ret = pci_set_power_state(dev, PCI_D3hot);
+	if (ret)
+		return ret;
+	msleep(200);
+
+	ret = pci_set_power_state(dev, PCI_D0);
+	if (ret)
+		return ret;
+	msleep(200);
+
+	return 0;
+}
+
 static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
 		 reset_intel_82599_sfp_virtfn },
@@ -4255,6 +4290,9 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 		reset_chelsio_generic_dev },
 	{ PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
 		reset_hinic_vf_dev },
+	{ PCI_VENDOR_ID_QCOM, 0x1103, reset_qualcomm_d3hot },  /* WCN6855 */
+	{ PCI_VENDOR_ID_QCOM, 0x1107, reset_qualcomm_d3hot },  /* WCN7850 */
+	{ PCI_VENDOR_ID_QCOM, 0x0308, reset_qualcomm_d3hot },  /* SDX62/SDX65 */
 	{ 0 }
 };
 
-- 
2.54.0



* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-12 14:26 [PATCH v9] PCI: Add device-specific reset for Qualcomm devices Jose Ignacio Tornos Martinez
@ 2026-06-12 15:12 ` Alex Williamson
  2026-06-12 15:17 ` Bjorn Helgaas
  2026-06-17 14:47 ` Manivannan Sadhasivam
  2 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2026-06-12 15:12 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez
  Cc: bhelgaas, jjohnson, mani, linux-pci, linux-wireless, ath11k,
	ath12k, mhi, linux-kernel, alex

On Fri, 12 Jun 2026 16:26:38 +0200
Jose Ignacio Tornos Martinez <jtornosm@redhat.com> wrote:

> Some Qualcomm PCIe devices (WCN6855/WCN7850 WiFi cards, SDX62/SDX65 modems)
> lack working reset methods for VFIO passthrough scenarios. These devices
> have no FLR capability, advertise NoSoftRst+ (blocking PM reset), and have
> broken bus reset.
> 
> The problem manifests in VFIO passthrough scenarios:
> 
> - WCN6855 WiFi card (17cb:1103): Normal VM operation works fine, including
>   clean shutdown/reboot. However, when the VM terminates uncleanly
>   (crash, force-off), VFIO attempts to reset the device before it can
>   be assigned to another VM. Without a working reset method, the device
>   remains in an undefined state, preventing reuse.
> 
> - WCN7850 WiFi card (17cb:1107): Same behavior as WCN6855.
> 
> - SDX62/SDX65 5G modems (17cb:0308): Never successfully initialize even
>   on first VM assignment without proper reset capability.
> 
> Add device-specific reset entries for these Qualcomm devices using D3hot
> power cycling. Testing shows that despite advertising NoSoftRst+, D3hot
> transition provides sufficient reset for VFIO reuse, particularly after
> unexpected VM termination. While not a complete reset (BARs preserved),
> it provides the only viable reset mechanism for these devices.
> 
> Testing was performed on desktop platforms with M.2 WiFi and modem cards
> using M.2-to-PCIe adapters, including extensive force-reset cycling to
> verify stability.
> 
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
> v9:
>   - Complete redesign based on maintainer feedback (Alex Williamson, Bjorn
>     Helgaas, Rafael Wysocki): dropped general d3cold infrastructure entirely
>     and now just a single patch: the proven D3hot reset for specific
>     Qualcomm devices (device-specific reset)
>   - Previous v8 patch 1/3 (general d3cold) dropped: concerns about ACPI
>     portability, bridge issues, runtime PM, and lack of _PR3 hardware for
>     testing.
>   - Previous v8 patch 3/3 (quirk_no_bus_reset) already merged for v7.2
> v8: https://lore.kernel.org/all/20260609163649.319755-1-jtornosm@redhat.com/
> 
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 431c021d7414..bac1edb6c2dc 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4240,6 +4240,41 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
>  	return 0;
>  }
>  
> +/*
> + * Device-specific reset method for certain Qualcomm devices via D3hot power
> + * cycle.
> + *
> + * These specific Qualcomm devices lack FLR capability, advertise NoSoftRst+
> + * (blocking PM reset), and have broken bus reset. Despite advertising
> + * NoSoftRst+, testing shows that D3hot transition provides sufficient reset
> + * for VFIO reuse, particularly after unexpected VM termination where the
> + * device would otherwise remain in an undefined state. While not a complete
> + * reset (BARs are preserved), it provides the only viable reset mechanism for
> + * these devices in the commented situations.
> + */
> +static int reset_qualcomm_d3hot(struct pci_dev *dev, bool probe)
> +{
> +	int ret;
> +
> +	if (probe)
> +		return 0;
> +
> +	if (dev->current_state != PCI_D0)
> +		return -EINVAL;
> +
> +	ret = pci_set_power_state(dev, PCI_D3hot);
> +	if (ret)
> +		return ret;
> +	msleep(200);
> +
> +	ret = pci_set_power_state(dev, PCI_D0);
> +	if (ret)
> +		return ret;
> +	msleep(200);
> +
> +	return 0;
> +}
> +
>  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
>  		 reset_intel_82599_sfp_virtfn },
> @@ -4255,6 +4290,9 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  		reset_chelsio_generic_dev },
>  	{ PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
>  		reset_hinic_vf_dev },
> +	{ PCI_VENDOR_ID_QCOM, 0x1103, reset_qualcomm_d3hot },  /* WCN6855 */
> +	{ PCI_VENDOR_ID_QCOM, 0x1107, reset_qualcomm_d3hot },  /* WCN7850 */
> +	{ PCI_VENDOR_ID_QCOM, 0x0308, reset_qualcomm_d3hot },  /* SDX62/SDX65 */
>  	{ 0 }
>  };
>  

Comment and scope are better, but this is duplicating the body of
pci_pm_reset() using a different mechanism with different timeouts. It
would be better to extract the core of pci_pm_reset() to a
pci_do_pm_reset() function that's used both here and by the
pci_pm_reset() function. Thanks,

Alex


* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-12 14:26 [PATCH v9] PCI: Add device-specific reset for Qualcomm devices Jose Ignacio Tornos Martinez
  2026-06-12 15:12 ` Alex Williamson
@ 2026-06-12 15:17 ` Bjorn Helgaas
  2026-06-15  7:30   ` Jose Ignacio Tornos Martinez
  2026-06-17 14:47 ` Manivannan Sadhasivam
  2 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2026-06-12 15:17 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez, Alex Williamson
  Cc: bhelgaas, jjohnson, mani, linux-pci, linux-wireless, ath11k,
	ath12k, mhi, linux-kernel

[+to: Alex, VFIO resets]

On Fri, Jun 12, 2026 at 04:26:38PM +0200, Jose Ignacio Tornos Martinez wrote:
> Some Qualcomm PCIe devices (WCN6855/WCN7850 WiFi cards, SDX62/SDX65 modems)
> lack working reset methods for VFIO passthrough scenarios. These devices
> have no FLR capability, advertise NoSoftRst+ (blocking PM reset), and have
> broken bus reset.

I guess "bus reset" here refers to Secondary Bus Reset being asserted
by the bridge upstream from these devices?  Seems a bit surprising if
that doesn't work.  Or is it just that we can't use SBR because there
are multiple devices below that bridge?

> The problem manifests in VFIO passthrough scenarios:
> 
> - WCN6855 WiFi card (17cb:1103): Normal VM operation works fine, including
>   clean shutdown/reboot. However, when the VM terminates uncleanly
>   (crash, force-off), VFIO attempts to reset the device before it can
>   be assigned to another VM. Without a working reset method, the device
>   remains in an undefined state, preventing reuse.

I don't know enough about VFIO, but I sort of expected that VFIO would
reset devices between reassignment regardless of how a VM terminates.
I guess that's not true?

> - WCN7850 WiFi card (17cb:1107): Same behavior as WCN6855.
> 
> - SDX62/SDX65 5G modems (17cb:0308): Never successfully initialize even
>   on first VM assignment without proper reset capability.
> 
> Add device-specific reset entries for these Qualcomm devices using D3hot
> power cycling. Testing shows that despite advertising NoSoftRst+, D3hot
> transition provides sufficient reset for VFIO reuse, particularly after
> unexpected VM termination. While not a complete reset (BARs preserved),
> it provides the only viable reset mechanism for these devices.

Since the device claims to preserve internal state across D3hot->D0
(and it sounds like at least BARs *are* preserved), is this a
potential leak of state between VMs?  To play devil's advocate, how do
we convince a customer that none of their data is ever leaked to a
subsequent tenant using this device?

> Testing was performed on desktop platforms with M.2 WiFi and modem cards
> using M.2-to-PCIe adapters, including extensive force-reset cycling to
> verify stability.
> 
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
> v9:
>   - Complete redesign based on maintainer feedback (Alex Williamson, Bjorn
>     Helgaas, Rafael Wysocki): dropped general d3cold infrastructure entirely
>     and now just a single patch: the proven D3hot reset for specific
>     Qualcomm devices (device-specific reset)
>   - Previous v8 patch 1/3 (general d3cold) dropped: concerns about ACPI
>     portability, bridge issues, runtime PM, and lack of _PR3 hardware for
>     testing.
>   - Previous v8 patch 3/3 (quirk_no_bus_reset) already merged for v7.2
> v8: https://lore.kernel.org/all/20260609163649.319755-1-jtornosm@redhat.com/
> 
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 431c021d7414..bac1edb6c2dc 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4240,6 +4240,41 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
>  	return 0;
>  }
>  
> +/*
> + * Device-specific reset method for certain Qualcomm devices via D3hot power
> + * cycle.
> + *
> + * These specific Qualcomm devices lack FLR capability, advertise NoSoftRst+
> + * (blocking PM reset), and have broken bus reset. Despite advertising
> + * NoSoftRst+, testing shows that D3hot transition provides sufficient reset
> + * for VFIO reuse, particularly after unexpected VM termination where the
> + * device would otherwise remain in an undefined state. While not a complete
> + * reset (BARs are preserved), it provides the only viable reset mechanism for
> + * these devices in the commented situations.
> + */
> +static int reset_qualcomm_d3hot(struct pci_dev *dev, bool probe)
> +{
> +	int ret;
> +
> +	if (probe)
> +		return 0;
> +
> +	if (dev->current_state != PCI_D0)
> +		return -EINVAL;
> +
> +	ret = pci_set_power_state(dev, PCI_D3hot);
> +	if (ret)
> +		return ret;
> +	msleep(200);
> +
> +	ret = pci_set_power_state(dev, PCI_D0);
> +	if (ret)
> +		return ret;
> +	msleep(200);
> +
> +	return 0;

If we think this is a viable method, it seems like we should use
pci_pm_reset(), which takes care of IOMMU and device readiness issues.

We would have to change pci_pm_reset() to deal with the fact that
PCI_PM_CTRL_NO_SOFT_RESET seems wrong on these devices.  Maybe we
could cache PCI_PM_CTRL_NO_SOFT_RESET in pci_pm_init(), then override
it with quirks for these devices?

> +}
> +
>  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
>  		 reset_intel_82599_sfp_virtfn },
> @@ -4255,6 +4290,9 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  		reset_chelsio_generic_dev },
>  	{ PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
>  		reset_hinic_vf_dev },
> +	{ PCI_VENDOR_ID_QCOM, 0x1103, reset_qualcomm_d3hot },  /* WCN6855 */
> +	{ PCI_VENDOR_ID_QCOM, 0x1107, reset_qualcomm_d3hot },  /* WCN7850 */
> +	{ PCI_VENDOR_ID_QCOM, 0x0308, reset_qualcomm_d3hot },  /* SDX62/SDX65 */
>  	{ 0 }
>  };
>  
> -- 
> 2.54.0
> 


* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-12 15:17 ` Bjorn Helgaas
@ 2026-06-15  7:30   ` Jose Ignacio Tornos Martinez
  0 siblings, 0 replies; 8+ messages in thread
From: Jose Ignacio Tornos Martinez @ 2026-06-15  7:30 UTC (permalink / raw)
  To: helgaas, alex
  Cc: ath11k, ath12k, bhelgaas, jjohnson, jtornosm, linux-kernel,
	linux-pci, linux-wireless, mani, mhi

Hi Bjorn and Alex,

Bjorn's questions:

> I guess "bus reset" here refers to Secondary Bus Reset being asserted
> by the bridge upstream from these devices?  Seems a bit surprising if
> that doesn't work.  Or is it just that we can't use SBR because there
> are multiple devices below that bridge?

Yes, SBR. The devices I tested are alone on their bus (single device under
bridge), so it's a device-specific issue, not a topology problem. The
quirk_no_bus_reset patch addresses this for v7.2.

> I don't know enough about VFIO, but I sort of expected that VFIO would
> reset devices between reassignment regardless of how a VM terminates.
> I guess that's not true?

VFIO does attempt a reset on every reassignment. Without a working reset method,
the attempt fails and the device remains in an undefined state. With this quirk,
D3hot successfully resets the device, allowing reassignment.

> Since the device claims to preserve internal state across D3hot->D0
> (and it sounds like at least BARs *are* preserved), is this a
> potential leak of state between VMs?  To play devil's advocate, how do
> we convince a customer that none of their data is ever leaked to a
> subsequent tenant using this device?

This is a valid concern. Testing shows device internals are reset despite
NoSoftRst+ (command register cleared, requires driver reinitialization),
though BARs are preserved. Given these devices have no other reset method,
this provides the only viable mechanism for VFIO reuse. We cannot improve
beyond what D3hot provides - the quirk works because despite advertising
NoSoftRst+, D3hot does clear sufficient internal state for clean
reinitialization.

> If we think this is a viable method, it seems like we should use
> pci_pm_reset(), which takes care of IOMMU and device readiness issues.
>
> We would have to change pci_pm_reset() to deal with the fact that
> PCI_PM_CTRL_NO_SOFT_RESET seems wrong on these devices.  Maybe we
> could cache PCI_PM_CTRL_NO_SOFT_RESET in pci_pm_init(), then override
> it with quirks for these devices?

I explored a similar idea in v2 (PCI_DEV_FLAGS_FORCE_PM_RESET to bypass
NoSoftRst+):
https://lore.kernel.org/linux-pci/20260508145153.717641-2-jtornosm@redhat.com/
(Note: v2 used driver names ath11k/ath12k instead of device-specific names
WCN6855/WCN7850, which Jeff Johnson later commented on in v7 feedback.)

Alex provided guidance on both approaches and indicated device-specific reset
seemed more appropriate here:

"Device specific resets are made for this scenario. Look at
pci_dev_specific_reset() and pci_dev_reset_methods[]. The supporting
evidence that this performs a worthwhile reset is still a bit weak, but
heuristically it seems better than nothing, which is what we're left
with otherwise. Reset via D3hot for a device that does not expose
NoSoftRst- is not something we should enable or endorse for any common
use case."

The device-specific approach keeps this quirk isolated to proven device IDs.
But I can revisit the pm quirk approach if you both prefer it.
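
For comparison, the pm quirk alternative could be modeled roughly as below.
This is only a userspace sketch in C, not kernel code: struct cached_dev, the
no_soft_reset field and the cached_* helpers are hypothetical names invented
for illustration. The 0x0008 bit value corresponds to
PCI_PM_CTRL_NO_SOFT_RESET, and the device IDs are the ones from this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMCSR_NO_SOFT_RESET 0x0008	/* PMCSR bit 3 */
#define VENDOR_QCOM         0x17cb

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct cached_dev {
	uint16_t vendor;
	uint16_t device;
	uint16_t pmcsr;		/* what the hardware advertises */
	bool no_soft_reset;	/* hypothetical cached copy of the bit */
};

/* Device IDs taken from the patch: WCN6855, WCN7850, SDX62/SDX65. */
static const uint16_t cached_qcom_ids[] = { 0x1103, 0x1107, 0x0308 };

/* Models reading the bit once at PM init time and caching it. */
static void cached_pm_init(struct cached_dev *dev)
{
	dev->no_soft_reset = (dev->pmcsr & PMCSR_NO_SOFT_RESET) != 0;
}

/* Models a quirk that overrides the cached value for the listed IDs. */
static void cached_quirk_override(struct cached_dev *dev)
{
	size_t i;

	if (dev->vendor != VENDOR_QCOM)
		return;

	for (i = 0; i < sizeof(cached_qcom_ids) / sizeof(cached_qcom_ids[0]); i++) {
		if (dev->device == cached_qcom_ids[i]) {
			dev->no_soft_reset = false;
			return;
		}
	}
}

/* The PM reset probe would then consult only the cached flag. */
static bool cached_pm_reset_allowed(const struct cached_dev *dev)
{
	return !dev->no_soft_reset;
}
```

In this model the override changes what the generic PM reset believes about
the device, whereas the device-specific entry leaves the advertised bit alone
and only adds a reset method for the listed IDs.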

Alex's suggestion:

> It would be better to extract the core of pci_pm_reset() to a
> pci_do_pm_reset() function that's used both here and by the
> pci_pm_reset() function.

Good point about the code duplication. In v9 I kept it as a self-contained
quirk to avoid modifying pci_pm_reset() and touching core pci.c code, trying
to minimize the change footprint. But I agree extracting a helper function
would be cleaner.
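
To make the idea concrete, here is a minimal userspace model of that split.
It is a sketch under assumptions, not the v10 code: struct model_dev and the
model_* functions are hypothetical stand-ins, the D3 transition delays and the
device readiness wait are omitted, and only the PMCSR bit values (state mask
0x0003, NoSoftRst 0x0008) mirror the real register layout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the PM Control/Status register (PMCSR). */
#define PM_CTRL_STATE_MASK    0x0003	/* D0 = 0 ... D3hot = 3 */
#define PM_CTRL_NO_SOFT_RESET 0x0008
#define STATE_D0    0
#define STATE_D3HOT 3

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct model_dev {
	uint16_t pmcsr;		/* models the PMCSR contents */
	int current_state;	/* models dev->current_state */
	int transitions;	/* counts PMCSR writes, for illustration */
};

static void model_write_state(struct model_dev *dev, int state)
{
	dev->pmcsr = (uint16_t)((dev->pmcsr & ~PM_CTRL_STATE_MASK) | state);
	dev->transitions++;
	/* The kernel would wait for the D3 transition delay here. */
}

/* Shared core: the D0 -> D3hot -> D0 cycle, with no NoSoftRst check. */
static int model_do_pm_reset(struct model_dev *dev)
{
	if (dev->current_state != STATE_D0)
		return -EINVAL;

	model_write_state(dev, STATE_D3HOT);
	model_write_state(dev, STATE_D0);
	return 0;
}

/* Generic PM reset: refuses devices that advertise NoSoftRst+. */
static int model_pm_reset(struct model_dev *dev, bool probe)
{
	if (dev->pmcsr & PM_CTRL_NO_SOFT_RESET)
		return -ENOTTY;
	if (probe)
		return 0;
	return model_do_pm_reset(dev);
}

/* Device-specific reset: same core, but skips the NoSoftRst check. */
static int model_qcom_reset(struct model_dev *dev, bool probe)
{
	if (probe)
		return 0;
	return model_do_pm_reset(dev);
}
```

With a shared core like this, the two callers differ only in whether they
honor NoSoftRst+, so the timing and readiness handling would live in one place.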

Once we confirm the preferred approach (device-specific vs pm quirk per
Bjorn's question above), I'll send v10 with the appropriate implementation
including the helper function if we proceed with the device-specific approach.

Thanks

Best regards
Jose Ignacio



* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-12 14:26 [PATCH v9] PCI: Add device-specific reset for Qualcomm devices Jose Ignacio Tornos Martinez
  2026-06-12 15:12 ` Alex Williamson
  2026-06-12 15:17 ` Bjorn Helgaas
@ 2026-06-17 14:47 ` Manivannan Sadhasivam
  2026-06-17 15:47   ` Jose Ignacio Tornos Martinez
  2 siblings, 1 reply; 8+ messages in thread
From: Manivannan Sadhasivam @ 2026-06-17 14:47 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez
  Cc: bhelgaas, alex, jjohnson, linux-pci, linux-wireless, ath11k,
	ath12k, mhi, linux-kernel

On Fri, Jun 12, 2026 at 04:26:38PM +0200, Jose Ignacio Tornos Martinez wrote:
> Some Qualcomm PCIe devices (WCN6855/WCN7850 WiFi cards, SDX62/SDX65 modems)
> lack working reset methods for VFIO passthrough scenarios. These devices
> have no FLR capability, advertise NoSoftRst+ (blocking PM reset), and have
> broken bus reset.
> 
> The problem manifests in VFIO passthrough scenarios:
> 
> - WCN6855 WiFi card (17cb:1103): Normal VM operation works fine, including
>   clean shutdown/reboot. However, when the VM terminates uncleanly
>   (crash, force-off), VFIO attempts to reset the device before it can
>   be assigned to another VM. Without a working reset method, the device
>   remains in an undefined state, preventing reuse.
> 
> - WCN7850 WiFi card (17cb:1107): Same behavior as WCN6855.
> 
> - SDX62/SDX65 5G modems (17cb:0308): Never successfully initialize even
>   on first VM assignment without proper reset capability.
> 
> Add device-specific reset entries for these Qualcomm devices using D3hot
> power cycling. Testing shows that despite advertising NoSoftRst+, D3hot
> transition provides sufficient reset for VFIO reuse, particularly after
> unexpected VM termination. While not a complete reset (BARs preserved),
> it provides the only viable reset mechanism for these devices.
> 

I checked internally within Qcom and I was told that these PCIe devices retain
the context during D3Hot to D0 transition and that's why they advertise
No_Soft_Reset.

The partial reset behavior you are seeing might be due to firmware handling the
transition as an error state. All these devices use MHI bus and when the MHI
state doesn't match the PCIe device state, then the firmware will treat it as an
error and will try to tear down resources. This could be the reason why you are
seeing partial reset.

Nevertheless, these devices do not support any form of Soft Reset, and the only
way to reset them is through D3Cold, which in turn depends on platform support.
So it would be inaccurate/wrong to assume that these devices support Soft Reset
during the D3Hot to D0 transition.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-17 14:47 ` Manivannan Sadhasivam
@ 2026-06-17 15:47   ` Jose Ignacio Tornos Martinez
  2026-06-17 16:55     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 8+ messages in thread
From: Jose Ignacio Tornos Martinez @ 2026-06-17 15:47 UTC (permalink / raw)
  To: mani
  Cc: alex, ath11k, ath12k, bhelgaas, jjohnson, jtornosm, linux-kernel,
	linux-pci, linux-wireless, mhi

Hi Mani,

Thank you for the internal clarification and sharing this information.

I understand the behavior is firmware error recovery, not a proper reset.
However, these devices are widely used, and the inability to use them in VMs
is a significant problem. Could we explore options to achieve safe VFIO
operation?

  1. Are there ANY alternative reset mechanisms besides D3cold? For example:
     - Device-specific registers or commands?
     - MHI bus-level operations?
     - Firmware commands that could trigger proper reset?

     If such mechanisms exist, I'm willing to implement whatever is needed.

  2. If firmware error recovery is the only option available on platforms
     without _PR3, could we add software steps to make it VFIO-safe?
     For example, before/after the D3hot transition:
     - Explicit MHI state teardown?
     - Firmware commands to clear sensitive device state?
     - Additional verification or cleanup steps?

  3. The practical challenge is that _PR3 support is not available on most
     platforms where these devices need to be deployed (desktops, servers).
     Additionally, the general d3cold reset method has limitations and
     remains unimplemented due to the concerns raised earlier (ACPI
     portability, bridge issues, runtime PM complications).

     If D3cold is the only proper reset but requires _PR3, and no alternative
     mechanisms exist, could we consider accepting the firmware error recovery
     behavior as a last resort - clearly documented as a platform-specific
     workaround?

     Currently these devices have no reset capability on most platforms,
     making them completely unusable for VFIO. Even an imperfect reset is
     significantly better than no reset at all.

My goal is ensuring these devices can be safely reassigned between VMs.
I'm open to implementing any of the above approaches - or others you might
suggest.

Thank you

Best regards
José Ignacio



* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-17 15:47   ` Jose Ignacio Tornos Martinez
@ 2026-06-17 16:55     ` Manivannan Sadhasivam
  2026-06-18  6:33       ` Jose Ignacio Tornos Martinez
  0 siblings, 1 reply; 8+ messages in thread
From: Manivannan Sadhasivam @ 2026-06-17 16:55 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez
  Cc: alex, ath11k, ath12k, bhelgaas, jjohnson, linux-kernel, linux-pci,
	linux-wireless, mhi

On Wed, Jun 17, 2026 at 05:47:04PM +0200, Jose Ignacio Tornos Martinez wrote:
> Hi Mani,
> 
> Thank you for the internal clarification and sharing this information.
> 
> I understand the behavior is firmware error recovery, not a proper reset.
> However, these devices are widely used, and the inability to use them in VMs
> is a significant problem. Could we explore options to achieve safe VFIO
> operation?
> 
>   1. Are there ANY alternative reset mechanisms besides D3cold? For example:
>      - Device-specific registers or commands?
>      - MHI bus-level operations?
>      - Firmware commands that could trigger proper reset?
> 
>      If such mechanisms exist, I'm willing to implement whatever is needed.
> 
>   2. If firmware error recovery is the only option available on platforms
>      without _PR3, could we add software steps to make it VFIO-safe?
>      For example, before/after the D3hot transition:
>      - Explicit MHI state teardown?
>      - Firmware commands to clear sensitive device state?
>      - Additional verification or cleanup steps?
> 
>   3. The practical challenge is that _PR3 support is not available on most
>      platforms where these devices need to be deployed (desktops, servers).
>      Additionally, the general d3cold reset method has limitations and
>      remains unimplemented due to the concerns raised earlier (ACPI
>      portability, bridge issues, runtime PM complications).
> 
>      If D3cold is the only proper reset but requires _PR3, and no alternative
>      mechanisms exist, could we consider accepting the firmware error recovery
>      behavior as a last resort - clearly documented as a platform-specific
>      workaround?
> 
>      Currently these devices have no reset capability on most platforms,
>      making them completely unusable for VFIO. Even an imperfect reset is
>      significantly better than no reset at all.
> 
> My goal is ensuring these devices can be safely reassigned between VMs.
> I'm open to implementing any of the above approaches - or others you might
> suggest.
> 

Can you share the exact steps that you tried for passthrough? I'm curious to see
whether you unbound the MHI host/WLAN driver from the device or not. For the
modem devices, the MHI Host driver's (drivers/bus/mhi/host/pci_generic.c) remove
callback should've quiesced the device and moved the MHI state to RESET if the
driver was unbound before binding the device to vfio-pci.

I certainly feel that the MHI/WLAN driver should be able to reset the device
during unbind. But I'm not sure if that reset will affect only the firmware
state or the device's config state also. This is something I need to
investigate.

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH v9] PCI: Add device-specific reset for Qualcomm devices
  2026-06-17 16:55     ` Manivannan Sadhasivam
@ 2026-06-18  6:33       ` Jose Ignacio Tornos Martinez
  0 siblings, 0 replies; 8+ messages in thread
From: Jose Ignacio Tornos Martinez @ 2026-06-18  6:33 UTC (permalink / raw)
  To: mani
  Cc: alex, ath11k, ath12k, bhelgaas, jjohnson, jtornosm, linux-kernel,
	linux-pci, linux-wireless, mhi

Hi Mani,

Let me clarify the exact scenario and where the reset is necessary:

* For the WiFi devices mentioned above (WCN6855/WCN7850):

Standard VFIO passthrough flow (this works fine):
  1. Unbind native driver (ath11k/ath12k/MHI)
  2. Bind vfio-pci driver
  3. Assign device to VM
  4. VM boots, loads its own driver → device works perfectly
  5. VM shuts down cleanly → device can be reassigned → works fine

The problem occurs with unclean VM termination:
  1. VM crashes or is force-terminated
  2. VFIO tries to reset the device before reassignment
  3. Without a working PCI reset method, reset fails
  4. Device stuck in undefined state → cannot be reassigned to another VM
  
     Unbinding the driver again doesn't help because the device hardware
     itself is in a bad state. From the hypervisor:
     $ lspci -vvv -s 0000:03:00.0
        03:00.0 Network controller: Qualcomm Technologies, Inc (rev ff) (prog-if ff)
            !!! Unknown header type 7f
     And a full host power-cycle is necessary to recover.
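
The "rev ff", "prog-if ff" and "header type 7f" values above are consistent
with config space reads returning all ones (0xff & 0x7f = 0x7f), which is
typically what is seen when a device no longer responds. As a minimal
userspace sketch in C, that condition could be checked on a raw config space
dump as follows; the cfg_* function names are hypothetical and the offsets are
the standard PCI header offsets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration header offsets. */
#define CFG_REVISION     0x08
#define CFG_PROG_IF      0x09
#define CFG_HEADER_TYPE  0x0e
#define HEADER_TYPE_MASK 0x7f	/* bit 7 is the multi-function flag */

/*
 * True when the fields shown by lspci above (revision, prog-if and
 * header type) all read back as 0xff in the given config space dump.
 */
static bool cfg_looks_unresponsive(const uint8_t *cfg, size_t len)
{
	if (len <= CFG_HEADER_TYPE)
		return false;

	return cfg[CFG_REVISION] == 0xff &&
	       cfg[CFG_PROG_IF] == 0xff &&
	       cfg[CFG_HEADER_TYPE] == 0xff;
}

/* Header layout field as lspci reports it (multi-function bit masked). */
static uint8_t cfg_header_layout(const uint8_t *cfg)
{
	return cfg[CFG_HEADER_TYPE] & HEADER_TYPE_MASK;
}
```

This only recognizes the symptom; it says nothing about why the device
stopped responding.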
     
* For the modem devices mentioned above (SDX62/SDX65):

This case is worse: without a proper reset capability, the device fails during
the first VM boot. Standard VFIO passthrough flow:
  1. Unbind native driver (MHI)
  2. Bind vfio-pci driver
  3. Assign device to VM
  4. VM boots, loads its own driver and crashes:
     [   24.024165] mhi mhi0: Device failed to enter MHI Ready
     [   24.024168] mhi mhi0: MHI did not enter READY state
     
     Unbind/rebind attempts fail:
     [  352.643601] mhi mhi0: Requested to power ON
     [  352.643611] mhi mhi0: Power on setup success
     [  373.442954] mhi mhi0: Device failed to clear MHI Reset
     [  373.442970] mhi mhi0: MHI did not enter READY state
     The device then requires a full host power cycle to recover,
     even outside of VFIO scenarios.

* The MHI Host driver's remove callback may handle clean software state
teardown, but it doesn't provide a PCI reset capability that VFIO can
invoke. VFIO needs a reset method registered in the PCI reset hierarchy
(device_specific, pm, flr, bus, etc.). VFIO invokes this reset both during
initial device binding (before the VM starts) and when reassigning the
device between VMs - without a working reset method, the device cannot
reach a clean state for initialization.
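
As a rough illustration of why a registered method matters, the selection can
be modeled in userspace C as below. This is a sketch under assumptions, not
the PCI core: struct probe_dev, the probe_* helpers and the ordering of the
table are hypothetical and only follow the list given above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a device offers; not struct pci_dev. */
struct probe_dev {
	bool has_dev_specific;	/* listed in a device-specific reset table */
	bool has_flr;		/* Function Level Reset capability */
	bool no_soft_reset;	/* advertises NoSoftRst+ */
	bool no_bus_reset;	/* bus reset disabled by a quirk */
};

static int probe_dev_specific(const struct probe_dev *dev)
{
	return dev->has_dev_specific ? 0 : -ENOTTY;
}

static int probe_flr(const struct probe_dev *dev)
{
	return dev->has_flr ? 0 : -ENOTTY;
}

static int probe_pm(const struct probe_dev *dev)
{
	return dev->no_soft_reset ? -ENOTTY : 0;
}

static int probe_bus(const struct probe_dev *dev)
{
	return dev->no_bus_reset ? -ENOTTY : 0;
}

struct probe_method {
	const char *name;
	int (*probe)(const struct probe_dev *dev);
};

/* Order is an assumption made for this model. */
static const struct probe_method probe_methods[] = {
	{ "device_specific", probe_dev_specific },
	{ "flr", probe_flr },
	{ "pm", probe_pm },
	{ "bus", probe_bus },
};

/* Index of the first method whose probe succeeds, or -1 if none does. */
static int first_usable_reset(const struct probe_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(probe_methods) / sizeof(probe_methods[0]); i++) {
		if (probe_methods[i].probe(dev) == 0)
			return (int)i;
	}
	return -1;
}
```

For a device with no FLR, NoSoftRst+ and bus reset disabled, every probe in
this model fails until a device-specific entry exists, which matches the
situation described for these devices.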



I hope this clarifies the scenario better. Please let me know if I can
provide more information or run any specific tests to help investigate this
further.

Thanks

Best regards
José Ignacio

