All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v18 0/3] [PCI] Error recovery for vfio-pci devices on s390x
@ 2026-06-03 18:16 Farhan Ali
  2026-06-03 18:16 ` [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Farhan Ali @ 2026-06-03 18:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, linux-pci
  Cc: helgaas, alex, alifm, schnelle, mjrosato

Hi Bjorn,

This patch set includes only the PCI patches of the original series for
error recovery for vfio-pci devices on s390x [1]. Breaking up the patch
series into PCI and VFIO only patches to make merging easier based on
discussion with Alex [2].

Thanks
Farhan

[1] https://lore.kernel.org/all/20260520171113.1111-1-alifm@linux.ibm.com/
[2] https://lore.kernel.org/all/20260602163344.1eda12d2@shazbot.org/

ChangeLog
---------
v17 -> v18
  - Rebase on 7.1-rc6.

Farhan Ali (3):
  PCI: Allow per function PCI slots to fix slot reset on s390
  PCI: Avoid saving config space state if inaccessible
  PCI: Fail FLR when config space is inaccessible

 drivers/pci/hotplug/rpaphp_slot.c |  2 +-
 drivers/pci/pci.c                 | 32 ++++++++++++++++++++++++++++--
 drivers/pci/slot.c                | 33 +++++++++++++++++++++++--------
 include/linux/pci.h               |  8 ++++++--
 4 files changed, 62 insertions(+), 13 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390
  2026-06-03 18:16 [PATCH v18 0/3] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
@ 2026-06-03 18:16 ` Farhan Ali
  2026-06-03 18:33   ` sashiko-bot
  2026-06-03 18:16 ` [PATCH v18 2/3] PCI: Avoid saving config space state if inaccessible Farhan Ali
  2026-06-03 18:16 ` [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible Farhan Ali
  2 siblings, 1 reply; 15+ messages in thread
From: Farhan Ali @ 2026-06-03 18:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, linux-pci
  Cc: helgaas, alex, alifm, schnelle, mjrosato, stable

On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.

Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. This creates a problem when resetting
a function through the hotplug driver's slot_reset() interface.

When attempting to reset a function through the hotplug driver, the shared
slot assignment causes the wrong function to be reset instead of the
intended one. It also leaks memory as we do create a pci_slot object for
the function, but don't correctly free it in pci_slot_release().

Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology. Since we can use all 8 bits
for slot 'number' (for ARI devices), change slot 'number' u16 to
account for special values -1 and PCI_SLOT_ALL_DEVICES.

Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Cc: stable@vger.kernel.org
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/hotplug/rpaphp_slot.c |  2 +-
 drivers/pci/pci.c                 |  5 +++--
 drivers/pci/slot.c                | 33 +++++++++++++++++++++++--------
 include/linux/pci.h               |  8 ++++++--
 4 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/hotplug/rpaphp_slot.c b/drivers/pci/hotplug/rpaphp_slot.c
index 67362e5b9971..92eabf5f61b9 100644
--- a/drivers/pci/hotplug/rpaphp_slot.c
+++ b/drivers/pci/hotplug/rpaphp_slot.c
@@ -84,7 +84,7 @@ int rpaphp_register_slot(struct slot *slot)
 	struct hotplug_slot *php_slot = &slot->hotplug_slot;
 	u32 my_index;
 	int retval;
-	int slotno = -1;
+	int slotno = PCI_SLOT_PLACEHOLDER;
 
 	dbg("%s registering slot:path[%pOF] index[%x], name[%s] pdomain[%x] type[%d]\n",
 		__func__, slot->dn, slot->index, slot->name,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d34266651ad0..f5f8291482b0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4865,8 +4865,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
 
 static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
 {
-	if (dev->multifunction || dev->subordinate || !dev->slot ||
-	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+	if (dev->subordinate || !dev->slot ||
+	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+	    (dev->multifunction && !dev->slot->per_func_slot))
 		return -ENOTTY;
 
 	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 6d5cd37bfb1e..894d6213ed30 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -37,7 +37,7 @@ static const struct sysfs_ops pci_slot_sysfs_ops = {
 
 static ssize_t address_read_file(struct pci_slot *slot, char *buf)
 {
-	if (slot->number == 0xff)
+	if (slot->number == (u16)PCI_SLOT_PLACEHOLDER)
 		return sysfs_emit(buf, "%04x:%02x\n",
 				  pci_domain_nr(slot->bus),
 				  slot->bus->number);
@@ -72,6 +72,23 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
 	return bus_speed_read(slot->bus->cur_bus_speed, buf);
 }
 
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+	if (slot->per_func_slot)
+		return dev->devfn == slot->number;
+
+	return slot->number == PCI_SLOT_ALL_DEVICES ||
+		PCI_SLOT(dev->devfn) == slot->number;
+}
+
+static bool pci_slot_enabled_per_func(void)
+{
+	if (IS_ENABLED(CONFIG_S390))
+		return true;
+
+	return false;
+}
+
 static void pci_slot_release(struct kobject *kobj)
 {
 	struct pci_dev *dev;
@@ -82,8 +99,7 @@ static void pci_slot_release(struct kobject *kobj)
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &slot->bus->devices, bus_list)
-		if (slot->number == PCI_SLOT_ALL_DEVICES ||
-		    PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = NULL;
 	up_read(&pci_bus_sem);
 
@@ -187,8 +203,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
 
 	mutex_lock(&pci_slot_mutex);
 	list_for_each_entry(slot, &dev->bus->slots, list)
-		if (slot->number == PCI_SLOT_ALL_DEVICES ||
-		    PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	mutex_unlock(&pci_slot_mutex);
 }
@@ -267,7 +282,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 
 	mutex_lock(&pci_slot_mutex);
 
-	if (slot_nr == -1)
+	if (slot_nr == PCI_SLOT_PLACEHOLDER)
 		goto placeholder;
 
 	/*
@@ -298,6 +313,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 	slot->bus = pci_bus_get(parent);
 	slot->number = slot_nr;
 
+	if (pci_slot_enabled_per_func())
+		slot->per_func_slot = 1;
+
 	slot->kobj.kset = pci_slots_kset;
 
 	slot_name = make_slot_name(name);
@@ -318,8 +336,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &parent->devices, bus_list)
-		if (slot_nr == PCI_SLOT_ALL_DEVICES ||
-		    PCI_SLOT(dev->devfn) == slot_nr)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	up_read(&pci_bus_sem);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..d58982aa8730 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,14 +78,18 @@
  * and, if ARI Forwarding is enabled, functions may appear to be on multiple
  * devices.
  */
-#define PCI_SLOT_ALL_DEVICES	0xfe
+#define PCI_SLOT_ALL_DEVICES	0xfeff
+
+/* Used to identify a slot as a placeholder */
+#define PCI_SLOT_PLACEHOLDER	-1
 
 /* pci_slot represents a physical slot */
 struct pci_slot {
 	struct pci_bus		*bus;		/* Bus this slot is on */
 	struct list_head	list;		/* Node in list of slots */
 	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
-	unsigned char		number;		/* Device nr, or PCI_SLOT_ALL_DEVICES */
+	u16			number;		/* Device nr, or PCI_SLOT_ALL_DEVICES */
+	unsigned int		per_func_slot:1; /* Allow per function slot */
 	struct kobject		kobj;
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v18 2/3] PCI: Avoid saving config space state if inaccessible
  2026-06-03 18:16 [PATCH v18 0/3] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
  2026-06-03 18:16 ` [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
@ 2026-06-03 18:16 ` Farhan Ali
  2026-06-03 18:16 ` [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible Farhan Ali
  2 siblings, 0 replies; 15+ messages in thread
From: Farhan Ali @ 2026-06-03 18:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, linux-pci
  Cc: helgaas, alex, alifm, schnelle, mjrosato, Bjorn Helgaas

The current reset process saves the device's config space state before
reset and restores it afterward. However errors may occur unexpectedly and
it may then be impossible to save config space because the device may be
inaccessible (e.g. DPC). This results in saving invalid values that get
written back to the device during state restoration.

With a reset we want to recover/restore the device into a functional state.
So avoid saving the state of the config space when the device config space
is inaccessible.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f5f8291482b0..973d23e41c48 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -722,6 +722,27 @@ u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
 }
 EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
 
+static bool pci_dev_config_accessible(struct pci_dev *dev, char *msg)
+{
+	u32 val;
+
+	/*
+	 * If device's config space is inaccessible it can return ~0 for
+	 * any reads. Since VFs can also return ~0 for Device and Vendor ID
+	 * check Command and Status registers. Note that this is racy
+	 * because the device may become inaccessible partway through
+	 * next access.
+	 */
+	pci_read_config_dword(dev, PCI_COMMAND, &val);
+	if (PCI_POSSIBLE_ERROR(val)) {
+		pci_warn(dev, "Device config space inaccessible; unable to %s\n",
+				msg);
+		return false;
+	}
+
+	return true;
+}
+
 /**
  * pci_find_parent_resource - return resource region of parent bus of given
  *			      region
@@ -5027,6 +5048,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
 	 */
 	pci_set_power_state(dev, PCI_D0);
 
+	if (!pci_dev_config_accessible(dev, "save state"))
+		return;
+
 	pci_save_state(dev);
 	/*
 	 * Disable the device by clearing the Command register, except for
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-03 18:16 [PATCH v18 0/3] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
  2026-06-03 18:16 ` [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
  2026-06-03 18:16 ` [PATCH v18 2/3] PCI: Avoid saving config space state if inaccessible Farhan Ali
@ 2026-06-03 18:16 ` Farhan Ali
  2026-06-03 18:57   ` sashiko-bot
  2 siblings, 1 reply; 15+ messages in thread
From: Farhan Ali @ 2026-06-03 18:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, linux-pci
  Cc: helgaas, alex, alifm, schnelle, mjrosato, Benjamin Block

If a device is in an error state, then it's config space may not be
accssible. Add additional check to validate if a device's config space is
accessible before doing an FLR reset.

Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 973d23e41c48..061dfcd1472e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
 	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
 		return -ENOTTY;
 
+	if (!pci_dev_config_accessible(dev, "FLR"))
+		return -ENOTTY;
+
 	if (probe)
 		return 0;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390
  2026-06-03 18:16 ` [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
@ 2026-06-03 18:33   ` sashiko-bot
  0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-06-03 18:33 UTC (permalink / raw)
  To: Farhan Ali; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but a Use-After-Free (UAF) vulnerability exists due to lockless access to `dev->slot` in `pci_dev_reset_slot_function()`. The function reads `dev->slot` to check `!dev->slot`, and subsequently dereferences it (`dev->slot->per_func_slot` and `dev->slot->hotplug`). However, it does not hold `pci_slot_mutex` or `pci_bus_sem`. Concurrently, if a hotplug service driver (like `pciehp`) is unbound, it triggers `pci_destroy_slot()` which drops the refcount and calls `pci_slot_release()`. This sets `dev->slot = NULL` and frees the slot memory. A race condition allows the reset function to read the slot pointer before it is cleared and then dereference freed memory.
--

commit ce8f71bdb64d4b651c9d934da7ecbb2d2811bf71
Author: Farhan Ali <alifm@linux.ibm.com>

PCI: Allow per function PCI slots to fix slot reset on s390

This commit introduces a flag in struct pci_slot to allow per-function
PCI slots for hypervisor-managed functions. It also modifies the slot
reset function to account for this new flag and handle multi-function
devices appropriately on s390 systems.

[ ... ]

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8f7cfcc000901..d0c9f0166af5c 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4865,8 +4865,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>  
>  static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
>  {
> -	if (dev->multifunction || dev->subordinate || !dev->slot ||
> -	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
> +	if (dev->subordinate || !dev->slot ||
> +	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
> +	    (dev->multifunction && !dev->slot->per_func_slot))
>  		return -ENOTTY;
>  
>  	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);

[Severity: High]
This isn't a bug introduced by this patch, but is it possible for a concurrent
hotplug driver unbind to cause a use-after-free here?

Looking at pci_dev_reset_slot_function(), dev->slot is checked and then
dereferenced multiple times locklessly:

    if (dev->subordinate || !dev->slot ||
        dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
        (dev->multifunction && !dev->slot->per_func_slot))
            return -ENOTTY;

    return pci_reset_hotplug_slot(dev->slot->hotplug, probe);

If a hotplug service driver like pciehp is unbound concurrently, it triggers
pci_destroy_slot() which drops the refcount and eventually calls
pci_slot_release(). This clears dev->slot and frees the memory:

pci_slot_release() {
    ...
    down_read(&pci_bus_sem);
    list_for_each_entry(dev, &slot->bus->devices, bus_list)
        if (pci_dev_matches_slot(dev, slot))
            dev->slot = NULL;
    up_read(&pci_bus_sem);
    ...
    kfree(slot);
}

Could the reset function read dev->slot, get interrupted by the slot release,
and then dereference the freed memory?

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260603181647.2215-1-alifm@linux.ibm.com?part=1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-03 18:16 ` [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible Farhan Ali
@ 2026-06-03 18:57   ` sashiko-bot
  2026-06-09 22:33     ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-03 18:57 UTC (permalink / raw)
  To: Farhan Ali; +Cc: linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] The config space accessibility check is placed in pcie_reset_flr() instead of pcie_flr(), leaving direct callers of pcie_flr() exposed to 60-second hangs.
- [High] The PCI reset quirk delay_250ms_after_flr ignores the return value of pcie_reset_flr(), which masks reset failures and completely breaks the PCI reset fallback mechanism.
--

commit 4e372ed957b10d82d85382f347de3e515a680bbf
Author: Farhan Ali <alifm@linux.ibm.com>

PCI: Fail FLR when config space is inaccessible

This commit adds an accessibility check for the PCI configuration space before
attempting an FLR (Function Level Reset). This is intended to avoid potential
hangs on devices that are in an error state.

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e71b814153921..c50418f5e3181 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
>  	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>  		return -ENOTTY;
>  
> +	if (!pci_dev_config_accessible(dev, "FLR"))
> +		return -ENOTTY;
> +

[Severity: High]
This isn't a bug introduced by this patch, but does placing this check in
pcie_reset_flr() leave direct callers of pcie_flr() exposed to 60-second
hangs?

Direct callers of pcie_flr() (like network and storage drivers during error
recovery) bypass this wrapper. They will still experience a stall in
pci_dev_wait() when attempting to reset a device with an inaccessible config
space. Should this accessibility check be moved into pcie_flr() instead to
protect all callers?

[Severity: High]
This is a pre-existing issue, but does adding this new dynamic failure path
break the fallback reset mechanism in delay_250ms_after_flr()?

Looking at delay_250ms_after_flr() in drivers/pci/quirks.c, it executes
pcie_reset_flr() but discards its return value:

drivers/pci/quirks.c:delay_250ms_after_flr() {
    ...
    pcie_reset_flr(dev, PCI_RESET_DO_RESET);

    msleep(250);

    return 0;
}

With this patch returning -ENOTTY when the config space is inaccessible,
the quirk will falsely report success. This causes
__pci_reset_function_locked() to abort early and completely prevents it from
attempting fallback reset methods, leaving the device in an unrecoverable
state.

>  	if (probe)
>  		return 0;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260603181647.2215-1-alifm@linux.ibm.com?part=3

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-03 18:57   ` sashiko-bot
@ 2026-06-09 22:33     ` Bjorn Helgaas
  2026-06-10 16:51       ` Farhan Ali
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-06-09 22:33 UTC (permalink / raw)
  To: Farhan Ali, sashiko-reviews; +Cc: linux-pci

On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> 
> Pre-existing issues:
> - [High] The config space accessibility check is placed in pcie_reset_flr() instead of pcie_flr(), leaving direct callers of pcie_flr() exposed to 60-second hangs.
> ...

I don't know why sashiko calls this a "pre-existing" issue.  This
patch *adds* the config space accessibility check, so it looks like an
issue with this patch to me.

> > +++ b/drivers/pci/pci.c
> > @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
> >  	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> >  		return -ENOTTY;
> >  
> > +	if (!pci_dev_config_accessible(dev, "FLR"))
> > +		return -ENOTTY;
> > +
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but does placing this check in
> pcie_reset_flr() leave direct callers of pcie_flr() exposed to 60-second
> hangs?
> 
> Direct callers of pcie_flr() (like network and storage drivers during error
> recovery) bypass this wrapper. They will still experience a stall in
> pci_dev_wait() when attempting to reset a device with an inaccessible config
> space. Should this accessibility check be moved into pcie_flr() instead to
> protect all callers?

I can't remember why we have both pcie_reset_flr() and pcie_flr(),
other than the fact that callers don't need to supply a "probe"
argument to pcie_flr().

Is there a reason to call pci_dev_config_accessible() here rather than
in pcie_flr()?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-09 22:33     ` Bjorn Helgaas
@ 2026-06-10 16:51       ` Farhan Ali
  2026-06-10 20:02         ` Bjorn Helgaas
  2026-06-10 23:44         ` Bjorn Helgaas
  0 siblings, 2 replies; 15+ messages in thread
From: Farhan Ali @ 2026-06-10 16:51 UTC (permalink / raw)
  To: Bjorn Helgaas, sashiko-reviews; +Cc: linux-pci


On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
> On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
>> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>>
>> Pre-existing issues:
>> - [High] The config space accessibility check is placed in pcie_reset_flr() instead of pcie_flr(), leaving direct callers of pcie_flr() exposed to 60-second hangs.
>> ...
> I don't know why sashiko calls this a "pre-existing" issue.  This
> patch *adds* the config space accessibility check, so it looks like an
> issue with this patch to me.

The way I interpreted it was, this issue in pcie_flr() of 60 second hang 
would hit with or without this patch if config space in inaccessible.


>>> +++ b/drivers/pci/pci.c
>>> @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>>   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>>   		return -ENOTTY;
>>>   
>>> +	if (!pci_dev_config_accessible(dev, "FLR"))
>>> +		return -ENOTTY;
>>> +
>> [Severity: High]
>> This isn't a bug introduced by this patch, but does placing this check in
>> pcie_reset_flr() leave direct callers of pcie_flr() exposed to 60-second
>> hangs?
>>
>> Direct callers of pcie_flr() (like network and storage drivers during error
>> recovery) bypass this wrapper. They will still experience a stall in
>> pci_dev_wait() when attempting to reset a device with an inaccessible config
>> space. Should this accessibility check be moved into pcie_flr() instead to
>> protect all callers?
> I can't remember why we have both pcie_reset_flr() and pcie_flr(),
> other than the fact that callers don't need to supply a "probe"
> argument to pcie_flr().
>
> Is there a reason to call pci_dev_config_accessible() here rather than
> in pcie_flr()?

The reason we wanted the check in pcie_reset_flr() was so to be able to 
escalate to bus reset method if we can't do an FLR. I think the check 
could be moved to pcie_flr(). Is that more preferable?

Thanks

Farhan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-10 16:51       ` Farhan Ali
@ 2026-06-10 20:02         ` Bjorn Helgaas
  2026-06-10 21:56           ` Farhan Ali
  2026-06-10 23:44         ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-06-10 20:02 UTC (permalink / raw)
  To: Farhan Ali; +Cc: sashiko-reviews, linux-pci, Alex Williamson

[+cc Alex]

On Wed, Jun 10, 2026 at 09:51:55AM -0700, Farhan Ali wrote:
> On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
> > On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
> > > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> > > 
> > > Pre-existing issues:
> ...

> > > > +++ b/drivers/pci/pci.c
> > > > @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
> > > >   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> > > >   		return -ENOTTY;
> > > > +	if (!pci_dev_config_accessible(dev, "FLR"))
> > > > +		return -ENOTTY;
> > > > +
> > > [Severity: High]
> > > This isn't a bug introduced by this patch, but does placing this
> > > check in pcie_reset_flr() leave direct callers of pcie_flr()
> > > exposed to 60-second hangs?
> > > 
> > > Direct callers of pcie_flr() (like network and storage drivers
> > > during error recovery) bypass this wrapper. They will still
> > > experience a stall in pci_dev_wait() when attempting to reset a
> > > device with an inaccessible config space. Should this
> > > accessibility check be moved into pcie_flr() instead to protect
> > > all callers?
> >
> > I can't remember why we have both pcie_reset_flr() and pcie_flr(),
> > other than the fact that callers don't need to supply a "probe"
> > argument to pcie_flr().
> > 
> > Is there a reason to call pci_dev_config_accessible() here rather
> > than in pcie_flr()?
> 
> The reason we wanted the check in pcie_reset_flr() was so to be able
> to escalate to bus reset method if we can't do an FLR. I think the
> check could be moved to pcie_flr(). Is that more preferable?

What if we added a preliminary patch that folds the body of pcie_flr()
into pcie_reset_flr() and adds:

  #define pcie_flr(dev)	pcie_reset_flr(dev, PCI_RESET_DO_RESET)

That would mean pcie_flr() users would start paying attention to
PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR, which seems like a
good thing.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-10 20:02         ` Bjorn Helgaas
@ 2026-06-10 21:56           ` Farhan Ali
  2026-06-11 19:22             ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Farhan Ali @ 2026-06-10 21:56 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: sashiko-reviews, linux-pci, Alex Williamson


On 6/10/2026 1:02 PM, Bjorn Helgaas wrote:
> [+cc Alex]
>
> On Wed, Jun 10, 2026 at 09:51:55AM -0700, Farhan Ali wrote:
>> On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
>>> On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
>>>> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>>>>
>>>> Pre-existing issues:
>> ...
>>>>> +++ b/drivers/pci/pci.c
>>>>> @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>>>>    	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>>>>    		return -ENOTTY;
>>>>> +	if (!pci_dev_config_accessible(dev, "FLR"))
>>>>> +		return -ENOTTY;
>>>>> +
>>>> [Severity: High]
>>>> This isn't a bug introduced by this patch, but does placing this
>>>> check in pcie_reset_flr() leave direct callers of pcie_flr()
>>>> exposed to 60-second hangs?
>>>>
>>>> Direct callers of pcie_flr() (like network and storage drivers
>>>> during error recovery) bypass this wrapper. They will still
>>>> experience a stall in pci_dev_wait() when attempting to reset a
>>>> device with an inaccessible config space. Should this
>>>> accessibility check be moved into pcie_flr() instead to protect
>>>> all callers?
>>> I can't remember why we have both pcie_reset_flr() and pcie_flr(),
>>> other than the fact that callers don't need to supply a "probe"
>>> argument to pcie_flr().
>>>
>>> Is there a reason to call pci_dev_config_accessible() here rather
>>> than in pcie_flr()?
>> The reason we wanted the check in pcie_reset_flr() was so to be able
>> to escalate to bus reset method if we can't do an FLR. I think the
>> check could be moved to pcie_flr(). Is that more preferable?
> What if we added a preliminary patch that folds the body of pcie_flr()
> into pcie_reset_flr() and adds:
>
>    #define pcie_flr(dev)	pcie_reset_flr(dev, PCI_RESET_DO_RESET)
>
> That would mean pcie_flr() users would start paying attention to
> PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR, which seems like a
> good thing.

Looking at the history, it seems commit 56f107d7813f ("PCI: Add 
pcie_reset_flr() with 'probe' argument") added pcie_reset_flr() to add 
the probe and remove pcie_has_flr(). It also notes that callers of 
pcie_flr() that didn't call pcie_has_flr() remain. I guess it didn't 
want to change the drivers that called pcie_flr() unconditionally?

There is also reset_intel_82599_sfp_virtfn() that mentions the need to 
FLR without checking for FLR support due to some quirk. So I wonder if 
there were genuine reasons to avoid doing the check for 
PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR?

Thanks

Farhan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-10 16:51       ` Farhan Ali
  2026-06-10 20:02         ` Bjorn Helgaas
@ 2026-06-10 23:44         ` Bjorn Helgaas
  2026-06-11 18:22           ` Farhan Ali
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-06-10 23:44 UTC (permalink / raw)
  To: Farhan Ali; +Cc: sashiko-reviews, linux-pci

On Wed, Jun 10, 2026 at 09:51:55AM -0700, Farhan Ali wrote:
> On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
> > On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
> > > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> > > 
> > > Pre-existing issues:
> ...

> > > > +++ b/drivers/pci/pci.c
> > > > @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
> > > >   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> > > >   		return -ENOTTY;
> > > > +	if (!pci_dev_config_accessible(dev, "FLR"))
> > > > +		return -ENOTTY;
> > > > +
> > > [Severity: High]
> > > This isn't a bug introduced by this patch, but does placing this
> > > check in pcie_reset_flr() leave direct callers of pcie_flr()
> > > exposed to 60-second hangs?
> > > 
> > > Direct callers of pcie_flr() (like network and storage drivers
> > > during error recovery) bypass this wrapper. They will still
> > > experience a stall in pci_dev_wait() when attempting to reset a
> > > device with an inaccessible config space. Should this
> > > accessibility check be moved into pcie_flr() instead to protect
> > > all callers?
> ...

> > Is there a reason to call pci_dev_config_accessible() here rather
> > than in pcie_flr()?
> 
> The reason we wanted the check in pcie_reset_flr() was so to be able
> to escalate to bus reset method if we can't do an FLR. I think the
> check could be moved to pcie_flr(). Is that more preferable?

I think so.  Doesn't the escalation still work if the check is in
pcie_flr()?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-10 23:44         ` Bjorn Helgaas
@ 2026-06-11 18:22           ` Farhan Ali
  0 siblings, 0 replies; 15+ messages in thread
From: Farhan Ali @ 2026-06-11 18:22 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: sashiko-reviews, linux-pci


On 6/10/2026 4:44 PM, Bjorn Helgaas wrote:
> On Wed, Jun 10, 2026 at 09:51:55AM -0700, Farhan Ali wrote:
>> On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
>>> On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
>>>> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>>>>
>>>> Pre-existing issues:
>> ...
>>>>> +++ b/drivers/pci/pci.c
>>>>> @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>>>>    	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>>>>    		return -ENOTTY;
>>>>> +	if (!pci_dev_config_accessible(dev, "FLR"))
>>>>> +		return -ENOTTY;
>>>>> +
>>>> [Severity: High]
>>>> This isn't a bug introduced by this patch, but does placing this
>>>> check in pcie_reset_flr() leave direct callers of pcie_flr()
>>>> exposed to 60-second hangs?
>>>>
>>>> Direct callers of pcie_flr() (like network and storage drivers
>>>> during error recovery) bypass this wrapper. They will still
>>>> experience a stall in pci_dev_wait() when attempting to reset a
>>>> device with an inaccessible config space. Should this
>>>> accessibility check be moved into pcie_flr() instead to protect
>>>> all callers?
>> ...
>>> Is there a reason to call pci_dev_config_accessible() here rather
>>> than in pcie_flr()?
>> The reason we wanted the check in pcie_reset_flr() was so to be able
>> to escalate to bus reset method if we can't do an FLR. I think the
>> check could be moved to pcie_flr(). Is that more preferable?
> I think so.  Doesn't the escalation still work if the check is in
> pcie_flr()?

Yes it would still work. I will move the check to pcie_flr() in the next 
revision.

Thanks

Farhan



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-10 21:56           ` Farhan Ali
@ 2026-06-11 19:22             ` Bjorn Helgaas
  2026-06-11 23:29               ` Farhan Ali
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2026-06-11 19:22 UTC (permalink / raw)
  To: Farhan Ali; +Cc: sashiko-reviews, linux-pci, Alex Williamson

On Wed, Jun 10, 2026 at 02:56:48PM -0700, Farhan Ali wrote:
> On 6/10/2026 1:02 PM, Bjorn Helgaas wrote:
> > On Wed, Jun 10, 2026 at 09:51:55AM -0700, Farhan Ali wrote:
> > > On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
> > > > On Wed, Jun 03, 2026 at 06:57:36PM +0000, sashiko-bot@kernel.org wrote:
> > > > > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> > > > > 
> > > > > Pre-existing issues:
> > > ...
> > > > > > +++ b/drivers/pci/pci.c
> > > > > > @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
> > > > > >    	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> > > > > >    		return -ENOTTY;
> > > > > > +	if (!pci_dev_config_accessible(dev, "FLR"))
> > > > > > +		return -ENOTTY;
> > > > > > +
> > > > > [Severity: High]
> > > > > This isn't a bug introduced by this patch, but does placing this
> > > > > check in pcie_reset_flr() leave direct callers of pcie_flr()
> > > > > exposed to 60-second hangs?
> > > > > 
> > > > > Direct callers of pcie_flr() (like network and storage drivers
> > > > > during error recovery) bypass this wrapper. They will still
> > > > > experience a stall in pci_dev_wait() when attempting to reset a
> > > > > device with an inaccessible config space. Should this
> > > > > accessibility check be moved into pcie_flr() instead to protect
> > > > > all callers?
> > > >
> > > > I can't remember why we have both pcie_reset_flr() and pcie_flr(),
> > > > other than the fact that callers don't need to supply a "probe"
> > > > argument to pcie_flr().
> > > > 
> > > > Is there a reason to call pci_dev_config_accessible() here rather
> > > > than in pcie_flr()?
> > >
> > > The reason we wanted the check in pcie_reset_flr() was so to be able
> > > to escalate to bus reset method if we can't do an FLR. I think the
> > > check could be moved to pcie_flr(). Is that more preferable?
> >
> > What if we added a preliminary patch that folds the body of pcie_flr()
> > into pcie_reset_flr() and adds:
> > 
> >    #define pcie_flr(dev)	pcie_reset_flr(dev, PCI_RESET_DO_RESET)
> > 
> > That would mean pcie_flr() users would start paying attention to
> > PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR, which seems like a
> > good thing.
> 
> Looking at the history, it seems commit 56f107d7813f ("PCI: Add
> pcie_reset_flr() with 'probe' argument") added pcie_reset_flr() to
> add the probe and remove pcie_has_flr(). It also notes that callers
> of pcie_flr() that didn't call pcie_has_flr() remain. I guess it
> didn't want to change the drivers that called pcie_flr()
> unconditionally?

I think it's a mistake that we expose both pcie_reset_flr() and
pcie_flr() to drivers.  It's unnecessary complication and may lead to
pcie_flr() being used to work around device defects, e.g., the 82599
issue, without any explanation.

Ideally I think we should convert the dozen or so callers of
pcie_flr() to pcie_reset_flr().  Since they never called
pcie_has_flr(), they initiate an FLR without checking for FLR support,
so they *could* have the same defect as 82599, i.e., they don't
advertise PCI_EXP_DEVCAP_FLR but FLR does actually work.

So converting would be slightly risky but I think the number is small
enough to be manageable, though it would be out of scope for *this*
series, of course.

> There is also reset_intel_82599_sfp_virtfn() that mentions the need
> to FLR without checking for FLR support due to some quirk. So I
> wonder if there were genuine reasons to avoid doing the check for
> PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR?

Yes.  That device looks defective in that FLR is mandatory for PFs and
VFs and PCI_EXP_DEVCAP_FLR must be set, but it is not set for 82599
VFs.

When c763e7b58f71 ("PCI: add Intel 82599 Virtual Function
specific reset method") added reset_intel_82599_sfp_virtfn(),
pcie_flr() read PCI_EXP_DEVCAP to check for PCI_EXP_DEVCAP_FLR every
time it was called.

PCI_EXP_DEVCAP is a read-only register that is now cached in
dev->devcap by set_pcie_port_type(), and pcie_reset_flr() checks the
cached value for PCI_EXP_DEVCAP_FLR.

So I think reset_intel_82599_sfp_virtfn() could be reworked to be an
early quirk that sets PCI_EXP_DEVCAP_FLR in dev->devcap, instead of
being a device-specific reset method.  Then pcie_reset_flr() should
just work.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-11 19:22             ` Bjorn Helgaas
@ 2026-06-11 23:29               ` Farhan Ali
  2026-06-12 15:44                 ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Farhan Ali @ 2026-06-11 23:29 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: sashiko-reviews, linux-pci, Alex Williamson


On 6/11/2026 12:22 PM, Bjorn Helgaas wrote:
> On Wed, Jun 10, 2026 at 02:56:48PM -0700, Farhan Ali wrote:
>> On 6/10/2026 1:02 PM, Bjorn Helgaas wrote:
>>> On Wed, Jun 10, 2026 at 09:51:55AM -0700, Farhan Ali wrote:
>>>> On 6/9/2026 3:33 PM, Bjorn Helgaas wrote:
>>>>> On Wed, Jun 03, 2026 at 06:57:36PM +0000,sashiko-bot@kernel.org wrote:
>>>>>> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>>>>>>
>>>>>> Pre-existing issues:
>>>> ...
>>>>>>> +++ b/drivers/pci/pci.c
>>>>>>> @@ -4396,6 +4396,9 @@ int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>>>>>>     	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>>>>>>     		return -ENOTTY;
>>>>>>> +	if (!pci_dev_config_accessible(dev, "FLR"))
>>>>>>> +		return -ENOTTY;
>>>>>>> +
>>>>>> [Severity: High]
>>>>>> This isn't a bug introduced by this patch, but does placing this
>>>>>> check in pcie_reset_flr() leave direct callers of pcie_flr()
>>>>>> exposed to 60-second hangs?
>>>>>>
>>>>>> Direct callers of pcie_flr() (like network and storage drivers
>>>>>> during error recovery) bypass this wrapper. They will still
>>>>>> experience a stall in pci_dev_wait() when attempting to reset a
>>>>>> device with an inaccessible config space. Should this
>>>>>> accessibility check be moved into pcie_flr() instead to protect
>>>>>> all callers?
>>>>> I can't remember why we have both pcie_reset_flr() and pcie_flr(),
>>>>> other than the fact that callers don't need to supply a "probe"
>>>>> argument to pcie_flr().
>>>>>
>>>>> Is there a reason to call pci_dev_config_accessible() here rather
>>>>> than in pcie_flr()?
>>>> The reason we wanted the check in pcie_reset_flr() was so to be able
>>>> to escalate to bus reset method if we can't do an FLR. I think the
>>>> check could be moved to pcie_flr(). Is that more preferable?
>>> What if we added a preliminary patch that folds the body of pcie_flr()
>>> into pcie_reset_flr() and adds:
>>>
>>>     #define pcie_flr(dev)	pcie_reset_flr(dev, PCI_RESET_DO_RESET)
>>>
>>> That would mean pcie_flr() users would start paying attention to
>>> PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR, which seems like a
>>> good thing.
>> Looking at the history, it seems commit 56f107d7813f ("PCI: Add
>> pcie_reset_flr() with 'probe' argument") added pcie_reset_flr() to
>> add the probe and remove pcie_has_flr(). It also notes that callers
>> of pcie_flr() that didn't call pcie_has_flr() remain. I guess it
>> didn't want to change the drivers that called pcie_flr()
>> unconditionally?
> I think it's a mistake that we expose both pcie_reset_flr() and
> pcie_flr() to drivers.  It's unnecessary complication and may lead to
> pcie_flr() being used to work around device defects, e.g., the 82599
> issue, without any explanation.
>
> Ideally I think we should convert the dozen or so callers of
> pcie_flr() to pcie_reset_flr().  Since they never called
> pcie_has_flr(), they initiate an FLR without checking for FLR support,
> so they *could* have the same defect as 82599, i.e., they don't
> advertise PCI_EXP_DEVCAP_FLR but FLR does actually work.
>
> So converting would be slightly risky but I think the number is small
> enough to be manageable, though it would be out of scope for *this*
> series, of course.
>
>> There is also reset_intel_82599_sfp_virtfn() that mentions the need
>> to FLR without checking for FLR support due to some quirk. So I
>> wonder if there were genuine reasons to avoid doing the check for
>> PCI_DEV_FLAGS_NO_FLR_RESET and PCI_EXP_DEVCAP_FLR?
> Yes.  That device looks defective in that FLR is mandatory for PFs and
> VFs and PCI_EXP_DEVCAP_FLR must be set, but it is not set for 82599
> VFs.
>
> When c763e7b58f71 ("PCI: add Intel 82599 Virtual Function
> specific reset method") added reset_intel_82599_sfp_virtfn(),
> pcie_flr() read PCI_EXP_DEVCAP to check for PCI_EXP_DEVCAP_FLR every
> time it was called.
>
> PCI_EXP_DEVCAP is a read-only register that is now cached in
> dev->devcap by set_pcie_port_type(), and pcie_reset_flr() checks the
> cached value for PCI_EXP_DEVCAP_FLR.
>
> So I think reset_intel_82599_sfp_virtfn() could be reworked to be an
> early quirk that sets PCI_EXP_DEVCAP_FLR in dev->devcap, instead of
> being a device-specific reset method.  Then pcie_reset_flr() should
> just work.

I do agree exposing both pcie_flr() and pcie_reset_flr() is not ideal, 
and having single API with all the checks would be good. I can take a 
stab at it (separate from this series) and look into either folding the 
contents of pcie_flr() to pcie_reset_flr() or converting all pcie_flr() 
caller to pcie_reset_flr(). AFAICT reset_intel_82599_sfp_virtfn() could 
be the only caller that actually needed to avoid the PCI_EXP_DEVCAP_FLR 
check.


FWIW when trying on s390x with the limited set of supported PCI devices, 
I didn't see any issues.

Thanks

Farhan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible
  2026-06-11 23:29               ` Farhan Ali
@ 2026-06-12 15:44                 ` Bjorn Helgaas
  0 siblings, 0 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2026-06-12 15:44 UTC (permalink / raw)
  To: Farhan Ali; +Cc: sashiko-reviews, linux-pci, Alex Williamson

On Thu, Jun 11, 2026 at 04:29:05PM -0700, Farhan Ali wrote:
> ...

> I do agree exposing both pcie_flr() and pcie_reset_flr() is not ideal, and
> having single API with all the checks would be good. I can take a stab at it
> (separate from this series) and look into either folding the contents of
> pcie_flr() to pcie_reset_flr() or converting all pcie_flr() caller to
> pcie_reset_flr(). AFAICT reset_intel_82599_sfp_virtfn() could be the only
> caller that actually needed to avoid the PCI_EXP_DEVCAP_FLR check.

That would be great.  Probably something that should be in linux-next
for quite a while so we have a chance to find any issues.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-06-12 15:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-03 18:16 [PATCH v18 0/3] [PCI] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-06-03 18:16 ` [PATCH v18 1/3] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-06-03 18:33   ` sashiko-bot
2026-06-03 18:16 ` [PATCH v18 2/3] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-06-03 18:16 ` [PATCH v18 3/3] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-06-03 18:57   ` sashiko-bot
2026-06-09 22:33     ` Bjorn Helgaas
2026-06-10 16:51       ` Farhan Ali
2026-06-10 20:02         ` Bjorn Helgaas
2026-06-10 21:56           ` Farhan Ali
2026-06-11 19:22             ` Bjorn Helgaas
2026-06-11 23:29               ` Farhan Ali
2026-06-12 15:44                 ` Bjorn Helgaas
2026-06-10 23:44         ` Bjorn Helgaas
2026-06-11 18:22           ` Farhan Ali

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.