public inbox for linux-pci@vger.kernel.org
* [PATCH] pci: don't fallback to bus reset after failed slot reset
@ 2026-04-21 15:06 Keith Busch
  2026-04-24 16:11 ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2026-04-21 15:06 UTC (permalink / raw)
  To: linux-pci, bhelgaas; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

If a bus has hotplug slots that implement the reset_slot callback, it is
not safe to do the non-slot-specific bus reset, so don't fall back to it.
If a slot reset does fail, the subsequent bus reset will attempt a second
link reset on top of the previous one and fail to handle the hotplug
events.

Fixes: 8238cb69c01fe ("PCI: Make reset_subordinate hotplug safe")
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pci.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8f7cfcc000901..d34266651ad09 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5607,13 +5607,14 @@ static int pci_try_reset_bus(struct pci_bus *bus)
  *           reset for affected devices
  *
  * This function will first try to reset the slots on this bus if the method is
- * available. If slot reset fails or is not available, this will fall back to a
+ * available. If slot reset is not available, this will fall back to a
  * secondary bus reset.
  */
 static int pci_reset_bridge(struct pci_dev *bridge, bool restore)
 {
 	struct pci_bus *bus = bridge->subordinate;
 	struct pci_slot *slot;
+	int ret = 0;
 
 	if (!bus)
 		return -ENOTTY;
@@ -5627,19 +5628,17 @@ static int pci_reset_bridge(struct pci_dev *bridge, bool restore)
 			goto bus_reset;
 
 	list_for_each_entry(slot, &bus->slots, list) {
-		int ret;
-
 		if (restore)
 			ret = pci_try_reset_slot(slot);
 		else
 			ret = pci_slot_reset(slot, PCI_RESET_DO_RESET);
 
 		if (ret)
-			goto bus_reset;
+			break;
 	}
 
 	mutex_unlock(&pci_slot_mutex);
-	return 0;
+	return ret;
 bus_reset:
 	mutex_unlock(&pci_slot_mutex);
 
-- 
2.52.0



* Re: [PATCH] pci: don't fallback to bus reset after failed slot reset
  2026-04-21 15:06 [PATCH] pci: don't fallback to bus reset after failed slot reset Keith Busch
@ 2026-04-24 16:11 ` Bjorn Helgaas
  2026-04-24 22:09   ` Keith Busch
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2026-04-24 16:11 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, bhelgaas, Keith Busch

On Tue, Apr 21, 2026 at 08:06:44AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> If a bus has hotplug slots that implement the reset_slot callback, it is
> not safe to do the non-slot-specific bus reset, so don't fall back to it.
> If a slot reset does fail, the subsequent bus reset will attempt a second
> link reset on top of the previous one and fail to handle the hotplug
> events.
> 
> Fixes: 8238cb69c01fe ("PCI: Make reset_subordinate hotplug safe")
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Applied to pci/reset for v7.2, thanks!  Will be rebased after
v7.1-rc1.

> ---
>  drivers/pci/pci.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8f7cfcc000901..d34266651ad09 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5607,13 +5607,14 @@ static int pci_try_reset_bus(struct pci_bus *bus)
>   *           reset for affected devices
>   *
>   * This function will first try to reset the slots on this bus if the method is
> - * available. If slot reset fails or is not available, this will fall back to a
> + * available. If slot reset is not available, this will fall back to a
>   * secondary bus reset.
>   */
>  static int pci_reset_bridge(struct pci_dev *bridge, bool restore)
>  {
>  	struct pci_bus *bus = bridge->subordinate;
>  	struct pci_slot *slot;
> +	int ret = 0;
>  
>  	if (!bus)
>  		return -ENOTTY;
> @@ -5627,19 +5628,17 @@ static int pci_reset_bridge(struct pci_dev *bridge, bool restore)
>  			goto bus_reset;
>  
>  	list_for_each_entry(slot, &bus->slots, list) {
> -		int ret;
> -
>  		if (restore)
>  			ret = pci_try_reset_slot(slot);
>  		else
>  			ret = pci_slot_reset(slot, PCI_RESET_DO_RESET);
>  
>  		if (ret)
> -			goto bus_reset;
> +			break;
>  	}
>  
>  	mutex_unlock(&pci_slot_mutex);
> -	return 0;
> +	return ret;
>  bus_reset:
>  	mutex_unlock(&pci_slot_mutex);
>  
> -- 
> 2.52.0
> 


* Re: [PATCH] pci: don't fallback to bus reset after failed slot reset
  2026-04-24 16:11 ` Bjorn Helgaas
@ 2026-04-24 22:09   ` Keith Busch
  2026-04-27 22:28     ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2026-04-24 22:09 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Keith Busch, linux-pci, bhelgaas

On Fri, Apr 24, 2026 at 11:11:36AM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 21, 2026 at 08:06:44AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > If a bus has hotplug slots that implement the reset_slot callback, it is
> > not safe to do the non-slot-specific bus reset, so don't fall back to it.
> > If a slot reset does fail, the subsequent bus reset will attempt a second
> > link reset on top of the previous one and fail to handle the hotplug
> > events.
> > 
> > Fixes: 8238cb69c01fe ("PCI: Make reset_subordinate hotplug safe")
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> 
> Applied to pci/reset for v7.2, thanks!  Will be rebased after
> v7.1-rc1.

I kind of think this is 7.1 material. If a pciehp slot reset fails
because of some hardware issue, that means the device is being removed
after disabling the hotplug "ignore". Falling back to a bus reset walks
the bus's device list without a lock, so it risks grabbing an invalid
pointer while racing with the removal.

I didn't intend to get into those details because they may divert
attention to the pre-existing locking issues that you can hit from a
variety of directions. But for the record, the failure from this
particular path before this patch looks like this:

 pcieport 0000:55:01.0: pciehp: Slot(30): Link Down/Up ignored
 pcieport 0000:55:01.0: pciehp: Slot(30): Link Down
 pcieport 0000:55:01.0: pciehp: Slot(30): Card not present
 Oops: general protection fault, probably for non-canonical address 0xdead000000000180: 0000 [#1] SMP
...
 RIP: 0010:pci_dev_save_and_disable+0x9/0x70
 Code: 89 fe 48 89 c7 e8 57 9d 99 00 48 89 df e8 5f 5e ff ff 4c 89 f7 45 31 e4 e9 6c ff ff ff cc cc cc cc 0f 1f 44 00 00 53 48 89 fb <48> 8b 87 80 00 00 00 48 85 c0 75 27 48 89 df 31 f6 31 d2 e8 6f c1
 RSP: 0018:ffa0000052f27db8 EFLAGS: 00010297
 RAX: 0000000000000086 RBX: dead000000000100 RCX: 0000000000000000
 RDX: 0000000000000400 RSI: 0000000000000004 RDI: dead000000000100
 RBP: ff11000112c8f428 R08: 0000000000000002 R09: ffa0000052f27cfc
 R10: ffffffff82b37718 R11: 0000000000000001 R12: ff11000112c8f440
 R13: ff1100011024b508 R14: dead000000000100 R15: ff1100011024b500
 FS:  00007f63b0ced740(0000) GS:ff11007eea90e000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f58fc1774e0 CR3: 00000003087c8002 CR4: 0000000000771ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  pci_bus_save_and_disable_locked+0x20/0x40
  pci_reset_bridge+0x1e6/0x230
  reset_subordinate_store+0x39/0x60
  kernfs_fop_write_iter.llvm.2764860334700261357+0xd4/0x1d0
  ? do_timerfd_settime+0x490/0x490
  __x64_sys_write+0x309/0x540
  do_syscall_64+0x6b/0x250
  entry_SYSCALL_64_after_hwframe+0x4b/0x53


* Re: [PATCH] pci: don't fallback to bus reset after failed slot reset
  2026-04-24 22:09   ` Keith Busch
@ 2026-04-27 22:28     ` Bjorn Helgaas
  0 siblings, 0 replies; 4+ messages in thread
From: Bjorn Helgaas @ 2026-04-27 22:28 UTC (permalink / raw)
  To: Keith Busch; +Cc: Keith Busch, linux-pci, bhelgaas

On Fri, Apr 24, 2026 at 04:09:29PM -0600, Keith Busch wrote:
> On Fri, Apr 24, 2026 at 11:11:36AM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 21, 2026 at 08:06:44AM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > If a bus has hotplug slots that implement the reset_slot callback, it is
> > > not safe to do the non-slot-specific bus reset, so don't fall back to it.
> > > If a slot reset does fail, the subsequent bus reset will attempt a second
> > > link reset on top of the previous one and fail to handle the hotplug
> > > events.
> > > 
> > > Fixes: 8238cb69c01fe ("PCI: Make reset_subordinate hotplug safe")
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > 
> > Applied to pci/reset for v7.2, thanks!  Will be rebased after
> > v7.1-rc1.
> 
> I kind of think this is 7.1 material. If a pciehp slot reset fails
> because of some hardware issue, that means the device is being removed
> after disabling the hotplug "ignore". Falling back to a bus reset walks
> the bus's device list without a lock, so it risks grabbing an invalid
> pointer while racing with the removal.

Thanks for the heads-up; I moved this to pci/for-linus for v7.1.

