On Wed, 26 Nov 2025, guanghui.fgh wrote: > Refer to the previous email discussion replies: > > 1. When a multifunction device is connected to a PCIe slot that supports hotplug, > during the passthrogh device to guest process based on VFIO, some devices > need to perform a hot reset to initialize the device: > vfio_pci_dev_set_hot_reset ---> __pci_reset_slot/__pci_reset_bus ---> > pci_bridge_secondary_bus_reset ---> pci_bridge_wait_for_secondary_bus. > > After __pci_reset_slot/__pci_reset_bus calls pci_bridge_wait_for_secondary_bus, > the device will be restored via pci_dev_restore. ...and then they call to pci_slot_restore_locked()/pci_bus_restore_locked() that do recursively wait + restore config spaces since the commit 3e40aa29d47e ("PCI: Wait for Link before restoring Downstream Buses"). > However, when a ====== multifunction PCIe device ========= > is connected, executing pci_bridge_wait_for_secondary_bus only guarantees the restoration > of a random device. For other devices that are still restoring, executing pci_dev_restore cannot > restore the device state normally, resulting in errors or even device offline. Addressing this doesn't require recursion AFAICT. > 2. In the PCIe dpc_handler process, do pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link) > to restores the PCIe link. > But dpc_reset_link waits for the link to recover via pci_bridge_wait_for_secondary_bus before returning. > Similarly, pcie_do_recovery restores devices via pci_walk_bridge(bridge, report_resume, &status), > > ====== which also requires pci_bridge_wait_for_secondary_bus to wait for all devices to recover completely. > For other devices that are still restoring, executing report_resume cannot restore the device state normally, > resulting in errors or even device offline. This might lack wait for subordinate buses and ordering with restore, but I don't think the right place to add that is pci_bridge_wait_for_secondary_bus() that is used by others. > 3. The PCIe specification only constrains the minimum wait time under different resets. In the historical kernel implementation, > SBR wait also did not need to restore the PCIe device configuration space before waiting for the coordinate. I'm not saying there isn't anything to fix/change but I'm not convinced by the approach taken by your patch. -- i. > ------------------------------------------------------------------ > On Wed, 26 Nov 2025, guanghui.fgh wrote: > > > 1. Does this add recursion without restoring config space on each level > > before starting the child wait? > > > > Yes > > The current implementation does not require restoring the PCIe device > > configuration space. The status is determined during the waiting > > process, either based on software-recorded status or on the device's RO > > and HWINIT type registers. > > What guarantees a link even come up without restoring the config space > first (it may work in your case but is that true universally)? > > The commit 3e40aa29d47e ("PCI: Wait for Link before restoring Downstream > Buses") tried to address this sub-hierarchy wait issue within the > recursive bus restore (without me ever encountering this problem for > real). So I don't understand why you had to add the recursion to the wait > side as well? > > -- > i. > > > > When executing a PCIe secondary bus reset, all downstream switches and > > > endpoints will generate reset events. Simultaneously, all PCIe links > > > will undergo retraining, and each link will independently re-execute the > > > LTSSM state machine training. Therefore, after executing the SBR, it is > > > necessary to wait for all downstream links and devices to complete > > > recovery. Otherwise, after the SBR returns, accessing devices with some > > > links or endpoints not yet fully recovered may result in driver errors, > > > or even trigger device offline issues. > > > > > > Signed-off-by: Guanghui Feng > > > Signed-off-by: Guixin Liu > > > --- > > >  drivers/pci/pci.c | 112 ++++++++++++++++++++++++++++++++++++++-------- > > >  1 file changed, 94 insertions(+), 18 deletions(-) > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index b14dd064006c..9cf13fe69d94 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -4788,9 +4788,74 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus) > > >   return max(min_delay, max_delay); > > >  } > > >  > > > +int __pci_bridge_wait_for_secondary_bus(struct pci_dev *, unsigned long, char *); > > > + > > > +static int pci_dev_wait_child(struct pci_dev *pdev, unsigned long start_t, int timeout, > > > +    char *reset_type) > > > +{ > > > + struct pci_dev *child, **dev = NULL; > > > + int ct = 0, i = 0, ret = 0, left = 1; > > > + unsigned long dev_start_t; > > > + > > > + down_read(&pci_bus_sem); > > > + > > > + list_for_each_entry(child, &pdev->subordinate->devices, bus_list) > > > +  ct++; > > > + > > > + if (ct) { > > > +  dev = kzalloc(sizeof(struct pci_dev *) * ct, GFP_KERNEL); > > > + > > > +  if(!dev) { > > > +   pci_err(pdev, "dev mem alloc err\n"); > > > +   up_read(&pci_bus_sem); > > > +   return -ENOMEM; > > > +  } > > > + > > > +  list_for_each_entry(child, &pdev->subordinate->devices, bus_list) > > > +   dev[i++] = pci_dev_get(child); > > > + } > > > + > > > + up_read(&pci_bus_sem); > > > + > > > + for (i = 0; i < ct; ++i) { > > > +  left = 1; > > > + > > > +dev_wait: > > > +  dev_start_t = jiffies; > > > +  ret = pci_dev_wait(dev[i], reset_type, left); > > > +  timeout -= jiffies_to_msecs(jiffies - dev_start_t); > > > + > > > +  if (ret) { > > > +   if (pci_dev_is_disconnected(dev[i])) > > > +    continue; > > > + > > > +   if (timeout <= 0) > > > +    goto end; > > > + > > > +   left <<= 1; > > > +   left = timeout > left ? left : timeout; > > > +   goto dev_wait; > > > +  } > > > + } > > > + > > > + for (i = 0; i < ct; ++i) { > > > +  ret = __pci_bridge_wait_for_secondary_bus(dev[i], start_t, reset_type); > > > > Does this add recursion without restoring config space on each level > > before starting the child wait? > > > > > +  if (ret) > > > +   break; > > > + } > > > + > > > +end: > > > + for (i = 0; i < ct; ++i) > > > +  pci_dev_put(dev[i]); > > > + > > > + kfree(dev); > > > + return ret; > > > +} > > > + > > >  /** > > > - * pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible > > > + * __pci_bridge_wait_for_secondary_bus - Wait for secondary bus to be accessible > > >   * @dev: PCI bridge > > > + * @start_t: wait start jiffies time > > >   * @reset_type: reset type in human-readable form > > >   * > > >   * Handle necessary delays before access to the devices on the secondary > > > @@ -4804,10 +4869,9 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus) > > >   * Return 0 on success or -ENOTTY if the first device on the secondary bus > > >   * failed to become accessible. > > >   */ > > > -int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > > > +int __pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, unsigned long start_t, char *reset_type) > > >  { > > > - struct pci_dev *child __free(pci_dev_put) = NULL; > > > - int delay; > > > + int delay, left; > > >  > > >   if (pci_dev_is_disconnected(dev)) > > >    return 0; > > > @@ -4835,8 +4899,6 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > > >    return 0; > > >   } > > >  > > > - child = pci_dev_get(list_first_entry(&dev->subordinate->devices, > > > -          struct pci_dev, bus_list)); > > >   up_read(&pci_bus_sem); > > >  > > >   /* > > > @@ -4844,8 +4906,12 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > > >    * accessing the device after reset (that is 1000 ms + 100 ms). > > >    */ > > >   if (!pci_is_pcie(dev)) { > > > -  pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay); > > > -  msleep(1000 + delay); > > > +  left = 1000 + delay - jiffies_to_msecs(jiffies - start_t); > > > +  pci_dbg(dev, "waiting %d ms for secondary bus\n", left > 0 ? left : 0); > > > + > > > +  if (left > 0) > > > +   msleep(left); > > > + > > >    return 0; > > >   } > > >  > > > @@ -4870,10 +4936,14 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > > >   if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) { > > >    u16 status; > > >  > > > -  pci_dbg(dev, "waiting %d ms for downstream link\n", delay); > > > -  msleep(delay); > > > +  left = delay - jiffies_to_msecs(jiffies - start_t); > > > +  pci_dbg(dev, "waiting %d ms for downstream link\n", left > 0 ? left : 0); > > > + > > > +  if (left > 0) > > > +   msleep(left); > > >  > > > -  if (!pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay)) > > > +  left = PCI_RESET_WAIT - jiffies_to_msecs(jiffies - start_t); > > > +  if(!pci_dev_wait_child(dev, start_t, left > 0 ? left : 0, reset_type)) > > >     return 0; > > >  > > >    /* > > > @@ -4888,20 +4958,26 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > > >    if (!(status & PCI_EXP_LNKSTA_DLLLA)) > > >     return -ENOTTY; > > >  > > > -  return pci_dev_wait(child, reset_type, > > > -        PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT); > > > +  left = PCIE_RESET_READY_POLL_MS - jiffies_to_msecs(jiffies - start_t); > > > +  return pci_dev_wait_child(dev, start_t, left > 0 ? left : 0, reset_type); > > >   } > > >  > > > - pci_dbg(dev, "waiting %d ms for downstream link, after activation\n", > > > -  delay); > > > - if (!pcie_wait_for_link_delay(dev, true, delay)) { > > > + left = delay - jiffies_to_msecs(jiffies - start_t); > > > + pci_dbg(dev, "waiting %d ms for downstream link, after activation\n", left > 0 ? left : 0); > > > + > > > + if (!pcie_wait_for_link_delay(dev, true, left > 0 ? left : 0)) { > > >    /* Did not train, no need to wait any further */ > > >    pci_info(dev, "Data Link Layer Link Active not set in %d msec\n", delay); > > >    return -ENOTTY; > > >   } > > >  > > > - return pci_dev_wait(child, reset_type, > > > -       PCIE_RESET_READY_POLL_MS - delay); > > > + left = PCIE_RESET_READY_POLL_MS - jiffies_to_msecs(jiffies - start_t); > > > + return pci_dev_wait_child(dev, start_t, left > 0 ? left : 0, reset_type); > > > +} > > > + > > > +int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > > > +{ > > > + return __pci_bridge_wait_for_secondary_bus(dev, jiffies, reset_type); > > >  } > > >  > > >  void pci_reset_secondary_bus(struct pci_dev *dev) > > > > > > > -- > >  i. > > > -- i.