public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: David Matlack <dmatlack@google.com>
Cc: iommu@lists.linux.dev, kexec@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-pci@vger.kernel.org,
	Adithya Jayachandran <ajayachandra@nvidia.com>,
	Alexander Graf <graf@amazon.com>,
	Alex Williamson <alex@shazbot.org>,
	Bjorn Helgaas <bhelgaas@google.com>, Chris Li <chrisl@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Joerg Roedel <joro@8bytes.org>,
	Jonathan Corbet <corbet@lwn.net>, Josh Hilke <jrhilke@google.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Lukas Wunner <lukas@wunner.de>, Mike Rapoport <rppt@kernel.org>,
	Parav Pandit <parav@nvidia.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Pranjal Shrivastava <praan@google.com>,
	Pratyush Yadav <pratyush@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Samiullah Khawaja <skhawaja@google.com>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Will Deacon <will@kernel.org>, William Tu <witu@nvidia.com>,
	Yi Liu <yi.l.liu@intel.com>,
	jacob.pan@linux.microsoft.com
Subject: Re: [PATCH v4 05/11] PCI: liveupdate: Inherit bus numbers during Live Update
Date: Wed, 29 Apr 2026 15:28:14 -0700	[thread overview]
Message-ID: <20260429152814.000005f7@linux.microsoft.com> (raw)
In-Reply-To: <ae_SHCjNGrEPurzH@google.com>

Hi David,

On Mon, 27 Apr 2026 21:16:12 +0000
David Matlack <dmatlack@google.com> wrote:

> On 2026-04-27 08:40 PM, David Matlack wrote:
> > On 2026-04-27 11:47 AM, Jacob Pan wrote:  
> > > On Thu, 23 Apr 2026 21:23:09 +0000
> > > David Matlack <dmatlack@google.com> wrote:  
> >   
> > > > To keep things simple, inherit the secondary and subordinate bus
> > > > numbers on all bridges if any PCI devices were preserved (i.e.
> > > > even bridges without any downstream endpoints that were
> > > > preserved). This avoids accidentally assigning a bridge a new
> > > > window that overlaps with a preserved device that is downstream
> > > > of a different bridge.
> > > > 
> > > > If a bridge is enumerated with a broken topology or has no bus
> > > > numbers set during a Live Update, refuse to assign it new bus
> > > > numbers and refuse to enumerate devices below it. This is a
> > > > safety measure to prevent topology conflicts.
> > > > 
> > > > Require that CONFIG_CARDBUS is not enabled to enable
> > > > CONFIG_PCI_LIVEUPDATE since inheriting bus numbers on
> > > > PCI-to-CardBus bridges requires additional work but is not a
> > > > priority at the moment.
> > > > 
> > > > Signed-off-by: David Matlack <dmatlack@google.com>  
> >   
> > > > +	/*
> > > > +	 * During a Live Update, preserved devices are allowed
> > > > to continue
> > > > +	 * performing memory transactions. The kernel must not
> > > > change the fabric
> > > > +	 * topology, including bus numbers, since that would
> > > > require disabling
> > > > +	 * and flushing any memory transactions first.
> > > > +	 *
> > > > +	 * To keep things simple, inherit the secondary and
> > > > subordinate bus
> > > > +	 * numbers on _all_ bridges if _any_ PCI devices were
> > > > preserved (i.e.
> > > > +	 * even bridges without any downstream endpoints that
> > > > were preserved).
> > > > +	 * This avoids accidentally assigning a bridge a new
> > > > window that
> > > > +	 * overlaps with a preserved device that is downstream
> > > > of a different
> > > > +	 * bridge.
> > > > +	 */
> > > > +	dev->liveupdate_inherit_buses = true;
> > > > +  
> > > This flag never gets cleared after the incoming kernel boot up,
> > > what if the user does a manual rescan via sysfs? i.e.
> > > # echo 1 > /sys/bus/pci/rescan
> > > pcibios_assign_all_busses() will never gets called for this
> > > device, and may hit this
> > > 	if (dev->liveupdate_inherit_buses) {
> > > 		pci_err(dev, "Cannot reconfigure bridge during
> > > 		Live Update!\n");
> > > 
> > > So, maybe clear it in pci_liveupdate_finish()?  
> > 
> > I think we can allo wa rescan to assign new bus numbers once all
> > devices go through pci_liveupdate_finish() by clearing
> > dev->liveupdate_inherit_buses on all devices in pci_flb_finish(). We
> > would need to hold pci_rescan_remove_lock to avoid this racing with
> > such a rescan.
> > 
> > Now that you bring up /sys/bus/pci/rescan... I think we also need to
> > set dev->liveupdate_inherit_buses in the outgoing kernel, to avoid
> > bus numbers changing on outgoing preserved devices.
> > pci_flb_preserve() should take pci_rescan_remove_lock and set
> > dev->liveupdate_inherit_buses on all devices, and
> > pci_flb_unpreserve() should do the opposite.
> > 
> > If we did all then then /sys/bus/pci/rescan can work like normal as
> > long as no devices are preserved (incoming or outgoing). If any
> > devices are preserved then dev->liveupdate_inherit_buses gets set
> > to prevent bus numbers from changing during a possible rescan.  
> 
> Something like this? This is a diff applied on top of this commit.
Below looks correct to me, but I have another question. How do you
stablize PCI BARs? PCI BDF stability does not guarantee BARs don't get
moved, right?

> 
> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> index fead478e8a04..b1b0a5b1a5df 100644
> --- a/drivers/pci/liveupdate.c
> +++ b/drivers/pci/liveupdate.c
> @@ -120,6 +120,20 @@
>  
>  #include "pci.h"
>  
> +/*
> + * During a Live Update, preserved devices are allowed to continue
> performing
> + * memory transactions. The kernel must not change the fabric
> topology,
> + * including bus numbers, since that would require disabling and
> flushing any
> + * memory transactions first.
> + *
> + * To keep things simple, inherit the secondary and subordinate bus
> numbers on
> + * _all_ bridges if _any_ PCI devices are preserved (i.e.  even
> bridges without
> + * any downstream endpoints that were preserved).  This avoids
> accidentally
> + * assigning a bridge a new window that overlaps with a preserved
> device that is
> + * downstream of a different bridge.
> + */
> +static atomic_t inherit_buses;
> +
>  struct pci_flb_outgoing {
>  	/* The pci_ser struct to be passed to the next kernel */
>  	struct pci_ser *ser;
> @@ -141,6 +155,29 @@ static unsigned long pci_ser_xa_key(unsigned
> long domain, unsigned long bdf) return domain << 16 | bdf;
>  }
>  
> +bool pci_liveupdate_inherit_buses(void)
> +{
> +	return atomic_read(&inherit_buses);
> +}
> +
> +static void pci_set_liveupdate_inherit_buses(bool enable)
> +{
> +	/* Ensure updates to inherit_buses do not race with rescans
> */
> +	pci_lock_rescan_remove();
> +
> +	/*
> +	 * Increment/decrement instead of setting directly to
> true/false so that
> +	 * pci_liveupdate_inherit_buses() returns true if any device
> is outgoing
> +	 * preserved or incoming preserved.
> +	 */
> +	if (enable)
> +		atomic_inc(&inherit_buses);
> +	else
> +		atomic_dec(&inherit_buses);
> +
> +	pci_unlock_rescan_remove();
> +}
> +
>  static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
>  {
>  	struct pci_flb_outgoing *outgoing;
> @@ -180,6 +217,8 @@ static int pci_flb_preserve(struct
> liveupdate_flb_op_args *args) 
>  	args->obj = outgoing;
>  	args->data = virt_to_phys(outgoing->ser);
> +
> +	pci_set_liveupdate_inherit_buses(true);
>  	return 0;
>  }
>  
> @@ -187,6 +226,8 @@ static void pci_flb_unpreserve(struct
> liveupdate_flb_op_args *args) {
>  	struct pci_flb_outgoing *outgoing = args->obj;
>  
> +	pci_set_liveupdate_inherit_buses(false);
> +
>  	pr_debug("Unpreserving struct pci_ser\n");
>  	WARN_ON_ONCE(outgoing->ser->nr_devices);
>  	kho_unpreserve_free(outgoing->ser);
> @@ -223,6 +264,8 @@ static int pci_flb_retrieve(struct
> liveupdate_flb_op_args *args) }
>  
>  	args->obj = incoming;
> +
> +	pci_set_liveupdate_inherit_buses(true);
>  	return 0;
>  }
>  
> @@ -230,6 +273,8 @@ static void pci_flb_finish(struct
> liveupdate_flb_op_args *args) {
>  	struct pci_flb_incoming *incoming = args->obj;
>  
> +	pci_set_liveupdate_inherit_buses(false);
> +
>  	xa_destroy(&incoming->xa);
>  	kho_restore_free(incoming->ser);
>  	kfree(incoming);
> @@ -385,21 +430,6 @@ void pci_liveupdate_setup_device(struct pci_dev
> *dev) if (!xa)
>  		return;
>  
> -	/*
> -	 * During a Live Update, preserved devices are allowed to
> continue
> -	 * performing memory transactions. The kernel must not
> change the fabric
> -	 * topology, including bus numbers, since that would require
> disabling
> -	 * and flushing any memory transactions first.
> -	 *
> -	 * To keep things simple, inherit the secondary and
> subordinate bus
> -	 * numbers on _all_ bridges if _any_ PCI devices were
> preserved (i.e.
> -	 * even bridges without any downstream endpoints that were
> preserved).
> -	 * This avoids accidentally assigning a bridge a new window
> that
> -	 * overlaps with a preserved device that is downstream of a
> different
> -	 * bridge.
> -	 */
> -	dev->liveupdate_inherit_buses = true;
> -
>  	key = pci_ser_xa_key(pci_domain_nr(dev->bus),
> pci_dev_id(dev)); dev_ser = xa_load(xa, key);
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 09bab39738d7..abd8379b99cf 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1442,6 +1442,7 @@ static inline int pci_msix_write_tph_tag(struct
> pci_dev *pdev, unsigned int inde #ifdef CONFIG_PCI_LIVEUPDATE
>  void pci_liveupdate_setup_device(struct pci_dev *dev);
>  void pci_liveupdate_cleanup_device(struct pci_dev *dev);
> +bool pci_liveupdate_inherit_buses(void);
>  #else
>  static inline void pci_liveupdate_setup_device(struct pci_dev *dev)
>  {
> @@ -1450,6 +1451,11 @@ static inline void
> pci_liveupdate_setup_device(struct pci_dev *dev) static inline void
> pci_liveupdate_cleanup_device(struct pci_dev *dev) {
>  }
> +
> +static inline bool pci_liveupdate_inherit_buses(void)
> +{
> +	return false;
> +}
>  #endif
>  
>  #endif /* DRIVERS_PCI_H */
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index fa26f4170add..f94fa1fc76cc 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1374,9 +1374,9 @@ bool pci_ea_fixed_busnrs(struct pci_dev *dev,
> u8 *sec, u8 *sub) return true;
>  }
>  
> -static bool pci_should_assign_new_buses(struct pci_dev *dev)
> +static bool pci_should_assign_new_buses(void)
>  {
> -	if (dev->liveupdate_inherit_buses)
> +	if (pci_liveupdate_inherit_buses())
>  		return false;
>  
>  	return pcibios_assign_all_busses();
> @@ -1409,7 +1409,7 @@ static int pci_scan_bridge_extend(struct
> pci_bus *bus, struct pci_dev *dev, int max, unsigned int
> available_buses, int pass)
>  {
> -	const bool assign_new_buses =
> pci_should_assign_new_buses(dev);
> +	const bool assign_new_buses = pci_should_assign_new_buses();
>  	struct pci_bus *child;
>  	u32 buses;
>  	u16 bctl;
> @@ -1518,7 +1518,7 @@ static int pci_scan_bridge_extend(struct
> pci_bus *bus, struct pci_dev *dev, goto out;
>  		}
>  
> -		if (dev->liveupdate_inherit_buses) {
> +		if (pci_liveupdate_inherit_buses()) {
>  			pci_err(dev, "Cannot reconfigure bridge
> during Live Update!\n"); pci_err(dev, "Downstream devices will not be
> enumerated!\n"); goto out;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 9a602b322e3c..dd6b26ca9462 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -511,7 +511,6 @@ struct pci_dev {
>  	unsigned int	rom_bar_overlap:1;	/* ROM BAR
> disable broken */ unsigned int	rom_attr_enabled:1;	/*
> Display of ROM attribute enabled? */ unsigned int
> non_mappable_bars:1;	/* BARs can't be mapped to user-space  */
> -	unsigned int	liveupdate_inherit_buses:1; /* Inherit
> bus numbers due to Live Update */ pci_dev_flags_t dev_flags;
>  	atomic_t	enable_cnt;	/* pci_enable_device has
> been called */ 


  reply	other threads:[~2026-04-29 22:28 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23 21:23 [PATCH v4 00/11] PCI: liveupdate: PCI core support for Live Update David Matlack
2026-04-23 21:23 ` [PATCH v4 01/11] PCI: liveupdate: Set up FLB handler for the PCI core David Matlack
2026-04-24 12:33   ` Pratyush Yadav
2026-04-24 13:29     ` Pasha Tatashin
2026-04-27 23:59       ` David Matlack
2026-04-28 17:50         ` Pasha Tatashin
2026-04-28 23:47           ` David Matlack
2026-04-27 21:05   ` Bjorn Helgaas
2026-04-27 21:31     ` David Matlack
2026-04-28 19:45   ` Vipin Sharma
2026-04-28 23:51     ` David Matlack
2026-04-23 21:23 ` [PATCH v4 02/11] PCI: liveupdate: Track outgoing preserved PCI devices David Matlack
2026-04-27 15:57   ` Jacob Pan
2026-04-27 18:56     ` David Matlack
2026-04-27 21:06   ` Bjorn Helgaas
2026-04-28 17:24   ` Samiullah Khawaja
2026-04-28 17:35     ` Samiullah Khawaja
2026-04-28 20:20   ` Vipin Sharma
2026-04-28 21:12     ` David Matlack
2026-04-23 21:23 ` [PATCH v4 03/11] PCI: liveupdate: Track incoming " David Matlack
2026-04-27 21:06   ` Bjorn Helgaas
2026-04-23 21:23 ` [PATCH v4 04/11] PCI: liveupdate: Document driver binding responsibilities David Matlack
2026-04-23 21:23 ` [PATCH v4 05/11] PCI: liveupdate: Inherit bus numbers during Live Update David Matlack
2026-04-27 18:47   ` Jacob Pan
2026-04-27 20:40     ` David Matlack
2026-04-27 21:16       ` David Matlack
2026-04-29 22:28         ` Jacob Pan [this message]
2026-04-29 22:56           ` David Matlack
2026-04-27 21:07   ` Bjorn Helgaas
2026-04-23 21:23 ` [PATCH v4 06/11] PCI: liveupdate: Auto-preserve upstream bridges across " David Matlack
2026-04-23 21:23 ` [PATCH v4 07/11] PCI: liveupdate: Inherit ACS flags in incoming preserved devices David Matlack
2026-04-23 21:23 ` [PATCH v4 08/11] PCI: liveupdate: Require preserved devices are in immutable singleton IOMMU groups David Matlack
2026-04-23 22:10   ` David Matlack
2026-04-23 22:52     ` Jason Gunthorpe
2026-04-23 23:09       ` David Matlack
2026-04-23 23:27         ` Samiullah Khawaja
2026-04-27 20:56   ` Jacob Pan
2026-04-23 21:23 ` [PATCH v4 09/11] PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges David Matlack
2026-04-23 21:23 ` [PATCH v4 10/11] PCI: liveupdate: Do not disable bus mastering on preserved devices during kexec David Matlack
2026-04-27 21:08   ` Bjorn Helgaas
2026-04-23 21:23 ` [PATCH v4 11/11] Documentation: PCI: Add documentation for Live Update David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260429152814.000005f7@linux.microsoft.com \
    --to=jacob.pan@linux.microsoft.com \
    --cc=ajayachandra@nvidia.com \
    --cc=alex@shazbot.org \
    --cc=bhelgaas@google.com \
    --cc=chrisl@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=jrhilke@google.com \
    --cc=kexec@lists.infradead.org \
    --cc=leonro@nvidia.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=parav@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=praan@google.com \
    --cc=pratyush@kernel.org \
    --cc=rientjes@google.com \
    --cc=robin.murphy@arm.com \
    --cc=rppt@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=skhan@linuxfoundation.org \
    --cc=skhawaja@google.com \
    --cc=will@kernel.org \
    --cc=witu@nvidia.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox