From: Pranjal Shrivastava <praan@google.com>
To: David Matlack <dmatlack@google.com>
Cc: kexec@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org,
	Adithya Jayachandran <ajayachandra@nvidia.com>,
	Alexander Graf <graf@amazon.com>,
	Alex Williamson <alex@shazbot.org>,
	Bjorn Helgaas <bhelgaas@google.com>, Chris Li <chrisl@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Jacob Pan <jacob.pan@linux.microsoft.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Jonathan Corbet <corbet@lwn.net>, Josh Hilke <jrhilke@google.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Lukas Wunner <lukas@wunner.de>, Mike Rapoport <rppt@kernel.org>,
	Parav Pandit <parav@nvidia.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Pratyush Yadav <pratyush@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Samiullah Khawaja <skhawaja@google.com>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Vipin Sharma <vipinsh@google.com>, William Tu <witu@nvidia.com>,
	Yi Liu <yi.l.liu@intel.com>
Subject: Re: [PATCH v6 03/12] PCI: liveupdate: Track incoming preserved PCI devices
Date: Sat, 6 Jun 2026 10:08:28 +0000
Message-ID: <aiPxVxu2sUVQfG9D@google.com>
In-Reply-To: <20260522202410.3104264-4-dmatlack@google.com>

On Fri, May 22, 2026 at 08:24:01PM +0000, David Matlack wrote:
> During PCI enumeration, the previous kernel might have passed state about
> devices that were preserved across kexec. The PCI core needs to fetch
> this state to identify which devices are "incoming" and require special
> handling.
> 
> Add pci_liveupdate_setup_device() which is called during device setup
> to fetch the serialized state (struct pci_ser) from the Live Update
> Orchestrator. The first time this happens, pci_flb_retrieve() will run
> and convert the array of pci_dev_ser structs into an xarray so that it
> can be looked up efficiently.
> 
> If a device is found in the xarray, the PCI core stores a pointer to its
> state in dev->liveupdate_incoming and holds a reference to the incoming
> FLB until pci_liveupdate_finish() is called by the driver.
> 
> This ensures proper lifecycle management for incoming preserved devices
> and allows the PCI core and drivers to apply specific Live Update
> logic to them in subsequent commits.
> 
> Drivers can check if a device is an incoming preserved device (e.g.
> during probe) by calling pci_liveupdate_is_incoming().
> 
> CONFIG_64BIT is now required to enable CONFIG_PCI_LIVEUPDATE so that the
> domain and bdf can be guaranteed to fit in an unsigned long and be used
> as the xarray key.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  MAINTAINERS                    |   1 +
>  drivers/pci/Kconfig            |   2 +-
>  drivers/pci/liveupdate.c       | 230 ++++++++++++++++++++++++++++++++-
>  drivers/pci/liveupdate.h       |   5 +
>  drivers/pci/probe.c            |   3 +
>  include/linux/pci_liveupdate.h |  13 ++
>  6 files changed, 251 insertions(+), 3 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6c618830cf61..0e262c0ceb43 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -20537,6 +20537,7 @@ L:	linux-pci@vger.kernel.org
>  S:	Maintained
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
>  F:	drivers/pci/liveupdate.c
> +F:	drivers/pci/liveupdate.h
>  F:	include/linux/kho/abi/pci.h
>  F:	include/linux/pci_liveupdate.h
>  
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 10c9b65aa242..e68ae5c172d4 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -330,7 +330,7 @@ config VGA_ARB_MAX_GPUS
>  
>  config PCI_LIVEUPDATE
>  	bool "PCI Live Update Support"
> -	depends on PCI && LIVEUPDATE
> +	depends on PCI && LIVEUPDATE && 64BIT

I see that the static assertions in Patch 1 work because of the 64BIT
enforcement here. In that case, should we have the assertions check u64?
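
To illustrate what I mean, here is a minimal userspace sketch. I am not
reproducing the actual assertions from Patch 1; the key layout below (a
32-bit domain above a 16-bit bdf) and the name example_xa_key() are
assumptions for illustration only, and the real pci_ser_xa_key() may
differ. The point is that asserting against a fixed-width u64 keeps the
check independent of the build's word size:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical key layout: 32-bit PCI domain in the upper bits and the
 * 16-bit bus/device/function in the low 16 bits. This is an assumption,
 * not the layout used by the real pci_ser_xa_key().
 */
static_assert(sizeof(uint64_t) * 8 >= 32 + 16,
	      "domain + bdf must fit in a u64 key");

/* Pack domain and bdf into a single 64-bit lookup key. */
static inline uint64_t example_xa_key(uint32_t domain, uint16_t bdf)
{
	return ((uint64_t)domain << 16) | bdf;
}
```

With the 64BIT dependency in place, unsigned long and u64 have the same
width, so a u64-based assertion would mainly serve to document the
intent.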

>  	help
>  	  Enable PCI core support for preserving PCI devices across Live
>  	  Update. This, in combination with support in a device's driver,
>

[...]

>  static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
>  {
> -	args->obj = phys_to_virt(args->data);
> +	struct pci_ser *ser = phys_to_virt(args->data);
> +	struct pci_flb_incoming *incoming;
> +	int ret = -ENOMEM;
> +	u32 i;
> +
> +	incoming = kmalloc_obj(*incoming);
> +	if (!incoming)
> +		goto err_restore_free;
> +
> +	incoming->ser = ser;
> +	xa_init(&incoming->xa);
> +
> +	for (i = 0; i < incoming->ser->max_nr_devices; i++) {
> +		struct pci_dev_ser *dev_ser = &incoming->ser->devices[i];
> +		unsigned long key;
> +
> +		if (!dev_ser->refcount)
> +			continue;
> +
> +		key = pci_ser_xa_key(dev_ser->domain, dev_ser->bdf);
> +		ret = xa_insert(&incoming->xa, key, dev_ser, GFP_KERNEL);
> +		if (ret)
> +			goto err_xa_destroy;
> +	}
> +
> +	args->obj = incoming;
>  	return 0;
> +
> +err_xa_destroy:
> +	xa_destroy(&incoming->xa);
> +	kfree(incoming);
> +err_restore_free:
> +	kho_restore_free(ser);

I partly agree with Sashiko [1] here: it points out a policy hole. We
need to decide what happens when retrieve fails. The options I have in
mind are:

1. Retrieve is attempted ONLY once. If it fails (e.g. -ENOMEM from the
   xarray allocation), the liveupdate has failed and cannot be retried.

2. Retrying retrieve is allowed.

The only downside of option 1 is that the user may want flexibility
because of certain subsystems, or may choose NOT to use the proposed LUOd
and instead run its own user-space component, which might try funny
things or have a different use case.

In such a situation, the system may have transiently run out of memory
during the kexec transition (e.g. a subsystem uses GFP_ATOMIC to
allocate memory and temporarily runs out of the atomic pool). [Note: we
removed it in IOMMU v1 [2], but subsystems may have a use case for it.]

If the kernel frees the KHO page on the first failure, it removes any
chance of recovery. :/

So it might make sense to let the user decide whether to fail the
liveupdate or retry, based on the failure type / source?
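
Roughly, option 2 could look like the userspace sketch below. Everything
prefixed with example_ is a hypothetical stand-in (not the real LUO/KHO
types or API), and the simulated allocator exists only to model a
transient -ENOMEM. The retrieve step leaves the preserved data alone on
failure, and the caller decides when to give up and release it:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real LUO/KHO structures. */
struct example_preserved {
	bool freed;		/* set once the preserved data is released */
	int nr_devices;
};

struct example_incoming {
	struct example_preserved *ser;
};

/* Simulated allocator: fails while fail_budget is positive. */
static int fail_budget;

static void *example_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/* Retrieve that does NOT release the preserved data on failure. */
static int example_retrieve(struct example_preserved *ser,
			    struct example_incoming **out)
{
	struct example_incoming *incoming;

	assert(!ser->freed);

	incoming = example_alloc(sizeof(*incoming));
	if (!incoming)
		return -ENOMEM;	/* ser stays intact for a later retry */

	incoming->ser = ser;
	*out = incoming;
	return 0;
}

/* Caller-side policy: retry up to max_tries, then release the data. */
static int example_retrieve_with_retry(struct example_preserved *ser,
				       struct example_incoming **out,
				       int max_tries)
{
	int ret = -ENOMEM;

	for (int i = 0; i < max_tries; i++) {
		ret = example_retrieve(ser, out);
		if (ret != -ENOMEM)
			return ret;
	}

	ser->freed = true;	/* stands in for kho_restore_free() */
	return ret;
}
```

The retry count here is just a placeholder for whatever policy the user
(or LUOd) picks. What matters is that the free happens at the point
where the policy gives up, not inside the first failed retrieve.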

[...]

The changes LGTM, except for the policy discussion around
kho_restore_free().

Reviewed-by: Pranjal Shrivastava <praan@google.com>

Thanks,
Praan

[1] https://lore.kernel.org/all/20260522211333.D56A21F000E9@smtp.kernel.org/
[2] https://lore.kernel.org/all/20260203220948.2176157-2-skhawaja@google.com/
