From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 22 May 2026 20:24:01 +0000
In-Reply-To: <20260522202410.3104264-1-dmatlack@google.com>
Precedence: bulk
X-Mailing-List: linux-doc@vger.kernel.org
Mime-Version: 1.0
References: <20260522202410.3104264-1-dmatlack@google.com>
X-Mailer: git-send-email 2.54.0.746.g67dd491aae-goog
Message-ID: <20260522202410.3104264-4-dmatlack@google.com>
Subject: [PATCH v6 03/12] PCI: liveupdate: Track incoming preserved PCI devices
From: David Matlack
To: kexec@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-pci@vger.kernel.org
Cc: Adithya Jayachandran, Alexander Graf, Alex Williamson, Bjorn Helgaas, Chris Li, David Matlack, David Rientjes, Jacob Pan, Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky, Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin, Pranjal Shrivastava, Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan, Vipin Sharma, William Tu, Yi Liu
Content-Type: text/plain; charset="UTF-8"

During PCI enumeration, the previous kernel might have passed state
about devices that were preserved across kexec. The PCI core needs to
fetch this state to identify which devices are "incoming" and require
special handling.

Add pci_liveupdate_setup_device() which is called during device setup to
fetch the serialized state (struct pci_ser) from the Live Update
Orchestrator. The first time this happens, pci_flb_retrieve() will run
and convert the array of pci_dev_ser structs into an xarray so that
devices can be looked up efficiently.

If a device is found in the xarray, the PCI core stores a pointer to its
state in dev->liveupdate.incoming and holds a reference to the incoming
FLB until pci_liveupdate_finish() is called by the driver.

This ensures proper lifecycle management for incoming preserved devices
and allows the PCI core and drivers to apply specific Live Update logic
to them in subsequent commits.

Drivers can check if a device is an incoming preserved device (e.g.
during probe) by calling pci_liveupdate_is_incoming().

CONFIG_64BIT is now required to enable CONFIG_PCI_LIVEUPDATE so that the
domain and bdf can be guaranteed to fit in an unsigned long and be used
as the xarray key.

Signed-off-by: David Matlack
---
 MAINTAINERS                    |   1 +
 drivers/pci/Kconfig            |   2 +-
 drivers/pci/liveupdate.c       | 230 ++++++++++++++++++++++++++++++++-
 drivers/pci/liveupdate.h       |   5 +
 drivers/pci/probe.c            |   3 +
 include/linux/pci_liveupdate.h |  13 ++
 6 files changed, 251 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c618830cf61..0e262c0ceb43 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20537,6 +20537,7 @@ L:	linux-pci@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
 F:	drivers/pci/liveupdate.c
+F:	drivers/pci/liveupdate.h
 F:	include/linux/kho/abi/pci.h
 F:	include/linux/pci_liveupdate.h
 
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 10c9b65aa242..e68ae5c172d4 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -330,7 +330,7 @@ config VGA_ARB_MAX_GPUS
 
 config PCI_LIVEUPDATE
 	bool "PCI Live Update Support"
-	depends on PCI && LIVEUPDATE
+	depends on PCI && LIVEUPDATE && 64BIT
 	help
 	  Enable PCI core support for preserving PCI devices across Live
 	  Update. This, in combination with support in a device's driver,
diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
index 065d5af822f7..96c43b84532c 100644
--- a/drivers/pci/liveupdate.c
+++ b/drivers/pci/liveupdate.c
@@ -49,6 +49,20 @@
  * This allows the PCI core to keep its FLB data (struct pci_ser) up to date
  * with the list of **outgoing** preserved devices for the next kernel.
  *
+ * After kexec, whenever a device is enumerated, the PCI core will check if it
+ * is an **incoming** preserved device (i.e. preserved by the previous kernel)
+ * by checking the incoming FLB data (struct pci_ser).
+ *
+ * Drivers must notify the PCI core when an **incoming** device is done
+ * participating in the incoming Live Update with the following API:
+ *
+ *   * ``pci_liveupdate_finish(pci_dev)``
+ *
+ * The PCI core does not enforce any ordering of ``pci_liveupdate_finish()`` and
+ * ``pci_liveupdate_preserve()``. i.e. A PCI device can be **outgoing**
+ * (preserved for next kernel) and **incoming** (preserved by previous kernel)
+ * at the same time.
+ *
  * Restrictions
  * ============
  *
@@ -83,6 +97,21 @@ static struct pci_liveupdate_global pci_liveupdate = {
 	.rwsem = __RWSEM_INITIALIZER(pci_liveupdate.rwsem),
 };
 
+/**
+ * struct pci_flb_incoming - Incoming PCI FLB object
+ * @ser: The incoming struct pci_ser from the previous kernel.
+ * @xa: Xarray used to quickly lookup devices in @ser.
+ */
+struct pci_flb_incoming {
+	struct pci_ser *ser;
+	struct xarray xa;
+};
+
+static unsigned long pci_ser_xa_key(u32 domain, u16 bdf)
+{
+	return (unsigned long)domain << 16 | bdf;
+}
+
 static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
 {
 	struct pci_dev *dev = NULL;
@@ -128,13 +157,49 @@ static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)
 
 static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
 {
-	args->obj = phys_to_virt(args->data);
+	struct pci_ser *ser = phys_to_virt(args->data);
+	struct pci_flb_incoming *incoming;
+	int ret = -ENOMEM;
+	u32 i;
+
+	incoming = kmalloc_obj(*incoming);
+	if (!incoming)
+		goto err_restore_free;
+
+	incoming->ser = ser;
+	xa_init(&incoming->xa);
+
+	for (i = 0; i < incoming->ser->max_nr_devices; i++) {
+		struct pci_dev_ser *dev_ser = &incoming->ser->devices[i];
+		unsigned long key;
+
+		if (!dev_ser->refcount)
+			continue;
+
+		key = pci_ser_xa_key(dev_ser->domain, dev_ser->bdf);
+		ret = xa_insert(&incoming->xa, key, dev_ser, GFP_KERNEL);
+		if (ret)
+			goto err_xa_destroy;
+	}
+
+	args->obj = incoming;
 	return 0;
+
+err_xa_destroy:
+	xa_destroy(&incoming->xa);
+	kfree(incoming);
+err_restore_free:
+	kho_restore_free(ser);
+	return ret;
 }
 
 static void pci_flb_finish(struct liveupdate_flb_op_args *args)
 {
-	kho_restore_free(args->obj);
+	struct pci_flb_incoming *incoming = args->obj;
+
+	xa_destroy(&incoming->xa);
+	kho_restore_free(incoming->ser);
+	kfree(incoming);
 }
 
 static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
@@ -270,6 +335,91 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
 
+static struct pci_flb_incoming *pci_liveupdate_flb_get_incoming(void)
+{
+	struct pci_flb_incoming *incoming = NULL;
+	int ret;
+
+	ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&incoming);
+
+	/* Live Update is not enabled. */
+	if (ret == -EOPNOTSUPP)
+		return NULL;
+
+	/* Live Update is enabled, but there is no incoming FLB data. */
+	if (ret == -ENODATA)
+		return NULL;
+
+	/*
+	 * Live Update is enabled and there is incoming FLB data, but none of it
+	 * matches pci_liveupdate_flb.compatible.
+	 *
+	 * This could mean that no PCI FLB data was passed by the previous
+	 * kernel, but it could also mean the previous kernel used a different
+	 * compatibility string (i.e. a different ABI).
+	 */
+	if (ret == -ENOENT) {
+		pr_info_once("No incoming FLB matched %s\n", pci_liveupdate_flb.compatible);
+		return NULL;
+	}
+
+	/*
+	 * There is incoming FLB data that matches pci_liveupdate_flb.compatible
+	 * but it cannot be retrieved.
+	 */
+	if (ret) {
+		WARN_ONCE(ret, "Failed to retrieve incoming FLB data\n");
+		return NULL;
+	}
+
+	return incoming;
+}
+
+static void pci_liveupdate_flb_put_incoming(void)
+{
+	liveupdate_flb_put_incoming(&pci_liveupdate_flb);
+}
+
+void pci_liveupdate_setup_device(struct pci_dev *dev)
+{
+	struct pci_flb_incoming *incoming;
+	struct pci_dev_ser *dev_ser;
+	unsigned long key;
+
+	guard(rwsem_write)(&pci_liveupdate.rwsem);
+
+	incoming = pci_liveupdate_flb_get_incoming();
+	if (!incoming)
+		return;
+
+	key = pci_ser_xa_key(pci_domain_nr(dev->bus), pci_dev_id(dev));
+	dev_ser = xa_load(&incoming->xa, key);
+
+	/* This device was not preserved across Live Update */
+	if (!dev_ser) {
+		pci_liveupdate_flb_put_incoming();
+		return;
+	}
+
+	/*
+	 * This device was preserved, but has already been probed and gone
+	 * through pci_liveupdate_finish(). This can happen if PCI core probes
+	 * the same device multiple times, e.g. due to hotplug.
+	 */
+	if (!dev_ser->refcount) {
+		pci_liveupdate_flb_put_incoming();
+		return;
+	}
+
+	pci_info(dev, "Device was preserved by previous kernel across Live Update\n");
+	dev->liveupdate.incoming = dev_ser;
+
+	/*
+	 * Hold the ref on the incoming FLB until pci_liveupdate_finish() so
+	 * that dev->liveupdate.incoming does not get freed while it is in use.
+	 */
+}
+
 void pci_liveupdate_cleanup_device(struct pci_dev *dev)
 {
 	/*
@@ -280,7 +430,83 @@ void pci_liveupdate_cleanup_device(struct pci_dev *dev)
 		pci_WARN(dev, 1, "Destroying outgoing-preserved device!\n");
 		pci_liveupdate_unpreserve(dev);
 	}
+
+	if (READ_ONCE(dev->liveupdate.incoming)) {
+		pci_WARN(dev, 1, "Destroying incoming-preserved device!\n");
+		pci_liveupdate_finish(dev);
+	}
+}
+
+static void pci_liveupdate_finish_device(struct pci_ser *ser, struct pci_dev *dev)
+{
+	if (!dev->liveupdate.incoming) {
+		pci_warn(dev, "Cannot finish preserving an unpreserved device\n");
+		return;
+	}
+
+	if (dev->liveupdate.incoming->refcount != 1) {
+		pci_WARN(dev, 1, "Preserved device has a corrupted refcount!\n");
+		return;
+	}
+
+	/*
+	 * Drop the refcount so this device does not get treated as an incoming
+	 * device again, e.g. in case pci_liveupdate_setup_device() gets called
+	 * again because the device is hot-plugged.
+	 */
+	dev->liveupdate.incoming->refcount = 0;
+
+	pci_info(dev, "Device is finished participating in Live Update\n");
+	dev->liveupdate.incoming = NULL;
+	ser->nr_devices--;
+	pci_liveupdate_flb_put_incoming();
+}
+
+/**
+ * pci_liveupdate_finish() - Finish the preservation of a PCI device
+ * @dev: The PCI device
+ *
+ * pci_liveupdate_finish() notifies the PCI core that a PCI device that was
+ * preserved across the previous Live Update has finished participating in Live
+ * Update. Drivers must call pci_liveupdate_finish() from their struct
+ * liveupdate_file_handler finish() callback to ensure the incoming struct
+ * pci_ser is allocated.
+ */
+void pci_liveupdate_finish(struct pci_dev *dev)
+{
+	struct pci_flb_incoming *incoming;
+
+	guard(rwsem_write)(&pci_liveupdate.rwsem);
+
+	incoming = pci_liveupdate_flb_get_incoming();
+	if (!incoming) {
+		pci_warn(dev, "Cannot finish preserving device without incoming FLB\n");
+		return;
+	}
+
+	pci_liveupdate_finish_device(incoming->ser, dev);
+	pci_liveupdate_flb_put_incoming();
+}
+EXPORT_SYMBOL_GPL(pci_liveupdate_finish);
+
+/**
+ * pci_liveupdate_is_incoming() - Check if a device is incoming-preserved
+ * @dev: The PCI device to check
+ *
+ * Check if a device was preserved across Live Update by the previous kernel,
+ * i.e. the device is incoming-preserved. Note that a device is only considered
+ * incoming-preserved prior to pci_liveupdate_finish(). It is up to drivers to
+ * synchronize usage of pci_liveupdate_is_incoming() with their own call to
+ * pci_liveupdate_finish() to avoid acting on stale data.
+ *
+ * Returns: True if the device is incoming-preserved, false otherwise.
+ */
+bool pci_liveupdate_is_incoming(struct pci_dev *dev)
+{
+	guard(rwsem_read)(&pci_liveupdate.rwsem);
+	return dev->liveupdate.incoming;
 }
+EXPORT_SYMBOL_GPL(pci_liveupdate_is_incoming);
 
 /**
  * pci_liveupdate_register_flb() - Register a file handler with the PCI core
diff --git a/drivers/pci/liveupdate.h b/drivers/pci/liveupdate.h
index b2335581f8d0..eaaa3559fd77 100644
--- a/drivers/pci/liveupdate.h
+++ b/drivers/pci/liveupdate.h
@@ -11,8 +11,13 @@
 #include
 
 #ifdef CONFIG_PCI_LIVEUPDATE
+void pci_liveupdate_setup_device(struct pci_dev *dev);
 void pci_liveupdate_cleanup_device(struct pci_dev *dev);
 #else
+static inline void pci_liveupdate_setup_device(struct pci_dev *dev)
+{
+}
+
 static inline void pci_liveupdate_cleanup_device(struct pci_dev *dev)
 {
 }
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b88109a8dfe4..2e2be8af6976 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2070,6 +2070,8 @@ int pci_setup_device(struct pci_dev *dev)
 	if (pci_early_dump)
 		early_dump_pci_device(dev);
 
+	pci_liveupdate_setup_device(dev);
+
 	/* Need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
@@ -2193,6 +2195,7 @@ int pci_setup_device(struct pci_dev *dev)
 	default:				/* unknown header */
 		pci_err(dev, "unknown header type %02x, ignoring device\n",
 			dev->hdr_type);
+		pci_liveupdate_cleanup_device(dev);
 		pci_release_of_node(dev);
 		return -EIO;
 
diff --git a/include/linux/pci_liveupdate.h b/include/linux/pci_liveupdate.h
index cfcfbfa73af7..cfdc3d62ec02 100644
--- a/include/linux/pci_liveupdate.h
+++ b/include/linux/pci_liveupdate.h
@@ -16,9 +16,11 @@
 /**
  * struct pci_liveupdate - PCI Live Update state for a struct pci_dev
  * @outgoing: State preserved for the next kernel.
+ * @incoming: State preserved by the previous kernel.
  */
 struct pci_liveupdate {
 	struct pci_dev_ser *outgoing;
+	struct pci_dev_ser *incoming;
 };
 
 struct pci_dev;
@@ -28,6 +30,8 @@ int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh);
 void pci_liveupdate_unregister_flb(struct liveupdate_file_handler *fh);
 int pci_liveupdate_preserve(struct pci_dev *dev);
 void pci_liveupdate_unpreserve(struct pci_dev *dev);
+void pci_liveupdate_finish(struct pci_dev *dev);
+bool pci_liveupdate_is_incoming(struct pci_dev *dev);
 #else
 static inline int pci_liveupdate_register_flb(struct liveupdate_file_handler *fh)
 {
@@ -46,6 +50,15 @@ static inline int pci_liveupdate_preserve(struct pci_dev *dev)
 static inline void pci_liveupdate_unpreserve(struct pci_dev *dev)
 {
 }
+
+static inline void pci_liveupdate_finish(struct pci_dev *dev)
+{
+}
+
+static inline bool pci_liveupdate_is_incoming(struct pci_dev *dev)
+{
+	return false;
+}
 #endif
 
 #endif /* LINUX_PCI_LIVEUPDATE_H */
-- 
2.54.0.746.g67dd491aae-goog
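
For readers wondering why the patch makes CONFIG_PCI_LIVEUPDATE depend on 64BIT, the key packing done by pci_ser_xa_key() can be sketched in userspace. This is only an illustration, not kernel code: the sketch_* names are hypothetical, uint64_t stands in for a 64-bit unsigned long, and the "32bit" variant models what would presumably happen if unsigned long were only 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the patch's pci_ser_xa_key():
 * the 32-bit domain goes in bits 16..47 and the 16-bit bdf in bits 0..15.
 * uint64_t models a 64-bit unsigned long.
 */
static uint64_t sketch_xa_key(uint32_t domain, uint16_t bdf)
{
	return ((uint64_t)domain << 16) | bdf;
}

/* Recover the domain from a packed key (illustration only). */
static uint32_t sketch_key_domain(uint64_t key)
{
	return (uint32_t)(key >> 16);
}

/* Recover the bdf from a packed key (illustration only). */
static uint16_t sketch_key_bdf(uint64_t key)
{
	return (uint16_t)(key & 0xffff);
}

/*
 * Model of the same packing with a 32-bit unsigned long: the upper 16 bits
 * of the domain are shifted out, so distinct devices can share a key.
 */
static uint32_t sketch_xa_key_32bit(uint32_t domain, uint16_t bdf)
{
	return (uint32_t)(((uint64_t)domain << 16) | bdf);
}
```

With 48 bits of payload the 64-bit key round-trips any (domain, bdf) pair, whereas in the 32-bit model domains 0x10000 and 0 with the same bdf collide, which is the situation the new Kconfig dependency rules out.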