From: Yi Liu <yi.l.liu@intel.com>
To: David Matlack <dmatlack@google.com>,
	Alex Williamson <alex@shazbot.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Cc: "Adithya Jayachandran" <ajayachandra@nvidia.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Askar Safin" <safinaskar@gmail.com>,
	"Borislav Petkov (AMD)" <bp@alien8.de>,
	"Chris Li" <chrisl@kernel.org>,
	"Dapeng Mi" <dapeng1.mi@linux.intel.com>,
	"David Rientjes" <rientjes@google.com>,
	"Feng Tang" <feng.tang@linux.alibaba.com>,
	"Jacob Pan" <jacob.pan@linux.microsoft.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Josh Hilke" <jrhilke@google.com>, "Kees Cook" <kees@kernel.org>,
	"Kevin Tian" <kevin.tian@intel.com>,
	kexec@lists.infradead.org, kvm@vger.kernel.org,
	"Leon Romanovsky" <leon@kernel.org>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org, "Li RongQing" <lirongqing@baidu.com>,
	"Lukas Wunner" <lukas@wunner.de>,
	"Marco Elver" <elver@google.com>,
	"Michał Winiarski" <michal.winiarski@intel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Parav Pandit" <parav@nvidia.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Pranjal Shrivastava" <praan@google.com>,
	"Pratyush Yadav" <pratyush@kernel.org>,
	"Raghavendra Rao Ananta" <rananta@google.com>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Vipin Sharma" <vipinsh@google.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"William Tu" <witu@nvidia.com>,
	"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
Date: Tue, 24 Mar 2026 21:08:16 +0800
Message-ID: <df5dac48-8a54-49e2-acb8-9370b7078033@intel.com>
In-Reply-To: <20260323235817.1960573-8-dmatlack@google.com>

On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh@google.com>
> 
> Implement the live update file handler callbacks to preserve a vfio-pci
> device across a Live Update. Subsequent commits will enable userspace to
> then retrieve this file after the Live Update.
> 
> Live Update support is scoped only to cdev files (i.e. not
> VFIO_GROUP_GET_DEVICE_FD files).
> 
> State about each device is serialized into a new ABI struct
> vfio_pci_core_device_ser. The contents of this struct are preserved
> across the Live Update to the next kernel using a combination of
> Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
> Live Update Orchestrator (LUO) to preserve the physical address of the
> struct.
> 
> For now the only contents of struct vfio_pci_core_device_ser are the
> device's PCI segment number and BDF, so that the device can be uniquely
> identified after the Live Update.
> 
> Require that userspace disables interrupts on the device prior to
> freeze() so that the device does not send any interrupts until new
> interrupt handlers have been set up by the next kernel.
> 
> Reset the device and restore its state in the freeze() callback. This
> ensures the device can be received by the next kernel in a consistent
> state. Eventually this will be dropped and the device can be preserved
> across in a running state, but that requires further work in VFIO and
> the core PCI layer.
> 
> Note that LUO holds a reference to this file when it is preserved. So
> VFIO is guaranteed that vfio_df_device_last_close() will not be called
> on this device no matter what userspace does.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Co-developed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   drivers/vfio/pci/vfio_pci.c            |   2 +-
>   drivers/vfio/pci/vfio_pci_core.c       |  57 +++++----
>   drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
>   drivers/vfio/pci/vfio_pci_priv.h       |   4 +
>   drivers/vfio/vfio_main.c               |   3 +-
>   include/linux/kho/abi/vfio_pci.h       |  15 +++
>   include/linux/vfio.h                   |   2 +
>   7 files changed, 213 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 41dcbe4ace67..351480d13f6e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
>   	return 0;
>   }
>   
> -static const struct vfio_device_ops vfio_pci_ops = {
> +const struct vfio_device_ops vfio_pci_ops = {
>   	.name		= "vfio-pci",
>   	.init		= vfio_pci_core_init_dev,
>   	.release	= vfio_pci_core_release_dev,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..81f941323641 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>   }
>   EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
>   
> +void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	struct pci_dev *bridge = pci_upstream_bridge(pdev);
> +
> +	lockdep_assert_held(&vdev->vdev.dev_set->lock);
> +
> +	if (!vdev->reset_works)
> +		return;
> +
> +	/*
> +	 * Try to get the locks ourselves to prevent a deadlock. The
> +	 * success of this is dependent on being able to lock the device,
> +	 * which is not always possible.
> +	 *
> +	 * We cannot use the "try" reset interface here, since that will
> +	 * overwrite the previously restored configuration information.
> +	 */
> +	if (bridge && !pci_dev_trylock(bridge))
> +		return;
> +
> +	if (!pci_dev_trylock(pdev))
> +		goto out;
> +
> +	if (!__pci_reset_function_locked(pdev))
> +		vdev->needs_reset = false;
> +
> +	pci_dev_unlock(pdev);
> +out:
> +	if (bridge)
> +		pci_dev_unlock(bridge);
> +}
> +EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
> +
>   void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   {
> -	struct pci_dev *bridge;
>   	struct pci_dev *pdev = vdev->pdev;
>   	struct vfio_pci_dummy_resource *dummy_res, *tmp;
>   	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
> @@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   	 */
>   	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>   
> -	/*
> -	 * Try to get the locks ourselves to prevent a deadlock. The
> -	 * success of this is dependent on being able to lock the device,
> -	 * which is not always possible.
> -	 * We can not use the "try" reset interface here, which will
> -	 * overwrite the previously restored configuration information.
> -	 */
> -	if (vdev->reset_works) {
> -		bridge = pci_upstream_bridge(pdev);
> -		if (bridge && !pci_dev_trylock(bridge))
> -			goto out_restore_state;
> -		if (pci_dev_trylock(pdev)) {
> -			if (!__pci_reset_function_locked(pdev))
> -				vdev->needs_reset = false;
> -			pci_dev_unlock(pdev);
> -		}
> -		if (bridge)
> -			pci_dev_unlock(bridge);
> -	}
> -
> -out_restore_state:
> +	vfio_pci_core_try_reset(vdev);
>   	pci_restore_state(pdev);
>   out:
>   	pci_disable_device(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> index 5ea5af46b159..c4ebc7c486e5 100644
> --- a/drivers/vfio/pci/vfio_pci_liveupdate.c
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -6,27 +6,178 @@
>    * David Matlack <dmatlack@google.com>
>    */
>   
> +/**
> + * DOC: VFIO PCI Preservation via LUO
> + *
> + * VFIO PCI devices can be preserved over a kexec using the Live Update
> + * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
> + * to transfer an in-use device to the next kernel.
> + *
> + * .. note::
> + *    The support for preserving VFIO PCI devices is currently *partial* and
> + *    should be considered *experimental*. It should only be used by developers
> + *    working on expanding the support for the time being.
> + *
> + *    To avoid accidental usage while the support is still experimental, this
> + *    support is hidden behind a default-disable config option
> + *    ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
> + *    become complete, this option will be enabled by default when
> + *    ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
> + *
> + * Usage Example
> + * =============
> + *
> + * VFIO PCI devices can be preserved across a kexec by preserving the file
> + * associated with the device in a LUO session::
> + *
> + *   device_fd = open("/dev/vfio/devices/X");

This should be /dev/vfio/devices/vfioX.

> + *   ...
> + *   ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
> + *
> + * .. note::
> + *    LUO will hold an extra reference to the device file for as long as it is
> + *    preserved, so there is no way for the file to be destroyed or the device
> + *    to be unbound from the vfio-pci driver while it is preserved.
> + *
> + * Retrieving the file after kexec is not yet supported.
> + *
> + * Restrictions
> + * ============
> + *
> + * The kernel imposes the following restrictions when preserving VFIO devices:
> + *
> + *  * The device must be bound to the ``vfio-pci`` driver.
> + *
> + *  * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
> + *    the future.
> + *
> + *  * The device must not be an Intel display device. This may be relaxed in the
> + *    future.
> + *
> + *  * The device file must have been acquired from the VFIO character device,
> + *    not ``VFIO_GROUP_GET_DEVICE_FD``.

How about "The device file descriptor must be obtained by opening the
VFIO device character device (``/dev/vfio/devices/vfioX``), not via
``VFIO_GROUP_GET_DEVICE_FD``."?

That would align with the wording in vfio.rst:

"Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
user can now acquire a device fd by directly opening a character device
/dev/vfio/devices/vfioX"
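
For illustration only, a userspace sketch of obtaining a device fd through
the character device path quoted above. The helper names vfio_cdev_path()
and vfio_cdev_open() are hypothetical and not part of this series or of any
VFIO API; the sketch only assumes the /dev/vfio/devices/vfioX naming from
vfio.rst and the standard open(2) call.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: format the cdev path for VFIO device index X,
 * i.e. "/dev/vfio/devices/vfioX". Returns 0 on success, -1 if the
 * buffer is too small or formatting fails.
 */
static int vfio_cdev_path(char *buf, size_t len, unsigned int index)
{
	int n = snprintf(buf, len, "/dev/vfio/devices/vfio%u", index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: open the VFIO character device directly, rather
 * than going through VFIO_GROUP_GET_DEVICE_FD. Returns the fd, or -1 on
 * error (with errno set by open(2) when the open itself fails).
 */
static int vfio_cdev_open(unsigned int index)
{
	char path[64];

	if (vfio_cdev_path(path, sizeof(path), index))
		return -1;

	return open(path, O_RDWR);
}
```

An fd returned this way is the kind of file the restriction refers to; an fd
returned by VFIO_GROUP_GET_DEVICE_FD is not.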

> + *
> + *  * The device must have interrupts disabled prior to kexec. Failure to disable
> + *    interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
> + *    syscall (to initiate the kexec) to fail.
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The eventual goal of this support is to avoid disrupting the workload, state,
> + * or configuration of each preserved device during a Live Update. This would
> + * include allowing the device to perform DMA to preserved memory buffers and
> + * perform P2P DMA to other preserved devices. However, there are many pieces
> + * that still need to land in the kernel.
> + *
> + * For now, VFIO only preserves the following state for devices:
> + *
> + *  * The PCI Segment, Bus, Device, and Function numbers of the device. The
> + *    kernel guarantees that these will not change across a kexec when a device
> + *    is preserved.
> + *
> + * Since the kernel is not yet prepared to preserve all parts of the device and
> + * its dependencies (such as DMA mappings), VFIO currently resets and restores
> + * preserved devices back into an idle state during kexec, before handing off
> + * control to the next kernel. This will be relaxed in future versions of the
> + * kernel once it is safe to allow the device to keep running across kexec.
> + */
> +
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
> +#include <linux/kexec_handover.h>
>   #include <linux/kho/abi/vfio_pci.h>
>   #include <linux/liveupdate.h>
>   #include <linux/errno.h>
> +#include <linux/vfio.h>

Maybe keep the includes in alphabetical order; errno.h would then move to
the top.

Regards,
Yi Liu


