From: Steven Sistare <steven.sistare@oracle.com>
To: Fam Zheng <fam.zheng@bytedance.com>
Cc: "Jason Zeng" <jason.zeng@linux.intel.com>,
"Juan Quintela" <quintela@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Zheng Chuan" <zhengchuan@huawei.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Markus Armbruster" <armbru@redhat.com>
Subject: Re: [PATCH V6 21/27] vfio-pci: cpr part 3 (intx)
Date: Mon, 11 Apr 2022 12:23:50 -0400 [thread overview]
Message-ID: <f18cf0e7-bb4f-1b43-ee2b-5e00c0c282d6@oracle.com> (raw)
In-Reply-To: <20220329110326.GA447081@fam-dell>
On 3/29/2022 7:03 AM, Fam Zheng wrote:
> On 2021-08-06 14:43, Steve Sistare wrote:
>> Preserve vfio INTX state across cpr restart. Preserve VFIOINTx fields as
>> follows:
>> pin : Recover this from the vfio config in kernel space
>> interrupt : Preserve its eventfd descriptor across exec.
>> unmask : Ditto
>> route.irq : This could perhaps be recovered in vfio_pci_post_load by
>> calling pci_device_route_intx_to_irq(pin), whose implementation reads
>> config space for a bridge device such as ich9. However, there is no
>> guarantee that the bridge vmstate is read before vfio vmstate. Rather
>> than fiddling with MigrationPriority for vmstate handlers, explicitly
>> save route.irq in vfio vmstate.
>> pending : save in vfio vmstate.
>> mmap_timeout, mmap_timer : Re-initialize
>> bool kvm_accel : Re-initialize
>>
>> In vfio_realize, defer calling vfio_intx_enable until the vmstate
>> is available, in vfio_pci_post_load. Modify vfio_intx_enable and
>> vfio_intx_kvm_enable to skip vfio initialization, but still perform
>> kvm initialization.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> Hi Steve,
>
> Not directly related to this patch, but since the context is close: it looks
> like this series only takes care of exec restart mode of vfio-pci, have you had
> any thoughts on kexec reboot mode with vfio-pci?
>
> The general idea is if DMAR context is not lost during kexec, we should be able
> to set up irqfds again and things will just work?
>
> Fam
Hi Fam,
I have thought about that use case, but only in general terms.
IMO it best fits in the cpr framework as a new mode (rather than as
a new -restore command line argument).
In your code below, you would have fewer code changes if you set
'reused = true' for the new mode, rather than testing both 'reused and restored'
at multiple sites. Lastly, I cleaned up the vector handling somewhat from V6
to V7, so you may want to try your code using V7 as a base.
- Steve
> PS some more info below:
>
> I have some local kernel patches to kexec reboot most part of the host kernel
> while keeping IOMMU DMAR tables in a valid state; with that, not many extra
> things are needed in addition to restore it. A PoC is like below (I can share
> more details of the kernel changes if this patch makes any sense):
>
>
> commit f8951e58be86bd6e37f816394a9a73f28d8059fc
> Author: Fam Zheng <fam.zheng@bytedance.com>
> Date: Mon Mar 21 13:19:49 2022 +0000
>
> cpr: Add live-update support to vfio-pci devices
>
> In cpr-save, always serialize the vfio-pci states.
>
> In cpr-load, add a '-restore' mode that will do
> VFIO_GROUP_GET_DEVICE_FD_INTACT and skip DMAR setup, somewhat similar to
> the current cpr exec mode.
>
> Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
>
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 73f4259556..e36f0ef97d 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -584,10 +584,15 @@ void msix_init_vector_notifiers(PCIDevice *dev,
> MSIVectorReleaseNotifier release_notifier,
> MSIVectorPollNotifier poll_notifier)
> {
> + int vector;
> +
> assert(use_notifier && release_notifier);
> dev->msix_vector_use_notifier = use_notifier;
> dev->msix_vector_release_notifier = release_notifier;
> dev->msix_vector_poll_notifier = poll_notifier;
> + for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> + msix_handle_mask_update(dev, vector, true);
> + }
> }
>
> int msix_set_vector_notifiers(PCIDevice *dev,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 605ffbb5d0..f1240410a8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -2066,6 +2066,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> bool reused;
> VFIOAddressSpace *space;
>
> + if (restore) {
> + return 0;
> + }
> space = vfio_get_address_space(as);
> fd = cpr_find_fd("vfio_container_for_group", group->groupid);
> reused = (fd > 0);
> @@ -2486,7 +2489,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
> fd = cpr_find_fd(name, 0);
> reused = (fd >= 0);
> if (!reused) {
> - fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> + int op = restore ? VFIO_GROUP_GET_DEVICE_FD_INTACT : VFIO_GROUP_GET_DEVICE_FD;
> + fd = ioctl(group->fd, op, name);
> }
>
> if (fd < 0) {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e32513c668..9da5f93228 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -361,7 +361,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
> * Do not alter interrupt state during vfio_realize and cpr-load. The
> * reused flag is cleared thereafter.
> */
> - if (!vdev->pdev.reused) {
> + if (!vdev->pdev.reused && !restore) {
> vfio_disable_interrupts(vdev);
> }
>
> @@ -388,7 +388,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
> fd = event_notifier_get_fd(&vdev->intx.interrupt);
> qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);
>
> - if (vdev->pdev.reused) {
> + if (vdev->pdev.reused && !restore) {
> vfio_intx_reenable_kvm(vdev, &err);
> goto finish;
> }
> @@ -2326,6 +2326,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> int ret, i, count;
> bool multi = false;
>
> + if (restore) {
> + return 0;
> + }
> trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
>
> if (!single) {
> @@ -3185,7 +3188,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>
> /* Wait until cpr-load reads intx routing data to enable */
> - if (!pdev->reused) {
> + if (!pdev->reused && !restore) {
> ret = vfio_intx_enable(vdev, errp);
> if (ret) {
> goto out_deregister;
> @@ -3295,7 +3298,7 @@ static void vfio_pci_reset(DeviceState *dev)
> VFIOPCIDevice *vdev = VFIO_PCI(dev);
>
> /* Do not reset the device during qemu_system_reset prior to cpr-load */
> - if (vdev->pdev.reused) {
> + if (vdev->pdev.reused || restore) {
> return;
> }
>
> @@ -3429,33 +3432,40 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
>
> static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix)
> {
> - int i, fd;
> + int i, fd, ret;
> bool pending = false;
> PCIDevice *pdev = &vdev->pdev;
>
> + pdev->msix_entries_nr = nr_vectors;
> vdev->nr_vectors = nr_vectors;
> vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
> vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
>
> - for (i = 0; i < nr_vectors; i++) {
> - VFIOMSIVector *vector = &vdev->msi_vectors[i];
> -
> - fd = load_event_fd(vdev, "interrupt", i);
> - if (fd >= 0) {
> - vfio_vector_init(vdev, i);
> - qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
> + if (restore) {
> + ret = vfio_enable_vectors(vdev, true);
> + if (ret) {
> + error_report("vfio: failed to enable vectors, %d", ret);
> }
> + } else {
> + for (i = 0; i < nr_vectors; i++) {
> + VFIOMSIVector *vector = &vdev->msi_vectors[i];
>
> - if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) {
> - vfio_add_kvm_msi_virq(vdev, vector, i, msix);
> - }
> + fd = load_event_fd(vdev, "interrupt", i);
> + if (fd >= 0) {
> + vfio_vector_init(vdev, i);
> + qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
> + }
>
> - if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
> - set_bit(i, vdev->msix->pending);
> - pending = true;
> + if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) {
> + vfio_add_kvm_msi_virq(vdev, vector, i, msix);
> + }
> +
> + if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
> + set_bit(i, vdev->msix->pending);
> + pending = true;
> + }
> }
> }
> -
> if (msix) {
> memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
> }
> @@ -3534,7 +3544,7 @@ static const VMStateDescription vfio_intx_vmstate = {
>
> static bool vfio_pci_needed(void *opaque)
> {
> - return cpr_get_mode() == CPR_MODE_RESTART;
> + return 1;
> }
>
> static const VMStateDescription vfio_pci_vmstate = {
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 6241c20fb1..0179b0aa90 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -26,6 +26,7 @@ void configure_rtc(QemuOpts *opts);
> void qemu_init_subsystems(void);
>
> extern int autostart;
> +extern int restore;
>
> typedef enum {
> VGA_NONE, VGA_STD, VGA_CIRRUS, VGA_VMWARE, VGA_XENFB, VGA_QXL,
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index e680594f27..65c3bab074 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -188,6 +188,8 @@ struct vfio_group_status {
> */
> #define VFIO_GROUP_GET_DEVICE_FD _IO(VFIO_TYPE, VFIO_BASE + 6)
>
> +#define VFIO_GROUP_GET_DEVICE_FD_INTACT _IO(VFIO_TYPE, VFIO_BASE + 21)
> +
> /* --------------- IOCTLs for DEVICE file descriptors --------------- */
>
> /**
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8b90d04cb9..03666a59b3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3984,6 +3984,10 @@ SRST
> option is experimental.
> ERST
>
> +DEF("restore", 0, QEMU_OPTION_restore, \
> + "-restore restore mode",
> + QEMU_ARCH_ALL)
> +
> DEF("S", 0, QEMU_OPTION_S, \
> "-S freeze CPU at startup (use 'c' to start execution)\n",
> QEMU_ARCH_ALL)
> diff --git a/softmmu/globals.c b/softmmu/globals.c
> index a18fd8dcf3..6fcb5846b4 100644
> --- a/softmmu/globals.c
> +++ b/softmmu/globals.c
> @@ -41,6 +41,7 @@ bool enable_cpu_pm;
> int nb_nics;
> NICInfo nd_table[MAX_NICS];
> int autostart = 1;
> +int restore;
> int vga_interface_type = VGA_NONE;
> Chardev *parallel_hds[MAX_PARALLEL_PORTS];
> int win2k_install_hack;
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index f14e29e622..fba6b577cb 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3088,6 +3088,9 @@ void qemu_init(int argc, char **argv, char **envp)
> case QEMU_OPTION_S:
> autostart = 0;
> break;
> + case QEMU_OPTION_restore:
> + restore = 1;
> + break;
> case QEMU_OPTION_k:
> keyboard_layout = optarg;
> break;
next prev parent reply other threads:[~2022-04-11 16:27 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-06 21:43 [PATCH V6 00/27] Live Update Steve Sistare
2021-08-06 21:43 ` [PATCH V6 01/27] memory: qemu_check_ram_volatile Steve Sistare
2021-08-06 21:43 ` [PATCH V6 02/27] migration: fix populate_vfio_info Steve Sistare
2021-08-06 21:43 ` [PATCH V6 03/27] migration: qemu file wrappers Steve Sistare
2021-08-06 21:43 ` [PATCH V6 04/27] migration: simplify savevm Steve Sistare
2021-08-06 21:43 ` [PATCH V6 05/27] vl: start on wakeup request Steve Sistare
2021-08-06 21:43 ` [PATCH V6 06/27] cpr: reboot mode Steve Sistare
2021-08-06 21:43 ` [PATCH V6 07/27] cpr: reboot HMP interfaces Steve Sistare
2021-08-06 21:43 ` [PATCH V6 08/27] memory: flat section iterator Steve Sistare
2021-08-06 21:43 ` [PATCH V6 09/27] oslib: qemu_clear_cloexec Steve Sistare
2021-08-06 21:43 ` [PATCH V6 10/27] machine: memfd-alloc option Steve Sistare
2021-08-06 21:43 ` [PATCH V6 11/27] qapi: list utility functions Steve Sistare
2021-08-06 21:43 ` [PATCH V6 12/27] vl: helper to request re-exec Steve Sistare
2021-08-06 21:43 ` [PATCH V6 13/27] cpr: preserve extra state Steve Sistare
2021-08-06 21:43 ` [PATCH V6 14/27] cpr: restart mode Steve Sistare
2021-08-06 21:43 ` [PATCH V6 15/27] cpr: restart HMP interfaces Steve Sistare
2021-08-06 21:43 ` [PATCH V6 16/27] hostmem-memfd: cpr for memory-backend-memfd Steve Sistare
2021-08-06 21:43 ` [PATCH V6 17/27] pci: export functions for cpr Steve Sistare
2021-08-06 21:43 ` [PATCH V6 18/27] vfio-pci: refactor " Steve Sistare
2021-08-10 16:53 ` Alex Williamson
2021-08-23 16:52 ` Steven Sistare
2021-08-06 21:43 ` [PATCH V6 19/27] vfio-pci: cpr part 1 (fd and dma) Steve Sistare
2021-08-10 17:06 ` Alex Williamson
2021-08-23 19:43 ` Steven Sistare
2021-11-10 7:48 ` Zheng Chuan
2021-11-30 16:11 ` Steven Sistare
2021-08-06 21:43 ` [PATCH V6 20/27] vfio-pci: cpr part 2 (msi) Steve Sistare
2021-08-06 21:43 ` [PATCH V6 21/27] vfio-pci: cpr part 3 (intx) Steve Sistare
2022-03-29 11:03 ` Fam Zheng
2022-04-11 16:23 ` Steven Sistare [this message]
2022-04-12 11:01 ` Fam Zheng
2021-08-06 21:43 ` [PATCH V6 22/27] vhost: reset vhost devices for cpr Steve Sistare
2021-08-06 21:43 ` [PATCH V6 23/27] chardev: cpr framework Steve Sistare
2021-08-06 21:43 ` [PATCH V6 24/27] chardev: cpr for simple devices Steve Sistare
2021-08-06 21:43 ` [PATCH V6 25/27] chardev: cpr for pty Steve Sistare
2021-08-06 21:44 ` [PATCH V6 26/27] chardev: cpr for sockets Steve Sistare
2021-08-06 21:44 ` [PATCH V6 27/27] cpr: only-cpr-capable option Steve Sistare
2021-08-09 16:02 ` [PATCH V6 00/27] Live Update Steven Sistare
2021-08-21 8:54 ` Zheng Chuan
2021-08-23 21:36 ` Steven Sistare
2021-08-24 9:36 ` Zheng Chuan
2021-08-31 21:15 ` Steven Sistare
2021-10-27 6:16 ` Zheng Chuan
2021-10-27 12:25 ` Steven Sistare
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f18cf0e7-bb4f-1b43-ee2b-5e00c0c282d6@oracle.com \
--to=steven.sistare@oracle.com \
--cc=alex.bennee@linaro.org \
--cc=alex.williamson@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=eblake@redhat.com \
--cc=fam.zheng@bytedance.com \
--cc=jason.zeng@linux.intel.com \
--cc=marcandre.lureau@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=philmd@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
--cc=zhengchuan@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).