qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Fam Zheng <fam.zheng@bytedance.com>
Cc: "Jason Zeng" <jason.zeng@linux.intel.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Zheng Chuan" <zhengchuan@huawei.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Markus Armbruster" <armbru@redhat.com>
Subject: Re: [PATCH V6 21/27] vfio-pci: cpr part 3 (intx)
Date: Mon, 11 Apr 2022 12:23:50 -0400	[thread overview]
Message-ID: <f18cf0e7-bb4f-1b43-ee2b-5e00c0c282d6@oracle.com> (raw)
In-Reply-To: <20220329110326.GA447081@fam-dell>

On 3/29/2022 7:03 AM, Fam Zheng wrote:
> On 2021-08-06 14:43, Steve Sistare wrote:
>> Preserve vfio INTX state across cpr restart.  Preserve VFIOINTx fields as
>> follows:
>>   pin : Recover this from the vfio config in kernel space
>>   interrupt : Preserve its eventfd descriptor across exec.
>>   unmask : Ditto
>>   route.irq : This could perhaps be recovered in vfio_pci_post_load by
>>     calling pci_device_route_intx_to_irq(pin), whose implementation reads
>>     config space for a bridge device such as ich9.  However, there is no
>>     guarantee that the bridge vmstate is read before vfio vmstate.  Rather
>>     than fiddling with MigrationPriority for vmstate handlers, explicitly
>>     save route.irq in vfio vmstate.
>>   pending : save in vfio vmstate.
>>   mmap_timeout, mmap_timer : Re-initialize
>>   bool kvm_accel : Re-initialize
>>
>> In vfio_realize, defer calling vfio_intx_enable until the vmstate
>> is available, in vfio_pci_post_load.  Modify vfio_intx_enable and
>> vfio_intx_kvm_enable to skip vfio initialization, but still perform
>> kvm initialization.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> Hi Steve,
> 
> Not directly related to this patch, but since the context is close: it looks
> like this series only takes care of exec restart mode of vfio-pci, have you had
> any thoughts on kexec reboot mode with vfio-pci?
> 
> The general idea is if DMAR context is not lost during kexec, we should be able
> to set up irqfds again and things will just work?
> 
> Fam

Hi Fam,
  I have thought about that use case, but only in general terms.
IMO it best fits in the cpr framework as a new mode (rather than as 
a new -restore command line argument).  

In your code below, you would have fewer code changes if you set 
'reused = true' for the new mode, rather than testing both 'reused and restored' 
at multiple sites. Lastly, I cleaned up the vector handling somewhat from V6 
to V7, so you may want to try your code using V7 as a base.

- Steve

> PS some more info below:
> 
> I have some local kernel patches to kexec reboot most part of the host kernel
> while keeping IOMMU DMAR tables in a valid state; with that, not many extra
> things are needed in addition to restore it. A PoC is like below (I can share
> more details of the kernel changes if this patch makes any sense):
> 
> 
> commit f8951e58be86bd6e37f816394a9a73f28d8059fc
> Author: Fam Zheng <fam.zheng@bytedance.com>
> Date:   Mon Mar 21 13:19:49 2022 +0000
> 
>     cpr: Add live-update support to vfio-pci devices
>     
>     In cpr-save, always serialize the vfio-pci states.
>     
>     In cpr-load, add a '-restore' mode that will do
>     VFIO_GROUP_GET_DEVICE_FD_INTACT and skip DMAR setup, somewhat similar to
>     the current cpr exec mode.
>     
>     Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 73f4259556..e36f0ef97d 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -584,10 +584,15 @@ void msix_init_vector_notifiers(PCIDevice *dev,
>                                  MSIVectorReleaseNotifier release_notifier,
>                                  MSIVectorPollNotifier poll_notifier)
>  {
> +    int vector;
> +
>      assert(use_notifier && release_notifier);
>      dev->msix_vector_use_notifier = use_notifier;
>      dev->msix_vector_release_notifier = release_notifier;
>      dev->msix_vector_poll_notifier = poll_notifier;
> +    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> +        msix_handle_mask_update(dev, vector, true);
> +    }
>  }
>  
>  int msix_set_vector_notifiers(PCIDevice *dev,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 605ffbb5d0..f1240410a8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -2066,6 +2066,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      bool reused;
>      VFIOAddressSpace *space;
>  
> +    if (restore) {
> +        return 0;
> +    }
>      space = vfio_get_address_space(as);
>      fd = cpr_find_fd("vfio_container_for_group", group->groupid);
>      reused = (fd > 0);
> @@ -2486,7 +2489,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
>      fd = cpr_find_fd(name, 0);
>      reused = (fd >= 0);
>      if (!reused) {
> -        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> +        int op = restore ? VFIO_GROUP_GET_DEVICE_FD_INTACT : VFIO_GROUP_GET_DEVICE_FD;
> +        fd = ioctl(group->fd, op, name);
>      }
>  
>      if (fd < 0) {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e32513c668..9da5f93228 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -361,7 +361,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
>       * Do not alter interrupt state during vfio_realize and cpr-load.  The
>       * reused flag is cleared thereafter.
>       */
> -    if (!vdev->pdev.reused) {
> +    if (!vdev->pdev.reused && !restore) {
>          vfio_disable_interrupts(vdev);
>      }
>  
> @@ -388,7 +388,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
>      fd = event_notifier_get_fd(&vdev->intx.interrupt);
>      qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);
>  
> -    if (vdev->pdev.reused) {
> +    if (vdev->pdev.reused && !restore) {
>          vfio_intx_reenable_kvm(vdev, &err);
>          goto finish;
>      }
> @@ -2326,6 +2326,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>      int ret, i, count;
>      bool multi = false;
>  
> +    if (restore) {
> +        return 0;
> +    }
>      trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
>  
>      if (!single) {
> @@ -3185,7 +3188,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>  
>          /* Wait until cpr-load reads intx routing data to enable */
> -        if (!pdev->reused) {
> +        if (!pdev->reused && !restore) {
>              ret = vfio_intx_enable(vdev, errp);
>              if (ret) {
>                  goto out_deregister;
> @@ -3295,7 +3298,7 @@ static void vfio_pci_reset(DeviceState *dev)
>      VFIOPCIDevice *vdev = VFIO_PCI(dev);
>  
>      /* Do not reset the device during qemu_system_reset prior to cpr-load */
> -    if (vdev->pdev.reused) {
> +    if (vdev->pdev.reused || restore) {
>          return;
>      }
>  
> @@ -3429,33 +3432,40 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
>  
>  static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix)
>  {
> -    int i, fd;
> +    int i, fd, ret;
>      bool pending = false;
>      PCIDevice *pdev = &vdev->pdev;
>  
> +    pdev->msix_entries_nr = nr_vectors;
>      vdev->nr_vectors = nr_vectors;
>      vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
>      vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
>  
> -    for (i = 0; i < nr_vectors; i++) {
> -        VFIOMSIVector *vector = &vdev->msi_vectors[i];
> -
> -        fd = load_event_fd(vdev, "interrupt", i);
> -        if (fd >= 0) {
> -            vfio_vector_init(vdev, i);
> -            qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
> +    if (restore) {
> +        ret = vfio_enable_vectors(vdev, true);
> +        if (ret) {
> +            error_report("vfio: failed to enable vectors, %d", ret);
>          }
> +    } else {
> +        for (i = 0; i < nr_vectors; i++) {
> +            VFIOMSIVector *vector = &vdev->msi_vectors[i];
>  
> -        if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) {
> -            vfio_add_kvm_msi_virq(vdev, vector, i, msix);
> -        }
> +            fd = load_event_fd(vdev, "interrupt", i);
> +            if (fd >= 0) {
> +                vfio_vector_init(vdev, i);
> +                qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
> +            }
>  
> -        if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
> -            set_bit(i, vdev->msix->pending);
> -            pending = true;
> +            if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) {
> +                vfio_add_kvm_msi_virq(vdev, vector, i, msix);
> +            }
> +
> +            if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
> +                set_bit(i, vdev->msix->pending);
> +                pending = true;
> +            }
>          }
>      }
> -
>      if (msix) {
>          memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
>      }
> @@ -3534,7 +3544,7 @@ static const VMStateDescription vfio_intx_vmstate = {
>  
>  static bool vfio_pci_needed(void *opaque)
>  {
> -    return cpr_get_mode() == CPR_MODE_RESTART;
> +    return 1;
>  }
>  
>  static const VMStateDescription vfio_pci_vmstate = {
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 6241c20fb1..0179b0aa90 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -26,6 +26,7 @@ void configure_rtc(QemuOpts *opts);
>  void qemu_init_subsystems(void);
>  
>  extern int autostart;
> +extern int restore;
>  
>  typedef enum {
>      VGA_NONE, VGA_STD, VGA_CIRRUS, VGA_VMWARE, VGA_XENFB, VGA_QXL,
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index e680594f27..65c3bab074 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -188,6 +188,8 @@ struct vfio_group_status {
>   */
>  #define VFIO_GROUP_GET_DEVICE_FD	_IO(VFIO_TYPE, VFIO_BASE + 6)
>  
> +#define VFIO_GROUP_GET_DEVICE_FD_INTACT	_IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
>  
>  /**
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8b90d04cb9..03666a59b3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3984,6 +3984,10 @@ SRST
>      option is experimental.
>  ERST
>  
> +DEF("restore", 0, QEMU_OPTION_restore, \
> +    "-restore              restore mode",
> +    QEMU_ARCH_ALL)
> +
>  DEF("S", 0, QEMU_OPTION_S, \
>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>      QEMU_ARCH_ALL)
> diff --git a/softmmu/globals.c b/softmmu/globals.c
> index a18fd8dcf3..6fcb5846b4 100644
> --- a/softmmu/globals.c
> +++ b/softmmu/globals.c
> @@ -41,6 +41,7 @@ bool enable_cpu_pm;
>  int nb_nics;
>  NICInfo nd_table[MAX_NICS];
>  int autostart = 1;
> +int restore;
>  int vga_interface_type = VGA_NONE;
>  Chardev *parallel_hds[MAX_PARALLEL_PORTS];
>  int win2k_install_hack;
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index f14e29e622..fba6b577cb 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3088,6 +3088,9 @@ void qemu_init(int argc, char **argv, char **envp)
>              case QEMU_OPTION_S:
>                  autostart = 0;
>                  break;
> +            case QEMU_OPTION_restore:
> +                restore = 1;
> +                break;
>              case QEMU_OPTION_k:
>                  keyboard_layout = optarg;
>                  break;


  reply	other threads:[~2022-04-11 16:27 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-06 21:43 [PATCH V6 00/27] Live Update Steve Sistare
2021-08-06 21:43 ` [PATCH V6 01/27] memory: qemu_check_ram_volatile Steve Sistare
2021-08-06 21:43 ` [PATCH V6 02/27] migration: fix populate_vfio_info Steve Sistare
2021-08-06 21:43 ` [PATCH V6 03/27] migration: qemu file wrappers Steve Sistare
2021-08-06 21:43 ` [PATCH V6 04/27] migration: simplify savevm Steve Sistare
2021-08-06 21:43 ` [PATCH V6 05/27] vl: start on wakeup request Steve Sistare
2021-08-06 21:43 ` [PATCH V6 06/27] cpr: reboot mode Steve Sistare
2021-08-06 21:43 ` [PATCH V6 07/27] cpr: reboot HMP interfaces Steve Sistare
2021-08-06 21:43 ` [PATCH V6 08/27] memory: flat section iterator Steve Sistare
2021-08-06 21:43 ` [PATCH V6 09/27] oslib: qemu_clear_cloexec Steve Sistare
2021-08-06 21:43 ` [PATCH V6 10/27] machine: memfd-alloc option Steve Sistare
2021-08-06 21:43 ` [PATCH V6 11/27] qapi: list utility functions Steve Sistare
2021-08-06 21:43 ` [PATCH V6 12/27] vl: helper to request re-exec Steve Sistare
2021-08-06 21:43 ` [PATCH V6 13/27] cpr: preserve extra state Steve Sistare
2021-08-06 21:43 ` [PATCH V6 14/27] cpr: restart mode Steve Sistare
2021-08-06 21:43 ` [PATCH V6 15/27] cpr: restart HMP interfaces Steve Sistare
2021-08-06 21:43 ` [PATCH V6 16/27] hostmem-memfd: cpr for memory-backend-memfd Steve Sistare
2021-08-06 21:43 ` [PATCH V6 17/27] pci: export functions for cpr Steve Sistare
2021-08-06 21:43 ` [PATCH V6 18/27] vfio-pci: refactor " Steve Sistare
2021-08-10 16:53   ` Alex Williamson
2021-08-23 16:52     ` Steven Sistare
2021-08-06 21:43 ` [PATCH V6 19/27] vfio-pci: cpr part 1 (fd and dma) Steve Sistare
2021-08-10 17:06   ` Alex Williamson
2021-08-23 19:43     ` Steven Sistare
2021-11-10  7:48     ` Zheng Chuan
2021-11-30 16:11       ` Steven Sistare
2021-08-06 21:43 ` [PATCH V6 20/27] vfio-pci: cpr part 2 (msi) Steve Sistare
2021-08-06 21:43 ` [PATCH V6 21/27] vfio-pci: cpr part 3 (intx) Steve Sistare
2022-03-29 11:03   ` Fam Zheng
2022-04-11 16:23     ` Steven Sistare [this message]
2022-04-12 11:01       ` Fam Zheng
2021-08-06 21:43 ` [PATCH V6 22/27] vhost: reset vhost devices for cpr Steve Sistare
2021-08-06 21:43 ` [PATCH V6 23/27] chardev: cpr framework Steve Sistare
2021-08-06 21:43 ` [PATCH V6 24/27] chardev: cpr for simple devices Steve Sistare
2021-08-06 21:43 ` [PATCH V6 25/27] chardev: cpr for pty Steve Sistare
2021-08-06 21:44 ` [PATCH V6 26/27] chardev: cpr for sockets Steve Sistare
2021-08-06 21:44 ` [PATCH V6 27/27] cpr: only-cpr-capable option Steve Sistare
2021-08-09 16:02 ` [PATCH V6 00/27] Live Update Steven Sistare
2021-08-21  8:54 ` Zheng Chuan
2021-08-23 21:36   ` Steven Sistare
2021-08-24  9:36     ` Zheng Chuan
2021-08-31 21:15       ` Steven Sistare
2021-10-27  6:16         ` Zheng Chuan
2021-10-27 12:25           ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f18cf0e7-bb4f-1b43-ee2b-5e00c0c282d6@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=alex.bennee@linaro.org \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam.zheng@bytedance.com \
    --cc=jason.zeng@linux.intel.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=zhengchuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).