qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Yuval Shaia <yuval.shaia@oracle.com>
To: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC v2 2/2] hw/pvrdma: add live migration support
Date: Mon, 8 Jul 2019 08:13:39 +0300	[thread overview]
Message-ID: <20190708051338.GA5441@lap1> (raw)
In-Reply-To: <20190706040940.7884-3-skrtbhtngr@gmail.com>

On Sat, Jul 06, 2019 at 09:39:40AM +0530, Sukrit Bhatnagar wrote:
> Use VMStateDescription for migrating device state. Currently,

What do you mean by 'Currently'?

> 'vmstate_pvrdma' describes the PCI and MSIX state for pvrdma and
> 'vmstate_pvrdma_dsr_dma' describes a temporary state containing
> some values obtained only after mapping to dsr in the source.
> Since the dsr will not be available on dest until we map to the
> dma address we had on source, these values cannot be migrated
> directly.
> 
> Add PVRDMAMigTmp to store this temporary state which consists of
> dma addresses and ring page information. The 'parent' member is
> used to refer to the device state (PVRDMADev) so that parent PCI
> device object is accessible, which is needed to remap to DSR.
> 
> pvrdma_dsr_dma_pre_save() saves the dsr state into this temporary
> representation and pvrdma_dsr_dma_post_load() loads it back.
> This load function also remaps to the dsr and and calls
> load_dsr() for further map and ring init operations.
> 
> Please note that this call to load_dsr() can be removed from the
> migration flow and included in pvrdma_regs_write() to perform a
> lazy load.

The lazy load was suggested to overcome a potential problem with mapping to
addresses while still in migration process. With that been solved i would
suggest to drop the idea of lazy load.

> As of now, migration will fail if there in an error in load_dsr().
> Also, there might be a considerable amount of pages in the rings,
> which will have dma map operations when the init functions are
> called.
> If this takes noticeable time, it might be better to have lazy
> load instead.

Yeah, make sense but i hope we will not get to this.

> 
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 87 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 6c90db96f9..4a10bd2fc7 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/sysemu.h"
>  #include "monitor/monitor.h"
>  #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>  
>  #include "../rdma_rm.h"
>  #include "../rdma_backend.h"
> @@ -593,6 +594,91 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
>      pvrdma_fini(pci_dev);
>  }
>  
> +struct PVRDMAMigTmp {
> +    PVRDMADev *parent;
> +    uint64_t dma;
> +    uint64_t cmd_slot_dma;
> +    uint64_t resp_slot_dma;
> +    uint32_t cq_ring_pages_num_pages;
> +    uint64_t cq_ring_pages_pdir_dma;
> +    uint32_t async_ring_pages_num_pages;
> +    uint64_t async_ring_pages_pdir_dma;
> +};
> +
> +static int pvrdma_dsr_dma_pre_save(void *opaque)
> +{
> +    struct PVRDMAMigTmp *tmp = opaque;

For me tmp sounds like a very bad name, if it is the convention then i can
live with that, anyways suggesting something like mig_tmp_info or something
like that.

> +    DSRInfo *dsr_info = &tmp->parent->dsr_info;

Can you shad some light on how the parent field is initialized with the
pointer to the device object?

> +    struct pvrdma_device_shared_region *dsr = dsr_info->dsr;
> +
> +    tmp->dma = dsr_info->dma;
> +    tmp->cmd_slot_dma = dsr->cmd_slot_dma;
> +    tmp->resp_slot_dma = dsr->resp_slot_dma;
> +    tmp->cq_ring_pages_num_pages = dsr->cq_ring_pages.num_pages;
> +    tmp->cq_ring_pages_pdir_dma = dsr->cq_ring_pages.pdir_dma;
> +    tmp->async_ring_pages_num_pages = dsr->async_ring_pages.num_pages;
> +    tmp->async_ring_pages_pdir_dma = dsr->async_ring_pages.pdir_dma;
> +
> +    return 0;
> +}
> +
> +static int pvrdma_dsr_dma_post_load(void *opaque, int version_id)
> +{
> +    struct PVRDMAMigTmp *tmp = opaque;
> +    PVRDMADev *dev = tmp->parent;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    DSRInfo *dsr_info = &dev->dsr_info;
> +    struct pvrdma_device_shared_region *dsr;
> +
> +    dsr_info->dma = tmp->dma;
> +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> +                                sizeof(struct pvrdma_device_shared_region));
> +    if (!dsr_info->dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -ENOMEM;

Will this cause the VM on source host to continue functioning besides
failing the migration?

> +    }
> +
> +    dsr = dsr_info->dsr;
> +    dsr->cmd_slot_dma = tmp->cmd_slot_dma;
> +    dsr->resp_slot_dma = tmp->resp_slot_dma;
> +    dsr->cq_ring_pages.num_pages = tmp->cq_ring_pages_num_pages;
> +    dsr->cq_ring_pages.pdir_dma = tmp->cq_ring_pages_pdir_dma;
> +    dsr->async_ring_pages.num_pages = tmp->async_ring_pages_num_pages;
> +    dsr->async_ring_pages.pdir_dma = tmp->async_ring_pages_pdir_dma;

I expect the above 6 fields to be already populated with the correct values
as we just map to driver's DSR that should be migrated as part of memory
copy of source to dest host.
Can you verify it and if my assumptions are correct to remove these
assignments (and the corresponding from pre_save)?

> +
> +    return load_dsr(dev);
> +}
> +
> +static const VMStateDescription vmstate_pvrdma_dsr_dma = {
> +    .name = "pvrdma-dsr-dma",
> +    .pre_save = pvrdma_dsr_dma_pre_save,
> +    .post_load = pvrdma_dsr_dma_post_load,

I'm looking for a hook that is triggered just before leaving the source
host so we can do some needed cleanups such as unmapping the DSR, removing
IP addresses from the host etc.

> +    .fields = (VMStateField[]) {
> +            VMSTATE_UINT64(dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(cmd_slot_dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(resp_slot_dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT32(async_ring_pages_num_pages, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(async_ring_pages_pdir_dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT32(cq_ring_pages_num_pages, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(cq_ring_pages_pdir_dma, struct PVRDMAMigTmp),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_pvrdma = {
> +    .name = "pvrdma",

Suggesting to use the already defined constant PVRDMA_HW_NAME.

> +    .version_id = 1,
> +    .minimum_version_id = 1,

Hmmm...interesting, what's the use of these fields?

> +    .fields = (VMStateField[]) {
> +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> +            VMSTATE_WITH_TMP(PVRDMADev,
> +                             struct PVRDMAMigTmp,
> +                             vmstate_pvrdma_dsr_dma),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>  {
>      int rc = 0;
> @@ -688,6 +774,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
>  
>      dc->desc = "RDMA Device";
>      dc->props = pvrdma_dev_properties;
> +    dc->vmsd = &vmstate_pvrdma;
>      set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);

Besides the above comments looks like a great job, thanks!

>  
>      ir->print_statistics = pvrdma_print_statistics;
> -- 
> 2.21.0
> 
> 


  reply	other threads:[~2019-07-08  5:14 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-06  4:09 [Qemu-devel] [RFC v2 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
2019-07-06  4:09 ` [Qemu-devel] [RFC v2 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
2019-07-06  4:09 ` [Qemu-devel] [RFC v2 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
2019-07-08  5:13   ` Yuval Shaia [this message]
2019-07-10 20:14     ` Sukrit Bhatnagar
2019-07-06 19:04 ` [Qemu-devel] [RFC v2 0/2] Add live migration support in the PVRDMA device Marcel Apfelbaum
2019-07-08  9:38   ` Daniel P. Berrangé
2019-07-08 18:58     ` Marcel Apfelbaum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190708051338.GA5441@lap1 \
    --to=yuval.shaia@oracle.com \
    --cc=qemu-devel@nongnu.org \
    --cc=skrtbhtngr@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).