From: Cornelia Huck <cohuck@redhat.com>
To: Zhao Yan <yan.y.zhao@intel.com>
Cc: alex.williamson@redhat.com, qemu-devel@nongnu.org,
intel-gvt-dev@lists.freedesktop.org,
Zhengxiao.zx@Alibaba-inc.com, yi.l.liu@intel.com,
eskultet@redhat.com, ziye.yang@intel.com,
shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com,
zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com,
aik@ozlabs.ru, eauger@redhat.com, felipe@nutanix.com,
jonathan.davies@nutanix.com, changpeng.liu@intel.com,
Ken.Xue@amd.com, kwankhede@nvidia.com, kevin.tian@intel.com,
cjia@nvidia.com, arei.gonglei@huawei.com, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH 2/5] vfio/migration: support device of device config capability
Date: Thu, 21 Feb 2019 11:56:46 +0100 [thread overview]
Message-ID: <20190221115646.416df963.cohuck@redhat.com> (raw)
In-Reply-To: <20190220225400.GF16456@joy-OptiPlex-7040>
On Wed, 20 Feb 2019 17:54:00 -0500
Zhao Yan <yan.y.zhao@intel.com> wrote:
> On Tue, Feb 19, 2019 at 03:37:24PM +0100, Cornelia Huck wrote:
> > On Tue, 19 Feb 2019 16:52:27 +0800
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >
> > > Device config is the default data that every device should have. so
> > > device config capability is by default on, no need to set.
> > >
> > > - Currently two type of resources are saved/loaded for device of device
> > > config capability:
> > > General PCI config data, and Device config data.
> > > They are copies as a whole when precopy is stopped.
> > >
> > > Migration setup flow:
> > > - Setup device state regions, check its device state version and capabilities.
> > > Mmap Device Config Region and Dirty Bitmap Region, if available.
> > > - If device state regions are failed to get setup, a migration blocker is
> > > registered instead.
> > > - Added SaveVMHandlers to register device state save/load handlers.
> > > - Register VM state change handler to set device's running/stop states.
> > > - On migration startup on source machine, set device's state to
> > > VFIO_DEVICE_STATE_LOGGING
> > >
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
> > > ---
> > > hw/vfio/Makefile.objs | 2 +-
> > > hw/vfio/migration.c | 633 ++++++++++++++++++++++++++++++++++++++++++
> > > hw/vfio/pci.c | 1 -
> > > hw/vfio/pci.h | 25 +-
> > > include/hw/vfio/vfio-common.h | 1 +
> > > 5 files changed, 659 insertions(+), 3 deletions(-)
> > > create mode 100644 hw/vfio/migration.c
> > >
> > > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > > index 8b3f664..f32ff19 100644
> > > --- a/hw/vfio/Makefile.objs
> > > +++ b/hw/vfio/Makefile.objs
> > > @@ -1,6 +1,6 @@
> > > ifeq ($(CONFIG_LINUX), y)
> > > obj-$(CONFIG_SOFTMMU) += common.o
> > > -obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o
> > > +obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o migration.o
> >
> > I think you want to split the migration code: The type-independent
> > code, and the pci-specific code.
> >
> ok. actually, now only saving/loading of pci generic config data is
> pci-specific. the data getting/setting through device state
> interfaces are type-independent.
Yes. If it has capability chains, it can probably be made to work.
>
> > > obj-$(CONFIG_VFIO_CCW) += ccw.o
> > > obj-$(CONFIG_SOFTMMU) += platform.o
> > > obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
(...)
> > > +static int vfio_save_data_device_config(VFIOPCIDevice *vdev, QEMUFile *f)
> > > +{
> > > + VFIODevice *vbasedev = &vdev->vbasedev;
> > > + VFIORegion *region_ctl =
> > > + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> > > + VFIORegion *region_config =
> > > + &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
> > > + void *dest;
> > > + uint32_t sz;
> > > + uint8_t *buf = NULL;
> > > + uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
> > > + uint64_t len = vdev->migration->devconfig_size;
> > > +
> > > + qemu_put_be64(f, len);
> >
> > Why big endian? (Generally, do we need any endianness considerations?)
> >
> I think big endian is the endian for qemu to save file.
> as long as qemu_put and qemu_get use the same endian, it will be no
> problem.
> do you think so?
Yes, as long we are explicit about the endianness. I'm not sure whether
e.g. power even has the ability to mix endianness.
(...)
> > > +static int vfio_set_device_state(VFIOPCIDevice *vdev,
> > > + uint32_t dev_state)
> > > +{
> > > + VFIODevice *vbasedev = &vdev->vbasedev;
> > > + VFIORegion *region =
> > > + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> > > + uint32_t sz = sizeof(dev_state);
> > > +
> > > + if (!vdev->migration) {
> > > + return -1;
> > > + }
> > > +
> > > + if (pwrite(vbasedev->fd, &dev_state, sz,
> > > + region->fd_offset +
> > > + offsetof(struct vfio_device_state_ctl, device_state))
> > > + != sz) {
> > > + error_report("vfio: Failed to set device state %d", dev_state);
> >
> > Can the kernel reject this if a state transition is not allowed (or are
> > all transitions allowed?)
> >
> yes, kernel can reject some state transition if it's not allowed.
> But currently all transitions are allowed.
> Maybe a check of self-to-self transition is needed in kernel.
Self-to-self looks benign enough to simply return success early.
>
> > > + return -1;
> > > + }
> > > + vdev->migration->device_state = dev_state;
> > > + return 0;
> > > +}
(...)
> > > +static int vfio_check_devstate_version(VFIOPCIDevice *vdev)
> > > +{
> > > + VFIODevice *vbasedev = &vdev->vbasedev;
> > > + VFIORegion *region =
> > > + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> > > +
> > > + uint32_t version;
> > > + uint32_t size = sizeof(version);
> > > +
> > > + if (pread(vbasedev->fd, &version, size,
> > > + region->fd_offset +
> > > + offsetof(struct vfio_device_state_ctl, version))
> > > + != size) {
> > > + error_report("%s Failed to read version of device state interfaces",
> > > + vbasedev->name);
> > > + return -1;
> > > + }
> > > +
> > > + if (version != VFIO_DEVICE_STATE_INTERFACE_VERSION) {
> > > + error_report("%s migration version mismatch, right version is %d",
> > > + vbasedev->name, VFIO_DEVICE_STATE_INTERFACE_VERSION);
> >
> > So, we require an exact match... or should we allow to extend the
> > interface in an backwards-compatible way, in which case we'd require
> > (QEMU interface version) <= (kernel interface version)?
> >
> currently yes. we can discuss on that.
If we want to allow that, we need to have a strictly monotonous
progression of versions here (which means dragging along old
compatibility code for basically forever). Maintaining a table about
which version is compatible with which other version would get insane
pretty quickly.
Can we somehow accommodate more optional regions via capabilities?
Maybe via optional vmstates?
> > > + return -1;
> > > + }
> > > +
> > > + return 0;
> > > +}
(...)
> > > +static void vfio_pci_load_config(VFIOPCIDevice *vdev, QEMUFile *f)
> > > +{
> > > + PCIDevice *pdev = &vdev->pdev;
> > > + uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
> > > + bool msi_64bit;
> > > +
> > > + /* retore pci bar configuration */
> > > + ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
> > > + vfio_pci_write_config(pdev, PCI_COMMAND,
> > > + ctl & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
> > > + for (i = 0; i < PCI_ROM_SLOT; i++) {
> > > + bar_cfg = qemu_get_be32(f);
> > > + vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
> > > + }
> > > + vfio_pci_write_config(pdev, PCI_COMMAND,
> > > + ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
> > > +
> > > + /* restore msi configuration */
> > > + ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
> > > + msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
> > > +
> > > + vfio_pci_write_config(&vdev->pdev,
> > > + pdev->msi_cap + PCI_MSI_FLAGS,
> > > + ctl & (!PCI_MSI_FLAGS_ENABLE), 2);
> > > +
> > > + msi_lo = qemu_get_be32(f);
> > > + vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, msi_lo, 4);
> > > +
> > > + if (msi_64bit) {
> > > + msi_hi = qemu_get_be32(f);
> > > + vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> > > + msi_hi, 4);
> > > + }
> > > + msi_data = qemu_get_be32(f);
> > > + vfio_pci_write_config(pdev,
> > > + pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
> > > + msi_data, 2);
> > > +
> > > + vfio_pci_write_config(&vdev->pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> > > + ctl | PCI_MSI_FLAGS_ENABLE, 2);
> > > +
> >
> > Ok, this function is indeed pci-specific and probably should be moved
> > to the vfio-pci code (other types could hook themselves up in the same
> > place, then).
> >
> yes, this is the only pci-specific code.
> maybe using VFIO_DEVICE_TYPE_PCI as a sign to decide whether to save/load
> pci config data?
> or as Dave said, put saving/loading of pci config data into
> VMStateDescription interface?
I like having an interface like vmstate where other types can register
themselves better than introducing conditional handling based on
hard-coded type values.
next prev parent reply other threads:[~2019-02-21 10:57 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-19 8:50 [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration Yan Zhao
2019-02-19 8:52 ` [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces Yan Zhao
2019-02-19 13:09 ` Cornelia Huck
2019-02-20 7:36 ` Zhao Yan
2019-02-20 17:08 ` Cornelia Huck
2019-02-21 1:47 ` Zhao Yan
2019-02-19 8:52 ` [Qemu-devel] [PATCH 2/5] vfio/migration: support device of device config capability Yan Zhao
2019-02-19 11:01 ` Dr. David Alan Gilbert
2019-02-20 5:12 ` Zhao Yan
2019-02-20 10:57 ` Dr. David Alan Gilbert
2019-02-19 14:37 ` Cornelia Huck
2019-02-20 22:54 ` Zhao Yan
2019-02-21 10:56 ` Cornelia Huck [this message]
2019-02-19 8:52 ` [Qemu-devel] [PATCH 3/5] vfio/migration: tracking of dirty page in system memory Yan Zhao
2019-02-19 8:52 ` [Qemu-devel] [PATCH 4/5] vfio/migration: turn on migration Yan Zhao
2019-02-19 8:53 ` [Qemu-devel] [PATCH 5/5] vfio/migration: support device memory capability Yan Zhao
2019-02-19 11:25 ` Dr. David Alan Gilbert
2019-02-20 5:17 ` Zhao Yan
2019-02-19 14:42 ` Christophe de Dinechin
2019-02-20 7:58 ` Zhao Yan
2019-02-20 10:14 ` Christophe de Dinechin
2019-02-21 0:07 ` Zhao Yan
2019-02-19 11:32 ` [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration Dr. David Alan Gilbert
2019-02-20 5:28 ` Zhao Yan
2019-02-20 11:01 ` Dr. David Alan Gilbert
2019-02-20 11:28 ` Gonglei (Arei)
2019-02-20 11:42 ` Cornelia Huck
2019-02-20 12:07 ` Gonglei (Arei)
[not found] ` <20190327063509.GD14681@joy-OptiPlex-7040>
[not found] ` <20190327201854.GG2636@work-vm>
[not found] ` <20190327161020.1c013e65@x1.home>
2019-04-01 8:14 ` Cornelia Huck
2019-04-01 8:40 ` Yan Zhao
2019-04-01 14:15 ` Alex Williamson
2019-02-21 0:31 ` Zhao Yan
2019-02-21 9:15 ` Dr. David Alan Gilbert
2019-02-20 11:56 ` Gonglei (Arei)
2019-02-21 0:24 ` Zhao Yan
2019-02-21 1:35 ` Gonglei (Arei)
2019-02-21 1:58 ` Zhao Yan
2019-02-21 3:33 ` Gonglei (Arei)
2019-02-21 4:08 ` Zhao Yan
2019-02-21 5:46 ` Gonglei (Arei)
2019-02-21 2:04 ` Zhao Yan
2019-02-21 3:16 ` Gonglei (Arei)
2019-02-21 4:21 ` Zhao Yan
2019-02-21 5:56 ` Gonglei (Arei)
2019-02-21 20:40 ` Alex Williamson
2019-02-25 2:22 ` Zhao Yan
2019-03-06 0:22 ` Zhao Yan
2019-03-07 17:44 ` Alex Williamson
2019-03-07 23:20 ` Tian, Kevin
2019-03-08 16:11 ` Alex Williamson
2019-03-08 16:21 ` Dr. David Alan Gilbert
2019-03-08 22:02 ` Alex Williamson
2019-03-11 2:33 ` Tian, Kevin
2019-03-11 20:19 ` Alex Williamson
2019-03-12 2:48 ` Tian, Kevin
2019-03-12 2:57 ` Zhao Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190221115646.416df963.cohuck@redhat.com \
--to=cohuck@redhat.com \
--cc=Ken.Xue@amd.com \
--cc=Zhengxiao.zx@Alibaba-inc.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=arei.gonglei@huawei.com \
--cc=changpeng.liu@intel.com \
--cc=cjia@nvidia.com \
--cc=dgilbert@redhat.com \
--cc=eauger@redhat.com \
--cc=eskultet@redhat.com \
--cc=felipe@nutanix.com \
--cc=intel-gvt-dev@lists.freedesktop.org \
--cc=jonathan.davies@nutanix.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=mlevitsk@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=shuangtai.tst@alibaba-inc.com \
--cc=yan.y.zhao@intel.com \
--cc=yi.l.liu@intel.com \
--cc=zhi.a.wang@intel.com \
--cc=ziye.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).