From: Cornelia Huck <cohuck@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
cjia@nvidia.com, kevin.tian@intel.com, ziye.yang@intel.com,
changpeng.liu@intel.com, yi.l.liu@intel.com, mlevitsk@redhat.com,
eskultet@redhat.com, dgilbert@redhat.com,
jonathan.davies@nutanix.com, eauger@redhat.com, aik@ozlabs.ru,
pasic@linux.ibm.com, felipe@nutanix.com,
Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com,
Ken.Xue@amd.com, zhi.a.wang@intel.com, yan.y.zhao@intel.com,
yulei.zhang@intel.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v3 1/5] VFIO KABI for migration interface
Date: Fri, 1 Mar 2019 14:36:30 +0100 [thread overview]
Message-ID: <20190301143630.474870d2.cohuck@redhat.com> (raw)
In-Reply-To: <f15fecfe-26c0-05fa-da3b-20156145d287@nvidia.com>
On Wed, 27 Feb 2019 01:35:01 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:
> Alex,
>
> On 2/22/2019 3:53 AM, Alex Williamson wrote:
> > On Wed, 20 Feb 2019 02:53:16 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >
> >> - Defined MIGRATION region type and sub-type.
> >> - Used 2 bits to define VFIO device states.
> >> Bit 0 => 0/1 => _STOPPED/_RUNNING
> >> Bit 1 => 0/1 => _RESUMING/_SAVING
> >> Combination of these bits defines VFIO device's state during migration
> >> _RUNNING => Normal VFIO device running state.
> >> _STOPPED => VFIO device stopped.
> >> _SAVING | _RUNNING => vCPUs are running, VFIO device is running but start
> >> saving state of device i.e. pre-copy state
> >> _SAVING | _STOPPED => vCPUs are stoppped, VFIO device should be stopped, and
> >> save device state,i.e. stop-n-copy state
> >> _RESUMING => VFIO device resuming state.
> >
> > Shouldn't we have a non-_RESUMING/_SAVING run state? If these are
> > indicating directly flow, maybe we need two bits:
> >
> > 00b - None, normal runtime
> > 01b - Saving
> > 10b - Resuming
> > 11b - Invalid/reserved (maybe a Failed state indicator)
> >
>
> There has to be 2 more states:
> _SAVING | _RUNNING => SAVING while device is RUNNING - pre-copy phase
> _SAVING | _STOPPED => SAVING while device is STOPPED - stop-and-copy phase
>
> So the 2 bits used in this patch are:
> 00b - _RESUMING
> 01b - _RUNNING - Normal Running
> 10b - _SAVING | _STOPPED - stop-and-copy phase
> 11b - _SAVING | _RUNNING - pre-copy phase
Not sure if the "use two bits" approach is the most elegant here --
rightmost bit unset can mean either _RESUMING or _STOPPED...
What about simply making this four distinct states?
#define _RESUMING 0
#define _RUNNING 1 //or the other way around?
#define _SAVING_STOPPED 2
#define _SAVING_RUNNING 3
You could still check if you're currently SAVING by ANDing with 10b.
(...)
> >> + *
> >> + * data_offset: (read/write)
> >> + * User application should read data_offset in migration region from where
> >> + * user application should read data during _SAVING state.
> >> + * User application would write data_offset in migration region from where
> >> + * user application is had written data during _RESUMING state.
> >
> > Why wouldn't data_offset simply be fixed? Do we really want to support
> > the user writing a arbitrary offsets in the migration region? Each
> > chunk can simply start at the kernel provided data_offset.
> >
>
> data_offset differs based on region is trapped on mapped. In case when
> region is mmaped, QEMU writes data to mapped region and then write
> data_offset field, indicating data is present in mmaped buffer. Read
> below for more detailed steps.
>
> >> + *
> >> + * data_size: (write only)
> >> + * User application should write size of data copied in migration region
> >> + * during _RESUMING state.
> >
> > How much does the user read on _SAVING then?
> >
>
> pending_bytes tells bytes that should be read on _SAVING.
>
> >> + *
> >> + * start_pfn: (write only)
> >> + * Start address pfn to get bitmap of dirty pages from vendor driver duing
> >> + * _SAVING state.
> >> + *
> >> + * page_size: (write only)
> >> + * User application should write the page_size of pfn.
> >> + *
> >> + * total_pfns: (write only)
> >> + * Total pfn count from start_pfn for which dirty bitmap is requested.
> >
> > So effectively the area that begins at data_offset is dual purpose
> > (triple purpose vs Yan's, config, memory, and dirty bitmap)
I'm wondering whether splitting it would combine the best of the two
approaches: Just having an optional dirty bitmap region would make it
more obvious if dirty page tracking is not available.
next prev parent reply other threads:[~2019-03-01 13:36 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-19 21:23 [Qemu-devel] [PATCH v3 0/5] Add migration support for VFIO device Kirti Wankhede
2019-02-19 21:23 ` [Qemu-devel] [PATCH v3 1/5] VFIO KABI for migration interface Kirti Wankhede
2019-02-21 22:23 ` Alex Williamson
2019-02-26 20:05 ` Kirti Wankhede
2019-03-01 13:36 ` Cornelia Huck [this message]
2019-03-07 19:45 ` Alex Williamson
2019-02-19 21:23 ` [Qemu-devel] [PATCH v3 2/5] Add save and load functions for VFIO PCI devices Kirti Wankhede
2019-02-19 21:23 ` [Qemu-devel] [PATCH v3 3/5] Add migration functions for VFIO devices Kirti Wankhede
2019-02-21 22:38 ` Alex Williamson
2019-02-26 20:16 ` Kirti Wankhede
2019-02-19 21:23 ` [Qemu-devel] [PATCH v3 4/5] Add vfio_listerner_log_sync to mark dirty pages Kirti Wankhede
2019-02-19 21:23 ` [Qemu-devel] [PATCH v3 5/5] Make vfio-pci device migration capable Kirti Wankhede
2019-02-20 10:22 ` [Qemu-devel] [PATCH v3 0/5] Add migration support for VFIO device Dr. David Alan Gilbert
2019-02-21 5:25 ` Kirti Wankhede
2019-02-21 5:52 ` Tian, Kevin
2019-02-21 7:10 ` Neo Jia
2019-02-27 1:51 ` Tian, Kevin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190301143630.474870d2.cohuck@redhat.com \
--to=cohuck@redhat.com \
--cc=Ken.Xue@amd.com \
--cc=Zhengxiao.zx@Alibaba-inc.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=changpeng.liu@intel.com \
--cc=cjia@nvidia.com \
--cc=dgilbert@redhat.com \
--cc=eauger@redhat.com \
--cc=eskultet@redhat.com \
--cc=felipe@nutanix.com \
--cc=jonathan.davies@nutanix.com \
--cc=kevin.tian@intel.com \
--cc=kwankhede@nvidia.com \
--cc=mlevitsk@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=shuangtai.tst@alibaba-inc.com \
--cc=yan.y.zhao@intel.com \
--cc=yi.l.liu@intel.com \
--cc=yulei.zhang@intel.com \
--cc=zhi.a.wang@intel.com \
--cc=ziye.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).