From: Zhao Yan <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "cjia@nvidia.com" <cjia@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"aik@ozlabs.ru" <aik@ozlabs.ru>,
"Zhengxiao.zx@Alibaba-inc.com" <Zhengxiao.zx@Alibaba-inc.com>,
"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
"eauger@redhat.com" <eauger@redhat.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"eskultet@redhat.com" <eskultet@redhat.com>,
"Yang, Ziye" <ziye.yang@intel.com>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
"arei.gonglei@huawei.com" <arei.gonglei@huawei.com>,
"felipe@nutanix.com" <felipe@nutanix.com>,
"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"intel-gvt-dev@lists.freedesktop.org"
<intel-gvt-dev@lists.freedesktop.org>,
"L
Subject: Re: [PATCH 0/5] QEMU VFIO live migration
Date: Wed, 13 Mar 2019 21:12:22 -0400
Message-ID: <20190314011222.GA17006@joy-OptiPlex-7040>
In-Reply-To: <20190313131454.09f886c1@w520.home>
On Thu, Mar 14, 2019 at 03:14:54AM +0800, Alex Williamson wrote:
> On Tue, 12 Mar 2019 21:13:01 -0400
> Zhao Yan <yan.y.zhao@intel.com> wrote:
>
> > hi Alex
> > Any comments on the sequence below?
> >
> > Actually, we have some concerns about, and suggestions for,
> > userspace-opaque migration data.
> >
> > 1. If data is opaque to userspace, the kernel interface must be tightly
> > bound to migration.
> > e.g. the vendor driver has to know that state (running + not logging)
> > should not return any data, and that state (running + logging) should
> > return the whole snapshot first and dirty data later. It also has to know
> > that QEMU migration will not call GET_BUFFER in state (running + not
> > logging); otherwise, it has to adjust its behavior.
>
> This all just sounds like defining the protocol we expect with the
> interface. For instance if we define a session as beginning when
> logging is enabled and ending when the device is stopped and the
> interface reports no more data is available, then we can state that any
> partial accumulation of data is incomplete relative to migration. If
> userspace wants to initiate a new migration stream, they can simply
> toggle logging. How the vendor driver provides the data during the
> session is not defined, but beginning the session with a snapshot
> followed by repeated iterations of dirtied data is certainly a valid
> approach.
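The session semantics described above can be sketched in C. This is a minimal model under stated assumptions: the state bits `DEV_STATE_RUNNING` and `DEV_STATE_LOGGING`, the `struct mig_session` type and its `pending` counter are hypothetical names invented for illustration, not part of any existing VFIO interface. Setting logging starts a session, clearing it abandons any partially read data, and the session only counts as complete once the device is stopped, still logging, and reports no more data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-state bits, for illustration only. */
#define DEV_STATE_RUNNING (1u << 0)
#define DEV_STATE_LOGGING (1u << 1)

/* Hypothetical per-device migration session bookkeeping. */
struct mig_session {
    unsigned state;   /* current device-state bits */
    bool active;      /* true from logging-on until logging-off */
    size_t pending;   /* bytes the vendor driver still has available */
};

/*
 * Apply a new device state. A 0->1 transition of LOGGING begins a session;
 * a 1->0 transition ends it, so any data read so far is incomplete and
 * must not be treated as a migration stream.
 */
static void session_set_state(struct mig_session *s, unsigned new_state)
{
    bool was_logging = (s->state & DEV_STATE_LOGGING) != 0;
    bool now_logging = (new_state & DEV_STATE_LOGGING) != 0;

    if (!was_logging && now_logging)
        s->active = true;
    else if (was_logging && !now_logging)
        s->active = false;
    s->state = new_state;
}

/* Complete only in the stopped|logging state with nothing left to read. */
static bool session_complete(const struct mig_session *s)
{
    return s->active &&
           (s->state & DEV_STATE_LOGGING) &&
           !(s->state & DEV_STATE_RUNNING) &&
           s->pending == 0;
}
```

Under this model, an aborted migration simply clears logging, and a new migration is started by setting it again.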
>
> > 2. The vendor driver cannot ensure that userspace gets all the data it
> > intends to save in the pre-copy phase.
> > e.g. in the stop-and-copy phase, the vendor driver has to first check for
> > and send data left over from the previous phase.
>
> First, I don't think the device has control of when QEMU switches from
> pre-copy to stop-and-copy, the protocol needs to support that
> transition at any point. However, it seems a simple data available
> counter provides an indication of when it might be optimal to make such
> a transition. If a vendor driver follows a scheme as above, the
> available data counter would indicate a large value, the entire initial
> snapshot of the device. As the migration continues and pages are
> dirtied, the device would reach a steady state amount of data
> available, depending on the guest activity. This could indicate to the
> user to stop the device. The migration stream would not be considered
> completed until the available data counter reaches zero while the
> device is in the stopped|logging state.
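A userspace consumer of such a counter might look like the sketch below. The heuristic in `should_stop_device()` (stop once the remaining data fits a downtime budget, or once the counter stops shrinking) and the `struct fake_dev` stand-in for the vendor driver are assumptions made for illustration; the thread only establishes that the counter is a hint and that completion requires it to reach zero in the stopped|logging state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vendor driver exposing an available-data counter. */
struct fake_dev {
    size_t pending;   /* bytes currently available to userspace */
};

/* Read up to 'max' bytes; returns the number of bytes consumed. */
static size_t fake_read(struct fake_dev *d, size_t max)
{
    size_t n = d->pending < max ? d->pending : max;
    d->pending -= n;
    return n;
}

/*
 * Illustrative pre-copy heuristic: suggest stopping the device when the
 * remaining data fits the downtime budget, or when the counter has stopped
 * shrinking between iterations (steady state driven by guest activity).
 */
static bool should_stop_device(size_t prev_pending, size_t pending,
                               size_t downtime_budget)
{
    if (pending <= downtime_budget)
        return true;
    return pending >= prev_pending;
}

/*
 * Stop-and-copy: with the device stopped and logging, keep reading until
 * the counter reaches zero. A stopped device dirties nothing new, so the
 * counter only decreases. Returns the total number of bytes read.
 */
static size_t drain_stopped_device(struct fake_dev *d, size_t chunk)
{
    size_t total = 0;

    if (chunk == 0)
        return 0;   /* avoid an endless loop on a zero-size read */
    while (d->pending > 0)
        total += fake_read(d, chunk);
    return total;
}
```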
>
> > 3. If the whole sequence is tightly bound to live migration, can we remove
> > the logging state? What about adding two states, migrate-in and
> > migrate-out? Then there would be four states: running, stopped,
> > migrate-in and migrate-out. migrate-out is for the source side when
> > migration starts; together with the running and stopped states, it can
> > substitute for the logging state. migrate-in is for the target side.
>
> In fact, Kirti's implementation specifies a data direction, but I think
> we still need logging to indicate sessions. I'd also assume that
> logging implies some overhead for the vendor driver.
>
OK. If you prefer logging, I'm OK with it. I just find migrate-in and
migrate-out more universal against changes in hardware requirements.
> > On Tue, Mar 12, 2019 at 10:57:47AM +0800, Zhao Yan wrote:
> > > hi Alex
> > > thanks for your reply.
> > >
> > > So, if we choose migration data to be userspace opaque, do you think the
> > > sequence below is the right behavior for the vendor driver to follow:
> > >
> > > 1. Initially, the LOGGING state is not set. If userspace calls GET_BUFFER
> > > on the vendor driver, the vendor driver should reject it and return 0.
>
> What would this state mean otherwise? If we're not logging then it
> should not be expected that we can construct dirtied data from a
> previous read of the state before logging was enabled (it would be
> outside of the "session"). So at best this is an incomplete segment of
> the initial snapshot of the device, but that presumes how the vendor
> driver constructs the data. I wouldn't necessarily mandate the vendor
> driver reject it, but I think we should consider it undefined and
> vendor specific relative to the migration interface.
>
> > > 2. Then the LOGGING state is set. If userspace calls GET_BUFFER on the
> > > vendor driver:
> > > a. the vendor driver should first query a whole snapshot of device memory
> > > (let's use this term to represent the device's standalone memory for now),
> > > b. the vendor driver returns a chunk of the data just queried to
> > > userspace, while recording the current position in the data,
> > > c. once the vendor driver finds that all the data just queried has been
> > > transmitted to userspace, it queries only the dirty data in device memory,
> > > d. the vendor driver returns a chunk of the data just queried (this time
> > > dirty data) to userspace, while recording the current position in the data,
> > > e. if all data has been transmitted to userspace and GET_BUFFER calls
> > > still come from userspace, the vendor driver starts another round of
> > > dirty data query.
>
> This is a valid vendor driver approach, but it's outside the scope of
> the interface definition. A vendor driver could also decide to not
> provide any data until both stopped and logging are set and then
> provide a fixed, final snapshot. The interface supports either
> approach by defining the protocol to interact with it.
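As one illustration of the vendor-side scheme in bullets 1 and 2 above (a valid approach, but outside the interface definition), the sketch below models device memory as a few fixed-size pages with a dirty flag each. Everything here (`struct vdev`, `vdev_set_logging()`, `vdev_get_buffer()`, the page sizes) is hypothetical; a real stream would also have to record which page each chunk belongs to, which is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES  4    /* hypothetical number of device-memory pages */
#define PAGE_SZ 16   /* hypothetical page size in bytes */

struct vdev {
    unsigned char mem[NPAGES][PAGE_SZ];     /* device memory */
    bool dirty[NPAGES];                     /* per-page dirty flag */
    bool logging;
    unsigned char stage[NPAGES * PAGE_SZ];  /* data from the last query */
    size_t stage_len, stage_pos;            /* size and read position */
    bool snapshot_taken;
};

/* Toggling logging starts a fresh session: the next query is a full snapshot. */
static void vdev_set_logging(struct vdev *v, bool on)
{
    v->logging = on;
    v->snapshot_taken = false;
    v->stage_len = v->stage_pos = 0;
}

/* Steps a/c/e: query all pages the first time, only dirty pages afterwards. */
static void vdev_query(struct vdev *v)
{
    v->stage_len = v->stage_pos = 0;
    for (int p = 0; p < NPAGES; p++) {
        if (v->snapshot_taken && !v->dirty[p])
            continue;
        memcpy(v->stage + v->stage_len, v->mem[p], PAGE_SZ);
        v->stage_len += PAGE_SZ;
        v->dirty[p] = false;   /* querying clears the dirty flag */
    }
    v->snapshot_taken = true;
}

/* GET_BUFFER: copy up to 'max' bytes to 'out'; returns bytes copied. */
static size_t vdev_get_buffer(struct vdev *v, unsigned char *out, size_t max)
{
    if (!v->logging)
        return 0;                      /* bullet 1: outside a session */
    if (v->stage_pos == v->stage_len)  /* previous query fully consumed */
        vdev_query(v);
    size_t n = v->stage_len - v->stage_pos;
    if (n > max)
        n = max;
    memcpy(out, v->stage + v->stage_pos, n);
    v->stage_pos += n;                 /* steps b/d: remember position */
    return n;
}
```

Returning 0 outside a session is only one choice; as noted above, reads outside a session can equally be left undefined and vendor specific.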
>
> > > 3. If the LOGGING state is then unset and userspace calls GET_BUFFER on
> > > the vendor driver:
> > > a. if the vendor driver finds previously untransmitted data, it returns
> > > it until all of it has been transmitted,
> > > b. the vendor driver then queries dirty data again and transmits it,
> > > c. finally, the vendor driver queries the device config data (which has
> > > to be queried last and sent only once) and transmits it.
>
> This seems broken; the vendor driver is presuming the user's intentions.
> If logging is unset, we return to bullet 1: reading data is undefined
> and vendor specific. It's outside of the session.
>
> > > For bullet 1, if the LOGGING state is set first and migration then
> > > aborts, the vendor driver has to be able to detect that condition. So it
> > > seems the vendor driver has to know more about QEMU's migration state,
> > > e.g. that migration was called and failed. Do you think that's acceptable?
>
> If migration aborts, logging is cleared and the device continues
> operation. If a new migration is started, the session is initiated by
> enabling logging. Sound reasonable? Thanks,
>
For the flow, I still have a question.
There are two approaches below; which one do you prefer?
Approach A, in precopy stage, the sequence is
(1)
.save_live_pending --> return whole snapshot size
.save_live_iterate --> save whole snapshot
(2)
.save_live_pending --> get dirty data, return dirty data size
.save_live_iterate --> save all dirty data
(3)
.save_live_pending --> get dirty data again, return dirty data size
.save_live_iterate --> save all dirty data
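To compare the two flows, Approach A can be modelled by tracking sizes only. The sketch below is hypothetical: `struct dev_a`, `a_pending()` and `a_iterate()` stand in for the `.save_live_pending` and `.save_live_iterate` handlers and do not use QEMU's actual signatures. Each pending call fixes a batch (the whole snapshot first, dirty data afterwards) and the following iterate saves all of it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size-only model of Approach A. */
struct dev_a {
    size_t snapshot_size;   /* size of the full device snapshot */
    size_t dirty_now;       /* bytes dirtied since the last query */
    bool snapshot_queried;  /* has step (1) been reported? */
    size_t batch;           /* bytes reported by the last pending call */
};

/* Stands in for .save_live_pending: query and report one whole batch. */
static size_t a_pending(struct dev_a *d)
{
    /* Step (1) reports the snapshot; steps (2), (3) report dirty data. */
    d->batch = d->snapshot_queried ? d->dirty_now : d->snapshot_size;
    d->snapshot_queried = true;
    d->dirty_now = 0;       /* dirty tracking restarts from this query */
    return d->batch;
}

/* Stands in for .save_live_iterate: save everything just reported. */
static size_t a_iterate(struct dev_a *d)
{
    size_t saved = d->batch;
    d->batch = 0;
    return saved;
}
```

The model makes one property of Approach A visible: a single iterate call must save the whole batch, which for step (1) is the entire snapshot.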
Approach B, in precopy stage, the sequence is
(1)
.save_live_pending --> return whole snapshot size
.save_live_iterate --> save part of snapshot
(2)
.save_live_pending --> return size of the rest of the snapshot +
current dirty data size
.save_live_iterate --> save part of snapshot
(3) repeat (2) until the whole snapshot is saved.
(4)
.save_live_pending --> get dirty data and return current dirty data size
.save_live_iterate --> save part of dirty data
(5)
.save_live_pending --> return size of the rest of the dirty data +
delta size of dirty data
.save_live_iterate --> save part of dirty data
(6)
repeat (5) until precopy stops
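Approach B can be modelled the same way, again with hypothetical names (`struct dev_b`, `b_pending()`, `b_iterate()`, and a fixed per-iteration limit `chunk` that is an assumption of the sketch). Each iterate saves at most `chunk` bytes, and pending reports the unsent remainder plus newly dirtied data, querying dirty data again only once the previous batch has been fully saved.

```c
#include <stddef.h>

/* Hypothetical size-only model of Approach B. */
struct dev_b {
    size_t snapshot_left;  /* unsent bytes of the initial snapshot */
    size_t dirty_left;     /* unsent bytes of already-queried dirty data */
    size_t dirty_new;      /* bytes dirtied but not yet queried */
    size_t chunk;          /* per-iteration save limit */
};

/* Stands in for .save_live_pending. */
static size_t b_pending(struct dev_b *d)
{
    /* Step (4): once snapshot and previous dirty batch are saved, query again. */
    if (d->snapshot_left == 0 && d->dirty_left == 0) {
        d->dirty_left = d->dirty_new;
        d->dirty_new = 0;
    }
    /* Steps (1), (2), (5): unsent remainder plus newly dirtied delta. */
    return d->snapshot_left + d->dirty_left + d->dirty_new;
}

/* Stands in for .save_live_iterate: save at most 'chunk' bytes. */
static size_t b_iterate(struct dev_b *d)
{
    size_t *src = d->snapshot_left > 0 ? &d->snapshot_left : &d->dirty_left;
    size_t n = *src < d->chunk ? *src : d->chunk;

    *src -= n;
    return n;
}
```

In this model each iteration stays bounded, at the cost of pending having to account for partially saved batches.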
> Alex