qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Zhao Yan <yan.y.zhao@intel.com>
Cc: alex.williamson@redhat.com, qemu-devel@nongnu.org,
	intel-gvt-dev@lists.freedesktop.org,
	Zhengxiao.zx@Alibaba-inc.com, yi.l.liu@intel.com,
	eskultet@redhat.com, ziye.yang@intel.com,
	shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com,
	zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com,
	aik@ozlabs.ru, eauger@redhat.com, felipe@nutanix.com,
	jonathan.davies@nutanix.com, changpeng.liu@intel.com,
	Ken.Xue@amd.com, kwankhede@nvidia.com, kevin.tian@intel.com,
	cjia@nvidia.com, arei.gonglei@huawei.com, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces
Date: Wed, 20 Feb 2019 18:08:13 +0100	[thread overview]
Message-ID: <20190220180813.17e6ab86.cohuck@redhat.com> (raw)
In-Reply-To: <20190220073636.GD16456@joy-OptiPlex-7040>

On Wed, 20 Feb 2019 02:36:36 -0500
Zhao Yan <yan.y.zhao@intel.com> wrote:

> On Tue, Feb 19, 2019 at 02:09:18PM +0100, Cornelia Huck wrote:
> > On Tue, 19 Feb 2019 16:52:14 +0800
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
(...)
> > > + *          Size of device config data is smaller than or equal to that of
> > > + *          device config region.  
> > 
> > Not sure if I understand that sentence correctly... but what if a
> > device has more config state than fits into this region? Is that
> > supposed to be covered by the device memory region? Or is this assumed
> > to be something so exotic that we don't need to plan for it?
> >   
> Device config data and device config region are all provided by vendor
> driver, so vendor driver is always able to create a large enough device
> config region to hold device config data.
> So, if a device has data that are better to be saved after device stop and
> saved/loaded in strict order, the data needs to be in device config region.
> This kind of data is supposed to be small.
> If the device data can be saved/loaded several times, it can also be put
> into device memory region.

So, it is the vendor driver's decision which device information should
go via which region? With the device config data supposed to be
saved/loaded in one go?

(...)
> > > +/* version number of the device state interface */
> > > +#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1  
> > 
> > Hm. Is this supposed to be backwards-compatible, should we need to bump
> > this?
> >  
> currently no backwords-compatible. we can discuss on that.

It might be useful if we discover that we need some extensions. But I'm
not sure how much work it would be.

(...)
> > > +/*
> > > + * DEVICE STATES
> > > + *
> > > + * Four states are defined for a VFIO device:
> > > + * RUNNING, RUNNING & LOGGING, STOP & LOGGING, STOP.
> > > + * They can be set by writing to device_state field of
> > > + * vfio_device_state_ctl region.  
> > 
> > Who controls this? Userspace?  
> 
> Yes. Userspace notifies vendor driver to do the state switching.

Might be good to mention this (just to make it obvious).

> > > + * LOGGING state is a special state that it CANNOT exist
> > > + * independently.  
> > 
> > So it's not a state, but rather a modifier?
> >   
> yes. or thinking LOGGING/not LOGGING as bit 1 of a device state,
> whereas RUNNING/STOPPED is bit 0 of a device state.
> They have to be got as a whole.

So it is (on a bit level):
RUNNING -> 00
STOPPED -> 01
LOGGING/RUNNING -> 10
LOGGING/STOPPED -> 11
 
> > > + * It must be set alongside with state RUNNING or STOP, i.e,
> > > + * RUNNING & LOGGING, STOP & LOGGING.
> > > + * It is used for dirty data logging both for device memory
> > > + * and system memory.
> > > + *
> > > + * LOGGING only impacts device/system memory. In LOGGING state, get buffer
> > > + * of device memory returns dirty pages since last call; outside LOGGING
> > > + * state, get buffer of device memory returns whole snapshot of device
> > > + * memory. system memory's dirty page is only available in LOGGING state.
> > > + *
> > > + * Device config should be always accessible and return whole config snapshot
> > > + * regardless of LOGGING state.
> > > + * */
> > > +#define VFIO_DEVICE_STATE_RUNNING 0
> > > +#define VFIO_DEVICE_STATE_STOP 1
> > > +#define VFIO_DEVICE_STATE_LOGGING 2

This makes it look a bit like LOGGING were an individual state, while 2
is in reality LOGGING/RUNNING... not sure how to make that more
obvious. Maybe (as we are dealing with a u32):

#define VFIO_DEVICE_STATE_RUNNING 0x00000000
#define VFIO_DEVICE_STATE_STOPPED 0x00000001
#define VFIO_DEVICE_STATE_LOGGING_RUNNING 0x00000002
#define VFIO_DEVICE_STATE_LOGGING_STOPPED 0x00000003
#define VFIO_DEVICE_STATE_LOGGING_MASK 0x00000002

> > > +
> > > +/* action to get data from device memory or device config
> > > + * the action is write to device state's control region, and data is read
> > > + * from device memory region or device config region.
> > > + * Each time before read device memory region or device config region,
> > > + * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER is required to write to action
> > > + * field in control region. That is because device memory and devie config
> > > + * region is mmaped into user space. vendor driver has to be notified of
> > > + * the the GET_BUFFER action in advance.
> > > + */
> > > +#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
> > > +
> > > +/* action to set data to device memory or device config
> > > + * the action is write to device state's control region, and data is
> > > + * written to device memory region or device config region.
> > > + * Each time after write to device memory region or device config region,
> > > + * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER is required to write to action
> > > + * field in control region. That is because device memory and devie config
> > > + * region is mmaped into user space. vendor driver has to be notified of
> > > + * the the SET_BUFFER action after data written.
> > > + */
> > > +#define VFIO_DEVICE_DATA_ACTION_SET_BUFFER 2  
> > 
> > Let me describe this in my own words to make sure that I understand
> > this correctly.
> > 
> > - The actions are set by userspace to notify the kernel that it is
> >   going to get data or that it has just written data.
> > - This is needed as a notification that the mmapped data should not be
> >   changed resp. just has changed.  
> we need this notification is because when userspace read the mmapped data,
> it's from the ptr returned from mmap(). So, when userspace reads that ptr,
> there will be no page fault or read/write system calls, so vendor driver
> does not know whether read/write opertation happens or not. 
> Therefore, before userspace reads the ptr from mmap, it first writes action
> field in control region (through write system call), and vendor driver
> will not return the write system call until data prepared.
> 
> When userspace writes to that ptr from mmap, it writes data to the data
> region first, then writes the action field in control region (through write
> system call) to notify vendor driver. vendor driver will return the system
> call after it copies the buffer completely.
> > 
> > So, how does the kernel know whether the read action has finished resp.
> > whether the write action has started? Even if userspace reads/writes it
> > as a whole.
> >   
> kernel does not touch the data region except when in response to the
> "action" write system call.

Thanks for the explanation, that makes sense.
(...)

  reply	other threads:[~2019-02-20 17:08 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-19  8:50 [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration Yan Zhao
2019-02-19  8:52 ` [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces Yan Zhao
2019-02-19 13:09   ` Cornelia Huck
2019-02-20  7:36     ` Zhao Yan
2019-02-20 17:08       ` Cornelia Huck [this message]
2019-02-21  1:47         ` Zhao Yan
2019-02-19  8:52 ` [Qemu-devel] [PATCH 2/5] vfio/migration: support device of device config capability Yan Zhao
2019-02-19 11:01   ` Dr. David Alan Gilbert
2019-02-20  5:12     ` Zhao Yan
2019-02-20 10:57       ` Dr. David Alan Gilbert
2019-02-19 14:37   ` Cornelia Huck
2019-02-20 22:54     ` Zhao Yan
2019-02-21 10:56       ` Cornelia Huck
2019-02-19  8:52 ` [Qemu-devel] [PATCH 3/5] vfio/migration: tracking of dirty page in system memory Yan Zhao
2019-02-19  8:52 ` [Qemu-devel] [PATCH 4/5] vfio/migration: turn on migration Yan Zhao
2019-02-19  8:53 ` [Qemu-devel] [PATCH 5/5] vfio/migration: support device memory capability Yan Zhao
2019-02-19 11:25   ` Dr. David Alan Gilbert
2019-02-20  5:17     ` Zhao Yan
2019-02-19 14:42   ` Christophe de Dinechin
2019-02-20  7:58     ` Zhao Yan
2019-02-20 10:14       ` Christophe de Dinechin
2019-02-21  0:07         ` Zhao Yan
2019-02-19 11:32 ` [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration Dr. David Alan Gilbert
2019-02-20  5:28   ` Zhao Yan
2019-02-20 11:01     ` Dr. David Alan Gilbert
2019-02-20 11:28       ` Gonglei (Arei)
2019-02-20 11:42         ` Cornelia Huck
2019-02-20 12:07           ` Gonglei (Arei)
     [not found]           ` <20190327063509.GD14681@joy-OptiPlex-7040>
     [not found]             ` <20190327201854.GG2636@work-vm>
     [not found]               ` <20190327161020.1c013e65@x1.home>
2019-04-01  8:14                 ` Cornelia Huck
2019-04-01  8:40                   ` Yan Zhao
2019-04-01 14:15                     ` Alex Williamson
2019-02-21  0:31       ` Zhao Yan
2019-02-21  9:15         ` Dr. David Alan Gilbert
2019-02-20 11:56 ` Gonglei (Arei)
2019-02-21  0:24   ` Zhao Yan
2019-02-21  1:35     ` Gonglei (Arei)
2019-02-21  1:58       ` Zhao Yan
2019-02-21  3:33         ` Gonglei (Arei)
2019-02-21  4:08           ` Zhao Yan
2019-02-21  5:46             ` Gonglei (Arei)
2019-02-21  2:04       ` Zhao Yan
2019-02-21  3:16         ` Gonglei (Arei)
2019-02-21  4:21           ` Zhao Yan
2019-02-21  5:56             ` Gonglei (Arei)
2019-02-21 20:40 ` Alex Williamson
2019-02-25  2:22   ` Zhao Yan
2019-03-06  0:22     ` Zhao Yan
2019-03-07 17:44     ` Alex Williamson
2019-03-07 23:20       ` Tian, Kevin
2019-03-08 16:11         ` Alex Williamson
2019-03-08 16:21           ` Dr. David Alan Gilbert
2019-03-08 22:02             ` Alex Williamson
2019-03-11  2:33               ` Tian, Kevin
2019-03-11 20:19                 ` Alex Williamson
2019-03-12  2:48                   ` Tian, Kevin
2019-03-12  2:57       ` Zhao Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190220180813.17e6ab86.cohuck@redhat.com \
    --to=cohuck@redhat.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@Alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=changpeng.liu@intel.com \
    --cc=cjia@nvidia.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=jonathan.davies@nutanix.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhi.a.wang@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).