From: "Dr. David Alan Gilbert"
Subject: Re: [PATCH 0/5] QEMU VFIO live migration
Date: Wed, 27 Mar 2019 20:18:54 +0000
Message-ID: <20190327201854.GG2636@work-vm>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
 <20190219113212.GC2941@work-vm>
 <20190220052838.GC16456@joy-OptiPlex-7040>
 <20190220110142.GD2608@work-vm>
 <33183CC9F5247A488A2544077AF19020DB25D30F@dggeml511-mbx.china.huawei.com>
 <20190220124242.5a1685c5.cohuck@redhat.com>
 <20190327063509.GD14681@joy-OptiPlex-7040>
In-Reply-To: <20190327063509.GD14681@joy-OptiPlex-7040>
Cc: "cjia@nvidia.com" , "kvm@vger.kernel.org" , "aik@ozlabs.ru" , "Zhengxiao.zx@alibaba-inc.com" , "shuangtai.tst@alibaba-inc.com" , "qemu-devel@nongnu.org" , "kwankhede@nvidia.com" , "eauger@redhat.com" , "Liu, Yi L" , "eskultet@redhat.com" , "Yang, Ziye" , "mlevitsk@redhat.com" , "pasic@linux.ibm.com" , "Gonglei \(Arei\)" , "felipe@nutanix.com" , "Ken.Xue@amd.com" , "Tian, Kevin" , alex.williamson@redhat.com, "intel-gvt-dev@lists.freedesktop.org" , "Liu, Changpeng"
List-Id: kvm.vger.kernel.org

* Zhao Yan (yan.y.zhao@intel.com) wrote:
> On Wed, Feb 20, 2019 at 07:42:42PM +0800, Cornelia Huck wrote:
> > > > > > b) How do we detect if we're migrating from/to the wrong device or
> > > > > > version of device? Or say to a device with older firmware or perhaps
> > > > > > a device that has less device memory?
> > > > > Actually it's still an open question for VFIO migration. Need to think
> > > > > about whether it's better to check that in libvirt or qemu (like a
> > > > > device magic along with a version?).
> > >
> > > We must keep the hardware generation the same within one POD of public cloud
> > > providers.
> > > But we are still considering live migration from a lower generation of
> > > hardware to a higher generation.
> >
> > Agreed, lower->higher is the one direction that might make sense to
> > support.
> >
> > But regardless of that, I think we need to make sure that incompatible
> > devices/versions fail directly instead of failing in a subtle, hard to
> > debug way. Might be useful to do some initial sanity checks in libvirt
> > as well.
> >
> > How easy is it to obtain that information in a form that can be
> > consumed by higher layers? Can we find out the device type at least?
> > What about some kind of revision?
> Hi Alex and Cornelia,
> for device compatibility, do you think it's a good idea to use "version"
> and "device_version" fields?
>
> version field: identifies the live migration interface's version. It can
> provide a form of backward compatibility, e.g. migration is allowed when
> the target machine's version >= the source machine's version.
>
> device_version field consists of two parts:
> 1. vendor id: 32 bits, e.g. 0x8086.
> 2. vendor proprietary string: any string that the vendor driver considers
>    sufficient to identify a source device, e.g. pciid + mdev type.
> The "vendor id" keeps different vendors' proprietary strings from
> colliding.
>
> struct vfio_device_state_ctl {
>      __u32 version;                                  /* ro */
>      __u8 device_version[MAX_DEVICE_VERSION_LEN];    /* ro */
>      struct {
>          __u32 action; /* GET_BUFFER, SET_BUFFER, IS_COMPATIBLE */
>          ...
>      } data;
>      ...
> };
>
> Then, an action IS_COMPATIBLE is added to check device compatibility.
>
> The flow to determine whether a source device can be migrated to a target
> device is:
> 1. In the source side's .save_setup, save the source device's
>    device_version string.
> 2. In the target side's .load_state, load the source device's
>    device_version string, write it to the data region, and invoke the
>    IS_COMPATIBLE action to ask the vendor driver whether the source device
>    is compatible with the target device.
>
> The advantage of adding an IS_COMPATIBLE action is that the vendor driver
> can maintain a compatibility table and decide, according to that
> proprietary table, whether the source device is compatible with the
> target device. In the device_version string, the vendor driver only has
> to describe the source device as precisely as possible, and relies on the
> vendor driver on the target side to determine whether the two are
> compatible.

It would also be good if IS_COMPATIBLE were somehow callable externally,
so that the management layer could answer a question like 'can we migrate
this VM to this host?' before it actually starts the migration.

Dave

> Thanks
> Yan

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK