Date: Fri, 8 Mar 2019 16:21:46 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20190308162146.GD2834@work-vm>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
 <20190221134051.2c28893e@w520.home>
 <20190225022255.GP16456@joy-OptiPlex-7040>
 <20190307104421.534ea56f@w520.home>
 <20190308091133.3073f5db@x1.home>
In-Reply-To: <20190308091133.3073f5db@x1.home>
Subject: Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration
To: Alex Williamson
Cc: "Tian, Kevin", "Zhao, Yan Y", "qemu-devel@nongnu.org",
 "intel-gvt-dev@lists.freedesktop.org", "Zhengxiao.zx@Alibaba-inc.com",
 "Liu, Yi L", "eskultet@redhat.com", "Yang, Ziye", "cohuck@redhat.com",
 "shuangtai.tst@alibaba-inc.com", "Wang, Zhi A", "mlevitsk@redhat.com",
 "pasic@linux.ibm.com", "aik@ozlabs.ru", "eauger@redhat.com",
 "felipe@nutanix.com", "jonathan.davies@nutanix.com", "Liu, Changpeng",
 "Ken.Xue@amd.com", "kwankhede@nvidia.com", "cjia@nvidia.com",
 "arei.gonglei@huawei.com", "kvm@vger.kernel.org"

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Thu, 7 Mar 2019 23:20:36 +0000
> "Tian, Kevin" wrote:
> 
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Friday, March 8,
> > > 2019 1:44 AM
> > >
> > > > > > This kind of data needs to be saved / loaded in the pre-copy and
> > > > > > stop-and-copy phases.
> > > > > > The data of device memory is held in the device memory region.
> > > > > > The size of device memory is usually larger than that of the
> > > > > > device memory region, so QEMU needs to save/load it in chunks
> > > > > > of the size of the device memory region.
> > > > > > Not all devices have device memory; IGD, for example, only uses
> > > > > > system memory.
> > > > >
> > > > > It seems a little gratuitous to me that this is a separate region or
> > > > > that this data is handled separately.  All of this data is opaque to
> > > > > QEMU, so why do we need to separate it?
> > > >
> > > > hi Alex,
> > > > as the device state interfaces are provided by the kernel, they are
> > > > expected to meet as general a set of needs as possible.  So, do you
> > > > think there are use cases where user space knows the device well and
> > > > wants the kernel to return specific data back to it?
> > > > E.g. it just wants to get the whole device config data, including all
> > > > MMIOs, page tables, PCI config data...
> > > > Or it just wants to get a snapshot of current device memory, not
> > > > including any dirty data.
> > > > Or it just needs the dirty pages in device memory or system memory.
> > > > With such accurate queries, quite a lot of useful features could be
> > > > developed in user space.
> > > >
> > > > If all of this data is opaque to the user app, it seems the only use
> > > > case is live migration.
> > >
> > > I can certainly appreciate a more versatile interface, but I think
> > > we're also trying to create the simplest interface we can, with the
> > > primary target being live migration.  As soon as we start defining this
> > > type of device memory and that type of device memory, we're going to
> > > have another device come along that needs yet another because they have
> > > a slightly different requirement.
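[Editor's note: the chunked save Yan describes above - device memory larger
than the region window, copied out one region-sized chunk at a time - can be
sketched as below. All names (save_device_memory and its parameters) are
illustrative only, not part of the proposed VFIO interface.]

```c
/* Hypothetical sketch of the chunking described above: device memory
 * (dev_mem_size bytes) is exposed through a window of only region_size
 * bytes, so userspace copies it out one region-sized chunk at a time.
 * None of these names come from the actual VFIO proposal. */
#include <stddef.h>
#include <string.h>

static size_t save_device_memory(const unsigned char *dev_mem,
                                 size_t dev_mem_size,
                                 size_t region_size,
                                 unsigned char *out)
{
    size_t off = 0;

    while (off < dev_mem_size) {
        size_t chunk = dev_mem_size - off;
        if (chunk > region_size)
            chunk = region_size;          /* never exceed the window */
        memcpy(out + off, dev_mem + off, chunk);
        off += chunk;
    }
    return off;                           /* total bytes saved */
}
```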
> > > Even without that, we're going to
> > > have vendor drivers implement it differently, so what works for one
> > > device with a more targeted approach may not work for all devices.  Can
> > > you enumerate some specific examples of the use cases you imagine your
> > > design enabling?
> > 
> > Do we want to consider a use case where user space would like to
> > selectively introspect a portion of the device state (including implicit
> > state which is not available through PCI regions), and may ask for the
> > capability of directly mapping a selected portion for scanning (e.g.
> > device memory) instead of always turning on dirty logging for all
> > device state?
> 
> I don't see that a migration interface necessarily lends itself to this
> use case.  A migration data stream has no requirement to be user
> consumable as anything other than opaque data; there's also no
> requirement that it expose state in a form that directly represents the
> internal state of the device.  In fact I'm not sure we want to encourage
> introspection via this data stream.  If a user knows how to interpret
> the data, what prevents them from modifying the data in-flight?  I've
> raised the question previously regarding how the vendor driver can
> validate the integrity of the migration data stream.  Using the
> migration interface to introspect the device certainly suggests an
> interface ripe for exploiting any potential weakness in the vendor
> driver reassembling that migration stream.  If the user has an mmap to
> the actual live working state of the vendor driver, protection in the
> hardware seems like the only way you could protect against a malicious
> user.  Please be defensive in what is directly exposed to the user and
> what safeguards are in place within the vendor driver for validating
> incoming data.  Thanks,

Hmm; that sounds like a security-by-obscurity answer!
The scripts/analyze-migration.py script will actually dump the
migration stream data in an almost readable format.
So if you properly define the VMState definitions it should be almost
readable; it has occasionally been useful.

I agree that you should be very, very careful to validate the incoming
migration stream against:
  a) Corruption
  b) Wrong driver versions
  c) Malicious intent
  c.1) Especially by the guest
  c.2) Or by someone trying to feed you a duff stream
  d) Someone trying to load the VFIO stream into completely the wrong
     device

As to whether the migration interface is the right thing to use for that
inspection, hmm; well, it might be - if you're trying to debug your
device and need a dump of its state, then why not?  (I guess you end up
with something not dissimilar to what tools like intel_reg_snapshot in
intel-gpu-tools produce.)

Dave

> Alex

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
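[Editor's note: the kind of defensive header check that Dave's validation
list argues for might look like the sketch below. The struct, field names,
and magic value are all hypothetical, not from the proposed interface; note
also that a plain checksum only catches accidental corruption - a malicious
sender can simply recompute it, so the malicious cases still require
per-field validation inside the vendor driver.]

```c
/* Hypothetical incoming-stream validation: a per-device magic guards
 * against loading the stream into the wrong device (case d), a version
 * field rejects mismatched driver versions (case b), and a checksum
 * catches accidental corruption (case a).  Nothing here is real
 * QEMU/VFIO code. */
#include <stddef.h>
#include <stdint.h>

#define STREAM_MAGIC   0x56464954u  /* hypothetical per-device magic */
#define STREAM_VERSION 1u

struct stream_header {
    uint32_t magic;        /* identifies the device type the stream is for */
    uint32_t version;      /* producing driver's stream format version */
    uint32_t payload_len;  /* expected payload length in bytes */
    uint32_t checksum;     /* additive checksum of the payload */
};

static uint32_t checksum32(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len--)
        sum += *p++;
    return sum;
}

/* Returns 0 if acceptable, a distinct negative code per failure mode. */
static int validate_stream(const struct stream_header *h,
                           const uint8_t *payload, size_t payload_len)
{
    if (h->magic != STREAM_MAGIC)
        return -1;   /* stream meant for a different device entirely */
    if (h->version != STREAM_VERSION)
        return -2;   /* wrong driver version */
    if (h->payload_len != payload_len)
        return -3;   /* truncated or padded stream */
    if (h->checksum != checksum32(payload, payload_len))
        return -4;   /* corrupted in flight */
    return 0;
}
```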