From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50697)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1a7IBe-0002Ii-Ne for qemu-devel@nongnu.org;
 Fri, 11 Dec 2015 02:32:19 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1a7IBb-0006zM-HS for qemu-devel@nongnu.org;
 Fri, 11 Dec 2015 02:32:18 -0500
Received: from mga03.intel.com ([134.134.136.65]:13369)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1a7IBb-0006zB-8f for qemu-devel@nongnu.org;
 Fri, 11 Dec 2015 02:32:15 -0500
References: <1448372127-28115-1-git-send-email-tianyu.lan@intel.com>
 <20151207165039.GA20210@redhat.com> <56685631.50700@intel.com>
 <20151210101840.GA2570@work-vm> <566961C1.6030000@gmail.com>
 <20151210114114.GE2570@work-vm> <56698E68.5040207@intel.com>
 <20151210175849-mutt-send-email-mst@redhat.com>
From: "Lan, Tianyu"
Message-ID: <566A7BF4.1020704@intel.com>
Date: Fri, 11 Dec 2015 15:32:04 +0800
MIME-Version: 1.0
In-Reply-To: <20151210175849-mutt-send-email-mst@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
To: "Michael S. Tsirkin"
Cc: Yang Zhang, emil.s.tantilov@intel.com, kvm@vger.kernel.org,
 aik@ozlabs.ru, qemu-devel@nongnu.org, lcapitulino@redhat.com,
 blauwirbel@gmail.com, kraxel@redhat.com, mark.d.rustad@intel.com,
 quintela@redhat.com, donald.c.skidmore@intel.com, agraf@suse.de,
 gerlitz.or@gmail.com, "Dr. David Alan Gilbert",
 alex.williamson@redhat.com, anthony@codemonkey.ws,
 cornelia.huck@de.ibm.com, ard.biesheuvel@linaro.org,
 eddie.dong@intel.com, nrupal.jani@intel.com, amit.shah@redhat.com,
 pbonzini@redhat.com

On 12/11/2015 12:11 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 10, 2015 at 10:38:32PM +0800, Lan, Tianyu wrote:
>>
>>
>> On 12/10/2015 7:41 PM, Dr.
>> David Alan Gilbert wrote:
>>>> Ideally, it is able to leave guest driver unmodified but it requires the
>>>>> hypervisor or qemu to aware the device which means we may need a driver in
>>>>> hypervisor or qemu to handle the device on behalf of guest driver.
>>> Can you answer the question of when do you use your code -
>>> at the start of migration or
>>> just before the end?
>>
>> Just before stopping VCPU in this version and inject VF mailbox irq to
>> notify the driver if the irq handler is installed.
>> Qemu side also will check this via the faked PCI migration capability
>> and driver will set the status during device open() or resume() callback.
>
> Right, this is the "good path" optimization. Whether this buys anything
> as compared to just sending reset to the device when VCPU is stopped
> needs to be measured. In any case, we probably do need a way to
> interrupt driver on destination to make it reconfigure the device -
> otherwise it might take seconds for it to notice. And a way to make
> sure driver can handle this surprise reset so we can block migration if
> it can't.
>

Yes, we need such a way to notify the driver about migration status and
to do a reset or restore operation on the destination machine. My
original design is to take advantage of the device's irq to do that.
The driver can tell Qemu which irq it prefers to use for this task and
whether that irq is enabled or bound to a handler. We may discuss the
details in the other thread.

>>>
>>>>>>> It would be great if we could avoid changing the guest; but at least your guest
>>>>>>> driver changes don't actually seem to be that hardware specific; could your
>>>>>>> changes actually be moved to generic PCI level so they could be made
>>>>>>> to work for lots of drivers?
>>>>>
>>>>> It is impossible to use one common solution for all devices unless the PCIE
>>>>> spec documents it clearly and i think one day it will be there.
>>>>> But before that, we need some workarounds on guest driver to make it
>>>>> work even it looks ugly.
>>
>> Yes, so far there is not hardware migration support
>
> VT-D supports setting dirty bit in the PTE in hardware.

Actually, current hardware doesn't support this. The VT-d spec
documents the dirty bit for first-level translation, which requires
devices to support DMA requests with PASID (process address space
identifier). Most devices don't support that feature.

>
>> and it's hard to modify
>> bus level code.
>
> Why is it hard?

As Yang said, the concern is that the PCI spec doesn't document how to
do migration.

>
>> It also will block implementation on the Windows.
>
> Implementation of what? We are discussing motivation here, not
> implementation. E.g. windows drivers typically support surprise
> removal, should you use that, you get some working code for free. Just
> stop worrying about it. Make it work, worry about closed source
> software later.
>
>>> Dave
>>>