Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
From: Yang Zhang
Date: Thu, 10 Dec 2015 19:28:01 +0800
To: "Dr. David Alan Gilbert", "Lan, Tianyu"
Cc: lcapitulino@redhat.com, alex.williamson@redhat.com, emil.s.tantilov@intel.com,
    kvm@vger.kernel.org, ard.biesheuvel@linaro.org, aik@ozlabs.ru,
    donald.c.skidmore@intel.com, "Michael S. Tsirkin", eddie.dong@intel.com,
    qemu-devel@nongnu.org, agraf@suse.de, blauwirbel@gmail.com,
    quintela@redhat.com, nrupal.jani@intel.com, kraxel@redhat.com,
    anthony@codemonkey.ws, cornelia.huck@de.ibm.com, pbonzini@redhat.com,
    mark.d.rustad@intel.com, amit.shah@redhat.com, gerlitz.or@gmail.com

On 2015/12/10 18:18, Dr. David Alan Gilbert wrote:
> * Lan, Tianyu (tianyu.lan@intel.com) wrote:
>> On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
>>> I thought about what this is doing at the high level, and I do see some
>>> value in what you are trying to do, but I also think we need to clarify
>>> the motivation a bit more. What you are saying is not really what the
>>> patches are doing.
>>>
>>> And with that clearer understanding of the motivation in mind (assuming
>>> it actually captures a real need), I would also like to suggest some
>>> changes.
>>
>> Motivation:
>> Most current solutions for migration with a passthrough device are based
>> on PCI hotplug, but that has side effects and can't work for all devices.
>>
>> For NIC devices:
>> The PCI hotplug solution can work around network device migration by
>> switching between the VF and the PF.
>>
>> But switching network interfaces introduces service downtime.
>>
>> I tested the service downtime by putting the VF and PV interfaces into a
>> bonded interface and pinging the bonded interface while plugging and
>> unplugging the VF:
>> 1) About 100ms when adding the VF
>> 2) About 30ms when removing the VF
>>
>> It also requires the guest to do the switching configuration. These are
>> hard to manage and deploy for our customers. To maintain PV performance
>> during migration, the host side also needs to assign a VF to the PV
>> device, which affects scalability.
>>
>> These factors block SR-IOV NIC passthrough usage in cloud services and
>> OPNFV, which require high network performance and stability.
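(As a side note, for anyone who wants to reproduce this kind of downtime
measurement: the sketch below shows one way to record the longest service gap
seen through the bonded interface while the VF is hot-unplugged and re-plugged.
It is only an illustration under assumed values - the guest address, port, and
timings are made up here, and the test described above used ICMP ping against
the bond rather than TCP probes.)

#!/usr/bin/env python3
"""Hypothetical downtime probe: report the longest gap between successful
reachability checks against the guest's bonded interface while the VF is
hot-unplugged and re-plugged. Address, port, and timings are assumptions
for illustration only, not values from this thread."""

import socket
import time

TARGET = ("192.0.2.10", 22)   # assumed guest address and an assumed open TCP port
PROBE_INTERVAL = 0.01         # probe every 10 ms
PROBE_TIMEOUT = 0.05          # treat >50 ms connect time as a failed probe


def probe(addr, timeout=PROBE_TIMEOUT):
    """Return True if a TCP connect to addr succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def worst_gap(duration_s=60.0):
    """Probe in a loop for duration_s seconds and return the longest gap
    (in seconds) between two successful probes."""
    start = time.monotonic()
    last_ok = start
    worst = 0.0
    while time.monotonic() - start < duration_s:
        if probe(TARGET):
            now = time.monotonic()
            worst = max(worst, now - last_ok)
            last_ok = now
        time.sleep(PROBE_INTERVAL)
    return worst


if __name__ == "__main__":
    # Hot-remove and re-add the VF (e.g. via QMP device_del/device_add)
    # while this runs; the printed value approximates the service
    # interruption seen through the bonded interface.
    print("worst gap between successful probes: %.0f ms" % (worst_gap() * 1000))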
>
> Right, I'll agree that it's hard to do migration of a VM which uses
> an SR-IOV device; and while I think it should be possible to bond a virtio
> device to a VF for networking and then hotplug the SR-IOV device, I agree
> it's hard to manage.
>
>> For other kinds of devices, this is hard to make work.
>> We are also adding migration support for the QAT (QuickAssist Technology)
>> device.
>>
>> QAT device use case introduction:
>> Server, networking, big data, and storage applications use QuickAssist
>> Technology to offload servers from handling compute-intensive operations,
>> such as:
>> 1) Symmetric cryptography functions, including cipher operations and
>> authentication operations
>> 2) Public key functions, including RSA, Diffie-Hellman, and elliptic curve
>> cryptography
>> 3) Compression and decompression functions, including DEFLATE and LZS
>>
>> PCI hotplug will not work for such devices during migration, and these
>> operations will fail when the device is unplugged.
>
> I don't understand that QAT argument; if the device is purely an offload
> engine for performance, then why can't you fall back to doing the
> same operations in the VM or in QEMU if the card is unavailable?
> The tricky bit is dealing with outstanding operations.
>
>> So we are trying to implement a new solution which really migrates the
>> device state to the target machine and won't affect the user during
>> migration, with low service downtime.
>
> Right, that's a good aim - the only question is how to do it.
>
> It looks like this is always going to need some device-specific code;
> the question I see is whether that's in:
> 1) qemu
> 2) the host kernel
> 3) the guest kernel driver
>
> The objections to this series seem to be that it needs changes to (3);
> I can see the worry that the guest kernel driver might not get a chance
> to run at the right time during migration, and it's painful having to
> change every guest driver (although your change is small).
>
> My question is: at what stage of the migration process do you expect to
> tell the guest kernel driver to do this?
>
> If you do it at the start of the migration and quiesce the device,
> the migration might take a long time (say 30 minutes) - are you
> intending the device to be quiesced for that long? And where are
> you going to send the traffic?
> If you are, then do you need to do it via this PCI trick, or could
> you just do it via something higher level to quiesce the device?
>
> Or are you intending to do it just near the end of the migration?
> But then how do we know how long it will take the guest driver to
> respond?

Ideally, we would be able to leave the guest driver unmodified, but that
requires the hypervisor or QEMU to be aware of the device, which means we
may need a driver in the hypervisor or QEMU to handle the device on behalf
of the guest driver.

>
> It would be great if we could avoid changing the guest; but at least your
> guest driver changes don't actually seem to be that hardware specific;
> could your changes actually be moved to the generic PCI level so they
> could be made to work for lots of drivers?

It is impossible to use one common solution for all devices unless the PCIe
spec documents it clearly, and I think one day it will get there. But before
that, we need some workarounds in the guest driver to make it work, even if
they look ugly.

--
best regards
yang