From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Zhang
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Thu, 10 Dec 2015 21:07:32 +0800
Message-ID: <56697914.6090605@gmail.com>
References: <1448372127-28115-1-git-send-email-tianyu.lan@intel.com>
 <20151207165039.GA20210@redhat.com> <56685631.50700@intel.com>
 <20151210101840.GA2570@work-vm> <566961C1.6030000@gmail.com>
 <20151210114114.GE2570@work-vm>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "Lan, Tianyu" , "Michael S. Tsirkin" , qemu-devel@nongnu.org,
 emil.s.tantilov@intel.com, kvm@vger.kernel.org, ard.biesheuvel@linaro.org,
 aik@ozlabs.ru, donald.c.skidmore@intel.com, quintela@redhat.com,
 eddie.dong@intel.com, nrupal.jani@intel.com, agraf@suse.de,
 blauwirbel@gmail.com, cornelia.huck@de.ibm.com, alex.williamson@redhat.com,
 kraxel@redhat.com, anthony@codemonkey.ws, amit.shah@redhat.com,
 pbonzini@redhat.com, mark.d.rustad@intel.com, lcapitulino@redhat.com,
 gerlitz.or@gmail.com
To: "Dr. David Alan Gilbert"
Return-path:
Received: from mail-pa0-f49.google.com ([209.85.220.49]:36762 "EHLO
 mail-pa0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1750977AbbLJNHj (ORCPT ); Thu, 10 Dec 2015 08:07:39 -0500
Received: by pacdm15 with SMTP id dm15so47646779pac.3 for ;
 Thu, 10 Dec 2015 05:07:39 -0800 (PST)
In-Reply-To: <20151210114114.GE2570@work-vm>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 2015/12/10 19:41, Dr. David Alan Gilbert wrote:
> * Yang Zhang (yang.zhang.wz@gmail.com) wrote:
>> On 2015/12/10 18:18, Dr. David Alan Gilbert wrote:
>>> * Lan, Tianyu (tianyu.lan@intel.com) wrote:
>>>> On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
>>>>> I thought about what this is doing at the high level, and I do see
>>>>> some value in what you are trying to do, but I also think we need to
>>>>> clarify the motivation a bit more. What you are saying is not really
>>>>> what the patches are doing.
>>>>>
>>>>> And with that clearer understanding of the motivation in mind (assuming
>>>>> it actually captures a real need), I would also like to suggest some
>>>>> changes.
>>>>
>>>> Motivation:
>>>> Most current solutions for migration with a passthrough device are
>>>> based on PCI hotplug, but that has side effects and can't work for
>>>> all devices.
>>>>
>>>> For NIC devices:
>>>> The PCI hotplug solution can work around network device migration
>>>> by switching between the VF and PF.
>>>>
>>>> But switching network interfaces will introduce service downtime.
>>>>
>>>> I tested the service downtime by putting the VF and PV interfaces
>>>> into a bonded interface and pinging the bonded interface while
>>>> plugging and unplugging the VF:
>>>> 1) About 100ms when adding the VF
>>>> 2) About 30ms when deleting the VF
>>>>
>>>> It also requires the guest to do the switching configuration. These
>>>> steps are hard for our customers to manage and deploy. To maintain PV
>>>> performance during migration, the host side also needs to assign a VF
>>>> to the PV device. This affects scalability.
>>>>
>>>> These factors block SR-IOV NIC passthrough usage in cloud services and
>>>> OPNFV, which require high network performance and stability.
>>>
>>> Right, I'll agree that it's hard to do migration of a VM which uses
>>> an SR-IOV device; and while I think it should be possible to bond a
>>> virtio device to a VF for networking and then hotplug the SR-IOV
>>> device, I agree it's hard to manage.
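
(A note for anyone who wants to reproduce the downtime numbers above: the
short Python sketch below measures the longest service gap seen from inside
the guest while the VF is plugged and unplugged. It is only an illustrative
stand-in for the ping test Tianyu describes, not the script that was
actually used; the port number and the 5 ms probe interval are arbitrary
example values.)

#!/usr/bin/env python3
"""Sketch: measure service downtime across a VF hot-add/hot-remove.

Run "probe.py server" on a peer host and "probe.py client <peer-ip>"
inside the guest, then plug/unplug the VF while the client is running.
"""
import socket
import sys
import time

PORT = 9999          # arbitrary example port
INTERVAL = 0.005     # send a probe every 5 ms
TIMEOUT = 0.005      # consider a probe lost if no echo within 5 ms


def server():
    """Echo every datagram back to its sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    while True:
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def client(peer):
    """Send probes and report the longest gap between successful echoes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(TIMEOUT)
    last_ok = None
    worst_gap = 0.0
    seq = 0
    try:
        while True:
            seq += 1
            try:
                sock.sendto(str(seq).encode(), (peer, PORT))
                sock.recvfrom(64)
                now = time.monotonic()
                if last_ok is not None:
                    worst_gap = max(worst_gap, now - last_ok)
                last_ok = now
            except OSError:
                pass   # probe lost (timeout or link down); gap keeps growing
            time.sleep(INTERVAL)
    except KeyboardInterrupt:
        print("worst observed gap: %.1f ms" % (worst_gap * 1000))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: probe.py server | probe.py client <peer-ip>")
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])

Run the "server" side on a peer host and the "client" side inside the guest
over the bonded interface; UDP probes are used instead of ICMP only so that
it can run unprivileged.
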
>>>> For other kinds of devices, it's hard to make this work.
>>>> We are also adding migration support for the QAT (QuickAssist
>>>> Technology) device.
>>>>
>>>> QAT device use case introduction:
>>>> Server, networking, big data, and storage applications use QuickAssist
>>>> Technology to offload servers from handling compute-intensive
>>>> operations, such as:
>>>> 1) Symmetric cryptography functions, including cipher operations and
>>>> authentication operations
>>>> 2) Public key functions, including RSA, Diffie-Hellman, and elliptic
>>>> curve cryptography
>>>> 3) Compression and decompression functions, including DEFLATE and LZS
>>>>
>>>> PCI hotplug will not work for such devices during migration, and these
>>>> operations will fail when the device is unplugged.
>>>
>>> I don't understand that QAT argument; if the device is purely an offload
>>> engine for performance, then why can't you fall back to doing the
>>> same operations in the VM or in QEMU if the card is unavailable?
>>> The tricky bit is dealing with outstanding operations.
>>>
>>>> So we are trying to implement a new solution which really migrates
>>>> device state to the target machine and won't affect the user during
>>>> migration, with low service downtime.
>>>
>>> Right, that's a good aim - the only question is how to do it.
>>>
>>> It looks like this is always going to need some device-specific code;
>>> the question I see is whether that's in:
>>>    1) qemu
>>>    2) the host kernel
>>>    3) the guest kernel driver
>>>
>>> The objections to this series seem to be that it needs changes to (3);
>>> I can see the worry that the guest kernel driver might not get a chance
>>> to run at the right time during migration, and it's painful having to
>>> change every guest driver (although your change is small).
>>>
>>> My question is: at what stage of the migration process do you expect to
>>> tell the guest kernel driver to do this?
>>>
>>> If you do it at the start of the migration, and quiesce the device,
>>> the migration might take a long time (say 30 minutes) - are you
>>> intending the device to be quiesced for this long? And where are
>>> you going to send the traffic?
>>> If you are, then do you need to do it via this PCI trick, or could
>>> you just do it via something higher level to quiesce the device?
>>>
>>> Or are you intending to do it just near the end of the migration?
>>> But then how do we know how long it will take the guest driver to
>>> respond?
>>
>> Ideally, we would be able to leave the guest driver unmodified, but that
>> requires the hypervisor or QEMU to be aware of the device, which means
>> we may need a driver in the hypervisor or QEMU to handle the device on
>> behalf of the guest driver.
>
> Can you answer the question of when you use your code -
> at the start of migration or
> just before the end?

Tianyu can answer this question. In my initial design, I preferred to put
more of the modifications in the hypervisor and QEMU, and the only
involvement from the guest driver was how to restore the state after
migration. But I don't know the later implementation, since I have left
Intel.

>
>>> It would be great if we could avoid changing the guest; but at least
>>> your guest driver changes don't actually seem to be that hardware
>>> specific; could your changes actually be moved to a generic PCI level
>>> so they could be made to work for lots of drivers?
>>
>> It is impossible to use one common solution for all devices unless the
>> PCIe spec documents it clearly, and I think one day it will get there.
>> But before that, we need some workarounds in the guest driver to make
>> it work, even if it looks ugly.
>
> Dave
>
>>
>
>> --
>> best regards
>> yang
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>

--
best regards
yang