From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51286)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YkmIW-00038N-Gr for qemu-devel@nongnu.org;
    Wed, 22 Apr 2015 00:30:07 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1YkmCF-0004dP-TE for qemu-devel@nongnu.org;
    Wed, 22 Apr 2015 00:23:40 -0400
Received: from [59.151.112.132] (port=31290 helo=heian.cn.fujitsu.com)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YkmCC-0004KH-SH for qemu-devel@nongnu.org;
    Wed, 22 Apr 2015 00:23:35 -0400
Message-ID: <55372212.7030108@cn.fujitsu.com>
Date: Wed, 22 Apr 2015 12:22:42 +0800
From: Chen Fan
MIME-Version: 1.0
References: <55342C2E.9040804@redhat.com>
In-Reply-To: <55342C2E.9040804@redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
To: Laine Stump, libvir-list@redhat.com
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org

Hi Laine,

Thanks for reviewing my patches.

Do you know whether SolarFlare has posted an updated version of their
patches since
https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html ?

If not, I would like to go on and complete this work. ;)

Thanks,
Chen

On 04/20/2015 06:29 AM, Laine Stump wrote:
> On 04/17/2015 04:53 AM, Chen Fan wrote:
>> background:
>> Live migration is one of the most important features of virtualization technology.
>> With regard to recent virtualization techniques, performance of network I/O is critical.
>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
>> performance gap with native network I/O. Pass-through network devices have near
>> native performance; however, they have thus far prevented live migration.
>> No existing methods solve the problem of live migration with pass-through devices perfectly.
>>
>> An idea for solving the problem is described at:
>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>> Please refer to the above document for detailed information.
> This functionality has been on my mind/bug list for a long time, but I
> haven't been able to pursue it much. See this BZ, along with the
> original patches submitted by Shradha Shah from SolarFlare:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896716
>
> (I was a bit optimistic in my initial review of the patches - there are
> actually a lot of issues that weren't handled by those patches.)
>
>> So I think this problem could perhaps be solved by combining existing
>> technologies. These are the steps we are considering implementing:
>>
>> - before booting the VM, we specify two NICs in the XML for creating a bonding device
>> (one plugged NIC and one virtual NIC). Here we can specify the NICs' MAC addresses
>> in the XML, which helps qemu-guest-agent find the network interfaces in the guest.
> An interesting idea, but I think that is a 2nd level enhancement, not
> necessary initially (and maybe not ever, due to the high possibility of
> it being extremely difficult to get right in 100% of the cases).
>
>> - when qemu-guest-agent starts up in the guest, it sends a notification to libvirt,
>> and libvirt then calls the previously registered initialization callbacks. Through
>> those callback functions, we can create the bonding device according to the XML
>> configuration. Here we use the netcf tool, which makes it easy to create a bonding
>> device.
> This isn't quite making sense - the bond will be on the guest, which may
> not have netcf installed. Anyway, I think it should be up to the guest's
> own system network config to have the bond already set up.
> If you try to
> impose it from outside that infrastructure, you run too much risk of
> running afoul of something on the guest (e.g. NetworkManager).
>
>
>> - during migration, unplug the passed-through NIC, then do a native migration.
> Correct. This is the most important part. But not just unplugging it,
> you also need to wait until the unplug operation completes (it is
> asynchronous). (After this point, the emulated NIC that is part of the
> bond would get all of the traffic).
>
>> - on the destination side, check whether a new NIC needs to be hotplugged according to the
>> specified XML. Usually, we use the migrate "--xml" command option to specify the destination
>> host NIC MAC address for hotplugging a new NIC, because the source-side passthrough NIC MAC
>> address is different, then hotplug the device according to the destination XML configuration.
> Why does the MAC address need to be different? Are you suggesting doing
> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> its MAC address from the libvirt config, so it's very simple to use the
> same MAC address across the migration. Any network card that would be
> able to do this on any sort of useful scale will be SRIOV-capable (or
> should be replaced with one that is - some of them are not that expensive).
>
>
>> TODO:
>> 1. when a new NIC is hot-added on the destination side after migration has finished, the
>> NIC device needs to be re-enslaved to the bonding device in the guest; otherwise, it is
>> offline. Maybe we should consider having the bonding driver support adding interfaces
>> dynamically.
> I never looked at the details of how SolarFlare's code handled the guest
> side (they have/had their own patchset they maintained for some older
> version of libvirt which integrated with some sort of enhanced bonding
> driver on the guests). I assumed the bond driver could handle this
> already, but have to say I never investigated.
>
>
>> This is an example of how this might work, so I would like to hear opinions about this scenario.
>>
>> Thanks,
>> Chen
>>
>> Chen Fan (7):
>>   qemu-agent: add agent init callback when detecting guest setup
>>   qemu: add guest init event callback to do the initialize work for
>>     guest
>>   hostdev: add a 'bond' type element in <hostdev> element
>
> Putting this into <hostdev> is the wrong approach, for two reasons: 1)
> it doesn't account for the device to be used being in a different
> address on the source and destination hosts, 2) the <interface> element
> already has much of the config you need, and an interface type
> supporting hostdev passthrough.
>
> It has been possible to do passthrough of an SRIOV VF via
> <interface type='hostdev'> for a long time now and, even better, via an
> <interface type='network'> where the network pointed to contains a pool
> of VFs. As long as the source and destination hosts both have networks
> with the same name, libvirt will be able to find a currently available
> device on the destination as it migrates from one host to another instead
> of relying on both hosts having the exact same device at the exact same
> address on the host and destination (and also magically unused by any
> other guest). This page explains the use of a "hostdev network" which
> has a pool of devices:
>
> http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition
>
> This was designed specifically with the idea in mind that one day it
> would be possible to migrate a domain with a hostdev device (as long as
> the guest could handle the hostdev device being temporarily unplugged
> during the migration).
>
>>   qemu-agent: add qemuAgentCreateBond interface
>>   hostdev: add parse ip and route for bond configure
> Again, I think that this level of detail about the guest network config
> belongs on the guest, not in libvirt.
>
>>   migrate: hot remove hostdev at perform phase for bond device
> ^^ this is the useful part but I don't think the right method is to make
> this action dependent on the device being a "bond".
>
> I think that in this respect Shradha's patches had a better idea - any
> <hostdev> (or, by implication <interface type='hostdev'> or, much more
> usefully <interface type='network'> pointing to a pool of VFs) could
> have an attribute "ephemeral". If ephemeral was "yes", then the device
> would always be unplugged prior to migration and re-plugged when
> migration was completed (the same thing should be done when
> saving/restoring a domain which also can't currently be done with a
> domain that has a passthrough device).
>
> For that matter, this could be a general-purpose thing (although
> probably most useful for hostdevs) - just make it possible for *any*
> hotpluggable device to be "ephemeral"; the meaning of this would be that
> every device marked as ephemeral should be unplugged prior to migration
> or save (and libvirt should wait for qemu to notify that the unplug is
> completed), and re-plugged right after the guest is restarted.
>
> (possibly it should be implemented as an <ephemeral> *element* rather
> than attribute, so that options could be specified).
>
> After that is implemented and works properly, then it might be the time
> to think about auto-creating the bond (although again, my opinion is
> that this is getting a bit too intrusive into the guest (and making it
> more likely to fail - I know from long experience with netcf that it is
> all too easy for some other service on the system (ahem) to mess up all
> your hard work); I think it would be better to just let the guest deal
> with setting up a bond in its system network config, and if the bond
> driver can't handle having a device in the bond unplugging and plugging,
> then the bond driver should be enhanced).
>
>
>>   migrate: add hostdev migrate status to support hostdev migration
>>
>>  docs/schemas/basictypes.rng   |   6 ++
>>  docs/schemas/domaincommon.rng |  37 ++++++++
>>  src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
>>  src/conf/domain_conf.h        |  40 +++++++--
>>  src/conf/networkcommon_conf.c |  17 ----
>>  src/conf/networkcommon_conf.h |  17 ++++
>>  src/libvirt_private.syms      |   1 +
>>  src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
>>  src/qemu/qemu_agent.h         |  12 +++
>>  src/qemu/qemu_command.c       |   3 +
>>  src/qemu/qemu_domain.c        |  70 +++++++++++++++
>>  src/qemu/qemu_domain.h        |  14 +++
>>  src/qemu/qemu_driver.c        |  38 ++++++++
>>  src/qemu/qemu_hotplug.c       |   8 +-
>>  src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
>>  src/qemu/qemu_migration.h     |   4 +
>>  src/qemu/qemu_process.c       |  32 +++++++
>>  src/util/virhostdev.c         |   3 +
>>  18 files changed, 745 insertions(+), 39 deletions(-)
>>
> .
>
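
For illustration only, the ordering discussed in this thread for an "ephemeral" device (request the unplug, wait for the asynchronous unplug to complete, migrate or save, then re-plug) could be sketched roughly as below. This is a toy model in Python, not libvirt or QEMU code: Device, FakeDomain, unplug_async and migrate_with_ephemeral are hypothetical names invented for this sketch, and a timer merely stands in for the guest releasing the device.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a hotpluggable guest device."""
    name: str
    ephemeral: bool = False


@dataclass
class FakeDomain:
    """Toy model of a guest; this is NOT libvirt's API."""
    devices: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def unplug_async(self, dev, done):
        # The unplug request returns immediately; the device only
        # disappears later. A short timer models that delay.
        def finish():
            self.devices.remove(dev)
            self.log.append(f"removed {dev.name}")
            done.set()
        threading.Timer(0.05, finish).start()

    def plug(self, dev):
        self.devices.append(dev)
        self.log.append(f"plugged {dev.name}")


def migrate_with_ephemeral(dom, do_migrate, timeout=5.0):
    """Unplug ephemeral devices, wait for completion, migrate, re-plug."""
    ephemeral = [d for d in dom.devices if d.ephemeral]
    pending = []
    for dev in ephemeral:
        done = threading.Event()
        dom.unplug_async(dev, done)
        pending.append((dev, done))
    for dev, done in pending:
        # Requesting the unplug is not enough: block until it completed.
        if not done.wait(timeout):
            raise TimeoutError(f"unplug of {dev.name} did not complete")
    do_migrate(dom)
    for dev in ephemeral:
        dom.plug(dev)


dom = FakeDomain(devices=[Device("virtio-net0"), Device("vf0", ephemeral=True)])
migrate_with_ephemeral(dom, lambda d: d.log.append("migrated"))
print(dom.log)
```

In this model the log records the unplug completing before the migration step and the re-plug after it, while the emulated NIC stays attached throughout. A real implementation would have to wait on the hypervisor's device-removal notification rather than a timer, and would pick the re-plugged device from whatever is available on the destination.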