From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
To: Laine Stump <laine@redhat.com>, libvir-list@redhat.com
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
Date: Thu, 23 Apr 2015 16:34:13 +0800
Message-ID: <5538AE85.9000805@cn.fujitsu.com>
In-Reply-To: <55342C2E.9040804@redhat.com>
On 04/20/2015 06:29 AM, Laine Stump wrote:
> On 04/17/2015 04:53 AM, Chen Fan wrote:
>> background:
>> Live migration is one of the most important features of virtualization technology.
>> With regard to recent virtualization techniques, performance of network I/O is critical.
>> Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
>> performance gap with native network I/O. Pass-through network devices have near
>> native performance, however, they have thus far prevented live migration. No existing
>> methods solve the problem of live migration with pass-through devices perfectly.
>>
>> There was an idea to solve the problem in website:
>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>> Please refer to above document for detailed information.
> This functionality has been on my mind/bug list for a long time, but I
> haven't been able to pursue it much. See this BZ, along with the
> original patches submitted by Shradha Shah from SolarFlare:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896716
>
> (I was a bit optimistic in my initial review of the patches - there are
> actually a lot of issues that weren't handled by those patches.)
>
>> So I think this problem maybe could be solved by using the combination of existing
>> technology. and the following steps are we considering to implement:
>>
>> - before boot VM, we anticipate to specify two NICs for creating bonding device
>> (one plugged and one virtual NIC) in XML. here we can specify the NIC's mac addresses
>> in XML, which could facilitate qemu-guest-agent to find the network interfaces in guest.
> An interesting idea, but I think that is a 2nd level enhancement, not
> necessary initially (and maybe not ever, due to the high possibility of
> it being extremely difficult to get right in 100% of the cases).
>
>> - when qemu-guest-agent startup in guest it would send a notification to libvirt,
>> then libvirt will call the previous registered initialize callbacks. so through
>> the callback functions, we can create the bonding device according to the XML
>> configuration. and here we use netcf tool which can facilitate to create bonding device
>> easily.
> This isn't quite making sense - the bond will be on the guest, which may
> not have netcf installed. Anyway, I think it should be up to the guest's
> own system network config to have the bond already setup. If you try to
> impose it from outside that infrastructure, you run too much risk of
> running afoul of something on the guest (e.g. NetworkManager)
>
>
>> - during migration, unplug the passthroughed NIC. then do native migration.
> Correct. This is the most important part. But not just unplugging it,
> you also need to wait until the unplug operation completes (it is
> asynchronous). (After this point, the emulated NIC that is part of the
> bond would get all of the traffic).
>
>> - on destination side, check whether need to hotplug new NIC according to specified XML.
>> usually, we use migrate "--xml" command option to specify the destination host NIC mac
>> address to hotplug a new NIC, because source side passthrough NIC mac address is different,
>> then hotplug the device according to the destination XML configuration.
> Why does the MAC address need to be different? Are you suggesting doing
> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> its MAC address from the libvirt config, so it's very simple to use the
> same MAC address across the migration. Any network card that would be
> able to do this on any sort of useful scale will be SRIOV-capable (or
> should be replaced with one that is - some of them are not that expensive).
Hi Laine,
I think supporting migration for SRIOV virtual functions is a good idea,
but some passthrough NICs are not SRIOV-capable. For those devices we
can only use <hostdev> to specify the passthrough function, so I think
we should support them too.
Thanks,
Chen
>
>
>> TODO:
>> 1. when hot add a new NIC in destination side after migration finished, the NIC device
>> need to re-enslave on bonding device in guest. otherwise, it is offline. maybe
>> we should consider bonding driver to support add interfaces dynamically.
> I never looked at the details of how SolarFlare's code handled the guest
> side (they have/had their own patchset they maintained for some older
> version of libvirt which integrated with some sort of enhanced bonding
> driver on the guests). I assumed the bond driver could handle this
> already, but have to say I never investigated.
>
>
>> This is an example on how this might work, so I want to hear some voices about this scenario.
>>
>> Thanks,
>> Chen
>>
>> Chen Fan (7):
>> qemu-agent: add agent init callback when detecting guest setup
>> qemu: add guest init event callback to do the initialize work for
>> guest
>> hostdev: add a 'bond' type element in <hostdev> element
>
> Putting this into <hostdev> is the wrong approach, for two reasons: 1)
> it doesn't account for the device to be used being in a different
> address on the source and destination hosts, 2) the <interface> element
> already has much of the config you need, and an interface type
> supporting hostdev passthrough.
>
> It has been possible to do passthrough of an SRIOV VF via <interface
> type='hostdev'> for a long time now and, even better, via an <interface
> type='network'> where the network pointed to contains a pool of VFs - As
> long as the source and destination hosts both have networks with the
> same name, libvirt will be able to find a currently available device on
> the destination as it migrates from one host to another instead of
> relying on both hosts having the exact same device at the exact same
> address on the host and destination (and also magically unused by any
> other guest). This page explains the use of a "hostdev network" which
> has a pool of devices:
>
> http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition
>
> This was designed specifically with the idea in mind that one day it
> would be possible to migrate a domain with a hostdev device (as long as
> the guest could handle the hostdev device being temporarily unplugged
> during the migration).
>
>> qemu-agent: add qemuAgentCreateBond interface
>> hostdev: add parse ip and route for bond configure
> Again, I think that this level of detail about the guest network config
> belongs on the guest, not in libvirt.
>
>> migrate: hot remove hostdev at perform phase for bond device
> ^^ this is the useful part but I don't think the right method is to make
> this action dependent on the device being a "bond".
>
> I think that in this respect Shradha's patches had a better idea - any
> hostdev (or, by implication <interface type='hostdev'> or, much more
> usefully <interface type='network'> pointing to a pool of VFs - could
> have an attribute "ephemeral". If ephemeral was "yes", then the device
> would always be unplugged prior to migration and re-plugged when
> migration was completed (the same thing should be done when
> saving/restoring a domain which also can't currently be done with a
> domain that has a passthrough device).
>
> For that matter, this could be a general-purpose thing (although
> probably most useful for hostdevs) - just make it possible for *any*
> hotpluggable device to be "ephemeral"; the meaning of this would be that
> every device marked as ephemeral should be unplugged prior to migration
> or save (and libvirt should wait for qemu to notify that the unplug is
> completed), and re-plugged right after the guest is restarted.
>
> (possibly it should be implemented as an <ephemeral> *element* rather
> than attribute, so that options could be specified).
>
> After that is implemented and works properly, then it might be the time
> to think about auto-creating the bond (although again, my opinion is
> that this is getting a bit too intrusive into the guest (and making it
> more likely to fail - I know from long experience with netcf that it is
> all too easy for some other service on the system (ahem) to mess up all
> your hard work); I think it would be better to just let the guest deal
> with setting up a bond in its system network config, and if the bond
> driver can't handle having a device in the bond unplugging and plugging,
> then the bond driver should be enhanced).
>
>
>> migrate: add hostdev migrate status to support hostdev migration
>>
>> docs/schemas/basictypes.rng | 6 ++
>> docs/schemas/domaincommon.rng | 37 ++++++++
>> src/conf/domain_conf.c | 195 ++++++++++++++++++++++++++++++++++++++---
>> src/conf/domain_conf.h | 40 +++++++--
>> src/conf/networkcommon_conf.c | 17 ----
>> src/conf/networkcommon_conf.h | 17 ++++
>> src/libvirt_private.syms | 1 +
>> src/qemu/qemu_agent.c | 196 +++++++++++++++++++++++++++++++++++++++++-
>> src/qemu/qemu_agent.h | 12 +++
>> src/qemu/qemu_command.c | 3 +
>> src/qemu/qemu_domain.c | 70 +++++++++++++++
>> src/qemu/qemu_domain.h | 14 +++
>> src/qemu/qemu_driver.c | 38 ++++++++
>> src/qemu/qemu_hotplug.c | 8 +-
>> src/qemu/qemu_migration.c | 91 ++++++++++++++++++++
>> src/qemu/qemu_migration.h | 4 +
>> src/qemu/qemu_process.c | 32 +++++++
>> src/util/virhostdev.c | 3 +
>> 18 files changed, 745 insertions(+), 39 deletions(-)
>>
> .
>
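
To make the ordering discussed above concrete (unplug every "ephemeral"
device, wait for the asynchronous unplug to complete, migrate, then
re-plug), here is a minimal Python sketch. It is only a simulation under
stated assumptions: Device, Guest and migrate_with_ephemeral are
hypothetical names invented for illustration, not libvirt or QEMU APIs,
and the "ephemeral" flag is the attribute proposed in this thread, not an
existing setting.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a hotpluggable guest device."""
    name: str
    ephemeral: bool = False  # proposed flag: unplug before migration
    plugged: bool = True
    removed: threading.Event = field(default_factory=threading.Event)


class Guest:
    """Hypothetical guest model that records the order of operations."""

    def __init__(self, devices):
        self.devices = devices
        self.log = []

    def request_unplug(self, dev):
        # The request returns immediately; completion is signalled later,
        # which models the asynchronous unplug mentioned in the thread.
        def finish():
            dev.plugged = False
            self.log.append(f"unplugged {dev.name}")
            dev.removed.set()

        threading.Timer(0.01, finish).start()

    def plug(self, dev):
        dev.removed.clear()
        dev.plugged = True
        self.log.append(f"plugged {dev.name}")


def migrate_with_ephemeral(guest, timeout=5.0):
    """Unplug ephemeral devices, wait for completion, migrate, re-plug."""
    ephemeral = [d for d in guest.devices if d.ephemeral and d.plugged]
    for dev in ephemeral:
        guest.request_unplug(dev)
    for dev in ephemeral:
        # Do not start the migration until the unplug has really finished.
        if not dev.removed.wait(timeout):
            raise TimeoutError(f"unplug of {dev.name} did not complete")
    guest.log.append("migrated")  # placeholder for the actual migration
    for dev in ephemeral:
        guest.plug(dev)
    return guest.log


guest = Guest([Device("virtio0"), Device("vf0", ephemeral=True)])
print(migrate_with_ephemeral(guest))
```

The emulated NIC ("virtio0" here) is never touched, so a bond in the
guest would keep carrying traffic over it while the passthrough device
is absent. The same sequence would apply to save/restore.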