qemu-devel.nongnu.org archive mirror
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
To: Laine Stump <laine@redhat.com>, libvir-list@redhat.com
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
Date: Wed, 22 Apr 2015 12:22:42 +0800	[thread overview]
Message-ID: <55372212.7030108@cn.fujitsu.com> (raw)
In-Reply-To: <55342C2E.9040804@redhat.com>

Hi Laine,

Thanks for reviewing my patches.

Do you know whether SolarFlare has posted an updated version of its patches
since this one?

https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html

If not, I would like to continue and complete this work. ;)

Thanks,
Chen


On 04/20/2015 06:29 AM, Laine Stump wrote:
> On 04/17/2015 04:53 AM, Chen Fan wrote:
>> Background:
>> Live migration is one of the most important features of virtualization technology.
>> With recent virtualization techniques, network I/O performance is critical.
>> Current network I/O virtualization (e.g. para-virtualized I/O, VMDq) has a significant
>> performance gap compared with native network I/O. Pass-through network devices have
>> near-native performance; however, they have so far prevented live migration. No existing
>> method solves the problem of live migration with pass-through devices perfectly.
>>
>> One idea for solving this problem is described in the following paper:
>> https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>> Please refer to the document above for detailed information.
> This functionality has been on my mind/bug list for a long time, but I
> haven't been able to pursue it much. See this BZ, along with the
> original patches submitted by Shradha Shah from SolarFlare:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896716
>
> (I was a bit optimistic in my initial review of the patches - there are
> actually a lot of issues that weren't handled by those patches.)
>
>> So I think this problem could be solved by combining existing technologies.
>> These are the steps we are considering implementing:
>>
>> -  Before booting the VM, specify in the XML two NICs for creating a bonding device
>>     (one passed-through NIC and one virtual NIC). The NICs' MAC addresses can also be
>>     specified in the XML, which helps qemu-guest-agent find the network interfaces in the guest.
> An interesting idea, but I think that is a 2nd level enhancement, not
> necessary initially (and maybe not ever, due to the high possibility of
> it being extremely difficult to get right in 100% of the cases).
>
>> -  When qemu-guest-agent starts up in the guest, it sends a notification to libvirt,
>>     and libvirt then calls the previously registered initialization callbacks. Through
>>     these callbacks, we can create the bonding device according to the XML
>>     configuration. Here we use the netcf tool, which makes it easy to create a
>>     bonding device.
> This isn't quite making sense - the bond will be on the guest, which may
> not have netcf installed. Anyway, I think it should be up to the guest's
> own system network config to have the bond already setup. If you try to
> impose it from outside that infrastructure, you run too much risk of
> running afoul of something on the guest (e.g. NetworkManager).
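The callback mechanism proposed in the step above could look roughly like the following sketch. In that step, libvirt runs previously registered initialization callbacks when qemu-guest-agent reports that it has started. The names `agent_init_cb`, `agent_register_init_cb()` and `agent_notify_ready()` are hypothetical. They are not existing libvirt or qemu-ga APIs.

```c
#include <stddef.h>

/* Hypothetical callback type: invoked once the guest agent reports it is up. */
typedef int (*agent_init_cb)(const char *domain_name, void *opaque);

#define MAX_INIT_CBS 8

struct init_cb_entry {
    agent_init_cb cb;
    void *opaque;
};

static struct init_cb_entry init_cbs[MAX_INIT_CBS];
static size_t n_init_cbs;

/* Register a callback; returns 0 on success, -1 on bad input or full table. */
int agent_register_init_cb(agent_init_cb cb, void *opaque)
{
    if (cb == NULL || n_init_cbs >= MAX_INIT_CBS)
        return -1;
    init_cbs[n_init_cbs].cb = cb;
    init_cbs[n_init_cbs].opaque = opaque;
    n_init_cbs++;
    return 0;
}

/* Called when the (hypothetical) "agent started" notification arrives.
 * Runs every registered callback in registration order and returns the
 * number of callbacks that failed (returned non-zero). */
int agent_notify_ready(const char *domain_name)
{
    int failures = 0;
    for (size_t i = 0; i < n_init_cbs; i++) {
        if (init_cbs[i].cb(domain_name, init_cbs[i].opaque) != 0)
            failures++;
    }
    return failures;
}
```

The hook itself is simple. As noted in the reply, the risk lies in what the callbacks do inside the guest, such as configuring a bond behind the back of NetworkManager.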
>
>
>> -  During migration, unplug the passed-through NIC, then do a native migration.
> Correct. This is the most important part. But not just unplugging it,
> you also need to wait until the unplug operation completes (it is
> asynchronous). (After this point, the emulated NIC that is part of the
> bond would get all of the traffic).
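Device removal is asynchronous, so "unplug" here means two things: request removal, then wait for the guest to release the device. QEMU reports completion with a DEVICE_DELETED QMP event.

The sketch below simulates that handshake with a hypothetical device structure and event pump. `request_unplug()`, `pump_events()` and `unplug_and_wait()` are illustrative names, not libvirt functions. It shows only two points:

- Migration must not start until the removal is confirmed.
- A timeout is needed, because the guest may never complete the unplug.

```c
#include <stdbool.h>

/* Hypothetical model of a passed-through NIC being hot-unplugged. */
struct pt_device {
    const char *alias;
    bool unplug_requested;
    bool deleted;          /* set when the DEVICE_DELETED-style event arrives */
    int  polls_until_done; /* simulation only: how long the guest takes */
};

/* Ask for removal; this returns before the removal has completed. */
void request_unplug(struct pt_device *dev)
{
    dev->unplug_requested = true;
}

/* Simulated event-loop step: the guest eventually acknowledges the unplug. */
void pump_events(struct pt_device *dev)
{
    if (dev->unplug_requested && !dev->deleted) {
        if (dev->polls_until_done > 0)
            dev->polls_until_done--;
        if (dev->polls_until_done == 0)
            dev->deleted = true;
    }
}

/* Request the unplug and wait (up to max_polls event-loop iterations) for it
 * to complete. Returns 0 if the device is gone and migration may proceed,
 * or -1 on timeout, in which case the migration should be aborted. */
int unplug_and_wait(struct pt_device *dev, int max_polls)
{
    request_unplug(dev);
    for (int i = 0; i < max_polls && !dev->deleted; i++)
        pump_events(dev);
    return dev->deleted ? 0 : -1;
}
```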
>
>> -  On the destination side, check whether a new NIC needs to be hotplugged according to the
>>     specified XML. Usually, we use the migrate "--xml" command option to specify the destination
>>     host NIC's MAC address, because it differs from the MAC address of the passthrough NIC on the
>>     source side, and then hotplug the device according to the destination XML configuration.
> Why does the MAC address need to be different? Are you suggesting doing
> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> its MAC address from the libvirt config, so it's very simple to use the
> same MAC address across the migration. Any network card that would be
> able to do this on any sort of useful scale will be SRIOV-capable (or
> should be replaced with one that is - some of them are not that expensive).
>
>
>> TODO:
>>    1.  When a new NIC is hot-added on the destination side after migration has finished, the
>>        NIC device needs to be re-enslaved to the bonding device in the guest; otherwise, it
>>        stays offline. Maybe we should make the bonding driver support adding interfaces dynamically.
> I never looked at the details of how SolarFlare's code handled the guest
> side (they have/had their own patchset they maintained for some older
> version of libvirt which integrated with some sort of enhanced bonding
> driver on the guests). I assumed the bond driver could handle this
> already, but have to say I never investigated.
>
>
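On a Linux guest, the bonding driver already allows slaves to be added to an existing bond at runtime. Writing `+<ifname>` to `/sys/class/net/<bond>/bonding/slaves` adds a slave, and `-<ifname>` removes one. A guest-side helper that re-enslaves a hot-added VF could therefore be as small as the sketch below.

The function name `bond_enslave()` is hypothetical. The path is passed in so the example can be exercised against an ordinary file. In practice the guest's own hotplug handling (for example, a udev rule) would trigger this, rather than libvirt.

```c
#include <stdio.h>
#include <string.h>

/* Add interface `ifname` as a slave of a bond by writing "+<ifname>" to the
 * bond's sysfs "slaves" file, normally /sys/class/net/<bond>/bonding/slaves.
 * Returns 0 on success, -1 on error. */
int bond_enslave(const char *slaves_path, const char *ifname)
{
    /* Linux interface names are at most 15 characters (IFNAMSIZ - 1). */
    if (ifname == NULL || ifname[0] == '\0' || strlen(ifname) > 15)
        return -1;

    FILE *f = fopen(slaves_path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "+%s\n", ifname) > 0;
    /* With sysfs, a kernel-side error may only surface when the data is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```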
>> This is an example of how this might work; I would like to hear opinions about this scenario.
>>
>> Thanks,
>> Chen
>>
>> Chen Fan (7):
>>    qemu-agent: add agent init callback when detecting guest setup
>>    qemu: add guest init event callback to do the initialize work for
>>      guest
>>    hostdev: add a 'bond' type element in <hostdev> element
>
> Putting this into <hostdev> is the wrong approach, for two reasons: 1)
> it doesn't account for the device to be used being in a different
> address on the source and destination hosts, 2) the <interface> element
> already has much of the config you need, and an interface type
> supporting hostdev passthrough.
>
> It has been possible to do passthrough of an SRIOV VF via <interface
> type='hostdev'> for a long time now and, even better, via an <interface
> type='network'> where the network pointed to contains a pool of VFs - As
> long as the source and destination hosts both have networks with the
> same name, libvirt will be able to find a currently available device on
> the destination as it migrates from one host to another instead of
> relying on both hosts having the exact same device at the exact same
> address on the source and destination (and also magically unused by any
> other guest). This page explains the use of a "hostdev network" which
> has a pool of devices:
>
> http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition
>
> This was designed specifically with the idea in mind that one day it
> would be possible to migrate a domain with a hostdev device (as long as
> the guest could handle the hostdev device being temporarily unplugged
> during the migration).
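A hostdev network has an advantage over a fixed <hostdev> address. The concrete VF is chosen on whichever host the guest is currently running on, while the MAC address comes from the domain configuration and therefore stays the same across migration.

The following sketch models that lookup with hypothetical types (`vf`, `vf_pool`) and a hypothetical `pool_acquire_vf()` function. It is not libvirt's network driver code. It only illustrates the rule: same network name, any free VF, configured MAC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct vf {
    char pci_addr[16];   /* e.g. "0000:03:10.2"; differs from host to host */
    bool in_use;
    char mac[18];        /* MAC programmed into the VF while it is assigned */
};

struct vf_pool {
    const char *network_name; /* must match on source and destination */
    struct vf *vfs;
    size_t n_vfs;
};

/* Find the pool for `network_name`, claim its first free VF and program the
 * MAC from the domain config. Returns the VF, or NULL if none is available. */
struct vf *pool_acquire_vf(struct vf_pool *pools, size_t n_pools,
                           const char *network_name, const char *config_mac)
{
    for (size_t p = 0; p < n_pools; p++) {
        if (strcmp(pools[p].network_name, network_name) != 0)
            continue;
        for (size_t i = 0; i < pools[p].n_vfs; i++) {
            struct vf *vf = &pools[p].vfs[i];
            if (!vf->in_use) {
                vf->in_use = true;
                strncpy(vf->mac, config_mac, sizeof(vf->mac) - 1);
                vf->mac[sizeof(vf->mac) - 1] = '\0';
                return vf;
            }
        }
        return NULL; /* network found, but every VF is busy */
    }
    return NULL; /* no such network on this host */
}
```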
>
>>    qemu-agent: add qemuAgentCreateBond interface
>>    hostdev: add parse ip and route for bond configure
> Again, I think that this level of detail about the guest network config
> belongs on the guest, not in libvirt.
>
>>    migrate: hot remove hostdev at perform phase for bond device
> ^^ this is the useful part but I don't think the right method is to make
> this action dependent on the device being a "bond".
>
> I think that in this respect Shradha's patches had a better idea - any
> hostdev (or, by implication, <interface type='hostdev'> or, much more
> usefully, <interface type='network'> pointing to a pool of VFs) could
> have an attribute "ephemeral". If ephemeral was "yes", then the device
> would always be unplugged prior to migration and re-plugged when
> migration was completed (the same thing should be done when
> saving/restoring a domain which also can't currently be done with a
> domain that has a passthrough device).
>
> For that matter, this could be a general-purpose thing (although
> probably most useful for hostdevs) - just make it possible for *any*
> hotpluggable device to be "ephemeral"; the meaning of this would be that
> every device marked as ephemeral should be unplugged prior to migration
> or save (and libvirt should wait for qemu to notify that the unplug is
> completed), and re-plugged right after the guest is restarted.
>
> (possibly it should be implemented as an <ephemeral> *element* rather
> than attribute, so that options could be specified).
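The "ephemeral" semantics described above reduce to two passes over the domain's devices:

- Before migration or save, unplug every ephemeral device, waiting for each removal to complete.
- After the guest is restarted, re-plug the devices that were removed.

The sketch below uses a hypothetical `struct dom_device`. Caller-supplied hooks stand in for the real hotplug code. It also rolls back if one of the unplugs fails, so the guest is not left half-stripped.

```c
#include <stdbool.h>
#include <stddef.h>

struct dom_device {
    const char *alias;
    bool ephemeral;   /* hypothetical flag from ephemeral='yes' or <ephemeral/> */
    bool plugged;
    bool detached;    /* unplugged by unplug_ephemeral(), to be re-plugged */
};

/* Hooks standing in for real hotplug code; each returns 0 on success.
 * The unplug hook is expected to block until the removal has completed. */
typedef int (*dev_op_fn)(struct dom_device *dev);

/* After migration/restore: re-plug every device we detached.
 * Returns the number of devices that could not be re-plugged. */
int replug_ephemeral(struct dom_device *devs, size_t n, dev_op_fn replug_fn)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].detached)
            continue;
        if (replug_fn(&devs[i]) == 0) {
            devs[i].plugged = true;
            devs[i].detached = false;
        } else {
            failures++;
        }
    }
    return failures;
}

/* Before migration/save: unplug all plugged ephemeral devices. If one unplug
 * fails, re-plug the ones already removed and return -1 so the caller can
 * abort the migration. */
int unplug_ephemeral(struct dom_device *devs, size_t n,
                     dev_op_fn unplug_fn, dev_op_fn replug_fn)
{
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].ephemeral || !devs[i].plugged)
            continue;
        if (unplug_fn(&devs[i]) != 0) {
            replug_ephemeral(devs, n, replug_fn); /* roll back */
            return -1;
        }
        devs[i].plugged = false;
        devs[i].detached = true;
    }
    return 0;
}
```

On the destination, the re-plug hook would allocate a device from the local pool rather than reuse the source host's address.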
>
> After that is implemented and works properly, then it might be the time
> to think about auto-creating the bond (although again, my opinion is
> that this is getting a bit too intrusive into the guest (and making it
> more likely to fail - I know from long experience with netcf that it is
> all too easy for some other service on the system (ahem) to mess up all
> your hard work); I think it would be better to just let the guest deal
> with setting up a bond in its system network config, and if the bond
> driver can't handle having a device in the bond unplugging and plugging,
> then the bond driver should be enhanced).
>
>
>>    migrate: add hostdev migrate status to support hostdev migration
>>
>>   docs/schemas/basictypes.rng   |   6 ++
>>   docs/schemas/domaincommon.rng |  37 ++++++++
>>   src/conf/domain_conf.c        | 195 ++++++++++++++++++++++++++++++++++++++---
>>   src/conf/domain_conf.h        |  40 +++++++--
>>   src/conf/networkcommon_conf.c |  17 ----
>>   src/conf/networkcommon_conf.h |  17 ++++
>>   src/libvirt_private.syms      |   1 +
>>   src/qemu/qemu_agent.c         | 196 +++++++++++++++++++++++++++++++++++++++++-
>>   src/qemu/qemu_agent.h         |  12 +++
>>   src/qemu/qemu_command.c       |   3 +
>>   src/qemu/qemu_domain.c        |  70 +++++++++++++++
>>   src/qemu/qemu_domain.h        |  14 +++
>>   src/qemu/qemu_driver.c        |  38 ++++++++
>>   src/qemu/qemu_hotplug.c       |   8 +-
>>   src/qemu/qemu_migration.c     |  91 ++++++++++++++++++++
>>   src/qemu/qemu_migration.h     |   4 +
>>   src/qemu/qemu_process.c       |  32 +++++++
>>   src/util/virhostdev.c         |   3 +
>>   18 files changed, 745 insertions(+), 39 deletions(-)
>>
> .
>
