From: "Michael S. Tsirkin" <mst@redhat.com>
To: Laine Stump <laine@redhat.com>
Cc: libvir-list@redhat.com, Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
Date: Tue, 19 May 2015 11:10:39 +0200
Message-ID: <20150519110817-mutt-send-email-mst@redhat.com>
In-Reply-To: <55390958.3000601@redhat.com>

On Thu, Apr 23, 2015 at 11:01:44AM -0400, Laine Stump wrote:
> On 04/23/2015 04:34 AM, Chen Fan wrote:
> >
> > On 04/20/2015 06:29 AM, Laine Stump wrote:
> >> On 04/17/2015 04:53 AM, Chen Fan wrote:
> >>> -  On the destination side, check whether a new NIC needs to be
> >>>    hotplugged according to the specified XML. Usually, we use the
> >>>    migrate "--xml" command option to specify the destination host
> >>>    NIC MAC address for the new NIC, because the source-side
> >>>    passthrough NIC has a different MAC address; the device is then
> >>>    hotplugged according to the destination XML configuration.
> 
> >> Why does the MAC address need to be different? Are you suggesting doing
> >> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
> >> its MAC address from the libvirt config, so it's very simple to use the
> >> same MAC address across the migration. Any network card that would be
> >> able to do this on any sort of useful scale will be SRIOV-capable (or
> >> should be replaced with one that is - some of them are not that
> >> expensive).
> 
> > Hi Laine,
> >
> > I think supporting migration for SRIOV virtual NICs is a good idea,
> > but some passthrough NICs are not SRIOV-capable. For these
> > devices we can only use <hostdev> to specify the passthrough
> > function, so I think we should support them too.
> 
> As I think you've already discovered, passing through non-SRIOV NICs is
> problematic. It is completely impossible for the host to change their
> MAC address before assigning them to the guest - the guest's driver sees
> standard netdev hardware and resets it, which resets the MAC address to
> the original value burned into the firmware. This makes management more
> complicated, especially when you get into scenarios such as what we're
> discussing (i.e. migration) where the actual hardware (and thus MAC
> address) may be different from one run to the next.

Right, passing through PFs is also insecure.  Let's get
everything working fine with VFs first, worry about PFs later.


> Since libvirt's <interface> element requires a fixed MAC address in the
> XML, it's not possible to have an <interface> that gets the actual
> device from a network pool (without some serious hacking to that code),
> and there is no support for plain (non-network) <hostdev> device pools;
> there would need to be a separate (nonexistent) driver for that. Since
> the <hostdev> element relies on the PCI address of the device (in the
> <source> subelement, which also must be fixed) to determine which device
> to pass through, a domain config with a <hostdev> that could be run on
> two different machines would require the device to reside at exactly the
> same PCI address on both machines, which is a very serious limitation to
> have in an environment large enough that migrating domains is a requirement.
> 
> Also, non-SRIOV NICs are limited to a single device per physical port,
> meaning probably at most 4 devices per physical host PCIe slot, and this
> results in a greatly reduced density on the host (and even more so on
> the switch that connects to the host!) compared to even the old Intel
> 82576 cards, which have 14 VFs (7 VFs x 2 Ethernet ports). Think about it
> - with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
> ports, while the same number of guests with non-SRIOV would take 4 PCIe
> slots and 14(!) switch ports. The difference is even more striking when
> comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
> (also 64?) or SolarFlare (128?) card. And don't forget that, because you
> don't have pools of devices to be automatically chosen from, each
> guest domain that will be migrated requires a reserved NIC on *every*
> machine it will be migrated to (no other domain can be configured to use
> that NIC, in order to avoid conflicts).
> 
> Of course you could complicate the software by adding a driver that
> manages pools of generic hostdevs, and coordinates MAC address changes
> with the guest (part of what you're suggesting), but all that extra
> complexity not only takes a lot of time and effort to develop, it also
> creates more code that needs to be maintained and tested for regressions
> at each release.
> 
> The alternative is to just spend $130 per host for an 82576 or Intel
> I350 card (these are the cheapest SRIOV options I'm aware of). When
> compared to the total cost of any hardware installation large enough to
> support migration and have performance requirements high enough that NIC
> passthrough is needed, this is a trivial amount.
> 
> I guess the bottom line of all this is that (in my opinion, of course
> :-) supporting useful migration of domains that use passed-through
> non-SRIOV NICs would be an interesting experiment, but I don't see much
> utility to it, other than "scratching an intellectual itch", and I'm
> concerned that it would create more long-term maintenance cost than it
> is worth.

I'm not sure it has no utility but it's easy to agree that
VFs are more important, and focusing on this first is a good
idea.
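
For reference, the density comparison quoted above (14 guests on one
82576 versus 14 guests on non-SRIOV NICs) can be checked with a few
lines of Python. This is only a sketch: the function name and its
parameters are made up for illustration, and the figure of 4 ports per
non-SRIOV card is the rough "probably at most 4 devices per slot"
estimate from the quoted text, not a measured value.

```python
import math


def slots_and_switch_ports(guests, devices_per_port, ports_per_card):
    """Return (PCIe slots, switch ports) needed to give each guest one device.

    devices_per_port: assignable devices behind one physical port
                      (VFs per port for SRIOV, 1 for a non-SRIOV NIC).
    ports_per_card:   physical Ethernet ports on one card (one PCIe slot).
    """
    devices_per_card = devices_per_port * ports_per_card
    slots = math.ceil(guests / devices_per_card)
    # Every physical port in use needs its own switch port.
    switch_ports = math.ceil(guests / devices_per_port)
    return slots, switch_ports


# 82576: 7 VFs per port, 2 ports per card (figures from the quoted text).
print(slots_and_switch_ports(14, devices_per_port=7, ports_per_card=2))

# Non-SRIOV: 1 device per port, assuming 4 ports per card.
print(slots_and_switch_ports(14, devices_per_port=1, ports_per_card=4))
```

Under those assumptions the first call gives 1 slot and 2 switch ports,
and the second gives 4 slots and 14 switch ports, matching the numbers
in the quoted paragraph.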
