From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46443)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YudY2-0008Dj-C0 for qemu-devel@nongnu.org;
    Tue, 19 May 2015 05:10:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1YudXx-0007bF-6X for qemu-devel@nongnu.org;
    Tue, 19 May 2015 05:10:50 -0400
Received: from mx1.redhat.com ([209.132.183.28]:50554)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YudXw-0007b7-Sz for qemu-devel@nongnu.org;
    Tue, 19 May 2015 05:10:45 -0400
Date: Tue, 19 May 2015 11:10:39 +0200
From: "Michael S. Tsirkin"
Message-ID: <20150519110817-mutt-send-email-mst@redhat.com>
References: <55342C2E.9040804@redhat.com> <5538AE85.9000805@cn.fujitsu.com> <55390958.3000601@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55390958.3000601@redhat.com>
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/7] Live Migration with Pass-through Devices proposal
To: Laine Stump
Cc: libvir-list@redhat.com, Chen Fan, qemu-devel@nongnu.org

On Thu, Apr 23, 2015 at 11:01:44AM -0400, Laine Stump wrote:
> On 04/23/2015 04:34 AM, Chen Fan wrote:
> >
> > On 04/20/2015 06:29 AM, Laine Stump wrote:
> >> On 04/17/2015 04:53 AM, Chen Fan wrote:
> >>> - on the destination side, check whether we need to hotplug a new
> >>>   NIC according to the specified XML.
> >>>   Usually, we use the migrate "--xml" command option to specify the
> >>>   destination host NIC MAC address to hotplug a new NIC, because the
> >>>   source side passthrough NIC MAC address is different,
> >>>   then hotplug the device according to the destination XML
> >>>   configuration.
> >>
> >> Why does the MAC address need to be different? Are you suggesting doing
> >> this with passed-through non-SRIOV NICs?
> >> An SRIOV virtual function gets its MAC address from the libvirt
> >> config, so it's very simple to use the same MAC address across the
> >> migration. Any network card that would be able to do this on any sort
> >> of useful scale will be SRIOV-capable (or should be replaced with one
> >> that is - some of them are not that expensive).
> >
> > Hi Laine,
> >
> > I think supporting migration for SRIOV virtual NICs is a good idea,
> > but some passthrough NICs are not SRIOV-capable. For these NIC
> > devices we are only able to use <hostdev> to specify the passthrough
> > function, so I think we should support these NICs too.
>
> As I think you've already discovered, passing through non-SRIOV NICs is
> problematic. It is completely impossible for the host to change their
> MAC address before assigning them to the guest - the guest's driver sees
> standard netdev hardware and resets it, which resets the MAC address to
> the original value burned into the firmware. This makes management more
> complicated, especially when you get into scenarios such as what we're
> discussing (i.e. migration) where the actual hardware (and thus MAC
> address) may be different from one run to the next.

Right, passing through PFs is also insecure. Let's get
everything working fine with VFs first, worry about PFs later.

> Since libvirt's <interface> element requires a fixed MAC address in the
> XML, it's not possible to have an <interface> that gets the actual
> device from a network pool (without some serious hacking to that code),
> and there is no support for plain (non-network) <hostdev> device pools;
> there would need to be a separate (nonexistent) driver for that.
> Since the <hostdev> element relies on the PCI address of the device (in
> the <source> subelement, which also must be fixed) to determine which
> device to pass through, a domain config with a <hostdev> that could be
> run on two different machines would require the device to reside at
> exactly the same PCI address on both machines, which is a very serious
> limitation to have in an environment large enough that migrating domains
> is a requirement.
>
> Also, non-SRIOV NICs are limited to a single device per physical port,
> meaning probably at most 4 devices per physical host PCIe slot, and this
> results in a greatly reduced density on the host (and even more so on
> the switch that connects to the host!) compared to even the old Intel
> 82576 cards, which have 14 VFs (7 VFs x 2 ethernet ports). Think about it
> - with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
> ports, while the same number of guests with non-SRIOV would take 4 PCIe
> slots and 14(!) switch ports. The difference is even more striking when
> comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
> (also 64?) or SolarFlare (128?) card. And don't forget that, because you
> don't have pools of devices to be automatically chosen from, each
> guest domain that will be migrated requires a reserved NIC on *every*
> machine it will be migrated to (no other domain can be configured to use
> that NIC, in order to avoid conflicts).
>
> Of course you could complicate the software by adding a driver that
> manages pools of generic hostdevs, and coordinates MAC address changes
> with the guest (part of what you're suggesting), but all that extra
> complexity not only takes a lot of time and effort to develop, it also
> creates more code that needs to be maintained and tested for regressions
> at each release.
>
> The alternative is to just spend $130 per host for an 82576 or Intel
> I350 card (these are the cheapest SRIOV options I'm aware of).
> When compared to the total cost of any hardware installation large
> enough to support migration and have performance requirements high
> enough that NIC passthrough is needed, this is a trivial amount.
>
> I guess the bottom line of all this is that (in my opinion, of course
> :-) supporting useful migration of domains that used passed-through
> non-SRIOV NICs would be an interesting experiment, but I don't see much
> utility to it, other than "scratching an intellectual itch", and I'm
> concerned that it would create more long term maintenance cost than it
> was worth.

I'm not sure it has no utility but it's easy to agree that
VFs are more important, and focusing on this first is a good idea.
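
For reference, the density comparison quoted above (14 guests on one 82576 versus the same guests on non-SRIOV NICs) can be checked with a short calculation. This is only a sketch: the function name and parameters are made up for illustration, and it assumes one switch port per physical NIC port in use and that guests fill ports completely before another port is used.

```python
import math


def hardware_needed(guests, devices_per_port, ports_per_card):
    """Return (pcie_slots, switch_ports) for one passthrough device per guest.

    Hypothetical helper: assumes one card per PCIe slot and one switch
    port per physical NIC port that carries at least one guest.
    """
    # Each physical port can serve devices_per_port guests.
    switch_ports = math.ceil(guests / devices_per_port)
    # Each card (one PCIe slot) provides ports_per_card physical ports.
    pcie_slots = math.ceil(switch_ports / ports_per_card)
    return pcie_slots, switch_ports


# Figures taken from the quoted text: 82576 = 7 VFs per port x 2 ports.
print("82576:    ", hardware_needed(14, devices_per_port=7, ports_per_card=2))
# Non-SRIOV: one device per port, at most 4 ports per card.
print("non-SRIOV:", hardware_needed(14, devices_per_port=1, ports_per_card=4))
```

With the figures from the quoted text this gives 1 slot and 2 switch ports for the 82576, against 4 slots and 14 switch ports for non-SRIOV NICs.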