From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	virtualization@lists.linux-foundation.org,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	Achiad <achiad@mellanox.com>,
	Peter Waskiewicz Jr <peter.waskiewicz.jr@intel.com>,
	"Singhai, Anjali" <anjali.singhai@intel.com>,
	Andy Gospodarek <gospo@broadcom.com>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	netdev <netdev@vger.kernel.org>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>
Subject: Re: [RFC] virtio-net: help live migrate SR-IOV devices
Date: Thu, 30 Nov 2017 15:54:40 +0200	[thread overview]
Message-ID: <20171130153522-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20171129195138.63512ead@cakuba.netronome.com>

On Wed, Nov 29, 2017 at 07:51:38PM -0800, Jakub Kicinski wrote:
> On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:
> > On 2017/11/29 03:27, Jesse Brandeburg wrote:
> > > Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> > > to ease configuration of a VM and that would enable live migration of
> > > passthrough network SR-IOV devices.
> > >
> > > Today we have SR-IOV network devices (VFs) that can be passed into a VM
> > > in order to enable high-performance networking directly within the VM.
> > > The problem I am trying to address is that this configuration is
> > > generally difficult to live-migrate.  There is documentation [1]
> > > indicating that some OS/Hypervisor vendors will support live migration
> > > of a system with a direct assigned networking device.  The problem I
> > > see with these implementations is that the network configuration
> > > requirements that are passed on to the owner of the VM are quite
> > > complicated.  You have to set up bonding, you have to configure it to
> > > enslave two interfaces, those interfaces (one is virtio-net, the other
> > > is SR-IOV device/driver like ixgbevf) must support MAC address changes
> > > requested in the VM, and on and on...
> > >
> > > So, on to the proposal:
> > > Modify virtio-net driver to be a single VM network device that
> > > enslaves an SR-IOV network device (inside the VM) with the same MAC
> > > address. This would cause the virtio-net driver to appear and work like
> > > a simplified bonding/team driver.  The live migration problem would be
> > > solved just like today's bonding solution, but the VM user's networking
> > > config would be greatly simplified.
> > >
> > > At its simplest, it would appear something like this in the VM.
> > >
> > > ==========
> > > = vnet0  =
> > >           =============
> > > (virtio- =       |
> > >   net)    =       |
> > >           =  ==========
> > >           =  = ixgbevf =
> > > ==========  ==========
> > >
> > > (forgive the ASCII art)
> > >
> > > The fast path traffic would prefer the ixgbevf or other SR-IOV device
> > > path, and fall back to virtio's transmit/receive when migrating.
> > >
> > > Compared to today's options this proposal would
> > > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> > >     speeds
> > > 2) simplify end user configuration in the VM (most if not all of the
> > >     set up to enable migration would be done in the hypervisor)
> > > 3) allow live migration via a simple link down and maybe a PCI
> > >     hot-unplug of the SR-IOV device, with failover to the virtio-net
> > >     driver core
> > > 4) allow vendor-agnostic hardware acceleration, and live migration
> > >     between vendors if the VM OS has driver support for all the
> > >     required SR-IOV devices.
> > >
> > > Runtime operation proposed:
> > > - <in either order> virtio-net driver loads, SR-IOV driver loads
> > > - virtio-net finds other NICs that match its MAC address, both by
> > >    examining existing interfaces and by registering a device notifier
> > > - virtio-net enslaves the first NIC with the same MAC address
> > > - virtio-net brings up the slave, and makes it the "preferred" path
> > > - virtio-net follows the behavior of an active backup bond/team
> > > - virtio-net acts as the interface to the VM
> > > - live migration initiates
> > > - link goes down on SR-IOV, or SR-IOV device is removed
> > > - failover to virtio-net as primary path
> > > - migration continues to new host
> > > - new host is started with virtio-net as primary
> > > - if no SR-IOV, virtio-net stays primary
> > > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > > - virtio-net notices new NIC and starts over at enslave step above
> > >
> > > Future ideas (brainstorming):
> > > - Optimize Fast east-west by having special rules to direct east-west
> > >    traffic through virtio-net traffic path
> > >
> > > Thanks for reading!
> > > Jesse  
> > 
> > Cc netdev.
> > 
> > Interesting, and this method is actually used by netvsc now:
> > 
> > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > Author: stephen hemminger <stephen@networkplumber.org>
> > Date:   Tue Aug 1 19:58:53 2017 -0700
> > 
> >      netvsc: transparent VF management
> > 
> >      This patch implements transparent fail over from synthetic NIC to
> >      SR-IOV virtual function NIC in Hyper-V environment. It is a better
> >      alternative to using bonding as is done now. Instead, the receive and
> >      transmit fail over is done internally inside the driver.
> > 
> >      Using bonding driver has lots of issues because it depends on the
> >      script being run early enough in the boot process and with sufficient
> >      information to make the association. This patch moves all that
> >      functionality into the kernel.
> > 
> >      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >      Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > If my understanding is correct, there's no need for any extension of
> > the virtio spec. If this is true, maybe you can start to prepare the patch?
>
> IMHO this is as close to policy in the kernel as one can get.  User
> land has all the information it needs to instantiate that bond/team
> automatically.

It does have this info (MAC addresses match) but where's the policy
here? IMHO the policy has been set by the hypervisor already.
From the hypervisor's POV, adding a passthrough device is a commitment
not to migrate until the guest stops using that device.

Within the guest, the bond is required for purely functional reasons - just to
maintain a link up, since we know the SR-IOV device will go away.
Maintaining an uninterrupted connection is not a policy - it's what
networking is about.
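
For reference, the guest-side setup being discussed here looks roughly
like the following active-backup bond (iproute2 commands; the interface
names eth0/eth1 are hypothetical):

```shell
# Sketch of the manual guest configuration an in-driver approach would
# replace: an active-backup bond over virtio-net (eth0) and the SR-IOV
# VF (eth1).  Interface names here are hypothetical.
ip link add bond0 type bond mode active-backup miimon 100
ip link set eth0 down
ip link set eth0 master bond0
ip link set eth1 down
ip link set eth1 master bond0
# Prefer the VF while it is present; fail over to virtio on unplug.
echo eth1 > /sys/class/net/bond0/bonding/primary
ip link set bond0 up
```

These commands need privileges and a live network stack, so they are
shown only as a configuration sketch.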

>  In fact I'm trying to discuss this with NetworkManager
> folks and Red Hat right now:
> 
> https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html

I thought we should do it too, for a while.

But now, I think that the real issue is this: the kernel exposes what
looks like two network devices to userspace, but in fact it is just one
backend device, exposed by the hypervisor in a weird way for
compatibility reasons.

For example, you will not get better reliability or throughput by using
both of them - the only bonding mode that makes sense is failover. As
another example, if the underlying physical device loses its link,
trying to use virtio won't help - it's only useful when the passthrough
device is gone for good.  As another example, there is no point in not
configuring a bond. As a last example, depending on how the backend is
configured, virtio might not even work while the pass-through device is
active.
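
The failover rule in question is trivial; as a sketch (hypothetical
names and types, not taken from any real driver), the active-backup
selection amounts to:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the active-backup selection rule discussed in
 * this thread: prefer the enslaved SR-IOV VF while it is present and
 * has link, fall back to the virtio-net path otherwise. */
struct slave {
	bool present;	/* device not hot-unplugged */
	bool link_up;	/* carrier state */
};

enum tx_path { PATH_VIRTIO, PATH_VF };

static enum tx_path pick_tx_path(const struct slave *vf)
{
	if (vf && vf->present && vf->link_up)
		return PATH_VF;		/* fast path through the VF */
	return PATH_VIRTIO;		/* migration-safe fallback */
}
```

On migration the hypervisor drops the VF's link or unplugs it, the
selection falls back to virtio, and a later hot-add of a VF with a
matching MAC restores the fast path.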

So from that point of view, showing two network devices to userspace is
a bug that we are asking userspace to work around.

> Can we flip the argument and ask why is the kernel supposed to be
> responsible for this?

Because if we show a single device to userspace the number of
misconfigured guests will go down, and we won't lose any useful
flexibility.

>  It's not like we run DHCP out of the kernel
> on new interfaces... 

Because one can set up a static IP, IPv6 doesn't always need DHCP, etc.

-- 
MST

