Re: [RFC] virtio-net: help live migrate SR-IOV devices

netdev.vger.kernel.org archive mirror

* Re: [RFC] virtio-net: help live migrate SR-IOV devices
       [not found] <20171128112722.00003716@intel.com>
@ 2017-11-30  3:29 ` Jason Wang
  2017-11-30  3:51   ` Jakub Kicinski
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Jason Wang @ 2017-11-30  3:29 UTC (permalink / raw)
  To: Jesse Brandeburg, virtualization
  Cc: Jakub Kicinski, mst, Sridhar Samudrala, Achiad,
	Peter Waskiewicz Jr, Singhai, Anjali, Andy Gospodarek, Or Gerlitz,
	Hannes Frederic Sowa, netdev



On 2017-11-29 03:27, Jesse Brandeburg wrote:
> Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> to ease configuration of a VM and that would enable live migration of
> passthrough network SR-IOV devices.
>
> Today we have SR-IOV network devices (VFs) that can be passed into a VM
> in order to enable high performance networking direct within the VM.
> The problem I am trying to address is that this configuration is
> generally difficult to live-migrate.  There is documentation [1]
> indicating that some OS/Hypervisor vendors will support live migration
> of a system with a direct assigned networking device.  The problem I
> see with these implementations is that the network configuration
> requirements that are passed on to the owner of the VM are quite
> complicated.  You have to set up bonding, you have to configure it to
> enslave two interfaces, those interfaces (one is virtio-net, the other
> is SR-IOV device/driver like ixgbevf) must support MAC address changes
> requested in the VM, and on and on...
>
> So, on to the proposal:
> Modify virtio-net driver to be a single VM network device that
> enslaves an SR-IOV network device (inside the VM) with the same MAC
> address. This would cause the virtio-net driver to appear and work like
> a simplified bonding/team driver.  The live migration problem would be
> solved just like today's bonding solution, but the VM user's networking
> config would be greatly simplified.
>
> At its simplest, it would appear something like this in the VM.
>
> ==========
> = vnet0  =
>           =============
> (virtio- =       |
>   net)    =       |
>           =  ===========
>           =  = ixgbevf =
> ==========  ===========
>
> (forgive the ASCII art)
>
> The fast path traffic would prefer the ixgbevf or other SR-IOV device
> path, and fall back to virtio's transmit/receive when migrating.
>
> Compared to today's options this proposal would
> 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
>     speeds
> 2) simplify end user configuration in the VM (most if not all of the
>     set up to enable migration would be done in the hypervisor)
> 3) allow live migration via a simple link down and maybe a PCI
>     hot-unplug of the SR-IOV device, with failover to the virtio-net
>     driver core
> 4) allow vendor agnostic hardware acceleration, and live migration
>     between vendors if the VM os has driver support for all the required
>     SR-IOV devices.
>
> Runtime operation proposed:
> - <in either order> virtio-net driver loads, SR-IOV driver loads
> - virtio-net finds other NICs that match its MAC address by both
>    examining existing interfaces and setting up a new device notifier
> - virtio-net enslaves the first NIC with the same MAC address
> - virtio-net brings up the slave, and makes it the "preferred" path
> - virtio-net follows the behavior of an active backup bond/team
> - virtio-net acts as the interface to the VM
> - live migration initiates
> - link goes down on SR-IOV, or SR-IOV device is removed
> - failover to virtio-net as primary path
> - migration continues to new host
> - new host is started with virtio-net as primary
> - if no SR-IOV, virtio-net stays primary
> - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> - virtio-net notices new NIC and starts over at enslave step above
>
> Future ideas (brainstorming):
> - Optimize Fast east-west by having special rules to direct east-west
>    traffic through virtio-net traffic path
>
> Thanks for reading!
> Jesse
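
To make the runtime sequence quoted above concrete, here is a minimal toy
model in Python (not kernel code, and not taken from any patch). The names
Nic, VirtioFailover, on_register, on_unregister and active_path are all
hypothetical. The model assumes only what the list states: the first NIC
with a matching MAC is enslaved and preferred, and traffic falls back to
virtio when that NIC loses link or is removed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nic:
    """A guest-visible NIC; names and fields are illustrative only."""
    name: str
    mac: str
    link_up: bool = True


@dataclass
class VirtioFailover:
    """Toy model of virtio-net acting as a simplified active-backup master."""
    mac: str
    slave: Optional[Nic] = None

    def on_register(self, nic: Nic) -> None:
        # Enslave the first NIC whose MAC matches; later matches are ignored.
        if self.slave is None and nic.mac.lower() == self.mac.lower():
            self.slave = nic

    def on_unregister(self, nic: Nic) -> None:
        # The hypervisor hot-unplugged the VF (for example before migration).
        if self.slave is nic:
            self.slave = None

    def active_path(self) -> str:
        # Prefer the enslaved VF while it is present and has link;
        # otherwise fall back to the virtio datapath.
        if self.slave is not None and self.slave.link_up:
            return self.slave.name
        return "virtio"
```

In this sketch a migration is just on_unregister() on the source host
followed by on_register() of a new VF with the same MAC on the destination,
which corresponds to the "starts over at enslave step" item in the list.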

Cc netdev.

Interesting, and this method is actually used by netvsc now:

commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
Author: stephen hemminger <stephen@networkplumber.org>
Date:   Tue Aug 1 19:58:53 2017 -0700

     netvsc: transparent VF management

     This patch implements transparent fail over from synthetic NIC to
     SR-IOV virtual function NIC in Hyper-V environment. It is a better
     alternative to using bonding as is done now. Instead, the receive and
     transmit fail over is done internally inside the driver.

     Using bonding driver has lots of issues because it depends on the
     script being run early enough in the boot process and with sufficient
     information to make the association. This patch moves all that
     functionality into the kernel.

     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>

If my understanding is correct, there's no need for any extension of the
virtio spec. If this is true, maybe you can start to prepare the patch?

Thanks

>
> [1]
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  3:29 ` [RFC] virtio-net: help live migrate SR-IOV devices Jason Wang
@ 2017-11-30  3:51   ` Jakub Kicinski
  2017-11-30  4:10     ` Stephen Hemminger
  2017-11-30 13:54     ` Michael S. Tsirkin
  2017-11-30  8:08   ` achiad shochat
  2017-11-30 14:14   ` Michael S. Tsirkin
  2 siblings, 2 replies; 25+ messages in thread
From: Jakub Kicinski @ 2017-11-30  3:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesse Brandeburg, virtualization, mst, Sridhar Samudrala, Achiad,
	Peter Waskiewicz Jr, Singhai, Anjali, Andy Gospodarek, Or Gerlitz,
	netdev, Hannes Frederic Sowa

On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:
> On 2017-11-29 03:27, Jesse Brandeburg wrote:
> > Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> > to ease configuration of a VM and that would enable live migration of
> > passthrough network SR-IOV devices.
> >
> > Today we have SR-IOV network devices (VFs) that can be passed into a VM
> > in order to enable high performance networking direct within the VM.
> > The problem I am trying to address is that this configuration is
> > generally difficult to live-migrate.  There is documentation [1]
> > indicating that some OS/Hypervisor vendors will support live migration
> > of a system with a direct assigned networking device.  The problem I
> > see with these implementations is that the network configuration
> > requirements that are passed on to the owner of the VM are quite
> > complicated.  You have to set up bonding, you have to configure it to
> > enslave two interfaces, those interfaces (one is virtio-net, the other
> > is SR-IOV device/driver like ixgbevf) must support MAC address changes
> > requested in the VM, and on and on...
> >
> > So, on to the proposal:
> > Modify virtio-net driver to be a single VM network device that
> > enslaves an SR-IOV network device (inside the VM) with the same MAC
> > address. This would cause the virtio-net driver to appear and work like
> > a simplified bonding/team driver.  The live migration problem would be
> > solved just like today's bonding solution, but the VM user's networking
> > config would be greatly simplified.
> >
> > At its simplest, it would appear something like this in the VM.
> >
> > ==========
> > = vnet0  =
> >           =============
> > (virtio- =       |
> >   net)    =       |
> >           =  ===========
> >           =  = ixgbevf =
> > ==========  ===========
> >
> > (forgive the ASCII art)
> >
> > The fast path traffic would prefer the ixgbevf or other SR-IOV device
> > path, and fall back to virtio's transmit/receive when migrating.
> >
> > Compared to today's options this proposal would
> > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> >     speeds
> > 2) simplify end user configuration in the VM (most if not all of the
> >     set up to enable migration would be done in the hypervisor)
> > 3) allow live migration via a simple link down and maybe a PCI
> >     hot-unplug of the SR-IOV device, with failover to the virtio-net
> >     driver core
> > 4) allow vendor agnostic hardware acceleration, and live migration
> >     between vendors if the VM os has driver support for all the required
> >     SR-IOV devices.
> >
> > Runtime operation proposed:
> > - <in either order> virtio-net driver loads, SR-IOV driver loads
> > - virtio-net finds other NICs that match its MAC address by both
> >    examining existing interfaces and setting up a new device notifier
> > - virtio-net enslaves the first NIC with the same MAC address
> > - virtio-net brings up the slave, and makes it the "preferred" path
> > - virtio-net follows the behavior of an active backup bond/team
> > - virtio-net acts as the interface to the VM
> > - live migration initiates
> > - link goes down on SR-IOV, or SR-IOV device is removed
> > - failover to virtio-net as primary path
> > - migration continues to new host
> > - new host is started with virtio-net as primary
> > - if no SR-IOV, virtio-net stays primary
> > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > - virtio-net notices new NIC and starts over at enslave step above
> >
> > Future ideas (brainstorming):
> > - Optimize Fast east-west by having special rules to direct east-west
> >    traffic through virtio-net traffic path
> >
> > Thanks for reading!
> > Jesse  
> 
> Cc netdev.
> 
> Interesting, and this method is actually used by netvsc now:
> 
> commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> Author: stephen hemminger <stephen@networkplumber.org>
> Date:   Tue Aug 1 19:58:53 2017 -0700
> 
>      netvsc: transparent VF management
> 
>      This patch implements transparent fail over from synthetic NIC to
>      SR-IOV virtual function NIC in Hyper-V environment. It is a better
>      alternative to using bonding as is done now. Instead, the receive and
>      transmit fail over is done internally inside the driver.
> 
>      Using bonding driver has lots of issues because it depends on the
>      script being run early enough in the boot process and with sufficient
>      information to make the association. This patch moves all that
>      functionality into the kernel.
> 
>      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> If my understanding is correct, there's no need for any extension of the
> virtio spec. If this is true, maybe you can start to prepare the patch?

IMHO this is as close to policy in the kernel as one can get.  User
land has all the information it needs to instantiate that bond/team
automatically.  In fact I'm trying to discuss this with NetworkManager
folks and Red Hat right now:

https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html

Can we flip the argument and ask why is the kernel supposed to be
responsible for this?  It's not like we run DHCP out of the kernel
on new interfaces... 
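
As a rough illustration of the userspace approach argued for above, the
matching step could look like the Python sketch below. It is only a sketch
under stated assumptions: the function name find_failover_pairs and the
input table (interface name mapped to a MAC address and a driver name) are
hypothetical, and a real tool would have to build that table itself, for
instance from /sys/class/net/<iface>/address. It pairs each virtio_net
interface with another NIC carrying the same MAC, which is the association
a bond/team setup needs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_failover_pairs(nics: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (virtio_iface, passthrough_iface) pairs that share a MAC.

    nics maps interface name -> (mac, driver). This table is an assumed
    input for illustration, not something a specific tool provides.
    """
    by_mac = defaultdict(list)
    for name, (mac, driver) in sorted(nics.items()):
        # Compare MACs case-insensitively.
        by_mac[mac.lower()].append((name, driver))

    pairs = []
    for members in by_mac.values():
        virtio = [n for n, d in members if d == "virtio_net"]
        others = [n for n, d in members if d != "virtio_net"]
        # Only a virtio device plus at least one other NIC forms a pair;
        # the first match wins, as in the proposal.
        if virtio and others:
            pairs.append((virtio[0], others[0]))
    return pairs
```

Each returned pair is a candidate for an active-backup bond with the
passthrough interface as the preferred member.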


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  3:51   ` Jakub Kicinski
@ 2017-11-30  4:10     ` Stephen Hemminger
  2017-11-30  4:21       ` Jakub Kicinski
  2017-11-30 13:54     ` Michael S. Tsirkin
  1 sibling, 1 reply; 25+ messages in thread
From: Stephen Hemminger @ 2017-11-30  4:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jason Wang, Jesse Brandeburg, virtualization, mst,
	Sridhar Samudrala, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Andy Gospodarek, Or Gerlitz, netdev, Hannes Frederic Sowa

On Wed, 29 Nov 2017 19:51:38 -0800
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:
> > On 2017-11-29 03:27, Jesse Brandeburg wrote:
> > > Hi, I'd like to get some feedback on a proposal to enhance
> > > virtio-net to ease configuration of a VM and that would enable
> > > live migration of passthrough network SR-IOV devices.
> > >
> > > Today we have SR-IOV network devices (VFs) that can be passed
> > > into a VM in order to enable high performance networking direct
> > > within the VM. The problem I am trying to address is that this
> > > configuration is generally difficult to live-migrate.  There is
> > > documentation [1] indicating that some OS/Hypervisor vendors will
> > > support live migration of a system with a direct assigned
> > > networking device.  The problem I see with these implementations
> > > is that the network configuration requirements that are passed on
> > > to the owner of the VM are quite complicated.  You have to set up
> > > bonding, you have to configure it to enslave two interfaces,
> > > those interfaces (one is virtio-net, the other is SR-IOV
> > > device/driver like ixgbevf) must support MAC address changes
> > > requested in the VM, and on and on...
> > >
> > > So, on to the proposal:
> > > Modify virtio-net driver to be a single VM network device that
> > > enslaves an SR-IOV network device (inside the VM) with the same
> > > MAC address. This would cause the virtio-net driver to appear and
> > > work like a simplified bonding/team driver.  The live migration
> > > problem would be solved just like today's bonding solution, but
> > > the VM user's networking config would be greatly simplified.
> > >
> > > At its simplest, it would appear something like this in the VM.
> > >
> > > ==========
> > > = vnet0  =
> > >           =============
> > > (virtio- =       |
> > >   net)    =       |
> > >           =  ===========
> > >           =  = ixgbevf =
> > > ==========  ===========
> > >
> > > (forgive the ASCII art)
> > >
> > > The fast path traffic would prefer the ixgbevf or other SR-IOV
> > > device path, and fall back to virtio's transmit/receive when
> > > migrating.
> > >
> > > Compared to today's options this proposal would
> > > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> > >     speeds
> > > 2) simplify end user configuration in the VM (most if not all of
> > > the set up to enable migration would be done in the hypervisor)
> > > 3) allow live migration via a simple link down and maybe a PCI
> > >     hot-unplug of the SR-IOV device, with failover to the
> > > virtio-net driver core
> > > 4) allow vendor agnostic hardware acceleration, and live migration
> > >     between vendors if the VM os has driver support for all the
> > > required SR-IOV devices.
> > >
> > > Runtime operation proposed:
> > > - <in either order> virtio-net driver loads, SR-IOV driver loads
> > > - virtio-net finds other NICs that match its MAC address by
> > >    both examining existing interfaces and setting up a new device
> > >    notifier
> > > - virtio-net enslaves the first NIC with the same MAC address
> > > - virtio-net brings up the slave, and makes it the "preferred"
> > > path
> > > - virtio-net follows the behavior of an active backup bond/team
> > > - virtio-net acts as the interface to the VM
> > > - live migration initiates
> > > - link goes down on SR-IOV, or SR-IOV device is removed
> > > - failover to virtio-net as primary path
> > > - migration continues to new host
> > > - new host is started with virtio-net as primary
> > > - if no SR-IOV, virtio-net stays primary
> > > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > > - virtio-net notices new NIC and starts over at enslave step above
> > >
> > > Future ideas (brainstorming):
> > > - Optimize Fast east-west by having special rules to direct
> > > east-west traffic through virtio-net traffic path
> > >
> > > Thanks for reading!
> > > Jesse    
> > 
> > Cc netdev.
> > 
> > Interesting, and this method is actually used by netvsc now:
> > 
> > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > Author: stephen hemminger <stephen@networkplumber.org>
> > Date:   Tue Aug 1 19:58:53 2017 -0700
> > 
> >      netvsc: transparent VF management
> > 
> >      This patch implements transparent fail over from synthetic NIC
> > to SR-IOV virtual function NIC in Hyper-V environment. It is a
> > better alternative to using bonding as is done now. Instead, the
> > receive and transmit fail over is done internally inside the driver.
> > 
> >      Using bonding driver has lots of issues because it depends on
> > the script being run early enough in the boot process and with
> > sufficient information to make the association. This patch moves
> > all that functionality into the kernel.
> > 
> >      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >      Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > If my understanding is correct, there's no need for any extension
> > of the virtio spec. If this is true, maybe you can start to prepare
> > the patch?
> 
> IMHO this is as close to policy in the kernel as one can get.  User
> land has all the information it needs to instantiate that bond/team
> automatically.  In fact I'm trying to discuss this with NetworkManager
> folks and Red Hat right now:
> 
> https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html
> 
> Can we flip the argument and ask why is the kernel supposed to be
> responsible for this?  It's not like we run DHCP out of the kernel
> on new interfaces... 

Although "policy should not be in the kernel" is a great mantra,
it is not practical in the real world.

If you think it can be solved in userspace, then you haven't had to
deal with four different network initialization systems, multiple
orchestration systems and customers on ancient Enterprise distributions.


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  4:10     ` Stephen Hemminger
@ 2017-11-30  4:21       ` Jakub Kicinski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2017-11-30  4:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jason Wang, Jesse Brandeburg, virtualization, mst,
	Sridhar Samudrala, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Andy Gospodarek, Or Gerlitz, netdev, Hannes Frederic Sowa

On Wed, 29 Nov 2017 20:10:09 -0800, Stephen Hemminger wrote:
> On Wed, 29 Nov 2017 19:51:38 -0800 Jakub Kicinski wrote:
> > On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:  
> > > On 2017-11-29 03:27, Jesse Brandeburg wrote:
> > > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > > Author: stephen hemminger <stephen@networkplumber.org>
> > > Date:   Tue Aug 1 19:58:53 2017 -0700
> > > 
> > >      netvsc: transparent VF management
> > > 
> > >      This patch implements transparent fail over from synthetic NIC
> > > to SR-IOV virtual function NIC in Hyper-V environment. It is a
> > > better alternative to using bonding as is done now. Instead, the
> > > receive and transmit fail over is done internally inside the driver.
> > > 
> > >      Using bonding driver has lots of issues because it depends on
> > > the script being run early enough in the boot process and with
> > > sufficient information to make the association. This patch moves
> > > all that functionality into the kernel.
> > > 
> > >      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > >      Signed-off-by: David S. Miller <davem@davemloft.net>
> > > 
> > > If my understanding is correct, there's no need for any extension
> > > of the virtio spec. If this is true, maybe you can start to prepare
> > > the patch?
> > 
> > IMHO this is as close to policy in the kernel as one can get.  User
> > land has all the information it needs to instantiate that bond/team
> > automatically.  In fact I'm trying to discuss this with NetworkManager
> > folks and Red Hat right now:
> > 
> > https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html
> > 
> > Can we flip the argument and ask why is the kernel supposed to be
> > responsible for this?  It's not like we run DHCP out of the kernel
> > on new interfaces...   
> 
> Although "policy should not be in the kernel" is a great mantra,
> it is not practical in the real world.
> 
> If you think it can be solved in userspace, then you haven't had to
> deal with four different network initialization systems, multiple
> orchestration systems and customers on ancient Enterprise distributions.

I would accept that argument if anyone ever tried to get those
Enterprise distros to handle this use case.  From conversations I 
had it seemed like no one ever did, and SR-IOV+virtio bonding is 
what has been done to solve this since day 1 of SR-IOV networking.

For practical reasons it's easier to push this into the kernel, 
because vendors rarely employ developers of the user space
orchestration systems.  Is that not the real problem here,
potentially? :)


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  3:51   ` Jakub Kicinski
  2017-11-30  4:10     ` Stephen Hemminger
@ 2017-11-30 13:54     ` Michael S. Tsirkin
  2017-11-30 20:48       ` Jakub Kicinski
  1 sibling, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-11-30 13:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jason Wang, Jesse Brandeburg, virtualization, Sridhar Samudrala,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Andy Gospodarek,
	Or Gerlitz, netdev, Hannes Frederic Sowa

On Wed, Nov 29, 2017 at 07:51:38PM -0800, Jakub Kicinski wrote:
> On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:
> > On 2017-11-29 03:27, Jesse Brandeburg wrote:
> > > Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> > > to ease configuration of a VM and that would enable live migration of
> > > passthrough network SR-IOV devices.
> > >
> > > Today we have SR-IOV network devices (VFs) that can be passed into a VM
> > > in order to enable high performance networking direct within the VM.
> > > The problem I am trying to address is that this configuration is
> > > generally difficult to live-migrate.  There is documentation [1]
> > > indicating that some OS/Hypervisor vendors will support live migration
> > > of a system with a direct assigned networking device.  The problem I
> > > see with these implementations is that the network configuration
> > > requirements that are passed on to the owner of the VM are quite
> > > complicated.  You have to set up bonding, you have to configure it to
> > > enslave two interfaces, those interfaces (one is virtio-net, the other
> > > is SR-IOV device/driver like ixgbevf) must support MAC address changes
> > > requested in the VM, and on and on...
> > >
> > > So, on to the proposal:
> > > Modify virtio-net driver to be a single VM network device that
> > > enslaves an SR-IOV network device (inside the VM) with the same MAC
> > > address. This would cause the virtio-net driver to appear and work like
> > > a simplified bonding/team driver.  The live migration problem would be
> > > solved just like today's bonding solution, but the VM user's networking
> > > config would be greatly simplified.
> > >
> > > At its simplest, it would appear something like this in the VM.
> > >
> > > ==========
> > > = vnet0  =
> > >           =============
> > > (virtio- =       |
> > >   net)    =       |
> > >           =  ===========
> > >           =  = ixgbevf =
> > > ==========  ===========
> > >
> > > (forgive the ASCII art)
> > >
> > > The fast path traffic would prefer the ixgbevf or other SR-IOV device
> > > path, and fall back to virtio's transmit/receive when migrating.
> > >
> > > Compared to today's options this proposal would
> > > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> > >     speeds
> > > 2) simplify end user configuration in the VM (most if not all of the
> > >     set up to enable migration would be done in the hypervisor)
> > > 3) allow live migration via a simple link down and maybe a PCI
> > >     hot-unplug of the SR-IOV device, with failover to the virtio-net
> > >     driver core
> > > 4) allow vendor agnostic hardware acceleration, and live migration
> > >     between vendors if the VM os has driver support for all the required
> > >     SR-IOV devices.
> > >
> > > Runtime operation proposed:
> > > - <in either order> virtio-net driver loads, SR-IOV driver loads
> > > - virtio-net finds other NICs that match its MAC address by both
> > >    examining existing interfaces and setting up a new device notifier
> > > - virtio-net enslaves the first NIC with the same MAC address
> > > - virtio-net brings up the slave, and makes it the "preferred" path
> > > - virtio-net follows the behavior of an active backup bond/team
> > > - virtio-net acts as the interface to the VM
> > > - live migration initiates
> > > - link goes down on SR-IOV, or SR-IOV device is removed
> > > - failover to virtio-net as primary path
> > > - migration continues to new host
> > > - new host is started with virtio-net as primary
> > > - if no SR-IOV, virtio-net stays primary
> > > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > > - virtio-net notices new NIC and starts over at enslave step above
> > >
> > > Future ideas (brainstorming):
> > > - Optimize Fast east-west by having special rules to direct east-west
> > >    traffic through virtio-net traffic path
> > >
> > > Thanks for reading!
> > > Jesse  
> > 
> > Cc netdev.
> > 
> > Interesting, and this method is actually used by netvsc now:
> > 
> > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > Author: stephen hemminger <stephen@networkplumber.org>
> > Date:   Tue Aug 1 19:58:53 2017 -0700
> > 
> >      netvsc: transparent VF management
> > 
> >      This patch implements transparent fail over from synthetic NIC to
> >      SR-IOV virtual function NIC in Hyper-V environment. It is a better
> >      alternative to using bonding as is done now. Instead, the receive and
> >      transmit fail over is done internally inside the driver.
> > 
> >      Using bonding driver has lots of issues because it depends on the
> >      script being run early enough in the boot process and with sufficient
> >      information to make the association. This patch moves all that
> >      functionality into the kernel.
> > 
> >      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >      Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > If my understanding is correct, there's no need for any extension of the
> > virtio spec. If this is true, maybe you can start to prepare the patch?
>
> IMHO this is as close to policy in the kernel as one can get.  User
> land has all the information it needs to instantiate that bond/team
> automatically.

It does have this info (MAC addresses match) but where's the policy
here? IMHO the policy has been set by the hypervisor already.
From the hypervisor's POV, adding passthrough is a commitment not to migrate
until guest stops using the passthrough device.

Within the guest, the bond is required for purely functional reasons - just to
keep a link up, since we know the SR-IOV device will go away. Maintaining an
uninterrupted connection is not a policy - it's what networking is
about.

>  In fact I'm trying to discuss this with NetworkManager
> folks and Red Hat right now:
> 
> https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html

I thought we should do it too, for a while.

But now, I think that the real issue is this: kernel exposes what looks
like two network devices to userspace, but in fact it is just one
backend device, just exposed by hypervisor in a weird way for
compatibility reasons.

For example, you will not get better reliability or throughput by using
both of them - the only bonding mode that makes sense is fail over. As
another example, if the underlying physical device lost its link, trying
to use virtio won't help - it's only useful when the passthrough device
is gone for good.  As another example, there is no point in not
configuring a bond. As a last example, depending on how the backend is
configured, virtio might not even work when the pass-through device is
active.

So from that point of view, showing two network devices to userspace is
a bug that we are asking userspace to work around.
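
One way to read the single-device view described above is sketched below
in Python. This is an interpretation under stated assumptions, not an
existing interface: the function names datapath and carrier are
hypothetical. The model treats the pair as one device whose datapath
changes only when the passthrough device is actually gone; a lost physical
link on the passthrough device shows up as carrier loss on the single
device and does not trigger a switch to virtio.

```python
def datapath(vf_present: bool) -> str:
    """Pick the datapath for the single guest-visible device.

    virtio is used only when the passthrough (VF) device is gone;
    the VF's link state is deliberately not an input here.
    """
    return "vf" if vf_present else "virtio"


def carrier(vf_present: bool, vf_link_up: bool, virtio_link_up: bool) -> bool:
    """Carrier state reported to userspace for the single device."""
    # While the VF is present its link state is the device's link state;
    # falling back to virtio would not help if the physical link is down.
    if vf_present:
        return vf_link_up
    return virtio_link_up
```

This differs from a plain active-backup bond, which would also switch
members on link loss.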

> Can we flip the argument and ask why is the kernel supposed to be
> responsible for this?

Because if we show a single device to userspace the number of
misconfigured guests will go down, and we won't lose any useful
flexibility.

>  It's not like we run DHCP out of the kernel
> on new interfaces... 

Because one can set up a static IP, IPv6 doesn't always need DHCP, etc.

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30 13:54     ` Michael S. Tsirkin
@ 2017-11-30 20:48       ` Jakub Kicinski
  2017-12-01  5:13         ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2017-11-30 20:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Jesse Brandeburg, virtualization, Sridhar Samudrala,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Andy Gospodarek,
	Or Gerlitz, netdev, Hannes Frederic Sowa

On Thu, 30 Nov 2017 15:54:40 +0200, Michael S. Tsirkin wrote:
> On Wed, Nov 29, 2017 at 07:51:38PM -0800, Jakub Kicinski wrote:
> > On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:  
> > > On 2017-11-29 03:27, Jesse Brandeburg wrote:
> > > > Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> > > > to ease configuration of a VM and that would enable live migration of
> > > > passthrough network SR-IOV devices.
> > > >
> > > > Today we have SR-IOV network devices (VFs) that can be passed into a VM
> > > > in order to enable high performance networking direct within the VM.
> > > > The problem I am trying to address is that this configuration is
> > > > generally difficult to live-migrate.  There is documentation [1]
> > > > indicating that some OS/Hypervisor vendors will support live migration
> > > > of a system with a direct assigned networking device.  The problem I
> > > > see with these implementations is that the network configuration
> > > > requirements that are passed on to the owner of the VM are quite
> > > > complicated.  You have to set up bonding, you have to configure it to
> > > > enslave two interfaces, those interfaces (one is virtio-net, the other
> > > > is SR-IOV device/driver like ixgbevf) must support MAC address changes
> > > > requested in the VM, and on and on...
> > > >
> > > > So, on to the proposal:
> > > > Modify virtio-net driver to be a single VM network device that
> > > > enslaves an SR-IOV network device (inside the VM) with the same MAC
> > > > address. This would cause the virtio-net driver to appear and work like
> > > > a simplified bonding/team driver.  The live migration problem would be
> > > > solved just like today's bonding solution, but the VM user's networking
> > > > config would be greatly simplified.
> > > >
> > > > At its simplest, it would appear something like this in the VM.
> > > >
> > > > ==========
> > > > = vnet0  =
> > > >           =============
> > > > (virtio- =       |
> > > >   net)    =       |
> > > >           =  ===========
> > > >           =  = ixgbevf =
> > > > ==========  ===========
> > > >
> > > > (forgive the ASCII art)
> > > >
> > > > The fast path traffic would prefer the ixgbevf or other SR-IOV device
> > > > path, and fall back to virtio's transmit/receive when migrating.
> > > >
> > > > Compared to today's options this proposal would
> > > > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> > > >     speeds
> > > > 2) simplify end user configuration in the VM (most if not all of the
> > > >     set up to enable migration would be done in the hypervisor)
> > > > 3) allow live migration via a simple link down and maybe a PCI
> > > >     hot-unplug of the SR-IOV device, with failover to the virtio-net
> > > >     driver core
> > > > 4) allow vendor agnostic hardware acceleration, and live migration
> > > >     between vendors if the VM os has driver support for all the required
> > > >     SR-IOV devices.
> > > >
> > > > Runtime operation proposed:
> > > > - <in either order> virtio-net driver loads, SR-IOV driver loads
> > > > - virtio-net finds other NICs that match its MAC address by both
> > > >    examining existing interfaces and setting up a new device notifier
> > > > - virtio-net enslaves the first NIC with the same MAC address
> > > > - virtio-net brings up the slave, and makes it the "preferred" path
> > > > - virtio-net follows the behavior of an active backup bond/team
> > > > - virtio-net acts as the interface to the VM
> > > > - live migration initiates
> > > > - link goes down on SR-IOV, or SR-IOV device is removed
> > > > - failover to virtio-net as primary path
> > > > - migration continues to new host
> > > > - new host is started with virtio-net as primary
> > > > - if no SR-IOV, virtio-net stays primary
> > > > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > > > - virtio-net notices new NIC and starts over at enslave step above
> > > >
> > > > Future ideas (brainstorming):
> > > > - Optimize Fast east-west by having special rules to direct east-west
> > > >    traffic through virtio-net traffic path
> > > >
> > > > Thanks for reading!
> > > > Jesse    
> > > 
> > > Cc netdev.
> > > 
> > > Interesting, and this method is actually used by netvsc now:
> > > 
> > > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > > Author: stephen hemminger <stephen@networkplumber.org>
> > > Date:   Tue Aug 1 19:58:53 2017 -0700
> > > 
> > >      netvsc: transparent VF management
> > > 
> > >      This patch implements transparent fail over from synthetic NIC to
> > >      SR-IOV virtual function NIC in Hyper-V environment. It is a better
> > >      alternative to using bonding as is done now. Instead, the receive and
> > >      transmit fail over is done internally inside the driver.
> > > 
> > >      Using bonding driver has lots of issues because it depends on the
> > >      script being run early enough in the boot process and with sufficient
> > >      information to make the association. This patch moves all that
> > >      functionality into the kernel.
> > > 
> > >      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > >      Signed-off-by: David S. Miller <davem@davemloft.net>
> > > 
> > > > If my understanding is correct there's no need for any extension of
> > > virtio spec. If this is true, maybe you can start to prepare the patch?  
> >
> > IMHO this is as close to policy in the kernel as one can get.  User
> > land has all the information it needs to instantiate that bond/team
> > automatically.  
> 
> It does have this info (MAC addresses match) but where's the policy
> here? IMHO the policy has been set by the hypervisor already.
> From hypervisor POV adding passthrough is a commitment not to migrate
> until guest stops using the passthrough device.
> 
> Within the guest, the bond is required for purely functional reasons - just to
> > maintain a link up since we know SRIOV will go away. Maintaining an
> uninterrupted connection is not a policy - it's what networking is
> about.
> 
> >  In fact I'm trying to discuss this with NetworkManager
> > folks and Red Hat right now:
> > 
> > https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html  
> 
> I thought we should do it too, for a while.
> 
> But now, I think that the real issue is this: kernel exposes what looks
> like two network devices to userspace, but in fact it is just one
> backend device, just exposed by hypervisor in a weird way for
> compatibility reasons.
> 
> For example you will not get a better reliability or throughput by using
> both of them - the only bonding mode that makes sense is fail over.

Yes, I'm talking about fail over.

> As another example, if the underlying physical device lost its link, trying
> to use virtio won't help - it's only useful when the passthrough device
> is gone for good.  As another example, there is no point in not
> configuring a bond. As a last example, depending on how the backend is
> configured, virtio might not even work when the pass-through device is
> active.
> 
> So from that point of view, showing two network devices to userspace is
> a bug that we are asking userspace to work around.

I'm confused by what you're saying here.  IIRC the question is whether
we expose 2 netdevs or 3.  There will always be a virtio netdev and a
VF netdev.  I assume you're not suggesting hiding the VF netdev.  So
the question is: do we expose a VF netdev and a combo virtio netdev
which is also a bond, or do we expose a VF netdev, a virtio netdev, and
an active/passive bond/team, which is a well understood and
architecturally correct construct?

> > Can we flip the argument and ask why is the kernel supposed to be
> > responsible for this?  
> 
> Because if we show a single device to userspace the number of
> misconfigured guests will go down, and we won't lose any useful
> flexibility.

Again, single device?

> >  It's not like we run DHCP out of the kernel
> > on new interfaces...   
> 
> Because one can set up a static IP, IPv6 doesn't always need DHCP, etc.

But we don't handle LACP, etc.

Look, as much as I don't like this, I'm not going to argue about this to
death.  I just find it very dishonest to claim kernel *has to* do it,
when no one seems to have made any honest attempts to solve this in user
space for the last 10 years :/


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30 20:48       ` Jakub Kicinski
@ 2017-12-01  5:13         ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-12-01  5:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jason Wang, Jesse Brandeburg, virtualization, Sridhar Samudrala,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Andy Gospodarek,
	Or Gerlitz, netdev, Hannes Frederic Sowa

On Thu, Nov 30, 2017 at 12:48:22PM -0800, Jakub Kicinski wrote:
> On Thu, 30 Nov 2017 15:54:40 +0200, Michael S. Tsirkin wrote:
> > On Wed, Nov 29, 2017 at 07:51:38PM -0800, Jakub Kicinski wrote:
> > > On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:  
> > > > On 2017年11月29日 03:27, Jesse Brandeburg wrote:  
> > > > > Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> > > > > to ease configuration of a VM and that would enable live migration of
> > > > > passthrough network SR-IOV devices.
> > > > >
> > > > > Today we have SR-IOV network devices (VFs) that can be passed into a VM
> > > > > in order to enable high performance networking direct within the VM.
> > > > > The problem I am trying to address is that this configuration is
> > > > > generally difficult to live-migrate.  There is documentation [1]
> > > > > indicating that some OS/Hypervisor vendors will support live migration
> > > > > of a system with a direct assigned networking device.  The problem I
> > > > > see with these implementations is that the network configuration
> > > > > requirements that are passed on to the owner of the VM are quite
> > > > > complicated.  You have to set up bonding, you have to configure it to
> > > > > enslave two interfaces, those interfaces (one is virtio-net, the other
> > > > > is SR-IOV device/driver like ixgbevf) must support MAC address changes
> > > > > requested in the VM, and on and on...
> > > > >
> > > > > So, on to the proposal:
> > > > > Modify virtio-net driver to be a single VM network device that
> > > > > enslaves an SR-IOV network device (inside the VM) with the same MAC
> > > > > address. This would cause the virtio-net driver to appear and work like
> > > > > a simplified bonding/team driver.  The live migration problem would be
> > > > > solved just like today's bonding solution, but the VM user's networking
> > > > > config would be greatly simplified.
> > > > >
> > > > > At its simplest, it would appear something like this in the VM.
> > > > >
> > > > > ==========
> > > > > = vnet0  =
> > > > >           =============
> > > > > (virtio- =       |
> > > > >   net)    =       |
> > > > >           =  ==========
> > > > >           =  = ixgbef =
> > > > > ==========  ==========
> > > > >
> > > > > (forgive the ASCII art)
> > > > >
> > > > > The fast path traffic would prefer the ixgbevf or other SR-IOV device
> > > > > path, and fall back to virtio's transmit/receive when migrating.
> > > > >
> > > > > Compared to today's options this proposal would
> > > > > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> > > > >     speeds
> > > > > 2) simplify end user configuration in the VM (most if not all of the
> > > > >     set up to enable migration would be done in the hypervisor)
> > > > > 3) allow live migration via a simple link down and maybe a PCI
> > > > >     hot-unplug of the SR-IOV device, with failover to the virtio-net
> > > > >     driver core
> > > > > 4) allow vendor agnostic hardware acceleration, and live migration
> > > > >     between vendors if the VM os has driver support for all the required
> > > > >     SR-IOV devices.
> > > > >
> > > > > Runtime operation proposed:
> > > > > - <in either order> virtio-net driver loads, SR-IOV driver loads
> > > > > - virtio-net finds other NICs that match its MAC address by both
> > > > >    examining existing interfaces and setting up a new device notifier
> > > > > - virtio-net enslaves the first NIC with the same MAC address
> > > > > - virtio-net brings up the slave, and makes it the "preferred" path
> > > > > - virtio-net follows the behavior of an active backup bond/team
> > > > > - virtio-net acts as the interface to the VM
> > > > > - live migration initiates
> > > > > - link goes down on SR-IOV, or SR-IOV device is removed
> > > > > - failover to virtio-net as primary path
> > > > > - migration continues to new host
> > > > > - new host is started with virtio-net as primary
> > > > > - if no SR-IOV, virtio-net stays primary
> > > > > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > > > > - virtio-net notices new NIC and starts over at enslave step above
> > > > >
> > > > > Future ideas (brainstorming):
> > > > > - Optimize Fast east-west by having special rules to direct east-west
> > > > >    traffic through virtio-net traffic path
> > > > >
> > > > > Thanks for reading!
> > > > > Jesse    
> > > > 
> > > > Cc netdev.
> > > > 
> > > > Interesting, and this method is actually used by netvsc now:
> > > > 
> > > > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > > > Author: stephen hemminger <stephen@networkplumber.org>
> > > > Date:   Tue Aug 1 19:58:53 2017 -0700
> > > > 
> > > >      netvsc: transparent VF management
> > > > 
> > > >      This patch implements transparent fail over from synthetic NIC to
> > > >      SR-IOV virtual function NIC in Hyper-V environment. It is a better
> > > >      alternative to using bonding as is done now. Instead, the receive and
> > > >      transmit fail over is done internally inside the driver.
> > > > 
> > > >      Using bonding driver has lots of issues because it depends on the
> > > >      script being run early enough in the boot process and with sufficient
> > > >      information to make the association. This patch moves all that
> > > >      functionality into the kernel.
> > > > 
> > > >      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > > >      Signed-off-by: David S. Miller <davem@davemloft.net>
> > > > 
> > > > > If my understanding is correct there's no need for any extension of
> > > > virtio spec. If this is true, maybe you can start to prepare the patch?  
> > >
> > > IMHO this is as close to policy in the kernel as one can get.  User
> > > land has all the information it needs to instantiate that bond/team
> > > automatically.  
> > 
> > It does have this info (MAC addresses match) but where's the policy
> > here? IMHO the policy has been set by the hypervisor already.
> > From hypervisor POV adding passthrough is a commitment not to migrate
> > until guest stops using the passthrough device.
> > 
> > Within the guest, the bond is required for purely functional reasons - just to
> > > maintain a link up since we know SRIOV will go away. Maintaining an
> > uninterrupted connection is not a policy - it's what networking is
> > about.
> > 
> > >  In fact I'm trying to discuss this with NetworkManager
> > > folks and Red Hat right now:
> > > 
> > > https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html  
> > 
> > I thought we should do it too, for a while.
> > 
> > But now, I think that the real issue is this: kernel exposes what looks
> > like two network devices to userspace, but in fact it is just one
> > backend device, just exposed by hypervisor in a weird way for
> > compatibility reasons.
> > 
> > For example you will not get a better reliability or throughput by using
> > both of them - the only bonding mode that makes sense is fail over.
> 
> Yes, I'm talking about fail over.
> 
> > As another example, if the underlying physical device lost its link, trying
> > to use virtio won't help - it's only useful when the passthrough device
> > is gone for good.  As another example, there is no point in not
> > configuring a bond. As a last example, depending on how the backend is
> > configured, virtio might not even work when the pass-through device is
> > active.
> > 
> > So from that point of view, showing two network devices to userspace is
> > a bug that we are asking userspace to work around.
> 
> I'm confused by what you're saying here.  IIRC the question is whether
> we expose 2 netdevs or 3.  There will always be a virtio netdev and a
> VF netdev.  I assume you're not suggesting hiding the VF netdev.

Passthrough is a better term; it does not have to be a VF.

All I am saying is these are not two independent devices.

It's a good point - ideally we would hide it completely. Not sure we
can.

>  So
> the question is: do we expose a VF netdev and a combo virtio netdev
> which is also a bond, or do we expose a VF netdev, a virtio netdev, and
> an active/passive bond/team, which is a well understood and
> architecturally correct construct?

It's a well understood construct for bonding but it is not exactly what
we are dealing with here. What we have is a single device with two ways
to access it.


> > > Can we flip the argument and ask why is the kernel supposed to be
> > > responsible for this?  
> > 
> > Because if we show a single device to userspace the number of
> > misconfigured guests will go down, and we won't lose any useful
> > flexibility.
> 
> Again, single device?
> 
> > >  It's not like we run DHCP out of the kernel
> > > on new interfaces...   
> > 
> > Because one can set up a static IP, IPv6 doesn't always need DHCP, etc.
> 
> But we don't handle LACP, etc.
> 
> Look, as much as I don't like this, I'm not going to argue about this to
> death.  I just find it very dishonest to claim kernel *has to* do it,

The kernel does not *have* to do it. It might be better to do it in the kernel, though.

> when no one seems to have made any honest attempts to solve this in user
> space for the last 10 years :/

Each time I tried to convince userspace maintainers I got the kind of
pushback you seem to have encountered. And the reason is that they are
used to managing bonding for multiple actual Ethernet ports, with all
the complexity this entails. Maybe it wasn't an honest attempt in that
I didn't actually post patches, but there you are.

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  3:29 ` [RFC] virtio-net: help live migrate SR-IOV devices Jason Wang
  2017-11-30  3:51   ` Jakub Kicinski
@ 2017-11-30  8:08   ` achiad shochat
  2017-11-30 14:11     ` Michael S. Tsirkin
  2017-11-30 14:14   ` Michael S. Tsirkin
  2 siblings, 1 reply; 25+ messages in thread
From: achiad shochat @ 2017-11-30  8:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesse Brandeburg, virtualization, Jakub Kicinski, mst,
	Sridhar Samudrala, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Andy Gospodarek, Or Gerlitz, Hannes Frederic Sowa, netdev

On 30 November 2017 at 05:29, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2017年11月29日 03:27, Jesse Brandeburg wrote:
>>
>> Hi, I'd like to get some feedback on a proposal to enhance virtio-net
>> to ease configuration of a VM and that would enable live migration of
>> passthrough network SR-IOV devices.
>>
>> Today we have SR-IOV network devices (VFs) that can be passed into a VM
>> in order to enable high performance networking direct within the VM.
>> The problem I am trying to address is that this configuration is
>> generally difficult to live-migrate.  There is documentation [1]
>> indicating that some OS/Hypervisor vendors will support live migration
>> of a system with a direct assigned networking device.  The problem I
>> see with these implementations is that the network configuration
>> requirements that are passed on to the owner of the VM are quite
>> complicated.  You have to set up bonding, you have to configure it to
>> enslave two interfaces, those interfaces (one is virtio-net, the other
>> is SR-IOV device/driver like ixgbevf) must support MAC address changes
>> requested in the VM, and on and on...
>>
>> So, on to the proposal:
>> Modify virtio-net driver to be a single VM network device that
>> enslaves an SR-IOV network device (inside the VM) with the same MAC
>> address. This would cause the virtio-net driver to appear and work like
>> a simplified bonding/team driver.  The live migration problem would be
>> solved just like today's bonding solution, but the VM user's networking
>> config would be greatly simplified.
>>
>> At its simplest, it would appear something like this in the VM.
>>
>> ==========
>> = vnet0  =
>>           =============
>> (virtio- =       |
>>   net)    =       |
>>           =  ==========
>>           =  = ixgbef =
>> ==========  ==========
>>
>> (forgive the ASCII art)
>>
>> The fast path traffic would prefer the ixgbevf or other SR-IOV device
>> path, and fall back to virtio's transmit/receive when migrating.
>>
>> Compared to today's options this proposal would
>> 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
>>     speeds
>> 2) simplify end user configuration in the VM (most if not all of the
>>     set up to enable migration would be done in the hypervisor)
>> 3) allow live migration via a simple link down and maybe a PCI
>>     hot-unplug of the SR-IOV device, with failover to the virtio-net
>>     driver core
>> 4) allow vendor agnostic hardware acceleration, and live migration
>>     between vendors if the VM os has driver support for all the required
>>     SR-IOV devices.
>>
>> Runtime operation proposed:
>> - <in either order> virtio-net driver loads, SR-IOV driver loads
>> - virtio-net finds other NICs that match its MAC address by both
>>    examining existing interfaces and setting up a new device notifier
>> - virtio-net enslaves the first NIC with the same MAC address
>> - virtio-net brings up the slave, and makes it the "preferred" path
>> - virtio-net follows the behavior of an active backup bond/team
>> - virtio-net acts as the interface to the VM
>> - live migration initiates
>> - link goes down on SR-IOV, or SR-IOV device is removed
>> - failover to virtio-net as primary path
>> - migration continues to new host
>> - new host is started with virtio-net as primary
>> - if no SR-IOV, virtio-net stays primary
>> - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
>> - virtio-net notices new NIC and starts over at enslave step above
>>
>> Future ideas (brainstorming):
>> - Optimize Fast east-west by having special rules to direct east-west
>>    traffic through virtio-net traffic path
>>
>> Thanks for reading!
>> Jesse
>
>
> Cc netdev.
>
> Interesting, and this method is actually used by netvsc now:
>
> commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> Author: stephen hemminger <stephen@networkplumber.org>
> Date:   Tue Aug 1 19:58:53 2017 -0700
>
>     netvsc: transparent VF management
>
>     This patch implements transparent fail over from synthetic NIC to
>     SR-IOV virtual function NIC in Hyper-V environment. It is a better
>     alternative to using bonding as is done now. Instead, the receive and
>     transmit fail over is done internally inside the driver.
>
>     Using bonding driver has lots of issues because it depends on the
>     script being run early enough in the boot process and with sufficient
>     information to make the association. This patch moves all that
>     functionality into the kernel.
>
>     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> If my understanding is correct there's no need for any extension of
> virtio spec. If this is true, maybe you can start to prepare the patch?
>
> Thanks
>
>>
>> [1]
>>
>> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts
>
>

I do not see why we should couple the solution to any specific
para-virt technology, neither netvsc nor virtio.
One may wish to implement the routing between the VMs and the HV
without any PV device at all, e.g. using VF representors and PCIe
loopback, as done with ASAP2-direct.
This method is actually much more efficient in CPU utilization (at the
expense of PCIe BW utilization).

So let's try to first specify the problems that need to be resolved in
order to support Live Migration with SR-IOV rather than rely on
already-done-work (netvsc) without understanding if/why it was right
from the beginning.

To my understanding the problems are the following:
1) DMA: SR-IOV devices write directly into the guest's memory, which
yields dirty guest pages that are not marked as dirty for the host CPU
MMU, thus preventing the migration pre-copy phase from starting while
the guest is running on the source machine.
2) Guest network interface persistence: VF detachment causes VF driver
PCI remove, which causes the VF netdev to disappear. If that VF netdev
is a guest primary interface (has an IP), sockets using it will break.

Re problem #1:
So far in this mail thread, it was taken for granted that the way to
resolve it is to have a PV device as backup for the pre-copy phase.
In addition to tying the solution to para-virt being in place (which,
as already said, seems a wrong constraint to me), it does not really
solve the problem; it only partially works around it.
It just reduces the problem from a long service downtime to a long
period of degraded service.
To really strike at the heart of the problem we just need to mark the
guest pages written by DMA as dirty.
Alexander Duyck already initiated patches to address it ~two years ago
(https://groups.google.com/forum/#!topic/linux.kernel/aIQOsh2oJEk) but
unfortunately they were abandoned.
The simplest way I can think of to resolve it is to have the guest VF
driver just read-modify-write some word of each DMA page before
passing it to the stack by netif_rx().
To limit the performance impact of this operation we can signal the VM
to start doing it only upon pre-copy phase start.
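
As a rough illustration only (a userspace sketch, not driver code), the
per-page touch could look something like the following. The names
touch_rx_pages() and precopy_active, and the fixed 4096-byte page size,
are assumptions made for the example; a real VF driver would use the
kernel's PAGE_SIZE and whatever signalling mechanism is eventually
agreed on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; a real driver would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* Hypothetical flag: set once the hypervisor signals pre-copy start. */
int precopy_active;

/*
 * Rewrite one byte in every page overlapped by [buf, buf + len) with the
 * value it already holds, so the CPU performs a store to each page while
 * the received data stays unchanged. Returns the number of pages touched.
 */
size_t touch_rx_pages(void *buf, size_t len)
{
	uintptr_t addr = (uintptr_t)buf;
	uintptr_t end = addr + len;
	size_t touched = 0;

	assert(buf != NULL || len == 0);

	/* Outside of pre-copy, skip the extra stores entirely. */
	if (!precopy_active)
		return 0;

	while (addr < end) {
		volatile uint8_t *p = (volatile uint8_t *)addr;

		*p = *p; /* volatile load followed by volatile store */
		touched++;
		/* Jump to the first byte of the next page. */
		addr = (addr & ~(SKETCH_PAGE_SIZE - 1)) + SKETCH_PAGE_SIZE;
	}
	return touched;
}
```

The driver would call this on each receive buffer just before handing
it to netif_rx(); while precopy_active is clear the cost is a single
branch per buffer.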

Re. problem #2:
Indeed the best way to address it seems to be to enslave the VF driver
netdev under a persistent anchor netdev.
And it's indeed desired to allow (but not enforce) PV netdev and VF
netdev to work in conjunction.
And it's indeed desired that this enslavement logic work out of the box.
But in case of PV+VF some configurable policies must be in place (and
they'd better be generic rather than differ per PV technology).
For example - based on which characteristics should the PV+VF coupling
be done? netvsc uses MAC address, but that might not always be the
desire.
Another example - when to use PV and when to use VF? One may want to
use PV only if VF is gone/down, while others may want to use PV also
when VF is up, e.g. for multicasting.
I think the right way to address it is to have a new dedicated module
for this purpose.
Have it automatically enslave PV and VF netdevs according to user
configured policy. Enslave the VF even if there is no PV device at
all.
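
To make the policy idea a bit more concrete, here is a minimal sketch of
how such a module might pick the transmit path. Everything in it
(anchor_select_tx(), struct anchor_policy, the pv_for_multicast knob) is
hypothetical and only meant to show the shape of the decision, including
the case where there is no PV device at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tx_path { TX_NONE, TX_VF, TX_PV };

/* Minimal view of an enslaved netdev for this sketch. */
struct slave_state {
	bool present;
	bool link_up;
};

/* Hypothetical user-configurable policy. */
struct anchor_policy {
	bool pv_for_multicast; /* send multicast via PV even if VF is up */
};

static bool slave_usable(const struct slave_state *s)
{
	return s != NULL && s->present && s->link_up;
}

/* Pass pv == NULL when the anchor has no PV device underneath it. */
enum tx_path anchor_select_tx(const struct anchor_policy *pol,
			      const struct slave_state *vf,
			      const struct slave_state *pv,
			      bool is_multicast)
{
	assert(pol != NULL);

	if (is_multicast && pol->pv_for_multicast && slave_usable(pv))
		return TX_PV;
	if (slave_usable(vf))
		return TX_VF; /* VF is the preferred path */
	if (slave_usable(pv))
		return TX_PV; /* fall back to PV when VF is gone/down */
	return TX_NONE;
}
```

With pv_for_multicast clear this degenerates to plain active-backup with
the VF as primary; setting it gives the "PV also when VF is up" variant.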

This way we get:
1) Optimal migration performance
2) A PV agnostic (VM may be migrated even from one PV technology to
another) and HW device agnostic solution
    A dedicated generic module will also enforce a lower common
denominator of guest netdev features, preventing migration dependency
on source/guest machine capabilities.
3) Out-of-the-box solution yet with generic methods for policy setting

Thanks


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  8:08   ` achiad shochat
@ 2017-11-30 14:11     ` Michael S. Tsirkin
  2017-12-01 20:08       ` Shannon Nelson
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-11-30 14:11 UTC (permalink / raw)
  To: achiad shochat
  Cc: Jason Wang, Jesse Brandeburg, virtualization, Jakub Kicinski,
	Sridhar Samudrala, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Andy Gospodarek, Or Gerlitz, Hannes Frederic Sowa, netdev

On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
> Re. problem #2:
> Indeed the best way to address it seems to be to enslave the VF driver
> netdev under a persistent anchor netdev.
> And it's indeed desired to allow (but not enforce) PV netdev and VF
> netdev to work in conjunction.
> And it's indeed desired that this enslavement logic work out of the box.
> But in case of PV+VF some configurable policies must be in place (and
> they'd better be generic rather than differ per PV technology).
> For example - based on which characteristics should the PV+VF coupling
> be done? netvsc uses MAC address, but that might not always be the
> desire.

It's a policy but not guest userspace policy.

The hypervisor certainly knows.

Are you concerned that someone might want to create two devices with the
same MAC for an unrelated reason?  If so, hypervisor could easily set a
flag in the virtio device to say "this is a backup, use MAC to find
another device".

> Another example - when to use PV and when to use VF? One may want to
> use PV only if VF is gone/down, while others may want to use PV also
> when VF is up, e.g. for multicasting.

There are a bunch of configurations where these two devices share
the same physical backend. In that case there is no point
in multicasting through both devices, in fact, PV might
not even work at all when passthrough is active.

IMHO these cases are what's worth handling in the kernel.

When there are two separate backends, we are getting into policy and
it's best to leave this to userspace (and it's unlikely network manager
will automatically do the right thing here, either).

> I think the right way to address it is to have a new dedicated module
> for this purpose.
> Have it automatically enslave PV and VF netdevs according to user
> configured policy. Enslave the VF even if there is no PV device at
> all.
> 
> This way we get:
> 1) Optimal migration performance

This remains to be proved.

> 2) A PV agnostic (VM may be migrated even from one PV technology to
> another)

Yes - in theory kvm could expose both a virtio and a hyperv device. But what
would be the point? Just do the abstraction in the hypervisor.

> and HW device agnostic solution

That's useful but I don't think we need to involve userspace for that.
HW abstraction is kernel's job.

>     A dedicated generic module will also enforce a lower common
> denominator of guest netdev features, preventing migration dependency
> on source/guest machine capabilities.

If all we are discussing is where this code should live, then I
do not really care. Let's implement it in virtio; then, if we
find we have a lot in common, we can factor it out.

> 3) Out-of-the box solution yet with generic methods for policy setting
> 
> 
> Thanks

There's no real policy that guest can set though. All setting happens
on the hypervisor side.

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30 14:11     ` Michael S. Tsirkin
@ 2017-12-01 20:08       ` Shannon Nelson
  2017-12-03  5:05         ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Shannon Nelson @ 2017-12-01 20:08 UTC (permalink / raw)
  To: Michael S. Tsirkin, achiad shochat
  Cc: Jason Wang, Jesse Brandeburg, virtualization, Jakub Kicinski,
	Sridhar Samudrala, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Andy Gospodarek, Or Gerlitz, Hannes Frederic Sowa, netdev

On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
> On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
>> Re. problem #2:
>> Indeed the best way to address it seems to be to enslave the VF driver
>> netdev under a persistent anchor netdev.
>> And it's indeed desired to allow (but not enforce) PV netdev and VF
>> netdev to work in conjunction.
>> And it's indeed desired that this enslavement logic work out of the box.
>> But in case of PV+VF some configurable policies must be in place (and
>> they'd better be generic rather than differ per PV technology).
>> For example - based on which characteristics should the PV+VF coupling
>> be done? netvsc uses MAC address, but that might not always be the
>> desire.
> 
> It's a policy but not guest userspace policy.
> 
> The hypervisor certainly knows.
> 
> Are you concerned that someone might want to create two devices with the
> same MAC for an unrelated reason?  If so, hypervisor could easily set a
> flag in the virtio device to say "this is a backup, use MAC to find
> another device".

This is something I was going to suggest: a flag or other configuration 
on the virtio device to help control how this new feature is used.  I 
can imagine this might be useful to control from either the hypervisor 
side or the VM side.

The hypervisor might want to (1) disable it (force it off), (2) enable 
it for VM choice, or (3) force it on for the VM.  In case (2), the VM 
might be able to choose whether it wants to make use of the feature, or
stick with the bonding solution.

Either way, the kernel is making a feature available, and the user (VM 
or hypervisor) is able to control it by selecting the feature based on 
the policy desired.
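
Roughly, and only as a sketch, the three cases could resolve like this.
The enum and function names are made up for illustration; nothing like
them exists in virtio today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical setting chosen on the hypervisor side. */
enum hv_failover_mode {
	HV_FAILOVER_OFF,          /* (1) forced off */
	HV_FAILOVER_GUEST_CHOICE, /* (2) the VM decides */
	HV_FAILOVER_FORCED,       /* (3) forced on */
};

/* Returns true if the in-driver failover should be active in the VM. */
bool failover_enabled(enum hv_failover_mode mode, bool guest_opt_in)
{
	switch (mode) {
	case HV_FAILOVER_OFF:
		return false;
	case HV_FAILOVER_GUEST_CHOICE:
		return guest_opt_in; /* otherwise the VM keeps using bonding */
	case HV_FAILOVER_FORCED:
		return true;
	}
	assert(!"unknown hypervisor failover mode");
	return false;
}
```

The guest's preference only matters in case (2); in the other two cases
the hypervisor's setting wins.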

sln


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-01 20:08       ` Shannon Nelson
@ 2017-12-03  5:05         ` Michael S. Tsirkin
  2017-12-03  9:14           ` achiad shochat
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-12-03  5:05 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Jakub Kicinski, Hannes Frederic Sowa, achiad shochat,
	Sridhar Samudrala, netdev, virtualization, Achiad,
	Peter Waskiewicz Jr, Singhai, Anjali, Andy Gospodarek, Or Gerlitz

On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
> > > Re. problem #2:
> > > Indeed the best way to address it seems to be to enslave the VF driver
> > > netdev under a persistent anchor netdev.
> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
> > > netdev to work in conjunction.
> > > And it's indeed desired that this enslavement logic work out-of-the box.
> > > But in case of PV+VF some configurable policies must be in place (and
> > > they'd better be generic rather than differ per PV technology).
> > > For example - based on which characteristics should the PV+VF coupling
> > > be done? netvsc uses MAC address, but that might not always be the
> > > desire.
> > 
> > It's a policy but not guest userspace policy.
> > 
> > The hypervisor certainly knows.
> > 
> > Are you concerned that someone might want to create two devices with the
> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
> > flag in the virtio device to say "this is a backup, use MAC to find
> > another device".
> 
> This is something I was going to suggest: a flag or other configuration on
> the virtio device to help control how this new feature is used.  I can
> imagine this might be useful to control from either the hypervisor side or
> the VM side.
> 
> The hypervisor might want to (1) disable it (force it off), (2) enable it
> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
> able to chose whether it wants to make use of the feature, or stick with the
> bonding solution.
> 
> Either way, the kernel is making a feature available, and the user (VM or
> hypervisor) is able to control it by selecting the feature based on the
> policy desired.
> 
> sln

I'm not sure what feature is being made available here.

I saw this as a flag that says "this device shares a backend with another
network device which can be found using MAC, and that backend should be
preferred".  The kernel then forces a configuration which uses that other
backend - as long as it exists.
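
A rough userspace sketch of that lookup, assuming a plain array of
devices rather than the real netdev list.  struct sketch_netdev, the
is_backup flag and both function names are hypothetical and only
illustrate the "prefer the other backend while it exists" rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAC_LEN 6

/* Hypothetical stand-in for a net device; not the kernel's struct net_device. */
struct sketch_netdev {
	const char *name;
	unsigned char mac[SKETCH_MAC_LEN];
	bool is_backup; /* assumed flag the hypervisor sets on the virtio device */
	bool present;   /* false once the device has been unplugged */
};

/* Find a present, non-backup device with the same MAC as the backup. */
struct sketch_netdev *find_primary(struct sketch_netdev *devs, size_t n,
				   const struct sketch_netdev *backup)
{
	for (size_t i = 0; i < n; i++) {
		struct sketch_netdev *d = &devs[i];

		if (d == backup || !d->present || d->is_backup)
			continue;
		if (memcmp(d->mac, backup->mac, SKETCH_MAC_LEN) == 0)
			return d;
	}
	return NULL;
}

/* Use the other backend as long as it exists, else the backup itself. */
const struct sketch_netdev *select_datapath(struct sketch_netdev *devs, size_t n,
					    const struct sketch_netdev *backup)
{
	assert(backup->is_backup);

	struct sketch_netdev *primary = find_primary(devs, n, backup);

	return primary ? primary : backup;
}
```

Without the flag, two devices that merely happen to share a MAC would
never be paired, which covers the "unrelated reason" case.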

However, please Cc virtio-dev mailing list if we are doing this since
this is a spec extension.

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-03  5:05         ` Michael S. Tsirkin
@ 2017-12-03  9:14           ` achiad shochat
  2017-12-03 17:35             ` Stephen Hemminger
  0 siblings, 1 reply; 25+ messages in thread
From: achiad shochat @ 2017-12-03  9:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jakub Kicinski, Hannes Frederic Sowa, Sridhar Samudrala, netdev,
	virtualization, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Shannon Nelson, Andy Gospodarek, Or Gerlitz

On 3 December 2017 at 07:05, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
>> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
>> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
>> > > Re. problem #2:
>> > > Indeed the best way to address it seems to be to enslave the VF driver
>> > > netdev under a persistent anchor netdev.
>> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
>> > > netdev to work in conjunction.
>> > > And it's indeed desired that this enslavement logic work out-of-the box.
>> > > But in case of PV+VF some configurable policies must be in place (and
>> > > they'd better be generic rather than differ per PV technology).
>> > > For example - based on which characteristics should the PV+VF coupling
>> > > be done? netvsc uses MAC address, but that might not always be the
>> > > desire.
>> >
>> > It's a policy but not guest userspace policy.
>> >
>> > The hypervisor certainly knows.
>> >
>> > Are you concerned that someone might want to create two devices with the
>> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
>> > flag in the virtio device to say "this is a backup, use MAC to find
>> > another device".
>>
>> This is something I was going to suggest: a flag or other configuration on
>> the virtio device to help control how this new feature is used.  I can
>> imagine this might be useful to control from either the hypervisor side or
>> the VM side.
>>
>> The hypervisor might want to (1) disable it (force it off), (2) enable it
>> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
>> able to chose whether it wants to make use of the feature, or stick with the
>> bonding solution.
>>
>> Either way, the kernel is making a feature available, and the user (VM or
>> hypervisor) is able to control it by selecting the feature based on the
>> policy desired.
>>
>> sln
>
> I'm not sure what's the feature that is available here.
>
> I saw this as a flag that says "this device shares backend with another
> network device which can be found using MAC, and that backend should be
> preferred".  kernel then forces configuration which uses that other
> backend - as long as it exists.
>
> However, please Cc virtio-dev mailing list if we are doing this since
> this is a spec extension.
>
> --
> MST


Can someone please explain why we assume a virtio device is there at all?
I specified a case where there isn't any.

I second Jakub - having a netdev of one device driver enslave a netdev
of another device driver is an awkward, asymmetric model,
regardless of whether they share the same backend device.
Only I am not sure the Linux bond is the right choice.
For example, one may well want to use the virtio device even when the
pass-through device is available, e.g. for multicast, east-west
traffic, etc.
I'm not sure the Linux bond fits that functionality.
And, as I hear in this thread, it is hard to make it work out of the box.
So I think the right thing would be to write a new dedicated module
for this purpose.

Re policy -
Indeed the HV can request a policy from the guest but that's not a
claim for the virtio device enslaving the pass-through device.
Any policy can be queried by the upper enslaving device.

Bottom line - I do not see a single reason to have the virtio netdev
(or netvsc or any other PV netdev) enslave another netdev by itself.
If we had done it right with netvsc from the beginning we wouldn't need
this discussion at all...


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-03  9:14           ` achiad shochat
@ 2017-12-03 17:35             ` Stephen Hemminger
  2017-12-04  9:51               ` achiad shochat
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Hemminger @ 2017-12-03 17:35 UTC (permalink / raw)
  To: achiad shochat
  Cc: Michael S. Tsirkin, Jakub Kicinski, Hannes Frederic Sowa,
	Sridhar Samudrala, netdev, virtualization, Achiad,
	Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On Sun, 3 Dec 2017 11:14:37 +0200
achiad shochat <achiad.mellanox@gmail.com> wrote:

> On 3 December 2017 at 07:05, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:  
> >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:  
> >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:  
> >> > > Re. problem #2:
> >> > > Indeed the best way to address it seems to be to enslave the VF driver
> >> > > netdev under a persistent anchor netdev.
> >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
> >> > > netdev to work in conjunction.
> >> > > And it's indeed desired that this enslavement logic work out-of-the box.
> >> > > But in case of PV+VF some configurable policies must be in place (and
> >> > > they'd better be generic rather than differ per PV technology).
> >> > > For example - based on which characteristics should the PV+VF coupling
> >> > > be done? netvsc uses MAC address, but that might not always be the
> >> > > desire.  
> >> >
> >> > It's a policy but not guest userspace policy.
> >> >
> >> > The hypervisor certainly knows.
> >> >
> >> > Are you concerned that someone might want to create two devices with the
> >> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
> >> > flag in the virtio device to say "this is a backup, use MAC to find
> >> > another device".  
> >>
> >> This is something I was going to suggest: a flag or other configuration on
> >> the virtio device to help control how this new feature is used.  I can
> >> imagine this might be useful to control from either the hypervisor side or
> >> the VM side.
> >>
> >> The hypervisor might want to (1) disable it (force it off), (2) enable it
> >> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
> >> able to chose whether it wants to make use of the feature, or stick with the
> >> bonding solution.
> >>
> >> Either way, the kernel is making a feature available, and the user (VM or
> >> hypervisor) is able to control it by selecting the feature based on the
> >> policy desired.
> >>
> >> sln  
> >
> > I'm not sure what's the feature that is available here.
> >
> > I saw this as a flag that says "this device shares backend with another
> > network device which can be found using MAC, and that backend should be
> > preferred".  kernel then forces configuration which uses that other
> > backend - as long as it exists.
> >
> > However, please Cc virtio-dev mailing list if we are doing this since
> > this is a spec extension.
> >
> > --
> > MST  
> 
> 
> Can someone please explain why assume a virtio device is there at all??
> I specified a case where there isn't any.
> 
> I second Jacob - having a netdev of one device driver enslave a netdev
> of another device driver is an awkward a-symmetric model.
> Regardless of whether they share the same backend device.
> Only I am not sure the Linux Bond is the right choice.
> e.g one may well want to use the virtio device also when the
> pass-through device is available, e.g for multicasts, east-west
> traffic, etc.
> I'm not sure the Linux Bond fits that functionality.
> And, as I hear in this thread, it is hard to make it work out of the box.
> So I think the right thing would be to write a new dedicated module
> for this purpose.
> 
> Re policy -
> Indeed the HV can request a policy from the guest but that's not a
> claim for the virtio device enslaving the pass-through device.
> Any policy can be queried by the upper enslaving device.
> 
> Bottom line - I do not see a single reason to have the virtio netdev
> (nor netvsc or any other PV netdev) enslave another netdev by itself.
> If we'd do it right with netvsc from the beginning we wouldn't need
> this discussion at all...

There are several issues with transparent migration.
The first is that the SR-IOV device needs to be shut off earlier
in the migration process.
Next, the SR-IOV device in the migrated-to guest environment may be different.
It might not exist at all, it might be at a different PCI address, or it
could even be a different vendor/speed/model.
Keeping a virtual network device around allows connectivity to persist
during the process.


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-03 17:35             ` Stephen Hemminger
@ 2017-12-04  9:51               ` achiad shochat
  2017-12-04 16:30                 ` Alexander Duyck
  0 siblings, 1 reply; 25+ messages in thread
From: achiad shochat @ 2017-12-04  9:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael S. Tsirkin, Jakub Kicinski, Hannes Frederic Sowa,
	Sridhar Samudrala, netdev, virtualization, Achiad,
	Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On 3 December 2017 at 19:35, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Sun, 3 Dec 2017 11:14:37 +0200
> achiad shochat <achiad.mellanox@gmail.com> wrote:
>
>> On 3 December 2017 at 07:05, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
>> >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
>> >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
>> >> > > Re. problem #2:
>> >> > > Indeed the best way to address it seems to be to enslave the VF driver
>> >> > > netdev under a persistent anchor netdev.
>> >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
>> >> > > netdev to work in conjunction.
>> >> > > And it's indeed desired that this enslavement logic work out-of-the box.
>> >> > > But in case of PV+VF some configurable policies must be in place (and
>> >> > > they'd better be generic rather than differ per PV technology).
>> >> > > For example - based on which characteristics should the PV+VF coupling
>> >> > > be done? netvsc uses MAC address, but that might not always be the
>> >> > > desire.
>> >> >
>> >> > It's a policy but not guest userspace policy.
>> >> >
>> >> > The hypervisor certainly knows.
>> >> >
>> >> > Are you concerned that someone might want to create two devices with the
>> >> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
>> >> > flag in the virtio device to say "this is a backup, use MAC to find
>> >> > another device".
>> >>
>> >> This is something I was going to suggest: a flag or other configuration on
>> >> the virtio device to help control how this new feature is used.  I can
>> >> imagine this might be useful to control from either the hypervisor side or
>> >> the VM side.
>> >>
>> >> The hypervisor might want to (1) disable it (force it off), (2) enable it
>> >> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
>> >> able to chose whether it wants to make use of the feature, or stick with the
>> >> bonding solution.
>> >>
>> >> Either way, the kernel is making a feature available, and the user (VM or
>> >> hypervisor) is able to control it by selecting the feature based on the
>> >> policy desired.
>> >>
>> >> sln
>> >
>> > I'm not sure what's the feature that is available here.
>> >
>> > I saw this as a flag that says "this device shares backend with another
>> > network device which can be found using MAC, and that backend should be
>> > preferred".  kernel then forces configuration which uses that other
>> > backend - as long as it exists.
>> >
>> > However, please Cc virtio-dev mailing list if we are doing this since
>> > this is a spec extension.
>> >
>> > --
>> > MST
>>
>>
>> Can someone please explain why assume a virtio device is there at all??
>> I specified a case where there isn't any.
>>
>> I second Jacob - having a netdev of one device driver enslave a netdev
>> of another device driver is an awkward a-symmetric model.
>> Regardless of whether they share the same backend device.
>> Only I am not sure the Linux Bond is the right choice.
>> e.g one may well want to use the virtio device also when the
>> pass-through device is available, e.g for multicasts, east-west
>> traffic, etc.
>> I'm not sure the Linux Bond fits that functionality.
>> And, as I hear in this thread, it is hard to make it work out of the box.
>> So I think the right thing would be to write a new dedicated module
>> for this purpose.
>>
>> Re policy -
>> Indeed the HV can request a policy from the guest but that's not a
>> claim for the virtio device enslaving the pass-through device.
>> Any policy can be queried by the upper enslaving device.
>>
>> Bottom line - I do not see a single reason to have the virtio netdev
>> (nor netvsc or any other PV netdev) enslave another netdev by itself.
>> If we'd do it right with netvsc from the beginning we wouldn't need
>> this discussion at all...
>
> There are several issues with transparent migration.
> The first is that the SR-IOV device needs to be shut off for earlier
> in the migration process.

That's not a given.
It's due to the DMA and it should be solved anyway.
Please read my first reply in this thread.

> Next, the SR-IOV device in the migrated go guest environment maybe different.
> It might not exist at all, it might be at a different PCI address, or it
> could even be a different vendor/speed/model.
> Keeping a virtual network device around allows persisting the connectivity,
> during the process.

Right, but that virtual device must not relate to any para-virt
specific technology (not netvsc, nor virtio).
Again, it seems you did not read my first reply.


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-04  9:51               ` achiad shochat
@ 2017-12-04 16:30                 ` Alexander Duyck
  2017-12-05  9:59                   ` achiad shochat
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Duyck @ 2017-12-04 16:30 UTC (permalink / raw)
  To: achiad shochat
  Cc: Stephen Hemminger, Michael S. Tsirkin, Jakub Kicinski,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On Mon, Dec 4, 2017 at 1:51 AM, achiad shochat
<achiad.mellanox@gmail.com> wrote:
> On 3 December 2017 at 19:35, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Sun, 3 Dec 2017 11:14:37 +0200
>> achiad shochat <achiad.mellanox@gmail.com> wrote:
>>
>>> On 3 December 2017 at 07:05, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
>>> >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
>>> >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
>>> >> > > Re. problem #2:
>>> >> > > Indeed the best way to address it seems to be to enslave the VF driver
>>> >> > > netdev under a persistent anchor netdev.
>>> >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
>>> >> > > netdev to work in conjunction.
>>> >> > > And it's indeed desired that this enslavement logic work out-of-the box.
>>> >> > > But in case of PV+VF some configurable policies must be in place (and
>>> >> > > they'd better be generic rather than differ per PV technology).
>>> >> > > For example - based on which characteristics should the PV+VF coupling
>>> >> > > be done? netvsc uses MAC address, but that might not always be the
>>> >> > > desire.
>>> >> >
>>> >> > It's a policy but not guest userspace policy.
>>> >> >
>>> >> > The hypervisor certainly knows.
>>> >> >
>>> >> > Are you concerned that someone might want to create two devices with the
>>> >> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
>>> >> > flag in the virtio device to say "this is a backup, use MAC to find
>>> >> > another device".
>>> >>
>>> >> This is something I was going to suggest: a flag or other configuration on
>>> >> the virtio device to help control how this new feature is used.  I can
>>> >> imagine this might be useful to control from either the hypervisor side or
>>> >> the VM side.
>>> >>
>>> >> The hypervisor might want to (1) disable it (force it off), (2) enable it
>>> >> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
>>> >> able to chose whether it wants to make use of the feature, or stick with the
>>> >> bonding solution.
>>> >>
>>> >> Either way, the kernel is making a feature available, and the user (VM or
>>> >> hypervisor) is able to control it by selecting the feature based on the
>>> >> policy desired.
>>> >>
>>> >> sln
>>> >
>>> > I'm not sure what's the feature that is available here.
>>> >
>>> > I saw this as a flag that says "this device shares backend with another
>>> > network device which can be found using MAC, and that backend should be
>>> > preferred".  kernel then forces configuration which uses that other
>>> > backend - as long as it exists.
>>> >
>>> > However, please Cc virtio-dev mailing list if we are doing this since
>>> > this is a spec extension.
>>> >
>>> > --
>>> > MST
>>>
>>>
>>> Can someone please explain why assume a virtio device is there at all??
>>> I specified a case where there isn't any.

Migrating without any virtual device is going to be extremely
challenging, especially in any kind of virtualization setup where the
hosts are not homogeneous. By providing a virtio interface you can
guarantee that at least 1 network interface is available on any given
host, and then fail over to that as the least common denominator for
any migration.

>>> I second Jacob - having a netdev of one device driver enslave a netdev
>>> of another device driver is an awkward a-symmetric model.
>>> Regardless of whether they share the same backend device.
>>> Only I am not sure the Linux Bond is the right choice.
>>> e.g one may well want to use the virtio device also when the
>>> pass-through device is available, e.g for multicasts, east-west
>>> traffic, etc.
>>> I'm not sure the Linux Bond fits that functionality.
>>> And, as I hear in this thread, it is hard to make it work out of the box.
>>> So I think the right thing would be to write a new dedicated module
>>> for this purpose.

This part I can sort of agree with. What if we were to look at
providing a way to somehow advertise that the two devices were meant
to be bonded for virtualization purposes? For now let's call it a
"virt-bond". Basically we could look at providing a means for virtio
and VF drivers to advertise that they want this sort of bond. Then it
would just be a matter of providing some sort of side channel to
indicate where you want things like multicast/broadcast/east-west
traffic to go.
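
As a very rough sketch of what that side channel could carry, assume a
small per-traffic-class table that picks one of the two slaves.  All of
the names below (virt_bond_policy, virt_bond_pick, the traffic classes)
are hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical traffic classes the side channel could steer. */
enum virt_bond_tc {
	VB_TC_UNICAST,
	VB_TC_MULTICAST,
	VB_TC_BROADCAST,
	VB_TC_EAST_WEST,
	VB_TC_MAX,
};

enum virt_bond_slave {
	VB_SLAVE_PV, /* paravirtual device, e.g. virtio-net */
	VB_SLAVE_VF, /* passthrough VF */
};

/* One routing choice per traffic class, as delivered by the side channel. */
struct virt_bond_policy {
	enum virt_bond_slave route[VB_TC_MAX];
};

/*
 * Pick the slave for a given traffic class.  If the VF is gone (for
 * example during migration) everything falls back to the PV device.
 */
enum virt_bond_slave virt_bond_pick(const struct virt_bond_policy *p,
				    enum virt_bond_tc tc, bool vf_present)
{
	assert((unsigned int)tc < VB_TC_MAX);

	if (!vf_present)
		return VB_SLAVE_PV;
	return p->route[tc];
}
```

The table is the only policy input here; how it would actually reach
the guest is left open.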

>>> Re policy -
>>> Indeed the HV can request a policy from the guest but that's not a
>>> claim for the virtio device enslaving the pass-through device.
>>> Any policy can be queried by the upper enslaving device.
>>>
>>> Bottom line - I do not see a single reason to have the virtio netdev
>>> (nor netvsc or any other PV netdev) enslave another netdev by itself.
>>> If we'd do it right with netvsc from the beginning we wouldn't need
>>> this discussion at all...
>>
>> There are several issues with transparent migration.
>> The first is that the SR-IOV device needs to be shut off for earlier
>> in the migration process.
>
> That's not a given fact.
> It's due to the DMA and it should be solve anyway.
> Please read my first reply in this thread.

For now it is a fact. We would need to do a drastic rewrite of the DMA
API in the guest/host/QEMU/IOMMU in order to avoid it. So as a
first step I would say we should look at using this bonding type
solution. Being able to defer the VF eviction could be a next step for
all this as it would allow for much better performance, but we still
have too many cases where the VF might not be there after a migration.

>> Next, the SR-IOV device in the migrated go guest environment maybe different.
>> It might not exist at all, it might be at a different PCI address, or it
>> could even be a different vendor/speed/model.
>> Keeping a virtual network device around allows persisting the connectivity,
>> during the process.
>
> Right, but that virtual device must not relate to any para-virt
> specific technology (not netvsc, nor virtio).
> Again, it seems you did not read my first reply.

I would agree with the need to make this agnostic. Maybe we could look
at the current netvsc solution and find a way to make it generic so it
could be applied to any combination of paravirtual interface and VF.


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-04 16:30                 ` Alexander Duyck
@ 2017-12-05  9:59                   ` achiad shochat
  2017-12-05 19:20                     ` Michael S. Tsirkin
  2017-12-05 22:29                     ` Jakub Kicinski
  0 siblings, 2 replies; 25+ messages in thread
From: achiad shochat @ 2017-12-05  9:59 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Stephen Hemminger, Michael S. Tsirkin, Jakub Kicinski,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On 4 December 2017 at 18:30, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> On Mon, Dec 4, 2017 at 1:51 AM, achiad shochat
> <achiad.mellanox@gmail.com> wrote:
>> On 3 December 2017 at 19:35, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> On Sun, 3 Dec 2017 11:14:37 +0200
>>> achiad shochat <achiad.mellanox@gmail.com> wrote:
>>>
>>>> On 3 December 2017 at 07:05, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
>>>> >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
>>>> >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
>>>> >> > > Re. problem #2:
>>>> >> > > Indeed the best way to address it seems to be to enslave the VF driver
>>>> >> > > netdev under a persistent anchor netdev.
>>>> >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
>>>> >> > > netdev to work in conjunction.
>>>> >> > > And it's indeed desired that this enslavement logic work out-of-the box.
>>>> >> > > But in case of PV+VF some configurable policies must be in place (and
>>>> >> > > they'd better be generic rather than differ per PV technology).
>>>> >> > > For example - based on which characteristics should the PV+VF coupling
>>>> >> > > be done? netvsc uses MAC address, but that might not always be the
>>>> >> > > desire.
>>>> >> >
>>>> >> > It's a policy but not guest userspace policy.
>>>> >> >
>>>> >> > The hypervisor certainly knows.
>>>> >> >
>>>> >> > Are you concerned that someone might want to create two devices with the
>>>> >> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
>>>> >> > flag in the virtio device to say "this is a backup, use MAC to find
>>>> >> > another device".
>>>> >>
>>>> >> This is something I was going to suggest: a flag or other configuration on
>>>> >> the virtio device to help control how this new feature is used.  I can
>>>> >> imagine this might be useful to control from either the hypervisor side or
>>>> >> the VM side.
>>>> >>
>>>> >> The hypervisor might want to (1) disable it (force it off), (2) enable it
>>>> >> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
>>>> >> able to chose whether it wants to make use of the feature, or stick with the
>>>> >> bonding solution.
>>>> >>
>>>> >> Either way, the kernel is making a feature available, and the user (VM or
>>>> >> hypervisor) is able to control it by selecting the feature based on the
>>>> >> policy desired.
>>>> >>
>>>> >> sln
>>>> >
>>>> > I'm not sure what's the feature that is available here.
>>>> >
>>>> > I saw this as a flag that says "this device shares backend with another
>>>> > network device which can be found using MAC, and that backend should be
>>>> > preferred".  kernel then forces configuration which uses that other
>>>> > backend - as long as it exists.
>>>> >
>>>> > However, please Cc virtio-dev mailing list if we are doing this since
>>>> > this is a spec extension.
>>>> >
>>>> > --
>>>> > MST
>>>>
>>>>
>>>> Can someone please explain why assume a virtio device is there at all??
>>>> I specified a case where there isn't any.
>
> Migrating without any virtual device is going to be extremely
> challenging, especially in any kind of virtualization setup where the
> hosts are not homogeneous. By providing a virtio interface you can
> guarantee that at least 1 network interface is available on any given
> host, and then fail over to that as the least common denominator for
> any migration.
>

I am not sure why you think it is going to be so challenging.
Are you referring to preserving the pass-through device driver state
(RX/TX rings)?
I do not think we should preserve them; we can simply tear down the
whole VF netdev (since we have a parent netdev as the application
interface).
The downtime impact will be negligible.

>>>> I second Jacob - having a netdev of one device driver enslave a netdev
>>>> of another device driver is an awkward a-symmetric model.
>>>> Regardless of whether they share the same backend device.
>>>> Only I am not sure the Linux Bond is the right choice.
>>>> e.g one may well want to use the virtio device also when the
>>>> pass-through device is available, e.g for multicasts, east-west
>>>> traffic, etc.
>>>> I'm not sure the Linux Bond fits that functionality.
>>>> And, as I hear in this thread, it is hard to make it work out of the box.
>>>> So I think the right thing would be to write a new dedicated module
>>>> for this purpose.
>
> This part I can sort of agree with. What if we were to look at
> providing a way to somehow advertise that the two devices were meant
> to be boded for virtualization purposes? For now lets call it a
> "virt-bond". Basically we could look at providing a means for virtio
> and VF drivers to advertise that they want this sort of bond. Then it
> would just be a matter of providing some sort of side channel to
> indicate where you want things like multicast/broadcast/east-west
> traffic to go.
>

I like this approach.


>>>> Re policy -
>>>> Indeed the HV can request a policy from the guest but that's not a
>>>> claim for the virtio device enslaving the pass-through device.
>>>> Any policy can be queried by the upper enslaving device.
>>>>
>>>> Bottom line - I do not see a single reason to have the virtio netdev
>>>> (nor netvsc or any other PV netdev) enslave another netdev by itself.
>>>> If we'd do it right with netvsc from the beginning we wouldn't need
>>>> this discussion at all...
>>>
>>> There are several issues with transparent migration.
>>> The first is that the SR-IOV device needs to be shut off for earlier
>>> in the migration process.
>>
>> That's not a given fact.
>> It's due to the DMA and it should be solve anyway.
>> Please read my first reply in this thread.
>
> For now it is a fact. We would need to do a drastic rewrite of the DMA
> API in the guest/host/QEMU/IOMMU in order to avoid it for now. So as a
> first step I would say we should look at using this bonding type
> solution. Being able to defer the VF eviction could be a next step for
> all this as it would allow for much better performance, but we still
> have too many cases where the VF might not be there after a migration.
>

Why would we need such a drastic rewrite?
Why would a simple Read-DontModify-Write (to mark the page as dirty)
by the VF driver not do the job?
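
What I have in mind is roughly the following, as a plain C sketch and
not real driver code.  It assumes the hypervisor's dirty log sees CPU
stores but not device DMA writes, that the buffer is already owned by
the CPU again (the device is done writing to it), and a 4 KiB page
size.  The function name and the page-size constant are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/*
 * Read one byte per page and store the same value back.  The data is
 * unchanged, but the store is a CPU write, so the page gets logged as
 * dirty.  Returns the number of write-backs performed.
 */
size_t touch_pages_dirty(volatile uint8_t *buf, size_t len)
{
	size_t touched = 0;

	for (size_t off = 0; off < len; off += SKETCH_PAGE_SIZE) {
		uint8_t v = buf[off]; /* read */

		buf[off] = v;         /* write back, do not modify */
		touched++;
	}

	/* An unaligned buffer may end in a page the loop did not reach. */
	if (len > 0) {
		uint8_t v = buf[len - 1];

		buf[len - 1] = v;
		touched++;
	}
	return touched;
}
```

The volatile qualifier is there so the compiler does not drop the
apparently useless store.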

Anyway, if you have a generic virtual parent netdev, that can
be handled orthogonally.

>>> Next, the SR-IOV device in the migrated go guest environment maybe different.
>>> It might not exist at all, it might be at a different PCI address, or it
>>> could even be a different vendor/speed/model.
>>> Keeping a virtual network device around allows persisting the connectivity,
>>> during the process.
>>
>> Right, but that virtual device must not relate to any para-virt
>> specific technology (not netvsc, nor virtio).
>> Again, it seems you did not read my first reply.
>
> I would agree with the need to make this agnostic. Maybe we could look
> at the current netvsc solution and find a way to make it generic so it
> could be applied to any combination of paravirtual interface and PF.

Agreed. That should be the approach IMO.
Then we'll have a single solution for both netvsc and virtio (and any
other PV device).
And we could handle the VF DMA dirty-page issue agnostically.


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-05  9:59                   ` achiad shochat
@ 2017-12-05 19:20                     ` Michael S. Tsirkin
  2017-12-05 21:52                       ` Jesse Brandeburg
  2017-12-07  7:28                       ` achiad shochat
  2017-12-05 22:29                     ` Jakub Kicinski
  1 sibling, 2 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-12-05 19:20 UTC (permalink / raw)
  To: achiad shochat
  Cc: Alexander Duyck, Stephen Hemminger, Jakub Kicinski,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> Then we'll have a single solution for both netvsc and virtio (and any
> other PV device).
> And we could handle the VF DMA dirt issue agnostically.

For the record, I won't block patches adding this kist to virtio
on the basis that they must be generic. It's not a lot
of code, implementation can come first, prettify later.

But we do need to have a discussion about how devices are paired.
I am not sure using just the MAC works. E.g. some passthrough
devices don't give the host the ability to set the MAC.
Are these worth worrying about?

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-05 19:20                     ` Michael S. Tsirkin
@ 2017-12-05 21:52                       ` Jesse Brandeburg
  2017-12-05 22:05                         ` Michael S. Tsirkin
  2017-12-07  7:28                       ` achiad shochat
  1 sibling, 1 reply; 25+ messages in thread
From: Jesse Brandeburg @ 2017-12-05 21:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: achiad shochat, Jakub Kicinski, Hannes Frederic Sowa,
	Sridhar Samudrala, Alexander Duyck, virtualization,
	Shannon Nelson, Achiad, Peter Waskiewicz Jr, netdev,
	Anjali Singhai Jain, Andy Gospodarek, Or Gerlitz,
	jesse.brandeburg

On Tue, 5 Dec 2017 21:20:07 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> > Then we'll have a single solution for both netvsc and virtio (and any
> > other PV device).
> > And we could handle the VF DMA dirt issue agnostically.  
> 
> For the record, I won't block patches adding this kist to virtio
> on the basis that they must be generic. It's not a lot
> of code, implementation can come first, prettify later.

Thanks, based on this discussion we're going to work on improving
virtio-net first, but some of Achiad's points are good.  I don't believe
it should block the virtio work however.

In particular I'm really interested in figuring out how we can get to
the point that virtio is able to make or implement some smart decisions
about which NIC to pick for traffic delivery (its own paravirt path or
the passthrough device path). If Achiad wants to develop the idea into
some code, I'd be interested to review it.

> But we do need to have a discussion about how devices are paired.
> I am not sure using just MAC works. E.g. some passthrough
> devices don't give host ability to set the MAC.
> Are these worth worrying about?

I personally don't think that will be much of a problem. If a
certain device has that issue, can't we just have the virtio-net device
pick up the MAC address of the passthrough device? As long as they match
things should work OK. At the very least it is an initial way to do the
configuration that has some traction as workable, as proved by
the Microsoft design.

FWIW, the Intel SR-IOV devices all accept a hypervisor/host provided
MAC address.
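
As a rough illustration of MAC-based pairing, here is a small Python sketch. It is not the proposed driver logic: the `(name, driver, mac)` tuple layout, the driver strings and the helper names are all assumptions made for the example.

```python
def normalize_mac(mac):
    """Lower-case, colon-separated form so textual variants compare equal."""
    parts = mac.lower().replace("-", ":").split(":")
    return ":".join(part.zfill(2) for part in parts)


def pair_by_mac(netdevs):
    """Map each virtio-net name to the passthrough names sharing its MAC.

    `netdevs` is a list of (name, driver, mac) tuples (hypothetical layout).
    A MAC shared by more than one virtio-net device is treated as ambiguous
    and produces no pairing.
    """
    by_mac = {}
    for name, driver, mac in netdevs:
        by_mac.setdefault(normalize_mac(mac), []).append((name, driver))

    pairs = {}
    for members in by_mac.values():
        masters = [n for n, d in members if d == "virtio_net"]
        slaves = [n for n, d in members if d != "virtio_net"]
        if len(masters) == 1:
            pairs[masters[0]] = slaves
    return pairs


devices = [
    ("eth0", "virtio_net", "52:54:00:12:34:56"),
    ("eth1", "ixgbevf", "52:54:00:12:34:56"),
    ("eth2", "virtio_net", "52:54:00:AA:BB:CC"),
]
pairs = pair_by_mac(devices)  # eth0 gets eth1; eth2 has no VF to pair with
```

Whether the MAC alone is a sufficient key is exactly the open question in this part of the thread; the sketch only shows the matching step.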


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-05 21:52                       ` Jesse Brandeburg
@ 2017-12-05 22:05                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-12-05 22:05 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: achiad shochat, Jakub Kicinski, Hannes Frederic Sowa,
	Sridhar Samudrala, Alexander Duyck, virtualization,
	Shannon Nelson, Achiad, Peter Waskiewicz Jr, netdev,
	Anjali Singhai Jain, Andy Gospodarek, Or Gerlitz

On Tue, Dec 05, 2017 at 01:52:26PM -0800, Jesse Brandeburg wrote:
> On Tue, 5 Dec 2017 21:20:07 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> > > Then we'll have a single solution for both netvsc and virtio (and any
> > > other PV device).
> > > And we could handle the VF DMA dirt issue agnostically.  
> > 
> > For the record, I won't block patches adding this kist to virtio
> > on the basis that they must be generic. It's not a lot
> > of code, implementation can come first, prettify later.
> 
> Thanks, based on this discussion we're going to work on improving
> virtio-net first, but some of Achiad's points are good.  I don't believe
> it should block the virtio work however.
> 
> In particular I'm really interested in figuring out how we can get to
> the point that virtio is able to make or implement some smart decisions
> about which NIC to pick for traffic delivery (its own paravirt path or
> the passthrough device path). If Achiad wants to develop the idea into
> some code, I'd be interested to review it.
> 
> > But we do need to have a discussion about how devices are paired.
> > I am not sure using just MAC works. E.g. some passthrough
> > devices don't give host ability to set the MAC.
> > Are these worth worrying about?
> 
> I personally don't think that will be much of a problem, if a
> certain device has that issue, can't we just have the virtio-net device
> pick up the MAC address of the passthrough device?

Then what do you do after you have migrated to another box?
The PT device there likely has a different MAC.

> As long as they match
> things should work OK. It at least is an initial way to do the
> configuration that has at least some traction as workable, as proved by
> the Microsoft design.

Yes - that design just implements what people have been doing for years
using bonding, so of course it's workable.

> FWIW, the Intel SR-IOV devices all accept a hypervisor/host provided
> MAC address.

For VFs you often can program the MAC through the PF, but you typically
can't do this for PFs. Or as another example consider nested virt with a
VF passed through. The PF isn't there within the L1 guest, so it can't be
used to program the MAC of the VF.

Still, we can always start small and require the same MAC, then add other
ways to address issues later as we come up with them.

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-05 19:20                     ` Michael S. Tsirkin
  2017-12-05 21:52                       ` Jesse Brandeburg
@ 2017-12-07  7:28                       ` achiad shochat
  2017-12-07 16:45                         ` Alexander Duyck
  1 sibling, 1 reply; 25+ messages in thread
From: achiad shochat @ 2017-12-07  7:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, Stephen Hemminger, Jakub Kicinski,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On 5 December 2017 at 21:20, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
>> Then we'll have a single solution for both netvsc and virtio (and any
>> other PV device).
>> And we could handle the VF DMA dirt issue agnostically.
>
> For the record, I won't block patches adding this kist to virtio
> on the basis that they must be generic. It's not a lot
> of code, implementation can come first, prettify later.

It's not a lot of code either way.
So I fail to understand why not to do it right from the beginning.
For the record...

>
> But we do need to have a discussion about how devices are paired.
> I am not sure using just MAC works. E.g. some passthrough
> devices don't give host ability to set the MAC.
> Are these worth worrying about?
>
> --
> MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-07  7:28                       ` achiad shochat
@ 2017-12-07 16:45                         ` Alexander Duyck
  2017-12-07 16:53                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Duyck @ 2017-12-07 16:45 UTC (permalink / raw)
  To: achiad shochat
  Cc: Michael S. Tsirkin, Stephen Hemminger, Jakub Kicinski,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On Wed, Dec 6, 2017 at 11:28 PM, achiad shochat
<achiad.mellanox@gmail.com> wrote:
> On 5 December 2017 at 21:20, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
>>> Then we'll have a single solution for both netvsc and virtio (and any
>>> other PV device).
>>> And we could handle the VF DMA dirt issue agnostically.
>>
>> For the record, I won't block patches adding this kist to virtio
>> on the basis that they must be generic. It's not a lot
>> of code, implementation can come first, prettify later.
>
> It's not a lot of code either way.
> So I fail to understand why not to do it right from the beginning.
> For the record...

What isn't a lot of code? If you are talking about the DMA dirtying
then I would have to disagree. The big problem with the DMA is that we
have to mark a page as dirty and non-migratable as soon as it is
mapped for Rx DMA. Only once the driver has unmapped the page, or the
device has been disabled, can we allow the page to be migrated as a
dirty page. That ends up being the way we have to support this if we
don't have the bonding solution.

With the bonding solution we could look at doing a lightweight DMA
dirtying which would just require flagging pages as dirty after an
unmap or sync call is performed. However it requires that we shut down
the driver/device before we can complete the migration which means we
have to have the paravirtualized fail-over approach.
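
To contrast the two schemes, here is a toy Python model. It is a sketch only: `RxDmaTracker`, its method names and the two mode strings are invented for illustration and do not correspond to the kernel DMA API.

```python
PAGE_SIZE = 4096


class RxDmaTracker:
    """Toy bookkeeping for Rx DMA pages during migration.

    mode="pin":      a page is dirty and non-migratable from map until
                     unmap (the scheme needed without a fail-over device).
    mode="on_unmap": a page is only flagged dirty at unmap/sync time (the
                     lightweight scheme; assumes the VF is shut down before
                     the migration completes).
    """

    def __init__(self, mode):
        if mode not in ("pin", "on_unmap"):
            raise ValueError(mode)
        self.mode = mode
        self.mapped = set()
        self.dirty = set()

    def map_rx(self, addr):
        page = addr // PAGE_SIZE
        self.mapped.add(page)
        if self.mode == "pin":
            self.dirty.add(page)

    def sync_or_unmap(self, addr, unmap=True):
        page = addr // PAGE_SIZE
        self.dirty.add(page)  # the device may have written to this page
        if unmap:
            self.mapped.discard(page)

    def non_migratable(self):
        # Only the pinning scheme has to hold pages back while mapped.
        return set(self.mapped) if self.mode == "pin" else set()

    def can_complete(self, device_stopped):
        if self.mode == "on_unmap":
            return device_stopped  # requires the device to be quiesced
        return device_stopped or not self.mapped


pin = RxDmaTracker("pin")
lazy = RxDmaTracker("on_unmap")
for tracker in (pin, lazy):
    tracker.map_rx(0x3000)
held_pin = pin.non_migratable()    # page 3 is held back while mapped
held_lazy = lazy.non_migratable()  # nothing is held back
lazy.sync_or_unmap(0x3000)         # page 3 becomes dirty only now
```

The trade-off shows up in `can_complete`: the lightweight mode tracks far less but cannot finish until the device is stopped, which is where the paravirtual fail-over path comes in.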

As far as indicating that the interfaces are meant to be enslaved I
wonder if we couldn't look at tweaking the PCI layout of the guest and
use that to indicate that a given set of interfaces are meant to be
bonded. For example the VFs are all meant to work as a part of a
multi-function device. What if we were to make virtio-net function 0
of a PCI/PCIe device, and then place any direct assigned VFs that are
meant to be a part of the bond in functions 1-7 of the device? Then it
isn't too far off from the model we have on the host where if the VF
goes away we would expect to see the traffic on the PF that is usually
occupying function 0 of a given device.
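
A minimal sketch of how a guest might derive the grouping from such a layout, written in Python for brevity. The `devices` mapping, the driver strings and the function names are assumptions for illustration; only the `dddd:bb:ss.f` address format is standard.

```python
import re
from collections import defaultdict

_BDF = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def parse_bdf(addr):
    """Split 'dddd:bb:ss.f' into (domain, bus, slot, function) integers."""
    m = _BDF.match(addr.lower())
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    domain, bus, slot, func = m.groups()
    return int(domain, 16), int(bus, 16), int(slot, 16), int(func)


def group_by_slot(devices):
    """Return {virtio_name: [vf_names]} for slots whose function 0 is virtio-net.

    `devices` maps an interface name to (pci_address, driver); this shape
    is hypothetical and chosen only for the example.
    """
    slots = defaultdict(dict)
    for name, (addr, driver) in devices.items():
        domain, bus, slot, func = parse_bdf(addr)
        slots[(domain, bus, slot)][func] = (name, driver)

    bonds = {}
    for funcs in slots.values():
        primary = funcs.get(0)
        if primary is None or primary[1] != "virtio_net":
            continue  # no virtio-net at function 0: leave the slot alone
        bonds[primary[0]] = [funcs[f][0] for f in sorted(funcs) if f != 0]
    return bonds


guest = {
    "eth0": ("0000:00:05.0", "virtio_net"),
    "eth1": ("0000:00:05.1", "ixgbevf"),
    "eth2": ("0000:00:06.0", "ixgbevf"),
}
bonds = group_by_slot(guest)  # eth1 joins eth0; eth2 stays standalone
```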


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-07 16:45                         ` Alexander Duyck
@ 2017-12-07 16:53                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-12-07 16:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Hannes Frederic Sowa, achiad shochat,
	Sridhar Samudrala, virtualization, Shannon Nelson, Achiad,
	Peter Waskiewicz Jr, netdev, Singhai, Anjali, Andy Gospodarek,
	Or Gerlitz

On Thu, Dec 07, 2017 at 08:45:33AM -0800, Alexander Duyck wrote:
> As far as indicating that the interfaces are meant to be enslaved I
> wonder if we couldn't look at tweaking the PCI layout of the guest and
> use that to indicate that a given set of interfaces are meant to be
> bonded. For example the VFs are all meant to work as a part of a
> multi-function device. What if we were to make virtio-net function 0
> of a PCI/PCIe device, and then place any direct assigned VFs that are
> meant to be a part of the bond in functions 1-7 of the device? Then it
> isn't too far off from the model we have on the host where if the VF
> goes away we would expect to see the traffic on the PF that is usually
> occupying function 0 of a given device.

This pretty much precludes removing them with hotplug.

But as long as we are happy to limit this to PCI devices,
maybe we should put them behind a PCI bridge.

All devices behind a bridge would be assumed to have
the same backend.

QEMU has pci-bridge-seat which we could reuse for this -
we just need to build a similar PCI Express bridge.

-- 
MST


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-05  9:59                   ` achiad shochat
  2017-12-05 19:20                     ` Michael S. Tsirkin
@ 2017-12-05 22:29                     ` Jakub Kicinski
  2017-12-05 22:41                       ` Stephen Hemminger
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2017-12-05 22:29 UTC (permalink / raw)
  To: achiad shochat
  Cc: Alexander Duyck, Stephen Hemminger, Michael S. Tsirkin,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On Tue, 5 Dec 2017 11:59:17 +0200, achiad shochat wrote:
> >>>> I second Jacob - having a netdev of one device driver enslave a netdev
> >>>> of another device driver is an awkward a-symmetric model.
> >>>> Regardless of whether they share the same backend device.
> >>>> Only I am not sure the Linux Bond is the right choice.
> >>>> e.g one may well want to use the virtio device also when the
> >>>> pass-through device is available, e.g for multicasts, east-west
> >>>> traffic, etc.
> >>>> I'm not sure the Linux Bond fits that functionality.
> >>>> And, as I hear in this thread, it is hard to make it work out of the box.
> >>>> So I think the right thing would be to write a new dedicated module
> >>>> for this purpose.  
> >
> > This part I can sort of agree with. What if we were to look at
> > providing a way to somehow advertise that the two devices were meant
> > to be bonded for virtualization purposes? For now let's call it a
> > "virt-bond". Basically we could look at providing a means for virtio
> > and VF drivers to advertise that they want this sort of bond. Then it
> > would just be a matter of providing some sort of side channel to
> > indicate where you want things like multicast/broadcast/east-west
> > traffic to go.
> 
> I like this approach.

+1 on a separate driver, just enslaving devices to virtio may break
existing setups.  If people are bonding from user space today, if they
update their kernel it may surprise them how things get auto-mangled.

Is what Alex is suggesting a separate PV device that says "I would
like to be a bond of those two interfaces"?  That would make the HV
intent explicit and kernel decisions more understandable.


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-12-05 22:29                     ` Jakub Kicinski
@ 2017-12-05 22:41                       ` Stephen Hemminger
  0 siblings, 0 replies; 25+ messages in thread
From: Stephen Hemminger @ 2017-12-05 22:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: achiad shochat, Alexander Duyck, Michael S. Tsirkin,
	Hannes Frederic Sowa, Sridhar Samudrala, netdev, virtualization,
	Achiad, Peter Waskiewicz Jr, Singhai, Anjali, Shannon Nelson,
	Andy Gospodarek, Or Gerlitz

On Tue, 5 Dec 2017 14:29:28 -0800
Jakub Kicinski <kubakici@wp.pl> wrote:

> On Tue, 5 Dec 2017 11:59:17 +0200, achiad shochat wrote:
> > >>>> I second Jacob - having a netdev of one device driver enslave a netdev
> > >>>> of another device driver is an awkward a-symmetric model.
> > >>>> Regardless of whether they share the same backend device.
> > >>>> Only I am not sure the Linux Bond is the right choice.
> > >>>> e.g one may well want to use the virtio device also when the
> > >>>> pass-through device is available, e.g for multicasts, east-west
> > >>>> traffic, etc.
> > >>>> I'm not sure the Linux Bond fits that functionality.
> > >>>> And, as I hear in this thread, it is hard to make it work out of the box.
> > >>>> So I think the right thing would be to write a new dedicated module
> > >>>> for this purpose.    
> > >
> > > This part I can sort of agree with. What if we were to look at
> > > providing a way to somehow advertise that the two devices were meant
> > > to be bonded for virtualization purposes? For now let's call it a
> > > "virt-bond". Basically we could look at providing a means for virtio
> > > and VF drivers to advertise that they want this sort of bond. Then it
> > > would just be a matter of providing some sort of side channel to
> > > indicate where you want things like multicast/broadcast/east-west
> > > traffic to go.  
> > 
> > I like this approach.  
> 
> +1 on a separate driver, just enslaving devices to virtio may break
> existing setups.  If people are bonding from user space today, if they
> update their kernel it may surprise them how things get auto-mangled.
> 
> Is what Alex is suggesting a separate PV device that says "I would
> like to be a bond of those two interfaces"?  That would make the HV
> intent explicit and kernel decisions more understandable.

So far, in my experience it still works.
As long as the kernel slaving happens first, it will work.
The attempt to bond an already slaved device will fail and no scripts seem
to check the error return.
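
The ordering argument can be sketched with a toy Python model. `NetDev` and `enslave` are made-up names, and the choice of EBUSY as the error code is an assumption of the model rather than a statement about what any particular kernel path returns.

```python
import errno


class NetDev:
    """Toy netdev that can have at most one master."""

    def __init__(self, name):
        self.name = name
        self.master = None


def enslave(master, slave):
    """Attach `slave` to `master`; return 0 or a negative errno."""
    if slave.master is not None:
        return -errno.EBUSY  # already slaved: refuse the second master
    slave.master = master
    return 0


virtio = NetDev("eth0")
vf = NetDev("eth1")
bond0 = NetDev("bond0")

kernel_rc = enslave(virtio, vf)  # in-kernel pairing happens first
script_rc = enslave(bond0, vf)   # later user-space bonding attempt fails
```

If a script ignores `script_rc`, the VF simply stays under the first master, which matches the behaviour described above.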


* Re: [RFC] virtio-net: help live migrate SR-IOV devices
  2017-11-30  3:29 ` [RFC] virtio-net: help live migrate SR-IOV devices Jason Wang
  2017-11-30  3:51   ` Jakub Kicinski
  2017-11-30  8:08   ` achiad shochat
@ 2017-11-30 14:14   ` Michael S. Tsirkin
  2 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2017-11-30 14:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesse Brandeburg, virtualization, Jakub Kicinski,
	Sridhar Samudrala, Achiad, Peter Waskiewicz Jr, Singhai, Anjali,
	Andy Gospodarek, Or Gerlitz, Hannes Frederic Sowa, netdev

On Thu, Nov 30, 2017 at 11:29:56AM +0800, Jason Wang wrote:
> If my understanding is correct there's no need for any extension of
> the virtio spec.

There appears to be a concern that some existing configurations
might use the same MAC for an unrelated reason. Not sure what
that could be, but for sure, we could add a feature flag.
That needs to be approved by the virtio TC, but it's
just a single line in the spec, no big deal; we can help here.
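
A small Python sketch of how such a safeguard would gate the behaviour. `HYPOTHETICAL_F_VF_PAIRING` and its bit number are invented for this example and are not from the virtio spec; the other two bit positions are the existing `VIRTIO_NET_F_MAC` and `VIRTIO_F_VERSION_1`.

```python
VIRTIO_NET_F_MAC = 1 << 5     # device has a given MAC address
VIRTIO_F_VERSION_1 = 1 << 32  # device complies with virtio 1.0

# Made-up bit for illustration only; a real one would be assigned by the TC.
HYPOTHETICAL_F_VF_PAIRING = 1 << 40


def negotiate(device_features, driver_features):
    """A feature is active only if both sides offer it."""
    return device_features & driver_features


def pairing_enabled(negotiated):
    return bool(negotiated & HYPOTHETICAL_F_VF_PAIRING)


guest_driver = VIRTIO_NET_F_MAC | VIRTIO_F_VERSION_1 | HYPOTHETICAL_F_VF_PAIRING
old_host = VIRTIO_NET_F_MAC | VIRTIO_F_VERSION_1
new_host = old_host | HYPOTHETICAL_F_VF_PAIRING

on_old_host = pairing_enabled(negotiate(old_host, guest_driver))
on_new_host = pairing_enabled(negotiate(new_host, guest_driver))
```

With a check like this, an existing configuration that happens to reuse a MAC is left alone unless the hypervisor explicitly offers the bit.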

> If this is true, maybe you can start to prepare the patch?

Yes, please do. We can add a safeguard of a feature bit on top.

-- 
MST


end of thread, other threads:[~2017-12-07 16:53 UTC | newest]

Thread overview: 25+ messages
     [not found] <20171128112722.00003716@intel.com>
2017-11-30  3:29 ` [RFC] virtio-net: help live migrate SR-IOV devices Jason Wang
2017-11-30  3:51   ` Jakub Kicinski
2017-11-30  4:10     ` Stephen Hemminger
2017-11-30  4:21       ` Jakub Kicinski
2017-11-30 13:54     ` Michael S. Tsirkin
2017-11-30 20:48       ` Jakub Kicinski
2017-12-01  5:13         ` Michael S. Tsirkin
2017-11-30  8:08   ` achiad shochat
2017-11-30 14:11     ` Michael S. Tsirkin
2017-12-01 20:08       ` Shannon Nelson
2017-12-03  5:05         ` Michael S. Tsirkin
2017-12-03  9:14           ` achiad shochat
2017-12-03 17:35             ` Stephen Hemminger
2017-12-04  9:51               ` achiad shochat
2017-12-04 16:30                 ` Alexander Duyck
2017-12-05  9:59                   ` achiad shochat
2017-12-05 19:20                     ` Michael S. Tsirkin
2017-12-05 21:52                       ` Jesse Brandeburg
2017-12-05 22:05                         ` Michael S. Tsirkin
2017-12-07  7:28                       ` achiad shochat
2017-12-07 16:45                         ` Alexander Duyck
2017-12-07 16:53                           ` Michael S. Tsirkin
2017-12-05 22:29                     ` Jakub Kicinski
2017-12-05 22:41                       ` Stephen Hemminger
2017-11-30 14:14   ` Michael S. Tsirkin
