Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: Jiri Pirko <jiri@resnulli.us>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "Duyck, Alexander H" <alexander.h.duyck@intel.com>,
	virtio-dev@lists.oasis-open.org,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jakub Kicinski <kubakici@wp.pl>,
	"Samudrala, Sridhar" <sridhar.samudrala@intel.com>,
	virtualization@lists.linux-foundation.org,
	Siwei Liu <loseweigh@gmail.com>, Netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
Date: Wed, 21 Feb 2018 20:38:32 +0100	[thread overview]
Message-ID: <20180221193832.GE1996@nanopsycho> (raw)
In-Reply-To: <CAKgT0UeZS_SzNfiMX3ujy+a9WZWySVdyTpyQ=b=RJJrAG3JO=w@mail.gmail.com>

Wed, Feb 21, 2018 at 06:56:35PM CET, alexander.duyck@gmail.com wrote:
>On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.duyck@gmail.com wrote:
>>>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.duyck@gmail.com wrote:
>>>>>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>> Tue, Feb 20, 2018 at 11:33:56PM CET, kubakici@wp.pl wrote:
>>>>>>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>>>>>>>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>>>>>>>> stuck with this ugly thing forever...
>>>>>>>>
>>>>>>>> Could you at least make some common code that is shared in between
>>>>>>>> netvsc and virtio_net so this is handled in exacly the same way in both?
>>>>>>>
>>>>>>>IMHO netvsc is a vendor specific driver which made a mistake on what
>>>>>>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>>>>>>Let's not make a far, far more commonly deployed and important driver
>>>>>>>(virtio) bug-compatible with netvsc.
>>>>>>
>>>>>> Yeah. netvsc solution is a dangerous precedent here and in my opinition
>>>>>> it was a huge mistake to merge it. I personally would vote to unmerge it
>>>>>> and make the solution based on team/bond.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>To Jiri's initial comments, I feel the same way, in fact I've talked to
>>>>>>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>>>>>>user space.  I think it may very well get done in next versions of NM,
>>>>>>>but isn't done yet.  Stephen also raised the point that not everybody is
>>>>>>>using NM.
>>>>>>
>>>>>> Can be done in NM, networkd or other network management tools.
>>>>>> Even easier to do this in teamd and let them all benefit.
>>>>>>
>>>>>> Actually, I took a stab to implement this in teamd. Took me like an hour
>>>>>> and half.
>>>>>>
>>>>>> You can just run teamd with config option "kidnap" like this:
>>>>>> # teamd/teamd -c '{"kidnap": true }'
>>>>>>
>>>>>> Whenever teamd sees another netdev to appear with the same mac as his,
>>>>>> or whenever teamd sees another netdev to change mac to his,
>>>>>> it enslaves it.
>>>>>>
>>>>>> Here's the patch (quick and dirty):
>>>>>>
>>>>>> Subject: [patch teamd] teamd: introduce kidnap feature
>>>>>>
>>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>>So this doesn't really address the original problem we were trying to
>>>>>solve. You asked earlier why the netdev name mattered and it mostly
>>>>>has to do with configuration. Specifically what our patch is
>>>>>attempting to resolve is the issue of how to allow a cloud provider to
>>>>>upgrade their customer to SR-IOV support and live migration without
>>>>>requiring them to reconfigure their guest. So the general idea with
>>>>>our patch is to take a VM that is running with virtio_net only and
>>>>>allow it to instead spawn a virtio_bypass master using the same netdev
>>>>>name as the original virtio, and then have the virtio_net and VF come
>>>>>up and be enslaved by the bypass interface. Doing it this way we can
>>>>>allow for multi-vendor SR-IOV live migration support using a guest
>>>>>that was originally configured for virtio only.
>>>>>
>>>>>The problem with your solution is we already have teaming and bonding
>>>>>as you said. There is already a write-up from Red Hat on how to do it
>>>>>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>>>>>That is all well and good as long as you are willing to keep around
>>>>>two VM images, one for virtio, and one for SR-IOV with live migration.
>>>>
>>>> You don't need 2 images. You need only one. The one with the team setup.
>>>> That's it. If another netdev with the same mac appears, teamd will
>>>> enslave it and run traffic on it. If not, ok, you'll go only through
>>>> virtio_net.
>>>
>>>Isn't that going to cause the routing table to get messed up when we
>>>rearrange the netdevs? We don't want to have an significant disruption
>>> in traffic when we are adding/removing the VF. It seems like we would
>>>need to invalidate any entries that were configured for the virtio_net
>>>and reestablish them on the new team interface. Part of the criteria
>>>we have been working with is that we should be able to transition from
>>>having a VF to not or vice versa without seeing any significant
>>>disruption in the traffic.
>>
>> What? You have routes on the team netdev. virtio_net and VF are only
>> slaves. What are you talking about? I don't get it :/
>
>So lets walk though this by example. The general idea of the base case
>for all this is somebody starting with virtio_net, we will call the
>interface "ens1" for now. It comes up and is assigned a dhcp address
>and everything works as expected. Now in order to get better
>performance we want to add a VF "ens2", but we don't want a new IP
>address. Now if I understand correctly what will happen is that when
>"ens2" appears on the system teamd will then create a new team
>interface "team0". Before teamd can enslave ens1 it has to down the

No, you don't understand that correctly.

There is always ens1 and team0. ens1 is a slave of team0. team0 is the
interface to use, to set ip on etc.

When ens2 appears, it gets enslaved to team0 as well.


>interface if I understand things correctly. This means that we have to
>disrupt network traffic in order for this to work.
>
>To give you an idea of where we were before this became about trying
>to do this in the team or bonding driver, we were debating a 2 netdev
>model versus a 3 netdev model. I will call out the model and the
>advantages/disadvantages of those below.
>
>2 Netdev model, "ens1", enslaves "ens2".
>- Requires dropping in-driver XDP in order to work (won't capture VF
>traffic otherwise)
>- VF takes performance hit for extra qdisc/Tx queue lock of virtio_net interface
>- If you ass-u-me (I haven't been a fan of this model if you can't
>tell) that it is okay to rip out in-driver XDP from virtio_net, then
>you could transition between base virtio, virtio w/ backup bit set.
>- Works for netvsc because they limit their features (no in-driver
>XDP) to guarantee this works.
>
>3 Netdev model, "ens1", enslaves "ens1nbackup" and "ens2"
>- Exposes 2 netdevs "ens1" and "ens1nbackup" when only virtio is present
>- No extra qdisc or locking
>- All virtio_net original functionality still present
>- Not able to transition from virtio to virtio w/ backup without
>disruption (requires hot-plug)
>
>The way I see it the only way your team setup could work would be
>something closer to the 3 netdev model. Basically we would be
>requiring the user to always have the team0 present in order to make
>certain that anything like XDP would be run on the team interface
>instead of assuming that the virtio_net could run by itself. I will
>add it as a third option here to compare to the other 2.

Yes.


>
>3 Netdev "team" model, "team0", enslaves "ens1" and "ens2"
>- Requires guest to configure teamd
>- Exposes "team0" and "ens1" when only virtio is present
>- No extra qdisc or locking
>- Doesn't require "backup" bit in virtio
>
>>>
>>>Also how does this handle any static configuration? I am assuming that
>>>everything here assumes the team will be brought up as soon as it is
>>>seen and assigned a DHCP address.
>>
>> Again. You configure whatever you need on the team netdev.
>
>Just so we are clear, are you then saying that the team0 interface
>will always be present with this configuration? You had made it sound

Of course.


>like it would disappear if you didn't have at least 2 interfaces.

Where did I make it sound like that? No.


>
>>>
>>>The solution as you have proposed seems problematic at best. I don't
>>>see how the team solution works without introducing some sort of
>>>traffic disruption to either add/remove the VF and bring up/tear down
>>>the team interface. At that point we might as well just give up on
>>>this piece of live migration support entirely since the disruption was
>>>what we were trying to avoid. We might as well just hotplug out the VF
>>>and hotplug in a virtio at the same bus device and function number and
>>>just let udev take care of renaming it for us. The idea was supposed
>>>to be a seamless transition between the two interfaces.
>>
>> Alex. What you are trying to do in this patchset and what netvsc does it
>> essentialy in-driver bonding. Same thing mechanism, rx_handler,
>> everything. I don't really understand what are you talking about. With
>> use of team you will get exactly the same behaviour.
>
>So the goal of the "in-driver bonding" is to make the bonding as
>non-intrusive as possible and require as little user intervention as
>possible. I agree that much of the handling is the same, however the
>control structure and requirements are significantly different. That
>has been what I have been trying to explain. You keep wanting to use
>the existing structures, but they don't really apply cleanly because
>they push control for the interface up into the guest, and that
>doesn't make much sense in the case of virtualization. What is
>happening here is that we are exposing a bond that the guest should
>have no control over, or at least as little as possible. In addition
>making the user have to add additional configuration in the guest
>means that there is that much more that can go wrong if they screw it
>up.
>
>The other problem here is that the transition needs to be as seamless
>as possible between just a standard virtio_net setup and this new
>setup. With either the team or bonding setup you end up essentially
>forcing the guest to have the bond/team always there even if they are
>running only a single interface. Only if they "upgrade" the VM by
>adding a VF then it finally gets to do anything.

Yeah. There is certainly a dilemma. We have to choose between
1) weird and hackish in-driver semi-bonding that would be simple
   for user.
2) the standard way that would be perhaps slighly more complicated
   for user.

>
>What this comes down to for us is the following requirements:
>1. The name of the interface cannot change when going from virtio_net,
>to virtio_net being bypassed using a VF. We cannot create an interface
>on top of the interface, if anything we need to push the original
>virtio_net out of the way so that the new team interface takes its
>place in the configuration of the system. Otherwise a VM with VF w/
>live migration will require a different configuration than one that
>just runs virtio_net.

Team driver netdev is still the same, no name changes.


>2. We need some way to signal if this VM should be running in an
>"upgraded" mode or not. We have been using the backup bit in
>virtio_net to do that. If it isn't "upgraded" then we don't need the
>team/bond and we can just run with virtio_net.

I don't see why the team cannot be there always.


>3. We cannot introduce any downtime on the interface when adding a VF
>or removing it. The link must stay up the entire time and be able to
>handle packets.

Sure. That should be handled by the team. Whenever the VF netdev
disappears, traffic would switch over to the virtio_net. The benefit of
your in-driver bonding solution is that qemu can actually signal the
guest driver that the disappearance would happen and do the switch a bit
earlier. But that is something that might be implemented in a different
channel where the kernel might get notification that certain pci is
going to disappear so everyone could prepare. Just an idea.

next prev parent reply	other threads:[~2018-02-21 19:38 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-16 18:11 [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device Sridhar Samudrala
2018-02-16 18:11 ` [RFC PATCH v3 1/3] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit Sridhar Samudrala
2018-02-16 18:11 ` [RFC PATCH v3 2/3] virtio_net: Extend virtio to use VF datapath when available Sridhar Samudrala
2018-02-17  3:04   ` Jakub Kicinski
2018-02-17 17:41     ` Alexander Duyck
2018-02-16 18:11 ` [RFC PATCH v3 3/3] virtio_net: Enable alternate datapath without creating an additional netdev Sridhar Samudrala
2018-02-17  2:38 ` [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device Jakub Kicinski
2018-02-17 17:12   ` Alexander Duyck
2018-02-19  6:11     ` Jakub Kicinski
2018-02-20 16:26       ` Samudrala, Sridhar
2018-02-21 23:50     ` Siwei Liu
2018-02-22  0:17       ` Alexander Duyck
2018-02-22  1:59         ` Siwei Liu
2018-02-22  2:35           ` Samudrala, Sridhar
2018-02-22  3:28             ` Samudrala, Sridhar
2018-02-23 22:22             ` Siwei Liu
2018-02-23 22:38               ` Jiri Pirko
2018-02-24  0:17                 ` Siwei Liu
2018-02-24  0:03         ` Stephen Hemminger
2018-02-25 22:17           ` Alexander Duyck
2018-02-20 10:42 ` Jiri Pirko
2018-02-20 16:04   ` Alexander Duyck
2018-02-20 16:29     ` Jiri Pirko
2018-02-20 17:14       ` Samudrala, Sridhar
2018-02-20 20:14         ` Jiri Pirko
2018-02-20 21:02           ` Alexander Duyck
2018-02-20 22:33           ` Jakub Kicinski
2018-02-21  9:51             ` Jiri Pirko
2018-02-21 15:56               ` Alexander Duyck
2018-02-21 16:11                 ` Jiri Pirko
2018-02-21 16:49                   ` Alexander Duyck
2018-02-21 16:58                     ` Jiri Pirko
2018-02-21 17:56                       ` Alexander Duyck
2018-02-21 19:38                         ` Jiri Pirko [this message]
2018-02-21 20:57                           ` Alexander Duyck
2018-02-22  2:02                             ` Jakub Kicinski
2018-02-22  2:15                               ` Samudrala, Sridhar
2018-02-22  8:11                             ` Jiri Pirko
2018-02-22 11:54                               ` Or Gerlitz
2018-02-22 13:07                                 ` Jiri Pirko
2018-02-22 15:30                                   ` Alexander Duyck
2018-02-22 21:30                               ` Alexander Duyck
2018-02-23 23:59                                 ` Stephen Hemminger
2018-02-25 22:21                                   ` Alexander Duyck
2018-02-26  7:19                                   ` Jiri Pirko
2018-02-27  1:02                                     ` Stephen Hemminger
2018-02-27  1:18                                       ` Michael S. Tsirkin
2018-02-27  8:27                                         ` Jiri Pirko
2018-02-20 17:23       ` Alexander Duyck
2018-02-20 19:53         ` Jiri Pirko
2018-02-27  8:49     ` Jiri Pirko
2018-02-27 21:16       ` Alexander Duyck
2018-02-27 21:23         ` Michael S. Tsirkin
2018-02-27 21:41         ` Jakub Kicinski
2018-02-28  7:08           ` Jiri Pirko
2018-02-28 14:32             ` Michael S. Tsirkin
2018-02-28 15:11               ` Jiri Pirko
2018-02-28 15:45                 ` Michael S. Tsirkin
2018-02-28 19:25                   ` Jiri Pirko
2018-02-28 20:48                     ` Michael S. Tsirkin
2018-02-27 21:30       ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180221193832.GE1996@nanopsycho \
    --to=jiri@resnulli.us \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=davem@davemloft.net \
    --cc=kubakici@wp.pl \
    --cc=loseweigh@gmail.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sridhar.samudrala@intel.com \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox