From: Jiri Pirko <jiri@resnulli.us>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "Duyck, Alexander H" <alexander.h.duyck@intel.com>,
virtio-dev@lists.oasis-open.org,
"Michael S. Tsirkin" <mst@redhat.com>,
Jakub Kicinski <kubakici@wp.pl>,
Sridhar Samudrala <sridhar.samudrala@intel.com>,
virtualization@lists.linux-foundation.org,
Siwei Liu <loseweigh@gmail.com>, Netdev <netdev@vger.kernel.org>,
David Miller <davem@davemloft.net>
Subject: Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
Date: Tue, 27 Feb 2018 09:49:59 +0100
Message-ID: <20180227084959.GB2005@nanopsycho>
In-Reply-To: <CAKgT0UdU8PDXduzxp4kKfur-DLeFQSJj7-fhW_eTgVzd+AcViw@mail.gmail.com>
Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.duyck@gmail.com wrote:
>On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudrala@intel.com wrote:
>>>Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be
>>>used by hypervisor to indicate that virtio_net interface should act as
>>>a backup for another device with the same MAC address.
>>>
>>>Patch 2 is in response to the community request for a 3 netdev
>>>solution. However, it creates some issues we'll get into in a moment.
>>>It extends virtio_net to use alternate datapath when available and
>>>registered. When BACKUP feature is enabled, virtio_net driver creates
>>>an additional 'bypass' netdev that acts as a master device and controls
>>>2 slave devices. The original virtio_net netdev is registered as
>>>'backup' netdev and a passthru/vf device with the same MAC gets
>>>registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
>>>associated with the same 'pci' device. The user accesses the network
>>>interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
>>>as default for transmits when it is available with link up and running.
>>
>> Sorry, but this is ridiculous. You are apparently re-implementing part
>> of bonding driver as a part of NIC driver. Bond and team drivers
>> are mature solutions, well tested, broadly used, with lots of issues
>> resolved in the past. What you try to introduce is a weird shortcut
>> that already has couple of issues as you mentioned and will certainly
>> have many more. Also, I'm pretty sure that in future, someone comes up
>> with ideas like multiple VFs, LACP and similar bonding things.
>
>The problem with the bond and team drivers is they are too large and
>have too many interfaces available for configuration so as a result
>they can really screw this interface up.
>
>Essentially this is meant to be a bond that is more-or-less managed by
>the host, not the guest. We want the host to be able to configure it
>and have it automatically kick in on the guest. For now we want to
>avoid adding too much complexity as this is meant to be just the first
>step. Trying to go in and implement the whole solution right from the
>start based on existing drivers is going to be a massive time sink and
>will likely never get completed due to the fact that there is always
>going to be some other thing that will interfere.
>
>My personal hope is that we can look at doing a virtio-bond sort of
>device that will handle all this as well as providing a communication
>channel, but that is much further down the road. For now we only have
>a single bit so the goal for now is trying to keep this as simple as
>possible.

I have another usecase that would require the solution to be different
than what you suggest. Consider the following scenario:
- baremetal has 2 sr-iov nics
- there is a vm that has 1 VF from each nic: vf0, vf1. No virtio_net
- baremetal would like to somehow tell the VM to bond vf0 and vf1
  together and how this bonding should be configured, according to how
  the VF representors are configured on the baremetal (LACP for example)

The baremetal could decide to remove any VF during the VM runtime, it
can add another VF there. For migration, it can add virtio_net. The VM
should be instructed to bond all interfaces together according to how
baremetal decided - as it knows better.

For this we need a separate communication channel from baremetal to VM
(perhaps something re-usable already exists), we need something to
listen to the events coming from this channel (kernel/userspace) and to
react accordingly (create bond/team, enslave, etc).
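
To make the listener side a bit more concrete, here is a rough sketch of
what a userspace daemon could do with such events. Everything about the
channel is an assumption: the event dicts ("type", "ifname"), the
BondState helper and the handle_event() name are made up for
illustration, no such protocol exists today. The sketch only builds
iproute2 command lines and does not execute them, and it assumes the
host announces a removal before it actually unplugs the VF.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BondState:
    """Hypothetical guest-side view of the host-managed bond."""
    name: str = "bond0"
    mode: str = "802.3ad"  # LACP, as in the example scenario
    created: bool = False
    members: List[str] = field(default_factory=list)


def handle_event(state: BondState, event: Dict[str, str]) -> List[List[str]]:
    """Translate one (hypothetical) host event into iproute2 commands.

    The commands are returned as argv lists; nothing is executed here.
    """
    kind, ifname = event["type"], event["ifname"]
    cmds: List[List[str]] = []

    if kind == "add":
        if ifname in state.members:
            return cmds  # already enslaved, nothing to do
        if not state.created:
            # The first member triggers creation of the bond itself.
            cmds.append(["ip", "link", "add", state.name,
                         "type", "bond", "mode", state.mode])
            cmds.append(["ip", "link", "set", state.name, "up"])
            state.created = True
        # Take the interface down before enslaving it, then bring it back up.
        cmds.append(["ip", "link", "set", ifname, "down"])
        cmds.append(["ip", "link", "set", ifname, "master", state.name])
        cmds.append(["ip", "link", "set", ifname, "up"])
        state.members.append(ifname)
    elif kind == "remove":
        if ifname in state.members:
            # Release the interface while it is still present in the guest.
            cmds.append(["ip", "link", "set", ifname, "nomaster"])
            state.members.remove(ifname)
    else:
        raise ValueError(f"unknown event type: {kind!r}")

    return cmds


if __name__ == "__main__":
    state = BondState()
    for ev in ({"type": "add", "ifname": "vf0"},
               {"type": "add", "ifname": "vf1"},
               {"type": "remove", "ifname": "vf0"}):
        for argv in handle_event(state, ev):
            print(" ".join(argv))
```

The same handler would work for a team device by swapping the commands
it emits; the point is only that the policy comes from baremetal and
the guest side stays a thin translator.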

Now the question is: is it possible to merge the demands you have and
the generic needs I described into a single solution? From what I see,
that would be quite hard/impossible. So in the end, I think that we have
to end up with 2 solutions:
1) virtio_net, netvsc in-driver bonding - very limited, stupid, 0config
   solution that works for all (no matter what OS you use in VM)
2) team/bond solution with assistance of preferably userspace daemon
   getting info from baremetal. This is not 0config, but minimal config
   - user just has to define that this "magic bonding" should be on.
   This covers all possible usecases, including multiple VFs, RDMA, etc.

Thoughts?