From: "Samudrala, Sridhar"
Subject: Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available
Date: Sun, 28 Jan 2018 13:01:59 -0800
To: Alexander Duyck
Cc: Jakub Kicinski, "Michael S. Tsirkin", Siwei Liu, Stephen Hemminger, David Miller, Netdev, virtualization@lists.linux-foundation.org, virtio-dev@lists.oasis-open.org, "Brandeburg, Jesse", Alexander Duyck

On 1/28/2018 12:18 PM, Alexander Duyck wrote:
> On Sun, Jan 28, 2018 at 11:18 AM, Samudrala, Sridhar wrote:
>> On 1/28/2018 9:35 AM, Alexander Duyck wrote:
>>> On Fri, Jan 26, 2018 at 9:58 PM, Jakub Kicinski wrote:
>>>> On Fri, 26 Jan 2018 21:33:01 -0800, Samudrala, Sridhar wrote:
>>>>>>> 3 netdev model breaks this configuration starting with the creation
>>>>>>> and naming of the 2 devices to udev needing to be aware of master and
>>>>>>> slave virtio-net devices.
>>>>>> I don't understand this comment. There is one virtio-net device and
>>>>>> one "virtio-bond" netdev.
>>>>>> And user space has to be aware of the special
>>>>>> automatic arrangement anyway, because it can't touch the VF. It
>>>>>> doesn't make any difference whether it ignores the VF or PV and VF.
>>>>>> It simply can't touch the slaves, no matter how many there are.
>>>>> If the userspace is not expected to touch the slaves, then why do we
>>>>> need to take extra effort to expose a netdev that is just not really
>>>>> useful.
>>>> You said:
>>>> "[user space] needs to be aware of master and slave virtio-net devices."
>>>>
>>>> I'm saying:
>>>> It has to be aware of the special arrangement whether there is an
>>>> explicit bond netdev or not.
>>> To clarify here the kernel should be aware that there is a device that
>>> is an aggregate of 2 other devices. It isn't as if we need to insert
>>> the new device "above" the virtio.
>>>
>>> I have been doing a bit of messing around with a few ideas and it
>>> seems like it would be better if we could replace the virtio interface
>>> with the virtio-bond, renaming my virt-bond concept to this since it
>>> is now supposedly going to live in the virtio driver, interface, and
>>> then push the original virtio down one layer and call it a
>>> virtio-backup. If I am not mistaken we could assign the same dev
>>> pointer used by the virtio netdev to the virtio-bond, and if we
>>> register it first with the "eth%d" name then udev will assume that the
>>> virtio-bond device is the original virtio and all existing scripts
>>> should just work with that. We then would want to change the name of
>>> the virtio interface with the backup feature bit set, maybe call it
>>> something like bkup-00:00:00 where the 00:00:00 would be the last 3
>>> octets of the MAC address. It should solve the issue of inserting an
>>> interface "above" the virtio by making the virtio-bond become the
>>> virtio.
>>> The only limitation is that we will probably need to remove
>>> the back-up if the virtio device is removed, however that was already
>>> a limitation of this solution and others like the netvsc solution
>>> anyway.
>>
>> With the 3 netdev model, if we make the master virtio-net associated
>> with the real virtio PCI device, I think the userspace scripts would
>> not break. If we go this route, I am still not clear on the purpose of
>> exposing the bkup netdev.
>> Can we start with the 2 netdev model and move to the 3 netdev model
>> later if we find out that there are limitations with the 2 netdev
>> model? I don't think this will break any user API, as userspace is not
>> expected to use the bkup netdev.
> The 2 netdev model breaks a large number of expectations of user
> space. Among them is XDP since we cannot guarantee a symmetric setup
> between any entity and the virtio. How does it make sense that
> enabling XDP on virtio shows zero Rx packets, and in the meantime you
> are getting all of the packets coming in off of the VF?

Sure, we cannot support XDP in this model, and it needs to be disabled.

>
> In addition we would need to rewrite the VLAN and MAC address
> filtering ndo operations since we likely cannot add any VLANs since in
> most cases VFs are VLAN locked due to things like port VLAN and we
> cannot change the MAC address since the whole bonding concept is built
> around it.
>
> The last bit is the netpoll packet routing which the current code
> assumes is using the virtio only, but I don't know if that is a valid
> assumption since the VF is expected to be the default route for
> everything else when it is available.
>
> The point is by the time you are done you will have rewritten pretty
> much all the network device ops.
> With that being the case why add all
> the code to virtio itself instead of just coming up with a brand new
> set of ndo_ops that belong to this new device, and you could leave the
> existing virtio code in place and save yourself a bunch of time by
> just accessing it as an existing call as a separate netdev.

When the BACKUP feature is enabled, we can simply disable most of the
ndo ops that cannot be supported. I am not sure we need an additional
netdev and a separate set of ndo_ops.

Once we can support all of these use cases along with live migration, we
can move to the 3 netdev model. I think we will then need a new feature
bit so that the hypervisor can allow the VM to use both datapaths and
configure the PF accordingly.

Thanks
Sridhar
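The proposal in this reply (when the BACKUP feature is negotiated, refuse the ndo ops that the 2 netdev model cannot honor, such as XDP, VLAN filter changes and MAC changes) can be modeled in plain userspace C. This is a minimal sketch, not the patch's code: `struct toy_netdev`, `TOY_F_BACKUP` and the `toy_*` functions are hypothetical stand-ins, and the real `net_device_ops` callbacks take different arguments (for example, the kernel's VLAN add callback also receives a protocol).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature flag: the bit position is illustrative only and is
 * not the value used by the virtio specification or the RFC patch. */
#define TOY_F_BACKUP (1ULL << 0)

/* Toy stand-in for a net_device, holding only what this sketch needs. */
struct toy_netdev {
	uint64_t features;    /* negotiated feature bits */
	bool xdp_attached;    /* true once an XDP program is "loaded" */
	unsigned char mac[6]; /* current MAC address */
};

static bool toy_backup_enabled(const struct toy_netdev *dev)
{
	return (dev->features & TOY_F_BACKUP) != 0;
}

/* XDP: traffic may arrive on the VF instead of virtio, so refuse in
 * backup mode rather than attach a program that would see no packets. */
static int toy_xdp_setup(struct toy_netdev *dev, bool attach)
{
	if (toy_backup_enabled(dev))
		return -EOPNOTSUPP;
	dev->xdp_attached = attach;
	return 0;
}

/* VLAN add: VFs are often VLAN locked (port VLAN), so refuse. */
static int toy_vlan_rx_add_vid(struct toy_netdev *dev, uint16_t vid)
{
	(void)vid; /* a real driver would program a VLAN filter here */
	if (toy_backup_enabled(dev))
		return -EOPNOTSUPP;
	return 0;
}

/* MAC change: the virtio/VF pairing is keyed on the MAC, so refuse. */
static int toy_set_mac_address(struct toy_netdev *dev, const unsigned char *mac)
{
	assert(mac != NULL);
	if (toy_backup_enabled(dev))
		return -EOPNOTSUPP;
	memcpy(dev->mac, mac, sizeof(dev->mac));
	return 0;
}
```

Returning `-EOPNOTSUPP` instead of silently succeeding lets user space see that the operation is unavailable, which avoids the mismatch Alexander describes where XDP appears attached to virtio while the VF receives the traffic. It does not address the netpoll routing question.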