From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Pirko
Subject: Re: [patch net-next RFC 0/4] introduce infrastructure for support of switch chip datapath
Date: Sat, 22 Mar 2014 10:40:07 +0100
Message-ID: <20140322094007.GA2844@minipsycho.orion>
References: <1395243232-32630-1-git-send-email-jiri@resnulli.us>
 <532AD5B3.6020205@mojatatu.com>
 <20140320124021.GA2946@minipsycho.orion>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Jamal Hadi Salim, netdev, David Miller, Neil Horman,
 andy@greyhouse.net, tgraf@suug.ch, dborkman@redhat.com,
 ogerlitz@mellanox.com, jesse@nicira.com, pshelar@nicira.com,
 azhou@nicira.com, Ben Hutchings, Stephen Hemminger,
 jeffrey.t.kirsher@intel.com, vyasevic, Cong Wang, John Fastabend,
 Eric Dumazet, Scott Feldman, Lennert Buytenhek
To: Florian Fainelli
Return-path:
Received: from mail-ee0-f52.google.com ([74.125.83.52]:51957 "EHLO
 mail-ee0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1750759AbaCVJkL (ORCPT ); Sat, 22 Mar 2014 05:40:11 -0400
Received: by mail-ee0-f52.google.com with SMTP id e49so2599793eek.39
 for ; Sat, 22 Mar 2014 02:40:10 -0700 (PDT)
Content-Disposition: inline
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

Thu, Mar 20, 2014 at 06:21:10PM CET, f.fainelli@gmail.com wrote:
>2014-03-20 5:40 GMT-07:00 Jiri Pirko:
>> Thu, Mar 20, 2014 at 12:49:07PM CET, jhs@mojatatu.com wrote:
>>>Hi Jiri,
>>>
>>>On 03/19/14 11:33, Jiri Pirko wrote:
>>>>This is just an early draft, RFC. I wanted to post this early to get the
>>>>feedback as soon as possible.
>>>>
>>>>The basic idea is to introduce a generic infrastructure to support various
>>>>switch chips in kernel. Also the idea is to benefit from currently existing
>>>>Open vSwitch userspace infrastructure.
>>>>
>>>
>>>
>>>I think the abstraction should be a netdev and to be specific the
>>>bridge - not openvswitch. Our current tools like ifconfig, iproute2,
>>>bridge etc should continue to work.
>>
>> That is exactly the case. Nothing is specific to OVS. OVS is just one
>> method to access the switchdev api.
>>
>> Abstraction is netdev. One netdev per each switch port and one netdev as
>> a master on the top of that representing the switch itself.
>>
>>
>>>In my experience, it is sufficient to model a switch after the linux
>>>bridge at the basic level if the starting point is
>>>L2 (which is the lowest common denominator).
>>>And then you add capabilities that different chips expose.
>>>Not every chip can do vxlan, flows etc. And we already know how
>>>to abstract those out.
>>>My experience on top of broadcom chips is the approach I described
>>>works rather well.
>>>
>>>Additionally, note:
>>>We do have L2 devices that offload in the kernel
>>>(refer to DSA, posting earlier from the openwrt guys, and
>>>the intel devices which do VMDq etc). I am now counting we have 5
>>>different approaches if we add yours.
>>
>> I think that the problem is that each solution serves a different purpose.
>> For example DSA is for switches connected as a PHY to a MAC. That is
>> a completely different case from what my switchdev API is trying to handle.
>
>I agree with Jamal here, we should try to find a solution that fits
>most users here, it seems to me like there are 3 switches categories:
>
>- enterprise built-in switches in NICs that support VF/PF
>- embedded/enterprise switches that support tagging (Marvell eDSA/DSA,
>Broadcom tags)
>- embedded switches that only support 802.1q VLANs

One case which you may have forgotten:

[ASCII diagram lost its layout in this archive copy. Labels that survive:
 switch chip; ports p1, p2, ...pn, px, py, MNGMNT; CPU; pcie (twice);
 NIC0, NIC1; someMII (twice).]

NIC0 and NIC1 are ordinary NICs, like 8139too for example, with no notion
that they are connected to a switch. They are completely independent of
the mngmnt iface.
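
To make the netdev abstraction mentioned earlier in this mail a bit more
concrete (one netdev per switch port, plus one master netdev representing
the switch itself), here is a minimal user-space sketch. It is only an
illustrative model, not the proposed switchdev kernel API: the class names,
the build_switch() helper and the "sw0p1"-style port names are all
assumptions made up for this example.

```python
class NetDev:
    """Stand-in for a kernel net_device; holds only what this sketch needs."""

    def __init__(self, name, master=None):
        self.name = name
        self.master = master  # upper device, or None for a top-level netdev


class SwitchNetDev(NetDev):
    """Hypothetical master netdev that represents the switch chip itself."""

    def __init__(self, name):
        super().__init__(name)
        self.ports = []

    def add_port(self, index):
        # Each switch port gets its own netdev, enslaved to the switch master.
        # The "<switch>p<index>" naming is an assumption for illustration.
        port = NetDev(f"{self.name}p{index}", master=self)
        self.ports.append(port)
        return port


def build_switch(name, n_ports):
    """Create a switch master netdev with n_ports port netdevs under it."""
    switch = SwitchNetDev(name)
    for index in range(1, n_ports + 1):
        switch.add_port(index)
    return switch
```

For example, build_switch("sw0", 3) gives a master "sw0" with ports "sw0p1",
"sw0p2" and "sw0p3", each of which points back at "sw0" as its master. In
this model, existing tools would keep operating on ordinary netdevs.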
>
>The first category is more flow-oriented than control-oriented,
>whereas the last two are more "event and control" oriented where you
>usually have a system where the switch will be configured not to flood
>the CPU port if possible, but when it does, this is to perform
>specific configuration (address learning, port protection, snooping,
>authorization...).
>
>DSA is not designed specifically for switches which are connected to a
>MAC and appear as a regular PHY, this is how it first started, but
>nothing prevents you from using DSA with a switch that is e.g. memory-
>mapped into your CPU register space, MDIO is just the transport for
>the control part.

I see that DSA is now *very* MII-oriented. I'm not sure how hard it
would be to rewrite it to be more generic, or whether that would make
sense at all.

>
>For instance, if my switches support an N-byte tag that will give me a
>reason code for receiving this frame, and a bitmap representing the
>originating port, how would you imagine this fitting into the
>openvswitch/switchdev model, aside from the netdev per-port? Do you
>think we could easily migrate existing DSA users to
>openvswitch/switchdev by handling the custom switch tag?

I do not think so either.

>--
>Florian
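
As an illustration of the kind of switch tag discussed above (a reason code
for why the frame reached the CPU, plus a bitmap of originating ports), a
decoder could be sketched roughly as follows. The layout is entirely
hypothetical: a 4-byte tag prepended to the frame, with 1 byte of reason
code, 1 reserved byte and a 16-bit big-endian port bitmap. It does not
describe Marvell DSA/eDSA, Broadcom tags or any other real format, and
parse_tag() and SwitchTag are names invented for this example.

```python
import struct
from typing import List, NamedTuple, Tuple

# Hypothetical tag layout (an assumption, not a real vendor format):
#   byte 0     reason code
#   byte 1     reserved
#   bytes 2-3  originating-port bitmap, big-endian, bit N = port N
TAG_LEN = 4


class SwitchTag(NamedTuple):
    reason: int
    src_ports: List[int]


def parse_tag(frame: bytes) -> Tuple[SwitchTag, bytes]:
    """Split a tagged frame into its decoded tag and the remaining payload."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short to contain a switch tag")
    reason, _reserved, bitmap = struct.unpack_from("!BBH", frame, 0)
    # Expand the bitmap into the list of port numbers whose bit is set.
    src_ports = [bit for bit in range(16) if bitmap & (1 << bit)]
    return SwitchTag(reason, src_ports), frame[TAG_LEN:]
```

With a per-port netdev model, the decoded port numbers would select which
port netdev the frame is delivered on; where the reason code would go in an
openvswitch/switchdev model is the open question raised in the mail.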