From: Roopa Prabhu <roopa@cumulusnetworks.com>
To: Florian Fainelli <f.fainelli@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>, Thomas Graf <tgraf@suug.ch>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Jiri Pirko <jiri@resnulli.us>, netdev <netdev@vger.kernel.org>,
David Miller <davem@davemloft.net>,
Andy Gospodarek <andy@greyhouse.net>,
dborkman <dborkman@redhat.com>, ogerlitz <ogerlitz@mellanox.com>,
jesse <jesse@nicira.com>, pshelar <pshelar@nicira.com>,
azhou <azhou@nicira.com>, Ben Hutchings <ben@decadent.org.uk>,
Stephen Hemminger <stephen@networkplumber.org>,
jeffrey.t.kirsher@intel.com, vyasevic <vyasevic@redhat.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
John Fastabend <john.r.fastabend@intel.com>,
Eric Dumazet <edumazet@google.com>,
Scott Feldman <sfeldma@cumulusnetworks.com>,
Lennert Buytenhek <buytenh@wantstofly.org>,
Shrijeet Mukherjee <shm@cumulusnetworks.com>
Subject: Re: [patch net-next RFC 0/4] introduce infrastructure for support of switch chip datapath
Date: Tue, 25 Mar 2014 22:37:03 -0700
Message-ID: <5332677F.2090404@cumulusnetworks.com>
In-Reply-To: <CAGVrzcbRggJV+5+EuZnXVoU+qdK9ugPRL+J-mp-UraGuo19jnQ@mail.gmail.com>
On 3/25/14, 1:11 PM, Florian Fainelli wrote:
> 2014-03-25 12:35 GMT-07:00 Neil Horman <nhorman@tuxdriver.com>:
>> On Tue, Mar 25, 2014 at 06:00:09PM +0000, Thomas Graf wrote:
>>> On 03/25/14 at 01:39pm, Neil Horman wrote:
>>>> No, but it would be really nice if these smaller devices could take advantage of
>>>> this infrastructure. Looking at it, I don't see why that's not possible. The
>>>> big trick (as we've discussed in the past), is using a net_device structure to
>>>> take advantage of all the features that net_devices offer while not enabling the
>>>> device specific features that some hardware doesn't allow.
>>>>
>>>> For instance the broadcom chips that live in many wireless routers would be well
>>>> served by the model Jiri has here as far as Media level interface control is
>>>> concerned (i.e. ifup/down/speed/duplex/etc), but it's a bit lacking in that
>>>> net_devices are assumed to support L3 protocol configuration (i.e. they can have
>>>> ip addresses assigned to them), which you can't IIRC do on these chips.
>>> How about a new device flag indicating pure L2 mode? Any L3 address
>>> configuration would fail with EAFNOSUPP.
>>>
>> Yeah, we've discussed that before, and it seems like a good idea, though I'm not
>> sure that it's flexible enough. It clearly prevents L3 operations on devices
>> that can only do L2, which is great, but that may not be sufficient for some
>> devices. For example, what if you wanted to use ebtables on an L2 port where
>> the hardware can't mirror the actions of a given table rule? Do we need to
>> expand out those capabilities?
>>
>>>> Would it be worth considering a private interface model? That is to say:
>>>>
>>>> 1) Ports on a switch chip are accessed using net_device structures, but
>>>> registered to a private list contained within the switch device, rather than to
>>>> the net namespaces device list.
>>>> 2) Access to the switch ports via user space is done through the master switch
>>>> interface with additional netlink attributes specifying the port on the switch
>>>> to access (or not to access the master switch device directly)
>>>> Such a model I think might fit well with Jiri's code here and provide greater
>>>> flexibility for a wider range of devices. It would of course require
>>>> augmentation for user space, but the changes would be additive, so I think they
>>>> would be reasonable. This would also allow the switch device to have a hook in
>>>> the control path to block or allow features that the hardware may or may not
>>>> support while still being able to use the existing net_device infrastructure to
>>>> support these operations as they are normally carried out.
>>> I believe this would defeat the main advantage of reusing net_device
>>> model which is compatibility with the well established standard toolset.
>>>
>>> In an ideal world, we represent what is possible using the existing
>>> net_device model.
>>>
>> Maybe I'm not being clear. I'm not suggesting that we abandon the use of a
>> net_device to do any of this work, only that we add a layer of indirection to
>> get to it. By augmenting the existing network device stack to allow
>> registration of net_devices to arbitrary lists, rather than to a fixed
>> per-net-namespace global device list, we can operate net_devices that are only
>> visible within the scope of a given switch fabric. User space still works the
>> same way, it just requires the specification of additional information when
>> speaking to ports on a switch device that may not be directly accessible via the
>> cpu. For example, if a system has a directly connected nic (em1), and a switch
>> fabric with a master bridge port (sw1), and 10 external ports (sw1pX), we could
>> access them all from user space via ip link show. For example:
>>
>> 1) ip link show:
>> em1
>> sw1
>>
>> 2) ip link show sw1
>> sw1
>>
>> 3) ip link show -p sw1
>> sw1p0
>> sw1p1
>> sw1p2...
> I was scratching my head about why we might want to expose sw1 as a
> separate net_device, but I think this is a good model as it allows for
> a "seamless" switch awareness to be constructed, and allows for
> controlling the CPU/management port(s) of a given Ethernet switch
> separately, which is valuable. It also makes it possible to expose the
> multiple CPU/management ports of a given switch when that exists, and
> finally, there might be special firmware running on the Ethernet
> switch, and that specific 'sw1' net_device could be the one to use to
> talk to this via sockets, ioctls, whatever.
>
>
Sorry for joining this thread late, and possibly in the middle of it.
I agree with the idea of keeping the ports linked to the master switch
dev (or the 'conduit' to the switch chip) via a private list instead of
the master-slave relationship proposed earlier.
By private I mean the netdev->priv linkage to the master switch dev,
not hiding the ports from the user.
We think it's better to keep the switch ports exposed like any other
netdev on Linux.
This approach makes a switch port look exactly like a NIC port, so all
existing tools continue to work seamlessly. The switch port operations
could internally be forwarded to the switch netdev (sw1 in the above
case).
Example:
$ ip link set dev sw1p0 up
$ ethtool -S sw1p0
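The forwarding described above can be sketched roughly as follows. This is a userspace toy model in Python, not kernel code: the class names (SwitchChip, PortNetdev), the method names, and the dictionary standing in for netdev->priv are all hypothetical and only illustrate the linkage, assuming each port keeps a private reference to its switch and a port index.

```python
class SwitchChip:
    """Hypothetical stand-in for the switch chip driver (or the sw1 netdev)."""

    def __init__(self, name):
        self.name = name
        self.port_state = {}  # port index -> admin state (True means up)
        self.log = []         # operations forwarded by the ports, in order

    def port_set_admin(self, port_index, up):
        # In a real driver this is where the chip would be programmed.
        self.port_state[port_index] = up
        self.log.append((port_index, "up" if up else "down"))


class PortNetdev:
    """Hypothetical switch port that is visible like any other netdev."""

    def __init__(self, name, switch, port_index):
        self.name = name
        # Models the netdev->priv linkage back to the master switch dev.
        self.priv = {"switch": switch, "port_index": port_index}

    def set_admin(self, up):
        # The port does no work itself; it forwards to the switch.
        self.priv["switch"].port_set_admin(self.priv["port_index"], up)


sw1 = SwitchChip("sw1")
ports = {f"sw1p{i}": PortNetdev(f"sw1p{i}", sw1, i) for i in range(3)}

# Rough analogue of: ip link set dev sw1p0 up
ports["sw1p0"].set_admin(True)
```

The user only ever addresses sw1p0; the fact that the operation lands on the switch is hidden behind the private linkage, which is the point of keeping the ports as ordinary netdevs.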
Whether sw1 is needed as a separate netdev on the system is debatable.
In most cases the switch port driver (API) can talk to the switch chip
driver without a switch netdev in between.
But there are cases where a switch netdev might become necessary for
switch-chip-specific operations (which have probably been discussed on
this thread). An example could be a global ACL rule that applies to all
switch ports. One can argue that such a rule can instead be applied on
the individual switch ports, with the switch driver taking care of
consolidating it or otherwise programming it optimally in the switch chip.
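As an illustration of that last argument, here is a minimal Python sketch of a driver-side consolidation step. It is a toy model under stated assumptions, not an existing API: AclConsolidator, its methods, the rule strings, and the "*" marker for a chip-wide entry are all made up for the example. It assumes rules are added per port and that the driver collapses a rule into one global entry once every port carries it.

```python
class AclConsolidator:
    """Hypothetical driver-side view of ACL rules added per switch port."""

    def __init__(self, port_names):
        self.all_ports = frozenset(port_names)
        self.rules = {}  # rule text -> set of ports that carry the rule

    def add_rule(self, port, rule):
        if port not in self.all_ports:
            raise ValueError(f"unknown port: {port}")
        self.rules.setdefault(rule, set()).add(port)

    def hardware_entries(self):
        """Return the (scope, rule) entries to program into the chip.

        A rule present on every port becomes a single entry with scope "*";
        any other rule yields one entry per port that carries it.
        """
        entries = []
        for rule, ports in self.rules.items():
            if ports == self.all_ports:
                entries.append(("*", rule))
            else:
                entries.extend((p, rule) for p in sorted(ports))
        return entries


acl = AclConsolidator(["sw1p0", "sw1p1", "sw1p2"])
for port in ("sw1p0", "sw1p1", "sw1p2"):
    acl.add_rule(port, "drop tcp dport 23")   # same rule on every port
acl.add_rule("sw1p1", "drop udp dport 69")    # rule on a single port only

entries = acl.hardware_entries()
```

With this approach the user still configures individual ports, and whether a separate switch netdev is needed for global rules becomes a question of convenience rather than necessity.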
Thanks,
Roopa