netdev.vger.kernel.org archive mirror
* [patch net-next v2 0/9] introduce rocker switch driver with hardware accelerated datapath api
From: Jiri Pirko @ 2014-09-19 13:49 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	alexander.h.duyck-ral2JQCrhuEAvxtiuMwx3w,
	simon.horman-wFxRvT7yatFl57MIdRCFDg,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

This patchset can be divided into 3 main sections:
- introduce switchdev api for implementing switch drivers
- introduce switchdev generic netlink api for userspace manipulation
- introduce rocker switch driver which implements switchdev api

More info in separate patches.

So now it is possible to create an ovs bridge over rocker
switch ports. The ovs daemon can decide which flows to offload to hw
and use the switchdev genl api to tell the driver. For now, the easiest
way is to do it by hand via the "sw" tool (https://github.com/jpirko/switchdev).

v1->v2 changes:
- removed DSA phys switch id implementation for now per Florian's request
- introduced own match key structure so the internal ovs flow struct
  stays untouched
- extended the flow match in order to easily add more match types (hope that
  Jamal will like this :)
- per ovs maintainers' request, removed ovs offload bits - that will be handled
  in ovs userspace using switchdev genl interface.
- added switchdev features so that driver can indicate what it supports
- little renames/fixes here and there

RFC->v1 changes:
- moved include/linux/*.h -> include/net/
- moved net/core/switchdev.c -> net/switchdev/
- moved drivers/net/rocker.* -> drivers/net/ethernet/rocker/
- fixed a couple of minor bugs and typos
- in dsa the switch id is generated randomly
- fixed rocker schedule in atomic context bug in rocker_port_set_rx_mode
- added switchdev Netlink API

Jiri Pirko (9):
  net: rename netdev_phys_port_id to more generic name
  net: introduce generic switch devices support
  rtnl: expose physical switch id for particular device
  net-sysfs: expose physical switch id for particular device
  net: introduce dummy switch
  switchdev: add basic support for flow matching and actions
  switchdev: add swdev features
  switchdev: introduce Netlink API
  rocker: introduce rocker switch driver

 Documentation/networking/switchdev.txt           |   53 +
 MAINTAINERS                                      |   14 +
 drivers/net/Kconfig                              |    7 +
 drivers/net/Makefile                             |    1 +
 drivers/net/dummyswitch.c                        |  130 +
 drivers/net/ethernet/Kconfig                     |    1 +
 drivers/net/ethernet/Makefile                    |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
 drivers/net/ethernet/rocker/Kconfig              |   29 +
 drivers/net/ethernet/rocker/Makefile             |    5 +
 drivers/net/ethernet/rocker/rocker.c             | 3561 ++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h             |  465 +++
 include/linux/netdevice.h                        |   49 +-
 include/net/switchdev.h                          |  160 +
 include/uapi/linux/if_link.h                     |   10 +
 include/uapi/linux/switchdev.h                   |  113 +
 net/Kconfig                                      |    1 +
 net/Makefile                                     |    3 +
 net/core/dev.c                                   |    2 +-
 net/core/net-sysfs.c                             |   26 +-
 net/core/rtnetlink.c                             |   30 +-
 net/switchdev/Kconfig                            |   20 +
 net/switchdev/Makefile                           |    6 +
 net/switchdev/switchdev.c                        |  188 ++
 net/switchdev/switchdev_netlink.c                |  441 +++
 28 files changed, 5307 insertions(+), 17 deletions(-)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 drivers/net/dummyswitch.c
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h
 create mode 100644 include/net/switchdev.h
 create mode 100644 include/uapi/linux/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c
 create mode 100644 net/switchdev/switchdev_netlink.c

-- 
1.9.3

* Re: [patch net-next v2 8/9] switchdev: introduce Netlink API
From: Alexei Starovoitov @ 2014-09-23  3:43 UTC
  To: Tom Herbert
  Cc: Thomas Graf, Jiri Pirko, John Fastabend, Jamal Hadi Salim,
	netdev@vger.kernel.org, David S. Miller, Neil Horman,
	Andy Gospodarek, Daniel Borkmann, Or Gerlitz, Jesse Gross,
	Pravin Shelar, Andy Zhou, Ben Hutchings, Stephen Hemminger,
	Jeff Kirsher, Vladislav Yasevich, Cong Wang, Eric Dumazet,
	Scott Feldman, Florian Fainelli, Roopa Prabhu,
	John Linville <linvi

On Mon, Sep 22, 2014 at 7:16 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Sep 22, 2014 at 6:54 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, Sep 22, 2014 at 8:10 AM, Tom Herbert <therbert@google.com> wrote:
>>> On Mon, Sep 22, 2014 at 1:13 AM, Thomas Graf <tgraf@suug.ch> wrote:
>>>> On 09/20/14 at 03:50pm, Alexei Starovoitov wrote:
>>>>> I think HW should not be limited by SW abstractions whether
>>>>> these abstractions are called flows, n-tuples, bridge or else.
>>>>> Really looking forward to see "device reporting the headers as
>>>>> header fields (len, offset) and the associated parse graph"
>>>>> as the first step.
>>>>>
>>>>> Another topic that this discussion didn't cover yet is how this
>>>>> all connects to tunnels and what is 'tunnel offloading'.
>>
>>> encapsulation (stuffing a few bytes of header into a packet) is in
>>> itself not nearly an expensive enough operation to warrant offloading
>>> to the NIC. Personally, I wish if NIC vendors are going to focus on
>>
>> On the contrary, generic tunneling is the most important one to get right
>> when we're talking offloads.
>> Adding an encap header is easy to do in hw, but it breaks all other
>> offloads if hw is not generic. Consider a gso packet coming from a vm.
>> Generic tunnel allows sw to add inner headers, outer headers and
>> set up offload offsets, so that HW does segmentation, checksumming
>> of the inner packet, adjusts inner headers and adds the final outer encap.
>
> As I pointed out on a previous thread, we already have a sufficiently
> generic interface to allow HW to do encapsulated TSO
> (SKB_GSO_UDP_TUNNEL and SKB_GSO_UDP_TUNNEL_CSUM with the inner
> headers).

SKB_GSO_UDP_TUNNEL_CSUM was the right way
to start splitting the overloaded and messy semantics of
UDP_TUNNEL. I'm still not sure whether you intended
it for both rx and tx, since to support tunnel_csum on rx,
parsing of the encap is needed, whereas tx is so much simpler.
Unless you're assuming the checksum_complete model for rx...
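
For illustration, here is a minimal sketch of why a checksum_complete
style model could let software verify an inner checksum without the
hardware parsing the encap: the device reports a ones-complement sum
over the whole packet, and software subtracts the sum of the outer
headers it has already parsed. The helper names csum16() and
csum16_sub() are hypothetical and are not the kernel's API; the sketch
also assumes the outer headers have an even length.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 16-bit ones-complement sum over a buffer,
 * reading big-endian 16-bit words. An odd trailing byte is padded
 * with zero on the right.
 */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/*
 * Hypothetical helper: ones-complement subtraction, a - b,
 * implemented as a + ~b with end-around carry.
 */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Given the device-reported sum over the full packet, csum16_sub(total,
csum16(outer, outer_len)) yields the sum over the inner part, which can
then be checked against the inner protocol's checksum rules. A real
implementation would also have to handle odd offsets and the two
representations of zero in ones-complement arithmetic.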

> If properly implemented, HW can implement a whole bunch of
> UDP encap protocols without knowing how to parse them.

On the tx side... yes, but I cannot see how you can do rx
with inner csum verification without parsing the encap.
What do you have in mind?

> I don't see how
> a switch on the NIC helps this...

Correct, just a switch on a nic isn't very useful.

If the immediate consumer of the packet is a VM,
then doing switching in the nic after decap doesn't
add much speed, since bridge+router+nat+policy in sw,
after decap and csum verification are done by hw, is fast enough.
But switching in HW becomes useful when a VF
is the destination device, since it avoids the hw->sw->hw
roundtrip, as Thomas was saying.

Also, there are x86 network gateways where tunneled
traffic from a virtual network is terminated and sent
over the internet or to another datacenter. Performance
demands are high, so if tunnel+switch+nat+policy
can be done in off-the-shelf HW it would be great.

>> And this is just tx offload. On rx smart tunnel offload in HW parses
>> encap and goes all the way to inner headers to verify checksums,
>> it also steers based on inner headers.
>> Try mellanox nics with and without vxlan offload to see
>> the difference.
>
> Turn on UDP RSS on the device and I bet you'll see those differences
> go away!

Logically it should, since all inner flows should get
hashed into different outer src_ports, but somehow
that didn't work. I need to re-investigate with your
l4_hash stuff.
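
For illustration, a minimal sketch of the idea that each inner flow is
hashed to its own outer UDP source port, so that plain UDP RSS on the
outer header can spread tunneled flows across queues. The struct
inner_flow, the mixing function and both helper names are hypothetical
and are not taken from the kernel or from any driver; the port range is
supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inner flow key; field names are illustrative only. */
struct inner_flow {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;
};

/* Generic 32-bit integer mixer, used here only to scramble the key. */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Hash the inner addresses, ports and protocol into one 32-bit value. */
static uint32_t inner_flow_hash(const struct inner_flow *f)
{
	uint32_t h = mix32(f->saddr);

	h = mix32(h ^ f->daddr);
	h = mix32(h ^ (((uint32_t)f->sport << 16) | f->dport));
	h = mix32(h ^ f->proto);
	return h;
}

/* Scale a 32-bit flow hash into the inclusive port range [min, max]. */
static uint16_t outer_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
	uint32_t range;

	assert(min <= max);
	range = (uint32_t)(max - min) + 1;
	return (uint16_t)(min + (uint32_t)(((uint64_t)hash * range) >> 32));
}
```

Calling outer_src_port(inner_flow_hash(&flow), 49152, 65535) for every
packet of one inner flow gives the same outer source port each time,
while different inner flows tend to land on different ports. Whether the
receiving NIC then spreads them across queues depends on it hashing the
outer UDP ports, which is the point under discussion here.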

> Alexei, I believe you previously said that SW should not dictate
> HW models. I agree with this, but also believe the converse is true--
> HW shouldn't dictate SW model.

Completely agree!

> This is really why I'm raising the
> question of what it means to integrate a switch into the host stack.
> If this is something that doesn't require any model change to the
> stack and is just a clever backend for rx-filters or tc, then I'm fine
> with that!

Agree as well. I'm not excited about the switchdev
abstraction in this patch, since it looks overly
simplified and not applicable to real silicon, but a
discussion about exposing programmable
nics/switches to sw in a generic way is worth having :)

