netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/3] swdev: add IPv4 routing offload
@ 2015-01-02  3:29 sfeldma
  2015-01-02  5:11 ` Dave Taht
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: sfeldma @ 2015-01-02  3:29 UTC (permalink / raw)
  To: netdev, jiri, john.fastabend, tgraf, jhs, andy, roopa

From: Scott Feldman <sfeldma@gmail.com>

This patch set adds L3 routing offload support for IPv4 routes.  The idea is to
mirror routes installed in the kernel's FIB down to a hardware switch device to
offload the data forwarding path for L3.  Only the data forwarding path is
intercepted.  Control and management of the kernel's FIB remains with the
kernel.

A couple of new ndo ops (ndo_switch_fib_ipv4_add/del) are added to the swdev
model to add/remove FIB entries to/from the offload device.  The ops are called
from the core IPv4 FIB code directly.  Just before the FIB entry is installed
in the kernel's FIB, the swdev device driver gets a chance at the FIB entry
(assuming the swdev driver implements the new ndo ops).  This is a synchronous
call in the RTM_NEWROUTE path, and the swdev has the option to fail the
install, which means the FIB entry is not installed in swdev or the kernel, and
the user is notified of the failure.  The swdev driver also has the option to
return -EOPNOTSUPP to pass on the FIB entry, so it'll only be installed in the
kernel FIB.
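
To make the three outcomes above concrete, here is a minimal userspace sketch
in Python.  It is not the patch's code: fib_add(), toy_driver_add(), the
capacity limit and the -ENOSPC error are all hypothetical stand-ins, and the
kernel and hardware tables are modelled as plain sets.

```python
import errno


def toy_driver_add(hw_fib, route, capacity=2):
    """Hypothetical driver op: offload, pass, or fail a route."""
    prefix, tos = route
    if tos != 0:
        # Driver passes on this entry; it stays kernel-only.
        return -errno.EOPNOTSUPP
    if len(hw_fib) >= capacity:
        # Assumed failure mode: the (toy) hardware table is full.
        return -errno.ENOSPC
    hw_fib.add(route)
    return 0


def fib_add(kernel_fib, hw_fib, route, driver_add=None):
    """Model of the add path: the driver sees the entry before the kernel.

    Returns 0 on success, or a negative errno that would be reported to
    the user who issued RTM_NEWROUTE.
    """
    if driver_add is not None:
        err = driver_add(hw_fib, route)
        if err not in (0, -errno.EOPNOTSUPP):
            # Driver failed the install: nothing is installed anywhere.
            return err
    # Driver accepted the entry, passed on it, or has no op at all.
    kernel_fib.add(route)
    return 0
```

With no driver_add callback the route goes to the kernel table only, which is
the assumed behaviour when a driver does not implement the new ndo ops.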

The FIB flush path is modified also to call into the swdev driver to flush the
FIB entries from hardware.

The rocker swdev driver is updated to support these new ndo ops.  Right now
rocker only supports IPv4 singlepath routes, but follow-on patches will add
IPv6 and ECMP support.  Also, only unicast IPv4 routes are supported, but
follow-on patches will add multicast route support.

Testing was done in my simulated network environment using VMs and the rocker
device.  I'm using Quagga OSPFv2 for the routing protocol for automatic control
plane processing.  No modifications to Quagga or netlink/iproute2 are required;
it just works.

One important metric is the time spent installing/removing FIB entries from the
kernel and the device.  With these patches applied, I measured the wall time
required to install and remove 10K IPv4 routes.  I used the ip route add cmd in
batch mode to install static routes.  I used the ip route flush cmd to delete
the routes.  This is 10000 routes installed to the kernel's FIB and to the
swdev device's L3 tables, and then removed from each.  Each operation takes
less than a second (0.715s to install, 0.458s to flush).  This is on my
simulated rocker device running on a VM, so a real embedded CPU would probably
do much better.

My batch has 10K lines of:

simp@simp:~$ head east
route add 16.0.0.0/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.1/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.2/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.3/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.4/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.5/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.6/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.7/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.8/32 nexthop via 11.0.0.2 dev swp1
route add 16.0.0.9/32 nexthop via 11.0.0.2 dev swp1
[...]
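
A batch file of this shape could be generated with a short script.  The sketch
below assumes the elided lines simply keep counting up from 16.0.0.0 (only the
first ten are shown above); batch_lines() and write_batch() are hypothetical
helper names.

```python
import ipaddress


def batch_lines(count=10000, first="16.0.0.0", via="11.0.0.2", dev="swp1"):
    """Yield 'route add' lines for consecutive /32 host routes."""
    base = ipaddress.IPv4Address(first)
    for i in range(count):
        # IPv4Address + int gives the i-th following address.
        yield f"route add {str(base + i)}/32 nexthop via {via} dev {dev}"


def write_batch(path, count=10000):
    """Write the batch to a file suitable for 'ip --batch <path>'."""
    with open(path, "w") as f:
        for line in batch_lines(count):
            f.write(line + "\n")
```

Calling write_batch("east") would produce a 10000-line file; under the
sequential-address assumption the last route is 16.0.39.15/32, still inside
16/8.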

Installing/removing routes:

simp@simp:~$ wc -l east
10000 east
simp@simp:~$ ip route show root 16/8 | wc -l
0
simp@simp:~$ time sudo ip --batch east

real    0m0.715s
user    0m0.092s
sys     0m0.388s
simp@simp:~$ ip route show root 16/8 | wc -l
10000

[At this point, 10K routes are installed in kernel and the device]

simp@simp:~$ time sudo ip route flush root 16/8

real    0m0.458s
user    0m0.000s
sys     0m0.284s
simp@simp:~$ ip route show root 16/8 | wc -l
0

[All gone]
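
Dividing the wall times above by the route count gives a rough average cost
per route.  The helper names below are hypothetical; the inputs are the "real"
times from the two runs.

```python
def per_route_us(wall_seconds, routes=10000):
    """Average wall time per route, in microseconds."""
    return wall_seconds / routes * 1e6


def routes_per_second(wall_seconds, routes=10000):
    """Average number of routes handled per second of wall time."""
    return routes / wall_seconds


install_us = per_route_us(0.715)  # ip --batch east
flush_us = per_route_us(0.458)    # ip route flush root 16/8
```

That works out to roughly 71.5 us per installed route and 45.8 us per flushed
route, or about 14,000 and 21,800 routes per second.  These averages include
ip startup, sudo and netlink overhead, so they bound the per-route cost of the
FIB plus device work from above rather than measuring it directly.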

Scott Feldman (3):
  net: add IPv4 routing FIB support for swdev
  net: call swdev fib del for flushed routes
  rocker: implement IPv4 fib offloading

 drivers/net/ethernet/rocker/rocker.c |  441 +++++++++++++++++++++++++++++++++-
 include/linux/netdevice.h            |   22 ++
 include/net/switchdev.h              |   18 ++
 net/ipv4/fib_trie.c                  |   31 ++-
 net/switchdev/switchdev.c            |   89 +++++++
 5 files changed, 592 insertions(+), 9 deletions(-)

-- 
1.7.10.4


* Re: [PATCH net-next 0/3] swdev: add IPv4 routing offload
  2015-01-02  3:29 [PATCH net-next 0/3] swdev: add IPv4 routing offload sfeldma
@ 2015-01-02  5:11 ` Dave Taht
  2015-01-02  9:04 ` Rami Rosen
  2015-01-05  3:17 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Dave Taht @ 2015-01-02  5:11 UTC (permalink / raw)
  To: Scott Feldman
  Cc: netdev@vger.kernel.org, jiri, john fastabend, Thomas Graf,
	Jamal Hadi Salim, andy, roopa, David Lamparter

On Thu, Jan 1, 2015 at 7:29 PM,  <sfeldma@gmail.com> wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> This patch set adds L3 routing offload support for IPv4 routes.  The idea is to
> mirror routes installed in the kernel's FIB down to a hardware switch device to
> offload the data forwarding path for L3.  Only the data forwarding path is
> intercepted.  Control and management of the kernel's FIB remains with the
> kernel.
>
> A couple of new ndo ops (ndo_switch_fib_ipv4_add/del) are added to the swdev
> model to add/remove FIB entries to/from the offload device.  The ops are called
> from the core IPv4 FIB code directly.  Just before the FIB entry is installed
> in the kernel's FIB, the swdev device driver gets a chance at the FIB entry
> (assuming the swdev driver implements the new ndo ops).  This is a synchronous
> call in the RTM_NEWROUTE path, and the swdev has the option to fail the
> install, which means the FIB entry is not installed in swdev or the kernel, and
> the user is notified of the failure.  The swdev driver also has the option to
> return -EOPNOTSUPP to pass on the FIB entry, so it'll only be installed in the
> kernel FIB.

A couple notes:

1) As currently implemented in quagga (to my knowledge), a route
change is actually a route delete/route add rather than an atomic
route modify or a route add/route delete. While it would be nice to
fix quagga to do it atomically (and for all I know some fork does it
right?), I am curious as to the extent of serialization during a
process like this in the virtual switch. (It also does not appear that
you have tested the ip route change commands above, or beaten up on
quagga's routing decisions.)

2) It is generally helpful to be concurrently running the max traffic
you can sustain through the switch, while doing fib changes... and
observing what happens to that traffic.

3) As you attempt ipv6, life gets more complex. (you need to switch to
a later routing protocol in particular...)

4) There's a new idea on the block: source-specific routing (sometimes
called SADR) is mandated by the IETF homenet working group in
particular, and relies on IPV6_subtrees and link-local IPv6
multicast. The code furthest along is babel
(http://www.pps.univ-paris-diderot.fr/~jch/software/babel/
https://github.com/boutier/babeld also with patches for quagga), which,
being easy to set up, might be a good exercise of both link-local
multicast and of IPv6 in the virtual switch itself, as well as
exercising the fib. (OSPFv3 and IS-IS also have support for
source-specific routing in various branches.)
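
Point 1 can be shown with a toy model: a table keyed by prefix, a "change"
done either as delete-then-add or as a single replace, and a lookup probed
after every step.  Everything here is hypothetical (the function names, the
dict standing in for the FIB, the new next hop 11.0.0.3); it only shows where
a lookup can miss, not how quagga or the kernel behave.

```python
def change_by_delete_add(table, prefix, new_nexthop, probe):
    """Non-atomic change: the prefix is briefly absent from the table."""
    del table[prefix]
    probe(table)  # a lookup landing here finds no route
    table[prefix] = new_nexthop
    probe(table)


def change_by_replace(table, prefix, new_nexthop, probe):
    """Atomic change: the prefix is always present."""
    table[prefix] = new_nexthop
    probe(table)


def count_misses(change, prefix="16.0.0.0/32"):
    """Apply one route change and count probes that found no route."""
    table = {prefix: "11.0.0.2"}
    misses = 0

    def probe(t):
        nonlocal misses
        if prefix not in t:
            misses += 1

    change(table, prefix, "11.0.0.3", probe)
    return misses
```

count_misses(change_by_delete_add) reports one missed lookup and
count_misses(change_by_replace) reports none, which is the window that
running traffic during FIB changes (point 2) would expose.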




-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks


* Re: [PATCH net-next 0/3] swdev: add IPv4 routing offload
  2015-01-02  3:29 [PATCH net-next 0/3] swdev: add IPv4 routing offload sfeldma
  2015-01-02  5:11 ` Dave Taht
@ 2015-01-02  9:04 ` Rami Rosen
  2015-01-05  3:17 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Rami Rosen @ 2015-01-02  9:04 UTC (permalink / raw)
  To: sfeldma; +Cc: Netdev, jiri, john.fastabend, tgraf, jhs, andy, roopa

Hi, Scott,

Good work!

You say that currently the rocker driver supports only unicast singlepath IPv4.
If I understand correctly, IPv4 packets with tos != 0 are skipped in
the current rocker implementation of the ndo_sw_parent_fib_ipv4_add()
callback. I am referring to the rocker_port_fib_ipv4_skip() method:

if (tos != 0)
     return -EOPNOTSUPP;

see:
https://github.com/jpirko/net-next-rocker/blob/master/drivers/net/ethernet/rocker/rocker.c#L3701

Is there a reason for this? (The NDO that you suggest,
ndo_sw_parent_fib_ipv4_add(), has the tos as a parameter, so from this
aspect there is no problem).

Regards,
Rami Rosen




* Re: [PATCH net-next 0/3] swdev: add IPv4 routing offload
  2015-01-02  3:29 [PATCH net-next 0/3] swdev: add IPv4 routing offload sfeldma
  2015-01-02  5:11 ` Dave Taht
  2015-01-02  9:04 ` Rami Rosen
@ 2015-01-05  3:17 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2015-01-05  3:17 UTC (permalink / raw)
  To: sfeldma; +Cc: netdev, jiri, john.fastabend, tgraf, jhs, andy, roopa

From: sfeldma@gmail.com
Date: Thu,  1 Jan 2015 19:29:18 -0800

> This patch set adds L3 routing offload support for IPv4 routes.  The idea is to
> mirror routes installed in the kernel's FIB down to a hardware switch device to
> offload the data forwarding path for L3.  Only the data forwarding path is
> intercepted.  Control and management of the kernel's FIB remains with the
> kernel.

It looks like the design for this is still under discussion and that
new patches of whatever is decided upon will be forthcoming
eventually.

Can I ask you guys a huge favor?  DO NOT quote the entire patch when
discussing these changes.

The thread for patch #1 was so time consuming to scan and read in
patchwork because you guys did this.

I know it takes a little bit more work to select and delete the patch
content in the quoted area, but you really have to do this because
otherwise it is a huge burden for reviewers trying to follow the
conversation.

Thanks.
