netdev.vger.kernel.org archive mirror
From: Florian Westphal <fw@strlen.de>
To: netdev@vger.kernel.org
Subject: [flamebait] xdp, well meaning but pointless
Date: Thu, 1 Dec 2016 10:11:08 +0100
Message-ID: <20161201091108.GF26507@breakpoint.cc>

[ As already mentioned in my reply to Tom, here is
the xdp flamebait/critique ]

Lots of XDP related patches started to appear on netdev.
I'd prefer if it would stop...

To me XDP combines all disadvantages of stack bypass solutions like dpdk
with the disadvantages of kernel programming with a more limited
instruction set and toolchain.

Unlike XDP, userspace bypass (dpdk et al) allows use of any programming
model or language you want (including scripting languages), which
makes things a lot easier, e.g. garbage collection, debuggers vs.
crash+vmcore+printk...

I have heard the argument that the restrictions that come with
XDP are great because they allow one to 'limit what users can do'.

Given that DPDK/netmap/userspace bypass is a reality, this is
a very weak argument -- why would anyone pick XDP over a dpdk/netmap
based solution?
XDP will always be less powerful and a lot more complicated,
especially considering users of dpdk (or toolkits built on top of it)
are not kernel programmers and userspace has more powerful ipc
(or storage) mechanisms.

Aside from this, XDP, like DPDK, is a kernel bypass.
You might say 'It's just stack bypass, not a kernel bypass!'.
But what does that mean exactly?  That packets can still be passed
onward to the normal stack?
Bypass solutions like netmap can also inject packets back into the
kernel stack.

Running less powerful user code in a restricted environment in the kernel
address space is certainly a worse idea than separating this logic out
to user space.

In light of DPDK's existence it makes a lot more sense to me to provide
a). a faster mmap based interface (possibly AF_PACKET based) that allows
mapping the nic directly into userspace, detaching the tx/rx queue from
the kernel.

John Fastabend sent something like this last year as a proof of
concept, iirc it was rejected because register space got exposed directly
to userspace.  I think we should re-consider merging netmap
(or something conceptually close to its design).

b). with regard to a programmable data path: IFF one wants to do this
in kernel (and that's a big if), it seems much more preferable to provide
a config/data-based approach rather than a programmable one.  If you want
full freedom, DPDK is architecturally just too powerful to compete with.

Proponents of XDP sometimes provide usage examples.
Let's look at some of these.

== Application development: ==
* DNS Server
Data structures and algorithms need to be implemented in a mostly Turing
complete language, so eBPF cannot readily be used for that.
At least it will be orders of magnitude harder than in userspace.

* TCP Endpoint
TCP processing in eBPF is a bit out of the question while userspace tcp stacks
based on both netmap and dpdk already exist today.

== Forwarding dataplane: ==

* Router/Switch
Routers and switches should actually adhere to standardized and specified
protocols and thus don't need a lot of custom or specialized software.
Doing this in XDP is still a lot more work compared to userspace offloads,
where you can do things like allocating a 4GB array to perform nexthop lookup.
It also needs the ability to perform tx on another interface.
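
To make the 4GB-array remark concrete, here is a minimal userspace sketch of
a direct-indexed nexthop table.  It is only an illustration under stated
assumptions: the names (nh_table, nh_table_add, nh_lookup) are hypothetical,
each slot is one byte, and the table is indexed by the top 'bits' bits of the
IPv4 destination so that it can be tried with less memory.  With bits == 32
it becomes the 4GB array (one byte per possible address).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical direct-indexed nexthop table: one byte per slot, indexed
 * by the top `bits` bits of the IPv4 destination (host byte order).
 * A slot value of 0 means "no route". */
struct nh_table {
	unsigned bits;
	uint8_t *slot;
};

static struct nh_table *nh_table_new(unsigned bits)
{
	struct nh_table *t;
	uint64_t n;

	if (bits == 0 || bits > 32)
		return NULL;
	n = (uint64_t)1 << bits;
	if (n > SIZE_MAX)		/* does not fit this address space */
		return NULL;
	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->bits = bits;
	t->slot = calloc((size_t)n, 1);	/* zeroed: no routes yet */
	if (!t->slot) {
		free(t);
		return NULL;
	}
	return t;
}

static void nh_table_free(struct nh_table *t)
{
	if (t) {
		free(t->slot);
		free(t);
	}
}

/* Install prefix/len by overwriting every slot the prefix covers.
 * Install shorter prefixes first so that longer ones win. */
static int nh_table_add(struct nh_table *t, uint32_t prefix,
			unsigned len, uint8_t nh)
{
	uint32_t mask, first;
	uint64_t i, count;

	assert(t != NULL);
	if (len > t->bits)
		return -1;	/* finer than the table can represent */
	mask = len ? 0xffffffffu << (32 - len) : 0;
	first = (prefix & mask) >> (32 - t->bits);
	count = (uint64_t)1 << (t->bits - len);
	for (i = 0; i < count; i++)
		t->slot[first + i] = nh;
	return 0;
}

/* The lookup itself is a shift and a single memory access. */
static uint8_t nh_lookup(const struct nh_table *t, uint32_t dst)
{
	assert(t != NULL);
	return t->slot[dst >> (32 - t->bits)];
}
```

The point of the design is that all the cost is paid in memory and at
route-install time; the per-packet path has no loop and no branching.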

* Load balancer
State-holding algorithms need sorting and searching, so this is also no fit
for eBPF (could be exposed by function exports, but then can we do DoS by
finding worst case scenarios?).

Also, again, this needs a way to forward the frame out via another interface.

For cases where packet gets sent out via same interface it would appear
to be easier to use port mirroring in a switch and use stochastic filtering
on end nodes to determine which host should take responsibility.

XDP plus: central authority over how distribution will work in case
nodes are added/removed from pool.
But then again, it will be easier to handle this with netmap/dpdk where
more complicated scheduling algorithms can be used.
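
As an illustration of the port mirroring plus stochastic filtering idea
above: every end node sees every frame, hashes the flow, and only the node
whose id matches keeps it.  This is a sketch under assumptions, not a
description of any existing implementation: the flow_key layout, the choice
of FNV-1a as hash, and the plain modulo over the node count are all picked
for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow identifier (all fields in host byte order). */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Feed the low `nbytes` bytes of v into a 32-bit FNV-1a hash,
 * most significant byte first. */
static uint32_t fnv1a_feed(uint32_t h, uint32_t v, unsigned nbytes)
{
	while (nbytes--) {
		h ^= (v >> (8 * nbytes)) & 0xff;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Hash field by field so struct padding never enters the hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	h = fnv1a_feed(h, k->saddr, 4);
	h = fnv1a_feed(h, k->daddr, 4);
	h = fnv1a_feed(h, k->sport, 2);
	h = fnv1a_feed(h, k->dport, 2);
	h = fnv1a_feed(h, k->proto, 1);
	return h;
}

/* Each node runs the same test on the mirrored traffic; exactly one
 * node id in [0, n_nodes) returns 1 for any given flow. */
static int flow_is_mine(const struct flow_key *k, unsigned my_id,
			unsigned n_nodes)
{
	assert(n_nodes > 0 && my_id < n_nodes);
	return flow_hash(k) % n_nodes == my_id;
}
```

Note that with a plain modulo, changing n_nodes moves most flows to a
different node, which is exactly the add/remove problem mentioned above.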

* early drop/filtering.
While it's possible to do "u32" like filters with eBPF, all modern nics
support ntuple filtering in hardware, which is going to be faster because
such packets will never even be signalled to the operating system.
For more complicated cases (e.g. doing a socket lookup to check if a particular
packet matches a bound socket, and expected sequence numbers etc.) I don't
see easy ways to do that with XDP (and without sk_buff context).
Providing it via function exports is possible of course, but that will only
result in an "arms race" where we will see special-sauce functions
all over the place -- DoS will always attempt to go for something
that is difficult to filter against, cf. all the recent volume-based
floodings.
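
For reference, the kind of "u32" like match discussed above looks roughly
like the following.  This is a plain userspace C sketch, not an actual XDP
program: the data/data_end pointer pair and the bounds check before every
access only mimic the style an eBPF program has to be written in, and the
verdict enum and function name are hypothetical.  It assumes untagged
Ethernet + IPv4 and ignores fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts; not the kernel's XDP action codes. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Drop IPv4/UDP frames whose destination port equals dport, pass
 * everything else.  Anything too short to parse is passed on so the
 * normal stack can deal with it. */
static enum verdict filter_udp_dport(const uint8_t *data,
				     const uint8_t *data_end,
				     uint16_t dport)
{
	const uint8_t *ip, *udp;
	size_t ihl;

	assert(data <= data_end);
	if (data_end - data < 14)		/* Ethernet header */
		return VERDICT_PASS;
	if (data[12] != 0x08 || data[13] != 0x00) /* ethertype IPv4 */
		return VERDICT_PASS;

	ip = data + 14;
	if (data_end - ip < 20)			/* minimal IPv4 header */
		return VERDICT_PASS;
	if ((ip[0] >> 4) != 4)			/* version field */
		return VERDICT_PASS;
	ihl = (size_t)(ip[0] & 0x0f) * 4;	/* header length in bytes */
	if (ihl < 20 || (size_t)(data_end - ip) < ihl)
		return VERDICT_PASS;
	if (ip[9] != 17)			/* protocol: UDP */
		return VERDICT_PASS;

	udp = ip + ihl;
	if (data_end - udp < 8)			/* UDP header */
		return VERDICT_PASS;
	if ((((uint16_t)udp[2] << 8) | udp[3]) != dport)
		return VERDICT_PASS;
	return VERDICT_DROP;
}
```

A fixed 5-tuple match like this is what ntuple filters already do in the
nic; the sketch has no access to socket state, which is the hard part.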

Thanks, Florian

