From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
Andy Gospodarek <andy@greyhouse.net>,
Daniel Borkmann <borkmann@iogearbox.net>,
Alexei Starovoitov <ast@fb.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
brouer@redhat.com
Subject: Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
Date: Tue, 25 Apr 2017 11:34:53 +0200 [thread overview]
Message-ID: <20170425113453.5c72080f@redhat.com> (raw)
In-Reply-To: <20170420171006.GA97067@ast-mbp.thefacebook.com>
On Thu, 20 Apr 2017 10:10:08 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Thu, Apr 20, 2017 at 08:10:51AM +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 19 Apr 2017 19:56:13 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Thu, Apr 20, 2017 at 12:51:31AM +0200, Daniel Borkmann wrote:
> > > >
> > > > Is there a concrete reason that all the proposed future cases like sockets
> > > > have to be handled within the very same XDP_REDIRECT return code? F.e. why
> > > > not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future
> > > > ones would get a different return code f.e. XDP_TX_SK only handling sockets
> > > > when we get there implementation-wise?
> > >
> > > yeah. let's keep redirect to sockets, tunnels, crypto and exotic things
> > > out of this discussion.
> > > XDP_REDIRECT should assume L2 raw packet is being redirected to another L2 netdev.
> > > If we make it too generic it will lose performance.
> > >
> > > For cls_bpf the ifindex concept is symmetric. The program can access it as
> > > skb->ifindex on receive and can redirect to another ifindex via bpf_redirect() helper.
> > > Since netdev is not locked, it's actually big plus, since container management
> > > control plane can simply delete netns+veth and it goes away. The program can
> > > have dangling ifindex (if control plane is buggy and didn't update the bpf side),
> > > but it's harmless. Packets that redirect to non-existing ifindex are dropped.
> > > This approach already understood and works well, so for XDP I suggest to use
> > > the same approach initially before starting to reinvent the wheel.
> > > struct xdp_md needs ifindex field and xdp_redirect() helper that redirects
> > > to L2 netdev only. That's it. Simple and easy.
> > > I think the main use cases in John's and Jesper's minds is something like
> > > xdpswitch where packets are coming from VMs and from physical eths and
> > > being redirected to either physical eth or to VM via upcoming vhost+xdp support.
> > > This xdp_md->ifindex + xdp_redirect(to_ifindex) will solve it just fine.
> > >
> > > Once we have vhost+xdp and all other bits implemented, we must come back
> > > to this discussion about having port mapping table. As I mentioned
> > > during netconf I think it's very useful, but I don't think we should
> > > gate vhost+xdp and xdp_redirect work on this discussion.
> > > As far as this port mapping table we would need 'port' field in xdp_md as well
> > > and xdp_redirect_port() done via helper or via extra 'out_port' field in xdp_md
> > > plus another XDP_REDIRECT_PORT action code.
> >
> > Guess it would be easier to talk about if we name it "ingress_port" and
> > "egress_port".
> >
> > > The actual port table (array) should be populated by user space with netdevs
> > > and these netdev will have their refcnt incremented. Then we'll have discussion
> > > what to do with netdev_unregister notifiers, whether they should be auto-removed
> > > from port table or bpf should have a chance to be notified and act on it.
> > > Such port mapping will allow us to optimize inevitable call
> > > dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
> > > away, since netdevs will be stored in the port table and direct deref
> > > port_map_array[xdp_md->out_port] will give us target netdev quickly.
> >
> > I agree with above paragraph, and is happy that you can see that this
> > will actually be faster than using ifindex'es.
> >
> > > It's nice optimization and there are other more powerful optimizations we
> > > can do with such port table (since we will know in advance which netdevs
> > > the program will be redirecting too), but I still think we should do
> > > ifindex based xdp_redirect first and only add this port table later.
> >
> > No, we cannot first do an ifindex based xdp_redirect. The point of the
> > port table is to sandbox which ports XDP can use.
>
> hmm. port table cannot sandbox the ports. The only thing it does
> from 'safety' point of view is moving the checks from run-time into
> static insertion time.
> So the checks that we would do on netdev after looking it up
> based on ifindex are the same checks we will do at insertion time
> into port table. The user space will insert/delete them live
> from that port table, so from program point of view it's exactly
> the same as ifindex. The ports can disappear and can be added
> while the program is running.
I agree, that from the eBPF programs point of view using an ifindex or a
port number is the same. And I do like this model, that this is just a
number seem from bpf. It provides a clean separation between the
kernel and ebpf program world.
> Note the very first bpf patchset years ago contained the port table
> abstraction. ovs has concept of vports as well. These two very
> different projects needed port table to provide a layer of
> indirection between ifindex==netdev and virtual port number.
> This is still the case and I'd like to see this port table to be
> implemented for both cls_bpf and xdp. In that sense xdp is not
> special.
Glad to hear you want to see this implemented, I will start coding on
this then. Good point with cls_bpf, I was planning to make this port
table strongly connected to XDP, guess I should also think of cls_bpf.
> > XDP is different than TC/cls_bpf, as it does "bypass", there is no
> > other layer that can stop or inspect these packets. The TC hooks
> > redirect into the network stack, which have all the usual facilities
> > available for filtering, inspection and debugging what is going on
> > (e.g. tcpdump works for TC redirect).
>
> not true. when bpf_redirect() drops the packet due to incorrect ifindex
> that packet disappears without a trace. No tcpdump and no counter.
> And this is fine. We can add tracepoint there for debugging,
> but it wasn't a problem for anyone who's using it today, so it's
> 'nice to have', but certainly not mandatory.
I'm not worried about the DROP case, I agree that is fine (as you also
say). The problem is unintentionally sending a packet to a wrong
ifindex. This is clearly an eBPF program error, BUT with XDP this
becomes a very hard to debug program error. With TC-redirect/cls_bpf
we can tcpdump the packets, with XDP there is no visibility into this
happening (the NSA is going to love this "feature"). Maybe we could add
yet-another tracepoint to allow debugging this. My proposal that we
simply remove the possibility for such program errors, by as you say
move the validation from run-time into static insertion-time, via a
port table.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2017-04-25 9:35 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-18 19:58 XDP question: best API for returning/setting egress port? Jesper Dangaard Brouer
2017-04-18 20:54 ` John Fastabend
2017-04-19 12:00 ` Jesper Dangaard Brouer
2017-04-19 12:33 ` Daniel Borkmann
2017-04-19 15:24 ` Jesper Dangaard Brouer
2017-04-19 12:25 ` Hannes Frederic Sowa
2017-04-19 20:02 ` Andy Gospodarek
2017-04-19 21:42 ` Daniel Borkmann
2017-04-20 17:12 ` Andy Gospodarek
2017-04-19 22:51 ` Daniel Borkmann
2017-04-20 2:56 ` xdp_redirect ifindex vs port. Was: " Alexei Starovoitov
2017-04-20 4:38 ` John Fastabend
2017-04-20 4:58 ` Alexei Starovoitov
2017-04-20 5:14 ` John Fastabend
2017-04-20 6:10 ` Jesper Dangaard Brouer
2017-04-20 17:10 ` Alexei Starovoitov
2017-04-25 9:34 ` Jesper Dangaard Brouer [this message]
2017-04-26 0:26 ` Alexei Starovoitov
2017-04-26 3:07 ` John Fastabend
2017-04-26 9:11 ` Jesper Dangaard Brouer
2017-04-26 16:35 ` John Fastabend
2017-04-26 17:58 ` Alexei Starovoitov
2017-04-26 20:55 ` Andy Gospodarek
2017-04-27 8:41 ` Jesper Dangaard Brouer
2017-04-27 23:31 ` Alexei Starovoitov
2017-04-28 5:06 ` John Fastabend
2017-04-28 5:30 ` Alexei Starovoitov
2017-04-28 19:43 ` Hannes Frederic Sowa
2017-04-30 1:35 ` Alexei Starovoitov
2017-04-28 10:58 ` Jesper Dangaard Brouer
2017-04-30 1:04 ` Alexei Starovoitov
2017-04-30 22:55 ` John Fastabend
2017-04-20 6:39 ` XDP question: " Jesper Dangaard Brouer
2017-04-20 4:43 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170425113453.5c72080f@redhat.com \
--to=brouer@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andy@greyhouse.net \
--cc=ast@fb.com \
--cc=borkmann@iogearbox.net \
--cc=daniel@iogearbox.net \
--cc=john.fastabend@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=xdp-newbies@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.