From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
"Thomas Ptacek" <thomas@sockpuppet.org>,
"Adhipati Blambangan" <adhipati@tuta.io>,
"David Ahern" <dsahern@gmail.com>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"David S . Miller" <davem@davemloft.net>,
brouer@redhat.com
Subject: Re: [PATCH net v6] net: xdp: account for layer 3 packets in generic skb handler
Date: Wed, 19 Aug 2020 09:07:56 +0200 [thread overview]
Message-ID: <20200819090756.07f76eb9@carbon> (raw)
In-Reply-To: <20200815074102.5357-1-Jason@zx2c4.com>
On Sat, 15 Aug 2020 09:41:02 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
> A user reported that packets from wireguard were possibly ignored by XDP
> [1]. Another user reported that modifying packets from layer 3
> interfaces results in impossible to diagnose drops.
>
> Apparently, the generic skb xdp handler path seems to assume that
> packets will always have an ethernet header, which really isn't always
> the case for layer 3 packets, which are produced by multiple drivers.
> This patch fixes the oversight. If the mac_len is 0 and so is
> hard_header_len, then we know that the skb is a layer 3 packet, and in
> that case prepend a pseudo ethhdr to the packet whose h_proto is copied
> from skb->protocol, which will have the appropriate v4 or v6 ethertype.
> This allows us to keep XDP programs' assumption correct about packets
> always having that ethernet header, so that existing code doesn't break,
> while still allowing layer 3 devices to use the generic XDP handler.
>
> We push on the ethernet header and then pull it right off and set
> mac_len to the ethernet header size, so that the rest of the XDP code
> does not need any changes. That is, it makes it so that the skb has its
> ethernet header just before the data pointer, of size ETH_HLEN.
>
> Previous discussions have included the point that maybe XDP should just
> be intentionally broken on layer 3 interfaces, by design, and that layer
> 3 people should be using cls_bpf. However, I think there are good
> grounds to reconsider this perspective:
>
> - Complicated deployments wind up applying XDP modifications to a
> variety of different devices on a given host, some of which are using
> specialized ethernet cards and other ones using virtual layer 3
> interfaces, such as WireGuard. Being able to apply one codebase to
> each of these winds up being essential.
>
> - cls_bpf does not support the same feature set as XDP, and operates at
> a slightly different stage in the networking stack. You may reply,
> "then add all the features you want to cls_bpf", but that seems to be
> missing the point, and would still result in there being two ways to
> do everything, which is not desirable for anyone actually _using_ this
> code.
>
> - While XDP was originally made for hardware offloading, and while many
> look disdainfully upon the generic mode, it nevertheless remains a
> highly useful and popular way of adding bespoke packet
> transformations, and from that perspective, a difference between layer
> 2 and layer 3 packets is immaterial if the user is primarily concerned
> with transformations to layer 3 and beyond.
>
> - It's not impossible to imagine layer 3 hardware (e.g. a WireGuard PCIe
> card) including eBPF/XDP functionality built-in. In that case, why
> limit XDP as a technology to only layer 2? Then, having generic XDP
> work for layer 3 would naturally fit as well.
>
> [1] https://lore.kernel.org/wireguard/M5WzVK5--3-2@tuta.io/
>
> Reported-by: Thomas Ptacek <thomas@sockpuppet.org>
> Reported-by: Adhipati Blambangan <adhipati@tuta.io>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>
> I had originally dropped this patch, but the issue kept coming up in
> user reports, so here's a v4 of it. Testing of it is still rather slim,
> but hopefully that will change in the coming days.
>
> Changes v5->v6:
> - The fix to the skb->protocol changing case is now in a separate
> stand-alone patch, and removed from this one, so that it can be
> evaluated separately.
>
> Changes v4->v5:
> - Rather than tracking in a messy manner whether the skb is l3, we just
> do the check once, and then adjust the skb geometry to be identical to
> the l2 case. This simplifies the code quite a bit.
> - Fix a preexisting bug where the l2 header remained attached if
> skb->protocol was updated.
>
> Changes v3->v4:
> - We now preserve the same logic for XDP_TX/XDP_REDIRECT as before.
> - hard_header_len is checked in addition to mac_len.
>
> net/core/dev.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 151f1651439f..79c15f4244e6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4630,6 +4630,18 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> * header.
> */
> mac_len = skb->data - skb_mac_header(skb);
> + if (!mac_len && !skb->dev->hard_header_len) {
> + /* For l3 packets, we push on a fake mac header, and then
> + * pull it off again, so that it has the same skb geometry
> + * as for the l2 case.
> + */
> + eth = skb_push(skb, ETH_HLEN);
> + eth_zero_addr(eth->h_source);
> + eth_zero_addr(eth->h_dest);
> + eth->h_proto = skb->protocol;
> + __skb_pull(skb, ETH_HLEN);
> + mac_len = ETH_HLEN;
> + }
You are consuming a little bit of the headroom here.
> hlen = skb_headlen(skb) + mac_len;
> xdp->data = skb->data - mac_len;
> xdp->data_meta = xdp->data;
The XDP-prog is allowed to change eth->h_proto. Later (in code) we
detect this and update skb->protocol with the new protocol.
What will happen if my XDP-prog adds a VLAN header?
The selftest tools/testing/selftests/bpf/test_xdp_vlan.sh test these
situations. You can use it as an example, and write/construct a test
that does the same for your Wireguard devices. As minimum you need to
provide such a selftest together with this patch.
Generally speaking, IMHO generic-XDP was a mistake, because it is hard
to maintain and slows down the development of XDP. (I have a number of
fixes in my TODO backlog for generic-XDP). Adding this will just give
us more corner-cases that need to be maintained.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2020-08-19 7:08 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAHmME9rbRrdV0ePxT0DgurGdEKOWiEi5mH5Wtg=aJwSA6fxwMg@mail.gmail.com>
2020-08-14 7:30 ` [PATCH net v5] net: xdp: account for layer 3 packets in generic skb handler Jason A. Donenfeld
2020-08-14 20:55 ` David Miller
2020-08-14 20:57 ` Jason A. Donenfeld
2020-08-15 7:41 ` [PATCH net v6] " Jason A. Donenfeld
2020-08-19 7:07 ` Jesper Dangaard Brouer [this message]
2020-08-19 23:22 ` David Miller
2020-08-20 9:13 ` Jason A. Donenfeld
2020-08-20 18:55 ` David Miller
2020-08-20 20:29 ` Jason A. Donenfeld
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200819090756.07f76eb9@carbon \
--to=brouer@redhat.com \
--cc=Jason@zx2c4.com \
--cc=adhipati@tuta.io \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=thomas@sockpuppet.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).