From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Daniel Golle <daniel@makrotopia.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	Florent Daigniere <nextgens@freenetproject.org>,
	WireGuard mailing list <wireguard@lists.zx2c4.com>
Subject: Re: passing-through TOS/DSCP marking
Date: Wed, 30 Jun 2021 22:55:09 +0200
Message-ID: <87h7hf139u.fsf@toke.dk>
In-Reply-To: <YNyopHsKX2m5HSRr@makrotopia.org>

Daniel Golle <daniel@makrotopia.org> writes:

> Hi Toke,
>
> On Mon, Jun 21, 2021 at 04:27:08PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel Golle <daniel@makrotopia.org> writes:
>> 
>> > On Fri, Jun 18, 2021 at 02:24:29PM +0200, Jason A. Donenfeld wrote:
>> >> Hey Toke,
>> >> 
>> >> On Fri, Jun 18, 2021 at 1:05 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> >> > > I think you can achieve something similar using BPF filters, by relying
>> >> > > on wireguard passing through the skb->hash value when encrypting.
>> >> > >
>> >> > > Simply attach a TC-BPF filter to the wireguard netdev, pull out the DSCP
>> >> > > value and store it in a map keyed on skb->hash. Then, run a second BPF
>> >> > > filter on the physical interface that shares that same map, lookup the
>> >> > > DSCP value based on the skb->hash value, and rewrite the outer IP
>> >> > > header.
>> >> > >
>> >> > > The read-side filter will need to use bpf_get_hash_recalc() to make sure
>> >> > > the hash is calculated before the packet gets handed to wireguard, and
>> >> > > it'll be subject to hash collisions, but I think it should generally
>> >> > > work fairly well (for anything that's flow-based of course). And it can
>> >> > > be done without patching wireguard itself :)
>> >> >
>> >> > Just for fun I implemented such a pair of eBPF filters, and tested that
>> >> > it does indeed work for preserving DSCP marks on a Wireguard tunnel. The
>> >> > PoC is here:
>> >> >
>> >> > https://github.com/xdp-project/bpf-examples/tree/master/preserve-dscp
>> >> >
>> >> > To try it out (you'll need a recent-ish kernel and clang version) run:
>> >> >
>> >> > git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
>> >> > cd bpf-examples/preserve-dscp
>> >> > make
>> >> > ./preserve-dscp wg0 eth0
>> >> >
>> >> > (assuming wg0 and eth0 are the wireguard and physical interfaces in
>> >> > question, respectively).
>> >> >
>> >> > To actually deploy this it would probably need a few tweaks; in
>> >> > particular the second filter that rewrites packets should probably check
>> >> > that the packets are actually part of the Wireguard tunnel in question
>> >> > (by parsing the UDP header and checking the source port) before writing
>> >> > anything to the packet.
>> >> >
>> >> > -Toke
>> >> 
>> >> That is a super cool approach. Thanks for writing that! Sounds like a
>> >> good approach, and one pretty easy to deploy, without the need to
>> >> patch kernels and such.
>> >> 
>> >> Also, nice usage of BPF_MAP_TYPE_LRU_HASH for this.
>> >> 
>> >> Daniel -- can you let the list know if this works for your use case?
>> >
>> > Turns out not exactly easy to deploy (on OpenWrt), as it depends on an
>> > extremely recent environment. I will try pushing to that direction, but
>> > it doesn't look like it's going to be ready very soon.
>> >
>> > In terms of toolchain: LLVM/Clang is a very bulky beast, I gave up on
>> > that and started working on integrating GCC-10's BPF target in our build
>> > system...
>> 
>> I saw that, but I have no idea if GCC's BPF target support will support
>> this. My tentative guess would be no, unfortunately :(
>
> Probably you are right. When building the BPF object with GCC, the
> result is:
> root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> libbpf: elf: skipping unrecognized data section(4) .stab
> libbpf: elf: skipping relo section(5) .rel.stab for section(4) .stab
> libbpf: elf: skipping unrecognized data section(13) .comment
> libbpf: BTF is required, but is missing or corrupted.
> Couldn't open file: preserve_dscp_kern.o

Hmm, for this example it should be possible to make it run without BTF.
I'm only using that for the map definition, so that could be changed to
the old 'struct bpf_map_def' format; you could try this patch:

diff --git a/preserve-dscp/preserve_dscp_kern.c b/preserve-dscp/preserve_dscp_kern.c
index 24120cb8a3ff..08248e1f0e41 100644
--- a/preserve-dscp/preserve_dscp_kern.c
+++ b/preserve-dscp/preserve_dscp_kern.c
@@ -9,12 +9,12 @@
  * otherwise clean up stale entries. Instead, we just rely on the LRU mechanism
  * to evict old entries as the map fills up.
  */
-struct {
-       __uint(type, BPF_MAP_TYPE_LRU_HASH);
-       __type(key, __u32);
-       __type(value, __u8);
-       __uint(max_entries, 16384);
-} flow_dscps SEC(".maps");
+struct bpf_map_def SEC("maps") flow_dscps = {
+       .type           = BPF_MAP_TYPE_LRU_HASH,
+       .key_size       = sizeof(__u32),
+       .value_size     = sizeof(__u8),
+       .max_entries    = 16384,
+};
 
 const volatile static int ip_only = 0;

> Using the LLVM/Clang compiled object also doesn't work:
> root@OpenWrt:/usr/lib/bpf# preserve-dscp wg0 eth0
> libbpf: Error in bpf_create_map_xattr(flow_dscps):Operation not permitted(-1). Retrying without BTF.
> libbpf: map 'flow_dscps': failed to create: Operation not permitted(-1)
> libbpf: permission error while running as root; try raising 'ulimit -l'? current value: 512.0 KiB
> libbpf: failed to load object 'preserve_dscp_kern.o'
> Failed to load object
>
> Probably Kernel 5.4.124 is too old...?

Here I think the hint is in the error message ;) Try raising the
locked-memory limit ('ulimit -l') before loading.
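
Some background on that hint: kernels before 5.11 (so 5.4 included)
charge BPF map memory against RLIMIT_MEMLOCK, which is why a small
limit shows up as "Operation not permitted" even for root. Below is a
minimal sketch of how a loader could raise the limit itself before
loading the object. The helper name ensure_memlock_limit() is
hypothetical, and I have not checked whether the PoC loader already
does something like this.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: make sure the soft RLIMIT_MEMLOCK is at least
 * `wanted` bytes. Pass RLIM_INFINITY to lift the limit entirely, which
 * needs root (CAP_SYS_RESOURCE) if it exceeds the current hard limit.
 * Returns 0 on success, -1 on failure.
 */
int ensure_memlock_limit(rlim_t wanted)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
		return -1;

	/* Already unlimited, or already large enough: nothing to do. */
	if (lim.rlim_cur == RLIM_INFINITY ||
	    (wanted != RLIM_INFINITY && lim.rlim_cur >= wanted))
		return 0;

	lim.rlim_cur = wanted;
	/* Raise the hard limit only if it is below what we ask for. */
	if (lim.rlim_max != RLIM_INFINITY &&
	    (wanted == RLIM_INFINITY || lim.rlim_max < wanted))
		lim.rlim_max = wanted;

	if (setrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
		fprintf(stderr, "raising RLIMIT_MEMLOCK failed: %s\n",
			strerror(errno));
		return -1;
	}
	return 0;
}
```

A loader would call ensure_memlock_limit(RLIM_INFINITY) once, before
opening and loading the BPF object. Without touching the code, running
'ulimit -l unlimited' in the shell that starts preserve-dscp should
have the same effect.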

>> An alternative to getting LLVM built as part of the OpenWrt toolchain is
>> to just use the host clang to build the BPF binaries. It doesn't
>> actually need to be cross-compiled with a special compiler, the BPF byte
>> code format is the same on all architectures except for endianness, so
>> just passing that to the host clang should theoretically be enough...
>
> I believe that having a way to build BPF objects compatible with the
> target built into our toolchain would be a huge step forward.
> And given that GCC already gets pretty far, I think it'd be worth
> fixing/patching whatever is missing (I haven't even tried GCC-11 yet)

For this example that might work (as noted above), but for other things
BTF is a hard requirement, and I don't believe GCC supports that at all,
sadly :(
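
On the host-clang route quoted above: clang accepts 'bpfel' and
'bpfeb' as explicit little- and big-endian BPF targets, next to plain
'bpf', which follows the byte order of the machine clang runs on. The
sketch below only illustrates picking the right one. Both helper names
are hypothetical, and is_big_endian() reports the byte order of the
machine it runs on, so for cross-builds it would have to run on the
target device or be replaced by a value taken from the target
configuration.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the machine running this code stores integers big-endian. */
int is_big_endian(void)
{
	const uint16_t probe = 0x0102;
	uint8_t first_byte;

	/* Look at the byte stored at the lowest address. */
	memcpy(&first_byte, &probe, sizeof(first_byte));
	return first_byte == 0x01;
}

/* Name of the clang BPF target matching the given byte order. */
const char *bpf_clang_target(int big_endian)
{
	return big_endian ? "bpfeb" : "bpfel";
}
```

The returned string is what would go after '-target' on the host clang
command line when building preserve_dscp_kern.o for that device.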

> Find my staging tree including 'preserve-dscp' ready to play with:
>
> https://git.openwrt.org/?p=openwrt/staging/dangole.git;a=shortlog;h=refs/heads/gcc10-bpf
>
> Select 'Enable experimental features by default', but note that toolchain
> doesn't build when selecting Linux 5.10 for x86, so you need to un-select
> 'Use testing Kernel' if building for x86.
> And have a look at the patch that allows building the bpf-examples BPF
> objects with GCC in package/network/utils/bpf-examples/patches
>
>
>> 
>> > In terms of kernel support: recent kernels don't build yet because of
>> > gelf_getsymshndx, so we got to update libelf first for that. Recent
>> > libelf doesn't seem to be an option yet on many of the build hosts we
>> > currently support (Darwin and such).
>> >
>> > In terms of library support: our build of libbpf comes from Linux
>> > release tarballs. There isn't yet a release supporting bpf_tc_attach,
>> > the easiest would be to wait for Linux 5.13 to be released.
>> 
>> I used the libbpf TC loading support for convenience, but it's possible
>> to load it using 'tc' as well without too much trouble (right now the
>> userspace component sets a config variable before loading the program,
>> but it can be restructured to not need that).
>> 
>> Alternatively, the bpf-examples repository is set up with a libbpf
>> submodule that it can link statically against, so you could use that for
>> now?
>
> I've updated to 5.13 + patches on top, so now it builds :)

Alright, that works.

> Library-embedding is a no-go for OpenWrt. Having different ABI-versions
> of libraries installed simultaneously works, so we can just ship with
> a more recent version of libbpf.

Yeah, I wasn't suggesting it as a permanent solution, just so you could
test it out :)

>> > I (of course ;) also tried and spent almost a day looking for a
>> > quick-and-dirty path for temporary deployment, so I could at least give
>> > feedback -- bpf-examples also isn't exactly made to be cross-compiled
>> > manually, so I have failed with that as well so far.
>> 
>> Heh, no, it isn't, really. Anything in particular you need to make this
>> easier? We already added some bits to xdp-tools for supporting
>> cross-compilation (and that shares some lineage with bpf-examples), so
>> porting those over should not be too difficult.
>
> I found my way around, see the packaging for bpf-examples in the tree
> (link above, at path stated above)

Right, I see.

>> 
>> See: https://github.com/xdp-project/xdp-tools/pull/78 and
>> https://github.com/xdp-project/xdp-tools/issues/74
>> 
>> Unfortunately I don't have a lot of time to poke more at this right now,
>> but feel free to open up an issue / pull request to the bpf-examples
>> repository with any changes you need :)
>
> I guess I'll just go ahead then and package xdp-tools :)

That would be awesome! xdp-tools will definitely need BTF, though, so
I'm afraid it'll need to be compiled with LLVM at this stage...

-Toke
