From: Daniel Xu <dxu@dxuuu.xyz>
To: bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
coreteam@netfilter.org, netfilter-devel@vger.kernel.org,
fw@strlen.de, daniel@iogearbox.net
Cc: dsahern@kernel.org
Subject: [PATCH bpf-next 0/7] Support defragmenting IPv(4|6) packets in BPF
Date: Mon, 26 Jun 2023 17:02:07 -0600
Message-ID: <cover.1687819413.git.dxu@dxuuu.xyz>
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which is very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
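As a rough illustration of why later fragments are opaque to a stateless
filter, here is a minimal userspace sketch (not part of this patchset) that
classifies an IPv4 packet from its flags/fragment-offset field. The function
and enum names are made up for illustration; only the bit layout comes from
RFC 791.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the 16-bit flags/fragment-offset field, per RFC 791. */
#define IP_FLAG_MF     0x2000u /* "more fragments" bit */
#define IP_OFFSET_MASK 0x1fffu /* fragment offset, in 8-byte units */

enum frag_kind {
	FRAG_NONE,  /* unfragmented packet */
	FRAG_FIRST, /* offset 0 with MF set: normally carries the L4 header */
	FRAG_LATER, /* offset > 0: payload only, so no ports to match on */
};

/* Hypothetical helper: read the field from a raw IPv4 header
 * (bytes 6 and 7, network byte order) into host byte order. */
uint16_t ipv4_frag_off(const uint8_t *hdr)
{
	return (uint16_t)((hdr[6] << 8) | hdr[7]);
}

/* Classify a packet from its flags/fragment-offset field (host byte order). */
enum frag_kind classify_frag(uint16_t frag_off)
{
	if (frag_off & IP_OFFSET_MASK)
		return FRAG_LATER;
	if (frag_off & IP_FLAG_MF)
		return FRAG_FIRST;
	return FRAG_NONE;
}

/* True if a stateless filter can expect to find L4 ports in this packet. */
bool has_l4_ports(uint16_t frag_off)
{
	return classify_frag(frag_off) != FRAG_LATER;
}
```

Anything classified as FRAG_LATER forces one of the two stateless choices
above (accept blindly or drop blindly), because the ports needed for the
5-tuple are simply not in the packet.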
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
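One way to picture the "run the bpf prog after the defrag module" requirement
is as an ordering constraint on hook priorities. The userspace sketch below is
illustrative only: it assumes netfilter's usual convention that hooks on the
same hook point run in ascending priority order and that the defrag hook sits
at NF_IP_PRI_CONNTRACK_DEFRAG (-400). The flag constant and function name are
hypothetical stand-ins, and this cover letter does not say whether or how the
patchset validates the ordering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors NF_IP_PRI_CONNTRACK_DEFRAG from the kernel UAPI headers. */
#define DEFRAG_HOOK_PRIORITY (-400)

/* Hypothetical stand-in for the new link flag; the real definition
 * is whatever the patchset adds to include/uapi/linux/bpf.h. */
#define SKETCH_F_IP_DEFRAG (1u << 0)

/*
 * Lower priority values run first. A program that asks for defrag only
 * sees reassembled packets if its priority is strictly greater than
 * the defrag hook's priority, i.e. if it runs after reassembly.
 */
bool sketch_defrag_request_ok(uint32_t flags, int32_t priority)
{
	if (!(flags & SKETCH_F_IP_DEFRAG))
		return true; /* no defrag requested, any ordering is fine */
	return priority > DEFRAG_HOOK_PRIORITY;
}
```

With this model, a prog attached at priority -399 or later sees whole
packets, while one at -400 or earlier would still see raw fragments.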
=== Patchset details ===
There was an earlier attempt at providing defrag via kfuncs [1]. The
feedback was that we could end up doing too much stuff in prog execution
context (like sending ICMP error replies). However, I think there is
still some outstanding discussion w.r.t. performance when it comes to
netfilter vs the previous approach. I'll schedule some time during
office hours for this.
Patches 1 & 2 are stolen from Florian. Hopefully he doesn't mind. There
were some outstanding comments on the v2 [2] but it doesn't look like a
v3 was ever submitted. I've addressed the comments and put them in this
patchset cuz I needed them.
Finally, the new selftest seems to be a little flaky. I'm not quite
sure why the server will fail to `recvfrom()` occasionally. I'm fairly
sure it's a timing related issue with creating veths. I'll keep
debugging but I didn't want that to hold up discussion on this patchset.
[0]: https://datatracker.ietf.org/doc/html/rfc8900
[1]: https://lore.kernel.org/bpf/cover.1677526810.git.dxu@dxuuu.xyz/
[2]: https://lore.kernel.org/bpf/20230525110100.8212-1-fw@strlen.de/
Daniel Xu (7):
tools: libbpf: add netfilter link attach helper
selftests/bpf: Add bpf_program__attach_netfilter helper test
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 12 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 8 +
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 10 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 108 ++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/lib/bpf/bpf.c | 8 +
tools/lib/bpf/bpf.h | 6 +
tools/lib/bpf/libbpf.c | 47 +++
tools/lib/bpf/libbpf.h | 15 +
tools/lib/bpf/libbpf.map | 1 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 282 ++++++++++++++++++
.../bpf/prog_tests/netfilter_basic.c | 78 +++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
.../bpf/progs/test_netfilter_link_attach.c | 14 +
21 files changed, 868 insertions(+), 21 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/netfilter_basic.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/test_netfilter_link_attach.c
--
2.40.1