From: Paul Chaignon <paul.chaignon@gmail.com>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Subject: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
Date: Wed, 21 May 2025 00:06:24 +0200 [thread overview]
Message-ID: <aCz84JU60wd8etiT@mail.gmail.com> (raw)
In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_update via a new flag.
When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:
1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
2: __wsum diff, bool pseudohdr)
3: {
4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
5: csum_replace_by_diff(sum, diff);
6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
7: skb->csum = ~csum_sub(diff, skb->csum);
8: } else if (pseudohdr) {
9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
10: }
11: }
The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.
The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.
This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.
This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6_ADDR,
to indicate we're updating the L4 checksum after an IPv6 address change.
When this flag is set, bpf_l4_csum_replace will call
inet_proto_csum_replace16 instead of inet_proto_csum_replace_by_diff.
I'm sending this as an RFC because I'm not convinced this is the ideal
solution, but also prefer to discuss this around concrete code rather
than a simple bug report. In particular, I would have preferred a
solution we can backport to stable. Since we will have to change the API
to provide additional information to bpf_l4_csum_replace, I'm unsure
such a solution exists.
Fixes: 7d672345ed295 ("bpf: add generic bpf_csum_diff helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
---
include/uapi/linux/bpf.h | 1 +
net/core/filter.c | 21 +++++++++++++--------
tools/include/uapi/linux/bpf.h | 1 +
3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 16e95398c91c..ed2a8d223c86 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6069,6 +6069,7 @@ enum {
BPF_F_PSEUDO_HDR = (1ULL << 4),
BPF_F_MARK_MANGLED_0 = (1ULL << 5),
BPF_F_MARK_ENFORCE = (1ULL << 6),
+ BPF_F_IPV6_ADDR = (1ULL << 7),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/filter.c b/net/core/filter.c
index 30e7d3679088..777c9e467d07 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1965,13 +1965,15 @@ static const struct bpf_func_proto bpf_l3_csum_replace_proto = {
BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset,
u64, from, u64, to, u64, flags)
{
- bool is_pseudo = flags & BPF_F_PSEUDO_HDR;
- bool is_mmzero = flags & BPF_F_MARK_MANGLED_0;
- bool do_mforce = flags & BPF_F_MARK_ENFORCE;
+ bool is_pseudo = flags & BPF_F_PSEUDO_HDR;
+ bool is_mmzero = flags & BPF_F_MARK_MANGLED_0;
+ bool do_mforce = flags & BPF_F_MARK_ENFORCE;
+ bool is_ipv6_addr = flags & BPF_F_IPV6_ADDR;
__sum16 *ptr;
if (unlikely(flags & ~(BPF_F_MARK_MANGLED_0 | BPF_F_MARK_ENFORCE |
- BPF_F_PSEUDO_HDR | BPF_F_HDR_FIELD_MASK)))
+ BPF_F_PSEUDO_HDR | BPF_F_HDR_FIELD_MASK |
+ BPF_F_IPV6_ADDR)))
return -EINVAL;
if (unlikely(offset > 0xffff || offset & 1))
return -EFAULT;
@@ -1984,10 +1986,13 @@ BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset,
switch (flags & BPF_F_HDR_FIELD_MASK) {
case 0:
- if (unlikely(from != 0))
- return -EINVAL;
-
- inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo);
+ if (is_ipv6_addr) {
+ inet_proto_csum_replace16(ptr, skb, (void *)from, (void *)to, is_pseudo);
+ } else {
+ if (unlikely(from != 0))
+ return -EINVAL;
+ inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo);
+ }
break;
case 2:
inet_proto_csum_replace2(ptr, skb, from, to, is_pseudo);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16e95398c91c..ed2a8d223c86 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6069,6 +6069,7 @@ enum {
BPF_F_PSEUDO_HDR = (1ULL << 4),
BPF_F_MARK_MANGLED_0 = (1ULL << 5),
BPF_F_MARK_ENFORCE = (1ULL << 6),
+ BPF_F_IPV6_ADDR = (1ULL << 7),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
--
2.43.0
next reply other threads:[~2025-05-20 22:06 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-20 22:06 Paul Chaignon [this message]
2025-05-21 1:07 ` [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes Alexei Starovoitov
2025-05-21 21:19 ` Paul Chaignon
2025-05-22 19:01 ` Paul Chaignon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aCz84JU60wd8etiT@mail.gmail.com \
--to=paul.chaignon@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox