* [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
@ 2025-05-20 22:06 Paul Chaignon
2025-05-21 1:07 ` Alexei Starovoitov
0 siblings, 1 reply; 4+ messages in thread
From: Paul Chaignon @ 2025-05-20 22:06 UTC (permalink / raw)
To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_update via a new flag.
When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:
1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
2: __wsum diff, bool pseudohdr)
3: {
4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
5: csum_replace_by_diff(sum, diff);
6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
7: skb->csum = ~csum_sub(diff, skb->csum);
8: } else if (pseudohdr) {
9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
10: }
11: }
The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.
The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.
This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.
This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6_ADDR,
to indicate we're updating the L4 checksum after an IPv6 address change.
When this flag is set, bpf_l4_csum_replace will call
inet_proto_csum_replace16 instead of inet_proto_csum_replace_by_diff.
I'm sending this as an RFC because I'm not convinced this is the ideal
solution, but also prefer to discuss this around concrete code rather
than a simple bug report. In particular, I would have preferred a
solution we can backport to stable. Since we will have to change the API
to provide additional information to bpf_l4_csum_replace, I'm unsure
such a solution exists.
Fixes: 7d672345ed295 ("bpf: add generic bpf_csum_diff helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
---
include/uapi/linux/bpf.h | 1 +
net/core/filter.c | 21 +++++++++++++--------
tools/include/uapi/linux/bpf.h | 1 +
3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 16e95398c91c..ed2a8d223c86 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6069,6 +6069,7 @@ enum {
BPF_F_PSEUDO_HDR = (1ULL << 4),
BPF_F_MARK_MANGLED_0 = (1ULL << 5),
BPF_F_MARK_ENFORCE = (1ULL << 6),
+ BPF_F_IPV6_ADDR = (1ULL << 7),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/filter.c b/net/core/filter.c
index 30e7d3679088..777c9e467d07 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1965,13 +1965,15 @@ static const struct bpf_func_proto bpf_l3_csum_replace_proto = {
BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset,
u64, from, u64, to, u64, flags)
{
- bool is_pseudo = flags & BPF_F_PSEUDO_HDR;
- bool is_mmzero = flags & BPF_F_MARK_MANGLED_0;
- bool do_mforce = flags & BPF_F_MARK_ENFORCE;
+ bool is_pseudo = flags & BPF_F_PSEUDO_HDR;
+ bool is_mmzero = flags & BPF_F_MARK_MANGLED_0;
+ bool do_mforce = flags & BPF_F_MARK_ENFORCE;
+ bool is_ipv6_addr = flags & BPF_F_IPV6_ADDR;
__sum16 *ptr;
if (unlikely(flags & ~(BPF_F_MARK_MANGLED_0 | BPF_F_MARK_ENFORCE |
- BPF_F_PSEUDO_HDR | BPF_F_HDR_FIELD_MASK)))
+ BPF_F_PSEUDO_HDR | BPF_F_HDR_FIELD_MASK |
+ BPF_F_IPV6_ADDR)))
return -EINVAL;
if (unlikely(offset > 0xffff || offset & 1))
return -EFAULT;
@@ -1984,10 +1986,13 @@ BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset,
switch (flags & BPF_F_HDR_FIELD_MASK) {
case 0:
- if (unlikely(from != 0))
- return -EINVAL;
-
- inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo);
+ if (is_ipv6_addr) {
+ inet_proto_csum_replace16(ptr, skb, (void *)from, (void *)to, is_pseudo);
+ } else {
+ if (unlikely(from != 0))
+ return -EINVAL;
+ inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo);
+ }
break;
case 2:
inet_proto_csum_replace2(ptr, skb, from, to, is_pseudo);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16e95398c91c..ed2a8d223c86 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6069,6 +6069,7 @@ enum {
BPF_F_PSEUDO_HDR = (1ULL << 4),
BPF_F_MARK_MANGLED_0 = (1ULL << 5),
BPF_F_MARK_ENFORCE = (1ULL << 6),
+ BPF_F_IPV6_ADDR = (1ULL << 7),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
2025-05-20 22:06 [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes Paul Chaignon
@ 2025-05-21 1:07 ` Alexei Starovoitov
2025-05-21 21:19 ` Paul Chaignon
0 siblings, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2025-05-21 1:07 UTC (permalink / raw)
To: Paul Chaignon, Martin KaFai Lau
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
On Tue, May 20, 2025 at 3:06 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
>
> In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
> things, update the L4 checksum after reverse SNATing IPv6 packets. That
> use case is however not currently supported and leads to invalid
> skb->csum values in some cases. This patch adds support for IPv6 address
> changes in bpf_l4_csum_update via a new flag.
>
> When calling bpf_l4_csum_replace in Cilium, it ends up calling
> inet_proto_csum_replace_by_diff:
>
> 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
> 2: __wsum diff, bool pseudohdr)
> 3: {
> 4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
> 5: csum_replace_by_diff(sum, diff);
> 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
> 7: skb->csum = ~csum_sub(diff, skb->csum);
> 8: } else if (pseudohdr) {
> 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
> 10: }
> 11: }
>
> The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
> updated one of the IPv6 addresses. The helper now updates the L4 header
> checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
>
> For an IPv6 packet, the updates of the IPv6 address and of the L4
> checksum will cancel each other. The checksums are set such that
> computing a checksum over the packet including its checksum will result
> in a sum of 0. So the same is true here when we update the L4 checksum
> on line 5. We'll update it as to cancel the previous IPv6 address
> update. Hence skb->csum should remain untouched in this case.
Is ILA broken then?
net/ipv6/ila/ila_common.c is using
inet_proto_csum_replace_by_diff()
or is it simply doing it differently?
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
2025-05-21 1:07 ` Alexei Starovoitov
@ 2025-05-21 21:19 ` Paul Chaignon
2025-05-22 19:01 ` Paul Chaignon
0 siblings, 1 reply; 4+ messages in thread
From: Paul Chaignon @ 2025-05-21 21:19 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
On Tue, May 20, 2025 at 06:07:15PM -0700, Alexei Starovoitov wrote:
> On Tue, May 20, 2025 at 3:06 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> >
> > In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
> > things, update the L4 checksum after reverse SNATing IPv6 packets. That
> > use case is however not currently supported and leads to invalid
> > skb->csum values in some cases. This patch adds support for IPv6 address
> > changes in bpf_l4_csum_update via a new flag.
> >
> > When calling bpf_l4_csum_replace in Cilium, it ends up calling
> > inet_proto_csum_replace_by_diff:
> >
> > 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
> > 2: __wsum diff, bool pseudohdr)
> > 3: {
> > 4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
> > 5: csum_replace_by_diff(sum, diff);
> > 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
> > 7: skb->csum = ~csum_sub(diff, skb->csum);
> > 8: } else if (pseudohdr) {
> > 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
> > 10: }
> > 11: }
> >
> > The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
> > updated one of the IPv6 addresses. The helper now updates the L4 header
> > checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
> >
> > For an IPv6 packet, the updates of the IPv6 address and of the L4
> > checksum will cancel each other. The checksums are set such that
> > computing a checksum over the packet including its checksum will result
> > in a sum of 0. So the same is true here when we update the L4 checksum
> > on line 5. We'll update it as to cancel the previous IPv6 address
> > update. Hence skb->csum should remain untouched in this case.
>
> Is ILA broken then?
> net/ipv6/ila/ila_common.c is using
> inet_proto_csum_replace_by_diff()
>
> or is it simply doing it differently?
As far as I can tell, yes, it's affected by the same issue. Maybe nobody
noticed because 1) it only happens for CHECKSUM_COMPLETE and 2) only the
adj-transport checksum mode for ILA is affected. That mode doesn't look
covered in selftests and isn't the one recommended to users.
With ILA also affected, maybe passing the IPv6 flag to
inet_proto_csum_replace_by_diff is a better solution. That way we could
keep using csum_diff + l4_csum_replace.
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
2025-05-21 21:19 ` Paul Chaignon
@ 2025-05-22 19:01 ` Paul Chaignon
0 siblings, 0 replies; 4+ messages in thread
From: Paul Chaignon @ 2025-05-22 19:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
On Wed, May 21, 2025 at 11:19:41PM +0200, Paul Chaignon wrote:
> On Tue, May 20, 2025 at 06:07:15PM -0700, Alexei Starovoitov wrote:
> > On Tue, May 20, 2025 at 3:06 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> > >
> > > In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
> > > things, update the L4 checksum after reverse SNATing IPv6 packets. That
> > > use case is however not currently supported and leads to invalid
> > > skb->csum values in some cases. This patch adds support for IPv6 address
> > > changes in bpf_l4_csum_update via a new flag.
> > >
> > > When calling bpf_l4_csum_replace in Cilium, it ends up calling
> > > inet_proto_csum_replace_by_diff:
> > >
> > > 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
> > > 2: __wsum diff, bool pseudohdr)
> > > 3: {
> > > 4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
> > > 5: csum_replace_by_diff(sum, diff);
> > > 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
> > > 7: skb->csum = ~csum_sub(diff, skb->csum);
> > > 8: } else if (pseudohdr) {
> > > 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
> > > 10: }
> > > 11: }
> > >
> > > The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
> > > updated one of the IPv6 addresses. The helper now updates the L4 header
> > > checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
> > >
> > > For an IPv6 packet, the updates of the IPv6 address and of the L4
> > > checksum will cancel each other. The checksums are set such that
> > > computing a checksum over the packet including its checksum will result
> > > in a sum of 0. So the same is true here when we update the L4 checksum
> > > on line 5. We'll update it as to cancel the previous IPv6 address
> > > update. Hence skb->csum should remain untouched in this case.
> >
> > Is ILA broken then?
> > net/ipv6/ila/ila_common.c is using
> > inet_proto_csum_replace_by_diff()
> >
> > or is it simply doing it differently?
>
> As far as I can tell, yes, it's affected by the same issue. Maybe nobody
> noticed because 1) it only happens for CHECKSUM_COMPLETE and 2) only the
> adj-transport checksum mode for ILA is affected. That mode doesn't look
> covered in selftests and isn't the one recommended to users.
>
> With ILA also affected, maybe passing the IPv6 flag to
> inet_proto_csum_replace_by_diff is a better solution. That way we could
> keep using csum_diff + l4_csum_replace.
I was able to reproduce the bug locally on an ILA setup with
adj-transport checksum mode. The tricky part was ensuring
inet_proto_csum_replace_by_diff is called while we're in
CHECKSUM_COMPLETE state, but I imagine that's easier with a proper NIC
doing checksum offload on a server.
Configuration on the target:
$ ip a show dev eth0
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 3333:0:0:1::e952/64 scope global
valid_lft forever preferred_lft forever
inet6 fd00:10:244:2::e952/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip ila add loc_match fd00:10:244:2 loc 3333:0:0:1 csum-mode adj-transport ident-type luid dev eth0
Then I hit [fd00:10:244:2::e952]:8000 with a server listening only on
[3333:0:0:1::e952]:8000. With the bug, the SYN packet is dropped with
SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed
skb->csum. pwru trace:
IFACE TUPLE FUNC
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ipv6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ip6_rcv_core
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) nf_hook_slow
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) inet_proto_csum_replace_by_diff
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_early_demux
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_route_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input_finish
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_protocol_deliver_rcu
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) raw6_local_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ipv6_raw_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) __skb_checksum_complete
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM)
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_head_state
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_data
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_free_head
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skbmem
With the fix, I can see inet_proto_csum_replace_by_diff is not touching
skb->csum and I get a SYN+ACK (after which it fails since I didn't
configure the whole ILA setup).
I'll send a patchset to fix both the ILA and BPF sides. Not sure which
tree I should send it to though?
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-05-22 19:01 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-20 22:06 [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes Paul Chaignon
2025-05-21 1:07 ` Alexei Starovoitov
2025-05-21 21:19 ` Paul Chaignon
2025-05-22 19:01 ` Paul Chaignon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox