From: Paul Chaignon <paul.chaignon@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
Date: Thu, 22 May 2025 21:01:17 +0200 [thread overview]
Message-ID: <aC90ffX6vpTPH5_F@mail.gmail.com> (raw)
In-Reply-To: <aC5DamQVPviBmNe5@mail.gmail.com>
On Wed, May 21, 2025 at 11:19:41PM +0200, Paul Chaignon wrote:
> On Tue, May 20, 2025 at 06:07:15PM -0700, Alexei Starovoitov wrote:
> > On Tue, May 20, 2025 at 3:06 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> > >
> > > In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
> > > things, update the L4 checksum after reverse SNATing IPv6 packets. That
> > > use case is however not currently supported and leads to invalid
> > > skb->csum values in some cases. This patch adds support for IPv6 address
> > > changes in bpf_l4_csum_update via a new flag.
> > >
> > > When calling bpf_l4_csum_replace in Cilium, it ends up calling
> > > inet_proto_csum_replace_by_diff:
> > >
> > > 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
> > > 2: __wsum diff, bool pseudohdr)
> > > 3: {
> > > 4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
> > > 5: csum_replace_by_diff(sum, diff);
> > > 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
> > > 7: skb->csum = ~csum_sub(diff, skb->csum);
> > > 8: } else if (pseudohdr) {
> > > 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
> > > 10: }
> > > 11: }
> > >
> > > The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
> > > updated one of the IPv6 addresses. The helper now updates the L4 header
> > > checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
> > >
> > > For an IPv6 packet, the updates of the IPv6 address and of the L4
> > > checksum will cancel each other. The checksums are set such that
> > > computing a checksum over the packet including its checksum will result
> > > in a sum of 0. So the same is true here when we update the L4 checksum
> > > on line 5. We'll update it as to cancel the previous IPv6 address
> > > update. Hence skb->csum should remain untouched in this case.
> >
> > Is ILA broken then?
> > net/ipv6/ila/ila_common.c is using
> > inet_proto_csum_replace_by_diff()
> >
> > or is it simply doing it differently?
>
> As far as I can tell, yes, it's affected by the same issue. Maybe nobody
> noticed because 1) it only happens for CHECKSUM_COMPLETE and 2) only the
> adj-transport checksum mode for ILA is affected. That mode doesn't look
> covered in selftests and isn't the one recommended to users.
>
> With ILA also affected, maybe passing the IPv6 flag to
> inet_proto_csum_replace_by_diff is a better solution. That way we could
> keep using csum_diff + l4_csum_replace.
I was able to reproduce the bug locally on an ILA setup with
adj-transport checksum mode. The tricky part was ensuring
inet_proto_csum_replace_by_diff is called while we're in
CHECKSUM_COMPLETE state, but I imagine that's easier with a proper NIC
doing checksum offload on a server.
Configuration on the target:
$ ip a show dev eth0
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 3333:0:0:1::e952/64 scope global
valid_lft forever preferred_lft forever
inet6 fd00:10:244:2::e952/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip ila add loc_match fd00:10:244:2 loc 3333:0:0:1 csum-mode adj-transport ident-type luid dev eth0
Then I hit [fd00:10:244:2::e952]:8000 with a server listening only on
[3333:0:0:1::e952]:8000. With the bug, the SYN packet is dropped with
SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed
skb->csum. pwru trace:
IFACE TUPLE FUNC
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ipv6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ip6_rcv_core
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) nf_hook_slow
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) inet_proto_csum_replace_by_diff
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_early_demux
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_route_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input_finish
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_protocol_deliver_rcu
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) raw6_local_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ipv6_raw_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) __skb_checksum_complete
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM)
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_head_state
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_data
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_free_head
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skbmem
With the fix, I can see inet_proto_csum_replace_by_diff is not touching
skb->csum and I get a SYN+ACK (after which it fails since I didn't
configure the whole ILA setup).
I'll send a patchset to fix both the ILA and BPF sides. Not sure which
tree I should send it to though?
prev parent reply other threads:[~2025-05-22 19:01 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-20 22:06 [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes Paul Chaignon
2025-05-21 1:07 ` Alexei Starovoitov
2025-05-21 21:19 ` Paul Chaignon
2025-05-22 19:01 ` Paul Chaignon [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aC90ffX6vpTPH5_F@mail.gmail.com \
--to=paul.chaignon@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox