From: Paul Chaignon <paul.chaignon@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
Date: Thu, 22 May 2025 21:01:17 +0200 [thread overview]
Message-ID: <aC90ffX6vpTPH5_F@mail.gmail.com> (raw)
In-Reply-To: <aC5DamQVPviBmNe5@mail.gmail.com>
On Wed, May 21, 2025 at 11:19:41PM +0200, Paul Chaignon wrote:
> On Tue, May 20, 2025 at 06:07:15PM -0700, Alexei Starovoitov wrote:
> > On Tue, May 20, 2025 at 3:06 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> > >
> > > In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
> > > things, update the L4 checksum after reverse SNATing IPv6 packets. That
> > > use case is however not currently supported and leads to invalid
> > > skb->csum values in some cases. This patch adds support for IPv6 address
> > > changes in bpf_l4_csum_update via a new flag.
> > >
> > > When calling bpf_l4_csum_replace in Cilium, it ends up calling
> > > inet_proto_csum_replace_by_diff:
> > >
> > > 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
> > > 2: __wsum diff, bool pseudohdr)
> > > 3: {
> > > 4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
> > > 5: csum_replace_by_diff(sum, diff);
> > > 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
> > > 7: skb->csum = ~csum_sub(diff, skb->csum);
> > > 8: } else if (pseudohdr) {
> > > 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
> > > 10: }
> > > 11: }
> > >
> > > The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
> > > updated one of the IPv6 addresses. The helper now updates the L4 header
> > > checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
> > >
> > > For an IPv6 packet, the updates of the IPv6 address and of the L4
> > > checksum will cancel each other. The checksums are set such that
> > > computing a checksum over the packet including its checksum will result
> > > in a sum of 0. So the same is true here when we update the L4 checksum
> > > on line 5. We'll update it as to cancel the previous IPv6 address
> > > update. Hence skb->csum should remain untouched in this case.
> >
> > Is ILA broken then?
> > net/ipv6/ila/ila_common.c is using
> > inet_proto_csum_replace_by_diff()
> >
> > or is it simply doing it differently?
>
> As far as I can tell, yes, it's affected by the same issue. Maybe nobody
> noticed because 1) it only happens for CHECKSUM_COMPLETE and 2) only the
> adj-transport checksum mode for ILA is affected. That mode doesn't look
> covered in selftests and isn't the one recommended to users.
>
> With ILA also affected, maybe passing the IPv6 flag to
> inet_proto_csum_replace_by_diff is a better solution. That way we could
> keep using csum_diff + l4_csum_replace.
I was able to reproduce the bug locally on an ILA setup with
adj-transport checksum mode. The tricky part was ensuring
inet_proto_csum_replace_by_diff is called while we're in
CHECKSUM_COMPLETE state, but I imagine that's easier with a proper NIC
doing checksum offload on a server.
Configuration on the target:
$ ip a show dev eth0
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 3333:0:0:1::e952/64 scope global
valid_lft forever preferred_lft forever
inet6 fd00:10:244:2::e952/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip ila add loc_match fd00:10:244:2 loc 3333:0:0:1 csum-mode adj-transport ident-type luid dev eth0
Then I hit [fd00:10:244:2::e952]:8000 with a server listening only on
[3333:0:0:1::e952]:8000. With the bug, the SYN packet is dropped with
SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed
skb->csum. pwru trace:
IFACE TUPLE FUNC
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ipv6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ip6_rcv_core
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) nf_hook_slow
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) inet_proto_csum_replace_by_diff
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_early_demux
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_route_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input_finish
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_protocol_deliver_rcu
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) raw6_local_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ipv6_raw_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) __skb_checksum_complete
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM)
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_head_state
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_data
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_free_head
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skbmem
With the fix, I can see inet_proto_csum_replace_by_diff is not touching
skb->csum and I get a SYN+ACK (after which it fails since I didn't
configure the whole ILA setup).
I'll send a patchset to fix both the ILA and BPF sides. Not sure which
tree I should send it to though?
next prev parent reply other threads:[~2025-05-22 19:01 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-20 22:06 [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes Paul Chaignon
2025-05-21 1:07 ` Alexei Starovoitov
2025-05-21 21:19 ` Paul Chaignon
2025-05-22 19:01 ` Paul Chaignon [this message]
2025-05-21 13:32 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aC90ffX6vpTPH5_F@mail.gmail.com \
--to=paul.chaignon@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.