public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Paul Chaignon <paul.chaignon@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes
Date: Thu, 22 May 2025 21:01:17 +0200	[thread overview]
Message-ID: <aC90ffX6vpTPH5_F@mail.gmail.com> (raw)
In-Reply-To: <aC5DamQVPviBmNe5@mail.gmail.com>

On Wed, May 21, 2025 at 11:19:41PM +0200, Paul Chaignon wrote:
> On Tue, May 20, 2025 at 06:07:15PM -0700, Alexei Starovoitov wrote:
> > On Tue, May 20, 2025 at 3:06 PM Paul Chaignon <paul.chaignon@gmail.com> wrote:
> > >
> > > In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
> > > things, update the L4 checksum after reverse SNATing IPv6 packets. That
> > > use case is however not currently supported and leads to invalid
> > > skb->csum values in some cases. This patch adds support for IPv6 address
> > > changes in bpf_l4_csum_update via a new flag.
> > >
> > > When calling bpf_l4_csum_replace in Cilium, it ends up calling
> > > inet_proto_csum_replace_by_diff:
> > >
> > >     1:  void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
> > >     2:                                       __wsum diff, bool pseudohdr)
> > >     3:  {
> > >     4:      if (skb->ip_summed != CHECKSUM_PARTIAL) {
> > >     5:          csum_replace_by_diff(sum, diff);
> > >     6:          if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
> > >     7:              skb->csum = ~csum_sub(diff, skb->csum);
> > >     8:      } else if (pseudohdr) {
> > >     9:          *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
> > >     10:     }
> > >     11: }
> > >
> > > The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
> > > updated one of the IPv6 addresses. The helper now updates the L4 header
> > > checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
> > >
> > > For an IPv6 packet, the updates of the IPv6 address and of the L4
> > > checksum will cancel each other. The checksums are set such that
> > > computing a checksum over the packet including its checksum will result
> > > in a sum of 0. So the same is true here when we update the L4 checksum
> > > on line 5. We'll update it as to cancel the previous IPv6 address
> > > update. Hence skb->csum should remain untouched in this case.
> > 
> > Is ILA broken then?
> > net/ipv6/ila/ila_common.c is using
> > inet_proto_csum_replace_by_diff()
> > 
> > or is it simply doing it differently?
> 
> As far as I can tell, yes, it's affected by the same issue. Maybe nobody
> noticed because 1) it only happens for CHECKSUM_COMPLETE and 2) only the
> adj-transport checksum mode for ILA is affected. That mode doesn't look
> covered in selftests and isn't the one recommended to users.
> 
> With ILA also affected, maybe passing the IPv6 flag to
> inet_proto_csum_replace_by_diff is a better solution. That way we could
> keep using csum_diff + l4_csum_replace.

I was able to reproduce the bug locally on an ILA setup with
adj-transport checksum mode. The tricky part was ensuring
inet_proto_csum_replace_by_diff is called while we're in
CHECKSUM_COMPLETE state, but I imagine that's easier with a proper NIC
doing checksum offload on a server.

Configuration on the target:

$ ip a show dev eth0
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 3333:0:0:1::e952/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fd00:10:244:2::e952/128 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
$ ip ila add loc_match fd00:10:244:2 loc 3333:0:0:1 csum-mode adj-transport ident-type luid dev eth0

Then I hit [fd00:10:244:2::e952]:8000 with a server listening only on
[3333:0:0:1::e952]:8000. With the bug, the SYN packet is dropped with
SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed
skb->csum. pwru trace:

  IFACE   TUPLE                                                        FUNC
  eth0:9  [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp)  ipv6_rcv
  eth0:9  [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp)  ip6_rcv_core
  eth0:9  [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp)  nf_hook_slow
  eth0:9  [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp)  inet_proto_csum_replace_by_diff
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     tcp_v6_early_demux
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     ip6_route_input
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     ip6_input
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     ip6_input_finish
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     ip6_protocol_deliver_rcu
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     raw6_local_deliver
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     ipv6_raw_deliver
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     tcp_v6_rcv
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     __skb_checksum_complete
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM)
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     skb_release_head_state
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     skb_release_data
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     skb_free_head
  eth0:9  [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp)     kfree_skbmem

With the fix, I can see inet_proto_csum_replace_by_diff is not touching
skb->csum and I get a SYN+ACK (after which it fails since I didn't
configure the whole ILA setup).

I'll send a patchset to fix both the ILA and BPF sides. Not sure which
tree I should send it to though?


      reply	other threads:[~2025-05-22 19:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-20 22:06 [RFC PATCH bpf-next] bpf: Support L4 csum update for IPv6 address changes Paul Chaignon
2025-05-21  1:07 ` Alexei Starovoitov
2025-05-21 21:19   ` Paul Chaignon
2025-05-22 19:01     ` Paul Chaignon [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aC90ffX6vpTPH5_F@mail.gmail.com \
    --to=paul.chaignon@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox