From: Mahe Tardy <mahe.tardy@gmail.com>
To: Stanislav Fomichev <sdf.kernel@gmail.com>
Cc: bpf@vger.kernel.org, andrii@kernel.org, ast@kernel.org,
daniel@iogearbox.net, john.fastabend@gmail.com, jordan@jrife.io,
martin.lau@linux.dev, yonghong.song@linux.dev,
emil@etsalapatis.com, netdev@vger.kernel.org,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
davem@davemloft.net, horms@kernel.org
Subject: Re: [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc
Date: Thu, 25 Jun 2026 18:49:03 +0200 [thread overview]
Message-ID: <aj1b_z6h5xn42Hxe@gmail.com> (raw)
In-Reply-To: <aj1V2ZdzY1EGtsma@devvm7509.cco0.facebook.com>
On Thu, Jun 25, 2026 at 09:24:59AM -0700, Stanislav Fomichev wrote:
> On 06/25, Mahe Tardy wrote:
[...]
> > +__bpf_kfunc int bpf_icmp_send(struct __sk_buff *skb_ctx, int type, int code)
> > +{
> > + struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> > + struct sk_buff *nskb;
> > + struct sock *sk;
> > +
> > + sk = skb_to_full_sk(skb);
> > + if (sk && sk->sk_kern_sock &&
> > + (sk->sk_protocol == IPPROTO_ICMP || sk->sk_protocol == IPPROTO_ICMPV6))
> > + return -EBUSY;
> > +
> > + switch (skb->protocol) {
> > +#if IS_ENABLED(CONFIG_INET)
> > + case htons(ETH_P_IP): {
> > + if (type != ICMP_DEST_UNREACH)
> > + return -EOPNOTSUPP;
> > + if (code < 0 || code > NR_ICMP_UNREACH ||
> > + code == ICMP_FRAG_NEEDED) /* needs a valid next-hop MTU */
> > + return -EINVAL;
> > +
> > + /* icmp_send expects skb_dst to be a real rtable. */
> > + if (!skb_valid_dst(skb))
> > + return -ENETUNREACH;
> > +
> > + nskb = skb_clone(skb, GFP_ATOMIC);
> > + if (!nskb)
> > + return -ENOMEM;
> > +
> > + memset(IPCB(nskb), 0, sizeof(*IPCB(nskb)));
> > + icmp_send(nskb, type, code, 0);
> > + consume_skb(nskb);
> > + break;
> > + }
> > +#endif
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + case htons(ETH_P_IPV6):
> > + if (type != ICMPV6_DEST_UNREACH)
> > + return -EOPNOTSUPP;
> > + if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > + return -EINVAL;
>
> [..]
>
> > + /* icmpv6_send may treat skb_dst as rt6_info. */
> > + if (skb_metadata_dst(skb))
> > + return -ENETUNREACH;
>
> A bit confused about this. Which part of icmpv6_send treats skb_dst as rt6_info?
> (I see the original sashiko report about dst, but icmp6 seems to be not
> requiring it)
Yeah I was also a bit confused because this came out of nowhere as soon
as I put the skb_valid_dst only on the IPv4 path (for different
reasons), but there is actually a potential trace in which we have type
confusion indeed:
- icmp6_send() checks scoped source addresses and calls icmp6_iif() at net/ipv6/icmp.c:702
- icmp6_iif() calls icmp6_dev() at net/ipv6/icmp.c:441
- icmp6_dev() does skb_rt6_info(skb) for loopback/L3 master devices at net/ipv6/icmp.c:428
- skb_rt6_info() casts any non-NULL dst to struct rt6_info at include/net/ip6_route.h:233
- rt6->rt6i_idev is then dereferenced at net/ipv6/icmp.c:434
When checking with pahole, we can find this on my local kernel:
struct rt6_info {
struct dst_entry dst; /* 0 136 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct fib6_info * from; /* 136 8 */
int sernum; /* 144 4 */
struct rt6key rt6i_dst; /* 148 20 */
struct rt6key rt6i_src; /* 168 20 */
struct in6_addr rt6i_gateway; /* 188 16 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
struct inet6_dev * rt6i_idev; /* 208 8 */ <--- we dereference this
u32 rt6i_flags; /* 216 4 */
short unsigned int rt6i_nfheader_len; /* 220 2 */
/* size: 224, cachelines: 4, members: 9 */
/* sum members: 218, holes: 1, sum holes: 4 */
/* padding: 2 */
/* last cacheline: 32 bytes */
};
And the metadata_dst would look like this:
struct metadata_dst {
struct dst_entry dst; /* 0 136 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
enum metadata_type type; /* 136 4 */
/* XXX 4 bytes hole, try to pack */
union {
struct ip_tunnel_info tun_info; /* 144 96 */
struct hw_port_info port_info; /* 144 16 */
struct macsec_info macsec_info; /* 144 8 */
struct xfrm_md_info xfrm_info; /* 144 16 */
} u; /* 144 96 */ <--- we land on this union
/* size: 240, cachelines: 4, members: 3 */
/* sum members: 236, holes: 1, sum holes: 4 */
/* last cacheline: 48 bytes */
};
Let's say it's a struct ip_tunnel_info:
struct ip_tunnel_info {
struct ip_tunnel_key key; /* 0 64 */
/* XXX last struct has 7 bytes of padding */
/* --- cacheline 1 boundary (64 bytes) --- */
struct ip_tunnel_encap encap; /* 64 8 */ <--- 144 + 64 = 208 we land here
struct dst_cache dst_cache; /* 72 16 */
u8 options_len; /* 88 1 */
u8 mode; /* 89 1 */
/* size: 96, cachelines: 2, members: 5 */
/* padding: 6 */
/* paddings: 1, sum paddings: 7 */
/* last cacheline: 32 bytes */
};
So I imagine this is fairly tricky to trigger but still a case of type
confusion. I have actually no idea how likely this can happen from my
call but the trace makes sense at least.
next prev parent reply other threads:[~2026-06-25 16:49 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 11:03 [PATCH bpf-next v10 0/5] bpf: add icmp_send kfunc Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc Mahe Tardy
2026-06-25 16:24 ` Stanislav Fomichev
2026-06-25 16:49 ` Mahe Tardy [this message]
2026-06-25 11:03 ` [PATCH bpf-next v10 2/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb tests Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 3/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb IPv6 tests Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 4/5] selftests/bpf: add bpf_icmp_send recursion test Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 5/5] selftests/bpf: add bpf_icmp_send no route test Mahe Tardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aj1b_z6h5xn42Hxe@gmail.com \
--to=mahe.tardy@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=emil@etsalapatis.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jordan@jrife.io \
--cc=kuba@kernel.org \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf.kernel@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.