Re: [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mahe Tardy <mahe.tardy@gmail.com>
To: Stanislav Fomichev <sdf.kernel@gmail.com>
Cc: bpf@vger.kernel.org, andrii@kernel.org, ast@kernel.org,
	daniel@iogearbox.net, john.fastabend@gmail.com, jordan@jrife.io,
	martin.lau@linux.dev, yonghong.song@linux.dev,
	emil@etsalapatis.com, netdev@vger.kernel.org,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	davem@davemloft.net, horms@kernel.org
Subject: Re: [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc
Date: Thu, 25 Jun 2026 18:49:03 +0200	[thread overview]
Message-ID: <aj1b_z6h5xn42Hxe@gmail.com> (raw)
In-Reply-To: <aj1V2ZdzY1EGtsma@devvm7509.cco0.facebook.com>

On Thu, Jun 25, 2026 at 09:24:59AM -0700, Stanislav Fomichev wrote:
> On 06/25, Mahe Tardy wrote:

[...]

> > +__bpf_kfunc int bpf_icmp_send(struct __sk_buff *skb_ctx, int type, int code)
> > +{
> > +	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> > +	struct sk_buff *nskb;
> > +	struct sock *sk;
> > +
> > +	sk = skb_to_full_sk(skb);
> > +	if (sk && sk->sk_kern_sock &&
> > +	    (sk->sk_protocol == IPPROTO_ICMP || sk->sk_protocol == IPPROTO_ICMPV6))
> > +		return -EBUSY;
> > +
> > +	switch (skb->protocol) {
> > +#if IS_ENABLED(CONFIG_INET)
> > +	case htons(ETH_P_IP): {
> > +		if (type != ICMP_DEST_UNREACH)
> > +			return -EOPNOTSUPP;
> > +		if (code < 0 || code > NR_ICMP_UNREACH ||
> > +		    code == ICMP_FRAG_NEEDED) /* needs a valid next-hop MTU */
> > +			return -EINVAL;
> > +
> > +		/* icmp_send expects skb_dst to be a real rtable. */
> > +		if (!skb_valid_dst(skb))
> > +			return -ENETUNREACH;
> > +
> > +		nskb = skb_clone(skb, GFP_ATOMIC);
> > +		if (!nskb)
> > +			return -ENOMEM;
> > +
> > +		memset(IPCB(nskb), 0, sizeof(*IPCB(nskb)));
> > +		icmp_send(nskb, type, code, 0);
> > +		consume_skb(nskb);
> > +		break;
> > +	}
> > +#endif
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +	case htons(ETH_P_IPV6):
> > +		if (type != ICMPV6_DEST_UNREACH)
> > +			return -EOPNOTSUPP;
> > +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > +			return -EINVAL;
> 
> [..]
> 
> > +		/* icmpv6_send may treat skb_dst as rt6_info. */
> > +		if (skb_metadata_dst(skb))
> > +			return -ENETUNREACH;
> 
> A bit confused about this. Which part of icmpv6_send treats skb_dst as rt6_info?
> (I see the original sashiko report about dst, but icmp6 seems to be not
> requiring it)

Yeah I was also a bit confused because this came out of nowhere as soon
as I put the skb_valid_dst only on the IPv4 path (for different
reasons), but there is actually a potential trace in which we have type
confusion indeed:

- icmp6_send() checks scoped source addresses and calls icmp6_iif() at net/ipv6/icmp.c:702
- icmp6_iif() calls icmp6_dev() at net/ipv6/icmp.c:441
- icmp6_dev() does skb_rt6_info(skb) for loopback/L3 master devices at net/ipv6/icmp.c:428
- skb_rt6_info() casts any non-NULL dst to struct rt6_info at include/net/ip6_route.h:233
- rt6->rt6i_idev is then dereferenced at net/ipv6/icmp.c:434

When checking with pahole, we can find this on my local kernel:

struct rt6_info {
	struct dst_entry           dst;                  /*     0   136 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	struct fib6_info *         from;                 /*   136     8 */
	int                        sernum;               /*   144     4 */
	struct rt6key              rt6i_dst;             /*   148    20 */
	struct rt6key              rt6i_src;             /*   168    20 */
	struct in6_addr            rt6i_gateway;         /*   188    16 */

	/* XXX 4 bytes hole, try to pack */

	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */
	struct inet6_dev *         rt6i_idev;            /*   208     8 */  <--- we dereference this
	u32                        rt6i_flags;           /*   216     4 */
	short unsigned int         rt6i_nfheader_len;    /*   220     2 */

	/* size: 224, cachelines: 4, members: 9 */
	/* sum members: 218, holes: 1, sum holes: 4 */
	/* padding: 2 */
	/* last cacheline: 32 bytes */
};

And the metadata_dst would look like this:

struct metadata_dst {
	struct dst_entry           dst;                  /*     0   136 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	enum metadata_type         type;                 /*   136     4 */

	/* XXX 4 bytes hole, try to pack */

	union {
		struct ip_tunnel_info tun_info;          /*   144    96 */
		struct hw_port_info port_info;           /*   144    16 */
		struct macsec_info macsec_info;          /*   144     8 */
		struct xfrm_md_info xfrm_info;           /*   144    16 */
	} u;                                             /*   144    96 */  <--- we land on this union

	/* size: 240, cachelines: 4, members: 3 */
	/* sum members: 236, holes: 1, sum holes: 4 */
	/* last cacheline: 48 bytes */
};

Let's say it's a struct ip_tunnel_info:

struct ip_tunnel_info {
	struct ip_tunnel_key       key;                  /*     0    64 */

	/* XXX last struct has 7 bytes of padding */

	/* --- cacheline 1 boundary (64 bytes) --- */
	struct ip_tunnel_encap     encap;                /*    64     8 */  <--- 144 + 64 = 208 we land here
	struct dst_cache           dst_cache;            /*    72    16 */
	u8                         options_len;          /*    88     1 */
	u8                         mode;                 /*    89     1 */

	/* size: 96, cachelines: 2, members: 5 */
	/* padding: 6 */
	/* paddings: 1, sum paddings: 7 */
	/* last cacheline: 32 bytes */
};

So I imagine this is fairly tricky to trigger but still a case of type
confusion. I have actually no idea how likely this can happen from my
call but the trace makes sense at least.

next prev parent reply	other threads:[~2026-06-25 16:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 11:03 [PATCH bpf-next v10 0/5] bpf: add icmp_send kfunc Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc Mahe Tardy
2026-06-25 16:24   ` Stanislav Fomichev
2026-06-25 16:49     ` Mahe Tardy [this message]
2026-06-25 11:03 ` [PATCH bpf-next v10 2/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb tests Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 3/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb IPv6 tests Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 4/5] selftests/bpf: add bpf_icmp_send recursion test Mahe Tardy
2026-06-25 11:03 ` [PATCH bpf-next v10 5/5] selftests/bpf: add bpf_icmp_send no route test Mahe Tardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aj1b_z6h5xn42Hxe@gmail.com \
    --to=mahe.tardy@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=emil@etsalapatis.com \
    --cc=horms@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jordan@jrife.io \
    --cc=kuba@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf.kernel@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.