From: Stanislav Fomichev <sdf.kernel@gmail.com>
To: Mahe Tardy <mahe.tardy@gmail.com>
Cc: bpf@vger.kernel.org, andrii@kernel.org, ast@kernel.org,
daniel@iogearbox.net, john.fastabend@gmail.com, jordan@jrife.io,
martin.lau@linux.dev, yonghong.song@linux.dev,
emil@etsalapatis.com, netdev@vger.kernel.org,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
davem@davemloft.net, horms@kernel.org
Subject: Re: [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc
Date: Thu, 25 Jun 2026 09:24:59 -0700
Message-ID: <aj1V2ZdzY1EGtsma@devvm7509.cco0.facebook.com>
In-Reply-To: <20260625110321.28236-2-mahe.tardy@gmail.com>
On 06/25, Mahe Tardy wrote:
> This is needed in the context of Tetragon to provide improved feedback
> (in contrast to just dropping packets) to east-west traffic when blocked
> by policies using cgroup_skb programs.
>
> This reuses concepts from netfilter reject target codepath with the
> differences that:
> * Packets are cloned since the BPF user can still let the packet pass
> (SK_PASS from the cgroup_skb progs for example) and the current skb
> needs to stay untouched (cgroup_skb hooks only allow read-only skb
> payload).
> * We protect against recursion since the kfunc, by generating an ICMP
> error message, could retrigger the BPF prog that invoked it.
>
> Only ICMP_DEST_UNREACH and ICMPV6_DEST_UNREACH are currently supported.
> The interface accepts a type parameter to facilitate future extension to
> other ICMP control message types.
>
> For normal cgroup_skb paths, the skb dst route should already be set.
> However, bpf_prog_test_run_skb can create synthetic IPv4 skbs without an
> attached route. In that case, icmp_send returns early, and the kfunc
> would otherwise report success despite no ICMP reply being sent. The
> check also rejects metadata dsts, which are not valid struct rtable
> instances. For IPv6, reject metadata dsts only: icmpv6_send can reach
> icmp6_dev, where skb_rt6_info treats any non-NULL skb dst as a struct
> rt6_info, which is not valid for metadata_dst.
>
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Reviewed-by: Jordan Rife <jordan@jrife.io>
> Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> ---
> net/core/filter.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 95 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2e96b4b847ce..0a0191586b44 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -84,6 +84,9 @@
> #include <linux/un.h>
> #include <net/xdp_sock_drv.h>
> #include <net/inet_dscp.h>
> +#include <linux/icmpv6.h>
> +#include <net/icmp.h>
> +#include <net/ip6_route.h>
>
> #include "dev.h"
>
> @@ -12546,6 +12549,88 @@ __bpf_kfunc int bpf_xdp_pull_data(struct xdp_md *x, u32 len)
> return 0;
> }
>
> +/**
> + * bpf_icmp_send - Send an ICMP control message
> + * @skb_ctx: Packet that triggered the control message
> + * @type: ICMP type (only ICMP_DEST_UNREACH/ICMPV6_DEST_UNREACH supported)
> + * @code: ICMP code (0-15 except ICMP_FRAG_NEEDED for IPv4, 0-6 for IPv6)
> + *
> + * Sends an ICMP control message in response to the packet. The original packet
> + * is cloned before sending the ICMP message, so the BPF program can still let
> + * the packet pass if desired.
> + *
> + * Currently only ICMP_DEST_UNREACH (IPv4) and ICMPV6_DEST_UNREACH (IPv6) are
> + * supported.
> + *
> + * Return: 0 on success (send attempt), negative error code on failure:
> + * -EBUSY: Recursion detected
> + * -EPROTONOSUPPORT: Non-IP protocol
> + * -EOPNOTSUPP: Unsupported ICMP type
> + * -EINVAL: Invalid code parameter
> + * -ENETUNREACH: No usable route/dst for the ICMP reply
> + * -ENOMEM: Memory allocation failed
> + */
> +__bpf_kfunc int bpf_icmp_send(struct __sk_buff *skb_ctx, int type, int code)
> +{
> +	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> +	struct sk_buff *nskb;
> +	struct sock *sk;
> +
> +	sk = skb_to_full_sk(skb);
> +	if (sk && sk->sk_kern_sock &&
> +	    (sk->sk_protocol == IPPROTO_ICMP || sk->sk_protocol == IPPROTO_ICMPV6))
> +		return -EBUSY;
> +
> +	switch (skb->protocol) {
> +#if IS_ENABLED(CONFIG_INET)
> +	case htons(ETH_P_IP): {
> +		if (type != ICMP_DEST_UNREACH)
> +			return -EOPNOTSUPP;
> +		if (code < 0 || code > NR_ICMP_UNREACH ||
> +		    code == ICMP_FRAG_NEEDED) /* needs a valid next-hop MTU */
> +			return -EINVAL;
> +
> +		/* icmp_send expects skb_dst to be a real rtable. */
> +		if (!skb_valid_dst(skb))
> +			return -ENETUNREACH;
> +
> +		nskb = skb_clone(skb, GFP_ATOMIC);
> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		memset(IPCB(nskb), 0, sizeof(*IPCB(nskb)));
> +		icmp_send(nskb, type, code, 0);
> +		consume_skb(nskb);
> +		break;
> +	}
> +#endif
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case htons(ETH_P_IPV6):
> +		if (type != ICMPV6_DEST_UNREACH)
> +			return -EOPNOTSUPP;
> +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> +			return -EINVAL;
[..]
> +		/* icmpv6_send may treat skb_dst as rt6_info. */
> +		if (skb_metadata_dst(skb))
> +			return -ENETUNREACH;
I'm a bit confused about this. Which part of icmpv6_send treats skb_dst as
rt6_info? (I see the original sashiko report about dst, but icmp6 does not
seem to require it.)
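
To make the v4/v6 difference concrete, here is a rough userspace model of
the argument and dst checks as quoted in the patch. It is only a sketch,
not kernel code: every model_* name is hypothetical, the MODEL_* constants
mirror the uapi values from linux/icmp.h and linux/icmpv6.h, the recursion
guard and the clone/send step are left out, and the -EPROTONOSUPPORT
default is assumed from the kernel-doc comment since that part of the
switch is not quoted here.

```c
#include <errno.h>

/* Values mirrored from the uapi headers, redefined so the sketch builds anywhere. */
#define MODEL_ICMP_DEST_UNREACH    3
#define MODEL_ICMP_FRAG_NEEDED     4
#define MODEL_NR_ICMP_UNREACH      15
#define MODEL_ICMPV6_DEST_UNREACH  1
#define MODEL_ICMPV6_REJECT_ROUTE  6

/* Hypothetical stand-ins for skb->protocol and the state of skb_dst(). */
enum model_proto { MODEL_PROTO_IPV4, MODEL_PROTO_IPV6, MODEL_PROTO_OTHER };
enum model_dst { MODEL_DST_NONE, MODEL_DST_ROUTE, MODEL_DST_METADATA };

/* Returns 0 where the patch would go on to clone and send, or a negative errno. */
static int model_icmp_send_checks(enum model_proto proto, enum model_dst dst,
				  int type, int code)
{
	switch (proto) {
	case MODEL_PROTO_IPV4:
		if (type != MODEL_ICMP_DEST_UNREACH)
			return -EOPNOTSUPP;
		if (code < 0 || code > MODEL_NR_ICMP_UNREACH ||
		    code == MODEL_ICMP_FRAG_NEEDED)
			return -EINVAL;
		/* Models !skb_valid_dst(): no dst, or a metadata dst. */
		if (dst != MODEL_DST_ROUTE)
			return -ENETUNREACH;
		return 0;
	case MODEL_PROTO_IPV6:
		if (type != MODEL_ICMPV6_DEST_UNREACH)
			return -EOPNOTSUPP;
		if (code < 0 || code > MODEL_ICMPV6_REJECT_ROUTE)
			return -EINVAL;
		/* Models skb_metadata_dst(): only a metadata dst is rejected. */
		if (dst == MODEL_DST_METADATA)
			return -ENETUNREACH;
		return 0;
	default:
		/* Assumed from the kernel-doc: non-IP protocol. */
		return -EPROTONOSUPPORT;
	}
}
```

In this model an IPv6 skb with no dst at all passes the checks, while the
same situation on IPv4 returns -ENETUNREACH. The only IPv6 case that is
turned away is the metadata dst, which is the check I am asking about.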