bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc
@ 2025-07-10 10:26 Mahe Tardy
  2025-07-10 10:26 ` [PATCH bpf-next v1 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-10 10:26 UTC (permalink / raw)
  To: bpf; +Cc: martin.lau, daniel, john.fastabend, ast, andrii, Mahe Tardy

Hello,

This is v1 of adding the icmp_send_unreach kfunc, as suggested during
LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
actively reject east-west traffic, similarly to what is possible to do
with netfilter reject target.

The first step to implement this is using ICMP control messages, with
the ICMP_DEST_UNREACH type with various code ICMP_NET_UNREACH,
ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
than a TCP RST reply and will already hint the client TCP stack to abort
the connection and not retry extensively.

Note that this is different than the sock_destroy kfunc, that along
calls tcp_abort and thus sends a reset, destroying the underlying
socket.

Caveats of this design are that a cgroup_skb program can call this
function N times, and thus send N ICMP unreach control messages, and
that the program can also finish the BPF filter with SK_PASS leading to
a potential confusing situation where the TCP connection was established
while the client receive ICMP_DEST_UNREACH messages.

Other design ideas (to prevent above issues) could be:
* Extend the return codes for the cgroup_skb program to trigger the
  reject after completion (SK_REJECT).
* Adding a kfunc to set the kernel to send an ICMP_HOST_UNREACH control
  message with appropriate code when the cgroup_skb program eventually
  terminates with SK_DROP.

We should bear in mind that we want to extend this with TCP reset next.
Please tell me what's your opinion on above ideas: if adding new return
codes could be considered and/or the other alternatives would be better
than this patch series and thus proposed instead.

v1 updates (from Daniel's offline review):
- rename netfilter moved functions to ip(6)_route_reply_fetch_dst;
- explain why nf_ip(6)_route are replaced with core route functions;
- remove useless IP frag checks;
- add KF_TRUSTED_ARGS to the kfunc;
- simplify declarations of structs, initialize fd to -1 in tests;
- insist on why SK_PASS is easier for testing in BPF prog.

[^1]: https://lwn.net/Articles/1022034/

Mahe Tardy (4):
  net: move netfilter nf_reject_fill_skb_dst to core ipv4
  net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  selftests/bpf: add icmp_send_unreach kfunc tests

 include/net/ip6_route.h                       |  2 +
 include/net/route.h                           |  1 +
 net/core/filter.c                             | 63 +++++++++++-
 net/ipv4/netfilter/nf_reject_ipv4.c           | 19 +---
 net/ipv4/route.c                              | 15 +++
 net/ipv6/netfilter/nf_reject_ipv6.c           | 17 +---
 net/ipv6/route.c                              | 18 ++++
 .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 96 +++++++++++++++++++
 .../selftests/bpf/progs/icmp_send_unreach.c   | 35 +++++++
 9 files changed, 232 insertions(+), 34 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c

--
2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v1 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4
  2025-07-10 10:26 [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
@ 2025-07-10 10:26 ` Mahe Tardy
  2025-07-10 10:26 ` [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-10 10:26 UTC (permalink / raw)
  To: bpf; +Cc: martin.lau, daniel, john.fastabend, ast, andrii, Mahe Tardy

Move and rename nf_reject_fill_skb_dst from
ipv4/netfilter/nf_reject_ipv4 to ip_route_reply_fetch_dst in
ipv4/route.c so that it can be reused in the following patches by BPF
kfuncs.

Netfilter uses nf_ip_route that is almost a transparent wrapper around
ip_route_output_key so this patch inlines it.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 include/net/route.h                 |  1 +
 net/ipv4/netfilter/nf_reject_ipv4.c | 19 ++-----------------
 net/ipv4/route.c                    | 15 +++++++++++++++
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 8e39aa822cf9..1f032f768d52 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -173,6 +173,7 @@ struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
 				    const struct sock *sk);
 struct dst_entry *ipv4_blackhole_route(struct net *net,
 				       struct dst_entry *dst_orig);
+int ip_route_reply_fetch_dst(struct sk_buff *skb);

 static inline struct rtable *ip_route_output_key(struct net *net, struct flowi4 *flp)
 {
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 87fd945a0d27..76beb78f556a 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -220,21 +220,6 @@ void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip_tcphdr_put);

-static int nf_reject_fill_skb_dst(struct sk_buff *skb_in)
-{
-	struct dst_entry *dst = NULL;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof(struct flowi));
-	fl.u.ip4.daddr = ip_hdr(skb_in)->saddr;
-	nf_ip_route(dev_net(skb_in->dev), &dst, &fl, false);
-	if (!dst)
-		return -1;
-
-	skb_dst_set(skb_in, dst);
-	return 0;
-}
-
 /* Send RST reply */
 void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		   int hook)
@@ -248,7 +233,7 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		return;

 	if ((hook == NF_INET_PRE_ROUTING || hook == NF_INET_INGRESS) &&
-	    nf_reject_fill_skb_dst(oldskb) < 0)
+	    ip_route_reply_fetch_dst(oldskb) < 0)
 		return;

 	if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
@@ -322,7 +307,7 @@ void nf_send_unreach(struct sk_buff *skb_in, int code, int hook)
 		return;

 	if ((hook == NF_INET_PRE_ROUTING || hook == NF_INET_INGRESS) &&
-	    nf_reject_fill_skb_dst(skb_in) < 0)
+	    ip_route_reply_fetch_dst(skb_in) < 0)
 		return;

 	if (skb_csum_unnecessary(skb_in) ||
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fccb05fb3a79..59b8fc3c01c0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2934,6 +2934,21 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);

+int ip_route_reply_fetch_dst(struct sk_buff *skb)
+{
+	struct rtable *rt;
+	struct flowi4 fl4 = {
+		.daddr = ip_hdr(skb)->saddr
+	};
+
+	rt = ip_route_output_key(dev_net(skb->dev), &fl4);
+	if (IS_ERR(rt))
+		return PTR_ERR(rt);
+	skb_dst_set(skb, &rt->dst);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ip_route_reply_fetch_dst);
+
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
 			struct rtable *rt, u32 table_id, dscp_t dscp,
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  2025-07-10 10:26 [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-10 10:26 ` [PATCH bpf-next v1 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
@ 2025-07-10 10:26 ` Mahe Tardy
  2025-07-10 22:02   ` kernel test robot
  2025-07-10 10:26 ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
  2025-07-10 10:26 ` [PATCH bpf-next v1 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  3 siblings, 1 reply; 36+ messages in thread
From: Mahe Tardy @ 2025-07-10 10:26 UTC (permalink / raw)
  To: bpf; +Cc: martin.lau, daniel, john.fastabend, ast, andrii, Mahe Tardy

Move and rename nf_reject6_fill_skb_dst from
ipv6/netfilter/nf_reject_ipv6 to ip6_route_reply_fetch_dst in
ipv6/route.c so that it can be reused in the following patches by BPF
kfuncs.

Netfilter uses nf_ip6_route that is almost a transparent wrapper around
ip6_route_outputy so this patch inlines it.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 include/net/ip6_route.h             |  2 ++
 net/ipv6/netfilter/nf_reject_ipv6.c | 17 +----------------
 net/ipv6/route.c                    | 18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 6dbdf60b342f..1426467df547 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -93,6 +93,8 @@ static inline struct dst_entry *ip6_route_output(struct net *net,
 	return ip6_route_output_flags(net, sk, fl6, 0);
 }

+int ip6_route_reply_fetch_dst(struct sk_buff *skb);
+
 /* Only conditionally release dst if flags indicates
  * !RT6_LOOKUP_F_DST_NOREF or dst is in uncached_list.
  */
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 9ae2b2725bf9..2553b0d43a0e 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -250,21 +250,6 @@ void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip6_tcphdr_put);

-static int nf_reject6_fill_skb_dst(struct sk_buff *skb_in)
-{
-	struct dst_entry *dst = NULL;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof(struct flowi));
-	fl.u.ip6.daddr = ipv6_hdr(skb_in)->saddr;
-	nf_ip6_route(dev_net(skb_in->dev), &dst, &fl, false);
-	if (!dst)
-		return -1;
-
-	skb_dst_set(skb_in, dst);
-	return 0;
-}
-
 void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		    int hook)
 {
@@ -398,7 +383,7 @@ void nf_send_unreach6(struct net *net, struct sk_buff *skb_in,
 		skb_in->dev = net->loopback_dev;

 	if ((hooknum == NF_INET_PRE_ROUTING || hooknum == NF_INET_INGRESS) &&
-	    nf_reject6_fill_skb_dst(skb_in) < 0)
+	    ip6_reply_fill_dst(skb_in) < 0)
 		return;

 	icmpv6_send(skb_in, ICMPV6_DEST_UNREACH, code, 0);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0143262094b0..f1120f6fe0a2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2705,6 +2705,24 @@ struct dst_entry *ip6_route_output_flags(struct net *net,
 }
 EXPORT_SYMBOL_GPL(ip6_route_output_flags);

+int ip6_route_reply_fetch_dst(struct sk_buff *skb)
+{
+	struct dst_entry *result;
+	struct flowi6 fl = {
+		.daddr = ipv6_hdr(skb)->saddr
+	};
+	int err;
+
+	result = ip6_route_output(dev_net(skb->dev), NULL, &fl);
+	err = result->error;
+	if (err)
+		dst_release(result);
+	else
+		skb_dst_set(skb, result);
+	return err;
+}
+EXPORT_SYMBOL_GPL(ip6_route_reply_fetch_dst);
+
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig)
 {
 	struct rt6_info *rt, *ort = dst_rt6_info(dst_orig);
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-10 10:26 [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-10 10:26 ` [PATCH bpf-next v1 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
  2025-07-10 10:26 ` [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
@ 2025-07-10 10:26 ` Mahe Tardy
  2025-07-10 16:07   ` Alexei Starovoitov
  2025-07-11  0:32   ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc kernel test robot
  2025-07-10 10:26 ` [PATCH bpf-next v1 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  3 siblings, 2 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-10 10:26 UTC (permalink / raw)
  To: bpf; +Cc: martin.lau, daniel, john.fastabend, ast, andrii, Mahe Tardy

This is needed in the context of Tetragon to provide improved feedback
(in contrast to just dropping packets) to east-west traffic when blocked
by policies using cgroup_skb programs.

This reuse concepts from netfilter reject target codepath with the
differences that:
* Packets are cloned since the BPF user can still return SK_PASS from
  the cgroup_skb progs and the current skb need to stay untouched
  (cgroup_skb hooks only allow read-only skb payload).
* Since cgroup_skb programs are called late in the stack, checksums do
  not need to be computed or verified, and IPv4 fragmentation does not
  need to be checked (ip_local_deliver should take care of that
  earlier).

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 net/core/filter.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index ab456bf1056e..9215f79e7690 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -85,6 +85,8 @@
 #include <linux/un.h>
 #include <net/xdp_sock_drv.h>
 #include <net/inet_dscp.h>
+#include <linux/icmp.h>
+#include <net/icmp.h>

 #include "dev.h"

@@ -12140,6 +12142,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
 	return 0;
 }

+__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
+{
+	struct sk_buff *skb = (struct sk_buff *)__skb;
+	struct sk_buff *nskb;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (code < 0 || code > NR_ICMP_UNREACH)
+			return -EINVAL;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		if (ip_route_reply_fetch_dst(nskb) < 0) {
+			kfree_skb(nskb);
+			return -EHOSTUNREACH;
+		}
+
+		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
+		kfree_skb(nskb);
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
+			return -EINVAL;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		if (ip6_route_reply_fetch_dst(nskb) < 0) {
+			kfree_skb(nskb);
+			return -EHOSTUNREACH;
+		}
+
+		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
+		kfree_skb(nskb);
+		break;
+#endif
+	default:
+		return -EPROTONOSUPPORT;
+	}
+
+	return 0;
+}
+
 __bpf_kfunc_end_defs();

 int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
@@ -12177,6 +12226,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
 BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp, KF_TRUSTED_ARGS)
 BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)

+BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send_unreach)
+BTF_ID_FLAGS(func, bpf_icmp_send_unreach, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send_unreach)
+
 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
 	.owner = THIS_MODULE,
 	.set = &bpf_kfunc_check_set_skb,
@@ -12202,6 +12255,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
 	.set = &bpf_kfunc_check_set_sock_ops,
 };

+static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send_unreach = {
+	.owner = THIS_MODULE,
+	.set = &bpf_kfunc_check_set_icmp_send_unreach,
+};
+
 static int __init bpf_kfunc_init(void)
 {
 	int ret;
@@ -12221,7 +12279,8 @@ static int __init bpf_kfunc_init(void)
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
-	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
+	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send_unreach);
 }
 late_initcall(bpf_kfunc_init);

--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v1 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-10 10:26 [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
                   ` (2 preceding siblings ...)
  2025-07-10 10:26 ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
@ 2025-07-10 10:26 ` Mahe Tardy
  3 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-10 10:26 UTC (permalink / raw)
  To: bpf; +Cc: martin.lau, daniel, john.fastabend, ast, andrii, Mahe Tardy

This test opens a server and client, attach a cgroup_skb program on
egress and calls the icmp_send_unreach function from the client egress
so that an ICMP unreach control message is sent back to the client.
It then fetches the message from the error queue to confirm the correct
ICMP unreach code has been sent.

Note that the BPF program returns SK_PASS to let the connection being
established to finish the test cases quicker. Otherwise, you have to
wait for the TCP three-way handshake to timeout in the kernel and
retrieve the errno translated from the unreach code set by the ICMP
control message.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 96 +++++++++++++++++++
 .../selftests/bpf/progs/icmp_send_unreach.c   | 35 +++++++
 2 files changed, 131 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
new file mode 100644
index 000000000000..2896d0619358
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <linux/errqueue.h>
+#include "icmp_send_unreach.skel.h"
+
+#define TIMEOUT_MS 1000
+#define SRV_PORT 54321
+
+#define ICMP_DEST_UNREACH 3
+
+#define ICMP_FRAG_NEEDED 4
+#define NR_ICMP_UNREACH 15
+
+static void read_icmp_errqueue(int sockfd, int expected_code)
+{
+	ssize_t n;
+	struct sock_extended_err *sock_err;
+	struct cmsghdr *cm;
+	char ctrl_buf[512];
+	struct msghdr msg = {
+		.msg_control = ctrl_buf,
+		.msg_controllen = sizeof(ctrl_buf),
+	};
+
+	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
+	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
+		return;
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if (!ASSERT_EQ(cm->cmsg_level, IPPROTO_IP, "cmsg_type") ||
+		    !ASSERT_EQ(cm->cmsg_type, IP_RECVERR, "cmsg_level"))
+			continue;
+
+		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
+
+		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
+			       "sock_err_origin_icmp"))
+			return;
+		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+			       "sock_err_type_dest_unreach"))
+			return;
+		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
+	}
+}
+
+void test_icmp_send_unreach_kfunc(void)
+{
+	struct icmp_send_unreach *skel;
+	int cgroup_fd = -1, client_fd = 1, srv_fd = -1;
+	int *code;
+
+	skel = icmp_send_unreach__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto cleanup;
+
+	skel->links.egress =
+		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
+		goto cleanup;
+
+	code = &skel->bss->unreach_code;
+
+	for (*code = 0; *code <= NR_ICMP_UNREACH; (*code)++) {
+		// The TCP stack reacts differently when asking for
+		// fragmentation, let's ignore it for now
+		if (*code == ICMP_FRAG_NEEDED)
+			continue;
+
+		srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1",
+				      SRV_PORT, TIMEOUT_MS);
+		if (!ASSERT_GE(srv_fd, 0, "start_server"))
+			goto for_cleanup;
+
+		client_fd = socket(AF_INET, SOCK_STREAM, 0);
+		ASSERT_GE(client_fd, 0, "client_socket");
+
+		client_fd = connect_to_fd(srv_fd, 0);
+		if (!ASSERT_GE(client_fd, 0, "client_connect"))
+			goto for_cleanup;
+
+		read_icmp_errqueue(client_fd, *code);
+
+for_cleanup:
+		close(client_fd);
+		close(srv_fd);
+	}
+
+cleanup:
+	icmp_send_unreach__destroy(skel);
+	close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send_unreach.c b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
new file mode 100644
index 000000000000..ec406028be5f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
+
+int unreach_code = 0;
+
+#define SERVER_PORT 54321
+#define SERVER_IP 0x7F000001
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
+	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	tcph = (void *)iph + iph->ihl * 4;
+	if ((void *)(tcph + 1) > data_end ||
+	    tcph->dest != bpf_htons(SERVER_PORT))
+		return SK_PASS;
+
+	bpf_icmp_send_unreach(skb, unreach_code);
+
+	/* returns SK_PASS to execute the test case quicker */
+	return SK_PASS;
+}
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-10 10:26 ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
@ 2025-07-10 16:07   ` Alexei Starovoitov
  2025-07-11 10:57     ` Mahe Tardy
  2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-11  0:32   ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc kernel test robot
  1 sibling, 2 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2025-07-10 16:07 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: bpf, Martin KaFai Lau, Daniel Borkmann, John Fastabend,
	Alexei Starovoitov, Andrii Nakryiko

On Thu, Jul 10, 2025 at 3:26 AM Mahe Tardy <mahe.tardy@gmail.com> wrote:
>
> This is needed in the context of Tetragon to provide improved feedback
> (in contrast to just dropping packets) to east-west traffic when blocked
> by policies using cgroup_skb programs.
>
> This reuse concepts from netfilter reject target codepath with the
> differences that:
> * Packets are cloned since the BPF user can still return SK_PASS from
>   the cgroup_skb progs and the current skb need to stay untouched
>   (cgroup_skb hooks only allow read-only skb payload).
> * Since cgroup_skb programs are called late in the stack, checksums do
>   not need to be computed or verified, and IPv4 fragmentation does not
>   need to be checked (ip_local_deliver should take care of that
>   earlier).
>
> Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> ---
>  net/core/filter.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index ab456bf1056e..9215f79e7690 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -85,6 +85,8 @@
>  #include <linux/un.h>
>  #include <net/xdp_sock_drv.h>
>  #include <net/inet_dscp.h>
> +#include <linux/icmp.h>
> +#include <net/icmp.h>
>
>  #include "dev.h"
>
> @@ -12140,6 +12142,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
>         return 0;
>  }
>
> +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> +{
> +       struct sk_buff *skb = (struct sk_buff *)__skb;
> +       struct sk_buff *nskb;
> +
> +       switch (skb->protocol) {
> +       case htons(ETH_P_IP):
> +               if (code < 0 || code > NR_ICMP_UNREACH)
> +                       return -EINVAL;
> +
> +               nskb = skb_clone(skb, GFP_ATOMIC);
> +               if (!nskb)
> +                       return -ENOMEM;
> +
> +               if (ip_route_reply_fetch_dst(nskb) < 0) {
> +                       kfree_skb(nskb);
> +                       return -EHOSTUNREACH;
> +               }
> +
> +               icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> +               kfree_skb(nskb);
> +               break;
> +#if IS_ENABLED(CONFIG_IPV6)
> +       case htons(ETH_P_IPV6):
> +               if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> +                       return -EINVAL;
> +
> +               nskb = skb_clone(skb, GFP_ATOMIC);
> +               if (!nskb)
> +                       return -ENOMEM;
> +
> +               if (ip6_route_reply_fetch_dst(nskb) < 0) {
> +                       kfree_skb(nskb);
> +                       return -EHOSTUNREACH;
> +               }
> +
> +               icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> +               kfree_skb(nskb);
> +               break;
> +#endif
> +       default:
> +               return -EPROTONOSUPPORT;
> +       }
> +
> +       return 0;
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> @@ -12177,6 +12226,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
>  BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp, KF_TRUSTED_ARGS)
>  BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)
>
> +BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send_unreach)
> +BTF_ID_FLAGS(func, bpf_icmp_send_unreach, KF_TRUSTED_ARGS)
> +BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send_unreach)
> +
>  static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
>         .owner = THIS_MODULE,
>         .set = &bpf_kfunc_check_set_skb,
> @@ -12202,6 +12255,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
>         .set = &bpf_kfunc_check_set_sock_ops,
>  };
>
> +static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send_unreach = {
> +       .owner = THIS_MODULE,
> +       .set = &bpf_kfunc_check_set_icmp_send_unreach,
> +};
> +
>  static int __init bpf_kfunc_init(void)
>  {
>         int ret;
> @@ -12221,7 +12279,8 @@ static int __init bpf_kfunc_init(void)
>         ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
>                                                &bpf_kfunc_set_sock_addr);
>         ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> -       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> +       ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> +       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send_unreach);

Does it have to be restricted to BPF_PROG_TYPE_CGROUP_SKB ?
Can it be a part of bpf_kfunc_set_skb[] and used more generally ?

If restriction is necessary then I guess we can live with extra
bpf_kfunc_set_icmp_send_unreach, though it's odd to create a set
just for one kfunc.
Either way don't change the last 'return ...' line in this file.
Add 'ret = ret ?: register...' instead to reduce churn.

Also cc netdev and netfilter maintainers in v2.

--
pw-bot: cr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  2025-07-10 10:26 ` [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
@ 2025-07-10 22:02   ` kernel test robot
  0 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2025-07-10 22:02 UTC (permalink / raw)
  To: Mahe Tardy, bpf
  Cc: oe-kbuild-all, martin.lau, daniel, john.fastabend, ast, andrii,
	Mahe Tardy

Hi Mahe,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Mahe-Tardy/net-move-netfilter-nf_reject_fill_skb_dst-to-core-ipv4/20250710-182905
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250710102607.12413-3-mahe.tardy%40gmail.com
patch subject: [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20250711/202507110529.9ZuNSyQU-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250711/202507110529.9ZuNSyQU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507110529.9ZuNSyQU-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_unreach6':
>> net/ipv6/netfilter/nf_reject_ipv6.c:386:13: error: implicit declaration of function 'ip6_reply_fill_dst' [-Werror=implicit-function-declaration]
     386 |             ip6_reply_fill_dst(skb_in) < 0)
         |             ^~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/ip6_reply_fill_dst +386 net/ipv6/netfilter/nf_reject_ipv6.c

   375	
   376	void nf_send_unreach6(struct net *net, struct sk_buff *skb_in,
   377			      unsigned char code, unsigned int hooknum)
   378	{
   379		if (!reject6_csum_ok(skb_in, hooknum))
   380			return;
   381	
   382		if (hooknum == NF_INET_LOCAL_OUT && skb_in->dev == NULL)
   383			skb_in->dev = net->loopback_dev;
   384	
   385		if ((hooknum == NF_INET_PRE_ROUTING || hooknum == NF_INET_INGRESS) &&
 > 386		    ip6_reply_fill_dst(skb_in) < 0)
   387			return;
   388	
   389		icmpv6_send(skb_in, ICMPV6_DEST_UNREACH, code, 0);
   390	}
   391	EXPORT_SYMBOL_GPL(nf_send_unreach6);
   392	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-10 10:26 ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
  2025-07-10 16:07   ` Alexei Starovoitov
@ 2025-07-11  0:32   ` kernel test robot
  1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2025-07-11  0:32 UTC (permalink / raw)
  To: Mahe Tardy, bpf
  Cc: oe-kbuild-all, martin.lau, daniel, john.fastabend, ast, andrii,
	Mahe Tardy

Hi Mahe,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Mahe-Tardy/net-move-netfilter-nf_reject_fill_skb_dst-to-core-ipv4/20250710-182905
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250710102607.12413-4-mahe.tardy%40gmail.com
patch subject: [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
config: i386-buildonly-randconfig-002-20250711 (https://download.01.org/0day-ci/archive/20250711/202507110812.bV4X2x0X-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250711/202507110812.bV4X2x0X-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507110812.bV4X2x0X-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: net/core/filter.o: in function `bpf_icmp_send_unreach':
>> filter.c:(.text+0x11e97): undefined reference to `ip_route_reply_fetch_dst'
>> ld: filter.c:(.text+0x11eaf): undefined reference to `__icmp_send'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-10 16:07   ` Alexei Starovoitov
@ 2025-07-11 10:57     ` Mahe Tardy
  2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  1 sibling, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-11 10:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Martin KaFai Lau, Daniel Borkmann, John Fastabend,
	Alexei Starovoitov, Andrii Nakryiko

On Thu, Jul 10, 2025 at 09:07:59AM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 10, 2025 at 3:26 AM Mahe Tardy <mahe.tardy@gmail.com> wrote:
> >
> > This is needed in the context of Tetragon to provide improved feedback
> > (in contrast to just dropping packets) to east-west traffic when blocked
> > by policies using cgroup_skb programs.
> >
> > This reuse concepts from netfilter reject target codepath with the
> > differences that:
> > * Packets are cloned since the BPF user can still return SK_PASS from
> >   the cgroup_skb progs and the current skb need to stay untouched
> >   (cgroup_skb hooks only allow read-only skb payload).
> > * Since cgroup_skb programs are called late in the stack, checksums do
> >   not need to be computed or verified, and IPv4 fragmentation does not
> >   need to be checked (ip_local_deliver should take care of that
> >   earlier).
> >
> > Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> > ---
> >  net/core/filter.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index ab456bf1056e..9215f79e7690 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -85,6 +85,8 @@
> >  #include <linux/un.h>
> >  #include <net/xdp_sock_drv.h>
> >  #include <net/inet_dscp.h>
> > +#include <linux/icmp.h>
> > +#include <net/icmp.h>
> >
> >  #include "dev.h"
> >
> > @@ -12140,6 +12142,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> >         return 0;
> >  }
> >
> > +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> > +{
> > +       struct sk_buff *skb = (struct sk_buff *)__skb;
> > +       struct sk_buff *nskb;
> > +
> > +       switch (skb->protocol) {
> > +       case htons(ETH_P_IP):
> > +               if (code < 0 || code > NR_ICMP_UNREACH)
> > +                       return -EINVAL;
> > +
> > +               nskb = skb_clone(skb, GFP_ATOMIC);
> > +               if (!nskb)
> > +                       return -ENOMEM;
> > +
> > +               if (ip_route_reply_fetch_dst(nskb) < 0) {
> > +                       kfree_skb(nskb);
> > +                       return -EHOSTUNREACH;
> > +               }
> > +
> > +               icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> > +               kfree_skb(nskb);
> > +               break;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +       case htons(ETH_P_IPV6):
> > +               if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > +                       return -EINVAL;
> > +
> > +               nskb = skb_clone(skb, GFP_ATOMIC);
> > +               if (!nskb)
> > +                       return -ENOMEM;
> > +
> > +               if (ip6_route_reply_fetch_dst(nskb) < 0) {
> > +                       kfree_skb(nskb);
> > +                       return -EHOSTUNREACH;
> > +               }
> > +
> > +               icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> > +               kfree_skb(nskb);
> > +               break;
> > +#endif
> > +       default:
> > +               return -EPROTONOSUPPORT;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  __bpf_kfunc_end_defs();
> >
> >  int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> > @@ -12177,6 +12226,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
> >  BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp, KF_TRUSTED_ARGS)
> >  BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)
> >
> > +BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send_unreach)
> > +BTF_ID_FLAGS(func, bpf_icmp_send_unreach, KF_TRUSTED_ARGS)
> > +BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send_unreach)
> > +
> >  static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
> >         .owner = THIS_MODULE,
> >         .set = &bpf_kfunc_check_set_skb,
> > @@ -12202,6 +12255,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
> >         .set = &bpf_kfunc_check_set_sock_ops,
> >  };
> >
> > +static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send_unreach = {
> > +       .owner = THIS_MODULE,
> > +       .set = &bpf_kfunc_check_set_icmp_send_unreach,
> > +};
> > +
> >  static int __init bpf_kfunc_init(void)
> >  {
> >         int ret;
> > @@ -12221,7 +12279,8 @@ static int __init bpf_kfunc_init(void)
> >         ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> >                                                &bpf_kfunc_set_sock_addr);
> >         ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > -       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> > +       ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> > +       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send_unreach);
> 
> Does it have to be restricted to BPF_PROG_TYPE_CGROUP_SKB ?
> Can it be a part of bpf_kfunc_set_skb[] and used more generally ?

From the assumptions that have been made to write the kfunc in this
state yes, it has to be restricted to cgroup_skb. We would need
additional checks for hooks that are earlier in the stack I think.

Keeping in mind that this kfunc is not a necessity for other prog types
which can already overwrite packets, like TC.
 
> If restriction is necessary then I guess we can live with extra
> bpf_kfunc_set_icmp_send_unreach, though it's odd to create a set
> just for one kfunc.
> Either way don't change the last 'return ...' line in this file.
> Add 'ret = ret ?: register...' instead to reduce churn.
> 
> Also cc netdev and netfilter maintainers in v2.

Yes to both.

Aside, could I have your opinion on this part of the cover letter before
I proceed to fix these patches:

> Other design ideas (to prevent above issues) could be:
> * Extend the return codes for the cgroup_skb program to trigger the
>  reject after completion (SK_REJECT).
> * Adding a kfunc to set the kernel to send an ICMP_HOST_UNREACH control
>  message with appropriate code when the cgroup_skb program eventually
>  terminates with SK_DROP.
> 
> We should bear in mind that we want to extend this with TCP reset next.
> Please tell me what's your opinion on above ideas: if adding new return
> codes could be considered and/or the other alternatives would be better
> than this patch series and thus proposed instead.

These two ideas would make it more natural for cgroup_skb progs but
would prevent someone to extend it to more prog types in the future.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc
  2025-07-10 16:07   ` Alexei Starovoitov
  2025-07-11 10:57     ` Mahe Tardy
@ 2025-07-25 18:53     ` Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
                         ` (3 more replies)
  1 sibling, 4 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-25 18:53 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: andrii, ast, bpf, daniel, john.fastabend, mahe.tardy, martin.lau,
	fw, netfilter-devel, pablo, netdev, coreteam

Hello,

This is v2 of adding the icmp_send_unreach kfunc, as suggested during
LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
actively reject east-west traffic, similarly to what is possible to do
with netfilter reject target.

The first step to implement this is using ICMP control messages, with
the ICMP_DEST_UNREACH type with various code ICMP_NET_UNREACH,
ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
than a TCP RST reply and will already hint the client TCP stack to abort
the connection and not retry extensively.

Note that this is different than the sock_destroy kfunc, that along
calls tcp_abort and thus sends a reset, destroying the underlying
socket.

Caveats of this kfunc design are that a cgroup_skb program can call this
function N times, thus send N ICMP unreach control messages and that the
program can return from the BPF filter with SK_PASS leading to a
potential confusing situation where the TCP connection was established
while the client received ICMP_DEST_UNREACH messages.

Another more sophisticated design idea would be for the kfunc to set the
kernel to send an ICMP_HOST_UNREACH control message with the appropriate
code when the cgroup_skb program terminates with SK_DROP. Creating a new
'SK_REJECT' return code for cgroup_skb program was generally rejected
and would be too limited for other program types support.

We should bear in mind that we want to add a TCP reset kfunc next and
also could extend this kfunc to other program types if wanted.

v2 updates:
- fix a build error from a missing function call rename;
- avoid changing return line in bpf_kfunc_init;
- return SK_DROP from the kfunc (similarly to bpf_redirect);
- check the return value in the selftest.

[^1]: https://lwn.net/Articles/1022034/

Mahe Tardy (4):
  net: move netfilter nf_reject_fill_skb_dst to core ipv4
  net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  selftests/bpf: add icmp_send_unreach kfunc tests

 include/net/ip6_route.h                       |  2 +
 include/net/route.h                           |  1 +
 net/core/filter.c                             | 59 +++++++++++
 net/ipv4/netfilter/nf_reject_ipv4.c           | 19 +---
 net/ipv4/route.c                              | 15 +++
 net/ipv6/netfilter/nf_reject_ipv6.c           | 17 +---
 net/ipv6/route.c                              | 18 ++++
 .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 99 +++++++++++++++++++
 .../selftests/bpf/progs/icmp_send_unreach.c   | 36 +++++++
 9 files changed, 233 insertions(+), 33 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c

--
2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4
  2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
@ 2025-07-25 18:53       ` Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-25 18:53 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: andrii, ast, bpf, daniel, john.fastabend, mahe.tardy, martin.lau,
	fw, netfilter-devel, pablo, netdev, coreteam

Move and rename nf_reject_fill_skb_dst from
ipv4/netfilter/nf_reject_ipv4 to ip_route_reply_fetch_dst in
ipv4/route.c so that it can be reused in the following patches by BPF
kfuncs.

Netfilter uses nf_ip_route that is almost a transparent wrapper around
ip_route_output_key so this patch inlines it.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 include/net/route.h                 |  1 +
 net/ipv4/netfilter/nf_reject_ipv4.c | 19 ++-----------------
 net/ipv4/route.c                    | 15 +++++++++++++++
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 8e39aa822cf9..1f032f768d52 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -173,6 +173,7 @@ struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
 				    const struct sock *sk);
 struct dst_entry *ipv4_blackhole_route(struct net *net,
 				       struct dst_entry *dst_orig);
+int ip_route_reply_fetch_dst(struct sk_buff *skb);

 static inline struct rtable *ip_route_output_key(struct net *net, struct flowi4 *flp)
 {
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 87fd945a0d27..76beb78f556a 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -220,21 +220,6 @@ void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip_tcphdr_put);

-static int nf_reject_fill_skb_dst(struct sk_buff *skb_in)
-{
-	struct dst_entry *dst = NULL;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof(struct flowi));
-	fl.u.ip4.daddr = ip_hdr(skb_in)->saddr;
-	nf_ip_route(dev_net(skb_in->dev), &dst, &fl, false);
-	if (!dst)
-		return -1;
-
-	skb_dst_set(skb_in, dst);
-	return 0;
-}
-
 /* Send RST reply */
 void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		   int hook)
@@ -248,7 +233,7 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		return;

 	if ((hook == NF_INET_PRE_ROUTING || hook == NF_INET_INGRESS) &&
-	    nf_reject_fill_skb_dst(oldskb) < 0)
+	    ip_route_reply_fetch_dst(oldskb) < 0)
 		return;

 	if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
@@ -322,7 +307,7 @@ void nf_send_unreach(struct sk_buff *skb_in, int code, int hook)
 		return;

 	if ((hook == NF_INET_PRE_ROUTING || hook == NF_INET_INGRESS) &&
-	    nf_reject_fill_skb_dst(skb_in) < 0)
+	    ip_route_reply_fetch_dst(skb_in) < 0)
 		return;

 	if (skb_csum_unnecessary(skb_in) ||
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fccb05fb3a79..59b8fc3c01c0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2934,6 +2934,21 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);

+int ip_route_reply_fetch_dst(struct sk_buff *skb)
+{
+	struct rtable *rt;
+	struct flowi4 fl4 = {
+		.daddr = ip_hdr(skb)->saddr
+	};
+
+	rt = ip_route_output_key(dev_net(skb->dev), &fl4);
+	if (IS_ERR(rt))
+		return PTR_ERR(rt);
+	skb_dst_set(skb, &rt->dst);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ip_route_reply_fetch_dst);
+
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
 			struct rtable *rt, u32 table_id, dscp_t dscp,
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
@ 2025-07-25 18:53       ` Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  3 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-25 18:53 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: andrii, ast, bpf, daniel, john.fastabend, mahe.tardy, martin.lau,
	fw, netfilter-devel, pablo, netdev, coreteam

Move and rename nf_reject6_fill_skb_dst from
ipv6/netfilter/nf_reject_ipv6 to ip6_route_reply_fetch_dst in
ipv6/route.c so that it can be reused in the following patches by BPF
kfuncs.

Netfilter uses nf_ip6_route that is almost a transparent wrapper around
ip6_route_outputy so this patch inlines it.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 include/net/ip6_route.h             |  2 ++
 net/ipv6/netfilter/nf_reject_ipv6.c | 17 +----------------
 net/ipv6/route.c                    | 18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 6dbdf60b342f..1426467df547 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -93,6 +93,8 @@ static inline struct dst_entry *ip6_route_output(struct net *net,
 	return ip6_route_output_flags(net, sk, fl6, 0);
 }

+int ip6_route_reply_fetch_dst(struct sk_buff *skb);
+
 /* Only conditionally release dst if flags indicates
  * !RT6_LOOKUP_F_DST_NOREF or dst is in uncached_list.
  */
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 9ae2b2725bf9..994a3b88ac52 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -250,21 +250,6 @@ void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip6_tcphdr_put);

-static int nf_reject6_fill_skb_dst(struct sk_buff *skb_in)
-{
-	struct dst_entry *dst = NULL;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof(struct flowi));
-	fl.u.ip6.daddr = ipv6_hdr(skb_in)->saddr;
-	nf_ip6_route(dev_net(skb_in->dev), &dst, &fl, false);
-	if (!dst)
-		return -1;
-
-	skb_dst_set(skb_in, dst);
-	return 0;
-}
-
 void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		    int hook)
 {
@@ -398,7 +383,7 @@ void nf_send_unreach6(struct net *net, struct sk_buff *skb_in,
 		skb_in->dev = net->loopback_dev;

 	if ((hooknum == NF_INET_PRE_ROUTING || hooknum == NF_INET_INGRESS) &&
-	    nf_reject6_fill_skb_dst(skb_in) < 0)
+	    ip6_route_reply_fetch_dst(skb_in) < 0)
 		return;

 	icmpv6_send(skb_in, ICMPV6_DEST_UNREACH, code, 0);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0d5464c64965..de61540f9524 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2705,6 +2705,24 @@ struct dst_entry *ip6_route_output_flags(struct net *net,
 }
 EXPORT_SYMBOL_GPL(ip6_route_output_flags);

+int ip6_route_reply_fetch_dst(struct sk_buff *skb)
+{
+	struct dst_entry *result;
+	struct flowi6 fl = {
+		.daddr = ipv6_hdr(skb)->saddr
+	};
+	int err;
+
+	result = ip6_route_output(dev_net(skb->dev), NULL, &fl);
+	err = result->error;
+	if (err)
+		dst_release(result);
+	else
+		skb_dst_set(skb, result);
+	return err;
+}
+EXPORT_SYMBOL_GPL(ip6_route_reply_fetch_dst);
+
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig)
 {
 	struct rt6_info *rt, *ort = dst_rt6_info(dst_orig);
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
  2025-07-25 18:53       ` [PATCH bpf-next v2 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
@ 2025-07-25 18:53       ` Mahe Tardy
  2025-07-27  1:49         ` kernel test robot
  2025-07-25 18:53       ` [PATCH bpf-next v2 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  3 siblings, 1 reply; 36+ messages in thread
From: Mahe Tardy @ 2025-07-25 18:53 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: andrii, ast, bpf, daniel, john.fastabend, mahe.tardy, martin.lau,
	fw, netfilter-devel, pablo, netdev, coreteam

This is needed in the context of Tetragon to provide improved feedback
(in contrast to just dropping packets) to east-west traffic when blocked
by policies using cgroup_skb programs.

This reuse concepts from netfilter reject target codepath with the
differences that:
* Packets are cloned since the BPF user can still return SK_PASS from
  the cgroup_skb progs and the current skb need to stay untouched
  (cgroup_skb hooks only allow read-only skb payload).
* Since cgroup_skb programs are called late in the stack, checksums do
  not need to be computed or verified, and IPv4 fragmentation does not
  need to be checked (ip_local_deliver should take care of that
  earlier).

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 net/core/filter.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 7a72f766aacf..ff7b2b175425 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -85,6 +85,8 @@
 #include <linux/un.h>
 #include <net/xdp_sock_drv.h>
 #include <net/inet_dscp.h>
+#include <linux/icmp.h>
+#include <net/icmp.h>

 #include "dev.h"

@@ -12148,6 +12150,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
 	return 0;
 }

+__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
+{
+	struct sk_buff *skb = (struct sk_buff *)__skb;
+	struct sk_buff *nskb;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (code < 0 || code > NR_ICMP_UNREACH)
+			return -EINVAL;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		if (ip_route_reply_fetch_dst(nskb) < 0) {
+			kfree_skb(nskb);
+			return -EHOSTUNREACH;
+		}
+
+		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
+		kfree_skb(nskb);
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
+			return -EINVAL;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		if (ip6_route_reply_fetch_dst(nskb) < 0) {
+			kfree_skb(nskb);
+			return -EHOSTUNREACH;
+		}
+
+		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
+		kfree_skb(nskb);
+		break;
+#endif
+	default:
+		return -EPROTONOSUPPORT;
+	}
+
+	return SK_DROP;
+}
+
 __bpf_kfunc_end_defs();

 int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
@@ -12185,6 +12234,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
 BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp, KF_TRUSTED_ARGS)
 BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)

+BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send_unreach)
+BTF_ID_FLAGS(func, bpf_icmp_send_unreach, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send_unreach)
+
 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
 	.owner = THIS_MODULE,
 	.set = &bpf_kfunc_check_set_skb,
@@ -12210,6 +12263,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
 	.set = &bpf_kfunc_check_set_sock_ops,
 };

+static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send_unreach = {
+	.owner = THIS_MODULE,
+	.set = &bpf_kfunc_check_set_icmp_send_unreach,
+};
+
 static int __init bpf_kfunc_init(void)
 {
 	int ret;
@@ -12229,6 +12287,7 @@ static int __init bpf_kfunc_init(void)
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send_unreach);
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
 }
 late_initcall(bpf_kfunc_init);
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
                         ` (2 preceding siblings ...)
  2025-07-25 18:53       ` [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
@ 2025-07-25 18:53       ` Mahe Tardy
  3 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-25 18:53 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: andrii, ast, bpf, daniel, john.fastabend, mahe.tardy, martin.lau,
	fw, netfilter-devel, pablo, netdev, coreteam

This test opens a server and client, attach a cgroup_skb program on
egress and calls the icmp_send_unreach function from the client egress
so that an ICMP unreach control message is sent back to the client.
It then fetches the message from the error queue to confirm the correct
ICMP unreach code has been sent.

Note that the BPF program returns SK_PASS to let the connection being
established to finish the test cases quicker. Otherwise, you have to
wait for the TCP three-way handshake to timeout in the kernel and
retrieve the errno translated from the unreach code set by the ICMP
control message.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 99 +++++++++++++++++++
 .../selftests/bpf/progs/icmp_send_unreach.c   | 36 +++++++
 2 files changed, 135 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
new file mode 100644
index 000000000000..414c1ed8ced3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <linux/errqueue.h>
+#include "icmp_send_unreach.skel.h"
+
+#define TIMEOUT_MS 1000
+#define SRV_PORT 54321
+
+#define ICMP_DEST_UNREACH 3
+
+#define ICMP_FRAG_NEEDED 4
+#define NR_ICMP_UNREACH 15
+
+static void read_icmp_errqueue(int sockfd, int expected_code)
+{
+	ssize_t n;
+	struct sock_extended_err *sock_err;
+	struct cmsghdr *cm;
+	char ctrl_buf[512];
+	struct msghdr msg = {
+		.msg_control = ctrl_buf,
+		.msg_controllen = sizeof(ctrl_buf),
+	};
+
+	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
+	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
+		return;
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if (!ASSERT_EQ(cm->cmsg_level, IPPROTO_IP, "cmsg_type") ||
+		    !ASSERT_EQ(cm->cmsg_type, IP_RECVERR, "cmsg_level"))
+			continue;
+
+		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
+
+		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
+			       "sock_err_origin_icmp"))
+			return;
+		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+			       "sock_err_type_dest_unreach"))
+			return;
+		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
+	}
+}
+
+void test_icmp_send_unreach_kfunc(void)
+{
+	struct icmp_send_unreach *skel;
+	int cgroup_fd = -1, client_fd = 1, srv_fd = -1;
+	int *code;
+
+	skel = icmp_send_unreach__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto cleanup;
+
+	skel->links.egress =
+		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
+		goto cleanup;
+
+	code = &skel->bss->unreach_code;
+
+	for (*code = 0; *code <= NR_ICMP_UNREACH; (*code)++) {
+		// The TCP stack reacts differently when asking for
+		// fragmentation, let's ignore it for now
+		if (*code == ICMP_FRAG_NEEDED)
+			continue;
+
+		skel->bss->kfunc_ret = -1;
+
+		srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1",
+				      SRV_PORT, TIMEOUT_MS);
+		if (!ASSERT_GE(srv_fd, 0, "start_server"))
+			goto for_cleanup;
+
+		client_fd = socket(AF_INET, SOCK_STREAM, 0);
+		ASSERT_GE(client_fd, 0, "client_socket");
+
+		client_fd = connect_to_fd(srv_fd, 0);
+		if (!ASSERT_GE(client_fd, 0, "client_connect"))
+			goto for_cleanup;
+
+		read_icmp_errqueue(client_fd, *code);
+
+		ASSERT_EQ(skel->bss->kfunc_ret, SK_DROP, "kfunc_ret");
+for_cleanup:
+		close(client_fd);
+		close(srv_fd);
+	}
+
+cleanup:
+	icmp_send_unreach__destroy(skel);
+	close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send_unreach.c b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
new file mode 100644
index 000000000000..15783e5d1d65
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
+
+int unreach_code = 0;
+int kfunc_ret = 0;
+
+#define SERVER_PORT 54321
+#define SERVER_IP 0x7F000001
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
+	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	tcph = (void *)iph + iph->ihl * 4;
+	if ((void *)(tcph + 1) > data_end ||
+	    tcph->dest != bpf_htons(SERVER_PORT))
+		return SK_PASS;
+
+	kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
+
+	/* returns SK_PASS to execute the test case quicker */
+	return SK_PASS;
+}
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-25 18:53       ` [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
@ 2025-07-27  1:49         ` kernel test robot
  2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  0 siblings, 1 reply; 36+ messages in thread
From: kernel test robot @ 2025-07-27  1:49 UTC (permalink / raw)
  To: Mahe Tardy, alexei.starovoitov
  Cc: oe-kbuild-all, andrii, ast, bpf, daniel, john.fastabend,
	mahe.tardy, martin.lau, fw, netfilter-devel, pablo, netdev,
	coreteam

Hi Mahe,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Mahe-Tardy/net-move-netfilter-nf_reject_fill_skb_dst-to-core-ipv4/20250726-030109
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250725185342.262067-4-mahe.tardy%40gmail.com
patch subject: [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20250727/202507270940.kXGmRbg5-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250727/202507270940.kXGmRbg5-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507270940.kXGmRbg5-lkp@intel.com/

All errors (new ones prefixed by >>):

   aarch64-linux-ld: Unexpected GOT/PLT entries detected!
   aarch64-linux-ld: Unexpected run-time procedure linkages detected!
   aarch64-linux-ld: net/core/filter.o: in function `bpf_icmp_send_unreach':
>> net/core/filter.c:12184:(.text+0x14574): undefined reference to `ip6_route_reply_fetch_dst'


vim +12184 net/core/filter.c

 12152	
 12153	__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
 12154	{
 12155		struct sk_buff *skb = (struct sk_buff *)__skb;
 12156		struct sk_buff *nskb;
 12157	
 12158		switch (skb->protocol) {
 12159		case htons(ETH_P_IP):
 12160			if (code < 0 || code > NR_ICMP_UNREACH)
 12161				return -EINVAL;
 12162	
 12163			nskb = skb_clone(skb, GFP_ATOMIC);
 12164			if (!nskb)
 12165				return -ENOMEM;
 12166	
 12167			if (ip_route_reply_fetch_dst(nskb) < 0) {
 12168				kfree_skb(nskb);
 12169				return -EHOSTUNREACH;
 12170			}
 12171	
 12172			icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
 12173			kfree_skb(nskb);
 12174			break;
 12175	#if IS_ENABLED(CONFIG_IPV6)
 12176		case htons(ETH_P_IPV6):
 12177			if (code < 0 || code > ICMPV6_REJECT_ROUTE)
 12178				return -EINVAL;
 12179	
 12180			nskb = skb_clone(skb, GFP_ATOMIC);
 12181			if (!nskb)
 12182				return -ENOMEM;
 12183	
 12184			if (ip6_route_reply_fetch_dst(nskb) < 0) {
 12185				kfree_skb(nskb);
 12186				return -EHOSTUNREACH;
 12187			}
 12188	
 12189			icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
 12190			kfree_skb(nskb);
 12191			break;
 12192	#endif
 12193		default:
 12194			return -EPROTONOSUPPORT;
 12195		}
 12196	
 12197		return SK_DROP;
 12198	}
 12199	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
  2025-07-27  1:49         ` kernel test robot
@ 2025-07-28  9:43           ` Mahe Tardy
  2025-07-28  9:43             ` [PATCH bpf-next v3 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
                               ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-28  9:43 UTC (permalink / raw)
  To: lkp
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, mahe.tardy, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

Hello,

This is v3 of adding the icmp_send_unreach kfunc, as suggested during
LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
actively reject east-west traffic, similarly to what is possible to do
with netfilter reject target.

The first step to implement this is using ICMP control messages, with
the ICMP_DEST_UNREACH type with various code ICMP_NET_UNREACH,
ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
than a TCP RST reply and will already hint the client TCP stack to abort
the connection and not retry extensively.

Note that this is different than the sock_destroy kfunc, that along
calls tcp_abort and thus sends a reset, destroying the underlying
socket.

Caveats of this kfunc design are that a cgroup_skb program can call this
function N times, thus send N ICMP unreach control messages and that the
program can return from the BPF filter with SK_PASS leading to a
potential confusing situation where the TCP connection was established
while the client received ICMP_DEST_UNREACH messages.

Another more sophisticated design idea would be for the kfunc to set the
kernel to send an ICMP_HOST_UNREACH control message with the appropriate
code when the cgroup_skb program terminates with SK_DROP. Creating a new
'SK_REJECT' return code for cgroup_skb program was generally rejected
and would be too limited for other program types support.

We should bear in mind that we want to add a TCP reset kfunc next and
also could extend this kfunc to other program types if wanted.

v2 updates:
- fix a build error from a missing function call rename;
- avoid changing return line in bpf_kfunc_init;
- return SK_DROP from the kfunc (similarly to bpf_redirect);
- check the return value in the selftest.

v3 update:
- fix an undefined reference build error.

[^1]: https://lwn.net/Articles/1022034/

Mahe Tardy (4):
  net: move netfilter nf_reject_fill_skb_dst to core ipv4
  net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  selftests/bpf: add icmp_send_unreach kfunc tests

 include/net/ip6_route.h                       |  2 +
 include/net/route.h                           |  1 +
 net/core/filter.c                             | 61 ++++++++++++
 net/ipv4/netfilter/nf_reject_ipv4.c           | 19 +---
 net/ipv4/route.c                              | 15 +++
 net/ipv6/netfilter/nf_reject_ipv6.c           | 17 +---
 net/ipv6/route.c                              | 18 ++++
 .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 99 +++++++++++++++++++
 .../selftests/bpf/progs/icmp_send_unreach.c   | 36 +++++++
 9 files changed, 235 insertions(+), 33 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c

--
2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v3 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4
  2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
@ 2025-07-28  9:43             ` Mahe Tardy
  2025-07-28  9:43             ` [PATCH bpf-next v3 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-28  9:43 UTC (permalink / raw)
  To: lkp
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, mahe.tardy, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

Move and rename nf_reject_fill_skb_dst from
ipv4/netfilter/nf_reject_ipv4 to ip_route_reply_fetch_dst in
ipv4/route.c so that it can be reused in the following patches by BPF
kfuncs.

Netfilter uses nf_ip_route that is almost a transparent wrapper around
ip_route_output_key so this patch inlines it.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 include/net/route.h                 |  1 +
 net/ipv4/netfilter/nf_reject_ipv4.c | 19 ++-----------------
 net/ipv4/route.c                    | 15 +++++++++++++++
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 8e39aa822cf9..1f032f768d52 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -173,6 +173,7 @@ struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
 				    const struct sock *sk);
 struct dst_entry *ipv4_blackhole_route(struct net *net,
 				       struct dst_entry *dst_orig);
+int ip_route_reply_fetch_dst(struct sk_buff *skb);

 static inline struct rtable *ip_route_output_key(struct net *net, struct flowi4 *flp)
 {
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 87fd945a0d27..76beb78f556a 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -220,21 +220,6 @@ void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip_tcphdr_put);

-static int nf_reject_fill_skb_dst(struct sk_buff *skb_in)
-{
-	struct dst_entry *dst = NULL;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof(struct flowi));
-	fl.u.ip4.daddr = ip_hdr(skb_in)->saddr;
-	nf_ip_route(dev_net(skb_in->dev), &dst, &fl, false);
-	if (!dst)
-		return -1;
-
-	skb_dst_set(skb_in, dst);
-	return 0;
-}
-
 /* Send RST reply */
 void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		   int hook)
@@ -248,7 +233,7 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		return;

 	if ((hook == NF_INET_PRE_ROUTING || hook == NF_INET_INGRESS) &&
-	    nf_reject_fill_skb_dst(oldskb) < 0)
+	    ip_route_reply_fetch_dst(oldskb) < 0)
 		return;

 	if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
@@ -322,7 +307,7 @@ void nf_send_unreach(struct sk_buff *skb_in, int code, int hook)
 		return;

 	if ((hook == NF_INET_PRE_ROUTING || hook == NF_INET_INGRESS) &&
-	    nf_reject_fill_skb_dst(skb_in) < 0)
+	    ip_route_reply_fetch_dst(skb_in) < 0)
 		return;

 	if (skb_csum_unnecessary(skb_in) ||
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fccb05fb3a79..59b8fc3c01c0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2934,6 +2934,21 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 }
 EXPORT_SYMBOL_GPL(ip_route_output_flow);

+int ip_route_reply_fetch_dst(struct sk_buff *skb)
+{
+	struct rtable *rt;
+	struct flowi4 fl4 = {
+		.daddr = ip_hdr(skb)->saddr
+	};
+
+	rt = ip_route_output_key(dev_net(skb->dev), &fl4);
+	if (IS_ERR(rt))
+		return PTR_ERR(rt);
+	skb_dst_set(skb, &rt->dst);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ip_route_reply_fetch_dst);
+
 /* called with rcu_read_lock held */
 static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
 			struct rtable *rt, u32 table_id, dscp_t dscp,
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v3 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6
  2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-28  9:43             ` [PATCH bpf-next v3 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
@ 2025-07-28  9:43             ` Mahe Tardy
  2025-07-28  9:43             ` [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-28  9:43 UTC (permalink / raw)
  To: lkp
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, mahe.tardy, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

Move and rename nf_reject6_fill_skb_dst from
ipv6/netfilter/nf_reject_ipv6 to ip6_route_reply_fetch_dst in
ipv6/route.c so that it can be reused in the following patches by BPF
kfuncs.

Netfilter uses nf_ip6_route that is almost a transparent wrapper around
ip6_route_outputy so this patch inlines it.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 include/net/ip6_route.h             |  2 ++
 net/ipv6/netfilter/nf_reject_ipv6.c | 17 +----------------
 net/ipv6/route.c                    | 18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 6dbdf60b342f..1426467df547 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -93,6 +93,8 @@ static inline struct dst_entry *ip6_route_output(struct net *net,
 	return ip6_route_output_flags(net, sk, fl6, 0);
 }

+int ip6_route_reply_fetch_dst(struct sk_buff *skb);
+
 /* Only conditionally release dst if flags indicates
  * !RT6_LOOKUP_F_DST_NOREF or dst is in uncached_list.
  */
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 9ae2b2725bf9..994a3b88ac52 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -250,21 +250,6 @@ void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
 }
 EXPORT_SYMBOL_GPL(nf_reject_ip6_tcphdr_put);

-static int nf_reject6_fill_skb_dst(struct sk_buff *skb_in)
-{
-	struct dst_entry *dst = NULL;
-	struct flowi fl;
-
-	memset(&fl, 0, sizeof(struct flowi));
-	fl.u.ip6.daddr = ipv6_hdr(skb_in)->saddr;
-	nf_ip6_route(dev_net(skb_in->dev), &dst, &fl, false);
-	if (!dst)
-		return -1;
-
-	skb_dst_set(skb_in, dst);
-	return 0;
-}
-
 void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		    int hook)
 {
@@ -398,7 +383,7 @@ void nf_send_unreach6(struct net *net, struct sk_buff *skb_in,
 		skb_in->dev = net->loopback_dev;

 	if ((hooknum == NF_INET_PRE_ROUTING || hooknum == NF_INET_INGRESS) &&
-	    nf_reject6_fill_skb_dst(skb_in) < 0)
+	    ip6_route_reply_fetch_dst(skb_in) < 0)
 		return;

 	icmpv6_send(skb_in, ICMPV6_DEST_UNREACH, code, 0);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0d5464c64965..de61540f9524 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2705,6 +2705,24 @@ struct dst_entry *ip6_route_output_flags(struct net *net,
 }
 EXPORT_SYMBOL_GPL(ip6_route_output_flags);

+int ip6_route_reply_fetch_dst(struct sk_buff *skb)
+{
+	struct dst_entry *result;
+	struct flowi6 fl = {
+		.daddr = ipv6_hdr(skb)->saddr
+	};
+	int err;
+
+	result = ip6_route_output(dev_net(skb->dev), NULL, &fl);
+	err = result->error;
+	if (err)
+		dst_release(result);
+	else
+		skb_dst_set(skb, result);
+	return err;
+}
+EXPORT_SYMBOL_GPL(ip6_route_reply_fetch_dst);
+
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig)
 {
 	struct rt6_info *rt, *ort = dst_rt6_info(dst_orig);
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
  2025-07-28  9:43             ` [PATCH bpf-next v3 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
  2025-07-28  9:43             ` [PATCH bpf-next v3 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
@ 2025-07-28  9:43             ` Mahe Tardy
  2025-07-28 20:10               ` kernel test robot
  2025-07-29  1:05               ` Martin KaFai Lau
  2025-07-28  9:43             ` [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  2025-07-29  1:21             ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Martin KaFai Lau
  4 siblings, 2 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-28  9:43 UTC (permalink / raw)
  To: lkp
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, mahe.tardy, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

This is needed in the context of Tetragon to provide improved feedback
(in contrast to just dropping packets) to east-west traffic when blocked
by policies using cgroup_skb programs.

This reuse concepts from netfilter reject target codepath with the
differences that:
* Packets are cloned since the BPF user can still return SK_PASS from
  the cgroup_skb progs and the current skb need to stay untouched
  (cgroup_skb hooks only allow read-only skb payload).
* Since cgroup_skb programs are called late in the stack, checksums do
  not need to be computed or verified, and IPv4 fragmentation does not
  need to be checked (ip_local_deliver should take care of that
  earlier).

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 net/core/filter.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 7a72f766aacf..050872324575 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -85,6 +85,10 @@
 #include <linux/un.h>
 #include <net/xdp_sock_drv.h>
 #include <net/inet_dscp.h>
+#include <linux/icmp.h>
+#include <net/icmp.h>
+#include <net/route.h>
+#include <net/ip6_route.h>

 #include "dev.h"

@@ -12148,6 +12152,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
 	return 0;
 }

+__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
+{
+	struct sk_buff *skb = (struct sk_buff *)__skb;
+	struct sk_buff *nskb;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (code < 0 || code > NR_ICMP_UNREACH)
+			return -EINVAL;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		if (ip_route_reply_fetch_dst(nskb) < 0) {
+			kfree_skb(nskb);
+			return -EHOSTUNREACH;
+		}
+
+		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
+		kfree_skb(nskb);
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
+			return -EINVAL;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		if (ip6_route_reply_fetch_dst(nskb) < 0) {
+			kfree_skb(nskb);
+			return -EHOSTUNREACH;
+		}
+
+		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
+		kfree_skb(nskb);
+		break;
+#endif
+	default:
+		return -EPROTONOSUPPORT;
+	}
+
+	return SK_DROP;
+}
+
 __bpf_kfunc_end_defs();

 int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
@@ -12185,6 +12236,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
 BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp, KF_TRUSTED_ARGS)
 BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)

+BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send_unreach)
+BTF_ID_FLAGS(func, bpf_icmp_send_unreach, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send_unreach)
+
 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
 	.owner = THIS_MODULE,
 	.set = &bpf_kfunc_check_set_skb,
@@ -12210,6 +12265,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
 	.set = &bpf_kfunc_check_set_sock_ops,
 };

+static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send_unreach = {
+	.owner = THIS_MODULE,
+	.set = &bpf_kfunc_check_set_icmp_send_unreach,
+};
+
 static int __init bpf_kfunc_init(void)
 {
 	int ret;
@@ -12229,6 +12289,7 @@ static int __init bpf_kfunc_init(void)
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send_unreach);
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
 }
 late_initcall(bpf_kfunc_init);
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
                               ` (2 preceding siblings ...)
  2025-07-28  9:43             ` [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
@ 2025-07-28  9:43             ` Mahe Tardy
  2025-07-28 15:40               ` Yonghong Song
                                 ` (2 more replies)
  2025-07-29  1:21             ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Martin KaFai Lau
  4 siblings, 3 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-28  9:43 UTC (permalink / raw)
  To: lkp
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, mahe.tardy, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

This test opens a server and client, attach a cgroup_skb program on
egress and calls the icmp_send_unreach function from the client egress
so that an ICMP unreach control message is sent back to the client.
It then fetches the message from the error queue to confirm the correct
ICMP unreach code has been sent.

Note that the BPF program returns SK_PASS to let the connection being
established to finish the test cases quicker. Otherwise, you have to
wait for the TCP three-way handshake to timeout in the kernel and
retrieve the errno translated from the unreach code set by the ICMP
control message.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 99 +++++++++++++++++++
 .../selftests/bpf/progs/icmp_send_unreach.c   | 36 +++++++
 2 files changed, 135 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
new file mode 100644
index 000000000000..414c1ed8ced3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <linux/errqueue.h>
+#include "icmp_send_unreach.skel.h"
+
+#define TIMEOUT_MS 1000
+#define SRV_PORT 54321
+
+#define ICMP_DEST_UNREACH 3
+
+#define ICMP_FRAG_NEEDED 4
+#define NR_ICMP_UNREACH 15
+
+static void read_icmp_errqueue(int sockfd, int expected_code)
+{
+	ssize_t n;
+	struct sock_extended_err *sock_err;
+	struct cmsghdr *cm;
+	char ctrl_buf[512];
+	struct msghdr msg = {
+		.msg_control = ctrl_buf,
+		.msg_controllen = sizeof(ctrl_buf),
+	};
+
+	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
+	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
+		return;
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if (!ASSERT_EQ(cm->cmsg_level, IPPROTO_IP, "cmsg_type") ||
+		    !ASSERT_EQ(cm->cmsg_type, IP_RECVERR, "cmsg_level"))
+			continue;
+
+		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
+
+		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
+			       "sock_err_origin_icmp"))
+			return;
+		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+			       "sock_err_type_dest_unreach"))
+			return;
+		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
+	}
+}
+
+void test_icmp_send_unreach_kfunc(void)
+{
+	struct icmp_send_unreach *skel;
+	int cgroup_fd = -1, client_fd = 1, srv_fd = -1;
+	int *code;
+
+	skel = icmp_send_unreach__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto cleanup;
+
+	skel->links.egress =
+		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
+		goto cleanup;
+
+	code = &skel->bss->unreach_code;
+
+	for (*code = 0; *code <= NR_ICMP_UNREACH; (*code)++) {
+		// The TCP stack reacts differently when asking for
+		// fragmentation, let's ignore it for now
+		if (*code == ICMP_FRAG_NEEDED)
+			continue;
+
+		skel->bss->kfunc_ret = -1;
+
+		srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1",
+				      SRV_PORT, TIMEOUT_MS);
+		if (!ASSERT_GE(srv_fd, 0, "start_server"))
+			goto for_cleanup;
+
+		client_fd = socket(AF_INET, SOCK_STREAM, 0);
+		ASSERT_GE(client_fd, 0, "client_socket");
+
+		client_fd = connect_to_fd(srv_fd, 0);
+		if (!ASSERT_GE(client_fd, 0, "client_connect"))
+			goto for_cleanup;
+
+		read_icmp_errqueue(client_fd, *code);
+
+		ASSERT_EQ(skel->bss->kfunc_ret, SK_DROP, "kfunc_ret");
+for_cleanup:
+		close(client_fd);
+		close(srv_fd);
+	}
+
+cleanup:
+	icmp_send_unreach__destroy(skel);
+	close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send_unreach.c b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
new file mode 100644
index 000000000000..15783e5d1d65
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
+
+int unreach_code = 0;
+int kfunc_ret = 0;
+
+#define SERVER_PORT 54321
+#define SERVER_IP 0x7F000001
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
+	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	tcph = (void *)iph + iph->ihl * 4;
+	if ((void *)(tcph + 1) > data_end ||
+	    tcph->dest != bpf_htons(SERVER_PORT))
+		return SK_PASS;
+
+	kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
+
+	/* returns SK_PASS to execute the test case quicker */
+	return SK_PASS;
+}
--
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-28  9:43             ` [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
@ 2025-07-28 15:40               ` Yonghong Song
  2025-07-28 15:59                 ` Mahe Tardy
  2025-07-29  1:18               ` Martin KaFai Lau
  2025-08-05 23:26               ` Jordan Rife
  2 siblings, 1 reply; 36+ messages in thread
From: Yonghong Song @ 2025-07-28 15:40 UTC (permalink / raw)
  To: Mahe Tardy, lkp
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo



On 7/28/25 2:43 AM, Mahe Tardy wrote:
> This test opens a server and client, attach a cgroup_skb program on
> egress and calls the icmp_send_unreach function from the client egress
> so that an ICMP unreach control message is sent back to the client.
> It then fetches the message from the error queue to confirm the correct
> ICMP unreach code has been sent.
>
> Note that the BPF program returns SK_PASS to let the connection being
> established to finish the test cases quicker. Otherwise, you have to
> wait for the TCP three-way handshake to timeout in the kernel and
> retrieve the errno translated from the unreach code set by the ICMP
> control message.
>
> Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> ---
>   .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 99 +++++++++++++++++++
>   .../selftests/bpf/progs/icmp_send_unreach.c   | 36 +++++++
>   2 files changed, 135 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
>   create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> new file mode 100644
> index 000000000000..414c1ed8ced3
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +#include <network_helpers.h>
> +#include <linux/errqueue.h>
> +#include "icmp_send_unreach.skel.h"
> +
> +#define TIMEOUT_MS 1000
> +#define SRV_PORT 54321
> +
> +#define ICMP_DEST_UNREACH 3
> +
> +#define ICMP_FRAG_NEEDED 4
> +#define NR_ICMP_UNREACH 15
> +
> +static void read_icmp_errqueue(int sockfd, int expected_code)
> +{
> +	ssize_t n;
> +	struct sock_extended_err *sock_err;
> +	struct cmsghdr *cm;
> +	char ctrl_buf[512];
> +	struct msghdr msg = {
> +		.msg_control = ctrl_buf,
> +		.msg_controllen = sizeof(ctrl_buf),
> +	};
> +
> +	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
> +	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
> +		return;
> +
> +	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
> +		if (!ASSERT_EQ(cm->cmsg_level, IPPROTO_IP, "cmsg_type") ||
> +		    !ASSERT_EQ(cm->cmsg_type, IP_RECVERR, "cmsg_level"))
> +			continue;
> +
> +		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
> +
> +		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
> +			       "sock_err_origin_icmp"))
> +			return;
> +		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
> +			       "sock_err_type_dest_unreach"))
> +			return;
> +		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
> +	}
> +}
> +
> +void test_icmp_send_unreach_kfunc(void)
> +{
> +	struct icmp_send_unreach *skel;
> +	int cgroup_fd = -1, client_fd = 1, srv_fd = -1;

Should set client_fd = -1? See below ...

> +	int *code;
> +
> +	skel = icmp_send_unreach__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> +		goto cleanup;
> +
> +	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
> +	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
> +		goto cleanup;
> +
> +	skel->links.egress =
> +		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
> +	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
> +		goto cleanup;
> +
> +	code = &skel->bss->unreach_code;
> +
> +	for (*code = 0; *code <= NR_ICMP_UNREACH; (*code)++) {
> +		// The TCP stack reacts differently when asking for
> +		// fragmentation, let's ignore it for now
> +		if (*code == ICMP_FRAG_NEEDED)
> +			continue;
> +
> +		skel->bss->kfunc_ret = -1;
> +
> +		srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1",
> +				      SRV_PORT, TIMEOUT_MS);
> +		if (!ASSERT_GE(srv_fd, 0, "start_server"))
> +			goto for_cleanup;

Otherwise if client_fd = 1, goto for_cleanup will close(1).

> +
> +		client_fd = socket(AF_INET, SOCK_STREAM, 0);
> +		ASSERT_GE(client_fd, 0, "client_socket");

The above two lines are not necessary since client_fd is
actually set in the below.

> +
> +		client_fd = connect_to_fd(srv_fd, 0);
> +		if (!ASSERT_GE(client_fd, 0, "client_connect"))
> +			goto for_cleanup;
> +
> +		read_icmp_errqueue(client_fd, *code);
> +
> +		ASSERT_EQ(skel->bss->kfunc_ret, SK_DROP, "kfunc_ret");
> +for_cleanup:
> +		close(client_fd);
> +		close(srv_fd);
> +	}
> +
> +cleanup:
> +	icmp_send_unreach__destroy(skel);
> +	close(cgroup_fd);
> +}
[...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-28 15:40               ` Yonghong Song
@ 2025-07-28 15:59                 ` Mahe Tardy
  0 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-07-28 15:59 UTC (permalink / raw)
  To: Yonghong Song
  Cc: lkp, alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

On Mon, Jul 28, 2025 at 08:40:49AM -0700, Yonghong Song wrote:
> 
> 
> On 7/28/25 2:43 AM, Mahe Tardy wrote:

[...]

> > +
> > +void test_icmp_send_unreach_kfunc(void)
> > +{
> > +	struct icmp_send_unreach *skel;
> > +	int cgroup_fd = -1, client_fd = 1, srv_fd = -1;
> 
> Should set client_fd = -1? See below ...

Well spotted yes, it's a typo, thank you.

> > +	int *code;
> > +
> > +	skel = icmp_send_unreach__open_and_load();
> > +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> > +		goto cleanup;
> > +
> > +	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
> > +	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
> > +		goto cleanup;
> > +
> > +	skel->links.egress =
> > +		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
> > +	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
> > +		goto cleanup;
> > +
> > +	code = &skel->bss->unreach_code;
> > +
> > +	for (*code = 0; *code <= NR_ICMP_UNREACH; (*code)++) {
> > +		// The TCP stack reacts differently when asking for
> > +		// fragmentation, let's ignore it for now
> > +		if (*code == ICMP_FRAG_NEEDED)
> > +			continue;
> > +
> > +		skel->bss->kfunc_ret = -1;
> > +
> > +		srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1",
> > +				      SRV_PORT, TIMEOUT_MS);
> > +		if (!ASSERT_GE(srv_fd, 0, "start_server"))
> > +			goto for_cleanup;
> 
> Otherwise if client_fd = 1, goto for_cleanup will close(1).
> 
> > +
> > +		client_fd = socket(AF_INET, SOCK_STREAM, 0);
> > +		ASSERT_GE(client_fd, 0, "client_socket");
> 
> The above two lines are not necessary since client_fd is
> actually set in the below.

Yep, must have been a leftover from when I was discovering the
network_helpers, oops!

> > +
> > +		client_fd = connect_to_fd(srv_fd, 0);
> > +		if (!ASSERT_GE(client_fd, 0, "client_connect"))
> > +			goto for_cleanup;
> > +
> > +		read_icmp_errqueue(client_fd, *code);
> > +
> > +		ASSERT_EQ(skel->bss->kfunc_ret, SK_DROP, "kfunc_ret");
> > +for_cleanup:
> > +		close(client_fd);
> > +		close(srv_fd);
> > +	}
> > +
> > +cleanup:
> > +	icmp_send_unreach__destroy(skel);
> > +	close(cgroup_fd);
> > +}
> [...]

I'm sending a v4 with those fixed + fixing the builds error when IPv6 is
built as a module from the kfunc patch. Thanks for the review.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-28  9:43             ` [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
@ 2025-07-28 20:10               ` kernel test robot
  2025-07-29  1:05               ` Martin KaFai Lau
  1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2025-07-28 20:10 UTC (permalink / raw)
  To: Mahe Tardy, lkp
  Cc: oe-kbuild-all, alexei.starovoitov, andrii, ast, bpf, coreteam,
	daniel, fw, john.fastabend, mahe.tardy, martin.lau, netdev,
	netfilter-devel, pablo

Hi Mahe,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Mahe-Tardy/net-move-netfilter-nf_reject_fill_skb_dst-to-core-ipv4/20250728-174736
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250728094345.46132-4-mahe.tardy%40gmail.com
patch subject: [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
config: s390-randconfig-001-20250729 (https://download.01.org/0day-ci/archive/20250729/202507290356.HyFMR3K0-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250729/202507290356.HyFMR3K0-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507290356.HyFMR3K0-lkp@intel.com/

All errors (new ones prefixed by >>):

   s390-linux-ld: net/core/filter.o: in function `bpf_icmp_send_unreach':
   filter.c:(.text+0xf7b8): undefined reference to `ip_route_reply_fetch_dst'
>> s390-linux-ld: filter.c:(.text+0xf7ea): undefined reference to `__icmp_send'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-28  9:43             ` [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
  2025-07-28 20:10               ` kernel test robot
@ 2025-07-29  1:05               ` Martin KaFai Lau
  2025-07-29 10:06                 ` Mahe Tardy
  1 sibling, 1 reply; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-29  1:05 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/28/25 2:43 AM, Mahe Tardy wrote:
> This is needed in the context of Tetragon to provide improved feedback
> (in contrast to just dropping packets) to east-west traffic when blocked
> by policies using cgroup_skb programs.
> 
> This reuse concepts from netfilter reject target codepath with the
> differences that:
> * Packets are cloned since the BPF user can still return SK_PASS from
>    the cgroup_skb progs and the current skb need to stay untouched

This needs more details. Which field(s) of the skb are changed by the kfunc, the 
skb_dst_set in ip[6]_route_reply_fetch_dst() and/or the code path in the 
icmp[v6]_send() ?

>    (cgroup_skb hooks only allow read-only skb payload).
> * Since cgroup_skb programs are called late in the stack, checksums do
>    not need to be computed or verified, and IPv4 fragmentation does not
>    need to be checked (ip_local_deliver should take care of that
>    earlier).
> 
> Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> ---
>   net/core/filter.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 61 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7a72f766aacf..050872324575 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -85,6 +85,10 @@
>   #include <linux/un.h>
>   #include <net/xdp_sock_drv.h>
>   #include <net/inet_dscp.h>
> +#include <linux/icmp.h>
> +#include <net/icmp.h>
> +#include <net/route.h>
> +#include <net/ip6_route.h>
> 
>   #include "dev.h"
> 
> @@ -12148,6 +12152,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
>   	return 0;
>   }
> 
> +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> +{
> +	struct sk_buff *skb = (struct sk_buff *)__skb;
> +	struct sk_buff *nskb;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		if (code < 0 || code > NR_ICMP_UNREACH)
> +			return -EINVAL;
> +
> +		nskb = skb_clone(skb, GFP_ATOMIC);
> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		if (ip_route_reply_fetch_dst(nskb) < 0) {
> +			kfree_skb(nskb);
> +			return -EHOSTUNREACH;
> +		}
> +
> +		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> +		kfree_skb(nskb);
> +		break;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case htons(ETH_P_IPV6):
> +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> +			return -EINVAL;
> +
> +		nskb = skb_clone(skb, GFP_ATOMIC);
> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		if (ip6_route_reply_fetch_dst(nskb) < 0) {

 From a very quick look at icmpv6_send(), it does its own route lookup. I 
haven't looked at the v4 yet.

I am likely missing some details. Can you explain why it needs to do a lookup 
before calling icmpv6_send()?

> +			kfree_skb(nskb);
> +			return -EHOSTUNREACH;
> +		}
> +
> +		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> +		kfree_skb(nskb);
> +		break;
> +#endif
> +	default:
> +		return -EPROTONOSUPPORT;
> +	}
> +
> +	return SK_DROP;
> +}
> +

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-28  9:43             ` [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  2025-07-28 15:40               ` Yonghong Song
@ 2025-07-29  1:18               ` Martin KaFai Lau
  2025-07-29  9:09                 ` Mahe Tardy
  2025-08-05 23:26               ` Jordan Rife
  2 siblings, 1 reply; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-29  1:18 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/28/25 2:43 AM, Mahe Tardy wrote:
> +SEC("cgroup_skb/egress")
> +int egress(struct __sk_buff *skb)
> +{
> +	void *data = (void *)(long)skb->data;
> +	void *data_end = (void *)(long)skb->data_end;
> +	struct iphdr *iph;
> +	struct tcphdr *tcph;
> +
> +	iph = data;
> +	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
> +	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
> +		return SK_PASS;
> +
> +	tcph = (void *)iph + iph->ihl * 4;
> +	if ((void *)(tcph + 1) > data_end ||
> +	    tcph->dest != bpf_htons(SERVER_PORT))
> +		return SK_PASS;
> +
> +	kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
> +
> +	/* returns SK_PASS to execute the test case quicker */

Do you know why the user space is slower if 0 (SK_DROP) is used?

> +	return SK_PASS;



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
  2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
                               ` (3 preceding siblings ...)
  2025-07-28  9:43             ` [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
@ 2025-07-29  1:21             ` Martin KaFai Lau
  2025-07-29  9:53               ` Mahe Tardy
  4 siblings, 1 reply; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-29  1:21 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/28/25 2:43 AM, Mahe Tardy wrote:
> Hello,
> 
> This is v3 of adding the icmp_send_unreach kfunc, as suggested during
> LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
> actively reject east-west traffic, similarly to what is possible to do
> with netfilter reject target.
> 
> The first step to implement this is using ICMP control messages, with
> the ICMP_DEST_UNREACH type with various code ICMP_NET_UNREACH,
> ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
> than a TCP RST reply and will already hint the client TCP stack to abort
> the connection and not retry extensively.
> 
> Note that this is different than the sock_destroy kfunc, that along
> calls tcp_abort and thus sends a reset, destroying the underlying
> socket.
> 
> Caveats of this kfunc design are that a cgroup_skb program can call this
> function N times, thus send N ICMP unreach control messages and that the
> program can return from the BPF filter with SK_PASS leading to a
> potential confusing situation where the TCP connection was established
> while the client received ICMP_DEST_UNREACH messages.
> 
> Another more sophisticated design idea would be for the kfunc to set the
> kernel to send an ICMP_HOST_UNREACH control message with the appropriate
> code when the cgroup_skb program terminates with SK_DROP. Creating a new
> 'SK_REJECT' return code for cgroup_skb program was generally rejected
> and would be too limited for other program types support.
> 
> We should bear in mind that we want to add a TCP reset kfunc next and
> also could extend this kfunc to other program types if wanted.

Some high level questions.

Which other program types do you need this kfunc to send icmp and the future tcp 
rst?

This cover letter mentioned sending icmp unreach is easier than sending tcp rst. 
What problems do you see in sending tcp rst?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-29  1:18               ` Martin KaFai Lau
@ 2025-07-29  9:09                 ` Mahe Tardy
  2025-07-29 23:27                   ` Martin KaFai Lau
  0 siblings, 1 reply; 36+ messages in thread
From: Mahe Tardy @ 2025-07-29  9:09 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On Mon, Jul 28, 2025 at 06:18:11PM -0700, Martin KaFai Lau wrote:
> On 7/28/25 2:43 AM, Mahe Tardy wrote:
> > +SEC("cgroup_skb/egress")
> > +int egress(struct __sk_buff *skb)
> > +{
> > +	void *data = (void *)(long)skb->data;
> > +	void *data_end = (void *)(long)skb->data_end;
> > +	struct iphdr *iph;
> > +	struct tcphdr *tcph;
> > +
> > +	iph = data;
> > +	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
> > +	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
> > +		return SK_PASS;
> > +
> > +	tcph = (void *)iph + iph->ihl * 4;
> > +	if ((void *)(tcph + 1) > data_end ||
> > +	    tcph->dest != bpf_htons(SERVER_PORT))
> > +		return SK_PASS;
> > +
> > +	kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
> > +
> > +	/* returns SK_PASS to execute the test case quicker */
> 
> Do you know why the user space is slower if 0 (SK_DROP) is used?

I tried to write my understanding of this in the commit description:

"Note that the BPF program returns SK_PASS to let the connection being
established to finish the test cases quicker. Otherwise, you have to
wait for the TCP three-way handshake to timeout in the kernel and
retrieve the errno translated from the unreach code set by the ICMP
control message."

I added this comment because I already had some (offline) feedback that
this looked off, maybe I should develop and put this here directly.

> 
> > +	return SK_PASS;
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
  2025-07-29  1:21             ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Martin KaFai Lau
@ 2025-07-29  9:53               ` Mahe Tardy
  2025-07-30  1:54                 ` Martin KaFai Lau
  0 siblings, 1 reply; 36+ messages in thread
From: Mahe Tardy @ 2025-07-29  9:53 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On Mon, Jul 28, 2025 at 06:21:50PM -0700, Martin KaFai Lau wrote:
> On 7/28/25 2:43 AM, Mahe Tardy wrote:
> > Hello,
> > 
> > This is v3 of adding the icmp_send_unreach kfunc, as suggested during
> > LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
> > actively reject east-west traffic, similarly to what is possible to do
> > with netfilter reject target.
> > 
> > The first step to implement this is using ICMP control messages, with
> > the ICMP_DEST_UNREACH type with various code ICMP_NET_UNREACH,
> > ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
> > than a TCP RST reply and will already hint the client TCP stack to abort
> > the connection and not retry extensively.
> > 
> > Note that this is different than the sock_destroy kfunc, that along
> > calls tcp_abort and thus sends a reset, destroying the underlying
> > socket.
> > 
> > Caveats of this kfunc design are that a cgroup_skb program can call this
> > function N times, thus send N ICMP unreach control messages and that the
> > program can return from the BPF filter with SK_PASS leading to a
> > potential confusing situation where the TCP connection was established
> > while the client received ICMP_DEST_UNREACH messages.
> > 
> > Another more sophisticated design idea would be for the kfunc to set the
> > kernel to send an ICMP_HOST_UNREACH control message with the appropriate
> > code when the cgroup_skb program terminates with SK_DROP. Creating a new
> > 'SK_REJECT' return code for cgroup_skb program was generally rejected
> > and would be too limited for other program types support.
> > 
> > We should bear in mind that we want to add a TCP reset kfunc next and
> > also could extend this kfunc to other program types if wanted.
> 
> Some high level questions.
> 
> Which other program types do you need this kfunc to send icmp and the future
> tcp rst?

I don't really know, I mostly need this in cgroup_skb for my use case
but I could see other programs type using this either for simplification
(for progs that can already rewrite the packet, like tc) or other
programs types like cgroup_skb, because they can't touch the packet
themselves.

> 
> This cover letter mentioned sending icmp unreach is easier than sending tcp
> rst. What problems do you see in sending tcp rst?
> 

Yes, I based these patches on what net/ipv4/netfilter/ipt_REJECT.c's
'reject_tg' function does. In the case of sending ICMP unreach
'nf_send_unreach', the routing step is quite straighforward as they are
only inverting the daddr and the saddr (that's what my renamed/moved
ip_route_reply_fetch_dst helper does).

In the case of sending RST 'nf_send_reset', there are extra steps, first
the same routing mechanism is done by just inverting the daddr and the
saddr but later 'ip_route_me_harder' is called which is doing a lot
more. I'm currently not sure which parts of this must be ported to work
in our BPF use case so I wanted to start with unreach.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-29  1:05               ` Martin KaFai Lau
@ 2025-07-29 10:06                 ` Mahe Tardy
  2025-07-29 23:13                   ` Martin KaFai Lau
  0 siblings, 1 reply; 36+ messages in thread
From: Mahe Tardy @ 2025-07-29 10:06 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On Mon, Jul 28, 2025 at 06:05:26PM -0700, Martin KaFai Lau wrote:
> On 7/28/25 2:43 AM, Mahe Tardy wrote:
> > This is needed in the context of Tetragon to provide improved feedback
> > (in contrast to just dropping packets) to east-west traffic when blocked
> > by policies using cgroup_skb programs.
> > 
> > This reuse concepts from netfilter reject target codepath with the
> > differences that:
> > * Packets are cloned since the BPF user can still return SK_PASS from
> >    the cgroup_skb progs and the current skb need to stay untouched
> 
> This needs more details. Which field(s) of the skb are changed by the kfunc,
> the skb_dst_set in ip[6]_route_reply_fetch_dst() and/or the code path in the
> icmp[v6]_send() ?

Okay I can add that: "ip[6]_route_reply_fetch_dst set the dst of the skb
by using the saddr as a daddr and routing it", I don't think
icmp[v6]_send touches the skb?

> 
> >    (cgroup_skb hooks only allow read-only skb payload).
> > * Since cgroup_skb programs are called late in the stack, checksums do
> >    not need to be computed or verified, and IPv4 fragmentation does not
> >    need to be checked (ip_local_deliver should take care of that
> >    earlier).
> > 
> > Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> > ---
> >   net/core/filter.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 61 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7a72f766aacf..050872324575 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -85,6 +85,10 @@
> >   #include <linux/un.h>
> >   #include <net/xdp_sock_drv.h>
> >   #include <net/inet_dscp.h>
> > +#include <linux/icmp.h>
> > +#include <net/icmp.h>
> > +#include <net/route.h>
> > +#include <net/ip6_route.h>
> > 
> >   #include "dev.h"
> > 
> > @@ -12148,6 +12152,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> >   	return 0;
> >   }
> > 
> > +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> > +{
> > +	struct sk_buff *skb = (struct sk_buff *)__skb;
> > +	struct sk_buff *nskb;
> > +
> > +	switch (skb->protocol) {
> > +	case htons(ETH_P_IP):
> > +		if (code < 0 || code > NR_ICMP_UNREACH)
> > +			return -EINVAL;
> > +
> > +		nskb = skb_clone(skb, GFP_ATOMIC);
> > +		if (!nskb)
> > +			return -ENOMEM;
> > +
> > +		if (ip_route_reply_fetch_dst(nskb) < 0) {
> > +			kfree_skb(nskb);
> > +			return -EHOSTUNREACH;
> > +		}
> > +
> > +		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> > +		kfree_skb(nskb);
> > +		break;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +	case htons(ETH_P_IPV6):
> > +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > +			return -EINVAL;
> > +
> > +		nskb = skb_clone(skb, GFP_ATOMIC);
> > +		if (!nskb)
> > +			return -ENOMEM;
> > +
> > +		if (ip6_route_reply_fetch_dst(nskb) < 0) {
> 
> From a very quick look at icmpv6_send(), it does its own route lookup. I
> haven't looked at the v4 yet.
> 
> I am likely missing some details. Can you explain why it needs to do a
> lookup before calling icmpv6_send()?

From my understanding, I need to do this to invert the daddr with the
saddr to send the unreach message back to the sender.

> 
> > +			kfree_skb(nskb);
> > +			return -EHOSTUNREACH;
> > +		}
> > +
> > +		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> > +		kfree_skb(nskb);
> > +		break;
> > +#endif
> > +	default:
> > +		return -EPROTONOSUPPORT;
> > +	}
> > +
> > +	return SK_DROP;
> > +}
> > +

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
  2025-07-29 10:06                 ` Mahe Tardy
@ 2025-07-29 23:13                   ` Martin KaFai Lau
  0 siblings, 0 replies; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-29 23:13 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/29/25 3:06 AM, Mahe Tardy wrote:
> On Mon, Jul 28, 2025 at 06:05:26PM -0700, Martin KaFai Lau wrote:
>> On 7/28/25 2:43 AM, Mahe Tardy wrote:
>>> This is needed in the context of Tetragon to provide improved feedback
>>> (in contrast to just dropping packets) to east-west traffic when blocked
>>> by policies using cgroup_skb programs.
>>>
>>> This reuse concepts from netfilter reject target codepath with the
>>> differences that:
>>> * Packets are cloned since the BPF user can still return SK_PASS from
>>>     the cgroup_skb progs and the current skb need to stay untouched
>>
>> This needs more details. Which field(s) of the skb are changed by the kfunc,
>> the skb_dst_set in ip[6]_route_reply_fetch_dst() and/or the code path in the
>> icmp[v6]_send() ?
> 
> Okay I can add that: "ip[6]_route_reply_fetch_dst set the dst of the skb
> by using the saddr as a daddr and routing it", I don't think
> icmp[v6]_send touches the skb?

I also don't think icmp[v6]_send touches the skb. I am still not sure if 
ip[6]_route_reply_fetch_dst is needed.

> 
>>
>>>     (cgroup_skb hooks only allow read-only skb payload).
>>> * Since cgroup_skb programs are called late in the stack, checksums do
>>>     not need to be computed or verified, and IPv4 fragmentation does not
>>>     need to be checked (ip_local_deliver should take care of that
>>>     earlier).
>>>
>>> Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
>>> ---
>>>    net/core/filter.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 61 insertions(+)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 7a72f766aacf..050872324575 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -85,6 +85,10 @@
>>>    #include <linux/un.h>
>>>    #include <net/xdp_sock_drv.h>
>>>    #include <net/inet_dscp.h>
>>> +#include <linux/icmp.h>
>>> +#include <net/icmp.h>
>>> +#include <net/route.h>
>>> +#include <net/ip6_route.h>
>>>
>>>    #include "dev.h"
>>>
>>> @@ -12148,6 +12152,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
>>>    	return 0;
>>>    }
>>>
>>> +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
>>> +{
>>> +	struct sk_buff *skb = (struct sk_buff *)__skb;
>>> +	struct sk_buff *nskb;
>>> +
>>> +	switch (skb->protocol) {
>>> +	case htons(ETH_P_IP):
>>> +		if (code < 0 || code > NR_ICMP_UNREACH)
>>> +			return -EINVAL;
>>> +
>>> +		nskb = skb_clone(skb, GFP_ATOMIC);
>>> +		if (!nskb)
>>> +			return -ENOMEM;
>>> +
>>> +		if (ip_route_reply_fetch_dst(nskb) < 0) {
>>> +			kfree_skb(nskb);
>>> +			return -EHOSTUNREACH;
>>> +		}
>>> +
>>> +		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
>>> +		kfree_skb(nskb);
>>> +		break;
>>> +#if IS_ENABLED(CONFIG_IPV6)
>>> +	case htons(ETH_P_IPV6):
>>> +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
>>> +			return -EINVAL;
>>> +
>>> +		nskb = skb_clone(skb, GFP_ATOMIC);
>>> +		if (!nskb)
>>> +			return -ENOMEM;
>>> +
>>> +		if (ip6_route_reply_fetch_dst(nskb) < 0) {
>>
>>  From a very quick look at icmpv6_send(), it does its own route lookup. I
>> haven't looked at the v4 yet.
>>
>> I am likely missing some details. Can you explain why it needs to do a
>> lookup before calling icmpv6_send()?
> 
>  From my understanding, I need to do this to invert the daddr with the
> saddr to send the unreach message back to the sender.

 From looking at how fl6.{daddr,saddr} are filled and passed to 
icmpv6_route_lookup in icmpv6_send(), the icmpv6_send() should have done the 
reverse/invert route lookup. I also don't see icmpv6_send uses the skb_dst() of 
the original skb. Did I misread the code? The kfunc does not work without 
ip[6]_route_reply_fetch_dst()? Again, I have not checked the v4 icmp_send. fwiw, 
the selftest should have both v4 and v6 test.

Note that at cgroup/egress, the skb->_skb_refdst should have been set.

The same should be true for cgroup/ingress for inet proto but it seems 
BPF_CGROUP_RUN_PROG_"INET"_INGRESS is not called from INET only now. e.g. 
sk_filter() can be called from af_netlink. It seems like there is a bug.

> 
>>
>>> +			kfree_skb(nskb);
>>> +			return -EHOSTUNREACH;
>>> +		}
>>> +
>>> +		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
>>> +		kfree_skb(nskb);
>>> +		break;
>>> +#endif
>>> +	default:
>>> +		return -EPROTONOSUPPORT;
>>> +	}
>>> +
>>> +	return SK_DROP;
>>> +}
>>> +
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-29  9:09                 ` Mahe Tardy
@ 2025-07-29 23:27                   ` Martin KaFai Lau
  2025-07-30  0:01                     ` Martin KaFai Lau
  0 siblings, 1 reply; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-29 23:27 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/29/25 2:09 AM, Mahe Tardy wrote:
> On Mon, Jul 28, 2025 at 06:18:11PM -0700, Martin KaFai Lau wrote:
>> On 7/28/25 2:43 AM, Mahe Tardy wrote:
>>> +SEC("cgroup_skb/egress")
>>> +int egress(struct __sk_buff *skb)
>>> +{
>>> +	void *data = (void *)(long)skb->data;
>>> +	void *data_end = (void *)(long)skb->data_end;
>>> +	struct iphdr *iph;
>>> +	struct tcphdr *tcph;
>>> +
>>> +	iph = data;
>>> +	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
>>> +	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
>>> +		return SK_PASS;
>>> +
>>> +	tcph = (void *)iph + iph->ihl * 4;
>>> +	if ((void *)(tcph + 1) > data_end ||
>>> +	    tcph->dest != bpf_htons(SERVER_PORT))
>>> +		return SK_PASS;
>>> +
>>> +	kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
>>> +
>>> +	/* returns SK_PASS to execute the test case quicker */
>>
>> Do you know why the user space is slower if 0 (SK_DROP) is used?
> 
> I tried to write my understanding of this in the commit description:
> 
> "Note that the BPF program returns SK_PASS to let the connection being
> established to finish the test cases quicker. Otherwise, you have to
> wait for the TCP three-way handshake to timeout in the kernel and
> retrieve the errno translated from the unreach code set by the ICMP
> control message."

This feels like a bit hacky to let the 3WHS finished while the objective of the 
patch set is to drop it. It is not unusual for people to directly borrow this 
code. Does non blocking connect() help?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-29 23:27                   ` Martin KaFai Lau
@ 2025-07-30  0:01                     ` Martin KaFai Lau
  2025-07-30  0:32                       ` Martin KaFai Lau
  0 siblings, 1 reply; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-30  0:01 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/29/25 4:27 PM, Martin KaFai Lau wrote:
> On 7/29/25 2:09 AM, Mahe Tardy wrote:
>> On Mon, Jul 28, 2025 at 06:18:11PM -0700, Martin KaFai Lau wrote:
>>> On 7/28/25 2:43 AM, Mahe Tardy wrote:
>>>> +SEC("cgroup_skb/egress")
>>>> +int egress(struct __sk_buff *skb)
>>>> +{
>>>> +    void *data = (void *)(long)skb->data;
>>>> +    void *data_end = (void *)(long)skb->data_end;
>>>> +    struct iphdr *iph;
>>>> +    struct tcphdr *tcph;
>>>> +
>>>> +    iph = data;
>>>> +    if ((void *)(iph + 1) > data_end || iph->version != 4 ||
>>>> +        iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
>>>> +        return SK_PASS;
>>>> +
>>>> +    tcph = (void *)iph + iph->ihl * 4;
>>>> +    if ((void *)(tcph + 1) > data_end ||
>>>> +        tcph->dest != bpf_htons(SERVER_PORT))
>>>> +        return SK_PASS;
>>>> +
>>>> +    kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
>>>> +
>>>> +    /* returns SK_PASS to execute the test case quicker */
>>>
>>> Do you know why the user space is slower if 0 (SK_DROP) is used?
>>
>> I tried to write my understanding of this in the commit description:
>>
>> "Note that the BPF program returns SK_PASS to let the connection being
>> established to finish the test cases quicker. Otherwise, you have to
>> wait for the TCP three-way handshake to timeout in the kernel and
>> retrieve the errno translated from the unreach code set by the ICMP
>> control message."
> 
> This feels like a bit hacky to let the 3WHS finished while the objective of the 
> patch set is to drop it. It is not unusual for people to directly borrow this 
> code. Does non blocking connect() help?
> 

After reading more on how sk_err_soft is used, non blocking won't help. I think 
I see why tcp rst is better.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-30  0:01                     ` Martin KaFai Lau
@ 2025-07-30  0:32                       ` Martin KaFai Lau
  0 siblings, 0 replies; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-30  0:32 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/29/25 5:01 PM, Martin KaFai Lau wrote:
> On 7/29/25 4:27 PM, Martin KaFai Lau wrote:
>> On 7/29/25 2:09 AM, Mahe Tardy wrote:
>>> On Mon, Jul 28, 2025 at 06:18:11PM -0700, Martin KaFai Lau wrote:
>>>> On 7/28/25 2:43 AM, Mahe Tardy wrote:
>>>>> +SEC("cgroup_skb/egress")
>>>>> +int egress(struct __sk_buff *skb)
>>>>> +{
>>>>> +    void *data = (void *)(long)skb->data;
>>>>> +    void *data_end = (void *)(long)skb->data_end;
>>>>> +    struct iphdr *iph;
>>>>> +    struct tcphdr *tcph;
>>>>> +
>>>>> +    iph = data;
>>>>> +    if ((void *)(iph + 1) > data_end || iph->version != 4 ||
>>>>> +        iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
>>>>> +        return SK_PASS;
>>>>> +
>>>>> +    tcph = (void *)iph + iph->ihl * 4;
>>>>> +    if ((void *)(tcph + 1) > data_end ||
>>>>> +        tcph->dest != bpf_htons(SERVER_PORT))
>>>>> +        return SK_PASS;
>>>>> +
>>>>> +    kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
>>>>> +
>>>>> +    /* returns SK_PASS to execute the test case quicker */
>>>>
>>>> Do you know why the user space is slower if 0 (SK_DROP) is used?
>>>
>>> I tried to write my understanding of this in the commit description:
>>>
>>> "Note that the BPF program returns SK_PASS to let the connection being
>>> established to finish the test cases quicker. Otherwise, you have to
>>> wait for the TCP three-way handshake to timeout in the kernel and
>>> retrieve the errno translated from the unreach code set by the ICMP
>>> control message."
>>
>> This feels like a bit hacky to let the 3WHS finished while the objective of 
>> the patch set is to drop it. It is not unusual for people to directly borrow 
>> this code. Does non blocking connect() help?
>>
> 
> After reading more on how sk_err_soft is used, non blocking won't help. I think 
> I see why tcp rst is better.
> 

Actually, while replying on the cover letter and looking at tcp_v4_err again, 
there is an exception to do ip_icmp_error for TCP_SYN_SENT, so it may worth a 
try on non blocking connect and then poll the sk for err if you haven't tried 
that before.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
  2025-07-29  9:53               ` Mahe Tardy
@ 2025-07-30  1:54                 ` Martin KaFai Lau
  2025-08-01 18:50                   ` Mahe Tardy
  0 siblings, 1 reply; 36+ messages in thread
From: Martin KaFai Lau @ 2025-07-30  1:54 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On 7/29/25 2:53 AM, Mahe Tardy wrote:
>> Which other program types do you need this kfunc to send icmp and the future
>> tcp rst?
> 
> I don't really know, I mostly need this in cgroup_skb for my use case
> but I could see other programs type using this either for simplification
> (for progs that can already rewrite the packet, like tc) or other
> programs types like cgroup_skb, because they can't touch the packet
> themselves.

I also don't think the tc needs this kfunc either. The tc should already have 
ways to do this now.

> 
>>
>> This cover letter mentioned sending icmp unreach is easier than sending tcp
>> rst. What problems do you see in sending tcp rst?
>>
> 
> Yes, I based these patches on what net/ipv4/netfilter/ipt_REJECT.c's
> 'reject_tg' function does. In the case of sending ICMP unreach
> 'nf_send_unreach', the routing step is quite straighforward as they are
> only inverting the daddr and the saddr (that's what my renamed/moved
> ip_route_reply_fetch_dst helper does).
> 
> In the case of sending RST 'nf_send_reset', there are extra steps, first
> the same routing mechanism is done by just inverting the daddr and the
> saddr but later 'ip_route_me_harder' is called which is doing a lot
> more. I'm currently not sure which parts of this must be ported to work
> in our BPF use case so I wanted to start with unreach.

I don't think we necessarily need to completely borrow from nf, the hooks' 
locations are different and the use case may be different.

A concern that I have is the icmp6_send called by the kfunc. The icmp6_send 
should eventually call to ip6_finish_output which may call the very same 
"cgroup/egress" program again in a recursive way. The same for v4 icmp_send.

The icmp packet is sent from an internal kernel sk. I suspect you will see this 
recursive behavior if the test is done in the default cgroup (/sys/fs/cgroup). I 
think the is_ineligible(skb) should have stopped the second icmpv6_send from 
replying to an icmp error and the cgroup hook cannot change the skb. However, I 
am not sure I want to cross this bridge. Is there a way to avoid the recursive 
bpf prog?



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
  2025-07-30  1:54                 ` Martin KaFai Lau
@ 2025-08-01 18:50                   ` Mahe Tardy
  0 siblings, 0 replies; 36+ messages in thread
From: Mahe Tardy @ 2025-08-01 18:50 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, netdev, netfilter-devel, oe-kbuild-all, pablo,
	lkp

On Tue, Jul 29, 2025 at 06:54:58PM -0700, Martin KaFai Lau wrote:
> On 7/29/25 2:53 AM, Mahe Tardy wrote:
> > > Which other program types do you need this kfunc to send icmp and the future
> > > tcp rst?
> > 
> > I don't really know, I mostly need this in cgroup_skb for my use case
> > but I could see other programs type using this either for simplification
> > (for progs that can already rewrite the packet, like tc) or other
> > programs types like cgroup_skb, because they can't touch the packet
> > themselves.
> 
> I also don't think the tc needs this kfunc either. The tc should already
> have ways to do this now.
> 
> > 
> > > 
> > > This cover letter mentioned sending icmp unreach is easier than sending tcp
> > > rst. What problems do you see in sending tcp rst?
> > > 
> > 
> > Yes, I based these patches on what net/ipv4/netfilter/ipt_REJECT.c's
> > 'reject_tg' function does. In the case of sending ICMP unreach
> > 'nf_send_unreach', the routing step is quite straighforward as they are
> > only inverting the daddr and the saddr (that's what my renamed/moved
> > ip_route_reply_fetch_dst helper does).
> > 
> > In the case of sending RST 'nf_send_reset', there are extra steps, first
> > the same routing mechanism is done by just inverting the daddr and the
> > saddr but later 'ip_route_me_harder' is called which is doing a lot
> > more. I'm currently not sure which parts of this must be ported to work
> > in our BPF use case so I wanted to start with unreach.
> 
> I don't think we necessarily need to completely borrow from nf, the hooks'
> locations are different and the use case may be different.
> 
> A concern that I have is the icmp6_send called by the kfunc. The icmp6_send
> should eventually call to ip6_finish_output which may call the very same
> "cgroup/egress" program again in a recursive way. The same for v4 icmp_send.
> 
> The icmp packet is sent from an internal kernel sk. I suspect you will see
> this recursive behavior if the test is done in the default cgroup
> (/sys/fs/cgroup). I think the is_ineligible(skb) should have stopped the
> second icmpv6_send from replying to an icmp error and the cgroup hook cannot
> change the skb. However, I am not sure I want to cross this bridge. Is there
> a way to avoid the recursive bpf prog?
> 

Thanks Martin for the review. Indeed the recursive BPF prog call is a
concerning issue. I'll take some time to think about it and hopefully
propose something.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests
  2025-07-28  9:43             ` [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
  2025-07-28 15:40               ` Yonghong Song
  2025-07-29  1:18               ` Martin KaFai Lau
@ 2025-08-05 23:26               ` Jordan Rife
  2 siblings, 0 replies; 36+ messages in thread
From: Jordan Rife @ 2025-08-05 23:26 UTC (permalink / raw)
  To: Mahe Tardy
  Cc: lkp, alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
	john.fastabend, martin.lau, netdev, netfilter-devel,
	oe-kbuild-all, pablo

On Mon, Jul 28, 2025 at 09:43:45AM +0000, Mahe Tardy wrote:
> This test opens a server and client, attach a cgroup_skb program on
> egress and calls the icmp_send_unreach function from the client egress
> so that an ICMP unreach control message is sent back to the client.
> It then fetches the message from the error queue to confirm the correct
> ICMP unreach code has been sent.
> 
> Note that the BPF program returns SK_PASS to let the connection being
> established to finish the test cases quicker. Otherwise, you have to
> wait for the TCP three-way handshake to timeout in the kernel and
> retrieve the errno translated from the unreach code set by the ICMP
> control message.
> 
> Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> ---
>  .../bpf/prog_tests/icmp_send_unreach_kfunc.c  | 99 +++++++++++++++++++
>  .../selftests/bpf/progs/icmp_send_unreach.c   | 36 +++++++
>  2 files changed, 135 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
>  create mode 100644 tools/testing/selftests/bpf/progs/icmp_send_unreach.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> new file mode 100644
> index 000000000000..414c1ed8ced3
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +#include <network_helpers.h>
> +#include <linux/errqueue.h>
> +#include "icmp_send_unreach.skel.h"
> +
> +#define TIMEOUT_MS 1000
> +#define SRV_PORT 54321
> +
> +#define ICMP_DEST_UNREACH 3
> +
> +#define ICMP_FRAG_NEEDED 4
> +#define NR_ICMP_UNREACH 15

small nit: Any reason why ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, and
NR_ICMP_UNREACH are redefined here? I think you should just be able to
#include <linux/icmp.h> this at the top to avoid redefining these.

> +
> +static void read_icmp_errqueue(int sockfd, int expected_code)
> +{
> +	ssize_t n;
> +	struct sock_extended_err *sock_err;
> +	struct cmsghdr *cm;
> +	char ctrl_buf[512];
> +	struct msghdr msg = {
> +		.msg_control = ctrl_buf,
> +		.msg_controllen = sizeof(ctrl_buf),
> +	};
> +
> +	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
> +	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
> +		return;
> +
> +	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
> +		if (!ASSERT_EQ(cm->cmsg_level, IPPROTO_IP, "cmsg_type") ||
> +		    !ASSERT_EQ(cm->cmsg_type, IP_RECVERR, "cmsg_level"))
> +			continue;
> +
> +		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
> +
> +		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
> +			       "sock_err_origin_icmp"))
> +			return;
> +		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
> +			       "sock_err_type_dest_unreach"))
> +			return;
> +		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
> +	}
> +}
> +
> +void test_icmp_send_unreach_kfunc(void)
> +{
> +	struct icmp_send_unreach *skel;
> +	int cgroup_fd = -1, client_fd = 1, srv_fd = -1;
> +	int *code;
> +
> +	skel = icmp_send_unreach__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> +		goto cleanup;
> +
> +	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
> +	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
> +		goto cleanup;
> +
> +	skel->links.egress =
> +		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
> +	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
> +		goto cleanup;
> +
> +	code = &skel->bss->unreach_code;
> +
> +	for (*code = 0; *code <= NR_ICMP_UNREACH; (*code)++) {
> +		// The TCP stack reacts differently when asking for
> +		// fragmentation, let's ignore it for now
> +		if (*code == ICMP_FRAG_NEEDED)
> +			continue;
> +
> +		skel->bss->kfunc_ret = -1;
> +
> +		srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1",
> +				      SRV_PORT, TIMEOUT_MS);
> +		if (!ASSERT_GE(srv_fd, 0, "start_server"))
> +			goto for_cleanup;
> +
> +		client_fd = socket(AF_INET, SOCK_STREAM, 0);
> +		ASSERT_GE(client_fd, 0, "client_socket");
> +
> +		client_fd = connect_to_fd(srv_fd, 0);
> +		if (!ASSERT_GE(client_fd, 0, "client_connect"))
> +			goto for_cleanup;
> +
> +		read_icmp_errqueue(client_fd, *code);
> +
> +		ASSERT_EQ(skel->bss->kfunc_ret, SK_DROP, "kfunc_ret");

It might be worth testing that the kfunc returns -EINVAL when the code
is outside the accepted range as well for completeness.

> +for_cleanup:
> +		close(client_fd);
> +		close(srv_fd);
> +	}
> +
> +cleanup:
> +	icmp_send_unreach__destroy(skel);
> +	close(cgroup_fd);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/icmp_send_unreach.c b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
> new file mode 100644
> index 000000000000..15783e5d1d65
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/icmp_send_unreach.c
> @@ -0,0 +1,36 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_endian.h>
> +
> +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> +
> +int unreach_code = 0;
> +int kfunc_ret = 0;
> +
> +#define SERVER_PORT 54321
> +#define SERVER_IP 0x7F000001
> +
> +SEC("cgroup_skb/egress")
> +int egress(struct __sk_buff *skb)
> +{
> +	void *data = (void *)(long)skb->data;
> +	void *data_end = (void *)(long)skb->data_end;
> +	struct iphdr *iph;
> +	struct tcphdr *tcph;
> +
> +	iph = data;
> +	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
> +	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
> +		return SK_PASS;
> +
> +	tcph = (void *)iph + iph->ihl * 4;
> +	if ((void *)(tcph + 1) > data_end ||
> +	    tcph->dest != bpf_htons(SERVER_PORT))
> +		return SK_PASS;
> +
> +	kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
> +
> +	/* returns SK_PASS to execute the test case quicker */
> +	return SK_PASS;
> +}
> --
> 2.34.1
> 

Jordan

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2025-08-05 23:26 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-10 10:26 [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
2025-07-10 10:26 ` [PATCH bpf-next v1 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
2025-07-10 10:26 ` [PATCH bpf-next v1 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
2025-07-10 22:02   ` kernel test robot
2025-07-10 10:26 ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
2025-07-10 16:07   ` Alexei Starovoitov
2025-07-11 10:57     ` Mahe Tardy
2025-07-25 18:53     ` [PATCH bpf-next v1 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
2025-07-25 18:53       ` [PATCH bpf-next v2 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
2025-07-25 18:53       ` [PATCH bpf-next v2 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
2025-07-25 18:53       ` [PATCH bpf-next v2 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
2025-07-27  1:49         ` kernel test robot
2025-07-28  9:43           ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Mahe Tardy
2025-07-28  9:43             ` [PATCH bpf-next v3 1/4] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
2025-07-28  9:43             ` [PATCH bpf-next v3 2/4] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
2025-07-28  9:43             ` [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc Mahe Tardy
2025-07-28 20:10               ` kernel test robot
2025-07-29  1:05               ` Martin KaFai Lau
2025-07-29 10:06                 ` Mahe Tardy
2025-07-29 23:13                   ` Martin KaFai Lau
2025-07-28  9:43             ` [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
2025-07-28 15:40               ` Yonghong Song
2025-07-28 15:59                 ` Mahe Tardy
2025-07-29  1:18               ` Martin KaFai Lau
2025-07-29  9:09                 ` Mahe Tardy
2025-07-29 23:27                   ` Martin KaFai Lau
2025-07-30  0:01                     ` Martin KaFai Lau
2025-07-30  0:32                       ` Martin KaFai Lau
2025-08-05 23:26               ` Jordan Rife
2025-07-29  1:21             ` [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc Martin KaFai Lau
2025-07-29  9:53               ` Mahe Tardy
2025-07-30  1:54                 ` Martin KaFai Lau
2025-08-01 18:50                   ` Mahe Tardy
2025-07-25 18:53       ` [PATCH bpf-next v2 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy
2025-07-11  0:32   ` [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc kernel test robot
2025-07-10 10:26 ` [PATCH bpf-next v1 4/4] selftests/bpf: add icmp_send_unreach kfunc tests Mahe Tardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).