From: Mahe Tardy <mahe.tardy@gmail.com>
To: Jordan Rife <jordan@jrife.io>
Cc: bpf@vger.kernel.org, martin.lau@linux.dev, daniel@iogearbox.net,
john.fastabend@gmail.com, ast@kernel.org, andrii@kernel.org,
yonghong.song@linux.dev, netdev@vger.kernel.org,
netfilter-devel@vger.kernel.org, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Subject: Re: [PATCH bpf-next v6 4/6] selftests/bpf: add bpf_icmp_send kfunc tests
Date: Wed, 20 May 2026 21:15:10 +0200 [thread overview]
Message-ID: <ag4IPnc_KBHid7Af@gmail.com> (raw)
In-Reply-To: <q46w47cqezttz2otf76tncxu7sovhywy2dqwcbajigp64uhbtz@eimvxfeeexuz>
On Mon, May 18, 2026 at 06:34:02PM -0700, Jordan Rife wrote:
> On Mon, May 18, 2026 at 12:28:40PM +0000, Mahe Tardy wrote:
[...]
> > +
> > +static int connect_to_fd_nonblock(int server_fd)
> > +{
> > + struct sockaddr_storage addr;
> > + socklen_t len = sizeof(addr);
> > + int fd, err;
> > +
> > + if (getsockname(server_fd, (struct sockaddr *)&addr, &len))
> > + return -1;
> > +
> > + fd = socket(addr.ss_family, SOCK_STREAM | SOCK_NONBLOCK, 0);
>
> Is the only customization here SOCK_NONBLOCK? Would it be possible to
> simply extend network_helper_opts to make it possible via the existing
> connect_to_fd_opts?
So first I tried this with a very complicated callback to set NONBLOCK
and the RECVERR for IPv6 but unfortunately the code from the helper
looks like this:
if (connect(fd, (const struct sockaddr *)addr, addrlen)) {
log_err("Failed to connect to server");
save_errno_close(fd);
return -1;
}
And thus I can't really check for EINPROGRESS so it doesn't work.
> Alternatively, is it possible to test with SOCK_DGRAM to avoid the
> handshake timeout issue?
So thanks for suggesting this, I didn't think about it since the whole
point of this was originally for TCP connections in my use case. I tried
it, it kinda simplifies the test code however I feel like it kinda
defeat the whole point of the patch set.
In the case of using connected UDP, you need to send a first pkt to
trigger the BPF prog and from my quick test and glance (might be wrong)
send will immediately fail because the cgroup_skb program will make the
kernel return an error to the syscall from the SK_DROP synchronously.
I think the point of this was that when connecting with TCP, in the case
of a drop, the kernel will just blindly retry to connect and sending an
ICMP control message is a way to stop the kernel immediately and give
the user the immediate feedback they were missing blocking on this.
So you can still modify the bpf prog so that it returns SK_PASS but then
it's not a good illustration of the use of this kfunc, as Martin pointed
out originally, also the fact that you use UDP in the first place is not
a good showcase of the utility of this.
Does it make sense?
> > + if (fd < 0)
> > + return -1;
> > +
> > + err = connect(fd, (struct sockaddr *)&addr, len);
> > + if (err < 0 && errno != EINPROGRESS) {
> > + close(fd);
> > + return -1;
> > + }
> > +
> > + return fd;
> > +}
> > +
> > +static void read_icmp_errqueue(int sockfd, int expected_code)
> > +{
> > + ssize_t n;
>
> very small nit: Other functions follow reverse xmas tree order but this
> one doesn't which looks kind of weird.
Ahah ok I can reorganize for reverse xmas tree.
>
> > + struct sock_extended_err *sock_err;
> > + struct cmsghdr *cm;
> > + char ctrl_buf[512];
> > + struct msghdr msg = {
> > + .msg_control = ctrl_buf,
> > + .msg_controllen = sizeof(ctrl_buf),
> > + };
> > + struct pollfd pfd = {
> > + .fd = sockfd,
> > + .events = POLLERR,
> > + };
> > +
> > + if (!ASSERT_GE(poll(&pfd, 1, TIMEOUT_MS), 1, "poll_errqueue"))
> > + return;
> > +
> > + n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
> > + if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
> > + return;
> > +
> > + cm = CMSG_FIRSTHDR(&msg);
> > + if (!ASSERT_NEQ(cm, NULL, "cm_firsthdr_null"))
> > + return;
> > +
> > + for (; cm; cm = CMSG_NXTHDR(&msg, cm)) {
> > + if (cm->cmsg_level != IPPROTO_IP ||
> > + cm->cmsg_type != IP_RECVERR)
> > + continue;
> > +
> > + sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
> > +
> > + if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
> > + "sock_err_origin_icmp"))
> > + return;
> > + if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
> > + "sock_err_type_dest_unreach"))
> > + return;
> > + ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
> > + return;
> > + }
> > +
> > + ASSERT_FAIL("no IP_RECVERR/IPV6_RECVERR control message found");
> > +}
> > +
> > +static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel,
> > + int code)
> > +{
> > + int srv_fd = -1, client_fd = -1;
> > + struct sockaddr_in addr;
> > + socklen_t len = sizeof(addr);
> > +
> > + srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0,
> > + TIMEOUT_MS);
> > + if (!ASSERT_GE(srv_fd, 0, "start_server"))
> > + return;
>
> ASSERT_OK_FD
Sure let's modify all of them thanks for the info!
>
> > +
> > + if (getsockname(srv_fd, (struct sockaddr *)&addr, &len)) {
> > + close(srv_fd);
> > + return;
> > + }
> > + skel->bss->server_port = ntohs(addr.sin_port);
> > + skel->bss->unreach_code = code;
> > +
> > + client_fd = connect_to_fd_nonblock(srv_fd);
> > + if (!ASSERT_GE(client_fd, 0, "client_connect_nonblock")) {
>
> ASSERT_OK_FD
>
> > + close(srv_fd);
> > + return;
> > + }
> > +
> > + /* Skip reading ICMP error queue if code is invalid */
> > + if (code >= 0 && code <= NR_ICMP_UNREACH)
> > + read_icmp_errqueue(client_fd, code);
> > +
>
> Consider doing all cleanup here and just adding a goto label like
> cleanup in test_icmp_send_unreach?
Yeah I thought it was simple enough so that the flow would be natural
like this, but let me reconsider and see.
>
> > + close(client_fd);
> > + close(srv_fd);
> > +}
> > +
> > +void test_icmp_send_unreach(void)
> > +{
> > + struct icmp_send *skel;
> > + int cgroup_fd = -1;
> > +
> > + skel = icmp_send__open_and_load();
> > + if (!ASSERT_OK_PTR(skel, "skel_open"))
> > + goto cleanup;
> > +
> > + cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
> > + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
> > + goto cleanup;
>
> ASSERT_OK_FD
>
> > +
> > + skel->links.egress =
> > + bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
> > + if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
> > + goto cleanup;
> > +
> > + for (int code = 0; code <= NR_ICMP_UNREACH; code++) {
> > + /* The TCP stack reacts differently when asking for
> > + * fragmentation, let's ignore it for now.
> > + */
> > + if (code == ICMP_FRAG_NEEDED)
> > + continue;
> > +
> > + trigger_prog_read_icmp_errqueue(skel, code);
> > + ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
> > + }
> > +
> > + /* Test an invalid code */
> > + trigger_prog_read_icmp_errqueue(skel, -1);
> > + ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
> > +
> > +cleanup:
> > + icmp_send__destroy(skel);
> > + close(cgroup_fd);
>
> if (cgroup_fd != -1)
> close(cgroup_fd);
>
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
> > new file mode 100644
> > index 000000000000..6d0be0a9afe1
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/icmp_send.c
> > @@ -0,0 +1,38 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_endian.h>
> > +
> > +/* 127.0.0.1 in host byte order */
> > +#define SERVER_IP 0x7F000001
> > +
> > +#define ICMP_DEST_UNREACH 3
> > +
> > +__u16 server_port = 0;
> > +int unreach_code = 0;
> > +int kfunc_ret = -1;
> > +
> > +SEC("cgroup_skb/egress")
>
> No test for TC?
Indeed, let me add one.
>
> > +int egress(struct __sk_buff *skb)
> > +{
> > + void *data = (void *)(long)skb->data;
> > + void *data_end = (void *)(long)skb->data_end;
> > + struct iphdr *iph;
> > + struct tcphdr *tcph;
> > +
> > + iph = data;
> > + if ((void *)(iph + 1) > data_end || iph->version != 4 ||
> > + iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
> > + return SK_PASS;
> > +
> > + tcph = (void *)iph + iph->ihl * 4;
> > + if ((void *)(tcph + 1) > data_end ||
> > + tcph->dest != bpf_htons(server_port))
> > + return SK_PASS;
> > +
> > + kfunc_ret = bpf_icmp_send(skb, ICMP_DEST_UNREACH, unreach_code);
> > +
> > + return SK_DROP;
> > +}
> > +
> > +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> > --
> > 2.34.1
> >
>
> Jordan
Mahe
Thanks
next prev parent reply other threads:[~2026-05-20 19:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-18 12:28 [PATCH bpf-next v6 0/4] bpf: add icmp_send kfunc Mahe Tardy
2026-05-18 12:28 ` [PATCH bpf-next v6 1/6] net: move netfilter nf_reject_fill_skb_dst to core ipv4 Mahe Tardy
2026-05-18 13:07 ` bot+bpf-ci
2026-05-18 14:21 ` Mahe Tardy
2026-05-18 12:28 ` [PATCH bpf-next v6 2/6] net: move netfilter nf_reject6_fill_skb_dst to core ipv6 Mahe Tardy
2026-05-18 13:07 ` bot+bpf-ci
2026-05-18 14:22 ` Mahe Tardy
2026-05-18 12:28 ` [PATCH bpf-next v6 3/6] bpf: add bpf_icmp_send kfunc Mahe Tardy
2026-05-18 13:34 ` bot+bpf-ci
2026-05-18 14:26 ` Mahe Tardy
2026-05-18 16:17 ` Stanislav Fomichev
2026-05-18 17:18 ` Mahe Tardy
2026-05-19 21:20 ` Stanislav Fomichev
2026-05-19 1:33 ` Jordan Rife
2026-05-20 18:48 ` Mahe Tardy
2026-05-18 12:28 ` [PATCH bpf-next v6 4/6] selftests/bpf: add bpf_icmp_send kfunc tests Mahe Tardy
2026-05-19 1:34 ` Jordan Rife
2026-05-20 19:15 ` Mahe Tardy [this message]
2026-05-18 12:28 ` [PATCH bpf-next v6 5/6] selftests/bpf: add bpf_icmp_send kfunc IPv6 tests Mahe Tardy
2026-05-18 13:21 ` bot+bpf-ci
2026-05-18 14:27 ` Mahe Tardy
2026-05-18 12:28 ` [PATCH bpf-next v6 6/6] selftests/bpf: add bpf_icmp_send recursion test Mahe Tardy
2026-05-18 13:07 ` bot+bpf-ci
2026-05-18 14:39 ` Mahe Tardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ag4IPnc_KBHid7Af@gmail.com \
--to=mahe.tardy@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=edumazet@google.com \
--cc=john.fastabend@gmail.com \
--cc=jordan@jrife.io \
--cc=kuba@kernel.org \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox