[PATCH bpf 0/2] support SKF_NET_OFF and SKF_LL

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH bpf 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags
@ 2025-04-03 14:07 Willem de Bruijn
  2025-04-03 14:07 ` [PATCH bpf 1/2] bpf: " Willem de Bruijn
  2025-04-03 14:07 ` [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
  0 siblings, 2 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 14:07 UTC (permalink / raw)
  To: bpf; +Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Address a longstanding issue that may lead to missed packets
depending on system configuration.

Ensure that reading from packet contents works regardless of skb
geometry, also when using the special SKF_.. negative offsets to
offset from L2 or L3 header.

Patch 2 is the selftest for the fix.

Willem de Bruijn (2):
  bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  selftests/net: test sk_filter support for SKF_NET_OFF on frags

 include/linux/filter.h                     |   3 -
 kernel/bpf/core.c                          |  21 --
 net/core/filter.c                          |  75 ++++---
 tools/testing/selftests/net/.gitignore     |   1 +
 tools/testing/selftests/net/Makefile       |   2 +
 tools/testing/selftests/net/skf_net_off.c  | 244 +++++++++++++++++++++
 tools/testing/selftests/net/skf_net_off.sh |  28 +++
 7 files changed, 317 insertions(+), 57 deletions(-)
 create mode 100644 tools/testing/selftests/net/skf_net_off.c
 create mode 100755 tools/testing/selftests/net/skf_net_off.sh

-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 14:07 [PATCH bpf 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn
@ 2025-04-03 14:07 ` Willem de Bruijn
  2025-04-03 14:18   ` Willem de Bruijn
  2025-04-03 17:21   ` Stanislav Fomichev
  2025-04-03 14:07 ` [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
  1 sibling, 2 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 14:07 UTC (permalink / raw)
  To: bpf
  Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn,
	Matt Moeller, Maciej Żenczykowski

From: Willem de Bruijn <willemb@google.com>

Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
read when these offsets extend into frags.

This has been observed with iwlwifi and reproduced with tun with
IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
applied to a RAW socket, will silently miss matching packets.

    const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
    const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
    struct sock_filter filter_code[] = {
            BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
            BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
            BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
            BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
            BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),

This is unexpected behavior. Socket filter programs should be
consistent regardless of environment. Silent misses are
particularly concerning as hard to detect.

Use skb_copy_bits for offsets outside linear, same as done for
non-SKF_(LL|NET) offsets.

Offset is always positive after subtracting the reference threshold
SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
the two is an offset against skb->data, and may be negative, but it
cannot point before skb->head, as skb_(mac|network)_offset would too.

This appears to go back to when frag support was introduced to
sk_run_filter in linux-2.4.4, before the introduction of git.

The amount of code change and 8/16/32 bit duplication are unfortunate.
But any attempt I made to be smarter saved very few LoC while
complicating the code.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
Reported-by: Matt Moeller <moeller.matt@gmail.com>
Co-developed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/filter.h |  3 --
 kernel/bpf/core.c      | 21 ------------
 net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
 3 files changed, 42 insertions(+), 57 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f5cf4d35d83e..708ac7e0cd36 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
 	}
 }
 
-void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
-					   int k, unsigned int size);
-
 static inline int bpf_tell_extensions(void)
 {
 	return SKF_AD_MAX;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ba6b6118cf50..0e836b5ac9a0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -68,27 +68,6 @@
 struct bpf_mem_alloc bpf_global_ma;
 bool bpf_global_ma_set;
 
-/* No hurry in this branch
- *
- * Exported for the bpf jit load helper.
- */
-void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
-{
-	u8 *ptr = NULL;
-
-	if (k >= SKF_NET_OFF) {
-		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
-	} else if (k >= SKF_LL_OFF) {
-		if (unlikely(!skb_mac_header_was_set(skb)))
-			return NULL;
-		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
-	}
-	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
-		return ptr;
-
-	return NULL;
-}
-
 /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
 enum page_size_enum {
 	__PAGE_SIZE = PAGE_SIZE
diff --git a/net/core/filter.c b/net/core/filter.c
index bc6828761a47..b232b70dd10d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
 BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	u8 tmp, *ptr;
+	u8 tmp;
 	const int len = sizeof(tmp);
 
-	if (offset >= 0) {
-		if (headlen - offset >= len)
-			return *(u8 *)(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return tmp;
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return *(u8 *)ptr;
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return *(u8 *)(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return tmp;
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
@@ -248,21 +251,24 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	__be16 tmp, *ptr;
+	__be16 tmp;
 	const int len = sizeof(tmp);
 
-	if (offset >= 0) {
-		if (headlen - offset >= len)
-			return get_unaligned_be16(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return be16_to_cpu(tmp);
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return get_unaligned_be16(ptr);
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return get_unaligned_be16(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return be16_to_cpu(tmp);
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
@@ -275,21 +281,24 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	__be32 tmp, *ptr;
+	__be32 tmp;
 	const int len = sizeof(tmp);
 
-	if (likely(offset >= 0)) {
-		if (headlen - offset >= len)
-			return get_unaligned_be32(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return be32_to_cpu(tmp);
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return get_unaligned_be32(ptr);
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return get_unaligned_be32(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return be32_to_cpu(tmp);
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,
-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags
  2025-04-03 14:07 [PATCH bpf 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn
  2025-04-03 14:07 ` [PATCH bpf 1/2] bpf: " Willem de Bruijn
@ 2025-04-03 14:07 ` Willem de Bruijn
  2025-04-03 17:29   ` Stanislav Fomichev
  1 sibling, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 14:07 UTC (permalink / raw)
  To: bpf; +Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Verify that a classic BPF linux socket filter correctly matches
packet contents. Including when accessing contents in an
skb_frag.

1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000.
2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags.
3. Send a packet for which the UDP header is in frag[0].
4. Receive this packet to demonstrate that the socket accepted it.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/.gitignore     |   1 +
 tools/testing/selftests/net/Makefile       |   2 +
 tools/testing/selftests/net/skf_net_off.c  | 244 +++++++++++++++++++++
 tools/testing/selftests/net/skf_net_off.sh |  28 +++
 4 files changed, 275 insertions(+)
 create mode 100644 tools/testing/selftests/net/skf_net_off.c
 create mode 100755 tools/testing/selftests/net/skf_net_off.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 679542f565a4..532bb732bc6d 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -39,6 +39,7 @@ scm_rights
 sk_bind_sendto_listen
 sk_connect_zero_addr
 sk_so_peek_off
+skf_net_off
 socket
 so_incoming_cpu
 so_netns_cookie
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 6d718b478ed8..124078b56fa4 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -106,6 +106,8 @@ TEST_PROGS += ipv6_route_update_soft_lockup.sh
 TEST_PROGS += busy_poll_test.sh
 TEST_GEN_PROGS += proc_net_pktgen
 TEST_PROGS += lwt_dst_cache_ref_loop.sh
+TEST_PROGS += skf_net_off.sh
+TEST_GEN_FILES += skf_net_off
 
 # YNL files, must be before "include ..lib.mk"
 YNL_GEN_FILES := busy_poller netlink-dumps
diff --git a/tools/testing/selftests/net/skf_net_off.c b/tools/testing/selftests/net/skf_net_off.c
new file mode 100644
index 000000000000..1fdf61d6cd7f
--- /dev/null
+++ b/tools/testing/selftests/net/skf_net_off.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Open a tun device.
+ *
+ * [modifications: use IFF_NAPI_FRAGS, add sk filter]
+ *
+ * Expects the device to have been configured previously, e.g.:
+ *   sudo ip tuntap add name tap1 mode tap
+ *   sudo ip link set tap1 up
+ *   sudo ip link set dev tap1 addr 02:00:00:00:00:01
+ *   sudo ip -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
+ *
+ * And to avoid premature pskb_may_pull:
+ *
+ *   sudo ethtool -K tap1 gro off
+ *   sudo bash -c 'echo 0 > /proc/sys/net/ipv4/ip_early_demux'
+ */
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <linux/filter.h>
+#include <linux/if.h>
+#include <linux/if_packet.h>
+#include <linux/if_tun.h>
+#include <linux/ipv6.h>
+#include <netinet/if_ether.h>
+#include <netinet/in.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/udp.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/poll.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <unistd.h>
+
+static bool cfg_do_filter;
+static bool cfg_do_frags;
+static int cfg_dst_port = 8000;
+static char *cfg_ifname;
+
+static int tun_open(const char *tun_name)
+{
+	struct ifreq ifr = {0};
+	int fd, ret;
+
+	fd = open("/dev/net/tun", O_RDWR);
+	if (fd == -1)
+		error(1, errno, "open /dev/net/tun");
+
+	ifr.ifr_flags = IFF_TAP;
+	if (cfg_do_frags)
+		ifr.ifr_flags |= IFF_NAPI | IFF_NAPI_FRAGS;
+
+	strncpy(ifr.ifr_name, tun_name, IFNAMSIZ - 1);
+
+	ret = ioctl(fd, TUNSETIFF, &ifr);
+	if (ret)
+		error(1, ret, "ioctl TUNSETIFF");
+
+	return fd;
+}
+
+static void sk_set_filter(int fd)
+{
+	const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
+	const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
+
+	/* Filter UDP packets with destination port cfg_dst_port */
+	struct sock_filter filter_code[] = {
+		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
+		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
+		BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, cfg_dst_port, 1, 0),
+		BPF_STMT(BPF_RET + BPF_K, 0),
+		BPF_STMT(BPF_RET + BPF_K, 0xFFFF),
+	};
+
+	struct sock_fprog filter = {
+		sizeof(filter_code) / sizeof(filter_code[0]),
+		filter_code,
+	};
+
+	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)))
+		error(1, errno, "setsockopt attach filter");
+}
+
+static int raw_open(void)
+{
+	int fd;
+
+	fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
+	if (fd == -1)
+		error(1, errno, "socket raw (udp)");
+
+	if (cfg_do_filter)
+		sk_set_filter(fd);
+
+	return fd;
+}
+
+static void tun_write(int fd)
+{
+	const char eth_src[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 };
+	const char eth_dst[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
+	struct tun_pi pi = {0};
+	struct ipv6hdr ip6h = {0};
+	struct udphdr uh = {0};
+	struct ethhdr eth = {0};
+	uint32_t payload;
+	struct iovec iov[5];
+	int ret;
+
+	pi.proto = htons(ETH_P_IPV6);
+
+	memcpy(eth.h_source, eth_src, sizeof(eth_src));
+	memcpy(eth.h_dest, eth_dst, sizeof(eth_dst));
+	eth.h_proto = htons(ETH_P_IPV6);
+
+	ip6h.version = 6;
+	ip6h.payload_len = htons(sizeof(uh) + sizeof(uint32_t));
+	ip6h.nexthdr = IPPROTO_UDP;
+	ip6h.hop_limit = 8;
+	if (inet_pton(AF_INET6, "fdab::2", &ip6h.saddr) != 1)
+		error(1, errno, "inet_pton src");
+	if (inet_pton(AF_INET6, "fdab::1", &ip6h.daddr) != 1)
+		error(1, errno, "inet_pton src");
+
+	uh.source = htons(8000);
+	uh.dest = htons(cfg_dst_port);
+	uh.len = ip6h.payload_len;
+	uh.check = 0;
+
+	payload = htonl(0xABABABAB);		/* Covered in IPv6 length */
+
+	iov[0].iov_base = &pi;
+	iov[0].iov_len  = sizeof(pi);
+	iov[1].iov_base = &eth;
+	iov[1].iov_len  = sizeof(eth);
+	iov[2].iov_base = &ip6h;
+	iov[2].iov_len  = sizeof(ip6h);
+	iov[3].iov_base = &uh;
+	iov[3].iov_len  = sizeof(uh);
+	iov[4].iov_base = &payload;
+	iov[4].iov_len  = sizeof(payload);
+
+	ret = writev(fd, iov, sizeof(iov) / sizeof(iov[0]));
+	if (ret <= 0)
+		error(1, errno, "writev");
+}
+
+static void raw_read(int fd)
+{
+	struct timeval tv = { .tv_usec = 100 * 1000 };
+	struct msghdr msg = {0};
+	struct iovec iov[2];
+	struct udphdr uh;
+	uint32_t payload[2];
+	int ret;
+
+	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
+		error(1, errno, "setsockopt rcvtimeo udp");
+
+	iov[0].iov_base = &uh;
+	iov[0].iov_len = sizeof(uh);
+
+	iov[1].iov_base = payload;
+	iov[1].iov_len = sizeof(payload);
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]);
+
+	ret = recvmsg(fd, &msg, 0);
+	if (ret <= 0)
+		error(1, errno, "read raw");
+	if (ret != sizeof(uh) + sizeof(payload[0]))
+		error(1, errno, "read raw: len=%d\n", ret);
+
+	fprintf(stderr, "raw recv: 0x%x\n", payload[0]);
+}
+
+static void parse_opts(int argc, char **argv)
+{
+	int c;
+
+	while ((c = getopt(argc, argv, "fFi:")) != -1) {
+		switch (c) {
+		case 'f':
+			cfg_do_filter = true;
+			printf("bpf filter enabled\n");
+			break;
+		case 'F':
+			cfg_do_frags = true;
+			printf("napi frags mode enabled\n");
+			break;
+		case 'i':
+			cfg_ifname = optarg;
+			break;
+		default:
+			error(1, 0, "unknown option %c", optopt);
+			break;
+		}
+	}
+
+	if (!cfg_ifname)
+		error(1, 0, "must specify tap interface name (-i)");
+}
+
+int main(int argc, char **argv)
+{
+	int fdt, fdr;
+
+	parse_opts(argc, argv);
+
+	fdr = raw_open();
+	fdt = tun_open(cfg_ifname);
+
+	tun_write(fdt);
+	raw_read(fdr);
+
+	if (close(fdt))
+		error(1, errno, "close tun");
+	if (close(fdr))
+		error(1, errno, "close udp");
+
+	fprintf(stderr, "OK\n");
+	return 0;
+}
+
diff --git a/tools/testing/selftests/net/skf_net_off.sh b/tools/testing/selftests/net/skf_net_off.sh
new file mode 100755
index 000000000000..e9cce93a0258
--- /dev/null
+++ b/tools/testing/selftests/net/skf_net_off.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+readonly NS="ns-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	ip netns del $NS
+}
+
+ip netns add $NS
+trap cleanup EXIT
+
+ip -netns $NS link set lo up
+ip -netns $NS tuntap add name tap1 mode tap
+ip -netns $NS link set tap1 up
+ip -netns $NS link set dev tap1 addr 02:00:00:00:00:01
+ip -netns $NS -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
+ip netns exec $NS ethtool -K tap1 gro off
+ip netns exec $NS sysctl -w net.ipv4.ip_early_demux=0
+
+echo "no filter"
+ip netns exec $NS ./skf_net_off -i tap1
+
+echo "filter, linear skb (-f)"
+ip netns exec $NS ./skf_net_off -i tap1 -f
+
+echo "filter, fragmented skb (-f) (-F)"
+ip netns exec $NS ./skf_net_off -i tap1 -f -F
-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 14:07 ` [PATCH bpf 1/2] bpf: " Willem de Bruijn
@ 2025-04-03 14:18   ` Willem de Bruijn
  2025-04-03 17:21   ` Stanislav Fomichev
  1 sibling, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 14:18 UTC (permalink / raw)
  To: bpf, daniel
  Cc: netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller,
	Maciej Żenczykowski

Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> read when these offsets extend into frags.
> 
> This has been observed with iwlwifi and reproduced with tun with
> IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> applied to a RAW socket, will silently miss matching packets.
> 
>     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
>     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
>     struct sock_filter filter_code[] = {
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
>             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> 
> This is unexpected behavior. Socket filter programs should be
> consistent regardless of environment. Silent misses are
> particularly concerning as hard to detect.
> 
> Use skb_copy_bits for offsets outside linear, same as done for
> non-SKF_(LL|NET) offsets.
> 
> Offset is always positive after subtracting the reference threshold
> SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> the two is an offset against skb->data, and may be negative, but it
> cannot point before skb->head, as skb_(mac|network)_offset would too.
> 
> This appears to go back to when frag support was introduced to
> sk_run_filter in linux-2.4.4, before the introduction of git.
> 
> The amount of code change and 8/16/32 bit duplication are unfortunate.
> But any attempt I made to be smarter saved very few LoC while
> complicating the code.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/

Let me respond here to the earlier comments in that thread, rather than reopen that.
https://lore.kernel.org/netdev/4a6be957-f932-426a-99bf-7209620f8fa9@iogearbox.net/

As Daniel suggests, a developer can work around the issue in a variety
of ways. And that is what this customer does in the short term. Avoid
SKF_NET_OFF.

But I think this does need to be addressed. SKF_NET_OFF doing what you
expect in many cases, but silently missing matches in some, is just
dangerous. It's not hard to fix: just use skb_copy_bits, like we also
do for regular positive offsets.

The specific instance is due to attaching a filter in the L3 layer,
while inspecting a transport header that will only get pulled by the
L4 layer. Pulling these headers in device drivers is a non-starter.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 14:07 ` [PATCH bpf 1/2] bpf: " Willem de Bruijn
  2025-04-03 14:18   ` Willem de Bruijn
@ 2025-04-03 17:21   ` Stanislav Fomichev
  2025-04-03 17:27     ` Willem de Bruijn
  1 sibling, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-04-03 17:21 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn,
	Matt Moeller, Maciej Żenczykowski

On 04/03, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> read when these offsets extend into frags.
> 
> This has been observed with iwlwifi and reproduced with tun with
> IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> applied to a RAW socket, will silently miss matching packets.
> 
>     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
>     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
>     struct sock_filter filter_code[] = {
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
>             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> 
> This is unexpected behavior. Socket filter programs should be
> consistent regardless of environment. Silent misses are
> particularly concerning as hard to detect.
> 
> Use skb_copy_bits for offsets outside linear, same as done for
> non-SKF_(LL|NET) offsets.
> 
> Offset is always positive after subtracting the reference threshold
> SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> the two is an offset against skb->data, and may be negative, but it
> cannot point before skb->head, as skb_(mac|network)_offset would too.
> 
> This appears to go back to when frag support was introduced to
> sk_run_filter in linux-2.4.4, before the introduction of git.
> 
> The amount of code change and 8/16/32 bit duplication are unfortunate.
> But any attempt I made to be smarter saved very few LoC while
> complicating the code.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> Reported-by: Matt Moeller <moeller.matt@gmail.com>
> Co-developed-by: Maciej Żenczykowski <maze@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  include/linux/filter.h |  3 --
>  kernel/bpf/core.c      | 21 ------------
>  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
>  3 files changed, 42 insertions(+), 57 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index f5cf4d35d83e..708ac7e0cd36 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
>  	}
>  }
>  
> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> -					   int k, unsigned int size);
> -
>  static inline int bpf_tell_extensions(void)
>  {
>  	return SKF_AD_MAX;
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index ba6b6118cf50..0e836b5ac9a0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -68,27 +68,6 @@
>  struct bpf_mem_alloc bpf_global_ma;
>  bool bpf_global_ma_set;
>  
> -/* No hurry in this branch
> - *
> - * Exported for the bpf jit load helper.
> - */
> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> -{
> -	u8 *ptr = NULL;
> -
> -	if (k >= SKF_NET_OFF) {
> -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> -	} else if (k >= SKF_LL_OFF) {
> -		if (unlikely(!skb_mac_header_was_set(skb)))
> -			return NULL;
> -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> -	}
> -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> -		return ptr;
> -
> -	return NULL;
> -}
> -
>  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
>  enum page_size_enum {
>  	__PAGE_SIZE = PAGE_SIZE
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bc6828761a47..b232b70dd10d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
>  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
>  	   data, int, headlen, int, offset)
>  {
> -	u8 tmp, *ptr;
> +	u8 tmp;
>  	const int len = sizeof(tmp);
>  
> -	if (offset >= 0) {
> -		if (headlen - offset >= len)
> -			return *(u8 *)(data + offset);
> -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> -			return tmp;
> -	} else {
> -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> -		if (likely(ptr))
> -			return *(u8 *)ptr;

[..]

> +	if (offset < 0) {
> +		if (offset >= SKF_NET_OFF)
> +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> +		else
> +			return -EFAULT;
>  	}

nit: we now repeat the same logic three times, maybe still worth it to put it
into a helper? bpf_resolve_classic_offset or something. 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 17:21   ` Stanislav Fomichev
@ 2025-04-03 17:27     ` Willem de Bruijn
  2025-04-03 17:35       ` Stanislav Fomichev
  0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 17:27 UTC (permalink / raw)
  To: Stanislav Fomichev, Willem de Bruijn
  Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn,
	Matt Moeller, Maciej Żenczykowski

Stanislav Fomichev wrote:
> On 04/03, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> > 
> > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > read when these offsets extend into frags.
> > 
> > This has been observed with iwlwifi and reproduced with tun with
> > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > applied to a RAW socket, will silently miss matching packets.
> > 
> >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> >     struct sock_filter filter_code[] = {
> >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > 
> > This is unexpected behavior. Socket filter programs should be
> > consistent regardless of environment. Silent misses are
> > particularly concerning as hard to detect.
> > 
> > Use skb_copy_bits for offsets outside linear, same as done for
> > non-SKF_(LL|NET) offsets.
> > 
> > Offset is always positive after subtracting the reference threshold
> > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > the two is an offset against skb->data, and may be negative, but it
> > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > 
> > This appears to go back to when frag support was introduced to
> > sk_run_filter in linux-2.4.4, before the introduction of git.
> > 
> > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > But any attempt I made to be smarter saved very few LoC while
> > complicating the code.
> > 
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  include/linux/filter.h |  3 --
> >  kernel/bpf/core.c      | 21 ------------
> >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> >  3 files changed, 42 insertions(+), 57 deletions(-)
> > 
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f5cf4d35d83e..708ac7e0cd36 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> >  	}
> >  }
> >  
> > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > -					   int k, unsigned int size);
> > -
> >  static inline int bpf_tell_extensions(void)
> >  {
> >  	return SKF_AD_MAX;
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index ba6b6118cf50..0e836b5ac9a0 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -68,27 +68,6 @@
> >  struct bpf_mem_alloc bpf_global_ma;
> >  bool bpf_global_ma_set;
> >  
> > -/* No hurry in this branch
> > - *
> > - * Exported for the bpf jit load helper.
> > - */
> > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > -{
> > -	u8 *ptr = NULL;
> > -
> > -	if (k >= SKF_NET_OFF) {
> > -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > -	} else if (k >= SKF_LL_OFF) {
> > -		if (unlikely(!skb_mac_header_was_set(skb)))
> > -			return NULL;
> > -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > -	}
> > -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > -		return ptr;
> > -
> > -	return NULL;
> > -}
> > -
> >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> >  enum page_size_enum {
> >  	__PAGE_SIZE = PAGE_SIZE
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index bc6828761a47..b232b70dd10d 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> >  	   data, int, headlen, int, offset)
> >  {
> > -	u8 tmp, *ptr;
> > +	u8 tmp;
> >  	const int len = sizeof(tmp);
> >  
> > -	if (offset >= 0) {
> > -		if (headlen - offset >= len)
> > -			return *(u8 *)(data + offset);
> > -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > -			return tmp;
> > -	} else {
> > -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > -		if (likely(ptr))
> > -			return *(u8 *)ptr;
> 
> [..]
> 
> > +	if (offset < 0) {
> > +		if (offset >= SKF_NET_OFF)
> > +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> > +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > +		else
> > +			return -EFAULT;
> >  	}
> 
> nit: we now repeat the same logic three times, maybe still worth it to put it
> into a helper? bpf_resolve_classic_offset or something. 

I definitely tried this in various ways. But since the core logic is
only four lines and there is an early return on error, no helper
really simplifies anything. It just adds a layer of indirection and
more code in the end.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags
  2025-04-03 14:07 ` [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
@ 2025-04-03 17:29   ` Stanislav Fomichev
  2025-04-03 18:56     ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-04-03 17:29 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn

On 04/03, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Verify that a classic BPF linux socket filter correctly matches
> packet contents. Including when accessing contents in an
> skb_frag.
> 
> 1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000.
> 2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags.
> 3. Send a packet for which the UDP header is in frag[0].
> 4. Receive this packet to demonstrate that the socket accepted it.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

My (weak) preference is to put (most) bpf-related things under
selftests/bpf, but since you already have it working, not sure
it's worth the effort.

> ---
>  tools/testing/selftests/net/.gitignore     |   1 +
>  tools/testing/selftests/net/Makefile       |   2 +
>  tools/testing/selftests/net/skf_net_off.c  | 244 +++++++++++++++++++++
>  tools/testing/selftests/net/skf_net_off.sh |  28 +++
>  4 files changed, 275 insertions(+)
>  create mode 100644 tools/testing/selftests/net/skf_net_off.c
>  create mode 100755 tools/testing/selftests/net/skf_net_off.sh
> 
> diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
> index 679542f565a4..532bb732bc6d 100644
> --- a/tools/testing/selftests/net/.gitignore
> +++ b/tools/testing/selftests/net/.gitignore
> @@ -39,6 +39,7 @@ scm_rights
>  sk_bind_sendto_listen
>  sk_connect_zero_addr
>  sk_so_peek_off
> +skf_net_off
>  socket
>  so_incoming_cpu
>  so_netns_cookie
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index 6d718b478ed8..124078b56fa4 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -106,6 +106,8 @@ TEST_PROGS += ipv6_route_update_soft_lockup.sh
>  TEST_PROGS += busy_poll_test.sh
>  TEST_GEN_PROGS += proc_net_pktgen
>  TEST_PROGS += lwt_dst_cache_ref_loop.sh
> +TEST_PROGS += skf_net_off.sh
> +TEST_GEN_FILES += skf_net_off
>  
>  # YNL files, must be before "include ..lib.mk"
>  YNL_GEN_FILES := busy_poller netlink-dumps
> diff --git a/tools/testing/selftests/net/skf_net_off.c b/tools/testing/selftests/net/skf_net_off.c
> new file mode 100644
> index 000000000000..1fdf61d6cd7f
> --- /dev/null
> +++ b/tools/testing/selftests/net/skf_net_off.c
> @@ -0,0 +1,244 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/* Open a tun device.
> + *
> + * [modifications: use IFF_NAPI_FRAGS, add sk filter]
> + *
> + * Expects the device to have been configured previously, e.g.:
> + *   sudo ip tuntap add name tap1 mode tap
> + *   sudo ip link set tap1 up
> + *   sudo ip link set dev tap1 addr 02:00:00:00:00:01
> + *   sudo ip -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
> + *
> + * And to avoid premature pskb_may_pull:
> + *
> + *   sudo ethtool -K tap1 gro off
> + *   sudo bash -c 'echo 0 > /proc/sys/net/ipv4/ip_early_demux'
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <error.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <linux/filter.h>
> +#include <linux/if.h>
> +#include <linux/if_packet.h>
> +#include <linux/if_tun.h>
> +#include <linux/ipv6.h>
> +#include <netinet/if_ether.h>
> +#include <netinet/in.h>
> +#include <netinet/ip.h>
> +#include <netinet/ip6.h>
> +#include <netinet/udp.h>
> +#include <poll.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <sys/poll.h>
> +#include <sys/types.h>
> +#include <sys/uio.h>
> +#include <unistd.h>
> +
> +static bool cfg_do_filter;
> +static bool cfg_do_frags;
> +static int cfg_dst_port = 8000;
> +static char *cfg_ifname;
> +
> +static int tun_open(const char *tun_name)
> +{
> +	struct ifreq ifr = {0};
> +	int fd, ret;
> +
> +	fd = open("/dev/net/tun", O_RDWR);
> +	if (fd == -1)
> +		error(1, errno, "open /dev/net/tun");
> +
> +	ifr.ifr_flags = IFF_TAP;
> +	if (cfg_do_frags)
> +		ifr.ifr_flags |= IFF_NAPI | IFF_NAPI_FRAGS;
> +
> +	strncpy(ifr.ifr_name, tun_name, IFNAMSIZ - 1);
> +
> +	ret = ioctl(fd, TUNSETIFF, &ifr);
> +	if (ret)
> +		error(1, ret, "ioctl TUNSETIFF");
> +
> +	return fd;
> +}
> +
> +static void sk_set_filter(int fd)
> +{
> +	const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> +	const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> +
> +	/* Filter UDP packets with destination port cfg_dst_port */
> +	struct sock_filter filter_code[] = {
> +		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> +		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> +		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> +		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> +		BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> +		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, cfg_dst_port, 1, 0),
> +		BPF_STMT(BPF_RET + BPF_K, 0),
> +		BPF_STMT(BPF_RET + BPF_K, 0xFFFF),
> +	};
> +
> +	struct sock_fprog filter = {
> +		sizeof(filter_code) / sizeof(filter_code[0]),
> +		filter_code,
> +	};
> +
> +	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)))
> +		error(1, errno, "setsockopt attach filter");
> +}
> +
> +static int raw_open(void)
> +{
> +	int fd;
> +
> +	fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
> +	if (fd == -1)
> +		error(1, errno, "socket raw (udp)");
> +
> +	if (cfg_do_filter)
> +		sk_set_filter(fd);
> +
> +	return fd;
> +}
> +
> +static void tun_write(int fd)
> +{
> +	const char eth_src[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 };
> +	const char eth_dst[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
> +	struct tun_pi pi = {0};
> +	struct ipv6hdr ip6h = {0};
> +	struct udphdr uh = {0};
> +	struct ethhdr eth = {0};
> +	uint32_t payload;
> +	struct iovec iov[5];
> +	int ret;
> +
> +	pi.proto = htons(ETH_P_IPV6);
> +
> +	memcpy(eth.h_source, eth_src, sizeof(eth_src));
> +	memcpy(eth.h_dest, eth_dst, sizeof(eth_dst));
> +	eth.h_proto = htons(ETH_P_IPV6);
> +
> +	ip6h.version = 6;
> +	ip6h.payload_len = htons(sizeof(uh) + sizeof(uint32_t));
> +	ip6h.nexthdr = IPPROTO_UDP;
> +	ip6h.hop_limit = 8;
> +	if (inet_pton(AF_INET6, "fdab::2", &ip6h.saddr) != 1)
> +		error(1, errno, "inet_pton src");
> +	if (inet_pton(AF_INET6, "fdab::1", &ip6h.daddr) != 1)
> +		error(1, errno, "inet_pton src");
> +
> +	uh.source = htons(8000);
> +	uh.dest = htons(cfg_dst_port);
> +	uh.len = ip6h.payload_len;
> +	uh.check = 0;
> +
> +	payload = htonl(0xABABABAB);		/* Covered in IPv6 length */
> +
> +	iov[0].iov_base = &pi;
> +	iov[0].iov_len  = sizeof(pi);
> +	iov[1].iov_base = &eth;
> +	iov[1].iov_len  = sizeof(eth);
> +	iov[2].iov_base = &ip6h;
> +	iov[2].iov_len  = sizeof(ip6h);
> +	iov[3].iov_base = &uh;
> +	iov[3].iov_len  = sizeof(uh);
> +	iov[4].iov_base = &payload;
> +	iov[4].iov_len  = sizeof(payload);
> +
> +	ret = writev(fd, iov, sizeof(iov) / sizeof(iov[0]));
> +	if (ret <= 0)
> +		error(1, errno, "writev");
> +}
> +
> +static void raw_read(int fd)
> +{
> +	struct timeval tv = { .tv_usec = 100 * 1000 };
> +	struct msghdr msg = {0};
> +	struct iovec iov[2];
> +	struct udphdr uh;
> +	uint32_t payload[2];
> +	int ret;
> +
> +	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
> +		error(1, errno, "setsockopt rcvtimeo udp");
> +
> +	iov[0].iov_base = &uh;
> +	iov[0].iov_len = sizeof(uh);
> +
> +	iov[1].iov_base = payload;
> +	iov[1].iov_len = sizeof(payload);
> +
> +	msg.msg_iov = iov;
> +	msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]);
> +
> +	ret = recvmsg(fd, &msg, 0);
> +	if (ret <= 0)
> +		error(1, errno, "read raw");
> +	if (ret != sizeof(uh) + sizeof(payload[0]))
> +		error(1, errno, "read raw: len=%d\n", ret);
> +
> +	fprintf(stderr, "raw recv: 0x%x\n", payload[0]);
> +}
> +
> +static void parse_opts(int argc, char **argv)
> +{
> +	int c;
> +
> +	while ((c = getopt(argc, argv, "fFi:")) != -1) {
> +		switch (c) {
> +		case 'f':
> +			cfg_do_filter = true;
> +			printf("bpf filter enabled\n");
> +			break;
> +		case 'F':
> +			cfg_do_frags = true;
> +			printf("napi frags mode enabled\n");
> +			break;
> +		case 'i':
> +			cfg_ifname = optarg;
> +			break;
> +		default:
> +			error(1, 0, "unknown option %c", optopt);
> +			break;
> +		}
> +	}
> +
> +	if (!cfg_ifname)
> +		error(1, 0, "must specify tap interface name (-i)");
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int fdt, fdr;
> +
> +	parse_opts(argc, argv);
> +
> +	fdr = raw_open();
> +	fdt = tun_open(cfg_ifname);
> +
> +	tun_write(fdt);
> +	raw_read(fdr);
> +
> +	if (close(fdt))
> +		error(1, errno, "close tun");
> +	if (close(fdr))
> +		error(1, errno, "close udp");
> +
> +	fprintf(stderr, "OK\n");
> +	return 0;
> +}
> +
> diff --git a/tools/testing/selftests/net/skf_net_off.sh b/tools/testing/selftests/net/skf_net_off.sh
> new file mode 100755
> index 000000000000..e9cce93a0258
> --- /dev/null
> +++ b/tools/testing/selftests/net/skf_net_off.sh
> @@ -0,0 +1,28 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +readonly NS="ns-$(mktemp -u XXXXXX)"
> +
> +cleanup() {
> +	ip netns del $NS
> +}
> +
> +ip netns add $NS
> +trap cleanup EXIT
> +
> +ip -netns $NS link set lo up
> +ip -netns $NS tuntap add name tap1 mode tap
> +ip -netns $NS link set tap1 up
> +ip -netns $NS link set dev tap1 addr 02:00:00:00:00:01
> +ip -netns $NS -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
> +ip netns exec $NS ethtool -K tap1 gro off
> +ip netns exec $NS sysctl -w net.ipv4.ip_early_demux=0

Curious: why disable ip_early_demux here?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 17:27     ` Willem de Bruijn
@ 2025-04-03 17:35       ` Stanislav Fomichev
  2025-04-03 18:30         ` Maciej Żenczykowski
  0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-04-03 17:35 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn,
	Matt Moeller, Maciej Żenczykowski

On 04/03, Willem de Bruijn wrote:
> Stanislav Fomichev wrote:
> > On 04/03, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > > 
> > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > > read when these offsets extend into frags.
> > > 
> > > This has been observed with iwlwifi and reproduced with tun with
> > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > > applied to a RAW socket, will silently miss matching packets.
> > > 
> > >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > >     struct sock_filter filter_code[] = {
> > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > > 
> > > This is unexpected behavior. Socket filter programs should be
> > > consistent regardless of environment. Silent misses are
> > > particularly concerning as hard to detect.
> > > 
> > > Use skb_copy_bits for offsets outside linear, same as done for
> > > non-SKF_(LL|NET) offsets.
> > > 
> > > Offset is always positive after subtracting the reference threshold
> > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > > the two is an offset against skb->data, and may be negative, but it
> > > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > > 
> > > This appears to go back to when frag support was introduced to
> > > sk_run_filter in linux-2.4.4, before the introduction of git.
> > > 
> > > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > > But any attempt I made to be smarter saved very few LoC while
> > > complicating the code.
> > > 
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > ---
> > >  include/linux/filter.h |  3 --
> > >  kernel/bpf/core.c      | 21 ------------
> > >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> > >  3 files changed, 42 insertions(+), 57 deletions(-)
> > > 
> > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > index f5cf4d35d83e..708ac7e0cd36 100644
> > > --- a/include/linux/filter.h
> > > +++ b/include/linux/filter.h
> > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > >  	}
> > >  }
> > >  
> > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > -					   int k, unsigned int size);
> > > -
> > >  static inline int bpf_tell_extensions(void)
> > >  {
> > >  	return SKF_AD_MAX;
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index ba6b6118cf50..0e836b5ac9a0 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -68,27 +68,6 @@
> > >  struct bpf_mem_alloc bpf_global_ma;
> > >  bool bpf_global_ma_set;
> > >  
> > > -/* No hurry in this branch
> > > - *
> > > - * Exported for the bpf jit load helper.
> > > - */
> > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > > -{
> > > -	u8 *ptr = NULL;
> > > -
> > > -	if (k >= SKF_NET_OFF) {
> > > -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > > -	} else if (k >= SKF_LL_OFF) {
> > > -		if (unlikely(!skb_mac_header_was_set(skb)))
> > > -			return NULL;
> > > -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > > -	}
> > > -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > -		return ptr;
> > > -
> > > -	return NULL;
> > > -}
> > > -
> > >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > >  enum page_size_enum {
> > >  	__PAGE_SIZE = PAGE_SIZE
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index bc6828761a47..b232b70dd10d 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > >  	   data, int, headlen, int, offset)
> > >  {
> > > -	u8 tmp, *ptr;
> > > +	u8 tmp;
> > >  	const int len = sizeof(tmp);
> > >  
> > > -	if (offset >= 0) {
> > > -		if (headlen - offset >= len)
> > > -			return *(u8 *)(data + offset);
> > > -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > -			return tmp;
> > > -	} else {
> > > -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > -		if (likely(ptr))
> > > -			return *(u8 *)ptr;
> > 
> > [..]
> > 
> > > +	if (offset < 0) {
> > > +		if (offset >= SKF_NET_OFF)
> > > +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> > > +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > > +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > > +		else
> > > +			return -EFAULT;
> > >  	}
> > 
> > nit: we now repeat the same logic three times, maybe still worth it to put it
> > into a helper? bpf_resolve_classic_offset or something. 
> 
> I definitely tried this in various ways. But since the core logic is
> only four lines and there is an early return on error, no helper
> really simplifies anything. It just adds a layer of indirection and
> more code in the end.

More code, but at least it de-duplicates the logic of translating
SKF_XXX_OFF? Something like the following below, but yeah, a matter
of preference, up to you.

static int bpf_skb_resolve_offset(skb, offset) {
	if (offset >= 0)
		return offset;

	if (offset >= SKF_NET_OFF)
		offset += skb_network_offset(skb) - SKF_NET_OFF;
	else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
		offset += skb_mac_offset(skb) - SKF_LL_OFF;

	return -1;
}

BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
           data, int, headlen, int, offset)
{
	offset = bpf_skb_resolve_offset(skb, offset);
	if (offset < 0)
		return -EFAULT;

	...
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 17:35       ` Stanislav Fomichev
@ 2025-04-03 18:30         ` Maciej Żenczykowski
  2025-04-03 18:54           ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Maciej Żenczykowski @ 2025-04-03 18:30 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Willem de Bruijn, bpf, netdev, ast, daniel, john.fastabend,
	Willem de Bruijn, Matt Moeller

On Thu, Apr 3, 2025 at 10:35 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 04/03, Willem de Bruijn wrote:
> > Stanislav Fomichev wrote:
> > > On 04/03, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > > > read when these offsets extend into frags.
> > > >
> > > > This has been observed with iwlwifi and reproduced with tun with
> > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > > > applied to a RAW socket, will silently miss matching packets.
> > > >
> > > >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > > >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > > >     struct sock_filter filter_code[] = {
> > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > > >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > > >
> > > > This is unexpected behavior. Socket filter programs should be
> > > > consistent regardless of environment. Silent misses are
> > > > particularly concerning as hard to detect.
> > > >
> > > > Use skb_copy_bits for offsets outside linear, same as done for
> > > > non-SKF_(LL|NET) offsets.
> > > >
> > > > Offset is always positive after subtracting the reference threshold
> > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > > > the two is an offset against skb->data, and may be negative, but it
> > > > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > > >
> > > > This appears to go back to when frag support was introduced to
> > > > sk_run_filter in linux-2.4.4, before the introduction of git.
> > > >
> > > > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > > > But any attempt I made to be smarter saved very few LoC while
> > > > complicating the code.
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > > > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > > > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > >  include/linux/filter.h |  3 --
> > > >  kernel/bpf/core.c      | 21 ------------
> > > >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> > > >  3 files changed, 42 insertions(+), 57 deletions(-)
> > > >
> > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > index f5cf4d35d83e..708ac7e0cd36 100644
> > > > --- a/include/linux/filter.h
> > > > +++ b/include/linux/filter.h
> > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > > >   }
> > > >  }
> > > >
> > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > > -                                    int k, unsigned int size);
> > > > -
> > > >  static inline int bpf_tell_extensions(void)
> > > >  {
> > > >   return SKF_AD_MAX;
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index ba6b6118cf50..0e836b5ac9a0 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -68,27 +68,6 @@
> > > >  struct bpf_mem_alloc bpf_global_ma;
> > > >  bool bpf_global_ma_set;
> > > >
> > > > -/* No hurry in this branch
> > > > - *
> > > > - * Exported for the bpf jit load helper.
> > > > - */
> > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > > > -{
> > > > - u8 *ptr = NULL;
> > > > -
> > > > - if (k >= SKF_NET_OFF) {
> > > > -         ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > > > - } else if (k >= SKF_LL_OFF) {
> > > > -         if (unlikely(!skb_mac_header_was_set(skb)))
> > > > -                 return NULL;
> > > > -         ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > > > - }
> > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > > -         return ptr;
> > > > -
> > > > - return NULL;
> > > > -}
> > > > -
> > > >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > > >  enum page_size_enum {
> > > >   __PAGE_SIZE = PAGE_SIZE
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index bc6828761a47..b232b70dd10d 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > > >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > > >      data, int, headlen, int, offset)
> > > >  {
> > > > - u8 tmp, *ptr;
> > > > + u8 tmp;
> > > >   const int len = sizeof(tmp);
> > > >
> > > > - if (offset >= 0) {
> > > > -         if (headlen - offset >= len)
> > > > -                 return *(u8 *)(data + offset);
> > > > -         if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > -                 return tmp;
> > > > - } else {
> > > > -         ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > -         if (likely(ptr))
> > > > -                 return *(u8 *)ptr;
> > >
> > > [..]
> > >
> > > > + if (offset < 0) {
> > > > +         if (offset >= SKF_NET_OFF)
> > > > +                 offset += skb_network_offset(skb) - SKF_NET_OFF;
> > > > +         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > > > +                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > > > +         else
> > > > +                 return -EFAULT;
> > > >   }
> > >
> > > nit: we now repeat the same logic three times, maybe still worth it to put it
> > > into a helper? bpf_resolve_classic_offset or something.
> >
> > I definitely tried this in various ways. But since the core logic is
> > only four lines and there is an early return on error, no helper
> > really simplifies anything. It just adds a layer of indirection and
> > more code in the end.
>
> More code, but at least it de-duplicates the logic of translating
> SKF_XXX_OFF? Something like the following below, but yeah, a matter
> of preference, up to you.
>
> static int bpf_skb_resolve_offset(skb, offset) {
>         if (offset >= 0)
>                 return offset;
>
>         if (offset >= SKF_NET_OFF)
>                 offset += skb_network_offset(skb) - SKF_NET_OFF;
>         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
>                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
>
>         return -1;
> }
>
> BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
>            data, int, headlen, int, offset)
> {
>         offset = bpf_skb_resolve_offset(skb, offset);
>         if (offset < 0)
>                 return -EFAULT;

this is incorrect, as offset can be legally negative here.

>
>         ...
> }

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  2025-04-03 18:30         ` Maciej Żenczykowski
@ 2025-04-03 18:54           ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 18:54 UTC (permalink / raw)
  To: Maciej Żenczykowski, Stanislav Fomichev
  Cc: Willem de Bruijn, bpf, netdev, ast, daniel, john.fastabend,
	Willem de Bruijn, Matt Moeller

Maciej Żenczykowski wrote:
> On Thu, Apr 3, 2025 at 10:35 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 04/03, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On 04/03, Willem de Bruijn wrote:
> > > > > From: Willem de Bruijn <willemb@google.com>
> > > > >
> > > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > > > > read when these offsets extend into frags.
> > > > >
> > > > > This has been observed with iwlwifi and reproduced with tun with
> > > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > > > > applied to a RAW socket, will silently miss matching packets.
> > > > >
> > > > >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > > > >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > > > >     struct sock_filter filter_code[] = {
> > > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > > > >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > > > >
> > > > > This is unexpected behavior. Socket filter programs should be
> > > > > consistent regardless of environment. Silent misses are
> > > > > particularly concerning as hard to detect.
> > > > >
> > > > > Use skb_copy_bits for offsets outside linear, same as done for
> > > > > non-SKF_(LL|NET) offsets.
> > > > >
> > > > > Offset is always positive after subtracting the reference threshold
> > > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > > > > the two is an offset against skb->data, and may be negative, but it
> > > > > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > > > >
> > > > > This appears to go back to when frag support was introduced to
> > > > > sk_run_filter in linux-2.4.4, before the introduction of git.
> > > > >
> > > > > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > > > > But any attempt I made to be smarter saved very few LoC while
> > > > > complicating the code.
> > > > >
> > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > > > > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > > > > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > > > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > ---
> > > > >  include/linux/filter.h |  3 --
> > > > >  kernel/bpf/core.c      | 21 ------------
> > > > >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> > > > >  3 files changed, 42 insertions(+), 57 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > > index f5cf4d35d83e..708ac7e0cd36 100644
> > > > > --- a/include/linux/filter.h
> > > > > +++ b/include/linux/filter.h
> > > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > > > >   }
> > > > >  }
> > > > >
> > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > > > -                                    int k, unsigned int size);
> > > > > -
> > > > >  static inline int bpf_tell_extensions(void)
> > > > >  {
> > > > >   return SKF_AD_MAX;
> > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > > index ba6b6118cf50..0e836b5ac9a0 100644
> > > > > --- a/kernel/bpf/core.c
> > > > > +++ b/kernel/bpf/core.c
> > > > > @@ -68,27 +68,6 @@
> > > > >  struct bpf_mem_alloc bpf_global_ma;
> > > > >  bool bpf_global_ma_set;
> > > > >
> > > > > -/* No hurry in this branch
> > > > > - *
> > > > > - * Exported for the bpf jit load helper.
> > > > > - */
> > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > > > > -{
> > > > > - u8 *ptr = NULL;
> > > > > -
> > > > > - if (k >= SKF_NET_OFF) {
> > > > > -         ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > > > > - } else if (k >= SKF_LL_OFF) {
> > > > > -         if (unlikely(!skb_mac_header_was_set(skb)))
> > > > > -                 return NULL;
> > > > > -         ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > > > > - }
> > > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > > > -         return ptr;
> > > > > -
> > > > > - return NULL;
> > > > > -}
> > > > > -
> > > > >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > > > >  enum page_size_enum {
> > > > >   __PAGE_SIZE = PAGE_SIZE
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index bc6828761a47..b232b70dd10d 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > > > >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > > > >      data, int, headlen, int, offset)
> > > > >  {
> > > > > - u8 tmp, *ptr;
> > > > > + u8 tmp;
> > > > >   const int len = sizeof(tmp);
> > > > >
> > > > > - if (offset >= 0) {
> > > > > -         if (headlen - offset >= len)
> > > > > -                 return *(u8 *)(data + offset);
> > > > > -         if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > > -                 return tmp;
> > > > > - } else {
> > > > > -         ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > > -         if (likely(ptr))
> > > > > -                 return *(u8 *)ptr;
> > > >
> > > > [..]
> > > >
> > > > > + if (offset < 0) {
> > > > > +         if (offset >= SKF_NET_OFF)
> > > > > +                 offset += skb_network_offset(skb) - SKF_NET_OFF;
> > > > > +         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > > > > +                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > > > > +         else
> > > > > +                 return -EFAULT;
> > > > >   }
> > > >
> > > > nit: we now repeat the same logic three times, maybe still worth it to put it
> > > > into a helper? bpf_resolve_classic_offset or something.
> > >
> > > I definitely tried this in various ways. But since the core logic is
> > > only four lines and there is an early return on error, no helper
> > > really simplifies anything. It just adds a layer of indirection and
> > > more code in the end.
> >
> > More code, but at least it de-duplicates the logic of translating
> > SKF_XXX_OFF? Something like the following below, but yeah, a matter
> > of preference, up to you.

I see your point. No strong opinion from me. Will revise,
assuming you don't mind the workaround below:

> > static int bpf_skb_resolve_offset(skb, offset) {
> >         if (offset >= 0)
> >                 return offset;
> >
> >         if (offset >= SKF_NET_OFF)
> >                 offset += skb_network_offset(skb) - SKF_NET_OFF;
> >         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> >                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
> >
> >         return -1;
> > }
> >
> > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> >            data, int, headlen, int, offset)
> > {
> >         offset = bpf_skb_resolve_offset(skb, offset);
> >         if (offset < 0)
> >                 return -EFAULT;
> 
> this is incorrect, as offset can be legally negative here.

Yeah, this needs a special case like INT_MIN to communicate error,
or pass-by-value. Exactly the kind of workarounds that gave me pause.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags
  2025-04-03 17:29   ` Stanislav Fomichev
@ 2025-04-03 18:56     ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-04-03 18:56 UTC (permalink / raw)
  To: Stanislav Fomichev, Willem de Bruijn
  Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn

Stanislav Fomichev wrote:
> On 04/03, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> > 
> > Verify that a classic BPF linux socket filter correctly matches
> > packet contents. Including when accessing contents in an
> > skb_frag.
> > 
> > 1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000.
> > 2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags.
> > 3. Send a packet for which the UDP header is in frag[0].
> > 4. Receive this packet to demonstrate that the socket accepted it.
> > 
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Thanks for the review :)

> My (weak) preference is to put (most) bpf-related things under
> selftests/bpf, but since you already have it working, not sure
> it's worth the effort.

I wasn't sure since this is exclusively legacy linux socket filters,
and needs a tun network stack to exercise it.

Will keep as is if you indeed don't mind.

> > ---
> >  tools/testing/selftests/net/.gitignore     |   1 +
> >  tools/testing/selftests/net/Makefile       |   2 +
> >  tools/testing/selftests/net/skf_net_off.c  | 244 +++++++++++++++++++++
> >  tools/testing/selftests/net/skf_net_off.sh |  28 +++
> >  4 files changed, 275 insertions(+)
> >  create mode 100644 tools/testing/selftests/net/skf_net_off.c
> >  create mode 100755 tools/testing/selftests/net/skf_net_off.sh
> > 
> > diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
> > index 679542f565a4..532bb732bc6d 100644
> > --- a/tools/testing/selftests/net/.gitignore
> > +++ b/tools/testing/selftests/net/.gitignore
> > @@ -39,6 +39,7 @@ scm_rights
> >  sk_bind_sendto_listen
> >  sk_connect_zero_addr
> >  sk_so_peek_off
> > +skf_net_off
> >  socket
> >  so_incoming_cpu
> >  so_netns_cookie
> > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> > index 6d718b478ed8..124078b56fa4 100644
> > --- a/tools/testing/selftests/net/Makefile
> > +++ b/tools/testing/selftests/net/Makefile
> > @@ -106,6 +106,8 @@ TEST_PROGS += ipv6_route_update_soft_lockup.sh
> >  TEST_PROGS += busy_poll_test.sh
> >  TEST_GEN_PROGS += proc_net_pktgen
> >  TEST_PROGS += lwt_dst_cache_ref_loop.sh
> > +TEST_PROGS += skf_net_off.sh
> > +TEST_GEN_FILES += skf_net_off
> >  
> >  # YNL files, must be before "include ..lib.mk"
> >  YNL_GEN_FILES := busy_poller netlink-dumps
> > diff --git a/tools/testing/selftests/net/skf_net_off.c b/tools/testing/selftests/net/skf_net_off.c
> > new file mode 100644
> > index 000000000000..1fdf61d6cd7f
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/skf_net_off.c
> > @@ -0,0 +1,244 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/* Open a tun device.
> > + *
> > + * [modifications: use IFF_NAPI_FRAGS, add sk filter]
> > + *
> > + * Expects the device to have been configured previously, e.g.:
> > + *   sudo ip tuntap add name tap1 mode tap
> > + *   sudo ip link set tap1 up
> > + *   sudo ip link set dev tap1 addr 02:00:00:00:00:01
> > + *   sudo ip -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
> > + *
> > + * And to avoid premature pskb_may_pull:
> > + *
> > + *   sudo ethtool -K tap1 gro off
> > + *   sudo bash -c 'echo 0 > /proc/sys/net/ipv4/ip_early_demux'
> > + */
> > +
> > +#define _GNU_SOURCE
> > +
> > +#include <arpa/inet.h>
> > +#include <errno.h>
> > +#include <error.h>
> > +#include <fcntl.h>
> > +#include <getopt.h>
> > +#include <linux/filter.h>
> > +#include <linux/if.h>
> > +#include <linux/if_packet.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/ipv6.h>
> > +#include <netinet/if_ether.h>
> > +#include <netinet/in.h>
> > +#include <netinet/ip.h>
> > +#include <netinet/ip6.h>
> > +#include <netinet/udp.h>
> > +#include <poll.h>
> > +#include <signal.h>
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/socket.h>
> > +#include <sys/poll.h>
> > +#include <sys/types.h>
> > +#include <sys/uio.h>
> > +#include <unistd.h>
> > +
> > +static bool cfg_do_filter;
> > +static bool cfg_do_frags;
> > +static int cfg_dst_port = 8000;
> > +static char *cfg_ifname;
> > +
> > +static int tun_open(const char *tun_name)
> > +{
> > +	struct ifreq ifr = {0};
> > +	int fd, ret;
> > +
> > +	fd = open("/dev/net/tun", O_RDWR);
> > +	if (fd == -1)
> > +		error(1, errno, "open /dev/net/tun");
> > +
> > +	ifr.ifr_flags = IFF_TAP;
> > +	if (cfg_do_frags)
> > +		ifr.ifr_flags |= IFF_NAPI | IFF_NAPI_FRAGS;
> > +
> > +	strncpy(ifr.ifr_name, tun_name, IFNAMSIZ - 1);
> > +
> > +	ret = ioctl(fd, TUNSETIFF, &ifr);
> > +	if (ret)
> > +		error(1, ret, "ioctl TUNSETIFF");
> > +
> > +	return fd;
> > +}
> > +
> > +static void sk_set_filter(int fd)
> > +{
> > +	const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > +	const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > +
> > +	/* Filter UDP packets with destination port cfg_dst_port */
> > +	struct sock_filter filter_code[] = {
> > +		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > +		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > +		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > +		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > +		BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > +		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, cfg_dst_port, 1, 0),
> > +		BPF_STMT(BPF_RET + BPF_K, 0),
> > +		BPF_STMT(BPF_RET + BPF_K, 0xFFFF),
> > +	};
> > +
> > +	struct sock_fprog filter = {
> > +		sizeof(filter_code) / sizeof(filter_code[0]),
> > +		filter_code,
> > +	};
> > +
> > +	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)))
> > +		error(1, errno, "setsockopt attach filter");
> > +}
> > +
> > +static int raw_open(void)
> > +{
> > +	int fd;
> > +
> > +	fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
> > +	if (fd == -1)
> > +		error(1, errno, "socket raw (udp)");
> > +
> > +	if (cfg_do_filter)
> > +		sk_set_filter(fd);
> > +
> > +	return fd;
> > +}
> > +
> > +static void tun_write(int fd)
> > +{
> > +	const char eth_src[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 };
> > +	const char eth_dst[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
> > +	struct tun_pi pi = {0};
> > +	struct ipv6hdr ip6h = {0};
> > +	struct udphdr uh = {0};
> > +	struct ethhdr eth = {0};
> > +	uint32_t payload;
> > +	struct iovec iov[5];
> > +	int ret;
> > +
> > +	pi.proto = htons(ETH_P_IPV6);
> > +
> > +	memcpy(eth.h_source, eth_src, sizeof(eth_src));
> > +	memcpy(eth.h_dest, eth_dst, sizeof(eth_dst));
> > +	eth.h_proto = htons(ETH_P_IPV6);
> > +
> > +	ip6h.version = 6;
> > +	ip6h.payload_len = htons(sizeof(uh) + sizeof(uint32_t));
> > +	ip6h.nexthdr = IPPROTO_UDP;
> > +	ip6h.hop_limit = 8;
> > +	if (inet_pton(AF_INET6, "fdab::2", &ip6h.saddr) != 1)
> > +		error(1, errno, "inet_pton src");
> > +	if (inet_pton(AF_INET6, "fdab::1", &ip6h.daddr) != 1)
> > +		error(1, errno, "inet_pton src");
> > +
> > +	uh.source = htons(8000);
> > +	uh.dest = htons(cfg_dst_port);
> > +	uh.len = ip6h.payload_len;
> > +	uh.check = 0;
> > +
> > +	payload = htonl(0xABABABAB);		/* Covered in IPv6 length */
> > +
> > +	iov[0].iov_base = &pi;
> > +	iov[0].iov_len  = sizeof(pi);
> > +	iov[1].iov_base = &eth;
> > +	iov[1].iov_len  = sizeof(eth);
> > +	iov[2].iov_base = &ip6h;
> > +	iov[2].iov_len  = sizeof(ip6h);
> > +	iov[3].iov_base = &uh;
> > +	iov[3].iov_len  = sizeof(uh);
> > +	iov[4].iov_base = &payload;
> > +	iov[4].iov_len  = sizeof(payload);
> > +
> > +	ret = writev(fd, iov, sizeof(iov) / sizeof(iov[0]));
> > +	if (ret <= 0)
> > +		error(1, errno, "writev");
> > +}
> > +
> > +static void raw_read(int fd)
> > +{
> > +	struct timeval tv = { .tv_usec = 100 * 1000 };
> > +	struct msghdr msg = {0};
> > +	struct iovec iov[2];
> > +	struct udphdr uh;
> > +	uint32_t payload[2];
> > +	int ret;
> > +
> > +	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
> > +		error(1, errno, "setsockopt rcvtimeo udp");
> > +
> > +	iov[0].iov_base = &uh;
> > +	iov[0].iov_len = sizeof(uh);
> > +
> > +	iov[1].iov_base = payload;
> > +	iov[1].iov_len = sizeof(payload);
> > +
> > +	msg.msg_iov = iov;
> > +	msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]);
> > +
> > +	ret = recvmsg(fd, &msg, 0);
> > +	if (ret <= 0)
> > +		error(1, errno, "read raw");
> > +	if (ret != sizeof(uh) + sizeof(payload[0]))
> > +		error(1, errno, "read raw: len=%d\n", ret);
> > +
> > +	fprintf(stderr, "raw recv: 0x%x\n", payload[0]);
> > +}
> > +
> > +static void parse_opts(int argc, char **argv)
> > +{
> > +	int c;
> > +
> > +	while ((c = getopt(argc, argv, "fFi:")) != -1) {
> > +		switch (c) {
> > +		case 'f':
> > +			cfg_do_filter = true;
> > +			printf("bpf filter enabled\n");
> > +			break;
> > +		case 'F':
> > +			cfg_do_frags = true;
> > +			printf("napi frags mode enabled\n");
> > +			break;
> > +		case 'i':
> > +			cfg_ifname = optarg;
> > +			break;
> > +		default:
> > +			error(1, 0, "unknown option %c", optopt);
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!cfg_ifname)
> > +		error(1, 0, "must specify tap interface name (-i)");
> > +}
> > +
> > +int main(int argc, char **argv)
> > +{
> > +	int fdt, fdr;
> > +
> > +	parse_opts(argc, argv);
> > +
> > +	fdr = raw_open();
> > +	fdt = tun_open(cfg_ifname);
> > +
> > +	tun_write(fdt);
> > +	raw_read(fdr);
> > +
> > +	if (close(fdt))
> > +		error(1, errno, "close tun");
> > +	if (close(fdr))
> > +		error(1, errno, "close udp");
> > +
> > +	fprintf(stderr, "OK\n");
> > +	return 0;
> > +}
> > +
> > diff --git a/tools/testing/selftests/net/skf_net_off.sh b/tools/testing/selftests/net/skf_net_off.sh
> > new file mode 100755
> > index 000000000000..e9cce93a0258
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/skf_net_off.sh
> > @@ -0,0 +1,28 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +readonly NS="ns-$(mktemp -u XXXXXX)"
> > +
> > +cleanup() {
> > +	ip netns del $NS
> > +}
> > +
> > +ip netns add $NS
> > +trap cleanup EXIT
> > +
> > +ip -netns $NS link set lo up
> > +ip -netns $NS tuntap add name tap1 mode tap
> > +ip -netns $NS link set tap1 up
> > +ip -netns $NS link set dev tap1 addr 02:00:00:00:00:01
> > +ip -netns $NS -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
> > +ip netns exec $NS ethtool -K tap1 gro off
> > +ip netns exec $NS sysctl -w net.ipv4.ip_early_demux=0
> 
> Curious: why disable ip_early_demux here?

Otherwise early demux will pull the headers into linear, in
udp_v6_early_demux

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-04-03 18:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-03 14:07 [PATCH bpf 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn
2025-04-03 14:07 ` [PATCH bpf 1/2] bpf: " Willem de Bruijn
2025-04-03 14:18   ` Willem de Bruijn
2025-04-03 17:21   ` Stanislav Fomichev
2025-04-03 17:27     ` Willem de Bruijn
2025-04-03 17:35       ` Stanislav Fomichev
2025-04-03 18:30         ` Maciej Żenczykowski
2025-04-03 18:54           ` Willem de Bruijn
2025-04-03 14:07 ` [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
2025-04-03 17:29   ` Stanislav Fomichev
2025-04-03 18:56     ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).