netdev.vger.kernel.org archive mirror
* [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test
@ 2024-06-11 20:42 YiFei Zhu
  2024-06-11 20:42 ` [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers YiFei Zhu
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: YiFei Zhu @ 2024-06-11 20:42 UTC
  To: netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

We have observed that hardware NIC drivers may have faulty AF_XDP
implementations, and there seems to be no test covering the various modes
in which AF_XDP can run. This series adds a test that verifies that NIC
drivers implement many AF_XDP features by sending and receiving
a single UDP packet.

I put the C code of the test under selftests/bpf because I'm not really
sure how I'd build the BPF-related code without the selftests/bpf
build infrastructure.

Tested on Google Cloud, with GVE:

  $ sudo NETIF=ens4 REMOTE_TYPE=ssh \
    REMOTE_ARGS="root@10.138.15.235" \
    LOCAL_V4="10.138.15.234" \
    REMOTE_V4="10.138.15.235" \
    LOCAL_NEXTHOP_MAC="42:01:0a:8a:00:01" \
    REMOTE_NEXTHOP_MAC="42:01:0a:8a:00:01" \
    python3 xsk_hw.py

  KTAP version 1
  1..22
  ok 1 xsk_hw.ipv4_basic
  ok 2 xsk_hw.ipv4_tx_skb_copy
  ok 3 xsk_hw.ipv4_tx_skb_copy_force_attach
  ok 4 xsk_hw.ipv4_rx_skb_copy
  ok 5 xsk_hw.ipv4_tx_drv_copy
  ok 6 xsk_hw.ipv4_tx_drv_copy_force_attach
  ok 7 xsk_hw.ipv4_rx_drv_copy
  [...]
  # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: recv_pfpacket: Timeout\n'
  not ok 8 xsk_hw.ipv4_tx_drv_zerocopy
  ok 9 xsk_hw.ipv4_tx_drv_zerocopy_force_attach
  ok 10 xsk_hw.ipv4_rx_drv_zerocopy
  [...]
  # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: connect sync client: max_retries\n'
  [...]
  # Exception| STDERR: b'/linux/tools/testing/selftests/bpf/xsk_hw: open_xsk: Device or resource busy\n'
  not ok 11 xsk_hw.ipv4_rx_drv_zerocopy_fill_after_bind
  ok 12 xsk_hw.ipv6_basic # SKIP Test requires IPv6 connectivity
  [...]
  ok 22 xsk_hw.ipv6_rx_drv_zerocopy_fill_after_bind # SKIP Test requires IPv6 connectivity
  # Totals: pass:9 fail:2 xfail:0 xpass:0 skip:11 error:0

YiFei Zhu (3):
  selftests/bpf: Move rxq_num helper from xdp_hw_metadata to
    network_helpers
  selftests/bpf: Add xsk_hw AF_XDP functionality test
  selftests: drv-net: Add xsk_hw AF_XDP functionality test

 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   7 +-
 tools/testing/selftests/bpf/network_helpers.c |  27 +
 tools/testing/selftests/bpf/network_helpers.h |  16 +
 tools/testing/selftests/bpf/progs/xsk_hw.c    |  72 ++
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  27 +-
 tools/testing/selftests/bpf/xsk_hw.c          | 844 ++++++++++++++++++
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../selftests/drivers/net/hw/xsk_hw.py        | 133 +++
 9 files changed, 1102 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xsk_hw.c
 create mode 100644 tools/testing/selftests/bpf/xsk_hw.c
 create mode 100755 tools/testing/selftests/drivers/net/hw/xsk_hw.py

-- 
2.45.2.505.gda0bf45e8d-goog



* [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers
  2024-06-11 20:42 [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test YiFei Zhu
@ 2024-06-11 20:42 ` YiFei Zhu
  2024-06-11 21:12   ` Willem de Bruijn
  2024-06-11 20:42 ` [RFC PATCH net-next 2/3] selftests/bpf: Add xsk_hw AF_XDP functionality test YiFei Zhu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: YiFei Zhu @ 2024-06-11 20:42 UTC
  To: netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

This helper may be useful for other AF_XDP tests, such as xsk_hw.
Moving it out so we don't need to copy-paste that function.

I also changed the function from directly calling error(1, errno, ...)
to returning an error because I don't think it makes sense for a
library function to outright kill the process if the function fails.
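
A minimal sketch of that convention, for illustration only:
open_readonly() is a hypothetical helper and not part of this patch.
It reports failure as a negative errno value and leaves the decision
to exit to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical library-style helper (not from this series): return a
 * file descriptor on success, or a negative errno value on failure,
 * instead of terminating the process.
 */
static int open_readonly(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	return fd;
}
```

A caller that does want to exit can still do so, for example with
error(1, -fd, "open_readonly") when the return value is negative,
which mirrors what xdp_hw_metadata.c does with rxq_num() after this
change.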

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 tools/testing/selftests/bpf/network_helpers.c | 27 +++++++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  2 ++
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 27 ++-----------------
 3 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 35250e6cde7f..4c3bef07df23 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -569,6 +569,33 @@ int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
 	return 0;
 }
 
+int rxq_num(const char *ifname)
+{
+	struct ethtool_channels ch = {
+		.cmd = ETHTOOL_GCHANNELS,
+	};
+	struct ifreq ifr = {
+		.ifr_data = (void *)&ch,
+	};
+	int fd, ret, err;
+
+	strncpy(ifr.ifr_name, ifname, IF_NAMESIZE - 1);
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd < 0)
+		return -errno;
+
+	ret = ioctl(fd, SIOCETHTOOL, &ifr);
+	if (ret < 0) {
+		err = errno;
+		close(fd);
+		return -err;
+	}
+
+	close(fd);
+
+	return ch.rx_count + ch.combined_count;
+}
+
 struct send_recv_arg {
 	int		fd;
 	uint32_t	bytes;
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index 883c7ea9d8d5..b09c3bbd5b62 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -72,6 +72,8 @@ int get_socket_local_port(int sock_fd);
 int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
 int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
 
+int rxq_num(const char *ifname);
+
 struct nstoken;
 /**
  * open_netns() - Switch to specified network namespace by name.
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 6f9956eed797..f038a624fd1f 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -495,31 +495,6 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
 	return 0;
 }
 
-static int rxq_num(const char *ifname)
-{
-	struct ethtool_channels ch = {
-		.cmd = ETHTOOL_GCHANNELS,
-	};
-
-	struct ifreq ifr = {
-		.ifr_data = (void *)&ch,
-	};
-	strncpy(ifr.ifr_name, ifname, IF_NAMESIZE - 1);
-	int fd, ret;
-
-	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
-	if (fd < 0)
-		error(1, errno, "socket");
-
-	ret = ioctl(fd, SIOCETHTOOL, &ifr);
-	if (ret < 0)
-		error(1, errno, "ioctl(SIOCETHTOOL)");
-
-	close(fd);
-
-	return ch.rx_count + ch.combined_count;
-}
-
 static void hwtstamp_ioctl(int op, const char *ifname, struct hwtstamp_config *cfg)
 {
 	struct ifreq ifr = {
@@ -668,6 +643,8 @@ int main(int argc, char *argv[])
 	read_args(argc, argv);
 
 	rxq = rxq_num(ifname);
+	if (rxq < 0)
+		error(1, -rxq, "rxq_num");
 
 	printf("rxq: %d\n", rxq);
 
-- 
2.45.2.505.gda0bf45e8d-goog



* [RFC PATCH net-next 2/3] selftests/bpf: Add xsk_hw AF_XDP functionality test
  2024-06-11 20:42 [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test YiFei Zhu
  2024-06-11 20:42 ` [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers YiFei Zhu
@ 2024-06-11 20:42 ` YiFei Zhu
  2024-06-11 20:42 ` [RFC PATCH net-next 3/3] selftests: drv-net: " YiFei Zhu
  2024-06-12 11:47 ` [RFC PATCH net-next 0/3] selftests: Add " Magnus Karlsson
  3 siblings, 0 replies; 11+ messages in thread
From: YiFei Zhu @ 2024-06-11 20:42 UTC
  To: netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

A single UDP packet is sent from one peer to the other, and the test
verifies that it is received before a timeout. Busypoll mode is not
tested, since it is more difficult to write a pass/fail test that
verifies busypoll mode is working as intended; adding such a test
can be future work.

To test TX/RX independently, the non-XDP side uses AF_PACKET to
send/recv in order to minimize possible differences in the
packet headers (as opposed to having AF_INET(6), which makes the
kernel's stack construct the packet). However, an AF_INET(6)
SOCK_DGRAM socket is still created and bound to the port
in order to mute ICMP port unreachable messages without having
to use iptables.

Because AF_XDP setup may take a variable amount of time on
different machines, a simple TCP connection is established
between client and server to serve as a synchronization
point.
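
The client side of such a barrier could look roughly like the sketch
below. It is an illustration under stated assumptions, not the code
in this patch: sync_connect_retry() and its parameters are
hypothetical names, it handles IPv4 only, and it opens a fresh socket
for every attempt rather than reusing one.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to addr, retrying while the peer is
 * not listening yet (ECONNREFUSED). Returns 0 once a connection was
 * established, -ETIMEDOUT if all attempts were refused, or another
 * negative errno value on an unexpected failure.
 */
static int sync_connect_retry(const struct sockaddr_in *addr,
			      int max_retries, unsigned int sleep_ms)
{
	int retries, fd, err;

	for (retries = 0; retries < max_retries; retries++) {
		fd = socket(AF_INET, SOCK_STREAM, 0);
		if (fd == -1)
			return -errno;

		if (!connect(fd, (const struct sockaddr *)addr, sizeof(*addr))) {
			/* The connection itself is the sync point */
			close(fd);
			return 0;
		}

		err = errno;
		close(fd);
		if (err != ECONNREFUSED)
			return -err;

		usleep(sleep_ms * 1000);
	}

	return -ETIMEDOUT;
}
```

The server side only needs to listen and accept once; the connection
carries no data, its establishment is the synchronization point.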

I'm also adding a checksum_nofold helper to network_helpers since
I'm not sure it would be a good idea to rely on hardware offload
and XDP metadata for a basic functionality test. If needed we
can extend this test to cover checksum offload too.
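
For reference, a software Internet checksum can be sketched as below.
This is an illustrative, byte-order-independent variant, not the
helper added by this patch: csum_nofold_be() and csum_fold_be() are
hypothetical names, words are read as big-endian from the byte stream,
and the result is returned in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate 16-bit big-endian words without folding. An odd trailing
 * byte is treated as the high byte of a zero-padded word. The 32-bit
 * accumulator is sufficient for header-sized inputs.
 */
static uint32_t csum_nofold_be(const uint8_t *data, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	return sum;
}

/* Fold the carries back into 16 bits and take the one's complement. */
static uint16_t csum_fold_be(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Computing the checksum over a header whose checksum field is zero
gives the value to store in that field; recomputing it over the
completed header gives 0, which is how a receiver validates it.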

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   7 +-
 tools/testing/selftests/bpf/network_helpers.h |  14 +
 tools/testing/selftests/bpf/progs/xsk_hw.c    |  72 ++
 tools/testing/selftests/bpf/xsk_hw.c          | 844 ++++++++++++++++++
 5 files changed, 937 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xsk_hw.c
 create mode 100644 tools/testing/selftests/bpf/xsk_hw.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 5025401323af..1065e4b508c4 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -47,6 +47,7 @@ test_cpp
 *.ko
 *.tmp
 xskxceiver
+xsk_hw
 xdp_redirect_multi
 xdp_synproxy
 xdp_hw_metadata
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e0b3887b3d2d..854979898ce0 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -144,7 +144,7 @@ TEST_GEN_PROGS_EXTENDED = test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
 	xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
-	xdp_features bpf_test_no_cfi.ko
+	xdp_features bpf_test_no_cfi.ko xsk_hw
 
 TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
 
@@ -476,6 +476,7 @@ test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
 xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o
 xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
 xdp_features.skel.h-deps := xdp_features.bpf.o
+xsk_hw.skel.h-deps := xsk_hw.bpf.o
 
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
@@ -710,6 +711,10 @@ $(OUTPUT)/xdp_features: xdp_features.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xdp
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
 
+$(OUTPUT)/xsk_hw: xsk_hw.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xsk.o $(OUTPUT)/xsk_hw.skel.h | $(OUTPUT)
+	$(call msg,BINARY,,$@)
+	$(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
+
 # Make sure we are able to include and link libbpf against c++.
 $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
 	$(call msg,CXX,,$@)
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index b09c3bbd5b62..732f07486162 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -85,6 +85,20 @@ struct nstoken *open_netns(const char *name);
 void close_netns(struct nstoken *token);
 int send_recv_data(int lfd, int fd, uint32_t total_bytes);
 
+static inline __wsum checksum_nofold(const void *data, size_t len, __wsum sum)
+{
+	const uint16_t *words = (const uint16_t *)data;
+	int i;
+
+	for (i = 0; i < len / 2; i++)
+		sum += words[i];
+
+	if (len & 1)
+		sum += ((const unsigned char *)data)[len - 1];
+
+	return sum;
+}
+
 static __u16 csum_fold(__u32 csum)
 {
 	csum = (csum & 0xffff) + (csum >> 16);
diff --git a/tools/testing/selftests/bpf/progs/xsk_hw.c b/tools/testing/selftests/bpf/progs/xsk_hw.c
new file mode 100644
index 000000000000..8009267ddc33
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xsk_hw.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in.h>
+#include <linux/udp.h>
+#include <stdbool.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 256);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+__u16 port;
+bool should_rx;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_end;
+	struct ipv6hdr *ip6h;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	struct udphdr *uh;
+
+	if (!should_rx)
+		return XDP_PASS;
+
+	data = (void *)(long)ctx->data;
+	data_end = (void *)(long)ctx->data_end;
+
+	eth = data;
+	data = eth + 1;
+	if (data > data_end)
+		return XDP_PASS;
+
+	if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+		iph = data;
+		data = iph + 1;
+		if (data > data_end)
+			return XDP_PASS;
+		if (iph->protocol != IPPROTO_UDP)
+			return XDP_PASS;
+	} else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
+		ip6h = data;
+		data = ip6h + 1;
+		if (data > data_end)
+			return XDP_PASS;
+		if (ip6h->nexthdr != IPPROTO_UDP)
+			return XDP_PASS;
+	} else {
+		return XDP_PASS;
+	}
+
+	uh = data;
+	data = uh + 1;
+	if (data > data_end)
+		return XDP_PASS;
+	if (uh->dest != port)
+		return XDP_PASS;
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/xsk_hw.c b/tools/testing/selftests/bpf/xsk_hw.c
new file mode 100644
index 000000000000..dd6fc7b562ba
--- /dev/null
+++ b/tools/testing/selftests/bpf/xsk_hw.c
@@ -0,0 +1,844 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* XSK basic regression test
+ *
+ * Exercise AF_XDP (XSK) sockets in all modes
+ * - skb copy
+ * - drv copy
+ * - drv zerocopy
+ *
+ * Run:
+ *
+ * server: ./xsk_hw -i $DEV -[46] -D $CLIENT_ADDR -S $SERVER_ADDR -m $LOCAL_MAC -M $GW_MAC -h $ARGS
+ * client: ./xsk_hw -i $DEV -[46] -D $SERVER_ADDR -S $CLIENT_ADDR -m $LOCAL_MAC -M $GW_MAC $ARGS
+ *
+ * Args:
+ *
+ * - ``: no args: minimal connectivity sanity test using PF_PACKET
+ *
+ * - `-T -s -c`: test transmit, skb copy mode
+ * - `-T -d -c`: test transmit, driver copy mode
+ * - `-T -d -z`: test transmit, driver zerocopy mode
+ *
+ * - `-R -s -c`: receive, skb copy mode
+ * - `-R -d -c`: receive, driver copy mode
+ * - `-R -d -z`: receive, driver zerocopy mode
+ */
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <linux/errqueue.h>
+#include <linux/ethtool.h>
+#include <linux/filter.h>
+#include <linux/if_ether.h>
+#include <linux/if_link.h>
+#include <linux/if_packet.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/net_tstamp.h>
+#include <linux/sockios.h>
+#include <linux/udp.h>
+#include <limits.h>
+#include <net/if.h>
+#include <poll.h>
+#include <signal.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <uapi/linux/filter.h>
+#include <unistd.h>
+
+#include <network_helpers.h>
+
+#include "xsk.h"
+#include "xsk_hw.skel.h"
+
+static int cfg_addr_len;
+static void *cfg_daddr, *cfg_saddr;
+static struct in_addr cfg_daddr4, cfg_saddr4;
+static struct in6_addr cfg_daddr6, cfg_saddr6;
+static uint16_t cfg_eth_proto;
+static int cfg_family = PF_UNSPEC;
+
+static bool cfg_host_run;
+static char *cfg_ifname = "eth0";
+static int cfg_ifindex;
+static bool cfg_fill_after_bind;
+static char *cfg_mac_dst, *cfg_mac_src;
+static int cfg_num_rxq;
+static uint16_t cfg_port = __constant_htons(8000);
+static int cfg_pkt_len;
+static const char cfg_payload[] = "aaaaaaaa";
+static int cfg_rcvtimeo = 10;
+static int cfg_send_queue_id;
+static __u32 cfg_xdp_flags = XDP_FLAGS_REPLACE;
+static __u16 cfg_xdp_bind_flags;
+static bool cfg_xdp_rx;
+static bool cfg_xdp_tx;
+static bool cfg_xdp_tx_force_attach;
+
+/* constants usable as static array sizes: a const int is not sufficient,
+ * because a const-qualified variable is not a constant expression in C
+ */
+enum {
+	pkt_len_l4 = sizeof(struct udphdr) + sizeof(cfg_payload),
+	pkt_len_v4 = ETH_HLEN + sizeof(struct iphdr) + pkt_len_l4,
+	pkt_len_v6 = ETH_HLEN + sizeof(struct ipv6hdr) + pkt_len_l4,
+};
+
+static char pkt[pkt_len_v6];
+
+#define UMEM_NUM 8192
+#define UMEM_QLEN (UMEM_NUM / 2)
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+	__u32 tx_head;
+};
+
+static struct xsk_hw *bpf_obj;
+static struct xsk *xsks;
+
+static int pfpacket_fd;
+static int udp_fd;
+
+static void init_pkt_ipv4(struct iphdr *iph)
+{
+	struct udphdr *uh;
+
+	iph->version = 4;
+	iph->ihl = 5;
+	iph->protocol = IPPROTO_UDP;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*uh) + sizeof(cfg_payload));
+	iph->ttl = 64;
+	iph->daddr = cfg_daddr4.s_addr;
+	iph->saddr = cfg_saddr4.s_addr;
+	iph->check = csum_fold(checksum_nofold(iph, sizeof(*iph), 0));
+}
+
+static void init_pkt_ipv6(struct ipv6hdr *ip6h)
+{
+	struct udphdr *uh;
+
+	ip6h->version = 6;
+	ip6h->payload_len = htons(sizeof(*uh) + sizeof(cfg_payload));
+	ip6h->nexthdr = IPPROTO_UDP;
+	ip6h->hop_limit = 64;
+	ip6h->daddr = cfg_daddr6;
+	ip6h->saddr = cfg_saddr6;
+}
+
+static void init_pkt(void)
+{
+	struct ipv6hdr *ip6h = NULL;
+	struct iphdr *iph = NULL;
+	struct ethhdr *eth;
+	struct udphdr *uh;
+	__wsum check;
+
+	/* init mac header */
+	eth = (void *)&pkt;
+	if (sscanf(cfg_mac_dst, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
+		   &eth->h_dest[0], &eth->h_dest[1], &eth->h_dest[2],
+		   &eth->h_dest[3], &eth->h_dest[4], &eth->h_dest[5]) != 6)
+		error(1, 0, "sscanf mac dst ('-M')");
+	if (sscanf(cfg_mac_src, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
+		   &eth->h_source[0], &eth->h_source[1], &eth->h_source[2],
+		   &eth->h_source[3], &eth->h_source[4], &eth->h_source[5]) != 6)
+		error(1, 0, "sscanf mac src ('-m')");
+	eth->h_proto = htons(cfg_eth_proto);
+
+	if (cfg_family == PF_INET) {
+		iph = (void *)(eth + 1);
+		uh = (void *)(iph + 1);
+		init_pkt_ipv4(iph);
+	} else {
+		ip6h = (void *)(eth + 1);
+		uh = (void *)(ip6h + 1);
+		init_pkt_ipv6(ip6h);
+	}
+
+	/* init udp header */
+	uh->source = cfg_port;
+	uh->dest = cfg_port;
+	uh->len = htons(sizeof(*uh) + sizeof(cfg_payload));
+	uh->check = 0;
+
+	/* init payload */
+	memcpy(uh + 1, cfg_payload, sizeof(cfg_payload));
+
+	/* udp checksum */
+	check = checksum_nofold(uh, sizeof(*uh) + sizeof(cfg_payload), 0);
+	if (ip6h)
+		uh->check = csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+					    ntohs(uh->len), IPPROTO_UDP,
+					    check);
+	else
+		uh->check = csum_tcpudp_magic(iph->saddr, iph->daddr,
+					      ntohs(uh->len), IPPROTO_UDP,
+					      check);
+}
+
+static void *verify_pkt_ipv4(void *data, void *data_end)
+{
+	struct iphdr *iph = data;
+
+	data = iph + 1;
+	if (data > data_end)
+		return NULL;
+
+	if (iph->protocol != IPPROTO_UDP)
+		return NULL;
+
+	return data;
+}
+
+static void *verify_pkt_ipv6(void *data, void *data_end)
+{
+	struct ipv6hdr *ip6h = data;
+
+	data = ip6h + 1;
+	if (data > data_end)
+		return NULL;
+
+	if (ip6h->nexthdr != IPPROTO_UDP)
+		return NULL;
+
+	return data;
+}
+
+static void verify_pkt(void *data, size_t len)
+{
+	void *data_end = data + len;
+	struct ethhdr *eth;
+	struct udphdr *uh;
+
+	eth = data;
+	data = eth + 1;
+	if (data > data_end)
+		goto bad;
+	if (eth->h_proto != htons(cfg_eth_proto))
+		goto bad;
+
+	if (cfg_family == PF_INET)
+		data = verify_pkt_ipv4(data, data_end);
+	else
+		data = verify_pkt_ipv6(data, data_end);
+	if (!data)
+		goto bad;
+
+	uh = data;
+	data = uh + 1;
+	if (data > data_end)
+		goto bad;
+	if (uh->dest != cfg_port)
+		goto bad;
+
+	if (data_end - data != sizeof(cfg_payload))
+		goto bad;
+	if (memcmp(data, cfg_payload, sizeof(cfg_payload)))
+		goto bad;
+
+	return;
+bad:
+	error(1, 0, "bad packet content");
+}
+
+static void udp_bind(void)
+{
+	/* Dual-stack, as not enabling IPV6_V6ONLY */
+	struct sockaddr_in6 ip6addr = {
+		.sin6_family = AF_INET6,
+		.sin6_port = cfg_port,
+		.sin6_addr = in6addr_any,
+	};
+
+	if (bind(udp_fd, (void *)&ip6addr, sizeof(ip6addr)) == -1)
+		error(1, errno, "udp bind");
+}
+
+static void pfpacket_setfilter_ipproto(void)
+{
+	int off_proto, off_port;
+
+	if (cfg_family == PF_INET) {
+		off_proto = ETH_HLEN + offsetof(struct iphdr, protocol);
+		off_port = ETH_HLEN + sizeof(struct iphdr) + offsetof(struct udphdr, dest);
+	} else {
+		off_proto = ETH_HLEN + offsetof(struct ipv6hdr, nexthdr);
+		off_port = ETH_HLEN + sizeof(struct ipv6hdr) + offsetof(struct udphdr, dest);
+	}
+
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 5),
+		BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, off_proto),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 3),
+		BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, off_port),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ntohs(cfg_port), 0, 1),
+		BPF_STMT(BPF_RET + BPF_K, 0xFFFF),
+		BPF_STMT(BPF_RET + BPF_K, 0),
+	};
+	struct sock_fprog prog = {};
+
+	prog.filter = filter;
+	prog.len = sizeof(filter) / sizeof(struct sock_filter);
+	if (setsockopt(pfpacket_fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)))
+		error(1, errno, "setsockopt filter");
+}
+
+static void pfpacket_bind(void)
+{
+	struct sockaddr_ll laddr = {
+		.sll_family = AF_PACKET,
+		.sll_protocol = cfg_xdp_rx ? 0 : htons(cfg_eth_proto),
+		.sll_ifindex = cfg_ifindex,
+	};
+
+	if (bind(pfpacket_fd, (void *)&laddr, sizeof(laddr)) == -1)
+		error(1, errno, "pfpacket bind");
+}
+
+static int fill_xsk(struct xsk *xsk)
+{
+	__u64 addr;
+	__u32 idx;
+	int i;
+
+	/* returns either 0 on failure or second arg, UMEM_QLEN */
+	if (!xsk_ring_prod__reserve(&xsk->fill, UMEM_QLEN, &idx))
+		return -ENOMEM;
+
+	for (i = 0; i < UMEM_QLEN; i++) {
+		addr = (UMEM_QLEN + i) * UMEM_FRAME_SIZE;
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx + i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, UMEM_QLEN);
+
+	return 0;
+}
+
+static int open_xsk(struct xsk *xsk, __u32 queue_id)
+{
+	const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = UMEM_QLEN,
+		.tx_size = UMEM_QLEN,
+		.bind_flags = cfg_xdp_bind_flags,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = UMEM_QLEN,
+		.comp_size = UMEM_QLEN,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+	};
+	__u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (xsk->umem_area == MAP_FAILED)
+		return -ENOMEM;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (ret)
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, cfg_ifindex, queue_id,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (ret)
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_QLEN; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		memcpy(xsk_umem__get_data(xsk->umem_area, addr),
+		       pkt, cfg_pkt_len);
+	}
+
+	/* Second half of umem is for RX. */
+	if (!cfg_fill_after_bind) {
+		if (fill_xsk(xsk))
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void release_tx(struct xsk *xsk)
+{
+	__u32 idx = 0;
+	unsigned int n;
+
+	n = xsk_ring_cons__peek(&xsk->comp, XSK_RING_CONS__DEFAULT_NUM_DESCS, &idx);
+	if (n)
+		xsk_ring_cons__release(&xsk->comp, n);
+}
+
+static void send_xsk(void)
+{
+	struct xsk *xsk = &xsks[cfg_send_queue_id];
+	struct xdp_desc *desc;
+	__u32 idx;
+
+	release_tx(xsk);
+	if (xsk_ring_prod__reserve(&xsk->tx, 1, &idx) != 1)
+		error(1, 0, "TX ring is full");
+
+	desc = xsk_ring_prod__tx_desc(&xsk->tx, idx);
+	desc->addr = (xsk->tx_head++ % UMEM_QLEN) * UMEM_FRAME_SIZE;
+	desc->len = cfg_pkt_len;
+
+	xsk_ring_prod__submit(&xsk->tx, 1);
+	sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0);
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) {
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static void recv_xsk(void)
+{
+	const struct xdp_desc *desc;
+	struct pollfd fds[cfg_num_rxq];
+	__u64 comp_addr;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+	int i;
+
+	for (i = 0; i < cfg_num_rxq; i++) {
+		fds[i].fd = xsk_socket__fd(xsks[i].socket);
+		fds[i].events = POLLIN;
+		fds[i].revents = 0;
+	}
+
+	ret = poll(fds, cfg_num_rxq, cfg_rcvtimeo * 1000);
+	if (ret < 0)
+		error(1, errno, "poll");
+	if (ret == 0)
+		error(1, 0, "%s: Timeout", __func__);
+
+	for (i = 0; i < cfg_num_rxq; i++) {
+		if (fds[i].revents == 0)
+			continue;
+
+		struct xsk *xsk = &xsks[i];
+
+		/* Reading one packet at a time, because we expect only one
+		 * packet outstanding per queue at a time due to test doing
+		 * single connection request/response
+		 */
+		ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+		if (ret != 1)
+			continue;
+
+		desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+		comp_addr = xsk_umem__extract_addr(desc->addr);
+		addr = xsk_umem__add_offset_to_addr(desc->addr);
+		verify_pkt(xsk_umem__get_data(xsk->umem_area, addr), desc->len);
+		xsk_ring_cons__release(&xsk->rx, 1);
+		refill_rx(xsk, comp_addr);
+	}
+}
+
+static void send_pfpacket(void)
+{
+	int ret;
+
+	ret = write(pfpacket_fd, pkt, cfg_pkt_len);
+	if (ret == -1)
+		error(1, errno, "write");
+	if (ret != cfg_pkt_len)
+		error(1, 0, "write pkt: %dB != %dB", ret, cfg_pkt_len);
+}
+
+static void recv_pfpacket(void)
+{
+	static char recv_pkt[sizeof(pkt)];
+	struct pollfd fds = {
+		.fd = pfpacket_fd,
+		.events = POLLIN,
+	};
+	int ret, pkt_len;
+
+	ret = poll(&fds, 1, cfg_rcvtimeo * 1000);
+	if (ret < 0)
+		error(1, errno, "poll");
+	if (ret == 0)
+		error(1, 0, "%s: Timeout", __func__);
+
+	pkt_len = cfg_family == PF_INET ? pkt_len_v4 : pkt_len_v6;
+
+	ret = recv(pfpacket_fd, recv_pkt, sizeof(pkt), MSG_TRUNC);
+	if (ret == -1)
+		error(1, errno, "recv");
+	if (ret != pkt_len)
+		error(1, 0, "recv pkt: %dB != %dB", ret, pkt_len);
+
+	verify_pkt(recv_pkt, ret);
+}
+
+static void do_send(void)
+{
+	if (cfg_xdp_tx)
+		send_xsk();
+	else
+		send_pfpacket();
+}
+
+static void do_recv(void)
+{
+	if (cfg_xdp_rx)
+		recv_xsk();
+	else
+		recv_pfpacket();
+}
+
+static bool link_is_down(void)
+{
+	char path[PATH_MAX];
+	FILE *file;
+	char status;
+
+	snprintf(path, PATH_MAX, "/sys/class/net/%s/carrier", cfg_ifname);
+	file = fopen(path, "r");
+	if (!file)
+		error(1, errno, "%s", path);
+
+	if (fread(&status, 1, 1, file) != 1)
+		error(1, errno, "fread");
+
+	fclose(file);
+
+	return status == '0';
+}
+
+static void do_sync_client(void)
+{
+	struct sockaddr_in ip4addr = {
+		.sin_family = AF_INET,
+		.sin_port = cfg_port,
+		.sin_addr = cfg_daddr4,
+	};
+	struct sockaddr_in6 ip6addr = {
+		.sin6_family = AF_INET6,
+		.sin6_port = cfg_port,
+		.sin6_addr = cfg_daddr6,
+	};
+	const int retry_sleep_ms = 200;
+	const int retries_per_sec = 1000 / retry_sleep_ms;
+	const int max_retries = cfg_rcvtimeo * retries_per_sec;
+	int fd, ret, retries = 0;
+
+	fd = socket(cfg_family, SOCK_STREAM, 0);
+	if (fd == -1)
+		error(1, errno, "socket sync client");
+
+	/* If the client calls connect before the server listens,
+	 * the connection will fail immediately and the call returns
+	 * with ECONNREFUSED. Retry up to cfg_rcvtimeo.
+	 */
+	while (true) {
+		if (cfg_family == PF_INET)
+			ret = connect(fd, (void *)&ip4addr, sizeof(ip4addr));
+		else
+			ret = connect(fd, (void *)&ip6addr, sizeof(ip6addr));
+
+		if (ret == -1 && errno != ECONNREFUSED)
+			error(1, errno, "connect sync client");
+		if (ret == 0)
+			break;
+		retries++;
+		usleep(retry_sleep_ms * 1000);
+		if (retries == max_retries)
+			error(1, 0, "connect sync client: max_retries");
+	}
+
+	if (close(fd))
+		error(1, errno, "close sync client");
+}
+
+static void do_sync_server(void)
+{
+	int fdl, fdc;
+	int *fds;
+
+	/* Dual-stack, as not enabling IPV6_V6ONLY */
+	fds = start_reuseport_server(AF_INET6, SOCK_STREAM, "::",
+				     ntohs(cfg_port), cfg_rcvtimeo * 1000, 1);
+	if (!fds)
+		error(1, errno, "start_server");
+	fdl = *fds;
+
+	fdc = accept(fdl, NULL, NULL);
+	if (fdc == -1)
+		error(1, errno, "accept sync");
+
+	if (close(fdc))
+		error(1, errno, "close sync child");
+	if (close(fdl))
+		error(1, errno, "close sync listener");
+}
+
+static void sync_barrier(void)
+{
+	if (cfg_host_run)
+		do_sync_server();
+	else
+		do_sync_client();
+}
+
+static void cleanup(void)
+{
+	LIBBPF_OPTS(bpf_xdp_attach_opts, opts);
+
+	if (bpf_obj) {
+		opts.old_prog_fd = bpf_program__fd(bpf_obj->progs.rx);
+		if (opts.old_prog_fd >= 0)
+			bpf_xdp_detach(cfg_ifindex, cfg_xdp_flags, &opts);
+	}
+}
+
+static void setup_for_ipv4(void)
+{
+	cfg_family = PF_INET;
+	cfg_eth_proto = ETH_P_IP;
+	cfg_addr_len = sizeof(struct in_addr);
+	cfg_pkt_len = pkt_len_v4;
+	cfg_daddr = &cfg_daddr4;
+	cfg_saddr = &cfg_saddr4;
+}
+
+static void setup_for_ipv6(void)
+{
+	cfg_family = PF_INET6;
+	cfg_eth_proto = ETH_P_IPV6;
+	cfg_addr_len = sizeof(struct in6_addr);
+	cfg_pkt_len = pkt_len_v6;
+	cfg_daddr = &cfg_daddr6;
+	cfg_saddr = &cfg_saddr6;
+}
+
+static void parse_opts(int argc, char **argv)
+{
+	char *daddr = NULL, *saddr = NULL;
+	int c;
+
+	while ((c = getopt(argc, argv, "46cD:dfhi:m:M:p:q:RS:sTz")) != -1) {
+		switch (c) {
+		case '4':
+			setup_for_ipv4();
+			break;
+		case '6':
+			setup_for_ipv6();
+			break;
+		case 'c':
+			cfg_xdp_bind_flags |= XDP_COPY;
+			break;
+		case 'D':
+			daddr = optarg;
+			break;
+		case 'd':
+			cfg_xdp_flags |= XDP_FLAGS_DRV_MODE;
+			break;
+		case 'f':
+			cfg_fill_after_bind = true;
+			break;
+		case 'h':
+			cfg_host_run = true;
+			break;
+		case 'i':
+			cfg_ifname = optarg;
+			break;
+		case 'm':
+			cfg_mac_src = optarg;
+			break;
+		case 'M':
+			cfg_mac_dst = optarg;
+			break;
+		case 'p':
+			cfg_port = htons(atoi(optarg));
+			break;
+		case 'q':
+			cfg_send_queue_id = atoi(optarg);
+			break;
+		case 'R':
+			cfg_xdp_rx = true;
+			break;
+		case 'S':
+			saddr = optarg;
+			break;
+		case 's':
+			cfg_xdp_flags |= XDP_FLAGS_SKB_MODE;
+			break;
+		case 'T':
+			if (cfg_xdp_tx)
+				cfg_xdp_tx_force_attach = true;
+			cfg_xdp_tx = true;
+			break;
+		case 'z':
+			cfg_xdp_bind_flags |= XDP_ZEROCOPY;
+			break;
+		default:
+			error(1, 0, "%s: parse error", argv[0]);
+		}
+	}
+
+	if (cfg_family == PF_UNSPEC)
+		error(1, 0, "select one of -4 or -6");
+
+	if (!cfg_mac_src || !cfg_mac_dst || !saddr || !daddr)
+		error(1, 0, "all MAC and IP addresses must be set");
+	if (cfg_fill_after_bind && !cfg_xdp_rx && !cfg_xdp_tx)
+		error(1, 0, "'-f' is meaningless without '-R' or '-T'");
+
+	if (inet_pton(cfg_family, daddr, cfg_daddr) != 1)
+		error(1, 0, "dst addr parse error: dst ('-D')");
+	if (inet_pton(cfg_family, saddr, cfg_saddr) != 1)
+		error(1, 0, "src addr parse error: src ('-S')");
+
+	cfg_ifindex = if_nametoindex(cfg_ifname);
+	if (!cfg_ifindex)
+		error(1, 0, "ifname invalid");
+}
+
+static void handle_signal(int sig)
+{
+	/* Signal handler (rather than default termination) needed to
+	 * make sure the atexit cleanup is invoked and XDP is detached
+	 */
+	exit(1);
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+	int i;
+
+	parse_opts(argc, argv);
+	init_pkt();
+
+	/* A UDP socket to silence kernel-generated ICMP unreachable
+	 * without needing an iptables rule.
+	 */
+	udp_fd = socket(PF_INET6, SOCK_DGRAM, 0);
+	if (udp_fd == -1)
+		error(1, errno, "socket");
+
+	pfpacket_fd = socket(PF_PACKET, SOCK_RAW, 0);
+	if (pfpacket_fd == -1)
+		error(1, errno, "socket");
+
+	udp_bind();
+
+	pfpacket_setfilter_ipproto();
+	pfpacket_bind();
+
+	cfg_num_rxq = rxq_num(cfg_ifname);
+	if (cfg_num_rxq < 0)
+		error(1, -cfg_num_rxq, "rxq_num");
+
+	if (cfg_xdp_rx || cfg_xdp_tx_force_attach) {
+		bpf_obj = xsk_hw__open();
+		if (libbpf_get_error(bpf_obj))
+			error(1, libbpf_get_error(bpf_obj), "xsk_hw__open");
+
+		/* Not doing bpf_program__set_ifindex because it requests offload */
+
+		ret = xsk_hw__load(bpf_obj);
+		if (ret)
+			error(1, -ret, "xsk_hw__load");
+
+		bpf_obj->bss->port = cfg_port;
+		bpf_obj->bss->should_rx = cfg_xdp_rx;
+	}
+
+	if (cfg_xdp_rx || cfg_xdp_tx) {
+		xsks = calloc(cfg_num_rxq, sizeof(struct xsk));
+		if (!xsks)
+			error(1, ENOMEM, "calloc");
+
+		for (i = 0; i < cfg_num_rxq; i++) {
+			ret = open_xsk(&xsks[i], i);
+			if (ret)
+				error(1, -ret, "open_xsk");
+		}
+	}
+
+	if (cfg_xdp_rx) {
+		for (i = 0; i < cfg_num_rxq; i++) {
+			int sock_fd = xsk_socket__fd(xsks[i].socket);
+			__u32 queue_id = i;
+
+			ret = bpf_map__update_elem(bpf_obj->maps.xsk,
+						   &queue_id, sizeof(queue_id),
+						   &sock_fd, sizeof(sock_fd), 0);
+			if (ret)
+				error(1, -ret, "bpf_map__update_elem");
+		}
+	}
+
+	if (cfg_xdp_rx || cfg_xdp_tx_force_attach) {
+		ret = bpf_xdp_attach(cfg_ifindex,
+				     bpf_program__fd(bpf_obj->progs.rx),
+				     cfg_xdp_flags, NULL);
+		if (ret)
+			error(1, -ret, "bpf_xdp_attach");
+	}
+
+	/* Optionally exercise an AF_XDP API use edge case:
+	 * Bind the socket before making buffers available in the fillq.
+	 *
+	 * Peculiar behavior, but seen in practice and seen it tripping
+	 * up at least one driver that would leave networking disabled
+	 * on failing to allocate during init.
+	 */
+	if (cfg_fill_after_bind) {
+		for (i = 0; i < cfg_num_rxq; i++)
+			fill_xsk(&xsks[i]);
+	}
+
+	atexit(cleanup);
+	signal(SIGINT, handle_signal);
+	signal(SIGTERM, handle_signal);
+
+	/* XDP may need a delay for device reinitialization */
+	do {
+		usleep(100 * 1000);
+	} while (link_is_down());
+
+	sync_barrier();
+
+	if (cfg_host_run) {
+		usleep(100 * 1000);
+		do_recv();
+	} else {
+		do_send();
+	}
+}
-- 
2.45.2.505.gda0bf45e8d-goog



* [RFC PATCH net-next 3/3] selftests: drv-net: Add xsk_hw AF_XDP functionality test
  2024-06-11 20:42 [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test YiFei Zhu
  2024-06-11 20:42 ` [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers YiFei Zhu
  2024-06-11 20:42 ` [RFC PATCH net-next 2/3] selftests/bpf: Add xsk_hw AF_XDP functionality test YiFei Zhu
@ 2024-06-11 20:42 ` YiFei Zhu
  2024-06-11 21:09   ` Willem de Bruijn
  2024-06-12 11:47 ` [RFC PATCH net-next 0/3] selftests: Add " Magnus Karlsson
  3 siblings, 1 reply; 11+ messages in thread
From: YiFei Zhu @ 2024-06-11 20:42 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

Run tools/testing/selftests/bpf/xsk_hw.c as part of drv-net.

A matrix of TX/RX, copy/zerocopy, and driver mode / skb mode is
tested. Additionally, it tests some edge cases such as:
- Zerocopy TX with and without attaching an XDP prog.
- Zerocopy RX where binding happens before fillq gets filled.

TX and RX are tested separately, and the remote side always runs
the basic AF_PACKET handler rather than AF_XDP, in order to
isolate potential causes of test failures.

Currently the next-hop MAC address of each side must be manually
specified via LOCAL_NEXTHOP_MAC & REMOTE_NEXTHOP_MAC. Detecting these
addresses automatically is probably doable, but is left as future work;
it likely belongs in library code, since it also applies to csum.py.
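
As a rough illustration of the automatic detection mentioned above, the
next-hop MAC could be derived from the JSON output of `ip -j route get`
and `ip -j neigh show`. This is only a sketch and not part of the patch:
the helper name nexthop_mac and the sample data are hypothetical, and the
key names ("gateway", "dst", "lladdr") are assumed to match what iproute2
emits.

```python
import json


def nexthop_mac(route_json, neigh_json, dst):
    """Pick the next-hop MAC for dst from parsed `ip -j` output.

    route_json: text assumed to come from `ip -j route get <dst>`
    neigh_json: text assumed to come from `ip -j neigh show dev <ifname>`
    """
    routes = json.loads(route_json)
    if not routes:
        raise LookupError(f"no route to {dst}")

    # On-link destinations have no gateway; the next hop is dst itself.
    hop = routes[0].get("gateway", dst)

    for entry in json.loads(neigh_json):
        if entry.get("dst") == hop and "lladdr" in entry:
            return entry["lladdr"]
    raise LookupError(f"no neighbour entry for {hop}")


# Hypothetical sample data shaped like iproute2 JSON output.
ROUTE = '[{"dst": "10.138.15.235", "gateway": "10.138.0.1", "dev": "ens4"}]'
NEIGH = '[{"dst": "10.138.0.1", "lladdr": "42:01:0a:8a:00:01"}]'

print(nexthop_mac(ROUTE, NEIGH, "10.138.15.235"))
```

Keeping the parsing separate from running `ip` would let the same helper
serve both the local and the remote host.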

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../selftests/drivers/net/hw/xsk_hw.py        | 133 ++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/hw/xsk_hw.py

diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 4933d045ab66..e4647ba126a1 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -11,6 +11,7 @@ TEST_PROGS = \
 	hw_stats_l3_gre.sh \
 	loopback.sh \
 	pp_alloc_fail.py \
+	xsk_hw.py \
 	#
 
 TEST_FILES := \
diff --git a/tools/testing/selftests/drivers/net/hw/xsk_hw.py b/tools/testing/selftests/drivers/net/hw/xsk_hw.py
new file mode 100755
index 000000000000..f8ccbb0134b9
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/xsk_hw.py
@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""Run the tools/testing/selftests/bpf/xsk_hw testsuite."""
+
+from os import path
+
+from lib.py import ksft_run, ksft_exit, KsftSkipEx
+from lib.py import NetdevFamily, NetDrvEpEnv
+from lib.py import bkg, cmd, ip, wait_port_listen
+
+
+def build_shared_args(cfg, ipv4):
+    """Build common arguments between test cases."""
+    if ipv4:
+        proto, local_addr, remote_addr = "-4", cfg.v4, cfg.remote_v4
+    else:
+        proto, local_addr, remote_addr = "-6", cfg.v6, cfg.remote_v6
+
+    local_args = (f"{proto} -i {cfg.ifname} -S {local_addr} -D {remote_addr} "
+                  f"-m {cfg.mac_local} -M {cfg.mac_nexthop_local}")
+    remote_args = (f"{proto} -i {cfg.ifname} -S {remote_addr} -D {local_addr} "
+                   f"-m {cfg.mac_remote} -M {cfg.mac_nexthop_remote}")
+
+    return local_args, remote_args
+
+
+def test_receive(cfg, ipv4, extra_args):
+    """Test local XSK receive. Remote host sends crafted packets."""
+    local_args, remote_args = build_shared_args(cfg, ipv4)
+
+    rx_cmd = f"{cfg.bin_local} -h {local_args} {extra_args}"
+    tx_cmd = f"{cfg.bin_remote} {remote_args}"
+
+    with bkg(rx_cmd, exit_wait=True, fail=True):
+        wait_port_listen(8000, proto="udp")
+        cmd(tx_cmd, host=cfg.remote)
+
+
+def test_transmit(cfg, ipv4, extra_args):
+    """Test local XSK transmit. Remote host verifies packets."""
+    local_args, remote_args = build_shared_args(cfg, ipv4)
+
+    rx_cmd = f"{cfg.bin_remote} -h {remote_args}"
+    tx_cmd = f"{cfg.bin_local} {local_args} {extra_args}"
+
+    with bkg(rx_cmd, host=cfg.remote, exit_wait=True, fail=True):
+        wait_port_listen(8000, proto="udp", host=cfg.remote)
+        cmd(tx_cmd)
+
+
+def test_builder(name, cfg, ipv4, tx, extra_args="", required_features=()):
+    """Construct specific tests from the common template.
+
+       Most tests follow the same basic pattern, differing only in
+       the direction of the test, the required XDP features, and
+       flags passed to xsk_hw."""
+    def f(cfg):
+        if ipv4:
+            cfg.require_v4()
+        else:
+            cfg.require_v6()
+
+        if not cfg.have_xdp_features.issuperset(required_features):
+            raise KsftSkipEx(f"Test requires XDP features {required_features}, "
+                             f"got: {cfg.have_xdp_features}")
+
+        if tx:
+            test_transmit(cfg, ipv4, extra_args)
+        else:
+            test_receive(cfg, ipv4, extra_args)
+
+    if ipv4:
+        f.__name__ = "ipv4_" + name
+    else:
+        f.__name__ = "ipv6_" + name
+    return f
+
+
+def check_nic_features(cfg) -> None:
+    """Populate device supported XDP features from netdev netlink.
+
+       If the device does not support any of the required features, then skip
+       the relevant tests."""
+    features = NetdevFamily().dev_get({"ifindex": cfg.ifindex})
+    cfg.have_xdp_features = features["xdp-features"]
+
+
+def main() -> None:
+    with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
+        check_nic_features(cfg)
+
+        cfg.bin_local = path.abspath(path.dirname(__file__) + "/../../../bpf/xsk_hw")
+        cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
+
+        cfg.mac_local = cfg.dev["address"]
+        cfg.mac_remote = ip(f"link show dev {cfg.ifname}",
+                            host=cfg.remote, json=True)[0]["address"]
+
+        cfg.mac_nexthop_local = cfg.env["LOCAL_NEXTHOP_MAC"]
+        cfg.mac_nexthop_remote = cfg.env["REMOTE_NEXTHOP_MAC"]
+
+        cases = []
+        for ipv4 in [True, False]:
+            # Basic test with AF_PACKET on both ends
+            cases.append(test_builder("basic", cfg, ipv4, False))
+
+            cases.append(test_builder("tx_skb_copy", cfg, ipv4, True, "-T -s -c"))
+            cases.append(test_builder("tx_skb_copy_force_attach", cfg, ipv4, True, "-TT -s -c"))
+            cases.append(test_builder("rx_skb_copy", cfg, ipv4, False, "-R -s -c"))
+
+            cases.append(test_builder("tx_drv_copy", cfg, ipv4, True, "-T -d -c",
+                                      {"basic", "ndo-xmit"}))
+            cases.append(test_builder("tx_drv_copy_force_attach", cfg, ipv4, True, "-TT -d -c",
+                                      {"basic", "ndo-xmit"}))
+            cases.append(test_builder("rx_drv_copy", cfg, ipv4, False, "-R -d -c",
+                                      {"basic", "redirect"}))
+
+            cases.append(test_builder("tx_drv_zerocopy", cfg, ipv4, True, "-T -d -z",
+                                      {"basic", "xsk-zerocopy", "ndo-xmit"}))
+            cases.append(test_builder("tx_drv_zerocopy_force_attach", cfg, ipv4, True, "-TT -d -z",
+                                      {"basic", "xsk-zerocopy", "ndo-xmit"}))
+            cases.append(test_builder("rx_drv_zerocopy", cfg, ipv4, False, "-R -d -z",
+                                      {"basic", "xsk-zerocopy", "redirect"}))
+            cases.append(test_builder("rx_drv_zerocopy_fill_after_bind", cfg, ipv4, False, "-R -d -z -f",
+                                      {"basic", "xsk-zerocopy", "redirect"}))
+
+        ksft_run(cases=cases, args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.45.2.505.gda0bf45e8d-goog



* Re: [RFC PATCH net-next 3/3] selftests: drv-net: Add xsk_hw AF_XDP functionality test
  2024-06-11 20:42 ` [RFC PATCH net-next 3/3] selftests: drv-net: " YiFei Zhu
@ 2024-06-11 21:09   ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2024-06-11 21:09 UTC (permalink / raw)
  To: YiFei Zhu, netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

YiFei Zhu wrote:
> Run tools/testing/selftest/bpf/xsk_hw.c as part of drv-net.
> 
> A matrix of TX/RX, copy/zerocopy, and driver mode / skb mode, are
> tested. Additionally, it tests some edge cases such as:
> - Zerocopy TX with and without attaching an XDP prog.
> - Zerocopy RX where binding happens before fillq gets filled.
> 
> TX and RX are tested separately, and the remote side always runs
> the basic AF_PACKET handler rather than AF_XDP, in order to
> isolate potential causes of test failures.
> 
> Currently the next-hop MAC address of each side must be manually
> specified via LOCAL_NEXTHOP_MAC & REMOTE_NEXTHOP_MAC. It's probably
> doable to detect these addresses automatically, but it's future work,
> and probably library code since it is also applicable to csum.py.
> 
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>


* Re: [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers
  2024-06-11 20:42 ` [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers YiFei Zhu
@ 2024-06-11 21:12   ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2024-06-11 21:12 UTC (permalink / raw)
  To: YiFei Zhu, netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

YiFei Zhu wrote:
> This helper may be useful for other AF_XDP tests, such as xsk_hw.
> Moving it out so we don't need to copy-paste that function.
> 
> I also changed the function from directly calling error(1, errno, ...)
> to returning an error because I don't think it makes sense for a
> library function to outright kill the process if the function fails.
> 
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> ---
>  tools/testing/selftests/bpf/network_helpers.c | 27 +++++++++++++++++++
>  tools/testing/selftests/bpf/network_helpers.h |  2 ++
>  tools/testing/selftests/bpf/xdp_hw_metadata.c | 27 ++-----------------
>  3 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
> index 35250e6cde7f..4c3bef07df23 100644
> --- a/tools/testing/selftests/bpf/network_helpers.c
> +++ b/tools/testing/selftests/bpf/network_helpers.c
> @@ -569,6 +569,33 @@ int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
>  	return 0;
>  }
>  
> +int rxq_num(const char *ifname)
> +{
> +	struct ethtool_channels ch = {
> +		.cmd = ETHTOOL_GCHANNELS,
> +	};
> +	struct ifreq ifr = {
> +		.ifr_data = (void *)&ch,
> +	};
> +	strncpy(ifr.ifr_name, ifname, IF_NAMESIZE - 1);
> +	int fd, ret, err;

Since this is only an RFC: when sending it for inclusion, let's move the
strncpy below the declarations, so as not to mix declarations and code.


* Re: [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test
  2024-06-11 20:42 [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test YiFei Zhu
                   ` (2 preceding siblings ...)
  2024-06-11 20:42 ` [RFC PATCH net-next 3/3] selftests: drv-net: " YiFei Zhu
@ 2024-06-12 11:47 ` Magnus Karlsson
  2024-06-12 12:49   ` Maciej Fijalkowski
  2024-06-12 13:57   ` Willem de Bruijn
  3 siblings, 2 replies; 11+ messages in thread
From: Magnus Karlsson @ 2024-06-12 11:47 UTC (permalink / raw)
  To: YiFei Zhu
  Cc: netdev, bpf, Björn Töpel, Magnus Karlsson,
	Maciej Fijalkowski, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Stanislav Fomichev, Willem de Bruijn

On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
>
> We have observed that hardware NIC drivers may have faulty AF_XDP
> implementations, and there seem to be a lack of a test of various modes
> in which AF_XDP could run. This series adds a test to verify that NIC
> drivers implements many AF_XDP features by performing a send / receive
> of a single UDP packet.
>
> I put the C code of the test under selftests/bpf because I'm not really
> sure how I'd build the BPF-related code without the selftests/bpf
> build infrastructure.

Happy to see that you are contributing a number of new tests. Would it
be possible for you to integrate this into the xskxceiver framework?
You can find that in selftests/bpf too. By default, it will run its
tests using veth, but if you provide an interface name after the -i
option, it will run the tests over a real interface. I put the NIC in
loopback mode to use this feature, but feel free to add a new mode if
necessary. A lot of the setup and data plane code that you add already
exists in xskxceiver, so I would prefer if you could reuse it. Your
tests are new though and they would be valuable to have.

You could make the default packet that is sent in xskxceiver be the
UDP packet that you want and then add all the other logic that you
have to a number of new tests that you introduce.

> Tested on Google Cloud, with GVE:
>
>   $ sudo NETIF=ens4 REMOTE_TYPE=ssh \
>     REMOTE_ARGS="root@10.138.15.235" \
>     LOCAL_V4="10.138.15.234" \
>     REMOTE_V4="10.138.15.235" \
>     LOCAL_NEXTHOP_MAC="42:01:0a:8a:00:01" \
>     REMOTE_NEXTHOP_MAC="42:01:0a:8a:00:01" \
>     python3 xsk_hw.py
>
>   KTAP version 1
>   1..22
>   ok 1 xsk_hw.ipv4_basic
>   ok 2 xsk_hw.ipv4_tx_skb_copy
>   ok 3 xsk_hw.ipv4_tx_skb_copy_force_attach
>   ok 4 xsk_hw.ipv4_rx_skb_copy
>   ok 5 xsk_hw.ipv4_tx_drv_copy
>   ok 6 xsk_hw.ipv4_tx_drv_copy_force_attach
>   ok 7 xsk_hw.ipv4_rx_drv_copy
>   [...]
>   # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: recv_pfpacket: Timeout\n'
>   not ok 8 xsk_hw.ipv4_tx_drv_zerocopy
>   ok 9 xsk_hw.ipv4_tx_drv_zerocopy_force_attach
>   ok 10 xsk_hw.ipv4_rx_drv_zerocopy
>   [...]
>   # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: connect sync client: max_retries\n'
>   [...]
>   # Exception| STDERR: b'/linux/tools/testing/selftests/bpf/xsk_hw: open_xsk: Device or resource busy\n'
>   not ok 11 xsk_hw.ipv4_rx_drv_zerocopy_fill_after_bind
>   ok 12 xsk_hw.ipv6_basic # SKIP Test requires IPv6 connectivity
>   [...]
>   ok 22 xsk_hw.ipv6_rx_drv_zerocopy_fill_after_bind # SKIP Test requires IPv6 connectivity
>   # Totals: pass:9 fail:2 xfail:0 xpass:0 skip:11 error:0
>
> YiFei Zhu (3):
>   selftests/bpf: Move rxq_num helper from xdp_hw_metadata to
>     network_helpers
>   selftests/bpf: Add xsk_hw AF_XDP functionality test
>   selftests: drv-net: Add xsk_hw AF_XDP functionality test
>
>  tools/testing/selftests/bpf/.gitignore        |   1 +
>  tools/testing/selftests/bpf/Makefile          |   7 +-
>  tools/testing/selftests/bpf/network_helpers.c |  27 +
>  tools/testing/selftests/bpf/network_helpers.h |  16 +
>  tools/testing/selftests/bpf/progs/xsk_hw.c    |  72 ++
>  tools/testing/selftests/bpf/xdp_hw_metadata.c |  27 +-
>  tools/testing/selftests/bpf/xsk_hw.c          | 844 ++++++++++++++++++
>  .../testing/selftests/drivers/net/hw/Makefile |   1 +
>  .../selftests/drivers/net/hw/xsk_hw.py        | 133 +++
>  9 files changed, 1102 insertions(+), 26 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/xsk_hw.c
>  create mode 100644 tools/testing/selftests/bpf/xsk_hw.c
>  create mode 100755 tools/testing/selftests/drivers/net/hw/xsk_hw.py
>
> --
> 2.45.2.505.gda0bf45e8d-goog
>
>


* Re: [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test
  2024-06-12 11:47 ` [RFC PATCH net-next 0/3] selftests: Add " Magnus Karlsson
@ 2024-06-12 12:49   ` Maciej Fijalkowski
  2024-06-12 16:44     ` YiFei Zhu
  2024-06-12 13:57   ` Willem de Bruijn
  1 sibling, 1 reply; 11+ messages in thread
From: Maciej Fijalkowski @ 2024-06-12 12:49 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: YiFei Zhu, netdev, bpf, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Stanislav Fomichev,
	Willem de Bruijn

On Wed, Jun 12, 2024 at 01:47:06PM +0200, Magnus Karlsson wrote:
> On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > We have observed that hardware NIC drivers may have faulty AF_XDP
> > implementations, and there seem to be a lack of a test of various modes
> > in which AF_XDP could run. This series adds a test to verify that NIC
> > drivers implements many AF_XDP features by performing a send / receive
> > of a single UDP packet.
> >
> > I put the C code of the test under selftests/bpf because I'm not really
> > sure how I'd build the BPF-related code without the selftests/bpf
> > build infrastructure.
> 
> Happy to see that you are contributing a number of new tests. Would it
> be possible for you to integrate this into the xskxceiver framework?
> You can find that in selftests/bpf too. By default, it will run its
> tests using veth, but if you provide an interface name after the -i
> option, it will run the tests over a real interface. I put the NIC in
> loopback mode to use this feature, but feel free to add a new mode if
> necessary. A lot of the setup and data plane code that you add already
> exists in xskxceiver, so I would prefer if you could reuse it. Your
> tests are new though and they would be valuable to have.

+1

I just don't believe that you guys were not aware that xskxceiver exists.
Please provide us a proper explanation/justification why this was not
fulfilling your needs and you decided to go with another test suite.

> 
> You could make the default packet that is sent in xskxceiver be the
> UDP packet that you want and then add all the other logic that you
> have to a number of new tests that you introduce.
> 
> > Tested on Google Cloud, with GVE:
> >
> >   $ sudo NETIF=ens4 REMOTE_TYPE=ssh \
> >     REMOTE_ARGS="root@10.138.15.235" \
> >     LOCAL_V4="10.138.15.234" \
> >     REMOTE_V4="10.138.15.235" \
> >     LOCAL_NEXTHOP_MAC="42:01:0a:8a:00:01" \
> >     REMOTE_NEXTHOP_MAC="42:01:0a:8a:00:01" \
> >     python3 xsk_hw.py
> >
> >   KTAP version 1
> >   1..22
> >   ok 1 xsk_hw.ipv4_basic
> >   ok 2 xsk_hw.ipv4_tx_skb_copy
> >   ok 3 xsk_hw.ipv4_tx_skb_copy_force_attach
> >   ok 4 xsk_hw.ipv4_rx_skb_copy
> >   ok 5 xsk_hw.ipv4_tx_drv_copy
> >   ok 6 xsk_hw.ipv4_tx_drv_copy_force_attach
> >   ok 7 xsk_hw.ipv4_rx_drv_copy
> >   [...]
> >   # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: recv_pfpacket: Timeout\n'
> >   not ok 8 xsk_hw.ipv4_tx_drv_zerocopy
> >   ok 9 xsk_hw.ipv4_tx_drv_zerocopy_force_attach
> >   ok 10 xsk_hw.ipv4_rx_drv_zerocopy
> >   [...]
> >   # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: connect sync client: max_retries\n'
> >   [...]
> >   # Exception| STDERR: b'/linux/tools/testing/selftests/bpf/xsk_hw: open_xsk: Device or resource busy\n'
> >   not ok 11 xsk_hw.ipv4_rx_drv_zerocopy_fill_after_bind
> >   ok 12 xsk_hw.ipv6_basic # SKIP Test requires IPv6 connectivity
> >   [...]
> >   ok 22 xsk_hw.ipv6_rx_drv_zerocopy_fill_after_bind # SKIP Test requires IPv6 connectivity
> >   # Totals: pass:9 fail:2 xfail:0 xpass:0 skip:11 error:0
> >
> > YiFei Zhu (3):
> >   selftests/bpf: Move rxq_num helper from xdp_hw_metadata to
> >     network_helpers
> >   selftests/bpf: Add xsk_hw AF_XDP functionality test
> >   selftests: drv-net: Add xsk_hw AF_XDP functionality test
> >
> >  tools/testing/selftests/bpf/.gitignore        |   1 +
> >  tools/testing/selftests/bpf/Makefile          |   7 +-
> >  tools/testing/selftests/bpf/network_helpers.c |  27 +
> >  tools/testing/selftests/bpf/network_helpers.h |  16 +
> >  tools/testing/selftests/bpf/progs/xsk_hw.c    |  72 ++
> >  tools/testing/selftests/bpf/xdp_hw_metadata.c |  27 +-
> >  tools/testing/selftests/bpf/xsk_hw.c          | 844 ++++++++++++++++++
> >  .../testing/selftests/drivers/net/hw/Makefile |   1 +
> >  .../selftests/drivers/net/hw/xsk_hw.py        | 133 +++
> >  9 files changed, 1102 insertions(+), 26 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/progs/xsk_hw.c
> >  create mode 100644 tools/testing/selftests/bpf/xsk_hw.c
> >  create mode 100755 tools/testing/selftests/drivers/net/hw/xsk_hw.py
> >
> > --
> > 2.45.2.505.gda0bf45e8d-goog
> >
> >
> 


* Re: [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test
  2024-06-12 11:47 ` [RFC PATCH net-next 0/3] selftests: Add " Magnus Karlsson
  2024-06-12 12:49   ` Maciej Fijalkowski
@ 2024-06-12 13:57   ` Willem de Bruijn
  1 sibling, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2024-06-12 13:57 UTC (permalink / raw)
  To: Magnus Karlsson, YiFei Zhu
  Cc: netdev, bpf, Björn Töpel, Magnus Karlsson,
	Maciej Fijalkowski, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Stanislav Fomichev, Willem de Bruijn

Magnus Karlsson wrote:
> On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > We have observed that hardware NIC drivers may have faulty AF_XDP
> > implementations, and there seem to be a lack of a test of various modes
> > in which AF_XDP could run. This series adds a test to verify that NIC
> > drivers implements many AF_XDP features by performing a send / receive
> > of a single UDP packet.
> >
> > I put the C code of the test under selftests/bpf because I'm not really
> > sure how I'd build the BPF-related code without the selftests/bpf
> > build infrastructure.
> 
> Happy to see that you are contributing a number of new tests. Would it
> be possible for you to integrate this into the xskxceiver framework?

Makes sense, we'll need to take a look.

This is an internal test that we have been using for a long time in
our test framework.

My mistake for not keeping up at all with the changes to xskxceiver.c
in the meantime.

We want to test each case independently, including a few non-obvious
cases that we discovered from real use, notably:

- Using XSK only for Tx, without installing an Rx program
- Using XSK with an empty fill queue, filling it after bind
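
For illustration, here is how these cases map onto xsk_hw flags, with the
same feature gating that xsk_hw.py applies. This is a sketch only: the
flags and feature names are copied from the patch, while EDGE_CASES and
plan() are hypothetical names introduced for the example.

```python
# Flags follow the getopt string in xsk_hw.c; feature names follow xsk_hw.py.
EDGE_CASES = {
    # A single -T opens the XSK for Tx without attaching an XDP program.
    "tx_drv_zerocopy": ("-T -d -z", {"basic", "xsk-zerocopy", "ndo-xmit"}),
    # A second -T forces the XDP program to be attached as well.
    "tx_drv_zerocopy_force_attach": ("-TT -d -z",
                                     {"basic", "xsk-zerocopy", "ndo-xmit"}),
    # -f makes buffers available in the fill queue only after bind.
    "rx_drv_zerocopy_fill_after_bind": ("-R -d -z -f",
                                        {"basic", "xsk-zerocopy", "redirect"}),
}


def plan(have_features):
    """Split the cases into runnable ones (name -> flags) and skipped names."""
    run, skip = {}, []
    for name, (flags, required) in EDGE_CASES.items():
        if have_features.issuperset(required):
            run[name] = flags
        else:
            skip.append(name)
    return run, skip


# Hypothetical device that advertises redirect but not ndo-xmit.
run, skip = plan({"basic", "redirect", "xsk-zerocopy"})
print(run)
print(skip)
```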

> You can find that in selftests/bpf too. By default, it will run its
> tests using veth, but if you provide an interface name after the -i
> option, it will run the tests over a real interface. I put the NIC in
> loopback mode to use this feature, but feel free to add a new mode if
> necessary.

We really do want two-machine tests, not loopback mode, and we also
want to integrate into the drv-net infrastructure.

Another non-obvious feature is to test one side AF_XDP and use
PF_PACKET on the other side, to be able to isolate and exercise only
the Tx or Rx path in a test.
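
A minimal sketch of that split, modeled on test_receive() and
test_transmit() in xsk_hw.py: only the local command line ever receives
the AF_XDP flags, so the remote end stays on the plain PF_PACKET path.
The function name build_cmds and the sample arguments are hypothetical.

```python
def build_cmds(bin_local, bin_remote, local_args, remote_args, tx, xdp_flags):
    """Return (receiver_cmd, sender_cmd, receiver_is_remote).

    -h selects the receiving role in xsk_hw; xdp_flags (for example
    "-T -d -z") are only ever appended to the local command.
    """
    if tx:
        # Local side transmits over AF_XDP; remote side receives.
        rx_cmd = f"{bin_remote} -h {remote_args}"
        tx_cmd = f"{bin_local} {local_args} {xdp_flags}"
        return rx_cmd, tx_cmd, True

    # Local side receives over AF_XDP; remote side sends.
    rx_cmd = f"{bin_local} -h {local_args} {xdp_flags}"
    tx_cmd = f"{bin_remote} {remote_args}"
    return rx_cmd, tx_cmd, False


rx_cmd, tx_cmd, remote_rx = build_cmds("./xsk_hw", "/tmp/xsk_hw",
                                       "-4 -i eth0", "-4 -i eth0",
                                       tx=True, xdp_flags="-T -d -z")
print(rx_cmd)
print(tx_cmd)
```

Since exactly one path under test changes per case, a failure can be
attributed to either the local Tx or the local Rx side.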

> A lot of the setup and data plane code that you add already
> exists in xskxceiver, so I would prefer if you could reuse it. Your
> tests are new though and they would be valuable to have.
> 
> You could make the default packet that is sent in xskxceiver be the
> UDP packet that you want and then add all the other logic that you
> have to a number of new tests that you introduce.


* Re: [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test
  2024-06-12 12:49   ` Maciej Fijalkowski
@ 2024-06-12 16:44     ` YiFei Zhu
  2024-06-13  6:42       ` Magnus Karlsson
  0 siblings, 1 reply; 11+ messages in thread
From: YiFei Zhu @ 2024-06-12 16:44 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Magnus Karlsson, netdev, bpf, Björn Töpel,
	Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Stanislav Fomichev, Willem de Bruijn

On Wed, Jun 12, 2024 at 5:50 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Jun 12, 2024 at 01:47:06PM +0200, Magnus Karlsson wrote:
> > On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> > >
> > > We have observed that hardware NIC drivers may have faulty AF_XDP
> > > implementations, and there seem to be a lack of a test of various modes
> > > in which AF_XDP could run. This series adds a test to verify that NIC
> > > drivers implements many AF_XDP features by performing a send / receive
> > > of a single UDP packet.
> > >
> > > I put the C code of the test under selftests/bpf because I'm not really
> > > sure how I'd build the BPF-related code without the selftests/bpf
> > > build infrastructure.
> >
> > Happy to see that you are contributing a number of new tests. Would it
> > be possible for you to integrate this into the xskxceiver framework?
> > You can find that in selftests/bpf too. By default, it will run its
> > tests using veth, but if you provide an interface name after the -i
> > option, it will run the tests over a real interface. I put the NIC in
> > loopback mode to use this feature, but feel free to add a new mode if
> > necessary. A lot of the setup and data plane code that you add already
> > exists in xskxceiver, so I would prefer if you could reuse it. Your
> > tests are new though and they would be valuable to have.
>
> +1
>
> I just don't believe that you guys were not aware that xskxceiver exists.
> Please provide us a proper explanation/justification why this was not
> fulfilling your needs and you decided to go with another test suite.

To answer this question, I can't speak for others, but I personally
was not fully aware.

Over a year ago when we were testing AF_XDP latency on internal NIC
drivers, we extended our internal latency test tool to support AF_XDP.
And that was when we observed the NICs we were testing had faulty
implementations - panics, packet corruptions, random drops; and we
decided to simplify the latency suite to add a simple pass/fail test
to our testing infrastructure, and we named it xsk_hw. The test was
specifically designed to test hardware NICs (rather than veth), and
there was a bunch of code around the test to reserve and set up
machines, and to obtain information such as the IP addresses and the
host and next-hop MAC addresses. At the time, the code was deemed too
dependent on our internal multi-machine-testing infrastructure to
upstream, but it has been running as part of our test suite since.

This brings us to recently. I was informed that upstream now has
drv-net, and now that upstream also has multi-machine testing, it's
time to upstream it. Hence this patch series, which I made after
adapting the code to use drv-net and network_helpers.

As for xskxceiver, I personally discarded the idea after reading its
initial block comment, which says it spawns two threads in a veth pair
to test AF_XDP. My reaction was "okay, this doesn't test hardware NICs,
and extending that test to hardware would probably be a major rewrite
that is not worth it", so I did not look too deeply into its code. I was
unaware that it can test a real interface, and that's partially my
fault.

I'll take a look at xskxceiver and see how feasible it is to integrate
this into xskxceiver.

> >
> > You could make the default packet that is sent in xskxceiver be the
> > UDP packet that you want and then add all the other logic that you
> > have to a number of new tests that you introduce.


* Re: [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test
  2024-06-12 16:44     ` YiFei Zhu
@ 2024-06-13  6:42       ` Magnus Karlsson
  0 siblings, 0 replies; 11+ messages in thread
From: Magnus Karlsson @ 2024-06-13  6:42 UTC (permalink / raw)
  To: YiFei Zhu
  Cc: Maciej Fijalkowski, netdev, bpf, Björn Töpel,
	Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Stanislav Fomichev, Willem de Bruijn

On Wed, 12 Jun 2024 at 18:44, YiFei Zhu <zhuyifei@google.com> wrote:
>
> On Wed, Jun 12, 2024 at 5:50 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Wed, Jun 12, 2024 at 01:47:06PM +0200, Magnus Karlsson wrote:
> > > On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> > > >
> > > > We have observed that hardware NIC drivers may have faulty AF_XDP
> > > > implementations, and there seem to be a lack of a test of various modes
> > > > in which AF_XDP could run. This series adds a test to verify that NIC
> > > > drivers implements many AF_XDP features by performing a send / receive
> > > > of a single UDP packet.
> > > >
> > > > I put the C code of the test under selftests/bpf because I'm not really
> > > > sure how I'd build the BPF-related code without the selftests/bpf
> > > > build infrastructure.
> > >
> > > Happy to see that you are contributing a number of new tests. Would it
> > > be possible for you to integrate this into the xskxceiver framework?
> > > You can find that in selftests/bpf too. By default, it will run its
> > > tests using veth, but if you provide an interface name after the -i
> > > option, it will run the tests over a real interface. I put the NIC in
> > > loopback mode to use this feature, but feel free to add a new mode if
> > > necessary. A lot of the setup and data plane code that you add already
> > > exists in xskxceiver, so I would prefer if you could reuse it. Your
> > > tests are new though and they would be valuable to have.
> >
> > +1
> >
> > I just don't believe that you guys were not aware that xskxceiver exist.
> > Please provide us a proper explanation/justification why this was not
> > fulfilling your needs and you decided to go with another test suite.
>
> To answer this question, I can't speak for others, but I personally
> was not fully aware.
>
> Over a year ago when we were testing AF_XDP latency on internal NIC
> drivers, we extended our internal latency test tool to support AF_XDP.
> And that was when we observed the NICs we were testing had faulty
> implementations - panics, packet corruptions, random drops; and we
> decided to simplify the latency suite to add a simple pass/fail test
> to our testing infrastructure, and we named it xsk_hw. The test was
> specifically designed to test hardware NICs (rather than veth), and
> there was a bunch of code around the test, to reserve & setup
> machines, and to obtain information such as the IP addresses and the
> host and next hop MAC addresses. At the time, the code was deemed too
> dependent on our internal multi-machine-testing infrastructure to
> upstream, but it has been running as part of our test suite since.
>
> This brings us to recently. I was informed that upstream now has
> drv-net, and now that upstream also has multi-machine testing, it's
> time to upstream it. Hence this patch series, which I made after
> adapting the code to use drv-net and network_helpers.

I was not aware of drv-net. I think it would be a really good idea to
just hook up xskxceiver to this even without adding any new tests. If
this is something that is run automatically for drivers, perfect, we
should make use of it. Any idea what it would take to make xskxceiver
use drv-net?

> As for xskxceiver, for me personally, I discarded the idea after
> reading the initial block comment of xskxceiver saying it spawns two
> threads in a veth pair to test AF_XDP, which in my mind was like "okay
> this doesn't test hardware NICs, and to extend that test to hardware
> is probably a major rewrite that is probably not worth it", so I did not
> look too deeply into its code. I personally was unaware that it can
> test a real interface, and that's partially my fault.

Or mine for not updating the initial block comment. In any case, no worries!

> I'll take a look at xskxceiver and see how feasible it is to integrate
> this into xskxceiver.

Thanks! Please keep the drv-net integration in mind. Hopefully it is
not that much work to tweak xskxceiver to fit into that.

> > >
> > > You could make the default packet that is sent in xskxceiver be the
> > > UDP packet that you want and then add all the other logic that you
> > > have to a number of new tests that you introduce.


end of thread, other threads:[~2024-06-13  6:43 UTC | newest]

Thread overview: 11+ messages
2024-06-11 20:42 [RFC PATCH net-next 0/3] selftests: Add AF_XDP functionality test YiFei Zhu
2024-06-11 20:42 ` [RFC PATCH net-next 1/3] selftests/bpf: Move rxq_num helper from xdp_hw_metadata to network_helpers YiFei Zhu
2024-06-11 21:12   ` Willem de Bruijn
2024-06-11 20:42 ` [RFC PATCH net-next 2/3] selftests/bpf: Add xsk_hw AF_XDP functionality test YiFei Zhu
2024-06-11 20:42 ` [RFC PATCH net-next 3/3] selftests: drv-net: " YiFei Zhu
2024-06-11 21:09   ` Willem de Bruijn
2024-06-12 11:47 ` [RFC PATCH net-next 0/3] selftests: Add " Magnus Karlsson
2024-06-12 12:49   ` Maciej Fijalkowski
2024-06-12 16:44     ` YiFei Zhu
2024-06-13  6:42       ` Magnus Karlsson
2024-06-12 13:57   ` Willem de Bruijn
