[PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases
@ 2024-07-23 18:24 Kui-Feng Lee
  2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 18:24 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii, sdf
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Capture packets in the background for flaky test cases related to
network features.

We have some flaky test cases that are difficult to debug without
knowing what the traffic looks like. Capturing packets, the CI log and
packet files may help developers to fix these flaky test cases.

This patch set monitors a few test cases. Recently, they have been
showing flaky behavior.

    IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
    IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
    IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
    IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
    IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
    IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
    Packet file: packets-2172-86.log
    #280/87  select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK 

The above block is the log of a test case. It shows every packets of a
connection. The captured packets are stored in the file called
packets-2172-86.log.

The following block is an example that monitors the network traffic of
a test case. This test is running in the network namespace
"testns". You can pass NULL to traffic_monitor_start() if the entire
test, from traffic_monitor_start() to traffic_monitor_stop(), is
running in the same namespace.

    struct tmonitor_ctx *tmon;

    ...
    tmon = traffic_monitor_start("testns");
    ASSERT_TRUE(tmon, "traffic_monitor_start");

    ... test ...

    traffic_monitor_stop(tmon);

traffic_monitor_start() may fail, but we just ignore it since the
failure doesn't affect the following main test.

This feature is enabled only if BPF selftests are built with
TRAFFIC_MONITOR variable being defined. For example,

    make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf

This command will enable traffic monitoring for BPF selftests. That
means we have to turn it on to get the log at CI.

---

Changes from v1:

 - Initialize log_fd in traffic_monitor_start().

 - Remove redundant including.

v1: https://lore.kernel.org/all/20240713055552.2482367-5-thinker.li@gmail.com/

Kui-Feng Lee (4):
  selftests/bpf: Add traffic monitor functions.
  selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  selftests/bpf: Monitor traffic for sockmap_listen.
  selftests/bpf: Monitor traffic for select_reuseport.

 tools/testing/selftests/bpf/Makefile          |   5 +
 tools/testing/selftests/bpf/network_helpers.c | 382 ++++++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  16 +
 .../bpf/prog_tests/select_reuseport.c         |   7 +
 .../selftests/bpf/prog_tests/sockmap_listen.c |   8 +
 .../selftests/bpf/prog_tests/tc_redirect.c    |   5 +
 6 files changed, 423 insertions(+)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-23 18:24 [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
@ 2024-07-23 18:24 ` Kui-Feng Lee
  2024-07-23 22:03   ` Kui-Feng Lee
                     ` (2 more replies)
  2024-07-23 18:24 ` [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime Kui-Feng Lee
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 18:24 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii, sdf
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Add functions that capture packets and print log in the background. They
are supposed to be used for debugging flaky network test cases. A monitored
test case should call traffic_monitor_start() to start a thread to capture
packets in the background for a given namespace and call
traffic_monitor_stop() to stop capturing.

    IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
    IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
    IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
    IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
    IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
    IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
    Packet file: packets-2172-86.log
    #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK

The above is the output of an example. It shows the packets of a connection
and the name of the file that contains captured packets in the directory
/tmp/tmon_pcap. The file can be loaded by tcpdump or wireshark.

This feature only works if TRAFFIC_MONITOR variable has been passed to
build BPF selftests. For example,

  make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf

This command will build BPF selftests with this feature enabled.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |   5 +
 tools/testing/selftests/bpf/network_helpers.c | 382 ++++++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  16 +
 3 files changed, 403 insertions(+)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index dd49c1d23a60..9dfe17588689 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -41,6 +41,11 @@ CFLAGS += -g $(OPT_FLAGS) -rdynamic					\
 LDFLAGS += $(SAN_LDFLAGS)
 LDLIBS += $(LIBELF_LIBS) -lz -lrt -lpthread
 
+ifneq ($(TRAFFIC_MONITOR),)
+LDLIBS += -lpcap
+CFLAGS += -DTRAFFIC_MONITOR=1
+endif
+
 # The following tests perform type punning and they may break strict
 # aliasing rules, which are exploited by both GCC and clang by default
 # while optimizing.  This can lead to broken programs.
diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index e0cba4178e41..c881f53c8218 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -10,6 +10,7 @@
 
 #include <arpa/inet.h>
 #include <sys/mount.h>
+#include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/un.h>
 
@@ -18,6 +19,14 @@
 #include <linux/in6.h>
 #include <linux/limits.h>
 
+#include <netinet/udp.h>
+
+#include <pthread.h>
+/* Prevent pcap.h from including pcap/bpf.h and causing conflicts */
+#define PCAP_DONT_INCLUDE_PCAP_BPF_H 1
+#include <pcap/pcap.h>
+#include <pcap/dlt.h>
+
 #include "bpf_util.h"
 #include "network_helpers.h"
 #include "test_progs.h"
@@ -575,6 +584,379 @@ int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
 	return 0;
 }
 
+#ifdef TRAFFIC_MONITOR
+struct tmonitor_ctx {
+	pcap_t *pcap;
+	pcap_dumper_t *dumper;
+	pthread_t thread;
+	int wake_fd_r;
+	int wake_fd_w;
+
+	bool done;
+	char pkt_fname[PATH_MAX];
+	int pcap_fd;
+};
+
+/* Is this packet captured with a Ethernet protocol type? */
+static bool is_ethernet(const u_char *packet)
+{
+	u16 arphdr_type;
+
+	memcpy(&arphdr_type, packet + 8, 2);
+	arphdr_type = ntohs(arphdr_type);
+
+	/* Except the following cases, the protocol type contains the
+	 * Ethernet protocol type for the packet.
+	 *
+	 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
+	 */
+	switch (arphdr_type) {
+	case 770: /* ARPHRD_FRAD */
+	case 778: /* ARPHDR_IPGRE */
+	case 803: /* ARPHRD_IEEE80211_RADIOTAP */
+		return false;
+	}
+	return true;
+}
+
+/* Show the information of the transport layer in the packet */
+static void show_transport(const u_char *packet, u16 len, u32 ifindex,
+			   const char *src_addr, const char *dst_addr,
+			   u16 proto, bool ipv6)
+{
+	struct udphdr *udp;
+	struct tcphdr *tcp;
+	u16 src_port, dst_port;
+	const char *transport_str;
+
+	if (proto == IPPROTO_UDP) {
+		udp = (struct udphdr *)packet;
+		src_port = ntohs(udp->source);
+		dst_port = ntohs(udp->dest);
+		transport_str = "UDP";
+	} else if (proto == IPPROTO_TCP) {
+		tcp = (struct tcphdr *)packet;
+		src_port = ntohs(tcp->source);
+		dst_port = ntohs(tcp->dest);
+		transport_str = "TCP"
+;
+	} else {
+		printf("%s (proto %d): %s -> %s, ifindex %d\n",
+		       ipv6 ? "IPv6" : "IPv4", proto, src_addr, dst_addr, ifindex);
+		return;
+	}
+
+	if (ipv6)
+		printf("IPv6 %s packet: [%s]:%d -> [%s]:%d, len %d, ifindex %d",
+		       transport_str, src_addr, src_port,
+		       dst_addr, dst_port, len, ifindex);
+	else
+		printf("IPv4 %s packet: %s:%d -> %s:%d, len %d, ifindex %d",
+		       transport_str, src_addr, src_port,
+		       dst_addr, dst_port, len, ifindex);
+
+	if (proto == IPPROTO_TCP) {
+		if (tcp->fin)
+			printf(", FIN");
+		if (tcp->syn)
+			printf(", SYN");
+		if (tcp->rst)
+			printf(", RST");
+		if (tcp->ack)
+			printf(", ACK");
+	}
+
+	printf("\n");
+}
+
+static void show_ipv6_packet(const u_char *packet, u32 ifindex)
+{
+	struct ipv6hdr *pkt = (struct ipv6hdr *)packet;
+	struct in6_addr src;
+	struct in6_addr dst;
+	char src_str[INET6_ADDRSTRLEN], dst_str[INET6_ADDRSTRLEN];
+	u_char proto;
+
+	memcpy(&src, &pkt->saddr, sizeof(src));
+	memcpy(&dst, &pkt->daddr, sizeof(dst));
+	inet_ntop(AF_INET6, &src, src_str, sizeof(src_str));
+	inet_ntop(AF_INET6, &dst, dst_str, sizeof(dst_str));
+	proto = pkt->nexthdr;
+	show_transport(packet + sizeof(struct ipv6hdr),
+		       ntohs(pkt->payload_len),
+		       ifindex, src_str, dst_str, proto, true);
+}
+
+static void show_ipv4_packet(const u_char *packet, u32 ifindex)
+{
+	struct iphdr *pkt = (struct iphdr *)packet;
+	struct in_addr src;
+	struct in_addr dst;
+	u_char proto;
+	char src_str[INET_ADDRSTRLEN], dst_str[INET_ADDRSTRLEN];
+
+	memcpy(&src, &pkt->saddr, sizeof(src));
+	memcpy(&dst, &pkt->daddr, sizeof(dst));
+	inet_ntop(AF_INET, &src, src_str, sizeof(src_str));
+	inet_ntop(AF_INET, &dst, dst_str, sizeof(dst_str));
+	proto = pkt->protocol;
+	show_transport(packet + sizeof(struct iphdr),
+		       ntohs(pkt->tot_len),
+		       ifindex, src_str, dst_str, proto, false);
+}
+
+static void *traffic_monitor_thread(void *arg)
+{
+	const u_char *packet, *payload;
+	struct tmonitor_ctx *ctx = arg;
+	struct pcap_pkthdr header;
+	pcap_t *pcap = ctx->pcap;
+	pcap_dumper_t *dumper = ctx->dumper;
+	int fd = ctx->pcap_fd;
+	int wake_fd = ctx->wake_fd_r;
+	u16 proto;
+	u32 ifindex;
+	fd_set fds;
+	int nfds, r;
+
+	nfds = (fd > wake_fd ? fd : wake_fd) + 1;
+	FD_ZERO(&fds);
+
+	while (!ctx->done) {
+		FD_SET(fd, &fds);
+		FD_SET(wake_fd, &fds);
+		r = select(nfds, &fds, NULL, NULL, NULL);
+		if (!r)
+			continue;
+		if (r < 0) {
+			if (errno == EINTR)
+				continue;
+			log_err("Fail to select on pcap fd and wake fd: %s", strerror(errno));
+			break;
+		}
+
+		packet = pcap_next(pcap, &header);
+		if (!packet)
+			continue;
+
+		/* According to the man page of pcap_dump(), first argument
+		 * is the pcap_dumper_t pointer even it's argument type is
+		 * u_char *.
+		 */
+		pcap_dump((u_char *)dumper, &header, packet);
+
+		/* Not sure what other types of packets look like. Here, we
+		 * parse only Ethernet and compatible packets.
+		 */
+		if (!is_ethernet(packet)) {
+			printf("Packet captured\n");
+			continue;
+		}
+
+		/* Skip SLL2 header
+		 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
+		 *
+		 * Although the document doesn't mention that, the payload
+		 * doesn't include the Ethernet header. The payload starts
+		 * from the first byte of the network layer header.
+		 */
+		payload = packet + 20;
+
+		memcpy(&proto, packet, 2);
+		proto = ntohs(proto);
+		memcpy(&ifindex, packet + 4, 4);
+		ifindex = ntohl(ifindex);
+
+		if (proto == ETH_P_IPV6)
+			show_ipv6_packet(payload, ifindex);
+		else if (proto == ETH_P_IP)
+			show_ipv4_packet(payload, ifindex);
+		else
+			printf("Unknown network protocol type %x, ifindex %d\n", proto, ifindex);
+	}
+
+	return NULL;
+}
+
+/* Prepare the pcap handle to capture packets.
+ *
+ * This pcap is non-blocking and immediate mode is enabled to receive
+ * captured packets as soon as possible.  The snaplen is set to 1024 bytes
+ * to limit the size of captured content. The format of the link-layer
+ * header is set to DLT_LINUX_SLL2 to enable handling various link-layer
+ * technologies.
+ */
+static pcap_t *traffic_monitor_prepare_pcap(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	pcap_t *pcap;
+	int r;
+
+	/* Listen on all NICs in the namespace */
+	pcap = pcap_create("any", errbuf);
+	if (!pcap) {
+		log_err("Failed to open pcap: %s", errbuf);
+		return NULL;
+	}
+	/* Limit the size of the packet (first N bytes) */
+	r = pcap_set_snaplen(pcap, 1024);
+	if (r) {
+		log_err("Failed to set snaplen: %s", pcap_geterr(pcap));
+		goto error;
+	}
+	/* To receive packets as fast as possible */
+	r = pcap_set_immediate_mode(pcap, 1);
+	if (r) {
+		log_err("Failed to set immediate mode: %s", pcap_geterr(pcap));
+		goto error;
+	}
+	r = pcap_setnonblock(pcap, 1, errbuf);
+	if (r) {
+		log_err("Failed to set nonblock: %s", errbuf);
+		goto error;
+	}
+	r = pcap_activate(pcap);
+	if (r) {
+		log_err("Failed to activate pcap: %s", pcap_geterr(pcap));
+		goto error;
+	}
+	/* Determine the format of the link-layer header */
+	r = pcap_set_datalink(pcap, DLT_LINUX_SLL2);
+	if (r) {
+		log_err("Failed to set datalink: %s", pcap_geterr(pcap));
+		goto error;
+	}
+
+	return pcap;
+error:
+	pcap_close(pcap);
+	return NULL;
+}
+
+#define PCAP_DIR "/tmp/tmon_pcap"
+
+/* Start to monitor the network traffic in the given network namespace.
+ *
+ * netns: the name of the network namespace to monitor. If NULL, the
+ * current network namespace is monitored.
+ *
+ * This function will start a thread to capture packets going through NICs
+ * in the give network namespace.
+ */
+struct tmonitor_ctx *traffic_monitor_start(const char *netns)
+{
+	struct tmonitor_ctx *ctx = NULL;
+	struct nstoken *nstoken = NULL;
+	int pipefd[2] = {-1, -1};
+	static int tmon_seq;
+	int r;
+
+	if (netns) {
+		nstoken = open_netns(netns);
+		if (!nstoken)
+			return NULL;
+	}
+	ctx = malloc(sizeof(*ctx));
+	if (!ctx) {
+		log_err("Failed to malloc ctx");
+		goto fail_ctx;
+	}
+	memset(ctx, 0, sizeof(*ctx));
+
+	snprintf(ctx->pkt_fname, sizeof(ctx->pkt_fname),
+		 PCAP_DIR "/packets-%d-%d.log", getpid(), tmon_seq++);
+
+	r = mkdir(PCAP_DIR, 0755);
+	if (r && errno != EEXIST) {
+		log_err("Failed to create " PCAP_DIR);
+		goto fail_pcap;
+	}
+
+	ctx->pcap = traffic_monitor_prepare_pcap();
+	if (!ctx->pcap)
+		goto fail_pcap;
+	ctx->pcap_fd = pcap_get_selectable_fd(ctx->pcap);
+	if (ctx->pcap_fd < 0) {
+		log_err("Failed to get pcap fd");
+		goto fail_dumper;
+	}
+
+	/* Create a packet file */
+	ctx->dumper = pcap_dump_open(ctx->pcap, ctx->pkt_fname);
+	if (!ctx->dumper) {
+		log_err("Failed to open pcap dump");
+		goto fail_dumper;
+	}
+
+	/* Create a pipe to wake up the monitor thread */
+	r = pipe(pipefd);
+	if (r) {
+		log_err("Failed to create pipe: %s", strerror(errno));
+		goto fail;
+	}
+	ctx->wake_fd_r = pipefd[0];
+	ctx->wake_fd_w = pipefd[1];
+
+	r = pthread_create(&ctx->thread, NULL, traffic_monitor_thread, ctx);
+	if (r) {
+		log_err("Failed to create thread: %s", strerror(r));
+		goto fail;
+	}
+
+	close_netns(nstoken);
+
+	return ctx;
+
+fail:
+	close(pipefd[0]);
+	close(pipefd[1]);
+
+	pcap_dump_close(ctx->dumper);
+	unlink(ctx->pkt_fname);
+
+fail_dumper:
+	pcap_close(ctx->pcap);
+
+fail_pcap:
+	free(ctx);
+
+fail_ctx:
+	close_netns(nstoken);
+
+	return NULL;
+}
+
+static void traffic_monitor_release(struct tmonitor_ctx *ctx)
+{
+	pcap_close(ctx->pcap);
+	pcap_dump_close(ctx->dumper);
+
+	close(ctx->wake_fd_r);
+	close(ctx->wake_fd_w);
+
+	free(ctx);
+}
+
+/* Stop the network traffic monitor.
+ *
+ * ctx: the context returned by traffic_monitor_start()
+ */
+void traffic_monitor_stop(struct tmonitor_ctx *ctx)
+{
+	if (!ctx)
+		return;
+
+	/* Stop the monitor thread */
+	ctx->done = true;
+	write(ctx->wake_fd_w, "x", 1);
+	pthread_join(ctx->thread, NULL);
+
+	printf("Packet file: %s\n", strrchr(ctx->pkt_fname, '/') + 1);
+
+	traffic_monitor_release(ctx);
+}
+#endif /* TRAFFIC_MONITOR */
+
 struct send_recv_arg {
 	int		fd;
 	uint32_t	bytes;
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index aac5b94d6379..a4067f33a800 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -82,6 +82,22 @@ int get_socket_local_port(int sock_fd);
 int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
 int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
 
+struct tmonitor_ctx;
+
+#ifdef TRAFFIC_MONITOR
+struct tmonitor_ctx *traffic_monitor_start(const char *netns);
+void traffic_monitor_stop(struct tmonitor_ctx *ctx);
+#else
+static inline struct tmonitor_ctx *traffic_monitor_start(const char *netns)
+{
+	return (struct tmonitor_ctx *)-1;
+}
+
+static inline void traffic_monitor_stop(struct tmonitor_ctx *ctx)
+{
+}
+#endif
+
 struct nstoken;
 /**
  * open_netns() - Switch to specified network namespace by name.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  2024-07-23 18:24 [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
  2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
@ 2024-07-23 18:24 ` Kui-Feng Lee
  2024-07-24  8:36   ` Geliang Tang
  2024-07-24 15:26   ` Stanislav Fomichev
  2024-07-23 18:24 ` [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen Kui-Feng Lee
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 18:24 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii, sdf
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Enable traffic monitoring for the test case tc_redirect/tc_redirect_dtime.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/tc_redirect.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index 327d51f59142..1be6ea8d6c64 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -900,6 +900,7 @@ static void test_udp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)
 static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
 {
 	struct test_tc_dtime *skel;
+	struct tmonitor_ctx *tmon = NULL;
 	struct nstoken *nstoken;
 	int hold_tstamp_fd, err;
 
@@ -934,6 +935,9 @@ static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
 	if (!ASSERT_OK(err, "disable forwarding"))
 		goto done;
 
+	tmon = traffic_monitor_start(NS_DST);
+	ASSERT_NEQ(tmon, NULL, "traffic_monitor_start");
+
 	test_tcp_clear_dtime(skel);
 
 	test_tcp_dtime(skel, AF_INET, true);
@@ -958,6 +962,7 @@ static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
 	test_udp_dtime(skel, AF_INET6, false);
 
 done:
+	traffic_monitor_stop(tmon);
 	test_tc_dtime__destroy(skel);
 	close(hold_tstamp_fd);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen.
  2024-07-23 18:24 [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
  2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
  2024-07-23 18:24 ` [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime Kui-Feng Lee
@ 2024-07-23 18:24 ` Kui-Feng Lee
  2024-07-24  9:32   ` Geliang Tang
  2024-07-23 18:24 ` [PATCH bpf-next v2 4/4] selftests/bpf: Monitor traffic for select_reuseport Kui-Feng Lee
  2024-07-23 22:01 ` [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
  4 siblings, 1 reply; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 18:24 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii, sdf
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Enable traffic monitor for each subtest of sockmap_listen.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index e91b59366030..62683ccb6d56 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -28,6 +28,7 @@
 #include "test_sockmap_listen.skel.h"
 
 #include "sockmap_helpers.h"
+#include "network_helpers.h"
 
 static void test_insert_invalid(struct test_sockmap_listen *skel __always_unused,
 				int family, int sotype, int mapfd)
@@ -1893,14 +1894,21 @@ static void test_udp_unix_redir(struct test_sockmap_listen *skel, struct bpf_map
 {
 	const char *family_name, *map_name;
 	char s[MAX_TEST_NAME];
+	struct tmonitor_ctx *tmon;
 
 	family_name = family_str(family);
 	map_name = map_type_str(map);
 	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
 	if (!test__start_subtest(s))
 		return;
+
+	tmon = traffic_monitor_start(NULL);
+	ASSERT_TRUE(tmon, "traffic_monitor_start");
+
 	inet_unix_skb_redir_to_connected(skel, map, family);
 	unix_inet_skb_redir_to_connected(skel, map, family);
+
+	traffic_monitor_stop(tmon);
 }
 
 static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v2 4/4] selftests/bpf: Monitor traffic for select_reuseport.
  2024-07-23 18:24 [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
                   ` (2 preceding siblings ...)
  2024-07-23 18:24 ` [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen Kui-Feng Lee
@ 2024-07-23 18:24 ` Kui-Feng Lee
  2024-07-24  9:33   ` Geliang Tang
  2024-07-23 22:01 ` [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
  4 siblings, 1 reply; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 18:24 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii, sdf
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Enable traffic monitoring for the subtests of select_reuseport.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/select_reuseport.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index 64c5f5eb2994..d3039957ee94 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -22,6 +22,7 @@
 
 #include "test_progs.h"
 #include "test_select_reuseport_common.h"
+#include "network_helpers.h"
 
 #define MAX_TEST_NAME 80
 #define MIN_TCPHDR_LEN 20
@@ -795,6 +796,7 @@ static void test_config(int sotype, sa_family_t family, bool inany)
 	};
 	char s[MAX_TEST_NAME];
 	const struct test *t;
+	struct tmonitor_ctx *tmon;
 
 	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
 		if (t->need_sotype && t->need_sotype != sotype)
@@ -808,9 +810,14 @@ static void test_config(int sotype, sa_family_t family, bool inany)
 		if (!test__start_subtest(s))
 			continue;
 
+		tmon = traffic_monitor_start(NULL);
+		ASSERT_TRUE(tmon, "traffic_monitor_start");
+
 		setup_per_test(sotype, family, inany, t->no_inner_map);
 		t->fn(sotype, family);
 		cleanup_per_test(t->no_inner_map);
+
+		traffic_monitor_stop(tmon);
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases
  2024-07-23 18:24 [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
                   ` (3 preceding siblings ...)
  2024-07-23 18:24 ` [PATCH bpf-next v2 4/4] selftests/bpf: Monitor traffic for select_reuseport Kui-Feng Lee
@ 2024-07-23 22:01 ` Kui-Feng Lee
  4 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 22:01 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, martin.lau, song, kernel-team, andrii,
	sdf; +Cc: kuifeng



On 7/23/24 11:24, Kui-Feng Lee wrote:
> Capture packets in the background for flaky test cases related to
> network features.
> 
> We have some flaky test cases that are difficult to debug without
> knowing what the traffic looks like. Capturing packets, the CI log and
> packet files may help developers to fix these flaky test cases.
> 
> This patch set monitors a few test cases. Recently, they have been
> showing flaky behavior.
> 
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
>      Packet file: packets-2172-86.log
>      #280/87  select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK
> 
> The above block is the log of a test case. It shows every packets of a
> connection. The captured packets are stored in the file called
> packets-2172-86.log.
> 
> The following block is an example that monitors the network traffic of
> a test case. This test is running in the network namespace
> "testns". You can pass NULL to traffic_monitor_start() if the entire
> test, from traffic_monitor_start() to traffic_monitor_stop(), is
> running in the same namespace.
> 
>      struct tmonitor_ctx *tmon;
>      
>      ...
>      tmon = traffic_monitor_start("testns");
>      ASSERT_TRUE(tmon, "traffic_monitor_start");
>      
>      ... test ...
>      
>      traffic_monitor_stop(tmon);
> 
> traffic_monitor_start() may fail, but we just ignore it since the
> failure doesn't affect the following main test.
> 
> This feature is enabled only if BPF selftests are built with
> TRAFFIC_MONITOR variable being defined. For example,
> 
>      make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf
> 
> This command will enable traffic monitoring for BPF selftests. That
> means we have to turn it on to get the log at CI.
> 
> ---
> 
> Changes from v1:
> 
>   - Initialize log_fd in traffic_monitor_start().
> 
>   - Remove redundant including.

Sorry for not updating changes correctly.
No more tcpdump, it moves to use call pcap directly
in a background thread.  Packets are wrote to a packet file.
In the log, it prints parsed information of TCP or UDP packets.
For other packets,  just print a string "Packet captured" to indicate
a packet has been captured.

> 
> v1: https://lore.kernel.org/all/20240713055552.2482367-5-thinker.li@gmail.com/
> 
> Kui-Feng Lee (4):
>    selftests/bpf: Add traffic monitor functions.
>    selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
>    selftests/bpf: Monitor traffic for sockmap_listen.
>    selftests/bpf: Monitor traffic for select_reuseport.
> 
>   tools/testing/selftests/bpf/Makefile          |   5 +
>   tools/testing/selftests/bpf/network_helpers.c | 382 ++++++++++++++++++
>   tools/testing/selftests/bpf/network_helpers.h |  16 +
>   .../bpf/prog_tests/select_reuseport.c         |   7 +
>   .../selftests/bpf/prog_tests/sockmap_listen.c |   8 +
>   .../selftests/bpf/prog_tests/tc_redirect.c    |   5 +
>   6 files changed, 423 insertions(+)
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
@ 2024-07-23 22:03   ` Kui-Feng Lee
  2024-07-24 15:22   ` Stanislav Fomichev
  2024-07-24 19:08   ` Martin KaFai Lau
  2 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-23 22:03 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, martin.lau, song, kernel-team, andrii,
	sdf; +Cc: kuifeng



On 7/23/24 11:24, Kui-Feng Lee wrote:
> Add functions that capture packets and print log in the background. They
> are supposed to be used for debugging flaky network test cases. A monitored
> test case should call traffic_monitor_start() to start a thread to capture
> packets in the background for a given namespace and call
> traffic_monitor_stop() to stop capturing.
> 
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
>      Packet file: packets-2172-86.log
>      #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK
> 
> The above is the output of an example. It shows the packets of a connection
> and the name of the file that contains captured packets in the directory
> /tmp/tmon_pcap. The file can be loaded by tcpdump or wireshark.
> 
> This feature only works if TRAFFIC_MONITOR variable has been passed to
> build BPF selftests. For example,
> 
>    make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf
> 
> This command will build BPF selftests with this feature enabled.
> 
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
>   tools/testing/selftests/bpf/Makefile          |   5 +
>   tools/testing/selftests/bpf/network_helpers.c | 382 ++++++++++++++++++
>   tools/testing/selftests/bpf/network_helpers.h |  16 +
>   3 files changed, 403 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index dd49c1d23a60..9dfe17588689 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -41,6 +41,11 @@ CFLAGS += -g $(OPT_FLAGS) -rdynamic					\
>   LDFLAGS += $(SAN_LDFLAGS)
>   LDLIBS += $(LIBELF_LIBS) -lz -lrt -lpthread
>   
> +ifneq ($(TRAFFIC_MONITOR),)
> +LDLIBS += -lpcap
> +CFLAGS += -DTRAFFIC_MONITOR=1
> +endif
> +
>   # The following tests perform type punning and they may break strict
>   # aliasing rules, which are exploited by both GCC and clang by default
>   # while optimizing.  This can lead to broken programs.
> diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
> index e0cba4178e41..c881f53c8218 100644
> --- a/tools/testing/selftests/bpf/network_helpers.c
> +++ b/tools/testing/selftests/bpf/network_helpers.c
> @@ -10,6 +10,7 @@
>   
>   #include <arpa/inet.h>
>   #include <sys/mount.h>
> +#include <sys/select.h>
>   #include <sys/stat.h>
>   #include <sys/un.h>
>   
> @@ -18,6 +19,14 @@
>   #include <linux/in6.h>
>   #include <linux/limits.h>
>   
> +#include <netinet/udp.h>
> +
> +#include <pthread.h>
> +/* Prevent pcap.h from including pcap/bpf.h and causing conflicts */
> +#define PCAP_DONT_INCLUDE_PCAP_BPF_H 1
> +#include <pcap/pcap.h>
> +#include <pcap/dlt.h>

Just noticed that the above lines fails on an environment without pcap
installed. It will be fixed in next version.

> +
>   #include "bpf_util.h"
>   #include "network_helpers.h"
>   #include "test_progs.h"
> @@ -575,6 +584,379 @@ int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
>   	return 0;
>   }
>   
> +#ifdef TRAFFIC_MONITOR
> +struct tmonitor_ctx {
> +	pcap_t *pcap;
> +	pcap_dumper_t *dumper;
> +	pthread_t thread;
> +	int wake_fd_r;
> +	int wake_fd_w;
> +
> +	bool done;
> +	char pkt_fname[PATH_MAX];
> +	int pcap_fd;
> +};
> +
> +/* Is this packet captured with a Ethernet protocol type? */
> +static bool is_ethernet(const u_char *packet)
> +{
> +	u16 arphdr_type;
> +
> +	memcpy(&arphdr_type, packet + 8, 2);
> +	arphdr_type = ntohs(arphdr_type);
> +
> +	/* Except the following cases, the protocol type contains the
> +	 * Ethernet protocol type for the packet.
> +	 *
> +	 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
> +	 */
> +	switch (arphdr_type) {
> +	case 770: /* ARPHRD_FRAD */
> +	case 778: /* ARPHDR_IPGRE */
> +	case 803: /* ARPHRD_IEEE80211_RADIOTAP */
> +		return false;
> +	}
> +	return true;
> +}
> +
> +/* Show the information of the transport layer in the packet */
> +static void show_transport(const u_char *packet, u16 len, u32 ifindex,
> +			   const char *src_addr, const char *dst_addr,
> +			   u16 proto, bool ipv6)
> +{
> +	struct udphdr *udp;
> +	struct tcphdr *tcp;
> +	u16 src_port, dst_port;
> +	const char *transport_str;
> +
> +	if (proto == IPPROTO_UDP) {
> +		udp = (struct udphdr *)packet;
> +		src_port = ntohs(udp->source);
> +		dst_port = ntohs(udp->dest);
> +		transport_str = "UDP";
> +	} else if (proto == IPPROTO_TCP) {
> +		tcp = (struct tcphdr *)packet;
> +		src_port = ntohs(tcp->source);
> +		dst_port = ntohs(tcp->dest);
> +		transport_str = "TCP"
> +;
> +	} else {
> +		printf("%s (proto %d): %s -> %s, ifindex %d\n",
> +		       ipv6 ? "IPv6" : "IPv4", proto, src_addr, dst_addr, ifindex);
> +		return;
> +	}
> +
> +	if (ipv6)
> +		printf("IPv6 %s packet: [%s]:%d -> [%s]:%d, len %d, ifindex %d",
> +		       transport_str, src_addr, src_port,
> +		       dst_addr, dst_port, len, ifindex);
> +	else
> +		printf("IPv4 %s packet: %s:%d -> %s:%d, len %d, ifindex %d",
> +		       transport_str, src_addr, src_port,
> +		       dst_addr, dst_port, len, ifindex);
> +
> +	if (proto == IPPROTO_TCP) {
> +		if (tcp->fin)
> +			printf(", FIN");
> +		if (tcp->syn)
> +			printf(", SYN");
> +		if (tcp->rst)
> +			printf(", RST");
> +		if (tcp->ack)
> +			printf(", ACK");
> +	}
> +
> +	printf("\n");
> +}
> +
> +static void show_ipv6_packet(const u_char *packet, u32 ifindex)
> +{
> +	struct ipv6hdr *pkt = (struct ipv6hdr *)packet;
> +	struct in6_addr src;
> +	struct in6_addr dst;
> +	char src_str[INET6_ADDRSTRLEN], dst_str[INET6_ADDRSTRLEN];
> +	u_char proto;
> +
> +	memcpy(&src, &pkt->saddr, sizeof(src));
> +	memcpy(&dst, &pkt->daddr, sizeof(dst));
> +	inet_ntop(AF_INET6, &src, src_str, sizeof(src_str));
> +	inet_ntop(AF_INET6, &dst, dst_str, sizeof(dst_str));
> +	proto = pkt->nexthdr;
> +	show_transport(packet + sizeof(struct ipv6hdr),
> +		       ntohs(pkt->payload_len),
> +		       ifindex, src_str, dst_str, proto, true);
> +}
> +
> +static void show_ipv4_packet(const u_char *packet, u32 ifindex)
> +{
> +	struct iphdr *pkt = (struct iphdr *)packet;
> +	struct in_addr src;
> +	struct in_addr dst;
> +	u_char proto;
> +	char src_str[INET_ADDRSTRLEN], dst_str[INET_ADDRSTRLEN];
> +
> +	memcpy(&src, &pkt->saddr, sizeof(src));
> +	memcpy(&dst, &pkt->daddr, sizeof(dst));
> +	inet_ntop(AF_INET, &src, src_str, sizeof(src_str));
> +	inet_ntop(AF_INET, &dst, dst_str, sizeof(dst_str));
> +	proto = pkt->protocol;
> +	show_transport(packet + sizeof(struct iphdr),
> +		       ntohs(pkt->tot_len),
> +		       ifindex, src_str, dst_str, proto, false);
> +}
> +
> +static void *traffic_monitor_thread(void *arg)
> +{
> +	const u_char *packet, *payload;
> +	struct tmonitor_ctx *ctx = arg;
> +	struct pcap_pkthdr header;
> +	pcap_t *pcap = ctx->pcap;
> +	pcap_dumper_t *dumper = ctx->dumper;
> +	int fd = ctx->pcap_fd;
> +	int wake_fd = ctx->wake_fd_r;
> +	u16 proto;
> +	u32 ifindex;
> +	fd_set fds;
> +	int nfds, r;
> +
> +	nfds = (fd > wake_fd ? fd : wake_fd) + 1;
> +	FD_ZERO(&fds);
> +
> +	while (!ctx->done) {
> +		FD_SET(fd, &fds);
> +		FD_SET(wake_fd, &fds);
> +		r = select(nfds, &fds, NULL, NULL, NULL);
> +		if (!r)
> +			continue;
> +		if (r < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			log_err("Fail to select on pcap fd and wake fd: %s", strerror(errno));
> +			break;
> +		}
> +
> +		packet = pcap_next(pcap, &header);
> +		if (!packet)
> +			continue;
> +
> +		/* According to the man page of pcap_dump(), first argument
> +		 * is the pcap_dumper_t pointer even it's argument type is
> +		 * u_char *.
> +		 */
> +		pcap_dump((u_char *)dumper, &header, packet);
> +
> +		/* Not sure what other types of packets look like. Here, we
> +		 * parse only Ethernet and compatible packets.
> +		 */
> +		if (!is_ethernet(packet)) {
> +			printf("Packet captured\n");
> +			continue;
> +		}
> +
> +		/* Skip SLL2 header
> +		 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
> +		 *
> +		 * Although the document doesn't mention that, the payload
> +		 * doesn't include the Ethernet header. The payload starts
> +		 * from the first byte of the network layer header.
> +		 */
> +		payload = packet + 20;
> +
> +		memcpy(&proto, packet, 2);
> +		proto = ntohs(proto);
> +		memcpy(&ifindex, packet + 4, 4);
> +		ifindex = ntohl(ifindex);
> +
> +		if (proto == ETH_P_IPV6)
> +			show_ipv6_packet(payload, ifindex);
> +		else if (proto == ETH_P_IP)
> +			show_ipv4_packet(payload, ifindex);
> +		else
> +			printf("Unknown network protocol type %x, ifindex %d\n", proto, ifindex);
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Prepare the pcap handle to capture packets.
> + *
> + * This pcap is non-blocking and immediate mode is enabled to receive
> + * captured packets as soon as possible.  The snaplen is set to 1024 bytes
> + * to limit the size of captured content. The format of the link-layer
> + * header is set to DLT_LINUX_SLL2 to enable handling various link-layer
> + * technologies.
> + */
> +static pcap_t *traffic_monitor_prepare_pcap(void)
> +{
> +	char errbuf[PCAP_ERRBUF_SIZE];
> +	pcap_t *pcap;
> +	int r;
> +
> +	/* Listen on all NICs in the namespace */
> +	pcap = pcap_create("any", errbuf);
> +	if (!pcap) {
> +		log_err("Failed to open pcap: %s", errbuf);
> +		return NULL;
> +	}
> +	/* Limit the size of the packet (first N bytes) */
> +	r = pcap_set_snaplen(pcap, 1024);
> +	if (r) {
> +		log_err("Failed to set snaplen: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +	/* To receive packets as fast as possible */
> +	r = pcap_set_immediate_mode(pcap, 1);
> +	if (r) {
> +		log_err("Failed to set immediate mode: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +	r = pcap_setnonblock(pcap, 1, errbuf);
> +	if (r) {
> +		log_err("Failed to set nonblock: %s", errbuf);
> +		goto error;
> +	}
> +	r = pcap_activate(pcap);
> +	if (r) {
> +		log_err("Failed to activate pcap: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +	/* Determine the format of the link-layer header */
> +	r = pcap_set_datalink(pcap, DLT_LINUX_SLL2);
> +	if (r) {
> +		log_err("Failed to set datalink: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +
> +	return pcap;
> +error:
> +	pcap_close(pcap);
> +	return NULL;
> +}
> +
> +#define PCAP_DIR "/tmp/tmon_pcap"
> +
> +/* Start to monitor the network traffic in the given network namespace.
> + *
> + * netns: the name of the network namespace to monitor. If NULL, the
> + * current network namespace is monitored.
> + *
> + * This function will start a thread to capture packets going through NICs
> + * in the give network namespace.
> + */
> +struct tmonitor_ctx *traffic_monitor_start(const char *netns)
> +{
> +	struct tmonitor_ctx *ctx = NULL;
> +	struct nstoken *nstoken = NULL;
> +	int pipefd[2] = {-1, -1};
> +	static int tmon_seq;
> +	int r;
> +
> +	if (netns) {
> +		nstoken = open_netns(netns);
> +		if (!nstoken)
> +			return NULL;
> +	}
> +	ctx = malloc(sizeof(*ctx));
> +	if (!ctx) {
> +		log_err("Failed to malloc ctx");
> +		goto fail_ctx;
> +	}
> +	memset(ctx, 0, sizeof(*ctx));
> +
> +	snprintf(ctx->pkt_fname, sizeof(ctx->pkt_fname),
> +		 PCAP_DIR "/packets-%d-%d.log", getpid(), tmon_seq++);
> +
> +	r = mkdir(PCAP_DIR, 0755);
> +	if (r && errno != EEXIST) {
> +		log_err("Failed to create " PCAP_DIR);
> +		goto fail_pcap;
> +	}
> +
> +	ctx->pcap = traffic_monitor_prepare_pcap();
> +	if (!ctx->pcap)
> +		goto fail_pcap;
> +	ctx->pcap_fd = pcap_get_selectable_fd(ctx->pcap);
> +	if (ctx->pcap_fd < 0) {
> +		log_err("Failed to get pcap fd");
> +		goto fail_dumper;
> +	}
> +
> +	/* Create a packet file */
> +	ctx->dumper = pcap_dump_open(ctx->pcap, ctx->pkt_fname);
> +	if (!ctx->dumper) {
> +		log_err("Failed to open pcap dump");
> +		goto fail_dumper;
> +	}
> +
> +	/* Create a pipe to wake up the monitor thread */
> +	r = pipe(pipefd);
> +	if (r) {
> +		log_err("Failed to create pipe: %s", strerror(errno));
> +		goto fail;
> +	}
> +	ctx->wake_fd_r = pipefd[0];
> +	ctx->wake_fd_w = pipefd[1];
> +
> +	r = pthread_create(&ctx->thread, NULL, traffic_monitor_thread, ctx);
> +	if (r) {
> +		log_err("Failed to create thread: %s", strerror(r));
> +		goto fail;
> +	}
> +
> +	close_netns(nstoken);
> +
> +	return ctx;
> +
> +fail:
> +	close(pipefd[0]);
> +	close(pipefd[1]);
> +
> +	pcap_dump_close(ctx->dumper);
> +	unlink(ctx->pkt_fname);
> +
> +fail_dumper:
> +	pcap_close(ctx->pcap);
> +
> +fail_pcap:
> +	free(ctx);
> +
> +fail_ctx:
> +	close_netns(nstoken);
> +
> +	return NULL;
> +}
> +
> +static void traffic_monitor_release(struct tmonitor_ctx *ctx)
> +{
> +	pcap_close(ctx->pcap);
> +	pcap_dump_close(ctx->dumper);
> +
> +	close(ctx->wake_fd_r);
> +	close(ctx->wake_fd_w);
> +
> +	free(ctx);
> +}
> +
> +/* Stop the network traffic monitor.
> + *
> + * ctx: the context returned by traffic_monitor_start()
> + */
> +void traffic_monitor_stop(struct tmonitor_ctx *ctx)
> +{
> +	if (!ctx)
> +		return;
> +
> +	/* Stop the monitor thread */
> +	ctx->done = true;
> +	write(ctx->wake_fd_w, "x", 1);
> +	pthread_join(ctx->thread, NULL);
> +
> +	printf("Packet file: %s\n", strrchr(ctx->pkt_fname, '/') + 1);
> +
> +	traffic_monitor_release(ctx);
> +}
> +#endif /* TRAFFIC_MONITOR */
> +
>   struct send_recv_arg {
>   	int		fd;
>   	uint32_t	bytes;
> diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
> index aac5b94d6379..a4067f33a800 100644
> --- a/tools/testing/selftests/bpf/network_helpers.h
> +++ b/tools/testing/selftests/bpf/network_helpers.h
> @@ -82,6 +82,22 @@ int get_socket_local_port(int sock_fd);
>   int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
>   int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
>   
> +struct tmonitor_ctx;
> +
> +#ifdef TRAFFIC_MONITOR
> +struct tmonitor_ctx *traffic_monitor_start(const char *netns);
> +void traffic_monitor_stop(struct tmonitor_ctx *ctx);
> +#else
> +static inline struct tmonitor_ctx *traffic_monitor_start(const char *netns)
> +{
> +	return (struct tmonitor_ctx *)-1;
> +}
> +
> +static inline void traffic_monitor_stop(struct tmonitor_ctx *ctx)
> +{
> +}
> +#endif
> +
>   struct nstoken;
>   /**
>    * open_netns() - Switch to specified network namespace by name.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  2024-07-23 18:24 ` [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime Kui-Feng Lee
@ 2024-07-24  8:36   ` Geliang Tang
  2024-07-24 16:24     ` Kui-Feng Lee
  2024-07-24 15:26   ` Stanislav Fomichev
  1 sibling, 1 reply; 22+ messages in thread
From: Geliang Tang @ 2024-07-24  8:36 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, martin.lau, song, kernel-team, andrii,
	sdf
  Cc: sinquersw, kuifeng

On Tue, 2024-07-23 at 11:24 -0700, Kui-Feng Lee wrote:
> Enable traffic monitoring for the test case
> tc_redirect/tc_redirect_dtime.
> 
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/tc_redirect.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
> b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
> index 327d51f59142..1be6ea8d6c64 100644
> --- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
> +++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
> @@ -900,6 +900,7 @@ static void test_udp_dtime(struct test_tc_dtime
> *skel, int family, bool bpf_fwd)
>  static void test_tc_redirect_dtime(struct netns_setup_result
> *setup_result)
>  {
>  	struct test_tc_dtime *skel;
> +	struct tmonitor_ctx *tmon = NULL;

nit: No need to set "tmon" to NULL here.

>  	struct nstoken *nstoken;
>  	int hold_tstamp_fd, err;
>  
> @@ -934,6 +935,9 @@ static void test_tc_redirect_dtime(struct
> netns_setup_result *setup_result)
>  	if (!ASSERT_OK(err, "disable forwarding"))
>  		goto done;
>  
> +	tmon = traffic_monitor_start(NS_DST);
> +	ASSERT_NEQ(tmon, NULL, "traffic_monitor_start");
> +
>  	test_tcp_clear_dtime(skel);
>  
>  	test_tcp_dtime(skel, AF_INET, true);
> @@ -958,6 +962,7 @@ static void test_tc_redirect_dtime(struct
> netns_setup_result *setup_result)
>  	test_udp_dtime(skel, AF_INET6, false);
>  
>  done:
> +	traffic_monitor_stop(tmon);
>  	test_tc_dtime__destroy(skel);
>  	close(hold_tstamp_fd);
>  }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen.
  2024-07-23 18:24 ` [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen Kui-Feng Lee
@ 2024-07-24  9:32   ` Geliang Tang
  2024-07-24 16:24     ` Kui-Feng Lee
  0 siblings, 1 reply; 22+ messages in thread
From: Geliang Tang @ 2024-07-24  9:32 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, martin.lau, song, kernel-team, andrii,
	sdf
  Cc: sinquersw, kuifeng

On Tue, 2024-07-23 at 11:24 -0700, Kui-Feng Lee wrote:
> Enable traffic monitor for each subtest of sockmap_listen.
> 
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> index e91b59366030..62683ccb6d56 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> @@ -28,6 +28,7 @@
>  #include "test_sockmap_listen.skel.h"
>  
>  #include "sockmap_helpers.h"
> +#include "network_helpers.h"
>  
>  static void test_insert_invalid(struct test_sockmap_listen *skel
> __always_unused,
>  				int family, int sotype, int mapfd)
> @@ -1893,14 +1894,21 @@ static void test_udp_unix_redir(struct
> test_sockmap_listen *skel, struct bpf_map
>  {
>  	const char *family_name, *map_name;
>  	char s[MAX_TEST_NAME];
> +	struct tmonitor_ctx *tmon;
>  
>  	family_name = family_str(family);
>  	map_name = map_type_str(map);
>  	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name,
> __func__);
>  	if (!test__start_subtest(s))
>  		return;
> +
> +	tmon = traffic_monitor_start(NULL);
> +	ASSERT_TRUE(tmon, "traffic_monitor_start");

Using ASSERT_TRUE() on a pointer is a bit strange, it's better to use
ASSERT_NEQ(NULL) like patch 2.

> +
>  	inet_unix_skb_redir_to_connected(skel, map, family);
>  	unix_inet_skb_redir_to_connected(skel, map, family);
> +
> +	traffic_monitor_stop(tmon);
>  }
>  
>  static void run_tests(struct test_sockmap_listen *skel, struct
> bpf_map *map,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 4/4] selftests/bpf: Monitor traffic for select_reuseport.
  2024-07-23 18:24 ` [PATCH bpf-next v2 4/4] selftests/bpf: Monitor traffic for select_reuseport Kui-Feng Lee
@ 2024-07-24  9:33   ` Geliang Tang
  0 siblings, 0 replies; 22+ messages in thread
From: Geliang Tang @ 2024-07-24  9:33 UTC (permalink / raw)
  To: Kui-Feng Lee, bpf, ast, martin.lau, song, kernel-team, andrii,
	sdf
  Cc: sinquersw, kuifeng

On Tue, 2024-07-23 at 11:24 -0700, Kui-Feng Lee wrote:
> Enable traffic monitoring for the subtests of select_reuseport.
> 
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/select_reuseport.c | 7
> +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git
> a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
> b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
> index 64c5f5eb2994..d3039957ee94 100644
> --- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
> +++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
> @@ -22,6 +22,7 @@
>  
>  #include "test_progs.h"
>  #include "test_select_reuseport_common.h"
> +#include "network_helpers.h"
>  
>  #define MAX_TEST_NAME 80
>  #define MIN_TCPHDR_LEN 20
> @@ -795,6 +796,7 @@ static void test_config(int sotype, sa_family_t
> family, bool inany)
>  	};
>  	char s[MAX_TEST_NAME];
>  	const struct test *t;
> +	struct tmonitor_ctx *tmon;
>  
>  	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
>  		if (t->need_sotype && t->need_sotype != sotype)
> @@ -808,9 +810,14 @@ static void test_config(int sotype, sa_family_t
> family, bool inany)
>  		if (!test__start_subtest(s))
>  			continue;
>  
> +		tmon = traffic_monitor_start(NULL);
> +		ASSERT_TRUE(tmon, "traffic_monitor_start");

The same here. It's better to use ASSERT_NEQ(NULL) like patch 2.

> +
>  		setup_per_test(sotype, family, inany, t-
> >no_inner_map);
>  		t->fn(sotype, family);
>  		cleanup_per_test(t->no_inner_map);
> +
> +		traffic_monitor_stop(tmon);
>  	}
>  }
>  


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
  2024-07-23 22:03   ` Kui-Feng Lee
@ 2024-07-24 15:22   ` Stanislav Fomichev
  2024-07-24 15:46     ` Kui-Feng Lee
  2024-07-24 19:08   ` Martin KaFai Lau
  2 siblings, 1 reply; 22+ messages in thread
From: Stanislav Fomichev @ 2024-07-24 15:22 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: bpf, ast, martin.lau, song, kernel-team, andrii, sinquersw,
	kuifeng

On 07/23, Kui-Feng Lee wrote:
> Add functions that capture packets and print log in the background. They
> are supposed to be used for debugging flaky network test cases. A monitored
> test case should call traffic_monitor_start() to start a thread to capture
> packets in the background for a given namespace and call
> traffic_monitor_stop() to stop capturing.
> 
>     IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
>     IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
>     IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
>     IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
>     IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
>     IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
>     Packet file: packets-2172-86.log
>     #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK
> 
> The above is the output of an example. It shows the packets of a connection
> and the name of the file that contains captured packets in the directory
> /tmp/tmon_pcap. The file can be loaded by tcpdump or wireshark.
> 
> This feature only works if TRAFFIC_MONITOR variable has been passed to
> build BPF selftests. For example,
> 
>   make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf
> 
> This command will build BPF selftests with this feature enabled.
> 
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
>  tools/testing/selftests/bpf/Makefile          |   5 +
>  tools/testing/selftests/bpf/network_helpers.c | 382 ++++++++++++++++++
>  tools/testing/selftests/bpf/network_helpers.h |  16 +
>  3 files changed, 403 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index dd49c1d23a60..9dfe17588689 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -41,6 +41,11 @@ CFLAGS += -g $(OPT_FLAGS) -rdynamic					\
>  LDFLAGS += $(SAN_LDFLAGS)
>  LDLIBS += $(LIBELF_LIBS) -lz -lrt -lpthread
>  
> +ifneq ($(TRAFFIC_MONITOR),)
> +LDLIBS += -lpcap
> +CFLAGS += -DTRAFFIC_MONITOR=1
> +endif
> +
>  # The following tests perform type punning and they may break strict
>  # aliasing rules, which are exploited by both GCC and clang by default
>  # while optimizing.  This can lead to broken programs.
> diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
> index e0cba4178e41..c881f53c8218 100644
> --- a/tools/testing/selftests/bpf/network_helpers.c
> +++ b/tools/testing/selftests/bpf/network_helpers.c
> @@ -10,6 +10,7 @@
>  
>  #include <arpa/inet.h>
>  #include <sys/mount.h>
> +#include <sys/select.h>
>  #include <sys/stat.h>
>  #include <sys/un.h>
>  
> @@ -18,6 +19,14 @@
>  #include <linux/in6.h>
>  #include <linux/limits.h>
>  
> +#include <netinet/udp.h>
> +
> +#include <pthread.h>
> +/* Prevent pcap.h from including pcap/bpf.h and causing conflicts */
> +#define PCAP_DONT_INCLUDE_PCAP_BPF_H 1
> +#include <pcap/pcap.h>
> +#include <pcap/dlt.h>
> +
>  #include "bpf_util.h"
>  #include "network_helpers.h"
>  #include "test_progs.h"
> @@ -575,6 +584,379 @@ int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
>  	return 0;
>  }
>  
> +#ifdef TRAFFIC_MONITOR
> +struct tmonitor_ctx {
> +	pcap_t *pcap;
> +	pcap_dumper_t *dumper;
> +	pthread_t thread;
> +	int wake_fd_r;
> +	int wake_fd_w;
> +
> +	bool done;
> +	char pkt_fname[PATH_MAX];
> +	int pcap_fd;
> +};


[..]

> +/* Is this packet captured with a Ethernet protocol type? */
> +static bool is_ethernet(const u_char *packet)
> +{
> +	u16 arphdr_type;
> +
> +	memcpy(&arphdr_type, packet + 8, 2);
> +	arphdr_type = ntohs(arphdr_type);
> +
> +	/* Except the following cases, the protocol type contains the
> +	 * Ethernet protocol type for the packet.
> +	 *
> +	 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
> +	 */
> +	switch (arphdr_type) {
> +	case 770: /* ARPHRD_FRAD */
> +	case 778: /* ARPHDR_IPGRE */
> +	case 803: /* ARPHRD_IEEE80211_RADIOTAP */
> +		return false;
> +	}
> +	return true;
> +}

Are we actually getting non-ethernet packets? Any idea why?

> +/* Show the information of the transport layer in the packet */
> +static void show_transport(const u_char *packet, u16 len, u32 ifindex,
> +			   const char *src_addr, const char *dst_addr,
> +			   u16 proto, bool ipv6)
> +{
> +	struct udphdr *udp;
> +	struct tcphdr *tcp;
> +	u16 src_port, dst_port;
> +	const char *transport_str;
> +
> +	if (proto == IPPROTO_UDP) {
> +		udp = (struct udphdr *)packet;
> +		src_port = ntohs(udp->source);
> +		dst_port = ntohs(udp->dest);
> +		transport_str = "UDP";
> +	} else if (proto == IPPROTO_TCP) {
> +		tcp = (struct tcphdr *)packet;
> +		src_port = ntohs(tcp->source);
> +		dst_port = ntohs(tcp->dest);
> +		transport_str = "TCP"
> +;
> +	} else {
> +		printf("%s (proto %d): %s -> %s, ifindex %d\n",
> +		       ipv6 ? "IPv6" : "IPv4", proto, src_addr, dst_addr, ifindex);
> +		return;
> +	}
> +
> +	if (ipv6)
> +		printf("IPv6 %s packet: [%s]:%d -> [%s]:%d, len %d, ifindex %d",
> +		       transport_str, src_addr, src_port,
> +		       dst_addr, dst_port, len, ifindex);
> +	else
> +		printf("IPv4 %s packet: %s:%d -> %s:%d, len %d, ifindex %d",
> +		       transport_str, src_addr, src_port,
> +		       dst_addr, dst_port, len, ifindex);
> +
> +	if (proto == IPPROTO_TCP) {
> +		if (tcp->fin)
> +			printf(", FIN");
> +		if (tcp->syn)
> +			printf(", SYN");
> +		if (tcp->rst)
> +			printf(", RST");
> +		if (tcp->ack)
> +			printf(", ACK");
> +	}
> +
> +	printf("\n");
> +}
> +
> +static void show_ipv6_packet(const u_char *packet, u32 ifindex)
> +{
> +	struct ipv6hdr *pkt = (struct ipv6hdr *)packet;
> +	struct in6_addr src;
> +	struct in6_addr dst;
> +	char src_str[INET6_ADDRSTRLEN], dst_str[INET6_ADDRSTRLEN];
> +	u_char proto;

[..]

> +	memcpy(&src, &pkt->saddr, sizeof(src));
> +	memcpy(&dst, &pkt->daddr, sizeof(dst));
> +	inet_ntop(AF_INET6, &src, src_str, sizeof(src_str));
> +	inet_ntop(AF_INET6, &dst, dst_str, sizeof(dst_str));

nit: can probably inet_ntop(AF_INET6, &pkt->saddr, ...) directly? No
need to copy. Same for ipv4.

> +	proto = pkt->nexthdr;
> +	show_transport(packet + sizeof(struct ipv6hdr),
> +		       ntohs(pkt->payload_len),
> +		       ifindex, src_str, dst_str, proto, true);
> +}
> +
> +static void show_ipv4_packet(const u_char *packet, u32 ifindex)
> +{
> +	struct iphdr *pkt = (struct iphdr *)packet;
> +	struct in_addr src;
> +	struct in_addr dst;
> +	u_char proto;
> +	char src_str[INET_ADDRSTRLEN], dst_str[INET_ADDRSTRLEN];
> +
> +	memcpy(&src, &pkt->saddr, sizeof(src));
> +	memcpy(&dst, &pkt->daddr, sizeof(dst));
> +	inet_ntop(AF_INET, &src, src_str, sizeof(src_str));
> +	inet_ntop(AF_INET, &dst, dst_str, sizeof(dst_str));
> +	proto = pkt->protocol;
> +	show_transport(packet + sizeof(struct iphdr),
> +		       ntohs(pkt->tot_len),
> +		       ifindex, src_str, dst_str, proto, false);
> +}
> +
> +static void *traffic_monitor_thread(void *arg)
> +{
> +	const u_char *packet, *payload;
> +	struct tmonitor_ctx *ctx = arg;
> +	struct pcap_pkthdr header;
> +	pcap_t *pcap = ctx->pcap;
> +	pcap_dumper_t *dumper = ctx->dumper;
> +	int fd = ctx->pcap_fd;
> +	int wake_fd = ctx->wake_fd_r;
> +	u16 proto;
> +	u32 ifindex;
> +	fd_set fds;
> +	int nfds, r;
> +
> +	nfds = (fd > wake_fd ? fd : wake_fd) + 1;
> +	FD_ZERO(&fds);
> +
> +	while (!ctx->done) {
> +		FD_SET(fd, &fds);
> +		FD_SET(wake_fd, &fds);
> +		r = select(nfds, &fds, NULL, NULL, NULL);
> +		if (!r)
> +			continue;
> +		if (r < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			log_err("Fail to select on pcap fd and wake fd: %s", strerror(errno));
> +			break;
> +		}
> +
> +		packet = pcap_next(pcap, &header);
> +		if (!packet)
> +			continue;
> +
> +		/* According to the man page of pcap_dump(), first argument
> +		 * is the pcap_dumper_t pointer even it's argument type is
> +		 * u_char *.
> +		 */
> +		pcap_dump((u_char *)dumper, &header, packet);
> +
> +		/* Not sure what other types of packets look like. Here, we
> +		 * parse only Ethernet and compatible packets.
> +		 */
> +		if (!is_ethernet(packet)) {
> +			printf("Packet captured\n");
> +			continue;
> +		}
> +
> +		/* Skip SLL2 header
> +		 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
> +		 *
> +		 * Although the document doesn't mention that, the payload
> +		 * doesn't include the Ethernet header. The payload starts
> +		 * from the first byte of the network layer header.
> +		 */
> +		payload = packet + 20;
> +
> +		memcpy(&proto, packet, 2);
> +		proto = ntohs(proto);
> +		memcpy(&ifindex, packet + 4, 4);
> +		ifindex = ntohl(ifindex);
> +
> +		if (proto == ETH_P_IPV6)
> +			show_ipv6_packet(payload, ifindex);
> +		else if (proto == ETH_P_IP)
> +			show_ipv4_packet(payload, ifindex);
> +		else
> +			printf("Unknown network protocol type %x, ifindex %d\n", proto, ifindex);
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Prepare the pcap handle to capture packets.
> + *
> + * This pcap is non-blocking and immediate mode is enabled to receive
> + * captured packets as soon as possible.  The snaplen is set to 1024 bytes
> + * to limit the size of captured content. The format of the link-layer
> + * header is set to DLT_LINUX_SLL2 to enable handling various link-layer
> + * technologies.
> + */
> +static pcap_t *traffic_monitor_prepare_pcap(void)
> +{
> +	char errbuf[PCAP_ERRBUF_SIZE];
> +	pcap_t *pcap;
> +	int r;
> +
> +	/* Listen on all NICs in the namespace */
> +	pcap = pcap_create("any", errbuf);
> +	if (!pcap) {
> +		log_err("Failed to open pcap: %s", errbuf);
> +		return NULL;
> +	}
> +	/* Limit the size of the packet (first N bytes) */
> +	r = pcap_set_snaplen(pcap, 1024);
> +	if (r) {
> +		log_err("Failed to set snaplen: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +	/* To receive packets as fast as possible */
> +	r = pcap_set_immediate_mode(pcap, 1);
> +	if (r) {
> +		log_err("Failed to set immediate mode: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +	r = pcap_setnonblock(pcap, 1, errbuf);
> +	if (r) {
> +		log_err("Failed to set nonblock: %s", errbuf);
> +		goto error;
> +	}
> +	r = pcap_activate(pcap);
> +	if (r) {
> +		log_err("Failed to activate pcap: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +	/* Determine the format of the link-layer header */
> +	r = pcap_set_datalink(pcap, DLT_LINUX_SLL2);
> +	if (r) {
> +		log_err("Failed to set datalink: %s", pcap_geterr(pcap));
> +		goto error;
> +	}
> +
> +	return pcap;
> +error:
> +	pcap_close(pcap);
> +	return NULL;
> +}
> +
> +#define PCAP_DIR "/tmp/tmon_pcap"
> +
> +/* Start to monitor the network traffic in the given network namespace.
> + *
> + * netns: the name of the network namespace to monitor. If NULL, the
> + * current network namespace is monitored.
> + *
> + * This function will start a thread to capture packets going through NICs
> + * in the give network namespace.
> + */
> +struct tmonitor_ctx *traffic_monitor_start(const char *netns)
> +{
> +	struct tmonitor_ctx *ctx = NULL;
> +	struct nstoken *nstoken = NULL;
> +	int pipefd[2] = {-1, -1};
> +	static int tmon_seq;
> +	int r;
> +
> +	if (netns) {
> +		nstoken = open_netns(netns);
> +		if (!nstoken)
> +			return NULL;
> +	}
> +	ctx = malloc(sizeof(*ctx));
> +	if (!ctx) {
> +		log_err("Failed to malloc ctx");
> +		goto fail_ctx;
> +	}
> +	memset(ctx, 0, sizeof(*ctx));
> +
> +	snprintf(ctx->pkt_fname, sizeof(ctx->pkt_fname),
> +		 PCAP_DIR "/packets-%d-%d.log", getpid(), tmon_seq++);
> +
> +	r = mkdir(PCAP_DIR, 0755);
> +	if (r && errno != EEXIST) {
> +		log_err("Failed to create " PCAP_DIR);
> +		goto fail_pcap;
> +	}
> +
> +	ctx->pcap = traffic_monitor_prepare_pcap();
> +	if (!ctx->pcap)
> +		goto fail_pcap;
> +	ctx->pcap_fd = pcap_get_selectable_fd(ctx->pcap);
> +	if (ctx->pcap_fd < 0) {
> +		log_err("Failed to get pcap fd");
> +		goto fail_dumper;
> +	}
> +
> +	/* Create a packet file */
> +	ctx->dumper = pcap_dump_open(ctx->pcap, ctx->pkt_fname);
> +	if (!ctx->dumper) {
> +		log_err("Failed to open pcap dump");
> +		goto fail_dumper;
> +	}
> +

[..]

> +	/* Create a pipe to wake up the monitor thread */
> +	r = pipe(pipefd);
> +	if (r) {
> +		log_err("Failed to create pipe: %s", strerror(errno));
> +		goto fail;
> +	}
> +	ctx->wake_fd_r = pipefd[0];
> +	ctx->wake_fd_w = pipefd[1];

eventfd might be a simpler way to handle this. Gives you one fd which is
readable/writable. But probably ok to keep the pipe since you already
have all the code written.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  2024-07-23 18:24 ` [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime Kui-Feng Lee
  2024-07-24  8:36   ` Geliang Tang
@ 2024-07-24 15:26   ` Stanislav Fomichev
  2024-07-24 18:04     ` Kui-Feng Lee
  1 sibling, 1 reply; 22+ messages in thread
From: Stanislav Fomichev @ 2024-07-24 15:26 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: bpf, ast, martin.lau, song, kernel-team, andrii, sinquersw,
	kuifeng

On 07/23, Kui-Feng Lee wrote:
> Enable traffic monitoring for the test case tc_redirect/tc_redirect_dtime.

Alternatively, we might extend test_progs to have some new generic
arg to enable trafficmon for a given set of tests (and then pass this
flag in the CI):

./test_progs --traffic_monitor=t1,t2,t3...

Might be useful in case we need to debug some other test in the future.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-24 15:22   ` Stanislav Fomichev
@ 2024-07-24 15:46     ` Kui-Feng Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-24 15:46 UTC (permalink / raw)
  To: Stanislav Fomichev, Kui-Feng Lee
  Cc: bpf, ast, martin.lau, song, kernel-team, andrii, kuifeng



On 7/24/24 08:22, Stanislav Fomichev wrote:
> On 07/23, Kui-Feng Lee wrote:
>> Add functions that capture packets and print log in the background. They
>> are supposed to be used for debugging flaky network test cases. A monitored
>> test case should call traffic_monitor_start() to start a thread to capture
>> packets in the background for a given namespace and call
>> traffic_monitor_stop() to stop capturing.
>>
>>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
>>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
>>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
>>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
>>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
>>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
>>      Packet file: packets-2172-86.log
>>      #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK
>>
>> The above is the output of an example. It shows the packets of a connection
>> and the name of the file that contains captured packets in the directory
>> /tmp/tmon_pcap. The file can be loaded by tcpdump or wireshark.
>>
>> This feature only works if TRAFFIC_MONITOR variable has been passed to
>> build BPF selftests. For example,
>>
>>    make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf
>>
>> This command will build BPF selftests with this feature enabled.
>>
>> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
>> ---
>>   tools/testing/selftests/bpf/Makefile          |   5 +
>>   tools/testing/selftests/bpf/network_helpers.c | 382 ++++++++++++++++++
>>   tools/testing/selftests/bpf/network_helpers.h |  16 +
>>   3 files changed, 403 insertions(+)
>>
>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>> index dd49c1d23a60..9dfe17588689 100644
>> --- a/tools/testing/selftests/bpf/Makefile
>> +++ b/tools/testing/selftests/bpf/Makefile
>> @@ -41,6 +41,11 @@ CFLAGS += -g $(OPT_FLAGS) -rdynamic					\
>>   LDFLAGS += $(SAN_LDFLAGS)
>>   LDLIBS += $(LIBELF_LIBS) -lz -lrt -lpthread
>>   
>> +ifneq ($(TRAFFIC_MONITOR),)
>> +LDLIBS += -lpcap
>> +CFLAGS += -DTRAFFIC_MONITOR=1
>> +endif
>> +
>>   # The following tests perform type punning and they may break strict
>>   # aliasing rules, which are exploited by both GCC and clang by default
>>   # while optimizing.  This can lead to broken programs.
>> diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
>> index e0cba4178e41..c881f53c8218 100644
>> --- a/tools/testing/selftests/bpf/network_helpers.c
>> +++ b/tools/testing/selftests/bpf/network_helpers.c
>> @@ -10,6 +10,7 @@
>>   
>>   #include <arpa/inet.h>
>>   #include <sys/mount.h>
>> +#include <sys/select.h>
>>   #include <sys/stat.h>
>>   #include <sys/un.h>
>>   
>> @@ -18,6 +19,14 @@
>>   #include <linux/in6.h>
>>   #include <linux/limits.h>
>>   
>> +#include <netinet/udp.h>
>> +
>> +#include <pthread.h>
>> +/* Prevent pcap.h from including pcap/bpf.h and causing conflicts */
>> +#define PCAP_DONT_INCLUDE_PCAP_BPF_H 1
>> +#include <pcap/pcap.h>
>> +#include <pcap/dlt.h>
>> +
>>   #include "bpf_util.h"
>>   #include "network_helpers.h"
>>   #include "test_progs.h"
>> @@ -575,6 +584,379 @@ int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
>>   	return 0;
>>   }
>>   
>> +#ifdef TRAFFIC_MONITOR
>> +struct tmonitor_ctx {
>> +	pcap_t *pcap;
>> +	pcap_dumper_t *dumper;
>> +	pthread_t thread;
>> +	int wake_fd_r;
>> +	int wake_fd_w;
>> +
>> +	bool done;
>> +	char pkt_fname[PATH_MAX];
>> +	int pcap_fd;
>> +};
> 
> 
> [..]
> 
>> +/* Is this packet captured with a Ethernet protocol type? */
>> +static bool is_ethernet(const u_char *packet)
>> +{
>> +	u16 arphdr_type;
>> +
>> +	memcpy(&arphdr_type, packet + 8, 2);
>> +	arphdr_type = ntohs(arphdr_type);
>> +
>> +	/* Except the following cases, the protocol type contains the
>> +	 * Ethernet protocol type for the packet.
>> +	 *
>> +	 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
>> +	 */
>> +	switch (arphdr_type) {
>> +	case 770: /* ARPHRD_FRAD */
>> +	case 778: /* ARPHDR_IPGRE */
>> +	case 803: /* ARPHRD_IEEE80211_RADIOTAP */
>> +		return false;
>> +	}
>> +	return true;
>> +}
> 
> Are we actually getting non-ethernet packets? Any idea why?

No, I haven't get any non-ethernet packets so far. This test is just
for ensuring everything going as what we expected.

> 
>> +/* Show the information of the transport layer in the packet */
>> +static void show_transport(const u_char *packet, u16 len, u32 ifindex,
>> +			   const char *src_addr, const char *dst_addr,
>> +			   u16 proto, bool ipv6)
>> +{
>> +	struct udphdr *udp;
>> +	struct tcphdr *tcp;
>> +	u16 src_port, dst_port;
>> +	const char *transport_str;
>> +
>> +	if (proto == IPPROTO_UDP) {
>> +		udp = (struct udphdr *)packet;
>> +		src_port = ntohs(udp->source);
>> +		dst_port = ntohs(udp->dest);
>> +		transport_str = "UDP";
>> +	} else if (proto == IPPROTO_TCP) {
>> +		tcp = (struct tcphdr *)packet;
>> +		src_port = ntohs(tcp->source);
>> +		dst_port = ntohs(tcp->dest);
>> +		transport_str = "TCP"
>> +;
>> +	} else {
>> +		printf("%s (proto %d): %s -> %s, ifindex %d\n",
>> +		       ipv6 ? "IPv6" : "IPv4", proto, src_addr, dst_addr, ifindex);
>> +		return;
>> +	}
>> +
>> +	if (ipv6)
>> +		printf("IPv6 %s packet: [%s]:%d -> [%s]:%d, len %d, ifindex %d",
>> +		       transport_str, src_addr, src_port,
>> +		       dst_addr, dst_port, len, ifindex);
>> +	else
>> +		printf("IPv4 %s packet: %s:%d -> %s:%d, len %d, ifindex %d",
>> +		       transport_str, src_addr, src_port,
>> +		       dst_addr, dst_port, len, ifindex);
>> +
>> +	if (proto == IPPROTO_TCP) {
>> +		if (tcp->fin)
>> +			printf(", FIN");
>> +		if (tcp->syn)
>> +			printf(", SYN");
>> +		if (tcp->rst)
>> +			printf(", RST");
>> +		if (tcp->ack)
>> +			printf(", ACK");
>> +	}
>> +
>> +	printf("\n");
>> +}
>> +
>> +static void show_ipv6_packet(const u_char *packet, u32 ifindex)
>> +{
>> +	struct ipv6hdr *pkt = (struct ipv6hdr *)packet;
>> +	struct in6_addr src;
>> +	struct in6_addr dst;
>> +	char src_str[INET6_ADDRSTRLEN], dst_str[INET6_ADDRSTRLEN];
>> +	u_char proto;
> 
> [..]
> 
>> +	memcpy(&src, &pkt->saddr, sizeof(src));
>> +	memcpy(&dst, &pkt->daddr, sizeof(dst));
>> +	inet_ntop(AF_INET6, &src, src_str, sizeof(src_str));
>> +	inet_ntop(AF_INET6, &dst, dst_str, sizeof(dst_str));
> 
> nit: can probably inet_ntop(AF_INET6, &pkt->saddr, ...) directly? No
> need to copy. Same for ipv4.

You are right.

> 
>> +	proto = pkt->nexthdr;
>> +	show_transport(packet + sizeof(struct ipv6hdr),
>> +		       ntohs(pkt->payload_len),
>> +		       ifindex, src_str, dst_str, proto, true);
>> +}
>> +
>> +static void show_ipv4_packet(const u_char *packet, u32 ifindex)
>> +{
>> +	struct iphdr *pkt = (struct iphdr *)packet;
>> +	struct in_addr src;
>> +	struct in_addr dst;
>> +	u_char proto;
>> +	char src_str[INET_ADDRSTRLEN], dst_str[INET_ADDRSTRLEN];
>> +
>> +	memcpy(&src, &pkt->saddr, sizeof(src));
>> +	memcpy(&dst, &pkt->daddr, sizeof(dst));
>> +	inet_ntop(AF_INET, &src, src_str, sizeof(src_str));
>> +	inet_ntop(AF_INET, &dst, dst_str, sizeof(dst_str));
>> +	proto = pkt->protocol;
>> +	show_transport(packet + sizeof(struct iphdr),
>> +		       ntohs(pkt->tot_len),
>> +		       ifindex, src_str, dst_str, proto, false);
>> +}
>> +
>> +static void *traffic_monitor_thread(void *arg)
>> +{
>> +	const u_char *packet, *payload;
>> +	struct tmonitor_ctx *ctx = arg;
>> +	struct pcap_pkthdr header;
>> +	pcap_t *pcap = ctx->pcap;
>> +	pcap_dumper_t *dumper = ctx->dumper;
>> +	int fd = ctx->pcap_fd;
>> +	int wake_fd = ctx->wake_fd_r;
>> +	u16 proto;
>> +	u32 ifindex;
>> +	fd_set fds;
>> +	int nfds, r;
>> +
>> +	nfds = (fd > wake_fd ? fd : wake_fd) + 1;
>> +	FD_ZERO(&fds);
>> +
>> +	while (!ctx->done) {
>> +		FD_SET(fd, &fds);
>> +		FD_SET(wake_fd, &fds);
>> +		r = select(nfds, &fds, NULL, NULL, NULL);
>> +		if (!r)
>> +			continue;
>> +		if (r < 0) {
>> +			if (errno == EINTR)
>> +				continue;
>> +			log_err("Fail to select on pcap fd and wake fd: %s", strerror(errno));
>> +			break;
>> +		}
>> +
>> +		packet = pcap_next(pcap, &header);
>> +		if (!packet)
>> +			continue;
>> +
>> +		/* According to the man page of pcap_dump(), first argument
>> +		 * is the pcap_dumper_t pointer even it's argument type is
>> +		 * u_char *.
>> +		 */
>> +		pcap_dump((u_char *)dumper, &header, packet);
>> +
>> +		/* Not sure what other types of packets look like. Here, we
>> +		 * parse only Ethernet and compatible packets.
>> +		 */
>> +		if (!is_ethernet(packet)) {
>> +			printf("Packet captured\n");
>> +			continue;
>> +		}
>> +
>> +		/* Skip SLL2 header
>> +		 * https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
>> +		 *
>> +		 * Although the document doesn't mention that, the payload
>> +		 * doesn't include the Ethernet header. The payload starts
>> +		 * from the first byte of the network layer header.
>> +		 */
>> +		payload = packet + 20;
>> +
>> +		memcpy(&proto, packet, 2);
>> +		proto = ntohs(proto);
>> +		memcpy(&ifindex, packet + 4, 4);
>> +		ifindex = ntohl(ifindex);
>> +
>> +		if (proto == ETH_P_IPV6)
>> +			show_ipv6_packet(payload, ifindex);
>> +		else if (proto == ETH_P_IP)
>> +			show_ipv4_packet(payload, ifindex);
>> +		else
>> +			printf("Unknown network protocol type %x, ifindex %d\n", proto, ifindex);
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/* Prepare the pcap handle to capture packets.
>> + *
>> + * This pcap is non-blocking and immediate mode is enabled to receive
>> + * captured packets as soon as possible.  The snaplen is set to 1024 bytes
>> + * to limit the size of captured content. The format of the link-layer
>> + * header is set to DLT_LINUX_SLL2 to enable handling various link-layer
>> + * technologies.
>> + */
>> +static pcap_t *traffic_monitor_prepare_pcap(void)
>> +{
>> +	char errbuf[PCAP_ERRBUF_SIZE];
>> +	pcap_t *pcap;
>> +	int r;
>> +
>> +	/* Listen on all NICs in the namespace */
>> +	pcap = pcap_create("any", errbuf);
>> +	if (!pcap) {
>> +		log_err("Failed to open pcap: %s", errbuf);
>> +		return NULL;
>> +	}
>> +	/* Limit the size of the packet (first N bytes) */
>> +	r = pcap_set_snaplen(pcap, 1024);
>> +	if (r) {
>> +		log_err("Failed to set snaplen: %s", pcap_geterr(pcap));
>> +		goto error;
>> +	}
>> +	/* To receive packets as fast as possible */
>> +	r = pcap_set_immediate_mode(pcap, 1);
>> +	if (r) {
>> +		log_err("Failed to set immediate mode: %s", pcap_geterr(pcap));
>> +		goto error;
>> +	}
>> +	r = pcap_setnonblock(pcap, 1, errbuf);
>> +	if (r) {
>> +		log_err("Failed to set nonblock: %s", errbuf);
>> +		goto error;
>> +	}
>> +	r = pcap_activate(pcap);
>> +	if (r) {
>> +		log_err("Failed to activate pcap: %s", pcap_geterr(pcap));
>> +		goto error;
>> +	}
>> +	/* Determine the format of the link-layer header */
>> +	r = pcap_set_datalink(pcap, DLT_LINUX_SLL2);
>> +	if (r) {
>> +		log_err("Failed to set datalink: %s", pcap_geterr(pcap));
>> +		goto error;
>> +	}
>> +
>> +	return pcap;
>> +error:
>> +	pcap_close(pcap);
>> +	return NULL;
>> +}
>> +
>> +#define PCAP_DIR "/tmp/tmon_pcap"
>> +
>> +/* Start to monitor the network traffic in the given network namespace.
>> + *
>> + * netns: the name of the network namespace to monitor. If NULL, the
>> + * current network namespace is monitored.
>> + *
>> + * This function will start a thread to capture packets going through NICs
>> + * in the give network namespace.
>> + */
>> +struct tmonitor_ctx *traffic_monitor_start(const char *netns)
>> +{
>> +	struct tmonitor_ctx *ctx = NULL;
>> +	struct nstoken *nstoken = NULL;
>> +	int pipefd[2] = {-1, -1};
>> +	static int tmon_seq;
>> +	int r;
>> +
>> +	if (netns) {
>> +		nstoken = open_netns(netns);
>> +		if (!nstoken)
>> +			return NULL;
>> +	}
>> +	ctx = malloc(sizeof(*ctx));
>> +	if (!ctx) {
>> +		log_err("Failed to malloc ctx");
>> +		goto fail_ctx;
>> +	}
>> +	memset(ctx, 0, sizeof(*ctx));
>> +
>> +	snprintf(ctx->pkt_fname, sizeof(ctx->pkt_fname),
>> +		 PCAP_DIR "/packets-%d-%d.log", getpid(), tmon_seq++);
>> +
>> +	r = mkdir(PCAP_DIR, 0755);
>> +	if (r && errno != EEXIST) {
>> +		log_err("Failed to create " PCAP_DIR);
>> +		goto fail_pcap;
>> +	}
>> +
>> +	ctx->pcap = traffic_monitor_prepare_pcap();
>> +	if (!ctx->pcap)
>> +		goto fail_pcap;
>> +	ctx->pcap_fd = pcap_get_selectable_fd(ctx->pcap);
>> +	if (ctx->pcap_fd < 0) {
>> +		log_err("Failed to get pcap fd");
>> +		goto fail_dumper;
>> +	}
>> +
>> +	/* Create a packet file */
>> +	ctx->dumper = pcap_dump_open(ctx->pcap, ctx->pkt_fname);
>> +	if (!ctx->dumper) {
>> +		log_err("Failed to open pcap dump");
>> +		goto fail_dumper;
>> +	}
>> +
> 
> [..]
> 
>> +	/* Create a pipe to wake up the monitor thread */
>> +	r = pipe(pipefd);
>> +	if (r) {
>> +		log_err("Failed to create pipe: %s", strerror(errno));
>> +		goto fail;
>> +	}
>> +	ctx->wake_fd_r = pipefd[0];
>> +	ctx->wake_fd_w = pipefd[1];
> 
> eventfd might be a simpler way to handle this. Gives you one fd which is
> readable/writable. But probably ok to keep the pipe since you already
> have all the code written.

Sure, I will move to eventfd. It is not a big deal.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  2024-07-24  8:36   ` Geliang Tang
@ 2024-07-24 16:24     ` Kui-Feng Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-24 16:24 UTC (permalink / raw)
  To: Geliang Tang, Kui-Feng Lee, bpf, ast, martin.lau, song,
	kernel-team, andrii, sdf
  Cc: kuifeng



On 7/24/24 01:36, Geliang Tang wrote:
> On Tue, 2024-07-23 at 11:24 -0700, Kui-Feng Lee wrote:
>> Enable traffic monitoring for the test case
>> tc_redirect/tc_redirect_dtime.
>>
>> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
>> ---
>>   tools/testing/selftests/bpf/prog_tests/tc_redirect.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
>> b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
>> index 327d51f59142..1be6ea8d6c64 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
>> @@ -900,6 +900,7 @@ static void test_udp_dtime(struct test_tc_dtime
>> *skel, int family, bool bpf_fwd)
>>   static void test_tc_redirect_dtime(struct netns_setup_result
>> *setup_result)
>>   {
>>   	struct test_tc_dtime *skel;
>> +	struct tmonitor_ctx *tmon = NULL;
> 
> nit: No need to set "tmon" to NULL here.

Earlier "goto done" statements would need tmon to be initialized.
Does that make sense?


> 
>>   	struct nstoken *nstoken;
>>   	int hold_tstamp_fd, err;
>>   
>> @@ -934,6 +935,9 @@ static void test_tc_redirect_dtime(struct
>> netns_setup_result *setup_result)
>>   	if (!ASSERT_OK(err, "disable forwarding"))
>>   		goto done;
>>   
>> +	tmon = traffic_monitor_start(NS_DST);
>> +	ASSERT_NEQ(tmon, NULL, "traffic_monitor_start");
>> +
>>   	test_tcp_clear_dtime(skel);
>>   
>>   	test_tcp_dtime(skel, AF_INET, true);
>> @@ -958,6 +962,7 @@ static void test_tc_redirect_dtime(struct
>> netns_setup_result *setup_result)
>>   	test_udp_dtime(skel, AF_INET6, false);
>>   
>>   done:
>> +	traffic_monitor_stop(tmon);
>>   	test_tc_dtime__destroy(skel);
>>   	close(hold_tstamp_fd);
>>   }
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen.
  2024-07-24  9:32   ` Geliang Tang
@ 2024-07-24 16:24     ` Kui-Feng Lee
  2024-07-24 18:11       ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-24 16:24 UTC (permalink / raw)
  To: Geliang Tang, Kui-Feng Lee, bpf, ast, martin.lau, song,
	kernel-team, andrii, sdf
  Cc: kuifeng



On 7/24/24 02:32, Geliang Tang wrote:
> On Tue, 2024-07-23 at 11:24 -0700, Kui-Feng Lee wrote:
>> Enable traffic monitor for each subtest of sockmap_listen.
>>
>> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
>> ---
>>   tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
>> b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
>> index e91b59366030..62683ccb6d56 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
>> @@ -28,6 +28,7 @@
>>   #include "test_sockmap_listen.skel.h"
>>   
>>   #include "sockmap_helpers.h"
>> +#include "network_helpers.h"
>>   
>>   static void test_insert_invalid(struct test_sockmap_listen *skel
>> __always_unused,
>>   				int family, int sotype, int mapfd)
>> @@ -1893,14 +1894,21 @@ static void test_udp_unix_redir(struct
>> test_sockmap_listen *skel, struct bpf_map
>>   {
>>   	const char *family_name, *map_name;
>>   	char s[MAX_TEST_NAME];
>> +	struct tmonitor_ctx *tmon;
>>   
>>   	family_name = family_str(family);
>>   	map_name = map_type_str(map);
>>   	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name,
>> __func__);
>>   	if (!test__start_subtest(s))
>>   		return;
>> +
>> +	tmon = traffic_monitor_start(NULL);
>> +	ASSERT_TRUE(tmon, "traffic_monitor_start");
> 
> Using ASSERT_TRUE() on a pointer is a bit strange, it's better to use
> ASSERT_NEQ(NULL) like patch 2.

Sure!

> 
>> +
>>   	inet_unix_skb_redir_to_connected(skel, map, family);
>>   	unix_inet_skb_redir_to_connected(skel, map, family);
>> +
>> +	traffic_monitor_stop(tmon);
>>   }
>>   
>>   static void run_tests(struct test_sockmap_listen *skel, struct
>> bpf_map *map,
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  2024-07-24 15:26   ` Stanislav Fomichev
@ 2024-07-24 18:04     ` Kui-Feng Lee
  2024-07-24 22:13       ` Stanislav Fomichev
  0 siblings, 1 reply; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-24 18:04 UTC (permalink / raw)
  To: Stanislav Fomichev, Kui-Feng Lee
  Cc: bpf, ast, martin.lau, song, kernel-team, andrii, kuifeng

On 7/24/24 08:26, Stanislav Fomichev wrote:
> On 07/23, Kui-Feng Lee wrote:
>> Enable traffic monitoring for the test case tc_redirect/tc_redirect_dtime.
> 
> Alternatively, we might extend test_progs to have some new generic
> arg to enable trafficmon for a given set of tests (and then pass this
> flag in the CI):
> 
> ./test_progs --traffic_monitor=t1,t2,t3...
> 
> Might be useful in case we need to debug some other test in the future.

We run a few test cases with network namespaces. So we need to
specify namespaces to monitor. And, these namespaces are not created
yet when a test starts. To adapt this approach, these test cases should
be changed to use a generic way that create network namespaces when
a test starts.

Or, we just monitor default network namespace. For test cases with
network namespaces, they need to call these functions.

WDYT?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen.
  2024-07-24 16:24     ` Kui-Feng Lee
@ 2024-07-24 18:11       ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2024-07-24 18:11 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Geliang Tang, Kui-Feng Lee, bpf, ast, martin.lau, song,
	kernel-team, andrii, sdf, kuifeng

On Wed, Jul 24, 2024 at 9:24 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 7/24/24 02:32, Geliang Tang wrote:
> > On Tue, 2024-07-23 at 11:24 -0700, Kui-Feng Lee wrote:
> >> Enable traffic monitor for each subtest of sockmap_listen.
> >>
> >> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> >> ---
> >>   tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 8 ++++++++
> >>   1 file changed, 8 insertions(+)
> >>
> >> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> >> b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> >> index e91b59366030..62683ccb6d56 100644
> >> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> >> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> >> @@ -28,6 +28,7 @@
> >>   #include "test_sockmap_listen.skel.h"
> >>
> >>   #include "sockmap_helpers.h"
> >> +#include "network_helpers.h"
> >>
> >>   static void test_insert_invalid(struct test_sockmap_listen *skel
> >> __always_unused,
> >>                              int family, int sotype, int mapfd)
> >> @@ -1893,14 +1894,21 @@ static void test_udp_unix_redir(struct
> >> test_sockmap_listen *skel, struct bpf_map
> >>   {
> >>      const char *family_name, *map_name;
> >>      char s[MAX_TEST_NAME];
> >> +    struct tmonitor_ctx *tmon;
> >>
> >>      family_name = family_str(family);
> >>      map_name = map_type_str(map);
> >>      snprintf(s, sizeof(s), "%s %s %s", map_name, family_name,
> >> __func__);
> >>      if (!test__start_subtest(s))
> >>              return;
> >> +
> >> +    tmon = traffic_monitor_start(NULL);
> >> +    ASSERT_TRUE(tmon, "traffic_monitor_start");
> >
> > Using ASSERT_TRUE() on a pointer is a bit strange, it's better to use
> > ASSERT_NEQ(NULL) like patch 2.
>
> Sure!

we have ASSERT_OK_PTR() for pointers

>
> >
> >> +
> >>      inet_unix_skb_redir_to_connected(skel, map, family);
> >>      unix_inet_skb_redir_to_connected(skel, map, family);
> >> +
> >> +    traffic_monitor_stop(tmon);
> >>   }
> >>
> >>   static void run_tests(struct test_sockmap_listen *skel, struct
> >> bpf_map *map,
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
  2024-07-23 22:03   ` Kui-Feng Lee
  2024-07-24 15:22   ` Stanislav Fomichev
@ 2024-07-24 19:08   ` Martin KaFai Lau
  2024-07-25  1:44     ` Martin KaFai Lau
  2024-07-25 22:47     ` Kui-Feng Lee
  2 siblings, 2 replies; 22+ messages in thread
From: Martin KaFai Lau @ 2024-07-24 19:08 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, song, kernel-team, andrii, sdf, sinquersw, kuifeng

On 7/23/24 11:24 AM, Kui-Feng Lee wrote:
> Add functions that capture packets and print log in the background. They
> are supposed to be used for debugging flaky network test cases. A monitored
> test case should call traffic_monitor_start() to start a thread to capture
> packets in the background for a given namespace and call
> traffic_monitor_stop() to stop capturing.
> 
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, ifindex 1, SYN
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, ifindex 1, SYN, ACK
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, ifindex 1, ACK
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, ACK
>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, ifindex 1, FIN, ACK
>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, ifindex 1, RST, ACK
>      Packet file: packets-2172-86.log
>      #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK test_detach_bpf:OK
> 
> The above is the output of an example. It shows the packets of a connection
> and the name of the file that contains captured packets in the directory
> /tmp/tmon_pcap. The file can be loaded by tcpdump or wireshark.
> 
> This feature only works if TRAFFIC_MONITOR variable has been passed to
> build BPF selftests. For example,
> 
>    make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf

If bpf CI does not have libpcap, it is better to get bpf CI ready first/soon.

[ ... ]

> +/* Show the information of the transport layer in the packet */
> +static void show_transport(const u_char *packet, u16 len, u32 ifindex,
> +			   const char *src_addr, const char *dst_addr,
> +			   u16 proto, bool ipv6)
> +{
> +	struct udphdr *udp;
> +	struct tcphdr *tcp;
> +	u16 src_port, dst_port;
> +	const char *transport_str;
> +
> +	if (proto == IPPROTO_UDP) {
> +		udp = (struct udphdr *)packet;
> +		src_port = ntohs(udp->source);
> +		dst_port = ntohs(udp->dest);
> +		transport_str = "UDP";
> +	} else if (proto == IPPROTO_TCP) {
> +		tcp = (struct tcphdr *)packet;
> +		src_port = ntohs(tcp->source);
> +		dst_port = ntohs(tcp->dest);
> +		transport_str = "TCP"
> +;
> +	} else {

It will be useful to at least print the ICMP[46] also. Some tests use ping to 
test. For IPv6, printing the ICMPv6 messages will be useful for debugging, e.g. 
the neigh discovery. The icmp type (and code?) should be good enough.

That should be enough to begin with. The pcap dumped file can be used for the rest.

Thanks for switching to libpcap. It is easier to handle the captured packets in 
different ways.

> +		printf("%s (proto %d): %s -> %s, ifindex %d\n",
> +		       ipv6 ? "IPv6" : "IPv4", proto, src_addr, dst_addr, ifindex);
> +		return;
> +	}
> +
> +	if (ipv6)
> +		printf("IPv6 %s packet: [%s]:%d -> [%s]:%d, len %d, ifindex %d",

It will be useful to print the ifname also such that easier for human parsing. 
It should be possible by if_indextoname (cheap enough?) if libpcap doesn't have 
it. It could be something for a later followup though. Mostly nit here.

> +		       transport_str, src_addr, src_port,
> +		       dst_addr, dst_port, len, ifindex);
> +	else
> +		printf("IPv4 %s packet: %s:%d -> %s:%d, len %d, ifindex %d",
> +		       transport_str, src_addr, src_port,
> +		       dst_addr, dst_port, len, ifindex);
> +
> +	if (proto == IPPROTO_TCP) {
> +		if (tcp->fin)
> +			printf(", FIN");
> +		if (tcp->syn)
> +			printf(", SYN");
> +		if (tcp->rst)
> +			printf(", RST");
> +		if (tcp->ack)
> +			printf(", ACK");
> +	}
> +
> +	printf("\n");
> +}

[ ... ]

> +/* Start to monitor the network traffic in the given network namespace.
> + *
> + * netns: the name of the network namespace to monitor. If NULL, the
> + * current network namespace is monitored.
> + *
> + * This function will start a thread to capture packets going through NICs
> + * in the give network namespace.
> + */
> +struct tmonitor_ctx *traffic_monitor_start(const char *netns)

There is opportunity to make the traffic monitoring easier for tests that create 
its own netns which I hope most of the networking tests fall into this bucket 
now. Especially for tests that create multiple netns such that the test does not 
have to start/stop for each individual netns.

May be adding an API like "struct nstoken *netns_new(const char *netns_name)". 
The netns_new() will create the netns and (optionally) start the monitoring 
thread also. It will need another "void netns_free(struct nstoken *nstoken)" to 
stop the thread and remove the netns. The "struct tmonitor_ctx" probably makes 
sense to be embedded into "struct nstoken" if we go with this new API.

This will need some changes to the tests creating netns but it probably should 
be obvious change considering most test do "ip netns add..." and then 
open_netns(). It can start with the flaky test at hand first like tc_redirect.

May be a little more changes for the test using "unshare(CLONE_NEWNET)" but 
should not be too bad either. This can be done only when we need to turn on 
libpcap to debug that test.

Also, when the test is flaky, make it easier for people not familiar with the 
codes of the networking test to turn on traffic monitoring without changing the 
test code. May be in a libpcap.list file (in parallel to the existing DENYLIST)?

For the tests without having its own netns, they can either move to netns (which 
I think it is a good thing to do) or use the traffic_monitor_start/stop() 
manually by changing the testing code,
or a better way is to ask test_progs do it for the host netns (init_netns) 
automatically for all tests in the libpcap.list.

wdyt?

> +{
> +	struct tmonitor_ctx *ctx = NULL;
> +	struct nstoken *nstoken = NULL;
> +	int pipefd[2] = {-1, -1};
> +	static int tmon_seq;
> +	int r;
> +
> +	if (netns) {
> +		nstoken = open_netns(netns);
> +		if (!nstoken)
> +			return NULL;
> +	}
> +	ctx = malloc(sizeof(*ctx));
> +	if (!ctx) {
> +		log_err("Failed to malloc ctx");
> +		goto fail_ctx;
> +	}
> +	memset(ctx, 0, sizeof(*ctx));
> +
> +	snprintf(ctx->pkt_fname, sizeof(ctx->pkt_fname),
> +		 PCAP_DIR "/packets-%d-%d.log", getpid(), tmon_seq++);

nit. I wonder if it is useful to also have the netns name in the filename?

Not sure if it is more useful to have the test_num and subtest_num instead of 
pid. Probably doable from looking at test__start_subtest().

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  2024-07-24 18:04     ` Kui-Feng Lee
@ 2024-07-24 22:13       ` Stanislav Fomichev
  0 siblings, 0 replies; 22+ messages in thread
From: Stanislav Fomichev @ 2024-07-24 22:13 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Kui-Feng Lee, bpf, ast, martin.lau, song, kernel-team, andrii,
	kuifeng

On 07/24, Kui-Feng Lee wrote:
> 
> 
> On 7/24/24 08:26, Stanislav Fomichev wrote:
> > On 07/23, Kui-Feng Lee wrote:
> > > Enable traffic monitoring for the test case tc_redirect/tc_redirect_dtime.
> > 
> > Alternatively, we might extend test_progs to have some new generic
> > arg to enable trafficmon for a given set of tests (and then pass this
> > flag in the CI):
> > 
> > ./test_progs --traffic_monitor=t1,t2,t3...
> > 
> > Might be useful in case we need to debug some other test in the future.
> 
> We run a few test cases with network namespaces. So we need to
> specify namespaces to monitor. And, these namespaces are not created
> yet when a test starts. To adapt this approach, these test cases should
> be changed to use a generic way that create network namespaces when
> a test starts.
> 
> Or, we just monitor default network namespace. For test cases with
> network namespaces, they need to call these functions.
> 
> WDYT?

Ah, true, in this case ignore me :-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-24 19:08   ` Martin KaFai Lau
@ 2024-07-25  1:44     ` Martin KaFai Lau
  2024-07-25 22:47     ` Kui-Feng Lee
  1 sibling, 0 replies; 22+ messages in thread
From: Martin KaFai Lau @ 2024-07-25  1:44 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: bpf, ast, song, kernel-team, andrii, sdf, sinquersw, kuifeng

On 7/24/24 12:08 PM, Martin KaFai Lau wrote:
> For the tests without having its own netns, they can either move to netns (which 
> I think it is a good thing to do) or use the traffic_monitor_start/stop() 
> manually by changing the testing code,
> or a better way is to ask test_progs do it for the host netns (init_netns) 
> automatically for all tests in the libpcap.list.

After another thought, probably scrape the init_netns auto capturing idea for 
now. I think it could be too noisy for the tests that do use netns.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-24 19:08   ` Martin KaFai Lau
  2024-07-25  1:44     ` Martin KaFai Lau
@ 2024-07-25 22:47     ` Kui-Feng Lee
  2024-07-26  0:23       ` Martin KaFai Lau
  1 sibling, 1 reply; 22+ messages in thread
From: Kui-Feng Lee @ 2024-07-25 22:47 UTC (permalink / raw)
  To: Martin KaFai Lau, Kui-Feng Lee
  Cc: bpf, ast, song, kernel-team, andrii, sdf, kuifeng



On 7/24/24 12:08, Martin KaFai Lau wrote:
> On 7/23/24 11:24 AM, Kui-Feng Lee wrote:
>> Add functions that capture packets and print log in the background. They
>> are supposed to be used for debugging flaky network test cases. A 
>> monitored
>> test case should call traffic_monitor_start() to start a thread to 
>> capture
>> packets in the background for a given namespace and call
>> traffic_monitor_stop() to stop capturing.
>>
>>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 68, 
>> ifindex 1, SYN
>>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 60, 
>> ifindex 1, SYN, ACK
>>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 60, 
>> ifindex 1, ACK
>>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, 
>> ifindex 1, ACK
>>      IPv4 TCP packet: 127.0.0.1:48165 -> 127.0.0.1:36707, len 52, 
>> ifindex 1, FIN, ACK
>>      IPv4 TCP packet: 127.0.0.1:36707 -> 127.0.0.1:48165, len 52, 
>> ifindex 1, RST, ACK
>>      Packet file: packets-2172-86.log
>>      #280/87 select_reuseport/sockhash IPv4/TCP LOOPBACK 
>> test_detach_bpf:OK
>>
>> The above is the output of an example. It shows the packets of a 
>> connection
>> and the name of the file that contains captured packets in the directory
>> /tmp/tmon_pcap. The file can be loaded by tcpdump or wireshark.
>>
>> This feature only works if TRAFFIC_MONITOR variable has been passed to
>> build BPF selftests. For example,
>>
>>    make TRAFFIC_MONITOR=1 -C tools/testing/selftests/bpf
> 
> If bpf CI does not have libpcap, it is better to get bpf CI ready 
> first/soon.
> 
> [ ... ]
> 
>> +/* Show the information of the transport layer in the packet */
>> +static void show_transport(const u_char *packet, u16 len, u32 ifindex,
>> +               const char *src_addr, const char *dst_addr,
>> +               u16 proto, bool ipv6)
>> +{
>> +    struct udphdr *udp;
>> +    struct tcphdr *tcp;
>> +    u16 src_port, dst_port;
>> +    const char *transport_str;
>> +
>> +    if (proto == IPPROTO_UDP) {
>> +        udp = (struct udphdr *)packet;
>> +        src_port = ntohs(udp->source);
>> +        dst_port = ntohs(udp->dest);
>> +        transport_str = "UDP";
>> +    } else if (proto == IPPROTO_TCP) {
>> +        tcp = (struct tcphdr *)packet;
>> +        src_port = ntohs(tcp->source);
>> +        dst_port = ntohs(tcp->dest);
>> +        transport_str = "TCP"
>> +;
>> +    } else {
> 
> It will be useful to at least print the ICMP[46] also. Some tests use 
> ping to test. For IPv6, printing the ICMPv6 messages will be useful for 
> debugging, e.g. the neigh discovery. The icmp type (and code?) should be 
> good enough.

Right, ICMP is something should be included.

> 
> That should be enough to begin with. The pcap dumped file can be used 
> for the rest.
> 
> Thanks for switching to libpcap. It is easier to handle the captured 
> packets in different ways.
> 
>> +        printf("%s (proto %d): %s -> %s, ifindex %d\n",
>> +               ipv6 ? "IPv6" : "IPv4", proto, src_addr, dst_addr, 
>> ifindex);
>> +        return;
>> +    }
>> +
>> +    if (ipv6)
>> +        printf("IPv6 %s packet: [%s]:%d -> [%s]:%d, len %d, ifindex %d",
> 
> It will be useful to print the ifname also such that easier for human 
> parsing. It should be possible by if_indextoname (cheap enough?) if 
> libpcap doesn't have it. It could be something for a later followup 
> though. Mostly nit here.

This is not a big work. I will include it in the next version.

> 
>> +               transport_str, src_addr, src_port,
>> +               dst_addr, dst_port, len, ifindex);
>> +    else
>> +        printf("IPv4 %s packet: %s:%d -> %s:%d, len %d, ifindex %d",
>> +               transport_str, src_addr, src_port,
>> +               dst_addr, dst_port, len, ifindex);
>> +
>> +    if (proto == IPPROTO_TCP) {
>> +        if (tcp->fin)
>> +            printf(", FIN");
>> +        if (tcp->syn)
>> +            printf(", SYN");
>> +        if (tcp->rst)
>> +            printf(", RST");
>> +        if (tcp->ack)
>> +            printf(", ACK");
>> +    }
>> +
>> +    printf("\n");
>> +}
> 
> [ ... ]
> 
>> +/* Start to monitor the network traffic in the given network namespace.
>> + *
>> + * netns: the name of the network namespace to monitor. If NULL, the
>> + * current network namespace is monitored.
>> + *
>> + * This function will start a thread to capture packets going through 
>> NICs
>> + * in the give network namespace.
>> + */
>> +struct tmonitor_ctx *traffic_monitor_start(const char *netns)
> 
> There is opportunity to make the traffic monitoring easier for tests 
> that create its own netns which I hope most of the networking tests fall 
> into this bucket now. Especially for tests that create multiple netns 
> such that the test does not have to start/stop for each individual netns.
> 
> May be adding an API like "struct nstoken *netns_new(const char 
> *netns_name)". The netns_new() will create the netns and (optionally) 
> start the monitoring thread also. It will need another "void 
> netns_free(struct nstoken *nstoken)" to stop the thread and remove the 
> netns. The "struct tmonitor_ctx" probably makes sense to be embedded 
> into "struct nstoken" if we go with this new API.

Agree! But, I think we need another type rather than to reuse "struct
netns". People may accidentally call close_netns() on the nstoken
returned by this function.

> 
> This will need some changes to the tests creating netns but it probably 
> should be obvious change considering most test do "ip netns add..." and 
> then open_netns(). It can start with the flaky test at hand first like 
> tc_redirect.
> 
> May be a little more changes for the test using "unshare(CLONE_NEWNET)" 
> but should not be too bad either. This can be done only when we need to 
> turn on libpcap to debug that test.
> 
> Also, when the test is flaky, make it easier for people not familiar 
> with the codes of the networking test to turn on traffic monitoring 
> without changing the test code. May be in a libpcap.list file (in 
> parallel to the existing DENYLIST)?
> 
> For the tests without having its own netns, they can either move to 
> netns (which I think it is a good thing to do) or use the 
> traffic_monitor_start/stop() manually by changing the testing code,
> or a better way is to ask test_progs do it for the host netns 
> (init_netns) automatically for all tests in the libpcap.list.

Agree! I will start move some tests to netns, and use libpcap.list to
enable them.

> 
> wdyt?
> 
>> +{
>> +    struct tmonitor_ctx *ctx = NULL;
>> +    struct nstoken *nstoken = NULL;
>> +    int pipefd[2] = {-1, -1};
>> +    static int tmon_seq;
>> +    int r;
>> +
>> +    if (netns) {
>> +        nstoken = open_netns(netns);
>> +        if (!nstoken)
>> +            return NULL;
>> +    }
>> +    ctx = malloc(sizeof(*ctx));
>> +    if (!ctx) {
>> +        log_err("Failed to malloc ctx");
>> +        goto fail_ctx;
>> +    }
>> +    memset(ctx, 0, sizeof(*ctx));
>> +
>> +    snprintf(ctx->pkt_fname, sizeof(ctx->pkt_fname),
>> +         PCAP_DIR "/packets-%d-%d.log", getpid(), tmon_seq++);
> 
> nit. I wonder if it is useful to also have the netns name in the filename?
> 
> Not sure if it is more useful to have the test_num and subtest_num 
> instead of pid. Probably doable from looking at test__start_subtest().

It should be doable. These information is available through test_env.
Last time I tried to use env, but run into an error. I will try again.


> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions.
  2024-07-25 22:47     ` Kui-Feng Lee
@ 2024-07-26  0:23       ` Martin KaFai Lau
  0 siblings, 0 replies; 22+ messages in thread
From: Martin KaFai Lau @ 2024-07-26  0:23 UTC (permalink / raw)
  To: Kui-Feng Lee, Kui-Feng Lee
  Cc: bpf, ast, song, kernel-team, andrii, sdf, kuifeng

On 7/25/24 3:47 PM, Kui-Feng Lee wrote:
>>> +/* Start to monitor the network traffic in the given network namespace.
>>> + *
>>> + * netns: the name of the network namespace to monitor. If NULL, the
>>> + * current network namespace is monitored.
>>> + *
>>> + * This function will start a thread to capture packets going through NICs
>>> + * in the give network namespace.
>>> + */
>>> +struct tmonitor_ctx *traffic_monitor_start(const char *netns)
>>
>> There is opportunity to make the traffic monitoring easier for tests that 
>> create its own netns which I hope most of the networking tests fall into this 
>> bucket now. Especially for tests that create multiple netns such that the test 
>> does not have to start/stop for each individual netns.
>>
>> May be adding an API like "struct nstoken *netns_new(const char *netns_name)". 
>> The netns_new() will create the netns and (optionally) start the monitoring 
>> thread also. It will need another "void netns_free(struct nstoken *nstoken)" 
>> to stop the thread and remove the netns. The "struct tmonitor_ctx" probably 
>> makes sense to be embedded into "struct nstoken" if we go with this new API.
> 
> Agree! But, I think we need another type rather than to reuse "struct
> netns". People may accidentally call close_netns() on the nstoken
> returned by this function.

ah. Good point. close_netns() does free the nstoken also...
yep. probably make sense to have another type for netns create/destroy which 
start/stop the monitoring automatically based on the on/off in the libpcap.list.

> 
>>
>> This will need some changes to the tests creating netns but it probably should 
>> be obvious change considering most test do "ip netns add..." and then 
>> open_netns(). It can start with the flaky test at hand first like tc_redirect.
>>
>> May be a little more changes for the test using "unshare(CLONE_NEWNET)" but 
>> should not be too bad either. This can be done only when we need to turn on 
>> libpcap to debug that test.
>>
>> Also, when the test is flaky, make it easier for people not familiar with the 
>> codes of the networking test to turn on traffic monitoring without changing 
>> the test code. May be in a libpcap.list file (in parallel to the existing 
>> DENYLIST)?
>>
>> For the tests without having its own netns, they can either move to netns 
>> (which I think it is a good thing to do) or use the 
>> traffic_monitor_start/stop() manually by changing the testing code,
>> or a better way is to ask test_progs do it for the host netns (init_netns) 
>> automatically for all tests in the libpcap.list.
> 
> Agree! I will start move some tests to netns, and use libpcap.list to
> enable them.

The tc_redirect test should be in netns already. It seems the select_reuseport 
and the sockmap_listen test, that this patchset is touching, are not in netns. I 
hope the netns migration changes should be obvious for them. Other than those 
two flaky tests, I would separate other netns moving work to another effort.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-07-26  0:24 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-23 18:24 [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee
2024-07-23 18:24 ` [PATCH bpf-next v2 1/4] selftests/bpf: Add traffic monitor functions Kui-Feng Lee
2024-07-23 22:03   ` Kui-Feng Lee
2024-07-24 15:22   ` Stanislav Fomichev
2024-07-24 15:46     ` Kui-Feng Lee
2024-07-24 19:08   ` Martin KaFai Lau
2024-07-25  1:44     ` Martin KaFai Lau
2024-07-25 22:47     ` Kui-Feng Lee
2024-07-26  0:23       ` Martin KaFai Lau
2024-07-23 18:24 ` [PATCH bpf-next v2 2/4] selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime Kui-Feng Lee
2024-07-24  8:36   ` Geliang Tang
2024-07-24 16:24     ` Kui-Feng Lee
2024-07-24 15:26   ` Stanislav Fomichev
2024-07-24 18:04     ` Kui-Feng Lee
2024-07-24 22:13       ` Stanislav Fomichev
2024-07-23 18:24 ` [PATCH bpf-next v2 3/4] selftests/bpf: Monitor traffic for sockmap_listen Kui-Feng Lee
2024-07-24  9:32   ` Geliang Tang
2024-07-24 16:24     ` Kui-Feng Lee
2024-07-24 18:11       ` Andrii Nakryiko
2024-07-23 18:24 ` [PATCH bpf-next v2 4/4] selftests/bpf: Monitor traffic for select_reuseport Kui-Feng Lee
2024-07-24  9:33   ` Geliang Tang
2024-07-23 22:01 ` [PATCH bpf-next v2 0/4] monitor network traffic for flaky test cases Kui-Feng Lee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox