* [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2
@ 2023-06-27  1:39 Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test" Geliang Tang
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

- Add time metrics for sched tests.
- Add bpf_stale and bpf_burst schedulers.

Geliang Tang (10):
  Squash to "selftests/bpf: Add bpf_first test"
  selftests/bpf: Add bpf_first test
  Squash to "selftests/bpf: Add bpf_bkup test"
  Squash to "selftests/bpf: Add bpf_rr test"
  Squash to "selftests/bpf: Add bpf_red test"
  selftests/bpf: Add bpf_stale scheduler
  selftests/bpf: Add bpf_stale test
  bpf: Export more bpf_burst related functions
  selftests/bpf: Add bpf_burst scheduler
  selftests/bpf: Add bpf_burst test

 net/mptcp/bpf.c                               |  16 ++
 net/mptcp/protocol.c                          |   4 +-
 net/mptcp/protocol.h                          |   4 +
 tools/testing/selftests/bpf/bpf_tcp_helpers.h |   7 +-
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 121 ++++++++++-
 .../selftests/bpf/progs/mptcp_bpf_burst.c     | 205 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_bpf_stale.c     |  65 ++++++
 7 files changed, 414 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c

-- 
2.35.3



* [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test"
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-07-07 17:45   ` Mat Martineau
  2023-06-27  1:39 ` [PATCH mptcp-next 02/10] selftests/bpf: Add bpf_first test Geliang Tang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

'''
selftests/bpf: Add bpf scheduler test

This patch extends the MPTCP test base to support MPTCP packet scheduler
tests. Add a new test to use the default in-kernel scheduler.

In the new helper sched_init(), add two veth net devices to simulate the
multiple addresses case. Use 'ip mptcp endpoint' command to add the new
endpoint ADDR_2 to PM netlink. Use sysctl to set net.mptcp.scheduler to
use the given sched.

Invoke start_mptcp_server() to start the server on ADDR_1, and invoke
connect_to_fd() to connect with the server from the client. Then invoke
send_data() to send data.

Some code in send_data() is from prog_tests/bpf_tcp_ca.c.

Add time metrics for the BPF tests to compare the performance of each
scheduler. Running prog_tests with the '-v' option prints the running
time of each test.

Use the new helper has_bytes_sent() to check the bytes_sent field of 'ss'
output after send_data() to make sure no data has been sent on ADDR_2.
All data has been sent on the first subflow.

Invoke the new helper sched_cleanup() to set back net.mptcp.scheduler to
default, flush all mptcp endpoints, and delete the veth net devices.
'''

Note: This commit should be inserted between "selftests/bpf: add two mptcp
netns helpers" and "selftests/bpf: Add bpf_first scheduler":

 selftests/bpf: add two mptcp netns helpers
 selftests/bpf: Add bpf scheduler test
 selftests/bpf: Add bpf_first scheduler
 selftests/bpf: Add bpf_first test

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 43 +++++++++----------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index a968641cc94a..39d95c6a18e3 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -3,10 +3,10 @@
 /* Copyright (c) 2022, SUSE. */
 
 #include <test_progs.h>
+#include <time.h>
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
 #include "mptcp_sock.skel.h"
-#include "mptcp_bpf_first.skel.h"
 #include "mptcp_bpf_bkup.skel.h"
 #include "mptcp_bpf_rr.skel.h"
 #include "mptcp_bpf_red.skel.h"
@@ -247,15 +247,19 @@ static void *server(void *arg)
 	return NULL;
 }
 
-static void send_data(int lfd, int fd)
+static void send_data(int lfd, int fd, char *msg)
 {
 	ssize_t nr_recv = 0, bytes = 0;
+	struct timespec start, end;
+	unsigned int delta_ms;
 	pthread_t srv_thread;
 	void *thread_ret;
 	char batch[1500];
 	int err;
 
 	WRITE_ONCE(stop, 0);
+	if (clock_gettime(CLOCK_MONOTONIC, &start) < 0)
+		return;
 
 	err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd);
 	if (CHECK(err != 0, "pthread_create", "err:%d errno:%d\n", err, errno))
@@ -272,9 +276,16 @@ static void send_data(int lfd, int fd)
 		bytes += nr_recv;
 	}
 
+	if (clock_gettime(CLOCK_MONOTONIC, &end) < 0)
+		return;
+
+	delta_ms = (end.tv_sec - start.tv_sec) * 1000 + (end.tv_nsec - start.tv_nsec) / 1000000;
+
 	CHECK(bytes != total_bytes, "recv", "%zd != %u nr_recv:%zd errno:%d\n",
 	      bytes, total_bytes, nr_recv, errno);
 
+	printf("%s: %u ms\n", msg, delta_ms);
+
 	WRITE_ONCE(stop, 1);
 
 	pthread_join(srv_thread, &thread_ret);
@@ -315,39 +326,25 @@ static int has_bytes_sent(char *addr)
 	return system(cmd);
 }
 
-static void test_first(void)
+static void test_default(void)
 {
-	struct mptcp_bpf_first *first_skel;
 	int server_fd, client_fd;
 	struct nstoken *nstoken;
-	struct bpf_link *link;
-
-	first_skel = mptcp_bpf_first__open_and_load();
-	if (!ASSERT_OK_PTR(first_skel, "bpf_first__open_and_load"))
-		return;
 
-	link = bpf_map__attach_struct_ops(first_skel->maps.first);
-	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
-		mptcp_bpf_first__destroy(first_skel);
-		return;
-	}
-
-	nstoken = sched_init("subflow", "bpf_first");
-	if (!ASSERT_OK_PTR(nstoken, "sched_init:bpf_first"))
+	nstoken = sched_init("subflow", "default");
+	if (!ASSERT_OK_PTR(nstoken, "sched_init:default"))
 		goto fail;
 	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
 	client_fd = connect_to_fd(server_fd, 0);
 
-	send_data(server_fd, client_fd);
+	send_data(server_fd, client_fd, "default");
 	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr_1");
-	ASSERT_GT(has_bytes_sent(ADDR_2), 0, "has_bytes_sent addr_2");
+	ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr_2");
 
 	close(client_fd);
 	close(server_fd);
 fail:
 	cleanup_netns(nstoken);
-	bpf_link__destroy(link);
-	mptcp_bpf_first__destroy(first_skel);
 }
 
 static void test_bkup(void)
@@ -459,8 +456,8 @@ void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
 		test_base();
-	if (test__start_subtest("first"))
-		test_first();
+	if (test__start_subtest("default"))
+		test_default();
 	if (test__start_subtest("bkup"))
 		test_bkup();
 	if (test__start_subtest("rr"))
-- 
2.35.3



* [PATCH mptcp-next 02/10] selftests/bpf: Add bpf_first test
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test" Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 03/10] Squash to "selftests/bpf: Add bpf_bkup test" Geliang Tang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch adds the bpf_first scheduler test, test_first(). It uses sysctl
to set net.mptcp.scheduler to this scheduler, adds two veth net devices to
simulate the multiple-address case, and uses the 'ip mptcp endpoint'
command to add the new endpoint ADDR_2 to PM netlink. After sending data,
it checks the bytes_sent field of the 'ss' output to make sure data has
only been sent on the first subflow, ADDR_1.

Note: This commit should be placed after the commit "selftests/bpf: Add
bpf_first scheduler":

 selftests/bpf: add two mptcp netns helpers
 selftests/bpf: Add bpf scheduler test
 selftests/bpf: Add bpf_first scheduler
 selftests/bpf: Add bpf_first test

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 39d95c6a18e3..21b7ac6d72fd 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -7,6 +7,7 @@
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
 #include "mptcp_sock.skel.h"
+#include "mptcp_bpf_first.skel.h"
 #include "mptcp_bpf_bkup.skel.h"
 #include "mptcp_bpf_rr.skel.h"
 #include "mptcp_bpf_red.skel.h"
@@ -347,6 +348,41 @@ static void test_default(void)
 	cleanup_netns(nstoken);
 }
 
+static void test_first(void)
+{
+	struct mptcp_bpf_first *first_skel;
+	int server_fd, client_fd;
+	struct nstoken *nstoken;
+	struct bpf_link *link;
+
+	first_skel = mptcp_bpf_first__open_and_load();
+	if (!ASSERT_OK_PTR(first_skel, "bpf_first__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(first_skel->maps.first);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		mptcp_bpf_first__destroy(first_skel);
+		return;
+	}
+
+	nstoken = sched_init("subflow", "bpf_first");
+	if (!ASSERT_OK_PTR(nstoken, "sched_init:bpf_first"))
+		goto fail;
+	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
+	client_fd = connect_to_fd(server_fd, 0);
+
+	send_data(server_fd, client_fd, "bpf_first");
+	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr_1");
+	ASSERT_GT(has_bytes_sent(ADDR_2), 0, "has_bytes_sent addr_2");
+
+	close(client_fd);
+	close(server_fd);
+fail:
+	cleanup_netns(nstoken);
+	bpf_link__destroy(link);
+	mptcp_bpf_first__destroy(first_skel);
+}
+
 static void test_bkup(void)
 {
 	struct mptcp_bpf_bkup *bkup_skel;
@@ -458,6 +494,8 @@ void test_mptcp(void)
 		test_base();
 	if (test__start_subtest("default"))
 		test_default();
+	if (test__start_subtest("first"))
+		test_first();
 	if (test__start_subtest("bkup"))
 		test_bkup();
 	if (test__start_subtest("rr"))
-- 
2.35.3



* [PATCH mptcp-next 03/10] Squash to "selftests/bpf: Add bpf_bkup test"
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test" Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 02/10] selftests/bpf: Add bpf_first test Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 04/10] Squash to "selftests/bpf: Add bpf_rr test" Geliang Tang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

Update the send_data() call to pass the scheduler name as the new msg
argument.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 21b7ac6d72fd..f58b83ebe52b 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -406,7 +406,7 @@ static void test_bkup(void)
 	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
 	client_fd = connect_to_fd(server_fd, 0);
 
-	send_data(server_fd, client_fd);
+	send_data(server_fd, client_fd, "bpf_bkup");
 	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr_1");
 	ASSERT_GT(has_bytes_sent(ADDR_2), 0, "has_bytes_sent addr_2");
 
-- 
2.35.3



* [PATCH mptcp-next 04/10] Squash to "selftests/bpf: Add bpf_rr test"
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (2 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 03/10] Squash to "selftests/bpf: Add bpf_bkup test" Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 05/10] Squash to "selftests/bpf: Add bpf_red test" Geliang Tang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

Update the send_data() call to pass the scheduler name as the new msg
argument.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index f58b83ebe52b..0da7f83fdaea 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -441,7 +441,7 @@ static void test_rr(void)
 	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
 	client_fd = connect_to_fd(server_fd, 0);
 
-	send_data(server_fd, client_fd);
+	send_data(server_fd, client_fd, "bpf_rr");
 	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr 1");
 	ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr 2");
 
-- 
2.35.3



* [PATCH mptcp-next 05/10] Squash to "selftests/bpf: Add bpf_red test"
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (3 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 04/10] Squash to "selftests/bpf: Add bpf_rr test" Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-06-27  1:39 ` [PATCH mptcp-next 06/10] selftests/bpf: Add bpf_stale scheduler Geliang Tang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

Update the send_data() call to pass the scheduler name as the new msg
argument.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 0da7f83fdaea..4b346eeaf8e2 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -476,7 +476,7 @@ static void test_red(void)
 	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
 	client_fd = connect_to_fd(server_fd, 0);
 
-	send_data(server_fd, client_fd);
+	send_data(server_fd, client_fd, "bpf_red");
 	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr 1");
 	ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr 2");
 
-- 
2.35.3



* [PATCH mptcp-next 06/10] selftests/bpf: Add bpf_stale scheduler
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (4 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 05/10] Squash to "selftests/bpf: Add bpf_red test" Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-07-07 17:49   ` Mat Martineau
  2023-06-27  1:39 ` [PATCH mptcp-next 07/10] selftests/bpf: Add bpf_stale test Geliang Tang
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch implements a BPF MPTCP scheduler, named bpf_stale, that sets
the stale flag on a subflow. The flag is set in bpf_stale_data_init() and
checked in bpf_stale_get_subflow().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 tools/testing/selftests/bpf/bpf_tcp_helpers.h |  3 +-
 .../selftests/bpf/progs/mptcp_bpf_stale.c     | 65 +++++++++++++++++++
 2 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c

diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index 945dd46c98c0..c749940c9103 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -234,7 +234,8 @@ extern void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked) __ksym;
 #define MPTCP_SUBFLOWS_MAX	8
 
 struct mptcp_subflow_context {
-	__u32	backup : 1;
+	__u32	backup : 1,
+		stale : 1;
 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
 } __attribute__((preserve_access_index));
 
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c
new file mode 100644
index 000000000000..8ef0c71a6b37
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023, SUSE. */
+
+#include <linux/bpf.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+static void mptcp_subflow_set_stale(struct mptcp_subflow_context *subflow,
+				    int stale)
+{
+	subflow->stale = stale;
+}
+
+SEC("struct_ops/mptcp_sched_stale_init")
+void BPF_PROG(mptcp_sched_stale_init, struct mptcp_sock *msk)
+{
+}
+
+SEC("struct_ops/mptcp_sched_stale_release")
+void BPF_PROG(mptcp_sched_stale_release, struct mptcp_sock *msk)
+{
+}
+
+void BPF_STRUCT_OPS(bpf_stale_data_init, struct mptcp_sock *msk,
+		    struct mptcp_sched_data *data)
+{
+	struct mptcp_subflow_context *subflow;
+
+	mptcp_sched_data_set_contexts(msk, data);
+	subflow = mptcp_subflow_ctx_by_pos(data, 1);
+	if (subflow)
+		mptcp_subflow_set_stale(subflow, 1);
+}
+
+int BPF_STRUCT_OPS(bpf_stale_get_subflow, struct mptcp_sock *msk,
+		   const struct mptcp_sched_data *data)
+{
+	int nr = 0;
+
+	for (int i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX; i++) {
+		struct mptcp_subflow_context *subflow;
+
+		subflow = mptcp_subflow_ctx_by_pos(data, i);
+		if (!subflow)
+			break;
+
+		if (!BPF_CORE_READ_BITFIELD_PROBED(subflow, stale))
+			break;
+
+		nr = i;
+	}
+
+	mptcp_subflow_set_scheduled(mptcp_subflow_ctx_by_pos(data, nr), true);
+	return 0;
+}
+
+SEC(".struct_ops")
+struct mptcp_sched_ops stale = {
+	.init		= (void *)mptcp_sched_stale_init,
+	.release	= (void *)mptcp_sched_stale_release,
+	.data_init	= (void *)bpf_stale_data_init,
+	.get_subflow	= (void *)bpf_stale_get_subflow,
+	.name		= "bpf_stale",
+};
-- 
2.35.3



* [PATCH mptcp-next 07/10] selftests/bpf: Add bpf_stale test
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (5 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 06/10] selftests/bpf: Add bpf_stale scheduler Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-07-07 17:54   ` Mat Martineau
  2023-06-27  1:39 ` [PATCH mptcp-next 08/10] bpf: Export more bpf_burst related functions Geliang Tang
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch adds the bpf_stale scheduler test, test_stale(). It uses sysctl
to set net.mptcp.scheduler to this scheduler, adds two veth net devices to
simulate the multiple-address case, and uses the 'ip mptcp endpoint'
command to add the new endpoint ADDR_2 to PM netlink. After sending data,
it checks the bytes_sent field of the 'ss' output to make sure data has
only been sent on ADDR_1, since ADDR_2 is marked stale.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 4b346eeaf8e2..851ea32dc1d0 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -11,6 +11,7 @@
 #include "mptcp_bpf_bkup.skel.h"
 #include "mptcp_bpf_rr.skel.h"
 #include "mptcp_bpf_red.skel.h"
+#include "mptcp_bpf_stale.skel.h"
 
 char NS_TEST[32];
 
@@ -488,6 +489,41 @@ static void test_red(void)
 	mptcp_bpf_red__destroy(red_skel);
 }
 
+static void test_stale(void)
+{
+	struct mptcp_bpf_stale *stale_skel;
+	int server_fd, client_fd;
+	struct nstoken *nstoken;
+	struct bpf_link *link;
+
+	stale_skel = mptcp_bpf_stale__open_and_load();
+	if (!ASSERT_OK_PTR(stale_skel, "bpf_stale__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(stale_skel->maps.stale);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		mptcp_bpf_stale__destroy(stale_skel);
+		return;
+	}
+
+	nstoken = sched_init("subflow", "bpf_stale");
+	if (!ASSERT_OK_PTR(nstoken, "sched_init:bpf_stale"))
+		goto fail;
+	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
+	client_fd = connect_to_fd(server_fd, 0);
+
+	send_data(server_fd, client_fd, "bpf_stale");
+	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr_1");
+	ASSERT_GT(has_bytes_sent(ADDR_2), 0, "has_bytes_sent addr_2");
+
+	close(client_fd);
+	close(server_fd);
+fail:
+	cleanup_netns(nstoken);
+	bpf_link__destroy(link);
+	mptcp_bpf_stale__destroy(stale_skel);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -502,4 +538,6 @@ void test_mptcp(void)
 		test_rr();
 	if (test__start_subtest("red"))
 		test_red();
+	if (test__start_subtest("stale"))
+		test_stale();
 }
-- 
2.35.3



* [PATCH mptcp-next 08/10] bpf: Export more bpf_burst related functions
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (6 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 07/10] selftests/bpf: Add bpf_stale test Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-07-07 18:01   ` Mat Martineau
  2023-06-27  1:39 ` [PATCH mptcp-next 09/10] selftests/bpf: Add bpf_burst scheduler Geliang Tang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

The bpf_burst scheduler needs sk_stream_memory_free() and
tcp_rtx_and_write_queues_empty() in the BPF context. Both are inline
functions, so this patch adds a wrapper for each and exports the wrappers
to the BPF context instead.

Add more bpf_burst related functions into bpf_mptcp_sched_kfunc_set to make
sure these helpers can be accessed from the BPF context.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/bpf.c      | 16 ++++++++++++++++
 net/mptcp/protocol.c |  4 ++--
 net/mptcp/protocol.h |  4 ++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index c580add9c7f1..a1c85605ed39 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -171,10 +171,26 @@ struct bpf_struct_ops bpf_mptcp_sched_ops = {
 	.name		= "mptcp_sched_ops",
 };
 
+bool bpf_sk_stream_memory_free(const struct sock *sk)
+{
+	return sk_stream_memory_free(sk);
+}
+
+bool bpf_tcp_rtx_and_write_queues_empty(const struct sock *sk)
+{
+	return tcp_rtx_and_write_queues_empty(sk);
+}
+
 BTF_SET8_START(bpf_mptcp_sched_kfunc_ids)
 BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
 BTF_ID_FLAGS(func, mptcp_sched_data_set_contexts)
 BTF_ID_FLAGS(func, mptcp_subflow_ctx_by_pos)
+BTF_ID_FLAGS(func, mptcp_subflow_active)
+BTF_ID_FLAGS(func, mptcp_set_timeout)
+BTF_ID_FLAGS(func, mptcp_wnd_end)
+BTF_ID_FLAGS(func, bpf_sk_stream_memory_free)
+BTF_ID_FLAGS(func, bpf_tcp_rtx_and_write_queues_empty)
+BTF_ID_FLAGS(func, mptcp_pm_subflow_chk_stale)
 BTF_SET8_END(bpf_mptcp_sched_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_sched_kfunc_set = {
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 5f9f046b2124..84a82967b009 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -50,7 +50,7 @@ DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
 static struct net_device mptcp_napi_dev;
 
 /* Returns end sequence number of the receiver's advertised window */
-static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
+u64 mptcp_wnd_end(const struct mptcp_sock *msk)
 {
 	return READ_ONCE(msk->wnd_end);
 }
@@ -497,7 +497,7 @@ static long mptcp_timeout_from_subflow(const struct mptcp_subflow_context *subfl
 	       inet_csk(ssk)->icsk_timeout - jiffies : 0;
 }
 
-static void mptcp_set_timeout(struct sock *sk)
+void mptcp_set_timeout(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
 	long tout = 0;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index bb4d50c8c398..58a634fc2fcc 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -636,6 +636,10 @@ void __mptcp_subflow_send_ack(struct sock *ssk);
 void mptcp_subflow_reset(struct sock *ssk);
 void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk);
 void mptcp_sock_graft(struct sock *sk, struct socket *parent);
+u64 mptcp_wnd_end(const struct mptcp_sock *msk);
+void mptcp_set_timeout(struct sock *sk);
+bool bpf_sk_stream_memory_free(const struct sock *sk);
+bool bpf_tcp_rtx_and_write_queues_empty(const struct sock *sk);
 struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk);
 bool __mptcp_close(struct sock *sk, long timeout);
 void mptcp_cancel_work(struct sock *sk);
-- 
2.35.3



* [PATCH mptcp-next 09/10] selftests/bpf: Add bpf_burst scheduler
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (7 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 08/10] bpf: Export more bpf_burst related functions Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-07-07 18:09   ` Mat Martineau
  2023-06-27  1:39 ` [PATCH mptcp-next 10/10] selftests/bpf: Add bpf_burst test Geliang Tang
  2023-07-12 16:45 ` [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Matthieu Baerts
  10 siblings, 1 reply; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch implements the burst BPF MPTCP scheduler, named bpf_burst,
which mirrors the default scheduler in protocol.c. bpf_burst_get_send()
uses the same logic as mptcp_subflow_get_send(), and
bpf_burst_get_retrans() uses the same logic as
mptcp_subflow_get_retrans().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 tools/testing/selftests/bpf/bpf_tcp_helpers.h |   4 +
 .../selftests/bpf/progs/mptcp_bpf_burst.c     | 205 ++++++++++++++++++
 2 files changed, 209 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c

diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index c749940c9103..c1d7963c3bc8 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -36,6 +36,7 @@ enum sk_pacing {
 struct sock {
 	struct sock_common	__sk_common;
 #define sk_state		__sk_common.skc_state
+	int			sk_wmem_queued;
 	unsigned long		sk_pacing_rate;
 	__u32			sk_pacing_status; /* see enum sk_pacing */
 } __attribute__((preserve_access_index));
@@ -234,8 +235,10 @@ extern void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked) __ksym;
 #define MPTCP_SUBFLOWS_MAX	8
 
 struct mptcp_subflow_context {
+	unsigned long avg_pacing_rate;
 	__u32	backup : 1,
 		stale : 1;
+	__u8	stale_count;
 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
 } __attribute__((preserve_access_index));
 
@@ -260,6 +263,7 @@ struct mptcp_sched_ops {
 struct mptcp_sock {
 	struct inet_connection_sock	sk;
 
+	__u64		snd_nxt;
 	__u32		token;
 	struct sock	*first;
 	char		ca_name[TCP_CA_NAME_MAX];
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
new file mode 100644
index 000000000000..1886e2f7aca4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023, SUSE. */
+
+#include <linux/bpf.h>
+#include <limits.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct mptcp_burst_storage {
+	int snd_burst;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct mptcp_burst_storage);
+} mptcp_burst_map SEC(".maps");
+
+#define MPTCP_SEND_BURST_SIZE	65428
+
+struct subflow_send_info {
+	__u8 subflow_id;
+	__u64 linger_time;
+};
+
+static inline __u64 div_u64_rem(__u64 dividend, __u32 divisor, __u32 *remainder)
+{
+	*remainder = dividend % divisor;
+	return dividend / divisor;
+}
+
+static inline __u64 div_u64(__u64 dividend, __u32 divisor)
+{
+	__u32 remainder;
+
+	return div_u64_rem(dividend, divisor, &remainder);
+}
+
+extern bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) __ksym;
+extern void mptcp_set_timeout(struct sock *sk) __ksym;
+extern __u64 mptcp_wnd_end(const struct mptcp_sock *msk) __ksym;
+extern bool bpf_sk_stream_memory_free(const struct sock *sk) __ksym;
+extern bool bpf_tcp_rtx_and_write_queues_empty(const struct sock *sk) __ksym;
+extern void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk) __ksym;
+
+#define SSK_MODE_ACTIVE	0
+#define SSK_MODE_BACKUP	1
+#define SSK_MODE_MAX	2
+
+SEC("struct_ops/mptcp_sched_burst_init")
+void BPF_PROG(mptcp_sched_burst_init, struct mptcp_sock *msk)
+{
+}
+
+SEC("struct_ops/mptcp_sched_burst_release")
+void BPF_PROG(mptcp_sched_burst_release, struct mptcp_sock *msk)
+{
+	bpf_sk_storage_delete(&mptcp_burst_map, msk);
+}
+
+void BPF_STRUCT_OPS(bpf_burst_data_init, struct mptcp_sock *msk,
+		    struct mptcp_sched_data *data)
+{
+	mptcp_sched_data_set_contexts(msk, data);
+}
+
+static int bpf_burst_get_send(struct mptcp_sock *msk,
+			      const struct mptcp_sched_data *data)
+{
+	struct subflow_send_info send_info[SSK_MODE_MAX];
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	struct mptcp_burst_storage *ptr;
+	__u32 pace, burst, wmem;
+	__u64 linger_time;
+	struct sock *ssk;
+	int i;
+
+	/* pick the subflow with the lower wmem/wspace ratio */
+	for (i = 0; i < SSK_MODE_MAX; ++i) {
+		send_info[i].subflow_id = MPTCP_SUBFLOWS_MAX;
+		send_info[i].linger_time = -1;
+	}
+
+	for (i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX; i++) {
+		subflow = mptcp_subflow_ctx_by_pos(data, i);
+		if (!subflow)
+			break;
+
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		if (!mptcp_subflow_active(subflow))
+			continue;
+
+		pace = subflow->avg_pacing_rate;
+		if (!pace) {
+			/* init pacing rate from socket */
+			subflow->avg_pacing_rate = ssk->sk_pacing_rate;
+			pace = subflow->avg_pacing_rate;
+			if (!pace)
+				continue;
+		}
+
+		linger_time = div_u64((__u64)ssk->sk_wmem_queued << 32, pace);
+		if (linger_time < send_info[subflow->backup].linger_time) {
+			send_info[subflow->backup].subflow_id = i;
+			send_info[subflow->backup].linger_time = linger_time;
+		}
+	}
+	mptcp_set_timeout(sk);
+
+	/* pick the best backup if no other subflow is active */
+	if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
+		send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
+
+	subflow = mptcp_subflow_ctx_by_pos(data, send_info[SSK_MODE_ACTIVE].subflow_id);
+	if (!subflow)
+		return -1;
+	ssk = mptcp_subflow_tcp_sock(subflow);
+	if (!ssk || !bpf_sk_stream_memory_free(ssk))
+		return -1;
+
+	burst = min(MPTCP_SEND_BURST_SIZE, mptcp_wnd_end(msk) - msk->snd_nxt);
+	wmem = ssk->sk_wmem_queued;
+	if (!burst)
+		goto out;
+
+	subflow->avg_pacing_rate = div_u64((__u64)subflow->avg_pacing_rate * wmem +
+					   ssk->sk_pacing_rate * burst,
+					   burst + wmem);
+	ptr = bpf_sk_storage_get(&mptcp_burst_map, msk, 0,
+				 BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (ptr)
+		ptr->snd_burst = burst;
+
+out:
+	mptcp_subflow_set_scheduled(subflow, true);
+	return 0;
+}
+
+static int bpf_burst_get_retrans(struct mptcp_sock *msk,
+				 const struct mptcp_sched_data *data)
+{
+	int backup = MPTCP_SUBFLOWS_MAX, pick = MPTCP_SUBFLOWS_MAX, subflow_id;
+	struct mptcp_subflow_context *subflow;
+	int min_stale_count = INT_MAX;
+	struct sock *ssk;
+
+	for (int i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX; i++) {
+		subflow = mptcp_subflow_ctx_by_pos(data, i);
+		if (!subflow)
+			break;
+
+		if (!mptcp_subflow_active(subflow))
+			continue;
+
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		/* still data outstanding at TCP level? skip this */
+		if (!bpf_tcp_rtx_and_write_queues_empty(ssk)) {
+			mptcp_pm_subflow_chk_stale(msk, ssk);
+			min_stale_count = min(min_stale_count, subflow->stale_count);
+			continue;
+		}
+
+		if (subflow->backup) {
+			if (backup == MPTCP_SUBFLOWS_MAX)
+				backup = i;
+			continue;
+		}
+
+		if (pick == MPTCP_SUBFLOWS_MAX)
+			pick = i;
+	}
+
+	if (pick < MPTCP_SUBFLOWS_MAX) {
+		subflow_id = pick;
+		goto out;
+	}
+	subflow_id = min_stale_count > 1 ? backup : MPTCP_SUBFLOWS_MAX;
+
+out:
+	subflow = mptcp_subflow_ctx_by_pos(data, subflow_id);
+	if (!subflow)
+		return -1;
+	mptcp_subflow_set_scheduled(subflow, true);
+	return 0;
+}
+
+int BPF_STRUCT_OPS(bpf_burst_get_subflow, struct mptcp_sock *msk,
+		   const struct mptcp_sched_data *data)
+{
+	if (data->reinject)
+		return bpf_burst_get_retrans(msk, data);
+	return bpf_burst_get_send(msk, data);
+}
+
+SEC(".struct_ops")
+struct mptcp_sched_ops burst = {
+	.init		= (void *)mptcp_sched_burst_init,
+	.release	= (void *)mptcp_sched_burst_release,
+	.data_init	= (void *)bpf_burst_data_init,
+	.get_subflow	= (void *)bpf_burst_get_subflow,
+	.name		= "bpf_burst",
+};
-- 
2.35.3



* [PATCH mptcp-next 10/10] selftests/bpf: Add bpf_burst test
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (8 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 09/10] selftests/bpf: Add bpf_burst scheduler Geliang Tang
@ 2023-06-27  1:39 ` Geliang Tang
  2023-07-07 19:02   ` selftests/bpf: Add bpf_burst test: Tests Results MPTCP CI
  2023-07-12 16:45 ` [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Matthieu Baerts
  10 siblings, 1 reply; 18+ messages in thread
From: Geliang Tang @ 2023-06-27  1:39 UTC (permalink / raw)
  To: mptcp; +Cc: Geliang Tang

This patch adds the burst BPF MPTCP scheduler test: test_burst(). Use
sysctl to set net.mptcp.scheduler to use this sched. Add two veth net
devices to simulate the multiple addresses case. Use 'ip mptcp endpoint'
command to add the new endpoint ADDR_2 to PM netlink. Send data and check
bytes_sent of 'ss' output after it to make sure the data has been sent
on both net devices.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 851ea32dc1d0..43c0645ddc2a 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -12,6 +12,7 @@
 #include "mptcp_bpf_rr.skel.h"
 #include "mptcp_bpf_red.skel.h"
 #include "mptcp_bpf_stale.skel.h"
+#include "mptcp_bpf_burst.skel.h"
 
 char NS_TEST[32];
 
@@ -524,6 +525,41 @@ static void test_stale(void)
 	mptcp_bpf_stale__destroy(stale_skel);
 }
 
+static void test_burst(void)
+{
+	struct mptcp_bpf_burst *burst_skel;
+	int server_fd, client_fd;
+	struct nstoken *nstoken;
+	struct bpf_link *link;
+
+	burst_skel = mptcp_bpf_burst__open_and_load();
+	if (!ASSERT_OK_PTR(burst_skel, "bpf_burst__open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(burst_skel->maps.burst);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
+		mptcp_bpf_burst__destroy(burst_skel);
+		return;
+	}
+
+	nstoken = sched_init("subflow", "bpf_burst");
+	if (!ASSERT_OK_PTR(nstoken, "sched_init:bpf_burst"))
+		goto fail;
+	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
+	client_fd = connect_to_fd(server_fd, 0);
+
+	send_data(server_fd, client_fd, "bpf_burst");
+	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr 1");
+	ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr 2");
+
+	close(client_fd);
+	close(server_fd);
+fail:
+	cleanup_netns(nstoken);
+	bpf_link__destroy(link);
+	mptcp_bpf_burst__destroy(burst_skel);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -540,4 +576,6 @@ void test_mptcp(void)
 		test_red();
 	if (test__start_subtest("stale"))
 		test_stale();
+	if (test__start_subtest("burst"))
+		test_burst();
 }
-- 
2.35.3



* Re: [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test"
  2023-06-27  1:39 ` [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test" Geliang Tang
@ 2023-07-07 17:45   ` Mat Martineau
  0 siblings, 0 replies; 18+ messages in thread
From: Mat Martineau @ 2023-07-07 17:45 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp


On Tue, 27 Jun 2023, Geliang Tang wrote:

> '''
> selftests/bpf: Add bpf scheduler test
>
> This patch expends the MPTCP test base to support MPTCP packet scheduler

I think you mean "extends" rather than "expends" here, but other than that 
patches 1-5 are ok to merge:

Reviewed-by: Mat Martineau <martineau@kernel.org>



> tests. Add a new test to use the default in-kernel scheduler.
>
> In the new helper sched_init(), add two veth net devices to simulate the
> multiple addresses case. Use 'ip mptcp endpoint' command to add the new
> endpoint ADDR_2 to PM netlink. Use sysctl to set net.mptcp.scheduler to
> use the given sched.
>
> Invoke start_mptcp_server() to start the server on ADDR_1, and invoke
> connect_to_fd() to connect with the server from the client. Then invoke
> send_data() to send data.
>
> Some code in send_data() is from prog_tests/bpf_tcp_ca.c.
>
> Add time metrics for BPF tests to compare the performance of each
> scheduler. Running prog_tests with the '-v' option prints out the
> running time of each test.
>
> Use the new helper has_bytes_sent() to check the bytes_sent field of 'ss'
> output after send_data() to make sure no data has been sent on ADDR_2.
> All data has been sent on the first subflow.
>
> Invoke the new helper sched_cleanup() to set back net.mptcp.scheduler to
> default, flush all mptcp endpoints, and delete the veth net devices.
> '''
>
> Note: This commit should be inserted between "selftests/bpf: add two mptcp
> netns helpers" and "selftests/bpf: Add bpf_first scheduler":
>
> selftests/bpf: add two mptcp netns helpers
> selftests/bpf: Add bpf scheduler test
> selftests/bpf: Add bpf_first scheduler
> selftests/bpf: Add bpf_first test
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
> .../testing/selftests/bpf/prog_tests/mptcp.c  | 43 +++++++++----------
> 1 file changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> index a968641cc94a..39d95c6a18e3 100644
> --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> @@ -3,10 +3,10 @@
> /* Copyright (c) 2022, SUSE. */
>
> #include <test_progs.h>
> +#include <time.h>
> #include "cgroup_helpers.h"
> #include "network_helpers.h"
> #include "mptcp_sock.skel.h"
> -#include "mptcp_bpf_first.skel.h"
> #include "mptcp_bpf_bkup.skel.h"
> #include "mptcp_bpf_rr.skel.h"
> #include "mptcp_bpf_red.skel.h"
> @@ -247,15 +247,19 @@ static void *server(void *arg)
> 	return NULL;
> }
>
> -static void send_data(int lfd, int fd)
> +static void send_data(int lfd, int fd, char *msg)
> {
> 	ssize_t nr_recv = 0, bytes = 0;
> +	struct timespec start, end;
> +	unsigned int delta_ms;
> 	pthread_t srv_thread;
> 	void *thread_ret;
> 	char batch[1500];
> 	int err;
>
> 	WRITE_ONCE(stop, 0);
> +	if (clock_gettime(CLOCK_MONOTONIC, &start) < 0)
> +		return;
>
> 	err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd);
> 	if (CHECK(err != 0, "pthread_create", "err:%d errno:%d\n", err, errno))
> @@ -272,9 +276,16 @@ static void send_data(int lfd, int fd)
> 		bytes += nr_recv;
> 	}
>
> +	if (clock_gettime(CLOCK_MONOTONIC, &end) < 0)
> +		return;
> +
> +	delta_ms = (end.tv_sec - start.tv_sec) * 1000 + (end.tv_nsec - start.tv_nsec) / 1000000;
> +
> 	CHECK(bytes != total_bytes, "recv", "%zd != %u nr_recv:%zd errno:%d\n",
> 	      bytes, total_bytes, nr_recv, errno);
>
> +	printf("%s: %u ms\n", msg, delta_ms);
> +
> 	WRITE_ONCE(stop, 1);
>
> 	pthread_join(srv_thread, &thread_ret);
> @@ -315,39 +326,25 @@ static int has_bytes_sent(char *addr)
> 	return system(cmd);
> }
>
> -static void test_first(void)
> +static void test_default(void)
> {
> -	struct mptcp_bpf_first *first_skel;
> 	int server_fd, client_fd;
> 	struct nstoken *nstoken;
> -	struct bpf_link *link;
> -
> -	first_skel = mptcp_bpf_first__open_and_load();
> -	if (!ASSERT_OK_PTR(first_skel, "bpf_first__open_and_load"))
> -		return;
>
> -	link = bpf_map__attach_struct_ops(first_skel->maps.first);
> -	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
> -		mptcp_bpf_first__destroy(first_skel);
> -		return;
> -	}
> -
> -	nstoken = sched_init("subflow", "bpf_first");
> -	if (!ASSERT_OK_PTR(nstoken, "sched_init:bpf_first"))
> +	nstoken = sched_init("subflow", "default");
> +	if (!ASSERT_OK_PTR(nstoken, "sched_init:default"))
> 		goto fail;
> 	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
> 	client_fd = connect_to_fd(server_fd, 0);
>
> -	send_data(server_fd, client_fd);
> +	send_data(server_fd, client_fd, "default");
> 	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr_1");
> -	ASSERT_GT(has_bytes_sent(ADDR_2), 0, "has_bytes_sent addr_2");
> +	ASSERT_OK(has_bytes_sent(ADDR_2), "has_bytes_sent addr_2");
>
> 	close(client_fd);
> 	close(server_fd);
> fail:
> 	cleanup_netns(nstoken);
> -	bpf_link__destroy(link);
> -	mptcp_bpf_first__destroy(first_skel);
> }
>
> static void test_bkup(void)
> @@ -459,8 +456,8 @@ void test_mptcp(void)
> {
> 	if (test__start_subtest("base"))
> 		test_base();
> -	if (test__start_subtest("first"))
> -		test_first();
> +	if (test__start_subtest("default"))
> +		test_default();
> 	if (test__start_subtest("bkup"))
> 		test_bkup();
> 	if (test__start_subtest("rr"))
> -- 
> 2.35.3
>
>
>


* Re: [PATCH mptcp-next 06/10] selftests/bpf: Add bpf_stale scheduler
  2023-06-27  1:39 ` [PATCH mptcp-next 06/10] selftests/bpf: Add bpf_stale scheduler Geliang Tang
@ 2023-07-07 17:49   ` Mat Martineau
  0 siblings, 0 replies; 18+ messages in thread
From: Mat Martineau @ 2023-07-07 17:49 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

On Tue, 27 Jun 2023, Geliang Tang wrote:

> This patch implements the setting of stale flag in BPF MPTCP scheduler,
> named bpf_stale. The stale flag will be set in bpf_stale_data_init() and
> will be checked in bpf_stale_get_subflow().
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
> tools/testing/selftests/bpf/bpf_tcp_helpers.h |  3 +-
> .../selftests/bpf/progs/mptcp_bpf_stale.c     | 65 +++++++++++++++++++
> 2 files changed, 67 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
> index 945dd46c98c0..c749940c9103 100644
> --- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
> @@ -234,7 +234,8 @@ extern void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked) __ksym;
> #define MPTCP_SUBFLOWS_MAX	8
>
> struct mptcp_subflow_context {
> -	__u32	backup : 1;
> +	__u32	backup : 1,
> +		stale : 1;
> 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
> } __attribute__((preserve_access_index));
>
> diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c
> new file mode 100644
> index 000000000000..8ef0c71a6b37
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_stale.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023, SUSE. */
> +
> +#include <linux/bpf.h>
> +#include "bpf_tcp_helpers.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +static void mptcp_subflow_set_stale(struct mptcp_subflow_context *subflow,
> +				    int stale)
> +{
> +	subflow->stale = stale;
> +}
> +
> +SEC("struct_ops/mptcp_sched_stale_init")
> +void BPF_PROG(mptcp_sched_stale_init, struct mptcp_sock *msk)
> +{
> +}
> +
> +SEC("struct_ops/mptcp_sched_stale_release")
> +void BPF_PROG(mptcp_sched_stale_release, struct mptcp_sock *msk)
> +{
> +}
> +
> +void BPF_STRUCT_OPS(bpf_stale_data_init, struct mptcp_sock *msk,
> +		    struct mptcp_sched_data *data)
> +{
> +	struct mptcp_subflow_context *subflow;
> +
> +	mptcp_sched_data_set_contexts(msk, data);
> +	subflow = mptcp_subflow_ctx_by_pos(data, 1);
> +	if (subflow)
> +		mptcp_subflow_set_stale(subflow, 1);
> +}
> +
> +int BPF_STRUCT_OPS(bpf_stale_get_subflow, struct mptcp_sock *msk,
> +		   const struct mptcp_sched_data *data)
> +{
> +	int nr = 0;
> +
> +	for (int i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX; i++) {
> +		struct mptcp_subflow_context *subflow;
> +
> +		subflow = mptcp_subflow_ctx_by_pos(data, i);
> +		if (!subflow)
> +			break;
> +
> +		if (!BPF_CORE_READ_BITFIELD_PROBED(subflow, stale))
> +			break;

With these two breaks, nr could remain 0 in an error case and maybe cause
the test to incorrectly pass. Maybe better to initialize nr to -1 and
check for that error case before calling mptcp_subflow_set_scheduled
below?
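
A minimal sketch of how such a sentinel check could look, written as plain
userspace C rather than BPF so it can be compiled on its own. It is one
possible reading of the suggestion, not the patch author's code: the
struct subflow_model type, its "present" flag (standing in for a non-NULL
mptcp_subflow_ctx_by_pos() result) and pick_non_stale() are hypothetical
names, and the loop is assumed to pick the first non-stale subflow.

```c
#include <stdbool.h>

#define SUBFLOWS_MAX 8	/* mirrors MPTCP_SUBFLOWS_MAX from bpf_tcp_helpers.h */

/* Hypothetical stand-in for struct mptcp_subflow_context. */
struct subflow_model {
	bool present;	/* models a non-NULL mptcp_subflow_ctx_by_pos() result */
	bool stale;
};

/*
 * Return the index of the first present, non-stale subflow,
 * or -1 when no such subflow exists (the error case).
 */
static int pick_non_stale(const struct subflow_model *sf, int count)
{
	int nr = -1;	/* sentinel: nothing selected yet */

	for (int i = 0; i < count && i < SUBFLOWS_MAX; i++) {
		if (!sf[i].present)
			break;		/* ran past the last valid context */
		if (sf[i].stale)
			continue;	/* skip stale subflows */
		nr = i;
		break;
	}
	return nr;
}
```

A caller would then return an error instead of scheduling anything when
the result is negative, so a failed lookup can no longer be mistaken for
"subflow 0 was selected".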

- Mat

> +
> +		nr = i;
> +	}
> +
> +	mptcp_subflow_set_scheduled(mptcp_subflow_ctx_by_pos(data, nr), true);
> +	return 0;
> +}
> +
> +SEC(".struct_ops")
> +struct mptcp_sched_ops stale = {
> +	.init		= (void *)mptcp_sched_stale_init,
> +	.release	= (void *)mptcp_sched_stale_release,
> +	.data_init	= (void *)bpf_stale_data_init,
> +	.get_subflow	= (void *)bpf_stale_get_subflow,
> +	.name		= "bpf_stale",
> +};
> -- 
> 2.35.3
>
>
>


* Re: [PATCH mptcp-next 07/10] selftests/bpf: Add bpf_stale test
  2023-06-27  1:39 ` [PATCH mptcp-next 07/10] selftests/bpf: Add bpf_stale test Geliang Tang
@ 2023-07-07 17:54   ` Mat Martineau
  0 siblings, 0 replies; 18+ messages in thread
From: Mat Martineau @ 2023-07-07 17:54 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

On Tue, 27 Jun 2023, Geliang Tang wrote:

> This patch adds the bpf_stale scheduler test: test_stale(). Use sysctl to
> set net.mptcp.scheduler to use this sched. Add two veth net devices to
> simulate the multiple addresses case. Use 'ip mptcp endpoint' command to
> add the new endpoint ADDR_2 to PM netlink. Send data and check bytes_sent
> of 'ss' output after it to make sure the data has been only sent on ADDR_1
> since ADDR_2 is set as stale.
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
> .../testing/selftests/bpf/prog_tests/mptcp.c  | 38 +++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> index 4b346eeaf8e2..851ea32dc1d0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> @@ -11,6 +11,7 @@
> #include "mptcp_bpf_bkup.skel.h"
> #include "mptcp_bpf_rr.skel.h"
> #include "mptcp_bpf_red.skel.h"
> +#include "mptcp_bpf_stale.skel.h"
>
> char NS_TEST[32];
>
> @@ -488,6 +489,41 @@ static void test_red(void)
> 	mptcp_bpf_red__destroy(red_skel);
> }
>
> +static void test_stale(void)
> +{
> +	struct mptcp_bpf_stale *stale_skel;
> +	int server_fd, client_fd;
> +	struct nstoken *nstoken;
> +	struct bpf_link *link;
> +
> +	stale_skel = mptcp_bpf_stale__open_and_load();
> +	if (!ASSERT_OK_PTR(stale_skel, "bpf_stale__open_and_load"))
> +		return;
> +
> +	link = bpf_map__attach_struct_ops(stale_skel->maps.stale);
> +	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) {
> +		mptcp_bpf_stale__destroy(stale_skel);
> +		return;
> +	}
> +
> +	nstoken = sched_init("subflow", "bpf_stale");
> +	if (!ASSERT_OK_PTR(nstoken, "sched_init:bpf_stale"))
> +		goto fail;
> +	server_fd = start_mptcp_server(AF_INET, ADDR_1, 0, 0);
> +	client_fd = connect_to_fd(server_fd, 0);
> +
> +	send_data(server_fd, client_fd, "bpf_stale");
> +	ASSERT_OK(has_bytes_sent(ADDR_1), "has_bytes_sent addr_1");
> +	ASSERT_GT(has_bytes_sent(ADDR_2), 0, "has_bytes_sent addr_2");

I don't think it is guaranteed that ADDR_2 will always be at index 1 like
bpf_stale_get_subflow() expects. How about checking for an odd/even value
in the last byte of the IPv4 addresses for ADDR_1/ADDR_2 instead?
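
To illustrate the odd/even idea, here is a small userspace C sketch that
classifies a dotted-quad IPv4 string by the parity of its last byte. It is
only an illustration under stated assumptions: ipv4_last_octet_is_odd() is
a hypothetical helper, and inside the BPF scheduler the address would have
to come from the subflow's socket rather than from a string, which is not
shown here.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the last octet of the IPv4 address is odd, 0 if it is
 * even, and -1 if addr is not a valid dotted-quad IPv4 string.
 */
static int ipv4_last_octet_is_odd(const char *addr)
{
	struct in_addr in;
	uint8_t octets[4];

	if (inet_pton(AF_INET, addr, &in) != 1)
		return -1;

	/* s_addr is in network byte order, so octets[3] is the last byte */
	memcpy(octets, &in.s_addr, sizeof(octets));
	return octets[3] & 1;
}
```

With two endpoint addresses whose last bytes differ in parity, the result
identifies the endpoint regardless of where its subflow sits in the list.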

- Mat

> +
> +	close(client_fd);
> +	close(server_fd);
> +fail:
> +	cleanup_netns(nstoken);
> +	bpf_link__destroy(link);
> +	mptcp_bpf_stale__destroy(stale_skel);
> +}
> +
> void test_mptcp(void)
> {
> 	if (test__start_subtest("base"))
> @@ -502,4 +538,6 @@ void test_mptcp(void)
> 		test_rr();
> 	if (test__start_subtest("red"))
> 		test_red();
> +	if (test__start_subtest("stale"))
> +		test_stale();
> }
> -- 
> 2.35.3
>
>
>


* Re: [PATCH mptcp-next 08/10] bpf: Export more bpf_burst related functions
  2023-06-27  1:39 ` [PATCH mptcp-next 08/10] bpf: Export more bpf_burst related functions Geliang Tang
@ 2023-07-07 18:01   ` Mat Martineau
  0 siblings, 0 replies; 18+ messages in thread
From: Mat Martineau @ 2023-07-07 18:01 UTC (permalink / raw)
  To: Geliang Tang, Paolo Abeni; +Cc: mptcp

On Tue, 27 Jun 2023, Geliang Tang wrote:

> sk_stream_memory_free() and tcp_rtx_and_write_queues_empty() need to be
> exported into the BPF context for the bpf_burst scheduler. But these two
> functions are inline ones. So this patch adds two wrappers for them,
> and exports the wrappers in the BPF context.
>
> Add more bpf_burst related functions into bpf_mptcp_sched_kfunc_set to make
> sure these helpers can be accessed from the BPF context.
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
> net/mptcp/bpf.c      | 16 ++++++++++++++++
> net/mptcp/protocol.c |  4 ++--
> net/mptcp/protocol.h |  4 ++++
> 3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index c580add9c7f1..a1c85605ed39 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -171,10 +171,26 @@ struct bpf_struct_ops bpf_mptcp_sched_ops = {
> 	.name		= "mptcp_sched_ops",
> };
>
> +bool bpf_sk_stream_memory_free(const struct sock *sk)
> +{
> +	return sk_stream_memory_free(sk);
> +}
> +
> +bool bpf_tcp_rtx_and_write_queues_empty(const struct sock *sk)
> +{
> +	return tcp_rtx_and_write_queues_empty(sk);
> +}

I'm not sure the internals of TCP should be made part of the "public API"
like this (any opinion here, Paolo?). Maybe they could be renamed to
something related to the job they are doing for MPTCP, so there's more
flexibility to change their implementation and other BPF code doesn't
start depending on these for generic TCP usage.

Suggest bpf_mptcp_subflow_memory_free() and 
bpf_mptcp_subflow_queues_empty()?

- Mat

> +
> BTF_SET8_START(bpf_mptcp_sched_kfunc_ids)
> BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
> BTF_ID_FLAGS(func, mptcp_sched_data_set_contexts)
> BTF_ID_FLAGS(func, mptcp_subflow_ctx_by_pos)
> +BTF_ID_FLAGS(func, mptcp_subflow_active)
> +BTF_ID_FLAGS(func, mptcp_set_timeout)
> +BTF_ID_FLAGS(func, mptcp_wnd_end)
> +BTF_ID_FLAGS(func, bpf_sk_stream_memory_free)
> +BTF_ID_FLAGS(func, bpf_tcp_rtx_and_write_queues_empty)
> +BTF_ID_FLAGS(func, mptcp_pm_subflow_chk_stale)
> BTF_SET8_END(bpf_mptcp_sched_kfunc_ids)
>
> static const struct btf_kfunc_id_set bpf_mptcp_sched_kfunc_set = {
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 5f9f046b2124..84a82967b009 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -50,7 +50,7 @@ DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
> static struct net_device mptcp_napi_dev;
>
> /* Returns end sequence number of the receiver's advertised window */
> -static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
> +u64 mptcp_wnd_end(const struct mptcp_sock *msk)
> {
> 	return READ_ONCE(msk->wnd_end);
> }
> @@ -497,7 +497,7 @@ static long mptcp_timeout_from_subflow(const struct mptcp_subflow_context *subfl
> 	       inet_csk(ssk)->icsk_timeout - jiffies : 0;
> }
>
> -static void mptcp_set_timeout(struct sock *sk)
> +void mptcp_set_timeout(struct sock *sk)
> {
> 	struct mptcp_subflow_context *subflow;
> 	long tout = 0;
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index bb4d50c8c398..58a634fc2fcc 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -636,6 +636,10 @@ void __mptcp_subflow_send_ack(struct sock *ssk);
> void mptcp_subflow_reset(struct sock *ssk);
> void mptcp_subflow_queue_clean(struct sock *sk, struct sock *ssk);
> void mptcp_sock_graft(struct sock *sk, struct socket *parent);
> +u64 mptcp_wnd_end(const struct mptcp_sock *msk);
> +void mptcp_set_timeout(struct sock *sk);
> +bool bpf_sk_stream_memory_free(const struct sock *sk);
> +bool bpf_tcp_rtx_and_write_queues_empty(const struct sock *sk);
> struct socket *__mptcp_nmpc_socket(struct mptcp_sock *msk);
> bool __mptcp_close(struct sock *sk, long timeout);
> void mptcp_cancel_work(struct sock *sk);
> -- 
> 2.35.3
>
>
>


* Re: [PATCH mptcp-next 09/10] selftests/bpf: Add bpf_burst scheduler
  2023-06-27  1:39 ` [PATCH mptcp-next 09/10] selftests/bpf: Add bpf_burst scheduler Geliang Tang
@ 2023-07-07 18:09   ` Mat Martineau
  0 siblings, 0 replies; 18+ messages in thread
From: Mat Martineau @ 2023-07-07 18:09 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

On Tue, 27 Jun 2023, Geliang Tang wrote:

> This patch implements the burst BPF MPTCP scheduler, named bpf_burst,
> which is the default scheduler in protocol.c. bpf_burst_get_send() uses
> the same logic as mptcp_subflow_get_send() and bpf_burst_get_retrans
> uses the same logic as mptcp_subflow_get_retrans().
>
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
> tools/testing/selftests/bpf/bpf_tcp_helpers.h |   4 +
> .../selftests/bpf/progs/mptcp_bpf_burst.c     | 205 ++++++++++++++++++
> 2 files changed, 209 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
> index c749940c9103..c1d7963c3bc8 100644
> --- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
> @@ -36,6 +36,7 @@ enum sk_pacing {
> struct sock {
> 	struct sock_common	__sk_common;
> #define sk_state		__sk_common.skc_state
> +	int			sk_wmem_queued;
> 	unsigned long		sk_pacing_rate;
> 	__u32			sk_pacing_status; /* see enum sk_pacing */
> } __attribute__((preserve_access_index));
> @@ -234,8 +235,10 @@ extern void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked) __ksym;
> #define MPTCP_SUBFLOWS_MAX	8
>
> struct mptcp_subflow_context {
> +	unsigned long avg_pacing_rate;
> 	__u32	backup : 1,
> 		stale : 1;
> +	__u8	stale_count;
> 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
> } __attribute__((preserve_access_index));
>
> @@ -260,6 +263,7 @@ struct mptcp_sched_ops {
> struct mptcp_sock {
> 	struct inet_connection_sock	sk;
>
> +	__u64		snd_nxt;
> 	__u32		token;
> 	struct sock	*first;
> 	char		ca_name[TCP_CA_NAME_MAX];
> diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
> new file mode 100644
> index 000000000000..1886e2f7aca4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
> @@ -0,0 +1,205 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023, SUSE. */
> +
> +#include <linux/bpf.h>
> +#include <limits.h>
> +#include "bpf_tcp_helpers.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct mptcp_burst_storage {
> +	int snd_burst;
> +};
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
> +	__uint(map_flags, BPF_F_NO_PREALLOC);
> +	__type(key, int);
> +	__type(value, struct mptcp_burst_storage);
> +} mptcp_burst_map SEC(".maps");
> +
> +#define MPTCP_SEND_BURST_SIZE	65428
> +
> +struct subflow_send_info {
> +	__u8 subflow_id;
> +	__u64 linger_time;
> +};
> +
> +static inline __u64 div_u64_rem(__u64 dividend, __u32 divisor, __u32 *remainder)
> +{
> +	*remainder = dividend % divisor;
> +	return dividend / divisor;
> +}
> +
> +static inline __u64 div_u64(__u64 dividend, __u32 divisor)
> +{
> +	__u32 remainder;
> +
> +	return div_u64_rem(dividend, divisor, &remainder);
> +}

Since this is compiling to BPF rather than native code with 
architecture-specific optimizations, I think it's better to remove 
div_u64_rem() and keep div_u64() simple:

static inline __u64 div_u64(__u64 dividend, __u32 divisor)
{
      return dividend / divisor;
}

- Mat

> +
> +extern bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) __ksym;
> +extern void mptcp_set_timeout(struct sock *sk) __ksym;
> +extern __u64 mptcp_wnd_end(const struct mptcp_sock *msk) __ksym;
> +extern bool bpf_sk_stream_memory_free(const struct sock *sk) __ksym;
> +extern bool bpf_tcp_rtx_and_write_queues_empty(const struct sock *sk) __ksym;
> +extern void mptcp_pm_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ssk) __ksym;
> +
> +#define SSK_MODE_ACTIVE	0
> +#define SSK_MODE_BACKUP	1
> +#define SSK_MODE_MAX	2
> +
> +SEC("struct_ops/mptcp_sched_burst_init")
> +void BPF_PROG(mptcp_sched_burst_init, struct mptcp_sock *msk)
> +{
> +}
> +
> +SEC("struct_ops/mptcp_sched_burst_release")
> +void BPF_PROG(mptcp_sched_burst_release, struct mptcp_sock *msk)
> +{
> +	bpf_sk_storage_delete(&mptcp_burst_map, msk);
> +}
> +
> +void BPF_STRUCT_OPS(bpf_burst_data_init, struct mptcp_sock *msk,
> +		    struct mptcp_sched_data *data)
> +{
> +	mptcp_sched_data_set_contexts(msk, data);
> +}
> +
> +static int bpf_burst_get_send(struct mptcp_sock *msk,
> +			      const struct mptcp_sched_data *data)
> +{
> +	struct subflow_send_info send_info[SSK_MODE_MAX];
> +	struct mptcp_subflow_context *subflow;
> +	struct sock *sk = (struct sock *)msk;
> +	struct mptcp_burst_storage *ptr;
> +	__u32 pace, burst, wmem;
> +	__u64 linger_time;
> +	struct sock *ssk;
> +	int i;
> +
> +	/* pick the subflow with the lower wmem/wspace ratio */
> +	for (i = 0; i < SSK_MODE_MAX; ++i) {
> +		send_info[i].subflow_id = MPTCP_SUBFLOWS_MAX;
> +		send_info[i].linger_time = -1;
> +	}
> +
> +	for (i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX; i++) {
> +		subflow = mptcp_subflow_ctx_by_pos(data, i);
> +		if (!subflow)
> +			break;
> +
> +		ssk = mptcp_subflow_tcp_sock(subflow);
> +		if (!mptcp_subflow_active(subflow))
> +			continue;
> +
> +		pace = subflow->avg_pacing_rate;
> +		if (!pace) {
> +			/* init pacing rate from socket */
> +			subflow->avg_pacing_rate = ssk->sk_pacing_rate;
> +			pace = subflow->avg_pacing_rate;
> +			if (!pace)
> +				continue;
> +		}
> +
> +		linger_time = div_u64((__u64)ssk->sk_wmem_queued << 32, pace);
> +		if (linger_time < send_info[subflow->backup].linger_time) {
> +			send_info[subflow->backup].subflow_id = i;
> +			send_info[subflow->backup].linger_time = linger_time;
> +		}
> +	}
> +	mptcp_set_timeout(sk);
> +
> +	/* pick the best backup if no other subflow is active */
> +	if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
> +		send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
> +
> +	subflow = mptcp_subflow_ctx_by_pos(data, send_info[SSK_MODE_ACTIVE].subflow_id);
> +	if (!subflow)
> +		return -1;
> +	ssk = mptcp_subflow_tcp_sock(subflow);
> +	if (!ssk || !bpf_sk_stream_memory_free(ssk))
> +		return -1;
> +
> +	burst = min(MPTCP_SEND_BURST_SIZE, mptcp_wnd_end(msk) - msk->snd_nxt);
> +	wmem = ssk->sk_wmem_queued;
> +	if (!burst)
> +		goto out;
> +
> +	subflow->avg_pacing_rate = div_u64((__u64)subflow->avg_pacing_rate * wmem +
> +					   ssk->sk_pacing_rate * burst,
> +					   burst + wmem);
> +	ptr = bpf_sk_storage_get(&mptcp_burst_map, msk, 0,
> +				 BPF_LOCAL_STORAGE_GET_F_CREATE);
> +	if (ptr)
> +		ptr->snd_burst = burst;
> +
> +out:
> +	mptcp_subflow_set_scheduled(subflow, true);
> +	return 0;
> +}
> +
> +static int bpf_burst_get_retrans(struct mptcp_sock *msk,
> +				 const struct mptcp_sched_data *data)
> +{
> +	int backup = MPTCP_SUBFLOWS_MAX, pick = MPTCP_SUBFLOWS_MAX, subflow_id;
> +	struct mptcp_subflow_context *subflow;
> +	int min_stale_count = INT_MAX;
> +	struct sock *ssk;
> +
> +	for (int i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX; i++) {
> +		subflow = mptcp_subflow_ctx_by_pos(data, i);
> +		if (!subflow)
> +			break;
> +
> +		if (!mptcp_subflow_active(subflow))
> +			continue;
> +
> +		ssk = mptcp_subflow_tcp_sock(subflow);
> +		/* still data outstanding at TCP level? skip this */
> +		if (!bpf_tcp_rtx_and_write_queues_empty(ssk)) {
> +			mptcp_pm_subflow_chk_stale(msk, ssk);
> +			min_stale_count = min(min_stale_count, subflow->stale_count);
> +			continue;
> +		}
> +
> +		if (subflow->backup) {
> +			if (backup == MPTCP_SUBFLOWS_MAX)
> +				backup = i;
> +			continue;
> +		}
> +
> +		if (pick == MPTCP_SUBFLOWS_MAX)
> +			pick = i;
> +	}
> +
> +	if (pick < MPTCP_SUBFLOWS_MAX) {
> +		subflow_id = pick;
> +		goto out;
> +	}
> +	subflow_id = min_stale_count > 1 ? backup : MPTCP_SUBFLOWS_MAX;
> +
> +out:
> +	subflow = mptcp_subflow_ctx_by_pos(data, subflow_id);
> +	if (!subflow)
> +		return -1;
> +	mptcp_subflow_set_scheduled(subflow, true);
> +	return 0;
> +}
> +
> +int BPF_STRUCT_OPS(bpf_burst_get_subflow, struct mptcp_sock *msk,
> +		   const struct mptcp_sched_data *data)
> +{
> +	if (data->reinject)
> +		return bpf_burst_get_retrans(msk, data);
> +	return bpf_burst_get_send(msk, data);
> +}
> +
> +SEC(".struct_ops")
> +struct mptcp_sched_ops burst = {
> +	.init		= (void *)mptcp_sched_burst_init,
> +	.release	= (void *)mptcp_sched_burst_release,
> +	.data_init	= (void *)bpf_burst_data_init,
> +	.get_subflow	= (void *)bpf_burst_get_subflow,
> +	.name		= "bpf_burst",
> +};
> -- 
> 2.35.3
>
>
>


* Re: selftests/bpf: Add bpf_burst test: Tests Results
  2023-06-27  1:39 ` [PATCH mptcp-next 10/10] selftests/bpf: Add bpf_burst test Geliang Tang
@ 2023-07-07 19:02   ` MPTCP CI
  0 siblings, 0 replies; 18+ messages in thread
From: MPTCP CI @ 2023-07-07 19:02 UTC (permalink / raw)
  To: Geliang Tang; +Cc: mptcp

Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6478787735453696
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6478787735453696/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5387016662155264
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5387016662155264/summary/summary.txt

- KVM Validation: normal (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5206555876917248
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5206555876917248/summary/summary.txt

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5784825894797312
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5784825894797312/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/3f2f5bdd10e9


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to keep the test suite
stable when executed on a public CI like this one, some reported issues may
not be due to your modifications. Still, do not hesitate to help us improve
that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2
  2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
                   ` (9 preceding siblings ...)
  2023-06-27  1:39 ` [PATCH mptcp-next 10/10] selftests/bpf: Add bpf_burst test Geliang Tang
@ 2023-07-12 16:45 ` Matthieu Baerts
  10 siblings, 0 replies; 18+ messages in thread
From: Matthieu Baerts @ 2023-07-12 16:45 UTC (permalink / raw)
  To: Geliang Tang, mptcp

Hi Geliang, Mat,

On 27/06/2023 03:39, Geliang Tang wrote:
> - Add time metrics for sched tests.
> - Add bpf_stale and bpf_burst schedulers.
> 
> Geliang Tang (10):
>   Squash to "selftests/bpf: Add bpf_first test"
>   selftests/bpf: Add bpf_first test
>   Squash to "selftests/bpf: Add bpf_bkup test"
>   Squash to "selftests/bpf: Add bpf_rr test"
>   Squash to "selftests/bpf: Add bpf_red test"

Thank you for the patches and the reviews!

I just applied the first 5 patches to our tree (feat. for other trees)
with Mat's RvB tag. Please check that the result is as expected.

I squashed the first patch with the previous version of "selftests/bpf:
Add bpf_first test" and inserted it just before "selftests/bpf: Add
bpf_first scheduler".

New patches for t/upstream:
- 623a40f22358: selftests/bpf: Add bpf scheduler test
- 0624ffbc4277: conflict in t/selftests-bpf-add-bpf_first-test
- 875e9054d305: conflict in t/selftests-bpf-Add-bpf_bkup-test
- Results: 71a6b14c30ab..ad99473d0690 (export)

- 117ad55b0a69: "squashed" (with conflicts) patch 2/10 in
"selftests/bpf: Add bpf_first test"
- 5f014cabbfbc: conflict in t/selftests-bpf-Add-bpf_bkup-test
- Results: ad99473d0690..835132fc94ee (export)

- b6f5458c8236: tg:msg: apply new commit message from Geliang
- Results: 23ee68c3fa88..c8f03d57ef57 (export)

- d5beca01e1bc: "squashed" patch 3/10 in "selftests/bpf: Add bpf_bkup test"
- e3221affcf4d: "squashed" patch 4/10 in "selftests/bpf: Add bpf_rr test"
- 8216c7624546: "squashed" patch 5/10 in "selftests/bpf: Add bpf_red test"
- Results: 835132fc94ee..23ee68c3fa88 (export)

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20230712T164345

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-07-12 16:45 UTC | newest]

Thread overview: 18+ messages
2023-06-27  1:39 [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Geliang Tang
2023-06-27  1:39 ` [PATCH mptcp-next 01/10] Squash to "selftests/bpf: Add bpf_first test" Geliang Tang
2023-07-07 17:45   ` Mat Martineau
2023-06-27  1:39 ` [PATCH mptcp-next 02/10] selftests/bpf: Add bpf_first test Geliang Tang
2023-06-27  1:39 ` [PATCH mptcp-next 03/10] Squash to "selftests/bpf: Add bpf_bkup test" Geliang Tang
2023-06-27  1:39 ` [PATCH mptcp-next 04/10] Squash to "selftests/bpf: Add bpf_rr test" Geliang Tang
2023-06-27  1:39 ` [PATCH mptcp-next 05/10] Squash to "selftests/bpf: Add bpf_red test" Geliang Tang
2023-06-27  1:39 ` [PATCH mptcp-next 06/10] selftests/bpf: Add bpf_stale scheduler Geliang Tang
2023-07-07 17:49   ` Mat Martineau
2023-06-27  1:39 ` [PATCH mptcp-next 07/10] selftests/bpf: Add bpf_stale test Geliang Tang
2023-07-07 17:54   ` Mat Martineau
2023-06-27  1:39 ` [PATCH mptcp-next 08/10] bpf: Export more bpf_burst related functions Geliang Tang
2023-07-07 18:01   ` Mat Martineau
2023-06-27  1:39 ` [PATCH mptcp-next 09/10] selftests/bpf: Add bpf_burst scheduler Geliang Tang
2023-07-07 18:09   ` Mat Martineau
2023-06-27  1:39 ` [PATCH mptcp-next 10/10] selftests/bpf: Add bpf_burst test Geliang Tang
2023-07-07 19:02   ` selftests/bpf: Add bpf_burst test: Tests Results MPTCP CI
2023-07-12 16:45 ` [PATCH mptcp-next 00/10] BPF packet scheduler updates part 2 Matthieu Baerts
