* [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
@ 2024-09-12 9:25 Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add " Geliang Tang
` (8 more replies)
0 siblings, 9 replies; 19+ messages in thread
From: Geliang Tang @ 2024-09-12 9:25 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
v5:
- update patch 1 as Andrii suggested: if msk is NULL, initialize
kit->msk to NULL in _new() and check it in _next().
v4:
- squash patch 1/5 and 3/5 in v3 together as Matt suggested.
- a new squash-to patch to drop mptcp_subflow_active declaration in
bpf_burst.
v3:
- drop bpf_iter__mptcp_subflow, __diag_push, __diag_pop and
__diag_ignore_all
- drop declarations for bpf kfuncs
v2:
- update patch 1 as Martin and Andrii suggested.
- fix warnings and errors reported by MPTCP CI.
This patch set adds a mptcp_subflow type bpf_iter, and self tests.
Geliang Tang (5):
bpf: Add mptcp_subflow bpf_iter
selftests/bpf: Add mptcp_subflow bpf_iter test prog
selftests/bpf: Add mptcp_subflow bpf_iter subtest
Squash to "mptcp: add sched_data helpers"
Squash to "selftests/bpf: Add bpf_burst scheduler & test"
net/mptcp/bpf.c | 57 +++++++++++--
net/mptcp/protocol.h | 2 -
.../testing/selftests/bpf/bpf_experimental.h | 7 ++
.../testing/selftests/bpf/prog_tests/mptcp.c | 85 +++++++++++++++++++
tools/testing/selftests/bpf/progs/mptcp_bpf.h | 4 +
.../selftests/bpf/progs/mptcp_bpf_burst.c | 1 -
.../selftests/bpf/progs/mptcp_bpf_iter.c | 38 +++++++++
7 files changed, 186 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_iter.c
--
2.43.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
@ 2024-09-12 9:25 ` Geliang Tang
2024-09-12 18:24 ` Andrii Nakryiko
2024-09-12 9:25 ` [PATCH mptcp-next v5 2/5] selftests/bpf: Add mptcp_subflow bpf_iter test prog Geliang Tang
` (7 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Geliang Tang @ 2024-09-12 9:25 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, bpf, Martin KaFai Lau
From: Geliang Tang <tanggeliang@kylinos.cn>
It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call kfunc to modify the fields of each subflow. In
kernel space, mptcp_for_each_subflow() helper is used for this:
mptcp_for_each_subflow(msk, subflow)
kfunc(subflow);
But in the MPTCP BPF program, this has not yet been implemented. As
Martin suggested recently, this conn_list walking + modify-by-kfunc
usage fits the bpf_iter use case. So this patch adds a new bpf_iter
type named "mptcp_subflow" to do this and implements its helpers
bpf_iter_mptcp_subflow_new()/_next()/_destroy().
Since these bpf_iter mptcp_subflow helpers are invoked in its selftest
in a ftrace hook for mptcp_sched_get_send(), it's necessary to register
them into a BPF_PROG_TYPE_TRACING type kfunc set together with other
two used kfuncs mptcp_subflow_active() and mptcp_subflow_set_scheduled().
Then bpf_for_each() for mptcp_subflow can be used in BPF program like
this:
i = 0;
bpf_rcu_read_lock();
bpf_for_each(mptcp_subflow, subflow, msk) {
if (i++ >= MPTCP_SUBFLOWS_MAX)
break;
kfunc(subflow);
}
bpf_rcu_read_unlock();
v2: remove msk->pm.lock in _new() and _destroy() (Martin)
drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
v3: drop bpf_iter__mptcp_subflow
v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in
_next() (Andrii)
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 52 insertions(+), 5 deletions(-)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 6414824402e6..fec18e7e5e4a 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
.set = &bpf_mptcp_fmodret_ids,
};
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "kfuncs which will be used in BPF programs");
+struct bpf_iter_mptcp_subflow {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_mptcp_subflow_kern {
+ struct mptcp_sock *msk;
+ struct list_head *pos;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+ struct mptcp_sock *msk)
+{
+ struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+ kit->msk = msk;
+ if (!msk)
+ return -EINVAL;
+
+ kit->pos = &msk->conn_list;
+ return 0;
+}
+
+__bpf_kfunc struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+ struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sock *msk = kit->msk;
+
+ if (!msk)
+ return NULL;
+
+ subflow = list_entry(kit->pos->next, struct mptcp_subflow_context, node);
+ if (!subflow || list_entry_is_head(subflow, &msk->conn_list, node))
+ return NULL;
+
+ kit->pos = &subflow->node;
+ return subflow;
+}
+
+__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
__bpf_kfunc struct mptcp_subflow_context *
bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos)
@@ -218,12 +260,15 @@ __bpf_kfunc bool bpf_mptcp_subflow_queues_empty(struct sock *sk)
return tcp_rtx_queue_empty(sk);
}
-__diag_pop();
+__bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
+BTF_ID_FLAGS(func, mptcp_subflow_active)
BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos)
-BTF_ID_FLAGS(func, mptcp_subflow_active)
BTF_ID_FLAGS(func, mptcp_set_timeout)
BTF_ID_FLAGS(func, mptcp_wnd_end)
BTF_ID_FLAGS(func, tcp_stream_memory_free)
@@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
int ret;
ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+ &bpf_mptcp_sched_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
&bpf_mptcp_sched_kfunc_set);
#ifdef CONFIG_BPF_JIT
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH mptcp-next v5 2/5] selftests/bpf: Add mptcp_subflow bpf_iter test prog
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add " Geliang Tang
@ 2024-09-12 9:25 ` Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 3/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest Geliang Tang
` (6 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Geliang Tang @ 2024-09-12 9:25 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch adds a new mptcp bpf selftest program for the newly added
mptcp_subflow bpf_iter.
Export bpf_iter_mptcp_subflow_new/_next/_destroy into bpf_experimental.h
then use bpf_for_each(mptcp_subflow, subflow, msk) to walk the mptcp
subflow list.
Add a ftrace hook for mptcp_sched_get_send() to do this test and invoke
kfuncs mptcp_subflow_active() and mptcp_subflow_set_scheduled() in the
loops.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
.../testing/selftests/bpf/bpf_experimental.h | 7 ++++
tools/testing/selftests/bpf/progs/mptcp_bpf.h | 4 ++
.../selftests/bpf/progs/mptcp_bpf_iter.c | 38 +++++++++++++++++++
3 files changed, 49 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_iter.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 828556cdc2f0..97aad95c90c6 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -549,6 +549,13 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
+struct bpf_iter_mptcp_subflow;
+extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+ struct mptcp_sock *msk) __weak __ksym;
+extern struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+extern void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+
extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
index 928a1e5ad8db..3280d4183beb 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -43,9 +43,13 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
}
/* ksym */
+extern bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) __ksym;
extern void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
bool scheduled) __ksym;
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
extern struct mptcp_subflow_context *
bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos) __ksym;
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iter.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iter.c
new file mode 100644
index 000000000000..03168be3980b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iter.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
+
+char _license[] SEC("license") = "GPL";
+int iter = 0;
+int pid;
+
+SEC("fentry/mptcp_sched_get_send")
+int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk)
+{
+ struct mptcp_subflow_context *subflow;
+ int i = 0;
+
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return 0;
+
+ bpf_rcu_read_lock();
+ bpf_for_each(mptcp_subflow, subflow, msk) {
+ if (i++ >= MPTCP_SUBFLOWS_MAX)
+ break;
+
+ if (subflow->token != msk->token)
+ break;
+
+ if (!mptcp_subflow_active(subflow))
+ continue;
+
+ mptcp_subflow_set_scheduled(subflow, false);
+ }
+ bpf_rcu_read_unlock();
+
+ iter = i;
+ return 0;
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH mptcp-next v5 3/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add " Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 2/5] selftests/bpf: Add mptcp_subflow bpf_iter test prog Geliang Tang
@ 2024-09-12 9:25 ` Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 4/5] Squash to "mptcp: add sched_data helpers" Geliang Tang
` (5 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Geliang Tang @ 2024-09-12 9:25 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch adds a subtest named test_bpf_iter to load and verify the newly
added mptcp_subflow type bpf_iter example in test_mptcp. Use endpoint_init()
to add a new subflow endpoint. A new helper recv_send() is invoked to trigger
the ftrace hook for mptcp_sched_get_send() after wait_for_new_subflows().
Check if skel->bss->iter equals 2 to verify whether the ftrace program loops
twice as expected.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
.../testing/selftests/bpf/prog_tests/mptcp.c | 85 +++++++++++++++++++
1 file changed, 85 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index a3e68bc6afa3..bb40c1f35918 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -11,6 +11,7 @@
#include "mptcp_sock.skel.h"
#include "mptcpify.skel.h"
#include "mptcp_subflow.skel.h"
+#include "mptcp_bpf_iter.skel.h"
#include "mptcp_bpf_first.skel.h"
#include "mptcp_bpf_bkup.skel.h"
#include "mptcp_bpf_rr.skel.h"
@@ -259,6 +260,34 @@ static void send_byte(int fd)
ASSERT_EQ(write(fd, &b, sizeof(b)), 1, "send single byte");
}
+static int recv_send(int server_fd)
+{
+ char buf[1];
+ int ret, fd;
+ ssize_t n;
+
+ fd = accept(server_fd, NULL, NULL);
+ if (!ASSERT_OK_FD(fd, "accept"))
+ return -1;
+
+ n = recv(fd, buf, sizeof(buf), 0);
+ if (!ASSERT_GT(n, 0, "recv")) {
+ ret = -1;
+ goto close;
+ }
+
+ n = send(fd, buf, n, 0);
+ if (!ASSERT_GT(n, 0, "send")) {
+ ret = -1;
+ goto close;
+ }
+
+ ret = 0;
+close:
+ close(fd);
+ return ret;
+}
+
static int verify_mptcpify(int server_fd, int client_fd)
{
struct __mptcp_info info;
@@ -471,6 +500,60 @@ static void test_subflow(void)
close(cgroup_fd);
}
+static void run_bpf_iter(void)
+{
+ int server_fd, client_fd;
+
+ server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+ if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+ return;
+
+ client_fd = connect_to_fd(server_fd, 0);
+ if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+ goto close_server;
+
+ send_byte(client_fd);
+ wait_for_new_subflows(client_fd);
+ recv_send(server_fd);
+
+ close(client_fd);
+close_server:
+ close(server_fd);
+}
+
+static void test_bpf_iter(void)
+{
+ struct mptcp_bpf_iter *skel;
+ struct nstoken *nstoken;
+ int err;
+
+ skel = mptcp_bpf_iter__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_load: mptcp_iter"))
+ return;
+
+ skel->bss->pid = getpid();
+
+ err = mptcp_bpf_iter__attach(skel);
+ if (!ASSERT_OK(err, "skel_attach: mptcp_iter"))
+ goto skel_destroy;
+
+ nstoken = create_netns();
+ if (!ASSERT_OK_PTR(nstoken, "create_netns: mptcp_iter"))
+ goto skel_destroy;
+
+ if (endpoint_init("subflow") < 0)
+ goto close_netns;
+
+ run_bpf_iter();
+
+ ASSERT_EQ(skel->bss->iter, 2, "iter");
+
+close_netns:
+ cleanup_netns(nstoken);
+skel_destroy:
+ mptcp_bpf_iter__destroy(skel);
+}
+
static struct nstoken *sched_init(char *flags, char *sched)
{
struct nstoken *nstoken;
@@ -652,6 +735,8 @@ void test_mptcp(void)
test_mptcpify();
if (test__start_subtest("subflow"))
test_subflow();
+ if (test__start_subtest("bpf_iter"))
+ test_bpf_iter();
if (test__start_subtest("default"))
test_default();
if (test__start_subtest("first"))
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH mptcp-next v5 4/5] Squash to "mptcp: add sched_data helpers"
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
` (2 preceding siblings ...)
2024-09-12 9:25 ` [PATCH mptcp-next v5 3/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest Geliang Tang
@ 2024-09-12 9:25 ` Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 5/5] Squash to "selftests/bpf: Add bpf_burst scheduler & test" Geliang Tang
` (4 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Geliang Tang @ 2024-09-12 9:25 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
Drop bpf_mptcp_subflow_ctx_by_pos declaration, since
"-Wmissing-declarations" is ignored in __bpf_kfunc_start_defs.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/protocol.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index c3942416fa3a..d864c8ccf316 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -722,8 +722,6 @@ void mptcp_sock_graft(struct sock *sk, struct socket *parent);
u64 mptcp_wnd_end(const struct mptcp_sock *msk);
void mptcp_set_timeout(struct sock *sk);
bool bpf_mptcp_subflow_queues_empty(struct sock *sk);
-struct mptcp_subflow_context *
-bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos);
struct sock *__mptcp_nmpc_sk(struct mptcp_sock *msk);
bool __mptcp_close(struct sock *sk, long timeout);
void mptcp_cancel_work(struct sock *sk);
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH mptcp-next v5 5/5] Squash to "selftests/bpf: Add bpf_burst scheduler & test"
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
` (3 preceding siblings ...)
2024-09-12 9:25 ` [PATCH mptcp-next v5 4/5] Squash to "mptcp: add sched_data helpers" Geliang Tang
@ 2024-09-12 9:25 ` Geliang Tang
2024-09-12 9:52 ` [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter MPTCP CI
` (3 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Geliang Tang @ 2024-09-12 9:25 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
Drop mptcp_subflow_active declaration.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
index eb21119aa8f7..dcab196867fe 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
@@ -16,7 +16,6 @@ struct bpf_subflow_send_info {
__u64 linger_time;
};
-extern bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) __ksym;
extern void mptcp_set_timeout(struct sock *sk) __ksym;
extern __u64 mptcp_wnd_end(const struct mptcp_sock *msk) __ksym;
extern bool tcp_stream_memory_free(const struct sock *sk, int wake) __ksym;
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
` (4 preceding siblings ...)
2024-09-12 9:25 ` [PATCH mptcp-next v5 5/5] Squash to "selftests/bpf: Add bpf_burst scheduler & test" Geliang Tang
@ 2024-09-12 9:52 ` MPTCP CI
2024-09-12 9:54 ` Matthieu Baerts
2024-09-12 10:45 ` MPTCP CI
` (2 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: MPTCP CI @ 2024-09-12 9:52 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
But sadly, our CI spotted some issues with it when trying to build it.
You can find more details there:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10828370093
Status: failure
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8e5f73626f7f
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=889689
Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
2024-09-12 9:52 ` [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter MPTCP CI
@ 2024-09-12 9:54 ` Matthieu Baerts
0 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2024-09-12 9:54 UTC (permalink / raw)
To: mptcp, Geliang Tang
Hi Geliang,
On 12/09/2024 11:52, MPTCP CI wrote:
> Hi Geliang,
>
> Thank you for your modifications, that's great!
>
> But sadly, our CI spotted some issues with it when trying to build it.
>
> You can find more details there:
>
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10828370093
Please ignore, it is due to an issue upstream. I'm on it.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
` (5 preceding siblings ...)
2024-09-12 9:52 ` [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter MPTCP CI
@ 2024-09-12 10:45 ` MPTCP CI
2024-09-12 11:10 ` MPTCP CI
2024-09-12 11:44 ` MPTCP CI
8 siblings, 0 replies; 19+ messages in thread
From: MPTCP CI @ 2024-09-12 10:45 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Unstable: 1 failed test(s): mptcp_connect_mmap 🔴
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10828370091
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8e5f73626f7f
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=889689
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
` (6 preceding siblings ...)
2024-09-12 10:45 ` MPTCP CI
@ 2024-09-12 11:10 ` MPTCP CI
2024-09-12 11:26 ` Matthieu Baerts
2024-09-12 11:44 ` MPTCP CI
8 siblings, 1 reply; 19+ messages in thread
From: MPTCP CI @ 2024-09-12 11:10 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
But sadly, our CI spotted some issues with it when trying to build it.
You can find more details there:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10829533095
Status: failure
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/f12243725ecc
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=889689
Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
2024-09-12 11:10 ` MPTCP CI
@ 2024-09-12 11:26 ` Matthieu Baerts
0 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2024-09-12 11:26 UTC (permalink / raw)
To: mptcp, Geliang Tang
On 12/09/2024 13:10, MPTCP CI wrote:
> Hi Geliang,
>
> Thank you for your modifications, that's great!
>
> But sadly, our CI spotted some issues with it when trying to build it.
>
> You can find more details there:
>
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10829533095
FYI, this error is different from the previous one, no longer due to a
modification upstream because I recently applied a fix for that in our tree.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
` (7 preceding siblings ...)
2024-09-12 11:10 ` MPTCP CI
@ 2024-09-12 11:44 ` MPTCP CI
8 siblings, 0 replies; 19+ messages in thread
From: MPTCP CI @ 2024-09-12 11:44 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10829533086
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/f12243725ecc
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=889689
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-12 9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add " Geliang Tang
@ 2024-09-12 18:24 ` Andrii Nakryiko
2024-09-13 4:04 ` Geliang Tang
0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2024-09-12 18:24 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp, Geliang Tang, bpf, Martin KaFai Lau
On Thu, Sep 12, 2024 at 2:26 AM Geliang Tang <geliang@kernel.org> wrote:
>
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
>
> mptcp_for_each_subflow(msk, subflow)
> kfunc(subflow);
>
> But in the MPTCP BPF program, this has not yet been implemented. As
> Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case. So this patch adds a new bpf_iter
> type named "mptcp_subflow" to do this and implements its helpers
> bpf_iter_mptcp_subflow_new()/_next()/_destroy().
>
> Since these bpf_iter mptcp_subflow helpers are invoked in its selftest
> in a ftrace hook for mptcp_sched_get_send(), it's necessary to register
> them into a BPF_PROG_TYPE_TRACING type kfunc set together with other
> two used kfuncs mptcp_subflow_active() and mptcp_subflow_set_scheduled().
>
> Then bpf_for_each() for mptcp_subflow can be used in BPF program like
> this:
>
> i = 0;
> bpf_rcu_read_lock();
> bpf_for_each(mptcp_subflow, subflow, msk) {
> if (i++ >= MPTCP_SUBFLOWS_MAX)
> break;
> kfunc(subflow);
> }
> bpf_rcu_read_unlock();
>
> v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
> v3: drop bpf_iter__mptcp_subflow
> v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in
> _next() (Andrii)
>
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 52 insertions(+), 5 deletions(-)
>
Looks ok from setting up open-coded iterator things, but I can't speak
for other aspects I mentioned below.
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index 6414824402e6..fec18e7e5e4a 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
> .set = &bpf_mptcp_fmodret_ids,
> };
>
> -__diag_push();
> -__diag_ignore_all("-Wmissing-prototypes",
> - "kfuncs which will be used in BPF programs");
> +struct bpf_iter_mptcp_subflow {
> + __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_mptcp_subflow_kern {
> + struct mptcp_sock *msk;
> + struct list_head *pos;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> + struct mptcp_sock *msk)
> +{
> + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> + kit->msk = msk;
> + if (!msk)
> + return -EINVAL;
> +
> + kit->pos = &msk->conn_list;
> + return 0;
> +}
> +
> +__bpf_kfunc struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> +{
> + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> + struct mptcp_subflow_context *subflow;
> + struct mptcp_sock *msk = kit->msk;
> +
> + if (!msk)
> + return NULL;
> +
> + subflow = list_entry(kit->pos->next, struct mptcp_subflow_context, node);
> + if (!subflow || list_entry_is_head(subflow, &msk->conn_list, node))
it's a bit weird that you need both !subflow and list_entry_is_head().
Can you have NULL subflow at all? But this is the question to
msk->conn_list semantics.
> + return NULL;
> +
> + kit->pos = &subflow->node;
> + return subflow;
> +}
> +
> +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> +{
> +}
>
> __bpf_kfunc struct mptcp_subflow_context *
> bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos)
> @@ -218,12 +260,15 @@ __bpf_kfunc bool bpf_mptcp_subflow_queues_empty(struct sock *sk)
> return tcp_rtx_queue_empty(sk);
> }
>
> -__diag_pop();
> +__bpf_kfunc_end_defs();
>
> BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
I'm not 100% sure, but I suspect you might need to specify
KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a
valid owned kernel object. Other BPF folks might help to clarify this.
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
> +BTF_ID_FLAGS(func, mptcp_subflow_active)
> BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
> BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos)
> -BTF_ID_FLAGS(func, mptcp_subflow_active)
> BTF_ID_FLAGS(func, mptcp_set_timeout)
> BTF_ID_FLAGS(func, mptcp_wnd_end)
> BTF_ID_FLAGS(func, tcp_stream_memory_free)
> @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
> int ret;
>
> ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> + &bpf_mptcp_sched_kfunc_set);
> ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> &bpf_mptcp_sched_kfunc_set);
> #ifdef CONFIG_BPF_JIT
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-12 18:24 ` Andrii Nakryiko
@ 2024-09-13 4:04 ` Geliang Tang
2024-09-13 20:57 ` Andrii Nakryiko
0 siblings, 1 reply; 19+ messages in thread
From: Geliang Tang @ 2024-09-13 4:04 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: mptcp, Geliang Tang, bpf, Martin KaFai Lau
Hi Andrii,
On Thu, 2024-09-12 at 11:24 -0700, Andrii Nakryiko wrote:
> On Thu, Sep 12, 2024 at 2:26 AM Geliang Tang <geliang@kernel.org>
> wrote:
> >
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> >
> > It's necessary to traverse all subflows on the conn_list of an
> > MPTCP
> > socket and then call kfunc to modify the fields of each subflow. In
> > kernel space, mptcp_for_each_subflow() helper is used for this:
> >
> > mptcp_for_each_subflow(msk, subflow)
> > kfunc(subflow);
> >
> > But in the MPTCP BPF program, this has not yet been implemented. As
> > Martin suggested recently, this conn_list walking + modify-by-kfunc
> > usage fits the bpf_iter use case. So this patch adds a new bpf_iter
> > type named "mptcp_subflow" to do this and implements its helpers
> > bpf_iter_mptcp_subflow_new()/_next()/_destroy().
> >
> > Since these bpf_iter mptcp_subflow helpers are invoked in its
> > selftest
> > in a ftrace hook for mptcp_sched_get_send(), it's necessary to
> > register
> > them into a BPF_PROG_TYPE_TRACING type kfunc set together with
> > other
> > two used kfuncs mptcp_subflow_active() and
> > mptcp_subflow_set_scheduled().
> >
> > Then bpf_for_each() for mptcp_subflow can be used in BPF program
> > like
> > this:
> >
> > i = 0;
> > bpf_rcu_read_lock();
> > bpf_for_each(mptcp_subflow, subflow, msk) {
> > if (i++ >= MPTCP_SUBFLOWS_MAX)
> > break;
> > kfunc(subflow);
> > }
> > bpf_rcu_read_unlock();
> >
> > v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> > drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2]
> > (Andrii)
> > v3: drop bpf_iter__mptcp_subflow
> > v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> > it in
> > _next() (Andrii)
> >
> > Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-
> > ----
> > 1 file changed, 52 insertions(+), 5 deletions(-)
> >
>
> Looks ok from setting up open-coded iterator things, but I can't
> speak
> for other aspects I mentioned below.
>
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index 6414824402e6..fec18e7e5e4a 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set
> > bpf_mptcp_fmodret_set = {
> > .set = &bpf_mptcp_fmodret_ids,
> > };
> >
> > -__diag_push();
> > -__diag_ignore_all("-Wmissing-prototypes",
> > - "kfuncs which will be used in BPF programs");
> > +struct bpf_iter_mptcp_subflow {
> > + __u64 __opaque[2];
> > +} __attribute__((aligned(8)));
> > +
> > +struct bpf_iter_mptcp_subflow_kern {
> > + struct mptcp_sock *msk;
> > + struct list_head *pos;
> > +} __attribute__((aligned(8)));
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > bpf_iter_mptcp_subflow *it,
> > + struct mptcp_sock *msk)
> > +{
> > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > +
> > + kit->msk = msk;
> > + if (!msk)
> > + return -EINVAL;
> > +
> > + kit->pos = &msk->conn_list;
> > + return 0;
> > +}
> > +
> > +__bpf_kfunc struct mptcp_subflow_context *
> > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> > +{
> > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > + struct mptcp_subflow_context *subflow;
> > + struct mptcp_sock *msk = kit->msk;
> > +
> > + if (!msk)
> > + return NULL;
> > +
> > + subflow = list_entry(kit->pos->next, struct
> > mptcp_subflow_context, node);
> > + if (!subflow || list_entry_is_head(subflow, &msk-
> > >conn_list, node))
>
> it's a bit weird that you need both !subflow and
> list_entry_is_head().
> Can you have NULL subflow at all? But this is the question to
> msk->conn_list semantics.
Do you mean to return subflow regardless of whether subflow is NULL? If
so, we need to use list_is_last() instead of list_entry_is_head():
{
struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
return NULL;
kit->pos = kit->pos->next;
return list_entry(kit->pos, struct mptcp_subflow_context, node);
}
Hope this is a better version.
>
> > + return NULL;
> > +
> > + kit->pos = &subflow->node;
> > + return subflow;
> > +}
> > +
> > +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct
> > bpf_iter_mptcp_subflow *it)
> > +{
> > +}
> >
> > __bpf_kfunc struct mptcp_subflow_context *
> > bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data,
> > unsigned int pos)
> > @@ -218,12 +260,15 @@ __bpf_kfunc bool
> > bpf_mptcp_subflow_queues_empty(struct sock *sk)
> > return tcp_rtx_queue_empty(sk);
> > }
> >
> > -__diag_pop();
> > +__bpf_kfunc_end_defs();
> >
> > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
>
> I'm not 100% sure, but I suspect you might need to specify
> KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a
KF_TRUSTED_ARGS breaks the selftest in patch 2 [1]:
'''
libbpf: prog 'trace_mptcp_sched_get_send': BPF program load failed:
Invalid argument
libbpf: prog 'trace_mptcp_sched_get_send': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk) @
mptcp_bpf_iter.c:13
0: (79) r6 = *(u64 *)(r1 +0)
func 'mptcp_sched_get_send' arg0 has btf_id 28019 type STRUCT
'mptcp_sock'
1: R1=ctx() R6_w=ptr_mptcp_sock()
; if (bpf_get_current_pid_tgid() >> 32 != pid) @ mptcp_bpf_iter.c:17
1: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
2: (77) r0 >>= 32 ;
R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
3: (18) r1 = 0xffffb766c1684004 ;
R1_w=map_value(map=mptcp_bp.bss,ks=4,vs=8,off=4)
5: (61) r1 = *(u32 *)(r1 +0) ;
R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
6: (67) r1 <<= 32 ;
R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,sm
ax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
7: (c7) r1 s>>= 32 ;
R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
8: (5d) if r0 != r1 goto pc+39 ;
R0_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
0x7fffffff))
R1_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
0x7fffffff))
; iter = 0; @ mptcp_bpf_iter.c:20
9: (18) r8 = 0xffffb766c1684000 ;
R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
11: (b4) w1 = 0 ; R1_w=0
12: (63) *(u32 *)(r8 +0) = r1 ; R1_w=0
R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
; bpf_rcu_read_lock(); @ mptcp_bpf_iter.c:22
13: (85) call bpf_rcu_read_lock#84967 ;
14: (bf) r7 = r10 ; R7_w=fp0 R10=fp0
; iter = 0; @ mptcp_bpf_iter.c:20
15: (07) r7 += -16 ; R7_w=fp-16
; bpf_for_each(mptcp_subflow, subflow, msk) { @ mptcp_bpf_iter.c:23
16: (bf) r1 = r7 ; R1_w=fp-16 R7_w=fp-16
17: (bf) r2 = r6 ; R2_w=ptr_mptcp_sock()
R6=ptr_mptcp_sock()
18: (85) call bpf_iter_mptcp_subflow_new#62694
R2 must be referenced or trusted
processed 17 insns (limit 1000000) max_states_per_insn 0 total_states 1
peak_states 1 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'trace_mptcp_sched_get_send': failed to load: -22
libbpf: failed to load object 'mptcp_bpf_iter'
libbpf: failed to load BPF skeleton 'mptcp_bpf_iter': -22
test_bpf_iter:FAIL:skel_open_load: mptcp_iter unexpected error: -22
#169/4 mptcp/bpf_iter:FAIL
'''
I don't know how to fix it yet.
> valid owned kernel object. Other BPF folks might help to clarify
> this.
>
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
> > +BTF_ID_FLAGS(func, mptcp_subflow_active)
But we do need to add KF_ITER_NEW/NEXT/DESTROY flags here:
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT |
KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
With these flags, we can drop this additional test in loop:
if (i++ >= MPTCP_SUBFLOWS_MAX)
break;
This problem has been bothering me for a while. I got "infinite loop
detected" errors when walking the list with
bpf_for_each(mptcp_subflow). So I had to use this additional test to
break it.
With these KF_ITER_NEW/NEXT/DESTROY flags, no need to use this
additional test anymore.
Thanks,
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/patch/4467ab3af0f1a6c378778ca6be72753b16a1532c.1726132802.git.tanggeliang@kylinos.cn/
> > BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
> > BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos)
> > -BTF_ID_FLAGS(func, mptcp_subflow_active)
> > BTF_ID_FLAGS(func, mptcp_set_timeout)
> > BTF_ID_FLAGS(func, mptcp_wnd_end)
> > BTF_ID_FLAGS(func, tcp_stream_memory_free)
> > @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
> > int ret;
> >
> > ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > + ret = ret ?:
> > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > +
> > &bpf_mptcp_sched_kfunc_set);
> > ret = ret ?:
> > register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> >
> > &bpf_mptcp_sched_kfunc_set);
> > #ifdef CONFIG_BPF_JIT
> > --
> > 2.43.0
> >
> >
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-13 4:04 ` Geliang Tang
@ 2024-09-13 20:57 ` Andrii Nakryiko
2024-09-14 0:41 ` Martin KaFai Lau
0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2024-09-13 20:57 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp, Geliang Tang, bpf, Martin KaFai Lau
On Thu, Sep 12, 2024 at 9:04 PM Geliang Tang <geliang@kernel.org> wrote:
>
> Hi Andrii,
>
> On Thu, 2024-09-12 at 11:24 -0700, Andrii Nakryiko wrote:
> > On Thu, Sep 12, 2024 at 2:26 AM Geliang Tang <geliang@kernel.org>
> > wrote:
> > >
> > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > >
> > > It's necessary to traverse all subflows on the conn_list of an
> > > MPTCP
> > > socket and then call kfunc to modify the fields of each subflow. In
> > > kernel space, mptcp_for_each_subflow() helper is used for this:
> > >
> > > mptcp_for_each_subflow(msk, subflow)
> > > kfunc(subflow);
> > >
> > > But in the MPTCP BPF program, this has not yet been implemented. As
> > > Martin suggested recently, this conn_list walking + modify-by-kfunc
> > > usage fits the bpf_iter use case. So this patch adds a new bpf_iter
> > > type named "mptcp_subflow" to do this and implements its helpers
> > > bpf_iter_mptcp_subflow_new()/_next()/_destroy().
> > >
> > > Since these bpf_iter mptcp_subflow helpers are invoked in its
> > > selftest
> > > in a ftrace hook for mptcp_sched_get_send(), it's necessary to
> > > register
> > > them into a BPF_PROG_TYPE_TRACING type kfunc set together with
> > > other
> > > two used kfuncs mptcp_subflow_active() and
> > > mptcp_subflow_set_scheduled().
> > >
> > > Then bpf_for_each() for mptcp_subflow can be used in BPF program
> > > like
> > > this:
> > >
> > > i = 0;
> > > bpf_rcu_read_lock();
> > > bpf_for_each(mptcp_subflow, subflow, msk) {
> > > if (i++ >= MPTCP_SUBFLOWS_MAX)
> > > break;
> > > kfunc(subflow);
> > > }
> > > bpf_rcu_read_unlock();
> > >
> > > v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> > > drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2]
> > > (Andrii)
> > > v3: drop bpf_iter__mptcp_subflow
> > > v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> > > it in
> > > _next() (Andrii)
> > >
> > > Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > > ---
> > > net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-
> > > ----
> > > 1 file changed, 52 insertions(+), 5 deletions(-)
> > >
> >
> > Looks ok from setting up open-coded iterator things, but I can't
> > speak
> > for other aspects I mentioned below.
> >
> > > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > > index 6414824402e6..fec18e7e5e4a 100644
> > > --- a/net/mptcp/bpf.c
> > > +++ b/net/mptcp/bpf.c
> > > @@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set
> > > bpf_mptcp_fmodret_set = {
> > > .set = &bpf_mptcp_fmodret_ids,
> > > };
> > >
> > > -__diag_push();
> > > -__diag_ignore_all("-Wmissing-prototypes",
> > > - "kfuncs which will be used in BPF programs");
> > > +struct bpf_iter_mptcp_subflow {
> > > + __u64 __opaque[2];
> > > +} __attribute__((aligned(8)));
> > > +
> > > +struct bpf_iter_mptcp_subflow_kern {
> > > + struct mptcp_sock *msk;
> > > + struct list_head *pos;
> > > +} __attribute__((aligned(8)));
> > > +
> > > +__bpf_kfunc_start_defs();
> > > +
> > > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > > bpf_iter_mptcp_subflow *it,
> > > + struct mptcp_sock *msk)
> > > +{
> > > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > > +
> > > + kit->msk = msk;
> > > + if (!msk)
> > > + return -EINVAL;
> > > +
> > > + kit->pos = &msk->conn_list;
> > > + return 0;
> > > +}
> > > +
> > > +__bpf_kfunc struct mptcp_subflow_context *
> > > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> > > +{
> > > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > > + struct mptcp_subflow_context *subflow;
> > > + struct mptcp_sock *msk = kit->msk;
> > > +
> > > + if (!msk)
> > > + return NULL;
> > > +
> > > + subflow = list_entry(kit->pos->next, struct
> > > mptcp_subflow_context, node);
> > > + if (!subflow || list_entry_is_head(subflow, &msk-
> > > >conn_list, node))
> >
> > it's a bit weird that you need both !subflow and
> > list_entry_is_head().
> > Can you have NULL subflow at all? But this is the question to
> > msk->conn_list semantics.
>
> Do you mean to return subflow regardless of whether subflow is NULL? If
Can you have a NULL subflow in the middle of the list and some more
items after that? If yes, then you should *skip* NULL subflow.
But if the NULL subflow is always last, then do you even need
list_entry_is_head() or list_is_last()?
But ultimately, I have no idea what the data looks like, just
double-check that the stopping condition is correct, that's all. Has
nothing to do with BPF open-coded iterators.
> so, we need to use list_is_last() instead of list_entry_is_head():
>
> {
> struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
>
> if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
> return NULL;
>
> kit->pos = kit->pos->next;
> return list_entry(kit->pos, struct mptcp_subflow_context, node);
> }
>
> Hope this is a better version.
ok, works for me as well
>
> >
> > > + return NULL;
> > > +
> > > + kit->pos = &subflow->node;
> > > + return subflow;
> > > +}
> > > +
> > > +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct
> > > bpf_iter_mptcp_subflow *it)
> > > +{
> > > +}
> > >
> > > __bpf_kfunc struct mptcp_subflow_context *
> > > bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data,
> > > unsigned int pos)
> > > @@ -218,12 +260,15 @@ __bpf_kfunc bool
> > > bpf_mptcp_subflow_queues_empty(struct sock *sk)
> > > return tcp_rtx_queue_empty(sk);
> > > }
> > >
> > > -__diag_pop();
> > > +__bpf_kfunc_end_defs();
> > >
> > > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> >
> > I'm not 100% sure, but I suspect you might need to specify
> > KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a
>
> KF_TRUSTED_ARGS breaks the selftest in patch 2 [1]:
Please, for the next revision, CC bpf@vger.kernel.org on *all* patches
in your series, so we don't have to search for the selftests in
another mailing list.
>
> '''
> libbpf: prog 'trace_mptcp_sched_get_send': BPF program load failed:
> Invalid argument
> libbpf: prog 'trace_mptcp_sched_get_send': -- BEGIN PROG LOAD LOG --
> 0: R1=ctx() R10=fp0
> ; int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk) @
> mptcp_bpf_iter.c:13
> 0: (79) r6 = *(u64 *)(r1 +0)
> func 'mptcp_sched_get_send' arg0 has btf_id 28019 type STRUCT
> 'mptcp_sock'
> 1: R1=ctx() R6_w=ptr_mptcp_sock()
> ; if (bpf_get_current_pid_tgid() >> 32 != pid) @ mptcp_bpf_iter.c:17
> 1: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
> 2: (77) r0 >>= 32 ;
> R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> 3: (18) r1 = 0xffffb766c1684004 ;
> R1_w=map_value(map=mptcp_bp.bss,ks=4,vs=8,off=4)
> 5: (61) r1 = *(u32 *)(r1 +0) ;
> R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> 6: (67) r1 <<= 32 ;
> R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,sm
> ax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
> 7: (c7) r1 s>>= 32 ;
> R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
> 8: (5d) if r0 != r1 goto pc+39 ;
> R0_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
> 0x7fffffff))
> R1_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
> 0x7fffffff))
> ; iter = 0; @ mptcp_bpf_iter.c:20
> 9: (18) r8 = 0xffffb766c1684000 ;
> R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
> 11: (b4) w1 = 0 ; R1_w=0
> 12: (63) *(u32 *)(r8 +0) = r1 ; R1_w=0
> R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
> ; bpf_rcu_read_lock(); @ mptcp_bpf_iter.c:22
> 13: (85) call bpf_rcu_read_lock#84967 ;
> 14: (bf) r7 = r10 ; R7_w=fp0 R10=fp0
> ; iter = 0; @ mptcp_bpf_iter.c:20
> 15: (07) r7 += -16 ; R7_w=fp-16
> ; bpf_for_each(mptcp_subflow, subflow, msk) { @ mptcp_bpf_iter.c:23
> 16: (bf) r1 = r7 ; R1_w=fp-16 R7_w=fp-16
> 17: (bf) r2 = r6 ; R2_w=ptr_mptcp_sock()
> R6=ptr_mptcp_sock()
> 18: (85) call bpf_iter_mptcp_subflow_new#62694
> R2 must be referenced or trusted
> processed 17 insns (limit 1000000) max_states_per_insn 0 total_states 1
> peak_states 1 mark_read 1
> -- END PROG LOAD LOG --
> libbpf: prog 'trace_mptcp_sched_get_send': failed to load: -22
> libbpf: failed to load object 'mptcp_bpf_iter'
> libbpf: failed to load BPF skeleton 'mptcp_bpf_iter': -22
> test_bpf_iter:FAIL:skel_open_load: mptcp_iter unexpected error: -22
> #169/4 mptcp/bpf_iter:FAIL
> '''
>
> I don't know how to fix it yet.
And the verifier is pointing out the real problem (selftest is at [0]
for those interested).
struct mptcp_sock *msk argument for fentry/mptcp_sched_get_send can't
be trusted to be valid and be properly protected from being freed,
etc. You need to have kfuncs that will have KF_ACQUIRE semantics and
will take refcount or whatnot (or KF_RCU for RCU-protection).
See some examples where we do the same with tasks or maybe sockets,
I'm not very familiar with this area.
[0] https://lore.kernel.org/mptcp/4467ab3af0f1a6c378778ca6be72753b16a1532c.1726132802.git.tanggeliang@kylinos.cn/
>
> > valid owned kernel object. Other BPF folks might help to clarify
> > this.
> >
> > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
> > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
> > > +BTF_ID_FLAGS(func, mptcp_subflow_active)
>
> But we do need to add KF_ITER_NEW/NEXT/DESTROY flags here:
>
> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW)
> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT |
> KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
>
> With these flags, we can drop this additional test in loop:
>
> if (i++ >= MPTCP_SUBFLOWS_MAX)
> break;
>
>
> This problem has been bothering me for a while. I got "infinite loop
> detected" errors when walking the list with
> bpf_for_each(mptcp_subflow). So I had to use this additional test to
> break it.
>
> With these KF_ITER_NEW/NEXT/DESTROY flags, no need to use this
> additional test anymore.
>
These flags is what makes those functions an open-coded iterator
implementation, yes.
> Thanks,
> -Geliang
>
> [1]
> https://patchwork.kernel.org/project/mptcp/patch/4467ab3af0f1a6c378778ca6be72753b16a1532c.1726132802.git.tanggeliang@kylinos.cn/
>
> > > BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
> > > BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos)
> > > -BTF_ID_FLAGS(func, mptcp_subflow_active)
> > > BTF_ID_FLAGS(func, mptcp_set_timeout)
> > > BTF_ID_FLAGS(func, mptcp_wnd_end)
> > > BTF_ID_FLAGS(func, tcp_stream_memory_free)
> > > @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
> > > int ret;
> > >
> > > ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > > + ret = ret ?:
> > > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > > +
> > > &bpf_mptcp_sched_kfunc_set);
> > > ret = ret ?:
> > > register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> > >
> > > &bpf_mptcp_sched_kfunc_set);
> > > #ifdef CONFIG_BPF_JIT
> > > --
> > > 2.43.0
> > >
> > >
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-13 20:57 ` Andrii Nakryiko
@ 2024-09-14 0:41 ` Martin KaFai Lau
2024-09-14 8:40 ` Geliang Tang
0 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2024-09-14 0:41 UTC (permalink / raw)
To: Geliang Tang; +Cc: Andrii Nakryiko, mptcp, Geliang Tang, bpf, Martin KaFai Lau
On 9/13/24 1:57 PM, Andrii Nakryiko wrote:
>>>> +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
>>>> bpf_iter_mptcp_subflow *it,
>>>> + struct mptcp_sock *msk)
>>>> +{
>>>> + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
>>>> +
>>>> + kit->msk = msk;
>>>> + if (!msk)
>>>> + return -EINVAL;
>>>> +
>>>> + kit->pos = &msk->conn_list;
>>>> + return 0;
>>>> +}
[ ... ]
>>>> BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
>>>> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
>>>
>>> I'm not 100% sure, but I suspect you might need to specify
>>> KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a
+1
>>>> @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
>>>> int ret;
>>>>
>>>> ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
>>>> + ret = ret ?:
>>>> register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
>>>> +
>>>> &bpf_mptcp_sched_kfunc_set);
This cannot be used in tracing.
Going back to my earlier question in v1. How is the msk->conn_list protected?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-14 0:41 ` Martin KaFai Lau
@ 2024-09-14 8:40 ` Geliang Tang
2024-09-14 10:12 ` Geliang Tang
0 siblings, 1 reply; 19+ messages in thread
From: Geliang Tang @ 2024-09-14 8:40 UTC (permalink / raw)
To: Martin KaFai Lau, Matthieu Baerts
Cc: Andrii Nakryiko, mptcp, Geliang Tang, bpf, Martin KaFai Lau
Hi Martin, Andrii, Matt,
On Fri, 2024-09-13 at 17:41 -0700, Martin KaFai Lau wrote:
> On 9/13/24 1:57 PM, Andrii Nakryiko wrote:
> > > > > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > > > > bpf_iter_mptcp_subflow *it,
> > > > > + struct mptcp_sock
> > > > > *msk)
> > > > > +{
> > > > > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > > > > +
> > > > > + kit->msk = msk;
> > > > > + if (!msk)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + kit->pos = &msk->conn_list;
> > > > > + return 0;
> > > > > +}
>
> [ ... ]
>
> > > > > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > > > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> > > >
> > > > I'm not 100% sure, but I suspect you might need to specify
> > > > KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is
> > > > a
>
> +1
So we must add KF_TRUSTED_ARGS flag, right?
>
> > > > > @@ -241,6 +286,8 @@ static int __init
> > > > > bpf_mptcp_kfunc_init(void)
> > > > > int ret;
> > > > >
> > > > > ret =
> > > > > register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > > > > + ret = ret ?:
> > > > > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > > > > +
> > > > > &bpf_mptcp_sched_kfunc_set);
>
> This cannot be used in tracing.
Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.
We plan to use it in MPTCP BPF packet schedulers, which are not
tracing, but "struct_ops" types. And they work well with
KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
KF_TRUSTED_ARGS);
An example of the scheduler is:
SEC("struct_ops")
int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
struct mptcp_sched_data *data)
{
struct mptcp_subflow_context *subflow;
bpf_rcu_read_lock();
bpf_for_each(mptcp_subflow, subflow, msk) {
mptcp_subflow_set_scheduled(subflow, true);
break;
}
bpf_rcu_read_unlock();
return 0;
}
SEC(".struct_ops")
struct mptcp_sched_ops first = {
.init = (void *)mptcp_sched_first_init,
.release = (void *)mptcp_sched_first_release,
.get_subflow = (void *)bpf_first_get_subflow,
.name = "bpf_first",
};
But BPF mptcp_sched_ops code has not been merged into bpf-next yet, so
I simply test this bpf_for_each(mptcp_subflow) in tracing since I
noticed other bpf_iter selftests are using tracing too:
progs/iters_task.c
SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
progs/iters_css.c
SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
will try to move the selftest into a struct_ops.
>
> Going back to my earlier question in v1. How is the msk->conn_list
> protected?
>
msk->conn_list is protected by msk socket lock. (@Matt, am I right?) We
use this in kernel code:
struct sock *sk = (struct sock *)msk;
lock_sock(sk);
kfunc(&msk->conn_list);
release_sock(sk);
If so, should we also use lock_sock/release_sock in
bpf_iter_mptcp_subflow_next()?
Thanks,
-Geliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-14 8:40 ` Geliang Tang
@ 2024-09-14 10:12 ` Geliang Tang
2024-09-28 1:34 ` Martin KaFai Lau
0 siblings, 1 reply; 19+ messages in thread
From: Geliang Tang @ 2024-09-14 10:12 UTC (permalink / raw)
To: Martin KaFai Lau, Matthieu Baerts
Cc: Andrii Nakryiko, mptcp, Geliang Tang, bpf, Martin KaFai Lau
On Sat, 2024-09-14 at 16:40 +0800, Geliang Tang wrote:
> Hi Martin, Andrii, Matt,
>
> On Fri, 2024-09-13 at 17:41 -0700, Martin KaFai Lau wrote:
> > On 9/13/24 1:57 PM, Andrii Nakryiko wrote:
> > > > > > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > > > > > bpf_iter_mptcp_subflow *it,
> > > > > > + struct
> > > > > > mptcp_sock
> > > > > > *msk)
> > > > > > +{
> > > > > > + struct bpf_iter_mptcp_subflow_kern *kit = (void
> > > > > > *)it;
> > > > > > +
> > > > > > + kit->msk = msk;
> > > > > > + if (!msk)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + kit->pos = &msk->conn_list;
> > > > > > + return 0;
> > > > > > +}
> >
> > [ ... ]
> >
> > > > > > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > > > > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> > > > >
> > > > > I'm not 100% sure, but I suspect you might need to specify
> > > > > KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk`
> > > > > is
> > > > > a
> >
> > +1
>
> So we must add KF_TRUSTED_ARGS flag, right?
>
> >
> > > > > > @@ -241,6 +286,8 @@ static int __init
> > > > > > bpf_mptcp_kfunc_init(void)
> > > > > > int ret;
> > > > > >
> > > > > > ret =
> > > > > > register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > > > > > + ret = ret ?:
> > > > > > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > > > > > +
> > > > > > &bpf_mptcp_sched_kfunc_set);
> >
> > This cannot be used in tracing.
>
> Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.
>
> We plan to use it in MPTCP BPF packet schedulers, which are not
> tracing, but "struct_ops" types. And they work well with
> KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:
>
> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
> KF_TRUSTED_ARGS);
>
> An example of the scheduler is:
>
> SEC("struct_ops")
> int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
> struct mptcp_sched_data *data)
> {
> struct mptcp_subflow_context *subflow;
>
> bpf_rcu_read_lock();
> bpf_for_each(mptcp_subflow, subflow, msk) {
> mptcp_subflow_set_scheduled(subflow, true);
> break;
> }
> bpf_rcu_read_unlock();
>
> return 0;
> }
>
> SEC(".struct_ops")
> struct mptcp_sched_ops first = {
> .init = (void *)mptcp_sched_first_init,
> .release = (void *)mptcp_sched_first_release,
> .get_subflow = (void *)bpf_first_get_subflow,
> .name = "bpf_first",
> };
>
> But BPF mptcp_sched_ops code has not been merged into bpf-next yet,
> so
> I simply test this bpf_for_each(mptcp_subflow) in tracing since I
> noticed other bpf_iter selftests are using tracing too:
>
> progs/iters_task.c
> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>
> progs/iters_css.c
> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>
> If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
> will try to move the selftest into a struct_ops.
>
> >
> > Going back to my earlier question in v1. How is the msk->conn_list
> > protected?
> >
>
> msk->conn_list is protected by msk socket lock. (@Matt, am I right?)
> We
> use this in kernel code:
>
> struct sock *sk = (struct sock *)msk;
>
> lock_sock(sk);
> kfunc(&msk->conn_list);
> release_sock(sk);
>
> If so, should we also use lock_sock/release_sock in
> bpf_iter_mptcp_subflow_next()?
bpf_for_each(mptcp_subflow) is expected to be used in the get_subflow()
interface of mptcp_sched_ops to traverse all subflows and then pick one
subflow to send data. This interface is invoked in
mptcp_sched_get_send(), here the msk socket lock is hold already:
int mptcp_sched_get_send(struct mptcp_sock *msk)
{
struct mptcp_subflow_context *subflow;
struct mptcp_sched_data data;
msk_owned_by_me(msk);
... ...
mptcp_for_each_subflow(msk, subflow) {
if (READ_ONCE(subflow->scheduled))
return 0;
}
data.reinject = false;
if (msk->sched == &mptcp_sched_default || !msk->sched)
return mptcp_sched_default_get_subflow(msk, &data);
return msk->sched->get_subflow(msk, &data);
}
So no need to hold msk socket lock again in BPF schedulers. This means
we can walk the conn_list without any lock. bpf_rcu_read_lock() and
bpf_rcu_read_unlock() can be dropped in BPF schedulers too. Am I right?
Thanks,
-Geliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
2024-09-14 10:12 ` Geliang Tang
@ 2024-09-28 1:34 ` Martin KaFai Lau
0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2024-09-28 1:34 UTC (permalink / raw)
To: Geliang Tang
Cc: Matthieu Baerts, Andrii Nakryiko, mptcp, Geliang Tang, bpf,
Martin KaFai Lau
On 9/14/24 12:12 PM, Geliang Tang wrote:
>>>>>>> @@ -241,6 +286,8 @@ static int __init
>>>>>>> bpf_mptcp_kfunc_init(void)
>>>>>>> int ret;
>>>>>>>
>>>>>>> ret =
>>>>>>> register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
>>>>>>> + ret = ret ?:
>>>>>>> register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
>>>>>>> +
>>>>>>> &bpf_mptcp_sched_kfunc_set);
>>>
>>> This cannot be used in tracing.
>>
>> Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.
>>
>> We plan to use it in MPTCP BPF packet schedulers, which are not
>> tracing, but "struct_ops" types. And they work well with
>> KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:
>>
>> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
>> KF_TRUSTED_ARGS);
>>
>> An example of the scheduler is:
>>
>> SEC("struct_ops")
>> int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
>> struct mptcp_sched_data *data)
>> {
>> struct mptcp_subflow_context *subflow;
>>
>> bpf_rcu_read_lock();
>> bpf_for_each(mptcp_subflow, subflow, msk) {
>> mptcp_subflow_set_scheduled(subflow, true);
>> break;
>> }
>> bpf_rcu_read_unlock();
>>
>> return 0;
>> }
>>
>> SEC(".struct_ops")
>> struct mptcp_sched_ops first = {
>> .init = (void *)mptcp_sched_first_init,
>> .release = (void *)mptcp_sched_first_release,
>> .get_subflow = (void *)bpf_first_get_subflow,
>> .name = "bpf_first",
>> };
>>
>> But BPF mptcp_sched_ops code has not been merged into bpf-next yet,
>> so
>> I simply test this bpf_for_each(mptcp_subflow) in tracing since I
>> noticed other bpf_iter selftests are using tracing too:
>>
>> progs/iters_task.c
>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>
>> progs/iters_css.c
>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>
>> If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
>> will try to move the selftest into a struct_ops.
>>
>>>
>>> Going back to my earlier question in v1. How is the msk->conn_list
>>> protected?
>>>
>>
>> msk->conn_list is protected by msk socket lock. (@Matt, am I right?)
>> We use this in kernel code:
A KF_TRUSTED_ARGS msk does not mean its sock locked. That said, only the tp_btf
should have trusted msk args but still it is hard to audit all existing and
future tracepoints which may or may not have msk lock held.
If the subflow list is rcu-ify, it will be easier to reason for tracing use
case. Regardless, the use case is not for tracing, so this discussion is
probably moot now.
>>
>> struct sock *sk = (struct sock *)msk;
>>
>> lock_sock(sk);
>> kfunc(&msk->conn_list);
>> release_sock(sk);
>>
>> If so, should we also use lock_sock/release_sock in
>> bpf_iter_mptcp_subflow_next()?
>
> bpf_for_each(mptcp_subflow) is expected to be used in the get_subflow()
> interface of mptcp_sched_ops to traverse all subflows and then pick one
> subflow to send data. This interface is invoked in
> mptcp_sched_get_send(), here the msk socket lock is hold already:
>
> int mptcp_sched_get_send(struct mptcp_sock *msk)
> {
> struct mptcp_subflow_context *subflow;
> struct mptcp_sched_data data;
>
> msk_owned_by_me(msk);
>
> ... ...
>
> mptcp_for_each_subflow(msk, subflow) {
> if (READ_ONCE(subflow->scheduled))
> return 0;
> }
>
> data.reinject = false;
> if (msk->sched == &mptcp_sched_default || !msk->sched)
> return mptcp_sched_default_get_subflow(msk, &data);
> return msk->sched->get_subflow(msk, &data);
> }
>
> So no need to hold msk socket lock again in BPF schedulers. This means
> we can walk the conn_list without any lock. bpf_rcu_read_lock() and
> bpf_rcu_read_unlock() can be dropped in BPF schedulers too. Am I right?
hmm... don't know how bpf_rcu_read_lock() comes into picture considering the msk
lock has already been held in the struct_ops that you are planning to add.
Something you may need to consider on the subflow locking situation. Not sure
exactly what the bpf prog needs to do on a subflow. e.g. If it needs to change
some fields in the subflow, does it need to lock the subflow or the msk lock is
good enough.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-09-28 1:35 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-12 9:25 [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add " Geliang Tang
2024-09-12 18:24 ` Andrii Nakryiko
2024-09-13 4:04 ` Geliang Tang
2024-09-13 20:57 ` Andrii Nakryiko
2024-09-14 0:41 ` Martin KaFai Lau
2024-09-14 8:40 ` Geliang Tang
2024-09-14 10:12 ` Geliang Tang
2024-09-28 1:34 ` Martin KaFai Lau
2024-09-12 9:25 ` [PATCH mptcp-next v5 2/5] selftests/bpf: Add mptcp_subflow bpf_iter test prog Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 3/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 4/5] Squash to "mptcp: add sched_data helpers" Geliang Tang
2024-09-12 9:25 ` [PATCH mptcp-next v5 5/5] Squash to "selftests/bpf: Add bpf_burst scheduler & test" Geliang Tang
2024-09-12 9:52 ` [PATCH mptcp-next v5 0/5] add mptcp_subflow bpf_iter MPTCP CI
2024-09-12 9:54 ` Matthieu Baerts
2024-09-12 10:45 ` MPTCP CI
2024-09-12 11:10 ` MPTCP CI
2024-09-12 11:26 ` Matthieu Baerts
2024-09-12 11:44 ` MPTCP CI
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.