* [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog
@ 2022-09-29 7:04 Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline Martin KaFai Lau
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 7:04 UTC (permalink / raw)
To: ' ', ' '
Cc: 'Alexei Starovoitov ', 'Andrii Nakryiko ',
'Daniel Borkmann ', 'David Miller ',
'Jakub Kicinski ', 'Eric Dumazet ',
'Paolo Abeni ', ' '
From: Martin KaFai Lau <martin.lau@kernel.org>
The struct_ops is sharing the tracing-trampoline's enter/exit
function which tracks prog->active to avoid recursion. It turns
out the struct_ops bpf prog will hit this prog->active and
unnecessarily skipped running the struct_ops prog. eg. The
'.ssthresh' may run in_task() and then interrupted by softirq
that runs the same '.ssthresh'.
The kernel does not call the tcp-cc's ops in a recursive way,
so this set is to remove the recursion check for struct_ops prog.
v3:
- Clear the bpf_chg_cc_inprogress from the newly cloned tcp_sock
in tcp_create_openreq_child() because the listen sk can
be cloned without lock being held. (Eric Dumazet)
v2:
- v1 [0] turned into a long discussion on a few cases and also
whether it needs to follow the bpf_run_ctx chain if there is
tracing bpf_run_ctx (kprobe/trace/trampoline) running in between.
It is a good signal that it is not obvious enough to reason
about it and needs a tradeoff for a more straight forward approach.
This revision uses one bit out of an existing 1 byte hole
in the tcp_sock. It is in Patch 4.
[0]: https://lore.kernel.org/bpf/20220922225616.3054840-1-kafai@fb.com/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911
Martin KaFai Lau (5):
bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another
function
bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur
itself
selftests/bpf: Check -EBUSY for the recurred
bpf_setsockopt(TCP_CONGESTION)
arch/x86/net/bpf_jit_comp.c | 3 +
include/linux/bpf.h | 4 ++
include/linux/tcp.h | 6 ++
kernel/bpf/trampoline.c | 23 ++++++
net/core/filter.c | 70 ++++++++++++++-----
net/ipv4/tcp_minisocks.c | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++
tools/testing/selftests/bpf/progs/bpf_dctcp.c | 25 ++++---
8 files changed, 112 insertions(+), 24 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
@ 2022-09-29 7:04 ` Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() Martin KaFai Lau
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 7:04 UTC (permalink / raw)
To: ' ', ' '
Cc: 'Alexei Starovoitov ', 'Andrii Nakryiko ',
'Daniel Borkmann ', 'David Miller ',
'Jakub Kicinski ', 'Eric Dumazet ',
'Paolo Abeni ', ' '
From: Martin KaFai Lau <martin.lau@kernel.org>
The struct_ops prog is to allow using bpf to implement the functions in
a struct (eg. kernel module). The current usage is to implement the
tcp_congestion. The kernel does not call the tcp-cc's ops (ie.
the bpf prog) in a recursive way.
The struct_ops is sharing the tracing-trampoline's enter/exit
function which tracks prog->active to avoid recursion. It is
needed for tracing prog. However, it turns out the struct_ops
bpf prog will hit this prog->active and unnecessarily skipped
running the struct_ops prog. eg. The '.ssthresh' may run in_task()
and then interrupted by softirq that runs the same '.ssthresh'.
Skip running the '.ssthresh' will end up returning random value
to the caller.
The patch adds __bpf_prog_{enter,exit}_struct_ops for the
struct_ops trampoline. They do not track the prog->active
to detect recursion.
One exception is when the tcp_congestion's '.init' ops is doing
bpf_setsockopt(TCP_CONGESTION) and then recurs to the same
'.init' ops. This will be addressed in the following patches.
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
arch/x86/net/bpf_jit_comp.c | 3 +++
include/linux/bpf.h | 4 ++++
kernel/bpf/trampoline.c | 23 +++++++++++++++++++++++
3 files changed, 30 insertions(+)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 35796db58116..5b6230779cf3 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1836,6 +1836,9 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
if (p->aux->sleepable) {
enter = __bpf_prog_enter_sleepable;
exit = __bpf_prog_exit_sleepable;
+ } else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) {
+ enter = __bpf_prog_enter_struct_ops;
+ exit = __bpf_prog_exit_struct_ops;
} else if (p->expected_attach_type == BPF_LSM_CGROUP) {
enter = __bpf_prog_enter_lsm_cgroup;
exit = __bpf_prog_exit_lsm_cgroup;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5161fac0513f..f98ab36c09d3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -864,6 +864,10 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
struct bpf_tramp_run_ctx *run_ctx);
void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
struct bpf_tramp_run_ctx *run_ctx);
+u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
+ struct bpf_tramp_run_ctx *run_ctx);
+void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
+ struct bpf_tramp_run_ctx *run_ctx);
void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr);
void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 6f7b939321d6..bf0906e1e2b9 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -964,6 +964,29 @@ void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
rcu_read_unlock_trace();
}
+u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
+ struct bpf_tramp_run_ctx *run_ctx)
+ __acquires(RCU)
+{
+ rcu_read_lock();
+ migrate_disable();
+
+ run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
+
+ return bpf_prog_start_time();
+}
+
+void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
+ struct bpf_tramp_run_ctx *run_ctx)
+ __releases(RCU)
+{
+ bpf_reset_run_ctx(run_ctx->saved_run_ctx);
+
+ update_prog_stats(prog, start);
+ migrate_enable();
+ rcu_read_unlock();
+}
+
void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr)
{
percpu_ref_get(&tr->pcref);
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline Martin KaFai Lau
@ 2022-09-29 7:04 ` Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function Martin KaFai Lau
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 7:04 UTC (permalink / raw)
To: ' ', ' '
Cc: 'Alexei Starovoitov ', 'Andrii Nakryiko ',
'Daniel Borkmann ', 'David Miller ',
'Jakub Kicinski ', 'Eric Dumazet ',
'Paolo Abeni ', ' '
From: Martin KaFai Lau <martin.lau@kernel.org>
The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is
used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks.
However, it is not done for cg sock_ddr, cg sockops, and some of
the bpf_lsm_cgroup hooks.
The tcp-cc "cdg" should have very limited usage. This patch is to
move the "cdg" check to the common sol_tcp_sockopt() so that all
hooks have a consistent behavior. The motivation to make
this check consistent now is because the latter patch will
refactor the bpf_setsockopt(TCP_CONGESTION) into another function,
so it is better to take this chance to refactor this piece
also.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
net/core/filter.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 2fd9449026aa..f4cea3ff994a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5127,6 +5127,13 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
case TCP_CONGESTION:
if (*optlen < 2)
return -EINVAL;
+ /* "cdg" is the only cc that alloc a ptr
+ * in inet_csk_ca area. The bpf-tcp-cc may
+ * overwrite this ptr after switching to cdg.
+ */
+ if (!getopt && *optlen >= sizeof("cdg") - 1 &&
+ !strncmp("cdg", optval, *optlen))
+ return -ENOTSUPP;
break;
case TCP_SAVED_SYN:
if (*optlen < 1)
@@ -5285,12 +5292,6 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname,
BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level,
int, optname, char *, optval, int, optlen)
{
- if (level == SOL_TCP && optname == TCP_CONGESTION) {
- if (optlen >= sizeof("cdg") - 1 &&
- !strncmp("cdg", optval, optlen))
- return -ENOTSUPP;
- }
-
return _bpf_setsockopt(sk, level, optname, optval, optlen);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 bpf-next 3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() Martin KaFai Lau
@ 2022-09-29 7:04 ` Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Martin KaFai Lau
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 7:04 UTC (permalink / raw)
To: ' ', ' '
Cc: 'Alexei Starovoitov ', 'Andrii Nakryiko ',
'Daniel Borkmann ', 'David Miller ',
'Jakub Kicinski ', 'Eric Dumazet ',
'Paolo Abeni ', ' '
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into
another function. The next patch will add extra logic to avoid
recursion and this will make the latter patch easier to follow.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
net/core/filter.c | 45 ++++++++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 17 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index f4cea3ff994a..96f2f7a65e65 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5102,6 +5102,33 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
return 0;
}
+static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
+ int *optlen, bool getopt)
+{
+ if (*optlen < 2)
+ return -EINVAL;
+
+ if (getopt) {
+ if (!inet_csk(sk)->icsk_ca_ops)
+ return -EINVAL;
+ /* BPF expects NULL-terminated tcp-cc string */
+ optval[--(*optlen)] = '\0';
+ return do_tcp_getsockopt(sk, SOL_TCP, TCP_CONGESTION,
+ KERNEL_SOCKPTR(optval),
+ KERNEL_SOCKPTR(optlen));
+ }
+
+ /* "cdg" is the only cc that alloc a ptr
+ * in inet_csk_ca area. The bpf-tcp-cc may
+ * overwrite this ptr after switching to cdg.
+ */
+ if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
+ return -ENOTSUPP;
+
+ return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+ KERNEL_SOCKPTR(optval), *optlen);
+}
+
static int sol_tcp_sockopt(struct sock *sk, int optname,
char *optval, int *optlen,
bool getopt)
@@ -5125,16 +5152,7 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
return -EINVAL;
break;
case TCP_CONGESTION:
- if (*optlen < 2)
- return -EINVAL;
- /* "cdg" is the only cc that alloc a ptr
- * in inet_csk_ca area. The bpf-tcp-cc may
- * overwrite this ptr after switching to cdg.
- */
- if (!getopt && *optlen >= sizeof("cdg") - 1 &&
- !strncmp("cdg", optval, *optlen))
- return -ENOTSUPP;
- break;
+ return sol_tcp_sockopt_congestion(sk, optval, optlen, getopt);
case TCP_SAVED_SYN:
if (*optlen < 1)
return -EINVAL;
@@ -5159,13 +5177,6 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
return 0;
}
- if (optname == TCP_CONGESTION) {
- if (!inet_csk(sk)->icsk_ca_ops)
- return -EINVAL;
- /* BPF expects NULL-terminated tcp-cc string */
- optval[--(*optlen)] = '\0';
- }
-
return do_tcp_getsockopt(sk, SOL_TCP, optname,
KERNEL_SOCKPTR(optval),
KERNEL_SOCKPTR(optlen));
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
` (2 preceding siblings ...)
2022-09-29 7:04 ` [PATCH v3 bpf-next 3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function Martin KaFai Lau
@ 2022-09-29 7:04 ` Martin KaFai Lau
2022-09-29 16:07 ` Eric Dumazet
2022-09-29 7:04 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) Martin KaFai Lau
2022-09-29 16:40 ` [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog patchwork-bot+netdevbpf
5 siblings, 1 reply; 8+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 7:04 UTC (permalink / raw)
To: ' ', ' '
Cc: 'Alexei Starovoitov ', 'Andrii Nakryiko ',
'Daniel Borkmann ', 'David Miller ',
'Jakub Kicinski ', 'Eric Dumazet ',
'Paolo Abeni ', ' '
From: Martin KaFai Lau <martin.lau@kernel.org>
When a bad bpf prog '.init' calls
bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
.init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
... => .init => bpf_setsockopt(tcp_cc).
It was prevented by the prog->active counter before but the prog->active
detection cannot be used in struct_ops as explained in the earlier
patch of the set.
In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
in order to break the loop. This is done by using a bit of
an existing 1 byte hole in tcp_sock to check if there is
on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
Note that this essentially limits only the first '.init' can
call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
does not support ECN) and the second '.init' cannot fallback to
another cc. This applies even the second
bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
include/linux/tcp.h | 6 ++++++
net/core/filter.c | 28 +++++++++++++++++++++++++++-
net/ipv4/tcp_minisocks.c | 1 +
3 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a9fbe22732c3..3bdf687e2fb3 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -388,6 +388,12 @@ struct tcp_sock {
u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
* values defined in uapi/linux/tcp.h
*/
+ u8 bpf_chg_cc_inprogress:1; /* In the middle of
+ * bpf_setsockopt(TCP_CONGESTION),
+ * it is to avoid the bpf_tcp_cc->init()
+ * to recur itself by calling
+ * bpf_setsockopt(TCP_CONGESTION, "itself").
+ */
#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
#else
#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
diff --git a/net/core/filter.c b/net/core/filter.c
index 96f2f7a65e65..ac4c45c02da5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
int *optlen, bool getopt)
{
+ struct tcp_sock *tp;
+ int ret;
+
if (*optlen < 2)
return -EINVAL;
@@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
return -ENOTSUPP;
- return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+ /* It stops this looping
+ *
+ * .init => bpf_setsockopt(tcp_cc) => .init =>
+ * bpf_setsockopt(tcp_cc)" => .init => ....
+ *
+ * The second bpf_setsockopt(tcp_cc) is not allowed
+ * in order to break the loop when both .init
+ * are the same bpf prog.
+ *
+ * This applies even the second bpf_setsockopt(tcp_cc)
+ * does not cause a loop. This limits only the first
+ * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
+ * pick a fallback cc (eg. peer does not support ECN)
+ * and the second '.init' cannot fallback to
+ * another.
+ */
+ tp = tcp_sk(sk);
+ if (tp->bpf_chg_cc_inprogress)
+ return -EBUSY;
+
+ tp->bpf_chg_cc_inprogress = 1;
+ ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
KERNEL_SOCKPTR(optval), *optlen);
+ tp->bpf_chg_cc_inprogress = 0;
+ return ret;
}
static int sol_tcp_sockopt(struct sock *sk, int optname,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index cb95d88497ae..ddcdc2bc4c04 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -541,6 +541,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
newtp->fastopen_req = NULL;
RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
+ newtp->bpf_chg_cc_inprogress = 0;
tcp_bpf_clone(sk, newsk);
__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
` (3 preceding siblings ...)
2022-09-29 7:04 ` [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Martin KaFai Lau
@ 2022-09-29 7:04 ` Martin KaFai Lau
2022-09-29 16:40 ` [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog patchwork-bot+netdevbpf
5 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 7:04 UTC (permalink / raw)
To: ' ', ' '
Cc: 'Alexei Starovoitov ', 'Andrii Nakryiko ',
'Daniel Borkmann ', 'David Miller ',
'Jakub Kicinski ', 'Eric Dumazet ',
'Paolo Abeni ', ' '
From: Martin KaFai Lau <martin.lau@kernel.org>
This patch changes the bpf_dctcp test to ensure the recurred
bpf_setsockopt(TCP_CONGESTION) returns -EBUSY.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 +++
tools/testing/selftests/bpf/progs/bpf_dctcp.c | 25 +++++++++++++------
2 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index 2959a52ced06..e980188d4124 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -290,6 +290,10 @@ static void test_dctcp_fallback(void)
goto done;
ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res");
ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res");
+ /* All setsockopt(TCP_CONGESTION) in the recurred
+ * bpf_dctcp->init() should fail with -EBUSY.
+ */
+ ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 3, "ebusy_cnt");
err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len);
if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))
diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
index 9573be6122be..460682759aed 100644
--- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c
+++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/tcp.h>
+#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_tcp_helpers.h"
@@ -23,6 +24,7 @@ const char tcp_cdg[] = "cdg";
char cc_res[TCP_CA_NAME_MAX];
int tcp_cdg_res = 0;
int stg_result = 0;
+int ebusy_cnt = 0;
struct {
__uint(type, BPF_MAP_TYPE_SK_STORAGE);
@@ -64,16 +66,23 @@ void BPF_PROG(dctcp_init, struct sock *sk)
if (!(tp->ecn_flags & TCP_ECN_OK) && fallback[0]) {
/* Switch to fallback */
- bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
- (void *)fallback, sizeof(fallback));
- /* Switch back to myself which the bpf trampoline
- * stopped calling dctcp_init recursively.
+ if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+ (void *)fallback, sizeof(fallback)) == -EBUSY)
+ ebusy_cnt++;
+
+ /* Switch back to myself and the recurred dctcp_init()
+ * will get -EBUSY for all bpf_setsockopt(TCP_CONGESTION),
+ * except the last "cdg" one.
*/
- bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
- (void *)bpf_dctcp, sizeof(bpf_dctcp));
+ if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+ (void *)bpf_dctcp, sizeof(bpf_dctcp)) == -EBUSY)
+ ebusy_cnt++;
+
/* Switch back to fallback */
- bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
- (void *)fallback, sizeof(fallback));
+ if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+ (void *)fallback, sizeof(fallback)) == -EBUSY)
+ ebusy_cnt++;
+
/* Expecting -ENOTSUPP for tcp_cdg_res */
tcp_cdg_res = bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
(void *)tcp_cdg, sizeof(tcp_cdg));
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
2022-09-29 7:04 ` [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Martin KaFai Lau
@ 2022-09-29 16:07 ` Eric Dumazet
0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2022-09-29 16:07 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
David Miller, Jakub Kicinski, Paolo Abeni, kernel-team
On Thu, Sep 29, 2022 at 12:04 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> When a bad bpf prog '.init' calls
> bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
>
> .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
> ... => .init => bpf_setsockopt(tcp_cc).
>
> It was prevented by the prog->active counter before but the prog->active
> detection cannot be used in struct_ops as explained in the earlier
> patch of the set.
>
> In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
> in order to break the loop. This is done by using a bit of
> an existing 1 byte hole in tcp_sock to check if there is
> on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
>
> Note that this essentially limits only the first '.init' can
> call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
> does not support ECN) and the second '.init' cannot fallback to
> another cc. This applies even the second
> bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
>
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
` (4 preceding siblings ...)
2022-09-29 7:04 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) Martin KaFai Lau
@ 2022-09-29 16:40 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-29 16:40 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, netdev, ast, andrii, daniel, davem, kuba, edumazet, pabeni,
kernel-team
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Thu, 29 Sep 2022 00:04:02 -0700 you wrote:
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> The struct_ops is sharing the tracing-trampoline's enter/exit
> function which tracks prog->active to avoid recursion. It turns
> out the struct_ops bpf prog will hit this prog->active and
> unnecessarily skipped running the struct_ops prog. eg. The
> '.ssthresh' may run in_task() and then interrupted by softirq
> that runs the same '.ssthresh'.
>
> [...]
Here is the summary with links:
- [v3,bpf-next,1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
https://git.kernel.org/bpf/bpf-next/c/64696c40d03c
- [v3,bpf-next,2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
https://git.kernel.org/bpf/bpf-next/c/37cfbe0bf2e8
- [v3,bpf-next,3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
https://git.kernel.org/bpf/bpf-next/c/1e7d217faa11
- [v3,bpf-next,4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
https://git.kernel.org/bpf/bpf-next/c/061ff040710e
- [v3,bpf-next,5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
https://git.kernel.org/bpf/bpf-next/c/3411c5b6f8d6
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-09-29 16:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-29 7:04 [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function Martin KaFai Lau
2022-09-29 7:04 ` [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Martin KaFai Lau
2022-09-29 16:07 ` Eric Dumazet
2022-09-29 7:04 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) Martin KaFai Lau
2022-09-29 16:40 ` [PATCH v3 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).