* RE: [Intel-wired-lan] [PATCH] intel/i40e: delete if NULL check before dev_kfree_skb
From: G, GurucharanX @ 2022-05-17 7:07 UTC (permalink / raw)
To: Bernard Zhao, Brandeburg, Jesse, Nguyen, Anthony L,
David S. Miller, Jakub Kicinski, Paolo Abeni,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: bernard@vivo.com
In-Reply-To: <20220511065451.655335-1-zhaojunkui2008@126.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Bernard Zhao
> Sent: Wednesday, May 11, 2022 12:25 PM
> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>;
> Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>;
> intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Cc: bernard@vivo.com; Bernard Zhao <zhaojunkui2008@126.com>
> Subject: [Intel-wired-lan] [PATCH] intel/i40e: delete if NULL check before
> dev_kfree_skb
>
> dev_kfree_skb check if the input parameter NULL and do the right
> thing, there is no need to check again.
> This change is to cleanup the code a bit.
>
> Signed-off-by: Bernard Zhao <zhaojunkui2008@126.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
^ permalink raw reply
* [PATCH bpf-next v4 1/6] arm64: ftrace: Add ftrace direct call support
From: Xu Kuohai @ 2022-05-17 7:18 UTC (permalink / raw)
To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-1-xukuohai@huawei.com>
Add ftrace direct support for arm64.
1. When there is custom trampoline only, replace the fentry nop to a
jump instruction that jumps directly to the custom trampoline.
2. When ftrace trampoline and custom trampoline coexist, jump from
fentry to ftrace trampoline first, then jump to custom trampoline
when ftrace trampoline exits. The current unused register
pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
trampoline to custom trampoline.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
arch/arm64/Kconfig | 2 ++
arch/arm64/include/asm/ftrace.h | 12 ++++++++++++
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry-ftrace.S | 18 +++++++++++++++---
4 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 57c4c995965f..81cc330daafc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -177,6 +177,8 @@ config ARM64
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS \
if $(cc-option,-fpatchable-function-entry=2)
+ select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
+ if DYNAMIC_FTRACE_WITH_REGS
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
if DYNAMIC_FTRACE_WITH_REGS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 1494cfa8639b..14a35a5df0a1 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -78,6 +78,18 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
return addr;
}
+#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+static inline void arch_ftrace_set_direct_caller(struct pt_regs *regs,
+ unsigned long addr)
+{
+ /*
+ * Place custom trampoline address in regs->orig_x0 to let ftrace
+ * trampoline jump to it.
+ */
+ regs->orig_x0 = addr;
+}
+#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
struct dyn_ftrace;
int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 1197e7679882..b1ed0bf01c59 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -80,6 +80,7 @@ int main(void)
DEFINE(S_SDEI_TTBR1, offsetof(struct pt_regs, sdei_ttbr1));
DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save));
DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe));
+ DEFINE(S_ORIG_X0, offsetof(struct pt_regs, orig_x0));
DEFINE(PT_REGS_SIZE, sizeof(struct pt_regs));
BLANK();
#ifdef CONFIG_COMPAT
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index e535480a4069..dfe62c55e3a2 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -60,6 +60,9 @@
str x29, [sp, #S_FP]
.endif
+ /* Set orig_x0 to zero */
+ str xzr, [sp, #S_ORIG_X0]
+
/* Save the callsite's SP and LR */
add x10, sp, #(PT_REGS_SIZE + 16)
stp x9, x10, [sp, #S_LR]
@@ -119,12 +122,21 @@ ftrace_common_return:
/* Restore the callsite's FP, LR, PC */
ldr x29, [sp, #S_FP]
ldr x30, [sp, #S_LR]
- ldr x9, [sp, #S_PC]
-
+ ldr x10, [sp, #S_PC]
+
+ ldr x11, [sp, #S_ORIG_X0]
+ cbz x11, 1f
+ /* Set x9 to parent ip before jump to custom trampoline */
+ mov x9, x30
+ /* Set lr to self ip */
+ ldr x30, [sp, #S_PC]
+ /* Set x10 (used for return address) to custom trampoline */
+ mov x10, x11
+1:
/* Restore the callsite's SP */
add sp, sp, #PT_REGS_SIZE + 16
- ret x9
+ ret x10
SYM_CODE_END(ftrace_common)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
--
2.30.2
^ permalink raw reply related
* [PATCH bpf-next v4 2/6] ftrace: Fix deadloop caused by direct call in ftrace selftest
From: Xu Kuohai @ 2022-05-17 7:18 UTC (permalink / raw)
To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-1-xukuohai@huawei.com>
After direct call is enabled for arm64, ftrace selftest enters a
dead loop:
<trace_selftest_dynamic_test_func>:
00 bti c
01 mov x9, x30 <trace_direct_tramp>:
02 bl <trace_direct_tramp> ----------> ret
|
lr/x30 is 03, return to 03
|
03 mov w0, #0x0 <-----------------------------|
| |
| dead loop! |
| |
04 ret ---- lr/x30 is still 03, go back to 03 ----|
The reason is that when the direct caller trace_direct_tramp() returns
to the patched function trace_selftest_dynamic_test_func(), lr is still
the address after the instrumented instruction in the patched function,
so when the patched function exits, it returns to itself!
To fix this issue, we need to restore lr before trace_direct_tramp()
exits, so rewrite a dedicated trace_direct_tramp() for arm64.
Reported-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
---
arch/arm64/include/asm/ftrace.h | 10 ++++++++++
arch/arm64/kernel/entry-ftrace.S | 10 ++++++++++
kernel/trace/trace_selftest.c | 2 ++
3 files changed, 22 insertions(+)
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 14a35a5df0a1..6f6b184e72fb 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -126,6 +126,16 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
*/
return !strcmp(sym + 8, name);
}
+
+#ifdef CONFIG_FTRACE_SELFTEST
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+
+#define trace_direct_tramp trace_direct_tramp
+extern void trace_direct_tramp(void);
+
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+#endif /* CONFIG_FTRACE_SELFTEST */
+
#endif /* ifndef __ASSEMBLY__ */
#endif /* __ASM_FTRACE_H */
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index dfe62c55e3a2..a47e87d4d3dd 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -357,3 +357,13 @@ SYM_CODE_START(return_to_handler)
ret
SYM_CODE_END(return_to_handler)
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#ifdef CONFIG_FTRACE_SELFTEST
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_FUNC_START(trace_direct_tramp)
+ mov x10, x30
+ mov x30, x9
+ ret x10
+SYM_FUNC_END(trace_direct_tramp)
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+#endif /* CONFIG_FTRACE_SELFTEST */
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index abcadbe933bb..e7ccd0d10c39 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -785,8 +785,10 @@ static struct fgraph_ops fgraph_ops __initdata = {
};
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+#ifndef trace_direct_tramp
noinline __noclone static void trace_direct_tramp(void) { }
#endif
+#endif
/*
* Pretty much the same than for the function tracer from which the selftest
--
2.30.2
^ permalink raw reply related
* [PATCH bpf-next v4 5/6] bpf, arm64: bpf trampoline for arm64
From: Xu Kuohai @ 2022-05-17 7:18 UTC (permalink / raw)
To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-1-xukuohai@huawei.com>
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.
Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
result:
#9 /1 bpf_cookie/kprobe:OK
#9 /2 bpf_cookie/multi_kprobe_link_api:FAIL
#9 /3 bpf_cookie/multi_kprobe_attach_api:FAIL
#9 /4 bpf_cookie/uprobe:OK
#9 /5 bpf_cookie/tracepoint:OK
#9 /6 bpf_cookie/perf_event:OK
#9 /7 bpf_cookie/trampoline:OK
#9 /8 bpf_cookie/lsm:OK
#9 bpf_cookie:FAIL
#18 /1 bpf_tcp_ca/dctcp:OK
#18 /2 bpf_tcp_ca/cubic:OK
#18 /3 bpf_tcp_ca/invalid_license:OK
#18 /4 bpf_tcp_ca/dctcp_fallback:OK
#18 /5 bpf_tcp_ca/rel_setsockopt:OK
#18 bpf_tcp_ca:OK
#51 /1 dummy_st_ops/dummy_st_ops_attach:OK
#51 /2 dummy_st_ops/dummy_init_ret_value:OK
#51 /3 dummy_st_ops/dummy_init_ptr_arg:OK
#51 /4 dummy_st_ops/dummy_multiple_args:OK
#51 dummy_st_ops:OK
#55 fentry_fexit:OK
#56 fentry_test:OK
#57 /1 fexit_bpf2bpf/target_no_callees:OK
#57 /2 fexit_bpf2bpf/target_yes_callees:OK
#57 /3 fexit_bpf2bpf/func_replace:OK
#57 /4 fexit_bpf2bpf/func_replace_verify:OK
#57 /5 fexit_bpf2bpf/func_sockmap_update:OK
#57 /6 fexit_bpf2bpf/func_replace_return_code:OK
#57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK
#57 /8 fexit_bpf2bpf/func_replace_multi:OK
#57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK
#57 fexit_bpf2bpf:OK
#58 fexit_sleep:OK
#59 fexit_stress:OK
#60 fexit_test:OK
#67 get_func_args_test:OK
#68 get_func_ip_test:OK
#104 modify_return:OK
#237 xdp_bpf2bpf:OK
bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to lack of multi_kprobe on arm64.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
arch/arm64/net/bpf_jit_comp.c | 420 +++++++++++++++++++++++++++++++++-
1 file changed, 413 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 5ce6ed5f42a1..59cb96dc4335 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -142,6 +142,12 @@ static inline void emit_a64_mov_i64(const int reg, const u64 val,
}
}
+static inline void emit_bti(u32 insn, struct jit_ctx *ctx)
+{
+ if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
+ emit(insn, ctx);
+}
+
/*
* Kernel addresses in the vmalloc space use at most 48 bits, and the
* remaining bits are guaranteed to be 0x1. So we can compose the address
@@ -161,6 +167,14 @@ static inline void emit_addr_mov_i64(const int reg, const u64 val,
}
}
+static inline void emit_call(u64 target, struct jit_ctx *ctx)
+{
+ u8 tmp = bpf2a64[TMP_REG_1];
+
+ emit_addr_mov_i64(tmp, target, ctx);
+ emit(A64_BLR(tmp), ctx);
+}
+
static inline int bpf2a64_offset(int bpf_insn, int off,
const struct jit_ctx *ctx)
{
@@ -281,8 +295,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
*
*/
- if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
- emit(A64_BTI_C, ctx);
+ emit_bti(A64_BTI_C, ctx);
emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
emit(A64_NOP, ctx);
@@ -316,8 +329,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
}
/* BTI landing pad for the tail call, done with a BR */
- if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
- emit(A64_BTI_J, ctx);
+ emit_bti(A64_BTI_J, ctx);
}
emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx);
@@ -995,8 +1007,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
&func_addr, &func_addr_fixed);
if (ret < 0)
return ret;
- emit_addr_mov_i64(tmp, func_addr, ctx);
- emit(A64_BLR(tmp), ctx);
+ emit_call(func_addr, ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx);
break;
}
@@ -1340,6 +1351,13 @@ static int validate_code(struct jit_ctx *ctx)
if (a64_insn == AARCH64_BREAK_FAULT)
return -1;
}
+ return 0;
+}
+
+static int validate_ctx(struct jit_ctx *ctx)
+{
+ if (validate_code(ctx))
+ return -1;
if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries))
return -1;
@@ -1464,7 +1482,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
build_epilogue(&ctx);
/* 3. Extra pass to validate JITed code. */
- if (validate_code(&ctx)) {
+ if (validate_ctx(&ctx)) {
bpf_jit_binary_free(header);
prog = orig_prog;
goto out_off;
@@ -1535,6 +1553,394 @@ void bpf_jit_free_exec(void *addr)
return vfree(addr);
}
+static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
+ int args_off, int retval_off, int run_ctx_off,
+ bool save_ret)
+{
+ u32 *branch;
+ u64 enter_prog;
+ u64 exit_prog;
+ u8 tmp = bpf2a64[TMP_REG_1];
+ u8 r0 = bpf2a64[BPF_REG_0];
+ struct bpf_prog *p = l->link.prog;
+ int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
+
+ if (p->aux->sleepable) {
+ enter_prog = (u64)__bpf_prog_enter_sleepable;
+ exit_prog = (u64)__bpf_prog_exit_sleepable;
+ } else {
+ enter_prog = (u64)__bpf_prog_enter;
+ exit_prog = (u64)__bpf_prog_exit;
+ }
+
+ if (l->cookie == 0) {
+ /* if cookie is zero, one instruction is enough to store it */
+ emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx);
+ } else {
+ emit_a64_mov_i64(tmp, l->cookie, ctx);
+ emit(A64_STR64I(tmp, A64_SP, run_ctx_off + cookie_off), ctx);
+ }
+
+ /* save p to callee saved register x19 to avoid loading p with mov_i64
+ * each time.
+ */
+ emit_addr_mov_i64(A64_R(19), (const u64)p, ctx);
+
+ /* arg1: prog */
+ emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+ /* arg2: &run_ctx */
+ emit(A64_ADD_I(1, A64_R(1), A64_SP, run_ctx_off), ctx);
+
+ emit_call(enter_prog, ctx);
+
+ /* if (__bpf_prog_enter(prog) == 0)
+ * goto skip_exec_of_prog;
+ */
+ branch = ctx->image + ctx->idx;
+ emit(A64_NOP, ctx);
+
+ /* save return value to callee saved register x20 */
+ emit(A64_MOV(1, A64_R(20), r0), ctx);
+
+ emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+ if (!p->jited)
+ emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
+
+ emit_call((const u64)p->bpf_func, ctx);
+
+ /* store return value */
+ if (save_ret)
+ emit(A64_STR64I(r0, A64_SP, retval_off), ctx);
+
+ if (ctx->image) {
+ int offset = &ctx->image[ctx->idx] - branch;
+ *branch = A64_CBZ(1, A64_R(0), offset);
+ }
+ emit_bti(A64_BTI_J, ctx);
+
+ /* arg1: prog */
+ emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+ /* arg2: start time */
+ emit(A64_MOV(1, A64_R(1), A64_R(20)), ctx);
+ /* arg3: &run_ctx */
+ emit(A64_ADD_I(1, A64_R(2), A64_SP, run_ctx_off), ctx);
+
+ emit_call(exit_prog, ctx);
+}
+
+static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+ int args_off, int retval_off, int run_ctx_off,
+ u32 **branches)
+{
+ int i;
+
+ /* The first fmod_ret program will receive a garbage return value.
+ * Set this to 0 to avoid confusing the program.
+ */
+ emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
+ for (i = 0; i < tl->nr_links; i++) {
+ invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+ run_ctx_off, true);
+ /* if (*(u64 *)(sp + retval_off) != 0)
+ * goto do_fexit;
+ */
+ emit(A64_LDR64I(A64_R(10), A64_SP, retval_off), ctx);
+ /* Save the location of branch, and generate a nop.
+ * This nop will be replaced with a cbnz later.
+ */
+ branches[i] = ctx->image + ctx->idx;
+ emit(A64_NOP, ctx);
+ }
+}
+
+static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
+{
+ int i;
+
+ for (i = 0; i < nargs; i++) {
+ emit(A64_STR64I(i, A64_SP, args_off), ctx);
+ args_off += 8;
+ }
+}
+
+static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
+{
+ int i;
+
+ for (i = 0; i < nargs; i++) {
+ emit(A64_LDR64I(i, A64_SP, args_off), ctx);
+ args_off += 8;
+ }
+}
+
+/* Based on the x86's implementation of arch_prepare_bpf_trampoline().
+ *
+ * bpf prog and function entry before bpf trampoline hooked:
+ * mov x9, x30
+ * nop
+ *
+ * bpf prog and function entry after bpf trampoline hooked:
+ * mov x9, x30
+ * bl <bpf_trampoline>
+ *
+ */
+static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
+ struct bpf_tramp_links *tlinks, void *orig_call,
+ int nargs, u32 flags)
+{
+ int i;
+ int stack_size;
+ int retaddr_off;
+ int regs_off;
+ int retval_off;
+ int args_off;
+ int nargs_off;
+ int ip_off;
+ int run_ctx_off;
+ struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ bool save_ret;
+ u32 **branches = NULL;
+
+ /* trampoline stack layout:
+ * [ parent ip ]
+ * [ FP ]
+ * SP + retaddr_off [ self ip ]
+ * [ FP ]
+ *
+ * [ padding ] align SP to multiples of 16
+ *
+ * [ x20 ] callee saved reg x20
+ * sp + regs_off [ x19 ] callee saved reg x19
+ *
+ * SP + retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
+ * BPF_TRAMP_F_RET_FENTRY_RET
+ *
+ * [ argN ]
+ * [ ... ]
+ * sp + args_off [ arg1 ]
+ *
+ * SP + nargs_off [ args count ]
+ *
+ * SP + ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
+ *
+ * SP + run_ctx_off [ bpf_tramp_run_ctx ]
+ */
+
+ stack_size = 0;
+ run_ctx_off = stack_size;
+ /* room for bpf_tramp_run_ctx */
+ stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
+
+ ip_off = stack_size;
+ /* room for IP address argument */
+ if (flags & BPF_TRAMP_F_IP_ARG)
+ stack_size += 8;
+
+ nargs_off = stack_size;
+ /* room for args count */
+ stack_size += 8;
+
+ args_off = stack_size;
+ /* room for args */
+ stack_size += nargs * 8;
+
+ /* room for return value */
+ retval_off = stack_size;
+ save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+ if (save_ret)
+ stack_size += 8;
+
+ /* room for callee saved registers, currently x19 and x20 are used */
+ regs_off = stack_size;
+ stack_size += 16;
+
+ /* round up to multiples of 16 to avoid SPAlignmentFault */
+ stack_size = round_up(stack_size, 16);
+
+ /* return address locates above FP */
+ retaddr_off = stack_size + 8;
+
+ emit_bti(A64_BTI_C, ctx);
+
+ /* frame for parent function */
+ emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
+ emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+ /* frame for patched function */
+ emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+ emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+ /* allocate stack space */
+ emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
+ if (flags & BPF_TRAMP_F_IP_ARG) {
+ /* save ip address of the traced function */
+ emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx);
+ emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
+ }
+
+ /* save args count*/
+ emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
+ emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
+
+ /* save args */
+ save_args(ctx, args_off, nargs);
+
+ /* save callee saved registers */
+ emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
+ emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+ if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ emit_call((const u64)__bpf_tramp_enter, ctx);
+ }
+
+ for (i = 0; i < fentry->nr_links; i++)
+ invoke_bpf_prog(ctx, fentry->links[i], args_off,
+ retval_off, run_ctx_off,
+ flags & BPF_TRAMP_F_RET_FENTRY_RET);
+
+ if (fmod_ret->nr_links) {
+ branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *),
+ GFP_KERNEL);
+ if (!branches)
+ return -ENOMEM;
+
+ invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+ run_ctx_off, branches);
+ }
+
+ if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ restore_args(ctx, args_off, nargs);
+ /* call original func */
+ emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
+ emit(A64_BLR(A64_R(10)), ctx);
+ /* store return value */
+ emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+ /* reserve a nop for bpf_tramp_image_put */
+ im->ip_after_call = ctx->image + ctx->idx;
+ emit(A64_NOP, ctx);
+ }
+
+ /* update the branches saved in invoke_bpf_mod_ret with cbnz */
+ for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
+ int offset = &ctx->image[ctx->idx] - branches[i];
+ *branches[i] = A64_CBNZ(1, A64_R(10), offset);
+ }
+
+ if (fmod_ret->nr_links)
+ emit_bti(A64_BTI_J, ctx);
+
+ for (i = 0; i < fexit->nr_links; i++)
+ invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+ run_ctx_off, false);
+
+ if (flags & BPF_TRAMP_F_RESTORE_REGS)
+ restore_args(ctx, args_off, nargs);
+
+ if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ im->ip_epilogue = ctx->image + ctx->idx;
+ emit_bti(A64_BTI_J, ctx);
+ emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ emit_call((const u64)__bpf_tramp_exit, ctx);
+ }
+
+ /* restore callee saved register x19 and x20 */
+ emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
+ emit(A64_LDR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+ if (save_ret)
+ emit(A64_LDR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+ /* reset SP */
+ emit(A64_MOV(1, A64_SP, A64_FP), ctx);
+
+ /* pop frames */
+ emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
+ emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
+
+ if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+ /* skip patched function, return to parent */
+ emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+ emit(A64_RET(A64_R(9)), ctx);
+ } else {
+ /* return to patched function */
+ emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
+ emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+ emit(A64_RET(A64_R(10)), ctx);
+ }
+
+ if (ctx->image)
+ bpf_flush_icache(ctx->image, ctx->image + ctx->idx);
+
+ kfree(branches);
+
+ return ctx->idx;
+}
+
+static inline bool is_long_jump(void *ip, void *target)
+{
+ long offset;
+
+ /* when ip == NULL, the trampoline address is used by bpf_struct_ops
+ * as a normal kernel function pointer, which can be always jumped to,
+ * so don't treat it as a long jump.
+ */
+ if (ip == NULL)
+ return false;
+
+ offset = (long)target - (long)ip;
+ return offset < -SZ_128M || offset >= SZ_128M;
+}
+
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
+ void *image_end, const struct btf_func_model *m,
+ u32 flags, struct bpf_tramp_links *tlinks,
+ void *orig_call)
+{
+ int ret;
+ int nargs = m->nr_args;
+ int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
+ struct jit_ctx ctx = {
+ .image = NULL,
+ .idx = 0
+ };
+
+ /* Return error since the long jump is not implemented. Otherwise,
+ * ftrace_bug will be triggered when the fentry is patched by ftrace,
+ * making ftrace no longer work properly.
+ */
+ if (is_long_jump(orig_call, image))
+ return -ENOTSUPP;
+
+ /* the first 8 arguments are passed by registers */
+ if (nargs > 8)
+ return -ENOTSUPP;
+
+ ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+ if (ret < 0)
+ return ret;
+
+ if (ret > max_insns)
+ return -EFBIG;
+
+ ctx.image = image;
+ ctx.idx = 0;
+
+ jit_fill_hole(image, (unsigned int)(image_end - image));
+ ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+
+ if (ret > 0 && validate_code(&ctx) < 0)
+ ret = -EINVAL;
+
+ if (ret > 0)
+ ret *= AARCH64_INSN_SIZE;
+
+ return ret;
+}
+
static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
void *addr, u32 *insn)
{
--
2.30.2
^ permalink raw reply related
* [PATCH bpf-next v4 6/6] selftests/bpf: Fix trivial typo in fentry_fexit.c
From: Xu Kuohai @ 2022-05-17 7:18 UTC (permalink / raw)
To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-1-xukuohai@huawei.com>
The "ipv6" word in assertion message should be "fentry_fexit".
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
tools/testing/selftests/bpf/prog_tests/fentry_fexit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c b/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
index 130f5b82d2e6..e3c139bde46e 100644
--- a/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
+++ b/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
@@ -28,8 +28,8 @@ void test_fentry_fexit(void)
prog_fd = fexit_skel->progs.test1.prog_fd;
err = bpf_prog_test_run_opts(prog_fd, &topts);
- ASSERT_OK(err, "ipv6 test_run");
- ASSERT_OK(topts.retval, "ipv6 test retval");
+ ASSERT_OK(err, "fentry_fexit test_run");
+ ASSERT_OK(topts.retval, "fentry_fexit test retval");
fentry_res = (__u64 *)fentry_skel->bss;
fexit_res = (__u64 *)fexit_skel->bss;
--
2.30.2
^ permalink raw reply related
* [PATCH bpf-next v4 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
From: Xu Kuohai @ 2022-05-17 7:18 UTC (permalink / raw)
To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-1-xukuohai@huawei.com>
Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
it to replace nop with jump, or replace jump with nop.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
---
arch/arm64/net/bpf_jit.h | 1 +
arch/arm64/net/bpf_jit_comp.c | 107 +++++++++++++++++++++++++++++++---
2 files changed, 99 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index 194c95ccc1cf..1c4b0075a3e2 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -270,6 +270,7 @@
#define A64_BTI_C A64_HINT(AARCH64_INSN_HINT_BTIC)
#define A64_BTI_J A64_HINT(AARCH64_INSN_HINT_BTIJ)
#define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
+#define A64_NOP A64_HINT(AARCH64_INSN_HINT_NOP)
/* DMB */
#define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8ab4035dea27..5ce6ed5f42a1 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -9,6 +9,7 @@
#include <linux/bitfield.h>
#include <linux/bpf.h>
+#include <linux/memory.h>
#include <linux/filter.h>
#include <linux/printk.h>
#include <linux/slab.h>
@@ -18,6 +19,7 @@
#include <asm/cacheflush.h>
#include <asm/debug-monitors.h>
#include <asm/insn.h>
+#include <asm/patching.h>
#include <asm/set_memory.h>
#include "bpf_jit.h"
@@ -235,13 +237,13 @@ static bool is_lsi_offset(int offset, int scale)
return true;
}
+#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
+#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
+
/* Tail call offset to jump into */
-#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
- IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
-#define PROLOGUE_OFFSET 9
-#else
-#define PROLOGUE_OFFSET 8
-#endif
+#define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8)
+/* Offset of nop instruction in bpf prog entry to be poked */
+#define POKE_OFFSET (BTI_INSNS + 1)
static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
{
@@ -279,12 +281,15 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
*
*/
+ if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
+ emit(A64_BTI_C, ctx);
+
+ emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
+ emit(A64_NOP, ctx);
+
/* Sign lr */
if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
emit(A64_PACIASP, ctx);
- /* BTI landing pad */
- else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
- emit(A64_BTI_C, ctx);
/* Save FP and LR registers to stay align with ARM64 AAPCS */
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
@@ -1529,3 +1534,87 @@ void bpf_jit_free_exec(void *addr)
{
return vfree(addr);
}
+
+static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
+ void *addr, u32 *insn)
+{
+ if (!addr)
+ *insn = aarch64_insn_gen_nop();
+ else
+ *insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
+ (unsigned long)addr,
+ type);
+
+ return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
+}
+
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+ void *old_addr, void *new_addr)
+{
+ int ret;
+ u32 old_insn;
+ u32 new_insn;
+ u32 replaced;
+ unsigned long offset = ~0UL;
+ enum aarch64_insn_branch_type branch_type;
+ char namebuf[KSYM_NAME_LEN];
+
+ if (!__bpf_address_lookup((unsigned long)ip, NULL, &offset, namebuf))
+ /* Only poking bpf text is supported. Since kernel function
+ * entry is set up by ftrace, we reply on ftrace to poke kernel
+ * functions.
+ */
+ return -EINVAL;
+
+ /* bpf entry */
+ if (offset == 0UL)
+ /* skip to the nop instruction in bpf prog entry:
+ * bti c // if BTI enabled
+ * mov x9, x30
+ * nop
+ */
+ ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;
+
+ if (poke_type == BPF_MOD_CALL)
+ branch_type = AARCH64_INSN_BRANCH_LINK;
+ else
+ branch_type = AARCH64_INSN_BRANCH_NOLINK;
+
+ if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
+ return -EFAULT;
+
+ if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
+ return -EFAULT;
+
+ mutex_lock(&text_mutex);
+ if (aarch64_insn_read(ip, &replaced)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (replaced != old_insn) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ /* We call aarch64_insn_patch_text_nosync() to replace instruction
+ * atomically, so no other CPUs will fetch a half-new and half-old
+ * instruction. But there is chance that another CPU fetches the old
+ * instruction after bpf_arch_text_poke() finishes, that is, different
+ * CPUs may execute different versions of instructions at the same
+ * time before the icache is synchronized by hardware.
+ *
+ * 1. when a new trampoline is attached, it is not an issue for
+ * different CPUs to jump to different trampolines temporarily.
+ *
+ * 2. when an old trampoline is freed, we should wait for all other
+ * CPUs to exit the trampoline and make sure the trampoline is no
+ * longer reachable, since bpf_tramp_image_put() function already
+ * uses percpu_ref and rcu task to do the sync, no need to call the
+ * sync interface here.
+ */
+ ret = aarch64_insn_patch_text_nosync(ip, new_insn);
+out:
+ mutex_unlock(&text_mutex);
+ return ret;
+}
--
2.30.2
^ permalink raw reply related
* [PATCH bpf-next v4 3/6] bpf: Move is_valid_bpf_tramp_flags() to the public trampoline code
From: Xu Kuohai @ 2022-05-17 7:18 UTC (permalink / raw)
To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-1-xukuohai@huawei.com>
is_valid_bpf_tramp_flags() is not relevant to architecture, so move it
to the public trampoline code.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
arch/x86/net/bpf_jit_comp.c | 20 --------------------
include/linux/bpf.h | 6 ++++++
kernel/bpf/bpf_struct_ops.c | 4 ++--
kernel/bpf/trampoline.c | 34 +++++++++++++++++++++++++++++++---
4 files changed, 39 insertions(+), 25 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a2b6d197c226..7698ef3b4821 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1922,23 +1922,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
return 0;
}
-static bool is_valid_bpf_tramp_flags(unsigned int flags)
-{
- if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
- (flags & BPF_TRAMP_F_SKIP_FRAME))
- return false;
-
- /*
- * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
- * and it must be used alone.
- */
- if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
- (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
- return false;
-
- return true;
-}
-
/* Example:
* __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
* its 'struct btf_func_model' will be nr_args=2
@@ -2017,9 +2000,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
if (nr_args > 6)
return -ENOTSUPP;
- if (!is_valid_bpf_tramp_flags(flags))
- return -EINVAL;
-
/* Generated trampoline stack layout:
*
* RBP + 8 [ return address ]
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c107392b0ba7..5615f31b2a30 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -760,6 +760,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *i
const struct btf_func_model *m, u32 flags,
struct bpf_tramp_links *tlinks,
void *orig_call);
+int bpf_prepare_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
+ const struct btf_func_model *m, u32 flags,
+ struct bpf_tramp_links *tlinks,
+ void *orig_call);
+
+
/* these two functions are called from generated trampoline */
u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx);
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx);
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index d9a3c9207240..4d81675e0dd5 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -342,8 +342,8 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
tlinks[BPF_TRAMP_FENTRY].links[0] = link;
tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
- return arch_prepare_bpf_trampoline(NULL, image, image_end,
- model, flags, tlinks, NULL);
+ return bpf_prepare_trampoline(NULL, image, image_end, model, flags,
+ tlinks, NULL);
}
static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 93c7675f0c9e..bef786fe307d 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -363,9 +363,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
if (ip_arg)
flags |= BPF_TRAMP_F_IP_ARG;
- err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
- &tr->func.model, flags, tlinks,
- tr->func.addr);
+ err = bpf_prepare_trampoline(im, im->image, im->image + PAGE_SIZE,
+ &tr->func.model, flags, tlinks,
+ tr->func.addr);
if (err < 0)
goto out;
@@ -671,6 +671,34 @@ arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image
return -ENOTSUPP;
}
+static bool is_valid_bpf_tramp_flags(unsigned int flags)
+{
+ if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
+ (flags & BPF_TRAMP_F_SKIP_FRAME))
+ return false;
+
+ /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
+ * and it must be used alone.
+ */
+ if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
+ (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
+ return false;
+
+ return true;
+}
+
+int bpf_prepare_trampoline(struct bpf_tramp_image *tr, void *image,
+ void *image_end, const struct btf_func_model *m,
+ u32 flags, struct bpf_tramp_links *tlinks,
+ void *orig_call)
+{
+ if (!is_valid_bpf_tramp_flags(flags))
+ return -EINVAL;
+
+ return arch_prepare_bpf_trampoline(tr, image, image_end, m, flags,
+ tlinks, orig_call);
+}
+
static int __init init_trampolines(void)
{
int i;
--
2.30.2
^ permalink raw reply related
* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Stefan Schmidt @ 2022-05-17 7:08 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, johannes, alex.aring, mareklindner, sw,
a, sven, linux-wireless, linux-wpan
In-Reply-To: <20220516215638.1787257-1-kuba@kernel.org>
Hello.
On 16.05.22 23:56, Jakub Kicinski wrote:
> Most protocol-specific pointers in struct net_device are under
> a respective ifdef. Wireless is the notable exception. Since
> there's a sizable number of custom-built kernels for datacenter
> workloads which don't build wireless it seems reasonable to
> ifdefy those pointers as well.
>
> While at it move IPv4 and IPv6 pointers up, those are special
> for obvious reasons.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: johannes@sipsolutions.net
> CC: alex.aring@gmail.com
> CC: stefan@datenfreihafen.org
> CC: mareklindner@neomailbox.ch
> CC: sw@simonwunderlich.de
> CC: a@unstable.cc
> CC: sven@narfation.org
> CC: linux-wireless@vger.kernel.org
> CC: linux-wpan@vger.kernel.org
> ---
> include/linux/netdevice.h | 8 ++++++--
> include/net/cfg80211.h | 5 +----
> include/net/cfg802154.h | 2 ++
> net/batman-adv/hard-interface.c | 2 ++
> net/wireless/core.c | 6 ++++++
> 5 files changed, 17 insertions(+), 6 deletions(-)
For the ieee802154 changes:
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
regards
Stefan Schmidt
^ permalink raw reply
* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
From: Maxime Chevallier @ 2022-05-17 7:09 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko
In-Reply-To: <Yn/kWG7X1YCPk17F@shell.armlinux.org.uk>
Hello Russell,
Thanks for the review.
On Sat, 14 May 2022 18:18:17 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> On Sat, May 14, 2022 at 05:06:52PM +0200, Maxime Chevallier wrote:
> > +static int ipqess_do_ioctl(struct net_device *netdev, struct ifreq
> > *ifr, int cmd) +{
> > + struct ipqess *ess = netdev_priv(netdev);
> > +
> > + switch (cmd) {
> > + case SIOCGMIIPHY:
> > + case SIOCGMIIREG:
> > + case SIOCSMIIREG:
> > + return phylink_mii_ioctl(ess->phylink, ifr, cmd);
> > + default:
> > + break;
> > + }
> > +
> > + return -EOPNOTSUPP;
> > +}
>
> Is there a reason this isn't just:
>
> return phylink_mii_ioctl(ess->phylink, ifr, cmd);
Not really, an oversight on my part, I'll address that in v3
> ?
>
> > +static int ipqess_axi_probe(struct platform_device *pdev)
> > +{
> > + struct device_node *np = pdev->dev.of_node;
> > + struct net_device *netdev;
> > + phy_interface_t phy_mode;
> > + struct resource *res;
> > + struct ipqess *ess;
> > + int i, err = 0;
> > +
> > + netdev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(struct
> > ipqess),
> > + IPQESS_NETDEV_QUEUES,
> > + IPQESS_NETDEV_QUEUES);
> > + if (!netdev)
> > + return -ENOMEM;
> > +
> > + ess = netdev_priv(netdev);
> > + ess->netdev = netdev;
> > + ess->pdev = pdev;
> > + spin_lock_init(&ess->stats_lock);
> > + SET_NETDEV_DEV(netdev, &pdev->dev);
> > + platform_set_drvdata(pdev, netdev);
> > +
> > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > + ess->hw_addr = devm_ioremap_resource(&pdev->dev, res);
> > + if (IS_ERR(ess->hw_addr))
> > + return PTR_ERR(ess->hw_addr);
> > +
> > + err = of_get_phy_mode(np, &phy_mode);
> > + if (err) {
> > + dev_err(&pdev->dev, "incorrect phy-mode\n");
> > + return err;
> > + }
> > +
> > + ess->ess_clk = devm_clk_get(&pdev->dev, "ess");
> > + if (!IS_ERR(ess->ess_clk))
> > + clk_prepare_enable(ess->ess_clk);
> > +
> > + ess->ess_rst = devm_reset_control_get(&pdev->dev, "ess");
> > + if (IS_ERR(ess->ess_rst))
> > + goto err_clk;
> > +
> > + ipqess_reset(ess);
> > +
> > + ess->phylink_config.dev = &netdev->dev;
> > + ess->phylink_config.type = PHYLINK_NETDEV;
> > +
> > + __set_bit(PHY_INTERFACE_MODE_INTERNAL,
> > + ess->phylink_config.supported_interfaces);
>
> No mac capabilities?
My bad too, I also missed that. I'll also address that in v3.
> > +
> > + ess->phylink = phylink_create(&ess->phylink_config,
> > + of_fwnode_handle(np),
> > phy_mode,
> > + &ipqess_phylink_mac_ops);
>
Thanks,
Maxime
^ permalink raw reply
* Re: [PATCH bpf v2 2/4] bpf_trace: support 32-bit kernels in bpf_kprobe_multi_link_attach
From: Yonghong Song @ 2022-05-17 7:08 UTC (permalink / raw)
To: Eugene Syromiatnikov, Jiri Olsa, Masami Hiramatsu, Steven Rostedt,
Ingo Molnar, Alexei Starovoitov, Daniel Borkmann
Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, John Fastabend,
KP Singh, netdev, bpf, linux-kernel, Shuah Khan, linux-kselftest
In-Reply-To: <20220516230506.GA25374@asgard.redhat.com>
On 5/16/22 4:05 PM, Eugene Syromiatnikov wrote:
> It seems that there is no reason not to support 32-bit architectures;
> doing so requires a bit of rework with respect to cookies handling,
> however, as the current code implicitly assumes
> that sizeof(long) == sizeof(u64).
>
> Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
> ---
> kernel/trace/bpf_trace.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f1d4e68..bf5bcfb 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2406,16 +2406,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> struct bpf_link_primer link_primer;
> void __user *ucookies;
> unsigned long *addrs;
> - u32 flags, cnt, size;
> + u32 flags, cnt, size, cookies_size;
> void __user *uaddrs;
> u64 *cookies = NULL;
> void __user *usyms;
> int err;
>
> - /* no support for 32bit archs yet */
> - if (sizeof(u64) != sizeof(void *))
> - return -EOPNOTSUPP;
> -
> if (prog->expected_attach_type != BPF_TRACE_KPROBE_MULTI)
> return -EINVAL;
>
> @@ -2425,6 +2421,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>
> uaddrs = u64_to_user_ptr(attr->link_create.kprobe_multi.addrs);
> usyms = u64_to_user_ptr(attr->link_create.kprobe_multi.syms);
> + ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies);
> if (!!uaddrs == !!usyms)
> return -EINVAL;
>
> @@ -2432,8 +2429,11 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> if (!cnt)
> return -EINVAL;
>
> - if (check_mul_overflow(cnt, (u32)sizeof(*addrs), &size))
> + if (check_mul_overflow(cnt, (u32)sizeof(*addrs), &size) ||
> + (ucookies &&
> + check_mul_overflow(cnt, (u32)sizeof(*cookies), &cookies_size))) {
> return -EOVERFLOW;
> + }
> size = cnt * sizeof(*addrs);
size has been calculated, no need to calculate again.
> addrs = kvmalloc(size, GFP_KERNEL);
> if (!addrs)
> @@ -2450,14 +2450,14 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> goto error;
> }
>
> - ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies);
> if (ucookies) {
> - cookies = kvmalloc(size, GFP_KERNEL);
> + cookies_size = cnt * sizeof(*cookies);
same for cookies_size.
> + cookies = kvmalloc(cookies_size, GFP_KERNEL);
> if (!cookies) {
> err = -ENOMEM;
> goto error;
> }
> - if (copy_from_user(cookies, ucookies, size)) {
> + if (copy_from_user(cookies, ucookies, cookies_size)) {
> err = -EFAULT;
> goto error;
> }
^ permalink raw reply
* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
From: Maxime Chevallier @ 2022-05-17 7:11 UTC (permalink / raw)
To: Vladimir Oltean
Cc: davem@davemloft.net, Rob Herring, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
thomas.petazzoni@bootlin.com, Andrew Lunn, Florian Fainelli,
Heiner Kallweit, Russell King,
linux-arm-kernel@lists.infradead.org, Luka Perkov, Robert Marko
In-Reply-To: <20220514204438.urxot42jfazwnjlz@skbuf>
Hello Vlad,
On Sat, 14 May 2022 20:44:38 +0000
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> On Sat, May 14, 2022 at 05:06:52PM +0200, Maxime Chevallier wrote:
> > +/* locking is handled by the caller */
> > +static int ipqess_rx_buf_alloc_napi(struct ipqess_rx_ring *rx_ring)
> > +{
> > + struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
> > +
> > + buf->skb = napi_alloc_skb(&rx_ring->napi_rx,
> > IPQESS_RX_HEAD_BUFF_SIZE);
> > + if (!buf->skb)
> > + return -ENOMEM;
> > +
> > + return ipqess_rx_buf_prepare(buf, rx_ring);
> > +}
> > +
> > +static int ipqess_rx_buf_alloc(struct ipqess_rx_ring *rx_ring)
> > +{
> > + struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
> > +
> > + buf->skb = netdev_alloc_skb_ip_align(rx_ring->ess->netdev,
> > +
> > IPQESS_RX_HEAD_BUFF_SIZE); +
> > + if (!buf->skb)
> > + return -ENOMEM;
> > +
> > + return ipqess_rx_buf_prepare(buf, rx_ring);
> > +}
> > +
> > +static void ipqess_refill_work(struct work_struct *work)
> > +{
> > + struct ipqess_rx_ring_refill *rx_refill =
> > container_of(work,
> > + struct ipqess_rx_ring_refill, refill_work);
> > + struct ipqess_rx_ring *rx_ring = rx_refill->rx_ring;
> > + int refill = 0;
> > +
> > + /* don't let this loop by accident. */
> > + while (atomic_dec_and_test(&rx_ring->refill_count)) {
> > + napi_disable(&rx_ring->napi_rx);
> > + if (ipqess_rx_buf_alloc(rx_ring)) {
> > + refill++;
> > + dev_dbg(rx_ring->ppdev,
> > + "Not all buffers were
> > reallocated");
> > + }
> > + napi_enable(&rx_ring->napi_rx);
> > + }
> > +
> > + if (atomic_add_return(refill, &rx_ring->refill_count))
> > + schedule_work(&rx_refill->refill_work);
> > +}
> > +
> > +static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int
> > budget) +{
>
> > + while (done < budget) {
>
> > + num_desc += atomic_xchg(&rx_ring->refill_count, 0);
> > + while (num_desc) {
> > + if (ipqess_rx_buf_alloc_napi(rx_ring)) {
> > + num_desc =
> > atomic_add_return(num_desc,
> > +
> > &rx_ring->refill_count);
> > + if (num_desc >= ((4 *
> > IPQESS_RX_RING_SIZE + 6) / 7))
>
> DIV_ROUND_UP(IPQESS_RX_RING_SIZE * 4, 7)
> Also, why this number?
Ah this was from the original out-of-tree driver... I'll try to figure
out what's going on an replace that by some #define that would make
more sense.
> > +
> > schedule_work(&rx_ring->ess->rx_refill[rx_ring->ring_id].refill_work);
> > + break;
> > + }
> > + num_desc--;
> > + }
> > + }
> > +
> > + ipqess_w32(rx_ring->ess,
> > IPQESS_REG_RX_SW_CONS_IDX_Q(rx_ring->idx),
> > + rx_ring_tail);
> > + rx_ring->tail = rx_ring_tail;
> > +
> > + return done;
> > +}
>
> > +static void ipqess_rx_ring_free(struct ipqess *ess)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > + int j;
> > +
> > + atomic_set(&ess->rx_ring[i].refill_count, 0);
> > + cancel_work_sync(&ess->rx_refill[i].refill_work);
>
> When refill_work is currently scheduled and executing the while loop,
> will refill_count underflow due to the possibility of calling
> atomic_dec_and_test(0)?
Good question, I'll double-check, you might be correct. Nice catch
> > +
> > + for (j = 0; j < IPQESS_RX_RING_SIZE; j++) {
> > + dma_unmap_single(&ess->pdev->dev,
> > +
> > ess->rx_ring[i].buf[j].dma,
> > +
> > ess->rx_ring[i].buf[j].length,
> > + DMA_FROM_DEVICE);
> > +
> > dev_kfree_skb_any(ess->rx_ring[i].buf[j].skb);
> > + }
> > + }
> > +
Thanks,
Maxime
^ permalink raw reply
* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
From: Maxime Chevallier @ 2022-05-17 7:13 UTC (permalink / raw)
To: Andrew Lunn
Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
thomas.petazzoni, Florian Fainelli, Heiner Kallweit, Russell King,
linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko
In-Reply-To: <YoG8F8V0Z+7hz/jw@lunn.ch>
Hi Andrew,
On Mon, 16 May 2022 04:51:03 +0200
Andrew Lunn <andrew@lunn.ch> wrote:
> > +static int ipqess_tx_ring_alloc(struct ipqess *ess)
> > +{
> > + struct device *dev = &ess->pdev->dev;
> > + int i;
> > +
> > + for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > + struct ipqess_tx_ring *tx_ring = &ess->tx_ring[i];
> > + size_t size;
> > + u32 idx;
> > +
> > + tx_ring->ess = ess;
> > + tx_ring->ring_id = i;
> > + tx_ring->idx = i * 4;
> > + tx_ring->count = IPQESS_TX_RING_SIZE;
> > + tx_ring->nq = netdev_get_tx_queue(ess->netdev, i);
> > +
> > + size = sizeof(struct ipqess_buf) *
> > IPQESS_TX_RING_SIZE;
> > + tx_ring->buf = devm_kzalloc(dev, size, GFP_KERNEL);
> > + if (!tx_ring->buf) {
> > + netdev_err(ess->netdev, "buffer alloc of
> > tx ring failed");
> > + return -ENOMEM;
> > + }
>
> kzalloc() is pretty loud when there is no memory. So you see patches
> removing such warning messages.
Ack, I'll remove that
> > +static int ipqess_rx_napi(struct napi_struct *napi, int budget)
> > +{
> > + struct ipqess_rx_ring *rx_ring = container_of(napi, struct
> > ipqess_rx_ring,
> > + napi_rx);
> > + struct ipqess *ess = rx_ring->ess;
> > + u32 rx_mask = BIT(rx_ring->idx);
> > + int remain_budget = budget;
> > + int rx_done;
> > + u32 status;
> > +
> > +poll_again:
> > + ipqess_w32(ess, IPQESS_REG_RX_ISR, rx_mask);
> > + rx_done = ipqess_rx_poll(rx_ring, remain_budget);
> > +
> > + if (rx_done == remain_budget)
> > + return budget;
> > +
> > + status = ipqess_r32(ess, IPQESS_REG_RX_ISR);
> > + if (status & rx_mask) {
> > + remain_budget -= rx_done;
> > + goto poll_again;
> > + }
>
> Could this be turned into a do while() loop?
Yes indeed, I'll fix this for v3
> > +static void ipqess_irq_enable(struct ipqess *ess)
> > +{
> > + int i;
> > +
> > + ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
> > + ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
> > + for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > + ipqess_w32(ess,
> > IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 1);
> > + ipqess_w32(ess,
> > IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 1);
> > + }
> > +}
> > +
> > +static void ipqess_irq_disable(struct ipqess *ess)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > + ipqess_w32(ess,
> > IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 0);
> > + ipqess_w32(ess,
> > IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 0);
> > + }
> > +}
>
> Enable and disable are not symmetric?
Ah nice catch too, I'll dig into this, either to make it symmetric or
to explain with a comment why it isn't
>
> > +static inline void ipqess_kick_tx(struct ipqess_tx_ring *tx_ring)
>
> No inline functions please in .c files. Let the compiler decide.
Ack, I'll address that.
> Andrew
Thanks again for the review
Maxime
^ permalink raw reply
* Re: [PATCH RESEND net] bonding: fix missed rcu protection
From: Paolo Abeni @ 2022-05-17 7:24 UTC (permalink / raw)
To: Hangbin Liu, Jakub Kicinski
Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
David S . Miller, David Ahern, Jonathan Toppins, Eric Dumazet,
syzbot+92beb3d46aab498710fa, Vladimir Oltean
In-Reply-To: <YoMZvrPcgIm8k2b6@Laptop-X1>
On Tue, 2022-05-17 at 11:42 +0800, Hangbin Liu wrote:
> On Mon, May 16, 2022 at 06:10:28PM -0700, Jakub Kicinski wrote:
> > Can't ->get_ts_info sleep now? It'd be a little sad to force it
> > to be atomic just because of one upper dev trying to be fancy.
> > Maybe all we need to do is to take a ref on the real_dev?
>
> Do you mean
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 38e152548126..b60450211579 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5591,16 +5591,20 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
> const struct ethtool_ops *ops;
> struct net_device *real_dev;
> struct phy_device *phydev;
> + int ret = 0;
>
You additionally need something alike the following:
rcu_read_lock();
> real_dev = bond_option_active_slave_get_rcu(bond);
> if (real_dev) {
> + dev_hold(real_dev)
rcu_read_unlock();
> ops = real_dev->ethtool_ops;
> phydev = real_dev->phydev;
>
> if (phy_has_tsinfo(phydev)) {
> - return phy_ts_info(phydev, info);
> + ret = phy_ts_info(phydev, info);
> + goto out;
> } else if (ops->get_ts_info) {
> - return ops->get_ts_info(real_dev, info);
> + ret = ops->get_ts_info(real_dev, info);
> + goto out;
> }
} else {
rcu_read_unlock();
> }
... or you will hit the initial RCU splat. Overall this will not put
atomicy constraint on get_ts_info.
Cheers,
Paol
>
> @@ -5608,7 +5612,10 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
> SOF_TIMESTAMPING_SOFTWARE;
> info->phc_index = -1;
>
> - return 0;
> +out:
> + if (real_dev)
> + dev_put(real_dev);
> + return ret;
> }
>
>
> This look OK to me.
>
> Vladimir, Jay, WDYT?
>
> >
> > Also please add a Link: to the previous discussion, it'd have been
> > useful to get the context in which Vladimir suggested this.
>
> OK, I will.
>
> Thanks
> Hangbin
>
^ permalink raw reply
* [PATCH 1/3] net: macb: Fix PTP one step sync support
From: Harini Katakam @ 2022-05-17 7:32 UTC (permalink / raw)
To: nicolas.ferre, davem, richardcochran, claudiu.beznea, kuba,
pabeni
Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
harini.katakam, radhey.shyam.pandey
In-Reply-To: <20220517073259.23476-1-harini.katakam@xilinx.com>
PTP one step sync packets cannot have CSUM padding and insertion in
SW since time stamp is inserted on the fly by HW.
In addition, ptp4l version 3.0 and above report an error when skb
timestamps are reported for packets that not processed for TX TS
after transmission.
Add a helper to identify PTP one step sync and fix the above two
errors.
Also reset ptp OSS bit when one step is not selected.
Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support")
Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
---
Notes:
-> Added the macb pad and fcs fixes tag because strictly speaking the PTP support
patch precedes the fcs patch in timeline.
-> FYI, the error observed with setting HW TX timestamp for one step sync packets:
ptp4l[405.292]: port 1: unexpected socket error
drivers/net/ethernet/cadence/macb_main.c | 54 ++++++++++++++++++++----
drivers/net/ethernet/cadence/macb_ptp.c | 4 +-
2 files changed, 48 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e993616308f8..e23a03e8badf 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -37,6 +37,7 @@
#include <linux/phy/phy.h>
#include <linux/pm_runtime.h>
#include <linux/reset.h>
+#include <linux/ptp_classify.h>
#include "macb.h"
/* This structure is only used for MACB on SiFive FU540 devices */
@@ -98,6 +99,9 @@ struct sifive_fu540_macb_mgmt {
#define MACB_MDIO_TIMEOUT 1000000 /* in usecs */
+/* IEEE1588 PTP flag field values */
+#define PTP_FLAG_TWOSTEP 0x2
+
/* DMA buffer descriptor might be different size
* depends on hardware configuration:
*
@@ -1122,6 +1126,36 @@ static void macb_tx_error_task(struct work_struct *work)
napi_enable(&queue->napi_tx);
}
+static inline bool ptp_oss(struct sk_buff *skb)
+{
+ struct ptp_header *hdr;
+ unsigned int ptp_class;
+ u8 msgtype;
+
+ /* No need to parse packet if PTP TS is not involved */
+ if (!(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+ goto not_oss;
+
+ /* Identify and return whether PTP one step sync is being processed */
+ ptp_class = ptp_classify_raw(skb);
+ if (ptp_class == PTP_CLASS_NONE)
+ goto not_oss;
+
+ hdr = ptp_parse_header(skb, ptp_class);
+ if (!hdr)
+ goto not_oss;
+
+ if (hdr->flag_field[0] & PTP_FLAG_TWOSTEP)
+ goto not_oss;
+
+ msgtype = ptp_get_msgtype(hdr, ptp_class);
+ if (msgtype == PTP_MSGTYPE_SYNC)
+ return true;
+
+not_oss:
+ return false;
+}
+
static int macb_tx_complete(struct macb_queue *queue, int budget)
{
struct macb *bp = queue->bp;
@@ -1158,13 +1192,14 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
/* First, update TX stats if needed */
if (skb) {
- if (unlikely(skb_shinfo(skb)->tx_flags &
- SKBTX_HW_TSTAMP) &&
- gem_ptp_do_txstamp(queue, skb, desc) == 0) {
- /* skb now belongs to timestamp buffer
- * and will be removed later
- */
- tx_skb->skb = NULL;
+ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
+ !ptp_oss(skb)) {
+ if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
+ /* skb now belongs to timestamp buffer
+ * and will be removed later
+ */
+ tx_skb->skb = NULL;
+ }
}
netdev_vdbg(bp->dev, "skb %u (data %p) TX complete\n",
macb_tx_ring_wrap(bp, tail),
@@ -2063,7 +2098,7 @@ static unsigned int macb_tx_map(struct macb *bp,
ctrl |= MACB_BF(TX_LSO, lso_ctrl);
ctrl |= MACB_BF(TX_TCP_SEQ_SRC, seq_ctrl);
if ((bp->dev->features & NETIF_F_HW_CSUM) &&
- skb->ip_summed != CHECKSUM_PARTIAL && !lso_ctrl)
+ skb->ip_summed != CHECKSUM_PARTIAL && !lso_ctrl && !ptp_oss(skb))
ctrl |= MACB_BIT(TX_NOCRC);
} else
/* Only set MSS/MFS on payload descriptors
@@ -2159,9 +2194,10 @@ static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
struct sk_buff *nskb;
u32 fcs;
+ /* Not available for GSO and PTP one step sync */
if (!(ndev->features & NETIF_F_HW_CSUM) ||
!((*skb)->ip_summed != CHECKSUM_PARTIAL) ||
- skb_shinfo(*skb)->gso_size) /* Not available for GSO */
+ skb_shinfo(*skb)->gso_size || ptp_oss(*skb))
return 0;
if (padlen <= 0) {
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
index fb6b27f46b15..9559c16078f9 100644
--- a/drivers/net/ethernet/cadence/macb_ptp.c
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -470,8 +470,10 @@ int gem_set_hwtst(struct net_device *dev, struct ifreq *ifr, int cmd)
case HWTSTAMP_TX_ONESTEP_SYNC:
if (gem_ptp_set_one_step_sync(bp, 1) != 0)
return -ERANGE;
- fallthrough;
+ tx_bd_control = TSTAMP_ALL_FRAMES;
+ break;
case HWTSTAMP_TX_ON:
+ gem_ptp_set_one_step_sync(bp, 0);
tx_bd_control = TSTAMP_ALL_FRAMES;
break;
default:
--
2.17.1
^ permalink raw reply related
* [PATCH 0/3] Macb PTP updates
From: Harini Katakam @ 2022-05-17 7:32 UTC (permalink / raw)
To: nicolas.ferre, davem, richardcochran, claudiu.beznea, kuba,
pabeni
Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
harini.katakam, radhey.shyam.pandey
Macb PTP updates to handle PTP one step sync support and other minor
enhancements.
Harini Katakam (3):
net: macb: Fix PTP one step sync support
net: macb: Enable PTP unicast
net: macb: Optimize reading HW timestamp
drivers/net/ethernet/cadence/macb.h | 4 ++
drivers/net/ethernet/cadence/macb_main.c | 61 +++++++++++++++++++-----
drivers/net/ethernet/cadence/macb_ptp.c | 12 +++--
3 files changed, 63 insertions(+), 14 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH 3/3] net: macb: Optimize reading HW timestamp
From: Harini Katakam @ 2022-05-17 7:32 UTC (permalink / raw)
To: nicolas.ferre, davem, richardcochran, claudiu.beznea, kuba,
pabeni
Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
harini.katakam, radhey.shyam.pandey
In-Reply-To: <20220517073259.23476-1-harini.katakam@xilinx.com>
The seconds input from BD (6 bits) just needs to be ORed with the
upper bits from timer in this function. Avoid +/- operations every
single time. Check for seconds rollover at BIT 5 and subtract the
overhead only in that case.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
---
drivers/net/ethernet/cadence/macb_ptp.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
index 9559c16078f9..c853518c0406 100644
--- a/drivers/net/ethernet/cadence/macb_ptp.c
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -247,6 +247,7 @@ static int gem_hw_timestamp(struct macb *bp, u32 dma_desc_ts_1,
u32 dma_desc_ts_2, struct timespec64 *ts)
{
struct timespec64 tsu;
+ bool sec_rollover = false;
ts->tv_sec = (GEM_BFEXT(DMA_SECH, dma_desc_ts_2) << GEM_DMA_SECL_SIZE) |
GEM_BFEXT(DMA_SECL, dma_desc_ts_1);
@@ -264,9 +265,12 @@ static int gem_hw_timestamp(struct macb *bp, u32 dma_desc_ts_1,
*/
if ((ts->tv_sec & (GEM_DMA_SEC_TOP >> 1)) &&
!(tsu.tv_sec & (GEM_DMA_SEC_TOP >> 1)))
- ts->tv_sec -= GEM_DMA_SEC_TOP;
+ sec_rollover = true;
+
+ ts->tv_sec |= ((~GEM_DMA_SEC_MASK) & tsu.tv_sec);
- ts->tv_sec += ((~GEM_DMA_SEC_MASK) & tsu.tv_sec);
+ if (sec_rollover)
+ ts->tv_sec -= GEM_DMA_SEC_TOP;
return 0;
}
--
2.17.1
^ permalink raw reply related
* [PATCH 2/3] net: macb: Enable PTP unicast
From: Harini Katakam @ 2022-05-17 7:32 UTC (permalink / raw)
To: nicolas.ferre, davem, richardcochran, claudiu.beznea, kuba,
pabeni
Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
harini.katakam, radhey.shyam.pandey
In-Reply-To: <20220517073259.23476-1-harini.katakam@xilinx.com>
Enable transmission and reception of PTP unicast packets by
updating PTP unicast config bit and setting current HW mac
address as allowed address in PTP unicast filter registers.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
---
drivers/net/ethernet/cadence/macb.h | 4 ++++
drivers/net/ethernet/cadence/macb_main.c | 7 +++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 7ca077b65eaa..d245fd78ec51 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -95,6 +95,8 @@
#define GEM_SA4B 0x00A0 /* Specific4 Bottom */
#define GEM_SA4T 0x00A4 /* Specific4 Top */
#define GEM_WOL 0x00b8 /* Wake on LAN */
+#define GEM_RXPTPUNI 0x00D4 /* PTP RX Unicast address */
+#define GEM_TXPTPUNI 0x00D8 /* PTP TX Unicast address */
#define GEM_EFTSH 0x00e8 /* PTP Event Frame Transmitted Seconds Register 47:32 */
#define GEM_EFRSH 0x00ec /* PTP Event Frame Received Seconds Register 47:32 */
#define GEM_PEFTSH 0x00f0 /* PTP Peer Event Frame Transmitted Seconds Register 47:32 */
@@ -245,6 +247,8 @@
#define MACB_TZQ_OFFSET 12 /* Transmit zero quantum pause frame */
#define MACB_TZQ_SIZE 1
#define MACB_SRTSM_OFFSET 15 /* Store Receive Timestamp to Memory */
+#define MACB_PTPUNI_OFFSET 20 /* PTP Unicast packet enable */
+#define MACB_PTPUNI_SIZE 1
#define MACB_OSSMODE_OFFSET 24 /* Enable One Step Synchro Mode */
#define MACB_OSSMODE_SIZE 1
#define MACB_MIIONRGMII_OFFSET 28 /* MII Usage on RGMII Interface */
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e23a03e8badf..19276583811e 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -290,6 +290,9 @@ static void macb_set_hwaddr(struct macb *bp)
top = cpu_to_le16(*((u16 *)(bp->dev->dev_addr + 4)));
macb_or_gem_writel(bp, SA1T, top);
+ gem_writel(bp, RXPTPUNI, bottom);
+ gem_writel(bp, TXPTPUNI, bottom);
+
/* Clear unused address register sets */
macb_or_gem_writel(bp, SA2B, 0);
macb_or_gem_writel(bp, SA2T, 0);
@@ -723,8 +726,8 @@ static void macb_mac_link_up(struct phylink_config *config,
spin_unlock_irqrestore(&bp->lock, flags);
- /* Enable Rx and Tx */
- macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE));
+ /* Enable Rx and Tx; Enable PTP unicast */
+ macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(PTPUNI));
netif_tx_wake_all_queues(ndev);
}
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next v3 1/4] bpf_trace: check size for overflow in bpf_kprobe_multi_link_attach
From: Eugene Syromiatnikov @ 2022-05-17 7:36 UTC (permalink / raw)
To: Jiri Olsa, Masami Hiramatsu, Steven Rostedt, Ingo Molnar,
Alexei Starovoitov, Daniel Borkmann
Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, KP Singh, netdev, bpf, linux-kernel, Shuah Khan,
linux-kselftest
In-Reply-To: <cover.1652772731.git.esyr@redhat.com>
Check that size would not overflow before calculation (and return
-EOVERFLOW if it will), to prevent potential out-of-bounds write
with the following copy_from_user. Use kvmalloc_array
in copy_user_syms to prevent out-of-bounds write into syms
(and especially buf) as well.
Fixes: 0dcac272540613d4 ("bpf: Add multi kprobe link")
Cc: <stable@vger.kernel.org> # 5.18
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
---
kernel/trace/bpf_trace.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7141ca8..9c041be 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2261,11 +2261,11 @@ static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32
int err = -ENOMEM;
unsigned int i;
- syms = kvmalloc(cnt * sizeof(*syms), GFP_KERNEL);
+ syms = kvmalloc_array(cnt, sizeof(*syms), GFP_KERNEL);
if (!syms)
goto error;
- buf = kvmalloc(cnt * KSYM_NAME_LEN, GFP_KERNEL);
+ buf = kvmalloc_array(cnt, KSYM_NAME_LEN, GFP_KERNEL);
if (!buf)
goto error;
@@ -2461,7 +2461,8 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
if (!cnt)
return -EINVAL;
- size = cnt * sizeof(*addrs);
+ if (check_mul_overflow(cnt, (u32)sizeof(*addrs), &size))
+ return -EOVERFLOW;
addrs = kvmalloc(size, GFP_KERNEL);
if (!addrs)
return -ENOMEM;
--
2.1.4
^ permalink raw reply related
* [PATCH bpf-next v3 2/4] bpf_trace: support 32-bit kernels in bpf_kprobe_multi_link_attach
From: Eugene Syromiatnikov @ 2022-05-17 7:36 UTC (permalink / raw)
To: Jiri Olsa, Masami Hiramatsu, Steven Rostedt, Ingo Molnar,
Alexei Starovoitov, Daniel Borkmann
Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, KP Singh, netdev, bpf, linux-kernel, Shuah Khan,
linux-kselftest
In-Reply-To: <cover.1652772731.git.esyr@redhat.com>
It seems that there is no reason not to support 32-bit architectures;
doing so requires a bit of rework with respect to cookies handling,
however, as the current code implicitly assumes
that sizeof(long) == sizeof(u64).
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
---
kernel/trace/bpf_trace.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9c041be..a93a54f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2435,16 +2435,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
struct bpf_link_primer link_primer;
void __user *ucookies;
unsigned long *addrs;
- u32 flags, cnt, size;
+ u32 flags, cnt, size, cookies_size;
void __user *uaddrs;
u64 *cookies = NULL;
void __user *usyms;
int err;
- /* no support for 32bit archs yet */
- if (sizeof(u64) != sizeof(void *))
- return -EOPNOTSUPP;
-
if (prog->expected_attach_type != BPF_TRACE_KPROBE_MULTI)
return -EINVAL;
@@ -2454,6 +2450,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
uaddrs = u64_to_user_ptr(attr->link_create.kprobe_multi.addrs);
usyms = u64_to_user_ptr(attr->link_create.kprobe_multi.syms);
+ ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies);
if (!!uaddrs == !!usyms)
return -EINVAL;
@@ -2461,8 +2458,11 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
if (!cnt)
return -EINVAL;
- if (check_mul_overflow(cnt, (u32)sizeof(*addrs), &size))
+ if (check_mul_overflow(cnt, (u32)sizeof(*addrs), &size) ||
+ (ucookies &&
+ check_mul_overflow(cnt, (u32)sizeof(*cookies), &cookies_size))) {
return -EOVERFLOW;
+ }
addrs = kvmalloc(size, GFP_KERNEL);
if (!addrs)
return -ENOMEM;
@@ -2486,14 +2486,13 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
goto error;
}
- ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies);
if (ucookies) {
- cookies = kvmalloc(size, GFP_KERNEL);
+ cookies = kvmalloc(cookies_size, GFP_KERNEL);
if (!cookies) {
err = -ENOMEM;
goto error;
}
- if (copy_from_user(cookies, ucookies, size)) {
+ if (copy_from_user(cookies, ucookies, cookies_size)) {
err = -EFAULT;
goto error;
}
--
2.1.4
^ permalink raw reply related
* [PATCH bpf-next v3 0/4] Fix 32-bit arch and compat support for the kprobe_multi attach type
From: Eugene Syromiatnikov @ 2022-05-17 7:36 UTC (permalink / raw)
To: Jiri Olsa, Masami Hiramatsu, Steven Rostedt, Ingo Molnar,
Alexei Starovoitov, Daniel Borkmann
Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, KP Singh, netdev, bpf, linux-kernel, Shuah Khan,
linux-kselftest
As suggested in [1], the kprobe_multi interface is to be fixed for 32-bit
architectures and compat, rather then disabled. As it turned out,
there are a couple of additional problems that are to be addressed:
- the absence of size overflow checks, leading to possible
out-of-bounds writes (addressed by the first patch; this one likely has
to be fixed in 5.18, where the version of the patch from [3]
may be preferrable, along with [4] to avoid applying the rest
of the series);
- the assumption that long has the same size as u64, which would make
cookies arrays size calculation incorrect on 32-bit architectures
(addressed by the second patch);
- the addrs array passing API, that is incompatible with compat and has
to be changed (addressed in the fourth patch): those are kernel
addresses and not user ones (as was incorrectly stated in [2]);
this change is only semantical for 64-bit user/kernelspace,
so it shouldn't impact ABI there, at least.
[1] https://lore.kernel.org/lkml/CAADnVQ+2gwhcMht4PuDnDOFKY68Wsq8QFz4Y69NBX_TLaSexQQ@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20220510184155.GA8295@asgard.redhat.com/
[3] https://lore.kernel.org/lkml/20220516230455.GA25103@asgard.redhat.com/
[4] https://lore.kernel.org/lkml/20220506142148.GA24802@asgard.redhat.com/
v3:
- Rebased on top of bpf-next
- Removed unnecessary size/cookies_size assignments as suggested
by Yonghong Sond
v2: https://lore.kernel.org/lkml/20220516230441.GA22091@asgard.redhat.com/
- Fixed the isses reported by CI
v1: https://lore.kernel.org/lkml/20220516182657.GA28596@asgard.redhat.com/
Eugene Syromiatnikov (4):
bpf_trace: check size for overflow in bpf_kprobe_multi_link_attach
bpf_trace: support 32-bit kernels in bpf_kprobe_multi_link_attach
bpf_trace: handle compat in copy_user_syms
bpf_trace: pass array of u64 values in kprobe_multi.addrs
kernel/trace/bpf_trace.c | 67 ++++++++++++++++------
tools/lib/bpf/bpf.h | 2 +-
tools/lib/bpf/libbpf.c | 8 +--
tools/lib/bpf/libbpf.h | 2 +-
.../testing/selftests/bpf/prog_tests/bpf_cookie.c | 2 +-
.../selftests/bpf/prog_tests/kprobe_multi_test.c | 8 +--
6 files changed, 62 insertions(+), 27 deletions(-)
--
2.1.4
^ permalink raw reply
* [PATCH bpf-next v3 3/4] bpf_trace: handle compat in copy_user_syms
From: Eugene Syromiatnikov @ 2022-05-17 7:36 UTC (permalink / raw)
To: Jiri Olsa, Masami Hiramatsu, Steven Rostedt, Ingo Molnar,
Alexei Starovoitov, Daniel Borkmann
Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, KP Singh, netdev, bpf, linux-kernel, Shuah Khan,
linux-kselftest
In-Reply-To: <cover.1652772731.git.esyr@redhat.com>
For compat processes, userspace size for syms pointers is different.
Provide compat handling for copying array elements from the user space.
Fixes: 0dcac272540613d4 ("bpf: Add multi kprobe link")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
---
kernel/trace/bpf_trace.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a93a54f..9d3028a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2253,6 +2253,24 @@ struct user_syms {
char *buf;
};
+static inline int get_arr_ptr(unsigned long *p,
+ unsigned long __user *uaddr, u32 idx)
+{
+ if (unlikely(in_compat_syscall())) {
+ compat_uptr_t __user *compat_uaddr = (compat_uptr_t __user *)uaddr;
+ compat_uptr_t val;
+ int err;
+
+ err = __get_user(val, compat_uaddr + idx);
+ if (!err)
+ *p = val;
+
+ return err;
+ } else {
+ return __get_user(*p, uaddr + idx);
+ }
+}
+
static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt)
{
unsigned long __user usymbol;
@@ -2270,7 +2288,7 @@ static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32
goto error;
for (p = buf, i = 0; i < cnt; i++) {
- if (__get_user(usymbol, usyms + i)) {
+ if (get_arr_ptr(&usymbol, usyms, i)) {
err = -EFAULT;
goto error;
}
--
2.1.4
^ permalink raw reply related
* [PATCH bpf-next v3 4/4] bpf_trace: pass array of u64 values in kprobe_multi.addrs
From: Eugene Syromiatnikov @ 2022-05-17 7:36 UTC (permalink / raw)
To: Jiri Olsa, Masami Hiramatsu, Steven Rostedt, Ingo Molnar,
Alexei Starovoitov, Daniel Borkmann
Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, KP Singh, netdev, bpf, linux-kernel, Shuah Khan,
linux-kselftest
In-Reply-To: <cover.1652772731.git.esyr@redhat.com>
With the interface as defined, it is impossible to pass 64-bit kernel
addresses from a 32-bit userspace process in BPF_LINK_TYPE_KPROBE_MULTI,
which severly limits the useability of the interface, change the ABI
to accept an array of u64 values instead of (kernel? user?) longs.
Interestingly, the rest of the libbpf infrastructure uses 64-bit values
for kallsyms addresses already, so this patch also eliminates
the sym_addr cast in tools/lib/bpf/libbpf.c:resolve_kprobe_multi_cb().
Fixes: 0dcac272540613d4 ("bpf: Add multi kprobe link")
Fixes: 5117c26e877352bc ("libbpf: Add bpf_link_create support for multi kprobes")
Fixes: ddc6b04989eb0993 ("libbpf: Add bpf_program__attach_kprobe_multi_opts function")
Fixes: f7a11eeccb111854 ("selftests/bpf: Add kprobe_multi attach test")
Fixes: 9271a0c7ae7a9147 ("selftests/bpf: Add attach test for bpf_program__attach_kprobe_multi_opts")
Fixes: 2c6401c966ae1fbe ("selftests/bpf: Add kprobe_multi bpf_cookie test")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
---
kernel/trace/bpf_trace.c | 25 ++++++++++++++++++----
tools/lib/bpf/bpf.h | 2 +-
tools/lib/bpf/libbpf.c | 8 +++----
tools/lib/bpf/libbpf.h | 2 +-
.../testing/selftests/bpf/prog_tests/bpf_cookie.c | 2 +-
.../selftests/bpf/prog_tests/kprobe_multi_test.c | 8 +++----
6 files changed, 32 insertions(+), 15 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9d3028a..30a15b3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2454,7 +2454,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
void __user *ucookies;
unsigned long *addrs;
u32 flags, cnt, size, cookies_size;
- void __user *uaddrs;
+ u64 __user *uaddrs;
u64 *cookies = NULL;
void __user *usyms;
int err;
@@ -2486,9 +2486,26 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
return -ENOMEM;
if (uaddrs) {
- if (copy_from_user(addrs, uaddrs, size)) {
- err = -EFAULT;
- goto error;
+ if (sizeof(*addrs) == sizeof(*uaddrs)) {
+ if (copy_from_user(addrs, uaddrs, size)) {
+ err = -EFAULT;
+ goto error;
+ }
+ } else {
+ u32 i;
+ u64 addr;
+
+ for (i = 0; i < cnt; i++) {
+ if (get_user(addr, uaddrs + i)) {
+ err = -EFAULT;
+ goto error;
+ }
+ if (addr > ULONG_MAX) {
+ err = -EINVAL;
+ goto error;
+ }
+ addrs[i] = addr;
+ }
}
} else {
struct user_syms us;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 2e0d373..da9c6037 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -418,7 +418,7 @@ struct bpf_link_create_opts {
__u32 flags;
__u32 cnt;
const char **syms;
- const unsigned long *addrs;
+ const __u64 *addrs;
const __u64 *cookies;
} kprobe_multi;
struct {
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ef7f302..35fa9c5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10737,7 +10737,7 @@ static bool glob_match(const char *str, const char *pat)
struct kprobe_multi_resolve {
const char *pattern;
- unsigned long *addrs;
+ __u64 *addrs;
size_t cap;
size_t cnt;
};
@@ -10752,12 +10752,12 @@ resolve_kprobe_multi_cb(unsigned long long sym_addr, char sym_type,
if (!glob_match(sym_name, res->pattern))
return 0;
- err = libbpf_ensure_mem((void **) &res->addrs, &res->cap, sizeof(unsigned long),
+ err = libbpf_ensure_mem((void **) &res->addrs, &res->cap, sizeof(__u64),
res->cnt + 1);
if (err)
return err;
- res->addrs[res->cnt++] = (unsigned long) sym_addr;
+ res->addrs[res->cnt++] = sym_addr;
return 0;
}
@@ -10772,7 +10772,7 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
};
struct bpf_link *link = NULL;
char errmsg[STRERR_BUFSIZE];
- const unsigned long *addrs;
+ const __u64 *addrs;
int err, link_fd, prog_fd;
const __u64 *cookies;
const char **syms;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9e9a3fd..76e171d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -489,7 +489,7 @@ struct bpf_kprobe_multi_opts {
/* array of function symbols to attach */
const char **syms;
/* array of function addresses to attach */
- const unsigned long *addrs;
+ const __u64 *addrs;
/* array of user-provided values fetchable through bpf_get_attach_cookie */
const __u64 *cookies;
/* number of elements in syms/addrs/cookies arrays */
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
index 83ef55e3..e843840 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
@@ -140,7 +140,7 @@ static void kprobe_multi_link_api_subtest(void)
cookies[6] = 7;
cookies[7] = 8;
- opts.kprobe_multi.addrs = (const unsigned long *) &addrs;
+ opts.kprobe_multi.addrs = (const __u64 *) &addrs;
opts.kprobe_multi.cnt = ARRAY_SIZE(addrs);
opts.kprobe_multi.cookies = (const __u64 *) &cookies;
prog_fd = bpf_program__fd(skel->progs.test_kprobe);
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index 586dc52..7646112 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -108,7 +108,7 @@ static void test_link_api_addrs(void)
GET_ADDR("bpf_fentry_test7", addrs[6]);
GET_ADDR("bpf_fentry_test8", addrs[7]);
- opts.kprobe_multi.addrs = (const unsigned long*) addrs;
+ opts.kprobe_multi.addrs = (const __u64 *) addrs;
opts.kprobe_multi.cnt = ARRAY_SIZE(addrs);
test_link_api(&opts);
}
@@ -186,7 +186,7 @@ static void test_attach_api_addrs(void)
GET_ADDR("bpf_fentry_test7", addrs[6]);
GET_ADDR("bpf_fentry_test8", addrs[7]);
- opts.addrs = (const unsigned long *) addrs;
+ opts.addrs = (const __u64 *) addrs;
opts.cnt = ARRAY_SIZE(addrs);
test_attach_api(NULL, &opts);
}
@@ -244,7 +244,7 @@ static void test_attach_api_fails(void)
goto cleanup;
/* fail_2 - both addrs and syms set */
- opts.addrs = (const unsigned long *) addrs;
+ opts.addrs = (const __u64 *) addrs;
opts.syms = syms;
opts.cnt = ARRAY_SIZE(syms);
opts.cookies = NULL;
@@ -258,7 +258,7 @@ static void test_attach_api_fails(void)
goto cleanup;
/* fail_3 - pattern and addrs set */
- opts.addrs = (const unsigned long *) addrs;
+ opts.addrs = (const __u64 *) addrs;
opts.syms = NULL;
opts.cnt = ARRAY_SIZE(syms);
opts.cookies = NULL;
--
2.1.4
^ permalink raw reply related
* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Johannes Berg @ 2022-05-17 7:48 UTC (permalink / raw)
To: Florian Fainelli, Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, alex.aring, stefan, mareklindner, sw, a,
sven, linux-wireless, linux-wpan
In-Reply-To: <8e9f1b04-d17b-2812-22bb-e62b5560aa6e@gmail.com>
On Mon, 2022-05-16 at 19:12 -0700, Florian Fainelli wrote:
>
> On 5/16/2022 2:56 PM, Jakub Kicinski wrote:
> > Most protocol-specific pointers in struct net_device are under
> > a respective ifdef. Wireless is the notable exception. Since
> > there's a sizable number of custom-built kernels for datacenter
> > workloads which don't build wireless it seems reasonable to
> > ifdefy those pointers as well.
> >
> > While at it move IPv4 and IPv6 pointers up, those are special
> > for obvious reasons.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Could not we move to an union of pointers in the future since in many
> cases a network device can only have one of those pointers at any given
> time?
Then at the very least we'd need some kind of type that we can assign to
disambiguate, because today e.g. we have a netdev notifier (and other
code) that could get a non-wireless netdev and check like this:
static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
unsigned long state, void *ptr)
{
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct wireless_dev *wdev = dev->ieee80211_ptr;
[...]
if (!wdev)
return NOTIFY_DONE;
We could probably use the netdev->dev.type for this though, that's just
a pointer we can compare to. We'd have to set it differently (today
cfg80211 sets it based on whether or not you have ieee80211_ptr, we'd
have to turn that around), but that's not terribly hard, especially
since wireless drivers now have to call cfg80211_register_netdevice()
anyway, rather than register_netdevice() directly.
Something like the patch below might do that, but I haven't carefully
checked it yet, nor checked if there are any paths in mac80211/drivers
that might be doing this check - and it looks from Jakub's patch that
batman code would like to check this too.
johannes
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 3e5d12040726..6ea2a597f4ca 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1192,7 +1192,7 @@ void cfg80211_unregister_wdev(struct wireless_dev *wdev)
}
EXPORT_SYMBOL(cfg80211_unregister_wdev);
-static const struct device_type wiphy_type = {
+const struct device_type wiphy_type = {
.name = "wlan",
};
@@ -1369,6 +1369,9 @@ int cfg80211_register_netdevice(struct net_device *dev)
lockdep_assert_held(&rdev->wiphy.mtx);
+ /* this lets us identify our netdevs in the future */
+ SET_NETDEV_DEVTYPE(dev, &wiphy_type);
+
/* we'll take care of this */
wdev->registered = true;
wdev->registering = true;
@@ -1394,7 +1397,7 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
struct cfg80211_registered_device *rdev;
struct cfg80211_sched_scan_request *pos, *tmp;
- if (!wdev)
+ if (!netdev_is_wireless(dev))
return NOTIFY_DONE;
rdev = wiphy_to_rdev(wdev->wiphy);
@@ -1403,7 +1406,6 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
switch (state) {
case NETDEV_POST_INIT:
- SET_NETDEV_DEVTYPE(dev, &wiphy_type);
wdev->netdev = dev;
/* can only change netns with wiphy */
dev->features |= NETIF_F_NETNS_LOCAL;
diff --git a/net/wireless/core.h b/net/wireless/core.h
index 2c195067ddff..e256ea5caf49 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -219,6 +219,13 @@ void cfg80211_init_wdev(struct wireless_dev *wdev);
void cfg80211_register_wdev(struct cfg80211_registered_device *rdev,
struct wireless_dev *wdev);
+extern const struct device_type wiphy_type;
+
+static inline bool netdev_is_wireless(struct net_device *dev)
+{
+ return dev && dev->dev.type == &wiphy_type && dev->ieee80211_ptr;
+}
+
static inline void wdev_lock(struct wireless_dev *wdev)
__acquires(wdev)
{
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 342dfefb6eca..58bc3750c380 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -182,7 +182,7 @@ __cfg80211_rdev_from_attrs(struct net *netns, struct nlattr **attrs)
netdev = __dev_get_by_index(netns, ifindex);
if (netdev) {
- if (netdev->ieee80211_ptr)
+ if (netdev_is_wireless(netdev))
tmp = wiphy_to_rdev(
netdev->ieee80211_ptr->wiphy);
else
@@ -2978,7 +2978,7 @@ static int nl80211_dump_wiphy_parse(struct sk_buff *skb,
ret = -ENODEV;
goto out;
}
- if (netdev->ieee80211_ptr) {
+ if (netdev_is_wireless(netdev)) {
rdev = wiphy_to_rdev(
netdev->ieee80211_ptr->wiphy);
state->filter_wiphy = rdev->wiphy_idx;
@@ -3364,7 +3364,7 @@ static int nl80211_set_wiphy(struct sk_buff *skb, struct genl_info *info)
int ifindex = nla_get_u32(info->attrs[NL80211_ATTR_IFINDEX]);
netdev = __dev_get_by_index(genl_info_net(info), ifindex);
- if (netdev && netdev->ieee80211_ptr)
+ if (netdev_is_wireless(netdev))
rdev = wiphy_to_rdev(netdev->ieee80211_ptr->wiphy);
else
netdev = NULL;
^ permalink raw reply related
* Re: [PATCH bpf-next 1/2] cpuidle/rcu: Making arch_cpu_idle and rcu_idle_exit noinstr
From: Jiri Olsa @ 2022-05-17 7:49 UTC (permalink / raw)
To: Yonghong Song
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Masami Hiramatsu, Paul E. McKenney, netdev, bpf, lkml,
Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
Steven Rostedt
In-Reply-To: <5b4bd044-ba88-649b-9b85-e08e175691f9@fb.com>
On Mon, May 16, 2022 at 07:54:37PM -0700, Yonghong Song wrote:
>
>
> On 5/15/22 1:36 PM, Jiri Olsa wrote:
> > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > in rcu 'not watching' context and if there's tracer attached to
> > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > warning like:
> >
> > [ 3.017540] WARNING: suspicious RCU usage
> > ...
> > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > [ 3.018374] 0xffffffffa00080c8
> > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > [ 3.018398] arch_cpu_idle+0x5/0x10
> > [ 3.018399] default_idle_call+0x59/0x90
> > [ 3.018401] do_idle+0x1c3/0x1d0
> >
> > The call path is following:
> >
> > default_idle_call
> > rcu_idle_enter
> > arch_cpu_idle
> > rcu_idle_exit
> >
> > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > path that are traceble and cause this problem on my setup.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > arch/x86/kernel/process.c | 2 +-
> > kernel/rcu/tree.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index b370767f5b19..1345cb0124a6 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > /*
> > * Called from the generic idle code.
> > */
> > -void arch_cpu_idle(void)
> > +void noinstr arch_cpu_idle(void)
>
> noinstr includes a lot of attributes:
>
> #define noinstr \
> noinline notrace __attribute((__section__(".noinstr.text"))) \
> __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
>
> should we use notrace here?
hm right, so notrace should be enough for our case (kprobe_multi)
which is based on ftrace/fprobe jump
noinstr (among other things) adds the function also the kprobes
blacklist, which will prevent standard kprobes to attach
ASAICS standard kprobes use rcu in probe path as well, like in
opt_pre_handler function
so I think we should go with noinstr
jirka
>
> > {
> > x86_idle();
> > }
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a4b8189455d5..20d529722f51 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > * CONFIG_RCU_EQS_DEBUG=y.
> > */
> > -void rcu_idle_exit(void)
> > +void noinstr rcu_idle_exit(void)
> > {
> > unsigned long flags;
^ permalink raw reply
* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Johannes Berg @ 2022-05-17 7:51 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, alex.aring, stefan, mareklindner, sw, a,
sven, linux-wireless, linux-wpan
In-Reply-To: <20220516215638.1787257-1-kuba@kernel.org>
On Mon, 2022-05-16 at 14:56 -0700, Jakub Kicinski wrote:
>
> +#if IS_ENABLED(CONFIG_WIRELESS)
> struct wireless_dev *ieee80211_ptr;
> +#endif
Technically, you should be able to use CONFIG_CFG80211 here, but in
practice I'd really hope nobody enables WIRELESS without CFG80211 :)
> +++ b/include/net/cfg80211.h
> @@ -8004,10 +8004,7 @@ int cfg80211_register_netdevice(struct net_device *dev);
> *
> * Requires the RTNL and wiphy mutex to be held.
> */
> -static inline void cfg80211_unregister_netdevice(struct net_device *dev)
> -{
> - cfg80211_unregister_wdev(dev->ieee80211_ptr);
> -}
> +void cfg80211_unregister_netdevice(struct net_device *dev);
Exported functions aren't free either - I think in this case I'd
(slightly) prefer the extra ifdef.
Anyway, we can do this, but I also like Florian's suggestion about the
union, and sent an attempt at a disambiguation patch there.
johannes
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox