BPF List
* [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
@ 2022-11-10 18:43 Hari Bathini
  2022-11-10 18:43 ` [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Hari Bathini @ 2022-11-10 18:43 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Michael Ellerman

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple BPF programs
into preallocated memory chunks, was proposed [1] to address this. This
series extends that support to powerpc.

Patches 1 & 2 add the arch-specific functions needed to support this
feature. Patch 3 enables the support for powerpc and ensures that
cleanup is handled gracefully.

Tested the changes successfully on PowerVM. patch_instruction(),
needed for bpf_arch_text_copy(), is currently failing on ppc32 and is
being debugged. Posting the patches in the meantime for feedback on
these changes.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/

Hari Bathini (3):
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

 arch/powerpc/net/bpf_jit.h        |  18 +--
 arch/powerpc/net/bpf_jit_comp.c   | 194 ++++++++++++++++++++++++------
 arch/powerpc/net/bpf_jit_comp32.c |  26 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  32 ++---
 4 files changed, 198 insertions(+), 72 deletions(-)

-- 
2.37.3



* [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy
  2022-11-10 18:43 [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Hari Bathini
@ 2022-11-10 18:43 ` Hari Bathini
  2022-11-13 13:17   ` Christophe Leroy
  2022-11-10 18:43 ` [RFC PATCH 2/3] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Hari Bathini @ 2022-11-10 18:43 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Michael Ellerman

bpf_arch_text_copy is used to copy the JITed binary to its final
read-only (RX) location, allowing multiple BPF programs to share the
same page. Implement it using patch_instruction().

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 39 ++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 43e634126514..7383e0effad2 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -13,9 +13,12 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
-#include <asm/kprobes.h>
+#include <linux/memory.h>
 #include <linux/bpf.h>
 
+#include <asm/kprobes.h>
+#include <asm/code-patching.h>
+
 #include "bpf_jit.h"
 
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
@@ -23,6 +26,35 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+/*
+ * Patch 'len' bytes of instructions from opcode to addr, one instruction
+ * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
+ */
+static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
+{
+	void *ret = ERR_PTR(-EINVAL);
+	size_t patched = 0;
+	u32 *inst = opcode;
+	u32 *start = addr;
+
+	if (WARN_ON_ONCE(core_kernel_text((unsigned long)addr)))
+		return ret;
+
+	mutex_lock(&text_mutex);
+	while (patched < len) {
+		if (patch_instruction(start++, ppc_inst(*inst)))
+			goto error;
+
+		inst++;
+		patched += 4;
+	}
+
+	ret = addr;
+error:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
 /* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass */
 static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
 				   struct codegen_context *ctx, u32 *addrs)
@@ -357,3 +389,8 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 	ctx->exentry_idx++;
 	return 0;
 }
+
+void *bpf_arch_text_copy(void *dst, void *src, size_t len)
+{
+	return bpf_patch_instructions(dst, src, len);
+}
-- 
2.37.3



* [RFC PATCH 2/3] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  2022-11-10 18:43 [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Hari Bathini
  2022-11-10 18:43 ` [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
@ 2022-11-10 18:43 ` Hari Bathini
  2022-11-13 18:00   ` Christophe Leroy
  2022-11-10 18:43 ` [RFC PATCH 3/3] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
  2022-11-11 11:25 ` [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Christophe Leroy
  3 siblings, 1 reply; 19+ messages in thread
From: Hari Bathini @ 2022-11-10 18:43 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Michael Ellerman

Implement bpf_arch_text_invalidate and use it to fill the unused part
of a bpf_prog_pack with trap instructions when a BPF program is freed.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 7383e0effad2..f925755cd249 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -26,6 +26,33 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+/*
+ * Patch 'len' bytes with trap instruction at addr, one instruction
+ * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
+ */
+static void *bpf_patch_ill_insns(void *addr, size_t len)
+{
+	void *ret = ERR_PTR(-EINVAL);
+	size_t patched = 0;
+	u32 *start = addr;
+
+	if (WARN_ON_ONCE(core_kernel_text((unsigned long)addr)))
+		return ret;
+
+	mutex_lock(&text_mutex);
+	while (patched < len) {
+		if (patch_instruction(start++, ppc_inst(PPC_RAW_TRAP())))
+			goto error;
+
+		patched += 4;
+	}
+
+	ret = addr;
+error:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
 /*
  * Patch 'len' bytes of instructions from opcode to addr, one instruction
  * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
@@ -394,3 +421,8 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
 	return bpf_patch_instructions(dst, src, len);
 }
+
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+	return IS_ERR(bpf_patch_ill_insns(dst, len));
+}
-- 
2.37.3



* [RFC PATCH 3/3] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
  2022-11-10 18:43 [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Hari Bathini
  2022-11-10 18:43 ` [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
  2022-11-10 18:43 ` [RFC PATCH 2/3] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
@ 2022-11-10 18:43 ` Hari Bathini
  2022-11-13 18:36   ` Christophe Leroy
  2022-11-11 11:25 ` [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Christophe Leroy
  3 siblings, 1 reply; 19+ messages in thread
From: Hari Bathini @ 2022-11-10 18:43 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Michael Ellerman

Use bpf_jit_binary_pack_alloc in the powerpc JIT. The JIT engine first
writes the program to a read-write buffer. When JIT compilation is
done, the program is copied to its final location with
bpf_jit_binary_pack_finalize. With multiple jit_subprogs, bpf_jit_free
can be called on subprograms that have not gone through
bpf_jit_binary_pack_finalize() yet. Implement a custom bpf_jit_free(),
as done in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of not-finalized
bpf_prog_pack"), to call bpf_jit_binary_pack_finalize() when necessary.
While at it, rename the misnomer powerpc64_jit_data to powerpc_jit_data,
as it is used for both ppc32 and ppc64.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit.h        |  18 +++--
 arch/powerpc/net/bpf_jit_comp.c   | 123 +++++++++++++++++++++---------
 arch/powerpc/net/bpf_jit_comp32.c |  26 +++----
 arch/powerpc/net/bpf_jit_comp64.c |  32 ++++----
 4 files changed, 128 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index a4f7880f959d..e314d6a23bce 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -21,7 +21,7 @@
 
 #define PLANT_INSTR(d, idx, instr)					      \
 	do { if (d) { (d)[idx] = instr; } idx++; } while (0)
-#define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
+#define EMIT(instr)		PLANT_INSTR(rw_image, ctx->idx, instr)
 
 /* Long jump; (unconditional 'branch') */
 #define PPC_JMP(dest)							      \
@@ -167,16 +167,18 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *rw_image, struct codegen_context *ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *rw_image, struct codegen_context *ctx,
 		       u32 *addrs, int pass);
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_prologue(u32 *image, u32 *rw_image, struct codegen_context *ctx);
+void bpf_jit_build_epilogue(u32 *image, u32 *rw_image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
-int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
+int bpf_jit_emit_exit_insn(u32 *image, u32 *rw_image, struct codegen_context *ctx,
+			   int tmp_reg, long exit_addr);
 
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
-			  int insn_idx, int jmp_off, int dst_reg);
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *rw_image, int pass,
+			  struct codegen_context *ctx, int insn_idx,
+			  int jmp_off, int dst_reg);
 
 #endif
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index f925755cd249..c4c1f7a21d89 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -83,7 +83,7 @@ static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
 }
 
 /* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass */
-static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
+static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image, u32 *rw_image,
 				   struct codegen_context *ctx, u32 *addrs)
 {
 	const struct bpf_insn *insn = fp->insnsi;
@@ -118,7 +118,7 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
 			 */
 			tmp_idx = ctx->idx;
 			ctx->idx = addrs[i] / 4;
-			ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+			ret = bpf_jit_emit_func_call_rel(image, rw_image, ctx, func_addr);
 			if (ret)
 				return ret;
 
@@ -150,7 +150,8 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
 	return 0;
 }
 
-int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
+int bpf_jit_emit_exit_insn(u32 *image, u32 *rw_image, struct codegen_context *ctx,
+			   int tmp_reg, long exit_addr)
 {
 	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
 		PPC_JMP(exit_addr);
@@ -160,13 +161,14 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
 		PPC_JMP(ctx->alt_exit_addr);
 	} else {
 		ctx->alt_exit_addr = ctx->idx * 4;
-		bpf_jit_build_epilogue(image, ctx);
+		bpf_jit_build_epilogue(image, rw_image, ctx);
 	}
 
 	return 0;
 }
 
-struct powerpc64_jit_data {
+struct powerpc_jit_data {
+	struct bpf_binary_header *rw_header;
 	struct bpf_binary_header *header;
 	u32 *addrs;
 	u8 *image;
@@ -181,22 +183,25 @@ bool bpf_jit_needs_zext(void)
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
-	u32 proglen;
-	u32 alloclen;
-	u8 *image = NULL;
-	u32 *code_base;
-	u32 *addrs;
-	struct powerpc64_jit_data *jit_data;
-	struct codegen_context cgctx;
-	int pass;
-	int flen;
+	struct bpf_binary_header *rw_header = NULL;
+	struct powerpc_jit_data *jit_data;
 	struct bpf_binary_header *bpf_hdr;
+	struct codegen_context cgctx;
 	struct bpf_prog *org_fp = fp;
 	struct bpf_prog *tmp_fp;
 	bool bpf_blinded = false;
 	bool extra_pass = false;
+	u8 *rw_image = NULL;
+	u32 *rw_code_base;
+	u8 *image = NULL;
 	u32 extable_len;
+	u32 *code_base;
 	u32 fixup_len;
+	u32 alloclen;
+	u32 proglen;
+	u32 *addrs;
+	int pass;
+	int flen;
 
 	if (!fp->jit_requested)
 		return org_fp;
@@ -227,6 +232,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		image = jit_data->image;
 		bpf_hdr = jit_data->header;
 		proglen = jit_data->proglen;
+		rw_header = jit_data->rw_header;
+		rw_image = (void *)rw_header + ((void *)image - (void *)bpf_hdr);
 		extra_pass = true;
 		goto skip_init_ctx;
 	}
@@ -244,7 +251,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
+	if (bpf_jit_build_body(fp, 0, 0, &cgctx, addrs, 0)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
 		goto out_addrs;
@@ -259,7 +266,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL || !is_offset_in_branch_range((long)cgctx.idx * 4)) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
+		if (bpf_jit_build_body(fp, 0, 0, &cgctx, addrs, 0)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -271,9 +278,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 * update ctgtx.idx as it pretends to output instructions, then we can
 	 * calculate total size from idx.
 	 */
-	bpf_jit_build_prologue(0, &cgctx);
+	bpf_jit_build_prologue(0, 0, &cgctx);
 	addrs[fp->len] = cgctx.idx * 4;
-	bpf_jit_build_epilogue(0, &cgctx);
+	bpf_jit_build_epilogue(0, 0, &cgctx);
 
 	fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
 	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
@@ -281,7 +288,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	proglen = cgctx.idx * 4;
 	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
 
-	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
+	bpf_hdr = bpf_jit_binary_pack_alloc(alloclen, &image, 4, &rw_header, &rw_image,
+					    bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
 		fp = org_fp;
 		goto out_addrs;
@@ -292,6 +300,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 skip_init_ctx:
 	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
+	rw_code_base = (u32 *)(rw_image + FUNCTION_DESCR_SIZE);
 
 	if (extra_pass) {
 		/*
@@ -303,7 +312,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		 * call instruction sequences and hence, the size of the JITed
 		 * image as well.
 		 */
-		bpf_jit_fixup_addresses(fp, code_base, &cgctx, addrs);
+		bpf_jit_fixup_addresses(fp, code_base, rw_code_base, &cgctx, addrs);
 
 		/* There is no need to perform the usual passes. */
 		goto skip_codegen_passes;
@@ -314,13 +323,15 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		cgctx.alt_exit_addr = 0;
-		bpf_jit_build_prologue(code_base, &cgctx);
-		if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, pass)) {
-			bpf_jit_binary_free(bpf_hdr);
+		bpf_jit_build_prologue(code_base, rw_code_base, &cgctx);
+		if (bpf_jit_build_body(fp, code_base, rw_code_base, &cgctx, addrs, pass)) {
+			bpf_arch_text_copy(&bpf_hdr->size, &rw_header->size,
+					   sizeof(rw_header->size));
+			bpf_jit_binary_pack_free(bpf_hdr, rw_header);
 			fp = org_fp;
 			goto out_addrs;
 		}
-		bpf_jit_build_epilogue(code_base, &cgctx);
+		bpf_jit_build_epilogue(code_base, rw_code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
 			pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
@@ -337,17 +348,26 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 	/* Function descriptor nastiness: Address + TOC */
-	((u64 *)image)[0] = (u64)code_base;
-	((u64 *)image)[1] = local_paca->kernel_toc;
+	((u64 *)rw_image)[0] = (u64)code_base;
+	((u64 *)rw_image)[1] = local_paca->kernel_toc;
 #endif
 
 	fp->bpf_func = (void *)image;
 	fp->jited = 1;
 	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
 
-	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + bpf_hdr->size);
 	if (!fp->is_func || extra_pass) {
-		bpf_jit_binary_lock_ro(bpf_hdr);
+		/*
+		 * bpf_jit_binary_pack_finalize fails in two scenarios:
+		 *   1) header is not pointing to proper module memory;
+		 *   2) the arch doesn't support bpf_arch_text_copy().
+		 *
+		 * Both cases are serious bugs that justify WARN_ON.
+		 */
+		if (WARN_ON(bpf_jit_binary_pack_finalize(fp, bpf_hdr, rw_header))) {
+			fp = org_fp;
+			goto out_addrs;
+		}
 		bpf_prog_fill_jited_linfo(fp, addrs);
 out_addrs:
 		kfree(addrs);
@@ -359,6 +379,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		jit_data->proglen = proglen;
 		jit_data->image = image;
 		jit_data->header = bpf_hdr;
+		jit_data->rw_header = rw_header;
 	}
 
 out:
@@ -372,12 +393,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
  * The caller should check for (BPF_MODE(code) == BPF_PROBE_MEM) before calling
  * this function, as this only applies to BPF_PROBE_MEM, for now.
  */
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
-			  int insn_idx, int jmp_off, int dst_reg)
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *rw_image, int pass,
+			  struct codegen_context *ctx, int insn_idx,
+			  int jmp_off, int dst_reg)
 {
-	off_t offset;
+	struct exception_table_entry *ex, *rw_extable;
 	unsigned long pc;
-	struct exception_table_entry *ex;
+	off_t offset;
 	u32 *fixup;
 
 	/* Populate extable entries only in the last pass */
@@ -388,9 +410,16 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 	    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
 		return -EINVAL;
 
-	pc = (unsigned long)&image[insn_idx];
+	/*
+	 * The program is first written to rw_image before copying to the
+	 * final location. Accordingly, update in the rw_image first.
+	 * As all offsets used are relative, copying as is to the
+	 * final location should be alright.
+	 */
+	pc = (unsigned long)&rw_image[insn_idx];
+	rw_extable = (void *)fp->aux->extable - (void *)image + (void *)rw_image;
 
-	fixup = (void *)fp->aux->extable -
+	fixup = (void *)rw_extable -
 		(fp->aux->num_exentries * BPF_FIXUP_LEN * 4) +
 		(ctx->exentry_idx * BPF_FIXUP_LEN * 4);
 
@@ -401,7 +430,7 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 	fixup[BPF_FIXUP_LEN - 1] =
 		PPC_RAW_BRANCH((long)(pc + jmp_off) - (long)&fixup[BPF_FIXUP_LEN - 1]);
 
-	ex = &fp->aux->extable[ctx->exentry_idx];
+	ex = &rw_extable[ctx->exentry_idx];
 
 	offset = pc - (long)&ex->insn;
 	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
@@ -426,3 +455,27 @@ int bpf_arch_text_invalidate(void *dst, size_t len)
 {
 	return IS_ERR(bpf_patch_ill_insns(dst, len));
 }
+
+void bpf_jit_free(struct bpf_prog *fp)
+{
+	if (fp->jited) {
+		struct powerpc_jit_data *jit_data = fp->aux->jit_data;
+		struct bpf_binary_header *hdr;
+
+		/*
+		 * If we fail the final pass of JIT (from jit_subprogs),
+		 * the program may not be finalized yet. Call finalize here
+		 * before freeing it.
+		 */
+		if (jit_data) {
+			bpf_jit_binary_pack_finalize(fp, jit_data->header, jit_data->rw_header);
+			kvfree(jit_data->addrs);
+			kfree(jit_data);
+		}
+		hdr = bpf_jit_binary_pack_hdr(fp);
+		bpf_jit_binary_pack_free(hdr, NULL);
+		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
+	}
+
+	bpf_prog_unlock_free(fp);
+}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 43f1c76d48ce..047ef627c2aa 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -109,7 +109,7 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx)
 	}
 }
 
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_prologue(u32 *image, u32 *rw_image, struct codegen_context *ctx)
 {
 	int i;
 
@@ -162,7 +162,7 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 		EMIT(PPC_RAW_STW(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
 }
 
-static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx)
+static void bpf_jit_emit_common_epilogue(u32 *image, u32 *rw_image, struct codegen_context *ctx)
 {
 	int i;
 
@@ -172,11 +172,11 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
 			EMIT(PPC_RAW_LWZ(i, _R1, bpf_jit_stack_offsetof(ctx, i)));
 }
 
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_epilogue(u32 *image, u32 *rw_image, struct codegen_context *ctx)
 {
 	EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
 
-	bpf_jit_emit_common_epilogue(image, ctx);
+	bpf_jit_emit_common_epilogue(image, rw_image, ctx);
 
 	/* Tear down our stack frame */
 
@@ -191,7 +191,7 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_BLR());
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *rw_image, struct codegen_context *ctx, u64 func)
 {
 	s32 rel = (s32)func - (s32)(image + ctx->idx);
 
@@ -211,7 +211,7 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func
 	return 0;
 }
 
-static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+static int bpf_jit_emit_tail_call(u32 *image, u32 *rw_image, struct codegen_context *ctx, u32 out)
 {
 	/*
 	 * By now, the eBPF program has already setup parameters in r3-r6
@@ -269,7 +269,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 	EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_1)));
 
 	/* tear restore NVRs, ... */
-	bpf_jit_emit_common_epilogue(image, ctx);
+	bpf_jit_emit_common_epilogue(image, rw_image, ctx);
 
 	EMIT(PPC_RAW_BCTR());
 
@@ -278,7 +278,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 }
 
 /* Assemble the body code between the prologue & epilogue */
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *rw_image, struct codegen_context *ctx,
 		       u32 *addrs, int pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
@@ -954,8 +954,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 					jmp_off += 4;
 				}
 
-				ret = bpf_add_extable_entry(fp, image, pass, ctx, insn_idx,
-							    jmp_off, dst_reg);
+				ret = bpf_add_extable_entry(fp, image, rw_image, pass, ctx,
+							    insn_idx, jmp_off, dst_reg);
 				if (ret)
 					return ret;
 			}
@@ -986,7 +986,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			 * we'll just fall through to the epilogue.
 			 */
 			if (i != flen - 1) {
-				ret = bpf_jit_emit_exit_insn(image, ctx, _R0, exit_addr);
+				ret = bpf_jit_emit_exit_insn(image, rw_image, ctx, _R0, exit_addr);
 				if (ret)
 					return ret;
 			}
@@ -1009,7 +1009,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 				EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_5), _R1, 12));
 			}
 
-			ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+			ret = bpf_jit_emit_func_call_rel(image, rw_image, ctx, func_addr);
 			if (ret)
 				return ret;
 
@@ -1231,7 +1231,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		case BPF_JMP | BPF_TAIL_CALL:
 			ctx->seen |= SEEN_TAILCALL;
-			ret = bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+			ret = bpf_jit_emit_tail_call(image, rw_image, ctx, addrs[i + 1]);
 			if (ret < 0)
 				return ret;
 			break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 29ee306d6302..f15bc20909d8 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -122,7 +122,7 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx)
 {
 }
 
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_prologue(u32 *image, u32 *rw_image, struct codegen_context *ctx)
 {
 	int i;
 
@@ -171,7 +171,7 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 				STACK_FRAME_MIN_SIZE + ctx->stack_size));
 }
 
-static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx)
+static void bpf_jit_emit_common_epilogue(u32 *image, u32 *rw_image, struct codegen_context *ctx)
 {
 	int i;
 
@@ -190,9 +190,9 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
 	}
 }
 
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_epilogue(u32 *image, u32 *rw_image, struct codegen_context *ctx)
 {
-	bpf_jit_emit_common_epilogue(image, ctx);
+	bpf_jit_emit_common_epilogue(image, rw_image, ctx);
 
 	/* Move result to r3 */
 	EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
@@ -200,7 +200,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_BLR());
 }
 
-static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u64 func)
+static int bpf_jit_emit_func_call_hlp(u32 *image, u32 *rw_image, struct codegen_context *ctx,
+				      u64 func)
 {
 	unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
 	long reladdr;
@@ -222,7 +223,7 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u
 	return 0;
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *rw_image, struct codegen_context *ctx, u64 func)
 {
 	unsigned int i, ctx_idx = ctx->idx;
 
@@ -254,7 +255,7 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func
 	return 0;
 }
 
-static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+static int bpf_jit_emit_tail_call(u32 *image, u32 *rw_image, struct codegen_context *ctx, u32 out)
 {
 	/*
 	 * By now, the eBPF program has already setup parameters in r3, r4 and r5
@@ -311,7 +312,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 	EMIT(PPC_RAW_MTCTR(bpf_to_ppc(TMP_REG_1)));
 
 	/* tear down stack, restore NVRs, ... */
-	bpf_jit_emit_common_epilogue(image, ctx);
+	bpf_jit_emit_common_epilogue(image, rw_image, ctx);
 
 	EMIT(PPC_RAW_BCTR());
 
@@ -342,7 +343,7 @@ asm (
 );
 
 /* Assemble the body code between the prologue & epilogue */
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *rw_image, struct codegen_context *ctx,
 		       u32 *addrs, int pass)
 {
 	enum stf_barrier_type stf_barrier = stf_barrier_type_get();
@@ -921,8 +922,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 				addrs[++i] = ctx->idx * 4;
 
 			if (BPF_MODE(code) == BPF_PROBE_MEM) {
-				ret = bpf_add_extable_entry(fp, image, pass, ctx, ctx->idx - 1,
-							    4, dst_reg);
+				ret = bpf_add_extable_entry(fp, image, rw_image, pass, ctx,
+							    ctx->idx - 1, 4, dst_reg);
 				if (ret)
 					return ret;
 			}
@@ -954,7 +955,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			 * we'll just fall through to the epilogue.
 			 */
 			if (i != flen - 1) {
-				ret = bpf_jit_emit_exit_insn(image, ctx, tmp1_reg, exit_addr);
+				ret = bpf_jit_emit_exit_insn(image, rw_image, ctx,
+							     tmp1_reg, exit_addr);
 				if (ret)
 					return ret;
 			}
@@ -973,9 +975,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 				return ret;
 
 			if (func_addr_fixed)
-				ret = bpf_jit_emit_func_call_hlp(image, ctx, func_addr);
+				ret = bpf_jit_emit_func_call_hlp(image, rw_image, ctx, func_addr);
 			else
-				ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+				ret = bpf_jit_emit_func_call_rel(image, rw_image, ctx, func_addr);
 
 			if (ret)
 				return ret;
@@ -1184,7 +1186,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		case BPF_JMP | BPF_TAIL_CALL:
 			ctx->seen |= SEEN_TAILCALL;
-			ret = bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+			ret = bpf_jit_emit_tail_call(image, rw_image, ctx, addrs[i + 1]);
 			if (ret < 0)
 				return ret;
 			break;
-- 
2.37.3



* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-10 18:43 [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Hari Bathini
                   ` (2 preceding siblings ...)
  2022-11-10 18:43 ` [RFC PATCH 3/3] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
@ 2022-11-11 11:25 ` Christophe Leroy
  2022-11-14 14:47   ` Hari Bathini
  3 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2022-11-11 11:25 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

On 10/11/2022 at 19:43, Hari Bathini wrote:
> Most BPF programs are small, but they consume a page each. For systems
> with busy traffic and many BPF programs, this may also add significant
> pressure on instruction TLB. High iTLB pressure usually slows down the
> whole system causing visible performance degradation for production
> workloads.
> 
> bpf_prog_pack, a customized allocator that packs multiple BPF programs
> into preallocated memory chunks, was proposed [1] to address this. This
> series extends that support to powerpc.
> 
> Patches 1 & 2 add the arch-specific functions needed to support this
> feature. Patch 3 enables the support for powerpc and ensures that
> cleanup is handled gracefully.
> 
> Tested the changes successfully on PowerVM. patch_instruction(),
> needed for bpf_arch_text_copy(), is currently failing on ppc32 and is
> being debugged. Posting the patches in the meantime for feedback on
> these changes.

I did a quick test on ppc32 and I don't see such a problem, only
something wrong in the dump print, as only trap instructions are
dumped; tcpdump itself works as expected:

[   55.692998] bpf_jit_enable = 2 was set! NEVER use this in production, only for JIT debugging!
[   66.279259] device eth0 entered promiscuous mode
[   67.214756] Pass 1: shrink = 0, seen = 0x1f980000
[   67.214880] Pass 2: shrink = 0, seen = 0x1f980000
[   67.214966] flen=5 proglen=60 pass=3 image=be7a8038 from=tcpdump pid=459
[   67.225261] JIT code: 00000000: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08
[   67.233904] JIT code: 00000010: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08
[   67.242579] JIT code: 00000020: 7f e0 00 08 7f e0 00 08 7f e0 00 08 7f e0 00 08
[   67.249694] JIT code: 00000030: 7f e0 00 08 7f e0 00 08 7f e0 00 08
[   67.259255] Pass 1: shrink = 0, seen = 0x3ff801fe
[   67.259421] Pass 2: shrink = 0, seen = 0x3ff801fe
[   67.259514] flen=40 proglen=504 pass=3 image=be7a80a0 from=tcpdump 
pid=459
[   67.269467] JIT code: 00000000: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.278001] JIT code: 00000010: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.286519] JIT code: 00000020: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.295041] JIT code: 00000030: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.303596] JIT code: 00000040: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.312164] JIT code: 00000050: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.319231] JIT code: 00000060: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.328822] JIT code: 00000070: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.337382] JIT code: 00000080: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.345901] JIT code: 00000090: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.354423] JIT code: 000000a0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.362941] JIT code: 000000b0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.371462] JIT code: 000000c0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.378526] JIT code: 000000d0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.388120] JIT code: 000000e0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.396680] JIT code: 000000f0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.405199] JIT code: 00000100: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.413756] JIT code: 00000110: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.422324] JIT code: 00000120: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.429389] JIT code: 00000130: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.438982] JIT code: 00000140: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.447541] JIT code: 00000150: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.456059] JIT code: 00000160: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.464578] JIT code: 00000170: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.473201] JIT code: 00000180: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.481705] JIT code: 00000190: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.488770] JIT code: 000001a0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.498359] JIT code: 000001b0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.506921] JIT code: 000001c0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.515439] JIT code: 000001d0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.523998] JIT code: 000001e0: 7f e0 00 08 7f e0 00 08 7f e0 00 08 
7f e0 00 08
[   67.532565] JIT code: 000001f0: 7f e0 00 08 7f e0 00 08
[   82.620898] device eth0 left promiscuous mode


> 
> [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
> 
> Hari Bathini (3):
>    powerpc/bpf: implement bpf_arch_text_copy
>    powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
>    powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
> 
>   arch/powerpc/net/bpf_jit.h        |  18 +--
>   arch/powerpc/net/bpf_jit_comp.c   | 194 ++++++++++++++++++++++++------
>   arch/powerpc/net/bpf_jit_comp32.c |  26 ++--
>   arch/powerpc/net/bpf_jit_comp64.c |  32 ++---
>   4 files changed, 198 insertions(+), 72 deletions(-)
> 



* Re: [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy
  2022-11-10 18:43 ` [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
@ 2022-11-13 13:17   ` Christophe Leroy
  2022-11-14 14:54     ` Hari Bathini
  0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2022-11-13 13:17 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Le 10/11/2022 à 19:43, Hari Bathini a écrit :
> bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
> multiple BPF programs to share the same page. Using patch_instruction
> to implement it.

Using patch_instruction() is nice for a quick implementation, but it is 
probably suboptimal. Given the amount of data to be copied, it is worth 
having a dedicated function that maps a RW copy of the page to be 
updated, does the copy at once with memcpy(), then unmaps the page.

> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>   arch/powerpc/net/bpf_jit_comp.c | 39 ++++++++++++++++++++++++++++++++-
>   1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 43e634126514..7383e0effad2 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -13,9 +13,12 @@
>   #include <linux/netdevice.h>
>   #include <linux/filter.h>
>   #include <linux/if_vlan.h>
> -#include <asm/kprobes.h>
> +#include <linux/memory.h>
>   #include <linux/bpf.h>
>   
> +#include <asm/kprobes.h>
> +#include <asm/code-patching.h>
> +
>   #include "bpf_jit.h"
>   
>   static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> @@ -23,6 +26,35 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>   	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
>   }
>   
> +/*
> + * Patch 'len' bytes of instructions from opcode to addr, one instruction
> + * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
> + */
> +static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
> +{
> +	void *ret = ERR_PTR(-EINVAL);
> +	size_t patched = 0;
> +	u32 *inst = opcode;
> +	u32 *start = addr;
> +
> +	if (WARN_ON_ONCE(core_kernel_text((unsigned long)addr)))
> +		return ret;
> +
> +	mutex_lock(&text_mutex);
> +	while (patched < len) {
> +		if (patch_instruction(start++, ppc_inst(*inst)))
> +			goto error;
> +
> +		inst++;
> +		patched += 4;
> +	}
> +
> +	ret = addr;
> +error:
> +	mutex_unlock(&text_mutex);
> +	return ret;
> +}
> +
>   /* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass */
>   static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
>   				   struct codegen_context *ctx, u32 *addrs)
> @@ -357,3 +389,8 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
>   	ctx->exentry_idx++;
>   	return 0;
>   }
> +
> +void *bpf_arch_text_copy(void *dst, void *src, size_t len)
> +{
> +	return bpf_patch_instructions(dst, src, len);
> +}

I can't see the added value of having two functions when the first one 
just calls the second one and is its only user. Why not implement 
bpf_patch_instructions() directly inside bpf_arch_text_copy()?

By the way, it can be nice to have two functions, but split them 
differently, to avoid the goto, etc.

I also prefer using for loops instead of while loops.

It could have looked like below (untested):

static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
{
	u32 *inst;
	u32 *start;
	u32 *end = addr + len;

	for (inst = opcode, start = addr; start < end; inst++, start++) {
		if (patch_instruction(start, ppc_inst(*inst)))
			return ERR_PTR(-EINVAL);
	}

	return addr;
}

void *bpf_arch_text_copy(void *dst, void *src, size_t len)
{
	void *ret;

	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
		return ERR_PTR(-EINVAL);

	mutex_lock(&text_mutex);

	ret = bpf_patch_instructions(dst, src, len);

	mutex_unlock(&text_mutex);

	return ret;
}




* Re: [RFC PATCH 2/3] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  2022-11-10 18:43 ` [RFC PATCH 2/3] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
@ 2022-11-13 18:00   ` Christophe Leroy
  0 siblings, 0 replies; 19+ messages in thread
From: Christophe Leroy @ 2022-11-13 18:00 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Le 10/11/2022 à 19:43, Hari Bathini a écrit :
> Implement bpf_arch_text_invalidate and use it to fill unused part of
> the bpf_prog_pack with trap instructions when a BPF program is freed.

Same here: although patch_instruction() is nice for a first try, it is 
not the solution in the long run. As with the previous patch, it should 
just map the necessary page by allocating a vm area, then map the 
associated physical pages over it using map_kernel_page(), then use 
bpf_jit_fill_ill_insns() on that page.

> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>   arch/powerpc/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++++++++
>   1 file changed, 32 insertions(+)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 7383e0effad2..f925755cd249 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -26,6 +26,33 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>   	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
>   }
>   
> +/*
> + * Patch 'len' bytes with trap instruction at addr, one instruction
> + * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
> + */
> +static void *bpf_patch_ill_insns(void *addr, size_t len)
> +{
> +	void *ret = ERR_PTR(-EINVAL);
> +	size_t patched = 0;
> +	u32 *start = addr;
> +
> +	if (WARN_ON_ONCE(core_kernel_text((unsigned long)addr)))
> +		return ret;
> +
> +	mutex_lock(&text_mutex);
> +	while (patched < len) {
> +		if (patch_instruction(start++, ppc_inst(PPC_RAW_TRAP())))

Use BREAKPOINT_INSTRUCTION instead of PPC_RAW_TRAP()

> +			goto error;
> +
> +		patched += 4;
> +	}
> +
> +	ret = addr;
> +error:
> +	mutex_unlock(&text_mutex);
> +	return ret;
> +}
> +
>   /*
>    * Patch 'len' bytes of instructions from opcode to addr, one instruction
>    * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
> @@ -394,3 +421,8 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
>   {
>   	return bpf_patch_instructions(dst, src, len);
>   }
> +
> +int bpf_arch_text_invalidate(void *dst, size_t len)
> +{
> +	return IS_ERR(bpf_patch_ill_insns(dst, len));
> +}


The exact same split as in the previous patch, between 
bpf_arch_text_invalidate() and bpf_patch_ill_insns(), could be done here.



* Re: [RFC PATCH 3/3] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
  2022-11-10 18:43 ` [RFC PATCH 3/3] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
@ 2022-11-13 18:36   ` Christophe Leroy
  0 siblings, 0 replies; 19+ messages in thread
From: Christophe Leroy @ 2022-11-13 18:36 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Le 10/11/2022 à 19:43, Hari Bathini a écrit :
> Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
> writes the program to the rw buffer. When the jit is done, the program
> is copied to the final location with bpf_jit_binary_pack_finalize.
> With multiple jit_subprogs, bpf_jit_free is called on some subprograms
> that haven't got bpf_jit_binary_pack_finalize() yet. Implement custom
> bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
> not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
> if necessary. While here, correct the misnomer powerpc64_jit_data to
> powerpc_jit_data as it is meant for both ppc32 and ppc64.

This patch looks heavy compared to x86 commit 1022a5498f6f.

I didn't look into the details, but is there really a need to carry that 
rw_image through all the functions you changed?

As far as I can see, you need it for the EMIT macro. But then some of 
the functions that use EMIT will now use rw_image instead of image, so 
why do they need both image and rw_image?

Maybe you'd have less churn if you left image as-is and added a ro_image 
wherever necessary, but not everywhere.


> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>   arch/powerpc/net/bpf_jit.h        |  18 +++--
>   arch/powerpc/net/bpf_jit_comp.c   | 123 +++++++++++++++++++++---------
>   arch/powerpc/net/bpf_jit_comp32.c |  26 +++----
>   arch/powerpc/net/bpf_jit_comp64.c |  32 ++++----
>   4 files changed, 128 insertions(+), 71 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index f925755cd249..c4c1f7a21d89 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -181,22 +183,25 @@ bool bpf_jit_needs_zext(void)
>   
>   struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   {
> -	u32 proglen;
> -	u32 alloclen;
> -	u8 *image = NULL;
> -	u32 *code_base;
> -	u32 *addrs;
> -	struct powerpc64_jit_data *jit_data;
> -	struct codegen_context cgctx;
> -	int pass;
> -	int flen;
> +	struct bpf_binary_header *rw_header = NULL;
> +	struct powerpc_jit_data *jit_data;
>   	struct bpf_binary_header *bpf_hdr;
> +	struct codegen_context cgctx;
>   	struct bpf_prog *org_fp = fp;
>   	struct bpf_prog *tmp_fp;
>   	bool bpf_blinded = false;
>   	bool extra_pass = false;
> +	u8 *rw_image = NULL;
> +	u32 *rw_code_base;
> +	u8 *image = NULL;
>   	u32 extable_len;
> +	u32 *code_base;
>   	u32 fixup_len;
> +	u32 alloclen;
> +	u32 proglen;
> +	u32 *addrs;
> +	int pass;
> +	int flen;

Why so many changes here? A lot of items seem to have only moved, 
without any modification. Why that churn?

>   
>   	if (!fp->jit_requested)
>   		return org_fp;
> @@ -227,6 +232,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   		image = jit_data->image;
>   		bpf_hdr = jit_data->header;
>   		proglen = jit_data->proglen;
> +		rw_header = jit_data->rw_header;
> +		rw_image = (void *)rw_header + ((void *)image - (void *)bpf_hdr);
>   		extra_pass = true;
>   		goto skip_init_ctx;
>   	}
> @@ -244,7 +251,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
>   
>   	/* Scouting faux-generate pass 0 */
> -	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
> +	if (bpf_jit_build_body(fp, 0, 0, &cgctx, addrs, 0)) {

Some of the 0s in this call are pointers. You should use NULL instead.
This comment applies to several other lines you have changed.

>   		/* We hit something illegal or unsupported. */
>   		fp = org_fp;
>   		goto out_addrs;
> @@ -259,7 +266,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   	 */
>   	if (cgctx.seen & SEEN_TAILCALL || !is_offset_in_branch_range((long)cgctx.idx * 4)) {
>   		cgctx.idx = 0;
> -		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
> +		if (bpf_jit_build_body(fp, 0, 0, &cgctx, addrs, 0)) {

0 ==> NULL

>   			fp = org_fp;
>   			goto out_addrs;
>   		}
> @@ -271,9 +278,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   	 * update ctgtx.idx as it pretends to output instructions, then we can
>   	 * calculate total size from idx.
>   	 */
> -	bpf_jit_build_prologue(0, &cgctx);
> +	bpf_jit_build_prologue(0, 0, &cgctx);
>   	addrs[fp->len] = cgctx.idx * 4;
> -	bpf_jit_build_epilogue(0, &cgctx);
> +	bpf_jit_build_epilogue(0, 0, &cgctx);

0 ==> NULL

>   
>   	fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
>   	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
> @@ -337,17 +348,26 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   
>   #ifdef CONFIG_PPC64_ELF_ABI_V1
>   	/* Function descriptor nastiness: Address + TOC */
> -	((u64 *)image)[0] = (u64)code_base;
> -	((u64 *)image)[1] = local_paca->kernel_toc;
> +	((u64 *)rw_image)[0] = (u64)code_base;
> +	((u64 *)rw_image)[1] = local_paca->kernel_toc;

Would be better to use 'struct func_desc'

And the #ifdef is not necessary, IS_ENABLED() would be enough.

>   #endif
>   
>   	fp->bpf_func = (void *)image;
>   	fp->jited = 1;
>   	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
>   
> -	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + bpf_hdr->size);
>   	if (!fp->is_func || extra_pass) {
> -		bpf_jit_binary_lock_ro(bpf_hdr);
> +		/*
> +		 * bpf_jit_binary_pack_finalize fails in two scenarios:
> +		 *   1) header is not pointing to proper module memory;

Can that really happen ?

> +		 *   2) the arch doesn't support bpf_arch_text_copy().

The above cannot happen: you added support for bpf_arch_text_copy() in 
patch 1. So why this comment?

> +		 *
> +		 * Both cases are serious bugs that justify WARN_ON.
> +		 */

Case 2 would mean a bug in the compiler; if you can't trust your 
compiler for that, you can't trust it for anything else. That's odd.

> +		if (WARN_ON(bpf_jit_binary_pack_finalize(fp, bpf_hdr, rw_header))) {
> +			fp = org_fp;
> +			goto out_addrs;
> +		}
>   		bpf_prog_fill_jited_linfo(fp, addrs);
>   out_addrs:
>   		kfree(addrs);



* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-11 11:25 ` [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Christophe Leroy
@ 2022-11-14 14:47   ` Hari Bathini
       [not found]     ` <bf0af91e-861c-1608-7150-d31578be9b02@csgroup.eu>
  0 siblings, 1 reply; 19+ messages in thread
From: Hari Bathini @ 2022-11-14 14:47 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko

Hi Christophe,

On 11/11/22 4:55 pm, Christophe Leroy wrote:
> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>> Most BPF programs are small, but they consume a page each. For systems
>> with busy traffic and many BPF programs, this may also add significant
>> pressure on instruction TLB. High iTLB pressure usually slows down the
>> whole system causing visible performance degradation for production
>> workloads.
>>
>> bpf_prog_pack, a customized allocator that packs multiple bpf programs
>> into preallocated memory chunks, was proposed [1] to address it. This
>> series extends this support on powerpc.
>>
>> Patches 1 & 2 add the arch specific functions needed to support this
>> feature. Patch 3 enables the support for powerpc. The last patch
>> ensures cleanup is handled gracefully.
>>

>> Tested the changes successfully on a PowerVM. patch_instruction(),
>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>> Posting the patches in the meanwhile for feedback on these changes.
> 
> I did a quick test on ppc32, I don't get such a problem, only something
> wrong in the dump print as only trap instructions are dumped, but
> tcpdump works as expected:

Thanks for the quick test. Could you please share the config you used?
I am probably missing a few knobs in my config...

Thanks
Hari


* Re: [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy
  2022-11-13 13:17   ` Christophe Leroy
@ 2022-11-14 14:54     ` Hari Bathini
  0 siblings, 0 replies; 19+ messages in thread
From: Hari Bathini @ 2022-11-14 14:54 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



On 13/11/22 6:47 pm, Christophe Leroy wrote:
> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>> bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
>> multiple BPF programs to share the same page. Using patch_instruction
>> to implement it.
> 
> Using patch_instruction() is nice for a quick implementation, but it is
> probably suboptimal. Due to the amount of data to be copied, it is worth

Yeah.

> a dedicated function that maps a RW copy of the page to be updated, then
> does the copy at once with memcpy(), then unmaps the page.

I will see if I can come up with such an implementation for the respin.

> 
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>    arch/powerpc/net/bpf_jit_comp.c | 39 ++++++++++++++++++++++++++++++++-
>>    1 file changed, 38 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
>> index 43e634126514..7383e0effad2 100644
>> --- a/arch/powerpc/net/bpf_jit_comp.c
>> +++ b/arch/powerpc/net/bpf_jit_comp.c
>> @@ -13,9 +13,12 @@
>>    #include <linux/netdevice.h>
>>    #include <linux/filter.h>
>>    #include <linux/if_vlan.h>
>> -#include <asm/kprobes.h>
>> +#include <linux/memory.h>
>>    #include <linux/bpf.h>
>>    
>> +#include <asm/kprobes.h>
>> +#include <asm/code-patching.h>
>> +
>>    #include "bpf_jit.h"
>>    
>>    static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>> @@ -23,6 +26,35 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>>    	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
>>    }
>>    
>> +/*
>> + * Patch 'len' bytes of instructions from opcode to addr, one instruction
>> + * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
>> + */
>> +static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
>> +{
>> +	void *ret = ERR_PTR(-EINVAL);
>> +	size_t patched = 0;
>> +	u32 *inst = opcode;
>> +	u32 *start = addr;
>> +
>> +	if (WARN_ON_ONCE(core_kernel_text((unsigned long)addr)))
>> +		return ret;
>> +
>> +	mutex_lock(&text_mutex);
>> +	while (patched < len) {
>> +		if (patch_instruction(start++, ppc_inst(*inst)))
>> +			goto error;
>> +
>> +		inst++;
>> +		patched += 4;
>> +	}
>> +
>> +	ret = addr;
>> +error:
>> +	mutex_unlock(&text_mutex);
>> +	return ret;
>> +}
>> +
>>    /* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass */
>>    static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
>>    				   struct codegen_context *ctx, u32 *addrs)
>> @@ -357,3 +389,8 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
>>    	ctx->exentry_idx++;
>>    	return 0;
>>    }
>> +
>> +void *bpf_arch_text_copy(void *dst, void *src, size_t len)
>> +{
>> +	return bpf_patch_instructions(dst, src, len);
>> +}
> 
> I can't see the added value of having two functions when the first one
> just calls the second one and is the only user of it. Why not have
> implemented bpf_patch_instructions() directly inside bpf_arch_text_copy() ?
> 
> By the way, it can be nice to have two functions, but split them
> differently, to avoid the goto: etc ....
> 
> I also prefer using for loops instead of while loops.
> 

> It could have looked like below (untested):
> 
> static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
> {
> 	u32 *inst;
> 	u32 *start;
> 	u32 *end = addr + len;
> 
> 	for (inst = opcode, start = addr; start < end; inst++, start++) {
> 		if (patch_instruction(start, ppc_inst(*inst)))
> 			return ERR_PTR(-EINVAL);
> 	}
> 
> 	return addr;
> }
> 
> void *bpf_arch_text_copy(void *dst, void *src, size_t len)
> {
> 	void *ret;
> 
> 	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
> 		return ERR_PTR(-EINVAL);
> 
> 	mutex_lock(&text_mutex);
> 
> 	ret = bpf_patch_instructions(dst, src, len);
> 
> 	mutex_unlock(&text_mutex);
> 
> 	return ret;
> }
> 
> 

Sure. Will use this.

Thanks
Hari


* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
       [not found]       ` <e0266414-843f-db48-a56d-1d8a8944726a@csgroup.eu>
@ 2022-11-16 17:01         ` Hari Bathini
  2022-11-16 17:16           ` Christophe Leroy
  2022-11-17  6:59           ` Christophe Leroy
  0 siblings, 2 replies; 19+ messages in thread
From: Hari Bathini @ 2022-11-16 17:01 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



On 16/11/22 12:14 am, Christophe Leroy wrote:
> 
> 
> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>
>>
>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>> Hi Christophe,
>>>
>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>> Most BPF programs are small, but they consume a page each. For systems
>>>>> with busy traffic and many BPF programs, this may also add significant
>>>>> pressure on instruction TLB. High iTLB pressure usually slows down the
>>>>> whole system causing visible performance degradation for production
>>>>> workloads.
>>>>>
>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf programs
>>>>> into preallocated memory chunks, was proposed [1] to address it. This
>>>>> series extends this support on powerpc.
>>>>>
>>>>> Patches 1 & 2 add the arch specific functions needed to support this
>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>> ensures cleanup is handled gracefully.
>>>>>
>>>
>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>
>>>> I did a quick test on ppc32, I don't get such a problem, only something
>>>> wrong in the dump print as only trap instructions are dumped, but
>>>> tcpdump works as expected:
>>>
>>> Thanks for the quick test. Could you please share the config you used.
>>> I am probably missing a few knobs in my config...
>>>
>>
> 
> I also managed to test it on QEMU. The config is based on pmac32_defconfig.

I had the same config but hit this problem:

	# echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
	test_bpf: #0 TAX
	------------[ cut here ]------------
	WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
bpf_int_jit_compile+0x8a0/0x9f8
	Modules linked in: test_bpf(+)
	CPU: 0 PID: 96 Comm: modprobe Not tainted 6.1.0-rc5+ #116
	Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
	NIP:  c0038224 LR: c0037f74 CTR: 00000009
	REGS: f1b41b10 TRAP: 0700   Not tainted  (6.1.0-rc5+)
	MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 82004882  XER: 20000000

	GPR00: c0037f74 f1b41bd0 c12333c0 ffffffea c19747e0 c0123a08 c1974a00 
c12333c0
	GPR08: 00000000 00000b58 00009032 00000003 82004882 100d816a f2660000 
00000000
	GPR16: 00000000 c0c10000 00000000 c0c10000 c1913780 00000000 0000001a 
100a2ee3
	GPR24: 0000014c be857000 be857020 f1b41c00 c194c180 f1b46000 ffffffea 
f1b46000
	NIP [c0038224] bpf_int_jit_compile+0x8a0/0x9f8
	LR [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8
	Call Trace:
	[f1b41bd0] [c0037f74] bpf_int_jit_compile+0x5f0/0x9f8 (unreliable)
	[f1b41ca0] [c0123790] bpf_prog_select_runtime+0x178/0x1c4
	[f1b41cd0] [c06cc4b0] bpf_prepare_filter+0x524/0x624
	[f1b41d20] [c06cc63c] bpf_prog_create+0x8c/0xd0
	[f1b41d40] [be85083c] test_bpf_init+0x46c/0x11b0 [test_bpf]
	[f1b41df0] [c0008534] do_one_initcall+0x58/0x1b8
	[f1b41e60] [c00b6c38] do_init_module+0x54/0x1e4
	[f1b41e80] [c00b8f80] sys_init_module+0x10c/0x174
	[f1b41f10] [c0014390] system_call_exception+0x94/0x144
	[f1b41f30] [c001a1ac] ret_from_syscall+0x0/0x2c
	--- interrupt: c00 at 0xfef48ac
	NIP:  0fef48ac LR: 10015de0 CTR: 00000000
	REGS: f1b41f40 TRAP: 0c00   Not tainted  (6.1.0-rc5+)
	MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 88002224  XER: 20000000

	GPR00: 00000080 aff34990 a7a8d380 a75c0010 00477e90 1051b9a0 0fedfdd8 
0000d032
	GPR08: 00000000 00000008 00478ffa 400f26fa 400f23c7 100d816a 00000000 
00000000
	GPR16: 00000000 1051b950 00000000 1051b92c 100d0000 100a2eab 100a2e97 
100a2ee3
	GPR24: 100a2ed5 1051b980 00000001 100d01a8 1051b920 a75c0010 1051b9a0 
a7a86388
	NIP [0fef48ac] 0xfef48ac
	LR [10015de0] 0x10015de0
	--- interrupt: c00
	Instruction dump:
	8081001c 38a00004 7f23cb78 4bfff5d1 7f23cb78 8081001c 480eba85 82410098
	8261009c 82a100a4 82e100ac 4bfffce8 <0fe00000> 92010090 92410098 92e100ac
	---[ end trace 0000000000000000 ]---
	jited:1
	kernel tried to execute exec-protected page (be857020) - exploit 
attempt? (uid: 0)
	BUG: Unable to handle kernel instruction fetch
	Faulting instruction address: 0xbe857020
	Vector: 400 (Instruction Access) at [f1b41c30]
	    pc: be857020
	    lr: be84eaa4: __run_one+0xec/0x248 [test_bpf]
	    sp: f1b41cf0
	   msr: 40009032
	  current = 0xc12333c0
	    pid   = 96, comm = modprobe
	Linux version 6.1.0-rc5+ (root@hbathini-Standard-PC-Q35-ICH9-2009) 
(powerpc-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld 
(GNU Binutils for Ubuntu) 2.38) #116 Wed Nov 16 20:48:10 IST 2022
	enter ? for help
	[link register   ] be84eaa4 __run_one+0xec/0x248 [test_bpf]
	[f1b41cf0] be84e9dc __run_one+0x24/0x248 [test_bpf] (unreliable)
	[f1b41d40] be850c78 test_bpf_init+0x8a8/0x11b0 [test_bpf]
	[f1b41df0] c0008534 do_one_initcall+0x58/0x1b8
	[f1b41e60] c00b6c38 do_init_module+0x54/0x1e4
	[f1b41e80] c00b8f80 sys_init_module+0x10c/0x174
	[f1b41f10] c0014390 system_call_exception+0x94/0x144
	[f1b41f30] c001a1ac ret_from_syscall+0x0/0x2c
	--- Exception: c00 (System Call) at 0fef48ac
	SP (aff34990) is in userspace
	mon>

The bpf_jit_binary_pack_finalize() call failed due to patch_instruction().
I am also hitting the same problem with the other config.

Thanks
Hari


* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-16 17:01         ` Hari Bathini
@ 2022-11-16 17:16           ` Christophe Leroy
  2022-11-17  6:59           ` Christophe Leroy
  1 sibling, 0 replies; 19+ messages in thread
From: Christophe Leroy @ 2022-11-16 17:16 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>
>> I also managed to test it on QEMU. The config is based on 
>> pmac32_defconfig.
> 
> I had the same config but hit this problem:
> 
>      # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>      test_bpf: #0 TAX
>      ------------[ cut here ]------------
>      WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
> bpf_int_jit_compile+0x8a0/0x9f8

Ok, I'll give it a try with the test_bpf module.

Christophe



* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-16 17:01         ` Hari Bathini
  2022-11-16 17:16           ` Christophe Leroy
@ 2022-11-17  6:59           ` Christophe Leroy
  2022-11-18  8:39             ` Hari Bathini
  1 sibling, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2022-11-17  6:59 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



Le 16/11/2022 à 18:01, Hari Bathini a écrit :
> 
> 
> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>
>>
>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>
>>>
>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>> Hi Christophe,
>>>>
>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>> Most BPF programs are small, but they consume a page each. For 
>>>>>> systems
>>>>>> with busy traffic and many BPF programs, this may also add 
>>>>>> significant
>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down 
>>>>>> the
>>>>>> whole system causing visible performance degradation for production
>>>>>> workloads.
>>>>>>
>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf 
>>>>>> programs
>>>>>> into preallocated memory chunks, was proposed [1] to address it. This
>>>>>> series extends this support on powerpc.
>>>>>>
>>>>>> Patches 1 & 2 add the arch specific functions needed to support this
>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>> ensures cleanup is handled gracefully.
>>>>>>
>>>>
>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>
>>>>> I did a quick test on ppc32, I don't get such a problem, only 
>>>>> something
>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>> tcpdump works as expected:
>>>>
>>>> Thanks for the quick test. Could you please share the config you used.
>>>> I am probably missing a few knobs in my conifg...
>>>>
>>>
>>
>> I also managed to test it on QEMU. The config is based on 
>> pmac32_defconfig.
> 
> I had the same config but hit this problem:
> 
>      # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>      test_bpf: #0 TAX
>      ------------[ cut here ]------------
>      WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367 
> bpf_int_jit_compile+0x8a0/0x9f8

I get no such problem on QEMU, and I checked that the .config has:
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_STRICT_MODULE_RWX=y

Boot successful.
/ # ifconfig eth0 10.0.2.15
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
/ # tftp -g 10.0.2.2 -r test_bpf.ko
/ # echo 1 > /proc/sys/net/core/bpf_jit_enable
/ # insmod ./test_bpf.ko
test_bpf: #0 TAX jited:1 216 87 86 PASS
test_bpf: #1 TXA jited:1 57 27 27 PASS
test_bpf: #2 ADD_SUB_MUL_K jited:1 50 PASS
test_bpf: #3 DIV_MOD_KX jited:1 110 PASS
test_bpf: #4 AND_OR_LSH_K jited:1 67 26 PASS
test_bpf: #5 LD_IMM_0 jited:1 77 PASS
...

By the way, you can note that during the boot you get:

	This platform has HASH MMU, STRICT_MODULE_RWX won't work

See why in 0670010f3b10 ("powerpc/32s: Enable STRICT_MODULE_RWX for the 
603 core")

Nevertheless, it shouldn't prevent patch_instruction() from working.

Could you add a pr_err() in the failure path of __patch_instruction() to 
print and check exec_addr and patch_addr?



>      jited:1
>      kernel tried to execute exec-protected page (be857020) - exploit 
> attempt? (uid: 0)
>      BUG: Unable to handle kernel instruction fetch
>      Faulting instruction address: 0xbe857020

I'm a bit surprised by this. On hash-based book3s/32 there is no way to 
exec-protect individual pages. Protection is performed at the segment 
level: all kernel segments have the NX bit set except the segment used 
for module text, which is by default 0xb0000000-0xbfffffff.

Or maybe this is the first time that address is accessed, and the ISI 
handler does the check before loading the hash table?

> 
> bpf_jit_binary_pack_finalize() function failed due to 
> patch_instruction() ..

Is there a way to tell the BPF core that the JIT failed in that case, to avoid that?

Christophe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-17  6:59           ` Christophe Leroy
@ 2022-11-18  8:39             ` Hari Bathini
  2022-11-18  8:51               ` Christophe Leroy
  0 siblings, 1 reply; 19+ messages in thread
From: Hari Bathini @ 2022-11-18  8:39 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



On 17/11/22 12:29 pm, Christophe Leroy wrote:
> 
> 
> Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>
>>
>> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>>
>>>
>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>>
>>>>
>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>>> Hi Christophe,
>>>>>
>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>>> Most BPF programs are small, but they consume a page each. For
>>>>>>> systems
>>>>>>> with busy traffic and many BPF programs, this may also add
>>>>>>> significant
>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down
>>>>>>> the
>>>>>>> whole system causing visible performance degradation for production
>>>>>>> workloads.
>>>>>>>
>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf
>>>>>>> programs
>>>>>>> into preallocated memory chunks, was proposed [1] to address it. This
>>>>>>> series extends this support on powerpc.
>>>>>>>
>>>>>>> Patches 1 & 2 add the arch specific functions needed to support this
>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>>> ensures cleanup is handled racefully.
>>>>>>>
>>>>>
>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging it.
>>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>>
>>>>>> I did a quick test on ppc32, I don't get such a problem, only
>>>>>> something
>>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>>> tcpdump works as expected:
>>>>>
>>>>> Thanks for the quick test. Could you please share the config you used.
>>>>> I am probably missing a few knobs in my conifg...
>>>>>
>>>>
>>>
>>> I also managed to test it on QEMU. The config is based on
>>> pmac32_defconfig.
>>
>> I had the same config but hit this problem:
>>
>>       # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>       test_bpf: #0 TAX
>>       ------------[ cut here ]------------
>>       WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>> bpf_int_jit_compile+0x8a0/0x9f8
> 
> I get no such problem, on QEMU, and I checked the .config has:

> CONFIG_STRICT_KERNEL_RWX=y
> CONFIG_STRICT_MODULE_RWX=y

Yeah, that did the trick. These options were missing from my config and
from the pmac config you shared. I could not run the other config you
shared on QEMU. Thanks for all the pointers.

- Hari

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-18  8:39             ` Hari Bathini
@ 2022-11-18  8:51               ` Christophe Leroy
  2022-11-18  9:39                 ` Hari Bathini
  0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2022-11-18  8:51 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



Le 18/11/2022 à 09:39, Hari Bathini a écrit :
> 
> 
> On 17/11/22 12:29 pm, Christophe Leroy wrote:
>>
>>
>> Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>>
>>>
>>> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>>>
>>>>>
>>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>>>> Hi Christophe,
>>>>>>
>>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>>>> Most BPF programs are small, but they consume a page each. For
>>>>>>>> systems
>>>>>>>> with busy traffic and many BPF programs, this may also add
>>>>>>>> significant
>>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down
>>>>>>>> the
>>>>>>>> whole system causing visible performance degradation for production
>>>>>>>> workloads.
>>>>>>>>
>>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf
>>>>>>>> programs
>>>>>>>> into preallocated memory chunks, was proposed [1] to address it. 
>>>>>>>> This
>>>>>>>> series extends this support on powerpc.
>>>>>>>>
>>>>>>>> Patches 1 & 2 add the arch specific functions needed to support 
>>>>>>>> this
>>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>>>> ensures cleanup is handled racefully.
>>>>>>>>
>>>>>>
>>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging 
>>>>>>>> it.
>>>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>>>
>>>>>>> I did a quick test on ppc32, I don't get such a problem, only
>>>>>>> something
>>>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>>>> tcpdump works as expected:
>>>>>>
>>>>>> Thanks for the quick test. Could you please share the config you 
>>>>>> used.
>>>>>> I am probably missing a few knobs in my conifg...
>>>>>>
>>>>>
>>>>
>>>> I also managed to test it on QEMU. The config is based on
>>>> pmac32_defconfig.
>>>
>>> I had the same config but hit this problem:
>>>
>>>       # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>       test_bpf: #0 TAX
>>>       ------------[ cut here ]------------
>>>       WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>> bpf_int_jit_compile+0x8a0/0x9f8
>>
>> I get no such problem, on QEMU, and I checked the .config has:
> 
>> CONFIG_STRICT_KERNEL_RWX=y
>> CONFIG_STRICT_MODULE_RWX=y
> 
> Yeah. That did the trick.

Interesting. I guess we have to find out why it fails when those config 
options are missing.

Maybe the module code plays with the RO and NX flags even if 
CONFIG_STRICT_MODULE_RWX is not selected?

Christophe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-18  8:51               ` Christophe Leroy
@ 2022-11-18  9:39                 ` Hari Bathini
  2022-11-18 11:46                   ` Christophe Leroy
  0 siblings, 1 reply; 19+ messages in thread
From: Hari Bathini @ 2022-11-18  9:39 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



On 18/11/22 2:21 pm, Christophe Leroy wrote:
> 
> 
> Le 18/11/2022 à 09:39, Hari Bathini a écrit :
>>
>>
>> On 17/11/22 12:29 pm, Christophe Leroy wrote:
>>>
>>>
>>> Le 16/11/2022 à 18:01, Hari Bathini a écrit :
>>>>
>>>>
>>>> On 16/11/22 12:14 am, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> Le 14/11/2022 à 18:27, Christophe Leroy a écrit :
>>>>>>
>>>>>>
>>>>>> Le 14/11/2022 à 15:47, Hari Bathini a écrit :
>>>>>>> Hi Christophe,
>>>>>>>
>>>>>>> On 11/11/22 4:55 pm, Christophe Leroy wrote:
>>>>>>>> Le 10/11/2022 à 19:43, Hari Bathini a écrit :
>>>>>>>>> Most BPF programs are small, but they consume a page each. For
>>>>>>>>> systems
>>>>>>>>> with busy traffic and many BPF programs, this may also add
>>>>>>>>> significant
>>>>>>>>> pressure on instruction TLB. High iTLB pressure usually slows down
>>>>>>>>> the
>>>>>>>>> whole system causing visible performance degradation for production
>>>>>>>>> workloads.
>>>>>>>>>
>>>>>>>>> bpf_prog_pack, a customized allocator that packs multiple bpf
>>>>>>>>> programs
>>>>>>>>> into preallocated memory chunks, was proposed [1] to address it.
>>>>>>>>> This
>>>>>>>>> series extends this support on powerpc.
>>>>>>>>>
>>>>>>>>> Patches 1 & 2 add the arch specific functions needed to support
>>>>>>>>> this
>>>>>>>>> feature. Patch 3 enables the support for powerpc. The last patch
>>>>>>>>> ensures cleanup is handled racefully.
>>>>>>>>>
>>>>>>>
>>>>>>>>> Tested the changes successfully on a PowerVM. patch_instruction(),
>>>>>>>>> needed for bpf_arch_text_copy(), is failing for ppc32. Debugging
>>>>>>>>> it.
>>>>>>>>> Posting the patches in the meanwhile for feedback on these changes.
>>>>>>>>
>>>>>>>> I did a quick test on ppc32, I don't get such a problem, only
>>>>>>>> something
>>>>>>>> wrong in the dump print as traps intructions only are dumped, but
>>>>>>>> tcpdump works as expected:
>>>>>>>
>>>>>>> Thanks for the quick test. Could you please share the config you
>>>>>>> used.
>>>>>>> I am probably missing a few knobs in my conifg...
>>>>>>>
>>>>>>
>>>>>
>>>>> I also managed to test it on QEMU. The config is based on
>>>>> pmac32_defconfig.
>>>>
>>>> I had the same config but hit this problem:
>>>>
>>>>        # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>>        test_bpf: #0 TAX
>>>>        ------------[ cut here ]------------
>>>>        WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>>> bpf_int_jit_compile+0x8a0/0x9f8
>>>
>>> I get no such problem, on QEMU, and I checked the .config has:
>>
>>> CONFIG_STRICT_KERNEL_RWX=y
>>> CONFIG_STRICT_MODULE_RWX=y
>>
>> Yeah. That did the trick.
> 
> Interesting. I guess we have to find out why it fails when those config
> are missing.
> 
> Maybe module code plays with RO and NX flags even if
> CONFIG_STRICT_MODULE_RWX is not selected ?

I need to look at the code more closely but, FWIW, I am observing the
same failure on 64-bit as well with !STRICT_RWX...

Thanks
Hari

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-18  9:39                 ` Hari Bathini
@ 2022-11-18 11:46                   ` Christophe Leroy
  2022-11-18 17:28                     ` Song Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2022-11-18 11:46 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



Le 18/11/2022 à 10:39, Hari Bathini a écrit :
> 
> 
> On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>>
>>>>> I had the same config but hit this problem:
>>>>>
>>>>>        # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>>>        test_bpf: #0 TAX
>>>>>        ------------[ cut here ]------------
>>>>>        WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>>>> bpf_int_jit_compile+0x8a0/0x9f8
>>>>
>>>> I get no such problem, on QEMU, and I checked the .config has:
>>>
>>>> CONFIG_STRICT_KERNEL_RWX=y
>>>> CONFIG_STRICT_MODULE_RWX=y
>>>
>>> Yeah. That did the trick.
>>
>> Interesting. I guess we have to find out why it fails when those config
>> are missing.
>>
>> Maybe module code plays with RO and NX flags even if
>> CONFIG_STRICT_MODULE_RWX is not selected ?
> 
> Need to look at the code closely but fwiw, observing same failure on
> 64-bit as well with !STRICT_RWX...

The problem is in bpf_prog_pack_alloc() and in alloc_new_pack(): they 
do set_memory_ro() and set_memory_x() without taking 
CONFIG_STRICT_MODULE_RWX into account.

When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc() 
allocates PAGE_KERNEL memory, that is RW memory, and expects the user to 
call set_memory_ro() and set_memory_x().

But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc 
module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory, 
and expects to be able to always write into it.
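
That asymmetry can be sketched roughly as follows. This is a simplified 
illustration of the behaviour described above, not the exact code from 
arch/powerpc/kernel/module.c:

```c
/*
 * Simplified sketch of the powerpc module_alloc() page-protection
 * choice. Illustration only; the real code lives in
 * arch/powerpc/kernel/module.c and handles more cases.
 */
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/moduleloader.h>

void *module_alloc(unsigned long size)
{
	/*
	 * With STRICT_MODULE_RWX: hand out RW, non-executable memory
	 * and rely on the caller to flip permissions later with
	 * set_memory_ro()/set_memory_x().
	 * Without it: hand out RWX text memory and expect it to stay
	 * writable, so a later set_memory_ro() from a generic caller
	 * breaks the "always writable" assumption.
	 */
	pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL
						    : PAGE_KERNEL_TEXT;

	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
				    GFP_KERNEL, prot, VM_FLUSH_RESET_PERMS,
				    NUMA_NO_NODE,
				    __builtin_return_address(0));
}
```

So a generic caller like bpf_prog_pack cannot assume either model 
without checking the config.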

Christophe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-18 11:46                   ` Christophe Leroy
@ 2022-11-18 17:28                     ` Song Liu
  2022-11-18 18:05                       ` Christophe Leroy
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2022-11-18 17:28 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Hari Bathini, linuxppc-dev, bpf@vger.kernel.org, Naveen N. Rao,
	Song Liu, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko

On Fri, Nov 18, 2022 at 3:47 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 18/11/2022 à 10:39, Hari Bathini a écrit :
> >
> >
> > On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>>
> >>>>> I had the same config but hit this problem:
> >>>>>
> >>>>>        # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
> >>>>>        test_bpf: #0 TAX
> >>>>>        ------------[ cut here ]------------
> >>>>>        WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
> >>>>> bpf_int_jit_compile+0x8a0/0x9f8
> >>>>
> >>>> I get no such problem, on QEMU, and I checked the .config has:
> >>>
> >>>> CONFIG_STRICT_KERNEL_RWX=y
> >>>> CONFIG_STRICT_MODULE_RWX=y
> >>>
> >>> Yeah. That did the trick.
> >>
> >> Interesting. I guess we have to find out why it fails when those config
> >> are missing.
> >>
> >> Maybe module code plays with RO and NX flags even if
> >> CONFIG_STRICT_MODULE_RWX is not selected ?
> >
> > Need to look at the code closely but fwiw, observing same failure on
> > 64-bit as well with !STRICT_RWX...
>
> The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They
> do set_memory_ro() and set_memory_x() without taking into account
> CONFIG_STRICT_MODULE_RWX.
>
> When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc()
> allocates PAGE_KERNEL memory, that is RW memory, and expects the user to
> call do set_memory_ro() and set_memory_x().
>
> But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc
> module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory,
> and expects to be able to always write into it.

Ah, I see. x86_64 requires CONFIG_STRICT_MODULE_RWX, so this hasn't
been a problem yet.

Thanks,
Song

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc
  2022-11-18 17:28                     ` Song Liu
@ 2022-11-18 18:05                       ` Christophe Leroy
  0 siblings, 0 replies; 19+ messages in thread
From: Christophe Leroy @ 2022-11-18 18:05 UTC (permalink / raw)
  To: Song Liu, Hari Bathini
  Cc: linuxppc-dev, bpf@vger.kernel.org, Naveen N. Rao, Song Liu,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko



Le 18/11/2022 à 18:28, Song Liu a écrit :
> On Fri, Nov 18, 2022 at 3:47 AM Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> Le 18/11/2022 à 10:39, Hari Bathini a écrit :
>>>
>>>
>>> On 18/11/22 2:21 pm, Christophe Leroy wrote: >>>>>
>>>>>>> I had the same config but hit this problem:
>>>>>>>
>>>>>>>         # echo 1 > /proc/sys/net/core/bpf_jit_enable; modprobe test_bpf
>>>>>>>         test_bpf: #0 TAX
>>>>>>>         ------------[ cut here ]------------
>>>>>>>         WARNING: CPU: 0 PID: 96 at arch/powerpc/net/bpf_jit_comp.c:367
>>>>>>> bpf_int_jit_compile+0x8a0/0x9f8
>>>>>>
>>>>>> I get no such problem, on QEMU, and I checked the .config has:
>>>>>
>>>>>> CONFIG_STRICT_KERNEL_RWX=y
>>>>>> CONFIG_STRICT_MODULE_RWX=y
>>>>>
>>>>> Yeah. That did the trick.
>>>>
>>>> Interesting. I guess we have to find out why it fails when those config
>>>> are missing.
>>>>
>>>> Maybe module code plays with RO and NX flags even if
>>>> CONFIG_STRICT_MODULE_RWX is not selected ?
>>>
>>> Need to look at the code closely but fwiw, observing same failure on
>>> 64-bit as well with !STRICT_RWX...
>>
>> The problem is in bpf_prog_pack_alloc() and in alloc_new_pack() : They
>> do set_memory_ro() and set_memory_x() without taking into account
>> CONFIG_STRICT_MODULE_RWX.
>>
>> When CONFIG_STRICT_MODULE_RWX is selected, powerpc module_alloc()
>> allocates PAGE_KERNEL memory, that is RW memory, and expects the user to
>> call do set_memory_ro() and set_memory_x().
>>
>> But when CONFIG_STRICT_MODULE_RWX is not selected, powerpc
>> module_alloc() allocates PAGE_KERNEL_TEXT memory, that is RWX memory,
>> and expects to be able to always write into it.
> 
> Ah, I see. x86_64 requires CONFIG_STRICT_MODULE_RWX, so this hasn't
> been a problem yet.
> 

In fact it shouldn't be a problem for BPF on powerpc either, because 
powerpc BPF expects the text to be RO at all times and today uses 
bpf_jit_binary_lock_ro().

It just means that we can't use patch_instruction() for that. Anyway, 
using patch_instruction() was sub-optimal.

All we have to do, I think, is set up a mirror of the page using vmap(), 
perform a memcpy() of the code, then vunmap() it. Maybe a call to 
flush_tlb_kernel_range() will also be needed, unless BPF already does it.
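
A rough sketch of that idea (hypothetical helper, assuming the copy 
spans at most two pages; error handling and arch details elided):

```c
/*
 * Hypothetical sketch: write into read-only BPF text through a
 * temporary writable alias mapping, as suggested above. Assumes
 * dst lies in vmalloc space and the copy fits within two pages.
 */
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/cacheflush.h>

static int copy_to_ro_text(void *dst, const void *src, size_t len)
{
	unsigned long offset = offset_in_page(dst);
	struct page *pages[2];
	void *alias;

	/* The target text pages, currently mapped read-only. */
	pages[0] = vmalloc_to_page(dst);
	pages[1] = vmalloc_to_page(dst + PAGE_SIZE);

	/* Map them a second time, writable, at another virtual address. */
	alias = vmap(pages, 2, VM_MAP, PAGE_KERNEL);
	if (!alias)
		return -ENOMEM;

	memcpy(alias + offset, src, len);
	vunmap(alias);

	/* Make sure the icache sees the freshly written instructions. */
	flush_icache_range((unsigned long)dst, (unsigned long)dst + len);
	return 0;
}
```

Whether an explicit flush_tlb_kernel_range() is still needed on top of 
vunmap() is the open question mentioned above.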

Christophe

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-11-18 18:05 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-10 18:43 [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Hari Bathini
2022-11-10 18:43 ` [RFC PATCH 1/3] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
2022-11-13 13:17   ` Christophe Leroy
2022-11-14 14:54     ` Hari Bathini
2022-11-10 18:43 ` [RFC PATCH 2/3] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
2022-11-13 18:00   ` Christophe Leroy
2022-11-10 18:43 ` [RFC PATCH 3/3] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
2022-11-13 18:36   ` Christophe Leroy
2022-11-11 11:25 ` [RFC PATCH 0/3] enable bpf_prog_pack allocator for powerpc Christophe Leroy
2022-11-14 14:47   ` Hari Bathini
     [not found]     ` <bf0af91e-861c-1608-7150-d31578be9b02@csgroup.eu>
     [not found]       ` <e0266414-843f-db48-a56d-1d8a8944726a@csgroup.eu>
2022-11-16 17:01         ` Hari Bathini
2022-11-16 17:16           ` Christophe Leroy
2022-11-17  6:59           ` Christophe Leroy
2022-11-18  8:39             ` Hari Bathini
2022-11-18  8:51               ` Christophe Leroy
2022-11-18  9:39                 ` Hari Bathini
2022-11-18 11:46                   ` Christophe Leroy
2022-11-18 17:28                     ` Song Liu
2022-11-18 18:05                       ` Christophe Leroy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox