netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/3] Use bpf_prog_pack for RV64 bpf trampoline
From: Pu Lehui @ 2024-01-23 10:32 UTC
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Song Liu, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui, Pu Lehui

We use bpf_prog_pack to aggregate bpf programs into huge pages to
relieve iTLB pressure on the system. We can apply it to the bpf
trampoline as well, since Song has already implemented it in the core
and on x86 [0]. This series applies bpf_prog_pack to the RV64 bpf
trampoline. Since Song and Puranjay have already done a lot of the
bpf_prog_pack groundwork on RV64, implementing this feature is
straightforward. One thing worth mentioning is that emit_call on RV64
generates the maximum number of instructions during the dry run, while
during real patching it may be optimized down to a single instruction
when the call target is close enough. This is not a problem, as the
real image can only shrink and therefore never overflows the allocated
RO image.
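
For illustration, here is a minimal user-space sketch of the emit_call
sizing behaviour (the ranges mirror the RV64 jal/auipc+jalr split; the
helpers are illustrative, not the kernel code):

#include <stdint.h>
#include <stdio.h>

/* jal covers a signed 21-bit, i.e. +/- 1 MiB, offset */
static int fits_21b(int64_t off)
{
	return off >= -(1 << 20) && off < (1 << 20);
}

/*
 * During the dry run the call target is still unknown, so the maximum
 * of 2 instructions (auipc+jalr) is counted. During real patching, a
 * target within jal range needs only 1.
 */
static int call_insn_count(int64_t off, int dry_run)
{
	if (!dry_run && off && fits_21b(off))
		return 1;	/* jal */
	return 2;		/* auipc + jalr */
}

int main(void)
{
	printf("dry run:     %d insns\n", call_insn_count(0, 1));
	printf("near target: %d insn\n", call_insn_count(0x1000, 0));
	printf("far target:  %d insns\n", call_insn_count(1LL << 30, 0));
	return 0;
}

Since the dry-run count is always the maximum, the real image can only
shrink relative to the estimate.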

Tests of both the regular trampoline and the struct_ops trampoline
have passed, and "test_verifier" runs with no failure cases.

Link: https://lore.kernel.org/all/20231206224054.492250-1-song@kernel.org [0]

Pu Lehui (3):
  bpf: Use precise image size for struct_ops trampoline
  bpf: Keep im address consistent between dry run and real patching
  riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline

 arch/arm64/net/bpf_jit_comp.c   |  7 ++--
 arch/riscv/net/bpf_jit_comp64.c | 66 +++++++++++++++++++++++----------
 arch/s390/net/bpf_jit_comp.c    |  7 ++--
 arch/x86/net/bpf_jit_comp.c     |  7 ++--
 include/linux/bpf.h             |  4 +-
 kernel/bpf/bpf_struct_ops.c     |  4 +-
 kernel/bpf/trampoline.c         | 43 +++++++++++----------
 7 files changed, 81 insertions(+), 57 deletions(-)

-- 
2.34.1



* [PATCH bpf-next 1/3] bpf: Use precise image size for struct_ops trampoline
From: Pu Lehui @ 2024-01-23 10:32 UTC
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Song Liu, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

For a trampoline using bpf_prog_pack, we need to allocate a rw_image
buffer of size (image_end - image). For the regular trampoline, we use
the precise image size returned by arch_bpf_trampoline_size to
allocate the rw_image. But for the struct_ops trampoline, we currently
allocate the rw_image with a size close to PAGE_SIZE. There is no need
to allocate that much, as the patched code is usually much smaller
than PAGE_SIZE. Let's use the precise image size there too.
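
To make the arithmetic concrete, a small user-space sketch of the
bound check and the allocation it tightens (the numbers are made up;
in the kernel, size comes from arch_bpf_trampoline_size and the window
is the struct_ops image):

#include <stdio.h>

int main(void)
{
	long image = 0, image_end = 4096;	/* ~PAGE_SIZE window */
	long size = 0x148;			/* precise dry-run size */

	if (size > image_end - image)
		return 1;			/* -E2BIG in the kernel */

	/*
	 * Passing image + size instead of image_end lets the arch code
	 * size its rw scratch buffer from the pointer pair it is
	 * handed: 0x148 bytes instead of a whole page.
	 */
	printf("rw buffer: %ld bytes instead of %ld\n",
	       size, image_end - image);
	return 0;
}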

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 kernel/bpf/bpf_struct_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 02068bd0e4d9..e2e1bf3c69a3 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -368,7 +368,7 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
 		return size;
 	if (size > (unsigned long)image_end - (unsigned long)image)
 		return -E2BIG;
-	return arch_prepare_bpf_trampoline(NULL, image, image_end,
+	return arch_prepare_bpf_trampoline(NULL, image, image + size,
 					   model, flags, tlinks, stub_func);
 }
 
-- 
2.34.1



* [PATCH bpf-next 2/3] bpf: Keep im address consistent between dry run and real patching
From: Pu Lehui @ 2024-01-23 10:32 UTC
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Song Liu, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

In __arch_prepare_bpf_trampoline, we emit instructions to load the
address of im into a register and then pass it to the
__bpf_tramp_enter and __bpf_tramp_exit functions. Currently we use a
fake im in arch_bpf_trampoline_size for the dry run, and then allocate
a new im for the real patching. This is fine for architectures that
emit a fixed-length instruction sequence to generate an address.
However, for architectures where the length of that sequence depends
on the address value, the dry-run and real images can end up with
different sizes, leading to a patching overflow. Let's move the im
allocation ahead of the dry run and pass the allocated im to
arch_bpf_trampoline_size, so that im is identical in the dry run and
the real patching.
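
To illustrate the point about address-dependent lengths, here is a
user-space sketch that mirrors the lui/addiw/slli/addi strategy of the
RV64 JIT's emit_imm() and counts the instructions it would emit (a
simplified model of the kernel's algorithm, not the kernel code
itself):

#include <stdint.h>
#include <stdio.h>

static int imm_insn_count(int64_t val)
{
	int64_t upper = (val + (1 << 11)) >> 12;
	int64_t lower = val & 0xfff;

	/* 32-bit values: lui+addiw, or a bare addi */
	if (val >= INT32_MIN && val <= INT32_MAX)
		return upper ? 2 : 1;

	/*
	 * Wider values: strip the trailing zero bits of the upper
	 * part, build it recursively, then shift it into place and
	 * add the low 12 bits.
	 */
	return imm_insn_count(upper >> __builtin_ctzll((uint64_t)upper)) +
	       1 + (lower ? 1 : 0);
}

int main(void)
{
	/* two plausible im addresses, two different lengths */
	printf("%d insns\n", imm_insn_count((int64_t)0xffffffc800000000));
	printf("%d insns\n", imm_insn_count((int64_t)0xffffffc812345678));
	return 0;
}

With the fake im at one address and the real im at another, the dry
run and the real patching can therefore emit sequences of different
lengths.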

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/arm64/net/bpf_jit_comp.c   |  7 +++---
 arch/riscv/net/bpf_jit_comp64.c |  7 +++---
 arch/s390/net/bpf_jit_comp.c    |  7 +++---
 arch/x86/net/bpf_jit_comp.c     |  7 +++---
 include/linux/bpf.h             |  4 +--
 kernel/bpf/bpf_struct_ops.c     |  2 +-
 kernel/bpf/trampoline.c         | 43 ++++++++++++++++-----------------
 7 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8955da5c47cf..fad760f14a96 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2041,14 +2041,13 @@ static int btf_func_model_nregs(const struct btf_func_model *m)
 	return nregs;
 }
 
-int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
-			     struct bpf_tramp_links *tlinks, void *func_addr)
+int arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_model *m,
+			     u32 flags, struct bpf_tramp_links *tlinks, void *func_addr)
 {
 	struct jit_ctx ctx = {
 		.image = NULL,
 		.idx = 0,
 	};
-	struct bpf_tramp_image im;
 	int nregs, ret;
 
 	nregs = btf_func_model_nregs(m);
@@ -2056,7 +2055,7 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
 	if (nregs > 8)
 		return -ENOTSUPP;
 
-	ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, nregs, flags);
+	ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags);
 	if (ret < 0)
 		return ret;
 
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 719a97e7edb2..5c4e0ac389d0 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1030,17 +1030,16 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	return ret;
 }
 
-int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
-			     struct bpf_tramp_links *tlinks, void *func_addr)
+int arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_model *m,
+			     u32 flags, struct bpf_tramp_links *tlinks, void *func_addr)
 {
-	struct bpf_tramp_image im;
 	struct rv_jit_context ctx;
 	int ret;
 
 	ctx.ninsns = 0;
 	ctx.insns = NULL;
 	ctx.ro_insns = NULL;
-	ret = __arch_prepare_bpf_trampoline(&im, m, tlinks, func_addr, flags, &ctx);
+	ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx);
 
 	return ret < 0 ? ret : ninsns_rvoff(ctx.ninsns);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index b418333bb086..adf289eee6cd 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -2638,16 +2638,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	return 0;
 }
 
-int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
-			     struct bpf_tramp_links *tlinks, void *orig_call)
+int arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_model *m,
+			     u32 flags, struct bpf_tramp_links *tlinks, void *orig_call)
 {
-	struct bpf_tramp_image im;
 	struct bpf_tramp_jit tjit;
 	int ret;
 
 	memset(&tjit, 0, sizeof(tjit));
 
-	ret = __arch_prepare_bpf_trampoline(&im, &tjit, m, flags,
+	ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags,
 					    tlinks, orig_call);
 
 	return ret < 0 ? ret : tjit.common.prg;
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e1390d1e331b..fdef44913643 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2817,10 +2817,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	return ret;
 }
 
-int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
-			     struct bpf_tramp_links *tlinks, void *func_addr)
+int arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_model *m,
+			     u32 flags, struct bpf_tramp_links *tlinks, void *func_addr)
 {
-	struct bpf_tramp_image im;
 	void *image;
 	int ret;
 
@@ -2835,7 +2834,7 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
 	if (!image)
 		return -ENOMEM;
 
-	ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, image,
+	ret = __arch_prepare_bpf_trampoline(im, image, image + PAGE_SIZE, image,
 					    m, flags, tlinks, func_addr);
 	bpf_jit_free_exec(image);
 	return ret;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 377857b232c6..d3a486e12b17 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1114,8 +1114,8 @@ void *arch_alloc_bpf_trampoline(unsigned int size);
 void arch_free_bpf_trampoline(void *image, unsigned int size);
 void arch_protect_bpf_trampoline(void *image, unsigned int size);
 void arch_unprotect_bpf_trampoline(void *image, unsigned int size);
-int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
-			     struct bpf_tramp_links *tlinks, void *func_addr);
+int arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_model *m,
+			     u32 flags, struct bpf_tramp_links *tlinks, void *func_addr);
 
 u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
 					     struct bpf_tramp_run_ctx *run_ctx);
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index e2e1bf3c69a3..8b3c6cc7ea94 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -363,7 +363,7 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
 	if (model->ret_size > 0)
 		flags |= BPF_TRAMP_F_RET_FENTRY_RET;
 
-	size = arch_bpf_trampoline_size(model, flags, tlinks, NULL);
+	size = arch_bpf_trampoline_size(NULL, model, flags, tlinks, NULL);
 	if (size < 0)
 		return size;
 	if (size > (unsigned long)image_end - (unsigned long)image)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index d382f5ebe06c..25621d97f3ca 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -349,20 +349,15 @@ static void bpf_tramp_image_put(struct bpf_tramp_image *im)
 	call_rcu_tasks_trace(&im->rcu, __bpf_tramp_image_put_rcu_tasks);
 }
 
-static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, int size)
+static int bpf_tramp_image_alloc(struct bpf_tramp_image *im, u64 key, int size)
 {
-	struct bpf_tramp_image *im;
 	struct bpf_ksym *ksym;
 	void *image;
-	int err = -ENOMEM;
-
-	im = kzalloc(sizeof(*im), GFP_KERNEL);
-	if (!im)
-		goto out;
+	int err;
 
 	err = bpf_jit_charge_modmem(size);
 	if (err)
-		goto out_free_im;
+		goto out;
 	im->size = size;
 
 	err = -ENOMEM;
@@ -378,16 +373,14 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, int size)
 	INIT_LIST_HEAD_RCU(&ksym->lnode);
 	snprintf(ksym->name, KSYM_NAME_LEN, "bpf_trampoline_%llu", key);
 	bpf_image_ksym_add(image, size, ksym);
-	return im;
+	return 0;
 
 out_free_image:
 	arch_free_bpf_trampoline(im->image, im->size);
 out_uncharge:
 	bpf_jit_uncharge_modmem(size);
-out_free_im:
-	kfree(im);
 out:
-	return ERR_PTR(err);
+	return err;
 }
 
 static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
@@ -432,23 +425,27 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 		tr->flags |= BPF_TRAMP_F_ORIG_STACK;
 #endif
 
-	size = arch_bpf_trampoline_size(&tr->func.model, tr->flags,
+	im = kzalloc(sizeof(*im), GFP_KERNEL);
+	if (!im) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	size = arch_bpf_trampoline_size(im, &tr->func.model, tr->flags,
 					tlinks, tr->func.addr);
 	if (size < 0) {
 		err = size;
-		goto out;
+		goto out_free_im;
 	}
 
 	if (size > PAGE_SIZE) {
 		err = -E2BIG;
-		goto out;
+		goto out_free_im;
 	}
 
-	im = bpf_tramp_image_alloc(tr->key, size);
-	if (IS_ERR(im)) {
-		err = PTR_ERR(im);
-		goto out;
-	}
+	err = bpf_tramp_image_alloc(im, tr->key, size);
+	if (err < 0)
+		goto out_free_im;
 
 	err = arch_prepare_bpf_trampoline(im, im->image, im->image + size,
 					  &tr->func.model, tr->flags, tlinks,
@@ -496,6 +493,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 
 out_free:
 	bpf_tramp_image_free(im);
+out_free_im:
+	kfree_rcu(im, rcu);
 	goto out;
 }
 
@@ -1085,8 +1084,8 @@ void __weak arch_unprotect_bpf_trampoline(void *image, unsigned int size)
 	set_memory_rw((long)image, 1);
 }
 
-int __weak arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
-				    struct bpf_tramp_links *tlinks, void *func_addr)
+int __weak arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_model *m,
+				    u32 flags, struct bpf_tramp_links *tlinks, void *func_addr)
 {
 	return -ENOTSUPP;
 }
-- 
2.34.1



* [PATCH bpf-next 3/3] riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline
From: Pu Lehui @ 2024-01-23 10:32 UTC
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Song Liu, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

We use bpf_prog_pack to aggregate bpf programs into huge pages to
relieve iTLB pressure on the system. We can apply it to the bpf
trampoline as well, since Song has already implemented it in the core
and on x86 [0]. This patch applies bpf_prog_pack to the RV64 bpf
trampoline. Since Song and Puranjay have already done a lot of the
bpf_prog_pack groundwork on RV64, implementing this feature is
straightforward. One thing worth mentioning is that emit_call on RV64
generates the maximum number of instructions during the dry run, while
during real patching it may be optimized down to a single instruction
when the call target is close enough. This is not a problem, as the
real image can only shrink and therefore never overflows the allocated
RO image.

Link: https://lore.kernel.org/all/20231206224054.492250-1-song@kernel.org [0]
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 59 ++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 5c4e0ac389d0..903f724cd785 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -961,7 +961,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 			goto out;
 		emit_sd(RV_REG_FP, -retval_off, RV_REG_A0, ctx);
 		emit_sd(RV_REG_FP, -(retval_off - 8), regmap[BPF_REG_0], ctx);
-		im->ip_after_call = ctx->insns + ctx->ninsns;
+		im->ip_after_call = ctx->ro_insns + ctx->ninsns;
 		/* 2 nops reserved for auipc+jalr pair */
 		emit(rv_nop(), ctx);
 		emit(rv_nop(), ctx);
@@ -982,7 +982,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	}
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
-		im->ip_epilogue = ctx->insns + ctx->ninsns;
+		im->ip_epilogue = ctx->ro_insns + ctx->ninsns;
 		emit_imm(RV_REG_A0, (const s64)im, ctx);
 		ret = emit_call((const u64)__bpf_tramp_exit, true, ctx);
 		if (ret)
@@ -1044,31 +1044,60 @@ int arch_bpf_trampoline_size(struct bpf_tramp_image *im, const struct btf_func_m
 	return ret < 0 ? ret : ninsns_rvoff(ctx.ninsns);
 }
 
-int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
-				void *image_end, const struct btf_func_model *m,
+void *arch_alloc_bpf_trampoline(unsigned int size)
+{
+	return bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
+}
+
+void arch_free_bpf_trampoline(void *image, unsigned int size)
+{
+	bpf_prog_pack_free(image, size);
+}
+
+void arch_protect_bpf_trampoline(void *image, unsigned int size)
+{
+}
+
+void arch_unprotect_bpf_trampoline(void *image, unsigned int size)
+{
+}
+
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
+				void *ro_image_end, const struct btf_func_model *m,
 				u32 flags, struct bpf_tramp_links *tlinks,
 				void *func_addr)
 {
 	int ret;
+	void *image, *tmp;
 	struct rv_jit_context ctx;
+	u32 size = ro_image_end - ro_image;
+
+	image = kvmalloc(size, GFP_KERNEL);
+	if (!image)
+		return -ENOMEM;
 
 	ctx.ninsns = 0;
-	/*
-	 * The bpf_int_jit_compile() uses a RW buffer (ctx.insns) to write the
-	 * JITed instructions and later copies it to a RX region (ctx.ro_insns).
-	 * It also uses ctx.ro_insns to calculate offsets for jumps etc. As the
-	 * trampoline image uses the same memory area for writing and execution,
-	 * both ctx.insns and ctx.ro_insns can be set to image.
-	 */
 	ctx.insns = image;
-	ctx.ro_insns = image;
+	ctx.ro_insns = ro_image;
 	ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx);
 	if (ret < 0)
-		return ret;
+		goto out;
 
-	bpf_flush_icache(ctx.insns, ctx.insns + ctx.ninsns);
+	if (WARN_ON(size < ninsns_rvoff(ctx.ninsns))) {
+		ret = -E2BIG;
+		goto out;
+	}
 
-	return ninsns_rvoff(ret);
+	tmp = bpf_arch_text_copy(ro_image, image, size);
+	if (IS_ERR(tmp)) {
+		ret = PTR_ERR(tmp);
+		goto out;
+	}
+
+	bpf_flush_icache(ro_image, ro_image + size);
+out:
+	kvfree(image);
+	return ret < 0 ? ret : size;
 }
 
 int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
-- 
2.34.1



* Re: [PATCH bpf-next 0/3] Use bpf_prog_pack for RV64 bpf trampoline
From: Björn Töpel @ 2024-01-28 17:30 UTC
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Song Liu, Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> We use bpf_prog_pack to aggregate bpf programs into huge pages to
> relieve iTLB pressure on the system. We can apply it to the bpf
> trampoline as well, since Song has already implemented it in the core
> and on x86 [0]. This series applies bpf_prog_pack to the RV64 bpf
> trampoline. Since Song and Puranjay have already done a lot of the
> bpf_prog_pack groundwork on RV64, implementing this feature is
> straightforward. One thing worth mentioning is that emit_call on RV64
> generates the maximum number of instructions during the dry run, while
> during real patching it may be optimized down to a single instruction
> when the call target is close enough. This is not a problem, as the
> real image can only shrink and therefore never overflows the allocated
> RO image.
>
> Tests of both the regular trampoline and the struct_ops trampoline
> have passed, and "test_verifier" runs with no failure cases.
>
> Link: https://lore.kernel.org/all/20231206224054.492250-1-song@kernel.org [0]

Tested-by: Björn Töpel <bjorn@rivosinc.com> #riscv


* Re: [PATCH bpf-next 1/3] bpf: Use precise image size for struct_ops trampoline
From: Song Liu @ 2024-01-29 17:21 UTC
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui

On Tue, Jan 23, 2024 at 2:32 AM Pu Lehui <pulehui@huaweicloud.com> wrote:
>
> From: Pu Lehui <pulehui@huawei.com>
>
> For a trampoline using bpf_prog_pack, we need to allocate a rw_image
> buffer of size (image_end - image). For the regular trampoline, we use
> the precise image size returned by arch_bpf_trampoline_size to
> allocate the rw_image. But for the struct_ops trampoline, we currently
> allocate the rw_image with a size close to PAGE_SIZE. There is no need
> to allocate that much, as the patched code is usually much smaller
> than PAGE_SIZE. Let's use the precise image size there too.
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

Acked-by: Song Liu <song@kernel.org>


* Re: [PATCH bpf-next 2/3] bpf: Keep im address consistent between dry run and real patching
From: Song Liu @ 2024-01-29 17:58 UTC
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui

On Tue, Jan 23, 2024 at 2:32 AM Pu Lehui <pulehui@huaweicloud.com> wrote:
>
> From: Pu Lehui <pulehui@huawei.com>
>
> In __arch_prepare_bpf_trampoline, we emit instructions to load the
> address of im into a register and then pass it to the
> __bpf_tramp_enter and __bpf_tramp_exit functions. Currently we use a
> fake im in arch_bpf_trampoline_size for the dry run, and then allocate
> a new im for the real patching. This is fine for architectures that
> emit a fixed-length instruction sequence to generate an address.
> However, for architectures where the length of that sequence depends
> on the address value, the dry-run and real images can end up with
> different sizes, leading to a patching overflow. Let's move the im
> allocation ahead of the dry run and pass the allocated im to
> arch_bpf_trampoline_size, so that im is identical in the dry run and
> the real patching.

IIUC, this is required because emit_imm() for riscv may generate a
variable number of instructions (depending on the value of im). I
wonder whether we can fix this by simply setting a special value for
the fake im in arch_bpf_trampoline_size(), so that emit_imm() always
emits its longest sequence for the fake im.
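
Something along these lines on the RV64 side, perhaps (an untested
fragment against the code in this series; the sentinel value is an
assumption and would need to be checked against emit_imm()'s actual
worst case):

	/* in __arch_prepare_bpf_trampoline(), where &im is loaded */
	s64 im_addr = ctx->insns ? (s64)im : (s64)0xf7f7f7f7f7f7f7f7;

	/*
	 * Dry run (ctx->insns == NULL): load a sentinel address with
	 * no convenient zero bits, so emit_imm() produces its longest
	 * sequence and the size estimate is an upper bound.
	 */
	emit_imm(RV_REG_A0, im_addr, ctx);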

>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
[...]
>
>  static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
> @@ -432,23 +425,27 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>                 tr->flags |= BPF_TRAMP_F_ORIG_STACK;
>  #endif
>
> -       size = arch_bpf_trampoline_size(&tr->func.model, tr->flags,
> +       im = kzalloc(sizeof(*im), GFP_KERNEL);
> +       if (!im) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       size = arch_bpf_trampoline_size(im, &tr->func.model, tr->flags,
>                                         tlinks, tr->func.addr);
>         if (size < 0) {
>                 err = size;
> -               goto out;
> +               goto out_free_im;
>         }
>
>         if (size > PAGE_SIZE) {
>                 err = -E2BIG;
> -               goto out;
> +               goto out_free_im;
>         }
>
> -       im = bpf_tramp_image_alloc(tr->key, size);
> -       if (IS_ERR(im)) {
> -               err = PTR_ERR(im);
> -               goto out;
> -       }
> +       err = bpf_tramp_image_alloc(im, tr->key, size);
> +       if (err < 0)
> +               goto out_free_im;

I feel this change just makes bpf_trampoline_update() even
more confusing.

>
>         err = arch_prepare_bpf_trampoline(im, im->image, im->image + size,
>                                           &tr->func.model, tr->flags, tlinks,
> @@ -496,6 +493,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>
>  out_free:
>         bpf_tramp_image_free(im);
> +out_free_im:
> +       kfree_rcu(im, rcu);

If we goto out_free above, we will call kfree_rcu(im, rcu)
twice, right? Once in bpf_tramp_image_free(), and again
here.
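
Spelled out, the error path after this patch falls straight through:

out_free:
	bpf_tramp_image_free(im);	/* this version still kfree_rcu()s im here... */
out_free_im:
	kfree_rcu(im, rcu);		/* ...and the fall-through frees it again */
	goto out;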

Thanks,
Song

[...]


* Re: [PATCH bpf-next 3/3] riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline
From: Song Liu @ 2024-01-29 21:58 UTC
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui

On Tue, Jan 23, 2024 at 2:32 AM Pu Lehui <pulehui@huaweicloud.com> wrote:
>
> From: Pu Lehui <pulehui@huawei.com>
>
> We use bpf_prog_pack to aggregate bpf programs into huge pages to
> relieve iTLB pressure on the system. We can apply it to the bpf
> trampoline as well, since Song has already implemented it in the core
> and on x86 [0]. This patch applies bpf_prog_pack to the RV64 bpf
> trampoline. Since Song and Puranjay have already done a lot of the
> bpf_prog_pack groundwork on RV64, implementing this feature is
> straightforward. One thing worth mentioning is that emit_call on RV64
> generates the maximum number of instructions during the dry run, while
> during real patching it may be optimized down to a single instruction
> when the call target is close enough. This is not a problem, as the
> real image can only shrink and therefore never overflows the allocated
> RO image.
>
> Link: https://lore.kernel.org/all/20231206224054.492250-1-song@kernel.org [0]
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

Acked-by: Song Liu <song@kernel.org>


* Re: [PATCH bpf-next 2/3] bpf: Keep im address consistent between dry run and real patching
From: Pu Lehui @ 2024-01-30  3:19 UTC
  To: Song Liu, Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel, Puranjay Mohan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson



On 2024/1/30 1:58, Song Liu wrote:
> On Tue, Jan 23, 2024 at 2:32 AM Pu Lehui <pulehui@huaweicloud.com> wrote:
>>
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In __arch_prepare_bpf_trampoline, we emit instructions to load the
>> address of im into a register and then pass it to the
>> __bpf_tramp_enter and __bpf_tramp_exit functions. Currently we use a
>> fake im in arch_bpf_trampoline_size for the dry run, and then allocate
>> a new im for the real patching. This is fine for architectures that
>> emit a fixed-length instruction sequence to generate an address.
>> However, for architectures where the length of that sequence depends
>> on the address value, the dry-run and real images can end up with
>> different sizes, leading to a patching overflow. Let's move the im
>> allocation ahead of the dry run and pass the allocated im to
>> arch_bpf_trampoline_size, so that im is identical in the dry run and
>> the real patching.
> 
> IIUC, this is required because emit_imm() for riscv may generate a
> variable number of instructions (depending on the value of im). I
> wonder whether we can fix this by simply setting a special value for
> the fake im in arch_bpf_trampoline_size(), so that emit_imm() always
> emits its longest sequence for the fake im.
> 

Hi Song,

Thanks for your review. Yes, I had the same idea at first: emit the
largest number of instructions when ctx->insns is NULL. But that may
waste memory, so I tried moving the im allocation out instead, to get
a fixed im address; maybe other architectures need that too. If you
feel it is inappropriate, I will withdraw it.

>>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
> [...]
>>
>>   static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
>> @@ -432,23 +425,27 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>>                  tr->flags |= BPF_TRAMP_F_ORIG_STACK;
>>   #endif
>>
>> -       size = arch_bpf_trampoline_size(&tr->func.model, tr->flags,
>> +       im = kzalloc(sizeof(*im), GFP_KERNEL);
>> +       if (!im) {
>> +               err = -ENOMEM;
>> +               goto out;
>> +       }
>> +
>> +       size = arch_bpf_trampoline_size(im, &tr->func.model, tr->flags,
>>                                          tlinks, tr->func.addr);
>>          if (size < 0) {
>>                  err = size;
>> -               goto out;
>> +               goto out_free_im;
>>          }
>>
>>          if (size > PAGE_SIZE) {
>>                  err = -E2BIG;
>> -               goto out;
>> +               goto out_free_im;
>>          }
>>
>> -       im = bpf_tramp_image_alloc(tr->key, size);
>> -       if (IS_ERR(im)) {
>> -               err = PTR_ERR(im);
>> -               goto out;
>> -       }
>> +       err = bpf_tramp_image_alloc(im, tr->key, size);
>> +       if (err < 0)
>> +               goto out_free_im;
> 
> I feel this change just makes bpf_trampoline_update() even
> more confusing.
> 
>>
>>          err = arch_prepare_bpf_trampoline(im, im->image, im->image + size,
>>                                            &tr->func.model, tr->flags, tlinks,
>> @@ -496,6 +493,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
>>
>>   out_free:
>>          bpf_tramp_image_free(im);
>> +out_free_im:
>> +       kfree_rcu(im, rcu);
> 
> If we goto out_free above, we will call kfree_rcu(im, rcu)
> twice, right? Once in bpf_tramp_image_free(), and again
> here.
> 

Oops, sorry, I forgot to remove the kfree_rcu in bpf_tramp_image_free
in this version.

> Thanks,
> Song
> 
> [...]
> 

