* [PATCH bpf-next v2 0/2] riscv, bpf: Support per-CPU insn and inline bpf_get_smp_processor_id()
@ 2024-04-30 17:58 Puranjay Mohan
  2024-04-30 17:58 ` [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
  2024-04-30 17:58 ` [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
  0 siblings, 2 replies; 11+ messages in thread
From: Puranjay Mohan @ 2024-04-30 17:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf,
	linux-riscv, linux-kernel, Pu Lehui
  Cc: puranjay12

Changes in v1->v2:
v1: https://lore.kernel.org/all/20240405124348.27644-1-puranjay12@gmail.com/
- Use emit_addr() in place of emit_imm() in the first patch.
- Add the second patch to inline bpf_get_smp_processor_id()

The first patch adds support for the internal-only MOV instruction that
resolves per-CPU addresses to absolute addresses. RISC-V uses the
generic per-CPU implementation, where the per-CPU offsets are kept in an
array called __per_cpu_offset[cpu_number]. RISC-V stores the address of
the current task_struct in the TP register. The first member of
task_struct is struct thread_info, so the CPU number can be read from
TP + offsetof(struct thread_info, cpu).

The second patch inlines calls to bpf_get_smp_processor_id() in the JIT
by emitting a load from TP + offsetof(struct thread_info, cpu).

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
| Name          | Before           | After            | % change     |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Puranjay Mohan (2):
  riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  riscv, bpf: inline bpf_get_smp_processor_id()

 arch/riscv/net/bpf_jit_comp64.c | 50 +++++++++++++++++++++++++++++++++
 include/linux/filter.h          |  1 +
 kernel/bpf/core.c               | 11 ++++++++
 kernel/bpf/verifier.c           |  2 ++
 4 files changed, 64 insertions(+)

-- 
2.40.1

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
  2024-04-30 17:58 [PATCH bpf-next v2 0/2] riscv, bpf: Support per-CPU insn and inline bpf_get_smp_processor_id() Puranjay Mohan
@ 2024-04-30 17:58 ` Puranjay Mohan
  2024-05-01 16:39   ` Andrii Nakryiko
  2024-04-30 17:58 ` [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
  1 sibling, 1 reply; 11+ messages in thread
From: Puranjay Mohan @ 2024-04-30 17:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf,
	linux-riscv, linux-kernel, Pu Lehui
  Cc: puranjay12

Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only:
users are not allowed to use it directly. For now it is only used for
internal inlining optimizations between the BPF verifier and BPF JITs.

RISC-V uses the generic per-CPU implementation, where the per-CPU
offsets are kept in an array called __per_cpu_offset[cpu_number].
RISC-V stores the address of the current task_struct in the TP
register. The first member of task_struct is struct thread_info, so
the CPU number can be read from TP + offsetof(struct thread_info, cpu).

Once we have the CPU number in a register, we read the offset for that
CPU from the address &__per_cpu_offset + (cpu_number << 3) and add it
to the destination register.
To measure the improvement from this change, the benchmark in [1] was used on Qemu: Before: glob-arr-inc : 1.127 ± 0.013M/s arr-inc : 1.121 ± 0.004M/s hash-inc : 0.681 ± 0.052M/s After: glob-arr-inc : 1.138 ± 0.011M/s arr-inc : 1.366 ± 0.006M/s hash-inc : 0.676 ± 0.001M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> --- arch/riscv/net/bpf_jit_comp64.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 15e482f2c657..99d7006f1420 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -12,6 +12,7 @@ #include <linux/stop_machine.h> #include <asm/patch.h> #include <asm/cfi.h> +#include <asm/percpu.h> #include "bpf_jit.h" #define RV_FENTRY_NINSNS 2 @@ -1089,6 +1090,24 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, emit_or(RV_REG_T1, rd, RV_REG_T1, ctx); emit_mv(rd, RV_REG_T1, ctx); break; + } else if (insn_is_mov_percpu_addr(insn)) { + if (rd != rs) + emit_mv(rd, rs, ctx); +#ifdef CONFIG_SMP + /* Load current CPU number in T1 */ + emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu), + RV_REG_TP, ctx); + /* << 3 because offsets are 8 bytes */ + emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx); + /* Load address of __per_cpu_offset array in T2 */ + emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx); + /* Add offset of current CPU to __per_cpu_offset */ + emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx); + /* Load __per_cpu_offset[cpu] in T1 */ + emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx); + /* Add the offset to Rd */ + emit_add(rd, rd, RV_REG_T1, ctx); +#endif } if (imm == 1) { /* Special mov32 for zext */ @@ -2038,3 +2057,8 @@ bool bpf_jit_supports_arena(void) { return true; } + +bool bpf_jit_supports_percpu_insn(void) +{ + return true; +} -- 2.40.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs 2024-04-30 17:58 ` [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan @ 2024-05-01 16:39 ` Andrii Nakryiko 2024-05-02 16:18 ` Björn Töpel 0 siblings, 1 reply; 11+ messages in thread From: Andrii Nakryiko @ 2024-05-01 16:39 UTC (permalink / raw) To: Puranjay Mohan Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf, linux-riscv, linux-kernel, Pu Lehui, puranjay12 On Tue, Apr 30, 2024 at 10:58 AM Puranjay Mohan <puranjay@kernel.org> wrote: > > Support an instruction for resolving absolute addresses of per-CPU > data from their per-CPU offsets. This instruction is internal-only and > users are not allowed to use them directly. They will only be used for > internal inlining optimizations for now between BPF verifier and BPF > JITs. > > RISC-V uses generic per-cpu implementation where the offsets for CPUs > are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores > the address of the task_struct in TP register. The first element in > task_struct is struct thread_info, and we can get the cpu number by > reading from the TP register + offsetof(struct thread_info, cpu). > > Once we have the cpu number in a register we read the offset for that > cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this > offset to the destination register. 
> > To measure the improvement from this change, the benchmark in [1] was > used on Qemu: > > Before: > glob-arr-inc : 1.127 ± 0.013M/s > arr-inc : 1.121 ± 0.004M/s > hash-inc : 0.681 ± 0.052M/s > > After: > glob-arr-inc : 1.138 ± 0.011M/s > arr-inc : 1.366 ± 0.006M/s > hash-inc : 0.676 ± 0.001M/s > > [1] https://github.com/anakryiko/linux/commit/8dec900975ef > > Signed-off-by: Puranjay Mohan <puranjay@kernel.org> > --- > arch/riscv/net/bpf_jit_comp64.c | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c > index 15e482f2c657..99d7006f1420 100644 > --- a/arch/riscv/net/bpf_jit_comp64.c > +++ b/arch/riscv/net/bpf_jit_comp64.c > @@ -12,6 +12,7 @@ > #include <linux/stop_machine.h> > #include <asm/patch.h> > #include <asm/cfi.h> > +#include <asm/percpu.h> > #include "bpf_jit.h" > > #define RV_FENTRY_NINSNS 2 > @@ -1089,6 +1090,24 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, > emit_or(RV_REG_T1, rd, RV_REG_T1, ctx); > emit_mv(rd, RV_REG_T1, ctx); > break; > + } else if (insn_is_mov_percpu_addr(insn)) { > + if (rd != rs) > + emit_mv(rd, rs, ctx); > +#ifdef CONFIG_SMP > + /* Load current CPU number in T1 */ > + emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu), > + RV_REG_TP, ctx); > + /* << 3 because offsets are 8 bytes */ > + emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx); > + /* Load address of __per_cpu_offset array in T2 */ > + emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx); > + /* Add offset of current CPU to __per_cpu_offset */ > + emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx); > + /* Load __per_cpu_offset[cpu] in T1 */ > + emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx); > + /* Add the offset to Rd */ > + emit_add(rd, rd, RV_REG_T1, ctx); is this the right level of code indentation? 
> +#endif > } > if (imm == 1) { > /* Special mov32 for zext */ > @@ -2038,3 +2057,8 @@ bool bpf_jit_supports_arena(void) > { > return true; > } > + > +bool bpf_jit_supports_percpu_insn(void) > +{ > + return true; > +} > -- > 2.40.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs 2024-05-01 16:39 ` Andrii Nakryiko @ 2024-05-02 16:18 ` Björn Töpel 2024-05-02 16:20 ` Puranjay Mohan 0 siblings, 1 reply; 11+ messages in thread From: Björn Töpel @ 2024-05-02 16:18 UTC (permalink / raw) To: Andrii Nakryiko, Puranjay Mohan Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf, linux-riscv, linux-kernel, Pu Lehui, puranjay12 Andrii Nakryiko <andrii.nakryiko@gmail.com> writes: > On Tue, Apr 30, 2024 at 10:58 AM Puranjay Mohan <puranjay@kernel.org> wrote: >> >> Support an instruction for resolving absolute addresses of per-CPU >> data from their per-CPU offsets. This instruction is internal-only and >> users are not allowed to use them directly. They will only be used for >> internal inlining optimizations for now between BPF verifier and BPF >> JITs. >> >> RISC-V uses generic per-cpu implementation where the offsets for CPUs >> are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores >> the address of the task_struct in TP register. The first element in >> task_struct is struct thread_info, and we can get the cpu number by >> reading from the TP register + offsetof(struct thread_info, cpu). >> >> Once we have the cpu number in a register we read the offset for that >> cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this >> offset to the destination register. 
>> >> To measure the improvement from this change, the benchmark in [1] was >> used on Qemu: >> >> Before: >> glob-arr-inc : 1.127 ± 0.013M/s >> arr-inc : 1.121 ± 0.004M/s >> hash-inc : 0.681 ± 0.052M/s >> >> After: >> glob-arr-inc : 1.138 ± 0.011M/s >> arr-inc : 1.366 ± 0.006M/s >> hash-inc : 0.676 ± 0.001M/s >> >> [1] https://github.com/anakryiko/linux/commit/8dec900975ef >> >> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> >> --- >> arch/riscv/net/bpf_jit_comp64.c | 24 ++++++++++++++++++++++++ >> 1 file changed, 24 insertions(+) >> >> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c >> index 15e482f2c657..99d7006f1420 100644 >> --- a/arch/riscv/net/bpf_jit_comp64.c >> +++ b/arch/riscv/net/bpf_jit_comp64.c >> @@ -12,6 +12,7 @@ >> #include <linux/stop_machine.h> >> #include <asm/patch.h> >> #include <asm/cfi.h> >> +#include <asm/percpu.h> >> #include "bpf_jit.h" >> >> #define RV_FENTRY_NINSNS 2 >> @@ -1089,6 +1090,24 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, >> emit_or(RV_REG_T1, rd, RV_REG_T1, ctx); >> emit_mv(rd, RV_REG_T1, ctx); >> break; >> + } else if (insn_is_mov_percpu_addr(insn)) { >> + if (rd != rs) >> + emit_mv(rd, rs, ctx); No biggie, but you did not fold this check into emit_mv(). >> +#ifdef CONFIG_SMP >> + /* Load current CPU number in T1 */ >> + emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu), >> + RV_REG_TP, ctx); >> + /* << 3 because offsets are 8 bytes */ >> + emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx); >> + /* Load address of __per_cpu_offset array in T2 */ >> + emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx); >> + /* Add offset of current CPU to __per_cpu_offset */ >> + emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx); >> + /* Load __per_cpu_offset[cpu] in T1 */ >> + emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx); >> + /* Add the offset to Rd */ >> + emit_add(rd, rd, RV_REG_T1, ctx); > > is this the right level of code indentation? Looks wrong. 
When the indent is fixed, feel free to add: Acked-by: Björn Töpel <bjorn@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs 2024-05-02 16:18 ` Björn Töpel @ 2024-05-02 16:20 ` Puranjay Mohan 0 siblings, 0 replies; 11+ messages in thread From: Puranjay Mohan @ 2024-05-02 16:20 UTC (permalink / raw) To: Björn Töpel Cc: Andrii Nakryiko, Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf, linux-riscv, linux-kernel, Pu Lehui On Thu, May 2, 2024 at 6:18 PM Björn Töpel <bjorn@kernel.org> wrote: > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes: > > > On Tue, Apr 30, 2024 at 10:58 AM Puranjay Mohan <puranjay@kernel.org> wrote: > >> > >> Support an instruction for resolving absolute addresses of per-CPU > >> data from their per-CPU offsets. This instruction is internal-only and > >> users are not allowed to use them directly. They will only be used for > >> internal inlining optimizations for now between BPF verifier and BPF > >> JITs. > >> > >> RISC-V uses generic per-cpu implementation where the offsets for CPUs > >> are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores > >> the address of the task_struct in TP register. The first element in > >> task_struct is struct thread_info, and we can get the cpu number by > >> reading from the TP register + offsetof(struct thread_info, cpu). > >> > >> Once we have the cpu number in a register we read the offset for that > >> cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this > >> offset to the destination register. 
> >> > >> To measure the improvement from this change, the benchmark in [1] was > >> used on Qemu: > >> > >> Before: > >> glob-arr-inc : 1.127 ± 0.013M/s > >> arr-inc : 1.121 ± 0.004M/s > >> hash-inc : 0.681 ± 0.052M/s > >> > >> After: > >> glob-arr-inc : 1.138 ± 0.011M/s > >> arr-inc : 1.366 ± 0.006M/s > >> hash-inc : 0.676 ± 0.001M/s > >> > >> [1] https://github.com/anakryiko/linux/commit/8dec900975ef > >> > >> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> > >> --- > >> arch/riscv/net/bpf_jit_comp64.c | 24 ++++++++++++++++++++++++ > >> 1 file changed, 24 insertions(+) > >> > >> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c > >> index 15e482f2c657..99d7006f1420 100644 > >> --- a/arch/riscv/net/bpf_jit_comp64.c > >> +++ b/arch/riscv/net/bpf_jit_comp64.c > >> @@ -12,6 +12,7 @@ > >> #include <linux/stop_machine.h> > >> #include <asm/patch.h> > >> #include <asm/cfi.h> > >> +#include <asm/percpu.h> > >> #include "bpf_jit.h" > >> > >> #define RV_FENTRY_NINSNS 2 > >> @@ -1089,6 +1090,24 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, > >> emit_or(RV_REG_T1, rd, RV_REG_T1, ctx); > >> emit_mv(rd, RV_REG_T1, ctx); > >> break; > >> + } else if (insn_is_mov_percpu_addr(insn)) { > >> + if (rd != rs) > >> + emit_mv(rd, rs, ctx); > > No biggie, but you did not fold this check into emit_mv(). 
> > >> +#ifdef CONFIG_SMP > >> + /* Load current CPU number in T1 */ > >> + emit_ld(RV_REG_T1, offsetof(struct thread_info, cpu), > >> + RV_REG_TP, ctx); > >> + /* << 3 because offsets are 8 bytes */ > >> + emit_slli(RV_REG_T1, RV_REG_T1, 3, ctx); > >> + /* Load address of __per_cpu_offset array in T2 */ > >> + emit_addr(RV_REG_T2, (u64)&__per_cpu_offset, extra_pass, ctx); > >> + /* Add offset of current CPU to __per_cpu_offset */ > >> + emit_add(RV_REG_T1, RV_REG_T2, RV_REG_T1, ctx); > >> + /* Load __per_cpu_offset[cpu] in T1 */ > >> + emit_ld(RV_REG_T1, 0, RV_REG_T1, ctx); > >> + /* Add the offset to Rd */ > >> + emit_add(rd, rd, RV_REG_T1, ctx); > > > > is this the right level of code indentation? > > Looks wrong. > > When the indent is fixed, feel free to add: > > Acked-by: Björn Töpel <bjorn@kernel.org> I fixed the indent and sent it as part of: https://lore.kernel.org/all/20240502151854.9810-1-puranjay@kernel.org/ Also, for the emit_mv() thing, I wanted to verify if we are using that somewhere for zero-extension or something. So, I thought I would send a separate patch for it. Thanks, Puranjay ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id()
  2024-04-30 17:58 [PATCH bpf-next v2 0/2] riscv, bpf: Support per-CPU insn and inline bpf_get_smp_processor_id() Puranjay Mohan
  2024-04-30 17:58 ` [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
@ 2024-04-30 17:58 ` Puranjay Mohan
  2024-04-30 19:18   ` Kumar Kartikeya Dwivedi
  ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Puranjay Mohan @ 2024-04-30 17:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf,
	linux-riscv, linux-kernel, Pu Lehui
  Cc: puranjay12

Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.

RISC-V saves the pointer to the current CPU's task_struct in the TP
(thread pointer) register. This makes it trivial to get the CPU's
processor id. As thread_info is the first member of task_struct, we can
read the processor id from TP + offsetof(struct thread_info, cpu).

RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================

Before                      After
--------                    -------

auipc   t1,0x848c           ld    a5,32(tp)
jalr    604(t1)
mv      a5,a0

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
| Name          | Before           | After            | % change     |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+

NOTE: This benchmark includes changes from this patch and the previous
patch that implemented the per-cpu insn.
[1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> --- arch/riscv/net/bpf_jit_comp64.c | 26 ++++++++++++++++++++++++++ include/linux/filter.h | 1 + kernel/bpf/core.c | 11 +++++++++++ kernel/bpf/verifier.c | 2 ++ 4 files changed, 40 insertions(+) diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 99d7006f1420..5789b7afae47 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -1493,6 +1493,22 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, bool fixed_addr; u64 addr; + /* Inline calls to bpf_get_smp_processor_id() + * + * RV_REG_TP holds the address of the current CPU's task_struct and thread_info is + * at offset 0 in task_struct. + * Load cpu from thread_info: + * Set R0 to ((struct thread_info *)(RV_REG_TP))->cpu + * + * This replicates the implementation of raw_smp_processor_id() on RISCV + */ + if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) { + /* Load current CPU number in R0 */ + emit_ld(bpf_to_rv_reg(BPF_REG_0, ctx), offsetof(struct thread_info, cpu), + RV_REG_TP, ctx); + break; + } + mark_call(ctx); ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, &addr, &fixed_addr); @@ -2062,3 +2078,13 @@ bool bpf_jit_supports_percpu_insn(void) { return true; } + +bool bpf_jit_inlines_helper_call(s32 imm) +{ + switch (imm) { + case BPF_FUNC_get_smp_processor_id: + return true; + } + + return false; +} diff --git a/include/linux/filter.h b/include/linux/filter.h index 7a27f19bf44d..3e19bb62ed1a 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -993,6 +993,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog); void bpf_jit_compile(struct bpf_prog *prog); bool bpf_jit_needs_zext(void); +bool bpf_jit_inlines_helper_call(s32 imm); bool bpf_jit_supports_subprog_tailcalls(void); bool 
bpf_jit_supports_percpu_insn(void); bool bpf_jit_supports_kfunc_call(void); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 99b8b1c9a248..aa59af9f9bd9 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2941,6 +2941,17 @@ bool __weak bpf_jit_needs_zext(void) return false; } +/* Return true if the JIT inlines the call to the helper corresponding to + * the imm. + * + * The verifier will not patch the insn->imm for the call to the helper if + * this returns true. + */ +bool __weak bpf_jit_inlines_helper_call(s32 imm) +{ + return false; +} + /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */ bool __weak bpf_jit_supports_subprog_tailcalls(void) { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5d42db05315e..e78f766d7f91 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -20013,6 +20013,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env) goto next_insn; } + if (bpf_jit_inlines_helper_call(insn->imm)) + goto next_insn; if (insn->imm == BPF_FUNC_get_route_realm) prog->dst_needed = 1; if (insn->imm == BPF_FUNC_get_prandom_u32) -- 2.40.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() 2024-04-30 17:58 ` [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan @ 2024-04-30 19:18 ` Kumar Kartikeya Dwivedi 2024-05-01 16:46 ` Andrii Nakryiko 2024-05-02 16:19 ` Björn Töpel 2 siblings, 0 replies; 11+ messages in thread From: Kumar Kartikeya Dwivedi @ 2024-04-30 19:18 UTC (permalink / raw) To: Puranjay Mohan Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf, linux-riscv, linux-kernel, Pu Lehui, puranjay12 On Tue, 30 Apr 2024 at 20:00, Puranjay Mohan <puranjay@kernel.org> wrote: > > Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit. > > RISCV saves the pointer to the CPU's task_struct in the TP (thread > pointer) register. This makes it trivial to get the CPU's processor id. > As thread_info is the first member of task_struct, we can read the > processor id from TP + offsetof(struct thread_info, cpu). > > RISCV64 JIT output for `call bpf_get_smp_processor_id` > ====================================================== > > Before After > -------- ------- > > auipc t1,0x848c ld a5,32(tp) > jalr 604(t1) > mv a5,a0 > > Benchmark using [1] on Qemu. 
> > ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc > > +---------------+------------------+------------------+--------------+ > | Name | Before | After | % change | > |---------------+------------------+------------------+--------------| > | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | > | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | > | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | > +---------------+------------------+------------------+--------------+ > > NOTE: This benchmark includes changes from this patch and the previous > patch that implemented the per-cpu insn. > > [1] https://github.com/anakryiko/linux/commit/8dec900975ef > > Signed-off-by: Puranjay Mohan <puranjay@kernel.org> > --- For non-riscv bits (& fwiw): Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() 2024-04-30 17:58 ` [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan 2024-04-30 19:18 ` Kumar Kartikeya Dwivedi @ 2024-05-01 16:46 ` Andrii Nakryiko 2024-05-02 13:16 ` Puranjay Mohan 2024-05-02 16:19 ` Björn Töpel 2 siblings, 1 reply; 11+ messages in thread From: Andrii Nakryiko @ 2024-05-01 16:46 UTC (permalink / raw) To: Puranjay Mohan Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou, bpf, linux-riscv, linux-kernel, Pu Lehui, puranjay12 On Tue, Apr 30, 2024 at 10:59 AM Puranjay Mohan <puranjay@kernel.org> wrote: > > Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit. > > RISCV saves the pointer to the CPU's task_struct in the TP (thread > pointer) register. This makes it trivial to get the CPU's processor id. > As thread_info is the first member of task_struct, we can read the > processor id from TP + offsetof(struct thread_info, cpu). > > RISCV64 JIT output for `call bpf_get_smp_processor_id` > ====================================================== > > Before After > -------- ------- > > auipc t1,0x848c ld a5,32(tp) > jalr 604(t1) > mv a5,a0 > Nice, great find! Would you be able to do similar inlining for x86-64 as well? Disassembling bpf_get_smp_processor_id for x86-64 shows this: Dump of assembler code for function bpf_get_smp_processor_id: 0xffffffff810f91a0 <+0>: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 0xffffffff810f91a5 <+5>: 65 8b 05 60 79 f3 7e mov %gs:0x7ef37960(%rip),%eax # 0x30b0c <pcpu_hot+12> 0xffffffff810f91ac <+12>: 48 98 cltq 0xffffffff810f91ae <+14>: c3 ret End of assembler dump. We should be able to do the same in x86-64 BPF JIT. 
(it's actually how I started initially, I had a dedicated instruction reading per-cpu memory, but ended up with more general "calculate per-cpu address"). Anyways, great work, a small nit below. Acked-by: Andrii Nakryiko <andrii@kernel.org> > Benchmark using [1] on Qemu. > > ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc > > +---------------+------------------+------------------+--------------+ > | Name | Before | After | % change | > |---------------+------------------+------------------+--------------| > | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | > | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | > | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | > +---------------+------------------+------------------+--------------+ > > NOTE: This benchmark includes changes from this patch and the previous > patch that implemented the per-cpu insn. > > [1] https://github.com/anakryiko/linux/commit/8dec900975ef > > Signed-off-by: Puranjay Mohan <puranjay@kernel.org> > --- > arch/riscv/net/bpf_jit_comp64.c | 26 ++++++++++++++++++++++++++ > include/linux/filter.h | 1 + > kernel/bpf/core.c | 11 +++++++++++ > kernel/bpf/verifier.c | 2 ++ > 4 files changed, 40 insertions(+) > > diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c > index 99d7006f1420..5789b7afae47 100644 > --- a/arch/riscv/net/bpf_jit_comp64.c > +++ b/arch/riscv/net/bpf_jit_comp64.c > @@ -1493,6 +1493,22 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, > bool fixed_addr; > u64 addr; > > + /* Inline calls to bpf_get_smp_processor_id() > + * > + * RV_REG_TP holds the address of the current CPU's task_struct and thread_info is > + * at offset 0 in task_struct. 
> + * Load cpu from thread_info: > + * Set R0 to ((struct thread_info *)(RV_REG_TP))->cpu > + * > + * This replicates the implementation of raw_smp_processor_id() on RISCV > + */ > + if (insn->src_reg == 0 && insn->imm == BPF_FUNC_get_smp_processor_id) { > + /* Load current CPU number in R0 */ > + emit_ld(bpf_to_rv_reg(BPF_REG_0, ctx), offsetof(struct thread_info, cpu), > + RV_REG_TP, ctx); > + break; > + } > + > mark_call(ctx); > ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, > &addr, &fixed_addr); > @@ -2062,3 +2078,13 @@ bool bpf_jit_supports_percpu_insn(void) > { > return true; > } > + > +bool bpf_jit_inlines_helper_call(s32 imm) > +{ > + switch (imm) { > + case BPF_FUNC_get_smp_processor_id: > + return true; > + } > + > + return false; nit: why not default: return false; to keep everything within the switch? > +} > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 7a27f19bf44d..3e19bb62ed1a 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -993,6 +993,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog); > void bpf_jit_compile(struct bpf_prog *prog); > bool bpf_jit_needs_zext(void); > +bool bpf_jit_inlines_helper_call(s32 imm); > bool bpf_jit_supports_subprog_tailcalls(void); > bool bpf_jit_supports_percpu_insn(void); > bool bpf_jit_supports_kfunc_call(void); > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index 99b8b1c9a248..aa59af9f9bd9 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -2941,6 +2941,17 @@ bool __weak bpf_jit_needs_zext(void) > return false; > } > > +/* Return true if the JIT inlines the call to the helper corresponding to > + * the imm. > + * > + * The verifier will not patch the insn->imm for the call to the helper if > + * this returns true. 
> + */ > +bool __weak bpf_jit_inlines_helper_call(s32 imm) > +{ > + return false; > +} > + > /* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */ > bool __weak bpf_jit_supports_subprog_tailcalls(void) > { > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 5d42db05315e..e78f766d7f91 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -20013,6 +20013,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env) > goto next_insn; > } > > + if (bpf_jit_inlines_helper_call(insn->imm)) > + goto next_insn; It's nice to be able to allow BPF JIT to do a higher-performance implementation. Let's add a short comment above to mention that this is bypassing normal inlining because BPF JIT will do it better (I know you have this description for the function definition, but a short remark here would be helpful). And please add an empty line after this check to logically separate it from the rest of helper inlining logic in verifier, thanks! pw-bot: cr > if (insn->imm == BPF_FUNC_get_route_realm) > prog->dst_needed = 1; > if (insn->imm == BPF_FUNC_get_prandom_u32) > -- > 2.40.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id()
  2024-05-01 16:46 ` Andrii Nakryiko
@ 2024-05-02 13:16   ` Puranjay Mohan
  2024-05-02 16:03     ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Puranjay Mohan @ 2024-05-02 13:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	bpf, linux-riscv, linux-kernel, Pu Lehui

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Apr 30, 2024 at 10:59 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
>>
>> RISCV saves the pointer to the CPU's task_struct in the TP (thread
>> pointer) register. This makes it trivial to get the CPU's processor id.
>> As thread_info is the first member of task_struct, we can read the
>> processor id from TP + offsetof(struct thread_info, cpu).
>>
>> RISCV64 JIT output for `call bpf_get_smp_processor_id`
>> ======================================================
>>
>> Before                    After
>> --------                  -------
>>
>> auipc   t1,0x848c         ld    a5,32(tp)
>> jalr    604(t1)
>> mv      a5,a0
>>
>
> Nice, great find! Would you be able to do similar inlining for x86-64
> as well? Disassembling bpf_get_smp_processor_id for x86-64 shows this:
>
> Dump of assembler code for function bpf_get_smp_processor_id:
>    0xffffffff810f91a0 <+0>:	0f 1f 44 00 00	nopl   0x0(%rax,%rax,1)
>    0xffffffff810f91a5 <+5>:	65 8b 05 60 79 f3 7e	mov    %gs:0x7ef37960(%rip),%eax        # 0x30b0c <pcpu_hot+12>
>    0xffffffff810f91ac <+12>:	48 98	cltq
>    0xffffffff810f91ae <+14>:	c3	ret
> End of assembler dump.
>
> We should be able to do the same in x86-64 BPF JIT. (it's actually how
> I started initially, I had a dedicated instruction reading per-cpu
> memory, but ended up with more general "calculate per-cpu address").

I feel that in x86-64's case the JIT cannot do a (much) better job than
the current approach in the verifier.

On RISC-V and ARM64, the JIT was able to do better because both of these
architectures save a pointer to the task struct in a special CPU
register. As x86-64 doesn't have enough extra registers, it uses a
percpu variable to store the task struct, thread_info, and the cpu
number.

P.S. - While doing this for BPF, I realized that the ARM64 kernel code
is also not optimal, as it uses the percpu variable rather than reading
the CPU register directly. So I sent a patch[1] to fix it in the kernel
and get rid of the per-cpu variable on ARM64.

[1] https://lore.kernel.org/all/20240502123449.2690-2-puranjay@kernel.org/

> Anyways, great work, a small nit below.
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>

Thanks,
Puranjay

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id()
  2024-05-02 13:16 ` Puranjay Mohan
@ 2024-05-02 16:03   ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2024-05-02 16:03 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	bpf, linux-riscv, linux-kernel, Pu Lehui

On Thu, May 2, 2024 at 6:16 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Apr 30, 2024 at 10:59 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> >>
> >> Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
> >>
> >> RISCV saves the pointer to the CPU's task_struct in the TP (thread
> >> pointer) register. This makes it trivial to get the CPU's processor id.
> >> As thread_info is the first member of task_struct, we can read the
> >> processor id from TP + offsetof(struct thread_info, cpu).
> >>
> >> RISCV64 JIT output for `call bpf_get_smp_processor_id`
> >> ======================================================
> >>
> >> Before                    After
> >> --------                  -------
> >>
> >> auipc   t1,0x848c         ld    a5,32(tp)
> >> jalr    604(t1)
> >> mv      a5,a0
> >>
> >
> > Nice, great find! Would you be able to do similar inlining for x86-64
> > as well? Disassembling bpf_get_smp_processor_id for x86-64 shows this:
> >
> > Dump of assembler code for function bpf_get_smp_processor_id:
> >    0xffffffff810f91a0 <+0>:	0f 1f 44 00 00	nopl   0x0(%rax,%rax,1)
> >    0xffffffff810f91a5 <+5>:	65 8b 05 60 79 f3 7e	mov    %gs:0x7ef37960(%rip),%eax        # 0x30b0c <pcpu_hot+12>
> >    0xffffffff810f91ac <+12>:	48 98	cltq
> >    0xffffffff810f91ae <+14>:	c3	ret
> > End of assembler dump.
> >
> > We should be able to do the same in x86-64 BPF JIT. (it's actually how
> > I started initially, I had a dedicated instruction reading per-cpu
> > memory, but ended up with more general "calculate per-cpu address").
>
> I feel that in x86-64's case the JIT cannot do a (much) better job than
> the current approach in the verifier.

This direct memory read (using the gs segment) ought to be a bit faster
than calculating the offset and then doing a memory dereference, but
yes, the difference won't be as big as you got with RISC-V and ARM64.
Ok, never mind, we can always benchmark and add that later, no big deal.

> On RISC-V and ARM64, the JIT was able to do better because both of these
> architectures save a pointer to the task struct in a special CPU
> register. As x86-64 doesn't have enough extra registers, it uses a
> percpu variable to store the task struct, thread_info, and the cpu
> number.
>
> P.S. - While doing this for BPF, I realized that the ARM64 kernel code
> is also not optimal, as it uses the percpu variable rather than reading
> the CPU register directly. So I sent a patch[1] to fix it in the kernel
> and get rid of the per-cpu variable on ARM64.
>
> [1] https://lore.kernel.org/all/20240502123449.2690-2-puranjay@kernel.org/
>
> > Anyways, great work, a small nit below.
> >
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
>
> Thanks,
> Puranjay

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id()
  2024-04-30 17:58 ` [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
  2024-04-30 19:18   ` Kumar Kartikeya Dwivedi
  2024-05-01 16:46   ` Andrii Nakryiko
@ 2024-05-02 16:19   ` Björn Töpel
  2 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2024-05-02 16:19 UTC (permalink / raw)
  To: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	bpf, linux-riscv, linux-kernel, Pu Lehui
  Cc: puranjay12

Puranjay Mohan <puranjay@kernel.org> writes:

> Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
>
> RISCV saves the pointer to the CPU's task_struct in the TP (thread
> pointer) register. This makes it trivial to get the CPU's processor id.
> As thread_info is the first member of task_struct, we can read the
> processor id from TP + offsetof(struct thread_info, cpu).
>
> RISCV64 JIT output for `call bpf_get_smp_processor_id`
> ======================================================
>
> Before                    After
> --------                  -------
>
> auipc   t1,0x848c         ld    a5,32(tp)
> jalr    604(t1)
> mv      a5,a0
>
> Benchmark using [1] on Qemu.
>
> ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
>
> +---------------+------------------+------------------+--------------+
> | Name          | Before           | After            | % change     |
> |---------------+------------------+------------------+--------------|
> | glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04%     |
> | arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56%     |
> | hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18%     |
> +---------------+------------------+------------------+--------------+
>
> NOTE: This benchmark includes changes from this patch and the previous
> patch that implemented the per-cpu insn.
>
> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>

Acked-by: Björn Töpel <bjorn@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-05-02 16:21 UTC | newest]

Thread overview: 11+ messages
2024-04-30 17:58 [PATCH bpf-next v2 0/2] riscv, bpf: Support per-CPU insn and inline bpf_get_smp_processor_id() Puranjay Mohan
2024-04-30 17:58 ` [PATCH bpf-next v2 1/2] riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
2024-05-01 16:39   ` Andrii Nakryiko
2024-05-02 16:18     ` Björn Töpel
2024-05-02 16:20       ` Puranjay Mohan
2024-04-30 17:58 ` [PATCH bpf-next v2 2/2] riscv, bpf: inline bpf_get_smp_processor_id() Puranjay Mohan
2024-04-30 19:18   ` Kumar Kartikeya Dwivedi
2024-05-01 16:46   ` Andrii Nakryiko
2024-05-02 13:16     ` Puranjay Mohan
2024-05-02 16:03       ` Andrii Nakryiko
2024-05-02 16:19   ` Björn Töpel