From: Puranjay Mohan <puranjay@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Shen Lim <zlim.lnx@gmail.com>,
Xu Kuohai <xukuohai@huawei.com>,
Florent Revest <revest@chromium.org>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper
Date: Fri, 26 Apr 2024 17:06:51 +0000 [thread overview]
Message-ID: <mb61pplucvys4.fsf@kernel.org> (raw)
In-Reply-To: <CAEf4BzaNM5H3Ad2=Syhhq1cbfuB5FrtuFTZHPTdQP3QME3naKA@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Fri, Apr 26, 2024 at 5:14 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> As ARM64 JIT now implements BPF_MOV64_PERCPU_REG instruction, inline
>> bpf_get_smp_processor_id().
>>
>> ARM64 uses the per-cpu variable cpu_number to store the cpu id.
>>
>> Here is how the BPF and ARM64 JITed assembly changes after this commit:
>>
>> BPF
>> =====
>> BEFORE AFTER
>> -------- -------
>>
>> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> (85) call bpf_get_smp_processor_id#229032 (18) r0 = 0xffff800082072008
>> (bf) r0 = &(void __percpu *)(r0)
>> (61) r0 = *(u32 *)(r0 +0)
>>
>> ARM64 JIT
>> ===========
>>
>> BEFORE AFTER
>> -------- -------
>>
>> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> mov x10, #0xfffffffffffff4d0 mov x7, #0xffff8000ffffffff
>> movk x10, #0x802b, lsl #16 movk x7, #0x8207, lsl #16
>> movk x10, #0x8000, lsl #32 movk x7, #0x2008
>> blr x10 mrs x10, tpidr_el1
>> add x7, x0, #0x0 add x7, x7, x10
>> ldr w7, [x7]
>>
>> Performance improvement using benchmark[1]
>>
>> BEFORE AFTER
>> -------- -------
>>
>> glob-arr-inc : 23.817 ± 0.019M/s glob-arr-inc : 24.631 ± 0.027M/s
>> arr-inc : 23.253 ± 0.019M/s arr-inc : 23.742 ± 0.023M/s
>> hash-inc : 12.258 ± 0.010M/s hash-inc : 12.625 ± 0.004M/s
>>
>> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>>
>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> Acked-by: Andrii Nakryiko <andrii@kernel.org>
>> ---
>> kernel/bpf/verifier.c | 24 +++++++++++++++++-------
>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 4e474ef44e9c..6ff4e63b2ef2 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -20273,20 +20273,31 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> goto next_insn;
>> }
>>
>> -#ifdef CONFIG_X86_64
>> /* Implement bpf_get_smp_processor_id() inline. */
>> if (insn->imm == BPF_FUNC_get_smp_processor_id &&
>> prog->jit_requested && bpf_jit_supports_percpu_insn()) {
>> /* BPF_FUNC_get_smp_processor_id inlining is an
>> - * optimization, so if pcpu_hot.cpu_number is ever
>> + * optimization, so if cpu_number_addr is ever
>> * changed in some incompatible and hard to support
>> * way, it's fine to back out this inlining logic
>> */
>> - insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
>> - insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
>> - insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
>> - cnt = 3;
>> + u64 cpu_number_addr;
>>
>> +#if defined(CONFIG_X86_64)
>> + cpu_number_addr = (u64)&pcpu_hot.cpu_number;
>> +#elif defined(CONFIG_ARM64)
>> + cpu_number_addr = (u64)&cpu_number;
>> +#else
>> + goto next_insn;
>> +#endif
>> + struct bpf_insn ld_cpu_number_addr[2] = {
>> + BPF_LD_IMM64(BPF_REG_0, cpu_number_addr)
>> + };
>
> here we are violating C89 requirement to have a single block of
> variable declarations by mixing variables and statements. I'm
> surprised this is not triggering any build errors on !arm64 &&
> !x86_64.
>
> I think we can declare this BPF_LD_IMM64 instruction with zero "addr".
> And then update
>
> ld_cpu_number_addr[0].imm = (u32)cpu_number_addr;
> ld_cpu_number_addr[1].imm = (u32)(cpu_number_addr >> 32);
>
> WDYT?
>
> nit: I'd rename ld_cpu_number_addr to ld_insn or something short like that
I agree with you,
What do you think about the following diff:
--- 8< ---
-#ifdef CONFIG_X86_64
/* Implement bpf_get_smp_processor_id() inline. */
if (insn->imm == BPF_FUNC_get_smp_processor_id &&
prog->jit_requested && bpf_jit_supports_percpu_insn()) {
/* BPF_FUNC_get_smp_processor_id inlining is an
- * optimization, so if pcpu_hot.cpu_number is ever
+ * optimization, so if cpu_number_addr is ever
* changed in some incompatible and hard to support
* way, it's fine to back out this inlining logic
*/
- insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
- insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
- insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
- cnt = 3;
+ u64 cpu_number_addr;
+ struct bpf_insn ld_insn[2] = {
+ BPF_LD_IMM64(BPF_REG_0, 0)
+ };
+
+#if defined(CONFIG_X86_64)
+ cpu_number_addr = (u64)&pcpu_hot.cpu_number;
+#elif defined(CONFIG_ARM64)
+ cpu_number_addr = (u64)&cpu_number;
+#else
+ goto next_insn;
+#endif
+ ld_insn[0].imm = (u32)cpu_number_addr;
+ ld_insn[1].imm = (u32)(cpu_number_addr >> 32);
+ insn_buf[0] = ld_insn[0];
+ insn_buf[1] = ld_insn[1];
+ insn_buf[2] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
+ insn_buf[3] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
+ cnt = 4;
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
@@ -20296,7 +20310,6 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
insn = new_prog->insnsi + i + delta;
goto next_insn;
}
-#endif
/* Implement bpf_get_func_arg inline. */
--- >8---
Thanks,
Puranjay
WARNING: multiple messages have this Message-ID (diff)
From: Puranjay Mohan <puranjay@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Shen Lim <zlim.lnx@gmail.com>,
Xu Kuohai <xukuohai@huawei.com>,
Florent Revest <revest@chromium.org>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper
Date: Fri, 26 Apr 2024 17:06:51 +0000 [thread overview]
Message-ID: <mb61pplucvys4.fsf@kernel.org> (raw)
In-Reply-To: <CAEf4BzaNM5H3Ad2=Syhhq1cbfuB5FrtuFTZHPTdQP3QME3naKA@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Fri, Apr 26, 2024 at 5:14 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> As ARM64 JIT now implements BPF_MOV64_PERCPU_REG instruction, inline
>> bpf_get_smp_processor_id().
>>
>> ARM64 uses the per-cpu variable cpu_number to store the cpu id.
>>
>> Here is how the BPF and ARM64 JITed assembly changes after this commit:
>>
>> BPF
>> =====
>> BEFORE AFTER
>> -------- -------
>>
>> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> (85) call bpf_get_smp_processor_id#229032 (18) r0 = 0xffff800082072008
>> (bf) r0 = &(void __percpu *)(r0)
>> (61) r0 = *(u32 *)(r0 +0)
>>
>> ARM64 JIT
>> ===========
>>
>> BEFORE AFTER
>> -------- -------
>>
>> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> mov x10, #0xfffffffffffff4d0 mov x7, #0xffff8000ffffffff
>> movk x10, #0x802b, lsl #16 movk x7, #0x8207, lsl #16
>> movk x10, #0x8000, lsl #32 movk x7, #0x2008
>> blr x10 mrs x10, tpidr_el1
>> add x7, x0, #0x0 add x7, x7, x10
>> ldr w7, [x7]
>>
>> Performance improvement using benchmark[1]
>>
>> BEFORE AFTER
>> -------- -------
>>
>> glob-arr-inc : 23.817 ± 0.019M/s glob-arr-inc : 24.631 ± 0.027M/s
>> arr-inc : 23.253 ± 0.019M/s arr-inc : 23.742 ± 0.023M/s
>> hash-inc : 12.258 ± 0.010M/s hash-inc : 12.625 ± 0.004M/s
>>
>> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>>
>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> Acked-by: Andrii Nakryiko <andrii@kernel.org>
>> ---
>> kernel/bpf/verifier.c | 24 +++++++++++++++++-------
>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 4e474ef44e9c..6ff4e63b2ef2 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -20273,20 +20273,31 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> goto next_insn;
>> }
>>
>> -#ifdef CONFIG_X86_64
>> /* Implement bpf_get_smp_processor_id() inline. */
>> if (insn->imm == BPF_FUNC_get_smp_processor_id &&
>> prog->jit_requested && bpf_jit_supports_percpu_insn()) {
>> /* BPF_FUNC_get_smp_processor_id inlining is an
>> - * optimization, so if pcpu_hot.cpu_number is ever
>> + * optimization, so if cpu_number_addr is ever
>> * changed in some incompatible and hard to support
>> * way, it's fine to back out this inlining logic
>> */
>> - insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
>> - insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
>> - insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
>> - cnt = 3;
>> + u64 cpu_number_addr;
>>
>> +#if defined(CONFIG_X86_64)
>> + cpu_number_addr = (u64)&pcpu_hot.cpu_number;
>> +#elif defined(CONFIG_ARM64)
>> + cpu_number_addr = (u64)&cpu_number;
>> +#else
>> + goto next_insn;
>> +#endif
>> + struct bpf_insn ld_cpu_number_addr[2] = {
>> + BPF_LD_IMM64(BPF_REG_0, cpu_number_addr)
>> + };
>
> here we are violating C89 requirement to have a single block of
> variable declarations by mixing variables and statements. I'm
> surprised this is not triggering any build errors on !arm64 &&
> !x86_64.
>
> I think we can declare this BPF_LD_IMM64 instruction with zero "addr".
> And then update
>
> ld_cpu_number_addr[0].imm = (u32)cpu_number_addr;
> ld_cpu_number_addr[1].imm = (u32)(cpu_number_addr >> 32);
>
> WDYT?
>
> nit: I'd rename ld_cpu_number_addr to ld_insn or something short like that
I agree with you,
What do you think about the following diff:
--- 8< ---
-#ifdef CONFIG_X86_64
/* Implement bpf_get_smp_processor_id() inline. */
if (insn->imm == BPF_FUNC_get_smp_processor_id &&
prog->jit_requested && bpf_jit_supports_percpu_insn()) {
/* BPF_FUNC_get_smp_processor_id inlining is an
- * optimization, so if pcpu_hot.cpu_number is ever
+ * optimization, so if cpu_number_addr is ever
* changed in some incompatible and hard to support
* way, it's fine to back out this inlining logic
*/
- insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
- insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
- insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
- cnt = 3;
+ u64 cpu_number_addr;
+ struct bpf_insn ld_insn[2] = {
+ BPF_LD_IMM64(BPF_REG_0, 0)
+ };
+
+#if defined(CONFIG_X86_64)
+ cpu_number_addr = (u64)&pcpu_hot.cpu_number;
+#elif defined(CONFIG_ARM64)
+ cpu_number_addr = (u64)&cpu_number;
+#else
+ goto next_insn;
+#endif
+ ld_insn[0].imm = (u32)cpu_number_addr;
+ ld_insn[1].imm = (u32)(cpu_number_addr >> 32);
+ insn_buf[0] = ld_insn[0];
+ insn_buf[1] = ld_insn[1];
+ insn_buf[2] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
+ insn_buf[3] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
+ cnt = 4;
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
@@ -20296,7 +20310,6 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
insn = new_prog->insnsi + i + delta;
goto next_insn;
}
-#endif
/* Implement bpf_get_func_arg inline. */
--- >8---
Thanks,
Puranjay
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2024-04-26 17:06 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-26 12:13 [PATCH bpf-next v3 0/2] bpf, arm64: Support per-cpu instruction Puranjay Mohan
2024-04-26 12:13 ` Puranjay Mohan
2024-04-26 12:13 ` [PATCH bpf-next v3 1/2] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
2024-04-26 12:13 ` Puranjay Mohan
2024-04-26 16:19 ` Andrii Nakryiko
2024-04-26 16:19 ` Andrii Nakryiko
2024-04-26 16:55 ` Puranjay Mohan
2024-04-26 16:55 ` Puranjay Mohan
2024-04-26 17:35 ` Andrii Nakryiko
2024-04-26 17:35 ` Andrii Nakryiko
2024-04-30 18:30 ` Puranjay Mohan
2024-04-30 18:30 ` Puranjay Mohan
2024-04-26 12:13 ` [PATCH bpf-next v3 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper Puranjay Mohan
2024-04-26 12:13 ` Puranjay Mohan
2024-04-26 16:26 ` Andrii Nakryiko
2024-04-26 16:26 ` Andrii Nakryiko
2024-04-26 17:06 ` Puranjay Mohan [this message]
2024-04-26 17:06 ` Puranjay Mohan
2024-04-26 17:31 ` Andrii Nakryiko
2024-04-26 17:31 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mb61pplucvys4.fsf@kernel.org \
--to=puranjay@kernel.org \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=revest@chromium.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=will@kernel.org \
--cc=xukuohai@huawei.com \
--cc=yonghong.song@linux.dev \
--cc=zlim.lnx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.