From: Puranjay Mohan <puranjay@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Shen Lim <zlim.lnx@gmail.com>,
Xu Kuohai <xukuohai@huawei.com>,
Florent Revest <revest@chromium.org>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v2 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper
Date: Thu, 25 Apr 2024 18:55:56 +0000 [thread overview]
Message-ID: <mb61po79x9sqr.fsf@kernel.org> (raw)
In-Reply-To: <CAEf4BzZe-rtewAvDeNwqoud+x+fTraiLM1mzdvae_5yNrWsWyg@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Thu, Apr 25, 2024 at 3:14 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Wed, Apr 24, 2024 at 10:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>> >>
>> >> As ARM64 JIT now implements BPF_MOV64_PERCPU_REG instruction, inline
>> >> bpf_get_smp_processor_id().
>> >>
>> >> ARM64 uses the per-cpu variable cpu_number to store the cpu id.
>> >>
>> >> Here is how the BPF and ARM64 JITed assembly changes after this commit:
>> >>
>> >> BPF
>> >> =====
>> >> BEFORE AFTER
>> >> -------- -------
>> >>
>> >> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> >> (85) call bpf_get_smp_processor_id#229032 (18) r0 = 0xffff800082072008
>> >> (bf) r0 = r0
>> >
>> > nit: hmm, you are probably using a bit outdated bpftool, it should be
>> > emitted as:
>> >
>> > (bf) r0 = &(void __percpu *)(r0)
>>
>> Yes, I was using the bpftool shipped with the distro. I tried it again
>> with the latest bpftool and it emitted this as expected.
>
> Cool, would be nice to update the commit message with the right syntax
> for next revision, thanks!
>
Sure, will do.
>>
>> >
>> >> (61) r0 = *(u32 *)(r0 +0)
>> >>
>> >> ARM64 JIT
>> >> ===========
>> >>
>> >> BEFORE AFTER
>> >> -------- -------
>> >>
>> >> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> >> mov x10, #0xfffffffffffff4d0 mov x7, #0xffff8000ffffffff
>> >> movk x10, #0x802b, lsl #16 movk x7, #0x8207, lsl #16
>> >> movk x10, #0x8000, lsl #32 movk x7, #0x2008
>> >> blr x10 mrs x10, tpidr_el1
>> >> add x7, x0, #0x0 add x7, x7, x10
>> >> ldr w7, [x7]
>> >>
>> >> Performance improvement using benchmark[1]
>> >>
>> >> BEFORE AFTER
>> >> -------- -------
>> >>
>> >> glob-arr-inc : 23.817 ± 0.019M/s glob-arr-inc : 24.631 ± 0.027M/s
>> >> arr-inc : 23.253 ± 0.019M/s arr-inc : 23.742 ± 0.023M/s
>> >> hash-inc : 12.258 ± 0.010M/s hash-inc : 12.625 ± 0.004M/s
>> >>
>> >> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>> >>
>> >> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> >> ---
>> >> kernel/bpf/verifier.c | 11 ++++++++++-
>> >> 1 file changed, 10 insertions(+), 1 deletion(-)
>> >>
>> >
>> > Besides the nits, lgtm.
>> >
>> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
>> >
>> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> >> index 9715c88cc025..3373be261889 100644
>> >> --- a/kernel/bpf/verifier.c
>> >> +++ b/kernel/bpf/verifier.c
>> >> @@ -20205,7 +20205,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> >> goto next_insn;
>> >> }
>> >>
>> >> -#ifdef CONFIG_X86_64
>> >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>> >
>> > I think you can drop this, we are protected by
>> > bpf_jit_supports_percpu_insn() check and newly added inner #if/#elif
>> > checks?
>>
>> If I remove this and later add support of percpu_insn on RISCV without
>> inlining bpf_get_smp_processor_id() then it will cause problems here
>> right? because then the last 5-6 lines inside this if(){} will be
>> executed for RISCV.
>
> Just add
>
> #else
> return -EFAULT;
I don't think we can return.
> #endif
>
> ?
>
> I'm trying to avoid this duplication of the defined(CONFIG_xxx) checks
> for supported architectures.
Does the following look correct?
I will do it like this:
/* Implement bpf_get_smp_processor_id() inline. */
if (insn->imm == BPF_FUNC_get_smp_processor_id &&
prog->jit_requested && bpf_jit_supports_percpu_insn()) {
/* BPF_FUNC_get_smp_processor_id inlining is an
* optimization, so if pcpu_hot.cpu_number is ever
* changed in some incompatible and hard to support
* way, it's fine to back out this inlining logic
*/
#if defined(CONFIG_X86_64)
insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
cnt = 3;
#elif defined(CONFIG_ARM64)
struct bpf_insn cpu_number_addr[2] = { BPF_LD_IMM64(BPF_REG_0, (u64)&cpu_number) };
insn_buf[0] = cpu_number_addr[0];
insn_buf[1] = cpu_number_addr[1];
insn_buf[2] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
insn_buf[3] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
cnt = 4;
#else
goto next_insn;
#endif
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
delta += cnt - 1;
env->prog = prog = new_prog;
insn = new_prog->insnsi + i + delta;
goto next_insn;
}
>>
>> >
>> >> /* Implement bpf_get_smp_processor_id() inline. */
>> >> if (insn->imm == BPF_FUNC_get_smp_processor_id &&
>> >> prog->jit_requested && bpf_jit_supports_percpu_insn()) {
>> >> @@ -20214,11 +20214,20 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> >> * changed in some incompatible and hard to support
>> >> * way, it's fine to back out this inlining logic
>> >> */
>> >> +#if defined(CONFIG_X86_64)
>> >> insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
>> >> insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
>> >> insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
>> >> cnt = 3;
>> >> +#elif defined(CONFIG_ARM64)
>> >> + struct bpf_insn cpu_number_addr[2] = { BPF_LD_IMM64(BPF_REG_0, (u64)&cpu_number) };
>> >>
>> >
>> > this &cpu_number offset is not guaranteed to be within 4GB on arm64?
>>
>> Unfortunately, the per-cpu section is not placed in the first 4GB and
>> therefore the per-cpu pointers are not 32-bit on ARM64.
>
> I see. It might make sense to turn x86-64 code into using MOV64_IMM as
> well to keep more of the logic common. Then it will be just the
> difference of an offset that's loaded. Give it a try?
I think MOV64_IMM would have more overhead than MOV32_IMM and if we can
use it in x86-64 we should keep doing it that way. Wdyt?
>>
>> >
>> >> + insn_buf[0] = cpu_number_addr[0];
>> >> + insn_buf[1] = cpu_number_addr[1];
>> >> + insn_buf[2] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
>> >> + insn_buf[3] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
>> >> + cnt = 4;
>> >> +#endif
>> >> new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>> >> if (!new_prog)
>> >> return -ENOMEM;
>> >> --
>> >> 2.40.1
>> >>
WARNING: multiple messages have this Message-ID (diff)
From: Puranjay Mohan <puranjay@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Shen Lim <zlim.lnx@gmail.com>,
Xu Kuohai <xukuohai@huawei.com>,
Florent Revest <revest@chromium.org>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v2 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper
Date: Thu, 25 Apr 2024 18:55:56 +0000 [thread overview]
Message-ID: <mb61po79x9sqr.fsf@kernel.org> (raw)
In-Reply-To: <CAEf4BzZe-rtewAvDeNwqoud+x+fTraiLM1mzdvae_5yNrWsWyg@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Thu, Apr 25, 2024 at 3:14 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Wed, Apr 24, 2024 at 10:36 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>> >>
>> >> As ARM64 JIT now implements BPF_MOV64_PERCPU_REG instruction, inline
>> >> bpf_get_smp_processor_id().
>> >>
>> >> ARM64 uses the per-cpu variable cpu_number to store the cpu id.
>> >>
>> >> Here is how the BPF and ARM64 JITed assembly changes after this commit:
>> >>
>> >> BPF
>> >> =====
>> >> BEFORE AFTER
>> >> -------- -------
>> >>
>> >> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> >> (85) call bpf_get_smp_processor_id#229032 (18) r0 = 0xffff800082072008
>> >> (bf) r0 = r0
>> >
>> > nit: hmm, you are probably using a bit outdated bpftool, it should be
>> > emitted as:
>> >
>> > (bf) r0 = &(void __percpu *)(r0)
>>
>> Yes, I was using the bpftool shipped with the distro. I tried it again
>> with the latest bpftool and it emitted this as expected.
>
> Cool, would be nice to update the commit message with the right syntax
> for next revision, thanks!
>
Sure, will do.
>>
>> >
>> >> (61) r0 = *(u32 *)(r0 +0)
>> >>
>> >> ARM64 JIT
>> >> ===========
>> >>
>> >> BEFORE AFTER
>> >> -------- -------
>> >>
>> >> int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
>> >> mov x10, #0xfffffffffffff4d0 mov x7, #0xffff8000ffffffff
>> >> movk x10, #0x802b, lsl #16 movk x7, #0x8207, lsl #16
>> >> movk x10, #0x8000, lsl #32 movk x7, #0x2008
>> >> blr x10 mrs x10, tpidr_el1
>> >> add x7, x0, #0x0 add x7, x7, x10
>> >> ldr w7, [x7]
>> >>
>> >> Performance improvement using benchmark[1]
>> >>
>> >> BEFORE AFTER
>> >> -------- -------
>> >>
>> >> glob-arr-inc : 23.817 ± 0.019M/s glob-arr-inc : 24.631 ± 0.027M/s
>> >> arr-inc : 23.253 ± 0.019M/s arr-inc : 23.742 ± 0.023M/s
>> >> hash-inc : 12.258 ± 0.010M/s hash-inc : 12.625 ± 0.004M/s
>> >>
>> >> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>> >>
>> >> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> >> ---
>> >> kernel/bpf/verifier.c | 11 ++++++++++-
>> >> 1 file changed, 10 insertions(+), 1 deletion(-)
>> >>
>> >
>> > Besides the nits, lgtm.
>> >
>> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
>> >
>> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> >> index 9715c88cc025..3373be261889 100644
>> >> --- a/kernel/bpf/verifier.c
>> >> +++ b/kernel/bpf/verifier.c
>> >> @@ -20205,7 +20205,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> >> goto next_insn;
>> >> }
>> >>
>> >> -#ifdef CONFIG_X86_64
>> >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>> >
>> > I think you can drop this, we are protected by
>> > bpf_jit_supports_percpu_insn() check and newly added inner #if/#elif
>> > checks?
>>
>> If I remove this and later add support of percpu_insn on RISCV without
>> inlining bpf_get_smp_processor_id() then it will cause problems here
>> right? because then the last 5-6 lines inside this if(){} will be
>> executed for RISCV.
>
> Just add
>
> #else
> return -EFAULT;
I don't think we can return.
> #endif
>
> ?
>
> I'm trying to avoid this duplication of the defined(CONFIG_xxx) checks
> for supported architectures.
Does the following look correct?
I will do it like this:
/* Implement bpf_get_smp_processor_id() inline. */
if (insn->imm == BPF_FUNC_get_smp_processor_id &&
prog->jit_requested && bpf_jit_supports_percpu_insn()) {
/* BPF_FUNC_get_smp_processor_id inlining is an
* optimization, so if pcpu_hot.cpu_number is ever
* changed in some incompatible and hard to support
* way, it's fine to back out this inlining logic
*/
#if defined(CONFIG_X86_64)
insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
cnt = 3;
#elif defined(CONFIG_ARM64)
struct bpf_insn cpu_number_addr[2] = { BPF_LD_IMM64(BPF_REG_0, (u64)&cpu_number) };
insn_buf[0] = cpu_number_addr[0];
insn_buf[1] = cpu_number_addr[1];
insn_buf[2] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
insn_buf[3] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
cnt = 4;
#else
goto next_insn;
#endif
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
delta += cnt - 1;
env->prog = prog = new_prog;
insn = new_prog->insnsi + i + delta;
goto next_insn;
}
>>
>> >
>> >> /* Implement bpf_get_smp_processor_id() inline. */
>> >> if (insn->imm == BPF_FUNC_get_smp_processor_id &&
>> >> prog->jit_requested && bpf_jit_supports_percpu_insn()) {
>> >> @@ -20214,11 +20214,20 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> >> * changed in some incompatible and hard to support
>> >> * way, it's fine to back out this inlining logic
>> >> */
>> >> +#if defined(CONFIG_X86_64)
>> >> insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
>> >> insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
>> >> insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
>> >> cnt = 3;
>> >> +#elif defined(CONFIG_ARM64)
>> >> + struct bpf_insn cpu_number_addr[2] = { BPF_LD_IMM64(BPF_REG_0, (u64)&cpu_number) };
>> >>
>> >
>> > this &cpu_number offset is not guaranteed to be within 4GB on arm64?
>>
>> Unfortunately, the per-cpu section is not placed in the first 4GB and
>> therefore the per-cpu pointers are not 32-bit on ARM64.
>
> I see. It might make sense to turn x86-64 code into using MOV64_IMM as
> well to keep more of the logic common. Then it will be just the
> difference of an offset that's loaded. Give it a try?
I think MOV64_IMM would have more overhead than MOV32_IMM and if we can
use it in x86-64 we should keep doing it that way. Wdyt?
>>
>> >
>> >> + insn_buf[0] = cpu_number_addr[0];
>> >> + insn_buf[1] = cpu_number_addr[1];
>> >> + insn_buf[2] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
>> >> + insn_buf[3] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
>> >> + cnt = 4;
>> >> +#endif
>> >> new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>> >> if (!new_prog)
>> >> return -ENOMEM;
>> >> --
>> >> 2.40.1
>> >>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2024-04-25 18:56 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-24 17:35 [PATCH bpf-next v2 0/2] bpf, arm64: Support per-cpu instructions Puranjay Mohan
2024-04-24 17:35 ` Puranjay Mohan
2024-04-24 17:35 ` [PATCH bpf-next v2 1/2] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs Puranjay Mohan
2024-04-24 17:35 ` Puranjay Mohan
2024-04-24 17:35 ` [PATCH bpf-next v2 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper Puranjay Mohan
2024-04-24 17:35 ` Puranjay Mohan
2024-04-24 22:01 ` Andrii Nakryiko
2024-04-24 22:01 ` Andrii Nakryiko
2024-04-25 10:14 ` Puranjay Mohan
2024-04-25 10:14 ` Puranjay Mohan
2024-04-25 18:09 ` Andrii Nakryiko
2024-04-25 18:09 ` Andrii Nakryiko
2024-04-25 18:55 ` Puranjay Mohan [this message]
2024-04-25 18:55 ` Puranjay Mohan
2024-04-25 20:43 ` Andrii Nakryiko
2024-04-25 20:43 ` Andrii Nakryiko
2024-04-26 10:26 ` Puranjay Mohan
2024-04-26 10:26 ` Puranjay Mohan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mb61po79x9sqr.fsf@kernel.org \
--to=puranjay@kernel.org \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=revest@chromium.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=will@kernel.org \
--cc=xukuohai@huawei.com \
--cc=yonghong.song@linux.dev \
--cc=zlim.lnx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.