* [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper
2023-12-19 13:56 [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
@ 2023-12-19 13:56 ` Hou Tao
2023-12-20 15:02 ` Daniel Borkmann
2023-12-20 17:07 ` Eduard Zingerman
2023-12-19 13:56 ` [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG Hou Tao
` (2 subsequent siblings)
3 siblings, 2 replies; 14+ messages in thread
From: Hou Tao @ 2023-12-19 13:56 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1
From: Hou Tao <houtao1@huawei.com>
Considering that most 64-bit JIT backends support atomic_xchg(), and that
the implementation of xchg() is the same as atomic_xchg() on these 64-bit
architectures, inline bpf_kptr_xchg() by converting the call to
bpf_kptr_xchg() into an atomic_xchg() instruction.
As a precaution, define a weak function bpf_jit_supports_ptr_xchg() to
state whether such a conversion is safe, and only inline the helper on
64-bit hosts.
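For illustration, the exchange the helper boils down to can be sketched with C11 atomics in user space (the function name below is made up for the sketch, not a kernel API):

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model of what an inlined bpf_kptr_xchg() does: a single
 * pointer-sized atomic exchange that stores the new pointer and returns
 * the old one. */
static void *kptr_xchg_model(_Atomic(void *) *slot, void *newp)
{
	return atomic_exchange(slot, newp);
}
```

On 64-bit targets this should compile down to the same kind of pointer-sized exchange instruction the JIT would emit for BPF_XCHG.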
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/filter.h | 1 +
kernel/bpf/core.c | 10 ++++++++++
kernel/bpf/verifier.c | 17 +++++++++++++++++
3 files changed, 28 insertions(+)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 12d907f17d36..fee070b9826e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -955,6 +955,7 @@ bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_kfunc_call(void);
bool bpf_jit_supports_far_kfunc_call(void);
bool bpf_jit_supports_exceptions(void);
+bool bpf_jit_supports_ptr_xchg(void);
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
bool bpf_helper_changes_pkt_data(void *func);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5aa6863ac33b..b64885827f90 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2922,6 +2922,16 @@ bool __weak bpf_jit_supports_far_kfunc_call(void)
return false;
}
+/* Return TRUE if the JIT backend satisfies the following two conditions:
+ * 1) JIT backend supports atomic_xchg() on pointer-sized words.
+ * 2) Under the specific arch, the implementation of xchg() is the same
+ * as atomic_xchg() on pointer-sized words.
+ */
+bool __weak bpf_jit_supports_ptr_xchg(void)
+{
+ return false;
+}
+
/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
* skb_copy_bits(), so provide a weak definition of it for NET-less config.
*/
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9456ee0ad129..7814c4f7576e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19668,6 +19668,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
continue;
}
+ /* Implement bpf_kptr_xchg inline */
+ if (prog->jit_requested && BITS_PER_LONG == 64 &&
+ insn->imm == BPF_FUNC_kptr_xchg &&
+ bpf_jit_supports_ptr_xchg()) {
+ insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
+ insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
+ cnt = 2;
+
+ new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ continue;
+ }
patch_call_imm:
fn = env->ops->get_func_proto(insn->imm, env->prog);
/* all functions that have prototype and verifier allowed
--
2.29.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper
2023-12-19 13:56 ` [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
@ 2023-12-20 15:02 ` Daniel Borkmann
2023-12-20 17:07 ` Eduard Zingerman
1 sibling, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-12-20 15:02 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1
On 12/19/23 2:56 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Considering most 64-bit JIT backends support atomic_xchg() and the
> implementation of xchg() is the same as atomic_xchg() on these 64-bits
> architectures, inline bpf_kptr_xchg() by converting the calling of
> bpf_kptr_xchg() into an atomic_xchg() instruction.
>
> As a precaution, defining a weak function bpf_jit_supports_ptr_xchg()
> to state whether such conversion is safe and only inlining under 64-bit
> host.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
You can probably send patch 2/3 as stand-alone, and squash patch 3/3 with
this one in here.
> ---
> include/linux/filter.h | 1 +
> kernel/bpf/core.c | 10 ++++++++++
> kernel/bpf/verifier.c | 17 +++++++++++++++++
> 3 files changed, 28 insertions(+)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 12d907f17d36..fee070b9826e 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -955,6 +955,7 @@ bool bpf_jit_supports_subprog_tailcalls(void);
> bool bpf_jit_supports_kfunc_call(void);
> bool bpf_jit_supports_far_kfunc_call(void);
> bool bpf_jit_supports_exceptions(void);
> +bool bpf_jit_supports_ptr_xchg(void);
> void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
> bool bpf_helper_changes_pkt_data(void *func);
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 5aa6863ac33b..b64885827f90 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2922,6 +2922,16 @@ bool __weak bpf_jit_supports_far_kfunc_call(void)
> return false;
> }
>
> +/* Return TRUE if the JIT backend satisfies the following two conditions:
> + * 1) JIT backend supports atomic_xchg() on pointer-sized words.
> + * 2) Under the specific arch, the implementation of xchg() is the same
> + * as atomic_xchg() on pointer-sized words.
> + */
> +bool __weak bpf_jit_supports_ptr_xchg(void)
> +{
> + return false;
> +}
> +
> /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
> * skb_copy_bits(), so provide a weak definition of it for NET-less config.
> */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9456ee0ad129..7814c4f7576e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19668,6 +19668,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> continue;
> }
>
> + /* Implement bpf_kptr_xchg inline */
> + if (prog->jit_requested && BITS_PER_LONG == 64 &&
> + insn->imm == BPF_FUNC_kptr_xchg &&
> + bpf_jit_supports_ptr_xchg()) {
> + insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
> + insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
> + cnt = 2;
> +
> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> + if (!new_prog)
> + return -ENOMEM;
> +
> + delta += cnt - 1;
> + env->prog = prog = new_prog;
> + insn = new_prog->insnsi + i + delta;
> + continue;
> + }
> patch_call_imm:
> fn = env->ops->get_func_proto(insn->imm, env->prog);
> /* all functions that have prototype and verifier allowed
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper
2023-12-19 13:56 ` [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
2023-12-20 15:02 ` Daniel Borkmann
@ 2023-12-20 17:07 ` Eduard Zingerman
2023-12-21 11:40 ` Hou Tao
1 sibling, 1 reply; 14+ messages in thread
From: Eduard Zingerman @ 2023-12-20 17:07 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1
On Tue, 2023-12-19 at 21:56 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9456ee0ad129..7814c4f7576e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19668,6 +19668,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> continue;
> }
>
> + /* Implement bpf_kptr_xchg inline */
> + if (prog->jit_requested && BITS_PER_LONG == 64 &&
> + insn->imm == BPF_FUNC_kptr_xchg &&
> + bpf_jit_supports_ptr_xchg()) {
> + insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
> + insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
> + cnt = 2;
> +
> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> + if (!new_prog)
> + return -ENOMEM;
> +
> + delta += cnt - 1;
> + env->prog = prog = new_prog;
> + insn = new_prog->insnsi + i + delta;
> + continue;
> + }
> patch_call_imm:
> fn = env->ops->get_func_proto(insn->imm, env->prog);
> /* all functions that have prototype and verifier allowed
Hi Hou,
I have a suggestion about testing this rewrite.
It is possible to use function get_xlated_program() from
tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c,
to obtain a BPF disassembly for the program after
do_misc_fixups() are applied.
So, it shouldn't be difficult to:
- prepare a dummy program in progs/ that uses bpf_kptr_xchg();
- prepare a new test_* function in prog_tests/ that:
- loads that dummy program;
- queries its disassembly using get_xlated_program();
- compares it with expected template.
I know that do_misc_fixups() rewrites are usually not tested this way,
but that does not mean they shouldn't be, wdyt?
Thanks,
Eduard
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper
2023-12-20 17:07 ` Eduard Zingerman
@ 2023-12-21 11:40 ` Hou Tao
0 siblings, 0 replies; 14+ messages in thread
From: Hou Tao @ 2023-12-21 11:40 UTC (permalink / raw)
To: Eduard Zingerman, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1
Hi Eduard,
On 12/21/2023 1:07 AM, Eduard Zingerman wrote:
> On Tue, 2023-12-19 at 21:56 +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
> [...]
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 9456ee0ad129..7814c4f7576e 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -19668,6 +19668,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> continue;
>> }
>>
>> + /* Implement bpf_kptr_xchg inline */
>> + if (prog->jit_requested && BITS_PER_LONG == 64 &&
>> + insn->imm == BPF_FUNC_kptr_xchg &&
>> + bpf_jit_supports_ptr_xchg()) {
>> + insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
>> + insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
>> + cnt = 2;
>> +
>> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>> + if (!new_prog)
>> + return -ENOMEM;
>> +
>> + delta += cnt - 1;
>> + env->prog = prog = new_prog;
>> + insn = new_prog->insnsi + i + delta;
>> + continue;
>> + }
>> patch_call_imm:
>> fn = env->ops->get_func_proto(insn->imm, env->prog);
>> /* all functions that have prototype and verifier allowed
> Hi Hou,
>
> I have a suggestion about testing this rewrite.
> It is possible to use function get_xlated_program() from
> tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c,
> to obtain a BPF disassembly for the program after
> do_misc_fixups() are applied.
>
> So, it shouldn't be difficult to:
> - prepare a dummy program in progs/ that uses bpf_kptr_xchg();
> - prepare a new test_* function in prog_tests/ that:
> - loads that dummy program;
> - queries it's disassembly using get_xlated_program();
> - compares it with expected template.
>
> I know that do_misc_fixups() are usually not tested this way,
> but that does not mean they shouldn't, wdyt?
Good idea and thanks for the detailed guidance. Will try it in v2.
>
> Thanks,
> Eduard
> .
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG
2023-12-19 13:56 [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
2023-12-19 13:56 ` [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
@ 2023-12-19 13:56 ` Hou Tao
2023-12-20 14:58 ` Daniel Borkmann
2023-12-19 13:56 ` [PATCH bpf-next 3/3] bpf, x86: Inline bpf_kptr_xchg() on x86-64 Hou Tao
2023-12-20 14:54 ` [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Daniel Borkmann
3 siblings, 1 reply; 14+ messages in thread
From: Hou Tao @ 2023-12-19 13:56 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1
From: Hou Tao <houtao1@huawei.com>
According to the implementation of atomic_xchg() on x86-64, the lock
prefix is not necessary for the BPF_XCHG atomic operation, so just remove
it.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
arch/x86/net/bpf_jit_comp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c89a4abdd726..49dac4d22a7b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -990,7 +990,9 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
{
u8 *prog = *pprog;
- EMIT1(0xF0); /* lock prefix */
+ /* lock prefix */
+ if (atomic_op != BPF_XCHG)
+ EMIT1(0xF0);
maybe_emit_mod(&prog, dst_reg, src_reg, bpf_size == BPF_DW);
--
2.29.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG
2023-12-19 13:56 ` [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG Hou Tao
@ 2023-12-20 14:58 ` Daniel Borkmann
2023-12-20 18:25 ` Alexei Starovoitov
0 siblings, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2023-12-20 14:58 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1
On 12/19/23 2:56 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> According to the implementation of atomic_xchg() under x86-64, the lock
> prefix is not necessary for BPF_XCHG atomic operation, so just remove
> it.
It's probably a good idea for the commit message to explicitly quote the
Intel docs here, so it's easier to find out why the lock prefix is not
needed for the xchg op.
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> arch/x86/net/bpf_jit_comp.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index c89a4abdd726..49dac4d22a7b 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -990,7 +990,9 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
> {
> u8 *prog = *pprog;
>
> - EMIT1(0xF0); /* lock prefix */
> + /* lock prefix */
> + if (atomic_op != BPF_XCHG)
> + EMIT1(0xF0);
>
> maybe_emit_mod(&prog, dst_reg, src_reg, bpf_size == BPF_DW);
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG
2023-12-20 14:58 ` Daniel Borkmann
@ 2023-12-20 18:25 ` Alexei Starovoitov
2023-12-20 18:46 ` Daniel Borkmann
0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2023-12-20 18:25 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Hou Tao, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao
On Wed, Dec 20, 2023 at 6:58 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/19/23 2:56 PM, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > According to the implementation of atomic_xchg() under x86-64, the lock
> > prefix is not necessary for BPF_XCHG atomic operation, so just remove
> > it.
>
> It's probably a good idea for the commit message to explicitly quote the
> Intel docs in here, so it's easier to find on why the lock prefix would
> not be needed for the xchg op.
It's a surprise to me as well.
Definitely more info would be good.
Also, if the xchg insn without lock somehow implies a lock in the HW,
what is the harm of adding it explicitly?
If it's a lock in HW, then performance with and without the lock prefix
should be the same, right?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG
2023-12-20 18:25 ` Alexei Starovoitov
@ 2023-12-20 18:46 ` Daniel Borkmann
2023-12-21 11:37 ` Hou Tao
0 siblings, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2023-12-20 18:46 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Hou Tao, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao
On 12/20/23 7:25 PM, Alexei Starovoitov wrote:
> On Wed, Dec 20, 2023 at 6:58 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> On 12/19/23 2:56 PM, Hou Tao wrote:
>>> From: Hou Tao <houtao1@huawei.com>
>>>
>>> According to the implementation of atomic_xchg() under x86-64, the lock
>>> prefix is not necessary for BPF_XCHG atomic operation, so just remove
>>> it.
>>
>> It's probably a good idea for the commit message to explicitly quote the
>> Intel docs in here, so it's easier to find on why the lock prefix would
>> not be needed for the xchg op.
>
> It's a surprise to me as well.
> Definitely more info would be good.
>
> Also if xchg insn without lock somehow implies lock in the HW
> what is the harm of adding it explicitly?
> If it's a lock in HW than performance with and without lock prefix
> should be the same, right?
e.g. 7.3.1.2 Exchange Instructions says:
The XCHG (exchange) instruction swaps the contents of two operands. This
instruction takes the place of three MOV instructions and does not require
a temporary location to save the contents of one operand location while
the other is being loaded. When a memory operand is used with the XCHG
instruction, the processor’s LOCK signal is automatically asserted.
Also curious whether there is any harm in adding it explicitly.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG
2023-12-20 18:46 ` Daniel Borkmann
@ 2023-12-21 11:37 ` Hou Tao
2023-12-23 4:01 ` Hou Tao
0 siblings, 1 reply; 14+ messages in thread
From: Hou Tao @ 2023-12-21 11:37 UTC (permalink / raw)
To: Daniel Borkmann, Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Hou Tao
Hi,
On 12/21/2023 2:46 AM, Daniel Borkmann wrote:
> On 12/20/23 7:25 PM, Alexei Starovoitov wrote:
>> On Wed, Dec 20, 2023 at 6:58 AM Daniel Borkmann
>> <daniel@iogearbox.net> wrote:
>>>
>>> On 12/19/23 2:56 PM, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> According to the implementation of atomic_xchg() under x86-64, the
>>>> lock
>>>> prefix is not necessary for BPF_XCHG atomic operation, so just remove
>>>> it.
>>>
>>> It's probably a good idea for the commit message to explicitly quote
>>> the
>>> Intel docs in here, so it's easier to find on why the lock prefix would
>>> not be needed for the xchg op.
>>
>> It's a surprise to me as well.
>> Definitely more info would be good.
>>
>> Also if xchg insn without lock somehow implies lock in the HW
>> what is the harm of adding it explicitly?
>> If it's a lock in HW than performance with and without lock prefix
>> should be the same, right?
>
> e.g. 7.3.1.2 Exchange Instructions says:
>
> The XCHG (exchange) instruction swaps the contents of two operands.
> This
> instruction takes the place of three MOV instructions and does not
> require
> a temporary location to save the contents of one operand location while
> the other is being loaded. When a memory operand is used with the XCHG
> instruction, the processor’s LOCK signal is automatically asserted.
>
> Also curious if there is any harm adding it explicitly.
>
> .
I could use the bpf ma benchmark to test it, but I doubt it will make
any visible difference.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG
2023-12-21 11:37 ` Hou Tao
@ 2023-12-23 4:01 ` Hou Tao
0 siblings, 0 replies; 14+ messages in thread
From: Hou Tao @ 2023-12-23 4:01 UTC (permalink / raw)
To: Hou Tao, Daniel Borkmann, Alexei Starovoitov
Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend
Hi,
On 12/21/2023 7:37 PM, Hou Tao wrote:
> Hi,
>
> On 12/21/2023 2:46 AM, Daniel Borkmann wrote:
>> On 12/20/23 7:25 PM, Alexei Starovoitov wrote:
>>> On Wed, Dec 20, 2023 at 6:58 AM Daniel Borkmann
>>> <daniel@iogearbox.net> wrote:
>>>> On 12/19/23 2:56 PM, Hou Tao wrote:
>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>
>>>>> According to the implementation of atomic_xchg() under x86-64, the
>>>>> lock
>>>>> prefix is not necessary for BPF_XCHG atomic operation, so just remove
>>>>> it.
>>>> It's probably a good idea for the commit message to explicitly quote
>>>> the
>>>> Intel docs in here, so it's easier to find on why the lock prefix would
>>>> not be needed for the xchg op.
>>> It's a surprise to me as well.
>>> Definitely more info would be good.
>>>
>>> Also if xchg insn without lock somehow implies lock in the HW
>>> what is the harm of adding it explicitly?
>>> If it's a lock in HW than performance with and without lock prefix
>>> should be the same, right?
>> e.g. 7.3.1.2 Exchange Instructions says:
>>
>> The XCHG (exchange) instruction swaps the contents of two operands.
>> This
>> instruction takes the place of three MOV instructions and does not
>> require
>> a temporary location to save the contents of one operand location while
>> the other is being loaded. When a memory operand is used with the XCHG
>> instruction, the processor’s LOCK signal is automatically asserted.
>>
>> Also curious if there is any harm adding it explicitly.
>>
>> .
> I could use the bpf ma benchmark to test it, but I doubt it will make
> any visible difference.
>
> .
The following is a performance comparison between the inlined
bpf_kptr_xchg() without and with the lock prefix. It seems that for the
helper without the lock prefix the peak free performance is indeed higher
than for the helper with the lock prefix, but the difference is so tiny
that it may just be fluctuation. So I will keep the JIT of BPF_XCHG as-is.
(1) inlined bpf_kptr_xchg() without lock prefix:
$ for i in 1 2 4 8; do ./bench -w3 -d10 bpf_ma -p${i} -a; done | grep Summary
Summary: per-prod alloc 11.50 ± 0.25M/s free 37.52 ± 0.59M/s, total
memory usage 0.00 ± 0.00MiB
Summary: per-prod alloc 8.52 ± 0.14M/s free 34.73 ± 0.46M/s, total
memory usage 0.01 ± 0.00MiB
Summary: per-prod alloc 7.40 ± 0.09M/s free 36.09 ± 0.65M/s, total
memory usage 0.02 ± 0.00MiB
Summary: per-prod alloc 6.62 ± 0.16M/s free 36.48 ± 2.06M/s, total
memory usage 0.07 ± 0.00MiB
$ for j in $(seq 3); do ./bench -w3 -d10 bpf_ma -p1 -a; done | grep Summary
Summary: per-prod alloc 10.81 ± 0.12M/s free 36.78 ± 0.35M/s, total
memory usage 0.00 ± 0.00MiB
Summary: per-prod alloc 10.77 ± 0.07M/s free 35.76 ± 0.37M/s, total
memory usage 0.01 ± 0.00MiB
Summary: per-prod alloc 10.61 ± 0.09M/s free 33.92 ± 0.41M/s, total
memory usage 0.00 ± 0.00MiB
(2) inlined bpf_kptr_xchg() with lock prefix:
$ for i in 1 2 4 8; do ./bench -w3 -d10 bpf_ma -p${i} -a; done | grep Summary
Summary: per-prod alloc 11.10 ± 0.19M/s free 36.07 ± 0.40M/s, total
memory usage 0.00 ± 0.00MiB
Summary: per-prod alloc 8.56 ± 0.13M/s free 35.74 ± 0.55M/s, total
memory usage 0.01 ± 0.00MiB
Summary: per-prod alloc 7.37 ± 0.05M/s free 35.78 ± 0.64M/s, total
memory usage 0.02 ± 0.00MiB
Summary: per-prod alloc 6.57 ± 0.10M/s free 33.72 ± 0.57M/s, total
memory usage 0.07 ± 0.00MiB
$ for j in $(seq 3); do ./bench -w3 -d10 bpf_ma -p1 -a; done | grep Summary
Summary: per-prod alloc 11.51 ± 0.20M/s free 35.42 ± 0.34M/s, total
memory usage 0.00 ± 0.00MiB
Summary: per-prod alloc 11.30 ± 0.08M/s free 34.81 ± 0.35M/s, total
memory usage 0.00 ± 0.00MiB
Summary: per-prod alloc 11.43 ± 0.11M/s free 35.43 ± 0.33M/s, total
memory usage 0.00 ± 0.00MiB
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH bpf-next 3/3] bpf, x86: Inline bpf_kptr_xchg() on x86-64
2023-12-19 13:56 [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
2023-12-19 13:56 ` [PATCH bpf-next 1/3] bpf: Support inlining bpf_kptr_xchg() helper Hou Tao
2023-12-19 13:56 ` [PATCH bpf-next 2/3] bpf, x86: Don't generate lock prefix for BPF_XCHG Hou Tao
@ 2023-12-19 13:56 ` Hou Tao
2023-12-20 14:54 ` [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Daniel Borkmann
3 siblings, 0 replies; 14+ messages in thread
From: Hou Tao @ 2023-12-19 13:56 UTC (permalink / raw)
To: bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1
From: Hou Tao <houtao1@huawei.com>
x86-64 supports the BPF_XCHG atomic operation. The implementations of
xchg() and atomic_xchg() are also the same, because both of these APIs
use arch_xchg() to implement the exchange. So enable the inlining of
bpf_kptr_xchg() on x86-64.
After the optimization, the performance of an artificial test which
fetches pointers through bpf_kptr_xchg() and frees them through
bpf_obj_drop() increases by about 4%.
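The per-arch opt-in relies on the weak-symbol pattern. A minimal user-space model (illustrative name; only the weak default is shown here, while in the kernel the x86-64 JIT supplies the strong definition returning true):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the __weak default in kernel/bpf/core.c: without an arch
 * override, inlining stays disabled.  An arch JIT that is safe for the
 * rewrite provides a strong definition of the same symbol returning
 * true, which the linker picks over this weak one. */
bool __attribute__((weak)) jit_supports_ptr_xchg_model(void)
{
	return false;	/* conservative default: keep the helper call */
}
```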
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
arch/x86/net/bpf_jit_comp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 49dac4d22a7b..9466c979ef9a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3201,3 +3201,8 @@ void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp
#endif
WARN(1, "verification of programs using bpf_throw should have failed\n");
}
+
+bool bpf_jit_supports_ptr_xchg(void)
+{
+ return true;
+}
--
2.29.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg()
2023-12-19 13:56 [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Hou Tao
` (2 preceding siblings ...)
2023-12-19 13:56 ` [PATCH bpf-next 3/3] bpf, x86: Inline bpf_kptr_xchg() on x86-64 Hou Tao
@ 2023-12-20 14:54 ` Daniel Borkmann
2023-12-21 11:32 ` Hou Tao
3 siblings, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2023-12-20 14:54 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1
On 12/19/23 2:56 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The motivation for the patch set comes from the performance profiling of
> the bpf memory allocator benchmark (will post it soon). The initial purpose
> of the benchmark is to test whether or not there is performance
> degradation when using c->unit_size instead of ksize() to select the
> target cache for free [1]. The benchmark uses bpf_kptr_xchg() to stash
> the allocated objects and fetches the stashed objects for free. Based on
> the fix proposed in [1], after inlining bpf_kptr_xchg(), the performance
> for object free increases by about 4%.
It would probably make more sense if you place this also in the actual
patch as motivation / use case on /why/ it's needed.
> Initially the inlining was implemented in do_jit() for x86-64 directly, but
> I think it will be more portable to implement the inlining in the verifier.
> Please see individual patches for more details. And comments are always
> welcome.
>
> [1]: https://lore.kernel.org/bpf/20231216131052.27621-1-houtao@huaweicloud.com
>
> Hou Tao (3):
> bpf: Support inlining bpf_kptr_xchg() helper
> bpf, x86: Don't generate lock prefix for BPF_XCHG
> bpf, x86: Inline bpf_kptr_xchg() on x86-64
>
> arch/x86/net/bpf_jit_comp.c | 9 ++++++++-
> include/linux/filter.h | 1 +
> kernel/bpf/core.c | 10 ++++++++++
> kernel/bpf/verifier.c | 17 +++++++++++++++++
> 4 files changed, 36 insertions(+), 1 deletion(-)
>
nit: Needs a rebase.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg()
2023-12-20 14:54 ` [PATCH bpf-next 0/3] bpf: inline bpf_kptr_xchg() Daniel Borkmann
@ 2023-12-21 11:32 ` Hou Tao
0 siblings, 0 replies; 14+ messages in thread
From: Hou Tao @ 2023-12-21 11:32 UTC (permalink / raw)
To: Daniel Borkmann, bpf
Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
Hao Luo, Yonghong Song, KP Singh, Stanislav Fomichev, Jiri Olsa,
John Fastabend, houtao1
Hi,
On 12/20/2023 10:54 PM, Daniel Borkmann wrote:
> On 12/19/23 2:56 PM, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
>>
>> The motivation for the patch set comes from the performance profiling of
>> bpf memory allocator benchmark (will post it soon). The initial purpose
>> of the benchmark is used to test whether or not there is performance
>> degradation when using c->unit_size instead of ksize() to select the
>> target cache for free [1]. The benchmark uses bpf_kptr_xchg() to stash
>> the allocated objects and fetches the stashed objects for free. Based on
>> the fix proposed in [1], After inling bpf_kptr_xchg(), the performance
>> for object free increase about ~4%.
>
> It would probably make more sense if you place this also in the actual
> patch as motivation / use case on /why/ it's needed.
Thanks for the suggestion. Will add it in the inlining patch.
>
>> Initially the inline is implemented in do_jit() for x86-64 directly, but
>> I think it will more portable to implement the inline in verifier.
>> Please see individual patches for more details. And comments are always
>> welcome.
>>
>> [1]:
>> https://lore.kernel.org/bpf/20231216131052.27621-1-houtao@huaweicloud.com
>>
>> Hou Tao (3):
>> bpf: Support inlining bpf_kptr_xchg() helper
>> bpf, x86: Don't generate lock prefix for BPF_XCHG
>> bpf, x86: Inline bpf_kptr_xchg() on x86-64
>>
>> arch/x86/net/bpf_jit_comp.c | 9 ++++++++-
>> include/linux/filter.h | 1 +
>> kernel/bpf/core.c | 10 ++++++++++
>> kernel/bpf/verifier.c | 17 +++++++++++++++++
>> 4 files changed, 36 insertions(+), 1 deletion(-)
>>
>
> nit: Needs a rebase.
> .
Will do.
^ permalink raw reply [flat|nested] 14+ messages in thread