* [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64
@ 2026-01-20 7:05 Menglong Dong
2026-01-20 7:05 ` [PATCH bpf-next v6 1/2] " Menglong Dong
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Menglong Dong @ 2026-01-20 7:05 UTC (permalink / raw)
To: ast, eddyz87
Cc: davem, dsahern, daniel, andrii, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx, mingo, bp,
dave.hansen, x86, hpa, netdev, bpf, linux-kernel
Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
to obtain better performance, and add a testcase for it.
Changes since v5:
* remove unnecessary 'ifdef' and __description in the selftests
* v5: https://lore.kernel.org/bpf/20260119070246.249499-1-dongml2@chinatelecom.cn/
Changes since v4:
* don't support the !CONFIG_SMP case
* v4: https://lore.kernel.org/bpf/20260112104529.224645-1-dongml2@chinatelecom.cn/
Changes since v3:
* handle the !CONFIG_SMP case
* ignore the !CONFIG_SMP case in the testcase, as we enable CONFIG_SMP
for x86_64 in the selftests
Changes since v2:
* implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
x86_64 JIT (Alexei).
Changes since v1:
* add the testcase
* remove the usage of const_current_task
Menglong Dong (2):
bpf, x86: inline bpf_get_current_task() for x86_64
selftests/bpf: test the jited inline of bpf_get_current_task
kernel/bpf/verifier.c | 22 +++++++++++++++++++
.../selftests/bpf/prog_tests/verifier.c | 2 ++
.../selftests/bpf/progs/verifier_jit_inline.c | 20 +++++++++++++++++
3 files changed, 44 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/verifier_jit_inline.c
--
2.52.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-20 7:05 [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 Menglong Dong
@ 2026-01-20 7:05 ` Menglong Dong
2026-01-21 1:23 ` Andrii Nakryiko
2026-01-20 7:05 ` [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task Menglong Dong
2026-01-21 4:50 ` [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 patchwork-bot+netdevbpf
2 siblings, 1 reply; 17+ messages in thread
From: Menglong Dong @ 2026-01-20 7:05 UTC (permalink / raw)
To: ast, eddyz87
Cc: davem, dsahern, daniel, andrii, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx, mingo, bp,
dave.hansen, x86, hpa, netdev, bpf, linux-kernel
Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
to obtain better performance.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
---
v5:
- don't support the !CONFIG_SMP case
v4:
- handle the !CONFIG_SMP case
v3:
- implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
x86_64 JIT.
---
kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9de0ec0c3ed9..c4e2ffadfb1f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
switch (imm) {
#ifdef CONFIG_X86_64
case BPF_FUNC_get_smp_processor_id:
+#ifdef CONFIG_SMP
+ case BPF_FUNC_get_current_task_btf:
+ case BPF_FUNC_get_current_task:
+#endif
return env->prog->jit_requested && bpf_jit_supports_percpu_insn();
#endif
default:
@@ -23319,6 +23323,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
insn = new_prog->insnsi + i + delta;
goto next_insn;
}
+
+ /* Implement bpf_get_current_task() and bpf_get_current_task_btf() inline. */
+ if ((insn->imm == BPF_FUNC_get_current_task || insn->imm == BPF_FUNC_get_current_task_btf) &&
+ verifier_inlines_helper_call(env, insn->imm)) {
+ insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, (u32)(unsigned long)&current_task);
+ insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
+ insn_buf[2] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
+ cnt = 3;
+
+ new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ goto next_insn;
+ }
#endif
/* Implement bpf_get_func_arg inline. */
if (prog_type == BPF_PROG_TYPE_TRACING &&
--
2.52.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task
2026-01-20 7:05 [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 Menglong Dong
2026-01-20 7:05 ` [PATCH bpf-next v6 1/2] " Menglong Dong
@ 2026-01-20 7:05 ` Menglong Dong
2026-01-20 17:52 ` Eduard Zingerman
2026-01-21 1:05 ` Andrii Nakryiko
2026-01-21 4:50 ` [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 patchwork-bot+netdevbpf
2 siblings, 2 replies; 17+ messages in thread
From: Menglong Dong @ 2026-01-20 7:05 UTC (permalink / raw)
To: ast, eddyz87
Cc: davem, dsahern, daniel, andrii, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx, mingo, bp,
dave.hansen, x86, hpa, netdev, bpf, linux-kernel
Add the testcase for the jited inline of bpf_get_current_task().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v6:
* remove unnecessary 'ifdef' and __description
---
.../selftests/bpf/prog_tests/verifier.c | 2 ++
.../selftests/bpf/progs/verifier_jit_inline.c | 20 +++++++++++++++++++
2 files changed, 22 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/verifier_jit_inline.c
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 38c5ba70100c..2ae7b096bd64 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -111,6 +111,7 @@
#include "verifier_xdp_direct_packet_access.skel.h"
#include "verifier_bits_iter.skel.h"
#include "verifier_lsm.skel.h"
+#include "verifier_jit_inline.skel.h"
#include "irq.skel.h"
#define MAX_ENTRIES 11
@@ -253,6 +254,7 @@ void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); }
void test_verifier_lsm(void) { RUN(verifier_lsm); }
void test_irq(void) { RUN(irq); }
void test_verifier_mtu(void) { RUN(verifier_mtu); }
+void test_verifier_jit_inline(void) { RUN(verifier_jit_inline); }
static int init_test_val_map(struct bpf_object *obj, char *map_name)
{
diff --git a/tools/testing/selftests/bpf/progs/verifier_jit_inline.c b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
new file mode 100644
index 000000000000..4ea254063646
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+SEC("fentry/bpf_fentry_test1")
+__success __retval(0)
+__arch_x86_64
+__jited(" addq %gs:{{.*}}, %rax")
+__arch_arm64
+__jited(" mrs x7, SP_EL0")
+int inline_bpf_get_current_task(void)
+{
+ bpf_get_current_task();
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.52.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task
2026-01-20 7:05 ` [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task Menglong Dong
@ 2026-01-20 17:52 ` Eduard Zingerman
2026-01-21 1:05 ` Andrii Nakryiko
1 sibling, 0 replies; 17+ messages in thread
From: Eduard Zingerman @ 2026-01-20 17:52 UTC (permalink / raw)
To: Menglong Dong, ast
Cc: davem, dsahern, daniel, andrii, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx, mingo, bp,
dave.hansen, x86, hpa, netdev, bpf, linux-kernel
On Tue, 2026-01-20 at 15:05 +0800, Menglong Dong wrote:
> Add the testcase for the jited inline of bpf_get_current_task().
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task
2026-01-20 7:05 ` [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task Menglong Dong
2026-01-20 17:52 ` Eduard Zingerman
@ 2026-01-21 1:05 ` Andrii Nakryiko
2026-01-21 1:28 ` Menglong Dong
1 sibling, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2026-01-21 1:05 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, eddyz87, davem, dsahern, daniel, andrii, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx,
mingo, bp, dave.hansen, x86, hpa, netdev, bpf, linux-kernel
On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Add the testcase for the jited inline of bpf_get_current_task().
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
> v6:
> * remove unnecessary 'ifdef' and __description
> ---
> .../selftests/bpf/prog_tests/verifier.c | 2 ++
> .../selftests/bpf/progs/verifier_jit_inline.c | 20 +++++++++++++++++++
> 2 files changed, 22 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/verifier_jit_inline.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
> index 38c5ba70100c..2ae7b096bd64 100644
> --- a/tools/testing/selftests/bpf/prog_tests/verifier.c
> +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
> @@ -111,6 +111,7 @@
> #include "verifier_xdp_direct_packet_access.skel.h"
> #include "verifier_bits_iter.skel.h"
> #include "verifier_lsm.skel.h"
> +#include "verifier_jit_inline.skel.h"
> #include "irq.skel.h"
>
> #define MAX_ENTRIES 11
> @@ -253,6 +254,7 @@ void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); }
> void test_verifier_lsm(void) { RUN(verifier_lsm); }
> void test_irq(void) { RUN(irq); }
> void test_verifier_mtu(void) { RUN(verifier_mtu); }
> +void test_verifier_jit_inline(void) { RUN(verifier_jit_inline); }
>
> static int init_test_val_map(struct bpf_object *obj, char *map_name)
> {
> diff --git a/tools/testing/selftests/bpf/progs/verifier_jit_inline.c b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
> new file mode 100644
> index 000000000000..4ea254063646
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
> @@ -0,0 +1,20 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_misc.h"
> +
> +SEC("fentry/bpf_fentry_test1")
> +__success __retval(0)
> +__arch_x86_64
> +__jited(" addq %gs:{{.*}}, %rax")
> +__arch_arm64
> +__jited(" mrs x7, SP_EL0")
I was confused to see this, as your patch actually implements inlining
only on x86-64. And then it turned out that on arm64 we inline this in
JIT. But Eduard also noticed that we actually SKIP this test on arm64
because of missing LLVM dependency, so that's not great.
So we should do something about silently skipped tests at least...
> +int inline_bpf_get_current_task(void)
> +{
> + bpf_get_current_task();
> +
> + return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-20 7:05 ` [PATCH bpf-next v6 1/2] " Menglong Dong
@ 2026-01-21 1:23 ` Andrii Nakryiko
2026-01-21 1:43 ` Alexei Starovoitov
2026-01-21 1:58 ` Menglong Dong
0 siblings, 2 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2026-01-21 1:23 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, eddyz87, davem, dsahern, daniel, andrii, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx,
mingo, bp, dave.hansen, x86, hpa, netdev, bpf, linux-kernel
On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> to obtain better performance.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
> v5:
> - don't support the !CONFIG_SMP case
>
> v4:
> - handle the !CONFIG_SMP case
>
> v3:
> - implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
> x86_64 JIT.
> ---
> kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9de0ec0c3ed9..c4e2ffadfb1f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> switch (imm) {
> #ifdef CONFIG_X86_64
> case BPF_FUNC_get_smp_processor_id:
> +#ifdef CONFIG_SMP
> + case BPF_FUNC_get_current_task_btf:
> + case BPF_FUNC_get_current_task:
> +#endif
Does this have to be x86-64 specific inlining? With verifier inlining
and per_cpu instruction support it should theoretically work across
all architectures that do support per-cpu instruction, no?
Eduard pointed out [0] to me for why we have that x86-64 specific
check. But looking at do_misc_fixups(), we have that early
bpf_jit_inlines_helper_call(insn->imm)) check, so if some JIT has more
performant inlining implementation, we will just do that.
So it seems like we can just drop all that x86-64 specific logic and
claim all three of these functions as inlinable, no?
And even more. We can drop rather confusing
verifier_inlines_helper_call() that duplicates the decision of which
helpers can be inlined or not, and have:
if (env->prog->jit_requested && bpf_jit_supports_percpu_insn()) {
switch (insn->imm) {
case BPF_FUNC_get_smp_processor_id:
...
break;
case BPF_FUNC_get_current_task_btf:
case BPF_FUNC_get_current_task:
...
break;
default:
}
And the decision about inlining will live in one place.
Or am I missing some complications?
And with all that, should we mark get_current_task and
get_current_task_btf as __bpf_fastcall?
[0] https://lore.kernel.org/all/20240722233844.1406874-4-eddyz87@gmail.com/
> return env->prog->jit_requested && bpf_jit_supports_percpu_insn();
> #endif
> default:
> @@ -23319,6 +23323,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> insn = new_prog->insnsi + i + delta;
> goto next_insn;
> }
> +
> + /* Implement bpf_get_current_task() and bpf_get_current_task_btf() inline. */
> + if ((insn->imm == BPF_FUNC_get_current_task || insn->imm == BPF_FUNC_get_current_task_btf) &&
> + verifier_inlines_helper_call(env, insn->imm)) {
> + insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, (u32)(unsigned long)&current_task);
> + insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
> + insn_buf[2] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
> + cnt = 3;
> +
> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> + if (!new_prog)
> + return -ENOMEM;
> +
> + delta += cnt - 1;
> + env->prog = prog = new_prog;
> + insn = new_prog->insnsi + i + delta;
> + goto next_insn;
> + }
> #endif
> /* Implement bpf_get_func_arg inline. */
> if (prog_type == BPF_PROG_TYPE_TRACING &&
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task
2026-01-21 1:05 ` Andrii Nakryiko
@ 2026-01-21 1:28 ` Menglong Dong
2026-01-21 1:32 ` Eduard Zingerman
0 siblings, 1 reply; 17+ messages in thread
From: Menglong Dong @ 2026-01-21 1:28 UTC (permalink / raw)
To: Menglong Dong, Andrii Nakryiko
Cc: ast, eddyz87, davem, dsahern, daniel, andrii, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx,
mingo, bp, dave.hansen, x86, hpa, netdev, bpf, linux-kernel
On 2026/1/21 09:05 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Add the testcase for the jited inline of bpf_get_current_task().
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > ---
> > v6:
> > * remove unnecessary 'ifdef' and __description
> > ---
> > .../selftests/bpf/prog_tests/verifier.c | 2 ++
> > .../selftests/bpf/progs/verifier_jit_inline.c | 20 +++++++++++++++++++
> > 2 files changed, 22 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/progs/verifier_jit_inline.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
> > index 38c5ba70100c..2ae7b096bd64 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/verifier.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
> > @@ -111,6 +111,7 @@
> > #include "verifier_xdp_direct_packet_access.skel.h"
> > #include "verifier_bits_iter.skel.h"
> > #include "verifier_lsm.skel.h"
> > +#include "verifier_jit_inline.skel.h"
> > #include "irq.skel.h"
> >
> > #define MAX_ENTRIES 11
> > @@ -253,6 +254,7 @@ void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); }
> > void test_verifier_lsm(void) { RUN(verifier_lsm); }
> > void test_irq(void) { RUN(irq); }
> > void test_verifier_mtu(void) { RUN(verifier_mtu); }
> > +void test_verifier_jit_inline(void) { RUN(verifier_jit_inline); }
> >
> > static int init_test_val_map(struct bpf_object *obj, char *map_name)
> > {
> > diff --git a/tools/testing/selftests/bpf/progs/verifier_jit_inline.c b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
> > new file mode 100644
> > index 000000000000..4ea254063646
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/verifier_jit_inline.c
> > @@ -0,0 +1,20 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include "bpf_misc.h"
> > +
> > +SEC("fentry/bpf_fentry_test1")
> > +__success __retval(0)
> > +__arch_x86_64
> > +__jited(" addq %gs:{{.*}}, %rax")
> > +__arch_arm64
> > +__jited(" mrs x7, SP_EL0")
>
> I was confused to see this, as your patch actually implements inlining
> only on x86-64. And then it turned out that on arm64 we inline this in
Yeah, arm64 implements it already, and I added the test for it
as well.
> JIT. But Eduard also noticed that we actually SKIP this test on arm64
> because of missing LLVM dependency, so that's not great.
Do you mean that the arm64 CI doesn't use LLVM for the selftests?
Noted. I found that there are other similar "__jited" tests for
arm64; is there anything we can do?
PS: I tested arm64 locally, and it works fine.
>
> So we should do something about silently skipped tests at least...
Like a warning?
Thanks!
Menglong Dong
>
> > +int inline_bpf_get_current_task(void)
> > +{
> > + bpf_get_current_task();
> > +
> > + return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > --
> > 2.52.0
> >
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task
2026-01-21 1:28 ` Menglong Dong
@ 2026-01-21 1:32 ` Eduard Zingerman
2026-01-21 3:03 ` Menglong Dong
0 siblings, 1 reply; 17+ messages in thread
From: Eduard Zingerman @ 2026-01-21 1:32 UTC (permalink / raw)
To: Menglong Dong, Menglong Dong, Andrii Nakryiko
Cc: ast, davem, dsahern, daniel, andrii, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx,
mingo, bp, dave.hansen, x86, hpa, netdev, bpf, linux-kernel
On Wed, 2026-01-21 at 09:28 +0800, Menglong Dong wrote:
[...]
> Do you mean that the CI of arm64 doesn't use LLVM for the selftests?
> I noted that. I found that there are other similar "__jited" testings for
> arm64, is there anything we can do?
>
> PS: I tested the arm64 locally, and it works fine.
>
> >
> > So we should do something about silently skipped tests at least...
>
> Like a warning?
Yes, probably the llvm-devel or libs dependency is missing,
hence JIT-related selftests are skipped. Same thing for x86.
Discussed with Andrii making llvm an opt-out dependency:
fail selftests compilation if libraries are not found and SKIP_LLVM is not set.
We plan to address CI config issue tomorrow.
[...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 1:23 ` Andrii Nakryiko
@ 2026-01-21 1:43 ` Alexei Starovoitov
2026-01-21 1:58 ` Menglong Dong
1 sibling, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2026-01-21 1:43 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Menglong Dong, Alexei Starovoitov, Eduard, David S. Miller,
David Ahern, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Song Liu, Yonghong Song, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
Network Development, bpf, LKML
On Tue, Jan 20, 2026 at 5:24 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> > to obtain better performance.
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> > v5:
> > - don't support the !CONFIG_SMP case
> >
> > v4:
> > - handle the !CONFIG_SMP case
> >
> > v3:
> > - implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
> > x86_64 JIT.
> > ---
> > kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9de0ec0c3ed9..c4e2ffadfb1f 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> > switch (imm) {
> > #ifdef CONFIG_X86_64
> > case BPF_FUNC_get_smp_processor_id:
> > +#ifdef CONFIG_SMP
> > + case BPF_FUNC_get_current_task_btf:
> > + case BPF_FUNC_get_current_task:
> > +#endif
>
> Does this have to be x86-64 specific inlining? With verifier inlining
> and per_cpu instruction support it should theoretically work across
> all architectures that do support per-cpu instruction, no?
>
> Eduard pointed out [0] to me for why we have that x86-64 specific
> check. But looking at do_misc_fixups(), we have that early
> bpf_jit_inlines_helper_call(insn->imm)) check, so if some JIT has more
> performant inlining implementation, we will just do that.
>
> So it seems like we can just drop all that x86-64 specific logic and
> claim all three of these functions as inlinable, no?
>
> And even more. We can drop rather confusing
> verifier_inlines_helper_call() that duplicates the decision of which
> helpers can be inlined or not, and have:
>
> if (env->prog->jit_requested && bpf_jit_supports_percpu_insn()) {
> switch (insn->imm) {
> case BPF_FUNC_get_smp_processor_id:
> ...
> break;
> case BPF_FUNC_get_current_task_btf:
> case BPF_FUNC_get_current_task:
> ...
> break;
> default:
> }
>
> And the decision about inlining will live in one place.
>
> Or am I missing some complications?
I think it needs to be arch specific, since 'current' is arch
specific. x86 is different from arm64.
Though both JITs support percpu pseudo insn, it doesn't help
to make get_current inlining generic.
One has to analyze each arch individually.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 1:23 ` Andrii Nakryiko
2026-01-21 1:43 ` Alexei Starovoitov
@ 2026-01-21 1:58 ` Menglong Dong
2026-01-21 3:10 ` Alexei Starovoitov
1 sibling, 1 reply; 17+ messages in thread
From: Menglong Dong @ 2026-01-21 1:58 UTC (permalink / raw)
To: Menglong Dong, Andrii Nakryiko
Cc: ast, eddyz87, davem, dsahern, daniel, andrii, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx,
mingo, bp, dave.hansen, x86, hpa, netdev, bpf, linux-kernel
On 2026/1/21 09:23 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> > to obtain better performance.
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> > v5:
> > - don't support the !CONFIG_SMP case
> >
> > v4:
> > - handle the !CONFIG_SMP case
> >
> > v3:
> > - implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
> > x86_64 JIT.
> > ---
> > kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9de0ec0c3ed9..c4e2ffadfb1f 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> > switch (imm) {
> > #ifdef CONFIG_X86_64
> > case BPF_FUNC_get_smp_processor_id:
> > +#ifdef CONFIG_SMP
> > + case BPF_FUNC_get_current_task_btf:
> > + case BPF_FUNC_get_current_task:
> > +#endif
>
> Does this have to be x86-64 specific inlining? With verifier inlining
> and per_cpu instruction support it should theoretically work across
> all architectures that do support per-cpu instruction, no?
>
> Eduard pointed out [0] to me for why we have that x86-64 specific
> check. But looking at do_misc_fixups(), we have that early
> bpf_jit_inlines_helper_call(insn->imm)) check, so if some JIT has more
> performant inlining implementation, we will just do that.
>
> So it seems like we can just drop all that x86-64 specific logic and
> claim all three of these functions as inlinable, no?
>
> And even more. We can drop rather confusing
> verifier_inlines_helper_call() that duplicates the decision of which
> helpers can be inlined or not, and have:
verifier_inlines_helper_call() is confusing, but I think we can't
remove the x86-64 check. For example, some architectures
support BPF_FUNC_get_current_task in neither
bpf_jit_inlines_helper_call() nor verifier_inlines_helper_call(), which
means it can't be inlined.
>
> if (env->prog->jit_requested && bpf_jit_supports_percpu_insn()) {
> switch (insn->imm) {
> case BPF_FUNC_get_smp_processor_id:
> ...
> break;
> case BPF_FUNC_get_current_task_btf:
> case BPF_FUNC_get_current_task:
> ...
> break;
> default:
> }
>
> And the decision about inlining will live in one place.
>
> Or am I missing some complications?
As Alexei said, the implementation of "current" is architecture specific,
and the per-cpu variable "current_task" only exists on x86_64.
>
> And with all that, should we mark get_current_task and
> get_current_task_btf as __bpf_fastcall?
I think it makes sense; I saw that bpf_get_smp_processor_id already
does this:
const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
[...]
.allow_fastcall = true,
};
PS: I'm a little confused about fastcall. We inline many helpers,
but it seems that bpf_get_smp_processor_id is the only one that
uses "allow_fastcall". Why? I'd better study it further.
Thanks!
Menglong Dong
>
>
> [0] https://lore.kernel.org/all/20240722233844.1406874-4-eddyz87@gmail.com/
>
> > return env->prog->jit_requested && bpf_jit_supports_percpu_insn();
> > #endif
> > default:
> > @@ -23319,6 +23323,24 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> > insn = new_prog->insnsi + i + delta;
> > goto next_insn;
> > }
> > +
> > + /* Implement bpf_get_current_task() and bpf_get_current_task_btf() inline. */
> > + if ((insn->imm == BPF_FUNC_get_current_task || insn->imm == BPF_FUNC_get_current_task_btf) &&
> > + verifier_inlines_helper_call(env, insn->imm)) {
> > + insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, (u32)(unsigned long)&current_task);
> > + insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
> > + insn_buf[2] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
> > + cnt = 3;
> > +
> > + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> > + if (!new_prog)
> > + return -ENOMEM;
> > +
> > + delta += cnt - 1;
> > + env->prog = prog = new_prog;
> > + insn = new_prog->insnsi + i + delta;
> > + goto next_insn;
> > + }
> > #endif
> > /* Implement bpf_get_func_arg inline. */
> > if (prog_type == BPF_PROG_TYPE_TRACING &&
> > --
> > 2.52.0
> >
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task
2026-01-21 1:32 ` Eduard Zingerman
@ 2026-01-21 3:03 ` Menglong Dong
0 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2026-01-21 3:03 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Menglong Dong, Andrii Nakryiko, ast, davem, dsahern, daniel,
andrii, martin.lau, song, yonghong.song, john.fastabend, kpsingh,
sdf, haoluo, jolsa, tglx, mingo, bp, dave.hansen, x86, hpa,
netdev, bpf, linux-kernel
On Wed, Jan 21, 2026 at 9:32 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2026-01-21 at 09:28 +0800, Menglong Dong wrote:
>
> [...]
>
> > Do you mean that the CI of arm64 doesn't use LLVM for the selftests?
> > I noted that. I found that there are other similar "__jited" testings for
> > arm64, is there anything we can do?
> >
> > PS: I tested the arm64 locally, and it works fine.
> >
> > >
> > > So we should do something about silently skipped tests at least...
> >
> > Like a warning?
>
> Yes, probably llvm-devel or libs dependency is missing,
> hence jit related selftests are skipped. Same thing for x86.
> Discussed with Andrii making llvm an opt-out dependency:
> fail selftests compilation if libraries are not found and SKIP_LLVM is not set.
Sounds nice. People may not always be aware of the LLVM
dependency.
So, is there anything I can do in this series?
Thanks!
Menglong Dong
> We plan to address CI config issue tomorrow.
>
> [...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 1:58 ` Menglong Dong
@ 2026-01-21 3:10 ` Alexei Starovoitov
2026-01-21 3:37 ` Menglong Dong
2026-01-21 4:12 ` Andrii Nakryiko
0 siblings, 2 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2026-01-21 3:10 UTC (permalink / raw)
To: Menglong Dong
Cc: Menglong Dong, Andrii Nakryiko, Alexei Starovoitov, Eduard,
David S. Miller, David Ahern, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
Network Development, bpf, LKML
On Tue, Jan 20, 2026 at 5:58 PM Menglong Dong <menglong.dong@linux.dev> wrote:
>
> On 2026/1/21 09:23 Andrii Nakryiko <andrii.nakryiko@gmail.com> write:
> > On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >
> > > Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> > > to obtain better performance.
> > >
> > > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > > ---
> > > v5:
> > > - don't support the !CONFIG_SMP case
> > >
> > > v4:
> > > - handle the !CONFIG_SMP case
> > >
> > > v3:
> > > - implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
> > > x86_64 JIT.
> > > ---
> > > kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
> > > 1 file changed, 22 insertions(+)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 9de0ec0c3ed9..c4e2ffadfb1f 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> > > switch (imm) {
> > > #ifdef CONFIG_X86_64
> > > case BPF_FUNC_get_smp_processor_id:
> > > +#ifdef CONFIG_SMP
> > > + case BPF_FUNC_get_current_task_btf:
> > > + case BPF_FUNC_get_current_task:
> > > +#endif
> >
> > Does this have to be x86-64 specific inlining? With verifier inlining
> > and per_cpu instruction support it should theoretically work across
> > all architectures that do support per-cpu instruction, no?
> >
> > Eduard pointed out [0] to me for why we have that x86-64 specific
> > check. But looking at do_misc_fixups(), we have that early
> > bpf_jit_inlines_helper_call(insn->imm)) check, so if some JIT has more
> > performant inlining implementation, we will just do that.
> >
> > So it seems like we can just drop all that x86-64 specific logic and
> > claim all three of these functions as inlinable, no?
> >
> > And even more. We can drop rather confusing
> > verifier_inlines_helper_call() that duplicates the decision of which
> > helpers can be inlined or not, and have:
>
> The verifier_inlines_helper_call() is confusing, but I think we can't
> remove the x86-64 check. For example, some architectures
> support BPF_FUNC_get_current_task in neither
> bpf_jit_inlines_helper_call() nor verifier_inlines_helper_call(), which
> means it can't be inlined.
>
> >
> > if (env->prog->jit_requested && bpf_jit_supports_percpu_insn()) {
> > switch (insn->imm) {
> > case BPF_FUNC_get_smp_processor_id:
> > ...
> > break;
> > case BPF_FUNC_get_current_task_btf:
> > case BPF_FUNC_get_current_task:
> > ...
> > break;
> > default:
> > }
> >
> > And the decision about inlining will live in one place.
> >
> > Or am I missing some complications?
>
> As Alexei said, the implementation of "current" is architecture-specific,
> and the per-cpu variable "current_task" only exists on x86_64.
>
> >
> > And with all that, should we mark get_current_task and
> > get_current_task_btf as __bpf_fastcall?
>
> I think it makes sense, and I saw that bpf_get_smp_processor_id already
> does this:
>
> const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
> [...]
> .allow_fastcall = true,
> };
>
> PS: I'm a little confused about fastcall. We inline many helpers,
> but it seems that bpf_get_smp_processor_id is the only one that
> uses "allow_fastcall". Why? I'd better study harder.
It's
static __bpf_fastcall __u32 (* const bpf_get_smp_processor_id)(void) =
(void *) 8;
and
#define __bpf_fastcall __attribute__((bpf_fastcall))
which makes LLVM use more registers at the callsite (less spill/fill).
Looking at the patch again. I think it's fine as-is.
fastcall can be a follow up.
^ permalink raw reply [flat|nested] 17+ messages in thread
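[Editor's note: for readers following along, the inlining discussed above happens in do_misc_fixups(). Judging from the existing BPF_FUNC_get_smp_processor_id path, the emitted sequence is roughly the following. This is a sketch inferred from the smp_processor_id case, not the literal patch hunk; the symbol name and exact instruction count may differ from the applied commit.]

```c
/* Sketch: materialize the per-cpu offset of current_task, convert it
 * to an absolute per-cpu address with BPF_MOV64_PERCPU_REG (which the
 * x86_64 JIT turns into a gs-relative add), then load the task_struct
 * pointer. Based on the bpf_get_smp_processor_id inlining pattern.
 */
insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&current_task);
insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
insn_buf[2] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
cnt = 3;
```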
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 3:10 ` Alexei Starovoitov
@ 2026-01-21 3:37 ` Menglong Dong
2026-01-21 4:12 ` Andrii Nakryiko
1 sibling, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2026-01-21 3:37 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Menglong Dong, Andrii Nakryiko, Alexei Starovoitov, Eduard,
David S. Miller, David Ahern, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
Network Development, bpf, LKML
On 2026/1/21 11:10 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Tue, Jan 20, 2026 at 5:58 PM Menglong Dong <menglong.dong@linux.dev> wrote:
> >
> > On 2026/1/21 09:23 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > >
> > > > Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> > > > to obtain better performance.
> > > >
> > > > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > > > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > > > ---
> > > > v5:
> > > > - don't support the !CONFIG_SMP case
> > > >
> > > > v4:
> > > > - handle the !CONFIG_SMP case
> > > >
> > > > v3:
> > > > - implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
> > > > x86_64 JIT.
> > > > ---
> > > > kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
> > > > 1 file changed, 22 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 9de0ec0c3ed9..c4e2ffadfb1f 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> > > > switch (imm) {
> > > > #ifdef CONFIG_X86_64
> > > > case BPF_FUNC_get_smp_processor_id:
> > > > +#ifdef CONFIG_SMP
> > > > + case BPF_FUNC_get_current_task_btf:
> > > > + case BPF_FUNC_get_current_task:
> > > > +#endif
> > >
> > > Does this have to be x86-64 specific inlining? With verifier inlining
> > > and per_cpu instruction support it should theoretically work across
> > > all architectures that do support per-cpu instruction, no?
> > >
> > > Eduard pointed out [0] to me for why we have that x86-64 specific
> > > check. But looking at do_misc_fixups(), we have that early
> > > bpf_jit_inlines_helper_call(insn->imm)) check, so if some JIT has more
> > > performant inlining implementation, we will just do that.
> > >
> > > So it seems like we can just drop all that x86-64 specific logic and
> > > claim all three of these functions as inlinable, no?
> > >
> > > And even more. We can drop rather confusing
> > > verifier_inlines_helper_call() that duplicates the decision of which
> > > helpers can be inlined or not, and have:
> >
> > The verifier_inlines_helper_call() is confusing, but I think we can't
> > remove the x86-64 check. For example, some architectures
> > support BPF_FUNC_get_current_task in neither
> > bpf_jit_inlines_helper_call() nor verifier_inlines_helper_call(), which
> > means it can't be inlined.
> >
> > >
> > > if (env->prog->jit_requested && bpf_jit_supports_percpu_insn()) {
> > > switch (insn->imm) {
> > > case BPF_FUNC_get_smp_processor_id:
> > > ...
> > > break;
> > > case BPF_FUNC_get_current_task_btf:
> > > case BPF_FUNC_get_current_task:
> > > ...
> > > break;
> > > default:
> > > }
> > >
> > > And the decision about inlining will live in one place.
> > >
> > > Or am I missing some complications?
> >
> > As Alexei said, the implementation of "current" is architecture-specific,
> > and the per-cpu variable "current_task" only exists on x86_64.
> >
> > >
> > > And with all that, should we mark get_current_task and
> > > get_current_task_btf as __bpf_fastcall?
> >
> > I think it makes sense, and I saw that bpf_get_smp_processor_id already
> > does this:
> >
> > const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
> > [...]
> > .allow_fastcall = true,
> > };
> >
> > PS: I'm a little confused about fastcall. We inline many helpers,
> > but it seems that bpf_get_smp_processor_id is the only one that
> > uses "allow_fastcall". Why? I'd better study harder.
>
> It's
> static __bpf_fastcall __u32 (* const bpf_get_smp_processor_id)(void) =
> (void *) 8;
>
> and
> #define __bpf_fastcall __attribute__((bpf_fastcall))
Ah, I see. It seems that bpf_doc.py does the trick.
>
> which makes LLVM use more registers at the callsite (less spill/fill).
>
> Looking at the patch again. I think it's fine as-is.
> fastcall can be a follow up.
Okay!
Thanks!
Menglong Dong
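[Editor's note: judging from bpf_get_smp_processor_id_proto quoted above, the fastcall follow-up Alexei mentions would amount to setting .allow_fastcall on the helper proto. The sketch below is hypothetical; the fields other than allow_fastcall are reconstructed from my reading of the current in-tree proto and may not match the tree exactly.]

```c
/* Hypothetical follow-up (not part of this series): allow the compiler
 * to treat bpf_get_current_task_btf() as bpf_fastcall, avoiding
 * spill/fill of live registers around the (now inlined) call.
 */
const struct bpf_func_proto bpf_get_current_task_btf_proto = {
	.func		= bpf_get_current_task_btf,
	.gpl_only	= true,
	.ret_type	= RET_PTR_TO_BTF_ID_TRUSTED,
	.ret_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
	.allow_fastcall	= true,	/* the proposed new line */
};
```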
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 3:10 ` Alexei Starovoitov
2026-01-21 3:37 ` Menglong Dong
@ 2026-01-21 4:12 ` Andrii Nakryiko
2026-01-21 4:46 ` Alexei Starovoitov
1 sibling, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2026-01-21 4:12 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, Eduard,
David S. Miller, David Ahern, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
Network Development, bpf, LKML
On Tue, Jan 20, 2026 at 7:10 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 20, 2026 at 5:58 PM Menglong Dong <menglong.dong@linux.dev> wrote:
> >
> > On 2026/1/21 09:23 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > On Mon, Jan 19, 2026 at 11:06 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > >
> > > > Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> > > > to obtain better performance.
> > > >
> > > > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > > > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > > > ---
> > > > v5:
> > > > - don't support the !CONFIG_SMP case
> > > >
> > > > v4:
> > > > - handle the !CONFIG_SMP case
> > > >
> > > > v3:
> > > > - implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in
> > > > x86_64 JIT.
> > > > ---
> > > > kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
> > > > 1 file changed, 22 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 9de0ec0c3ed9..c4e2ffadfb1f 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -17739,6 +17739,10 @@ static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> > > > switch (imm) {
> > > > #ifdef CONFIG_X86_64
> > > > case BPF_FUNC_get_smp_processor_id:
> > > > +#ifdef CONFIG_SMP
> > > > + case BPF_FUNC_get_current_task_btf:
> > > > + case BPF_FUNC_get_current_task:
> > > > +#endif
> > >
> > > Does this have to be x86-64 specific inlining? With verifier inlining
> > > and per_cpu instruction support it should theoretically work across
> > > all architectures that do support per-cpu instruction, no?
> > >
> > > Eduard pointed out [0] to me for why we have that x86-64 specific
> > > check. But looking at do_misc_fixups(), we have that early
> > > bpf_jit_inlines_helper_call(insn->imm)) check, so if some JIT has more
> > > performant inlining implementation, we will just do that.
> > >
> > > So it seems like we can just drop all that x86-64 specific logic and
> > > claim all three of these functions as inlinable, no?
> > >
> > > And even more. We can drop rather confusing
> > > verifier_inlines_helper_call() that duplicates the decision of which
> > > helpers can be inlined or not, and have:
> >
> > The verifier_inlines_helper_call() is confusing, but I think we can't
> > remove the x86-64 check. For example, some architectures
> > support BPF_FUNC_get_current_task in neither
> > bpf_jit_inlines_helper_call() nor verifier_inlines_helper_call(), which
> > means it can't be inlined.
> >
> > >
> > > if (env->prog->jit_requested && bpf_jit_supports_percpu_insn()) {
> > > switch (insn->imm) {
> > > case BPF_FUNC_get_smp_processor_id:
> > > ...
> > > break;
> > > case BPF_FUNC_get_current_task_btf:
> > > case BPF_FUNC_get_current_task:
> > > ...
> > > break;
> > > default:
> > > }
> > >
> > > And the decision about inlining will live in one place.
> > >
> > > Or am I missing some complications?
> >
> > As Alexei said, the implementation of "current" is architecture-specific,
> > and the per-cpu variable "current_task" only exists on x86_64.
> >
Ah, ok, that's the complication :)
> > >
> > > And with all that, should we mark get_current_task and
> > > get_current_task_btf as __bpf_fastcall?
> >
> > I think it makes sense, and I saw that bpf_get_smp_processor_id already
> > does this:
> >
> > const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
> > [...]
> > .allow_fastcall = true,
> > };
> >
> > PS: I'm a little confused about fastcall. We inline many helpers,
> > but it seems that bpf_get_smp_processor_id is the only one that
> > uses "allow_fastcall". Why? I'd better study harder.
>
> It's
> static __bpf_fastcall __u32 (* const bpf_get_smp_processor_id)(void) =
> (void *) 8;
>
> and
> #define __bpf_fastcall __attribute__((bpf_fastcall))
>
> which makes LLVM use more registers at the callsite (less spill/fill).
>
> Looking at the patch again. I think it's fine as-is.
> fastcall can be a follow up.
Yeah, it's fine as is. But it still seems like
verifier_inlines_helper_call() is an unnecessary extra hop we can
remove (even if it has to stay arch-specific).
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 4:12 ` Andrii Nakryiko
@ 2026-01-21 4:46 ` Alexei Starovoitov
2026-01-21 6:35 ` Andrii Nakryiko
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2026-01-21 4:46 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, Eduard,
David S. Miller, David Ahern, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
Network Development, bpf, LKML
On Tue, Jan 20, 2026 at 8:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> >
> > Looking at the patch again. I think it's fine as-is.
> > fastcall can be a follow up.
>
> Yeah, it's fine as is. But it still seems like
Thanks!
> verifier_inlines_helper_call() is an unnecessary extra hop we can
> remove (even if it has to stay arch-specific).
I'm not sure that we can, since it's used in two places:
get_call_summary():
cs->fastcall = fn->allow_fastcall &&
(verifier_inlines_helper_call(env, call->imm) ||
bpf_jit_inlines_helper_call(call->imm));
* Re: [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-20 7:05 [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 Menglong Dong
2026-01-20 7:05 ` [PATCH bpf-next v6 1/2] " Menglong Dong
2026-01-20 7:05 ` [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task Menglong Dong
@ 2026-01-21 4:50 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-01-21 4:50 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, eddyz87, davem, dsahern, daniel, andrii, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, tglx,
mingo, bp, dave.hansen, x86, hpa, netdev, bpf, linux-kernel
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Tue, 20 Jan 2026 15:05:53 +0800 you wrote:
> Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64
> to obtain better performance, and add the testcase for it.
>
> Changes since v5:
> * remove unnecessary 'ifdef' and __description in the selftests
> * v5: https://lore.kernel.org/bpf/20260119070246.249499-1-dongml2@chinatelecom.cn/
>
> [...]
Here is the summary with links:
- [bpf-next,v6,1/2] bpf, x86: inline bpf_get_current_task() for x86_64
https://git.kernel.org/bpf/bpf-next/c/eaedea154eb9
- [bpf-next,v6,2/2] selftests/bpf: test the jited inline of bpf_get_current_task
https://git.kernel.org/bpf/bpf-next/c/4fca95095cdc
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH bpf-next v6 1/2] bpf, x86: inline bpf_get_current_task() for x86_64
2026-01-21 4:46 ` Alexei Starovoitov
@ 2026-01-21 6:35 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2026-01-21 6:35 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, Eduard,
David S. Miller, David Ahern, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
Network Development, bpf, LKML
On Tue, Jan 20, 2026 at 8:46 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 20, 2026 at 8:12 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > >
> > > Looking at the patch again. I think it's fine as-is.
> > > fastcall can be a follow up.
> >
> > Yeah, it's fine as is. But it still seems like
>
> Thanks!
>
> > verifier_inlines_helper_call() is an unnecessary extra hop we can
> > remove (even if it has to stay arch-specific).
>
> I'm not sure that we can, since it's used in two places:
> get_call_summary():
> cs->fastcall = fn->allow_fastcall &&
> (verifier_inlines_helper_call(env, call->imm) ||
> bpf_jit_inlines_helper_call(call->imm));
well then, just ignore me :)
end of thread, other threads:[~2026-01-21 6:36 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-20 7:05 [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 Menglong Dong
2026-01-20 7:05 ` [PATCH bpf-next v6 1/2] " Menglong Dong
2026-01-21 1:23 ` Andrii Nakryiko
2026-01-21 1:43 ` Alexei Starovoitov
2026-01-21 1:58 ` Menglong Dong
2026-01-21 3:10 ` Alexei Starovoitov
2026-01-21 3:37 ` Menglong Dong
2026-01-21 4:12 ` Andrii Nakryiko
2026-01-21 4:46 ` Alexei Starovoitov
2026-01-21 6:35 ` Andrii Nakryiko
2026-01-20 7:05 ` [PATCH bpf-next v6 2/2] selftests/bpf: test the jited inline of bpf_get_current_task Menglong Dong
2026-01-20 17:52 ` Eduard Zingerman
2026-01-21 1:05 ` Andrii Nakryiko
2026-01-21 1:28 ` Menglong Dong
2026-01-21 1:32 ` Eduard Zingerman
2026-01-21 3:03 ` Menglong Dong
2026-01-21 4:50 ` [PATCH bpf-next v6 0/2] bpf, x86: inline bpf_get_current_task() for x86_64 patchwork-bot+netdevbpf