* [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code
@ 2024-04-25 0:02 Andrii Nakryiko
2024-04-29 22:38 ` Andrii Nakryiko
0 siblings, 1 reply; 4+ messages in thread
From: Andrii Nakryiko @ 2024-04-25 0:02 UTC (permalink / raw)
To: linux-trace-kernel, rostedt, mhiramat; +Cc: bpf, Andrii Nakryiko
At the lowest level, rethook-based kretprobes on x86-64 architecture go
through arch_rethoook_trampoline() function, manually written in
assembly, which calls into a simple arch_rethook_trampoline_callback()
function, written in C, and only doing a few straightforward field
assignments, before calling further into rethook_trampoline_handler(),
which handles kretprobe callbacks generically.
Looking at simplicity of arch_rethook_trampoline_callback(), it seems
not really worthwhile to spend an extra function call just to do 4 or
5 assignments. As such, this patch proposes to "inline"
arch_rethook_trampoline_callback() into arch_rethook_trampoline() by
manually implementing it in an assembly code.
This has two motivations. First, we do get a bit of runtime speed up by
avoiding function calls. Using BPF selftests's bench tool, we see
0.6%-0.8% throughput improvement for kretprobe/multi-kretprobe
triggering code path:
BEFORE (latest probes/for-next)
===============================
kretprobe : 10.455 ± 0.024M/s
kretprobe-multi: 11.150 ± 0.012M/s
AFTER (probes/for-next + this patch)
====================================
kretprobe : 10.540 ± 0.009M/s (+0.8%)
kretprobe-multi: 11.219 ± 0.042M/s (+0.6%)
Second, and no less importantly for some specialized use cases, this
avoids unnecessarily "polluting" LBR records with an extra function call
(recorded as a jump by CPU). This is the case for the retsnoop ([0])
tool, which relies havily on capturing LBR records to provide users with
lots of insight into kernel internals.
This RFC patch is only inlining this function for x86-64, but it's
possible to do that for 32-bit x86 arch as well and then remove
arch_rethook_trampoline_callback() implementation altogether. Please let
me know if this change is acceptable and whether I should complete it
with 32-bit "inlining" as well. Thanks!
[0] https://nakryiko.com/posts/retsnoop-intro/#peering-deep-into-functions-with-lbr
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
arch/x86/kernel/asm-offsets_64.c | 4 ++++
arch/x86/kernel/rethook.c | 37 +++++++++++++++++++++++++++-----
2 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index bb65371ea9df..5c444abc540c 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -42,6 +42,10 @@ int main(void)
ENTRY(r14);
ENTRY(r15);
ENTRY(flags);
+ ENTRY(ip);
+ ENTRY(cs);
+ ENTRY(ss);
+ ENTRY(orig_ax);
BLANK();
#undef ENTRY
diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
index 8a1c0111ae79..3e1c01beebd1 100644
--- a/arch/x86/kernel/rethook.c
+++ b/arch/x86/kernel/rethook.c
@@ -6,6 +6,7 @@
#include <linux/rethook.h>
#include <linux/kprobes.h>
#include <linux/objtool.h>
+#include <asm/asm-offsets.h>
#include "kprobes/common.h"
@@ -34,10 +35,36 @@ asm(
" pushq %rsp\n"
" pushfq\n"
SAVE_REGS_STRING
- " movq %rsp, %rdi\n"
- " call arch_rethook_trampoline_callback\n"
+ " movq %rsp, %rdi\n" /* $rdi points to regs */
+ /* fixup registers */
+ /* regs->cs = __KERNEL_CS; */
+ " movq $" __stringify(__KERNEL_CS) ", " __stringify(pt_regs_cs) "(%rdi)\n"
+ /* regs->ip = (unsigned long)&arch_rethook_trampoline; */
+ " movq $arch_rethook_trampoline, " __stringify(pt_regs_ip) "(%rdi)\n"
+ /* regs->orig_ax = ~0UL; */
+ " movq $0xffffffffffffffff, " __stringify(pt_regs_orig_ax) "(%rdi)\n"
+ /* regs->sp += 2*sizeof(long); */
+ " addq $16, " __stringify(pt_regs_sp) "(%rdi)\n"
+ /* 2nd arg is frame_pointer = (long *)(regs + 1); */
+ " lea " __stringify(PTREGS_SIZE) "(%rdi), %rsi\n"
+ /*
+ * The return address at 'frame_pointer' is recovered by the
+ * arch_rethook_fixup_return() which called from this
+ * rethook_trampoline_handler().
+ */
+ " call rethook_trampoline_handler\n"
+ /*
+ * Copy FLAGS to 'pt_regs::ss' so we can do RET right after POPF.
+ *
+ * We don't save/restore %rax below, because we ignore
+ * rethook_trampoline_handler result.
+ *
+ * *(unsigned long *)®s->ss = regs->flags;
+ */
+ " mov " __stringify(pt_regs_flags) "(%rsp), %rax\n"
+ " mov %rax, " __stringify(pt_regs_ss) "(%rsp)\n"
RESTORE_REGS_STRING
- /* In the callback function, 'regs->flags' is copied to 'regs->ss'. */
+ /* We just copied 'regs->flags' into 'regs->ss'. */
" addq $16, %rsp\n"
" popfq\n"
#else
@@ -61,6 +88,7 @@ asm(
);
NOKPROBE_SYMBOL(arch_rethook_trampoline);
+#ifdef CONFIG_X86_32
/*
* Called from arch_rethook_trampoline
*/
@@ -70,9 +98,7 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
/* fixup registers */
regs->cs = __KERNEL_CS;
-#ifdef CONFIG_X86_32
regs->gs = 0;
-#endif
regs->ip = (unsigned long)&arch_rethook_trampoline;
regs->orig_ax = ~0UL;
regs->sp += 2*sizeof(long);
@@ -92,6 +118,7 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
*(unsigned long *)®s->ss = regs->flags;
}
NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+#endif
/*
* arch_rethook_trampoline() skips updating frame pointer. The frame pointer
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code
2024-04-25 0:02 [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code Andrii Nakryiko
@ 2024-04-29 22:38 ` Andrii Nakryiko
2024-10-16 0:53 ` Masami Hiramatsu
0 siblings, 1 reply; 4+ messages in thread
From: Andrii Nakryiko @ 2024-04-29 22:38 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: linux-trace-kernel, rostedt, mhiramat, bpf
On Wed, Apr 24, 2024 at 5:02 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> At the lowest level, rethook-based kretprobes on x86-64 architecture go
> through arch_rethoook_trampoline() function, manually written in
> assembly, which calls into a simple arch_rethook_trampoline_callback()
> function, written in C, and only doing a few straightforward field
> assignments, before calling further into rethook_trampoline_handler(),
> which handles kretprobe callbacks generically.
>
> Looking at simplicity of arch_rethook_trampoline_callback(), it seems
> not really worthwhile to spend an extra function call just to do 4 or
> 5 assignments. As such, this patch proposes to "inline"
> arch_rethook_trampoline_callback() into arch_rethook_trampoline() by
> manually implementing it in an assembly code.
>
> This has two motivations. First, we do get a bit of runtime speed up by
> avoiding function calls. Using BPF selftests's bench tool, we see
> 0.6%-0.8% throughput improvement for kretprobe/multi-kretprobe
> triggering code path:
>
> BEFORE (latest probes/for-next)
> ===============================
> kretprobe : 10.455 ± 0.024M/s
> kretprobe-multi: 11.150 ± 0.012M/s
>
> AFTER (probes/for-next + this patch)
> ====================================
> kretprobe : 10.540 ± 0.009M/s (+0.8%)
> kretprobe-multi: 11.219 ± 0.042M/s (+0.6%)
>
> Second, and no less importantly for some specialized use cases, this
> avoids unnecessarily "polluting" LBR records with an extra function call
> (recorded as a jump by CPU). This is the case for the retsnoop ([0])
> tool, which relies havily on capturing LBR records to provide users with
> lots of insight into kernel internals.
>
> This RFC patch is only inlining this function for x86-64, but it's
> possible to do that for 32-bit x86 arch as well and then remove
> arch_rethook_trampoline_callback() implementation altogether. Please let
> me know if this change is acceptable and whether I should complete it
> with 32-bit "inlining" as well. Thanks!
>
> [0] https://nakryiko.com/posts/retsnoop-intro/#peering-deep-into-functions-with-lbr
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
> arch/x86/kernel/asm-offsets_64.c | 4 ++++
> arch/x86/kernel/rethook.c | 37 +++++++++++++++++++++++++++-----
> 2 files changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
> index bb65371ea9df..5c444abc540c 100644
> --- a/arch/x86/kernel/asm-offsets_64.c
> +++ b/arch/x86/kernel/asm-offsets_64.c
> @@ -42,6 +42,10 @@ int main(void)
> ENTRY(r14);
> ENTRY(r15);
> ENTRY(flags);
> + ENTRY(ip);
> + ENTRY(cs);
> + ENTRY(ss);
> + ENTRY(orig_ax);
> BLANK();
> #undef ENTRY
>
> diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
> index 8a1c0111ae79..3e1c01beebd1 100644
> --- a/arch/x86/kernel/rethook.c
> +++ b/arch/x86/kernel/rethook.c
> @@ -6,6 +6,7 @@
> #include <linux/rethook.h>
> #include <linux/kprobes.h>
> #include <linux/objtool.h>
> +#include <asm/asm-offsets.h>
>
> #include "kprobes/common.h"
>
> @@ -34,10 +35,36 @@ asm(
> " pushq %rsp\n"
> " pushfq\n"
> SAVE_REGS_STRING
> - " movq %rsp, %rdi\n"
> - " call arch_rethook_trampoline_callback\n"
> + " movq %rsp, %rdi\n" /* $rdi points to regs */
> + /* fixup registers */
> + /* regs->cs = __KERNEL_CS; */
> + " movq $" __stringify(__KERNEL_CS) ", " __stringify(pt_regs_cs) "(%rdi)\n"
> + /* regs->ip = (unsigned long)&arch_rethook_trampoline; */
> + " movq $arch_rethook_trampoline, " __stringify(pt_regs_ip) "(%rdi)\n"
> + /* regs->orig_ax = ~0UL; */
> + " movq $0xffffffffffffffff, " __stringify(pt_regs_orig_ax) "(%rdi)\n"
> + /* regs->sp += 2*sizeof(long); */
> + " addq $16, " __stringify(pt_regs_sp) "(%rdi)\n"
> + /* 2nd arg is frame_pointer = (long *)(regs + 1); */
> + " lea " __stringify(PTREGS_SIZE) "(%rdi), %rsi\n"
BTW, all this __stringify() ugliness can be avoided if we move this
assembly into its own .S file, like lots of other assembly functions
in arch/x86/kernel subdir. That has another benefit of generating
better line information in DWARF for those assembly instructions. It's
lots more work, so before I do this, I'd like to get confirmation that
this change is acceptable in principle.
> + /*
> + * The return address at 'frame_pointer' is recovered by the
> + * arch_rethook_fixup_return() which called from this
> + * rethook_trampoline_handler().
> + */
> + " call rethook_trampoline_handler\n"
> + /*
> + * Copy FLAGS to 'pt_regs::ss' so we can do RET right after POPF.
> + *
> + * We don't save/restore %rax below, because we ignore
> + * rethook_trampoline_handler result.
> + *
> + * *(unsigned long *)®s->ss = regs->flags;
> + */
> + " mov " __stringify(pt_regs_flags) "(%rsp), %rax\n"
> + " mov %rax, " __stringify(pt_regs_ss) "(%rsp)\n"
> RESTORE_REGS_STRING
> - /* In the callback function, 'regs->flags' is copied to 'regs->ss'. */
> + /* We just copied 'regs->flags' into 'regs->ss'. */
> " addq $16, %rsp\n"
> " popfq\n"
> #else
> @@ -61,6 +88,7 @@ asm(
> );
> NOKPROBE_SYMBOL(arch_rethook_trampoline);
>
> +#ifdef CONFIG_X86_32
> /*
> * Called from arch_rethook_trampoline
> */
> @@ -70,9 +98,7 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
>
> /* fixup registers */
> regs->cs = __KERNEL_CS;
> -#ifdef CONFIG_X86_32
> regs->gs = 0;
> -#endif
> regs->ip = (unsigned long)&arch_rethook_trampoline;
> regs->orig_ax = ~0UL;
> regs->sp += 2*sizeof(long);
> @@ -92,6 +118,7 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
> *(unsigned long *)®s->ss = regs->flags;
> }
> NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
> +#endif
>
> /*
> * arch_rethook_trampoline() skips updating frame pointer. The frame pointer
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code
2024-04-29 22:38 ` Andrii Nakryiko
@ 2024-10-16 0:53 ` Masami Hiramatsu
2024-10-16 19:53 ` Andrii Nakryiko
0 siblings, 1 reply; 4+ messages in thread
From: Masami Hiramatsu @ 2024-10-16 0:53 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, linux-trace-kernel, rostedt, mhiramat, bpf
Hi Andrii,
Sorry I excavated this from patchwork.
On Mon, 29 Apr 2024 15:38:08 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Wed, Apr 24, 2024 at 5:02 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > At the lowest level, rethook-based kretprobes on x86-64 architecture go
> > through arch_rethoook_trampoline() function, manually written in
> > assembly, which calls into a simple arch_rethook_trampoline_callback()
> > function, written in C, and only doing a few straightforward field
> > assignments, before calling further into rethook_trampoline_handler(),
> > which handles kretprobe callbacks generically.
> >
> > Looking at simplicity of arch_rethook_trampoline_callback(), it seems
> > not really worthwhile to spend an extra function call just to do 4 or
> > 5 assignments. As such, this patch proposes to "inline"
> > arch_rethook_trampoline_callback() into arch_rethook_trampoline() by
> > manually implementing it in an assembly code.
Yeah, I think it is possible, but this makes code ugly, that is
trade-off. As you say, we should move this with other ugly inline
assembly code into kprobe.S or something like it. With my current
fprobe-on-fgraph, rethook is only required for kretprobes, so it
is natual and simple to have kprobe.S.
Thank you,
> >
> > This has two motivations. First, we do get a bit of runtime speed up by
> > avoiding function calls. Using BPF selftests's bench tool, we see
> > 0.6%-0.8% throughput improvement for kretprobe/multi-kretprobe
> > triggering code path:
> >
> > BEFORE (latest probes/for-next)
> > ===============================
> > kretprobe : 10.455 ± 0.024M/s
> > kretprobe-multi: 11.150 ± 0.012M/s
> >
> > AFTER (probes/for-next + this patch)
> > ====================================
> > kretprobe : 10.540 ± 0.009M/s (+0.8%)
> > kretprobe-multi: 11.219 ± 0.042M/s (+0.6%)
> >
> > Second, and no less importantly for some specialized use cases, this
> > avoids unnecessarily "polluting" LBR records with an extra function call
> > (recorded as a jump by CPU). This is the case for the retsnoop ([0])
> > tool, which relies havily on capturing LBR records to provide users with
> > lots of insight into kernel internals.
> >
> > This RFC patch is only inlining this function for x86-64, but it's
> > possible to do that for 32-bit x86 arch as well and then remove
> > arch_rethook_trampoline_callback() implementation altogether. Please let
> > me know if this change is acceptable and whether I should complete it
> > with 32-bit "inlining" as well. Thanks!
> >
> > [0] https://nakryiko.com/posts/retsnoop-intro/#peering-deep-into-functions-with-lbr
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> > arch/x86/kernel/asm-offsets_64.c | 4 ++++
> > arch/x86/kernel/rethook.c | 37 +++++++++++++++++++++++++++-----
> > 2 files changed, 36 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
> > index bb65371ea9df..5c444abc540c 100644
> > --- a/arch/x86/kernel/asm-offsets_64.c
> > +++ b/arch/x86/kernel/asm-offsets_64.c
> > @@ -42,6 +42,10 @@ int main(void)
> > ENTRY(r14);
> > ENTRY(r15);
> > ENTRY(flags);
> > + ENTRY(ip);
> > + ENTRY(cs);
> > + ENTRY(ss);
> > + ENTRY(orig_ax);
> > BLANK();
> > #undef ENTRY
> >
> > diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
> > index 8a1c0111ae79..3e1c01beebd1 100644
> > --- a/arch/x86/kernel/rethook.c
> > +++ b/arch/x86/kernel/rethook.c
> > @@ -6,6 +6,7 @@
> > #include <linux/rethook.h>
> > #include <linux/kprobes.h>
> > #include <linux/objtool.h>
> > +#include <asm/asm-offsets.h>
> >
> > #include "kprobes/common.h"
> >
> > @@ -34,10 +35,36 @@ asm(
> > " pushq %rsp\n"
> > " pushfq\n"
> > SAVE_REGS_STRING
> > - " movq %rsp, %rdi\n"
> > - " call arch_rethook_trampoline_callback\n"
> > + " movq %rsp, %rdi\n" /* $rdi points to regs */
> > + /* fixup registers */
> > + /* regs->cs = __KERNEL_CS; */
> > + " movq $" __stringify(__KERNEL_CS) ", " __stringify(pt_regs_cs) "(%rdi)\n"
> > + /* regs->ip = (unsigned long)&arch_rethook_trampoline; */
> > + " movq $arch_rethook_trampoline, " __stringify(pt_regs_ip) "(%rdi)\n"
> > + /* regs->orig_ax = ~0UL; */
> > + " movq $0xffffffffffffffff, " __stringify(pt_regs_orig_ax) "(%rdi)\n"
> > + /* regs->sp += 2*sizeof(long); */
> > + " addq $16, " __stringify(pt_regs_sp) "(%rdi)\n"
> > + /* 2nd arg is frame_pointer = (long *)(regs + 1); */
> > + " lea " __stringify(PTREGS_SIZE) "(%rdi), %rsi\n"
>
> BTW, all this __stringify() ugliness can be avoided if we move this
> assembly into its own .S file, like lots of other assembly functions
> in arch/x86/kernel subdir. That has another benefit of generating
> better line information in DWARF for those assembly instructions. It's
> lots more work, so before I do this, I'd like to get confirmation that
> this change is acceptable in principle.
>
> > + /*
> > + * The return address at 'frame_pointer' is recovered by the
> > + * arch_rethook_fixup_return() which called from this
> > + * rethook_trampoline_handler().
> > + */
> > + " call rethook_trampoline_handler\n"
> > + /*
> > + * Copy FLAGS to 'pt_regs::ss' so we can do RET right after POPF.
> > + *
> > + * We don't save/restore %rax below, because we ignore
> > + * rethook_trampoline_handler result.
> > + *
> > + * *(unsigned long *)®s->ss = regs->flags;
> > + */
> > + " mov " __stringify(pt_regs_flags) "(%rsp), %rax\n"
> > + " mov %rax, " __stringify(pt_regs_ss) "(%rsp)\n"
> > RESTORE_REGS_STRING
> > - /* In the callback function, 'regs->flags' is copied to 'regs->ss'. */
> > + /* We just copied 'regs->flags' into 'regs->ss'. */
> > " addq $16, %rsp\n"
> > " popfq\n"
> > #else
> > @@ -61,6 +88,7 @@ asm(
> > );
> > NOKPROBE_SYMBOL(arch_rethook_trampoline);
> >
> > +#ifdef CONFIG_X86_32
> > /*
> > * Called from arch_rethook_trampoline
> > */
> > @@ -70,9 +98,7 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
> >
> > /* fixup registers */
> > regs->cs = __KERNEL_CS;
> > -#ifdef CONFIG_X86_32
> > regs->gs = 0;
> > -#endif
> > regs->ip = (unsigned long)&arch_rethook_trampoline;
> > regs->orig_ax = ~0UL;
> > regs->sp += 2*sizeof(long);
> > @@ -92,6 +118,7 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
> > *(unsigned long *)®s->ss = regs->flags;
> > }
> > NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
> > +#endif
> >
> > /*
> > * arch_rethook_trampoline() skips updating frame pointer. The frame pointer
> > --
> > 2.43.0
> >
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code
2024-10-16 0:53 ` Masami Hiramatsu
@ 2024-10-16 19:53 ` Andrii Nakryiko
0 siblings, 0 replies; 4+ messages in thread
From: Andrii Nakryiko @ 2024-10-16 19:53 UTC (permalink / raw)
To: Masami Hiramatsu; +Cc: Andrii Nakryiko, linux-trace-kernel, rostedt, bpf
On Tue, Oct 15, 2024 at 5:53 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Andrii,
>
> Sorry I excavated this from patchwork.
>
> On Mon, 29 Apr 2024 15:38:08 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Wed, Apr 24, 2024 at 5:02 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > At the lowest level, rethook-based kretprobes on x86-64 architecture go
> > > through arch_rethoook_trampoline() function, manually written in
> > > assembly, which calls into a simple arch_rethook_trampoline_callback()
> > > function, written in C, and only doing a few straightforward field
> > > assignments, before calling further into rethook_trampoline_handler(),
> > > which handles kretprobe callbacks generically.
> > >
> > > Looking at simplicity of arch_rethook_trampoline_callback(), it seems
> > > not really worthwhile to spend an extra function call just to do 4 or
> > > 5 assignments. As such, this patch proposes to "inline"
> > > arch_rethook_trampoline_callback() into arch_rethook_trampoline() by
> > > manually implementing it in an assembly code.
>
> Yeah, I think it is possible, but this makes code ugly, that is
> trade-off. As you say, we should move this with other ugly inline
> assembly code into kprobe.S or something like it. With my current
> fprobe-on-fgraph, rethook is only required for kretprobes, so it
> is natual and simple to have kprobe.S.
>
Alright, I'll wait for fprobe-on-fgraph work to land, and will look at
the LBR "wastage" and see what can be done to minimize it. If at that
point something like this is still needed, I'll follow up separately.
Thanks.
> Thank you,
>
> > >
> > > This has two motivations. First, we do get a bit of runtime speed up by
> > > avoiding function calls. Using BPF selftests's bench tool, we see
> > > 0.6%-0.8% throughput improvement for kretprobe/multi-kretprobe
> > > triggering code path:
> > >
> > > BEFORE (latest probes/for-next)
> > > ===============================
> > > kretprobe : 10.455 ± 0.024M/s
> > > kretprobe-multi: 11.150 ± 0.012M/s
> > >
> > > AFTER (probes/for-next + this patch)
> > > ====================================
> > > kretprobe : 10.540 ± 0.009M/s (+0.8%)
> > > kretprobe-multi: 11.219 ± 0.042M/s (+0.6%)
> > >
> > > Second, and no less importantly for some specialized use cases, this
> > > avoids unnecessarily "polluting" LBR records with an extra function call
> > > (recorded as a jump by CPU). This is the case for the retsnoop ([0])
> > > tool, which relies havily on capturing LBR records to provide users with
> > > lots of insight into kernel internals.
> > >
> > > This RFC patch is only inlining this function for x86-64, but it's
> > > possible to do that for 32-bit x86 arch as well and then remove
> > > arch_rethook_trampoline_callback() implementation altogether. Please let
> > > me know if this change is acceptable and whether I should complete it
> > > with 32-bit "inlining" as well. Thanks!
> > >
> > > [0] https://nakryiko.com/posts/retsnoop-intro/#peering-deep-into-functions-with-lbr
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > > arch/x86/kernel/asm-offsets_64.c | 4 ++++
> > > arch/x86/kernel/rethook.c | 37 +++++++++++++++++++++++++++-----
> > > 2 files changed, 36 insertions(+), 5 deletions(-)
> > >
[...]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-10-16 19:53 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-25 0:02 [PATCH RFC] rethook: inline arch_rethook_trampoline_callback() in assembly code Andrii Nakryiko
2024-04-29 22:38 ` Andrii Nakryiko
2024-10-16 0:53 ` Masami Hiramatsu
2024-10-16 19:53 ` Andrii Nakryiko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).