On Mon, 29 Apr 2024 13:25:04 -0700 Andrii Nakryiko wrote: > On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu wrote: > > > > Hi Andrii, > > > > On Thu, 25 Apr 2024 13:31:53 -0700 > > Andrii Nakryiko wrote: > > > > > Hey Masami, > > > > > > I can't really review most of that code as I'm completely unfamiliar > > > with all those inner workings of fprobe/ftrace/function_graph. I left > > > a few comments where there were somewhat more obvious BPF-related > > > pieces. > > > > > > But I also did run our BPF benchmarks on probes/for-next as a baseline > > > and then with your series applied on top. Just to see if there are any > > > regressions. I think it will be a useful data point for you. > > > > Thanks for testing! > > > > > > > > You should be already familiar with the bench tool we have in BPF > > > selftests (I used it on some other patches for your tree). > > > > What patches we need? > > > > You mean for this `bench` tool? They are part of BPF selftests (under > tools/testing/selftests/bpf), you can build them by running: > > $ make RELEASE=1 -j$(nproc) bench > > After that you'll get a self-container `bench` binary, which has all > the self-contained benchmarks. > > You might also find a small script (benchs/run_bench_trigger.sh inside > BPF selftests directory) helpful, it collects final summary of the > benchmark run and optionally accepts a specific set of benchmarks. So > you can use it like this: > > $ benchs/run_bench_trigger.sh kprobe kprobe-multi > kprobe : 18.731 ± 0.639M/s > kprobe-multi : 23.938 ± 0.612M/s > > By default it will run a wider set of benchmarks (no uprobes, but a > bunch of extra fentry/fexit tests and stuff like this). origin: # benchs/run_bench_trigger.sh kretprobe : 1.329 ± 0.007M/s kretprobe-multi: 1.341 ± 0.004M/s # benchs/run_bench_trigger.sh kretprobe : 1.288 ± 0.014M/s kretprobe-multi: 1.365 ± 0.002M/s # benchs/run_bench_trigger.sh kretprobe : 1.329 ± 0.002M/s kretprobe-multi: 1.331 ± 0.011M/s # benchs/run_bench_trigger.sh kretprobe : 1.311 ± 0.003M/s kretprobe-multi: 1.318 ± 0.002M/s s patched: # benchs/run_bench_trigger.sh kretprobe : 1.274 ± 0.003M/s kretprobe-multi: 1.397 ± 0.002M/s # benchs/run_bench_trigger.sh kretprobe : 1.307 ± 0.002M/s kretprobe-multi: 1.406 ± 0.004M/s # benchs/run_bench_trigger.sh kretprobe : 1.279 ± 0.004M/s kretprobe-multi: 1.330 ± 0.014M/s # benchs/run_bench_trigger.sh kretprobe : 1.256 ± 0.010M/s kretprobe-multi: 1.412 ± 0.003M/s Hmm, in my case, it seems smaller differences (~3%?). I attached perf report results for those, but I don't see large difference. > > > > > > BASELINE > > > ======== > > > kprobe : 24.634 ± 0.205M/s > > > kprobe-multi : 28.898 ± 0.531M/s > > > kretprobe : 10.478 ± 0.015M/s > > > kretprobe-multi: 11.012 ± 0.063M/s > > > > > > THIS PATCH SET ON TOP > > > ===================== > > > kprobe : 25.144 ± 0.027M/s (+2%) > > > kprobe-multi : 28.909 ± 0.074M/s > > > kretprobe : 9.482 ± 0.008M/s (-9.5%) > > > kretprobe-multi: 13.688 ± 0.027M/s (+24%) > > > > This looks good. Kretprobe should also use kretprobe-multi (fprobe) > > eventually because it should be a single callback version of > > kretprobe-multi. I ran another benchmark (prctl loop, attached), the origin kernel result is here; # sh ./benchmark.sh count = 10000000, took 6.748133 sec And the patched kernel result; # sh ./benchmark.sh count = 10000000, took 6.644095 sec I confirmed that the parf result has no big difference. Thank you, > > > > > > > > These numbers are pretty stable and look to be more or less representative. > > > > > > As you can see, kprobes got a bit faster, kprobe-multi seems to be > > > about the same, though. > > > > > > Then (I suppose they are "legacy") kretprobes got quite noticeably > > > slower, almost by 10%. Not sure why, but looks real after re-running > > > benchmarks a bunch of times and getting stable results. > > > > Hmm, kretprobe on x86 should use ftrace + rethook even with my series. > > So nothing should be changed. Maybe cache access pattern has been > > changed? > > I'll check it with tracefs (to remove the effect from bpf related changes) > > > > > > > > On the other hand, multi-kretprobes got significantly faster (+24%!). > > > Again, I don't know if it is expected or not, but it's a nice > > > improvement. > > > > Thanks! > > > > > > > > If you have any idea why kretprobes would get so much slower, it would > > > be nice to look into that and see if you can mitigate the regression > > > somehow. Thanks! > > > > OK, let me check it. > > > > Thank you! > > > > > > > > > > > > 51 files changed, 2325 insertions(+), 882 deletions(-) > > > > create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc > > > > > > > > -- > > > > Masami Hiramatsu (Google) > > > > > > > > > > -- > > Masami Hiramatsu (Google) -- Masami Hiramatsu (Google)