From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Vadim Fedorenko <vadfed@meta.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Mykola Lysenko <mykolal@fb.com>, Jakub Kicinski <kuba@kernel.org>,
x86@kernel.org, bpf@vger.kernel.org,
Martin KaFai Lau <martin.lau@linux.dev>
Subject: Re: [PATCH bpf-next v5 1/4] bpf: add bpf_get_cpu_cycles kfunc
Date: Tue, 12 Nov 2024 21:43:01 +0000 [thread overview]
Message-ID: <946c61c2-ab1e-43cb-adfd-cc5b7716b915@linux.dev> (raw)
In-Reply-To: <CAEf4BzaRb+fUK17wrj4sWnYM5oKxTvwZC=U-GjvsdUtF94PqrA@mail.gmail.com>
On 12/11/2024 05:50, Andrii Nakryiko wrote:
> On Fri, Nov 8, 2024 at 4:42 PM Vadim Fedorenko <vadfed@meta.com> wrote:
>>
>> New kfunc to return an ARCH-specific timecounter. For x86, the BPF JIT
>> converts it into an ordered rdtsc call. Other architectures will get JIT
>> implementations too if supported. The fallback is
>> __arch_get_hw_counter().
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>> ---
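As a side note on the fallback path mentioned in the description above:
the generic, non-JIT implementation could be roughly as simple as wrapping
__arch_get_hw_counter() - a sketch only, not necessarily the exact code in
this patch:

    __bpf_kfunc u64 bpf_get_cpu_cycles(void)
    {
            /* Illustrative fallback: read the arch hardware counter
             * directly; on x86 the JIT is expected to replace this call
             * with an ordered rdtsc. The clock_mode/vdso_data arguments
             * used here are assumptions made for the sketch.
             */
            return __arch_get_hw_counter(1, NULL);
    }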
>
> nit: please add a cover letter for the next revision; multi-patch sets
> generally should come with a cover letter, unless it's some set of
> trivial and mostly independent patches. Anyways...
Yeah, sure. This series has grown from the first small version...
>
> I haven't looked through the code yet, but I was curious to
> benchmark the perf benefit, so that's what I did for fun this evening.
>
> (!!!) BTW, a complete aside, but I think it's interesting. It turned
> out that using bpf_prog_test_run() scales *VERY POORLY* with a large
> number of CPUs, because we start spending tons of time in
> fdget()/fdput(), so I initially got pretty unscalable results,
> profiled a bit, and then switched to just doing
> syscall(__NR_getpgid) + SEC("raw_tp/sys_enter"). Anyways,
> the point is that microbenchmarking is tricky and we need to improve
> our existing bench setup for some benchmarks. Anyways, getting back to
> the main topic.
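If I read that right, the trigger side is then basically just a tight
getpgid() loop in user space - a minimal sketch assuming that setup, not
the exact bench code:

    #include <unistd.h>
    #include <sys/syscall.h>

    /* Each getpgid() syscall fires the attached raw_tp/sys_enter program;
     * the BPF side then does batch_iters timestamp pairs per trigger.
     */
    static void trigger_loop(void)
    {
            while (1)
                    syscall(__NR_getpgid);
    }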
>
> I wrote two quick benchmarks testing what I see as the intended use case
> for these kfuncs (batching amortizes the cost of triggering the BPF
> program; batch_iters = 500 in my case):
>
> SEC("?raw_tp/sys_enter")
> int trigger_driver_ktime(void *ctx)
> {
> volatile __u64 total = 0;
> int i;
>
> for (i = 0; i < batch_iters; i++) {
> __u64 start, end;
>
> start = bpf_ktime_get_ns();
> end = bpf_ktime_get_ns();
> total += end - start;
> }
> inc_counter_batch(batch_iters);
>
> return 0;
> }
>
> extern __u64 bpf_get_cpu_cycles(void) __weak __ksym;
> extern __u64 bpf_cpu_cycles_to_ns(__u64 cycles) __weak __ksym;
>
> SEC("?raw_tp/sys_enter")
> int trigger_driver_cycles(void *ctx)
> {
> volatile __u64 total = 0;
> int i;
>
> for (i = 0; i < batch_iters; i++) {
> __u64 start, end;
>
> start = bpf_get_cpu_cycles();
> end = bpf_get_cpu_cycles();
> total += bpf_cpu_cycles_to_ns(end - start);
> }
> inc_counter_batch(batch_iters);
>
> return 0;
> }
>
> And here's what I got across multiple numbers of parallel CPUs on our
> production host.
>
> # ./bench_timing.sh
>
> ktime ( 1 cpus): 32.286 ± 0.309M/s ( 32.286M/s/cpu)
> ktime ( 2 cpus): 63.021 ± 0.538M/s ( 31.511M/s/cpu)
> ktime ( 3 cpus): 94.211 ± 0.686M/s ( 31.404M/s/cpu)
> ktime ( 4 cpus): 124.757 ± 0.691M/s ( 31.189M/s/cpu)
> ktime ( 5 cpus): 154.855 ± 0.693M/s ( 30.971M/s/cpu)
> ktime ( 6 cpus): 185.551 ± 2.304M/s ( 30.925M/s/cpu)
> ktime ( 7 cpus): 211.117 ± 4.755M/s ( 30.160M/s/cpu)
> ktime ( 8 cpus): 236.454 ± 0.226M/s ( 29.557M/s/cpu)
> ktime (10 cpus): 295.526 ± 0.126M/s ( 29.553M/s/cpu)
> ktime (12 cpus): 322.282 ± 0.153M/s ( 26.857M/s/cpu)
> ktime (14 cpus): 375.347 ± 0.087M/s ( 26.811M/s/cpu)
> ktime (16 cpus): 399.813 ± 0.181M/s ( 24.988M/s/cpu)
> ktime (24 cpus): 617.675 ± 7.053M/s ( 25.736M/s/cpu)
> ktime (32 cpus): 819.695 ± 0.231M/s ( 25.615M/s/cpu)
> ktime (40 cpus): 996.264 ± 0.290M/s ( 24.907M/s/cpu)
> ktime (48 cpus): 1180.201 ± 0.160M/s ( 24.588M/s/cpu)
> ktime (56 cpus): 1321.084 ± 0.099M/s ( 23.591M/s/cpu)
> ktime (64 cpus): 1482.061 ± 0.121M/s ( 23.157M/s/cpu)
> ktime (72 cpus): 1666.540 ± 0.460M/s ( 23.146M/s/cpu)
> ktime (80 cpus): 1851.419 ± 0.439M/s ( 23.143M/s/cpu)
>
> cycles ( 1 cpus): 45.815 ± 0.018M/s ( 45.815M/s/cpu)
> cycles ( 2 cpus): 86.706 ± 0.068M/s ( 43.353M/s/cpu)
> cycles ( 3 cpus): 129.899 ± 0.101M/s ( 43.300M/s/cpu)
> cycles ( 4 cpus): 168.435 ± 0.073M/s ( 42.109M/s/cpu)
> cycles ( 5 cpus): 210.520 ± 0.164M/s ( 42.104M/s/cpu)
> cycles ( 6 cpus): 252.596 ± 0.050M/s ( 42.099M/s/cpu)
> cycles ( 7 cpus): 294.356 ± 0.159M/s ( 42.051M/s/cpu)
> cycles ( 8 cpus): 317.167 ± 0.163M/s ( 39.646M/s/cpu)
> cycles (10 cpus): 396.141 ± 0.208M/s ( 39.614M/s/cpu)
> cycles (12 cpus): 431.938 ± 0.511M/s ( 35.995M/s/cpu)
> cycles (14 cpus): 503.055 ± 0.070M/s ( 35.932M/s/cpu)
> cycles (16 cpus): 534.261 ± 0.107M/s ( 33.391M/s/cpu)
> cycles (24 cpus): 836.838 ± 0.141M/s ( 34.868M/s/cpu)
> cycles (32 cpus): 1099.689 ± 0.314M/s ( 34.365M/s/cpu)
> cycles (40 cpus): 1336.573 ± 0.015M/s ( 33.414M/s/cpu)
> cycles (48 cpus): 1571.734 ± 11.151M/s ( 32.744M/s/cpu)
> cycles (56 cpus): 1819.242 ± 4.627M/s ( 32.486M/s/cpu)
> cycles (64 cpus): 2046.285 ± 5.169M/s ( 31.973M/s/cpu)
> cycles (72 cpus): 2287.683 ± 0.787M/s ( 31.773M/s/cpu)
> cycles (80 cpus): 2505.414 ± 0.626M/s ( 31.318M/s/cpu)
>
> So, from about +42% on a single CPU, to +36% at 80 CPUs. Not that bad.
> Scalability-wise, we still see some drop off in performance, but
> believe me, with bpf_prog_test_run() it was so much worse :) I also
> verified that now we spend cycles almost exclusively inside the BPF
> program (so presumably in those benchmarked kfuncs).
Am I right that the numbers show how many iterations were done during
the same amount of time? It would also be great to understand whether
we get more precise measurements - just in case you have your tests
ready...
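Something along these lines would do, I think - tracking the minimum
observable back-to-back delta for both helpers (an untested sketch,
reusing the names and extern __ksym declarations from your benchmark
above):

    __u64 min_delta_ktime = ~0ULL;
    __u64 min_delta_cycles = ~0ULL;

    SEC("?raw_tp/sys_enter")
    int measure_precision(void *ctx)
    {
            __u64 t1, t2, c1, c2, d;
            int i;

            for (i = 0; i < batch_iters; i++) {
                    /* smallest delta bpf_ktime_get_ns() can resolve */
                    t1 = bpf_ktime_get_ns();
                    t2 = bpf_ktime_get_ns();
                    d = t2 - t1;
                    if (d < min_delta_ktime)
                            min_delta_ktime = d;

                    /* same for the cycles-based pair, converted to ns */
                    c1 = bpf_get_cpu_cycles();
                    c2 = bpf_get_cpu_cycles();
                    d = bpf_cpu_cycles_to_ns(c2 - c1);
                    if (d < min_delta_cycles)
                            min_delta_cycles = d;
            }
            return 0;
    }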
Thread overview: 23+ messages
2024-11-09 0:41 [PATCH bpf-next v5 1/4] bpf: add bpf_get_cpu_cycles kfunc Vadim Fedorenko
2024-11-09 0:41 ` [PATCH bpf-next v5 2/4] bpf: add bpf_cpu_cycles_to_ns helper Vadim Fedorenko
2024-11-12 23:03 ` Eduard Zingerman
2024-11-09 0:41 ` [PATCH bpf-next v5 3/4] selftests/bpf: add selftest to check rdtsc jit Vadim Fedorenko
2024-11-12 23:17 ` Eduard Zingerman
2024-11-09 0:41 ` [PATCH bpf-next v5 4/4] selftests/bpf: add usage example for cpu cycles kfuncs Vadim Fedorenko
2024-11-12 5:50 ` [PATCH bpf-next v5 1/4] bpf: add bpf_get_cpu_cycles kfunc Andrii Nakryiko
2024-11-12 21:43 ` Vadim Fedorenko [this message]
2024-11-12 23:59 ` Andrii Nakryiko
2024-11-12 21:21 ` Eduard Zingerman
2024-11-12 21:39 ` Vadim Fedorenko
2024-11-12 21:53 ` Eduard Zingerman
2024-11-12 22:19 ` Eduard Zingerman
2024-11-12 22:27 ` Alexei Starovoitov
2024-11-12 23:08 ` Vadim Fedorenko
2024-11-13 0:09 ` Alexei Starovoitov
2024-11-13 0:20 ` Vadim Fedorenko
2024-11-13 17:38 ` Yonghong Song
2024-11-13 17:52 ` Vadim Fedorenko
2024-11-13 18:42 ` Yonghong Song
2024-11-13 22:28 ` Vadim Fedorenko
2024-11-13 23:02 ` Yonghong Song
2024-11-14 1:05 ` Vadim Fedorenko