From: Puranjay Mohan <puranjay@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>,
Alexei Starovoitov <ast@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eduard Zingerman <eddyz87@gmail.com>,
Eric Dumazet <edumazet@google.com>, Hao Luo <haoluo@google.com>,
Helge Deller <deller@gmx.de>, Jakub Kicinski <kuba@kernel.org>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org,
linux-riscv@lists.infradead.org,
Martin KaFai Lau <martin.lau@linux.dev>,
Mykola Lysenko <mykolal@fb.com>,
netdev@vger.kernel.org, Palmer Dabbelt <palmer@dabbelt.com>,
Paolo Abeni <pabeni@redhat.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Shuah Khan <shuah@kernel.org>, Song Liu <song@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Yonghong Song <yonghong.song@linux.dev>
Subject: Re: [PATCH bpf-next 4/5] selftests/bpf: Add benchmark for bpf_csum_diff() helper
Date: Tue, 22 Oct 2024 10:21:43 +0000
Message-ID: <mb61pa5ewbfpk.fsf@kernel.org>
In-Reply-To: <CAEf4BzY1LgCF1VOoAQkMdDTx87C0mfyftMvhvVU4GpsFc6fw5g@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Mon, Oct 21, 2024 at 5:22 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Add a microbenchmark for bpf_csum_diff() helper. This benchmark works by
>> filling a 4KB buffer with random data and calculating the internet
>> checksum on different parts of this buffer using bpf_csum_diff().
>>
>> Example run using ./benchs/run_bench_csum_diff.sh on x86_64:
>>
>> [bpf]$ ./benchs/run_bench_csum_diff.sh
>> 4 2.296 ± 0.066M/s (drops 0.000 ± 0.000M/s)
>> 8 2.320 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>> 16 2.315 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>> 20 2.318 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>> 32 2.308 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>> 40 2.300 ± 0.029M/s (drops 0.000 ± 0.000M/s)
>> 64 2.286 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>> 128 2.250 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>> 256 2.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>> 512 2.023 ± 0.055M/s (drops 0.000 ± 0.000M/s)
>
> you are not benchmarking bpf_csum_diff(), you are benchmarking how
> often you can call bpf_prog_test_run(). Add some batching on the BPF
> side, these numbers tell you that there is no difference between
> calculating checksum for 4 bytes and for 512, that didn't seem strange
> to you?

This didn't seem strange to me because the tables in the cover letter
show a clear improvement after optimizing the helper, and arm64 even
shows a linear drop going from 4 bytes to 512 bytes, even after the
optimization.

On x86, after the improvement, 4 bytes and 512 bytes show similar
numbers, but there is still a small drop going from 4 to 512 bytes.
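
To put the x86 numbers quoted above in per-call terms, here is a small sketch that is nothing more than arithmetic on the quoted rates (2.296 M/s at 4 bytes, 2.023 M/s at 512 bytes). The function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Convert a rate in millions of operations per second into ns per operation. */
static double ns_per_op(double mops)
{
	assert(mops > 0.0);
	return 1000.0 / mops;
}

/*
 * Fraction of the per-operation time at the larger size that can be
 * attributed to the extra bytes, i.e. (t_large - t_small) / t_large.
 * Everything else is spent identically at both sizes.
 */
static double size_dependent_share(double mops_small, double mops_large)
{
	double t_small = ns_per_op(mops_small);
	double t_large = ns_per_op(mops_large);

	return (t_large - t_small) / t_large;
}
```

With the quoted figures, ns_per_op(2.296) is about 435.5 ns and ns_per_op(2.023) is about 494.3 ns, so size_dependent_share(2.296, 2.023) comes out at roughly 0.12. In other words, at most about 12% of the 512-byte time is attributable to the additional 508 bytes; the remainder is the same at both sizes.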

My thinking was that, because bpf_csum_diff() calls csum_partial() on
x86, which is already optimised, most of the overhead was due to copying
the buffer, and that copy is now removed.

I guess I can amplify the difference between 4B and 512B by calling
bpf_csum_diff() multiple times in a loop, or by dividing the buffer into
more parts when calculating the csum (currently the BPF code divides it
into 2 parts only).

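
The loop idea could look roughly like the following. This is a userspace sketch only, not the BPF program from the patch: csum_stub() is a hypothetical stand-in for bpf_csum_diff() (a plain 16-bit ones' complement sum), and run_batch() and its parameters are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the helper under test (NOT bpf_csum_diff() itself):
 * folded 16-bit ones' complement sum over len bytes, continued from seed.
 */
static uint32_t csum_stub(const uint8_t *buf, size_t len, uint32_t seed)
{
	uint64_t sum = seed;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
	if (i < len)
		sum += buf[i];	/* odd trailing byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint32_t)sum;
}

/*
 * One "program invocation": call the helper iters times so the fixed
 * per-invocation cost is paid once per batch rather than once per call.
 * Each result feeds the next call as the seed, so the calls cannot be
 * collapsed into one. Returns the number of helper calls performed,
 * which is what the benchmark would count as hits.
 */
static uint64_t run_batch(const uint8_t *buf, size_t len, uint32_t iters,
			  uint32_t *result)
{
	uint32_t csum = 0;
	uint32_t i;

	assert(buf && result);
	for (i = 0; i < iters; i++)
		csum = csum_stub(buf, len, csum);
	*result = csum;
	return iters;
}
```

With a batch of N calls, the cost per counted call becomes the fixed invocation cost divided by N plus the time of one helper call, so a larger N should make the 4B versus 512B difference more visible.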
>>
>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> ---
>> tools/testing/selftests/bpf/Makefile | 2 +
>> tools/testing/selftests/bpf/bench.c | 4 +
>> .../selftests/bpf/benchs/bench_csum_diff.c | 164 ++++++++++++++++++
>> .../bpf/benchs/run_bench_csum_diff.sh | 10 ++
>> .../selftests/bpf/progs/csum_diff_bench.c | 25 +++
>> 5 files changed, 205 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/benchs/bench_csum_diff.c
>> create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_csum_diff.sh
>> create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_bench.c
>>
>
> [...]
>
>> +
>> +static void csum_diff_setup(void)
>> +{
>> +	int err;
>> +	char *buff;
>> +	size_t i, sz;
>> +
>> +	sz = sizeof(ctx.skel->rodata->buff);
>> +
>> +	setup_libbpf();
>> +
>> +	ctx.skel = csum_diff_bench__open();
>> +	if (!ctx.skel) {
>> +		fprintf(stderr, "failed to open skeleton\n");
>> +		exit(1);
>> +	}
>> +
>> +	srandom(time(NULL));
>> +	buff = ctx.skel->rodata->buff;
>> +
>> +	/*
>> +	 * Set first 8 bytes of buffer to 0xdeadbeefdeadbeef, this is later used to verify the
>> +	 * correctness of the helper by comparing the checksum result for 0xdeadbeefdeadbeef that
>> +	 * should be 0x3b3b
>> +	 */
>> +
>> +	*(u64 *)buff = 0xdeadbeefdeadbeef;
>> +
>> +	for (i = 8; i < sz; i++)
>> +		buff[i] = '1' + random() % 9;
>
> so, you only generate 9 different values for bytes, why? Why not full
> byte range?

Thanks for catching this. There is no reason to restrict the bytes to
'1'..'9'; I will use the full byte range in the next version.
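
A minimal userspace sketch of what the full-range fill could look like, together with a check of the 0x3b3b value mentioned in the quoted comment. The names fill_buffer() and ref_csum16() are hypothetical, rand() is used here in place of random() only for portability, and ref_csum16() is a plain folded ones' complement sum without the final inversion, not the kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write the known 8-byte pattern, then fill the rest with bytes 0x00..0xff. */
static void fill_buffer(uint8_t *buf, size_t sz)
{
	const uint64_t pattern = 0xdeadbeefdeadbeefULL;
	size_t i;

	assert(sz >= sizeof(pattern));
	memcpy(buf, &pattern, sizeof(pattern));
	for (i = sizeof(pattern); i < sz; i++)
		buf[i] = (uint8_t)(rand() & 0xff);	/* full byte range */
}

/*
 * Reference 16-bit ones' complement sum, folded, not inverted.
 * The 32-bit accumulator is sufficient for buffers up to 64KB.
 */
static uint16_t ref_csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;	/* odd trailing byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

For the first 8 bytes, the 16-bit words are 0xdead and 0xbeef twice each (or their byte-swapped forms on little-endian), which sum to 0x33b38 and fold to 0x3b3b. Since 0x3b3b reads the same with its bytes swapped, ref_csum16(buf, 8) gives 0x3b3b on either byte order.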
Thanks,
Puranjay