From: Puranjay Mohan <puranjay@kernel.org>
To: Albert Ou <aou@eecs.berkeley.edu>,
Alexei Starovoitov <ast@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eduard Zingerman <eddyz87@gmail.com>,
Eric Dumazet <edumazet@google.com>, Hao Luo <haoluo@google.com>,
Helge Deller <deller@gmx.de>, Jakub Kicinski <kuba@kernel.org>,
"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org,
linux-riscv@lists.infradead.org,
Martin KaFai Lau <martin.lau@linux.dev>,
Mykola Lysenko <mykolal@fb.com>,
netdev@vger.kernel.org, Palmer Dabbelt <palmer@dabbelt.com>,
Paolo Abeni <pabeni@redhat.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Puranjay Mohan <puranjay12@gmail.com>,
Puranjay Mohan <puranjay@kernel.org>,
Shuah Khan <shuah@kernel.org>, Song Liu <song@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Yonghong Song <yonghong.song@linux.dev>
Subject: [PATCH bpf-next v2 0/4] Optimize bpf_csum_diff() and homogenize for all archs
Date: Wed, 23 Oct 2024 15:39:18 +0000
Message-ID: <20241023153922.86909-1-puranjay@kernel.org>
Changes in v2:
v1: https://lore.kernel.org/all/20241021122112.101513-1-puranjay@kernel.org/
- Remove the patch that adds the benchmark, as it is not useful enough to be
added to the tree.
- Fix a sparse warning in patch 1.
- Add Reviewed-by and Acked-by tags.
NOTE: There are some sparse warnings in net/core/filter.c, but they are not
worth fixing: bpf helpers take and return u64 values, so using them in
csum-related functions that take and return __sum16 / __wsum would need
casts everywhere.
The bpf_csum_diff() helper currently returns different values on different
architectures because it calls csum_partial(), which is either implemented
by the architecture (x86_64, arm, etc.) or falls back to the generic
implementation in lib/checksum.c (arm64, riscv, etc.).
The implementation in lib/checksum.c returns a folded result that is
16 bits wide, whereas an architecture-specific implementation can return an
unfolded value that is wider than 16 bits.
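
To illustrate why folded and unfolded results look different yet describe
the same checksum, here is a minimal userspace sketch of a 32-to-16-bit
fold. The name fold32to16() is hypothetical; it is only modelled on the
from32to16() helper named in patch 1, and the sample values are made up
for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical userspace sketch, not the kernel code: fold a 32-bit
 * ones' complement accumulator into 16 bits by adding the high half
 * to the low half, then adding back any carry that the first
 * addition produced.
 */
static uint16_t fold32to16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);	/* 16 bits + possible carry */
	x = (x & 0xffff) + (x >> 16);	/* absorb the carry */
	return (uint16_t)x;
}
```

Under this sketch, an implementation returning the unfolded value
0x12345678 and one returning the folded value 0x68ac disagree bit for
bit, but both fold to 0x68ac.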
The helper copies the data into a per-cpu scratchpad buffer and then
computes the csum over that buffer. This can be optimised by using some
mathematical properties of csum.
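
One property that could serve here is that, in ones' complement
arithmetic, subtracting a sum is the same as adding its bitwise
complement. The sketch below is a userspace illustration and not the code
from patch 2. It assumes (this cover letter does not say so) that the
scratchpad approach sums the inverted "from" bytes followed by the "to"
bytes. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 16-bit ones' complement values with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Sum len bytes (len must be even) as little-endian 16-bit words. */
static uint16_t oc_sum(const uint8_t *buf, size_t len, uint16_t seed)
{
	uint16_t acc = seed;

	for (size_t i = 0; i + 1 < len; i += 2)
		acc = oc_add(acc, (uint16_t)(buf[i] | (buf[i + 1] << 8)));
	return acc;
}

/*
 * Assumed scratch-buffer approach: copy ~from, then to, and sum the
 * copy. from_len + to_len must not exceed the scratch size; the
 * function returns 0 if it does.
 */
static uint16_t diff_via_copy(const uint8_t *from, size_t from_len,
			      const uint8_t *to, size_t to_len,
			      uint16_t seed)
{
	uint8_t scratch[128];
	size_t n = 0;

	if (from_len + to_len > sizeof(scratch))
		return 0;
	for (size_t i = 0; i < from_len; i++)
		scratch[n++] = (uint8_t)~from[i];
	for (size_t i = 0; i < to_len; i++)
		scratch[n++] = to[i];
	return oc_sum(scratch, n, seed);
}

/* Copy-free approach: sum(to) minus sum(from), as "add the complement". */
static uint16_t diff_direct(const uint8_t *from, size_t from_len,
			    const uint8_t *to, size_t to_len,
			    uint16_t seed)
{
	uint16_t s_to = oc_sum(to, to_len, seed);
	uint16_t s_from = oc_sum(from, from_len, 0);

	return oc_add(s_to, (uint16_t)~s_from);
}
```

The two functions agree modulo 0xffff. They can differ only in which of
the two ones' complement representations of zero (0x0000 or 0xffff) they
return, for example when "from" is empty and "to" sums to zero.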
Patch 1 in this series does preparatory work for homogenizing the helper.
Patch 2 changes the helper itself. The performance gain is shown in the
tables below, which were generated using the benchmark built in patch 4 of
v1 of this series:
x86-64:
+-------------+------------------+------------------+-------------+
| Buffer Size |      Before      |      After       | Improvement |
+-------------+------------------+------------------+-------------+
|       4     | 2.296 ± 0.066M/s | 3.415 ± 0.001M/s |   48.73 %   |
|       8     | 2.320 ± 0.003M/s | 3.409 ± 0.003M/s |   46.93 %   |
|      16     | 2.315 ± 0.001M/s | 3.414 ± 0.003M/s |   47.47 %   |
|      20     | 2.318 ± 0.001M/s | 3.416 ± 0.001M/s |   47.36 %   |
|      32     | 2.308 ± 0.003M/s | 3.413 ± 0.003M/s |   47.87 %   |
|      40     | 2.300 ± 0.029M/s | 3.413 ± 0.003M/s |   48.39 %   |
|      64     | 2.286 ± 0.001M/s | 3.410 ± 0.001M/s |   49.16 %   |
|     128     | 2.250 ± 0.001M/s | 3.404 ± 0.001M/s |   51.28 %   |
|     256     | 2.173 ± 0.001M/s | 3.383 ± 0.001M/s |   55.68 %   |
|     512     | 2.023 ± 0.055M/s | 3.340 ± 0.001M/s |   65.10 %   |
+-------------+------------------+------------------+-------------+
ARM64:
+-------------+------------------+------------------+-------------+
| Buffer Size |      Before      |      After       | Improvement |
+-------------+------------------+------------------+-------------+
|       4     | 1.397 ± 0.005M/s | 1.493 ± 0.005M/s |    6.87 %   |
|       8     | 1.402 ± 0.002M/s | 1.489 ± 0.002M/s |    6.20 %   |
|      16     | 1.391 ± 0.001M/s | 1.481 ± 0.001M/s |    6.47 %   |
|      20     | 1.379 ± 0.001M/s | 1.477 ± 0.001M/s |    7.10 %   |
|      32     | 1.358 ± 0.001M/s | 1.469 ± 0.002M/s |    8.17 %   |
|      40     | 1.339 ± 0.001M/s | 1.462 ± 0.002M/s |    9.18 %   |
|      64     | 1.302 ± 0.002M/s | 1.449 ± 0.003M/s |   11.29 %   |
|     128     | 1.214 ± 0.001M/s | 1.443 ± 0.003M/s |   18.86 %   |
|     256     | 1.080 ± 0.001M/s | 1.423 ± 0.001M/s |   31.75 %   |
|     512     | 0.887 ± 0.001M/s | 1.411 ± 0.002M/s |   59.07 %   |
+-------------+------------------+------------------+-------------+
Patch 3 reverts a hack that was done to make the selftest pass on all
architectures.
Patch 4 adds a selftest that verifies the results produced by this helper
in multiple modes and edge cases.
Puranjay Mohan (4):
  net: checksum: move from32to16() to generic header
  bpf: bpf_csum_diff: optimize and homogenize for all archs
  selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
  selftests/bpf: Add a selftest for bpf_csum_diff()

 arch/parisc/lib/checksum.c                    |  13 +-
 include/net/checksum.h                        |   6 +
 lib/checksum.c                                |  11 +-
 net/core/filter.c                             |  37 +-
 .../selftests/bpf/prog_tests/test_csum_diff.c | 408 ++++++++++++++++++
 .../selftests/bpf/progs/csum_diff_test.c      |  42 ++
 .../bpf/progs/verifier_array_access.c         |   3 +-
 7 files changed, 469 insertions(+), 51 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
 create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_test.c
--
2.40.1