netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/5] Optimize bpf_csum_diff() and homogenize for all archs
@ 2024-10-21 12:21 Puranjay Mohan
  2024-10-21 12:21 ` [PATCH bpf-next 1/5] net: checksum: move from32to16() to generic header Puranjay Mohan
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Puranjay Mohan @ 2024-10-21 12:21 UTC (permalink / raw)
  To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
	bpf, Daniel Borkmann, David S. Miller, Eduard Zingerman,
	Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
	James E.J. Bottomley, Jiri Olsa, John Fastabend, KP Singh,
	linux-kernel, linux-parisc, linux-riscv, Martin KaFai Lau,
	Mykola Lysenko, netdev, Palmer Dabbelt, Paolo Abeni,
	Paul Walmsley, Puranjay Mohan, Puranjay Mohan, Shuah Khan,
	Song Liu, Stanislav Fomichev, Yonghong Song

The bpf_csum_diff() helper currently returns different values on different
architectures because it calls csum_partial() that is either implemented by
the architecture like x86_64, arm, etc or uses the generic implementation
in lib/checksum.c like arm64, riscv, etc.

The implementation in lib/checksum.c returns the folded result that is
16-bit long, but the architecture specific implementation can return an
unfolded value that is larger than 16-bits.

The helper uses a per-cpu scratchpad buffer for copying the data and then
computing the csum on this buffer. This can be optimised by utilising some
mathematical properties of csum.

The patch 1 in this series does preparatory work for homogenizing the
helper. patch 2 does the changes to the helper itself. The performance gain
can be seen in the tables below that are generated using the benchmark
built in patch 4:

  x86-64:
  +-------------+------------------+------------------+-------------+
  | Buffer Size |      Before      |      After       | Improvement |
  +-------------+------------------+------------------+-------------+
  |      4      | 2.296 ± 0.066M/s | 3.415 ± 0.001M/s |   48.73  %  |
  |      8      | 2.320 ± 0.003M/s | 3.409 ± 0.003M/s |   46.93  %  |
  |      16     | 2.315 ± 0.001M/s | 3.414 ± 0.003M/s |   47.47  %  |
  |      20     | 2.318 ± 0.001M/s | 3.416 ± 0.001M/s |   47.36  %  |
  |      32     | 2.308 ± 0.003M/s | 3.413 ± 0.003M/s |   47.87  %  |
  |      40     | 2.300 ± 0.029M/s | 3.413 ± 0.003M/s |   48.39  %  |
  |      64     | 2.286 ± 0.001M/s | 3.410 ± 0.001M/s |   49.16  %  |
  |      128    | 2.250 ± 0.001M/s | 3.404 ± 0.001M/s |   51.28  %  |
  |      256    | 2.173 ± 0.001M/s | 3.383 ± 0.001M/s |   55.68  %  |
  |      512    | 2.023 ± 0.055M/s | 3.340 ± 0.001M/s |   65.10  %  |
  +-------------+------------------+------------------+-------------+

  ARM64:
  +-------------+------------------+------------------+-------------+
  | Buffer Size |      Before      |      After       | Improvement |
  +-------------+------------------+------------------+-------------+
  |      4      | 1.397 ± 0.005M/s | 1.493 ± 0.005M/s |    6.87  %  |
  |      8      | 1.402 ± 0.002M/s | 1.489 ± 0.002M/s |    6.20  %  |
  |      16     | 1.391 ± 0.001M/s | 1.481 ± 0.001M/s |    6.47  %  |
  |      20     | 1.379 ± 0.001M/s | 1.477 ± 0.001M/s |    7.10  %  |
  |      32     | 1.358 ± 0.001M/s | 1.469 ± 0.002M/s |    8.17  %  |
  |      40     | 1.339 ± 0.001M/s | 1.462 ± 0.002M/s |    9.18  %  |
  |      64     | 1.302 ± 0.002M/s | 1.449 ± 0.003M/s |    11.29 %  |
  |      128    | 1.214 ± 0.001M/s | 1.443 ± 0.003M/s |    18.86 %  |
  |      256    | 1.080 ± 0.001M/s | 1.423 ± 0.001M/s |    31.75 %  |
  |      512    | 0.887 ± 0.001M/s | 1.411 ± 0.002M/s |    59.07 %  |
  +-------------+------------------+------------------+-------------+

Patch 5 adds a selftest for this helper to verify the results produced by
this helper in multiple modes and edge cases.

Patch 3 reverts a hack that was done to make the selftest pass on all
architectures.

Puranjay Mohan (5):
  net: checksum: move from32to16() to generic header
  bpf: bpf_csum_diff: optimize and homogenize for all archs
  selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
  selftests/bpf: Add benchmark for bpf_csum_diff() helper
  selftests/bpf: Add a selftest for bpf_csum_diff()

 arch/parisc/lib/checksum.c                    |  13 +-
 include/net/checksum.h                        |   6 +
 lib/checksum.c                                |  11 +-
 net/core/filter.c                             |  37 +-
 tools/testing/selftests/bpf/Makefile          |   2 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_csum_diff.c    | 164 +++++++
 .../bpf/benchs/run_bench_csum_diff.sh         |  10 +
 .../selftests/bpf/prog_tests/test_csum_diff.c | 408 ++++++++++++++++++
 .../selftests/bpf/progs/csum_diff_bench.c     |  25 ++
 .../selftests/bpf/progs/csum_diff_test.c      |  42 ++
 .../bpf/progs/verifier_array_access.c         |   3 +-
 12 files changed, 674 insertions(+), 51 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_csum_diff.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_csum_diff.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
 create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_test.c

-- 
2.40.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-10-23 15:37 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-21 12:21 [PATCH bpf-next 0/5] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
2024-10-21 12:21 ` [PATCH bpf-next 1/5] net: checksum: move from32to16() to generic header Puranjay Mohan
2024-10-21 13:41   ` Daniel Borkmann
2024-10-22  9:49   ` Toke Høiland-Jørgensen
2024-10-22 13:50   ` kernel test robot
2024-10-21 12:21 ` [PATCH bpf-next 2/5] bpf: bpf_csum_diff: optimize and homogenize for all archs Puranjay Mohan
2024-10-21 13:42   ` Daniel Borkmann
2024-10-22  9:54   ` Toke Høiland-Jørgensen
2024-10-22 18:09   ` kernel test robot
2024-10-21 12:21 ` [PATCH bpf-next 3/5] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier Puranjay Mohan
2024-10-21 13:01   ` Helge Deller
2024-10-21 13:14     ` Puranjay Mohan
2024-10-21 14:04       ` Helge Deller
2024-10-21 13:42   ` Daniel Borkmann
2024-10-22  9:55   ` Toke Høiland-Jørgensen
2024-10-21 12:21 ` [PATCH bpf-next 4/5] selftests/bpf: Add benchmark for bpf_csum_diff() helper Puranjay Mohan
2024-10-21 13:43   ` Daniel Borkmann
2024-10-21 23:28   ` Andrii Nakryiko
2024-10-22 10:21     ` Puranjay Mohan
2024-10-22 17:47       ` Andrii Nakryiko
2024-10-22 17:58         ` Puranjay Mohan
2024-10-23 15:37           ` Puranjay Mohan
2024-10-21 12:21 ` [PATCH bpf-next 5/5] selftests/bpf: Add a selftest for bpf_csum_diff() Puranjay Mohan
2024-10-21 13:44   ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).