linux-parisc.vger.kernel.org archive mirror
* [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs
@ 2024-10-26 12:53 Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 1/4] net: checksum: move from32to16() to generic header Puranjay Mohan
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Puranjay Mohan @ 2024-10-26 12:53 UTC (permalink / raw)
  To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
	bpf, Daniel Borkmann, David S. Miller, Eduard Zingerman,
	Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
	James E.J. Bottomley, Jiri Olsa, John Fastabend, KP Singh,
	linux-kernel, linux-parisc, linux-riscv, Martin KaFai Lau,
	Mykola Lysenko, netdev, Palmer Dabbelt, Paolo Abeni,
	Paul Walmsley, Puranjay Mohan, Puranjay Mohan, Shuah Khan,
	Song Liu, Stanislav Fomichev, Yonghong Song

Changes in v3:
v2: https://lore.kernel.org/all/20241023153922.86909-1-puranjay@kernel.org/
- Fix sparse warning in patch 2

Changes in v2:
v1: https://lore.kernel.org/all/20241021122112.101513-1-puranjay@kernel.org/
- Remove the patch that adds the benchmark as it is not useful enough to be
  added to the tree.
- Fixed a sparse warning in patch 1.
- Add reviewed-by and acked-by tags.

NOTE: There are some sparse warnings in net/core/filter.c but those are not
worth fixing: BPF helpers take and return u64 values, so using them in
csum-related functions that take and return __sum16 / __wsum would require
casts everywhere.

The bpf_csum_diff() helper currently returns different values on different
architectures because it calls csum_partial(), which is either implemented
by the architecture (x86_64, arm, etc.) or provided by the generic
implementation in lib/checksum.c (arm64, riscv, etc.).

The implementation in lib/checksum.c returns a folded result that is 16
bits wide, but the architecture-specific implementations can return an
unfolded value wider than 16 bits.
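
For illustration, "folding" reduces a 32-bit partial sum to a 16-bit
value with an end-around carry; the from32to16() helper that patch 1
consolidates has this shape:

static inline unsigned short from32to16(unsigned int x)
{
	/* 32 bits --> 16 bits + carry */
	x = (x & 0xffff) + (x >> 16);
	/* 16 bits + carry --> 16 bits including carry */
	x = (x & 0xffff) + (x >> 16);
	return (unsigned short)x;
}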

The helper uses a per-cpu scratchpad buffer for copying the data and then
computing the csum on this buffer. The copy can be avoided by exploiting
the fact that the ones' complement sum distributes over buffer
concatenation, as sketched below.
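
Concretely, the rewrite in patch 2 replaces the copy-and-checksum with
two csum_partial() calls combined with csum_sub():

/* result = csum(to) - csum(from), no scratchpad copy needed */
ret = csum_sub(csum_partial(to, to_size, seed),
	       csum_partial(from, from_size, 0));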

Patch 1 in this series does preparatory work for homogenizing the helper;
patch 2 changes the helper itself. The performance gain can be seen in the
tables below, generated using the benchmark built in patch 4 of v1 of this
series:

  x86-64:
  +-------------+------------------+------------------+-------------+
  | Buffer Size |      Before      |      After       | Improvement |
  +-------------+------------------+------------------+-------------+
  |      4      | 2.296 ± 0.066M/s | 3.415 ± 0.001M/s |   48.73  %  |
  |      8      | 2.320 ± 0.003M/s | 3.409 ± 0.003M/s |   46.93  %  |
  |      16     | 2.315 ± 0.001M/s | 3.414 ± 0.003M/s |   47.47  %  |
  |      20     | 2.318 ± 0.001M/s | 3.416 ± 0.001M/s |   47.36  %  |
  |      32     | 2.308 ± 0.003M/s | 3.413 ± 0.003M/s |   47.87  %  |
  |      40     | 2.300 ± 0.029M/s | 3.413 ± 0.003M/s |   48.39  %  |
  |      64     | 2.286 ± 0.001M/s | 3.410 ± 0.001M/s |   49.16  %  |
  |      128    | 2.250 ± 0.001M/s | 3.404 ± 0.001M/s |   51.28  %  |
  |      256    | 2.173 ± 0.001M/s | 3.383 ± 0.001M/s |   55.68  %  |
  |      512    | 2.023 ± 0.055M/s | 3.340 ± 0.001M/s |   65.10  %  |
  +-------------+------------------+------------------+-------------+

  ARM64:
  +-------------+------------------+------------------+-------------+
  | Buffer Size |      Before      |      After       | Improvement |
  +-------------+------------------+------------------+-------------+
  |      4      | 1.397 ± 0.005M/s | 1.493 ± 0.005M/s |    6.87  %  |
  |      8      | 1.402 ± 0.002M/s | 1.489 ± 0.002M/s |    6.20  %  |
  |      16     | 1.391 ± 0.001M/s | 1.481 ± 0.001M/s |    6.47  %  |
  |      20     | 1.379 ± 0.001M/s | 1.477 ± 0.001M/s |    7.10  %  |
  |      32     | 1.358 ± 0.001M/s | 1.469 ± 0.002M/s |    8.17  %  |
  |      40     | 1.339 ± 0.001M/s | 1.462 ± 0.002M/s |    9.18  %  |
  |      64     | 1.302 ± 0.002M/s | 1.449 ± 0.003M/s |    11.29 %  |
  |      128    | 1.214 ± 0.001M/s | 1.443 ± 0.003M/s |    18.86 %  |
  |      256    | 1.080 ± 0.001M/s | 1.423 ± 0.001M/s |    31.75 %  |
  |      512    | 0.887 ± 0.001M/s | 1.411 ± 0.002M/s |    59.07 %  |
  +-------------+------------------+------------------+-------------+

Patch 3 reverts a workaround that was added to make a selftest pass on all
architectures.

Patch 4 adds a selftest that verifies the results produced by the helper
in multiple modes and edge cases.

Puranjay Mohan (4):
  net: checksum: move from32to16() to generic header
  bpf: bpf_csum_diff: optimize and homogenize for all archs
  selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
  selftests/bpf: Add a selftest for bpf_csum_diff()

 arch/parisc/lib/checksum.c                    |  13 +-
 include/net/checksum.h                        |   6 +
 lib/checksum.c                                |  11 +-
 net/core/filter.c                             |  39 +-
 .../selftests/bpf/prog_tests/test_csum_diff.c | 408 ++++++++++++++++++
 .../selftests/bpf/progs/csum_diff_test.c      |  42 ++
 .../bpf/progs/verifier_array_access.c         |   3 +-
 7 files changed, 471 insertions(+), 51 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
 create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_test.c

-- 
2.40.1



* [PATCH bpf-next v3 1/4] net: checksum: move from32to16() to generic header
  2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
@ 2024-10-26 12:53 ` Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs Puranjay Mohan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2024-10-26 12:53 UTC (permalink / raw)
  To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
	bpf, Daniel Borkmann, David S. Miller, Eduard Zingerman,
	Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
	James E.J. Bottomley, Jiri Olsa, John Fastabend, KP Singh,
	linux-kernel, linux-parisc, linux-riscv, Martin KaFai Lau,
	Mykola Lysenko, netdev, Palmer Dabbelt, Paolo Abeni,
	Paul Walmsley, Puranjay Mohan, Puranjay Mohan, Shuah Khan,
	Song Liu, Stanislav Fomichev, Yonghong Song

from32to16() is used by lib/checksum.c and also by
arch/parisc/lib/checksum.c. The next patch will use it in the
bpf_csum_diff() helper.

Move from32to16() to include/net/checksum.h as csum_from32to16() and
remove the duplicate implementations.
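
Note that csum_from32to16() in the header folds with a rotate-and-add
rather than the two mask-and-add rounds used by the removed copies; the
two forms are equivalent, as this sketch (not part of the patch)
illustrates:

/* Old style: two explicit end-around-carry rounds */
static unsigned short fold_two_rounds(unsigned int x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (unsigned short)x;
}

/* New style: adding the 16-bit rotation leaves the folded sum,
 * end-around carry included, in the top half.
 */
static unsigned short fold_rotate(unsigned int x)
{
	x += (x >> 16) | (x << 16);
	return (unsigned short)(x >> 16);
}

/* fold_two_rounds(x) == fold_rotate(x) for every 32-bit x */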

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 arch/parisc/lib/checksum.c | 13 ++-----------
 include/net/checksum.h     |  6 ++++++
 lib/checksum.c             | 11 +----------
 3 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/arch/parisc/lib/checksum.c b/arch/parisc/lib/checksum.c
index 4818f3db84a5c..59d8c15d81bd0 100644
--- a/arch/parisc/lib/checksum.c
+++ b/arch/parisc/lib/checksum.c
@@ -25,15 +25,6 @@
 	: "=r"(_t)                      \
 	: "r"(_r), "0"(_t));
 
-static inline unsigned short from32to16(unsigned int x)
-{
-	/* 32 bits --> 16 bits + carry */
-	x = (x & 0xffff) + (x >> 16);
-	/* 16 bits + carry --> 16 bits including carry */
-	x = (x & 0xffff) + (x >> 16);
-	return (unsigned short)x;
-}
-
 static inline unsigned int do_csum(const unsigned char * buff, int len)
 {
 	int odd, count;
@@ -85,7 +76,7 @@ static inline unsigned int do_csum(const unsigned char * buff, int len)
 	}
 	if (len & 1)
 		result += le16_to_cpu(*buff);
-	result = from32to16(result);
+	result = csum_from32to16(result);
 	if (odd)
 		result = swab16(result);
 out:
@@ -102,7 +93,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	unsigned int result = do_csum(buff, len);
 	addc(result, sum);
-	return (__force __wsum)from32to16(result);
+	return (__force __wsum)csum_from32to16(result);
 }
 
 EXPORT_SYMBOL(csum_partial);
diff --git a/include/net/checksum.h b/include/net/checksum.h
index 1338cb92c8e72..243f972267b8d 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -151,6 +151,12 @@ static inline void csum_replace(__wsum *csum, __wsum old, __wsum new)
 	*csum = csum_add(csum_sub(*csum, old), new);
 }
 
+static inline unsigned short csum_from32to16(unsigned int sum)
+{
+	sum += (sum >> 16) | (sum << 16);
+	return (unsigned short)(sum >> 16);
+}
+
 struct sk_buff;
 void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 			      __be32 from, __be32 to, bool pseudohdr);
diff --git a/lib/checksum.c b/lib/checksum.c
index 6860d6b05a171..025ba546e1ec6 100644
--- a/lib/checksum.c
+++ b/lib/checksum.c
@@ -34,15 +34,6 @@
 #include <asm/byteorder.h>
 
 #ifndef do_csum
-static inline unsigned short from32to16(unsigned int x)
-{
-	/* add up 16-bit and 16-bit for 16+c bit */
-	x = (x & 0xffff) + (x >> 16);
-	/* add up carry.. */
-	x = (x & 0xffff) + (x >> 16);
-	return x;
-}
-
 static unsigned int do_csum(const unsigned char *buff, int len)
 {
 	int odd;
@@ -90,7 +81,7 @@ static unsigned int do_csum(const unsigned char *buff, int len)
 #else
 		result += (*buff << 8);
 #endif
-	result = from32to16(result);
+	result = csum_from32to16(result);
 	if (odd)
 		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
 out:
-- 
2.40.1



* [PATCH bpf-next v3 2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs
  2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 1/4] net: checksum: move from32to16() to generic header Puranjay Mohan
@ 2024-10-26 12:53 ` Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 3/4] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier Puranjay Mohan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2024-10-26 12:53 UTC (permalink / raw)
  To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
	bpf, Daniel Borkmann, David S. Miller, Eduard Zingerman,
	Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
	James E.J. Bottomley, Jiri Olsa, John Fastabend, KP Singh,
	linux-kernel, linux-parisc, linux-riscv, Martin KaFai Lau,
	Mykola Lysenko, netdev, Palmer Dabbelt, Paolo Abeni,
	Paul Walmsley, Puranjay Mohan, Puranjay Mohan, Shuah Khan,
	Song Liu, Stanislav Fomichev, Yonghong Song

1. Optimization
   ------------

The current implementation copies the 'from' and 'to' buffers to a
scratchpad, taking the bitwise NOT of the 'from' buffer while copying.
csum_partial() is then called on this scratchpad.

So, mathematically, the current implementation computes:

	result = csum(to - from)

Here, 'to' and '~from' are copied into the scratchpad buffer; the
scratchpad is needed because csum_partial() takes a single contiguous
buffer, not two disjoint buffers like 'to' and 'from'.

We can rewrite this equation as:

	result = csum(to) - csum(from)

using the distributive property of csum().

This allows 'to' and 'from' to be at different locations, so the
scratchpad and the copying are no longer needed.

In C, this looks like:

result = csum_sub(csum_partial(to, to_size, seed),
                  csum_partial(from, from_size, 0));
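
For reference, csum_sub() is ones' complement subtraction built on top
of csum_add(); their definitions in include/net/checksum.h are roughly:

static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
{
	u32 res = (__force u32)csum;

	res += (__force u32)addend;
	return (__force __wsum)(res + (res < (__force u32)addend));
}

static __always_inline __wsum csum_sub(__wsum csum, __wsum addend)
{
	return csum_add(csum, ~addend);
}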

2. Homogenization
   --------------

The bpf_csum_diff() helper calls csum_partial(), which is implemented by
some architectures like arm and x86, while other architectures rely on
the generic implementation in lib/checksum.c.

The generic implementation in lib/checksum.c returns a 16-bit value, but
the arch-specific implementations can return more than 16 bits. This
works out in most places because the result is passed through
csum_fold(), which turns it into a 16-bit value, before it is used.

bpf_csum_diff() returns the value from csum_partial() directly, so the
returned values can differ across architectures. See the discussion in
[1]:

For the int value 28, the calculated checksums are:

x86                    :    -29 : 0xffffffe3
generic (arm64, riscv) :  65507 : 0x0000ffe3
arm                    : 131042 : 0x0001ffe2
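
All three values fold to the same 16-bit result, which is what the fix
below relies on; a quick check with the new helper:

csum_from32to16(0xffffffe3) == 0xffe3	/* x86 */
csum_from32to16(0x0000ffe3) == 0xffe3	/* generic */
csum_from32to16(0x0001ffe2) == 0xffe3	/* arm */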

Pass the result of bpf_csum_diff() through csum_from32to16() before
returning it, to make the result uniform across all architectures.

NOTE: csum_from32to16() is used instead of csum_fold() because
csum_fold() performs the same fold plus a bitwise NOT of the result,
which is not what we want here.
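
For comparison, the generic csum_fold() in
include/asm-generic/checksum.h is roughly the same fold followed by a
bitwise NOT:

static inline __sum16 csum_fold(__wsum csum)
{
	u32 sum = (__force u32)csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (__force __sum16)~sum;
}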

[1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/filter.c | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index e31ee8be2de07..f2f8e64f19066 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1654,18 +1654,6 @@ void sk_reuseport_prog_free(struct bpf_prog *prog)
 		bpf_prog_destroy(prog);
 }
 
-struct bpf_scratchpad {
-	union {
-		__be32 diff[MAX_BPF_STACK / sizeof(__be32)];
-		u8     buff[MAX_BPF_STACK];
-	};
-	local_lock_t	bh_lock;
-};
-
-static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp) = {
-	.bh_lock	= INIT_LOCAL_LOCK(bh_lock),
-};
-
 static inline int __bpf_try_make_writable(struct sk_buff *skb,
 					  unsigned int write_len)
 {
@@ -2022,11 +2010,6 @@ static const struct bpf_func_proto bpf_l4_csum_replace_proto = {
 BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size,
 	   __be32 *, to, u32, to_size, __wsum, seed)
 {
-	struct bpf_scratchpad *sp = this_cpu_ptr(&bpf_sp);
-	u32 diff_size = from_size + to_size;
-	int i, j = 0;
-	__wsum ret;
-
 	/* This is quite flexible, some examples:
 	 *
 	 * from_size == 0, to_size > 0,  seed := csum --> pushing data
@@ -2035,19 +2018,19 @@ BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size,
 	 *
 	 * Even for diffing, from_size and to_size don't need to be equal.
 	 */
-	if (unlikely(((from_size | to_size) & (sizeof(__be32) - 1)) ||
-		     diff_size > sizeof(sp->diff)))
-		return -EINVAL;
 
-	local_lock_nested_bh(&bpf_sp.bh_lock);
-	for (i = 0; i < from_size / sizeof(__be32); i++, j++)
-		sp->diff[j] = ~from[i];
-	for (i = 0; i <   to_size / sizeof(__be32); i++, j++)
-		sp->diff[j] = to[i];
+	__wsum ret = seed;
 
-	ret = csum_partial(sp->diff, diff_size, seed);
-	local_unlock_nested_bh(&bpf_sp.bh_lock);
-	return ret;
+	if (from_size && to_size)
+		ret = csum_sub(csum_partial(to, to_size, ret),
+			       csum_partial(from, from_size, 0));
+	else if (to_size)
+		ret = csum_partial(to, to_size, ret);
+
+	else if (from_size)
+		ret = ~csum_partial(from, from_size, ~ret);
+
+	return csum_from32to16((__force unsigned int)ret);
 }
 
 static const struct bpf_func_proto bpf_csum_diff_proto = {
-- 
2.40.1



* [PATCH bpf-next v3 3/4] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
  2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 1/4] net: checksum: move from32to16() to generic header Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs Puranjay Mohan
@ 2024-10-26 12:53 ` Puranjay Mohan
  2024-10-26 12:53 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add a selftest for bpf_csum_diff() Puranjay Mohan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2024-10-26 12:53 UTC (permalink / raw)
  To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
	bpf, Daniel Borkmann, David S. Miller, Eduard Zingerman,
	Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
	James E.J. Bottomley, Jiri Olsa, John Fastabend, KP Singh,
	linux-kernel, linux-parisc, linux-riscv, Martin KaFai Lau,
	Mykola Lysenko, netdev, Palmer Dabbelt, Paolo Abeni,
	Paul Walmsley, Puranjay Mohan, Puranjay Mohan, Shuah Khan,
	Song Liu, Stanislav Fomichev, Yonghong Song

The bpf_csum_diff() helper has been fixed to return a 16-bit value on
all archs, so the result no longer needs to be masked.

This commit basically reverts:

commit 6185266c5a85 ("selftests/bpf: Mask bpf_csum_diff() return value
to 16 bits in test_verifier")

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/testing/selftests/bpf/progs/verifier_array_access.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_array_access.c b/tools/testing/selftests/bpf/progs/verifier_array_access.c
index 95d7ecc12963b..4195aa824ba55 100644
--- a/tools/testing/selftests/bpf/progs/verifier_array_access.c
+++ b/tools/testing/selftests/bpf/progs/verifier_array_access.c
@@ -368,8 +368,7 @@ __naked void a_read_only_array_2_1(void)
 	r4 = 0;						\
 	r5 = 0;						\
 	call %[bpf_csum_diff];				\
-l0_%=:	r0 &= 0xffff;					\
-	exit;						\
+l0_%=:	exit;						\
 "	:
 	: __imm(bpf_csum_diff),
 	  __imm(bpf_map_lookup_elem),
-- 
2.40.1



* [PATCH bpf-next v3 4/4] selftests/bpf: Add a selftest for bpf_csum_diff()
  2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
                   ` (2 preceding siblings ...)
  2024-10-26 12:53 ` [PATCH bpf-next v3 3/4] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier Puranjay Mohan
@ 2024-10-26 12:53 ` Puranjay Mohan
  2024-10-30 14:40 ` [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs patchwork-bot+netdevbpf
  2024-12-11 22:32 ` patchwork-bot+linux-riscv
  5 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2024-10-26 12:53 UTC (permalink / raw)
  To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
	bpf, Daniel Borkmann, David S. Miller, Eduard Zingerman,
	Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
	James E.J. Bottomley, Jiri Olsa, John Fastabend, KP Singh,
	linux-kernel, linux-parisc, linux-riscv, Martin KaFai Lau,
	Mykola Lysenko, netdev, Palmer Dabbelt, Paolo Abeni,
	Paul Walmsley, Puranjay Mohan, Puranjay Mohan, Shuah Khan,
	Song Liu, Stanislav Fomichev, Yonghong Song

Add a selftest for the bpf_csum_diff() helper. The selftest runs the
helper in all three configurations (push, pull, and diff) and verifies
its output. The expected results were computed by hand and cross-checked
against the helper's older implementation.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 .../selftests/bpf/prog_tests/test_csum_diff.c | 408 ++++++++++++++++++
 .../selftests/bpf/progs/csum_diff_test.c      |  42 ++
 2 files changed, 450 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
 create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_test.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_csum_diff.c b/tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
new file mode 100644
index 0000000000000..107b20d43e839
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
@@ -0,0 +1,408 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates */
+#include <test_progs.h>
+#include "csum_diff_test.skel.h"
+
+#define BUFF_SZ 512
+
+struct testcase {
+	unsigned long long to_buff[BUFF_SZ / 8];
+	unsigned int to_buff_len;
+	unsigned long long from_buff[BUFF_SZ / 8];
+	unsigned int from_buff_len;
+	unsigned short seed;
+	unsigned short result;
+};
+
+#define NUM_PUSH_TESTS 4
+
+struct testcase push_tests[NUM_PUSH_TESTS] = {
+	{
+		.to_buff = {
+			0xdeadbeefdeadbeef,
+		},
+		.to_buff_len = 8,
+		.from_buff = {},
+		.from_buff_len = 0,
+		.seed = 0,
+		.result = 0x3b3b
+	},
+	{
+		.to_buff = {
+			0xdeadbeefdeadbeef,
+			0xbeefdeadbeefdead,
+		},
+		.to_buff_len = 16,
+		.from_buff = {},
+		.from_buff_len = 0,
+		.seed = 0x1234,
+		.result = 0x88aa
+	},
+	{
+		.to_buff = {
+			0xdeadbeefdeadbeef,
+			0xbeefdeadbeefdead,
+		},
+		.to_buff_len = 15,
+		.from_buff = {},
+		.from_buff_len = 0,
+		.seed = 0x1234,
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+		.result = 0xcaa9
+#else
+		.result = 0x87fd
+#endif
+	},
+	{
+		.to_buff = {
+			0x327b23c66b8b4567,
+			0x66334873643c9869,
+			0x19495cff74b0dc51,
+			0x625558ec2ae8944a,
+			0x46e87ccd238e1f29,
+			0x507ed7ab3d1b58ba,
+			0x41b71efb2eb141f2,
+			0x7545e14679e2a9e3,
+			0x5bd062c2515f007c,
+			0x4db127f812200854,
+			0x1f16e9e80216231b,
+			0x66ef438d1190cde7,
+			0x3352255a140e0f76,
+			0x0ded7263109cf92e,
+			0x1befd79f7fdcc233,
+			0x6b68079a41a7c4c9,
+			0x25e45d324e6afb66,
+			0x431bd7b7519b500d,
+			0x7c83e4583f2dba31,
+			0x62bbd95a257130a3,
+			0x628c895d436c6125,
+			0x721da317333ab105,
+			0x2d1d5ae92443a858,
+			0x75a2a8d46763845e,
+			0x79838cb208edbdab,
+			0x0b03e0c64353d0cd,
+			0x54e49eb4189a769b,
+			0x2ca8861171f32454,
+			0x02901d820836c40e,
+			0x081386413a95f874,
+			0x7c3dbd3d1e7ff521,
+			0x6ceaf087737b8ddc,
+			0x4516dde922221a70,
+			0x614fd4a13006c83e,
+			0x5577f8e1419ac241,
+			0x05072367440badfc,
+			0x77465f013804823e,
+			0x5c482a977724c67e,
+			0x5e884adc2463b9ea,
+			0x2d51779651ead36b,
+			0x153ea438580bd78f,
+			0x70a64e2a3855585c,
+			0x2a487cb06a2342ec,
+			0x725a06fb1d4ed43b,
+			0x57e4ccaf2cd89a32,
+			0x4b588f547a6d8d3c,
+			0x6de91b18542289ec,
+			0x7644a45c38437fdb,
+			0x684a481a32fff902,
+			0x749abb43579478fe,
+			0x1ba026fa3dc240fb,
+			0x75c6c33a79a1deaa,
+			0x70c6a52912e685fb,
+			0x374a3fe6520eedd1,
+			0x23f9c13c4f4ef005,
+			0x275ac794649bb77c,
+			0x1cf10fd839386575,
+			0x235ba861180115be,
+			0x354fe9f947398c89,
+			0x741226bb15b5af5c,
+			0x10233c990d34b6a8,
+			0x615740953f6ab60f,
+			0x77ae35eb7e0c57b1,
+			0x310c50b3579be4f1,
+		},
+		.to_buff_len = 512,
+		.from_buff = {},
+		.from_buff_len = 0,
+		.seed = 0xffff,
+		.result = 0xca45
+	},
+};
+
+#define NUM_PULL_TESTS 4
+
+struct testcase pull_tests[NUM_PULL_TESTS] = {
+	{
+		.from_buff = {
+			0xdeadbeefdeadbeef,
+		},
+		.from_buff_len = 8,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0,
+		.result = 0xc4c4
+	},
+	{
+		.from_buff = {
+			0xdeadbeefdeadbeef,
+			0xbeefdeadbeefdead,
+		},
+		.from_buff_len = 16,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0x1234,
+		.result = 0x9bbd
+	},
+	{
+		.from_buff = {
+			0xdeadbeefdeadbeef,
+			0xbeefdeadbeefdead,
+		},
+		.from_buff_len = 15,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0x1234,
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+		.result = 0x59be
+#else
+		.result = 0x9c6a
+#endif
+	},
+	{
+		.from_buff = {
+			0x327b23c66b8b4567,
+			0x66334873643c9869,
+			0x19495cff74b0dc51,
+			0x625558ec2ae8944a,
+			0x46e87ccd238e1f29,
+			0x507ed7ab3d1b58ba,
+			0x41b71efb2eb141f2,
+			0x7545e14679e2a9e3,
+			0x5bd062c2515f007c,
+			0x4db127f812200854,
+			0x1f16e9e80216231b,
+			0x66ef438d1190cde7,
+			0x3352255a140e0f76,
+			0x0ded7263109cf92e,
+			0x1befd79f7fdcc233,
+			0x6b68079a41a7c4c9,
+			0x25e45d324e6afb66,
+			0x431bd7b7519b500d,
+			0x7c83e4583f2dba31,
+			0x62bbd95a257130a3,
+			0x628c895d436c6125,
+			0x721da317333ab105,
+			0x2d1d5ae92443a858,
+			0x75a2a8d46763845e,
+			0x79838cb208edbdab,
+			0x0b03e0c64353d0cd,
+			0x54e49eb4189a769b,
+			0x2ca8861171f32454,
+			0x02901d820836c40e,
+			0x081386413a95f874,
+			0x7c3dbd3d1e7ff521,
+			0x6ceaf087737b8ddc,
+			0x4516dde922221a70,
+			0x614fd4a13006c83e,
+			0x5577f8e1419ac241,
+			0x05072367440badfc,
+			0x77465f013804823e,
+			0x5c482a977724c67e,
+			0x5e884adc2463b9ea,
+			0x2d51779651ead36b,
+			0x153ea438580bd78f,
+			0x70a64e2a3855585c,
+			0x2a487cb06a2342ec,
+			0x725a06fb1d4ed43b,
+			0x57e4ccaf2cd89a32,
+			0x4b588f547a6d8d3c,
+			0x6de91b18542289ec,
+			0x7644a45c38437fdb,
+			0x684a481a32fff902,
+			0x749abb43579478fe,
+			0x1ba026fa3dc240fb,
+			0x75c6c33a79a1deaa,
+			0x70c6a52912e685fb,
+			0x374a3fe6520eedd1,
+			0x23f9c13c4f4ef005,
+			0x275ac794649bb77c,
+			0x1cf10fd839386575,
+			0x235ba861180115be,
+			0x354fe9f947398c89,
+			0x741226bb15b5af5c,
+			0x10233c990d34b6a8,
+			0x615740953f6ab60f,
+			0x77ae35eb7e0c57b1,
+			0x310c50b3579be4f1,
+		},
+		.from_buff_len = 512,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0xffff,
+		.result = 0x35ba
+	},
+};
+
+#define NUM_DIFF_TESTS 4
+
+struct testcase diff_tests[NUM_DIFF_TESTS] = {
+	{
+		.from_buff = {
+			0xdeadbeefdeadbeef,
+		},
+		.from_buff_len = 8,
+		.to_buff = {
+			0xabababababababab,
+		},
+		.to_buff_len = 8,
+		.seed = 0,
+		.result = 0x7373
+	},
+	{
+		.from_buff = {
+			0xdeadbeefdeadbeef,
+		},
+		.from_buff_len = 7,
+		.to_buff = {
+			0xabababababababab,
+		},
+		.to_buff_len = 7,
+		.seed = 0,
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+		.result = 0xa673
+#else
+		.result = 0x73b7
+#endif
+	},
+	{
+		.from_buff = {
+			0,
+		},
+		.from_buff_len = 8,
+		.to_buff = {
+			0xabababababababab,
+		},
+		.to_buff_len = 8,
+		.seed = 0,
+		.result = 0xaeae
+	},
+	{
+		.from_buff = {
+			0xdeadbeefdeadbeef
+		},
+		.from_buff_len = 8,
+		.to_buff = {
+			0,
+		},
+		.to_buff_len = 8,
+		.seed = 0xffff,
+		.result = 0xc4c4
+	},
+};
+
+#define NUM_EDGE_TESTS 4
+
+struct testcase edge_tests[NUM_EDGE_TESTS] = {
+	{
+		.from_buff = {},
+		.from_buff_len = 0,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0,
+		.result = 0
+	},
+	{
+		.from_buff = {
+			0x1234
+		},
+		.from_buff_len = 0,
+		.to_buff = {
+			0x1234
+		},
+		.to_buff_len = 0,
+		.seed = 0,
+		.result = 0
+	},
+	{
+		.from_buff = {},
+		.from_buff_len = 0,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0x1234,
+		.result = 0x1234
+	},
+	{
+		.from_buff = {},
+		.from_buff_len = 512,
+		.to_buff = {},
+		.to_buff_len = 0,
+		.seed = 0xffff,
+		.result = 0xffff
+	},
+};
+
+static unsigned short trigger_csum_diff(const struct csum_diff_test *skel)
+{
+	u8 tmp_out[64 << 2] = {};
+	u8 tmp_in[64] = {};
+	int err;
+	int pfd;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = tmp_in,
+		.data_size_in = sizeof(tmp_in),
+		.data_out = tmp_out,
+		.data_size_out = sizeof(tmp_out),
+		.repeat = 1,
+	);
+	pfd = bpf_program__fd(skel->progs.compute_checksum);
+	err = bpf_prog_test_run_opts(pfd, &topts);
+	if (err)
+		return -1;
+
+	return skel->bss->result;
+}
+
+static void test_csum_diff(struct testcase *tests, int num_tests)
+{
+	struct csum_diff_test *skel;
+	unsigned short got;
+	int err;
+
+	for (int i = 0; i < num_tests; i++) {
+		skel = csum_diff_test__open();
+		if (!ASSERT_OK_PTR(skel, "csum_diff_test open"))
+			return;
+
+		skel->rodata->to_buff_len = tests[i].to_buff_len;
+		skel->rodata->from_buff_len = tests[i].from_buff_len;
+
+		err = csum_diff_test__load(skel);
+		if (!ASSERT_EQ(err, 0, "csum_diff_test load"))
+			goto out;
+
+		memcpy(skel->bss->to_buff, tests[i].to_buff, tests[i].to_buff_len);
+		memcpy(skel->bss->from_buff, tests[i].from_buff, tests[i].from_buff_len);
+		skel->bss->seed = tests[i].seed;
+
+		got = trigger_csum_diff(skel);
+		ASSERT_EQ(got, tests[i].result, "csum_diff result");
+
+		csum_diff_test__destroy(skel);
+	}
+
+	return;
+out:
+	csum_diff_test__destroy(skel);
+}
+
+void test_test_csum_diff(void)
+{
+	if (test__start_subtest("csum_diff_push"))
+		test_csum_diff(push_tests, NUM_PUSH_TESTS);
+	if (test__start_subtest("csum_diff_pull"))
+		test_csum_diff(pull_tests, NUM_PULL_TESTS);
+	if (test__start_subtest("csum_diff_diff"))
+		test_csum_diff(diff_tests, NUM_DIFF_TESTS);
+	if (test__start_subtest("csum_diff_edge"))
+		test_csum_diff(edge_tests, NUM_EDGE_TESTS);
+}
diff --git a/tools/testing/selftests/bpf/progs/csum_diff_test.c b/tools/testing/selftests/bpf/progs/csum_diff_test.c
new file mode 100644
index 0000000000000..9438f1773a589
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/csum_diff_test.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define BUFF_SZ 512
+
+/* Will be updated by the test before program loading */
+char to_buff[BUFF_SZ];
+const volatile unsigned int to_buff_len = 0;
+char from_buff[BUFF_SZ];
+const volatile unsigned int from_buff_len = 0;
+unsigned short seed = 0;
+
+short result;
+
+char _license[] SEC("license") = "GPL";
+
+SEC("tc")
+int compute_checksum(void *ctx)
+{
+	int to_len_half = to_buff_len / 2;
+	int from_len_half = from_buff_len / 2;
+	short result2;
+
+	/* Calculate checksum in one go */
+	result2 = bpf_csum_diff((void *)from_buff, from_buff_len,
+				(void *)to_buff, to_buff_len, seed);
+
+	/* Calculate checksum by concatenating two bpf_csum_diff() calls */
+	result = bpf_csum_diff((void *)from_buff, from_buff_len - from_len_half,
+			       (void *)to_buff, to_buff_len - to_len_half, seed);
+
+	result = bpf_csum_diff((void *)from_buff + (from_buff_len - from_len_half), from_len_half,
+			       (void *)to_buff + (to_buff_len - to_len_half), to_len_half, result);
+
+	result = (result == result2) ? result : 0;
+
+	return 0;
+}
-- 
2.40.1



* Re: [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs
  2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
                   ` (3 preceding siblings ...)
  2024-10-26 12:53 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add a selftest for bpf_csum_diff() Puranjay Mohan
@ 2024-10-30 14:40 ` patchwork-bot+netdevbpf
  2024-12-11 22:32 ` patchwork-bot+linux-riscv
  5 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-10-30 14:40 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: aou, ast, akpm, andrii, bpf, daniel, davem, eddyz87, edumazet,
	haoluo, deller, kuba, James.Bottomley, jolsa, john.fastabend,
	kpsingh, linux-kernel, linux-parisc, linux-riscv, martin.lau,
	mykolal, netdev, palmer, pabeni, paul.walmsley, puranjay12, shuah,
	song, sdf, yonghong.song

Hello:

This series was applied to bpf/bpf-next.git (net)
by Daniel Borkmann <daniel@iogearbox.net>:

On Sat, 26 Oct 2024 12:53:35 +0000 you wrote:
> Changes in v3:
> v2: https://lore.kernel.org/all/20241023153922.86909-1-puranjay@kernel.org/
> - Fix sparse warning in patch 2
> 
> Changes in v2:
> v1: https://lore.kernel.org/all/20241021122112.101513-1-puranjay@kernel.org/
> - Remove the patch that adds the benchmark as it is not useful enough to be
>   added to the tree.
> - Fixed a sparse warning in patch 1.
> - Add reviewed-by and acked-by tags.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/4] net: checksum: move from32to16() to generic header
    https://git.kernel.org/bpf/bpf-next/c/db71aae70e3e
  - [bpf-next,v3,2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs
    https://git.kernel.org/bpf/bpf-next/c/6a4794d5a3e2
  - [bpf-next,v3,3/4] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
    https://git.kernel.org/bpf/bpf-next/c/b87f584024e1
  - [bpf-next,v3,4/4] selftests/bpf: Add a selftest for bpf_csum_diff()
    https://git.kernel.org/bpf/bpf-next/c/00c1f3dc66a3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs
  2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
                   ` (4 preceding siblings ...)
  2024-10-30 14:40 ` [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs patchwork-bot+netdevbpf
@ 2024-12-11 22:32 ` patchwork-bot+linux-riscv
  5 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-12-11 22:32 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: linux-riscv, aou, ast, akpm, andrii, bpf, daniel, davem, eddyz87,
	edumazet, haoluo, deller, kuba, James.Bottomley, jolsa,
	john.fastabend, kpsingh, linux-kernel, linux-parisc, martin.lau,
	mykolal, netdev, palmer, pabeni, paul.walmsley, puranjay12, shuah,
	song, sdf, yonghong.song

Hello:

This series was applied to riscv/linux.git (fixes)
by Daniel Borkmann <daniel@iogearbox.net>:

On Sat, 26 Oct 2024 12:53:35 +0000 you wrote:
> Changes in v3:
> v2: https://lore.kernel.org/all/20241023153922.86909-1-puranjay@kernel.org/
> - Fix sparse warning in patch 2
> 
> Changes in v2:
> v1: https://lore.kernel.org/all/20241021122112.101513-1-puranjay@kernel.org/
> - Remove the patch that adds the benchmark as it is not useful enough to be
>   added to the tree.
> - Fixed a sparse warning in patch 1.
> - Add reviewed-by and acked-by tags.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/4] net: checksum: move from32to16() to generic header
    https://git.kernel.org/riscv/c/db71aae70e3e
  - [bpf-next,v3,2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs
    https://git.kernel.org/riscv/c/6a4794d5a3e2
  - [bpf-next,v3,3/4] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
    https://git.kernel.org/riscv/c/b87f584024e1
  - [bpf-next,v3,4/4] selftests/bpf: Add a selftest for bpf_csum_diff()
    https://git.kernel.org/riscv/c/00c1f3dc66a3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




end of thread, other threads:[~2024-12-11 22:32 UTC | newest]

Thread overview: 7+ messages
2024-10-26 12:53 [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Puranjay Mohan
2024-10-26 12:53 ` [PATCH bpf-next v3 1/4] net: checksum: move from32to16() to generic header Puranjay Mohan
2024-10-26 12:53 ` [PATCH bpf-next v3 2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs Puranjay Mohan
2024-10-26 12:53 ` [PATCH bpf-next v3 3/4] selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier Puranjay Mohan
2024-10-26 12:53 ` [PATCH bpf-next v3 4/4] selftests/bpf: Add a selftest for bpf_csum_diff() Puranjay Mohan
2024-10-30 14:40 ` [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs patchwork-bot+netdevbpf
2024-12-11 22:32 ` patchwork-bot+linux-riscv
