linux-riscv.lists.infradead.org archive mirror
* [PATCH 0/5] riscv: Add fine-tuned checksum functions
@ 2023-08-27  1:26 Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 1/5] riscv: Checksum header Charlie Jenkins
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-27  1:26 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins

Each architecture generally implements fine-tuned checksum functions that
leverage its instruction set. This series adds riscv-specific versions of
the main checksum functions used in networking.

Vector support is included in this series to start a discussion on it,
since I am not very familiar with the vector instructions. I wasn't able
to get the vector patches to compile in the kernel, but as vector support
matures I will go back and fix them up. I have tested the vector routines
as standalone algorithms in QEMU.

These functions work best with the Zba and Zbb extensions, so support
for those extensions has been added to the kernel.

To test this series, enable CONFIG_KUNIT, then CONFIG_CHECKSUM_KUNIT
and CONFIG_RISCV_CHECKSUM_KUNIT.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Charlie Jenkins (5):
      riscv: Checksum header
      riscv: Add checksum library
      riscv: Vector checksum header
      riscv: Vector checksum library
      riscv: Test checksum functions

 arch/riscv/Kconfig                    |  23 +++
 arch/riscv/Kconfig.debug              |   1 +
 arch/riscv/Makefile                   |   2 +
 arch/riscv/include/asm/checksum.h     | 165 ++++++++++++++++++++
 arch/riscv/lib/Kconfig.debug          |  31 ++++
 arch/riscv/lib/Makefile               |   3 +
 arch/riscv/lib/csum.c                 | 283 ++++++++++++++++++++++++++++++++++
 arch/riscv/lib/riscv_checksum_kunit.c | 111 +++++++++++++
 8 files changed, 619 insertions(+)
---
base-commit: 7bafbd4027ae86572f308c4ddf93120c90126332
change-id: 20230804-optimize_checksum-db145288ac21
-- 
- Charlie



* [PATCH 1/5] riscv: Checksum header
  2023-08-27  1:26 [PATCH 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
@ 2023-08-27  1:26 ` Charlie Jenkins
  2023-08-27  1:42   ` Conor Dooley
  2023-08-27  1:26 ` [PATCH 2/5] riscv: Add checksum library Charlie Jenkins
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-27  1:26 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins

Provide checksum algorithms designed to leverage riscv instructions such
as rotate. On 64-bit, the algorithms can take advantage of the wider
registers to avoid some overflow checking.

Add a Kconfig option for the Zba extension and add Zba and Zbb to the
march string.
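
For review purposes, the two building blocks this header relies on can be
sketched in plain C roughly as below. This is illustrative only; the helper
names csum_add64/csum_fold32 are made up for the example and are not part of
the patch:

	/* Accumulate with end-around carry, using the full register width. */
	static inline unsigned long csum_add64(unsigned long sum, unsigned long addend)
	{
		sum += addend;
		return sum + (sum < addend);	/* fold the carry back in */
	}

	/*
	 * Fold a 32-bit partial sum down to 16 bits; with Zbb the rotate
	 * becomes a single rotate instruction.
	 */
	static inline unsigned short csum_fold32(unsigned int sum)
	{
		sum += (sum >> 16) | (sum << 16);
		return (unsigned short)~(sum >> 16);
	}

On rv64 the 64-bit accumulator needs far fewer of these carry corrections
than rv32, which is the overflow-checking saving mentioned above.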

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/Kconfig                | 23 +++++++++++
 arch/riscv/Makefile               |  2 +
 arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 4c07b9189c86..8d7e475ca28d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
 
 	  If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBA
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+	depends on AS_HAS_OPTION_ARCH
+
+config RISCV_ISA_ZBA
+	bool "Zba extension support for bit manipulation instructions"
+	depends on TOOLCHAIN_HAS_ZBA
+	depends on MMU
+	depends on RISCV_ALTERNATIVE
+	default y
+	help
+	   Adds support to dynamically detect the presence of the Zba
+	   extension (address generation) and enable its usage.
+
+	   The Zba extension provides instructions to accelerate a number
+	   of bit-specific address creation operations.
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZBB
 	bool
 	default y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 6ec6d52a4180..51fa3f67fc9a 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
 riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
 riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
 riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
+riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
+riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
 
 ifdef CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC
 KBUILD_CFLAGS += -Wa,-misa-spec=2.2
diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
new file mode 100644
index 000000000000..cd98f8cde888
--- /dev/null
+++ b/arch/riscv/include/asm/checksum.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * IP checksum routines
+ *
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#ifndef __ASM_RISCV_CHECKSUM_H
+#define __ASM_RISCV_CHECKSUM_H
+
+#include <linux/in6.h>
+#include <linux/uaccess.h>
+
+/* Default version is sufficient for 32 bit */
+#ifdef CONFIG_64BIT
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);
+#endif
+
+/*
+ *	Fold a partial checksum without adding pseudo headers
+ */
+static inline __sum16 csum_fold(__wsum sum)
+{
+	sum += (sum >> 16) | (sum << 16);
+	return (__force __sum16)(~(sum >> 16));
+}
+
+#define csum_fold csum_fold
+
+/*
+ *	This is a version of ip_compute_csum() optimized for IP headers,
+ *	which always checksums on 4-octet boundaries.
+ *	Optimized for 32 and 64 bit platforms, with and without vector, with and
+ *	without the bitmanip extensions zba/zbb.
+ */
+#ifdef CONFIG_32BIT
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+	__wsum csum = 0;
+	int pos = 0;
+
+	do {
+		csum += ((const __wsum *)iph)[pos];
+		csum += csum < ((const __wsum *)iph)[pos];
+	} while (++pos < ihl);
+	return csum_fold(csum);
+}
+#else
+
+/*
+ * Quickly compute an IP checksum with the assumption that IPv4 headers will
+ * always be in multiples of 32-bits, and have an ihl of at least 5.
+ * @ihl is the number of 32-bit words and must be greater than or equal to 5.
+ * @iph is also assumed to be word-aligned.
+ */
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+	unsigned long beginning;
+	unsigned long csum = 0;
+	int pos = 4;
+
+	beginning = ((const unsigned long *)iph)[0];
+	beginning += ((const unsigned long *)iph)[1];
+	beginning += beginning < ((const unsigned long *)iph)[1];
+
+	do {
+		csum += ((const unsigned int *)iph)[pos];
+	} while (++pos < ihl);
+	csum += beginning;
+	csum += csum < beginning;
+	csum += (csum >> 32) | (csum << 32); // Calculate overflow
+	return csum_fold((__force __wsum)(csum >> 32));
+}
+#endif
+#define ip_fast_csum ip_fast_csum
+
+#ifdef CONFIG_64BIT
+extern unsigned int do_csum(const unsigned char *buff, int len);
+#define do_csum do_csum
+#endif
+
+#include <asm-generic/checksum.h>
+
+#endif

-- 
2.41.0



* [PATCH 2/5] riscv: Add checksum library
  2023-08-27  1:26 [PATCH 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 1/5] riscv: Checksum header Charlie Jenkins
@ 2023-08-27  1:26 ` Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 3/5] riscv: Vector checksum header Charlie Jenkins
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-27  1:26 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins

Provide 32-bit and 64-bit versions of do_csum. When compiled for 32-bit,
it loads from the buffer in groups of 32 bits, and when compiled for
64-bit it loads in groups of 64 bits.
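
As a reading aid, the 64-bit variant can be modelled in freestanding C
roughly as below. This is a simplified sketch only (little-endian, len > 0
assumed, and the name wordwise_sum is made up), not the code being added;
like the patch, it deliberately over-reads the first and last word and masks
off the bytes that do not belong to the buffer:

	#include <stddef.h>
	#include <stdint.h>

	static uint64_t wordwise_sum(const unsigned char *buff, size_t len)
	{
		const uint64_t *ptr = (const uint64_t *)((uintptr_t)buff & ~(uintptr_t)7);
		unsigned int head_bits = ((uintptr_t)buff & 7) * 8;
		uint64_t sum = 0, data = *ptr;

		/* Little endian: the bytes before buff sit in the low-order bits. */
		data = (data >> head_bits) << head_bits;
		len += (uintptr_t)buff & 7;

		while (len > sizeof(uint64_t)) {
			sum += data;
			sum += sum < data;		/* end-around carry */
			data = *++ptr;
			len -= sizeof(uint64_t);
		}

		/* Mask off the bytes past the end (high-order bits), then add. */
		data <<= (sizeof(uint64_t) - len) * 8;
		data >>= (sizeof(uint64_t) - len) * 8;
		sum += data;
		return sum + (sum < data);	/* caller folds this down to 16 bits */
	}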

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/checksum.h |   2 -
 arch/riscv/lib/Makefile           |   1 +
 arch/riscv/lib/csum.c             | 118 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index cd98f8cde888..af49b3409576 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -76,10 +76,8 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 #endif
 #define ip_fast_csum ip_fast_csum
 
-#ifdef CONFIG_64BIT
 extern unsigned int do_csum(const unsigned char *buff, int len);
 #define do_csum do_csum
-#endif
 
 #include <asm-generic/checksum.h>
 
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 26cb2502ecf8..2aa1a4ad361f 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -6,6 +6,7 @@ lib-y			+= memmove.o
 lib-y			+= strcmp.o
 lib-y			+= strlen.o
 lib-y			+= strncmp.o
+lib-y			+= csum.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
new file mode 100644
index 000000000000..2037041ce8a0
--- /dev/null
+++ b/arch/riscv/lib/csum.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IP checksum library
+ *
+ * Influenced by arch/arm64/lib/csum.c
+ * Copyright (C) 2023 Rivos Inc.
+ */
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/kasan-checks.h>
+#include <linux/kernel.h>
+
+#include <net/checksum.h>
+
+/* Default version is sufficient for 32 bit */
+#ifdef CONFIG_64BIT
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum csum)
+{
+	unsigned long sum, ulen, uproto;
+
+	uproto = (unsigned long)htonl(proto);
+	ulen = (unsigned long)htonl(len);
+	sum = (unsigned long)csum;
+
+	sum += *(const unsigned long *)saddr->s6_addr;
+	sum += sum < csum;
+
+	sum += *((const unsigned long *)saddr->s6_addr + 1);
+	sum += sum < *((const unsigned long *)saddr->s6_addr + 1);
+
+	sum += *(const unsigned long *)daddr->s6_addr;
+	sum += sum < *(const unsigned long *)daddr->s6_addr;
+
+	sum += *((const unsigned long *)daddr->s6_addr + 1);
+	sum += sum < *((const unsigned long *)daddr->s6_addr + 1);
+
+	sum += ulen;
+	sum += sum < ulen;
+
+	sum += uproto;
+	sum += sum < uproto;
+
+	sum += (sum >> 32) | (sum << 32);
+	sum >>= 32;
+	return csum_fold((__force __wsum)sum);
+}
+EXPORT_SYMBOL(csum_ipv6_magic);
+#endif
+
+#ifdef CONFIG_32BIT
+typedef unsigned int csum_t;
+#define OFFSET_MASK 3
+#else
+typedef unsigned long csum_t;
+#define OFFSET_MASK 7
+#endif
+
+/*
+ * Perform a checksum on an arbitrary memory address.
+ * The algorithm accounts for buff being misaligned.
+ * If buff is not word-aligned, the whole word containing it is read, but the
+ * bytes outside the buffer are masked off. The same thing occurs on the
+ * tail end of the read.
+ */
+unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
+{
+	unsigned int offset, shift;
+	csum_t csum = 0, data;
+	const csum_t *ptr;
+
+	if (unlikely(len <= 0))
+		return 0;
+	/*
+	 * To align the address, grab the whole first word containing buff.
+	 * Since the load stays within that word, it will never cross a page
+	 * or cache line boundary.
+	 * Directly call KASAN with the alignment we will be using.
+	 */
+	offset = (csum_t)buff & OFFSET_MASK;
+	kasan_check_read(buff, len);
+	ptr = (const csum_t *)(buff - offset);
+	len = len + offset - sizeof(csum_t);
+
+	/*
+	 * RISC-V is always little endian, so the bytes before buff occupy the low-order bits; clear them.
+	 */
+	shift = offset * 8;
+	data = *ptr;
+	data = (data >> shift) << shift;
+
+	while (len > 0) {
+		csum += data;
+		csum += csum < data;
+		len -= sizeof(csum_t);
+		ptr += 1;
+		data = *ptr;
+	}
+
+	/*
+	 * Perform alignment (and over-read) bytes on the tail if any bytes
+	 * leftover.
+	 */
+	shift = len * -8;
+	data = (data << shift) >> shift;
+	csum += data;
+	csum += csum < data;
+
+#ifdef CONFIG_64BIT
+	csum += (csum >> 32) | (csum << 32);
+	csum >>= 32;
+#endif
+	csum = (unsigned int)csum + (((unsigned int)csum >> 16) | ((unsigned int)csum << 16));
+	if (offset & 1)
+		return (unsigned short)swab32(csum);
+	return csum >> 16;
+}

-- 
2.41.0



* [PATCH 3/5] riscv: Vector checksum header
  2023-08-27  1:26 [PATCH 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 1/5] riscv: Checksum header Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 2/5] riscv: Add checksum library Charlie Jenkins
@ 2023-08-27  1:26 ` Charlie Jenkins
  2023-08-28 18:22   ` Samuel Holland
  2023-08-27  1:26 ` [PATCH 4/5] riscv: Vector checksum library Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 5/5] riscv: Test checksum functions Charlie Jenkins
  4 siblings, 1 reply; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-27  1:26 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins

This patch is not ready for merge as vector support in the kernel is
limited. However, the code has been tested in QEMU, so the algorithms
do work. It is written in inline assembly rather than with the GCC
vector intrinsics because the intrinsics did not produce optimal code.
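
To make the asm easier to review, a scalar C model of what the vector loop
computes is included below. This is illustrative only (the name
ip_fast_csum_model is made up): vwredsumu.vs is a widening unsigned
reduction, i.e. it adds all of the 32-bit header words into a 64-bit
accumulator, which is then folded down to 16 bits:

	static unsigned short ip_fast_csum_model(const void *iph, unsigned int ihl)
	{
		const unsigned int *word = iph;
		unsigned long long sum = 0;
		unsigned int i, sum32;

		for (i = 0; i < ihl; i++)
			sum += word[i];		/* ihl <= 15, so this cannot overflow */

		sum += (sum >> 32) | (sum << 32);	/* fold upper 32 bits + carry */
		sum32 = sum >> 32;
		sum32 += (sum32 >> 16) | (sum32 << 16);	/* fold down to 16 bits */
		return (unsigned short)~(sum32 >> 16);
	}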

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/checksum.h | 81 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index af49b3409576..7e31c0ad6346 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -10,6 +10,10 @@
 #include <linux/in6.h>
 #include <linux/uaccess.h>
 
+#ifdef CONFIG_RISCV_ISA_V
+#include <riscv_vector.h>
+#endif
+
 /* Default version is sufficient for 32 bit */
 #ifdef CONFIG_64BIT
 #define _HAVE_ARCH_IPV6_CSUM
@@ -36,6 +40,46 @@ static inline __sum16 csum_fold(__wsum sum)
  *	without the bitmanip extensions zba/zbb.
  */
 #ifdef CONFIG_32BIT
+#ifdef CONFIG_RISCV_ISA_V
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+	vuint64m1_t prev_buffer;
+	vuint32m1_t curr_buffer;
+	unsigned int vl;
+	unsigned int high_result;
+	unsigned int low_result;
+
+	asm("vsetivli x0, 1, e64, ta, ma				\n\t\
+	vmv.v.i %[prev_buffer], 0					\n\t\
+	1:								\n\t\
+	vsetvli %[vl], %[ihl], e32, m1, ta, ma				\n\t\
+	vle32.v %[curr_buffer], (%[iph])				\n\t\
+	vwredsumu.vs %[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\t\
+	sub %[ihl], %[ihl], %[vl]					\n\t"
+#ifdef CONFIG_RISCV_ISA_ZBA
+	"sh2add %[iph], %[vl], %[iph]					\n\t"
+#else
+	"slli %[vl], %[vl], 2						\n\
+	add %[iph], %[vl], %[iph]					\n\t"
+#endif
+	"bnez %[ihl], 1b						\n\
+	vsetivli x0, 1, e64, m1, ta, ma					\n\
+	vmv.x.s %[low_result], %[prev_buffer]				\n\
+	addi %[vl], x0, 32						\n\
+	vsrl.vx %[prev_buffer], %[prev_buffer], %[vl]			\n\
+	vmv.x.s %[high_result], %[prev_buffer]"
+	: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
+		[curr_buffer] "=&vd" (curr_buffer),
+		[high_result] "=&r" (high_result),
+		[low_result] "=&r" (low_result)
+	: [iph] "r" (iph), [ihl] "r" (ihl));
+
+	high_result += low_result;
+	high_result += high_result < low_result;
+	return csum_fold((__force __wsum)(high_result));
+}
+
+#else
 static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 {
 	__wsum csum = 0;
@@ -47,8 +91,44 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 	} while (++pos < ihl);
 	return csum_fold(csum);
 }
+#endif
+#else
+
+#ifdef CONFIG_RISCV_ISA_V
+static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
+{
+	vuint64m1_t prev_buffer;
+	vuint32m1_t curr_buffer;
+	unsigned long vl;
+	unsigned long result;
+
+	asm("vsetivli x0, 1, e64, ta, ma				\n\
+	vmv.v.i %[prev_buffer], 0					\n\
+	1:								\n\
+	# Setup 32-bit sum of iph					\n\
+	vsetvli %[vl], %[ihl], e32, m1, ta, ma				\n\
+	vle32.v %[curr_buffer], (%[iph])				\n\
+	# Sum each 32-bit segment of iph that can fit into a vector reg	\n\
+	vwredsumu.vs %[prev_buffer], %[curr_buffer], %[prev_buffer]     \n\
+	subw %[ihl], %[ihl], %[vl]					\n\t"
+#ifdef CONFIG_RISCV_ISA_ZBA
+	"sh2add %[iph], %[vl], %[iph]					\n\t"
 #else
+	"slli %[vl], %[vl], 2						\n\
+	addw %[iph], %[vl], %[iph]					\n\t"
+#endif
+	"# If not all of iph could fit into vector reg, do another sum	\n\
+	bnez %[ihl], 1b							\n\
+	vsetvli x0, x0, e64, m1, ta, ma					\n\
+	vmv.x.s %[result], %[prev_buffer]"
+	: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
+		[curr_buffer] "=&vd" (curr_buffer), [result] "=&r" (result)
+	: [iph] "r" (iph), [ihl] "r" (ihl));
 
+	result += (result >> 32) | (result << 32);
+	return csum_fold((__force __wsum)(result >> 32));
+}
+#else
 /*
  * Quickly compute an IP checksum with the assumption that IPv4 headers will
  * always be in multiples of 32-bits, and have an ihl of at least 5.
@@ -74,6 +154,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 	return csum_fold((__force __wsum)(csum >> 32));
 }
 #endif
+#endif
 #define ip_fast_csum ip_fast_csum
 
 extern unsigned int do_csum(const unsigned char *buff, int len);

-- 
2.41.0



* [PATCH 4/5] riscv: Vector checksum library
  2023-08-27  1:26 [PATCH 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (2 preceding siblings ...)
  2023-08-27  1:26 ` [PATCH 3/5] riscv: Vector checksum header Charlie Jenkins
@ 2023-08-27  1:26 ` Charlie Jenkins
  2023-08-27  1:26 ` [PATCH 5/5] riscv: Test checksum functions Charlie Jenkins
  4 siblings, 0 replies; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-27  1:26 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins

This patch is not ready for merge as vector support in the kernel is
limited. However, the code has been tested in QEMU, so the algorithms
do work. When vector support is more mature, I will do more thorough
testing of this code. It is written in inline assembly rather than with
the GCC vector intrinsics because the intrinsics did not produce optimal code.
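
One detail worth calling out for review is the byte swap on the return path.
Per RFC 1071, the 16-bit ones'-complement sum computed starting from an odd
byte offset is the byte-swapped value of the sum computed from an even
offset, which is why do_csum swaps its folded result when (offset & 1). A
standalone illustration (the helper name is made up, and unlike the patch the
folded sum here is assumed to already sit in the low 16 bits):

	static unsigned short fixup_odd_offset(unsigned int folded, unsigned long offset)
	{
		if (offset & 1)
			return (unsigned short)((folded >> 8) | (folded << 8));
		return (unsigned short)folded;
	}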

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/lib/csum.c | 165 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)

diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
index 2037041ce8a0..049a10596008 100644
--- a/arch/riscv/lib/csum.c
+++ b/arch/riscv/lib/csum.c
@@ -12,6 +12,10 @@
 
 #include <net/checksum.h>
 
+#ifdef CONFIG_RISCV_ISA_V
+#include <riscv_vector.h>
+#endif
+
 /* Default version is sufficient for 32 bit */
 #ifdef CONFIG_64BIT
 __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
@@ -64,6 +68,166 @@ typedef unsigned long csum_t;
  * the bytes that it shouldn't. The same thing will occur on the tail-end of the
  * read.
  */
+#ifdef CONFIG_RISCV_ISA_V
+#ifdef CONFIG_32BIT
+unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
+{
+	vuint64m1_t prev_buffer;
+	vuint32m1_t curr_buffer;
+	unsigned int shift;
+	unsigned int vl, high_result, low_result, csum, offset;
+	unsigned int tail_seg;
+	const unsigned int *ptr;
+
+	if (len <= 0)
+		return 0;
+
+	/*
+	 * To align the address, grab the whole first word containing buff.
+	 * Directly call KASAN with the alignment we will be using.
+	 */
+	offset = (unsigned int)buff & OFFSET_MASK;
+	kasan_check_read(buff, len);
+	ptr = (const unsigned int *)(buff - offset);
+	len += offset;
+
+	// Read the tail segment
+	tail_seg = len % 4;
+	csum = 0;
+	if (tail_seg) {
+		shift = (4 - tail_seg) * 8;
+		csum = *(unsigned int *)((const unsigned char *)ptr + len - tail_seg);
+		csum = ((unsigned int)csum << shift) >> shift;
+		len -= tail_seg;
+	}
+
+	unsigned long start_mask = (unsigned int)(~(~0U << offset));
+
+	asm("vsetvli %[vl], %[len], e8, m1, ta, ma			\n\
+	# clear out mask and vector registers since we switch up sizes	\n\
+	vmclr.m v0							\n\
+	vmclr.m %[prev_buffer]						\n\
+	vmclr.m %[curr_buffer]						\n\
+	# Mask out the leading bits of a misaligned address		\n\
+	vsetivli x0, 1, e64, m1, ta, ma					\n\
+	vmv.s.x %[prev_buffer], %[csum]					\n\
+	vmv.s.x v0, %[start_mask]					\n\
+	vsetvli %[vl], %[len], e8, m1, ta, ma				\n\
+	vmnot.m v0, v0							\n\
+	vle8.v %[curr_buffer], (%[buff]), v0.t				\n\
+	j 2f								\n\
+	# Iterate through the buff and sum all words			\n\
+	1:								\n\
+	vsetvli %[vl], %[len], e8, m1, ta, ma				\n\
+	vle8.v %[curr_buffer], (%[buff])				\n\
+	2:								\n\
+	vsetvli x0, x0, e32, m1, ta, ma					\n\
+	vwredsumu.vs %[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\
+	sub %[len], %[len], %[vl]					\n\t"
+#ifdef CONFIG_RISCV_ISA_ZBA
+	"sh2add %[buff], %[vl], %[buff]					\n\t"
+#else
+	"slli %[vl], %[vl], 2						\n\
+	add %[buff], %[vl], %[buff]					\n\t"
+#endif
+	"bnez %[len], 1b						\n\
+	vsetvli x0, x0, e64, m1, ta, ma					\n\
+	vmv.x.s %[low_result], %[prev_buffer]				\n\
+	addi %[vl], x0, 32						\n\
+	vsrl.vx %[prev_buffer], %[prev_buffer], %[vl]			\n\
+	vmv.x.s %[high_result], %[prev_buffer]"
+	: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
+		[curr_buffer] "=&vd" (curr_buffer),
+		[high_result] "=&r" (high_result),
+		[low_result] "=&r" (low_result)
+	: [buff] "r" (ptr), [len] "r" (len), [start_mask] "r" (start_mask),
+		[csum] "r" (csum));
+
+	high_result += low_result;
+	high_result += high_result < low_result;
+	high_result = high_result + ((high_result >> 16) | (high_result << 16));
+	if (offset & 1)
+		return (unsigned short)swab32(high_result);
+	return high_result >> 16;
+}
+#else
+unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
+{
+	vuint64m1_t prev_buffer;
+	vuint32m1_t curr_buffer;
+	unsigned int shift;
+	unsigned long vl, result, csum, offset;
+	unsigned int tail_seg;
+	const unsigned long *ptr;
+
+	if (len <= 0)
+		return 0;
+
+	/*
+	 * To align the address, grab the whole first word containing buff.
+	 * Directly call KASAN with the alignment we will be using.
+	 */
+	offset = (unsigned long)buff & 7;
+	kasan_check_read(buff, len);
+	ptr = (const unsigned long *)(buff - offset);
+	len += offset;
+
+	// Read the tail segment
+	tail_seg = len % 4;
+	csum = 0;
+	if (tail_seg) {
+		shift = (4 - tail_seg) * 8;
+		csum = *(unsigned int *)((const unsigned char *)ptr + len - tail_seg);
+		csum = ((unsigned int)csum << shift) >> shift;
+		len -= tail_seg;
+	}
+
+	unsigned long start_mask = (unsigned int)(~(~0U << offset));
+
+	asm("vsetvli %[vl], %[len], e8, m1, ta, ma			\n\
+	# clear out mask and vector registers since we switch up sizes	\n\
+	vmclr.m v0							\n\
+	vmclr.m %[prev_buffer]						\n\
+	vmclr.m %[curr_buffer]						\n\
+	# Mask out the leading bits of a misaligned address		\n\
+	vsetivli x0, 1, e64, m1, ta, ma					\n\
+	vmv.s.x %[prev_buffer], %[csum]					\n\
+	vmv.s.x v0, %[start_mask]					\n\
+	vsetvli %[vl], %[len], e8, m1, ta, ma				\n\
+	vmnot.m v0, v0							\n\
+	vle8.v %[curr_buffer], (%[buff]), v0.t				\n\
+	j 2f								\n\
+	# Iterate through the buff and sum all words			\n\
+	1:								\n\
+	vsetvli %[vl], %[len], e8, m1, ta, ma				\n\
+	vle8.v %[curr_buffer], (%[buff])				\n\
+	2:								\n\
+	vsetvli x0, x0, e32, m1, ta, ma					\n\
+	vwredsumu.vs %[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\
+	subw %[len], %[len], %[vl]					\n\t"
+#ifdef CONFIG_RISCV_ISA_ZBA
+	"sh2add %[buff], %[vl], %[buff]					\n\t"
+#else
+	"slli %[vl], %[vl], 2						\n\
+	add %[buff], %[vl], %[buff]					\n\t"
+#endif
+	"bnez %[len], 1b						\n\
+	vsetvli x0, x0, e64, m1, ta, ma					\n\
+	vmv.x.s %[result], %[prev_buffer]"
+	: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
+		[curr_buffer] "=&vd" (curr_buffer), [result] "=&r" (result)
+	: [buff] "r" (ptr), [len] "r" (len), [start_mask] "r" (start_mask),
+		[csum] "r" (csum));
+
+	result += (result >> 32) | (result << 32);
+	result >>= 32;
+	result = (unsigned int)result + (((unsigned int)result >> 16) | ((unsigned int)result << 16));
+	if (offset & 1)
+		return (unsigned short)swab32(result);
+	return result >> 16;
+}
+#endif
+#else
 unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
 {
 	unsigned int offset, shift;
@@ -116,3 +280,4 @@ unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
 		return (unsigned short)swab32(csum);
 	return csum >> 16;
 }
+#endif

-- 
2.41.0



* [PATCH 5/5] riscv: Test checksum functions
  2023-08-27  1:26 [PATCH 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
                   ` (3 preceding siblings ...)
  2023-08-27  1:26 ` [PATCH 4/5] riscv: Vector checksum library Charlie Jenkins
@ 2023-08-27  1:26 ` Charlie Jenkins
  4 siblings, 0 replies; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-27  1:26 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins

Add Kconfig support for riscv-specific testing modules. This supplements
lib/checksum_kunit.c and adds tests for ip_fast_csum and csum_ipv6_magic.
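
The expected constants below can be cross-checked against a naive RFC 1071
reference such as the following. This runs on the host for sanity checking
only and is not part of the patch (little-endian assumed, helper names made
up), e.g. for the ip_fast_csum(average, 5) case:

	#include <stdint.h>

	static uint16_t ref_fold(uint32_t sum)
	{
		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		return (uint16_t)~sum;
	}

	static uint16_t ref_ip_csum(const unsigned char *p, unsigned int ihl)
	{
		uint32_t sum = 0;
		unsigned int i;

		/* Sum the header as little-endian 16-bit words. */
		for (i = 0; i < ihl * 4; i += 2)
			sum += (uint32_t)p[i] | ((uint32_t)p[i + 1] << 8);
		return ref_fold(sum);
	}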

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/Kconfig.debug              |   1 +
 arch/riscv/lib/Kconfig.debug          |  31 ++++++++++
 arch/riscv/lib/Makefile               |   2 +
 arch/riscv/lib/riscv_checksum_kunit.c | 111 ++++++++++++++++++++++++++++++++++
 4 files changed, 145 insertions(+)

diff --git a/arch/riscv/Kconfig.debug b/arch/riscv/Kconfig.debug
index e69de29bb2d1..53a84ec4f91f 100644
--- a/arch/riscv/Kconfig.debug
+++ b/arch/riscv/Kconfig.debug
@@ -0,0 +1 @@
+source "arch/riscv/lib/Kconfig.debug"
diff --git a/arch/riscv/lib/Kconfig.debug b/arch/riscv/lib/Kconfig.debug
new file mode 100644
index 000000000000..15fc83b68340
--- /dev/null
+++ b/arch/riscv/lib/Kconfig.debug
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "riscv Testing and Coverage"
+
+menuconfig RUNTIME_TESTING_MENU
+	bool "Runtime Testing"
+	default y
+	help
+	  Enable riscv runtime testing.
+
+if RUNTIME_TESTING_MENU
+
+config RISCV_CHECKSUM_KUNIT
+	tristate "KUnit test riscv checksum functions at runtime" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to test the checksum functions at boot.
+
+	  KUnit tests run during boot and output the results to the debug log
+	  in TAP format (http://testanything.org/). Only useful for kernel devs
+	  running the KUnit test harness, and not intended for inclusion into a
+	  production build.
+
+	  For more information on KUnit and unit tests in general please refer
+	  to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+	  If unsure, say N.
+
+endif # RUNTIME_TESTING_MENU
+
+endmenu # "riscv Testing and Coverage"
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 2aa1a4ad361f..1535a8c81430 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -12,3 +12,5 @@ lib-$(CONFIG_64BIT)	+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+
+obj-$(CONFIG_RISCV_CHECKSUM_KUNIT) += riscv_checksum_kunit.o
diff --git a/arch/riscv/lib/riscv_checksum_kunit.c b/arch/riscv/lib/riscv_checksum_kunit.c
new file mode 100644
index 000000000000..05b4710c907f
--- /dev/null
+++ b/arch/riscv/lib/riscv_checksum_kunit.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test cases for checksum
+ */
+
+#include <linux/in6.h>
+
+#include <kunit/test.h>
+#include <net/checksum.h>
+#include <net/ip6_checksum.h>
+
+#define CHECK_EQ(lhs, rhs) KUNIT_ASSERT_EQ(test, lhs, rhs)
+
+static void test_csum_fold(struct kunit *test)
+{
+	unsigned int one = 1226127848;
+	unsigned int two = 446627905;
+	unsigned int three = 3644783064;
+	unsigned int four = 361842745;
+	unsigned int five = 4281073503;
+	unsigned int max = -1;
+
+	CHECK_EQ(0x7d02, csum_fold(one));
+	CHECK_EQ(0xe51f, csum_fold(two));
+	CHECK_EQ(0x2ce8, csum_fold(three));
+	CHECK_EQ(0xa235, csum_fold(four));
+	CHECK_EQ(0x174, csum_fold(five));
+	CHECK_EQ(0x0, csum_fold(max));
+}
+
+static void test_ip_fast_csum(struct kunit *test)
+{
+	unsigned char average[] = { 0x1c, 0x00, 0x00, 0x45, 0x00, 0x00, 0x68,
+				    0x74, 0x00, 0x00, 0x11, 0x80, 0x01, 0x64,
+				    0xa8, 0xc0, 0xe9, 0x9c, 0x46, 0xab };
+	unsigned char larger[] = { 0xa3, 0xde, 0x43, 0x41, 0x11, 0x19,
+				   0x2f, 0x73, 0x00, 0x00, 0xf1, 0xc5,
+				   0x31, 0xbb, 0xaa, 0xc1, 0x23, 0x5f,
+				   0x32, 0xde, 0x65, 0x39, 0xfe, 0xbc };
+	unsigned char overflow[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+				     0xff, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
+				     0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
+	unsigned char max[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xfd, 0xff, 0xff,
+				0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+				0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+				0xff, 0xff, 0xff, 0xff, 0xff, 0xfd, 0xff, 0xff,
+				0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+				0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+				0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+
+	CHECK_EQ(0x598f, ip_fast_csum(average, 5));
+	CHECK_EQ(0xdd4f, ip_fast_csum(larger, 6));
+	CHECK_EQ(0xfffe, ip_fast_csum(overflow, 5));
+	CHECK_EQ(0x400, ip_fast_csum(max, 14));
+}
+
+static void test_csum_ipv6_magic(struct kunit *test)
+{
+	struct in6_addr saddr = {
+		.s6_addr = { 0xf8, 0x43, 0x43, 0xf0, 0xdc, 0xa0, 0x39, 0x92,
+			     0x43, 0x67, 0x12, 0x03, 0xe3, 0x32, 0xfe, 0xed }};
+	struct in6_addr daddr = {
+		.s6_addr = { 0xa8, 0x23, 0x46, 0xdc, 0xc8, 0x2d, 0xaa, 0xe3,
+			     0xdc, 0x66, 0x72, 0x43, 0xe2, 0x12, 0xee, 0xfd }};
+	u32 len = 1 << 10;
+	u8 proto = 17;
+	__wsum csum = 53;
+
+	CHECK_EQ(0x2fbb, csum_ipv6_magic(&saddr, &daddr, len, proto, csum));
+}
+
+static void test_do_csum(struct kunit *test)
+{
+	unsigned char very_small[] = {0x32};
+	unsigned char small[] = {0xd3, 0x43, 0xad, 0x46};
+	unsigned char medium[] = {
+		0xa0, 0x13, 0xaa, 0xa6, 0x53, 0xac, 0xa3, 0x43
+	};
+	unsigned char *misaligned = medium + 1;
+	unsigned char large[] = {
+		0xa0, 0x13, 0xaa, 0xa6, 0x53, 0xac, 0xa3, 0x43,
+		0xa0, 0x13, 0xaa, 0xa6, 0x53, 0xac, 0xa3, 0x43,
+		0xa0, 0x13, 0xaa, 0xa6, 0x53, 0xac, 0xa3, 0x43,
+		0xa0, 0x13, 0xaa, 0xa6, 0x53, 0xac, 0xa3, 0x43
+	};
+	unsigned char *large_misaligned = large + 3;
+
+	CHECK_EQ(0xffcd, ip_compute_csum(very_small, 1));
+	CHECK_EQ(0x757f, ip_compute_csum(small, 4));
+	CHECK_EQ(0x5e56, ip_compute_csum(misaligned, 7));
+	CHECK_EQ(0x469d, ip_compute_csum(large, 29));
+	CHECK_EQ(0x43ae, ip_compute_csum(large_misaligned, 28));
+}
+
+static struct kunit_case __refdata riscv_checksum_test_cases[] = {
+	KUNIT_CASE(test_csum_fold),
+	KUNIT_CASE(test_ip_fast_csum),
+	KUNIT_CASE(test_csum_ipv6_magic),
+	KUNIT_CASE(test_do_csum),
+	{}
+};
+
+static struct kunit_suite riscv_checksum_test_suite = {
+	.name = "riscv_checksum",
+	.test_cases = riscv_checksum_test_cases,
+};
+
+kunit_test_suites(&riscv_checksum_test_suite);
+
+MODULE_AUTHOR("Charlie Jenkins <charlie@rivosinc.com>");
+MODULE_LICENSE("GPL");

-- 
2.41.0



* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-27  1:26 ` [PATCH 1/5] riscv: Checksum header Charlie Jenkins
@ 2023-08-27  1:42   ` Conor Dooley
  2023-08-27  2:00     ` Palmer Dabbelt
  0 siblings, 1 reply; 16+ messages in thread
From: Conor Dooley @ 2023-08-27  1:42 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, linux-kernel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou



On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> Provide checksum algorithms that have been designed to leverage riscv
> instructions such as rotate. In 64-bit, can take advantage of the larger
> register to avoid some overflow checking.
> 
> Add configuration for Zba extension and add march for Zba and Zbb.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/Kconfig                | 23 +++++++++++
>  arch/riscv/Makefile               |  2 +
>  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 111 insertions(+)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 4c07b9189c86..8d7e475ca28d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
>  
>  	  If you don't know what to do here, say Y.
>  
> +config TOOLCHAIN_HAS_ZBA
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> +	depends on AS_HAS_OPTION_ARCH
> +
> +config RISCV_ISA_ZBA
> +	bool "Zba extension support for bit manipulation instructions"
> +	depends on TOOLCHAIN_HAS_ZBA
> +	depends on MMU
> +	depends on RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Adds support to dynamically detect the presence of the ZBA
> +	   extension (basic bit manipulation) and enable its usage.
> +
> +	   The Zba extension provides instructions to accelerate a number
> +	   of bit-specific address creation operations.
> +
> +	   If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZBB
>  	bool
>  	default y
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index 6ec6d52a4180..51fa3f67fc9a 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
>  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
>  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
>  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb

AFAICT, this is going to break immediately on any system that enables
RISCV_ISA_ZBA (which will happen by default) but does not support the
extension. You made the option depend on RISCV_ALTERNATIVE, but I do
not see any use of alternatives in the code to actually perform the
dynamic detection of Zba.
Note that for fd & v, we add it to riscv-march-y, but then immediately
remove it again before passing to the compiler, only allow them in
AFLAGS:
	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')

What am I missing?

Thanks,
Conor.


* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-27  1:42   ` Conor Dooley
@ 2023-08-27  2:00     ` Palmer Dabbelt
  2023-08-27 10:28       ` Conor Dooley
  0 siblings, 1 reply; 16+ messages in thread
From: Palmer Dabbelt @ 2023-08-27  2:00 UTC (permalink / raw)
  To: Conor Dooley; +Cc: charlie, linux-riscv, linux-kernel, Paul Walmsley, aou

On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
>> Provide checksum algorithms that have been designed to leverage riscv
>> instructions such as rotate. In 64-bit, can take advantage of the larger
>> register to avoid some overflow checking.
>> 
>> Add configuration for Zba extension and add march for Zba and Zbb.
>> 
>> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
>> ---
>>  arch/riscv/Kconfig                | 23 +++++++++++
>>  arch/riscv/Makefile               |  2 +
>>  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 111 insertions(+)
>> 
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 4c07b9189c86..8d7e475ca28d 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
>>  
>>  	  If you don't know what to do here, say Y.
>>  
>> +config TOOLCHAIN_HAS_ZBA
>> +	bool
>> +	default y
>> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
>> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
>> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
>> +	depends on AS_HAS_OPTION_ARCH
>> +
>> +config RISCV_ISA_ZBA
>> +	bool "Zba extension support for bit manipulation instructions"
>> +	depends on TOOLCHAIN_HAS_ZBA
>> +	depends on MMU
>> +	depends on RISCV_ALTERNATIVE
>> +	default y
>> +	help
>> +	   Adds support to dynamically detect the presence of the ZBA
>> +	   extension (basic bit manipulation) and enable its usage.
>> +
>> +	   The Zba extension provides instructions to accelerate a number
>> +	   of bit-specific address creation operations.
>> +
>> +	   If you don't know what to do here, say Y.
>> +
>>  config TOOLCHAIN_HAS_ZBB
>>  	bool
>>  	default y
>> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
>> index 6ec6d52a4180..51fa3f67fc9a 100644
>> --- a/arch/riscv/Makefile
>> +++ b/arch/riscv/Makefile
>> @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
>>  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
>>  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
>>  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
>> +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
>> +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
>
> AFAICT, this is going to break immediately on any system that enables
> RISCV_ISA_ZBA (which will happen by default) but does not support the
> extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> not see any use of alternatives in the code to actually perform the
> dynamic detection of Zba.

I guess we kind of have an ambiguity here: for stuff like C we just 
unconditionally use the instructions, but for the rest we probe first.  
We should probably have three states for each extension: disabled, 
dynamically detected, and assumed.

> Note that for fd & v, we add it to riscv-march-y, but then immediately
> remove it again before passing to the compiler, only allow them in
> AFLAGS:
> 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
>
> What am I missing?

FD and V both have state that can be saved lazily, so we can't let 
arbitrary code use them.  The extensions formally known as B don't add 
state, so they are safe to flip on in arbitrary places (aside from the 
issues you pointed out above).

>
> Thanks,
> Conor.


* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-27  2:00     ` Palmer Dabbelt
@ 2023-08-27 10:28       ` Conor Dooley
  2023-08-27 12:25         ` Conor Dooley
  0 siblings, 1 reply; 16+ messages in thread
From: Conor Dooley @ 2023-08-27 10:28 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: charlie, linux-riscv, linux-kernel, Paul Walmsley, aou



On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > Provide checksum algorithms that have been designed to leverage riscv
> > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > register to avoid some overflow checking.
> > > 
> > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > 
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > >  arch/riscv/Kconfig                | 23 +++++++++++
> > >  arch/riscv/Makefile               |  2 +
> > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 111 insertions(+)
> > > 
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 4c07b9189c86..8d7e475ca28d 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > >  	  If you don't know what to do here, say Y.
> > > +config TOOLCHAIN_HAS_ZBA
> > > +	bool
> > > +	default y
> > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > +	depends on AS_HAS_OPTION_ARCH
> > > +
> > > +config RISCV_ISA_ZBA
> > > +	bool "Zba extension support for bit manipulation instructions"
> > > +	depends on TOOLCHAIN_HAS_ZBA
> > > +	depends on MMU
> > > +	depends on RISCV_ALTERNATIVE
> > > +	default y
> > > +	help
> > > +	   Adds support to dynamically detect the presence of the ZBA
> > > +	   extension (basic bit manipulation) and enable its usage.
> > > +
> > > +	   The Zba extension provides instructions to accelerate a number
> > > +	   of bit-specific address creation operations.
> > > +
> > > +	   If you don't know what to do here, say Y.
> > > +
> > >  config TOOLCHAIN_HAS_ZBB
> > >  	bool
> > >  	default y
> > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > --- a/arch/riscv/Makefile
> > > +++ b/arch/riscv/Makefile
> > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > 
> > AFAICT, this is going to break immediately on any system that enables
> > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > not see any use of alternatives in the code to actually perform the
> > dynamic detection of Zba.
> 
> I guess we kind of have an ambiguity here: for stuff like C we just
> unconditionally use the instructions, but for the rest we probe first.  We
> should probably have three states for each extension: disabled, dynamically
> detected, and assumed.

You mean, just add some comments to the makefile surrounding each
section or to some rst documentation?

> > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > remove it again before passing to the compiler, only allow them in
> > AFLAGS:
> > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > 
> > What am I missing?
> 
> FD and V both have state that can be saved lazily, so we can't let arbitrary
> code use them.  The extensions formally known as B don't add state, so they
> are safe to flip on in arbitrary places (aside from the issues you pointed
> out above).

I probably went about this badly since you missed the point. I was
trying to point out that for anything other than the compressed
extensions in the block above that we only pass them in march to the
assembler, and not to the compiler, in contrast to this patch which just
always passes them. I should have pointed to how we handled the
in-kernel Zbb stuff & asked how this was any different, would probably
have been clearer.



* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-27 10:28       ` Conor Dooley
@ 2023-08-27 12:25         ` Conor Dooley
  2023-08-28 16:55           ` Charlie Jenkins
  0 siblings, 1 reply; 16+ messages in thread
From: Conor Dooley @ 2023-08-27 12:25 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: charlie, linux-riscv, linux-kernel, Paul Walmsley, aou



On Sun, Aug 27, 2023 at 11:28:33AM +0100, Conor Dooley wrote:
> On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> > On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > > Provide checksum algorithms that have been designed to leverage riscv
> > > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > > register to avoid some overflow checking.
> > > > 
> > > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > > 
> > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > ---
> > > >  arch/riscv/Kconfig                | 23 +++++++++++
> > > >  arch/riscv/Makefile               |  2 +
> > > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 111 insertions(+)
> > > > 
> > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > index 4c07b9189c86..8d7e475ca28d 100644
> > > > --- a/arch/riscv/Kconfig
> > > > +++ b/arch/riscv/Kconfig
> > > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > > >  	  If you don't know what to do here, say Y.
> > > > +config TOOLCHAIN_HAS_ZBA
> > > > +	bool
> > > > +	default y
> > > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > > +	depends on AS_HAS_OPTION_ARCH
> > > > +
> > > > +config RISCV_ISA_ZBA
> > > > +	bool "Zba extension support for bit manipulation instructions"
> > > > +	depends on TOOLCHAIN_HAS_ZBA
> > > > +	depends on MMU
> > > > +	depends on RISCV_ALTERNATIVE
> > > > +	default y
> > > > +	help
> > > > +	   Adds support to dynamically detect the presence of the ZBA
> > > > +	   extension (basic bit manipulation) and enable its usage.
> > > > +
> > > > +	   The Zba extension provides instructions to accelerate a number
> > > > +	   of bit-specific address creation operations.
> > > > +
> > > > +	   If you don't know what to do here, say Y.
> > > > +
> > > >  config TOOLCHAIN_HAS_ZBB
> > > >  	bool
> > > >  	default y
> > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > > --- a/arch/riscv/Makefile
> > > > +++ b/arch/riscv/Makefile
> > > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > > 
> > > AFAICT, this is going to break immediately on any system that enables
> > > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > > not see any use of alternatives in the code to actually perform the
> > > dynamic detection of Zba.
> > 
> > I guess we kind of have an ambiguity here: for stuff like C we just
> > unconditionally use the instructions, but for the rest we probe first.  We
> > should probably have three states for each extension: disabled, dynamically
> > detected, and assumed.
> 
> You mean, just add some comments to the makefile surrounding each
> section or to some rst documentation?

Also, the code here doesn't build w/
	warning: invalid argument to '-march': '_zba_zbb_zicsr_zifencei_zihintpause'
so there's something else wrong with TOOLCHAIN_HAS_ZBA :)

> 
> > > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > > remove it again before passing to the compiler, only allow them in
> > > AFLAGS:
> > > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > 
> > > What am I missing?
> > 
> > FD and V both have state that can be saved lazily, so we can't let arbitrary
> > code use them.  The extensions formally known as B don't add state, so they
> > are safe to flip on in arbitrary places (aside from the issues you pointed
> > out above).
> 
> I probably went about this badly since you missed the point. I was
> trying to point out that for anything other than the compressed
> extensions in the block above that we only pass them in march to the
> assembler, and not to the compiler, in contrast to this patch which just
> always passes them. I should have pointed to how we handled the
> in-kernel Zbb stuff & asked how this was any different, would probably
> have been clearer.
> 




* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-27 12:25         ` Conor Dooley
@ 2023-08-28 16:55           ` Charlie Jenkins
  2023-08-28 17:08             ` Conor Dooley
  0 siblings, 1 reply; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-28 16:55 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, Paul Walmsley, aou

On Sun, Aug 27, 2023 at 01:25:27PM +0100, Conor Dooley wrote:
> On Sun, Aug 27, 2023 at 11:28:33AM +0100, Conor Dooley wrote:
> > On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> > > On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > > > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > > > Provide checksum algorithms that have been designed to leverage riscv
> > > > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > > > register to avoid some overflow checking.
> > > > > 
> > > > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > > > 
> > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > ---
> > > > >  arch/riscv/Kconfig                | 23 +++++++++++
> > > > >  arch/riscv/Makefile               |  2 +
> > > > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > > > >  3 files changed, 111 insertions(+)
> > > > > 
> > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > index 4c07b9189c86..8d7e475ca28d 100644
> > > > > --- a/arch/riscv/Kconfig
> > > > > +++ b/arch/riscv/Kconfig
> > > > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > > > >  	  If you don't know what to do here, say Y.
> > > > > +config TOOLCHAIN_HAS_ZBA
> > > > > +	bool
> > > > > +	default y
> > > > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > > > +	depends on AS_HAS_OPTION_ARCH
> > > > > +
> > > > > +config RISCV_ISA_ZBA
> > > > > +	bool "Zba extension support for bit manipulation instructions"
> > > > > +	depends on TOOLCHAIN_HAS_ZBA
> > > > > +	depends on MMU
> > > > > +	depends on RISCV_ALTERNATIVE
> > > > > +	default y
> > > > > +	help
> > > > > +	   Adds support to dynamically detect the presence of the ZBA
> > > > > +	   extension (basic bit manipulation) and enable its usage.
> > > > > +
> > > > > +	   The Zba extension provides instructions to accelerate a number
> > > > > +	   of bit-specific address creation operations.
> > > > > +
> > > > > +	   If you don't know what to do here, say Y.
> > > > > +
> > > > >  config TOOLCHAIN_HAS_ZBB
> > > > >  	bool
> > > > >  	default y
> > > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > > > --- a/arch/riscv/Makefile
> > > > > +++ b/arch/riscv/Makefile
> > > > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > > > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > > > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > > > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > > > 
> > > > AFAICT, this is going to break immediately on any system that enables
> > > > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > > > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > > > not see any use of alternatives in the code to actually perform the
> > > > dynamic detection of Zba.
> > > 
> > > I guess we kind of have an ambiguity here: for stuff like C we just
> > > unconditionally use the instructions, but for the rest we probe first.  We
> > > should probably have three states for each extension: disabled, dynamically
> > > detected, and assumed.
> > 
> > You mean, just add some comments to the makefile surrounding each
> > section or to some rst documentation?
> 
> Also, the code here doesn't build w/
> 	warning: invalid argument to '-march': '_zba_zbb_zicsr_zifencei_zihintpause'
> so there's something else wrong with TOOLCHAIN_HAS_ZBA :)
It is odd that this is missing 'rv64ima' or 'rv32ima' at the beginning of
this string. What configuration are you using that could cause that to
be left off?

Compiling with defconfig automatically enables Zba and does not appear to
cause this issue. I realized that I put the header declarations for
do_csum and csum_ipv6_magic in this patch instead of the next one, so the
code will fail to compile because of that, but not due to the march settings.
> 
> > 
> > > > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > > > remove it again before passing to the compiler, only allow them in
> > > > AFLAGS:
> > > > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > > > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > > > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > > 
> > > > What am I missing?
> > > 
> > > FD and V both have state that can be saved lazily, so we can't let arbitrary
> > > code use them.  The extensions formally known as B don't add state, so they
> > > are safe to flip on in arbitrary places (aside from the issues you pointed
> > > out above).
> > 
> > I probably went about this badly since you missed the point. I was
> > trying to point out that for anything other than the compressed
> > extensions in the block above that we only pass them in march to the
> > assembler, and not to the compiler, in contrast to this patch which just
> > always passes them. I should have pointed to how we handled the
> > in-kernel Zbb stuff & asked how this was any different, would probably
> > have been clearer.
> > 
I suppose it might be better if I submit these changes in a separate
patch so we can have more discussion there. Zbb was previously only used
by assembly files (arch/riscv/lib/strcmp.S, arch/riscv/lib/strlen.S,
arch/riscv/lib/strncmp.S). I wanted to pass the extensions to the
compiler so that C code could leverage them as well. However, I
neglected to consider kernels that are compiled with these extensions
but run on cores without them. The purpose of using these extensions is
to save a couple of clock cycles, so if it is necessary to first check
whether the extension is available, it may not be worth it for these
functions.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-28 16:55           ` Charlie Jenkins
@ 2023-08-28 17:08             ` Conor Dooley
  2023-08-28 18:20               ` Charlie Jenkins
  0 siblings, 1 reply; 16+ messages in thread
From: Conor Dooley @ 2023-08-28 17:08 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, Paul Walmsley, aou


On Mon, Aug 28, 2023 at 09:55:49AM -0700, Charlie Jenkins wrote:
> On Sun, Aug 27, 2023 at 01:25:27PM +0100, Conor Dooley wrote:
> > On Sun, Aug 27, 2023 at 11:28:33AM +0100, Conor Dooley wrote:
> > > On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> > > > On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > > > > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > > > > Provide checksum algorithms that have been designed to leverage riscv
> > > > > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > > > > register to avoid some overflow checking.
> > > > > > 
> > > > > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > > > > 
> > > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > > ---
> > > > > >  arch/riscv/Kconfig                | 23 +++++++++++
> > > > > >  arch/riscv/Makefile               |  2 +
> > > > > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > > > > >  3 files changed, 111 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > index 4c07b9189c86..8d7e475ca28d 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > > > > >  	  If you don't know what to do here, say Y.
> > > > > > +config TOOLCHAIN_HAS_ZBA
> > > > > > +	bool
> > > > > > +	default y
> > > > > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > > > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > > > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > > > > +	depends on AS_HAS_OPTION_ARCH
> > > > > > +
> > > > > > +config RISCV_ISA_ZBA
> > > > > > +	bool "Zba extension support for bit manipulation instructions"
> > > > > > +	depends on TOOLCHAIN_HAS_ZBA
> > > > > > +	depends on MMU
> > > > > > +	depends on RISCV_ALTERNATIVE
> > > > > > +	default y
> > > > > > +	help
> > > > > > +	   Adds support to dynamically detect the presence of the ZBA
> > > > > > +	   extension (basic bit manipulation) and enable its usage.
> > > > > > +
> > > > > > +	   The Zba extension provides instructions to accelerate a number
> > > > > > +	   of bit-specific address creation operations.
> > > > > > +
> > > > > > +	   If you don't know what to do here, say Y.
> > > > > > +
> > > > > >  config TOOLCHAIN_HAS_ZBB
> > > > > >  	bool
> > > > > >  	default y
> > > > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > > > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > > > > --- a/arch/riscv/Makefile
> > > > > > +++ b/arch/riscv/Makefile
> > > > > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > > > > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > > > > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > > > > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > > > > 
> > > > > AFAICT, this is going to break immediately on any system that enables
> > > > > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > > > > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > > > > not see any use of alternatives in the code to actually perform the
> > > > > dynamic detection of Zba.
> > > > 
> > > > I guess we kind of have an ambiguity here: for stuff like C we just
> > > > unconditionally use the instructions, but for the rest we probe first.  We
> > > > should probably have three states for each extension: disabled, dynamically
> > > > detected, and assumed.
> > > 
> > > You mean, just add some comments to the makefile surrounding each
> > > section or to some rst documentation?
> > 
> > Also, the code here doesn't build w/
> > 	warning: invalid argument to '-march': '_zba_zbb_zicsr_zifencei_zihintpause'
> > so there's something else wrong with TOOLCHAIN_HAS_ZBA :)
> It is odd that this is missing 'rv64ima' or 'rv32ima' at the beginning of
> this string. What configuration are you using that could cause that to
> be left off?

I don't know, but that configuration is pretty pervasive. The patchwork
CI blew up too & that is using kernel.org toolchains built by Arnd:
https://mirrors.edge.kernel.org/pub/tools/crosstool/

> Compiling with defconfig automatically enables Zba and appears to not
> cause this issue. I realized that I put the header definitions for
> do_csum and csum_ipv6_magic in this patch instead of the next one so the
> code will fail to compile from this but not due to march settings.
> > 
> > > 
> > > > > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > > > > remove it again before passing to the compiler, only allow them in
> > > > > AFLAGS:
> > > > > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > > > > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > > > > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > > > 
> > > > > What am I missing?
> > > > 
> > > > FD and V both have state that can be saved lazily, so we can't let arbitrary
> > > > code use them.  The extensions formally known as B don't add state, so they
> > > > are safe to flip on in arbitrary places (aside from the issues you pointed
> > > > out above).
> > > 
> > > I probably went about this badly since you missed the point. I was
> > > trying to point out that for anything other than the compressed
> > > extensions in the block above that we only pass them in march to the
> > > assembler, and not to the compiler, in contrast to this patch which just
> > > always passes them. I should have pointed to how we handled the
> > > in-kernel Zbb stuff & asked how this was any different, would probably
> > > have been clearer.
> > > 
> I supposed it might be better if I submit these changes in a different
> patch so we can have more discussion there. Zbb was previously only used
> by assembly files (arch/riscv/lib/strcmp.S, arch/riscv/lib/strlen.S,
> arch/riscv/lib/strncmp.S). I wanted to add them to the compiler so that
> that C programs could leverage these extensions. However, I neglected to
> consider machines that compile the kernel with these extensions but have
> cores without these extensions.

Less so cores, since we don't support heterogeneous stuff, and more so
platforms that do not support the extensions. It's expected that the
same kernel could in theory be used across a wide variety of systems.

> The purpose of using these extensions is
> to save a couple of clock cycles, so if it is necessary to first
> check if the extension is enabled it may not be worth it for these
> functions.

That's still possible; it's what the alternatives mechanism exists for.
During boot the codepaths are patched to use what works for a given
machine: check out the code that makes use of Zbb or
riscv_has_extension_[un]likely(). You'd need to do something like the
existing users of Zbb instructions do, with an alternative used to
avoid the custom asm implementations when the hardware does not support
them. (That's what the CONFIG_RISCV_ALTERNATIVE & CONFIG_AS_HAS_OPTION_ARCH
options you made TOOLCHAIN_HAS_ZBA depend on are for.)
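
To make that concrete, a minimal sketch of the pattern (not from this
patch: csum_fold64() is an illustrative name, riscv_has_extension_likely()
is the existing helper mentioned above, and the inline asm assumes an
assembler new enough for ".option arch", i.e. AS_HAS_OPTION_ARCH):

static inline unsigned int csum_fold64(unsigned long csum)
{
	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
	    riscv_has_extension_likely(RISCV_ISA_EXT_ZBB)) {
		unsigned long rot;

		/*
		 * Only reached on hardware that reports Zbb; on everything
		 * else the check above falls through to the generic path.
		 * Assumes the 64-bit sum came from far fewer than 2^32
		 * additions, so the top half cannot overflow here.
		 */
		asm(".option push\n"
		    ".option arch,+zbb\n"
		    "rori %[rot], %[csum], 32\n"
		    "add %[csum], %[rot], %[csum]\n"
		    ".option pop\n"
		    : [csum] "+r" (csum), [rot] "=&r" (rot));
		return csum >> 32;
	}

	/* Generic fallback for hardware without Zbb. */
	csum = (csum & 0xffffffff) + (csum >> 32);
	csum = (csum & 0xffffffff) + (csum >> 32);
	return csum;
}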


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-28 17:08             ` Conor Dooley
@ 2023-08-28 18:20               ` Charlie Jenkins
  2023-08-28 18:56                 ` Conor Dooley
  0 siblings, 1 reply; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-28 18:20 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, Paul Walmsley, aou

On Mon, Aug 28, 2023 at 06:08:40PM +0100, Conor Dooley wrote:
> On Mon, Aug 28, 2023 at 09:55:49AM -0700, Charlie Jenkins wrote:
> > On Sun, Aug 27, 2023 at 01:25:27PM +0100, Conor Dooley wrote:
> > > On Sun, Aug 27, 2023 at 11:28:33AM +0100, Conor Dooley wrote:
> > > > On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> > > > > On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > > > > > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > > > > > Provide checksum algorithms that have been designed to leverage riscv
> > > > > > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > > > > > register to avoid some overflow checking.
> > > > > > > 
> > > > > > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > > > > > 
> > > > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > > > ---
> > > > > > >  arch/riscv/Kconfig                | 23 +++++++++++
> > > > > > >  arch/riscv/Makefile               |  2 +
> > > > > > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > > > > > >  3 files changed, 111 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > index 4c07b9189c86..8d7e475ca28d 100644
> > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > > > > > >  	  If you don't know what to do here, say Y.
> > > > > > > +config TOOLCHAIN_HAS_ZBA
> > > > > > > +	bool
> > > > > > > +	default y
> > > > > > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > > > > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > > > > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > > > > > +	depends on AS_HAS_OPTION_ARCH
> > > > > > > +
> > > > > > > +config RISCV_ISA_ZBA
> > > > > > > +	bool "Zba extension support for bit manipulation instructions"
> > > > > > > +	depends on TOOLCHAIN_HAS_ZBA
> > > > > > > +	depends on MMU
> > > > > > > +	depends on RISCV_ALTERNATIVE
> > > > > > > +	default y
> > > > > > > +	help
> > > > > > > +	   Adds support to dynamically detect the presence of the ZBA
> > > > > > > +	   extension (basic bit manipulation) and enable its usage.
> > > > > > > +
> > > > > > > +	   The Zba extension provides instructions to accelerate a number
> > > > > > > +	   of bit-specific address creation operations.
> > > > > > > +
> > > > > > > +	   If you don't know what to do here, say Y.
> > > > > > > +
> > > > > > >  config TOOLCHAIN_HAS_ZBB
> > > > > > >  	bool
> > > > > > >  	default y
> > > > > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > > > > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > > > > > --- a/arch/riscv/Makefile
> > > > > > > +++ b/arch/riscv/Makefile
> > > > > > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > > > > > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > > > > > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > > > > > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > > > > > 
> > > > > > AFAICT, this is going to break immediately on any system that enables
> > > > > > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > > > > > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > > > > > not see any use of alternatives in the code to actually perform the
> > > > > > dynamic detection of Zba.
> > > > > 
> > > > > I guess we kind of have an ambiguity here: for stuff like C we just
> > > > > unconditionally use the instructions, but for the rest we probe first.  We
> > > > > should probably have three states for each extension: disabled, dynamically
> > > > > detected, and assumed.
> > > > 
> > > > You mean, just add some comments to the makefile surrounding each
> > > > section or to some rst documentation?
> > > 
> > > Also, the code here doesn't build w/
> > > 	warning: invalid argument to '-march': '_zba_zbb_zicsr_zifencei_zihintpause'
> > > so there's something else wrong with TOOLCHAIN_HAS_ZBA :)
> > It is odd that this is missing 'rv64ima' or 'rv32ima' at the beginning of
> > this string. What configuration are you using that could cause that to
> > be left off?
> 
> I don't know, but that configuration is pretty pervasive. The patchwork
> CI blew up too & that is using kernel.org toolchains built by Arnd:
> https://mirrors.edge.kernel.org/pub/tools/crosstool/
> 
> > Compiling with defconfig automatically enables Zba and appears to not
> > cause this issue. I realized that I put the header definitions for
> > do_csum and csum_ipv6_magic in this patch instead of the next one so the
> > code will fail to compile from this but not due to march settings.
> > > 
> > > > 
> > > > > > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > > > > > remove it again before passing to the compiler, only allow them in
> > > > > > AFLAGS:
> > > > > > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > > > > > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > > > > > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > > > > 
> > > > > > What am I missing?
> > > > > 
> > > > > FD and V both have state that can be saved lazily, so we can't let arbitrary
> > > > > code use them.  The extensions formally known as B don't add state, so they
> > > > > are safe to flip on in arbitrary places (aside from the issues you pointed
> > > > > out above).
> > > > 
> > > > I probably went about this badly since you missed the point. I was
> > > > trying to point out that for anything other than the compressed
> > > > extensions in the block above that we only pass them in march to the
> > > > assembler, and not to the compiler, in contrast to this patch which just
> > > > always passes them. I should have pointed to how we handled the
> > > > in-kernel Zbb stuff & asked how this was any different, would probably
> > > > have been clearer.
> > > > 
> > I supposed it might be better if I submit these changes in a different
> > patch so we can have more discussion there. Zbb was previously only used
> > by assembly files (arch/riscv/lib/strcmp.S, arch/riscv/lib/strlen.S,
> > arch/riscv/lib/strncmp.S). I wanted to add them to the compiler so that
> > that C programs could leverage these extensions. However, I neglected to
> > consider machines that compile the kernel with these extensions but have
> > cores without these extensions.
> 
> Less so cores, since we don't support heterogeneous stuff, and moreso
> platforms that do not support the extensions. It's expected that the
> same kernel could in theory be used across a wide variety of systems.
> 
> > The purpose of using these extensions is
> > to save a couple of clock cycles, so if it is necessary to first
> > check if the extension is enabled it may not be worth it for these
> > functions.
> 
> That's still possible, it's what the alternatives mechanism exists for.
> During boot the codepaths are patched to use what works for a given
> machine, check out the code that makes use of Zbb or
> riscv_has_extension_[un]likely(). You'd need to do something like the
> existing users of Zbb instructions does, with an alternative used to
> avoid the custom asm implementations when the hardware does not support
> them. (That's what the CONFIG_ALTERNATIVE & CONFIG_AS_HAS_OPTION_ARCH
> options you made the TOOLCHAIN_HAS_ZBA depend on are for).

I can see how to get this to work if I port this code into assembly and
write two different versions (one with Zbb and one without), but
I don't see how this would work in C. Unless I am mistaken, there would
need to be some sort of wrapper around the C code that told the compiler
to compile it multiple times for different extension combinations and
then use the riscv_has_extension_[un]likely() functions to determine
which version to use at runtime. Is this feasible?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] riscv: Vector checksum header
  2023-08-27  1:26 ` [PATCH 3/5] riscv: Vector checksum header Charlie Jenkins
@ 2023-08-28 18:22   ` Samuel Holland
  0 siblings, 0 replies; 16+ messages in thread
From: Samuel Holland @ 2023-08-28 18:22 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
	linux-kernel

On 2023-08-26 8:26 PM, Charlie Jenkins wrote:
> This patch is not ready for merge as vector support in the kernel is
> limited. However, the code has been tested in QEMU so the algorithms
> do work. It is written in assembly rather than using the GCC vector
> instrinsics because they did not provide optimal code.
> 
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/checksum.h | 81 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> index af49b3409576..7e31c0ad6346 100644
> --- a/arch/riscv/include/asm/checksum.h
> +++ b/arch/riscv/include/asm/checksum.h
> @@ -10,6 +10,10 @@
>  #include <linux/in6.h>
>  #include <linux/uaccess.h>
>  
> +#ifdef CONFIG_RISCV_ISA_V
> +#include <riscv_vector.h>
> +#endif
> +
>  /* Default version is sufficient for 32 bit */
>  #ifdef CONFIG_64BIT
>  #define _HAVE_ARCH_IPV6_CSUM
> @@ -36,6 +40,46 @@ static inline __sum16 csum_fold(__wsum sum)
>   *	without the bitmanip extensions zba/zbb.
>   */
>  #ifdef CONFIG_32BIT
> +#ifdef CONFIG_RISCV_ISA_V
> +static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> +{
> +	vuint64m1_t prev_buffer;
> +	vuint32m1_t curr_buffer;
> +	unsigned int vl;
> +	unsigned int high_result;
> +	unsigned int low_result;
> +
> +	asm("vsetivli x0, 1, e64, ta, ma				\n\t\

The same concerns from patch 1 apply here as well. Vector assembly must be gated
behind an alternative section or a call to has_vector(), so the kernel can fall
back to a non-vector implementation at runtime.

You are also missing calls to kernel_vector_begin()/kernel_vector_end(), as
added by [1], which are required to avoid corrupting the user-mode vector
register context.
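
A rough sketch of the shape that gating could take (not from the patch:
ip_fast_csum_vector() and ip_fast_csum_scalar() are placeholder names,
and kernel_vector_begin()/kernel_vector_end() only exist in the
not-yet-merged series referenced as [1] below):

static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
{
	if (has_vector()) {
		__sum16 csum;

		kernel_vector_begin();	/* take ownership of the V registers */
		csum = ip_fast_csum_vector(iph, ihl);
		kernel_vector_end();	/* give the user V context back */
		return csum;
	}

	/* Runtime fallback for hardware without the V extension. */
	return ip_fast_csum_scalar(iph, ihl);
}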

Regards,
Samuel

[1]: https://lore.kernel.org/linux-riscv/20230721112855.1006-3-andy.chiu@sifive.com/

> +	vmv.v.i %[prev_buffer], 0					\n\t\
> +	1:								\n\t\
> +	vsetvli %[vl], %[ihl], e32, m1, ta, ma				\n\t\
> +	vle32.v %[curr_buffer], (%[iph])				\n\t\
> +	vwredsumu.vs %[prev_buffer], %[curr_buffer], %[prev_buffer]	\n\t\
> +	sub %[ihl], %[ihl], %[vl]					\n\t"
> +#ifdef CONFIG_RISCV_ISA_ZBA
> +	"sh2add %[iph], %[vl], %[iph]					\n\t"
> +#else
> +	"slli %[vl], %[vl], 2						\n\
> +	add %[iph], %[vl], %[iph]					\n\t"
> +#endif
> +	"bnez %[ihl], 1b						\n\
> +	vsetivli x0, 1, e64, m1, ta, ma					\n\
> +	vmv.x.s %[low_result], %[prev_buffer]				\n\
> +	addi %[vl], x0, 32						\n\
> +	vsrl.vx %[prev_buffer], %[prev_buffer], %[vl]			\n\
> +	vmv.x.s %[high_result], %[prev_buffer]"
> +	: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
> +		[curr_buffer] "=&vd" (curr_buffer),
> +		[high_result] "=&r" (high_result),
> +		[low_result] "=&r" (low_result)
> +	: [iph] "r" (iph), [ihl] "r" (ihl));
> +
> +	high_result += low_result;
> +	high_result += high_result < low_result;
> +	return csum_fold((__force __wsum)(high_result));
> +}
> +
> +#else
>  static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
>  {
>  	__wsum csum = 0;
> @@ -47,8 +91,44 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
>  	} while (++pos < ihl);
>  	return csum_fold(csum);
>  }
> +#endif
> +#else
> +
> +#ifdef CONFIG_RISCV_ISA_V
> +static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> +{
> +	vuint64m1_t prev_buffer;
> +	vuint32m1_t curr_buffer;
> +	unsigned long vl;
> +	unsigned long result;
> +
> +	asm("vsetivli x0, 1, e64, ta, ma				\n\
> +	vmv.v.i %[prev_buffer], 0					\n\
> +	1:								\n\
> +	# Setup 32-bit sum of iph					\n\
> +	vsetvli %[vl], %[ihl], e32, m1, ta, ma				\n\
> +	vle32.v %[curr_buffer], (%[iph])				\n\
> +	# Sum each 32-bit segment of iph that can fit into a vector reg	\n\
> +	vwredsumu.vs %[prev_buffer], %[curr_buffer], %[prev_buffer]     \n\
> +	subw %[ihl], %[ihl], %[vl]					\n\t"
> +#ifdef CONFIG_RISCV_ISA_ZBA
> +	"sh2add %[iph], %[vl], %[iph]					\n\t"
>  #else
> +	"slli %[vl], %[vl], 2						\n\
> +	addw %[iph], %[vl], %[iph]					\n\t"
> +#endif
> +	"# If not all of iph could fit into vector reg, do another sum	\n\
> +	bnez %[ihl], 1b							\n\
> +	vsetvli x0, x0, e64, m1, ta, ma					\n\
> +	vmv.x.s %[result], %[prev_buffer]"
> +	: [vl] "=&r" (vl), [prev_buffer] "=&vd" (prev_buffer),
> +		[curr_buffer] "=&vd" (curr_buffer), [result] "=&r" (result)
> +	: [iph] "r" (iph), [ihl] "r" (ihl));
>  
> +	result += (result >> 32) | (result << 32);
> +	return csum_fold((__force __wsum)(result >> 32));
> +}
> +#else
>  /*
>   * Quickly compute an IP checksum with the assumption that IPv4 headers will
>   * always be in multiples of 32-bits, and have an ihl of at least 5.
> @@ -74,6 +154,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
>  	return csum_fold((__force __wsum)(csum >> 32));
>  }
>  #endif
> +#endif
>  #define ip_fast_csum ip_fast_csum
>  
>  extern unsigned int do_csum(const unsigned char *buff, int len);
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-28 18:20               ` Charlie Jenkins
@ 2023-08-28 18:56                 ` Conor Dooley
  2023-08-28 21:39                   ` Charlie Jenkins
  0 siblings, 1 reply; 16+ messages in thread
From: Conor Dooley @ 2023-08-28 18:56 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, Paul Walmsley, aou


On Mon, Aug 28, 2023 at 11:20:39AM -0700, Charlie Jenkins wrote:
> On Mon, Aug 28, 2023 at 06:08:40PM +0100, Conor Dooley wrote:
> > On Mon, Aug 28, 2023 at 09:55:49AM -0700, Charlie Jenkins wrote:
> > > On Sun, Aug 27, 2023 at 01:25:27PM +0100, Conor Dooley wrote:
> > > > On Sun, Aug 27, 2023 at 11:28:33AM +0100, Conor Dooley wrote:
> > > > > On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> > > > > > On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > > > > > > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > > > > > > Provide checksum algorithms that have been designed to leverage riscv
> > > > > > > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > > > > > > register to avoid some overflow checking.
> > > > > > > > 
> > > > > > > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > > > > > > 
> > > > > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > > > > ---
> > > > > > > >  arch/riscv/Kconfig                | 23 +++++++++++
> > > > > > > >  arch/riscv/Makefile               |  2 +
> > > > > > > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > > > > > > >  3 files changed, 111 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > index 4c07b9189c86..8d7e475ca28d 100644
> > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > > > > > > >  	  If you don't know what to do here, say Y.
> > > > > > > > +config TOOLCHAIN_HAS_ZBA
> > > > > > > > +	bool
> > > > > > > > +	default y
> > > > > > > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > > > > > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > > > > > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > > > > > > +	depends on AS_HAS_OPTION_ARCH
> > > > > > > > +
> > > > > > > > +config RISCV_ISA_ZBA
> > > > > > > > +	bool "Zba extension support for bit manipulation instructions"
> > > > > > > > +	depends on TOOLCHAIN_HAS_ZBA
> > > > > > > > +	depends on MMU
> > > > > > > > +	depends on RISCV_ALTERNATIVE
> > > > > > > > +	default y
> > > > > > > > +	help
> > > > > > > > +	   Adds support to dynamically detect the presence of the ZBA
> > > > > > > > +	   extension (basic bit manipulation) and enable its usage.
> > > > > > > > +
> > > > > > > > +	   The Zba extension provides instructions to accelerate a number
> > > > > > > > +	   of bit-specific address creation operations.
> > > > > > > > +
> > > > > > > > +	   If you don't know what to do here, say Y.
> > > > > > > > +
> > > > > > > >  config TOOLCHAIN_HAS_ZBB
> > > > > > > >  	bool
> > > > > > > >  	default y
> > > > > > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > > > > > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > > > > > > --- a/arch/riscv/Makefile
> > > > > > > > +++ b/arch/riscv/Makefile
> > > > > > > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > > > > > > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > > > > > > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > > > > > > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > > > > > > 
> > > > > > > AFAICT, this is going to break immediately on any system that enables
> > > > > > > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > > > > > > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > > > > > > not see any use of alternatives in the code to actually perform the
> > > > > > > dynamic detection of Zba.
> > > > > > 
> > > > > > I guess we kind of have an ambiguity here: for stuff like C we just
> > > > > > unconditionally use the instructions, but for the rest we probe first.  We
> > > > > > should probably have three states for each extension: disabled, dynamically
> > > > > > detected, and assumed.
> > > > > 
> > > > > You mean, just add some comments to the makefile surrounding each
> > > > > section or to some rst documentation?
> > > > 
> > > > Also, the code here doesn't build w/
> > > > 	warning: invalid argument to '-march': '_zba_zbb_zicsr_zifencei_zihintpause'
> > > > so there's something else wrong with TOOLCHAIN_HAS_ZBA :)
> > > It is odd that this is missing 'rv64ima' or 'rv32ima' at the beginning of
> > > this string. What configuration are you using that could cause that to
> > > be left off?
> > 
> > I don't know, but that configuration is pretty pervasive. The patchwork
> > CI blew up too & that is using kernel.org toolchains built by Arnd:
> > https://mirrors.edge.kernel.org/pub/tools/crosstool/
> > 
> > > Compiling with defconfig automatically enables Zba and appears to not
> > > cause this issue. I realized that I put the header definitions for
> > > do_csum and csum_ipv6_magic in this patch instead of the next one so the
> > > code will fail to compile from this but not due to march settings.
> > > > 
> > > > > 
> > > > > > > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > > > > > > remove it again before passing to the compiler, only allow them in
> > > > > > > AFLAGS:
> > > > > > > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > > > > > > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > > > > > > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > > > > > 
> > > > > > > What am I missing?
> > > > > > 
> > > > > > FD and V both have state that can be saved lazily, so we can't let arbitrary
> > > > > > code use them.  The extensions formally known as B don't add state, so they
> > > > > > are safe to flip on in arbitrary places (aside from the issues you pointed
> > > > > > out above).
> > > > > 
> > > > > I probably went about this badly since you missed the point. I was
> > > > > trying to point out that for anything other than the compressed
> > > > > extensions in the block above that we only pass them in march to the
> > > > > assembler, and not to the compiler, in contrast to this patch which just
> > > > > always passes them. I should have pointed to how we handled the
> > > > > in-kernel Zbb stuff & asked how this was any different, would probably
> > > > > have been clearer.
> > > > > 
> > > I supposed it might be better if I submit these changes in a different
> > > patch so we can have more discussion there. Zbb was previously only used
> > > by assembly files (arch/riscv/lib/strcmp.S, arch/riscv/lib/strlen.S,
> > > arch/riscv/lib/strncmp.S). I wanted to add them to the compiler so that
> > > that C programs could leverage these extensions. However, I neglected to
> > > consider machines that compile the kernel with these extensions but have
> > > cores without these extensions.
> > 
> > Less so cores, since we don't support heterogeneous stuff, and moreso
> > platforms that do not support the extensions. It's expected that the
> > same kernel could in theory be used across a wide variety of systems.
> > 
> > > The purpose of using these extensions is
> > > to save a couple of clock cycles, so if it is necessary to first
> > > check if the extension is enabled it may not be worth it for these
> > > functions.
> > 
> > That's still possible, it's what the alternatives mechanism exists for.
> > During boot the codepaths are patched to use what works for a given
> > machine, check out the code that makes use of Zbb or
> > riscv_has_extension_[un]likely(). You'd need to do something like the
> > existing users of Zbb instructions does, with an alternative used to
> > avoid the custom asm implementations when the hardware does not support
> > them. (That's what the CONFIG_ALTERNATIVE & CONFIG_AS_HAS_OPTION_ARCH
> > options you made the TOOLCHAIN_HAS_ZBA depend on are for).
> 
> I can see how to get this to work if I port this code into assembly and
> write two different versions (one with Zbb and one without), but
> I don't see how this would work in C. Unless I am mistaken, there would
> need to be some sort of wrapper around the C code that told the compiler
> to compile it multiple times for different extension combinations and
> then use the riscv_has_extension_[un]likely() functions to determine
> which version to use at runtime. Is this feasible?

IIRC, if you put all the code using Zbb etc. into a compilation unit of
its own then you can set march for that unit alone, but it may well just
be easier to write a custom asm version for the Zbb case & use the C
implementation from this patch for the non-Zbb case.
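
For example, something along these lines in arch/riscv/lib/Makefile
(only a sketch assuming a hypothetical csum_zbb.c; kbuild appends
per-object CFLAGS after the global flags, so the later -march applies to
that one file alone):

obj-y				+= csum.o
obj-$(CONFIG_RISCV_ISA_ZBB)	+= csum_zbb.o
# RV64 string shown for brevity; a real patch would need to derive this
# from the configured base ISA rather than hard-coding it.
CFLAGS_csum_zbb.o		:= -march=rv64ima_zbb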


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] riscv: Checksum header
  2023-08-28 18:56                 ` Conor Dooley
@ 2023-08-28 21:39                   ` Charlie Jenkins
  0 siblings, 0 replies; 16+ messages in thread
From: Charlie Jenkins @ 2023-08-28 21:39 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, Paul Walmsley, aou

On Mon, Aug 28, 2023 at 07:56:13PM +0100, Conor Dooley wrote:
> On Mon, Aug 28, 2023 at 11:20:39AM -0700, Charlie Jenkins wrote:
> > On Mon, Aug 28, 2023 at 06:08:40PM +0100, Conor Dooley wrote:
> > > On Mon, Aug 28, 2023 at 09:55:49AM -0700, Charlie Jenkins wrote:
> > > > On Sun, Aug 27, 2023 at 01:25:27PM +0100, Conor Dooley wrote:
> > > > > On Sun, Aug 27, 2023 at 11:28:33AM +0100, Conor Dooley wrote:
> > > > > > On Sat, Aug 26, 2023 at 07:00:47PM -0700, Palmer Dabbelt wrote:
> > > > > > > On Sat, 26 Aug 2023 18:42:41 PDT (-0700), Conor Dooley wrote:
> > > > > > > > On Sat, Aug 26, 2023 at 06:26:06PM -0700, Charlie Jenkins wrote:
> > > > > > > > > Provide checksum algorithms that have been designed to leverage riscv
> > > > > > > > > instructions such as rotate. In 64-bit, can take advantage of the larger
> > > > > > > > > register to avoid some overflow checking.
> > > > > > > > > 
> > > > > > > > > Add configuration for Zba extension and add march for Zba and Zbb.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > > > > > ---
> > > > > > > > >  arch/riscv/Kconfig                | 23 +++++++++++
> > > > > > > > >  arch/riscv/Makefile               |  2 +
> > > > > > > > >  arch/riscv/include/asm/checksum.h | 86 +++++++++++++++++++++++++++++++++++++++
> > > > > > > > >  3 files changed, 111 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > > index 4c07b9189c86..8d7e475ca28d 100644
> > > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > > @@ -507,6 +507,29 @@ config RISCV_ISA_V_DEFAULT_ENABLE
> > > > > > > > >  	  If you don't know what to do here, say Y.
> > > > > > > > > +config TOOLCHAIN_HAS_ZBA
> > > > > > > > > +	bool
> > > > > > > > > +	default y
> > > > > > > > > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zba)
> > > > > > > > > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zba)
> > > > > > > > > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > > > > > > > > +	depends on AS_HAS_OPTION_ARCH
> > > > > > > > > +
> > > > > > > > > +config RISCV_ISA_ZBA
> > > > > > > > > +	bool "Zba extension support for bit manipulation instructions"
> > > > > > > > > +	depends on TOOLCHAIN_HAS_ZBA
> > > > > > > > > +	depends on MMU
> > > > > > > > > +	depends on RISCV_ALTERNATIVE
> > > > > > > > > +	default y
> > > > > > > > > +	help
> > > > > > > > > +	   Adds support to dynamically detect the presence of the ZBA
> > > > > > > > > +	   extension (basic bit manipulation) and enable its usage.
> > > > > > > > > +
> > > > > > > > > +	   The Zba extension provides instructions to accelerate a number
> > > > > > > > > +	   of bit-specific address creation operations.
> > > > > > > > > +
> > > > > > > > > +	   If you don't know what to do here, say Y.
> > > > > > > > > +
> > > > > > > > >  config TOOLCHAIN_HAS_ZBB
> > > > > > > > >  	bool
> > > > > > > > >  	default y
> > > > > > > > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > > > > > > > index 6ec6d52a4180..51fa3f67fc9a 100644
> > > > > > > > > --- a/arch/riscv/Makefile
> > > > > > > > > +++ b/arch/riscv/Makefile
> > > > > > > > > @@ -61,6 +61,8 @@ riscv-march-$(CONFIG_ARCH_RV64I)	:= rv64ima
> > > > > > > > >  riscv-march-$(CONFIG_FPU)		:= $(riscv-march-y)fd
> > > > > > > > >  riscv-march-$(CONFIG_RISCV_ISA_C)	:= $(riscv-march-y)c
> > > > > > > > >  riscv-march-$(CONFIG_RISCV_ISA_V)	:= $(riscv-march-y)v
> > > > > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBA)	:= $(riscv-march-y)_zba
> > > > > > > > > +riscv-march-$(CONFIG_RISCV_ISA_ZBB)	:= $(riscv-march-y)_zbb
> > > > > > > > 
> > > > > > > > AFAICT, this is going to break immediately on any system that enables
> > > > > > > > RISCV_ISA_ZBA (which will happen by default) but does not support the
> > > > > > > > extension. You made the option depend on RISCV_ALTERNATIVE, but I do
> > > > > > > > not see any use of alternatives in the code to actually perform the
> > > > > > > > dynamic detection of Zba.
> > > > > > > 
> > > > > > > I guess we kind of have an ambiguity here: for stuff like C we just
> > > > > > > unconditionally use the instructions, but for the rest we probe first.  We
> > > > > > > should probably have three states for each extension: disabled, dynamically
> > > > > > > detected, and assumed.
> > > > > > 
> > > > > > You mean, just add some comments to the makefile surrounding each
> > > > > > section or to some rst documentation?
> > > > > 
> > > > > Also, the code here doesn't build w/
> > > > > 	warning: invalid argument to '-march': '_zba_zbb_zicsr_zifencei_zihintpause'
> > > > > so there's something else wrong with TOOLCHAIN_HAS_ZBA :)
> > > > It is odd that this is missing 'rv64ima' or 'rv32ima' at the beginning of
> > > > this string. What configuration are you using that could cause that to
> > > > be left off?
> > > 
> > > I don't know, but that configuration is pretty pervasive. The patchwork
> > > CI blew up too & that is using kernel.org toolchains built by Arnd:
> > > https://mirrors.edge.kernel.org/pub/tools/crosstool/
> > > 
> > > > Compiling with defconfig automatically enables Zba and appears to not
> > > > cause this issue. I realized that I put the header definitions for
> > > > do_csum and csum_ipv6_magic in this patch instead of the next one so the
> > > > code will fail to compile from this but not due to march settings.
> > > > > 
> > > > > > 
> > > > > > > > Note that for fd & v, we add it to riscv-march-y, but then immediately
> > > > > > > > remove it again before passing to the compiler, only allow them in
> > > > > > > > AFLAGS:
> > > > > > > > 	# Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > > > > > > > 	# matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > > > > > > > 	KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > > > > > > 
> > > > > > > > What am I missing?
> > > > > > > 
> > > > > > > FD and V both have state that can be saved lazily, so we can't let arbitrary
> > > > > > > code use them.  The extensions formally known as B don't add state, so they
> > > > > > > are safe to flip on in arbitrary places (aside from the issues you pointed
> > > > > > > out above).
> > > > > > 
> > > > > > I probably went about this badly since you missed the point. I was
> > > > > > trying to point out that for anything other than the compressed
> > > > > > extensions in the block above that we only pass them in march to the
> > > > > > assembler, and not to the compiler, in contrast to this patch which just
> > > > > > always passes them. I should have pointed to how we handled the
> > > > > > in-kernel Zbb stuff & asked how this was any different, would probably
> > > > > > have been clearer.
> > > > > > 
> > > > I supposed it might be better if I submit these changes in a different
> > > > patch so we can have more discussion there. Zbb was previously only used
> > > > by assembly files (arch/riscv/lib/strcmp.S, arch/riscv/lib/strlen.S,
> > > > arch/riscv/lib/strncmp.S). I wanted to add them to the compiler so that
> > > > that C programs could leverage these extensions. However, I neglected to
> > > > consider machines that compile the kernel with these extensions but have
> > > > cores without these extensions.
> > > 
> > > Less so cores, since we don't support heterogeneous stuff, and moreso
> > > platforms that do not support the extensions. It's expected that the
> > > same kernel could in theory be used across a wide variety of systems.
> > > 
> > > > The purpose of using these extensions is
> > > > to save a couple of clock cycles, so if it is necessary to first
> > > > check if the extension is enabled it may not be worth it for these
> > > > functions.
> > > 
> > > That's still possible, it's what the alternatives mechanism exists for.
> > > During boot the codepaths are patched to use what works for a given
> > > machine, check out the code that makes use of Zbb or
> > > riscv_has_extension_[un]likely(). You'd need to do something like the
> > > existing users of Zbb instructions does, with an alternative used to
> > > avoid the custom asm implementations when the hardware does not support
> > > them. (That's what the CONFIG_ALTERNATIVE & CONFIG_AS_HAS_OPTION_ARCH
> > > options you made the TOOLCHAIN_HAS_ZBA depend on are for).
> > 
> > I can see how to get this to work if I port this code into assembly and
> > write two different versions (one with Zbb and one without), but
> > I don't see how this would work in C. Unless I am mistaken, there would
> > need to be some sort of wrapper around the C code that told the compiler
> > to compile it multiple times for different extension combinations and
> > then use the riscv_has_extension_[un]likely() functions to determine
> > which version to use at runtime. Is this feasible?
> 
> IIRC, if you put all the code using Zbb etc into a compilation unit of
> its own then you can set march for that unit alone, but it may well just
> be easier to write a custom asm one for the Zbb case & use the c
> implementation from this patch for the non-Zbb case.
In include/linux/bitops.h there are implementations of rotations (e.g.
ror32, ror16). Do you think it would be an acceptable solution to move
those definitions into asm-generic and have riscv implement these
functions in assembly, with Zbb and non-Zbb versions selected via the
alternatives macro? Every other architecture would just use the
existing definitions but could implement their own if they wanted. The
checksum code that would benefit from Zbb is a rotation.
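
Roughly what that split could look like (a sketch, not existing code:
generic_ror32() mirrors the current include/linux/bitops.h helper,
riscv_ror32() is the hypothetical override, and rorw is the RV64 form;
RV32 would use ror instead):

/* Generic version, roughly as found in include/linux/bitops.h today. */
static inline __u32 generic_ror32(__u32 word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/*
 * Sketch of a riscv override: use Zbb's rotate when the hardware has
 * it, fall back to the generic shift pair otherwise.
 */
static inline __u32 riscv_ror32(__u32 word, unsigned int shift)
{
	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
	    riscv_has_extension_likely(RISCV_ISA_EXT_ZBB)) {
		__u32 out;

		asm(".option push\n"
		    ".option arch,+zbb\n"
		    "rorw %0, %1, %2\n"	/* 32-bit rotate right (RV64) */
		    ".option pop\n"
		    : "=r" (out) : "r" (word), "r" (shift));
		return out;
	}
	return generic_ror32(word, shift);
}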


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-08-28 21:39 UTC | newest]

Thread overview: 16+ messages
-- links below jump to the message on this page --
2023-08-27  1:26 [PATCH 0/5] riscv: Add fine-tuned checksum functions Charlie Jenkins
2023-08-27  1:26 ` [PATCH 1/5] riscv: Checksum header Charlie Jenkins
2023-08-27  1:42   ` Conor Dooley
2023-08-27  2:00     ` Palmer Dabbelt
2023-08-27 10:28       ` Conor Dooley
2023-08-27 12:25         ` Conor Dooley
2023-08-28 16:55           ` Charlie Jenkins
2023-08-28 17:08             ` Conor Dooley
2023-08-28 18:20               ` Charlie Jenkins
2023-08-28 18:56                 ` Conor Dooley
2023-08-28 21:39                   ` Charlie Jenkins
2023-08-27  1:26 ` [PATCH 2/5] riscv: Add checksum library Charlie Jenkins
2023-08-27  1:26 ` [PATCH 3/5] riscv: Vector checksum header Charlie Jenkins
2023-08-28 18:22   ` Samuel Holland
2023-08-27  1:26 ` [PATCH 4/5] riscv: Vector checksum library Charlie Jenkins
2023-08-27  1:26 ` [PATCH 5/5] riscv: Test checksum functions Charlie Jenkins
