linux-doc.vger.kernel.org archive mirror
* [PATCH v3 00/11] Zacas/Zabha support and qspinlocks
@ 2024-07-17  6:19 Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
                   ` (11 more replies)
  0 siblings, 12 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

This series implements the [cmp]xchgXX() macros using the Zacas and Zabha
extensions, then uses those new macros to add qspinlock support. Note
that this qspinlock implementation satisfies the forward progress
guarantee.

It also uses Ziccrse to provide the qspinlock implementation.

Thanks to Guo and Leonardo for their work!

v2: https://lore.kernel.org/linux-riscv/20240626130347.520750-1-alexghiti@rivosinc.com/
v1: https://lore.kernel.org/linux-riscv/20240528151052.313031-1-alexghiti@rivosinc.com/

Changes in v3:
- Fix patch 4 to restrict the optimization to fully ordered AMO (Andrea)
- Move RISCV_ISA_EXT_ZABHA definition to patch 4 (Andrea)
- !Zacas at build time => no CAS from Zabha too (Andrea)
- drop patch 7 "riscv: Improve amoswap.X use in xchg()" (Andrea)
- Switch lr/sc and cas order (Guo)
- Combo spinlocks do not depend on Zabha
- Add a Kconfig for ticket/queued/combo (Guo)
- Use Ziccrse (Guo)

Changes in v2:
- Add patch for Zabha dtbinding (Conor)
- Fix cmpxchg128() build warnings missed in v1
- Make arch_cmpxchg128() fully ordered
- Improve Kconfig help texts for both extensions (Conor)
- Fix Makefile dependencies by requiring TOOLCHAIN_HAS_XXX (Nathan)
- Fix compilation errors when the toolchain does not support the
  extensions (Nathan)
- Fix C23 warnings about label at the end of compound statements (Nathan)
- Fix Zabha and !Zacas configurations (Andrea)
- Add COMBO spinlocks (Guo)
- Improve amocas fully ordered operations by using .aqrl semantics and
  removing the fence rw, rw (Andrea)
- Rebase on top "riscv: Fix fully ordered LR/SC xchg[8|16]() implementations"
- Add ARCH_WEAK_RELEASE_ACQUIRE (Andrea)
- Remove the extension version in march for LLVM since it is only required
  for experimental extensions (Nathan)
- Fix cmpxchg128() implementation by adding both registers of a pair
  in the list of input/output operands

Alexandre Ghiti (9):
  riscv: Implement cmpxchg32/64() using Zacas
  dt-bindings: riscv: Add Zabha ISA extension description
  riscv: Implement cmpxchg8/16() using Zabha
  riscv: Improve zacas fully-ordered cmpxchg()
  riscv: Implement arch_cmpxchg128() using Zacas
  riscv: Implement xchg8/16() using Zabha
  riscv: Add ISA extension parsing for Ziccrse
  dt-bindings: riscv: Add Ziccrse ISA extension description
  riscv: Add qspinlock support

Guo Ren (2):
  asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
  asm-generic: ticket-lock: Add separate ticket-lock.h

 .../devicetree/bindings/riscv/extensions.yaml |  12 ++
 .../locking/queued-spinlocks/arch-support.txt |   2 +-
 arch/riscv/Kconfig                            |  64 +++++++
 arch/riscv/Makefile                           |   6 +
 arch/riscv/include/asm/Kbuild                 |   4 +-
 arch/riscv/include/asm/cmpxchg.h              | 173 +++++++++++++++---
 arch/riscv/include/asm/hwcap.h                |   2 +
 arch/riscv/include/asm/spinlock.h             |  39 ++++
 arch/riscv/kernel/cpufeature.c                |   2 +
 arch/riscv/kernel/setup.c                     |  33 ++++
 include/asm-generic/qspinlock.h               |   2 +
 include/asm-generic/spinlock.h                |  87 +--------
 include/asm-generic/spinlock_types.h          |  12 +-
 include/asm-generic/ticket_spinlock.h         | 105 +++++++++++
 14 files changed, 424 insertions(+), 119 deletions(-)
 create mode 100644 arch/riscv/include/asm/spinlock.h
 create mode 100644 include/asm-generic/ticket_spinlock.h

-- 
2.39.2


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17 15:08   ` Andrew Jones
  2024-07-19  0:45   ` Samuel Holland
  2024-07-17  6:19 ` [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

This adds runtime support for Zacas in cmpxchg operations.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/Kconfig               | 17 +++++++++++++++++
 arch/riscv/Makefile              |  3 +++
 arch/riscv/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 05ccba8ca33a..1caaedec88c7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
 	  preemption. Enabling this config will result in higher memory
 	  consumption due to the allocation of per-task's kernel Vector context.
 
+config TOOLCHAIN_HAS_ZACAS
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
+	depends on AS_HAS_OPTION_ARCH
+
+config RISCV_ISA_ZACAS
+	bool "Zacas extension support for atomic CAS"
+	depends on TOOLCHAIN_HAS_ZACAS
+	default y
+	help
+	  Enable the use of the Zacas ISA-extension to implement kernel atomic
+	  cmpxchg operations when it is detected at boot.
+
+	  If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZBB
 	bool
 	default y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 06de9d365088..9fd13d7a9cc6 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -85,6 +85,9 @@ endif
 # Check if the toolchain supports Zihintpause extension
 riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
 
+# Check if the toolchain supports Zacas
+riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
+
 # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
 # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
 KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 808b4c78462e..5d38153e2f13 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -9,6 +9,7 @@
 #include <linux/bug.h>
 
 #include <asm/fence.h>
+#include <asm/alternative.h>
 
 #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
 ({									\
@@ -134,21 +135,40 @@
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
 })
 
-#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
+#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
 ({									\
+	__label__ no_zacas, end;					\
 	register unsigned int __rc;					\
 									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
+		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
+				     RISCV_ISA_EXT_ZACAS, 1)		\
+			 : : : : no_zacas);				\
+									\
+		__asm__ __volatile__ (					\
+			prepend						\
+			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
+			append						\
+			: "+&r" (r), "+A" (*(p))			\
+			: "rJ" (n)					\
+			: "memory");					\
+		goto end;						\
+	}								\
+									\
+no_zacas:								\
 	__asm__ __volatile__ (						\
 		prepend							\
 		"0:	lr" lr_sfx " %0, %2\n"				\
 		"	bne  %0, %z3, 1f\n"				\
-		"	sc" sc_sfx " %1, %z4, %2\n"			\
+		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
 		"	bnez %1, 0b\n"					\
 		append							\
 		"1:\n"							\
 		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
 		: "rJ" (co o), "rJ" (n)					\
 		: "memory");						\
+									\
+end:;									\
 })
 
 #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
@@ -156,7 +176,7 @@
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(*(__ptr)) __old = (old);				\
 	__typeof__(*(__ptr)) __new = (new);				\
-	__typeof__(*(__ptr)) __ret;					\
+	__typeof__(*(__ptr)) __ret = (old);				\
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
-- 
2.39.2
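
Editorial sketch (not part of the patch): the diff above initialises
__ret to (old) and passes it to amocas as a "+&r" operand, because the
instruction's destination register holds the expected value on entry and
the observed value on exit. C11's compare-exchange has the same shape, so
the return-value contract can be illustrated in portable C. The helper
name cmpxchg32_sketch is hypothetical, and the C11 call stands in for
both the amocas path and the LR/SC fallback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns the value observed at
 * *p. The store of new_val happened if and only if the return value
 * equals old, which is the contract of the kernel's cmpxchg family.
 */
static inline uint32_t cmpxchg32_sketch(_Atomic uint32_t *p, uint32_t old,
					uint32_t new_val)
{
	/* Like the "+&r" operand: expected value in, observed value out. */
	uint32_t seen = old;

	assert(p != NULL);

	/* On mismatch, C11 overwrites 'seen' with the current contents. */
	atomic_compare_exchange_strong(p, &seen, new_val);
	return seen;
}
```

A caller detects success by comparing the return value with the old
value it passed in.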



* [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  6:42   ` Krzysztof Kozlowski
  2024-07-17  9:32   ` Guo Ren
  2024-07-17  6:19 ` [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

Add description for the Zabha ISA extension which was ratified in April
2024.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 468c646247aa..e6436260bdeb 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -171,6 +171,12 @@ properties:
             memory types as ratified in the 20191213 version of the privileged
             ISA specification.
 
+        - const: zabha
+          description: |
+            The Zabha extension for Byte and Halfword Atomic Memory Operations
+            as ratified at commit 49f49c842ff9 ("Update to Rafified state") of
+            riscv-zabha.
+
         - const: zacas
           description: |
             The Zacas extension for Atomic Compare-and-Swap (CAS) instructions
-- 
2.39.2



* [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17 15:26   ` Andrew Jones
  2024-07-17  6:19 ` [PATCH v3 04/11] riscv: Improve zacas fully-ordered cmpxchg() Alexandre Ghiti
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

This adds runtime support for Zabha in cmpxchg8/16() operations.

Note that in the absence of Zacas support in the toolchain, CAS
instructions from Zabha won't be used.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/Kconfig               | 17 ++++++++++++++++
 arch/riscv/Makefile              |  3 +++
 arch/riscv/include/asm/cmpxchg.h | 33 ++++++++++++++++++++++++++++++--
 arch/riscv/include/asm/hwcap.h   |  1 +
 arch/riscv/kernel/cpufeature.c   |  1 +
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 1caaedec88c7..d3b0f92f92da 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
 	  preemption. Enabling this config will result in higher memory
 	  consumption due to the allocation of per-task's kernel Vector context.
 
+config TOOLCHAIN_HAS_ZABHA
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zabha)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zabha)
+	depends on AS_HAS_OPTION_ARCH
+
+config RISCV_ISA_ZABHA
+	bool "Zabha extension support for atomic byte/halfword operations"
+	depends on TOOLCHAIN_HAS_ZABHA
+	default y
+	help
+	  Enable the use of the Zabha ISA-extension to implement kernel
+	  byte/halfword atomic memory operations when it is detected at boot.
+
+	  If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZACAS
 	bool
 	default y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 9fd13d7a9cc6..78dcaaeebf4e 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -88,6 +88,9 @@ riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
 # Check if the toolchain supports Zacas
 riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
 
+# Check if the toolchain supports Zabha
+riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZABHA) := $(riscv-march-y)_zabha
+
 # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
 # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
 KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 5d38153e2f13..c86722a101d0 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -105,8 +105,30 @@
  * indicated by comparing RETURN with OLD.
  */
 
-#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
+#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
 ({									\
+	__label__ no_zabha_zacas, end;					\
+									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&			\
+	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
+		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
+				     RISCV_ISA_EXT_ZABHA, 1)		\
+			 : : : : no_zabha_zacas);			\
+		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
+				     RISCV_ISA_EXT_ZACAS, 1)		\
+			 : : : : no_zabha_zacas);			\
+									\
+		__asm__ __volatile__ (					\
+			prepend						\
+			"	amocas" cas_sfx " %0, %z2, %1\n"	\
+			append						\
+			: "+&r" (r), "+A" (*(p))			\
+			: "rJ" (n)					\
+			: "memory");					\
+		goto end;						\
+	}								\
+									\
+no_zabha_zacas:;							\
 	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
 	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
 	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
@@ -133,6 +155,8 @@
 		: "memory");						\
 									\
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
+									\
+end:;									\
 })
 
 #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
@@ -180,8 +204,13 @@ end:;									\
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
+		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
+					prepend, append,		\
+					__ret, __ptr, __old, __new);    \
+		break;							\
 	case 2:								\
-		__arch_cmpxchg_masked(sc_sfx, prepend, append,		\
+		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
+					prepend, append,		\
 					__ret, __ptr, __old, __new);	\
 		break;							\
 	case 4:								\
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index e17d0078a651..f71ddd2ca163 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -81,6 +81,7 @@
 #define RISCV_ISA_EXT_ZTSO		72
 #define RISCV_ISA_EXT_ZACAS		73
 #define RISCV_ISA_EXT_XANDESPMU		74
+#define RISCV_ISA_EXT_ZABHA		75
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 5ef48cb20ee1..c125d82c894b 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
 	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
+	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
 	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
 	__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
 	__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
-- 
2.39.2
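
Editorial sketch (not part of the patch): when Zabha or Zacas is missing,
the masked fallback updates a byte or halfword by operating on the
aligned 32-bit word that contains it. The shift and mask arithmetic can
be illustrated in portable C as below. This is only a sketch under stated
assumptions: the helper name is hypothetical, it takes a byte offset
instead of a pointer, it assumes little-endian byte numbering, and it
uses a C11 compare-exchange loop in place of the LR/SC sequence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): compare-and-swap a 1- or 2-byte
 * field located byte_off bytes into an aligned 32-bit word, using only a
 * 32-bit CAS. Returns the field value that was observed.
 */
static uint32_t cmpxchg_masked_sketch(_Atomic uint32_t *word,
				      unsigned int byte_off,
				      unsigned int size,
				      uint32_t old, uint32_t new_val)
{
	assert(size == 1 || size == 2);
	assert(byte_off + size <= 4 && byte_off % size == 0);

	unsigned int shift = byte_off * 8;
	uint32_t field = (size == 1) ? 0xffu : 0xffffu;
	uint32_t mask = field << shift;
	uint32_t cur = atomic_load(word);

	for (;;) {
		uint32_t seen = (cur & mask) >> shift;

		if (seen != (old & field))
			return seen;	/* mismatch: nothing is stored */

		/* Keep the neighbouring bytes, replace only the field. */
		uint32_t next = (cur & ~mask) | ((new_val & field) << shift);

		/* On failure 'cur' is refreshed with the word; retry. */
		if (atomic_compare_exchange_weak(word, &cur, next))
			return seen;
	}
}
```

With amocas.b/amocas.h available, the whole loop collapses into a single
instruction on the narrow location, which is what the new fast path does.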



* [PATCH v3 04/11] riscv: Improve zacas fully-ordered cmpxchg()
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (2 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti, Andrea Parri

The current fully-ordered cmpxchgXX() implementation results in:

  amocas.X.rl     a5,a4,(s1)
  fence           rw,rw

This provides sufficient ordering, but we can use the following better,
single-instruction mapping instead:

  amocas.X.aqrl   a5,a4,(s1)

Suggested-by: Andrea Parri <andrea@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/cmpxchg.h | 71 ++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index c86722a101d0..97b24da38897 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -105,7 +105,10 @@
  * indicated by comparing RETURN with OLD.
  */
 
-#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
+#define __arch_cmpxchg_masked(sc_sfx, cas_sfx,				\
+			      sc_prepend, sc_append,			\
+			      cas_prepend, cas_append,			\
+			      r, p, o, n)				\
 ({									\
 	__label__ no_zabha_zacas, end;					\
 									\
@@ -119,9 +122,9 @@
 			 : : : : no_zabha_zacas);			\
 									\
 		__asm__ __volatile__ (					\
-			prepend						\
+			cas_prepend					\
 			"	amocas" cas_sfx " %0, %z2, %1\n"	\
-			append						\
+			cas_append					\
 			: "+&r" (r), "+A" (*(p))			\
 			: "rJ" (n)					\
 			: "memory");					\
@@ -139,7 +142,7 @@ no_zabha_zacas:;							\
 	ulong __rc;							\
 									\
 	__asm__ __volatile__ (						\
-		prepend							\
+		sc_prepend						\
 		"0:	lr.w %0, %2\n"					\
 		"	and  %1, %0, %z5\n"				\
 		"	bne  %1, %z3, 1f\n"				\
@@ -147,7 +150,7 @@ no_zabha_zacas:;							\
 		"	or   %1, %1, %z4\n"				\
 		"	sc.w" sc_sfx " %1, %1, %2\n"			\
 		"	bnez %1, 0b\n"					\
-		append							\
+		sc_append						\
 		"1:\n"							\
 		: "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
 		: "rJ" ((long)__oldx), "rJ" (__newx),			\
@@ -159,7 +162,10 @@ no_zabha_zacas:;							\
 end:;									\
 })
 
-#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
+#define __arch_cmpxchg(lr_sfx, sc_sfx, cas_sfx,				\
+		       sc_prepend, sc_append,				\
+		       cas_prepend, cas_append,				\
+		       r, p, co, o, n)					\
 ({									\
 	__label__ no_zacas, end;					\
 	register unsigned int __rc;					\
@@ -170,9 +176,9 @@ end:;									\
 			 : : : : no_zacas);				\
 									\
 		__asm__ __volatile__ (					\
-			prepend						\
-			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
-			append						\
+			cas_prepend					\
+			"	amocas" cas_sfx " %0, %z2, %1\n"	\
+			cas_append					\
 			: "+&r" (r), "+A" (*(p))			\
 			: "rJ" (n)					\
 			: "memory");					\
@@ -181,12 +187,12 @@ end:;									\
 									\
 no_zacas:								\
 	__asm__ __volatile__ (						\
-		prepend							\
+		sc_prepend						\
 		"0:	lr" lr_sfx " %0, %2\n"				\
 		"	bne  %0, %z3, 1f\n"				\
-		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
+		"	sc" sc_sfx " %1, %z4, %2\n"			\
 		"	bnez %1, 0b\n"					\
-		append							\
+		sc_append						\
 		"1:\n"							\
 		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
 		: "rJ" (co o), "rJ" (n)					\
@@ -195,7 +201,9 @@ no_zacas:								\
 end:;									\
 })
 
-#define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
+#define _arch_cmpxchg(ptr, old, new, sc_sfx, cas_sfx,			\
+		      sc_prepend, sc_append,				\
+		      cas_prepend, cas_append)				\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(*(__ptr)) __old = (old);				\
@@ -204,22 +212,28 @@ end:;									\
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
-		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
-					prepend, append,		\
-					__ret, __ptr, __old, __new);    \
+		__arch_cmpxchg_masked(sc_sfx, ".b" cas_sfx,		\
+				      sc_prepend, sc_append,		\
+				      cas_prepend, cas_append,		\
+				      __ret, __ptr, __old, __new);	\
 		break;							\
 	case 2:								\
-		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
-					prepend, append,		\
-					__ret, __ptr, __old, __new);	\
+		__arch_cmpxchg_masked(sc_sfx, ".h" cas_sfx,		\
+				      sc_prepend, sc_append,		\
+				      cas_prepend, cas_append,		\
+				      __ret, __ptr, __old, __new);	\
 		break;							\
 	case 4:								\
-		__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,	\
-				__ret, __ptr, (long), __old, __new);	\
+		__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx,		\
+			       sc_prepend, sc_append,			\
+			       cas_prepend, cas_append,			\
+			       __ret, __ptr, (long), __old, __new);	\
 		break;							\
 	case 8:								\
-		__arch_cmpxchg(".d", ".d" sc_sfx, prepend, append,	\
-				__ret, __ptr, /**/, __old, __new);	\
+		__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx,		\
+			       sc_prepend, sc_append,			\
+			       cas_prepend, cas_append,			\
+			       __ret, __ptr, /**/, __old, __new);	\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
@@ -228,16 +242,19 @@ end:;									\
 })
 
 #define arch_cmpxchg_relaxed(ptr, o, n)					\
-	_arch_cmpxchg((ptr), (o), (n), "", "", "")
+	_arch_cmpxchg((ptr), (o), (n), "", "", "", "", "", "")
 
 #define arch_cmpxchg_acquire(ptr, o, n)					\
-	_arch_cmpxchg((ptr), (o), (n), "", "", RISCV_ACQUIRE_BARRIER)
+	_arch_cmpxchg((ptr), (o), (n), "", "",				\
+		      "", RISCV_ACQUIRE_BARRIER, "", RISCV_ACQUIRE_BARRIER)
 
 #define arch_cmpxchg_release(ptr, o, n)					\
-	_arch_cmpxchg((ptr), (o), (n), "", RISCV_RELEASE_BARRIER, "")
+	_arch_cmpxchg((ptr), (o), (n), "", "",				\
+		      RISCV_RELEASE_BARRIER, "", RISCV_RELEASE_BARRIER, "")
 
 #define arch_cmpxchg(ptr, o, n)						\
-	_arch_cmpxchg((ptr), (o), (n), ".rl", "", "	fence rw, rw\n")
+	_arch_cmpxchg((ptr), (o), (n), ".rl", ".aqrl",			\
+		      "", RISCV_FULL_BARRIER, "", "")
 
 #define arch_cmpxchg_local(ptr, o, n)					\
 	arch_cmpxchg_relaxed((ptr), (o), (n))
-- 
2.39.2
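
Editorial sketch (not part of the patch): the macros above build each
mnemonic by pasting adjacent string literals, which is why splitting
sc_sfx from cas_sfx is enough to give the LR/SC and amocas paths
different orderings. The macro and variable names below are hypothetical
and exist only to show the resulting strings for the 32-bit case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macros mirroring how the patch pastes string literals. */
#define CAS_INSN_SKETCH(cas_sfx)	"amocas" cas_sfx " %0, %z2, %1\n"
#define SC_INSN_SKETCH(sc_sfx)		"sc" sc_sfx " %1, %z4, %2\n"

/* Fully ordered arch_cmpxchg(): sc_sfx is ".rl", cas_sfx is ".aqrl". */
static const char cas_full_w[] = CAS_INSN_SKETCH(".w" ".aqrl");
static const char sc_full_w[] = SC_INSN_SKETCH(".w" ".rl");

/* arch_cmpxchg_relaxed(): both suffixes are empty strings. */
static const char cas_relaxed_w[] = CAS_INSN_SKETCH(".w" "");
```

In the fully ordered case only the LR/SC path still appends
RISCV_FULL_BARRIER; the amocas path gets its ordering from the .aqrl
suffix alone.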



* [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (3 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 04/11] riscv: Improve zacas fully-ordered cmpxchg() Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17 20:34   ` Andrew Jones
  2024-07-17  6:19 ` [PATCH v3 06/11] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

Now that Zacas is supported in the kernel, let's use the double word
atomic version of amocas to improve the SLUB allocator.

Note that we have to select fixed registers, otherwise gcc fails to pick
even registers and then produces a reserved encoding which fails to
assemble.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/Kconfig               |  1 +
 arch/riscv/include/asm/cmpxchg.h | 39 ++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d3b0f92f92da..0bbaec0444d0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -104,6 +104,7 @@ config RISCV
 	select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO
 	select HARDIRQS_SW_RESEND
 	select HAS_IOPORT if MMU
+	select HAVE_ALIGNED_STRUCT_PAGE
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 97b24da38897..608d98522557 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -289,4 +289,43 @@ end:;									\
 	arch_cmpxchg_release((ptr), (o), (n));				\
 })
 
+#ifdef CONFIG_RISCV_ISA_ZACAS
+
+#define system_has_cmpxchg128()						\
+			riscv_has_extension_unlikely(RISCV_ISA_EXT_ZACAS)
+
+union __u128_halves {
+	u128 full;
+	struct {
+		u64 low, high;
+	};
+};
+
+#define __arch_cmpxchg128(p, o, n, cas_sfx)					\
+({										\
+	__typeof__(*(p)) __o = (o);						\
+	union __u128_halves __hn = { .full = (n) };				\
+	union __u128_halves __ho = { .full = (__o) };				\
+	register unsigned long x6 asm ("x6") = __hn.low;			\
+	register unsigned long x7 asm ("x7") = __hn.high;			\
+	register unsigned long x28 asm ("x28") = __ho.low;			\
+	register unsigned long x29 asm ("x29") = __ho.high;			\
+										\
+	__asm__ __volatile__ (							\
+		"	amocas.q" cas_sfx " %0, %z3, %2"			\
+		: "+&r" (x28), "+&r" (x29), "+A" (*(p))				\
+		: "rJ" (x6), "rJ" (x7)						\
+		: "memory");							\
+										\
+	((u128)x29 << 64) | x28;						\
+})
+
+#define arch_cmpxchg128(ptr, o, n)						\
+	__arch_cmpxchg128((ptr), (o), (n), ".aqrl")
+
+#define arch_cmpxchg128_local(ptr, o, n)					\
+	__arch_cmpxchg128((ptr), (o), (n), "")
+
+#endif /* CONFIG_RISCV_ISA_ZACAS */
+
 #endif /* _ASM_RISCV_CMPXCHG_H */
-- 
2.39.2
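
Editorial sketch (not part of the patch): amocas.q works on register
pairs, so the patch splits each 128-bit value into two 64-bit halves
(x6/x7 for the new value, x28/x29 for the expected and returned value)
and rebuilds the result with a shift and an OR. The split and rejoin can
be checked in plain C. The type and function names are hypothetical, and
the sketch assumes a 64-bit little-endian target with the GCC/Clang
unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension, available on 64-bit targets. */
typedef unsigned __int128 u128_sketch;

/* Same shape as the patch's union __u128_halves. */
union u128_halves_sketch {
	u128_sketch full;
	struct {
		uint64_t low, high;	/* order assumes little-endian */
	};
};

/* Mirrors the macro's final expression: ((u128)x29 << 64) | x28. */
static u128_sketch join_halves_sketch(uint64_t low, uint64_t high)
{
	return ((u128_sketch)high << 64) | low;
}
```

The even/odd pairing of the registers is an instruction encoding
requirement and has no equivalent in this portable sketch.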



* [PATCH v3 06/11] riscv: Implement xchg8/16() using Zabha
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (4 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 07/11] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock Alexandre Ghiti
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

This adds runtime support for Zabha in xchg8/16() operations.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/cmpxchg.h | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 608d98522557..091e6612ddb3 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,8 +11,27 @@
 #include <asm/fence.h>
 #include <asm/alternative.h>
 
-#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
+#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
+			   swap_append, r, p, n)			\
 ({									\
+	__label__ no_zabha, end;					\
+									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
+		asm goto(ALTERNATIVE("j %[no_zabha]", "nop", 0,		\
+				     RISCV_ISA_EXT_ZABHA, 1)		\
+			 : : : : no_zabha);				\
+									\
+		__asm__ __volatile__ (					\
+			prepend						\
+			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
+			swap_append					\
+			: "=&r" (r), "+A" (*(p))			\
+			: "rJ" (n)					\
+			: "memory");					\
+		goto end;						\
+	}								\
+									\
+no_zabha:;								\
 	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
 	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
 	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
@@ -28,12 +47,14 @@
 	       "	or   %1, %1, %z3\n"				\
 	       "	sc.w" sc_sfx " %1, %1, %2\n"			\
 	       "	bnez %1, 0b\n"					\
-	       append							\
+	       sc_append						\
 	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
 	       : "rJ" (__newx), "rJ" (~__mask)				\
 	       : "memory");						\
 									\
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
+									\
+end:;									\
 })
 
 #define __arch_xchg(sfx, prepend, append, r, p, n)			\
@@ -56,8 +77,13 @@
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
+		__arch_xchg_masked(sc_sfx, ".b" swap_sfx,		\
+				   prepend, sc_append, swap_append,	\
+				   __ret, __ptr, __new);		\
+		break;							\
 	case 2:								\
-		__arch_xchg_masked(sc_sfx, prepend, sc_append,		\
+		__arch_xchg_masked(sc_sfx, ".h" swap_sfx,		\
+				   prepend, sc_append, swap_append,	\
 				   __ret, __ptr, __new);		\
 		break;							\
 	case 4:								\
-- 
2.39.2



* [PATCH v3 07/11] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (5 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 06/11] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 08/11] asm-generic: ticket-lock: Add separate ticket-lock.h Alexandre Ghiti
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

The arch_spinlock_t of qspinlock already contains an atomic_t val, which
satisfies the ticket-lock requirement. Thus, unify arch_spinlock_t into
qspinlock_types.h. This prepares for the combo spinlock added later in
the series.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
---
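For readers following along, here is a minimal userspace sketch of why a
struct that wraps a single 32-bit atomic is enough for a ticket lock: the
next ticket sits in the upper 16 bits and the current owner in the lower
16 bits. All names (struct sketch_spinlock, sketch_ticket_*) are
hypothetical, and C11 atomics stand in for the kernel's atomic_t helpers,
so this is an illustration under those assumptions and not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an arch_spinlock_t that wraps one atomic word. */
struct sketch_spinlock {
	_Atomic uint32_t val;	/* next ticket: bits 31..16, owner: bits 15..0 */
};

static bool sketch_ticket_is_locked(struct sketch_spinlock *lock)
{
	uint32_t val = atomic_load(&lock->val);

	/* Unlocked means "next ticket to hand out" equals "current owner". */
	return (val >> 16) != (val & 0xffff);
}

static void sketch_ticket_lock(struct sketch_spinlock *lock)
{
	/* Take a ticket by bumping the upper half; the old value is ours. */
	uint32_t val = atomic_fetch_add(&lock->val, 1u << 16);
	uint16_t ticket = val >> 16;

	/* Spin until the owner half reaches our ticket. */
	while (ticket != (uint16_t)atomic_load(&lock->val))
		;
}

static bool sketch_ticket_trylock(struct sketch_spinlock *lock)
{
	uint32_t old = atomic_load(&lock->val);

	if ((old >> 16) != (old & 0xffff))
		return false;

	return atomic_compare_exchange_strong(&lock->val, &old,
					      old + (1u << 16));
}

static void sketch_ticket_unlock(struct sketch_spinlock *lock)
{
	uint32_t old = atomic_load(&lock->val);
	uint32_t new;

	assert(sketch_ticket_is_locked(lock));

	/* Bump only the owner half, without carrying into the ticket half. */
	do {
		new = (old & 0xffff0000u) | (uint16_t)(old + 1);
	} while (!atomic_compare_exchange_weak(&lock->val, &old, new));
}
```

Note that the kernel version in this patch unlocks with a 16-bit
smp_store_release() on the owner half; the sketch uses a compare-and-swap
loop on the whole word only to stay within portable C11.
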
 include/asm-generic/spinlock.h       | 14 +++++++-------
 include/asm-generic/spinlock_types.h | 12 ++----------
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 90803a826ba0..4773334ee638 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -32,7 +32,7 @@
 
 static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-	u32 val = atomic_fetch_add(1<<16, lock);
+	u32 val = atomic_fetch_add(1<<16, &lock->val);
 	u16 ticket = val >> 16;
 
 	if (ticket == (u16)val)
@@ -46,31 +46,31 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 	 * have no outstanding writes due to the atomic_fetch_add() the extra
 	 * orderings are free.
 	 */
-	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
+	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
 	smp_mb();
 }
 
 static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
 {
-	u32 old = atomic_read(lock);
+	u32 old = atomic_read(&lock->val);
 
 	if ((old >> 16) != (old & 0xffff))
 		return false;
 
-	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
+	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
 }
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
-	u32 val = atomic_read(lock);
+	u32 val = atomic_read(&lock->val);
 
 	smp_store_release(ptr, (u16)val + 1);
 }
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
-	u32 val = lock.counter;
+	u32 val = lock.val.counter;
 
 	return ((val >> 16) == (val & 0xffff));
 }
@@ -84,7 +84,7 @@ static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
 
 static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
-	u32 val = atomic_read(lock);
+	u32 val = atomic_read(&lock->val);
 
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }
diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
index 8962bb730945..f534aa5de394 100644
--- a/include/asm-generic/spinlock_types.h
+++ b/include/asm-generic/spinlock_types.h
@@ -3,15 +3,7 @@
 #ifndef __ASM_GENERIC_SPINLOCK_TYPES_H
 #define __ASM_GENERIC_SPINLOCK_TYPES_H
 
-#include <linux/types.h>
-typedef atomic_t arch_spinlock_t;
-
-/*
- * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
- * include.
- */
-#include <asm/qrwlock_types.h>
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	ATOMIC_INIT(0)
+#include <asm-generic/qspinlock_types.h>
+#include <asm-generic/qrwlock_types.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_TYPES_H */
-- 
2.39.2



* [PATCH v3 08/11] asm-generic: ticket-lock: Add separate ticket-lock.h
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (6 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 07/11] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  6:19 ` [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse Alexandre Ghiti
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Add a separate ticket_spinlock.h so that multiple spinlock implementations
can be included together and one of them selected at compile time or at
runtime.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
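The pattern used here (implementation functions under their own prefix,
plus a thin macro layer that maps the generic names onto them) can be
sketched in isolation as follows. The demo_* names and the trivial
single-threaded lock body are assumptions made purely for illustration;
only the macro remapping mirrors what the new header does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock type; a real lock would use atomics and spin. */
struct demo_lock {
	bool held;
};

/* Implementation lives under its own prefix, like ticket_spin_*(). */
static inline void demo_ticket_lock(struct demo_lock *l)
{
	assert(!l->held);	/* single-threaded sketch: never contended */
	l->held = true;
}

static inline void demo_ticket_unlock(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

static inline bool demo_ticket_is_locked(struct demo_lock *l)
{
	return l->held;
}

/*
 * Generic names are remapped onto the prefixed implementation, in the
 * same spirit as the arch_spin_*() defines at the end of the new header.
 */
#define demo_arch_lock(l)	demo_ticket_lock(l)
#define demo_arch_unlock(l)	demo_ticket_unlock(l)
#define demo_arch_is_locked(l)	demo_ticket_is_locked(l)
```

Because callers only ever use the generic names, a second implementation
can later sit next to this one and be chosen by a different set of
defines or by a runtime wrapper.
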
 include/asm-generic/spinlock.h        |  87 +---------------------
 include/asm-generic/ticket_spinlock.h | 103 ++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 86 deletions(-)
 create mode 100644 include/asm-generic/ticket_spinlock.h

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 4773334ee638..970590baf61b 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -1,94 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
-/*
- * 'Generic' ticket-lock implementation.
- *
- * It relies on atomic_fetch_add() having well defined forward progress
- * guarantees under contention. If your architecture cannot provide this, stick
- * to a test-and-set lock.
- *
- * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
- * sub-word of the value. This is generally true for anything LL/SC although
- * you'd be hard pressed to find anything useful in architecture specifications
- * about this. If your architecture cannot do this you might be better off with
- * a test-and-set.
- *
- * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
- * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
- * a full fence after the spin to upgrade the otherwise-RCpc
- * atomic_cond_read_acquire().
- *
- * The implementation uses smp_cond_load_acquire() to spin, so if the
- * architecture has WFE like instructions to sleep instead of poll for word
- * modifications be sure to implement that (see ARM64 for example).
- *
- */
-
 #ifndef __ASM_GENERIC_SPINLOCK_H
 #define __ASM_GENERIC_SPINLOCK_H
 
-#include <linux/atomic.h>
-#include <asm-generic/spinlock_types.h>
-
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	u32 val = atomic_fetch_add(1<<16, &lock->val);
-	u16 ticket = val >> 16;
-
-	if (ticket == (u16)val)
-		return;
-
-	/*
-	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
-	 * custom cond_read_rcsc() here we just emit a full fence.  We only
-	 * need the prior reads before subsequent writes ordering from
-	 * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
-	 * have no outstanding writes due to the atomic_fetch_add() the extra
-	 * orderings are free.
-	 */
-	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
-	smp_mb();
-}
-
-static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
-{
-	u32 old = atomic_read(&lock->val);
-
-	if ((old >> 16) != (old & 0xffff))
-		return false;
-
-	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
-	u32 val = atomic_read(&lock->val);
-
-	smp_store_release(ptr, (u16)val + 1);
-}
-
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	u32 val = lock.val.counter;
-
-	return ((val >> 16) == (val & 0xffff));
-}
-
-static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	arch_spinlock_t val = READ_ONCE(*lock);
-
-	return !arch_spin_value_unlocked(val);
-}
-
-static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	u32 val = atomic_read(&lock->val);
-
-	return (s16)((val >> 16) - (val & 0xffff)) > 1;
-}
-
+#include <asm-generic/ticket_spinlock.h>
 #include <asm/qrwlock.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
new file mode 100644
index 000000000000..cfcff22b37b3
--- /dev/null
+++ b/include/asm-generic/ticket_spinlock.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * 'Generic' ticket-lock implementation.
+ *
+ * It relies on atomic_fetch_add() having well defined forward progress
+ * guarantees under contention. If your architecture cannot provide this, stick
+ * to a test-and-set lock.
+ *
+ * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
+ * sub-word of the value. This is generally true for anything LL/SC although
+ * you'd be hard pressed to find anything useful in architecture specifications
+ * about this. If your architecture cannot do this you might be better off with
+ * a test-and-set.
+ *
+ * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
+ * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
+ * a full fence after the spin to upgrade the otherwise-RCpc
+ * atomic_cond_read_acquire().
+ *
+ * The implementation uses smp_cond_load_acquire() to spin, so if the
+ * architecture has WFE like instructions to sleep instead of poll for word
+ * modifications be sure to implement that (see ARM64 for example).
+ *
+ */
+
+#ifndef __ASM_GENERIC_TICKET_SPINLOCK_H
+#define __ASM_GENERIC_TICKET_SPINLOCK_H
+
+#include <linux/atomic.h>
+#include <asm-generic/spinlock_types.h>
+
+static __always_inline void ticket_spin_lock(arch_spinlock_t *lock)
+{
+	u32 val = atomic_fetch_add(1<<16, &lock->val);
+	u16 ticket = val >> 16;
+
+	if (ticket == (u16)val)
+		return;
+
+	/*
+	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
+	 * custom cond_read_rcsc() here we just emit a full fence.  We only
+	 * need the prior reads before subsequent writes ordering from
+	 * smp_mb(), but as atomic_cond_read_acquire() just emits reads and we
+	 * have no outstanding writes due to the atomic_fetch_add() the extra
+	 * orderings are free.
+	 */
+	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
+	smp_mb();
+}
+
+static __always_inline bool ticket_spin_trylock(arch_spinlock_t *lock)
+{
+	u32 old = atomic_read(&lock->val);
+
+	if ((old >> 16) != (old & 0xffff))
+		return false;
+
+	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
+}
+
+static __always_inline void ticket_spin_unlock(arch_spinlock_t *lock)
+{
+	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+	u32 val = atomic_read(&lock->val);
+
+	smp_store_release(ptr, (u16)val + 1);
+}
+
+static __always_inline int ticket_spin_value_unlocked(arch_spinlock_t lock)
+{
+	u32 val = lock.val.counter;
+
+	return ((val >> 16) == (val & 0xffff));
+}
+
+static __always_inline int ticket_spin_is_locked(arch_spinlock_t *lock)
+{
+	arch_spinlock_t val = READ_ONCE(*lock);
+
+	return !ticket_spin_value_unlocked(val);
+}
+
+static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
+{
+	u32 val = atomic_read(&lock->val);
+
+	return (s16)((val >> 16) - (val & 0xffff)) > 1;
+}
+
+/*
+ * Remapping spinlock architecture specific functions to the corresponding
+ * ticket spinlock functions.
+ */
+#define arch_spin_is_locked(l)		ticket_spin_is_locked(l)
+#define arch_spin_is_contended(l)	ticket_spin_is_contended(l)
+#define arch_spin_value_unlocked(l)	ticket_spin_value_unlocked(l)
+#define arch_spin_lock(l)		ticket_spin_lock(l)
+#define arch_spin_trylock(l)		ticket_spin_trylock(l)
+#define arch_spin_unlock(l)		ticket_spin_unlock(l)
+
+#endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
-- 
2.39.2



* [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (7 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 08/11] asm-generic: ticket-lock: Add separate ticket-lock.h Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-19  0:53   ` Samuel Holland
  2024-07-17  6:19 ` [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description Alexandre Ghiti
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

Add support for parsing the Ziccrse extension in the riscv,isa string.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/hwcap.h | 1 +
 arch/riscv/kernel/cpufeature.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index f71ddd2ca163..863b9b7d4a4f 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -82,6 +82,7 @@
 #define RISCV_ISA_EXT_ZACAS		73
 #define RISCV_ISA_EXT_XANDESPMU		74
 #define RISCV_ISA_EXT_ZABHA		75
+#define RISCV_ISA_EXT_ZICCRSE		76
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index c125d82c894b..93d8cc7e232c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -306,6 +306,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
 	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
 	__RISCV_ISA_EXT_DATA(xandespmu, RISCV_ISA_EXT_XANDESPMU),
+	__RISCV_ISA_EXT_DATA(ziccrse, RISCV_ISA_EXT_ZICCRSE),
 };
 
 const size_t riscv_isa_ext_count = ARRAY_SIZE(riscv_isa_ext);
-- 
2.39.2



* [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (8 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  6:55   ` Krzysztof Kozlowski
  2024-07-17  9:42   ` Guo Ren
  2024-07-17  6:19 ` [PATCH v3 11/11] riscv: Add qspinlock support Alexandre Ghiti
  2024-07-17 16:37 ` [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Andrea Parri
  11 siblings, 2 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

Add description for the Ziccrse ISA extension which was introduced in
the riscv profiles specification v0.9.2.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index e6436260bdeb..b08bf1a8d8f8 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -245,6 +245,12 @@ properties:
             in commit 64074bc ("Update version numbers for Zfh/Zfinx") of
             riscv-isa-manual.
 
+        - const: ziccrse
+          description:
+            The standard Ziccrse extension which provides a forward progress
+            guarantee on LR/SC sequences, as introduced in the riscv profiles
+            specification v0.9.2.
+
         - const: zk
           description:
             The standard Zk Standard Scalar cryptography extension as ratified
-- 
2.39.2



* [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (9 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description Alexandre Ghiti
@ 2024-07-17  6:19 ` Alexandre Ghiti
  2024-07-17  9:30   ` Guo Ren
                     ` (2 more replies)
  2024-07-17 16:37 ` [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Andrea Parri
  11 siblings, 3 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch
  Cc: Alexandre Ghiti

In order to produce a generic kernel, a user can select
CONFIG_RISCV_COMBO_SPINLOCKS, which will fall back at runtime to the
ticket spinlock implementation if neither Zabha nor Ziccrse is present.

Note that we can't use alternatives here because the discovery of
extensions is done too late. We also need to start with the qspinlock
implementation because the ticket spinlock implementation would pollute
the spinlock value, so let's use static keys.

This is largely based on Guo's work and Leonardo's reviews at [1].

Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
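To illustrate the two points made in the commit message, here is a small
single-threaded model. A plain bool stands in for the static key, the
lock word is a bare uint32_t, and the backends only model the bits that
matter (the qspinlock locked byte and the two ticket halves). Every name
below is hypothetical and the Kconfig gating is ignored; the sketch only
shows the dispatch shape and why the key has to start in the queued
state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-threaded model of the shared 32-bit lock word. */
struct combo_lock {
	uint32_t val;
};

/* Queued-style backend: only the "locked" byte (bits 7..0) is modelled. */
static void queued_lock_sketch(struct combo_lock *l)
{
	l->val |= 1u;
}

static void queued_unlock_sketch(struct combo_lock *l)
{
	l->val &= ~0xffu;
}

/* Ticket-style backend: next ticket in bits 31..16, owner in bits 15..0. */
static void ticket_lock_sketch(struct combo_lock *l)
{
	l->val += 1u << 16;
}

static void ticket_unlock_sketch(struct combo_lock *l)
{
	l->val = (l->val & 0xffff0000u) | (uint16_t)(l->val + 1);
}

/* Plain flag standing in for the static key; it starts out true. */
static bool use_queued = true;

/* Generate one wrapper per operation that dispatches on the flag. */
#define COMBO_DECLARE(op)						\
static inline void combo_##op(struct combo_lock *l)			\
{									\
	if (use_queued)							\
		queued_##op##_sketch(l);				\
	else								\
		ticket_##op##_sketch(l);				\
}

COMBO_DECLARE(lock)
COMBO_DECLARE(unlock)

/* Boot-time choice: stay queued if (Zacas and Zabha) or Ziccrse. */
static void combo_init(bool has_zacas, bool has_zabha, bool has_ziccrse)
{
	if (!((has_zacas && has_zabha) || has_ziccrse))
		use_queued = false;
}
```

In this model, a lock/unlock pair taken through the queued backend before
combo_init() leaves the word at 0, so switching to the ticket backend
afterwards is safe. The reverse is not true: one ticket lock/unlock pair
leaves 0x10001 in the word, which the queued backend would read as a held
lock.
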
 .../locking/queued-spinlocks/arch-support.txt |  2 +-
 arch/riscv/Kconfig                            | 29 ++++++++++++++
 arch/riscv/include/asm/Kbuild                 |  4 +-
 arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
 arch/riscv/kernel/setup.c                     | 33 ++++++++++++++++
 include/asm-generic/qspinlock.h               |  2 +
 include/asm-generic/ticket_spinlock.h         |  2 +
 7 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/include/asm/spinlock.h

diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
index 22f2990392ff..cf26042480e2 100644
--- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
@@ -20,7 +20,7 @@
     |    openrisc: |  ok  |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: | TODO |
     |          sh: | TODO |
     |       sparc: |  ok  |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 0bbaec0444d0..5040c7eac70d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -72,6 +72,7 @@ config RISCV
 	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
 	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
 	select BUILDTIME_TABLE_SORT if MMU
 	select CLINT_TIMER if RISCV_M_MODE
@@ -482,6 +483,34 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accommodate various tables.
 
+choice
+	prompt "RISC-V spinlock type"
+	default RISCV_COMBO_SPINLOCKS
+
+config RISCV_TICKET_SPINLOCKS
+	bool "Using ticket spinlock"
+
+config RISCV_QUEUED_SPINLOCKS
+	bool "Using queued spinlock"
+	depends on SMP && MMU
+	select ARCH_USE_QUEUED_SPINLOCKS
+	help
+	  The queued spinlock implementation requires the forward progress
+	  guarantee of cmpxchg()/xchg() atomic operations: CAS with Zabha or
+	  LR/SC with Ziccrse provide such a guarantee.
+
+	  Select this if and only if Zabha or Ziccrse is available on your
+	  platform.
+
+config RISCV_COMBO_SPINLOCKS
+	bool "Using combo spinlock"
+	depends on SMP && MMU
+	select ARCH_USE_QUEUED_SPINLOCKS
+	help
+	  Embed both queued spinlock and ticket lock so that the spinlock
+	  implementation can be chosen at runtime.
+endchoice
+
 config RISCV_ALTERNATIVE
 	bool
 	depends on !XIP_KERNEL
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 504f8b7e72d4..ad72f2bd4cc9 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -2,10 +2,12 @@
 generic-y += early_ioremap.h
 generic-y += flat.h
 generic-y += kvm_para.h
+generic-y += mcs_spinlock.h
 generic-y += parport.h
-generic-y += spinlock.h
 generic-y += spinlock_types.h
+generic-y += ticket_spinlock.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
+generic-y += qspinlock.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
new file mode 100644
index 000000000000..4856d50006f2
--- /dev/null
+++ b/arch/riscv/include/asm/spinlock.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RISCV_SPINLOCK_H
+#define __ASM_RISCV_SPINLOCK_H
+
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+#define _Q_PENDING_LOOPS	(1 << 9)
+
+#define __no_arch_spinlock_redefine
+#include <asm/ticket_spinlock.h>
+#include <asm/qspinlock.h>
+#include <asm/alternative.h>
+
+DECLARE_STATIC_KEY_TRUE(qspinlock_key);
+
+#define SPINLOCK_BASE_DECLARE(op, type, type_lock)			\
+static __always_inline type arch_spin_##op(type_lock lock)		\
+{									\
+	if (static_branch_unlikely(&qspinlock_key))			\
+		return queued_spin_##op(lock);				\
+	return ticket_spin_##op(lock);					\
+}
+
+SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
+
+#else
+
+#include <asm/ticket_spinlock.h>
+
+#endif
+
+#include <asm/qrwlock.h>
+
+#endif /* __ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 4f73c0ae44b2..d7c31c9b8ead 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -244,6 +244,38 @@ static void __init parse_dtb(void)
 #endif
 }
 
+DEFINE_STATIC_KEY_TRUE(qspinlock_key);
+EXPORT_SYMBOL(qspinlock_key);
+
+static void __init riscv_spinlock_init(void)
+{
+	char *using_ext;
+
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
+	    IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
+		using_ext = "using Zabha";
+
+		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
+			 : : : : no_zacas);
+		asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
+			 : : : : qspinlock);
+	}
+
+no_zacas:
+	using_ext = "using Ziccrse";
+	asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
+			     RISCV_ISA_EXT_ZICCRSE, 1)
+		 : : : : qspinlock);
+
+	static_branch_disable(&qspinlock_key);
+	pr_info("Ticket spinlock: enabled\n");
+
+	return;
+
+qspinlock:
+	pr_info("Queued spinlock %s: enabled\n", using_ext);
+}
+
 extern void __init init_rt_signal_env(void);
 
 void __init setup_arch(char **cmdline_p)
@@ -295,6 +327,7 @@ void __init setup_arch(char **cmdline_p)
 	riscv_set_dma_cache_alignment();
 
 	riscv_user_isa_enable();
+	riscv_spinlock_init();
 }
 
 bool arch_cpu_is_hotpluggable(int cpu)
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 0655aa5b57b2..bf47cca2c375 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 }
 #endif
 
+#ifndef __no_arch_spinlock_redefine
 /*
  * Remapping spinlock architecture specific functions to the corresponding
  * queued spinlock functions.
@@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 #define arch_spin_lock(l)		queued_spin_lock(l)
 #define arch_spin_trylock(l)		queued_spin_trylock(l)
 #define arch_spin_unlock(l)		queued_spin_unlock(l)
+#endif
 
 #endif /* __ASM_GENERIC_QSPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
index cfcff22b37b3..325779970d8a 100644
--- a/include/asm-generic/ticket_spinlock.h
+++ b/include/asm-generic/ticket_spinlock.h
@@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }
 
+#ifndef __no_arch_spinlock_redefine
 /*
  * Remapping spinlock architecture specific functions to the corresponding
  * ticket spinlock functions.
@@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
 #define arch_spin_lock(l)		ticket_spin_lock(l)
 #define arch_spin_trylock(l)		ticket_spin_trylock(l)
 #define arch_spin_unlock(l)		ticket_spin_unlock(l)
+#endif
 
 #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
-- 
2.39.2



* Re: [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description
  2024-07-17  6:19 ` [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
@ 2024-07-17  6:42   ` Krzysztof Kozlowski
  2024-07-17  9:32   ` Guo Ren
  1 sibling, 0 replies; 40+ messages in thread
From: Krzysztof Kozlowski @ 2024-07-17  6:42 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Conor Dooley, Rob Herring, Krzysztof Kozlowski,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

On 17/07/2024 08:19, Alexandre Ghiti wrote:
> Add description for the Zabha ISA extension which was ratified in April
> 2024.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

<form letter>
Please use scripts/get_maintainer.pl to get a list of necessary people
and lists to CC. It might happen that the command, when run on an older
kernel, gives you outdated entries. Therefore, please be sure you base
your patches on a recent Linux kernel.

Tools like b4 or scripts/get_maintainer.pl provide you with a proper list
of people, so fix your workflow. Tools might also fail if you work on some
ancient tree (don't, instead use mainline) or work on a fork of the kernel
(don't, instead use mainline). Just use b4 and everything should be
fine, although remember about `b4 prep --auto-to-cc` if you added new
patches to the patchset.

You missed at least the devicetree list (maybe more), so this won't be
tested by automated tooling. Performing review on untested code might be
a waste of time.

Please kindly resend and include all necessary To/Cc entries.
</form letter>

Best regards,
Krzysztof



* Re: [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description
  2024-07-17  6:19 ` [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description Alexandre Ghiti
@ 2024-07-17  6:55   ` Krzysztof Kozlowski
  2024-07-17  9:42   ` Guo Ren
  1 sibling, 0 replies; 40+ messages in thread
From: Krzysztof Kozlowski @ 2024-07-17  6:55 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Conor Dooley, Rob Herring, Krzysztof Kozlowski,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

On 17/07/2024 08:19, Alexandre Ghiti wrote:
> Add description for the Ziccrse ISA extension which was introduced in
> the riscv profiles specification v0.9.2.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

<form letter>
Please use scripts/get_maintainer.pl to get a list of necessary people
and lists to CC. It might happen that the command, when run on an older
kernel, gives you outdated entries. Therefore, please be sure you base
your patches on a recent Linux kernel.

Tools like b4 or scripts/get_maintainer.pl provide you with a proper list
of people, so fix your workflow. Tools might also fail if you work on some
ancient tree (don't, instead use mainline) or work on a fork of the kernel
(don't, instead use mainline). Just use b4 and everything should be
fine, although remember about `b4 prep --auto-to-cc` if you added new
patches to the patchset.

You missed at least the devicetree list (maybe more), so this won't be
tested by automated tooling. Performing review on untested code might be
a waste of time.

Please kindly resend and include all necessary To/Cc entries.
</form letter>

Best regards,
Krzysztof



* Re: [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-17  6:19 ` [PATCH v3 11/11] riscv: Add qspinlock support Alexandre Ghiti
@ 2024-07-17  9:30   ` Guo Ren
  2024-07-18 13:11     ` Alexandre Ghiti
  2024-07-17 16:29   ` Andrea Parri
  2024-07-19  1:05   ` Samuel Holland
  2 siblings, 1 reply; 40+ messages in thread
From: Guo Ren @ 2024-07-17  9:30 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, linux-doc,
	linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 2:31 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> In order to produce a generic kernel, a user can select
> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> spinlock implementation if Zabha or Ziccrse are not present.
>
> Note that we can't use alternatives here because the discovery of
> extensions is done too late and we need to start with the qspinlock
> implementation because the ticket spinlock implementation would pollute
> the spinlock value, so let's use static keys.
>
> This is largely based on Guo's work and Leonardo reviews at [1].
>
> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  .../locking/queued-spinlocks/arch-support.txt |  2 +-
>  arch/riscv/Kconfig                            | 29 ++++++++++++++
>  arch/riscv/include/asm/Kbuild                 |  4 +-
>  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
>  arch/riscv/kernel/setup.c                     | 33 ++++++++++++++++
>  include/asm-generic/qspinlock.h               |  2 +
>  include/asm-generic/ticket_spinlock.h         |  2 +
>  7 files changed, 109 insertions(+), 2 deletions(-)
>  create mode 100644 arch/riscv/include/asm/spinlock.h
>
> diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> index 22f2990392ff..cf26042480e2 100644
> --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> @@ -20,7 +20,7 @@
>      |    openrisc: |  ok  |
>      |      parisc: | TODO |
>      |     powerpc: |  ok  |
> -    |       riscv: | TODO |
> +    |       riscv: |  ok  |
>      |        s390: | TODO |
>      |          sh: | TODO |
>      |       sparc: |  ok  |
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 0bbaec0444d0..5040c7eac70d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -72,6 +72,7 @@ config RISCV
>         select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
>         select ARCH_WANTS_NO_INSTR
>         select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +       select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
>         select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>         select BUILDTIME_TABLE_SORT if MMU
>         select CLINT_TIMER if RISCV_M_MODE
> @@ -482,6 +483,34 @@ config NODES_SHIFT
>           Specify the maximum number of NUMA Nodes available on the target
>           system.  Increases memory reserved to accommodate various tables.
>
> +choice
> +       prompt "RISC-V spinlock type"
> +       default RISCV_COMBO_SPINLOCKS
> +
> +config RISCV_TICKET_SPINLOCKS
> +       bool "Using ticket spinlock"
> +
> +config RISCV_QUEUED_SPINLOCKS
> +       bool "Using queued spinlock"
> +       depends on SMP && MMU
> +       select ARCH_USE_QUEUED_SPINLOCKS
> +       help
> +         The queued spinlock implementation requires the forward progress
> +         guarantee of cmpxchg()/xchg() atomic operations: CAS with Zabha or
> +         LR/SC with Ziccrse provide such guarantee.
> +
> +         Select this if and only if Zabha or Ziccrse is available on your
> +         platform.
> +
> +config RISCV_COMBO_SPINLOCKS
> +       bool "Using combo spinlock"
> +       depends on SMP && MMU
> +       select ARCH_USE_QUEUED_SPINLOCKS
> +       help
> +         Embed both queued spinlock and ticket lock so that the spinlock
> +         implementation can be chosen at runtime.
> +endchoice
> +
>  config RISCV_ALTERNATIVE
>         bool
>         depends on !XIP_KERNEL
> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 504f8b7e72d4..ad72f2bd4cc9 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -2,10 +2,12 @@
>  generic-y += early_ioremap.h
>  generic-y += flat.h
>  generic-y += kvm_para.h
> +generic-y += mcs_spinlock.h
>  generic-y += parport.h
> -generic-y += spinlock.h
>  generic-y += spinlock_types.h
> +generic-y += ticket_spinlock.h
>  generic-y += qrwlock.h
>  generic-y += qrwlock_types.h
> +generic-y += qspinlock.h
>  generic-y += user.h
>  generic-y += vmlinux.lds.h
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> new file mode 100644
> index 000000000000..4856d50006f2
> --- /dev/null
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_RISCV_SPINLOCK_H
> +#define __ASM_RISCV_SPINLOCK_H
> +
> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +#define _Q_PENDING_LOOPS       (1 << 9)
> +
> +#define __no_arch_spinlock_redefine
> +#include <asm/ticket_spinlock.h>
> +#include <asm/qspinlock.h>
> +#include <asm/alternative.h>
> +
> +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> +
> +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
> +static __always_inline type arch_spin_##op(type_lock lock)             \
> +{                                                                      \
> +       if (static_branch_unlikely(&qspinlock_key))                     \
> +               return queued_spin_##op(lock);                          \
> +       return ticket_spin_##op(lock);                                  \
> +}
> +
> +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> +
> +#else
> +
> +#include <asm/ticket_spinlock.h>
> +
> +#endif
> +
> +#include <asm/qrwlock.h>
> +
> +#endif /* __ASM_RISCV_SPINLOCK_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 4f73c0ae44b2..d7c31c9b8ead 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -244,6 +244,38 @@ static void __init parse_dtb(void)
>  #endif
>  }
>
> +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> +EXPORT_SYMBOL(qspinlock_key);
> +
> +static void __init riscv_spinlock_init(void)
> +{
> +       char *using_ext;
> +
> +       if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> +           IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
> +               using_ext = "using Zabha";
> +
> +               asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> +                        : : : : no_zacas);
> +               asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> +                        : : : : qspinlock);
> +       }
I'm okay with this patch.
I suggest adding an argument such as "enable_qspinlock", which people
could use on non-Zabha machines. I hope this can happen in this
series. That's all I need, thank you very much.
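
A rough userspace sketch of how such an override could slot into the selection logic, for discussion only. The Zabha+Zacas and Ziccrse checks follow the order used in riscv_spinlock_init() in the quoted patch; the "enable_qspinlock" flag and every name in the block (struct isa_caps, pick_spinlock, force_qspinlock) are hypothetical and not part of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ISA extensions detected at boot. */
struct isa_caps {
	bool zacas;
	bool zabha;
	bool ziccrse;
};

enum spinlock_kind { TICKET_SPINLOCK, QUEUED_SPINLOCK };

/*
 * Pick the lock implementation. force_qspinlock models a hypothetical
 * "enable_qspinlock" argument that lets the user opt in on hardware
 * advertising neither Zabha nor Ziccrse.
 */
static enum spinlock_kind pick_spinlock(const struct isa_caps *caps,
					bool force_qspinlock)
{
	assert(caps);

	/* CAS from Zabha is only used when Zacas is present too. */
	if (caps->zacas && caps->zabha)
		return QUEUED_SPINLOCK;

	/* LR/SC with Ziccrse also gives the forward progress guarantee. */
	if (caps->ziccrse)
		return QUEUED_SPINLOCK;

	/* No guarantee from the hardware: honour the explicit opt-in. */
	if (force_qspinlock)
		return QUEUED_SPINLOCK;

	return TICKET_SPINLOCK;
}
```

With this shape the default behaviour stays unchanged, and the override only matters on the path that would otherwise fall back to ticket spinlocks.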

> +
> +no_zacas:
> +       using_ext = "using Ziccrse";
> +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
> +                            RISCV_ISA_EXT_ZICCRSE, 1)
> +                : : : : qspinlock);
> +
> +       static_branch_disable(&qspinlock_key);
> +       pr_info("Ticket spinlock: enabled\n");
> +
> +       return;
> +
> +qspinlock:
> +       pr_info("Queued spinlock %s: enabled\n", using_ext);
> +}
> +
>  extern void __init init_rt_signal_env(void);
>
>  void __init setup_arch(char **cmdline_p)
> @@ -295,6 +327,7 @@ void __init setup_arch(char **cmdline_p)
>         riscv_set_dma_cache_alignment();
>
>         riscv_user_isa_enable();
> +       riscv_spinlock_init();
>  }
>
>  bool arch_cpu_is_hotpluggable(int cpu)
> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> index 0655aa5b57b2..bf47cca2c375 100644
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>  }
>  #endif
>
> +#ifndef __no_arch_spinlock_redefine
>  /*
>   * Remapping spinlock architecture specific functions to the corresponding
>   * queued spinlock functions.
> @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>  #define arch_spin_lock(l)              queued_spin_lock(l)
>  #define arch_spin_trylock(l)           queued_spin_trylock(l)
>  #define arch_spin_unlock(l)            queued_spin_unlock(l)
> +#endif
>
>  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> index cfcff22b37b3..325779970d8a 100644
> --- a/include/asm-generic/ticket_spinlock.h
> +++ b/include/asm-generic/ticket_spinlock.h
> @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>         return (s16)((val >> 16) - (val & 0xffff)) > 1;
>  }
>
> +#ifndef __no_arch_spinlock_redefine
>  /*
>   * Remapping spinlock architecture specific functions to the corresponding
>   * ticket spinlock functions.
> @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>  #define arch_spin_lock(l)              ticket_spin_lock(l)
>  #define arch_spin_trylock(l)           ticket_spin_trylock(l)
>  #define arch_spin_unlock(l)            ticket_spin_unlock(l)
> +#endif
>
>  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
> --
> 2.39.2
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description
  2024-07-17  6:19 ` [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
  2024-07-17  6:42   ` Krzysztof Kozlowski
@ 2024-07-17  9:32   ` Guo Ren
  1 sibling, 0 replies; 40+ messages in thread
From: Guo Ren @ 2024-07-17  9:32 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, linux-doc,
	linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 2:22 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Add description for the Zabha ISA extension which was ratified in April
> 2024.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> index 468c646247aa..e6436260bdeb 100644
> --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> @@ -171,6 +171,12 @@ properties:
>              memory types as ratified in the 20191213 version of the privileged
>              ISA specification.
>
> +        - const: zabha
> +          description: |
> +            The Zabha extension for Byte and Halfword Atomic Memory Operations
> +            as ratified at commit 49f49c842ff9 ("Update to Rafified state") of
> +            riscv-zabha.
> +
>          - const: zacas
>            description: |
>              The Zacas extension for Atomic Compare-and-Swap (CAS) instructions
> --
> 2.39.2
>
Reviewed-by: Guo Ren <guoren@kernel.org>

-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description
  2024-07-17  6:19 ` [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description Alexandre Ghiti
  2024-07-17  6:55   ` Krzysztof Kozlowski
@ 2024-07-17  9:42   ` Guo Ren
  1 sibling, 0 replies; 40+ messages in thread
From: Guo Ren @ 2024-07-17  9:42 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, linux-doc,
	linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 2:30 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Add description for the Ziccrse ISA extension which was introduced in
> the riscv profiles specification v0.9.2.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
> index e6436260bdeb..b08bf1a8d8f8 100644
> --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> @@ -245,6 +245,12 @@ properties:
>              in commit 64074bc ("Update version numbers for Zfh/Zfinx") of
>              riscv-isa-manual.
>
> +        - const: ziccrse
> +          description:
> +            The standard Ziccrse extension which provides forward progress
> +            guarantee on LR/SC sequences, as introduced in the riscv profiles
> +            specification v0.9.2.
> +
>          - const: zk
>            description:
>              The standard Zk Standard Scalar cryptography extension as ratified
> --
> 2.39.2
>
Reviewed-by: Guo Ren <guoren@kernel.org>

-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-17  6:19 ` [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
@ 2024-07-17 15:08   ` Andrew Jones
  2024-07-17 15:18     ` Alexandre Ghiti
  2024-07-19  0:45   ` Samuel Holland
  1 sibling, 1 reply; 40+ messages in thread
From: Andrew Jones @ 2024-07-17 15:08 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 08:19:47AM GMT, Alexandre Ghiti wrote:
> This adds runtime support for Zacas in cmpxchg operations.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/Kconfig               | 17 +++++++++++++++++
>  arch/riscv/Makefile              |  3 +++
>  arch/riscv/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 05ccba8ca33a..1caaedec88c7 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
>  	  preemption. Enabling this config will result in higher memory
>  	  consumption due to the allocation of per-task's kernel Vector context.
>  
> +config TOOLCHAIN_HAS_ZACAS
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
> +	depends on AS_HAS_OPTION_ARCH
> +
> +config RISCV_ISA_ZACAS
> +	bool "Zacas extension support for atomic CAS"
> +	depends on TOOLCHAIN_HAS_ZACAS
> +	default y
> +	help
> +	  Enable the use of the Zacas ISA-extension to implement kernel atomic
> +	  cmpxchg operations when it is detected at boot.
> +
> +	  If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZBB
>  	bool
>  	default y
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index 06de9d365088..9fd13d7a9cc6 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -85,6 +85,9 @@ endif
>  # Check if the toolchain supports Zihintpause extension
>  riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
>  
> +# Check if the toolchain supports Zacas
> +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
> +
>  # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
>  # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
>  KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 808b4c78462e..5d38153e2f13 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -9,6 +9,7 @@
>  #include <linux/bug.h>
>  
>  #include <asm/fence.h>
> +#include <asm/alternative.h>
>  
>  #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
>  ({									\
> @@ -134,21 +135,40 @@
>  	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
>  })
>  
> -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
> +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\

I'd either not bother renaming sc_sfx or also rename it in _arch_cmpxchg.

>  ({									\
> +	__label__ no_zacas, end;					\
>  	register unsigned int __rc;					\
>  									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : no_zacas);				\
> +									\
> +		__asm__ __volatile__ (					\
> +			prepend						\
> +			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
> +			append						\
> +			: "+&r" (r), "+A" (*(p))			\
> +			: "rJ" (n)					\
> +			: "memory");					\
> +		goto end;						\
> +	}								\
> +									\
> +no_zacas:								\
>  	__asm__ __volatile__ (						\
>  		prepend							\
>  		"0:	lr" lr_sfx " %0, %2\n"				\
>  		"	bne  %0, %z3, 1f\n"				\
> -		"	sc" sc_sfx " %1, %z4, %2\n"			\
> +		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
>  		"	bnez %1, 0b\n"					\
>  		append							\
>  		"1:\n"							\
>  		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
>  		: "rJ" (co o), "rJ" (n)					\
>  		: "memory");						\
> +									\
> +end:;									\
>  })
>  
>  #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
> @@ -156,7 +176,7 @@
>  	__typeof__(ptr) __ptr = (ptr);					\
>  	__typeof__(*(__ptr)) __old = (old);				\
>  	__typeof__(*(__ptr)) __new = (new);				\
> -	__typeof__(*(__ptr)) __ret;					\
> +	__typeof__(*(__ptr)) __ret = (old);				\

Is this just to silence some compiler warnings? Can we point out
whatever the reason is in the commit message?

>  									\
>  	switch (sizeof(*__ptr)) {					\
>  	case 1:								\
> -- 
> 2.39.2
>

Thanks,
drew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-17 15:08   ` Andrew Jones
@ 2024-07-17 15:18     ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17 15:18 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi drew,

On Wed, Jul 17, 2024 at 5:08 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Jul 17, 2024 at 08:19:47AM GMT, Alexandre Ghiti wrote:
> > This adds runtime support for Zacas in cmpxchg operations.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  arch/riscv/Kconfig               | 17 +++++++++++++++++
> >  arch/riscv/Makefile              |  3 +++
> >  arch/riscv/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
> >  3 files changed, 43 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 05ccba8ca33a..1caaedec88c7 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
> >         preemption. Enabling this config will result in higher memory
> >         consumption due to the allocation of per-task's kernel Vector context.
> >
> > +config TOOLCHAIN_HAS_ZACAS
> > +     bool
> > +     default y
> > +     depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
> > +     depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
> > +     depends on AS_HAS_OPTION_ARCH
> > +
> > +config RISCV_ISA_ZACAS
> > +     bool "Zacas extension support for atomic CAS"
> > +     depends on TOOLCHAIN_HAS_ZACAS
> > +     default y
> > +     help
> > +       Enable the use of the Zacas ISA-extension to implement kernel atomic
> > +       cmpxchg operations when it is detected at boot.
> > +
> > +       If you don't know what to do here, say Y.
> > +
> >  config TOOLCHAIN_HAS_ZBB
> >       bool
> >       default y
> > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > index 06de9d365088..9fd13d7a9cc6 100644
> > --- a/arch/riscv/Makefile
> > +++ b/arch/riscv/Makefile
> > @@ -85,6 +85,9 @@ endif
> >  # Check if the toolchain supports Zihintpause extension
> >  riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
> >
> > +# Check if the toolchain supports Zacas
> > +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
> > +
> >  # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> >  # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> >  KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index 808b4c78462e..5d38153e2f13 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -9,6 +9,7 @@
> >  #include <linux/bug.h>
> >
> >  #include <asm/fence.h>
> > +#include <asm/alternative.h>
> >
> >  #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)         \
> >  ({                                                                   \
> > @@ -134,21 +135,40 @@
> >       r = (__typeof__(*(p)))((__retx & __mask) >> __s);               \
> >  })
> >
> > -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)      \
> > +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)  \
>
> I'd either not bother renaming sc_sfx or also rename it in _arch_cmpxchg.

I'll rename both then.

>
> >  ({                                                                   \
> > +     __label__ no_zacas, end;                                        \
> >       register unsigned int __rc;                                     \
> >                                                                       \
> > +     if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {                       \
> > +             asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,         \
> > +                                  RISCV_ISA_EXT_ZACAS, 1)            \
> > +                      : : : : no_zacas);                             \
> > +                                                                     \
> > +             __asm__ __volatile__ (                                  \
> > +                     prepend                                         \
> > +                     "       amocas" sc_cas_sfx " %0, %z2, %1\n"     \
> > +                     append                                          \
> > +                     : "+&r" (r), "+A" (*(p))                        \
> > +                     : "rJ" (n)                                      \
> > +                     : "memory");                                    \
> > +             goto end;                                               \
> > +     }                                                               \
> > +                                                                     \
> > +no_zacas:                                                            \
> >       __asm__ __volatile__ (                                          \
> >               prepend                                                 \
> >               "0:     lr" lr_sfx " %0, %2\n"                          \
> >               "       bne  %0, %z3, 1f\n"                             \
> > -             "       sc" sc_sfx " %1, %z4, %2\n"                     \
> > +             "       sc" sc_cas_sfx " %1, %z4, %2\n"                 \
> >               "       bnez %1, 0b\n"                                  \
> >               append                                                  \
> >               "1:\n"                                                  \
> >               : "=&r" (r), "=&r" (__rc), "+A" (*(p))                  \
> >               : "rJ" (co o), "rJ" (n)                                 \
> >               : "memory");                                            \
> > +                                                                     \
> > +end:;                                                                        \
> >  })
> >
> >  #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)                \
> > @@ -156,7 +176,7 @@
> >       __typeof__(ptr) __ptr = (ptr);                                  \
> >       __typeof__(*(__ptr)) __old = (old);                             \
> >       __typeof__(*(__ptr)) __new = (new);                             \
> > -     __typeof__(*(__ptr)) __ret;                                     \
> > +     __typeof__(*(__ptr)) __ret = (old);                             \
>
> Is this just to silence some compiler warnings? Can we point out
> whatever the reason is in the commit message?

CAS expects to find the old value in rd (__ret) to check against the
current value in memory before actually swapping with the new value.

Since both you and Andrea were confused by this, I'll make it more explicit.
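
To make the point concrete, here is a small userspace model using C11 atomics rather than the kernel macros. atomic_compare_exchange_strong() follows the same in/out convention as described for amocas: the expected value goes in, and the value found in memory comes back in the same variable. The helper name model_cmpxchg is made up for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userspace model of a cmpxchg built on a CAS primitive. Like rd for
 * amocas, 'ret' must hold the expected (old) value before the CAS and
 * holds the value that was found in memory afterwards.
 */
static uint32_t model_cmpxchg(_Atomic uint32_t *ptr, uint32_t old, uint32_t new)
{
	uint32_t ret = old;	/* mirrors "__ret = (old)" in the patch */

	assert(ptr);

	/* On failure, ret is overwritten with the current memory value. */
	atomic_compare_exchange_strong(ptr, &ret, new);

	return ret;
}
```

If 'ret' were left uninitialized, the comparison would run against garbage, which is why the initialization is needed for correctness and is not there to silence a warning.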

>
> >                                                                       \
> >       switch (sizeof(*__ptr)) {                                       \
> >       case 1:                                                         \
> > --
> > 2.39.2
> >
>
> Thanks,
> drew

Thanks,

Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-17  6:19 ` [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
@ 2024-07-17 15:26   ` Andrew Jones
  2024-07-17 15:29     ` Conor Dooley
  2024-07-18 12:50     ` Alexandre Ghiti
  0 siblings, 2 replies; 40+ messages in thread
From: Andrew Jones @ 2024-07-17 15:26 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 08:19:49AM GMT, Alexandre Ghiti wrote:
> This adds runtime support for Zabha in cmpxchg8/16() operations.
> 
> Note that in the absence of Zacas support in the toolchain, CAS
> instructions from Zabha won't be used.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/Kconfig               | 17 ++++++++++++++++
>  arch/riscv/Makefile              |  3 +++
>  arch/riscv/include/asm/cmpxchg.h | 33 ++++++++++++++++++++++++++++++--
>  arch/riscv/include/asm/hwcap.h   |  1 +
>  arch/riscv/kernel/cpufeature.c   |  1 +
>  5 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 1caaedec88c7..d3b0f92f92da 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
>  	  preemption. Enabling this config will result in higher memory
>  	  consumption due to the allocation of per-task's kernel Vector context.
>  
> +config TOOLCHAIN_HAS_ZABHA
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zabha)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zabha)
> +	depends on AS_HAS_OPTION_ARCH
> +
> +config RISCV_ISA_ZABHA
> +	bool "Zabha extension support for atomic byte/halfword operations"
> +	depends on TOOLCHAIN_HAS_ZABHA
> +	default y
> +	help
> +	  Enable the use of the Zabha ISA-extension to implement kernel
> +	  byte/halfword atomic memory operations when it is detected at boot.
> +
> +	  If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZACAS
>  	bool
>  	default y
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index 9fd13d7a9cc6..78dcaaeebf4e 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -88,6 +88,9 @@ riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
>  # Check if the toolchain supports Zacas
>  riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
>  
> +# Check if the toolchain supports Zabha
> +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZABHA) := $(riscv-march-y)_zabha
> +
>  # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
>  # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
>  KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 5d38153e2f13..c86722a101d0 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -105,8 +105,30 @@
>   * indicated by comparing RETURN with OLD.
>   */
>  
> -#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
>  ({									\
> +	__label__ no_zabha_zacas, end;					\
> +									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&			\
> +	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
> +				     RISCV_ISA_EXT_ZABHA, 1)		\
> +			 : : : : no_zabha_zacas);			\
> +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : no_zabha_zacas);			\

I came late to the call, but I guess trying to get rid of these asm gotos
was the topic of the discussion. The proposal was to try and use static
branches, but keep in mind that we've had trouble with static branches
inside macros in the past when those macros are used in many places [1].

[1] commit 0b1d60d6dd9e ("riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y")

> +									\
> +		__asm__ __volatile__ (					\
> +			prepend						\
> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> +			append						\
> +			: "+&r" (r), "+A" (*(p))			\
> +			: "rJ" (n)					\
> +			: "memory");					\
> +		goto end;						\
> +	}								\
> +									\
> +no_zabha_zacas:;							\

unnecessary ;
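
For readers following along, the fallback after this label emulates a byte or halfword cmpxchg on the enclosing aligned 32-bit word. Below is a userspace sketch of that idea with C11 atomics, where a CAS retry loop stands in for the kernel's LR/SC sequence. The name model_cmpxchg8 is made up, and 'idx' counts bytes from the least significant one, which matches memory order on little-endian RISC-V.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Emulate an 8-bit cmpxchg on byte 'idx' (0..3) of an aligned 32-bit
 * word, the way a masked fallback does when no byte-sized CAS exists.
 * Returns the byte that was found in memory.
 */
static uint8_t model_cmpxchg8(_Atomic uint32_t *word, unsigned int idx,
			      uint8_t old, uint8_t new)
{
	unsigned int shift;
	uint32_t mask, cur, want;

	assert(word && idx < 4);
	shift = idx * 8;
	mask = (uint32_t)0xff << shift;

	cur = atomic_load(word);
	for (;;) {
		/* Stop without storing if the target byte differs from 'old'. */
		if ((uint8_t)((cur & mask) >> shift) != old)
			break;

		/* Replace only the target byte, keep its neighbours. */
		want = (cur & ~mask) | ((uint32_t)new << shift);

		/* On failure 'cur' is refreshed and the loop retries. */
		if (atomic_compare_exchange_weak(word, &cur, want))
			break;
	}

	return (uint8_t)((cur & mask) >> shift);
}
```

With Zabha and Zacas both available, this whole loop collapses into a single byte-sized CAS instruction, which is what the new fast path in the patch selects.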

>  	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>  	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>  	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
> @@ -133,6 +155,8 @@
>  		: "memory");						\
>  									\
>  	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
> +									\
> +end:;									\
>  })
>  
>  #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
> @@ -180,8 +204,13 @@ end:;									\
>  									\
>  	switch (sizeof(*__ptr)) {					\
>  	case 1:								\
> +		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
> +					prepend, append,		\
> +					__ret, __ptr, __old, __new);    \
> +		break;							\
>  	case 2:								\
> -		__arch_cmpxchg_masked(sc_sfx, prepend, append,		\
> +		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
> +					prepend, append,		\
>  					__ret, __ptr, __old, __new);	\
>  		break;							\
>  	case 4:								\
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index e17d0078a651..f71ddd2ca163 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -81,6 +81,7 @@
>  #define RISCV_ISA_EXT_ZTSO		72
>  #define RISCV_ISA_EXT_ZACAS		73
>  #define RISCV_ISA_EXT_XANDESPMU		74
> +#define RISCV_ISA_EXT_ZABHA		75
>  
>  #define RISCV_ISA_EXT_XLINUXENVCFG	127
>  
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 5ef48cb20ee1..c125d82c894b 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>  	__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
>  	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
> +	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
>  	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
>  	__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
>  	__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
> -- 
> 2.39.2
>

Thanks,
drew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-17 15:26   ` Andrew Jones
@ 2024-07-17 15:29     ` Conor Dooley
  2024-07-17 15:34       ` Alexandre Ghiti
  2024-07-18 12:50     ` Alexandre Ghiti
  1 sibling, 1 reply; 40+ messages in thread
From: Conor Dooley @ 2024-07-17 15:29 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 10:26:34AM -0500, Andrew Jones wrote:
> On Wed, Jul 17, 2024 at 08:19:49AM GMT, Alexandre Ghiti wrote:
> > -#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
> > +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
> >  ({									\
> > +	__label__ no_zabha_zacas, end;					\
> > +									\
> > +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&			\
> > +	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> > +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
> > +				     RISCV_ISA_EXT_ZABHA, 1)		\
> > +			 : : : : no_zabha_zacas);			\
> > +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
> > +				     RISCV_ISA_EXT_ZACAS, 1)		\
> > +			 : : : : no_zabha_zacas);			\
> 
> I came late to the call, but I guess trying to get rid of these asm gotos
> was the topic of the discussion. The proposal was to try and use static
> branches, but keep in mind that we've had trouble with static branches
> inside macros in the past when those macros are used in many places[1]
> 
> [1] commit 0b1d60d6dd9e ("riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y")

The other half of the suggestion was not using an asm goto, but instead
trying to patch the whole thing in the alternative, for the problematic
section with llvm < 17.

> 
> > +									\
> > +		__asm__ __volatile__ (					\
> > +			prepend						\
> > +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> > +			append						\
> > +			: "+&r" (r), "+A" (*(p))			\
> > +			: "rJ" (n)					\
> > +			: "memory");					\
> > +		goto end;						\
> > +	}								\
> > +									\
> > +no_zabha_zacas:;							\
> 
> unnecessary ;
> 
> >  	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
> >  	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
> >  	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
> > @@ -133,6 +155,8 @@
> >  		: "memory");						\
> >  									\
> >  	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
> > +									\
> > +end:;									\
> >  })
> >  
> >  #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-17 15:29     ` Conor Dooley
@ 2024-07-17 15:34       ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-17 15:34 UTC (permalink / raw)
  To: Conor Dooley, Andrew Jones
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On 17/07/2024 17:29, Conor Dooley wrote:
> On Wed, Jul 17, 2024 at 10:26:34AM -0500, Andrew Jones wrote:
>> On Wed, Jul 17, 2024 at 08:19:49AM GMT, Alexandre Ghiti wrote:
>>> -#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
>>> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
>>>   ({									\
>>> +	__label__ no_zabha_zacas, end;					\
>>> +									\
>>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&			\
>>> +	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
>>> +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
>>> +				     RISCV_ISA_EXT_ZABHA, 1)		\
>>> +			 : : : : no_zabha_zacas);			\
>>> +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
>>> +				     RISCV_ISA_EXT_ZACAS, 1)		\
>>> +			 : : : : no_zabha_zacas);			\
>> I came late to the call, but I guess trying to get rid of these asm gotos
>> was the topic of the discussion. The proposal was to try and use static
>> branches, but keep in mind that we've had trouble with static branches
>> inside macros in the past when those macros are used in many places[1]
>>
>> [1] commit 0b1d60d6dd9e ("riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y")
> The other half of the suggestion was not using an asm goto, but instead
> trying to patch the whole thing in the alternative, for the problematic
> section with llvm < 17.


And I'm not a big fan of this solution, since it would imply patching the
5-7 LR/SC instructions into nops, which would probably slow down (a bit)
the amocas/amoswap sequence. I agree the slowdown should not be that big,
but this is only to work around an llvm issue, so it is not worth it to me!
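
For context on what is being weighed here: the fallback path emulates a
byte-wide cmpxchg on the containing aligned 32-bit word, whereas Zabha does
it with a single amocas.b. Below is a minimal user-space sketch of the
masked approach, not the kernel macro. It assumes a little-endian machine
and the GCC/Clang __atomic builtins; cmpxchg8_masked() is a hypothetical
name, and a CAS loop stands in for the LR/SC loop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: emulate an 8-bit cmpxchg with a
 * 32-bit CAS on the aligned word that contains the byte (little-endian).
 */
static uint8_t cmpxchg8_masked(uint8_t *p, uint8_t old, uint8_t newv)
{
	uintptr_t addr = (uintptr_t)p;
	uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)0x3);
	unsigned int shift = (addr & 0x3) * 8;	/* bit offset of the byte */
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t cur, next;

	assert(p);

	cur = __atomic_load_n(word, __ATOMIC_RELAXED);
	do {
		if ((uint8_t)((cur & mask) >> shift) != old)
			break;	/* comparison failed: nothing is stored */
		next = (cur & ~mask) | ((uint32_t)newv << shift);
		/* on failure, cur is refreshed with the current word */
	} while (!__atomic_compare_exchange_n(word, &cur, next, 0,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_SEQ_CST));

	/* value of the byte observed before any store */
	return (uint8_t)((cur & mask) >> shift);
}
```

As with the kernel primitive, the caller compares the return value with
the expected old value to know whether the store happened.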


>>> +									\
>>> +		__asm__ __volatile__ (					\
>>> +			prepend						\
>>> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
>>> +			append						\
>>> +			: "+&r" (r), "+A" (*(p))			\
>>> +			: "rJ" (n)					\
>>> +			: "memory");					\
>>> +		goto end;						\
>>> +	}								\
>>> +									\
>>> +no_zabha_zacas:;							\
>> unnecessary ;
>>
>>>   	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>>>   	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>>>   	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
>>> @@ -133,6 +155,8 @@
>>>   		: "memory");						\
>>>   									\
>>>   	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
>>> +									\
>>> +end:;									\
>>>   })
>>>   
>>>   #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-17  6:19 ` [PATCH v3 11/11] riscv: Add qspinlock support Alexandre Ghiti
  2024-07-17  9:30   ` Guo Ren
@ 2024-07-17 16:29   ` Andrea Parri
  2024-07-18 13:08     ` Alexandre Ghiti
  2024-07-19  1:05   ` Samuel Holland
  2 siblings, 1 reply; 40+ messages in thread
From: Andrea Parri @ 2024-07-17 16:29 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Nathan Chancellor,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Arnd Bergmann, Leonardo Bras, Guo Ren, linux-doc, linux-kernel,
	linux-riscv, linux-arch

> +config RISCV_QUEUED_SPINLOCKS

I'm seeing the following warnings with CONFIG_RISCV_QUEUED_SPINLOCKS=y:

In file included from ./arch/riscv/include/generated/asm/qspinlock.h:1,
                 from kernel/locking/qspinlock.c:24:
./include/asm-generic/qspinlock.h:144:9: warning: "arch_spin_is_locked" redefined
  144 | #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
      |         ^~~~~~~~~~~~~~~~~~~
In file included from ./arch/riscv/include/generated/asm/ticket_spinlock.h:1,
                 from ./arch/riscv/include/asm/spinlock.h:33,
                 from ./include/linux/spinlock.h:95,
                 from ./include/linux/sched.h:2142,
                 from ./include/linux/percpu.h:13,
                 from kernel/locking/qspinlock.c:19:
./include/asm-generic/ticket_spinlock.h:97:9: note: this is the location of the previous definition
   97 | #define arch_spin_is_locked(l)          ticket_spin_is_locked(l)
      |         ^~~~~~~~~~~~~~~~~~~
./include/asm-generic/qspinlock.h:145:9: warning: "arch_spin_is_contended" redefined
  145 | #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
      |         ^~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/ticket_spinlock.h:98:9: note: this is the location of the previous definition
   98 | #define arch_spin_is_contended(l)       ticket_spin_is_contended(l)
      |         ^~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/qspinlock.h:146:9: warning: "arch_spin_value_unlocked" redefined
  146 | #define arch_spin_value_unlocked(l)     queued_spin_value_unlocked(l)
      |         ^~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/ticket_spinlock.h:99:9: note: this is the location of the previous definition
   99 | #define arch_spin_value_unlocked(l)     ticket_spin_value_unlocked(l)
      |         ^~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/qspinlock.h:147:9: warning: "arch_spin_lock" redefined
  147 | #define arch_spin_lock(l)               queued_spin_lock(l)
      |         ^~~~~~~~~~~~~~
./include/asm-generic/ticket_spinlock.h:100:9: note: this is the location of the previous definition
  100 | #define arch_spin_lock(l)               ticket_spin_lock(l)
      |         ^~~~~~~~~~~~~~
./include/asm-generic/qspinlock.h:148:9: warning: "arch_spin_trylock" redefined
  148 | #define arch_spin_trylock(l)            queued_spin_trylock(l)
      |         ^~~~~~~~~~~~~~~~~
./include/asm-generic/ticket_spinlock.h:101:9: note: this is the location of the previous definition
  101 | #define arch_spin_trylock(l)            ticket_spin_trylock(l)
      |         ^~~~~~~~~~~~~~~~~
./include/asm-generic/qspinlock.h:149:9: warning: "arch_spin_unlock" redefined
  149 | #define arch_spin_unlock(l)             queued_spin_unlock(l)
      |         ^~~~~~~~~~~~~~~~
./include/asm-generic/ticket_spinlock.h:102:9: note: this is the location of the previous definition
  102 | #define arch_spin_unlock(l)             ticket_spin_unlock(l)


The following diff resolves them for me (please double check):

diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index 4856d50006f28..2d59f56a9e2d1 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -30,7 +30,11 @@ SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
 
 #else
 
+#if defined(CONFIG_RISCV_TICKET_SPINLOCKS)
 #include <asm/ticket_spinlock.h>
+#elif defined(CONFIG_RISCV_QUEUED_SPINLOCKS)
+#include <asm/qspinlock.h>
+#endif
 
 #endif
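
The warnings above come from both headers defining the same arch_spin_*()
names. A stand-alone sketch of the guard pattern used in the diff, with
hypothetical DEMO_* symbols standing in for the Kconfig options and
demo_*() functions standing in for the real lock implementations:

```c
#include <assert.h>

/* Hypothetical stand-in for the Kconfig choice; exactly one is defined. */
#define DEMO_QUEUED_SPINLOCKS 1

struct demo_lock {
	int val;	/* 0 = free, 1 = held */
	int kind;	/* which implementation took the lock */
};

static inline void demo_ticket_lock(struct demo_lock *l)
{
	assert(l->val == 0);	/* single-threaded demo: lock must be free */
	l->val = 1;
	l->kind = 't';
}

static inline void demo_queued_lock(struct demo_lock *l)
{
	assert(l->val == 0);	/* single-threaded demo: lock must be free */
	l->val = 1;
	l->kind = 'q';
}

/* Only one branch is seen by the compiler, so nothing gets redefined. */
#if defined(DEMO_TICKET_SPINLOCKS)
#define demo_spin_lock(l)	demo_ticket_lock(l)
#elif defined(DEMO_QUEUED_SPINLOCKS)
#define demo_spin_lock(l)	demo_queued_lock(l)
#else
#error "no spinlock implementation selected"
#endif
```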


> +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> +EXPORT_SYMBOL(qspinlock_key);
> +
> +static void __init riscv_spinlock_init(void)
> +{
> +	char *using_ext;
> +
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> +	    IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
> +		using_ext = "using Zabha";
> +
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> +			 : : : : no_zacas);
> +		asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> +			 : : : : qspinlock);
> +	}
> +
> +no_zacas:
> +	using_ext = "using Ziccrse";
> +	asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
> +			     RISCV_ISA_EXT_ZICCRSE, 1)
> +		 : : : : qspinlock);
> +
> +	static_branch_disable(&qspinlock_key);
> +	pr_info("Ticket spinlock: enabled\n");
> +
> +	return;
> +
> +qspinlock:
> +	pr_info("Queued spinlock %s: enabled\n", using_ext);
> +}
> +

Your commit message suggests that riscv_spinlock_init() doesn't need to
do anything if CONFIG_RISCV_COMBO_SPINLOCKS=n:

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index d7c31c9b8ead2..b2be1b0b700d2 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -244,6 +244,7 @@ static void __init parse_dtb(void)
 #endif
 }
 
+#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
 DEFINE_STATIC_KEY_TRUE(qspinlock_key);
 EXPORT_SYMBOL(qspinlock_key);
 
@@ -275,6 +276,11 @@ static void __init riscv_spinlock_init(void)
 qspinlock:
 	pr_info("Queued spinlock %s: enabled\n", using_ext);
 }
+#else
+static void __init riscv_spinlock_init(void)
+{
+}
+#endif
 
 extern void __init init_rt_signal_env(void);
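
For reference, my reading of the control flow of the asm goto sequence in
riscv_spinlock_init() above, written as plain C. This is only a sketch:
demo_pick_spinlock() and its boolean parameters are hypothetical, with
cfg_zacas_zabha standing for both IS_ENABLED() checks and the has_*
arguments standing for the runtime extension checks done by ALTERNATIVE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_lock_kind { DEMO_TICKET, DEMO_QUEUED };

struct demo_choice {
	enum demo_lock_kind kind;
	const char *using_ext;	/* NULL for ticket spinlocks */
};

static struct demo_choice demo_pick_spinlock(bool cfg_zacas_zabha,
					     bool has_zacas, bool has_zabha,
					     bool has_ziccrse)
{
	struct demo_choice c = { DEMO_TICKET, NULL };

	if (cfg_zacas_zabha && has_zacas && has_zabha) {
		/* both jumps patched so that we reach "qspinlock" */
		c.kind = DEMO_QUEUED;
		c.using_ext = "using Zabha";
	} else if (has_ziccrse) {
		/* the "no_zacas" path: LR/SC with a forward progress guarantee */
		c.kind = DEMO_QUEUED;
		c.using_ext = "using Ziccrse";
	}

	/* a queued choice always carries a description */
	assert(c.kind == DEMO_TICKET || c.using_ext != NULL);
	return c;
}
```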


Makes sense?  What am I missing?

  Andrea

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 00/11] Zacas/Zabha support and qspinlocks
  2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (10 preceding siblings ...)
  2024-07-17  6:19 ` [PATCH v3 11/11] riscv: Add qspinlock support Alexandre Ghiti
@ 2024-07-17 16:37 ` Andrea Parri
  11 siblings, 0 replies; 40+ messages in thread
From: Andrea Parri @ 2024-07-17 16:37 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Nathan Chancellor,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Arnd Bergmann, Leonardo Bras, Guo Ren, linux-doc, linux-kernel,
	linux-riscv, linux-arch

>   riscv: Implement cmpxchg32/64() using Zacas
>   dt-bindings: riscv: Add Zabha ISA extension description
>   riscv: Implement cmpxchg8/16() using Zabha
>   riscv: Improve zacas fully-ordered cmpxchg()
>   riscv: Implement arch_cmpxchg128() using Zacas
>   riscv: Implement xchg8/16() using Zabha

These look good to me.  Thanks!

  Andrea

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas
  2024-07-17  6:19 ` [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
@ 2024-07-17 20:34   ` Andrew Jones
  2024-07-18  7:48     ` Alexandre Ghiti
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Jones @ 2024-07-17 20:34 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Wed, Jul 17, 2024 at 08:19:51AM GMT, Alexandre Ghiti wrote:
> Now that Zacas is supported in the kernel, let's use the double word
> atomic version of amocas to improve the SLUB allocator.
> 
> Note that we have to select fixed registers, otherwise gcc fails to pick
> even registers and then produces a reserved encoding which fails to
> assemble.

Oh, that's quite unfortunate... I guess we should try to get some new
RISC-V inline assembly register constraints added to support register
pairs.

> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/Kconfig               |  1 +
>  arch/riscv/include/asm/cmpxchg.h | 39 ++++++++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index d3b0f92f92da..0bbaec0444d0 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -104,6 +104,7 @@ config RISCV
>  	select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO
>  	select HARDIRQS_SW_RESEND
>  	select HAS_IOPORT if MMU
> +	select HAVE_ALIGNED_STRUCT_PAGE
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
>  	select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 97b24da38897..608d98522557 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -289,4 +289,43 @@ end:;									\
>  	arch_cmpxchg_release((ptr), (o), (n));				\
>  })
>  
> +#ifdef CONFIG_RISCV_ISA_ZACAS

This is also 64-bit only, so needs a CONFIG_64BIT check too.

> +
> +#define system_has_cmpxchg128()						\
> +			riscv_has_extension_unlikely(RISCV_ISA_EXT_ZACAS)

nit: let's let this stick out since we have 100 chars

> +
> +union __u128_halves {
> +	u128 full;
> +	struct {
> +		u64 low, high;

Should we consider big endian too?

> +	};
> +};
> +
> +#define __arch_cmpxchg128(p, o, n, cas_sfx)					\
> +({										\
> +	__typeof__(*(p)) __o = (o);						\
> +	union __u128_halves __hn = { .full = (n) };				\
> +	union __u128_halves __ho = { .full = (__o) };				\
> +	register unsigned long x6 asm ("x6") = __hn.low;			\
> +	register unsigned long x7 asm ("x7") = __hn.high;			\
> +	register unsigned long x28 asm ("x28") = __ho.low;			\
> +	register unsigned long x29 asm ("x29") = __ho.high;			\

Can we use t1,t2,t3,t4 rather than the x names?

> +										\
> +	__asm__ __volatile__ (							\
> +		"	amocas.q" cas_sfx " %0, %z3, %2"			\
> +		: "+&r" (x28), "+&r" (x29), "+A" (*(p))				\
> +		: "rJ" (x6), "rJ" (x7)						\
> +		: "memory");							\
> +										\
> +	((u128)x29 << 64) | x28;						\
> +})
> +
> +#define arch_cmpxchg128(ptr, o, n)						\
> +	__arch_cmpxchg128((ptr), (o), (n), ".aqrl")
> +
> +#define arch_cmpxchg128_local(ptr, o, n)					\
> +	__arch_cmpxchg128((ptr), (o), (n), "")
> +
> +#endif /* CONFIG_RISCV_ISA_ZACAS */
> +
>  #endif /* _ASM_RISCV_CMPXCHG_H */
> -- 
> 2.39.2

Thanks,
drew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas
  2024-07-17 20:34   ` Andrew Jones
@ 2024-07-18  7:48     ` Alexandre Ghiti
  2024-07-18  8:33       ` Conor Dooley
  0 siblings, 1 reply; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-18  7:48 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Drew,

On Wed, Jul 17, 2024 at 10:34 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Wed, Jul 17, 2024 at 08:19:51AM GMT, Alexandre Ghiti wrote:
> > Now that Zacas is supported in the kernel, let's use the double word
> > atomic version of amocas to improve the SLUB allocator.
> >
> > Note that we have to select fixed registers, otherwise gcc fails to pick
> > even registers and then produces a reserved encoding which fails to
> > assemble.
>
> Oh, that's quite unfortunate... I guess we should try to get some new
> RISC-V inline assembly register constraints added to support register
> pairs.

I informed our compiler people internally; I'll check on their progress.

>
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  arch/riscv/Kconfig               |  1 +
> >  arch/riscv/include/asm/cmpxchg.h | 39 ++++++++++++++++++++++++++++++++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index d3b0f92f92da..0bbaec0444d0 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -104,6 +104,7 @@ config RISCV
> >       select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO
> >       select HARDIRQS_SW_RESEND
> >       select HAS_IOPORT if MMU
> > +     select HAVE_ALIGNED_STRUCT_PAGE
> >       select HAVE_ARCH_AUDITSYSCALL
> >       select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
> >       select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index 97b24da38897..608d98522557 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -289,4 +289,43 @@ end:;                                                                    \
> >       arch_cmpxchg_release((ptr), (o), (n));                          \
> >  })
> >
> > +#ifdef CONFIG_RISCV_ISA_ZACAS
>
> This is also 64-bit only, so needs a CONFIG_64BIT check too.

Yep, thanks

>
> > +
> > +#define system_has_cmpxchg128()                                              \
> > +                     riscv_has_extension_unlikely(RISCV_ISA_EXT_ZACAS)
>
> nit: let's let this stick out since we have 100 chars

Ok

>
> > +
> > +union __u128_halves {
> > +     u128 full;
> > +     struct {
> > +             u64 low, high;
>
> Should we consider big endian too?

Should we care about big endian? We don't deal with big endian
anywhere in our kernel right now.
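
If we ever wanted it, an endian-aware layout would only need the field
order swapped. A user-space sketch, assuming the GCC/Clang unsigned
__int128 extension and the __BYTE_ORDER__ predefined macros; the demo_*
names are hypothetical and this is not what the patch does:

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 demo_u128;	/* GCC/Clang, 64-bit targets */

union demo_u128_halves {
	demo_u128 full;
	struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		uint64_t low, high;	/* low half at the lower address */
#else
		uint64_t high, low;	/* big endian: high half first */
#endif
	};
};

static demo_u128 demo_join(uint64_t low, uint64_t high)
{
	return ((demo_u128)high << 64) | low;
}

static void demo_split(demo_u128 v, uint64_t *low, uint64_t *high)
{
	union demo_u128_halves h = { .full = v };

	*low = h.low;
	*high = h.high;
	/* splitting and joining must round-trip on either endianness */
	assert(demo_join(*low, *high) == v);
}
```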

>
> > +     };
> > +};
> > +
> > +#define __arch_cmpxchg128(p, o, n, cas_sfx)                                  \
> > +({                                                                           \
> > +     __typeof__(*(p)) __o = (o);                                             \
> > +     union __u128_halves __hn = { .full = (n) };                             \
> > +     union __u128_halves __ho = { .full = (__o) };                           \
> > +     register unsigned long x6 asm ("x6") = __hn.low;                        \
> > +     register unsigned long x7 asm ("x7") = __hn.high;                       \
> > +     register unsigned long x28 asm ("x28") = __ho.low;                      \
> > +     register unsigned long x29 asm ("x29") = __ho.high;                     \
>
> Can we use t1,t2,t3,t4 rather than the x names?

We can. I did not because it is a bit misleading: amocas expects an
*even* register, and with the tX names we would pass odd-numbered names
(t1 and t3) even though the underlying registers (x6 and x28) are even.

Anyway, I'll change that, I don't like the xX notation.
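
To make the naming point concrete, here is a small sketch of the ABI-name
to x-register mapping and of the pair rule (the first register of an
amocas.q pair must be even-numbered, and the pair is x(n), x(n+1)). The
demo_* helpers are hypothetical and only check the numbering:

```c
#include <assert.h>
#include <stdbool.h>

/* x-register number behind the ABI temporary tN, for N in 0..6 */
static int demo_t_to_x(int n)
{
	static const int map[] = { 5, 6, 7, 28, 29, 30, 31 };

	assert(n >= 0 && n <= 6);
	return map[n];
}

/* true if (t<lo_t>, t<hi_t>) forms a valid even/odd register pair */
static bool demo_valid_pair(int lo_t, int hi_t)
{
	int lo = demo_t_to_x(lo_t);
	int hi = demo_t_to_x(hi_t);

	return (lo % 2 == 0) && (hi == lo + 1);
}
```

So (t1, t2) and (t3, t4) are fine, being (x6, x7) and (x28, x29), while
(t0, t1) or (t2, t3) would not be.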

Thanks for the review,

Alex

>
> > +                                                                             \
> > +     __asm__ __volatile__ (                                                  \
> > +             "       amocas.q" cas_sfx " %0, %z3, %2"                        \
> > +             : "+&r" (x28), "+&r" (x29), "+A" (*(p))                         \
> > +             : "rJ" (x6), "rJ" (x7)                                          \
> > +             : "memory");                                                    \
> > +                                                                             \
> > +     ((u128)x29 << 64) | x28;                                                \
> > +})
> > +
> > +#define arch_cmpxchg128(ptr, o, n)                                           \
> > +     __arch_cmpxchg128((ptr), (o), (n), ".aqrl")
> > +
> > +#define arch_cmpxchg128_local(ptr, o, n)                                     \
> > +     __arch_cmpxchg128((ptr), (o), (n), "")
> > +
> > +#endif /* CONFIG_RISCV_ISA_ZACAS */
> > +
> >  #endif /* _ASM_RISCV_CMPXCHG_H */
> > --
> > 2.39.2
>
> Thanks,
> drew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas
  2024-07-18  7:48     ` Alexandre Ghiti
@ 2024-07-18  8:33       ` Conor Dooley
  2024-07-18  9:35         ` Arnd Bergmann
  0 siblings, 1 reply; 40+ messages in thread
From: Conor Dooley @ 2024-07-18  8:33 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andrew Jones, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Conor Dooley, Rob Herring, Krzysztof Kozlowski,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

[-- Attachment #1: Type: text/plain, Size: 913 bytes --]

On Thu, Jul 18, 2024 at 09:48:42AM +0200, Alexandre Ghiti wrote:
> On Wed, Jul 17, 2024 at 10:34 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> > On Wed, Jul 17, 2024 at 08:19:51AM GMT, Alexandre Ghiti wrote:
> > > +
> > > +union __u128_halves {
> > > +     u128 full;
> > > +     struct {
> > > +             u64 low, high;
> >
> > Should we consider big endian too?
> 
> Should we care about big endian? We don't deal with big endian
> anywhere in our kernel right now.

I think there are one or two places where we do actually have some
conditional stuff for BE. The Zbb string routines are, I believe, one such
place, and maybe there are one or two others. In general I'm not of the
opinion that it is worth adding complexity for BE until there's
linux-capable hardware that supports it (so not QEMU or people's toy
implementations), unless it's something that userspace is able to see.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas
  2024-07-18  8:33       ` Conor Dooley
@ 2024-07-18  9:35         ` Arnd Bergmann
  0 siblings, 0 replies; 40+ messages in thread
From: Arnd Bergmann @ 2024-07-18  9:35 UTC (permalink / raw)
  To: Conor.Dooley, Alexandre Ghiti
  Cc: Andrew Jones, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Conor Dooley, Rob Herring, Krzysztof Kozlowski,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Leonardo Bras, guoren,
	linux-doc, linux-kernel, linux-riscv, Linux-Arch

On Thu, Jul 18, 2024, at 10:33, Conor Dooley wrote:
> On Thu, Jul 18, 2024 at 09:48:42AM +0200, Alexandre Ghiti wrote:
>> On Wed, Jul 17, 2024 at 10:34 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>> > On Wed, Jul 17, 2024 at 08:19:51AM GMT, Alexandre Ghiti wrote:
>> > > +
>> > > +union __u128_halves {
>> > > +     u128 full;
>> > > +     struct {
>> > > +             u64 low, high;
>> >
>> > Should we consider big endian too?
>> 
>> Should we care about big endian? We don't deal with big endian
>> anywhere in our kernel right now.
>
> There's one or two places I think that we do actually have some
> conditional stuff for BE. The Zbb string routines I believe is one such
> place, and maybe there are one or two others. In general I'm not of the
> opinion that it is worth adding complexity for BE until there's
> linux-capable hardware that supports it (so not QEMU or people's toy
> implementations), unless it's something that userspace is able to see.

I don't think you want to go there at all: maintaining an
extra user space ABI (or two if you add 32-bit BE as well)
has a huge long-term cost, and there is pretty much zero
benefit for a BE ABI these days.

Adding it to arm64 turned out to be a mistake. We did have
a handful of users in the first year, and it technically
still works, but I don't think there are any users left
after they managed to fix their nonportable legacy
userspace that was ported from big-endian mips or
powerpc.

     Arnd

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-17 15:26   ` Andrew Jones
  2024-07-17 15:29     ` Conor Dooley
@ 2024-07-18 12:50     ` Alexandre Ghiti
  2024-07-18 16:06       ` Andrew Jones
  1 sibling, 1 reply; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-18 12:50 UTC (permalink / raw)
  To: Andrew Jones, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Drew,

On 17/07/2024 17:26, Andrew Jones wrote:
> On Wed, Jul 17, 2024 at 08:19:49AM GMT, Alexandre Ghiti wrote:
>> This adds runtime support for Zabha in cmpxchg8/16() operations.
>>
>> Note that in the absence of Zacas support in the toolchain, CAS
>> instructions from Zabha won't be used.
>>
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> ---
>>   arch/riscv/Kconfig               | 17 ++++++++++++++++
>>   arch/riscv/Makefile              |  3 +++
>>   arch/riscv/include/asm/cmpxchg.h | 33 ++++++++++++++++++++++++++++++--
>>   arch/riscv/include/asm/hwcap.h   |  1 +
>>   arch/riscv/kernel/cpufeature.c   |  1 +
>>   5 files changed, 53 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 1caaedec88c7..d3b0f92f92da 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
>>   	  preemption. Enabling this config will result in higher memory
>>   	  consumption due to the allocation of per-task's kernel Vector context.
>>   
>> +config TOOLCHAIN_HAS_ZABHA
>> +	bool
>> +	default y
>> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zabha)
>> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zabha)
>> +	depends on AS_HAS_OPTION_ARCH
>> +
>> +config RISCV_ISA_ZABHA
>> +	bool "Zabha extension support for atomic byte/halfword operations"
>> +	depends on TOOLCHAIN_HAS_ZABHA
>> +	default y
>> +	help
>> +	  Enable the use of the Zabha ISA-extension to implement kernel
>> +	  byte/halfword atomic memory operations when it is detected at boot.
>> +
>> +	  If you don't know what to do here, say Y.
>> +
>>   config TOOLCHAIN_HAS_ZACAS
>>   	bool
>>   	default y
>> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
>> index 9fd13d7a9cc6..78dcaaeebf4e 100644
>> --- a/arch/riscv/Makefile
>> +++ b/arch/riscv/Makefile
>> @@ -88,6 +88,9 @@ riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
>>   # Check if the toolchain supports Zacas
>>   riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
>>   
>> +# Check if the toolchain supports Zabha
>> +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZABHA) := $(riscv-march-y)_zabha
>> +
>>   # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
>>   # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
>>   KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
>> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
>> index 5d38153e2f13..c86722a101d0 100644
>> --- a/arch/riscv/include/asm/cmpxchg.h
>> +++ b/arch/riscv/include/asm/cmpxchg.h
>> @@ -105,8 +105,30 @@
>>    * indicated by comparing RETURN with OLD.
>>    */
>>   
>> -#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
>> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
>>   ({									\
>> +	__label__ no_zabha_zacas, end;					\
>> +									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&			\
>> +	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
>> +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
>> +				     RISCV_ISA_EXT_ZABHA, 1)		\
>> +			 : : : : no_zabha_zacas);			\
>> +		asm goto(ALTERNATIVE("j %[no_zabha_zacas]", "nop", 0,	\
>> +				     RISCV_ISA_EXT_ZACAS, 1)		\
>> +			 : : : : no_zabha_zacas);			\
> I came late to the call, but I guess trying to get rid of these asm gotos
> was the topic of the discussion. The proposal was to try and use static
> branches, but keep in mind that we've had trouble with static branches
> inside macros in the past when those macros are used in many places[1]
>
> [1] commit 0b1d60d6dd9e ("riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y")


Thanks for the pointer, I was not aware of this. I came up with a 
solution with preprocessor guards, not the prettiest solution but at 
least it does not create more problems :)


>
>> +									\
>> +		__asm__ __volatile__ (					\
>> +			prepend						\
>> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
>> +			append						\
>> +			: "+&r" (r), "+A" (*(p))			\
>> +			: "rJ" (n)					\
>> +			: "memory");					\
>> +		goto end;						\
>> +	}								\
>> +									\
>> +no_zabha_zacas:;							\
> unnecessary ;


Actually it is necessary: it fixes a warning encountered with llvm:
https://lore.kernel.org/linux-riscv/20240528193110.GA2196855@thelio-3990X/
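
The underlying rule is that, before C23, a label must be followed by a
statement, and a declaration is not a statement. Here the label is
directly followed by the "u32 *__ptr32b" declaration, so the empty
statement is what makes it valid. A stand-alone sketch of the same shape;
demo_cmpxchg_path() is hypothetical and the integer results merely stand
in for the two code paths:

```c
#include <assert.h>

static int demo_cmpxchg_path(int cfg_enabled, int has_ext)
{
	int r;

	assert(cfg_enabled == 0 || cfg_enabled == 1);

	if (cfg_enabled) {
		if (!has_ext)
			goto no_ext;
		r = 1;			/* stands in for the amocas path */
		goto end;
	}

no_ext:;				/* empty statement before a declaration */
	int fallback = 2;		/* stands in for the LR/SC path */
	r = fallback;

end:
	return r;
}
```

Dropping the ';' after no_ext makes clang warn that a label followed by
a declaration is a C23 extension.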

Thanks,

Alex


>
>>   	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>>   	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>>   	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
>> @@ -133,6 +155,8 @@
>>   		: "memory");						\
>>   									\
>>   	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
>> +									\
>> +end:;									\
>>   })
>>   
>>   #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
>> @@ -180,8 +204,13 @@ end:;									\
>>   									\
>>   	switch (sizeof(*__ptr)) {					\
>>   	case 1:								\
>> +		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
>> +					prepend, append,		\
>> +					__ret, __ptr, __old, __new);    \
>> +		break;							\
>>   	case 2:								\
>> -		__arch_cmpxchg_masked(sc_sfx, prepend, append,		\
>> +		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
>> +					prepend, append,		\
>>   					__ret, __ptr, __old, __new);	\
>>   		break;							\
>>   	case 4:								\
>> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
>> index e17d0078a651..f71ddd2ca163 100644
>> --- a/arch/riscv/include/asm/hwcap.h
>> +++ b/arch/riscv/include/asm/hwcap.h
>> @@ -81,6 +81,7 @@
>>   #define RISCV_ISA_EXT_ZTSO		72
>>   #define RISCV_ISA_EXT_ZACAS		73
>>   #define RISCV_ISA_EXT_XANDESPMU		74
>> +#define RISCV_ISA_EXT_ZABHA		75
>>   
>>   #define RISCV_ISA_EXT_XLINUXENVCFG	127
>>   
>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> index 5ef48cb20ee1..c125d82c894b 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>>   	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>>   	__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
>>   	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
>> +	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
>>   	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
>>   	__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
>>   	__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
>> -- 
>> 2.39.2
>>
> Thanks,
> drew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-17 16:29   ` Andrea Parri
@ 2024-07-18 13:08     ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-18 13:08 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Nathan Chancellor,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Arnd Bergmann, Leonardo Bras, Guo Ren, linux-doc, linux-kernel,
	linux-riscv, linux-arch

Hi Andrea,

On 17/07/2024 18:29, Andrea Parri wrote:
>> +config RISCV_QUEUED_SPINLOCKS
> I'm seeing the following warnings with CONFIG_RISCV_QUEUED_SPINLOCKS=y:
>
> In file included from ./arch/riscv/include/generated/asm/qspinlock.h:1,
>                   from kernel/locking/qspinlock.c:24:
> ./include/asm-generic/qspinlock.h:144:9: warning: "arch_spin_is_locked" redefined
>    144 | #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
>        |         ^~~~~~~~~~~~~~~~~~~
> In file included from ./arch/riscv/include/generated/asm/ticket_spinlock.h:1,
>                   from ./arch/riscv/include/asm/spinlock.h:33,
>                   from ./include/linux/spinlock.h:95,
>                   from ./include/linux/sched.h:2142,
>                   from ./include/linux/percpu.h:13,
>                   from kernel/locking/qspinlock.c:19:
> ./include/asm-generic/ticket_spinlock.h:97:9: note: this is the location of the previous definition
>     97 | #define arch_spin_is_locked(l)          ticket_spin_is_locked(l)
>        |         ^~~~~~~~~~~~~~~~~~~
> ./include/asm-generic/qspinlock.h:145:9: warning: "arch_spin_is_contended" redefined
>    145 | #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
>        |         ^~~~~~~~~~~~~~~~~~~~~~
> ./include/asm-generic/ticket_spinlock.h:98:9: note: this is the location of the previous definition
>     98 | #define arch_spin_is_contended(l)       ticket_spin_is_contended(l)
>        |         ^~~~~~~~~~~~~~~~~~~~~~
> ./include/asm-generic/qspinlock.h:146:9: warning: "arch_spin_value_unlocked" redefined
>    146 | #define arch_spin_value_unlocked(l)     queued_spin_value_unlocked(l)
>        |         ^~~~~~~~~~~~~~~~~~~~~~~~
> ./include/asm-generic/ticket_spinlock.h:99:9: note: this is the location of the previous definition
>     99 | #define arch_spin_value_unlocked(l)     ticket_spin_value_unlocked(l)
>        |         ^~~~~~~~~~~~~~~~~~~~~~~~
> ./include/asm-generic/qspinlock.h:147:9: warning: "arch_spin_lock" redefined
>    147 | #define arch_spin_lock(l)               queued_spin_lock(l)
>        |         ^~~~~~~~~~~~~~
> ./include/asm-generic/ticket_spinlock.h:100:9: note: this is the location of the previous definition
>    100 | #define arch_spin_lock(l)               ticket_spin_lock(l)
>        |         ^~~~~~~~~~~~~~
> ./include/asm-generic/qspinlock.h:148:9: warning: "arch_spin_trylock" redefined
>    148 | #define arch_spin_trylock(l)            queued_spin_trylock(l)
>        |         ^~~~~~~~~~~~~~~~~
> ./include/asm-generic/ticket_spinlock.h:101:9: note: this is the location of the previous definition
>    101 | #define arch_spin_trylock(l)            ticket_spin_trylock(l)
>        |         ^~~~~~~~~~~~~~~~~
> ./include/asm-generic/qspinlock.h:149:9: warning: "arch_spin_unlock" redefined
>    149 | #define arch_spin_unlock(l)             queued_spin_unlock(l)
>        |         ^~~~~~~~~~~~~~~~
> ./include/asm-generic/ticket_spinlock.h:102:9: note: this is the location of the previous definition
>    102 | #define arch_spin_unlock(l)             ticket_spin_unlock(l)
>
>
> The following diff resolves them for me (please double check):
>
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> index 4856d50006f28..2d59f56a9e2d1 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -30,7 +30,11 @@ SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
>   
>   #else
>   
> +#if defined(CONFIG_RISCV_TICKET_SPINLOCKS)
>   #include <asm/ticket_spinlock.h>
> +#elif defined(CONFIG_RISCV_QUEUED_SPINLOCKS)
> +#include <asm/qspinlock.h>
> +#endif
>   
>   #endif


Thanks for testing this config (when I did not...)!

I came up with something slightly different, but same fix in the end, 
thanks!


>
>> +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
>> +EXPORT_SYMBOL(qspinlock_key);
>> +
>> +static void __init riscv_spinlock_init(void)
>> +{
>> +	char *using_ext;
>> +
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
>> +	    IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
>> +		using_ext = "using Zabha";
>> +
>> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
>> +			 : : : : no_zacas);
>> +		asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
>> +			 : : : : qspinlock);
>> +	}
>> +
>> +no_zacas:
>> +	using_ext = "using Ziccrse";
>> +	asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
>> +			     RISCV_ISA_EXT_ZICCRSE, 1)
>> +		 : : : : qspinlock);
>> +
>> +	static_branch_disable(&qspinlock_key);
>> +	pr_info("Ticket spinlock: enabled\n");
>> +
>> +	return;
>> +
>> +qspinlock:
>> +	pr_info("Queued spinlock %s: enabled\n", using_ext);
>> +}
>> +
> Your commit message suggests that riscv_spinlock_init() doesn't need to
> do anything if CONFIG_RISCV_COMBO_SPINLOCKS=n:
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index d7c31c9b8ead2..b2be1b0b700d2 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -244,6 +244,7 @@ static void __init parse_dtb(void)
>   #endif
>   }
>   
> +#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
>   DEFINE_STATIC_KEY_TRUE(qspinlock_key);
>   EXPORT_SYMBOL(qspinlock_key);
>   
> @@ -275,6 +276,11 @@ static void __init riscv_spinlock_init(void)
>   qspinlock:
>   	pr_info("Queued spinlock %s: enabled\n", using_ext);
>   }
> +#else
> +static void __init riscv_spinlock_init(void)
> +{
> +}
> +#endif
>   
>   extern void __init init_rt_signal_env(void);
>
>
> Makes sense?  What am I missing?


Totally makes sense, I completely overlooked this when I added the 
ticket/queued configs, thanks for taking the time to look into it.

That will be fixed in the next version.

Thanks again,

Alex
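
For readers following along, the selection order in the quoted riscv_spinlock_init() can be sketched in plain C roughly as below. This is only an illustration: it assumes boolean inputs in place of the IS_ENABLED() checks and the ALTERNATIVE()-patched jumps, and choose_spinlock(), enum demo_lock_kind and the parameter names are hypothetical, not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical result type; the patch only prints a message instead. */
enum demo_lock_kind {
	DEMO_TICKET,
	DEMO_QUEUED_ZABHA,
	DEMO_QUEUED_ZICCRSE,
};

/*
 * cfg_*: build-time options (IS_ENABLED() in the patch).
 * has_*: extensions detected at boot (ALTERNATIVE() in the patch).
 */
static enum demo_lock_kind choose_spinlock(bool cfg_zacas, bool cfg_zabha,
					   bool has_zacas, bool has_zabha,
					   bool has_ziccrse)
{
	/* CAS path: both options built in and both extensions present. */
	if (cfg_zacas && cfg_zabha && has_zacas && has_zabha)
		return DEMO_QUEUED_ZABHA;

	/* LR/SC path: Ziccrse alone is enough. */
	if (has_ziccrse)
		return DEMO_QUEUED_ZICCRSE;

	/* Neither: the patch disables qspinlock_key and uses ticket locks. */
	return DEMO_TICKET;
}
```

With the change suggested above, this decision would presumably only be compiled in for CONFIG_RISCV_COMBO_SPINLOCKS=y; the ticket-only and queued-only builds have nothing to choose at boot.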


>
>    Andrea
>


* Re: [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-17  9:30   ` Guo Ren
@ 2024-07-18 13:11     ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-18 13:11 UTC (permalink / raw)
  To: Guo Ren, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, linux-doc,
	linux-kernel, linux-riscv, linux-arch

Hi Guo,

On 17/07/2024 11:30, Guo Ren wrote:
> On Wed, Jul 17, 2024 at 2:31 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>> In order to produce a generic kernel, a user can select
>> CONFIG_RISCV_COMBO_SPINLOCKS, which will fall back at runtime to the
>> ticket spinlock implementation if neither Zabha nor Ziccrse is present.
>>
>> Note that we can't use alternatives here because the discovery of
>> extensions is done too late. We also need to start with the qspinlock
>> implementation, because the ticket spinlock implementation would pollute
>> the spinlock value. So let's use static keys.
>>
>> This is largely based on Guo's work and Leonardo's reviews at [1].
>>
>> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
>> Signed-off-by: Guo Ren <guoren@kernel.org>
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> ---
>>   .../locking/queued-spinlocks/arch-support.txt |  2 +-
>>   arch/riscv/Kconfig                            | 29 ++++++++++++++
>>   arch/riscv/include/asm/Kbuild                 |  4 +-
>>   arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
>>   arch/riscv/kernel/setup.c                     | 33 ++++++++++++++++
>>   include/asm-generic/qspinlock.h               |  2 +
>>   include/asm-generic/ticket_spinlock.h         |  2 +
>>   7 files changed, 109 insertions(+), 2 deletions(-)
>>   create mode 100644 arch/riscv/include/asm/spinlock.h
>>
>> diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
>> index 22f2990392ff..cf26042480e2 100644
>> --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
>> +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
>> @@ -20,7 +20,7 @@
>>       |    openrisc: |  ok  |
>>       |      parisc: | TODO |
>>       |     powerpc: |  ok  |
>> -    |       riscv: | TODO |
>> +    |       riscv: |  ok  |
>>       |        s390: | TODO |
>>       |          sh: | TODO |
>>       |       sparc: |  ok  |
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 0bbaec0444d0..5040c7eac70d 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -72,6 +72,7 @@ config RISCV
>>          select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
>>          select ARCH_WANTS_NO_INSTR
>>          select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> +       select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
>>          select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>>          select BUILDTIME_TABLE_SORT if MMU
>>          select CLINT_TIMER if RISCV_M_MODE
>> @@ -482,6 +483,34 @@ config NODES_SHIFT
>>            Specify the maximum number of NUMA Nodes available on the target
>>            system.  Increases memory reserved to accommodate various tables.
>>
>> +choice
>> +       prompt "RISC-V spinlock type"
>> +       default RISCV_COMBO_SPINLOCKS
>> +
>> +config RISCV_TICKET_SPINLOCKS
>> +       bool "Using ticket spinlock"
>> +
>> +config RISCV_QUEUED_SPINLOCKS
>> +       bool "Using queued spinlock"
>> +       depends on SMP && MMU
>> +       select ARCH_USE_QUEUED_SPINLOCKS
>> +       help
>> +         The queued spinlock implementation requires the forward progress
>> +         guarantee of cmpxchg()/xchg() atomic operations: CAS with Zabha or
>> +         LR/SC with Ziccrse provide such guarantee.
>> +
>> +         Select this if and only if Zabha or Ziccrse is available on your
>> +         platform.
>> +
>> +config RISCV_COMBO_SPINLOCKS
>> +       bool "Using combo spinlock"
>> +       depends on SMP && MMU
>> +       select ARCH_USE_QUEUED_SPINLOCKS
>> +       help
>> +         Embed both queued spinlock and ticket lock so that the spinlock
>> +         implementation can be chosen at runtime.
>> +endchoice
>> +
>>   config RISCV_ALTERNATIVE
>>          bool
>>          depends on !XIP_KERNEL
>> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
>> index 504f8b7e72d4..ad72f2bd4cc9 100644
>> --- a/arch/riscv/include/asm/Kbuild
>> +++ b/arch/riscv/include/asm/Kbuild
>> @@ -2,10 +2,12 @@
>>   generic-y += early_ioremap.h
>>   generic-y += flat.h
>>   generic-y += kvm_para.h
>> +generic-y += mcs_spinlock.h
>>   generic-y += parport.h
>> -generic-y += spinlock.h
>>   generic-y += spinlock_types.h
>> +generic-y += ticket_spinlock.h
>>   generic-y += qrwlock.h
>>   generic-y += qrwlock_types.h
>> +generic-y += qspinlock.h
>>   generic-y += user.h
>>   generic-y += vmlinux.lds.h
>> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
>> new file mode 100644
>> index 000000000000..4856d50006f2
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/spinlock.h
>> @@ -0,0 +1,39 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __ASM_RISCV_SPINLOCK_H
>> +#define __ASM_RISCV_SPINLOCK_H
>> +
>> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
>> +#define _Q_PENDING_LOOPS       (1 << 9)
>> +
>> +#define __no_arch_spinlock_redefine
>> +#include <asm/ticket_spinlock.h>
>> +#include <asm/qspinlock.h>
>> +#include <asm/alternative.h>
>> +
>> +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
>> +
>> +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
>> +static __always_inline type arch_spin_##op(type_lock lock)             \
>> +{                                                                      \
>> +       if (static_branch_unlikely(&qspinlock_key))                     \
>> +               return queued_spin_##op(lock);                          \
>> +       return ticket_spin_##op(lock);                                  \
>> +}
>> +
>> +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
>> +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
>> +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
>> +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
>> +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
>> +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
>> +
>> +#else
>> +
>> +#include <asm/ticket_spinlock.h>
>> +
>> +#endif
>> +
>> +#include <asm/qrwlock.h>
>> +
>> +#endif /* __ASM_RISCV_SPINLOCK_H */
>> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
>> index 4f73c0ae44b2..d7c31c9b8ead 100644
>> --- a/arch/riscv/kernel/setup.c
>> +++ b/arch/riscv/kernel/setup.c
>> @@ -244,6 +244,38 @@ static void __init parse_dtb(void)
>>   #endif
>>   }
>>
>> +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
>> +EXPORT_SYMBOL(qspinlock_key);
>> +
>> +static void __init riscv_spinlock_init(void)
>> +{
>> +       char *using_ext;
>> +
>> +       if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
>> +           IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
>> +               using_ext = "using Zabha";
>> +
>> +               asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
>> +                        : : : : no_zacas);
>> +               asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
>> +                        : : : : qspinlock);
>> +       }
> I'm okay with this patch.


Great, thanks!


> I suggest adding an arg such as "enable_qspinlock", which people
> could use on non-Zabha machines. I hope it can happen in this
> series. That's all I need, thank you very much.


Do you think that's really necessary? I added Ziccrse support just 
below; to me, that covers your need to use qspinlocks on !Zabha.

BTW, can I add your SoB on this patch? Or a Co-developed-by or anything 
to show that you greatly contributed to this patch?

Thanks,

Alex


>
>> +
>> +no_zacas:
>> +       using_ext = "using Ziccrse";
>> +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
>> +                            RISCV_ISA_EXT_ZICCRSE, 1)
>> +                : : : : qspinlock);
>> +
>> +       static_branch_disable(&qspinlock_key);
>> +       pr_info("Ticket spinlock: enabled\n");
>> +
>> +       return;
>> +
>> +qspinlock:
>> +       pr_info("Queued spinlock %s: enabled\n", using_ext);
>> +}
>> +
>>   extern void __init init_rt_signal_env(void);
>>
>>   void __init setup_arch(char **cmdline_p)
>> @@ -295,6 +327,7 @@ void __init setup_arch(char **cmdline_p)
>>          riscv_set_dma_cache_alignment();
>>
>>          riscv_user_isa_enable();
>> +       riscv_spinlock_init();
>>   }
>>
>>   bool arch_cpu_is_hotpluggable(int cpu)
>> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
>> index 0655aa5b57b2..bf47cca2c375 100644
>> --- a/include/asm-generic/qspinlock.h
>> +++ b/include/asm-generic/qspinlock.h
>> @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>>   }
>>   #endif
>>
>> +#ifndef __no_arch_spinlock_redefine
>>   /*
>>    * Remapping spinlock architecture specific functions to the corresponding
>>    * queued spinlock functions.
>> @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>>   #define arch_spin_lock(l)              queued_spin_lock(l)
>>   #define arch_spin_trylock(l)           queued_spin_trylock(l)
>>   #define arch_spin_unlock(l)            queued_spin_unlock(l)
>> +#endif
>>
>>   #endif /* __ASM_GENERIC_QSPINLOCK_H */
>> diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
>> index cfcff22b37b3..325779970d8a 100644
>> --- a/include/asm-generic/ticket_spinlock.h
>> +++ b/include/asm-generic/ticket_spinlock.h
>> @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>>          return (s16)((val >> 16) - (val & 0xffff)) > 1;
>>   }
>>
>> +#ifndef __no_arch_spinlock_redefine
>>   /*
>>    * Remapping spinlock architecture specific functions to the corresponding
>>    * ticket spinlock functions.
>> @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>>   #define arch_spin_lock(l)              ticket_spin_lock(l)
>>   #define arch_spin_trylock(l)           ticket_spin_trylock(l)
>>   #define arch_spin_unlock(l)            ticket_spin_unlock(l)
>> +#endif
>>
>>   #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
>> --
>> 2.39.2
>>
>
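
The "start with qspinlock, fall back to ticket at runtime" scheme described in the commit message can be pictured with a small userspace sketch. It assumes a plain bool in place of the static key (so no code patching happens) and back ends that only report which one ran; every demo_* name is hypothetical and only mirrors the shape of SPINLOCK_BASE_DECLARE() from the quoted patch.

```c
#include <stdbool.h>

/* Stand-in for qspinlock_key: starts true, as DEFINE_STATIC_KEY_TRUE() does. */
static bool demo_qspinlock_key = true;

struct demo_lock {
	int val;
};

/* Hypothetical back ends: each just reports which implementation ran. */
static char demo_queued_which(struct demo_lock *lock)
{
	(void)lock;
	return 'q';
}

static char demo_ticket_which(struct demo_lock *lock)
{
	(void)lock;
	return 't';
}

/* Same shape as SPINLOCK_BASE_DECLARE(): one wrapper per operation. */
#define DEMO_SPINLOCK_DECLARE(op, type, type_lock)			\
static inline type demo_spin_##op(type_lock lock)			\
{									\
	if (demo_qspinlock_key)						\
		return demo_queued_##op(lock);				\
	return demo_ticket_##op(lock);					\
}

DEMO_SPINLOCK_DECLARE(which, char, struct demo_lock *)

/* Mirrors the fallback in riscv_spinlock_init(): flipped once, early. */
static void demo_disable_qspinlock(void)
{
	demo_qspinlock_key = false;
}
```

Calling demo_spin_which() before and after demo_disable_qspinlock() goes through the queued and then the ticket back end. In the kernel the switch has to happen before any lock is contended, which is why the patch does it from setup_arch().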


* Re: [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-18 12:50     ` Alexandre Ghiti
@ 2024-07-18 16:06       ` Andrew Jones
  2024-07-18 16:20         ` Alexandre Ghiti
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Jones @ 2024-07-18 16:06 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Conor Dooley, Rob Herring, Krzysztof Kozlowski,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

On Thu, Jul 18, 2024 at 02:50:28PM GMT, Alexandre Ghiti wrote:
...
> > > +									\
> > > +		__asm__ __volatile__ (					\
> > > +			prepend						\
> > > +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> > > +			append						\
> > > +			: "+&r" (r), "+A" (*(p))			\
> > > +			: "rJ" (n)					\
> > > +			: "memory");					\
> > > +		goto end;						\
> > > +	}								\
> > > +									\
> > > +no_zabha_zacas:;							\
> > unnecessary ;
> 
> 
> Actually it is necessary: it fixes a warning encountered with llvm:
> https://lore.kernel.org/linux-riscv/20240528193110.GA2196855@thelio-3990X/

I'm not complaining about the 'end:' label. That one we need ';' because
there's no following statement and labels must be followed by a statement.
But no_zabha_zacas always has following statements.

Thanks,
drew


* Re: [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-18 16:06       ` Andrew Jones
@ 2024-07-18 16:20         ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-18 16:20 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Conor Dooley, Rob Herring, Krzysztof Kozlowski,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

On Thu, Jul 18, 2024 at 6:06 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Thu, Jul 18, 2024 at 02:50:28PM GMT, Alexandre Ghiti wrote:
> ...
> > > > +                                                                 \
> > > > +         __asm__ __volatile__ (                                  \
> > > > +                 prepend                                         \
> > > > +                 "       amocas" cas_sfx " %0, %z2, %1\n"        \
> > > > +                 append                                          \
> > > > +                 : "+&r" (r), "+A" (*(p))                        \
> > > > +                 : "rJ" (n)                                      \
> > > > +                 : "memory");                                    \
> > > > +         goto end;                                               \
> > > > + }                                                               \
> > > > +                                                                 \
> > > > +no_zabha_zacas:;                                                 \
> > > unnecessary ;
> >
> >
> > Actually it is necessary: it fixes a warning encountered with llvm:
> > https://lore.kernel.org/linux-riscv/20240528193110.GA2196855@thelio-3990X/
>
> I'm not complaining about the 'end:' label. That one we need ';' because
> there's no following statement and labels must be followed by a statement.
> But no_zabha_zacas always has following statements.

My bad, that's another warning that is emitted by llvm and requires the ';':

../include/linux/atomic/atomic-arch-fallback.h:2026:9: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
 2026 |         return raw_cmpxchg(&v->counter, old, new);
      |                ^
../include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
   55 | #define raw_cmpxchg arch_cmpxchg
      |                     ^
../arch/riscv/include/asm/cmpxchg.h:310:2: note: expanded from macro 'arch_cmpxchg'
  310 |         _arch_cmpxchg((ptr), (o), (n), ".rl", ".aqrl",         \
      |         ^
../arch/riscv/include/asm/cmpxchg.h:269:3: note: expanded from macro '_arch_cmpxchg'
  269 |                 __arch_cmpxchg_masked(sc_sfx, ".b" cas_sfx,         \
      |                 ^
../arch/riscv/include/asm/cmpxchg.h:178:2: note: expanded from macro '__arch_cmpxchg_masked'
  178 |         u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);         \
      |         ^


>
> Thanks,
> drew
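
Both label cases from this exchange can be reproduced outside the kernel with a few lines of C. The sketch below is a standalone illustration, not code from the patch; demo_pick() and its variables are hypothetical names that only mirror the control flow of the quoted macro.

```c
/* Hypothetical helper mirroring the control flow of the quoted macro. */
static void demo_pick(int have_fast_path, int *out)
{
	if (!have_fast_path)
		goto slow;

	*out = 1;
	goto end;

slow:;	/* ';' needed before C23: the next line is a declaration */
	int fallback = 2;
	*out = fallback;

end:;	/* ';' needed before C23: the label is last in the block */
}
```

Before C23 a label has to be attached to a statement, and a declaration is not a statement. The empty statement after `slow:` plays the same role as the one after `no_zabha_zacas:` in the macro, where the first expanded line is the `u32 *__ptr32b` declaration.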


* Re: [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-17  6:19 ` [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
  2024-07-17 15:08   ` Andrew Jones
@ 2024-07-19  0:45   ` Samuel Holland
  2024-07-19 11:48     ` Alexandre Ghiti
  1 sibling, 1 reply; 40+ messages in thread
From: Samuel Holland @ 2024-07-19  0:45 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Alex,

On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> This adds runtime support for Zacas in cmpxchg operations.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/Kconfig               | 17 +++++++++++++++++
>  arch/riscv/Makefile              |  3 +++
>  arch/riscv/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 05ccba8ca33a..1caaedec88c7 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
>  	  preemption. Enabling this config will result in higher memory
>  	  consumption due to the allocation of per-task's kernel Vector context.
>  
> +config TOOLCHAIN_HAS_ZACAS
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
> +	depends on AS_HAS_OPTION_ARCH
> +
> +config RISCV_ISA_ZACAS
> +	bool "Zacas extension support for atomic CAS"
> +	depends on TOOLCHAIN_HAS_ZACAS
> +	default y
> +	help
> +	  Enable the use of the Zacas ISA-extension to implement kernel atomic
> +	  cmpxchg operations when it is detected at boot.
> +
> +	  If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZBB
>  	bool
>  	default y
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index 06de9d365088..9fd13d7a9cc6 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -85,6 +85,9 @@ endif
>  # Check if the toolchain supports Zihintpause extension
>  riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
>  
> +# Check if the toolchain supports Zacas
> +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
> +
>  # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
>  # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
>  KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 808b4c78462e..5d38153e2f13 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -9,6 +9,7 @@
>  #include <linux/bug.h>
>  
>  #include <asm/fence.h>
> +#include <asm/alternative.h>
>  
>  #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
>  ({									\
> @@ -134,21 +135,40 @@
>  	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
>  })
>  
> -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
> +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
>  ({									\
> +	__label__ no_zacas, end;					\
>  	register unsigned int __rc;					\
>  									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : no_zacas);				\
> +									\
> +		__asm__ __volatile__ (					\
> +			prepend						\
> +			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
> +			append						\
> +			: "+&r" (r), "+A" (*(p))			\
> +			: "rJ" (n)					\
> +			: "memory");					\
> +		goto end;						\
> +	}								\
> +									\
> +no_zacas:								\
>  	__asm__ __volatile__ (						\
>  		prepend							\
>  		"0:	lr" lr_sfx " %0, %2\n"				\
>  		"	bne  %0, %z3, 1f\n"				\
> -		"	sc" sc_sfx " %1, %z4, %2\n"			\
> +		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
>  		"	bnez %1, 0b\n"					\
>  		append							\

This would probably be a good place to use inline ALTERNATIVE instead of an asm
goto. It saves overall code size, and a jump in the non-Zacas case, at the cost
of 3 nops in the Zacas case. (And all the nops can go after the amocas, where
they will likely be hidden by the amocas latency.)

Regards,
Samuel

>  		"1:\n"							\
>  		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
>  		: "rJ" (co o), "rJ" (n)					\
>  		: "memory");						\
> +									\
> +end:;									\
>  })
>  
>  #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
> @@ -156,7 +176,7 @@
>  	__typeof__(ptr) __ptr = (ptr);					\
>  	__typeof__(*(__ptr)) __old = (old);				\
>  	__typeof__(*(__ptr)) __new = (new);				\
> -	__typeof__(*(__ptr)) __ret;					\
> +	__typeof__(*(__ptr)) __ret = (old);				\
>  									\
>  	switch (sizeof(*__ptr)) {					\
>  	case 1:								\
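
A side note on the `__ret = (old)` change in the last quoted hunk: amocas takes the expected value in its destination register and overwrites that register with the value loaded from memory, so the result variable has to be seeded with `old`. The C11 compare-exchange interface has the same in/out shape, which gives a rough userspace analogue. This is a sketch using <stdatomic.h>, not the kernel macros, and demo_cmpxchg() is a hypothetical name.

```c
#include <stdatomic.h>

/*
 * Returns the value found in *p, like the kernel's cmpxchg():
 * equal to 'old' on success, the conflicting value on failure.
 */
static int demo_cmpxchg(atomic_int *p, int old, int new)
{
	int ret = old;	/* in: expected value, out: value seen in memory */

	atomic_compare_exchange_strong(p, &ret, new);
	return ret;
}
```

On success `ret` still holds `old`; on failure it has been replaced with the current contents of `*p`. This matches what the "+&r" (r) operand expresses in the amocas path.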



* Re: [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse
  2024-07-17  6:19 ` [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse Alexandre Ghiti
@ 2024-07-19  0:53   ` Samuel Holland
  2024-07-19  9:11     ` Alexandre Ghiti
  0 siblings, 1 reply; 40+ messages in thread
From: Samuel Holland @ 2024-07-19  0:53 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Alex,

On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> Add support to parse the Ziccrse string in the riscv,isa string.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/include/asm/hwcap.h | 1 +
>  arch/riscv/kernel/cpufeature.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index f71ddd2ca163..863b9b7d4a4f 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -82,6 +82,7 @@
>  #define RISCV_ISA_EXT_ZACAS		73
>  #define RISCV_ISA_EXT_XANDESPMU		74
>  #define RISCV_ISA_EXT_ZABHA		75
> +#define RISCV_ISA_EXT_ZICCRSE		76
>  
>  #define RISCV_ISA_EXT_XLINUXENVCFG	127
>  
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index c125d82c894b..93d8cc7e232c 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -306,6 +306,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>  	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
>  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>  	__RISCV_ISA_EXT_DATA(xandespmu, RISCV_ISA_EXT_XANDESPMU),
> +	__RISCV_ISA_EXT_DATA(ziccrse, RISCV_ISA_EXT_ZICCRSE),

Please sort this entry per the comment at the beginning of the array.

Regards,
Samuel

>  };
>  
>  const size_t riscv_isa_ext_count = ARRAY_SIZE(riscv_isa_ext);



* Re: [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-17  6:19 ` [PATCH v3 11/11] riscv: Add qspinlock support Alexandre Ghiti
  2024-07-17  9:30   ` Guo Ren
  2024-07-17 16:29   ` Andrea Parri
@ 2024-07-19  1:05   ` Samuel Holland
  2024-07-19  9:06     ` Alexandre Ghiti
  2 siblings, 1 reply; 40+ messages in thread
From: Samuel Holland @ 2024-07-19  1:05 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Alex,

On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> In order to produce a generic kernel, a user can select
> CONFIG_RISCV_COMBO_SPINLOCKS, which will fall back at runtime to the
> ticket spinlock implementation if neither Zabha nor Ziccrse is present.
> 
> Note that we can't use alternatives here because the discovery of
> extensions is done too late. We also need to start with the qspinlock
> implementation, because the ticket spinlock implementation would pollute
> the spinlock value. So let's use static keys.
> 
> This is largely based on Guo's work and Leonardo's reviews at [1].
> 
> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  .../locking/queued-spinlocks/arch-support.txt |  2 +-
>  arch/riscv/Kconfig                            | 29 ++++++++++++++
>  arch/riscv/include/asm/Kbuild                 |  4 +-
>  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
>  arch/riscv/kernel/setup.c                     | 33 ++++++++++++++++
>  include/asm-generic/qspinlock.h               |  2 +
>  include/asm-generic/ticket_spinlock.h         |  2 +
>  7 files changed, 109 insertions(+), 2 deletions(-)
>  create mode 100644 arch/riscv/include/asm/spinlock.h
> 
> diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> index 22f2990392ff..cf26042480e2 100644
> --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> @@ -20,7 +20,7 @@
>      |    openrisc: |  ok  |
>      |      parisc: | TODO |
>      |     powerpc: |  ok  |
> -    |       riscv: | TODO |
> +    |       riscv: |  ok  |
>      |        s390: | TODO |
>      |          sh: | TODO |
>      |       sparc: |  ok  |
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 0bbaec0444d0..5040c7eac70d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -72,6 +72,7 @@ config RISCV
>  	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
>  	select ARCH_WANTS_NO_INSTR
>  	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +	select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
>  	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>  	select BUILDTIME_TABLE_SORT if MMU
>  	select CLINT_TIMER if RISCV_M_MODE
> @@ -482,6 +483,34 @@ config NODES_SHIFT
>  	  Specify the maximum number of NUMA Nodes available on the target
>  	  system.  Increases memory reserved to accommodate various tables.
>  
> +choice
> +	prompt "RISC-V spinlock type"
> +	default RISCV_COMBO_SPINLOCKS
> +
> +config RISCV_TICKET_SPINLOCKS
> +	bool "Using ticket spinlock"
> +
> +config RISCV_QUEUED_SPINLOCKS
> +	bool "Using queued spinlock"
> +	depends on SMP && MMU

This needs:

	depends on NONPORTABLE

> +	select ARCH_USE_QUEUED_SPINLOCKS
> +	help
> +	  The queued spinlock implementation requires the forward progress
> +	  guarantee of cmpxchg()/xchg() atomic operations: CAS with Zabha or
> +	  LR/SC with Ziccrse provide such guarantee.
> +
> +	  Select this if and only if Zabha or Ziccrse is available on your
> +	  platform.
> +
> +config RISCV_COMBO_SPINLOCKS
> +	bool "Using combo spinlock"
> +	depends on SMP && MMU
> +	select ARCH_USE_QUEUED_SPINLOCKS
> +	help
> +	  Embed both queued spinlock and ticket lock so that the spinlock
> +	  implementation can be chosen at runtime.
> +endchoice
> +
>  config RISCV_ALTERNATIVE
>  	bool
>  	depends on !XIP_KERNEL
> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 504f8b7e72d4..ad72f2bd4cc9 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -2,10 +2,12 @@
>  generic-y += early_ioremap.h
>  generic-y += flat.h
>  generic-y += kvm_para.h
> +generic-y += mcs_spinlock.h
>  generic-y += parport.h
> -generic-y += spinlock.h
>  generic-y += spinlock_types.h
> +generic-y += ticket_spinlock.h
>  generic-y += qrwlock.h
>  generic-y += qrwlock_types.h
> +generic-y += qspinlock.h
>  generic-y += user.h
>  generic-y += vmlinux.lds.h
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> new file mode 100644
> index 000000000000..4856d50006f2
> --- /dev/null
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_RISCV_SPINLOCK_H
> +#define __ASM_RISCV_SPINLOCK_H
> +
> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +#define _Q_PENDING_LOOPS	(1 << 9)
> +
> +#define __no_arch_spinlock_redefine
> +#include <asm/ticket_spinlock.h>
> +#include <asm/qspinlock.h>
> +#include <asm/alternative.h>
> +
> +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> +
> +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)			\
> +static __always_inline type arch_spin_##op(type_lock lock)		\
> +{									\
> +	if (static_branch_unlikely(&qspinlock_key))			\
> +		return queued_spin_##op(lock);				\
> +	return ticket_spin_##op(lock);					\
> +}
> +
> +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> +
> +#else
> +
> +#include <asm/ticket_spinlock.h>
> +
> +#endif
> +
> +#include <asm/qrwlock.h>
> +
> +#endif /* __ASM_RISCV_SPINLOCK_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 4f73c0ae44b2..d7c31c9b8ead 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -244,6 +244,38 @@ static void __init parse_dtb(void)
>  #endif
>  }
>  
> +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> +EXPORT_SYMBOL(qspinlock_key);
> +
> +static void __init riscv_spinlock_init(void)
> +{
> +	char *using_ext;
> +
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> +	    IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
> +		using_ext = "using Zabha";
> +
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> +			 : : : : no_zacas);
> +		asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> +			 : : : : qspinlock);
> +	}
> +
> +no_zacas:
> +	using_ext = "using Ziccrse";
> +	asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
> +			     RISCV_ISA_EXT_ZICCRSE, 1)
> +		 : : : : qspinlock);
> +
> +	static_branch_disable(&qspinlock_key);
> +	pr_info("Ticket spinlock: enabled\n");
> +
> +	return;
> +
> +qspinlock:
> +	pr_info("Queued spinlock %s: enabled\n", using_ext);
> +}

This function would be much easier to read with
riscv_has_extension_[un]likely(), or even riscv_isa_extension_available() since
it only gets called once. Thankfully the concerns about using those inside
macros don't apply here :)
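
To make the suggestion concrete, here is a rough user-space sketch of the same
decision order written with plain conditionals. It is only an illustration
under stated assumptions, not kernel code: has_ext() and the EXT_* bits are
hypothetical stand-ins for riscv_isa_extension_available() and the
RISCV_ISA_EXT_* ids, and the zacas_cfg/zabha_cfg flags stand in for
IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) and IS_ENABLED(CONFIG_RISCV_ISA_ZABHA).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RISCV_ISA_EXT_* ids (not the kernel values). */
enum ext { EXT_ZACAS = 1 << 0, EXT_ZABHA = 1 << 1, EXT_ZICCRSE = 1 << 2 };

enum lock_kind { LOCK_TICKET, LOCK_QUEUED_ZABHA, LOCK_QUEUED_ZICCRSE };

/* Hypothetical replacement for riscv_isa_extension_available(). */
static bool has_ext(unsigned int isa, enum ext e)
{
	return (isa & e) != 0;
}

/*
 * Same decision order as the asm goto version in the patch: Zacas + Zabha
 * first (only when both are built in), then Ziccrse, otherwise fall back to
 * ticket spinlocks.
 */
static enum lock_kind pick_spinlock(unsigned int isa, bool zacas_cfg,
				    bool zabha_cfg)
{
	if (zacas_cfg && zabha_cfg &&
	    has_ext(isa, EXT_ZACAS) && has_ext(isa, EXT_ZABHA))
		return LOCK_QUEUED_ZABHA;

	if (has_ext(isa, EXT_ZICCRSE))
		return LOCK_QUEUED_ZICCRSE;

	/* In the kernel this is where static_branch_disable() would go. */
	return LOCK_TICKET;
}
```

In the real function the LOCK_TICKET case would correspond to
static_branch_disable(&qspinlock_key), and the other two cases only differ in
the string passed to pr_info().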

Regards,
Samuel

> +
>  extern void __init init_rt_signal_env(void);
>  
>  void __init setup_arch(char **cmdline_p)
> @@ -295,6 +327,7 @@ void __init setup_arch(char **cmdline_p)
>  	riscv_set_dma_cache_alignment();
>  
>  	riscv_user_isa_enable();
> +	riscv_spinlock_init();
>  }
>  
>  bool arch_cpu_is_hotpluggable(int cpu)
> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> index 0655aa5b57b2..bf47cca2c375 100644
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>  }
>  #endif
>  
> +#ifndef __no_arch_spinlock_redefine
>  /*
>   * Remapping spinlock architecture specific functions to the corresponding
>   * queued spinlock functions.
> @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>  #define arch_spin_lock(l)		queued_spin_lock(l)
>  #define arch_spin_trylock(l)		queued_spin_trylock(l)
>  #define arch_spin_unlock(l)		queued_spin_unlock(l)
> +#endif
>  
>  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> index cfcff22b37b3..325779970d8a 100644
> --- a/include/asm-generic/ticket_spinlock.h
> +++ b/include/asm-generic/ticket_spinlock.h
> @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>  	return (s16)((val >> 16) - (val & 0xffff)) > 1;
>  }
>  
> +#ifndef __no_arch_spinlock_redefine
>  /*
>   * Remapping spinlock architecture specific functions to the corresponding
>   * ticket spinlock functions.
> @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>  #define arch_spin_lock(l)		ticket_spin_lock(l)
>  #define arch_spin_trylock(l)		ticket_spin_trylock(l)
>  #define arch_spin_unlock(l)		ticket_spin_unlock(l)
> +#endif
>  
>  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */



* Re: [PATCH v3 11/11] riscv: Add qspinlock support
  2024-07-19  1:05   ` Samuel Holland
@ 2024-07-19  9:06     ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-19  9:06 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Samuel,

On Fri, Jul 19, 2024 at 3:05 AM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Alex,
>
> On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> > In order to produce a generic kernel, a user can select
> > CONFIG_RISCV_COMBO_SPINLOCKS, which will fall back at runtime to the
> > ticket spinlock implementation if neither Zabha nor Ziccrse is present.
> >
> > Note that we can't use alternatives here because the discovery of
> > extensions is done too late and we need to start with the qspinlock
> > implementation because the ticket spinlock implementation would pollute
> > the spinlock value, so let's use static keys.
> >
> > This is largely based on Guo's work and Leonardo's reviews at [1].
> >
> > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  .../locking/queued-spinlocks/arch-support.txt |  2 +-
> >  arch/riscv/Kconfig                            | 29 ++++++++++++++
> >  arch/riscv/include/asm/Kbuild                 |  4 +-
> >  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
> >  arch/riscv/kernel/setup.c                     | 33 ++++++++++++++++
> >  include/asm-generic/qspinlock.h               |  2 +
> >  include/asm-generic/ticket_spinlock.h         |  2 +
> >  7 files changed, 109 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/spinlock.h
> >
> > diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > index 22f2990392ff..cf26042480e2 100644
> > --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > @@ -20,7 +20,7 @@
> >      |    openrisc: |  ok  |
> >      |      parisc: | TODO |
> >      |     powerpc: |  ok  |
> > -    |       riscv: | TODO |
> > +    |       riscv: |  ok  |
> >      |        s390: | TODO |
> >      |          sh: | TODO |
> >      |       sparc: |  ok  |
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 0bbaec0444d0..5040c7eac70d 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -72,6 +72,7 @@ config RISCV
> >       select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
> >       select ARCH_WANTS_NO_INSTR
> >       select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > +     select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
> >       select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
> >       select BUILDTIME_TABLE_SORT if MMU
> >       select CLINT_TIMER if RISCV_M_MODE
> > @@ -482,6 +483,34 @@ config NODES_SHIFT
> >         Specify the maximum number of NUMA Nodes available on the target
> >         system.  Increases memory reserved to accommodate various tables.
> >
> > +choice
> > +     prompt "RISC-V spinlock type"
> > +     default RISCV_COMBO_SPINLOCKS
> > +
> > +config RISCV_TICKET_SPINLOCKS
> > +     bool "Using ticket spinlock"
> > +
> > +config RISCV_QUEUED_SPINLOCKS
> > +     bool "Using queued spinlock"
> > +     depends on SMP && MMU
>
> This needs:
>
>         depends on NONPORTABLE

Nice, thanks

>
> > +     select ARCH_USE_QUEUED_SPINLOCKS
> > +     help
> > +       The queued spinlock implementation requires the forward progress
> > +       guarantee of cmpxchg()/xchg() atomic operations: CAS with Zabha or
> > +       LR/SC with Ziccrse provide such guarantee.
> > +
> > +       Select this if and only if Zabha or Ziccrse is available on your
> > +       platform.
> > +
> > +config RISCV_COMBO_SPINLOCKS
> > +     bool "Using combo spinlock"
> > +     depends on SMP && MMU
> > +     select ARCH_USE_QUEUED_SPINLOCKS
> > +     help
> > +       Embed both queued spinlock and ticket lock so that the spinlock
> > +       implementation can be chosen at runtime.
> > +endchoice
> > +
> >  config RISCV_ALTERNATIVE
> >       bool
> >       depends on !XIP_KERNEL
> > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> > index 504f8b7e72d4..ad72f2bd4cc9 100644
> > --- a/arch/riscv/include/asm/Kbuild
> > +++ b/arch/riscv/include/asm/Kbuild
> > @@ -2,10 +2,12 @@
> >  generic-y += early_ioremap.h
> >  generic-y += flat.h
> >  generic-y += kvm_para.h
> > +generic-y += mcs_spinlock.h
> >  generic-y += parport.h
> > -generic-y += spinlock.h
> >  generic-y += spinlock_types.h
> > +generic-y += ticket_spinlock.h
> >  generic-y += qrwlock.h
> >  generic-y += qrwlock_types.h
> > +generic-y += qspinlock.h
> >  generic-y += user.h
> >  generic-y += vmlinux.lds.h
> > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > new file mode 100644
> > index 000000000000..4856d50006f2
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/spinlock.h
> > @@ -0,0 +1,39 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef __ASM_RISCV_SPINLOCK_H
> > +#define __ASM_RISCV_SPINLOCK_H
> > +
> > +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> > +#define _Q_PENDING_LOOPS     (1 << 9)
> > +
> > +#define __no_arch_spinlock_redefine
> > +#include <asm/ticket_spinlock.h>
> > +#include <asm/qspinlock.h>
> > +#include <asm/alternative.h>
> > +
> > +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> > +
> > +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                   \
> > +static __always_inline type arch_spin_##op(type_lock lock)           \
> > +{                                                                    \
> > +     if (static_branch_unlikely(&qspinlock_key))                     \
> > +             return queued_spin_##op(lock);                          \
> > +     return ticket_spin_##op(lock);                                  \
> > +}
> > +
> > +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> > +
> > +#else
> > +
> > +#include <asm/ticket_spinlock.h>
> > +
> > +#endif
> > +
> > +#include <asm/qrwlock.h>
> > +
> > +#endif /* __ASM_RISCV_SPINLOCK_H */
> > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > index 4f73c0ae44b2..d7c31c9b8ead 100644
> > --- a/arch/riscv/kernel/setup.c
> > +++ b/arch/riscv/kernel/setup.c
> > @@ -244,6 +244,38 @@ static void __init parse_dtb(void)
> >  #endif
> >  }
> >
> > +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> > +EXPORT_SYMBOL(qspinlock_key);
> > +
> > +static void __init riscv_spinlock_init(void)
> > +{
> > +     char *using_ext;
> > +
> > +     if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> > +         IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
> > +             using_ext = "using Zabha";
> > +
> > +             asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> > +                      : : : : no_zacas);
> > +             asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> > +                      : : : : qspinlock);
> > +     }
> > +
> > +no_zacas:
> > +     using_ext = "using Ziccrse";
> > +     asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
> > +                          RISCV_ISA_EXT_ZICCRSE, 1)
> > +              : : : : qspinlock);
> > +
> > +     static_branch_disable(&qspinlock_key);
> > +     pr_info("Ticket spinlock: enabled\n");
> > +
> > +     return;
> > +
> > +qspinlock:
> > +     pr_info("Queued spinlock %s: enabled\n", using_ext);
> > +}
>
> This function would be much easier to read with
> riscv_has_extension_[un]likely(), or even riscv_isa_extension_available() since
> it only gets called once. Thankfully the concerns about using those inside
> macros don't apply here :)
>

Yeah, way better, thanks!

Alex

> Regards,
> Samuel
>
> > +
> >  extern void __init init_rt_signal_env(void);
> >
> >  void __init setup_arch(char **cmdline_p)
> > @@ -295,6 +327,7 @@ void __init setup_arch(char **cmdline_p)
> >       riscv_set_dma_cache_alignment();
> >
> >       riscv_user_isa_enable();
> > +     riscv_spinlock_init();
> >  }
> >
> >  bool arch_cpu_is_hotpluggable(int cpu)
> > diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> > index 0655aa5b57b2..bf47cca2c375 100644
> > --- a/include/asm-generic/qspinlock.h
> > +++ b/include/asm-generic/qspinlock.h
> > @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> >  }
> >  #endif
> >
> > +#ifndef __no_arch_spinlock_redefine
> >  /*
> >   * Remapping spinlock architecture specific functions to the corresponding
> >   * queued spinlock functions.
> > @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> >  #define arch_spin_lock(l)            queued_spin_lock(l)
> >  #define arch_spin_trylock(l)         queued_spin_trylock(l)
> >  #define arch_spin_unlock(l)          queued_spin_unlock(l)
> > +#endif
> >
> >  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> > diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> > index cfcff22b37b3..325779970d8a 100644
> > --- a/include/asm-generic/ticket_spinlock.h
> > +++ b/include/asm-generic/ticket_spinlock.h
> > @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> >       return (s16)((val >> 16) - (val & 0xffff)) > 1;
> >  }
> >
> > +#ifndef __no_arch_spinlock_redefine
> >  /*
> >   * Remapping spinlock architecture specific functions to the corresponding
> >   * ticket spinlock functions.
> > @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> >  #define arch_spin_lock(l)            ticket_spin_lock(l)
> >  #define arch_spin_trylock(l)         ticket_spin_trylock(l)
> >  #define arch_spin_unlock(l)          ticket_spin_unlock(l)
> > +#endif
> >
> >  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
>


* Re: [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse
  2024-07-19  0:53   ` Samuel Holland
@ 2024-07-19  9:11     ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-19  9:11 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Fri, Jul 19, 2024 at 2:53 AM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Alex,
>
> On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> > Add support to parse the Ziccrse string in the riscv,isa string.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/hwcap.h | 1 +
> >  arch/riscv/kernel/cpufeature.c | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index f71ddd2ca163..863b9b7d4a4f 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -82,6 +82,7 @@
> >  #define RISCV_ISA_EXT_ZACAS          73
> >  #define RISCV_ISA_EXT_XANDESPMU              74
> >  #define RISCV_ISA_EXT_ZABHA          75
> > +#define RISCV_ISA_EXT_ZICCRSE                76
> >
> >  #define RISCV_ISA_EXT_XLINUXENVCFG   127
> >
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index c125d82c894b..93d8cc7e232c 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -306,6 +306,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
> >       __RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
> >       __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
> >       __RISCV_ISA_EXT_DATA(xandespmu, RISCV_ISA_EXT_XANDESPMU),
> > +     __RISCV_ISA_EXT_DATA(ziccrse, RISCV_ISA_EXT_ZICCRSE),
>
> Please sort this entry per the comment at the beginning of the array.

Done, thanks
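
For reference, a small sketch of the ordering rule as I understand it. The
assumption here (not stated in this thread) is that the comment above
riscv_isa_ext[] asks for 'Z' extensions to be ordered first by the category
letter following the 'Z', in the order IMAFDQLCBKJTPVH, and then
alphabetically within a category. The helper names z_category_rank() and
z_ext_cmp() are made up for this example and only handle 'z'-prefixed names.

```c
#include <string.h>

/* Category order assumed from the array's comment: IMAFDQLCBKJTPVH. */
static int z_category_rank(const char *name)
{
	static const char order[] = "imafdqlcbkjtpvh";
	const char *p;

	/* Only multi-letter names starting with 'z' are handled here. */
	if (name[0] != 'z' || name[1] == '\0')
		return -1;

	p = strchr(order, name[1]);
	return p ? (int)(p - order) : -1;
}

/* <0 if a sorts before b, >0 if a sorts after b, 0 if they are equal. */
static int z_ext_cmp(const char *a, const char *b)
{
	int ra = z_category_rank(a);
	int rb = z_category_rank(b);

	if (ra != rb)
		return ra - rb;

	return strcmp(a, b);
}
```

Under that assumed rule, ziccrse sorts after zicboz and before zicntr, and
every Zi* entry comes before the Za* ones such as zacas.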

Alex

>
> Regards,
> Samuel
>
> >  };
> >
> >  const size_t riscv_isa_ext_count = ARRAY_SIZE(riscv_isa_ext);
>


* Re: [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-19  0:45   ` Samuel Holland
@ 2024-07-19 11:48     ` Alexandre Ghiti
  2024-07-19 11:53       ` Alexandre Ghiti
  0 siblings, 1 reply; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-19 11:48 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Fri, Jul 19, 2024 at 2:45 AM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Alex,
>
> On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> > This adds runtime support for Zacas in cmpxchg operations.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  arch/riscv/Kconfig               | 17 +++++++++++++++++
> >  arch/riscv/Makefile              |  3 +++
> >  arch/riscv/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
> >  3 files changed, 43 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 05ccba8ca33a..1caaedec88c7 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
> >         preemption. Enabling this config will result in higher memory
> >         consumption due to the allocation of per-task's kernel Vector context.
> >
> > +config TOOLCHAIN_HAS_ZACAS
> > +     bool
> > +     default y
> > +     depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
> > +     depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
> > +     depends on AS_HAS_OPTION_ARCH
> > +
> > +config RISCV_ISA_ZACAS
> > +     bool "Zacas extension support for atomic CAS"
> > +     depends on TOOLCHAIN_HAS_ZACAS
> > +     default y
> > +     help
> > +       Enable the use of the Zacas ISA-extension to implement kernel atomic
> > +       cmpxchg operations when it is detected at boot.
> > +
> > +       If you don't know what to do here, say Y.
> > +
> >  config TOOLCHAIN_HAS_ZBB
> >       bool
> >       default y
> > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > index 06de9d365088..9fd13d7a9cc6 100644
> > --- a/arch/riscv/Makefile
> > +++ b/arch/riscv/Makefile
> > @@ -85,6 +85,9 @@ endif
> >  # Check if the toolchain supports Zihintpause extension
> >  riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
> >
> > +# Check if the toolchain supports Zacas
> > +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
> > +
> >  # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> >  # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> >  KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index 808b4c78462e..5d38153e2f13 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -9,6 +9,7 @@
> >  #include <linux/bug.h>
> >
> >  #include <asm/fence.h>
> > +#include <asm/alternative.h>
> >
> >  #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)         \
> >  ({                                                                   \
> > @@ -134,21 +135,40 @@
> >       r = (__typeof__(*(p)))((__retx & __mask) >> __s);               \
> >  })
> >
> > -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)      \
> > +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)  \
> >  ({                                                                   \
> > +     __label__ no_zacas, end;                                        \
> >       register unsigned int __rc;                                     \
> >                                                                       \
> > +     if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {                       \
> > +             asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,         \
> > +                                  RISCV_ISA_EXT_ZACAS, 1)            \
> > +                      : : : : no_zacas);                             \
> > +                                                                     \
> > +             __asm__ __volatile__ (                                  \
> > +                     prepend                                         \
> > +                     "       amocas" sc_cas_sfx " %0, %z2, %1\n"     \
> > +                     append                                          \
> > +                     : "+&r" (r), "+A" (*(p))                        \
> > +                     : "rJ" (n)                                      \
> > +                     : "memory");                                    \
> > +             goto end;                                               \
> > +     }                                                               \
> > +                                                                     \
> > +no_zacas:                                                            \
> >       __asm__ __volatile__ (                                          \
> >               prepend                                                 \
> >               "0:     lr" lr_sfx " %0, %2\n"                          \
> >               "       bne  %0, %z3, 1f\n"                             \
> > -             "       sc" sc_sfx " %1, %z4, %2\n"                     \
> > +             "       sc" sc_cas_sfx " %1, %z4, %2\n"                 \
> >               "       bnez %1, 0b\n"                                  \
> >               append                                                  \
>
> This would probably be a good place to use inline ALTERNATIVE instead of an asm
> goto. It saves overall code size, and a jump in the non-Zacas case, at the cost
> of 3 nops in the Zacas case. (And all the nops can go after the amocas, where
> they will likely be hidden by the amocas latency.)

That's what Conor proposed indeed.

I have just given it a try, but it does not work. The number of
instructions in the Zacas inline asm differs in the fully-ordered
version, so I cannot use a single fixed number of nops. I could pass
this information from arch_cmpxchg() down to __arch_cmpxchg(), but
those macros are already complex enough, so I'd rather not add another
parameter.
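
To illustrate the arithmetic, here is a tiny sketch, assuming both sides of an
ALTERNATIVE have to end up the same length so the shorter one is padded with
nops. alt_pad_nops() is a made-up helper, and the instruction counts are read
off the quoted macro: four for the lr/bne/sc/bnez loop and one for amocas,
with prepend/append left out.

```c
#include <stddef.h>

/*
 * Number of nops one side of an ALTERNATIVE needs so that it is as long as
 * the other side (lengths counted in instructions).
 */
static size_t alt_pad_nops(size_t this_len, size_t other_len)
{
	return other_len > this_len ? other_len - this_len : 0;
}
```

With those counts, alt_pad_nops(1, 4) gives the 3 nops mentioned above for the
Zacas side. If an ordering variant adds an instruction to only one side (a
hypothetical 5-instruction sequence, say), the padding becomes 4, so the count
would have to be selected per variant.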

This suggestion unfortunately cannot be applied to
__arch_cmpxchg_masked(), nor __arch_xchg_masked().

So unless you and Conor really insist, I'll drop the idea!

Thanks,

Alex


>
> Regards,
> Samuel
>
> >               "1:\n"                                                  \
> >               : "=&r" (r), "=&r" (__rc), "+A" (*(p))                  \
> >               : "rJ" (co o), "rJ" (n)                                 \
> >               : "memory");                                            \
> > +                                                                     \
> > +end:;                                                                        \
> >  })
> >
> >  #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)                \
> > @@ -156,7 +176,7 @@
> >       __typeof__(ptr) __ptr = (ptr);                                  \
> >       __typeof__(*(__ptr)) __old = (old);                             \
> >       __typeof__(*(__ptr)) __new = (new);                             \
> > -     __typeof__(*(__ptr)) __ret;                                     \
> > +     __typeof__(*(__ptr)) __ret = (old);                             \
> >                                                                       \
> >       switch (sizeof(*__ptr)) {                                       \
> >       case 1:                                                         \
>


* Re: [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-19 11:48     ` Alexandre Ghiti
@ 2024-07-19 11:53       ` Alexandre Ghiti
  0 siblings, 0 replies; 40+ messages in thread
From: Alexandre Ghiti @ 2024-07-19 11:53 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Conor Dooley, Rob Herring, Krzysztof Kozlowski, Andrea Parri,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

On Fri, Jul 19, 2024 at 1:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> On Fri, Jul 19, 2024 at 2:45 AM Samuel Holland
> <samuel.holland@sifive.com> wrote:
> >
> > Hi Alex,
> >
> > On 2024-07-17 1:19 AM, Alexandre Ghiti wrote:
> > > This adds runtime support for Zacas in cmpxchg operations.
> > >
> > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > ---
> > >  arch/riscv/Kconfig               | 17 +++++++++++++++++
> > >  arch/riscv/Makefile              |  3 +++
> > >  arch/riscv/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
> > >  3 files changed, 43 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 05ccba8ca33a..1caaedec88c7 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
> > >         preemption. Enabling this config will result in higher memory
> > >         consumption due to the allocation of per-task's kernel Vector context.
> > >
> > > +config TOOLCHAIN_HAS_ZACAS
> > > +     bool
> > > +     default y
> > > +     depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
> > > +     depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
> > > +     depends on AS_HAS_OPTION_ARCH
> > > +
> > > +config RISCV_ISA_ZACAS
> > > +     bool "Zacas extension support for atomic CAS"
> > > +     depends on TOOLCHAIN_HAS_ZACAS
> > > +     default y
> > > +     help
> > > +       Enable the use of the Zacas ISA-extension to implement kernel atomic
> > > +       cmpxchg operations when it is detected at boot.
> > > +
> > > +       If you don't know what to do here, say Y.
> > > +
> > >  config TOOLCHAIN_HAS_ZBB
> > >       bool
> > >       default y
> > > diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> > > index 06de9d365088..9fd13d7a9cc6 100644
> > > --- a/arch/riscv/Makefile
> > > +++ b/arch/riscv/Makefile
> > > @@ -85,6 +85,9 @@ endif
> > >  # Check if the toolchain supports Zihintpause extension
> > >  riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
> > >
> > > +# Check if the toolchain supports Zacas
> > > +riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
> > > +
> > >  # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
> > >  # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
> > >  KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
> > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > index 808b4c78462e..5d38153e2f13 100644
> > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > @@ -9,6 +9,7 @@
> > >  #include <linux/bug.h>
> > >
> > >  #include <asm/fence.h>
> > > +#include <asm/alternative.h>
> > >
> > >  #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)         \
> > >  ({                                                                   \
> > > @@ -134,21 +135,40 @@
> > >       r = (__typeof__(*(p)))((__retx & __mask) >> __s);               \
> > >  })
> > >
> > > -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)      \
> > > +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)  \
> > >  ({                                                                   \
> > > +     __label__ no_zacas, end;                                        \
> > >       register unsigned int __rc;                                     \
> > >                                                                       \
> > > +     if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {                       \
> > > +             asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,         \
> > > +                                  RISCV_ISA_EXT_ZACAS, 1)            \
> > > +                      : : : : no_zacas);                             \
> > > +                                                                     \
> > > +             __asm__ __volatile__ (                                  \
> > > +                     prepend                                         \
> > > +                     "       amocas" sc_cas_sfx " %0, %z2, %1\n"     \
> > > +                     append                                          \
> > > +                     : "+&r" (r), "+A" (*(p))                        \
> > > +                     : "rJ" (n)                                      \
> > > +                     : "memory");                                    \
> > > +             goto end;                                               \
> > > +     }                                                               \
> > > +                                                                     \
> > > +no_zacas:                                                            \
> > >       __asm__ __volatile__ (                                          \
> > >               prepend                                                 \
> > >               "0:     lr" lr_sfx " %0, %2\n"                          \
> > >               "       bne  %0, %z3, 1f\n"                             \
> > > -             "       sc" sc_sfx " %1, %z4, %2\n"                     \
> > > +             "       sc" sc_cas_sfx " %1, %z4, %2\n"                 \
> > >               "       bnez %1, 0b\n"                                  \
> > >               append                                                  \
> >
> > This would probably be a good place to use inline ALTERNATIVE instead of an asm
> > goto. It saves overall code size, and a jump in the non-Zacas case, at the cost
> > of 3 nops in the Zacas case. (And all the nops can go after the amocas, where
> > they will likely be hidden by the amocas latency.)
>
> That's what Conor proposed indeed.
>
> I have just given it a try, but it does not work. The number of
> instructions in the zacas asm inline is different in the case of the
> fully-ordered version so I cannot set a unique number of nops. I could
> pass this information from arch_cmpxchg() down to __arch_cmpxchg() but
> those macros are already complex enough so I'd rather not add another
> parameter.
>
> This suggestion unfortunately cannot be applied to
> __arch_cmpxchg_masked(), nor __arch_xchg_masked().
>
> So unless you and Conor really insist, I'll drop the idea!

Or I can pass a nop when the full barrier is not needed, and it works!
I'll probably keep this version then, since it avoids introducing new
macros or using a static branch to work around the LLVM bug.
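
For illustration only, the size constraint behind this can be modelled
on the host in plain C. This is a toy model, not kernel code: it assumes
that both sides of an alternative must occupy the same number of
instruction slots, and the helpers nops_needed(), variant_len() and
check_fixed_padding() are hypothetical names that exist in no tree. The
4-versus-1 count below is simply lr/bne/sc/bnez against a single amocas,
which matches the "3 nops" mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Nops needed so that the shorter side matches the longer one. */
static size_t nops_needed(size_t old_insns, size_t new_insns)
{
	return old_insns > new_insns ? old_insns - new_insns : 0;
}

/*
 * Length of one variant: 'body' instructions plus one slot that holds
 * either a barrier or, if pad_slot is set, a nop standing in for it.
 * (Assumption: at most one optional barrier instruction per variant.)
 */
static size_t variant_len(size_t body, int has_barrier, int pad_slot)
{
	return body + ((has_barrier || pad_slot) ? 1 : 0);
}

/* A single nop count fits only if every variant has the same length. */
static void check_fixed_padding(size_t body)
{
	assert(variant_len(body, 1, 0) == variant_len(body, 0, 1));
}
```

In this model, a variant without the barrier is one slot shorter unless
a nop is passed in its place, which is why a unique nop count only works
once the placeholder is there.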

>
> Thanks,
>
> Alex
>
>
> >
> > Regards,
> > Samuel
> >
> > >               "1:\n"                                                  \
> > >               : "=&r" (r), "=&r" (__rc), "+A" (*(p))                  \
> > >               : "rJ" (co o), "rJ" (n)                                 \
> > >               : "memory");                                            \
> > > +                                                                     \
> > > +end:;                                                                        \
> > >  })
> > >
> > >  #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)                \
> > > @@ -156,7 +176,7 @@
> > >       __typeof__(ptr) __ptr = (ptr);                                  \
> > >       __typeof__(*(__ptr)) __old = (old);                             \
> > >       __typeof__(*(__ptr)) __new = (new);                             \
> > > -     __typeof__(*(__ptr)) __ret;                                     \
> > > +     __typeof__(*(__ptr)) __ret = (old);                             \
> > >                                                                       \
> > >       switch (sizeof(*__ptr)) {                                       \
> > >       case 1:                                                         \
> >
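
On the "__ret = (old)" hunk above: amocas uses its destination register
both as the value to compare against and as the place where the loaded
memory value is returned, which is why the "+&r" (r) operand has to hold
the old value on entry. C11's atomic_compare_exchange_strong() has the
same in/out shape, so a portable host-side sketch could look like the
following. This is not the kernel macro; cmpxchg_sketch() and its
parameter names are made up for the example.

```c
#include <stdatomic.h>

/*
 * Returns the value that was in *p before the operation, whether or
 * not the exchange happened.
 */
static unsigned int cmpxchg_sketch(_Atomic unsigned int *p,
				   unsigned int old_val,
				   unsigned int new_val)
{
	/* Mirrors "__ret = (old)": seed the in/out operand. */
	unsigned int ret = old_val;

	/*
	 * On success *p becomes new_val and ret is left untouched (it
	 * already equals the previous value). On failure ret is
	 * overwritten with the current value of *p.
	 */
	atomic_compare_exchange_strong(p, &ret, new_val);

	return ret;
}
```

A caller checks for success the usual way, by comparing the return value
with the old value it passed in.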


end of thread, other threads:[~2024-07-19 11:53 UTC | newest]

Thread overview: 40+ messages
2024-07-17  6:19 [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 01/11] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
2024-07-17 15:08   ` Andrew Jones
2024-07-17 15:18     ` Alexandre Ghiti
2024-07-19  0:45   ` Samuel Holland
2024-07-19 11:48     ` Alexandre Ghiti
2024-07-19 11:53       ` Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 02/11] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
2024-07-17  6:42   ` Krzysztof Kozlowski
2024-07-17  9:32   ` Guo Ren
2024-07-17  6:19 ` [PATCH v3 03/11] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
2024-07-17 15:26   ` Andrew Jones
2024-07-17 15:29     ` Conor Dooley
2024-07-17 15:34       ` Alexandre Ghiti
2024-07-18 12:50     ` Alexandre Ghiti
2024-07-18 16:06       ` Andrew Jones
2024-07-18 16:20         ` Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 04/11] riscv: Improve zacas fully-ordered cmpxchg() Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 05/11] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
2024-07-17 20:34   ` Andrew Jones
2024-07-18  7:48     ` Alexandre Ghiti
2024-07-18  8:33       ` Conor Dooley
2024-07-18  9:35         ` Arnd Bergmann
2024-07-17  6:19 ` [PATCH v3 06/11] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 07/11] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 08/11] asm-generic: ticket-lock: Add separate ticket-lock.h Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 09/11] riscv: Add ISA extension parsing for Ziccrse Alexandre Ghiti
2024-07-19  0:53   ` Samuel Holland
2024-07-19  9:11     ` Alexandre Ghiti
2024-07-17  6:19 ` [PATCH v3 10/11] dt-bindings: riscv: Add Ziccrse ISA extension description Alexandre Ghiti
2024-07-17  6:55   ` Krzysztof Kozlowski
2024-07-17  9:42   ` Guo Ren
2024-07-17  6:19 ` [PATCH v3 11/11] riscv: Add qspinlock support Alexandre Ghiti
2024-07-17  9:30   ` Guo Ren
2024-07-18 13:11     ` Alexandre Ghiti
2024-07-17 16:29   ` Andrea Parri
2024-07-18 13:08     ` Alexandre Ghiti
2024-07-19  1:05   ` Samuel Holland
2024-07-19  9:06     ` Alexandre Ghiti
2024-07-17 16:37 ` [PATCH v3 00/11] Zacas/Zabha support and qspinlocks Andrea Parri
