* [PATCH v2 00/10] Zacas/Zabha support and qspinlocks
@ 2024-06-26 13:03 Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

This implements the [cmp]xchgXX() macros using the Zacas and Zabha
extensions and then uses those newly introduced macros to add support
for qspinlocks. Note that this implementation of qspinlocks satisfies
the forward-progress guarantee.
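
As a rough illustration of the direction of the series (this is not one
of the kernel macros; the function name is made up for the example), a
fully ordered 32-bit compare-and-swap collapses from an LR/SC retry loop
into a single Zacas instruction:

#include <linux/types.h>

/* Illustrative sketch only: 'old' holds the expected value on entry and
 * the value found in memory on return (success iff they are equal).
 */
static inline u32 zacas_cmpxchg32_sketch(u32 *p, u32 old, u32 new)
{
	__asm__ __volatile__ (
		"	amocas.w.aqrl %0, %z2, %1\n"
		: "+&r" (old), "+A" (*p)
		: "rJ" (new)
		: "memory");
	return old;
}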

Thanks to Guo and Leonardo for their work!

Changes in v2:
- Add patch for Zabha dtbinding (Conor)
- Fix cmpxchg128() build warnings missed in v1
- Make arch_cmpxchg128() fully ordered
- Improve Kconfig help texts for both extensions (Conor)
- Fix Makefile dependencies by requiring TOOLCHAIN_HAS_XXX (Nathan)
- Fix compilation errors when the toolchain does not support the
  extensions (Nathan)
- Fix C23 warnings about labels at the end of compound statements (Nathan)
- Fix Zabha and !Zacas configurations (Andrea)
- Add COMBO spinlocks (Guo)
- Improve amocas fully ordered operations by using .aqrl semantics and
  removing the fence rw, rw (Andrea)
- Rebase on top of "riscv: Fix fully ordered LR/SC xchg[8|16]() implementations"
- Add ARCH_WEAK_RELEASE_ACQUIRE (Andrea)
- Remove the extension version in -march for LLVM since it is only required
  for experimental extensions (Nathan)
- Fix cmpxchg128() implementation by adding both registers of a pair
  in the list of input/output operands

Alexandre Ghiti (8):
  riscv: Implement cmpxchg32/64() using Zacas
  dt-bindings: riscv: Add Zabha ISA extension description
  riscv: Implement cmpxchg8/16() using Zabha
  riscv: Improve amocas.X use in cmpxchg()
  riscv: Implement arch_cmpxchg128() using Zacas
  riscv: Implement xchg8/16() using Zabha
  riscv: Improve amoswap.X use in xchg()
  riscv: Add qspinlock support based on Zabha extension

Guo Ren (2):
  asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
  asm-generic: ticket-lock: Add separate ticket-lock.h

 .../devicetree/bindings/riscv/extensions.yaml |   6 +
 .../locking/queued-spinlocks/arch-support.txt |   2 +-
 arch/riscv/Kconfig                            |  45 +++++
 arch/riscv/Makefile                           |   6 +
 arch/riscv/include/asm/Kbuild                 |   4 +-
 arch/riscv/include/asm/cmpxchg.h              | 188 ++++++++++++++----
 arch/riscv/include/asm/hwcap.h                |   1 +
 arch/riscv/include/asm/spinlock.h             |  39 ++++
 arch/riscv/kernel/cpufeature.c                |   1 +
 arch/riscv/kernel/setup.c                     |  21 ++
 include/asm-generic/qspinlock.h               |   2 +
 include/asm-generic/spinlock.h                |  87 +-------
 include/asm-generic/spinlock_types.h          |  12 +-
 include/asm-generic/ticket_spinlock.h         | 105 ++++++++++
 14 files changed, 385 insertions(+), 134 deletions(-)
 create mode 100644 arch/riscv/include/asm/spinlock.h
 create mode 100644 include/asm-generic/ticket_spinlock.h

-- 
2.39.2



* [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-27 11:06   ` Andrea Parri
  2024-07-04  3:38   ` kernel test robot
  2024-06-26 13:03 ` [PATCH v2 02/10] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

This adds runtime support for Zacas in cmpxchg operations.
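
The runtime selection relies on an asm-goto alternative: a "nop" is
patched into a jump when the CPU reports Zacas, so the amocas path is
taken instead of the LR/SC loop. In sketch form (simplified from the
__arch_cmpxchg() macro below):

	/* Runtime dispatch sketch, simplified from the macro below */
	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {
		asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,
				     RISCV_ISA_EXT_ZACAS, 1)
			 : : : : zacas);
	}
	/* ... LR/SC based cmpxchg ... */
	goto end;
zacas:
	/* ... amocas based cmpxchg ... */
end:;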

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/Kconfig               | 17 +++++++++++++++++
 arch/riscv/Makefile              |  3 +++
 arch/riscv/include/asm/cmpxchg.h | 27 ++++++++++++++++++++++++---
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 05ccba8ca33a..1caaedec88c7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
 	  preemption. Enabling this config will result in higher memory
 	  consumption due to the allocation of per-task's kernel Vector context.
 
+config TOOLCHAIN_HAS_ZACAS
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zacas)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zacas)
+	depends on AS_HAS_OPTION_ARCH
+
+config RISCV_ISA_ZACAS
+	bool "Zacas extension support for atomic CAS"
+	depends on TOOLCHAIN_HAS_ZACAS
+	default y
+	help
+	  Enable the use of the Zacas ISA-extension to implement kernel atomic
+	  cmpxchg operations when it is detected at boot.
+
+	  If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZBB
 	bool
 	default y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 06de9d365088..9fd13d7a9cc6 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -85,6 +85,9 @@ endif
 # Check if the toolchain supports Zihintpause extension
 riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
 
+# Check if the toolchain supports Zacas
+riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
+
 # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
 # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
 KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 808b4c78462e..a58a2141c6d3 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -9,6 +9,7 @@
 #include <linux/bug.h>
 
 #include <asm/fence.h>
+#include <asm/alternative.h>
 
 #define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
 ({									\
@@ -134,21 +135,41 @@
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
 })
 
-#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
+#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
 ({									\
+	__label__ zacas, end;						\
 	register unsigned int __rc;					\
 									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
+		asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,		\
+				     RISCV_ISA_EXT_ZACAS, 1)		\
+			 : : : : zacas);				\
+	}								\
+									\
 	__asm__ __volatile__ (						\
 		prepend							\
 		"0:	lr" lr_sfx " %0, %2\n"				\
 		"	bne  %0, %z3, 1f\n"				\
-		"	sc" sc_sfx " %1, %z4, %2\n"			\
+		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
 		"	bnez %1, 0b\n"					\
 		append							\
 		"1:\n"							\
 		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
 		: "rJ" (co o), "rJ" (n)					\
 		: "memory");						\
+	goto end;							\
+									\
+zacas:									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
+		__asm__ __volatile__ (					\
+			prepend						\
+			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
+			append						\
+			: "+&r" (r), "+A" (*(p))			\
+			: "rJ" (n)					\
+			: "memory");					\
+	}								\
+end:;									\
 })
 
 #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
@@ -156,7 +177,7 @@
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(*(__ptr)) __old = (old);				\
 	__typeof__(*(__ptr)) __new = (new);				\
-	__typeof__(*(__ptr)) __ret;					\
+	__typeof__(*(__ptr)) __ret = (old);				\
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
-- 
2.39.2



* [PATCH v2 02/10] dt-bindings: riscv: Add Zabha ISA extension description
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-26 14:20   ` Krzysztof Kozlowski
  2024-06-26 13:03 ` [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

Add description for the Zabha ISA extension which was ratified in April
2024.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 468c646247aa..e6436260bdeb 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -171,6 +171,12 @@ properties:
             memory types as ratified in the 20191213 version of the privileged
             ISA specification.
 
+        - const: zabha
+          description: |
+            The Zabha extension for Byte and Halfword Atomic Memory Operations
+            as ratified at commit 49f49c842ff9 ("Update to Rafified state") of
+            riscv-zabha.
+
         - const: zacas
           description: |
             The Zacas extension for Atomic Compare-and-Swap (CAS) instructions
-- 
2.39.2



* [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 02/10] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-27 11:53   ` Andrea Parri
  2024-06-26 13:03 ` [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg() Alexandre Ghiti
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

This adds runtime support for Zabha in cmpxchg8/16() operations.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/Kconfig               | 17 ++++++++++++++++
 arch/riscv/Makefile              |  3 +++
 arch/riscv/include/asm/cmpxchg.h | 34 ++++++++++++++++++++++++++++++--
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 1caaedec88c7..d3b0f92f92da 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -596,6 +596,23 @@ config RISCV_ISA_V_PREEMPTIVE
 	  preemption. Enabling this config will result in higher memory
 	  consumption due to the allocation of per-task's kernel Vector context.
 
+config TOOLCHAIN_HAS_ZABHA
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zabha)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zabha)
+	depends on AS_HAS_OPTION_ARCH
+
+config RISCV_ISA_ZABHA
+	bool "Zabha extension support for atomic byte/halfword operations"
+	depends on TOOLCHAIN_HAS_ZABHA
+	default y
+	help
+	  Enable the use of the Zabha ISA-extension to implement kernel
+	  byte/halfword atomic memory operations when it is detected at boot.
+
+	  If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZACAS
 	bool
 	default y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 9fd13d7a9cc6..78dcaaeebf4e 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -88,6 +88,9 @@ riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZIHINTPAUSE) := $(riscv-march-y)_zihintpause
 # Check if the toolchain supports Zacas
 riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZACAS) := $(riscv-march-y)_zacas
 
+# Check if the toolchain supports Zabha
+riscv-march-$(CONFIG_TOOLCHAIN_HAS_ZABHA) := $(riscv-march-y)_zabha
+
 # Remove F,D,V from isa string for all. Keep extensions between "fd" and "v" by
 # matching non-v and non-multi-letter extensions out with the filter ([^v_]*)
 KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)fd([^v_]*)v?/\1\2/')
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index a58a2141c6d3..b9a3fdcec919 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -105,8 +105,20 @@
  * indicated by comparing RETURN with OLD.
  */
 
-#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
+#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
 ({									\
+	__label__ no_zacas, zabha, end;					\
+									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
+		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
+				     RISCV_ISA_EXT_ZACAS, 1)		\
+			 : : : : no_zacas);				\
+		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
+				     RISCV_ISA_EXT_ZABHA, 1)		\
+			 : : : : zabha);				\
+	}								\
+									\
+no_zacas:;								\
 	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
 	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
 	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
@@ -133,6 +145,19 @@
 		: "memory");						\
 									\
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
+	goto end;							\
+									\
+zabha:									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
+		__asm__ __volatile__ (					\
+			prepend						\
+			"	amocas" cas_sfx " %0, %z2, %1\n"	\
+			append						\
+			: "+&r" (r), "+A" (*(p))			\
+			: "rJ" (n)					\
+			: "memory");					\
+	}								\
+end:;									\
 })
 
 #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
@@ -181,8 +206,13 @@ end:;									\
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
+		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
+					prepend, append,		\
+					__ret, __ptr, __old, __new);    \
+		break;							\
 	case 2:								\
-		__arch_cmpxchg_masked(sc_sfx, prepend, append,		\
+		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
+					prepend, append,		\
 					__ret, __ptr, __old, __new);	\
 		break;							\
 	case 4:								\
-- 
2.39.2



* [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg()
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (2 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-27 13:31   ` Andrea Parri
  2024-06-26 13:03 ` [PATCH v2 05/10] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti, Andrea Parri

cmpxchg() uses amocas.X instructions from Zacas and Zabha but still uses
the LR/SC acquire/release semantics, which require barriers.

Let's improve that by using proper amocas acquire/release semantics in
order to avoid those barriers.
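
For reference, the resulting ordering annotations on the amocas path,
derived from the arch_cmpxchg*() definitions updated below, are:

	/*
	 * amocas path after this patch (.b/.h variants require Zabha):
	 *
	 *   arch_cmpxchg_relaxed()  ->  amocas.{b,h,w,d}
	 *   arch_cmpxchg_acquire()  ->  amocas.{b,h,w,d}.aq
	 *   arch_cmpxchg_release()  ->  amocas.{b,h,w,d}.rl
	 *   arch_cmpxchg()          ->  amocas.{b,h,w,d}.aqrl
	 *
	 * The LR/SC fallback keeps its explicit RISCV_ACQUIRE_BARRIER /
	 * RISCV_RELEASE_BARRIER / RISCV_FULL_BARRIER around the loop
	 * (the sc_prepend/sc_append arguments).
	 */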

Suggested-by: Andrea Parri <andrea@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/cmpxchg.h | 60 ++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index b9a3fdcec919..3c65b00a0d36 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -105,7 +105,9 @@
  * indicated by comparing RETURN with OLD.
  */
 
-#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
+#define __arch_cmpxchg_masked(sc_sfx, cas_sfx,				\
+			      sc_prepend, sc_append,			\
+			      r, p, o, n)				\
 ({									\
 	__label__ no_zacas, zabha, end;					\
 									\
@@ -129,7 +131,7 @@ no_zacas:;								\
 	ulong __rc;							\
 									\
 	__asm__ __volatile__ (						\
-		prepend							\
+		sc_prepend							\
 		"0:	lr.w %0, %2\n"					\
 		"	and  %1, %0, %z5\n"				\
 		"	bne  %1, %z3, 1f\n"				\
@@ -137,7 +139,7 @@ no_zacas:;								\
 		"	or   %1, %1, %z4\n"				\
 		"	sc.w" sc_sfx " %1, %1, %2\n"			\
 		"	bnez %1, 0b\n"					\
-		append							\
+		sc_append							\
 		"1:\n"							\
 		: "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
 		: "rJ" ((long)__oldx), "rJ" (__newx),			\
@@ -150,9 +152,7 @@ no_zacas:;								\
 zabha:									\
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
 		__asm__ __volatile__ (					\
-			prepend						\
 			"	amocas" cas_sfx " %0, %z2, %1\n"	\
-			append						\
 			: "+&r" (r), "+A" (*(p))			\
 			: "rJ" (n)					\
 			: "memory");					\
@@ -160,7 +160,9 @@ zabha:									\
 end:;									\
 })
 
-#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
+#define __arch_cmpxchg(lr_sfx, sc_sfx, cas_sfx,				\
+		       sc_prepend, sc_append,				\
+		       r, p, co, o, n)					\
 ({									\
 	__label__ zacas, end;						\
 	register unsigned int __rc;					\
@@ -172,12 +174,12 @@ end:;									\
 	}								\
 									\
 	__asm__ __volatile__ (						\
-		prepend							\
+		sc_prepend							\
 		"0:	lr" lr_sfx " %0, %2\n"				\
 		"	bne  %0, %z3, 1f\n"				\
-		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
+		"	sc" sc_sfx " %1, %z4, %2\n"			\
 		"	bnez %1, 0b\n"					\
-		append							\
+		sc_append							\
 		"1:\n"							\
 		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
 		: "rJ" (co o), "rJ" (n)					\
@@ -187,9 +189,7 @@ end:;									\
 zacas:									\
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
 		__asm__ __volatile__ (					\
-			prepend						\
-			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
-			append						\
+			"	amocas" cas_sfx " %0, %z2, %1\n"	\
 			: "+&r" (r), "+A" (*(p))			\
 			: "rJ" (n)					\
 			: "memory");					\
@@ -197,7 +197,8 @@ zacas:									\
 end:;									\
 })
 
-#define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
+#define _arch_cmpxchg(ptr, old, new, sc_sfx, cas_sfx,			\
+		      sc_prepend, sc_append)				\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(*(__ptr)) __old = (old);				\
@@ -206,22 +207,24 @@ end:;									\
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
-		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
-					prepend, append,		\
-					__ret, __ptr, __old, __new);    \
+		__arch_cmpxchg_masked(sc_sfx, ".b" cas_sfx,		\
+				      sc_prepend, sc_append,		\
+				      __ret, __ptr, __old, __new);	\
 		break;							\
 	case 2:								\
-		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
-					prepend, append,		\
-					__ret, __ptr, __old, __new);	\
+		__arch_cmpxchg_masked(sc_sfx, ".h" cas_sfx,		\
+				      sc_prepend, sc_append,		\
+				      __ret, __ptr, __old, __new);	\
 		break;							\
 	case 4:								\
-		__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,	\
-				__ret, __ptr, (long), __old, __new);	\
+		__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx,		\
+			       sc_prepend, sc_append,			\
+			       __ret, __ptr, (long), __old, __new);	\
 		break;							\
 	case 8:								\
-		__arch_cmpxchg(".d", ".d" sc_sfx, prepend, append,	\
-				__ret, __ptr, /**/, __old, __new);	\
+		__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx,		\
+			       sc_prepend, sc_append,			\
+			       __ret, __ptr, /**/, __old, __new);	\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
@@ -230,16 +233,19 @@ end:;									\
 })
 
 #define arch_cmpxchg_relaxed(ptr, o, n)					\
-	_arch_cmpxchg((ptr), (o), (n), "", "", "")
+	_arch_cmpxchg((ptr), (o), (n), "", "", "", "")
 
 #define arch_cmpxchg_acquire(ptr, o, n)					\
-	_arch_cmpxchg((ptr), (o), (n), "", "", RISCV_ACQUIRE_BARRIER)
+	_arch_cmpxchg((ptr), (o), (n), "", ".aq",			\
+		      "", RISCV_ACQUIRE_BARRIER)
 
 #define arch_cmpxchg_release(ptr, o, n)					\
-	_arch_cmpxchg((ptr), (o), (n), "", RISCV_RELEASE_BARRIER, "")
+	_arch_cmpxchg((ptr), (o), (n), "", ".rl",			\
+		      RISCV_RELEASE_BARRIER, "")
 
 #define arch_cmpxchg(ptr, o, n)						\
-	_arch_cmpxchg((ptr), (o), (n), ".rl", "", "	fence rw, rw\n")
+	_arch_cmpxchg((ptr), (o), (n), ".rl", ".aqrl",			\
+		      "", RISCV_FULL_BARRIER)
 
 #define arch_cmpxchg_local(ptr, o, n)					\
 	arch_cmpxchg_relaxed((ptr), (o), (n))
-- 
2.39.2



* [PATCH v2 05/10] riscv: Implement arch_cmpxchg128() using Zacas
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (3 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg() Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

Now that Zacas is supported in the kernel, let's use the double-word
atomic version of amocas to improve the SLUB allocator.

Note that we have to use fixed registers: otherwise gcc fails to pick
the even-numbered registers required for the amocas.q register pairs
and then produces a reserved encoding which fails to assemble.
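
A hypothetical caller could look like the following sketch (assuming
CONFIG_RISCV_ISA_ZACAS=y; the function and variable names are made up
for the example, the real beneficiary being the SLUB allocator
mentioned above):

	/* Illustrative only: bump a naturally aligned 128-bit counter */
	static bool bump128(u128 *counter)
	{
		u128 old, new;

		if (!system_has_cmpxchg128())
			return false;

		old = *counter;
		new = old + 1;

		/* Fully ordered 128-bit CAS, amocas.q.aqrl underneath */
		return arch_cmpxchg128(counter, old, new) == old;
	}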

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/Kconfig               |  1 +
 arch/riscv/include/asm/cmpxchg.h | 39 ++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d3b0f92f92da..0bbaec0444d0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -104,6 +104,7 @@ config RISCV
 	select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO
 	select HARDIRQS_SW_RESEND
 	select HAS_IOPORT if MMU
+	select HAVE_ALIGNED_STRUCT_PAGE
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_HUGE_VMAP if MMU && 64BIT
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 3c65b00a0d36..da42f32ea53d 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -280,4 +280,43 @@ end:;									\
 	arch_cmpxchg_release((ptr), (o), (n));				\
 })
 
+#ifdef CONFIG_RISCV_ISA_ZACAS
+
+#define system_has_cmpxchg128()						\
+			riscv_has_extension_unlikely(RISCV_ISA_EXT_ZACAS)
+
+union __u128_halves {
+	u128 full;
+	struct {
+		u64 low, high;
+	};
+};
+
+#define __arch_cmpxchg128(p, o, n, cas_sfx)					\
+({										\
+	__typeof__(*(p)) __o = (o);						\
+	union __u128_halves __hn = { .full = (n) };				\
+	union __u128_halves __ho = { .full = (__o) };				\
+	register unsigned long x6 asm ("x6") = __hn.low;			\
+	register unsigned long x7 asm ("x7") = __hn.high;			\
+	register unsigned long x28 asm ("x28") = __ho.low;			\
+	register unsigned long x29 asm ("x29") = __ho.high;			\
+										\
+	__asm__ __volatile__ (							\
+		"	amocas.q" cas_sfx " %0, %z3, %2"			\
+		: "+&r" (x28), "+&r" (x29), "+A" (*(p))				\
+		: "rJ" (x6), "rJ" (x7)						\
+		: "memory");							\
+										\
+	((u128)x29 << 64) | x28;						\
+})
+
+#define arch_cmpxchg128(ptr, o, n)						\
+	__arch_cmpxchg128((ptr), (o), (n), ".aqrl")
+
+#define arch_cmpxchg128_local(ptr, o, n)					\
+	__arch_cmpxchg128((ptr), (o), (n), "")
+
+#endif /* CONFIG_RISCV_ISA_ZACAS */
+
 #endif /* _ASM_RISCV_CMPXCHG_H */
-- 
2.39.2



* [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (4 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 05/10] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-27 13:45   ` Andrea Parri
  2024-07-10  1:37   ` Guo Ren
  2024-06-26 13:03 ` [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg() Alexandre Ghiti
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

This adds runtime support for Zabha in xchg8/16() operations.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/cmpxchg.h | 33 +++++++++++++++++++++++++++++---
 arch/riscv/include/asm/hwcap.h   |  1 +
 arch/riscv/kernel/cpufeature.c   |  1 +
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index da42f32ea53d..eb35e2d30a97 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,8 +11,17 @@
 #include <asm/fence.h>
 #include <asm/alternative.h>
 
-#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
+#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
+			   swap_append, r, p, n)			\
 ({									\
+	__label__ zabha, end;						\
+									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
+		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
+				     RISCV_ISA_EXT_ZABHA, 1)		\
+			 : : : : zabha);				\
+	}								\
+									\
 	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
 	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
 	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
@@ -28,12 +37,25 @@
 	       "	or   %1, %1, %z3\n"				\
 	       "	sc.w" sc_sfx " %1, %1, %2\n"			\
 	       "	bnez %1, 0b\n"					\
-	       append							\
+	       sc_append							\
 	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
 	       : "rJ" (__newx), "rJ" (~__mask)				\
 	       : "memory");						\
 									\
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
+	goto end;							\
+									\
+zabha:									\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
+		__asm__ __volatile__ (					\
+			prepend						\
+			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
+			swap_append						\
+			: "=&r" (r), "+A" (*(p))			\
+			: "rJ" (n)					\
+			: "memory");					\
+	}								\
+end:;									\
 })
 
 #define __arch_xchg(sfx, prepend, append, r, p, n)			\
@@ -56,8 +78,13 @@
 									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
+		__arch_xchg_masked(sc_sfx, ".b" swap_sfx,		\
+				   prepend, sc_append, swap_append,	\
+				   __ret, __ptr, __new);		\
+		break;							\
 	case 2:								\
-		__arch_xchg_masked(sc_sfx, prepend, sc_append,		\
+		__arch_xchg_masked(sc_sfx, ".h" swap_sfx,		\
+				   prepend, sc_append, swap_append,	\
 				   __ret, __ptr, __new);		\
 		break;							\
 	case 4:								\
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index e17d0078a651..f71ddd2ca163 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -81,6 +81,7 @@
 #define RISCV_ISA_EXT_ZTSO		72
 #define RISCV_ISA_EXT_ZACAS		73
 #define RISCV_ISA_EXT_XANDESPMU		74
+#define RISCV_ISA_EXT_ZABHA		75
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 5ef48cb20ee1..c125d82c894b 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
 	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
+	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
 	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
 	__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
 	__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
-- 
2.39.2



* [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg()
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (5 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-27 13:58   ` Andrea Parri
  2024-06-26 13:03 ` [PATCH v2 08/10] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock Alexandre Ghiti
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti, Andrea Parri

xchg() uses amoswap.X instructions from Zabha but still uses the LR/SC
acquire/release semantics, which require barriers.

Let's improve that by using proper amoswap acquire/release semantics in
order to avoid those barriers.
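
For reference, the resulting ordering annotations on the amoswap path,
derived from the arch_xchg*() definitions updated below, are:

	/*
	 * amoswap path after this patch (.b/.h variants require Zabha):
	 *
	 *   arch_xchg_relaxed()  ->  amoswap.{b,h,w,d}
	 *   arch_xchg_acquire()  ->  amoswap.{b,h,w,d}.aq
	 *   arch_xchg_release()  ->  amoswap.{b,h,w,d}.rl
	 *   arch_xchg()          ->  amoswap.{b,h,w,d}.aqrl
	 *
	 * The LR/SC fallback for the byte/halfword case keeps its explicit
	 * barriers (the sc_prepend/sc_append arguments).
	 */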

Suggested-by: Andrea Parri <andrea@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/cmpxchg.h | 35 +++++++++++++-------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index eb35e2d30a97..0e57d5fbf227 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,8 +11,8 @@
 #include <asm/fence.h>
 #include <asm/alternative.h>
 
-#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
-			   swap_append, r, p, n)			\
+#define __arch_xchg_masked(sc_sfx, swap_sfx, sc_prepend, sc_append,	\
+			   r, p, n)					\
 ({									\
 	__label__ zabha, end;						\
 									\
@@ -31,7 +31,7 @@
 	ulong __rc;							\
 									\
 	__asm__ __volatile__ (						\
-	       prepend							\
+	       sc_prepend							\
 	       "0:	lr.w %0, %2\n"					\
 	       "	and  %1, %0, %z4\n"				\
 	       "	or   %1, %1, %z3\n"				\
@@ -48,9 +48,7 @@
 zabha:									\
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
 		__asm__ __volatile__ (					\
-			prepend						\
 			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
-			swap_append						\
 			: "=&r" (r), "+A" (*(p))			\
 			: "rJ" (n)					\
 			: "memory");					\
@@ -58,19 +56,17 @@ zabha:									\
 end:;									\
 })
 
-#define __arch_xchg(sfx, prepend, append, r, p, n)			\
+#define __arch_xchg(sfx, r, p, n)					\
 ({									\
 	__asm__ __volatile__ (						\
-		prepend							\
 		"	amoswap" sfx " %0, %2, %1\n"			\
-		append							\
 		: "=r" (r), "+A" (*(p))					\
 		: "r" (n)						\
 		: "memory");						\
 })
 
-#define _arch_xchg(ptr, new, sc_sfx, swap_sfx, prepend,			\
-		   sc_append, swap_append)				\
+#define _arch_xchg(ptr, new, sc_sfx, swap_sfx,				\
+		   sc_prepend, sc_append)				\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(*(__ptr)) __new = (new);				\
@@ -79,21 +75,19 @@ end:;									\
 	switch (sizeof(*__ptr)) {					\
 	case 1:								\
 		__arch_xchg_masked(sc_sfx, ".b" swap_sfx,		\
-				   prepend, sc_append, swap_append,	\
+				   sc_prepend, sc_append,		\
 				   __ret, __ptr, __new);		\
 		break;							\
 	case 2:								\
 		__arch_xchg_masked(sc_sfx, ".h" swap_sfx,		\
-				   prepend, sc_append, swap_append,	\
+				   sc_prepend, sc_append,		\
 				   __ret, __ptr, __new);		\
 		break;							\
 	case 4:								\
-		__arch_xchg(".w" swap_sfx, prepend, swap_append,	\
-			      __ret, __ptr, __new);			\
+		__arch_xchg(".w" swap_sfx,  __ret, __ptr, __new);	\
 		break;							\
 	case 8:								\
-		__arch_xchg(".d" swap_sfx, prepend, swap_append,	\
-			      __ret, __ptr, __new);			\
+		__arch_xchg(".d" swap_sfx,  __ret, __ptr, __new);	\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
@@ -102,17 +96,16 @@ end:;									\
 })
 
 #define arch_xchg_relaxed(ptr, x)					\
-	_arch_xchg(ptr, x, "", "", "", "", "")
+	_arch_xchg(ptr, x, "", "", "", "")
 
 #define arch_xchg_acquire(ptr, x)					\
-	_arch_xchg(ptr, x, "", "", "",					\
-		   RISCV_ACQUIRE_BARRIER, RISCV_ACQUIRE_BARRIER)
+	_arch_xchg(ptr, x, "", ".aq", "", RISCV_ACQUIRE_BARRIER)
 
 #define arch_xchg_release(ptr, x)					\
-	_arch_xchg(ptr, x, "", "", RISCV_RELEASE_BARRIER, "", "")
+	_arch_xchg(ptr, x, "", ".rl", RISCV_RELEASE_BARRIER, "")
 
 #define arch_xchg(ptr, x)						\
-	_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER, "")
+	_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER)
 
 #define xchg32(ptr, x)							\
 ({									\
-- 
2.39.2



* [PATCH v2 08/10] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (6 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg() Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 09/10] asm-generic: ticket-lock: Add separate ticket-lock.h Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension Alexandre Ghiti
  9 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

The qspinlock arch_spinlock_t already contains the atomic_t val, which
satisfies the ticket-lock requirement. Thus, unify arch_spinlock_t into
qspinlock_types.h. This prepares for the combo spinlock introduced later
in this series.
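
For context, a simplified sketch of the shared lock word after this
change (the real definition in asm-generic/qspinlock_types.h also
exposes locked/pending/tail sub-fields):

	/* Simplified: both lock flavours now operate on the same word */
	typedef struct qspinlock {
		atomic_t val;
	} arch_spinlock_t;

	/*
	 * The ticket lock reuses the 32-bit val: the "next" ticket lives
	 * in the upper 16 bits and the "now serving" ticket in the lower
	 * 16 bits, hence the lock->val accesses introduced below.
	 */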

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
---
 include/asm-generic/spinlock.h       | 14 +++++++-------
 include/asm-generic/spinlock_types.h | 12 ++----------
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 90803a826ba0..4773334ee638 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -32,7 +32,7 @@
 
 static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-	u32 val = atomic_fetch_add(1<<16, lock);
+	u32 val = atomic_fetch_add(1<<16, &lock->val);
 	u16 ticket = val >> 16;
 
 	if (ticket == (u16)val)
@@ -46,31 +46,31 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 	 * have no outstanding writes due to the atomic_fetch_add() the extra
 	 * orderings are free.
 	 */
-	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
+	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
 	smp_mb();
 }
 
 static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
 {
-	u32 old = atomic_read(lock);
+	u32 old = atomic_read(&lock->val);
 
 	if ((old >> 16) != (old & 0xffff))
 		return false;
 
-	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
+	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
 }
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
-	u32 val = atomic_read(lock);
+	u32 val = atomic_read(&lock->val);
 
 	smp_store_release(ptr, (u16)val + 1);
 }
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
-	u32 val = lock.counter;
+	u32 val = lock.val.counter;
 
 	return ((val >> 16) == (val & 0xffff));
 }
@@ -84,7 +84,7 @@ static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
 
 static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
-	u32 val = atomic_read(lock);
+	u32 val = atomic_read(&lock->val);
 
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }
diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
index 8962bb730945..f534aa5de394 100644
--- a/include/asm-generic/spinlock_types.h
+++ b/include/asm-generic/spinlock_types.h
@@ -3,15 +3,7 @@
 #ifndef __ASM_GENERIC_SPINLOCK_TYPES_H
 #define __ASM_GENERIC_SPINLOCK_TYPES_H
 
-#include <linux/types.h>
-typedef atomic_t arch_spinlock_t;
-
-/*
- * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
- * include.
- */
-#include <asm/qrwlock_types.h>
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	ATOMIC_INIT(0)
+#include <asm-generic/qspinlock_types.h>
+#include <asm-generic/qrwlock_types.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_TYPES_H */
-- 
2.39.2



* [PATCH v2 09/10] asm-generic: ticket-lock: Add separate ticket-lock.h
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (7 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 08/10] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-26 13:03 ` [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension Alexandre Ghiti
  9 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Add a separate ticket_spinlock.h so that multiple spinlock versions can
be included and one of them selected at compile time or runtime.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 include/asm-generic/spinlock.h        |  87 +---------------------
 include/asm-generic/ticket_spinlock.h | 103 ++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 86 deletions(-)
 create mode 100644 include/asm-generic/ticket_spinlock.h

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 4773334ee638..970590baf61b 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -1,94 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
-/*
- * 'Generic' ticket-lock implementation.
- *
- * It relies on atomic_fetch_add() having well defined forward progress
- * guarantees under contention. If your architecture cannot provide this, stick
- * to a test-and-set lock.
- *
- * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
- * sub-word of the value. This is generally true for anything LL/SC although
- * you'd be hard pressed to find anything useful in architecture specifications
- * about this. If your architecture cannot do this you might be better off with
- * a test-and-set.
- *
- * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
- * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
- * a full fence after the spin to upgrade the otherwise-RCpc
- * atomic_cond_read_acquire().
- *
- * The implementation uses smp_cond_load_acquire() to spin, so if the
- * architecture has WFE like instructions to sleep instead of poll for word
- * modifications be sure to implement that (see ARM64 for example).
- *
- */
-
 #ifndef __ASM_GENERIC_SPINLOCK_H
 #define __ASM_GENERIC_SPINLOCK_H
 
-#include <linux/atomic.h>
-#include <asm-generic/spinlock_types.h>
-
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	u32 val = atomic_fetch_add(1<<16, &lock->val);
-	u16 ticket = val >> 16;
-
-	if (ticket == (u16)val)
-		return;
-
-	/*
-	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
-	 * custom cond_read_rcsc() here we just emit a full fence.  We only
-	 * need the prior reads before subsequent writes ordering from
-	 * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
-	 * have no outstanding writes due to the atomic_fetch_add() the extra
-	 * orderings are free.
-	 */
-	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
-	smp_mb();
-}
-
-static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
-{
-	u32 old = atomic_read(&lock->val);
-
-	if ((old >> 16) != (old & 0xffff))
-		return false;
-
-	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
-	u32 val = atomic_read(&lock->val);
-
-	smp_store_release(ptr, (u16)val + 1);
-}
-
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	u32 val = lock.val.counter;
-
-	return ((val >> 16) == (val & 0xffff));
-}
-
-static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	arch_spinlock_t val = READ_ONCE(*lock);
-
-	return !arch_spin_value_unlocked(val);
-}
-
-static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	u32 val = atomic_read(&lock->val);
-
-	return (s16)((val >> 16) - (val & 0xffff)) > 1;
-}
-
+#include <asm-generic/ticket_spinlock.h>
 #include <asm/qrwlock.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
new file mode 100644
index 000000000000..cfcff22b37b3
--- /dev/null
+++ b/include/asm-generic/ticket_spinlock.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * 'Generic' ticket-lock implementation.
+ *
+ * It relies on atomic_fetch_add() having well defined forward progress
+ * guarantees under contention. If your architecture cannot provide this, stick
+ * to a test-and-set lock.
+ *
+ * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
+ * sub-word of the value. This is generally true for anything LL/SC although
+ * you'd be hard pressed to find anything useful in architecture specifications
+ * about this. If your architecture cannot do this you might be better off with
+ * a test-and-set.
+ *
+ * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
+ * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
+ * a full fence after the spin to upgrade the otherwise-RCpc
+ * atomic_cond_read_acquire().
+ *
+ * The implementation uses smp_cond_load_acquire() to spin, so if the
+ * architecture has WFE like instructions to sleep instead of poll for word
+ * modifications be sure to implement that (see ARM64 for example).
+ *
+ */
+
+#ifndef __ASM_GENERIC_TICKET_SPINLOCK_H
+#define __ASM_GENERIC_TICKET_SPINLOCK_H
+
+#include <linux/atomic.h>
+#include <asm-generic/spinlock_types.h>
+
+static __always_inline void ticket_spin_lock(arch_spinlock_t *lock)
+{
+	u32 val = atomic_fetch_add(1<<16, &lock->val);
+	u16 ticket = val >> 16;
+
+	if (ticket == (u16)val)
+		return;
+
+	/*
+	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
+	 * custom cond_read_rcsc() here we just emit a full fence.  We only
+	 * need the prior reads before subsequent writes ordering from
+	 * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
+	 * have no outstanding writes due to the atomic_fetch_add() the extra
+	 * orderings are free.
+	 */
+	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
+	smp_mb();
+}
+
+static __always_inline bool ticket_spin_trylock(arch_spinlock_t *lock)
+{
+	u32 old = atomic_read(&lock->val);
+
+	if ((old >> 16) != (old & 0xffff))
+		return false;
+
+	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
+}
+
+static __always_inline void ticket_spin_unlock(arch_spinlock_t *lock)
+{
+	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+	u32 val = atomic_read(&lock->val);
+
+	smp_store_release(ptr, (u16)val + 1);
+}
+
+static __always_inline int ticket_spin_value_unlocked(arch_spinlock_t lock)
+{
+	u32 val = lock.val.counter;
+
+	return ((val >> 16) == (val & 0xffff));
+}
+
+static __always_inline int ticket_spin_is_locked(arch_spinlock_t *lock)
+{
+	arch_spinlock_t val = READ_ONCE(*lock);
+
+	return !ticket_spin_value_unlocked(val);
+}
+
+static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
+{
+	u32 val = atomic_read(&lock->val);
+
+	return (s16)((val >> 16) - (val & 0xffff)) > 1;
+}
+
+/*
+ * Remapping spinlock architecture specific functions to the corresponding
+ * ticket spinlock functions.
+ */
+#define arch_spin_is_locked(l)		ticket_spin_is_locked(l)
+#define arch_spin_is_contended(l)	ticket_spin_is_contended(l)
+#define arch_spin_value_unlocked(l)	ticket_spin_value_unlocked(l)
+#define arch_spin_lock(l)		ticket_spin_lock(l)
+#define arch_spin_trylock(l)		ticket_spin_trylock(l)
+#define arch_spin_unlock(l)		ticket_spin_unlock(l)
+
+#endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
-- 
2.39.2



* [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
                   ` (8 preceding siblings ...)
  2024-06-26 13:03 ` [PATCH v2 09/10] asm-generic: ticket-lock: Add separate ticket-lock.h Alexandre Ghiti
@ 2024-06-26 13:03 ` Alexandre Ghiti
  2024-06-27 15:19   ` Andrea Parri
  2024-07-07  2:20   ` Guo Ren
  9 siblings, 2 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-06-26 13:03 UTC (permalink / raw)
  To: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: Alexandre Ghiti

In order to produce a generic kernel, a user can select
CONFIG_RISCV_COMBO_SPINLOCKS, which falls back at runtime to the ticket
spinlock implementation if the Zabha or Zacas extension is not present.

Note that we can't use alternatives here because the discovery of
extensions is done too late, and we need to start with the qspinlock
implementation because the ticket spinlock implementation would pollute
the spinlock value; so let's use static keys.
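
With CONFIG_RISCV_COMBO_SPINLOCKS=y, each lock operation then expands
roughly as follows (this is just the SPINLOCK_BASE_DECLARE() expansion
from the hunk below, shown for arch_spin_lock()):

	/* The static key, flipped once in riscv_spinlock_init() depending
	 * on Zacas/Zabha support, selects the implementation at runtime.
	 */
	static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
	{
		if (static_branch_unlikely(&qspinlock_key))
			return queued_spin_lock(lock);
		return ticket_spin_lock(lock);
	}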

This is largely based on Guo's work and Leonardo's reviews at [1].

Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 .../locking/queued-spinlocks/arch-support.txt |  2 +-
 arch/riscv/Kconfig                            | 10 +++++
 arch/riscv/include/asm/Kbuild                 |  4 +-
 arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
 arch/riscv/kernel/setup.c                     | 21 ++++++++++
 include/asm-generic/qspinlock.h               |  2 +
 include/asm-generic/ticket_spinlock.h         |  2 +
 7 files changed, 78 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/include/asm/spinlock.h

diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
index 22f2990392ff..cf26042480e2 100644
--- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
@@ -20,7 +20,7 @@
     |    openrisc: |  ok  |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: | TODO |
     |          sh: | TODO |
     |       sparc: |  ok  |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 0bbaec0444d0..c2ba673e58ca 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -72,6 +72,7 @@ config RISCV
 	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
 	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
 	select BUILDTIME_TABLE_SORT if MMU
 	select CLINT_TIMER if RISCV_M_MODE
@@ -482,6 +483,15 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accommodate various tables.
 
+config RISCV_COMBO_SPINLOCKS
+	bool "Using combo spinlock"
+	depends on SMP && MMU && TOOLCHAIN_HAS_ZABHA
+	select ARCH_USE_QUEUED_SPINLOCKS
+	default y
+	help
+	  Embed both queued spinlock and ticket lock so that the spinlock
+	  implementation can be chosen at runtime.
+
 config RISCV_ALTERNATIVE
 	bool
 	depends on !XIP_KERNEL
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 504f8b7e72d4..ad72f2bd4cc9 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -2,10 +2,12 @@
 generic-y += early_ioremap.h
 generic-y += flat.h
 generic-y += kvm_para.h
+generic-y += mcs_spinlock.h
 generic-y += parport.h
-generic-y += spinlock.h
 generic-y += spinlock_types.h
+generic-y += ticket_spinlock.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
+generic-y += qspinlock.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
new file mode 100644
index 000000000000..4856d50006f2
--- /dev/null
+++ b/arch/riscv/include/asm/spinlock.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RISCV_SPINLOCK_H
+#define __ASM_RISCV_SPINLOCK_H
+
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+#define _Q_PENDING_LOOPS	(1 << 9)
+
+#define __no_arch_spinlock_redefine
+#include <asm/ticket_spinlock.h>
+#include <asm/qspinlock.h>
+#include <asm/alternative.h>
+
+DECLARE_STATIC_KEY_TRUE(qspinlock_key);
+
+#define SPINLOCK_BASE_DECLARE(op, type, type_lock)			\
+static __always_inline type arch_spin_##op(type_lock lock)		\
+{									\
+	if (static_branch_unlikely(&qspinlock_key))			\
+		return queued_spin_##op(lock);				\
+	return ticket_spin_##op(lock);					\
+}
+
+SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
+
+#else
+
+#include <asm/ticket_spinlock.h>
+
+#endif
+
+#include <asm/qrwlock.h>
+
+#endif /* __ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 4f73c0ae44b2..929bccd4c9e5 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -244,6 +244,26 @@ static void __init parse_dtb(void)
 #endif
 }
 
+DEFINE_STATIC_KEY_TRUE(qspinlock_key);
+EXPORT_SYMBOL(qspinlock_key);
+
+static void __init riscv_spinlock_init(void)
+{
+	asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
+		 : : : : no_zacas);
+	asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
+		 : : : : qspinlock);
+
+no_zacas:
+	static_branch_disable(&qspinlock_key);
+	pr_info("Ticket spinlock: enabled\n");
+
+	return;
+
+qspinlock:
+	pr_info("Queued spinlock: enabled\n");
+}
+
 extern void __init init_rt_signal_env(void);
 
 void __init setup_arch(char **cmdline_p)
@@ -295,6 +315,7 @@ void __init setup_arch(char **cmdline_p)
 	riscv_set_dma_cache_alignment();
 
 	riscv_user_isa_enable();
+	riscv_spinlock_init();
 }
 
 bool arch_cpu_is_hotpluggable(int cpu)
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 0655aa5b57b2..bf47cca2c375 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 }
 #endif
 
+#ifndef __no_arch_spinlock_redefine
 /*
  * Remapping spinlock architecture specific functions to the corresponding
  * queued spinlock functions.
@@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 #define arch_spin_lock(l)		queued_spin_lock(l)
 #define arch_spin_trylock(l)		queued_spin_trylock(l)
 #define arch_spin_unlock(l)		queued_spin_unlock(l)
+#endif
 
 #endif /* __ASM_GENERIC_QSPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
index cfcff22b37b3..325779970d8a 100644
--- a/include/asm-generic/ticket_spinlock.h
+++ b/include/asm-generic/ticket_spinlock.h
@@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }
 
+#ifndef __no_arch_spinlock_redefine
 /*
  * Remapping spinlock architecture specific functions to the corresponding
  * ticket spinlock functions.
@@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
 #define arch_spin_lock(l)		ticket_spin_lock(l)
 #define arch_spin_trylock(l)		ticket_spin_trylock(l)
 #define arch_spin_unlock(l)		ticket_spin_unlock(l)
+#endif
 
 #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
-- 
2.39.2



* Re: [PATCH v2 02/10] dt-bindings: riscv: Add Zabha ISA extension description
  2024-06-26 13:03 ` [PATCH v2 02/10] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
@ 2024-06-26 14:20   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2024-06-26 14:20 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Nathan Chancellor, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

On 26/06/2024 15:03, Alexandre Ghiti wrote:
> Add description for the Zabha ISA extension which was ratified in April
> 2024.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

<form letter>
Please use scripts/get_maintainer.pl to get a list of necessary people
and lists to CC. It might happen that the command, when run on an older
kernel, gives you outdated entries. Therefore please be sure you base
your patches on a recent Linux kernel.

Tools like b4 or scripts/get_maintainer.pl provide you proper list of
people, so fix your workflow. Tools might also fail if you work on some
ancient tree (don't, instead use mainline), work on fork of kernel
(don't, instead use mainline) or you ignore some maintainers (really
don't). Just use b4 and everything should be fine, although remember
about `b4 prep --auto-to-cc` if you added new patches to the patchset.

You missed at least devicetree list (maybe more), so this won't be
tested by automated tooling. Performing review on untested code might be
a waste of time.

Please kindly resend and include all necessary To/Cc entries.
</form letter>

Best regards,
Krzysztof



* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-06-26 13:03 ` [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
@ 2024-06-27 11:06   ` Andrea Parri
  2024-07-04 16:25     ` Alexandre Ghiti
  2024-07-04  3:38   ` kernel test robot
  1 sibling, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-06-27 11:06 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

> -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
> +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
>  ({									\
> +	__label__ zacas, end;						\
>  	register unsigned int __rc;					\
>  									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,		\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : zacas);				\
> +	}								\
> +									\
>  	__asm__ __volatile__ (						\
>  		prepend							\
>  		"0:	lr" lr_sfx " %0, %2\n"				\
>  		"	bne  %0, %z3, 1f\n"				\
> -		"	sc" sc_sfx " %1, %z4, %2\n"			\
> +		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
>  		"	bnez %1, 0b\n"					\
>  		append							\
>  		"1:\n"							\
>  		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
>  		: "rJ" (co o), "rJ" (n)					\
>  		: "memory");						\
> +	goto end;							\
> +									\
> +zacas:									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		__asm__ __volatile__ (					\
> +			prepend						\
> +			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
> +			append						\
> +			: "+&r" (r), "+A" (*(p))			\
> +			: "rJ" (n)					\
> +			: "memory");					\
> +	}								\

Is this second IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check actually needed?
(just wondering - no real objection)


> +end:;									\

Why the semicolon?


>  })
>  
>  #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
> @@ -156,7 +177,7 @@
>  	__typeof__(ptr) __ptr = (ptr);					\
>  	__typeof__(*(__ptr)) __old = (old);				\
>  	__typeof__(*(__ptr)) __new = (new);				\
> -	__typeof__(*(__ptr)) __ret;					\
> +	__typeof__(*(__ptr)) __ret = (old);				\

This is because the compiler doesn't realize __ret is actually
initialized, right?  IAC, seems a bit unexpected to initialize
with (old) (which indicates SUCCESS of the CMPXCHG operation);
how about using (new) for the initialization of __ret instead?
would (new) still work for you?

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha
  2024-06-26 13:03 ` [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
@ 2024-06-27 11:53   ` Andrea Parri
  2024-06-29 19:19     ` Andrea Parri
  2024-07-04 16:36     ` Alexandre Ghiti
  0 siblings, 2 replies; 45+ messages in thread
From: Andrea Parri @ 2024-06-27 11:53 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

> -#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
>  ({									\
> +	__label__ no_zacas, zabha, end;					\
> +									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : no_zacas);				\
> +		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
> +				     RISCV_ISA_EXT_ZABHA, 1)		\
> +			 : : : : zabha);				\
> +	}								\
> +									\
> +no_zacas:;								\
>  	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>  	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>  	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
> @@ -133,6 +145,19 @@
>  		: "memory");						\
>  									\
>  	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
> +	goto end;							\
> +									\
> +zabha:									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> +		__asm__ __volatile__ (					\
> +			prepend						\
> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> +			append						\
> +			: "+&r" (r), "+A" (*(p))			\
> +			: "rJ" (n)					\
> +			: "memory");					\
> +	}								\
> +end:;									\
>  })

I admit that I found this all quite difficult to read; IIUC, this is
missing an IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check.  How about adding
such a check under the zabha: label (replacing/in place of the second
IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) check) and moving the corresponding
asm goto statement there, perhaps as follows? (on top of this patch)

Also, the patch presents the first occurrence of RISCV_ISA_EXT_ZABHA;
perhaps worth moving the hwcap/cpufeature changes from patch #6 here?

  Andrea

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index b9a3fdcec919..3c913afec150 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -110,15 +110,12 @@
 	__label__ no_zacas, zabha, end;					\
 									\
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
-		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
-				     RISCV_ISA_EXT_ZACAS, 1)		\
-			 : : : : no_zacas);				\
 		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
 				     RISCV_ISA_EXT_ZABHA, 1)		\
 			 : : : : zabha);				\
 	}								\
 									\
-no_zacas:;								\
+no_zacas:								\
 	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
 	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
 	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
@@ -148,16 +145,20 @@ no_zacas:;								\
 	goto end;							\
 									\
 zabha:									\
-	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
-		__asm__ __volatile__ (					\
-			prepend						\
-			"	amocas" cas_sfx " %0, %z2, %1\n"	\
-			append						\
-			: "+&r" (r), "+A" (*(p))			\
-			: "rJ" (n)					\
-			: "memory");					\
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
+		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
+				     RISCV_ISA_EXT_ZACAS, 1)		\
+			 : : : : no_zacas);				\
 	}								\
-end:;									\
+									\
+	__asm__ __volatile__ (						\
+		prepend							\
+		"	amocas" cas_sfx " %0, %z2, %1\n"		\
+		append							\
+		: "+&r" (r), "+A" (*(p))				\
+		: "rJ" (n)						\
+		: "memory");						\
+end:									\
 })
 
 #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg()
  2024-06-26 13:03 ` [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg() Alexandre Ghiti
@ 2024-06-27 13:31   ` Andrea Parri
  2024-07-04 16:40     ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-06-27 13:31 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch, Andrea Parri

On Wed, Jun 26, 2024 at 03:03:41PM +0200, Alexandre Ghiti wrote:
> cmpxchg() uses amocas.X instructions from Zacas and Zabha but still uses
> the LR/SC acquire/release semantics which require barriers.
> 
> Let's improve that by using proper amocas acquire/release semantics in
> order to avoid any of those barriers.

I can't really parse this changelog...


> Suggested-by: Andrea Parri <andrea@rivosinc.com>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/include/asm/cmpxchg.h | 60 ++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index b9a3fdcec919..3c65b00a0d36 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -105,7 +105,9 @@
>   * indicated by comparing RETURN with OLD.
>   */
>  
> -#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx,				\
> +			      sc_prepend, sc_append,			\
> +			      r, p, o, n)				\
>  ({									\
>  	__label__ no_zacas, zabha, end;					\
>  									\
> @@ -129,7 +131,7 @@ no_zacas:;								\
>  	ulong __rc;							\
>  									\
>  	__asm__ __volatile__ (						\
> -		prepend							\
> +		sc_prepend							\
>  		"0:	lr.w %0, %2\n"					\
>  		"	and  %1, %0, %z5\n"				\
>  		"	bne  %1, %z3, 1f\n"				\
> @@ -137,7 +139,7 @@ no_zacas:;								\
>  		"	or   %1, %1, %z4\n"				\
>  		"	sc.w" sc_sfx " %1, %1, %2\n"			\
>  		"	bnez %1, 0b\n"					\
> -		append							\
> +		sc_append							\
>  		"1:\n"							\
>  		: "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
>  		: "rJ" ((long)__oldx), "rJ" (__newx),			\
> @@ -150,9 +152,7 @@ no_zacas:;								\
>  zabha:									\
>  	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>  		__asm__ __volatile__ (					\
> -			prepend						\
>  			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> -			append						\
>  			: "+&r" (r), "+A" (*(p))			\
>  			: "rJ" (n)					\
>  			: "memory");					\
> @@ -160,7 +160,9 @@ zabha:									\
>  end:;									\
>  })
>  
> -#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
> +#define __arch_cmpxchg(lr_sfx, sc_sfx, cas_sfx,				\
> +		       sc_prepend, sc_append,				\
> +		       r, p, co, o, n)					\
>  ({									\
>  	__label__ zacas, end;						\
>  	register unsigned int __rc;					\
> @@ -172,12 +174,12 @@ end:;									\
>  	}								\
>  									\
>  	__asm__ __volatile__ (						\
> -		prepend							\
> +		sc_prepend							\
>  		"0:	lr" lr_sfx " %0, %2\n"				\
>  		"	bne  %0, %z3, 1f\n"				\
> -		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
> +		"	sc" sc_sfx " %1, %z4, %2\n"			\
>  		"	bnez %1, 0b\n"					\
> -		append							\
> +		sc_append							\
>  		"1:\n"							\
>  		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
>  		: "rJ" (co o), "rJ" (n)					\
> @@ -187,9 +189,7 @@ end:;									\
>  zacas:									\
>  	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
>  		__asm__ __volatile__ (					\
> -			prepend						\
> -			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
> -			append						\
> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
>  			: "+&r" (r), "+A" (*(p))			\
>  			: "rJ" (n)					\
>  			: "memory");					\
> @@ -197,7 +197,8 @@ zacas:									\
>  end:;									\
>  })
>  
> -#define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
> +#define _arch_cmpxchg(ptr, old, new, sc_sfx, cas_sfx,			\
> +		      sc_prepend, sc_append)				\
>  ({									\
>  	__typeof__(ptr) __ptr = (ptr);					\
>  	__typeof__(*(__ptr)) __old = (old);				\
> @@ -206,22 +207,24 @@ end:;									\
>  									\
>  	switch (sizeof(*__ptr)) {					\
>  	case 1:								\
> -		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
> -					prepend, append,		\
> -					__ret, __ptr, __old, __new);    \
> +		__arch_cmpxchg_masked(sc_sfx, ".b" cas_sfx,		\
> +				      sc_prepend, sc_append,		\
> +				      __ret, __ptr, __old, __new);	\
>  		break;							\
>  	case 2:								\
> -		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
> -					prepend, append,		\
> -					__ret, __ptr, __old, __new);	\
> +		__arch_cmpxchg_masked(sc_sfx, ".h" cas_sfx,		\
> +				      sc_prepend, sc_append,		\
> +				      __ret, __ptr, __old, __new);	\
>  		break;							\
>  	case 4:								\
> -		__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,	\
> -				__ret, __ptr, (long), __old, __new);	\
> +		__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx,		\
> +			       sc_prepend, sc_append,			\
> +			       __ret, __ptr, (long), __old, __new);	\
>  		break;							\
>  	case 8:								\
> -		__arch_cmpxchg(".d", ".d" sc_sfx, prepend, append,	\
> -				__ret, __ptr, /**/, __old, __new);	\
> +		__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx,		\
> +			       sc_prepend, sc_append,			\
> +			       __ret, __ptr, /**/, __old, __new);	\
>  		break;							\
>  	default:							\
>  		BUILD_BUG();						\
> @@ -230,16 +233,19 @@ end:;									\
>  })
>  
>  #define arch_cmpxchg_relaxed(ptr, o, n)					\
> -	_arch_cmpxchg((ptr), (o), (n), "", "", "")
> +	_arch_cmpxchg((ptr), (o), (n), "", "", "", "")
>  
>  #define arch_cmpxchg_acquire(ptr, o, n)					\
> -	_arch_cmpxchg((ptr), (o), (n), "", "", RISCV_ACQUIRE_BARRIER)
> +	_arch_cmpxchg((ptr), (o), (n), "", ".aq",			\
> +		      "", RISCV_ACQUIRE_BARRIER)
>  
>  #define arch_cmpxchg_release(ptr, o, n)					\
> -	_arch_cmpxchg((ptr), (o), (n), "", RISCV_RELEASE_BARRIER, "")
> +	_arch_cmpxchg((ptr), (o), (n), "", ".rl",			\
> +		      RISCV_RELEASE_BARRIER, "")
>  
>  #define arch_cmpxchg(ptr, o, n)						\
> -	_arch_cmpxchg((ptr), (o), (n), ".rl", "", "	fence rw, rw\n")
> +	_arch_cmpxchg((ptr), (o), (n), ".rl", ".aqrl",			\
> +		      "", RISCV_FULL_BARRIER)

... but this is not what I suggested: my suggestion [1] was about (limited
to) the fully-ordered macro arch_cmpxchg().  In fact, I've recently raised
some concern about similar changes to the acquire/release macros, cf. [2].

Any particular reasons for doing this?

  Andrea

[1] https://lore.kernel.org/lkml/ZlYff9x12FICHoP0@andrea/
[2] https://lore.kernel.org/lkml/20240505123340.38495-1-puranjay@kernel.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha
  2024-06-26 13:03 ` [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
@ 2024-06-27 13:45   ` Andrea Parri
  2024-07-04 17:25     ` Alexandre Ghiti
  2024-07-10  1:37   ` Guo Ren
  1 sibling, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-06-27 13:45 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

> -#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
> +#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
> +			   swap_append, r, p, n)			\
>  ({									\
> +	__label__ zabha, end;						\
> +									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> +		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
> +				     RISCV_ISA_EXT_ZABHA, 1)		\
> +			 : : : : zabha);				\
> +	}								\
> +									\
>  	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>  	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>  	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
> @@ -28,12 +37,25 @@
>  	       "	or   %1, %1, %z3\n"				\
>  	       "	sc.w" sc_sfx " %1, %1, %2\n"			\
>  	       "	bnez %1, 0b\n"					\
> -	       append							\
> +	       sc_append							\
>  	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
>  	       : "rJ" (__newx), "rJ" (~__mask)				\
>  	       : "memory");						\
>  									\
>  	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
> +	goto end;							\
> +									\
> +zabha:									\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> +		__asm__ __volatile__ (					\
> +			prepend						\
> +			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
> +			swap_append						\
> +			: "=&r" (r), "+A" (*(p))			\
> +			: "rJ" (n)					\
> +			: "memory");					\
> +	}								\
> +end:;									\
>  })

As for patch #1: why the semicolon? and should the second IS_ENABLED()
be kept?


> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index e17d0078a651..f71ddd2ca163 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -81,6 +81,7 @@
>  #define RISCV_ISA_EXT_ZTSO		72
>  #define RISCV_ISA_EXT_ZACAS		73
>  #define RISCV_ISA_EXT_XANDESPMU		74
> +#define RISCV_ISA_EXT_ZABHA		75
>  
>  #define RISCV_ISA_EXT_XLINUXENVCFG	127
>  
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 5ef48cb20ee1..c125d82c894b 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>  	__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
>  	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
> +	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
>  	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
>  	__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
>  	__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),

To be squashed into patch #3?

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg()
  2024-06-26 13:03 ` [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg() Alexandre Ghiti
@ 2024-06-27 13:58   ` Andrea Parri
  2024-07-04 17:26     ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-06-27 13:58 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch, Andrea Parri

On Wed, Jun 26, 2024 at 03:03:44PM +0200, Alexandre Ghiti wrote:
> xchg() uses amoswap.X instructions from Zabha but still uses
> the LR/SC acquire/release semantics which require barriers.
> 
> Let's improve that by using proper amoswap acquire/release semantics in
> order to avoid any of those barriers.
> 
> Suggested-by: Andrea Parri <andrea@rivosinc.com>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/include/asm/cmpxchg.h | 35 +++++++++++++-------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index eb35e2d30a97..0e57d5fbf227 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -11,8 +11,8 @@
>  #include <asm/fence.h>
>  #include <asm/alternative.h>
>  
> -#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
> -			   swap_append, r, p, n)			\
> +#define __arch_xchg_masked(sc_sfx, swap_sfx, sc_prepend, sc_append,	\
> +			   r, p, n)					\
>  ({									\
>  	__label__ zabha, end;						\
>  									\
> @@ -31,7 +31,7 @@
>  	ulong __rc;							\
>  									\
>  	__asm__ __volatile__ (						\
> -	       prepend							\
> +	       sc_prepend							\
>  	       "0:	lr.w %0, %2\n"					\
>  	       "	and  %1, %0, %z4\n"				\
>  	       "	or   %1, %1, %z3\n"				\
> @@ -48,9 +48,7 @@
>  zabha:									\
>  	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>  		__asm__ __volatile__ (					\
> -			prepend						\
>  			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
> -			swap_append						\
>  			: "=&r" (r), "+A" (*(p))			\
>  			: "rJ" (n)					\
>  			: "memory");					\
> @@ -58,19 +56,17 @@ zabha:									\
>  end:;									\
>  })
>  
> -#define __arch_xchg(sfx, prepend, append, r, p, n)			\
> +#define __arch_xchg(sfx, r, p, n)					\
>  ({									\
>  	__asm__ __volatile__ (						\
> -		prepend							\
>  		"	amoswap" sfx " %0, %2, %1\n"			\
> -		append							\
>  		: "=r" (r), "+A" (*(p))					\
>  		: "r" (n)						\
>  		: "memory");						\
>  })
>  
> -#define _arch_xchg(ptr, new, sc_sfx, swap_sfx, prepend,			\
> -		   sc_append, swap_append)				\
> +#define _arch_xchg(ptr, new, sc_sfx, swap_sfx,				\
> +		   sc_prepend, sc_append)				\
>  ({									\
>  	__typeof__(ptr) __ptr = (ptr);					\
>  	__typeof__(*(__ptr)) __new = (new);				\
> @@ -79,21 +75,19 @@ end:;									\
>  	switch (sizeof(*__ptr)) {					\
>  	case 1:								\
>  		__arch_xchg_masked(sc_sfx, ".b" swap_sfx,		\
> -				   prepend, sc_append, swap_append,	\
> +				   sc_prepend, sc_append,		\
>  				   __ret, __ptr, __new);		\
>  		break;							\
>  	case 2:								\
>  		__arch_xchg_masked(sc_sfx, ".h" swap_sfx,		\
> -				   prepend, sc_append, swap_append,	\
> +				   sc_prepend, sc_append,		\
>  				   __ret, __ptr, __new);		\
>  		break;							\
>  	case 4:								\
> -		__arch_xchg(".w" swap_sfx, prepend, swap_append,	\
> -			      __ret, __ptr, __new);			\
> +		__arch_xchg(".w" swap_sfx,  __ret, __ptr, __new);	\
>  		break;							\
>  	case 8:								\
> -		__arch_xchg(".d" swap_sfx, prepend, swap_append,	\
> -			      __ret, __ptr, __new);			\
> +		__arch_xchg(".d" swap_sfx,  __ret, __ptr, __new);	\
>  		break;							\
>  	default:							\
>  		BUILD_BUG();						\
> @@ -102,17 +96,16 @@ end:;									\
>  })
>  
>  #define arch_xchg_relaxed(ptr, x)					\
> -	_arch_xchg(ptr, x, "", "", "", "", "")
> +	_arch_xchg(ptr, x, "", "", "", "")
>  
>  #define arch_xchg_acquire(ptr, x)					\
> -	_arch_xchg(ptr, x, "", "", "",					\
> -		   RISCV_ACQUIRE_BARRIER, RISCV_ACQUIRE_BARRIER)
> +	_arch_xchg(ptr, x, "", ".aq", "", RISCV_ACQUIRE_BARRIER)
>  
>  #define arch_xchg_release(ptr, x)					\
> -	_arch_xchg(ptr, x, "", "", RISCV_RELEASE_BARRIER, "", "")
> +	_arch_xchg(ptr, x, "", ".rl", RISCV_RELEASE_BARRIER, "")
>  
>  #define arch_xchg(ptr, x)						\
> -	_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER, "")
> +	_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER)

I actually see no reason for this patch; please also see my
remarks/question on patch #4.

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-06-26 13:03 ` [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension Alexandre Ghiti
@ 2024-06-27 15:19   ` Andrea Parri
  2024-07-04 17:33     ` Alexandre Ghiti
  2024-07-07  2:20   ` Guo Ren
  1 sibling, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-06-27 15:19 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

> This is largely based on Guo's work and Leonardo reviews at [1].

Guo, could/should this have your Co-developed-by:/Signed-off-by:?

(disclaimer: I haven't looked at the last three patches of this submission
with due calm and probably won't before ~mid-July...)


> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]

There seems to be a distinct lack of experimental results compared to the
previous/cited submission (and numbers are good to have!!  ;-)). Maybe Guo
or others can provide some, to confirm this is going in the right direction?

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha
  2024-06-27 11:53   ` Andrea Parri
@ 2024-06-29 19:19     ` Andrea Parri
  2024-07-04 16:36     ` Alexandre Ghiti
  1 sibling, 0 replies; 45+ messages in thread
From: Andrea Parri @ 2024-06-29 19:19 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

> I admit that I found this all quite difficult to read; IIUC, this is
> missing an IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check.  How about adding
> such a check under the zabha: label (replacing/in place of the second
> IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) check) and moving the corresponding
> asm goto statement there, perhaps as follows? (on top of this patch)

[...]

> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index b9a3fdcec919..3c913afec150 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -110,15 +110,12 @@
>  	__label__ no_zacas, zabha, end;					\
>  									\
>  	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> -		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> -				     RISCV_ISA_EXT_ZACAS, 1)		\
> -			 : : : : no_zacas);				\
>  		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
>  				     RISCV_ISA_EXT_ZABHA, 1)		\
>  			 : : : : zabha);				\
>  	}								\
>  									\
> -no_zacas:;								\
> +no_zacas:								\
>  	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>  	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>  	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
> @@ -148,16 +145,20 @@ no_zacas:;								\
>  	goto end;							\
>  									\
>  zabha:									\
> -	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> -		__asm__ __volatile__ (					\
> -			prepend						\
> -			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> -			append						\
> -			: "+&r" (r), "+A" (*(p))			\
> -			: "rJ" (n)					\
> -			: "memory");					\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : no_zacas);				\
>  	}								\
> -end:;									\
> +									\
> +	__asm__ __volatile__ (						\
> +		prepend							\
> +		"	amocas" cas_sfx " %0, %z2, %1\n"		\
> +		append							\
> +		: "+&r" (r), "+A" (*(p))				\
> +		: "rJ" (n)						\
> +		: "memory");						\
> +end:									\
>  })
>  
>  #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\

Unfortunately, this diff won't work e.g. when ZABHA is supported and
detected at boot while ZACAS is not supported; more thinking required...

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-06-26 13:03 ` [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
  2024-06-27 11:06   ` Andrea Parri
@ 2024-07-04  3:38   ` kernel test robot
  2024-07-05 17:27     ` Nathan Chancellor
  1 sibling, 1 reply; 45+ messages in thread
From: kernel test robot @ 2024-07-04  3:38 UTC (permalink / raw)
  To: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Nathan Chancellor, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch
  Cc: llvm, oe-kbuild-all, Alexandre Ghiti

Hi Alexandre,

kernel test robot noticed the following build errors:

[auto build test ERROR on soc/for-next]
[also build test ERROR on linus/master v6.10-rc6 next-20240703]
[cannot apply to arnd-asm-generic/master robh/for-next tip/locking/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Implement-cmpxchg32-64-using-Zacas/20240627-034946
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
patch link:    https://lore.kernel.org/r/20240626130347.520750-2-alexghiti%40rivosinc.com
patch subject: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
config: riscv-randconfig-002-20240704 (https://download.01.org/0day-ci/archive/20240704/202407041157.odTZAYZ6-lkp@intel.com/config)
compiler: clang version 16.0.6 (https://github.com/llvm/llvm-project 7cbf1a2591520c2491aa35339f227775f4d3adf6)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240704/202407041157.odTZAYZ6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407041157.odTZAYZ6-lkp@intel.com/

All errors (new ones prefixed by >>):

>> kernel/sched/core.c:11873:7: error: cannot jump from this asm goto statement to one of its possible targets
                   if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
                       ^
   include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
           raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
           ^
   include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
           ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
                  ^
   include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
   #define raw_cmpxchg arch_cmpxchg
                       ^
   arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
           _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
           ^
   arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
                   __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
                   ^
   arch/riscv/include/asm/cmpxchg.h:144:3: note: expanded from macro '__arch_cmpxchg'
                   asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,            \
                   ^
   kernel/sched/core.c:11840:7: note: possible target of asm goto statement
           if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
                ^
   include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
           raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
           ^
   include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
           ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
                  ^
   include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
   #define raw_cmpxchg arch_cmpxchg
                       ^
   arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
           _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
           ^
   arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
                   __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
                   ^
   arch/riscv/include/asm/cmpxchg.h:161:10: note: expanded from macro '__arch_cmpxchg'
                                                                           \
                                                                           ^
   kernel/sched/core.c:11872:2: note: jump exits scope of variable with __attribute__((cleanup))
           scoped_guard (irqsave) {
           ^
   include/linux/cleanup.h:169:20: note: expanded from macro 'scoped_guard'
           for (CLASS(_name, scope)(args),                                 \
                             ^
   kernel/sched/core.c:11840:7: error: cannot jump from this asm goto statement to one of its possible targets
           if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
                ^
   include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
           raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
           ^
   include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
           ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
                  ^
   include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
   #define raw_cmpxchg arch_cmpxchg
                       ^
   arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
           _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
           ^
   arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
                   __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
                   ^
   arch/riscv/include/asm/cmpxchg.h:144:3: note: expanded from macro '__arch_cmpxchg'
                   asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,            \
                   ^
   kernel/sched/core.c:11873:7: note: possible target of asm goto statement
                   if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
                       ^
   include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
           raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
           ^
   include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
           ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
                  ^
   include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
   #define raw_cmpxchg arch_cmpxchg
                       ^
   arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
           _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
           ^
   arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
                   __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
                   ^
   arch/riscv/include/asm/cmpxchg.h:161:10: note: expanded from macro '__arch_cmpxchg'
                                                                           \
                                                                           ^
   kernel/sched/core.c:11872:2: note: jump bypasses initialization of variable with __attribute__((cleanup))
           scoped_guard (irqsave) {
           ^
   include/linux/cleanup.h:169:20: note: expanded from macro 'scoped_guard'
           for (CLASS(_name, scope)(args),                                 \
                             ^
   2 errors generated.


vim +11873 kernel/sched/core.c

223baf9d17f25e Mathieu Desnoyers 2023-04-20  11821  
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11822  static void sched_mm_cid_remote_clear(struct mm_struct *mm, struct mm_cid *pcpu_cid,
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11823  				      int cpu)
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11824  {
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11825  	struct rq *rq = cpu_rq(cpu);
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11826  	struct task_struct *t;
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11827  	int cid, lazy_cid;
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11828  
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11829  	cid = READ_ONCE(pcpu_cid->cid);
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11830  	if (!mm_cid_is_valid(cid))
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11831  		return;
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11832  
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11833  	/*
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11834  	 * Clear the cpu cid if it is set to keep cid allocation compact.  If
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11835  	 * there happens to be other tasks left on the source cpu using this
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11836  	 * mm, the next task using this mm will reallocate its cid on context
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11837  	 * switch.
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11838  	 */
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11839  	lazy_cid = mm_cid_set_lazy_put(cid);
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11840  	if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11841  		return;
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11842  
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11843  	/*
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11844  	 * The implicit barrier after cmpxchg per-mm/cpu cid before loading
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11845  	 * rq->curr->mm matches the scheduler barrier in context_switch()
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11846  	 * between store to rq->curr and load of prev and next task's
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11847  	 * per-mm/cpu cid.
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11848  	 *
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11849  	 * The implicit barrier after cmpxchg per-mm/cpu cid before loading
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11850  	 * rq->curr->mm_cid_active matches the barrier in
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11851  	 * sched_mm_cid_exit_signals(), sched_mm_cid_before_execve(), and
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11852  	 * sched_mm_cid_after_execve() between store to t->mm_cid_active and
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11853  	 * load of per-mm/cpu cid.
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11854  	 */
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11855  
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11856  	/*
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11857  	 * If we observe an active task using the mm on this rq after setting
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11858  	 * the lazy-put flag, that task will be responsible for transitioning
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11859  	 * from lazy-put flag set to MM_CID_UNSET.
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11860  	 */
0e34600ac9317d Peter Zijlstra    2023-06-09  11861  	scoped_guard (rcu) {
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11862  		t = rcu_dereference(rq->curr);
0e34600ac9317d Peter Zijlstra    2023-06-09  11863  		if (READ_ONCE(t->mm_cid_active) && t->mm == mm)
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11864  			return;
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11865  	}
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11866  
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11867  	/*
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11868  	 * The cid is unused, so it can be unset.
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11869  	 * Disable interrupts to keep the window of cid ownership without rq
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11870  	 * lock small.
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11871  	 */
0e34600ac9317d Peter Zijlstra    2023-06-09  11872  	scoped_guard (irqsave) {
223baf9d17f25e Mathieu Desnoyers 2023-04-20 @11873  		if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
223baf9d17f25e Mathieu Desnoyers 2023-04-20  11874  			__mm_cid_put(mm, cid);
0e34600ac9317d Peter Zijlstra    2023-06-09  11875  	}
af7f588d8f7355 Mathieu Desnoyers 2022-11-22  11876  }
af7f588d8f7355 Mathieu Desnoyers 2022-11-22  11877  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-06-27 11:06   ` Andrea Parri
@ 2024-07-04 16:25     ` Alexandre Ghiti
  2024-07-09 23:47       ` Andrea Parri
  0 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-04 16:25 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Andrea,

On 27/06/2024 13:06, Andrea Parri wrote:
>> -#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n)	\
>> +#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
>>   ({									\
>> +	__label__ zacas, end;						\
>>   	register unsigned int __rc;					\
>>   									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
>> +		asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,		\
>> +				     RISCV_ISA_EXT_ZACAS, 1)		\
>> +			 : : : : zacas);				\
>> +	}								\
>> +									\
>>   	__asm__ __volatile__ (						\
>>   		prepend							\
>>   		"0:	lr" lr_sfx " %0, %2\n"				\
>>   		"	bne  %0, %z3, 1f\n"				\
>> -		"	sc" sc_sfx " %1, %z4, %2\n"			\
>> +		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
>>   		"	bnez %1, 0b\n"					\
>>   		append							\
>>   		"1:\n"							\
>>   		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
>>   		: "rJ" (co o), "rJ" (n)					\
>>   		: "memory");						\
>> +	goto end;							\
>> +									\
>> +zacas:									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
>> +		__asm__ __volatile__ (					\
>> +			prepend						\
>> +			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
>> +			append						\
>> +			: "+&r" (r), "+A" (*(p))			\
>> +			: "rJ" (n)					\
>> +			: "memory");					\
>> +	}								\
> Is this second IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check actually needed?
> (just wondering - no real objection)


To me, yes: otherwise a toolchain without Zacas support would fail to
assemble the amocas instruction.
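
To illustrate (this is only my own sketch, not code from the patch):
IS_ENABLED() is a compile-time constant, so the guarded block is
discarded as dead code and the amocas mnemonic never reaches the
assembler when CONFIG_RISCV_ISA_ZACAS=n, along the lines of:

	/* Hypothetical, simplified 32-bit CAS helper, kept only to show
	 * that the asm string is dropped together with the dead branch
	 * when the Kconfig option (and thus toolchain support) is off.
	 */
	static inline u32 cas_w(u32 *p, u32 old, u32 new)
	{
		u32 r = old;

		if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS))
			__asm__ __volatile__ (
				"	amocas.w %0, %z2, %1\n"
				: "+&r" (r), "+A" (*p)
				: "rJ" (new)
				: "memory");

		return r;
	}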


>
>
>> +end:;									\
> Why the semicolon?


That fixes a clang warning reported by Nathan here: 
https://lore.kernel.org/linux-riscv/20240528193110.GA2196855@thelio-3990X/
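
To make the warning concrete, a minimal made-up example (not from the
patch): before C23, a label must be followed by a statement, so a label
that ends a compound statement needs an explicit empty statement, which
is exactly what that ';' provides:

	static void drain(int *n)
	{
		if (*n)
			goto out;
		*n = 1;
	out:;	/* ';' = empty statement; without it, clang warns that a
		 * label at the end of a compound statement is a C23
		 * extension.
		 */
	}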


>
>
>>   })
>>   
>>   #define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
>> @@ -156,7 +177,7 @@
>>   	__typeof__(ptr) __ptr = (ptr);					\
>>   	__typeof__(*(__ptr)) __old = (old);				\
>>   	__typeof__(*(__ptr)) __new = (new);				\
>> -	__typeof__(*(__ptr)) __ret;					\
>> +	__typeof__(*(__ptr)) __ret = (old);				\
> This is because the compiler doesn't realize __ret is actually
> initialized, right?  IAC, seems a bit unexpected to initialize
> with (old) (which indicates SUCCESS of the CMPXCHG operation);
> how about using (new) for the initialization of __ret instead?
> would (new) still work for you?


But the amocas rd register must contain the expected old value in order
to actually work, right?
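
For reference, my reading of the amocas semantics (pseudo-C, roughly
paraphrased from the Zacas specification, so take it as a sketch only):

	/* amocas.w rd, rs2, (rs1) behaves, atomically, like: */
	static inline u32 amocas_w(u32 *rs1, u32 rd /* expected */, u32 rs2 /* new */)
	{
		u32 temp = *rs1;

		if (temp == rd)
			*rs1 = rs2;

		return temp;	/* written back into rd */
	}

So rd is both an input (the expected/old value) and an output (the value
observed in memory), which is why __ret has to start out as (old) and is
passed with the "+&r" constraint.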


>
>    Andrea
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha
  2024-06-27 11:53   ` Andrea Parri
  2024-06-29 19:19     ` Andrea Parri
@ 2024-07-04 16:36     ` Alexandre Ghiti
  2024-07-09 23:51       ` Andrea Parri
  1 sibling, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-04 16:36 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch


On 27/06/2024 13:53, Andrea Parri wrote:
>> -#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n)	\
>> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
>>   ({									\
>> +	__label__ no_zacas, zabha, end;					\
>> +									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
>> +				     RISCV_ISA_EXT_ZACAS, 1)		\
>> +			 : : : : no_zacas);				\
>> +		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
>> +				     RISCV_ISA_EXT_ZABHA, 1)		\
>> +			 : : : : zabha);				\
>> +	}								\
>> +									\
>> +no_zacas:;								\
>>   	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>>   	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>>   	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
>> @@ -133,6 +145,19 @@
>>   		: "memory");						\
>>   									\
>>   	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
>> +	goto end;							\
>> +									\
>> +zabha:									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>> +		__asm__ __volatile__ (					\
>> +			prepend						\
>> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
>> +			append						\
>> +			: "+&r" (r), "+A" (*(p))			\
>> +			: "rJ" (n)					\
>> +			: "memory");					\
>> +	}								\
>> +end:;									\
>>   })
> I admit that I found this all quite difficult to read; IIUC, this is
> missing an IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check.


I'm not sure we need the Zacas check here: we could use a toolchain that
supports Zabha but not Zacas, run this on a platform with both Zabha and
Zacas, and it would still work.


>    How about adding
> such a check under the zabha: label (replacing/in place of the second
> IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) check) and moving the corresponding
> asm goto statement there, perhaps as follows? (on top of this patch)


If that makes things clearer for you, sure, I can do that.


>
> Also, the patch presents the first occurrence of RISCV_ISA_EXT_ZABHA;
> perhaps worth moving the hwcap/cpufeature changes from patch #6 here?
>
>    Andrea
>
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index b9a3fdcec919..3c913afec150 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -110,15 +110,12 @@
>   	__label__ no_zacas, zabha, end;					\
>   									\
>   	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
> -		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> -				     RISCV_ISA_EXT_ZACAS, 1)		\
> -			 : : : : no_zacas);				\
>   		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
>   				     RISCV_ISA_EXT_ZABHA, 1)		\
>   			 : : : : zabha);				\
>   	}								\
>   									\
> -no_zacas:;								\
> +no_zacas:								\
>   	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>   	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>   	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
> @@ -148,16 +145,20 @@ no_zacas:;								\
>   	goto end;							\
>   									\
>   zabha:									\
> -	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\


But we need to keep this check, otherwise it would fail to compile on 
toolchains without Zabha support.


> -		__asm__ __volatile__ (					\
> -			prepend						\
> -			"	amocas" cas_sfx " %0, %z2, %1\n"	\
> -			append						\
> -			: "+&r" (r), "+A" (*(p))			\
> -			: "rJ" (n)					\
> -			: "memory");					\
> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
> +		asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0,		\
> +				     RISCV_ISA_EXT_ZACAS, 1)		\
> +			 : : : : no_zacas);				\
>   	}								\
> -end:;									\
> +									\
> +	__asm__ __volatile__ (						\
> +		prepend							\
> +		"	amocas" cas_sfx " %0, %z2, %1\n"		\
> +		append							\
> +		: "+&r" (r), "+A" (*(p))				\
> +		: "rJ" (n)						\
> +		: "memory");						\
> +end:									\
>   })
>   
>   #define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg()
  2024-06-27 13:31   ` Andrea Parri
@ 2024-07-04 16:40     ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-04 16:40 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch, Andrea Parri

On 27/06/2024 15:31, Andrea Parri wrote:
> On Wed, Jun 26, 2024 at 03:03:41PM +0200, Alexandre Ghiti wrote:
>> cmpxchg() uses amocas.X instructions from Zacas and Zabha but still uses
>> the LR/SC acquire/release semantics which require barriers.
>>
>> Let's improve that by using proper amocas acquire/release semantics in
>> order to avoid any of those barriers.
> I can't really parse this changelog...
>
>
>> Suggested-by: Andrea Parri <andrea@rivosinc.com>
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> ---
>>   arch/riscv/include/asm/cmpxchg.h | 60 ++++++++++++++++++--------------
>>   1 file changed, 33 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
>> index b9a3fdcec919..3c65b00a0d36 100644
>> --- a/arch/riscv/include/asm/cmpxchg.h
>> +++ b/arch/riscv/include/asm/cmpxchg.h
>> @@ -105,7 +105,9 @@
>>    * indicated by comparing RETURN with OLD.
>>    */
>>   
>> -#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, prepend, append, r, p, o, n)	\
>> +#define __arch_cmpxchg_masked(sc_sfx, cas_sfx,				\
>> +			      sc_prepend, sc_append,			\
>> +			      r, p, o, n)				\
>>   ({									\
>>   	__label__ no_zacas, zabha, end;					\
>>   									\
>> @@ -129,7 +131,7 @@ no_zacas:;								\
>>   	ulong __rc;							\
>>   									\
>>   	__asm__ __volatile__ (						\
>> -		prepend							\
>> +		sc_prepend							\
>>   		"0:	lr.w %0, %2\n"					\
>>   		"	and  %1, %0, %z5\n"				\
>>   		"	bne  %1, %z3, 1f\n"				\
>> @@ -137,7 +139,7 @@ no_zacas:;								\
>>   		"	or   %1, %1, %z4\n"				\
>>   		"	sc.w" sc_sfx " %1, %1, %2\n"			\
>>   		"	bnez %1, 0b\n"					\
>> -		append							\
>> +		sc_append							\
>>   		"1:\n"							\
>>   		: "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
>>   		: "rJ" ((long)__oldx), "rJ" (__newx),			\
>> @@ -150,9 +152,7 @@ no_zacas:;								\
>>   zabha:									\
>>   	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>>   		__asm__ __volatile__ (					\
>> -			prepend						\
>>   			"	amocas" cas_sfx " %0, %z2, %1\n"	\
>> -			append						\
>>   			: "+&r" (r), "+A" (*(p))			\
>>   			: "rJ" (n)					\
>>   			: "memory");					\
>> @@ -160,7 +160,9 @@ zabha:									\
>>   end:;									\
>>   })
>>   
>> -#define __arch_cmpxchg(lr_sfx, sc_cas_sfx, prepend, append, r, p, co, o, n)	\
>> +#define __arch_cmpxchg(lr_sfx, sc_sfx, cas_sfx,				\
>> +		       sc_prepend, sc_append,				\
>> +		       r, p, co, o, n)					\
>>   ({									\
>>   	__label__ zacas, end;						\
>>   	register unsigned int __rc;					\
>> @@ -172,12 +174,12 @@ end:;									\
>>   	}								\
>>   									\
>>   	__asm__ __volatile__ (						\
>> -		prepend							\
>> +		sc_prepend							\
>>   		"0:	lr" lr_sfx " %0, %2\n"				\
>>   		"	bne  %0, %z3, 1f\n"				\
>> -		"	sc" sc_cas_sfx " %1, %z4, %2\n"			\
>> +		"	sc" sc_sfx " %1, %z4, %2\n"			\
>>   		"	bnez %1, 0b\n"					\
>> -		append							\
>> +		sc_append							\
>>   		"1:\n"							\
>>   		: "=&r" (r), "=&r" (__rc), "+A" (*(p))			\
>>   		: "rJ" (co o), "rJ" (n)					\
>> @@ -187,9 +189,7 @@ end:;									\
>>   zacas:									\
>>   	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {			\
>>   		__asm__ __volatile__ (					\
>> -			prepend						\
>> -			"	amocas" sc_cas_sfx " %0, %z2, %1\n"	\
>> -			append						\
>> +			"	amocas" cas_sfx " %0, %z2, %1\n"	\
>>   			: "+&r" (r), "+A" (*(p))			\
>>   			: "rJ" (n)					\
>>   			: "memory");					\
>> @@ -197,7 +197,8 @@ zacas:									\
>>   end:;									\
>>   })
>>   
>> -#define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append)		\
>> +#define _arch_cmpxchg(ptr, old, new, sc_sfx, cas_sfx,			\
>> +		      sc_prepend, sc_append)				\
>>   ({									\
>>   	__typeof__(ptr) __ptr = (ptr);					\
>>   	__typeof__(*(__ptr)) __old = (old);				\
>> @@ -206,22 +207,24 @@ end:;									\
>>   									\
>>   	switch (sizeof(*__ptr)) {					\
>>   	case 1:								\
>> -		__arch_cmpxchg_masked(sc_sfx, ".b" sc_sfx,		\
>> -					prepend, append,		\
>> -					__ret, __ptr, __old, __new);    \
>> +		__arch_cmpxchg_masked(sc_sfx, ".b" cas_sfx,		\
>> +				      sc_prepend, sc_append,		\
>> +				      __ret, __ptr, __old, __new);	\
>>   		break;							\
>>   	case 2:								\
>> -		__arch_cmpxchg_masked(sc_sfx, ".h" sc_sfx,		\
>> -					prepend, append,		\
>> -					__ret, __ptr, __old, __new);	\
>> +		__arch_cmpxchg_masked(sc_sfx, ".h" cas_sfx,		\
>> +				      sc_prepend, sc_append,		\
>> +				      __ret, __ptr, __old, __new);	\
>>   		break;							\
>>   	case 4:								\
>> -		__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,	\
>> -				__ret, __ptr, (long), __old, __new);	\
>> +		__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx,		\
>> +			       sc_prepend, sc_append,			\
>> +			       __ret, __ptr, (long), __old, __new);	\
>>   		break;							\
>>   	case 8:								\
>> -		__arch_cmpxchg(".d", ".d" sc_sfx, prepend, append,	\
>> -				__ret, __ptr, /**/, __old, __new);	\
>> +		__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx,		\
>> +			       sc_prepend, sc_append,			\
>> +			       __ret, __ptr, /**/, __old, __new);	\
>>   		break;							\
>>   	default:							\
>>   		BUILD_BUG();						\
>> @@ -230,16 +233,19 @@ end:;									\
>>   })
>>   
>>   #define arch_cmpxchg_relaxed(ptr, o, n)					\
>> -	_arch_cmpxchg((ptr), (o), (n), "", "", "")
>> +	_arch_cmpxchg((ptr), (o), (n), "", "", "", "")
>>   
>>   #define arch_cmpxchg_acquire(ptr, o, n)					\
>> -	_arch_cmpxchg((ptr), (o), (n), "", "", RISCV_ACQUIRE_BARRIER)
>> +	_arch_cmpxchg((ptr), (o), (n), "", ".aq",			\
>> +		      "", RISCV_ACQUIRE_BARRIER)
>>   
>>   #define arch_cmpxchg_release(ptr, o, n)					\
>> -	_arch_cmpxchg((ptr), (o), (n), "", RISCV_RELEASE_BARRIER, "")
>> +	_arch_cmpxchg((ptr), (o), (n), "", ".rl",			\
>> +		      RISCV_RELEASE_BARRIER, "")
>>   
>>   #define arch_cmpxchg(ptr, o, n)						\
>> -	_arch_cmpxchg((ptr), (o), (n), ".rl", "", "	fence rw, rw\n")
>> +	_arch_cmpxchg((ptr), (o), (n), ".rl", ".aqrl",			\
>> +		      "", RISCV_FULL_BARRIER)
> ... but this is not what I suggested: my suggestion [1] was about (limited
> to) the fully-ordered macro arch_cmpxchg().  In fact, I've recently raised
> some concern about similar changes to the acquire/release macros, cf. [2].
>
> Any particular reasons for doing this?


Not at all, I overinterpreted your suggestion. I'll restrict this to the
fully-ordered macro then.

Thanks,


>
>    Andrea
>
> [1] https://lore.kernel.org/lkml/ZlYff9x12FICHoP0@andrea/
> [2] https://lore.kernel.org/lkml/20240505123340.38495-1-puranjay@kernel.org/
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha
  2024-06-27 13:45   ` Andrea Parri
@ 2024-07-04 17:25     ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-04 17:25 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch


On 27/06/2024 15:45, Andrea Parri wrote:
>> -#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)		\
>> +#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
>> +			   swap_append, r, p, n)			\
>>   ({									\
>> +	__label__ zabha, end;						\
>> +									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>> +		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,		\
>> +				     RISCV_ISA_EXT_ZABHA, 1)		\
>> +			 : : : : zabha);				\
>> +	}								\
>> +									\
>>   	u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);			\
>>   	ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;	\
>>   	ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)	\
>> @@ -28,12 +37,25 @@
>>   	       "	or   %1, %1, %z3\n"				\
>>   	       "	sc.w" sc_sfx " %1, %1, %2\n"			\
>>   	       "	bnez %1, 0b\n"					\
>> -	       append							\
>> +	       sc_append							\
>>   	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
>>   	       : "rJ" (__newx), "rJ" (~__mask)				\
>>   	       : "memory");						\
>>   									\
>>   	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
>> +	goto end;							\
>> +									\
>> +zabha:									\
>> +	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>> +		__asm__ __volatile__ (					\
>> +			prepend						\
>> +			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
>> +			swap_append						\
>> +			: "=&r" (r), "+A" (*(p))			\
>> +			: "rJ" (n)					\
>> +			: "memory");					\
>> +	}								\
>> +end:;									\
>>   })
> As for patch #1: why the semicolon? and should the second IS_ENABLED()
> be kept?
>
>
>> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
>> index e17d0078a651..f71ddd2ca163 100644
>> --- a/arch/riscv/include/asm/hwcap.h
>> +++ b/arch/riscv/include/asm/hwcap.h
>> @@ -81,6 +81,7 @@
>>   #define RISCV_ISA_EXT_ZTSO		72
>>   #define RISCV_ISA_EXT_ZACAS		73
>>   #define RISCV_ISA_EXT_XANDESPMU		74
>> +#define RISCV_ISA_EXT_ZABHA		75
>>   
>>   #define RISCV_ISA_EXT_XLINUXENVCFG	127
>>   
>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> index 5ef48cb20ee1..c125d82c894b 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>>   	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>>   	__RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
>>   	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
>> +	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
>>   	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
>>   	__RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
>>   	__RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
> To be squashed into patch #3?


Yep, done, thanks.


>
>    Andrea
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg()
  2024-06-27 13:58   ` Andrea Parri
@ 2024-07-04 17:26     ` Alexandre Ghiti
  2024-07-10  0:09       ` Andrea Parri
  0 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-04 17:26 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch, Andrea Parri

On 27/06/2024 15:58, Andrea Parri wrote:
> On Wed, Jun 26, 2024 at 03:03:44PM +0200, Alexandre Ghiti wrote:
>> xchg() uses amoswap.X instructions from Zabha but still uses
>> the LR/SC acquire/release semantics which require barriers.
>>
>> Let's improve that by using proper amoswap acquire/release semantics in
>> order to avoid any of those barriers.
>>
>> Suggested-by: Andrea Parri <andrea@rivosinc.com>
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> ---
>>   arch/riscv/include/asm/cmpxchg.h | 35 +++++++++++++-------------------
>>   1 file changed, 14 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
>> index eb35e2d30a97..0e57d5fbf227 100644
>> --- a/arch/riscv/include/asm/cmpxchg.h
>> +++ b/arch/riscv/include/asm/cmpxchg.h
>> @@ -11,8 +11,8 @@
>>   #include <asm/fence.h>
>>   #include <asm/alternative.h>
>>   
>> -#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,	\
>> -			   swap_append, r, p, n)			\
>> +#define __arch_xchg_masked(sc_sfx, swap_sfx, sc_prepend, sc_append,	\
>> +			   r, p, n)					\
>>   ({									\
>>   	__label__ zabha, end;						\
>>   									\
>> @@ -31,7 +31,7 @@
>>   	ulong __rc;							\
>>   									\
>>   	__asm__ __volatile__ (						\
>> -	       prepend							\
>> +	       sc_prepend							\
>>   	       "0:	lr.w %0, %2\n"					\
>>   	       "	and  %1, %0, %z4\n"				\
>>   	       "	or   %1, %1, %z3\n"				\
>> @@ -48,9 +48,7 @@
>>   zabha:									\
>>   	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {			\
>>   		__asm__ __volatile__ (					\
>> -			prepend						\
>>   			"	amoswap" swap_sfx " %0, %z2, %1\n"	\
>> -			swap_append						\
>>   			: "=&r" (r), "+A" (*(p))			\
>>   			: "rJ" (n)					\
>>   			: "memory");					\
>> @@ -58,19 +56,17 @@ zabha:									\
>>   end:;									\
>>   })
>>   
>> -#define __arch_xchg(sfx, prepend, append, r, p, n)			\
>> +#define __arch_xchg(sfx, r, p, n)					\
>>   ({									\
>>   	__asm__ __volatile__ (						\
>> -		prepend							\
>>   		"	amoswap" sfx " %0, %2, %1\n"			\
>> -		append							\
>>   		: "=r" (r), "+A" (*(p))					\
>>   		: "r" (n)						\
>>   		: "memory");						\
>>   })
>>   
>> -#define _arch_xchg(ptr, new, sc_sfx, swap_sfx, prepend,			\
>> -		   sc_append, swap_append)				\
>> +#define _arch_xchg(ptr, new, sc_sfx, swap_sfx,				\
>> +		   sc_prepend, sc_append)				\
>>   ({									\
>>   	__typeof__(ptr) __ptr = (ptr);					\
>>   	__typeof__(*(__ptr)) __new = (new);				\
>> @@ -79,21 +75,19 @@ end:;									\
>>   	switch (sizeof(*__ptr)) {					\
>>   	case 1:								\
>>   		__arch_xchg_masked(sc_sfx, ".b" swap_sfx,		\
>> -				   prepend, sc_append, swap_append,	\
>> +				   sc_prepend, sc_append,		\
>>   				   __ret, __ptr, __new);		\
>>   		break;							\
>>   	case 2:								\
>>   		__arch_xchg_masked(sc_sfx, ".h" swap_sfx,		\
>> -				   prepend, sc_append, swap_append,	\
>> +				   sc_prepend, sc_append,		\
>>   				   __ret, __ptr, __new);		\
>>   		break;							\
>>   	case 4:								\
>> -		__arch_xchg(".w" swap_sfx, prepend, swap_append,	\
>> -			      __ret, __ptr, __new);			\
>> +		__arch_xchg(".w" swap_sfx,  __ret, __ptr, __new);	\
>>   		break;							\
>>   	case 8:								\
>> -		__arch_xchg(".d" swap_sfx, prepend, swap_append,	\
>> -			      __ret, __ptr, __new);			\
>> +		__arch_xchg(".d" swap_sfx,  __ret, __ptr, __new);	\
>>   		break;							\
>>   	default:							\
>>   		BUILD_BUG();						\
>> @@ -102,17 +96,16 @@ end:;									\
>>   })
>>   
>>   #define arch_xchg_relaxed(ptr, x)					\
>> -	_arch_xchg(ptr, x, "", "", "", "", "")
>> +	_arch_xchg(ptr, x, "", "", "", "")
>>   
>>   #define arch_xchg_acquire(ptr, x)					\
>> -	_arch_xchg(ptr, x, "", "", "",					\
>> -		   RISCV_ACQUIRE_BARRIER, RISCV_ACQUIRE_BARRIER)
>> +	_arch_xchg(ptr, x, "", ".aq", "", RISCV_ACQUIRE_BARRIER)
>>   
>>   #define arch_xchg_release(ptr, x)					\
>> -	_arch_xchg(ptr, x, "", "", RISCV_RELEASE_BARRIER, "", "")
>> +	_arch_xchg(ptr, x, "", ".rl", RISCV_RELEASE_BARRIER, "")
>>   
>>   #define arch_xchg(ptr, x)						\
>> -	_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER, "")
>> +	_arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER)
> I actually see no reason for this patch, please see also my remarks
> /question on patch #4.


You mean that we can't improve the fully-ordered version here?


>
>    Andrea
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-06-27 15:19   ` Andrea Parri
@ 2024-07-04 17:33     ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-04 17:33 UTC (permalink / raw)
  To: Andrea Parri, Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Nathan Chancellor, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch


On 27/06/2024 17:19, Andrea Parri wrote:
>> This is largely based on Guo's work and Leonardo reviews at [1].
> Guo, could/should this have your Co-developed-by:/Signed-off-by:?


Indeed, I'll add a SoB from Guo.


>
> (disclaimer: I haven't looked at the last three patches of this submission
> with due calm and probably won't before ~mid-July...)
>
>
>> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> There seems to be a distinct lack of experimental results, compared to the
> previous/cited submission (and numbers are good to have!!  ;-)). Maybe Guo
> /others can provide some? to confirm this is going in the right direction.
>
>    Andrea
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-04  3:38   ` kernel test robot
@ 2024-07-05 17:27     ` Nathan Chancellor
  2024-07-16 12:19       ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Nathan Chancellor @ 2024-07-05 17:27 UTC (permalink / raw)
  To: kernel test robot
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch, llvm,
	oe-kbuild-all

On Thu, Jul 04, 2024 at 11:38:46AM +0800, kernel test robot wrote:
> Hi Alexandre,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on soc/for-next]
> [also build test ERROR on linus/master v6.10-rc6 next-20240703]
> [cannot apply to arnd-asm-generic/master robh/for-next tip/locking/core]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Implement-cmpxchg32-64-using-Zacas/20240627-034946
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
> patch link:    https://lore.kernel.org/r/20240626130347.520750-2-alexghiti%40rivosinc.com
> patch subject: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
> config: riscv-randconfig-002-20240704 (https://download.01.org/0day-ci/archive/20240704/202407041157.odTZAYZ6-lkp@intel.com/config)
> compiler: clang version 16.0.6 (https://github.com/llvm/llvm-project 7cbf1a2591520c2491aa35339f227775f4d3adf6)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240704/202407041157.odTZAYZ6-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202407041157.odTZAYZ6-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
> >> kernel/sched/core.c:11873:7: error: cannot jump from this asm goto statement to one of its possible targets
>                    if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
>                        ^
>    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
>            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>            ^
>    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
>            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
>                   ^
>    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
>    #define raw_cmpxchg arch_cmpxchg
>                        ^
>    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
>            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
>            ^
>    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
>                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
>                    ^
>    arch/riscv/include/asm/cmpxchg.h:144:3: note: expanded from macro '__arch_cmpxchg'
>                    asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,            \
>                    ^
>    kernel/sched/core.c:11840:7: note: possible target of asm goto statement
>            if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
>                 ^
>    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
>            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>            ^
>    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
>            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
>                   ^
>    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
>    #define raw_cmpxchg arch_cmpxchg
>                        ^
>    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
>            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
>            ^
>    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
>                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
>                    ^
>    arch/riscv/include/asm/cmpxchg.h:161:10: note: expanded from macro '__arch_cmpxchg'
>                                                                            \
>                                                                            ^
>    kernel/sched/core.c:11872:2: note: jump exits scope of variable with __attribute__((cleanup))
>            scoped_guard (irqsave) {
>            ^
>    include/linux/cleanup.h:169:20: note: expanded from macro 'scoped_guard'
>            for (CLASS(_name, scope)(args),                                 \
>                              ^
>    kernel/sched/core.c:11840:7: error: cannot jump from this asm goto statement to one of its possible targets
>            if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
>                 ^
>    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
>            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>            ^
>    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
>            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
>                   ^
>    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
>    #define raw_cmpxchg arch_cmpxchg
>                        ^
>    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
>            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
>            ^
>    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
>                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
>                    ^
>    arch/riscv/include/asm/cmpxchg.h:144:3: note: expanded from macro '__arch_cmpxchg'
>                    asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,            \
>                    ^
>    kernel/sched/core.c:11873:7: note: possible target of asm goto statement
>                    if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
>                        ^
>    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
>            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
>            ^
>    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
>            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
>                   ^
>    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
>    #define raw_cmpxchg arch_cmpxchg
>                        ^
>    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
>            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
>            ^
>    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
>                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
>                    ^
>    arch/riscv/include/asm/cmpxchg.h:161:10: note: expanded from macro '__arch_cmpxchg'
>                                                                            \
>                                                                            ^
>    kernel/sched/core.c:11872:2: note: jump bypasses initialization of variable with __attribute__((cleanup))
>            scoped_guard (irqsave) {
>            ^
>    include/linux/cleanup.h:169:20: note: expanded from macro 'scoped_guard'
>            for (CLASS(_name, scope)(args),                                 \
>                              ^
>    2 errors generated.

Ugh, this is an unfortunate interaction with clang's jump scope analysis
and asm goto in LLVM releases prior to 17 :/

https://github.com/ClangBuiltLinux/linux/issues/1886#issuecomment-1645979992

Unfortunately, 'if (0)' does not prevent this (the analysis runs early
in the front end as far as I understand it), so we would need to work
around this with full preprocessor guards...
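
Roughly, such a guard would have to live outside the macro body so that
with CONFIG_RISCV_ISA_ZACAS=n no asm goto is emitted at all; something
like this made-up helper (just a sketch of the idea, not a patch):

#ifdef CONFIG_RISCV_ISA_ZACAS
#define __zacas_asm_goto(label)						\
	asm goto(ALTERNATIVE("nop", "j %[" #label "]", 0,		\
			     RISCV_ISA_EXT_ZACAS, 1)			\
		 : : : : label)
#else
#define __zacas_asm_goto(label)		do { } while (0)
#endif

That keeps the asm goto away from clang < 17 entirely when the option is
disabled, at the cost of yet another layer of macros.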

Another option would be to require LLVM 17+ for RISC-V, which may not
be the worst alternative, since I think most people doing serious work
with clang are probably living close to the tip of tree anyway because
of all the extension work that goes on upstream.

I am open to other thoughts though.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-06-26 13:03 ` [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension Alexandre Ghiti
  2024-06-27 15:19   ` Andrea Parri
@ 2024-07-07  2:20   ` Guo Ren
  2024-07-08 11:51     ` Guo Ren
  2024-07-15  7:27     ` Alexandre Ghiti
  1 sibling, 2 replies; 45+ messages in thread
From: Guo Ren @ 2024-07-07  2:20 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> In order to produce a generic kernel, a user can select
> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> spinlock implementation if Zabha is not present.
>
> Note that we can't use alternatives here because the discovery of
> extensions is done too late and we need to start with the qspinlock
That's not true; we treat the spinlock as a qspinlock at first.
A qspinlock unlock leaves the lock value at zero (clean), whereas
ticket_lock would leave a dirty one. (I've spent much time on this
mechanism, and you've preserved it in this patch.) So the
qspinlock -> ticket_lock switch point stays safe until sched_init(),
which is late enough to apply alternatives. The key problem with an
alternatives-based implementation is that the coding is tough because
you can't reuse the C code: the whole ticket_lock, including the
qspinlock fast path, must be rewritten in asm.

I think we should discuss some points before continuing the patch:
1. Using alternative mechanisms for combo spinlock
2. Using three Kconfigs for
ticket_spinlock/queued_spinlock/combo_spinlock at the compile stage.
3. The forward progress guarantee requirement is written in the
qspinlock.h comment; that is not about Zacas/Zabha.

> implementation because the ticket spinlock implementation would pollute
> the spinlock value, so let's use static keys.
>
> This is largely based on Guo's work and Leonardo reviews at [1].
>
> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  .../locking/queued-spinlocks/arch-support.txt |  2 +-
>  arch/riscv/Kconfig                            | 10 +++++
>  arch/riscv/include/asm/Kbuild                 |  4 +-
>  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
>  arch/riscv/kernel/setup.c                     | 21 ++++++++++
>  include/asm-generic/qspinlock.h               |  2 +
>  include/asm-generic/ticket_spinlock.h         |  2 +
>  7 files changed, 78 insertions(+), 2 deletions(-)
>  create mode 100644 arch/riscv/include/asm/spinlock.h
>
> diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> index 22f2990392ff..cf26042480e2 100644
> --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> @@ -20,7 +20,7 @@
>      |    openrisc: |  ok  |
>      |      parisc: | TODO |
>      |     powerpc: |  ok  |
> -    |       riscv: | TODO |
> +    |       riscv: |  ok  |
>      |        s390: | TODO |
>      |          sh: | TODO |
>      |       sparc: |  ok  |
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 0bbaec0444d0..c2ba673e58ca 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -72,6 +72,7 @@ config RISCV
>         select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
>         select ARCH_WANTS_NO_INSTR
>         select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +       select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
>         select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>         select BUILDTIME_TABLE_SORT if MMU
>         select CLINT_TIMER if RISCV_M_MODE
> @@ -482,6 +483,15 @@ config NODES_SHIFT
>           Specify the maximum number of NUMA Nodes available on the target
>           system.  Increases memory reserved to accommodate various tables.
>
> +config RISCV_COMBO_SPINLOCKS
> +       bool "Using combo spinlock"
> +       depends on SMP && MMU && TOOLCHAIN_HAS_ZABHA
> +       select ARCH_USE_QUEUED_SPINLOCKS
> +       default y
> +       help
> +         Embed both queued spinlock and ticket lock so that the spinlock
> +         implementation can be chosen at runtime.
> +

COMBO SPINLOCK has side effects, which would expand spinlock code size
a lot. Ref: ARCH_INLINE_SPIN_LOCK

So, we shouldn't remove the three configs' selection.

+choice
+ prompt "RISC-V spinlock type"
+ default RISCV_COMBO_SPINLOCKS
+
+config RISCV_TICKET_SPINLOCKS
+ bool "Using ticket spinlock"
+
+config RISCV_QUEUED_SPINLOCKS
+ bool "Using queued spinlock"
+ depends on SMP && MMU
+ select ARCH_USE_QUEUED_SPINLOCKS
+ help
+  Make sure your micro arch gives cmpxchg/xchg a forward progress
+  guarantee. Otherwise, stay with ticket-lock.
+
+config RISCV_COMBO_SPINLOCKS
+ bool "Using combo spinlock"
+ depends on SMP && MMU
+ select ARCH_USE_QUEUED_SPINLOCKS
+ help
+  Select queued spinlock or ticket-lock by cmdline.
+endchoice
+



>  config RISCV_ALTERNATIVE
>         bool
>         depends on !XIP_KERNEL
> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 504f8b7e72d4..ad72f2bd4cc9 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -2,10 +2,12 @@
>  generic-y += early_ioremap.h
>  generic-y += flat.h
>  generic-y += kvm_para.h
> +generic-y += mcs_spinlock.h
>  generic-y += parport.h
> -generic-y += spinlock.h
>  generic-y += spinlock_types.h
> +generic-y += ticket_spinlock.h
>  generic-y += qrwlock.h
>  generic-y += qrwlock_types.h
> +generic-y += qspinlock.h
>  generic-y += user.h
>  generic-y += vmlinux.lds.h
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> new file mode 100644
> index 000000000000..4856d50006f2
> --- /dev/null
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_RISCV_SPINLOCK_H
> +#define __ASM_RISCV_SPINLOCK_H
> +
> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +#define _Q_PENDING_LOOPS       (1 << 9)
> +
> +#define __no_arch_spinlock_redefine
> +#include <asm/ticket_spinlock.h>
> +#include <asm/qspinlock.h>
> +#include <asm/alternative.h>
> +
> +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> +
> +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
> +static __always_inline type arch_spin_##op(type_lock lock)             \
> +{                                                                      \
> +       if (static_branch_unlikely(&qspinlock_key))                     \
> +               return queued_spin_##op(lock);                          \
> +       return ticket_spin_##op(lock);                                  \
> +}
> +
> +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> +
> +#else
> +
> +#include <asm/ticket_spinlock.h>
> +
> +#endif
> +
> +#include <asm/qrwlock.h>
> +
> +#endif /* __ASM_RISCV_SPINLOCK_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 4f73c0ae44b2..929bccd4c9e5 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -244,6 +244,26 @@ static void __init parse_dtb(void)
>  #endif
>  }
>
> +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> +EXPORT_SYMBOL(qspinlock_key);
> +
> +static void __init riscv_spinlock_init(void)
> +{
> +       asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> +                : : : : no_zacas);
> +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> +                : : : : qspinlock);
The requirement of qspinlock concerns the forward progress guarantee
in micro-arch design, which includes cmpxchg_loop & xchg_tail. So, I
don't think these features have a relationship with Qspinlock.

If your machine doesn't have enough stickiness for a young exclusive
cacheline, fall back to ticket_lock.

> +
> +no_zacas:
> +       static_branch_disable(&qspinlock_key);
> +       pr_info("Ticket spinlock: enabled\n");
> +
> +       return;
> +
> +qspinlock:
> +       pr_info("Queued spinlock: enabled\n");
> +}
> +
>  extern void __init init_rt_signal_env(void);
>
>  void __init setup_arch(char **cmdline_p)
> @@ -295,6 +315,7 @@ void __init setup_arch(char **cmdline_p)
>         riscv_set_dma_cache_alignment();
>
>         riscv_user_isa_enable();
> +       riscv_spinlock_init();
>  }
>
>  bool arch_cpu_is_hotpluggable(int cpu)
> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> index 0655aa5b57b2..bf47cca2c375 100644
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>  }
>  #endif
>
> +#ifndef __no_arch_spinlock_redefine
>  /*
>   * Remapping spinlock architecture specific functions to the corresponding
>   * queued spinlock functions.
> @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
>  #define arch_spin_lock(l)              queued_spin_lock(l)
>  #define arch_spin_trylock(l)           queued_spin_trylock(l)
>  #define arch_spin_unlock(l)            queued_spin_unlock(l)
> +#endif
>
>  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> index cfcff22b37b3..325779970d8a 100644
> --- a/include/asm-generic/ticket_spinlock.h
> +++ b/include/asm-generic/ticket_spinlock.h
> @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>         return (s16)((val >> 16) - (val & 0xffff)) > 1;
>  }
>
> +#ifndef __no_arch_spinlock_redefine
>  /*
>   * Remapping spinlock architecture specific functions to the corresponding
>   * ticket spinlock functions.
> @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
>  #define arch_spin_lock(l)              ticket_spin_lock(l)
>  #define arch_spin_trylock(l)           ticket_spin_trylock(l)
>  #define arch_spin_unlock(l)            ticket_spin_unlock(l)
> +#endif
>
>  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
> --
> 2.39.2
>


--
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-07  2:20   ` Guo Ren
@ 2024-07-08 11:51     ` Guo Ren
  2024-07-15  7:33       ` Alexandre Ghiti
  2024-07-15  7:27     ` Alexandre Ghiti
  1 sibling, 1 reply; 45+ messages in thread
From: Guo Ren @ 2024-07-08 11:51 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

On Sun, Jul 7, 2024 at 10:20 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > In order to produce a generic kernel, a user can select
> > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > spinlock implementation if Zabha is not present.
> >
> > Note that we can't use alternatives here because the discovery of
> > extensions is done too late and we need to start with the qspinlock
> That's not true; we treat spinlock as qspinlock at first.
> qspinlock_unlock would make the lock value zero (clean), but
> ticket_lock would make a dirty one. (I've spent much time on this
> mechanism, and you've preserved it in this patch.) So, making the
> qspinlock -> ticket_lock change point safe until sched_init() is late
> enough to make alternatives. The key problem of alternative
> implementation is tough coding because you can't reuse the C code. The
> whole ticket_lock must be rewritten in asm and include the qspinlock
> fast path.
>
> I think we should discuss some points before continuing the patch:
> 1. Using alternative mechanisms for combo spinlock
> 2. Using three Kconfigs for
> ticket_spinlock/queued_spinlock/combo_spinlock during compile stage.
> 3. The forward progress guarantee requirement is written in
> qspinlock.h comment. That is not about our CAS/BHA.
>
> > implementation because the ticket spinlock implementation would pollute
> > the spinlock value, so let's use static keys.
> >
> > This is largely based on Guo's work and Leonardo reviews at [1].
> >
> > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  .../locking/queued-spinlocks/arch-support.txt |  2 +-
> >  arch/riscv/Kconfig                            | 10 +++++
> >  arch/riscv/include/asm/Kbuild                 |  4 +-
> >  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
> >  arch/riscv/kernel/setup.c                     | 21 ++++++++++
> >  include/asm-generic/qspinlock.h               |  2 +
> >  include/asm-generic/ticket_spinlock.h         |  2 +
> >  7 files changed, 78 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/spinlock.h
> >
> > diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > index 22f2990392ff..cf26042480e2 100644
> > --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > @@ -20,7 +20,7 @@
> >      |    openrisc: |  ok  |
> >      |      parisc: | TODO |
> >      |     powerpc: |  ok  |
> > -    |       riscv: | TODO |
> > +    |       riscv: |  ok  |
> >      |        s390: | TODO |
> >      |          sh: | TODO |
> >      |       sparc: |  ok  |
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 0bbaec0444d0..c2ba673e58ca 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -72,6 +72,7 @@ config RISCV
> >         select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
> >         select ARCH_WANTS_NO_INSTR
> >         select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > +       select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
> >         select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
> >         select BUILDTIME_TABLE_SORT if MMU
> >         select CLINT_TIMER if RISCV_M_MODE
> > @@ -482,6 +483,15 @@ config NODES_SHIFT
> >           Specify the maximum number of NUMA Nodes available on the target
> >           system.  Increases memory reserved to accommodate various tables.
> >
> > +config RISCV_COMBO_SPINLOCKS
> > +       bool "Using combo spinlock"
> > +       depends on SMP && MMU && TOOLCHAIN_HAS_ZABHA
> > +       select ARCH_USE_QUEUED_SPINLOCKS
> > +       default y
> > +       help
> > +         Embed both queued spinlock and ticket lock so that the spinlock
> > +         implementation can be chosen at runtime.
> > +
>
> COMBO SPINLOCK has side effects, which would expand spinlock code size
> a lot. Ref: ARCH_INLINE_SPIN_LOCK
>
> So, we shouldn't remove the three configs' selection.
>
> +choice
> + prompt "RISC-V spinlock type"
> + default RISCV_COMBO_SPINLOCKS
> +
> +config RISCV_TICKET_SPINLOCKS
> + bool "Using ticket spinlock"
> +
> +config RISCV_QUEUED_SPINLOCKS
> + bool "Using queued spinlock"
> + depends on SMP && MMU
> + select ARCH_USE_QUEUED_SPINLOCKS
> + help
> +  Make sure your micro arch gives cmpxchg/xchg a forward progress
> +  guarantee. Otherwise, stay with ticket-lock.
> +
> +config RISCV_COMBO_SPINLOCKS
> + bool "Using combo spinlock"
> + depends on SMP && MMU
> + select ARCH_USE_QUEUED_SPINLOCKS
> + help
> +  Select queued spinlock or ticket-lock by cmdline.
> +endchoice
> +
>
>
>
> >  config RISCV_ALTERNATIVE
> >         bool
> >         depends on !XIP_KERNEL
> > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> > index 504f8b7e72d4..ad72f2bd4cc9 100644
> > --- a/arch/riscv/include/asm/Kbuild
> > +++ b/arch/riscv/include/asm/Kbuild
> > @@ -2,10 +2,12 @@
> >  generic-y += early_ioremap.h
> >  generic-y += flat.h
> >  generic-y += kvm_para.h
> > +generic-y += mcs_spinlock.h
> >  generic-y += parport.h
> > -generic-y += spinlock.h
> >  generic-y += spinlock_types.h
> > +generic-y += ticket_spinlock.h
> >  generic-y += qrwlock.h
> >  generic-y += qrwlock_types.h
> > +generic-y += qspinlock.h
> >  generic-y += user.h
> >  generic-y += vmlinux.lds.h
> > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > new file mode 100644
> > index 000000000000..4856d50006f2
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/spinlock.h
> > @@ -0,0 +1,39 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef __ASM_RISCV_SPINLOCK_H
> > +#define __ASM_RISCV_SPINLOCK_H
> > +
> > +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> > +#define _Q_PENDING_LOOPS       (1 << 9)
> > +
> > +#define __no_arch_spinlock_redefine
> > +#include <asm/ticket_spinlock.h>
> > +#include <asm/qspinlock.h>
> > +#include <asm/alternative.h>
> > +
> > +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> > +
> > +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
> > +static __always_inline type arch_spin_##op(type_lock lock)             \
> > +{                                                                      \
> > +       if (static_branch_unlikely(&qspinlock_key))                     \
> > +               return queued_spin_##op(lock);                          \
> > +       return ticket_spin_##op(lock);                                  \
> > +}
> > +
> > +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> > +
> > +#else
> > +
> > +#include <asm/ticket_spinlock.h>
> > +
> > +#endif
> > +
> > +#include <asm/qrwlock.h>
> > +
> > +#endif /* __ASM_RISCV_SPINLOCK_H */
> > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > index 4f73c0ae44b2..929bccd4c9e5 100644
> > --- a/arch/riscv/kernel/setup.c
> > +++ b/arch/riscv/kernel/setup.c
> > @@ -244,6 +244,26 @@ static void __init parse_dtb(void)
> >  #endif
> >  }
> >
> > +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> > +EXPORT_SYMBOL(qspinlock_key);
> > +
> > +static void __init riscv_spinlock_init(void)
> > +{
> > +       asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> > +                : : : : no_zacas);
> > +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> > +                : : : : qspinlock);
> The requirement of qspinlock concerns the forward progress guarantee
> in micro-arch design, which includes cmpxchg_loop & xchg_tail. So, I
> don't think these features have a relationship with Qspinlock.
>
> If your machine doesn't have enough stickiness for a young exclusive
> cacheline, fall back to ticket_lock.

Could we use "Ziccrse: Main memory supports forward progress on LR/SC
sequences" for qspinlock selection?

+       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
+                            RISCV_ISA_EXT_ZICCRSE, 1)
+                : : : : qspinlock);

[1] https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc

>
> > +
> > +no_zacas:
> > +       static_branch_disable(&qspinlock_key);
> > +       pr_info("Ticket spinlock: enabled\n");
> > +
> > +       return;
> > +
> > +qspinlock:
> > +       pr_info("Queued spinlock: enabled\n");
> > +}
> > +
> >  extern void __init init_rt_signal_env(void);
> >
> >  void __init setup_arch(char **cmdline_p)
> > @@ -295,6 +315,7 @@ void __init setup_arch(char **cmdline_p)
> >         riscv_set_dma_cache_alignment();
> >
> >         riscv_user_isa_enable();
> > +       riscv_spinlock_init();
> >  }
> >
> >  bool arch_cpu_is_hotpluggable(int cpu)
> > diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> > index 0655aa5b57b2..bf47cca2c375 100644
> > --- a/include/asm-generic/qspinlock.h
> > +++ b/include/asm-generic/qspinlock.h
> > @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> >  }
> >  #endif
> >
> > +#ifndef __no_arch_spinlock_redefine
> >  /*
> >   * Remapping spinlock architecture specific functions to the corresponding
> >   * queued spinlock functions.
> > @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> >  #define arch_spin_lock(l)              queued_spin_lock(l)
> >  #define arch_spin_trylock(l)           queued_spin_trylock(l)
> >  #define arch_spin_unlock(l)            queued_spin_unlock(l)
> > +#endif
> >
> >  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> > diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> > index cfcff22b37b3..325779970d8a 100644
> > --- a/include/asm-generic/ticket_spinlock.h
> > +++ b/include/asm-generic/ticket_spinlock.h
> > @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> >         return (s16)((val >> 16) - (val & 0xffff)) > 1;
> >  }
> >
> > +#ifndef __no_arch_spinlock_redefine
> >  /*
> >   * Remapping spinlock architecture specific functions to the corresponding
> >   * ticket spinlock functions.
> > @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> >  #define arch_spin_lock(l)              ticket_spin_lock(l)
> >  #define arch_spin_trylock(l)           ticket_spin_trylock(l)
> >  #define arch_spin_unlock(l)            ticket_spin_unlock(l)
> > +#endif
> >
> >  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
> > --
> > 2.39.2
> >
>
>
> --
> Best Regards
>  Guo Ren



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-04 16:25     ` Alexandre Ghiti
@ 2024-07-09 23:47       ` Andrea Parri
  2024-07-15 11:48         ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-07-09 23:47 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

> > Is this second IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check actually needed?
> > (just wondering - no real objection)
> 
> To me yes, otherwise a toolchain without zacas support would fail to
> assemble the amocas instruction.

To elaborate on my question:  Such a toolchain may be able to recognize
that the block of code following the zacas: label (and comprising the
amocas instruction) can't be reached/executed if the first IS_ENABLED()
evaluates to false (due to the goto end; statement), and consequently it
may compile out the entire block/instruction no matter the presence or
not of the second IS_ENABLED() check.  IOW, such a toolchain/compiler may
not actually have to assemble the amocas instruction under such config.
In fact, this is how the current gcc trunk (which doesn't support zacas)
seems to behave.  And this very same optimization/code removal seems to
be performed by clang when CONFIG_RISCV_ISA_ZACAS=n.  IAC, I'd agree it
is good to be explicit in the sources and keep both of these checks.
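
For reference, the rough shape of the code we are discussing, with the
bodies elided (control flow only, not the exact macro):

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {
		/* asm goto: "nop", patched to "j zacas" when Zacas is present */
	}
	/* LR/SC fallback */
	goto end;
zacas:
	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {
		/*
		 * amocas sequence: with CONFIG_RISCV_ISA_ZACAS=n, everything
		 * between "goto end" and "end:" is unreachable, so the
		 * compiler may drop it and the assembler never needs to
		 * assemble the amocas mnemonic.
		 */
	}
end:;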


> > Why the semicolon?
> 
> That fixes a clang warning reported by Nathan here:
> https://lore.kernel.org/linux-riscv/20240528193110.GA2196855@thelio-3990X/

I see.  Thanks for the pointer.


> > This is because the compiler doesn't realize __ret is actually
> > initialized, right?  IAC, seems a bit unexpected to initialize
> > with (old) (which indicates SUCCESS of the CMPXCHG operation);
> > how about using (new) for the initialization of __ret instead?
> > would (new) still work for you?
> 
> But amocas rd register must contain the expected old value in order to
> actually work right?

Agreed.  Thanks for the clarification.
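
For the archive, stripped down to the essential bit (an illustration using
the names from the discussion, not the exact macro), the register tied to
%0 has to start out holding the expected value:

	r = (old);		/* amocas reads the expected value from rd ... */
	__asm__ __volatile__ (
		"	amocas.w %0, %z2, %1\n"
		: "+&r" (r), "+A" (*(p))
		: "rJ" (n)
		: "memory");
	/* ... and returns what was found in memory: r == (old) means the CAS succeeded */

Seeding it with (new) would make amocas compare against the wrong value.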

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-04 16:36     ` Alexandre Ghiti
@ 2024-07-09 23:51       ` Andrea Parri
  2024-07-15 12:56         ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Parri @ 2024-07-09 23:51 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

> > I admit that I found this all quite difficult to read; IIUC, this is
> > missing an IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check.
> 
> I'm not sure we need the zacas check here, since we could use a toolchain
> that supports zabha but not zacas, run this on a zabha/zacas platform and it
> would work.

One specific set-up I was concerned about is as follows:

  1) hardware implements both zabha and zacas
  2) toolchain supports both zabha and zacas
  3) CONFIG_RISCV_ISA_ZABHA=y and CONFIG_RISCV_ISA_ZACAS=n

Since CONFIG_RISCV_ISA_ZABHA=y, the first asm goto will get executed
and, since the hardware implements zacas, that will result in a nop.
Then the second asm goto will get executed and, since the hardware
implements zabha, it will result in the j zabha.  In conclusion, the
amocas instruction following the zabha: label will get executed, thus
violating (the semantics of) CONFIG_RISCV_ISA_ZACAS=n.  IIUC, the diff
I've posted previously in this thread shared a similar limitation/bug.
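
In other words, the amocas.[bh] fast path would presumably have to be
gated on both options, roughly along these lines (sketch of the gating
only, asm elided):

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&
	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {
		/*
		 * runtime Zacas check, then runtime Zabha check, then
		 * jump to the amocas.[bh] sequence as in the patch
		 */
	}

so that a CONFIG_RISCV_ISA_ZACAS=n build can never emit or reach amocas.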

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg()
  2024-07-04 17:26     ` Alexandre Ghiti
@ 2024-07-10  0:09       ` Andrea Parri
  0 siblings, 0 replies; 45+ messages in thread
From: Andrea Parri @ 2024-07-10  0:09 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch, Andrea Parri

> > I actually see no reason for this patch, please see also my remarks
> > /question on patch #4.
> 
> You mean that we can't improve the fully-ordered version here?

Well, I mean, this patch doesn't AFAICT.  You've recently fixed these
LR/SC mappings in 1d84afaf0252 ("riscv: Fix fully ordered LR/SC
xchg[8|16]() implementations"), and the remaining look just fine to me.

  Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha
  2024-06-26 13:03 ` [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
  2024-06-27 13:45   ` Andrea Parri
@ 2024-07-10  1:37   ` Guo Ren
  2024-07-15 13:20     ` Alexandre Ghiti
  1 sibling, 1 reply; 45+ messages in thread
From: Guo Ren @ 2024-07-10  1:37 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

On Wed, Jun 26, 2024 at 9:10 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> This adds runtime support for Zabha in xchg8/16() operations.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/include/asm/cmpxchg.h | 33 +++++++++++++++++++++++++++++---
>  arch/riscv/include/asm/hwcap.h   |  1 +
>  arch/riscv/kernel/cpufeature.c   |  1 +
>  3 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index da42f32ea53d..eb35e2d30a97 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -11,8 +11,17 @@
>  #include <asm/fence.h>
>  #include <asm/alternative.h>
>
> -#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)           \
> +#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,       \
> +                          swap_append, r, p, n)                        \
>  ({                                                                     \
> +       __label__ zabha, end;                                           \
> +                                                                       \
> +       if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {                       \
> +               asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,            \
> +                                    RISCV_ISA_EXT_ZABHA, 1)            \
> +                        : : : : zabha);                                \
> +       }                                                               \
> +                                                                       \
Could we exchange the sequence between Zabha & lr/sc?
I mean:
nop -> zabha
j -> lr/sc
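
i.e. make the Zabha sequence the straight-line code and only branch out to
the LR/SC fallback, roughly like this (a sketch; label renamed, macro
continuation backslashes omitted):

	asm goto(ALTERNATIVE("j %[llsc]", "nop", 0,
			     RISCV_ISA_EXT_ZABHA, 1)
		 : : : : llsc);

	/* amoswap.[bh] fast path falls through here */
	goto end;
llsc:
	/* lr.w/sc.w masked sequence */
end:;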


>         u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);                     \
>         ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;  \
>         ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)   \
> @@ -28,12 +37,25 @@
>                "        or   %1, %1, %z3\n"                             \
>                "        sc.w" sc_sfx " %1, %1, %2\n"                    \
>                "        bnez %1, 0b\n"                                  \
> -              append                                                   \
> +              sc_append                                                        \
>                : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))       \
>                : "rJ" (__newx), "rJ" (~__mask)                          \
>                : "memory");                                             \
>                                                                         \
>         r = (__typeof__(*(p)))((__retx & __mask) >> __s);               \
> +       goto end;                                                       \
> +                                                                       \
> +zabha:									\
Make the jump go to the lr/sc implementation here instead, because lr/sc is
already the slow path.
> +       if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {                       \
> +               __asm__ __volatile__ (                                  \
> +                       prepend                                         \
> +                       "       amoswap" swap_sfx " %0, %z2, %1\n"      \
> +                       swap_append                                             \
> +                       : "=&r" (r), "+A" (*(p))                        \
> +                       : "rJ" (n)                                      \
> +                       : "memory");                                    \
> +       }                                                               \
> +end:;                                                                  \
>  })
>
>  #define __arch_xchg(sfx, prepend, append, r, p, n)                     \
> @@ -56,8 +78,13 @@
>                                                                         \
>         switch (sizeof(*__ptr)) {                                       \
>         case 1:                                                         \
> +               __arch_xchg_masked(sc_sfx, ".b" swap_sfx,               \
> +                                  prepend, sc_append, swap_append,     \
> +                                  __ret, __ptr, __new);                \
> +               break;                                                  \
>         case 2:                                                         \
> -               __arch_xchg_masked(sc_sfx, prepend, sc_append,          \
> +               __arch_xchg_masked(sc_sfx, ".h" swap_sfx,               \
> +                                  prepend, sc_append, swap_append,     \
>                                    __ret, __ptr, __new);                \
>                 break;                                                  \
>         case 4:                                                         \
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index e17d0078a651..f71ddd2ca163 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -81,6 +81,7 @@
>  #define RISCV_ISA_EXT_ZTSO             72
>  #define RISCV_ISA_EXT_ZACAS            73
>  #define RISCV_ISA_EXT_XANDESPMU                74
> +#define RISCV_ISA_EXT_ZABHA            75
>
>  #define RISCV_ISA_EXT_XLINUXENVCFG     127
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 5ef48cb20ee1..c125d82c894b 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>         __RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
>         __RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
> +       __RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
>         __RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
>         __RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
>         __RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
> --
> 2.39.2
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-07  2:20   ` Guo Ren
  2024-07-08 11:51     ` Guo Ren
@ 2024-07-15  7:27     ` Alexandre Ghiti
  2024-07-15 19:30       ` Waiman Long
  1 sibling, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-15  7:27 UTC (permalink / raw)
  To: Guo Ren
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Guo,

On Sun, Jul 7, 2024 at 4:20 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > In order to produce a generic kernel, a user can select
> > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > spinlock implementation if Zabha is not present.
> >
> > Note that we can't use alternatives here because the discovery of
> > extensions is done too late and we need to start with the qspinlock
> That's not true; we treat spinlock as qspinlock at first.

That's what I said: we have to use the qspinlock implementation at
first *because* we can't discover the extensions soon enough to use
the right spinlock implementation before the kernel uses a spinlock.
And since the spinlocks are used *before* the discovery of the
extensions, we cannot use the current alternative mechanism or we'd
need to extend it to add an "initial" value which does not depend on
the available extensions.

> qspinlock_unlock would make the lock value zero (clean), but
> ticket_lock would make a dirty one. (I've spent much time on this
> mechanism, and you've preserved it in this patch.) So, making the
> qspinlock -> ticket_lock change point safe until sched_init() is late
> enough to make alternatives. The key problem of alternative
> implementation is tough coding because you can't reuse the C code. The
> whole ticket_lock must be rewritten in asm and include the qspinlock
> fast path.
>
> I think we should discuss some points before continuing the patch:
> 1. Using alternative mechanisms for combo spinlock

We can easily get the extension string from the DT, and I have a PoC
that works with ACPI, which would make this possible.

> 2. Using three Kconfigs for
> ticket_spinlock/queune_spinlock/combo_spinlock during compile stage.

This makes sense, I'll do that.

> 3. The forward progress guarantee requirement is written in
> qspinlock.h comment. That is not about our CAS/BHA.
>
> > implementation because the ticket spinlock implementation would pollute
> > the spinlock value, so let's use static keys.
> >
> > This is largely based on Guo's work and Leonardo reviews at [1].
> >
> > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  .../locking/queued-spinlocks/arch-support.txt |  2 +-
> >  arch/riscv/Kconfig                            | 10 +++++
> >  arch/riscv/include/asm/Kbuild                 |  4 +-
> >  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
> >  arch/riscv/kernel/setup.c                     | 21 ++++++++++
> >  include/asm-generic/qspinlock.h               |  2 +
> >  include/asm-generic/ticket_spinlock.h         |  2 +
> >  7 files changed, 78 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/spinlock.h
> >
> > diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > index 22f2990392ff..cf26042480e2 100644
> > --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > @@ -20,7 +20,7 @@
> >      |    openrisc: |  ok  |
> >      |      parisc: | TODO |
> >      |     powerpc: |  ok  |
> > -    |       riscv: | TODO |
> > +    |       riscv: |  ok  |
> >      |        s390: | TODO |
> >      |          sh: | TODO |
> >      |       sparc: |  ok  |
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 0bbaec0444d0..c2ba673e58ca 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -72,6 +72,7 @@ config RISCV
> >         select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
> >         select ARCH_WANTS_NO_INSTR
> >         select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > +       select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
> >         select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
> >         select BUILDTIME_TABLE_SORT if MMU
> >         select CLINT_TIMER if RISCV_M_MODE
> > @@ -482,6 +483,15 @@ config NODES_SHIFT
> >           Specify the maximum number of NUMA Nodes available on the target
> >           system.  Increases memory reserved to accommodate various tables.
> >
> > +config RISCV_COMBO_SPINLOCKS
> > +       bool "Using combo spinlock"
> > +       depends on SMP && MMU && TOOLCHAIN_HAS_ZABHA
> > +       select ARCH_USE_QUEUED_SPINLOCKS
> > +       default y
> > +       help
> > +         Embed both queued spinlock and ticket lock so that the spinlock
> > +         implementation can be chosen at runtime.
> > +
>
> COMBO SPINLOCK has side effects, which would expand spinlock code size
> a lot. Ref: ARCH_INLINE_SPIN_LOCK
>
> So, we shouldn't remove the three configs' selection.
>
> +choice
> + prompt "RISC-V spinlock type"
> + default RISCV_COMBO_SPINLOCKS
> +
> +config RISCV_TICKET_SPINLOCKS
> + bool "Using ticket spinlock"
> +
> +config RISCV_QUEUED_SPINLOCKS
> + bool "Using queued spinlock"
> + depends on SMP && MMU
> + select ARCH_USE_QUEUED_SPINLOCKS
> + help
> +  Make sure your micro arch gives cmpxchg/xchg a forward progress
> +  guarantee. Otherwise, stay with ticket-lock.
> +
> +config RISCV_COMBO_SPINLOCKS
> + bool "Using combo spinlock"
> + depends on SMP && MMU
> + select ARCH_USE_QUEUED_SPINLOCKS
> + help
> +  Select queued spinlock or ticket-lock by cmdline.
> +endchoice
> +
>
>
>
> >  config RISCV_ALTERNATIVE
> >         bool
> >         depends on !XIP_KERNEL
> > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> > index 504f8b7e72d4..ad72f2bd4cc9 100644
> > --- a/arch/riscv/include/asm/Kbuild
> > +++ b/arch/riscv/include/asm/Kbuild
> > @@ -2,10 +2,12 @@
> >  generic-y += early_ioremap.h
> >  generic-y += flat.h
> >  generic-y += kvm_para.h
> > +generic-y += mcs_spinlock.h
> >  generic-y += parport.h
> > -generic-y += spinlock.h
> >  generic-y += spinlock_types.h
> > +generic-y += ticket_spinlock.h
> >  generic-y += qrwlock.h
> >  generic-y += qrwlock_types.h
> > +generic-y += qspinlock.h
> >  generic-y += user.h
> >  generic-y += vmlinux.lds.h
> > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > new file mode 100644
> > index 000000000000..4856d50006f2
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/spinlock.h
> > @@ -0,0 +1,39 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef __ASM_RISCV_SPINLOCK_H
> > +#define __ASM_RISCV_SPINLOCK_H
> > +
> > +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> > +#define _Q_PENDING_LOOPS       (1 << 9)
> > +
> > +#define __no_arch_spinlock_redefine
> > +#include <asm/ticket_spinlock.h>
> > +#include <asm/qspinlock.h>
> > +#include <asm/alternative.h>
> > +
> > +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> > +
> > +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
> > +static __always_inline type arch_spin_##op(type_lock lock)             \
> > +{                                                                      \
> > +       if (static_branch_unlikely(&qspinlock_key))                     \
> > +               return queued_spin_##op(lock);                          \
> > +       return ticket_spin_##op(lock);                                  \
> > +}
> > +
> > +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> > +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> > +
> > +#else
> > +
> > +#include <asm/ticket_spinlock.h>
> > +
> > +#endif
> > +
> > +#include <asm/qrwlock.h>
> > +
> > +#endif /* __ASM_RISCV_SPINLOCK_H */
> > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > index 4f73c0ae44b2..929bccd4c9e5 100644
> > --- a/arch/riscv/kernel/setup.c
> > +++ b/arch/riscv/kernel/setup.c
> > @@ -244,6 +244,26 @@ static void __init parse_dtb(void)
> >  #endif
> >  }
> >
> > +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> > +EXPORT_SYMBOL(qspinlock_key);
> > +
> > +static void __init riscv_spinlock_init(void)
> > +{
> > +       asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> > +                : : : : no_zacas);
> > +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> > +                : : : : qspinlock);
> The requirement of qspinlock concerns the forward progress guarantee
> in micro-arch design, which includes cmpxchg_loop & xchg_tail. So, I
> don't think these features have a relationship with Qspinlock.
>
> If your machine doesn't have enough stickiness for a young exclusive
> cacheline, fall back to ticket_lock.

How would the riscv zacas/zabha implementation not provide a forward
progress guarantee when all other architectures' atomic memory
operations do?

>
> > +
> > +no_zacas:
> > +       static_branch_disable(&qspinlock_key);
> > +       pr_info("Ticket spinlock: enabled\n");
> > +
> > +       return;
> > +
> > +qspinlock:
> > +       pr_info("Queued spinlock: enabled\n");
> > +}
> > +
> >  extern void __init init_rt_signal_env(void);
> >
> >  void __init setup_arch(char **cmdline_p)
> > @@ -295,6 +315,7 @@ void __init setup_arch(char **cmdline_p)
> >         riscv_set_dma_cache_alignment();
> >
> >         riscv_user_isa_enable();
> > +       riscv_spinlock_init();
> >  }
> >
> >  bool arch_cpu_is_hotpluggable(int cpu)
> > diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> > index 0655aa5b57b2..bf47cca2c375 100644
> > --- a/include/asm-generic/qspinlock.h
> > +++ b/include/asm-generic/qspinlock.h
> > @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> >  }
> >  #endif
> >
> > +#ifndef __no_arch_spinlock_redefine
> >  /*
> >   * Remapping spinlock architecture specific functions to the corresponding
> >   * queued spinlock functions.
> > @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> >  #define arch_spin_lock(l)              queued_spin_lock(l)
> >  #define arch_spin_trylock(l)           queued_spin_trylock(l)
> >  #define arch_spin_unlock(l)            queued_spin_unlock(l)
> > +#endif
> >
> >  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> > diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> > index cfcff22b37b3..325779970d8a 100644
> > --- a/include/asm-generic/ticket_spinlock.h
> > +++ b/include/asm-generic/ticket_spinlock.h
> > @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> >         return (s16)((val >> 16) - (val & 0xffff)) > 1;
> >  }
> >
> > +#ifndef __no_arch_spinlock_redefine
> >  /*
> >   * Remapping spinlock architecture specific functions to the corresponding
> >   * ticket spinlock functions.
> > @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> >  #define arch_spin_lock(l)              ticket_spin_lock(l)
> >  #define arch_spin_trylock(l)           ticket_spin_trylock(l)
> >  #define arch_spin_unlock(l)            ticket_spin_unlock(l)
> > +#endif
> >
> >  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
> > --
> > 2.39.2
> >
>
>
> --
> Best Regards
>  Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-08 11:51     ` Guo Ren
@ 2024-07-15  7:33       ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-15  7:33 UTC (permalink / raw)
  To: Guo Ren
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Guo,

On Mon, Jul 8, 2024 at 1:51 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Sun, Jul 7, 2024 at 10:20 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >
> > > In order to produce a generic kernel, a user can select
> > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > spinlock implementation if Zabha is not present.
> > >
> > > Note that we can't use alternatives here because the discovery of
> > > extensions is done too late and we need to start with the qspinlock
> > That's not true; we treat spinlock as qspinlock at first.
> > qspinlock_unlock would make the lock value zero (clean), but
> > ticket_lock would make a dirty one. (I've spent much time on this
> > mechanism, and you've preserved it in this patch.) So, making the
> > qspinlock -> ticket_lock change point safe until sched_init() is late
> > enough to make alternatives. The key problem of alternative
> > implementation is tough coding because you can't reuse the C code. The
> > whole ticket_lock must be rewritten in asm and include the qspinlock
> > fast path.
> >
> > I think we should discuss some points before continuing the patch:
> > 1. Using alternative mechanisms for combo spinlock
> > 2. Using three Kconfigs for
> > ticket_spinlock/queune_spinlock/combo_spinlock during compile stage.
> > 3. The forward progress guarantee requirement is written in
> > qspinlock.h comment. That is not about our CAS/BHA.
> >
> > > implementation because the ticket spinlock implementation would pollute
> > > the spinlock value, so let's use static keys.
> > >
> > > This is largely based on Guo's work and Leonardo reviews at [1].
> > >
> > > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > ---
> > >  .../locking/queued-spinlocks/arch-support.txt |  2 +-
> > >  arch/riscv/Kconfig                            | 10 +++++
> > >  arch/riscv/include/asm/Kbuild                 |  4 +-
> > >  arch/riscv/include/asm/spinlock.h             | 39 +++++++++++++++++++
> > >  arch/riscv/kernel/setup.c                     | 21 ++++++++++
> > >  include/asm-generic/qspinlock.h               |  2 +
> > >  include/asm-generic/ticket_spinlock.h         |  2 +
> > >  7 files changed, 78 insertions(+), 2 deletions(-)
> > >  create mode 100644 arch/riscv/include/asm/spinlock.h
> > >
> > > diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > > index 22f2990392ff..cf26042480e2 100644
> > > --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > > +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> > > @@ -20,7 +20,7 @@
> > >      |    openrisc: |  ok  |
> > >      |      parisc: | TODO |
> > >      |     powerpc: |  ok  |
> > > -    |       riscv: | TODO |
> > > +    |       riscv: |  ok  |
> > >      |        s390: | TODO |
> > >      |          sh: | TODO |
> > >      |       sparc: |  ok  |
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 0bbaec0444d0..c2ba673e58ca 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -72,6 +72,7 @@ config RISCV
> > >         select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
> > >         select ARCH_WANTS_NO_INSTR
> > >         select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > > +       select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
> > >         select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
> > >         select BUILDTIME_TABLE_SORT if MMU
> > >         select CLINT_TIMER if RISCV_M_MODE
> > > @@ -482,6 +483,15 @@ config NODES_SHIFT
> > >           Specify the maximum number of NUMA Nodes available on the target
> > >           system.  Increases memory reserved to accommodate various tables.
> > >
> > > +config RISCV_COMBO_SPINLOCKS
> > > +       bool "Using combo spinlock"
> > > +       depends on SMP && MMU && TOOLCHAIN_HAS_ZABHA
> > > +       select ARCH_USE_QUEUED_SPINLOCKS
> > > +       default y
> > > +       help
> > > +         Embed both queued spinlock and ticket lock so that the spinlock
> > > +         implementation can be chosen at runtime.
> > > +
> >
> > COMBO SPINLOCK has side effects, which would expand spinlock code size
> > a lot. Ref: ARCH_INLINE_SPIN_LOCK
> >
> > So, we shouldn't remove the three configs' selection.
> >
> > +choice
> > + prompt "RISC-V spinlock type"
> > + default RISCV_COMBO_SPINLOCKS
> > +
> > +config RISCV_TICKET_SPINLOCKS
> > + bool "Using ticket spinlock"
> > +
> > +config RISCV_QUEUED_SPINLOCKS
> > + bool "Using queued spinlock"
> > + depends on SMP && MMU
> > + select ARCH_USE_QUEUED_SPINLOCKS
> > + help
> > +  Make sure your micro arch give cmpxchg/xchg forward progress
> > +  guarantee. Otherwise, stay at ticket-lock.
> > +
> > +config RISCV_COMBO_SPINLOCKS
> > + bool "Using combo spinlock"
> > + depends on SMP && MMU
> > + select ARCH_USE_QUEUED_SPINLOCKS
> > + help
> > +  Select queued spinlock or ticket-lock by cmdline.
> > +endchoice
> > +
> >
> >
> >
> > >  config RISCV_ALTERNATIVE
> > >         bool
> > >         depends on !XIP_KERNEL
> > > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> > > index 504f8b7e72d4..ad72f2bd4cc9 100644
> > > --- a/arch/riscv/include/asm/Kbuild
> > > +++ b/arch/riscv/include/asm/Kbuild
> > > @@ -2,10 +2,12 @@
> > >  generic-y += early_ioremap.h
> > >  generic-y += flat.h
> > >  generic-y += kvm_para.h
> > > +generic-y += mcs_spinlock.h
> > >  generic-y += parport.h
> > > -generic-y += spinlock.h
> > >  generic-y += spinlock_types.h
> > > +generic-y += ticket_spinlock.h
> > >  generic-y += qrwlock.h
> > >  generic-y += qrwlock_types.h
> > > +generic-y += qspinlock.h
> > >  generic-y += user.h
> > >  generic-y += vmlinux.lds.h
> > > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > > new file mode 100644
> > > index 000000000000..4856d50006f2
> > > --- /dev/null
> > > +++ b/arch/riscv/include/asm/spinlock.h
> > > @@ -0,0 +1,39 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +
> > > +#ifndef __ASM_RISCV_SPINLOCK_H
> > > +#define __ASM_RISCV_SPINLOCK_H
> > > +
> > > +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> > > +#define _Q_PENDING_LOOPS       (1 << 9)
> > > +
> > > +#define __no_arch_spinlock_redefine
> > > +#include <asm/ticket_spinlock.h>
> > > +#include <asm/qspinlock.h>
> > > +#include <asm/alternative.h>
> > > +
> > > +DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> > > +
> > > +#define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
> > > +static __always_inline type arch_spin_##op(type_lock lock)             \
> > > +{                                                                      \
> > > +       if (static_branch_unlikely(&qspinlock_key))                     \
> > > +               return queued_spin_##op(lock);                          \
> > > +       return ticket_spin_##op(lock);                                  \
> > > +}
> > > +
> > > +SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> > > +SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> > > +SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> > > +SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> > > +SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> > > +SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> > > +
> > > +#else
> > > +
> > > +#include <asm/ticket_spinlock.h>
> > > +
> > > +#endif
> > > +
> > > +#include <asm/qrwlock.h>
> > > +
> > > +#endif /* __ASM_RISCV_SPINLOCK_H */
> > > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > > index 4f73c0ae44b2..929bccd4c9e5 100644
> > > --- a/arch/riscv/kernel/setup.c
> > > +++ b/arch/riscv/kernel/setup.c
> > > @@ -244,6 +244,26 @@ static void __init parse_dtb(void)
> > >  #endif
> > >  }
> > >
> > > +DEFINE_STATIC_KEY_TRUE(qspinlock_key);
> > > +EXPORT_SYMBOL(qspinlock_key);
> > > +
> > > +static void __init riscv_spinlock_init(void)
> > > +{
> > > +       asm goto(ALTERNATIVE("j %[no_zacas]", "nop", 0, RISCV_ISA_EXT_ZACAS, 1)
> > > +                : : : : no_zacas);
> > > +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0, RISCV_ISA_EXT_ZABHA, 1)
> > > +                : : : : qspinlock);
> > The requirement of qspinlock concerns the forward progress guarantee
> > in micro-arch design, which includes cmpxchg_loop & xchg_tail. So, I
> > don't think these features have a relationship with Qspinlock.
> >
> > If your machine doesn't have enough stickiness for a young exclusive
> > cacheline, fall back to ticket_lock.
>
> Could we use "Ziccrse: Main memory supports forward progress on LR/SC
> sequences" for qspinlock selection?
>
> +       asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
> RISCV_ISA_EXT_ZICCRSE, 1)
>                 : : : : qspinlock);
>
> [1] https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc

Yes, I'll do that, thanks for the pointer!

Thanks,

Alex
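
For reference, a minimal sketch of what that could look like in
riscv_spinlock_init() (assuming a RISCV_ISA_EXT_ZICCRSE id gets defined once
the profile extension is parsed; this is not the final patch):

static void __init riscv_spinlock_init(void)
{
	asm goto(ALTERNATIVE("nop", "j %[qspinlock]", 0,
			     RISCV_ISA_EXT_ZICCRSE, 1)
		 : : : : qspinlock);

	static_branch_disable(&qspinlock_key);
	pr_info("Ticket spinlock: enabled\n");

	return;

qspinlock:
	pr_info("Queued spinlock: enabled\n");
}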

>
> >
> > > +
> > > +no_zacas:
> > > +       static_branch_disable(&qspinlock_key);
> > > +       pr_info("Ticket spinlock: enabled\n");
> > > +
> > > +       return;
> > > +
> > > +qspinlock:
> > > +       pr_info("Queued spinlock: enabled\n");
> > > +}
> > > +
> > >  extern void __init init_rt_signal_env(void);
> > >
> > >  void __init setup_arch(char **cmdline_p)
> > > @@ -295,6 +315,7 @@ void __init setup_arch(char **cmdline_p)
> > >         riscv_set_dma_cache_alignment();
> > >
> > >         riscv_user_isa_enable();
> > > +       riscv_spinlock_init();
> > >  }
> > >
> > >  bool arch_cpu_is_hotpluggable(int cpu)
> > > diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> > > index 0655aa5b57b2..bf47cca2c375 100644
> > > --- a/include/asm-generic/qspinlock.h
> > > +++ b/include/asm-generic/qspinlock.h
> > > @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> > >  }
> > >  #endif
> > >
> > > +#ifndef __no_arch_spinlock_redefine
> > >  /*
> > >   * Remapping spinlock architecture specific functions to the corresponding
> > >   * queued spinlock functions.
> > > @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> > >  #define arch_spin_lock(l)              queued_spin_lock(l)
> > >  #define arch_spin_trylock(l)           queued_spin_trylock(l)
> > >  #define arch_spin_unlock(l)            queued_spin_unlock(l)
> > > +#endif
> > >
> > >  #endif /* __ASM_GENERIC_QSPINLOCK_H */
> > > diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> > > index cfcff22b37b3..325779970d8a 100644
> > > --- a/include/asm-generic/ticket_spinlock.h
> > > +++ b/include/asm-generic/ticket_spinlock.h
> > > @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> > >         return (s16)((val >> 16) - (val & 0xffff)) > 1;
> > >  }
> > >
> > > +#ifndef __no_arch_spinlock_redefine
> > >  /*
> > >   * Remapping spinlock architecture specific functions to the corresponding
> > >   * ticket spinlock functions.
> > > @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> > >  #define arch_spin_lock(l)              ticket_spin_lock(l)
> > >  #define arch_spin_trylock(l)           ticket_spin_trylock(l)
> > >  #define arch_spin_unlock(l)            ticket_spin_unlock(l)
> > > +#endif
> > >
> > >  #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
> > > --
> > > 2.39.2
> > >
> >
> >
> > --
> > Best Regards
> >  Guo Ren
>
>
>
> --
> Best Regards
>  Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-09 23:47       ` Andrea Parri
@ 2024-07-15 11:48         ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-15 11:48 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

Hi Andrea,

On Wed, Jul 10, 2024 at 1:47 AM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> > > Is this second IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check actually needed?
> > > (just wondering - no real objection)
> >
> > To me yes, otherwise a toolchain without zacas support would fail to
> > assemble the amocas instruction.
>
> To elaborate on my question:  Such a toolchain may be able to recognize
> that the block of code following the zacas: label (and comprising the
> amocas instruction) can't be reached/executed if the first IS_ENABLED()
> evaluates to false (due to the goto end; statement), and consequently it
> may compile out the entire block/instruction no matter the presence or
> not of the second IS_ENABLED() check.  IOW, such a toolchain/compiler may
> not actually have to assemble the amocas instruction under such config.
> In fact, this is how the current gcc trunk (which doesn't support zacas)
> seems to behave.  And this very same optimization/code removal seems to
> be performed by clang when CONFIG_RISCV_ISA_ZACAS=n.  IAC, I'd agree it
> is good to be explicit in the sources and keep both of these checks.

Indeed, clang works fine without the second IS_ENABLED(). I'll remove
it then as the code is complex enough.
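
For the record, a rough sketch of the resulting shape (simplified, asm
operands elided), showing why the single guard is enough:

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {
		asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,
				     RISCV_ISA_EXT_ZACAS, 1)
			 : : : : zacas);
	}

	/* LR/SC-based cmpxchg */
	goto end;

zacas:
	/*
	 * Only reachable through the asm goto above; with
	 * CONFIG_RISCV_ISA_ZACAS=n that goto is compiled out, this block
	 * becomes dead code and the amocas sequence never reaches the
	 * assembler, so a second IS_ENABLED() check around it is redundant.
	 */
	/* amocas-based cmpxchg */
end:;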

Thanks,

Alex

>
>
> > > Why the semicolon?
> >
> > That fixes a clang warning reported by Nathan here:
> > https://lore.kernel.org/linux-riscv/20240528193110.GA2196855@thelio-3990X/
>
> I see.  Thanks for the pointer.
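
(As a standalone illustration of that warning: before C23, a label must be
followed by a statement, so ending a compound statement on a bare label relies
on a C23 extension; the extra ';' is a null statement that avoids it.)

void f(int x)
{
	if (x)
		goto end;
	/* ... */
end:;	/* null statement: without the ';', clang warns that a label at
	 * the end of a compound statement is a C23 extension */
}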
>
>
> > > This is because the compiler doesn't realize __ret is actually
> > > initialized, right?  IAC, seems a bit unexpected to initialize
> > > with (old) (which indicates SUCCESS of the CMPXCHG operation);
> > > how about using (new) for the initialization of __ret instead?
> > > would (new) still work for you?
> >
> > But amocas rd register must contain the expected old value in order to
> > actually work right?
>
> Agreed.  Thanks for the clarification.
>
>   Andrea
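
(Roughly, the amocas semantics behind that, per the Zacas spec — rd must be
seeded with the expected old value, which is why __ret starts as (old) rather
than (new):)

	/*
	 * amocas.w rd, rs2, (rs1):
	 *	temp = mem[rs1]
	 *	if (temp == rd)
	 *		mem[rs1] = rs2
	 *	rd = temp	// rd returns the original memory value
	 */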

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha
  2024-07-09 23:51       ` Andrea Parri
@ 2024-07-15 12:56         ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-15 12:56 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch

Hi Andrea,

On Wed, Jul 10, 2024 at 1:51 AM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> > > I admit that I found this all quite difficult to read; IIUC, this is
> > > missing an IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) check.
> >
> > I'm not sure we need the zacas check here, since we could use a toolchain
> > that supports zabha but not zacas, run this on a zabha/zacas platform and it
> > would work.
>
> One specific set-up I was concerned about is as follows:
>
>   1) hardware implements both zabha and zacas
>   2) toolchain supports both zabha and zacas
>   3) CONFIG_RISCV_ISA_ZABHA=y and CONFIG_RISCV_ISA_ZACAS=n
>
> Since CONFIG_RISCV_ISA_ZABHA=y, the first asm goto will get executed
> and, since the hardware implements zacas, that will result in a nop.
> Then the second asm goto will get executed and, since the hardware
> implements zabha, it will result in the j zabha.  In conclusion, the
> amocas instruction following the zabha: label will get executed, thus
> violating (the semantics of) CONFIG_RISCV_ISA_ZACAS=n.  IIUC, the diff
> I've posted previously in this thread shared a similar limitation/bug.

So you mean that when disabling Zacas, we should actually disable
*all* the CAS instructions, even the Zabha ones. That makes sense and
provides a single way to disable the CAS instructions while keeping
the other atomic operations.

I'll fix that and add a comment.

Thanks,

Alex
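
A rough sketch of that direction (compile-time side only, not the final code):
guard the byte/halfword amocas path on both options, so that
CONFIG_RISCV_ISA_ZACAS=n really means "no amocas at all":

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&
	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS)) {
		/*
		 * amocas.b/amocas.h come with Zabha but are still CAS
		 * instructions, so honour CONFIG_RISCV_ISA_ZACAS=n here too.
		 */
		asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,
				     RISCV_ISA_EXT_ZABHA, 1)
			 : : : : zabha);
	}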

>
>   Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha
  2024-07-10  1:37   ` Guo Ren
@ 2024-07-15 13:20     ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-15 13:20 UTC (permalink / raw)
  To: Guo Ren
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Guo,

On Wed, Jul 10, 2024 at 3:37 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Jun 26, 2024 at 9:10 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > This adds runtime support for Zabha in xchg8/16() operations.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/cmpxchg.h | 33 +++++++++++++++++++++++++++++---
> >  arch/riscv/include/asm/hwcap.h   |  1 +
> >  arch/riscv/kernel/cpufeature.c   |  1 +
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index da42f32ea53d..eb35e2d30a97 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -11,8 +11,17 @@
> >  #include <asm/fence.h>
> >  #include <asm/alternative.h>
> >
> > -#define __arch_xchg_masked(sc_sfx, prepend, append, r, p, n)           \
> > +#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append,       \
> > +                          swap_append, r, p, n)                        \
> >  ({                                                                     \
> > +       __label__ zabha, end;                                           \
> > +                                                                       \
> > +       if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {                       \
> > +               asm goto(ALTERNATIVE("nop", "j %[zabha]", 0,            \
> > +                                    RISCV_ISA_EXT_ZABHA, 1)            \
> > +                        : : : : zabha);                                \
> > +       }                                                               \
> > +                                                                       \
> Could we exchange the sequence between Zabha & lr/sc?
> I mean:
> nop -> zabha
> j -> lr/sc
>

Yes, you're right, it makes more sense this way. I'll do that.

Thanks,

Alex
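
Roughly, the reordered macro could look like this (sketch only; the macro's
backslash line continuations and the unchanged LR/SC sequence are elided):
Zabha becomes the fall-through fast path and the LR/SC fallback takes the
jump, since it is the slow path anyway:

	__label__ no_zabha, end;

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {
		asm goto(ALTERNATIVE("j %[no_zabha]", "nop", 0,
				     RISCV_ISA_EXT_ZABHA, 1)
			 : : : : no_zabha);

		/* Zabha fast path: a single amoswap.b/.h */
		__asm__ __volatile__ (
			prepend
			"	amoswap" swap_sfx " %0, %z2, %1\n"
			swap_append
			: "=&r" (r), "+A" (*(p))
			: "rJ" (n)
			: "memory");
		goto end;
	}

no_zabha:
	/* existing LR/SC masked sequence, unchanged */
end:;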

>
> >         u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3);                     \
> >         ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE;  \
> >         ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0)   \
> > @@ -28,12 +37,25 @@
> >                "        or   %1, %1, %z3\n"                             \
> >                "        sc.w" sc_sfx " %1, %1, %2\n"                    \
> >                "        bnez %1, 0b\n"                                  \
> > -              append                                                   \
> > +              sc_append                                                        \
> >                : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))       \
> >                : "rJ" (__newx), "rJ" (~__mask)                          \
> >                : "memory");                                             \
> >                                                                         \
> >         r = (__typeof__(*(p)))((__retx & __mask) >> __s);               \
> > +       goto end;                                                       \
> > +                                                                       \
> > +zabha:
> I.e., let the lr/sc implementation take the jump, since it's already the slow path.
>                                                                \
> > +       if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA)) {                       \
> > +               __asm__ __volatile__ (                                  \
> > +                       prepend                                         \
> > +                       "       amoswap" swap_sfx " %0, %z2, %1\n"      \
> > +                       swap_append                                             \
> > +                       : "=&r" (r), "+A" (*(p))                        \
> > +                       : "rJ" (n)                                      \
> > +                       : "memory");                                    \
> > +       }                                                               \
> > +end:;                                                                  \
> >  })
> >
> >  #define __arch_xchg(sfx, prepend, append, r, p, n)                     \
> > @@ -56,8 +78,13 @@
> >                                                                         \
> >         switch (sizeof(*__ptr)) {                                       \
> >         case 1:                                                         \
> > +               __arch_xchg_masked(sc_sfx, ".b" swap_sfx,               \
> > +                                  prepend, sc_append, swap_append,     \
> > +                                  __ret, __ptr, __new);                \
> > +               break;                                                  \
> >         case 2:                                                         \
> > -               __arch_xchg_masked(sc_sfx, prepend, sc_append,          \
> > +               __arch_xchg_masked(sc_sfx, ".h" swap_sfx,               \
> > +                                  prepend, sc_append, swap_append,     \
> >                                    __ret, __ptr, __new);                \
> >                 break;                                                  \
> >         case 4:                                                         \
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index e17d0078a651..f71ddd2ca163 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -81,6 +81,7 @@
> >  #define RISCV_ISA_EXT_ZTSO             72
> >  #define RISCV_ISA_EXT_ZACAS            73
> >  #define RISCV_ISA_EXT_XANDESPMU                74
> > +#define RISCV_ISA_EXT_ZABHA            75
> >
> >  #define RISCV_ISA_EXT_XLINUXENVCFG     127
> >
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 5ef48cb20ee1..c125d82c894b 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -257,6 +257,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
> >         __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> >         __RISCV_ISA_EXT_DATA(zihpm, RISCV_ISA_EXT_ZIHPM),
> >         __RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
> > +       __RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
> >         __RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
> >         __RISCV_ISA_EXT_DATA(zfh, RISCV_ISA_EXT_ZFH),
> >         __RISCV_ISA_EXT_DATA(zfhmin, RISCV_ISA_EXT_ZFHMIN),
> > --
> > 2.39.2
> >
>
>
> --
> Best Regards
>  Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-15  7:27     ` Alexandre Ghiti
@ 2024-07-15 19:30       ` Waiman Long
  2024-07-16  1:04         ` Guo Ren
  0 siblings, 1 reply; 45+ messages in thread
From: Waiman Long @ 2024-07-15 19:30 UTC (permalink / raw)
  To: Alexandre Ghiti, Guo Ren
  Cc: Jonathan Corbet, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Andrea Parri, Nathan Chancellor, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Arnd Bergmann, Leonardo Bras, linux-doc,
	linux-kernel, linux-riscv, linux-arch

On 7/15/24 03:27, Alexandre Ghiti wrote:
> Hi Guo,
>
> On Sun, Jul 7, 2024 at 4:20 AM Guo Ren <guoren@kernel.org> wrote:
>> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>>> In order to produce a generic kernel, a user can select
>>> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
>>> spinlock implementation if Zabha is not present.
>>>
>>> Note that we can't use alternatives here because the discovery of
>>> extensions is done too late and we need to start with the qspinlock
>> That's not true; we treat spinlock as qspinlock at first.
> That's what I said: we have to use the qspinlock implementation at
> first *because* we can't discover the extensions soon enough to use
> the right spinlock implementation before the kernel uses a spinlock.
> And since the spinlocks are used *before* the discovery of the
> extensions, we cannot use the current alternative mechanism or we'd
> need to extend it to add an "initial" value which does not depend on
> the available extensions.

With qspinlock, the lock remains zero after a lock/unlock sequence. That 
is not the case with ticket lock. Assuming that all the discovery will 
be done before SMP boot, the qspinlock slowpath won't be activated and 
so we don't need the presence of any extension. I believe that is the 
main reason why qspinlock is used as the initial default and not ticket 
lock.
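
To make the asymmetry concrete, a toy single-threaded model (plain C, not
kernel code; ticket layout assumed little-endian with next in the upper
halfword, as in the generic ticket lock):

#include <stdint.h>
#include <stdio.h>

union toy_lock {
	uint32_t val;
	struct {
		uint16_t owner;		/* lower halfword */
		uint16_t next;		/* upper halfword */
	} tkt;
	uint8_t locked;			/* qspinlock fast-path byte */
};

int main(void)
{
	union toy_lock q = { .val = 0 }, t = { .val = 0 };

	q.locked = 1;		/* queued_spin_lock() fast path */
	q.locked = 0;		/* queued_spin_unlock() */

	t.tkt.next++;		/* ticket_spin_lock() takes a ticket */
	t.tkt.owner++;		/* ticket_spin_unlock() serves it */

	/* prints 0x00000000: still a pristine, "unlocked" lock word */
	printf("qspinlock word: 0x%08x\n", (unsigned int)q.val);
	/* prints 0x00010001: dirty, so switching back to qspinlock is unsafe */
	printf("ticket    word: 0x%08x\n", (unsigned int)t.val);
	return 0;
}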

Cheers,
Longman


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-15 19:30       ` Waiman Long
@ 2024-07-16  1:04         ` Guo Ren
  2024-07-16  6:43           ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Guo Ren @ 2024-07-16  1:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexandre Ghiti, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Nathan Chancellor, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

On Tue, Jul 16, 2024 at 3:30 AM Waiman Long <longman@redhat.com> wrote:
>
> On 7/15/24 03:27, Alexandre Ghiti wrote:
> > Hi Guo,
> >
> > On Sun, Jul 7, 2024 at 4:20 AM Guo Ren <guoren@kernel.org> wrote:
> >> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >>> In order to produce a generic kernel, a user can select
> >>> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> >>> spinlock implementation if Zabha is not present.
> >>>
> >>> Note that we can't use alternatives here because the discovery of
> >>> extensions is done too late and we need to start with the qspinlock
> >> That's not true; we treat spinlock as qspinlock at first.
> > That's what I said: we have to use the qspinlock implementation at
> > first *because* we can't discover the extensions soon enough to use
> > the right spinlock implementation before the kernel uses a spinlock.
> > And since the spinlocks are used *before* the discovery of the
> > extensions, we cannot use the current alternative mechanism or we'd
> > need to extend it to add an "initial" value which does not depend on
> > the available extensions.
>
> With qspinlock, the lock remains zero after a lock/unlock sequence. That
> is not the case with ticket lock. Assuming that all the discovery will
> be done before SMP boot, the qspinlock slowpath won't be activated and
> so we don't need the presence of any extension. I believe that is the
> main reason why qspinlock is used as the initial default and not ticket
> lock.
Thx Waiman,
Yes, qspinlock is a clean guy, but ticket lock is a dirty one.

Hi Alexandre,
Therefore, the switch point(before reset_init()) is late enough to
change the lock mechanism, and this satisfies the requirements of
apply_boot_alternatives(), apply_early_boot_alternatives(), and
apply_module_alternatives().

>
> Cheers,
> Longman
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-16  1:04         ` Guo Ren
@ 2024-07-16  6:43           ` Alexandre Ghiti
  2024-07-16  8:31             ` Guo Ren
  0 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-16  6:43 UTC (permalink / raw)
  To: Guo Ren
  Cc: Waiman Long, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Nathan Chancellor, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

Hi Guo, Waiman,

On Tue, Jul 16, 2024 at 3:05 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Jul 16, 2024 at 3:30 AM Waiman Long <longman@redhat.com> wrote:
> >
> > On 7/15/24 03:27, Alexandre Ghiti wrote:
> > > Hi Guo,
> > >
> > > On Sun, Jul 7, 2024 at 4:20 AM Guo Ren <guoren@kernel.org> wrote:
> > >> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >>> In order to produce a generic kernel, a user can select
> > >>> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > >>> spinlock implementation if Zabha is not present.
> > >>>
> > >>> Note that we can't use alternatives here because the discovery of
> > >>> extensions is done too late and we need to start with the qspinlock
> > >> That's not true; we treat spinlock as qspinlock at first.
> > > That's what I said: we have to use the qspinlock implementation at
> > > first *because* we can't discover the extensions soon enough to use
> > > the right spinlock implementation before the kernel uses a spinlock.
> > > And since the spinlocks are used *before* the discovery of the
> > > extensions, we cannot use the current alternative mechanism or we'd
> > > need to extend it to add an "initial" value which does not depend on
> > > the available extensions.
> >
> > With qspinlock, the lock remains zero after a lock/unlock sequence. That
> > is not the case with ticket lock. Assuming that all the discovery will
> > be done before SMP boot, the qspinlock slowpath won't be activated and
> > so we don't need the presence of any extension. I believe that is the
> > main reason why qspinlock is used as the initial default and not ticket
> > lock.
> Thx Waiman,
> Yes, qspinlock is a clean guy, but ticket lock is a dirty one.

Guys, we all agree on that, that's why I kept this behaviour in this patchset.

>
> Hi Alexandre,
> Therefore, the switch point(before reset_init()) is late enough to
> change the lock mechanism, and this satisfies the requirements of
> apply_boot_alternatives(), apply_early_boot_alternatives(), and
> apply_module_alternatives().

I can't find reset_init().

The discovery of the extensions is done in riscv_fill_hwcap() called
from setup_arch()
https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/setup.c#L288.
So *before* this point, we have no knowledge of the available
extensions on the platform.  Let's imagine now that we use an
alternative for the qspinlock implementation, it will look like this
(I use only zabha here, that's an example):

--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -16,8 +16,12 @@ DECLARE_STATIC_KEY_TRUE(qspinlock_key);
 #define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
 static __always_inline type arch_spin_##op(type_lock lock)             \
 {                                                                      \
-       if (static_branch_unlikely(&qspinlock_key))                     \
-               return queued_spin_##op(lock);                          \
+       asm goto(ALTERNATIVE("j %[no_zabha]", "nop", 0,                 \
+                                     RISCV_ISA_EXT_ZABHA, 1)            \
+                         : : : : no_zabha);                            \
+                                                                       \
+       return queued_spin_##op(lock);                                  \
+no_zabha:                                                              \
        return ticket_spin_##op(lock);                                  \
 }

apply_early_boot_alternatives() can't be used to patch the above
alternative since it is called from setup_vm(), way before we know the
available extensions.
apply_boot_alternatives(), instead, is called after riscv_fill_hwcap()
and then will be able to patch the alternatives correctly
https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/setup.c#L290.

But then, before apply_boot_alternatives(), any use of a spinlock
(printk does so) will use a ticket spinlock implementation, and not
the qspinlock implementation. How do you fix that?

>
> >
> > Cheers,
> > Longman
> >
>
>
> --
> Best Regards
>  Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-16  6:43           ` Alexandre Ghiti
@ 2024-07-16  8:31             ` Guo Ren
  2024-07-17  6:19               ` Alexandre Ghiti
  0 siblings, 1 reply; 45+ messages in thread
From: Guo Ren @ 2024-07-16  8:31 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Waiman Long, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Nathan Chancellor, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

On Tue, Jul 16, 2024 at 2:43 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi Guo, Waiman,
>
> On Tue, Jul 16, 2024 at 3:05 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Tue, Jul 16, 2024 at 3:30 AM Waiman Long <longman@redhat.com> wrote:
> > >
> > > On 7/15/24 03:27, Alexandre Ghiti wrote:
> > > > Hi Guo,
> > > >
> > > > On Sun, Jul 7, 2024 at 4:20 AM Guo Ren <guoren@kernel.org> wrote:
> > > >> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >>> In order to produce a generic kernel, a user can select
> > > >>> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > >>> spinlock implementation if Zabha is not present.
> > > >>>
> > > >>> Note that we can't use alternatives here because the discovery of
> > > >>> extensions is done too late and we need to start with the qspinlock
> > > >> That's not true; we treat spinlock as qspinlock at first.
> > > > That's what I said: we have to use the qspinlock implementation at
> > > > first *because* we can't discover the extensions soon enough to use
> > > > the right spinlock implementation before the kernel uses a spinlock.
> > > > And since the spinlocks are used *before* the discovery of the
> > > > extensions, we cannot use the current alternative mechanism or we'd
> > > > need to extend it to add an "initial" value which does not depend on
> > > > the available extensions.
> > >
> > > With qspinlock, the lock remains zero after a lock/unlock sequence. That
> > > is not the case with ticket lock. Assuming that all the discovery will
> > > be done before SMP boot, the qspinlock slowpath won't be activated and
> > > so we don't need the presence of any extension. I believe that is the
> > > main reason why qspinlock is used as the initial default and not ticket
> > > lock.
> > Thx Waiman,
> > Yes, qspinlock is a clean guy, but ticket lock is a dirty one.
>
> Guys, we all agree on that, that's why I kept this behaviour in this patchset.
>
> >
> > Hi Alexandre,
> > Therefore, the switch point(before reset_init()) is late enough to
> > change the lock mechanism, and this satisfies the requirements of
> > apply_boot_alternatives(), apply_early_boot_alternatives(), and
> > apply_module_alternatives().
>
> I can't find reset_init().
Sorry for the typo, rest_init()
>
> The discovery of the extensions is done in riscv_fill_hwcap() called
> from setup_arch()
> https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/setup.c#L288.
> So *before* this point, we have no knowledge of the available
> extensions on the platform.  Let's imagine now that we use an
> alternative for the qspinlock implementation, it will look like this
> (I use only zabha here, that's an example):
>
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -16,8 +16,12 @@ DECLARE_STATIC_KEY_TRUE(qspinlock_key);
>  #define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
>  static __always_inline type arch_spin_##op(type_lock lock)             \
>  {                                                                      \
> -       if (static_branch_unlikely(&qspinlock_key))                     \
> -               return queued_spin_##op(lock);                          \
> +       asm goto(ALTERNATIVE("j %[no_zabha]", "nop", 0,                 \
> +                                     RISCV_ISA_EXT_ZABHA, 1)            \
> +                         : : : : no_zabha);                            \
> +                                                                       \
> +       return queued_spin_##op(lock);                                  \
> +no_zabha:                                                              \
>         return ticket_spin_##op(lock);                                  \
>  }
>
> apply_early_boot_alternatives() can't be used to patch the above
> alternative since it is called from setup_vm(), way before we know the
> available extensions.
> apply_boot_alternatives(), instead, is called after riscv_fill_hwcap()
> and then will be able to patch the alternatives correctly
> https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/setup.c#L290.
>
> But then, before apply_boot_alternatives(), any use of a spinlock
> (printk does so) will use a ticket spinlock implementation, and not
> the qspinlock implementation. How do you fix that?
Why "before apply_boot_alternatives(), any use of a spinlock (printk
does so) will use a ticket spinlock implementation" ?
We treat qspinlock as the initial spinlock forever, so there is only
qspinlock -> ticket_lock direction and no reverse.

>
> >
> > >
> > > Cheers,
> > > Longman
> > >
> >
> >
> > --
> > Best Regards
> >  Guo Ren



-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-05 17:27     ` Nathan Chancellor
@ 2024-07-16 12:19       ` Alexandre Ghiti
  2024-07-16 14:00         ` Nathan Chancellor
  0 siblings, 1 reply; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-16 12:19 UTC (permalink / raw)
  To: Nathan Chancellor, Conor Dooley
  Cc: kernel test robot, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, Arnd Bergmann, Leonardo Bras, Guo Ren,
	linux-doc, linux-kernel, linux-riscv, linux-arch, llvm,
	oe-kbuild-all

Hi Nathan,

On Fri, Jul 5, 2024 at 7:27 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Thu, Jul 04, 2024 at 11:38:46AM +0800, kernel test robot wrote:
> > Hi Alexandre,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on soc/for-next]
> > [also build test ERROR on linus/master v6.10-rc6 next-20240703]
> > [cannot apply to arnd-asm-generic/master robh/for-next tip/locking/core]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url:    https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Implement-cmpxchg32-64-using-Zacas/20240627-034946
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
> > patch link:    https://lore.kernel.org/r/20240626130347.520750-2-alexghiti%40rivosinc.com
> > patch subject: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
> > config: riscv-randconfig-002-20240704 (https://download.01.org/0day-ci/archive/20240704/202407041157.odTZAYZ6-lkp@intel.com/config)
> > compiler: clang version 16.0.6 (https://github.com/llvm/llvm-project 7cbf1a2591520c2491aa35339f227775f4d3adf6)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240704/202407041157.odTZAYZ6-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202407041157.odTZAYZ6-lkp@intel.com/
> >
> > All errors (new ones prefixed by >>):
> >
> > >> kernel/sched/core.c:11873:7: error: cannot jump from this asm goto statement to one of its possible targets
> >                    if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
> >                        ^
> >    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
> >            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> >            ^
> >    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
> >            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
> >                   ^
> >    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
> >    #define raw_cmpxchg arch_cmpxchg
> >                        ^
> >    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
> >            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
> >            ^
> >    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
> >                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
> >                    ^
> >    arch/riscv/include/asm/cmpxchg.h:144:3: note: expanded from macro '__arch_cmpxchg'
> >                    asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,            \
> >                    ^
> >    kernel/sched/core.c:11840:7: note: possible target of asm goto statement
> >            if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
> >                 ^
> >    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
> >            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> >            ^
> >    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
> >            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
> >                   ^
> >    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
> >    #define raw_cmpxchg arch_cmpxchg
> >                        ^
> >    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
> >            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
> >            ^
> >    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
> >                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
> >                    ^
> >    arch/riscv/include/asm/cmpxchg.h:161:10: note: expanded from macro '__arch_cmpxchg'
> >                                                                            \
> >                                                                            ^
> >    kernel/sched/core.c:11872:2: note: jump exits scope of variable with __attribute__((cleanup))
> >            scoped_guard (irqsave) {
> >            ^
> >    include/linux/cleanup.h:169:20: note: expanded from macro 'scoped_guard'
> >            for (CLASS(_name, scope)(args),                                 \
> >                              ^
> >    kernel/sched/core.c:11840:7: error: cannot jump from this asm goto statement to one of its possible targets
> >            if (!try_cmpxchg(&pcpu_cid->cid, &cid, lazy_cid))
> >                 ^
> >    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
> >            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> >            ^
> >    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
> >            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
> >                   ^
> >    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
> >    #define raw_cmpxchg arch_cmpxchg
> >                        ^
> >    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
> >            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
> >            ^
> >    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
> >                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
> >                    ^
> >    arch/riscv/include/asm/cmpxchg.h:144:3: note: expanded from macro '__arch_cmpxchg'
> >                    asm goto(ALTERNATIVE("nop", "j %[zacas]", 0,            \
> >                    ^
> >    kernel/sched/core.c:11873:7: note: possible target of asm goto statement
> >                    if (try_cmpxchg(&pcpu_cid->cid, &lazy_cid, MM_CID_UNSET))
> >                        ^
> >    include/linux/atomic/atomic-instrumented.h:4880:2: note: expanded from macro 'try_cmpxchg'
> >            raw_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> >            ^
> >    include/linux/atomic/atomic-arch-fallback.h:192:9: note: expanded from macro 'raw_try_cmpxchg'
> >            ___r = raw_cmpxchg((_ptr), ___o, (_new)); \
> >                   ^
> >    include/linux/atomic/atomic-arch-fallback.h:55:21: note: expanded from macro 'raw_cmpxchg'
> >    #define raw_cmpxchg arch_cmpxchg
> >                        ^
> >    arch/riscv/include/asm/cmpxchg.h:212:2: note: expanded from macro 'arch_cmpxchg'
> >            _arch_cmpxchg((ptr), (o), (n), ".rl", "", "     fence rw, rw\n")
> >            ^
> >    arch/riscv/include/asm/cmpxchg.h:189:3: note: expanded from macro '_arch_cmpxchg'
> >                    __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append,      \
> >                    ^
> >    arch/riscv/include/asm/cmpxchg.h:161:10: note: expanded from macro '__arch_cmpxchg'
> >                                                                            \
> >                                                                            ^
> >    kernel/sched/core.c:11872:2: note: jump bypasses initialization of variable with __attribute__((cleanup))
> >            scoped_guard (irqsave) {
> >            ^
> >    include/linux/cleanup.h:169:20: note: expanded from macro 'scoped_guard'
> >            for (CLASS(_name, scope)(args),                                 \
> >                              ^
> >    2 errors generated.
>
> Ugh, this is an unfortunate interaction with clang's jump scope analysis
> and asm goto in LLVM releases prior to 17 :/
>
> https://github.com/ClangBuiltLinux/linux/issues/1886#issuecomment-1645979992
>
> Unfortunately, 'if (0)' does not prevent this (the analysis runs early
> in the front end as far as I understand it), we would need to workaround
> this with full preprocessor guards...

Thanks for jumping in on this one, I was quite puzzled :)

>
> Another alternative would be to require LLVM 17+ for RISC-V, which may
> not be the worst alternative, since I think most people doing serious
> work with clang will probably be living close to tip of tree anyways
> because of all the extension work that goes on upstream.

Stupid question, but why was the fix in llvm 17 not backported to
previous versions?

Anyway, I'd rather require llvm 17+ than add a bunch of preprocessor
guards in this file (IIUC what you said above) as it is complex
enough.

@Conor Dooley @Palmer Dabbelt WDYT? Is there any interest in
supporting llvm < 17? We may encounter this bug again in the future so
I'd be in favor of moving to llvm 17+.

Thanks again Nathan,

Alex

>
> I am open to other thoughts though.
>
> Cheers,
> Nathan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas
  2024-07-16 12:19       ` Alexandre Ghiti
@ 2024-07-16 14:00         ` Nathan Chancellor
  0 siblings, 0 replies; 45+ messages in thread
From: Nathan Chancellor @ 2024-07-16 14:00 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Conor Dooley, kernel test robot, Jonathan Corbet, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrea Parri, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, Guo Ren, linux-doc, linux-kernel, linux-riscv,
	linux-arch, llvm, oe-kbuild-all

On Tue, Jul 16, 2024 at 02:19:57PM +0200, Alexandre Ghiti wrote:
> On Fri, Jul 5, 2024 at 7:27 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > Another alternative would be to require LLVM 17+ for RISC-V, which may
> > not be the worst alternative, since I think most people doing serious
> > work with clang will probably be living close to tip of tree anyways
> > because of all the extension work that goes on upstream.
> 
> Stupid question but why the fix in llvm 17 was not backported to
> previous versions?

Unfortunately, LLVM releases only receive fixes for a few months, unlike
GCC, which supports its releases for a few years. By the time this issue
was uncovered and resolved in LLVM main (17 at the time), LLVM 16 was no
longer supported.

I could potentially patch the kernel.org toolchains but that doesn't fix
the issue for other versions of clang out there.

> Anyway, I'd rather require llvm 17+ than add a bunch of preprocessor
> guards in this file (IIUC what you said above) as it is complex
> enough.

Sure, this is not a super unreasonable issue to bump the minimum
supported version for RISC-V over in my opinion, so no real objections
from me.

> @Conor Dooley @Palmer Dabbelt WDYT? Is there any interest in
> supporting llvm < 17? We may encounter this bug again in the future so
> I'd be in favor of moving to llvm 17+.

FWIW, I would envision a diff like this (assuming it actually works to
resolve this issue, I didn't actually test it):

diff --git a/scripts/min-tool-version.sh b/scripts/min-tool-version.sh
index 91c91201212c..e81eb7ed257d 100755
--- a/scripts/min-tool-version.sh
+++ b/scripts/min-tool-version.sh
@@ -28,6 +28,8 @@ llvm)
 		echo 15.0.0
 	elif [ "$SRCARCH" = loongarch ]; then
 		echo 18.0.0
+	elif [ "$SRCARCH" = riscv ]; then
+		echo 17.0.0
 	else
 		echo 13.0.1
 	fi

Cheers,
Nathan

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension
  2024-07-16  8:31             ` Guo Ren
@ 2024-07-17  6:19               ` Alexandre Ghiti
  0 siblings, 0 replies; 45+ messages in thread
From: Alexandre Ghiti @ 2024-07-17  6:19 UTC (permalink / raw)
  To: Guo Ren
  Cc: Waiman Long, Jonathan Corbet, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Andrea Parri, Nathan Chancellor, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Arnd Bergmann,
	Leonardo Bras, linux-doc, linux-kernel, linux-riscv, linux-arch

On Tue, Jul 16, 2024 at 10:31 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Jul 16, 2024 at 2:43 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > Hi Guo, Waiman,
> >
> > On Tue, Jul 16, 2024 at 3:05 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 3:30 AM Waiman Long <longman@redhat.com> wrote:
> > > >
> > > > On 7/15/24 03:27, Alexandre Ghiti wrote:
> > > > > Hi Guo,
> > > > >
> > > > > On Sun, Jul 7, 2024 at 4:20 AM Guo Ren <guoren@kernel.org> wrote:
> > > > >> On Wed, Jun 26, 2024 at 9:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > >>> In order to produce a generic kernel, a user can select
> > > > >>> CONFIG_COMBO_SPINLOCKS, which will fall back at runtime to the ticket
> > > > >>> spinlock implementation if Zabha is not present.
> > > > >>>
> > > > >>> Note that we can't use alternatives here because the discovery of
> > > > >>> extensions is done too late and we need to start with the qspinlock
> > > > >> That's not true; we treat spinlock as qspinlock at first.
> > > > > That's what I said: we have to use the qspinlock implementation at
> > > > > first *because* we can't discover the extensions soon enough to use
> > > > > the right spinlock implementation before the kernel uses a spinlock.
> > > > > And since the spinlocks are used *before* the discovery of the
> > > > > extensions, we cannot use the current alternative mechanism or we'd
> > > > > need to extend it to add an "initial" value which does not depend on
> > > > > the available extensions.
> > > >
> > > > With qspinlock, the lock remains zero after a lock/unlock sequence. That
> > > > is not the case with ticket lock. Assuming that all the discovery will
> > > > be done before SMP boot, the qspinlock slowpath won't be activated and
> > > > so we don't need the presence of any extension. I believe that is the
> > > > main reason why qspinlock is used as the initial default and not ticket
> > > > lock.
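
(For what it's worth, a minimal illustration of that point, assuming the
generic lock word layouts rather than anything riscv-specific, and
linux/types.h for u32/u16:

	typedef union {
		u32 val;		/* qspinlock view: 0 means unlocked        */
		struct {
			u16 owner;	/* ticket view: ticket being served        */
			u16 next;	/* ticket view: next ticket to hand out    */
		};
	} arch_spinlock_t;		/* field order assumes little-endian       */

	/* qspinlock, uncontended:  val           0 -> _Q_LOCKED_VAL -> 0       */
	/* ticket lock:             {next,owner}  {0,0} -> {1,0} -> {1,1}       */

so a lock taken and released while still running as a qspinlock is
indistinguishable from a never-used one, which is what makes qspinlock a
safe initial default.)
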
> > > Thx Waiman,
> > > Yes, qspinlock is a clean guy, but ticket lock is a dirty one.
> >
> > Guys, we all agree on that, that's why I kept this behaviour in this patchset.
> >
> > >
> > > Hi Alexandre,
> > > Therefore, the switch point (before reset_init()) is late enough to
> > > change the lock mechanism, and this satisfies the requirements of
> > > apply_boot_alternatives(), apply_early_boot_alternatives(), and
> > > apply_module_alternatives().
> >
> > I can't find reset_init().
> Sorry for the typo, rest_init()
> >
> > The discovery of the extensions is done in riscv_fill_hwcap() called
> > from setup_arch()
> > https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/setup.c#L288.
> > So *before* this point, we have no knowledge of the available
> > extensions on the platform. Now let's imagine that we use an
> > alternative for the qspinlock implementation; it would look like this
> > (I use only Zabha here, as an example):
> >
> > --- a/arch/riscv/include/asm/spinlock.h
> > +++ b/arch/riscv/include/asm/spinlock.h
> > @@ -16,8 +16,12 @@ DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> >  #define SPINLOCK_BASE_DECLARE(op, type, type_lock)                     \
> >  static __always_inline type arch_spin_##op(type_lock lock)             \
> >  {                                                                      \
> > -       if (static_branch_unlikely(&qspinlock_key))                     \
> > -               return queued_spin_##op(lock);                          \
> > +       asm goto(ALTERNATIVE("j %[no_zabha]", "nop", 0,                 \
> > +                                     RISCV_ISA_EXT_ZABHA, 1)            \
> > +                         : : : : no_zabha);                            \
> > +                                                                       \
> > +       return queued_spin_##op(lock);                                  \
> > +no_zabha:                                                              \
> >         return ticket_spin_##op(lock);                                  \
> >  }
> >
> > apply_early_boot_alternatives() can't be used to patch the above
> > alternative since it is called from setup_vm(), way before we know the
> > available extensions.
> > apply_boot_alternatives(), instead, is called after riscv_fill_hwcap()
> > and then will be able to patch the alternatives correctly
> > https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/setup.c#L290.
> >
> > But then, before apply_boot_alternatives(), any use of a spinlock
> > (printk does so) will use a ticket spinlock implementation, and not
> > the qspinlock implementation. How do you fix that?
> Why "before apply_boot_alternatives(), any use of a spinlock (printk
> does so) will use a ticket spinlock implementation" ?
> We treat qspinlock as the initial spinlock forever, so there is only
> qspinlock -> ticket_lock direction and no reverse.

Can you please provide an implementation of what you suggest using the
current alternatives tools? I'm about to send the v3 of this series;
you can use it as a base.
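
For reference, the static-key version of the forward-only switch you
describe is roughly the following (the hook name and the exact extension
checks are illustrative, not the final code), so the question is really
how the same single, one-direction decision would be expressed with the
alternatives framework:

	/* e.g. in arch/riscv/kernel/setup.c, run after riscv_fill_hwcap() */
	DEFINE_STATIC_KEY_TRUE(qspinlock_key);

	static void __init riscv_spinlock_init(void)
	{
		if (!IS_ENABLED(CONFIG_COMBO_SPINLOCKS))
			return;

		/*
		 * Flip to ticket locks exactly once, before rest_init()
		 * and before any secondary CPU is up, so the switch never
		 * has to go back in the other direction.
		 */
		if (!riscv_isa_extension_available(NULL, ZABHA) ||
		    !riscv_isa_extension_available(NULL, ZACAS))
			static_branch_disable(&qspinlock_key);
	}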

Thanks,

Alex

>
> >
> > >
> > > >
> > > > Cheers,
> > > > Longman
> > > >
> > >
> > >
> > > --
> > > Best Regards
> > >  Guo Ren
>
>
>
> --
> Best Regards
>  Guo Ren

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2024-07-17  6:19 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-26 13:03 [PATCH v2 00/10] Zacas/Zabha support and qspinlocks Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 01/10] riscv: Implement cmpxchg32/64() using Zacas Alexandre Ghiti
2024-06-27 11:06   ` Andrea Parri
2024-07-04 16:25     ` Alexandre Ghiti
2024-07-09 23:47       ` Andrea Parri
2024-07-15 11:48         ` Alexandre Ghiti
2024-07-04  3:38   ` kernel test robot
2024-07-05 17:27     ` Nathan Chancellor
2024-07-16 12:19       ` Alexandre Ghiti
2024-07-16 14:00         ` Nathan Chancellor
2024-06-26 13:03 ` [PATCH v2 02/10] dt-bindings: riscv: Add Zabha ISA extension description Alexandre Ghiti
2024-06-26 14:20   ` Krzysztof Kozlowski
2024-06-26 13:03 ` [PATCH v2 03/10] riscv: Implement cmpxchg8/16() using Zabha Alexandre Ghiti
2024-06-27 11:53   ` Andrea Parri
2024-06-29 19:19     ` Andrea Parri
2024-07-04 16:36     ` Alexandre Ghiti
2024-07-09 23:51       ` Andrea Parri
2024-07-15 12:56         ` Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 04/10] riscv: Improve amocas.X use in cmpxchg() Alexandre Ghiti
2024-06-27 13:31   ` Andrea Parri
2024-07-04 16:40     ` Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 05/10] riscv: Implement arch_cmpxchg128() using Zacas Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 06/10] riscv: Implement xchg8/16() using Zabha Alexandre Ghiti
2024-06-27 13:45   ` Andrea Parri
2024-07-04 17:25     ` Alexandre Ghiti
2024-07-10  1:37   ` Guo Ren
2024-07-15 13:20     ` Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 07/10] riscv: Improve amoswap.X use in xchg() Alexandre Ghiti
2024-06-27 13:58   ` Andrea Parri
2024-07-04 17:26     ` Alexandre Ghiti
2024-07-10  0:09       ` Andrea Parri
2024-06-26 13:03 ` [PATCH v2 08/10] asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 09/10] asm-generic: ticket-lock: Add separate ticket-lock.h Alexandre Ghiti
2024-06-26 13:03 ` [PATCH v2 10/10] riscv: Add qspinlock support based on Zabha extension Alexandre Ghiti
2024-06-27 15:19   ` Andrea Parri
2024-07-04 17:33     ` Alexandre Ghiti
2024-07-07  2:20   ` Guo Ren
2024-07-08 11:51     ` Guo Ren
2024-07-15  7:33       ` Alexandre Ghiti
2024-07-15  7:27     ` Alexandre Ghiti
2024-07-15 19:30       ` Waiman Long
2024-07-16  1:04         ` Guo Ren
2024-07-16  6:43           ` Alexandre Ghiti
2024-07-16  8:31             ` Guo Ren
2024-07-17  6:19               ` Alexandre Ghiti
