linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops
@ 2025-06-17 18:36 Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 1/7] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN
bit.

This patchset supports FEAT_LSUI and applies it to the futex atomic
operations, replacing the ldxr/stlxr pair implementation that clears the
PSTATE.PAN bit with the corresponding unprivileged load/store atomic
operations, which need no clearing of PSTATE.PAN.

Patch Sequences
================

Patch #1 adds the cpufeature for FEAT_LSUI

Patch #2 exposes FEAT_LSUI to the guest

Patch #3 adds the Kconfig option for FEAT_LSUI

Patch #4 separates the former futex atomic-op implementation from futex.h
into futex_ll_sc_u.h

Patch #5 implements the futex atomic operations using LSUI instructions.

Patch #6 introduces lsui.h to apply runtime patching, falling back to the
former implementation when FEAT_LSUI isn't supported.

Patch #7 applies lsui.h in arch_futex_atomic_op_inuser().

Patch History
==============
from v2 to v3:
  - expose FEAT_LSUI to guest
  - add help section for LSUI Kconfig
  - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/

from v1 to v2:
  - remove empty v9.6 menu entry
  - locate HAS_LSUI in cpucaps in order
  - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/

Yeoreum Yun (7):
  arm64: cpufeature: add FEAT_LSUI
  arm64/kvm: expose FEAT_LSUI to guest
  arm64/Kconfig: add LSUI Kconfig
  arm64/futex: move futex atomic logic with clearing PAN bit
  arm64/futex: add futex atomic operation with FEAT_LSUI
  arm64/asm: introduce lsui.h
  arm64/futex: support futex with FEAT_LSUI

 arch/arm64/Kconfig                     |   9 ++
 arch/arm64/include/asm/futex.h         |  99 ++++++-------------
 arch/arm64/include/asm/futex_ll_sc_u.h | 115 +++++++++++++++++++++
 arch/arm64/include/asm/futex_lsui.h    | 132 +++++++++++++++++++++++++
 arch/arm64/include/asm/lsui.h          |  37 +++++++
 arch/arm64/kernel/cpufeature.c         |   8 ++
 arch/arm64/kvm/sys_regs.c              |   5 +-
 arch/arm64/tools/cpucaps               |   1 +
 8 files changed, 336 insertions(+), 70 deletions(-)
 create mode 100644 arch/arm64/include/asm/futex_ll_sc_u.h
 create mode 100644 arch/arm64/include/asm/futex_lsui.h
 create mode 100644 arch/arm64/include/asm/lsui.h

--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}



* [PATCH v3 1/7] arm64: cpufeature: add FEAT_LSUI
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 2/7] arm64/kvm: expose FEAT_LSUI to guest Yeoreum Yun
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN bit.

Add the LSUI feature so that the unprivileged load/store instructions
can be used when the kernel accesses user memory without clearing
PSTATE.PAN.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/kernel/cpufeature.c | 8 ++++++++
 arch/arm64/tools/cpucaps       | 1 +
 2 files changed, 9 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b34044e20128..d914982c7cee 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -278,6 +278,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64isar3[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_FPRCVT_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_LSUI_SHIFT, 4, ID_AA64ISAR3_EL1_LSUI_NI),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_FAMINMAX_SHIFT, 4, 0),
 	ARM64_FTR_END,
 };
@@ -3061,6 +3062,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_pmuv3,
 	},
 #endif
+	{
+		.desc = "Unprivileged Load Store Instructions (LSUI)",
+		.capability = ARM64_HAS_LSUI,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		ARM64_CPUID_FIELDS(ID_AA64ISAR3_EL1, LSUI, IMP)
+	},
 	{},
 };
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 10effd4cff6b..31f2cd655666 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -43,6 +43,7 @@ HAS_HCX
 HAS_LDAPR
 HAS_LPA2
 HAS_LSE_ATOMICS
+HAS_LSUI
 HAS_MOPS
 HAS_NESTED_VIRT
 HAS_PAN
-- 



* [PATCH v3 2/7] arm64/kvm: expose FEAT_LSUI to guest
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 1/7] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-07-02 17:17   ` Marc Zyngier
  2025-06-17 18:36 ` [PATCH v3 3/7] arm64/Kconfig: add LSUI Kconfig Yeoreum Yun
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Expose FEAT_LSUI to the guest.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 76c2f0da821f..5c5a9c3ace2f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1636,7 +1636,8 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
 			val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_WFxT);
 		break;
 	case SYS_ID_AA64ISAR3_EL1:
-		val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX;
+		val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX |
+		       ID_AA64ISAR3_EL1_LSUI;
 		break;
 	case SYS_ID_AA64MMFR2_EL1:
 		val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
@@ -2921,7 +2922,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 					ID_AA64ISAR2_EL1_APA3 |
 					ID_AA64ISAR2_EL1_GPA3)),
 	ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT |
-				       ID_AA64ISAR3_EL1_FAMINMAX)),
+				       ID_AA64ISAR3_EL1_FAMINMAX | ID_AA64ISAR3_EL1_LSUI)),
 	ID_UNALLOCATED(6,4),
 	ID_UNALLOCATED(6,5),
 	ID_UNALLOCATED(6,6),
-- 



* [PATCH v3 3/7] arm64/Kconfig: add LSUI Kconfig
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 1/7] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 2/7] arm64/kvm: expose FEAT_LSUI to guest Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 4/7] arm64/futex: move futex atomic logic with clearing PAN bit Yeoreum Yun
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN bit.

Adding CONFIG_AS_HAS_LSUI is sufficient because the LSUI code uses
individual `.arch_extension` directives.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/Kconfig | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 55fc331af337..769fbb507996 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2237,6 +2237,15 @@ config ARM64_GCS
 
 endmenu # "v9.4 architectural features"
 
+config AS_HAS_LSUI
+	def_bool $(as-instr,.arch_extension lsui)
+	help
+	 Unprivileged Load Store is an extension to introduce unprivileged
+	 variants of load and store instructions so that clearing PSTATE.PAN
+	 is never required in privileged mode.
+	 This feature is available in clang version 20 and later and is not
+	 yet supported by gcc.
+
 config ARM64_SVE
 	bool "ARM Scalable Vector Extension support"
 	default y
-- 



* [PATCH v3 4/7] arm64/futex: move futex atomic logic with clearing PAN bit
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
                   ` (2 preceding siblings ...)
  2025-06-17 18:36 ` [PATCH v3 3/7] arm64/Kconfig: add LSUI Kconfig Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 5/7] arm64/futex: add futex atomic operation with FEAT_LSUI Yeoreum Yun
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Move the current futex atomic logic, which uses the ll/sc method with
clearing of PSTATE.PAN, to a separate file (futex_ll_sc_u.h) so that the
former method is used only when FEAT_LSUI isn't supported.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/include/asm/futex_ll_sc_u.h | 115 +++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 arch/arm64/include/asm/futex_ll_sc_u.h

diff --git a/arch/arm64/include/asm/futex_ll_sc_u.h b/arch/arm64/include/asm/futex_ll_sc_u.h
new file mode 100644
index 000000000000..6702ba66f1b2
--- /dev/null
+++ b/arch/arm64/include/asm/futex_ll_sc_u.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025 Arm Ltd.
+ */
+#ifndef __ASM_FUTEX_LL_SC_U_H
+#define __ASM_FUTEX_LL_SC_U_H
+
+#include <linux/uaccess.h>
+#include <linux/stringify.h>
+
+#define FUTEX_ATOMIC_OP(op, asm_op)					\
+static __always_inline int						\
+__ll_sc_u_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval)	\
+{									\
+	unsigned int loops = LL_SC_MAX_LOOPS;				\
+	int ret, val, tmp;						\
+									\
+	uaccess_enable_privileged();					\
+	asm volatile("// __ll_sc_u_futex_atomic_" #op "\n"		\
+	"	prfm	pstl1strm, %2\n"				\
+	"1:	ldxr	%w1, %2\n"					\
+	"	" #asm_op "	%w3, %w1, %w5\n"			\
+	"2:	stlxr	%w0, %w3, %2\n"					\
+	"	cbz	%w0, 3f\n"					\
+	"	sub	%w4, %w4, %w0\n"				\
+	"	cbnz	%w4, 1b\n"					\
+	"	mov	%w0, %w6\n"					\
+	"3:\n"								\
+	"	dmb	ish\n"						\
+	_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)				\
+	_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)				\
+	: "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp),		\
+	  "+r" (loops)							\
+	: "r" (oparg), "Ir" (-EAGAIN)					\
+	: "memory");							\
+	uaccess_disable_privileged();					\
+									\
+	if (!ret)							\
+		*oval = val;						\
+									\
+	return ret;							\
+}
+
+FUTEX_ATOMIC_OP(add, add)
+FUTEX_ATOMIC_OP(or, orr)
+FUTEX_ATOMIC_OP(and, and)
+FUTEX_ATOMIC_OP(eor, eor)
+
+#undef FUTEX_ATOMIC_OP
+
+static __always_inline int
+__ll_sc_u_futex_atomic_set(int oparg, u32 __user *uaddr, int *oval)
+{
+	unsigned int loops = LL_SC_MAX_LOOPS;
+	int ret, val;
+
+	uaccess_enable_privileged();
+	asm volatile("//__ll_sc_u_futex_xchg\n"
+	"	prfm	pstl1strm, %2\n"
+	"1:	ldxr	%w1, %2\n"
+	"2:	stlxr	%w0, %w4, %2\n"
+	"	cbz	%w0, 3f\n"
+	"	sub	%w3, %w3, %w0\n"
+	"	cbnz	%w3, 1b\n"
+	"	mov	%w0, %w5\n"
+	"3:\n"
+	"	dmb	ish\n"
+	_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
+	_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
+	: "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "+r" (loops)
+	: "r" (oparg), "Ir" (-EAGAIN)
+	: "memory");
+	uaccess_disable_privileged();
+
+	if (!ret)
+		*oval = val;
+
+	return ret;
+}
+
+static __always_inline int
+__ll_sc_u_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+	int ret = 0;
+	unsigned int loops = LL_SC_MAX_LOOPS;
+	u32 val, tmp;
+
+	uaccess_enable_privileged();
+	asm volatile("//__ll_sc_u_futex_cmpxchg\n"
+	"	prfm	pstl1strm, %2\n"
+	"1:	ldxr	%w1, %2\n"
+	"	eor	%w3, %w1, %w5\n"
+	"	cbnz	%w3, 4f\n"
+	"2:	stlxr	%w3, %w6, %2\n"
+	"	cbz	%w3, 3f\n"
+	"	sub	%w4, %w4, %w3\n"
+	"	cbnz	%w4, 1b\n"
+	"	mov	%w0, %w7\n"
+	"3:\n"
+	"	dmb	ish\n"
+	"4:\n"
+	_ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
+	_ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
+	: "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
+	: "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
+	: "memory");
+	uaccess_disable_privileged();
+
+	if (!ret)
+		*oval = val;
+
+	return ret;
+}
+
+#endif /* __ASM_FUTEX_LL_SC_U_H */
-- 



* [PATCH v3 5/7] arm64/futex: add futex atomic operation with FEAT_LSUI
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
                   ` (3 preceding siblings ...)
  2025-06-17 18:36 ` [PATCH v3 4/7] arm64/futex: move futex atomic logic with clearing PAN bit Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 6/7] arm64/asm: introduce lsui.h Yeoreum Yun
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Current futex atomic operations are implemented with ll/sc instructions
and clearing of PSTATE.PAN.

Since Armv9.6, FEAT_LSUI supplies not only load/store instructions but
also atomic operations for user memory access from the kernel, so the
PSTATE.PAN bit no longer needs to be cleared.

With these instructions, some futex atomic operations no longer need to
be implemented with an ldxr/stlxr pair; they can instead be implemented
with a single atomic operation supplied by FEAT_LSUI.

However, some futex atomic operations still need to use the ll/sc method
via the ldtxr/stltxr instructions supplied by FEAT_LSUI, because there is
either no corresponding atomic instruction or no word-size variant
(e.g. eor, cas{mb}t). Even so, they work without clearing PSTATE.PAN.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/include/asm/futex_lsui.h | 132 ++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100644 arch/arm64/include/asm/futex_lsui.h

diff --git a/arch/arm64/include/asm/futex_lsui.h b/arch/arm64/include/asm/futex_lsui.h
new file mode 100644
index 000000000000..0dc7dca91cdb
--- /dev/null
+++ b/arch/arm64/include/asm/futex_lsui.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025 Arm Ltd.
+ */
+
+#ifndef __ASM_FUTEX_LSUI_H
+#define __ASM_FUTEX_LSUI_H
+
+#include <linux/uaccess.h>
+#include <linux/stringify.h>
+
+#define FUTEX_ATOMIC_OP(op, asm_op, mb)					\
+static __always_inline int						\
+__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval)	\
+{									\
+	int ret = 0;							\
+	int val;							\
+									\
+	mte_enable_tco();						\
+	uaccess_ttbr0_enable();						\
+									\
+	asm volatile("// __lsui_futex_atomic_" #op "\n"			\
+	__LSUI_PREAMBLE							\
+	"1:	" #asm_op #mb "	%w3, %w2, %1\n"				\
+	"2:\n"								\
+	_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0)				\
+	: "+r" (ret), "+Q" (*uaddr), "=r" (val)				\
+	: "r" (oparg)							\
+	: "memory");							\
+									\
+	mte_disable_tco();						\
+	uaccess_ttbr0_disable();					\
+									\
+	if (!ret)							\
+		*oval = val;						\
+									\
+	return ret;							\
+}
+
+FUTEX_ATOMIC_OP(add, ldtadd, al)
+FUTEX_ATOMIC_OP(or, ldtset, al)
+FUTEX_ATOMIC_OP(andnot, ldtclr, al)
+FUTEX_ATOMIC_OP(set, swpt, al)
+
+#undef FUTEX_ATOMIC_OP
+
+static __always_inline int
+__lsui_futex_atomic_and(int oparg, u32 __user *uaddr, int *oval)
+{
+	return __lsui_futex_atomic_andnot(~oparg, uaddr, oval);
+}
+
+static __always_inline int
+__lsui_futex_atomic_eor(int oparg, u32 __user *uaddr, int *oval)
+{
+	unsigned int loops = LL_SC_MAX_LOOPS;
+	int ret, val, tmp;
+
+	mte_enable_tco();
+	uaccess_ttbr0_enable();
+
+	asm volatile("// __lsui_futex_atomic_eor\n"
+	__LSUI_PREAMBLE
+	"	prfm	pstl1strm, %2\n"
+	"1:	ldtxr	%w1, %2\n"
+	"	eor	%w3, %w1, %w5\n"
+	"2:	stltxr	%w0, %w3, %2\n"
+	"	cbz	%w0, 3f\n"
+	"	sub	%w4, %w4, %w0\n"
+	"	cbnz	%w4, 1b\n"
+	"	mov	%w0, %w6\n"
+	"3:\n"
+	"	dmb	ish\n"
+	_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
+	_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
+	: "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp),
+	  "+r" (loops)
+	: "r" (oparg), "Ir" (-EAGAIN)
+	: "memory");
+
+	mte_disable_tco();
+	uaccess_ttbr0_disable();
+
+	if (!ret)
+		*oval = val;
+
+	return ret;
+}
+
+static __always_inline int
+__lsui_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+	int ret = 0;
+	unsigned int loops = LL_SC_MAX_LOOPS;
+	u32 val, tmp;
+
+	mte_enable_tco();
+	uaccess_ttbr0_enable();
+
+	/*
+	 * cas{al}t doesn't support word size...
+	 */
+	asm volatile("//__lsui_futex_cmpxchg\n"
+	__LSUI_PREAMBLE
+	"	prfm	pstl1strm, %2\n"
+	"1:	ldtxr	%w1, %2\n"
+	"	eor	%w3, %w1, %w5\n"
+	"	cbnz	%w3, 4f\n"
+	"2:	stltxr	%w3, %w6, %2\n"
+	"	cbz	%w3, 3f\n"
+	"	sub	%w4, %w4, %w3\n"
+	"	cbnz	%w4, 1b\n"
+	"	mov	%w0, %w7\n"
+	"3:\n"
+	"	dmb	ish\n"
+	"4:\n"
+	_ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
+	_ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
+	: "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
+	: "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
+	: "memory");
+
+	mte_disable_tco();
+	uaccess_ttbr0_disable();
+
+	if (!ret)
+		*oval = val;
+
+	return ret;
+}
+
+#endif /* __ASM_FUTEX_LSUI_H */
-- 



* [PATCH v3 6/7] arm64/asm: introduce lsui.h
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
                   ` (4 preceding siblings ...)
  2025-06-17 18:36 ` [PATCH v3 5/7] arm64/futex: add futex atomic operation with FEAT_LSUI Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-06-17 18:36 ` [PATCH v3 7/7] arm64/futex: support futex with FEAT_LSUI Yeoreum Yun
  2025-06-26  7:56 ` [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Introduce the lsui.h header file, which applies runtime patching to use
the unprivileged load/store instructions when the CPU supports FEAT_LSUI,
and otherwise falls back to the ll/sc implementation that clears the
PSTATE.PAN bit.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/include/asm/lsui.h | 37 +++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 arch/arm64/include/asm/lsui.h

diff --git a/arch/arm64/include/asm/lsui.h b/arch/arm64/include/asm/lsui.h
new file mode 100644
index 000000000000..39bf232f3eb7
--- /dev/null
+++ b/arch/arm64/include/asm/lsui.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025 Arm Ltd.
+ */
+#ifndef __ASM_LSUI_H
+#define __ASM_LSUI_H
+
+#define LL_SC_MAX_LOOPS	128 /* What's the largest number you can think of? */
+
+#include <asm/futex_ll_sc_u.h>
+
+#ifdef CONFIG_AS_HAS_LSUI
+
+#define __LSUI_PREAMBLE	".arch_extension lsui\n"
+
+#include <linux/compiler_types.h>
+#include <linux/export.h>
+#include <linux/stringify.h>
+#include <asm/alternative.h>
+#include <asm/alternative-macros.h>
+#include <asm/cpucaps.h>
+
+#include <asm/futex_lsui.h>
+
+#define __lsui_ll_sc_u_body(op, ...)					\
+({									\
+	alternative_has_cap_likely(ARM64_HAS_LSUI) ?		\
+		__lsui_##op(__VA_ARGS__) :				\
+		__ll_sc_u_##op(__VA_ARGS__);				\
+})
+
+#else	/* CONFIG_AS_HAS_LSUI */
+
+#define __lsui_ll_sc_u_body(op, ...)		__ll_sc_u_##op(__VA_ARGS__)
+
+#endif	/* CONFIG_AS_HAS_LSUI */
+#endif	/* __ASM_LSUI_H */
-- 



* [PATCH v3 7/7] arm64/futex: support futex with FEAT_LSUI
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
                   ` (5 preceding siblings ...)
  2025-06-17 18:36 ` [PATCH v3 6/7] arm64/asm: introduce lsui.h Yeoreum Yun
@ 2025-06-17 18:36 ` Yeoreum Yun
  2025-06-26  7:56 ` [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-17 18:36 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel, Yeoreum Yun

Since Armv9.6, FEAT_LSUI supplies unprivileged load/store instructions
that allow the kernel to access user memory without clearing PSTATE.PAN.

Make futex use the futex atomic operations implemented with these
instructions when the CPU supports FEAT_LSUI; otherwise it falls back to
ldxr/stlxr with clearing of the PSTATE.PAN bit.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 arch/arm64/include/asm/futex.h | 99 +++++++++++-----------------------
 1 file changed, 31 insertions(+), 68 deletions(-)

diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index bc06691d2062..ed4586776655 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -9,71 +9,60 @@
 #include <linux/uaccess.h>
 
 #include <asm/errno.h>
+#include <asm/lsui.h>
 
-#define FUTEX_MAX_LOOPS	128 /* What's the largest number you can think of? */
-
-#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg)		\
-do {									\
-	unsigned int loops = FUTEX_MAX_LOOPS;				\
-									\
-	uaccess_enable_privileged();					\
-	asm volatile(							\
-"	prfm	pstl1strm, %2\n"					\
-"1:	ldxr	%w1, %2\n"						\
-	insn "\n"							\
-"2:	stlxr	%w0, %w3, %2\n"						\
-"	cbz	%w0, 3f\n"						\
-"	sub	%w4, %w4, %w0\n"					\
-"	cbnz	%w4, 1b\n"						\
-"	mov	%w0, %w6\n"						\
-"3:\n"									\
-"	dmb	ish\n"							\
-	_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)				\
-	_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)				\
-	: "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp),	\
-	  "+r" (loops)							\
-	: "r" (oparg), "Ir" (-EAGAIN)					\
-	: "memory");							\
-	uaccess_disable_privileged();					\
-} while (0)
+#define FUTEX_ATOMIC_OP(op)						\
+static __always_inline int						\
+__futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval)		\
+{									\
+	return __lsui_ll_sc_u_body(futex_atomic_##op, oparg, uaddr, oval); \
+}
+
+FUTEX_ATOMIC_OP(add)
+FUTEX_ATOMIC_OP(or)
+FUTEX_ATOMIC_OP(and)
+FUTEX_ATOMIC_OP(eor)
+FUTEX_ATOMIC_OP(set)
+
+#undef FUTEX_ATOMIC_OP
+
+static __always_inline int
+__futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+	return __lsui_ll_sc_u_body(futex_cmpxchg, uaddr, oldval, newval, oval);
+}
 
 static inline int
 arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr)
 {
-	int oldval = 0, ret, tmp;
-	u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
+	int ret;
+	u32 __user *uaddr;
 
 	if (!access_ok(_uaddr, sizeof(u32)))
 		return -EFAULT;
 
+	uaddr = __uaccess_mask_ptr(_uaddr);
+
 	switch (op) {
 	case FUTEX_OP_SET:
-		__futex_atomic_op("mov	%w3, %w5",
-				  ret, oldval, uaddr, tmp, oparg);
+		ret = __futex_atomic_set(oparg, uaddr, oval);
 		break;
 	case FUTEX_OP_ADD:
-		__futex_atomic_op("add	%w3, %w1, %w5",
-				  ret, oldval, uaddr, tmp, oparg);
+		ret = __futex_atomic_add(oparg, uaddr, oval);
 		break;
 	case FUTEX_OP_OR:
-		__futex_atomic_op("orr	%w3, %w1, %w5",
-				  ret, oldval, uaddr, tmp, oparg);
+		ret = __futex_atomic_or(oparg, uaddr, oval);
 		break;
 	case FUTEX_OP_ANDN:
-		__futex_atomic_op("and	%w3, %w1, %w5",
-				  ret, oldval, uaddr, tmp, ~oparg);
+		ret = __futex_atomic_and(~oparg, uaddr, oval);
 		break;
 	case FUTEX_OP_XOR:
-		__futex_atomic_op("eor	%w3, %w1, %w5",
-				  ret, oldval, uaddr, tmp, oparg);
+		ret = __futex_atomic_eor(oparg, uaddr, oval);
 		break;
 	default:
 		ret = -ENOSYS;
 	}
 
-	if (!ret)
-		*oval = oldval;
-
 	return ret;
 }
 
@@ -81,40 +70,14 @@ static inline int
 futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *_uaddr,
 			      u32 oldval, u32 newval)
 {
-	int ret = 0;
-	unsigned int loops = FUTEX_MAX_LOOPS;
-	u32 val, tmp;
 	u32 __user *uaddr;
 
 	if (!access_ok(_uaddr, sizeof(u32)))
 		return -EFAULT;
 
 	uaddr = __uaccess_mask_ptr(_uaddr);
-	uaccess_enable_privileged();
-	asm volatile("// futex_atomic_cmpxchg_inatomic\n"
-"	prfm	pstl1strm, %2\n"
-"1:	ldxr	%w1, %2\n"
-"	sub	%w3, %w1, %w5\n"
-"	cbnz	%w3, 4f\n"
-"2:	stlxr	%w3, %w6, %2\n"
-"	cbz	%w3, 3f\n"
-"	sub	%w4, %w4, %w3\n"
-"	cbnz	%w4, 1b\n"
-"	mov	%w0, %w7\n"
-"3:\n"
-"	dmb	ish\n"
-"4:\n"
-	_ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
-	_ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
-	: "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
-	: "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
-	: "memory");
-	uaccess_disable_privileged();
-
-	if (!ret)
-		*uval = val;
 
-	return ret;
+	return __futex_cmpxchg(uaddr, oldval, newval, uval);
 }
 
 #endif /* __ASM_FUTEX_H */
-- 



* Re: [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops
  2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
                   ` (6 preceding siblings ...)
  2025-06-17 18:36 ` [PATCH v3 7/7] arm64/futex: support futex with FEAT_LSUI Yeoreum Yun
@ 2025-06-26  7:56 ` Yeoreum Yun
  7 siblings, 0 replies; 10+ messages in thread
From: Yeoreum Yun @ 2025-06-26  7:56 UTC (permalink / raw)
  To: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott
  Cc: linux-arm-kernel, linux-kernel

Gentle ping in case this has been forgotten.

> Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
> privileged code to access user memory without clearing the PSTATE.PAN
> bit.
>
> This patchset supports FEAT_LSUI and applies it to the futex atomic
> operations, replacing the ldxr/stlxr pair implementation that clears the
> PSTATE.PAN bit with the corresponding unprivileged load/store atomic
> operations, which need no clearing of PSTATE.PAN.
>
> Patch Sequences
> ================
>
> Patch #1 adds the cpufeature for FEAT_LSUI
>
> Patch #2 exposes FEAT_LSUI to the guest
>
> Patch #3 adds the Kconfig option for FEAT_LSUI
>
> Patch #4 separates the former futex atomic-op implementation from futex.h
> into futex_ll_sc_u.h
>
> Patch #5 implements the futex atomic operations using LSUI instructions.
>
> Patch #6 introduces lsui.h to apply runtime patching, falling back to the
> former implementation when FEAT_LSUI isn't supported.
>
> Patch #7 applies lsui.h in arch_futex_atomic_op_inuser().
>
> Patch History
> ==============
> from v2 to v3:
>   - expose FEAT_LSUI to guest
>   - add help section for LSUI Kconfig
>   - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/
>
> from v1 to v2:
>   - remove empty v9.6 menu entry
>   - locate HAS_LSUI in cpucaps in order
>   - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/
>
> Yeoreum Yun (7):
>   arm64: cpufeature: add FEAT_LSUI
>   arm64/kvm: expose FEAT_LSUI to guest
>   arm64/Kconfig: add LSUI Kconfig
>   arm64/futex: move futex atomic logic with clearing PAN bit
>   arm64/futex: add futex atomic operation with FEAT_LSUI
>   arm64/asm: introduce lsui.h
>   arm64/futex: support futex with FEAT_LSUI
>
>  arch/arm64/Kconfig                     |   9 ++
>  arch/arm64/include/asm/futex.h         |  99 ++++++-------------
>  arch/arm64/include/asm/futex_ll_sc_u.h | 115 +++++++++++++++++++++
>  arch/arm64/include/asm/futex_lsui.h    | 132 +++++++++++++++++++++++++
>  arch/arm64/include/asm/lsui.h          |  37 +++++++
>  arch/arm64/kernel/cpufeature.c         |   8 ++
>  arch/arm64/kvm/sys_regs.c              |   5 +-
>  arch/arm64/tools/cpucaps               |   1 +
>  8 files changed, 336 insertions(+), 70 deletions(-)
>  create mode 100644 arch/arm64/include/asm/futex_ll_sc_u.h
>  create mode 100644 arch/arm64/include/asm/futex_lsui.h
>  create mode 100644 arch/arm64/include/asm/lsui.h
>
> --
>

--
Sincerely,
Yeoreum Yun


* Re: [PATCH v3 2/7] arm64/kvm: expose FEAT_LSUI to guest
  2025-06-17 18:36 ` [PATCH v3 2/7] arm64/kvm: expose FEAT_LSUI to guest Yeoreum Yun
@ 2025-07-02 17:17   ` Marc Zyngier
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2025-07-02 17:17 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: catalin.marinas, will, broonie, oliver.upton, ardb, frederic,
	james.morse, joey.gouly, scott, linux-arm-kernel, linux-kernel

On Tue, 17 Jun 2025 19:36:30 +0100,
Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> 
> Expose FEAT_LSUI to the guest.
> 
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 76c2f0da821f..5c5a9c3ace2f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1636,7 +1636,8 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
>  			val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_WFxT);
>  		break;
>  	case SYS_ID_AA64ISAR3_EL1:
> -		val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX;
> +		val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX |
> +		       ID_AA64ISAR3_EL1_LSUI;
>  		break;
>  	case SYS_ID_AA64MMFR2_EL1:
>  		val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
> @@ -2921,7 +2922,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  					ID_AA64ISAR2_EL1_APA3 |
>  					ID_AA64ISAR2_EL1_GPA3)),
>  	ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT |
> -				       ID_AA64ISAR3_EL1_FAMINMAX)),
> +				       ID_AA64ISAR3_EL1_FAMINMAX | ID_AA64ISAR3_EL1_LSUI)),
>  	ID_UNALLOCATED(6,4),
>  	ID_UNALLOCATED(6,5),
>  	ID_UNALLOCATED(6,6),

In the future, please Cc the relevant people and mailing lists.

With $SUBJECT fixed to match the KVM/arm64 log,

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.


end of thread, other threads:[~2025-07-02 17:17 UTC | newest]

Thread overview: 10+ messages
-- links below jump to the message on this page --
2025-06-17 18:36 [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-06-17 18:36 ` [PATCH v3 1/7] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
2025-06-17 18:36 ` [PATCH v3 2/7] arm64/kvm: expose FEAT_LSUI to guest Yeoreum Yun
2025-07-02 17:17   ` Marc Zyngier
2025-06-17 18:36 ` [PATCH v3 3/7] arm64/Kconfig: add LSUI Kconfig Yeoreum Yun
2025-06-17 18:36 ` [PATCH v3 4/7] arm64/futex: move futex atomic logic with clearing PAN bit Yeoreum Yun
2025-06-17 18:36 ` [PATCH v3 5/7] arm64/futex: add futex atomic operation with FEAT_LSUI Yeoreum Yun
2025-06-17 18:36 ` [PATCH v3 6/7] arm64/asm: introduce lsui.h Yeoreum Yun
2025-06-17 18:36 ` [PATCH v3 7/7] arm64/futex: support futex with FEAT_LSUI Yeoreum Yun
2025-06-26  7:56 ` [PATCH v3 0/7] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
