* [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops
@ 2025-08-16 15:19 Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 1/6] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Since Armv9.6, FEAT_LSUI supplies load/store instructions that let
privileged code access user memory without clearing the PSTATE.PAN bit.

This patchset adds support for FEAT_LSUI and applies it to the futex
atomic operations, replacing the ldxr/stlxr pair implementation that
clears PSTATE.PAN with the corresponding unprivileged load/store atomic
operations, which leave PSTATE.PAN set.
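As a rough sketch of the difference (simplified from patch #6; the
function name below is made up for illustration, while the operand
constraints and extable entry follow the real helpers), FUTEX_OP_ADD
with LSUI becomes a single unprivileged atomic instead of an ldxr/stlxr
retry loop run with PSTATE.PAN cleared:

    static __always_inline int
    lsui_futex_add_sketch(int oparg, u32 __user *uaddr, int *oval)
    {
        int ret = 0, oldval;

        uaccess_ttbr0_enable();                 /* PSTATE.PAN stays set */
        asm volatile(".arch_extension lsui\n"
        "1:     ldtaddal %w3, %w2, %1\n"        /* unprivileged atomic add */
        "2:\n"
        _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0)   /* on fault: %w0 = -EFAULT */
        : "+r" (ret), "+Q" (*uaddr), "=r" (oldval)
        : "r" (oparg)
        : "memory");
        uaccess_ttbr0_disable();

        if (!ret)
                *oval = oldval;

        return ret;
    }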
(Sorry, I sent patch version 7 incorrectly and am resending it.
Again, sorry for the mail bomb.)
Patch Sequences
================
Patch #1 adds the cpufeature for FEAT_LSUI
Patch #2 exposes FEAT_LSUI to the guest
Patch #3 adds the Kconfig option for FEAT_LSUI
Patch #4 refactors the former futex atomic-op implementation that uses
ll/sc and clears PSTATE.PAN
Patch #5 applies a small optimisation to __llsc_futex_atomic_set()
Patch #6 supports futex atomic ops with FEAT_LSUI
Patch History
==============
from v6 to v7:
- wrap FEAT_LSUI with CONFIG_AS_HAS_LSUI in cpufeature
- remove unnecessary addition of indentation.
- remove unnecessary mte_tco_enable()/disable() on LSUI operation.
- https://lore.kernel.org/all/20250811163635.1562145-1-yeoreum.yun@arm.com/
from v5 to v6:
- rebase to v6.17-rc1
- https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/
from v4 to v5:
- remove futex_ll_sc.h, futex_lsui.h and lsui.h and move them to futex.h
- reorganize the patches.
- https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/
from v3 to v4:
- rebase to v6.16-rc7
- modify some patch's title.
- https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/
from v2 to v3:
- expose FEAT_LSUI to guest
- add help section for the LSUI Kconfig
- https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/
from v1 to v2:
- remove empty v9.6 menu entry
- place HAS_LSUI in cpucaps in order
- https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/
Yeoreum Yun (6):
arm64: cpufeature: add FEAT_LSUI
KVM: arm64: expose FEAT_LSUI to guest
arm64: Kconfig: add LSUI Kconfig
arm64: futex: refactor futex atomic operation
arm64: futex: small optimisation for __llsc_futex_atomic_set()
arm64: futex: support futex with FEAT_LSUI
arch/arm64/Kconfig | 5 +
arch/arm64/include/asm/futex.h | 291 +++++++++++++++++++++++++++------
arch/arm64/kernel/cpufeature.c | 10 ++
arch/arm64/kvm/sys_regs.c | 5 +-
arch/arm64/tools/cpucaps | 1 +
5 files changed, 261 insertions(+), 51 deletions(-)
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
--
* [PATCH RESEND v7 1/6] arm64: cpufeature: add FEAT_LSUI
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
@ 2025-08-16 15:19 ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 2/6] KVM: arm64: expose FEAT_LSUI to guest Yeoreum Yun
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Since Armv9.6, FEAT_LSUI supplies load/store instructions that let
privileged code access user memory without clearing the PSTATE.PAN bit.
Add the LSUI cpufeature so that the unprivileged load/store instructions
can be used when the kernel accesses user memory without clearing
PSTATE.PAN.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/kernel/cpufeature.c | 10 ++++++++++
arch/arm64/tools/cpucaps | 1 +
2 files changed, 11 insertions(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9ad065f15f1d..b8660c8d51b2 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -278,6 +278,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
static const struct arm64_ftr_bits ftr_id_aa64isar3[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_FPRCVT_SHIFT, 4, 0),
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_LSUI_SHIFT, 4, ID_AA64ISAR3_EL1_LSUI_NI),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_FAMINMAX_SHIFT, 4, 0),
ARM64_FTR_END,
};
@@ -3131,6 +3132,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, GCIE, IMP)
},
+#ifdef CONFIG_AS_HAS_LSUI
+ {
+ .desc = "Unprivileged Load Store Instructions (LSUI)",
+ .capability = ARM64_HAS_LSUI,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_cpuid_feature,
+ ARM64_CPUID_FIELDS(ID_AA64ISAR3_EL1, LSUI, IMP)
+ },
+#endif
{},
};
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index ef0b7946f5a4..73f8e5211cd2 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -44,6 +44,7 @@ HAS_HCX
HAS_LDAPR
HAS_LPA2
HAS_LSE_ATOMICS
+HAS_LSUI
HAS_MOPS
HAS_NESTED_VIRT
HAS_BBML2_NOABORT
--
* [PATCH RESEND v7 2/6] KVM: arm64: expose FEAT_LSUI to guest
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 1/6] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
@ 2025-08-16 15:19 ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 3/6] arm64: Kconfig: add LSUI Kconfig Yeoreum Yun
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Expose FEAT_LSUI to the guest.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/sys_regs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 82ffb3b3b3cf..fb6c154aa37d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1642,7 +1642,8 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_WFxT);
break;
case SYS_ID_AA64ISAR3_EL1:
- val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX;
+ val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX |
+ ID_AA64ISAR3_EL1_LSUI;
break;
case SYS_ID_AA64MMFR2_EL1:
val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
@@ -2991,7 +2992,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64ISAR2_EL1_APA3 |
ID_AA64ISAR2_EL1_GPA3)),
ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT |
- ID_AA64ISAR3_EL1_FAMINMAX)),
+ ID_AA64ISAR3_EL1_FAMINMAX | ID_AA64ISAR3_EL1_LSUI)),
ID_UNALLOCATED(6,4),
ID_UNALLOCATED(6,5),
ID_UNALLOCATED(6,6),
--
* [PATCH RESEND v7 3/6] arm64: Kconfig: add LSUI Kconfig
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 1/6] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 2/6] KVM: arm64: expose FEAT_LSUI to guest Yeoreum Yun
@ 2025-08-16 15:19 ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 4/6] arm64: futex: refactor futex atomic operation Yeoreum Yun
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Since Armv9.6, FEAT_LSUI supplies load/store instructions that let
privileged code access user memory without clearing the PSTATE.PAN bit.

Adding CONFIG_AS_HAS_LSUI is sufficient because the LSUI code enables the
extension per asm block with individual `.arch_extension` directives.
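For reference, a condensed sketch of how the option is consumed later in
the series (the real definitions live in patch #6): when the assembler
understands the extension, the LSUI variants are built and selected at
runtime via the ARM64_HAS_LSUI cpucap; otherwise only the ll/sc fallback
is compiled.

    #ifdef CONFIG_AS_HAS_LSUI
    /* assembler knows ".arch_extension lsui": build both variants */
    #define __LSUI_PREAMBLE	".arch_extension lsui\n"
    #define __lsui_llsc_body(op, ...)					\
    ({									\
    	alternative_has_cap_likely(ARM64_HAS_LSUI) ?			\
    		__lsui_##op(__VA_ARGS__) : __llsc_##op(__VA_ARGS__);	\
    })
    #else
    /* assembler too old: always use the ll/sc implementation */
    #define __lsui_llsc_body(op, ...)	__llsc_##op(__VA_ARGS__)
    #endif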
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/Kconfig | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..c474de3dce02 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2239,6 +2239,11 @@ config ARM64_GCS
endmenu # "v9.4 architectural features"
+config AS_HAS_LSUI
+ def_bool $(as-instr,.arch_extension lsui)
+ help
+ Supported by LLVM 20 and later, not yet supported by GNU AS.
+
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
--
* [PATCH RESEND v7 4/6] arm64: futex: refactor futex atomic operation
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
` (2 preceding siblings ...)
2025-08-16 15:19 ` [PATCH RESEND v7 3/6] arm64: Kconfig: add LSUI Kconfig Yeoreum Yun
@ 2025-08-16 15:19 ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH v7 RESEND 5/6] arm64: futex: small optimisation for __llsc_futex_atomic_set() Yeoreum Yun
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Refactor the futex atomic operations that use the ll/sc method and clear
PSTATE.PAN, in preparation for applying FEAT_LSUI to them.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/include/asm/futex.h | 132 +++++++++++++++++++++------------
1 file changed, 84 insertions(+), 48 deletions(-)
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index bc06691d2062..ab7003cb4724 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -7,17 +7,21 @@
#include <linux/futex.h>
#include <linux/uaccess.h>
+#include <linux/stringify.h>
#include <asm/errno.h>
-#define FUTEX_MAX_LOOPS 128 /* What's the largest number you can think of? */
+#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
-#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
-do { \
- unsigned int loops = FUTEX_MAX_LOOPS; \
+#define LLSC_FUTEX_ATOMIC_OP(op, insn) \
+static __always_inline int \
+__llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
+{ \
+ unsigned int loops = LLSC_MAX_LOOPS; \
+ int ret, oldval, tmp; \
\
uaccess_enable_privileged(); \
- asm volatile( \
+ asm volatile("// __llsc_futex_atomic_" #op "\n" \
" prfm pstl1strm, %2\n" \
"1: ldxr %w1, %2\n" \
insn "\n" \
@@ -35,45 +39,103 @@ do { \
: "r" (oparg), "Ir" (-EAGAIN) \
: "memory"); \
uaccess_disable_privileged(); \
-} while (0)
+ \
+ if (!ret) \
+ *oval = oldval; \
+ \
+ return ret; \
+}
+
+LLSC_FUTEX_ATOMIC_OP(add, "add %w3, %w1, %w5")
+LLSC_FUTEX_ATOMIC_OP(or, "orr %w3, %w1, %w5")
+LLSC_FUTEX_ATOMIC_OP(and, "and %w3, %w1, %w5")
+LLSC_FUTEX_ATOMIC_OP(eor, "eor %w3, %w1, %w5")
+LLSC_FUTEX_ATOMIC_OP(set, "mov %w3, %w5")
+
+static __always_inline int
+__llsc_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+ int ret = 0;
+ unsigned int loops = LLSC_MAX_LOOPS;
+ u32 val, tmp;
+
+ uaccess_enable_privileged();
+ asm volatile("//__llsc_futex_cmpxchg\n"
+" prfm pstl1strm, %2\n"
+"1: ldxr %w1, %2\n"
+" eor %w3, %w1, %w5\n"
+" cbnz %w3, 4f\n"
+"2: stlxr %w3, %w6, %2\n"
+" cbz %w3, 3f\n"
+" sub %w4, %w4, %w3\n"
+" cbnz %w4, 1b\n"
+" mov %w0, %w7\n"
+"3:\n"
+" dmb ish\n"
+"4:\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
+ : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
+ : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
+ : "memory");
+ uaccess_disable_privileged();
+
+ if (!ret)
+ *oval = val;
+
+ return ret;
+}
+
+#define FUTEX_ATOMIC_OP(op) \
+static __always_inline int \
+__futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
+{ \
+ return __llsc_futex_atomic_##op(oparg, uaddr, oval); \
+}
+
+FUTEX_ATOMIC_OP(add)
+FUTEX_ATOMIC_OP(or)
+FUTEX_ATOMIC_OP(and)
+FUTEX_ATOMIC_OP(eor)
+FUTEX_ATOMIC_OP(set)
+
+static __always_inline int
+__futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+ return __llsc_futex_cmpxchg(uaddr, oldval, newval, oval);
+}
static inline int
arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr)
{
- int oldval = 0, ret, tmp;
- u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
+ int ret;
+ u32 __user *uaddr;
if (!access_ok(_uaddr, sizeof(u32)))
return -EFAULT;
+ uaddr = __uaccess_mask_ptr(_uaddr);
+
switch (op) {
case FUTEX_OP_SET:
- __futex_atomic_op("mov %w3, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_set(oparg, uaddr, oval);
break;
case FUTEX_OP_ADD:
- __futex_atomic_op("add %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_add(oparg, uaddr, oval);
break;
case FUTEX_OP_OR:
- __futex_atomic_op("orr %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_or(oparg, uaddr, oval);
break;
case FUTEX_OP_ANDN:
- __futex_atomic_op("and %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, ~oparg);
+ ret = __futex_atomic_and(~oparg, uaddr, oval);
break;
case FUTEX_OP_XOR:
- __futex_atomic_op("eor %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_eor(oparg, uaddr, oval);
break;
default:
ret = -ENOSYS;
}
- if (!ret)
- *oval = oldval;
-
return ret;
}
@@ -81,40 +143,14 @@ static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *_uaddr,
u32 oldval, u32 newval)
{
- int ret = 0;
- unsigned int loops = FUTEX_MAX_LOOPS;
- u32 val, tmp;
u32 __user *uaddr;
if (!access_ok(_uaddr, sizeof(u32)))
return -EFAULT;
uaddr = __uaccess_mask_ptr(_uaddr);
- uaccess_enable_privileged();
- asm volatile("// futex_atomic_cmpxchg_inatomic\n"
-" prfm pstl1strm, %2\n"
-"1: ldxr %w1, %2\n"
-" sub %w3, %w1, %w5\n"
-" cbnz %w3, 4f\n"
-"2: stlxr %w3, %w6, %2\n"
-" cbz %w3, 3f\n"
-" sub %w4, %w4, %w3\n"
-" cbnz %w4, 1b\n"
-" mov %w0, %w7\n"
-"3:\n"
-" dmb ish\n"
-"4:\n"
- _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
- _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
- : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
- : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
- : "memory");
- uaccess_disable_privileged();
-
- if (!ret)
- *uval = val;
- return ret;
+ return __futex_cmpxchg(uaddr, oldval, newval, uval);
}
#endif /* __ASM_FUTEX_H */
--
* [PATCH v7 RESEND 5/6] arm64: futex: small optimisation for __llsc_futex_atomic_set()
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
` (3 preceding siblings ...)
2025-08-16 15:19 ` [PATCH RESEND v7 4/6] arm64: futex: refactor futex atomic operation Yeoreum Yun
@ 2025-08-16 15:19 ` Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 6/6] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
2025-09-01 10:06 ` [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
__llsc_futex_atomic_set() is implemented using the LLSC_FUTEX_ATOMIC_OP()
macro with "mov %w3, %w5".
But this extra mov isn't required to implement futex_atomic_set(),
so make a small optimisation by implementing __llsc_futex_atomic_set()
as a separate function.
This also makes the usage of the LLSC_FUTEX_ATOMIC_OP() macro simpler.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/include/asm/futex.h | 43 ++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index ab7003cb4724..22a6301a9f3d 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -13,7 +13,7 @@
#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
-#define LLSC_FUTEX_ATOMIC_OP(op, insn) \
+#define LLSC_FUTEX_ATOMIC_OP(op, asm_op) \
static __always_inline int \
__llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
{ \
@@ -24,7 +24,7 @@ __llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
asm volatile("// __llsc_futex_atomic_" #op "\n" \
" prfm pstl1strm, %2\n" \
"1: ldxr %w1, %2\n" \
- insn "\n" \
+" " #asm_op " %w3, %w1, %w5\n" \
"2: stlxr %w0, %w3, %2\n" \
" cbz %w0, 3f\n" \
" sub %w4, %w4, %w0\n" \
@@ -46,11 +46,40 @@ __llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
return ret; \
}
-LLSC_FUTEX_ATOMIC_OP(add, "add %w3, %w1, %w5")
-LLSC_FUTEX_ATOMIC_OP(or, "orr %w3, %w1, %w5")
-LLSC_FUTEX_ATOMIC_OP(and, "and %w3, %w1, %w5")
-LLSC_FUTEX_ATOMIC_OP(eor, "eor %w3, %w1, %w5")
-LLSC_FUTEX_ATOMIC_OP(set, "mov %w3, %w5")
+LLSC_FUTEX_ATOMIC_OP(add, add)
+LLSC_FUTEX_ATOMIC_OP(or, orr)
+LLSC_FUTEX_ATOMIC_OP(and, and)
+LLSC_FUTEX_ATOMIC_OP(eor, eor)
+
+static __always_inline int
+__llsc_futex_atomic_set(int oparg, u32 __user *uaddr, int *oval)
+{
+ unsigned int loops = LLSC_MAX_LOOPS;
+ int ret, oldval;
+
+ uaccess_enable_privileged();
+ asm volatile("//__llsc_futex_xchg\n"
+" prfm pstl1strm, %2\n"
+"1: ldxr %w1, %2\n"
+"2: stlxr %w0, %w4, %2\n"
+" cbz %w0, 3f\n"
+" sub %w3, %w3, %w0\n"
+" cbnz %w3, 1b\n"
+" mov %w0, %w5\n"
+"3:\n"
+" dmb ish\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
+ : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "+r" (loops)
+ : "r" (oparg), "Ir" (-EAGAIN)
+ : "memory");
+ uaccess_disable_privileged();
+
+ if (!ret)
+ *oval = oldval;
+
+ return ret;
+}
static __always_inline int
__llsc_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
--
* [PATCH RESEND v7 6/6] arm64: futex: support futex with FEAT_LSUI
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
` (4 preceding siblings ...)
2025-08-16 15:19 ` [PATCH v7 RESEND 5/6] arm64: futex: small optimisation for __llsc_futex_atomic_set() Yeoreum Yun
@ 2025-08-16 15:19 ` Yeoreum Yun
2025-09-01 10:06 ` [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-08-16 15:19 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
The current futex atomic operations are implemented with ll/sc
instructions and by clearing PSTATE.PAN.
Since Armv9.6, FEAT_LSUI supplies not only load/store instructions but
also atomic operations for user memory accesses from the kernel, so the
PSTATE.PAN bit no longer needs to be cleared.
With these instructions, some of the futex atomic operations no longer
need an ldxr/stlxr pair and can instead be implemented with a single
atomic operation supplied by FEAT_LSUI.
However, some futex atomic operations still have to use the ll/sc
approach via the ldtxr/stltxr instructions supplied by FEAT_LSUI, because
there is no corresponding atomic instruction (e.g. eor) or the available
instruction doesn't support word-size operation (e.g. cas{al}t).
Even so, they now work without clearing the PSTATE.PAN bit.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/include/asm/futex.h | 130 ++++++++++++++++++++++++++++++++-
1 file changed, 129 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 22a6301a9f3d..ece35ca9b5d9 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -9,6 +9,8 @@
#include <linux/uaccess.h>
#include <linux/stringify.h>
+#include <asm/alternative.h>
+#include <asm/alternative-macros.h>
#include <asm/errno.h>
#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
@@ -115,11 +117,137 @@ __llsc_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
return ret;
}
+#ifdef CONFIG_AS_HAS_LSUI
+
+#define __LSUI_PREAMBLE ".arch_extension lsui\n"
+
+#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
+static __always_inline int \
+__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
+{ \
+ int ret = 0; \
+ int oldval; \
+ \
+ uaccess_ttbr0_enable(); \
+ asm volatile("// __lsui_futex_atomic_" #op "\n" \
+ __LSUI_PREAMBLE \
+"1: " #asm_op #mb " %w3, %w2, %1\n" \
+"2:\n" \
+ _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0) \
+ : "+r" (ret), "+Q" (*uaddr), "=r" (oldval) \
+ : "r" (oparg) \
+ : "memory"); \
+ uaccess_ttbr0_disable(); \
+ \
+ if (!ret) \
+ *oval = oldval; \
+ \
+ return ret; \
+}
+
+LSUI_FUTEX_ATOMIC_OP(add, ldtadd, al)
+LSUI_FUTEX_ATOMIC_OP(or, ldtset, al)
+LSUI_FUTEX_ATOMIC_OP(andnot, ldtclr, al)
+LSUI_FUTEX_ATOMIC_OP(set, swpt, al)
+
+static __always_inline int
+__lsui_futex_atomic_and(int oparg, u32 __user *uaddr, int *oval)
+{
+ return __lsui_futex_atomic_andnot(~oparg, uaddr, oval);
+}
+
+static __always_inline int
+__lsui_futex_atomic_eor(int oparg, u32 __user *uaddr, int *oval)
+{
+ unsigned int loops = LLSC_MAX_LOOPS;
+ int ret, oldval, tmp;
+
+ uaccess_ttbr0_enable();
+ /*
+ * there are no ldteor/stteor instructions...
+ */
+ asm volatile("// __lsui_futex_atomic_eor\n"
+ __LSUI_PREAMBLE
+" prfm pstl1strm, %2\n"
+"1: ldtxr %w1, %2\n"
+" eor %w3, %w1, %w5\n"
+"2: stltxr %w0, %w3, %2\n"
+" cbz %w0, 3f\n"
+" sub %w4, %w4, %w0\n"
+" cbnz %w4, 1b\n"
+" mov %w0, %w6\n"
+"3:\n"
+" dmb ish\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
+ : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp),
+ "+r" (loops)
+ : "r" (oparg), "Ir" (-EAGAIN)
+ : "memory");
+ uaccess_ttbr0_disable();
+
+ if (!ret)
+ *oval = oldval;
+
+ return ret;
+}
+
+static __always_inline int
+__lsui_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+ int ret = 0;
+ unsigned int loops = LLSC_MAX_LOOPS;
+ u32 val, tmp;
+
+ uaccess_ttbr0_enable();
+ /*
+ * cas{al}t doesn't support word size...
+ */
+ asm volatile("//__lsui_futex_cmpxchg\n"
+ __LSUI_PREAMBLE
+" prfm pstl1strm, %2\n"
+"1: ldtxr %w1, %2\n"
+" eor %w3, %w1, %w5\n"
+" cbnz %w3, 4f\n"
+"2: stltxr %w3, %w6, %2\n"
+" cbz %w3, 3f\n"
+" sub %w4, %w4, %w3\n"
+" cbnz %w4, 1b\n"
+" mov %w0, %w7\n"
+"3:\n"
+" dmb ish\n"
+"4:\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
+ : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
+ : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
+ : "memory");
+ uaccess_ttbr0_disable();
+
+ if (!ret)
+ *oval = val;
+
+ return ret;
+}
+
+#define __lsui_llsc_body(op, ...) \
+({ \
+ alternative_has_cap_likely(ARM64_HAS_LSUI) ? \
+ __lsui_##op(__VA_ARGS__) : __llsc_##op(__VA_ARGS__); \
+})
+
+#else /* CONFIG_AS_HAS_LSUI */
+
+#define __lsui_llsc_body(op, ...) __llsc_##op(__VA_ARGS__)
+
+#endif /* CONFIG_AS_HAS_LSUI */
+
+
#define FUTEX_ATOMIC_OP(op) \
static __always_inline int \
__futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
{ \
- return __llsc_futex_atomic_##op(oparg, uaddr, oval); \
+ return __lsui_llsc_body(futex_atomic_##op, oparg, uaddr, oval); \
}
FUTEX_ATOMIC_OP(add)
--
* Re: [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
` (5 preceding siblings ...)
2025-08-16 15:19 ` [PATCH RESEND v7 6/6] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
@ 2025-09-01 10:06 ` Yeoreum Yun
6 siblings, 0 replies; 8+ messages in thread
From: Yeoreum Yun @ 2025-09-01 10:06 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton, joey.gouly,
james.morse, ardb, scott, suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel
Gentle ping in case this was forgotten.
On Sat, Aug 16, 2025 at 04:19:23PM +0100, Yeoreum Yun wrote:
> Since Armv9.6, FEAT_LSUI supplies load/store instructions that let
> privileged code access user memory without clearing the PSTATE.PAN bit.
>
> This patchset adds support for FEAT_LSUI and applies it to the futex
> atomic operations, replacing the ldxr/stlxr pair implementation that
> clears PSTATE.PAN with the corresponding unprivileged load/store atomic
> operations, which leave PSTATE.PAN set.
>
> (Sorry, I sent patch version 7 incorrectly and am resending it.
> Again, sorry for the mail bomb.)
>
> Patch Sequences
> ================
>
> Patch #1 adds the cpufeature for FEAT_LSUI
>
> Patch #2 exposes FEAT_LSUI to the guest
>
> Patch #3 adds the Kconfig option for FEAT_LSUI
>
> Patch #4 refactors the former futex atomic-op implementation that uses
> ll/sc and clears PSTATE.PAN
>
> Patch #5 applies a small optimisation to __llsc_futex_atomic_set()
>
> Patch #6 supports futex atomic ops with FEAT_LSUI
>
> Patch History
> ==============
> from v6 to v7:
> - wrap FEAT_LSUI with CONFIG_AS_HAS_LSUI in cpufeature
> - remove unnecessary addition of indentation.
> - remove unnecessary mte_tco_enable()/disable() on LSUI operation.
> - https://lore.kernel.org/all/20250811163635.1562145-1-yeoreum.yun@arm.com/
>
> from v5 to v6:
> - rebase to v6.17-rc1
> - https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/
>
> from v4 to v5:
> - remove futex_ll_sc.h, futex_lsui.h and lsui.h and move them to futex.h
> - reorganize the patches.
> - https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/
>
> from v3 to v4:
> - rebase to v6.16-rc7
> - modify some patch's title.
> - https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/
>
> from v2 to v3:
> - expose FEAT_LSUI to guest
> - add help section for the LSUI Kconfig
> - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/
>
> from v1 to v2:
> - remove empty v9.6 menu entry
> - place HAS_LSUI in cpucaps in order
> - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/
>
> Yeoreum Yun (6):
> arm64: cpufeature: add FEAT_LSUI
> KVM: arm64: expose FEAT_LSUI to guest
> arm64: Kconfig: add LSUI Kconfig
> arm64: futex: refactor futex atomic operation
> arm64: futex: small optimisation for __llsc_futex_atomic_set()
> arm64: futex: support futex with FEAT_LSUI
>
> arch/arm64/Kconfig | 5 +
> arch/arm64/include/asm/futex.h | 291 +++++++++++++++++++++++++++------
> arch/arm64/kernel/cpufeature.c | 10 ++
> arch/arm64/kvm/sys_regs.c | 5 +-
> arch/arm64/tools/cpucaps | 1 +
> 5 files changed, 261 insertions(+), 51 deletions(-)
>
>
> base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
> --
>
--
Sincerely,
Yeoreum Yun
end of thread [~2025-09-01 11:49 UTC | newest]

Thread overview: 8+ messages
2025-08-16 15:19 [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 1/6] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 2/6] KVM: arm64: expose FEAT_LSUI to guest Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 3/6] arm64: Kconfig: add LSUI Kconfig Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 4/6] arm64: futex: refactor futex atomic operation Yeoreum Yun
2025-08-16 15:19 ` [PATCH v7 RESEND 5/6] arm64: futex: small optimisation for __llsc_futex_atomic_set() Yeoreum Yun
2025-08-16 15:19 ` [PATCH RESEND v7 6/6] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
2025-09-01 10:06 ` [PATCH RESEND v7 0/6] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun