* [PATCH v6 0/5] support FEAT_LSUI and apply it on futex atomic ops
@ 2025-08-11 16:36 Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-11 16:36 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton,
shameerali.kolothum.thodi, joey.gouly, james.morse, ardb, scott,
suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN
bit.
This patchset adds support for FEAT_LSUI and applies it to the futex
atomic operations, replacing the ldxr/stlxr-pair implementation that
clears the PSTATE.PAN bit with the corresponding unprivileged load/store
atomic operations, which need no PSTATE.PAN manipulation (see the sketch
below).
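As a rough illustration of the difference (a hedged sketch based on the
hunks in this series, with operands simplified):

    /* before: ll/sc retry loop with PSTATE.PAN cleared around it */
    uaccess_enable_privileged();            /* clears PSTATE.PAN */
    asm volatile(
    "1: ldxr   %w1, %2\n"                   /* load-exclusive old value */
    "   add    %w3, %w1, %w5\n"
    "2: stlxr  %w0, %w3, %2\n"              /* store-exclusive, may fail */
    "   cbnz   %w0, 1b\n"
    ...);
    uaccess_disable_privileged();

    /* after: one unprivileged atomic, PSTATE.PAN left untouched */
    asm volatile(
    ".arch_extension lsui\n"
    "   ldtaddal %w3, %w2, %1\n"            /* unprivileged atomic add */
    ...);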
Patch Sequences
================
Patch #1 adds a cpufeature for FEAT_LSUI
Patch #2 exposes FEAT_LSUI to guests
Patch #3 adds a Kconfig option for FEAT_LSUI
Patch #4 refactors the former futex atomic-op implementation (ll/sc with
clearing PSTATE.PAN)
Patch #5 supports futex atomic ops with FEAT_LSUI
Patch History
==============
from v5 to v6:
- rebase to v6.17-rc1
- https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/
from v4 to v5:
- remove futex_ll_sc.h, futex_lsui.h and lsui.h and move their contents to futex.h
- reorganize the patches.
- https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/
from v3 to v4:
- rebase to v6.16-rc7
- modify some patch's title.
- https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/
from v2 to v3:
- expose FEAT_LSUI to guest
- add help section for LSUI Kconfig
- https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/
from v1 to v2:
- remove empty v9.6 menu entry
- locate HAS_LSUI in cpucaps in order
- https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/
Yeoreum Yun (5):
arm64: cpufeature: add FEAT_LSUI
KVM: arm64: expose FEAT_LSUI to guest
arm64: Kconfig: add LSUI Kconfig
arm64: futex: refactor futex atomic operation
arm64: futex: support futex with FEAT_LSUI
arch/arm64/Kconfig | 5 +
arch/arm64/include/asm/futex.h | 323 +++++++++++++++++++++++++++------
arch/arm64/kernel/cpufeature.c | 8 +
arch/arm64/kvm/sys_regs.c | 5 +-
arch/arm64/tools/cpucaps | 1 +
5 files changed, 281 insertions(+), 61 deletions(-)
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
--
* [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI
2025-08-11 16:36 [PATCH v6 0/5] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
@ 2025-08-11 16:36 ` Yeoreum Yun
2025-08-15 17:33 ` Catalin Marinas
2025-08-11 16:36 ` [PATCH v6 2/5] KVM: arm64: expose FEAT_LSUI to guest Yeoreum Yun
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-11 16:36 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton,
shameerali.kolothum.thodi, joey.gouly, james.morse, ardb, scott,
suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN bit.
Add the LSUI cpufeature so that the unprivileged load/store instructions
can be used when the kernel accesses user memory without clearing the
PSTATE.PAN bit.
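For context, a later patch in this series consumes the capability roughly
as follows (a minimal sketch; alternative_has_cap_likely() is the
existing arm64 helper):

    /* pick the LSUI path when the system-wide capability is present */
    if (alternative_has_cap_likely(ARM64_HAS_LSUI))
        ret = __lsui_futex_atomic_add(oparg, uaddr, oval);
    else
        ret = __llsc_futex_atomic_add(oparg, uaddr, oval);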
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/kernel/cpufeature.c | 8 ++++++++
arch/arm64/tools/cpucaps | 1 +
2 files changed, 9 insertions(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9ad065f15f1d..fd8ec291adab 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -278,6 +278,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
static const struct arm64_ftr_bits ftr_id_aa64isar3[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_FPRCVT_SHIFT, 4, 0),
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_LSUI_SHIFT, 4, ID_AA64ISAR3_EL1_LSUI_NI),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR3_EL1_FAMINMAX_SHIFT, 4, 0),
ARM64_FTR_END,
};
@@ -3131,6 +3132,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, GCIE, IMP)
},
+ {
+ .desc = "Unprivileged Load Store Instructions (LSUI)",
+ .capability = ARM64_HAS_LSUI,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_cpuid_feature,
+ ARM64_CPUID_FIELDS(ID_AA64ISAR3_EL1, LSUI, IMP)
+ },
{},
};
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index ef0b7946f5a4..73f8e5211cd2 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -44,6 +44,7 @@ HAS_HCX
HAS_LDAPR
HAS_LPA2
HAS_LSE_ATOMICS
+HAS_LSUI
HAS_MOPS
HAS_NESTED_VIRT
HAS_BBML2_NOABORT
--
* [PATCH v6 2/5] KVM: arm64: expose FEAT_LSUI to guest
2025-08-11 16:36 [PATCH v6 0/5] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
@ 2025-08-11 16:36 ` Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 3/5] arm64: Kconfig: add LSUI Kconfig Yeoreum Yun
` (2 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-11 16:36 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton,
shameerali.kolothum.thodi, joey.gouly, james.morse, ardb, scott,
suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Expose FEAT_LSUI to guests.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/sys_regs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 82ffb3b3b3cf..fb6c154aa37d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1642,7 +1642,8 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu,
val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_WFxT);
break;
case SYS_ID_AA64ISAR3_EL1:
- val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX;
+ val &= ID_AA64ISAR3_EL1_FPRCVT | ID_AA64ISAR3_EL1_FAMINMAX |
+ ID_AA64ISAR3_EL1_LSUI;
break;
case SYS_ID_AA64MMFR2_EL1:
val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
@@ -2991,7 +2992,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64ISAR2_EL1_APA3 |
ID_AA64ISAR2_EL1_GPA3)),
ID_WRITABLE(ID_AA64ISAR3_EL1, (ID_AA64ISAR3_EL1_FPRCVT |
- ID_AA64ISAR3_EL1_FAMINMAX)),
+ ID_AA64ISAR3_EL1_FAMINMAX | ID_AA64ISAR3_EL1_LSUI)),
ID_UNALLOCATED(6,4),
ID_UNALLOCATED(6,5),
ID_UNALLOCATED(6,6),
--
* [PATCH v6 3/5] arm64: Kconfig: add LSUI Kconfig
2025-08-11 16:36 [PATCH v6 0/5] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 2/5] KVM: arm64: expose FEAT_LSUI to guest Yeoreum Yun
@ 2025-08-11 16:36 ` Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 4/5] arm64: futex: refactor futex atomic operation Yeoreum Yun
2025-08-11 16:36 ` [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
4 siblings, 0 replies; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-11 16:36 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton,
shameerali.kolothum.thodi, joey.gouly, james.morse, ardb, scott,
suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Since Armv9.6, FEAT_LSUI supplies load/store instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN
bit.
Adding CONFIG_AS_HAS_LSUI alone is enough because the LSUI code uses
individual `.arch_extension` directives (see the sketch below) rather
than a global -march bump.
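A minimal sketch of what such a per-site directive looks like
(illustrative operands, not code from this series):

    /* each asm block opts in locally; no global -march change needed */
    asm volatile(
    ".arch_extension lsui\n"
    "   ldtaddal %w3, %w2, %1\n"    /* LSUI insn accepted by the assembler */
    : ...);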
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/Kconfig | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..c474de3dce02 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2239,6 +2239,11 @@ config ARM64_GCS
endmenu # "v9.4 architectural features"
+config AS_HAS_LSUI
+ def_bool $(as-instr,.arch_extension lsui)
+ help
+ Supported by LLVM 20 and later, not yet supported by GNU AS.
+
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
--
* [PATCH v6 4/5] arm64: futex: refactor futex atomic operation
2025-08-11 16:36 [PATCH v6 0/5] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
` (2 preceding siblings ...)
2025-08-11 16:36 ` [PATCH v6 3/5] arm64: Kconfig: add LSUI Kconfig Yeoreum Yun
@ 2025-08-11 16:36 ` Yeoreum Yun
2025-08-15 16:38 ` Catalin Marinas
2025-08-11 16:36 ` [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
4 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-11 16:36 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton,
shameerali.kolothum.thodi, joey.gouly, james.morse, ardb, scott,
suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
Refactor the futex atomic operations that use the ll/sc method with
PSTATE.PAN clearing, to prepare for applying FEAT_LSUI to them.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/include/asm/futex.h | 183 ++++++++++++++++++++++-----------
1 file changed, 124 insertions(+), 59 deletions(-)
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index bc06691d2062..fdec4f3f2b15 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -7,73 +7,164 @@
#include <linux/futex.h>
#include <linux/uaccess.h>
+#include <linux/stringify.h>
#include <asm/errno.h>
-#define FUTEX_MAX_LOOPS 128 /* What's the largest number you can think of? */
+#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
-#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
-do { \
- unsigned int loops = FUTEX_MAX_LOOPS; \
+#define LLSC_FUTEX_ATOMIC_OP(op, asm_op) \
+static __always_inline int \
+__llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
+{ \
+ unsigned int loops = LLSC_MAX_LOOPS; \
+ int ret, val, tmp; \
\
uaccess_enable_privileged(); \
- asm volatile( \
-" prfm pstl1strm, %2\n" \
-"1: ldxr %w1, %2\n" \
- insn "\n" \
-"2: stlxr %w0, %w3, %2\n" \
-" cbz %w0, 3f\n" \
-" sub %w4, %w4, %w0\n" \
-" cbnz %w4, 1b\n" \
-" mov %w0, %w6\n" \
-"3:\n" \
-" dmb ish\n" \
+ asm volatile("// __llsc_futex_atomic_" #op "\n" \
+ " prfm pstl1strm, %2\n" \
+ "1: ldxr %w1, %2\n" \
+ " " #asm_op " %w3, %w1, %w5\n" \
+ "2: stlxr %w0, %w3, %2\n" \
+ " cbz %w0, 3f\n" \
+ " sub %w4, %w4, %w0\n" \
+ " cbnz %w4, 1b\n" \
+ " mov %w0, %w6\n" \
+ "3:\n" \
+ " dmb ish\n" \
_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0) \
_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0) \
- : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp), \
+ : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), \
"+r" (loops) \
: "r" (oparg), "Ir" (-EAGAIN) \
: "memory"); \
uaccess_disable_privileged(); \
-} while (0)
+ \
+ if (!ret) \
+ *oval = val; \
+ \
+ return ret; \
+}
+
+LLSC_FUTEX_ATOMIC_OP(add, add)
+LLSC_FUTEX_ATOMIC_OP(or, orr)
+LLSC_FUTEX_ATOMIC_OP(and, and)
+LLSC_FUTEX_ATOMIC_OP(eor, eor)
+
+static __always_inline int
+__llsc_futex_atomic_set(int oparg, u32 __user *uaddr, int *oval)
+{
+ unsigned int loops = LLSC_MAX_LOOPS;
+ int ret, val;
+
+ uaccess_enable_privileged();
+ asm volatile("//__llsc_futex_xchg\n"
+ " prfm pstl1strm, %2\n"
+ "1: ldxr %w1, %2\n"
+ "2: stlxr %w0, %w4, %2\n"
+ " cbz %w3, 3f\n"
+ " sub %w3, %w3, %w0\n"
+ " cbnz %w3, 1b\n"
+ " mov %w0, %w5\n"
+ "3:\n"
+ " dmb ish\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
+ : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "+r" (loops)
+ : "r" (oparg), "Ir" (-EAGAIN)
+ : "memory");
+ uaccess_disable_privileged();
+
+ if (!ret)
+ *oval = val;
+
+ return ret;
+}
+
+static __always_inline int
+__llsc_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+ int ret = 0;
+ unsigned int loops = LLSC_MAX_LOOPS;
+ u32 val, tmp;
+
+ uaccess_enable_privileged();
+ asm volatile("//__llsc_futex_cmpxchg\n"
+ " prfm pstl1strm, %2\n"
+ "1: ldxr %w1, %2\n"
+ " eor %w3, %w1, %w5\n"
+ " cbnz %w3, 4f\n"
+ "2: stlxr %w3, %w6, %2\n"
+ " cbz %w3, 3f\n"
+ " sub %w4, %w4, %w3\n"
+ " cbnz %w4, 1b\n"
+ " mov %w0, %w7\n"
+ "3:\n"
+ " dmb ish\n"
+ "4:\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
+ : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
+ : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
+ : "memory");
+ uaccess_disable_privileged();
+
+ if (!ret)
+ *oval = val;
+
+ return ret;
+}
+
+#define FUTEX_ATOMIC_OP(op) \
+static __always_inline int \
+__futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
+{ \
+ return __llsc_futex_atomic_##op(oparg, uaddr, oval); \
+}
+
+FUTEX_ATOMIC_OP(add)
+FUTEX_ATOMIC_OP(or)
+FUTEX_ATOMIC_OP(and)
+FUTEX_ATOMIC_OP(eor)
+FUTEX_ATOMIC_OP(set)
+
+static __always_inline int
+__futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+ return __llsc_futex_cmpxchg(uaddr, oldval, newval, oval);
+}
static inline int
arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr)
{
- int oldval = 0, ret, tmp;
- u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
+ int ret;
+ u32 __user *uaddr;
if (!access_ok(_uaddr, sizeof(u32)))
return -EFAULT;
+ uaddr = __uaccess_mask_ptr(_uaddr);
+
switch (op) {
case FUTEX_OP_SET:
- __futex_atomic_op("mov %w3, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_set(oparg, uaddr, oval);
break;
case FUTEX_OP_ADD:
- __futex_atomic_op("add %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_add(oparg, uaddr, oval);
break;
case FUTEX_OP_OR:
- __futex_atomic_op("orr %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_or(oparg, uaddr, oval);
break;
case FUTEX_OP_ANDN:
- __futex_atomic_op("and %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, ~oparg);
+ ret = __futex_atomic_and(~oparg, uaddr, oval);
break;
case FUTEX_OP_XOR:
- __futex_atomic_op("eor %w3, %w1, %w5",
- ret, oldval, uaddr, tmp, oparg);
+ ret = __futex_atomic_eor(oparg, uaddr, oval);
break;
default:
ret = -ENOSYS;
}
- if (!ret)
- *oval = oldval;
-
return ret;
}
@@ -81,40 +172,14 @@ static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *_uaddr,
u32 oldval, u32 newval)
{
- int ret = 0;
- unsigned int loops = FUTEX_MAX_LOOPS;
- u32 val, tmp;
u32 __user *uaddr;
if (!access_ok(_uaddr, sizeof(u32)))
return -EFAULT;
uaddr = __uaccess_mask_ptr(_uaddr);
- uaccess_enable_privileged();
- asm volatile("// futex_atomic_cmpxchg_inatomic\n"
-" prfm pstl1strm, %2\n"
-"1: ldxr %w1, %2\n"
-" sub %w3, %w1, %w5\n"
-" cbnz %w3, 4f\n"
-"2: stlxr %w3, %w6, %2\n"
-" cbz %w3, 3f\n"
-" sub %w4, %w4, %w3\n"
-" cbnz %w4, 1b\n"
-" mov %w0, %w7\n"
-"3:\n"
-" dmb ish\n"
-"4:\n"
- _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
- _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
- : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
- : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
- : "memory");
- uaccess_disable_privileged();
- if (!ret)
- *uval = val;
-
- return ret;
+ return __futex_cmpxchg(uaddr, oldval, newval, uval);
}
#endif /* __ASM_FUTEX_H */
--
* [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-11 16:36 [PATCH v6 0/5] support FEAT_LSUI and apply it on futex atomic ops Yeoreum Yun
` (3 preceding siblings ...)
2025-08-11 16:36 ` [PATCH v6 4/5] arm64: futex: refactor futex atomic operation Yeoreum Yun
@ 2025-08-11 16:36 ` Yeoreum Yun
2025-08-15 17:02 ` Catalin Marinas
4 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-11 16:36 UTC (permalink / raw)
To: catalin.marinas, will, broonie, maz, oliver.upton,
shameerali.kolothum.thodi, joey.gouly, james.morse, ardb, scott,
suzuki.poulose, yuzenghui, mark.rutland
Cc: linux-arm-kernel, kvmarm, linux-kernel, Yeoreum Yun
The current futex atomic operations are implemented with ll/sc
instructions and by clearing PSTATE.PAN.
Since Armv9.6, FEAT_LSUI supplies not only load/store instructions but
also atomic operations for user memory access from the kernel, so the
PSTATE.PAN bit no longer needs to be cleared.
With these instructions, some of the futex atomic operations no longer
need an ldxr/stlxr pair and can instead be implemented with a single
atomic operation supplied by FEAT_LSUI.
However, some operations still need the ll/sc approach, via the
ldtxr/stltxr instructions supplied by FEAT_LSUI, because there is no
corresponding atomic instruction (eor) or the instruction doesn't
support word-size operands (cas{al}t). Even so, they work without
clearing the PSTATE.PAN bit. The resulting mapping is summarised below.
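The mapping implemented by this patch (instruction names as used in the
hunks below):

    FUTEX_OP_SET  -> swptal             (single LSUI atomic)
    FUTEX_OP_ADD  -> ldtaddal           (single LSUI atomic)
    FUTEX_OP_OR   -> ldtsetal           (single LSUI atomic)
    FUTEX_OP_ANDN -> ldtclral           (single LSUI atomic)
    FUTEX_OP_XOR  -> ldtxr/stltxr loop  (no ldteor instruction)
    cmpxchg       -> ldtxr/stltxr loop  (cas{al}t lacks a word size)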
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/include/asm/futex.h | 142 ++++++++++++++++++++++++++++++++-
1 file changed, 141 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index fdec4f3f2b15..38fc98f4af46 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -9,6 +9,8 @@
#include <linux/uaccess.h>
#include <linux/stringify.h>
+#include <asm/alternative.h>
+#include <asm/alternative-macros.h>
#include <asm/errno.h>
#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
@@ -115,11 +117,149 @@ __llsc_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
return ret;
}
+#ifdef CONFIG_AS_HAS_LSUI
+
+#define __LSUI_PREAMBLE ".arch_extension lsui\n"
+
+#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
+static __always_inline int \
+__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
+{ \
+ int ret = 0; \
+ int val; \
+ \
+ mte_enable_tco(); \
+ uaccess_ttbr0_enable(); \
+ \
+ asm volatile("// __lsui_futex_atomic_" #op "\n" \
+ __LSUI_PREAMBLE \
+ "1: " #asm_op #mb " %w3, %w2, %1\n" \
+ "2:\n" \
+ _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0) \
+ : "+r" (ret), "+Q" (*uaddr), "=r" (val) \
+ : "r" (oparg) \
+ : "memory"); \
+ \
+ mte_disable_tco(); \
+ uaccess_ttbr0_disable(); \
+ \
+ if (!ret) \
+ *oval = val; \
+ \
+ return ret; \
+}
+
+LSUI_FUTEX_ATOMIC_OP(add, ldtadd, al)
+LSUI_FUTEX_ATOMIC_OP(or, ldtset, al)
+LSUI_FUTEX_ATOMIC_OP(andnot, ldtclr, al)
+LSUI_FUTEX_ATOMIC_OP(set, swpt, al)
+
+static __always_inline int
+__lsui_futex_atomic_and(int oparg, u32 __user *uaddr, int *oval)
+{
+ return __lsui_futex_atomic_andnot(~oparg, uaddr, oval);
+}
+
+static __always_inline int
+__lsui_futex_atomic_eor(int oparg, u32 __user *uaddr, int *oval)
+{
+ unsigned int loops = LLSC_MAX_LOOPS;
+ int ret, val, tmp;
+
+ mte_enable_tco();
+ uaccess_ttbr0_enable();
+
+ /*
+ * there are no ldteor/stteor instructions...
+ */
+ asm volatile("// __lsui_futex_atomic_eor\n"
+ __LSUI_PREAMBLE
+ " prfm pstl1strm, %2\n"
+ "1: ldtxr %w1, %2\n"
+ " eor %w3, %w1, %w5\n"
+ "2: stltxr %w0, %w3, %2\n"
+ " cbz %w0, 3f\n"
+ " sub %w4, %w4, %w0\n"
+ " cbnz %w4, 1b\n"
+ " mov %w0, %w6\n"
+ "3:\n"
+ " dmb ish\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
+ : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp),
+ "+r" (loops)
+ : "r" (oparg), "Ir" (-EAGAIN)
+ : "memory");
+
+ mte_disable_tco();
+ uaccess_ttbr0_disable();
+
+ if (!ret)
+ *oval = val;
+
+ return ret;
+}
+
+static __always_inline int
+__lsui_futex_cmpxchg(u32 __user *uaddr, u32 oldval, u32 newval, u32 *oval)
+{
+ int ret = 0;
+ unsigned int loops = LLSC_MAX_LOOPS;
+ u32 val, tmp;
+
+ mte_enable_tco();
+ uaccess_ttbr0_enable();
+
+ /*
+ * cas{al}t doesn't support word size...
+ */
+ asm volatile("//__lsui_futex_cmpxchg\n"
+ __LSUI_PREAMBLE
+ " prfm pstl1strm, %2\n"
+ "1: ldtxr %w1, %2\n"
+ " eor %w3, %w1, %w5\n"
+ " cbnz %w3, 4f\n"
+ "2: stltxr %w3, %w6, %2\n"
+ " cbz %w3, 3f\n"
+ " sub %w4, %w4, %w3\n"
+ " cbnz %w4, 1b\n"
+ " mov %w0, %w7\n"
+ "3:\n"
+ " dmb ish\n"
+ "4:\n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 4b, %w0)
+ _ASM_EXTABLE_UACCESS_ERR(2b, 4b, %w0)
+ : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), "+r" (loops)
+ : "r" (oldval), "r" (newval), "Ir" (-EAGAIN)
+ : "memory");
+
+ mte_disable_tco();
+ uaccess_ttbr0_disable();
+
+ if (!ret)
+ *oval = oldval;
+
+ return ret;
+}
+
+#define __lsui_llsc_body(op, ...) \
+({ \
+ alternative_has_cap_likely(ARM64_HAS_LSUI) ? \
+ __lsui_##op(__VA_ARGS__) : __llsc_##op(__VA_ARGS__); \
+})
+
+#else /* CONFIG_AS_HAS_LSUI */
+
+#define __lsui_llsc_body(op, ...) __llsc_##op(__VA_ARGS__)
+
+#endif /* CONFIG_AS_HAS_LSUI */
+
+
#define FUTEX_ATOMIC_OP(op) \
static __always_inline int \
__futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
{ \
- return __llsc_futex_atomic_##op(oparg, uaddr, oval); \
+ return __lsui_llsc_body(futex_atomic_##op, oparg, uaddr, oval); \
}
FUTEX_ATOMIC_OP(add)
--
* Re: [PATCH v6 4/5] arm64: futex: refactor futex atomic operation
2025-08-11 16:36 ` [PATCH v6 4/5] arm64: futex: refactor futex atomic operation Yeoreum Yun
@ 2025-08-15 16:38 ` Catalin Marinas
2025-08-16 13:03 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-08-15 16:38 UTC (permalink / raw)
To: Yeoreum Yun
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Mon, Aug 11, 2025 at 05:36:34PM +0100, Yeoreum Yun wrote:
> Refactor the futex atomic operations that use the ll/sc method with
> PSTATE.PAN clearing, to prepare for applying FEAT_LSUI to them.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
> arch/arm64/include/asm/futex.h | 183 ++++++++++++++++++++++-----------
> 1 file changed, 124 insertions(+), 59 deletions(-)
>
> diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> index bc06691d2062..fdec4f3f2b15 100644
> --- a/arch/arm64/include/asm/futex.h
> +++ b/arch/arm64/include/asm/futex.h
> @@ -7,73 +7,164 @@
>
> #include <linux/futex.h>
> #include <linux/uaccess.h>
> +#include <linux/stringify.h>
>
> #include <asm/errno.h>
>
> -#define FUTEX_MAX_LOOPS 128 /* What's the largest number you can think of? */
> +#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
>
> -#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
> -do { \
> - unsigned int loops = FUTEX_MAX_LOOPS; \
> +#define LLSC_FUTEX_ATOMIC_OP(op, asm_op) \
> +static __always_inline int \
> +__llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> +{ \
> + unsigned int loops = LLSC_MAX_LOOPS; \
> + int ret, val, tmp; \
> \
> uaccess_enable_privileged(); \
> - asm volatile( \
> -" prfm pstl1strm, %2\n" \
> -"1: ldxr %w1, %2\n" \
> - insn "\n" \
> -"2: stlxr %w0, %w3, %2\n" \
> -" cbz %w0, 3f\n" \
> -" sub %w4, %w4, %w0\n" \
> -" cbnz %w4, 1b\n" \
> -" mov %w0, %w6\n" \
> -"3:\n" \
> -" dmb ish\n" \
> + asm volatile("// __llsc_futex_atomic_" #op "\n" \
> + " prfm pstl1strm, %2\n" \
> + "1: ldxr %w1, %2\n" \
> + " " #asm_op " %w3, %w1, %w5\n" \
> + "2: stlxr %w0, %w3, %2\n" \
> + " cbz %w0, 3f\n" \
> + " sub %w4, %w4, %w0\n" \
> + " cbnz %w4, 1b\n" \
> + " mov %w0, %w6\n" \
> + "3:\n" \
> + " dmb ish\n" \
Don't change indentation and code in the same patch; it makes it harder
to follow what you actually changed. I guess the only difference is
asm_op instead of insn.
> _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0) \
> _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0) \
> - : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp), \
> + : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), \
And here you changed oldval to val (was this necessary?)
> "+r" (loops) \
> : "r" (oparg), "Ir" (-EAGAIN) \
> : "memory"); \
> uaccess_disable_privileged(); \
> -} while (0)
> + \
> + if (!ret) \
> + *oval = val; \
> + \
> + return ret; \
> +}
> +
> +LLSC_FUTEX_ATOMIC_OP(add, add)
> +LLSC_FUTEX_ATOMIC_OP(or, orr)
> +LLSC_FUTEX_ATOMIC_OP(and, and)
> +LLSC_FUTEX_ATOMIC_OP(eor, eor)
> +
> +static __always_inline int
> +__llsc_futex_atomic_set(int oparg, u32 __user *uaddr, int *oval)
> +{
> + unsigned int loops = LLSC_MAX_LOOPS;
> + int ret, val;
> +
> + uaccess_enable_privileged();
> + asm volatile("//__llsc_futex_xchg\n"
> + " prfm pstl1strm, %2\n"
> + "1: ldxr %w1, %2\n"
> + "2: stlxr %w0, %w4, %2\n"
> + " cbz %w3, 3f\n"
> + " sub %w3, %w3, %w0\n"
> + " cbnz %w3, 1b\n"
> + " mov %w0, %w5\n"
> + "3:\n"
> + " dmb ish\n"
> + _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
> + _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
> + : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "+r" (loops)
> + : "r" (oparg), "Ir" (-EAGAIN)
> + : "memory");
> + uaccess_disable_privileged();
Was this separate function just to avoid the "mov" instruction for the
"set" case? The patch description states that the reworking is necessary
for the FEAT_LSUI use but it looks to me like it does more. Please split
it in separate patches, though I'd leave any potential optimisation for
a separate series and keep the current code as close as possible to the
original one.
--
Catalin
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-11 16:36 ` [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI Yeoreum Yun
@ 2025-08-15 17:02 ` Catalin Marinas
2025-08-16 12:30 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-08-15 17:02 UTC (permalink / raw)
To: Yeoreum Yun
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Mon, Aug 11, 2025 at 05:36:35PM +0100, Yeoreum Yun wrote:
> +#ifdef CONFIG_AS_HAS_LSUI
> +
> +#define __LSUI_PREAMBLE ".arch_extension lsui\n"
> +
> +#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
> +static __always_inline int \
> +__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> +{ \
> + int ret = 0; \
> + int val; \
> + \
> + mte_enable_tco(); \
The reason uaccess_enable_privileged() sets the MTE TCO (tag check
override) is because the user and the kernel may have different settings
for tag checking. If we use the user instructions provided by FEAT_LSUI,
we leave the MTE checking as is.
The same comment for all the other functions here.
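In other words, the LSUI paths would drop the TCO toggling; a hedged
sketch of the implied change (not code from this thread):

    /* unprivileged LSUI accesses are checked per the user's
     * SCTLR_EL1.TCF0 setting, so leave PSTATE.TCO alone */
    uaccess_ttbr0_enable();
    /* ... ldtaddal / swptal / ldtxr+stltxr sequence ... */
    uaccess_ttbr0_disable();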
--
Catalin
* Re: [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI
2025-08-11 16:36 ` [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI Yeoreum Yun
@ 2025-08-15 17:33 ` Catalin Marinas
2025-08-16 11:04 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-08-15 17:33 UTC (permalink / raw)
To: Yeoreum Yun
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Mon, Aug 11, 2025 at 05:36:31PM +0100, Yeoreum Yun wrote:
> @@ -3131,6 +3132,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .matches = has_cpuid_feature,
> ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, GCIE, IMP)
> },
> + {
> + .desc = "Unprivileged Load Store Instructions (LSUI)",
> + .capability = ARM64_HAS_LSUI,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_cpuid_feature,
> + ARM64_CPUID_FIELDS(ID_AA64ISAR3_EL1, LSUI, IMP)
> + },
> {},
> };
Since this is only used in the kernel, I wonder whether we should hide
it behind #ifdef CONFIG_AS_HAS_LSUI. Otherwise we report it as present
and one may infer that the kernel is going to use it. Not a strong view
and I don't think we have a precedent for this.
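Concretely, the suggestion would wrap the entry along these lines (a
sketch, not posted code):

    #ifdef CONFIG_AS_HAS_LSUI
        {
            .desc = "Unprivileged Load Store Instructions (LSUI)",
            .capability = ARM64_HAS_LSUI,
            .type = ARM64_CPUCAP_SYSTEM_FEATURE,
            .matches = has_cpuid_feature,
            ARM64_CPUID_FIELDS(ID_AA64ISAR3_EL1, LSUI, IMP)
        },
    #endif /* CONFIG_AS_HAS_LSUI */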
--
Catalin
* Re: [PATCH v6 1/5] arm64: cpufeature: add FEAT_LSUI
2025-08-15 17:33 ` Catalin Marinas
@ 2025-08-16 11:04 ` Yeoreum Yun
0 siblings, 0 replies; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-16 11:04 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
Hi Catalin,
> On Mon, Aug 11, 2025 at 05:36:31PM +0100, Yeoreum Yun wrote:
> > @@ -3131,6 +3132,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> > .matches = has_cpuid_feature,
> > ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, GCIE, IMP)
> > },
> > + {
> > + .desc = "Unprivileged Load Store Instructions (LSUI)",
> > + .capability = ARM64_HAS_LSUI,
> > + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> > + .matches = has_cpuid_feature,
> > + ARM64_CPUID_FIELDS(ID_AA64ISAR3_EL1, LSUI, IMP)
> > + },
> > {},
> > };
>
> Since this is only used in the kernel, I wonder whether we should hide
> it behind #ifdef CONFIG_AS_HAS_LSUI. Otherwise we report it as present
> and one may infer that the kernel is going to use it. Not a strong view
> and I don't think we have a precedent for this.
Agree. Anyway, without CONFIG_AS_HAS_LSUI it won't be used anywhere
right now, even though the kernel reports that it has this feature.
I'll wrap it as you suggest.
Thanks.
--
Sincerely,
Yeoreum Yun
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-15 17:02 ` Catalin Marinas
@ 2025-08-16 12:30 ` Yeoreum Yun
2025-08-16 14:57 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-16 12:30 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
Hi Catalin,
> On Mon, Aug 11, 2025 at 05:36:35PM +0100, Yeoreum Yun wrote:
> > +#ifdef CONFIG_AS_HAS_LSUI
> > +
> > +#define __LSUI_PREAMBLE ".arch_extension lsui\n"
> > +
> > +#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
> > +static __always_inline int \
> > +__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> > +{ \
> > + int ret = 0; \
> > + int val; \
> > + \
> > + mte_enable_tco(); \
>
> The reason uaccess_enable_privileged() sets the MTE TCO (tag check
> override) is because the user and the kernel may have different settings
> for tag checking. If we use the user instructions provided by FEAT_LSUI,
> we leave the MTE checking as is.
>
> The same comment for all the other functions here.
You're right. Thanks for catching this :)
--
Sincerely,
Yeoreum Yun
* Re: [PATCH v6 4/5] arm64: futex: refactor futex atomic operation
2025-08-15 16:38 ` Catalin Marinas
@ 2025-08-16 13:03 ` Yeoreum Yun
0 siblings, 0 replies; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-16 13:03 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
Hi Catalin,
[...]
> > diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> > index bc06691d2062..fdec4f3f2b15 100644
> > --- a/arch/arm64/include/asm/futex.h
> > +++ b/arch/arm64/include/asm/futex.h
> > @@ -7,73 +7,164 @@
> >
> > #include <linux/futex.h>
> > #include <linux/uaccess.h>
> > +#include <linux/stringify.h>
> >
> > #include <asm/errno.h>
> >
> > -#define FUTEX_MAX_LOOPS 128 /* What's the largest number you can think of? */
> > +#define LLSC_MAX_LOOPS 128 /* What's the largest number you can think of? */
> >
> > -#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
> > -do { \
> > - unsigned int loops = FUTEX_MAX_LOOPS; \
> > +#define LLSC_FUTEX_ATOMIC_OP(op, asm_op) \
> > +static __always_inline int \
> > +__llsc_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> > +{ \
> > + unsigned int loops = LLSC_MAX_LOOPS; \
> > + int ret, val, tmp; \
> > \
> > uaccess_enable_privileged(); \
> > - asm volatile( \
> > -" prfm pstl1strm, %2\n" \
> > -"1: ldxr %w1, %2\n" \
> > - insn "\n" \
> > -"2: stlxr %w0, %w3, %2\n" \
> > -" cbz %w0, 3f\n" \
> > -" sub %w4, %w4, %w0\n" \
> > -" cbnz %w4, 1b\n" \
> > -" mov %w0, %w6\n" \
> > -"3:\n" \
> > -" dmb ish\n" \
> > + asm volatile("// __llsc_futex_atomic_" #op "\n" \
> > + " prfm pstl1strm, %2\n" \
> > + "1: ldxr %w1, %2\n" \
> > + " " #asm_op " %w3, %w1, %w5\n" \
> > + "2: stlxr %w0, %w3, %2\n" \
> > + " cbz %w0, 3f\n" \
> > + " sub %w4, %w4, %w0\n" \
> > + " cbnz %w4, 1b\n" \
> > + " mov %w0, %w6\n" \
> > + "3:\n" \
> > + " dmb ish\n" \
>
> Don't change indentation and code in the same patch, it makes it harder
> to follow what you actually changed. I guess the only difference is
> asm_op instead of insn.
Sorry for bothering you. I'll restore the indentation to make it clear.
And yes, the only difference is the change you mention.
>
> > _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0) \
> > _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0) \
> > - : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp), \
> > + : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp), \
>
> And here you changed oldval to val (was this necessary?)
Not really. I'll keep "oldval" as it is.
Thanks.
>
> > "+r" (loops) \
> > : "r" (oparg), "Ir" (-EAGAIN) \
> > : "memory"); \
> > uaccess_disable_privileged(); \
> > -} while (0)
> > + \
> > + if (!ret) \
> > + *oval = val; \
> > + \
> > + return ret; \
> > +}
> > +
> > +LLSC_FUTEX_ATOMIC_OP(add, add)
> > +LLSC_FUTEX_ATOMIC_OP(or, orr)
> > +LLSC_FUTEX_ATOMIC_OP(and, and)
> > +LLSC_FUTEX_ATOMIC_OP(eor, eor)
> > +
> > +static __always_inline int
> > +__llsc_futex_atomic_set(int oparg, u32 __user *uaddr, int *oval)
> > +{
> > + unsigned int loops = LLSC_MAX_LOOPS;
> > + int ret, val;
> > +
> > + uaccess_enable_privileged();
> > + asm volatile("//__llsc_futex_xchg\n"
> > + " prfm pstl1strm, %2\n"
> > + "1: ldxr %w1, %2\n"
> > + "2: stlxr %w0, %w4, %2\n"
> > + " cbz %w3, 3f\n"
> > + " sub %w3, %w3, %w0\n"
> > + " cbnz %w3, 1b\n"
> > + " mov %w0, %w5\n"
> > + "3:\n"
> > + " dmb ish\n"
> > + _ASM_EXTABLE_UACCESS_ERR(1b, 3b, %w0)
> > + _ASM_EXTABLE_UACCESS_ERR(2b, 3b, %w0)
> > + : "=&r" (ret), "=&r" (val), "+Q" (*uaddr), "+r" (loops)
> > + : "r" (oparg), "Ir" (-EAGAIN)
> > + : "memory");
> > + uaccess_disable_privileged();
>
> Was this separate function just to avoid the "mov" instruction for the
> "set" case? The patch description states that the reworking is necessary
> for the FEAT_LSUI use but it looks to me like it does more. Please split
> it in separate patches, though I'd leave any potential optimisation for
> a separate series and keep the current code as close as possible to the
> original one.
>
Yes, it's a small optimisation -- avoiding the "mov" instruction.
I'll separate that part.
Thanks!
--
Sincerely,
Yeoreum Yun
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-16 12:30 ` Yeoreum Yun
@ 2025-08-16 14:57 ` Yeoreum Yun
2025-08-18 18:35 ` Catalin Marinas
0 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-16 14:57 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
Hi Catalin,
[...]
> > > +#ifdef CONFIG_AS_HAS_LSUI
> > > +
> > > +#define __LSUI_PREAMBLE ".arch_extension lsui\n"
> > > +
> > > +#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
> > > +static __always_inline int \
> > > +__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> > > +{ \
> > > + int ret = 0; \
> > > + int val; \
> > > + \
> > > + mte_enable_tco(); \
> >
>
> > The reason uaccess_disable_privileged() sets the MTE TCO (tag check
> > override) is because the user and the kernel may have different settings
> > for tag checking. If we use the user instructions provided by FEAT_LSUI,
> > we leave the MTE checking as is.
> >
> > The same comment for all the other functions here.
>
> You're right. Thanks for catching this :)
But one bikeshedding question:
why do we need to care about the different settings for tag checking when
we use uaccess_disable_privileged()?
IIUC, the reason we use uaccess_disable_privileged() is to access
user memory with copy_from/to_user() etc.
But, although a tag check fault happens on the kernel side,
it seems to be handled by the fixup code if the user address is wrong.
Am I missing something?
> --
> Sincerely,
> Yeoreum Yun
>
--
Sincerely,
Yeoreum Yun
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-16 14:57 ` Yeoreum Yun
@ 2025-08-18 18:35 ` Catalin Marinas
2025-08-18 19:53 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-08-18 18:35 UTC (permalink / raw)
To: Yeoreum Yun
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Sat, Aug 16, 2025 at 03:57:49PM +0100, Yeoreum Yun wrote:
> > > > +#ifdef CONFIG_AS_HAS_LSUI
> > > > +
> > > > +#define __LSUI_PREAMBLE ".arch_extension lsui\n"
> > > > +
> > > > +#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
> > > > +static __always_inline int \
> > > > +__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> > > > +{ \
> > > > + int ret = 0; \
> > > > + int val; \
> > > > + \
> > > > + mte_enable_tco(); \
> > >
> >
> > > The reason uaccess_enable_privileged() sets the MTE TCO (tag check
> > > override) is because the user and the kernel may have different settings
> > > for tag checking. If we use the user instructions provided by FEAT_LSUI,
> > > we leave the MTE checking as is.
> > >
> > > The same comment for all the other functions here.
> >
> > You're right. Thanks for catching this :)
>
> But one bikeshedding question:
> why do we need to care about the different settings for tag checking when
> we use uaccess_disable_privileged()?
Because, for example, the user may not be interested in any tag check
faults (has checking disabled) but the kernel uses KASAN with
synchronous tag check faults. If it uses the privileged instructions as
in the futex API, it either won't make progress or report errors to the
user which it does not expect.
> IIUC, the reason we use uaccess_disable_privileged() is to access
> user memory with copy_from/to_user() etc.
We don't use uaccess_disable_privileged() with copy_from_user() since
those use the unprivileged instructions already.
> But, although a tag check fault happens on the kernel side,
> it seems to be handled by the fixup code if the user address is wrong.
The user may know it is wrong and not care (e.g. one wants to keep using
a buggy application).
--
Catalin
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-18 18:35 ` Catalin Marinas
@ 2025-08-18 19:53 ` Yeoreum Yun
2025-08-19 8:38 ` Catalin Marinas
0 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-18 19:53 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
Hi Catalin,
> On Sat, Aug 16, 2025 at 03:57:49PM +0100, Yeoreum Yun wrote:
> > > > > +#ifdef CONFIG_AS_HAS_LSUI
> > > > > +
> > > > > +#define __LSUI_PREAMBLE ".arch_extension lsui\n"
> > > > > +
> > > > > +#define LSUI_FUTEX_ATOMIC_OP(op, asm_op, mb) \
> > > > > +static __always_inline int \
> > > > > +__lsui_futex_atomic_##op(int oparg, u32 __user *uaddr, int *oval) \
> > > > > +{ \
> > > > > + int ret = 0; \
> > > > > + int val; \
> > > > > + \
> > > > > + mte_enable_tco(); \
> > > >
> > >
> > > > The reason uaccess_enable_privileged() sets the MTE TCO (tag check
> > > > override) is because the user and the kernel may have different settings
> > > > for tag checking. If we use the user instructions provided by FEAT_LSUI,
> > > > we leave the MTE checking as is.
> > > >
> > > > The same comment for all the other functions here.
> > >
> > > You're right. Thanks for catching this :)
> >
> > But one bikeshedding question:
> > why do we need to care about the different settings for tag checking when
> > we use uaccess_disable_privileged()?
>
> Because, for example, the user may not be interested in any tag check
> faults (has checking disabled) but the kernel uses KASAN with
> synchronous tag check faults. If it uses the privileged instructions as
> in the futex API, it either won't make progress or report errors to the
> user which it does not expect.
>
> > IIUC, the reason we use uaccess_disable_privileged() is to access
> > user memory with copy_from/to_user() etc.
>
> We don't use uaccess_disable_privileged() with copy_from_user() since
> those use the unprivileged instructions already.
Thanks for your explanation :)
>
> > But, although a tag check fault happens on the kernel side,
> > it seems to be handled by the fixup code if the user address is wrong.
>
> The user may know it is wrong and not care (e.g. one wants to keep using
> a buggy application).
Then does this example -- ignoring the error and keeping a buggy
application running -- show that we need to enable TCO when
we run the LSUI instructions?
AFAIK, the LSUI instructions also check memory tags -- e.g. ldtadd.
If a user address with a mismatched tag is passed and the user isn't
interested in tag checks, it can trigger an unexpected report from KASAN.
Am I missing something?
--
Sincerely,
Yeoreum Yun
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-18 19:53 ` Yeoreum Yun
@ 2025-08-19 8:38 ` Catalin Marinas
2025-08-19 9:11 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-08-19 8:38 UTC (permalink / raw)
To: Yeoreum Yun
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Mon, Aug 18, 2025 at 08:53:57PM +0100, Yeoreum Yun wrote:
> > On Sat, Aug 16, 2025 at 03:57:49PM +0100, Yeoreum Yun wrote:
> > > why do we need to care about the different settings for tag checking when
> > > we use uaccess_disable_privileged()?
[...]
> > > But, although a tag check fault happens on the kernel side,
> > > it seems to be handled by the fixup code if the user address is wrong.
> >
> > The user may know it is wrong and not care (e.g. one wants to keep using
> > a buggy application).
>
> Then does this example -- ignoring the error and keeping a buggy
> application running -- show that we need to enable TCO when
> we run the LSUI instructions?
>
> AFAIK, the LSUI instructions also check memory tags -- e.g. ldtadd.
> If a user address with a mismatched tag is passed and the user isn't
> interested in tag checks, it can trigger an unexpected report from KASAN.
That's a valid point w.r.t. PSTATE.TCO, and it applies to copy_to/from_user
as well. I don't think we documented it, but we don't expect the user
PSTATE.TCO state to be taken into account while doing uaccess from the
kernel. We do, however, expect SCTLR_EL1.TCF0 to be honoured, and that's
what the user normally tweaks via a prctl(). TCO is meant to
disable tag checking briefly when TCF has enabled tag check faults.
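For reference, the user-side knob mentioned above is the tagged-address
ABI prctl (a minimal userspace sketch; error handling omitted):

    #include <sys/prctl.h>

    /* enable tagged addresses and synchronous tag check faults (TCF0);
     * the 0xfffe mask permits all tags except 0 for IRG */
    prctl(PR_SET_TAGGED_ADDR_CTRL,
          PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
          (0xfffe << PR_MTE_TAG_SHIFT),
          0, 0, 0);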
--
Catalin
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-19 8:38 ` Catalin Marinas
@ 2025-08-19 9:11 ` Yeoreum Yun
2025-08-19 14:29 ` Catalin Marinas
0 siblings, 1 reply; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-19 9:11 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Tue, Aug 19, 2025 at 09:38:54AM +0100, Catalin Marinas wrote:
> On Mon, Aug 18, 2025 at 08:53:57PM +0100, Yeoreum Yun wrote:
> > > On Sat, Aug 16, 2025 at 03:57:49PM +0100, Yeoreum Yun wrote:
> > > > why do we need to care about the different settings for tag checking when
> > > > we use uaccess_disable_privileged()?
> [...]
> > > > But, although a tag check fault happens on the kernel side,
> > > > it seems to be handled by the fixup code if the user address is wrong.
> > >
> > > The user may know it is wrong and not care (e.g. one wants to keep using
> > > a buggy application).
> >
> > Then does this example -- ignoring the error and keeping a buggy
> > application running -- show that we need to enable TCO when
> > we run the LSUI instructions?
> >
> > AFAIK, the LSUI instructions also check memory tags -- e.g. ldtadd.
> > If a user address with a mismatched tag is passed and the user isn't
> > interested in tag checks, it can trigger an unexpected report from KASAN.
>
> That's a valid point w.r.t. PSTATE.TCO, and it applies to copy_to/from_user
> as well. I don't think we documented it, but we don't expect the user
> PSTATE.TCO state to be taken into account while doing uaccess from the
> kernel. We do, however, expect SCTLR_EL1.TCF0 to be honoured, and that's
> what the user normally tweaks via a prctl(). TCO is meant to
> disable tag checking briefly when TCF has enabled tag check faults.
So, IMHO, since copy_to/from_user (ldtr/sttr) enables TCO before it operates,
I think the LSUI futex code should also enable the TCO bit
before it issues the LSUI instructions.
Otherwise this sounds like an inconsistency, allowing tag check faults
according to the SCTLR_EL1.TCF (not TCF0) configuration while the
kernel accesses user memory.
Am I on the right track?
Thanks.
> --
> Catalin
--
Sincerely,
Yeoreum Yun
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-19 9:11 ` Yeoreum Yun
@ 2025-08-19 14:29 ` Catalin Marinas
2025-08-19 15:15 ` Yeoreum Yun
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-08-19 14:29 UTC (permalink / raw)
To: Yeoreum Yun
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
On Tue, Aug 19, 2025 at 10:11:02AM +0100, Yeoreum Yun wrote:
> On Tue, Aug 19, 2025 at 09:38:54AM +0100, Catalin Marinas wrote:
> > On Mon, Aug 18, 2025 at 08:53:57PM +0100, Yeoreum Yun wrote:
> > > > On Sat, Aug 16, 2025 at 03:57:49PM +0100, Yeoreum Yun wrote:
> > > > > why do we need to care about the different settings for tag checking when
> > > > > we use uaccess_disable_privileged()?
> > [...]
> > > > > But, although a tag check fault happens on the kernel side,
> > > > > it seems to be handled by the fixup code if the user address is wrong.
> > > >
> > > > The user may know it is wrong and not care (e.g. one wants to keep using
> > > > a buggy application).
> > >
> > > > Then does this example -- ignoring the error and keeping a buggy
> > > > application running -- show that we need to enable TCO when
> > > > we run the LSUI instructions?
> > > >
> > > > AFAIK, the LSUI instructions also check memory tags -- e.g. ldtadd.
> > > > If a user address with a mismatched tag is passed and the user isn't
> > > > interested in tag checks, it can trigger an unexpected report from KASAN.
> >
> > > That's a valid point w.r.t. PSTATE.TCO, and it applies to copy_to/from_user
> > > as well. I don't think we documented it, but we don't expect the user
> > > PSTATE.TCO state to be taken into account while doing uaccess from the
> > > kernel. We do, however, expect SCTLR_EL1.TCF0 to be honoured, and that's
> > > what the user normally tweaks via a prctl(). TCO is meant to
> > > disable tag checking briefly when TCF has enabled tag check faults.
>
> So, IMHO, since copy_to/from_user (ldtr/sttr) enables TCO before it operates,
They don't enable TCO.
--
Catalin
* Re: [PATCH v6 5/5] arm64: futex: support futex with FEAT_LSUI
2025-08-19 14:29 ` Catalin Marinas
@ 2025-08-19 15:15 ` Yeoreum Yun
0 siblings, 0 replies; 19+ messages in thread
From: Yeoreum Yun @ 2025-08-19 15:15 UTC (permalink / raw)
To: Catalin Marinas
Cc: will, broonie, maz, oliver.upton, shameerali.kolothum.thodi,
joey.gouly, james.morse, ardb, scott, suzuki.poulose, yuzenghui,
mark.rutland, linux-arm-kernel, kvmarm, linux-kernel
> > > > > > why do we need to care about the different settings for tag checking when
> > > > > > we use uaccess_disable_privileged()?
> > > [...]
> > > > > > But, although a tag check fault happens on the kernel side,
> > > > > > it seems to be handled by the fixup code if the user address is wrong.
> > > > >
> > > > > The user may know it is wrong and not care (e.g. one wants to keep using
> > > > > a buggy application).
> > > >
> > > > Then does this example -- ignoring the error and keeping a buggy
> > > > application running -- show that we need to enable TCO when
> > > > we run the LSUI instructions?
> > > >
> > > > AFAIK, the LSUI instructions also check memory tags -- e.g. ldtadd.
> > > > If a user address with a mismatched tag is passed and the user isn't
> > > > interested in tag checks, it can trigger an unexpected report from KASAN.
> > >
> > > That's a valid point w.r.t. PSTATE.TCO, and it applies to copy_to/from_user
> > > as well. I don't think we documented it, but we don't expect the user
> > > PSTATE.TCO state to be taken into account while doing uaccess from the
> > > kernel. We do, however, expect SCTLR_EL1.TCF0 to be honoured, and that's
> > > what the user normally tweaks via a prctl(). TCO is meant to
> > > disable tag checking briefly when TCF has enabled tag check faults.
> >
> > So, IMHO, since copy_to/from_user (ldtr/sttr) enables TCO before it operates,
>
> They don't enable TCO.
Ah right, I was confused. Thanks for the answer!
>
> --
> Catalin
--
Sincerely,
Yeoreum Yun