linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/5] arm64: Use memory copy instructions in kernel routines
@ 2024-09-30 16:10 Kristina Martsenko
  2024-09-30 16:10 ` [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions Kristina Martsenko
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Kristina Martsenko @ 2024-09-30 16:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

Hi,

Here is a small series to make memcpy() and related functions use the
memory copy/set instructions (Armv8.8 FEAT_MOPS).

The kernel uses several library routines for copying or initializing
memory, for example copy_to_user() and memset(). These routines have
been optimized to make their load/store sequence perform well across a
range of CPUs. However the chosen sequence can't be the fastest possible
for every CPU microarchitecture nor for heterogeneous systems, and needs
to be rewritten periodically as hardware changes.

Future arm64 CPUs will have CPY* and SET* instructions that can copy (or
set) a block of memory of arbitrary size and alignment. The kernel
currently supports using these instructions in userspace applications
[1] and KVM guests [2] but does not use them within the kernel.

CPUs are expected to implement the CPY/SET instructions close to
optimally for their microarchitecture (i.e. close to the performance of
the best load/store sequence performing a generic copy/set). Using the
instructions in the kernel's copy/set routines would therefore make the
routines optimal and avoid the need to rewrite them. It could also lead
to a performance improvement for some CPUs and systems.

This series makes the memcpy(), memmove() and memset() routines use the
CPY/SET instructions, as well as copy_page() and clear_page(). I'll send
a follow-up series to update the usercopy routines (copy_to_user() etc)
"soon", as it needs a bit more work.

The patches were tested on an Arm FVP.

Thanks,
Kristina

[1] https://lore.kernel.org/lkml/20230509142235.3284028-1-kristina.martsenko@arm.com/
[2] https://lore.kernel.org/linux-arm-kernel/20230922112508.1774352-1-kristina.martsenko@arm.com/

Kristina Martsenko (5):
  arm64: probes: Disable kprobes/uprobes on MOPS instructions
  arm64: mops: Handle MOPS exceptions from EL1
  arm64: mops: Document booting requirement for HCR_EL2.MCE2
  arm64: lib: Use MOPS for memcpy() routines
  arm64: lib: Use MOPS for copy_page() and clear_page()

 Documentation/arch/arm64/booting.rst    |  3 +++
 arch/arm64/Kconfig                      |  3 +++
 arch/arm64/include/asm/debug-monitors.h |  1 +
 arch/arm64/include/asm/exception.h      |  1 +
 arch/arm64/include/asm/insn.h           |  1 +
 arch/arm64/kernel/debug-monitors.c      |  5 +++++
 arch/arm64/kernel/entry-common.c        | 12 ++++++++++++
 arch/arm64/kernel/probes/decode-insn.c  |  7 +++++--
 arch/arm64/kernel/traps.c               |  7 +++++++
 arch/arm64/lib/clear_page.S             | 13 +++++++++++++
 arch/arm64/lib/copy_page.S              | 13 +++++++++++++
 arch/arm64/lib/memcpy.S                 | 19 ++++++++++++++++++-
 arch/arm64/lib/memset.S                 | 20 +++++++++++++++++++-
 13 files changed, 101 insertions(+), 4 deletions(-)


base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
-- 
2.34.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
@ 2024-09-30 16:10 ` Kristina Martsenko
  2024-10-02 10:28   ` Catalin Marinas
  2024-09-30 16:10 ` [PATCH 2/5] arm64: mops: Handle MOPS exceptions from EL1 Kristina Martsenko
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-09-30 16:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

FEAT_MOPS instructions require that all three instructions (prologue,
main and epilogue) appear consecutively in memory. Placing a
kprobe/uprobe on one of them doesn't work as only a single instruction
gets executed out-of-line or simulated. So don't allow placing a probe
on a MOPS instruction.

Fixes: b7564127ffcb ("arm64: mops: detect and enable FEAT_MOPS")
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
---
 arch/arm64/include/asm/insn.h          | 1 +
 arch/arm64/kernel/probes/decode-insn.c | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 8c0a36f72d6f..bc77869dbd43 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -353,6 +353,7 @@ __AARCH64_INSN_FUNCS(ldrsw_lit,	0xFF000000, 0x98000000)
 __AARCH64_INSN_FUNCS(exclusive,	0x3F800000, 0x08000000)
 __AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
 __AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)
+__AARCH64_INSN_FUNCS(mops,	0x3B200C00, 0x19000400)
 __AARCH64_INSN_FUNCS(stp,	0x7FC00000, 0x29000000)
 __AARCH64_INSN_FUNCS(ldp,	0x7FC00000, 0x29400000)
 __AARCH64_INSN_FUNCS(stp_post,	0x7FC00000, 0x28800000)
diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
index 968d5fffe233..77f3c8eb0916 100644
--- a/arch/arm64/kernel/probes/decode-insn.c
+++ b/arch/arm64/kernel/probes/decode-insn.c
@@ -58,10 +58,13 @@ static bool __kprobes aarch64_insn_is_steppable(u32 insn)
 	 * Instructions which load PC relative literals are not going to work
 	 * when executed from an XOL slot. Instructions doing an exclusive
 	 * load/store are not going to complete successfully when single-step
-	 * exception handling happens in the middle of the sequence.
+	 * exception handling happens in the middle of the sequence. Memory
+	 * copy/set instructions require that all three instructions be placed
+	 * consecutively in memory.
 	 */
 	if (aarch64_insn_uses_literal(insn) ||
-	    aarch64_insn_is_exclusive(insn))
+	    aarch64_insn_is_exclusive(insn) ||
+	    aarch64_insn_is_mops(insn))
 		return false;
 
 	return true;
-- 
2.34.1




* [PATCH 2/5] arm64: mops: Handle MOPS exceptions from EL1
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
  2024-09-30 16:10 ` [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions Kristina Martsenko
@ 2024-09-30 16:10 ` Kristina Martsenko
  2024-09-30 16:10 ` [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2 Kristina Martsenko
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Kristina Martsenko @ 2024-09-30 16:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

We will soon be using MOPS instructions in the kernel, so wire up the
exception handler to handle exceptions from EL1 caused by the copy/set
operation being stopped and resumed on a different type of CPU.

Add a helper for advancing the single step state machine, similarly to
what the EL0 exception handler does.

Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
---

Note, I haven't looked at whether this needs any kprobes annotations
(NOKPROBE_SYMBOL or __kprobes) like some other exception handlers, as my
understanding is that it's not currently consistently applied anyway.

 arch/arm64/include/asm/debug-monitors.h |  1 +
 arch/arm64/include/asm/exception.h      |  1 +
 arch/arm64/kernel/debug-monitors.c      |  5 +++++
 arch/arm64/kernel/entry-common.c        | 12 ++++++++++++
 arch/arm64/kernel/traps.c               |  7 +++++++
 5 files changed, 26 insertions(+)

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index 13d437bcbf58..8f6ba31b8658 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -105,6 +105,7 @@ void kernel_enable_single_step(struct pt_regs *regs);
 void kernel_disable_single_step(void);
 int kernel_active_single_step(void);
 void kernel_rewind_single_step(struct pt_regs *regs);
+void kernel_fastforward_single_step(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 int reinstall_suspended_bps(struct pt_regs *regs);
diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index f296662590c7..8689b95f6b53 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -73,6 +73,7 @@ void do_el0_svc_compat(struct pt_regs *regs);
 void do_el0_fpac(struct pt_regs *regs, unsigned long esr);
 void do_el1_fpac(struct pt_regs *regs, unsigned long esr);
 void do_el0_mops(struct pt_regs *regs, unsigned long esr);
+void do_el1_mops(struct pt_regs *regs, unsigned long esr);
 void do_serror(struct pt_regs *regs, unsigned long esr);
 void do_signal(struct pt_regs *regs);
 
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 024a7b245056..c60a4a90c6a5 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -441,6 +441,11 @@ void kernel_rewind_single_step(struct pt_regs *regs)
 	set_regs_spsr_ss(regs);
 }
 
+void kernel_fastforward_single_step(struct pt_regs *regs)
+{
+	clear_regs_spsr_ss(regs);
+}
+
 /* ptrace API */
 void user_enable_single_step(struct task_struct *task)
 {
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3fcd9d080bf2..9d174cd541ef 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -463,6 +463,15 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 	exit_to_kernel_mode(regs);
 }
 
+static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
+{
+	enter_from_kernel_mode(regs);
+	local_daif_inherit(regs);
+	do_el1_mops(regs, esr);
+	local_daif_mask();
+	exit_to_kernel_mode(regs);
+}
+
 static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
 {
 	unsigned long far = read_sysreg(far_el1);
@@ -505,6 +514,9 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 	case ESR_ELx_EC_BTI:
 		el1_bti(regs, esr);
 		break;
+	case ESR_ELx_EC_MOPS:
+		el1_mops(regs, esr);
+		break;
 	case ESR_ELx_EC_BREAKPT_CUR:
 	case ESR_ELx_EC_SOFTSTP_CUR:
 	case ESR_ELx_EC_WATCHPT_CUR:
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 563cbce11126..fc6d44e06b8d 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -531,6 +531,13 @@ void do_el0_mops(struct pt_regs *regs, unsigned long esr)
 	user_fastforward_single_step(current);
 }
 
+void do_el1_mops(struct pt_regs *regs, unsigned long esr)
+{
+	arm64_mops_reset_regs(&regs->user_regs, esr);
+
+	kernel_fastforward_single_step(regs);
+}
+
 #define __user_cache_maint(insn, address, res)			\
 	if (address >= TASK_SIZE_MAX) {				\
 		res = -EFAULT;					\
-- 
2.34.1




* [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
  2024-09-30 16:10 ` [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions Kristina Martsenko
  2024-09-30 16:10 ` [PATCH 2/5] arm64: mops: Handle MOPS exceptions from EL1 Kristina Martsenko
@ 2024-09-30 16:10 ` Kristina Martsenko
  2024-10-02 10:38   ` Catalin Marinas
  2024-09-30 16:10 ` [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines Kristina Martsenko
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-09-30 16:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

Document that hypervisors must set HCR_EL2.MCE2 and handle MOPS
exceptions when they migrate a vCPU to another type of CPU, as Linux may
not be able to handle the exception at all times.

As one example, when running under nested virtualization, KVM does not
handle MOPS exceptions from the nVHE/hVHE EL2 hyp as the hyp is never
migrated, so the host hypervisor needs to handle them. There may be
other situations (now or in the future) where the kernel can't handle an
unexpected MOPS exception, so require that the hypervisor handles them.

Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
---
 Documentation/arch/arm64/booting.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index b57776a68f15..db46af5b9f0f 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -385,6 +385,9 @@ Before jumping into the kernel, the following conditions must be met:
 
     - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
 
+    - HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1. The exception
+      handler must set PSTATE.SS to 0b0.
+
   For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):
 
   - If EL3 is present:
-- 
2.34.1




* [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
                   ` (2 preceding siblings ...)
  2024-09-30 16:10 ` [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2 Kristina Martsenko
@ 2024-09-30 16:10 ` Kristina Martsenko
  2024-10-02 15:29   ` Catalin Marinas
  2024-09-30 16:10 ` [PATCH 5/5] arm64: lib: Use MOPS for copy_page() and clear_page() Kristina Martsenko
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-09-30 16:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

Make memcpy(), memmove() and memset() use the Armv8.8 FEAT_MOPS
instructions when implemented on the CPU.

The CPY*/SET* instructions copy or set a block of memory of arbitrary
size and alignment. They can be interrupted by the CPU and the copying
resumed later. Their performance is expected to be close to the best
generic copy/set sequence of loads/stores for a given CPU. Using them in
the kernel's copy/set routines therefore avoids the need to periodically
rewrite the routines to optimize for new microarchitectures. It could
also lead to a performance improvement for some CPUs and systems.

With this change the kernel will always use the instructions if they are
implemented on the CPU (and have not been disabled by the arm64.nomops
command line parameter). When not implemented the usual routines will be
used (patched via alternatives). Note, we need to patch B/NOP instead of
the whole sequence to avoid executing a partially patched sequence in
case the compiler generates a mem*() call inside the alternatives
patching code.

Note that MOPS instructions have relaxed behavior on Device memory, but
it is expected that these routines are not generally used on MMIO.

Note: For memcpy(), this uses the CPY* instructions instead of CPYF*, as
CPY* allows overlaps between the source and destination buffers, and
despite contradicting the C standard, compilers require that memcpy()
work on exactly overlapping source and destination:
  https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language
  https://reviews.llvm.org/D86993

Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
---
 arch/arm64/Kconfig      |  3 +++
 arch/arm64/lib/memcpy.S | 19 ++++++++++++++++++-
 arch/arm64/lib/memset.S | 20 +++++++++++++++++++-
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3e29b44d2d7b..d0fe90ea704d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2155,6 +2155,9 @@ config ARM64_EPAN
 	  if the cpu does not implement the feature.
 endmenu # "ARMv8.7 architectural features"
 
+config AS_HAS_MOPS
+	def_bool $(as-instr,.arch_extension mops)
+
 menu "ARMv8.9 architectural features"
 
 config ARM64_POE
diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
index 4ab48d49c451..9b99106fb95f 100644
--- a/arch/arm64/lib/memcpy.S
+++ b/arch/arm64/lib/memcpy.S
@@ -57,7 +57,7 @@
    The loop tail is handled by always copying 64 bytes from the end.
 */
 
-SYM_FUNC_START(__pi_memcpy)
+SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
 	add	srcend, src, count
 	add	dstend, dstin, count
 	cmp	count, 128
@@ -238,7 +238,24 @@ L(copy64_from_start):
 	stp	B_l, B_h, [dstin, 16]
 	stp	C_l, C_h, [dstin]
 	ret
+SYM_FUNC_END(__pi_memcpy_generic)
+
+#ifdef CONFIG_AS_HAS_MOPS
+	.arch_extension mops
+SYM_FUNC_START(__pi_memcpy)
+alternative_if_not ARM64_HAS_MOPS
+	b	__pi_memcpy_generic
+alternative_else_nop_endif
+
+	mov	dst, dstin
+	cpyp	[dst]!, [src]!, count!
+	cpym	[dst]!, [src]!, count!
+	cpye	[dst]!, [src]!, count!
+	ret
 SYM_FUNC_END(__pi_memcpy)
+#else
+SYM_FUNC_ALIAS(__pi_memcpy, __pi_memcpy_generic)
+#endif
 
 SYM_FUNC_ALIAS(__memcpy, __pi_memcpy)
 EXPORT_SYMBOL(__memcpy)
diff --git a/arch/arm64/lib/memset.S b/arch/arm64/lib/memset.S
index a5aebe82ad73..97157da65ec6 100644
--- a/arch/arm64/lib/memset.S
+++ b/arch/arm64/lib/memset.S
@@ -26,6 +26,7 @@
  */
 
 dstin		.req	x0
+val_x		.req	x1
 val		.req	w1
 count		.req	x2
 tmp1		.req	x3
@@ -42,7 +43,7 @@ dst		.req	x8
 tmp3w		.req	w9
 tmp3		.req	x9
 
-SYM_FUNC_START(__pi_memset)
+SYM_FUNC_START_LOCAL(__pi_memset_generic)
 	mov	dst, dstin	/* Preserve return value.  */
 	and	A_lw, val, #255
 	orr	A_lw, A_lw, A_lw, lsl #8
@@ -201,7 +202,24 @@ SYM_FUNC_START(__pi_memset)
 	ands	count, count, zva_bits_x
 	b.ne	.Ltail_maybe_long
 	ret
+SYM_FUNC_END(__pi_memset_generic)
+
+#ifdef CONFIG_AS_HAS_MOPS
+	.arch_extension mops
+SYM_FUNC_START(__pi_memset)
+alternative_if_not ARM64_HAS_MOPS
+	b	__pi_memset_generic
+alternative_else_nop_endif
+
+	mov	dst, dstin
+	setp	[dst]!, count!, val_x
+	setm	[dst]!, count!, val_x
+	sete	[dst]!, count!, val_x
+	ret
 SYM_FUNC_END(__pi_memset)
+#else
+SYM_FUNC_ALIAS(__pi_memset, __pi_memset_generic)
+#endif
 
 SYM_FUNC_ALIAS(__memset, __pi_memset)
 EXPORT_SYMBOL(__memset)
-- 
2.34.1




* [PATCH 5/5] arm64: lib: Use MOPS for copy_page() and clear_page()
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
                   ` (3 preceding siblings ...)
  2024-09-30 16:10 ` [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines Kristina Martsenko
@ 2024-09-30 16:10 ` Kristina Martsenko
  2024-10-02 15:37   ` Catalin Marinas
  2024-10-02 16:20 ` [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Catalin Marinas
  2024-10-17 18:00 ` Catalin Marinas
  6 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-09-30 16:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

Similarly to what was done to the memcpy() routines, make copy_page()
and clear_page() also use the Armv8.8 FEAT_MOPS instructions.

Note: For copy_page() this uses the CPY* instructions instead of CPYF*
as CPYF* doesn't allow src and dst to be equal. It's not clear if
copy_page() needs to allow equal src and dst but it has worked so far
with the current implementation and there is no documentation forbidding
it.

Note, the unoptimized version of copy_page() in assembler.h is left as
it is.

Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
---
 arch/arm64/lib/clear_page.S | 13 +++++++++++++
 arch/arm64/lib/copy_page.S  | 13 +++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index ebde40e7fa2b..bd6f7d5eb6eb 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -15,6 +15,19 @@
  *	x0 - dest
  */
 SYM_FUNC_START(__pi_clear_page)
+#ifdef CONFIG_AS_HAS_MOPS
+	.arch_extension mops
+alternative_if_not ARM64_HAS_MOPS
+	b	.Lno_mops
+alternative_else_nop_endif
+
+	mov	x1, #PAGE_SIZE
+	setpn	[x0]!, x1!, xzr
+	setmn	[x0]!, x1!, xzr
+	seten	[x0]!, x1!, xzr
+	ret
+.Lno_mops:
+#endif
 	mrs	x1, dczid_el0
 	tbnz	x1, #4, 2f	/* Branch if DC ZVA is prohibited */
 	and	w1, w1, #0xf
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index 6a56d7cf309d..e6374e7e5511 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -18,6 +18,19 @@
  *	x1 - src
  */
 SYM_FUNC_START(__pi_copy_page)
+#ifdef CONFIG_AS_HAS_MOPS
+	.arch_extension mops
+alternative_if_not ARM64_HAS_MOPS
+	b	.Lno_mops
+alternative_else_nop_endif
+
+	mov	x2, #PAGE_SIZE
+	cpypwn	[x0]!, [x1]!, x2!
+	cpymwn	[x0]!, [x1]!, x2!
+	cpyewn	[x0]!, [x1]!, x2!
+	ret
+.Lno_mops:
+#endif
 	ldp	x2, x3, [x1]
 	ldp	x4, x5, [x1, #16]
 	ldp	x6, x7, [x1, #32]
-- 
2.34.1




* Re: [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions
  2024-09-30 16:10 ` [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions Kristina Martsenko
@ 2024-10-02 10:28   ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-10-02 10:28 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Mon, Sep 30, 2024 at 05:10:47PM +0100, Kristina Martsenko wrote:
> FEAT_MOPS instructions require that all three instructions (prologue,
> main and epilogue) appear consecutively in memory. Placing a
> kprobe/uprobe on one of them doesn't work as only a single instruction
> gets executed out-of-line or simulated. So don't allow placing a probe
> on a MOPS instruction.
> 
> Fixes: b7564127ffcb ("arm64: mops: detect and enable FEAT_MOPS")
> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>

I think this would benefit from a cc stable:

Cc: <stable@vger.kernel.org> # 6.5.x

I can add it when applying the patch.

-- 
Catalin



* Re: [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2
  2024-09-30 16:10 ` [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2 Kristina Martsenko
@ 2024-10-02 10:38   ` Catalin Marinas
  2024-10-02 13:31     ` Kristina Martsenko
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2024-10-02 10:38 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel@lists.infradead.org, Will Deacon, Mark Rutland,
	Robin Murphy, Marc Zyngier

On Mon, Sep 30, 2024 at 05:10:49PM +0100, Kristina Martsenko wrote:
> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
> index b57776a68f15..db46af5b9f0f 100644
> --- a/Documentation/arch/arm64/booting.rst
> +++ b/Documentation/arch/arm64/booting.rst
> @@ -385,6 +385,9 @@ Before jumping into the kernel, the following conditions must be met:
>  
>      - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
>  
> +    - HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1. The exception
> +      handler must set PSTATE.SS to 0b0.

That's a booting document, do we need to specify the single-step
exception?

-- 
Catalin



* Re: [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2
  2024-10-02 10:38   ` Catalin Marinas
@ 2024-10-02 13:31     ` Kristina Martsenko
  2024-10-02 17:09       ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-10-02 13:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel@lists.infradead.org, Will Deacon, Mark Rutland,
	Robin Murphy, Marc Zyngier

On 02/10/2024 11:38, Catalin Marinas wrote:
> On Mon, Sep 30, 2024 at 05:10:49PM +0100, Kristina Martsenko wrote:
>> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
>> index b57776a68f15..db46af5b9f0f 100644
>> --- a/Documentation/arch/arm64/booting.rst
>> +++ b/Documentation/arch/arm64/booting.rst
>> @@ -385,6 +385,9 @@ Before jumping into the kernel, the following conditions must be met:
>>  
>>      - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
>>  
>> +    - HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1. The exception
>> +      handler must set PSTATE.SS to 0b0.
> 
> That's a booting document, do we need to specify the single-step
> exception?

A hypervisor can't just set MCE2 at kernel boot without also implementing an
exception handler for MOPS exceptions. The handler needs to implement the
algorithm from the Arm ARM, and in addition the kernel needs it to also clear
SS so that breakpoints/watchpoints (and KGDB single stepping) work as expected.
Is there a better place to specify this?

Thanks,
Kristina




* Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
  2024-09-30 16:10 ` [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines Kristina Martsenko
@ 2024-10-02 15:29   ` Catalin Marinas
  2024-10-03 16:46     ` Kristina Martsenko
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2024-10-02 15:29 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
> index 4ab48d49c451..9b99106fb95f 100644
> --- a/arch/arm64/lib/memcpy.S
> +++ b/arch/arm64/lib/memcpy.S
> @@ -57,7 +57,7 @@
>     The loop tail is handled by always copying 64 bytes from the end.
>  */
>  
> -SYM_FUNC_START(__pi_memcpy)
> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
>  	add	srcend, src, count
>  	add	dstend, dstin, count
>  	cmp	count, 128
> @@ -238,7 +238,24 @@ L(copy64_from_start):
>  	stp	B_l, B_h, [dstin, 16]
>  	stp	C_l, C_h, [dstin]
>  	ret
> +SYM_FUNC_END(__pi_memcpy_generic)
> +
> +#ifdef CONFIG_AS_HAS_MOPS
> +	.arch_extension mops
> +SYM_FUNC_START(__pi_memcpy)
> +alternative_if_not ARM64_HAS_MOPS
> +	b	__pi_memcpy_generic
> +alternative_else_nop_endif

I'm fine with patching the branch but I wonder whether, for the time
being, we should use alternative_if instead and the NOP to fall through
the default implementation. The hardware in the field doesn't have
FEAT_MOPS yet and they may see a slight penalty introduced by the
branch, especially for small memcpys. Just guessing, I haven't done any
benchmarks.

-- 
Catalin



* Re: [PATCH 5/5] arm64: lib: Use MOPS for copy_page() and clear_page()
  2024-09-30 16:10 ` [PATCH 5/5] arm64: lib: Use MOPS for copy_page() and clear_page() Kristina Martsenko
@ 2024-10-02 15:37   ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-10-02 15:37 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Mon, Sep 30, 2024 at 05:10:51PM +0100, Kristina Martsenko wrote:
> Similarly to what was done to the memcpy() routines, make copy_page()
> and clear_page() also use the Armv8.8 FEAT_MOPS instructions.
> 
> Note: For copy_page() this uses the CPY* instructions instead of CPYF*
> as CPYF* doesn't allow src and dst to be equal. It's not clear if
> copy_page() needs to allow equal src and dst but it has worked so far
> with the current implementation and there is no documentation forbidding
> it.

When we get real hardware, if CPYF* is faster we should switch to these
instructions. I wouldn't expect source and destination to be the same
but we can add a check.

>  SYM_FUNC_START(__pi_copy_page)
> +#ifdef CONFIG_AS_HAS_MOPS
> +	.arch_extension mops
> +alternative_if_not ARM64_HAS_MOPS
> +	b	.Lno_mops
> +alternative_else_nop_endif

Same comment as on the previous patch w.r.t. the branch.

-- 
Catalin



* Re: [PATCH 0/5] arm64: Use memory copy instructions in kernel routines
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
                   ` (4 preceding siblings ...)
  2024-09-30 16:10 ` [PATCH 5/5] arm64: lib: Use MOPS for copy_page() and clear_page() Kristina Martsenko
@ 2024-10-02 16:20 ` Catalin Marinas
  2024-10-03 16:49   ` Kristina Martsenko
  2024-10-17 18:00 ` Catalin Marinas
  6 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2024-10-02 16:20 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Mon, Sep 30, 2024 at 05:10:46PM +0100, Kristina Martsenko wrote:
> This series makes the memcpy(), memmove() and memset() routines use the
> CPY/SET instructions, as well as copy_page() and clear_page().

Another function we should optimise is mte_zero_clear_page_tags() using
SETG*.

-- 
Catalin



* Re: [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2
  2024-10-02 13:31     ` Kristina Martsenko
@ 2024-10-02 17:09       ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-10-02 17:09 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel@lists.infradead.org, Will Deacon, Mark Rutland,
	Robin Murphy, Marc Zyngier

On Wed, Oct 02, 2024 at 02:31:47PM +0100, Kristina Martsenko wrote:
> On 02/10/2024 11:38, Catalin Marinas wrote:
> > On Mon, Sep 30, 2024 at 05:10:49PM +0100, Kristina Martsenko wrote:
> >> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
> >> index b57776a68f15..db46af5b9f0f 100644
> >> --- a/Documentation/arch/arm64/booting.rst
> >> +++ b/Documentation/arch/arm64/booting.rst
> >> @@ -385,6 +385,9 @@ Before jumping into the kernel, the following conditions must be met:
> >>  
> >>      - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
> >>  
> >> +    - HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1. The exception
> >> +      handler must set PSTATE.SS to 0b0.
> > 
> > That's a booting document, do we need to specify the single-step
> > exception?
> 
> A hypervisor can't just set MCE2 at kernel boot without also implementing an
> exception handler for MOPS exceptions. The handler needs to implement the
> algorithm from the Arm ARM, and in addition the kernel needs it to also clear
> SS so that breakpoints/watchpoints (and KGDB single stepping) work as expected.
> Is there a better place to specify this?

Not sure, maybe a short mops.rst document describing the exception
handling needs for a hypervisor running Linux (well, if it's just a couple
of sentences, we might as well keep them in booting.rst). In a mops.rst
you could add more lines explaining the exception handling and the
reasoning behind PSTATE.SS. In booting.rst you can just refer mops.rst.

I'm trying to remember the discussions that lead to such requirement.
Basically the worry is that the vCPU the kernel is running on migrates
to another physical CPU with a different MOPS implementation and
triggers a fault into the kernel. The kernel may not be able to handle
the fault itself, hence setting MCE2 to force trapping to EL2.

This is all fine but the requirement for the hypervisor to clear
PSTATE.SS feels a bit strange. Doesn't it break the kernel's state
machine (or gdb's) if, suddenly, it no longer traps the next
instruction?

-- 
Catalin



* Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
  2024-10-02 15:29   ` Catalin Marinas
@ 2024-10-03 16:46     ` Kristina Martsenko
  2024-10-04 10:07       ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-10-03 16:46 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On 02/10/2024 16:29, Catalin Marinas wrote:
> On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
>> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
>> index 4ab48d49c451..9b99106fb95f 100644
>> --- a/arch/arm64/lib/memcpy.S
>> +++ b/arch/arm64/lib/memcpy.S
>> @@ -57,7 +57,7 @@
>>     The loop tail is handled by always copying 64 bytes from the end.
>>  */
>>  
>> -SYM_FUNC_START(__pi_memcpy)
>> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
>>  	add	srcend, src, count
>>  	add	dstend, dstin, count
>>  	cmp	count, 128
>> @@ -238,7 +238,24 @@ L(copy64_from_start):
>>  	stp	B_l, B_h, [dstin, 16]
>>  	stp	C_l, C_h, [dstin]
>>  	ret
>> +SYM_FUNC_END(__pi_memcpy_generic)
>> +
>> +#ifdef CONFIG_AS_HAS_MOPS
>> +	.arch_extension mops
>> +SYM_FUNC_START(__pi_memcpy)
>> +alternative_if_not ARM64_HAS_MOPS
>> +	b	__pi_memcpy_generic
>> +alternative_else_nop_endif
> 
> I'm fine with patching the branch but I wonder whether, for the time
> being, we should use alternative_if instead and the NOP to fall through
> the default implementation. The hardware in the field doesn't have
> FEAT_MOPS yet and they may see a slight penalty introduced by the
> branch, especially for small memcpys. Just guessing, I haven't done any
> benchmarks.

My thinking was that this way it doesn't have to be changed again in the
future. But I'm fine with switching to alternative_if for v2.
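
For illustration, the alternative_if variant would look roughly like this
(a sketch only, not the actual v2 patch; the MOPS body follows the shape of
the posted patch, with the nop now falling through to the generic code):

```asm
SYM_FUNC_START(__pi_memcpy)
alternative_if ARM64_HAS_MOPS
	b	L(memcpy_mops)		// patched in only when FEAT_MOPS is present
alternative_else_nop_endif
	/* generic load/store implementation falls through here */
	...
L(memcpy_mops):
	mov	dst, dstin		// keep x0 (dstin) intact as the return value
	cpyp	[dst]!, [src]!, count!	// prologue: CPU picks its implementation option
	cpym	[dst]!, [src]!, count!	// main: bulk of the copy
	cpye	[dst]!, [src]!, count!	// epilogue: remaining tail
	ret
SYM_FUNC_END(__pi_memcpy)
```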

Thanks,
Kristina




* Re: [PATCH 0/5] arm64: Use memory copy instructions in kernel routines
  2024-10-02 16:20 ` [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Catalin Marinas
@ 2024-10-03 16:49   ` Kristina Martsenko
  2024-10-04 10:10     ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-10-03 16:49 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On 02/10/2024 17:20, Catalin Marinas wrote:
> On Mon, Sep 30, 2024 at 05:10:46PM +0100, Kristina Martsenko wrote:
>> This series makes the memcpy(), memmove() and memset() routines use the
>> CPY/SET instructions, as well as copy_page() and clear_page().
> 
> Another function we should optimise is mte_zero_clear_page_tags() using
> SETG*.

I can send that as a separate series. What about mte_set_mem_tag_range()?

Thanks,
Kristina




* Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
  2024-10-03 16:46     ` Kristina Martsenko
@ 2024-10-04 10:07       ` Catalin Marinas
  2024-10-16 13:08         ` Kristina Martsenko
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2024-10-04 10:07 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
> On 02/10/2024 16:29, Catalin Marinas wrote:
> > On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
> >> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
> >> index 4ab48d49c451..9b99106fb95f 100644
> >> --- a/arch/arm64/lib/memcpy.S
> >> +++ b/arch/arm64/lib/memcpy.S
> >> @@ -57,7 +57,7 @@
> >>     The loop tail is handled by always copying 64 bytes from the end.
> >>  */
> >>  
> >> -SYM_FUNC_START(__pi_memcpy)
> >> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
> >>  	add	srcend, src, count
> >>  	add	dstend, dstin, count
> >>  	cmp	count, 128
> >> @@ -238,7 +238,24 @@ L(copy64_from_start):
> >>  	stp	B_l, B_h, [dstin, 16]
> >>  	stp	C_l, C_h, [dstin]
> >>  	ret
> >> +SYM_FUNC_END(__pi_memcpy_generic)
> >> +
> >> +#ifdef CONFIG_AS_HAS_MOPS
> >> +	.arch_extension mops
> >> +SYM_FUNC_START(__pi_memcpy)
> >> +alternative_if_not ARM64_HAS_MOPS
> >> +	b	__pi_memcpy_generic
> >> +alternative_else_nop_endif
> > 
> > I'm fine with patching the branch but I wonder whether, for the time
> > being, we should use alternative_if instead and the NOP to fall through
> > the default implementation. The hardware in the field doesn't have
> > FEAT_MOPS yet and they may see a slight penalty introduced by the
> > branch, especially for small memcpys. Just guessing, I haven't done any
> > benchmarks.
> 
> My thinking was that this way it doesn't have to be changed again in the
> future. But I'm fine with switching to alternative_if for v2.

The other option is to benchmark the proposed patches a bit and see if
we notice any difference on current hardware. Not sure exactly what
benchmarks would exercise these paths. For copy_page(), I suspect the
branch is probably lost in the noise. It's more like small copies that
might notice.

Yet another option is to leave the patches as they are and see if anyone
complains, we swap them over then ;).

-- 
Catalin



* Re: [PATCH 0/5] arm64: Use memory copy instructions in kernel routines
  2024-10-03 16:49   ` Kristina Martsenko
@ 2024-10-04 10:10     ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-10-04 10:10 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Thu, Oct 03, 2024 at 05:49:57PM +0100, Kristina Martsenko wrote:
> On 02/10/2024 17:20, Catalin Marinas wrote:
> > On Mon, Sep 30, 2024 at 05:10:46PM +0100, Kristina Martsenko wrote:
> >> This series makes the memcpy(), memmove() and memset() routines use the
> >> CPY/SET instructions, as well as copy_page() and clear_page().
> > 
> > Another function we should optimise is mte_zero_clear_page_tags() using
> > SETG*.
> 
> I can send that as a separate series.

Sounds good, thanks.

> What about mte_set_mem_tag_range()?

Ah, that as well but I guess only the init == true path since I don't
think we have any FEAT_MOPS instruction for only setting the tags.
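
For the SETG* case, a sketch of what a MOPS-based mte_zero_clear_page_tags()
could look like (illustrative only, untested; it assumes the page address in
x0 carries a zero logical address tag, since SETG* derives the allocation
tag from the address register):

```asm
	.arch_extension mops
SYM_FUNC_START(mte_zero_clear_page_tags)
	mov	x1, #PAGE_SIZE
	setgp	[x0]!, x1!, xzr		// prologue: zero the data, set tags from x0
	setgm	[x0]!, x1!, xzr		// main
	setge	[x0]!, x1!, xzr		// epilogue
	ret
SYM_FUNC_END(mte_zero_clear_page_tags)
```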

-- 
Catalin



* Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
  2024-10-04 10:07       ` Catalin Marinas
@ 2024-10-16 13:08         ` Kristina Martsenko
  2024-10-17 11:57           ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Kristina Martsenko @ 2024-10-16 13:08 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On 04/10/2024 11:07, Catalin Marinas wrote:
> On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
>> On 02/10/2024 16:29, Catalin Marinas wrote:
>>> On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
>>>> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
>>>> index 4ab48d49c451..9b99106fb95f 100644
>>>> --- a/arch/arm64/lib/memcpy.S
>>>> +++ b/arch/arm64/lib/memcpy.S
>>>> @@ -57,7 +57,7 @@
>>>>     The loop tail is handled by always copying 64 bytes from the end.
>>>>  */
>>>>  
>>>> -SYM_FUNC_START(__pi_memcpy)
>>>> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
>>>>  	add	srcend, src, count
>>>>  	add	dstend, dstin, count
>>>>  	cmp	count, 128
>>>> @@ -238,7 +238,24 @@ L(copy64_from_start):
>>>>  	stp	B_l, B_h, [dstin, 16]
>>>>  	stp	C_l, C_h, [dstin]
>>>>  	ret
>>>> +SYM_FUNC_END(__pi_memcpy_generic)
>>>> +
>>>> +#ifdef CONFIG_AS_HAS_MOPS
>>>> +	.arch_extension mops
>>>> +SYM_FUNC_START(__pi_memcpy)
>>>> +alternative_if_not ARM64_HAS_MOPS
>>>> +	b	__pi_memcpy_generic
>>>> +alternative_else_nop_endif
>>>
>>> I'm fine with patching the branch but I wonder whether, for the time
>>> being, we should use alternative_if instead and the NOP to fall through
>>> the default implementation. The hardware in the field doesn't have
>>> FEAT_MOPS yet and they may see a slight penalty introduced by the
>>> branch, especially for small memcpys. Just guessing, I haven't done any
>>> benchmarks.
>>
>> My thinking was that this way it doesn't have to be changed again in the
>> future. But I'm fine with switching to alternative_if for v2.
> 
> The other option is to benchmark the proposed patches a bit and see if
> we notice any difference on current hardware. Not sure exactly what
> benchmarks would exercise these paths. For copy_page(), I suspect the
> branch is probably lost in the noise. It's more like small copies that
> might notice.
> 
> Yet another option is to leave the patches as they are and see if anyone
> complains, we swap them over then ;).

I tried benchmarking a kernel build and hackbench on a Morello board (with
usercopy patches applied as well) but didn't see any significant performance
difference between the branch and NOP so I would leave the patches as they are.

Thanks,
Kristina




* Re: [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
  2024-10-16 13:08         ` Kristina Martsenko
@ 2024-10-17 11:57           ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-10-17 11:57 UTC (permalink / raw)
  To: Kristina Martsenko
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Robin Murphy,
	Marc Zyngier

On Wed, Oct 16, 2024 at 02:08:27PM +0100, Kristina Martsenko wrote:
> On 04/10/2024 11:07, Catalin Marinas wrote:
> > On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
> >> On 02/10/2024 16:29, Catalin Marinas wrote:
> >>> On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
> >>>> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
> >>>> index 4ab48d49c451..9b99106fb95f 100644
> >>>> --- a/arch/arm64/lib/memcpy.S
> >>>> +++ b/arch/arm64/lib/memcpy.S
> >>>> @@ -57,7 +57,7 @@
> >>>>     The loop tail is handled by always copying 64 bytes from the end.
> >>>>  */
> >>>>  
> >>>> -SYM_FUNC_START(__pi_memcpy)
> >>>> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
> >>>>  	add	srcend, src, count
> >>>>  	add	dstend, dstin, count
> >>>>  	cmp	count, 128
> >>>> @@ -238,7 +238,24 @@ L(copy64_from_start):
> >>>>  	stp	B_l, B_h, [dstin, 16]
> >>>>  	stp	C_l, C_h, [dstin]
> >>>>  	ret
> >>>> +SYM_FUNC_END(__pi_memcpy_generic)
> >>>> +
> >>>> +#ifdef CONFIG_AS_HAS_MOPS
> >>>> +	.arch_extension mops
> >>>> +SYM_FUNC_START(__pi_memcpy)
> >>>> +alternative_if_not ARM64_HAS_MOPS
> >>>> +	b	__pi_memcpy_generic
> >>>> +alternative_else_nop_endif
> >>>
> >>> I'm fine with patching the branch but I wonder whether, for the time
> >>> being, we should use alternative_if instead and the NOP to fall through
> >>> the default implementation. The hardware in the field doesn't have
> >>> FEAT_MOPS yet and they may see a slight penalty introduced by the
> >>> branch, especially for small memcpys. Just guessing, I haven't done any
> >>> benchmarks.
> >>
> >> My thinking was that this way it doesn't have to be changed again in the
> >> future. But I'm fine with switching to alternative_if for v2.
> > 
> > The other option is to benchmark the proposed patches a bit and see if
> > we notice any difference on current hardware. Not sure exactly what
> > benchmarks would exercise these paths. For copy_page(), I suspect the
> > branch is probably lost in the noise. It's more like small copies that
> > might notice.
> > 
> > Yet another option is to leave the patches as they are and see if anyone
> > complains, we swap them over then ;).
> 
> I tried benchmarking a kernel build and hackbench on a Morello board (with
> usercopy patches applied as well) but didn't see any significant performance
> difference between the branch and NOP so I would leave the patches as they are.

That's great. Thanks for checking.

-- 
Catalin



* Re: [PATCH 0/5] arm64: Use memory copy instructions in kernel routines
  2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
                   ` (5 preceding siblings ...)
  2024-10-02 16:20 ` [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Catalin Marinas
@ 2024-10-17 18:00 ` Catalin Marinas
  6 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-10-17 18:00 UTC (permalink / raw)
  To: linux-arm-kernel, Kristina Martsenko
  Cc: Will Deacon, Mark Rutland, Robin Murphy, Marc Zyngier

On Mon, 30 Sep 2024 17:10:46 +0100, Kristina Martsenko wrote:
> Here is a small series to make memcpy() and related functions use the
> memory copy/set instructions (Armv8.8 FEAT_MOPS).
> 
> The kernel uses several library routines for copying or initializing
> memory, for example copy_to_user() and memset(). These routines have
> been optimized to make their load/store sequence perform well across a
> range of CPUs. However the chosen sequence can't be the fastest possible
> for every CPU microarchitecture nor for heterogeneous systems, and needs
> to be rewritten periodically as hardware changes.
> 
> [...]

Applied to arm64 (for-next/mops), thanks!

I left the documentation patch as is but it may be helpful to add a
dedicated mops.rst one with some explanations around hypervisor
requirements. Not urgent though.

[1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions
      https://git.kernel.org/arm64/c/c56c599d9002
[2/5] arm64: mops: Handle MOPS exceptions from EL1
      https://git.kernel.org/arm64/c/13840229d6bd
[3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2
      https://git.kernel.org/arm64/c/b616058c6613
[4/5] arm64: lib: Use MOPS for memcpy() routines
      https://git.kernel.org/arm64/c/836ed3c4e473
[5/5] arm64: lib: Use MOPS for copy_page() and clear_page()
      https://git.kernel.org/arm64/c/ce6b5ff5f16d

-- 
Catalin




end of thread, other threads: [~2024-10-17 18:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-30 16:10 [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Kristina Martsenko
2024-09-30 16:10 ` [PATCH 1/5] arm64: probes: Disable kprobes/uprobes on MOPS instructions Kristina Martsenko
2024-10-02 10:28   ` Catalin Marinas
2024-09-30 16:10 ` [PATCH 2/5] arm64: mops: Handle MOPS exceptions from EL1 Kristina Martsenko
2024-09-30 16:10 ` [PATCH 3/5] arm64: mops: Document booting requirement for HCR_EL2.MCE2 Kristina Martsenko
2024-10-02 10:38   ` Catalin Marinas
2024-10-02 13:31     ` Kristina Martsenko
2024-10-02 17:09       ` Catalin Marinas
2024-09-30 16:10 ` [PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines Kristina Martsenko
2024-10-02 15:29   ` Catalin Marinas
2024-10-03 16:46     ` Kristina Martsenko
2024-10-04 10:07       ` Catalin Marinas
2024-10-16 13:08         ` Kristina Martsenko
2024-10-17 11:57           ` Catalin Marinas
2024-09-30 16:10 ` [PATCH 5/5] arm64: lib: Use MOPS for copy_page() and clear_page() Kristina Martsenko
2024-10-02 15:37   ` Catalin Marinas
2024-10-02 16:20 ` [PATCH 0/5] arm64: Use memory copy instructions in kernel routines Catalin Marinas
2024-10-03 16:49   ` Kristina Martsenko
2024-10-04 10:10     ` Catalin Marinas
2024-10-17 18:00 ` Catalin Marinas
