* [PATCH 0/3] arm64: Run kernel mode NEON with preemption enabled
@ 2023-11-21 13:52 Ard Biesheuvel
2023-11-21 13:52 ` [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag Ard Biesheuvel
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-11-21 13:52 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
From: Ard Biesheuvel <ardb@kernel.org>
Currently, kernel mode NEON (SIMD) support is implemented in a way that
requires preemption to be disabled while the SIMD registers are live.
The reason for this is that those registers are not in the set that is
preserved/restored on exception entry/exit and context switch, as this
would impact performance generally, even for workloads where kernel mode
SIMD is not the bottleneck.
However, doing substantial work with preemption disabled is not great,
as it affects scheduling latency, which is especially problematic for
real-time use cases. So ideally, we should keep preemption enabled when
we can, and find another way to ensure that this does not corrupt the
NEON register state of in-kernel SIMD users.
This series implements a suggestion by Mark Rutland, and introduces a
thread_info flag TIF_USING_KMODE_NEON, which indicates to the thread
switch machinery that the task in question has live kernel mode SIMD
state which needs to be preserved and restored. The space needed for
this is allocated in thread_struct. (*)
Given that currently, we run kernel mode NEON with softirqs disabled (to
avoid the need for preserving kernel mode NEON context belonging to task
context while the SIMD unit is being used by code running in softirq
context), just removing the preempt_disable/enable calls is not
sufficient, and we also need to leave softirqs enabled. This means that
we may need to preserve kernel mode NEON state not only on a context
switch, but also when code running in softirq context takes ownership of
the SIMD unit, but this is straight-forward once we add the scratch
space to thread_struct.
(*) We might decide to allocate this space (~512 bytes) dynamically, if
the thread_struct memory footprint causes issues. However, we should
also explore doing the same for the user space FPSIMD state, as kernel
threads never return to user space and have no need for this allocation.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Ard Biesheuvel (3):
arm64: fpsimd: Drop unneeded 'busy' flag
arm64: fpsimd: preserve/restore kernel mode NEON at context switch
arm64: crypto: remove conditional yield logic
arch/arm64/crypto/aes-glue.c | 21 ++--
arch/arm64/crypto/aes-modes.S | 2 -
arch/arm64/crypto/sha1-ce-core.S | 2 -
arch/arm64/crypto/sha1-ce-glue.c | 19 +--
arch/arm64/crypto/sha2-ce-core.S | 2 -
arch/arm64/crypto/sha2-ce-glue.c | 19 +--
arch/arm64/crypto/sha3-ce-core.S | 4 +-
arch/arm64/crypto/sha3-ce-glue.c | 14 +--
arch/arm64/crypto/sha512-ce-core.S | 2 -
arch/arm64/crypto/sha512-ce-glue.c | 16 +--
arch/arm64/include/asm/assembler.h | 29 -----
arch/arm64/include/asm/processor.h | 2 +
arch/arm64/include/asm/simd.h | 11 +-
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/kernel/asm-offsets.c | 4 -
arch/arm64/kernel/fpsimd.c | 128 +++++++++++---------
16 files changed, 106 insertions(+), 170 deletions(-)
--
2.43.0.rc1.413.gea7ed67945-goog
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag
2023-11-21 13:52 [PATCH 0/3] arm64: Run kernel mode NEON with preemption enabled Ard Biesheuvel
@ 2023-11-21 13:52 ` Ard Biesheuvel
2023-11-21 16:58 ` Mark Brown
2023-11-21 13:52 ` [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch Ard Biesheuvel
2023-11-21 13:52 ` [PATCH 3/3] arm64: crypto: remove conditional yield logic Ard Biesheuvel
2 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2023-11-21 13:52 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
From: Ard Biesheuvel <ardb@kernel.org>
Kernel mode NEON will preserve the user mode FPSIMD state by saving it
into the task struct before clobbering the registers. In order to avoid
the need for preserving kernel mode state too, we disallow nested use of
kernel mode NEON, i..e, use in softirq context while the interrupted
task context was using kernel mode NEON too.
Originally, this policy was implemented using a per-CPU flag which was
exposed via may_use_simd(), requiring the users of the kernel mode NEON
to deal with the possibility that it might return false, and having NEON
and non-NEON code paths. This policy was changed by commit
13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
disabled"), and now, softirq processing is disabled entirely instead,
and so may_use_simd() can never fail when called from task or softirq
context.
This means we can drop the fpsimd_context_busy flag entirely, and
instead, ensure that we disable softirq processing in places where we
formerly relied on the flag for preventing races in the FPSIMD preserve
routines.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/simd.h | 11 +---
arch/arm64/kernel/fpsimd.c | 53 +++++---------------
2 files changed, 13 insertions(+), 51 deletions(-)
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6a75d7ecdcaa..8e86c9e70e48 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -12,8 +12,6 @@
#include <linux/preempt.h>
#include <linux/types.h>
-DECLARE_PER_CPU(bool, fpsimd_context_busy);
-
#ifdef CONFIG_KERNEL_MODE_NEON
/*
@@ -28,17 +26,10 @@ static __must_check inline bool may_use_simd(void)
/*
* We must make sure that the SVE has been initialized properly
* before using the SIMD in kernel.
- * fpsimd_context_busy is only set while preemption is disabled,
- * and is clear whenever preemption is enabled. Since
- * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy
- * cannot change under our feet -- if it's set we cannot be
- * migrated, and if it's clear we cannot be migrated to a CPU
- * where it is set.
*/
return !WARN_ON(!system_capabilities_finalized()) &&
system_supports_fpsimd() &&
- !in_hardirq() && !irqs_disabled() && !in_nmi() &&
- !this_cpu_read(fpsimd_context_busy);
+ !in_hardirq() && !irqs_disabled() && !in_nmi();
}
#else /* ! CONFIG_KERNEL_MODE_NEON */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1559c706d32d..ccc4a78a70e4 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -85,13 +85,13 @@
* softirq kicks in. Upon vcpu_put(), KVM will save the vcpu FP state and
* flag the register state as invalid.
*
- * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
- * save the task's FPSIMD context back to task_struct from softirq context.
- * To prevent this from racing with the manipulation of the task's FPSIMD state
- * from task context and thereby corrupting the state, it is necessary to
- * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
- * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to
- * run but prevent them to use FPSIMD.
+ * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may be
+ * called from softirq context, which will save the task's FPSIMD context back
+ * to task_struct. To prevent this from racing with the manipulation of the
+ * task's FPSIMD state from task context and thereby corrupting the state, it
+ * is necessary to protect any manipulation of a task's fpsimd_state or
+ * TIF_FOREIGN_FPSTATE flag with get_cpu_fpsimd_context(), which will suspend
+ * softirq servicing entirely until put_cpu_fpsimd_context() is called.
*
* For a certain task, the sequence may look something like this:
* - the task gets scheduled in; if both the task's fpsimd_cpu field
@@ -209,27 +209,14 @@ static inline void sme_free(struct task_struct *t) { }
#endif
-DEFINE_PER_CPU(bool, fpsimd_context_busy);
-EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
-
static void fpsimd_bind_task_to_cpu(void);
-static void __get_cpu_fpsimd_context(void)
-{
- bool busy = __this_cpu_xchg(fpsimd_context_busy, true);
-
- WARN_ON(busy);
-}
-
/*
* Claim ownership of the CPU FPSIMD context for use by the calling context.
*
* The caller may freely manipulate the FPSIMD context metadata until
* put_cpu_fpsimd_context() is called.
*
- * The double-underscore version must only be called if you know the task
- * can't be preempted.
- *
* On RT kernels local_bh_disable() is not sufficient because it only
* serializes soft interrupt related sections via a local lock, but stays
* preemptible. Disabling preemption is the right choice here as bottom
@@ -242,14 +229,6 @@ static void get_cpu_fpsimd_context(void)
local_bh_disable();
else
preempt_disable();
- __get_cpu_fpsimd_context();
-}
-
-static void __put_cpu_fpsimd_context(void)
-{
- bool busy = __this_cpu_xchg(fpsimd_context_busy, false);
-
- WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
}
/*
@@ -261,18 +240,12 @@ static void __put_cpu_fpsimd_context(void)
*/
static void put_cpu_fpsimd_context(void)
{
- __put_cpu_fpsimd_context();
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
local_bh_enable();
else
preempt_enable();
}
-static bool have_cpu_fpsimd_context(void)
-{
- return !preemptible() && __this_cpu_read(fpsimd_context_busy);
-}
-
unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
{
return task->thread.vl[type];
@@ -383,7 +356,7 @@ static void task_fpsimd_load(void)
bool restore_ffr;
WARN_ON(!system_supports_fpsimd());
- WARN_ON(!have_cpu_fpsimd_context());
+ WARN_ON(preemptible());
if (system_supports_sve() || system_supports_sme()) {
switch (current->thread.fp_type) {
@@ -467,7 +440,7 @@ static void fpsimd_save(void)
unsigned int vl;
WARN_ON(!system_supports_fpsimd());
- WARN_ON(!have_cpu_fpsimd_context());
+ WARN_ON(preemptible());
if (test_thread_flag(TIF_FOREIGN_FPSTATE))
return;
@@ -1507,7 +1480,7 @@ void fpsimd_thread_switch(struct task_struct *next)
if (!system_supports_fpsimd())
return;
- __get_cpu_fpsimd_context();
+ WARN_ON_ONCE(!irqs_disabled());
/* Save unsaved fpsimd state, if any: */
fpsimd_save();
@@ -1523,8 +1496,6 @@ void fpsimd_thread_switch(struct task_struct *next)
update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
wrong_task || wrong_cpu);
-
- __put_cpu_fpsimd_context();
}
static void fpsimd_flush_thread_vl(enum vec_type type)
@@ -1829,10 +1800,10 @@ void fpsimd_save_and_flush_cpu_state(void)
if (!system_supports_fpsimd())
return;
WARN_ON(preemptible());
- __get_cpu_fpsimd_context();
+ get_cpu_fpsimd_context();
fpsimd_save();
fpsimd_flush_cpu_state();
- __put_cpu_fpsimd_context();
+ put_cpu_fpsimd_context();
}
#ifdef CONFIG_KERNEL_MODE_NEON
--
2.43.0.rc1.413.gea7ed67945-goog
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch
2023-11-21 13:52 [PATCH 0/3] arm64: Run kernel mode NEON with preemption enabled Ard Biesheuvel
2023-11-21 13:52 ` [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag Ard Biesheuvel
@ 2023-11-21 13:52 ` Ard Biesheuvel
2023-11-21 16:56 ` Mark Brown
2023-11-21 13:52 ` [PATCH 3/3] arm64: crypto: remove conditional yield logic Ard Biesheuvel
2 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2023-11-21 13:52 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
From: Ard Biesheuvel <ardb@kernel.org>
Currently, the SIMD register file is not preserved and restored along
with the general registers on exception entry/exit or context switch.
For this reason, we disable preemption when enabling the NEON for kernel
mode use in task context, and suspend the processing of softirqs so that
there are no concurrent uses in the kernel. (Kernel mode NEON may not be
used at all in other contexts).
Disabling preemption while doing CPU intensive work on inputs of
potentially unbounded size is bad for real-time performance, which is
why we try and ensure that SIMD crypto code does not operate on more
than ~4k at a time, which is an arbitrary value and requires assembler
code to implement efficiently.
We can avoid the need for disabling preemption if we can ensure that any
in-kernel users of the NEON will not lose the FPSIMD register state
across a context switch. And given that disabling softirqs implicitly
disables preemption as well, we will also have to ensure that a softirq
that uses the NEON can safely interrupt an in-kernel user.
So introduce a thread_info flag TIF_USING_KMODE_NEON, and modify the
context switch hook for FPSIMD to preserve resp. restore the kernel mode
FPSIMD to/from struct thread_struct when it is set. This avoids any
scheduling blackouts due to prolonged use of the NEON in kernel mode,
without the need for manual yielding.
In order to support softirq processing while the NEON is being used in
kernel task context, use the same flag to decide whether the kernel mode
FPSIMD state needs to be preserved and restored before allowing the NEON
to be used in softirq context.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/processor.h | 2 +
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/kernel/fpsimd.c | 75 ++++++++++++++++----
3 files changed, 63 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index e5bc54522e71..dcb51c0571af 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -167,6 +167,8 @@ struct thread_struct {
unsigned long fault_address; /* fault info */
unsigned long fault_code; /* ESR_EL1 value */
struct debug_info debug; /* debugging */
+
+ struct user_fpsimd_state kmode_fpsimd_state;
#ifdef CONFIG_ARM64_PTR_AUTH
struct ptrauth_keys_user keys_user;
#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 553d1bc559c6..b65324d1ffc8 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -80,6 +80,7 @@ void arch_setup_new_exec(void);
#define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */
#define TIF_SME 27 /* SME in use */
#define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */
+#define TIF_USING_KMODE_NEON 29 /* Task is in a kernel mode NEON section */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ccc4a78a70e4..326c1613091e 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -357,6 +357,7 @@ static void task_fpsimd_load(void)
WARN_ON(!system_supports_fpsimd());
WARN_ON(preemptible());
+ WARN_ON(test_thread_flag(TIF_USING_KMODE_NEON));
if (system_supports_sve() || system_supports_sme()) {
switch (current->thread.fp_type) {
@@ -1482,20 +1483,29 @@ void fpsimd_thread_switch(struct task_struct *next)
WARN_ON_ONCE(!irqs_disabled());
- /* Save unsaved fpsimd state, if any: */
- fpsimd_save();
+ if (!test_thread_flag(TIF_USING_KMODE_NEON)) {
+ /* Save unsaved user mode fpsimd state, if any: */
+ fpsimd_save();
+ } else {
+ fpsimd_save_state(¤t->thread.kmode_fpsimd_state);
+ }
- /*
- * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
- * state. For kernel threads, FPSIMD registers are never loaded
- * and wrong_task and wrong_cpu will always be true.
- */
- wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
- &next->thread.uw.fpsimd_state;
- wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
+ if (test_tsk_thread_flag(next, TIF_USING_KMODE_NEON)) {
+ fpsimd_load_state(&next->thread.kmode_fpsimd_state);
+ set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
+ } else {
+ /*
+ * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
+ * state. For kernel threads, FPSIMD registers are never loaded
+ * and wrong_task and wrong_cpu will always be true.
+ */
+ wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
+ &next->thread.uw.fpsimd_state;
+ wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
- update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
- wrong_task || wrong_cpu);
+ update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
+ wrong_task || wrong_cpu);
+ }
}
static void fpsimd_flush_thread_vl(enum vec_type type)
@@ -1834,11 +1844,37 @@ void kernel_neon_begin(void)
get_cpu_fpsimd_context();
- /* Save unsaved fpsimd state, if any: */
- fpsimd_save();
+ if (!test_thread_flag(TIF_USING_KMODE_NEON)) {
+ /* Save unsaved user mode fpsimd state, if any: */
+ fpsimd_save();
+
+ /*
+ * Set the thread flag so that the kernel mode NEON state will
+ * be context switched along with the rest of the task state.
+ *
+ * On non-PREEMPT_RT, softirqs may interrupt task level kernel
+ * mode NEON, but the task will not be preemptible so setting
+ * TIF_USING_KMODE_NEON for those would be both wrong (as it
+ * would mark the task context NEON state as requiring a
+ * context switch) and unnecessary.
+ *
+ * On PREEMPT_RT, softirqs are serviced from a separate thread,
+ * which is scheduled as usual, and this guarantees that these
+ * softirqs are not interrupting use of the NEON in kernel mode
+ * in task context. So in this case, setting the flag here is
+ * always appropriate.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq())
+ set_thread_flag(TIF_USING_KMODE_NEON);
+ } else {
+ BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq());
+ fpsimd_save_state(¤t->thread.kmode_fpsimd_state);
+ }
/* Invalidate any task state remaining in the fpsimd regs: */
fpsimd_flush_cpu_state();
+
+ put_cpu_fpsimd_context();
}
EXPORT_SYMBOL_GPL(kernel_neon_begin);
@@ -1856,7 +1892,16 @@ void kernel_neon_end(void)
if (!system_supports_fpsimd())
return;
- put_cpu_fpsimd_context();
+ /*
+ * If we are returning from a nested use of the kernel mode NEON,
+ * restore the task context kernel mode NEON state. This can only
+ * happen when running in softirq context on non-PREEMPT_RT.
+ */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() &&
+ test_thread_flag(TIF_USING_KMODE_NEON))
+ fpsimd_load_state(¤t->thread.kmode_fpsimd_state);
+ else
+ clear_thread_flag(TIF_USING_KMODE_NEON);
}
EXPORT_SYMBOL_GPL(kernel_neon_end);
--
2.43.0.rc1.413.gea7ed67945-goog
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] arm64: crypto: remove conditional yield logic
2023-11-21 13:52 [PATCH 0/3] arm64: Run kernel mode NEON with preemption enabled Ard Biesheuvel
2023-11-21 13:52 ` [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag Ard Biesheuvel
2023-11-21 13:52 ` [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch Ard Biesheuvel
@ 2023-11-21 13:52 ` Ard Biesheuvel
2 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2023-11-21 13:52 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
From: Ard Biesheuvel <ardb@kernel.org>
Some classes of crypto algorithms (such as skciphers or aeads) have
natural yield points, but SIMD based shashes yield the NEON unit
manually to avoid causing scheduling blackouts when operating on large
inputs.
This is no longer necessary now that kernel mode NEON runs with
preemption enabled, so remove this logic from the crypto code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/aes-glue.c | 21 +++++---------
arch/arm64/crypto/aes-modes.S | 2 --
arch/arm64/crypto/sha1-ce-core.S | 2 --
arch/arm64/crypto/sha1-ce-glue.c | 19 ++++---------
arch/arm64/crypto/sha2-ce-core.S | 2 --
arch/arm64/crypto/sha2-ce-glue.c | 19 ++++---------
arch/arm64/crypto/sha3-ce-core.S | 4 +--
arch/arm64/crypto/sha3-ce-glue.c | 14 ++++------
arch/arm64/crypto/sha512-ce-core.S | 2 --
arch/arm64/crypto/sha512-ce-glue.c | 16 ++++-------
arch/arm64/include/asm/assembler.h | 29 --------------------
arch/arm64/kernel/asm-offsets.c | 4 ---
12 files changed, 30 insertions(+), 104 deletions(-)
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 162787c7aa86..c42c903b7d60 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -109,9 +109,9 @@ asmlinkage void aes_essiv_cbc_decrypt(u8 out[], u8 const in[], u32 const rk1[],
int rounds, int blocks, u8 iv[],
u32 const rk2[]);
-asmlinkage int aes_mac_update(u8 const in[], u32 const rk[], int rounds,
- int blocks, u8 dg[], int enc_before,
- int enc_after);
+asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds,
+ int blocks, u8 dg[], int enc_before,
+ int enc_after);
struct crypto_aes_xts_ctx {
struct crypto_aes_ctx key1;
@@ -880,17 +880,10 @@ static void mac_do_update(struct crypto_aes_ctx *ctx, u8 const in[], int blocks,
int rounds = 6 + ctx->key_length / 4;
if (crypto_simd_usable()) {
- int rem;
-
- do {
- kernel_neon_begin();
- rem = aes_mac_update(in, ctx->key_enc, rounds, blocks,
- dg, enc_before, enc_after);
- kernel_neon_end();
- in += (blocks - rem) * AES_BLOCK_SIZE;
- blocks = rem;
- enc_before = 0;
- } while (blocks);
+ kernel_neon_begin();
+ aes_mac_update(in, ctx->key_enc, rounds, blocks, dg,
+ enc_before, enc_after);
+ kernel_neon_end();
} else {
if (enc_before)
aes_encrypt(ctx, dg, dg);
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 0e834a2c062c..4d68853d0caf 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -842,7 +842,6 @@ AES_FUNC_START(aes_mac_update)
cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8
st1 {v0.16b}, [x4] /* return dg */
- cond_yield .Lmacout, x7, x8
b .Lmacloop4x
.Lmac1x:
add w3, w3, #4
@@ -861,6 +860,5 @@ AES_FUNC_START(aes_mac_update)
.Lmacout:
st1 {v0.16b}, [x4] /* return dg */
- mov w0, w3
ret
AES_FUNC_END(aes_mac_update)
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 9b1f2d82a6fe..113deb09dbf6 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -121,7 +121,6 @@ CPU_LE( rev32 v11.16b, v11.16b )
add dgav.4s, dgav.4s, dg0v.4s
cbz w2, 2f
- cond_yield 3f, x5, x6
b 0b
/*
@@ -145,6 +144,5 @@ CPU_LE( rev32 v11.16b, v11.16b )
/* store new state */
3: st1 {dgav.4s}, [x0]
str dgb, [x0, #16]
- mov w0, w2
ret
SYM_FUNC_END(__sha1_ce_transform)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index 1dd93e1fcb39..c1c5c5cb104b 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -29,23 +29,16 @@ struct sha1_ce_state {
extern const u32 sha1_ce_offsetof_count;
extern const u32 sha1_ce_offsetof_finalize;
-asmlinkage int __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
- int blocks);
+asmlinkage void __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
+ int blocks);
static void sha1_ce_transform(struct sha1_state *sst, u8 const *src,
int blocks)
{
- while (blocks) {
- int rem;
-
- kernel_neon_begin();
- rem = __sha1_ce_transform(container_of(sst,
- struct sha1_ce_state,
- sst), src, blocks);
- kernel_neon_end();
- src += (blocks - rem) * SHA1_BLOCK_SIZE;
- blocks = rem;
- }
+ kernel_neon_begin();
+ __sha1_ce_transform(container_of(sst, struct sha1_ce_state, sst), src,
+ blocks);
+ kernel_neon_end();
}
const u32 sha1_ce_offsetof_count = offsetof(struct sha1_ce_state, sst.count);
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index fce84d88ddb2..c9629992d3cf 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -129,7 +129,6 @@ CPU_LE( rev32 v19.16b, v19.16b )
/* handled all input blocks? */
cbz w2, 2f
- cond_yield 3f, x5, x6
b 0b
/*
@@ -152,6 +151,5 @@ CPU_LE( rev32 v19.16b, v19.16b )
/* store new state */
3: st1 {dgav.4s, dgbv.4s}, [x0]
- mov w0, w2
ret
SYM_FUNC_END(__sha256_ce_transform)
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index 0a44d2e7ee1f..f785a66a1de4 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -30,23 +30,16 @@ struct sha256_ce_state {
extern const u32 sha256_ce_offsetof_count;
extern const u32 sha256_ce_offsetof_finalize;
-asmlinkage int __sha256_ce_transform(struct sha256_ce_state *sst, u8 const *src,
- int blocks);
+asmlinkage void __sha256_ce_transform(struct sha256_ce_state *sst, u8 const *src,
+ int blocks);
static void sha256_ce_transform(struct sha256_state *sst, u8 const *src,
int blocks)
{
- while (blocks) {
- int rem;
-
- kernel_neon_begin();
- rem = __sha256_ce_transform(container_of(sst,
- struct sha256_ce_state,
- sst), src, blocks);
- kernel_neon_end();
- src += (blocks - rem) * SHA256_BLOCK_SIZE;
- blocks = rem;
- }
+ kernel_neon_begin();
+ __sha256_ce_transform(container_of(sst, struct sha256_ce_state, sst),
+ src, blocks);
+ kernel_neon_end();
}
const u32 sha256_ce_offsetof_count = offsetof(struct sha256_ce_state,
diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 9c77313f5a60..10c74f19054d 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -184,18 +184,16 @@ SYM_FUNC_START(sha3_ce_transform)
eor v0.16b, v0.16b, v31.16b
cbnz w8, 3b
- cond_yield 4f, x8, x9
cbnz w2, 0b
/* save state */
-4: st1 { v0.1d- v3.1d}, [x0], #32
+ st1 { v0.1d- v3.1d}, [x0], #32
st1 { v4.1d- v7.1d}, [x0], #32
st1 { v8.1d-v11.1d}, [x0], #32
st1 {v12.1d-v15.1d}, [x0], #32
st1 {v16.1d-v19.1d}, [x0], #32
st1 {v20.1d-v23.1d}, [x0], #32
st1 {v24.1d}, [x0]
- mov w0, w2
ret
SYM_FUNC_END(sha3_ce_transform)
diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index 250e1377c481..d689cd2bf4cf 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -28,8 +28,8 @@ MODULE_ALIAS_CRYPTO("sha3-256");
MODULE_ALIAS_CRYPTO("sha3-384");
MODULE_ALIAS_CRYPTO("sha3-512");
-asmlinkage int sha3_ce_transform(u64 *st, const u8 *data, int blocks,
- int md_len);
+asmlinkage void sha3_ce_transform(u64 *st, const u8 *data, int blocks,
+ int md_len);
static int sha3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
@@ -59,15 +59,11 @@ static int sha3_update(struct shash_desc *desc, const u8 *data,
blocks = len / sctx->rsiz;
len %= sctx->rsiz;
- while (blocks) {
- int rem;
-
+ if (blocks) {
kernel_neon_begin();
- rem = sha3_ce_transform(sctx->st, data, blocks,
- digest_size);
+ sha3_ce_transform(sctx->st, data, blocks, digest_size);
kernel_neon_end();
- data += (blocks - rem) * sctx->rsiz;
- blocks = rem;
+ data += blocks * sctx->rsiz;
}
}
diff --git a/arch/arm64/crypto/sha512-ce-core.S b/arch/arm64/crypto/sha512-ce-core.S
index 91ef68b15fcc..bfaa4f591290 100644
--- a/arch/arm64/crypto/sha512-ce-core.S
+++ b/arch/arm64/crypto/sha512-ce-core.S
@@ -195,12 +195,10 @@ CPU_LE( rev64 v19.16b, v19.16b )
add v10.2d, v10.2d, v2.2d
add v11.2d, v11.2d, v3.2d
- cond_yield 3f, x4, x5
/* handled all input blocks? */
cbnz w2, 0b
/* store new state */
3: st1 {v8.2d-v11.2d}, [x0]
- mov w0, w2
ret
SYM_FUNC_END(__sha512_ce_transform)
diff --git a/arch/arm64/crypto/sha512-ce-glue.c b/arch/arm64/crypto/sha512-ce-glue.c
index f3431fc62315..70eef74fe031 100644
--- a/arch/arm64/crypto/sha512-ce-glue.c
+++ b/arch/arm64/crypto/sha512-ce-glue.c
@@ -26,23 +26,17 @@ MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("sha384");
MODULE_ALIAS_CRYPTO("sha512");
-asmlinkage int __sha512_ce_transform(struct sha512_state *sst, u8 const *src,
- int blocks);
+asmlinkage void __sha512_ce_transform(struct sha512_state *sst, u8 const *src,
+ int blocks);
asmlinkage void sha512_block_data_order(u64 *digest, u8 const *src, int blocks);
static void sha512_ce_transform(struct sha512_state *sst, u8 const *src,
int blocks)
{
- while (blocks) {
- int rem;
-
- kernel_neon_begin();
- rem = __sha512_ce_transform(sst, src, blocks);
- kernel_neon_end();
- src += (blocks - rem) * SHA512_BLOCK_SIZE;
- blocks = rem;
- }
+ kernel_neon_begin();
+ __sha512_ce_transform(sst, src, blocks);
+ kernel_neon_end();
}
static void sha512_arm64_transform(struct sha512_state *sst, u8 const *src,
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 376a980f2bad..f0da53a0388f 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -759,35 +759,6 @@ alternative_endif
set_sctlr sctlr_el2, \reg
.endm
- /*
- * Check whether preempt/bh-disabled asm code should yield as soon as
- * it is able. This is the case if we are currently running in task
- * context, and either a softirq is pending, or the TIF_NEED_RESCHED
- * flag is set and re-enabling preemption a single time would result in
- * a preempt count of zero. (Note that the TIF_NEED_RESCHED flag is
- * stored negated in the top word of the thread_info::preempt_count
- * field)
- */
- .macro cond_yield, lbl:req, tmp:req, tmp2:req
- get_current_task \tmp
- ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
- /*
- * If we are serving a softirq, there is no point in yielding: the
- * softirq will not be preempted no matter what we do, so we should
- * run to completion as quickly as we can.
- */
- tbnz \tmp, #SOFTIRQ_SHIFT, .Lnoyield_\@
-#ifdef CONFIG_PREEMPTION
- sub \tmp, \tmp, #PREEMPT_DISABLE_OFFSET
- cbz \tmp, \lbl
-#endif
- adr_l \tmp, irq_stat + IRQ_CPUSTAT_SOFTIRQ_PENDING
- get_this_cpu_offset \tmp2
- ldr w\tmp, [\tmp, \tmp2]
- cbnz w\tmp, \lbl // yield on pending softirq in task context
-.Lnoyield_\@:
- .endm
-
/*
* Branch Target Identifier (BTI)
*/
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 5ff1942b04fc..fb9e9ef9b527 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -116,10 +116,6 @@ int main(void)
DEFINE(DMA_TO_DEVICE, DMA_TO_DEVICE);
DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE);
BLANK();
- DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET);
- DEFINE(SOFTIRQ_SHIFT, SOFTIRQ_SHIFT);
- DEFINE(IRQ_CPUSTAT_SOFTIRQ_PENDING, offsetof(irq_cpustat_t, __softirq_pending));
- BLANK();
DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task));
BLANK();
DEFINE(FTR_OVR_VAL_OFFSET, offsetof(struct arm64_ftr_override, val));
--
2.43.0.rc1.413.gea7ed67945-goog
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch
2023-11-21 13:52 ` [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch Ard Biesheuvel
@ 2023-11-21 16:56 ` Mark Brown
2023-11-21 17:45 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Mark Brown @ 2023-11-21 16:56 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas
[-- Attachment #1.1: Type: text/plain, Size: 1984 bytes --]
On Tue, Nov 21, 2023 at 02:52:15PM +0100, Ard Biesheuvel wrote:
> So introduce a thread_info flag TIF_USING_KMODE_NEON, and modify the
> context switch hook for FPSIMD to preserve resp. restore the kernel mode
Can we bikeshed this to TIF_USING_KMODE_FP please?
Also is "context switch hook ... preserve resp. restore" just "context
switch hooks to preserve and restore" or am I missing something there?
> - /* Save unsaved fpsimd state, if any: */
> - fpsimd_save();
> + if (!test_thread_flag(TIF_USING_KMODE_NEON)) {
> + /* Save unsaved user mode fpsimd state, if any: */
> + fpsimd_save();
> + } else {
> + fpsimd_save_state(¤t->thread.kmode_fpsimd_state);
> + }
This all works but I'm having a hard time loving the fact that it's
not integrated with the cleverness that we're doing with KVM to switch
out where a FP save gets written to, we've open coded a FPSIMD only
register save and we'll have to extend that to support SVE or SME (which
I'm pretty sure we'll want to do eventually).
Unless we're going to totally rework how KVM does things it seems like
it's better to use fpsimd_bind_state_to_cpu() in kernel_neon_begin(),
but that does raise the complexity with checking TIF_FOREIGN_FPSTATE in
there. I *think* what we want here is to do something like rename the
existing fpsimd_save() to user_fpsimd_save() (or something), pull out
the actual register->memory bit into a new fpsimd_save() that's only
used there, in fpsimd_thread_switch() and in the newly added code. That
way there's only one place where we have code that knows how to write
data out to memory and we always have whatever memory backs any live
data in the registers bound for use there, that seems less likely to go
wrong in future and easier to add asserts for.
I don't *think* we need to worry about overwriting whatever KVM bound
there since we're about to set TIF_FOREIGN_FPSTATE and start using the
registers so we won't be saving again, I could be missing some case
there though.
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
[-- Attachment #2: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag
2023-11-21 13:52 ` [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag Ard Biesheuvel
@ 2023-11-21 16:58 ` Mark Brown
0 siblings, 0 replies; 10+ messages in thread
From: Mark Brown @ 2023-11-21 16:58 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas
[-- Attachment #1.1: Type: text/plain, Size: 504 bytes --]
On Tue, Nov 21, 2023 at 02:52:14PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> into the task struct before clobbering the registers. In order to avoid
> the need for preserving kernel mode state too, we disallow nested use of
> kernel mode NEON, i..e, use in softirq context while the interrupted
> task context was using kernel mode NEON too.
Reviewed-by: Mark Brown <broonie@kernel.org>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
[-- Attachment #2: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch
2023-11-21 16:56 ` Mark Brown
@ 2023-11-21 17:45 ` Ard Biesheuvel
2023-11-21 20:24 ` Mark Brown
0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2023-11-21 17:45 UTC (permalink / raw)
To: Mark Brown
Cc: Ard Biesheuvel, linux-arm-kernel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas
On Tue, 21 Nov 2023 at 11:56, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Nov 21, 2023 at 02:52:15PM +0100, Ard Biesheuvel wrote:
>
> > So introduce a thread_info flag TIF_USING_KMODE_NEON, and modify the
> > context switch hook for FPSIMD to preserve resp. restore the kernel mode
>
> Can we bikeshed this to TIF_USING_KMODE_FP please?
>
TIF_USING_KMODE_FPSIMD perhaps?
> Also is "context switch hook ... preserve resp. restore" just "context
> switch hooks to preserve and restore" or am I missing something there?
>
It combines with the to/from that follows, but I'll reword this to
make it less confusing
> > - /* Save unsaved fpsimd state, if any: */
> > - fpsimd_save();
> > + if (!test_thread_flag(TIF_USING_KMODE_NEON)) {
> > + /* Save unsaved user mode fpsimd state, if any: */
> > + fpsimd_save();
> > + } else {
> > + fpsimd_save_state(¤t->thread.kmode_fpsimd_state);
> > + }
>
> This all works but I'm having a hard time loving the fact that it's
> not integrated with the cleverness that we're doing with KVM to switch
> out where a FP save gets written to, we've open coded a FPSIMD only
> register save and we'll have to extend that to support SVE or SME (which
> I'm pretty sure we'll want to do eventually).
>
How so? KVM only deals with user mode (guest) FPSIMD state, and never
uses the FPSIMD unit in kernel mode.
Also, I don't think it makes lots of sense to complicate things now in
anticipation of kernel mode SVE/SME, which doesn't seem that
compelling to me. (The primary use case for kernel mode NEON is the
crypto extensions, which are only accessible via NEON registers to
begin with, and which provide a 5x-15x performance boost compared to
scalar code. SVE may support those crypto extensions too, but that
doesn't necessarily imply they will actually be faster than plain
NEON.)
> Unless we're going to totally rework how KVM does things it seems like
> it's better to use fpsimd_bind_state_to_cpu() in kernel_neon_begin(),
> but that does raise the complexity with checking TIF_FOREIGN_FPSTATE in
> there. I *think* what we want here is to do something like rename the
> existing fpsimd_save() to user_fpsimd_save() (or something), pull out
> the actual register->memory bit into a new fpsimd_save() that's only
> used there, in fpsimd_thread_switch() and in the newly added code. That
> way there's only one place where we have code that knows how to write
> data out to memory and we always have whatever memory backs any live
> data in the registers bound for use there, that seems less likely to go
> wrong in future and easier to add asserts for.
>
> I don't *think* we need to worry about overwriting whatever KVM bound
> there since we're about to set TIF_FOREIGN_FPSTATE and start using the
> registers so we won't be saving again, I could be missing some case
> there though.
Modulo future support for kernel mode SVE/SME, I don't think there is
a need for all of this. But of course, I could be the one missing
something here.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch
2023-11-21 17:45 ` Ard Biesheuvel
@ 2023-11-21 20:24 ` Mark Brown
2023-11-22 9:30 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Mark Brown @ 2023-11-21 20:24 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-arm-kernel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas
[-- Attachment #1.1: Type: text/plain, Size: 3787 bytes --]
On Tue, Nov 21, 2023 at 12:45:48PM -0500, Ard Biesheuvel wrote:
> On Tue, 21 Nov 2023 at 11:56, Mark Brown <broonie@kernel.org> wrote:
> > On Tue, Nov 21, 2023 at 02:52:15PM +0100, Ard Biesheuvel wrote:
> > > So introduce a thread_info flag TIF_USING_KMODE_NEON, and modify the
> > > context switch hook for FPSIMD to preserve resp. restore the kernel mode
> > Can we bikeshed this to TIF_USING_KMODE_FP please?
> TIF_USING_KMODE_FPSIMD perhaps?
Sure - my goal was to avoid wanting to rename the flag if we use the
vector extensions, but then we already use FPSIMD for all FP related
state so that's fine also.
> > This all works but I'm having a hard time loving the fact that it's
> > not integrated with the cleverness that we're doing with KVM to switch
> > out where a FP save gets written to, we've open coded a FPSIMD only
> > register save and we'll have to extend that to support SVE or SME (which
> > I'm pretty sure we'll want to do eventually).
> How so? KVM only deals with user mode (guest) FPSIMD state, and never
> uses the FPSIMD unit in kernel mode.
Given that we only have one copy of the FP registers which is shared
between all ELs I'm just thinking of this in terms of the FP registers,
what owns them and where we write their contents to rather than that
they're particularly user registers - the only thing that's specifically
user mode about the existing fpsimd_save() is that it's got the check
for TIF_FOREIGN_FPSTATE in there. It is also called fpsimd_save() so it
is where I'd expect to find all the code that saves FP things. I would
also expect that anything that takes ownership of the registers for use
in a preemptable context where we might need to save would be using
bind_state_to_cpu(). Adding new, independent paths for managing the
live register state feels like it's making things harder to reason
about.
To me what this change does is add another potential owner of the live
FP state, I wouldn't expect the fact that this user is for kernel mode
to change how we keep track of who owns that state.
> Also, I don't think it makes lots of sense to complicate things now in
> anticipation of kernel mode SVE/SME, which doesn't seem that
> compelling to me. (The primary use case for kernel mode NEON is the
> crypto extensions, which are only accessible via NEON registers to
> begin with, and which provide a 5x-15x performance boost compared to
> scalar code. SVE may support those crypto extensions too, but that
> doesn't necessarily imply they will actually be faster than plain
> NEON.)
The preparation for kernel mode SVE/SME is as much a nice fallout that I
noticed as it is a specific goal, my thinking here is more about
consistency in how we work with the FP state.
I do suspect we might see some useful crypto stuff for them at some
point as the architecture and implementations advance, but I very much
agree that it's premature to actually do anything since we've not yet
seen any concrete examples.
> Modulo future support for kernel mode SVE/SME, I don't think there is
> a need for all of this. But of course, I could be the one missing
> something here.
I'm not sure it's a huge refactor, it's mostly just the rename? It
seemed proportionate to the change anyway. I've not actually *done* it
though so it might be much more involved than I'm imagining. I might
have a look tomorrow if I get time, I need to poke at that code anyway
for some other things.
Oh, and BTW I should mention my series for the 2023 DPISA:
https://lore.kernel.org/linux-arm-kernel/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kernel.org/
Once that's merged there'll need to be some consideration of FPMR,
though as far as I'm aware nothing in there impacts any crypto stuff so
it should't cause any immediate problems to ignore it.
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
[-- Attachment #2: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch
2023-11-21 20:24 ` Mark Brown
@ 2023-11-22 9:30 ` Ard Biesheuvel
2023-11-22 15:45 ` Mark Brown
0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2023-11-22 9:30 UTC (permalink / raw)
To: Mark Brown
Cc: Ard Biesheuvel, linux-arm-kernel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas
On Tue, 21 Nov 2023 at 21:25, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Nov 21, 2023 at 12:45:48PM -0500, Ard Biesheuvel wrote:
> > On Tue, 21 Nov 2023 at 11:56, Mark Brown <broonie@kernel.org> wrote:
> > > This all works but I'm having a hard time loving the fact that it's
> > > not integrated with the cleverness that we're doing with KVM to switch
> > > out where a FP save gets written to, we've open coded a FPSIMD only
> > > register save and we'll have to extend that to support SVE or SME (which
> > > I'm pretty sure we'll want to do eventually).
>
> > How so? KVM only deals with user mode (guest) FPSIMD state, and never
> > uses the FPSIMD unit in kernel mode.
>
> Given that we only have one copy of the FP registers which is shared
> between all ELs I'm just thinking of this in terms of the FP registers,
> what owns them and where we write their contents to rather than that
> they're particularly user registers - the only thing that's specifically
> user mode about the existing fpsimd_save() is that it's got the check
> for TIF_FOREIGN_FPSTATE in there.
The user mode state is preserved eagerly but selectively (i.e., we
only preserve the pieces that were actually live), and restored
lazily, so that we never blow away useful user context and replace it
with something that is never used (either because the task never
returns to user space before being switched out, or because it is a
kernel thread which has no user space to begin with)
All the complexity related to user return, KVM etc is to handle with
these policies.
Kernel mode FPSIMD state is always preserved eagerly and always
restored eagerly, and always involves the contents of the NEON
registers only. So what do we gain by forcing ourselves to use the
complicated logic for a very simple use case that is completely
disjoint?
> It is also called fpsimd_save() so it
> is where I'd expect to find all the code that saves FP things.
I can rename fpsimd_save() to fpsimd_save_user_state() and add a
fpsimd_save_kernel_state() that encapsulates the fpsimd_save_state()
call if that makes things clearer.
> I would
> also expect that anything that takes ownership of the registers for use
> in a preemptable context where we might need to save would be using
> bind_state_to_cpu().
Why? This is part of the lazy restore logic that we have no need for.
> Adding new, independent paths for managing the
> live register state feels like it's making things harder to reason
> about.
>
> To me what this change does is add another potential owner of the live
> FP state, I wouldn't expect the fact that this user is for kernel mode
> to change how we keep track of who owns that state.
>
The difference is that kernel mode FPSIMD state will always be
reloaded eagerly when switching to the task. None of the load/save
logic that reasons about who owns what, whether the current state is
the most recent one or whether it is SVE or SME is needed here.
> > Also, I don't think it makes lots of sense to complicate things now in
> > anticipation of kernel mode SVE/SME, which doesn't seem that
> > compelling to me. (The primary use case for kernel mode NEON is the
> > crypto extensions, which are only accessible via NEON registers to
> > begin with, and which provide a 5x-15x performance boost compared to
> > scalar code. SVE may support those crypto extensions too, but that
> > doesn't necessarily imply they will actually be faster than plain
> > NEON.)
>
> The preparation for kernel mode SVE/SME is as much a nice fallout that I
> noticed as it is a specific goal, my thinking here is more about
> consistency in how we work with the FP state.
>
> I do suspect we might see some useful crypto stuff for them at some
> point as the architecture and implementations advance, but I very much
> agree that it's premature to actually do anything since we've not yet
> seen any concrete examples.
>
Exactly. And if there are any data points you can share, I'm happy to
look into this from the crypto side.
> > Modulo future support for kernel mode SVE/SME, I don't think there is
> > a need for all of this. But of course, I could be the one missing
> > something here.
>
> I'm not sure it's a huge refactor, it's mostly just the rename? It
> seemed proportionate to the change anyway. I've not actually *done* it
> though so it might be much more involved than I'm imagining. I might
> have a look tomorrow if I get time, I need to poke at that code anyway
> for some other things.
>
OK.
> Oh, and BTW I should mention my series for the 2023 DPISA:
>
> https://lore.kernel.org/linux-arm-kernel/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kernel.org/
>
> Once that's merged there'll need to be some consideration of FPMR,
> though as far as I'm aware nothing in there impacts any crypto stuff so
> it should't cause any immediate problems to ignore it.
Thanks for the head's up. Agree that this seems only relevant to user space.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch
2023-11-22 9:30 ` Ard Biesheuvel
@ 2023-11-22 15:45 ` Mark Brown
0 siblings, 0 replies; 10+ messages in thread
From: Mark Brown @ 2023-11-22 15:45 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-arm-kernel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas
[-- Attachment #1.1: Type: text/plain, Size: 2959 bytes --]
On Wed, Nov 22, 2023 at 10:30:15AM +0100, Ard Biesheuvel wrote:
> On Tue, 21 Nov 2023 at 21:25, Mark Brown <broonie@kernel.org> wrote:
> > On Tue, Nov 21, 2023 at 12:45:48PM -0500, Ard Biesheuvel wrote:
> Kernel mode FPSIMD state is always preserved eagerly and always
> restored eagerly, and always involves the contents of the NEON
> registers only. So what do we gain by forcing ourselves to use the
> complicated logic for a very simple use case that is completely
> disjoint?
I think a key difference between us here is that I don't see the use
case as completely disjoint - I see the floating point state of the
hardware as being the primary point from which everything else flows.
TBH I wouldn't even think about what's going on in fpsimd_save() as
relevant complexity, it's all encapsulated in there anyway.
> > It is also called fpsimd_save() so it
> > is where I'd expect to find all the code that saves FP things.
> I can rename fpsimd_save() to fpsimd_save_user_state() and add a
> fpsimd_save_kernel_state() that encapsulates the fpsimd_save_state()
> call if that makes things clearer.
I think that'd definitely help.
> > I would
> > also expect that anything that takes ownership of the registers for use
> > in a preemptable context where we might need to save would be using
> > bind_state_to_cpu().
> Why? This is part of the lazy restore logic that we have no need for.
It's mainly a code smell thing with making sure that things can be
followed and nobody forgets that something needs updates in future -
it's not that there's any actual problem now but rather about trying to
reduce the chances of surprises for anyone looking at the code in the
future.
> > Adding new, independent paths for managing the
> > live register state feels like it's making things harder to reason
> > about.
> > To me what this change does is add another potential owner of the live
> > FP state, I wouldn't expect the fact that this user is for kernel mode
> > to change how we keep track of who owns that state.
> The difference is that kernel mode FPSIMD state will always be
> reloaded eagerly when switching to the task. None of the load/save
> logic that reasons about who owns what, whether the current state is
> the most recent one or whether it is SVE or SME is needed here.
It's about reducing the number of cases that the rest of the code needs
to think about rather than about the patch not working.
> > I do suspect we might see some useful crypto stuff for them at some
> > point as the architecture and implementations advance, but I very much
> > agree that it's premature to actually do anything since we've not yet
> > seen any concrete examples.
> Exactly. And if there are any data points you can share, I'm happy to
> look into this from the crypto side.
This is very much forward looking stuff rather than concrete reality,
there's nothing right now that I'm aware of (though I'd be delighted to
discover I'm wrong!).
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
[-- Attachment #2: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-11-22 15:46 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-21 13:52 [PATCH 0/3] arm64: Run kernel mode NEON with preemption enabled Ard Biesheuvel
2023-11-21 13:52 ` [PATCH 1/3] arm64: fpsimd: Drop unneeded 'busy' flag Ard Biesheuvel
2023-11-21 16:58 ` Mark Brown
2023-11-21 13:52 ` [PATCH 2/3] arm64: fpsimd: preserve/restore kernel mode NEON at context switch Ard Biesheuvel
2023-11-21 16:56 ` Mark Brown
2023-11-21 17:45 ` Ard Biesheuvel
2023-11-21 20:24 ` Mark Brown
2023-11-22 9:30 ` Ard Biesheuvel
2023-11-22 15:45 ` Mark Brown
2023-11-21 13:52 ` [PATCH 3/3] arm64: crypto: remove conditional yield logic Ard Biesheuvel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).