* [PATCH v2 0/4] kernel mode NEON optimizations
@ 2014-02-05 17:13 Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation Ard Biesheuvel
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-05 17:13 UTC (permalink / raw)
To: linux-arm-kernel
The main purpose of this series is to allow Crypto Extensions instructions to be
used in interrupt context. For instance, the WPA2 CCMP handling (i.e., AES-128
in CCM mode) in the mac80211 layer runs entirely in softirq context.
Patch #1 does some preparatory work so all code external to kernel/fpsimd.c uses
accessor functions to manipulate both the on-CPU and preserved FPSIMD states
rather than poking them directly.
Patch #2 implements an optimization that reduces the number of times the
userland FPSIMD state is needlessly preserved and restored.
Patch #3 modifies kernel_neon_begin() and kernel_neon_end() so they may be
called from interrupt context.
Patch #4 contains the actual AES implementation, updated to reflect some recent
feedback I received when I posted it separately from this series.
Changes since previous version:
- added preparatory patch #1
- based on feedback from Will Deacon, make sure ptrace() and return from signal
handler cases are covered.
Mildly but not yet thoroughly tested.
All feedback highly appreciated.
Ard Biesheuvel (4):
arm64: add abstractions for FPSIMD state manipulation
arm64: defer reloading a task's FPSIMD state to userland resume
arm64: add support for kernel mode NEON in interrupt context
arm64: add Crypto Extensions based synchronous core AES cipher
arch/arm64/Makefile | 1 +
arch/arm64/crypto/Makefile | 13 +++
arch/arm64/crypto/aes-ce-cipher.c | 134 +++++++++++++++++++++++++++++
arch/arm64/include/asm/fpsimd.h | 30 ++++++-
arch/arm64/include/asm/fpsimdmacros.h | 37 ++++++++
arch/arm64/include/asm/neon.h | 6 +-
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry-fpsimd.S | 24 ++++++
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 156 +++++++++++++++++++++++++++++-----
arch/arm64/kernel/process.c | 2 +-
arch/arm64/kernel/ptrace.c | 22 +++--
arch/arm64/kernel/signal.c | 10 ++-
arch/arm64/kernel/signal32.c | 6 +-
crypto/Kconfig | 6 ++
15 files changed, 411 insertions(+), 42 deletions(-)
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aes-ce-cipher.c
--
1.8.3.2
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation
2014-02-05 17:13 [PATCH v2 0/4] kernel mode NEON optimizations Ard Biesheuvel
@ 2014-02-05 17:13 ` Ard Biesheuvel
2014-02-21 14:25 ` Catalin Marinas
2014-02-05 17:13 ` [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-05 17:13 UTC (permalink / raw)
To: linux-arm-kernel
This is a preparatory patch that replaces code that saves or restores the
on-CPU or preserved FPSIMD state directly with wrapper functions, resulting
in all direct manipulation of the FPSIMD state to be concentrated in fpsimd.c
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 9 ++++++---
arch/arm64/kernel/fpsimd.c | 31 +++++++++++++++++++++++++++++++
arch/arm64/kernel/process.c | 2 +-
arch/arm64/kernel/ptrace.c | 22 +++++++++++++---------
arch/arm64/kernel/signal.c | 6 +++---
arch/arm64/kernel/signal32.c | 6 +++---
6 files changed, 57 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index c43b4ac13008..7807974b49ee 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -52,12 +52,15 @@ struct fpsimd_state {
struct task_struct;
-extern void fpsimd_save_state(struct fpsimd_state *state);
-extern void fpsimd_load_state(struct fpsimd_state *state);
-
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
+struct fpsimd_state *fpsimd_get_task_state(void);
+void fpsimd_set_task_state(struct fpsimd_state *state);
+
+struct user_fpsimd_state *fpsimd_get_user_state(struct task_struct *t);
+void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st);
+
#endif
#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 4aef42a04bdc..eeb003f54ad0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -34,6 +34,10 @@
#define FPEXC_IXF (1 << 4)
#define FPEXC_IDF (1 << 7)
+/* defined in entry-fpsimd.S but only used in this unit */
+void fpsimd_save_state(struct fpsimd_state *state);
+void fpsimd_load_state(struct fpsimd_state *state);
+
/*
* Trapped FP/ASIMD access.
*/
@@ -87,6 +91,33 @@ void fpsimd_flush_thread(void)
preempt_enable();
}
+/*
+ * Sync the saved FPSIMD context with the FPSIMD register file
+ */
+struct fpsimd_state *fpsimd_get_task_state(void)
+{
+ fpsimd_save_state(¤t->thread.fpsimd_state);
+ return ¤t->thread.fpsimd_state;
+}
+
+/*
+ * Load a new FPSIMD state into the FPSIMD register file.
+ */
+void fpsimd_set_task_state(struct fpsimd_state *state)
+{
+ fpsimd_load_state(state);
+}
+
+struct user_fpsimd_state *fpsimd_get_user_state(struct task_struct *t)
+{
+ return &t->thread.fpsimd_state.user_fpsimd;
+}
+
+void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st)
+{
+ t->thread.fpsimd_state.user_fpsimd = *st;
+}
+
#ifdef CONFIG_KERNEL_MODE_NEON
/*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 1c0a9be2ffa8..bfa8214f92d1 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -199,7 +199,7 @@ void release_thread(struct task_struct *dead_task)
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
- fpsimd_save_state(¤t->thread.fpsimd_state);
+ fpsimd_get_task_state();
*dst = *src;
return 0;
}
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 6a8928bba03c..d0b35af14539 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -500,8 +500,7 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- struct user_fpsimd_state *uregs;
- uregs = &target->thread.fpsimd_state.user_fpsimd;
+ struct user_fpsimd_state *uregs = fpsimd_get_user_state(target);
return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1);
}
@@ -516,7 +515,7 @@ static int fpr_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;
- target->thread.fpsimd_state.user_fpsimd = newstate;
+ fpsimd_set_user_state(target, &newstate);
return ret;
}
@@ -723,7 +722,7 @@ static int compat_vfp_get(struct task_struct *target,
compat_ulong_t fpscr;
int ret;
- uregs = &target->thread.fpsimd_state.user_fpsimd;
+ uregs = fpsimd_get_user_state(target);
/*
* The VFP registers are packed into the fpsimd_state, so they all sit
@@ -746,24 +745,29 @@ static int compat_vfp_set(struct task_struct *target,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct user_fpsimd_state *uregs;
+ struct user_fpsimd_state newstate;
compat_ulong_t fpscr;
int ret;
if (pos + count > VFP_STATE_SIZE)
return -EIO;
- uregs = &target->thread.fpsimd_state.user_fpsimd;
+ /*
+ * We will not overwrite the entire FPSIMD state, so we need to
+ * initialize 'newstate' with sane values.
+ */
+ newstate = *fpsimd_get_user_state(target);
- ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, uregs, 0,
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0,
VFP_STATE_SIZE - sizeof(compat_ulong_t));
if (count && !ret) {
ret = get_user(fpscr, (compat_ulong_t *)ubuf);
- uregs->fpsr = fpscr & VFP_FPSCR_STAT_MASK;
- uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
+ newstate.fpsr = fpscr & VFP_FPSCR_STAT_MASK;
+ newstate.fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
}
+ fpsimd_set_user_state(target, &newstate);
return ret;
}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591f75dd..54e1092c5b4c 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -47,11 +47,11 @@ struct rt_sigframe {
static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
{
- struct fpsimd_state *fpsimd = ¤t->thread.fpsimd_state;
+ struct fpsimd_state *fpsimd;
int err;
/* dump the hardware registers to the fpsimd_state structure */
- fpsimd_save_state(fpsimd);
+ fpsimd = fpsimd_get_task_state();
/* copy the FP and status/control registers */
err = __copy_to_user(ctx->vregs, fpsimd->vregs, sizeof(fpsimd->vregs));
@@ -88,7 +88,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
/* load the hardware registers from the fpsimd_state structure */
if (!err) {
preempt_disable();
- fpsimd_load_state(&fpsimd);
+ fpsimd_set_task_state(&fpsimd);
preempt_enable();
}
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index b3fc9f5ec6d3..88e4535c3a45 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -208,7 +208,7 @@ int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
*/
static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
{
- struct fpsimd_state *fpsimd = ¤t->thread.fpsimd_state;
+ struct fpsimd_state *fpsimd;
compat_ulong_t magic = VFP_MAGIC;
compat_ulong_t size = VFP_STORAGE_SIZE;
compat_ulong_t fpscr, fpexc;
@@ -219,7 +219,7 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
* Note that this also saves V16-31, which aren't visible
* in AArch32.
*/
- fpsimd_save_state(fpsimd);
+ fpsimd = fpsimd_get_task_state();
/* Place structure header on the stack */
__put_user_error(magic, &frame->magic, err);
@@ -284,7 +284,7 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
*/
if (!err) {
preempt_disable();
- fpsimd_load_state(&fpsimd);
+ fpsimd_set_task_state(&fpsimd);
preempt_enable();
}
--
1.8.3.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume
2014-02-05 17:13 [PATCH v2 0/4] kernel mode NEON optimizations Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation Ard Biesheuvel
@ 2014-02-05 17:13 ` Ard Biesheuvel
2014-02-21 17:48 ` Catalin Marinas
2014-02-05 17:13 ` [PATCH v2 3/4] arm64: add support for kernel mode NEON in interrupt context Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 4/4] arm64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
3 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-05 17:13 UTC (permalink / raw)
To: linux-arm-kernel
If a task gets scheduled out and back in again and nothing has touched
its FPSIMD state in the mean time, there is really no reason to reload
it from memory. Similarly, repeated calls to kernel_neon_begin() and
kernel_neon_end() will preserve and restore the FPSIMD state every time.
This patch defers the FPSIMD state restore to the last possible moment,
i.e., right before the task re-enters userland. If a task does not enter
userland at all (for any reason), the existing FPSIMD state is preserved
and may be reused by the owning task if it gets scheduled in again on the
same CPU.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 4 ++
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 102 +++++++++++++++++++++++++++++------
arch/arm64/kernel/signal.c | 4 ++
5 files changed, 98 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 7807974b49ee..f7e70f3f1eb7 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -37,6 +37,8 @@ struct fpsimd_state {
u32 fpcr;
};
};
+ /* the id of the last cpu to have restored this state */
+ unsigned int last_cpu;
};
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
@@ -61,6 +63,8 @@ void fpsimd_set_task_state(struct fpsimd_state *state);
struct user_fpsimd_state *fpsimd_get_user_state(struct task_struct *t);
void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st);
+void fpsimd_reload_fpstate(void);
+
#endif
#endif
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b66ffd..4a1ca1cfb2f8 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SIGPENDING 0
#define TIF_NEED_RESCHED 1
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
+#define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */
#define TIF_SYSCALL_TRACE 8
#define TIF_POLLING_NRFLAG 16
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
@@ -112,10 +113,11 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+#define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE)
#define _TIF_32BIT (1 << TIF_32BIT)
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
- _TIF_NOTIFY_RESUME)
+ _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
#endif /* __KERNEL__ */
#endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 39ac630d83de..80464e2fb1a5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -576,7 +576,7 @@ fast_work_pending:
str x0, [sp, #S_X0] // returned x0
work_pending:
tbnz x1, #TIF_NEED_RESCHED, work_resched
- /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+ /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
ldr x2, [sp, #S_PSTATE]
mov x0, sp // 'regs'
tst x2, #PSR_MODE_MASK // user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index eeb003f54ad0..239c8162473f 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -39,6 +39,23 @@ void fpsimd_save_state(struct fpsimd_state *state);
void fpsimd_load_state(struct fpsimd_state *state);
/*
+ * In order to reduce the number of times the fpsimd state is needlessly saved
+ * and restored, keep track here of which task's userland owns the current state
+ * of the FPSIMD register file.
+ *
+ * This percpu variable points to the fpsimd_state.last_cpu field of the task
+ * whose FPSIMD state was most recently loaded onto this cpu. The last_cpu field
+ * itself contains the id of the cpu onto which the task's FPSIMD state was
+ * loaded most recently. So, to decide whether we can skip reloading the FPSIMD
+ * state, we need to check
+ * (a) whether this task was the last one to have its FPSIMD state loaded onto
+ * this cpu
+ * (b) whether this task may have manipulated its FPSIMD state on another cpu in
+ * the meantime
+ */
+static DEFINE_PER_CPU(unsigned int *, fpsimd_last_task);
+
+/*
* Trapped FP/ASIMD access.
*/
void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
@@ -76,36 +93,84 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
void fpsimd_thread_switch(struct task_struct *next)
{
- /* check if not kernel threads */
- if (current->mm)
+ /*
+ * The thread flag TIF_FOREIGN_FPSTATE conveys that the userland FPSIMD
+ * state belonging to the current task is not present in the registers
+ * but has (already) been saved to memory in order for the kernel to be
+ * able to go off and use the registers for something else. Therefore,
+ * we must not (re)save the register contents if this flag is set.
+ */
+ if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(¤t->thread.fpsimd_state);
- if (next->mm)
- fpsimd_load_state(&next->thread.fpsimd_state);
+
+ if (next->mm) {
+ /*
+ * If we are switching to a task whose most recent userland
+ * FPSIMD contents are already in the registers of *this* cpu,
+ * we can skip loading the state from memory. Otherwise, set
+ * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
+ * upon the next entry of userland.
+ */
+ struct fpsimd_state *st = &next->thread.fpsimd_state;
+
+ if (__get_cpu_var(fpsimd_last_task) == &st->last_cpu
+ && st->last_cpu == smp_processor_id())
+ clear_ti_thread_flag(task_thread_info(next),
+ TIF_FOREIGN_FPSTATE);
+ else
+ set_ti_thread_flag(task_thread_info(next),
+ TIF_FOREIGN_FPSTATE);
+ }
}
void fpsimd_flush_thread(void)
{
- preempt_disable();
memset(¤t->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
- fpsimd_load_state(¤t->thread.fpsimd_state);
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
+}
+
+/*
+ * Sync the FPSIMD register file with the saved FPSIMD context (if necessary)
+ */
+void fpsimd_reload_fpstate(void)
+{
+ preempt_disable();
+ if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ struct fpsimd_state *st = ¤t->thread.fpsimd_state;
+
+ fpsimd_load_state(st);
+ __get_cpu_var(fpsimd_last_task) = &st->last_cpu;
+ st->last_cpu = smp_processor_id();
+ }
preempt_enable();
}
/*
- * Sync the saved FPSIMD context with the FPSIMD register file
+ * Sync the saved FPSIMD context with the FPSIMD register file (if necessary)
*/
struct fpsimd_state *fpsimd_get_task_state(void)
{
- fpsimd_save_state(¤t->thread.fpsimd_state);
+ preempt_disable();
+ if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
+ fpsimd_save_state(¤t->thread.fpsimd_state);
+ preempt_enable();
return ¤t->thread.fpsimd_state;
}
/*
- * Load a new FPSIMD state into the FPSIMD register file.
+ * Load a new FPSIMD state into the FPSIMD register file, and clear the
+ * TIF_FOREIGN_FPSTATE flag to convey that the register content is now
+ * owned by 'current'. To be called with preemption disabled.
*/
void fpsimd_set_task_state(struct fpsimd_state *state)
{
fpsimd_load_state(state);
+ if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ struct fpsimd_state *st = ¤t->thread.fpsimd_state;
+
+ __get_cpu_var(fpsimd_last_task) = &st->last_cpu;
+ st->last_cpu = smp_processor_id();
+ }
}
struct user_fpsimd_state *fpsimd_get_user_state(struct task_struct *t)
@@ -116,6 +181,9 @@ struct user_fpsimd_state *fpsimd_get_user_state(struct task_struct *t)
void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st)
{
t->thread.fpsimd_state.user_fpsimd = *st;
+
+ /* invalidate potential live copies of this FPSIMD state */
+ t->thread.fpsimd_state.last_cpu = NR_CPUS;
}
#ifdef CONFIG_KERNEL_MODE_NEON
@@ -129,16 +197,19 @@ void kernel_neon_begin(void)
BUG_ON(in_interrupt());
preempt_disable();
- if (current->mm)
+ /*
+ * Save the userland FPSIMD state if we have one and if we haven't done
+ * so already. Clear fpsimd_last_task to indicate that there is no
+ * longer userland context in the registers.
+ */
+ if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(¤t->thread.fpsimd_state);
+ __get_cpu_var(fpsimd_last_task) = NULL;
}
EXPORT_SYMBOL(kernel_neon_begin);
void kernel_neon_end(void)
{
- if (current->mm)
- fpsimd_load_state(¤t->thread.fpsimd_state);
-
preempt_enable();
}
EXPORT_SYMBOL(kernel_neon_end);
@@ -151,12 +222,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
{
switch (cmd) {
case CPU_PM_ENTER:
- if (current->mm)
+ if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(¤t->thread.fpsimd_state);
break;
case CPU_PM_EXIT:
- if (current->mm)
- fpsimd_load_state(¤t->thread.fpsimd_state);
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
break;
case CPU_PM_ENTER_FAILED:
default:
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 54e1092c5b4c..68d2957e5ebe 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -416,4 +416,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+ if (thread_flags & _TIF_FOREIGN_FPSTATE)
+ fpsimd_reload_fpstate();
+
}
--
1.8.3.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 3/4] arm64: add support for kernel mode NEON in interrupt context
2014-02-05 17:13 [PATCH v2 0/4] kernel mode NEON optimizations Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
@ 2014-02-05 17:13 ` Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 4/4] arm64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
3 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-05 17:13 UTC (permalink / raw)
To: linux-arm-kernel
This patch modifies kernel_neon_begin() and kernel_neon_end(), so
they may be called from any context. To address the case where only
a couple of registers are needed, kernel_neon_begin_partial(u32) is
introduced which takes as a parameter the number of bottom 'n' NEON
q-registers required. To mark the end of such a partial section, the
regular kernel_neon_end() should be used.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 17 ++++++++++++++
arch/arm64/include/asm/fpsimdmacros.h | 37 ++++++++++++++++++++++++++++++
arch/arm64/include/asm/neon.h | 6 ++++-
arch/arm64/kernel/entry-fpsimd.S | 24 +++++++++++++++++++
arch/arm64/kernel/fpsimd.c | 43 +++++++++++++++++++++++------------
5 files changed, 112 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f7e70f3f1eb7..8c0f8d332528 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,6 +41,19 @@ struct fpsimd_state {
unsigned int last_cpu;
};
+/*
+ * Struct for stacking the bottom 'n' FP/SIMD registers.
+ * Mainly intended for kernel use of v8 Crypto Extensions which only
+ * needs a few registers and may need to execute in atomic context.
+ */
+struct fpsimd_partial_state {
+ u32 num_regs;
+ u32 fpsr;
+ u32 fpcr;
+ __uint128_t vregs[32] __aligned(16);
+} __aligned(16);
+
+
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/* Masks for extracting the FPSR and FPCR from the FPSCR */
#define VFP_FPSCR_STAT_MASK 0xf800009f
@@ -65,6 +78,10 @@ void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st);
void fpsimd_reload_fpstate(void);
+extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
+ u32 num_regs);
+extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
+
#endif
#endif
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bbec599c96bd..42990a82c671 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -62,3 +62,40 @@
ldr w\tmpnr, [\state, #16 * 2 + 4]
msr fpcr, x\tmpnr
.endm
+
+.altmacro
+.macro q2op, op, q1, q2, state
+ \op q\q1, q\q2, [\state, # -16 * \q1 - 16]
+.endm
+
+.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
+ mrs x\tmpnr1, fpsr
+ str w\numnr, [\state]
+ mrs x\tmpnr2, fpcr
+ stp w\tmpnr1, w\tmpnr2, [\state, #4]
+ adr x\tmpnr1, 0f
+ add \state, \state, x\numnr, lsl #4
+ sub x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
+ br x\tmpnr1
+ .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+ qb = \qa + 1
+ q2op stp, \qa, %qb, \state
+ .endr
+0:
+.endm
+
+.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
+ ldp w\tmpnr1, w\tmpnr2, [\state, #4]
+ msr fpsr, x\tmpnr1
+ msr fpcr, x\tmpnr2
+ adr x\tmpnr1, 0f
+ ldr w\tmpnr2, [\state]
+ add \state, \state, x\tmpnr2, lsl #4
+ sub x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
+ br x\tmpnr1
+ .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+ qb = \qa + 1
+ q2op ldp, \qa, %qb, \state
+ .endr
+0:
+.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index b0cc58a97780..13ce4cc18e26 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,7 +8,11 @@
* published by the Free Software Foundation.
*/
+#include <linux/types.h>
+
#define cpu_has_neon() (1)
-void kernel_neon_begin(void);
+#define kernel_neon_begin() kernel_neon_begin_partial(32)
+
+void kernel_neon_begin_partial(u32 num_regs);
void kernel_neon_end(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 6a27cd6dbfa6..d358ccacfc00 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,3 +41,27 @@ ENTRY(fpsimd_load_state)
fpsimd_restore x0, 8
ret
ENDPROC(fpsimd_load_state)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+/*
+ * Save the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_save_partial_state)
+ fpsimd_save_partial x0, 1, 8, 9
+ ret
+ENDPROC(fpsimd_load_partial_state)
+
+/*
+ * Load the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_load_partial_state)
+ fpsimd_restore_partial x0, 8, 9
+ ret
+ENDPROC(fpsimd_load_partial_state)
+
+#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 239c8162473f..2ed92976ac16 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -188,29 +188,44 @@ void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st)
#ifdef CONFIG_KERNEL_MODE_NEON
+static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
+static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+
/*
* Kernel-side NEON support functions
*/
-void kernel_neon_begin(void)
+void kernel_neon_begin_partial(u32 num_regs)
{
- /* Avoid using the NEON in interrupt context */
- BUG_ON(in_interrupt());
- preempt_disable();
+ if (in_interrupt()) {
+ struct fpsimd_partial_state *s = this_cpu_ptr(
+ in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
- /*
- * Save the userland FPSIMD state if we have one and if we haven't done
- * so already. Clear fpsimd_last_task to indicate that there is no
- * longer userland context in the registers.
- */
- if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
- fpsimd_save_state(¤t->thread.fpsimd_state);
- __get_cpu_var(fpsimd_last_task) = NULL;
+ BUG_ON(num_regs > 32);
+ fpsimd_save_partial_state(s, roundup(num_regs, 2));
+ } else {
+ /*
+ * Save the userland FPSIMD state if we have one and if we
+ * haven't done so already. Clear fpsimd_last_task to indicate
+ * that there is no longer userland context in the registers.
+ */
+ preempt_disable();
+ if (current->mm &&
+ !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+ fpsimd_save_state(¤t->thread.fpsimd_state);
+ __get_cpu_var(fpsimd_last_task) = NULL;
+ }
}
-EXPORT_SYMBOL(kernel_neon_begin);
+EXPORT_SYMBOL(kernel_neon_begin_partial);
void kernel_neon_end(void)
{
- preempt_enable();
+ if (in_interrupt()) {
+ struct fpsimd_partial_state *s = this_cpu_ptr(
+ in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+ fpsimd_load_partial_state(s);
+ } else {
+ preempt_enable();
+ }
}
EXPORT_SYMBOL(kernel_neon_end);
--
1.8.3.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 4/4] arm64: add Crypto Extensions based synchronous core AES cipher
2014-02-05 17:13 [PATCH v2 0/4] kernel mode NEON optimizations Ard Biesheuvel
` (2 preceding siblings ...)
2014-02-05 17:13 ` [PATCH v2 3/4] arm64: add support for kernel mode NEON in interrupt context Ard Biesheuvel
@ 2014-02-05 17:13 ` Ard Biesheuvel
3 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-05 17:13 UTC (permalink / raw)
To: linux-arm-kernel
This implements the core AES cipher using the Crypto Extensions,
using only NEON registers q0 - q3.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/Makefile | 1 +
arch/arm64/crypto/Makefile | 13 ++++
arch/arm64/crypto/aes-ce-cipher.c | 134 ++++++++++++++++++++++++++++++++++++++
crypto/Kconfig | 6 ++
4 files changed, 154 insertions(+)
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aes-ce-cipher.c
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2fceb71ac3b7..8185a913c5ed 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -45,6 +45,7 @@ export TEXT_OFFSET GZFLAGS
core-y += arch/arm64/kernel/ arch/arm64/mm/
core-$(CONFIG_KVM) += arch/arm64/kvm/
core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
libs-y := arch/arm64/lib/ $(libs-y)
libs-y += $(LIBGCC)
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 000000000000..ac58945c50b3
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,13 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
+
+CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
new file mode 100644
index 000000000000..6dd64106923f
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -0,0 +1,134 @@
+/*
+ * linux/arch/arm64/crypto/aes-ce-cipher.c
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+
+MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
+
+static int num_rounds(struct crypto_aes_ctx *ctx)
+{
+ /*
+ * # of rounds specified by AES:
+ * 128 bit key 10 rounds
+ * 192 bit key 12 rounds
+ * 256 bit key 14 rounds
+ * => n byte key => 6 + (n/4) rounds
+ */
+ return 6 + ctx->key_length / 4;
+}
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ kernel_neon_begin_partial(4);
+
+ __asm__(" ld1 {v0.16b}, [%[in]] ;"
+ " cmp %[rounds], #10 ;"
+ " bmi 0f ;"
+ " bne 3f ;"
+ " ld1 {v3.2d}, [%[key]], #16 ;"
+ " b 2f ;"
+ "0: ld1 {v2.2d-v3.2d}, [%[key]], #32 ;"
+ "1: aese v0.16b, v2.16b ;"
+ " aesmc v0.16b, v0.16b ;"
+ "2: aese v0.16b, v3.16b ;"
+ " aesmc v0.16b, v0.16b ;"
+ "3: ld1 {v1.2d-v3.2d}, [%[key]], #48 ;"
+ " subs %[rounds], %[rounds], #3 ;"
+ " aese v0.16b, v1.16b ;"
+ " aesmc v0.16b, v0.16b ;"
+ " bpl 1b ;"
+ " aese v0.16b, v2.16b ;"
+ " eor v0.16b, v0.16b, v3.16b ;"
+ " st1 {v0.16b}, [%[out]] ;"
+ : :
+ [out] "r"(dst),
+ [in] "r"(src),
+ [rounds] "r"(num_rounds(ctx) - 2),
+ [key] "r"(ctx->key_enc)
+ : "cc", "memory");
+
+ kernel_neon_end();
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ kernel_neon_begin_partial(4);
+
+ __asm__(" ld1 {v0.16b}, [%[in]] ;"
+ " cmp %[rounds], #10 ;"
+ " bmi 0f ;"
+ " bne 3f ;"
+ " ld1 {v3.2d}, [%[key]], #16 ;"
+ " b 2f ;"
+ "0: ld1 {v2.2d-v3.2d}, [%[key]], #32 ;"
+ "1: aesd v0.16b, v2.16b ;"
+ " aesimc v0.16b, v0.16b ;"
+ "2: aesd v0.16b, v3.16b ;"
+ " aesimc v0.16b, v0.16b ;"
+ "3: ld1 {v1.2d-v3.2d}, [%[key]], #48 ;"
+ " subs %[rounds], %[rounds], #3 ;"
+ " aesd v0.16b, v1.16b ;"
+ " aesimc v0.16b, v0.16b ;"
+ " bpl 1b ;"
+ " aesd v0.16b, v2.16b ;"
+ " eor v0.16b, v0.16b, v3.16b ;"
+ " st1 {v0.16b}, [%[out]] ;"
+ : :
+ [out] "r"(dst),
+ [in] "r"(src),
+ [rounds] "r"(num_rounds(ctx) - 2),
+ [key] "r"(ctx->key_dec)
+ : "cc", "memory");
+
+ kernel_neon_end();
+}
+
+static struct crypto_alg aes_alg = {
+ .cra_name = "aes",
+ .cra_driver_name = "aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = crypto_aes_set_key,
+ .cia_encrypt = aes_cipher_encrypt,
+ .cia_decrypt = aes_cipher_decrypt
+ }
+};
+
+static int __init aes_mod_init(void)
+{
+ if (!(elf_hwcap & HWCAP_AES))
+ return -ENODEV;
+ return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+ crypto_unregister_alg(&aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 7bcb70d216e1..f1d98bc346b6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -791,6 +791,12 @@ config CRYPTO_AES_ARM_BS
This implementation does not rely on any lookup tables so it is
believed to be invulnerable to cache timing attacks.
+config CRYPTO_AES_ARM64_CE
+ tristate "Synchronous AES cipher using ARMv8 Crypto Extensions"
+ depends on ARM64 && KERNEL_MODE_NEON
+ select CRYPTO_ALGAPI
+ select CRYPTO_AES
+
config CRYPTO_ANUBIS
tristate "Anubis cipher algorithm"
select CRYPTO_ALGAPI
--
1.8.3.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation
2014-02-05 17:13 ` [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation Ard Biesheuvel
@ 2014-02-21 14:25 ` Catalin Marinas
2014-02-21 14:52 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2014-02-21 14:25 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Feb 05, 2014 at 05:13:35PM +0000, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 4aef42a04bdc..eeb003f54ad0 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -34,6 +34,10 @@
> #define FPEXC_IXF (1 << 4)
> #define FPEXC_IDF (1 << 7)
>
> +/* defined in entry-fpsimd.S but only used in this unit */
> +void fpsimd_save_state(struct fpsimd_state *state);
> +void fpsimd_load_state(struct fpsimd_state *state);
This functions still make sense for some situations where
fpsimd_get_task_state() is confusing.
> +
> /*
> * Trapped FP/ASIMD access.
> */
> @@ -87,6 +91,33 @@ void fpsimd_flush_thread(void)
> preempt_enable();
> }
>
> +/*
> + * Sync the saved FPSIMD context with the FPSIMD register file
> + */
> +struct fpsimd_state *fpsimd_get_task_state(void)
> +{
> + fpsimd_save_state(¤t->thread.fpsimd_state);
> + return ¤t->thread.fpsimd_state;
> +}
> +
> +/*
> + * Load a new FPSIMD state into the FPSIMD register file.
> + */
> +void fpsimd_set_task_state(struct fpsimd_state *state)
> +{
> + fpsimd_load_state(state);
> +}
Maybe s/task/current/. But in general I see a "get" function as not
having side-effects while here it populates the fpsimd_state. I don't
think the code gets clearer.
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 1c0a9be2ffa8..bfa8214f92d1 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -199,7 +199,7 @@ void release_thread(struct task_struct *dead_task)
>
> int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> {
> - fpsimd_save_state(¤t->thread.fpsimd_state);
> + fpsimd_get_task_state();
That's where get vs save looks strange to me.
> @@ -746,24 +745,29 @@ static int compat_vfp_set(struct task_struct *target,
> unsigned int pos, unsigned int count,
> const void *kbuf, const void __user *ubuf)
> {
> - struct user_fpsimd_state *uregs;
> + struct user_fpsimd_state newstate;
> compat_ulong_t fpscr;
> int ret;
>
> if (pos + count > VFP_STATE_SIZE)
> return -EIO;
>
> - uregs = &target->thread.fpsimd_state.user_fpsimd;
> + /*
> + * We will not overwrite the entire FPSIMD state, so we need to
> + * initialize 'newstate' with sane values.
> + */
> + newstate = *fpsimd_get_user_state(target);
>
> - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, uregs, 0,
> + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0,
> VFP_STATE_SIZE - sizeof(compat_ulong_t));
>
> if (count && !ret) {
> ret = get_user(fpscr, (compat_ulong_t *)ubuf);
> - uregs->fpsr = fpscr & VFP_FPSCR_STAT_MASK;
> - uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
> + newstate.fpsr = fpscr & VFP_FPSCR_STAT_MASK;
> + newstate.fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
> }
>
> + fpsimd_set_user_state(target, &newstate);
> return ret;
> }
What's the reason for this change? The patch is aimed at adding
abstractions but it does some more. Now you save half KB on the stack as
user_fpsimd_state. What if at some point we grow this structure?
--
Catalin
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation
2014-02-21 14:25 ` Catalin Marinas
@ 2014-02-21 14:52 ` Ard Biesheuvel
0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-21 14:52 UTC (permalink / raw)
To: linux-arm-kernel
On 21 February 2014 15:25, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Feb 05, 2014 at 05:13:35PM +0000, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> index 4aef42a04bdc..eeb003f54ad0 100644
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -34,6 +34,10 @@
>> #define FPEXC_IXF (1 << 4)
>> #define FPEXC_IDF (1 << 7)
>>
>> +/* defined in entry-fpsimd.S but only used in this unit */
>> +void fpsimd_save_state(struct fpsimd_state *state);
>> +void fpsimd_load_state(struct fpsimd_state *state);
>
> This functions still make sense for some situations where
> fpsimd_get_task_state() is confusing.
>
This is an attempt to distinguish between, on the one hand, functions
that operate on the contents of the register, regardless of who owns
the context they belong to, and on the other hand, having a level of
abstraction where 'store my task's FPSIMD context to memory' does not
need to handle the situation where 'my task's FPSIMD state' is
somewhere else than in the registers.
So after the next patch, it is up to the caller of
fpsimd_{load|save}_state to understand that what he loads/saves
depends on TIF_FOREIGN_FPSTATE etc.
>> +
>> /*
>> * Trapped FP/ASIMD access.
>> */
>> @@ -87,6 +91,33 @@ void fpsimd_flush_thread(void)
>> preempt_enable();
>> }
>>
>> +/*
>> + * Sync the saved FPSIMD context with the FPSIMD register file
>> + */
>> +struct fpsimd_state *fpsimd_get_task_state(void)
>> +{
>> + fpsimd_save_state(¤t->thread.fpsimd_state);
>> + return ¤t->thread.fpsimd_state;
>> +}
>> +
>> +/*
>> + * Load a new FPSIMD state into the FPSIMD register file.
>> + */
>> +void fpsimd_set_task_state(struct fpsimd_state *state)
>> +{
>> + fpsimd_load_state(state);
>> +}
>
> Maybe s/task/current/. But in general I see a "get" function as not
> having side-effects while here it populates the fpsimd_state. I don't
> think the code gets clearer.
>
I am happy to improve the names, but I think this is a meaningful
abstraction to have, i.e., get a pointer to the task's saved FPSIMD
state and make sure the saved state is up to date.
In the next patch, the call to fpsimd_save_state() becomes
conditional, because the registers may contain something unrelated.
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 1c0a9be2ffa8..bfa8214f92d1 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -199,7 +199,7 @@ void release_thread(struct task_struct *dead_task)
>>
>> int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>> {
>> - fpsimd_save_state(¤t->thread.fpsimd_state);
>> + fpsimd_get_task_state();
>
> That's where get vs save looks strange to me.
>
The existing function already relies on the assumption that src == current.
Perhaps it would be better to explicitly copy the fpsimd_state of src
in dst instead, but it would result in the FPSIMD state being copied
twice.
But the bottom line is that there needs to be some abstraction to
distinguish 'the FPSIMD state of current' from 'the FPSIMD in this
CPU's registers', as they will not be the same after the next patch.
>> @@ -746,24 +745,29 @@ static int compat_vfp_set(struct task_struct *target,
>> unsigned int pos, unsigned int count,
>> const void *kbuf, const void __user *ubuf)
>> {
>> - struct user_fpsimd_state *uregs;
>> + struct user_fpsimd_state newstate;
>> compat_ulong_t fpscr;
>> int ret;
>>
>> if (pos + count > VFP_STATE_SIZE)
>> return -EIO;
>>
>> - uregs = &target->thread.fpsimd_state.user_fpsimd;
>> + /*
>> + * We will not overwrite the entire FPSIMD state, so we need to
>> + * initialize 'newstate' with sane values.
>> + */
>> + newstate = *fpsimd_get_user_state(target);
>>
>> - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, uregs, 0,
>> + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0,
>> VFP_STATE_SIZE - sizeof(compat_ulong_t));
>>
>> if (count && !ret) {
>> ret = get_user(fpscr, (compat_ulong_t *)ubuf);
>> - uregs->fpsr = fpscr & VFP_FPSCR_STAT_MASK;
>> - uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
>> + newstate.fpsr = fpscr & VFP_FPSCR_STAT_MASK;
>> + newstate.fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
>> }
>>
>> + fpsimd_set_user_state(target, &newstate);
>> return ret;
>> }
>
> What's the reason for this change? The patch is aimed at adding
> abstractions but it does some more. Now you save half KB on the stack as
> user_fpsimd_state. What if at some point we grow this structure?
>
Perhaps it would be better to have a compat_vfp_state type of the
correct size and dedicated getters/setters?
But the reason is the same: in order not to clutter this code with
knowledge about the fact that current userland's FPSIMD state may not
be present in the registers all the time, I introduced an abstraction.
If you look at the next patch, set_user_state() needs to invalidate
live copies of the FPSIMD register state that may be present
elsewhere, and I think it is better to do that in only a single place,
not in all code that deals with FPSIMD state but on a less detailed
level.
Regards,
Ard.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume
2014-02-05 17:13 ` [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
@ 2014-02-21 17:48 ` Catalin Marinas
2014-02-21 18:33 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2014-02-21 17:48 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Feb 05, 2014 at 05:13:36PM +0000, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 7807974b49ee..f7e70f3f1eb7 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -37,6 +37,8 @@ struct fpsimd_state {
> u32 fpcr;
> };
> };
> + /* the id of the last cpu to have restored this state */
> + unsigned int last_cpu;
Just "cpu" is enough.
> };
>
> #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
> @@ -61,6 +63,8 @@ void fpsimd_set_task_state(struct fpsimd_state *state);
> struct user_fpsimd_state *fpsimd_get_user_state(struct task_struct *t);
> void fpsimd_set_user_state(struct task_struct *t, struct user_fpsimd_state *st);
>
> +void fpsimd_reload_fpstate(void);
> +
> #endif
>
> #endif
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 720e70b66ffd..4a1ca1cfb2f8 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
> #define TIF_SIGPENDING 0
> #define TIF_NEED_RESCHED 1
> #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
> +#define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */
> #define TIF_SYSCALL_TRACE 8
> #define TIF_POLLING_NRFLAG 16
> #define TIF_MEMDIE 18 /* is terminating due to OOM killer */
> @@ -112,10 +113,11 @@ static inline struct thread_info *current_thread_info(void)
> #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
> #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
> #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
> +#define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE)
> #define _TIF_32BIT (1 << TIF_32BIT)
>
> #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
> - _TIF_NOTIFY_RESUME)
> + _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
>
> #endif /* __KERNEL__ */
> #endif /* __ASM_THREAD_INFO_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 39ac630d83de..80464e2fb1a5 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -576,7 +576,7 @@ fast_work_pending:
> str x0, [sp, #S_X0] // returned x0
> work_pending:
> tbnz x1, #TIF_NEED_RESCHED, work_resched
> - /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
> + /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
> ldr x2, [sp, #S_PSTATE]
> mov x0, sp // 'regs'
> tst x2, #PSR_MODE_MASK // user mode regs?
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index eeb003f54ad0..239c8162473f 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -39,6 +39,23 @@ void fpsimd_save_state(struct fpsimd_state *state);
> void fpsimd_load_state(struct fpsimd_state *state);
>
> /*
> + * In order to reduce the number of times the fpsimd state is needlessly saved
> + * and restored, keep track here of which task's userland owns the current state
> + * of the FPSIMD register file.
> + *
> + * This percpu variable points to the fpsimd_state.last_cpu field of the task
> + * whose FPSIMD state was most recently loaded onto this cpu. The last_cpu field
> + * itself contains the id of the cpu onto which the task's FPSIMD state was
> + * loaded most recently. So, to decide whether we can skip reloading the FPSIMD
> + * state, we need to check
> + * (a) whether this task was the last one to have its FPSIMD state loaded onto
> + * this cpu
> + * (b) whether this task may have manipulated its FPSIMD state on another cpu in
> + * the meantime
> + */
> +static DEFINE_PER_CPU(unsigned int *, fpsimd_last_task);
This can simply be struct fpsimd_state * to avoid &st->last_cpu. Also
rename to fpsimd_state_last.
> +
> +/*
> * Trapped FP/ASIMD access.
> */
> void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
> @@ -76,36 +93,84 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>
> void fpsimd_thread_switch(struct task_struct *next)
> {
> - /* check if not kernel threads */
> - if (current->mm)
> + /*
> + * The thread flag TIF_FOREIGN_FPSTATE conveys that the userland FPSIMD
> + * state belonging to the current task is not present in the registers
> + * but has (already) been saved to memory in order for the kernel to be
> + * able to go off and use the registers for something else. Therefore,
> + * we must not (re)save the register contents if this flag is set.
> + */
> + if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
> fpsimd_save_state(¤t->thread.fpsimd_state);
> - if (next->mm)
> - fpsimd_load_state(&next->thread.fpsimd_state);
> +
> + if (next->mm) {
> + /*
> + * If we are switching to a task whose most recent userland
> + * FPSIMD contents are already in the registers of *this* cpu,
> + * we can skip loading the state from memory. Otherwise, set
> + * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
> + * upon the next entry of userland.
I would say "return" instead of "entry" (we enter the kernel and return
to user space ;)).
> + */
> + struct fpsimd_state *st = &next->thread.fpsimd_state;
> +
> + if (__get_cpu_var(fpsimd_last_task) == &st->last_cpu
> + && st->last_cpu == smp_processor_id())
> + clear_ti_thread_flag(task_thread_info(next),
> + TIF_FOREIGN_FPSTATE);
> + else
> + set_ti_thread_flag(task_thread_info(next),
> + TIF_FOREIGN_FPSTATE);
> + }
> }
I'm still trying to get my head around why we have 3 different type of
checks for this (fpsimd_last_task, last_cpu and TIF). The code seems
correct but I wonder whether we can reduce this to 2 checks?
--
Catalin
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume
2014-02-21 17:48 ` Catalin Marinas
@ 2014-02-21 18:33 ` Ard Biesheuvel
2014-02-24 10:14 ` Catalin Marinas
0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2014-02-21 18:33 UTC (permalink / raw)
To: linux-arm-kernel
On 21 February 2014 18:48, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Feb 05, 2014 at 05:13:36PM +0000, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
>> index 7807974b49ee..f7e70f3f1eb7 100644
>> --- a/arch/arm64/include/asm/fpsimd.h
>> +++ b/arch/arm64/include/asm/fpsimd.h
>> @@ -37,6 +37,8 @@ struct fpsimd_state {
>> u32 fpcr;
>> };
>> };
>> + /* the id of the last cpu to have restored this state */
>> + unsigned int last_cpu;
>
> Just "cpu" is enough.
>
OK
[...]
>> /*
>> + * In order to reduce the number of times the fpsimd state is needlessly saved
>> + * and restored, keep track here of which task's userland owns the current state
>> + * of the FPSIMD register file.
>> + *
>> + * This percpu variable points to the fpsimd_state.last_cpu field of the task
>> + * whose FPSIMD state was most recently loaded onto this cpu. The last_cpu field
>> + * itself contains the id of the cpu onto which the task's FPSIMD state was
>> + * loaded most recently. So, to decide whether we can skip reloading the FPSIMD
>> + * state, we need to check
>> + * (a) whether this task was the last one to have its FPSIMD state loaded onto
>> + * this cpu
>> + * (b) whether this task may have manipulated its FPSIMD state on another cpu in
>> + * the meantime
>> + */
>> +static DEFINE_PER_CPU(unsigned int *, fpsimd_last_task);
>
> This can simply be struct fpsimd_state * to avoid &st->last_cpu. Also
> rename to fpsimd_state_last.
>
OK
>> +
>> +/*
>> * Trapped FP/ASIMD access.
>> */
>> void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
>> @@ -76,36 +93,84 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>>
>> void fpsimd_thread_switch(struct task_struct *next)
>> {
>> - /* check if not kernel threads */
>> - if (current->mm)
>> + /*
>> + * The thread flag TIF_FOREIGN_FPSTATE conveys that the userland FPSIMD
>> + * state belonging to the current task is not present in the registers
>> + * but has (already) been saved to memory in order for the kernel to be
>> + * able to go off and use the registers for something else. Therefore,
>> + * we must not (re)save the register contents if this flag is set.
>> + */
>> + if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
>> fpsimd_save_state(¤t->thread.fpsimd_state);
>> - if (next->mm)
>> - fpsimd_load_state(&next->thread.fpsimd_state);
>> +
>> + if (next->mm) {
>> + /*
>> + * If we are switching to a task whose most recent userland
>> + * FPSIMD contents are already in the registers of *this* cpu,
>> + * we can skip loading the state from memory. Otherwise, set
>> + * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
>> + * upon the next entry of userland.
>
> I would say "return" instead of "entry" (we enter the kernel and return
> to user space ;)).
>
OK
>> + */
>> + struct fpsimd_state *st = &next->thread.fpsimd_state;
>> +
>> + if (__get_cpu_var(fpsimd_last_task) == &st->last_cpu
>> + && st->last_cpu == smp_processor_id())
>> + clear_ti_thread_flag(task_thread_info(next),
>> + TIF_FOREIGN_FPSTATE);
>> + else
>> + set_ti_thread_flag(task_thread_info(next),
>> + TIF_FOREIGN_FPSTATE);
>> + }
>> }
>
> I'm still trying to get my head around why we have 3 different type of
> checks for this (fpsimd_last_task, last_cpu and TIF). The code seems
> correct but I wonder whether we can reduce this to 2 checks?
>
Well, I suppose using the TIF flag is somewhat redundant, it is
basically a shorthand for expressing that the following does /not/
hold
__get_cpu_var(fpsimd_last_state) == ¤t->thread.fpsimd_state &&
current->thread.fpsimd_state.cpu == smp_processor_id()
I suppose that the test at resume can tolerate the overhead, so I can
rework the code to get rid of it.
Regards,
Ard.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume
2014-02-21 18:33 ` Ard Biesheuvel
@ 2014-02-24 10:14 ` Catalin Marinas
0 siblings, 0 replies; 10+ messages in thread
From: Catalin Marinas @ 2014-02-24 10:14 UTC (permalink / raw)
To: linux-arm-kernel
On 21 Feb 2014, at 18:33, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 21 February 2014 18:48, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Wed, Feb 05, 2014 at 05:13:36PM +0000, Ard Biesheuvel wrote:
>>> + */
>>> + struct fpsimd_state *st = &next->thread.fpsimd_state;
>>> +
>>> + if (__get_cpu_var(fpsimd_last_task) == &st->last_cpu
>>> + && st->last_cpu == smp_processor_id())
>>> + clear_ti_thread_flag(task_thread_info(next),
>>> + TIF_FOREIGN_FPSTATE);
>>> + else
>>> + set_ti_thread_flag(task_thread_info(next),
>>> + TIF_FOREIGN_FPSTATE);
>>> + }
>>> }
>>
>> I'm still trying to get my head around why we have 3 different type of
>> checks for this (fpsimd_last_task, last_cpu and TIF). The code seems
>> correct but I wonder whether we can reduce this to 2 checks?
>
> Well, I suppose using the TIF flag is somewhat redundant, it is
> basically a shorthand for expressing that the following does /not/
> hold
>
> __get_cpu_var(fpsimd_last_state) == ¤t->thread.fpsimd_state &&
> current->thread.fpsimd_state.cpu == smp_processor_id()
OK, it starts to make more sense now ;).
Basically, if we only cared about context switching (rather than Neon in
the kernel), we would have to always save the state of the scheduled out
task but restore it only if the current hw state is different. A way to
check this is fpsimd_last_state && cpu (I can?t really think of a
better way).
With the addition of kernel_neon_begin/end(), we want to optimise this
further by (a) only saving the state at context switch if it hasn?t
been saved already (by kernel_neon_begin) and (b) defer the restoring to
user space to avoid re-saving/restoring of the state.
Case (a) is when Neon is used between the syscall entry and switch_to()
for a given thread. Case (b) is for scenarios where Neon is used between
switch_to() and return to user. Are both of these likely? I think they are
(e.g. sending->waiting->receiving).
> I suppose that the test at resume can tolerate the overhead, so I can
> rework the code to get rid of it.
It may not be that simple since we need per-CPU variables retrieved in
assembly. So we end up with a function call plus per-CPU variable
checking and this must be done on the return from interrupt path as
well. In which case the TIF flag is quicker as an optimisation. If I
have any better idea I?ll let you know.
In the meantime, I think it?s ok to keep all three checks for
different scenarios but please add some more explanation in the fpsimd.c
file so that in a year time we still remember the logic (documenting the
scenarios and when we check which TIF flag, per-CPU variable etc.).
Thanks,
Catalin
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-02-24 10:14 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-05 17:13 [PATCH v2 0/4] kernel mode NEON optimizations Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 1/4] arm64: add abstractions for FPSIMD state manipulation Ard Biesheuvel
2014-02-21 14:25 ` Catalin Marinas
2014-02-21 14:52 ` Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 2/4] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
2014-02-21 17:48 ` Catalin Marinas
2014-02-21 18:33 ` Ard Biesheuvel
2014-02-24 10:14 ` Catalin Marinas
2014-02-05 17:13 ` [PATCH v2 3/4] arm64: add support for kernel mode NEON in interrupt context Ard Biesheuvel
2014-02-05 17:13 ` [PATCH v2 4/4] arm64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).