* [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts
From: Ard Biesheuvel @ 2013-10-13 12:14 UTC
To: linux-arm-kernel
Take #3 of this RFC series.
Instead of having additional separate versions of kernel_neon_begin/end, the
existing ones have now been modified to always take a preallocated stack area
as an argument.
The stack area is allocated by DEFINE_NEON_REGSTACK[_PARTIAL](varname), where
the partial version takes an additional int num_regs indicating how many
registers need to be freed up.
In the !in_interrupt() case, these functions operate as before; the regstack
is then defined to minimal size, as it will remain unused anyway. In the
in_interrupt() case, 'num_regs' (or all) NEON registers are stacked/unstacked
using the allocated stack region.
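For example, a typical user of the new API looks roughly like this (sketch
only; crypto_op() and its NEON inner function are made-up names):

    static void crypto_op(u8 *dst, const u8 *src)
    {
            DEFINE_NEON_REGSTACK(s);     /* room for all NEON registers */

            kernel_neon_begin(s);
            crypto_op_neon(dst, src);    /* may now run in any context */
            kernel_neon_end(s);
    }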
Patches #1 and #4 implement the above for ARM and ARM64, respectively. Patch #3
implements the optimization for ARM64 suggested by Catalin: as ARM64 has no
lazy restore, we would otherwise potentially emit lots of unnecessary
stack/unstack sequences.
The remaining patches are existing or new users of this API, for reference.
Ard Biesheuvel (7):
ARM: add support for kernel mode NEON in atomic context
ARM: port NEON version of xor_blocks() to new kmode NEON api
ARM64: defer reloading a task's FPSIMD state to userland resume
ARM64: add support for kernel mode NEON in atomic context
ARM64: add Crypto Extensions based synchronous core AES cipher
ARM64: add Crypto Extensions based synchronous AES in CCM mode
lib/raid6: port NEON implementation to updated kmode NEON api
arch/arm/include/asm/fpstate.h | 12 +
arch/arm/include/asm/neon.h | 32 ++-
arch/arm/include/asm/xor.h | 48 ++--
arch/arm/vfp/vfphw.S | 45 ++++
arch/arm/vfp/vfpmodule.c | 55 +++--
arch/arm64/Makefile | 11 +-
arch/arm64/crypto/Makefile | 14 ++
arch/arm64/crypto/aes-sync.c | 453 ++++++++++++++++++++++++++++++++++
arch/arm64/crypto/aesce-ccm.S | 186 ++++++++++++++
arch/arm64/include/asm/fpsimd.h | 17 ++
arch/arm64/include/asm/fpsimdmacros.h | 35 +++
arch/arm64/include/asm/neon.h | 31 ++-
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry-fpsimd.S | 24 ++
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 34 +--
arch/arm64/kernel/signal.c | 2 +
lib/raid6/neon.c | 9 +-
18 files changed, 932 insertions(+), 82 deletions(-)
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aes-sync.c
create mode 100644 arch/arm64/crypto/aesce-ccm.S
--
1.8.1.2
* [RFC v3 PATCH 1/7] ARM: add support for kernel mode NEON in atomic context
From: Ard Biesheuvel @ 2013-10-13 12:14 UTC
To: linux-arm-kernel
Some applications, such as WPA CCMP encryption, do substantial
amounts of work in non-process context. In order to support
accelerated NEON implementations under these circumstances, we
need a way to preserve the NEON context that may
(a) belong to a completely unrelated userland process (if the
NEON unit is currently turned off);
(b) belong to current userland;
(c) belong to current kernel mode in process context.
The best way to deal with this is to just stack whatever registers
we are going to use, and unstack them when we are done.
This patch modifies kernel_neon_begin() and kernel_neon_end(), so
they may be called from any context. To address the in_interrupt()
case, they now both take a parameter defined by DEFINE_NEON_REGSTACK()
or DEFINE_NEON_REGSTACK_PARTIAL() [in case only a few NEON registers
are in fact used]. The !in_interrupt() case is unchanged from before.
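To illustrate the sizing (hypothetical caller, names of my own choosing):
a softirq path that clobbers only q0-q2 would declare

    DEFINE_NEON_REGSTACK_PARTIAL(s, 3);

which reserves 16 * 4 = 64 bytes of stack when in_interrupt() is true (the
count is rounded up to an even number of Q registers, as they are stacked
in pairs), and 0 bytes in process context, where the regular
preserve/restore path is taken and the area remains unused.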
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm/include/asm/fpstate.h | 12 +++++++++
arch/arm/include/asm/neon.h | 32 +++++++++++++++++++++---
arch/arm/vfp/vfphw.S | 45 ++++++++++++++++++++++++++++++++++
arch/arm/vfp/vfpmodule.c | 55 ++++++++++++++++++++++++------------------
4 files changed, 118 insertions(+), 26 deletions(-)
diff --git a/arch/arm/include/asm/fpstate.h b/arch/arm/include/asm/fpstate.h
index 3ad4c10..0471c36 100644
--- a/arch/arm/include/asm/fpstate.h
+++ b/arch/arm/include/asm/fpstate.h
@@ -52,6 +52,18 @@ union vfp_state {
extern void vfp_flush_thread(union vfp_state *);
extern void vfp_release_thread(union vfp_state *);
+/*
+ * Variable sized struct for stacking the bottom 'n' NEON registers.
+ */
+struct vfp_partial_state {
+ __u32 fpexc;
+ __u32 fpscr;
+ __u8 qregs[] __aligned(16);
+} __aligned(16);
+
+extern void vfp_load_partial_state(struct vfp_partial_state *, u32 num_regs);
+extern void vfp_save_partial_state(struct vfp_partial_state *, u32 num_regs);
+
#define FP_HARD_SIZE 35
struct fp_hard_struct {
diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h
index 8f730fe..800d85c 100644
--- a/arch/arm/include/asm/neon.h
+++ b/arch/arm/include/asm/neon.h
@@ -8,10 +8,30 @@
* published by the Free Software Foundation.
*/
+#include <linux/types.h>
+#include <linux/hardirq.h>
+#include <asm/fpstate.h>
#include <asm/hwcap.h>
#define cpu_has_neon() (!!(elf_hwcap & HWCAP_NEON))
+/*
+ * Avoid wasting stack space by making the size of the allocated area depend on
+ * whether we are currently running in process context. (If this is the case, we
+ * will use the normal preserve/restore mechanism, leaving the allocated stack
+ * space unused.)
+ */
+#define __QREG_SIZE(num) \
+ ((!in_interrupt()) ? 0 : (num) > 16 ? 256 : 16 * (((num) + 1) & ~1U))
+
+#define DEFINE_NEON_REGSTACK_PARTIAL(v, num) \
+ struct { \
+ struct vfp_partial_state regs; \
+ u8 qregs[__QREG_SIZE(num)]; \
+ } v
+
+#define DEFINE_NEON_REGSTACK(name) DEFINE_NEON_REGSTACK_PARTIAL(name, 16)
+
#ifdef __ARM_NEON__
/*
@@ -27,10 +47,16 @@
* -mfpu=neon is set.
*/
-#define kernel_neon_begin() \
+#define kernel_neon_begin(p) \
BUILD_BUG_ON_MSG(1, "kernel_neon_begin() called from NEON code")
#else
-void kernel_neon_begin(void);
+#define kernel_neon_begin(p) \
+ __kernel_neon_begin(&(p).regs, sizeof((p).qregs)/16)
#endif
-void kernel_neon_end(void);
+
+#define kernel_neon_end(p) \
+ __kernel_neon_end(&(p).regs, sizeof((p).qregs)/16)
+
+void __kernel_neon_begin(struct vfp_partial_state *regs, u32 num_regs);
+void __kernel_neon_end(struct vfp_partial_state *regs, u32 num_regs);
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 3e5d311..28384a5 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -322,3 +322,48 @@ ENTRY(vfp_put_double)
.endr
#endif
ENDPROC(vfp_put_double)
+
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+ .fpu neon
+ENTRY(vfp_save_partial_state)
+ VFPFMRX r2, FPEXC @ load the control registers
+ tst r2, #FPEXC_EN
+ str r2, [r0] @ save to memory
+ bne 0f
+ orr r2, r2, #FPEXC_EN @ enable VFP if it was disabled
+ VFPFMXR FPEXC, r2
+0: VFPFMRX r3, FPSCR
+ str r3, [r0, #4] @ save to memory
+ rsbs r1, r1, #16
+ add r2, r0, #16
+ beq 1f
+ adr r3, 1f
+ add r3, r3, r1, lsl #1
+THUMB( orr r3, r3, #1)
+ bx r3
+1: .irp qq,q14-q15,q12-q13,q10-q11,q8-q9,q6-q7,q4-q5,q2-q3,q0-q1
+ vst1.8 {\qq}, [r2,:128]!
+ .endr
+ bx lr
+ENDPROC(vfp_save_partial_state)
+
+ENTRY(vfp_load_partial_state)
+ rsbs r1, r1, #16
+ add r2, r0, #16
+ beq 0f
+ adr r3, 0f
+ add r3, r3, r1, lsl #1
+THUMB( orr r3, r3, #1)
+ bx r3
+0: .irp qq,q14-q15,q12-q13,q10-q11,q8-q9,q6-q7,q4-q5,q2-q3,q0-q1
+ vld1.8 {\qq}, [r2,:128]!
+ .endr
+ ldrd r2, r3, [r0]
+ VFPFMXR FPSCR, r3
+ VFPFMXR FPEXC, r2
+ bx lr
+ENDPROC(vfp_load_partial_state)
+
+#endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 52b8f40..b924a5b 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -674,44 +674,53 @@ void vfp_kmode_exception(void)
/*
* Kernel-side NEON support functions
*/
-void kernel_neon_begin(void)
+void __kernel_neon_begin(struct vfp_partial_state *regs, u32 num_regs)
{
struct thread_info *thread = current_thread_info();
unsigned int cpu;
u32 fpexc;
/*
- * Kernel mode NEON is only allowed outside of interrupt context
- * with preemption disabled. This will make sure that the kernel
- * mode NEON register contents never need to be preserved.
+ * If running in non-process context, we just stack whatever registers
- * the caller has indicated it needs. Otherwise, do a regular preserve
+ * of the userland context.
*/
- BUG_ON(in_interrupt());
- cpu = get_cpu();
+ if (in_interrupt()) {
+ BUG_ON(!num_regs);
+ vfp_save_partial_state(regs, num_regs);
+ } else {
+ cpu = get_cpu();
- fpexc = fmrx(FPEXC) | FPEXC_EN;
- fmxr(FPEXC, fpexc);
+ fpexc = fmrx(FPEXC) | FPEXC_EN;
+ fmxr(FPEXC, fpexc);
- /*
- * Save the userland NEON/VFP state. Under UP,
- * the owner could be a task other than 'current'
- */
- if (vfp_state_in_hw(cpu, thread))
- vfp_save_state(&thread->vfpstate, fpexc);
+ /*
+ * Save the userland NEON/VFP state. Under UP,
+ * the owner could be a task other than 'current'
+ */
+ if (vfp_state_in_hw(cpu, thread))
+ vfp_save_state(&thread->vfpstate, fpexc);
#ifndef CONFIG_SMP
- else if (vfp_current_hw_state[cpu] != NULL)
- vfp_save_state(vfp_current_hw_state[cpu], fpexc);
+ else if (vfp_current_hw_state[cpu] != NULL)
+ vfp_save_state(vfp_current_hw_state[cpu], fpexc);
#endif
- vfp_current_hw_state[cpu] = NULL;
+ vfp_current_hw_state[cpu] = NULL;
+ }
}
-EXPORT_SYMBOL(kernel_neon_begin);
+EXPORT_SYMBOL(__kernel_neon_begin);
-void kernel_neon_end(void)
+void __kernel_neon_end(struct vfp_partial_state *regs, u32 num_regs)
{
- /* Disable the NEON/VFP unit. */
- fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
- put_cpu();
+ if (in_interrupt()) {
+ BUG_ON(!num_regs);
+ vfp_load_partial_state(regs, num_regs);
+ } else {
+ /* Disable the NEON/VFP unit. */
+ fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
+ put_cpu();
+ }
}
-EXPORT_SYMBOL(kernel_neon_end);
+EXPORT_SYMBOL(__kernel_neon_end);
#endif /* CONFIG_KERNEL_MODE_NEON */
--
1.8.1.2
* [RFC v3 PATCH 2/7] ARM: port NEON version of xor_blocks() to new kmode NEON api
From: Ard Biesheuvel @ 2013-10-13 12:14 UTC
To: linux-arm-kernel
It is now permissible to use the NEON in non-process context, so
update the XOR code to use the NEON version in non-process context
as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm/include/asm/xor.h | 48 +++++++++++++++++++---------------------------
1 file changed, 20 insertions(+), 28 deletions(-)
diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
index 4ffb26d..1bda8b5 100644
--- a/arch/arm/include/asm/xor.h
+++ b/arch/arm/include/asm/xor.h
@@ -151,52 +151,44 @@ extern struct xor_block_template const xor_block_neon_inner;
static void
xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
{
- if (in_interrupt()) {
- xor_arm4regs_2(bytes, p1, p2);
- } else {
- kernel_neon_begin();
- xor_block_neon_inner.do_2(bytes, p1, p2);
- kernel_neon_end();
- }
+ DEFINE_NEON_REGSTACK(s);
+
+ kernel_neon_begin(s);
+ xor_block_neon_inner.do_2(bytes, p1, p2);
+ kernel_neon_end(s);
}
static void
xor_neon_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
unsigned long *p3)
{
- if (in_interrupt()) {
- xor_arm4regs_3(bytes, p1, p2, p3);
- } else {
- kernel_neon_begin();
- xor_block_neon_inner.do_3(bytes, p1, p2, p3);
- kernel_neon_end();
- }
+ DEFINE_NEON_REGSTACK(s);
+
+ kernel_neon_begin(s);
+ xor_block_neon_inner.do_3(bytes, p1, p2, p3);
+ kernel_neon_end(s);
}
static void
xor_neon_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
unsigned long *p3, unsigned long *p4)
{
- if (in_interrupt()) {
- xor_arm4regs_4(bytes, p1, p2, p3, p4);
- } else {
- kernel_neon_begin();
- xor_block_neon_inner.do_4(bytes, p1, p2, p3, p4);
- kernel_neon_end();
- }
+ DEFINE_NEON_REGSTACK(s);
+
+ kernel_neon_begin(s);
+ xor_block_neon_inner.do_4(bytes, p1, p2, p3, p4);
+ kernel_neon_end(s);
}
static void
xor_neon_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
unsigned long *p3, unsigned long *p4, unsigned long *p5)
{
- if (in_interrupt()) {
- xor_arm4regs_5(bytes, p1, p2, p3, p4, p5);
- } else {
- kernel_neon_begin();
- xor_block_neon_inner.do_5(bytes, p1, p2, p3, p4, p5);
- kernel_neon_end();
- }
+ DEFINE_NEON_REGSTACK(s);
+
+ kernel_neon_begin(s);
+ xor_block_neon_inner.do_5(bytes, p1, p2, p3, p4, p5);
+ kernel_neon_end(s);
}
static struct xor_block_template xor_block_neon = {
--
1.8.1.2
* [RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume
From: Ard Biesheuvel @ 2013-10-13 12:14 UTC
To: linux-arm-kernel
Modify kernel_neon_begin() and kernel_neon_end() so subsequent calls
don't need to preserve/restore the userland FPSIMD state if the task
has not entered userland in the meantime.
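The resulting flow is roughly the following (illustration, not code from
this patch):

    kernel_neon_begin();    /* saves the user FPSIMD state once and
                               sets TIF_RELOAD_FPSTATE */
    kernel_neon_end();
    kernel_neon_begin();    /* flag still set -> no second save */
    kernel_neon_end();
    ...
    /* on the way back to userland, do_notify_resume() observes
       TIF_RELOAD_FPSTATE and reloads the task's FPSIMD state once */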
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/thread_info.h | 4 +++-
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 7 ++-----
arch/arm64/kernel/signal.c | 2 ++
4 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 23a3c47..3bdeab6 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SIGPENDING 0
#define TIF_NEED_RESCHED 1
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
+#define TIF_RELOAD_FPSTATE 3 /* user FPSIMD context saved to mem */
#define TIF_SYSCALL_TRACE 8
#define TIF_POLLING_NRFLAG 16
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
@@ -118,10 +119,11 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+#define _TIF_RELOAD_FPSTATE (1 << TIF_RELOAD_FPSTATE)
#define _TIF_32BIT (1 << TIF_32BIT)
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
- _TIF_NOTIFY_RESUME)
+ _TIF_NOTIFY_RESUME | _TIF_RELOAD_FPSTATE)
#endif /* __KERNEL__ */
#endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 3881fd1..2c6c7fb 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -589,7 +589,7 @@ fast_work_pending:
str x0, [sp, #S_X0] // returned x0
work_pending:
tbnz x1, #TIF_NEED_RESCHED, work_resched
- /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+ /* TIF_SIGPENDING/TIF_NOTIFY_RESUME/TIF_RELOAD_FPSTATE case */
ldr x2, [sp, #S_PSTATE]
mov x0, sp // 'regs'
tst x2, #PSR_MODE_MASK // user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1f2e4d5..a52affd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
void fpsimd_thread_switch(struct task_struct *next)
{
/* check if not kernel threads */
- if (current->mm)
+ if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
fpsimd_save_state(&current->thread.fpsimd_state);
if (next->mm)
fpsimd_load_state(&next->thread.fpsimd_state);
@@ -95,16 +95,13 @@ void kernel_neon_begin(void)
BUG_ON(in_interrupt());
preempt_disable();
- if (current->mm)
+ if (current->mm && !test_and_set_thread_flag(TIF_RELOAD_FPSTATE))
fpsimd_save_state(&current->thread.fpsimd_state);
}
EXPORT_SYMBOL(kernel_neon_begin);
void kernel_neon_end(void)
{
- if (current->mm)
- fpsimd_load_state(&current->thread.fpsimd_state);
-
preempt_enable();
}
EXPORT_SYMBOL(kernel_neon_end);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591..da3a433 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+ if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
+ fpsimd_load_state(&current->thread.fpsimd_state);
}
--
1.8.1.2
* [RFC v3 PATCH 4/7] ARM64: add support for kernel mode NEON in atomic context
From: Ard Biesheuvel @ 2013-10-13 12:15 UTC
To: linux-arm-kernel
This patch modifies kernel_neon_begin() and kernel_neon_end(), so
they may be called from any context. To address the in_interrupt()
case, they now both take a parameter defined by DEFINE_NEON_REGSTACK()
or DEFINE_NEON_REGSTACK_PARTIAL() [in case only a few NEON registers
are in fact used]. The !in_interrupt() case is unchanged from before.
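Patch #5 uses this as follows (taken from the aes-sync.c code later in
this series):

    DEFINE_NEON_REGSTACK_PARTIAL(regs, 2);

    kernel_neon_begin(regs);
    /* AES round instructions using v0/v1 only */
    kernel_neon_end(regs);

reserving 32 bytes of stack in interrupt context and none otherwise.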
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 17 +++++++++++++++++
arch/arm64/include/asm/fpsimdmacros.h | 35 +++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/neon.h | 31 +++++++++++++++++++++++++++++--
arch/arm64/kernel/entry-fpsimd.S | 24 ++++++++++++++++++++++++
arch/arm64/kernel/fpsimd.c | 29 ++++++++++++++++++-----------
5 files changed, 123 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index c43b4ac..755bdf1 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -39,6 +39,18 @@ struct fpsimd_state {
};
};
+/*
+ * Variable sized struct for stacking the bottom 'n' FP/SIMD registers.
+ * Mainly intended for kernel use of v8 Crypto Extensions which only
+ * needs a few registers and may need to execute in atomic context.
+ */
+struct fpsimd_partial_state {
+ u32 fpsr;
+ u32 fpcr;
+ __uint128_t vregs[] __aligned(16);
+} __aligned(16);
+
+
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/* Masks for extracting the FPSR and FPCR from the FPSCR */
#define VFP_FPSCR_STAT_MASK 0xf800009f
@@ -55,6 +67,11 @@ struct task_struct;
extern void fpsimd_save_state(struct fpsimd_state *state);
extern void fpsimd_load_state(struct fpsimd_state *state);
+extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
+ u32 num_regs);
+extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state,
+ u32 num_regs);
+
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bbec599..f771b69 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -62,3 +62,38 @@
ldr w\tmpnr, [\state, #16 * 2 + 4]
msr fpcr, x\tmpnr
.endm
+
+.altmacro
+.macro q2op, op, q1, q2, state
+ \op q\q1, q\q2, [\state, #-(16 * \q1) - 16]
+.endm
+
+.macro fpsimd_save_partial state, num, tmpnr1, tmpnr2
+ mrs x\tmpnr1, fpsr
+ mrs x\tmpnr2, fpcr
+ stp w\tmpnr1, w\tmpnr2, [\state]
+ adr x\tmpnr1, 0f
+ add \state, \state, \num, lsl #4
+ sub x\tmpnr1, x\tmpnr1, \num, lsl #1
+ br x\tmpnr1
+ .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+ qb = \qa + 1
+ q2op stp, \qa, %qb, \state
+ .endr
+0:
+.endm
+
+.macro fpsimd_restore_partial state, num, tmpnr1, tmpnr2
+ ldp w\tmpnr1, w\tmpnr2, [\state]
+ msr fpsr, x\tmpnr1
+ msr fpcr, x\tmpnr2
+ adr x\tmpnr1, 0f
+ add \state, \state, \num, lsl #4
+ sub x\tmpnr1, x\tmpnr1, \num, lsl #1
+ br x\tmpnr1
+ .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+ qb = \qa + 1
+ q2op ldp, \qa, %qb, \state
+ .endr
+0:
+.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index b0cc58a9..e496dce 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,7 +8,34 @@
* published by the Free Software Foundation.
*/
+#include <linux/hardirq.h>
+#include <linux/types.h>
+#include <asm/fpsimd.h>
+
#define cpu_has_neon() (1)
-void kernel_neon_begin(void);
-void kernel_neon_end(void);
+/*
+ * Avoid wasting stack space by making the size of the allocated area depend on
+ * whether we are currently running in process context. (If this is the case, we
+ * will use the normal preserve/restore mechanism, leaving the allocated stack
+ * space unused.)
+ */
+#define __VREG_SIZE(num) \
+ ((!in_interrupt()) ? 0 : (num) > 32 ? 512 : 16 * (((num) + 1) & ~1U))
+
+#define DEFINE_NEON_REGSTACK_PARTIAL(v, num) \
+ struct { \
+ struct fpsimd_partial_state regs; \
+ u8 vregs[__VREG_SIZE(num)]; \
+ } v
+
+#define DEFINE_NEON_REGSTACK(name) DEFINE_NEON_REGSTACK_PARTIAL(name, 32)
+
+#define kernel_neon_begin(p) \
+ __kernel_neon_begin(&(p).regs, sizeof((p).vregs)/16)
+
+#define kernel_neon_end(p) \
+ __kernel_neon_end(&(p).regs, sizeof((p).vregs)/16)
+
+void __kernel_neon_begin(struct fpsimd_partial_state *regs, u32 num_regs);
+void __kernel_neon_end(struct fpsimd_partial_state *regs, u32 num_regs);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 6a27cd6..aa73ee9 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,3 +41,27 @@ ENTRY(fpsimd_load_state)
fpsimd_restore x0, 8
ret
ENDPROC(fpsimd_load_state)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+/*
+ * Save the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state, x1 - number of registers
+ */
+ENTRY(fpsimd_save_partial_state)
+ fpsimd_save_partial x0, x1, 8, 9
+ ret
+ENDPROC(fpsimd_save_partial_state)
+
+/*
+ * Load the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state, x1 - number of registers
+ */
+ENTRY(fpsimd_load_partial_state)
+ fpsimd_restore_partial x0, x1, 8, 9
+ ret
+ENDPROC(fpsimd_load_partial_state)
+
+#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index a52affd..34fa94b 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -89,22 +89,29 @@ void fpsimd_flush_thread(void)
/*
* Kernel-side NEON support functions
*/
-void kernel_neon_begin(void)
+void __kernel_neon_begin(struct fpsimd_partial_state *regs, u32 num_regs)
{
- /* Avoid using the NEON in interrupt context */
- BUG_ON(in_interrupt());
- preempt_disable();
-
- if (current->mm && !test_and_set_thread_flag(TIF_RELOAD_FPSTATE))
- fpsimd_save_state(&current->thread.fpsimd_state);
+ if (in_interrupt()) {
+ BUG_ON(!num_regs);
+ fpsimd_save_partial_state(regs, num_regs);
+ } else {
+ preempt_disable();
+ if (current->mm &&
+ !test_and_set_thread_flag(TIF_RELOAD_FPSTATE))
+ fpsimd_save_state(&current->thread.fpsimd_state);
+ }
}
-EXPORT_SYMBOL(kernel_neon_begin);
+EXPORT_SYMBOL(__kernel_neon_begin);
-void kernel_neon_end(void)
+void __kernel_neon_end(struct fpsimd_partial_state *regs, u32 num_regs)
{
- preempt_enable();
+ if (in_interrupt()) {
+ BUG_ON(!num_regs);
+ fpsimd_load_partial_state(regs, num_regs);
+ } else
+ preempt_enable();
}
-EXPORT_SYMBOL(kernel_neon_end);
+EXPORT_SYMBOL(__kernel_neon_end);
#endif /* CONFIG_KERNEL_MODE_NEON */
--
1.8.1.2
* [RFC v3 PATCH 5/7] ARM64: add Crypto Extensions based synchronous core AES cipher
From: Ard Biesheuvel @ 2013-10-13 12:15 UTC
To: linux-arm-kernel
This implements the core AES cipher using the Crypto Extensions,
using only NEON registers q0 and q1.
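Once registered, the cipher is reached through the regular crypto API,
e.g. (minimal sketch, error handling elided):

    struct crypto_cipher *tfm = crypto_alloc_cipher("aes", 0, 0);

    crypto_cipher_setkey(tfm, key, 16);
    crypto_cipher_encrypt_one(tfm, dst, src);  /* safe in atomic context */
    crypto_free_cipher(tfm);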
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/Makefile | 11 +++--
arch/arm64/crypto/Makefile | 14 ++++++
arch/arm64/crypto/aes-sync.c | 106 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 126 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aes-sync.c
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index d90cf79..d1ca9d8 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -36,11 +36,12 @@ TEXT_OFFSET := 0x00080000
export TEXT_OFFSET GZFLAGS
-core-y += arch/arm64/kernel/ arch/arm64/mm/
-core-$(CONFIG_KVM) += arch/arm64/kvm/
-core-$(CONFIG_XEN) += arch/arm64/xen/
-libs-y := arch/arm64/lib/ $(libs-y)
-libs-y += $(LIBGCC)
+core-y += arch/arm64/kernel/ arch/arm64/mm/
+core-$(CONFIG_KVM) += arch/arm64/kvm/
+core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
+libs-y := arch/arm64/lib/ $(libs-y)
+libs-y += $(LIBGCC)
# Default target when executing plain make
KBUILD_IMAGE := Image.gz
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 0000000..269d9be
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,14 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+aesce-sync-y := aes-sync.o
+obj-m += aesce-sync.o
+
+CFLAGS_aes-sync.o += -march=armv8-a+crypto
diff --git a/arch/arm64/crypto/aes-sync.c b/arch/arm64/crypto/aes-sync.c
new file mode 100644
index 0000000..5d7ed4e
--- /dev/null
+++ b/arch/arm64/crypto/aes-sync.c
@@ -0,0 +1,106 @@
+/*
+ * linux/arch/arm64/crypto/aes-sync.c
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ u32 rounds = 6 + ctx->key_length / 4;
+ DEFINE_NEON_REGSTACK_PARTIAL(regs, 2);
+
+ kernel_neon_begin(regs);
+
+ __asm__(" ld1 {v0.16b}, [%[in]] ;"
+ " ld1 {v1.16b}, [%[key]], #16 ;"
+ "0: aese v0.16b, v1.16b ;"
+ " subs %[rounds], %[rounds], #1 ;"
+ " ld1 {v1.16b}, [%[key]], #16 ;"
+ " beq 1f ;"
+ " aesmc v0.16b, v0.16b ;"
+ " b 0b ;"
+ "1: eor v0.16b, v0.16b, v1.16b ;"
+ " st1 {v0.16b}, [%[out]] ;"
+ : :
+ [out] "r"(dst),
+ [in] "r"(src),
+ [rounds] "r"(rounds),
+ [key] "r"(ctx->key_enc)
+ : "cc");
+
+ kernel_neon_end(regs);
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ u32 rounds = 6 + ctx->key_length / 4;
+ DEFINE_NEON_REGSTACK_PARTIAL(regs, 2);
+
+ kernel_neon_begin(regs);
+
+ __asm__(" ld1 {v0.16b}, [%[in]] ;"
+ " ld1 {v1.16b}, [%[key]], #16 ;"
+ "0: aesd v0.16b, v1.16b ;"
+ " ld1 {v1.16b}, [%[key]], #16 ;"
+ " subs %[rounds], %[rounds], #1 ;"
+ " beq 1f ;"
+ " aesimc v0.16b, v0.16b ;"
+ " b 0b ;"
+ "1: eor v0.16b, v0.16b, v1.16b ;"
+ " st1 {v0.16b}, [%[out]] ;"
+ : :
+ [out] "r"(dst),
+ [in] "r"(src),
+ [rounds] "r"(rounds),
+ [key] "r"(ctx->key_dec)
+ : "cc");
+
+ kernel_neon_end(regs);
+}
+
+static struct crypto_alg aes_alg = {
+ .cra_name = "aes",
+ .cra_driver_name = "aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = crypto_aes_set_key,
+ .cia_encrypt = aes_cipher_encrypt,
+ .cia_decrypt = aes_cipher_decrypt
+ }
+};
+
+static int __init aes_mod_init(void)
+{
+ if (0) // TODO check for crypto extensions
+ return -ENODEV;
+ return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+ crypto_unregister_alg(&aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
--
1.8.1.2
* [RFC v3 PATCH 6/7] ARM64: add Crypto Extensions based synchronous AES in CCM mode
From: Ard Biesheuvel @ 2013-10-13 12:15 UTC
To: linux-arm-kernel
This implements the CCM AEAD chaining mode for AES using Crypto
Extensions instructions.
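For reference, the AEAD is used roughly like this (sketch against the
current AEAD interface; scatterlist setup and error handling elided):

    struct crypto_aead *tfm = crypto_alloc_aead("ccm(aes)", 0, 0);
    struct aead_request *req = aead_request_alloc(tfm, GFP_KERNEL);

    crypto_aead_setkey(tfm, key, 16);
    crypto_aead_setauthsize(tfm, 8);
    aead_request_set_crypt(req, src_sg, dst_sg, cryptlen, iv);
    aead_request_set_assoc(req, assoc_sg, assoclen);
    crypto_aead_encrypt(req);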
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Makefile | 2 +-
arch/arm64/crypto/aes-sync.c | 355 +++++++++++++++++++++++++++++++++++++++++-
arch/arm64/crypto/aesce-ccm.S | 186 ++++++++++++++++++++++
3 files changed, 538 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/crypto/aesce-ccm.S
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 269d9be..f15940c 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -8,7 +8,7 @@
# published by the Free Software Foundation.
#
-aesce-sync-y := aes-sync.o
+aesce-sync-y := aes-sync.o aesce-ccm.o
obj-m += aesce-sync.o
CFLAGS_aes-sync.o += -march=armv8-a+crypto
diff --git a/arch/arm64/crypto/aes-sync.c b/arch/arm64/crypto/aes-sync.c
index 5d7ed4e..0c0d0bd 100644
--- a/arch/arm64/crypto/aes-sync.c
+++ b/arch/arm64/crypto/aes-sync.c
@@ -9,7 +9,10 @@
*/
#include <asm/neon.h>
+#include <asm/unaligned.h>
#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/scatterwalk.h>
#include <linux/crypto.h>
#include <linux/module.h>
@@ -69,7 +72,313 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
kernel_neon_end(regs);
}
-static struct crypto_alg aes_alg = {
+struct crypto_ccm_aes_ctx {
+ struct crypto_aes_ctx *key;
+ struct crypto_blkcipher *blk_tfm;
+};
+
+asmlinkage void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+ u32 const rk[], u32 rounds);
+
+asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+ u32 const rk[], u32 rounds, u8 mac[],
+ u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+ u32 const rk[], u32 rounds, u8 mac[],
+ u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+ u32 rounds);
+
+static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(tfm);
+ int ret;
+
+ ret = crypto_aes_expand_key(ctx->key, in_key, key_len);
+ if (!ret)
+ return 0;
+
+ tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+ return -EINVAL;
+}
+
+static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+ if ((authsize & 1) || authsize < 4)
+ return -EINVAL;
+ return 0;
+}
+
+static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ __be32 *n = (__be32 *)&maciv[AES_BLOCK_SIZE - 8];
+ u32 l = req->iv[0] + 1;
+
+ /* verify that CCM dimension 'L' is set correctly in the IV */
+ if (l < 2 || l > 8)
+ return -EINVAL;
+
+ /* verify that msglen can in fact be represented in L bytes */
+ if (msglen >> (8 * l))
+ return -EOVERFLOW;
+
+ /*
+ * Even if the CCM spec allows L values of up to 8, the Linux cryptoapi
+ * uses a u32 type to represent msglen so the top 4 bytes are always 0.
+ */
+ n[0] = 0;
+ n[1] = cpu_to_be32(msglen);
+
+ memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+ maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+ if (req->assoclen)
+ maciv[0] |= 0x40;
+
+ memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+ return 0;
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+ struct __packed { __be16 l; __be32 h; } ltag;
+ u32 rounds = 6 + ctx->key->key_length / 4;
+ struct scatter_walk walk;
+ u32 len = req->assoclen;
+ u32 macp;
+
+ /* prepend the AAD with a length tag */
+ if (len < 0xff00) {
+ ltag.l = cpu_to_be16(len);
+ macp = 2;
+ } else {
+ ltag.l = cpu_to_be16(0xfffe);
+ put_unaligned_be32(len, &ltag.h);
+ macp = 6;
+ }
+
+ ce_aes_ccm_auth_data(mac, (u8 *)&ltag, macp, ctx->key->key_enc, rounds);
+ scatterwalk_start(&walk, req->assoc);
+
+ do {
+ u32 n = scatterwalk_clamp(&walk, len);
+ u32 m;
+ u8 *p;
+
+ if (!n) {
+ scatterwalk_start(&walk, sg_next(walk.sg));
+ n = scatterwalk_clamp(&walk, len);
+ }
+ p = scatterwalk_map(&walk);
+ m = min(n, AES_BLOCK_SIZE - macp);
+ crypto_xor(&mac[macp], p, m);
+
+ len -= n;
+ n -= m;
+ macp += m;
+ if (macp == AES_BLOCK_SIZE && (n || len)) {
+ ce_aes_ccm_auth_data(mac, &p[m], n, ctx->key->key_enc,
+ rounds);
+ macp = n % AES_BLOCK_SIZE;
+ }
+
+ scatterwalk_unmap(p);
+ scatterwalk_advance(&walk, n + m);
+ scatterwalk_done(&walk, 0, len);
+ } while (len);
+}
+
+struct ccm_inner_desc_info {
+ u8 ctriv[AES_BLOCK_SIZE];
+ u8 mac[AES_BLOCK_SIZE];
+} __aligned(8);
+
+static int ccm_inner_encrypt(struct blkcipher_desc *desc,
+ struct scatterlist *dst, struct scatterlist *src,
+ unsigned int nbytes)
+{
+ struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+ struct ccm_inner_desc_info *descinfo = desc->info;
+ u32 rounds = 6 + ctx->key_length / 4;
+ struct blkcipher_walk walk;
+ int err;
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == nbytes)
+ tail = 0;
+
+ ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc, rounds,
+ descinfo->mac, descinfo->ctriv);
+
+ nbytes -= walk.nbytes - tail;
+ err = blkcipher_walk_done(desc, &walk, tail);
+ }
+ return err;
+}
+
+static int ccm_inner_decrypt(struct blkcipher_desc *desc,
+ struct scatterlist *dst, struct scatterlist *src,
+ unsigned int nbytes)
+{
+ struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+ struct ccm_inner_desc_info *descinfo = desc->info;
+ u32 rounds = 6 + ctx->key_length / 4;
+ struct blkcipher_walk walk;
+ int err;
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == nbytes)
+ tail = 0;
+
+ ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc, rounds,
+ descinfo->mac, descinfo->ctriv);
+
+ nbytes -= walk.nbytes - tail;
+ err = blkcipher_walk_done(desc, &walk, tail);
+ }
+ return err;
+}
+
+static int ccm_encrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+ u32 rounds = 6 + ctx->key->key_length / 4;
+ struct ccm_inner_desc_info descinfo;
+ DEFINE_NEON_REGSTACK_PARTIAL(regs, 4);
+ int err;
+
+ struct blkcipher_desc desc = {
+ .tfm = ctx->blk_tfm,
+ .info = &descinfo,
+ .flags = 0,
+ };
+
+ err = ccm_init_mac(req, descinfo.mac, req->cryptlen);
+ if (err)
+ return err;
+
+ kernel_neon_begin(regs);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, descinfo.mac);
+
+ memcpy(descinfo.ctriv, req->iv, AES_BLOCK_SIZE);
+
+ /* call inner blkcipher to process the payload */
+ err = ccm_inner_encrypt(&desc, req->dst, req->src, req->cryptlen);
+ if (!err)
+ ce_aes_ccm_final(descinfo.mac, req->iv, ctx->key->key_enc,
+ rounds);
+
+ kernel_neon_end(regs);
+
+ if (err)
+ return err;
+
+ /* copy authtag to end of dst */
+ scatterwalk_map_and_copy(descinfo.mac, req->dst, req->cryptlen,
+ crypto_aead_authsize(aead), 1);
+
+ return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+ u32 rounds = 6 + ctx->key->key_length / 4;
+ struct ccm_inner_desc_info descinfo;
+ DEFINE_NEON_REGSTACK_PARTIAL(regs, 4);
+ u8 atag[AES_BLOCK_SIZE];
+ u32 len;
+ int err;
+
+ struct blkcipher_desc desc = {
+ .tfm = ctx->blk_tfm,
+ .info = &descinfo,
+ .flags = 0,
+ };
+
+ len = req->cryptlen - crypto_aead_authsize(aead);
+ err = ccm_init_mac(req, descinfo.mac, len);
+ if (err)
+ return err;
+
+ kernel_neon_begin(regs);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, descinfo.mac);
+
+ memcpy(descinfo.ctriv, req->iv, AES_BLOCK_SIZE);
+
+ /* call inner blkcipher to process the payload */
+ err = ccm_inner_decrypt(&desc, req->dst, req->src, len);
+ if (!err)
+ ce_aes_ccm_final(descinfo.mac, req->iv, ctx->key->key_enc,
+ rounds);
+
+ kernel_neon_end(regs);
+
+ if (err)
+ return err;
+
+ /* compare calculated auth tag with the stored one */
+ scatterwalk_map_and_copy(atag, req->src, len,
+ crypto_aead_authsize(aead), 0);
+
+ if (memcmp(descinfo.mac, atag, crypto_aead_authsize(aead)))
+ return -EBADMSG;
+ return 0;
+}
+
+static int ccm_init(struct crypto_tfm *tfm)
+{
+ struct crypto_ccm_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ struct crypto_blkcipher *blk_tfm;
+
+ blk_tfm = crypto_alloc_blkcipher("__driver-ccm-aesce-inner", 0, 0);
+ if (IS_ERR(blk_tfm))
+ return PTR_ERR(blk_tfm);
+
+ /* did we get the right one? (sanity check) */
+ if (crypto_blkcipher_crt(blk_tfm)->encrypt != ccm_inner_encrypt) {
+ crypto_free_blkcipher(blk_tfm);
+ return -EINVAL;
+ }
+
+ ctx->blk_tfm = blk_tfm;
+ ctx->key = crypto_blkcipher_ctx(blk_tfm);
+
+ return 0;
+}
+
+static void ccm_exit(struct crypto_tfm *tfm)
+{
+ struct crypto_ccm_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ crypto_free_blkcipher(ctx->blk_tfm);
+}
+
+static struct crypto_alg aes_algs[] = { {
.cra_name = "aes",
.cra_driver_name = "aes-ce",
.cra_priority = 300,
@@ -84,18 +393,56 @@ static struct crypto_alg aes_alg = {
.cia_encrypt = aes_cipher_encrypt,
.cia_decrypt = aes_cipher_decrypt
}
-};
+}, {
+ .cra_name = "__ccm-aesce-inner",
+ .cra_driver_name = "__driver-ccm-aesce-inner",
+ .cra_priority = 0,
+ .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_alignmask = 7,
+ .cra_type = &crypto_blkcipher_type,
+ .cra_module = THIS_MODULE,
+ .cra_blkcipher = {
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = sizeof(struct ccm_inner_desc_info),
+ .setkey = crypto_aes_set_key,
+ .encrypt = ccm_inner_encrypt,
+ .decrypt = ccm_inner_decrypt,
+ },
+}, {
+ .cra_name = "ccm(aes)",
+ .cra_driver_name = "ccm-aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_AEAD,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_ccm_aes_ctx),
+ .cra_alignmask = 7,
+ .cra_type = &crypto_aead_type,
+ .cra_module = THIS_MODULE,
+ .cra_init = ccm_init,
+ .cra_exit = ccm_exit,
+ .cra_aead = {
+ .ivsize = AES_BLOCK_SIZE,
+ .maxauthsize = AES_BLOCK_SIZE,
+ .setkey = ccm_setkey,
+ .setauthsize = ccm_setauthsize,
+ .encrypt = ccm_encrypt,
+ .decrypt = ccm_decrypt,
+ }
+} };
static int __init aes_mod_init(void)
{
if (0) // TODO check for crypto extensions
return -ENODEV;
- return crypto_register_alg(&aes_alg);
+ return crypto_register_algs(aes_algs, ARRAY_SIZE(aes_algs));
}
static void __exit aes_mod_exit(void)
{
- crypto_unregister_alg(&aes_alg);
+ crypto_unregister_algs(aes_algs, ARRAY_SIZE(aes_algs));
}
module_init(aes_mod_init);
diff --git a/arch/arm64/crypto/aesce-ccm.S b/arch/arm64/crypto/aesce-ccm.S
new file mode 100644
index 0000000..df1248b
--- /dev/null
+++ b/arch/arm64/crypto/aesce-ccm.S
@@ -0,0 +1,186 @@
+/*
+ * linux/arch/arm64/crypto/aesce-ccm.S - AES-CCM transform for ARMv8 with
+ * Crypto Extensions
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+ .text
+ .arch armv8-a+crypto
+
+ /*
+ * void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+ * u32 const rk[], u32 rounds);
+ */
+ENTRY(ce_aes_ccm_auth_data)
+ ld1 {v0.16b}, [x0] /* load mac */
+0: ld1 {v3.16b}, [x3] /* load first round key */
+ mov w7, w4
+ add x6, x3, #16
+ b 2f
+1: aese v0.16b, v2.16b
+ subs w7, w7, #2
+ beq 3f
+ aesmc v0.16b, v0.16b
+2: aese v0.16b, v3.16b
+ ld1 {v2.16b-v3.16b}, [x6], #32 /* load next round keys */
+ aesmc v0.16b, v0.16b
+ b 1b
+3: eor v0.16b, v0.16b, v3.16b /* final round */
+ subs w2, w2, #16 /* last data? */
+ bmi 4f
+ ld1 {v1.16b}, [x1], #16 /* load next input block */
+ eor v0.16b, v0.16b, v1.16b /* xor with mac */
+ bne 0b
+4: st1 {v0.16b}, [x0] /* store mac */
+ beq 6f
+ adds w2, w2, #16
+ beq 6f
+5: ldrb w7, [x1], #1
+ umov w6, v0.b[0]
+ eor w6, w6, w7
+ strb w6, [x0], #1
+ subs w2, w2, #1
+ beq 6f
+ ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */
+ b 5b
+6: ret
+ENDPROC(ce_aes_ccm_auth_data)
+
+ /*
+ * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+ * u32 rounds);
+ */
+ENTRY(ce_aes_ccm_final)
+ ld1 {v0.16b}, [x0] /* load mac */
+ ld1 {v2.16b-v3.16b}, [x2], #32 /* load first 2 round keys */
+ ld1 {v1.16b}, [x1] /* load 1st ctriv */
+ cmp w3, #12
+ beq 1f
+0: aese v0.16b, v2.16b /* 4 rounds, 2x interleaved */
+ aese v1.16b, v2.16b
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+ aese v0.16b, v3.16b
+ aese v1.16b, v3.16b
+ subs w3, w3, #4
+ ble 2f
+ ld1 {v2.16b-v3.16b}, [x2], #32 /* load next 2 round keys */
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+1: aese v0.16b, v2.16b
+ aese v1.16b, v2.16b
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+ aese v0.16b, v3.16b
+ aese v1.16b, v3.16b
+ ld1 {v2.16b-v3.16b}, [x2], #32 /* load next 2 round keys */
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+ b 0b
+2: /* final round key cancels out */
+ eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */
+ st1 {v0.16b}, [x0] /* store result */
+ ret
+ENDPROC(ce_aes_ccm_final)
+
+ .macro aes_ccm_do_crypt,enc
+ ldr x8, [x6, #8] /* load lower ctr */
+ ld1 {v0.16b}, [x5] /* load mac */
+ rev x8, x8 /* keep swabbed ctr in reg */
+ b 0f
+ .align 6
+0: ld1 {v1.8b}, [x6] /* load upper ctr */
+ ld1 {v3.16b}, [x3] /* load first round key */
+ add x8, x8, #1
+ mov w7, w4 /* get # of rounds */
+ rev x9, x8
+ cmp w4, #12 /* 10, 12 or 14 rounds? */
+ add x10, x3, #16
+ ins v1.d[1], x9 /* no carry in lower ctr */
+ beq 3f
+ b 2f
+1: aese v0.16b, v2.16b /* 4 rounds, 2x interleaved */
+ aese v1.16b, v2.16b
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+2: aese v0.16b, v3.16b
+ aese v1.16b, v3.16b
+ ld1 {v2.16b-v3.16b}, [x10], #32 /* load next 2 round keys */
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+ subs w7, w7, #4
+ aese v0.16b, v2.16b
+ aese v1.16b, v2.16b
+ ble 4f
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+3: aese v0.16b, v3.16b
+ aese v1.16b, v3.16b
+ ld1 {v2.16b-v3.16b}, [x10], #32 /* load next 2 round keys */
+ aesmc v0.16b, v0.16b
+ aesmc v1.16b, v1.16b
+ b 1b
+4: subs w2, w2, #16
+ bmi 5f
+ ld1 {v2.16b}, [x1], #16 /* load next input block */
+ .if \enc == 1
+ eor v2.16b, v2.16b, v3.16b /* final round enc+mac */
+ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
+ .else
+ eor v2.16b, v2.16b, v1.16b /* xor with crypted ctr */
+ eor v1.16b, v2.16b, v3.16b /* final round enc */
+ .endif
+ eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
+ st1 {v1.16b}, [x0], #16 /* write output block */
+ beq 5f
+ b 0b
+5: eor v0.16b, v0.16b, v3.16b /* final round mac */
+ eor v1.16b, v1.16b, v3.16b /* final round enc */
+ st1 {v0.16b}, [x5] /* store mac */
+ beq 7f
+ add w2, w2, #16 /* process partial tail block */
+6: ldrb w9, [x1], #1 /* get 1 byte of input */
+ umov w6, v1.b[0] /* get top crypted ctr byte */
+ umov w7, v0.b[0] /* get top mac byte */
+ .if \enc == 1
+ eor w7, w7, w9
+ eor w9, w9, w6
+ .else
+ eor w9, w9, w6
+ eor w7, w7, w9
+ .endif
+ strb w9, [x0], #1 /* store out byte */
+ strb w7, [x5], #1 /* store mac byte */
+ subs w2, w2, #1
+ beq 8f
+ ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */
+ ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */
+ b 6b
+7: rev x8, x8
+ str x8, [x6, #8] /* store lsb end of ctr (BE) */
+8: ret
+ .endm
+
+ /*
+ * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+ * u32 const rk[], u32 rounds, u8 mac[],
+ * u8 ctr[]);
+ * void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+ * u32 const rk[], u32 rounds, u8 mac[],
+ * u8 ctr[]);
+ */
+ENTRY(ce_aes_ccm_encrypt)
+ aes_ccm_do_crypt 1
+ENDPROC(ce_aes_ccm_encrypt)
+
+ENTRY(ce_aes_ccm_decrypt)
+ aes_ccm_do_crypt 0
+ENDPROC(ce_aes_ccm_decrypt)
+
--
1.8.1.2
* [RFC v3 PATCH 7/7] lib/raid6: port NEON implementation to updated kmode NEON api
From: Ard Biesheuvel @ 2013-10-13 12:15 UTC
To: linux-arm-kernel
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
lib/raid6/neon.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/lib/raid6/neon.c b/lib/raid6/neon.c
index 36ad470..172b53f 100644
--- a/lib/raid6/neon.c
+++ b/lib/raid6/neon.c
@@ -13,8 +13,8 @@
#ifdef __KERNEL__
#include <asm/neon.h>
#else
-#define kernel_neon_begin()
-#define kernel_neon_end()
+#define kernel_neon_begin(s)
+#define kernel_neon_end(s)
#define cpu_has_neon() (1)
#endif
@@ -33,12 +33,13 @@
static void raid6_neon ## _n ## _gen_syndrome(int disks, \
size_t bytes, void **ptrs) \
{ \
+ DEFINE_NEON_REGSTACK(s); \
void raid6_neon ## _n ## _gen_syndrome_real(int, \
unsigned long, void**); \
- kernel_neon_begin(); \
+ kernel_neon_begin(s); \
raid6_neon ## _n ## _gen_syndrome_real(disks, \
(unsigned long)bytes, ptrs); \
- kernel_neon_end(); \
+ kernel_neon_end(s); \
} \
struct raid6_calls const raid6_neonx ## _n = { \
raid6_neon ## _n ## _gen_syndrome, \
--
1.8.1.2
* [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts
From: Nicolas Pitre @ 2013-10-15 4:01 UTC
To: linux-arm-kernel
On Sun, 13 Oct 2013, Ard Biesheuvel wrote:
> Take #3 of this RFC series.
>
> Instead of having additional separate versions of kernel_neon_begin/end, the
> existing ones now have been modified to always take a preallocated stack area
> as an argument.
The problem with this approach is that you break git bisect by making
the kernel unbuildable when this series is partially applied. Either
you make kernel_neon_begin/end into wrappers with no argument around the
new interface, or you change all users at the same time as the
interface. One big principle is not to break the kernel build in the
middle of a patch series when altering an existing interface.
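Something like this, perhaps (untested sketch, names of my own choosing;
the !in_interrupt() path never touches the stack area, so passing NULL is
safe there):

    static inline void kernel_neon_begin_compat(void)
    {
            BUG_ON(in_interrupt());      /* old contract */
            __kernel_neon_begin(NULL, 0);
    }

    static inline void kernel_neon_end_compat(void)
    {
            __kernel_neon_end(NULL, 0);
    }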
> The stack area is allocated by DEFINE_NEON_REGSTACK[_PARTIAL](varname), where
> the partial version takes an additional int num_regs indicating how many
> registers need to be freed up.
>
> In the !in_interrupt() case, these functions operate as before, and the regstack
> is defined to minimal size in this case as it will remain unused anyway. In the
> in_interrupt() case, 'num_regs' (or all) NEON registers are stacked/unstacked
> using the allocated stack region.
Would have been nice to have the stack simply be a NULL pointer when
!in_interrupt() or when the number of regs is 0. This would remove the
need for a runtime check on !num_regs. I don't see an obvious way to
accomplish that right now though.
Nicolas
* [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts
From: Ard Biesheuvel @ 2013-10-15 13:13 UTC
To: linux-arm-kernel
On 15 October 2013 06:01, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Sun, 13 Oct 2013, Ard Biesheuvel wrote:
>
>> Instead of having additional separate versions of kernel_neon_begin/end, the
>> existing ones now have been modified to always take a preallocated stack area
>> as an argument.
>
> The problem with this approach is that you break git bisect by making
> the kernel unbuildable when this series is partially applied. Either
> you make kernel_neon_begin/end into wrappers with no argument around the
> new interface, or you change all users at the same time as the
> interface. One big principle is not to break the kernel build in the
> middle of a patch series when altering an existing interface.
>
I see.
>> The stack area is allocated by DEFINE_NEON_REGSTACK[_PARTIAL](varname), where
>> the partial version takes an additional int num_regs indicating how many
>> registers need to be freed up.
>>
>> In the !in_interrupt() case, these functions operate as before, and the regstack
>> is defined to minimal size in this case as it will remain unused anyway. In the
>> in_interrupt() case, 'num_regs' (or all) NEON registers are stacked/unstacked
>> using the allocated stack region.
>
> Would have been nice to have the stack simply be a NULL pointer when
> !in_interrupt() or when the number of regs is 0. This would remove the
> need for a runtime check on !num_regs. I don't see an obvious way to
> accomplish that right now though.
>
We could address both of these issues by implementing Catalin's
suggestion to reserve per-process vfp_states[] for both irq and
softirq context in addition to the ordinary one, but it would waste a
lot of space imo. What is your take on that?
--
Ard.
* [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts
From: Ard Biesheuvel @ 2013-10-15 14:06 UTC
To: linux-arm-kernel
On 15 October 2013 15:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 15 October 2013 06:01, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> On Sun, 13 Oct 2013, Ard Biesheuvel wrote:
>>
>>> Instead of having additional separate versions of kernel_neon_begin/end, the
>>> existing ones now have been modified to always take a preallocated stack area
>>> as an argument.
>>
>> The problem with this approach is that you break git bisect by making
>> the kernel unbuildable when this series is partially applied. Either
>> you make kernel_neon_begin/end into wrappers with no argument around the
>> new interface, or you change all users at the same time as the
>> interface. One big principle is not to break the kernel build in the
>> middle of a patch series when altering an existing interface.
>>
>
> I see.
>
>>> The stack area is allocated by DEFINE_NEON_REGSTACK[_PARTIAL](varname), where
>>> the partial version takes an additional int num_regs indicating how many
>>> registers need to be freed up.
>>>
>>> In the !in_interrupt() case, these functions operate as before, and the regstack
>>> is defined to minimal size in this case as it will remain unused anyway. In the
>>> in_interrupt() case, 'num_regs' (or all) NEON registers are stacked/unstacked
>>> using the allocated stack region.
>>
>> Would have been nice to have the stack simply be a NULL pointer when
>> !in_interrupt() or when the number of regs is 0. This would remove the
>> need for a runtime check on !num_regs. I don't see an obvious way to
>> accomplish that right now though.
>>
>
> We could address both of these issues by implementing Catalin's
> suggestion to reserve per-process vfp_states[] for both irq and
> softirq context in addition to the ordinary one, but it would waste a
> lot of space imo. What is your take on that?
>
Replying to self: two per-cpu vfp_states, one for irq and one for
softirq, is probably the best approach here. I still need to add
kernel_neon_begin_partial() in this case, but the existing users can
remain unmodified.
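Roughly (sketch of the direction, not final code):

    static DEFINE_PER_CPU(struct fpsimd_state, hardirq_fpsimdstate);
    static DEFINE_PER_CPU(struct fpsimd_state, softirq_fpsimdstate);

i.e. one spare register file per CPU per interrupt level, so full saves in
interrupt context no longer need a caller-supplied stack area.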
I will do a v4 by end of next week.
Regards,
Ard.
* [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts
From: Nicolas Pitre @ 2013-10-15 16:05 UTC
To: linux-arm-kernel
On Tue, 15 Oct 2013, Ard Biesheuvel wrote:
> On 15 October 2013 06:01, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Sun, 13 Oct 2013, Ard Biesheuvel wrote:
> >
> >> The stack area is allocated by DEFINE_NEON_REGSTACK[_PARTIAL](varname), where
> >> the partial version takes an additional int num_regs indicating how many
> >> registers need to be freed up.
> >>
> >> In the !in_interrupt() case, these functions operate as before, and the regstack
> >> is defined to minimal size in this case as it will remain unused anyway. In the
> >> in_interrupt() case, 'num_regs' (or all) NEON registers are stacked/unstacked
> >> using the allocated stack region.
> >
> > Would have been nice to have the stack simply be a NULL pointer when
> > !in_interrupt() or when the number of regs is 0. This would remove the
> > need for a runtime check on !num_regs. I don't see an obvious way to
> > accomplish that right now though.
> >
>
> We could address both of these issues by implementing Catalin's
> suggestion to reserve per-process vfp_states[] for both irq and
> softirq context in addition to the ordinary one, but it would waste a
> lot of space imo. What is your take on that?
I agree that this would be rather wasteful. I really like your current
approach of dynamically allocating just the right amount of space on the
stack. I'm not a big fan of statically allocated memory that is
seldom used.
What I meant by my suggestion was something like this:
#define kernel_neon_begin(p)                                          \
        __kernel_neon_begin(sizeof((p).qregs) ? &(p).regs : NULL,     \
                            sizeof((p).qregs)/16)
However, it seems gcc is not clever enough to optimize the stack usage
away at all in that case, which is worse than your current version. So
better forget about this suggestion.
Nicolas
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts
2013-10-15 16:05 ` Nicolas Pitre
@ 2013-10-15 16:53 ` Catalin Marinas
0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2013-10-15 16:53 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Oct 15, 2013 at 05:05:48PM +0100, Nicolas Pitre wrote:
> On Tue, 15 Oct 2013, Ard Biesheuvel wrote:
>
> > On 15 October 2013 06:01, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > On Sun, 13 Oct 2013, Ard Biesheuvel wrote:
> > >
> > >> The stack area is allocated by DEFINE_NEON_REGSTACK[_PARTIAL](varname), where
> > >> the partial version takes an additional int num_regs indicating how many
> > >> registers need to be freed up.
> > >>
> > >> In the !in_interrupt() case, these functions operate as before, and the regstack
> > >> is defined to minimal size in this case as it will remain unused anyway. In the
> > >> in_interrupt() case, 'num_regs' (or all) NEON registers are stacked/unstacked
> > >> using the allocated stack region.
> > >
> > > Would have been nice to have the stack simply be a NULL pointer when
> > > !in_interrupt() or when the number of regs is 0. This would remove the
> > > need for a runtime check on !num_regs. I don't see an obvious way to
> > > accomplish that right now though.
> > >
> >
> > We could address both of these issues by implementing Catalin's
> > suggestion to reserve per-process vfp_states[] for both irq and
> > softirq context in addition to the ordinary one, but it would waste a
> > lot of space imo. What is your take on that?
>
> I agree that this would be rather wasteful. I really like your current
> approach of dynamically allocating just the right amount of space on the
> stack. I'm not a big fan of statically allocated memory that is seldom
> used.
I agree here, especially since we need to cover both soft and hard irqs.
Two extra copies of the full register file come to about 1KB per CPU,
which is not noticeable even on big systems, but it still looks like
memory that would only be used rarely.
--
Catalin
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 1/7] ARM: add support for kernel mode NEON in atomic context
2013-10-13 12:14 ` [RFC v3 PATCH 1/7] ARM: add support for kernel mode NEON in atomic context Ard Biesheuvel
@ 2013-10-15 17:26 ` Catalin Marinas
2013-10-15 17:30 ` Ard Biesheuvel
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2013-10-15 17:26 UTC (permalink / raw)
To: linux-arm-kernel
On Sun, Oct 13, 2013 at 01:14:57PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h
> index 8f730fe..800d85c 100644
> --- a/arch/arm/include/asm/neon.h
> +++ b/arch/arm/include/asm/neon.h
> @@ -8,10 +8,30 @@
> * published by the Free Software Foundation.
> */
>
> +#include <linux/types.h>
> +#include <linux/hardirq.h>
> +#include <asm/fpstate.h>
> #include <asm/hwcap.h>
>
> #define cpu_has_neon() (!!(elf_hwcap & HWCAP_NEON))
>
> +/*
> + * Avoid wasting stack space by making the size of the allocated area depend on
> + * whether we are currently running in process context. (If this is the case, we
> + * will use the normal preserve/restore mechanism, leaving the allocated stack
> + * space unused.)
> + */
> +#define __QREG_SIZE(num) \
> + ((!in_interrupt()) ? 0 : (num) > 16 ? 256 : 16 * (((num) + 1) & ~1U))
> +
> +#define DEFINE_NEON_REGSTACK_PARTIAL(v, num) \
> + struct { \
> + struct vfp_partial_state regs; \
> + u8 qregs[__QREG_SIZE(num)]; \
> + } v
Oh, interesting gcc feature. What does it generate?
--
Catalin
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 1/7] ARM: add support for kernel mode NEON in atomic context
2013-10-15 17:26 ` Catalin Marinas
@ 2013-10-15 17:30 ` Ard Biesheuvel
2013-10-15 17:46 ` Catalin Marinas
0 siblings, 1 reply; 19+ messages in thread
From: Ard Biesheuvel @ 2013-10-15 17:30 UTC (permalink / raw)
To: linux-arm-kernel
On 15 October 2013 19:26, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Sun, Oct 13, 2013 at 01:14:57PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h
>> index 8f730fe..800d85c 100644
>> --- a/arch/arm/include/asm/neon.h
>> +++ b/arch/arm/include/asm/neon.h
>> @@ -8,10 +8,30 @@
>> * published by the Free Software Foundation.
>> */
>>
>> +#include <linux/types.h>
>> +#include <linux/hardirq.h>
>> +#include <asm/fpstate.h>
>> #include <asm/hwcap.h>
>>
>> #define cpu_has_neon() (!!(elf_hwcap & HWCAP_NEON))
>>
>> +/*
>> + * Avoid wasting stack space by making the size of the allocated area depend on
>> + * whether we are currently running in process context. (If this is the case, we
>> + * will use the normal preserve/restore mechanism, leaving the allocated stack
>> + * space unused.)
>> + */
>> +#define __QREG_SIZE(num) \
>> + ((!in_interrupt()) ? 0 : (num) > 16 ? 256 : 16 * (((num) + 1) & ~1U))
>> +
>> +#define DEFINE_NEON_REGSTACK_PARTIAL(v, num) \
>> + struct { \
>> + struct vfp_partial_state regs; \
>> + u8 qregs[__QREG_SIZE(num)]; \
>> + } v
>
> Oh, interesting gcc feature. What does it generate?
>
Well, it's not a feature particular to GCC, as far as I am aware. The
anonymous struct is just variably sized at run time, depending on
in_interrupt() and the requested number of registers.
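For reference, the same construct in a stand-alone program (this demo
is only for illustration; strictly speaking, a variably sized struct
member is a gcc extension built on C99 VLAs):

#include <stdio.h>

static void demo(int n)
{
	/* the struct's size depends on n, which is only known at run time */
	struct {
		int hdr;
		unsigned char buf[n * 16];	/* variably sized member */
	} v;

	/* sizeof on a variably modified type is evaluated at run time */
	printf("n=%d sizeof(v)=%zu\n", n, sizeof v);
}

int main(void)
{
	demo(0);	/* minimal size when no registers are requested */
	demo(8);
	return 0;
}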
--
Ard.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 1/7] ARM: add support for kernel mode NEON in atomic context
2013-10-15 17:30 ` Ard Biesheuvel
@ 2013-10-15 17:46 ` Catalin Marinas
0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2013-10-15 17:46 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Oct 15, 2013 at 06:30:50PM +0100, Ard Biesheuvel wrote:
> On 15 October 2013 19:26, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Sun, Oct 13, 2013 at 01:14:57PM +0100, Ard Biesheuvel wrote:
> >> diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h
> >> index 8f730fe..800d85c 100644
> >> --- a/arch/arm/include/asm/neon.h
> >> +++ b/arch/arm/include/asm/neon.h
> >> @@ -8,10 +8,30 @@
> >> * published by the Free Software Foundation.
> >> */
> >>
> >> +#include <linux/types.h>
> >> +#include <linux/hardirq.h>
> >> +#include <asm/fpstate.h>
> >> #include <asm/hwcap.h>
> >>
> >> #define cpu_has_neon() (!!(elf_hwcap & HWCAP_NEON))
> >>
> >> +/*
> >> + * Avoid wasting stack space by making the size of the allocated area depend on
> >> + * whether we are currently running in process context. (If this is the case, we
> >> + * will use the normal preserve/restore mechanism, leaving the allocated stack
> >> + * space unused.)
> >> + */
> >> +#define __QREG_SIZE(num) \
> >> + ((!in_interrupt()) ? 0 : (num) > 16 ? 256 : 16 * (((num) + 1) & ~1U))
> >> +
> >> +#define DEFINE_NEON_REGSTACK_PARTIAL(v, num) \
> >> + struct { \
> >> + struct vfp_partial_state regs; \
> >> + u8 qregs[__QREG_SIZE(num)]; \
> >> + } v
> >
> > Oh, interesting gcc feature. What does it generate?
> >
>
> Well, it's not a feature particular to GCC, as far as I am aware. The
> anonymous struct is just variably sized at run time, depending on
> in_interrupt() and the requested number of registers.
OK, it looks like it's valid C99. I was worried the compiler might
generate something like an alloca() library call.
--
Catalin
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume
2013-10-13 12:14 ` [RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
@ 2013-10-28 18:12 ` Catalin Marinas
2013-10-28 20:32 ` Ard Biesheuvel
0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2013-10-28 18:12 UTC (permalink / raw)
To: linux-arm-kernel
On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
> void fpsimd_thread_switch(struct task_struct *next)
> {
> /* check if not kernel threads */
> - if (current->mm)
> + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
> fpsimd_save_state(&current->thread.fpsimd_state);
Why does it need test_and_set_thread_flag() here? Some comments would be
useful as it looks strange to check a reload flag to decide whether to
save a state. Or change the name to something like 'dirty'.
> if (next->mm)
> fpsimd_load_state(&next->thread.fpsimd_state);
This function could be optimised a bit more to avoid saving/restoring if
the switch only happened between a user thread and a kernel one (and
back again) since the FP state may not have been dirtied. But what I had
in mind was per-CPU fpstate (possibly pointer or some flag) rather than
per-thread.
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> clear_thread_flag(TIF_NOTIFY_RESUME);
> tracehook_notify_resume(regs);
> }
> + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
> + fpsimd_load_state(&current->thread.fpsimd_state);
I think this code can be preempted, as it is run with IRQs enabled. And
there is a small window where we cleared the flag but haven't loaded the
state.
--
Catalin
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume
2013-10-28 18:12 ` Catalin Marinas
@ 2013-10-28 20:32 ` Ard Biesheuvel
2013-10-28 22:29 ` Catalin Marinas
0 siblings, 1 reply; 19+ messages in thread
From: Ard Biesheuvel @ 2013-10-28 20:32 UTC (permalink / raw)
To: linux-arm-kernel
On 28 October 2013 11:12, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>> void fpsimd_thread_switch(struct task_struct *next)
>> {
>> /* check if not kernel threads */
>> - if (current->mm)
>> + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>> fpsimd_save_state(&current->thread.fpsimd_state);
>
> Why does it need test_and_set_thread_flag() here? Some comments would be
> useful as it looks strange to check a reload flag to decide whether to
> save a state. Or change the name to something like 'dirty'.
>
Actually, it's test and clear. If the userland register file has
already been preserved for the purpose of performing kernel mode NEON,
it should not be saved again when the task gets scheduled out. The
clearing could also be deferred to the time when the task gets
scheduled in again. Or perhaps, it would be even better to always
defer loading the userland state for the next task when that task in
fact enters userland.
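In other words, the intended lifecycle is roughly the following (the
kernel_neon_begin() body is my paraphrase of the patch, not the literal
code):

void kernel_neon_begin(void)	/* task context */
{
	/* preserve the userland registers once, up front ... */
	fpsimd_save_state(&current->thread.fpsimd_state);
	/* ... and mark the saved copy as the authoritative one */
	set_thread_flag(TIF_RELOAD_FPSTATE);
}

void fpsimd_thread_switch(struct task_struct *next)
{
	/*
	 * If the flag is set, the saved copy is already up to date and
	 * the live registers only hold kernel mode NEON scratch data,
	 * so do not overwrite the saved state with them.
	 */
	if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
		fpsimd_save_state(&current->thread.fpsimd_state);
	if (next->mm)
		fpsimd_load_state(&next->thread.fpsimd_state);
}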
>> if (next->mm)
>> fpsimd_load_state(&next->thread.fpsimd_state);
>
> This function could be optimised a bit more to avoid saving/restoring if
> the switch only happened between a user thread and a kernel one (and
> back again) since the FP state may not have been dirtied. But what I had
> in mind was per-CPU fpstate (possibly pointer or some flag) rather than
> per-thread.
>
Well, then we are entering the realm of lazy restore, imo. There were
some patches proposed for that already, I think? But I do agree that
at this point, there is no need to restore the userland register
contents yet; it can be deferred to the point when the task reenters
userland (as mentioned above).
>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>> clear_thread_flag(TIF_NOTIFY_RESUME);
>> tracehook_notify_resume(regs);
>> }
>> + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>> + fpsimd_load_state(&current->thread.fpsimd_state);
>
> I think this code can be preempted, as it is run with IRQs enabled. And
> there is a small window where we cleared the flag but haven't loaded the
> state.
>
If we are preempted at this point, the fpstate will be loaded in the
normal way the next time this task runs, so I think this is harmless.
Although I guess we may be restoring the fp state twice in that case?
So in summary, what I need to do (sketched below) is:
- rework to use a per_cpu flag rather than a TIF;
- preserve the userland state (if it has one) when a task gets scheduled out;
- restore the userland state when a task enters userland;
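Roughly along these lines (names invented here, to be refined):

/* which task's FPSIMD state the registers of this CPU currently hold */
static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_owner);

/* scheduled out: preserve userland state only if the live regs are ours */
static void fpsimd_sched_out(struct task_struct *tsk)
{
	if (tsk->mm &&
	    this_cpu_read(fpsimd_owner) == &tsk->thread.fpsimd_state)
		fpsimd_save_state(&tsk->thread.fpsimd_state);
}

/* entering userland: reload only if this CPU holds someone else's state */
static void fpsimd_enter_user(struct task_struct *tsk)
{
	if (this_cpu_read(fpsimd_owner) != &tsk->thread.fpsimd_state) {
		fpsimd_load_state(&tsk->thread.fpsimd_state);
		this_cpu_write(fpsimd_owner, &tsk->thread.fpsimd_state);
	}
}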
I will propose an updated patch after I wrap up the work on the
in_interrupt() kernel mode NEON, as these topics are really orthogonal
and there is no reason to keep them combined in a single series.
--
Ard.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume
2013-10-28 20:32 ` Ard Biesheuvel
@ 2013-10-28 22:29 ` Catalin Marinas
0 siblings, 0 replies; 19+ messages in thread
From: Catalin Marinas @ 2013-10-28 22:29 UTC (permalink / raw)
To: linux-arm-kernel
On 28 Oct 2013, at 20:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 28 October 2013 11:12, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
>>> --- a/arch/arm64/kernel/fpsimd.c
>>> +++ b/arch/arm64/kernel/fpsimd.c
>>> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>>> void fpsimd_thread_switch(struct task_struct *next)
>>> {
>>> /* check if not kernel threads */
>>> - if (current->mm)
>>> + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>> fpsimd_save_state(&current->thread.fpsimd_state);
>>
>> Why does it need test_and_set_thread_flag() here? Some comments would be
>> useful as it looks strange to check a reload flag to decide whether to
>> save a state. Or change the name to something like 'dirty'.
>
> Actually, it's test and clear.
Yes, just a typo.
> If the userland register file has
> already been preserved for the purpose of performing kernel mode NEON,
> it should not be saved again when the task gets scheduled out. The
> clearing could also be deferred to the time when the task gets
> scheduled in again.
The above should be turned into a comment in the code.
> Or perhaps, it would be even better to always
> defer loading the userland state for the next task when that task in
> fact enters userland.
It needs some more thinking; it's usually cleaner in the context
switching code.
>>> if (next->mm)
>>> fpsimd_load_state(&next->thread.fpsimd_state);
>>
>> This function could be optimised a bit more to avoid saving/restoring if
>> the switch only happened between a user thread and a kernel one (and
>> back again) since the FP state may not have been dirtied. But what I had
>> in mind was per-CPU fpstate (possibly pointer or some flag) rather than
>> per-thread.
>
> Well, then we are entering the realm of lazy restore, imo. There were
> some patches proposed for that already, I think? But I do agree that
> at this point, there is no need to restore the userland register
> contents yet; it can be deferred to the point when the task reenters
> userland (as mentioned above).
Not entirely lazy. What I don't want to see (without proper benchmarks)
is disabling the FP at context switch and restoring the registers lazily
via the fault mechanism. What I'm proposing above is not lazy, just an
optimisation for a clear case where the FP is not used in a kernel
thread.
>>> --- a/arch/arm64/kernel/signal.c
>>> +++ b/arch/arm64/kernel/signal.c
>>> @@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>>> clear_thread_flag(TIF_NOTIFY_RESUME);
>>> tracehook_notify_resume(regs);
>>> }
>>> + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>> + fpsimd_load_state(&current->thread.fpsimd_state);
>>
>> I think this code can be preempted, as it is run with IRQs enabled. And
>> there is a small window where we cleared the flag but haven't loaded the
>> state.
>
> If we are preempted at this point, the fpstate will be loaded in the
> normal way the next time this task runs, so I think this is harmless.
> Although I guess we may be restoring the fp state twice in that case?
Let's say task A does an svc and gets into kernel mode, followed by
kernel_neon_begin/end(). Before returning to user, the kernel runs the
above test_and_clear_thread_flag(). Immediately after, an interrupt
happens and task A is preempted. The fpsimd_thread_switch() function
finds that TIF_RELOAD_FPSTATE is cleared and saves the current FP state.
But the FP regs contain whatever the kernel neon code did, so you
corrupt the existing data.
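Spelled out as a timeline:

  do_notify_resume():
      test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)  -> flag now clear
                  <-- IRQ fires here, task A is preempted
  fpsimd_thread_switch():
      current->mm != NULL and the flag is already clear, so
      fpsimd_save_state(&current->thread.fpsimd_state)
          -> overwrites the good saved copy with the kernel mode NEON
             scratch values still in the registers; user state is lost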
> So in summary, what I need to do is:
> - rework to use a per_cpu flag rather than a TIF;
> - preserve the userland state (if it has one) when a task gets scheduled out;
> - restore the userland state when a task enters userland;
Happy to discuss the algorithm before you code it (unless you prefer to
write the code quickly).
> I will propose an updated patch after I wrap up the work on the
> in_interrupt() kernel mode NEON, as these topics are really orthogonal
> and there is no reason to keep them combined in a single series.
Sounds fine.
Catalin
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2013-10-28 22:29 UTC | newest]
Thread overview: 19+ messages
2013-10-13 12:14 [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts Ard Biesheuvel
2013-10-13 12:14 ` [RFC v3 PATCH 1/7] ARM: add support for kernel mode NEON in atomic context Ard Biesheuvel
2013-10-15 17:26 ` Catalin Marinas
2013-10-15 17:30 ` Ard Biesheuvel
2013-10-15 17:46 ` Catalin Marinas
2013-10-13 12:14 ` [RFC v3 PATCH 2/7] ARM: port NEON version of xor_blocks() to new kmode NEON api Ard Biesheuvel
2013-10-13 12:14 ` [RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
2013-10-28 18:12 ` Catalin Marinas
2013-10-28 20:32 ` Ard Biesheuvel
2013-10-28 22:29 ` Catalin Marinas
2013-10-13 12:15 ` [RFC v3 PATCH 4/7] ARM64: add support for kernel mode NEON in atomic context Ard Biesheuvel
2013-10-13 12:15 ` [RFC v3 PATCH 5/7] ARM64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
2013-10-13 12:15 ` [RFC v3 PATCH 6/7] ARM64: add Crypto Extensions based synchronous AES in CCM mode Ard Biesheuvel
2013-10-13 12:15 ` [RFC v3 PATCH 7/7] lib/raid6: port NEON implementation to updated kmode NEON api Ard Biesheuvel
2013-10-15 4:01 ` [RFC v3 PATCH 0/7] ARM[64]: kernel mode NEON in atomic contexts Nicolas Pitre
2013-10-15 13:13 ` Ard Biesheuvel
2013-10-15 14:06 ` Ard Biesheuvel
2013-10-15 16:05 ` Nicolas Pitre
2013-10-15 16:53 ` Catalin Marinas