linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts
@ 2013-10-07 12:12 Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

I am probably going to be flamed for bringing this up, but here goes ...

This is more a request for discussion than a request for comments on these
patches.

After floating point and SIMD we now have a third class of instructions that use
the NEON register file, the AES and SHA instructions that are present in the v8
Crypto Extensions.

This series uses CCMP as an example to make the case for limited support for
using the NEON register file in atomic context. CCMP is the encryption standard
used in WPA2; it is based on AES in CCM mode, which performs both encryption and
authentication by passing all the data through AES twice.
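
To illustrate the structure of the mode, here is a minimal C sketch of CCM
encryption; aes_encrypt_block() is just a placeholder for any single-block AES
primitive, and the B_0/counter formatting details of RFC 3610 are omitted:

#include <stddef.h>
#include <stdint.h>

/* placeholder for any single-block AES encryption primitive */
void aes_encrypt_block(const uint8_t key[16], const uint8_t in[16],
		       uint8_t out[16]);

/*
 * Simplified CCM encryption: every block of the message goes through AES
 * twice, once for the CBC-MAC (authentication) and once in CTR mode
 * (encryption), which is what makes a 2-way interleave map onto it so
 * naturally.
 */
static void ccm_encrypt_sketch(const uint8_t key[16], const uint8_t b0[16],
			       uint8_t ctr[16], uint8_t *msg, size_t len,
			       uint8_t mac[16])
{
	uint8_t ks[16];
	size_t i, n;

	aes_encrypt_block(key, b0, mac);	/* MAC <- E(K, B_0) */

	while (len) {
		n = len < 16 ? len : 16;

		/* authentication pass: CBC-MAC over the plaintext */
		for (i = 0; i < n; i++)
			mac[i] ^= msg[i];
		aes_encrypt_block(key, mac, mac);

		/* encryption pass: XOR with the encrypted counter block */
		ctr[15]++;			/* carry handling omitted */
		aes_encrypt_block(key, ctr, ks);
		for (i = 0; i < n; i++)
			msg[i] ^= ks[i];

		msg += n;
		len -= n;
	}
	/* finally, the MAC itself is encrypted with counter value 0 (omitted) */
}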

The mac80211 layer, which performs this encryption and decryption, does so in a
context that does not allow the use of asynchronous ciphers. In practice this
means it falls back to the generic C implementation (on ARM64), which I expect
to be around an order of magnitude slower than the dedicated instructions(*).

I have included two ways of working around this: patch #3 implements the core
AES cipher using only registers q0 and q1. Patch #4 implements the CCM chaining
mode using registers q0 - q3. (The significance of the latter is that I expect a
certain degree of interleaving to be required to run the AES instructions at
full speed, and CCM, while difficult to parallelize, can easily be implemented
with a 2-way interleave of the encryption and authentication parts.)

Patch #1 implements the stacking of 4 NEON registers (but note that patch #3
only needs 2 registers). Patch #2 implements emulation of the AES instructions
(considering how few of us have access to the Fast Model plugin). Patch #5
modifies the mac80211 code so it relies on the crypto API to supply a CCM
implementation rather than cooking up its own (the latter change is
compile-tested only and included for reference).

* On ARM, we have the C implementation which runs in ~64 cycles per round and
  an accelerated synchronous implementation which runs in ~32 cycles per round
  (on Cortex-A15), but the latter relies heavily on the barrel shifter so its
  performance is difficult to extrapolate to ARMv8. It should also be noted that
  the table-based C implementation uses 16 kB of lookup tables (8 kB each way).


Ard Biesheuvel (5):
  ARM64: allow limited use of some NEON registers in exceptions
  ARM64: add quick-n-dirty emulation for AES instructions
  ARM64: add Crypto Extensions based synchronous core AES cipher
  ARM64: add Crypto Extensions based synchronous AES in CCM mode
  mac80211: Use CCM crypto driver for CCMP

 arch/arm64/Kconfig              |  14 ++
 arch/arm64/Makefile             |   1 +
 arch/arm64/crypto/Makefile      |  16 ++
 arch/arm64/crypto/aes-sync.c    | 410 ++++++++++++++++++++++++++++++++++++++++
 arch/arm64/crypto/aesce-ccm.S   | 159 ++++++++++++++++
 arch/arm64/crypto/aesce-emu.c   | 221 ++++++++++++++++++++++
 arch/arm64/include/asm/ptrace.h |   3 +
 arch/arm64/include/asm/traps.h  |  10 +
 arch/arm64/kernel/asm-offsets.c |   3 +
 arch/arm64/kernel/entry.S       |  12 +-
 arch/arm64/kernel/traps.c       |  49 +++++
 net/mac80211/Kconfig            |   1 +
 net/mac80211/aes_ccm.c          | 159 +++++-----------
 net/mac80211/aes_ccm.h          |   8 +-
 net/mac80211/key.h              |   2 +-
 net/mac80211/wpa.c              |  21 +-
 16 files changed, 961 insertions(+), 128 deletions(-)
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/aes-sync.c
 create mode 100644 arch/arm64/crypto/aesce-ccm.S
 create mode 100644 arch/arm64/crypto/aesce-emu.c

-- 
1.8.1.2

* [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions
  2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
  2013-10-07 22:03   ` Nicolas Pitre
  2013-10-07 12:12 ` [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions Ard Biesheuvel
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

Stack/unstack the bottom 4 NEON registers on exception entry/exit
so we can use them in places where we are not allowed to sleep.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig              | 14 ++++++++++++++
 arch/arm64/include/asm/ptrace.h |  3 +++
 arch/arm64/kernel/asm-offsets.c |  3 +++
 arch/arm64/kernel/entry.S       |  8 ++++++++
 4 files changed, 28 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c044548..b97a458 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -98,6 +98,9 @@ config IOMMU_HELPER
 config KERNEL_MODE_NEON
 	def_bool y
 
+config STACK_NEON_REGS_ON_EXCEPTION
+	def_bool n
+
 source "init/Kconfig"
 
 source "kernel/Kconfig.freezer"
@@ -219,6 +222,17 @@ config FORCE_MAX_ZONEORDER
 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
 	default "11"
 
+config KERNEL_MODE_SYNC_CE_CRYPTO
+	bool "Support for synchronous crypto ciphers using Crypto Extensions"
+	depends on KERNEL_MODE_NEON
+	select STACK_NEON_REGS_ON_EXCEPTION
+	help
+	  This enables support for using ARMv8 Crypto Extensions instructions
+	  in places where sleeping is not allowed. The synchronous ciphers are
+	  only allowed to use the bottom 4 NEON registers q0 - q3, as stacking
+	  the entire NEON register file at every exception is too costly.
+
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 0dacbbf..17ea483 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -103,6 +103,9 @@ struct pt_regs {
 	};
 	u64 orig_x0;
 	u64 syscallno;
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+	struct { u64 l, h; } qregs[4];
+#endif
 };
 
 #define arch_has_single_step()	(1)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 666e231..73c944a 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -58,6 +58,9 @@ int main(void)
   DEFINE(S_PC,			offsetof(struct pt_regs, pc));
   DEFINE(S_ORIG_X0,		offsetof(struct pt_regs, orig_x0));
   DEFINE(S_SYSCALLNO,		offsetof(struct pt_regs, syscallno));
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+  DEFINE(S_QREGS,		offsetof(struct pt_regs, qregs));
+#endif
   DEFINE(S_FRAME_SIZE,		sizeof(struct pt_regs));
   BLANK();
   DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id));
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 3881fd1..c74dcca 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -58,6 +58,10 @@
 	push	x4, x5
 	push	x2, x3
 	push	x0, x1
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+	add	x21, sp, #S_QREGS
+	st1	{v0.16b-v3.16b}, [x21]
+#endif
 	.if	\el == 0
 	mrs	x21, sp_el0
 	.else
@@ -86,6 +90,10 @@
 	.endm
 
 	.macro	kernel_exit, el, ret = 0
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+	add	x21, sp, #S_QREGS
+	ld1	{v0.16b-v3.16b}, [x21]
+#endif
 	ldp	x21, x22, [sp, #S_PC]		// load ELR, SPSR
 	.if	\el == 0
 	ldr	x23, [sp, #S_SP]		// load return stack pointer
-- 
1.8.1.2

* [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions
  2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
  2013-10-07 16:18   ` Catalin Marinas
  2013-10-07 12:12 ` [RFC PATCH 3/5] ARM64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Makefile            |   1 +
 arch/arm64/crypto/Makefile     |  11 ++
 arch/arm64/crypto/aesce-emu.c  | 221 +++++++++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/traps.h |  10 ++
 arch/arm64/kernel/entry.S      |   4 +-
 arch/arm64/kernel/traps.c      |  49 +++++++++
 6 files changed, 295 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/aesce-emu.c

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index d90cf79..c864bb5 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -39,6 +39,7 @@ export	TEXT_OFFSET GZFLAGS
 core-y		+= arch/arm64/kernel/ arch/arm64/mm/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
 libs-y		:= arch/arm64/lib/ $(libs-y)
 libs-y		+= $(LIBGCC)
 
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 0000000..f87ec80
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,11 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-y += aesce-emu.o
diff --git a/arch/arm64/crypto/aesce-emu.c b/arch/arm64/crypto/aesce-emu.c
new file mode 100644
index 0000000..4cc7ee9
--- /dev/null
+++ b/arch/arm64/crypto/aesce-emu.c
@@ -0,0 +1,221 @@
+/*
+ * aesce-emu-c - emulate aese/aesd/aesmc/aesimc instructions
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/printk.h>
+#include <linux/ptrace.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <asm/traps.h>
+
+union AES_STATE {
+	u8	bytes[16];
+	u64	l[2];
+} __aligned(8);
+
+static void add_sub_shift(union AES_STATE *st, union AES_STATE *rk, int inv);
+static void mix_columns(union AES_STATE *out, union AES_STATE *in);
+static void inv_mix_columns_pre(union AES_STATE *out);
+
+#define REG_ACCESS(op, r, mem) \
+	do { case r: asm(#op " {v" #r ".16b}, [%0]" : : "r"(mem)); goto out; \
+	} while (0)
+
+#define REG_SWITCH(reg, op, m) do { switch (reg) { \
+	REG_ACCESS(op,  0, m);	REG_ACCESS(op,  1, m);	REG_ACCESS(op,  2, m); \
+	REG_ACCESS(op,  3, m);	REG_ACCESS(op,  4, m);	REG_ACCESS(op,  5, m); \
+	REG_ACCESS(op,  6, m);	REG_ACCESS(op,  7, m);	REG_ACCESS(op,  8, m); \
+	REG_ACCESS(op,  9, m);	REG_ACCESS(op, 10, m);	REG_ACCESS(op, 11, m); \
+	REG_ACCESS(op, 12, m);	REG_ACCESS(op, 13, m);	REG_ACCESS(op, 14, m); \
+	REG_ACCESS(op, 15, m);	REG_ACCESS(op, 16, m);	REG_ACCESS(op, 17, m); \
+	REG_ACCESS(op, 18, m);	REG_ACCESS(op, 19, m);	REG_ACCESS(op, 20, m); \
+	REG_ACCESS(op, 21, m);	REG_ACCESS(op, 22, m);	REG_ACCESS(op, 23, m); \
+	REG_ACCESS(op, 24, m);	REG_ACCESS(op, 25, m);	REG_ACCESS(op, 26, m); \
+	REG_ACCESS(op, 27, m);	REG_ACCESS(op, 28, m);	REG_ACCESS(op, 29, m); \
+	REG_ACCESS(op, 30, m);	REG_ACCESS(op, 31, m); \
+	} out:; } while (0)
+
+static void load_neon_reg(union AES_STATE *st, int reg)
+{
+	REG_SWITCH(reg, st1, st->bytes);
+}
+
+static void save_neon_reg(union AES_STATE *st, int reg, struct pt_regs *regs)
+{
+	REG_SWITCH(reg, ld1, st->bytes);
+
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+	if (reg < 4)
+		/* update the stacked reg as well */
+		memcpy((u8 *)&regs->qregs[reg], st->bytes, 16);
+#endif
+}
+
+static void aesce_do_emulate(unsigned int instr, struct pt_regs *regs)
+{
+	enum { AESE, AESD, AESMC, AESIMC } kind = (instr >> 12) & 3;
+	int rn = (instr >> 5) & 0x1f;
+	int rd = instr & 0x1f;
+	union AES_STATE in, out;
+
+	load_neon_reg(&in, rn);
+
+	switch (kind) {
+	case AESE:
+	case AESD:
+		load_neon_reg(&out, rd);
+		add_sub_shift(&out, &in, kind);
+		break;
+	case AESIMC:
+		inv_mix_columns_pre(&in);
+	case AESMC:
+		mix_columns(&out, &in);
+	}
+	save_neon_reg(&out, rd, regs);
+}
+
+static int aesce_emu_instr(struct pt_regs *regs, unsigned int instr);
+
+static struct undef_hook aesce_emu_uh = {
+	.instr_val	= 0x4e284800,
+	.instr_mask	= 0xffffcc00,
+	.fn		= aesce_emu_instr,
+};
+
+static int aesce_emu_instr(struct pt_regs *regs, unsigned int instr)
+{
+	do {
+		aesce_do_emulate(instr, regs);
+		regs->pc += 4;
+		get_user(instr, (u32 __user *)regs->pc);
+	} while ((instr & aesce_emu_uh.instr_mask) == aesce_emu_uh.instr_val);
+
+	return 0;
+}
+
+static int aesce_emu_init(void)
+{
+	register_undef_hook(&aesce_emu_uh);
+	return 0;
+}
+
+arch_initcall(aesce_emu_init);
+
+#define gf8_mul_x(a) \
+	(((a) << 1) ^ (((a) & 0x80) ? 0x1b : 0))
+
+static void mix_columns(union AES_STATE *out, union AES_STATE *in)
+{
+	int i;
+
+	for (i = 0; i < 16; i++)
+		out->bytes[i] =
+			gf8_mul_x(in->bytes[i]) ^
+			gf8_mul_x(in->bytes[((i + 1) % 4) | (i & ~3)]) ^
+				in->bytes[((i + 1) % 4) | (i & ~3)] ^
+				in->bytes[((i + 2) % 4) | (i & ~3)] ^
+				in->bytes[((i + 3) % 4) | (i & ~3)];
+}
+
+#define gf8_mul_x2(a) \
+	(((a) << 2) ^ (((a) & 0x80) ? 0x36 : 0) ^ (((a) & 0x40) ? 0x1b : 0))
+
+static void inv_mix_columns_pre(union AES_STATE *out)
+{
+	union AES_STATE in = *out;
+	int i;
+
+	for (i = 0; i < 16; i++)
+		out->bytes[i] = gf8_mul_x2(in.bytes[i]) ^ in.bytes[i] ^
+				gf8_mul_x2(in.bytes[i ^ 2]);
+}
+
+static void add_sub_shift(union AES_STATE *st, union AES_STATE *rk, int inv)
+{
+	static u8 const sbox[][256] = { {
+		0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
+		0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
+		0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+		0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
+		0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,
+		0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+		0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,
+		0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
+		0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+		0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
+		0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
+		0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+		0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,
+		0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
+		0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+		0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
+		0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,
+		0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+		0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,
+		0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
+		0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+		0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
+		0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,
+		0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+		0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,
+		0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
+		0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+		0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
+		0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,
+		0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+		0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,
+		0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16
+	}, {
+		0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38,
+		0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
+		0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87,
+		0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
+		0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d,
+		0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
+		0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2,
+		0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
+		0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16,
+		0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
+		0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda,
+		0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
+		0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a,
+		0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
+		0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02,
+		0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
+		0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea,
+		0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
+		0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85,
+		0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
+		0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89,
+		0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
+		0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20,
+		0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
+		0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31,
+		0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
+		0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d,
+		0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
+		0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0,
+		0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
+		0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26,
+		0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
+	} };
+	static u8 const permute[][16] = { {
+		0,  5, 10, 15, 4, 9, 14,  3, 8, 13, 2,  7, 12, 1, 6, 11
+	}, {
+		0, 13, 10,  7, 4, 1, 14, 11, 8,  5, 2, 15, 12, 9, 6,  3
+	} };
+	int i;
+
+	rk->l[0] ^= st->l[0];
+	rk->l[1] ^= st->l[1];
+
+	for (i = 0; i < 16; i++)
+		st->bytes[i] = sbox[inv][rk->bytes[permute[inv][i]]];
+}
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 10ca8ff..781e50cb2 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -27,4 +27,14 @@ static inline int in_exception_text(unsigned long ptr)
 	       ptr < (unsigned long)&__exception_text_end;
 }
 
+struct undef_hook {
+	struct list_head node;
+	u32 instr_mask;
+	u32 instr_val;
+	int (*fn)(struct pt_regs *regs, unsigned int instr);
+};
+
+void register_undef_hook(struct undef_hook *hook);
+void unregister_undef_hook(struct undef_hook *hook);
+
 #endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index c74dcca..e4d89df 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -291,7 +291,9 @@ el1_undef:
 	 * Undefined instruction
 	 */
 	mov	x0, sp
-	b	do_undefinstr
+	bl	do_undefinstr
+
+	kernel_exit 1
 el1_dbg:
 	/*
 	 * Debug exception handling
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 7ffaddd..3cc4c91 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -257,11 +257,60 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,
 		die(str, regs, err);
 }
 
+static LIST_HEAD(undef_hook);
+static DEFINE_RAW_SPINLOCK(undef_lock);
+
+void register_undef_hook(struct undef_hook *hook)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&undef_lock, flags);
+	list_add(&hook->node, &undef_hook);
+	raw_spin_unlock_irqrestore(&undef_lock, flags);
+}
+
+void unregister_undef_hook(struct undef_hook *hook)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&undef_lock, flags);
+	list_del(&hook->node);
+	raw_spin_unlock_irqrestore(&undef_lock, flags);
+}
+
+static int call_undef_hook(struct pt_regs *regs, void __user *pc)
+{
+	struct undef_hook *hook;
+	unsigned long flags;
+	int (*fn)(struct pt_regs *regs, unsigned int instr) = NULL;
+	unsigned int instr;
+	mm_segment_t fs;
+	int ret;
+
+	fs = get_fs();
+	set_fs(KERNEL_DS);
+
+	get_user(instr, (u32 __user *)pc);
+
+	raw_spin_lock_irqsave(&undef_lock, flags);
+	list_for_each_entry(hook, &undef_hook, node)
+		if ((instr & hook->instr_mask) == hook->instr_val)
+			fn = hook->fn;
+	raw_spin_unlock_irqrestore(&undef_lock, flags);
+
+	ret = fn ? fn(regs, instr) : 1;
+	set_fs(fs);
+	return ret;
+}
+
 asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 {
 	siginfo_t info;
 	void __user *pc = (void __user *)instruction_pointer(regs);
 
+	if (call_undef_hook(regs, pc) == 0)
+		return;
+
 	/* check for AArch32 breakpoint instructions */
 	if (!aarch32_break_handler(regs))
 		return;
-- 
1.8.1.2

* [RFC PATCH 3/5] ARM64: add Crypto Extensions based synchronous core AES cipher
  2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 4/5] ARM64: add Crypto Extensions based synchronous AES in CCM mode Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 5/5] mac80211: Use CCM crypto driver for CCMP Ard Biesheuvel
  4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

This implements the core AES cipher using the Crypto Extensions,
using only NEON registers q0 and q1.
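
For context, a caller would drive this through the regular synchronous
single-block cipher interface, along these lines (illustrative sketch only,
not part of the patch):

#include <crypto/aes.h>
#include <linux/crypto.h>
#include <linux/err.h>

/*
 * Encrypt a single block using whatever "aes" implementation the crypto API
 * selects; the cipher registered below ("aes-ce", priority 300) would
 * normally win. Allocation (and typically setkey) happens in process
 * context; only the encrypt_one() call itself is expected to be atomic-safe.
 */
static int aes_one_block(const u8 *key, unsigned int key_len,
			 const u8 *in, u8 *out)
{
	struct crypto_cipher *tfm;
	int err;

	tfm = crypto_alloc_cipher("aes", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_cipher_setkey(tfm, key, key_len);
	if (!err)
		crypto_cipher_encrypt_one(tfm, out, in);

	crypto_free_cipher(tfm);
	return err;
}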

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Makefile   |  5 +++
 arch/arm64/crypto/aes-sync.c | 95 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)
 create mode 100644 arch/arm64/crypto/aes-sync.c

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index f87ec80..e598c0a 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -9,3 +9,8 @@
 #
 
 obj-y += aesce-emu.o
+
+ifeq ($(CONFIG_KERNEL_MODE_SYNC_CE_CRYPTO),y)
+aesce-sync-y	:= aes-sync.o
+obj-m		+= aesce-sync.o
+endif
diff --git a/arch/arm64/crypto/aes-sync.c b/arch/arm64/crypto/aes-sync.c
new file mode 100644
index 0000000..5c5d641
--- /dev/null
+++ b/arch/arm64/crypto/aes-sync.c
@@ -0,0 +1,95 @@
+/*
+ * linux/arch/arm64/crypto/aes-sync.c
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/aes.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	int rounds = 6 + ctx->key_length / 4;
+
+	__asm__("	.arch		armv8-a+crypto			\n\t"
+		"	ld1		{v0.16b}, [%[in]]		\n\t"
+		"	ld1		{v1.16b}, [%[key]], #16		\n\t"
+		"0:	aese		v0.16b, v1.16b			\n\t"
+		"	subs		%[rounds], %[rounds], #1	\n\t"
+		"	ld1		{v1.16b}, [%[key]], #16		\n\t"
+		"	beq		1f				\n\t"
+		"	aesmc		v0.16b, v0.16b			\n\t"
+		"	b		0b				\n\t"
+		"1:	eor		v0.16b, v0.16b, v1.16b		\n\t"
+		"	st1		{v0.16b}, [%[out]]		\n\t"
+	: :
+		[out]		"r"(dst),
+		[in]		"r"(src),
+		[rounds]	"r"(rounds),
+		[key]		"r"(ctx->key_enc));
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	int rounds = 6 + ctx->key_length / 4;
+
+	__asm__("	.arch		armv8-a+crypto			\n\t"
+		"	ld1		{v0.16b}, [%[in]]		\n\t"
+		"	ld1		{v1.16b}, [%[key]], #16		\n\t"
+		"0:	aesd		v0.16b, v1.16b			\n\t"
+		"	ld1		{v1.16b}, [%[key]], #16		\n\t"
+		"	subs		%[rounds], %[rounds], #1	\n\t"
+		"	beq		1f				\n\t"
+		"	aesimc		v0.16b, v0.16b			\n\t"
+		"	b		0b				\n\t"
+		"1:	eor		v0.16b, v0.16b, v1.16b		\n\t"
+		"	st1		{v0.16b}, [%[out]]		\n\t"
+	: :
+		[out]		"r"(dst),
+		[in]		"r"(src),
+		[rounds]	"r"(rounds),
+		[key]		"r"(ctx->key_dec));
+}
+
+static struct crypto_alg aes_alg = {
+	.cra_name		= "aes",
+	.cra_driver_name	= "aes-ce",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_cipher = {
+		.cia_min_keysize	= AES_MIN_KEY_SIZE,
+		.cia_max_keysize	= AES_MAX_KEY_SIZE,
+		.cia_setkey		= crypto_aes_set_key,
+		.cia_encrypt		= aes_cipher_encrypt,
+		.cia_decrypt		= aes_cipher_decrypt
+	}
+};
+
+static int __init aes_mod_init(void)
+{
+	if (0) // TODO check for crypto extensions
+		return -ENODEV;
+	return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+	crypto_unregister_alg(&aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
-- 
1.8.1.2

* [RFC PATCH 4/5] ARM64: add Crypto Extensions based synchronous AES in CCM mode
  2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2013-10-07 12:12 ` [RFC PATCH 3/5] ARM64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
  2013-10-07 12:12 ` [RFC PATCH 5/5] mac80211: Use CCM crypto driver for CCMP Ard Biesheuvel
  4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

This implements the CCM AEAD chaining mode for AES using Crypto
Extensions instructions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Makefile    |   2 +-
 arch/arm64/crypto/aes-sync.c  | 323 +++++++++++++++++++++++++++++++++++++++++-
 arch/arm64/crypto/aesce-ccm.S | 159 +++++++++++++++++++++
 3 files changed, 479 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/crypto/aesce-ccm.S

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index e598c0a..dfd1886 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -11,6 +11,6 @@
 obj-y += aesce-emu.o
 
 ifeq ($(CONFIG_KERNEL_MODE_SYNC_CE_CRYPTO),y)
-aesce-sync-y	:= aes-sync.o
+aesce-sync-y	:= aes-sync.o aesce-ccm.o
 obj-m		+= aesce-sync.o
 endif
diff --git a/arch/arm64/crypto/aes-sync.c b/arch/arm64/crypto/aes-sync.c
index 5c5d641..263925a5 100644
--- a/arch/arm64/crypto/aes-sync.c
+++ b/arch/arm64/crypto/aes-sync.c
@@ -8,7 +8,10 @@
  * published by the Free Software Foundation.
  */
 
+#include <asm/unaligned.h>
 #include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/scatterwalk.h>
 #include <linux/crypto.h>
 #include <linux/module.h>
 
@@ -58,7 +61,281 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
 		[key]		"r"(ctx->key_dec));
 }
 
-static struct crypto_alg aes_alg = {
+struct crypto_ccm_aes_ctx {
+	struct crypto_aes_ctx	*key;
+	struct crypto_blkcipher	*blk_tfm;
+};
+
+asmlinkage void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], long abytes,
+				     u32 const rk[], int rounds);
+
+asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], long cbytes,
+				   u32 const rk[], int rounds, u8 mac[],
+				   u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], long cbytes,
+				   u32 const rk[], int rounds, u8 mac[],
+				   u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+				 long rounds);
+
+static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+		      unsigned int key_len)
+{
+	struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(tfm);
+	int ret;
+
+	ret = crypto_aes_expand_key(ctx->key, in_key, key_len);
+	if (!ret)
+		return 0;
+
+	tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+	return -EINVAL;
+}
+
+static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+	if ((authsize & 1) || authsize < 4)
+		return -EINVAL;
+	return 0;
+}
+
+static void ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	__be32 *n = (__be32 *)(&maciv[AES_BLOCK_SIZE - 4]);
+	u32 l = req->iv[0] + 1;
+
+	*n = cpu_to_be32(msglen);
+
+	memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+	maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+	if (req->assoclen)
+		maciv[0] |= 0x40;
+
+	memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct __packed { __be16 l; __be32 h; } ltag;
+	int rounds = 6 + ctx->key->key_length / 4;
+	struct scatter_walk walk;
+	u32 len = req->assoclen;
+	u32 macp;
+
+	/* prepend the AAD with a length tag */
+	if (len < 0xff00) {
+		ltag.l = cpu_to_be16(len);
+		macp = 2;
+	} else  {
+		ltag.l = cpu_to_be16(0xfffe);
+		put_unaligned_be32(len, &ltag.h);
+		macp = 6;
+	}
+
+	ce_aes_ccm_auth_data(mac, (u8 *)&ltag, macp, ctx->key->key_enc, rounds);
+	scatterwalk_start(&walk, req->assoc);
+
+	do {
+		u32 n = scatterwalk_clamp(&walk, len);
+		u32 m;
+		u8 *p;
+
+		if (!n) {
+			scatterwalk_start(&walk, sg_next(walk.sg));
+			n = scatterwalk_clamp(&walk, len);
+		}
+		p = scatterwalk_map(&walk);
+		m = min(n, AES_BLOCK_SIZE - macp);
+		crypto_xor(&mac[macp], p, m);
+
+		len -= n;
+		n -= m;
+		macp += m;
+		if (macp == AES_BLOCK_SIZE && (n || len)) {
+			ce_aes_ccm_auth_data(mac, &p[m], n, ctx->key->key_enc,
+					     rounds);
+			macp = n % AES_BLOCK_SIZE;
+		}
+
+		scatterwalk_unmap(p);
+		scatterwalk_advance(&walk, n + m);
+		scatterwalk_done(&walk, 0, len);
+	} while (len);
+}
+
+struct ccm_inner_desc_info {
+	u8	ctriv[AES_BLOCK_SIZE];
+	u8	mac[AES_BLOCK_SIZE];
+} __aligned(8);
+
+static int ccm_inner_encrypt(struct blkcipher_desc *desc,
+			     struct scatterlist *dst, struct scatterlist *src,
+			     unsigned int nbytes)
+{
+	struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+	struct ccm_inner_desc_info *descinfo = desc->info;
+	int rounds = 6 + ctx->key_length / 4;
+	struct blkcipher_walk walk;
+	int err;
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+
+	while (walk.nbytes) {
+		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+		if (walk.nbytes == nbytes)
+			tail = 0;
+
+		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				   walk.nbytes - tail, ctx->key_enc, rounds,
+				   descinfo->mac, descinfo->ctriv);
+
+		nbytes -= walk.nbytes - tail;
+		err = blkcipher_walk_done(desc, &walk, tail);
+	}
+	return err;
+}
+
+static int ccm_inner_decrypt(struct blkcipher_desc *desc,
+			     struct scatterlist *dst, struct scatterlist *src,
+			     unsigned int nbytes)
+{
+	struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+	struct ccm_inner_desc_info *descinfo = desc->info;
+	int rounds = 6 + ctx->key_length / 4;
+	struct blkcipher_walk walk;
+	int err;
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+
+	while (walk.nbytes) {
+		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+		if (walk.nbytes == nbytes)
+			tail = 0;
+
+		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				   walk.nbytes - tail, ctx->key_enc, rounds,
+				   descinfo->mac, descinfo->ctriv);
+
+		nbytes -= walk.nbytes - tail;
+		err = blkcipher_walk_done(desc, &walk, tail);
+	}
+	return err;
+}
+
+static int ccm_encrypt(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+	int rounds = 6 + ctx->key->key_length / 4;
+	struct ccm_inner_desc_info descinfo;
+	int err;
+
+	struct blkcipher_desc desc = {
+		.tfm	= ctx->blk_tfm,
+		.info	= &descinfo,
+		.flags = 0,
+	};
+
+	ccm_init_mac(req, descinfo.mac, req->cryptlen);
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, descinfo.mac);
+
+	memcpy(descinfo.ctriv, req->iv, AES_BLOCK_SIZE);
+
+	/* call inner blkcipher to process the payload */
+	err = ccm_inner_encrypt(&desc, req->dst, req->src, req->cryptlen);
+	if (err)
+		return err;
+
+	ce_aes_ccm_final(descinfo.mac, req->iv, ctx->key->key_enc, rounds);
+
+	/* copy authtag to end of dst */
+	scatterwalk_map_and_copy(descinfo.mac, req->dst, req->cryptlen,
+				 crypto_aead_authsize(aead), 1);
+
+	return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+	int rounds = 6 + ctx->key->key_length / 4;
+	struct ccm_inner_desc_info descinfo;
+	u8 atag[AES_BLOCK_SIZE];
+	u32 len;
+	int err;
+
+	struct blkcipher_desc desc = {
+		.tfm	= ctx->blk_tfm,
+		.info	= &descinfo,
+		.flags = 0,
+	};
+
+	len = req->cryptlen - crypto_aead_authsize(aead);
+	ccm_init_mac(req, descinfo.mac, len);
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, descinfo.mac);
+
+	memcpy(descinfo.ctriv, req->iv, AES_BLOCK_SIZE);
+
+	/* call inner blkcipher to process the payload */
+	err = ccm_inner_decrypt(&desc, req->dst, req->src, len);
+	if (err)
+		return err;
+
+	ce_aes_ccm_final(descinfo.mac, req->iv, ctx->key->key_enc, rounds);
+
+	/* compare calculated auth tag with the stored one */
+	scatterwalk_map_and_copy(atag, req->src, len,
+				 crypto_aead_authsize(aead), 0);
+
+	if (memcmp(descinfo.mac, atag, crypto_aead_authsize(aead)))
+		return -EBADMSG;
+	return 0;
+}
+
+static int ccm_init(struct crypto_tfm *tfm)
+{
+	struct crypto_ccm_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct crypto_blkcipher *blk_tfm;
+
+	blk_tfm = crypto_alloc_blkcipher("__driver-ccm-aesce-inner", 0, 0);
+	if (IS_ERR(blk_tfm))
+		return PTR_ERR(blk_tfm);
+
+	/* did we get the right one? (sanity check) */
+	if (crypto_blkcipher_crt(blk_tfm)->encrypt != ccm_inner_encrypt) {
+		crypto_free_blkcipher(ctx->blk_tfm);
+		return -EINVAL;
+	}
+
+	ctx->blk_tfm = blk_tfm;
+	ctx->key = crypto_blkcipher_ctx(blk_tfm);
+
+	return 0;
+}
+
+static void ccm_exit(struct crypto_tfm *tfm)
+{
+	struct crypto_ccm_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	crypto_free_blkcipher(ctx->blk_tfm);
+}
+
+static struct crypto_alg aes_algs[] = { {
 	.cra_name		= "aes",
 	.cra_driver_name	= "aes-ce",
 	.cra_priority		= 300,
@@ -73,18 +350,56 @@ static struct crypto_alg aes_alg = {
 		.cia_encrypt		= aes_cipher_encrypt,
 		.cia_decrypt		= aes_cipher_decrypt
 	}
-};
+}, {
+	.cra_name		= "__ccm-aesce-inner",
+	.cra_driver_name	= "__driver-ccm-aesce-inner",
+	.cra_priority		= 0,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= 1,
+	.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+	.cra_alignmask		= 7,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_blkcipher = {
+		.min_keysize	= AES_MIN_KEY_SIZE,
+		.max_keysize	= AES_MAX_KEY_SIZE,
+		.ivsize		= sizeof(struct ccm_inner_desc_info),
+		.setkey		= crypto_aes_set_key,
+		.encrypt	= ccm_inner_encrypt,
+		.decrypt	= ccm_inner_decrypt,
+	},
+}, {
+	.cra_name		= "ccm(aes)",
+	.cra_driver_name	= "ccm-aes-ce",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_AEAD,
+	.cra_blocksize		= 1,
+	.cra_ctxsize		= sizeof(struct crypto_ccm_aes_ctx),
+	.cra_alignmask		= 7,
+	.cra_type		= &crypto_aead_type,
+	.cra_module		= THIS_MODULE,
+	.cra_init		= ccm_init,
+	.cra_exit		= ccm_exit,
+	.cra_aead = {
+		.ivsize		= AES_BLOCK_SIZE,
+		.maxauthsize	= AES_BLOCK_SIZE,
+		.setkey		= ccm_setkey,
+		.setauthsize	= ccm_setauthsize,
+		.encrypt	= ccm_encrypt,
+		.decrypt	= ccm_decrypt,
+	}
+} };
 
 static int __init aes_mod_init(void)
 {
 	if (0) // TODO check for crypto extensions
 		return -ENODEV;
-	return crypto_register_alg(&aes_alg);
+	return crypto_register_algs(aes_algs, ARRAY_SIZE(aes_algs));
 }
 
 static void __exit aes_mod_exit(void)
 {
-	crypto_unregister_alg(&aes_alg);
+	crypto_unregister_algs(aes_algs, ARRAY_SIZE(aes_algs));
 }
 
 module_init(aes_mod_init);
diff --git a/arch/arm64/crypto/aesce-ccm.S b/arch/arm64/crypto/aesce-ccm.S
new file mode 100644
index 0000000..35d09af
--- /dev/null
+++ b/arch/arm64/crypto/aesce-ccm.S
@@ -0,0 +1,159 @@
+/*
+ * linux/arch/arm64/crypto/aesce-ccm.S - AES-CCM transform for ARMv8 with
+ *                                       Crypto Extensions
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.arch	armv8-a+crypto
+	.align	4
+
+	/*
+	 * void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], long abytes,
+	 *			     u8 const rk[], int rounds);
+	 */
+ENTRY(ce_aes_ccm_auth_data)
+	ld1	{v0.16b}, [x0]			/* load mac */
+	ld1	{v1.16b}, [x3]			/* load first round key */
+0:	mov	x7, x4
+	add	x6, x3, #16
+1:	aese	v0.16b, v1.16b
+	ld1	{v1.16b}, [x6], #16		/* load next round key */
+	subs	x7, x7, #1
+	beq	2f
+	aesmc	v0.16b, v0.16b
+	b	1b
+2:	eor	v0.16b, v0.16b, v1.16b		/* final round */
+	subs	x2, x2, #16			/* last data? */
+	bmi	3f
+	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	eor	v0.16b, v0.16b, v1.16b		/* xor with mac */
+	beq	3f
+	ld1	{v1.16b}, [x3]			/* reload first round key */
+	b	0b
+3:	st1	{v0.16b}, [x0]			/* store mac */
+	beq	5f
+	adds	x2, x2, #16
+	beq	5f
+4:	ldrb	w7, [x1], #1
+	umov	w6, v0.b[0]
+	eor	w6, w6, w7
+	strb	w6, [x0], #1
+	subs	x2, x2, #1
+	beq	5f
+	ext	v0.16b, v0.16b, v0.16b, #1	/* rotate out the mac bytes */
+	b	4b
+5:	ret
+ENDPROC(ce_aes_ccm_auth_data)
+
+	.macro	aes_ccm_do_crypt,enc
+	stp	x29, x30, [sp, #-32]!		/* prologue */
+	mov	x29, sp
+	stp	x8, x9, [sp, #16]
+	ld1	{v0.16b}, [x5]			/* load mac */
+	ld1	{v2.16b}, [x6]			/* load ctr */
+	ld1	{v3.16b}, [x3]			/* load first round key */
+	umov	x8, v2.d[1]
+	rev	x8, x8				/* keep swabbed ctr in reg */
+0:	add	x8, x8, #1
+	rev	x9, x8
+	ins	v2.d[1], x9			/* no carry */
+	mov	x7, x4
+	add	x9, x3, #16
+1:	aese	v0.16b, v3.16b
+	aese	v2.16b, v3.16b
+	ld1	{v3.16b}, [x9], #16		/* load next round key */
+	subs	x7, x7, #1
+	beq	2f
+	aesmc	v0.16b, v0.16b
+	aesmc	v2.16b, v2.16b
+	b	1b
+2:	eor	v2.16b, v2.16b, v3.16b		/* final round enc */
+	eor	v0.16b, v0.16b, v3.16b		/* final round mac */
+	subs	x2, x2, #16
+	bmi	3f
+	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	.if	\enc == 1
+	eor	v0.16b, v0.16b, v1.16b		/* xor mac with plaintext */
+	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
+	.else
+	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
+	eor	v0.16b, v0.16b, v1.16b		/* xor mac with plaintext */
+	.endif
+	st1	{v1.16b}, [x0], #16		/* write output block */
+	beq	5f
+	ld1	{v2.8b}, [x6]			/* reload ctriv */
+	ld1	{v3.16b}, [x3]			/* reload first round key */
+	b	0b
+3:	st1	{v0.16b}, [x5]			/* store mac */
+	add	x2, x2, #16			/* process partial tail block */
+4:	ldrb	w9, [x1], #1			/* get 1 byte of input */
+	umov	w6, v2.b[0]			/* get top crypted ctr byte */
+	umov	w7, v0.b[0]			/* get top mac byte */
+	.if	\enc == 1
+	eor	w7, w7, w9
+	eor	w9, w9, w6
+	.else
+	eor	w9, w9, w6
+	eor	w7, w7, w9
+	.endif
+	strb	w9, [x0], #1			/* store out byte */
+	strb	w7, [x5], #1			/* store mac byte */
+	subs	x2, x2, #1
+	beq	6f
+	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
+	ext	v2.16b, v2.16b, v2.16b, #1	/* shift out ctr byte */
+	b	4b
+5:	rev	x8, x8
+	st1	{v0.16b}, [x5]			/* store mac */
+	str	x8, [x6, #8]			/* store lsb end of ctr (BE) */
+6:	ldp	x8, x9, [sp, #16]		/* epilogue */
+	ldp	x29, x30, [sp], #32
+	ret
+	.endm
+
+	/*
+	 * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], long cbytes,
+	 * 			   u8 const rk[], int rounds, u8 mac[],
+	 * 			   u8 ctr[]);
+	 * void ce_aes_ccm_decrypt(u8 out[], u8 const in[], long cbytes,
+	 * 			   u8 const rk[], int rounds, u8 mac[],
+	 * 			   u8 ctr[]);
+	 */
+ENTRY(ce_aes_ccm_encrypt)
+	aes_ccm_do_crypt	1
+ENDPROC(ce_aes_ccm_encrypt)
+
+ENTRY(ce_aes_ccm_decrypt)
+	aes_ccm_do_crypt	0
+ENDPROC(ce_aes_ccm_decrypt)
+
+	/*
+	 * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[],
+	 * 			 long rounds);
+	 */
+ENTRY(ce_aes_ccm_final)
+	ld1	{v0.16b}, [x0]			/* load mac */
+	ld1	{v3.16b}, [x2], #16		/* load first round key */
+	ld1	{v2.16b}, [x1]			/* load 1st ctriv */
+0:	aese	v0.16b, v3.16b
+	aese	v2.16b, v3.16b
+	ld1	{v3.16b}, [x2], #16		/* load next round key */
+	subs	x3, x3, #1
+	beq	1f
+	aesmc	v0.16b, v0.16b
+	aesmc	v2.16b, v2.16b
+	b	0b
+1:	eor	v2.16b, v2.16b, v3.16b		/* final round enc */
+	eor	v0.16b, v0.16b, v3.16b		/* final round mac */
+	eor	v0.16b, v0.16b, v2.16b		/* en-/decrypt the mac */
+	st1	{v0.16b}, [x0]			/* store result */
+	ret
+ENDPROC(ce_aes_ccm_final)
-- 
1.8.1.2

* [RFC PATCH 5/5] mac80211: Use CCM crypto driver for CCMP
  2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2013-10-07 12:12 ` [RFC PATCH 4/5] ARM64: add Crypto Extensions based synchronous AES in CCM mode Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
  4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

Let the crypto layer supply the CCM implementation rather than
coding it directly on top of the core AES cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 net/mac80211/Kconfig   |   1 +
 net/mac80211/aes_ccm.c | 159 +++++++++++++++----------------------------------
 net/mac80211/aes_ccm.h |   8 +--
 net/mac80211/key.h     |   2 +-
 net/mac80211/wpa.c     |  21 ++++---
 5 files changed, 64 insertions(+), 127 deletions(-)

diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig
index 62535fe..dc31ec3 100644
--- a/net/mac80211/Kconfig
+++ b/net/mac80211/Kconfig
@@ -4,6 +4,7 @@ config MAC80211
 	select CRYPTO
 	select CRYPTO_ARC4
 	select CRYPTO_AES
+	select CRYPTO_CCM
 	select CRC32
 	select AVERAGE
 	---help---
diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c
index be7614b9..c17f3a3 100644
--- a/net/mac80211/aes_ccm.c
+++ b/net/mac80211/aes_ccm.c
@@ -17,134 +17,71 @@
 #include "key.h"
 #include "aes_ccm.h"
 
-static void aes_ccm_prepare(struct crypto_cipher *tfm, u8 *scratch, u8 *a)
-{
-	int i;
-	u8 *b_0, *aad, *b, *s_0;
-
-	b_0 = scratch + 3 * AES_BLOCK_SIZE;
-	aad = scratch + 4 * AES_BLOCK_SIZE;
-	b = scratch;
-	s_0 = scratch + AES_BLOCK_SIZE;
-
-	crypto_cipher_encrypt_one(tfm, b, b_0);
-
-	/* Extra Authenticate-only data (always two AES blocks) */
-	for (i = 0; i < AES_BLOCK_SIZE; i++)
-		aad[i] ^= b[i];
-	crypto_cipher_encrypt_one(tfm, b, aad);
-
-	aad += AES_BLOCK_SIZE;
-
-	for (i = 0; i < AES_BLOCK_SIZE; i++)
-		aad[i] ^= b[i];
-	crypto_cipher_encrypt_one(tfm, a, aad);
-
-	/* Mask out bits from auth-only-b_0 */
-	b_0[0] &= 0x07;
-
-	/* S_0 is used to encrypt T (= MIC) */
-	b_0[14] = 0;
-	b_0[15] = 0;
-	crypto_cipher_encrypt_one(tfm, s_0, b_0);
-}
-
-
-void ieee80211_aes_ccm_encrypt(struct crypto_cipher *tfm, u8 *scratch,
+void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *scratch,
 			       u8 *data, size_t data_len,
 			       u8 *cdata, u8 *mic)
 {
-	int i, j, last_len, num_blocks;
-	u8 *pos, *cpos, *b, *s_0, *e, *b_0;
+	struct scatterlist aad, pt, ct;
+	struct aead_request req;
+	u8 *iv = scratch + 3 * AES_BLOCK_SIZE; /* b0 */
 
-	b = scratch;
-	s_0 = scratch + AES_BLOCK_SIZE;
-	e = scratch + 2 * AES_BLOCK_SIZE;
-	b_0 = scratch + 3 * AES_BLOCK_SIZE;
+	sg_init_one(&pt, data, data_len);
+	sg_init_one(&ct, cdata, data_len + IEEE80211_CCMP_MIC_LEN);
+	sg_init_one(&aad, scratch + 4 * AES_BLOCK_SIZE, 2 * AES_BLOCK_SIZE - 2);
 
-	num_blocks = DIV_ROUND_UP(data_len, AES_BLOCK_SIZE);
-	last_len = data_len % AES_BLOCK_SIZE;
-	aes_ccm_prepare(tfm, scratch, b);
+	aead_request_set_crypt(&req, &pt, &ct, data_len, iv);
+	aead_request_set_assoc(&req, &aad, 2 * AES_BLOCK_SIZE - 2);
 
-	/* Process payload blocks */
-	pos = data;
-	cpos = cdata;
-	for (j = 1; j <= num_blocks; j++) {
-		int blen = (j == num_blocks && last_len) ?
-			last_len : AES_BLOCK_SIZE;
-
-		/* Authentication followed by encryption */
-		for (i = 0; i < blen; i++)
-			b[i] ^= pos[i];
-		crypto_cipher_encrypt_one(tfm, b, b);
-
-		b_0[14] = (j >> 8) & 0xff;
-		b_0[15] = j & 0xff;
-		crypto_cipher_encrypt_one(tfm, e, b_0);
-		for (i = 0; i < blen; i++)
-			*cpos++ = *pos++ ^ e[i];
-	}
-
-	for (i = 0; i < IEEE80211_CCMP_MIC_LEN; i++)
-		mic[i] = b[i] ^ s_0[i];
+	crypto_aead_encrypt(&req);
 }
 
-
-int ieee80211_aes_ccm_decrypt(struct crypto_cipher *tfm, u8 *scratch,
+int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *scratch,
 			      u8 *cdata, size_t data_len, u8 *mic, u8 *data)
 {
-	int i, j, last_len, num_blocks;
-	u8 *pos, *cpos, *b, *s_0, *a, *b_0;
-
-	b = scratch;
-	s_0 = scratch + AES_BLOCK_SIZE;
-	a = scratch + 2 * AES_BLOCK_SIZE;
-	b_0 = scratch + 3 * AES_BLOCK_SIZE;
-
-	num_blocks = DIV_ROUND_UP(data_len, AES_BLOCK_SIZE);
-	last_len = data_len % AES_BLOCK_SIZE;
-	aes_ccm_prepare(tfm, scratch, a);
+	struct scatterlist aad, pt, ct;
+	struct aead_request req;
+	u8 *iv = scratch + 3 * AES_BLOCK_SIZE; /* b0 */
 
-	/* Process payload blocks */
-	cpos = cdata;
-	pos = data;
-	for (j = 1; j <= num_blocks; j++) {
-		int blen = (j == num_blocks && last_len) ?
-			last_len : AES_BLOCK_SIZE;
+	sg_init_one(&pt, data, data_len);
+	sg_init_one(&ct, cdata, data_len + IEEE80211_CCMP_MIC_LEN);
+	sg_init_one(&aad, scratch + 4 * AES_BLOCK_SIZE, 2 * AES_BLOCK_SIZE - 2);
 
-		/* Decryption followed by authentication */
-		b_0[14] = (j >> 8) & 0xff;
-		b_0[15] = j & 0xff;
-		crypto_cipher_encrypt_one(tfm, b, b_0);
-		for (i = 0; i < blen; i++) {
-			*pos = *cpos++ ^ b[i];
-			a[i] ^= *pos++;
-		}
-		crypto_cipher_encrypt_one(tfm, a, a);
-	}
+	aead_request_set_crypt(&req, &ct, &pt, data_len, iv);
+	aead_request_set_assoc(&req, &aad, 2 * AES_BLOCK_SIZE - 2);
 
-	for (i = 0; i < IEEE80211_CCMP_MIC_LEN; i++) {
-		if ((mic[i] ^ s_0[i]) != a[i])
-			return -1;
-	}
-
-	return 0;
+	return crypto_aead_decrypt(&req);
 }
 
-
-struct crypto_cipher *ieee80211_aes_key_setup_encrypt(const u8 key[])
+struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[])
 {
-	struct crypto_cipher *tfm;
-
-	tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
-	if (!IS_ERR(tfm))
-		crypto_cipher_setkey(tfm, key, WLAN_KEY_LEN_CCMP);
-
-	return tfm;
+	struct crypto_aead *tfm;
+	int err;
+
+	tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC);
+	if (IS_ERR(tfm))
+		return tfm;
+
+	/* HACK: we use an auto variable for aead_request in the functions above
+	 * but this only works if the reqsize is 0, i.e., the aead alg does not
+	 * need additional space in the req struct to track its operations.
+	 * This is a proof of concept for the Crypto Extensions based AES/CCM,
+	 * so for now we just hack around it.
+	 */
+	err = -EINVAL;
+	if (crypto_aead_reqsize(tfm))
+		goto out;
+
+	err = crypto_aead_setkey(tfm, key, WLAN_KEY_LEN_CCMP);
+	if (!err)
+		err = crypto_aead_setauthsize(tfm, IEEE80211_CCMP_MIC_LEN);
+	if (!err)
+		return tfm;
+out:
+	crypto_free_aead(tfm);
+	return ERR_PTR(err);
 }
 
-
-void ieee80211_aes_key_free(struct crypto_cipher *tfm)
+void ieee80211_aes_key_free(struct crypto_aead *tfm)
 {
-	crypto_free_cipher(tfm);
+	crypto_free_aead(tfm);
 }
diff --git a/net/mac80211/aes_ccm.h b/net/mac80211/aes_ccm.h
index 5b7d744..797f3f0 100644
--- a/net/mac80211/aes_ccm.h
+++ b/net/mac80211/aes_ccm.h
@@ -12,13 +12,13 @@
 
 #include <linux/crypto.h>
 
-struct crypto_cipher *ieee80211_aes_key_setup_encrypt(const u8 key[]);
-void ieee80211_aes_ccm_encrypt(struct crypto_cipher *tfm, u8 *scratch,
+struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[]);
+void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *scratch,
 			       u8 *data, size_t data_len,
 			       u8 *cdata, u8 *mic);
-int ieee80211_aes_ccm_decrypt(struct crypto_cipher *tfm, u8 *scratch,
+int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *scratch,
 			      u8 *cdata, size_t data_len,
 			      u8 *mic, u8 *data);
-void ieee80211_aes_key_free(struct crypto_cipher *tfm);
+void ieee80211_aes_key_free(struct crypto_aead *tfm);
 
 #endif /* AES_CCM_H */
diff --git a/net/mac80211/key.h b/net/mac80211/key.h
index 036d57e..aaae0ed 100644
--- a/net/mac80211/key.h
+++ b/net/mac80211/key.h
@@ -83,7 +83,7 @@ struct ieee80211_key {
 			 * Management frames.
 			 */
 			u8 rx_pn[IEEE80211_NUM_TIDS + 1][IEEE80211_CCMP_PN_LEN];
-			struct crypto_cipher *tfm;
+			struct crypto_aead *tfm;
 			u32 replays; /* dot11RSNAStatsCCMPReplays */
 		} ccmp;
 		struct {
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index c9edfcb..c3df1d7 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -343,7 +343,7 @@ static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *scratch,
 		data_len -= IEEE80211_CCMP_MIC_LEN;
 
 	/* First block, b_0 */
-	b_0[0] = 0x59; /* flags: Adata: 1, M: 011, L: 001 */
+	b_0[0] = 0x1; /* L == 1 */
 	/* Nonce: Nonce Flags | A2 | PN
 	 * Nonce Flags: Priority (b0..b3) | Management (b4) | Reserved (b5..b7)
 	 */
@@ -355,21 +355,20 @@ static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *scratch,
 
 	/* AAD (extra authenticate-only data) / masked 802.11 header
 	 * FC | A1 | A2 | A3 | SC | [A4] | [QC] */
-	put_unaligned_be16(len_a, &aad[0]);
-	put_unaligned(mask_fc, (__le16 *)&aad[2]);
-	memcpy(&aad[4], &hdr->addr1, 3 * ETH_ALEN);
+	put_unaligned(mask_fc, (__le16 *)&aad[0]);
+	memcpy(&aad[2], &hdr->addr1, 3 * ETH_ALEN);
 
 	/* Mask Seq#, leave Frag# */
-	aad[22] = *((u8 *) &hdr->seq_ctrl) & 0x0f;
-	aad[23] = 0;
+	aad[20] = *((u8 *) &hdr->seq_ctrl) & 0x0f;
+	aad[21] = 0;
 
 	if (a4_included) {
-		memcpy(&aad[24], hdr->addr4, ETH_ALEN);
-		aad[30] = qos_tid;
-		aad[31] = 0;
+		memcpy(&aad[22], hdr->addr4, ETH_ALEN);
+		aad[28] = qos_tid;
+		aad[29] = 0;
 	} else {
-		memset(&aad[24], 0, ETH_ALEN + IEEE80211_QOS_CTL_LEN);
-		aad[24] = qos_tid;
+		memset(&aad[22], 0, ETH_ALEN + IEEE80211_QOS_CTL_LEN);
+		aad[22] = qos_tid;
 	}
 }
 
-- 
1.8.1.2

* [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions
  2013-10-07 12:12 ` [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions Ard Biesheuvel
@ 2013-10-07 16:18   ` Catalin Marinas
  2013-10-07 16:28     ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2013-10-07 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 07, 2013 at 01:12:28PM +0100, Ard Biesheuvel wrote:
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

It's missing a commit message with an explanation. In the meantime, NAK
;)

-- 
Catalin

* [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions
  2013-10-07 16:18   ` Catalin Marinas
@ 2013-10-07 16:28     ` Ard Biesheuvel
  0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 October 2013 18:18, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Oct 07, 2013 at 01:12:28PM +0100, Ard Biesheuvel wrote:
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> It's missing a commit message with an explanation. In the meantime, NAK
> ;)
>

Well, I just included it for completeness' sake, not for upstreaming,
but if all it takes to get you to accept it is a commit message, then
maybe I will reconsider :-)

* [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions
  2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
@ 2013-10-07 22:03   ` Nicolas Pitre
  2013-10-07 22:44     ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Pitre @ 2013-10-07 22:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 7 Oct 2013, Ard Biesheuvel wrote:

> Stack/unstack the bottom 4 NEON registers on exception entry/exit
> so we can use them in places where we are not allowed to sleep.

If you want to use Neon in places where you're not allowed to sleep, 
this also means you are not going to be scheduled away unexpectedly.  
Why not simply save/restore those Neon regs locally instead?  Same goes 
for interrupt context.


Nicolas

* [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions
  2013-10-07 22:03   ` Nicolas Pitre
@ 2013-10-07 22:44     ` Ard Biesheuvel
  0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 22:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 8 October 2013 00:03, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Mon, 7 Oct 2013, Ard Biesheuvel wrote:
>
>> Stack/unstack the bottom 4 NEON registers on exception entry/exit
>> so we can use them in places where we are not allowed to sleep.
>
> If you want to use Neon in places where you're not allowed to sleep,
> this also means you are not going to be scheduled away unexpectedly.
> Why not simply save/restore those Neon regs locally instead?  Same goes
> for interrupt context.
>

Hmm, I guess it is that simple: if you are certain you will not be
interrupted between stacking and unstacking, you can basically do
whatever you like at any time.
That is quite good news, actually, if that means there is no penalty
in general for allowing NEON in atomic context in some cases.
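
Something along these lines, I suppose (purely illustrative, the helpers
below are made up and not part of this series):

#include <linux/preempt.h>
#include <linux/types.h>

/* hypothetical: preserve q0 - q3 locally around a short NEON section */
struct neon_q0_q3 {
	u8 regs[4][16];
} __aligned(16);

static inline void save_neon_q0_q3(struct neon_q0_q3 *s)
{
	asm volatile("st1 {v0.16b-v3.16b}, [%0]" : : "r"(s->regs) : "memory");
}

static inline void restore_neon_q0_q3(struct neon_q0_q3 *s)
{
	asm volatile("ld1 {v0.16b-v3.16b}, [%0]" : : "r"(s->regs) : "memory");
}

static void atomic_neon_section(void)
{
	struct neon_q0_q3 st;

	preempt_disable();		/* no context switch in between */
	save_neon_q0_q3(&st);

	/* ... use v0 - v3 for aese/aesmc/aesd/aesimc here ... */

	restore_neon_q0_q3(&st);
	preempt_enable();
}

That does assume nothing that can run in between (i.e., interrupt handlers)
touches v0 - v3, which is where the "same goes for interrupt context" part
comes in.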

-- 
Ard.
