* [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts
@ 2013-10-07 12:12 Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
To: linux-arm-kernel
I am probably going to be flamed for bringing this up, but here goes ...
This is more a request for discussion than a request for comments on these
patches.
After floating point and SIMD, we now have a third class of instructions that
use the NEON register file: the AES and SHA instructions present in the ARMv8
Crypto Extensions.
This series uses CCMP as an example to make the case for limited support for
using the NEON register file in atomic context. CCMP is the encryption standard
used in WPA2; it is based on AES in CCM mode, which performs both encryption
and authentication by passing all of the data through AES twice.
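(For reference, CCM boils down to something like the following C-level sketch
for the payload blocks. This is not code from this series; the B_0 block and
the AAD handling are left out, and aes_encrypt()/ctr_increment() are just
placeholders for the core cipher and the counter update.)

static void ccm_encrypt_sketch(u32 const rk[], int rounds, u8 mac[16],
                               u8 ctr[16], u8 *dst, u8 const *src, int blocks)
{
        u8 ks[16];
        int i, j;

        for (i = 0; i < blocks; i++) {
                /* authentication pass: CBC-MAC over the plaintext */
                for (j = 0; j < 16; j++)
                        mac[j] ^= src[16 * i + j];
                aes_encrypt(rk, rounds, mac, mac);

                /* encryption pass: AES-CTR keystream xor'ed into the data */
                ctr_increment(ctr);
                aes_encrypt(rk, rounds, ks, ctr);
                for (j = 0; j < 16; j++)
                        dst[16 * i + j] = src[16 * i + j] ^ ks[j];
        }
}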
The mac80211 layer, which performs this encryption and decryption, does so in a
context that does not allow the use of asynchronous ciphers. In practice this
means it uses the C implementation (on ARM64), which I expect to be around an
order of magnitude slower than the dedicated instructions (*).
I have included two ways of working around this: patch #3 implements the core
AES cipher using only registers q0 and q1. Patch #4 implements the CCM chaining
mode using registers q0 - q3. (The significance of the latter is that I expect a
certain degree of interleaving to be required to run the AES instructions at
full speed, and CCM, while difficult to parallelize, can easily be implemented
with a 2-way interleave of the encryption and authentication parts.)
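(To illustrate what I mean by the 2-way interleave, roughly, in C-level
pseudocode. Again, this is not code from this series; aes_round() and
aes_last_round() are made-up helpers standing in for an aese+aesmc pair and
the final aese plus eor with the last round key.)

static void ccm_block_interleaved_sketch(u32 const rk[], int rounds,
                                         u8 mac[16], u8 ctr_blk[16])
{
        int r;

        /* send the CBC-MAC block and the CTR block through the AES rounds
         * together, so that back-to-back AES instructions do not depend on
         * each other's results
         */
        for (r = 0; r < rounds - 1; r++) {
                aes_round(mac, &rk[4 * r]);             /* authentication */
                aes_round(ctr_blk, &rk[4 * r]);         /* keystream      */
        }
        aes_last_round(mac, &rk[4 * (rounds - 1)], &rk[4 * rounds]);
        aes_last_round(ctr_blk, &rk[4 * (rounds - 1)], &rk[4 * rounds]);
}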
Patch #1 implements the stacking of 4 NEON registers (but note that patch #3
only needs 2 registers). Patch #2 implements emulation of the AES instructions
(considering how few of us have access to the Fast Model plugin). Patch #5
modifies the mac80211 code so it relies on the crypto API to supply a CCM
implementation rather than cooking up its own (the latter is compile-tested
only and included for reference).
* On ARM, we have the C implementation, which runs in ~64 cycles per round, and
an accelerated synchronous implementation, which runs in ~32 cycles per round
(on Cortex-A15). The latter relies heavily on the barrel shifter, so its
performance is difficult to extrapolate to ARMv8. It should also be noted that
the table-based C implementation uses 16 kB of lookup tables (8 kB each way).
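(If I remember the layout of the generic AES driver correctly, the 8 kB per
direction works out as two u32[4][256] tables, one for the inner rounds and
one for the final round: 2 x 4 x 256 x 4 bytes = 8 kB, i.e. 16 kB for
encryption and decryption combined.)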
Ard Biesheuvel (5):
ARM64: allow limited use of some NEON registers in exceptions
ARM64: add quick-n-dirty emulation for AES instructions
ARM64: add Crypto Extensions based synchronous core AES cipher
ARM64: add Crypto Extensions based synchronous AES in CCM mode
mac80211: Use CCM crypto driver for CCMP
arch/arm64/Kconfig | 14 ++
arch/arm64/Makefile | 1 +
arch/arm64/crypto/Makefile | 16 ++
arch/arm64/crypto/aes-sync.c | 410 ++++++++++++++++++++++++++++++++++++++++
arch/arm64/crypto/aesce-ccm.S | 159 ++++++++++++++++
arch/arm64/crypto/aesce-emu.c | 221 ++++++++++++++++++++++
arch/arm64/include/asm/ptrace.h | 3 +
arch/arm64/include/asm/traps.h | 10 +
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kernel/entry.S | 12 +-
arch/arm64/kernel/traps.c | 49 +++++
net/mac80211/Kconfig | 1 +
net/mac80211/aes_ccm.c | 159 +++++-----------
net/mac80211/aes_ccm.h | 8 +-
net/mac80211/key.h | 2 +-
net/mac80211/wpa.c | 21 +-
16 files changed, 961 insertions(+), 128 deletions(-)
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aes-sync.c
create mode 100644 arch/arm64/crypto/aesce-ccm.S
create mode 100644 arch/arm64/crypto/aesce-emu.c
--
1.8.1.2
* [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions
2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
2013-10-07 22:03 ` Nicolas Pitre
2013-10-07 12:12 ` [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions Ard Biesheuvel
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
To: linux-arm-kernel
Stack/unstack the bottom 4 NEON registers on exception entry/exit
so we can use them in places where we are not allowed to sleep.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/Kconfig | 14 ++++++++++++++
arch/arm64/include/asm/ptrace.h | 3 +++
arch/arm64/kernel/asm-offsets.c | 3 +++
arch/arm64/kernel/entry.S | 8 ++++++++
4 files changed, 28 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c044548..b97a458 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -98,6 +98,9 @@ config IOMMU_HELPER
config KERNEL_MODE_NEON
def_bool y
+config STACK_NEON_REGS_ON_EXCEPTION
+ def_bool n
+
source "init/Kconfig"
source "kernel/Kconfig.freezer"
@@ -219,6 +222,17 @@ config FORCE_MAX_ZONEORDER
default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
default "11"
+config KERNEL_MODE_SYNC_CE_CRYPTO
+ bool "Support for synchronous crypto ciphers using Crypto Extensions"
+ depends on KERNEL_MODE_NEON
+ select STACK_NEON_REGS_ON_EXCEPTION
+ help
+ This enables support for using ARMv8 Crypto Extensions instructions
+ in places where sleeping is not allowed. The synchronous ciphers are
+ only allowed to use the bottom 4 NEON registers q0 - q3, as stacking
+ the entire NEON register file at every exception is too costly.
+
+
endmenu
menu "Boot options"
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 0dacbbf..17ea483 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -103,6 +103,9 @@ struct pt_regs {
};
u64 orig_x0;
u64 syscallno;
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+ struct { u64 l, h; } qregs[4];
+#endif
};
#define arch_has_single_step() (1)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 666e231..73c944a 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -58,6 +58,9 @@ int main(void)
DEFINE(S_PC, offsetof(struct pt_regs, pc));
DEFINE(S_ORIG_X0, offsetof(struct pt_regs, orig_x0));
DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+ DEFINE(S_QREGS, offsetof(struct pt_regs, qregs));
+#endif
DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
BLANK();
DEFINE(MM_CONTEXT_ID, offsetof(struct mm_struct, context.id));
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 3881fd1..c74dcca 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -58,6 +58,10 @@
push x4, x5
push x2, x3
push x0, x1
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+ add x21, sp, #S_QREGS
+ st1 {v0.16b-v3.16b}, [x21]
+#endif
.if \el == 0
mrs x21, sp_el0
.else
@@ -86,6 +90,10 @@
.endm
.macro kernel_exit, el, ret = 0
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+ add x21, sp, #S_QREGS
+ ld1 {v0.16b-v3.16b}, [x21]
+#endif
ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
.if \el == 0
ldr x23, [sp, #S_SP] // load return stack pointer
--
1.8.1.2
* [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions
2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
2013-10-07 16:18 ` Catalin Marinas
2013-10-07 12:12 ` [RFC PATCH 3/5] ARM64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
To: linux-arm-kernel
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/Makefile | 1 +
arch/arm64/crypto/Makefile | 11 ++
arch/arm64/crypto/aesce-emu.c | 221 +++++++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/traps.h | 10 ++
arch/arm64/kernel/entry.S | 4 +-
arch/arm64/kernel/traps.c | 49 +++++++++
6 files changed, 295 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aesce-emu.c
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index d90cf79..c864bb5 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -39,6 +39,7 @@ export TEXT_OFFSET GZFLAGS
core-y += arch/arm64/kernel/ arch/arm64/mm/
core-$(CONFIG_KVM) += arch/arm64/kvm/
core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
libs-y := arch/arm64/lib/ $(libs-y)
libs-y += $(LIBGCC)
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 0000000..f87ec80
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,11 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-y += aesce-emu.o
diff --git a/arch/arm64/crypto/aesce-emu.c b/arch/arm64/crypto/aesce-emu.c
new file mode 100644
index 0000000..4cc7ee9
--- /dev/null
+++ b/arch/arm64/crypto/aesce-emu.c
@@ -0,0 +1,221 @@
+/*
+ * aesce-emu.c - emulate aese/aesd/aesmc/aesimc instructions
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/printk.h>
+#include <linux/ptrace.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <asm/traps.h>
+
+union AES_STATE {
+ u8 bytes[16];
+ u64 l[2];
+} __aligned(8);
+
+static void add_sub_shift(union AES_STATE *st, union AES_STATE *rk, int inv);
+static void mix_columns(union AES_STATE *out, union AES_STATE *in);
+static void inv_mix_columns_pre(union AES_STATE *out);
+
+#define REG_ACCESS(op, r, mem) \
+ do { case r: asm(#op " {v" #r ".16b}, [%0]" : : "r"(mem)); goto out; \
+ } while (0)
+
+#define REG_SWITCH(reg, op, m) do { switch (reg) { \
+ REG_ACCESS(op, 0, m); REG_ACCESS(op, 1, m); REG_ACCESS(op, 2, m); \
+ REG_ACCESS(op, 3, m); REG_ACCESS(op, 4, m); REG_ACCESS(op, 5, m); \
+ REG_ACCESS(op, 6, m); REG_ACCESS(op, 7, m); REG_ACCESS(op, 8, m); \
+ REG_ACCESS(op, 9, m); REG_ACCESS(op, 10, m); REG_ACCESS(op, 11, m); \
+ REG_ACCESS(op, 12, m); REG_ACCESS(op, 13, m); REG_ACCESS(op, 14, m); \
+ REG_ACCESS(op, 15, m); REG_ACCESS(op, 16, m); REG_ACCESS(op, 17, m); \
+ REG_ACCESS(op, 18, m); REG_ACCESS(op, 19, m); REG_ACCESS(op, 20, m); \
+ REG_ACCESS(op, 21, m); REG_ACCESS(op, 22, m); REG_ACCESS(op, 23, m); \
+ REG_ACCESS(op, 24, m); REG_ACCESS(op, 25, m); REG_ACCESS(op, 26, m); \
+ REG_ACCESS(op, 27, m); REG_ACCESS(op, 28, m); REG_ACCESS(op, 29, m); \
+ REG_ACCESS(op, 30, m); REG_ACCESS(op, 31, m); \
+ } out:; } while (0)
+
+static void load_neon_reg(union AES_STATE *st, int reg)
+{
+ REG_SWITCH(reg, st1, st->bytes);
+}
+
+static void save_neon_reg(union AES_STATE *st, int reg, struct pt_regs *regs)
+{
+ REG_SWITCH(reg, ld1, st->bytes);
+
+#ifdef CONFIG_STACK_NEON_REGS_ON_EXCEPTION
+ if (reg < 4)
+ /* update the stacked reg as well */
+ memcpy((u8 *)&regs->qregs[reg], st->bytes, 16);
+#endif
+}
+
+static void aesce_do_emulate(unsigned int instr, struct pt_regs *regs)
+{
+ enum { AESE, AESD, AESMC, AESIMC } kind = (instr >> 12) & 3;
+ int rn = (instr >> 5) & 0x1f;
+ int rd = instr & 0x1f;
+ union AES_STATE in, out;
+
+ load_neon_reg(&in, rn);
+
+ switch (kind) {
+ case AESE:
+ case AESD:
+ load_neon_reg(&out, rd);
+ add_sub_shift(&out, &in, kind);
+ break;
+ case AESIMC:
+ inv_mix_columns_pre(&in);
+ case AESMC:
+ mix_columns(&out, &in);
+ }
+ save_neon_reg(&out, rd, regs);
+}
+
+static int aesce_emu_instr(struct pt_regs *regs, unsigned int instr);
+
+static struct undef_hook aesce_emu_uh = {
+ .instr_val = 0x4e284800,
+ .instr_mask = 0xffffcc00,
+ .fn = aesce_emu_instr,
+};
+
+static int aesce_emu_instr(struct pt_regs *regs, unsigned int instr)
+{
+ do {
+ aesce_do_emulate(instr, regs);
+ regs->pc += 4;
+ get_user(instr, (u32 __user *)regs->pc);
+ } while ((instr & aesce_emu_uh.instr_mask) == aesce_emu_uh.instr_val);
+
+ return 0;
+}
+
+static int aesce_emu_init(void)
+{
+ register_undef_hook(&aesce_emu_uh);
+ return 0;
+}
+
+arch_initcall(aesce_emu_init);
+
+#define gf8_mul_x(a) \
+ (((a) << 1) ^ (((a) & 0x80) ? 0x1b : 0))
+
+static void mix_columns(union AES_STATE *out, union AES_STATE *in)
+{
+ int i;
+
+ for (i = 0; i < 16; i++)
+ out->bytes[i] =
+ gf8_mul_x(in->bytes[i]) ^
+ gf8_mul_x(in->bytes[((i + 1) % 4) | (i & ~3)]) ^
+ in->bytes[((i + 1) % 4) | (i & ~3)] ^
+ in->bytes[((i + 2) % 4) | (i & ~3)] ^
+ in->bytes[((i + 3) % 4) | (i & ~3)];
+}
+
+#define gf8_mul_x2(a) \
+ (((a) << 2) ^ (((a) & 0x80) ? 0x36 : 0) ^ (((a) & 0x40) ? 0x1b : 0))
+
+static void inv_mix_columns_pre(union AES_STATE *out)
+{
+ union AES_STATE in = *out;
+ int i;
+
+ for (i = 0; i < 16; i++)
+ out->bytes[i] = gf8_mul_x2(in.bytes[i]) ^ in.bytes[i] ^
+ gf8_mul_x2(in.bytes[i ^ 2]);
+}
+
+static void add_sub_shift(union AES_STATE *st, union AES_STATE *rk, int inv)
+{
+ static u8 const sbox[][256] = { {
+ 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
+ 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
+ 0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+ 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
+ 0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,
+ 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+ 0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,
+ 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
+ 0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+ 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
+ 0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
+ 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+ 0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,
+ 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
+ 0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+ 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
+ 0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,
+ 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+ 0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,
+ 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
+ 0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+ 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
+ 0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,
+ 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+ 0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,
+ 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
+ 0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+ 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
+ 0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,
+ 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+ 0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,
+ 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16
+ }, {
+ 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38,
+ 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
+ 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87,
+ 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
+ 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d,
+ 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
+ 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2,
+ 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
+ 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16,
+ 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
+ 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda,
+ 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
+ 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a,
+ 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
+ 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02,
+ 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
+ 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea,
+ 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
+ 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85,
+ 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
+ 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89,
+ 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
+ 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20,
+ 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
+ 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31,
+ 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
+ 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d,
+ 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
+ 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0,
+ 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
+ 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26,
+ 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
+ } };
+ static u8 const permute[][16] = { {
+ 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
+ }, {
+ 0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
+ } };
+ int i;
+
+ rk->l[0] ^= st->l[0];
+ rk->l[1] ^= st->l[1];
+
+ for (i = 0; i < 16; i++)
+ st->bytes[i] = sbox[inv][rk->bytes[permute[inv][i]]];
+}
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 10ca8ff..781e50cb2 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -27,4 +27,14 @@ static inline int in_exception_text(unsigned long ptr)
ptr < (unsigned long)&__exception_text_end;
}
+struct undef_hook {
+ struct list_head node;
+ u32 instr_mask;
+ u32 instr_val;
+ int (*fn)(struct pt_regs *regs, unsigned int instr);
+};
+
+void register_undef_hook(struct undef_hook *hook);
+void unregister_undef_hook(struct undef_hook *hook);
+
#endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index c74dcca..e4d89df 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -291,7 +291,9 @@ el1_undef:
* Undefined instruction
*/
mov x0, sp
- b do_undefinstr
+ bl do_undefinstr
+
+ kernel_exit 1
el1_dbg:
/*
* Debug exception handling
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 7ffaddd..3cc4c91 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -257,11 +257,60 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,
die(str, regs, err);
}
+static LIST_HEAD(undef_hook);
+static DEFINE_RAW_SPINLOCK(undef_lock);
+
+void register_undef_hook(struct undef_hook *hook)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&undef_lock, flags);
+ list_add(&hook->node, &undef_hook);
+ raw_spin_unlock_irqrestore(&undef_lock, flags);
+}
+
+void unregister_undef_hook(struct undef_hook *hook)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&undef_lock, flags);
+ list_del(&hook->node);
+ raw_spin_unlock_irqrestore(&undef_lock, flags);
+}
+
+static int call_undef_hook(struct pt_regs *regs, void __user *pc)
+{
+ struct undef_hook *hook;
+ unsigned long flags;
+ int (*fn)(struct pt_regs *regs, unsigned int instr) = NULL;
+ unsigned int instr;
+ mm_segment_t fs;
+ int ret;
+
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+
+ get_user(instr, (u32 __user *)pc);
+
+ raw_spin_lock_irqsave(&undef_lock, flags);
+ list_for_each_entry(hook, &undef_hook, node)
+ if ((instr & hook->instr_mask) == hook->instr_val)
+ fn = hook->fn;
+ raw_spin_unlock_irqrestore(&undef_lock, flags);
+
+ ret = fn ? fn(regs, instr) : 1;
+ set_fs(fs);
+ return ret;
+}
+
asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
{
siginfo_t info;
void __user *pc = (void __user *)instruction_pointer(regs);
+ if (call_undef_hook(regs, pc) == 0)
+ return;
+
/* check for AArch32 breakpoint instructions */
if (!aarch32_break_handler(regs))
return;
--
1.8.1.2
* [RFC PATCH 3/5] ARM64: add Crypto Extensions based synchronous core AES cipher
2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 4/5] ARM64: add Crypto Extensions based synchronous AES in CCM mode Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 5/5] mac80211: Use CCM crypto driver for CCMP Ard Biesheuvel
4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
To: linux-arm-kernel
This implements the core AES cipher using the Crypto Extensions,
using only NEON registers q0 and q1.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Makefile | 5 +++
arch/arm64/crypto/aes-sync.c | 95 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 100 insertions(+)
create mode 100644 arch/arm64/crypto/aes-sync.c
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index f87ec80..e598c0a 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -9,3 +9,8 @@
#
obj-y += aesce-emu.o
+
+ifeq ($(CONFIG_KERNEL_MODE_SYNC_CE_CRYPTO),y)
+aesce-sync-y := aes-sync.o
+obj-m += aesce-sync.o
+endif
diff --git a/arch/arm64/crypto/aes-sync.c b/arch/arm64/crypto/aes-sync.c
new file mode 100644
index 0000000..5c5d641
--- /dev/null
+++ b/arch/arm64/crypto/aes-sync.c
@@ -0,0 +1,95 @@
+/*
+ * linux/arch/arm64/crypto/aes-sync.c
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <crypto/aes.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ int rounds = 6 + ctx->key_length / 4;
+
+ __asm__(" .arch armv8-a+crypto \n\t"
+ " ld1 {v0.16b}, [%[in]] \n\t"
+ " ld1 {v1.16b}, [%[key]], #16 \n\t"
+ "0: aese v0.16b, v1.16b \n\t"
+ " subs %[rounds], %[rounds], #1 \n\t"
+ " ld1 {v1.16b}, [%[key]], #16 \n\t"
+ " beq 1f \n\t"
+ " aesmc v0.16b, v0.16b \n\t"
+ " b 0b \n\t"
+ "1: eor v0.16b, v0.16b, v1.16b \n\t"
+ " st1 {v0.16b}, [%[out]] \n\t"
+ : :
+ [out] "r"(dst),
+ [in] "r"(src),
+ [rounds] "r"(rounds),
+ [key] "r"(ctx->key_enc));
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ int rounds = 6 + ctx->key_length / 4;
+
+ __asm__(" .arch armv8-a+crypto \n\t"
+ " ld1 {v0.16b}, [%[in]] \n\t"
+ " ld1 {v1.16b}, [%[key]], #16 \n\t"
+ "0: aesd v0.16b, v1.16b \n\t"
+ " ld1 {v1.16b}, [%[key]], #16 \n\t"
+ " subs %[rounds], %[rounds], #1 \n\t"
+ " beq 1f \n\t"
+ " aesimc v0.16b, v0.16b \n\t"
+ " b 0b \n\t"
+ "1: eor v0.16b, v0.16b, v1.16b \n\t"
+ " st1 {v0.16b}, [%[out]] \n\t"
+ : :
+ [out] "r"(dst),
+ [in] "r"(src),
+ [rounds] "r"(rounds),
+ [key] "r"(ctx->key_dec));
+}
+
+static struct crypto_alg aes_alg = {
+ .cra_name = "aes",
+ .cra_driver_name = "aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = crypto_aes_set_key,
+ .cia_encrypt = aes_cipher_encrypt,
+ .cia_decrypt = aes_cipher_decrypt
+ }
+};
+
+static int __init aes_mod_init(void)
+{
+ if (0) // TODO check for crypto extensions
+ return -ENODEV;
+ return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+ crypto_unregister_alg(&aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
--
1.8.1.2
* [RFC PATCH 4/5] ARM64: add Crypto Extensions based synchronous AES in CCM mode
2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
` (2 preceding siblings ...)
2013-10-07 12:12 ` [RFC PATCH 3/5] ARM64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
2013-10-07 12:12 ` [RFC PATCH 5/5] mac80211: Use CCM crypto driver for CCMP Ard Biesheuvel
4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
To: linux-arm-kernel
This implements the CCM AEAD chaining mode for AES using Crypto
Extensions instructions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Makefile | 2 +-
arch/arm64/crypto/aes-sync.c | 323 +++++++++++++++++++++++++++++++++++++++++-
arch/arm64/crypto/aesce-ccm.S | 159 +++++++++++++++++++++
3 files changed, 479 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/crypto/aesce-ccm.S
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index e598c0a..dfd1886 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -11,6 +11,6 @@
obj-y += aesce-emu.o
ifeq ($(CONFIG_KERNEL_MODE_SYNC_CE_CRYPTO),y)
-aesce-sync-y := aes-sync.o
+aesce-sync-y := aes-sync.o aesce-ccm.o
obj-m += aesce-sync.o
endif
diff --git a/arch/arm64/crypto/aes-sync.c b/arch/arm64/crypto/aes-sync.c
index 5c5d641..263925a5 100644
--- a/arch/arm64/crypto/aes-sync.c
+++ b/arch/arm64/crypto/aes-sync.c
@@ -8,7 +8,10 @@
* published by the Free Software Foundation.
*/
+#include <asm/unaligned.h>
#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/scatterwalk.h>
#include <linux/crypto.h>
#include <linux/module.h>
@@ -58,7 +61,281 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
[key] "r"(ctx->key_dec));
}
-static struct crypto_alg aes_alg = {
+struct crypto_ccm_aes_ctx {
+ struct crypto_aes_ctx *key;
+ struct crypto_blkcipher *blk_tfm;
+};
+
+asmlinkage void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], long abytes,
+ u32 const rk[], int rounds);
+
+asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], long cbytes,
+ u32 const rk[], int rounds, u8 mac[],
+ u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], long cbytes,
+ u32 const rk[], int rounds, u8 mac[],
+ u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+ long rounds);
+
+static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(tfm);
+ int ret;
+
+ ret = crypto_aes_expand_key(ctx->key, in_key, key_len);
+ if (!ret)
+ return 0;
+
+ tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+ return -EINVAL;
+}
+
+static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+ if ((authsize & 1) || authsize < 4)
+ return -EINVAL;
+ return 0;
+}
+
+static void ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ __be32 *n = (__be32 *)(&maciv[AES_BLOCK_SIZE - 4]);
+ u32 l = req->iv[0] + 1;
+
+ *n = cpu_to_be32(msglen);
+
+ memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+ maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+ if (req->assoclen)
+ maciv[0] |= 0x40;
+
+ memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+ struct __packed { __be16 l; __be32 h; } ltag;
+ int rounds = 6 + ctx->key->key_length / 4;
+ struct scatter_walk walk;
+ u32 len = req->assoclen;
+ u32 macp;
+
+ /* prepend the AAD with a length tag */
+ if (len < 0xff00) {
+ ltag.l = cpu_to_be16(len);
+ macp = 2;
+ } else {
+ ltag.l = cpu_to_be16(0xfffe);
+ put_unaligned_be32(len, &ltag.h);
+ macp = 6;
+ }
+
+ ce_aes_ccm_auth_data(mac, (u8 *)&ltag, macp, ctx->key->key_enc, rounds);
+ scatterwalk_start(&walk, req->assoc);
+
+ do {
+ u32 n = scatterwalk_clamp(&walk, len);
+ u32 m;
+ u8 *p;
+
+ if (!n) {
+ scatterwalk_start(&walk, sg_next(walk.sg));
+ n = scatterwalk_clamp(&walk, len);
+ }
+ p = scatterwalk_map(&walk);
+ m = min(n, AES_BLOCK_SIZE - macp);
+ crypto_xor(&mac[macp], p, m);
+
+ len -= n;
+ n -= m;
+ macp += m;
+ if (macp == AES_BLOCK_SIZE && (n || len)) {
+ ce_aes_ccm_auth_data(mac, &p[m], n, ctx->key->key_enc,
+ rounds);
+ macp = n % AES_BLOCK_SIZE;
+ }
+
+ scatterwalk_unmap(p);
+ scatterwalk_advance(&walk, n + m);
+ scatterwalk_done(&walk, 0, len);
+ } while (len);
+}
+
+struct ccm_inner_desc_info {
+ u8 ctriv[AES_BLOCK_SIZE];
+ u8 mac[AES_BLOCK_SIZE];
+} __aligned(8);
+
+static int ccm_inner_encrypt(struct blkcipher_desc *desc,
+ struct scatterlist *dst, struct scatterlist *src,
+ unsigned int nbytes)
+{
+ struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+ struct ccm_inner_desc_info *descinfo = desc->info;
+ int rounds = 6 + ctx->key_length / 4;
+ struct blkcipher_walk walk;
+ int err;
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == nbytes)
+ tail = 0;
+
+ ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc, rounds,
+ descinfo->mac, descinfo->ctriv);
+
+ nbytes -= walk.nbytes - tail;
+ err = blkcipher_walk_done(desc, &walk, tail);
+ }
+ return err;
+}
+
+static int ccm_inner_decrypt(struct blkcipher_desc *desc,
+ struct scatterlist *dst, struct scatterlist *src,
+ unsigned int nbytes)
+{
+ struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+ struct ccm_inner_desc_info *descinfo = desc->info;
+ int rounds = 6 + ctx->key_length / 4;
+ struct blkcipher_walk walk;
+ int err;
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == nbytes)
+ tail = 0;
+
+ ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc, rounds,
+ descinfo->mac, descinfo->ctriv);
+
+ nbytes -= walk.nbytes - tail;
+ err = blkcipher_walk_done(desc, &walk, tail);
+ }
+ return err;
+}
+
+static int ccm_encrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+ int rounds = 6 + ctx->key->key_length / 4;
+ struct ccm_inner_desc_info descinfo;
+ int err;
+
+ struct blkcipher_desc desc = {
+ .tfm = ctx->blk_tfm,
+ .info = &descinfo,
+ .flags = 0,
+ };
+
+ ccm_init_mac(req, descinfo.mac, req->cryptlen);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, descinfo.mac);
+
+ memcpy(descinfo.ctriv, req->iv, AES_BLOCK_SIZE);
+
+ /* call inner blkcipher to process the payload */
+ err = ccm_inner_encrypt(&desc, req->dst, req->src, req->cryptlen);
+ if (err)
+ return err;
+
+ ce_aes_ccm_final(descinfo.mac, req->iv, ctx->key->key_enc, rounds);
+
+ /* copy authtag to end of dst */
+ scatterwalk_map_and_copy(descinfo.mac, req->dst, req->cryptlen,
+ crypto_aead_authsize(aead), 1);
+
+ return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_ccm_aes_ctx *ctx = crypto_aead_ctx(aead);
+ int rounds = 6 + ctx->key->key_length / 4;
+ struct ccm_inner_desc_info descinfo;
+ u8 atag[AES_BLOCK_SIZE];
+ u32 len;
+ int err;
+
+ struct blkcipher_desc desc = {
+ .tfm = ctx->blk_tfm,
+ .info = &descinfo,
+ .flags = 0,
+ };
+
+ len = req->cryptlen - crypto_aead_authsize(aead);
+ ccm_init_mac(req, descinfo.mac, len);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, descinfo.mac);
+
+ memcpy(descinfo.ctriv, req->iv, AES_BLOCK_SIZE);
+
+ /* call inner blkcipher to process the payload */
+ err = ccm_inner_decrypt(&desc, req->dst, req->src, len);
+ if (err)
+ return err;
+
+ ce_aes_ccm_final(descinfo.mac, req->iv, ctx->key->key_enc, rounds);
+
+ /* compare calculated auth tag with the stored one */
+ scatterwalk_map_and_copy(atag, req->src, len,
+ crypto_aead_authsize(aead), 0);
+
+ if (memcmp(descinfo.mac, atag, crypto_aead_authsize(aead)))
+ return -EBADMSG;
+ return 0;
+}
+
+static int ccm_init(struct crypto_tfm *tfm)
+{
+ struct crypto_ccm_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ struct crypto_blkcipher *blk_tfm;
+
+ blk_tfm = crypto_alloc_blkcipher("__driver-ccm-aesce-inner", 0, 0);
+ if (IS_ERR(blk_tfm))
+ return PTR_ERR(blk_tfm);
+
+ /* did we get the right one? (sanity check) */
+ if (crypto_blkcipher_crt(blk_tfm)->encrypt != ccm_inner_encrypt) {
+ crypto_free_blkcipher(ctx->blk_tfm);
+ return -EINVAL;
+ }
+
+ ctx->blk_tfm = blk_tfm;
+ ctx->key = crypto_blkcipher_ctx(blk_tfm);
+
+ return 0;
+}
+
+static void ccm_exit(struct crypto_tfm *tfm)
+{
+ struct crypto_ccm_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ crypto_free_blkcipher(ctx->blk_tfm);
+}
+
+static struct crypto_alg aes_algs[] = { {
.cra_name = "aes",
.cra_driver_name = "aes-ce",
.cra_priority = 300,
@@ -73,18 +350,56 @@ static struct crypto_alg aes_alg = {
.cia_encrypt = aes_cipher_encrypt,
.cia_decrypt = aes_cipher_decrypt
}
-};
+}, {
+ .cra_name = "__ccm-aesce-inner",
+ .cra_driver_name = "__driver-ccm-aesce-inner",
+ .cra_priority = 0,
+ .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_alignmask = 7,
+ .cra_type = &crypto_blkcipher_type,
+ .cra_module = THIS_MODULE,
+ .cra_blkcipher = {
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = sizeof(struct ccm_inner_desc_info),
+ .setkey = crypto_aes_set_key,
+ .encrypt = ccm_inner_encrypt,
+ .decrypt = ccm_inner_decrypt,
+ },
+}, {
+ .cra_name = "ccm(aes)",
+ .cra_driver_name = "ccm-aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_AEAD,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_ccm_aes_ctx),
+ .cra_alignmask = 7,
+ .cra_type = &crypto_aead_type,
+ .cra_module = THIS_MODULE,
+ .cra_init = ccm_init,
+ .cra_exit = ccm_exit,
+ .cra_aead = {
+ .ivsize = AES_BLOCK_SIZE,
+ .maxauthsize = AES_BLOCK_SIZE,
+ .setkey = ccm_setkey,
+ .setauthsize = ccm_setauthsize,
+ .encrypt = ccm_encrypt,
+ .decrypt = ccm_decrypt,
+ }
+} };
static int __init aes_mod_init(void)
{
if (0) // TODO check for crypto extensions
return -ENODEV;
- return crypto_register_alg(&aes_alg);
+ return crypto_register_algs(aes_algs, ARRAY_SIZE(aes_algs));
}
static void __exit aes_mod_exit(void)
{
- crypto_unregister_alg(&aes_alg);
+ crypto_unregister_algs(aes_algs, ARRAY_SIZE(aes_algs));
}
module_init(aes_mod_init);
diff --git a/arch/arm64/crypto/aesce-ccm.S b/arch/arm64/crypto/aesce-ccm.S
new file mode 100644
index 0000000..35d09af
--- /dev/null
+++ b/arch/arm64/crypto/aesce-ccm.S
@@ -0,0 +1,159 @@
+/*
+ * linux/arch/arm64/crypto/aesce-ccm.S - AES-CCM transform for ARMv8 with
+ * Crypto Extensions
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+ .text
+ .arch armv8-a+crypto
+ .align 4
+
+ /*
+ * void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], long abytes,
+ * u8 const rk[], int rounds);
+ */
+ENTRY(ce_aes_ccm_auth_data)
+ ld1 {v0.16b}, [x0] /* load mac */
+ ld1 {v1.16b}, [x3] /* load first round key */
+0: mov x7, x4
+ add x6, x3, #16
+1: aese v0.16b, v1.16b
+ ld1 {v1.16b}, [x6], #16 /* load next round key */
+ subs x7, x7, #1
+ beq 2f
+ aesmc v0.16b, v0.16b
+ b 1b
+2: eor v0.16b, v0.16b, v1.16b /* final round */
+ subs x2, x2, #16 /* last data? */
+ bmi 3f
+ ld1 {v1.16b}, [x1], #16 /* load next input block */
+ eor v0.16b, v0.16b, v1.16b /* xor with mac */
+ beq 3f
+ ld1 {v1.16b}, [x3] /* reload first round key */
+ b 0b
+3: st1 {v0.16b}, [x0] /* store mac */
+ beq 5f
+ adds x2, x2, #16
+ beq 5f
+4: ldrb w7, [x1], #1
+ umov w6, v0.b[0]
+ eor w6, w6, w7
+ strb w6, [x0], #1
+ subs x2, x2, #1
+ beq 5f
+ ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */
+ b 4b
+5: ret
+ENDPROC(ce_aes_ccm_auth_data)
+
+ .macro aes_ccm_do_crypt,enc
+ stp x29, x30, [sp, #-32]! /* prologue */
+ mov x29, sp
+ stp x8, x9, [sp, #16]
+ ld1 {v0.16b}, [x5] /* load mac */
+ ld1 {v2.16b}, [x6] /* load ctr */
+ ld1 {v3.16b}, [x3] /* load first round key */
+ umov x8, v2.d[1]
+ rev x8, x8 /* keep swabbed ctr in reg */
+0: add x8, x8, #1
+ rev x9, x8
+ ins v2.d[1], x9 /* no carry */
+ mov x7, x4
+ add x9, x3, #16
+1: aese v0.16b, v3.16b
+ aese v2.16b, v3.16b
+ ld1 {v3.16b}, [x9], #16 /* load next round key */
+ subs x7, x7, #1
+ beq 2f
+ aesmc v0.16b, v0.16b
+ aesmc v2.16b, v2.16b
+ b 1b
+2: eor v2.16b, v2.16b, v3.16b /* final round enc */
+ eor v0.16b, v0.16b, v3.16b /* final round mac */
+ subs x2, x2, #16
+ bmi 3f
+ ld1 {v1.16b}, [x1], #16 /* load next input block */
+ .if \enc == 1
+ eor v0.16b, v0.16b, v1.16b /* xor mac with plaintext */
+ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
+ .else
+ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
+ eor v0.16b, v0.16b, v1.16b /* xor mac with plaintext */
+ .endif
+ st1 {v1.16b}, [x0], #16 /* write output block */
+ beq 5f
+ ld1 {v2.8b}, [x6] /* reload ctriv */
+ ld1 {v3.16b}, [x3] /* reload first round key */
+ b 0b
+3: st1 {v0.16b}, [x5] /* store mac */
+ add x2, x2, #16 /* process partial tail block */
+4: ldrb w9, [x1], #1 /* get 1 byte of input */
+ umov w6, v2.b[0] /* get top crypted ctr byte */
+ umov w7, v0.b[0] /* get top mac byte */
+ .if \enc == 1
+ eor w7, w7, w9
+ eor w9, w9, w6
+ .else
+ eor w9, w9, w6
+ eor w7, w7, w9
+ .endif
+ strb w9, [x0], #1 /* store out byte */
+ strb w7, [x5], #1 /* store mac byte */
+ subs x2, x2, #1
+ beq 6f
+ ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */
+ ext v2.16b, v2.16b, v2.16b, #1 /* shift out ctr byte */
+ b 4b
+5: rev x8, x8
+ st1 {v0.16b}, [x5] /* store mac */
+ str x8, [x6, #8] /* store lsb end of ctr (BE) */
+6: ldp x8, x9, [sp, #16] /* epilogue */
+ ldp x29, x30, [sp], #32
+ ret
+ .endm
+
+ /*
+ * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], long cbytes,
+ * u8 const rk[], int rounds, u8 mac[],
+ * u8 ctr[]);
+ * void ce_aes_ccm_decrypt(u8 out[], u8 const in[], long cbytes,
+ * u8 const rk[], int rounds, u8 mac[],
+ * u8 ctr[]);
+ */
+ENTRY(ce_aes_ccm_encrypt)
+ aes_ccm_do_crypt 1
+ENDPROC(ce_aes_ccm_encrypt)
+
+ENTRY(ce_aes_ccm_decrypt)
+ aes_ccm_do_crypt 0
+ENDPROC(ce_aes_ccm_decrypt)
+
+ /*
+ * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[],
+ * long rounds);
+ */
+ENTRY(ce_aes_ccm_final)
+ ld1 {v0.16b}, [x0] /* load mac */
+ ld1 {v3.16b}, [x2], #16 /* load first round key */
+ ld1 {v2.16b}, [x1] /* load 1st ctriv */
+0: aese v0.16b, v3.16b
+ aese v2.16b, v3.16b
+ ld1 {v3.16b}, [x2], #16 /* load next round key */
+ subs x3, x3, #1
+ beq 1f
+ aesmc v0.16b, v0.16b
+ aesmc v2.16b, v2.16b
+ b 0b
+1: eor v2.16b, v2.16b, v3.16b /* final round enc */
+ eor v0.16b, v0.16b, v3.16b /* final round mac */
+ eor v0.16b, v0.16b, v2.16b /* en-/decrypt the mac */
+ st1 {v0.16b}, [x0] /* store result */
+ ret
+ENDPROC(ce_aes_ccm_final)
--
1.8.1.2
* [RFC PATCH 5/5] mac80211: Use CCM crypto driver for CCMP
2013-10-07 12:12 [RFC PATCH 0/5] ARM64: NEON/CE based crypto in atomic contexts Ard Biesheuvel
` (3 preceding siblings ...)
2013-10-07 12:12 ` [RFC PATCH 4/5] ARM64: add Crypto Extensions based synchronous AES in CCM mode Ard Biesheuvel
@ 2013-10-07 12:12 ` Ard Biesheuvel
4 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 12:12 UTC (permalink / raw)
To: linux-arm-kernel
Let the crypto layer supply the CCM implementation rather than
coding it directly on top of the core AES cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
net/mac80211/Kconfig | 1 +
net/mac80211/aes_ccm.c | 159 +++++++++++++++----------------------------------
net/mac80211/aes_ccm.h | 8 +--
net/mac80211/key.h | 2 +-
net/mac80211/wpa.c | 21 ++++---
5 files changed, 64 insertions(+), 127 deletions(-)
diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig
index 62535fe..dc31ec3 100644
--- a/net/mac80211/Kconfig
+++ b/net/mac80211/Kconfig
@@ -4,6 +4,7 @@ config MAC80211
select CRYPTO
select CRYPTO_ARC4
select CRYPTO_AES
+ select CRYPTO_CCM
select CRC32
select AVERAGE
---help---
diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c
index be7614b9..c17f3a3 100644
--- a/net/mac80211/aes_ccm.c
+++ b/net/mac80211/aes_ccm.c
@@ -17,134 +17,71 @@
#include "key.h"
#include "aes_ccm.h"
-static void aes_ccm_prepare(struct crypto_cipher *tfm, u8 *scratch, u8 *a)
-{
- int i;
- u8 *b_0, *aad, *b, *s_0;
-
- b_0 = scratch + 3 * AES_BLOCK_SIZE;
- aad = scratch + 4 * AES_BLOCK_SIZE;
- b = scratch;
- s_0 = scratch + AES_BLOCK_SIZE;
-
- crypto_cipher_encrypt_one(tfm, b, b_0);
-
- /* Extra Authenticate-only data (always two AES blocks) */
- for (i = 0; i < AES_BLOCK_SIZE; i++)
- aad[i] ^= b[i];
- crypto_cipher_encrypt_one(tfm, b, aad);
-
- aad += AES_BLOCK_SIZE;
-
- for (i = 0; i < AES_BLOCK_SIZE; i++)
- aad[i] ^= b[i];
- crypto_cipher_encrypt_one(tfm, a, aad);
-
- /* Mask out bits from auth-only-b_0 */
- b_0[0] &= 0x07;
-
- /* S_0 is used to encrypt T (= MIC) */
- b_0[14] = 0;
- b_0[15] = 0;
- crypto_cipher_encrypt_one(tfm, s_0, b_0);
-}
-
-
-void ieee80211_aes_ccm_encrypt(struct crypto_cipher *tfm, u8 *scratch,
+void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *scratch,
u8 *data, size_t data_len,
u8 *cdata, u8 *mic)
{
- int i, j, last_len, num_blocks;
- u8 *pos, *cpos, *b, *s_0, *e, *b_0;
+ struct scatterlist aad, pt, ct;
+ struct aead_request req;
+ u8 *iv = scratch + 3 * AES_BLOCK_SIZE; /* b0 */
- b = scratch;
- s_0 = scratch + AES_BLOCK_SIZE;
- e = scratch + 2 * AES_BLOCK_SIZE;
- b_0 = scratch + 3 * AES_BLOCK_SIZE;
+ sg_init_one(&pt, data, data_len);
+ sg_init_one(&ct, cdata, data_len + IEEE80211_CCMP_MIC_LEN);
+ sg_init_one(&aad, scratch + 4 * AES_BLOCK_SIZE, 2 * AES_BLOCK_SIZE - 2);
- num_blocks = DIV_ROUND_UP(data_len, AES_BLOCK_SIZE);
- last_len = data_len % AES_BLOCK_SIZE;
- aes_ccm_prepare(tfm, scratch, b);
+ aead_request_set_crypt(&req, &pt, &ct, data_len, iv);
+ aead_request_set_assoc(&req, &aad, 2 * AES_BLOCK_SIZE - 2);
- /* Process payload blocks */
- pos = data;
- cpos = cdata;
- for (j = 1; j <= num_blocks; j++) {
- int blen = (j == num_blocks && last_len) ?
- last_len : AES_BLOCK_SIZE;
-
- /* Authentication followed by encryption */
- for (i = 0; i < blen; i++)
- b[i] ^= pos[i];
- crypto_cipher_encrypt_one(tfm, b, b);
-
- b_0[14] = (j >> 8) & 0xff;
- b_0[15] = j & 0xff;
- crypto_cipher_encrypt_one(tfm, e, b_0);
- for (i = 0; i < blen; i++)
- *cpos++ = *pos++ ^ e[i];
- }
-
- for (i = 0; i < IEEE80211_CCMP_MIC_LEN; i++)
- mic[i] = b[i] ^ s_0[i];
+ crypto_aead_encrypt(&req);
}
-
-int ieee80211_aes_ccm_decrypt(struct crypto_cipher *tfm, u8 *scratch,
+int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *scratch,
u8 *cdata, size_t data_len, u8 *mic, u8 *data)
{
- int i, j, last_len, num_blocks;
- u8 *pos, *cpos, *b, *s_0, *a, *b_0;
-
- b = scratch;
- s_0 = scratch + AES_BLOCK_SIZE;
- a = scratch + 2 * AES_BLOCK_SIZE;
- b_0 = scratch + 3 * AES_BLOCK_SIZE;
-
- num_blocks = DIV_ROUND_UP(data_len, AES_BLOCK_SIZE);
- last_len = data_len % AES_BLOCK_SIZE;
- aes_ccm_prepare(tfm, scratch, a);
+ struct scatterlist aad, pt, ct;
+ struct aead_request req;
+ u8 *iv = scratch + 3 * AES_BLOCK_SIZE; /* b0 */
- /* Process payload blocks */
- cpos = cdata;
- pos = data;
- for (j = 1; j <= num_blocks; j++) {
- int blen = (j == num_blocks && last_len) ?
- last_len : AES_BLOCK_SIZE;
+ sg_init_one(&pt, data, data_len);
+ sg_init_one(&ct, cdata, data_len + IEEE80211_CCMP_MIC_LEN);
+ sg_init_one(&aad, scratch + 4 * AES_BLOCK_SIZE, 2 * AES_BLOCK_SIZE - 2);
- /* Decryption followed by authentication */
- b_0[14] = (j >> 8) & 0xff;
- b_0[15] = j & 0xff;
- crypto_cipher_encrypt_one(tfm, b, b_0);
- for (i = 0; i < blen; i++) {
- *pos = *cpos++ ^ b[i];
- a[i] ^= *pos++;
- }
- crypto_cipher_encrypt_one(tfm, a, a);
- }
+ aead_request_set_crypt(&req, &ct, &pt, data_len, iv);
+ aead_request_set_assoc(&req, &aad, 2 * AES_BLOCK_SIZE - 2);
- for (i = 0; i < IEEE80211_CCMP_MIC_LEN; i++) {
- if ((mic[i] ^ s_0[i]) != a[i])
- return -1;
- }
-
- return 0;
+ return crypto_aead_decrypt(&req);
}
-
-struct crypto_cipher *ieee80211_aes_key_setup_encrypt(const u8 key[])
+struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[])
{
- struct crypto_cipher *tfm;
-
- tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
- if (!IS_ERR(tfm))
- crypto_cipher_setkey(tfm, key, WLAN_KEY_LEN_CCMP);
-
- return tfm;
+ struct crypto_aead *tfm;
+ int err;
+
+ tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC);
+ if (IS_ERR(tfm))
+ return tfm;
+
+ /* HACK: we use an auto variable for aead_request in the functions above
+ * but this only works if the reqsize is 0, i.e., the aead alg does not
+ * need additional space in the req struct to track its operations.
+ * This is a proof of concept for the Crypto Extensions based AES/CCM,
+ * so for now we just hack around it.
+ */
+ err = -EINVAL;
+ if (crypto_aead_reqsize(tfm))
+ goto out;
+
+ err = crypto_aead_setkey(tfm, key, WLAN_KEY_LEN_CCMP);
+ if (!err)
+ err = crypto_aead_setauthsize(tfm, IEEE80211_CCMP_MIC_LEN);
+ if (!err)
+ return tfm;
+out:
+ crypto_free_aead(tfm);
+ return ERR_PTR(err);
}
-
-void ieee80211_aes_key_free(struct crypto_cipher *tfm)
+void ieee80211_aes_key_free(struct crypto_aead *tfm)
{
- crypto_free_cipher(tfm);
+ crypto_free_aead(tfm);
}
diff --git a/net/mac80211/aes_ccm.h b/net/mac80211/aes_ccm.h
index 5b7d744..797f3f0 100644
--- a/net/mac80211/aes_ccm.h
+++ b/net/mac80211/aes_ccm.h
@@ -12,13 +12,13 @@
#include <linux/crypto.h>
-struct crypto_cipher *ieee80211_aes_key_setup_encrypt(const u8 key[]);
-void ieee80211_aes_ccm_encrypt(struct crypto_cipher *tfm, u8 *scratch,
+struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[]);
+void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *scratch,
u8 *data, size_t data_len,
u8 *cdata, u8 *mic);
-int ieee80211_aes_ccm_decrypt(struct crypto_cipher *tfm, u8 *scratch,
+int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *scratch,
u8 *cdata, size_t data_len,
u8 *mic, u8 *data);
-void ieee80211_aes_key_free(struct crypto_cipher *tfm);
+void ieee80211_aes_key_free(struct crypto_aead *tfm);
#endif /* AES_CCM_H */
diff --git a/net/mac80211/key.h b/net/mac80211/key.h
index 036d57e..aaae0ed 100644
--- a/net/mac80211/key.h
+++ b/net/mac80211/key.h
@@ -83,7 +83,7 @@ struct ieee80211_key {
* Management frames.
*/
u8 rx_pn[IEEE80211_NUM_TIDS + 1][IEEE80211_CCMP_PN_LEN];
- struct crypto_cipher *tfm;
+ struct crypto_aead *tfm;
u32 replays; /* dot11RSNAStatsCCMPReplays */
} ccmp;
struct {
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index c9edfcb..c3df1d7 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -343,7 +343,7 @@ static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *scratch,
data_len -= IEEE80211_CCMP_MIC_LEN;
/* First block, b_0 */
- b_0[0] = 0x59; /* flags: Adata: 1, M: 011, L: 001 */
+ b_0[0] = 0x1; /* L == 1 */
/* Nonce: Nonce Flags | A2 | PN
* Nonce Flags: Priority (b0..b3) | Management (b4) | Reserved (b5..b7)
*/
@@ -355,21 +355,20 @@ static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *scratch,
/* AAD (extra authenticate-only data) / masked 802.11 header
* FC | A1 | A2 | A3 | SC | [A4] | [QC] */
- put_unaligned_be16(len_a, &aad[0]);
- put_unaligned(mask_fc, (__le16 *)&aad[2]);
- memcpy(&aad[4], &hdr->addr1, 3 * ETH_ALEN);
+ put_unaligned(mask_fc, (__le16 *)&aad[0]);
+ memcpy(&aad[2], &hdr->addr1, 3 * ETH_ALEN);
/* Mask Seq#, leave Frag# */
- aad[22] = *((u8 *) &hdr->seq_ctrl) & 0x0f;
- aad[23] = 0;
+ aad[20] = *((u8 *) &hdr->seq_ctrl) & 0x0f;
+ aad[21] = 0;
if (a4_included) {
- memcpy(&aad[24], hdr->addr4, ETH_ALEN);
- aad[30] = qos_tid;
- aad[31] = 0;
+ memcpy(&aad[22], hdr->addr4, ETH_ALEN);
+ aad[28] = qos_tid;
+ aad[29] = 0;
} else {
- memset(&aad[24], 0, ETH_ALEN + IEEE80211_QOS_CTL_LEN);
- aad[24] = qos_tid;
+ memset(&aad[22], 0, ETH_ALEN + IEEE80211_QOS_CTL_LEN);
+ aad[22] = qos_tid;
}
}
--
1.8.1.2
* [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions
2013-10-07 12:12 ` [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions Ard Biesheuvel
@ 2013-10-07 16:18 ` Catalin Marinas
2013-10-07 16:28 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2013-10-07 16:18 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Oct 07, 2013 at 01:12:28PM +0100, Ard Biesheuvel wrote:
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
It's missing a commit message with an explanation. In the meantime, NAK
;)
--
Catalin
* [RFC PATCH 2/5] ARM64: add quick-n-dirty emulation for AES instructions
2013-10-07 16:18 ` Catalin Marinas
@ 2013-10-07 16:28 ` Ard Biesheuvel
0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 16:28 UTC (permalink / raw)
To: linux-arm-kernel
On 7 October 2013 18:18, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Oct 07, 2013 at 01:12:28PM +0100, Ard Biesheuvel wrote:
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> It's missing a commit message with an explanation. In the meantime, NAK
> ;)
>
Well, I just included it for completeness' sake, not for upstreaming,
but if all it takes to get you to accept it is a commit message, then
maybe I will reconsider :-)
* [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions
2013-10-07 12:12 ` [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions Ard Biesheuvel
@ 2013-10-07 22:03 ` Nicolas Pitre
2013-10-07 22:44 ` Ard Biesheuvel
0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Pitre @ 2013-10-07 22:03 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 7 Oct 2013, Ard Biesheuvel wrote:
> Stack/unstack the bottom 4 NEON registers on exception entry/exit
> so we can use them in places where we are not allowed to sleep.
If you want to use Neon in places where you're not allowed to sleep,
this also means you are not going to be scheduled away unexpectedly.
Why not simply save/restore those Neon regs locally instead? Same goes
for interrupt context.
Nicolas
* [RFC PATCH 1/5] ARM64: allow limited use of some NEON registers in exceptions
2013-10-07 22:03 ` Nicolas Pitre
@ 2013-10-07 22:44 ` Ard Biesheuvel
0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2013-10-07 22:44 UTC (permalink / raw)
To: linux-arm-kernel
On 8 October 2013 00:03, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Mon, 7 Oct 2013, Ard Biesheuvel wrote:
>
>> Stack/unstack the bottom 4 NEON registers on exception entry/exit
>> so we can use them in places where we are not allowed to sleep.
>
> If you want to use Neon in places where you're not allowed to sleep,
> this also means you are not going to be scheduled away unexpectedly.
> Why not simply save/restore those Neon regs locally instead? Same goes
> for interrupt context.
>
Hmm, I guess it is that simple: if you are certain you will not be
interrupted between stacking and unstacking, you can basically do
whatever you like at any time.
That is quite good news, actually, if that means there is no penalty
in general for allowing NEON in atomic context in some cases.
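Something along these lines, I suppose (rough, untested sketch; assumes that
nothing running in between can touch the NEON registers):

	u8 q_save[4 * 16] __aligned(16);

	/* not preemptible here, so a local save/restore of q0 - q3 around
	 * the Crypto Extensions code should be all that is needed
	 */
	asm volatile("st1 {v0.16b-v3.16b}, [%0]" : : "r"(q_save) : "memory");
	/* ... use q0 - q3 for aese/aesd/aesmc/aesimc ... */
	asm volatile("ld1 {v0.16b-v3.16b}, [%0]" : : "r"(q_save) : "memory");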
--
Ard.