Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH 1/2] dt-bindings: zx296718-clk: add compatible for audio clock controller
From: Rob Herring @ 2016-12-12 17:10 UTC
  To: linux-arm-kernel
In-Reply-To: <1481189157-8995-1-git-send-email-shawnguo@kernel.org>

On Thu, Dec 08, 2016 at 05:25:56PM +0800, Shawn Guo wrote:
> From: Shawn Guo <shawn.guo@linaro.org>
> 
> Add the compatible string for the zx296718 audio clock controller.
> 
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> ---
>  Documentation/devicetree/bindings/clock/zx296718-clk.txt | 3 +++
>  1 file changed, 3 insertions(+)

Acked-by: Rob Herring <robh@kernel.org>

* [PATCH] dt-bindings: Document the hi3660 reset bindings
From: Rob Herring @ 2016-12-12 17:20 UTC
  To: linux-arm-kernel
In-Reply-To: <1481249504-7942-1-git-send-email-zhangfei.gao@linaro.org>

On Fri, Dec 09, 2016 at 10:11:44AM +0800, Zhangfei Gao wrote:
> Add DT bindings documentation for hi3660 SoC reset controller.
> 
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
>  .../bindings/reset/hisilicon,hi3660-reset.txt      | 43 ++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.txt

Acked-by: Rob Herring <robh@kernel.org>

* [PATCH] clk: bcm: Fix 'maybe-uninitialized' warning in bcm2835_clock_choose_div_and_prate()
From: Eric Anholt @ 2016-12-12 17:24 UTC
  To: linux-arm-kernel
In-Reply-To: <1481529653-28133-1-git-send-email-boris.brezillon@free-electrons.com>

Boris Brezillon <boris.brezillon@free-electrons.com> writes:

> best_rate is reported as potentially uninitialized by gcc.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> Fixes: 155e8b3b0ee3 ("clk: bcm: Support rate change propagation on bcm2835 clocks")
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>

Reviewed-by: Eric Anholt <eric@anholt.net>

* [PATCH 2/4] dt-bindings: mfd: Remove TPS65217 interrupts
From: Rob Herring @ 2016-12-12 17:25 UTC
  To: linux-arm-kernel
In-Reply-To: <20161209062833.5768-3-woogyom.kim@gmail.com>

On Fri, Dec 09, 2016 at 03:28:31PM +0900, Milo Kim wrote:
> Interrupt numbers are from the datasheet, so no need to keep them in
> the ABI. Use the number in the DT file.

I don't see the purpose of ripping this out. The headers have always
been for convenience, regardless of whether the values come from the
datasheet or not.

> Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
> ---
>  arch/arm/boot/dts/am335x-bone-common.dtsi |  8 +++-----
>  include/dt-bindings/mfd/tps65217.h        | 26 --------------------------
>  2 files changed, 3 insertions(+), 31 deletions(-)
>  delete mode 100644 include/dt-bindings/mfd/tps65217.h

* [PATCH 3/4] dt-bindings: power/supply: Update TPS65217 properties
From: Rob Herring @ 2016-12-12 17:26 UTC
  To: linux-arm-kernel
In-Reply-To: <20161209062833.5768-4-woogyom.kim@gmail.com>

On Fri, Dec 09, 2016 at 03:28:32PM +0900, Milo Kim wrote:
> Add interrupt specifiers for USB and AC charger input. Interrupt numbers
> are from the datasheet.
> Fix wrong property for compatible string.
> 
> Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
> ---
>  .../devicetree/bindings/power/supply/tps65217_charger.txt          | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Acked-by: Rob Herring <robh@kernel.org>

* [PATCH 4/4] dt-bindings: input: Specify the interrupt number of TPS65217 power button
From: Rob Herring @ 2016-12-12 17:27 UTC
  To: linux-arm-kernel
In-Reply-To: <20161209062833.5768-5-woogyom.kim@gmail.com>

On Fri, Dec 09, 2016 at 03:28:33PM +0900, Milo Kim wrote:
> Specify the power button interrupt number which is from the datasheet.
> 
> Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
> ---
>  Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Acked-by: Rob Herring <robh@kernel.org>

* [PATCH] ARM: dts: vexpress: Support GICC_DIR operations
From: Marc Zyngier @ 2016-12-12 17:35 UTC
  To: linux-arm-kernel
In-Reply-To: <20161210201351.25894-1-christoffer.dall@linaro.org>

[+Sudeep]

On 10/12/16 20:13, Christoffer Dall wrote:
> The GICv2 CPU interface registers span across 8K, not 4K as indicated in
> the DT.  Only the GICC_DIR register is located after the initial 4K
> boundary, leaving a functional system but without support for separately
> EOI'ing and deactivating interrupts.
> 
> After this change the system supports split priority drop and interrupt
> deactivation.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
> index 0205c97..2e0cf39 100644
> --- a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
> +++ b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
> @@ -126,7 +126,7 @@
>  		#address-cells = <0>;
>  		interrupt-controller;
>  		reg = <0 0x2c001000 0 0x1000>,
> -		      <0 0x2c002000 0 0x1000>,
> +		      <0 0x2c002000 0 0x2000>,
>  		      <0 0x2c004000 0 0x2000>,
>  		      <0 0x2c006000 0 0x2000>;
>  		interrupts = <1 9 0xf04>;
> 
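
For illustration only, the quoted reasoning can be checked with a few lines
of C. This is a minimal sketch, not kernel code: reg_in_region() is a
hypothetical helper, and the offsets GICC_EOIR = 0x0010 and GICC_DIR = 0x1000
are assumed from the GICv2 architecture specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICv2 CPU interface register offsets (assumed from the GICv2 spec). */
#define GICC_EOIR	0x0010u
#define GICC_DIR	0x1000u

/*
 * Hypothetical helper: true if a 32-bit register at 'offset' lies
 * entirely inside a mapped region of 'size' bytes.
 */
static bool reg_in_region(uint32_t offset, uint32_t size)
{
	return (uint64_t)offset + sizeof(uint32_t) <= size;
}
```

With the old 0x1000 size, GICC_EOIR is inside the region but GICC_DIR is
not; with the 0x2000 size from the patch, both are covered.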

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

* [PATCH] crypto: arm64/aes: reimplement bit-sliced ARM/NEON implementation for arm64
From: Ard Biesheuvel @ 2016-12-12 17:45 UTC
  To: linux-arm-kernel

This is a reimplementation of the NEON version of the bit-sliced AES
algorithm. This code is heavily based on Andy Polyakov's OpenSSL version
for ARM, which is also available in the kernel. This is an alternative to
the existing NEON implementation for arm64 authored by me, which suffers
from poor performance due to its reliance on the pathologically slow
four-register variant of the tbl/tbx NEON instructions.

This version is about 30% (*) faster than the generic C code, but only in
cases where the input can be 8x interleaved (this is a fundamental property
of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR
are implemented. (The significance of ECB is that it could potentially be
used by other chaining modes.)
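
As a rough illustration of why the input has to be 8x interleaved, here is
a minimal C sketch of bit slicing at byte granularity. It is a toy under
stated assumptions, not the NEON code in this patch: bitslice8(),
unbitslice8() and xor_const_sliced() are hypothetical names, and each
"plane" holds one bit position of 8 bytes, whereas the real code keeps one
bit position of 8 whole blocks in each 128-bit register.

```c
#include <stdint.h>

/*
 * Transpose 8 input bytes into 8 bit planes:
 * bit j of plane[i] is bit i of in[j].
 */
static void bitslice8(const uint8_t in[8], uint8_t plane[8])
{
	for (int i = 0; i < 8; i++) {
		uint8_t p = 0;

		for (int j = 0; j < 8; j++)
			p |= (uint8_t)(((in[j] >> i) & 1u) << j);
		plane[i] = p;
	}
}

/* An 8x8 bit transposition is its own inverse. */
static void unbitslice8(const uint8_t plane[8], uint8_t out[8])
{
	bitslice8(plane, out);
}

/*
 * XOR the constant c into all 8 sliced bytes: for every set bit i
 * of c, invert plane i, which flips bit i of all 8 inputs at once.
 */
static void xor_const_sliced(uint8_t plane[8], uint8_t c)
{
	for (int i = 0; i < 8; i++)
		if ((c >> i) & 1u)
			plane[i] ^= 0xff;
}
```

Once the data is sliced, one operation on a plane touches the same bit of
all 8 inputs, so a batch of fewer than 8 inputs costs the same work as a
full one.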

* Measured on Cortex-A57. Note that this is still an order of magnitude
  slower than the implementations that use the dedicated AES instructions
  introduced in ARMv8, but those are part of an optional extension, and so
  it is good to have a fallback.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig           |   6 +
 arch/arm64/crypto/Makefile          |   3 +
 arch/arm64/crypto/aes-neonbs-core.S | 905 ++++++++++++++++++++++++++++++++++++
 arch/arm64/crypto/aes-neonbs-glue.c | 300 ++++++++++++
 4 files changed, 1214 insertions(+)
 create mode 100644 arch/arm64/crypto/aes-neonbs-core.S
 create mode 100644 arch/arm64/crypto/aes-neonbs-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 450a85df041a..cd0e7a6146b7 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -72,4 +72,10 @@ config CRYPTO_CRC32_ARM64
 	depends on ARM64
 	select CRYPTO_HASH
 
+config CRYPTO_AES_NEON_BS
+	tristate "AES in ECB/CTR/XTS modes using bit-sliced NEON algorithm"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_AES
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aa8888d7b744..11d20714ec48 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -41,6 +41,9 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
 sha512-arm64-y := sha512-glue.o sha512-core.o
 
+obj-$(CONFIG_CRYPTO_AES_NEON_BS) += aes-neon-bs.o
+aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
+
 AFLAGS_aes-ce.o		:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o	:= -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S
new file mode 100644
index 000000000000..d027c276cc75
--- /dev/null
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -0,0 +1,905 @@
+/*
+ * Bit sliced AES using NEON instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * The algorithm implemented here is described in detail by the paper
+ * 'Faster and Timing-Attack Resistant AES-GCM' by Emilia Kaesper and
+ * Peter Schwabe (https://eprint.iacr.org/2009/129.pdf)
+ *
+ * This implementation is based primarily on the OpenSSL implementation
+ * for 32-bit ARM written by Andy Polyakov <appro@openssl.org>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+
+	rounds		.req	x11
+	bskey		.req	x12
+
+	.macro		in_bs_ch, b0, b1, b2, b3, b4, b5, b6, b7
+	eor		\b2, \b2, \b1
+	eor		\b5, \b5, \b6
+	eor		\b3, \b3, \b0
+	eor		\b6, \b6, \b2
+	eor		\b5, \b5, \b0
+	eor		\b6, \b6, \b3
+	eor		\b3, \b3, \b7
+	eor		\b7, \b7, \b5
+	eor		\b3, \b3, \b4
+	eor		\b4, \b4, \b5
+	eor		\b2, \b2, \b7
+	eor		\b3, \b3, \b1
+	eor		\b1, \b1, \b5
+	.endm
+
+	.macro		out_bs_ch, b0, b1, b2, b3, b4, b5, b6, b7
+	eor		\b0, \b0, \b6
+	eor		\b1, \b1, \b4
+	eor		\b4, \b4, \b6
+	eor		\b2, \b2, \b0
+	eor		\b6, \b6, \b1
+	eor		\b1, \b1, \b5
+	eor		\b5, \b5, \b3
+	eor		\b3, \b3, \b7
+	eor		\b7, \b7, \b5
+	eor		\b2, \b2, \b5
+	eor		\b4, \b4, \b7
+	.endm
+
+	.macro		inv_in_bs_ch, b6, b1, b2, b4, b7, b0, b3, b5
+	eor		\b1, \b1, \b7
+	eor		\b4, \b4, \b7
+	eor		\b7, \b7, \b5
+	eor		\b1, \b1, \b3
+	eor		\b2, \b2, \b5
+	eor		\b3, \b3, \b7
+	eor		\b6, \b6, \b1
+	eor		\b2, \b2, \b0
+	eor		\b5, \b5, \b3
+	eor		\b4, \b4, \b6
+	eor		\b0, \b0, \b6
+	eor		\b1, \b1, \b4
+	.endm
+
+	.macro		inv_out_bs_ch, b6, b5, b0, b3, b7, b1, b4, b2
+	eor		\b1, \b1, \b5
+	eor		\b2, \b2, \b7
+	eor		\b3, \b3, \b1
+	eor		\b4, \b4, \b5
+	eor		\b7, \b7, \b5
+	eor		\b3, \b3, \b4
+	eor 		\b5, \b5, \b0
+	eor		\b3, \b3, \b7
+	eor		\b6, \b6, \b2
+	eor		\b2, \b2, \b1
+	eor		\b6, \b6, \b3
+	eor		\b3, \b3, \b0
+	eor		\b5, \b5, \b6
+	.endm
+
+	.macro		mul_gf4, x0, x1, y0, y1, t0, t1
+	eor 		\t0, \y0, \y1
+	and		\t0, \t0, \x0
+	eor		\x0, \x0, \x1
+	and		\t1, \x1, \y0
+	and		\x0, \x0, \y1
+	eor		\x1, \t1, \t0
+	eor		\x0, \x0, \t1
+	.endm
+
+	.macro		mul_gf4_n, x0, x1, y0, y1, t0
+	eor		\t0, \y0, \y1
+	and		\t0, \t0, \x0
+	eor		\x0, \x0, \x1
+	and		\x1, \x1, \y0
+	and		\x0, \x0, \y1
+	eor		\x1, \x1, \x0
+	eor		\x0, \x0, \t0
+	.endm
+
+	.macro		mul_gf4_n_gf4, x0, x1, y0, y1, t0, x2, x3, y2, y3, t1
+	eor		\t0, \y0, \y1
+	eor 		\t1, \y2, \y3
+	and		\t0, \t0, \x0
+	and		\t1, \t1, \x2
+	eor		\x0, \x0, \x1
+	eor		\x2, \x2, \x3
+	and		\x1, \x1, \y0
+	and		\x3, \x3, \y2
+	and		\x0, \x0, \y1
+	and		\x2, \x2, \y3
+	eor		\x1, \x1, \x0
+	eor		\x2, \x2, \x3
+	eor		\x0, \x0, \t0
+	eor		\x3, \x3, \t1
+	.endm
+
+	.macro		mul_gf16_2, x0, x1, x2, x3, x4, x5, x6, x7, \
+				    y0, y1, y2, y3, t0, t1, t2, t3
+	eor		\t0, \x0, \x2
+	eor		\t1, \x1, \x3
+	mul_gf4  	\x0, \x1, \y0, \y1, \t2, \t3
+	eor		\y0, \y0, \y2
+	eor		\y1, \y1, \y3
+	mul_gf4_n_gf4	\t0, \t1, \y0, \y1, \t3, \x2, \x3, \y2, \y3, \t2
+	eor		\x0, \x0, \t0
+	eor		\x2, \x2, \t0
+	eor		\x1, \x1, \t1
+	eor		\x3, \x3, \t1
+	eor		\t0, \x4, \x6
+	eor		\t1, \x5, \x7
+	mul_gf4_n_gf4	\t0, \t1, \y0, \y1, \t3, \x6, \x7, \y2, \y3, \t2
+	eor		\y0, \y0, \y2
+	eor		\y1, \y1, \y3
+	mul_gf4  	\x4, \x5, \y0, \y1, \t2, \t3
+	eor		\x4, \x4, \t0
+	eor		\x6, \x6, \t0
+	eor		\x5, \x5, \t1
+	eor		\x7, \x7, \t1
+	.endm
+
+	.macro		inv_gf256, x0, x1, x2, x3, x4, x5, x6, x7, \
+				   t0, t1, t2, t3, s0, s1, s2, s3
+	eor		\t3, \x4, \x6
+	eor		\t2, \x5, \x7
+	eor		\t1, \x1, \x3
+	eor		\s1, \x7, \x6
+	mov		\t0, \t2
+	eor		\s0, \x0, \x2
+	orr		\t2, \t2, \t1
+	eor		\s3, \t3, \t0
+	and		\s2, \t3, \s0
+	orr		\t3, \t3, \s0
+	eor		\s0, \s0, \t1
+	and		\t0, \t0, \t1
+	eor		\t1, \x3, \x2
+	and		\s3, \s3, \s0
+	and		\s1, \s1, \t1
+	eor		\t1, \x4, \x5
+	eor		\s0, \x1, \x0
+	eor		\t3, \t3, \s1
+	eor		\t2, \t2, \s1
+	and		\s1, \t1, \s0
+	orr		\t1, \t1, \s0
+	eor		\t3, \t3, \s3
+	eor		\t0, \t0, \s1
+	eor		\t2, \t2, \s2
+	eor		\t1, \t1, \s3
+	eor		\t0, \t0, \s2
+	and		\s0, \x7, \x3
+	eor		\t1, \t1, \s2
+	and		\s1, \x6, \x2
+	and		\s2, \x5, \x1
+	orr		\s3, \x4, \x0
+	eor		\t3, \t3, \s0
+	eor		\t1, \t1, \s2
+	eor		\t0, \t0, \s3
+	eor		\t2, \t2, \s1
+	and		\s2, \t3, \t1
+	mov		\s0, \t0
+	eor		\s1, \t2, \s2
+	eor		\s3, \t0, \s2
+	eor		\s2, \t0, \s2
+	bsl		\s1, \t1, \t0
+	bsl		\s3, \t3, \t2
+	eor		\t3, \t3, \t2
+	bsl		\s0, \s1, \s2
+	bsl		\t0, \s2, \s1
+	and		\s2, \s0, \s3
+	eor		\t1, \t1, \t0
+	eor		\s2, \s2, \t3
+	mul_gf16_2	\x0, \x1, \x2, \x3, \x4, \x5, \x6, \x7, \
+			\s3, \s2, \s1, \t1, \s0, \t0, \t2, \t3
+	.endm
+
+	.macro		sbox, b0, b1, b2, b3, b4, b5, b6, b7, \
+			      t0, t1, t2, t3, s0, s1, s2, s3
+	in_bs_ch	\b0\().16b, \b1\().16b, \b2\().16b, \b3\().16b, \
+			\b4\().16b, \b5\().16b, \b6\().16b, \b7\().16b
+	inv_gf256	\b6\().16b, \b5\().16b, \b0\().16b, \b3\().16b, \
+			\b7\().16b, \b1\().16b, \b4\().16b, \b2\().16b, \
+			\t0\().16b, \t1\().16b, \t2\().16b, \t3\().16b, \
+			\s0\().16b, \s1\().16b, \s2\().16b, \s3\().16b
+	out_bs_ch	\b7\().16b, \b1\().16b, \b4\().16b, \b2\().16b, \
+			\b6\().16b, \b5\().16b, \b0\().16b, \b3\().16b
+	.endm
+
+	.macro		inv_sbox, b0, b1, b2, b3, b4, b5, b6, b7, \
+				  t0, t1, t2, t3, s0, s1, s2, s3
+	inv_in_bs_ch	\b0\().16b, \b1\().16b, \b2\().16b, \b3\().16b, \
+			\b4\().16b, \b5\().16b, \b6\().16b, \b7\().16b
+	inv_gf256	\b5\().16b, \b1\().16b, \b2\().16b, \b6\().16b, \
+			\b3\().16b, \b7\().16b, \b0\().16b, \b4\().16b, \
+			\t0\().16b, \t1\().16b, \t2\().16b, \t3\().16b, \
+			\s0\().16b, \s1\().16b, \s2\().16b, \s3\().16b
+	inv_out_bs_ch	\b3\().16b, \b7\().16b, \b0\().16b, \b4\().16b, \
+			\b5\().16b, \b1\().16b, \b2\().16b, \b6\().16b
+	.endm
+
+	.macro		enc_next_rk
+	ldp		q16, q17, [bskey], #32
+	ldp		q18, q19, [bskey], #32
+	ldp		q20, q21, [bskey], #32
+	ldp		q22, q23, [bskey], #32
+	.endm
+
+	.macro		dec_next_rk
+	ldp		q16, q17, [bskey, #-128]!
+	ldp		q18, q19, [bskey, #32]
+	ldp		q20, q21, [bskey, #64]
+	ldp		q22, q23, [bskey, #96]
+	.endm
+
+	.macro		add_round_key, x0, x1, x2, x3, x4, x5, x6, x7
+	eor		\x0\().16b, \x0\().16b, v16.16b
+	eor		\x1\().16b, \x1\().16b, v17.16b
+	eor		\x2\().16b, \x2\().16b, v18.16b
+	eor		\x3\().16b, \x3\().16b, v19.16b
+	eor		\x4\().16b, \x4\().16b, v20.16b
+	eor		\x5\().16b, \x5\().16b, v21.16b
+	eor		\x6\().16b, \x6\().16b, v22.16b
+	eor		\x7\().16b, \x7\().16b, v23.16b
+	.endm
+
+	.macro		shift_rows, x0, x1, x2, x3, x4, x5, x6, x7, mask
+	tbl		\x0\().16b, {\x0\().16b}, \mask\().16b
+	tbl		\x1\().16b, {\x1\().16b}, \mask\().16b
+	tbl		\x2\().16b, {\x2\().16b}, \mask\().16b
+	tbl		\x3\().16b, {\x3\().16b}, \mask\().16b
+	tbl		\x4\().16b, {\x4\().16b}, \mask\().16b
+	tbl		\x5\().16b, {\x5\().16b}, \mask\().16b
+	tbl		\x6\().16b, {\x6\().16b}, \mask\().16b
+	tbl		\x7\().16b, {\x7\().16b}, \mask\().16b
+	.endm
+
+	.macro		mix_cols, x0, x1, x2, x3, x4, x5, x6, x7, \
+				  t0, t1, t2, t3, t4, t5, t6, t7, inv
+	ext		\t0\().16b, \x0\().16b, \x0\().16b, #12
+	ext		\t1\().16b, \x1\().16b, \x1\().16b, #12
+	eor		\x0\().16b, \x0\().16b, \t0\().16b
+	ext		\t2\().16b, \x2\().16b, \x2\().16b, #12
+	eor		\x1\().16b, \x1\().16b, \t1\().16b
+	ext		\t3\().16b, \x3\().16b, \x3\().16b, #12
+	eor		\x2\().16b, \x2\().16b, \t2\().16b
+	ext		\t4\().16b, \x4\().16b, \x4\().16b, #12
+	eor		\x3\().16b, \x3\().16b, \t3\().16b
+	ext		\t5\().16b, \x5\().16b, \x5\().16b, #12
+	eor		\x4\().16b, \x4\().16b, \t4\().16b
+	ext		\t6\().16b, \x6\().16b, \x6\().16b, #12
+	eor		\x5\().16b, \x5\().16b, \t5\().16b
+	ext		\t7\().16b, \x7\().16b, \x7\().16b, #12
+	eor		\x6\().16b, \x6\().16b, \t6\().16b
+	eor		\t1\().16b, \t1\().16b, \x0\().16b
+	eor		\x7\().16b, \x7\().16b, \t7\().16b
+	ext		\x0\().16b, \x0\().16b, \x0\().16b, #8
+	eor		\t2\().16b, \t2\().16b, \x1\().16b
+	eor		\t0\().16b, \t0\().16b, \x7\().16b
+	eor		\t1\().16b, \t1\().16b, \x7\().16b
+	ext		\x1\().16b, \x1\().16b, \x1\().16b, #8
+	eor		\t5\().16b, \t5\().16b, \x4\().16b
+	eor		\x0\().16b, \x0\().16b, \t0\().16b
+	eor		\t6\().16b, \t6\().16b, \x5\().16b
+	eor		\x1\().16b, \x1\().16b, \t1\().16b
+	ext		\t0\().16b, \x4\().16b, \x4\().16b, #8
+	eor		\t4\().16b, \t4\().16b, \x3\().16b
+	ext		\t1\().16b, \x5\().16b, \x5\().16b, #8
+	eor		\t7\().16b, \t7\().16b, \x6\().16b
+	ext		\x4\().16b, \x3\().16b, \x3\().16b, #8
+	eor		\t3\().16b, \t3\().16b, \x2\().16b
+	ext		\x5\().16b, \x7\().16b, \x7\().16b, #8
+	eor		\t4\().16b, \t4\().16b, \x7\().16b
+	ext		\x3\().16b, \x6\().16b, \x6\().16b, #8
+	eor		\t3\().16b, \t3\().16b, \x7\().16b
+	ext		\x6\().16b, \x2\().16b, \x2\().16b, #8
+	eor		\x7\().16b, \t1\().16b, \t5\().16b
+	.ifb		\inv
+	eor		\x2\().16b, \t0\().16b, \t4\().16b
+	eor		\x4\().16b, \x4\().16b, \t3\().16b
+	eor		\x5\().16b, \x5\().16b, \t7\().16b
+	eor		\x3\().16b, \x3\().16b, \t6\().16b
+	eor		\x6\().16b, \x6\().16b, \t2\().16b
+	.else
+	eor		\t3\().16b, \t3\().16b, \x4\().16b
+	eor		\x5\().16b, \x5\().16b, \t7\().16b
+	eor		\x2\().16b, \x3\().16b, \t6\().16b
+	eor		\x3\().16b, \t0\().16b, \t4\().16b
+	eor		\x4\().16b, \x6\().16b, \t2\().16b
+	mov		\x6\().16b, \t3\().16b
+	.endif
+	.endm
+
+	.macro		inv_mix_cols, x0, x1, x2, x3, x4, x5, x6, x7, \
+				      t0, t1, t2, t3, t4, t5, t6, t7
+	ext		\t0\().16b, \x0\().16b, \x0\().16b, #8
+	ext		\t6\().16b, \x6\().16b, \x6\().16b, #8
+	ext		\t7\().16b, \x7\().16b, \x7\().16b, #8
+	eor		\t0\().16b, \t0\().16b, \x0\().16b
+	ext		\t1\().16b, \x1\().16b, \x1\().16b, #8
+	eor		\t6\().16b, \t6\().16b, \x6\().16b
+	ext		\t2\().16b, \x2\().16b, \x2\().16b, #8
+	eor		\t7\().16b, \t7\().16b, \x7\().16b
+	ext		\t3\().16b, \x3\().16b, \x3\().16b, #8
+	eor		\t1\().16b, \t1\().16b, \x1\().16b
+	ext		\t4\().16b, \x4\().16b, \x4\().16b, #8
+	eor		\t2\().16b, \t2\().16b, \x2\().16b
+	ext		\t5\().16b, \x5\().16b, \x5\().16b, #8
+	eor		\t3\().16b, \t3\().16b, \x3\().16b
+	eor		\t4\().16b, \t4\().16b, \x4\().16b
+	eor		\t5\().16b, \t5\().16b, \x5\().16b
+	eor		\x0\().16b, \x0\().16b, \t6\().16b
+	eor		\x1\().16b, \x1\().16b, \t6\().16b
+	eor		\x2\().16b, \x2\().16b, \t0\().16b
+	eor		\x4\().16b, \x4\().16b, \t2\().16b
+	eor		\x3\().16b, \x3\().16b, \t1\().16b
+	eor		\x1\().16b, \x1\().16b, \t7\().16b
+	eor		\x2\().16b, \x2\().16b, \t7\().16b
+	eor		\x4\().16b, \x4\().16b, \t6\().16b
+	eor		\x5\().16b, \x5\().16b, \t3\().16b
+	eor		\x3\().16b, \x3\().16b, \t6\().16b
+	eor		\x6\().16b, \x6\().16b, \t4\().16b
+	eor		\x4\().16b, \x4\().16b, \t7\().16b
+	eor		\x5\().16b, \x5\().16b, \t7\().16b
+	eor		\x7\().16b, \x7\().16b, \t5\().16b
+	mix_cols	\x0, \x1, \x2, \x3, \x4, \x5, \x6, \x7, \
+			\t0, \t1, \t2, \t3, \t4, \t5, \t6, \t7, 1
+	.endm
+
+	.macro		swapmove_2x, a0, b0, a1, b1, n, mask, t0, t1
+	ushr		\t0\().2d, \b0\().2d, #\n
+	ushr		\t1\().2d, \b1\().2d, #\n
+	eor		\t0\().16b, \t0\().16b, \a0\().16b
+	eor		\t1\().16b, \t1\().16b, \a1\().16b
+	and		\t0\().16b, \t0\().16b, \mask\().16b
+	and		\t1\().16b, \t1\().16b, \mask\().16b
+	eor		\a0\().16b, \a0\().16b, \t0\().16b
+	shl		\t0\().2d, \t0\().2d, #\n
+	eor		\a1\().16b, \a1\().16b, \t1\().16b
+	shl		\t1\().2d, \t1\().2d, #\n
+	eor		\b0\().16b, \b0\().16b, \t0\().16b
+	eor		\b1\().16b, \b1\().16b, \t1\().16b
+	.endm
+
+	.macro		bitslice, x7, x6, x5, x4, x3, x2, x1, x0, t0, t1, t2, t3
+	movi		\t0\().16b, #0x55
+	movi		\t1\().16b, #0x33
+	swapmove_2x	\x0, \x1, \x2, \x3, 1, \t0, \t2, \t3
+	swapmove_2x	\x4, \x5, \x6, \x7, 1, \t0, \t2, \t3
+	movi		\t0\().16b, #0x0f
+	swapmove_2x	\x0, \x2, \x1, \x3, 2, \t1, \t2, \t3
+	swapmove_2x	\x4, \x6, \x5, \x7, 2, \t1, \t2, \t3
+	swapmove_2x	\x0, \x4, \x1, \x5, 4, \t0, \t2, \t3
+	swapmove_2x	\x2, \x6, \x3, \x7, 4, \t0, \t2, \t3
+	.endm
+
+
+	.align		6
+M0:	.octa		0x0004080c0105090d02060a0e03070b0f
+
+M0SR:	.octa		0x0004080c05090d010a0e02060f03070b
+SR:	.octa		0x0f0e0d0c0a09080b0504070600030201
+SRM0:	.octa		0x01060b0c0207080d0304090e00050a0f
+
+M0ISR:	.octa		0x0004080c0d0105090a0e0206070b0f03
+ISR:	.octa		0x0f0e0d0c080b0a090504070602010003
+ISRM0:	.octa		0x0306090c00070a0d01040b0e0205080f
+
+	/*
+	 * void aesbs_convert_key(u8 out[], u32 const rk[], int rounds)
+	 */
+ENTRY(aesbs_convert_key)
+	ld1		{v7.4s}, [x1], #16		// load round 0 key
+	ld1		{v17.4s}, [x1], #16		// load round 1 key
+
+	movi		v8.16b,  #0x01			// bit masks
+	movi		v9.16b,  #0x02
+	movi		v10.16b, #0x04
+	movi		v11.16b, #0x08
+	movi		v12.16b, #0x10
+	movi		v13.16b, #0x20
+	movi		v14.16b, #0x40
+	movi		v15.16b, #0x80
+	ldr		q16, M0
+
+	sub		x2, x2, #1
+	str		q7, [x0], #16		// save round 0 key
+
+.Lkey_loop:
+	tbl		v7.16b ,{v17.16b}, v16.16b
+	ld1		{v17.4s}, [x1], #16		// load next round key
+
+	cmtst		v0.16b, v7.16b, v8.16b
+	cmtst		v1.16b, v7.16b, v9.16b
+	cmtst		v2.16b, v7.16b, v10.16b
+	cmtst		v3.16b, v7.16b, v11.16b
+	cmtst		v4.16b, v7.16b, v12.16b
+	cmtst		v5.16b, v7.16b, v13.16b
+	cmtst		v6.16b, v7.16b, v14.16b
+	cmtst		v7.16b, v7.16b, v15.16b
+	not		v0.16b, v0.16b
+	not		v1.16b, v1.16b
+	not		v5.16b, v5.16b
+	not		v6.16b, v6.16b
+
+	subs		x2, x2, #1
+	stp		q2, q3, [x0, #32]
+	stp		q4, q5, [x0, #64]
+	stp		q6, q7, [x0, #96]
+	stp		q0, q1, [x0], #128
+	b.ne		.Lkey_loop
+
+	movi		v7.16b, #0x63			// compose .L63
+	eor		v17.16b, v17.16b, v7.16b
+	str		q17, [x0]
+	ret
+ENDPROC(aesbs_convert_key)
+
+	.align		4
+aesbs_encrypt8:
+	ldr		q9, [bskey], #16		// round 0 key
+	ldr		q8, M0SR
+	ldr		q24, SR
+
+	eor		v10.16b, v0.16b, v9.16b		// xor with round0 key
+	eor		v11.16b, v1.16b, v9.16b
+	tbl		v0.16b, {v10.16b}, v8.16b
+	eor		v12.16b, v2.16b, v9.16b
+	tbl		v1.16b, {v11.16b}, v8.16b
+	eor		v13.16b, v3.16b, v9.16b
+	tbl		v2.16b, {v12.16b}, v8.16b
+	eor		v14.16b, v4.16b, v9.16b
+	tbl		v3.16b, {v13.16b}, v8.16b
+	eor		v15.16b, v5.16b, v9.16b
+	tbl		v4.16b, {v14.16b}, v8.16b
+	eor		v10.16b, v6.16b, v9.16b
+	tbl		v5.16b, {v15.16b}, v8.16b
+	eor		v11.16b, v7.16b, v9.16b
+	tbl		v6.16b, {v10.16b}, v8.16b
+	tbl		v7.16b, {v11.16b}, v8.16b
+
+	bitslice	v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11
+
+	sub		rounds, rounds, #1
+	b		.Lenc_sbox
+
+.Lenc_loop:
+	shift_rows	v0, v1, v2, v3, v4, v5, v6, v7, v24
+.Lenc_sbox:
+	sbox		v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, \
+								v13, v14, v15
+	subs		rounds, rounds, #1
+	b.cc		.Lenc_done
+
+	enc_next_rk
+
+	mix_cols	v0, v1, v4, v6, v3, v7, v2, v5, v8, v9, v10, v11, v12, \
+								v13, v14, v15
+
+	add_round_key	v0, v1, v2, v3, v4, v5, v6, v7
+
+	b.ne		.Lenc_loop
+	ldr		q24, SRM0
+	b		.Lenc_loop
+
+.Lenc_done:
+	ldr		q12, [bskey]			// last round key
+
+	bitslice	v0, v1, v4, v6, v3, v7, v2, v5, v8, v9, v10, v11
+
+	eor		v0.16b, v0.16b, v12.16b
+	eor		v1.16b, v1.16b, v12.16b
+	eor		v4.16b, v4.16b, v12.16b
+	eor		v6.16b, v6.16b, v12.16b
+	eor		v3.16b, v3.16b, v12.16b
+	eor		v7.16b, v7.16b, v12.16b
+	eor		v2.16b, v2.16b, v12.16b
+	eor		v5.16b, v5.16b, v12.16b
+	ret
+ENDPROC(aesbs_encrypt8)
+
+	.align		4
+aesbs_decrypt8:
+	lsl		x9, rounds, #7
+	add		bskey, bskey, x9
+
+	ldr		q9, [bskey, #-112]!		// round 0 key
+	ldr		q8, M0ISR
+	ldr		q24, ISR
+
+	eor		v10.16b, v0.16b, v9.16b		// xor with round0 key
+	eor		v11.16b, v1.16b, v9.16b
+	tbl		v0.16b, {v10.16b}, v8.16b
+	eor		v12.16b, v2.16b, v9.16b
+	tbl		v1.16b, {v11.16b}, v8.16b
+	eor		v13.16b, v3.16b, v9.16b
+	tbl		v2.16b, {v12.16b}, v8.16b
+	eor		v14.16b, v4.16b, v9.16b
+	tbl		v3.16b, {v13.16b}, v8.16b
+	eor		v15.16b, v5.16b, v9.16b
+	tbl		v4.16b, {v14.16b}, v8.16b
+	eor		v10.16b, v6.16b, v9.16b
+	tbl		v5.16b, {v15.16b}, v8.16b
+	eor		v11.16b, v7.16b, v9.16b
+	tbl		v6.16b, {v10.16b}, v8.16b
+	tbl		v7.16b, {v11.16b}, v8.16b
+
+	bitslice	v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11
+
+	sub		rounds, rounds, #1
+	b		.Ldec_sbox
+
+.Ldec_loop:
+	shift_rows	v0, v1, v2, v3, v4, v5, v6, v7, v24
+.Ldec_sbox:
+	inv_sbox	v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, \
+								v13, v14, v15
+	subs		rounds, rounds, #1
+	b.cc		.Ldec_done
+
+	dec_next_rk
+
+	add_round_key	v0, v1, v6, v4, v2, v7, v3, v5
+
+	inv_mix_cols	v0, v1, v6, v4, v2, v7, v3, v5, v8, v9, v10, v11, v12, \
+								v13, v14, v15
+
+	b.ne		.Ldec_loop
+	ldr		q24, ISRM0
+	b		.Ldec_loop
+.Ldec_done:
+	ldr		q12, [bskey, #-16]		// last round key
+
+	bitslice	v0, v1, v6, v4, v2, v7, v3, v5, v8, v9, v10, v11
+
+	eor		v0.16b, v0.16b, v12.16b
+	eor		v1.16b, v1.16b, v12.16b
+	eor		v6.16b, v6.16b, v12.16b
+	eor		v4.16b, v4.16b, v12.16b
+	eor		v2.16b, v2.16b, v12.16b
+	eor		v7.16b, v7.16b, v12.16b
+	eor		v3.16b, v3.16b, v12.16b
+	eor		v5.16b, v5.16b, v12.16b
+	ret
+ENDPROC(aesbs_decrypt8)
+
+	/*
+	 * aesbs_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+	 *		     int blocks)
+	 * aesbs_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+	 *		     int blocks)
+	 */
+	.macro		__ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
+	stp		x29, x30, [sp, #-16]!
+	mov		x29, sp
+
+99:	mov		x5, #1
+	lsl		x5, x5, x4
+	subs		w4, w4, #8
+	csel		x4, x4, xzr, pl
+	csel		x5, x5, xzr, mi
+
+	ld1		{v0.16b}, [x1], #16
+	tbnz		x5, #1, 0f
+	ld1		{v1.16b}, [x1], #16
+	tbnz		x5, #2, 0f
+	ld1		{v2.16b}, [x1], #16
+	tbnz		x5, #3, 0f
+	ld1		{v3.16b}, [x1], #16
+	tbnz		x5, #4, 0f
+	ld1		{v4.16b}, [x1], #16
+	tbnz		x5, #5, 0f
+	ld1		{v5.16b}, [x1], #16
+	tbnz		x5, #6, 0f
+	ld1		{v6.16b}, [x1], #16
+	tbnz		x5, #7, 0f
+	ld1		{v7.16b}, [x1], #16
+
+0:	mov		bskey, x2
+	mov		rounds, x3
+	bl		\do8
+
+	st1		{\o0\().16b}, [x0], #16
+	tbnz		x5, #1, 1f
+	st1		{\o1\().16b}, [x0], #16
+	tbnz		x5, #2, 1f
+	st1		{\o2\().16b}, [x0], #16
+	tbnz		x5, #3, 1f
+	st1		{\o3\().16b}, [x0], #16
+	tbnz		x5, #4, 1f
+	st1		{\o4\().16b}, [x0], #16
+	tbnz		x5, #5, 1f
+	st1		{\o5\().16b}, [x0], #16
+	tbnz		x5, #6, 1f
+	st1		{\o6\().16b}, [x0], #16
+	tbnz		x5, #7, 1f
+	st1		{\o7\().16b}, [x0], #16
+
+	cbnz		x4, 99b
+
+1:	ldp		x29, x30, [sp], #16
+	ret
+	.endm
+
+	.align		4
+ENTRY(aesbs_ecb_encrypt)
+	__ecb_crypt	aesbs_encrypt8, v0, v1, v4, v6, v3, v7, v2, v5
+ENDPROC(aesbs_ecb_encrypt)
+
+	.align		4
+ENTRY(aesbs_ecb_decrypt)
+	__ecb_crypt	aesbs_decrypt8, v0, v1, v6, v4, v2, v7, v3, v5
+ENDPROC(aesbs_ecb_decrypt)
+
+	.macro		next_tweak, out, in, const, tmp
+	sshr		\tmp\().2d,  \in\().2d,   #63
+	and		\tmp\().16b, \tmp\().16b, \const\().16b
+	add		\out\().2d,  \in\().2d,   \in\().2d
+	ext		\tmp\().16b, \tmp\().16b, \tmp\().16b, #8
+	eor		\out\().16b, \out\().16b, \tmp\().16b
+	.endm
+
+	.align		4
+.Lxts_mul_x:
+CPU_LE(	.quad		1, 0x87		)
+CPU_BE(	.quad		0x87, 1		)
+
+	/*
+	 * aesbs_xts_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+	 *		     int blocks, u8 iv[])
+	 * aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+	 *		     int blocks, u8 iv[])
+	 */
+__xts_crypt8:
+	mov		x6, #1
+	lsl		x6, x6, x4
+	subs		w4, w4, #8
+	csel		x4, x4, xzr, pl
+	csel		x6, x6, xzr, mi
+
+	ld1		{v0.16b}, [x1], #16
+	next_tweak	v26, v25, v30, v31
+	eor		v0.16b, v0.16b, v25.16b
+	tbnz		x6, #1, 0f
+
+	ld1		{v1.16b}, [x1], #16
+	next_tweak	v27, v26, v30, v31
+	eor		v1.16b, v1.16b, v26.16b
+	tbnz		x6, #2, 0f
+
+	ld1		{v2.16b}, [x1], #16
+	next_tweak	v28, v27, v30, v31
+	eor		v2.16b, v2.16b, v27.16b
+	tbnz		x6, #3, 0f
+
+	ld1		{v3.16b}, [x1], #16
+	next_tweak	v29, v28, v30, v31
+	eor		v3.16b, v3.16b, v28.16b
+	tbnz		x6, #4, 0f
+
+	ld1		{v4.16b}, [x1], #16
+	str		q29, [sp, #16]
+	eor		v4.16b, v4.16b, v29.16b
+	next_tweak	v29, v29, v30, v31
+	tbnz		x6, #5, 0f
+
+	ld1		{v5.16b}, [x1], #16
+	str		q29, [sp, #32]
+	eor		v5.16b, v5.16b, v29.16b
+	next_tweak	v29, v29, v30, v31
+	tbnz		x6, #6, 0f
+
+	ld1		{v6.16b}, [x1], #16
+	str		q29, [sp, #48]
+	eor		v6.16b, v6.16b, v29.16b
+	next_tweak	v29, v29, v30, v31
+	tbnz		x6, #7, 0f
+
+	ld1		{v7.16b}, [x1], #16
+	str		q29, [sp, #64]
+	eor		v7.16b, v7.16b, v29.16b
+	next_tweak	v29, v29, v30, v31
+
+0:	mov		bskey, x2
+	mov		rounds, x3
+	br		x7
+ENDPROC(__xts_crypt8)
+
+	.macro		__xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
+	stp		x29, x30, [sp, #-80]!
+	mov		x29, sp
+
+	ldr		q30, .Lxts_mul_x
+	ld1		{v25.16b}, [x5]
+
+99:	adr		x7, \do8
+	bl		__xts_crypt8
+
+	ldp		q16, q17, [sp, #16]
+	ldp		q18, q19, [sp, #48]
+
+	eor		\o0\().16b, \o0\().16b, v25.16b
+	eor		\o1\().16b, \o1\().16b, v26.16b
+	eor		\o2\().16b, \o2\().16b, v27.16b
+	eor		\o3\().16b, \o3\().16b, v28.16b
+
+	st1		{\o0\().16b}, [x0], #16
+	mov		v25.16b, v26.16b
+	tbnz		x6, #1, 1f
+	st1		{\o1\().16b}, [x0], #16
+	mov		v25.16b, v27.16b
+	tbnz		x6, #2, 1f
+	st1		{\o2\().16b}, [x0], #16
+	mov		v25.16b, v28.16b
+	tbnz		x6, #3, 1f
+	st1		{\o3\().16b}, [x0], #16
+	mov		v25.16b, v29.16b
+	tbnz		x6, #4, 1f
+
+	eor		\o4\().16b, \o4\().16b, v16.16b
+	eor		\o5\().16b, \o5\().16b, v17.16b
+	eor		\o6\().16b, \o6\().16b, v18.16b
+	eor		\o7\().16b, \o7\().16b, v19.16b
+
+	st1		{\o4\().16b}, [x0], #16
+	tbnz		x6, #5, 1f
+	st1		{\o5\().16b}, [x0], #16
+	tbnz		x6, #6, 1f
+	st1		{\o6\().16b}, [x0], #16
+	tbnz		x6, #7, 1f
+	st1		{\o7\().16b}, [x0], #16
+
+	cbnz		x4, 99b
+
+1:	st1		{v25.16b}, [x5]
+	ldp		x29, x30, [sp], #80
+	ret
+	.endm
+
+ENTRY(aesbs_xts_encrypt)
+	__xts_crypt	aesbs_encrypt8, v0, v1, v4, v6, v3, v7, v2, v5
+ENDPROC(aesbs_xts_encrypt)
+
+ENTRY(aesbs_xts_decrypt)
+	__xts_crypt	aesbs_decrypt8, v0, v1, v6, v4, v2, v7, v3, v5
+ENDPROC(aesbs_xts_decrypt)
+
+	.macro		next_ctr, v
+	mov		\v\().d[1], x8
+	mov		\v\().d[0], x7
+	adds		x8, x8, #1
+	adc		x7, x7, xzr
+	rev64		\v\().16b, \v\().16b
+	.endm
+
+	/*
+	 * aesbs_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
+	 *		     int rounds, int blocks, u8 iv[], bool final)
+	 */
+ENTRY(aesbs_ctr_encrypt)
+	stp		x29, x30, [sp, #-16]!
+	mov		x29, sp
+
+	add		x4, x4, x6		// do one extra block if final
+
+	ldp		x7, x8, [x5]
+	ld1		{v0.16b}, [x5]
+CPU_LE(	rev		x7, x7		)
+CPU_LE(	rev		x8, x8		)
+	adds		x8, x8, #1
+	adc		x7, x7, xzr
+
+99:	mov		x9, #1
+	lsl		x9, x9, x4
+	subs		w4, w4, #8
+	csel		x4, x4, xzr, pl
+	csel		x9, x9, xzr, le
+
+	tbnz		x9, #1, 0f
+
+	next_ctr	v1
+	tbnz		x9, #2, 0f
+
+	next_ctr	v2
+	tbnz		x9, #3, 0f
+
+	next_ctr	v3
+	tbnz		x9, #4, 0f
+
+	next_ctr	v4
+	tbnz		x9, #5, 0f
+
+	next_ctr	v5
+	tbnz		x9, #6, 0f
+
+	next_ctr	v6
+	tbnz		x9, #7, 0f
+
+	next_ctr	v7
+
+0:	mov		bskey, x2
+	mov		rounds, x3
+	bl		aesbs_encrypt8
+
+	lsr		x9, x9, x6		// disregard the final block
+	tbnz		x9, #0, 0f
+
+	ld1		{v8.16b}, [x1], #16
+	eor		v0.16b, v0.16b, v8.16b
+	st1		{v0.16b}, [x0], #16
+	tbnz		x9, #1, 1f
+
+	ld1		{v9.16b}, [x1], #16
+	eor		v1.16b, v1.16b, v9.16b
+	st1		{v1.16b}, [x0], #16
+	tbnz		x9, #2, 2f
+
+	ld1		{v10.16b}, [x1], #16
+	eor		v4.16b, v4.16b, v10.16b
+	st1		{v4.16b}, [x0], #16
+	tbnz		x9, #3, 3f
+
+	ld1		{v11.16b}, [x1], #16
+	eor		v6.16b, v6.16b, v11.16b
+	st1		{v6.16b}, [x0], #16
+	tbnz		x9, #4, 4f
+
+	ld1		{v12.16b}, [x1], #16
+	eor		v3.16b, v3.16b, v12.16b
+	st1		{v3.16b}, [x0], #16
+	tbnz		x9, #5, 5f
+
+	ld1		{v13.16b}, [x1], #16
+	eor		v7.16b, v7.16b, v13.16b
+	st1		{v7.16b}, [x0], #16
+	tbnz		x9, #6, 6f
+
+	ld1		{v14.16b}, [x1], #16
+	eor		v2.16b, v2.16b, v14.16b
+	st1		{v2.16b}, [x0], #16
+	tbnz		x9, #7, 7f
+
+	ld1		{v15.16b}, [x1], #16
+	eor		v5.16b, v5.16b, v15.16b
+	st1		{v5.16b}, [x0], #16
+
+	next_ctr	v0
+	cbnz		x4, 99b
+
+0:	st1		{v0.16b}, [x5]
+8:	ldp		x29, x30, [sp], #16
+	ret
+
+	/*
+	 * If we are handling the tail of the input (x6 == 1), return the
+	 * final keystream block back to the caller via the IV buffer.
+	 */
+1:	cbz		x6, 8b
+	st1		{v1.16b}, [x5]
+	b		8b
+2:	cbz		x6, 8b
+	st1		{v4.16b}, [x5]
+	b		8b
+3:	cbz		x6, 8b
+	st1		{v6.16b}, [x5]
+	b		8b
+4:	cbz		x6, 8b
+	st1		{v3.16b}, [x5]
+	b		8b
+5:	cbz		x6, 8b
+	st1		{v7.16b}, [x5]
+	b		8b
+6:	cbz		x6, 8b
+	st1		{v2.16b}, [x5]
+	b		8b
+7:	cbz		x6, 8b
+	st1		{v5.16b}, [x5]
+	b		8b
+ENDPROC(aesbs_ctr_encrypt)
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
new file mode 100644
index 000000000000..57982172563c
--- /dev/null
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -0,0 +1,300 @@
+/*
+ * Bit sliced AES using NEON instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/xts.h>
+#include <linux/module.h>
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void aesbs_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks);
+asmlinkage void aesbs_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks);
+
+asmlinkage void aesbs_xts_encrypt(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks, u8 iv[]);
+asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks, u8 iv[]);
+
+asmlinkage void aesbs_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks, u8 iv[], bool final);
+
+asmlinkage void aesbs_convert_key(u8 out[], u32 const rk[], int rounds);
+
+struct aesbs_key {
+	u8			key[13 * (8 * AES_BLOCK_SIZE) + 32];
+};
+
+struct aesbs_ctx {
+	struct aesbs_key	bskey;
+	int			rounds;
+};
+
+struct aesbs_xts_ctx {
+	struct aesbs_key	bskey;
+	struct crypto_cipher	*tweak_tfm;
+	int			rounds;
+};
+
+static int aesbs_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+			unsigned int key_len)
+{
+	struct aesbs_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct crypto_aes_ctx rk;
+	int err;
+
+	err = crypto_aes_expand_key(&rk, in_key, key_len);
+	if (err)
+		return err;
+
+	ctx->rounds = 6 + key_len / 4;
+
+	kernel_neon_begin();
+	aesbs_convert_key(ctx->bskey.key, rk.key_enc, ctx->rounds);
+	kernel_neon_end();
+
+	return 0;
+}
+
+static int xts_init(struct crypto_skcipher *tfm)
+{
+	struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	ctx->tweak_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
+	if (IS_ERR(ctx->tweak_tfm))
+		return PTR_ERR(ctx->tweak_tfm);
+
+	return 0;
+}
+
+static void xts_exit(struct crypto_skcipher *tfm)
+{
+	struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	crypto_free_cipher(ctx->tweak_tfm);
+}
+
+static int aesbs_xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+			    unsigned int key_len)
+{
+	struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct crypto_aes_ctx rk;
+	int err;
+
+	err = xts_verify_key(tfm, in_key, key_len);
+	if (err)
+		return err;
+
+	err = crypto_cipher_setkey(ctx->tweak_tfm, in_key + key_len / 2,
+				   key_len / 2);
+	if (err)
+		return err;
+
+	err = crypto_aes_expand_key(&rk, in_key, key_len / 2);
+	if (err)
+		return err;
+
+	ctx->rounds = 6 + key_len / 8;
+
+	kernel_neon_begin();
+	aesbs_convert_key(ctx->bskey.key, rk.key_enc, ctx->rounds);
+	kernel_neon_end();
+
+	return 0;
+}
+
+static int __ecb_crypt(struct skcipher_request *req,
+		       void (*fn)(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aesbs_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	kernel_neon_begin();
+	while (walk.nbytes >= AES_BLOCK_SIZE) {
+		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+
+		if (walk.nbytes < walk.total)
+			blocks = round_down(blocks,
+					    walk.chunksize / AES_BLOCK_SIZE);
+
+		fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->bskey.key,
+		   ctx->rounds, blocks);
+		err = skcipher_walk_done(&walk,
+					 walk.nbytes - blocks * AES_BLOCK_SIZE);
+	}
+	kernel_neon_end();
+
+	return err;
+}
+
+static int ecb_encrypt(struct skcipher_request *req)
+{
+	return __ecb_crypt(req, aesbs_ecb_encrypt);
+}
+
+static int ecb_decrypt(struct skcipher_request *req)
+{
+	return __ecb_crypt(req, aesbs_ecb_decrypt);
+}
+
+static int __xts_crypt(struct skcipher_request *req,
+		       void (*fn)(u8 out[], u8 const in[], u8 const rk[],
+				  int rounds, int blocks, u8 iv[]))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_cipher_encrypt_one(ctx->tweak_tfm, walk.iv, walk.iv);
+
+	kernel_neon_begin();
+	while (walk.nbytes >= AES_BLOCK_SIZE) {
+		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+
+		if (walk.nbytes < walk.total)
+			blocks = round_down(blocks,
+					    walk.chunksize / AES_BLOCK_SIZE);
+
+		fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->bskey.key,
+		   ctx->rounds, blocks, walk.iv);
+		err = skcipher_walk_done(&walk,
+					 walk.nbytes - blocks * AES_BLOCK_SIZE);
+	}
+	kernel_neon_end();
+
+	return err;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	return __xts_crypt(req, aesbs_xts_encrypt);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	return __xts_crypt(req, aesbs_xts_decrypt);
+}
+
+static int ctr_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aesbs_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	kernel_neon_begin();
+	while (walk.nbytes > 0) {
+		unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+		bool final = (walk.total % AES_BLOCK_SIZE) != 0;
+
+		if (walk.nbytes < walk.total) {
+			blocks = round_down(blocks,
+					    walk.chunksize / AES_BLOCK_SIZE);
+			final = false;
+		}
+
+		aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				  ctx->bskey.key, ctx->rounds, blocks, walk.iv,
+				  final);
+
+		if (final) {
+			u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
+			u8 *src = walk.src.virt.addr + blocks * AES_BLOCK_SIZE;
+
+			if (dst != src)
+				memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
+			crypto_xor(dst, walk.iv, walk.total % AES_BLOCK_SIZE);
+
+			err = skcipher_walk_done(&walk, 0);
+			break;
+		}
+		err = skcipher_walk_done(&walk,
+					 walk.nbytes - blocks * AES_BLOCK_SIZE);
+	}
+	kernel_neon_end();
+
+	return err;
+}
+
+static struct skcipher_alg aes_algs[] = { {
+	.base.cra_name		= "ecb(aes)",
+	.base.cra_driver_name	= "ecb-aes-neonbs",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= AES_BLOCK_SIZE,
+	.base.cra_ctxsize	= sizeof(struct aesbs_ctx),
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= AES_MIN_KEY_SIZE,
+	.max_keysize		= AES_MAX_KEY_SIZE,
+	.chunksize		= 8 * AES_BLOCK_SIZE,
+	.setkey			= aesbs_setkey,
+	.encrypt		= ecb_encrypt,
+	.decrypt		= ecb_decrypt,
+}, {
+	.base.cra_name		= "xts(aes)",
+	.base.cra_driver_name	= "xts-aes-neonbs",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= AES_BLOCK_SIZE,
+	.base.cra_ctxsize	= sizeof(struct aesbs_xts_ctx),
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= 2 * AES_MIN_KEY_SIZE,
+	.max_keysize		= 2 * AES_MAX_KEY_SIZE,
+	.chunksize		= 8 * AES_BLOCK_SIZE,
+	.ivsize			= AES_BLOCK_SIZE,
+	.setkey			= aesbs_xts_setkey,
+	.encrypt		= xts_encrypt,
+	.decrypt		= xts_decrypt,
+	.init			= xts_init,
+	.exit			= xts_exit,
+}, {
+	.base.cra_name		= "ctr(aes)",
+	.base.cra_driver_name	= "ctr-aes-neonbs",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct aesbs_ctx),
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= AES_MIN_KEY_SIZE,
+	.max_keysize		= AES_MAX_KEY_SIZE,
+	.chunksize		= 8 * AES_BLOCK_SIZE,
+	.ivsize			= AES_BLOCK_SIZE,
+	.setkey			= aesbs_setkey,
+	.encrypt		= ctr_encrypt,
+	.decrypt		= ctr_encrypt,
+} };
+
+static int __init aes_init(void)
+{
+	return crypto_register_skciphers(aes_algs, ARRAY_SIZE(aes_algs));
+}
+
+static void aes_exit(void)
+{
+	crypto_unregister_skciphers(aes_algs, ARRAY_SIZE(aes_algs));
+}
+
+module_init(aes_init);
+module_exit(aes_exit);
-- 
2.7.4

^ permalink raw reply related

* [PATCH] watchdog: bcm2835_wdt: set WDOG_HW_RUNNING bit when appropriate
From: Eric Anholt @ 2016-12-12 17:46 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481536123-9279-1-git-send-email-rasmus.villemoes@prevas.dk>

Rasmus Villemoes <rasmus.villemoes@prevas.dk> writes:

> A bootloader may start the watchdog device before handing control to
> the kernel - in that case, we should tell the kernel about it so the
> watchdog framework can keep it alive until userspace opens
> /dev/watchdog0.

I don't believe our current bootloaders (the closed firmware or u-boot)
set up the watchdog, but this seems reasonable since they might want to
later.

Acked-by: Eric Anholt <eric@anholt.net>

^ permalink raw reply

* [PATCH v4] arm64: fpsimd: improve stacking logic in non-interruptible context
From: Ard Biesheuvel @ 2016-12-12 17:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161212103512.GE1574@e103592.cambridge.arm.com>

On 12 December 2016 at 10:35, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Dec 09, 2016 at 08:57:20PM +0000, Ard Biesheuvel wrote:
>> On 9 December 2016 at 19:29, Dave Martin <Dave.Martin@arm.com> wrote:
>> > On Fri, Dec 09, 2016 at 06:21:55PM +0000, Catalin Marinas wrote:
>> >> On Fri, Dec 09, 2016 at 04:46:32PM +0000, Ard Biesheuvel wrote:
>> >> >  void kernel_neon_begin_partial(u32 num_regs)
>> >> >  {
>> >> > -   if (in_interrupt()) {
>> >> > -           struct fpsimd_partial_state *s = this_cpu_ptr(
>> >> > -                   in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
>> >> > +   struct fpsimd_partial_state *s;
>> >> > +   int level;
>> >> > +
>> >> > +   preempt_disable();
>> >> > +
>> >> > +   level = this_cpu_inc_return(kernel_neon_nesting_level);
>> >> > +   BUG_ON(level > 3);
>> >> > +
>> >> > +   if (level > 1) {
>> >> > +           s = this_cpu_ptr(nested_fpsimdstate);
>> >> >
>> >> > -           BUG_ON(num_regs > 32);
>> >> > -           fpsimd_save_partial_state(s, roundup(num_regs, 2));
>> >> > +           WARN_ON_ONCE(num_regs > 32);
>> >> > +           num_regs = min(roundup(num_regs, 2), 32U);
>> >> > +
>> >> > +           fpsimd_save_partial_state(&s[level - 2], num_regs);
>> >> >     } else {
>> >> >             /*
>> >> >              * Save the userland FPSIMD state if we have one and if we
>> >> > @@ -241,7 +256,6 @@ void kernel_neon_begin_partial(u32 num_regs)
>> >> >              * that there is no longer userland FPSIMD state in the
>> >> >              * registers.
>> >> >              */
>> >> > -           preempt_disable();
>> >> >             if (current->mm &&
>> >> >                 !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
>> >> >                     fpsimd_save_state(&current->thread.fpsimd_state);
>> >>
>> >> I wonder whether we could actually do this saving and flag/level setting
>> >> in reverse to simplify the races. Something like your previous patch but
>> >> only set TIF_FOREIGN_FPSTATE after saving:
>> >>
>> >>       level = this_cpu_read(kernel_neon_nesting_level);
>> >>       if (level > 0) {
>> >>               ...
>> >>               fpsimd_save_partial_state();
>> >>       } else {
>> >>               if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
>> >>                       fpsimd_save_state();
>> >>               set_thread_flag(TIF_FOREIGN_FPSTATE);
>> >>       }
>> >>       this_cpu_inc(kernel_neon_nesting_level);
>> >>
>> >> There is a risk of extra saving if we get an interrupt after
>> >> test_thread_flag() and before set_thread_flag() but I don't think this
>> >> would corrupt any state, just writing things twice.
>> >
>> > I would worry that we can save two states over the same buffer and then
>> > restore an uninitialised buffer in this case unless we are careful.
>> > Because the level-dependent code is now misbracketed by the inc/dec,
>> > a preempting call can race with the outer call and use the same value.
>> >
>> > I guess we could do
>> >
>> > if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
>> >         fpsimd_save_state();
>> > clear_thread_flag(TIF_FOREIGN_FPSTATE);
>> >
>> > at the start unconditionally, before the _inc_return().
>> >
>> > The task state may then get saved in the middle of being saved, but
>> > as you say it shouldn't have changed in the meantime.
>>
>> It /will/ have changed in the meantime: when the interrupted context
>> is resumed, it will happily proceed with saving the state where it
>> left off, but now the register file contains whatever was left after
>> the interrupt handler is done with the NEON.
>
> Hmmm, true.  The NEON regs will have been restored by kernel_neon_end()
> in the inner context, but the extra SVE bits won't have been.
>

Even worse: both the interrupter and the interruptee think they are
preserving the userland context, so once the interrupter is done, it
will not restore the context as it found it. The interruptee will then
proceed and write whatever is left in those registers into the saved
state.

>>
>> > The nested
>> > save code may then do a partial save of the same state on top of that
>> > which could get restored at the inner kernel_neon_end() call.
>> >
>>
>> I'm afraid the only way to deal with this correctly is to treat the
>> whole sequence as a critical section, which means execute it with
>> interrupts disabled.
>
> Or we make the KERNEL_MODE_NEON code SVE-aware, which is where I started
> off.  In that case, we do SVE (partial) save/restore whenever
> kernel_mode_neon() is called with live SVE state.  The change here is
> that we would consider that there is always live SVE state until the
> fpsimd_save_state() actually finishes at the outer level.  We may want
> to delay setting of TIF_FOREIGN_FPSTATE for that purpose.
>
> This means you do take an additional latency hit if you want to use NEON
> in an interrupting context and there happens to be live SVE state.  It's
> a consequence of the architecture though -- I don't think there's any
> way to get around it.  We can still scale the cost by implementing
> sve_save_partial_state() or something equivalent.
>
> Your original inc()+save() ... restore()+dec() seems sound enough if
> viewed this way.  Unless I'm missing something?
>

I think having a small critical section is not so bad. Let me send out
a v5 so we can discuss ...

^ permalink raw reply

* [PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context
From: Ard Biesheuvel @ 2016-12-12 17:56 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, we allow kernel mode NEON in softirq or hardirq context by
stacking and unstacking a slice of the NEON register file for each call
to kernel_neon_begin() and kernel_neon_end(), respectively.

Given that
a) a CPU typically spends most of its time in userland, during which time
   no kernel mode NEON in process context is in progress,
b) a CPU spends most of its time in the kernel doing other things than
   kernel mode NEON when it gets interrupted to perform kernel mode NEON
   in softirq context

the stacking and subsequent unstacking is only necessary if we are
interrupting a thread while it is performing kernel mode NEON in process
context, which means that in all other cases, we can simply preserve the
userland FPSIMD state once, and only restore it upon return to userland,
even if we are being invoked from softirq or hardirq context.

So instead of checking whether we are running in interrupt context, keep
track of the level of nested kernel mode NEON calls in progress, and only
perform the eager stack/unstack if the level exceeds 1.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
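Note (illustration only, not part of the patch): below is a minimal
userspace sketch of the nesting-level bookkeeping described above. It
assumes a single CPU and uses hypothetical stand-ins (live_regs,
nested_state, sketch_neon_begin()/sketch_neon_end()) for the NEON register
file, the per-CPU nested_fpsimdstate storage and the real
kernel_neon_begin()/kernel_neon_end(). It does not model the userland
FPSIMD state, preemption or interrupt masking.

```c
#include <assert.h>
#include <string.h>

#define NUM_REGS	32
#define MAX_NESTED	2	/* softirq and hardirq on top of process context */

/* Hypothetical stand-ins for the NEON register file and per-CPU storage. */
static unsigned char live_regs[NUM_REGS];
static unsigned char nested_state[MAX_NESTED][NUM_REGS];
static int nesting_level;
static int partial_saves;	/* counts how often stacking was needed */

static void sketch_neon_begin(void)
{
	int level = ++nesting_level;

	assert(level <= MAX_NESTED + 1);
	if (level > 1) {
		/* An outer kernel mode NEON user is live: stack its registers. */
		memcpy(nested_state[level - 2], live_regs, NUM_REGS);
		partial_saves++;
	}
}

static void sketch_neon_end(void)
{
	int level = nesting_level;

	assert(level >= 1);
	if (level > 1)
		memcpy(live_regs, nested_state[level - 2], NUM_REGS);
	nesting_level--;
}

/* Returns the number of partial saves performed across both scenarios. */
static int sketch_run(void)
{
	/* Scenario 1: a single user while no other kernel NEON user is live. */
	sketch_neon_begin();
	memset(live_regs, 0x11, NUM_REGS);
	sketch_neon_end();
	assert(partial_saves == 0);

	/* Scenario 2: an outer user is interrupted by a nested user. */
	sketch_neon_begin();
	memset(live_regs, 0xAA, NUM_REGS);
	sketch_neon_begin();			/* level 2: stacks the 0xAA state */
	memset(live_regs, 0xBB, NUM_REGS);
	sketch_neon_end();			/* unstacks the 0xAA state */
	assert(live_regs[0] == 0xAA);
	sketch_neon_end();

	return partial_saves;
}
```

In this sketch, only the second scenario pays for a save/restore pair,
which is the case the commit message singles out; the first scenario
leaves the nested storage untouched.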
 arch/arm64/kernel/fpsimd.c | 64 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 394c61db5566..c19363775436 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -220,45 +220,73 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+/*
+ * Although unlikely, it is possible for three kernel mode NEON contexts to
+ * be live at the same time: process context, softirq context and hardirq
+ * context. So while the userland context is stashed in the thread's fpsimd
+ * state structure, we need two additional levels of storage.
+ */
+static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
+static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
 
 /*
  * Kernel-side NEON support functions
  */
 void kernel_neon_begin_partial(u32 num_regs)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+	struct fpsimd_partial_state *s;
+	int level;
 
-		BUG_ON(num_regs > 32);
-		fpsimd_save_partial_state(s, roundup(num_regs, 2));
-	} else {
+	preempt_disable();
+
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		/*
 		 * Save the userland FPSIMD state if we have one and if we
 		 * haven't done so already. Clear fpsimd_last_state to indicate
 		 * that there is no longer userland FPSIMD state in the
 		 * registers.
 		 */
-		preempt_disable();
-		if (current->mm &&
-		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
-			fpsimd_save_state(&current->thread.fpsimd_state);
+		if (current->mm) {
+			unsigned long flags;
+
+			local_irq_save(flags);
+			if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+				fpsimd_save_state(&current->thread.fpsimd_state);
+			local_irq_restore(flags);
+		} else {
+			set_thread_flag(TIF_FOREIGN_FPSTATE);
+		}
 		this_cpu_write(fpsimd_last_state, NULL);
 	}
+
+	level = this_cpu_inc_return(kernel_neon_nesting_level);
+	BUG_ON(level > 3);
+
+	if (level > 1) {
+		s = this_cpu_ptr(nested_fpsimdstate);
+
+		WARN_ON_ONCE(num_regs > 32);
+		num_regs = min(roundup(num_regs, 2), 32U);
+
+		fpsimd_save_partial_state(&s[level - 2], num_regs);
+	}
 }
 EXPORT_SYMBOL(kernel_neon_begin_partial);
 
 void kernel_neon_end(void)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
-		fpsimd_load_partial_state(s);
-	} else {
-		preempt_enable();
+	struct fpsimd_partial_state *s;
+	int level;
+
+	level = this_cpu_read(kernel_neon_nesting_level);
+	BUG_ON(level < 1);
+
+	if (level > 1) {
+		s = this_cpu_ptr(nested_fpsimdstate);
+		fpsimd_load_partial_state(&s[level - 2]);
 	}
+	this_cpu_dec(kernel_neon_nesting_level);
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH] media: platform: exynos4-is: constify v4l2_subdev_* structures
From: Bhumika Goyal @ 2016-12-12 18:03 UTC (permalink / raw)
  To: linux-arm-kernel

The v4l2_subdev_{core/pad/video}_ops structures are only stored in
fields of the v4l2_subdev_ops structure that are of type const.
Also, the v4l2_subdev_ops structure is passed to a function whose
argument is of type const. As these structures are never modified,
declare them as const.
Done using Coccinelle (one of the scripts used):

@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct v4l2_subdev_ops i@p = {...};

@ok1@
identifier r1.i;
position p;
expression e1;
@@
v4l2_subdev_init(e1,&i@p)

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct v4l2_subdev_ops i;

File size before:
   text	   data	    bss	    dec	    hex	filename
  16830	   1064	      0	  17894	   45e6 platform/exynos4-is/fimc-capture.o
   7787	    704	     20	   8511	   213f platform/exynos4-is/mipi-csis.o

File size after:
   text	   data	    bss	    dec	    hex	filename
  17022	    880	      0	  17902	   45ee platform/exynos4-is/fimc-capture.o
   8299	    192	     20	   8511	   213f platform/exynos4-is/mipi-csis.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
---
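Note (illustration only, not part of the patch): a minimal standalone
sketch of why such ops tables can be const. The demo_* names are
hypothetical miniatures and not the real V4L2 definitions; the only
properties assumed are the ones stated above, namely that the inner
tables are stored in pointer-to-const fields and that the init helper
takes a const argument.

```c
#include <stddef.h>

/* Hypothetical miniature of an ops table; not the real V4L2 definitions. */
struct demo_pad_ops {
	int (*get_fmt)(int pad);
};

struct demo_subdev_ops {
	const struct demo_pad_ops *pad;	/* field is pointer-to-const */
};

struct demo_subdev {
	const struct demo_subdev_ops *ops;
};

static int demo_get_fmt(int pad)
{
	return pad + 1;
}

/* Neither table is written after initialisation, so both can be const. */
static const struct demo_pad_ops demo_pad_ops = {
	.get_fmt = demo_get_fmt,
};

static const struct demo_subdev_ops demo_ops = {
	.pad = &demo_pad_ops,
};

/* The init helper only stores the pointer, so it accepts a const argument. */
static void demo_subdev_init(struct demo_subdev *sd,
			     const struct demo_subdev_ops *ops)
{
	sd->ops = ops;
}

/* Dispatch through the const tables, as a caller of the ops would. */
static int demo_call_get_fmt(int pad)
{
	struct demo_subdev sd;

	demo_subdev_init(&sd, &demo_ops);
	return sd.ops->pad->get_fmt(pad);
}
```

Because nothing ever writes to the tables, the compiler is free to place
them in a read-only section, which would be consistent with the data to
text shift in the size figures above.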
 drivers/media/platform/exynos4-is/fimc-capture.c | 4 ++--
 drivers/media/platform/exynos4-is/mipi-csis.c    | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/media/platform/exynos4-is/fimc-capture.c b/drivers/media/platform/exynos4-is/fimc-capture.c
index 964f4a6..a5e729c 100644
--- a/drivers/media/platform/exynos4-is/fimc-capture.c
+++ b/drivers/media/platform/exynos4-is/fimc-capture.c
@@ -1695,7 +1695,7 @@ static int fimc_subdev_set_selection(struct v4l2_subdev *sd,
 	return 0;
 }
 
-static struct v4l2_subdev_pad_ops fimc_subdev_pad_ops = {
+static const struct v4l2_subdev_pad_ops fimc_subdev_pad_ops = {
 	.enum_mbus_code = fimc_subdev_enum_mbus_code,
 	.get_selection = fimc_subdev_get_selection,
 	.set_selection = fimc_subdev_set_selection,
@@ -1703,7 +1703,7 @@ static int fimc_subdev_set_selection(struct v4l2_subdev *sd,
 	.set_fmt = fimc_subdev_set_fmt,
 };
 
-static struct v4l2_subdev_ops fimc_subdev_ops = {
+static const struct v4l2_subdev_ops fimc_subdev_ops = {
 	.pad = &fimc_subdev_pad_ops,
 };
 
diff --git a/drivers/media/platform/exynos4-is/mipi-csis.c b/drivers/media/platform/exynos4-is/mipi-csis.c
index befd9fc..f819b29 100644
--- a/drivers/media/platform/exynos4-is/mipi-csis.c
+++ b/drivers/media/platform/exynos4-is/mipi-csis.c
@@ -649,23 +649,23 @@ static int s5pcsis_log_status(struct v4l2_subdev *sd)
 	return 0;
 }
 
-static struct v4l2_subdev_core_ops s5pcsis_core_ops = {
+static const struct v4l2_subdev_core_ops s5pcsis_core_ops = {
 	.s_power = s5pcsis_s_power,
 	.log_status = s5pcsis_log_status,
 };
 
-static struct v4l2_subdev_pad_ops s5pcsis_pad_ops = {
+static const struct v4l2_subdev_pad_ops s5pcsis_pad_ops = {
 	.enum_mbus_code = s5pcsis_enum_mbus_code,
 	.get_fmt = s5pcsis_get_fmt,
 	.set_fmt = s5pcsis_set_fmt,
 };
 
-static struct v4l2_subdev_video_ops s5pcsis_video_ops = {
+static const struct v4l2_subdev_video_ops s5pcsis_video_ops = {
 	.s_rx_buffer = s5pcsis_s_rx_buffer,
 	.s_stream = s5pcsis_s_stream,
 };
 
-static struct v4l2_subdev_ops s5pcsis_subdev_ops = {
+static const struct v4l2_subdev_ops s5pcsis_subdev_ops = {
 	.core = &s5pcsis_core_ops,
 	.pad = &s5pcsis_pad_ops,
 	.video = &s5pcsis_video_ops,
-- 
1.9.1

^ permalink raw reply related

* [RFC v3 PATCH 00/25] Allow NOMMU for MULTIPLATFORM
From: Afzal Mohammed @ 2016-12-12 18:15 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <87fulus094.fsf@dell.be.48ers.dk>

Hi,

On Sun, Dec 11, 2016 at 09:01:59PM +0100, Peter Korsgaard wrote:

> When you select a cortex-A variant, then we enable MMU support by
> default, but you can disable it under toolchain options (Enable MMU) and
> then the flat binary option is available.

Thank you Peter Korsgaard, that did the trick: I am able to boot to the
prompt! Logs are at the end.

> Hmm, I'm not sure why a cortex-M toolchain wouldn't work on cortex-A, I
> thought the 'M' instruction set was a pure subset of the 'A'.

On Mon, Dec 12, 2016 at 09:28:03AM +0000, Vladimir Murzin wrote:

> M-class toolchain should just work with A-class; you don't even need to
> disable MMU to try it out after d782e42 ("ARM: 8594/1: enable binfmt_flat on
> systems with an MMU").

Earlier, I had mistakenly not enabled flat binary support in the
kernel.

But even after enabling flat binary support in the kernel, it didn't
work (I don't know why); it ended up instead with:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

The exit code probably refers to signal 4 (SIGILL, illegal instruction)
rather than an interrupted system call.

On Mon, Dec 12, 2016 at 08:07:16AM +0100, mickael guene wrote:

>  You can find an R toolchain here:
> https://github.com/mickael-guene/fdpic_manifest/releases/download/v7-r-1.0.1/toolset-v7-r-1.0.1-0-gbdcc6a7c-armv7-r.tgz
> 
>  It's an fdpic toolset for cortex-r cpu class. gcc version is
> quite old (4.7).
> 
>  Note also that generated code may crash on class A cpu due to
> generation of udiv/sdiv which is optional for class A.
> (cortex a15 is ok but not a9).
> 
> Hope it helps

On Mon, Dec 12, 2016 at 10:44:45AM +0100, mickael guene wrote:

>  At the end of https://github.com/mickael-guene/fdpic_manifest you can
> find a set of patch to apply for kernel fdpic support. Unfortunately
> they are quite old ... But I have done some test on May for
> stm32f469-disco platform and I have attached patches against more
> recent kernel.

Thanks Mickael.

Earlier I had tried syncing the repo, but the download kept getting
interrupted, though persisting with it would have fetched it fully. On
looking at the kernel patches in parallel, I pushed the plan aside for
the time being, as the context of the changes was very different from
the version of the kernel (4.9-rc7) used here.

But the attached patches look like they can be applied without much
difficulty.

As I have already reached the prompt, I will keep note of these details;
they might help later.

And Vladimir, Thanks.

Regards
afzal


[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.9.0-rc7-00026-g7a142ca8231b (afzal@debian) (gcc version 6.2.0 (GCC) ) #26 Mon Dec 12 22:32:33 IST 2016
[    0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=00c50478
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt:Machine model: TI AM437x Industrial Development Kit
[    0.000000] bootconsole [earlycon0] enabled
[    0.000000] AM437x ES1.2 (sgx neon)
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
[    0.000000] Kernel command line: console=ttyO0,115200n8 earlyprintk
[    0.000000] PID hash table entries: 4096 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 1029196K/1048576K available (6562K kernel code, 523K rwdata, 2096K rodata, 712K init, 274K bss, 19380K reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0x80000000 - 0x80001000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0x00000000 - 0xffffffff   (4095 MB)
[    0.000000]     lowmem  : 0x80000000 - 0xc0000000   (1024 MB)
[    0.000000]     modules : 0x80000000 - 0xc0000000   (1024 MB)
[    0.000000]       .text : 0x80008000 - 0x80670b88   (6563 kB)
[    0.000000]       .init : 0x8087e000 - 0x80930000   ( 712 kB)
[    0.000000]       .data : 0x80930000 - 0x809b2f60   ( 524 kB)
[    0.000000]        .bss : 0x809b2f60 - 0x809f7a9c   ( 275 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] OMAP clockevent source: timer1 at 32786 Hz
[    0.000259] sched_clock: 64 bits at 500MHz, resolution 2ns, wraps every 4398046511103ns
[    0.009660] clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0xe6a171a037, max_idle_ns: 881590485102 ns
[    0.022315] Switching to timer-based delay loop, resolution 2ns
[    0.141364] clocksource: 32k_counter: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 58327039986419 ns
[    0.152415] OMAP clocksource: 32k_counter at 32768 Hz
[    0.231362] Console: colour dummy device 80x30
[    0.236920] Calibrating delay loop (skipped), value calculated using timer frequency.. 1000.00 BogoMIPS (lpj=5000000)
[    0.249062] pid_max: default: 32768 minimum: 301
[    0.256668] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.264524] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[    0.323801] devtmpfs: initialized
[    0.935615] VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
[    0.951495] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.963940] pinctrl core: initialized pinctrl subsystem
[    1.014378] NET: Registered protocol family 16
[    2.111659] cpuidle: using governor menu
[    2.176928] omap_l3_noc 44000000.ocp: L3 debug error: target 8 mod:0 (unclearable)
[    2.186840] omap_l3_noc 44000000.ocp: L3 application error: target 8 mod:0 (unclearable)
[    2.494565] OMAP GPIO hardware version 0.1
[    2.883468] platform 53701000.des: Cannot lookup hwmod 'des'
[    2.900195] platform 48310000.rng: Cannot lookup hwmod 'rng'
[    3.046777] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
[    3.055998] hw-breakpoint: maximum watchpoint size is 4 bytes.
[    3.072570] omap4_sram_init:Unable to allocate sram needed to handle errata I688
[    3.080942] omap4_sram_init:Unable to get sram pool needed to handle errata I688
[    4.016395] edma 49000000.edma: TI EDMA DMA engine driver
[    4.042166] V3_3D: supplied by V24_0D
[    4.056616] VDD_COREREG: supplied by V24_0D
[    4.072516] VDD_CORE: supplied by VDD_COREREG
[    4.088252] V1_8DREG: supplied by V24_0D
[    4.103897] V1_8D: supplied by V1_8DREG
[    4.118796] V1_5DREG: supplied by V24_0D
[    4.134236] V1_5D: supplied by V1_5DREG
[    4.288700] vgaarb: loaded
[    4.326444] SCSI subsystem initialized
[    4.345255] usbcore: registered new interface driver usbfs
[    4.354345] usbcore: registered new interface driver hub
[    4.362195] usbcore: registered new device driver usb
[    4.383412] omap_i2c 44e0b000.i2c: could not find pctldev for node /ocp@44000000/l4_wkup@44c00000/scm@210000/pinmux@800/i2c0_pins_default, deferring probe
[    4.400047] omap_i2c 4819c000.i2c: could not find pctldev for node /ocp@44000000/l4_wkup@44c00000/scm@210000/pinmux@800/i2c2_pins_default, deferring probe
[    4.420788] pps_core: LinuxPPS API ver. 1 registered
[    4.426744] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    4.437776] PTP clock support registered
[    4.449669] EDAC MC: Ver: 3.0.0
[    4.507254] clocksource: Switched to clocksource arm_global_timer
[    4.891236] NET: Registered protocol family 2
[    4.920504] TCP established hash table entries: 8192 (order: 3, 32768 bytes)
[    4.934239] TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
[    4.947220] TCP: Hash tables configured (established 8192 bind 8192)
[    4.956856] UDP hash table entries: 512 (order: 1, 8192 bytes)
[    4.965035] UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
[    4.976320] NET: Registered protocol family 1
[    4.988215] RPC: Registered named UNIX socket transport module.
[    4.994956] RPC: Registered udp transport module.
[    5.000656] RPC: Registered tcp transport module.
[    5.006103] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    6.371750] workingset: timestamp_bits=30 max_order=18 bucket_order=0
[    6.835038] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    6.888403] NFS: Registering the id_resolver key type
[    6.894459] Key type id_resolver registered
[    6.899596] Key type id_legacy registered
[    6.905432] ntfs: driver 2.1.32 [Flags: R/O].
[    6.961089] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[    6.969706] io scheduler noop registered
[    6.974328] io scheduler deadline registered
[    6.989359] io scheduler cfq registered (default)
[    7.085244] pinctrl-single 44e10800.pinmux: 199 pins at pa 44e10800 size 796
[    9.483420] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    9.593419] omap_uart 44e09000.serial: no wakeirq for uart0
[    9.600384] omap_uart 44e09000.serial: No clock speed specified: using default: 48000000
[    9.612215] 44e09000.serial: ttyO0 at MMIO 0x44e09000 (irq = 29, base_baud = 3000000) is a OMAP UART0
[    9.623241] console [ttyO0] enabled
[    9.623241] console [ttyO0] enabled
[    9.631603] bootconsole [earlycon0] disabled
[    9.631603] bootconsole [earlycon0] disabled
[    9.657952] STMicroelectronics ASC driver initialized
[    9.703627] omap_rng 48310000.rng: _od_fail_runtime_resume: FIXME: missing hwmod/omap_dev info
[    9.714158] omap_rng 48310000.rng: Failed to runtime_get device: -19
[    9.722078] omap_rng 48310000.rng: initialization failed.
[   10.149265] brd: module loaded
[   10.379026] loop: module loaded
[   10.549954] libphy: Fixed MDIO Bus: probed
[   10.621367] CAN device driver interface
[   10.696022] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[   10.703201] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[   10.713426] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[   10.721901] igb: Copyright (c) 2007-2014 Intel Corporation.
[   10.937883] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[   10.945298] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
[   10.963397] libphy: 4a101000.mdio: probed
[   10.968744] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver Micrel KSZ9031 Gigabit PHY
[   11.002569] cpsw 4a100000.ethernet: Detected MACID = c4:be:84:cc:f8:b2
[   11.059455] pegasus: v0.9.3 (2013/04/25), Pegasus/Pegasus II USB Ethernet driver
[   11.070237] usbcore: registered new interface driver pegasus
[   11.079659] usbcore: registered new interface driver asix
[   11.088196] usbcore: registered new interface driver ax88179_178a
[   11.097524] usbcore: registered new interface driver cdc_ether
[   11.107046] usbcore: registered new interface driver smsc75xx
[   11.116880] usbcore: registered new interface driver smsc95xx
[   11.125813] usbcore: registered new interface driver net1080
[   11.134794] usbcore: registered new interface driver cdc_subset
[   11.143940] usbcore: registered new interface driver zaurus
[   11.153558] usbcore: registered new interface driver cdc_ncm
[   11.224727] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   11.232713] ehci-pci: EHCI PCI platform driver
[   11.240224] ehci-platform: EHCI generic platform driver
[   11.256349] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   11.264046] ohci-pci: OHCI PCI platform driver
[   11.271558] ohci-platform: OHCI generic platform driver
[   11.284180] ohci-omap3: OHCI OMAP3 driver
[   11.325702] usbcore: registered new interface driver usb-storage
[   11.400750] mousedev: PS/2 mouse device common for all mice
[   11.447930] i2c /dev entries driver
[   11.615616] sdhci: Secure Digital Host Controller Interface driver
[   11.623170] sdhci: Copyright(c) Pierre Ossman
[   11.647533] omap_hsmmc 48060000.mmc: Got CD GPIO
[   11.734930] Synopsys Designware Multimedia Card Interface Driver
[   11.764020] sdhci-pltfm: SDHCI platform and OF driver helper
[   11.812878] ledtrig-cpu: registered to indicate activity on CPUs
[   11.827153] usbcore: registered new interface driver usbhid
[   11.834026] usbhid: USB HID core driver
[   11.882277] NET: Registered protocol family 10
[   11.934153] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[   11.971848] NET: Registered protocol family 17
[   11.977629] can: controller area network core (rev 20120528 abi 9)
[   11.987031] NET: Registered protocol family 29
[   11.992658] can: raw protocol (rev 20120528)
[   11.998072] can: broadcast manager protocol (rev 20161123 t)
[   12.004962] can: netlink gateway (rev 20130117) max_hops=1
[   12.025921] Key type dns_resolver registered
[   12.035682] omap_voltage_late_init: Voltage driver support not added
[   12.048436] ThumbEE CPU extension supported.
[   12.280999] mmc0: host does not support reading read-only switch, assuming write-enable
[   12.293776] mmc0: new high speed SDHC card at address 0002
[   12.318060] at24 0-0050: 32768 byte 24c256 EEPROM, writable, 64 bytes/write
[   12.340744] mmcblk0: mmc0:0002 00000 3.66 GiB 
[   12.366577] omap_i2c 44e0b000.i2c: bus 0 rev0.12 at 400 kHz
[   12.393428]  mmcblk0: p1 p2
[   12.433995] omap_i2c 4819c000.i2c: bus 2 rev0.12 at 100 kHz
[   12.464867] input: gpio_keys as /devices/platform/gpio_keys/input/input0
[   12.479679] hctosys: unable to open rtc device (rtc0)
[   12.564936] Freeing unused kernel memory: 712K (8087e000 - 80930000)
[   12.572725] This architecture does not have kernel memory protection.
Initializing random number generator... [   14.422674] random: dd: uninitialized urandom read (512 bytes read)
done.

Welcome to Buildroot
buildroot login: root
Jan  1 00:00:16 login[81]: root login on 'ttyO0'
~ # uname -a
Linux buildroot 4.9.0-rc7-00026-g7a142ca8231b #26 Mon Dec 12 22:32:33 IST 2016 armv7l GNU/Linux
~ # 

* [RFT PATCH] ARM64: dts: meson-gxbb: Add reserved memory zone and usable memory range
From: Heinrich Schuchardt @ 2016-12-12 18:23 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161212101801.28491-1-narmstrong@baylibre.com>

On 12/12/2016 11:18 AM, Neil Armstrong wrote:
> The Amlogic Meson GXBB secure monitor uses part of the memory space, this
> patch adds these reserved zones and redefines the usable memory range for
> each board.
> 
> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-gx-p23x-q20x.dtsi |  2 +-
>  arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 21 +++++++++++++++++++++
>  .../boot/dts/amlogic/meson-gxbb-nexbox-a95x.dts     |  2 +-
>  arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts |  2 +-
>  arch/arm64/boot/dts/amlogic/meson-gxbb-p20x.dtsi    |  2 +-
>  .../boot/dts/amlogic/meson-gxbb-vega-s95-meta.dts   |  2 +-
>  .../boot/dts/amlogic/meson-gxbb-vega-s95-pro.dts    |  2 +-
>  .../boot/dts/amlogic/meson-gxbb-vega-s95-telos.dts  |  2 +-
>  .../boot/dts/amlogic/meson-gxl-nexbox-a95x.dts      |  2 +-
>  .../arm64/boot/dts/amlogic/meson-gxl-s905x-p212.dts |  2 +-
>  arch/arm64/boot/dts/amlogic/meson-gxm-nexbox-a1.dts |  2 +-
>  11 files changed, 31 insertions(+), 10 deletions(-)
> 

I added your patch to next-20161212.

My kernel config is available as
https://github.com/xypron/kernel-odroid-c2/blob/5ec4be0c1b45297bbcbc1ce3d3d787e45dac66b6/config/config-next-20161212

To build the same kernel just run ./build-dpkg.sh (or make) on
https://github.com/xypron/kernel-odroid-c2/tree/5ec4be0c1b45297bbcbc1ce3d3d787e45dac66b6

With the patch applied, free reported 0x2301000 less total memory than
next-20161209 without the patch.
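
For scale, the difference can be converted to more familiar units. This is
only a rough sketch: it assumes the figure is a byte count and that pages
are 4 KiB, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Difference in total memory reported by free (assumed to be in bytes). */
#define MEM_DELTA_BYTES UINT64_C(0x2301000)

/* Whole KiB contained in a byte count. */
static uint64_t bytes_to_kib(uint64_t bytes)
{
	return bytes / 1024;
}

/* Whole pages contained in a byte count (4 KiB page size is an assumption). */
static uint64_t bytes_to_4k_pages(uint64_t bytes)
{
	return bytes / 4096;
}
```

Under those assumptions bytes_to_kib(MEM_DELTA_BYTES) is 35844 KiB, i.e.
35 MiB plus one extra 4 KiB page.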

When git cloning linux-next I got the following error on Hardkernel
Odroid C2:

[  811.602365] Bad mode in Error handler detected on CPU2, code 0xbf000000 -- SError
[  811.604205] CPU: 2 PID: 1447 Comm: git Not tainted 4.9.0-next-20161212-r005-arm64 #1
[  811.611876] Hardware name: Hardkernel ODROID-C2 (DT)
[  811.616793] task: ffff8000745c5780 task.stack: ffff800072d3c000
[  811.622660] PC is at 0xaaaad3770f28
[  811.626107] LR is at 0xffffab54e53c
[  811.629558] pc : [<0000aaaad3770f28>] lr : [<0000ffffab54e53c>] pstate: 20000000
[  811.636888] sp : 0000ffffd3a1d950
[  811.640166] x29: 0000ffffd3a1d950 x28: 0000ffff9853a050
[  811.645427] x27: 00000000000ffc5e x26: 0000ffff8fe00020
[  811.650688] x25: 0000ffffd3a1da98 x24: 0000000000000000
[  811.655949] x23: 0000aaaad3770f28 x22: 0000000000000010
[  811.661211] x21: 0000ffff9809bae0 x20: 000000000003de04
[  811.666472] x19: 0000ffff8fe00010 x18: 0000000023c57c32
[  811.671733] x17: 0000ffffab58f988 x16: 0000ffffab660008
[  811.676994] x15: 00000000000006dc x14: 0000000000000000
[  811.682255] x13: 00000000002549ea x12: 0000000029555c36
[  811.687517] x11: 00000000002549eb x10: 0000000029555c36
[  811.692778] x9 : 00000000002549ea x8 : 0000000029555c36
[  811.698039] x7 : 00000000002549e9 x6 : 0000000029555c36
[  811.703300] x5 : 0000ffff98d54b40 x4 : 0000ffff8f93c030
[  811.708562] x3 : 00000000ffffffff x2 : 0000000000000000
[  811.713823] x1 : 0000ffff9853a050 x0 : 0000ffff9809bae0

[  811.720561] Internal error: Attempting to execute userspace memory: 8600000f [#1] PREEMPT SMP
[  811.729004] Modules linked in: meson_rng rng_core ip_tables x_tables ipv6 realtek
[  811.736422] CPU: 2 PID: 1447 Comm: git Not tainted 4.9.0-next-20161212-r005-arm64 #1
[  811.744097] Hardware name: Hardkernel ODROID-C2 (DT)
[  811.749014] task: ffff8000745c5780 task.stack: ffff800072d3c000
[  811.754879] PC is at 0xffffab54e53c
[  811.758328] LR is at 0xffffab54e53c
[  811.761779] pc : [<0000ffffab54e53c>] lr : [<0000ffffab54e53c>] pstate: 600003c5
[  811.769109] sp : ffff800072d3fec0
[  811.772387] x29: 0000000000000000 x28: ffff8000745c5780
[  811.777648] x27: 00000000000ffc5e x26: 0000ffff8fe00020
[  811.782909] x25: 0000ffffd3a1da98 x24: 0000000000000000
[  811.788171] x23: 0000000020000000 x22: 0000aaaad3770f28
[  811.793432] x21: ffffffffffffffff x20: 000080006e538000
[  811.798693] x19: 0000000000000000 x18: 0000000000000010
[  811.803954] x17: 0000ffffab58f988 x16: 0000ffffab660008
[  811.809215] x15: 0000000000000006 x14: ffff000088b2eabf
[  811.814477] x13: ffff000008b2eacd x12: 0000000000000105
[  811.819738] x11: 0000000000000002 x10: 0000000000000106
[  811.824999] x9 : ffff800072d3fb40 x8 : 00000000000af8ec
[  811.830260] x7 : 0000000000000000 x6 : 0000000000000a65
[  811.835522] x5 : 000000000a660a65 x4 : 0000000000000000
[  811.840783] x3 : 0000000000000002 x2 : 0000000000000a66
[  811.846044] x1 : ffff8000745c5780 x0 : 0000000000000000

[  811.852773] Process git (pid: 1447, stack limit = 0xffff800072d3c000)
[  811.859156] Stack: (0xffff800072d3fec0 to 0xffff800072d40000)
[  811.864849] fec0: 0000ffff9809bae0 0000ffff9853a050 0000000000000000 00000000ffffffff
[  811.872611] fee0: 0000ffff8f93c030 0000ffff98d54b40 0000000029555c36 00000000002549e9
[  811.880374] ff00: 0000000029555c36 00000000002549ea 0000000029555c36 00000000002549eb
[  811.888136] ff20: 0000000029555c36 00000000002549ea 0000000000000000 00000000000006dc
[  811.895898] ff40: 0000ffffab660008 0000ffffab58f988 0000000023c57c32 0000ffff8fe00010
[  811.903661] ff60: 000000000003de04 0000ffff9809bae0 0000000000000010 0000aaaad3770f28
[  811.911423] ff80: 0000000000000000 0000ffffd3a1da98 0000ffff8fe00020 00000000000ffc5e
[  811.919186] ffa0: 0000ffff9853a050 0000ffffd3a1d950 0000ffffab54e53c 0000ffffd3a1d950
[  811.926949] ffc0: 0000aaaad3770f28 0000000020000000 0000000000000000 ffffffffffffffff
[  811.934711] ffe0: 0000000000000000 0000000000000000 3136363920746e61 3064613364666464
[  811.942473] Call trace:
[  811.944888] Exception stack(0xffff800072d3fcf0 to 0xffff800072d3fe20)
[  811.951270] fce0:                                   0000000000000000 0001000000000000
[  811.959034] fd00: ffff800072d3fec0 0000ffffab54e53c ffff8000731ab640 0000000000000000
[  811.966796] fd20: 0000000000000004 ffff000008ab9818 ffff8000745c5780 000000000808540c
[  811.974559] fd40: ffff800072d3fd90 ffff0000080c8858 ffff800072d3fe40 ffff8000745c5780
[  811.982321] fd60: 0000000000000004 00000000000003c0 ffff800072d3fe40 0000000000000000
[  811.990084] fd80: 0000ffffd3a1da98 0000ffff8fe00020 0000000000000000 ffff8000745c5780
[  811.997846] fda0: 0000000000000a66 0000000000000002 0000000000000000 000000000a660a65
[  812.005609] fdc0: 0000000000000a65 0000000000000000 00000000000af8ec ffff800072d3fb40
[  812.013371] fde0: 0000000000000106 0000000000000002 0000000000000105 ffff000008b2eacd
[  812.021134] fe00: ffff000088b2eabf 0000000000000006 0000ffffab660008 0000ffffab58f988
[  812.028896] [<0000ffffab54e53c>] 0xffffab54e53c
[  812.033382] Code: aa1c03e1 aa1503e0 8b16027a d63f02e0 (7100001f)
[  812.039501] ---[ end trace e791f586be1831bb ]---

* [PATCHv4 00/15] clk: ti: add support for hwmod clocks
From: Michael Turquette @ 2016-12-12 18:25 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <7371ef35-5d95-bf78-4c97-c61091a1fa4b@ti.com>

Quoting Tero Kristo (2016-12-02 00:15:53)
> On 29/10/16 02:37, Stephen Boyd wrote:
> > On 10/28, Tero Kristo wrote:
> >> Eventually that should happen. However, we have plenty of legacy
> >> code still in place which depend on clk_get functionality within
> >> kernel. The major contributing factor is the hwmod codebase, for
> >> which we have plans to:
> >>
> >> - get this clock driver merged
> >> - implement a new interconnect driver for OMAP family SoCs
> >> - interconnect driver will use DT handles for fetching clocks,
> >> rather than clock aliases
> >> - reset handling will be implemented as part of the interconnect
> >> driver somehow (no prototype / clear plans for that as of yet)
> >> - all the hwmod stuff can be dropped
> >>
> >> The clock alias handling is still needed as a transition phase until
> >> all the above is done, then we can start dropping them. Basically
> >> anything that is using omap_hwmod depends on the clock aliases right
> >> now.
> >
> > Ok, sounds good. Thanks.
> 
> Stephen, any final comments on this series? I guess it's too late to push
> for 4.10, but I would like to get this merged early for the 4.11 window.

Hi Tero,

No final comments from me. I needed to go back and forth with Tony about
the clockdomain modeling, but it seems sensible to create clock
providers from the clock domains if you want to pass those struct clk
objects down to the drivers.

One thing I wasn't able to follow exactly in the code is how the
clockdomains are linking parent clocks from cm1, cm2, etc to the clock
domains. Are the clockdomain providers calling clk_get() on the clocks
that they *consume*, or are the clockdomain providers never calling
clk_get() on those clocks and just establishing the tree hierarchy at
clk_register() time?

Unless Stephen has any more review comments we can merge this into a
clk-next based on v4.10-rc1 when that drops.

Regards,
Mike

> 
> -Tero

* [PATCH] PCI: mvebu: Handle changes to the bridge windows while enabled
From: Jason Gunthorpe @ 2016-12-12 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

The PCI core will write to the bridge window config multiple times
while they are enabled. This can lead to mbus failures like:

 mvebu_mbus: cannot add window '4:e8', conflicts with another window
 mvebu-pcie mbus:pex@e0000000: Could not create MBus window at [mem 0xe0000000-0xe00fffff]: -22

For me this is happening during a hotplug cycle. The PCI core is
not changing the values, just writing them twice while active.

The patch addresses the general case of any change to an active window,
but not atomically. The code is slightly refactored so io and mem
can share more of the window logic.
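
The idea can be sketched in user-space C roughly as follows. This is only
an illustration: struct window, hw_add() and hw_del() are made-up stand-ins
for the driver's window struct and the MBus add/delete helpers, and the
counters exist just to make the behaviour observable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-port window description. */
struct window {
	uint64_t base;
	uint64_t remap;
	size_t size;
};

/* Counters standing in for the real MBus add/delete calls. */
static int adds, dels;

static void hw_add(const struct window *w) { (void)w; adds++; }
static void hw_del(const struct window *w) { (void)w; dels++; }

/*
 * Bring the programmed window 'cur' in line with 'desired'.
 * Rewriting identical values is a no-op. A real change tears the old
 * window down before creating the new one, so it is not atomic.
 */
static void set_window(const struct window *desired, struct window *cur)
{
	if (desired->base == cur->base && desired->remap == cur->remap &&
	    desired->size == cur->size)
		return;

	if (cur->size != 0) {
		hw_del(cur);
		cur->size = 0;
		cur->base = 0;
	}

	if (desired->size == 0)
		return;

	hw_add(desired);
	*cur = *desired;
}
```

Calling set_window() twice with the same desired values results in a single
hw_add(), which is the double-write case seen during hotplug.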

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/pci/host/pci-mvebu.c | 101 +++++++++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 41 deletions(-)

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 307f81d6b479af..af724731b22f53 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -133,6 +133,12 @@ struct mvebu_pcie {
 	int nports;
 };
 
+struct mvebu_pcie_window {
+	phys_addr_t base;
+	phys_addr_t remap;
+	size_t size;
+};
+
 /* Structure representing one PCIe interface */
 struct mvebu_pcie_port {
 	char *name;
@@ -150,10 +156,8 @@ struct mvebu_pcie_port {
 	struct mvebu_sw_pci_bridge bridge;
 	struct device_node *dn;
 	struct mvebu_pcie *pcie;
-	phys_addr_t memwin_base;
-	size_t memwin_size;
-	phys_addr_t iowin_base;
-	size_t iowin_size;
+	struct mvebu_pcie_window memwin;
+	struct mvebu_pcie_window iowin;
 	u32 saved_pcie_stat;
 };
 
@@ -379,23 +383,45 @@ static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
 	}
 }
 
+static void mvebu_pcie_set_window(struct mvebu_pcie_port *port,
+				  unsigned int target, unsigned int attribute,
+				  const struct mvebu_pcie_window *desired,
+				  struct mvebu_pcie_window *cur)
+{
+	if (desired->base == cur->base && desired->remap == cur->remap &&
+	    desired->size == cur->size)
+		return;
+
+	if (cur->size != 0) {
+		mvebu_pcie_del_windows(port, cur->base, cur->size);
+		cur->size = 0;
+		cur->base = 0;
+
+		/*
+		 * If something tries to change the window while it is enabled
+		 * the change will not be done atomically. That would be
+		 * difficult to do in the general case.
+		 */
+	}
+
+	if (desired->size == 0)
+		return;
+
+	mvebu_pcie_add_windows(port, target, attribute, desired->base,
+			       desired->size, desired->remap);
+	*cur = *desired;
+}
+
 static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 {
-	phys_addr_t iobase;
+	struct mvebu_pcie_window desired = {};
 
 	/* Are the new iobase/iolimit values invalid? */
 	if (port->bridge.iolimit < port->bridge.iobase ||
 	    port->bridge.iolimitupper < port->bridge.iobaseupper ||
 	    !(port->bridge.command & PCI_COMMAND_IO)) {
-
-		/* If a window was configured, remove it */
-		if (port->iowin_base) {
-			mvebu_pcie_del_windows(port, port->iowin_base,
-					       port->iowin_size);
-			port->iowin_base = 0;
-			port->iowin_size = 0;
-		}
-
+		mvebu_pcie_set_window(port, port->io_target, port->io_attr,
+				      &desired, &port->iowin);
 		return;
 	}
 
@@ -412,32 +438,27 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 	 * specifications. iobase is the bus address, port->iowin_base
 	 * is the CPU address.
 	 */
-	iobase = ((port->bridge.iobase & 0xF0) << 8) |
-		(port->bridge.iobaseupper << 16);
-	port->iowin_base = port->pcie->io.start + iobase;
-	port->iowin_size = ((0xFFF | ((port->bridge.iolimit & 0xF0) << 8) |
-			    (port->bridge.iolimitupper << 16)) -
-			    iobase) + 1;
-
-	mvebu_pcie_add_windows(port, port->io_target, port->io_attr,
-			       port->iowin_base, port->iowin_size,
-			       iobase);
+	desired.remap = ((port->bridge.iobase & 0xF0) << 8) |
+			(port->bridge.iobaseupper << 16);
+	desired.base = port->pcie->io.start + desired.remap;
+	desired.size = ((0xFFF | ((port->bridge.iolimit & 0xF0) << 8) |
+			 (port->bridge.iolimitupper << 16)) -
+			desired.remap) +
+		       1;
+
+	mvebu_pcie_set_window(port, port->io_target, port->io_attr, &desired,
+			      &port->iowin);
 }
 
 static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 {
+	struct mvebu_pcie_window desired = {.remap = MVEBU_MBUS_NO_REMAP};
+
 	/* Are the new membase/memlimit values invalid? */
 	if (port->bridge.memlimit < port->bridge.membase ||
 	    !(port->bridge.command & PCI_COMMAND_MEMORY)) {
-
-		/* If a window was configured, remove it */
-		if (port->memwin_base) {
-			mvebu_pcie_del_windows(port, port->memwin_base,
-					       port->memwin_size);
-			port->memwin_base = 0;
-			port->memwin_size = 0;
-		}
-
+		mvebu_pcie_set_window(port, port->mem_target, port->mem_attr,
+				      &desired, &port->memwin);
 		return;
 	}
 
@@ -447,14 +468,12 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 	 * window to setup, according to the PCI-to-PCI bridge
 	 * specifications.
 	 */
-	port->memwin_base  = ((port->bridge.membase & 0xFFF0) << 16);
-	port->memwin_size  =
-		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
-		port->memwin_base + 1;
-
-	mvebu_pcie_add_windows(port, port->mem_target, port->mem_attr,
-			       port->memwin_base, port->memwin_size,
-			       MVEBU_MBUS_NO_REMAP);
+	desired.base = ((port->bridge.membase & 0xFFF0) << 16);
+	desired.size = (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
+		       desired.base + 1;
+
+	mvebu_pcie_set_window(port, port->mem_target, port->mem_attr, &desired,
+			      &port->memwin);
 }
 
 /*
-- 
2.7.4
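
As a worked example of the window arithmetic used above, the following
stand-alone sketch decodes bridge base/limit register values into a CPU
window. The function names are hypothetical and the register widths are
assumed from the PCI-to-PCI bridge layout; only the shifts and masks are
taken from the patch.

```c
#include <stdint.h>

/* Memory window: the top 12 bits of membase/memlimit are address bits 31:20. */
static uint32_t mem_window_base(uint16_t membase)
{
	return (uint32_t)(membase & 0xFFF0) << 16;
}

static uint64_t mem_window_size(uint16_t membase, uint16_t memlimit)
{
	/* The limit register names the last 1 MiB block, inclusive. */
	uint32_t last = ((uint32_t)(memlimit & 0xFFF0) << 16) | 0xFFFFF;

	return (uint64_t)last - mem_window_base(membase) + 1;
}

/* I/O window: iobase holds bits 15:12, iobaseupper holds bits 31:16. */
static uint32_t io_window_start(uint8_t iobase, uint16_t iobaseupper)
{
	return ((uint32_t)(iobase & 0xF0) << 8) | ((uint32_t)iobaseupper << 16);
}

static uint64_t io_window_size(uint8_t iobase, uint16_t iobaseupper,
			       uint8_t iolimit, uint16_t iolimitupper)
{
	/* The limit register names the last 4 KiB block, inclusive. */
	uint32_t last = 0xFFF | ((uint32_t)(iolimit & 0xF0) << 8) |
			((uint32_t)iolimitupper << 16);

	return (uint64_t)last - io_window_start(iobase, iobaseupper) + 1;
}
```

With membase == memlimit == 0xe000 this yields base 0xe0000000 and size
0x100000, matching the [mem 0xe0000000-0xe00fffff] window in the failure
message quoted in the commit log.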

* [PATCH V7 0/8] Add support for privileged mappings
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

This series is a resend of the V5 that Mitch sent some time back [2].
All the patches are the same and I have just rebased them. Redid patch [3],
as it does not apply in this code base. Added a couple more patches
[4], [5] from Robin for adding the privileged attributes to the armv7s format
and for the arm-smmuv3 revert.

The following patch to the ARM SMMU driver:

    commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
    Author: Robin Murphy <robin.murphy@arm.com>
    Date:   Tue Jan 26 18:06:34 2016 +0000
    
        iommu/arm-smmu: Treat all device transactions as unprivileged

started forcing all SMMU transactions to come through as "unprivileged".
The rationale given was that:

  (1) There is no way in the IOMMU API to even request privileged
      mappings.

  (2) It's difficult to implement a DMA mapper that correctly models the
      ARM VMSAv8 behavior of unprivileged-writeable =>
      privileged-execute-never.

This series rectifies (1) by introducing an IOMMU API for privileged
mappings and implements it in io-pgtable-arm.

This series rectifies (2) by introducing a new dma attribute
(DMA_ATTR_PRIVILEGED) for users of the DMA API that need privileged
mappings which are inaccessible to lesser-privileged execution levels, and
implements it in the arm64 IOMMU DMA mapper.  The one known user (pl330.c)
is converted over to the new attribute.
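
The two halves fit together roughly as in the following user-space sketch.
It is not the kernel code: all SK_ names are made up, the bit positions are
placeholders, and direction handling is reduced to read+write for brevity.
Only the relationships between the flags are meant to be illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; only the relationships between them matter. */
#define SK_IOMMU_READ		(1u << 0)
#define SK_IOMMU_WRITE		(1u << 1)
#define SK_IOMMU_CACHE		(1u << 2)
#define SK_IOMMU_PRIV		(1u << 5)

#define SK_DMA_ATTR_PRIVILEGED	(1ul << 9)		/* assumed bit position */

#define SK_PTE_AP_UNPRIV	(UINT64_C(1) << 6)	/* assumed bit position */
#define SK_PTE_AP_RDONLY	(UINT64_C(2) << 6)	/* assumed bit position */

/* DMA API side: fold coherency and attributes into IOMMU prot flags. */
static unsigned int sk_attrs_to_prot(bool coherent, unsigned long attrs)
{
	unsigned int prot = SK_IOMMU_READ | SK_IOMMU_WRITE;

	if (coherent)
		prot |= SK_IOMMU_CACHE;
	if (attrs & SK_DMA_ATTR_PRIVILEGED)
		prot |= SK_IOMMU_PRIV;
	return prot;
}

/* Page-table side: grant unprivileged access unless IOMMU_PRIV was asked for. */
static uint64_t sk_prot_to_pte(unsigned int prot)
{
	uint64_t pte = 0;

	if (!(prot & SK_IOMMU_WRITE) && (prot & SK_IOMMU_READ))
		pte |= SK_PTE_AP_RDONLY;
	if (!(prot & SK_IOMMU_PRIV))
		pte |= SK_PTE_AP_UNPRIV;
	return pte;
}
```

A mapping requested without the attribute keeps the unprivileged access
bit, so existing users see no change; only callers that pass the new
attribute get a privileged-only mapping.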

Jordan and Jeremy can provide more info on the use case if needed, but the
high level is that it's a security feature to prevent attacks such as [1].

Note that I tested this on arm64 with arm-smmuv2 and the short-descriptor
changes, and do not have a platform to test this with arm-smmuv3.

[1] https://github.com/robclark/kilroy
[2] https://lkml.org/lkml/2016/7/27/590
[3] https://patchwork.kernel.org/patch/9250493/
[4] http://www.linux-arm.org/git?p=linux-rm.git;a=commit;h=1291bd74f05d31da1dab3df02987cba5bd25849b
[5] http://www.linux-arm.org/git?p=linux-rm.git;a=commit;h=a79c1c6333f26849dba418cd92de26b60f5954f3

Changelog:
  v6..v7

    - Added a couple more patches, picked up acks, updated commit log.

  v5..v6

    - Rebased all the patches and redid 6/6 as it does not apply in
      this code base.

  v4..v5

    - Simplified patch 4/6 (suggested by Robin Murphy).

  v3..v4

    - Rebased and reworked on linux next due to the dma attrs rework going
      on over there.  Patches changed: 3/6, 4/6, and 5/6.

  v2..v3

    - Incorporated feedback from Robin:
      * Various comments and re-wordings.
      * Use existing bit definitions for IOMMU_PRIV implementation
        in io-pgtable-arm.
      * Renamed and redocumented dma_direction_to_prot.
      * Don't worry about executability in new DMA attr.

  v1..v2

    - Added a new DMA attribute to make executable privileged mappings
      work, and use that in the pl330 driver (suggested by Will).

Jeremy Gebben (1):
  iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

Mitchel Humpherys (4):
  iommu: add IOMMU_PRIV attribute
  common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute
  arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED
  dmaengine: pl330: Make sure microcode is privileged

Robin Murphy (2):
  iommu/io-pgtable-arm-v7s: Add support for the IOMMU_PRIV flag
  iommu/arm-smmu: Revert "iommu/arm-smmu: Set PRIVCFG in stage 1 STEs"

Sricharan R (1):
  iommu/arm-smmu: Set privileged attribute to 'default' instead of
    'unprivileged'

 Documentation/DMA-attributes.txt   | 10 ++++++++++
 arch/arm64/mm/dma-mapping.c        |  6 +++---
 drivers/dma/pl330.c                |  5 +++--
 drivers/iommu/arm-smmu-v3.c        |  7 +------
 drivers/iommu/arm-smmu.c           |  2 +-
 drivers/iommu/dma-iommu.c          | 10 ++++++++--
 drivers/iommu/io-pgtable-arm-v7s.c |  6 +++++-
 drivers/iommu/io-pgtable-arm.c     |  5 ++++-
 include/linux/dma-iommu.h          |  3 ++-
 include/linux/dma-mapping.h        |  7 +++++++
 include/linux/iommu.h              |  1 +
 11 files changed, 45 insertions(+), 17 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

* [PATCH V7 1/8] iommu: add IOMMU_PRIV attribute
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Mitchel Humpherys <mitchelh@codeaurora.org>

Add the IOMMU_PRIV attribute, which is used to indicate privileged
mappings.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f2960e4..bf22131 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@
 #define IOMMU_CACHE	(1 << 2) /* DMA cache coherency */
 #define IOMMU_NOEXEC	(1 << 3)
 #define IOMMU_MMIO	(1 << 4) /* e.g. things like MSI doorbells */
+#define IOMMU_PRIV	(1 << 5) /* privileged */
 
 struct iommu_ops;
 struct iommu_group;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

* [PATCH V7 2/8] iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Jeremy Gebben <jgebben@codeaurora.org>

Allow the creation of privileged mode mappings, for stage 1 only.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
---
 drivers/iommu/io-pgtable-arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f5c90e1..69ba83a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -350,11 +350,14 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 
 	if (data->iop.fmt == ARM_64_LPAE_S1 ||
 	    data->iop.fmt == ARM_32_LPAE_S1) {
-		pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+		pte = ARM_LPAE_PTE_nG;
 
 		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
 			pte |= ARM_LPAE_PTE_AP_RDONLY;
 
+		if (!(prot & IOMMU_PRIV))
+			pte |= ARM_LPAE_PTE_AP_UNPRIV;
+
 		if (prot & IOMMU_MMIO)
 			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

* [PATCH V7 3/8] iommu/io-pgtable-arm-v7s: Add support for the IOMMU_PRIV flag
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Robin Murphy <robin.murphy@arm.com>

The short-descriptor format also allows privileged-only mappings, so
let's wire it up.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Sricharan R <sricharan@codeaurora.org>
---
 drivers/iommu/io-pgtable-arm-v7s.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index f50e51c..1177782 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -265,7 +265,9 @@ static arm_v7s_iopte arm_v7s_prot_to_pte(int prot, int lvl,
 	if (!(prot & IOMMU_MMIO))
 		pte |= ARM_V7S_ATTR_TEX(1);
 	if (ap) {
-		pte |= ARM_V7S_PTE_AF | ARM_V7S_PTE_AP_UNPRIV;
+		pte |= ARM_V7S_PTE_AF;
+		if (!(prot & IOMMU_PRIV))
+			pte |= ARM_V7S_PTE_AP_UNPRIV;
 		if (!(prot & IOMMU_WRITE))
 			pte |= ARM_V7S_PTE_AP_RDONLY;
 	}
@@ -288,6 +290,8 @@ static int arm_v7s_pte_to_prot(arm_v7s_iopte pte, int lvl)
 
 	if (!(attr & ARM_V7S_PTE_AP_RDONLY))
 		prot |= IOMMU_WRITE;
+	if (!(attr & ARM_V7S_PTE_AP_UNPRIV))
+		prot |= IOMMU_PRIV;
 	if ((attr & (ARM_V7S_TEX_MASK << ARM_V7S_TEX_SHIFT)) == 0)
 		prot |= IOMMU_MMIO;
 	else if (pte & ARM_V7S_ATTR_C)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
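
The two hunks above are meant to be inverses of each other for the
privilege flag. A small user-space sketch of that round trip, with made-up
SK_ names and placeholder bit positions (not the real short-descriptor
encoding):

```c
#include <stdint.h>

#define SK_PROT_WRITE		(1u << 1)
#define SK_PROT_PRIV		(1u << 5)

/* Assumed, illustrative access-permission bits. */
#define SK_V7S_AP_UNPRIV	(1u << 0)
#define SK_V7S_AP_RDONLY	(1u << 1)

/* Encode: both hardware bits are the logical inverse of the prot flags. */
static uint32_t sk_v7s_prot_to_ap(unsigned int prot)
{
	uint32_t ap = 0;

	if (!(prot & SK_PROT_PRIV))
		ap |= SK_V7S_AP_UNPRIV;
	if (!(prot & SK_PROT_WRITE))
		ap |= SK_V7S_AP_RDONLY;
	return ap;
}

/* Decode: recover the prot flags from the hardware bits. */
static unsigned int sk_v7s_ap_to_prot(uint32_t ap)
{
	unsigned int prot = 0;

	if (!(ap & SK_V7S_AP_RDONLY))
		prot |= SK_PROT_WRITE;
	if (!(ap & SK_V7S_AP_UNPRIV))
		prot |= SK_PROT_PRIV;
	return prot;
}
```

Decoding an encoded value gives back the same WRITE and PRIV flags for all
four combinations, which is the property the pte_to_prot hunk preserves.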

* [PATCH V7 4/8] common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Mitchel Humpherys <mitchelh@codeaurora.org>

This patch adds the DMA_ATTR_PRIVILEGED attribute to the DMA-mapping
subsystem.

Some advanced peripherals such as remote processors and GPUs perform
accesses to DMA buffers in both privileged "supervisor" and unprivileged
"user" modes.  This attribute is used to indicate to the DMA-mapping
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).

Cc: linux-doc@vger.kernel.org
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
 Documentation/DMA-attributes.txt | 10 ++++++++++
 include/linux/dma-mapping.h      |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 98bf7ac..44c6bc4 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -143,3 +143,13 @@ So, this provides a way for drivers to avoid those error messages on calls
 where allocation failures are not a problem, and shouldn't bother the logs.
 
 NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
+
+DMA_ATTR_PRIVILEGED
+------------------------------
+
+Some advanced peripherals such as remote processors and GPUs perform
+accesses to DMA buffers in both privileged "supervisor" and unprivileged
+"user" modes.  This attribute is used to indicate to the DMA-mapping
+subsystem that the buffer is fully accessible at the elevated privilege
+level (and ideally inaccessible or at least read-only at the
+lesser-privileged levels).
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6f3e6ca..ee31ea1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -63,6 +63,13 @@
 #define DMA_ATTR_NO_WARN	(1UL << 8)
 
 /*
+ * DMA_ATTR_PRIVILEGED: used to indicate that the buffer is fully
+ * accessible at an elevated privilege level (and ideally inaccessible or
+ * at least read-only at lesser-privileged levels).
+ */
+#define DMA_ATTR_PRIVILEGED		(1UL << 9)
+
+/*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.
  * It can be given to a device to use as a DMA source or target.  A CPU cannot
  * reference a dma_addr_t directly because there may be translation between
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

* [PATCH V7 5/8] arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Mitchel Humpherys <mitchelh@codeaurora.org>

The newly added DMA_ATTR_PRIVILEGED is useful for creating mappings that
are only accessible to privileged DMA engines.  Implement it in
dma-iommu.c so that the ARM64 DMA IOMMU mapper can make use of it.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
 arch/arm64/mm/dma-mapping.c |  6 +++---
 drivers/iommu/dma-iommu.c   | 10 ++++++++--
 include/linux/dma-iommu.h   |  3 ++-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 401f79a..ae76ead 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -557,7 +557,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 unsigned long attrs)
 {
 	bool coherent = is_device_dma_coherent(dev);
-	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
 	size_t iosize = size;
 	void *addr;
 
@@ -711,7 +711,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
 				   unsigned long attrs)
 {
 	bool coherent = is_device_dma_coherent(dev);
-	int prot = dma_direction_to_prot(dir, coherent);
+	int prot = dma_info_to_prot(dir, coherent, attrs);
 	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
 	if (!iommu_dma_mapping_error(dev, dev_addr) &&
@@ -769,7 +769,7 @@ static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
 		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
 
 	return iommu_dma_map_sg(dev, sgl, nelems,
-			dma_direction_to_prot(dir, coherent));
+				dma_info_to_prot(dir, coherent, attrs));
 }
 
 static void __iommu_unmap_sg_attrs(struct device *dev,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d2a7a46..756d5e0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -182,16 +182,22 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 EXPORT_SYMBOL(iommu_dma_init_domain);
 
 /**
- * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
+ *                    page flags.
  * @dir: Direction of DMA transfer
  * @coherent: Is the DMA master cache-coherent?
+ * @attrs: DMA attributes for the mapping
  *
  * Return: corresponding IOMMU API page protection flags
  */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+		     unsigned long attrs)
 {
 	int prot = coherent ? IOMMU_CACHE : 0;
 
+	if (attrs & DMA_ATTR_PRIVILEGED)
+		prot |= IOMMU_PRIV;
+
 	switch (dir) {
 	case DMA_BIDIRECTIONAL:
 		return prot | IOMMU_READ | IOMMU_WRITE;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c5890..a203181 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -34,7 +34,8 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 		u64 size, struct device *dev);
 
 /* General helpers for DMA-API <-> IOMMU-API interaction */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+		     unsigned long attrs);
 
 /*
  * These implement the bulk of the relevant DMA mapping callbacks, but require
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related

* [PATCH V7 6/8] dmaengine: pl330: Make sure microcode is privileged
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Mitchel Humpherys <mitchelh@codeaurora.org>

The PL330 is hard-wired such that instruction fetches on both the
manager and channel threads go out onto the bus with the "privileged"
bit set. This can become troublesome once there is an IOMMU or other
form of memory protection downstream, since those will typically be
programmed by the DMA mapping subsystem in the expectation of normal
unprivileged transactions (such as the PL330 channel threads' own data
accesses as currently configured by this driver).

To avoid the case of, say, an IOMMU blocking an unexpected privileged
transaction with a permission fault, use the newly-introduced
DMA_ATTR_PRIVILEGED attribute for the mapping of our microcode buffer.
That way the DMA layer can do whatever it needs to do to make things
continue to work as expected on more complex systems.
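
The failure mode described above can be illustrated with a toy
permission check. This is a deliberately simplified model, not the
behaviour of any particular IOMMU: the TOY_* flags and toy_iommu_permits()
are hypothetical, and the strict "privilege must match" policy is an
assumption made to keep the example short.

```c
#include <stdbool.h>

/* Hypothetical mapping flags; not the kernel's IOMMU_* values. */
#define TOY_MAP_READ	(1u << 0)
#define TOY_MAP_WRITE	(1u << 1)
#define TOY_MAP_PRIV	(1u << 2)

/*
 * Return true if the toy IOMMU lets a transaction through, false if it
 * would raise a permission fault.  'write' is false for reads and
 * instruction fetches; 'privileged' is the state of the bus bit.
 */
static bool toy_iommu_permits(unsigned int map_flags, bool write,
			      bool privileged)
{
	bool map_priv = (map_flags & TOY_MAP_PRIV) != 0;

	/* Assumed policy: mapping and transaction privilege must agree. */
	if (map_priv != privileged)
		return false;

	return (map_flags & (write ? TOY_MAP_WRITE : TOY_MAP_READ)) != 0;
}
```

In this model a privileged instruction fetch from a buffer mapped
without TOY_MAP_PRIV faults, while the same fetch from a buffer mapped
with it succeeds, and ordinary unprivileged data accesses to normal
mappings are unaffected.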

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
[rm: remove now-redundant local variable, clarify commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/dma/pl330.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 030fe05..1e5ae0c 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1859,9 +1859,10 @@ static int dmac_alloc_resources(struct pl330_dmac *pl330)
 	 * Alloc MicroCode buffer for 'chans' Channel threads.
 	 * A channel's buffer offset is (Channel_Id * MCODE_BUFF_PERCHAN)
 	 */
-	pl330->mcode_cpu = dma_alloc_coherent(pl330->ddma.dev,
+	pl330->mcode_cpu = dma_alloc_attrs(pl330->ddma.dev,
 				chans * pl330->mcbufsz,
-				&pl330->mcode_bus, GFP_KERNEL);
+				&pl330->mcode_bus, GFP_KERNEL,
+				DMA_ATTR_PRIVILEGED);
 	if (!pl330->mcode_cpu) {
 		dev_err(pl330->ddma.dev, "%s:%d Can't allocate memory!\n",
 			__func__, __LINE__);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related

* [PATCH V7 7/8] iommu/arm-smmu: Set privileged attribute to 'default' instead of 'unprivileged'
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

Currently the driver sets the privilege of all device transactions
to UNPRIVILEGED, but there are cases where an IOMMU master wants
to isolate privileged supervisor accesses from unprivileged user accesses.
So don't override the privilege setting to unprivileged; instead,
set it to default (as incoming) and let it be controlled by the pagetable
settings.
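
The effect of the setting can be sketched as a small standalone
function. The enum and function names below are hypothetical and the
numeric encodings are not meant to match the hardware register; only the
override-versus-pass-through behaviour is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the stream-to-context privilege setting. */
enum sketch_privcfg {
	SKETCH_PRIVCFG_DEFAULT,	/* pass the incoming attribute through */
	SKETCH_PRIVCFG_UNPRIV,	/* force every transaction unprivileged */
	SKETCH_PRIVCFG_PRIV,	/* force every transaction privileged */
};

/* Privilege the page tables see for a transaction under a given setting. */
static bool sketch_effective_priv(enum sketch_privcfg cfg, bool incoming_priv)
{
	switch (cfg) {
	case SKETCH_PRIVCFG_UNPRIV:
		return false;
	case SKETCH_PRIVCFG_PRIV:
		return true;
	case SKETCH_PRIVCFG_DEFAULT:
	default:
		return incoming_priv;
	}
}
```

With the forced-unprivileged setting a master's privileged accesses are
indistinguishable from its unprivileged ones, so a privileged-only
mapping can never be reached; with the default setting the distinction
survives to the page table permission check.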

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index eaa8f44..8bb0eea 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1213,7 +1213,7 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
 			continue;
 
 		s2cr[idx].type = type;
-		s2cr[idx].privcfg = S2CR_PRIVCFG_UNPRIV;
+		s2cr[idx].privcfg = S2CR_PRIVCFG_DEFAULT;
 		s2cr[idx].cbndx = cbndx;
 		arm_smmu_write_s2cr(smmu, idx);
 	}
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related

* [PATCH V7 8/8] iommu/arm-smmu: Revert "iommu/arm-smmu: Set PRIVCFG in stage 1 STEs"
From: Sricharan R @ 2016-12-12 18:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481567927-14791-1-git-send-email-sricharan@codeaurora.org>

From: Robin Murphy <robin.murphy@arm.com>

Now that proper privileged mappings can be requested via IOMMU_PRIV,
unconditionally overriding the incoming PRIVCFG becomes the wrong thing
to do, so stop it.

This reverts commit df5e1a0f2a2d779ad467a691203bcbc74d75690e.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 257a6a3..0eca0553 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -269,9 +269,6 @@
 #define STRTAB_STE_1_SHCFG_INCOMING	1UL
 #define STRTAB_STE_1_SHCFG_SHIFT	44
 
-#define STRTAB_STE_1_PRIVCFG_UNPRIV	2UL
-#define STRTAB_STE_1_PRIVCFG_SHIFT	48
-
 #define STRTAB_STE_2_S2VMID_SHIFT	0
 #define STRTAB_STE_2_S2VMID_MASK	0xffffUL
 #define STRTAB_STE_2_VTCR_SHIFT		32
@@ -1073,9 +1070,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 #ifdef CONFIG_PCI_ATS
 			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
 #endif
-			 STRTAB_STE_1_STRW_NSEL1 << STRTAB_STE_1_STRW_SHIFT |
-			 STRTAB_STE_1_PRIVCFG_UNPRIV <<
-			 STRTAB_STE_1_PRIVCFG_SHIFT);
+			 STRTAB_STE_1_STRW_NSEL1 << STRTAB_STE_1_STRW_SHIFT);
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related

