Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] dt: pwm: lpc32xx: add description of clocks and #pwm-cells properties
From: Vladimir Zapolskiy @ 2016-12-05  1:42 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161205014237.1689-1-vz@mleia.com>

NXP LPC32xx SoCs have two simple independent PWM controllers with a single
output each, in this case there is no need to specify PWM channel argument
on client side, one cell for setting PWM output frequency is sufficient.

Another added to the description property 'clocks' has a standard meaning
of a controller supply clock, in the LPC32xx User's Manual the clock is
denoted as PWM1_CLK or PWM2_CLK clock.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
---
 Documentation/devicetree/bindings/pwm/lpc32xx-pwm.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/pwm/lpc32xx-pwm.txt b/Documentation/devicetree/bindings/pwm/lpc32xx-pwm.txt
index 74b5bc5..523d796 100644
--- a/Documentation/devicetree/bindings/pwm/lpc32xx-pwm.txt
+++ b/Documentation/devicetree/bindings/pwm/lpc32xx-pwm.txt
@@ -3,15 +3,22 @@ LPC32XX PWM controller
 Required properties:
 - compatible: should be "nxp,lpc3220-pwm"
 - reg: physical base address and length of the controller's registers
+- clocks: clock phandle and clock specifier pair
+- #pwm-cells: should be 1, the cell is used to specify the period in
+  nanoseconds.
 
 Examples:
 
 pwm at 4005c000 {
 	compatible = "nxp,lpc3220-pwm";
 	reg = <0x4005c000 0x4>;
+	clocks = <&clk LPC32XX_CLK_PWM1>;
+	#pwm-cells = <1>;
 };
 
 pwm at 4005c004 {
 	compatible = "nxp,lpc3220-pwm";
 	reg = <0x4005c004 0x4>;
+	clocks = <&clk LPC32XX_CLK_PWM2>;
+	#pwm-cells = <1>;
 };
-- 
2.10.2

^ permalink raw reply related

* [PATCH 0/3] pwm: lpc32xx: switch driver to one phandle argument for PWM consumers
From: Vladimir Zapolskiy @ 2016-12-05  1:42 UTC (permalink / raw)
  To: linux-arm-kernel

The change adds private of_xlate() function to process one argument given
along with a PWM phandle. LPC32xx SoCs have two independent single channel
PWM controllers, the argument is used as a description of PWM output
frequency, see previous discussion here:

  http://www.spinics.net/lists/arm-kernel/msg534068.html

The changes on actual board dts files are not needed at the moment, because
all LPC32xx boards in upstream don't describe any PWM consumers.

Vladimir Zapolskiy (3):
  dt: pwm: lpc32xx: add description of clocks and #pwm-cells properties
  pwm: lpc32xx: switch driver to one phandle argument for PWM consumers
  pwm: lpc32xx: remove handling of PWM channels

 .../devicetree/bindings/pwm/lpc32xx-pwm.txt        |  7 +++++
 drivers/pwm/pwm-lpc32xx.c                          | 36 +++++++++++++++-------
 2 files changed, 32 insertions(+), 11 deletions(-)

-- 
2.10.2

^ permalink raw reply

* [v2, PATCH] pinctrl: mt8173: set GPIO16 to usb iddig mode
From: Chunfeng Yun @ 2016-12-05  1:38 UTC (permalink / raw)
  To: linux-arm-kernel

the default mode of GPIO16 pin is gpio, when set EINT16 to
IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
fixed when set its default mode as usb iddig.

Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Acked-by: Hongzhou Yang <hongzhou.yang@mediatek.com>
---
 drivers/pinctrl/mediatek/pinctrl-mtk-mt8173.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-mt8173.h b/drivers/pinctrl/mediatek/pinctrl-mtk-mt8173.h
index 13e5b68..9b018fd 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-mt8173.h
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-mt8173.h
@@ -201,7 +201,7 @@
 	MTK_PIN(
 		PINCTRL_PIN(16, "IDDIG"),
 		NULL, "mt8173",
-		MTK_EINT_FUNCTION(0, 16),
+		MTK_EINT_FUNCTION(1, 16),
 		MTK_FUNCTION(0, "GPIO16"),
 		MTK_FUNCTION(1, "IDDIG"),
 		MTK_FUNCTION(2, "CMFLASH"),
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH] ARM: dts: imx7d: fix LCDIF clock assignment
From: Stefan Agner @ 2016-12-05  1:26 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAOMZO5AqKupmmKb4biGfmy4v3ZwS-BFjPY=MLd8Rc0ZXL+vQvg@mail.gmail.com>

Hi Shawn

On 2016-11-23 15:02, Fabio Estevam wrote:
> On Tue, Nov 22, 2016 at 10:42 PM, Stefan Agner <stefan@agner.ch> wrote:
>> The eLCDIF IP of the i.MX 7 SoC knows multiple clocks and lists them
>> separately:
>>
>> Clock      Clock Root              Description
>> apb_clk    MAIN_AXI_CLK_ROOT       AXI clock
>> pix_clk    LCDIF_PIXEL_CLK_ROOT    Pixel clock
>> ipg_clk_s  MAIN_AXI_CLK_ROOT       Peripheral access clock
>>
>> All of them are switched by a single gate, which is part of the
>> IMX7D_LCDIF_PIXEL_ROOT_CLK clock. Hence using that clock also for
>> the AXI bus clock (clock-name "axi") makes sure the gate gets
>> enabled when accessing registers.
>>
>> There seem to be no separate AXI display clock, and the clock is
>> optional. Hence remove the dummy clock.
>>
>> This fixes kernel freezes when starting the X-Server (which
>> disables/re-enables the display controller).
>>
>> Signed-off-by: Stefan Agner <stefan@agner.ch>
> 
> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>

Since this fixes a kernel freeze, is there a chance to get this still in
4.9?

--
Stefan

^ permalink raw reply

* [PATCH v9 07/16] drivers: acpi: implement acpi_dma_configure
From: Rafael J. Wysocki @ 2016-12-05  1:21 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161203103927.GA14953@red-moon>

On Sat, Dec 3, 2016 at 11:39 AM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> On Sat, Dec 03, 2016 at 03:11:09AM +0100, Rafael J. Wysocki wrote:
>> On Fri, Dec 2, 2016 at 4:38 PM, Lorenzo Pieralisi
>> <lorenzo.pieralisi@arm.com> wrote:
>> > Rafael, Mark, Suravee,
>> >
>> > On Mon, Nov 21, 2016 at 10:01:39AM +0000, Lorenzo Pieralisi wrote:
>> >> On DT based systems, the of_dma_configure() API implements DMA
>> >> configuration for a given device. On ACPI systems an API equivalent to
>> >> of_dma_configure() is missing which implies that it is currently not
>> >> possible to set-up DMA operations for devices through the ACPI generic
>> >> kernel layer.
>> >>
>> >> This patch fills the gap by introducing acpi_dma_configure/deconfigure()
>> >> calls that for now are just wrappers around arch_setup_dma_ops() and
>> >> arch_teardown_dma_ops() and also updates ACPI and PCI core code to use
>> >> the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions.
>> >>
>> >> Since acpi_dma_configure() is used to configure DMA operations, the
>> >> function initializes the dma/coherent_dma masks to sane default values
>> >> if the current masks are uninitialized (also to keep the default values
>> >> consistent with DT systems) to make sure the device has a complete
>> >> default DMA set-up.
>> >
>> > I spotted a niggle that unfortunately was hard to spot (and should not
>> > be a problem per se but better safe than sorry) and I am not comfortable
>> > with it.
>> >
>> > Following commit d0562674838c ("ACPI / scan: Parse _CCA and setup
>> > device coherency") in acpi_bind_one() we check if the acpi_device
>> > associated with a device just added supports DMA, first it was
>> > done with acpi_check_dma() and then commit 1831eff876bd ("device
>> > property: ACPI: Make use of the new DMA Attribute APIs") changed
>> > it to acpi_get_dma_attr().
>> >
>> > The subsequent check (attr != DEV_DMA_NOT_SUPPORTED) is always true
>> > on _any_ acpi device we pass to acpi_bind_one() on x86, which was
>> > fine because we used it to call arch_setup_dma_ops(), which is a nop
>> > on x86. On ARM64 a _CCA method is required to define if a device
>> > supports DMA so (attr != DEV_DMA_NOT_SUPPORTED) may well be false.
>> >
>> > Now, acpi_bind_one() is used to bind an acpi_device to its physical
>> > node also for pseudo-devices like cpus and memory nodes. For those
>> > objects, on x86, attr will always be != DEV_DMA_NOT_SUPPORTED.
>> >
>> > So far so good, because on x86 arch_setup_dma_ops() is empty code.
>> >
>> > With this patch, I use the (attr != DEV_DMA_NOT_SUPPORTED) check
>> > to call acpi_dma_configure() which is basically a nop on x86 except
>> > that it sets up the dma_mask/coherent_dma_mask to a sane default value
>> > (after all we are setting up DMA for the device so it makes sense to
>> > initialize the masks there if they were unset since we are configuring
>> > DMA for the device in question) for the given device.
>> >
>> > Problem is, as per the explanation above, we are also setting the
>> > default dma masks for pseudo-devices (eg CPUs) that were previously
>> > untouched, it should not be a problem per-se but I am not comfortable
>> > with that, honestly it does not make much sense.
>> >
>> > An easy "fix" would be to move the default dma masks initialization out
>> > of acpi_dma_configure() (as it was in previous patch versions of this
>> > series - I moved it to acpi_dma_configure() just a consolidation point
>> > for initializing the masks instead of scattering them in every
>> > acpi_dma_configure caller) I can send this as a fix-up patch to Joerg if
>> > we think that's the right thing to do (or I can send it to Rafael later
>> > when the code is in the merged depending on the timing) just let me
>> > know please.
>>
>> Why can't arch_setup_dma_ops() set those masks too?
>
> Because the dma masks set-up is done by the caller (see
> of_dma_configure()) according to firmware configuration or
> platform data knowledge. I wanted to replicate the of_dma_configure()
> interface on ACPI for obvious reasons (on ARM systems), I stopped
> short of adding ACPI code to mirror of_dma_get_range() equivalent
> (through the _DMA object) but I am really really nervous about changing
> the code path on x86 because in theory all is fine, in practice even
> just setting the masks to sane values can have unexpected consequences,
> I just can't know (that's why I wasn't doing it in the first iterations
> of this series).
>
> Side note: DT with of_dma_configure() and ACPI with
> acpi_create_platform_device() set the default dma mask for all
> platform devices already _regardless_ of what they really are, though
> arguably acpi_bind_one() touches ways more devices.
>
> I really think that removing the default dma masks settings from
> acpi_dma_configure() is the safer thing to do for the time being (or
> moving acpi_dma_configure() to acpi_create_platform_device(), where the
> DMA masks are set-up by default by core ACPI. Mark, Suravee, what was
> the rationale behind calling arch_setup_dma_ops() in acpi_bind_one() ?)

Alternatively, you can add one more arch wrapper that will be a no-op
on x86 and that will set up the default masks and call
arch_setup_dma_ops() on ARM.  Then, you can invoke that from
acpi_dma_configure().

Or make the definition of acpi_dma_configure() itself depend on the
architecture.

Thanks,
Rafael

^ permalink raw reply

* [PATCH 3/3] ARM: dts: imx6: Support Savageboard quad
From: Milo Kim @ 2016-12-05  1:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161205010729.7047-1-woogyom.kim@gmail.com>

Use common board file and support SATA interface additionally.

Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
---
 arch/arm/boot/dts/imx6q-savageboard.dts | 54 +++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 arch/arm/boot/dts/imx6q-savageboard.dts

diff --git a/arch/arm/boot/dts/imx6q-savageboard.dts b/arch/arm/boot/dts/imx6q-savageboard.dts
new file mode 100644
index 0000000..8d74002
--- /dev/null
+++ b/arch/arm/boot/dts/imx6q-savageboard.dts
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2016 Milo Kim <woogyom.kim@gmail.com>
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License
+ *     version 2 as published by the Free Software Foundation.
+ *
+ *     This file is distributed in the hope that it will be useful
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ * Or, alternatively
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED , WITHOUT WARRANTY OF ANY KIND
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+#include "imx6q.dtsi"
+#include "imx6qdl-savageboard.dtsi"
+
+/ {
+	model = "Poslab SavageBoard Quad";
+	compatible = "poslab,imx6q-savageboard", "fsl,imx6q";
+};
+
+&sata {
+	status = "okay";
+};
-- 
2.9.3

^ permalink raw reply related

* [PATCH 2/3] ARM: dts: imx6: Support Savageboard dual
From: Milo Kim @ 2016-12-05  1:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161205010729.7047-1-woogyom.kim@gmail.com>

Common savageboard DT file is used for board support.

Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
---
 arch/arm/boot/dts/imx6dl-savageboard.dts | 50 ++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 arch/arm/boot/dts/imx6dl-savageboard.dts

diff --git a/arch/arm/boot/dts/imx6dl-savageboard.dts b/arch/arm/boot/dts/imx6dl-savageboard.dts
new file mode 100644
index 0000000..2cac30d
--- /dev/null
+++ b/arch/arm/boot/dts/imx6dl-savageboard.dts
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2016 Milo Kim <woogyom.kim@gmail.com>
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License
+ *     version 2 as published by the Free Software Foundation.
+ *
+ *     This file is distributed in the hope that it will be useful
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ * Or, alternatively
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED , WITHOUT WARRANTY OF ANY KIND
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+#include "imx6dl.dtsi"
+#include "imx6qdl-savageboard.dtsi"
+
+/ {
+	model = "Poslab SavageBoard Dual";
+	compatible = "poslab,imx6dl-savageboard", "fsl,imx6dl";
+};
-- 
2.9.3

^ permalink raw reply related

* [PATCH 1/3] ARM: dts: imx6: Add Savageboard common file
From: Milo Kim @ 2016-12-05  1:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161205010729.7047-1-woogyom.kim@gmail.com>

This patch enables common board DT properties for Poslab i.MX6 Savageboard
dual and quad. (https://www.savageboard.org)

* Memory
  memblock for DDR3 1GB

* Regulator
  3.3V for panel and backlight.

* Display
  Enable HDMI and LVDS panel. Savageboard supports AVIC TM097TDH02 panel
  which is compatible with Hannstar HSD100PXN1, so reuse it.

* Clock
  The commit d28be499c45e6 is applied to support LVDS and HDMI output
  simultaneously.

* Pinmux
  eMMC, ethernet, HDMI, I2C, power button, PWM, SD card and UART.

* Others
  Enable ethernet, UART1 debug, USB host, USDHC3 for microSD card and
  USDHC4 for built-in eMMC storage.

Signed-off-by: Milo Kim <woogyom.kim@gmail.com>
---
 arch/arm/boot/dts/imx6qdl-savageboard.dtsi | 271 +++++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100644 arch/arm/boot/dts/imx6qdl-savageboard.dtsi

diff --git a/arch/arm/boot/dts/imx6qdl-savageboard.dtsi b/arch/arm/boot/dts/imx6qdl-savageboard.dtsi
new file mode 100644
index 0000000..09db16c
--- /dev/null
+++ b/arch/arm/boot/dts/imx6qdl-savageboard.dtsi
@@ -0,0 +1,271 @@
+/*
+ * Copyright (C) 2016 Milo Kim <woogyom.kim@gmail.com>
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License
+ *     version 2 as published by the Free Software Foundation.
+ *
+ *     This file is distributed in the hope that it will be useful
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ * Or, alternatively
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED , WITHOUT WARRANTY OF ANY KIND
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <dt-bindings/gpio/gpio.h>
+#include <dt-bindings/input/input.h>
+
+/ {
+	chosen {
+		stdout-path = &uart1;
+	};
+
+	memory at 10000000 {
+		device_type = "memory";
+		reg = <0x10000000 0x40000000>;
+	};
+
+	backlight: panel_bl {
+		compatible = "pwm-backlight";
+		brightness-levels = <0 4 8 16 32 64 128 255>;
+		default-brightness-level = <4>;
+		power-supply = <&reg_3p3v>;
+		pwms = <&pwm1 0 10000>;
+	};
+
+	gpio-keys {
+		compatible = "gpio-keys";
+		pinctrl-names = "default";
+		pinctrl-0 = <&pinctrl_gpio_keys>;
+
+		power {
+			gpios = <&gpio3 7 GPIO_ACTIVE_HIGH>;
+			label = "Power Button";
+			linux,code = <KEY_POWER>;
+			wakeup-source;
+		};
+	};
+
+	panel {
+		compatible = "avic, tm097tdh02", "hannstar,hsd100pxn1";
+		backlight = <&backlight>;
+		power-supply = <&reg_3p3v>;
+
+		port {
+			panel_in: endpoint {
+				remote-endpoint = <&lvds0_out>;
+			};
+		};
+	};
+
+	regulators {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		reg_3p3v: regulator at 0 {
+			compatible = "regulator-fixed";
+			reg = <0>;
+			regulator-name = "3P3V";
+			regulator-min-microvolt = <3300000>;
+			regulator-max-microvolt = <3300000>;
+			regulator-always-on;
+		};
+	};
+};
+
+&clks {
+	assigned-clocks = <&clks IMX6QDL_CLK_LDB_DI0_SEL>,
+			  <&clks IMX6QDL_CLK_LDB_DI1_SEL>;
+	assigned-clock-parents = <&clks IMX6QDL_CLK_PLL3_USB_OTG>,
+				 <&clks IMX6QDL_CLK_PLL3_USB_OTG>;
+};
+
+&fec {
+	phy-mode = "rgmii";
+	phy-reset-gpios = <&gpio1 25 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_enet>;
+	status = "okay";
+};
+
+&hdmi {
+	ddc-i2c-bus = <&i2c2>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_hdmi_tx_cec>;
+	status = "okay";
+};
+
+&i2c2 {
+	clock-frequency = <100000>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_i2c2>;
+	status = "okay";
+};
+
+&iomuxc {
+	savageboard {
+		pinctrl_emmc: emmcgrp {
+			fsl,pins = <
+				MX6QDL_PAD_SD4_CMD__SD4_CMD		0x17059
+				MX6QDL_PAD_SD4_CLK__SD4_CLK		0x10059
+				MX6QDL_PAD_SD4_DAT0__SD4_DATA0		0x17059
+				MX6QDL_PAD_SD4_DAT1__SD4_DATA1		0x17059
+				MX6QDL_PAD_SD4_DAT2__SD4_DATA2		0x17059
+				MX6QDL_PAD_SD4_DAT3__SD4_DATA3		0x17059
+				MX6QDL_PAD_SD4_DAT4__SD4_DATA4		0x17059
+				MX6QDL_PAD_SD4_DAT5__SD4_DATA5		0x17059
+				MX6QDL_PAD_SD4_DAT6__SD4_DATA6		0x17059
+				MX6QDL_PAD_SD4_DAT7__SD4_DATA7		0x17059
+			>;
+		};
+
+		pinctrl_enet: enetgrp {
+			fsl,pins = <
+				MX6QDL_PAD_ENET_MDIO__ENET_MDIO		0x1b0b0
+				MX6QDL_PAD_ENET_MDC__ENET_MDC		0x1b0b0
+				MX6QDL_PAD_RGMII_TXC__RGMII_TXC		0x1b030
+				MX6QDL_PAD_RGMII_TD0__RGMII_TD0		0x1b030
+				MX6QDL_PAD_RGMII_TD1__RGMII_TD1		0x1b030
+				MX6QDL_PAD_RGMII_TD2__RGMII_TD2		0x1b030
+				MX6QDL_PAD_RGMII_TD3__RGMII_TD3		0x1b030
+				MX6QDL_PAD_RGMII_TX_CTL__RGMII_TX_CTL	0x1b030
+				MX6QDL_PAD_ENET_REF_CLK__ENET_TX_CLK	0x1b0b0
+				MX6QDL_PAD_RGMII_RXC__RGMII_RXC		0x1b030
+				MX6QDL_PAD_RGMII_RD0__RGMII_RD0		0x1b030
+				MX6QDL_PAD_RGMII_RD1__RGMII_RD1		0x1b030
+				MX6QDL_PAD_RGMII_RD2__RGMII_RD2		0x1b030
+				MX6QDL_PAD_RGMII_RD3__RGMII_RD3		0x1b030
+				MX6QDL_PAD_RGMII_RX_CTL__RGMII_RX_CTL	0x1b030
+				/* PHY reset */
+				MX6QDL_PAD_ENET_CRS_DV__GPIO1_IO25	0x1b0b0
+			>;
+		};
+
+		pinctrl_hdmi_tx_cec: hdmitxcecgrp {
+			fsl,pins = <
+				MX6QDL_PAD_KEY_ROW2__HDMI_TX_CEC_LINE	0x1f8b0
+			>;
+		};
+
+		pinctrl_i2c2: i2c2grp {
+			fsl,pins = <
+				MX6QDL_PAD_KEY_COL3__I2C2_SCL		0x4001b8b1
+				MX6QDL_PAD_KEY_ROW3__I2C2_SDA		0x4001b8b1
+			>;
+		};
+
+		pinctrl_gpio_keys: gpiokeysgrp {
+			fsl,pins = <
+				MX6QDL_PAD_EIM_DA7__GPIO3_IO07		0xb0b1
+			>;
+		};
+
+		pinctrl_pwm1: pwm1grp {
+			fsl,pins = <
+				MX6QDL_PAD_SD1_DAT3__PWM1_OUT		0x1b0b1
+			>;
+		};
+
+		pinctrl_sd: sdgrp {
+			fsl,pins = <
+				MX6QDL_PAD_SD3_CMD__SD3_CMD		0x17059
+				MX6QDL_PAD_SD3_CLK__SD3_CLK		0x10059
+				MX6QDL_PAD_SD3_DAT0__SD3_DATA0		0x17059
+				MX6QDL_PAD_SD3_DAT1__SD3_DATA1		0x17059
+				MX6QDL_PAD_SD3_DAT2__SD3_DATA2		0x17059
+				MX6QDL_PAD_SD3_DAT3__SD3_DATA3		0x17059
+				/* CD pin */
+				MX6QDL_PAD_NANDF_D0__GPIO2_IO00		0x1b0b1
+			>;
+		};
+
+		pinctrl_uart1: uart1grp {
+			fsl,pins = <
+				MX6QDL_PAD_CSI0_DAT10__UART1_TX_DATA	0x1b0b1
+				MX6QDL_PAD_CSI0_DAT11__UART1_RX_DATA	0x1b0b1
+			>;
+		};
+	};
+};
+
+&ldb {
+	status = "okay";
+
+	lvds-channel at 0 {
+		reg = <0>;
+		status = "okay";
+
+		port at 4 {
+			reg = <4>;
+
+			lvds0_out: endpoint {
+				remote-endpoint = <&panel_in>;
+			};
+		};
+	};
+};
+
+&pwm1 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_pwm1>;
+	status = "okay";
+};
+
+&uart1 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_uart1>;
+	status = "okay";
+};
+
+&usbh1 {
+	status = "okay";
+};
+
+/* SD card */
+&usdhc3 {
+	bus-width = <4>;
+	cd-gpios = <&gpio2 0 GPIO_ACTIVE_LOW>;
+	no-1-8-v;
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_sd>;
+	status = "okay";
+};
+
+/* eMMC */
+&usdhc4 {
+	bus-width = <8>;
+	keep-power-in-suspend;
+	no-1-8-v;
+	non-removable;
+	pinctrl-names = "default";
+	pinctrl-0 = <&pinctrl_emmc>;
+	status = "okay";
+};
-- 
2.9.3

^ permalink raw reply related

* [PATCH 0/3] ARM: dts: imx6: Support Poslab Savageboard dual & quad
From: Milo Kim @ 2016-12-05  1:07 UTC (permalink / raw)
  To: linux-arm-kernel

Poslab Savageboard is i.MX6 SoC base, but BSP code from the vendor is 
not mainline u-boot and kernel. Personal reason of using this board is 
testing etnaviv user-space driver, so I re-write device tree files based on
mainline kernel for the first step.

This patchset includes common DT file, dual and quad board files.

Supported components are
  - Display: HDMI and LVDS panel
  - eMMC and SD card
  - Ethernet
  - Pinmux configuration
  - SATA: only for Savageboard quad
  - UART1 for debug console
  - USB host

Missing features are
  - Audio (WM8903)
  - USB OTG
  - PMIC WM8326: default settings are used so no issue to bring-up the system
  - MIPI DSI, CSI

Patches are tested on the Savageboard quad but the dual version should work 
because the only difference between dual and quad is SATA support.

More information in http://www.savageboard.org

Milo Kim (3):
  ARM: dts: imx6: Add Savageboard common file
  ARM: dts: imx6: Support Savageboard dual
  ARM: dts: imx6: Support Savageboard quad

 arch/arm/boot/dts/imx6dl-savageboard.dts   |  50 ++++++
 arch/arm/boot/dts/imx6q-savageboard.dts    |  54 ++++++
 arch/arm/boot/dts/imx6qdl-savageboard.dtsi | 271 +++++++++++++++++++++++++++++
 3 files changed, 375 insertions(+)
 create mode 100644 arch/arm/boot/dts/imx6dl-savageboard.dts
 create mode 100644 arch/arm/boot/dts/imx6q-savageboard.dts
 create mode 100644 arch/arm/boot/dts/imx6qdl-savageboard.dtsi

-- 
2.9.3

^ permalink raw reply

* [PATCH] dt: pwm: bcm2835: fix typo in clocks property name
From: Vladimir Zapolskiy @ 2016-12-05  0:10 UTC (permalink / raw)
  To: linux-arm-kernel

According to the examples of BCM2835 PWM device nodes there is a typo in
'clocks' property name, which is a common property name on clock consumer
side to store a phandle to an input clock.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
---
 Documentation/devicetree/bindings/pwm/pwm-bcm2835.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pwm/pwm-bcm2835.txt b/Documentation/devicetree/bindings/pwm/pwm-bcm2835.txt
index fb6fb31..cf573e8 100644
--- a/Documentation/devicetree/bindings/pwm/pwm-bcm2835.txt
+++ b/Documentation/devicetree/bindings/pwm/pwm-bcm2835.txt
@@ -3,7 +3,7 @@ BCM2835 PWM controller (Raspberry Pi controller)
 Required properties:
 - compatible: should be "brcm,bcm2835-pwm"
 - reg: physical base address and length of the controller's registers
-- clock: This clock defines the base clock frequency of the PWM hardware
+- clocks: This clock defines the base clock frequency of the PWM hardware
   system, the period and the duty_cycle of the PWM signal is a multiple of
   the base period.
 - #pwm-cells: Should be 2. See pwm.txt in this directory for a description of
-- 
2.10.2

^ permalink raw reply related

* [PATCH dt V2] ARM: BCM5301X: Enable UART by default for BCM4708(1), BCM4709(4) & BCM53012
From: Hauke Mehrtens @ 2016-12-04 19:08 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161201174051.4965-1-zajec5@gmail.com>

On 12/01/2016 06:40 PM, Rafa? Mi?ecki wrote:
> From: Rafa? Mi?ecki <rafal@milecki.pl>
> 
> Every device tested so far got UART0 (at 0x18000300) working as serial
> console. It's most likely part of reference design and all vendors use
> it that way.

I think the last 4 boards are some of the reference boards. ;-)

> It seems to be easier to enable it by default and just disable it if we
> ever see a device with different hardware design.
> 
> Signed-off-by: Rafa? Mi?ecki <rafal@milecki.pl>

Acked-by: Hauke Mehrtens <hauke@hauke-m.de>

> ---
> V2: Update bcm94708.dts bcm94709.dts bcm953012er.dts & bcm953012k.dts
> ---
>  arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts  | 4 ----
>  arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts       | 4 ----
>  arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts       | 4 ----
>  arch/arm/boot/dts/bcm4708-netgear-r6250.dts        | 4 ----
>  arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts      | 4 ----
>  arch/arm/boot/dts/bcm4708.dtsi                     | 4 ++++
>  arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts | 4 ----
>  arch/arm/boot/dts/bcm47081.dtsi                    | 4 ++++
>  arch/arm/boot/dts/bcm4709-netgear-r7000.dts        | 4 ----
>  arch/arm/boot/dts/bcm4709-netgear-r8000.dts        | 4 ----
>  arch/arm/boot/dts/bcm4709-tplink-archer-c9-v1.dts  | 4 ----
>  arch/arm/boot/dts/bcm4709.dtsi                     | 1 +
>  arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts      | 4 ----
>  arch/arm/boot/dts/bcm47094-luxul-xwr-3100.dts      | 4 ----
>  arch/arm/boot/dts/bcm47094-netgear-r8500.dts       | 4 ----
>  arch/arm/boot/dts/bcm47094.dtsi                    | 1 +
>  arch/arm/boot/dts/bcm94708.dts                     | 4 ----
>  arch/arm/boot/dts/bcm94709.dts                     | 4 ----
>  arch/arm/boot/dts/bcm953012er.dts                  | 4 ----
>  arch/arm/boot/dts/bcm953012k.dts                   | 1 -
>  20 files changed, 10 insertions(+), 61 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts b/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts
> index 9cb186e..d49afec0 100644
> --- a/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts
> +++ b/arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dts
> @@ -136,10 +136,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &usb2 {
>  	vcc-gpio = <&chipcommon 9 GPIO_ACTIVE_HIGH>;
>  };
> diff --git a/arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts b/arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
> index 35e6ed6..f591b0f 100644
> --- a/arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
> +++ b/arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
> @@ -55,10 +55,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &spi_nor {
>  	status = "okay";
>  };
> diff --git a/arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts b/arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
> index 1c7e53d..50d65d8 100644
> --- a/arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
> +++ b/arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
> @@ -56,10 +56,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &spi_nor {
>  	status = "okay";
>  };
> diff --git a/arch/arm/boot/dts/bcm4708-netgear-r6250.dts b/arch/arm/boot/dts/bcm4708-netgear-r6250.dts
> index 8ce39d5..8519548 100644
> --- a/arch/arm/boot/dts/bcm4708-netgear-r6250.dts
> +++ b/arch/arm/boot/dts/bcm4708-netgear-r6250.dts
> @@ -83,10 +83,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &usb3 {
>  	vcc-gpio = <&chipcommon 0 GPIO_ACTIVE_HIGH>;
>  };
> diff --git a/arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts b/arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
> index 70f4bb9..74cfcd3 100644
> --- a/arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
> +++ b/arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
> @@ -119,10 +119,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &spi_nor {
>  	status = "okay";
>  };
> diff --git a/arch/arm/boot/dts/bcm4708.dtsi b/arch/arm/boot/dts/bcm4708.dtsi
> index eed4dd1..d0eec09 100644
> --- a/arch/arm/boot/dts/bcm4708.dtsi
> +++ b/arch/arm/boot/dts/bcm4708.dtsi
> @@ -34,3 +34,7 @@
>  	};
>  
>  };
> +
> +&uart0 {
> +	status = "okay";
> +};
> diff --git a/arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts b/arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
> index a9c8def..2922536 100644
> --- a/arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
> +++ b/arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
> @@ -122,7 +122,3 @@
>  		};
>  	};
>  };
> -
> -&uart0 {
> -	status = "okay";
> -};
> diff --git a/arch/arm/boot/dts/bcm47081.dtsi b/arch/arm/boot/dts/bcm47081.dtsi
> index f720012..c5f7619 100644
> --- a/arch/arm/boot/dts/bcm47081.dtsi
> +++ b/arch/arm/boot/dts/bcm47081.dtsi
> @@ -24,3 +24,7 @@
>  		};
>  	};
>  };
> +
> +&uart0 {
> +	status = "okay";
> +};
> diff --git a/arch/arm/boot/dts/bcm4709-netgear-r7000.dts b/arch/arm/boot/dts/bcm4709-netgear-r7000.dts
> index fd38d2a..0225d82 100644
> --- a/arch/arm/boot/dts/bcm4709-netgear-r7000.dts
> +++ b/arch/arm/boot/dts/bcm4709-netgear-r7000.dts
> @@ -100,7 +100,3 @@
>  		};
>  	};
>  };
> -
> -&uart0 {
> -	status = "okay";
> -};
> diff --git a/arch/arm/boot/dts/bcm4709-netgear-r8000.dts b/arch/arm/boot/dts/bcm4709-netgear-r8000.dts
> index 92f8a72..56d38a3 100644
> --- a/arch/arm/boot/dts/bcm4709-netgear-r8000.dts
> +++ b/arch/arm/boot/dts/bcm4709-netgear-r8000.dts
> @@ -107,10 +107,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &usb2 {
>  	vcc-gpio = <&chipcommon 0 GPIO_ACTIVE_HIGH>;
>  };
> diff --git a/arch/arm/boot/dts/bcm4709-tplink-archer-c9-v1.dts b/arch/arm/boot/dts/bcm4709-tplink-archer-c9-v1.dts
> index 9a92c24..c67bfaa 100644
> --- a/arch/arm/boot/dts/bcm4709-tplink-archer-c9-v1.dts
> +++ b/arch/arm/boot/dts/bcm4709-tplink-archer-c9-v1.dts
> @@ -97,10 +97,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &usb2 {
>  	vcc-gpio = <&chipcommon 13 GPIO_ACTIVE_HIGH>;
>  };
> diff --git a/arch/arm/boot/dts/bcm4709.dtsi b/arch/arm/boot/dts/bcm4709.dtsi
> index f039765..c645fea 100644
> --- a/arch/arm/boot/dts/bcm4709.dtsi
> +++ b/arch/arm/boot/dts/bcm4709.dtsi
> @@ -8,4 +8,5 @@
>  
>  &uart0 {
>  	clock-frequency = <125000000>;
> +	status = "okay";
>  };
> diff --git a/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts b/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts
> index 661348d..7fb9270 100644
> --- a/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts
> +++ b/arch/arm/boot/dts/bcm47094-dlink-dir-885l.dts
> @@ -105,10 +105,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &usb3 {
>  	vcc-gpio = <&chipcommon 18 GPIO_ACTIVE_HIGH>;
>  };
> diff --git a/arch/arm/boot/dts/bcm47094-luxul-xwr-3100.dts b/arch/arm/boot/dts/bcm47094-luxul-xwr-3100.dts
> index 169b35f..2f4a651 100644
> --- a/arch/arm/boot/dts/bcm47094-luxul-xwr-3100.dts
> +++ b/arch/arm/boot/dts/bcm47094-luxul-xwr-3100.dts
> @@ -98,10 +98,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &usb3 {
>  	vcc-gpio = <&chipcommon 18 GPIO_ACTIVE_HIGH>;
>  };
> diff --git a/arch/arm/boot/dts/bcm47094-netgear-r8500.dts b/arch/arm/boot/dts/bcm47094-netgear-r8500.dts
> index 521b415..7ecd57c 100644
> --- a/arch/arm/boot/dts/bcm47094-netgear-r8500.dts
> +++ b/arch/arm/boot/dts/bcm47094-netgear-r8500.dts
> @@ -97,7 +97,3 @@
>  		};
>  	};
>  };
> -
> -&uart0 {
> -	status = "okay";
> -};
> diff --git a/arch/arm/boot/dts/bcm47094.dtsi b/arch/arm/boot/dts/bcm47094.dtsi
> index 4f09aa0..4840a78 100644
> --- a/arch/arm/boot/dts/bcm47094.dtsi
> +++ b/arch/arm/boot/dts/bcm47094.dtsi
> @@ -14,4 +14,5 @@
>  
>  &uart0 {
>  	clock-frequency = <125000000>;
> +	status = "okay";
>  };
> diff --git a/arch/arm/boot/dts/bcm94708.dts b/arch/arm/boot/dts/bcm94708.dts
> index 251a486..42855a7 100644
> --- a/arch/arm/boot/dts/bcm94708.dts
> +++ b/arch/arm/boot/dts/bcm94708.dts
> @@ -50,7 +50,3 @@
>  		reg = <0x00000000 0x08000000>;
>  	};
>  };
> -
> -&uart0 {
> -	status = "okay";
> -};
> diff --git a/arch/arm/boot/dts/bcm94709.dts b/arch/arm/boot/dts/bcm94709.dts
> index b16cac9..95e8be6 100644
> --- a/arch/arm/boot/dts/bcm94709.dts
> +++ b/arch/arm/boot/dts/bcm94709.dts
> @@ -50,7 +50,3 @@
>  		reg = <0x00000000 0x08000000>;
>  	};
>  };
> -
> -&uart0 {
> -	status = "okay";
> -};
> diff --git a/arch/arm/boot/dts/bcm953012er.dts b/arch/arm/boot/dts/bcm953012er.dts
> index 0a9abec..decd86b 100644
> --- a/arch/arm/boot/dts/bcm953012er.dts
> +++ b/arch/arm/boot/dts/bcm953012er.dts
> @@ -70,10 +70,6 @@
>  	};
>  };
>  
> -&uart0 {
> -	status = "okay";
> -};
> -
>  &spi_nor {
>  	status = "okay";
>  };
> diff --git a/arch/arm/boot/dts/bcm953012k.dts b/arch/arm/boot/dts/bcm953012k.dts
> index 05a985a..bfd9230 100644
> --- a/arch/arm/boot/dts/bcm953012k.dts
> +++ b/arch/arm/boot/dts/bcm953012k.dts
> @@ -54,7 +54,6 @@
>  
>  &uart0 {
>  	clock-frequency = <62499840>;
> -	status = "okay";
>  };
>  
>  &uart1 {
> 

^ permalink raw reply

* [PATCH] dts: sun8i-h3: correct UART3 pin definitions
From: jorik at kippendief.biz @ 2016-12-04 12:29 UTC (permalink / raw)
  To: linux-arm-kernel

From: Jorik Jonker <jorik@kippendief.biz>

In a previous commit, I made a copy/paste error in the pinmux
definitions of UART3: PG{13,14} instead of PA{13,14}. This commit takes
care of that. I have tested this commit on Orange Pi PC and Orange Pi
Plus, and it works for these boards.

Fixes: e3d11d3c45c5 ("dts: sun8i-h3: add pinmux definitions for
UART2-3")

Signed-off-by: Jorik Jonker <jorik@kippendief.biz>
---
 arch/arm/boot/dts/sun8i-h3.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
index 75a8654..f4ba088 100644
--- a/arch/arm/boot/dts/sun8i-h3.dtsi
+++ b/arch/arm/boot/dts/sun8i-h3.dtsi
@@ -410,7 +410,7 @@
 			};
 
 			uart3_pins: uart3 {
-				allwinner,pins = "PG13", "PG14";
+				allwinner,pins = "PA13", "PA14";
 				allwinner,function = "uart3";
 				allwinner,drive = <SUN4I_PINCTRL_10_MA>;
 				allwinner,pull = <SUN4I_PINCTRL_NO_PULL>;
-- 
2.9.3

^ permalink raw reply related

* [PATCH v2 6/6] crypto: arm/crc32 - accelerated support based on x86 SSE implementation
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480852447-25082-1-git-send-email-ard.biesheuvel@linaro.org>

This is a combination of the the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture. Two versions of the above combo are
provided, one for CRC32 and one for CRC32C.

The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig         |   5 +
 arch/arm/crypto/Makefile        |   2 +
 arch/arm/crypto/crc32-ce-core.S | 306 ++++++++++++++++++++
 arch/arm/crypto/crc32-ce-glue.c | 195 +++++++++++++
 4 files changed, 508 insertions(+)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index fce801fa52a1..de7bb20815bf 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -125,4 +125,9 @@ config CRYPTO_CRCT10DIF_ARM_CE
 	depends on KERNEL_MODE_NEON && CRC_T10DIF
 	select CRYPTO_HASH
 
+config CRYPTO_CRC32_ARM_CE
+	tristate "CRC32(C) digest algorithm using CRC and/or PMULL instructions"
+	depends on KERNEL_MODE_NEON && CRC32
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index fc77265014b7..b578a1820ab1 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -14,6 +14,7 @@ ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
+ce-obj-$(CONFIG_CRYPTO_CRC32_ARM_CE) += crc32-arm-ce.o
 
 ifneq ($(ce-obj-y)$(ce-obj-m),)
 ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
@@ -38,6 +39,7 @@ sha2-arm-ce-y	:= sha2-ce-core.o sha2-ce-glue.o
 aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
+crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/crc32-ce-core.S b/arch/arm/crypto/crc32-ce-core.S
new file mode 100644
index 000000000000..70e0c8042880
--- /dev/null
+++ b/arch/arm/crypto/crc32-ce-core.S
@@ -0,0 +1,306 @@
+/*
+ * Accelerated CRC32(C) using ARM CRC, NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/* GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see http://www.gnu.org/licenses
+ *
+ * Please  visit http://www.xyratex.com/contact if you need additional
+ * information or have any questions.
+ *
+ * GPL HEADER END
+ */
+
+/*
+ * Copyright 2012 Xyratex Technology Limited
+ *
+ * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
+ * calculation.
+ * CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
+ * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
+ * at:
+ * http://www.intel.com/products/processor/manuals/
+ * Intel(R) 64 and IA-32 Architectures Software Developer's Manual
+ * Volume 2B: Instruction Set Reference, N-Z
+ *
+ * Authors:   Gregory Prestas <Gregory_Prestas@us.xyratex.com>
+ *	      Alexander Boyko <Alexander_Boyko@xyratex.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.align		6
+	.arch		armv8-a
+	.arch_extension	crc
+	.fpu		crypto-neon-fp-armv8
+
+.Lcrc32_constants:
+	/*
+	 * [x4*128+32 mod P(x) << 32)]'  << 1   = 0x154442bd4
+	 * #define CONSTANT_R1  0x154442bd4LL
+	 *
+	 * [(x4*128-32 mod P(x) << 32)]' << 1   = 0x1c6e41596
+	 * #define CONSTANT_R2  0x1c6e41596LL
+	 */
+	.quad		0x0000000154442bd4
+	.quad		0x00000001c6e41596
+
+	/*
+	 * [(x128+32 mod P(x) << 32)]'   << 1   = 0x1751997d0
+	 * #define CONSTANT_R3  0x1751997d0LL
+	 *
+	 * [(x128-32 mod P(x) << 32)]'   << 1   = 0x0ccaa009e
+	 * #define CONSTANT_R4  0x0ccaa009eLL
+	 */
+	.quad		0x00000001751997d0
+	.quad		0x00000000ccaa009e
+
+	/*
+	 * [(x64 mod P(x) << 32)]'       << 1   = 0x163cd6124
+	 * #define CONSTANT_R5  0x163cd6124LL
+	 */
+	.quad		0x0000000163cd6124
+	.quad		0x00000000FFFFFFFF
+
+	/*
+	 * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
+	 *
+	 * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
+	 *                                                      = 0x1F7011641LL
+	 * #define CONSTANT_RU  0x1F7011641LL
+	 */
+	.quad		0x00000001DB710641
+	.quad		0x00000001F7011641
+
+.Lcrc32c_constants:
+	.quad		0x00000000740eef02
+	.quad		0x000000009e4addf8
+	.quad		0x00000000f20c0dfe
+	.quad		0x000000014cd00bd6
+	.quad		0x00000000dd45aab8
+	.quad		0x00000000FFFFFFFF
+	.quad		0x0000000105ec76f0
+	.quad		0x00000000dea713f1
+
+	dCONSTANTl	.req	d0
+	dCONSTANTh	.req	d1
+	qCONSTANT	.req	q0
+
+	BUF		.req	r0
+	LEN		.req	r1
+	CRC		.req	r2
+
+	qzr		.req	q9
+
+	/**
+	 * Calculate crc32
+	 * BUF - buffer
+	 * LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
+	 * CRC - initial crc32
+	 * return %eax crc32
+	 * uint crc32_pmull_le(unsigned char const *buffer,
+	 *                     size_t len, uint crc32)
+	 */
+ENTRY(crc32_pmull_le)
+	adr		r3, .Lcrc32_constants
+	b		0f
+
+ENTRY(crc32c_pmull_le)
+	adr		r3, .Lcrc32c_constants
+
+0:	bic		LEN, LEN, #15
+	vld1.8		{q1-q2}, [BUF]!
+	vld1.8		{q3-q4}, [BUF]!
+	vmov.i8		qzr, #0
+	vmov.i8		qCONSTANT, #0
+	vmov		dCONSTANTl[0], CRC
+	veor.8		d2, d2, dCONSTANTl
+	sub		LEN, LEN, #0x40
+	cmp		LEN, #0x40
+	blt		less_64
+
+	vld1.64		{qCONSTANT}, [r3]
+
+loop_64:		/* 64 bytes Full cache line folding */
+	sub		LEN, LEN, #0x40
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q6, d5, dCONSTANTh
+	vmull.p64	q7, d7, dCONSTANTh
+	vmull.p64	q8, d9, dCONSTANTh
+
+	vmull.p64	q1, d2, dCONSTANTl
+	vmull.p64	q2, d4, dCONSTANTl
+	vmull.p64	q3, d6, dCONSTANTl
+	vmull.p64	q4, d8, dCONSTANTl
+
+	veor.8		q1, q1, q5
+	vld1.8		{q5}, [BUF]!
+	veor.8		q2, q2, q6
+	vld1.8		{q6}, [BUF]!
+	veor.8		q3, q3, q7
+	vld1.8		{q7}, [BUF]!
+	veor.8		q4, q4, q8
+	vld1.8		{q8}, [BUF]!
+
+	veor.8		q1, q1, q5
+	veor.8		q2, q2, q6
+	veor.8		q3, q3, q7
+	veor.8		q4, q4, q8
+
+	cmp		LEN, #0x40
+	bge		loop_64
+
+less_64:		/* Folding cache line into 128bit */
+	vldr		dCONSTANTl, [r3, #16]
+	vldr		dCONSTANTh, [r3, #24]
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q2
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q3
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q4
+
+	teq		LEN, #0
+	beq		fold_64
+
+loop_16:		/* Folding rest buffer into 128bit */
+	subs		LEN, LEN, #0x10
+
+	vld1.8		{q2}, [BUF]!
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q2
+
+	bne		loop_16
+
+fold_64:
+	/* perform the last 64 bit fold, also adds 32 zeroes
+	 * to the input stream */
+	vmull.p64	q2, d2, dCONSTANTh
+	vext.8		q1, q1, qzr, #8
+	veor.8		q1, q1, q2
+
+	/* final 32-bit fold */
+	vldr		dCONSTANTl, [r3, #32]
+	vldr		d6, [r3, #40]
+	vmov.i8		d7, #0
+
+	vext.8		q2, q1, qzr, #4
+	vand.8		d2, d2, d6
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q2
+
+	/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
+	vldr		dCONSTANTl, [r3, #48]
+	vldr		dCONSTANTh, [r3, #56]
+
+	vand.8		q2, q1, q3
+	vext.8		q2, qzr, q2, #8
+	vmull.p64	q2, d5, dCONSTANTh
+	vand.8		q2, q2, q3
+	vmull.p64	q2, d4, dCONSTANTl
+	veor.8		q1, q1, q2
+	vmov		r0, s5
+
+	bx		lr
+ENDPROC(crc32_pmull_le)
+ENDPROC(crc32c_pmull_le)
+
+	.macro		__crc32, c
+	subs		ip, r2, #8
+	bmi		.Ltail
+
+	tst		r1, #3
+	bne		.Lunaligned
+
+	teq		ip, #0
+.Laligned8\c:
+	ldrd		r2, r3, [r1], #8
+ARM_BE8(rev		r2, r2		)
+ARM_BE8(rev		r3, r3		)
+	crc32\c\()w	r0, r0, r2
+	crc32\c\()w	r0, r0, r3
+	bxeq		lr
+	subs		ip, ip, #8
+	bpl		.Laligned8\c
+
+.Ltail\c:
+	tst		ip, #4
+	beq		2f
+	ldr		r3, [r1], #4
+ARM_BE8(rev		r3, r3		)
+	crc32\c\()w	r0, r0, r3
+
+2:	tst		ip, #2
+	beq		1f
+	ldrh		r3, [r1], #2
+ARM_BE8(rev16		r3, r3		)
+	crc32\c\()h	r0, r0, r3
+
+1:	tst		ip, #1
+	bxeq		lr
+	ldrb		r3, [r1]
+	crc32\c\()b	r0, r0, r3
+	bx		lr
+
+.Lunaligned\c:
+	tst		r1, #1
+	beq		2f
+	ldrb		r3, [r1], #1
+	subs		r2, r2, #1
+	crc32\c\()b	r0, r0, r3
+
+	tst		r1, #2
+	beq		0f
+2:	ldrh		r3, [r1], #2
+	subs		r2, r2, #2
+ARM_BE8(rev16		r3, r3		)
+	crc32\c\()h	r0, r0, r3
+
+0:	subs		ip, r2, #8
+	bpl		.Laligned8\c
+	b		.Ltail\c
+	.endm
+
+	.align		5
+ENTRY(crc32_armv8_le)
+	__crc32
+ENDPROC(crc32_armv8_le)
+
+	.align		5
+ENTRY(crc32c_armv8_le)
+	__crc32		c
+ENDPROC(crc32c_armv8_le)
diff --git a/arch/arm/crypto/crc32-ce-glue.c b/arch/arm/crypto/crc32-ce-glue.c
new file mode 100644
index 000000000000..f78cf1689669
--- /dev/null
+++ b/arch/arm/crypto/crc32-ce-glue.c
@@ -0,0 +1,195 @@
+/*
+ * Accelerated CRC32(C) using ARM CRC, NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crc32.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <asm/unaligned.h>
+
+#define PMULL_MIN_LEN		64L	/* minimum size of buffer
+					 * for crc32_pmull_le_16 */
+#define SCALE_F			16L	/* size of NEON register */
+
+asmlinkage u32 crc32_pmull_le(const u8 buf[], u32 len, u32 init_crc);
+asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], u32 len);
+
+asmlinkage u32 crc32c_pmull_le(const u8 buf[], u32 len, u32 init_crc);
+asmlinkage u32 crc32c_armv8_le(u32 init_crc, const u8 buf[], u32 len);
+
+static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
+{
+	u32 *key = crypto_tfm_ctx(tfm);
+
+	*key = 0;
+	return 0;
+}
+
+static int crc32c_pmull_cra_init(struct crypto_tfm *tfm)
+{
+	u32 *key = crypto_tfm_ctx(tfm);
+
+	*key = ~0;
+	return 0;
+}
+
+static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
+			      unsigned int keylen)
+{
+	u32 *mctx = crypto_shash_ctx(hash);
+
+	if (keylen != sizeof(u32)) {
+		crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+	*mctx = le32_to_cpup((__le32 *)key);
+	return 0;
+}
+
+static int crc32_pmull_init(struct shash_desc *desc)
+{
+	u32 *mctx = crypto_shash_ctx(desc->tfm);
+	u32 *crc = shash_desc_ctx(desc);
+
+	*crc = *mctx;
+	return 0;
+}
+
+static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int length)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	if (length >= PMULL_MIN_LEN && may_use_simd() &&
+	    (elf_hwcap2 & HWCAP2_PMULL)) {
+		unsigned int l = round_down(length, SCALE_F);
+
+		kernel_neon_begin();
+		*crc = crc32_pmull_le(data, l, *crc);
+		kernel_neon_end();
+
+		data += l;
+		length -= l;
+	}
+
+	if (length > 0) {
+		if (elf_hwcap2 & HWCAP2_CRC32)
+			*crc = crc32_armv8_le(*crc, data, length);
+		else
+			*crc = crc32_le(*crc, data, length);
+	}
+
+	return 0;
+}
+
+static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
+			       unsigned int length)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	if (length >= PMULL_MIN_LEN && may_use_simd() &&
+	    (elf_hwcap2 & HWCAP2_PMULL)) {
+		unsigned int l = round_down(length, SCALE_F);
+
+		kernel_neon_begin();
+		*crc = crc32c_pmull_le(data, l, *crc);
+		kernel_neon_end();
+
+		data += l;
+		length -= l;
+	}
+
+	if (length > 0) {
+		if (elf_hwcap2 & HWCAP2_CRC32)
+			*crc = crc32c_armv8_le(*crc, data, length);
+		else
+			*crc = __crc32c_le(*crc, data, length);
+	}
+
+	return 0;
+}
+
+static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	put_unaligned_le32(*crc, out);
+	return 0;
+}
+
+static int crc32c_pmull_final(struct shash_desc *desc, u8 *out)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	put_unaligned_le32(~*crc, out);
+	return 0;
+}
+
+static struct shash_alg crc32_pmull_algs[] = { {
+	.setkey			= crc32_pmull_setkey,
+	.init			= crc32_pmull_init,
+	.update			= crc32_pmull_update,
+	.final			= crc32_pmull_final,
+	.descsize		= sizeof(u32),
+	.digestsize		= sizeof(u32),
+
+	.base.cra_ctxsize	= sizeof(u32),
+	.base.cra_init		= crc32_pmull_cra_init,
+	.base.cra_name		= "crc32",
+	.base.cra_driver_name	= "crc32-arm-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.setkey			= crc32_pmull_setkey,
+	.init			= crc32_pmull_init,
+	.update			= crc32c_pmull_update,
+	.final			= crc32c_pmull_final,
+	.descsize		= sizeof(u32),
+	.digestsize		= sizeof(u32),
+
+	.base.cra_ctxsize	= sizeof(u32),
+	.base.cra_init		= crc32c_pmull_cra_init,
+	.base.cra_name		= "crc32c",
+	.base.cra_driver_name	= "crc32c-arm-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_module	= THIS_MODULE,
+} };
+
+static int __init crc32_pmull_mod_init(void)
+{
+	if (!(elf_hwcap2 & (HWCAP2_PMULL|HWCAP2_CRC32)))
+		return -ENODEV;
+
+	return crypto_register_shashes(crc32_pmull_algs,
+				       ARRAY_SIZE(crc32_pmull_algs));
+}
+
+static void __exit crc32_pmull_mod_exit(void)
+{
+	crypto_unregister_shashes(crc32_pmull_algs,
+				  ARRAY_SIZE(crc32_pmull_algs));
+}
+
+module_init(crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("crc32");
+MODULE_ALIAS_CRYPTO("crc32c");
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 5/6] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480852447-25082-1-git-send-email-ard.biesheuvel@linaro.org>

This is a combination of the the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture. Two versions of the above combo are
provided, one for CRC32 and one for CRC32C.

The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig         |   6 +
 arch/arm64/crypto/Makefile        |   3 +
 arch/arm64/crypto/crc32-ce-core.S | 266 ++++++++++++++++++++
 arch/arm64/crypto/crc32-ce-glue.c | 188 ++++++++++++++
 4 files changed, 463 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index d773c0659202..21835deb1ab9 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -28,6 +28,11 @@ config CRYPTO_CRCT10DIF_ARM64_CE
 	depends on KERNEL_MODE_NEON && CRC_T10DIF
 	select CRYPTO_HASH
 
+config CRYPTO_CRC32_ARM64_CE
+	tristate "CRC32 and CRC32C digest algorithms using PMULL instructions"
+	depends on KERNEL_MODE_NEON && CRC32
+	select CRYPTO_HASH
+
 config CRYPTO_AES_ARM64_CE
 	tristate "AES core cipher using ARMv8 Crypto Extensions"
 	depends on ARM64 && KERNEL_MODE_NEON
@@ -58,4 +63,5 @@ config CRYPTO_CRC32_ARM64
 	tristate "CRC32 and CRC32C using optional ARMv8 instructions"
 	depends on ARM64
 	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 36fd3eb4201b..144387805a46 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -20,6 +20,9 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
 crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
 
+obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
+crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
+
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
 CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
 
diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S
new file mode 100644
index 000000000000..18f5a8442276
--- /dev/null
+++ b/arch/arm64/crypto/crc32-ce-core.S
@@ -0,0 +1,266 @@
+/*
+ * Accelerated CRC32(C) using arm64 CRC, NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/* GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see http://www.gnu.org/licenses
+ *
+ * Please  visit http://www.xyratex.com/contact if you need additional
+ * information or have any questions.
+ *
+ * GPL HEADER END
+ */
+
+/*
+ * Copyright 2012 Xyratex Technology Limited
+ *
+ * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
+ * calculation.
+ * CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
+ * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
+ * at:
+ * http://www.intel.com/products/processor/manuals/
+ * Intel(R) 64 and IA-32 Architectures Software Developer's Manual
+ * Volume 2B: Instruction Set Reference, N-Z
+ *
+ * Authors:   Gregory Prestas <Gregory_Prestas@us.xyratex.com>
+ *	      Alexander Boyko <Alexander_Boyko@xyratex.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.align		6
+	.cpu		generic+crypto+crc
+
+.Lcrc32_constants:
+	/*
+	 * [x4*128+32 mod P(x) << 32)]'  << 1   = 0x154442bd4
+	 * #define CONSTANT_R1  0x154442bd4LL
+	 *
+	 * [(x4*128-32 mod P(x) << 32)]' << 1   = 0x1c6e41596
+	 * #define CONSTANT_R2  0x1c6e41596LL
+	 */
+	.octa		0x00000001c6e415960000000154442bd4
+
+	/*
+	 * [(x128+32 mod P(x) << 32)]'   << 1   = 0x1751997d0
+	 * #define CONSTANT_R3  0x1751997d0LL
+	 *
+	 * [(x128-32 mod P(x) << 32)]'   << 1   = 0x0ccaa009e
+	 * #define CONSTANT_R4  0x0ccaa009eLL
+	 */
+	.octa		0x00000000ccaa009e00000001751997d0
+
+	/*
+	 * [(x64 mod P(x) << 32)]'       << 1   = 0x163cd6124
+	 * #define CONSTANT_R5  0x163cd6124LL
+	 */
+	.quad		0x0000000163cd6124
+	.quad		0x00000000FFFFFFFF
+
+	/*
+	 * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
+	 *
+	 * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
+	 *                                                      = 0x1F7011641LL
+	 * #define CONSTANT_RU  0x1F7011641LL
+	 */
+	.octa		0x00000001F701164100000001DB710641
+
+.Lcrc32c_constants:
+	.octa		0x000000009e4addf800000000740eef02
+	.octa		0x000000014cd00bd600000000f20c0dfe
+	.quad		0x00000000dd45aab8
+	.quad		0x00000000FFFFFFFF
+	.octa		0x00000000dea713f10000000105ec76f0
+
+	vCONSTANT	.req	v0
+	dCONSTANT	.req	d0
+	qCONSTANT	.req	q0
+
+	BUF		.req	x0
+	LEN		.req	x1
+	CRC		.req	x2
+
+	vzr		.req	v9
+
+	/**
+	 * Calculate crc32
+	 * BUF - buffer
+	 * LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
+	 * CRC - initial crc32
+	 * return %eax crc32
+	 * uint crc32_pmull_le(unsigned char const *buffer,
+	 *                     size_t len, uint crc32)
+	 */
+ENTRY(crc32_pmull_le)
+	adr		x3, .Lcrc32_constants
+	b		0f
+
+ENTRY(crc32c_pmull_le)
+	adr		x3, .Lcrc32c_constants
+
+0:	bic		LEN, LEN, #15
+	ld1		{v1.16b-v4.16b}, [BUF], #0x40
+	movi		vzr.16b, #0
+	fmov		dCONSTANT, CRC
+	eor		v1.16b, v1.16b, vCONSTANT.16b
+	sub		LEN, LEN, #0x40
+	cmp		LEN, #0x40
+	b.lt		less_64
+
+	ldr		qCONSTANT, [x3]
+
+loop_64:		/* 64 bytes Full cache line folding */
+	sub		LEN, LEN, #0x40
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull2		v6.1q, v2.2d, vCONSTANT.2d
+	pmull2		v7.1q, v3.2d, vCONSTANT.2d
+	pmull2		v8.1q, v4.2d, vCONSTANT.2d
+
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	pmull		v2.1q, v2.1d, vCONSTANT.1d
+	pmull		v3.1q, v3.1d, vCONSTANT.1d
+	pmull		v4.1q, v4.1d, vCONSTANT.1d
+
+	eor		v1.16b, v1.16b, v5.16b
+	ld1		{v5.16b}, [BUF], #0x10
+	eor		v2.16b, v2.16b, v6.16b
+	ld1		{v6.16b}, [BUF], #0x10
+	eor		v3.16b, v3.16b, v7.16b
+	ld1		{v7.16b}, [BUF], #0x10
+	eor		v4.16b, v4.16b, v8.16b
+	ld1		{v8.16b}, [BUF], #0x10
+
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v2.16b, v2.16b, v6.16b
+	eor		v3.16b, v3.16b, v7.16b
+	eor		v4.16b, v4.16b, v8.16b
+
+	cmp		LEN, #0x40
+	b.ge		loop_64
+
+less_64:		/* Folding cache line into 128bit */
+	ldr		qCONSTANT, [x3, #16]
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v2.16b
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v3.16b
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v4.16b
+
+	cbz		LEN, fold_64
+
+loop_16:		/* Folding rest buffer into 128bit */
+	subs		LEN, LEN, #0x10
+
+	ld1		{v2.16b}, [BUF], #0x10
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v2.16b
+
+	b.ne		loop_16
+
+fold_64:
+	/* perform the last 64 bit fold, also adds 32 zeroes
+	 * to the input stream */
+	ext		v2.16b, v1.16b, v1.16b, #8
+	pmull2		v2.1q, v2.2d, vCONSTANT.2d
+	ext		v1.16b, v1.16b, vzr.16b, #8
+	eor		v1.16b, v1.16b, v2.16b
+
+	/* final 32-bit fold */
+	ldr		dCONSTANT, [x3, #32]
+	ldr		d3, [x3, #40]
+
+	ext		v2.16b, v1.16b, vzr.16b, #4
+	and		v1.16b, v1.16b, v3.16b
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v2.16b
+
+	/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
+	ldr		qCONSTANT, [x3, #48]
+
+	and		v2.16b, v1.16b, v3.16b
+	ext		v2.16b, vzr.16b, v2.16b, #8
+	pmull2		v2.1q, v2.2d, vCONSTANT.2d
+	and		v2.16b, v2.16b, v3.16b
+	pmull		v2.1q, v2.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v2.16b
+	mov		w0, v1.s[1]
+
+	ret
+ENDPROC(crc32_pmull_le)
+ENDPROC(crc32c_pmull_le)
+
+	.macro		__crc32, c
+0:	subs		x2, x2, #16
+	b.mi		8f
+	ldp		x3, x4, [x1], #16
+CPU_BE(	rev		x3, x3		)
+CPU_BE(	rev		x4, x4		)
+	crc32\c\()x	w0, w0, x3
+	crc32\c\()x	w0, w0, x4
+	b.ne		0b
+	ret
+
+8:	tbz		x2, #3, 4f
+	ldr		x3, [x1], #8
+CPU_BE(	rev		x3, x3		)
+	crc32\c\()x	w0, w0, x3
+4:	tbz		x2, #2, 2f
+	ldr		w3, [x1], #4
+CPU_BE(	rev		w3, w3		)
+	crc32\c\()w	w0, w0, w3
+2:	tbz		x2, #1, 1f
+	ldrh		w3, [x1], #2
+CPU_BE(	rev16		w3, w3		)
+	crc32\c\()h	w0, w0, w3
+1:	tbz		x2, #0, 0f
+	ldrb		w3, [x1]
+	crc32\c\()b	w0, w0, w3
+0:	ret
+	.endm
+
+	.align		5
+ENTRY(crc32_armv8_le)
+	__crc32
+ENDPROC(crc32_armv8_le)
+
+	.align		5
+ENTRY(crc32c_armv8_le)
+	__crc32		c
+ENDPROC(crc32c_armv8_le)
diff --git a/arch/arm64/crypto/crc32-ce-glue.c b/arch/arm64/crypto/crc32-ce-glue.c
new file mode 100644
index 000000000000..c3c7e5848e2a
--- /dev/null
+++ b/arch/arm64/crypto/crc32-ce-glue.c
@@ -0,0 +1,188 @@
+/*
+ * Accelerated CRC32(C) using arm64 NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpufeature.h>
+#include <linux/crc32.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+
+#define PMULL_MIN_LEN		64L	/* minimum size of buffer
+					 * for crc32_pmull_le_16 */
+#define SCALE_F			16L	/* size of NEON register */
+
+asmlinkage u32 crc32_pmull_le(const u8 buf[], u64 len, u32 init_crc);
+asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], u64 len);
+
+asmlinkage u32 crc32c_pmull_le(const u8 buf[], u64 len, u32 init_crc);
+asmlinkage u32 crc32c_armv8_le(u32 init_crc, const u8 buf[], u64 len);
+
+static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
+{
+	u32 *key = crypto_tfm_ctx(tfm);
+
+	*key = 0;
+	return 0;
+}
+
+static int crc32c_pmull_cra_init(struct crypto_tfm *tfm)
+{
+	u32 *key = crypto_tfm_ctx(tfm);
+
+	*key = ~0;
+	return 0;
+}
+
+static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
+			      unsigned int keylen)
+{
+	u32 *mctx = crypto_shash_ctx(hash);
+
+	if (keylen != sizeof(u32)) {
+		crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+	*mctx = le32_to_cpup((__le32 *)key);
+	return 0;
+}
+
+static int crc32_pmull_init(struct shash_desc *desc)
+{
+	u32 *mctx = crypto_shash_ctx(desc->tfm);
+	u32 *crc = shash_desc_ctx(desc);
+
+	*crc = *mctx;
+	return 0;
+}
+
+static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	if (length >= PMULL_MIN_LEN) {
+		unsigned int l = round_down(length, SCALE_F);
+
+		kernel_neon_begin_partial(10);
+		*crc = crc32_pmull_le(data, l, *crc);
+		kernel_neon_end();
+
+		data += l;
+		length -= l;
+	}
+
+	if (length > 0) {
+		if (elf_hwcap & HWCAP_CRC32)
+			*crc = crc32_armv8_le(*crc, data, length);
+		else
+			*crc = crc32_le(*crc, data, length);
+	}
+
+	return 0;
+}
+
+static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	if (length >= PMULL_MIN_LEN) {
+		unsigned int l = round_down(length, SCALE_F);
+
+		kernel_neon_begin_partial(10);
+		*crc = crc32c_pmull_le(data, l, *crc);
+		kernel_neon_end();
+
+		data += l;
+		length -= l;
+	}
+
+	if (length > 0) {
+		if (elf_hwcap & HWCAP_CRC32)
+			*crc = crc32c_armv8_le(*crc, data, length);
+		else
+			*crc = __crc32c_le(*crc, data, length);
+	}
+
+	return 0;
+}
+
+static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	put_unaligned_le32(*crc, out);
+	return 0;
+}
+
+static int crc32c_pmull_final(struct shash_desc *desc, u8 *out)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	put_unaligned_le32(~*crc, out);
+	return 0;
+}
+
+static struct shash_alg crc32_pmull_algs[] = { {
+	.setkey			= crc32_pmull_setkey,
+	.init			= crc32_pmull_init,
+	.update			= crc32_pmull_update,
+	.final			= crc32_pmull_final,
+	.descsize		= sizeof(u32),
+	.digestsize		= sizeof(u32),
+
+	.base.cra_ctxsize	= sizeof(u32),
+	.base.cra_init		= crc32_pmull_cra_init,
+	.base.cra_name		= "crc32",
+	.base.cra_driver_name	= "crc32-arm64-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.setkey			= crc32_pmull_setkey,
+	.init			= crc32_pmull_init,
+	.update			= crc32c_pmull_update,
+	.final			= crc32c_pmull_final,
+	.descsize		= sizeof(u32),
+	.digestsize		= sizeof(u32),
+
+	.base.cra_ctxsize	= sizeof(u32),
+	.base.cra_init		= crc32c_pmull_cra_init,
+	.base.cra_name		= "crc32c",
+	.base.cra_driver_name	= "crc32c-arm64-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_module	= THIS_MODULE,
+} };
+
+static int __init crc32_pmull_mod_init(void)
+{
+	return crypto_register_shashes(crc32_pmull_algs,
+				       ARRAY_SIZE(crc32_pmull_algs));
+}
+
+static void __exit crc32_pmull_mod_exit(void)
+{
+	crypto_unregister_shashes(crc32_pmull_algs,
+				  ARRAY_SIZE(crc32_pmull_algs));
+}
+
+module_cpu_feature_match(PMULL, crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 4/6] crypto: arm/crct10dif - port x86 SSE implementation to ARM
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480852447-25082-1-git-send-email-ard.biesheuvel@linaro.org>

This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on multiples of 16 bytes. The residual data is handled
by the generic C implementation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig             |   5 +
 arch/arm/crypto/Makefile            |   2 +
 arch/arm/crypto/crct10dif-ce-core.S | 349 ++++++++++++++++++++
 arch/arm/crypto/crct10dif-ce-glue.c |  95 ++++++
 4 files changed, 451 insertions(+)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 27ed1b1cd1d7..fce801fa52a1 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -120,4 +120,9 @@ config CRYPTO_GHASH_ARM_CE
 	  that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
 	  that is part of the ARMv8 Crypto Extensions
 
+config CRYPTO_CRCT10DIF_ARM_CE
+	tristate "CRCT10DIF digest algorithm using PMULL instructions"
+	depends on KERNEL_MODE_NEON && CRC_T10DIF
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index fc5150702b64..fc77265014b7 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -13,6 +13,7 @@ ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
+ce-obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
 
 ifneq ($(ce-obj-y)$(ce-obj-m),)
 ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
@@ -36,6 +37,7 @@ sha1-arm-ce-y	:= sha1-ce-core.o sha1-ce-glue.o
 sha2-arm-ce-y	:= sha2-ce-core.o sha2-ce-glue.o
 aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
+crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/crct10dif-ce-core.S b/arch/arm/crypto/crct10dif-ce-core.S
new file mode 100644
index 000000000000..ae2adb54e905
--- /dev/null
+++ b/arch/arm/crypto/crct10dif-ce-core.S
@@ -0,0 +1,349 @@
+//
+// Accelerated CRC-T10DIF using ARM NEON and Crypto Extensions instructions
+//
+// Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License version 2 as
+// published by the Free Software Foundation.
+//
+
+//
+// Implement fast CRC-T10DIF computation with SSE and PCLMULQDQ instructions
+//
+// Copyright (c) 2013, Intel Corporation
+//
+// Authors:
+//     Erdinc Ozturk <erdinc.ozturk@intel.com>
+//     Vinodh Gopal <vinodh.gopal@intel.com>
+//     James Guilford <james.guilford@intel.com>
+//     Tim Chen <tim.c.chen@linux.intel.com>
+//
+// This software is available to you under a choice of one of two
+// licenses.  You may choose to be licensed under the terms of the GNU
+// General Public License (GPL) Version 2, available from the file
+// COPYING in the main directory of this source tree, or the
+// OpenIB.org BSD license below:
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// * Redistributions of source code must retain the above copyright
+//   notice, this list of conditions and the following disclaimer.
+//
+// * Redistributions in binary form must reproduce the above copyright
+//   notice, this list of conditions and the following disclaimer in the
+//   documentation and/or other materials provided with the
+//   distribution.
+//
+// * Neither the name of the Intel Corporation nor the names of its
+//   contributors may be used to endorse or promote products derived from
+//   this software without specific prior written permission.
+//
+//
+// THIS SOFTWARE IS PROVIDED BY INTEL CORPORATION ""AS IS"" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL CORPORATION OR
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+//       Function API:
+//       UINT16 crc_t10dif_pcl(
+//               UINT16 init_crc, //initial CRC value, 16 bits
+//               const unsigned char *buf, //buffer pointer to calculate CRC on
+//               UINT64 len //buffer length in bytes (64-bit data)
+//       );
+//
+//       Reference paper titled "Fast CRC Computation for Generic
+//	Polynomials Using PCLMULQDQ Instruction"
+//       URL: http://www.intel.com/content/dam/www/public/us/en/documents
+//  /white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
+//
+//
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define CPU_LE(code...)
+#else
+#define CPU_LE(code...)		code
+#endif
+
+	.text
+	.fpu		crypto-neon-fp-armv8
+
+	arg1_low32	.req	r0
+	arg2		.req	r1
+	arg3		.req	r2
+
+	qzr		.req	q13
+
+	q0l		.req	d0
+	q0h		.req	d1
+	q1l		.req	d2
+	q1h		.req	d3
+	q2l		.req	d4
+	q2h		.req	d5
+	q3l		.req	d6
+	q3h		.req	d7
+	q4l		.req	d8
+	q4h		.req	d9
+	q5l		.req	d10
+	q5h		.req	d11
+	q6l		.req	d12
+	q6h		.req	d13
+	q7l		.req	d14
+	q7h		.req	d15
+
+ENTRY(crc_t10dif_pmull)
+	push		{r4, lr}
+
+	vmov.i8		qzr, #0			// init zero register
+
+	// adjust the 16-bit initial_crc value, scale it to 32 bits
+	lsl		arg1_low32, arg1_low32, #16
+
+	// check if smaller than 256
+	cmp		arg3, #256
+
+	// for sizes less than 128, we can't fold 64B at a time...
+	blt		_less_than_128
+
+	// load the initial crc value
+	// crc value does not need to be byte-reflected, but it needs
+	// to be moved to the high part of the register.
+	// because data will be byte-reflected and will align with
+	// initial crc at correct place.
+	vmov		s0, arg1_low32		// initial crc
+	vext.8		q10, qzr, q0, #4
+
+	// receive the initial 64B data, xor the initial crc value
+	vld1.64		{q0-q1}, [arg2]!
+	vld1.64		{q2-q3}, [arg2]!
+	vld1.64		{q4-q5}, [arg2]!
+	vld1.64		{q6-q7}, [arg2]!
+CPU_LE(	vrev64.8	q0, q0			)
+CPU_LE(	vrev64.8	q1, q1			)
+CPU_LE(	vrev64.8	q2, q2			)
+CPU_LE(	vrev64.8	q3, q3			)
+CPU_LE(	vrev64.8	q4, q4			)
+CPU_LE(	vrev64.8	q5, q5			)
+CPU_LE(	vrev64.8	q6, q6			)
+CPU_LE(	vrev64.8	q7, q7			)
+
+	vswp		d0, d1
+	vswp		d2, d3
+	vswp		d4, d5
+	vswp		d6, d7
+	vswp		d8, d9
+	vswp		d10, d11
+	vswp		d12, d13
+	vswp		d14, d15
+
+	// XOR the initial_crc value
+	veor.8		q0, q0, q10
+
+	adr		ip, rk3
+	vld1.64		{q10}, [ip]	// xmm10 has rk3 and rk4
+					// type of pmull instruction
+					// will determine which constant to use
+
+	//
+	// we subtract 256 instead of 128 to save one instruction from the loop
+	//
+	sub		arg3, arg3, #256
+
+	// at this section of the code, there is 64*x+y (0<=y<64) bytes of
+	// buffer. The _fold_64_B_loop will fold 64B at a time
+	// until we have 64+y Bytes of buffer
+
+
+	// fold 64B at a time. This section of the code folds 4 vector
+	// registers in parallel
+_fold_64_B_loop:
+
+	.macro		fold64, reg1, reg2
+	vld1.64		{q11-q12}, [arg2]!
+
+	vmull.p64	q8, \reg1\()h, d21
+	vmull.p64	\reg1, \reg1\()l, d20
+	vmull.p64	q9, \reg2\()h, d21
+	vmull.p64	\reg2, \reg2\()l, d20
+
+CPU_LE(	vrev64.8	q11, q11		)
+CPU_LE(	vrev64.8	q12, q12		)
+	vswp		d22, d23
+	vswp		d24, d25
+
+	veor.8		\reg1, \reg1, q8
+	veor.8		\reg2, \reg2, q9
+	veor.8		\reg1, \reg1, q11
+	veor.8		\reg2, \reg2, q12
+	.endm
+
+	fold64		q0, q1
+	fold64		q2, q3
+	fold64		q4, q5
+	fold64		q6, q7
+
+	subs		arg3, arg3, #128
+
+	// check if there is another 64B in the buffer to be able to fold
+	bge		_fold_64_B_loop
+
+	// at this point, the buffer pointer is pointing at the last y Bytes
+	// of the buffer the 64B of folded data is in 4 of the vector
+	// registers: v0, v1, v2, v3
+
+	// fold the 8 vector registers to 1 vector register with different
+	// constants
+
+	adr		ip, rk9
+	vld1.64		{q10}, [ip]!
+
+	.macro		fold16, reg, rk
+	vmull.p64	q8, \reg\()l, d20
+	vmull.p64	\reg, \reg\()h, d21
+	.ifnb		\rk
+	vld1.64		{q10}, [ip]!
+	.endif
+	veor.8		q7, q7, q8
+	veor.8		q7, q7, \reg
+	.endm
+
+	fold16		q0, rk11
+	fold16		q1, rk13
+	fold16		q2, rk15
+	fold16		q3, rk17
+	fold16		q4, rk19
+	fold16		q5, rk1
+	fold16		q6
+
+	// instead of 64, we add 48 to the loop counter to save 1 instruction
+	// from the loop instead of a cmp instruction, we use the negative
+	// flag with the jl instruction
+	adds		arg3, arg3, #(128-16)
+	blt		_final_reduction_for_128
+
+	// now we have 16+y bytes left to reduce. 16 Bytes is in register v7
+	// and the rest is in memory. We can fold 16 bytes@a time if y>=16
+	// continue folding 16B at a time
+
+_16B_reduction_loop:
+	vmull.p64	q8, d14, d20
+	vmull.p64	q7, d15, d21
+	veor.8		q7, q7, q8
+
+	vld1.64		{q0}, [arg2]!
+CPU_LE(	vrev64.8	q0, q0		)
+	vswp		d0, d1
+	veor.8		q7, q7, q0
+	subs		arg3, arg3, #16
+
+	// instead of a cmp instruction, we utilize the flags with the
+	// jge instruction equivalent of: cmp arg3, 16-16
+	// check if there is any more 16B in the buffer to be able to fold
+	bge		_16B_reduction_loop
+
+_final_reduction_for_128:
+	// compute crc of a 128-bit value
+	vldr		d20, rk5
+	vldr		d21, rk6		// rk5 and rk6 in xmm10
+
+	// 64b fold
+	vext.8		q0, qzr, q7, #8
+	vmull.p64	q7, d15, d20
+	veor.8		q7, q7, q0
+
+	// 32b fold
+	vext.8		q0, q7, qzr, #12
+	vmov		s31, s3
+	vmull.p64	q0, d0, d21
+	veor.8		q7, q0, q7
+
+	// barrett reduction
+_barrett:
+	vldr		d20, rk7
+	vldr		d21, rk8
+
+	vmull.p64	q0, d15, d20
+	vext.8		q0, qzr, q0, #12
+	vmull.p64	q0, d1, d21
+	vext.8		q0, qzr, q0, #12
+	veor.8		q7, q7, q0
+	vmov		r0, s29
+
+_cleanup:
+	// scale the result back to 16 bits
+	lsr		r0, r0, #16
+	pop		{r4, pc}
+
+_less_than_128:
+	teq		arg3, #0
+	beq		_cleanup
+
+	vmov.i8		q0, #0
+	vmov		s3, arg1_low32		// get the initial crc value
+
+	vld1.64		{q7}, [arg2]!
+CPU_LE(	vrev64.8	q7, q7		)
+	vswp		d14, d15
+	veor.8		q7, q7, q0
+
+	// check if there is enough buffer to be able to fold 16B at a time
+	cmp		arg3, #32
+	blt		_final_reduction_for_128
+
+	// now if there is, load the constants
+	vldr		d20, rk1
+	vldr		d21, rk2		// rk1 and rk2 in xmm10
+
+	// update the counter. subtract 32 instead of 16 to save one
+	// instruction from the loop
+	sub		arg3, arg3, #32
+
+	b		_16B_reduction_loop
+ENDPROC(crc_t10dif_pmull)
+
+// precomputed constants
+// these constants are precomputed from the poly:
+// 0x8bb70000 (0x8bb7 scaled to 32 bits)
+	.align		4
+// Q = 0x18BB70000
+// rk1 = 2^(32*3) mod Q << 32
+// rk2 = 2^(32*5) mod Q << 32
+// rk3 = 2^(32*15) mod Q << 32
+// rk4 = 2^(32*17) mod Q << 32
+// rk5 = 2^(32*3) mod Q << 32
+// rk6 = 2^(32*2) mod Q << 32
+// rk7 = floor(2^64/Q)
+// rk8 = Q
+
+rk3:	.quad		0x9d9d000000000000
+rk4:	.quad		0x7cf5000000000000
+rk5:	.quad		0x2d56000000000000
+rk6:	.quad		0x1368000000000000
+rk7:	.quad		0x00000001f65a57f8
+rk8:	.quad		0x000000018bb70000
+rk9:	.quad		0xceae000000000000
+rk10:	.quad		0xbfd6000000000000
+rk11:	.quad		0x1e16000000000000
+rk12:	.quad		0x713c000000000000
+rk13:	.quad		0xf7f9000000000000
+rk14:	.quad		0x80a6000000000000
+rk15:	.quad		0x044c000000000000
+rk16:	.quad		0xe658000000000000
+rk17:	.quad		0xad18000000000000
+rk18:	.quad		0xa497000000000000
+rk19:	.quad		0x6ee3000000000000
+rk20:	.quad		0xe7b5000000000000
+rk1:	.quad		0x2d56000000000000
+rk2:	.quad		0x06df000000000000
diff --git a/arch/arm/crypto/crct10dif-ce-glue.c b/arch/arm/crypto/crct10dif-ce-glue.c
new file mode 100644
index 000000000000..8225422c34a7
--- /dev/null
+++ b/arch/arm/crypto/crct10dif-ce-glue.c
@@ -0,0 +1,95 @@
+/*
+ * Accelerated CRC-T10DIF using ARM NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crc-t10dif.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/neon.h>
+#include <asm/simd.h>
+
+#define CRC_T10DIF_PMULL_CHUNK_SIZE	16U
+
+asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u64 len);
+
+static int crct10dif_init(struct shash_desc *desc)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	*crc = 0;
+	return 0;
+}
+
+static int crct10dif_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	if (may_use_simd() && length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
+		unsigned int l = length & ~(CRC_T10DIF_PMULL_CHUNK_SIZE - 1);
+
+		kernel_neon_begin();
+		*crc = crc_t10dif_pmull(*crc, data, l);
+		kernel_neon_end();
+
+		length -= l;
+		data += l;
+	}
+	if (length > 0)
+		*crc = crc_t10dif_generic(*crc, data, length);
+
+	return 0;
+}
+
+static int crct10dif_final(struct shash_desc *desc, u8 *out)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	*(u16 *)out = *crc;
+	return 0;
+}
+
+static struct shash_alg crc_t10dif_alg = {
+	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
+	.init			= crct10dif_init,
+	.update			= crct10dif_update,
+	.final			= crct10dif_final,
+	.descsize		= CRC_T10DIF_DIGEST_SIZE,
+
+	.base.cra_name		= "crct10dif",
+	.base.cra_driver_name	= "crct10dif-arm-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init crc_t10dif_mod_init(void)
+{
+	if (!(elf_hwcap2 & HWCAP2_PMULL))
+		return -ENODEV;
+
+	return crypto_register_shash(&crc_t10dif_alg);
+}
+
+static void __exit crc_t10dif_mod_exit(void)
+{
+	crypto_unregister_shash(&crc_t10dif_alg);
+}
+
+module_init(crc_t10dif_mod_init);
+module_exit(crc_t10dif_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("crct10dif");
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 3/6] crypto: arm64/crct10dif - port x86 SSE implementation to arm64
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480852447-25082-1-git-send-email-ard.biesheuvel@linaro.org>

This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on multiples of 16 bytes. The residual data is handled
by the generic C implementation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig             |   5 +
 arch/arm64/crypto/Makefile            |   3 +
 arch/arm64/crypto/crct10dif-ce-core.S | 317 ++++++++++++++++++++
 arch/arm64/crypto/crct10dif-ce-glue.c |  91 ++++++
 4 files changed, 416 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..d773c0659202 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_CRCT10DIF_ARM64_CE
+	tristate "CRCT10DIF digest algorithm using PMULL instructions"
+	depends on KERNEL_MODE_NEON && CRC_T10DIF
+	select CRYPTO_HASH
+
 config CRYPTO_AES_ARM64_CE
 	tristate "AES core cipher using ARMv8 Crypto Extensions"
 	depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index abb79b3cfcfe..36fd3eb4201b 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -17,6 +17,9 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
 ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
+obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
+crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
+
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
 CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
 
diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
new file mode 100644
index 000000000000..641685effebd
--- /dev/null
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -0,0 +1,317 @@
+//
+// Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
+//
+// Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License version 2 as
+// published by the Free Software Foundation.
+//
+
+//
+// Implement fast CRC-T10DIF computation with SSE and PCLMULQDQ instructions
+//
+// Copyright (c) 2013, Intel Corporation
+//
+// Authors:
+//     Erdinc Ozturk <erdinc.ozturk@intel.com>
+//     Vinodh Gopal <vinodh.gopal@intel.com>
+//     James Guilford <james.guilford@intel.com>
+//     Tim Chen <tim.c.chen@linux.intel.com>
+//
+// This software is available to you under a choice of one of two
+// licenses.  You may choose to be licensed under the terms of the GNU
+// General Public License (GPL) Version 2, available from the file
+// COPYING in the main directory of this source tree, or the
+// OpenIB.org BSD license below:
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// * Redistributions of source code must retain the above copyright
+//   notice, this list of conditions and the following disclaimer.
+//
+// * Redistributions in binary form must reproduce the above copyright
+//   notice, this list of conditions and the following disclaimer in the
+//   documentation and/or other materials provided with the
+//   distribution.
+//
+// * Neither the name of the Intel Corporation nor the names of its
+//   contributors may be used to endorse or promote products derived from
+//   this software without specific prior written permission.
+//
+//
+// THIS SOFTWARE IS PROVIDED BY INTEL CORPORATION ""AS IS"" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL CORPORATION OR
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+//       Function API:
+//       UINT16 crc_t10dif_pcl(
+//               UINT16 init_crc, //initial CRC value, 16 bits
+//               const unsigned char *buf, //buffer pointer to calculate CRC on
+//               UINT64 len //buffer length in bytes (64-bit data)
+//       );
+//
+//       Reference paper titled "Fast CRC Computation for Generic
+//	Polynomials Using PCLMULQDQ Instruction"
+//       URL: http://www.intel.com/content/dam/www/public/us/en/documents
+//  /white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
+//
+//
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.cpu		generic+crypto
+
+	arg1_low32	.req	w0
+	arg2		.req	x1
+	arg3		.req	x2
+
+	vzr		.req	v13
+
+ENTRY(crc_t10dif_pmull)
+	stp		x29, x30, [sp, #-16]!
+	mov		x29, sp
+
+	movi		vzr.16b, #0		// init zero register
+
+	// adjust the 16-bit initial_crc value, scale it to 32 bits
+	lsl		arg1_low32, arg1_low32, #16
+
+	// check if smaller than 256
+	cmp		arg3, #256
+
+	// for sizes less than 128, we can't fold 64B at a time...
+	b.lt		_less_than_128
+
+	// load the initial crc value
+	// crc value does not need to be byte-reflected, but it needs
+	// to be moved to the high part of the register.
+	// because data will be byte-reflected and will align with
+	// initial crc at correct place.
+	movi		v10.16b, #0
+	mov		v10.s[3], arg1_low32		// initial crc
+
+	// receive the initial 64B data, xor the initial crc value
+	ldp		q0, q1, [arg2]
+	ldp		q2, q3, [arg2, #0x20]
+	ldp		q4, q5, [arg2, #0x40]
+	ldp		q6, q7, [arg2, #0x60]
+	add		arg2, arg2, #0x80
+
+CPU_LE(	rev64		v0.16b, v0.16b			)
+CPU_LE(	rev64		v1.16b, v1.16b			)
+CPU_LE(	rev64		v2.16b, v2.16b			)
+CPU_LE(	rev64		v3.16b, v3.16b			)
+CPU_LE(	rev64		v4.16b, v4.16b			)
+CPU_LE(	rev64		v5.16b, v5.16b			)
+CPU_LE(	rev64		v6.16b, v6.16b			)
+CPU_LE(	rev64		v7.16b, v7.16b			)
+
+CPU_LE(	ext		v0.16b, v0.16b, v0.16b, #8	)
+CPU_LE(	ext		v1.16b, v1.16b, v1.16b, #8	)
+CPU_LE(	ext		v2.16b, v2.16b, v2.16b, #8	)
+CPU_LE(	ext		v3.16b, v3.16b, v3.16b, #8	)
+CPU_LE(	ext		v4.16b, v4.16b, v4.16b, #8	)
+CPU_LE(	ext		v5.16b, v5.16b, v5.16b, #8	)
+CPU_LE(	ext		v6.16b, v6.16b, v6.16b, #8	)
+CPU_LE(	ext		v7.16b, v7.16b, v7.16b, #8	)
+
+	// XOR the initial_crc value
+	eor		v0.16b, v0.16b, v10.16b
+
+	ldr		q10, rk3	// xmm10 has rk3 and rk4
+					// type of pmull instruction
+					// will determine which constant to use
+
+	//
+	// we subtract 256 instead of 128 to save one instruction from the loop
+	//
+	sub		arg3, arg3, #256
+
+	// at this section of the code, there is 64*x+y (0<=y<64) bytes of
+	// buffer. The _fold_64_B_loop will fold 64B at a time
+	// until we have 64+y Bytes of buffer
+
+
+	// fold 64B at a time. This section of the code folds 4 vector
+	// registers in parallel
+_fold_64_B_loop:
+
+	.macro		fold64, reg1, reg2
+	ldp		q11, q12, [arg2], #0x20
+
+	pmull2		v8.1q, \reg1\().2d, v10.2d
+	pmull		\reg1\().1q, \reg1\().1d, v10.1d
+
+CPU_LE(	rev64		v11.16b, v11.16b		)
+CPU_LE(	rev64		v12.16b, v12.16b		)
+
+	pmull2		v9.1q, \reg2\().2d, v10.2d
+	pmull		\reg2\().1q, \reg2\().1d, v10.1d
+
+CPU_LE(	ext		v11.16b, v11.16b, v11.16b, #8	)
+CPU_LE(	ext		v12.16b, v12.16b, v12.16b, #8	)
+
+	eor		\reg1\().16b, \reg1\().16b, v8.16b
+	eor		\reg2\().16b, \reg2\().16b, v9.16b
+	eor		\reg1\().16b, \reg1\().16b, v11.16b
+	eor		\reg2\().16b, \reg2\().16b, v12.16b
+	.endm
+
+	fold64		v0, v1
+	fold64		v2, v3
+	fold64		v4, v5
+	fold64		v6, v7
+
+	subs		arg3, arg3, #128
+
+	// check if there is another 64B in the buffer to be able to fold
+	b.ge		_fold_64_B_loop
+
+	// at this point, the buffer pointer is pointing at the last y Bytes
+	// of the buffer the 64B of folded data is in 4 of the vector
+	// registers: v0, v1, v2, v3
+
+	// fold the 8 vector registers to 1 vector register with different
+	// constants
+
+	ldr		q10, rk9
+
+	.macro		fold16, reg, rk
+	pmull		v8.1q, \reg\().1d, v10.1d
+	pmull2		\reg\().1q, \reg\().2d, v10.2d
+	.ifnb		\rk
+	ldr		q10, \rk
+	.endif
+	eor		v7.16b, v7.16b, v8.16b
+	eor		v7.16b, v7.16b, \reg\().16b
+	.endm
+
+	fold16		v0, rk11
+	fold16		v1, rk13
+	fold16		v2, rk15
+	fold16		v3, rk17
+	fold16		v4, rk19
+	fold16		v5, rk1
+	fold16		v6
+
+	// instead of 64, we add 48 to the loop counter to save 1 instruction
+	// from the loop instead of a cmp instruction, we use the negative
+	// flag with the jl instruction
+	adds		arg3, arg3, #(128-16)
+	b.lt		_final_reduction_for_128
+
+	// now we have 16+y bytes left to reduce. 16 Bytes is in register v7
+	// and the rest is in memory. We can fold 16 bytes@a time if y>=16
+	// continue folding 16B at a time
+
+_16B_reduction_loop:
+	pmull		v8.1q, v7.1d, v10.1d
+	pmull2		v7.1q, v7.2d, v10.2d
+	eor		v7.16b, v7.16b, v8.16b
+
+	ldr		q0, [arg2], #16
+CPU_LE(	rev64		v0.16b, v0.16b			)
+CPU_LE(	ext		v0.16b, v0.16b, v0.16b, #8	)
+	eor		v7.16b, v7.16b, v0.16b
+	subs		arg3, arg3, #16
+
+	// instead of a cmp instruction, we utilize the flags with the
+	// jge instruction equivalent of: cmp arg3, 16-16
+	// check if there is any more 16B in the buffer to be able to fold
+	b.ge		_16B_reduction_loop
+
+_final_reduction_for_128:
+	// compute crc of a 128-bit value
+	ldr		q10, rk5		// rk5 and rk6 in xmm10
+
+	// 64b fold
+	ext		v0.16b, vzr.16b, v7.16b, #8
+	mov		v7.d[0], v7.d[1]
+	pmull		v7.1q, v7.1d, v10.1d
+	eor		v7.16b, v7.16b, v0.16b
+
+	// 32b fold
+	ext		v0.16b, v7.16b, vzr.16b, #4
+	mov		v7.s[3], vzr.s[0]
+	pmull2		v0.1q, v0.2d, v10.2d
+	eor		v7.16b, v7.16b, v0.16b
+
+	// barrett reduction
+_barrett:
+	ldr		q10, rk7
+	mov		v0.d[0], v7.d[1]
+
+	pmull		v0.1q, v0.1d, v10.1d
+	ext		v0.16b, vzr.16b, v0.16b, #12
+	pmull2		v0.1q, v0.2d, v10.2d
+	ext		v0.16b, vzr.16b, v0.16b, #12
+	eor		v7.16b, v7.16b, v0.16b
+	mov		w0, v7.s[1]
+
+_cleanup:
+	// scale the result back to 16 bits
+	lsr		x0, x0, #16
+	ldp		x29, x30, [sp], #16
+	ret
+
+_less_than_128:
+	cbz		arg3, _cleanup
+
+	movi		v0.16b, #0
+	mov		v0.s[3], arg1_low32	// get the initial crc value
+
+	ldr		q7, [arg2], #0x10
+CPU_LE(	rev64		v7.16b, v7.16b			)
+CPU_LE(	ext		v7.16b, v7.16b, v7.16b, #8	)
+	eor		v7.16b, v7.16b, v0.16b	// xor the initial crc value
+
+	// check if there is enough buffer to be able to fold 16B at a time
+	cmp		arg3, #32
+	b.lt		_final_reduction_for_128
+
+	// now if there is, load the constants
+	ldr		q10, rk1		// rk1 and rk2 in xmm10
+
+	// update the counter. subtract 32 instead of 16 to save one
+	// instruction from the loop
+	sub		arg3, arg3, #32
+	b		_16B_reduction_loop
+ENDPROC(crc_t10dif_pmull)
+
+// precomputed constants
+// these constants are precomputed from the poly:
+// 0x8bb70000 (0x8bb7 scaled to 32 bits)
+	.align		4
+// Q = 0x18BB70000
+// rk1 = 2^(32*3) mod Q << 32
+// rk2 = 2^(32*5) mod Q << 32
+// rk3 = 2^(32*15) mod Q << 32
+// rk4 = 2^(32*17) mod Q << 32
+// rk5 = 2^(32*3) mod Q << 32
+// rk6 = 2^(32*2) mod Q << 32
+// rk7 = floor(2^64/Q)
+// rk8 = Q
+
+rk1:	.octa		0x06df0000000000002d56000000000000
+rk3:	.octa		0x7cf50000000000009d9d000000000000
+rk5:	.octa		0x13680000000000002d56000000000000
+rk7:	.octa		0x000000018bb7000000000001f65a57f8
+rk9:	.octa		0xbfd6000000000000ceae000000000000
+rk11:	.octa		0x713c0000000000001e16000000000000
+rk13:	.octa		0x80a6000000000000f7f9000000000000
+rk15:	.octa		0xe658000000000000044c000000000000
+rk17:	.octa		0xa497000000000000ad18000000000000
+rk19:	.octa		0xe7b50000000000006ee3000000000000
diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
new file mode 100644
index 000000000000..735678884194
--- /dev/null
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -0,0 +1,91 @@
+/*
+ * Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpufeature.h>
+#include <linux/crc-t10dif.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/neon.h>
+
+#define CRC_T10DIF_PMULL_CHUNK_SIZE	16U
+
+asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u64 len);
+
+static int crct10dif_init(struct shash_desc *desc)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	*crc = 0;
+	return 0;
+}
+
+static int crct10dif_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE) {
+		unsigned int l = length & ~(CRC_T10DIF_PMULL_CHUNK_SIZE - 1);
+
+		kernel_neon_begin_partial(14);
+		*crc = crc_t10dif_pmull(*crc, data, l);
+		kernel_neon_end();
+
+		data += l;
+	}
+	if (length % CRC_T10DIF_PMULL_CHUNK_SIZE)
+		*crc = crc_t10dif_generic(*crc, data,
+					  length % CRC_T10DIF_PMULL_CHUNK_SIZE);
+
+	return 0;
+}
+
+static int crct10dif_final(struct shash_desc *desc, u8 *out)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	*(u16 *)out = *crc;
+	return 0;
+}
+
+static struct shash_alg crc_t10dif_alg = {
+	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
+	.init			= crct10dif_init,
+	.update			= crct10dif_update,
+	.final			= crct10dif_final,
+	.descsize		= CRC_T10DIF_DIGEST_SIZE,
+
+	.base.cra_name		= "crct10dif",
+	.base.cra_driver_name	= "crct10dif-arm64-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init crc_t10dif_mod_init(void)
+{
+	return crypto_register_shash(&crc_t10dif_alg);
+}
+
+static void __exit crc_t10dif_mod_exit(void)
+{
+	crypto_unregister_shash(&crc_t10dif_alg);
+}
+
+module_cpu_feature_match(PMULL, crc_t10dif_mod_init);
+module_exit(crc_t10dif_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 2/6] crypto: testmgr - add/enhance test cases for CRC-T10DIF
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480852447-25082-1-git-send-email-ard.biesheuvel@linaro.org>

The existing test cases only exercise a small slice of the various
possible code paths through the x86 SSE/PCLMULQDQ implementation,
and the upcoming ports of it for arm64. So add one that exceeds 256
bytes in size, and convert another to a chunked test.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/testmgr.h | 70 ++++++++++++--------
 1 file changed, 42 insertions(+), 28 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index e64a4ef9d8ca..b7cd41b25a2a 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -1334,36 +1334,50 @@ static struct hash_testvec rmd320_tv_template[] = {
 	}
 };
 
-#define CRCT10DIF_TEST_VECTORS	3
+#define CRCT10DIF_TEST_VECTORS	ARRAY_SIZE(crct10dif_tv_template)
 static struct hash_testvec crct10dif_tv_template[] = {
 	{
-		.plaintext = "abc",
-		.psize  = 3,
-#ifdef __LITTLE_ENDIAN
-		.digest = "\x3b\x44",
-#else
-		.digest = "\x44\x3b",
-#endif
-	}, {
-		.plaintext = "1234567890123456789012345678901234567890"
-			     "123456789012345678901234567890123456789",
-		.psize	= 79,
-#ifdef __LITTLE_ENDIAN
-		.digest	= "\x70\x4b",
-#else
-		.digest	= "\x4b\x70",
-#endif
-	}, {
-		.plaintext =
-		"abcddddddddddddddddddddddddddddddddddddddddddddddddddddd",
-		.psize  = 56,
-#ifdef __LITTLE_ENDIAN
-		.digest = "\xe3\x9c",
-#else
-		.digest = "\x9c\xe3",
-#endif
-		.np     = 2,
-		.tap    = { 28, 28 }
+		.plaintext	= "abc",
+		.psize		= 3,
+		.digest		= (u8 *)(u16 []){ 0x443b },
+	}, {
+		.plaintext 	= "1234567890123456789012345678901234567890"
+				  "123456789012345678901234567890123456789",
+		.psize		= 79,
+		.digest 	= (u8 *)(u16 []){ 0x4b70 },
+		.np		= 2,
+		.tap		= { 63, 16 },
+	}, {
+		.plaintext	= "abcdddddddddddddddddddddddddddddddddddddddd"
+				  "ddddddddddddd",
+		.psize		= 56,
+		.digest		= (u8 *)(u16 []){ 0x9ce3 },
+		.np		= 8,
+		.tap		= { 1, 2, 28, 7, 6, 5, 4, 3 },
+	}, {
+		.plaintext 	= "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "123456789012345678901234567890123456789",
+		.psize		= 319,
+		.digest		= (u8 *)(u16 []){ 0x44c6 },
+	}, {
+		.plaintext 	= "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "123456789012345678901234567890123456789",
+		.psize		= 319,
+		.digest		= (u8 *)(u16 []){ 0x44c6 },
+		.np		= 4,
+		.tap		= { 1, 255, 57, 6 },
 	}
 };
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 1/6] crypto: testmgr - avoid overlap in chunked tests
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480852447-25082-1-git-send-email-ard.biesheuvel@linaro.org>

The IDXn offsets are chosen such that tap values (which may go up to
255) end up overlapping in the xbuf allocation. In particular, IDX1
and IDX3 are too close together, so update IDX3 to avoid this issue.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/testmgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index ded50b67c757..670893bcf361 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -63,7 +63,7 @@ int alg_test(const char *driver, const char *alg, u32 type, u32 mask)
  */
 #define IDX1		32
 #define IDX2		32400
-#define IDX3		1
+#define IDX3		511
 #define IDX4		8193
 #define IDX5		22222
 #define IDX6		17101
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 0/6] crypto: ARM/arm64 CRC-T10DIF/CRC32/CRC32C roundup
From: Ard Biesheuvel @ 2016-12-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and
arm64 that I sent out a couple of weeks ago, and adds support to the latter
for CRC32C.

Ard Biesheuvel (6):
  crypto: testmgr - avoid overlap in chunked tests
  crypto: testmgr - add/enhance test cases for CRC-T10DIF
  crypto: arm64/crct10dif - port x86 SSE implementation to arm64
  crypto: arm/crct10dif - port x86 SSE implementation to ARM
  crypto: arm64/crc32 - accelerated support based on x86 SSE
    implementation
  crypto: arm/crc32 - accelerated support based on x86 SSE
    implementation

 arch/arm/crypto/Kconfig               |  10 +
 arch/arm/crypto/Makefile              |   4 +
 arch/arm/crypto/crc32-ce-core.S       | 306 +++++++++++++++++
 arch/arm/crypto/crc32-ce-glue.c       | 195 +++++++++++
 arch/arm/crypto/crct10dif-ce-core.S   | 349 ++++++++++++++++++++
 arch/arm/crypto/crct10dif-ce-glue.c   |  95 ++++++
 arch/arm64/crypto/Kconfig             |  11 +
 arch/arm64/crypto/Makefile            |   6 +
 arch/arm64/crypto/crc32-ce-core.S     | 266 +++++++++++++++
 arch/arm64/crypto/crc32-ce-glue.c     | 188 +++++++++++
 arch/arm64/crypto/crct10dif-ce-core.S | 317 ++++++++++++++++++
 arch/arm64/crypto/crct10dif-ce-glue.c |  91 +++++
 crypto/testmgr.c                      |   2 +-
 crypto/testmgr.h                      |  70 ++--
 14 files changed, 1881 insertions(+), 29 deletions(-)
 create mode 100644 arch/arm/crypto/crc32-ce-core.S
 create mode 100644 arch/arm/crypto/crc32-ce-glue.c
 create mode 100644 arch/arm/crypto/crct10dif-ce-core.S
 create mode 100644 arch/arm/crypto/crct10dif-ce-glue.c
 create mode 100644 arch/arm64/crypto/crc32-ce-core.S
 create mode 100644 arch/arm64/crypto/crc32-ce-glue.c
 create mode 100644 arch/arm64/crypto/crct10dif-ce-core.S
 create mode 100644 arch/arm64/crypto/crct10dif-ce-glue.c

-- 
2.7.4

^ permalink raw reply

* [SPCR] mmio32 iotype access requirements for X-Gene 8250(_dw) UART
From: Duc Dang @ 2016-12-04 10:35 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <c6a0c134-2fde-21c2-03fb-6e1961161a26@redhat.com>

On Sat, Dec 3, 2016 at 12:33 PM, Jon Masters <jcm@redhat.com> wrote:
> Hi Mark,
>
> On 12/03/2016 12:15 PM, Mark Salter wrote:
>> On Sat, 2016-12-03 at 05:06 -0500, Jon Masters wrote:
>
>>> There is a broader problem with X-Gene SPCR support. The problem is
>>> that the 16550 UART in X-Gene requires the 32-bit access quirk (the
>>> iotype is set to UPIO_MEM32 for the APMC0D08 device). This means
>>> that when univ8250_console_match runs later, it will compare the
>>> iotype (MEM32) with the type previously registered with the
>>> kernel when the earlycon setup the preferred console.
>>
>> Linaro has a kernel patch which looks at the bit_width field of the
>> port address:
>>
>> Author: Aleksey Makarov <aleksey.makarov@linaro.org>
>> Date:   Thu Apr 28 19:52:38 2016 +0300
>>
>>     serial: SPCR: check bit width for the 16550 UART
>>
>> The SPCR in 3.06.25 firmware has a bit_width field set to 32 and with the
>> above patch, I don't need to use console=. But HP firmware on m400 sets
>> bit width to 8 so that needs a firmware fix to work with above.

Will this patch be posted for upstream?

>
> Indeed, thanks. Graeme also mentioned that last night. I think this
> a good solution for the moment, but still that it makes sense to
> have a new 16650 UART type defined in the DBG2 spec for those
> requiring 32-bit access (to mirror the type D for pl011). I
> heard back from Microsoft this morning that they're looking.

Yes, thanks, Jon. It will be nice to have a new 16550 UART type for
those requiring 32-bit access.

>
> Jon.
>
> --
> Computer Architect | Sent from my Fedora powered laptop
>
Regards,
Duc Dang.

^ permalink raw reply

* [PATCH] trace: extend trace_clock to support arch_arm clock counter
From: Srinivas Ramana @ 2016-12-04  8:36 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161202110845.GC8266@arm.com>

On 12/02/2016 04:38 PM, Will Deacon wrote:
> On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
>> Extend the trace_clock to support the arch timer cycle
>> counter so that we can get the monotonic cycle count
>> in the traces. This will help in correlating the traces with the
>> timestamps/events in other subsystems in the soc which share
>> this common counter for driving their timers.
>
> I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
> particular, the "perf" trace_clock hangs off sched_clock, which should
> be backed by the architected counter anyway. What does the cycle counter in
> isolation tell you, given that the frequency isn't architected?
>
> I think I'm missing something here.
>
> Will
>

Having cycle counter would help in the cases where we want to correlate 
the time with other subsystems which are outside cpu subsystem. 
local_clock or even the perf track_clock uses sched_clock which gets 
suspended during system suspend. Yes, they are backed up by the 
architected counter but they ignore the cycles spent in suspend. so, 
when comparing with monotonically increasing cycle counter, other clocks 
doesn't help. It seems X86 uses the TSC counter to help such cases.

Thanks,
-- Srinivas R

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, 
Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [linux-sunxi] [PATCH v3 -next 2/2] ARM: dts: sunxi: add support for Orange Pi Zero board
From: Alexey Kardashevskiy @ 2016-12-04  8:12 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAGb2v661hBsd7tBcYByeRKEZh+8YMB3ewbnkrT3rUvzP2f90yQ@mail.gmail.com>

On 03/12/16 03:41, Chen-Yu Tsai wrote:
> Hi,
> 
> On Fri, Dec 2, 2016 at 11:05 PM, Icenowy Zheng <icenowy@aosc.xyz> wrote:
>> Orange Pi Zero is a board that came with the new Allwinner H2+ SoC and a
>> SDIO Wi-Fi chip by Allwinner (XR819).
>>
>> Add a device tree file for it.
>>
>> Signed-off-by: Icenowy Zheng <icenowy@aosc.xyz>
>> ---
>> Changes since v2:
>> - Merged SDIO Wi-Fi patch into it.
>> - SDIO Wi-Fi: add a ethernet1 alias to it, as it has no internal NVRAM.
>> - SDIO Wi-Fi: changed pinctrl binding to generic pinconf
>> - removed all gpio pinctrl nodes
>> - changed h2plus to h2-plus
>> Changes since v1:
>> - Convert to generic pinconf bindings.
>> - SDIO Wi-Fi: add patch.
>>
>> Some notes:
>> - The uart1 and uart2 is available on the unsoldered gpio header.
>> - The onboard USB connector has its Vbus directly connected to DCIN-5V (the
>>   power jack)
>>
>>  arch/arm/boot/dts/Makefile                        |   1 +
>>  arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts | 159 ++++++++++++++++++++++
>>  2 files changed, 160 insertions(+)
>>  create mode 100644 arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
>>
>> diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
>> index 6447abc..59f6e86 100644
>> --- a/arch/arm/boot/dts/Makefile
>> +++ b/arch/arm/boot/dts/Makefile
>> @@ -844,6 +844,7 @@ dtb-$(CONFIG_MACH_SUN8I) += \
>>         sun8i-a33-sinlinx-sina33.dtb \
>>         sun8i-a83t-allwinner-h8homlet-v2.dtb \
>>         sun8i-a83t-cubietruck-plus.dtb \
>> +       sun8i-h2-plus-orangepi-zero.dtb \
>>         sun8i-h3-bananapi-m2-plus.dtb \
>>         sun8i-h3-nanopi-neo.dtb \
>>         sun8i-h3-orangepi-2.dtb \
>> diff --git a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
>> new file mode 100644
>> index 0000000..d18807f
>> --- /dev/null
>> +++ b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
>> @@ -0,0 +1,159 @@
>> +/*
>> + * Copyright (C) 2016 Icenowy Zheng <icenowy@aosc.xyz>
>> + *
>> + * Based on sun8i-h3-orangepi-one.dts, which is:
>> + *   Copyright (C) 2016 Hans de Goede <hdegoede@redhat.com>
>> + *
>> + * This file is dual-licensed: you can use it either under the terms
>> + * of the GPL or the X11 license, at your option. Note that this dual
>> + * licensing only applies to this file, and not this project as a
>> + * whole.
>> + *
>> + *  a) This file is free software; you can redistribute it and/or
>> + *     modify it under the terms of the GNU General Public License as
>> + *     published by the Free Software Foundation; either version 2 of the
>> + *     License, or (at your option) any later version.
>> + *
>> + *     This file is distributed in the hope that it will be useful,
>> + *     but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + *     GNU General Public License for more details.
>> + *
>> + * Or, alternatively,
>> + *
>> + *  b) Permission is hereby granted, free of charge, to any person
>> + *     obtaining a copy of this software and associated documentation
>> + *     files (the "Software"), to deal in the Software without
>> + *     restriction, including without limitation the rights to use,
>> + *     copy, modify, merge, publish, distribute, sublicense, and/or
>> + *     sell copies of the Software, and to permit persons to whom the
>> + *     Software is furnished to do so, subject to the following
>> + *     conditions:
>> + *
>> + *     The above copyright notice and this permission notice shall be
>> + *     included in all copies or substantial portions of the Software.
>> + *
>> + *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> + *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> + *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
>> + *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + *     OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +/dts-v1/;
>> +#include "sun8i-h3.dtsi"
>> +#include "sunxi-common-regulators.dtsi"
>> +
>> +#include <dt-bindings/gpio/gpio.h>
>> +#include <dt-bindings/input/input.h>
>> +#include <dt-bindings/pinctrl/sun4i-a10.h>
>> +
>> +/ {
>> +       model = "Xunlong Orange Pi Zero";
>> +       compatible = "xunlong,orangepi-zero", "allwinner,sun8i-h2-plus";
>> +
>> +       aliases {
>> +               serial0 = &uart0;
>> +               /* ethernet0 is the H3 emac, defined in sun8i-h3.dtsi */
>> +               ethernet1 = &xr819;
>> +       };
>> +
>> +       chosen {
>> +               stdout-path = "serial0:115200n8";
>> +       };
>> +
>> +       leds {
>> +               compatible = "gpio-leds";
>> +
>> +               pwr_led {
>> +                       label = "orangepi:green:pwr";
>> +                       gpios = <&r_pio 0 10 GPIO_ACTIVE_HIGH>;
>> +                       default-state = "on";
>> +               };
>> +
>> +               status_led {
>> +                       label = "orangepi:red:status";
>> +                       gpios = <&pio 0 17 GPIO_ACTIVE_HIGH>;
>> +               };
>> +       };
>> +
>> +       reg_vcc_wifi: reg_vcc_wifi {
>> +               compatible = "regulator-fixed";
>> +               regulator-min-microvolt = <3300000>;
>> +               regulator-max-microvolt = <3300000>;
>> +               regulator-name = "vcc-wifi";
>> +               enable-active-high;
>> +               gpio = <&pio 0 20 GPIO_ACTIVE_HIGH>;
>> +       };
>> +
>> +       wifi_pwrseq: wifi_pwrseq {
>> +               compatible = "mmc-pwrseq-simple";
>> +               reset-gpios = <&r_pio 0 7 GPIO_ACTIVE_LOW>;
>> +       };
>> +};
>> +
>> +&ehci1 {
>> +       status = "okay";
>> +};
>> +
>> +&mmc0 {
>> +       pinctrl-names = "default";
>> +       pinctrl-0 = <&mmc0_pins_a>;
>> +       vmmc-supply = <&reg_vcc3v3>;
>> +       bus-width = <4>;
>> +       cd-gpios = <&pio 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
>> +       cd-inverted;
>> +       status = "okay";
>> +};
>> +
>> +&mmc1 {
>> +       pinctrl-names = "default";
>> +       pinctrl-0 = <&mmc1_pins_a>;
>> +       vmmc-supply = <&reg_vcc_wifi>;
>> +       mmc-pwrseq = <&wifi_pwrseq>;
>> +       bus-width = <4>;
>> +       non-removable;
>> +       status = "okay";
>> +
>> +       /*
>> +        * Explicitly define the sdio device, so that we can add an ethernet
>> +        * alias for it (which e.g. makes u-boot set a mac-address).
>> +        */
>> +       xr819: sdio_wifi at 1 {
>> +               reg = <1>;
>> +       };
>> +};
>> +
>> +&mmc1_pins_a {
>> +       bias-pull-up;
> 
> This is already set in h3.dtsi
> 
>> +};
>> +
>> +&ohci1 {
>> +       status = "okay";
>> +};
>> +
>> +&uart0 {
>> +       pinctrl-names = "default";
>> +       pinctrl-0 = <&uart0_pins_a>;
>> +       status = "okay";
>> +};
>> +
>> +&uart1 {
>> +       pinctrl-names = "default";
>> +       pinctrl-0 = <&uart1_pins>;
>> +       status = "disabled";
>> +};
>> +
>> +&uart2 {
>> +       pinctrl-names = "default";
>> +       pinctrl-0 = <&uart2_pins>;
>> +       status = "disabled";
>> +};
>> +
>> +&usbphy {
>> +       /* USB VBUS is always on */
> 
> I think this comment could use a little work.
> 
> AFAIK this board doesn't have an actual USB port.


Mine does have one port.


> It's just the D+/D- pins on the pin header, along
> with the board-wide 5V, also on the pin header.
> 
> ChenYu
> 
>> +       status = "okay";
>> +};
>> --
>> 2.10.2
>>
>> --
>> You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe at googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.


-- 
Alexey

^ permalink raw reply

* [RESEND PATCH V6 0/6] Add support for privileged mappings
From: Sricharan @ 2016-12-04  7:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <e7e05493-e216-d744-b828-351d2c70150d@arm.com>

Hi Robin,

>Hi Sricharan,
>
>On 02/12/16 14:55, Sricharan R wrote:
>> This series is a resend of the V5 that Mitch sent sometime back [2]
>> All the patches are the same and i have just rebased. Not sure why this
>> finally did not make it last time. The last patch in the previous
>> series does not apply now [3], so just redid that. Also Copied the tags
>> that he had from last time as well.
>
>Heh, I was assuming this would be down to me to pick up. Vinod did have
>some complaints last time about the commit message on the PL330 patch -
>I did get as far as rewriting that and reworking onto my SMMU
>changes[1], I just hadn't got round to sending it, so it fell onto the
>"after the next merge window" pile.
>
>I'd give some review comments, but they'd essentially be a diff against
>that branch :)
>

Sure, i did not knew that you were on this already. I can repost with the diff
from your branch taken in or wait for you as well. I am fine with either ways
that you suggest.

I checked the patches against your branch, i see that the changes are,

1) one patch for implementing it for armv7s descriptor
2) Changes on pl330 patch commit logs and
3) One patch for doing the revert on arm-smmuv3 as well.

Regards,
 Sricharan

^ permalink raw reply

* [GIT PULL 1/10] mailbox: Add Tegra HSP driver
From: Olof Johansson @ 2016-12-04  5:25 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161121081752.GC25171@ulmo.ba.sec>

On Mon, Nov 21, 2016 at 09:17:52AM +0100, Thierry Reding wrote:
> On Fri, Nov 18, 2016 at 06:27:42PM -0800, Olof Johansson wrote:
> > Hi,
> > 
> > On Fri, Nov 18, 2016 at 05:17:10PM +0100, Thierry Reding wrote:
> > > Hi ARM SoC maintainers,
> > > 
> > > The following changes since commit 1001354ca34179f3db924eb66672442a173147dc:
> > > 
> > >   Linux 4.9-rc1 (2016-10-15 12:17:50 -0700)
> > > 
> > > are available in the git repository at:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git tags/tegra-for-4.10-mailbox
> > > 
> > > for you to fetch changes up to 68050eb6c611527232fe5574c7306e97e47499ef:
> > > 
> > >   mailbox: tegra-hsp: Use after free in tegra_hsp_remove_doorbells() (2016-11-18 14:32:13 +0100)
> > > 
> > > Thanks,
> > > Thierry
> > > 
> > > ----------------------------------------------------------------
> > > mailbox: Add Tegra HSP driver
> > > 
> > > This contains the device tree bindings and a driver for the Tegra HSP, a
> > > hardware block that provides hardware synchronization primitives and is
> > > the foundation for inter-processor communication between CPU and BPMP.
> > > 
> > > ----------------------------------------------------------------
> > > Dan Carpenter (1):
> > >       mailbox: tegra-hsp: Use after free in tegra_hsp_remove_doorbells()
> > > 
> > > Joseph Lo (2):
> > >       soc/tegra: Add Tegra186 support
> > 
> > I don't think you really needed to merge this in here, since all you need it
> > for is to fulfill the kconfig dependency and enable the driver, right? That'd
> > happen when the driver and soc branch is merged at the toplevel anyway.
> 
> The reason I did this is that I wanted each branch to be buildable as a
> way to confirm that the dependencies are correct. In order to do that I
> need the Kconfig symbol to enable the driver.

Good point, but that's more of a local setup thing for you, and not something
that necessarily needs to go upstream.

> I suppose there are other ways I could've done that, though. Maybe in
> the future new SoC Kconfig symbols should just be introduced way ahead
> of time, so that they're already in a release or two before actual code
> starts to emerge.

That'd work too!

> > Anyhow, no damage done, I've merged this in. I would say that it'd be a little
> > more logical to send the SoC branch before the driver branch given this
> > dependency though.
> 
> The reason that the SoC branch was sent after is because only the first
> commit in that branch was pulled into the mailbox branch.
> 
> In retrospect, I think perhaps a better approach would've been to have a
> separate branch with only the Kconfig symbol addition and pull that in
> where needed.

That could work, but the whole branch-merge-features-then-enable it workflow is
quite acceptable as well.


-Olof

^ permalink raw reply

* [GIT PULL 9/10] arm64: tegra: Device tree changes for v4.10-rc1
From: Olof Johansson @ 2016-12-04  5:13 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161121082210.GD25171@ulmo.ba.sec>

On Mon, Nov 21, 2016 at 09:22:10AM +0100, Thierry Reding wrote:
> On Fri, Nov 18, 2016 at 06:40:21PM -0800, Olof Johansson wrote:
> > On Fri, Nov 18, 2016 at 05:17:18PM +0100, Thierry Reding wrote:
> > > Hi ARM SoC maintainers,
> > > 
> > > The following changes since commit 1001354ca34179f3db924eb66672442a173147dc:
> > > 
> > >   Linux 4.9-rc1 (2016-10-15 12:17:50 -0700)
> > > 
> > > are available in the git repository at:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git tags/tegra-for-4.10-arm64-dt
> > > 
> > > for you to fetch changes up to cc13b4fa4ac780cec6c21b64a39ab2950e95e8f6:
> > > 
> > >   arm64: tegra: Add NVIDIA P2771 board support (2016-11-18 14:35:53 +0100)
> > > 
> > > Thanks,
> > > Thierry
> > > 
> > > ----------------------------------------------------------------
> > > arm64: tegra: Device tree changes for v4.10-rc1
> > > 
> > > This adds initial support for Tegra186, the P3310 processor module as
> > > well as the P2771 development board. Not much is functional, but there
> > > is enough to boot to an initial ramdisk with debug serial output.
> > > 
> > > ----------------------------------------------------------------
> > > Dan Carpenter (1):
> > >       mailbox: tegra-hsp: Use after free in tegra_hsp_remove_doorbells()
> > > 
> > > Joseph Lo (6):
> > >       soc/tegra: Add Tegra186 support
> > >       dt-bindings: mailbox: Add Tegra HSP binding
> > >       dt-bindings: firmware: Add bindings for Tegra BPMP
> > >       arm64: tegra: Add Tegra186 support
> > >       arm64: tegra: Add NVIDIA P3310 processor module support
> > >       arm64: tegra: Add NVIDIA P2771 board support
> > > 
> > > Stephen Warren (2):
> > >       dt-bindings: Add power domains to Tegra BPMP firmware
> > >       dt-bindings: firmware: Allow child nodes inside the Tegra BPMP
> > > 
> > > Thierry Reding (12):
> > >       Merge branch 'for-4.10/soc' into for-4.10/mailbox
> > >       mailbox: Add Tegra HSP driver
> > >       Merge branch 'for-4.10/mailbox' into for-4.10/firmware
> > >       firmware: tegra: Add IVC library
> > >       firmware: tegra: Add BPMP support
> > >       Merge branch 'for-4.10/firmware' into for-4.10/arm64/dt
> > >       arm64: tegra: Add CPU nodes for Tegra186
> > >       arm64: tegra: Add serial ports on Tegra186
> > >       arm64: tegra: Add I2C controllers on Tegra186
> > >       arm64: tegra: Add SDHCI controllers on Tegra186
> > >       arm64: tegra: Add GPIO controllers on Tegra186
> > >       arm64: tegra: Enable PSCI on P3310
> > 
> > The drivers->dt dependency here is annoying. Any chance you can respin without
> > it?
> > 
> > We've been encouraging people to consider using numerical clock/gpio/reset
> > numbers on initial submission to avoid these dependencies on dt-bindings
> > includes, and then follow up with a move to the symbolic names between -rc1 and
> > -rc2. Mind doing the same here?
> 
> Yes, I can do that. Would it be acceptable to have a dt-bindings->dt
> dependency? Stephen's already done a good job of avoiding this kind of
> dependency by getting the bindings, and hence dt-bindings headers,
> merged ahead of Linux kernel support because he had already gotten the
> bindings reviewed and finalized during his work on U-Boot.
> 
> I've been told in the past that it's not necessary to strictly split DT
> bindings patches from driver patches, but I suppose if a dt-bindings->dt
> is acceptable, then splitting things up more strictly would actually be
> the preferable solution here because it also avoids the slight churn of
> converting to symbolic values later on.

Sorry, haven't been looking at this older for a while so this reply is
high-latency.

If you want to do a separate dt-bindings branch that includes the headers and
is present in both the dt and drivers branch, that'd work. If the changes are
trivial though, and just contains a few clocks, I think we should just have
them opencoded on the original merge, and then move over once the include files
have gone in. I'd be willing to merge such conversions between -rc1 and -rc2.

That's in particular the case when there's an external driver tree/maintainer,
since it avoids the three-way handshake, etc.


-Olof

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox