Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 4/4] arm64: dts: marvell: Enable spi0 on the board Armada-3720-db
From: Romain Perier @ 2016-12-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208145847.7794-1-romain.perier@free-electrons.com>

This commit enables the device node spi0 on the official development
board for the Marvell Armada 3700. It also adds sub-node for the 128Mb
SPI-NOR present on the board.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---

Changes in v3:
 - Added tag "Tested-by" by Gregory

 arch/arm64/boot/dts/marvell/armada-3720-db.dts | 30 ++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-3720-db.dts b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
index 1372e9a6..0c4eb98 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
@@ -67,6 +67,36 @@
 	status = "okay";
 };
 
+&spi0 {
+	status = "okay";
+
+	m25p80 at 0 {
+		compatible = "jedec,spi-nor";
+		reg = <0>;
+		spi-max-frequency = <108000000>;
+		spi-rx-bus-width = <4>;
+		spi-tx-bus-width = <4>;
+
+		partitions {
+			compatible = "fixed-partitions";
+			#address-cells = <1>;
+			#size-cells = <1>;
+			partition at 0 {
+				label = "bootloader";
+				reg = <0x0 0x200000>;
+			};
+			partition at 200000 {
+				label = "U-boot Env";
+				reg = <0x200000 0x10000>;
+			};
+			partition at 210000 {
+				label = "Linux";
+				reg = <0x210000 0xDF0000>;
+			};
+		};
+	};
+};
+
 /* Exported on the micro USB connector CON32 through an FTDI */
 &uart0 {
 	status = "okay";
-- 
2.9.3

^ permalink raw reply related

* [PATCH v4 3/4] arm64: dts: marvell: Add definition of SPI controller for Armada 3700
From: Romain Perier @ 2016-12-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208145847.7794-1-romain.perier@free-electrons.com>

Armada 3700 SoC has an SPI Controller, this commit adds the definition
of the SPI device node at the SoC level.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---

Changes in v3:
 - Fixed wrong register size for spi0, as suggested by the maintainer
   on the ML.
 - Added tag "Tested-by" by Gregory

Changes in v2:
 - Removed properties max-frequency and clock-frequency, it is no
   longer required and not used by the DT-bindings.

 arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index e9bd587..fcef9a5 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -98,6 +98,17 @@
 			/* 32M internal register @ 0xd000_0000 */
 			ranges = <0x0 0x0 0xd0000000 0x2000000>;
 
+			spi0: spi at 10600 {
+				compatible = "marvell,armada-3700-spi";
+				#address-cells = <1>;
+				#size-cells = <0>;
+				reg = <0x10600 0xA00>;
+				clocks = <&nb_periph_clk 7>;
+				interrupts = <GIC_SPI 0 IRQ_TYPE_LEVEL_HIGH>;
+				num-cs = <4>;
+				status = "disabled";
+			};
+
 			uart0: serial at 12000 {
 				compatible = "marvell,armada-3700-uart";
 				reg = <0x12000 0x400>;
-- 
2.9.3

^ permalink raw reply related

* [PATCH v4 2/4] spi: armada-3700: Add documentation for the Armada 3700 SPI Controller
From: Romain Perier @ 2016-12-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208145847.7794-1-romain.perier@free-electrons.com>

This adds the devicetree bindings documentation for the SPI controller
present in the Marvell Armada 3700 SoCs.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Rob Herring <robh@kernel.org>
---

Changes in v4:
 - Added tag "Acked-by" by Rob Herring

Changes in v3:
 - Added tag "Tested-by" by Gregory
 - Fixed commit title, as requested by Mark Brown

 .../devicetree/bindings/spi/spi-armada-3700.txt    | 25 ++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/spi/spi-armada-3700.txt

diff --git a/Documentation/devicetree/bindings/spi/spi-armada-3700.txt b/Documentation/devicetree/bindings/spi/spi-armada-3700.txt
new file mode 100644
index 0000000..1564aa8
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi-armada-3700.txt
@@ -0,0 +1,25 @@
+* Marvell Armada 3700 SPI Controller
+
+Required Properties:
+
+- compatible: should be "marvell,armada-3700-spi"
+- reg: physical base address of the controller and length of memory mapped
+       region.
+- interrupts: The interrupt number. The interrupt specifier format depends on
+	      the interrupt controller and of its driver.
+- clocks: Must contain the clock source, usually from the North Bridge clocks.
+- num-cs: The number of chip selects that is supported by this SPI Controller
+- #address-cells: should be 1.
+- #size-cells: should be 0.
+
+Example:
+
+	spi0: spi at 10600 {
+		compatible = "marvell,armada-3700-spi";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		reg = <0x10600 0x5d>;
+		clocks = <&nb_perih_clk 7>;
+		interrupts = <GIC_SPI 0 IRQ_TYPE_LEVEL_HIGH>;
+		num-cs = <4>;
+	};
-- 
2.9.3

^ permalink raw reply related

* [PATCH v4 1/4] spi: Add support for Armada 3700 SPI Controller
From: Romain Perier @ 2016-12-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208145847.7794-1-romain.perier@free-electrons.com>

Marvell Armada 3700 SoC comprises an SPI Controller. This Controller
supports up to 4 SPI slave devices, with dedicated chip selects,supports
SPI mode 0/1/2 and 3, CPIO or Fifo mode with DMA transfers and different
SPI transfer mode (Single, Dual or Quad).

This commit adds basic driver support for FIFO mode. In this mode,
dedicated registers are used to store the instruction, the address, the
read mode and the data. Write and Read FIFO are used to store the
outcoming or incoming data. The data FIFOs are accessible via DMA or by
the CPU. Only the CPU is supported for now.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---

Changes in v4:
 - Added missing COMPILE_TEST in KConfig
 - Replaced the busy wait for loop by a single busy wait in spi_init()
 - Handle IRQ_NONE in the interrupt handler 
 - Assign pdev->id to master->cs_num directly (no conditionnal operator)
 - Moved clock gating into prepare/unprepare callbacks
 - Removed dev_info() in the end of the probe() function
 - Added flag to the spi master for single duplex transfers
 - Squashed patch 1/5 and 2/5 together: remove the non-fifo mode, just keep
   the more efficient mode
 - So removed the module parameter

Changes in v3:
 - Don't enable the fifo mode based on the compatible string, we introduce
   a module parameter "pio_mode". By default this option is set to zero, so
   the fifo mode is enabled. Pass pio_mode=1 to the driver enables the PIO
   mode.
 - Added tag "Tested-by" by Gregory
 - Fixed wrong variable passed as MODULE_DEVICE_TABLE
 - Added missing null terminated entry in a3700_spi_dt_ids 

Changes in v2:
 - Removed a3700_spi_bytelen_set from the setup function, it was accidentally
   let here and not required, as it is configured in the prepare callback now
   (defaults to 4 for fifo mode). It solves unrecognized spi-nor flash memory
   detection with jedec.

 drivers/spi/Kconfig           |   7 +
 drivers/spi/Makefile          |   1 +
 drivers/spi/spi-armada-3700.c | 923 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 931 insertions(+)
 create mode 100644 drivers/spi/spi-armada-3700.c

diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
index b799547..1faad2ce 100644
--- a/drivers/spi/Kconfig
+++ b/drivers/spi/Kconfig
@@ -67,6 +67,13 @@ config SPI_ATH79
 	  This enables support for the SPI controller present on the
 	  Atheros AR71XX/AR724X/AR913X SoCs.
 
+config SPI_ARMADA_3700
+	tristate "Marvell Armada 3700 SPI Controller"
+	depends on (ARCH_MVEBU && OF) || COMPILE_TEST
+	help
+	  This enables support for the SPI controller present on the
+	  Marvell Armada 3700 SoCs.
+
 config SPI_ATMEL
 	tristate "Atmel SPI Controller"
 	depends on HAS_DMA
diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile
index aa939d9..140ca45 100644
--- a/drivers/spi/Makefile
+++ b/drivers/spi/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SPI_LOOPBACK_TEST)		+= spi-loopback-test.o
 
 # SPI master controller drivers (bus)
 obj-$(CONFIG_SPI_ALTERA)		+= spi-altera.o
+obj-$(CONFIG_SPI_ARMADA_3700)		+= spi-armada-3700.o
 obj-$(CONFIG_SPI_ATMEL)			+= spi-atmel.o
 obj-$(CONFIG_SPI_ATH79)			+= spi-ath79.o
 obj-$(CONFIG_SPI_AU1550)		+= spi-au1550.o
diff --git a/drivers/spi/spi-armada-3700.c b/drivers/spi/spi-armada-3700.c
new file mode 100644
index 0000000..e89da0a
--- /dev/null
+++ b/drivers/spi/spi-armada-3700.c
@@ -0,0 +1,923 @@
+/*
+ * Marvell Armada-3700 SPI controller driver
+ *
+ * Copyright (C) 2016 Marvell Ltd.
+ *
+ * Author: Wilson Ding <dingwei@marvell.com>
+ * Author: Romain Perier <romain.perier@free-electrons.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_irq.h>
+#include <linux/of_device.h>
+#include <linux/pinctrl/consumer.h>
+#include <linux/spi/spi.h>
+
+#define DRIVER_NAME			"armada_3700_spi"
+
+#define A3700_SPI_TIMEOUT		10
+
+/* SPI Register Offest */
+#define A3700_SPI_IF_CTRL_REG		0x00
+#define A3700_SPI_IF_CFG_REG		0x04
+#define A3700_SPI_DATA_OUT_REG		0x08
+#define A3700_SPI_DATA_IN_REG		0x0C
+#define A3700_SPI_IF_INST_REG		0x10
+#define A3700_SPI_IF_ADDR_REG		0x14
+#define A3700_SPI_IF_RMODE_REG		0x18
+#define A3700_SPI_IF_HDR_CNT_REG	0x1C
+#define A3700_SPI_IF_DIN_CNT_REG	0x20
+#define A3700_SPI_IF_TIME_REG		0x24
+#define A3700_SPI_INT_STAT_REG		0x28
+#define A3700_SPI_INT_MASK_REG		0x2C
+
+/* A3700_SPI_IF_CTRL_REG */
+#define A3700_SPI_EN			BIT(16)
+#define A3700_SPI_ADDR_NOT_CONFIG	BIT(12)
+#define A3700_SPI_WFIFO_OVERFLOW	BIT(11)
+#define A3700_SPI_WFIFO_UNDERFLOW	BIT(10)
+#define A3700_SPI_RFIFO_OVERFLOW	BIT(9)
+#define A3700_SPI_RFIFO_UNDERFLOW	BIT(8)
+#define A3700_SPI_WFIFO_FULL		BIT(7)
+#define A3700_SPI_WFIFO_EMPTY		BIT(6)
+#define A3700_SPI_RFIFO_FULL		BIT(5)
+#define A3700_SPI_RFIFO_EMPTY		BIT(4)
+#define A3700_SPI_WFIFO_RDY		BIT(3)
+#define A3700_SPI_RFIFO_RDY		BIT(2)
+#define A3700_SPI_XFER_RDY		BIT(1)
+#define A3700_SPI_XFER_DONE		BIT(0)
+
+/* A3700_SPI_IF_CFG_REG */
+#define A3700_SPI_WFIFO_THRS		BIT(28)
+#define A3700_SPI_RFIFO_THRS		BIT(24)
+#define A3700_SPI_AUTO_CS		BIT(20)
+#define A3700_SPI_DMA_RD_EN		BIT(18)
+#define A3700_SPI_FIFO_MODE		BIT(17)
+#define A3700_SPI_SRST			BIT(16)
+#define A3700_SPI_XFER_START		BIT(15)
+#define A3700_SPI_XFER_STOP		BIT(14)
+#define A3700_SPI_INST_PIN		BIT(13)
+#define A3700_SPI_ADDR_PIN		BIT(12)
+#define A3700_SPI_DATA_PIN1		BIT(11)
+#define A3700_SPI_DATA_PIN0		BIT(10)
+#define A3700_SPI_FIFO_FLUSH		BIT(9)
+#define A3700_SPI_RW_EN			BIT(8)
+#define A3700_SPI_CLK_POL		BIT(7)
+#define A3700_SPI_CLK_PHA		BIT(6)
+#define A3700_SPI_BYTE_LEN		BIT(5)
+#define A3700_SPI_CLK_PRESCALE		BIT(0)
+#define A3700_SPI_CLK_PRESCALE_MASK	(0x1f)
+
+#define A3700_SPI_WFIFO_THRS_BIT	28
+#define A3700_SPI_RFIFO_THRS_BIT	24
+#define A3700_SPI_FIFO_THRS_MASK	0x7
+
+#define A3700_SPI_DATA_PIN_MASK		0x3
+
+/* A3700_SPI_IF_HDR_CNT_REG */
+#define A3700_SPI_DUMMY_CNT_BIT		12
+#define A3700_SPI_DUMMY_CNT_MASK	0x7
+#define A3700_SPI_RMODE_CNT_BIT		8
+#define A3700_SPI_RMODE_CNT_MASK	0x3
+#define A3700_SPI_ADDR_CNT_BIT		4
+#define A3700_SPI_ADDR_CNT_MASK		0x7
+#define A3700_SPI_INSTR_CNT_BIT		0
+#define A3700_SPI_INSTR_CNT_MASK	0x3
+
+/* A3700_SPI_IF_TIME_REG */
+#define A3700_SPI_CLK_CAPT_EDGE		BIT(7)
+
+/* Flags and macros for struct a3700_spi */
+#define A3700_INSTR_CNT			1
+#define A3700_ADDR_CNT			3
+#define A3700_DUMMY_CNT			1
+
+struct a3700_spi {
+	struct spi_master *master;
+	void __iomem *base;
+	struct clk *clk;
+	unsigned int irq;
+	unsigned int flags;
+	bool xmit_data;
+	const u8 *tx_buf;
+	u8 *rx_buf;
+	size_t buf_len;
+	u8 byte_len;
+	u32 wait_mask;
+	struct completion done;
+	u32 addr_cnt;
+	u32 instr_cnt;
+	size_t hdr_cnt;
+};
+
+static u32 spireg_read(struct a3700_spi *a3700_spi, u32 offset)
+{
+	return readl(a3700_spi->base + offset);
+}
+
+static void spireg_write(struct a3700_spi *a3700_spi, u32 offset, u32 data)
+{
+	writel(data, a3700_spi->base + offset);
+}
+
+static void a3700_spi_auto_cs_unset(struct a3700_spi *a3700_spi)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val &= ~A3700_SPI_AUTO_CS;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+}
+
+static void a3700_spi_activate_cs(struct a3700_spi *a3700_spi, unsigned int cs)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CTRL_REG);
+	val |= (A3700_SPI_EN << cs);
+	spireg_write(a3700_spi, A3700_SPI_IF_CTRL_REG, val);
+}
+
+static void a3700_spi_deactivate_cs(struct a3700_spi *a3700_spi,
+				    unsigned int cs)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CTRL_REG);
+	val &= ~(A3700_SPI_EN << cs);
+	spireg_write(a3700_spi, A3700_SPI_IF_CTRL_REG, val);
+}
+
+static int a3700_spi_pin_mode_set(struct a3700_spi *a3700_spi,
+				  unsigned int pin_mode)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val &= ~(A3700_SPI_INST_PIN | A3700_SPI_ADDR_PIN);
+	val &= ~(A3700_SPI_DATA_PIN0 | A3700_SPI_DATA_PIN1);
+
+	switch (pin_mode) {
+	case 1:
+		break;
+	case 2:
+		val |= A3700_SPI_DATA_PIN0;
+		break;
+	case 4:
+		val |= A3700_SPI_DATA_PIN1;
+		break;
+	default:
+		dev_err(&a3700_spi->master->dev, "wrong pin mode %u", pin_mode);
+		return -EINVAL;
+	}
+
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	return 0;
+}
+
+static void a3700_spi_fifo_mode_set(struct a3700_spi *a3700_spi)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val |= A3700_SPI_FIFO_MODE;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+}
+
+static void a3700_spi_mode_set(struct a3700_spi *a3700_spi,
+			       unsigned int mode_bits)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+
+	if (mode_bits & SPI_CPOL)
+		val |= A3700_SPI_CLK_POL;
+	else
+		val &= ~A3700_SPI_CLK_POL;
+
+	if (mode_bits & SPI_CPHA)
+		val |= A3700_SPI_CLK_PHA;
+	else
+		val &= ~A3700_SPI_CLK_PHA;
+
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+}
+
+static void a3700_spi_clock_set(struct a3700_spi *a3700_spi,
+				unsigned int speed_hz, u16 mode)
+{
+	u32 val;
+	u32 prescale;
+
+	prescale = DIV_ROUND_UP(clk_get_rate(a3700_spi->clk), speed_hz);
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val = val & ~A3700_SPI_CLK_PRESCALE_MASK;
+
+	val = val | (prescale & A3700_SPI_CLK_PRESCALE_MASK);
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	if (prescale <= 2) {
+		val = spireg_read(a3700_spi, A3700_SPI_IF_TIME_REG);
+		val |= A3700_SPI_CLK_CAPT_EDGE;
+		spireg_write(a3700_spi, A3700_SPI_IF_TIME_REG, val);
+	}
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val &= ~(A3700_SPI_CLK_POL | A3700_SPI_CLK_PHA);
+
+	if (mode & SPI_CPOL)
+		val |= A3700_SPI_CLK_POL;
+
+	if (mode & SPI_CPHA)
+		val |= A3700_SPI_CLK_PHA;
+
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+}
+
+static void a3700_spi_bytelen_set(struct a3700_spi *a3700_spi, unsigned int len)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	if (len == 4)
+		val |= A3700_SPI_BYTE_LEN;
+	else
+		val &= ~A3700_SPI_BYTE_LEN;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	a3700_spi->byte_len = len;
+}
+
+static int a3700_spi_fifo_flush(struct a3700_spi *a3700_spi)
+{
+	int timeout = A3700_SPI_TIMEOUT;
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val |= A3700_SPI_FIFO_FLUSH;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	while (--timeout) {
+		val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+		if (!(val & A3700_SPI_FIFO_FLUSH))
+			return 0;
+		udelay(1);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static int a3700_spi_init(struct a3700_spi *a3700_spi)
+{
+	struct spi_master *master = a3700_spi->master;
+	u32 val;
+	int i, ret = 0;
+
+	/* Reset SPI unit */
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val |= A3700_SPI_SRST;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	udelay(A3700_SPI_TIMEOUT);
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val &= ~A3700_SPI_SRST;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	/* Disable AUTO_CS and deactivate all chip-selects */
+	a3700_spi_auto_cs_unset(a3700_spi);
+	for (i = 0; i < master->num_chipselect; i++)
+		a3700_spi_deactivate_cs(a3700_spi, i);
+
+	/* Enable FIFO mode */
+	a3700_spi_fifo_mode_set(a3700_spi);
+
+	/* Set SPI mode */
+	a3700_spi_mode_set(a3700_spi, master->mode_bits);
+
+	/* Reset counters */
+	spireg_write(a3700_spi, A3700_SPI_IF_HDR_CNT_REG, 0);
+	spireg_write(a3700_spi, A3700_SPI_IF_DIN_CNT_REG, 0);
+
+	/* Mask the interrupts and clear cause bits */
+	spireg_write(a3700_spi, A3700_SPI_INT_MASK_REG, 0);
+	spireg_write(a3700_spi, A3700_SPI_INT_STAT_REG, ~0U);
+
+	return ret;
+}
+
+static irqreturn_t a3700_spi_interrupt(int irq, void *dev_id)
+{
+	struct spi_master *master = dev_id;
+	struct a3700_spi *a3700_spi;
+	u32 cause;
+
+	a3700_spi = spi_master_get_devdata(master);
+
+	/* Get interrupt causes */
+	cause = spireg_read(a3700_spi, A3700_SPI_INT_STAT_REG);
+
+	if (!cause || !(a3700_spi->wait_mask & cause))
+		return IRQ_NONE;
+
+	/* mask and acknowledge the SPI interrupts */
+	spireg_write(a3700_spi, A3700_SPI_INT_MASK_REG, 0);
+	spireg_write(a3700_spi, A3700_SPI_INT_STAT_REG, cause);
+
+	/* Wake up the transfer */
+	if (a3700_spi->wait_mask & cause)
+		complete(&a3700_spi->done);
+
+	return IRQ_HANDLED;
+}
+
+static bool a3700_spi_wait_completion(struct spi_device *spi)
+{
+	struct a3700_spi *a3700_spi;
+	unsigned int timeout;
+	unsigned int ctrl_reg;
+	unsigned long timeout_jiffies;
+
+	a3700_spi = spi_master_get_devdata(spi->master);
+
+	/* SPI interrupt is edge-triggered, which means an interrupt will
+	 * be generated only when detecting a specific status bit changed
+	 * from '0' to '1'. So when we start waiting for a interrupt, we
+	 * need to check status bit in control reg first, if it is already 1,
+	 * then we do not need to wait for interrupt
+	 */
+	ctrl_reg = spireg_read(a3700_spi, A3700_SPI_IF_CTRL_REG);
+	if (a3700_spi->wait_mask & ctrl_reg)
+		return true;
+
+	reinit_completion(&a3700_spi->done);
+
+	spireg_write(a3700_spi, A3700_SPI_INT_MASK_REG,
+		     a3700_spi->wait_mask);
+
+	timeout_jiffies = msecs_to_jiffies(A3700_SPI_TIMEOUT);
+	timeout = wait_for_completion_timeout(&a3700_spi->done,
+					      timeout_jiffies);
+
+	a3700_spi->wait_mask = 0;
+
+	if (timeout)
+		return true;
+
+	/* there might be the case that right after we checked the
+	 * status bits in this routine and before start to wait for
+	 * interrupt by wait_for_completion_timeout, the interrupt
+	 * happens, to avoid missing it we need to double check
+	 * status bits in control reg, if it is already 1, then
+	 * consider that we have the interrupt successfully and
+	 * return true.
+	 */
+	ctrl_reg = spireg_read(a3700_spi, A3700_SPI_IF_CTRL_REG);
+	if (a3700_spi->wait_mask & ctrl_reg)
+		return true;
+
+	spireg_write(a3700_spi, A3700_SPI_INT_MASK_REG, 0);
+
+	return true;
+}
+
+static bool a3700_spi_transfer_wait(struct spi_device *spi,
+				    unsigned int bit_mask)
+{
+	struct a3700_spi *a3700_spi;
+
+	a3700_spi = spi_master_get_devdata(spi->master);
+	a3700_spi->wait_mask = bit_mask;
+
+	return a3700_spi_wait_completion(spi);
+}
+
+static void a3700_spi_fifo_thres_set(struct a3700_spi *a3700_spi,
+				     unsigned int bytes)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val &= ~(A3700_SPI_FIFO_THRS_MASK << A3700_SPI_RFIFO_THRS_BIT);
+	val |= (bytes - 1) << A3700_SPI_RFIFO_THRS_BIT;
+	val &= ~(A3700_SPI_FIFO_THRS_MASK << A3700_SPI_WFIFO_THRS_BIT);
+	val |= (7 - bytes) << A3700_SPI_WFIFO_THRS_BIT;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+}
+
+static void a3700_spi_transfer_setup(struct spi_device *spi,
+				    struct spi_transfer *xfer)
+{
+	struct a3700_spi *a3700_spi;
+	unsigned int byte_len;
+
+	a3700_spi = spi_master_get_devdata(spi->master);
+
+	a3700_spi_clock_set(a3700_spi, xfer->speed_hz, spi->mode);
+
+	byte_len = xfer->bits_per_word >> 3;
+
+	a3700_spi_fifo_thres_set(a3700_spi, byte_len);
+}
+
+static void a3700_spi_set_cs(struct spi_device *spi, bool enable)
+{
+	struct a3700_spi *a3700_spi = spi_master_get_devdata(spi->master);
+
+	if (!enable)
+		a3700_spi_activate_cs(a3700_spi, spi->chip_select);
+	else
+		a3700_spi_deactivate_cs(a3700_spi, spi->chip_select);
+}
+
+static void a3700_spi_header_set(struct a3700_spi *a3700_spi)
+{
+	u32 instr_cnt = 0, addr_cnt = 0, dummy_cnt = 0;
+	u32 val = 0;
+
+	/* Clear the header registers */
+	spireg_write(a3700_spi, A3700_SPI_IF_INST_REG, 0);
+	spireg_write(a3700_spi, A3700_SPI_IF_ADDR_REG, 0);
+	spireg_write(a3700_spi, A3700_SPI_IF_RMODE_REG, 0);
+
+	/* Set header counters */
+	if (a3700_spi->tx_buf) {
+		if (a3700_spi->buf_len <= a3700_spi->instr_cnt) {
+			instr_cnt = a3700_spi->buf_len;
+		} else if (a3700_spi->buf_len <= (a3700_spi->instr_cnt +
+						  a3700_spi->addr_cnt)) {
+			instr_cnt = a3700_spi->instr_cnt;
+			addr_cnt = a3700_spi->buf_len - instr_cnt;
+		} else if (a3700_spi->buf_len <= a3700_spi->hdr_cnt) {
+			instr_cnt = a3700_spi->instr_cnt;
+			addr_cnt = a3700_spi->addr_cnt;
+			/* Need to handle the normal write case with 1 byte
+			 * data
+			 */
+			if (!a3700_spi->tx_buf[instr_cnt + addr_cnt])
+				dummy_cnt = a3700_spi->buf_len - instr_cnt -
+					    addr_cnt;
+		}
+		val |= ((instr_cnt & A3700_SPI_INSTR_CNT_MASK)
+			<< A3700_SPI_INSTR_CNT_BIT);
+		val |= ((addr_cnt & A3700_SPI_ADDR_CNT_MASK)
+			<< A3700_SPI_ADDR_CNT_BIT);
+		val |= ((dummy_cnt & A3700_SPI_DUMMY_CNT_MASK)
+			<< A3700_SPI_DUMMY_CNT_BIT);
+	}
+	spireg_write(a3700_spi, A3700_SPI_IF_HDR_CNT_REG, val);
+
+	/* Update the buffer length to be transferred */
+	a3700_spi->buf_len -= (instr_cnt + addr_cnt + dummy_cnt);
+
+	/* Set Instruction */
+	val = 0;
+	while (instr_cnt--) {
+		val = (val << 8) | a3700_spi->tx_buf[0];
+		a3700_spi->tx_buf++;
+	}
+	spireg_write(a3700_spi, A3700_SPI_IF_INST_REG, val);
+
+	/* Set Address */
+	val = 0;
+	while (addr_cnt--) {
+		val = (val << 8) | a3700_spi->tx_buf[0];
+		a3700_spi->tx_buf++;
+	}
+	spireg_write(a3700_spi, A3700_SPI_IF_ADDR_REG, val);
+}
+
+static int a3700_is_wfifo_full(struct a3700_spi *a3700_spi)
+{
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CTRL_REG);
+	return (val & A3700_SPI_WFIFO_FULL);
+}
+
+static int a3700_spi_fifo_write(struct a3700_spi *a3700_spi)
+{
+	u32 val;
+	int i = 0;
+
+	while (!a3700_is_wfifo_full(a3700_spi) && a3700_spi->buf_len) {
+		val = 0;
+		if (a3700_spi->buf_len >= 4) {
+			val = cpu_to_le32(*(u32 *)a3700_spi->tx_buf);
+			spireg_write(a3700_spi, A3700_SPI_DATA_OUT_REG, val);
+
+			a3700_spi->buf_len -= 4;
+			a3700_spi->tx_buf += 4;
+		} else {
+			/*
+			 * If the remained buffer length is less than 4-bytes,
+			 * we should pad the write buffer with all ones. So that
+			 * it avoids overwrite the unexpected bytes following
+			 * the last one.
+			 */
+			val = GENMASK(31, 0);
+			while (a3700_spi->buf_len) {
+				val &= ~(0xff << (8 * i));
+				val |= *a3700_spi->tx_buf++ << (8 * i);
+				i++;
+				a3700_spi->buf_len--;
+
+				spireg_write(a3700_spi, A3700_SPI_DATA_OUT_REG,
+					     val);
+			}
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int a3700_is_rfifo_empty(struct a3700_spi *a3700_spi)
+{
+	u32 val = spireg_read(a3700_spi, A3700_SPI_IF_CTRL_REG);
+
+	return (val & A3700_SPI_RFIFO_EMPTY);
+}
+
+static int a3700_spi_fifo_read(struct a3700_spi *a3700_spi)
+{
+	u32 val;
+
+	while (!a3700_is_rfifo_empty(a3700_spi) && a3700_spi->buf_len) {
+		val = spireg_read(a3700_spi, A3700_SPI_DATA_IN_REG);
+		if (a3700_spi->buf_len >= 4) {
+			u32 data = le32_to_cpu(val);
+			memcpy(a3700_spi->rx_buf, &data, 4);
+
+			a3700_spi->buf_len -= 4;
+			a3700_spi->rx_buf += 4;
+		} else {
+			/*
+			 * When remain bytes is not larger than 4, we should
+			 * avoid memory overwriting and just write the left rx
+			 * buffer bytes.
+			 */
+			while (a3700_spi->buf_len) {
+				*a3700_spi->rx_buf = val & 0xff;
+				val >>= 8;
+
+				a3700_spi->buf_len--;
+				a3700_spi->rx_buf++;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static void a3700_spi_transfer_abort_fifo(struct a3700_spi *a3700_spi)
+{
+	int timeout = A3700_SPI_TIMEOUT;
+	u32 val;
+
+	val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+	val |= A3700_SPI_XFER_STOP;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+	while (--timeout) {
+		val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+		if (!(val & A3700_SPI_XFER_START))
+			break;
+		udelay(1);
+	}
+
+	a3700_spi_fifo_flush(a3700_spi);
+
+	val &= ~A3700_SPI_XFER_STOP;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+}
+
+static int a3700_spi_prepare_message(struct spi_master *master,
+				     struct spi_message *message)
+{
+	struct a3700_spi *a3700_spi = spi_master_get_devdata(master);
+	struct spi_device *spi = message->spi;
+	int ret;
+
+	ret = clk_enable(a3700_spi->clk);
+	if (ret) {
+		dev_err(&spi->dev, "failed to enable clk with error %d\n", ret);
+		return ret;
+	}
+
+	/* Flush the FIFOs */
+	ret = a3700_spi_fifo_flush(a3700_spi);
+	if (ret)
+		return ret;
+
+	a3700_spi_bytelen_set(a3700_spi, 4);
+
+	return 0;
+}
+
+static int a3700_spi_transfer_one(struct spi_master *master,
+				  struct spi_device *spi,
+				  struct spi_transfer *xfer)
+{
+	struct a3700_spi *a3700_spi = spi_master_get_devdata(master);
+	int ret = 0, timeout = A3700_SPI_TIMEOUT;
+	unsigned int nbits = 0;
+	u32 val;
+
+	a3700_spi_transfer_setup(spi, xfer);
+
+	a3700_spi->tx_buf  = xfer->tx_buf;
+	a3700_spi->rx_buf  = xfer->rx_buf;
+	a3700_spi->buf_len = xfer->len;
+
+	/* SPI transfer headers */
+	a3700_spi_header_set(a3700_spi);
+
+	if (xfer->tx_buf)
+		nbits = xfer->tx_nbits;
+	else if (xfer->rx_buf)
+		nbits = xfer->rx_nbits;
+
+	a3700_spi_pin_mode_set(a3700_spi, nbits);
+
+	if (xfer->rx_buf) {
+		/* Set read data length */
+		spireg_write(a3700_spi, A3700_SPI_IF_DIN_CNT_REG,
+			     a3700_spi->buf_len);
+		/* Start READ transfer */
+		val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+		val &= ~A3700_SPI_RW_EN;
+		val |= A3700_SPI_XFER_START;
+		spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+	} else if (xfer->tx_buf) {
+		/* Start Write transfer */
+		val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+		val |= (A3700_SPI_XFER_START | A3700_SPI_RW_EN);
+		spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+
+		/*
+		 * If there are data to be written to the SPI device, xmit_data
+		 * flag is set true; otherwise the instruction in SPI_INSTR does
+		 * not require data to be written to the SPI device, then
+		 * xmit_data flag is set false.
+		 */
+		a3700_spi->xmit_data = (a3700_spi->buf_len != 0);
+	}
+
+	while (a3700_spi->buf_len) {
+		if (a3700_spi->tx_buf) {
+			/* Wait wfifo ready */
+			if (!a3700_spi_transfer_wait(spi,
+						     A3700_SPI_WFIFO_RDY)) {
+				dev_err(&spi->dev,
+					"wait wfifo ready timed out\n");
+				ret = -ETIMEDOUT;
+				goto error;
+			}
+			/* Fill up the wfifo */
+			ret = a3700_spi_fifo_write(a3700_spi);
+			if (ret)
+				goto error;
+		} else if (a3700_spi->rx_buf) {
+			/* Wait rfifo ready */
+			if (!a3700_spi_transfer_wait(spi,
+						     A3700_SPI_RFIFO_RDY)) {
+				dev_err(&spi->dev,
+					"wait rfifo ready timed out\n");
+				ret = -ETIMEDOUT;
+				goto error;
+			}
+			/* Drain out the rfifo */
+			ret = a3700_spi_fifo_read(a3700_spi);
+			if (ret)
+				goto error;
+		}
+	}
+
+	/*
+	 * Stop a write transfer in fifo mode:
+	 *	- wait all the bytes in wfifo to be shifted out
+	 *	 - set XFER_STOP bit
+	 *	- wait XFER_START bit clear
+	 *	- clear XFER_STOP bit
+	 * Stop a read transfer in fifo mode:
+	 *	- the hardware is to reset the XFER_START bit
+	 *	   after the number of bytes indicated in DIN_CNT
+	 *	   register
+	 *	- just wait XFER_START bit clear
+	 */
+	if (a3700_spi->tx_buf) {
+		if (a3700_spi->xmit_data) {
+			/*
+			 * If there are data written to the SPI device, wait
+			 * until SPI_WFIFO_EMPTY is 1 to wait for all data to
+			 * transfer out of write FIFO.
+			 */
+			if (!a3700_spi_transfer_wait(spi,
+						     A3700_SPI_WFIFO_EMPTY)) {
+				dev_err(&spi->dev, "wait wfifo empty timed out\n");
+				return -ETIMEDOUT;
+			}
+		} else {
+			/*
+			 * If the instruction in SPI_INSTR does not require data
+			 * to be written to the SPI device, wait until SPI_RDY
+			 * is 1 for the SPI interface to be in idle.
+			 */
+			if (!a3700_spi_transfer_wait(spi, A3700_SPI_XFER_RDY)) {
+				dev_err(&spi->dev, "wait xfer ready timed out\n");
+				return -ETIMEDOUT;
+			}
+		}
+
+		val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+		val |= A3700_SPI_XFER_STOP;
+		spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+	}
+
+	while (--timeout) {
+		val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
+		if (!(val & A3700_SPI_XFER_START))
+			break;
+		udelay(1);
+	}
+
+	if (timeout == 0) {
+		dev_err(&spi->dev, "wait transfer start clear timed out\n");
+		ret = -ETIMEDOUT;
+		goto error;
+	}
+
+	val &= ~A3700_SPI_XFER_STOP;
+	spireg_write(a3700_spi, A3700_SPI_IF_CFG_REG, val);
+	goto out;
+
+error:
+	a3700_spi_transfer_abort_fifo(a3700_spi);
+out:
+	spi_finalize_current_transfer(master);
+
+	return ret;
+}
+
+static int a3700_spi_unprepare_message(struct spi_master *master,
+				       struct spi_message *message)
+{
+	struct a3700_spi *a3700_spi = spi_master_get_devdata(master);
+
+	clk_disable(a3700_spi->clk);
+
+	return 0;
+}
+
+static const struct of_device_id a3700_spi_dt_ids[] = {
+	{ .compatible = "marvell,armada-3700-spi", .data = NULL },
+	{},
+};
+
+MODULE_DEVICE_TABLE(of, a3700_spi_dt_ids);
+
+static int a3700_spi_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct device_node *of_node = dev->of_node;
+	struct resource *res;
+	struct spi_master *master;
+	struct a3700_spi *spi;
+	u32 num_cs = 0;
+	int ret = 0;
+
+	master = spi_alloc_master(dev, sizeof(*spi));
+	if (!master) {
+		dev_err(dev, "master allocation failed\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (of_property_read_u32(of_node, "num-cs", &num_cs)) {
+		dev_err(dev, "could not find num-cs\n");
+		ret = -ENXIO;
+		goto error;
+	}
+
+	master->bus_num = pdev->id;
+	master->dev.of_node = of_node;
+	master->mode_bits = SPI_MODE_3;
+	master->num_chipselect = num_cs;
+	master->bits_per_word_mask = SPI_BPW_MASK(8) | SPI_BPW_MASK(32);
+	master->prepare_message =  a3700_spi_prepare_message;
+	master->transfer_one = a3700_spi_transfer_one;
+	master->unprepare_message = a3700_spi_unprepare_message;
+	master->set_cs = a3700_spi_set_cs;
+	master->flags = SPI_MASTER_HALF_DUPLEX;
+	master->mode_bits |= (SPI_RX_DUAL | SPI_RX_DUAL |
+			      SPI_RX_QUAD | SPI_TX_QUAD);
+
+	platform_set_drvdata(pdev, master);
+
+	spi = spi_master_get_devdata(master);
+	memset(spi, 0, sizeof(struct a3700_spi));
+
+	spi->master = master;
+	spi->instr_cnt = A3700_INSTR_CNT;
+	spi->addr_cnt = A3700_ADDR_CNT;
+	spi->hdr_cnt = A3700_INSTR_CNT + A3700_ADDR_CNT +
+		       A3700_DUMMY_CNT;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	spi->base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(spi->base)) {
+		ret = PTR_ERR(spi->base);
+		goto error;
+	}
+
+	spi->irq = platform_get_irq(pdev, 0);
+	if (spi->irq < 0) {
+		dev_err(dev, "could not get irq: %d\n", spi->irq);
+		ret = -ENXIO;
+		goto error;
+	}
+
+	init_completion(&spi->done);
+
+	spi->clk = devm_clk_get(dev, NULL);
+	if (IS_ERR(spi->clk)) {
+		dev_err(dev, "could not find clk: %ld\n", PTR_ERR(spi->clk));
+		goto error;
+	}
+
+	ret = clk_prepare(spi->clk);
+	if (ret) {
+		dev_err(dev, "could not prepare clk: %d\n", ret);
+		goto error;
+	}
+
+	ret = a3700_spi_init(spi);
+	if (ret)
+		goto error_clk;
+
+	ret = devm_request_irq(dev, spi->irq, a3700_spi_interrupt, 0,
+			       dev_name(dev), master);
+	if (ret) {
+		dev_err(dev, "could not request IRQ: %d\n", ret);
+		goto error_clk;
+	}
+
+	ret = devm_spi_register_master(dev, master);
+	if (ret) {
+		dev_err(dev, "Failed to register master\n");
+		goto error_clk;
+	}
+
+	return 0;
+
+error_clk:
+	clk_disable_unprepare(spi->clk);
+error:
+	spi_master_put(master);
+out:
+	return ret;
+}
+
+static int a3700_spi_remove(struct platform_device *pdev)
+{
+	struct spi_master *master = platform_get_drvdata(pdev);
+	struct a3700_spi *spi = spi_master_get_devdata(master);
+
+	clk_unprepare(spi->clk);
+	spi_master_put(master);
+
+	return 0;
+}
+
+static struct platform_driver a3700_spi_driver = {
+	.driver = {
+		.name	= DRIVER_NAME,
+		.owner	= THIS_MODULE,
+		.of_match_table = of_match_ptr(a3700_spi_dt_ids),
+	},
+	.probe		= a3700_spi_probe,
+	.remove		= a3700_spi_remove,
+};
+
+module_platform_driver(a3700_spi_driver);
+
+MODULE_DESCRIPTION("Armada-3700 SPI driver");
+MODULE_AUTHOR("Wilson Ding <dingwei@marvell.com>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:" DRIVER_NAME);
-- 
2.9.3

^ permalink raw reply related

* [PATCH v4 0/4] Add support for the Armada 3700 SPI controller
From: Romain Perier @ 2016-12-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

The Marvell Armada 3700 SoC includes an SPI controller. This controller
supports up to 4 SPI slave devices, with dedicated chip selects, CPIO or
FIFO mode with DMA or CPU transfers and different SPI transfer modes
(Standard single, Dual or Quad).

This set of patches adds a basic support for the FIFO mode, (CPU-side
only, DMA not supported yet). It also adds the required definitions of
the spi nodes to the devicetree.

Romain Perier (4):
  spi: Add support for Armada 3700 SPI Controller
  spi: armada-3700: Add documentation for the Armada 3700 SPI Controller
  arm64: dts: marvell: Add definition of SPI controller for Armada 3700
  arm64: dts: marvell: Enable spi0 on the board Armada-3720-db

 .../devicetree/bindings/spi/spi-armada-3700.txt    |  25 +
 arch/arm64/boot/dts/marvell/armada-3720-db.dts     |  30 +
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi       |  11 +
 drivers/spi/Kconfig                                |   7 +
 drivers/spi/Makefile                               |   1 +
 drivers/spi/spi-armada-3700.c                      | 923 +++++++++++++++++++++
 6 files changed, 997 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/spi/spi-armada-3700.txt
 create mode 100644 drivers/spi/spi-armada-3700.c

-- 
2.9.3

^ permalink raw reply

* [PATCH 2/2] ASoC: zte: spdif: correct ZX_SPDIF_CLK_RAT define
From: Jun Nie @ 2016-12-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481186655-8213-2-git-send-email-shawnguo@kernel.org>

2016-12-08 16:44 GMT+08:00 Shawn Guo <shawnguo@kernel.org>:
> From: Shawn Guo <shawn.guo@linaro.org>
>
> The macro ZX_SPDIF_CLK_RAT should be 2 instead of 4.  With this
> fix, we can get correct audio output on HDMI through SPDIF interface.
>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> ---
>  sound/soc/zte/zx-spdif.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sound/soc/zte/zx-spdif.c b/sound/soc/zte/zx-spdif.c
> index 26265ce4caca..9fa6463ce5d7 100644
> --- a/sound/soc/zte/zx-spdif.c
> +++ b/sound/soc/zte/zx-spdif.c
> @@ -71,7 +71,7 @@
>  #define ZX_VALID_RIGHT_TRACK           (2 << 0)
>  #define ZX_VALID_TRACK_MASK            (3 << 0)
>
> -#define ZX_SPDIF_CLK_RAT               (4 * 32)
> +#define ZX_SPDIF_CLK_RAT               (2 * 32)
>
>  struct zx_spdif_info {
>         struct snd_dmaengine_dai_dma_data       dma_data;
> --
> 1.9.1
>

Acked-by: Jun Nie <jun.nie@linaro.org>

^ permalink raw reply

* [PATCH 2/2] clk: zte: add audio clocks for zx296718
From: Jun Nie @ 2016-12-08 14:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481189157-8995-2-git-send-email-shawnguo@kernel.org>

2016-12-08 17:25 GMT+08:00 Shawn Guo <shawnguo@kernel.org>:
> From: Jun Nie <jun.nie@linaro.org>
>
> The audio related clock support is missing from the existing zx296718
> clock driver.  Let's add it, so that the upstream ZX SPDIF driver can
> work for HDMI audio support.
>
> Signed-off-by: Jun Nie <jun.nie@linaro.org>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> ---
>  drivers/clk/zte/clk-zx296718.c | 150 +++++++++++++++++++++++++++++++++++++++++
>  drivers/clk/zte/clk.c          | 149 ++++++++++++++++++++++++++++++++++++++++
>  drivers/clk/zte/clk.h          |  28 ++++++++
>  3 files changed, 327 insertions(+)
>
> diff --git a/drivers/clk/zte/clk-zx296718.c b/drivers/clk/zte/clk-zx296718.c
> index 707d62956e9b..eed8581b1b25 100644
> --- a/drivers/clk/zte/clk-zx296718.c
> +++ b/drivers/clk/zte/clk-zx296718.c
> @@ -888,10 +888,160 @@ static int __init lsp1_clocks_init(struct device_node *np)
>         return 0;
>  }
>
> +PNAME(audio_wclk_common_p) = {
> +       "audio_99m",
> +       "audio_24m",
> +};
> +
> +PNAME(audio_timer_p) = {
> +       "audio_24m",
> +       "audio_32k",
> +};
> +
> +static struct zx_clk_mux audio_mux_clk[] = {
> +       MUX(0, "i2s0_wclk_mux", audio_wclk_common_p, AUDIO_I2S0_CLK, 0, 1),
> +       MUX(0, "i2s1_wclk_mux", audio_wclk_common_p, AUDIO_I2S1_CLK, 0, 1),
> +       MUX(0, "i2s2_wclk_mux", audio_wclk_common_p, AUDIO_I2S2_CLK, 0, 1),
> +       MUX(0, "i2s3_wclk_mux", audio_wclk_common_p, AUDIO_I2S3_CLK, 0, 1),
> +       MUX(0, "i2c0_wclk_mux", audio_wclk_common_p, AUDIO_I2C0_CLK, 0, 1),
> +       MUX(0, "spdif0_wclk_mux", audio_wclk_common_p, AUDIO_SPDIF0_CLK, 0, 1),
> +       MUX(0, "spdif1_wclk_mux", audio_wclk_common_p, AUDIO_SPDIF1_CLK, 0, 1),
> +       MUX(0, "timer_wclk_mux", audio_timer_p, AUDIO_TIMER_CLK, 0, 1),
> +};
> +
> +struct zx_clk_audio_div_table i2s_wclk_div_table[] = {
> +       {2048000, 0x3000030, 0xffff5700},
> +       {4096000, 0x3000018, 0xffff2b80},
> +       {2822400, 0x3000011, 0xffff89cb},
> +       {3072000, 0x3000010, 0xffff1d00},
> +       {4096000, 0x300000c, 0xffff15c0},
> +       {5644800, 0x3000008, 0xffffc4e5},
> +       {6144000, 0x3000008, 0xffff0e80},
> +       {11289600, 0x3000004, 0xffff6273},
> +       {12288000, 0x3000004, 0xffff0740},
> +       {22579200, 0x3000002, 0xffff3139},
> +       {24576000, 0x3000002, 0xffff03a0},
> +};
> +
> +struct zx_clk_audio_div_table spdif_wclk_div_table[] = {
> +       {2822400, 0x00023, 0xffff1397},
> +       {3072000, 0x00020, 0xffff3a00},
> +       {4096000, 0x00018, 0xffff2b80},
> +       {5644800, 0x00011, 0xffff89cb},
> +       {6144000, 0x00010, 0xffff1d00},
> +       {11289600, 0x00008, 0xffffc4e5},
> +       {12288000, 0x00008, 0xffff0e80},
> +       {22579200, 0x00004, 0xffff6273},
> +       {24576000, 0x00004, 0xffff0740},
> +};

You can remove these two tables and table pointer member as I already
removed table pointer assignment in macro AUDIO_DIV in this code. I am
sorry for not cleaning code properly.

> +
> +struct clk_zx_audio_divider audio_adiv_clk[] = {
> +       AUDIO_DIV(0, "i2s0_wclk_div", "i2s0_wclk_mux", AUDIO_I2S0_DIV_CFG1, i2s_wclk_div_table),
> +       AUDIO_DIV(0, "i2s1_wclk_div", "i2s1_wclk_mux", AUDIO_I2S1_DIV_CFG1, i2s_wclk_div_table),
> +       AUDIO_DIV(0, "i2s2_wclk_div", "i2s2_wclk_mux", AUDIO_I2S2_DIV_CFG1, i2s_wclk_div_table),
> +       AUDIO_DIV(0, "i2s3_wclk_div", "i2s3_wclk_mux", AUDIO_I2S3_DIV_CFG1, i2s_wclk_div_table),
> +       AUDIO_DIV(0, "spdif0_wclk_div", "spdif0_wclk_mux", AUDIO_SPDIF0_DIV_CFG1, spdif_wclk_div_table),
> +       AUDIO_DIV(0, "spdif1_wclk_div", "spdif1_wclk_mux", AUDIO_SPDIF1_DIV_CFG1, spdif_wclk_div_table),
> +};
> +
> +struct zx_clk_div audio_div_clk[] = {
> +       DIV_T(0, "tdm_wclk_div", "audio_16m384", AUDIO_TDM_CLK, 8, 4, 0, common_div_table),
> +};
> +
> +struct zx_clk_gate audio_gate_clk[] = {
> +       GATE(AUDIO_I2S0_WCLK, "i2s0_wclk", "i2s0_wclk_div", AUDIO_I2S0_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_I2S1_WCLK, "i2s1_wclk", "i2s1_wclk_div", AUDIO_I2S1_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_I2S2_WCLK, "i2s2_wclk", "i2s2_wclk_div", AUDIO_I2S2_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_I2S3_WCLK, "i2s3_wclk", "i2s3_wclk_div", AUDIO_I2S3_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_I2C0_WCLK, "i2c0_wclk", "i2c0_wclk_mux", AUDIO_I2C0_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_SPDIF0_WCLK, "spdif0_wclk", "spdif0_wclk_div", AUDIO_SPDIF0_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_SPDIF1_WCLK, "spdif1_wclk", "spdif1_wclk_div", AUDIO_SPDIF1_CLK, 9, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_TDM_WCLK, "tdm_wclk", "tdm_wclk_div", AUDIO_TDM_CLK, 17, CLK_SET_RATE_PARENT, 0),
> +       GATE(AUDIO_TS_PCLK, "tempsensor_pclk", "clk49m5", AUDIO_TS_CLK, 1, 0, 0),
> +};
> +
> +static struct clk_hw_onecell_data audio_hw_onecell_data = {
> +       .num = AUDIO_NR_CLKS,
> +       .hws = {
> +               [AUDIO_NR_CLKS - 1] = NULL,
> +       },
> +};
> +
> +static int __init audio_clocks_init(struct device_node *np)
> +{
> +       void __iomem *reg_base;
> +       int i, ret;
> +
> +       reg_base = of_iomap(np, 0);
> +       if (!reg_base) {
> +               pr_err("%s: Unable to map audio clk base\n", __func__);
> +               return -ENXIO;
> +       }
> +
> +       for (i = 0; i < ARRAY_SIZE(audio_mux_clk); i++) {
> +               if (audio_mux_clk[i].id)
> +                       audio_hw_onecell_data.hws[audio_mux_clk[i].id] =
> +                                       &audio_mux_clk[i].mux.hw;
> +
> +               audio_mux_clk[i].mux.reg += (u64)reg_base;

Fix build test failure on 32bit system.
 audio_mux_clk[i].mux.reg +=  (uintptr_t)reg_base;

> +               ret = clk_hw_register(NULL, &audio_mux_clk[i].mux.hw);
> +               if (ret) {
> +                       pr_warn("audio clk %s init error!\n",
> +                               audio_mux_clk[i].mux.hw.init->name);
> +               }
> +       }
> +
> +       for (i = 0; i < ARRAY_SIZE(audio_adiv_clk); i++) {
> +               if (audio_adiv_clk[i].id)
> +                       audio_hw_onecell_data.hws[audio_adiv_clk[i].id] =
> +                                       &audio_adiv_clk[i].hw;
> +
> +               audio_adiv_clk[i].reg_base += (u64)reg_base;

The same to this line and below cases.

> +               ret = clk_hw_register(NULL, &audio_adiv_clk[i].hw);
> +               if (ret) {
> +                       pr_warn("audio clk %s init error!\n",
> +                               audio_adiv_clk[i].hw.init->name);
> +               }
> +       }
> +
> +       for (i = 0; i < ARRAY_SIZE(audio_div_clk); i++) {
> +               if (audio_div_clk[i].id)
> +                       audio_hw_onecell_data.hws[audio_div_clk[i].id] =
> +                                       &audio_div_clk[i].div.hw;
> +
> +               audio_div_clk[i].div.reg += (u64)reg_base;
> +               ret = clk_hw_register(NULL, &audio_div_clk[i].div.hw);
> +               if (ret) {
> +                       pr_warn("audio clk %s init error!\n",
> +                               audio_div_clk[i].div.hw.init->name);
> +               }
> +       }
> +
> +       for (i = 0; i < ARRAY_SIZE(audio_gate_clk); i++) {
> +               if (audio_gate_clk[i].id)
> +                       audio_hw_onecell_data.hws[audio_gate_clk[i].id] =
> +                                       &audio_gate_clk[i].gate.hw;
> +
> +               audio_gate_clk[i].gate.reg += (u64)reg_base;
> +               ret = clk_hw_register(NULL, &audio_gate_clk[i].gate.hw);
> +               if (ret) {
> +                       pr_warn("audio clk %s init error!\n",
> +                               audio_gate_clk[i].gate.hw.init->name);
> +               }
> +       }
> +
> +       if (of_clk_add_hw_provider(np, of_clk_hw_onecell_get, &audio_hw_onecell_data))
> +               panic("could not register clk provider\n");
> +       pr_info("audio-clk init over, nr:%d\n", AUDIO_NR_CLKS);
> +
> +       return 0;
> +}
> +
>  static const struct of_device_id zx_clkc_match_table[] = {
>         { .compatible = "zte,zx296718-topcrm", .data = &top_clocks_init },
>         { .compatible = "zte,zx296718-lsp0crm", .data = &lsp0_clocks_init },
>         { .compatible = "zte,zx296718-lsp1crm", .data = &lsp1_clocks_init },
> +       { .compatible = "zte,zx296718-audiocrm", .data = &audio_clocks_init },
>         { }
>  };
>
> diff --git a/drivers/clk/zte/clk.c b/drivers/clk/zte/clk.c
> index c4c1251bc1e7..ea97024b37aa 100644
> --- a/drivers/clk/zte/clk.c
> +++ b/drivers/clk/zte/clk.c
> @@ -9,6 +9,7 @@
>
>  #include <linux/clk-provider.h>
>  #include <linux/err.h>
> +#include <linux/gcd.h>
>  #include <linux/io.h>
>  #include <linux/iopoll.h>
>  #include <linux/slab.h>
> @@ -310,3 +311,151 @@ struct clk *clk_register_zx_audio(const char *name,
>
>         return clk;
>  }
> +
> +#define CLK_AUDIO_DIV_FRAC     BIT(0)
> +#define CLK_AUDIO_DIV_INT      BIT(1)
> +#define CLK_AUDIO_DIV_UNCOMMON BIT(1)
> +
> +#define CLK_AUDIO_DIV_FRAC_NSHIFT      16
> +#define CLK_AUDIO_DIV_INT_FRAC_RE      BIT(16)
> +#define CLK_AUDIO_DIV_INT_FRAC_MAX     (0xffff)
> +#define CLK_AUDIO_DIV_INT_FRAC_MIN     (0x2)
> +#define CLK_AUDIO_DIV_INT_INT_SHIFT    24
> +#define CLK_AUDIO_DIV_INT_INT_WIDTH    4
> +
> +#define to_clk_zx_audio_div(_hw) container_of(_hw, struct clk_zx_audio_divider, hw)
> +
> +static unsigned long audio_calc_rate(struct clk_zx_audio_divider *audio_div,
> +                                    u32 reg_frac, u32 reg_int,
> +                                    unsigned long parent_rate)
> +{
> +       unsigned long rate, m, n;
> +
> +       if (audio_div->table) {
> +               const struct zx_clk_audio_div_table *divt = audio_div->table;
> +
> +               for (; divt->rate; divt++) {
> +                       if ((divt->int_reg == reg_int) && (divt->frac_reg == reg_frac))
> +                               return divt->rate;
> +               }
> +       }
> +       if (audio_div->table)
> +               pr_warn("cannot found the config(int_reg:0x%x, frac_reg:0x%x) in table, we will caculate it\n",
> +                       reg_int, reg_frac);

Logic of register value table can be removed now.

> +
> +       m = reg_frac & 0xffff;
> +       n = (reg_frac >> 16) & 0xffff;
> +
> +       m = (reg_int & 0xffff) * n + m;
> +       rate = (parent_rate * n) / m;
> +
> +       return rate;
> +}
> +
> +static void audio_calc_reg(struct clk_zx_audio_divider *audio_div,
> +                          struct zx_clk_audio_div_table *div_table,
> +                          unsigned long rate, unsigned long parent_rate)
> +{
> +       unsigned int reg_int, reg_frac;
> +       unsigned long m, n, div;
> +
> +       if (audio_div->table) {
> +               const struct zx_clk_audio_div_table *divt = audio_div->table;
> +
> +               for (; divt->rate; divt++) {
> +                       if (divt->rate == rate) {
> +                               div_table->rate = divt->rate;
> +                               div_table->int_reg = divt->int_reg;
> +                               div_table->frac_reg = divt->frac_reg;
> +                               return;
> +                       }
> +               }
> +       }
> +       if (audio_div->table)
> +               pr_warn("cannot found the rate(%ld) in table, we will caculate the config\n",
> +                       rate);

Table is not used here actually neither.

> +
> +       reg_int = parent_rate / rate;
> +
> +       if (reg_int > CLK_AUDIO_DIV_INT_FRAC_MAX)
> +               reg_int = CLK_AUDIO_DIV_INT_FRAC_MAX;
> +       else if (reg_int < CLK_AUDIO_DIV_INT_FRAC_MIN)
> +               reg_int = 0;
> +       m = parent_rate - rate * reg_int;
> +       n = rate;
> +
> +       div = gcd(m, n);
> +       m = m / div;
> +       n = n / div;
> +
> +       if ((m >> 16) || (n >> 16)) {
> +               if (m > n) {
> +                       n = n * 0xffff / m;
> +                       m = 0xffff;
> +               } else {
> +                       m = m * 0xffff / n;
> +                       n = 0xffff;
> +               }
> +       }
> +       reg_frac = m | (n << 16);
> +
> +       div_table->rate = (ulong)(parent_rate * n) / ((ulong)reg_int * n + m);
> +       div_table->int_reg = reg_int;
> +       div_table->frac_reg = reg_frac;
> +}
> +
> +static unsigned long zx_audio_div_recalc_rate(struct clk_hw *hw,
> +                                         unsigned long parent_rate)
> +{
> +       struct clk_zx_audio_divider *zx_audio_div = to_clk_zx_audio_div(hw);
> +       u32 reg_frac, reg_int;
> +
> +       reg_frac = readl_relaxed(zx_audio_div->reg_base);
> +       reg_int = readl_relaxed(zx_audio_div->reg_base + 0x4);
> +
> +       return audio_calc_rate(zx_audio_div, reg_frac, reg_int, parent_rate);
> +}
> +
> +static long zx_audio_div_round_rate(struct clk_hw *hw, unsigned long rate,
> +                               unsigned long *prate)
> +{
> +       struct clk_zx_audio_divider *zx_audio_div = to_clk_zx_audio_div(hw);
> +       struct zx_clk_audio_div_table divt;
> +
> +       audio_calc_reg(zx_audio_div, &divt, rate, *prate);
> +
> +       return audio_calc_rate(zx_audio_div, divt.frac_reg, divt.int_reg, *prate);
> +}
> +
> +static int zx_audio_div_set_rate(struct clk_hw *hw, unsigned long rate,
> +                                   unsigned long parent_rate)
> +{
> +       struct clk_zx_audio_divider *zx_audio_div = to_clk_zx_audio_div(hw);
> +       struct zx_clk_audio_div_table divt;
> +       unsigned int val;
> +
> +       audio_calc_reg(zx_audio_div, &divt, rate, parent_rate);
> +       if (divt.rate != rate)
> +               pr_info("the real rate is:%ld", divt.rate);
> +
> +       writel_relaxed(divt.frac_reg, zx_audio_div->reg_base);
> +
> +       val = readl_relaxed(zx_audio_div->reg_base + 0x4);
> +       val &= ~0xffff;
> +       val |= divt.int_reg | CLK_AUDIO_DIV_INT_FRAC_RE;
> +       writel_relaxed(val, zx_audio_div->reg_base + 0x4);
> +
> +       mdelay(1);
> +
> +       val = readl_relaxed(zx_audio_div->reg_base + 0x4);
> +       val &= ~CLK_AUDIO_DIV_INT_FRAC_RE;
> +       writel_relaxed(val, zx_audio_div->reg_base + 0x4);
> +
> +       return 0;
> +}
> +
> +const struct clk_ops zx_audio_div_ops = {
> +       .recalc_rate = zx_audio_div_recalc_rate,
> +       .round_rate = zx_audio_div_round_rate,
> +       .set_rate = zx_audio_div_set_rate,
> +};
> diff --git a/drivers/clk/zte/clk.h b/drivers/clk/zte/clk.h
> index 0df3474b2cf3..6e7ccb752c24 100644
> --- a/drivers/clk/zte/clk.h
> +++ b/drivers/clk/zte/clk.h
> @@ -153,6 +153,32 @@ struct zx_clk_div {
>         .id = _id,                                                      \
>  }
>
> +struct zx_clk_audio_div_table {
> +       unsigned long rate;
> +       unsigned int int_reg;
> +       unsigned int frac_reg;
> +};
> +
> +struct clk_zx_audio_divider {
> +       struct clk_hw                           hw;
> +       void __iomem                            *reg_base;
> +       const struct zx_clk_audio_div_table     *table;
> +       unsigned int                            rate_count;
> +       spinlock_t                              *lock;
> +       u16                                     id;
> +};
> +
> +#define AUDIO_DIV(_id, _name, _parent, _reg, _table)                   \

Remove unused table here.

> +{                                                                      \
> +       .reg_base       = (void __iomem *) _reg,                        \
> +       .lock           = &clk_lock,                                    \
> +       .hw.init        = CLK_HW_INIT(_name,                            \
> +                                     _parent,                          \
> +                                     &zx_audio_div_ops,                \
> +                                     0),                               \
> +       .id = _id,                                                      \
> +}
> +
>  struct clk *clk_register_zx_pll(const char *name, const char *parent_name,
>         unsigned long flags, void __iomem *reg_base,
>         const struct zx_pll_config *lookup_table, int count, spinlock_t *lock);
> @@ -167,4 +193,6 @@ struct clk *clk_register_zx_audio(const char *name,
>                                   unsigned long flags, void __iomem *reg_base);
>
>  extern const struct clk_ops zx_pll_ops;
> +extern const struct clk_ops zx_audio_div_ops;
> +
>  #endif
> --
> 1.9.1
>

^ permalink raw reply

* [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
From: Andrei.Pistirica at microchip.com @ 2016-12-08 14:41 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161207210416.GA27622@netboy>



> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran at gmail.com]
> Sent: Wednesday, December 07, 2016 11:04 PM
> To: Andrei Pistirica - M16132
> Cc: netdev at vger.kernel.org; linux-kernel at vger.kernel.org; linux-arm-
> kernel at lists.infradead.org; davem at davemloft.net;
> nicolas.ferre at atmel.com; harinikatakamlinux at gmail.com;
> harini.katakam at xilinx.com; punnaia at xilinx.com; michals at xilinx.com;
> anirudh at xilinx.com; boris.brezillon at free-electrons.com;
> alexandre.belloni at free-electrons.com; tbultel at pixelsurmer.com;
> rafalo at cadence.com
> Subject: Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
> Cadence GEM.
> 
> On Wed, Dec 07, 2016 at 08:39:09PM +0100, Richard Cochran wrote:
> > > +static s32 gem_ptp_max_adj(unsigned int f_nom) {
> > > +	u64 adj;
> > > +
> > > +	/* The 48 bits of seconds for the GEM overflows every:
> > > +	 * 2^48/(365.25 * 24 * 60 *60) =~ 8 925 512 years (~= 9 mil years),
> > > +	 * thus the maximum adjust frequency must not overflow CNS
> register:
> > > +	 *
> > > +	 * addend  = 10^9/nominal_freq
> > > +	 * adj_max = +/- addend*ppb_max/10^9
> > > +	 * max_ppb = (2^8-1)*nominal_freq-10^9
> > > +	 */
> > > +	adj = f_nom;
> > > +	adj *= 0xffff;
> > > +	adj -= 1000000000ULL;
> >
> > What is this computation, and how does it relate to the comment?

I considered the following simple equation: increment value at nominal frequency (which is 10^9/nominal frequency nsecs) + the maximum drift value (nsecs) <= maximum increment value@nominal frequency (which is 8bit:0xffff).
If maximum drift is written as function of nominal frequency and maximum ppb, then the equation above yields that the maximum ppb is: (2^8 - 1) *nominal_frequency - 10^9. The equation is also simplified by the fact that the drift is written as ppm + 16bit_fractions and the increment value is written as nsec + 16bit_fractions.

Rafal said that this value is hardcoded: 0x64E6, while Harini said: 250000000.

I need to dig into this...

> 
> I am not sure what you meant, but it sounds like you are on the wrong track.
> Let me explain...

Thanks.

> 
> The max_adj has nothing at all to do with the width of the time register.
> Rather, it should reflect the maximum possible change in the tuning word.
> 
> For example, with a nominal 8 ns period, the tuning word is 0x80000.
> Looking at running the clock more slowly, the slowest possible word is
> 0x00001, meaning a difference of 0x7FFFF.  This implies an adjustment of
> 0x7FFFF/0x80000 or 999998092 ppb.  Running more quickly, we can already
> have 0x100000, twice as fast, or just under 2 billion ppb.
> 
> You should consider the extreme cases to determine the most limited
> (smallest) max_adj value:
> 
> Case 1 - high frequency
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> With a nominal 1 ns period, we have the nominal tuning word 0x10000.
> The smallest is 0x1 for a difference of 0xFFFF.  This corresponds to an
> adjustment of 0xFFFF/0x10000 = .9999847412109375 or 999984741 ppb.
> 
> Case 2 - low frequency
> ~~~~~~~~~~~~~~~~~~~~~~
> 
> With a nominal 255 ns period, the nominal word is 0xFF0000, the largest
> 0xFFFFFF, and the difference is 0xFFFF.  This corresponds to and adjustment
> of 0xFFFF/0xFF0000 = .0039215087890625 or 3921508 ppb.
> 
> Since 3921508 ppb is a huge adjustment, you can simply use that as a safe
> maximum, ignoring the actual input clock.
> 
> Thanks,
> Richard
> 
> 

Regards,
Andrei

^ permalink raw reply

* [PATCH V1] pinctrl:pxa:pinctrl-pxa2xx:- No need of devm functions
From: Arvind Yadav @ 2016-12-08 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

In functions pxa2xx_build_functions, the memory allocated for
'functions' is live within the function only. After the
allocation it is immediately freed with devm_kfree. There is
no need to allocate memory for 'functions' with devm function
so replace devm_kcalloc  with kcalloc and devm_kfree with kfree.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 drivers/pinctrl/pxa/pinctrl-pxa2xx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/pinctrl/pxa/pinctrl-pxa2xx.c b/drivers/pinctrl/pxa/pinctrl-pxa2xx.c
index 866aa3c..47b8e3a 100644
--- a/drivers/pinctrl/pxa/pinctrl-pxa2xx.c
+++ b/drivers/pinctrl/pxa/pinctrl-pxa2xx.c
@@ -277,7 +277,7 @@ static int pxa2xx_build_functions(struct pxa_pinctrl *pctl)
 	 * alternate function, 6 * npins is an absolute high limit of the number
 	 * of functions.
 	 */
-	functions = devm_kcalloc(pctl->dev, pctl->npins * 6,
+	functions = kcalloc(pctl->npins * 6,
 				 sizeof(*functions), GFP_KERNEL);
 	if (!functions)
 		return -ENOMEM;
@@ -289,10 +289,12 @@ static int pxa2xx_build_functions(struct pxa_pinctrl *pctl)
 	pctl->functions = devm_kmemdup(pctl->dev, functions,
 				       pctl->nfuncs * sizeof(*functions),
 				       GFP_KERNEL);
-	if (!pctl->functions)
+	if (!pctl->functions) {
+		kfree(functions);
 		return -ENOMEM;
+	}
 
-	devm_kfree(pctl->dev, functions);
+	kfree(functions);
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/2] crypto: arm/chacha20 - implement NEON version based on SSE3 code
From: Ard Biesheuvel @ 2016-12-08 14:28 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481207339-17332-1-git-send-email-ard.biesheuvel@linaro.org>

This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig              |   6 +
 arch/arm/crypto/Makefile             |   2 +
 arch/arm/crypto/chacha20-neon-core.S | 524 ++++++++++++++++++++
 arch/arm/crypto/chacha20-neon-glue.c | 136 +++++
 4 files changed, 668 insertions(+)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 13f1b4c289d4..2f3339f015d3 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -130,4 +130,10 @@ config CRYPTO_CRC32_ARM_CE
 	depends on KERNEL_MODE_NEON && CRC32
 	select CRYPTO_HASH
 
+config CRYPTO_CHACHA20_NEON
+	tristate "NEON accelerated ChaCha20 symmetric cipher"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_CHACHA20
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index b578a1820ab1..8d74e55eacd4 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -40,6 +41,7 @@ aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
new file mode 100644
index 000000000000..9f315041f521
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-core.S
@@ -0,0 +1,524 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SNEON3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.fpu		neon
+	.align		5
+
+ENTRY(chacha20_block_xor_neon)
+	// r0: Input state matrix, s
+	// r1: 1 data block output, o
+	// r2: 1 data block input, i
+
+	//
+	// This function encrypts one ChaCha20 block by loading the state matrix
+	// in four NEON registers. It performs matrix operation on four words in
+	// parallel, but requireds shuffling to rearrange the words after each
+	// round.
+	//
+
+	// x0..3 = s0..3
+	add		ip, r0, #0x20
+	vld1.32		{q0-q1}, [r0]
+	vld1.32		{q2-q3}, [ip]
+
+	vmov		q8, q0
+	vmov		q9, q1
+	vmov		q10, q2
+	vmov		q11, q3
+
+	mov		r3, #10
+
+.Ldoubleround:
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #16
+	vsri.u32	q3, q4, #16
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #12
+	vsri.u32	q1, q4, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #8
+	vsri.u32	q3, q4, #24
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #7
+	vsri.u32	q1, q4, #25
+
+	// x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+	vext.8		q1, q1, q1, #4
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vext.8		q2, q2, q2, #8
+	// x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+	vext.8		q3, q3, q3, #12
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #16
+	vsri.u32	q3, q4, #16
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #12
+	vsri.u32	q1, q4, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #8
+	vsri.u32	q3, q4, #24
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #7
+	vsri.u32	q1, q4, #25
+
+	// x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+	vext.8		q1, q1, q1, #12
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vext.8		q2, q2, q2, #8
+	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+	vext.8		q3, q3, q3, #4
+
+	subs		r3, r3, #1
+	bne		.Ldoubleround
+
+	add		ip, r2, #0x20
+	vld1.8		{q4-q5}, [r2]
+	vld1.8		{q6-q7}, [ip]
+
+	// o0 = i0 ^ (x0 + s0)
+	vadd.i32	q0, q0, q8
+	veor		q0, q0, q4
+
+	// o1 = i1 ^ (x1 + s1)
+	vadd.i32	q1, q1, q9
+	veor		q1, q1, q5
+
+	// o2 = i2 ^ (x2 + s2)
+	vadd.i32	q2, q2, q10
+	veor		q2, q2, q6
+
+	// o3 = i3 ^ (x3 + s3)
+	vadd.i32	q3, q3, q11
+	veor		q3, q3, q7
+
+	add		ip, r1, #0x20
+	vst1.8		{q0-q1}, [r1]
+	vst1.8		{q2-q3}, [ip]
+
+	bx		lr
+ENDPROC(chacha20_block_xor_neon)
+
+	.align		5
+ENTRY(chacha20_4block_xor_neon)
+	push		{r4-r6, lr}
+	mov		ip, sp			// preserve the stack pointer
+	sub		r3, sp, #0x20		// allocate a 32 byte buffer
+	bic		r3, r3, #0x1f		// aligned to 32 bytes
+	mov		sp, r3
+
+	// r0: Input state matrix, s
+	// r1: 4 data blocks output, o
+	// r2: 4 data blocks input, i
+
+	//
+	// This function encrypts four consecutive ChaCha20 blocks by loading
+	// the state matrix in NEON registers four times. The algorithm performs
+	// each operation on the corresponding word of each state matrix, hence
+	// requires no word shuffling. For final XORing step we transpose the
+	// matrix by interleaving 32- and then 64-bit words, which allows us to
+	// do XOR in NEON registers.
+	//
+
+	// x0..15[0-3] = s0..3[0..3]
+	add		r3, r0, #0x20
+	vld1.32		{q0-q1}, [r0]
+	vld1.32		{q2-q3}, [r3]
+
+	adr		r3, CTRINC
+	vdup.32		q15, d7[1]
+	vdup.32		q14, d7[0]
+	vld1.32		{q11}, [r3, :128]
+	vdup.32		q13, d6[1]
+	vdup.32		q12, d6[0]
+	vadd.i32	q12, q12, q11		// x12 += counter values 0-3
+	vdup.32		q11, d5[1]
+	vdup.32		q10, d5[0]
+	vdup.32		q9, d4[1]
+	vdup.32		q8, d4[0]
+	vdup.32		q7, d3[1]
+	vdup.32		q6, d3[0]
+	vdup.32		q5, d2[1]
+	vdup.32		q4, d2[0]
+	vdup.32		q3, d1[1]
+	vdup.32		q2, d1[0]
+	vdup.32		q1, d0[1]
+	vdup.32		q0, d0[0]
+
+	mov		r3, #10
+
+.Ldoubleround4:
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+	vadd.i32	q0, q0, q4
+	vadd.i32	q1, q1, q5
+	vadd.i32	q2, q2, q6
+	vadd.i32	q3, q3, q7
+
+	veor		q12, q12, q0
+	veor		q13, q13, q1
+	veor		q14, q14, q2
+	veor		q15, q15, q3
+
+	vrev32.16	q12, q12
+	vrev32.16	q13, q13
+	vrev32.16	q14, q14
+	vrev32.16	q15, q15
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+	vadd.i32	q8, q8, q12
+	vadd.i32	q9, q9, q13
+	vadd.i32	q10, q10, q14
+	vadd.i32	q11, q11, q15
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q4, q8
+	veor		q9, q5, q9
+	vshl.u32	q4, q8, #12
+	vshl.u32	q5, q9, #12
+	vsri.u32	q4, q8, #20
+	vsri.u32	q5, q9, #20
+
+	veor		q8, q6, q10
+	veor		q9, q7, q11
+	vshl.u32	q6, q8, #12
+	vshl.u32	q7, q9, #12
+	vsri.u32	q6, q8, #20
+	vsri.u32	q7, q9, #20
+
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+	vadd.i32	q0, q0, q4
+	vadd.i32	q1, q1, q5
+	vadd.i32	q2, q2, q6
+	vadd.i32	q3, q3, q7
+
+	veor		q8, q12, q0
+	veor		q9, q13, q1
+	vshl.u32	q12, q8, #8
+	vshl.u32	q13, q9, #8
+	vsri.u32	q12, q8, #24
+	vsri.u32	q13, q9, #24
+
+	veor		q8, q14, q2
+	veor		q9, q15, q3
+	vshl.u32	q14, q8, #8
+	vshl.u32	q15, q9, #8
+	vsri.u32	q14, q8, #24
+	vsri.u32	q15, q9, #24
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+	vadd.i32	q8, q8, q12
+	vadd.i32	q9, q9, q13
+	vadd.i32	q10, q10, q14
+	vadd.i32	q11, q11, q15
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q4, q8
+	veor		q9, q5, q9
+	vshl.u32	q4, q8, #7
+	vshl.u32	q5, q9, #7
+	vsri.u32	q4, q8, #25
+	vsri.u32	q5, q9, #25
+
+	veor		q8, q6, q10
+	veor		q9, q7, q11
+	vshl.u32	q6, q8, #7
+	vshl.u32	q7, q9, #7
+	vsri.u32	q6, q8, #25
+	vsri.u32	q7, q9, #25
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+	vadd.i32	q0, q0, q5
+	vadd.i32	q1, q1, q6
+	vadd.i32	q2, q2, q7
+	vadd.i32	q3, q3, q4
+
+	veor		q15, q15, q0
+	veor		q12, q12, q1
+	veor		q13, q13, q2
+	veor		q14, q14, q3
+
+	vrev32.16	q15, q15
+	vrev32.16	q12, q12
+	vrev32.16	q13, q13
+	vrev32.16	q14, q14
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+	vadd.i32	q10, q10, q15
+	vadd.i32	q11, q11, q12
+	vadd.i32	q8, q8, q13
+	vadd.i32	q9, q9, q14
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q7, q8
+	veor		q9, q4, q9
+	vshl.u32	q7, q8, #12
+	vshl.u32	q4, q9, #12
+	vsri.u32	q7, q8, #20
+	vsri.u32	q4, q9, #20
+
+	veor		q8, q5, q10
+	veor		q9, q6, q11
+	vshl.u32	q5, q8, #12
+	vshl.u32	q6, q9, #12
+	vsri.u32	q5, q8, #20
+	vsri.u32	q6, q9, #20
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+	vadd.i32	q0, q0, q5
+	vadd.i32	q1, q1, q6
+	vadd.i32	q2, q2, q7
+	vadd.i32	q3, q3, q4
+
+	veor		q8, q15, q0
+	veor		q9, q12, q1
+	vshl.u32	q15, q8, #8
+	vshl.u32	q12, q9, #8
+	vsri.u32	q15, q8, #24
+	vsri.u32	q12, q9, #24
+
+	veor		q8, q13, q2
+	veor		q9, q14, q3
+	vshl.u32	q13, q8, #8
+	vshl.u32	q14, q9, #8
+	vsri.u32	q13, q8, #24
+	vsri.u32	q14, q9, #24
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+	vadd.i32	q10, q10, q15
+	vadd.i32	q11, q11, q12
+	vadd.i32	q8, q8, q13
+	vadd.i32	q9, q9, q14
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q7, q8
+	veor		q9, q4, q9
+	vshl.u32	q7, q8, #7
+	vshl.u32	q4, q9, #7
+	vsri.u32	q7, q8, #25
+	vsri.u32	q4, q9, #25
+
+	veor		q8, q5, q10
+	veor		q9, q6, q11
+	vshl.u32	q5, q8, #7
+	vshl.u32	q6, q9, #7
+	vsri.u32	q5, q8, #25
+	vsri.u32	q6, q9, #25
+
+	subs		r3, r3, #1
+	beq		0f
+
+	vld1.32		{q8-q9}, [sp, :256]
+	b		.Ldoubleround4
+
+	// x0[0-3] += s0[0]
+	// x1[0-3] += s0[1]
+	// x2[0-3] += s0[2]
+	// x3[0-3] += s0[3]
+0:	ldmia		r0!, {r3-r6}
+	vdup.32		q8, r3
+	vdup.32		q9, r4
+	vadd.i32	q0, q0, q8
+	vadd.i32	q1, q1, q9
+	vdup.32		q8, r5
+	vdup.32		q9, r6
+	vadd.i32	q2, q2, q8
+	vadd.i32	q3, q3, q9
+
+	// x4[0-3] += s1[0]
+	// x5[0-3] += s1[1]
+	// x6[0-3] += s1[2]
+	// x7[0-3] += s1[3]
+	ldmia		r0!, {r3-r6}
+	vdup.32		q8, r3
+	vdup.32		q9, r4
+	vadd.i32	q4, q4, q8
+	vadd.i32	q5, q5, q9
+	vdup.32		q8, r5
+	vdup.32		q9, r6
+	vadd.i32	q6, q6, q8
+	vadd.i32	q7, q7, q9
+
+	// interleave 32-bit words in state n, n+1
+	vzip.32		q0, q1
+	vzip.32		q2, q3
+	vzip.32		q4, q5
+	vzip.32		q6, q7
+
+	// interleave 64-bit words in state n, n+2
+	vswp		d1, d4
+	vswp		d3, d6
+	vswp		d9, d12
+	vswp		d11, d14
+
+	// xor with corresponding input, write to output
+	vld1.8		{q8-q9}, [r2]!
+	veor		q8, q8, q0
+	veor		q9, q9, q4
+	vst1.8		{q8-q9}, [r1]!
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x8[0-3] += s2[0]
+	// x9[0-3] += s2[1]
+	// x10[0-3] += s2[2]
+	// x11[0-3] += s2[3]
+	ldmia		r0!, {r3-r6}
+	vdup.32		q0, r3
+	vdup.32		q4, r4
+	vadd.i32	q8, q8, q0
+	vadd.i32	q9, q9, q4
+	vdup.32		q0, r5
+	vdup.32		q4, r6
+	vadd.i32	q10, q10, q0
+	vadd.i32	q11, q11, q4
+
+	// x12[0-3] += s3[0]
+	// x13[0-3] += s3[1]
+	// x14[0-3] += s3[2]
+	// x15[0-3] += s3[3]
+	ldmia		r0!, {r3-r6}
+	vdup.32		q0, r3
+	vdup.32		q4, r4
+	adr		r3, CTRINC
+	vadd.i32	q12, q12, q0
+	vld1.32		{q0}, [r3, :128]
+	vadd.i32	q13, q13, q4
+	vadd.i32	q12, q12, q0		// x12 += counter values 0-3
+
+	vdup.32		q0, r5
+	vdup.32		q4, r6
+	vadd.i32	q14, q14, q0
+	vadd.i32	q15, q15, q4
+
+	// interleave 32-bit words in state n, n+1
+	vzip.32		q8, q9
+	vzip.32		q10, q11
+	vzip.32		q12, q13
+	vzip.32		q14, q15
+
+	// interleave 64-bit words in state n, n+2
+	vswp		d17, d20
+	vswp		d19, d22
+	vswp		d25, d28
+	vswp		d27, d30
+
+	vmov		q4, q1
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q8
+	veor		q1, q1, q12
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q2
+	veor		q1, q1, q6
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q10
+	veor		q1, q1, q14
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q4
+	veor		q1, q1, q5
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q9
+	veor		q1, q1, q13
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q3
+	veor		q1, q1, q7
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]
+	veor		q0, q0, q11
+	veor		q1, q1, q15
+	vst1.8		{q0-q1}, [r1]
+
+	mov		sp, ip
+	pop		{r4-r6, pc}
+ENDPROC(chacha20_4block_xor_neon)
+
+	.align		4
+CTRINC:	.word		0, 1, 2, 3
+
diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
new file mode 100644
index 000000000000..554f7f6069da
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-glue.c
@@ -0,0 +1,136 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+
+asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+			    unsigned int bytes)
+{
+	u8 buf[CHACHA20_BLOCK_SIZE];
+
+	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		chacha20_4block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE * 4;
+		src += CHACHA20_BLOCK_SIZE * 4;
+		dst += CHACHA20_BLOCK_SIZE * 4;
+		state[12] += 4;
+	}
+	while (bytes >= CHACHA20_BLOCK_SIZE) {
+		chacha20_block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE;
+		src += CHACHA20_BLOCK_SIZE;
+		dst += CHACHA20_BLOCK_SIZE;
+		state[12]++;
+	}
+	if (bytes) {
+		memcpy(buf, src, bytes);
+		chacha20_block_xor_neon(state, buf, buf);
+		memcpy(dst, buf, bytes);
+	}
+}
+
+static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
+			 struct scatterlist *src, unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	u32 state[16];
+	int err;
+
+	if (nbytes <= CHACHA20_BLOCK_SIZE || !may_use_simd())
+		return crypto_chacha20_crypt(desc, dst, src, nbytes);
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
+
+	crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+
+	kernel_neon_begin();
+
+	while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+				rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
+		err = blkcipher_walk_done(desc, &walk,
+					  walk.nbytes % CHACHA20_BLOCK_SIZE);
+	}
+
+	if (walk.nbytes) {
+		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+				walk.nbytes);
+		err = blkcipher_walk_done(desc, &walk, 0);
+	}
+
+	kernel_neon_end();
+
+	return err;
+}
+
+static struct crypto_alg alg = {
+	.cra_name		= "chacha20",
+	.cra_driver_name	= "chacha20-neon",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= 1,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_ctxsize		= sizeof(struct chacha20_ctx),
+	.cra_alignmask		= sizeof(u32) - 1,
+	.cra_module		= THIS_MODULE,
+	.cra_u			= {
+		.blkcipher = {
+			.min_keysize	= CHACHA20_KEY_SIZE,
+			.max_keysize	= CHACHA20_KEY_SIZE,
+			.ivsize		= CHACHA20_IV_SIZE,
+			.geniv		= "seqiv",
+			.setkey		= crypto_chacha20_setkey,
+			.encrypt	= chacha20_simd,
+			.decrypt	= chacha20_simd,
+		},
+	},
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+	if (!(elf_hwcap & HWCAP_NEON))
+		return -ENODEV;
+
+	return crypto_register_alg(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+	crypto_unregister_alg(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("chacha20");
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/2] crypto: arm64/chacha20 - implement NEON version based on SSE3 code
From: Ard Biesheuvel @ 2016-12-08 14:28 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1481207339-17332-1-git-send-email-ard.biesheuvel@linaro.org>

This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig              |   6 +
 arch/arm64/crypto/Makefile             |   3 +
 arch/arm64/crypto/chacha20-neon-core.S | 480 ++++++++++++++++++++
 arch/arm64/crypto/chacha20-neon-glue.c | 131 ++++++
 4 files changed, 620 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 450a85df041a..0bf0f531f539 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -72,4 +72,10 @@ config CRYPTO_CRC32_ARM64
 	depends on ARM64
 	select CRYPTO_HASH
 
+config CRYPTO_CHACHA20_NEON
+	tristate "NEON accelerated ChaCha20 symmetric cipher"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_CHACHA20
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aa8888d7b744..9d2826c5fccf 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -41,6 +41,9 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
 sha512-arm64-y := sha512-glue.o sha512-core.o
 
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+
 AFLAGS_aes-ce.o		:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o	:= -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/chacha20-neon-core.S b/arch/arm64/crypto/chacha20-neon-core.S
new file mode 100644
index 000000000000..e2cd65580807
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-core.S
@@ -0,0 +1,480 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.align		6
+
+ENTRY(chacha20_block_xor_neon)
+	// x0: Input state matrix, s
+	// x1: 1 data block output, o
+	// x2: 1 data block input, i
+
+	//
+	// This function encrypts one ChaCha20 block by loading the state matrix
+	// in four NEON registers. It performs matrix operation on four words in
+	// parallel, but requires shuffling to rearrange the words after each
+	// round.
+	//
+
+	// x0..3 = s0..3
+	ld1		{v0.4s-v3.4s}, [x0]
+	ld1		{v8.4s-v11.4s}, [x0]
+
+	mov		x3, #10
+
+.Ldoubleround:
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v3.16b, v3.16b, v0.16b
+	rev32		v3.8h, v3.8h
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #12
+	sri		v1.4s, v4.4s, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v4.16b, v3.16b, v0.16b
+	shl		v3.4s, v4.4s, #8
+	sri		v3.4s, v4.4s, #24
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #7
+	sri		v1.4s, v4.4s, #25
+
+	// x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+	ext		v1.16b, v1.16b, v1.16b, #4
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	ext		v2.16b, v2.16b, v2.16b, #8
+	// x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+	ext		v3.16b, v3.16b, v3.16b, #12
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v3.16b, v3.16b, v0.16b
+	rev32		v3.8h, v3.8h
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #12
+	sri		v1.4s, v4.4s, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v4.16b, v3.16b, v0.16b
+	shl		v3.4s, v4.4s, #8
+	sri		v3.4s, v4.4s, #24
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #7
+	sri		v1.4s, v4.4s, #25
+
+	// x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+	ext		v1.16b, v1.16b, v1.16b, #12
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	ext		v2.16b, v2.16b, v2.16b, #8
+	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+	ext		v3.16b, v3.16b, v3.16b, #4
+
+	subs		x3, x3, #1
+	b.ne		.Ldoubleround
+
+	ld1		{v4.16b-v7.16b}, [x2]
+
+	// o0 = i0 ^ (x0 + s0)
+	add		v0.4s, v0.4s, v8.4s
+	eor		v0.16b, v0.16b, v4.16b
+
+	// o1 = i1 ^ (x1 + s1)
+	add		v1.4s, v1.4s, v9.4s
+	eor		v1.16b, v1.16b, v5.16b
+
+	// o2 = i2 ^ (x2 + s2)
+	add		v2.4s, v2.4s, v10.4s
+	eor		v2.16b, v2.16b, v6.16b
+
+	// o3 = i3 ^ (x3 + s3)
+	add		v3.4s, v3.4s, v11.4s
+	eor		v3.16b, v3.16b, v7.16b
+
+	st1		{v0.16b-v3.16b}, [x1]
+
+	ret
+ENDPROC(chacha20_block_xor_neon)
+
+	.align		6
+ENTRY(chacha20_4block_xor_neon)
+	// x0: Input state matrix, s
+	// x1: 4 data blocks output, o
+	// x2: 4 data blocks input, i
+
+	//
+	// This function encrypts four consecutive ChaCha20 blocks by loading
+	// the state matrix in NEON registers four times. The algorithm performs
+	// each operation on the corresponding word of each state matrix, hence
+	// requires no word shuffling. For final XORing step we transpose the
+	// matrix by interleaving 32- and then 64-bit words, which allows us to
+	// do XOR in NEON registers.
+	//
+	adr		x3, CTRINC
+	ld1		{v16.4s}, [x3]
+
+	// x0..15[0-3] = s0..3[0..3]
+	mov		x4, x0
+	ld4r		{ v0.4s- v3.4s}, [x4], #16
+	ld4r		{ v4.4s- v7.4s}, [x4], #16
+	ld4r		{ v8.4s-v11.4s}, [x4], #16
+	ld4r		{v12.4s-v15.4s}, [x4]
+
+	// x12 += counter values 0-3
+	add		v12.4s, v12.4s, v16.4s
+
+	mov		x3, #10
+
+.Ldoubleround4:
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+	add		v0.4s, v0.4s, v4.4s
+	add		v1.4s, v1.4s, v5.4s
+	add		v2.4s, v2.4s, v6.4s
+	add		v3.4s, v3.4s, v7.4s
+
+	eor		v12.16b, v12.16b, v0.16b
+	eor		v13.16b, v13.16b, v1.16b
+	eor		v14.16b, v14.16b, v2.16b
+	eor		v15.16b, v15.16b, v3.16b
+
+	rev32		v12.8h, v12.8h
+	rev32		v13.8h, v13.8h
+	rev32		v14.8h, v14.8h
+	rev32		v15.8h, v15.8h
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+	add		v8.4s, v8.4s, v12.4s
+	add		v9.4s, v9.4s, v13.4s
+	add		v10.4s, v10.4s, v14.4s
+	add		v11.4s, v11.4s, v15.4s
+
+	eor		v17.16b, v4.16b, v8.16b
+	eor		v18.16b, v5.16b, v9.16b
+	eor		v19.16b, v6.16b, v10.16b
+	eor		v20.16b, v7.16b, v11.16b
+
+	shl		v4.4s, v17.4s, #12
+	shl		v5.4s, v18.4s, #12
+	shl		v6.4s, v19.4s, #12
+	shl		v7.4s, v20.4s, #12
+
+	sri		v4.4s, v17.4s, #20
+	sri		v5.4s, v18.4s, #20
+	sri		v6.4s, v19.4s, #20
+	sri		v7.4s, v20.4s, #20
+
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+	add		v0.4s, v0.4s, v4.4s
+	add		v1.4s, v1.4s, v5.4s
+	add		v2.4s, v2.4s, v6.4s
+	add		v3.4s, v3.4s, v7.4s
+
+	eor		v17.16b, v12.16b, v0.16b
+	eor		v18.16b, v13.16b, v1.16b
+	eor		v19.16b, v14.16b, v2.16b
+	eor		v20.16b, v15.16b, v3.16b
+
+	shl		v12.4s, v17.4s, #8
+	shl		v13.4s, v18.4s, #8
+	shl		v14.4s, v19.4s, #8
+	shl		v15.4s, v20.4s, #8
+
+	sri		v12.4s, v17.4s, #24
+	sri		v13.4s, v18.4s, #24
+	sri		v14.4s, v19.4s, #24
+	sri		v15.4s, v20.4s, #24
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+	add		v8.4s, v8.4s, v12.4s
+	add		v9.4s, v9.4s, v13.4s
+	add		v10.4s, v10.4s, v14.4s
+	add		v11.4s, v11.4s, v15.4s
+
+	eor		v17.16b, v4.16b, v8.16b
+	eor		v18.16b, v5.16b, v9.16b
+	eor		v19.16b, v6.16b, v10.16b
+	eor		v20.16b, v7.16b, v11.16b
+
+	shl		v4.4s, v17.4s, #7
+	shl		v5.4s, v18.4s, #7
+	shl		v6.4s, v19.4s, #7
+	shl		v7.4s, v20.4s, #7
+
+	sri		v4.4s, v17.4s, #25
+	sri		v5.4s, v18.4s, #25
+	sri		v6.4s, v19.4s, #25
+	sri		v7.4s, v20.4s, #25
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+	add		v0.4s, v0.4s, v5.4s
+	add		v1.4s, v1.4s, v6.4s
+	add		v2.4s, v2.4s, v7.4s
+	add		v3.4s, v3.4s, v4.4s
+
+	eor		v15.16b, v15.16b, v0.16b
+	eor		v12.16b, v12.16b, v1.16b
+	eor		v13.16b, v13.16b, v2.16b
+	eor		v14.16b, v14.16b, v3.16b
+
+	rev32		v15.8h, v15.8h
+	rev32		v12.8h, v12.8h
+	rev32		v13.8h, v13.8h
+	rev32		v14.8h, v14.8h
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+	add		v10.4s, v10.4s, v15.4s
+	add		v11.4s, v11.4s, v12.4s
+	add		v8.4s, v8.4s, v13.4s
+	add		v9.4s, v9.4s, v14.4s
+
+	eor		v17.16b, v5.16b, v10.16b
+	eor		v18.16b, v6.16b, v11.16b
+	eor		v19.16b, v7.16b, v8.16b
+	eor		v20.16b, v4.16b, v9.16b
+
+	shl		v5.4s, v17.4s, #12
+	shl		v6.4s, v18.4s, #12
+	shl		v7.4s, v19.4s, #12
+	shl		v4.4s, v20.4s, #12
+
+	sri		v5.4s, v17.4s, #20
+	sri		v6.4s, v18.4s, #20
+	sri		v7.4s, v19.4s, #20
+	sri		v4.4s, v20.4s, #20
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+	add		v0.4s, v0.4s, v5.4s
+	add		v1.4s, v1.4s, v6.4s
+	add		v2.4s, v2.4s, v7.4s
+	add		v3.4s, v3.4s, v4.4s
+
+	eor		v17.16b, v15.16b, v0.16b
+	eor		v18.16b, v12.16b, v1.16b
+	eor		v19.16b, v13.16b, v2.16b
+	eor		v20.16b, v14.16b, v3.16b
+
+	shl		v15.4s, v17.4s, #8
+	shl		v12.4s, v18.4s, #8
+	shl		v13.4s, v19.4s, #8
+	shl		v14.4s, v20.4s, #8
+
+	sri		v15.4s, v17.4s, #24
+	sri		v12.4s, v18.4s, #24
+	sri		v13.4s, v19.4s, #24
+	sri		v14.4s, v20.4s, #24
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+	add		v10.4s, v10.4s, v15.4s
+	add		v11.4s, v11.4s, v12.4s
+	add		v8.4s, v8.4s, v13.4s
+	add		v9.4s, v9.4s, v14.4s
+
+	eor		v17.16b, v5.16b, v10.16b
+	eor		v18.16b, v6.16b, v11.16b
+	eor		v19.16b, v7.16b, v8.16b
+	eor		v20.16b, v4.16b, v9.16b
+
+	shl		v5.4s, v17.4s, #7
+	shl		v6.4s, v18.4s, #7
+	shl		v7.4s, v19.4s, #7
+	shl		v4.4s, v20.4s, #7
+
+	sri		v5.4s, v17.4s, #25
+	sri		v6.4s, v18.4s, #25
+	sri		v7.4s, v19.4s, #25
+	sri		v4.4s, v20.4s, #25
+
+	subs		x3, x3, #1
+	b.ne		.Ldoubleround4
+
+	// x0[0-3] += s0[0]
+	// x1[0-3] += s0[1]
+	// x2[0-3] += s0[2]
+	// x3[0-3] += s0[3]
+	ld4r		{v17.4s-v20.4s}, [x0], #16
+	add		v0.4s, v0.4s, v17.4s
+	add		v1.4s, v1.4s, v18.4s
+	add		v2.4s, v2.4s, v19.4s
+	add		v3.4s, v3.4s, v20.4s
+
+	// x4[0-3] += s1[0]
+	// x5[0-3] += s1[1]
+	// x6[0-3] += s1[2]
+	// x7[0-3] += s1[3]
+	ld4r		{v21.4s-v24.4s}, [x0], #16
+	add		v4.4s, v4.4s, v21.4s
+	add		v5.4s, v5.4s, v22.4s
+	add		v6.4s, v6.4s, v23.4s
+	add		v7.4s, v7.4s, v24.4s
+
+	// x8[0-3] += s2[0]
+	// x9[0-3] += s2[1]
+	// x10[0-3] += s2[2]
+	// x11[0-3] += s2[3]
+	ld4r		{v17.4s-v20.4s}, [x0], #16
+	add		v8.4s, v8.4s, v17.4s
+	add		v9.4s, v9.4s, v18.4s
+	add		v10.4s, v10.4s, v19.4s
+	add		v11.4s, v11.4s, v20.4s
+
+	// x12[0-3] += s3[0]
+	// x13[0-3] += s3[1]
+	// x14[0-3] += s3[2]
+	// x15[0-3] += s3[3]
+	ld4r		{v21.4s-v24.4s}, [x0]
+	add		v12.4s, v12.4s, v21.4s
+	add		v13.4s, v13.4s, v22.4s
+	add		v14.4s, v14.4s, v23.4s
+	add		v15.4s, v15.4s, v24.4s
+
+	// x12 += counter values 0-3
+	add		v12.4s, v12.4s, v16.4s
+
+	ld1		{v16.16b-v19.16b}, [x2], #64
+	ld1		{v20.16b-v23.16b}, [x2], #64
+
+	// interleave 32-bit words in state n, n+1
+	zip1		v24.4s, v0.4s, v1.4s
+	zip1		v25.4s, v2.4s, v3.4s
+	zip1		v26.4s, v4.4s, v5.4s
+	zip1		v27.4s, v6.4s, v7.4s
+	zip1		v28.4s, v8.4s, v9.4s
+	zip1		v29.4s, v10.4s, v11.4s
+	zip1		v30.4s, v12.4s, v13.4s
+	zip1		v31.4s, v14.4s, v15.4s
+
+	zip2		v1.4s, v0.4s, v1.4s
+	zip2		v3.4s, v2.4s, v3.4s
+	zip2		v5.4s, v4.4s, v5.4s
+	zip2		v7.4s, v6.4s, v7.4s
+	zip2		v9.4s, v8.4s, v9.4s
+	zip2		v11.4s, v10.4s, v11.4s
+	zip2		v13.4s, v12.4s, v13.4s
+	zip2		v15.4s, v14.4s, v15.4s
+
+	mov		v0.16b, v24.16b
+	mov		v2.16b, v25.16b
+	mov		v4.16b, v26.16b
+	mov		v6.16b, v27.16b
+	mov		v8.16b, v28.16b
+	mov		v10.16b, v29.16b
+	mov		v12.16b, v30.16b
+	mov		v14.16b, v31.16b
+
+	// interleave 64-bit words in state n, n+2
+	zip1		v24.2d, v0.2d, v2.2d
+	zip1		v25.2d, v1.2d, v3.2d
+	zip1		v26.2d, v4.2d, v6.2d
+	zip1		v27.2d, v5.2d, v7.2d
+	zip1		v28.2d, v8.2d, v10.2d
+	zip1		v29.2d, v9.2d, v11.2d
+	zip1		v30.2d, v12.2d, v14.2d
+	zip1		v31.2d, v13.2d, v15.2d
+
+	zip2		v2.2d, v0.2d, v2.2d
+	zip2		v3.2d, v1.2d, v3.2d
+	zip2		v6.2d, v4.2d, v6.2d
+	zip2		v7.2d, v5.2d, v7.2d
+	zip2		v10.2d, v8.2d, v10.2d
+	zip2		v11.2d, v9.2d, v11.2d
+	zip2		v14.2d, v12.2d, v14.2d
+	zip2		v15.2d, v13.2d, v15.2d
+
+	mov		v0.16b, v24.16b
+	mov		v1.16b, v25.16b
+	mov		v4.16b, v26.16b
+	mov		v5.16b, v27.16b
+
+	mov		v8.16b, v28.16b
+	mov		v9.16b, v29.16b
+	mov		v12.16b, v30.16b
+	mov		v13.16b, v31.16b
+
+	ld1		{v24.16b-v27.16b}, [x2], #64
+	ld1		{v28.16b-v31.16b}, [x2]
+
+	// xor with corresponding input, write to output
+	eor		v16.16b, v16.16b, v0.16b
+	eor		v17.16b, v17.16b, v4.16b
+	eor		v18.16b, v18.16b, v8.16b
+	eor		v19.16b, v19.16b, v12.16b
+	st1		{v16.16b-v19.16b}, [x1], #64
+
+	eor		v20.16b, v20.16b, v2.16b
+	eor		v21.16b, v21.16b, v6.16b
+	eor		v22.16b, v22.16b, v10.16b
+	eor		v23.16b, v23.16b, v14.16b
+	st1		{v20.16b-v23.16b}, [x1], #64
+
+	eor		v24.16b, v24.16b, v1.16b
+	eor		v25.16b, v25.16b, v5.16b
+	eor		v26.16b, v26.16b, v9.16b
+	eor		v27.16b, v27.16b, v13.16b
+	st1		{v24.16b-v27.16b}, [x1], #64
+
+	eor		v28.16b, v28.16b, v3.16b
+	eor		v29.16b, v29.16b, v7.16b
+	eor		v30.16b, v30.16b, v11.16b
+	eor		v31.16b, v31.16b, v15.16b
+	st1		{v28.16b-v31.16b}, [x1]
+
+	ret
+ENDPROC(chacha20_4block_xor_neon)
+
+CTRINC:	.word		0, 1, 2, 3
diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
new file mode 100644
index 000000000000..705b42b06d00
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -0,0 +1,131 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/neon.h>
+
+asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+			    unsigned int bytes)
+{
+	u8 buf[CHACHA20_BLOCK_SIZE];
+
+	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		chacha20_4block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE * 4;
+		src += CHACHA20_BLOCK_SIZE * 4;
+		dst += CHACHA20_BLOCK_SIZE * 4;
+		state[12] += 4;
+	}
+	while (bytes >= CHACHA20_BLOCK_SIZE) {
+		chacha20_block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE;
+		src += CHACHA20_BLOCK_SIZE;
+		dst += CHACHA20_BLOCK_SIZE;
+		state[12]++;
+	}
+	if (bytes) {
+		memcpy(buf, src, bytes);
+		chacha20_block_xor_neon(state, buf, buf);
+		memcpy(dst, buf, bytes);
+	}
+}
+
+static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
+			 struct scatterlist *src, unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	u32 state[16];
+	int err;
+
+	if (nbytes <= CHACHA20_BLOCK_SIZE)
+		return crypto_chacha20_crypt(desc, dst, src, nbytes);
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
+
+	crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+
+	kernel_neon_begin();
+
+	while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+				rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
+		err = blkcipher_walk_done(desc, &walk,
+					  walk.nbytes % CHACHA20_BLOCK_SIZE);
+	}
+
+	if (walk.nbytes) {
+		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+				walk.nbytes);
+		err = blkcipher_walk_done(desc, &walk, 0);
+	}
+
+	kernel_neon_end();
+
+	return err;
+}
+
+static struct crypto_alg alg = {
+	.cra_name		= "chacha20",
+	.cra_driver_name	= "chacha20-neon",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= 1,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_ctxsize		= sizeof(struct chacha20_ctx),
+	.cra_alignmask		= sizeof(u32) - 1,
+	.cra_module		= THIS_MODULE,
+	.cra_u			= {
+		.blkcipher = {
+			.min_keysize	= CHACHA20_KEY_SIZE,
+			.max_keysize	= CHACHA20_KEY_SIZE,
+			.ivsize		= CHACHA20_IV_SIZE,
+			.geniv		= "seqiv",
+			.setkey		= crypto_chacha20_setkey,
+			.encrypt	= chacha20_simd,
+			.decrypt	= chacha20_simd,
+		},
+	},
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+	return crypto_register_alg(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+	crypto_unregister_alg(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("chacha20");
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20
From: Ard Biesheuvel @ 2016-12-08 14:28 UTC (permalink / raw)
  To: linux-arm-kernel

Another port of existing x86 SSE code to NEON, again both for arm64 and ARM.

ChaCha20 is a stream cipher described in RFC 7539, and is intended to be
an efficient software implementable 'standby cipher', in case AES cannot
be used.

This NEON implementation is almost 2x as fast as the generic C code
(measured on Cortex-A57 using the arm64 version)

I'm aware that blkciphers are deprecated in favor of skciphers, but this
code (like the x86 version) uses the init and setkey routines of the generic
version, so it is probably better to port all implementations at once.

Ard Biesheuvel (2):
  crypto: arm64/chacha20 - implement NEON version based on SSE3 code
  crypto: arm/chacha20 - implement NEON version based on SSE3 code

 arch/arm/crypto/Kconfig                |   6 +
 arch/arm/crypto/Makefile               |   2 +
 arch/arm/crypto/chacha20-neon-core.S   | 524 ++++++++++++++++++++
 arch/arm/crypto/chacha20-neon-glue.c   | 136 +++++
 arch/arm64/crypto/Kconfig              |   6 +
 arch/arm64/crypto/Makefile             |   3 +
 arch/arm64/crypto/chacha20-neon-core.S | 480 ++++++++++++++++++
 arch/arm64/crypto/chacha20-neon-glue.c | 131 +++++
 8 files changed, 1288 insertions(+)
 create mode 100644 arch/arm/crypto/chacha20-neon-core.S
 create mode 100644 arch/arm/crypto/chacha20-neon-glue.c
 create mode 100644 arch/arm64/crypto/chacha20-neon-core.S
 create mode 100644 arch/arm64/crypto/chacha20-neon-glue.c

-- 
2.7.4

^ permalink raw reply

* [PATCH v2.1 1/4] ARM: dts: davinci: da850: VPIF: add node and muxing
From: Laurent Pinchart @ 2016-12-08 13:51 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208001418.4469-1-khilman@baylibre.com>

Hi Kevin,

Thank you for the patch.

On Wednesday 07 Dec 2016 16:14:18 Kevin Hilman wrote:
> Add VPIF node an pins to da850 and enable on boards.  VPIF has two input
> channels described using the standard DT ports and enpoints.
> 
> Signed-off-by: Kevin Hilman <khilman@baylibre.com>

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> ---
> v2 -> v2.1: moved ports from SoC .dtsi to board .dts files.
> 
>  arch/arm/boot/dts/da850-evm.dts  | 20 ++++++++++++++++++++
>  arch/arm/boot/dts/da850-lcdk.dts | 13 +++++++++++++
>  arch/arm/boot/dts/da850.dtsi     | 27 ++++++++++++++++++++++++++-
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/da850-evm.dts
> b/arch/arm/boot/dts/da850-evm.dts index 41de15fe15a2..cea36ee6fd07 100644
> --- a/arch/arm/boot/dts/da850-evm.dts
> +++ b/arch/arm/boot/dts/da850-evm.dts
> @@ -289,3 +289,23 @@
>  		};
>  	};
>  };
> +
> +&vpif {
> +	pinctrl-names = "default";
> +	pinctrl-0 = <&vpif_capture_pins>;
> +	status = "okay";
> +
> +	/* VPIF capture port */
> +	port {
> +		vpif_ch0: endpoint at 0 {
> +			  reg = <0>;
> +			  bus-width = <8>;
> +		};
> +
> +		vpif_ch1: endpoint at 1 {
> +			  reg = <1>;
> +			  bus-width = <8>;
> +			  data-shift = <8>;
> +		};
> +	};
> +};
> diff --git a/arch/arm/boot/dts/da850-lcdk.dts
> b/arch/arm/boot/dts/da850-lcdk.dts index 7b8ab21fed6c..5fc21528e0ba 100644
> --- a/arch/arm/boot/dts/da850-lcdk.dts
> +++ b/arch/arm/boot/dts/da850-lcdk.dts
> @@ -219,3 +219,16 @@
>  		};
>  	};
>  };
> +
> +&vpif {
> +	pinctrl-names = "default";
> +	pinctrl-0 = <&vpif_capture_pins>;
> +	status = "okay";
> +
> +	/* VPIF capture port */
> +	port {
> +		vpif_ch0: endpoint {
> +			  bus-width = <8>;
> +		};
> +	};
> +};
> diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
> index f79e1b91c680..5f0b40510b2b 100644
> --- a/arch/arm/boot/dts/da850.dtsi
> +++ b/arch/arm/boot/dts/da850.dtsi
> @@ -186,7 +186,18 @@
>  					0xc 0x88888888 0xffffffff
> 
>  				>;
> 
>  			};
> -
> +			vpif_capture_pins: vpif_capture_pins {
> +				pinctrl-single,bits = <
> +					/* VP_DIN[2..7], VP_CLKIN1, VP_CLKIN0 
*/
> +					0x38 0x11111111 0xffffffff
> +					/* VP_DIN[10..15,0..1] */
> +					0x3c 0x11111111 0xffffffff
> +					/* VP_DIN[8..9] */
> +					0x40 0x00000011 0x000000ff
> +					/* VP_CLKIN3, VP_CLKIN2 */
> +					0x4c 0x00010100 0x000f0f00
> +				>;
> +			};
>  		};
>  		edma0: edma at 0 {
>  			compatible = "ti,edma3-tpcc";
> @@ -399,7 +410,21 @@
>  				<&edma0 0 1>;
>  			dma-names = "tx", "rx";
>  		};
> +
> +		vpif: video at 217000 {
> +			compatible = "ti,da850-vpif";
> +			reg = <0x217000 0x1000>;
> +			interrupts = <92>;
> +			status = "disabled";
> +
> +			/* VPIF capture port */
> +			port {
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
>  	};
> +
>  	aemif: aemif at 68000000 {
>  		compatible = "ti,da850-aemif";
>  		#address-cells = <2>;

-- 
Regards,

Laurent Pinchart

^ permalink raw reply

* Tearing down DMA transfer setup after DMA client has finished
From: Måns Rullgård @ 2016-12-08 13:39 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <68bbe77a-c991-4e64-c189-fafbcda8e7ae@free.fr>

Mason <slash.tmp@free.fr> writes:

> On 08/12/2016 13:44, M?ns Rullg?rd wrote:
>
>> Mason <slash.tmp@free.fr> writes:
>> 
>>> On 08/12/2016 13:20, M?ns Rullg?rd wrote:
>>>
>>>> The only problem we have is that nobody envisioned hardware where the
>>>> dma engine indicates completion slightly too soon.  I suspect there's a
>>>> fifo or such somewhere, and the interrupt is triggered when the last
>>>> byte has been placed in the fifo rather than when it has been removed
>>>> which would have been more correct.
>>>
>>> As I (tried to) explain here:
>>> https://marc.info/?l=dmaengine&m=148007808418242&w=2
>>>
>>> A *read* MBUS agent raises its IRQ when it is safe for the memory
>>> to be overwritten (i.e. every byte has been pushed into the pipe).
>>>
>>> A *write* MBUS agent raises its IRQ when it is safe for another
>>> agent to read any one of the transferred bytes.
>>>
>>> The issue comes from the fact that, for a memory-to-device transfer,
>>> the system will receive the read agent's IRQ, but most devices
>>> (NFC, SATA) don't have an IRQ line to signal that their part of the
>>> operation is complete.
>> 
>> SATA does, actually.  Nevertheless, it's an unusual design.
>
> Thanks, I was mistaken about the SATA controller.
>
> On tango3 (and also tango4, I assume)
>
> IRQ 41 = Serial ATA #0
> IRQ 42 = Serial ATA DMA #0
> IRQ 54 = Serial ATA #1
> IRQ 55 = Serial ATA DMA #1
>
> But in the end, whether there is a device interrupt (SATA)
> or not (NFC), for a memory-to-device transfer, the DMA
> driver will get the read agent notification (which should
> be ignored) and the client driver should either spin until
> idle (NFC) or wait for its completion IRQ (SATA).
>
> Correct?

Yes, and when the client device is finished, the driver needs to signal
this to the dma driver so it can reuse the channel.  It's this last
piece that's missing.

-- 
M?ns Rullg?rd

^ permalink raw reply

* [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions
From: Auger Eric @ 2016-12-08 13:36 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <cd16fc5c-8649-0bfa-d67d-8f257aa38bd6@arm.com>

Hi Robin,

On 08/12/2016 14:14, Robin Murphy wrote:
> On 08/12/16 09:36, Auger Eric wrote:
>> Hi,
>>
>> On 15/11/2016 14:09, Eric Auger wrote:
>>> Following LPC discussions, we now report reserved regions through
>>> iommu-group sysfs reserved_regions attribute file.
>>
>>
>> While I am respinning this series into v4, here is a tentative summary
>> of technical topics for which no consensus was reached at this point.
>>
>> 1) Shall we report the usable IOVA range instead of reserved IOVA
>>    ranges. Not discussed at/after LPC.
>>    x I currently report reserved regions. Alex expressed the need to
>>      report the full usable IOVA range instead (x86 min-max range
>>      minus MSI APIC window). I think this is meaningful for ARM
>>      too where arm-smmu might not support the full 64b range.
>>    x Any objection we report the usable IOVA regions instead?
> 
> The issue with that is that we can't actually report "the usable
> regions" at the moment, as that involves pulling together disjoint
> properties of arbitrary hardware unrelated to the IOMMU. We'd be
> reporting "the not-definitely-unusable regions, which may have some
> unusable holes in them still". That seems like an ABI nightmare - I'd
> still much rather say "here are some, but not necessarily all, regions
> you definitely can't use", because saying "here are some regions which
> you might be able to use most of, probably" is what we're already doing
> today, via a single implicit region from 0 to ULONG_MAX ;)
> 
> The address space limits are definitely useful to know, but I think it
> would be better to expose them separately to avoid the ambiguity. At
> worst, I guess it would be reasonable to express the limits via an
> "out-of-range" reserved region type for 0 to $base and $top to
> ULONG-MAX. To *safely* expose usable regions, we'd have to start out
> with a very conservative assumption (e.g. only IOVAs matching physical
> RAM), and only expand them once we're sure we can detect every possible
> bit of problematic hardware in the system - that's just too limiting to
> be useful. And if we expose something knowingly inaccurate, we risk
> having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and
> nobody wants that...
Makes sense to me. "out-of-range reserved region type for 0 to $base and
$top to ULONG-MAX" can be an alternative to fulfill the requirement.
> 
>> 2) Shall the kernel check collision with MSI window* when userspace
>>    calls VFIO_IOMMU_MAP_DMA?
>>    Joerg/Will No; Alex yes
>>    *for IOVA regions consumed downstream to the IOMMU: everyone says NO
> 
> If we're starting off by having the SMMU drivers expose it as a fake
> fixed region, I don't think we need to worry about this yet. We all seem
> to agree that as long as we communicate the fixed regions to userspace,
> it's then userspace's job to work around them. Let's come back to this
> one once we actually get to the point of dynamically sizing and
> allocating 'real' MSI remapping region(s).
> 
> Ultimately, the kernel *will* police collisions either way, because an
> underlying iommu_map() is going to fail if overlapping IOVAs are ever
> actually used, so it's really just a question of whether to have a more
> user-friendly failure mode.
That's true on ARM but not on x86 where the APIC MSI region is not
mapped I think.
> 
>> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
>>    My current series does not expose them in iommu group sysfs.
>>    I understand we can expose the RMRR regions in the iomm group sysfs
>>    without necessarily supporting RMRR requiring device assignment.
>>    We can also add this support later.
> 
> As you say, reporting them doesn't necessitate allowing device
> assignment, and it's information which can already be easily grovelled
> out of dmesg (for intel-iommu at least) - there doesn't seem to be any
> need to hide them, but the x86 folks can have the final word on that.
agreed

Thanks

Eric
> 
> Robin.
> 
>> Thanks
>>
>> Eric
>>
>>
>>>
>>> Reserved regions are populated through the IOMMU get_resv_region callback
>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>>> arm-smmu.
>>>
>>> The intel-iommu reports the [FEE0_0000h - FEF0_000h] MSI window as an
>>> IOMMU_RESV_NOMAP reserved region.
>>>
>>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>>> 1MB large) and the PCI host bridge windows.
>>>
>>> The series integrates a not officially posted patch from Robin:
>>> "iommu/dma: Allow MSI-only cookies".
>>>
>>> This series currently does not address IRQ safety assessment.
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> Git: complete series available at
>>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>>
>>> History:
>>> RFC v2 -> v3:
>>> - switch to an iommu-group sysfs API
>>> - use new dummy allocator provided by Robin
>>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>>   the reserved regions
>>> - at the moment ARM MSI base address/size is left unchanged compared
>>>   to v2
>>> - we currently report reserved regions and not usable IOVA regions as
>>>   requested by Alex
>>>
>>> RFC v1 -> v2:
>>> - fix intel_add_reserved_regions
>>> - add mutex lock/unlock in vfio_iommu_type1
>>>
>>>
>>> Eric Auger (10):
>>>   iommu/dma: Allow MSI-only cookies
>>>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>>>   iommu: Add new reserved IOMMU attributes
>>>   iommu: iommu_alloc_resv_region
>>>   iommu: Do not map reserved regions
>>>   iommu: iommu_get_group_resv_regions
>>>   iommu: Implement reserved_regions iommu-group sysfs file
>>>   iommu/vt-d: Implement reserved region get/put callbacks
>>>   iommu/arm-smmu: Implement reserved region get/put callbacks
>>>   vfio/type1: Get MSI cookie
>>>
>>>  drivers/iommu/amd_iommu.c       |  20 +++---
>>>  drivers/iommu/arm-smmu.c        |  52 +++++++++++++++
>>>  drivers/iommu/dma-iommu.c       | 116 ++++++++++++++++++++++++++-------
>>>  drivers/iommu/intel-iommu.c     |  50 ++++++++++----
>>>  drivers/iommu/iommu.c           | 141 ++++++++++++++++++++++++++++++++++++----
>>>  drivers/vfio/vfio_iommu_type1.c |  26 ++++++++
>>>  include/linux/dma-iommu.h       |   7 ++
>>>  include/linux/iommu.h           |  49 ++++++++++----
>>>  8 files changed, 391 insertions(+), 70 deletions(-)
>>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH] arm64: dts: h3ulcb: Provide sd0_uhs node
From: Simon Horman @ 2016-12-08 13:30 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1480583246-6707-1-git-send-email-horms+renesas@verge.net.au>

On Thu, Dec 01, 2016 at 10:07:26AM +0100, Simon Horman wrote:
> Provide separaate sd0 and sd0_uhs nodes rather than duplicate sd0 nodes.
> 
> Cc: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> Fixes: 93373c309a70 ("arm64: dts: h3ulcb: rename SDHI0 pins")
> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>

I have queued this up as a fix for v4.10.

>  arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts b/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts
> index 6ffb0517421a..dbea2c3d8f0c 100644
> --- a/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts
> +++ b/arch/arm64/boot/dts/renesas/r8a7795-h3ulcb.dts
> @@ -169,7 +169,7 @@
>  		power-source = <3300>;
>  	};
>  
> -	sdhi0_pins_uhs: sd0 {
> +	sdhi0_pins_uhs: sd0_uhs {
>  		groups = "sdhi0_data4", "sdhi0_ctrl";
>  		function = "sdhi0";
>  		power-source = <1800>;
> -- 
> 2.7.0.rc3.207.g0ac5344
> 

^ permalink raw reply

* Tearing down DMA transfer setup after DMA client has finished
From: Mason @ 2016-12-08 13:29 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <yw1xfulyzj2b.fsf@unicorn.mansr.com>

On 08/12/2016 13:44, M?ns Rullg?rd wrote:

> Mason <slash.tmp@free.fr> writes:
> 
>> On 08/12/2016 13:20, M?ns Rullg?rd wrote:
>>
>>> The only problem we have is that nobody envisioned hardware where the
>>> dma engine indicates completion slightly too soon.  I suspect there's a
>>> fifo or such somewhere, and the interrupt is triggered when the last
>>> byte has been placed in the fifo rather than when it has been removed
>>> which would have been more correct.
>>
>> As I (tried to) explain here:
>> https://marc.info/?l=dmaengine&m=148007808418242&w=2
>>
>> A *read* MBUS agent raises its IRQ when it is safe for the memory
>> to be overwritten (i.e. every byte has been pushed into the pipe).
>>
>> A *write* MBUS agent raises its IRQ when it is safe for another
>> agent to read any one of the transferred bytes.
>>
>> The issue comes from the fact that, for a memory-to-device transfer,
>> the system will receive the read agent's IRQ, but most devices
>> (NFC, SATA) don't have an IRQ line to signal that their part of the
>> operation is complete.
> 
> SATA does, actually.  Nevertheless, it's an unusual design.

Thanks, I was mistaken about the SATA controller.

On tango3 (and also tango4, I assume)

IRQ 41 = Serial ATA #0
IRQ 42 = Serial ATA DMA #0
IRQ 54 = Serial ATA #1
IRQ 55 = Serial ATA DMA #1

But in the end, whether there is a device interrupt (SATA)
or not (NFC), for a memory-to-device transfer, the DMA
driver will get the read agent notification (which should
be ignored) and the client driver should either spin until
idle (NFC) or wait for its completion IRQ (SATA).

Correct?

Regards.

^ permalink raw reply

* [PATCH] arm64: Work around Falkor erratum 1009
From: Catalin Marinas @ 2016-12-08 13:27 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208114511.GC9768@leverpostej>

On Thu, Dec 08, 2016 at 11:45:12AM +0000, Mark Rutland wrote:
> On Wed, Dec 07, 2016 at 03:04:31PM -0500, Christopher Covington wrote:
> > +	asm volatile(ALTERNATIVE(
> > +		     "nop \n"
> > +		     "nop \n",
> > +		     "tlbi vmalle1is \n"
> > +		     "dsb ish \n",
> 
> As a general note, perhaps we want a C compatible NOP_ALTERNATIVE() so
> that the nop case can be implicitly generated for sequences like this.

It's also worth checking what cpus_have_const_cap() would generate for
the default (no workaround required) case.

-- 
Catalin

^ permalink raw reply

* [PATCH renesas/devel 1/2] ARM: shmobile: defconfig: Enable r8a774[35] SoCs
From: Simon Horman @ 2016-12-08 13:26 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAMuHMdUyCcT3k22b-N8J8aasE=YDYfQYfU5GqFKEtsdFnUnHCw@mail.gmail.com>

On Tue, Dec 06, 2016 at 03:00:53PM +0100, Geert Uytterhoeven wrote:
> On Tue, Dec 6, 2016 at 2:32 PM, Simon Horman <horms+renesas@verge.net.au> wrote:
> > Enable recently added r8a7743 (RZ/G1M) and r8a7745 (RZ/G1E) SoCs.
> >
> > Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
> 
> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>

Thanks, I have queued up this and the following patch for v4.11.

^ permalink raw reply

* [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions
From: Robin Murphy @ 2016-12-08 13:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <c5d4efa7-699d-4aa3-44cc-4ce03d0ce185@redhat.com>

On 08/12/16 09:36, Auger Eric wrote:
> Hi,
> 
> On 15/11/2016 14:09, Eric Auger wrote:
>> Following LPC discussions, we now report reserved regions through
>> iommu-group sysfs reserved_regions attribute file.
> 
> 
> While I am respinning this series into v4, here is a tentative summary
> of technical topics for which no consensus was reached at this point.
> 
> 1) Shall we report the usable IOVA range instead of reserved IOVA
>    ranges. Not discussed at/after LPC.
>    x I currently report reserved regions. Alex expressed the need to
>      report the full usable IOVA range instead (x86 min-max range
>      minus MSI APIC window). I think this is meaningful for ARM
>      too where arm-smmu might not support the full 64b range.
>    x Any objection we report the usable IOVA regions instead?

The issue with that is that we can't actually report "the usable
regions" at the moment, as that involves pulling together disjoint
properties of arbitrary hardware unrelated to the IOMMU. We'd be
reporting "the not-definitely-unusable regions, which may have some
unusable holes in them still". That seems like an ABI nightmare - I'd
still much rather say "here are some, but not necessarily all, regions
you definitely can't use", because saying "here are some regions which
you might be able to use most of, probably" is what we're already doing
today, via a single implicit region from 0 to ULONG_MAX ;)

The address space limits are definitely useful to know, but I think it
would be better to expose them separately to avoid the ambiguity. At
worst, I guess it would be reasonable to express the limits via an
"out-of-range" reserved region type for 0 to $base and $top to
ULONG-MAX. To *safely* expose usable regions, we'd have to start out
with a very conservative assumption (e.g. only IOVAs matching physical
RAM), and only expand them once we're sure we can detect every possible
bit of problematic hardware in the system - that's just too limiting to
be useful. And if we expose something knowingly inaccurate, we risk
having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and
nobody wants that...

> 2) Shall the kernel check collision with MSI window* when userspace
>    calls VFIO_IOMMU_MAP_DMA?
>    Joerg/Will No; Alex yes
>    *for IOVA regions consumed downstream to the IOMMU: everyone says NO

If we're starting off by having the SMMU drivers expose it as a fake
fixed region, I don't think we need to worry about this yet. We all seem
to agree that as long as we communicate the fixed regions to userspace,
it's then userspace's job to work around them. Let's come back to this
one once we actually get to the point of dynamically sizing and
allocating 'real' MSI remapping region(s).

Ultimately, the kernel *will* police collisions either way, because an
underlying iommu_map() is going to fail if overlapping IOVAs are ever
actually used, so it's really just a question of whether to have a more
user-friendly failure mode.

> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
>    My current series does not expose them in iommu group sysfs.
>    I understand we can expose the RMRR regions in the iomm group sysfs
>    without necessarily supporting RMRR requiring device assignment.
>    We can also add this support later.

As you say, reporting them doesn't necessitate allowing device
assignment, and it's information which can already be easily grovelled
out of dmesg (for intel-iommu at least) - there doesn't seem to be any
need to hide them, but the x86 folks can have the final word on that.

Robin.

> Thanks
> 
> Eric
> 
> 
>>
>> Reserved regions are populated through the IOMMU get_resv_region callback
>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>> arm-smmu.
>>
>> The intel-iommu reports the [FEE0_0000h - FEF0_000h] MSI window as an
>> IOMMU_RESV_NOMAP reserved region.
>>
>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>> 1MB large) and the PCI host bridge windows.
>>
>> The series integrates a not officially posted patch from Robin:
>> "iommu/dma: Allow MSI-only cookies".
>>
>> This series currently does not address IRQ safety assessment.
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>
>> History:
>> RFC v2 -> v3:
>> - switch to an iommu-group sysfs API
>> - use new dummy allocator provided by Robin
>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>   the reserved regions
>> - at the moment ARM MSI base address/size is left unchanged compared
>>   to v2
>> - we currently report reserved regions and not usable IOVA regions as
>>   requested by Alex
>>
>> RFC v1 -> v2:
>> - fix intel_add_reserved_regions
>> - add mutex lock/unlock in vfio_iommu_type1
>>
>>
>> Eric Auger (10):
>>   iommu/dma: Allow MSI-only cookies
>>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>>   iommu: Add new reserved IOMMU attributes
>>   iommu: iommu_alloc_resv_region
>>   iommu: Do not map reserved regions
>>   iommu: iommu_get_group_resv_regions
>>   iommu: Implement reserved_regions iommu-group sysfs file
>>   iommu/vt-d: Implement reserved region get/put callbacks
>>   iommu/arm-smmu: Implement reserved region get/put callbacks
>>   vfio/type1: Get MSI cookie
>>
>>  drivers/iommu/amd_iommu.c       |  20 +++---
>>  drivers/iommu/arm-smmu.c        |  52 +++++++++++++++
>>  drivers/iommu/dma-iommu.c       | 116 ++++++++++++++++++++++++++-------
>>  drivers/iommu/intel-iommu.c     |  50 ++++++++++----
>>  drivers/iommu/iommu.c           | 141 ++++++++++++++++++++++++++++++++++++----
>>  drivers/vfio/vfio_iommu_type1.c |  26 ++++++++
>>  include/linux/dma-iommu.h       |   7 ++
>>  include/linux/iommu.h           |  49 ++++++++++----
>>  8 files changed, 391 insertions(+), 70 deletions(-)
>>

^ permalink raw reply

* [PATCH 16/18] arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
From: Catalin Marinas @ 2016-12-08 13:12 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <3703608.GKr1zzErMk@wuerfel>

On Wed, Dec 07, 2016 at 09:40:13PM +0100, Arnd Bergmann wrote:
> On Wednesday, December 7, 2016 4:59:13 PM CET Catalin Marinas wrote:
> > On Tue, Dec 06, 2016 at 11:55:08AM +0530, Yury Norov wrote:
> > > On Mon, Dec 05, 2016 at 04:34:23PM +0000, Catalin Marinas wrote:
> > > > On Fri, Oct 21, 2016 at 11:33:15PM +0300, Yury Norov wrote:
> > > > > New aarch32 ptrace syscall handler is introduced to avoid run-time
> > > > > detection of the task type.
> > > > 
> > > > What's wrong with the run-time detection? If it's just to avoid a
> > > > negligible overhead, I would rather keep the code simpler by avoiding
> > > > duplicating the generic compat_sys_ptrace().
> > > 
> > > Nothing wrong. This is how Arnd asked me to do. You already asked this
> > > question: http://lkml.iu.edu/hypermail/linux/kernel/1604.3/00930.html
> > 
> > Hmm, I completely forgot about this ;). There is still an advantage to
> > doing run-time checking if we avoid touching core code (less acks to
> > gather and less code duplication).
> > 
> > Let's see what Arnd says but the initial patch looked simpler.
> 
> I don't currently have either version of the patch in my inbox
> (the archive is on a different machine), but in general I'd still
> think it's best to avoid the runtime check for aarch64-ilp32
> altogether. I'd have to look at the overall kernel source to
> see if it's worth avoiding one or two instances though, or
> if there are an overwhelming number of other checks that we
> can't avoid at all.

Just in case you haven't found them already, current version:

https://marc.info/?l=linux-arm-kernel&m=147708276818318&w=2

Original version:

https://patchwork.kernel.org/patch/7980521/

The old one looks more readable and given that ptrace is not really a
fast path, I'm not two worried about run-time checks

> Regarding ptrace, I notice that arch/tile doesn't even use
> the compat entry point for its ilp32 user space on 64-bit
> kernels, it just calls the regular 64-bit one. Would that
> help here?

I don't know whether it would work, we have incompatible siginfo_t on
AArch64/ILP32.

-- 
Catalin

^ permalink raw reply

* [PATCH v9 05/11] arm/arm64: vgic: Introduce VENG0 and VENG1 fields to vmcr struct
From: Auger Eric @ 2016-12-08 12:50 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161208122115.GG4816@cbox>

Hi Christoffer,

On 08/12/2016 13:21, Christoffer Dall wrote:
> On Thu, Dec 08, 2016 at 12:52:39PM +0100, Auger Eric wrote:
>> Hi Vijay,
>>
>> On 28/11/2016 15:28, Christoffer Dall wrote:
>>> On Wed, Nov 23, 2016 at 06:31:52PM +0530, vijay.kilari at gmail.com wrote:
>>>> From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
>>>>
>>>> ICC_VMCR_EL2 supports virtual access to ICC_IGRPEN1_EL1.Enable
>>>> and ICC_IGRPEN0_EL1.Enable fields. Add grpen0 and grpen1 member
>>>> variables to struct vmcr to support read and write of these fields.
>>>>
>>>> Also refactor vgic_set_vmcr and vgic_get_vmcr() code.
>>>> Drop ICH_VMCR_CTLR_SHIFT and ICH_VMCR_CTLR_MASK macros and instead
>>>> use ICH_VMCR_EOI* and ICH_VMCR_CBPR* macros
>>>> .
>>>> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
>>>> ---
>>>>  include/linux/irqchip/arm-gic-v3.h |  2 --
>>>>  virt/kvm/arm/vgic/vgic-mmio-v2.c   | 16 ----------------
>>>>  virt/kvm/arm/vgic/vgic-mmio.c      | 16 ++++++++++++++++
>>>>  virt/kvm/arm/vgic/vgic-v3.c        | 22 ++++++++++++++++++++--
>>>>  virt/kvm/arm/vgic/vgic.h           |  5 +++++
>>>>  5 files changed, 41 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>>>> index b4f8287..406fc3e 100644
>>>> --- a/include/linux/irqchip/arm-gic-v3.h
>>>> +++ b/include/linux/irqchip/arm-gic-v3.h
>>>> @@ -404,8 +404,6 @@
>>>>  #define ICH_HCR_EN			(1 << 0)
>>>>  #define ICH_HCR_UIE			(1 << 1)
>>>>  
>>>> -#define ICH_VMCR_CTLR_SHIFT		0
>>>> -#define ICH_VMCR_CTLR_MASK		(0x21f << ICH_VMCR_CTLR_SHIFT)
>>>>  #define ICH_VMCR_CBPR_SHIFT		4
>>>>  #define ICH_VMCR_CBPR_MASK		(1 << ICH_VMCR_CBPR_SHIFT)
>>>>  #define ICH_VMCR_EOIM_SHIFT		9
>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>> index 2cb04b7..ad353b5 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>> @@ -212,22 +212,6 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>>>>  	}
>>>>  }
>>>>  
>>>> -static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
>>>> -{
>>>> -	if (kvm_vgic_global_state.type == VGIC_V2)
>>>> -		vgic_v2_set_vmcr(vcpu, vmcr);
>>>> -	else
>>>> -		vgic_v3_set_vmcr(vcpu, vmcr);
>>>> -}
>>>> -
>>>> -static void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
>>>> -{
>>>> -	if (kvm_vgic_global_state.type == VGIC_V2)
>>>> -		vgic_v2_get_vmcr(vcpu, vmcr);
>>>> -	else
>>>> -		vgic_v3_get_vmcr(vcpu, vmcr);
>>>> -}
>>>> -
>>>>  #define GICC_ARCH_VERSION_V2	0x2
>>>>  
>>>>  /* These are for userland accesses only, there is no guest-facing emulation. */
>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
>>>> index 0d1bc98..f81e0e5 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-mmio.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
>>>> @@ -416,6 +416,22 @@ int vgic_validate_mmio_region_addr(struct kvm_device *dev,
>>>>  	return -ENXIO;
>>>>  }
>>>>  
>>>> +void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
>>>> +{
>>>> +	if (kvm_vgic_global_state.type == VGIC_V2)
>>>> +		vgic_v2_set_vmcr(vcpu, vmcr);
>>>> +	else
>>>> +		vgic_v3_set_vmcr(vcpu, vmcr);
>>>> +}
>>>> +
>>>> +void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
>>>> +{
>>>> +	if (kvm_vgic_global_state.type == VGIC_V2)
>>>> +		vgic_v2_get_vmcr(vcpu, vmcr);
>>>> +	else
>>>> +		vgic_v3_get_vmcr(vcpu, vmcr);
>>>> +}
>>>> +
>>>>  /*
>>>>   * kvm_mmio_read_buf() returns a value in a format where it can be converted
>>>>   * to a byte array and be directly observed as the guest wanted it to appear
>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>>> index 9f0dae3..a3ff04b 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>>> @@ -175,10 +175,19 @@ void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
>>>>  {
>>>>  	u32 vmcr;
>>>>  
>>>> -	vmcr  = (vmcrp->ctlr << ICH_VMCR_CTLR_SHIFT) & ICH_VMCR_CTLR_MASK;
>>>> +	/*
>>>> +	 * Ignore the FIQen bit, because GIC emulation always implies
>>>> +	 * SRE=1 which means the vFIQEn bit is also RES1.
>>>> +	 */
>>>> +	vmcr = (vmcrp->ctlr & ICC_CTLR_EL1_EOImode_MASK) >>
>>>> +		ICC_CTLR_EL1_EOImode_SHIFT;
>>>> +	vmcr = (vmcr << ICH_VMCR_EOIM_SHIFT) & ICH_VMCR_EOIM_MASK;
> 
>> I am not able to understand why we use ICC_CTLR _*macros here? Please
>> could you explain it to me? Besides if we want to ignore the FIQen bit
>> can't we change the ICH_VMCR_CTLR_MASK value?
>>
> This first statement is setting the vmcr to the ctlr's bit[1], but
> placed in bit[0], and then the next statement is moving that bit value
> to the corresponding place in the vmcr.  I think this is correct
> (although a little opaque).
Yes the question was more about the semantic of the vmcrp->ctlr field. I
thought it was supposed to store ICH_VMCR_EL2 ctrl bits as they are and
not with a different layout.
> 
> There's also a newer series on the list, just so you know.

Argh OK I missed it. I will check the diffs in the AArch64 related patches.

Thanks

Eric
> 
> Thanks,
> -Christoffer
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply

* Tearing down DMA transfer setup after DMA client has finished
From: Måns Rullgård @ 2016-12-08 12:44 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <a865e7d0-4247-657e-25db-9093731005d3@free.fr>

Mason <slash.tmp@free.fr> writes:

> On 08/12/2016 13:20, M?ns Rullg?rd wrote:
>
>> The only problem we have is that nobody envisioned hardware where the
>> dma engine indicates completion slightly too soon.  I suspect there's a
>> fifo or such somewhere, and the interrupt is triggered when the last
>> byte has been placed in the fifo rather than when it has been removed
>> which would have been more correct.
>
> As I (tried to) explain here:
> https://marc.info/?l=dmaengine&m=148007808418242&w=2
>
> A *read* MBUS agent raises its IRQ when it is safe for the memory
> to be overwritten (i.e. every byte has been pushed into the pipe).
>
> A *write* MBUS agent raises its IRQ when it is safe for another
> agent to read any one of the transferred bytes.
>
> The issue comes from the fact that, for a memory-to-device transfer,
> the system will receive the read agent's IRQ, but most devices
> (NFC, SATA) don't have an IRQ line to signal that their part of the
> operation is complete.

SATA does, actually.  Nevertheless, it's an unusual design.

-- 
M?ns Rullg?rd

^ permalink raw reply

* [PATCH 5/8] efi: Get the secure boot status [ver #5]
From: Lukas Wunner @ 2016-12-08 12:42 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <6009.1481184981@warthog.procyon.org.uk>

On Thu, Dec 08, 2016 at 08:16:21AM +0000, David Howells wrote:
> +/*
> + * Determine whether we're in secure boot mode.
> + */
> +enum efi_secureboot_mode efi_get_secureboot(efi_system_table_t *sys_table_arg)
> +{
> +	u8 secboot, setupmode;
> +	unsigned long size;
> +	efi_status_t status;
> +
> +	size = sizeof(secboot);
> +	status = get_efi_var(efi_SecureBoot_name, &efi_variable_guid,
> +			     NULL, &size, &secboot);
> +	if (status != EFI_SUCCESS)
> +		goto out_efi_err;
> +
> +	size = sizeof(setupmode);
> +	status = get_efi_var(efi_SetupMode_name, &efi_variable_guid,
> +			     NULL, &size, &setupmode);
> +	if (status != EFI_SUCCESS)
> +		goto out_efi_err;
> +
> +	if (secboot == 0 || setupmode == 1)
> +		return efi_secureboot_mode_disabled;
> +
> +	pr_efi(sys_table_arg, "UEFI Secure Boot is enabled.\n");
> +	return efi_secureboot_mode_enabled;
> +
> +out_efi_err:
> +	pr_efi_err(sys_table_arg, "Could not determine UEFI Secure Boot status.\n");
> +	if (status == EFI_NOT_FOUND)
> +		return efi_secureboot_mode_disabled;
> +	return efi_secureboot_mode_unknown;
> +}

In the out_efi_err path, the if-statement needs to come before the
pr_efi_err() call.  Otherwise it would be a change of behaviour for
ARM to what we have now.

Also, minor nit, I'd expect Matt to ask for a newline between the
if-statement and the following statements, so:

out_efi_err:
	if (status == EFI_NOT_FOUND)
		return efi_secureboot_mode_disabled;

	pr_efi_err(sys_table_arg, "Could not determine UEFI Secure Boot status.\n");
	return efi_secureboot_mode_unknown;

The error message doesn't say what the consequence is of the
failure to determine the status, but IIUC this differs between
x86 and ARM, is that correct?  (If I remember the discussion
correctly, x86 defaults to disabled, ARM to enabled.)

Thanks,

Lukas

^ permalink raw reply

* Tearing down DMA transfer setup after DMA client has finished
From: Mason @ 2016-12-08 12:41 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <yw1xshpyzk6p.fsf@unicorn.mansr.com>

On 08/12/2016 13:20, M?ns Rullg?rd wrote:

> The only problem we have is that nobody envisioned hardware where the
> dma engine indicates completion slightly too soon.  I suspect there's a
> fifo or such somewhere, and the interrupt is triggered when the last
> byte has been placed in the fifo rather than when it has been removed
> which would have been more correct.

As I (tried to) explain here:
https://marc.info/?l=dmaengine&m=148007808418242&w=2

A *read* MBUS agent raises its IRQ when it is safe for the memory
to be overwritten (i.e. every byte has been pushed into the pipe).

A *write* MBUS agent raises its IRQ when it is safe for another
agent to read any one of the transferred bytes.

The issue comes from the fact that, for a memory-to-device transfer,
the system will receive the read agent's IRQ, but most devices
(NFC, SATA) don't have an IRQ line to signal that their part of the
operation is complete.

As I explained, if one sets up a memory-to-memory DMA copy, then
the system will actually receive *two* IRQs.

Regards.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox