DMA Engine development

DMA Engine development
 help / color / mirror / Atom feed

* [PATCH v12 0/4] add uart DMA function
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng

In Mediatek SOCs, the uart can support DMA function.
Base on DMA engine formwork, we add the DMA code to support uart. And put the code under drivers/dma/mediatek.

This series contains document bindings, Kconfig to control the function enable or not,
device tree including interrupt and dma device node, the code of UART DMA

Changes compared to v11
-modify TX/RX
-pause function by software
Changes compared to v10
-modify DMA tx status function
-modify 8250_mtk for DMA rx
-add notes to binding Document.
Changes compared to v9
-rename dt-bindings file
-remove direction from device_config
-simplified code
Changes compared to v8
-revise missing items
Changes compared to v7:
-modify apdma uart tx
Changes compared to v6:
-Correct spelling
Changes compared to v5:
-move 'requst irqs' to alloc channel
-remove tasklet.
Changes compared to v4:
-modify Kconfig depends on.
Changes compared to v3:
-fix CONFIG_PM, will cause build fail
Changes compared to v2:
-remove unimportant parameters
-instead of cookie, use APIs of virtual channel.
-use of_dma_xlate_by_chan_id.
Changes compared to v1:
-mian revised file, 8250_mtk_dma.c
--parameters renamed for standard
--remove atomic operation

Long Cheng (4):
  dmaengine: mediatek: Add MediaTek UART APDMA support
  arm: dts: mt2712: add uart APDMA to device tree
  dt-bindings: dma: uart: rename binding
  serial: 8250-mtk: modify uart DMA rx

 .../devicetree/bindings/dma/8250_mtk_dma.txt       |   33 -
 .../devicetree/bindings/dma/mtk-uart-apdma.txt     |   55 ++
 arch/arm64/boot/dts/mediatek/mt2712e.dtsi          |   51 ++
 drivers/dma/mediatek/Kconfig                       |   11 +
 drivers/dma/mediatek/Makefile                      |    1 +
 drivers/dma/mediatek/mtk-uart-apdma.c              |  666 ++++++++++++++++++++
 drivers/tty/serial/8250/8250_mtk.c                 |   53 +-
 7 files changed, 807 insertions(+), 63 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
 create mode 100644 Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
 create mode 100644 drivers/dma/mediatek/mtk-uart-apdma.c

-- 
1.7.9.5



^ permalink raw reply

* [PATCH 4/4] serial: 8250-mtk: modify uart DMA rx
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng
In-Reply-To: <1556336193-15198-1-git-send-email-long.cheng@mediatek.com>

Modify uart rx and complete for DMA.

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
 drivers/tty/serial/8250/8250_mtk.c |   53 ++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index c1fdbc0..04081a6 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -30,7 +30,6 @@
 #define MTK_UART_DMA_EN_TX	0x2
 #define MTK_UART_DMA_EN_RX	0x5
 
-#define MTK_UART_TX_SIZE	UART_XMIT_SIZE
 #define MTK_UART_RX_SIZE	0x8000
 #define MTK_UART_TX_TRIGGER	1
 #define MTK_UART_RX_TRIGGER	MTK_UART_RX_SIZE
@@ -64,28 +63,30 @@ static void mtk8250_dma_rx_complete(void *param)
 	struct mtk8250_data *data = up->port.private_data;
 	struct tty_port *tty_port = &up->port.state->port;
 	struct dma_tx_state state;
+	int copied, cnt, tmp;
 	unsigned char *ptr;
-	int copied;
 
-	dma_sync_single_for_cpu(dma->rxchan->device->dev, dma->rx_addr,
-				dma->rx_size, DMA_FROM_DEVICE);
+	if (data->rx_status == DMA_RX_SHUTDOWN)
+		return;
 
 	dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
+	cnt = dma->rx_size - state.residue;
+	tmp = cnt;
 
-	if (data->rx_status == DMA_RX_SHUTDOWN)
-		return;
+	if ((data->rx_pos + cnt) > dma->rx_size)
+		tmp = dma->rx_size - data->rx_pos;
 
-	if ((data->rx_pos + state.residue) <= dma->rx_size) {
-		ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
-		copied = tty_insert_flip_string(tty_port, ptr, state.residue);
-	} else {
-		ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
-		copied = tty_insert_flip_string(tty_port, ptr,
-						dma->rx_size - data->rx_pos);
+	ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
+	copied = tty_insert_flip_string(tty_port, ptr, tmp);
+	data->rx_pos += tmp;
+
+	if (cnt > tmp) {
 		ptr = (unsigned char *)(dma->rx_buf);
-		copied += tty_insert_flip_string(tty_port, ptr,
-				data->rx_pos + state.residue - dma->rx_size);
+		tmp = cnt - tmp;
+		copied += tty_insert_flip_string(tty_port, ptr, tmp);
+		data->rx_pos = tmp;
 	}
+
 	up->port.icount.rx += copied;
 
 	tty_flip_buffer_push(tty_port);
@@ -96,9 +97,7 @@ static void mtk8250_dma_rx_complete(void *param)
 static void mtk8250_rx_dma(struct uart_8250_port *up)
 {
 	struct uart_8250_dma *dma = up->dma;
-	struct mtk8250_data *data = up->port.private_data;
 	struct dma_async_tx_descriptor	*desc;
-	struct dma_tx_state	 state;
 
 	desc = dmaengine_prep_slave_single(dma->rxchan, dma->rx_addr,
 					   dma->rx_size, DMA_DEV_TO_MEM,
@@ -113,12 +112,6 @@ static void mtk8250_rx_dma(struct uart_8250_port *up)
 
 	dma->rx_cookie = dmaengine_submit(desc);
 
-	dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
-	data->rx_pos = state.residue;
-
-	dma_sync_single_for_device(dma->rxchan->device->dev, dma->rx_addr,
-				   dma->rx_size, DMA_FROM_DEVICE);
-
 	dma_async_issue_pending(dma->rxchan);
 }
 
@@ -131,13 +124,13 @@ static void mtk8250_dma_enable(struct uart_8250_port *up)
 	if (data->rx_status != DMA_RX_START)
 		return;
 
-	dma->rxconf.direction		= DMA_DEV_TO_MEM;
-	dma->rxconf.src_addr_width	= dma->rx_size / 1024;
-	dma->rxconf.src_addr		= dma->rx_addr;
+	dma->rxconf.direction				= DMA_DEV_TO_MEM;
+	dma->rxconf.src_port_window_size	= dma->rx_size;
+	dma->rxconf.src_addr				= dma->rx_addr;
 
-	dma->txconf.direction		= DMA_MEM_TO_DEV;
-	dma->txconf.dst_addr_width	= MTK_UART_TX_SIZE / 1024;
-	dma->txconf.dst_addr		= dma->tx_addr;
+	dma->txconf.direction				= DMA_MEM_TO_DEV;
+	dma->txconf.dst_port_window_size	= UART_XMIT_SIZE;
+	dma->txconf.dst_addr				= dma->tx_addr;
 
 	serial_out(up, UART_FCR, UART_FCR_ENABLE_FIFO | UART_FCR_CLEAR_RCVR |
 		UART_FCR_CLEAR_XMIT);
@@ -217,7 +210,7 @@ static void mtk8250_shutdown(struct uart_port *port)
 	 * Mediatek UARTs use an extra highspeed register (UART_MTK_HIGHS)
 	 *
 	 * We need to recalcualte the quot register, as the claculation depends
-	 * on the vaule in the highspeed register.
+	 * on the value in the highspeed register.
 	 *
 	 * Some baudrates are not supported by the chip, so we use the next
 	 * lower rate supported and update termios c_flag.
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH 3/4] dt-bindings: dma: uart: rename binding
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng
In-Reply-To: <1556336193-15198-1-git-send-email-long.cheng@mediatek.com>

The filename matches mtk-uart-apdma.c.
So using "mtk-uart-apdma.txt" should be better.
And add some property.

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
 .../devicetree/bindings/dma/8250_mtk_dma.txt       |   33 ------------
 .../devicetree/bindings/dma/mtk-uart-apdma.txt     |   55 ++++++++++++++++++++
 2 files changed, 55 insertions(+), 33 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
 create mode 100644 Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt

diff --git a/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt b/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
deleted file mode 100644
index 3fe0961..0000000
--- a/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-* Mediatek UART APDMA Controller
-
-Required properties:
-- compatible should contain:
-  * "mediatek,mt2712-uart-dma" for MT2712 compatible APDMA
-  * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
-
-- reg: The base address of the APDMA register bank.
-
-- interrupts: A single interrupt specifier.
-
-- clocks : Must contain an entry for each entry in clock-names.
-  See ../clocks/clock-bindings.txt for details.
-- clock-names: The APDMA clock for register accesses
-
-Examples:
-
-	apdma: dma-controller@11000380 {
-		compatible = "mediatek,mt2712-uart-dma";
-		reg = <0 0x11000380 0 0x400>;
-		interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 64 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 65 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 66 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 67 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 68 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 69 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 70 IRQ_TYPE_LEVEL_LOW>;
-		clocks = <&pericfg CLK_PERI_AP_DMA>;
-		clock-names = "apdma";
-		#dma-cells = <1>;
-	};
-
diff --git a/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt b/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
new file mode 100644
index 0000000..e0424b3
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
@@ -0,0 +1,55 @@
+* Mediatek UART APDMA Controller
+
+Required properties:
+- compatible should contain:
+  * "mediatek,mt2712-uart-dma" for MT2712 compatible APDMA
+  * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
+
+- reg: The base address of the APDMA register bank.
+
+- interrupts: A single interrupt specifier.
+ One interrupt per dma-requests, or 8 if no dma-requests property is present
+
+- dma-requests: The number of DMA channels
+
+- clocks : Must contain an entry for each entry in clock-names.
+  See ../clocks/clock-bindings.txt for details.
+- clock-names: The APDMA clock for register accesses
+
+- mediatek,dma-33bits: Present if the DMA requires support
+
+Examples:
+
+	apdma: dma-controller@11000400 {
+		compatible = "mediatek,mt2712-uart-dma";
+		reg = <0 0x11000400 0 0x80>,
+		      <0 0x11000480 0 0x80>,
+		      <0 0x11000500 0 0x80>,
+		      <0 0x11000580 0 0x80>,
+		      <0 0x11000600 0 0x80>,
+		      <0 0x11000680 0 0x80>,
+		      <0 0x11000700 0 0x80>,
+		      <0 0x11000780 0 0x80>,
+		      <0 0x11000800 0 0x80>,
+		      <0 0x11000880 0 0x80>,
+		      <0 0x11000900 0 0x80>,
+		      <0 0x11000980 0 0x80>;
+		interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 104 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 105 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 107 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 108 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 109 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 112 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 114 IRQ_TYPE_LEVEL_LOW>;
+		dma-requests = <12>;
+		clocks = <&pericfg CLK_PERI_AP_DMA>;
+		clock-names = "apdma";
+		mediatek,dma-33bits;
+		#dma-cells = <1>;
+	};
+
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH 2/4] arm: dts: mt2712: add uart APDMA to device tree
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng
In-Reply-To: <1556336193-15198-1-git-send-email-long.cheng@mediatek.com>

1. add uart APDMA controller device node
2. add uart 0/1/2/3/4/5 DMA function

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
 arch/arm64/boot/dts/mediatek/mt2712e.dtsi |   51 +++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
index 976d92a..f1e419e 100644
--- a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
@@ -300,6 +300,9 @@
 		interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 10
+			&apdma 11>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -369,6 +372,39 @@
 			 (GIC_CPU_MASK_RAW(0x13) | IRQ_TYPE_LEVEL_HIGH)>;
 	};
 
+	apdma: dma-controller@11000400 {
+		compatible = "mediatek,mt2712-uart-dma",
+			     "mediatek,mt6577-uart-dma";
+		reg = <0 0x11000400 0 0x80>,
+		      <0 0x11000480 0 0x80>,
+		      <0 0x11000500 0 0x80>,
+		      <0 0x11000580 0 0x80>,
+		      <0 0x11000600 0 0x80>,
+		      <0 0x11000680 0 0x80>,
+		      <0 0x11000700 0 0x80>,
+		      <0 0x11000780 0 0x80>,
+		      <0 0x11000800 0 0x80>,
+		      <0 0x11000880 0 0x80>,
+		      <0 0x11000900 0 0x80>,
+		      <0 0x11000980 0 0x80>;
+		interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 104 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 105 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 107 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 108 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 109 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 112 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 114 IRQ_TYPE_LEVEL_LOW>;
+		dma-requests = <12>;
+		clocks = <&pericfg CLK_PERI_AP_DMA>;
+		clock-names = "apdma";
+		#dma-cells = <1>;
+	};
+
 	auxadc: adc@11001000 {
 		compatible = "mediatek,mt2712-auxadc";
 		reg = <0 0x11001000 0 0x1000>;
@@ -385,6 +421,9 @@
 		interrupts = <GIC_SPI 91 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 0
+			&apdma 1>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -395,6 +434,9 @@
 		interrupts = <GIC_SPI 92 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 2
+			&apdma 3>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -405,6 +447,9 @@
 		interrupts = <GIC_SPI 93 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 4
+			&apdma 5>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -415,6 +460,9 @@
 		interrupts = <GIC_SPI 94 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 6
+			&apdma 7>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -629,6 +677,9 @@
 		interrupts = <GIC_SPI 126 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 8
+			&apdma 9>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH 1/4] dmaengine: mediatek: Add MediaTek UART APDMA support
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng
In-Reply-To: <1556336193-15198-1-git-send-email-long.cheng@mediatek.com>

Add 8250 UART APDMA to support MediaTek UART. If MediaTek UART is
enabled by SERIAL_8250_MT6577, and we can enable this driver to offload
the UART device moving bytes.

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
---
 drivers/dma/mediatek/Kconfig          |   11 +
 drivers/dma/mediatek/Makefile         |    1 +
 drivers/dma/mediatek/mtk-uart-apdma.c |  666 +++++++++++++++++++++++++++++++++
 3 files changed, 678 insertions(+)
 create mode 100644 drivers/dma/mediatek/mtk-uart-apdma.c

diff --git a/drivers/dma/mediatek/Kconfig b/drivers/dma/mediatek/Kconfig
index 680fc05..ac49eb6 100644
--- a/drivers/dma/mediatek/Kconfig
+++ b/drivers/dma/mediatek/Kconfig
@@ -24,3 +24,14 @@ config MTK_CQDMA
 
 	  This controller provides the channels which is dedicated to
 	  memory-to-memory transfer to offload from CPU.
+
+config MTK_UART_APDMA
+	tristate "MediaTek SoCs APDMA support for UART"
+	depends on OF && SERIAL_8250_MT6577
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Support for the UART DMA engine found on MediaTek MTK SoCs.
+	  When SERIAL_8250_MT6577 is enabled, and if you want to use DMA,
+	  you can enable the config. The DMA engine can only be used
+	  with MediaTek SoCs.
diff --git a/drivers/dma/mediatek/Makefile b/drivers/dma/mediatek/Makefile
index 41bb381..61a6d29 100644
--- a/drivers/dma/mediatek/Makefile
+++ b/drivers/dma/mediatek/Makefile
@@ -1,2 +1,3 @@
+obj-$(CONFIG_MTK_UART_APDMA) += mtk-uart-apdma.o
 obj-$(CONFIG_MTK_HSDMA) += mtk-hsdma.o
 obj-$(CONFIG_MTK_CQDMA) += mtk-cqdma.o
diff --git a/drivers/dma/mediatek/mtk-uart-apdma.c b/drivers/dma/mediatek/mtk-uart-apdma.c
new file mode 100644
index 0000000..546995c
--- /dev/null
+++ b/drivers/dma/mediatek/mtk-uart-apdma.c
@@ -0,0 +1,666 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * MediaTek UART APDMA driver.
+ *
+ * Copyright (c) 2019 MediaTek Inc.
+ * Author: Long Cheng <long.cheng@mediatek.com>
+ */
+
+#include <linux/clk.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/of_dma.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "../virt-dma.h"
+
+/* The default number of virtual channel */
+#define MTK_UART_APDMA_NR_VCHANS	8
+
+#define VFF_EN_B		BIT(0)
+#define VFF_STOP_B		BIT(0)
+#define VFF_FLUSH_B		BIT(0)
+#define VFF_4G_EN_B		BIT(0)
+/* rx valid size >=  vff thre */
+#define VFF_RX_INT_EN_B		(BIT(0) | BIT(1))
+/* tx left size >= vff thre */
+#define VFF_TX_INT_EN_B		BIT(0)
+#define VFF_WARM_RST_B		BIT(0)
+#define VFF_RX_INT_CLR_B	(BIT(0) | BIT(1))
+#define VFF_TX_INT_CLR_B	0
+#define VFF_STOP_CLR_B		0
+#define VFF_EN_CLR_B		0
+#define VFF_INT_EN_CLR_B	0
+#define VFF_4G_SUPPORT_CLR_B	0
+
+/*
+ * interrupt trigger level for tx
+ * if threshold is n, no polling is required to start tx.
+ * otherwise need polling VFF_FLUSH.
+ */
+#define VFF_TX_THRE(n)		(n)
+/* interrupt trigger level for rx */
+#define VFF_RX_THRE(n)		((n) * 3 / 4)
+
+#define VFF_RING_SIZE	0xffff
+/* invert this bit when wrap ring head again */
+#define VFF_RING_WRAP	0x10000
+
+#define VFF_INT_FLAG		0x00
+#define VFF_INT_EN		0x04
+#define VFF_EN			0x08
+#define VFF_RST			0x0c
+#define VFF_STOP		0x10
+#define VFF_FLUSH		0x14
+#define VFF_ADDR		0x1c
+#define VFF_LEN			0x24
+#define VFF_THRE		0x28
+#define VFF_WPT			0x2c
+#define VFF_RPT			0x30
+/* TX: the buffer size HW can read. RX: the buffer size SW can read. */
+#define VFF_VALID_SIZE		0x3c
+/* TX: the buffer size SW can write. RX: the buffer size HW can write. */
+#define VFF_LEFT_SIZE		0x40
+#define VFF_DEBUG_STATUS	0x50
+#define VFF_4G_SUPPORT		0x54
+
+struct mtk_uart_apdmadev {
+	struct dma_device ddev;
+	struct clk *clk;
+	bool support_33bits;
+	unsigned int dma_requests;
+};
+
+struct mtk_uart_apdma_desc {
+	struct virt_dma_desc vd;
+
+	dma_addr_t addr;
+	unsigned int avail_len;
+};
+
+struct mtk_chan {
+	struct virt_dma_chan vc;
+	struct dma_slave_config	cfg;
+	struct mtk_uart_apdma_desc *desc;
+	enum dma_transfer_direction dir;
+
+	void __iomem *base;
+	unsigned int irq;
+
+	unsigned int rx_status;
+};
+
+static inline struct mtk_uart_apdmadev *
+to_mtk_uart_apdma_dev(struct dma_device *d)
+{
+	return container_of(d, struct mtk_uart_apdmadev, ddev);
+}
+
+static inline struct mtk_chan *to_mtk_uart_apdma_chan(struct dma_chan *c)
+{
+	return container_of(c, struct mtk_chan, vc.chan);
+}
+
+static inline struct mtk_uart_apdma_desc *to_mtk_uart_apdma_desc
+	(struct dma_async_tx_descriptor *t)
+{
+	return container_of(t, struct mtk_uart_apdma_desc, vd.tx);
+}
+
+static void mtk_uart_apdma_write(struct mtk_chan *c,
+			       unsigned int reg, unsigned int val)
+{
+	writel(val, c->base + reg);
+}
+
+static unsigned int mtk_uart_apdma_read(struct mtk_chan *c, unsigned int reg)
+{
+	return readl(c->base + reg);
+}
+
+static void mtk_uart_apdma_desc_free(struct virt_dma_desc *vd)
+{
+	struct dma_chan *chan = vd->tx.chan;
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+	kfree(c->desc);
+}
+
+static void mtk_uart_apdma_start_tx(struct mtk_chan *c)
+{
+	struct mtk_uart_apdmadev *mtkd =
+				to_mtk_uart_apdma_dev(c->vc.chan.device);
+	struct mtk_uart_apdma_desc *d = c->desc;
+	unsigned int wpt, vff_sz;
+
+	vff_sz = c->cfg.dst_port_window_size;
+	if (!mtk_uart_apdma_read(c, VFF_LEN)) {
+		mtk_uart_apdma_write(c, VFF_ADDR, d->addr);
+		mtk_uart_apdma_write(c, VFF_LEN, vff_sz);
+		mtk_uart_apdma_write(c, VFF_THRE, VFF_TX_THRE(vff_sz));
+		mtk_uart_apdma_write(c, VFF_WPT, 0);
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+
+		if (mtkd->support_33bits)
+			mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_EN_B);
+	}
+
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_B);
+	if (mtk_uart_apdma_read(c, VFF_EN) != VFF_EN_B)
+		dev_err(c->vc.chan.device->dev, "Enable TX fail\n");
+
+	if (!mtk_uart_apdma_read(c, VFF_LEFT_SIZE)) {
+		mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
+		return;
+	}
+
+	wpt = mtk_uart_apdma_read(c, VFF_WPT);
+
+	wpt += c->desc->avail_len;
+	if ((wpt & VFF_RING_SIZE) == vff_sz)
+		wpt = (wpt & VFF_RING_WRAP) ^ VFF_RING_WRAP;
+
+	/* Let DMA start moving data */
+	mtk_uart_apdma_write(c, VFF_WPT, wpt);
+
+	/* HW auto set to 0 when left size >= threshold */
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
+	if (!mtk_uart_apdma_read(c, VFF_FLUSH))
+		mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
+}
+
+static void mtk_uart_apdma_start_rx(struct mtk_chan *c)
+{
+	struct mtk_uart_apdmadev *mtkd =
+				to_mtk_uart_apdma_dev(c->vc.chan.device);
+	struct mtk_uart_apdma_desc *d = c->desc;
+	unsigned int vff_sz;
+
+	vff_sz = c->cfg.src_port_window_size;
+	if (!mtk_uart_apdma_read(c, VFF_LEN)) {
+		mtk_uart_apdma_write(c, VFF_ADDR, d->addr);
+		mtk_uart_apdma_write(c, VFF_LEN, vff_sz);
+		mtk_uart_apdma_write(c, VFF_THRE, VFF_RX_THRE(vff_sz));
+		mtk_uart_apdma_write(c, VFF_RPT, 0);
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+
+		if (mtkd->support_33bits)
+			mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_EN_B);
+	}
+
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_RX_INT_EN_B);
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_B);
+	if (mtk_uart_apdma_read(c, VFF_EN) != VFF_EN_B)
+		dev_err(c->vc.chan.device->dev, "Enable RX fail\n");
+}
+
+static void mtk_uart_apdma_tx_handler(struct mtk_chan *c)
+{
+	struct mtk_uart_apdma_desc *d = c->desc;
+
+	mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
+
+	list_del(&d->vd.node);
+	vchan_cookie_complete(&d->vd);
+}
+
+static void mtk_uart_apdma_rx_handler(struct mtk_chan *c)
+{
+	struct mtk_uart_apdma_desc *d = c->desc;
+	unsigned int len, wg, rg;
+	int cnt;
+
+	mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+
+	if (!mtk_uart_apdma_read(c, VFF_VALID_SIZE))
+		return;
+
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+	len = c->cfg.src_port_window_size;
+	rg = mtk_uart_apdma_read(c, VFF_RPT);
+	wg = mtk_uart_apdma_read(c, VFF_WPT);
+	cnt = (wg & VFF_RING_SIZE) - (rg & VFF_RING_SIZE);
+
+	/*
+	 * The buffer is ring buffer. If wrap bit different,
+	 * represents the start of the next cycle for WPT
+	 */
+	if ((rg ^ wg) & VFF_RING_WRAP)
+		cnt += len;
+
+	c->rx_status = d->avail_len - cnt;
+	mtk_uart_apdma_write(c, VFF_RPT, wg);
+
+	list_del(&d->vd.node);
+	vchan_cookie_complete(&d->vd);
+}
+
+static irqreturn_t mtk_uart_apdma_irq_handler(int irq, void *dev_id)
+{
+	struct dma_chan *chan = (struct dma_chan *)dev_id;
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+	if (c->dir == DMA_DEV_TO_MEM)
+		mtk_uart_apdma_rx_handler(c);
+	else if (c->dir == DMA_MEM_TO_DEV)
+		mtk_uart_apdma_tx_handler(c);
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static int mtk_uart_apdma_alloc_chan_resources(struct dma_chan *chan)
+{
+	struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned int status;
+	int ret;
+
+	ret = pm_runtime_get_sync(mtkd->ddev.dev);
+	if (ret < 0) {
+		pm_runtime_put_noidle(chan->device->dev);
+		return ret;
+	}
+
+	mtk_uart_apdma_write(c, VFF_ADDR, 0);
+	mtk_uart_apdma_write(c, VFF_THRE, 0);
+	mtk_uart_apdma_write(c, VFF_LEN, 0);
+	mtk_uart_apdma_write(c, VFF_RST, VFF_WARM_RST_B);
+
+	ret = readx_poll_timeout(readl, c->base + VFF_EN,
+			  status, !status, 10, 100);
+	if (ret)
+		return ret;
+
+	ret = request_irq(c->irq, mtk_uart_apdma_irq_handler,
+			  IRQF_TRIGGER_NONE, KBUILD_MODNAME, chan);
+	if (ret < 0) {
+		dev_err(chan->device->dev, "Can't request dma IRQ\n");
+		return -EINVAL;
+	}
+
+	if (mtkd->support_33bits)
+		mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_SUPPORT_CLR_B);
+
+	return ret;
+}
+
+static void mtk_uart_apdma_free_chan_resources(struct dma_chan *chan)
+{
+	struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+	free_irq(c->irq, chan);
+
+	tasklet_kill(&c->vc.task);
+
+	vchan_free_chan_resources(&c->vc);
+
+	pm_runtime_put_sync(mtkd->ddev.dev);
+}
+
+static enum dma_status mtk_uart_apdma_tx_status(struct dma_chan *chan,
+					 dma_cookie_t cookie,
+					 struct dma_tx_state *txstate)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	enum dma_status ret;
+
+	ret = dma_cookie_status(chan, cookie, txstate);
+	if (!txstate)
+		return ret;
+
+	dma_set_residue(txstate, c->rx_status);
+
+	return ret;
+}
+
+/*
+ * dmaengine_prep_slave_single will call the function. and sglen is 1.
+ * 8250 uart using one ring buffer, and deal with one sg.
+ */
+static struct dma_async_tx_descriptor *mtk_uart_apdma_prep_slave_sg
+	(struct dma_chan *chan, struct scatterlist *sgl,
+	unsigned int sglen, enum dma_transfer_direction dir,
+	unsigned long tx_flags, void *context)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	struct mtk_uart_apdma_desc *d;
+
+	if (!is_slave_direction(dir) || sglen != 1)
+		return NULL;
+
+	/* Now allocate and setup the descriptor */
+	d = kzalloc(sizeof(*d), GFP_ATOMIC);
+	if (!d)
+		return NULL;
+
+	d->avail_len = sg_dma_len(sgl);
+	d->addr = sg_dma_address(sgl);
+	c->dir = dir;
+
+	return vchan_tx_prep(&c->vc, &d->vd, tx_flags);
+}
+
+static void mtk_uart_apdma_issue_pending(struct dma_chan *chan)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	struct virt_dma_desc *vd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+	if (vchan_issue_pending(&c->vc)) {
+		vd = vchan_next_desc(&c->vc);
+		c->desc = to_mtk_uart_apdma_desc(&vd->tx);
+
+		if (c->dir == DMA_DEV_TO_MEM)
+			mtk_uart_apdma_start_rx(c);
+		else if (c->dir == DMA_MEM_TO_DEV)
+			mtk_uart_apdma_start_tx(c);
+	}
+
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+}
+
+static int mtk_uart_apdma_slave_config(struct dma_chan *chan,
+				   struct dma_slave_config *config)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+	memcpy(&c->cfg, config, sizeof(*config));
+
+	return 0;
+}
+
+static int mtk_uart_apdma_terminate_all(struct dma_chan *chan)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned long flags;
+	unsigned int status;
+	LIST_HEAD(head);
+	int ret;
+
+	mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
+
+	ret = readx_poll_timeout(readl, c->base + VFF_FLUSH,
+			  status, status != VFF_FLUSH_B, 10, 100);
+	if (ret)
+		dev_err(c->vc.chan.device->dev, "flush: fail, status=0x%x\n",
+			mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+	/*
+	 * Stop need 3 steps.
+	 * 1. set stop to 1
+	 * 2. wait en to 0
+	 * 3. set stop as 0
+	 */
+	mtk_uart_apdma_write(c, VFF_STOP, VFF_STOP_B);
+	ret = readx_poll_timeout(readl, c->base + VFF_EN,
+			  status, !status, 10, 100);
+	if (ret)
+		dev_err(c->vc.chan.device->dev, "stop: fail, status=0x%x\n",
+			mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+	mtk_uart_apdma_write(c, VFF_STOP, VFF_STOP_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+	if (c->dir == DMA_DEV_TO_MEM)
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+	else if (c->dir == DMA_MEM_TO_DEV)
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+
+	synchronize_irq(c->irq);
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+	vchan_get_all_descriptors(&c->vc, &head);
+	vchan_dma_desc_free_list(&c->vc, &head);
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+
+	return 0;
+}
+
+static int mtk_uart_apdma_device_pause(struct dma_chan *chan)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+	synchronize_irq(c->irq);
+
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+
+	return 0;
+}
+
+static void mtk_uart_apdma_free(struct mtk_uart_apdmadev *mtkd)
+{
+	while (!list_empty(&mtkd->ddev.channels)) {
+		struct mtk_chan *c = list_first_entry(&mtkd->ddev.channels,
+			struct mtk_chan, vc.chan.device_node);
+
+		list_del(&c->vc.chan.device_node);
+		tasklet_kill(&c->vc.task);
+	}
+}
+
+static const struct of_device_id mtk_uart_apdma_match[] = {
+	{ .compatible = "mediatek,mt6577-uart-dma", },
+	{ /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, mtk_uart_apdma_match);
+
+static int mtk_uart_apdma_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct mtk_uart_apdmadev *mtkd;
+	int bit_mask = 32, rc;
+	struct resource *res;
+	struct mtk_chan *c;
+	unsigned int i;
+
+	mtkd = devm_kzalloc(&pdev->dev, sizeof(*mtkd), GFP_KERNEL);
+	if (!mtkd)
+		return -ENOMEM;
+
+	mtkd->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(mtkd->clk)) {
+		dev_err(&pdev->dev, "No clock specified\n");
+		rc = PTR_ERR(mtkd->clk);
+		return rc;
+	}
+
+	if (of_property_read_bool(np, "mediatek,dma-33bits"))
+		mtkd->support_33bits = true;
+
+	if (mtkd->support_33bits)
+		bit_mask = 33;
+
+	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(bit_mask));
+	if (rc)
+		return rc;
+
+	dma_cap_set(DMA_SLAVE, mtkd->ddev.cap_mask);
+	mtkd->ddev.device_alloc_chan_resources =
+				mtk_uart_apdma_alloc_chan_resources;
+	mtkd->ddev.device_free_chan_resources =
+				mtk_uart_apdma_free_chan_resources;
+	mtkd->ddev.device_tx_status = mtk_uart_apdma_tx_status;
+	mtkd->ddev.device_issue_pending = mtk_uart_apdma_issue_pending;
+	mtkd->ddev.device_prep_slave_sg = mtk_uart_apdma_prep_slave_sg;
+	mtkd->ddev.device_config = mtk_uart_apdma_slave_config;
+	mtkd->ddev.device_pause = mtk_uart_apdma_device_pause;
+	mtkd->ddev.device_terminate_all = mtk_uart_apdma_terminate_all;
+	mtkd->ddev.src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE);
+	mtkd->ddev.dst_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE);
+	mtkd->ddev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
+	mtkd->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
+	mtkd->ddev.dev = &pdev->dev;
+	INIT_LIST_HEAD(&mtkd->ddev.channels);
+
+	mtkd->dma_requests = MTK_UART_APDMA_NR_VCHANS;
+	if (of_property_read_u32(np, "dma-requests", &mtkd->dma_requests)) {
+		dev_info(&pdev->dev,
+			 "Using %u as missing dma-requests property\n",
+			 MTK_UART_APDMA_NR_VCHANS);
+	}
+
+	for (i = 0; i < mtkd->dma_requests; i++) {
+		c = devm_kzalloc(mtkd->ddev.dev, sizeof(*c), GFP_KERNEL);
+		if (!c) {
+			rc = -ENODEV;
+			goto err_no_dma;
+		}
+
+		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+		if (!res) {
+			rc = -ENODEV;
+			goto err_no_dma;
+		}
+
+		c->base = devm_ioremap_resource(&pdev->dev, res);
+		if (IS_ERR(c->base)) {
+			rc = PTR_ERR(c->base);
+			goto err_no_dma;
+		}
+		c->vc.desc_free = mtk_uart_apdma_desc_free;
+		vchan_init(&c->vc, &mtkd->ddev);
+
+		rc = platform_get_irq(pdev, i);
+		if (rc < 0) {
+			dev_err(&pdev->dev, "failed to get IRQ[%d]\n", i);
+			goto err_no_dma;
+		}
+		c->irq = rc;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+	pm_runtime_set_active(&pdev->dev);
+
+	rc = dma_async_device_register(&mtkd->ddev);
+	if (rc)
+		goto rpm_disable;
+
+	platform_set_drvdata(pdev, mtkd);
+
+	/* Device-tree DMA controller registration */
+	rc = of_dma_controller_register(np, of_dma_xlate_by_chan_id, mtkd);
+	if (rc)
+		goto dma_remove;
+
+	return rc;
+
+dma_remove:
+	dma_async_device_unregister(&mtkd->ddev);
+rpm_disable:
+	pm_runtime_disable(&pdev->dev);
+err_no_dma:
+	mtk_uart_apdma_free(mtkd);
+	return rc;
+}
+
+static int mtk_uart_apdma_remove(struct platform_device *pdev)
+{
+	struct mtk_uart_apdmadev *mtkd = platform_get_drvdata(pdev);
+
+	of_dma_controller_free(pdev->dev.of_node);
+
+	mtk_uart_apdma_free(mtkd);
+
+	dma_async_device_unregister(&mtkd->ddev);
+
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int mtk_uart_apdma_suspend(struct device *dev)
+{
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	if (!pm_runtime_suspended(dev))
+		clk_disable_unprepare(mtkd->clk);
+
+	return 0;
+}
+
+static int mtk_uart_apdma_resume(struct device *dev)
+{
+	int ret;
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	if (!pm_runtime_suspended(dev)) {
+		ret = clk_prepare_enable(mtkd->clk);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
+#ifdef CONFIG_PM
+static int mtk_uart_apdma_runtime_suspend(struct device *dev)
+{
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	clk_disable_unprepare(mtkd->clk);
+
+	return 0;
+}
+
+static int mtk_uart_apdma_runtime_resume(struct device *dev)
+{
+	int ret;
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	ret = clk_prepare_enable(mtkd->clk);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+#endif /* CONFIG_PM */
+
+static const struct dev_pm_ops mtk_uart_apdma_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(mtk_uart_apdma_suspend, mtk_uart_apdma_resume)
+	SET_RUNTIME_PM_OPS(mtk_uart_apdma_runtime_suspend,
+			   mtk_uart_apdma_runtime_resume, NULL)
+};
+
+static struct platform_driver mtk_uart_apdma_driver = {
+	.probe	= mtk_uart_apdma_probe,
+	.remove	= mtk_uart_apdma_remove,
+	.driver = {
+		.name		= KBUILD_MODNAME,
+		.pm		= &mtk_uart_apdma_pm_ops,
+		.of_match_table = of_match_ptr(mtk_uart_apdma_match),
+	},
+};
+
+module_platform_driver(mtk_uart_apdma_driver);
+
+MODULE_DESCRIPTION("MediaTek UART APDMA Controller Driver");
+MODULE_AUTHOR("Long Cheng <long.cheng@mediatek.com>");
+MODULE_LICENSE("GPL v2");
-- 
1.7.9.5


^ permalink raw reply related

* [4/4] serial: 8250-mtk: modify uart DMA rx
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng

Modify uart rx and complete for DMA.

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
 drivers/tty/serial/8250/8250_mtk.c |   53 ++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index c1fdbc0..04081a6 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -30,7 +30,6 @@
 #define MTK_UART_DMA_EN_TX	0x2
 #define MTK_UART_DMA_EN_RX	0x5
 
-#define MTK_UART_TX_SIZE	UART_XMIT_SIZE
 #define MTK_UART_RX_SIZE	0x8000
 #define MTK_UART_TX_TRIGGER	1
 #define MTK_UART_RX_TRIGGER	MTK_UART_RX_SIZE
@@ -64,28 +63,30 @@ static void mtk8250_dma_rx_complete(void *param)
 	struct mtk8250_data *data = up->port.private_data;
 	struct tty_port *tty_port = &up->port.state->port;
 	struct dma_tx_state state;
+	int copied, cnt, tmp;
 	unsigned char *ptr;
-	int copied;
 
-	dma_sync_single_for_cpu(dma->rxchan->device->dev, dma->rx_addr,
-				dma->rx_size, DMA_FROM_DEVICE);
+	if (data->rx_status == DMA_RX_SHUTDOWN)
+		return;
 
 	dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
+	cnt = dma->rx_size - state.residue;
+	tmp = cnt;
 
-	if (data->rx_status == DMA_RX_SHUTDOWN)
-		return;
+	if ((data->rx_pos + cnt) > dma->rx_size)
+		tmp = dma->rx_size - data->rx_pos;
 
-	if ((data->rx_pos + state.residue) <= dma->rx_size) {
-		ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
-		copied = tty_insert_flip_string(tty_port, ptr, state.residue);
-	} else {
-		ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
-		copied = tty_insert_flip_string(tty_port, ptr,
-						dma->rx_size - data->rx_pos);
+	ptr = (unsigned char *)(data->rx_pos + dma->rx_buf);
+	copied = tty_insert_flip_string(tty_port, ptr, tmp);
+	data->rx_pos += tmp;
+
+	if (cnt > tmp) {
 		ptr = (unsigned char *)(dma->rx_buf);
-		copied += tty_insert_flip_string(tty_port, ptr,
-				data->rx_pos + state.residue - dma->rx_size);
+		tmp = cnt - tmp;
+		copied += tty_insert_flip_string(tty_port, ptr, tmp);
+		data->rx_pos = tmp;
 	}
+
 	up->port.icount.rx += copied;
 
 	tty_flip_buffer_push(tty_port);
@@ -96,9 +97,7 @@ static void mtk8250_dma_rx_complete(void *param)
 static void mtk8250_rx_dma(struct uart_8250_port *up)
 {
 	struct uart_8250_dma *dma = up->dma;
-	struct mtk8250_data *data = up->port.private_data;
 	struct dma_async_tx_descriptor	*desc;
-	struct dma_tx_state	 state;
 
 	desc = dmaengine_prep_slave_single(dma->rxchan, dma->rx_addr,
 					   dma->rx_size, DMA_DEV_TO_MEM,
@@ -113,12 +112,6 @@ static void mtk8250_rx_dma(struct uart_8250_port *up)
 
 	dma->rx_cookie = dmaengine_submit(desc);
 
-	dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
-	data->rx_pos = state.residue;
-
-	dma_sync_single_for_device(dma->rxchan->device->dev, dma->rx_addr,
-				   dma->rx_size, DMA_FROM_DEVICE);
-
 	dma_async_issue_pending(dma->rxchan);
 }
 
@@ -131,13 +124,13 @@ static void mtk8250_dma_enable(struct uart_8250_port *up)
 	if (data->rx_status != DMA_RX_START)
 		return;
 
-	dma->rxconf.direction		= DMA_DEV_TO_MEM;
-	dma->rxconf.src_addr_width	= dma->rx_size / 1024;
-	dma->rxconf.src_addr		= dma->rx_addr;
+	dma->rxconf.direction				= DMA_DEV_TO_MEM;
+	dma->rxconf.src_port_window_size	= dma->rx_size;
+	dma->rxconf.src_addr				= dma->rx_addr;
 
-	dma->txconf.direction		= DMA_MEM_TO_DEV;
-	dma->txconf.dst_addr_width	= MTK_UART_TX_SIZE / 1024;
-	dma->txconf.dst_addr		= dma->tx_addr;
+	dma->txconf.direction				= DMA_MEM_TO_DEV;
+	dma->txconf.dst_port_window_size	= UART_XMIT_SIZE;
+	dma->txconf.dst_addr				= dma->tx_addr;
 
 	serial_out(up, UART_FCR, UART_FCR_ENABLE_FIFO | UART_FCR_CLEAR_RCVR |
 		UART_FCR_CLEAR_XMIT);
@@ -217,7 +210,7 @@ static void mtk8250_shutdown(struct uart_port *port)
 	 * Mediatek UARTs use an extra highspeed register (UART_MTK_HIGHS)
 	 *
 	 * We need to recalcualte the quot register, as the claculation depends
-	 * on the vaule in the highspeed register.
+	 * on the value in the highspeed register.
 	 *
 	 * Some baudrates are not supported by the chip, so we use the next
 	 * lower rate supported and update termios c_flag.

^ permalink raw reply related

* [3/4] dt-bindings: dma: uart: rename binding
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng

The filename matches mtk-uart-apdma.c.
So using "mtk-uart-apdma.txt" should be better.
And add some property.

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
 .../devicetree/bindings/dma/8250_mtk_dma.txt       |   33 ------------
 .../devicetree/bindings/dma/mtk-uart-apdma.txt     |   55 ++++++++++++++++++++
 2 files changed, 55 insertions(+), 33 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
 create mode 100644 Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt

diff --git a/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt b/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
deleted file mode 100644
index 3fe0961..0000000
--- a/Documentation/devicetree/bindings/dma/8250_mtk_dma.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-* Mediatek UART APDMA Controller
-
-Required properties:
-- compatible should contain:
-  * "mediatek,mt2712-uart-dma" for MT2712 compatible APDMA
-  * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
-
-- reg: The base address of the APDMA register bank.
-
-- interrupts: A single interrupt specifier.
-
-- clocks : Must contain an entry for each entry in clock-names.
-  See ../clocks/clock-bindings.txt for details.
-- clock-names: The APDMA clock for register accesses
-
-Examples:
-
-	apdma: dma-controller@11000380 {
-		compatible = "mediatek,mt2712-uart-dma";
-		reg = <0 0x11000380 0 0x400>;
-		interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 64 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 65 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 66 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 67 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 68 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 69 IRQ_TYPE_LEVEL_LOW>,
-			     <GIC_SPI 70 IRQ_TYPE_LEVEL_LOW>;
-		clocks = <&pericfg CLK_PERI_AP_DMA>;
-		clock-names = "apdma";
-		#dma-cells = <1>;
-	};
-
diff --git a/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt b/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
new file mode 100644
index 0000000..e0424b3
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/mtk-uart-apdma.txt
@@ -0,0 +1,55 @@
+* Mediatek UART APDMA Controller
+
+Required properties:
+- compatible should contain:
+  * "mediatek,mt2712-uart-dma" for MT2712 compatible APDMA
+  * "mediatek,mt6577-uart-dma" for MT6577 and all of the above
+
+- reg: The base address of the APDMA register bank.
+
+- interrupts: A single interrupt specifier.
+ One interrupt per dma-requests, or 8 if no dma-requests property is present
+
+- dma-requests: The number of DMA channels
+
+- clocks : Must contain an entry for each entry in clock-names.
+  See ../clocks/clock-bindings.txt for details.
+- clock-names: The APDMA clock for register accesses
+
+- mediatek,dma-33bits: Present if the DMA requires support
+
+Examples:
+
+	apdma: dma-controller@11000400 {
+		compatible = "mediatek,mt2712-uart-dma";
+		reg = <0 0x11000400 0 0x80>,
+		      <0 0x11000480 0 0x80>,
+		      <0 0x11000500 0 0x80>,
+		      <0 0x11000580 0 0x80>,
+		      <0 0x11000600 0 0x80>,
+		      <0 0x11000680 0 0x80>,
+		      <0 0x11000700 0 0x80>,
+		      <0 0x11000780 0 0x80>,
+		      <0 0x11000800 0 0x80>,
+		      <0 0x11000880 0 0x80>,
+		      <0 0x11000900 0 0x80>,
+		      <0 0x11000980 0 0x80>;
+		interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 104 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 105 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 107 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 108 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 109 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 112 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 114 IRQ_TYPE_LEVEL_LOW>;
+		dma-requests = <12>;
+		clocks = <&pericfg CLK_PERI_AP_DMA>;
+		clock-names = "apdma";
+		mediatek,dma-33bits;
+		#dma-cells = <1>;
+	};
+

^ permalink raw reply related

* [2/4] arm: dts: mt2712: add uart APDMA to device tree
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng

1. add uart APDMA controller device node
2. add uart 0/1/2/3/4/5 DMA function

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
---
 arch/arm64/boot/dts/mediatek/mt2712e.dtsi |   51 +++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
index 976d92a..f1e419e 100644
--- a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
@@ -300,6 +300,9 @@
 		interrupts = <GIC_SPI 127 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 10
+			&apdma 11>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -369,6 +372,39 @@
 			 (GIC_CPU_MASK_RAW(0x13) | IRQ_TYPE_LEVEL_HIGH)>;
 	};
 
+	apdma: dma-controller@11000400 {
+		compatible = "mediatek,mt2712-uart-dma",
+			     "mediatek,mt6577-uart-dma";
+		reg = <0 0x11000400 0 0x80>,
+		      <0 0x11000480 0 0x80>,
+		      <0 0x11000500 0 0x80>,
+		      <0 0x11000580 0 0x80>,
+		      <0 0x11000600 0 0x80>,
+		      <0 0x11000680 0 0x80>,
+		      <0 0x11000700 0 0x80>,
+		      <0 0x11000780 0 0x80>,
+		      <0 0x11000800 0 0x80>,
+		      <0 0x11000880 0 0x80>,
+		      <0 0x11000900 0 0x80>,
+		      <0 0x11000980 0 0x80>;
+		interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 104 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 105 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 107 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 108 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 109 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 111 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 112 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 114 IRQ_TYPE_LEVEL_LOW>;
+		dma-requests = <12>;
+		clocks = <&pericfg CLK_PERI_AP_DMA>;
+		clock-names = "apdma";
+		#dma-cells = <1>;
+	};
+
 	auxadc: adc@11001000 {
 		compatible = "mediatek,mt2712-auxadc";
 		reg = <0 0x11001000 0 0x1000>;
@@ -385,6 +421,9 @@
 		interrupts = <GIC_SPI 91 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 0
+			&apdma 1>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -395,6 +434,9 @@
 		interrupts = <GIC_SPI 92 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 2
+			&apdma 3>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -405,6 +447,9 @@
 		interrupts = <GIC_SPI 93 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 4
+			&apdma 5>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -415,6 +460,9 @@
 		interrupts = <GIC_SPI 94 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 6
+			&apdma 7>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 
@@ -629,6 +677,9 @@
 		interrupts = <GIC_SPI 126 IRQ_TYPE_LEVEL_LOW>;
 		clocks = <&baud_clk>, <&sys_clk>;
 		clock-names = "baud", "bus";
+		dmas = <&apdma 8
+			&apdma 9>;
+		dma-names = "tx", "rx";
 		status = "disabled";
 	};
 

^ permalink raw reply related

* [1/4] dmaengine: mediatek: Add MediaTek UART APDMA support
From: Long Cheng @ 2019-04-27  3:36 UTC (permalink / raw)
  To: Vinod Koul, Randy Dunlap, Rob Herring, Mark Rutland, Ryder Lee,
	Sean Wang, Nicolas Boichat, Matthias Brugger
  Cc: Dan Williams, Greg Kroah-Hartman, Jiri Slaby, Sean Wang,
	dmaengine, devicetree, linux-arm-kernel, linux-mediatek,
	linux-kernel, linux-serial, srv_heupstream, Yingjoe Chen, YT Shen,
	Zhenbao Liu, Long Cheng

Add 8250 UART APDMA to support MediaTek UART. If MediaTek UART is
enabled by SERIAL_8250_MT6577, and we can enable this driver to offload
the UART device moving bytes.

Signed-off-by: Long Cheng <long.cheng@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
---
 drivers/dma/mediatek/Kconfig          |   11 +
 drivers/dma/mediatek/Makefile         |    1 +
 drivers/dma/mediatek/mtk-uart-apdma.c |  666 +++++++++++++++++++++++++++++++++
 3 files changed, 678 insertions(+)
 create mode 100644 drivers/dma/mediatek/mtk-uart-apdma.c

diff --git a/drivers/dma/mediatek/Kconfig b/drivers/dma/mediatek/Kconfig
index 680fc05..ac49eb6 100644
--- a/drivers/dma/mediatek/Kconfig
+++ b/drivers/dma/mediatek/Kconfig
@@ -24,3 +24,14 @@ config MTK_CQDMA
 
 	  This controller provides the channels which is dedicated to
 	  memory-to-memory transfer to offload from CPU.
+
+config MTK_UART_APDMA
+	tristate "MediaTek SoCs APDMA support for UART"
+	depends on OF && SERIAL_8250_MT6577
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Support for the UART DMA engine found on MediaTek MTK SoCs.
+	  When SERIAL_8250_MT6577 is enabled, and if you want to use DMA,
+	  you can enable the config. The DMA engine can only be used
+	  with MediaTek SoCs.
diff --git a/drivers/dma/mediatek/Makefile b/drivers/dma/mediatek/Makefile
index 41bb381..61a6d29 100644
--- a/drivers/dma/mediatek/Makefile
+++ b/drivers/dma/mediatek/Makefile
@@ -1,2 +1,3 @@
+obj-$(CONFIG_MTK_UART_APDMA) += mtk-uart-apdma.o
 obj-$(CONFIG_MTK_HSDMA) += mtk-hsdma.o
 obj-$(CONFIG_MTK_CQDMA) += mtk-cqdma.o
diff --git a/drivers/dma/mediatek/mtk-uart-apdma.c b/drivers/dma/mediatek/mtk-uart-apdma.c
new file mode 100644
index 0000000..546995c
--- /dev/null
+++ b/drivers/dma/mediatek/mtk-uart-apdma.c
@@ -0,0 +1,666 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * MediaTek UART APDMA driver.
+ *
+ * Copyright (c) 2019 MediaTek Inc.
+ * Author: Long Cheng <long.cheng@mediatek.com>
+ */
+
+#include <linux/clk.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/of_dma.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "../virt-dma.h"
+
+/* The default number of virtual channel */
+#define MTK_UART_APDMA_NR_VCHANS	8
+
+#define VFF_EN_B		BIT(0)
+#define VFF_STOP_B		BIT(0)
+#define VFF_FLUSH_B		BIT(0)
+#define VFF_4G_EN_B		BIT(0)
+/* rx valid size >=  vff thre */
+#define VFF_RX_INT_EN_B		(BIT(0) | BIT(1))
+/* tx left size >= vff thre */
+#define VFF_TX_INT_EN_B		BIT(0)
+#define VFF_WARM_RST_B		BIT(0)
+#define VFF_RX_INT_CLR_B	(BIT(0) | BIT(1))
+#define VFF_TX_INT_CLR_B	0
+#define VFF_STOP_CLR_B		0
+#define VFF_EN_CLR_B		0
+#define VFF_INT_EN_CLR_B	0
+#define VFF_4G_SUPPORT_CLR_B	0
+
+/*
+ * interrupt trigger level for tx
+ * if threshold is n, no polling is required to start tx.
+ * otherwise need polling VFF_FLUSH.
+ */
+#define VFF_TX_THRE(n)		(n)
+/* interrupt trigger level for rx */
+#define VFF_RX_THRE(n)		((n) * 3 / 4)
+
+#define VFF_RING_SIZE	0xffff
+/* invert this bit when wrap ring head again */
+#define VFF_RING_WRAP	0x10000
+
+#define VFF_INT_FLAG		0x00
+#define VFF_INT_EN		0x04
+#define VFF_EN			0x08
+#define VFF_RST			0x0c
+#define VFF_STOP		0x10
+#define VFF_FLUSH		0x14
+#define VFF_ADDR		0x1c
+#define VFF_LEN			0x24
+#define VFF_THRE		0x28
+#define VFF_WPT			0x2c
+#define VFF_RPT			0x30
+/* TX: the buffer size HW can read. RX: the buffer size SW can read. */
+#define VFF_VALID_SIZE		0x3c
+/* TX: the buffer size SW can write. RX: the buffer size HW can write. */
+#define VFF_LEFT_SIZE		0x40
+#define VFF_DEBUG_STATUS	0x50
+#define VFF_4G_SUPPORT		0x54
+
+struct mtk_uart_apdmadev {
+	struct dma_device ddev;
+	struct clk *clk;
+	bool support_33bits;
+	unsigned int dma_requests;
+};
+
+struct mtk_uart_apdma_desc {
+	struct virt_dma_desc vd;
+
+	dma_addr_t addr;
+	unsigned int avail_len;
+};
+
+struct mtk_chan {
+	struct virt_dma_chan vc;
+	struct dma_slave_config	cfg;
+	struct mtk_uart_apdma_desc *desc;
+	enum dma_transfer_direction dir;
+
+	void __iomem *base;
+	unsigned int irq;
+
+	unsigned int rx_status;
+};
+
+static inline struct mtk_uart_apdmadev *
+to_mtk_uart_apdma_dev(struct dma_device *d)
+{
+	return container_of(d, struct mtk_uart_apdmadev, ddev);
+}
+
+static inline struct mtk_chan *to_mtk_uart_apdma_chan(struct dma_chan *c)
+{
+	return container_of(c, struct mtk_chan, vc.chan);
+}
+
+static inline struct mtk_uart_apdma_desc *to_mtk_uart_apdma_desc
+	(struct dma_async_tx_descriptor *t)
+{
+	return container_of(t, struct mtk_uart_apdma_desc, vd.tx);
+}
+
+static void mtk_uart_apdma_write(struct mtk_chan *c,
+			       unsigned int reg, unsigned int val)
+{
+	writel(val, c->base + reg);
+}
+
+static unsigned int mtk_uart_apdma_read(struct mtk_chan *c, unsigned int reg)
+{
+	return readl(c->base + reg);
+}
+
+static void mtk_uart_apdma_desc_free(struct virt_dma_desc *vd)
+{
+	struct dma_chan *chan = vd->tx.chan;
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+	kfree(c->desc);
+}
+
+static void mtk_uart_apdma_start_tx(struct mtk_chan *c)
+{
+	struct mtk_uart_apdmadev *mtkd =
+				to_mtk_uart_apdma_dev(c->vc.chan.device);
+	struct mtk_uart_apdma_desc *d = c->desc;
+	unsigned int wpt, vff_sz;
+
+	vff_sz = c->cfg.dst_port_window_size;
+	if (!mtk_uart_apdma_read(c, VFF_LEN)) {
+		mtk_uart_apdma_write(c, VFF_ADDR, d->addr);
+		mtk_uart_apdma_write(c, VFF_LEN, vff_sz);
+		mtk_uart_apdma_write(c, VFF_THRE, VFF_TX_THRE(vff_sz));
+		mtk_uart_apdma_write(c, VFF_WPT, 0);
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+
+		if (mtkd->support_33bits)
+			mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_EN_B);
+	}
+
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_B);
+	if (mtk_uart_apdma_read(c, VFF_EN) != VFF_EN_B)
+		dev_err(c->vc.chan.device->dev, "Enable TX fail\n");
+
+	if (!mtk_uart_apdma_read(c, VFF_LEFT_SIZE)) {
+		mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
+		return;
+	}
+
+	wpt = mtk_uart_apdma_read(c, VFF_WPT);
+
+	wpt += c->desc->avail_len;
+	if ((wpt & VFF_RING_SIZE) == vff_sz)
+		wpt = (wpt & VFF_RING_WRAP) ^ VFF_RING_WRAP;
+
+	/* Let DMA start moving data */
+	mtk_uart_apdma_write(c, VFF_WPT, wpt);
+
+	/* HW auto set to 0 when left size >= threshold */
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
+	if (!mtk_uart_apdma_read(c, VFF_FLUSH))
+		mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
+}
+
+static void mtk_uart_apdma_start_rx(struct mtk_chan *c)
+{
+	struct mtk_uart_apdmadev *mtkd =
+				to_mtk_uart_apdma_dev(c->vc.chan.device);
+	struct mtk_uart_apdma_desc *d = c->desc;
+	unsigned int vff_sz;
+
+	vff_sz = c->cfg.src_port_window_size;
+	if (!mtk_uart_apdma_read(c, VFF_LEN)) {
+		mtk_uart_apdma_write(c, VFF_ADDR, d->addr);
+		mtk_uart_apdma_write(c, VFF_LEN, vff_sz);
+		mtk_uart_apdma_write(c, VFF_THRE, VFF_RX_THRE(vff_sz));
+		mtk_uart_apdma_write(c, VFF_RPT, 0);
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+
+		if (mtkd->support_33bits)
+			mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_EN_B);
+	}
+
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_RX_INT_EN_B);
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_B);
+	if (mtk_uart_apdma_read(c, VFF_EN) != VFF_EN_B)
+		dev_err(c->vc.chan.device->dev, "Enable RX fail\n");
+}
+
+static void mtk_uart_apdma_tx_handler(struct mtk_chan *c)
+{
+	struct mtk_uart_apdma_desc *d = c->desc;
+
+	mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
+
+	list_del(&d->vd.node);
+	vchan_cookie_complete(&d->vd);
+}
+
+static void mtk_uart_apdma_rx_handler(struct mtk_chan *c)
+{
+	struct mtk_uart_apdma_desc *d = c->desc;
+	unsigned int len, wg, rg;
+	int cnt;
+
+	mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+
+	if (!mtk_uart_apdma_read(c, VFF_VALID_SIZE))
+		return;
+
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+	len = c->cfg.src_port_window_size;
+	rg = mtk_uart_apdma_read(c, VFF_RPT);
+	wg = mtk_uart_apdma_read(c, VFF_WPT);
+	cnt = (wg & VFF_RING_SIZE) - (rg & VFF_RING_SIZE);
+
+	/*
+	 * The buffer is ring buffer. If wrap bit different,
+	 * represents the start of the next cycle for WPT
+	 */
+	if ((rg ^ wg) & VFF_RING_WRAP)
+		cnt += len;
+
+	c->rx_status = d->avail_len - cnt;
+	mtk_uart_apdma_write(c, VFF_RPT, wg);
+
+	list_del(&d->vd.node);
+	vchan_cookie_complete(&d->vd);
+}
+
+static irqreturn_t mtk_uart_apdma_irq_handler(int irq, void *dev_id)
+{
+	struct dma_chan *chan = (struct dma_chan *)dev_id;
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+	if (c->dir == DMA_DEV_TO_MEM)
+		mtk_uart_apdma_rx_handler(c);
+	else if (c->dir == DMA_MEM_TO_DEV)
+		mtk_uart_apdma_tx_handler(c);
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static int mtk_uart_apdma_alloc_chan_resources(struct dma_chan *chan)
+{
+	struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned int status;
+	int ret;
+
+	ret = pm_runtime_get_sync(mtkd->ddev.dev);
+	if (ret < 0) {
+		pm_runtime_put_noidle(chan->device->dev);
+		return ret;
+	}
+
+	mtk_uart_apdma_write(c, VFF_ADDR, 0);
+	mtk_uart_apdma_write(c, VFF_THRE, 0);
+	mtk_uart_apdma_write(c, VFF_LEN, 0);
+	mtk_uart_apdma_write(c, VFF_RST, VFF_WARM_RST_B);
+
+	ret = readx_poll_timeout(readl, c->base + VFF_EN,
+			  status, !status, 10, 100);
+	if (ret)
+		return ret;
+
+	ret = request_irq(c->irq, mtk_uart_apdma_irq_handler,
+			  IRQF_TRIGGER_NONE, KBUILD_MODNAME, chan);
+	if (ret < 0) {
+		dev_err(chan->device->dev, "Can't request dma IRQ\n");
+		return -EINVAL;
+	}
+
+	if (mtkd->support_33bits)
+		mtk_uart_apdma_write(c, VFF_4G_SUPPORT, VFF_4G_SUPPORT_CLR_B);
+
+	return ret;
+}
+
+static void mtk_uart_apdma_free_chan_resources(struct dma_chan *chan)
+{
+	struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+	free_irq(c->irq, chan);
+
+	tasklet_kill(&c->vc.task);
+
+	vchan_free_chan_resources(&c->vc);
+
+	pm_runtime_put_sync(mtkd->ddev.dev);
+}
+
+static enum dma_status mtk_uart_apdma_tx_status(struct dma_chan *chan,
+					 dma_cookie_t cookie,
+					 struct dma_tx_state *txstate)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	enum dma_status ret;
+
+	ret = dma_cookie_status(chan, cookie, txstate);
+	if (!txstate)
+		return ret;
+
+	dma_set_residue(txstate, c->rx_status);
+
+	return ret;
+}
+
+/*
+ * dmaengine_prep_slave_single will call the function. and sglen is 1.
+ * 8250 uart using one ring buffer, and deal with one sg.
+ */
+static struct dma_async_tx_descriptor *mtk_uart_apdma_prep_slave_sg
+	(struct dma_chan *chan, struct scatterlist *sgl,
+	unsigned int sglen, enum dma_transfer_direction dir,
+	unsigned long tx_flags, void *context)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	struct mtk_uart_apdma_desc *d;
+
+	if (!is_slave_direction(dir) || sglen != 1)
+		return NULL;
+
+	/* Now allocate and setup the descriptor */
+	d = kzalloc(sizeof(*d), GFP_ATOMIC);
+	if (!d)
+		return NULL;
+
+	d->avail_len = sg_dma_len(sgl);
+	d->addr = sg_dma_address(sgl);
+	c->dir = dir;
+
+	return vchan_tx_prep(&c->vc, &d->vd, tx_flags);
+}
+
+static void mtk_uart_apdma_issue_pending(struct dma_chan *chan)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	struct virt_dma_desc *vd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+	if (vchan_issue_pending(&c->vc)) {
+		vd = vchan_next_desc(&c->vc);
+		c->desc = to_mtk_uart_apdma_desc(&vd->tx);
+
+		if (c->dir == DMA_DEV_TO_MEM)
+			mtk_uart_apdma_start_rx(c);
+		else if (c->dir == DMA_MEM_TO_DEV)
+			mtk_uart_apdma_start_tx(c);
+	}
+
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+}
+
+static int mtk_uart_apdma_slave_config(struct dma_chan *chan,
+				   struct dma_slave_config *config)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+
+	memcpy(&c->cfg, config, sizeof(*config));
+
+	return 0;
+}
+
+static int mtk_uart_apdma_terminate_all(struct dma_chan *chan)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned long flags;
+	unsigned int status;
+	LIST_HEAD(head);
+	int ret;
+
+	mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
+
+	ret = readx_poll_timeout(readl, c->base + VFF_FLUSH,
+			  status, status != VFF_FLUSH_B, 10, 100);
+	if (ret)
+		dev_err(c->vc.chan.device->dev, "flush: fail, status=0x%x\n",
+			mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+	/*
+	 * Stop need 3 steps.
+	 * 1. set stop to 1
+	 * 2. wait en to 0
+	 * 3. set stop as 0
+	 */
+	mtk_uart_apdma_write(c, VFF_STOP, VFF_STOP_B);
+	ret = readx_poll_timeout(readl, c->base + VFF_EN,
+			  status, !status, 10, 100);
+	if (ret)
+		dev_err(c->vc.chan.device->dev, "stop: fail, status=0x%x\n",
+			mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
+
+	mtk_uart_apdma_write(c, VFF_STOP, VFF_STOP_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+	if (c->dir == DMA_DEV_TO_MEM)
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_RX_INT_CLR_B);
+	else if (c->dir == DMA_MEM_TO_DEV)
+		mtk_uart_apdma_write(c, VFF_INT_FLAG, VFF_TX_INT_CLR_B);
+
+	synchronize_irq(c->irq);
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+	vchan_get_all_descriptors(&c->vc, &head);
+	vchan_dma_desc_free_list(&c->vc, &head);
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+
+	return 0;
+}
+
+static int mtk_uart_apdma_device_pause(struct dma_chan *chan)
+{
+	struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&c->vc.lock, flags);
+
+	mtk_uart_apdma_write(c, VFF_EN, VFF_EN_CLR_B);
+	mtk_uart_apdma_write(c, VFF_INT_EN, VFF_INT_EN_CLR_B);
+
+	synchronize_irq(c->irq);
+
+	spin_unlock_irqrestore(&c->vc.lock, flags);
+
+	return 0;
+}
+
+static void mtk_uart_apdma_free(struct mtk_uart_apdmadev *mtkd)
+{
+	while (!list_empty(&mtkd->ddev.channels)) {
+		struct mtk_chan *c = list_first_entry(&mtkd->ddev.channels,
+			struct mtk_chan, vc.chan.device_node);
+
+		list_del(&c->vc.chan.device_node);
+		tasklet_kill(&c->vc.task);
+	}
+}
+
+static const struct of_device_id mtk_uart_apdma_match[] = {
+	{ .compatible = "mediatek,mt6577-uart-dma", },
+	{ /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, mtk_uart_apdma_match);
+
+static int mtk_uart_apdma_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct mtk_uart_apdmadev *mtkd;
+	int bit_mask = 32, rc;
+	struct resource *res;
+	struct mtk_chan *c;
+	unsigned int i;
+
+	mtkd = devm_kzalloc(&pdev->dev, sizeof(*mtkd), GFP_KERNEL);
+	if (!mtkd)
+		return -ENOMEM;
+
+	mtkd->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(mtkd->clk)) {
+		dev_err(&pdev->dev, "No clock specified\n");
+		rc = PTR_ERR(mtkd->clk);
+		return rc;
+	}
+
+	if (of_property_read_bool(np, "mediatek,dma-33bits"))
+		mtkd->support_33bits = true;
+
+	if (mtkd->support_33bits)
+		bit_mask = 33;
+
+	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(bit_mask));
+	if (rc)
+		return rc;
+
+	dma_cap_set(DMA_SLAVE, mtkd->ddev.cap_mask);
+	mtkd->ddev.device_alloc_chan_resources =
+				mtk_uart_apdma_alloc_chan_resources;
+	mtkd->ddev.device_free_chan_resources =
+				mtk_uart_apdma_free_chan_resources;
+	mtkd->ddev.device_tx_status = mtk_uart_apdma_tx_status;
+	mtkd->ddev.device_issue_pending = mtk_uart_apdma_issue_pending;
+	mtkd->ddev.device_prep_slave_sg = mtk_uart_apdma_prep_slave_sg;
+	mtkd->ddev.device_config = mtk_uart_apdma_slave_config;
+	mtkd->ddev.device_pause = mtk_uart_apdma_device_pause;
+	mtkd->ddev.device_terminate_all = mtk_uart_apdma_terminate_all;
+	mtkd->ddev.src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE);
+	mtkd->ddev.dst_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE);
+	mtkd->ddev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
+	mtkd->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
+	mtkd->ddev.dev = &pdev->dev;
+	INIT_LIST_HEAD(&mtkd->ddev.channels);
+
+	mtkd->dma_requests = MTK_UART_APDMA_NR_VCHANS;
+	if (of_property_read_u32(np, "dma-requests", &mtkd->dma_requests)) {
+		dev_info(&pdev->dev,
+			 "Using %u as missing dma-requests property\n",
+			 MTK_UART_APDMA_NR_VCHANS);
+	}
+
+	for (i = 0; i < mtkd->dma_requests; i++) {
+		c = devm_kzalloc(mtkd->ddev.dev, sizeof(*c), GFP_KERNEL);
+		if (!c) {
+			rc = -ENODEV;
+			goto err_no_dma;
+		}
+
+		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+		if (!res) {
+			rc = -ENODEV;
+			goto err_no_dma;
+		}
+
+		c->base = devm_ioremap_resource(&pdev->dev, res);
+		if (IS_ERR(c->base)) {
+			rc = PTR_ERR(c->base);
+			goto err_no_dma;
+		}
+		c->vc.desc_free = mtk_uart_apdma_desc_free;
+		vchan_init(&c->vc, &mtkd->ddev);
+
+		rc = platform_get_irq(pdev, i);
+		if (rc < 0) {
+			dev_err(&pdev->dev, "failed to get IRQ[%d]\n", i);
+			goto err_no_dma;
+		}
+		c->irq = rc;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+	pm_runtime_set_active(&pdev->dev);
+
+	rc = dma_async_device_register(&mtkd->ddev);
+	if (rc)
+		goto rpm_disable;
+
+	platform_set_drvdata(pdev, mtkd);
+
+	/* Device-tree DMA controller registration */
+	rc = of_dma_controller_register(np, of_dma_xlate_by_chan_id, mtkd);
+	if (rc)
+		goto dma_remove;
+
+	return rc;
+
+dma_remove:
+	dma_async_device_unregister(&mtkd->ddev);
+rpm_disable:
+	pm_runtime_disable(&pdev->dev);
+err_no_dma:
+	mtk_uart_apdma_free(mtkd);
+	return rc;
+}
+
+static int mtk_uart_apdma_remove(struct platform_device *pdev)
+{
+	struct mtk_uart_apdmadev *mtkd = platform_get_drvdata(pdev);
+
+	of_dma_controller_free(pdev->dev.of_node);
+
+	mtk_uart_apdma_free(mtkd);
+
+	dma_async_device_unregister(&mtkd->ddev);
+
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int mtk_uart_apdma_suspend(struct device *dev)
+{
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	if (!pm_runtime_suspended(dev))
+		clk_disable_unprepare(mtkd->clk);
+
+	return 0;
+}
+
+static int mtk_uart_apdma_resume(struct device *dev)
+{
+	int ret;
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	if (!pm_runtime_suspended(dev)) {
+		ret = clk_prepare_enable(mtkd->clk);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
+#ifdef CONFIG_PM
+static int mtk_uart_apdma_runtime_suspend(struct device *dev)
+{
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	clk_disable_unprepare(mtkd->clk);
+
+	return 0;
+}
+
+static int mtk_uart_apdma_runtime_resume(struct device *dev)
+{
+	int ret;
+	struct mtk_uart_apdmadev *mtkd = dev_get_drvdata(dev);
+
+	ret = clk_prepare_enable(mtkd->clk);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+#endif /* CONFIG_PM */
+
+static const struct dev_pm_ops mtk_uart_apdma_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(mtk_uart_apdma_suspend, mtk_uart_apdma_resume)
+	SET_RUNTIME_PM_OPS(mtk_uart_apdma_runtime_suspend,
+			   mtk_uart_apdma_runtime_resume, NULL)
+};
+
+static struct platform_driver mtk_uart_apdma_driver = {
+	.probe	= mtk_uart_apdma_probe,
+	.remove	= mtk_uart_apdma_remove,
+	.driver = {
+		.name		= KBUILD_MODNAME,
+		.pm		= &mtk_uart_apdma_pm_ops,
+		.of_match_table = of_match_ptr(mtk_uart_apdma_match),
+	},
+};
+
+module_platform_driver(mtk_uart_apdma_driver);
+
+MODULE_DESCRIPTION("MediaTek UART APDMA Controller Driver");
+MODULE_AUTHOR("Long Cheng <long.cheng@mediatek.com>");
+MODULE_LICENSE("GPL v2");

^ permalink raw reply related

* Re: [Bug 203435] New: Xilinx pcie endpoint DMA not functional for ZynqMP PS AXI-PXI DMA endpoint
From: Bjorn Helgaas @ 2019-04-26 16:14 UTC (permalink / raw)
  To: Michal Simek; +Cc: Bjorn Helgaas, Linux PCI, dtyree, Vinod Koul, dmaengine
In-Reply-To: <38d59480-3057-9352-677e-a365ad526990@xilinx.com>

On Fri, Apr 26, 2019 at 11:11 AM Michal Simek <michal.simek@xilinx.com> wrote:
>
> Hi Bjorn,
>
> On 26. 04. 19 6:21, Bjorn Helgaas wrote:
> > [+cc linux-pci, Xilinx and DMA engine folks]
> >
> > I'm not sure what to do with this report.  It mentions "vivado" and
> > xilinx_ps_pcie_platform.c, neither of which seems to be in mainline or
> > linux-next.  A Google search suggests that Vivado is a Xilinx product;
> > maybe this bug report should be directed there?
>
> vivado is xilinx design tool for programmable logic and soc setup.
>
> And this driver is not in mainline but just in xilinx git tree.
> It means you can simply close this.
> And author should contact Xilinx via forums or via git@xilinx.com if
> there is any issue with this driver.

Great, thanks!  I've closed the upstream bugzilla.

^ permalink raw reply

* Re: [Bug 203435] New: Xilinx pcie endpoint DMA not functional for ZynqMP PS AXI-PXI DMA endpoint
From: Michal Simek @ 2019-04-26 16:11 UTC (permalink / raw)
  To: bjorn, linux-pci; +Cc: dtyree, Michal Simek, Vinod Koul, dmaengine
In-Reply-To: <CABhMZUV2bSHUeZErR8=AQrBvuJHeKEMfQJbGTVRgcxZcXp6i2g@mail.gmail.com>

Hi Bjorn,

On 26. 04. 19 6:21, Bjorn Helgaas wrote:
> [+cc linux-pci, Xilinx and DMA engine folks]
> 
> I'm not sure what to do with this report.  It mentions "vivado" and
> xilinx_ps_pcie_platform.c, neither of which seems to be in mainline or
> linux-next.  A Google search suggests that Vivado is a Xilinx product;
> maybe this bug report should be directed there?

vivado is xilinx design tool for programmable logic and soc setup.

And this driver is not in mainline but just in xilinx git tree.
It means you can simply close this.
And author should contact Xilinx via forums or via git@xilinx.com if
there is any issue with this driver.

Thanks,
Michal

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Thierry Reding @ 2019-04-26 15:11 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jon Hunter, Laxman Dewangan, Vinod Koul, dmaengine, linux-tegra,
	linux-kernel
In-Reply-To: <d6d1e420-6707-f446-a531-4b38e4e82c19@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5456 bytes --]

On Fri, Apr 26, 2019 at 04:03:08PM +0300, Dmitry Osipenko wrote:
> 26.04.2019 15:42, Dmitry Osipenko пишет:
> > 26.04.2019 15:18, Dmitry Osipenko пишет:
> >> 26.04.2019 14:13, Jon Hunter пишет:
> >>>
> >>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
> >>>> 26.04.2019 12:52, Jon Hunter пишет:
> >>>>>
> >>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
> >>>>>> The readl/writel functions are inserting memory barrier in order to
> >>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
> >>>>>> results in L2 cache syncing which isn't a cheapest operation. The
> >>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
> >>>>>> accesses, hence use the relaxed versions of the functions.
> >>>>>
> >>>>> Do you mean device-io accesses here as this is not generic memory?
> >>>>
> >>>> Yes. The IOMEM accesses within are always ordered and uncached, while
> >>>> generic memory accesses are out-of-order and cached.
> >>>>
> >>>>> Although there may not be any issues with this change, I think I need a
> >>>>> bit more convincing that we should do this given that we have had it
> >>>>> this way for sometime and I would not like to see us introduce any
> >>>>> regressions as this point without being 100% certain we would not.
> >>>>> Ideally, if I had some good extensive tests I could run to hammer the
> >>>>> DMA for all configurations with different combinations of channels
> >>>>> running simultaneously then we could test this, but right now I don't :-(
> >>>>>
> >>>>> Have you ...
> >>>>> 1. Tested both cyclic and scatter-gather transfers?
> >>>>> 2. Stress tested simultaneous transfers with various different
> >>>>>    configurations?
> >>>>> 3. Quantified the actual performance benefit of this change so we can
> >>>>>    understand how much of a performance boost this offers?
> >>>>
> >>>> Actually I found a case where this change causes a problem, I'm seeing
> >>>> I2C transfer timeout for touchscreen and it breaks the touch input.
> >>>> Indeed, I haven't tested this patch very well.
> >>>>
> >>>> And the fix is this:
> >>>>
> >>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
> >>>> *dev)
> >>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
> >>>>  	}
> >>>>
> >>>> +	dsb();
> >>>> +
> >>>>  	clk_disable_unprepare(tdma->dma_clk);
> >>>>
> >>>>  	return 0;
> >>>>
> >>>>
> >>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
> >>>> incoherent and CPU disables clock before writes are reaching DMA controller.
> >>>>
> >>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
> >>>> made some more testing of simultaneous transfers.
> >>>>
> >>>> Quantifying performance probably won't be easy to make as the DMA
> >>>> read/writes are not on any kind of code's hot-path.
> >>>
> >>> So why make the change?
> >>
> >> For consistency.
> >>
> >>>> Jon, are you still insisting about to drop this patch or you will be
> >>>> fine with the v2 that will have the dsb() in place?
> >>>
> >>> If we can't quantify the performance gain, then it is difficult to
> >>> justify the change. I would also be concerned if that is the only place
> >>> we need an explicit dsb.
> >>
> >> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
> >> this patch for now.
> >>
> > 
> > Jon, it occurred to me that there still should be a problem with the
> > writel() ordering in the driver because writel() ensures that memory
> > stores are completed *before* the write occurs and hence translates into
> > iowmb() + writel_relaxed() [0]. Thus the last write will always happen
> > asynchronously in regards to clk accesses.
> > 
> > [0]
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311
> > 
> 
> Also please note that iowmb() translates into wmb() if
> CONFIG_ARM_DMA_MEM_BUFFERABLE=y and sometime ago I was profiling host1x
> driver job submission performance and have seen cases where wmb() could
> take up to 1ms on T20 due to L2 syncing if there are outstanding memory
> writes in the cache (or even more, I don't remember exactly already how
> bad it was..).

This looks to be primarily caused by the fact that we have the L2X0
cache on Tegra20. So there's not really anything that can be done there
without potentially compromising correctness of the code.

> Altogether, I think the usage of readl/writel in pretty much all of
> Tegra drivers is plainly wrong and explicit dsb() shall be used in
> places where hardware synchronization is really needed.

I don't think that's an accurate observation. readl()/writel() are more
likely to be correct than the relaxed versions. You already saw yourself
that using the relaxed versions can easily introduce regressions.

Granted, readl()/writel() might add more memory barriers than strictly
necessary, and therefore they might in many cases be suboptimal. But, we
can't just go and engage in a wholesale conversion of all drivers. If we
do this, we need to very carefully audit every conversion to make sure
no regressions are introduced. This is especially complicated because
these would be subtle regressions and may be difficult to catch or
reproduce.

Also, we should avoid using primitives such as dsb in driver code to
avoid making the code too architecture specific.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Thierry Reding @ 2019-04-26 15:11 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jon Hunter, Laxman Dewangan, Vinod Koul, dmaengine, linux-tegra,
	linux-kernel

On Fri, Apr 26, 2019 at 04:03:08PM +0300, Dmitry Osipenko wrote:
> 26.04.2019 15:42, Dmitry Osipenko пишет:
> > 26.04.2019 15:18, Dmitry Osipenko пишет:
> >> 26.04.2019 14:13, Jon Hunter пишет:
> >>>
> >>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
> >>>> 26.04.2019 12:52, Jon Hunter пишет:
> >>>>>
> >>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
> >>>>>> The readl/writel functions are inserting memory barrier in order to
> >>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
> >>>>>> results in L2 cache syncing which isn't a cheapest operation. The
> >>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
> >>>>>> accesses, hence use the relaxed versions of the functions.
> >>>>>
> >>>>> Do you mean device-io accesses here as this is not generic memory?
> >>>>
> >>>> Yes. The IOMEM accesses within are always ordered and uncached, while
> >>>> generic memory accesses are out-of-order and cached.
> >>>>
> >>>>> Although there may not be any issues with this change, I think I need a
> >>>>> bit more convincing that we should do this given that we have had it
> >>>>> this way for sometime and I would not like to see us introduce any
> >>>>> regressions as this point without being 100% certain we would not.
> >>>>> Ideally, if I had some good extensive tests I could run to hammer the
> >>>>> DMA for all configurations with different combinations of channels
> >>>>> running simultaneously then we could test this, but right now I don't :-(
> >>>>>
> >>>>> Have you ...
> >>>>> 1. Tested both cyclic and scatter-gather transfers?
> >>>>> 2. Stress tested simultaneous transfers with various different
> >>>>>    configurations?
> >>>>> 3. Quantified the actual performance benefit of this change so we can
> >>>>>    understand how much of a performance boost this offers?
> >>>>
> >>>> Actually I found a case where this change causes a problem, I'm seeing
> >>>> I2C transfer timeout for touchscreen and it breaks the touch input.
> >>>> Indeed, I haven't tested this patch very well.
> >>>>
> >>>> And the fix is this:
> >>>>
> >>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
> >>>> *dev)
> >>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
> >>>>  	}
> >>>>
> >>>> +	dsb();
> >>>> +
> >>>>  	clk_disable_unprepare(tdma->dma_clk);
> >>>>
> >>>>  	return 0;
> >>>>
> >>>>
> >>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
> >>>> incoherent and CPU disables clock before writes are reaching DMA controller.
> >>>>
> >>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
> >>>> made some more testing of simultaneous transfers.
> >>>>
> >>>> Quantifying performance probably won't be easy to make as the DMA
> >>>> read/writes are not on any kind of code's hot-path.
> >>>
> >>> So why make the change?
> >>
> >> For consistency.
> >>
> >>>> Jon, are you still insisting about to drop this patch or you will be
> >>>> fine with the v2 that will have the dsb() in place?
> >>>
> >>> If we can't quantify the performance gain, then it is difficult to
> >>> justify the change. I would also be concerned if that is the only place
> >>> we need an explicit dsb.
> >>
> >> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
> >> this patch for now.
> >>
> > 
> > Jon, it occurred to me that there still should be a problem with the
> > writel() ordering in the driver because writel() ensures that memory
> > stores are completed *before* the write occurs and hence translates into
> > iowmb() + writel_relaxed() [0]. Thus the last write will always happen
> > asynchronously in regards to clk accesses.
> > 
> > [0]
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311
> > 
> 
> Also please note that iowmb() translates into wmb() if
> CONFIG_ARM_DMA_MEM_BUFFERABLE=y and sometime ago I was profiling host1x
> driver job submission performance and have seen cases where wmb() could
> take up to 1ms on T20 due to L2 syncing if there are outstanding memory
> writes in the cache (or even more, I don't remember exactly already how
> bad it was..).

This looks to be primarily caused by the fact that we have the L2X0
cache on Tegra20. So there's not really anything that can be done there
without potentially compromising correctness of the code.

> Altogether, I think the usage of readl/writel in pretty much all of
> Tegra drivers is plainly wrong and explicit dsb() shall be used in
> places where hardware synchronization is really needed.

I don't think that's an accurate observation. readl()/writel() are more
likely to be correct than the relaxed versions. You already saw yourself
that using the relaxed versions can easily introduce regressions.

Granted, readl()/writel() might add more memory barriers than strictly
necessary, and therefore they might in many cases be suboptimal. But, we
can't just go and engage in a wholesale conversion of all drivers. If we
do this, we need to very carefully audit every conversion to make sure
no regressions are introduced. This is especially complicated because
these would be subtle regressions and may be difficult to catch or
reproduce.

Also, we should avoid using primitives such as dsb in driver code to
avoid making the code too architecture specific.

Thierry

^ permalink raw reply

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
From: Arnaud Pouliquen @ 2019-04-26 13:41 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel,
	dmaengine
In-Reply-To: <20190426121751.GC28103@vkoul-mobl>

Hi Vinod

On 4/26/19 2:17 PM, Vinod Koul wrote:
> Hi Arnaud,
> 
> Sorry for delay in review, the conference travel/vacation plan delayed
> this.
no problem, just a rememder to be sure that you not missed it in the
patch stream.

> 
> On 27-03-19, 13:21, Arnaud Pouliquen wrote:
>> During residue calculation. the DMA can switch to the next sg. When
>> this race condition occurs, the residue returned value is not valid.
>> Indeed the position in the sg returned by the hardware is the position
>> of the next sg, not the current sg.
>> Solution is to check the sg after the calculation to verify it.
>> If a transition is detected we consider that the DMA has switched to
>> the beginning of next sg.
> 
> Now, that sounds like duct tape. Why should we bother doing that.
> 
> Also looking back at the stm32_dma_desc_residue() and calls to it from
> stm32_dma_tx_status() am not sure we are doing the right thing
Please, could you explain what you have in mind here?

> 
> why are we looking at next_sg here, can you explain me that please

This solution is similar to one implemented in the at_hdmac.c driver
(atc_get_bytes_left function).

Yes could be consider as a workaround for a hardware issue...

In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
mode is mainly use for audio transfer initiated by an ALSA driver.

From hardware point of view the DMA transfers first block based on sg1,
then it updates registers to prepare sg2 transfer, and then generates an
IRQ to inform that it issues the next transfer (sg2).

Then driver can update sg1 to prepare the third transfer...

In parallel the client driver can requests status to get the residue to
update internal pointer.
The issue is in the race condition between the call of the
device_tx_status ops and the update of the DMA register on sg switch.

During a short time the hardware updated the registers containing the
sg ID but not the transfer counter(SxNDTR). In this case there is a
mismatch between the Sg ID and the associated transfer counter.
So residue calculation is wrong.
Idea of this patch is to perform the calculation and then to crosscheck
that the hardware has not switched to the next sg during the
calculation. The way to crosscheck is to compare the the sg ID before
and after the calculation.

I tested the solution to force a new recalculation but no real solution
to trust the registers during this phase. In this case an approximation
is to consider that the DMA is transferring the first bytes of the next sg.
So we return the residue corresponding to the beginning of the next buffer.

Don't hesitate if it is still not clear

Thanks
Arnaud

> 
>>
>> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>  drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 57 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
>> index 4903a40..30309d2 100644
>> --- a/drivers/dma/stm32-dma.c
>> +++ b/drivers/dma/stm32-dma.c
>> @@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>>  	return ndtr << width;
>>  }
>>  
>> +static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
>> +{
>> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
>> +	struct stm32_dma_sg_req *sg_req;
>> +	u32 dma_scr, dma_smar, id;
>> +
>> +	id = chan->id;
>> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
>> +
>> +	if (!(dma_scr & STM32_DMA_SCR_DBM))
>> +		return true;
>> +
>> +	sg_req = &chan->desc->sg_req[chan->next_sg];
>> +
>> +	if (dma_scr & STM32_DMA_SCR_CT) {
>> +		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
>> +		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
>> +	}
>> +
>> +	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
>> +
>> +	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
>> +}
>> +
>>  static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
>>  				     struct stm32_dma_desc *desc,
>>  				     u32 next_sg)
>>  {
>>  	u32 modulo, burst_size;
>> -	u32 residue = 0;
>> +	u32 residue;
>> +	u32 n_sg = next_sg;
>> +	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
>>  	int i;
>>  
>> +	residue = stm32_dma_get_remaining_bytes(chan);
>> +
>>  	/*
>> -	 * In cyclic mode, for the last period, residue = remaining bytes from
>> -	 * NDTR
>> +	 * Calculate the residue means compute the descriptors
>> +	 * information:
>> +	 * - the sg currently transferred
>> +	 * - the remaining position in this sg (NDTR).
>> +	 *
>> +	 * The issue is that a race condition can occur if DMA is
>> +	 * running. DMA can have started to transfer the next sg before
>> +	 * the position in sg is read. In this case the remaing position
>> +	 * can correspond to the new sg position.
>> +	 * The strategy implemented in the stm32 driver is to check the
>> +	 * sg transition. If detected we can not trust the SxNDTR register
>> +	 * value, this register can not be up to date during the transition.
>> +	 * In this case we can assume that the dma is at the beginning of next
>> +	 * sg so we calculate the residue in consequence.
>>  	 */
>> -	if (chan->desc->cyclic && next_sg == 0) {
>> -		residue = stm32_dma_get_remaining_bytes(chan);
>> -		goto end;
>> +
>> +	if (!stm32_dma_is_current_sg(chan)) {
>> +		n_sg++;
>> +		if (n_sg == chan->desc->num_sgs)
>> +			n_sg = 0;
>> +		residue = sg_req->len;
>>  	}
>>  
>>  	/*
>> -	 * For all other periods in cyclic mode, and in sg mode,
>> -	 * residue = remaining bytes from NDTR + remaining periods/sg to be
>> -	 * transferred
>> +	 * In cyclic mode, for the last period, residue = remaining bytes
>> +	 * from NDTR,
>> +	 * else for all other periods in cyclic mode, and in sg mode,
>> +	 * residue = remaining bytes from NDTR + remaining
>> +	 * periods/sg to be transferred
>>  	 */
>> -	for (i = next_sg; i < desc->num_sgs; i++)
>> -		residue += desc->sg_req[i].len;
>> -	residue += stm32_dma_get_remaining_bytes(chan);
>> +	if (!chan->desc->cyclic || n_sg != 0)
>> +		for (i = n_sg; i < desc->num_sgs; i++)
>> +			residue += desc->sg_req[i].len;
>>  
>> -end:
>>  	if (!chan->mem_burst)
>>  		return residue;
>>  
>> -- 
>> 2.7.4
> 

^ permalink raw reply

* dmaengine: stm32-dma: fix residue calculation in stm32-dma
From: Arnaud Pouliquen @ 2019-04-26 13:41 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel,
	dmaengine

Hi Vinod

On 4/26/19 2:17 PM, Vinod Koul wrote:
> Hi Arnaud,
> 
> Sorry for delay in review, the conference travel/vacation plan delayed
> this.
no problem, just a rememder to be sure that you not missed it in the
patch stream.

> 
> On 27-03-19, 13:21, Arnaud Pouliquen wrote:
>> During residue calculation. the DMA can switch to the next sg. When
>> this race condition occurs, the residue returned value is not valid.
>> Indeed the position in the sg returned by the hardware is the position
>> of the next sg, not the current sg.
>> Solution is to check the sg after the calculation to verify it.
>> If a transition is detected we consider that the DMA has switched to
>> the beginning of next sg.
> 
> Now, that sounds like duct tape. Why should we bother doing that.
> 
> Also looking back at the stm32_dma_desc_residue() and calls to it from
> stm32_dma_tx_status() am not sure we are doing the right thing
Please, could you explain what you have in mind here?

> 
> why are we looking at next_sg here, can you explain me that please

This solution is similar to one implemented in the at_hdmac.c driver
(atc_get_bytes_left function).

Yes could be consider as a workaround for a hardware issue...

In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
mode is mainly use for audio transfer initiated by an ALSA driver.

From hardware point of view the DMA transfers first block based on sg1,
then it updates registers to prepare sg2 transfer, and then generates an
IRQ to inform that it issues the next transfer (sg2).

Then driver can update sg1 to prepare the third transfer...

In parallel the client driver can requests status to get the residue to
update internal pointer.
The issue is in the race condition between the call of the
device_tx_status ops and the update of the DMA register on sg switch.

During a short time the hardware updated the registers containing the
sg ID but not the transfer counter(SxNDTR). In this case there is a
mismatch between the Sg ID and the associated transfer counter.
So residue calculation is wrong.
Idea of this patch is to perform the calculation and then to crosscheck
that the hardware has not switched to the next sg during the
calculation. The way to crosscheck is to compare the the sg ID before
and after the calculation.

I tested the solution to force a new recalculation but no real solution
to trust the registers during this phase. In this case an approximation
is to consider that the DMA is transferring the first bytes of the next sg.
So we return the residue corresponding to the beginning of the next buffer.

Don't hesitate if it is still not clear

Thanks
Arnaud

> 
>>
>> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>  drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 57 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
>> index 4903a40..30309d2 100644
>> --- a/drivers/dma/stm32-dma.c
>> +++ b/drivers/dma/stm32-dma.c
>> @@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>>  	return ndtr << width;
>>  }
>>  
>> +static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
>> +{
>> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
>> +	struct stm32_dma_sg_req *sg_req;
>> +	u32 dma_scr, dma_smar, id;
>> +
>> +	id = chan->id;
>> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
>> +
>> +	if (!(dma_scr & STM32_DMA_SCR_DBM))
>> +		return true;
>> +
>> +	sg_req = &chan->desc->sg_req[chan->next_sg];
>> +
>> +	if (dma_scr & STM32_DMA_SCR_CT) {
>> +		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
>> +		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
>> +	}
>> +
>> +	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
>> +
>> +	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
>> +}
>> +
>>  static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
>>  				     struct stm32_dma_desc *desc,
>>  				     u32 next_sg)
>>  {
>>  	u32 modulo, burst_size;
>> -	u32 residue = 0;
>> +	u32 residue;
>> +	u32 n_sg = next_sg;
>> +	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
>>  	int i;
>>  
>> +	residue = stm32_dma_get_remaining_bytes(chan);
>> +
>>  	/*
>> -	 * In cyclic mode, for the last period, residue = remaining bytes from
>> -	 * NDTR
>> +	 * Calculate the residue means compute the descriptors
>> +	 * information:
>> +	 * - the sg currently transferred
>> +	 * - the remaining position in this sg (NDTR).
>> +	 *
>> +	 * The issue is that a race condition can occur if DMA is
>> +	 * running. DMA can have started to transfer the next sg before
>> +	 * the position in sg is read. In this case the remaing position
>> +	 * can correspond to the new sg position.
>> +	 * The strategy implemented in the stm32 driver is to check the
>> +	 * sg transition. If detected we can not trust the SxNDTR register
>> +	 * value, this register can not be up to date during the transition.
>> +	 * In this case we can assume that the dma is at the beginning of next
>> +	 * sg so we calculate the residue in consequence.
>>  	 */
>> -	if (chan->desc->cyclic && next_sg == 0) {
>> -		residue = stm32_dma_get_remaining_bytes(chan);
>> -		goto end;
>> +
>> +	if (!stm32_dma_is_current_sg(chan)) {
>> +		n_sg++;
>> +		if (n_sg == chan->desc->num_sgs)
>> +			n_sg = 0;
>> +		residue = sg_req->len;
>>  	}
>>  
>>  	/*
>> -	 * For all other periods in cyclic mode, and in sg mode,
>> -	 * residue = remaining bytes from NDTR + remaining periods/sg to be
>> -	 * transferred
>> +	 * In cyclic mode, for the last period, residue = remaining bytes
>> +	 * from NDTR,
>> +	 * else for all other periods in cyclic mode, and in sg mode,
>> +	 * residue = remaining bytes from NDTR + remaining
>> +	 * periods/sg to be transferred
>>  	 */
>> -	for (i = next_sg; i < desc->num_sgs; i++)
>> -		residue += desc->sg_req[i].len;
>> -	residue += stm32_dma_get_remaining_bytes(chan);
>> +	if (!chan->desc->cyclic || n_sg != 0)
>> +		for (i = n_sg; i < desc->num_sgs; i++)
>> +			residue += desc->sg_req[i].len;
>>  
>> -end:
>>  	if (!chan->mem_burst)
>>  		return residue;
>>  
>> -- 
>> 2.7.4
>

^ permalink raw reply

* Re: [Bug 203435] New: Xilinx pcie endpoint DMA not functional for ZynqMP PS AXI-PXI DMA endpoint
From: Bjorn Helgaas @ 2019-04-26 13:21 UTC (permalink / raw)
  To: linux-pci; +Cc: dtyree, Michal Simek, Vinod Koul, dmaengine
In-Reply-To: <bug-203435-193951@https.bugzilla.kernel.org/>

[+cc linux-pci, Xilinx and DMA engine folks]

I'm not sure what to do with this report.  It mentions "vivado" and
xilinx_ps_pcie_platform.c, neither of which seems to be in mainline or
linux-next.  A Google search suggests that Vivado is a Xilinx product;
maybe this bug report should be directed there?

On Fri, Apr 26, 2019 at 7:09 AM <bugzilla-daemon@bugzilla.kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=203435
>
>             Bug ID: 203435
>            Summary: Xilinx pcie endpoint DMA not functional for ZynqMP PS
>                     AXI-PXI DMA endpoint
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 4.14.0
>           Hardware: Other
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PCI
>           Assignee: drivers_pci@kernel-bugs.osdl.org
>           Reporter: dtyree@newagemicro.com
>         Regression: No
>
> The ps_pcie_dma-1.00.a driver does not work as an endpoint. I'd be happy to
> help fix/enable/test this out, but I do not understand what the intent was.
>
> The rootdma option is the only one allowed / documented through device-tree and
> kernel docs. It does not cover the epdma branches in the code that require that
> drivers/dma/xilinx/xilinx_ps_pcie_platform.c:18:#define PLATFORM_DRIVER_NAME
>           "ps_pcie_pform_dma"
>
> be available. I am not sure if this was intended to be the endpoint.
>
> The vivado build for the pcie endpoint will not enable the endpoint DMA
> channels. I would expect that to be done through device-tree/PCIe BAR
> configuration reads (for sizes) to set up the channels, but I can't find
> anything that may other than the mystery of epdma with PLATFORM_DRIVER_NAME.
>
> Because the fsbl does not enable the AXI-DMA the rootdma channel enable check
> fails on line xilinx_ps_pcie_platform.c:1832
>         channel->chan_base =
>         (struct DMA_ENGINE_REGISTERS *)((__force char *)(xdev->reg_base) +
>                                  (channel_number * DMA_CHANNEL_REGS_SIZE));
>
>         if (((channel->chan_base->dma_channel_status) &
>                                 DMA_STATUS_DMA_PRES_BIT) == 0) {
>                 dev_err(&platform_dev->dev,
>                         "Hardware reports channel not present\n");
>                 return -ENOTSUPP;
>         }
>
> Is there a different way to access use the PS AXI-PCIe DMA?
>
> Can anyone offer some advice? Have I misunderstood the dis-functionality of the
> epdma -vs- rootdma branching?
>
> Thanks

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-26 13:03 UTC (permalink / raw)
  To: Jon Hunter, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel
In-Reply-To: <242863b9-b75e-4b37-178a-5aa03e56d3e1@gmail.com>

26.04.2019 15:42, Dmitry Osipenko пишет:
> 26.04.2019 15:18, Dmitry Osipenko пишет:
>> 26.04.2019 14:13, Jon Hunter пишет:
>>>
>>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
>>>> 26.04.2019 12:52, Jon Hunter пишет:
>>>>>
>>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>>>>> The readl/writel functions are inserting memory barrier in order to
>>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>>>>> results in L2 cache syncing which isn't a cheapest operation. The
>>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>>>>> accesses, hence use the relaxed versions of the functions.
>>>>>
>>>>> Do you mean device-io accesses here as this is not generic memory?
>>>>
>>>> Yes. The IOMEM accesses within are always ordered and uncached, while
>>>> generic memory accesses are out-of-order and cached.
>>>>
>>>>> Although there may not be any issues with this change, I think I need a
>>>>> bit more convincing that we should do this given that we have had it
>>>>> this way for sometime and I would not like to see us introduce any
>>>>> regressions as this point without being 100% certain we would not.
>>>>> Ideally, if I had some good extensive tests I could run to hammer the
>>>>> DMA for all configurations with different combinations of channels
>>>>> running simultaneously then we could test this, but right now I don't :-(
>>>>>
>>>>> Have you ...
>>>>> 1. Tested both cyclic and scatter-gather transfers?
>>>>> 2. Stress tested simultaneous transfers with various different
>>>>>    configurations?
>>>>> 3. Quantified the actual performance benefit of this change so we can
>>>>>    understand how much of a performance boost this offers?
>>>>
>>>> Actually I found a case where this change causes a problem, I'm seeing
>>>> I2C transfer timeout for touchscreen and it breaks the touch input.
>>>> Indeed, I haven't tested this patch very well.
>>>>
>>>> And the fix is this:
>>>>
>>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
>>>> *dev)
>>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>>>>  	}
>>>>
>>>> +	dsb();
>>>> +
>>>>  	clk_disable_unprepare(tdma->dma_clk);
>>>>
>>>>  	return 0;
>>>>
>>>>
>>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
>>>> incoherent and CPU disables clock before writes are reaching DMA controller.
>>>>
>>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
>>>> made some more testing of simultaneous transfers.
>>>>
>>>> Quantifying performance probably won't be easy to make as the DMA
>>>> read/writes are not on any kind of code's hot-path.
>>>
>>> So why make the change?
>>
>> For consistency.
>>
>>>> Jon, are you still insisting about to drop this patch or you will be
>>>> fine with the v2 that will have the dsb() in place?
>>>
>>> If we can't quantify the performance gain, then it is difficult to
>>> justify the change. I would also be concerned if that is the only place
>>> we need an explicit dsb.
>>
>> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
>> this patch for now.
>>
> 
> Jon, it occurred to me that there still should be a problem with the
> writel() ordering in the driver because writel() ensures that memory
> stores are completed *before* the write occurs and hence translates into
> iowmb() + writel_relaxed() [0]. Thus the last write will always happen
> asynchronously in regards to clk accesses.
> 
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311
> 

Also please note that iowmb() translates into wmb() if
CONFIG_ARM_DMA_MEM_BUFFERABLE=y and sometime ago I was profiling host1x
driver job submission performance and have seen cases where wmb() could
take up to 1ms on T20 due to L2 syncing if there are outstanding memory
writes in the cache (or even more, I don't remember exactly already how
bad it was..).

Altogether, I think the usage of readl/writel in pretty much all of
Tegra drivers is plainly wrong and explicit dsb() shall be used in
places where hardware synchronization is really needed.

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-26 13:03 UTC (permalink / raw)
  To: Jon Hunter, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel

26.04.2019 15:42, Dmitry Osipenko пишет:
> 26.04.2019 15:18, Dmitry Osipenko пишет:
>> 26.04.2019 14:13, Jon Hunter пишет:
>>>
>>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
>>>> 26.04.2019 12:52, Jon Hunter пишет:
>>>>>
>>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>>>>> The readl/writel functions are inserting memory barrier in order to
>>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>>>>> results in L2 cache syncing which isn't a cheapest operation. The
>>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>>>>> accesses, hence use the relaxed versions of the functions.
>>>>>
>>>>> Do you mean device-io accesses here as this is not generic memory?
>>>>
>>>> Yes. The IOMEM accesses within are always ordered and uncached, while
>>>> generic memory accesses are out-of-order and cached.
>>>>
>>>>> Although there may not be any issues with this change, I think I need a
>>>>> bit more convincing that we should do this given that we have had it
>>>>> this way for sometime and I would not like to see us introduce any
>>>>> regressions as this point without being 100% certain we would not.
>>>>> Ideally, if I had some good extensive tests I could run to hammer the
>>>>> DMA for all configurations with different combinations of channels
>>>>> running simultaneously then we could test this, but right now I don't :-(
>>>>>
>>>>> Have you ...
>>>>> 1. Tested both cyclic and scatter-gather transfers?
>>>>> 2. Stress tested simultaneous transfers with various different
>>>>>    configurations?
>>>>> 3. Quantified the actual performance benefit of this change so we can
>>>>>    understand how much of a performance boost this offers?
>>>>
>>>> Actually I found a case where this change causes a problem, I'm seeing
>>>> I2C transfer timeout for touchscreen and it breaks the touch input.
>>>> Indeed, I haven't tested this patch very well.
>>>>
>>>> And the fix is this:
>>>>
>>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
>>>> *dev)
>>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>>>>  	}
>>>>
>>>> +	dsb();
>>>> +
>>>>  	clk_disable_unprepare(tdma->dma_clk);
>>>>
>>>>  	return 0;
>>>>
>>>>
>>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
>>>> incoherent and CPU disables clock before writes are reaching DMA controller.
>>>>
>>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
>>>> made some more testing of simultaneous transfers.
>>>>
>>>> Quantifying performance probably won't be easy to make as the DMA
>>>> read/writes are not on any kind of code's hot-path.
>>>
>>> So why make the change?
>>
>> For consistency.
>>
>>>> Jon, are you still insisting about to drop this patch or you will be
>>>> fine with the v2 that will have the dsb() in place?
>>>
>>> If we can't quantify the performance gain, then it is difficult to
>>> justify the change. I would also be concerned if that is the only place
>>> we need an explicit dsb.
>>
>> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
>> this patch for now.
>>
> 
> Jon, it occurred to me that there still should be a problem with the
> writel() ordering in the driver because writel() ensures that memory
> stores are completed *before* the write occurs and hence translates into
> iowmb() + writel_relaxed() [0]. Thus the last write will always happen
> asynchronously in regards to clk accesses.
> 
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311
> 

Also please note that iowmb() translates into wmb() if
CONFIG_ARM_DMA_MEM_BUFFERABLE=y and sometime ago I was profiling host1x
driver job submission performance and have seen cases where wmb() could
take up to 1ms on T20 due to L2 syncing if there are outstanding memory
writes in the cache (or even more, I don't remember exactly already how
bad it was..).

Altogether, I think the usage of readl/writel in pretty much all of
Tegra drivers is plainly wrong and explicit dsb() shall be used in
places where hardware synchronization is really needed.

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Vinod Koul @ 2019-04-26 12:47 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Laxman Dewangan, Thierry Reding, Jonathan Hunter, dmaengine,
	linux-tegra, linux-kernel
In-Reply-To: <68c342e2-5662-5632-82aa-659da0381f54@gmail.com>

On 26-04-19, 15:19, Dmitry Osipenko wrote:
> 26.04.2019 15:01, Vinod Koul пишет:
> > On 25-04-19, 02:17, Dmitry Osipenko wrote:
> >> The readl/writel functions are inserting memory barrier in order to
> >> ensure that memory stores are completed. On Tegra20 and Tegra30 this
> >> results in L2 cache syncing which isn't a cheapest operation. The
> >> tegra20-apb-dma driver doesn't need to synchronize generic memory
> >> accesses, hence use the relaxed versions of the functions.
> > 
> > Subsystem name is **dmaengine** not dma! Please fix that
> > 
> 
> Looks like you answered to a wrong email.

Yes looks like I did, apologies for that :(

-- 
~Vinod

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Vinod Koul @ 2019-04-26 12:47 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Laxman Dewangan, Thierry Reding, Jonathan Hunter, dmaengine,
	linux-tegra, linux-kernel

On 26-04-19, 15:19, Dmitry Osipenko wrote:
> 26.04.2019 15:01, Vinod Koul пишет:
> > On 25-04-19, 02:17, Dmitry Osipenko wrote:
> >> The readl/writel functions are inserting memory barrier in order to
> >> ensure that memory stores are completed. On Tegra20 and Tegra30 this
> >> results in L2 cache syncing which isn't a cheapest operation. The
> >> tegra20-apb-dma driver doesn't need to synchronize generic memory
> >> accesses, hence use the relaxed versions of the functions.
> > 
> > Subsystem name is **dmaengine** not dma! Please fix that
> > 
> 
> Looks like you answered to a wrong email.

Yes looks like I did, apologies for that :(

^ permalink raw reply

* Re: [V3 1/2] dmaengine: fsl-dpaa2-qdma: Add the DPDMAI(Data Path DMA Interface) support
From: Vinod Koul @ 2019-04-26 12:45 UTC (permalink / raw)
  To: Peng Ma; +Cc: dan.j.williams, leoyang.li, linux-kernel, dmaengine
In-Reply-To: <20190409072212.15860-1-peng.ma@nxp.com>

On 09-04-19, 15:22, Peng Ma wrote:

Subject missed PATCH tag!

> The MC exports the DPDMAI object as an interface to operate the DPAA2 QDMA

whats MC, DPDMAI, DPAA2

> Engine. The DPDMAI enables sending frame-based requests to QDMA and receiving
> back confirmation response on transaction completion, utilizing the DPAA2 QBMan
> infrastructure. DPDMAI object provides up to two priorities for processing QDMA
> requests.

> +int dpdmai_open(struct fsl_mc_io *mc_io,
> +		u32 cmd_flags,
> +		int dpdmai_id,
> +		u16 *token)

could be written as:

int dpdmai_open(struct fsl_mc_io *mc_io, u32 cmd_flags,
                int dpdmai_id, u16 *token)

Looks neater, right? It would be to reread coding guidelines and run
checkpatch with --strict on this


> +{
> +	struct fsl_mc_command cmd = { 0 };

where is this defined?

> +	struct dpdmai_cmd_open *cmd_params;
> +	int err;
> +
> +	/* prepare command */
> +	cmd.header = mc_encode_cmd_header(DPDMAI_CMDID_OPEN,
> +					  cmd_flags,
> +					  0);
> +
> +	cmd_params = (struct dpdmai_cmd_open *)cmd.params;

I dont like casts, can you explain

> +	cmd_params->dpdmai_id = cpu_to_le32(dpdmai_id);
> +
> +	/* send command to mc*/
> +	err = mc_send_command(mc_io, &cmd);
> +	if (err)
> +		return err;
> +
> +	/* retrieve response parameters */
> +	*token = mc_cmd_hdr_read_token(&cmd);
> +	return 0;
> +}

who will call this API?

> +int dpdmai_create(struct fsl_mc_io *mc_io,
> +		  u32 cmd_flags,
> +		  const struct dpdmai_cfg *cfg,
> +		  u16 *token)
> +{
> +	struct fsl_mc_command cmd = { 0 };
> +	int err;
> +
> +	/* prepare command */
> +	cmd.header = mc_encode_cmd_header(DPDMAI_CMDID_CREATE,
> +					  cmd_flags,
> +					  0);

why waste a line, last arg can be in previous line!

> +int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
> +			u32 cmd_flags,
> +			u16 token,
> +			u8 priority,
> +			struct dpdmai_tx_queue_attr *attr)
> +{
> +	struct fsl_mc_command cmd = { 0 };
> +	struct dpdmai_cmd_queue *cmd_params;
> +	struct dpdmai_rsp_get_tx_queue *rsp_params;
> +	int err;
> +
> +	/* prepare command */
> +	cmd.header = mc_encode_cmd_header(DPDMAI_CMDID_GET_TX_QUEUE,
> +					  cmd_flags,
> +					  token);
> +
> +	cmd_params = (struct dpdmai_cmd_queue *)cmd.params;
> +	cmd_params->queue = priority;
> +
> +	/* send command to mc*/
> +	err = mc_send_command(mc_io, &cmd);
> +	if (err)
> +		return err;
> +
> +	/* retrieve response parameters */
> +
> +	rsp_params = (struct dpdmai_rsp_get_tx_queue *)cmd.params;
> +	attr->fqid = le32_to_cpu(rsp_params->fqid);
> +
> +	return 0;
> +}

Okay this file does not use any of dmaengine apis, is this a dmaengine
driver?

> diff --git a/drivers/dma/fsl-dpaa2-qdma/dpdmai.h b/drivers/dma/fsl-dpaa2-qdma/dpdmai.h
> new file mode 100644
> index 0000000..c8a7b7f
> --- /dev/null
> +++ b/drivers/dma/fsl-dpaa2-qdma/dpdmai.h
> @@ -0,0 +1,524 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright 2014-2018 NXP */
> +
> +/*
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are met:
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * * Neither the name of the above-listed copyright holders nor the
> + * names of any contributors may be used to endorse or promote products
> + * derived from this software without specific prior written permission.
> + *
> + *
> + * ALTERNATIVELY, this software may be distributed under the terms of the
> + * GNU General Public License ("GPL") as published by the Free Software
> + * Foundation, either version 2 of that License or (at your option) any
> + * later version.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + */

we have SiPDX tag, why do you need ths here!

-- 
~Vinod

^ permalink raw reply

* [V3,1/2] dmaengine: fsl-dpaa2-qdma: Add the DPDMAI(Data Path DMA Interface) support
From: Vinod Koul @ 2019-04-26 12:45 UTC (permalink / raw)
  To: Peng Ma; +Cc: dan.j.williams, leoyang.li, linux-kernel, dmaengine

On 09-04-19, 15:22, Peng Ma wrote:

Subject missed PATCH tag!

> The MC exports the DPDMAI object as an interface to operate the DPAA2 QDMA

whats MC, DPDMAI, DPAA2

> Engine. The DPDMAI enables sending frame-based requests to QDMA and receiving
> back confirmation response on transaction completion, utilizing the DPAA2 QBMan
> infrastructure. DPDMAI object provides up to two priorities for processing QDMA
> requests.

> +int dpdmai_open(struct fsl_mc_io *mc_io,
> +		u32 cmd_flags,
> +		int dpdmai_id,
> +		u16 *token)

could be written as:

int dpdmai_open(struct fsl_mc_io *mc_io, u32 cmd_flags,
                int dpdmai_id, u16 *token)

Looks neater, right? It would be to reread coding guidelines and run
checkpatch with --strict on this


> +{
> +	struct fsl_mc_command cmd = { 0 };

where is this defined?

> +	struct dpdmai_cmd_open *cmd_params;
> +	int err;
> +
> +	/* prepare command */
> +	cmd.header = mc_encode_cmd_header(DPDMAI_CMDID_OPEN,
> +					  cmd_flags,
> +					  0);
> +
> +	cmd_params = (struct dpdmai_cmd_open *)cmd.params;

I dont like casts, can you explain

> +	cmd_params->dpdmai_id = cpu_to_le32(dpdmai_id);
> +
> +	/* send command to mc*/
> +	err = mc_send_command(mc_io, &cmd);
> +	if (err)
> +		return err;
> +
> +	/* retrieve response parameters */
> +	*token = mc_cmd_hdr_read_token(&cmd);
> +	return 0;
> +}

who will call this API?

> +int dpdmai_create(struct fsl_mc_io *mc_io,
> +		  u32 cmd_flags,
> +		  const struct dpdmai_cfg *cfg,
> +		  u16 *token)
> +{
> +	struct fsl_mc_command cmd = { 0 };
> +	int err;
> +
> +	/* prepare command */
> +	cmd.header = mc_encode_cmd_header(DPDMAI_CMDID_CREATE,
> +					  cmd_flags,
> +					  0);

why waste a line, last arg can be in previous line!

> +int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
> +			u32 cmd_flags,
> +			u16 token,
> +			u8 priority,
> +			struct dpdmai_tx_queue_attr *attr)
> +{
> +	struct fsl_mc_command cmd = { 0 };
> +	struct dpdmai_cmd_queue *cmd_params;
> +	struct dpdmai_rsp_get_tx_queue *rsp_params;
> +	int err;
> +
> +	/* prepare command */
> +	cmd.header = mc_encode_cmd_header(DPDMAI_CMDID_GET_TX_QUEUE,
> +					  cmd_flags,
> +					  token);
> +
> +	cmd_params = (struct dpdmai_cmd_queue *)cmd.params;
> +	cmd_params->queue = priority;
> +
> +	/* send command to mc*/
> +	err = mc_send_command(mc_io, &cmd);
> +	if (err)
> +		return err;
> +
> +	/* retrieve response parameters */
> +
> +	rsp_params = (struct dpdmai_rsp_get_tx_queue *)cmd.params;
> +	attr->fqid = le32_to_cpu(rsp_params->fqid);
> +
> +	return 0;
> +}

Okay this file does not use any of dmaengine apis, is this a dmaengine
driver?

> diff --git a/drivers/dma/fsl-dpaa2-qdma/dpdmai.h b/drivers/dma/fsl-dpaa2-qdma/dpdmai.h
> new file mode 100644
> index 0000000..c8a7b7f
> --- /dev/null
> +++ b/drivers/dma/fsl-dpaa2-qdma/dpdmai.h
> @@ -0,0 +1,524 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright 2014-2018 NXP */
> +
> +/*
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are met:
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * * Neither the name of the above-listed copyright holders nor the
> + * names of any contributors may be used to endorse or promote products
> + * derived from this software without specific prior written permission.
> + *
> + *
> + * ALTERNATIVELY, this software may be distributed under the terms of the
> + * GNU General Public License ("GPL") as published by the Free Software
> + * Foundation, either version 2 of that License or (at your option) any
> + * later version.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + */

we have SiPDX tag, why do you need ths here!

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-26 12:42 UTC (permalink / raw)
  To: Jon Hunter, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel
In-Reply-To: <49392c02-6dcc-9a95-0035-27c4c0d14820@gmail.com>

26.04.2019 15:18, Dmitry Osipenko пишет:
> 26.04.2019 14:13, Jon Hunter пишет:
>>
>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
>>> 26.04.2019 12:52, Jon Hunter пишет:
>>>>
>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>>>> The readl/writel functions are inserting memory barrier in order to
>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>>>> results in L2 cache syncing which isn't a cheapest operation. The
>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>>>> accesses, hence use the relaxed versions of the functions.
>>>>
>>>> Do you mean device-io accesses here as this is not generic memory?
>>>
>>> Yes. The IOMEM accesses within are always ordered and uncached, while
>>> generic memory accesses are out-of-order and cached.
>>>
>>>> Although there may not be any issues with this change, I think I need a
>>>> bit more convincing that we should do this given that we have had it
>>>> this way for sometime and I would not like to see us introduce any
>>>> regressions as this point without being 100% certain we would not.
>>>> Ideally, if I had some good extensive tests I could run to hammer the
>>>> DMA for all configurations with different combinations of channels
>>>> running simultaneously then we could test this, but right now I don't :-(
>>>>
>>>> Have you ...
>>>> 1. Tested both cyclic and scatter-gather transfers?
>>>> 2. Stress tested simultaneous transfers with various different
>>>>    configurations?
>>>> 3. Quantified the actual performance benefit of this change so we can
>>>>    understand how much of a performance boost this offers?
>>>
>>> Actually I found a case where this change causes a problem, I'm seeing
>>> I2C transfer timeout for touchscreen and it breaks the touch input.
>>> Indeed, I haven't tested this patch very well.
>>>
>>> And the fix is this:
>>>
>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
>>> *dev)
>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>>>  	}
>>>
>>> +	dsb();
>>> +
>>>  	clk_disable_unprepare(tdma->dma_clk);
>>>
>>>  	return 0;
>>>
>>>
>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
>>> incoherent and CPU disables clock before writes are reaching DMA controller.
>>>
>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
>>> made some more testing of simultaneous transfers.
>>>
>>> Quantifying performance probably won't be easy to make as the DMA
>>> read/writes are not on any kind of code's hot-path.
>>
>> So why make the change?
> 
> For consistency.
> 
>>> Jon, are you still insisting about to drop this patch or you will be
>>> fine with the v2 that will have the dsb() in place?
>>
>> If we can't quantify the performance gain, then it is difficult to
>> justify the change. I would also be concerned if that is the only place
>> we need an explicit dsb.
> 
> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
> this patch for now.
> 

Jon, it occurred to me that there still should be a problem with the
writel() ordering in the driver because writel() ensures that memory
stores are completed *before* the write occurs and hence translates into
iowmb() + writel_relaxed() [0]. Thus the last write will always happen
asynchronously in regards to clk accesses.

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-26 12:42 UTC (permalink / raw)
  To: Jon Hunter, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel

26.04.2019 15:18, Dmitry Osipenko пишет:
> 26.04.2019 14:13, Jon Hunter пишет:
>>
>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
>>> 26.04.2019 12:52, Jon Hunter пишет:
>>>>
>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>>>> The readl/writel functions are inserting memory barrier in order to
>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>>>> results in L2 cache syncing which isn't a cheapest operation. The
>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>>>> accesses, hence use the relaxed versions of the functions.
>>>>
>>>> Do you mean device-io accesses here as this is not generic memory?
>>>
>>> Yes. The IOMEM accesses within are always ordered and uncached, while
>>> generic memory accesses are out-of-order and cached.
>>>
>>>> Although there may not be any issues with this change, I think I need a
>>>> bit more convincing that we should do this given that we have had it
>>>> this way for sometime and I would not like to see us introduce any
>>>> regressions as this point without being 100% certain we would not.
>>>> Ideally, if I had some good extensive tests I could run to hammer the
>>>> DMA for all configurations with different combinations of channels
>>>> running simultaneously then we could test this, but right now I don't :-(
>>>>
>>>> Have you ...
>>>> 1. Tested both cyclic and scatter-gather transfers?
>>>> 2. Stress tested simultaneous transfers with various different
>>>>    configurations?
>>>> 3. Quantified the actual performance benefit of this change so we can
>>>>    understand how much of a performance boost this offers?
>>>
>>> Actually I found a case where this change causes a problem, I'm seeing
>>> I2C transfer timeout for touchscreen and it breaks the touch input.
>>> Indeed, I haven't tested this patch very well.
>>>
>>> And the fix is this:
>>>
>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
>>> *dev)
>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>>>  	}
>>>
>>> +	dsb();
>>> +
>>>  	clk_disable_unprepare(tdma->dma_clk);
>>>
>>>  	return 0;
>>>
>>>
>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
>>> incoherent and CPU disables clock before writes are reaching DMA controller.
>>>
>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
>>> made some more testing of simultaneous transfers.
>>>
>>> Quantifying performance probably won't be easy to make as the DMA
>>> read/writes are not on any kind of code's hot-path.
>>
>> So why make the change?
> 
> For consistency.
> 
>>> Jon, are you still insisting about to drop this patch or you will be
>>> fine with the v2 that will have the dsb() in place?
>>
>> If we can't quantify the performance gain, then it is difficult to
>> justify the change. I would also be concerned if that is the only place
>> we need an explicit dsb.
> 
> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
> this patch for now.
> 

Jon, it occurred to me that there still should be a problem with the
writel() ordering in the driver because writel() ensures that memory
stores are completed *before* the write occurs and hence translates into
iowmb() + writel_relaxed() [0]. Thus the last write will always happen
asynchronously in regards to clk accesses.

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311

^ permalink raw reply

* Re: [PATCH] dmaengine: stm32-dma: dmaengine: stm32-dma: use platform_get_irq()
From: Vinod Koul @ 2019-04-26 12:20 UTC (permalink / raw)
  To: Fabien Dessenne
  Cc: Dan Williams, Maxime Coquelin, Alexandre Torgue,
	Pierre-Yves MORDRET, dmaengine, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1556097685-7236-1-git-send-email-fabien.dessenne@st.com>

On 24-04-19, 11:21, Fabien Dessenne wrote:
> platform_get_resource(pdev, IORESOURCE_IRQ) is not recommended for
> requesting IRQ's resources, as they can be not ready yet. Using
> platform_get_irq() instead is preferred for getting IRQ even if it was
> not retrieved earlier.

Applied, thanks

-- 
~Vinod

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox