linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 00/12] Add support for BCM2712 DMA engine
@ 2024-02-04  6:59 Andrea della Porta
  2024-02-04  6:59 ` [PATCH 01/12] bcm2835-dma: Add support for per-channel flags Andrea della Porta
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Andrea della Porta

This patchset updates the DMA engine driver for BCM* chipsets to match the
current state of the downstream vendor tree. In particular:

* Add support for the BCM2712 DMA engine.
* Extend DMA addressing to 40 bits. Since the BCM2711 also supports 40-bit
  addressing, it benefits from this update as well.
* Handle the devicetree node from the vendor dts (e.g. "dma40").

The only difference between this series and the corresponding code in the
vendor tree is that the channel reservation for the BCM2708 legacy DMA
driver has been dropped. That driver does not appear to have made its way
upstream, and it is probably used only by deprecated subsystems.

Compile tested and runtime tested on RPi4B only.

Dom Cobley (4):
  bcm2835-dma: Support dma flags for multi-beat burst
  bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller
  dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr
  bcm2835-dma: Fixes for dma_abort

Maxime Ripard (2):
  dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant
  dmaengine: bcm2835: Support DMA-Lite channels

Phil Elwell (6):
  bcm2835-dma: Add support for per-channel flags
  bcm2835-dma: Add proper 40-bit DMA support
  bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag
  bcm2835-dma: Advertise the full DMA range
  bcm2835-dma: Derive slave DMA addresses correctly
  dmaengine: bcm2835: Add BCM2712 support

 drivers/dma/bcm2835-dma.c | 701 ++++++++++++++++++++++++++++++++------
 1 file changed, 588 insertions(+), 113 deletions(-)

-- 
2.41.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH 01/12] bcm2835-dma: Add support for per-channel flags
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-04  6:59 ` [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support Andrea della Porta
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Phil Elwell,
	Andrea della Porta

From: Phil Elwell <phil@raspberrypi.org>

Add the ability to interpret the high bits of the dreq specifier as
flags to be included in the DMA_CS register. The motivation for this
change is the ability to set the DISDEBUG flag for SD card transfers
to avoid corruption when using the VPU debugger.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
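Illustration only, not part of the patch: a minimal user-space sketch of
the masking described in the commit message above. The EX_* names are
hypothetical. The bit positions of the priority, WAIT_FOR_WRITES and
DIS_DEBUG fields are assumptions based on my reading of the existing
driver; check them against the source before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's CS register definitions. */
#define EX_BIT(n)                 (UINT32_C(1) << (n))
#define EX_DMA_ACTIVE             EX_BIT(0)
#define EX_DMA_PRIORITY(x)        (((uint32_t)(x) & 15u) << 16) /* assumed position */
#define EX_DMA_PANIC_PRIORITY(x)  (((uint32_t)(x) & 15u) << 20) /* assumed position */
#define EX_DMA_WAIT_FOR_WRITES    EX_BIT(28)                    /* assumed position */
#define EX_DMA_DIS_DEBUG          EX_BIT(29)                    /* assumed position */

/* Only these bits of the dreq specifier are allowed to reach DMA_CS. */
#define EX_CS_FLAG_MASK (EX_DMA_PRIORITY(15) | EX_DMA_PANIC_PRIORITY(15) | \
			 EX_DMA_WAIT_FOR_WRITES | EX_DMA_DIS_DEBUG)
#define EX_DMA_CS_FLAGS(x) ((uint32_t)(x) & EX_CS_FLAG_MASK)

/* Value written to CS when a descriptor is started. */
static uint32_t ex_cs_start_value(uint32_t dreq_spec)
{
	uint32_t cs = EX_DMA_ACTIVE | EX_DMA_CS_FLAGS(dreq_spec);

	/* The peripheral number in the low bits must never leak into CS. */
	assert((cs & ~(EX_DMA_ACTIVE | EX_CS_FLAG_MASK)) == 0);
	return cs;
}
```

With these assumptions, a specifier of 0x2000000b (DREQ 11 with the
DIS_DEBUG bit set) would start the channel with ACTIVE | DIS_DEBUG,
while a plain specifier of 0x0000000b yields ACTIVE only.
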
 drivers/dma/bcm2835-dma.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 9d74fe97452e..2704e2578e23 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -137,6 +137,10 @@ struct bcm2835_desc {
 #define BCM2835_DMA_S_DREQ	BIT(10) /* enable SREQ for source */
 #define BCM2835_DMA_S_IGNORE	BIT(11) /* ignore source reads - read 0 */
 #define BCM2835_DMA_BURST_LENGTH(x) ((x & 15) << 12)
+#define BCM2835_DMA_CS_FLAGS(x) ((x) & (BCM2835_DMA_PRIORITY(15) | \
+				      BCM2835_DMA_PANIC_PRIORITY(15) | \
+				      BCM2835_DMA_WAIT_FOR_WRITES | \
+				      BCM2835_DMA_DIS_DEBUG))
 #define BCM2835_DMA_PER_MAP(x)	((x & 31) << 16) /* REQ source */
 #define BCM2835_DMA_WAIT(x)	((x & 31) << 21) /* add DMA-wait cycles */
 #define BCM2835_DMA_NO_WIDE_BURSTS BIT(26) /* no 2 beat write bursts */
@@ -450,7 +454,8 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
 	c->desc = d = to_bcm2835_dma_desc(&vd->tx);
 
 	writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
-	writel(BCM2835_DMA_ACTIVE, c->chan_base + BCM2835_DMA_CS);
+	writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
+	       c->chan_base + BCM2835_DMA_CS);
 }
 
 static irqreturn_t bcm2835_dma_callback(int irq, void *data)
@@ -477,7 +482,8 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
 	 * if this IRQ handler is threaded.) If the channel is finished, it
 	 * will remain idle despite the ACTIVE flag being set.
 	 */
-	writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE,
+	writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE |
+	       BCM2835_DMA_CS_FLAGS(c->dreq),
 	       c->chan_base + BCM2835_DMA_CS);
 
 	d = c->desc;
-- 
2.41.0



* [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
  2024-02-04  6:59 ` [PATCH 01/12] bcm2835-dma: Add support for per-channel flags Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-05 18:50   ` Stefan Wahren
  2024-02-04  6:59 ` [PATCH 03/12] bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag Andrea della Porta
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Phil Elwell,
	Andrea della Porta

From: Phil Elwell <phil@raspberrypi.org>

BCM2711 has 4 DMA channels with a 40-bit address range, allowing them
to access the full 4GB of memory on a Pi 4.

Cc: Phil Elwell <phil@raspberrypi.org>
Cc: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
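Illustration only, not part of the patch: a minimal user-space sketch of
how a 40-bit bus address is split across the address and info words of
the 40-bit control block, and how the control block pointer is stored
shifted right by 5. The EX_*/ex_* names are hypothetical; the constants
mirror BCM2711_DMA40_INC, the 0x400000000 value OR-ed into slave
addresses, and the shift used by to_bcm2711_cbaddr() in the diff below.
Unlike the driver, this sketch masks the high part to 8 bits on write.

```c
#include <assert.h>
#include <stdint.h>

#define EX_DMA40_INC      (UINT32_C(1) << 12)   /* mirrors BCM2711_DMA40_INC */
#define EX_PERIPH_OFFSET  UINT64_C(0x400000000) /* OR-ed into slave addresses */

/* Low word of a 40-bit bus address (the src/dst field). */
static uint32_t ex_addr_lo(uint64_t addr)
{
	return (uint32_t)addr;
}

/* Info word (srci/dsti): address bits 39:32 in bits 7:0, plus optional INC. */
static uint32_t ex_addr_info(uint64_t addr, int increment)
{
	return (uint32_t)((addr >> 32) & 0xff) | (increment ? EX_DMA40_INC : 0);
}

/* Rebuild the bus address the way the residue calculation reads it back. */
static uint64_t ex_addr_join(uint32_t lo, uint32_t info)
{
	return ((uint64_t)(info & 0xff) << 32) | lo;
}

/* Control block pointer: 32-byte aligned address, stored shifted by 5. */
static uint32_t ex_cbaddr(uint64_t addr)
{
	assert((addr & 0x1f) == 0);
	return (uint32_t)(addr >> 5);
}
```

For example, a slave address of 0x7e201000 combined with
EX_PERIPH_OFFSET gives 0x47e201000, which splits into a low word of
0x7e201000 and a high part of 0x4.
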
 drivers/dma/bcm2835-dma.c | 601 ++++++++++++++++++++++++++++++++------
 1 file changed, 505 insertions(+), 96 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 2704e2578e23..11c6bf7d8a4b 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -36,6 +36,11 @@
 
 #define BCM2835_DMA_MAX_DMA_CHAN_SUPPORTED 14
 #define BCM2835_DMA_CHAN_NAME_SIZE 8
+#define BCM2711_DMA_MEMCPY_CHAN 14
+
+struct bcm2835_dma_cfg_data {
+	u32	chan_40bit_mask;
+};
 
 /**
  * struct bcm2835_dmadev - BCM2835 DMA controller
@@ -48,6 +53,7 @@ struct bcm2835_dmadev {
 	struct dma_device ddev;
 	void __iomem *base;
 	dma_addr_t zero_page;
+	const struct bcm2835_dma_cfg_data *cfg_data;
 };
 
 struct bcm2835_dma_cb {
@@ -60,6 +66,17 @@ struct bcm2835_dma_cb {
 	uint32_t pad[2];
 };
 
+struct bcm2711_dma40_scb {
+	u32 ti;
+	u32 src;
+	u32 srci;
+	u32 dst;
+	u32 dsti;
+	u32 len;
+	u32 next_cb;
+	u32 rsvd;
+};
+
 struct bcm2835_cb_entry {
 	struct bcm2835_dma_cb *cb;
 	dma_addr_t paddr;
@@ -80,6 +97,7 @@ struct bcm2835_chan {
 	unsigned int irq_flags;
 
 	bool is_lite_channel;
+	bool is_40bit_channel;
 };
 
 struct bcm2835_desc {
@@ -169,13 +187,118 @@ struct bcm2835_desc {
 #define BCM2835_DMA_DATA_TYPE_S128	16
 
 /* Valid only for channels 0 - 14, 15 has its own base address */
-#define BCM2835_DMA_CHAN(n)	((n) << 8) /* Base address */
+#define BCM2835_DMA_CHAN_SIZE	0x100
+#define BCM2835_DMA_CHAN(n)	((n) * BCM2835_DMA_CHAN_SIZE) /* Base address */
 #define BCM2835_DMA_CHANIO(base, n) ((base) + BCM2835_DMA_CHAN(n))
 
 /* the max dma length for different channels */
 #define MAX_DMA_LEN SZ_1G
 #define MAX_LITE_DMA_LEN (SZ_64K - 4)
 
+/* 40-bit DMA support */
+#define BCM2711_DMA40_CS	0x00
+#define BCM2711_DMA40_CB	0x04
+#define BCM2711_DMA40_DEBUG	0x0c
+#define BCM2711_DMA40_TI	0x10
+#define BCM2711_DMA40_SRC	0x14
+#define BCM2711_DMA40_SRCI	0x18
+#define BCM2711_DMA40_DEST	0x1c
+#define BCM2711_DMA40_DESTI	0x20
+#define BCM2711_DMA40_LEN	0x24
+#define BCM2711_DMA40_NEXT_CB	0x28
+#define BCM2711_DMA40_DEBUG2	0x2c
+
+#define BCM2711_DMA40_ACTIVE		BIT(0)
+#define BCM2711_DMA40_END		BIT(1)
+#define BCM2711_DMA40_INT		BIT(2)
+#define BCM2711_DMA40_DREQ		BIT(3)  /* DREQ state */
+#define BCM2711_DMA40_RD_PAUSED		BIT(4)  /* Reading is paused */
+#define BCM2711_DMA40_WR_PAUSED		BIT(5)  /* Writing is paused */
+#define BCM2711_DMA40_DREQ_PAUSED	BIT(6)  /* Is paused by DREQ flow control */
+#define BCM2711_DMA40_WAITING_FOR_WRITES BIT(7)  /* Waiting for last write */
+#define BCM2711_DMA40_ERR		BIT(10)
+#define BCM2711_DMA40_QOS(x)		(((x) & 0x1f) << 16)
+#define BCM2711_DMA40_PANIC_QOS(x)	(((x) & 0x1f) << 20)
+#define BCM2711_DMA40_WAIT_FOR_WRITES	BIT(28)
+#define BCM2711_DMA40_DISDEBUG		BIT(29)
+#define BCM2711_DMA40_ABORT		BIT(30)
+#define BCM2711_DMA40_HALT		BIT(31)
+#define BCM2711_DMA40_CS_FLAGS(x) ((x) & (BCM2711_DMA40_QOS(15) | \
+					BCM2711_DMA40_PANIC_QOS(15) | \
+					BCM2711_DMA40_WAIT_FOR_WRITES |	\
+					BCM2711_DMA40_DISDEBUG))
+
+/* Transfer information bits */
+#define BCM2711_DMA40_INTEN		BIT(0)
+#define BCM2711_DMA40_TDMODE		BIT(1) /* 2D-Mode */
+#define BCM2711_DMA40_WAIT_RESP		BIT(2) /* wait for AXI write to be acked */
+#define BCM2711_DMA40_WAIT_RD_RESP	BIT(3) /* wait for AXI read to complete */
+#define BCM2711_DMA40_PER_MAP(x)	(((x) & 31) << 9) /* REQ source */
+#define BCM2711_DMA40_S_DREQ		BIT(14) /* enable SREQ for source */
+#define BCM2711_DMA40_D_DREQ		BIT(15) /* enable DREQ for destination */
+#define BCM2711_DMA40_S_WAIT(x)		(((x) & 0xff) << 16) /* add DMA read-wait cycles */
+#define BCM2711_DMA40_D_WAIT(x)		(((x) & 0xff) << 24) /* add DMA write-wait cycles */
+
+/* debug register bits */
+#define BCM2711_DMA40_DEBUG_WRITE_ERR		BIT(0)
+#define BCM2711_DMA40_DEBUG_FIFO_ERR		BIT(1)
+#define BCM2711_DMA40_DEBUG_READ_ERR		BIT(2)
+#define BCM2711_DMA40_DEBUG_READ_CB_ERR		BIT(3)
+#define BCM2711_DMA40_DEBUG_IN_ON_ERR		BIT(8)
+#define BCM2711_DMA40_DEBUG_ABORT_ON_ERR	BIT(9)
+#define BCM2711_DMA40_DEBUG_HALT_ON_ERR		BIT(10)
+#define BCM2711_DMA40_DEBUG_DISABLE_CLK_GATE	BIT(11)
+#define BCM2711_DMA40_DEBUG_RSTATE_SHIFT	14
+#define BCM2711_DMA40_DEBUG_RSTATE_BITS		4
+#define BCM2711_DMA40_DEBUG_WSTATE_SHIFT	18
+#define BCM2711_DMA40_DEBUG_WSTATE_BITS		4
+#define BCM2711_DMA40_DEBUG_RESET		BIT(23)
+#define BCM2711_DMA40_DEBUG_ID_SHIFT		24
+#define BCM2711_DMA40_DEBUG_ID_BITS		4
+#define BCM2711_DMA40_DEBUG_VERSION_SHIFT	28
+#define BCM2711_DMA40_DEBUG_VERSION_BITS	4
+
+/* Valid only for channels 0 - 3 (11 - 14) */
+#define BCM2711_DMA40_CHAN(n)	(((n) + 11) << 8) /* Base address */
+#define BCM2711_DMA40_CHANIO(base, n) ((base) + BCM2711_DMA40_CHAN(n))
+
+/* the max dma length for different channels */
+#define MAX_DMA40_LEN SZ_1G
+
+#define BCM2711_DMA40_BURST_LEN(x)	((min(x, 16) - 1) << 8)
+#define BCM2711_DMA40_INC		BIT(12)
+#define BCM2711_DMA40_SIZE_32		(0 << 13)
+#define BCM2711_DMA40_SIZE_64		(1 << 13)
+#define BCM2711_DMA40_SIZE_128		(2 << 13)
+#define BCM2711_DMA40_SIZE_256		(3 << 13)
+#define BCM2711_DMA40_IGNORE		BIT(15)
+#define BCM2711_DMA40_STRIDE(x)		((x) << 16) /* For 2D mode */
+
+#define BCM2711_DMA40_MEMCPY_FLAGS \
+	(BCM2711_DMA40_QOS(0) | \
+	 BCM2711_DMA40_PANIC_QOS(0) | \
+	 BCM2711_DMA40_WAIT_FOR_WRITES | \
+	 BCM2711_DMA40_DISDEBUG)
+
+#define BCM2711_DMA40_MEMCPY_XFER_INFO \
+	(BCM2711_DMA40_SIZE_128 | \
+	 BCM2711_DMA40_INC | \
+	 BCM2711_DMA40_BURST_LEN(16))
+
+struct bcm2835_dmadev *memcpy_parent;
+static void __iomem *memcpy_chan;
+static struct bcm2711_dma40_scb *memcpy_scb;
+static dma_addr_t memcpy_scb_dma;
+DEFINE_SPINLOCK(memcpy_lock);
+
+static const struct bcm2835_dma_cfg_data bcm2835_dma_cfg = {
+	.chan_40bit_mask = 0,
+};
+
+static const struct bcm2835_dma_cfg_data bcm2711_dma_cfg = {
+	.chan_40bit_mask = BIT(11) | BIT(12) | BIT(13) | BIT(14),
+};
+
 static inline size_t bcm2835_dma_max_frame_length(struct bcm2835_chan *c)
 {
 	/* lite and normal channels have different max frame length */
@@ -205,6 +328,32 @@ static inline struct bcm2835_desc *to_bcm2835_dma_desc(
 	return container_of(t, struct bcm2835_desc, vd.tx);
 }
 
+static inline uint32_t to_bcm2711_ti(uint32_t info)
+{
+	return ((info & BCM2835_DMA_INT_EN) ? BCM2711_DMA40_INTEN : 0) |
+		((info & BCM2835_DMA_WAIT_RESP) ? BCM2711_DMA40_WAIT_RESP : 0) |
+		((info & BCM2835_DMA_S_DREQ) ?
+		 (BCM2711_DMA40_S_DREQ | BCM2711_DMA40_WAIT_RD_RESP) : 0) |
+		((info & BCM2835_DMA_D_DREQ) ? BCM2711_DMA40_D_DREQ : 0) |
+		BCM2711_DMA40_PER_MAP((info >> 16) & 0x1f);
+}
+
+static inline uint32_t to_bcm2711_srci(uint32_t info)
+{
+	return ((info & BCM2835_DMA_S_INC) ? BCM2711_DMA40_INC : 0);
+}
+
+static inline uint32_t to_bcm2711_dsti(uint32_t info)
+{
+	return ((info & BCM2835_DMA_D_INC) ? BCM2711_DMA40_INC : 0);
+}
+
+static inline uint32_t to_bcm2711_cbaddr(dma_addr_t addr)
+{
+	WARN_ON_ONCE(addr & 0x1f);
+	return (addr >> 5);
+}
+
 static void bcm2835_dma_free_cb_chain(struct bcm2835_desc *desc)
 {
 	size_t i;
@@ -223,45 +372,53 @@ static void bcm2835_dma_desc_free(struct virt_dma_desc *vd)
 }
 
 static void bcm2835_dma_create_cb_set_length(
-	struct bcm2835_chan *chan,
+	struct bcm2835_chan *c,
 	struct bcm2835_dma_cb *control_block,
 	size_t len,
 	size_t period_len,
 	size_t *total_len,
 	u32 finalextrainfo)
 {
-	size_t max_len = bcm2835_dma_max_frame_length(chan);
+	size_t max_len = bcm2835_dma_max_frame_length(c);
+	u32 cb_len;
 
 	/* set the length taking lite-channel limitations into account */
-	control_block->length = min_t(u32, len, max_len);
+	cb_len = min_t(u32, len, max_len);
 
-	/* finished if we have no period_length */
-	if (!period_len)
-		return;
+	if (period_len) {
+		/*
+		 * period_len means: that we need to generate
+		 * transfers that are terminating at every
+		 * multiple of period_len - this is typically
+		 * used to set the interrupt flag in info
+		 * which is required during cyclic transfers
+		 */
 
-	/*
-	 * period_len means: that we need to generate
-	 * transfers that are terminating at every
-	 * multiple of period_len - this is typically
-	 * used to set the interrupt flag in info
-	 * which is required during cyclic transfers
-	 */
+		/* have we filled in period_length yet? */
+		if (*total_len + cb_len < period_len) {
+			/* update number of bytes in this period so far */
+			*total_len += cb_len;
+		} else {
+			/* calculate the length that remains to reach period_len */
+			cb_len = period_len - *total_len;
 
-	/* have we filled in period_length yet? */
-	if (*total_len + control_block->length < period_len) {
-		/* update number of bytes in this period so far */
-		*total_len += control_block->length;
-		return;
+			/* reset total_length for next period */
+			*total_len = 0;
+		}
 	}
 
-	/* calculate the length that remains to reach period_length */
-	control_block->length = period_len - *total_len;
-
-	/* reset total_length for next period */
-	*total_len = 0;
+	if (c->is_40bit_channel) {
+		struct bcm2711_dma40_scb *scb =
+			(struct bcm2711_dma40_scb *)control_block;
 
-	/* add extrainfo bits in info */
-	control_block->info |= finalextrainfo;
+		scb->len = cb_len;
+		/* add extrainfo bits to ti */
+		scb->ti |= to_bcm2711_ti(finalextrainfo);
+	} else {
+		control_block->length = cb_len;
+		/* add extrainfo bits to info */
+		control_block->info |= finalextrainfo;
+	}
 }
 
 static inline size_t bcm2835_dma_count_frames_for_sg(
@@ -284,7 +441,7 @@ static inline size_t bcm2835_dma_count_frames_for_sg(
 /**
  * bcm2835_dma_create_cb_chain - create a control block and fills data in
  *
- * @chan:           the @dma_chan for which we run this
+ * @c:              the @bcm2835_chan for which we run this
  * @direction:      the direction in which we transfer
  * @cyclic:         it is a cyclic transfer
  * @info:           the default info bits to apply per controlblock
@@ -302,12 +459,11 @@ static inline size_t bcm2835_dma_count_frames_for_sg(
  * @gfp:            the GFP flag to use for allocation
  */
 static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
-	struct dma_chan *chan, enum dma_transfer_direction direction,
+	struct bcm2835_chan *c, enum dma_transfer_direction direction,
 	bool cyclic, u32 info, u32 finalextrainfo, size_t frames,
 	dma_addr_t src, dma_addr_t dst, size_t buf_len,
 	size_t period_len, gfp_t gfp)
 {
-	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	size_t len = buf_len, total_len;
 	size_t frame;
 	struct bcm2835_desc *d;
@@ -339,11 +495,23 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
 
 		/* fill in the control block */
 		control_block = cb_entry->cb;
-		control_block->info = info;
-		control_block->src = src;
-		control_block->dst = dst;
-		control_block->stride = 0;
-		control_block->next = 0;
+		if (c->is_40bit_channel) {
+			struct bcm2711_dma40_scb *scb =
+				(struct bcm2711_dma40_scb *)control_block;
+			scb->ti = to_bcm2711_ti(info);
+			scb->src = lower_32_bits(src);
+			scb->srci = upper_32_bits(src) | to_bcm2711_srci(info);
+			scb->dst = lower_32_bits(dst);
+			scb->dsti = upper_32_bits(dst) | to_bcm2711_dsti(info);
+			scb->next_cb = 0;
+		} else {
+			control_block->info = info;
+			control_block->src = src;
+			control_block->dst = dst;
+			control_block->stride = 0;
+			control_block->next = 0;
+		}
+
 		/* set up length in control_block if requested */
 		if (buf_len) {
 			/* calculate length honoring period_length */
@@ -353,25 +521,51 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
 				cyclic ? finalextrainfo : 0);
 
 			/* calculate new remaining length */
-			len -= control_block->length;
+			if (c->is_40bit_channel)
+				len -= ((struct bcm2711_dma40_scb *)control_block)->len;
+			else
+				len -= control_block->length;
 		}
 
 		/* link this the last controlblock */
-		if (frame)
+		if (frame && c->is_40bit_channel)
+			((struct bcm2711_dma40_scb *)
+			 d->cb_list[frame - 1].cb)->next_cb =
+				to_bcm2711_cbaddr(cb_entry->paddr);
+		if (frame && !c->is_40bit_channel)
 			d->cb_list[frame - 1].cb->next = cb_entry->paddr;
 
 		/* update src and dst and length */
-		if (src && (info & BCM2835_DMA_S_INC))
-			src += control_block->length;
-		if (dst && (info & BCM2835_DMA_D_INC))
-			dst += control_block->length;
+		if (src && (info & BCM2835_DMA_S_INC)) {
+			if (c->is_40bit_channel)
+				src += ((struct bcm2711_dma40_scb *)control_block)->len;
+			else
+				src += control_block->length;
+		}
+
+		if (dst && (info & BCM2835_DMA_D_INC)) {
+			if (c->is_40bit_channel)
+				dst += ((struct bcm2711_dma40_scb *)control_block)->len;
+			else
+				dst += control_block->length;
+		}
 
 		/* Length of total transfer */
-		d->size += control_block->length;
+		if (c->is_40bit_channel)
+			d->size += ((struct bcm2711_dma40_scb *)control_block)->len;
+		else
+			d->size += control_block->length;
 	}
 
 	/* the last frame requires extra flags */
-	d->cb_list[d->frames - 1].cb->info |= finalextrainfo;
+	if (c->is_40bit_channel) {
+		struct bcm2711_dma40_scb *scb =
+			(struct bcm2711_dma40_scb *)d->cb_list[d->frames - 1].cb;
+
+		scb->ti |= to_bcm2711_ti(finalextrainfo);
+	} else {
+		d->cb_list[d->frames - 1].cb->info |= finalextrainfo;
+	}
 
 	/* detect a size missmatch */
 	if (buf_len && (d->size != buf_len))
@@ -385,13 +579,12 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
 }
 
 static void bcm2835_dma_fill_cb_chain_with_sg(
-	struct dma_chan *chan,
+	struct bcm2835_chan *c,
 	enum dma_transfer_direction direction,
 	struct bcm2835_cb_entry *cb,
 	struct scatterlist *sgl,
 	unsigned int sg_len)
 {
-	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	size_t len, max_len;
 	unsigned int i;
 	dma_addr_t addr;
@@ -399,14 +592,35 @@ static void bcm2835_dma_fill_cb_chain_with_sg(
 
 	max_len = bcm2835_dma_max_frame_length(c);
 	for_each_sg(sgl, sgent, sg_len, i) {
-		for (addr = sg_dma_address(sgent), len = sg_dma_len(sgent);
-		     len > 0;
-		     addr += cb->cb->length, len -= cb->cb->length, cb++) {
-			if (direction == DMA_DEV_TO_MEM)
-				cb->cb->dst = addr;
-			else
-				cb->cb->src = addr;
-			cb->cb->length = min(len, max_len);
+		if (c->is_40bit_channel) {
+			struct bcm2711_dma40_scb *scb;
+
+			for (addr = sg_dma_address(sgent),
+			     len = sg_dma_len(sgent);
+			     len > 0;
+			     addr += scb->len, len -= scb->len, cb++) {
+				scb = (struct bcm2711_dma40_scb *)cb->cb;
+				if (direction == DMA_DEV_TO_MEM) {
+					scb->dst = lower_32_bits(addr);
+					scb->dsti = upper_32_bits(addr) | BCM2711_DMA40_INC;
+				} else {
+					scb->src = lower_32_bits(addr);
+					scb->srci = upper_32_bits(addr) | BCM2711_DMA40_INC;
+				}
+				scb->len = min(len, max_len);
+			}
+		} else {
+			for (addr = sg_dma_address(sgent),
+			     len = sg_dma_len(sgent);
+			     len > 0;
+			     addr += cb->cb->length, len -= cb->cb->length,
+			     cb++) {
+				if (direction == DMA_DEV_TO_MEM)
+					cb->cb->dst = addr;
+				else
+					cb->cb->src = addr;
+				cb->cb->length = min(len, max_len);
+			}
 		}
 	}
 }
@@ -423,20 +637,60 @@ static void bcm2835_dma_abort(struct bcm2835_chan *c)
 	if (!readl(chan_base + BCM2835_DMA_ADDR))
 		return;
 
-	/* Write 0 to the active bit - Pause the DMA */
-	writel(0, chan_base + BCM2835_DMA_CS);
+	if (c->is_40bit_channel) {
+		/* Halt the current DMA */
+		writel(readl(chan_base + BCM2711_DMA40_CS) | BCM2711_DMA40_HALT,
+		       chan_base + BCM2711_DMA40_CS);
 
-	/* Wait for any current AXI transfer to complete */
-	while ((readl(chan_base + BCM2835_DMA_CS) &
-		BCM2835_DMA_WAITING_FOR_WRITES) && --timeout)
-		cpu_relax();
+		while ((readl(chan_base + BCM2711_DMA40_CS) & BCM2711_DMA40_HALT) && --timeout)
+			cpu_relax();
 
-	/* Peripheral might be stuck and fail to signal AXI write responses */
-	if (!timeout)
-		dev_err(c->vc.chan.device->dev,
-			"failed to complete outstanding writes\n");
+		/* Peripheral might be stuck and fail to halt */
+		if (!timeout)
+			dev_err(c->vc.chan.device->dev,
+				"failed to halt dma\n");
 
-	writel(BCM2835_DMA_RESET, chan_base + BCM2835_DMA_CS);
+		writel(0, chan_base + BCM2711_DMA40_CS);
+		writel(0, chan_base + BCM2711_DMA40_CB);
+	} else {
+		/*
+		 * A zero control block address means the channel is idle.
+		 * (The ACTIVE flag in the CS register is not a reliable indicator.)
+		 */
+		if (!readl(chan_base + BCM2835_DMA_ADDR))
+			return;
+
+		/* Write 0 to the active bit - Pause the DMA */
+		writel(readl(chan_base + BCM2835_DMA_CS) & ~BCM2835_DMA_ACTIVE,
+		       chan_base + BCM2835_DMA_CS);
+
+		/* wait for DMA to be paused */
+		while ((readl(chan_base + BCM2835_DMA_CS) & BCM2835_DMA_WAITING_FOR_WRITES) &&
+		       --timeout)
+			cpu_relax();
+
+		/* Peripheral might be stuck and fail to signal AXI write responses */
+		if (!timeout)
+			dev_err(c->vc.chan.device->dev,
+				"failed to pause dma\n");
+
+		/* We need to clear the next DMA block pending */
+		writel(0, chan_base + BCM2835_DMA_NEXTCB);
+
+		/* Abort the DMA, which needs to be enabled to complete */
+		writel(readl(chan_base + BCM2835_DMA_CS) | BCM2835_DMA_ABORT | BCM2835_DMA_ACTIVE,
+		       chan_base + BCM2835_DMA_CS);
+
+		/* wait for DMA to have been aborted */
+		timeout = 10000;
+		while ((readl(chan_base + BCM2835_DMA_CS) & BCM2835_DMA_ABORT) && --timeout)
+			cpu_relax();
+
+		/* Peripheral might be stuck and fail to signal AXI write responses */
+		if (!timeout)
+			dev_err(c->vc.chan.device->dev,
+				"failed to abort dma\n");
+	}
 }
 
 static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
@@ -453,9 +707,16 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
 
 	c->desc = d = to_bcm2835_dma_desc(&vd->tx);
 
-	writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
-	writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
-	       c->chan_base + BCM2835_DMA_CS);
+	if (c->is_40bit_channel) {
+		writel(to_bcm2711_cbaddr(d->cb_list[0].paddr),
+		       c->chan_base + BCM2711_DMA40_CB);
+		writel(BCM2711_DMA40_ACTIVE | BCM2711_DMA40_CS_FLAGS(c->dreq),
+		       c->chan_base + BCM2711_DMA40_CS);
+	} else {
+		writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
+		writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
+		       c->chan_base + BCM2835_DMA_CS);
+	}
 }
 
 static irqreturn_t bcm2835_dma_callback(int irq, void *data)
@@ -482,8 +743,7 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
 	 * if this IRQ handler is threaded.) If the channel is finished, it
 	 * will remain idle despite the ACTIVE flag being set.
 	 */
-	writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE |
-	       BCM2835_DMA_CS_FLAGS(c->dreq),
+	writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
 	       c->chan_base + BCM2835_DMA_CS);
 
 	d = c->desc;
@@ -546,20 +806,39 @@ static size_t bcm2835_dma_desc_size_pos(struct bcm2835_desc *d, dma_addr_t addr)
 	unsigned int i;
 	size_t size;
 
-	for (size = i = 0; i < d->frames; i++) {
-		struct bcm2835_dma_cb *control_block = d->cb_list[i].cb;
-		size_t this_size = control_block->length;
-		dma_addr_t dma;
+	if (d->c->is_40bit_channel) {
+		for (size = i = 0; i < d->frames; i++) {
+			struct bcm2711_dma40_scb *control_block =
+				(struct bcm2711_dma40_scb *)d->cb_list[i].cb;
+			size_t this_size = control_block->len;
+			dma_addr_t dma;
 
-		if (d->dir == DMA_DEV_TO_MEM)
-			dma = control_block->dst;
-		else
-			dma = control_block->src;
+			if (d->dir == DMA_DEV_TO_MEM)
+				dma = control_block->dst;
+			else
+				dma = control_block->src;
 
-		if (size)
-			size += this_size;
-		else if (addr >= dma && addr < dma + this_size)
-			size += dma + this_size - addr;
+			if (size)
+				size += this_size;
+			else if (addr >= dma && addr < dma + this_size)
+				size += dma + this_size - addr;
+		}
+	} else {
+		for (size = i = 0; i < d->frames; i++) {
+			struct bcm2835_dma_cb *control_block = d->cb_list[i].cb;
+			size_t this_size = control_block->length;
+			dma_addr_t dma;
+
+			if (d->dir == DMA_DEV_TO_MEM)
+				dma = control_block->dst;
+			else
+				dma = control_block->src;
+
+			if (size)
+				size += this_size;
+			else if (addr >= dma && addr < dma + this_size)
+				size += dma + this_size - addr;
+		}
 	}
 
 	return size;
@@ -586,12 +865,25 @@ static enum dma_status bcm2835_dma_tx_status(struct dma_chan *chan,
 		struct bcm2835_desc *d = c->desc;
 		dma_addr_t pos;
 
-		if (d->dir == DMA_MEM_TO_DEV)
+		if (d->dir == DMA_MEM_TO_DEV && c->is_40bit_channel) {
+			u64 lo_bits, hi_bits;
+
+			lo_bits = readl(c->chan_base + BCM2711_DMA40_SRC);
+			hi_bits = readl(c->chan_base + BCM2711_DMA40_SRCI) & 0xff;
+			pos = (hi_bits << 32) | lo_bits;
+		} else if (d->dir == DMA_MEM_TO_DEV && !c->is_40bit_channel) {
 			pos = readl(c->chan_base + BCM2835_DMA_SOURCE_AD);
-		else if (d->dir == DMA_DEV_TO_MEM)
+		} else if (d->dir == DMA_DEV_TO_MEM && c->is_40bit_channel) {
+			u64 lo_bits, hi_bits;
+
+			lo_bits = readl(c->chan_base + BCM2711_DMA40_DEST);
+			hi_bits = readl(c->chan_base + BCM2711_DMA40_DESTI) & 0xff;
+			pos = (hi_bits << 32) | lo_bits;
+		} else if (d->dir == DMA_DEV_TO_MEM && !c->is_40bit_channel) {
 			pos = readl(c->chan_base + BCM2835_DMA_DEST_AD);
-		else
+		} else {
 			pos = 0;
+		}
 
 		txstate->residue = bcm2835_dma_desc_size_pos(d, pos);
 	} else {
@@ -634,7 +926,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
 	frames = bcm2835_dma_frames_for_length(len, max_len);
 
 	/* allocate the CB chain - this also fills in the pointers */
-	d = bcm2835_dma_create_cb_chain(chan, DMA_MEM_TO_MEM, false,
+	d = bcm2835_dma_create_cb_chain(c, DMA_MEM_TO_MEM, false,
 					info, extra, frames,
 					src, dst, len, 0, GFP_KERNEL);
 	if (!d)
@@ -669,11 +961,21 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
 		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
 		src = c->cfg.src_addr;
+		/*
+		 * One would think it ought to be possible to get the physical
+		 * to dma address mapping information from the dma-ranges DT
+		 * property, but I've not found a way yet that doesn't involve
+		 * open-coding the whole thing.
+		 */
+		if (c->is_40bit_channel)
+			src |= 0x400000000ull;
 		info |= BCM2835_DMA_S_DREQ | BCM2835_DMA_D_INC;
 	} else {
 		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
 		dst = c->cfg.dst_addr;
+		if (c->is_40bit_channel)
+			dst |= 0x400000000ull;
 		info |= BCM2835_DMA_D_DREQ | BCM2835_DMA_S_INC;
 	}
 
@@ -681,7 +983,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
 	frames = bcm2835_dma_count_frames_for_sg(c, sgl, sg_len);
 
 	/* allocate the CB chain */
-	d = bcm2835_dma_create_cb_chain(chan, direction, false,
+	d = bcm2835_dma_create_cb_chain(c, direction, false,
 					info, extra,
 					frames, src, dst, 0, 0,
 					GFP_NOWAIT);
@@ -689,7 +991,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
 		return NULL;
 
 	/* fill in frames with scatterlist pointers */
-	bcm2835_dma_fill_cb_chain_with_sg(chan, direction, d->cb_list,
+	bcm2835_dma_fill_cb_chain_with_sg(c, direction, d->cb_list,
 					  sgl, sg_len);
 
 	return vchan_tx_prep(&c->vc, &d->vd, flags);
@@ -743,12 +1045,16 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
 		src = c->cfg.src_addr;
+		if (c->is_40bit_channel)
+			src |= 0x400000000ull;
 		dst = buf_addr;
 		info |= BCM2835_DMA_S_DREQ | BCM2835_DMA_D_INC;
 	} else {
 		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
 		dst = c->cfg.dst_addr;
+		if (c->is_40bit_channel)
+			dst |= 0x400000000ull;
 		src = buf_addr;
 		info |= BCM2835_DMA_D_DREQ | BCM2835_DMA_S_INC;
 
@@ -768,7 +1074,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 	 * note that we need to use GFP_NOWAIT, as the ALSA i2s dmaengine
 	 * implementation calls prep_dma_cyclic with interrupts disabled.
 	 */
-	d = bcm2835_dma_create_cb_chain(chan, direction, true,
+	d = bcm2835_dma_create_cb_chain(c, direction, true,
 					info, extra,
 					frames, src, dst, buf_len,
 					period_len, GFP_NOWAIT);
@@ -776,7 +1082,12 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 		return NULL;
 
 	/* wrap around into a loop */
-	d->cb_list[d->frames - 1].cb->next = d->cb_list[0].paddr;
+	if (c->is_40bit_channel)
+		((struct bcm2711_dma40_scb *)
+		 d->cb_list[frames - 1].cb)->next_cb =
+			to_bcm2711_cbaddr(d->cb_list[0].paddr);
+	else
+		d->cb_list[d->frames - 1].cb->next = d->cb_list[0].paddr;
 
 	return vchan_tx_prep(&c->vc, &d->vd, flags);
 }
@@ -837,9 +1148,11 @@ static int bcm2835_dma_chan_init(struct bcm2835_dmadev *d, int chan_id,
 	c->irq_number = irq;
 	c->irq_flags = irq_flags;
 
-	/* check in DEBUG register if this is a LITE channel */
-	if (readl(c->chan_base + BCM2835_DMA_DEBUG) &
-		BCM2835_DMA_DEBUG_LITE)
+	/* check for 40bit and lite channels */
+	if (d->cfg_data->chan_40bit_mask & BIT(chan_id))
+		c->is_40bit_channel = true;
+	else if (readl(c->chan_base + BCM2835_DMA_DEBUG) &
+		 BCM2835_DMA_DEBUG_LITE)
 		c->is_lite_channel = true;
 
 	return 0;
@@ -859,8 +1172,58 @@ static void bcm2835_dma_free(struct bcm2835_dmadev *od)
 			     DMA_TO_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
 }
 
+int bcm2711_dma40_memcpy_init(void)
+{
+	if (!memcpy_parent)
+		return -EPROBE_DEFER;
+
+	if (!memcpy_chan)
+		return -EINVAL;
+
+	if (!memcpy_scb)
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL(bcm2711_dma40_memcpy_init);
+
+void bcm2711_dma40_memcpy(dma_addr_t dst, dma_addr_t src, size_t size)
+{
+	struct bcm2711_dma40_scb *scb = memcpy_scb;
+	unsigned long flags;
+
+	if (!scb) {
+		pr_err("%s not initialised!\n", __func__);
+		return;
+	}
+
+	spin_lock_irqsave(&memcpy_lock, flags);
+
+	scb->ti = 0;
+	scb->src = lower_32_bits(src);
+	scb->srci = upper_32_bits(src) | BCM2711_DMA40_MEMCPY_XFER_INFO;
+	scb->dst = lower_32_bits(dst);
+	scb->dsti = upper_32_bits(dst) | BCM2711_DMA40_MEMCPY_XFER_INFO;
+	scb->len = size;
+	scb->next_cb = 0;
+
+	writel((u32)(memcpy_scb_dma >> 5), memcpy_chan + BCM2711_DMA40_CB);
+	writel(BCM2711_DMA40_MEMCPY_FLAGS + BCM2711_DMA40_ACTIVE,
+	       memcpy_chan + BCM2711_DMA40_CS);
+
+	/* Poll for completion */
+	while (!(readl(memcpy_chan + BCM2711_DMA40_CS) & BCM2711_DMA40_END))
+		cpu_relax();
+
+	writel(BCM2711_DMA40_END, memcpy_chan + BCM2711_DMA40_CS);
+
+	spin_unlock_irqrestore(&memcpy_lock, flags);
+}
+EXPORT_SYMBOL(bcm2711_dma40_memcpy);
+
 static const struct of_device_id bcm2835_dma_of_match[] = {
-	{ .compatible = "brcm,bcm2835-dma", },
+	{ .compatible = "brcm,bcm2835-dma", .data = &bcm2835_dma_cfg },
+	{ .compatible = "brcm,bcm2711-dma", .data = &bcm2711_dma_cfg },
 	{},
 };
 MODULE_DEVICE_TABLE(of, bcm2835_dma_of_match);
@@ -884,6 +1247,7 @@ static struct dma_chan *bcm2835_dma_xlate(struct of_phandle_args *spec,
 static int bcm2835_dma_probe(struct platform_device *pdev)
 {
 	struct bcm2835_dmadev *od;
+	struct resource *res;
 	void __iomem *base;
 	int rc;
 	int i, j;
@@ -891,6 +1255,8 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 	int irq_flags;
 	uint32_t chans_available;
 	char chan_name[BCM2835_DMA_CHAN_NAME_SIZE];
+	const struct of_device_id *of_id;
+	int chan_count, chan_start, chan_end;
 
 	if (!pdev->dev.dma_mask)
 		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
@@ -907,10 +1273,17 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 
 	dma_set_max_seg_size(&pdev->dev, 0x3FFFFFFF);
 
-	base = devm_platform_ioremap_resource(pdev, 0);
+	base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
 	if (IS_ERR(base))
 		return PTR_ERR(base);
 
+	/* The set of channels can be split across multiple instances. */
+	chan_start = ((u32)(uintptr_t)base / BCM2835_DMA_CHAN_SIZE) & 0xf;
+	base -= BCM2835_DMA_CHAN(chan_start);
+	chan_count = resource_size(res) / BCM2835_DMA_CHAN_SIZE;
+	chan_end = min(chan_start + chan_count,
+		       BCM2835_DMA_MAX_DMA_CHAN_SUPPORTED + 1);
+
 	od->base = base;
 
 	dma_cap_set(DMA_SLAVE, od->ddev.cap_mask);
@@ -946,6 +1319,14 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
+	of_id = of_match_node(bcm2835_dma_of_match, pdev->dev.of_node);
+	if (!of_id) {
+		dev_err(&pdev->dev, "Failed to match compatible string\n");
+		return -EINVAL;
+	}
+
+	od->cfg_data = of_id->data;
+
 	/* Request DMA channel mask from device tree */
 	if (of_property_read_u32(pdev->dev.of_node,
 			"brcm,dma-channel-mask",
@@ -955,8 +1336,23 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 		goto err_no_dma;
 	}
 
+	/* One channel is reserved for the 40-bit DMA memcpy API */
+	if (chans_available & od->cfg_data->chan_40bit_mask &
+	    BIT(BCM2711_DMA_MEMCPY_CHAN)) {
+		memcpy_parent = od;
+		memcpy_chan = BCM2835_DMA_CHANIO(base, BCM2711_DMA_MEMCPY_CHAN);
+		memcpy_scb = dma_alloc_coherent(memcpy_parent->ddev.dev,
+						sizeof(*memcpy_scb),
+						&memcpy_scb_dma, GFP_KERNEL);
+		if (!memcpy_scb)
+			dev_warn(&pdev->dev,
+				 "Failed to allocate memcpy scb\n");
+
+		chans_available &= ~BIT(BCM2711_DMA_MEMCPY_CHAN);
+	}
+
 	/* get irqs for each channel that we support */
-	for (i = 0; i <= BCM2835_DMA_MAX_DMA_CHAN_SUPPORTED; i++) {
+	for (i = chan_start; i < chan_end; i++) {
 		/* skip masked out channels */
 		if (!(chans_available & (1 << i))) {
 			irq[i] = -1;
@@ -979,13 +1375,18 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 		irq[i] = platform_get_irq(pdev, i < 11 ? i : 11);
 	}
 
+	chan_count = 0;
+
 	/* get irqs for each channel */
-	for (i = 0; i <= BCM2835_DMA_MAX_DMA_CHAN_SUPPORTED; i++) {
+	for (i = chan_start; i < chan_end; i++) {
 		/* skip channels without irq */
 		if (irq[i] < 0)
 			continue;
 
 		/* check if there are other channels that also use this irq */
+		/* FIXME: This will fail if interrupts are shared across
+		 * instances
+		 */
 		irq_flags = 0;
 		for (j = 0; j <= BCM2835_DMA_MAX_DMA_CHAN_SUPPORTED; j++)
 			if ((i != j) && (irq[j] == irq[i])) {
@@ -997,9 +1398,10 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 		rc = bcm2835_dma_chan_init(od, i, irq[i], irq_flags);
 		if (rc)
 			goto err_no_dma;
+		chan_count++;
 	}
 
-	dev_dbg(&pdev->dev, "Initialized %i DMA channels\n", i);
+	dev_dbg(&pdev->dev, "Initialized %i DMA channels\n", chan_count);
 
 	/* Device-tree DMA controller registration */
 	rc = of_dma_controller_register(pdev->dev.of_node,
@@ -1030,6 +1432,13 @@ static void bcm2835_dma_remove(struct platform_device *pdev)
 	struct bcm2835_dmadev *od = platform_get_drvdata(pdev);
 
 	dma_async_device_unregister(&od->ddev);
+	if (memcpy_parent == od) {
+		dma_free_coherent(&pdev->dev, sizeof(*memcpy_scb), memcpy_scb,
+				  memcpy_scb_dma);
+		memcpy_parent = NULL;
+		memcpy_scb = NULL;
+		memcpy_chan = NULL;
+	}
 	bcm2835_dma_free(od);
 }
 
-- 
2.41.0
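
For readers following the probe changes above, the channel-window arithmetic ("The set of channels can be split across multiple instances") can be sketched in plain user-space C. This is only an illustration: the 0x100-byte channel stride, the limit of 14 and the helper names are assumptions, and the register addresses in the self-test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: one channel per 0x100-byte register window, and a
 * highest supported channel index of 14. */
#define CHAN_SIZE		0x100u
#define MAX_CHAN_SUPPORTED	14

struct chan_window {
	int start;	/* first channel index served by this instance */
	int end;	/* one past the last channel index */
};

/* Derive the channel window from the register base and resource size. */
static struct chan_window chan_window_from_resource(uint64_t base,
						    uint64_t size)
{
	struct chan_window w;
	int count = (int)(size / CHAN_SIZE);

	w.start = (int)((base / CHAN_SIZE) & 0xf);
	w.end = w.start + count;
	if (w.end > MAX_CHAN_SUPPORTED + 1)
		w.end = MAX_CHAN_SUPPORTED + 1;
	return w;
}

/* Worked values for two hypothetical register blocks. */
void chan_window_selftest(void)
{
	struct chan_window w;

	/* An instance starting at offset 0xb00 covers channels 11..14. */
	w = chan_window_from_resource(0x7e007b00u, 0x400u);
	assert(w.start == 11 && w.end == 15);

	/* An instance starting at offset 0x000 covers channels 0..10. */
	w = chan_window_from_resource(0x7e007000u, 0xb00u);
	assert(w.start == 0 && w.end == 11);
}
```

The probe loops then only walk [start, end), which is why the patch replaces the fixed 0..BCM2835_DMA_MAX_DMA_CHAN_SUPPORTED iteration.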


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH 03/12] bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
  2024-02-04  6:59 ` [PATCH 01/12] bcm2835-dma: Add support for per-channel flags Andrea della Porta
  2024-02-04  6:59 ` [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-04  6:59 ` [PATCH 04/12] bcm2835-dma: Advertise the full DMA range Andrea della Porta
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Andrea della Porta

From: Phil Elwell <phil@raspberrypi.com>

Use bit 27 of the dreq value (the second cell of the DT DMA descriptor)
to request that the WAIT_RESP bit is not set.

Use (reserved) bits 24 and 25 of the dreq value to request wide source
reads and wide destination writes, respectively.

Cc: Dom Cobley <popcornmix@gmail.com>
Cc: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
 drivers/dma/bcm2835-dma.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 11c6bf7d8a4b..36bad198b655 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -163,6 +163,21 @@ struct bcm2835_desc {
 #define BCM2835_DMA_WAIT(x)	((x & 31) << 21) /* add DMA-wait cycles */
 #define BCM2835_DMA_NO_WIDE_BURSTS BIT(26) /* no 2 beat write bursts */
 
+/* A fake bit to request that the driver doesn't set the WAIT_RESP bit. */
+#define BCM2835_DMA_NO_WAIT_RESP BIT(27)
+#define WAIT_RESP(x) (((x) & BCM2835_DMA_NO_WAIT_RESP) ? \
+		      0 : BCM2835_DMA_WAIT_RESP)
+
+/* A fake bit to request that the driver requires wide reads */
+#define BCM2835_DMA_WIDE_SOURCE BIT(24)
+#define WIDE_SOURCE(x) (((x) & BCM2835_DMA_WIDE_SOURCE) ? \
+		      BCM2835_DMA_S_WIDTH : 0)
+
+/* A fake bit to request that the driver requires wide writes */
+#define BCM2835_DMA_WIDE_DEST BIT(25)
+#define WIDE_DEST(x) (((x) & BCM2835_DMA_WIDE_DEST) ? \
+		      BCM2835_DMA_D_WIDTH : 0)
+
 /* debug register bits */
 #define BCM2835_DMA_DEBUG_LAST_NOT_SET_ERR	BIT(0)
 #define BCM2835_DMA_DEBUG_FIFO_ERR		BIT(1)
@@ -913,8 +928,9 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
 {
 	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	struct bcm2835_desc *d;
-	u32 info = BCM2835_DMA_D_INC | BCM2835_DMA_S_INC;
-	u32 extra = BCM2835_DMA_INT_EN | BCM2835_DMA_WAIT_RESP;
+	u32 info = BCM2835_DMA_D_INC | BCM2835_DMA_S_INC |
+		   WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
+	u32 extra = BCM2835_DMA_INT_EN;
 	size_t max_len = bcm2835_dma_max_frame_length(c);
 	size_t frames;
 
@@ -944,7 +960,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
 	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	struct bcm2835_desc *d;
 	dma_addr_t src = 0, dst = 0;
-	u32 info = BCM2835_DMA_WAIT_RESP;
+	u32 info = WAIT_RESP(c->dreq) |
+		   WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
 	u32 extra = BCM2835_DMA_INT_EN;
 	size_t frames;
 
@@ -1006,7 +1023,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	struct bcm2835_desc *d;
 	dma_addr_t src, dst;
-	u32 info = BCM2835_DMA_WAIT_RESP;
+	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
 	u32 extra = 0;
 	size_t max_len = bcm2835_dma_max_frame_length(c);
 	size_t frames;
-- 
2.41.0
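
As a rough illustration of the flag decoding in this patch, the mapping from the "fake" dreq bits to transfer-information bits might look like the stand-alone sketch below. The positions of WAIT_RESP and D_WIDTH are assumed from the upstream driver (only S_WIDTH appears in the quoted context), and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

/* Transfer-information bits. WAIT_RESP and D_WIDTH are assumed values. */
#define DMA_WAIT_RESP		BIT(3)
#define DMA_D_WIDTH		BIT(5)
#define DMA_S_WIDTH		BIT(9)

/* "Fake" request bits carried in the second cell of the DT DMA specifier. */
#define DMA_WIDE_SOURCE		BIT(24)
#define DMA_WIDE_DEST		BIT(25)
#define DMA_NO_WAIT_RESP	BIT(27)

/* Translate the fake dreq bits into transfer-information bits. */
static uint32_t dreq_to_info(uint32_t dreq)
{
	uint32_t info = 0;

	if (!(dreq & DMA_NO_WAIT_RESP))
		info |= DMA_WAIT_RESP;	/* default: wait for write responses */
	if (dreq & DMA_WIDE_SOURCE)
		info |= DMA_S_WIDTH;
	if (dreq & DMA_WIDE_DEST)
		info |= DMA_D_WIDTH;
	return info;
}

/* Worked values: the default keeps WAIT_RESP, bit 27 drops it. */
void dreq_to_info_selftest(void)
{
	assert(dreq_to_info(0) == DMA_WAIT_RESP);
	assert(dreq_to_info(DMA_NO_WAIT_RESP) == 0);
	assert(dreq_to_info(DMA_WIDE_SOURCE | DMA_WIDE_DEST) ==
	       (DMA_WAIT_RESP | DMA_S_WIDTH | DMA_D_WIDTH));
}
```

Note the inverted sense of bit 27: existing device trees that leave it clear keep the previous behaviour, where WAIT_RESP was always set.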



* [PATCH 04/12] bcm2835-dma: Advertise the full DMA range
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (2 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 03/12] bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-05 17:55   ` Robin Murphy
  2024-02-05 18:25   ` Stefan Wahren
  2024-02-04  6:59 ` [PATCH 05/12] bcm2835-dma: Derive slave DMA addresses correctly Andrea della Porta
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

From: Phil Elwell <phil@raspberrypi.com>

Unless the DMA mask is set wider than 32 bits, DMA mapping will use a
bounce buffer.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
---
 drivers/dma/bcm2835-dma.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 36bad198b655..237dcdb8d726 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -39,6 +39,7 @@
 #define BCM2711_DMA_MEMCPY_CHAN 14
 
 struct bcm2835_dma_cfg_data {
+	u64	dma_mask;
 	u32	chan_40bit_mask;
 };
 
@@ -308,10 +309,12 @@ DEFINE_SPINLOCK(memcpy_lock);
 
 static const struct bcm2835_dma_cfg_data bcm2835_dma_cfg = {
 	.chan_40bit_mask = 0,
+	.dma_mask = DMA_BIT_MASK(32),
 };
 
 static const struct bcm2835_dma_cfg_data bcm2711_dma_cfg = {
 	.chan_40bit_mask = BIT(11) | BIT(12) | BIT(13) | BIT(14),
+	.dma_mask = DMA_BIT_MASK(36),
 };
 
 static inline size_t bcm2835_dma_max_frame_length(struct bcm2835_chan *c)
@@ -1263,6 +1266,8 @@ static struct dma_chan *bcm2835_dma_xlate(struct of_phandle_args *spec,
 
 static int bcm2835_dma_probe(struct platform_device *pdev)
 {
+	const struct bcm2835_dma_cfg_data *cfg_data;
+	const struct of_device_id *of_id;
 	struct bcm2835_dmadev *od;
 	struct resource *res;
 	void __iomem *base;
@@ -1272,13 +1277,20 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 	int irq_flags;
 	uint32_t chans_available;
 	char chan_name[BCM2835_DMA_CHAN_NAME_SIZE];
-	const struct of_device_id *of_id;
 	int chan_count, chan_start, chan_end;
 
+	of_id = of_match_node(bcm2835_dma_of_match, pdev->dev.of_node);
+	if (!of_id) {
+		dev_err(&pdev->dev, "Failed to match compatible string\n");
+		return -EINVAL;
+	}
+
+	cfg_data = of_id->data;
+
 	if (!pdev->dev.dma_mask)
 		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
 
-	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+	rc = dma_set_mask_and_coherent(&pdev->dev, cfg_data->dma_mask);
 	if (rc) {
 		dev_err(&pdev->dev, "Unable to set DMA mask\n");
 		return rc;
@@ -1342,7 +1354,7 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
 		return -EINVAL;
 	}
 
-	od->cfg_data = of_id->data;
+	od->cfg_data = cfg_data;
 
 	/* Request DMA channel mask from device tree */
 	if (of_property_read_u32(pdev->dev.of_node,
-- 
2.41.0
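
To make the bounce-buffer remark concrete, here is a small user-space sketch of how a DMA mask limits reachable addresses. It only mirrors the arithmetic of DMA_BIT_MASK(); the reachability check is a simplification of what the DMA mapping core does, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n) for n up to 64. */
static uint64_t dma_bit_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/* True if the whole buffer [addr, addr + size) lies under the mask.
 * size must be non-zero. */
static bool dma_addr_fits(uint64_t addr, size_t size, uint64_t mask)
{
	return addr + size - 1 <= mask;
}

/* Worked values for the three masks used by the driver. */
void dma_mask_selftest(void)
{
	const uint64_t at_4g = 0x100000000ULL;		/* 4 GiB */
	const uint64_t at_64g = 0x1000000000ULL;	/* 64 GiB */

	assert(dma_bit_mask(32) == 0xffffffffULL);
	assert(dma_bit_mask(36) == 0xfffffffffULL);

	/* A page at 4 GiB is out of reach with a 32-bit mask... */
	assert(!dma_addr_fits(at_4g, 4096, dma_bit_mask(32)));
	/* ...but reachable with the 36-bit mask used for BCM2711. */
	assert(dma_addr_fits(at_4g, 4096, dma_bit_mask(36)));

	/* A page at 64 GiB needs more than 36 bits. */
	assert(!dma_addr_fits(at_64g, 4096, dma_bit_mask(36)));
	assert(dma_addr_fits(at_64g, 4096, dma_bit_mask(40)));
}
```

A buffer that fails this kind of check is what forces the mapping layer to fall back to a bounce buffer.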



* [PATCH 05/12] bcm2835-dma: Derive slave DMA addresses correctly
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (3 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 04/12] bcm2835-dma: Advertise the full DMA range Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-05 18:03   ` Robin Murphy
  2024-02-04  6:59 ` [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant Andrea della Porta
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

From: Phil Elwell <phil@raspberrypi.com>

Slave addresses for DMA are meant to be supplied as physical addresses
(contrary to what struct snd_dmaengine_dai_dma_data does). It is up to
the DMA controller driver to perform the translation based on its own
view of the world, as described in Device Tree.

Now that the Pi Device Trees have the correct peripheral mappings,
replace the hacky address munging with phys_to_dma().

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
---
 drivers/dma/bcm2835-dma.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 237dcdb8d726..077812eda609 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -18,6 +18,7 @@
  *	Copyright 2012 Marvell International Ltd.
  */
 #include <linux/dmaengine.h>
+#include <linux/dma-direct.h>
 #include <linux/dma-mapping.h>
 #include <linux/dmapool.h>
 #include <linux/err.h>
@@ -980,22 +981,12 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
 	if (direction == DMA_DEV_TO_MEM) {
 		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
-		src = c->cfg.src_addr;
-		/*
-		 * One would think it ought to be possible to get the physical
-		 * to dma address mapping information from the dma-ranges DT
-		 * property, but I've not found a way yet that doesn't involve
-		 * open-coding the whole thing.
-		 */
-		if (c->is_40bit_channel)
-			src |= 0x400000000ull;
+		src = phys_to_dma(chan->device->dev, c->cfg.src_addr);
 		info |= BCM2835_DMA_S_DREQ | BCM2835_DMA_D_INC;
 	} else {
 		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
-		dst = c->cfg.dst_addr;
-		if (c->is_40bit_channel)
-			dst |= 0x400000000ull;
+		dst = phys_to_dma(chan->device->dev, c->cfg.dst_addr);
 		info |= BCM2835_DMA_D_DREQ | BCM2835_DMA_S_INC;
 	}
 
@@ -1064,17 +1055,13 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 	if (direction == DMA_DEV_TO_MEM) {
 		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
-		src = c->cfg.src_addr;
-		if (c->is_40bit_channel)
-			src |= 0x400000000ull;
+		src = phys_to_dma(chan->device->dev, c->cfg.src_addr);
 		dst = buf_addr;
 		info |= BCM2835_DMA_S_DREQ | BCM2835_DMA_D_INC;
 	} else {
 		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
 			return NULL;
-		dst = c->cfg.dst_addr;
-		if (c->is_40bit_channel)
-			dst |= 0x400000000ull;
+		dst = phys_to_dma(chan->device->dev, c->cfg.dst_addr);
 		src = buf_addr;
 		info |= BCM2835_DMA_D_DREQ | BCM2835_DMA_S_INC;
 
-- 
2.41.0
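
The idea behind swapping the hard-coded "|= 0x400000000ull" for phys_to_dma() can be sketched as a range-table lookup, similar in spirit to what dma-ranges provides. The structure, the helper name and the addresses below are hypothetical and chosen only to show the offset arithmetic; they are not taken from a real Device Tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One CPU-physical to DMA-address window (illustrative layout). */
struct dma_region {
	uint64_t cpu_start;
	uint64_t dma_start;
	uint64_t size;
};

/* Translate phys through the first matching region.
 * Returns false if no region covers the address. */
static bool region_phys_to_dma(const struct dma_region *map, size_t n,
			       uint64_t phys, uint64_t *dma)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (phys >= map[i].cpu_start &&
		    phys - map[i].cpu_start < map[i].size) {
			*dma = phys - map[i].cpu_start + map[i].dma_start;
			return true;
		}
	}
	return false;
}

/* Worked values with a made-up peripheral window. */
void region_phys_to_dma_selftest(void)
{
	static const struct dma_region map[] = {
		{ 0xfe000000ULL, 0x47e000000ULL, 0x01800000ULL },
	};
	uint64_t dma = 0;

	assert(region_phys_to_dma(map, 1, 0xfe203000ULL, &dma));
	assert(dma == 0x47e203000ULL);

	/* Addresses outside every window are not translated. */
	assert(!region_phys_to_dma(map, 1, 0x1000ULL, &dma));
}
```

With the translation driven by data rather than by is_40bit_channel, the same driver code serves controllers that see the peripherals at different bus addresses.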



* [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (4 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 05/12] bcm2835-dma: Derive slave DMA addresses correctly Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-04 17:04   ` Florian Fainelli
  2024-02-04  6:59 ` [PATCH 07/12] bcm2835-dma: Support dma flags for multi-beat burst Andrea della Porta
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

From: Maxime Ripard <maxime@cerno.tech>

bcm2711_dma40_memcpy has some code strictly equivalent to the
to_bcm2711_cbaddr() function. Let's use it instead.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
---
 drivers/dma/bcm2835-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 077812eda609..d8d1f9ba2572 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -1214,7 +1214,7 @@ void bcm2711_dma40_memcpy(dma_addr_t dst, dma_addr_t src, size_t size)
 	scb->len = size;
 	scb->next_cb = 0;
 
-	writel((u32)(memcpy_scb_dma >> 5), memcpy_chan + BCM2711_DMA40_CB);
+	writel(to_bcm2711_cbaddr(memcpy_scb_dma), memcpy_chan + BCM2711_DMA40_CB);
 	writel(BCM2711_DMA40_MEMCPY_FLAGS + BCM2711_DMA40_ACTIVE,
 	       memcpy_chan + BCM2711_DMA40_CS);
 
-- 
2.41.0
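
For reference, the conversion that the replaced open-coded expression performs is a right shift by 5, i.e. the CB register holds the control block address in units of 32 bytes. A minimal sketch follows; the alignment assertion is an assumption added for the example (the patch itself only shows the shift), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a control block bus address to the value written to the
 * 40-bit channel's CB register. The alignment check is an assumption. */
static uint32_t to_cbaddr(uint64_t addr)
{
	assert((addr & 0x1f) == 0);	/* must be 32-byte aligned */
	return (uint32_t)(addr >> 5);
}

/* Worked values. */
void to_cbaddr_selftest(void)
{
	assert(to_cbaddr(0x1000ULL) == 0x80u);
	/* An address above 4 GiB still yields a 32-bit register value. */
	assert(to_cbaddr(0x400000020ULL) == 0x20000001u);
}
```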



* [PATCH 07/12] bcm2835-dma: Support dma flags for multi-beat burst
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (5 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-07  8:22   ` Vinod Koul
  2024-02-04  6:59 ` [PATCH 08/12] bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller Andrea della Porta
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Andrea della Porta

From: Dom Cobley <popcornmix@gmail.com>

Add a control bit to enable a multi-beat burst on a DMA.
This improves DMA performance and is required for HDMI audio.

Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
 drivers/dma/bcm2835-dma.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index d8d1f9ba2572..a20700a400a2 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -156,7 +156,8 @@ struct bcm2835_desc {
 #define BCM2835_DMA_S_WIDTH	BIT(9) /* 128bit writes if set */
 #define BCM2835_DMA_S_DREQ	BIT(10) /* enable SREQ for source */
 #define BCM2835_DMA_S_IGNORE	BIT(11) /* ignore source reads - read 0 */
-#define BCM2835_DMA_BURST_LENGTH(x) ((x & 15) << 12)
+#define BCM2835_DMA_BURST_LENGTH(x) (((x) & 15) << 12)
+#define BCM2835_DMA_GET_BURST_LENGTH(x) (((x) >> 12) & 15)
 #define BCM2835_DMA_CS_FLAGS(x) ((x) & (BCM2835_DMA_PRIORITY(15) | \
 				      BCM2835_DMA_PANIC_PRIORITY(15) | \
 				      BCM2835_DMA_WAIT_FOR_WRITES | \
@@ -180,6 +181,11 @@ struct bcm2835_desc {
 #define WIDE_DEST(x) (((x) & BCM2835_DMA_WIDE_DEST) ? \
 		      BCM2835_DMA_D_WIDTH : 0)
 
+/* A fake bit to request that the driver requires multi-beat burst */
+#define BCM2835_DMA_BURST BIT(30)
+#define BURST_LENGTH(x) (((x) & BCM2835_DMA_BURST) ? \
+			 BCM2835_DMA_BURST_LENGTH(3) : 0)
+
 /* debug register bits */
 #define BCM2835_DMA_DEBUG_LAST_NOT_SET_ERR	BIT(0)
 #define BCM2835_DMA_DEBUG_FIFO_ERR		BIT(1)
@@ -282,7 +288,7 @@ struct bcm2835_desc {
 /* the max dma length for different channels */
 #define MAX_DMA40_LEN SZ_1G
 
-#define BCM2711_DMA40_BURST_LEN(x)	((min(x, 16) - 1) << 8)
+#define BCM2711_DMA40_BURST_LEN(x)	(((x) & 15) << 8)
 #define BCM2711_DMA40_INC		BIT(12)
 #define BCM2711_DMA40_SIZE_32		(0 << 13)
 #define BCM2711_DMA40_SIZE_64		(1 << 13)
@@ -359,12 +365,16 @@ static inline uint32_t to_bcm2711_ti(uint32_t info)
 
 static inline uint32_t to_bcm2711_srci(uint32_t info)
 {
-	return ((info & BCM2835_DMA_S_INC) ? BCM2711_DMA40_INC : 0);
+	return ((info & BCM2835_DMA_S_INC) ? BCM2711_DMA40_INC : 0) |
+	       ((info & BCM2835_DMA_S_WIDTH) ? BCM2711_DMA40_SIZE_128 : 0) |
+	       BCM2711_DMA40_BURST_LEN(BCM2835_DMA_GET_BURST_LENGTH(info));
 }
 
 static inline uint32_t to_bcm2711_dsti(uint32_t info)
 {
-	return ((info & BCM2835_DMA_D_INC) ? BCM2711_DMA40_INC : 0);
+	return ((info & BCM2835_DMA_D_INC) ? BCM2711_DMA40_INC : 0) |
+	       ((info & BCM2835_DMA_D_WIDTH) ? BCM2711_DMA40_SIZE_128 : 0) |
+	       BCM2711_DMA40_BURST_LEN(BCM2835_DMA_GET_BURST_LENGTH(info));
 }
 
 static inline uint32_t to_bcm2711_cbaddr(dma_addr_t addr)
@@ -933,7 +943,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
 	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	struct bcm2835_desc *d;
 	u32 info = BCM2835_DMA_D_INC | BCM2835_DMA_S_INC |
-		   WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
+		   WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) |
+		   WIDE_DEST(c->dreq) | BURST_LENGTH(c->dreq);
 	u32 extra = BCM2835_DMA_INT_EN;
 	size_t max_len = bcm2835_dma_max_frame_length(c);
 	size_t frames;
@@ -964,8 +975,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
 	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	struct bcm2835_desc *d;
 	dma_addr_t src = 0, dst = 0;
-	u32 info = WAIT_RESP(c->dreq) |
-		   WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
+	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) |
+		   WIDE_DEST(c->dreq) | BURST_LENGTH(c->dreq);
 	u32 extra = BCM2835_DMA_INT_EN;
 	size_t frames;
 
@@ -1017,7 +1028,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
 	struct bcm2835_desc *d;
 	dma_addr_t src, dst;
-	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
+	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) |
+		   WIDE_DEST(c->dreq) | BURST_LENGTH(c->dreq);
 	u32 extra = 0;
 	size_t max_len = bcm2835_dma_max_frame_length(c);
 	size_t frames;
-- 
2.41.0
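
The burst handling in this patch goes through two steps: the fake dreq bit 30 selects a legacy burst length field of 3, and to_bcm2711_srci() then re-packs that field for the 40-bit control block. A stand-alone sketch of that path is below; the value used for the 128-bit size field is an assumption (only SIZE_32 and SIZE_64 are visible in the quoted context) and the function names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)				(1u << (n))

/* Legacy transfer-information fields. */
#define DMA_S_INC			BIT(8)
#define DMA_S_WIDTH			BIT(9)
#define DMA_BURST_LENGTH(x)		(((x) & 15u) << 12)
#define DMA_GET_BURST_LENGTH(x)		(((x) >> 12) & 15u)

/* Fake dreq bit requesting a multi-beat burst. */
#define DMA_BURST			BIT(30)

/* 40-bit SRCI/DSTI fields. SIZE_128 is an assumed value. */
#define DMA40_BURST_LEN(x)		(((x) & 15u) << 8)
#define DMA40_INC			BIT(12)
#define DMA40_SIZE_128			(2u << 13)

/* Step 1: fake dreq bit -> legacy burst length field. */
static uint32_t burst_from_dreq(uint32_t dreq)
{
	return (dreq & DMA_BURST) ? DMA_BURST_LENGTH(3) : 0;
}

/* Step 2: legacy info word -> 40-bit source info word. */
static uint32_t info_to_srci(uint32_t info)
{
	return ((info & DMA_S_INC) ? DMA40_INC : 0) |
	       ((info & DMA_S_WIDTH) ? DMA40_SIZE_128 : 0) |
	       DMA40_BURST_LEN(DMA_GET_BURST_LENGTH(info));
}

/* Worked values. */
void burst_selftest(void)
{
	uint32_t info = DMA_S_INC | DMA_S_WIDTH | burst_from_dreq(DMA_BURST);

	assert(burst_from_dreq(0) == 0);
	assert(burst_from_dreq(DMA_BURST) == 0x3000u);
	assert(info_to_srci(DMA_S_INC) == 0x1000u);
	assert(info_to_srci(info) == 0x5300u);
}
```

The destination side follows the same pattern with the D_INC and D_WIDTH bits.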



* [PATCH 08/12] bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (6 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 07/12] bcm2835-dma: Support dma flags for multi-beat burst Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-04  6:59 ` [PATCH 09/12] dmaengine: bcm2835: Add BCM2712 support Andrea della Porta
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Andrea della Porta

From: Dom Cobley <popcornmix@gmail.com>

Resetting them to zero puts the DMA channel into secure mode, which
makes further accesses impossible.

Cc: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
 drivers/dma/bcm2835-dma.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index a20700a400a2..1b3f470274b2 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -239,6 +239,8 @@ struct bcm2835_desc {
 #define BCM2711_DMA40_WR_PAUSED		BIT(5)  /* Writing is paused */
 #define BCM2711_DMA40_DREQ_PAUSED	BIT(6)  /* Is paused by DREQ flow control */
 #define BCM2711_DMA40_WAITING_FOR_WRITES BIT(7)  /* Waiting for last write */
+/* we always want to run in supervisor mode */
+#define BCM2711_DMA40_PROT		(BIT(8) | BIT(9))
 #define BCM2711_DMA40_ERR		BIT(10)
 #define BCM2711_DMA40_QOS(x)		(((x) & 0x1f) << 16)
 #define BCM2711_DMA40_PANIC_QOS(x)	(((x) & 0x1f) << 20)
@@ -246,10 +248,10 @@ struct bcm2835_desc {
 #define BCM2711_DMA40_DISDEBUG		BIT(29)
 #define BCM2711_DMA40_ABORT		BIT(30)
 #define BCM2711_DMA40_HALT		BIT(31)
-#define BCM2711_DMA40_CS_FLAGS(x) ((x) & (BCM2711_DMA40_QOS(15) | \
-					BCM2711_DMA40_PANIC_QOS(15) | \
-					BCM2711_DMA40_WAIT_FOR_WRITES |	\
-					BCM2711_DMA40_DISDEBUG))
+#define BCM2711_DMA40_CS_FLAGS(x)	((x) & (BCM2711_DMA40_QOS(15) | \
+					 BCM2711_DMA40_PANIC_QOS(15) | \
+					 BCM2711_DMA40_WAIT_FOR_WRITES | \
+					 BCM2711_DMA40_DISDEBUG))
 
 /* Transfer information bits */
 #define BCM2711_DMA40_INTEN		BIT(0)
@@ -679,7 +681,7 @@ static void bcm2835_dma_abort(struct bcm2835_chan *c)
 			dev_err(c->vc.chan.device->dev,
 				"failed to halt dma\n");
 
-		writel(0, chan_base + BCM2711_DMA40_CS);
+		writel(BCM2711_DMA40_PROT, chan_base + BCM2711_DMA40_CS);
 		writel(0, chan_base + BCM2711_DMA40_CB);
 	} else {
 		/*
@@ -739,7 +741,7 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
 	if (c->is_40bit_channel) {
 		writel(to_bcm2711_cbaddr(d->cb_list[0].paddr),
 		       c->chan_base + BCM2711_DMA40_CB);
-		writel(BCM2711_DMA40_ACTIVE | BCM2711_DMA40_CS_FLAGS(c->dreq),
+		writel(BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT | BCM2711_DMA40_CS_FLAGS(c->dreq),
 		       c->chan_base + BCM2711_DMA40_CS);
 	} else {
 		writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
@@ -772,8 +774,13 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
 	 * if this IRQ handler is threaded.) If the channel is finished, it
 	 * will remain idle despite the ACTIVE flag being set.
 	 */
-	writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
-	       c->chan_base + BCM2835_DMA_CS);
+	if (c->is_40bit_channel)
+		writel(BCM2835_DMA_INT | BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT |
+		       BCM2711_DMA40_CS_FLAGS(c->dreq),
+		       c->chan_base + BCM2711_DMA40_CS);
+	else
+		writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
+		       c->chan_base + BCM2835_DMA_CS);
 
 	d = c->desc;
 
@@ -1227,14 +1234,14 @@ void bcm2711_dma40_memcpy(dma_addr_t dst, dma_addr_t src, size_t size)
 	scb->next_cb = 0;
 
 	writel(to_bcm2711_cbaddr(memcpy_scb_dma), memcpy_chan + BCM2711_DMA40_CB);
-	writel(BCM2711_DMA40_MEMCPY_FLAGS + BCM2711_DMA40_ACTIVE,
+	writel(BCM2711_DMA40_MEMCPY_FLAGS | BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT,
 	       memcpy_chan + BCM2711_DMA40_CS);
 
 	/* Poll for completion */
 	while (!(readl(memcpy_chan + BCM2711_DMA40_CS) & BCM2711_DMA40_END))
 		cpu_relax();
 
-	writel(BCM2711_DMA40_END, memcpy_chan + BCM2711_DMA40_CS);
+	writel(BCM2711_DMA40_END | BCM2711_DMA40_PROT, memcpy_chan + BCM2711_DMA40_CS);
 
 	spin_unlock_irqrestore(&memcpy_lock, flags);
 }
-- 
2.41.0



* [PATCH 09/12] dmaengine: bcm2835: Add BCM2712 support
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (7 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 08/12] bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-04  6:59 ` [PATCH 10/12] dmaengine: bcm2835: Support DMA-Lite channels Andrea della Porta
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

From: Phil Elwell <phil@raspberrypi.com>

BCM2712 has six 40-bit channels, DMA6 to DMA11. Add a new compatible
string to indicate that the current platform is BCM2712.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
---
 drivers/dma/bcm2835-dma.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 1b3f470274b2..548cf7343d83 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -326,6 +326,12 @@ static const struct bcm2835_dma_cfg_data bcm2711_dma_cfg = {
 	.dma_mask = DMA_BIT_MASK(36),
 };
 
+static const struct bcm2835_dma_cfg_data bcm2712_dma_cfg = {
+	.chan_40bit_mask = BIT(6) | BIT(7) | BIT(8) | BIT(9) |
+				 BIT(10) | BIT(11),
+	.dma_mask = DMA_BIT_MASK(40),
+};
+
 static inline size_t bcm2835_dma_max_frame_length(struct bcm2835_chan *c)
 {
 	/* lite and normal channels have different max frame length */
@@ -1250,6 +1256,7 @@ EXPORT_SYMBOL(bcm2711_dma40_memcpy);
 static const struct of_device_id bcm2835_dma_of_match[] = {
 	{ .compatible = "brcm,bcm2835-dma", .data = &bcm2835_dma_cfg },
 	{ .compatible = "brcm,bcm2711-dma", .data = &bcm2711_dma_cfg },
+	{ .compatible = "brcm,bcm2712-dma", .data = &bcm2712_dma_cfg },
 	{},
 };
 MODULE_DEVICE_TABLE(of, bcm2835_dma_of_match);
-- 
2.41.0



* [PATCH 10/12] dmaengine: bcm2835: Support DMA-Lite channels
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (8 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 09/12] dmaengine: bcm2835: Add BCM2712 support Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-07  8:26   ` Vinod Koul
  2024-02-04  6:59 ` [PATCH 11/12] dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr Andrea della Porta
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Andrea della Porta

From: Maxime Ripard <maxime@cerno.tech>

The BCM2712 has a DMA-Lite controller that is basically a BCM2835-style
DMA controller that supports 40-bit DMA addresses.

We need it for HDMI audio to work.
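
As a rough illustration of what 40-bit addressing on a BCM2835-style
control block can look like, the sketch below keeps the low 32 bits of each
address in the usual src/dst words and packs the upper bits into the stride
word, following the layout used in the diff of this patch. The struct and
helper names (lite_cb, upper_bits, lite_cb_set_addrs) are hypothetical and
exist only for this example; it is a userspace model, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's upper_32_bits() helper. */
static inline uint32_t upper_bits(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Hypothetical, simplified control block: only the words used here. */
struct lite_cb {
	uint32_t src;    /* low 32 bits of the source address */
	uint32_t dst;    /* low 32 bits of the destination address */
	uint32_t stride; /* reused for the upper address bits */
};

/* Split two 40-bit addresses across src, dst and stride. */
static void lite_cb_set_addrs(struct lite_cb *cb, uint64_t src, uint64_t dst)
{
	/* Assumption for this sketch: both addresses fit in 40 bits. */
	assert((src >> 40) == 0 && (dst >> 40) == 0);

	cb->src = (uint32_t)src;
	cb->dst = (uint32_t)dst;
	/* dst bits 39:32 land in stride[15:8], src bits 39:32 in stride[7:0] */
	cb->stride = (upper_bits(dst) << 8) | upper_bits(src);
}
```

For example, a source of 0x1223456780 and a destination of 0x34abcdef00
would give src = 0x23456780, dst = 0xabcdef00 and stride = 0x3412.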

Cc: Maxime Ripard <maxime@cerno.tech>
Cc: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
 drivers/dma/bcm2835-dma.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 548cf7343d83..055c558caa0e 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -100,6 +100,7 @@ struct bcm2835_chan {
 
 	bool is_lite_channel;
 	bool is_40bit_channel;
+	bool is_2712;
 };
 
 struct bcm2835_desc {
@@ -545,7 +546,11 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
 			control_block->info = info;
 			control_block->src = src;
 			control_block->dst = dst;
-			control_block->stride = 0;
+			if (c->is_2712)
+				control_block->stride = (upper_32_bits(dst) << 8) |
+							upper_32_bits(src);
+			else
+				control_block->stride = 0;
 			control_block->next = 0;
 		}
 
@@ -570,7 +575,8 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
 			 d->cb_list[frame - 1].cb)->next_cb =
 				to_bcm2711_cbaddr(cb_entry->paddr);
 		if (frame && !c->is_40bit_channel)
-			d->cb_list[frame - 1].cb->next = cb_entry->paddr;
+			d->cb_list[frame - 1].cb->next = c->is_2712 ?
+			to_bcm2711_cbaddr(cb_entry->paddr) : cb_entry->paddr;
 
 		/* update src and dst and length */
 		if (src && (info & BCM2835_DMA_S_INC)) {
@@ -750,7 +756,10 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
 		writel(BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT | BCM2711_DMA40_CS_FLAGS(c->dreq),
 		       c->chan_base + BCM2711_DMA40_CS);
 	} else {
-		writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
+		writel(BIT(31), c->chan_base + BCM2835_DMA_CS);
+
+		writel(c->is_2712 ? to_bcm2711_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr,
+		       c->chan_base + BCM2835_DMA_ADDR);
 		writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
 		       c->chan_base + BCM2835_DMA_CS);
 	}
@@ -1119,7 +1128,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 		 d->cb_list[frames - 1].cb)->next_cb =
 			to_bcm2711_cbaddr(d->cb_list[0].paddr);
 	else
-		d->cb_list[d->frames - 1].cb->next = d->cb_list[0].paddr;
+		d->cb_list[d->frames - 1].cb->next = c->is_2712 ?
+		to_bcm2711_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr;
 
 	return vchan_tx_prep(&c->vc, &d->vd, flags);
 }
@@ -1186,6 +1196,8 @@ static int bcm2835_dma_chan_init(struct bcm2835_dmadev *d, int chan_id,
 	else if (readl(c->chan_base + BCM2835_DMA_DEBUG) &
 		 BCM2835_DMA_DEBUG_LITE)
 		c->is_lite_channel = true;
+	if (d->cfg_data->dma_mask == DMA_BIT_MASK(40))
+		c->is_2712 = true;
 
 	return 0;
 }
-- 
2.41.0



* [PATCH 11/12] dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (9 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 10/12] dmaengine: bcm2835: Support DMA-Lite channels Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-04  6:59 ` [PATCH 12/12] bcm2835-dma: Fixes for dma_abort Andrea della Porta
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

From: Dom Cobley <popcornmix@gmail.com>

As the shifted address also applies to bcm2712,
give the function a name that is not chip-specific.
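
The "shifted address" can be pictured with the small userspace sketch below.
It mirrors the helper shown in the diff: the control block address must be
32-byte aligned and is written to the hardware shifted right by 5 bits. The
name cbaddr_shifted is hypothetical, and assert() stands in for the
kernel's WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdint.h>

/* Control blocks are 32-byte aligned, so the low 5 bits carry no information. */
#define CB_ALIGN_SHIFT 5

/* Hypothetical userspace equivalent of the renamed helper. */
static inline uint32_t cbaddr_shifted(uint64_t addr)
{
	/* A misaligned address would lose bits in the shift below. */
	assert((addr & ((1u << CB_ALIGN_SHIFT) - 1)) == 0);

	return (uint32_t)(addr >> CB_ALIGN_SHIFT);
}
```

With this encoding an address such as 0x400000020 becomes the register
value 0x20000001, so a 32-bit register can hold control block addresses
that do not fit in 32 bits.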

Signed-off-by: Dom Cobley <popcornmix@gmail.com>
---
 drivers/dma/bcm2835-dma.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 055c558caa0e..40df0a165992 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -386,7 +386,7 @@ static inline uint32_t to_bcm2711_dsti(uint32_t info)
 	       BCM2711_DMA40_BURST_LEN(BCM2835_DMA_GET_BURST_LENGTH(info));
 }
 
-static inline uint32_t to_bcm2711_cbaddr(dma_addr_t addr)
+static inline uint32_t to_40bit_cbaddr(dma_addr_t addr)
 {
 	WARN_ON_ONCE(addr & 0x1f);
 	return (addr >> 5);
@@ -573,10 +573,10 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
 		if (frame && c->is_40bit_channel)
 			((struct bcm2711_dma40_scb *)
 			 d->cb_list[frame - 1].cb)->next_cb =
-				to_bcm2711_cbaddr(cb_entry->paddr);
+				to_40bit_cbaddr(cb_entry->paddr);
 		if (frame && !c->is_40bit_channel)
 			d->cb_list[frame - 1].cb->next = c->is_2712 ?
-			to_bcm2711_cbaddr(cb_entry->paddr) : cb_entry->paddr;
+			to_40bit_cbaddr(cb_entry->paddr) : cb_entry->paddr;
 
 		/* update src and dst and length */
 		if (src && (info & BCM2835_DMA_S_INC)) {
@@ -751,14 +751,14 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
 	c->desc = d = to_bcm2835_dma_desc(&vd->tx);
 
 	if (c->is_40bit_channel) {
-		writel(to_bcm2711_cbaddr(d->cb_list[0].paddr),
+		writel(to_40bit_cbaddr(d->cb_list[0].paddr),
 		       c->chan_base + BCM2711_DMA40_CB);
 		writel(BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT | BCM2711_DMA40_CS_FLAGS(c->dreq),
 		       c->chan_base + BCM2711_DMA40_CS);
 	} else {
 		writel(BIT(31), c->chan_base + BCM2835_DMA_CS);
 
-		writel(c->is_2712 ? to_bcm2711_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr,
+		writel(c->is_2712 ? to_40bit_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr,
 		       c->chan_base + BCM2835_DMA_ADDR);
 		writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
 		       c->chan_base + BCM2835_DMA_CS);
@@ -1126,10 +1126,10 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
 	if (c->is_40bit_channel)
 		((struct bcm2711_dma40_scb *)
 		 d->cb_list[frames - 1].cb)->next_cb =
-			to_bcm2711_cbaddr(d->cb_list[0].paddr);
+			to_40bit_cbaddr(d->cb_list[0].paddr);
 	else
 		d->cb_list[d->frames - 1].cb->next = c->is_2712 ?
-		to_bcm2711_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr;
+		to_40bit_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr;
 
 	return vchan_tx_prep(&c->vc, &d->vd, flags);
 }
@@ -1251,7 +1251,7 @@ void bcm2711_dma40_memcpy(dma_addr_t dst, dma_addr_t src, size_t size)
 	scb->len = size;
 	scb->next_cb = 0;
 
-	writel(to_bcm2711_cbaddr(memcpy_scb_dma), memcpy_chan + BCM2711_DMA40_CB);
+	writel(to_40bit_cbaddr(memcpy_scb_dma), memcpy_chan + BCM2711_DMA40_CB);
 	writel(BCM2711_DMA40_MEMCPY_FLAGS | BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT,
 	       memcpy_chan + BCM2711_DMA40_CS);
 
-- 
2.41.0



* [PATCH 12/12] bcm2835-dma: Fixes for dma_abort
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (10 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 11/12] dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr Andrea della Porta
@ 2024-02-04  6:59 ` Andrea della Porta
  2024-02-05 19:06 ` [PATCH 00/12] Add support for BCM2712 DMA engine Stefan Wahren
  2024-02-07  8:19 ` Vinod Koul
  13 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-04  6:59 UTC (permalink / raw)
  To: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell, Andrea della Porta

From: Dom Cobley <popcornmix@gmail.com>

There is a problem with the current abort scheme
when DMA is blocked on a DREQ, which prevents halting.

This is triggered by the SPI driver, which aborts DMA
in this state and so causes a halt timeout.

Discussion with Broadcom suggests the sequence:

CS.ACTIVE=0
while (CS.OUTSTANDING_TRANSACTIONS != 0)
  wait()
DEBUG.RESET=1

should be safe on a dma40 channel.

Unfortunately the non-dma40 channels don't have
OUTSTANDING_TRANSACTIONS, so we need a more
complicated scheme.

We attempt to abort the channel, which will work
if there is no blocked DREQ.

If it times out, we can assume there is no AXI
transfer in progress and reset anyway.

The length of the timeout is observed at ~20us.
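
The dma40 sequence above (clear ACTIVE, wait with a bounded timeout for
outstanding transactions to drain, then reset regardless) can be sketched
as a userspace model. Everything here is an assumption made for
illustration: struct fake_chan, read_cs() and abort_chan() are hypothetical,
the register is simulated in memory, and only the bit-25 position is taken
from the BCM2711_DMA40_TRANSACTIONS define in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CS_ACTIVE       (1u << 0)   /* illustrative position */
#define CS_TRANSACTIONS (1u << 25)  /* outstanding transactions pending */

/* Hypothetical in-memory model of one channel. */
struct fake_chan {
	uint32_t cs;     /* simulated control/status register */
	int busy_reads;  /* reads for which TRANSACTIONS stays set */
	bool was_reset;  /* set once the reset step has run */
};

/* Simulated register read: the busy bit clears after busy_reads reads. */
static uint32_t read_cs(struct fake_chan *c)
{
	if (c->busy_reads > 0)
		c->busy_reads--;
	else
		c->cs &= ~CS_TRANSACTIONS;
	return c->cs;
}

/* Returns true if the channel drained before the timeout expired. */
static bool abort_chan(struct fake_chan *c, long timeout)
{
	assert(timeout > 0);

	/* Step 1: pause the channel. */
	c->cs &= ~CS_ACTIVE;

	/* Step 2: bounded wait for outstanding transactions to complete. */
	while ((read_cs(c) & CS_TRANSACTIONS) && --timeout)
		;

	/* Step 3: reset whether or not the wait succeeded. */
	c->cs = 0;
	c->was_reset = true;

	return timeout != 0;
}
```

In this model a channel that stays busy for three reads drains well within
a timeout of 100, while one that never drains makes abort_chan() return
false but is still reset, which is the behaviour the description relies on
for a blocked DREQ.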

Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
 drivers/dma/bcm2835-dma.c | 72 +++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 40df0a165992..5751d1c6ff94 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -245,6 +245,7 @@ struct bcm2835_desc {
 #define BCM2711_DMA40_ERR		BIT(10)
 #define BCM2711_DMA40_QOS(x)		(((x) & 0x1f) << 16)
 #define BCM2711_DMA40_PANIC_QOS(x)	(((x) & 0x1f) << 20)
+#define BCM2711_DMA40_TRANSACTIONS	BIT(25)
 #define BCM2711_DMA40_WAIT_FOR_WRITES	BIT(28)
 #define BCM2711_DMA40_DISDEBUG		BIT(29)
 #define BCM2711_DMA40_ABORT		BIT(30)
@@ -671,30 +672,37 @@ static void bcm2835_dma_fill_cb_chain_with_sg(
 static void bcm2835_dma_abort(struct bcm2835_chan *c)
 {
 	void __iomem *chan_base = c->chan_base;
-	long int timeout = 10000;
-
-	/*
-	 * A zero control block address means the channel is idle.
-	 * (The ACTIVE flag in the CS register is not a reliable indicator.)
-	 */
-	if (!readl(chan_base + BCM2835_DMA_ADDR))
-		return;
+	long timeout = 100;
 
 	if (c->is_40bit_channel) {
-		/* Halt the current DMA */
-		writel(readl(chan_base + BCM2711_DMA40_CS) | BCM2711_DMA40_HALT,
+		/*
+		 * A zero control block address means the channel is idle.
+		 * (The ACTIVE flag in the CS register is not a reliable indicator.)
+		 */
+		if (!readl(chan_base + BCM2711_DMA40_CB))
+			return;
+
+		/* Pause the current DMA */
+		writel(readl(chan_base + BCM2711_DMA40_CS) & ~BCM2711_DMA40_ACTIVE,
 		       chan_base + BCM2711_DMA40_CS);
 
-		while ((readl(chan_base + BCM2711_DMA40_CS) & BCM2711_DMA40_HALT) && --timeout)
+		/* wait for outstanding transactions to complete */
+		while ((readl(chan_base + BCM2711_DMA40_CS) & BCM2711_DMA40_TRANSACTIONS) &&
+		       --timeout)
 			cpu_relax();
 
-		/* Peripheral might be stuck and fail to halt */
+		/* Peripheral might be stuck and fail to complete */
 		if (!timeout)
 			dev_err(c->vc.chan.device->dev,
-				"failed to halt dma\n");
+				"failed to complete pause on dma %d (CS:%08x)\n", c->ch,
+				readl(chan_base + BCM2711_DMA40_CS));
 
+		/* Set CS back to default state */
 		writel(BCM2711_DMA40_PROT, chan_base + BCM2711_DMA40_CS);
-		writel(0, chan_base + BCM2711_DMA40_CB);
+
+		/* Reset the DMA */
+		writel(readl(chan_base + BCM2711_DMA40_DEBUG) | BCM2711_DMA40_DEBUG_RESET,
+		       chan_base + BCM2711_DMA40_DEBUG);
 	} else {
 		/*
 		 * A zero control block address means the channel is idle.
@@ -703,20 +711,6 @@ static void bcm2835_dma_abort(struct bcm2835_chan *c)
 		if (!readl(chan_base + BCM2835_DMA_ADDR))
 			return;
 
-		/* Write 0 to the active bit - Pause the DMA */
-		writel(readl(chan_base + BCM2835_DMA_CS) & ~BCM2835_DMA_ACTIVE,
-		       chan_base + BCM2835_DMA_CS);
-
-		/* wait for DMA to be paused */
-		while ((readl(chan_base + BCM2835_DMA_CS) & BCM2835_DMA_WAITING_FOR_WRITES) &&
-		       --timeout)
-			cpu_relax();
-
-		/* Peripheral might be stuck and fail to signal AXI write responses */
-		if (!timeout)
-			dev_err(c->vc.chan.device->dev,
-				"failed to pause dma\n");
-
 		/* We need to clear the next DMA block pending */
 		writel(0, chan_base + BCM2835_DMA_NEXTCB);
 
@@ -724,15 +718,27 @@ static void bcm2835_dma_abort(struct bcm2835_chan *c)
 		writel(readl(chan_base + BCM2835_DMA_CS) | BCM2835_DMA_ABORT | BCM2835_DMA_ACTIVE,
 		       chan_base + BCM2835_DMA_CS);
 
-		/* wait for DMA to have been aborted */
-		timeout = 10000;
+		/* wait for DMA to be aborted */
 		while ((readl(chan_base + BCM2835_DMA_CS) & BCM2835_DMA_ABORT) && --timeout)
 			cpu_relax();
 
-		/* Peripheral might be stuck and fail to signal AXI write responses */
-		if (!timeout)
+		/* Write 0 to the active bit - Pause the DMA */
+		writel(readl(chan_base + BCM2835_DMA_CS) & ~BCM2835_DMA_ACTIVE,
+		       chan_base + BCM2835_DMA_CS);
+
+		/*
+		 * Peripheral might be stuck and fail to complete
+		 * This is expected when dreqs are enabled but not asserted
+		 * so only report error in non dreq case
+		 */
+		if (!timeout && !(readl(chan_base + BCM2835_DMA_TI) &
+		   (BCM2835_DMA_S_DREQ | BCM2835_DMA_D_DREQ)))
 			dev_err(c->vc.chan.device->dev,
-				"failed to abort dma\n");
+				"failed to complete pause on dma %d (CS:%08x)\n", c->ch,
+				readl(chan_base + BCM2835_DMA_CS));
+
+		/* Set CS back to default state and reset the DMA */
+		writel(BCM2835_DMA_RESET, chan_base + BCM2835_DMA_CS);
 	}
 }
 
-- 
2.41.0



* Re: [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant
  2024-02-04  6:59 ` [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant Andrea della Porta
@ 2024-02-04 17:04   ` Florian Fainelli
  2024-02-05 10:25     ` Andrea della Porta
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Fainelli @ 2024-02-04 17:04 UTC (permalink / raw)
  To: Andrea della Porta, Vinod Koul, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell



On 2/3/2024 10:59 PM, 'Andrea della Porta' via 
BCM-KERNEL-FEEDBACK-LIST,PDL wrote:
> From: Maxime Ripard <maxime@cerno.tech>
> 
> bcm2711_dma40_memcpy has some code strictly equivalent to the
> to_bcm2711_cbaddr() function. Let's use it instead.
> 
> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

Where is the full patch series?
-- 
Florian


* Re: [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant
  2024-02-04 17:04   ` Florian Fainelli
@ 2024-02-05 10:25     ` Andrea della Porta
  0 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-05 10:25 UTC (permalink / raw)
  To: florian.fainelli
  Cc: andrea.porta, bcm-kernel-feedback-list, dmaengine,
	linux-arm-kernel, linux-kernel, linux-rpi-kernel, maxime, phil,
	popcornmix, rjui, sbranden, vkoul

>> From: Maxime Ripard <maxime@cerno.tech>
>> 
>> bcm2711_dma40_memcpy has some code strictly equivalent to the
>> to_bcm2711_cbaddr() function. Let's use it instead.
>> 
>> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

>Where is the full patch series?

Hi Florian,
sorry, what do you mean by 'where is the full patch series', exactly?


* Re: [PATCH 04/12] bcm2835-dma: Advertise the full DMA range
  2024-02-04  6:59 ` [PATCH 04/12] bcm2835-dma: Advertise the full DMA range Andrea della Porta
@ 2024-02-05 17:55   ` Robin Murphy
  2024-03-01 13:55     ` Andrea della Porta
  2024-02-05 18:25   ` Stefan Wahren
  1 sibling, 1 reply; 28+ messages in thread
From: Robin Murphy @ 2024-02-05 17:55 UTC (permalink / raw)
  To: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

On 2024-02-04 6:59 am, Andrea della Porta wrote:
> From: Phil Elwell <phil@raspberrypi.com>
> 
> Unless the DMA mask is set wider than 32 bits, DMA mapping will use a
> bounce buffer.
> 
> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
> ---
>   drivers/dma/bcm2835-dma.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index 36bad198b655..237dcdb8d726 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -39,6 +39,7 @@
>   #define BCM2711_DMA_MEMCPY_CHAN 14
>   
>   struct bcm2835_dma_cfg_data {
> +	u64	dma_mask;
>   	u32	chan_40bit_mask;
>   };
>   
> @@ -308,10 +309,12 @@ DEFINE_SPINLOCK(memcpy_lock);
>   
>   static const struct bcm2835_dma_cfg_data bcm2835_dma_cfg = {
>   	.chan_40bit_mask = 0,
> +	.dma_mask = DMA_BIT_MASK(32),
>   };
>   
>   static const struct bcm2835_dma_cfg_data bcm2711_dma_cfg = {
>   	.chan_40bit_mask = BIT(11) | BIT(12) | BIT(13) | BIT(14),
> +	.dma_mask = DMA_BIT_MASK(36),
>   };
>   
>   static inline size_t bcm2835_dma_max_frame_length(struct bcm2835_chan *c)
> @@ -1263,6 +1266,8 @@ static struct dma_chan *bcm2835_dma_xlate(struct of_phandle_args *spec,
>   
>   static int bcm2835_dma_probe(struct platform_device *pdev)
>   {
> +	const struct bcm2835_dma_cfg_data *cfg_data;
> +	const struct of_device_id *of_id;
>   	struct bcm2835_dmadev *od;
>   	struct resource *res;
>   	void __iomem *base;
> @@ -1272,13 +1277,20 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
>   	int irq_flags;
>   	uint32_t chans_available;
>   	char chan_name[BCM2835_DMA_CHAN_NAME_SIZE];
> -	const struct of_device_id *of_id;
>   	int chan_count, chan_start, chan_end;
>   
> +	of_id = of_match_node(bcm2835_dma_of_match, pdev->dev.of_node);
> +	if (!of_id) {
> +		dev_err(&pdev->dev, "Failed to match compatible string\n");
> +		return -EINVAL;
> +	}
> +
> +	cfg_data = of_id->data;

We've had of_device_get_match_data() for nearly 9 years now, and even a 
generic device_get_match_data() for 6 ;)

> +
>   	if (!pdev->dev.dma_mask)
>   		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;

[ Passing nit: that also really shouldn't be there, especially since 
cdfee5623290 ]

>   
> -	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> +	rc = dma_set_mask_and_coherent(&pdev->dev, cfg_data->dma_mask);

Wait, does chan_40bit_mask mean that you still have some channels which 
*can't* address this full mask? If so this can't work properly. You may 
well need to redesign a bit further to have a separate DMA device for 
each channel such that they can each have different masks.
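
To make the concern concrete, here is a minimal userspace sketch of why a
single device-wide mask cannot describe channels with different addressing
limits. BIT_MASK_N restates the kernel's DMA_BIT_MASK() macro; addr_fits()
is a hypothetical helper for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace restatement of the kernel's DMA_BIT_MASK(n). */
#define BIT_MASK_N(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can an engine limited to 'bits' reach 'addr'? */
static bool addr_fits(uint64_t addr, unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);

	return (addr & ~BIT_MASK_N(bits)) == 0;
}
```

A buffer at 0x100000000 (4 GiB) passes addr_fits(addr, 36) but fails
addr_fits(addr, 32), so once the device advertises the wider mask, a
mapping may hand out an address that a 32-bit-only channel cannot reach.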

Thanks,
Robin.

>   	if (rc) {
>   		dev_err(&pdev->dev, "Unable to set DMA mask\n");
>   		return rc;
> @@ -1342,7 +1354,7 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
>   		return -EINVAL;
>   	}
>   
> -	od->cfg_data = of_id->data;
> +	od->cfg_data = cfg_data;
>   
>   	/* Request DMA channel mask from device tree */
>   	if (of_property_read_u32(pdev->dev.of_node,


* Re: [PATCH 05/12] bcm2835-dma: Derive slave DMA addresses correctly
  2024-02-04  6:59 ` [PATCH 05/12] bcm2835-dma: Derive slave DMA addresses correctly Andrea della Porta
@ 2024-02-05 18:03   ` Robin Murphy
  0 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2024-02-05 18:03 UTC (permalink / raw)
  To: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel,
	iommu@lists.linux.dev
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

On 2024-02-04 6:59 am, Andrea della Porta wrote:
> From: Phil Elwell <phil@raspberrypi.com>
> 
> Slave addresses for DMA are meant to be supplied as physical addresses
> (contrary to what struct snd_dmaengine_dai_dma_data does). It is up to
> the DMA controller driver to perform the translation based on its own
> view of the world, as described in Device Tree.
> 
> Now that the Pi Device Trees have the correct peripheral mappings,
> replace the hacky address munging with phys_to_dma().
> 
> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
> ---
>   drivers/dma/bcm2835-dma.c | 23 +++++------------------
>   1 file changed, 5 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index 237dcdb8d726..077812eda609 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -18,6 +18,7 @@
>    *	Copyright 2012 Marvell International Ltd.
>    */
>   #include <linux/dmaengine.h>
> +#include <linux/dma-direct.h>

Please read the comment at the top of that file; this driver is 
definitely not a DMA API implementation, and should not be including it.

>   #include <linux/dma-mapping.h>
>   #include <linux/dmapool.h>
>   #include <linux/err.h>
> @@ -980,22 +981,12 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
>   	if (direction == DMA_DEV_TO_MEM) {
>   		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
>   			return NULL;
> -		src = c->cfg.src_addr;
> -		/*
> -		 * One would think it ought to be possible to get the physical
> -		 * to dma address mapping information from the dma-ranges DT
> -		 * property, but I've not found a way yet that doesn't involve
> -		 * open-coding the whole thing.
> -		 */
> -		if (c->is_40bit_channel)
> -			src |= 0x400000000ull;
> +		src = phys_to_dma(chan->device->dev, c->cfg.src_addr);

FWIW I'd argue that abusing DMA API internals like this is even more 
hacky than bypassing it entirely. The appropriate public API for setting 
up the device end of a transfer is dma_map_resource(). Now, it *is* the 
case currently that the dma-direct implementation of that does not take 
dma_range_map into account, but that's already an open question:

https://lore.kernel.org/linux-iommu/20220610080802.11147-1-Sergey.Semin@baikalelectronics.ru/

Thanks,
Robin.

>   		info |= BCM2835_DMA_S_DREQ | BCM2835_DMA_D_INC;
>   	} else {
>   		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
>   			return NULL;
> -		dst = c->cfg.dst_addr;
> -		if (c->is_40bit_channel)
> -			dst |= 0x400000000ull;
> +		dst = phys_to_dma(chan->device->dev, c->cfg.dst_addr);
>   		info |= BCM2835_DMA_D_DREQ | BCM2835_DMA_S_INC;
>   	}
>   
> @@ -1064,17 +1055,13 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>   	if (direction == DMA_DEV_TO_MEM) {
>   		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
>   			return NULL;
> -		src = c->cfg.src_addr;
> -		if (c->is_40bit_channel)
> -			src |= 0x400000000ull;
> +		src = phys_to_dma(chan->device->dev, c->cfg.src_addr);
>   		dst = buf_addr;
>   		info |= BCM2835_DMA_S_DREQ | BCM2835_DMA_D_INC;
>   	} else {
>   		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
>   			return NULL;
> -		dst = c->cfg.dst_addr;
> -		if (c->is_40bit_channel)
> -			dst |= 0x400000000ull;
> +		dst = phys_to_dma(chan->device->dev, c->cfg.dst_addr);
>   		src = buf_addr;
>   		info |= BCM2835_DMA_D_DREQ | BCM2835_DMA_S_INC;
>   


* Re: [PATCH 04/12] bcm2835-dma: Advertise the full DMA range
  2024-02-04  6:59 ` [PATCH 04/12] bcm2835-dma: Advertise the full DMA range Andrea della Porta
  2024-02-05 17:55   ` Robin Murphy
@ 2024-02-05 18:25   ` Stefan Wahren
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Wahren @ 2024-02-05 18:25 UTC (permalink / raw)
  To: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

Hi Andrea,

Am 04.02.24 um 07:59 schrieb Andrea della Porta:
> From: Phil Elwell <phil@raspberrypi.com>
>
> Unless the DMA mask is set wider than 32 bits, DMA mapping will use a
> bounce buffer.
>
> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
this lacks your Signed-off-by

Regards


* Re: [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support
  2024-02-04  6:59 ` [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support Andrea della Porta
@ 2024-02-05 18:50   ` Stefan Wahren
  2024-02-06 16:31     ` Dave Stevenson
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Wahren @ 2024-02-05 18:50 UTC (permalink / raw)
  To: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel, Dave Stevenson
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

Hi Andrea,

[add Dave]

Am 04.02.24 um 07:59 schrieb Andrea della Porta:
> From: Phil Elwell <phil@raspberrypi.org>
>
> BCM2711 has 4 DMA channels with a 40-bit address range, allowing them
> to access the full 4GB of memory on a Pi 4.
>
> Cc: Phil Elwell <phil@raspberrypi.org>
> Cc: Maxime Ripard <maxime@cerno.tech>
> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Mainlining isn't as simple as just sending the downstream patches to
the mailing list. In many cases there are reasons why this hasn't been
upstreamed yet.

In my opinion this feature alone is worth a separate patch series. In
2021 I already sent an initial version, which tried to implement it in a
cleaner and more maintainable way [1]. In the meantime Dave Stevenson from
Raspberry Pi wrote that he also wanted to work on this. Maybe you want
to work on this together?

[1] -
https://lore.kernel.org/linux-arm-kernel/13ec386b-2305-27da-9765-8fa3ad71146c@i2se.com/T/


* Re: [PATCH 00/12] Add support for BCM2712 DMA engine
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (11 preceding siblings ...)
  2024-02-04  6:59 ` [PATCH 12/12] bcm2835-dma: Fixes for dma_abort Andrea della Porta
@ 2024-02-05 19:06 ` Stefan Wahren
  2024-02-07  8:19 ` Vinod Koul
  13 siblings, 0 replies; 28+ messages in thread
From: Stefan Wahren @ 2024-02-05 19:06 UTC (permalink / raw)
  To: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel
  Cc: Maxime Ripard, Dom Cobley, Phil Elwell

Hi Andrea,

Am 04.02.24 um 07:59 schrieb Andrea della Porta:
> This patchset aims to update the dma engine for BCM* chipset with respect
> to current advancements in downstream vendor tree. In particular:
>
> * Added support for BCM2712 DMA.
> * Extended DMA addressing to 40 bit. Since BCM2711 also supports 40 bit addressing,
> it will also benefit from the update.
> * Handled the devicetree node from vendor dts (e.g. "dma40").
>
> The only difference between the application of this patch and the relative code
> in vendor tree is the dropping of channel reservation for BCM2708 DMA legacy
> driver, that seems to have not made its way to upstream anyway, and it's
> probably used only from deprecated subsystems.
>
> Compile tested and runtime tested on RPi4B only.
sorry but this is not sufficient. AFAIK only the Raspberry Pi 5 has a
BCM2712. I suggest starting with BCM2711 40-bit support, which is enough
work.

This whole series contains neither a change to the dt-bindings nor
to the DTS files. This is not how it works.

Best regards
>
> Dom Cobley (4):
>    bcm2835-dma: Support dma flags for multi-beat burst
>    bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller
>    dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr
>    bcm2835-dma: Fixes for dma_abort
>
> Maxime Ripard (2):
>    dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant
>    dmaengine: bcm2835: Support DMA-Lite channels
>
> Phil Elwell (6):
>    bcm2835-dma: Add support for per-channel flags
>    bcm2835-dma: Add proper 40-bit DMA support
>    bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag
>    bcm2835-dma: Advertise the full DMA range
>    bcm2835-dma: Derive slave DMA addresses correctly
>    dmaengine: bcm2835: Add BCM2712 support
>
>   drivers/dma/bcm2835-dma.c | 701 ++++++++++++++++++++++++++++++++------
>   1 file changed, 588 insertions(+), 113 deletions(-)
>



* Re: [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support
  2024-02-05 18:50   ` Stefan Wahren
@ 2024-02-06 16:31     ` Dave Stevenson
  2024-02-06 18:08       ` Stefan Wahren
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Stevenson @ 2024-02-06 16:31 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel, Maxime Ripard,
	Dom Cobley, Phil Elwell

Hi Stefan and Andrea

On Mon, 5 Feb 2024 at 18:50, Stefan Wahren <wahrenst@gmx.net> wrote:
>
> Hi Andrea,
>
> [add Dave]
>
> Am 04.02.24 um 07:59 schrieb Andrea della Porta:
> > From: Phil Elwell <phil@raspberrypi.org>
> >
> > BCM2711 has 4 DMA channels with a 40-bit address range, allowing them
> > to access the full 4GB of memory on a Pi 4.
> >
> > Cc: Phil Elwell <phil@raspberrypi.org>
> > Cc: Maxime Ripard <maxime@cerno.tech>
> > Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> Mainlining isn't as simple as just sending the downstream patches to
> the mailing list. In many cases there are reasons why this hasn't been
> upstreamed yet.
>
> In my opinion this feature alone is worth a separate patch series. In
> 2021 I already sent an initial version, which tried to implement it in a
> cleaner and more maintainable way [1]. In the meantime Dave Stevenson from
> Raspberry Pi wrote that he also wanted to work on this. Maybe you want
> to work on this together?

Yes, I'm looking at reworking Stefan's series to work on Pi4 & Pi5 as
it's needed for HDMI audio (and other things) on those platforms which
I'm working to upstream.

I was getting weirdness from the sdhci block when I was last looking
at it, so it was just proving a little trickier than first thought.
Hopefully I'll get some time on it in the next couple of weeks.

  Dave

> [1] -
> https://lore.kernel.org/linux-arm-kernel/13ec386b-2305-27da-9765-8fa3ad71146c@i2se.com/T/

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support
  2024-02-06 16:31     ` Dave Stevenson
@ 2024-02-06 18:08       ` Stefan Wahren
  2024-02-06 18:11         ` Stefan Wahren
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Wahren @ 2024-02-06 18:08 UTC (permalink / raw)
  To: Dave Stevenson, Andrea della Porta
  Cc: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Maxime Ripard, Dom Cobley,
	Phil Elwell

Hi Dave and Andrea,

Am 06.02.24 um 17:31 schrieb Dave Stevenson:
> Hi Stefan and Andrea
>
> On Mon, 5 Feb 2024 at 18:50, Stefan Wahren <wahrenst@gmx.net> wrote:
>> Hi Andrea,
>>
>> [add Dave]
>>
>> Am 04.02.24 um 07:59 schrieb Andrea della Porta:
>>> From: Phil Elwell <phil@raspberrypi.org>
>>>
>>> BCM2711 has 4 DMA channels with a 40-bit address range, allowing them
>>> to access the full 4GB of memory on a Pi 4.
>>>
>>> Cc: Phil Elwell <phil@raspberrypi.org>
>>> Cc: Maxime Ripard <maxime@cerno.tech>
>>> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
>> Mainlining isn't as simple as just sending the downstream patches to
>> the mailing list. In many cases there are reasons why this hasn't been
>> upstreamed yet.
>>
>> In my opinion this feature alone is worth a separate patch series. In
>> 2021 I already sent an initial version, which tried to implement it in a
>> cleaner and more maintainable way [1]. In the meantime Dave Stevenson from
>> Raspberry Pi wrote that he also wanted to work on this. Maybe you want
>> to work on this together?
> Yes, I'm looking at reworking Stefan's series to work on Pi4 & Pi5 as
> it's needed for HDMI audio (and other things) on those platforms which
> I'm working to upstream.
>
> I was getting weirdness from the sdhci block when I was last looking
> at it, so it was just proving a little trickier than first thought.
> Hopefully I'll get some time on it in the next couple of weeks.
I must confess that my series was just a draft to see whether the general
approach would be accepted. Yes, it's possible that there are issues :-(

Maybe I can help you a little bit by taking care of the first two patches
(node name fix & YAML conversion)?

Regards
>    Dave
>
>> [1] -
>> https://lore.kernel.org/linux-arm-kernel/13ec386b-2305-27da-9765-8fa3ad71146c@i2se.com/T/



* Re: [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support
  2024-02-06 18:08       ` Stefan Wahren
@ 2024-02-06 18:11         ` Stefan Wahren
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Wahren @ 2024-02-06 18:11 UTC (permalink / raw)
  To: Dave Stevenson, Andrea della Porta
  Cc: Vinod Koul, Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Maxime Ripard, Dom Cobley,
	Phil Elwell

Am 06.02.24 um 19:08 schrieb Stefan Wahren:
> Hi Dave and Andrea,
>
> Am 06.02.24 um 17:31 schrieb Dave Stevenson:
>> Hi Stefan and Andrea
>>
>> On Mon, 5 Feb 2024 at 18:50, Stefan Wahren <wahrenst@gmx.net> wrote:
>>> Hi Andrea,
>>>
>>> [add Dave]
>>>
>>> Am 04.02.24 um 07:59 schrieb Andrea della Porta:
>>>> From: Phil Elwell <phil@raspberrypi.org>
>>>>
>>>> BCM2711 has 4 DMA channels with a 40-bit address range, allowing them
>>>> to access the full 4GB of memory on a Pi 4.
>>>>
>>>> Cc: Phil Elwell <phil@raspberrypi.org>
>>>> Cc: Maxime Ripard <maxime@cerno.tech>
>>>> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
>>> Mainlining isn't as simple as just sending the downstream patches to
>>> the mailing list. In many cases there are reasons why this hasn't been
>>> upstreamed yet.
>>>
>>> In my opinion this feature alone is worth a separate patch series. In
>>> 2021 I already sent an initial version, which tried to implement it in a
>>> cleaner and more maintainable way [1]. In the meantime Dave Stevenson from
>>> Raspberry Pi wrote that he also wanted to work on this. Maybe you want
>>> to work on this together?
>> Yes, I'm looking at reworking Stefan's series to work on Pi4 & Pi5 as
>> it's needed for HDMI audio (and other things) on those platforms which
>> I'm working to upstream.
>>
>> I was getting weirdness from the sdhci block when I was last looking
>> at it, so it was just proving a little trickier than first thought.
>> Hopefully I'll get some time on it in the next couple of weeks.
> I must confess that my series was just a draft to see whether the general
> approach would be accepted. Yes, it's possible that there are issues :-(
>
> Maybe I can help you a little bit by taking care of the first two patches
> (node name fix & YAML conversion)?
Forget about this, it's already done.
>
> Regards
>>    Dave
>>
>>> [1] -
>>> https://lore.kernel.org/linux-arm-kernel/13ec386b-2305-27da-9765-8fa3ad71146c@i2se.com/T/
>>>
>



* Re: [PATCH 00/12] Add support for BCM2712 DMA engine
  2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
                   ` (12 preceding siblings ...)
  2024-02-05 19:06 ` [PATCH 00/12] Add support for BCM2712 DMA engine Stefan Wahren
@ 2024-02-07  8:19 ` Vinod Koul
  2024-02-07 10:24   ` Andrea della Porta
  13 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2024-02-07  8:19 UTC (permalink / raw)
  To: Andrea della Porta
  Cc: Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Maxime Ripard, Dom Cobley,
	Phil Elwell

On 04-02-24, 07:59, Andrea della Porta wrote:
> This patchset aims to update the dma engine for BCM* chipset with respect
> to current advancements in downstream vendor tree. In particular:
> 
> * Added support for BCM2712 DMA.
> * Extended DMA addressing to 40 bit. Since BCM2711 also supports 40 bit addressing,
> it will also benefit from the update.
> * Handled the devicetree node from vendor dts (e.g. "dma40").
> 
> The only difference between the application of this patch and the relative code
> in vendor tree is the dropping of channel reservation for BCM2708 DMA legacy
> driver, that seems to have not made its way to upstream anyway, and it's
> probably used only from deprecated subsystems.
> 
> Compile tested and runtime tested on RPi4B only.
> 
> Dom Cobley (4):
>   bcm2835-dma: Support dma flags for multi-beat burst
>   bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller
>   dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr
>   bcm2835-dma: Fixes for dma_abort
> 
> Maxime Ripard (2):
>   dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant
>   dmaengine: bcm2835: Support DMA-Lite channels
> 
> Phil Elwell (6):
>   bcm2835-dma: Add support for per-channel flags
>   bcm2835-dma: Add proper 40-bit DMA support
>   bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag
>   bcm2835-dma: Advertise the full DMA range
>   bcm2835-dma: Derive slave DMA addresses correctly
>   dmaengine: bcm2835: Add BCM2712 support
> 
>  drivers/dma/bcm2835-dma.c | 701 ++++++++++++++++++++++++++++++++------
>  1 file changed, 588 insertions(+), 113 deletions(-)

Every patch modifies the same file, yet you use two different tags for the patches. Why?

Consistency is a good thing, right...

-- 
~Vinod


* Re: [PATCH 07/12] bcm2835-dma: Support dma flags for multi-beat burst
  2024-02-04  6:59 ` [PATCH 07/12] bcm2835-dma: Support dma flags for multi-beat burst Andrea della Porta
@ 2024-02-07  8:22   ` Vinod Koul
  0 siblings, 0 replies; 28+ messages in thread
From: Vinod Koul @ 2024-02-07  8:22 UTC (permalink / raw)
  To: Andrea della Porta
  Cc: Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Maxime Ripard, Dom Cobley,
	Phil Elwell

On 04-02-24, 07:59, Andrea della Porta wrote:
> From: Dom Cobley <popcornmix@gmail.com>
> 
> Add a control bit to enable a multi-beat burst on a DMA.
> This improves DMA performance and is required for HDMI audio.
> 
> Signed-off-by: Dom Cobley <popcornmix@gmail.com>
> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> ---
>  drivers/dma/bcm2835-dma.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index d8d1f9ba2572..a20700a400a2 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -156,7 +156,8 @@ struct bcm2835_desc {
>  #define BCM2835_DMA_S_WIDTH	BIT(9) /* 128bit writes if set */
>  #define BCM2835_DMA_S_DREQ	BIT(10) /* enable SREQ for source */
>  #define BCM2835_DMA_S_IGNORE	BIT(11) /* ignore source reads - read 0 */
> -#define BCM2835_DMA_BURST_LENGTH(x) ((x & 15) << 12)
> +#define BCM2835_DMA_BURST_LENGTH(x) (((x) & 15) << 12)

Why does this change? It sounds like it does not belong in this patch.


> +#define BCM2835_DMA_GET_BURST_LENGTH(x) (((x) >> 12) & 15)
>  #define BCM2835_DMA_CS_FLAGS(x) ((x) & (BCM2835_DMA_PRIORITY(15) | \
>  				      BCM2835_DMA_PANIC_PRIORITY(15) | \
>  				      BCM2835_DMA_WAIT_FOR_WRITES | \
> @@ -180,6 +181,11 @@ struct bcm2835_desc {
>  #define WIDE_DEST(x) (((x) & BCM2835_DMA_WIDE_DEST) ? \
>  		      BCM2835_DMA_D_WIDTH : 0)
>  
> +/* A fake bit to request that the driver requires multi-beat burst */
> +#define BCM2835_DMA_BURST BIT(30)
> +#define BURST_LENGTH(x) (((x) & BCM2835_DMA_BURST) ? \
> +			 BCM2835_DMA_BURST_LENGTH(3) : 0)
> +
>  /* debug register bits */
>  #define BCM2835_DMA_DEBUG_LAST_NOT_SET_ERR	BIT(0)
>  #define BCM2835_DMA_DEBUG_FIFO_ERR		BIT(1)
> @@ -282,7 +288,7 @@ struct bcm2835_desc {
>  /* the max dma length for different channels */
>  #define MAX_DMA40_LEN SZ_1G
>  
> -#define BCM2711_DMA40_BURST_LEN(x)	((min(x, 16) - 1) << 8)
> +#define BCM2711_DMA40_BURST_LEN(x)	(((x) & 15) << 8)
>  #define BCM2711_DMA40_INC		BIT(12)
>  #define BCM2711_DMA40_SIZE_32		(0 << 13)
>  #define BCM2711_DMA40_SIZE_64		(1 << 13)
> @@ -359,12 +365,16 @@ static inline uint32_t to_bcm2711_ti(uint32_t info)
>  
>  static inline uint32_t to_bcm2711_srci(uint32_t info)
>  {
> -	return ((info & BCM2835_DMA_S_INC) ? BCM2711_DMA40_INC : 0);
> +	return ((info & BCM2835_DMA_S_INC) ? BCM2711_DMA40_INC : 0) |
> +	       ((info & BCM2835_DMA_S_WIDTH) ? BCM2711_DMA40_SIZE_128 : 0) |
> +	       BCM2711_DMA40_BURST_LEN(BCM2835_DMA_GET_BURST_LENGTH(info));
>  }
>  
>  static inline uint32_t to_bcm2711_dsti(uint32_t info)
>  {
> -	return ((info & BCM2835_DMA_D_INC) ? BCM2711_DMA40_INC : 0);
> +	return ((info & BCM2835_DMA_D_INC) ? BCM2711_DMA40_INC : 0) |
> +	       ((info & BCM2835_DMA_D_WIDTH) ? BCM2711_DMA40_SIZE_128 : 0) |
> +	       BCM2711_DMA40_BURST_LEN(BCM2835_DMA_GET_BURST_LENGTH(info));
>  }
>  
>  static inline uint32_t to_bcm2711_cbaddr(dma_addr_t addr)
> @@ -933,7 +943,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	u32 info = BCM2835_DMA_D_INC | BCM2835_DMA_S_INC |
> -		   WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
> +		   WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) |
> +		   WIDE_DEST(c->dreq) | BURST_LENGTH(c->dreq);
>  	u32 extra = BCM2835_DMA_INT_EN;
>  	size_t max_len = bcm2835_dma_max_frame_length(c);
>  	size_t frames;
> @@ -964,8 +975,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	dma_addr_t src = 0, dst = 0;
> -	u32 info = WAIT_RESP(c->dreq) |
> -		   WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
> +	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) |
> +		   WIDE_DEST(c->dreq) | BURST_LENGTH(c->dreq);
>  	u32 extra = BCM2835_DMA_INT_EN;
>  	size_t frames;
>  
> @@ -1017,7 +1028,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	dma_addr_t src, dst;
> -	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) | WIDE_DEST(c->dreq);
> +	u32 info = WAIT_RESP(c->dreq) | WIDE_SOURCE(c->dreq) |
> +		   WIDE_DEST(c->dreq) | BURST_LENGTH(c->dreq);
>  	u32 extra = 0;
>  	size_t max_len = bcm2835_dma_max_frame_length(c);
>  	size_t frames;
> -- 
> 2.41.0

-- 
~Vinod


* Re: [PATCH 10/12] dmaengine: bcm2835: Support DMA-Lite channels
  2024-02-04  6:59 ` [PATCH 10/12] dmaengine: bcm2835: Support DMA-Lite channels Andrea della Porta
@ 2024-02-07  8:26   ` Vinod Koul
  0 siblings, 0 replies; 28+ messages in thread
From: Vinod Koul @ 2024-02-07  8:26 UTC (permalink / raw)
  To: Andrea della Porta
  Cc: Florian Fainelli, Ray Jui, Scott Branden,
	Broadcom internal kernel review list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Maxime Ripard, Dom Cobley,
	Phil Elwell

On 04-02-24, 07:59, Andrea della Porta wrote:
> From: Maxime Ripard <maxime@cerno.tech>
> 
> The BCM2712 has a DMA-Lite controller that is basically a BCM2835-style
> DMA controller that supports 40 bits DMA addresses.
> 
> We need it for HDMI audio to work.
> 
> Cc: Maxime Ripard <maxime@cerno.tech>
> Cc: Dom Cobley <popcornmix@gmail.com>
> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> ---
>  drivers/dma/bcm2835-dma.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index 548cf7343d83..055c558caa0e 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -100,6 +100,7 @@ struct bcm2835_chan {
>  
>  	bool is_lite_channel;
>  	bool is_40bit_channel;
> +	bool is_2712;

Why not use is_40bit_channel? Also, this could apply to more SoCs, so
make it a generic flag if you can't reuse that one.

>  };
>  
>  struct bcm2835_desc {
> @@ -545,7 +546,11 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  			control_block->info = info;
>  			control_block->src = src;
>  			control_block->dst = dst;
> -			control_block->stride = 0;
> +			if (c->is_2712)
> +				control_block->stride = (upper_32_bits(dst) << 8) |
> +							upper_32_bits(src);
> +			else
> +				control_block->stride = 0;
>  			control_block->next = 0;
>  		}
>  
> @@ -570,7 +575,8 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  			 d->cb_list[frame - 1].cb)->next_cb =
>  				to_bcm2711_cbaddr(cb_entry->paddr);
>  		if (frame && !c->is_40bit_channel)
> -			d->cb_list[frame - 1].cb->next = cb_entry->paddr;
> +			d->cb_list[frame - 1].cb->next = c->is_2712 ?
> +			to_bcm2711_cbaddr(cb_entry->paddr) : cb_entry->paddr;
>  
>  		/* update src and dst and length */
>  		if (src && (info & BCM2835_DMA_S_INC)) {
> @@ -750,7 +756,10 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
>  		writel(BCM2711_DMA40_ACTIVE | BCM2711_DMA40_PROT | BCM2711_DMA40_CS_FLAGS(c->dreq),
>  		       c->chan_base + BCM2711_DMA40_CS);
>  	} else {
> -		writel(d->cb_list[0].paddr, c->chan_base + BCM2835_DMA_ADDR);
> +		writel(BIT(31), c->chan_base + BCM2835_DMA_CS);
> +
> +		writel(c->is_2712 ? to_bcm2711_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr,
> +		       c->chan_base + BCM2835_DMA_ADDR);
>  		writel(BCM2835_DMA_ACTIVE | BCM2835_DMA_CS_FLAGS(c->dreq),
>  		       c->chan_base + BCM2835_DMA_CS);
>  	}
> @@ -1119,7 +1128,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  		 d->cb_list[frames - 1].cb)->next_cb =
>  			to_bcm2711_cbaddr(d->cb_list[0].paddr);
>  	else
> -		d->cb_list[d->frames - 1].cb->next = d->cb_list[0].paddr;
> +		d->cb_list[d->frames - 1].cb->next = c->is_2712 ?
> +		to_bcm2711_cbaddr(d->cb_list[0].paddr) : d->cb_list[0].paddr;
>  
>  	return vchan_tx_prep(&c->vc, &d->vd, flags);
>  }
> @@ -1186,6 +1196,8 @@ static int bcm2835_dma_chan_init(struct bcm2835_dmadev *d, int chan_id,
>  	else if (readl(c->chan_base + BCM2835_DMA_DEBUG) &
>  		 BCM2835_DMA_DEBUG_LITE)
>  		c->is_lite_channel = true;
> +	if (d->cfg_data->dma_mask == DMA_BIT_MASK(40))
> +		c->is_2712 = true;
>  
>  	return 0;
>  }
> -- 
> 2.41.0

-- 
~Vinod


* Re: [PATCH 00/12] Add support for BCM2712 DMA engine
  2024-02-07  8:19 ` Vinod Koul
@ 2024-02-07 10:24   ` Andrea della Porta
  0 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-02-07 10:24 UTC (permalink / raw)
  To: vkoul
  Cc: andrea.porta, bcm-kernel-feedback-list, dmaengine,
	florian.fainelli, linux-arm-kernel, linux-kernel,
	linux-rpi-kernel, maxime, phil, popcornmix, rjui, sbranden

I was strongly inclined to unify all tags, but then I realized that
keeping the patch subjects as they are could make it simpler for anyone
working on this to grep for them and map each patch to its commit in
the vendor tree. But I see the point and I agree with you, so the next
version of the series will use 'dmaengine: bcm2835:'.

Thanks


* Re: [PATCH 04/12] bcm2835-dma: Advertise the full DMA range
  2024-02-05 17:55   ` Robin Murphy
@ 2024-03-01 13:55     ` Andrea della Porta
  0 siblings, 0 replies; 28+ messages in thread
From: Andrea della Porta @ 2024-03-01 13:55 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andrea della Porta, Vinod Koul, Florian Fainelli, Ray Jui,
	Scott Branden, Broadcom internal kernel review list, dmaengine,
	linux-rpi-kernel, linux-arm-kernel, linux-kernel, Maxime Ripard,
	Dom Cobley, Phil Elwell

On 17:55 Mon 05 Feb     , Robin Murphy wrote:
> On 2024-02-04 6:59 am, Andrea della Porta wrote:
> > From: Phil Elwell <phil@raspberrypi.com>
> > 
> > Unless the DMA mask is set wider than 32 bits, DMA mapping will use a
> > bounce buffer.
> > 
> > Signed-off-by: Phil Elwell <phil@raspberrypi.com>
> > ---
> >   drivers/dma/bcm2835-dma.c | 18 +++++++++++++++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> > index 36bad198b655..237dcdb8d726 100644
> > --- a/drivers/dma/bcm2835-dma.c
> > +++ b/drivers/dma/bcm2835-dma.c
> > @@ -39,6 +39,7 @@
> >   #define BCM2711_DMA_MEMCPY_CHAN 14
> >   struct bcm2835_dma_cfg_data {
> > +	u64	dma_mask;
> >   	u32	chan_40bit_mask;
> >   };
> > @@ -308,10 +309,12 @@ DEFINE_SPINLOCK(memcpy_lock);
> >   static const struct bcm2835_dma_cfg_data bcm2835_dma_cfg = {
> >   	.chan_40bit_mask = 0,
> > +	.dma_mask = DMA_BIT_MASK(32),
> >   };
> >   static const struct bcm2835_dma_cfg_data bcm2711_dma_cfg = {
> >   	.chan_40bit_mask = BIT(11) | BIT(12) | BIT(13) | BIT(14),
> > +	.dma_mask = DMA_BIT_MASK(36),
> >   };
> >   static inline size_t bcm2835_dma_max_frame_length(struct bcm2835_chan *c)
> > @@ -1263,6 +1266,8 @@ static struct dma_chan *bcm2835_dma_xlate(struct of_phandle_args *spec,
> >   static int bcm2835_dma_probe(struct platform_device *pdev)
> >   {
> > +	const struct bcm2835_dma_cfg_data *cfg_data;
> > +	const struct of_device_id *of_id;
> >   	struct bcm2835_dmadev *od;
> >   	struct resource *res;
> >   	void __iomem *base;
> > @@ -1272,13 +1277,20 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
> >   	int irq_flags;
> >   	uint32_t chans_available;
> >   	char chan_name[BCM2835_DMA_CHAN_NAME_SIZE];
> > -	const struct of_device_id *of_id;
> >   	int chan_count, chan_start, chan_end;
> > +	of_id = of_match_node(bcm2835_dma_of_match, pdev->dev.of_node);
> > +	if (!of_id) {
> > +		dev_err(&pdev->dev, "Failed to match compatible string\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	cfg_data = of_id->data;
> 
> We've had of_device_get_match_data() for nearly 9 years now, and even a
> generic device_get_match_data() for 6 ;)
> 
> > +
> >   	if (!pdev->dev.dma_mask)
> >   		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
> 
> [ Passing nit: that also really shouldn't be there, especially since
> cdfee5623290 ]
> 
> > -	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> > +	rc = dma_set_mask_and_coherent(&pdev->dev, cfg_data->dma_mask);
> 
> Wait, does chan_40bit_mask mean that you still have some channels which
> *can't* address this full mask? If so this can't work properly. You may well
> need to redesign a bit further to have a separate DMA device for each
> channel such they can each have different masks.
>

It seems that the original intention here was to create a separate device for each dma_mask value
in the hw descriptors. That is, for 2711, which has both 32-bit and 40-bit channels, the DT should
look something like this:

	dma: dma-controller@7e007000 {
		interrupts = <...>;
		brcm,dma-channel-mask = <0x7f5>;
		compatible = "brcm,bcm2835-dma";
		interrupt-names = "...";
		reg = <0x7e007000 0xb00>;
		#dma-cells = <0x01>;
	};

	dma40: dma-controller@7e007b00 {
		interrupts = <...>;
		brcm,dma-channel-mask = <0x3000>;
		compatible = "brcm,bcm2711-dma";
		interrupt-names = "...";
		reg = <0x00 0x7e007b00 0x00 0x400>;
		#dma-cells = <0x01>;
	};

Two devices, dma0 and dma1, will be created, each one serving a different mask, so the call
to dma_set_mask_and_coherent(..., dma_mask) on each specific device will be consistent. Please
note that "brcm,dma-channel-mask" from the DT only describes which channels are available
for use by the kernel, while the dma_mask parameter of the aforementioned
dma_set_mask_and_coherent() call is the addressing mask enforced by the driver, and it is fixed
for each specific device (dma0 or dma1).

Many thanks,
Andrea


Thread overview: 28+ messages
2024-02-04  6:59 [PATCH 00/12] Add support for BCM2712 DMA engine Andrea della Porta
2024-02-04  6:59 ` [PATCH 01/12] bcm2835-dma: Add support for per-channel flags Andrea della Porta
2024-02-04  6:59 ` [PATCH 02/12] bcm2835-dma: Add proper 40-bit DMA support Andrea della Porta
2024-02-05 18:50   ` Stefan Wahren
2024-02-06 16:31     ` Dave Stevenson
2024-02-06 18:08       ` Stefan Wahren
2024-02-06 18:11         ` Stefan Wahren
2024-02-04  6:59 ` [PATCH 03/12] bcm2835-dma: Add NO_WAIT_RESP, DMA_WIDE_SOURCE and DMA_WIDE_DEST flag Andrea della Porta
2024-02-04  6:59 ` [PATCH 04/12] bcm2835-dma: Advertise the full DMA range Andrea della Porta
2024-02-05 17:55   ` Robin Murphy
2024-03-01 13:55     ` Andrea della Porta
2024-02-05 18:25   ` Stefan Wahren
2024-02-04  6:59 ` [PATCH 05/12] bcm2835-dma: Derive slave DMA addresses correctly Andrea della Porta
2024-02-05 18:03   ` Robin Murphy
2024-02-04  6:59 ` [PATCH 06/12] dmaengine: bcm2835: Use to_bcm2711_cbaddr where relevant Andrea della Porta
2024-02-04 17:04   ` Florian Fainelli
2024-02-05 10:25     ` Andrea della Porta
2024-02-04  6:59 ` [PATCH 07/12] bcm2835-dma: Support dma flags for multi-beat burst Andrea della Porta
2024-02-07  8:22   ` Vinod Koul
2024-02-04  6:59 ` [PATCH 08/12] bcm2835-dma: Need to keep PROT bits set in CS on 40bit controller Andrea della Porta
2024-02-04  6:59 ` [PATCH 09/12] dmaengine: bcm2835: Add BCM2712 support Andrea della Porta
2024-02-04  6:59 ` [PATCH 10/12] dmaengine: bcm2835: Support DMA-Lite channels Andrea della Porta
2024-02-07  8:26   ` Vinod Koul
2024-02-04  6:59 ` [PATCH 11/12] dmaengine: bcm2835: Rename to_bcm2711_cbaddr to to_40bit_cbaddr Andrea della Porta
2024-02-04  6:59 ` [PATCH 12/12] bcm2835-dma: Fixes for dma_abort Andrea della Porta
2024-02-05 19:06 ` [PATCH 00/12] Add support for BCM2712 DMA engine Stefan Wahren
2024-02-07  8:19 ` Vinod Koul
2024-02-07 10:24   ` Andrea della Porta
