[PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements

DMA Engine development
 help / color / mirror / Atom feed

* [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements
@ 2026-05-18 12:36 Benoît Monin
  2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin
  2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin
  0 siblings, 2 replies; 5+ messages in thread
From: Benoît Monin @ 2026-05-18 12:36 UTC (permalink / raw)
  To: Frank Li, Vinod Koul
  Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
	Benoît Monin

This series adds support for scatter/gather DMA transfers via dma_vec
and dynamic descriptor chaining to the Freescale eDMA controller driver.

The first patch implements the .device_prep_peripheral_dma_vec() callback,
enabling the DMA engine to accept an array of dma_vec structures. This
callback supports both regular and cyclic transfer modes.

The second patch introduces dynamic scatter/gather chaining, which allows
multiple DMA descriptors to be linked together without stopping the channel.
This optimization eliminates idle periods when back-to-back transfers are
submitted, improving throughput and reducing latency. The implementation
carefully preserves cyclic transfer semantics and respects hardware
constraints on platforms with split register layouts.

I tested it on the i.MX93. The dynamic scatter/gather chaining should
work with other eDMA controller with split register layout.

Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
Changes in v4:
- To keep transactions in order, link DMA transaction to the end of
  submitted list first, only lookup the issued list is the submitted
  list is empty.
- Link to v3: https://patch.msgid.link/20260511-fsl-edma-dyn-sg-v3-0-98a181775dae@bootlin.com

Changes in v3:
- Fix formatting errors reported by Frank Li.
- Add fsl_edma_tx_submit() to link the DMA transactions
  when they are submitted, not when they are prepared.
- Link to v2: https://patch.msgid.link/20260506-fsl-edma-dyn-sg-v2-0-66439cdd414e@bootlin.com

Changes in v2:
- Drop the RFC prefix, as asked by Frank Li
- No code change
- Link to v1: https://patch.msgid.link/20260430-fsl-edma-dyn-sg-v1-0-4e0ecbe2df66@bootlin.com

To: Frank Li <Frank.Li@nxp.com>
To: Vinod Koul <vkoul@kernel.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Frank Li <Frank.Li@kernel.org>
Cc: imx@lists.linux.dev
Cc: dmaengine@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

---
Benoît Monin (2):
      dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
      dmaengine: fsl-edma: Support dynamic scatter/gather chaining

 drivers/dma/fsl-edma-common.c | 197 ++++++++++++++++++++++++++++++++++++++++--
 drivers/dma/fsl-edma-common.h |   4 +
 drivers/dma/fsl-edma-main.c   |   2 +
 drivers/dma/fsl-edma-trace.h  |   5 ++
 4 files changed, 202 insertions(+), 6 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260428-fsl-edma-dyn-sg-960731e37da2

Best regards,
--  
Benoît Monin, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
  2026-05-18 12:36 [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements Benoît Monin
@ 2026-05-18 12:36 ` Benoît Monin
  2026-05-18 13:04   ` sashiko-bot
  2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin
  1 sibling, 1 reply; 5+ messages in thread
From: Benoît Monin @ 2026-05-18 12:36 UTC (permalink / raw)
  To: Frank Li, Vinod Koul
  Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
	Benoît Monin

Add implementation of .device_prep_peripheral_dma_vec() callback to setup
a scatter/gather DMA transfer from an array of dma_vec structures. Setup
a cyclic transfer if the DMA_PREP_REPEAT flag is set.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
 drivers/dma/fsl-edma-common.c | 109 ++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/fsl-edma-common.h |   4 ++
 drivers/dma/fsl-edma-main.c   |   2 +
 3 files changed, 115 insertions(+)

diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index bb7531c456df..c10190164926 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -673,6 +673,115 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
 }
 
+struct dma_async_tx_descriptor *
+fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *vecs,
+				 size_t nb, enum dma_transfer_direction direction,
+				 unsigned long flags)
+{
+	struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan);
+	dma_addr_t src_addr, dst_addr, last_sg;
+	struct fsl_edma_desc *fsl_desc;
+	u16 soff, doff, iter;
+	u32 nbytes;
+	int i;
+
+	if (!is_slave_direction(direction))
+		return NULL;
+
+	if (!fsl_edma_prep_slave_dma(fsl_chan, direction))
+		return NULL;
+
+	fsl_desc = fsl_edma_alloc_desc(fsl_chan, nb);
+	if (!fsl_desc)
+		return NULL;
+	fsl_desc->iscyclic = flags & DMA_PREP_REPEAT;
+	fsl_desc->dirn = direction;
+
+	if (direction == DMA_MEM_TO_DEV) {
+		if (!fsl_chan->cfg.src_addr_width)
+			fsl_chan->cfg.src_addr_width = fsl_chan->cfg.dst_addr_width;
+		fsl_chan->attr =
+			fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width,
+					      fsl_chan->cfg.dst_addr_width);
+		nbytes = fsl_chan->cfg.dst_addr_width * fsl_chan->cfg.dst_maxburst;
+	} else {
+		if (!fsl_chan->cfg.dst_addr_width)
+			fsl_chan->cfg.dst_addr_width = fsl_chan->cfg.src_addr_width;
+		fsl_chan->attr =
+			fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width,
+					      fsl_chan->cfg.dst_addr_width);
+		nbytes = fsl_chan->cfg.src_addr_width * fsl_chan->cfg.src_maxburst;
+	}
+
+	for (i = 0; i < nb; i++) {
+		if (direction == DMA_MEM_TO_DEV) {
+			src_addr = vecs[i].addr;
+			dst_addr = fsl_chan->dma_dev_addr;
+			soff = fsl_chan->cfg.dst_addr_width;
+			doff = 0;
+		} else if (direction == DMA_DEV_TO_MEM) {
+			src_addr = fsl_chan->dma_dev_addr;
+			dst_addr = vecs[i].addr;
+			soff = 0;
+			doff = fsl_chan->cfg.src_addr_width;
+		} else {
+			/* DMA_DEV_TO_DEV */
+			src_addr = fsl_chan->cfg.src_addr;
+			dst_addr = fsl_chan->cfg.dst_addr;
+			soff = 0;
+			doff = 0;
+		}
+
+		/*
+		 * Choose the suitable burst length if dma_vec length is not
+		 * multiple of burst length so that the whole transfer length is
+		 * multiple of minor loop(burst length).
+		 */
+		if (vecs[i].len % nbytes) {
+			u32 width = (direction == DMA_DEV_TO_MEM) ? doff : soff;
+			u32 burst = (direction == DMA_DEV_TO_MEM) ?
+						fsl_chan->cfg.src_maxburst :
+						fsl_chan->cfg.dst_maxburst;
+			int j;
+
+			for (j = burst; j > 1; j--) {
+				if (!(vecs[i].len % (j * width))) {
+					nbytes = j * width;
+					break;
+				}
+			}
+			/* Set burst size as 1 if there's no suitable one */
+			if (j == 1)
+				nbytes = width;
+		}
+
+		iter = vecs[i].len / nbytes;
+		if (i < nb - 1) {
+			last_sg = fsl_desc->tcd[(i + 1)].ptcd;
+			fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr,
+					  dst_addr, fsl_chan->attr, soff,
+					  nbytes, 0, iter, iter, doff, last_sg,
+					  false, false, true);
+		} else {
+			if (fsl_desc->iscyclic) {
+				last_sg = fsl_desc->tcd[0].ptcd;
+				fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr,
+						  dst_addr, fsl_chan->attr, soff,
+						  nbytes, 0, iter, iter, doff, last_sg,
+						  true, false, true);
+			} else {
+				last_sg = 0;
+				fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr,
+						  dst_addr, fsl_chan->attr, soff,
+						  nbytes, 0, iter, iter, doff, last_sg,
+						  true, true, false);
+			}
+		}
+	}
+
+	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
+}
+
 struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 		struct dma_chan *chan, struct scatterlist *sgl,
 		unsigned int sg_len, enum dma_transfer_direction direction,
diff --git a/drivers/dma/fsl-edma-common.h b/drivers/dma/fsl-edma-common.h
index 205a96489094..0d028048701d 100644
--- a/drivers/dma/fsl-edma-common.h
+++ b/drivers/dma/fsl-edma-common.h
@@ -496,6 +496,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 		struct dma_chan *chan, dma_addr_t dma_addr, size_t buf_len,
 		size_t period_len, enum dma_transfer_direction direction,
 		unsigned long flags);
+struct dma_async_tx_descriptor *fsl_edma_prep_peripheral_dma_vec(
+		struct dma_chan *chan, const struct dma_vec *vecs,
+		size_t nb, enum dma_transfer_direction direction,
+		unsigned long flags);
 struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 		struct dma_chan *chan, struct scatterlist *sgl,
 		unsigned int sg_len, enum dma_transfer_direction direction,
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
index 36155ab1602a..6693b4270a1a 100644
--- a/drivers/dma/fsl-edma-main.c
+++ b/drivers/dma/fsl-edma-main.c
@@ -841,6 +841,8 @@ static int fsl_edma_probe(struct platform_device *pdev)
 	fsl_edma->dma_dev.device_free_chan_resources
 		= fsl_edma_free_chan_resources;
 	fsl_edma->dma_dev.device_tx_status = fsl_edma_tx_status;
+	fsl_edma->dma_dev.device_prep_peripheral_dma_vec
+		= fsl_edma_prep_peripheral_dma_vec;
 	fsl_edma->dma_dev.device_prep_slave_sg = fsl_edma_prep_slave_sg;
 	fsl_edma->dma_dev.device_prep_dma_cyclic = fsl_edma_prep_dma_cyclic;
 	fsl_edma->dma_dev.device_prep_dma_memcpy = fsl_edma_prep_memcpy;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
  2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin
@ 2026-05-18 13:04   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-18 13:04 UTC (permalink / raw)
  To: Benoît Monin; +Cc: dmaengine, Frank.Li, imx, vkoul

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Multiple deterministic division by zero kernel panics in burst length and offset calculations.
- [High] Transfer lengths smaller than the bus width result in a 0 iteration count, causing a massive hardware memory overflow.
- [High] Incorrect memory address offsets (`soff` and `doff`) are used when source and destination bus widths differ, causing overlapping memory access.
--

commit f9b345d16d721fb5194b3218cae1f0adbbad0340
Author: Benoît Monin <benoit.monin@bootlin.com>

dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec

Add implementation of .device_prep_peripheral_dma_vec() callback to setup
a scatter/gather DMA transfer from an array of dma_vec structures. Setup
a cyclic transfer if the DMA_PREP_REPEAT flag is set.

> diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
> --- a/drivers/dma/fsl-edma-common.c
> +++ b/drivers/dma/fsl-edma-common.c
> @@ -673,6 +673,115 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
>  	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
>  }
>  
> +struct dma_async_tx_descriptor *
> +fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *vecs,
> +				 size_t nb, enum dma_transfer_direction direction,
> +				 unsigned long flags)
> +{
[ ... ]
> +	if (direction == DMA_MEM_TO_DEV) {
> +		if (!fsl_chan->cfg.src_addr_width)
> +			fsl_chan->cfg.src_addr_width = fsl_chan->cfg.dst_addr_width;
> +		fsl_chan->attr =
> +			fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width,
> +					      fsl_chan->cfg.dst_addr_width);
> +		nbytes = fsl_chan->cfg.dst_addr_width * fsl_chan->cfg.dst_maxburst;

This regression appears to be duplicated from the existing fsl_edma_prep_slave_sg
logic.

If a client leaves dst_addr_width or src_addr_width as 0 (undefined), does
this evaluate to nbytes = 0?

The driver relies on fsl_edma_get_tcd_attr() to handle undefined values:

static unsigned int fsl_edma_get_tcd_attr(...)
{
    if (src_addr_width == DMA_SLAVE_BUSWIDTH_UNDEFINED)
        src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
    ...
}

However, that function takes arguments by value and does not update the
fsl_chan->cfg structures. If nbytes evaluates to 0 here, it will lead to a
divide-by-zero panic shortly after in the modulo operation:

    if (vecs[i].len % nbytes)

> +	} else {
[ ... ]
> +	}
> +
> +	for (i = 0; i < nb; i++) {
> +		if (direction == DMA_MEM_TO_DEV) {
> +			src_addr = vecs[i].addr;
> +			dst_addr = fsl_chan->dma_dev_addr;
> +			soff = fsl_chan->cfg.dst_addr_width;
> +			doff = 0;
> +		} else if (direction == DMA_DEV_TO_MEM) {

Is it intentional that the source address offset (soff) is set to
dst_addr_width here?

For a DMA_MEM_TO_DEV transfer, memory is the source. The hardware will perform
reads sized according to src_addr_width (configured in the SSIZE attribute).
If a client configures the memory side with a 4-byte width but the peripheral
with a 1-byte width, the DMA engine will read 4 bytes per minor loop, but
increment the source address by only 1 byte.

Could this regression result in overlapping memory reads and corrupted data
being sent to the peripheral? (And conversely for DMA_DEV_TO_MEM where doff
is tied to src_addr_width).

> +			src_addr = fsl_chan->dma_dev_addr;
> +			dst_addr = vecs[i].addr;
> +			soff = 0;
> +			doff = fsl_chan->cfg.src_addr_width;
> +		} else {
> +			/* DMA_DEV_TO_DEV */
> +			src_addr = fsl_chan->cfg.src_addr;
> +			dst_addr = fsl_chan->cfg.dst_addr;
> +			soff = 0;
> +			doff = 0;
> +		}
> +
> +		/*
> +		 * Choose the suitable burst length if dma_vec length is not
> +		 * multiple of burst length so that the whole transfer length is
> +		 * multiple of minor loop(burst length).
> +		 */
> +		if (vecs[i].len % nbytes) {
> +			u32 width = (direction == DMA_DEV_TO_MEM) ? doff : soff;

If the direction is DMA_DEV_TO_DEV, both soff and doff are explicitly set to
0 above. This means width will evaluate to 0.

If vecs[i].len is not a multiple of nbytes, the fallback loop will then
evaluate:

    if (!(vecs[i].len % (j * width)))

Will this trigger a divide-by-zero panic?

Furthermore, if the loop finishes with j == 1, nbytes is set to width (0),
which leads to another divide-by-zero panic at iter = vecs[i].len / nbytes.

> +			u32 burst = (direction == DMA_DEV_TO_MEM) ?
> +						fsl_chan->cfg.src_maxburst :
> +						fsl_chan->cfg.dst_maxburst;
> +			int j;
> +
> +			for (j = burst; j > 1; j--) {
> +				if (!(vecs[i].len % (j * width))) {
> +					nbytes = j * width;
> +					break;
> +				}
> +			}
> +			/* Set burst size as 1 if there's no suitable one */
> +			if (j == 1)
> +				nbytes = width;
> +		}
> +
> +		iter = vecs[i].len / nbytes;

If a transfer length (vecs[i].len) is smaller than the bus width, the fallback
block above sets nbytes = width. The integer division iter = vecs[i].len /
nbytes will then truncate to 0.

Does writing 0 to the iteration count registers on Freescale eDMA hardware
cause the counter to decrement before checking for zero?

If so, programming the hardware's CITER and BITER registers with 0 could be
interpreted as the maximum possible iterations (e.g. 32768 or 65536). Instead
of safely rejecting the invalid sub-word transfer, could this regression
initiate a massive DMA transfer that silently accesses thousands of bytes past
the client's buffer boundaries?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-fsl-edma-dyn-sg-v4-0-8ce7d95b1ce9@bootlin.com?part=1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining
  2026-05-18 12:36 [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements Benoît Monin
  2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin
@ 2026-05-18 12:36 ` Benoît Monin
  2026-05-18 13:34   ` sashiko-bot
  1 sibling, 1 reply; 5+ messages in thread
From: Benoît Monin @ 2026-05-18 12:36 UTC (permalink / raw)
  To: Frank Li, Vinod Koul
  Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
	Benoît Monin

Implement dynamic linking of scatter/gather transfers to enable
chaining multiple DMA descriptors without stopping the channel.
This avoids waiting for the channel to go idle if there is another
transaction already issued.

Add fsl_edma_link_sg() to dynamically link the last TCD of a previously
submitted descriptor to the first TCD of a new descriptor by setting
the scatter/gather address and the E_SG flag, and keeping the channel
active by clearing the DREQ bit.

Linking is done when the transaction is submitted by fsl_edma_tx_submit().
To do so, the .tx_submit() callback is overridden for non-cyclic
transactions prepared by fsl_edma_prep_peripheral_dma_vec() and
fsl_edma_prep_slave_sg(). This ensures that transactions are linked
in the order they are submitted.

Update fsl_edma_xfer_desc() to avoid re-initializing the hardware when a
transfer is already in progress, allowing seamless chaining of descriptors.

Modify the transfer completion handler to check the DONE flag in the
channel CSR before marking the transfer complete. Since this flag is
only available on SoC with the split registers layout, we only link
transactions for DMA controllers flagged with FSL_EDMA_DRV_SPLIT_REG.

Add trace event for scatter/gather linking operations.

Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
 drivers/dma/fsl-edma-common.c | 90 +++++++++++++++++++++++++++++++++++++++----
 drivers/dma/fsl-edma-trace.h  |  5 +++
 2 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index c10190164926..6e5820051f29 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -58,7 +58,10 @@ void fsl_edma_tx_chan_handler(struct fsl_edma_chan *fsl_chan)
 		list_del(&fsl_chan->edesc->vdesc.node);
 		vchan_cookie_complete(&fsl_chan->edesc->vdesc);
 		fsl_chan->edesc = NULL;
-		fsl_chan->status = DMA_COMPLETE;
+		if (!(fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_SPLIT_REG) ||
+		    (edma_readl_chreg(fsl_chan, ch_csr) & EDMA_V3_CH_CSR_DONE)) {
+			fsl_chan->status = DMA_COMPLETE;
+		}
 	} else {
 		vchan_cyclic_callback(&fsl_chan->edesc->vdesc);
 	}
@@ -673,6 +676,68 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
 }
 
+static void fsl_edma_link_sg(struct fsl_edma_chan *fsl_chan, struct fsl_edma_desc *fsl_desc)
+{
+	u32 flags = fsl_edma_drvflags(fsl_chan);
+	struct fsl_edma_hw_tcd *last_tcd;
+	struct fsl_edma_desc *prev_desc;
+	struct virt_dma_desc *vdesc;
+	u16 csr;
+
+	lockdep_assert_held(&fsl_chan->vchan.lock);
+
+	if (!(flags & FSL_EDMA_DRV_SPLIT_REG))
+		return;
+
+	vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_submitted,
+					struct virt_dma_desc, node);
+	if (!vdesc)
+		vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_issued,
+						struct virt_dma_desc, node);
+	if (!vdesc)
+		return;
+
+	prev_desc = to_fsl_edma_desc(vdesc);
+	last_tcd = prev_desc->tcd[prev_desc->n_tcds - 1].vtcd;
+
+	csr = fsl_edma_get_tcd_to_cpu(fsl_chan, last_tcd, csr);
+	if (!(csr & EDMA_TCD_CSR_D_REQ))
+		return;
+
+	fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, fsl_desc->tcd[0].ptcd, dlast_sga);
+
+	csr &= ~EDMA_TCD_CSR_D_REQ;
+	csr |= EDMA_TCD_CSR_E_SG;
+	fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, csr, csr);
+
+	if (prev_desc == fsl_chan->edesc && prev_desc->n_tcds == 1) {
+		if (flags & FSL_EDMA_DRV_CLEAR_DONE_E_SG)
+			edma_writel_chreg(fsl_chan, edma_readl_chreg(fsl_chan, ch_csr), ch_csr);
+
+		edma_cp_tcd_to_reg(fsl_chan, last_tcd, dlast_sga);
+		edma_cp_tcd_to_reg(fsl_chan, last_tcd, csr);
+	}
+
+	trace_edma_link_sg(fsl_chan, last_tcd);
+}
+
+static dma_cookie_t fsl_edma_tx_submit(struct dma_async_tx_descriptor *tx)
+{
+	struct virt_dma_desc *vd = container_of(tx, struct virt_dma_desc, tx);
+	struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(tx->chan);
+	struct fsl_edma_desc *fsl_desc = to_fsl_edma_desc(vd);
+	struct virt_dma_chan *vc = to_virt_chan(tx->chan);
+	dma_cookie_t cookie;
+
+	guard(spinlock_irqsave)(&fsl_chan->vchan.lock);
+
+	fsl_edma_link_sg(fsl_chan, fsl_desc);
+	cookie = dma_cookie_assign(tx);
+	list_move_tail(&vd->node, &vc->desc_submitted);
+
+	return cookie;
+}
+
 struct dma_async_tx_descriptor *
 fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *vecs,
 				 size_t nb, enum dma_transfer_direction direction,
@@ -680,6 +745,7 @@ fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *ve
 {
 	struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan);
 	dma_addr_t src_addr, dst_addr, last_sg;
+	struct dma_async_tx_descriptor *tx;
 	struct fsl_edma_desc *fsl_desc;
 	u16 soff, doff, iter;
 	u32 nbytes;
@@ -779,7 +845,10 @@ fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *ve
 		}
 	}
 
-	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
+	tx = vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
+	if (!fsl_desc->iscyclic)
+		tx->tx_submit = fsl_edma_tx_submit;
+	return tx;
 }
 
 struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
@@ -788,9 +857,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 		unsigned long flags, void *context)
 {
 	struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan);
+	dma_addr_t src_addr, dst_addr, last_sg;
+	struct dma_async_tx_descriptor *tx;
 	struct fsl_edma_desc *fsl_desc;
 	struct scatterlist *sg;
-	dma_addr_t src_addr, dst_addr, last_sg;
 	u16 soff, doff, iter;
 	u32 nbytes;
 	int i;
@@ -882,7 +952,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 		}
 	}
 
-	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
+	tx = vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
+	tx->tx_submit = fsl_edma_tx_submit;
+
+	return tx;
 }
 
 struct dma_async_tx_descriptor *fsl_edma_prep_memcpy(struct dma_chan *chan,
@@ -924,9 +997,12 @@ void fsl_edma_xfer_desc(struct fsl_edma_chan *fsl_chan)
 	if (!vdesc)
 		return;
 	fsl_chan->edesc = to_fsl_edma_desc(vdesc);
-	fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd);
-	fsl_edma_enable_request(fsl_chan);
-	fsl_chan->status = DMA_IN_PROGRESS;
+
+	if (fsl_chan->status != DMA_IN_PROGRESS) {
+		fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd);
+		fsl_edma_enable_request(fsl_chan);
+		fsl_chan->status = DMA_IN_PROGRESS;
+	}
 }
 
 void fsl_edma_issue_pending(struct dma_chan *chan)
diff --git a/drivers/dma/fsl-edma-trace.h b/drivers/dma/fsl-edma-trace.h
index d3541301a247..ac319d2dbb90 100644
--- a/drivers/dma/fsl-edma-trace.h
+++ b/drivers/dma/fsl-edma-trace.h
@@ -119,6 +119,11 @@ DEFINE_EVENT(edma_log_tcd, edma_fill_tcd,
 	TP_ARGS(chan, tcd)
 );
 
+DEFINE_EVENT(edma_log_tcd, edma_link_sg,
+	     TP_PROTO(struct fsl_edma_chan *chan, void *tcd),
+	     TP_ARGS(chan, tcd)
+);
+
 #endif
 
 /* this part must be outside header guard */

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining
  2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin
@ 2026-05-18 13:34   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-18 13:34 UTC (permalink / raw)
  To: Benoît Monin; +Cc: dmaengine, Frank.Li, vkoul, imx

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Dynamically linking descriptors in the .tx_submit() callback causes premature execution, lost interrupts, and permanent channel stalls.
- [Critical] Relying on the asynchronous hardware DONE flag in the completion handler creates a race condition leading to double-execution of DMA transfers.
- [High] Clearing the hardware DONE bit during active descriptor linking causes a Time-of-Check to Time-of-Use (TOCTOU) race that permanently deadlocks the DMA channel.
- [High] Modifying scatter/gather fields of an active multi-TCD descriptor in coherent memory without a memory barrier causes fatal DMA bus errors.
--

commit c846f40d8bb27c4bedc449b94402c8b84f93b59c
Author: Benoît Monin <benoit.monin@bootlin.com>

dmaengine: fsl-edma: Support dynamic scatter/gather chaining

Implement dynamic linking of scatter/gather transfers to enable chaining
multiple DMA descriptors without stopping the channel.

> diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
> index c10190164926a..6e5820051f292 100644
> --- a/drivers/dma/fsl-edma-common.c
> +++ b/drivers/dma/fsl-edma-common.c
> @@ -58,7 +58,10 @@ void fsl_edma_tx_chan_handler(struct fsl_edma_chan *fsl_chan)
>  		list_del(&fsl_chan->edesc->vdesc.node);
>  		vchan_cookie_complete(&fsl_chan->edesc->vdesc);
>  		fsl_chan->edesc = NULL;
> -		fsl_chan->status = DMA_COMPLETE;
> +		if (!(fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_SPLIT_REG) ||
> +		    (edma_readl_chreg(fsl_chan, ch_csr) & EDMA_V3_CH_CSR_DONE)) {
> +			fsl_chan->status = DMA_COMPLETE;
> +		}

Could this DONE flag check lead to a race condition where a chained DMA 
transfer is executed twice?

If descriptor A is chained to descriptor B, and both A and B complete in 
hardware before the CPU services A's completion interrupt, the hardware 
will halt and set the DONE flag.

The interrupt handler for A would then see the DONE flag set, assume the 
channel cleanly stopped after A, and set the status to DMA_COMPLETE. Later 
in the same handler, fsl_edma_xfer_desc() will fetch descriptor B from the 
virtual channel and, seeing the DMA_COMPLETE status, unconditionally write 
B's TCD into the hardware and restart the channel, causing B to execute 
a second time.

[ ... ]

> @@ -673,6 +676,68 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
>  	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
>  }
>  
> +static void fsl_edma_link_sg(struct fsl_edma_chan *fsl_chan, struct fsl_edma_desc *fsl_desc)
> +{
> +	u32 flags = fsl_edma_drvflags(fsl_chan);
> +	struct fsl_edma_hw_tcd *last_tcd;
> +	struct fsl_edma_desc *prev_desc;
> +	struct virt_dma_desc *vdesc;
> +	u16 csr;
> +
> +	lockdep_assert_held(&fsl_chan->vchan.lock);
> +
> +	if (!(flags & FSL_EDMA_DRV_SPLIT_REG))
> +		return;
> +
> +	vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_submitted,
> +					struct virt_dma_desc, node);
> +	if (!vdesc)
> +		vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_issued,
> +						struct virt_dma_desc, node);
> +	if (!vdesc)
> +		return;
> +
> +	prev_desc = to_fsl_edma_desc(vdesc);
> +	last_tcd = prev_desc->tcd[prev_desc->n_tcds - 1].vtcd;
> +
> +	csr = fsl_edma_get_tcd_to_cpu(fsl_chan, last_tcd, csr);
> +	if (!(csr & EDMA_TCD_CSR_D_REQ))
> +		return;
> +
> +	fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, fsl_desc->tcd[0].ptcd, dlast_sga);
> +
> +	csr &= ~EDMA_TCD_CSR_D_REQ;
> +	csr |= EDMA_TCD_CSR_E_SG;
> +	fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, csr, csr);

Is a memory barrier needed between updating dlast_sga and updating the csr 
to set EDMA_TCD_CSR_E_SG?

Since this descriptor is in DMA coherent memory and actively visible to the 
hardware, the CPU or memory subsystem might reorder these writes. If the eDMA 
engine fetches the TCD while E_SG is set but dlast_sga is still stale or 
uninitialized, it might fetch the next descriptor from an invalid physical 
address and trigger a fatal bus error.

> +
> +	if (prev_desc == fsl_chan->edesc && prev_desc->n_tcds == 1) {
> +		if (flags & FSL_EDMA_DRV_CLEAR_DONE_E_SG)
> +			edma_writel_chreg(fsl_chan, edma_readl_chreg(fsl_chan, ch_csr), ch_csr);

Does this read and write-back of ch_csr introduce a race condition that can 
deadlock the channel?

If the hardware transfer completes just before the channel lock is acquired, 
the channel becomes idle and the DONE bit is set. The driver then implicitly 
clears the write-1-to-clear DONE bit and sets E_SG, but setting E_SG on an 
already halted channel will not restart it.

When the delayed completion interrupt runs, it will read ch_csr, see that 
DONE is 0, incorrectly assume the channel seamlessly continued, and leave 
the status as DMA_IN_PROGRESS. This would prevent fsl_edma_xfer_desc() from
enabling the hardware request, leaving the channel stopped indefinitely.

> +
> +		edma_cp_tcd_to_reg(fsl_chan, last_tcd, dlast_sga);
> +		edma_cp_tcd_to_reg(fsl_chan, last_tcd, csr);
> +	}
> +
> +	trace_edma_link_sg(fsl_chan, last_tcd);
> +}
> +
> +static dma_cookie_t fsl_edma_tx_submit(struct dma_async_tx_descriptor *tx)
> +{
> +	struct virt_dma_desc *vd = container_of(tx, struct virt_dma_desc, tx);
> +	struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(tx->chan);
> +	struct fsl_edma_desc *fsl_desc = to_fsl_edma_desc(vd);
> +	struct virt_dma_chan *vc = to_virt_chan(tx->chan);
> +	dma_cookie_t cookie;
> +
> +	guard(spinlock_irqsave)(&fsl_chan->vchan.lock);
> +
> +	fsl_edma_link_sg(fsl_chan, fsl_desc);
> +	cookie = dma_cookie_assign(tx);
> +	list_move_tail(&vd->node, &vc->desc_submitted);
> +
> +	return cookie;
> +}

Can calling fsl_edma_link_sg() directly from the tx_submit callback lead to 
premature execution and lost interrupts?

By modifying the active hardware TCD to link the new descriptor before the 
client calls dma_async_issue_pending(), the hardware can transition to and 
start executing the newly submitted descriptor automatically.

If the completion interrupt for the previous descriptor fires before 
issue_pending() is called, vchan_next_desc() won't find the new descriptor 
because it is still sitting in the desc_submitted list rather than the 
desc_issued list.

This would leave fsl_chan->edesc as NULL, meaning when the new descriptor 
finishes, its completion interrupt is dropped. When issue_pending() is 
finally called, the channel may either permanently stall or mistakenly 
restart the descriptor and execute it a second time.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518-fsl-edma-dyn-sg-v4-0-8ce7d95b1ce9@bootlin.com?part=2

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-18 13:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 12:36 [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements Benoît Monin
2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin
2026-05-18 13:04   ` sashiko-bot
2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin
2026-05-18 13:34   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox