* [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements
@ 2026-05-18 12:36 Benoît Monin
2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin
2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin
0 siblings, 2 replies; 5+ messages in thread
From: Benoît Monin @ 2026-05-18 12:36 UTC (permalink / raw)
To: Frank Li, Vinod Koul
Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
Benoît Monin
This series adds support for scatter/gather DMA transfers via dma_vec
and dynamic descriptor chaining to the Freescale eDMA controller driver.
The first patch implements the .device_prep_peripheral_dma_vec() callback,
enabling the DMA engine to accept an array of dma_vec structures. This
callback supports both regular and cyclic transfer modes.
The second patch introduces dynamic scatter/gather chaining, which allows
multiple DMA descriptors to be linked together without stopping the channel.
This optimization eliminates idle periods when back-to-back transfers are
submitted, improving throughput and reducing latency. The implementation
carefully preserves cyclic transfer semantics and respects hardware
constraints on platforms with split register layouts.
I tested it on the i.MX93. The dynamic scatter/gather chaining should
work with other eDMA controller with split register layout.
Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
Changes in v4:
- To keep transactions in order, link DMA transaction to the end of
submitted list first, only lookup the issued list is the submitted
list is empty.
- Link to v3: https://patch.msgid.link/20260511-fsl-edma-dyn-sg-v3-0-98a181775dae@bootlin.com
Changes in v3:
- Fix formatting errors reported by Frank Li.
- Add fsl_edma_tx_submit() to link the DMA transactions
when they are submitted, not when they are prepared.
- Link to v2: https://patch.msgid.link/20260506-fsl-edma-dyn-sg-v2-0-66439cdd414e@bootlin.com
Changes in v2:
- Drop the RFC prefix, as asked by Frank Li
- No code change
- Link to v1: https://patch.msgid.link/20260430-fsl-edma-dyn-sg-v1-0-4e0ecbe2df66@bootlin.com
To: Frank Li <Frank.Li@nxp.com>
To: Vinod Koul <vkoul@kernel.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Frank Li <Frank.Li@kernel.org>
Cc: imx@lists.linux.dev
Cc: dmaengine@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Benoît Monin (2):
dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
dmaengine: fsl-edma: Support dynamic scatter/gather chaining
drivers/dma/fsl-edma-common.c | 197 ++++++++++++++++++++++++++++++++++++++++--
drivers/dma/fsl-edma-common.h | 4 +
drivers/dma/fsl-edma-main.c | 2 +
drivers/dma/fsl-edma-trace.h | 5 ++
4 files changed, 202 insertions(+), 6 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260428-fsl-edma-dyn-sg-960731e37da2
Best regards,
--
Benoît Monin, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec 2026-05-18 12:36 [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements Benoît Monin @ 2026-05-18 12:36 ` Benoît Monin 2026-05-18 13:04 ` sashiko-bot 2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin 1 sibling, 1 reply; 5+ messages in thread From: Benoît Monin @ 2026-05-18 12:36 UTC (permalink / raw) To: Frank Li, Vinod Koul Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel, Benoît Monin Add implementation of .device_prep_peripheral_dma_vec() callback to setup a scatter/gather DMA transfer from an array of dma_vec structures. Setup a cyclic transfer if the DMA_PREP_REPEAT flag is set. Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Benoît Monin <benoit.monin@bootlin.com> --- drivers/dma/fsl-edma-common.c | 109 ++++++++++++++++++++++++++++++++++++++++++ drivers/dma/fsl-edma-common.h | 4 ++ drivers/dma/fsl-edma-main.c | 2 + 3 files changed, 115 insertions(+) diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c index bb7531c456df..c10190164926 100644 --- a/drivers/dma/fsl-edma-common.c +++ b/drivers/dma/fsl-edma-common.c @@ -673,6 +673,115 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic( return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); } +struct dma_async_tx_descriptor * +fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *vecs, + size_t nb, enum dma_transfer_direction direction, + unsigned long flags) +{ + struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan); + dma_addr_t src_addr, dst_addr, last_sg; + struct fsl_edma_desc *fsl_desc; + u16 soff, doff, iter; + u32 nbytes; + int i; + + if (!is_slave_direction(direction)) + return NULL; + + if (!fsl_edma_prep_slave_dma(fsl_chan, direction)) + return NULL; + + fsl_desc = fsl_edma_alloc_desc(fsl_chan, nb); + if (!fsl_desc) + return NULL; + fsl_desc->iscyclic = flags & DMA_PREP_REPEAT; + fsl_desc->dirn = direction; + + if (direction == DMA_MEM_TO_DEV) { + if (!fsl_chan->cfg.src_addr_width) + fsl_chan->cfg.src_addr_width = fsl_chan->cfg.dst_addr_width; + fsl_chan->attr = + fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width, + fsl_chan->cfg.dst_addr_width); + nbytes = fsl_chan->cfg.dst_addr_width * fsl_chan->cfg.dst_maxburst; + } else { + if (!fsl_chan->cfg.dst_addr_width) + fsl_chan->cfg.dst_addr_width = fsl_chan->cfg.src_addr_width; + fsl_chan->attr = + fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width, + fsl_chan->cfg.dst_addr_width); + nbytes = fsl_chan->cfg.src_addr_width * fsl_chan->cfg.src_maxburst; + } + + for (i = 0; i < nb; i++) { + if (direction == DMA_MEM_TO_DEV) { + src_addr = vecs[i].addr; + dst_addr = fsl_chan->dma_dev_addr; + soff = fsl_chan->cfg.dst_addr_width; + doff = 0; + } else if (direction == DMA_DEV_TO_MEM) { + src_addr = fsl_chan->dma_dev_addr; + dst_addr = vecs[i].addr; + soff = 0; + doff = fsl_chan->cfg.src_addr_width; + } else { + /* DMA_DEV_TO_DEV */ + src_addr = fsl_chan->cfg.src_addr; + dst_addr = fsl_chan->cfg.dst_addr; + soff = 0; + doff = 0; + } + + /* + * Choose the suitable burst length if dma_vec length is not + * multiple of burst length so that the whole transfer length is + * multiple of minor loop(burst length). + */ + if (vecs[i].len % nbytes) { + u32 width = (direction == DMA_DEV_TO_MEM) ? doff : soff; + u32 burst = (direction == DMA_DEV_TO_MEM) ? + fsl_chan->cfg.src_maxburst : + fsl_chan->cfg.dst_maxburst; + int j; + + for (j = burst; j > 1; j--) { + if (!(vecs[i].len % (j * width))) { + nbytes = j * width; + break; + } + } + /* Set burst size as 1 if there's no suitable one */ + if (j == 1) + nbytes = width; + } + + iter = vecs[i].len / nbytes; + if (i < nb - 1) { + last_sg = fsl_desc->tcd[(i + 1)].ptcd; + fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr, + dst_addr, fsl_chan->attr, soff, + nbytes, 0, iter, iter, doff, last_sg, + false, false, true); + } else { + if (fsl_desc->iscyclic) { + last_sg = fsl_desc->tcd[0].ptcd; + fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr, + dst_addr, fsl_chan->attr, soff, + nbytes, 0, iter, iter, doff, last_sg, + true, false, true); + } else { + last_sg = 0; + fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr, + dst_addr, fsl_chan->attr, soff, + nbytes, 0, iter, iter, doff, last_sg, + true, true, false); + } + } + } + + return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); +} + struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, diff --git a/drivers/dma/fsl-edma-common.h b/drivers/dma/fsl-edma-common.h index 205a96489094..0d028048701d 100644 --- a/drivers/dma/fsl-edma-common.h +++ b/drivers/dma/fsl-edma-common.h @@ -496,6 +496,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic( struct dma_chan *chan, dma_addr_t dma_addr, size_t buf_len, size_t period_len, enum dma_transfer_direction direction, unsigned long flags); +struct dma_async_tx_descriptor *fsl_edma_prep_peripheral_dma_vec( + struct dma_chan *chan, const struct dma_vec *vecs, + size_t nb, enum dma_transfer_direction direction, + unsigned long flags); struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c index 36155ab1602a..6693b4270a1a 100644 --- a/drivers/dma/fsl-edma-main.c +++ b/drivers/dma/fsl-edma-main.c @@ -841,6 +841,8 @@ static int fsl_edma_probe(struct platform_device *pdev) fsl_edma->dma_dev.device_free_chan_resources = fsl_edma_free_chan_resources; fsl_edma->dma_dev.device_tx_status = fsl_edma_tx_status; + fsl_edma->dma_dev.device_prep_peripheral_dma_vec + = fsl_edma_prep_peripheral_dma_vec; fsl_edma->dma_dev.device_prep_slave_sg = fsl_edma_prep_slave_sg; fsl_edma->dma_dev.device_prep_dma_cyclic = fsl_edma_prep_dma_cyclic; fsl_edma->dma_dev.device_prep_dma_memcpy = fsl_edma_prep_memcpy; -- 2.54.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec 2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin @ 2026-05-18 13:04 ` sashiko-bot 0 siblings, 0 replies; 5+ messages in thread From: sashiko-bot @ 2026-05-18 13:04 UTC (permalink / raw) To: Benoît Monin; +Cc: dmaengine, Frank.Li, imx, vkoul Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: - [Critical] Multiple deterministic division by zero kernel panics in burst length and offset calculations. - [High] Transfer lengths smaller than the bus width result in a 0 iteration count, causing a massive hardware memory overflow. - [High] Incorrect memory address offsets (`soff` and `doff`) are used when source and destination bus widths differ, causing overlapping memory access. -- commit f9b345d16d721fb5194b3218cae1f0adbbad0340 Author: Benoît Monin <benoit.monin@bootlin.com> dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Add implementation of .device_prep_peripheral_dma_vec() callback to setup a scatter/gather DMA transfer from an array of dma_vec structures. Setup a cyclic transfer if the DMA_PREP_REPEAT flag is set. > diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c > --- a/drivers/dma/fsl-edma-common.c > +++ b/drivers/dma/fsl-edma-common.c > @@ -673,6 +673,115 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic( > return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); > } > > +struct dma_async_tx_descriptor * > +fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *vecs, > + size_t nb, enum dma_transfer_direction direction, > + unsigned long flags) > +{ [ ... ] > + if (direction == DMA_MEM_TO_DEV) { > + if (!fsl_chan->cfg.src_addr_width) > + fsl_chan->cfg.src_addr_width = fsl_chan->cfg.dst_addr_width; > + fsl_chan->attr = > + fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width, > + fsl_chan->cfg.dst_addr_width); > + nbytes = fsl_chan->cfg.dst_addr_width * fsl_chan->cfg.dst_maxburst; This regression appears to be duplicated from the existing fsl_edma_prep_slave_sg logic. If a client leaves dst_addr_width or src_addr_width as 0 (undefined), does this evaluate to nbytes = 0? The driver relies on fsl_edma_get_tcd_attr() to handle undefined values: static unsigned int fsl_edma_get_tcd_attr(...) { if (src_addr_width == DMA_SLAVE_BUSWIDTH_UNDEFINED) src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES; ... } However, that function takes arguments by value and does not update the fsl_chan->cfg structures. If nbytes evaluates to 0 here, it will lead to a divide-by-zero panic shortly after in the modulo operation: if (vecs[i].len % nbytes) > + } else { [ ... ] > + } > + > + for (i = 0; i < nb; i++) { > + if (direction == DMA_MEM_TO_DEV) { > + src_addr = vecs[i].addr; > + dst_addr = fsl_chan->dma_dev_addr; > + soff = fsl_chan->cfg.dst_addr_width; > + doff = 0; > + } else if (direction == DMA_DEV_TO_MEM) { Is it intentional that the source address offset (soff) is set to dst_addr_width here? For a DMA_MEM_TO_DEV transfer, memory is the source. The hardware will perform reads sized according to src_addr_width (configured in the SSIZE attribute). If a client configures the memory side with a 4-byte width but the peripheral with a 1-byte width, the DMA engine will read 4 bytes per minor loop, but increment the source address by only 1 byte. Could this regression result in overlapping memory reads and corrupted data being sent to the peripheral? (And conversely for DMA_DEV_TO_MEM where doff is tied to src_addr_width). > + src_addr = fsl_chan->dma_dev_addr; > + dst_addr = vecs[i].addr; > + soff = 0; > + doff = fsl_chan->cfg.src_addr_width; > + } else { > + /* DMA_DEV_TO_DEV */ > + src_addr = fsl_chan->cfg.src_addr; > + dst_addr = fsl_chan->cfg.dst_addr; > + soff = 0; > + doff = 0; > + } > + > + /* > + * Choose the suitable burst length if dma_vec length is not > + * multiple of burst length so that the whole transfer length is > + * multiple of minor loop(burst length). > + */ > + if (vecs[i].len % nbytes) { > + u32 width = (direction == DMA_DEV_TO_MEM) ? doff : soff; If the direction is DMA_DEV_TO_DEV, both soff and doff are explicitly set to 0 above. This means width will evaluate to 0. If vecs[i].len is not a multiple of nbytes, the fallback loop will then evaluate: if (!(vecs[i].len % (j * width))) Will this trigger a divide-by-zero panic? Furthermore, if the loop finishes with j == 1, nbytes is set to width (0), which leads to another divide-by-zero panic at iter = vecs[i].len / nbytes. > + u32 burst = (direction == DMA_DEV_TO_MEM) ? > + fsl_chan->cfg.src_maxburst : > + fsl_chan->cfg.dst_maxburst; > + int j; > + > + for (j = burst; j > 1; j--) { > + if (!(vecs[i].len % (j * width))) { > + nbytes = j * width; > + break; > + } > + } > + /* Set burst size as 1 if there's no suitable one */ > + if (j == 1) > + nbytes = width; > + } > + > + iter = vecs[i].len / nbytes; If a transfer length (vecs[i].len) is smaller than the bus width, the fallback block above sets nbytes = width. The integer division iter = vecs[i].len / nbytes will then truncate to 0. Does writing 0 to the iteration count registers on Freescale eDMA hardware cause the counter to decrement before checking for zero? If so, programming the hardware's CITER and BITER registers with 0 could be interpreted as the maximum possible iterations (e.g. 32768 or 65536). Instead of safely rejecting the invalid sub-word transfer, could this regression initiate a massive DMA transfer that silently accesses thousands of bytes past the client's buffer boundaries? -- Sashiko AI review · https://sashiko.dev/#/patchset/20260518-fsl-edma-dyn-sg-v4-0-8ce7d95b1ce9@bootlin.com?part=1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining 2026-05-18 12:36 [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements Benoît Monin 2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin @ 2026-05-18 12:36 ` Benoît Monin 2026-05-18 13:34 ` sashiko-bot 1 sibling, 1 reply; 5+ messages in thread From: Benoît Monin @ 2026-05-18 12:36 UTC (permalink / raw) To: Frank Li, Vinod Koul Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel, Benoît Monin Implement dynamic linking of scatter/gather transfers to enable chaining multiple DMA descriptors without stopping the channel. This avoids waiting for the channel to go idle if there is another transaction already issued. Add fsl_edma_link_sg() to dynamically link the last TCD of a previously submitted descriptor to the first TCD of a new descriptor by setting the scatter/gather address and the E_SG flag, and keeping the channel active by clearing the DREQ bit. Linking is done when the transaction is submitted by fsl_edma_tx_submit(). To do so, the .tx_submit() callback is overridden for non-cyclic transactions prepared by fsl_edma_prep_peripheral_dma_vec() and fsl_edma_prep_slave_sg(). This ensures that transactions are linked in the order they are submitted. Update fsl_edma_xfer_desc() to avoid re-initializing the hardware when a transfer is already in progress, allowing seamless chaining of descriptors. Modify the transfer completion handler to check the DONE flag in the channel CSR before marking the transfer complete. Since this flag is only available on SoC with the split registers layout, we only link transactions for DMA controllers flagged with FSL_EDMA_DRV_SPLIT_REG. Add trace event for scatter/gather linking operations. Signed-off-by: Benoît Monin <benoit.monin@bootlin.com> --- drivers/dma/fsl-edma-common.c | 90 +++++++++++++++++++++++++++++++++++++++---- drivers/dma/fsl-edma-trace.h | 5 +++ 2 files changed, 88 insertions(+), 7 deletions(-) diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c index c10190164926..6e5820051f29 100644 --- a/drivers/dma/fsl-edma-common.c +++ b/drivers/dma/fsl-edma-common.c @@ -58,7 +58,10 @@ void fsl_edma_tx_chan_handler(struct fsl_edma_chan *fsl_chan) list_del(&fsl_chan->edesc->vdesc.node); vchan_cookie_complete(&fsl_chan->edesc->vdesc); fsl_chan->edesc = NULL; - fsl_chan->status = DMA_COMPLETE; + if (!(fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_SPLIT_REG) || + (edma_readl_chreg(fsl_chan, ch_csr) & EDMA_V3_CH_CSR_DONE)) { + fsl_chan->status = DMA_COMPLETE; + } } else { vchan_cyclic_callback(&fsl_chan->edesc->vdesc); } @@ -673,6 +676,68 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic( return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); } +static void fsl_edma_link_sg(struct fsl_edma_chan *fsl_chan, struct fsl_edma_desc *fsl_desc) +{ + u32 flags = fsl_edma_drvflags(fsl_chan); + struct fsl_edma_hw_tcd *last_tcd; + struct fsl_edma_desc *prev_desc; + struct virt_dma_desc *vdesc; + u16 csr; + + lockdep_assert_held(&fsl_chan->vchan.lock); + + if (!(flags & FSL_EDMA_DRV_SPLIT_REG)) + return; + + vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_submitted, + struct virt_dma_desc, node); + if (!vdesc) + vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_issued, + struct virt_dma_desc, node); + if (!vdesc) + return; + + prev_desc = to_fsl_edma_desc(vdesc); + last_tcd = prev_desc->tcd[prev_desc->n_tcds - 1].vtcd; + + csr = fsl_edma_get_tcd_to_cpu(fsl_chan, last_tcd, csr); + if (!(csr & EDMA_TCD_CSR_D_REQ)) + return; + + fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, fsl_desc->tcd[0].ptcd, dlast_sga); + + csr &= ~EDMA_TCD_CSR_D_REQ; + csr |= EDMA_TCD_CSR_E_SG; + fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, csr, csr); + + if (prev_desc == fsl_chan->edesc && prev_desc->n_tcds == 1) { + if (flags & FSL_EDMA_DRV_CLEAR_DONE_E_SG) + edma_writel_chreg(fsl_chan, edma_readl_chreg(fsl_chan, ch_csr), ch_csr); + + edma_cp_tcd_to_reg(fsl_chan, last_tcd, dlast_sga); + edma_cp_tcd_to_reg(fsl_chan, last_tcd, csr); + } + + trace_edma_link_sg(fsl_chan, last_tcd); +} + +static dma_cookie_t fsl_edma_tx_submit(struct dma_async_tx_descriptor *tx) +{ + struct virt_dma_desc *vd = container_of(tx, struct virt_dma_desc, tx); + struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(tx->chan); + struct fsl_edma_desc *fsl_desc = to_fsl_edma_desc(vd); + struct virt_dma_chan *vc = to_virt_chan(tx->chan); + dma_cookie_t cookie; + + guard(spinlock_irqsave)(&fsl_chan->vchan.lock); + + fsl_edma_link_sg(fsl_chan, fsl_desc); + cookie = dma_cookie_assign(tx); + list_move_tail(&vd->node, &vc->desc_submitted); + + return cookie; +} + struct dma_async_tx_descriptor * fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *vecs, size_t nb, enum dma_transfer_direction direction, @@ -680,6 +745,7 @@ fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *ve { struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan); dma_addr_t src_addr, dst_addr, last_sg; + struct dma_async_tx_descriptor *tx; struct fsl_edma_desc *fsl_desc; u16 soff, doff, iter; u32 nbytes; @@ -779,7 +845,10 @@ fsl_edma_prep_peripheral_dma_vec(struct dma_chan *chan, const struct dma_vec *ve } } - return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); + tx = vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); + if (!fsl_desc->iscyclic) + tx->tx_submit = fsl_edma_tx_submit; + return tx; } struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg( @@ -788,9 +857,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg( unsigned long flags, void *context) { struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan); + dma_addr_t src_addr, dst_addr, last_sg; + struct dma_async_tx_descriptor *tx; struct fsl_edma_desc *fsl_desc; struct scatterlist *sg; - dma_addr_t src_addr, dst_addr, last_sg; u16 soff, doff, iter; u32 nbytes; int i; @@ -882,7 +952,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg( } } - return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); + tx = vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); + tx->tx_submit = fsl_edma_tx_submit; + + return tx; } struct dma_async_tx_descriptor *fsl_edma_prep_memcpy(struct dma_chan *chan, @@ -924,9 +997,12 @@ void fsl_edma_xfer_desc(struct fsl_edma_chan *fsl_chan) if (!vdesc) return; fsl_chan->edesc = to_fsl_edma_desc(vdesc); - fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd); - fsl_edma_enable_request(fsl_chan); - fsl_chan->status = DMA_IN_PROGRESS; + + if (fsl_chan->status != DMA_IN_PROGRESS) { + fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd); + fsl_edma_enable_request(fsl_chan); + fsl_chan->status = DMA_IN_PROGRESS; + } } void fsl_edma_issue_pending(struct dma_chan *chan) diff --git a/drivers/dma/fsl-edma-trace.h b/drivers/dma/fsl-edma-trace.h index d3541301a247..ac319d2dbb90 100644 --- a/drivers/dma/fsl-edma-trace.h +++ b/drivers/dma/fsl-edma-trace.h @@ -119,6 +119,11 @@ DEFINE_EVENT(edma_log_tcd, edma_fill_tcd, TP_ARGS(chan, tcd) ); +DEFINE_EVENT(edma_log_tcd, edma_link_sg, + TP_PROTO(struct fsl_edma_chan *chan, void *tcd), + TP_ARGS(chan, tcd) +); + #endif /* this part must be outside header guard */ -- 2.54.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining 2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin @ 2026-05-18 13:34 ` sashiko-bot 0 siblings, 0 replies; 5+ messages in thread From: sashiko-bot @ 2026-05-18 13:34 UTC (permalink / raw) To: Benoît Monin; +Cc: dmaengine, Frank.Li, vkoul, imx Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider: - [Critical] Dynamically linking descriptors in the .tx_submit() callback causes premature execution, lost interrupts, and permanent channel stalls. - [Critical] Relying on the asynchronous hardware DONE flag in the completion handler creates a race condition leading to double-execution of DMA transfers. - [High] Clearing the hardware DONE bit during active descriptor linking causes a Time-of-Check to Time-of-Use (TOCTOU) race that permanently deadlocks the DMA channel. - [High] Modifying scatter/gather fields of an active multi-TCD descriptor in coherent memory without a memory barrier causes fatal DMA bus errors. -- commit c846f40d8bb27c4bedc449b94402c8b84f93b59c Author: Benoît Monin <benoit.monin@bootlin.com> dmaengine: fsl-edma: Support dynamic scatter/gather chaining Implement dynamic linking of scatter/gather transfers to enable chaining multiple DMA descriptors without stopping the channel. > diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c > index c10190164926a..6e5820051f292 100644 > --- a/drivers/dma/fsl-edma-common.c > +++ b/drivers/dma/fsl-edma-common.c > @@ -58,7 +58,10 @@ void fsl_edma_tx_chan_handler(struct fsl_edma_chan *fsl_chan) > list_del(&fsl_chan->edesc->vdesc.node); > vchan_cookie_complete(&fsl_chan->edesc->vdesc); > fsl_chan->edesc = NULL; > - fsl_chan->status = DMA_COMPLETE; > + if (!(fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_SPLIT_REG) || > + (edma_readl_chreg(fsl_chan, ch_csr) & EDMA_V3_CH_CSR_DONE)) { > + fsl_chan->status = DMA_COMPLETE; > + } Could this DONE flag check lead to a race condition where a chained DMA transfer is executed twice? If descriptor A is chained to descriptor B, and both A and B complete in hardware before the CPU services A's completion interrupt, the hardware will halt and set the DONE flag. The interrupt handler for A would then see the DONE flag set, assume the channel cleanly stopped after A, and set the status to DMA_COMPLETE. Later in the same handler, fsl_edma_xfer_desc() will fetch descriptor B from the virtual channel and, seeing the DMA_COMPLETE status, unconditionally write B's TCD into the hardware and restart the channel, causing B to execute a second time. [ ... ] > @@ -673,6 +676,68 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic( > return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags); > } > > +static void fsl_edma_link_sg(struct fsl_edma_chan *fsl_chan, struct fsl_edma_desc *fsl_desc) > +{ > + u32 flags = fsl_edma_drvflags(fsl_chan); > + struct fsl_edma_hw_tcd *last_tcd; > + struct fsl_edma_desc *prev_desc; > + struct virt_dma_desc *vdesc; > + u16 csr; > + > + lockdep_assert_held(&fsl_chan->vchan.lock); > + > + if (!(flags & FSL_EDMA_DRV_SPLIT_REG)) > + return; > + > + vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_submitted, > + struct virt_dma_desc, node); > + if (!vdesc) > + vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_issued, > + struct virt_dma_desc, node); > + if (!vdesc) > + return; > + > + prev_desc = to_fsl_edma_desc(vdesc); > + last_tcd = prev_desc->tcd[prev_desc->n_tcds - 1].vtcd; > + > + csr = fsl_edma_get_tcd_to_cpu(fsl_chan, last_tcd, csr); > + if (!(csr & EDMA_TCD_CSR_D_REQ)) > + return; > + > + fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, fsl_desc->tcd[0].ptcd, dlast_sga); > + > + csr &= ~EDMA_TCD_CSR_D_REQ; > + csr |= EDMA_TCD_CSR_E_SG; > + fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, csr, csr); Is a memory barrier needed between updating dlast_sga and updating the csr to set EDMA_TCD_CSR_E_SG? Since this descriptor is in DMA coherent memory and actively visible to the hardware, the CPU or memory subsystem might reorder these writes. If the eDMA engine fetches the TCD while E_SG is set but dlast_sga is still stale or uninitialized, it might fetch the next descriptor from an invalid physical address and trigger a fatal bus error. > + > + if (prev_desc == fsl_chan->edesc && prev_desc->n_tcds == 1) { > + if (flags & FSL_EDMA_DRV_CLEAR_DONE_E_SG) > + edma_writel_chreg(fsl_chan, edma_readl_chreg(fsl_chan, ch_csr), ch_csr); Does this read and write-back of ch_csr introduce a race condition that can deadlock the channel? If the hardware transfer completes just before the channel lock is acquired, the channel becomes idle and the DONE bit is set. The driver then implicitly clears the write-1-to-clear DONE bit and sets E_SG, but setting E_SG on an already halted channel will not restart it. When the delayed completion interrupt runs, it will read ch_csr, see that DONE is 0, incorrectly assume the channel seamlessly continued, and leave the status as DMA_IN_PROGRESS. This would prevent fsl_edma_xfer_desc() from enabling the hardware request, leaving the channel stopped indefinitely. > + > + edma_cp_tcd_to_reg(fsl_chan, last_tcd, dlast_sga); > + edma_cp_tcd_to_reg(fsl_chan, last_tcd, csr); > + } > + > + trace_edma_link_sg(fsl_chan, last_tcd); > +} > + > +static dma_cookie_t fsl_edma_tx_submit(struct dma_async_tx_descriptor *tx) > +{ > + struct virt_dma_desc *vd = container_of(tx, struct virt_dma_desc, tx); > + struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(tx->chan); > + struct fsl_edma_desc *fsl_desc = to_fsl_edma_desc(vd); > + struct virt_dma_chan *vc = to_virt_chan(tx->chan); > + dma_cookie_t cookie; > + > + guard(spinlock_irqsave)(&fsl_chan->vchan.lock); > + > + fsl_edma_link_sg(fsl_chan, fsl_desc); > + cookie = dma_cookie_assign(tx); > + list_move_tail(&vd->node, &vc->desc_submitted); > + > + return cookie; > +} Can calling fsl_edma_link_sg() directly from the tx_submit callback lead to premature execution and lost interrupts? By modifying the active hardware TCD to link the new descriptor before the client calls dma_async_issue_pending(), the hardware can transition to and start executing the newly submitted descriptor automatically. If the completion interrupt for the previous descriptor fires before issue_pending() is called, vchan_next_desc() won't find the new descriptor because it is still sitting in the desc_submitted list rather than the desc_issued list. This would leave fsl_chan->edesc as NULL, meaning when the new descriptor finishes, its completion interrupt is dropped. When issue_pending() is finally called, the channel may either permanently stall or mistakenly restart the descriptor and execute it a second time. -- Sashiko AI review · https://sashiko.dev/#/patchset/20260518-fsl-edma-dyn-sg-v4-0-8ce7d95b1ce9@bootlin.com?part=2 ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-18 13:34 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-18 12:36 [PATCH v4 0/2] dmaengine: fsl-edma: Scatter/gather improvements Benoît Monin 2026-05-18 12:36 ` [PATCH v4 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin 2026-05-18 13:04 ` sashiko-bot 2026-05-18 12:36 ` [PATCH v4 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin 2026-05-18 13:34 ` sashiko-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox