* [PATCH RFC 0/2] dmaengine: fsl-edma: Scatter/gather improvements
@ 2026-04-30 9:49 Benoît Monin
2026-04-30 9:49 ` [PATCH RFC 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec Benoît Monin
2026-04-30 9:49 ` [PATCH RFC 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining Benoît Monin
0 siblings, 2 replies; 8+ messages in thread
From: Benoît Monin @ 2026-04-30 9:49 UTC (permalink / raw)
To: Frank Li, Vinod Koul
Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
Benoît Monin
This series adds support for scatter/gather DMA transfers via dma_vec
and dynamic descriptor chaining to the Freescale eDMA controller driver.
The first patch implements the .device_prep_peripheral_dma_vec() callback,
enabling the DMA engine to accept an array of dma_vec structures. This
callback supports both regular and cyclic transfer modes.
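For reference, a client reaches this callback through the generic dmaengine
helper. The snippet below is only an illustrative sketch: the channel, the
already-mapped buffer addresses and the lengths are hypothetical, and error
handling is omitted.

	struct dma_vec vecs[2] = {
		{ .addr = buf0_dma, .len = 512 },	/* DMA-mapped buffers */
		{ .addr = buf1_dma, .len = 512 },
	};
	struct dma_async_tx_descriptor *desc;

	desc = dmaengine_prep_peripheral_dma_vec(chan, vecs, ARRAY_SIZE(vecs),
						 DMA_MEM_TO_DEV,
						 DMA_PREP_INTERRUPT);
	if (desc) {
		dmaengine_submit(desc);
		dma_async_issue_pending(chan);
	}

Setting DMA_PREP_REPEAT in the flags makes the driver link the last TCD back
to the first one and run the transfer cyclically.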
The second patch introduces dynamic scatter/gather chaining, which allows
multiple DMA descriptors to be linked together without stopping the channel.
This optimization eliminates idle periods when back-to-back transfers are
submitted, improving throughput and reducing latency. The implementation
carefully preserves cyclic transfer semantics and respects hardware
constraints on platforms with split register layouts.
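The pattern that benefits is a client queueing several transfers up front
instead of waiting for each completion, along the lines of the hypothetical
sketch below (made-up buffers and completion callback, no error handling):

	for (i = 0; i < n_bufs; i++) {
		desc = dmaengine_prep_slave_single(chan, buf_dma[i], buf_len,
						   DMA_DEV_TO_MEM,
						   DMA_PREP_INTERRUPT);
		if (!desc)
			break;
		desc->callback = rx_done;
		desc->callback_param = &bufs[i];
		dmaengine_submit(desc);
	}
	dma_async_issue_pending(chan);

With dynamic chaining, the eDMA channel keeps running across all of the
issued descriptors instead of being stopped and restarted between them.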
I am posting this as an RFC since I have only tested it on the i.MX93. The
dynamic scatter/gather chaining should also work on other eDMA controllers
with a split register layout.
Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
Benoît Monin (2):
dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
dmaengine: fsl-edma: Support dynamic scatter/gather chaining
drivers/dma/fsl-edma-common.c | 174 +++++++++++++++++++++++++++++++++++++++++-
drivers/dma/fsl-edma-common.h | 4 +
drivers/dma/fsl-edma-main.c | 2 +
drivers/dma/fsl-edma-trace.h | 5 ++
4 files changed, 181 insertions(+), 4 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260428-fsl-edma-dyn-sg-960731e37da2
Best regards,
--
Benoît Monin, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

* [PATCH RFC 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
From: Benoît Monin @ 2026-04-30 9:49 UTC (permalink / raw)
To: Frank Li, Vinod Koul
Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
	Benoît Monin

Add implementation of the .device_prep_peripheral_dma_vec() callback to
set up a scatter/gather DMA transfer from an array of dma_vec structures.
Set up a cyclic transfer if the DMA_PREP_REPEAT flag is set.

Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
 drivers/dma/fsl-edma-common.c | 110 ++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/fsl-edma-common.h |   4 ++
 drivers/dma/fsl-edma-main.c   |   2 +
 3 files changed, 116 insertions(+)

diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index bb7531c456df..26a5ecf493b9 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -673,6 +673,116 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
 }
 
+struct dma_async_tx_descriptor *fsl_edma_prep_peripheral_dma_vec(
+	struct dma_chan *chan, const struct dma_vec *vecs,
+	size_t nb, enum dma_transfer_direction direction,
+	unsigned long flags)
+{
+	struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan);
+	struct fsl_edma_desc *fsl_desc;
+	dma_addr_t src_addr, dst_addr, last_sg;
+	u16 soff, doff, iter;
+	u32 nbytes;
+	int i;
+
+	if (!is_slave_direction(direction))
+		return NULL;
+
+	if (!fsl_edma_prep_slave_dma(fsl_chan, direction))
+		return NULL;
+
+	fsl_desc = fsl_edma_alloc_desc(fsl_chan, nb);
+	if (!fsl_desc)
+		return NULL;
+	fsl_desc->iscyclic = flags & DMA_PREP_REPEAT;
+	fsl_desc->dirn = direction;
+
+	if (direction == DMA_MEM_TO_DEV) {
+		if (!fsl_chan->cfg.src_addr_width)
+			fsl_chan->cfg.src_addr_width = fsl_chan->cfg.dst_addr_width;
+		fsl_chan->attr =
+			fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width,
+					      fsl_chan->cfg.dst_addr_width);
+		nbytes = fsl_chan->cfg.dst_addr_width *
+			fsl_chan->cfg.dst_maxburst;
+	} else {
+		if (!fsl_chan->cfg.dst_addr_width)
+			fsl_chan->cfg.dst_addr_width = fsl_chan->cfg.src_addr_width;
+		fsl_chan->attr =
+			fsl_edma_get_tcd_attr(fsl_chan->cfg.src_addr_width,
+					      fsl_chan->cfg.dst_addr_width);
+		nbytes = fsl_chan->cfg.src_addr_width *
+			fsl_chan->cfg.src_maxburst;
+	}
+
+	for (i = 0; i < nb; i++) {
+		if (direction == DMA_MEM_TO_DEV) {
+			src_addr = vecs[i].addr;
+			dst_addr = fsl_chan->dma_dev_addr;
+			soff = fsl_chan->cfg.dst_addr_width;
+			doff = 0;
+		} else if (direction == DMA_DEV_TO_MEM) {
+			src_addr = fsl_chan->dma_dev_addr;
+			dst_addr = vecs[i].addr;
+			soff = 0;
+			doff = fsl_chan->cfg.src_addr_width;
+		} else {
+			/* DMA_DEV_TO_DEV */
+			src_addr = fsl_chan->cfg.src_addr;
+			dst_addr = fsl_chan->cfg.dst_addr;
+			soff = 0;
+			doff = 0;
+		}
+
+		/*
+		 * Choose a suitable burst length if the dma_vec length is not a
+		 * multiple of the burst length, so that the whole transfer
+		 * length is a multiple of the minor loop (burst length).
+		 */
+		if (vecs[i].len % nbytes) {
+			u32 width = (direction == DMA_DEV_TO_MEM) ? doff : soff;
+			u32 burst = (direction == DMA_DEV_TO_MEM) ?
+					fsl_chan->cfg.src_maxburst :
+					fsl_chan->cfg.dst_maxburst;
+			int j;
+
+			for (j = burst; j > 1; j--) {
+				if (!(vecs[i].len % (j * width))) {
+					nbytes = j * width;
+					break;
+				}
+			}
+			/* Set burst size as 1 if there's no suitable one */
+			if (j == 1)
+				nbytes = width;
+		}
+		iter = vecs[i].len / nbytes;
+		if (i < nb - 1) {
+			last_sg = fsl_desc->tcd[(i + 1)].ptcd;
+			fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr,
+					  dst_addr, fsl_chan->attr, soff,
+					  nbytes, 0, iter, iter, doff, last_sg,
+					  false, false, true);
+		} else {
+			if (fsl_desc->iscyclic) {
+				last_sg = fsl_desc->tcd[0].ptcd;
+				fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr,
						  dst_addr, fsl_chan->attr, soff,
						  nbytes, 0, iter, iter, doff, last_sg,
						  true, false, true);
			} else {
+				last_sg = 0;
+				fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[i].vtcd, src_addr,
						  dst_addr, fsl_chan->attr, soff,
						  nbytes, 0, iter, iter, doff, last_sg,
						  true, true, false);
+			}
+		}
+	}
+
+	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
+}
+
 struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 	struct dma_chan *chan, struct scatterlist *sgl,
 	unsigned int sg_len, enum dma_transfer_direction direction,
diff --git a/drivers/dma/fsl-edma-common.h b/drivers/dma/fsl-edma-common.h
index 205a96489094..0d028048701d 100644
--- a/drivers/dma/fsl-edma-common.h
+++ b/drivers/dma/fsl-edma-common.h
@@ -496,6 +496,10 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 	struct dma_chan *chan, dma_addr_t dma_addr, size_t buf_len,
 	size_t period_len, enum dma_transfer_direction direction,
 	unsigned long flags);
+struct dma_async_tx_descriptor *fsl_edma_prep_peripheral_dma_vec(
+	struct dma_chan *chan, const struct dma_vec *vecs,
+	size_t nb, enum dma_transfer_direction direction,
+	unsigned long flags);
 struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 	struct dma_chan *chan, struct scatterlist *sgl,
 	unsigned int sg_len, enum dma_transfer_direction direction,
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
index 36155ab1602a..6693b4270a1a 100644
--- a/drivers/dma/fsl-edma-main.c
+++ b/drivers/dma/fsl-edma-main.c
@@ -841,6 +841,8 @@ static int fsl_edma_probe(struct platform_device *pdev)
 	fsl_edma->dma_dev.device_free_chan_resources
 		= fsl_edma_free_chan_resources;
 	fsl_edma->dma_dev.device_tx_status = fsl_edma_tx_status;
+	fsl_edma->dma_dev.device_prep_peripheral_dma_vec
+		= fsl_edma_prep_peripheral_dma_vec;
 	fsl_edma->dma_dev.device_prep_slave_sg = fsl_edma_prep_slave_sg;
 	fsl_edma->dma_dev.device_prep_dma_cyclic = fsl_edma_prep_dma_cyclic;
 	fsl_edma->dma_dev.device_prep_dma_memcpy = fsl_edma_prep_memcpy;

--
2.54.0

* Re: [PATCH RFC 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
From: Frank Li @ 2026-05-04 15:58 UTC (permalink / raw)
To: Benoît Monin
Cc: Vinod Koul, Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel

On Thu, Apr 30, 2026 at 11:49:32AM +0200, Benoît Monin wrote:
> Add implementation of the .device_prep_peripheral_dma_vec() callback to
> set up a scatter/gather DMA transfer from an array of dma_vec structures.
> Set up a cyclic transfer if the DMA_PREP_REPEAT flag is set.
>
> Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
> ---

Please remove RFC for this patch.

Frank

* Re: [PATCH RFC 1/2] dmaengine: fsl-edma: Implement device_prep_peripheral_dma_vec
From: Benoît Monin @ 2026-05-05 13:51 UTC (permalink / raw)
To: Frank Li
Cc: Vinod Koul, Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel

On Monday, 4 May 2026 at 17:58:08 CEST, Frank Li wrote:
> On Thu, Apr 30, 2026 at 11:49:32AM +0200, Benoît Monin wrote:
> > Add implementation of the .device_prep_peripheral_dma_vec() callback to
> > set up a scatter/gather DMA transfer from an array of dma_vec structures.
> > Set up a cyclic transfer if the DMA_PREP_REPEAT flag is set.
> >
> > Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
> > ---
>
> Please remove RFC for this patch.
>
Ok, will do.

Best regards,
--
Benoît Monin, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

* [PATCH RFC 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining
From: Benoît Monin @ 2026-04-30 9:49 UTC (permalink / raw)
To: Frank Li, Vinod Koul
Cc: Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel,
	Benoît Monin

Implement dynamic linking of scatter/gather transfers to enable
chaining multiple DMA descriptors without stopping the channel.
This avoids waiting for the channel to go idle if another transaction
has already been issued.

Add fsl_edma_link_sg() to dynamically link the last TCD of a previously
submitted descriptor to the first TCD of a new descriptor by setting
the scatter/gather address and the E_SG flag, and keeping the channel
active by clearing the DREQ bit.

Linking is only done if the last TCD was set to disable the DMA channel,
to prevent corrupting a cyclic transaction.

Update fsl_edma_xfer_desc() to avoid re-initializing the hardware when a
transfer is already in progress, allowing seamless chaining of descriptors.

Modify the transfer completion handler to check the DONE flag in the
channel CSR before marking the transfer complete. Since this flag is
only available on SoCs with the split register layout, we only link
transactions for DMA controllers flagged with FSL_EDMA_DRV_SPLIT_REG.

Add a trace event for scatter/gather linking operations.

Signed-off-by: Benoît Monin <benoit.monin@bootlin.com>
---
 drivers/dma/fsl-edma-common.c | 64 ++++++++++++++++++++++++++++++++++++++++---
 drivers/dma/fsl-edma-trace.h  |  5 ++++
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index 26a5ecf493b9..7094c747defa 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -58,7 +58,10 @@ void fsl_edma_tx_chan_handler(struct fsl_edma_chan *fsl_chan)
 		list_del(&fsl_chan->edesc->vdesc.node);
 		vchan_cookie_complete(&fsl_chan->edesc->vdesc);
 		fsl_chan->edesc = NULL;
-		fsl_chan->status = DMA_COMPLETE;
+		if (!(fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_SPLIT_REG) ||
+		    (edma_readl_chreg(fsl_chan, ch_csr) & EDMA_V3_CH_CSR_DONE)) {
+			fsl_chan->status = DMA_COMPLETE;
+		}
 	} else {
 		vchan_cyclic_callback(&fsl_chan->edesc->vdesc);
 	}
@@ -673,6 +676,51 @@ struct dma_async_tx_descriptor *fsl_edma_prep_dma_cyclic(
 	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
 }
 
+static void fsl_edma_link_sg(struct fsl_edma_chan *fsl_chan, struct fsl_edma_desc *fsl_desc)
+{
+	u32 flags = fsl_edma_drvflags(fsl_chan);
+	struct virt_dma_desc *vdesc;
+	struct fsl_edma_desc *prev_desc;
+	struct fsl_edma_hw_tcd *last_tcd;
+	u16 csr;
+
+	if (!(flags & FSL_EDMA_DRV_SPLIT_REG))
+		return;
+
+	guard(spinlock_irqsave)(&fsl_chan->vchan.lock);
+
+	vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_issued,
+					struct virt_dma_desc, node);
+	if (!vdesc)
+		vdesc = list_last_entry_or_null(&fsl_chan->vchan.desc_submitted,
+						struct virt_dma_desc, node);
+	if (!vdesc)
+		return;
+
+	prev_desc = to_fsl_edma_desc(vdesc);
+	last_tcd = prev_desc->tcd[prev_desc->n_tcds - 1].vtcd;
+
+	csr = fsl_edma_get_tcd_to_cpu(fsl_chan, last_tcd, csr);
+	if (!(csr & EDMA_TCD_CSR_D_REQ))
+		return;
+
+	fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, fsl_desc->tcd[0].ptcd, dlast_sga);
+
+	csr &= ~EDMA_TCD_CSR_D_REQ;
+	csr |= EDMA_TCD_CSR_E_SG;
+	fsl_edma_set_tcd_to_le(fsl_chan, last_tcd, csr, csr);
+
+	if (prev_desc == fsl_chan->edesc && prev_desc->n_tcds == 1) {
+		if (flags & FSL_EDMA_DRV_CLEAR_DONE_E_SG)
+			edma_writel_chreg(fsl_chan, edma_readl_chreg(fsl_chan, ch_csr), ch_csr);
+
+		edma_cp_tcd_to_reg(fsl_chan, last_tcd, dlast_sga);
+		edma_cp_tcd_to_reg(fsl_chan, last_tcd, csr);
+	}
+
+	trace_edma_link_sg(fsl_chan, last_tcd);
+}
+
 struct dma_async_tx_descriptor *fsl_edma_prep_peripheral_dma_vec(
 	struct dma_chan *chan, const struct dma_vec *vecs,
 	size_t nb, enum dma_transfer_direction direction,
@@ -780,6 +828,9 @@ struct dma_async_tx_descriptor *fsl_edma_prep_peripheral_dma_vec(
 		}
 	}
 
+	if (!fsl_desc->iscyclic)
+		fsl_edma_link_sg(fsl_chan, fsl_desc);
+
 	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
 }
 
@@ -883,6 +934,8 @@ struct dma_async_tx_descriptor *fsl_edma_prep_slave_sg(
 		}
 	}
 
+	fsl_edma_link_sg(fsl_chan, fsl_desc);
+
 	return vchan_tx_prep(&fsl_chan->vchan, &fsl_desc->vdesc, flags);
 }
 
@@ -925,9 +978,12 @@ void fsl_edma_xfer_desc(struct fsl_edma_chan *fsl_chan)
 	if (!vdesc)
 		return;
 	fsl_chan->edesc = to_fsl_edma_desc(vdesc);
-	fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd);
-	fsl_edma_enable_request(fsl_chan);
-	fsl_chan->status = DMA_IN_PROGRESS;
+
+	if (fsl_chan->status != DMA_IN_PROGRESS) {
+		fsl_edma_set_tcd_regs(fsl_chan, fsl_chan->edesc->tcd[0].vtcd);
+		fsl_edma_enable_request(fsl_chan);
+		fsl_chan->status = DMA_IN_PROGRESS;
+	}
 }
 
 void fsl_edma_issue_pending(struct dma_chan *chan)
diff --git a/drivers/dma/fsl-edma-trace.h b/drivers/dma/fsl-edma-trace.h
index d3541301a247..ac319d2dbb90 100644
--- a/drivers/dma/fsl-edma-trace.h
+++ b/drivers/dma/fsl-edma-trace.h
@@ -119,6 +119,11 @@ DEFINE_EVENT(edma_log_tcd, edma_fill_tcd,
 	TP_ARGS(chan, tcd)
 );
 
+DEFINE_EVENT(edma_log_tcd, edma_link_sg,
+	TP_PROTO(struct fsl_edma_chan *chan, void *tcd),
+	TP_ARGS(chan, tcd)
+);
+
 #endif
 
 /* this part must be outside header guard */

--
2.54.0

* Re: [PATCH RFC 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining
From: Frank Li @ 2026-05-04 16:04 UTC (permalink / raw)
To: Benoît Monin
Cc: Vinod Koul, Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel

On Thu, Apr 30, 2026 at 11:49:33AM +0200, Benoît Monin wrote:
> Implement dynamic linking of scatter/gather transfers to enable
> chaining multiple DMA descriptors without stopping the channel.
> This avoids waiting for the channel to go idle if another transaction
> has already been issued.
>
> Add fsl_edma_link_sg() to dynamically link the last TCD of a previously
> submitted descriptor to the first TCD of a new descriptor by setting
> the scatter/gather address and the E_SG flag, and keeping the channel
> active by clearing the DREQ bit.

Thanks for trying this, which I have wanted to do for a long time.

The key problem is how to guarantee safety when linking to the last TCD
while the DMA is working on it. If the next pointer of the last TCD is
updated before the DMA loads it, all is fine; but if it is updated after
the DMA has already loaded it, the DMA engine may stop.

How did you test it, and how much does performance improve?

Frank

* Re: [PATCH RFC 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining
From: Benoît Monin @ 2026-05-05 13:51 UTC (permalink / raw)
To: Frank Li
Cc: Vinod Koul, Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel

On Monday, 4 May 2026 at 18:04:49 CEST, Frank Li wrote:
> Thanks for trying this, which I have wanted to do for a long time.
>
> The key problem is how to guarantee safety when linking to the last TCD
> while the DMA is working on it. If the next pointer of the last TCD is
> updated before the DMA loads it, all is fine; but if it is updated after
> the DMA has already loaded it, the DMA engine may stop.
>
I followed what is described in the dynamic scatter/gather chapter of the
i.MX93 reference manual. We update two registers of the last TCD when
chaining SG transfers, first TCD_DLAST_SGA and then TCD_CSR. This gives us
three possible interleavings of CPU writes and eDMA reads:

* First case: both registers are updated before the eDMA reads the TCD, and
  we get the expected chaining.
* Second case: both registers are updated too late and the eDMA has already
  read the TCD, so we are back to the current behaviour. After processing
  the last TCD, the eDMA disables the channel and the call to
  fsl_edma_xfer_desc() from fsl_edma_tx_chan_handler() moves to the next
  issued descriptor and re-enables the channel.
* Final case: only TCD_DLAST_SGA gets picked up by the eDMA. The eDMA also
  disables the channel after processing the last TCD; the only difference is
  that it updates TCD_DADDR by adding the value of TCD_DLAST_SGA. Since we
  are not reusing TCD_DADDR, this has no impact.

> How did you test it, and how much does performance improve?
>
I did my tests by doing SPI transfers with the LPSPI controllers, issuing
DMA transactions with different numbers of buffers and different buffer
sizes. Without chaining, gaps appear on the SPI bus between each DMA
transaction. With chaining, the activity on the SPI bus is continuous as
long as new DMA transactions are issued before the end of the current one.

Best regards,
--
Benoît Monin, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

* Re: [PATCH RFC 2/2] dmaengine: fsl-edma: Support dynamic scatter/gather chaining
From: Frank Li @ 2026-05-05 15:07 UTC (permalink / raw)
To: Benoît Monin
Cc: Vinod Koul, Thomas Petazzoni, Frank Li, imx, dmaengine, linux-kernel

On Tue, May 05, 2026 at 03:51:51PM +0200, Benoît Monin wrote:
> I followed what is described in the dynamic scatter/gather chapter of the
> i.MX93 reference manual. We update two registers of the last TCD when
> chaining SG transfers, first TCD_DLAST_SGA and then TCD_CSR. This gives us
> three possible interleavings of CPU writes and eDMA reads:

Okay, this can possibly work; let me think about it in more detail after
you remove the RFC tag.

> I did my tests by doing SPI transfers with the LPSPI controllers, issuing
> DMA transactions with different numbers of buffers and different buffer
> sizes. Without chaining, gaps appear on the SPI bus between each DMA
> transaction. With chaining, the activity on the SPI bus is continuous as
> long as new DMA transactions are issued before the end of the current one.

Does SPI support issuing new transfers without waiting for the previous
transfer to complete, or do SPI transfers already support an asynchronous
queue?

Frank