* [PATCH 0/3] dmaengine: Add batched scatter-gather DMA support
@ 2026-03-13 6:49 Sumit Kumar
2026-03-13 6:49 ` [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer Sumit Kumar
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Sumit Kumar @ 2026-03-13 6:49 UTC (permalink / raw)
To: Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Vinod Koul, Marek Szyprowski, Robin Murphy,
Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas
Cc: dmaengine, linux-kernel, iommu, linux-pci, mhi, linux-arm-msm,
Sumit Kumar
The Synopsys DesignWare eDMA IP supports a linked-list (LL) mode where
each LL item carries independent source and destination addresses. This
allows multiple independent memory transfers to be described in a single
linked list and submitted to the hardware as one DMA transaction, without
any CPU intervention between items. The IP processes LL items strictly
in order, guaranteeing that scatter-gather entries are never reordered.
This series leverages that hardware capability to introduce a new
dmaengine API — dmaengine_prep_batch_sg_dma() — for batching multiple
independent buffers into a single DMA transaction. Each scatter-gather
entry specifies both its own source (dma_address) and destination
(dma_dst_address), enabling the eDMA hardware to process them as a
single linked-list transaction.
The primary use case is MHI endpoint ring caching. When an MHI ring
wraps around, data spans two non-contiguous memory regions (tail and
head portions). Previously this required two separate DMA transactions
with two interrupts. With this series, both regions are submitted as a
single batched transaction, reducing submission overhead and interrupt
count.
The series includes:
1. Core DMA engine API and DW eDMA driver implementation
2. PCI EPF MHI driver support for batched transfers
3. MHI endpoint ring caching optimization using batched reads
Performance Benefits:
--------------------
- Reduced DMA submission overhead for multiple transfers
- Better hardware utilization through batched operations
- Lower latency for ring wraparound scenarios
Signed-off-by: Sumit Kumar <sumit.kumar@oss.qualcomm.com>
---
Sumit Kumar (3):
dmaengine: Add multi-buffer support in single DMA transfer
PCI: epf-mhi: Add batched DMA read support
bus: mhi: ep: Use batched read for ring caching
drivers/bus/mhi/ep/ring.c | 43 +++++-----
drivers/dma/dw-edma/Kconfig | 1 +
drivers/dma/dw-edma/dw-edma-core.c | 40 ++++++++-
drivers/dma/dw-edma/dw-edma-core.h | 3 +-
drivers/pci/endpoint/functions/Kconfig | 1 +
drivers/pci/endpoint/functions/pci-epf-mhi.c | 120 +++++++++++++++++++++++++++
include/linux/dmaengine.h | 29 ++++++-
include/linux/mhi_ep.h | 3 +
include/linux/scatterlist.h | 7 ++
kernel/dma/Kconfig | 3 +
10 files changed, 224 insertions(+), 26 deletions(-)
---
base-commit: f0b9d8eb98dfee8d00419aa07543bdc2c1a44fb1
change-id: 20260108-dma_multi_sg-c217650373c2
Best regards,
--
Sumit Kumar <sumit.kumar@oss.qualcomm.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer
2026-03-13 6:49 [PATCH 0/3] dmaengine: Add batched scatter-gather DMA support Sumit Kumar
@ 2026-03-13 6:49 ` Sumit Kumar
2026-03-13 15:16 ` Robin Murphy
2026-03-17 10:54 ` Vinod Koul
2026-03-13 6:49 ` [PATCH 2/3] PCI: epf-mhi: Add batched DMA read support Sumit Kumar
2026-03-13 6:49 ` [PATCH 3/3] bus: mhi: ep: Use batched read for ring caching Sumit Kumar
2 siblings, 2 replies; 7+ messages in thread
From: Sumit Kumar @ 2026-03-13 6:49 UTC (permalink / raw)
To: Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Vinod Koul, Marek Szyprowski, Robin Murphy,
Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas
Cc: dmaengine, linux-kernel, iommu, linux-pci, mhi, linux-arm-msm,
Sumit Kumar
Add dmaengine_prep_batch_sg API for batching multiple independent buffers
in a single DMA transaction. Each scatter-gather entry specifies both
source and destination addresses. This allows multiple non-contiguous
memory regions to be transferred in a single DMA transaction instead of
separate operations, significantly reducing submission overhead and
interrupt overhead.
Extends struct scatterlist with optional dma_dst_address field
and implements support in dw-edma driver.
Signed-off-by: Sumit Kumar <sumit.kumar@oss.qualcomm.com>
---
drivers/dma/dw-edma/Kconfig | 1 +
drivers/dma/dw-edma/dw-edma-core.c | 40 ++++++++++++++++++++++++++++++++++----
drivers/dma/dw-edma/dw-edma-core.h | 3 ++-
include/linux/dmaengine.h | 29 ++++++++++++++++++++++++++-
include/linux/scatterlist.h | 7 +++++++
kernel/dma/Kconfig | 3 +++
6 files changed, 77 insertions(+), 6 deletions(-)
diff --git a/drivers/dma/dw-edma/Kconfig b/drivers/dma/dw-edma/Kconfig
index 2b6f2679508d93b94b7efecd4e36d5902f7b4c99..0472a6554ff38d4cf172a90b6bf0bdaa9e7f4b95 100644
--- a/drivers/dma/dw-edma/Kconfig
+++ b/drivers/dma/dw-edma/Kconfig
@@ -5,6 +5,7 @@ config DW_EDMA
depends on PCI && PCI_MSI
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
+ select NEED_SG_DMA_DST_ADDR
help
Support the Synopsys DesignWare eDMA controller, normally
implemented on endpoints SoCs.
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 8e5f7defa6b678eefe0f312ebc59f654677c744f..04314cfd82edbed6ed3665eb4c8e6b428339c207 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -411,6 +411,9 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
return NULL;
if (!xfer->xfer.il->src_inc || !xfer->xfer.il->dst_inc)
return NULL;
+ } else if (xfer->type == EDMA_XFER_DUAL_ADDR_SG) {
+ if (xfer->xfer.sg.len < 1)
+ return NULL;
} else {
return NULL;
}
@@ -438,7 +441,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
if (xfer->type == EDMA_XFER_CYCLIC) {
cnt = xfer->xfer.cyclic.cnt;
- } else if (xfer->type == EDMA_XFER_SCATTER_GATHER) {
+ } else if (xfer->type == EDMA_XFER_SCATTER_GATHER || xfer->type == EDMA_XFER_DUAL_ADDR_SG) {
cnt = xfer->xfer.sg.len;
sg = xfer->xfer.sg.sgl;
} else if (xfer->type == EDMA_XFER_INTERLEAVED) {
@@ -447,7 +450,8 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
}
for (i = 0; i < cnt; i++) {
- if (xfer->type == EDMA_XFER_SCATTER_GATHER && !sg)
+ if ((xfer->type == EDMA_XFER_SCATTER_GATHER ||
+ xfer->type == EDMA_XFER_DUAL_ADDR_SG) && !sg)
break;
if (chunk->bursts_alloc == chan->ll_max) {
@@ -462,7 +466,8 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
if (xfer->type == EDMA_XFER_CYCLIC)
burst->sz = xfer->xfer.cyclic.len;
- else if (xfer->type == EDMA_XFER_SCATTER_GATHER)
+ else if (xfer->type == EDMA_XFER_SCATTER_GATHER ||
+ xfer->type == EDMA_XFER_DUAL_ADDR_SG)
burst->sz = sg_dma_len(sg);
else if (xfer->type == EDMA_XFER_INTERLEAVED)
burst->sz = xfer->xfer.il->sgl[i % fsz].size;
@@ -486,6 +491,9 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
*/
} else if (xfer->type == EDMA_XFER_INTERLEAVED) {
burst->dar = dst_addr;
+ } else if (xfer->type == EDMA_XFER_DUAL_ADDR_SG) {
+ burst->sar = dw_edma_get_pci_address(chan, sg_dma_address(sg));
+ burst->dar = sg_dma_dst_address(sg);
}
} else {
burst->dar = dst_addr;
@@ -503,10 +511,14 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
*/
} else if (xfer->type == EDMA_XFER_INTERLEAVED) {
burst->sar = src_addr;
+ } else if (xfer->type == EDMA_XFER_DUAL_ADDR_SG) {
+ burst->sar = sg_dma_address(sg);
+ burst->dar = dw_edma_get_pci_address(chan, sg_dma_dst_address(sg));
}
}
- if (xfer->type == EDMA_XFER_SCATTER_GATHER) {
+ if (xfer->type == EDMA_XFER_SCATTER_GATHER ||
+ xfer->type == EDMA_XFER_DUAL_ADDR_SG) {
sg = sg_next(sg);
} else if (xfer->type == EDMA_XFER_INTERLEAVED) {
struct dma_interleaved_template *il = xfer->xfer.il;
@@ -603,6 +615,25 @@ static void dw_hdma_set_callback_result(struct virt_dma_desc *vd,
res->residue = residue;
}
+static struct dma_async_tx_descriptor *
+dw_edma_device_prep_batch_sg_dma(struct dma_chan *dchan,
+ struct scatterlist *sg,
+ unsigned int nents,
+ enum dma_transfer_direction direction,
+ unsigned long flags)
+{
+ struct dw_edma_transfer xfer;
+
+ xfer.dchan = dchan;
+ xfer.direction = direction;
+ xfer.xfer.sg.sgl = sg;
+ xfer.xfer.sg.len = nents;
+ xfer.flags = flags;
+ xfer.type = EDMA_XFER_DUAL_ADDR_SG;
+
+ return dw_edma_device_transfer(&xfer);
+}
+
static void dw_edma_done_interrupt(struct dw_edma_chan *chan)
{
struct dw_edma_desc *desc;
@@ -818,6 +849,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
dma->device_prep_slave_sg = dw_edma_device_prep_slave_sg;
dma->device_prep_dma_cyclic = dw_edma_device_prep_dma_cyclic;
dma->device_prep_interleaved_dma = dw_edma_device_prep_interleaved_dma;
+ dma->device_prep_batch_sg_dma = dw_edma_device_prep_batch_sg_dma;
dma_set_max_seg_size(dma->dev, U32_MAX);
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 71894b9e0b1539c636171738963e80a0a5ef43a4..1a266dc58315edb3d5fd9eddb19fc350f1ed9a1b 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -36,7 +36,8 @@ enum dw_edma_status {
enum dw_edma_xfer_type {
EDMA_XFER_SCATTER_GATHER = 0,
EDMA_XFER_CYCLIC,
- EDMA_XFER_INTERLEAVED
+ EDMA_XFER_INTERLEAVED,
+ EDMA_XFER_DUAL_ADDR_SG,
};
struct dw_edma_chan;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 99efe2b9b4ea9844ca6161208362ef18ef111d96..fdba75b5c40f805904a6697fce3062303fea762a 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -939,7 +939,11 @@ struct dma_device {
size_t period_len, enum dma_transfer_direction direction,
unsigned long flags);
struct dma_async_tx_descriptor *(*device_prep_interleaved_dma)(
- struct dma_chan *chan, struct dma_interleaved_template *xt,
+ struct dma_chan *chan, struct dma_interleaved_template *xt,
+ unsigned long flags);
+ struct dma_async_tx_descriptor *(*device_prep_batch_sg_dma)
+ (struct dma_chan *chan, struct scatterlist *sg, unsigned int nents,
+ enum dma_transfer_direction direction,
unsigned long flags);
void (*device_caps)(struct dma_chan *chan, struct dma_slave_caps *caps);
@@ -1060,6 +1064,29 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_interleaved_dma(
return chan->device->device_prep_interleaved_dma(chan, xt, flags);
}
+/**
+ * dmaengine_prep_batch_sg_dma() - Prepare single DMA transfer for multiple independent buffers.
+ * @chan: DMA channel
+ * @sg: Scatter-gather list with both source (dma_address) and destination (dma_dst_address)
+ * @nents: Number of entries in the list
+ * @direction: Transfer direction (DMA_MEM_TO_MEM, DMA_DEV_TO_MEM, DMA_MEM_TO_DEV)
+ * @flags: DMA engine flags
+ *
+ * Each SG entry contains both source (sg_dma_address) and destination (sg_dma_dst_address).
+ * This allows multiple independent transfers in a single DMA transaction.
+ * Requires CONFIG_NEED_SG_DMA_DST_ADDR to be enabled.
+ */
+static inline struct dma_async_tx_descriptor *dmaengine_prep_batch_sg_dma
+ (struct dma_chan *chan, struct scatterlist *sg, unsigned int nents,
+ enum dma_transfer_direction direction, unsigned long flags)
+{
+ if (!chan || !chan->device || !chan->device->device_prep_batch_sg_dma ||
+ !sg || !nents)
+ return NULL;
+
+ return chan->device->device_prep_batch_sg_dma(chan, sg, nents, direction, flags);
+}
+
/**
* dmaengine_prep_dma_memset() - Prepare a DMA memset descriptor.
* @chan: The channel to be used for this descriptor
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 29f6ceb98d74b118d08b6a3d4eb7f62dcde0495d..20b65ffcd5e2a65ec5026a29344caf6baa09700b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -19,6 +19,9 @@ struct scatterlist {
#ifdef CONFIG_NEED_SG_DMA_FLAGS
unsigned int dma_flags;
#endif
+#ifdef CONFIG_NEED_SG_DMA_DST_ADDR
+ dma_addr_t dma_dst_address;
+#endif
};
/*
@@ -36,6 +39,10 @@ struct scatterlist {
#define sg_dma_len(sg) ((sg)->length)
#endif
+#ifdef CONFIG_NEED_SG_DMA_DST_ADDR
+#define sg_dma_dst_address(sg) ((sg)->dma_dst_address)
+#endif
+
struct sg_table {
struct scatterlist *sgl; /* the list */
unsigned int nents; /* number of mapped entries */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 31cfdb6b4bc3e33c239111955d97b3ec160baafa..3539b5b1efe27be7ccbfebb358dbb9cad2868f11 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -32,6 +32,9 @@ config NEED_SG_DMA_LENGTH
config NEED_DMA_MAP_STATE
bool
+config NEED_SG_DMA_DST_ADDR
+ bool
+
config ARCH_DMA_ADDR_T_64BIT
def_bool 64BIT || PHYS_ADDR_T_64BIT
--
2.34.1
* [PATCH 2/3] PCI: epf-mhi: Add batched DMA read support
2026-03-13 6:49 [PATCH 0/3] dmaengine: Add batched scatter-gather DMA support Sumit Kumar
2026-03-13 6:49 ` [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer Sumit Kumar
@ 2026-03-13 6:49 ` Sumit Kumar
2026-03-13 6:49 ` [PATCH 3/3] bus: mhi: ep: Use batched read for ring caching Sumit Kumar
2 siblings, 0 replies; 7+ messages in thread
From: Sumit Kumar @ 2026-03-13 6:49 UTC (permalink / raw)
To: Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Vinod Koul, Marek Szyprowski, Robin Murphy,
Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas
Cc: dmaengine, linux-kernel, iommu, linux-pci, mhi, linux-arm-msm,
Sumit Kumar
Add support for batched DMA transfers in the PCI EPF MHI driver to
improve performance when reading multiple buffers from the host.
Implement two variants of the read_batch() callback:
- pci_epf_mhi_edma_read_batch(): DMA-optimized implementation using
dmaengine_prep_batch_sg_dma() to transfer multiple buffers in a single
DMA transaction.
- pci_epf_mhi_iatu_read_batch(): CPU-copy fallback that sequentially
processes buffers using IATU.
This enables the MHI endpoint stack to efficiently cache ring data,
particularly for wraparound scenarios where ring data spans two
non-contiguous memory regions.
Signed-off-by: Sumit Kumar <sumit.kumar@oss.qualcomm.com>
---
drivers/pci/endpoint/functions/Kconfig | 1 +
drivers/pci/endpoint/functions/pci-epf-mhi.c | 120 +++++++++++++++++++++++++++
include/linux/mhi_ep.h | 3 +
3 files changed, 124 insertions(+)
diff --git a/drivers/pci/endpoint/functions/Kconfig b/drivers/pci/endpoint/functions/Kconfig
index 0c9cea0698d7bd3d8bd11aa1db0195978d9406b9..43131b6db8a2ca57b7a4f0eba8affba3a77f9ad7 100644
--- a/drivers/pci/endpoint/functions/Kconfig
+++ b/drivers/pci/endpoint/functions/Kconfig
@@ -41,6 +41,7 @@ config PCI_EPF_VNTB
config PCI_EPF_MHI
tristate "PCI Endpoint driver for MHI bus"
depends on PCI_ENDPOINT && MHI_BUS_EP
+ select NEED_SG_DMA_DST_ADDR
help
Enable this configuration option to enable the PCI Endpoint
driver for Modem Host Interface (MHI) bus in Qualcomm Endpoint
diff --git a/drivers/pci/endpoint/functions/pci-epf-mhi.c b/drivers/pci/endpoint/functions/pci-epf-mhi.c
index 6643a88c7a0ce38161bc6253c09d29f1c36ba394..198201d734cc2c6d09be229464a8efdafc3cd611 100644
--- a/drivers/pci/endpoint/functions/pci-epf-mhi.c
+++ b/drivers/pci/endpoint/functions/pci-epf-mhi.c
@@ -448,6 +448,124 @@ static int pci_epf_mhi_edma_write(struct mhi_ep_cntrl *mhi_cntrl,
return ret;
}
+static int pci_epf_mhi_iatu_read_batch(struct mhi_ep_cntrl *mhi_cntrl,
+ struct mhi_ep_buf_info *buf_info_array,
+ u32 num_buffers)
+{
+ int ret;
+ u32 i;
+
+ for (i = 0; i < num_buffers; i++) {
+ ret = pci_epf_mhi_iatu_read(mhi_cntrl, &buf_info_array[i]);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int pci_epf_mhi_edma_read_batch(struct mhi_ep_cntrl *mhi_cntrl,
+ struct mhi_ep_buf_info *buf_info_array,
+ u32 num_buffers)
+{
+ struct pci_epf_mhi *epf_mhi = to_epf_mhi(mhi_cntrl);
+ struct device *dma_dev = epf_mhi->epf->epc->dev.parent;
+ struct dma_chan *chan = epf_mhi->dma_chan_rx;
+ struct device *dev = &epf_mhi->epf->dev;
+ struct dma_async_tx_descriptor *desc;
+ struct dma_slave_config config = {};
+ DECLARE_COMPLETION_ONSTACK(complete);
+ struct scatterlist *sg;
+ dma_addr_t *dst_addrs;
+ dma_cookie_t cookie;
+ int ret;
+ u32 i;
+
+ if (num_buffers == 0)
+ return -EINVAL;
+
+ mutex_lock(&epf_mhi->lock);
+
+ sg = kcalloc(num_buffers, sizeof(*sg), GFP_KERNEL);
+ if (!sg) {
+ ret = -ENOMEM;
+ goto err_unlock;
+ }
+
+ dst_addrs = kcalloc(num_buffers, sizeof(*dst_addrs), GFP_KERNEL);
+ if (!dst_addrs) {
+ ret = -ENOMEM;
+ goto err_free_sg;
+ }
+
+ sg_init_table(sg, num_buffers);
+
+ for (i = 0; i < num_buffers; i++) {
+ dst_addrs[i] = dma_map_single(dma_dev, buf_info_array[i].dev_addr,
+ buf_info_array[i].size, DMA_FROM_DEVICE);
+ ret = dma_mapping_error(dma_dev, dst_addrs[i]);
+ if (ret) {
+ dev_err(dev, "Failed to map buffer %u\n", i);
+ goto err_unmap;
+ }
+
+ sg_dma_address(&sg[i]) = buf_info_array[i].host_addr;
+ sg_dma_dst_address(&sg[i]) = dst_addrs[i];
+ sg_dma_len(&sg[i]) = buf_info_array[i].size;
+ }
+
+ config.direction = DMA_DEV_TO_MEM;
+ ret = dmaengine_slave_config(chan, &config);
+ if (ret) {
+ dev_err(dev, "Failed to configure DMA channel\n");
+ goto err_unmap;
+ }
+
+ desc = dmaengine_prep_batch_sg_dma(chan, sg, num_buffers,
+ DMA_DEV_TO_MEM,
+ DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
+ if (!desc) {
+ dev_err(dev, "Failed to prepare batch sg DMA\n");
+ ret = -EIO;
+ goto err_unmap;
+ }
+
+ desc->callback = pci_epf_mhi_dma_callback;
+ desc->callback_param = &complete;
+
+ cookie = dmaengine_submit(desc);
+ ret = dma_submit_error(cookie);
+ if (ret) {
+ dev_err(dev, "Failed to submit DMA\n");
+ goto err_unmap;
+ }
+
+ dma_async_issue_pending(chan);
+
+ ret = wait_for_completion_timeout(&complete, msecs_to_jiffies(1000));
+ if (!ret) {
+ dev_err(dev, "DMA transfer timeout\n");
+ dmaengine_terminate_sync(chan);
+ ret = -ETIMEDOUT;
+ goto err_unmap;
+ }
+
+ ret = 0;
+
+err_unmap:
+ for (i = 0; i < num_buffers; i++) {
+ if (dst_addrs[i])
+ dma_unmap_single(dma_dev, dst_addrs[i],
+ buf_info_array[i].size, DMA_FROM_DEVICE);
+ }
+ kfree(dst_addrs);
+err_free_sg:
+ kfree(sg);
+err_unlock:
+ mutex_unlock(&epf_mhi->lock);
+ return ret;
+}
+
static void pci_epf_mhi_dma_worker(struct work_struct *work)
{
struct pci_epf_mhi *epf_mhi = container_of(work, struct pci_epf_mhi, dma_work);
@@ -803,11 +921,13 @@ static int pci_epf_mhi_link_up(struct pci_epf *epf)
mhi_cntrl->unmap_free = pci_epf_mhi_unmap_free;
mhi_cntrl->read_sync = mhi_cntrl->read_async = pci_epf_mhi_iatu_read;
mhi_cntrl->write_sync = mhi_cntrl->write_async = pci_epf_mhi_iatu_write;
+ mhi_cntrl->read_batch = pci_epf_mhi_iatu_read_batch;
if (info->flags & MHI_EPF_USE_DMA) {
mhi_cntrl->read_sync = pci_epf_mhi_edma_read;
mhi_cntrl->write_sync = pci_epf_mhi_edma_write;
mhi_cntrl->read_async = pci_epf_mhi_edma_read_async;
mhi_cntrl->write_async = pci_epf_mhi_edma_write_async;
+ mhi_cntrl->read_batch = pci_epf_mhi_edma_read_batch;
}
/* Register the MHI EP controller */
diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h
index 7b40fc8cbe77ab8419d167e89264b69a817b9fb1..15554f966e4be1aea1f3129c5f26253f5087edba 100644
--- a/include/linux/mhi_ep.h
+++ b/include/linux/mhi_ep.h
@@ -107,6 +107,7 @@ struct mhi_ep_buf_info {
* @write_sync: CB function for writing to host memory synchronously
* @read_async: CB function for reading from host memory asynchronously
* @write_async: CB function for writing to host memory asynchronously
+ * @read_batch: CB function for reading from host memory in batches synchronously
* @mhi_state: MHI Endpoint state
* @max_chan: Maximum channels supported by the endpoint controller
* @mru: MRU (Maximum Receive Unit) value of the endpoint controller
@@ -164,6 +165,8 @@ struct mhi_ep_cntrl {
int (*write_sync)(struct mhi_ep_cntrl *mhi_cntrl, struct mhi_ep_buf_info *buf_info);
int (*read_async)(struct mhi_ep_cntrl *mhi_cntrl, struct mhi_ep_buf_info *buf_info);
int (*write_async)(struct mhi_ep_cntrl *mhi_cntrl, struct mhi_ep_buf_info *buf_info);
+ int (*read_batch)(struct mhi_ep_cntrl *mhi_cntrl, struct mhi_ep_buf_info *buf_info_array,
+ u32 num_buffers);
enum mhi_state mhi_state;
--
2.34.1
* [PATCH 3/3] bus: mhi: ep: Use batched read for ring caching
2026-03-13 6:49 [PATCH 0/3] dmaengine: Add batched scatter-gather DMA support Sumit Kumar
2026-03-13 6:49 ` [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer Sumit Kumar
2026-03-13 6:49 ` [PATCH 2/3] PCI: epf-mhi: Add batched DMA read support Sumit Kumar
@ 2026-03-13 6:49 ` Sumit Kumar
2 siblings, 0 replies; 7+ messages in thread
From: Sumit Kumar @ 2026-03-13 6:49 UTC (permalink / raw)
To: Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Vinod Koul, Marek Szyprowski, Robin Murphy,
Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas
Cc: dmaengine, linux-kernel, iommu, linux-pci, mhi, linux-arm-msm,
Sumit Kumar
Simplify ring caching logic by using the new read_batch() callback
for all ring read operations, replacing the previous approach that
used separate read_sync() calls.
Signed-off-by: Sumit Kumar <sumit.kumar@oss.qualcomm.com>
---
drivers/bus/mhi/ep/ring.c | 43 +++++++++++++++++++++++--------------------
1 file changed, 23 insertions(+), 20 deletions(-)
diff --git a/drivers/bus/mhi/ep/ring.c b/drivers/bus/mhi/ep/ring.c
index 26357ee68dee984d70ae5bf39f8f09f2cbcafe30..03c60c579e12c3bad100c7e1b6a75ae0e5646281 100644
--- a/drivers/bus/mhi/ep/ring.c
+++ b/drivers/bus/mhi/ep/ring.c
@@ -30,7 +30,7 @@ static int __mhi_ep_cache_ring(struct mhi_ep_ring *ring, size_t end)
{
struct mhi_ep_cntrl *mhi_cntrl = ring->mhi_cntrl;
struct device *dev = &mhi_cntrl->mhi_dev->dev;
- struct mhi_ep_buf_info buf_info = {};
+ struct mhi_ep_buf_info buf_info[2] = {};
size_t start;
int ret;
@@ -44,35 +44,38 @@ static int __mhi_ep_cache_ring(struct mhi_ep_ring *ring, size_t end)
start = ring->wr_offset;
if (start < end) {
- buf_info.size = (end - start) * sizeof(struct mhi_ring_element);
- buf_info.host_addr = ring->rbase + (start * sizeof(struct mhi_ring_element));
- buf_info.dev_addr = &ring->ring_cache[start];
+ /* No wraparound */
+ buf_info[0].size = (end - start) * sizeof(struct mhi_ring_element);
+ buf_info[0].host_addr = ring->rbase + (start * sizeof(struct mhi_ring_element));
+ buf_info[0].dev_addr = &ring->ring_cache[start];
- ret = mhi_cntrl->read_sync(mhi_cntrl, &buf_info);
+ ret = mhi_cntrl->read_batch(mhi_cntrl, buf_info, 1);
if (ret < 0)
return ret;
+
+ dev_dbg(dev, "Cached ring: start %zu end %zu size %zu\n", start, end,
+ buf_info[0].size);
} else {
- buf_info.size = (ring->ring_size - start) * sizeof(struct mhi_ring_element);
- buf_info.host_addr = ring->rbase + (start * sizeof(struct mhi_ring_element));
- buf_info.dev_addr = &ring->ring_cache[start];
+ /* Wraparound */
+
+ /* Buffer 0: Tail portion (start → ring_size) */
+ buf_info[0].size = (ring->ring_size - start) * sizeof(struct mhi_ring_element);
+ buf_info[0].host_addr = ring->rbase + (start * sizeof(struct mhi_ring_element));
+ buf_info[0].dev_addr = &ring->ring_cache[start];
- ret = mhi_cntrl->read_sync(mhi_cntrl, &buf_info);
+ /* Buffer 1: Head portion (0 → end) */
+ buf_info[1].size = end * sizeof(struct mhi_ring_element);
+ buf_info[1].host_addr = ring->rbase;
+ buf_info[1].dev_addr = &ring->ring_cache[0];
+
+ ret = mhi_cntrl->read_batch(mhi_cntrl, buf_info, 2);
if (ret < 0)
return ret;
- if (end) {
- buf_info.host_addr = ring->rbase;
- buf_info.dev_addr = &ring->ring_cache[0];
- buf_info.size = end * sizeof(struct mhi_ring_element);
-
- ret = mhi_cntrl->read_sync(mhi_cntrl, &buf_info);
- if (ret < 0)
- return ret;
- }
+ dev_dbg(dev, "Cached ring (batched): start %zu end %zu tail_size %zu head_size %zu\n",
+ start, end, buf_info[0].size, buf_info[1].size);
}
- dev_dbg(dev, "Cached ring: start %zu end %zu size %zu\n", start, end, buf_info.size);
-
return 0;
}
--
2.34.1
* Re: [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer
2026-03-13 6:49 ` [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer Sumit Kumar
@ 2026-03-13 15:16 ` Robin Murphy
2026-03-16 17:05 ` Niklas Cassel
2026-03-17 10:54 ` Vinod Koul
1 sibling, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2026-03-13 15:16 UTC (permalink / raw)
To: Sumit Kumar, Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Vinod Koul, Marek Szyprowski, Krzysztof Wilczyński,
Kishon Vijay Abraham I, Bjorn Helgaas
Cc: dmaengine, linux-kernel, iommu, linux-pci, mhi, linux-arm-msm
On 2026-03-13 6:49 am, Sumit Kumar wrote:
> Add dmaengine_prep_batch_sg API for batching multiple independent buffers
> in a single DMA transaction. Each scatter-gather entry specifies both
> source and destination addresses. This allows multiple non-contiguous
> memory regions to be transferred in a single DMA transaction instead of
> separate operations, significantly reducing submission overhead and
> interrupt overhead.
>
> Extends struct scatterlist with optional dma_dst_address field
> and implements support in dw-edma driver.
[...]
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 29f6ceb98d74b118d08b6a3d4eb7f62dcde0495d..20b65ffcd5e2a65ec5026a29344caf6baa09700b 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -19,6 +19,9 @@ struct scatterlist {
> #ifdef CONFIG_NEED_SG_DMA_FLAGS
> unsigned int dma_flags;
> #endif
> +#ifdef CONFIG_NEED_SG_DMA_DST_ADDR
> + dma_addr_t dma_dst_address;
> +#endif
Eww, no, what does this even mean? Is the regular dma_addr somehow
implicitly a "source" now? How could the single piece of memory
represented by page_link/offset/length have two different DMA addresses?
How are both the DMA mapping code and users supposed to know which one
is relevant in any particular situation?
If you want to bring back DMA_MEMCPY_SG yet again, and you have an
actual user this time, then do that (although by now it most likely
wants to be a dma_vec version). Don't do whatever this is...
If you want to batch multiple
dmaengine_slave_config()/dma_prep_slave_single() operations into some
many-to-many variant of dmaengine_prep_peripheral_dma_vec(), then surely
that requires actual batching of the config part as well - e.g. passing
an explicit vector of distinct dma_slave_configs corresponding to each
individual dma_vec - in order to be able to work correctly in general?
Thanks,
Robin.
> };
>
> /*
> @@ -36,6 +39,10 @@ struct scatterlist {
> #define sg_dma_len(sg) ((sg)->length)
> #endif
>
> +#ifdef CONFIG_NEED_SG_DMA_DST_ADDR
> +#define sg_dma_dst_address(sg) ((sg)->dma_dst_address)
> +#endif
> +
> struct sg_table {
> struct scatterlist *sgl; /* the list */
> unsigned int nents; /* number of mapped entries */
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index 31cfdb6b4bc3e33c239111955d97b3ec160baafa..3539b5b1efe27be7ccbfebb358dbb9cad2868f11 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -32,6 +32,9 @@ config NEED_SG_DMA_LENGTH
> config NEED_DMA_MAP_STATE
> bool
>
> +config NEED_SG_DMA_DST_ADDR
> + bool
> +
> config ARCH_DMA_ADDR_T_64BIT
> def_bool 64BIT || PHYS_ADDR_T_64BIT
>
>
* Re: [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer
2026-03-13 15:16 ` Robin Murphy
@ 2026-03-16 17:05 ` Niklas Cassel
0 siblings, 0 replies; 7+ messages in thread
From: Niklas Cassel @ 2026-03-16 17:05 UTC (permalink / raw)
To: Robin Murphy
Cc: Sumit Kumar, Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Vinod Koul, Marek Szyprowski, Krzysztof Wilczyński,
Kishon Vijay Abraham I, Bjorn Helgaas, dmaengine, linux-kernel,
iommu, linux-pci, mhi, linux-arm-msm, Frank Li
On Fri, Mar 13, 2026 at 03:16:50PM +0000, Robin Murphy wrote:
> On 2026-03-13 6:49 am, Sumit Kumar wrote:
>
> If you want to batch multiple
> dmaengine_slave_config()/dma_prep_slave_single() operations into some
> many-to-many variant of dmaengine_prep_peripheral_dma_vec(), then surely
> that requires actual batching of the config part as well - e.g. passing an
> explicit vector of distinct dma_slave_configs corresponding to each
> individual dma_vec - in order to be able to work correctly in general?
This makes me think of Frank's series, which tries to create an API that
combines dmaengine_slave_config() with dmaengine_prep_slave_single():
https://lore.kernel.org/dmaengine/20251218-dma_prep_config-v2-0-c07079836128@nxp.com/
Not exactly the same, but might still be of interest.
Kind regards,
Niklas
* Re: [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer
2026-03-13 6:49 ` [PATCH 1/3] dmaengine: Add multi-buffer support in single DMA transfer Sumit Kumar
2026-03-13 15:16 ` Robin Murphy
@ 2026-03-17 10:54 ` Vinod Koul
1 sibling, 0 replies; 7+ messages in thread
From: Vinod Koul @ 2026-03-17 10:54 UTC (permalink / raw)
To: Sumit Kumar
Cc: Krishna Chaitanya Chundru, Veerabhadrarao Badiganti,
Subramanian Ananthanarayanan, Akhil Vinod, Manivannan Sadhasivam,
Marek Szyprowski, Robin Murphy, Krzysztof Wilczyński,
Kishon Vijay Abraham I, Bjorn Helgaas, dmaengine, linux-kernel,
iommu, linux-pci, mhi, linux-arm-msm
On 13-03-26, 12:19, Sumit Kumar wrote:
> Add dmaengine_prep_batch_sg API for batching multiple independent buffers
> in a single DMA transaction. Each scatter-gather entry specifies both
> source and destination addresses. This allows multiple non-contiguous
Looks like you want to bring back dmaengine_prep_dma_sg(); see commit c678fa66341c
> memory regions to be transferred in a single DMA transaction instead of
> separate operations, significantly reducing submission overhead and
> interrupt overhead.
>
> Extends struct scatterlist with optional dma_dst_address field
> and implements support in dw-edma driver.
If this is memcpy why are you talking about dma_dst_address which is a
slave field?
--
~Vinod