linux-kernel.vger.kernel.org archive mirror
* [PATCH V2 0/4] Add ethtool support to configure irq coalescing count and delay
@ 2025-07-10 10:12 Suraj Gupta
  2025-07-10 10:12 ` [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters Suraj Gupta
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Suraj Gupta @ 2025-07-10 10:12 UTC (permalink / raw)
  To: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey
  Cc: netdev, linux-arm-kernel, linux-kernel, dmaengine, harini.katakam

The AXI Ethernet driver uses the AXI DMA dmaengine driver. Add support to
configure and report coalesce parameters dynamically at runtime via
ethtool. Add support in the dmaengine framework to communicate coalesce
parameters between the client and the DMA driver. Add support for Tx and
Rx adaptive IRQ coalescing with DIM in the AXI Ethernet driver.
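
For illustration, with this series the coalesce settings in the dmaengine
flow are driven by the standard ethtool coalesce options, roughly as below
(the interface name is only an example):

  # read back the current settings
  ethtool -c eth0

  # fixed settings: interrupt after 64 Rx frames or 16 usecs
  ethtool -C eth0 rx-frames 64 rx-usecs 16 tx-frames 32 tx-usecs 16

  # or let DIM adapt the frame counts at runtime
  ethtool -C eth0 adaptive-rx on adaptive-tx on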

Changes in V2:
- Add DIM in the AXI Ethernet driver.
- Fix the following LKP warning reported in V1:
 https://lore.kernel.org/all/202505252153.Nm1BzFUq-lkp@intel.com/
- Consolidate the separate dmaengine and netdev series from V1.

V1 axienet and dmaengine series:
https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.com/
https://lore.kernel.org/all/20250525102217.1181104-1-suraj.gupta2@amd.com/

Suraj Gupta (4):
  dmaengine: Add support to configure and read IRQ coalescing parameters
  dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI
    DMA
  dmaengine: xilinx_dma: Add support to configure/report coalesce
    parameters from/to client using AXI DMA
  net: xilinx: axienet: Add ethtool support to configure/report irq
    coalescing parameters in DMAengine flow

 drivers/dma/xilinx/xilinx_dma.c               |  93 +++++++--
 drivers/net/ethernet/xilinx/xilinx_axienet.h  |  13 +-
 .../net/ethernet/xilinx/xilinx_axienet_main.c | 190 +++++++++++++++++-
 include/linux/dmaengine.h                     |  10 +
 4 files changed, 276 insertions(+), 30 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters
  2025-07-10 10:12 [PATCH V2 0/4] Add ethtool support to configure irq coalescing count and delay Suraj Gupta
@ 2025-07-10 10:12 ` Suraj Gupta
  2025-07-23  7:30   ` Vinod Koul
  2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Suraj Gupta @ 2025-07-10 10:12 UTC (permalink / raw)
  To: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey
  Cc: netdev, linux-arm-kernel, linux-kernel, dmaengine, harini.katakam

Interrupt coalescing is a mechanism to reduce the number of hardware
interrupts by deferring them either until a certain amount of work is
pending or until a timeout timer fires. Tuning the interrupt coalesce
settings involves adjusting the amount of work and the timeout delay.
Many DMA controllers support configuring the coalesce count and delay.
Add support to configure them via dma_slave_config and to read them
back via dma_slave_caps.
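
A minimal sketch of how a dmaengine client could use the new fields
(error handling trimmed, the channel/device variables are only
placeholders):

	struct dma_slave_config cfg = { };
	struct dma_slave_caps caps;
	int ret;

	/* raise an interrupt every 16 transfers, or after 50 usecs */
	cfg.coalesce_cnt = 16;
	cfg.coalesce_usecs = 50;
	ret = dmaengine_slave_config(chan, &cfg);

	/* read back what the DMA driver actually programmed */
	ret = dma_get_slave_caps(chan, &caps);
	dev_dbg(dev, "coalesce: %u frames, %u usecs\n",
		caps.coalesce_cnt, caps.coalesce_usecs);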

Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
---
 include/linux/dmaengine.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index bb146c5ac3e4..c7c1adb8e571 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -431,6 +431,9 @@ enum dma_slave_buswidth {
  * @peripheral_config: peripheral configuration for programming peripheral
  * for dmaengine transfer
  * @peripheral_size: peripheral configuration buffer size
+ * @coalesce_cnt: Maximum number of transfers before receiving an interrupt.
+ * @coalesce_usecs: How many usecs to delay an interrupt after a transfer
+ * is completed.
  *
  * This struct is passed in as configuration data to a DMA engine
  * in order to set up a certain channel for DMA transport at runtime.
@@ -457,6 +460,8 @@ struct dma_slave_config {
 	bool device_fc;
 	void *peripheral_config;
 	size_t peripheral_size;
+	u32 coalesce_cnt;
+	u32 coalesce_usecs;
 };
 
 /**
@@ -507,6 +512,9 @@ enum dma_residue_granularity {
  * @residue_granularity: granularity of the reported transfer residue
  * @descriptor_reuse: if a descriptor can be reused by client and
  * resubmitted multiple times
+ * @coalesce_cnt: Maximum number of transfers before receiving an interrupt.
+ * @coalesce_usecs: How many usecs to delay an interrupt after a transfer
+ * is completed.
  */
 struct dma_slave_caps {
 	u32 src_addr_widths;
@@ -520,6 +528,8 @@ struct dma_slave_caps {
 	bool cmd_terminate;
 	enum dma_residue_granularity residue_granularity;
 	bool descriptor_reuse;
+	u32 coalesce_cnt;
+	u32 coalesce_usecs;
 };
 
 static inline const char *dma_chan_name(struct dma_chan *chan)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-10 10:12 [PATCH V2 0/4] Add ethtool support to configure irq coalescing count and delay Suraj Gupta
  2025-07-10 10:12 ` [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters Suraj Gupta
@ 2025-07-10 10:12 ` Suraj Gupta
  2025-07-10 11:26   ` Simon Horman
                     ` (3 more replies)
  2025-07-10 10:12 ` [PATCH V2 3/4] dmaengine: xilinx_dma: Add support to configure/report coalesce parameters from/to client using " Suraj Gupta
  2025-07-10 10:12 ` [PATCH V2 4/4] net: xilinx: axienet: Add ethtool support to configure/report irq coalescing parameters in DMAengine flow Suraj Gupta
  3 siblings, 4 replies; 16+ messages in thread
From: Suraj Gupta @ 2025-07-10 10:12 UTC (permalink / raw)
  To: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey
  Cc: netdev, linux-arm-kernel, linux-kernel, dmaengine, harini.katakam

AXI DMA driver incorrectly assumes complete transfer completion upon
IRQ reception, particularly problematic when IRQ coalescing is active.
Updating the tail pointer dynamically fixes it.
Remove existing idle state validation in the beginning of
xilinx_dma_start_transfer() as it blocks valid transfer initiation on
busy channels with queued descriptors.
Additionally, refactor xilinx_dma_start_transfer() to consolidate coalesce
and delay configurations while conditionally starting channels
only when idle.

Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
Fixes: Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine")
---
 drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index a34d8f0ceed8..187749b7b8a6 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -1548,9 +1548,6 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
 	if (list_empty(&chan->pending_list))
 		return;
 
-	if (!chan->idle)
-		return;
-
 	head_desc = list_first_entry(&chan->pending_list,
 				     struct xilinx_dma_tx_descriptor, node);
 	tail_desc = list_last_entry(&chan->pending_list,
@@ -1558,23 +1555,24 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
 	tail_segment = list_last_entry(&tail_desc->segments,
 				       struct xilinx_axidma_tx_segment, node);
 
+	if (chan->has_sg && list_empty(&chan->active_list))
+		xilinx_write(chan, XILINX_DMA_REG_CURDESC,
+			     head_desc->async_tx.phys);
+
 	reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
 
 	if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
 		reg &= ~XILINX_DMA_CR_COALESCE_MAX;
 		reg |= chan->desc_pendingcount <<
 				  XILINX_DMA_CR_COALESCE_SHIFT;
-		dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
 	}
 
-	if (chan->has_sg)
-		xilinx_write(chan, XILINX_DMA_REG_CURDESC,
-			     head_desc->async_tx.phys);
 	reg  &= ~XILINX_DMA_CR_DELAY_MAX;
 	reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
 	dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
 
-	xilinx_dma_start(chan);
+	if (chan->idle)
+		xilinx_dma_start(chan);
 
 	if (chan->err)
 		return;
@@ -1914,8 +1912,10 @@ static irqreturn_t xilinx_dma_irq_handler(int irq, void *data)
 		      XILINX_DMA_DMASR_DLY_CNT_IRQ)) {
 		spin_lock(&chan->lock);
 		xilinx_dma_complete_descriptor(chan);
-		chan->idle = true;
-		chan->start_transfer(chan);
+		if (list_empty(&chan->active_list)) {
+			chan->idle = true;
+			chan->start_transfer(chan);
+		}
 		spin_unlock(&chan->lock);
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V2 3/4] dmaengine: xilinx_dma: Add support to configure/report coalesce parameters from/to client using AXI DMA
  2025-07-10 10:12 [PATCH V2 0/4] Add ethtool support to configure irq coalescing count and delay Suraj Gupta
  2025-07-10 10:12 ` [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters Suraj Gupta
  2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
@ 2025-07-10 10:12 ` Suraj Gupta
  2025-07-10 10:12 ` [PATCH V2 4/4] net: xilinx: axienet: Add ethtool support to configure/report irq coalescing parameters in DMAengine flow Suraj Gupta
  3 siblings, 0 replies; 16+ messages in thread
From: Suraj Gupta @ 2025-07-10 10:12 UTC (permalink / raw)
  To: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey
  Cc: netdev, linux-arm-kernel, linux-kernel, dmaengine, harini.katakam

AXI DMA supports interrupt coalescing. Clients can fine-tune the coalesce
parameters based on the transaction load. Add support to configure and
report the coalesce parameters.
Change the delay setting to scale with the SG clock rate rather than
being a fixed number of clock cycles (as done in the AXI Ethernet
driver).
Increase the buffer descriptor (BD) ring size from 512 to 1024 to allow
sufficient space in the BD ring at the maximum coalesce count of 255.
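
As a worked example of the new scaling (assuming a 100 MHz SG clock):
one timeout interval is 125 * 10 ns = 1.25 us, so a requested delay of
16 us is programmed as DIV64_U64_ROUND_CLOSEST(16 * 100000000,
125 * 1000000) = 13 timer ticks, i.e. about 16.25 us of actual delay.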

Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
---
 drivers/dma/xilinx/xilinx_dma.c | 73 +++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 7 deletions(-)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 187749b7b8a6..26f328cd3e10 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -33,6 +33,7 @@
  *
  */
 
+#include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/dmapool.h>
 #include <linux/dma/xilinx_dma.h>
@@ -159,6 +160,9 @@
 		 XILINX_DMA_DMASR_SOF_EARLY_ERR | \
 		 XILINX_DMA_DMASR_DMA_INT_ERR)
 
+/* Constant to convert delay counts to microseconds */
+#define XILINX_DMA_DELAY_SCALE		(125ULL * USEC_PER_SEC)
+
 /* Axi VDMA Flush on Fsync bits */
 #define XILINX_DMA_FLUSH_S2MM		3
 #define XILINX_DMA_FLUSH_MM2S		2
@@ -184,7 +188,7 @@
 #define XILINX_DMA_BD_EOP		BIT(26)
 #define XILINX_DMA_BD_COMP_MASK		BIT(31)
 #define XILINX_DMA_COALESCE_MAX		255
-#define XILINX_DMA_NUM_DESCS		512
+#define XILINX_DMA_NUM_DESCS		1024
 #define XILINX_DMA_NUM_APP_WORDS	5
 
 /* AXI CDMA Specific Registers/Offsets */
@@ -403,6 +407,7 @@ struct xilinx_dma_tx_descriptor {
  * @terminating: Check for channel being synchronized by user
  * @tasklet: Cleanup work after irq
  * @config: Device configuration info
+ * @slave_cfg: Device configuration info from Dmaengine
  * @flush_on_fsync: Flush on Frame sync
  * @desc_pendingcount: Descriptor pending count
  * @ext_addr: Indicates 64 bit addressing is supported by dma channel
@@ -442,6 +447,7 @@ struct xilinx_dma_chan {
 	bool terminating;
 	struct tasklet_struct tasklet;
 	struct xilinx_vdma_config config;
+	struct dma_slave_config slave_cfg;
 	bool flush_on_fsync;
 	u32 desc_pendingcount;
 	bool ext_addr;
@@ -1540,7 +1546,9 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
 {
 	struct xilinx_dma_tx_descriptor *head_desc, *tail_desc;
 	struct xilinx_axidma_tx_segment *tail_segment;
-	u32 reg;
+	struct dma_slave_config *slave_cfg = &chan->slave_cfg;
+	u64 clk_rate;
+	u32 reg, usec, timer;
 
 	if (chan->err)
 		return;
@@ -1561,14 +1569,32 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
 
 	reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
 
-	if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
-		reg &= ~XILINX_DMA_CR_COALESCE_MAX;
+	reg &= ~XILINX_DMA_CR_COALESCE_MAX;
+	reg  &= ~XILINX_DMA_CR_DELAY_MAX;
+
+	/* Use dma_slave_config if it has valid values */
+	if (slave_cfg->coalesce_cnt &&
+	    slave_cfg->coalesce_cnt <= XILINX_DMA_COALESCE_MAX)
+		reg |= slave_cfg->coalesce_cnt <<
+			XILINX_DMA_CR_COALESCE_SHIFT;
+	else if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX)
 		reg |= chan->desc_pendingcount <<
 				  XILINX_DMA_CR_COALESCE_SHIFT;
-	}
 
-	reg  &= ~XILINX_DMA_CR_DELAY_MAX;
-	reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
+	if (slave_cfg->coalesce_usecs <= XILINX_DMA_DMACR_DELAY_MAX)
+		usec = slave_cfg->coalesce_usecs;
+	else
+		usec = chan->irq_delay;
+
+	/* Scale with SG clock rate rather than being a fixed number of
+	 * clock cycles.
+	 * 1 Timeout Interval = 125 * (clock period of SG clock)
+	 */
+	clk_rate = clk_get_rate(chan->xdev->rx_clk);
+	timer = DIV64_U64_ROUND_CLOSEST((u64)usec * clk_rate,
+					XILINX_DMA_DELAY_SCALE);
+	timer = min(timer, FIELD_MAX(XILINX_DMA_DMACR_DELAY_MASK));
+	reg |= timer << XILINX_DMA_CR_DELAY_SHIFT;
 	dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
 
 	if (chan->idle)
@@ -1701,9 +1727,41 @@ static void xilinx_dma_issue_pending(struct dma_chan *dchan)
 static int xilinx_dma_device_config(struct dma_chan *dchan,
 				    struct dma_slave_config *config)
 {
+	struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
+
+	if (chan->xdev->dma_config->dmatype != XDMA_TYPE_AXIDMA)
+		return 0;
+
+	if (!config->coalesce_cnt ||
+	    config->coalesce_cnt > XILINX_DMA_DMACR_FRAME_COUNT_MAX ||
+	    config->coalesce_usecs > XILINX_DMA_DMACR_DELAY_MAX)
+		return -EINVAL;
+
+	chan->slave_cfg.coalesce_cnt = config->coalesce_cnt;
+	chan->slave_cfg.coalesce_usecs = config->coalesce_usecs;
+
 	return 0;
 }
 
+static void xilinx_dma_device_caps(struct dma_chan *dchan,
+				   struct dma_slave_caps *caps)
+{
+	struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
+	u64 clk_rate, timer;
+	u32 reg;
+
+	if (chan->xdev->dma_config->dmatype != XDMA_TYPE_AXIDMA)
+		return;
+
+	reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
+	caps->coalesce_cnt = FIELD_GET(XILINX_DMA_CR_COALESCE_MAX, reg);
+
+	clk_rate = clk_get_rate(chan->xdev->rx_clk);
+	timer = FIELD_GET(XILINX_DMA_CR_DELAY_MAX, reg);
+	caps->coalesce_usecs = DIV64_U64_ROUND_CLOSEST(timer * XILINX_DMA_DELAY_SCALE,
+						       clk_rate);
+}
+
 /**
  * xilinx_dma_complete_descriptor - Mark the active descriptor as complete
  * @chan : xilinx DMA channel
@@ -3178,6 +3236,7 @@ static int xilinx_dma_probe(struct platform_device *pdev)
 	xdev->common.device_tx_status = xilinx_dma_tx_status;
 	xdev->common.device_issue_pending = xilinx_dma_issue_pending;
 	xdev->common.device_config = xilinx_dma_device_config;
+	xdev->common.device_caps = xilinx_dma_device_caps;
 	if (xdev->dma_config->dmatype == XDMA_TYPE_AXIDMA) {
 		dma_cap_set(DMA_CYCLIC, xdev->common.cap_mask);
 		xdev->common.device_prep_slave_sg = xilinx_dma_prep_slave_sg;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V2 4/4] net: xilinx: axienet: Add ethtool support to configure/report irq coalescing parameters in DMAengine flow
  2025-07-10 10:12 [PATCH V2 0/4] Add ethtool support to configure irq coalescing count and delay Suraj Gupta
                   ` (2 preceding siblings ...)
  2025-07-10 10:12 ` [PATCH V2 3/4] dmaengine: xilinx_dma: Add support to configure/report coalesce parameters from/to client using " Suraj Gupta
@ 2025-07-10 10:12 ` Suraj Gupta
  2025-07-11 16:33   ` Subbaraya Sundeep
  3 siblings, 1 reply; 16+ messages in thread
From: Suraj Gupta @ 2025-07-10 10:12 UTC (permalink / raw)
  To: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey
  Cc: netdev, linux-arm-kernel, linux-kernel, dmaengine, harini.katakam

Add support to configure and report the interrupt coalesce count and
delay via ethtool in the DMAengine flow.
Enable Tx and Rx adaptive IRQ coalescing with DIM to allow runtime
configuration of the coalesce count based on load. The same CQE profiles
as in the legacy (non-dmaengine) flow are used.
Increase the Rx skb ring size from 128 to 512 since the maximum number of
coalesced packets is 255.

Netperf numbers and CPU usage after DIM:
TCP Tx:	885 Mb/s, 27.02%
TCP Rx:	640 Mb/s, 27.73%
UDP Tx: 857 Mb/s, 25.00%
UDP Rx:	730 Mb/s, 23.94%

Above numbers are observed with 4x Cortex-a53.
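
For reference, DIM maps its profile index to a coalesce count via
min(1 << (profile_ix << 1), 255), so profiles 0..4 correspond to 1, 4,
16, 64 and 255 frames; DIM then moves along this ladder at runtime based
on the observed packet and interrupt rates.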

Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
---
 drivers/net/ethernet/xilinx/xilinx_axienet.h  |  13 +-
 .../net/ethernet/xilinx/xilinx_axienet_main.c | 190 +++++++++++++++++-
 2 files changed, 190 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index 5ff742103beb..747efde9a05f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -126,6 +126,9 @@
 #define XAXIDMA_DFT_TX_USEC		50
 #define XAXIDMA_DFT_RX_USEC		16
 
+/* Default TX delay timer value for SGDMA mode with DMAEngine */
+#define XAXIDMAENGINE_DFT_TX_USEC	16
+
 #define XAXIDMA_BD_CTRL_TXSOF_MASK	0x08000000 /* First tx packet */
 #define XAXIDMA_BD_CTRL_TXEOF_MASK	0x04000000 /* Last tx packet */
 #define XAXIDMA_BD_CTRL_ALL_MASK	0x0C000000 /* All control bits */
@@ -485,8 +488,11 @@ struct skbuf_dma_descriptor {
  * @dma_regs:	Base address for the axidma device address space
  * @napi_rx:	NAPI RX control structure
  * @rx_dim:     DIM state for the receive queue
- * @rx_dim_enabled: Whether DIM is enabled or not
- * @rx_irqs:    Number of interrupts
+ * @tx_dim:     DIM state for the transmit queue
+ * @rx_dim_enabled: Whether Rx DIM is enabled or not
+ * @tx_dim_enabled: Whether Tx DIM is enabled or not
+ * @rx_irqs:    Number of Rx interrupts
+ * @tx_irqs:    Number of Tx interrupts
  * @rx_cr_lock: Lock protecting @rx_dma_cr, its register, and @rx_dma_started
  * @rx_dma_cr:  Nominal content of RX DMA control register
  * @rx_dma_started: Set when RX DMA is started
@@ -570,8 +576,11 @@ struct axienet_local {
 
 	struct napi_struct napi_rx;
 	struct dim rx_dim;
+	struct dim tx_dim;
 	bool rx_dim_enabled;
+	bool tx_dim_enabled;
 	u16 rx_irqs;
+	u16 tx_irqs;
 	spinlock_t rx_cr_lock;
 	u32 rx_dma_cr;
 	bool rx_dma_started;
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 6011d7eae0c7..2c7cc092fbe8 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -54,7 +54,7 @@
 #define RX_BD_NUM_MAX			4096
 #define DMA_NUM_APP_WORDS		5
 #define LEN_APP				4
-#define RX_BUF_NUM_DEFAULT		128
+#define RX_BUF_NUM_DEFAULT		512
 
 /* Must be shorter than length of ethtool_drvinfo.driver field to fit */
 #define DRIVER_NAME		"xaxienet"
@@ -869,6 +869,7 @@ static void axienet_dma_tx_cb(void *data, const struct dmaengine_result *result)
 	struct netdev_queue *txq;
 	int len;
 
+	WRITE_ONCE(lp->tx_irqs, READ_ONCE(lp->tx_irqs) + 1);
 	skbuf_dma = axienet_get_tx_desc(lp, lp->tx_ring_tail++);
 	len = skbuf_dma->skb->len;
 	txq = skb_get_tx_queue(lp->ndev, skbuf_dma->skb);
@@ -881,6 +882,17 @@ static void axienet_dma_tx_cb(void *data, const struct dmaengine_result *result)
 	netif_txq_completed_wake(txq, 1, len,
 				 CIRC_SPACE(lp->tx_ring_head, lp->tx_ring_tail, TX_BD_NUM_MAX),
 				 2);
+
+	if (READ_ONCE(lp->tx_dim_enabled)) {
+		struct dim_sample sample = {
+			.time = ktime_get(),
+			.pkt_ctr = u64_stats_read(&lp->tx_packets),
+			.byte_ctr = u64_stats_read(&lp->tx_bytes),
+			.event_ctr = READ_ONCE(lp->tx_irqs),
+		};
+
+		net_dim(&lp->tx_dim, &sample);
+	}
 }
 
 /**
@@ -1161,6 +1173,7 @@ static void axienet_dma_rx_cb(void *data, const struct dmaengine_result *result)
 	struct sk_buff *skb;
 	u32 *app_metadata;
 
+	WRITE_ONCE(lp->rx_irqs, READ_ONCE(lp->rx_irqs) + 1);
 	skbuf_dma = axienet_get_rx_desc(lp, lp->rx_ring_tail++);
 	skb = skbuf_dma->skb;
 	app_metadata = dmaengine_desc_get_metadata_ptr(skbuf_dma->desc, &meta_len,
@@ -1179,7 +1192,18 @@ static void axienet_dma_rx_cb(void *data, const struct dmaengine_result *result)
 	u64_stats_add(&lp->rx_bytes, rx_len);
 	u64_stats_update_end(&lp->rx_stat_sync);
 	axienet_rx_submit_desc(lp->ndev);
+
 	dma_async_issue_pending(lp->rx_chan);
+	if (READ_ONCE(lp->rx_dim_enabled)) {
+		struct dim_sample sample = {
+			.time = ktime_get(),
+			.pkt_ctr = u64_stats_read(&lp->rx_packets),
+			.byte_ctr = u64_stats_read(&lp->rx_bytes),
+			.event_ctr = READ_ONCE(lp->rx_irqs),
+		};
+
+		net_dim(&lp->rx_dim, &sample);
+	}
 }
 
 /**
@@ -1492,6 +1516,9 @@ static void axienet_rx_submit_desc(struct net_device *ndev)
 	dev_kfree_skb(skb);
 }
 
+static u32 axienet_dim_coalesce_count_rx(struct axienet_local *lp);
+static u32 axienet_dim_coalesce_count_tx(struct axienet_local *lp);
+
 /**
  * axienet_init_dmaengine - init the dmaengine code.
  * @ndev:       Pointer to net_device structure
@@ -1505,6 +1532,7 @@ static int axienet_init_dmaengine(struct net_device *ndev)
 {
 	struct axienet_local *lp = netdev_priv(ndev);
 	struct skbuf_dma_descriptor *skbuf_dma;
+	struct dma_slave_config tx_config, rx_config;
 	int i, ret;
 
 	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
@@ -1520,6 +1548,22 @@ static int axienet_init_dmaengine(struct net_device *ndev)
 		goto err_dma_release_tx;
 	}
 
+	tx_config.coalesce_cnt = axienet_dim_coalesce_count_tx(lp);
+	tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
+	rx_config.coalesce_cnt = axienet_dim_coalesce_count_rx(lp);
+	rx_config.coalesce_usecs =  XAXIDMA_DFT_RX_USEC;
+
+	ret = dmaengine_slave_config(lp->tx_chan, &tx_config);
+	if (ret) {
+		dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n");
+		goto err_dma_release_tx;
+	}
+	ret = dmaengine_slave_config(lp->rx_chan, &rx_config);
+	if (ret) {
+		dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n");
+		goto err_dma_release_tx;
+	}
+
 	lp->tx_ring_tail = 0;
 	lp->tx_ring_head = 0;
 	lp->rx_ring_tail = 0;
@@ -1692,6 +1736,7 @@ static int axienet_open(struct net_device *ndev)
 		free_irq(lp->eth_irq, ndev);
 err_phy:
 	cancel_work_sync(&lp->rx_dim.work);
+	cancel_work_sync(&lp->tx_dim.work);
 	cancel_delayed_work_sync(&lp->stats_work);
 	phylink_stop(lp->phylink);
 	phylink_disconnect_phy(lp->phylink);
@@ -1722,6 +1767,7 @@ static int axienet_stop(struct net_device *ndev)
 	}
 
 	cancel_work_sync(&lp->rx_dim.work);
+	cancel_work_sync(&lp->tx_dim.work);
 	cancel_delayed_work_sync(&lp->stats_work);
 
 	phylink_stop(lp->phylink);
@@ -2104,6 +2150,15 @@ static u32 axienet_dim_coalesce_count_rx(struct axienet_local *lp)
 	return min(1 << (lp->rx_dim.profile_ix << 1), 255);
 }
 
+/**
+ * axienet_dim_coalesce_count_tx() - TX coalesce count for DIM
+ * @lp: Device private data
+ */
+static u32 axienet_dim_coalesce_count_tx(struct axienet_local *lp)
+{
+	return min(1 << (lp->tx_dim.profile_ix << 1), 255);
+}
+
 /**
  * axienet_rx_dim_work() - Adjust RX DIM settings
  * @work: The work struct
@@ -2120,6 +2175,40 @@ static void axienet_rx_dim_work(struct work_struct *work)
 	lp->rx_dim.state = DIM_START_MEASURE;
 }
 
+/**
+ * axienet_rx_dim_work_dmaengine() - Adjust RX DIM settings in dmaengine
+ * @work: The work struct
+ */
+static void axienet_rx_dim_work_dmaengine(struct work_struct *work)
+{
+	struct axienet_local *lp =
+		container_of(work, struct axienet_local, rx_dim.work);
+	struct dma_slave_config cfg = {
+		.coalesce_cnt	= axienet_dim_coalesce_count_rx(lp),
+		.coalesce_usecs	= 16,
+	};
+
+	dmaengine_slave_config(lp->rx_chan, &cfg);
+	lp->rx_dim.state = DIM_START_MEASURE;
+}
+
+/**
+ * axienet_tx_dim_work_dmaengine() - Adjust RX DIM settings in dmaengine
+ * @work: The work struct
+ */
+static void axienet_tx_dim_work_dmaengine(struct work_struct *work)
+{
+	struct axienet_local *lp =
+		container_of(work, struct axienet_local, tx_dim.work);
+	struct dma_slave_config cfg = {
+		.coalesce_cnt	= axienet_dim_coalesce_count_tx(lp),
+		.coalesce_usecs	= 16,
+	};
+
+	dmaengine_slave_config(lp->tx_chan, &cfg);
+	lp->tx_dim.state = DIM_START_MEASURE;
+}
+
 /**
  * axienet_update_coalesce_tx() - Set TX CR
  * @lp: Device private data
@@ -2171,6 +2260,20 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
 	u32 cr;
 
 	ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
+	ecoalesce->use_adaptive_tx_coalesce = lp->tx_dim_enabled;
+
+	if (lp->use_dmaengine) {
+		struct dma_slave_caps tx_caps, rx_caps;
+
+		dma_get_slave_caps(lp->tx_chan, &tx_caps);
+		dma_get_slave_caps(lp->rx_chan, &rx_caps);
+
+		ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt;
+		ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs;
+		ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt;
+		ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs;
+		return 0;
+	}
 
 	spin_lock_irq(&lp->rx_cr_lock);
 	cr = lp->rx_dma_cr;
@@ -2208,8 +2311,10 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
 			      struct netlink_ext_ack *extack)
 {
 	struct axienet_local *lp = netdev_priv(ndev);
-	bool new_dim = ecoalesce->use_adaptive_rx_coalesce;
-	bool old_dim = lp->rx_dim_enabled;
+	bool new_rxdim = ecoalesce->use_adaptive_rx_coalesce;
+	bool new_txdim = ecoalesce->use_adaptive_tx_coalesce;
+	bool old_rxdim = lp->rx_dim_enabled;
+	bool old_txdim = lp->tx_dim_enabled;
 	u32 cr, mask = ~XAXIDMA_CR_RUNSTOP_MASK;
 
 	if (ecoalesce->rx_max_coalesced_frames > 255 ||
@@ -2224,20 +2329,76 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
 		return -EINVAL;
 	}
 
-	if (((ecoalesce->rx_max_coalesced_frames > 1 || new_dim) &&
+	if (((ecoalesce->rx_max_coalesced_frames > 1 || new_rxdim) &&
 	     !ecoalesce->rx_coalesce_usecs) ||
-	    (ecoalesce->tx_max_coalesced_frames > 1 &&
+	    ((ecoalesce->tx_max_coalesced_frames > 1 || new_txdim) &&
 	     !ecoalesce->tx_coalesce_usecs)) {
 		NL_SET_ERR_MSG(extack,
 			       "usecs must be non-zero when frames is greater than one");
 		return -EINVAL;
 	}
 
-	if (new_dim && !old_dim) {
+	if (lp->use_dmaengine)	{
+		struct dma_slave_config tx_cfg, rx_cfg;
+		int ret;
+
+		if (new_rxdim && !old_rxdim) {
+			rx_cfg.coalesce_cnt = axienet_dim_coalesce_count_rx(lp);
+			rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
+		} else if (!new_rxdim) {
+			if (old_rxdim) {
+				WRITE_ONCE(lp->rx_dim_enabled, false);
+				flush_work(&lp->rx_dim.work);
+			}
+
+			rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
+			rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
+		} else {
+			rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
+			rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
+		}
+
+		if (new_txdim && !old_txdim) {
+			tx_cfg.coalesce_cnt = axienet_dim_coalesce_count_tx(lp);
+			tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
+		} else if (!new_txdim) {
+			if (old_txdim) {
+				WRITE_ONCE(lp->tx_dim_enabled, false);
+				flush_work(&lp->tx_dim.work);
+			}
+
+			tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
+			tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
+		} else {
+			tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
+			tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
+		}
+
+		ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg);
+		if (ret) {
+			NL_SET_ERR_MSG(extack, "failed to set rx coalesce parameters");
+			return ret;
+		}
+
+		if (new_rxdim && !old_rxdim)
+			WRITE_ONCE(lp->rx_dim_enabled, true);
+
+		ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg);
+		if (ret) {
+			NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters");
+			return ret;
+		}
+		if (new_txdim && !old_txdim)
+			WRITE_ONCE(lp->tx_dim_enabled, true);
+
+		return 0;
+	}
+
+	if (new_rxdim && !old_rxdim) {
 		cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
 				     ecoalesce->rx_coalesce_usecs);
-	} else if (!new_dim) {
-		if (old_dim) {
+	} else if (!new_rxdim) {
+		if (old_rxdim) {
 			WRITE_ONCE(lp->rx_dim_enabled, false);
 			napi_synchronize(&lp->napi_rx);
 			flush_work(&lp->rx_dim.work);
@@ -2252,7 +2413,7 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
 	}
 
 	axienet_update_coalesce_rx(lp, cr, mask);
-	if (new_dim && !old_dim)
+	if (new_rxdim && !old_rxdim)
 		WRITE_ONCE(lp->rx_dim_enabled, true);
 
 	cr = axienet_calc_cr(lp, ecoalesce->tx_max_coalesced_frames,
@@ -2496,7 +2657,7 @@ axienet_ethtool_get_rmon_stats(struct net_device *dev,
 static const struct ethtool_ops axienet_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USECS |
-				     ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
+				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.get_drvinfo    = axienet_ethtools_get_drvinfo,
 	.get_regs_len   = axienet_ethtools_get_regs_len,
 	.get_regs       = axienet_ethtools_get_regs,
@@ -3041,7 +3202,14 @@ static int axienet_probe(struct platform_device *pdev)
 
 	spin_lock_init(&lp->rx_cr_lock);
 	spin_lock_init(&lp->tx_cr_lock);
-	INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work);
+	if (lp->use_dmaengine) {
+		INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work_dmaengine);
+		INIT_WORK(&lp->tx_dim.work, axienet_tx_dim_work_dmaengine);
+		lp->tx_dim_enabled = true;
+		lp->tx_dim.profile_ix = 1;
+	} else {
+		INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work);
+	}
 	lp->rx_dim_enabled = true;
 	lp->rx_dim.profile_ix = 1;
 	lp->rx_dma_cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
@ 2025-07-10 11:26   ` Simon Horman
  2025-07-11  5:32   ` Folker Schwesinger
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Simon Horman @ 2025-07-10 11:26 UTC (permalink / raw)
  To: Suraj Gupta
  Cc: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey, netdev, linux-arm-kernel, linux-kernel,
	dmaengine, harini.katakam

On Thu, Jul 10, 2025 at 03:42:27PM +0530, Suraj Gupta wrote:
> AXI DMA driver incorrectly assumes complete transfer completion upon
> IRQ reception, particularly problematic when IRQ coalescing is active.
> Updating the tail pointer dynamically fixes it.
> Remove existing idle state validation in the beginning of
> xilinx_dma_start_transfer() as it blocks valid transfer initiation on
> busy channels with queued descriptors.
> Additionally, refactor xilinx_dma_start_transfer() to consolidate coalesce
> and delay configurations while conditionally starting channels
> only when idle.
> 
> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> Fixes: Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine")

Hi, 

This is not a proper review.
And there is probably no need to repost just because of it.
But:

s/Fixes: Fixes: /Fixes: /

...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
  2025-07-10 11:26   ` Simon Horman
@ 2025-07-11  5:32   ` Folker Schwesinger
  2025-07-11 16:26   ` Subbaraya Sundeep
  2025-07-15 11:05   ` Pandey, Radhey Shyam
  3 siblings, 0 replies; 16+ messages in thread
From: Folker Schwesinger @ 2025-07-11  5:32 UTC (permalink / raw)
  To: Suraj Gupta, andrew+netdev, davem, kuba, pabeni, michal.simek,
	vkoul, radhey.shyam.pandey
  Cc: netdev, linux-arm-kernel, linux-kernel, dmaengine, harini.katakam

On Thu Jul 10, 2025 at 12:12 PM CEST, Suraj Gupta wrote:
> AXI DMA driver incorrectly assumes complete transfer completion upon
> IRQ reception, particularly problematic when IRQ coalescing is active.
> Updating the tail pointer dynamically fixes it.
> Remove existing idle state validation in the beginning of
> xilinx_dma_start_transfer() as it blocks valid transfer initiation on
> busy channels with queued descriptors.
> Additionally, refactor xilinx_dma_start_transfer() to consolidate coalesce
> and delay configurations while conditionally starting channels
> only when idle.
>
> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> Fixes: Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine")

This fixes an issue I recently ran into which prevented starting
consecutive transfers. Thanks and:

Tested-by: Folker Schwesinger <dev@folker-schwesinger.de>

> ---
>  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> index a34d8f0ceed8..187749b7b8a6 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -1548,9 +1548,6 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
>  	if (list_empty(&chan->pending_list))
>  		return;
>  
> -	if (!chan->idle)
> -		return;
> -
>  	head_desc = list_first_entry(&chan->pending_list,
>  				     struct xilinx_dma_tx_descriptor, node);
>  	tail_desc = list_last_entry(&chan->pending_list,
> @@ -1558,23 +1555,24 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
>  	tail_segment = list_last_entry(&tail_desc->segments,
>  				       struct xilinx_axidma_tx_segment, node);
>  
> +	if (chan->has_sg && list_empty(&chan->active_list))
> +		xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> +			     head_desc->async_tx.phys);
> +
>  	reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
>  
>  	if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
>  		reg &= ~XILINX_DMA_CR_COALESCE_MAX;
>  		reg |= chan->desc_pendingcount <<
>  				  XILINX_DMA_CR_COALESCE_SHIFT;
> -		dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
>  	}
>  
> -	if (chan->has_sg)
> -		xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> -			     head_desc->async_tx.phys);
>  	reg  &= ~XILINX_DMA_CR_DELAY_MAX;
>  	reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
>  	dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
>  
> -	xilinx_dma_start(chan);
> +	if (chan->idle)
> +		xilinx_dma_start(chan);
>  
>  	if (chan->err)
>  		return;
> @@ -1914,8 +1912,10 @@ static irqreturn_t xilinx_dma_irq_handler(int irq, void *data)
>  		      XILINX_DMA_DMASR_DLY_CNT_IRQ)) {
>  		spin_lock(&chan->lock);
>  		xilinx_dma_complete_descriptor(chan);
> -		chan->idle = true;
> -		chan->start_transfer(chan);
> +		if (list_empty(&chan->active_list)) {
> +			chan->idle = true;
> +			chan->start_transfer(chan);
> +		}
>  		spin_unlock(&chan->lock);
>  	}
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
  2025-07-10 11:26   ` Simon Horman
  2025-07-11  5:32   ` Folker Schwesinger
@ 2025-07-11 16:26   ` Subbaraya Sundeep
  2025-07-11 20:13     ` Gupta, Suraj
  2025-07-15 11:05   ` Pandey, Radhey Shyam
  3 siblings, 1 reply; 16+ messages in thread
From: Subbaraya Sundeep @ 2025-07-11 16:26 UTC (permalink / raw)
  To: Suraj Gupta
  Cc: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey, netdev, linux-arm-kernel, linux-kernel,
	dmaengine, harini.katakam

On 2025-07-10 at 10:12:27, Suraj Gupta (suraj.gupta2@amd.com) wrote:
> AXI DMA driver incorrectly assumes complete transfer completion upon
> IRQ reception, particularly problematic when IRQ coalescing is active.
> Updating the tail pointer dynamically fixes it.
> Remove existing idle state validation in the beginning of
> xilinx_dma_start_transfer() as it blocks valid transfer initiation on
> busy channels with queued descriptors.
> Additionally, refactor xilinx_dma_start_transfer() to consolidate coalesce
> and delay configurations while conditionally starting channels
> only when idle.
> 
> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> Fixes: Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine")

Your series looks like net-next material and this one is fixing an
existing bug. Send this one patch separately to net.
Also include net or net-next in the subject.

Thanks,
Sundeep
> ---
>  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> index a34d8f0ceed8..187749b7b8a6 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -1548,9 +1548,6 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
>  	if (list_empty(&chan->pending_list))
>  		return;
>  
> -	if (!chan->idle)
> -		return;
> -
>  	head_desc = list_first_entry(&chan->pending_list,
>  				     struct xilinx_dma_tx_descriptor, node);
>  	tail_desc = list_last_entry(&chan->pending_list,
> @@ -1558,23 +1555,24 @@ static void xilinx_dma_start_transfer(struct xilinx_dma_chan *chan)
>  	tail_segment = list_last_entry(&tail_desc->segments,
>  				       struct xilinx_axidma_tx_segment, node);
>  
> +	if (chan->has_sg && list_empty(&chan->active_list))
> +		xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> +			     head_desc->async_tx.phys);
> +
>  	reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
>  
>  	if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
>  		reg &= ~XILINX_DMA_CR_COALESCE_MAX;
>  		reg |= chan->desc_pendingcount <<
>  				  XILINX_DMA_CR_COALESCE_SHIFT;
> -		dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
>  	}
>  
> -	if (chan->has_sg)
> -		xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> -			     head_desc->async_tx.phys);
>  	reg  &= ~XILINX_DMA_CR_DELAY_MAX;
>  	reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
>  	dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
>  
> -	xilinx_dma_start(chan);
> +	if (chan->idle)
> +		xilinx_dma_start(chan);
>  
>  	if (chan->err)
>  		return;
> @@ -1914,8 +1912,10 @@ static irqreturn_t xilinx_dma_irq_handler(int irq, void *data)
>  		      XILINX_DMA_DMASR_DLY_CNT_IRQ)) {
>  		spin_lock(&chan->lock);
>  		xilinx_dma_complete_descriptor(chan);
> -		chan->idle = true;
> -		chan->start_transfer(chan);
> +		if (list_empty(&chan->active_list)) {
> +			chan->idle = true;
> +			chan->start_transfer(chan);
> +		}
>  		spin_unlock(&chan->lock);
>  	}
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 4/4] net: xilinx: axienet: Add ethtool support to configure/report irq coalescing parameters in DMAengine flow
  2025-07-10 10:12 ` [PATCH V2 4/4] net: xilinx: axienet: Add ethtool support to configure/report irq coalescing parameters in DMAengine flow Suraj Gupta
@ 2025-07-11 16:33   ` Subbaraya Sundeep
  0 siblings, 0 replies; 16+ messages in thread
From: Subbaraya Sundeep @ 2025-07-11 16:33 UTC (permalink / raw)
  To: Suraj Gupta
  Cc: andrew+netdev, davem, kuba, pabeni, michal.simek, vkoul,
	radhey.shyam.pandey, netdev, linux-arm-kernel, linux-kernel,
	dmaengine, harini.katakam

On 2025-07-10 at 10:12:29, Suraj Gupta (suraj.gupta2@amd.com) wrote:
> Add support to configure and report the interrupt coalesce count and
> delay via ethtool in the DMAengine flow.
> Enable Tx and Rx adaptive IRQ coalescing with DIM to allow runtime
> configuration of the coalesce count based on load. The same CQE profiles
> as in the legacy (non-dmaengine) flow are used.
> Increase the Rx skb ring size from 128 to 512 since the maximum number of
> coalesced packets is 255.
> 
> Netperf numbers and CPU usage after DIM:
> TCP Tx:	885 Mb/s, 27.02%
> TCP Rx:	640 Mb/s, 27.73%
> UDP Tx: 857 Mb/s, 25.00%
> UDP Rx:	730 Mb/s, 23.94%
> 
> Above numbers are observed with 4x Cortex-a53.
> 
> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> ---
>  drivers/net/ethernet/xilinx/xilinx_axienet.h  |  13 +-
>  .../net/ethernet/xilinx/xilinx_axienet_main.c | 190 +++++++++++++++++-
>  2 files changed, 190 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> index 5ff742103beb..747efde9a05f 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> @@ -126,6 +126,9 @@
>  #define XAXIDMA_DFT_TX_USEC		50
>  #define XAXIDMA_DFT_RX_USEC		16
>  
> +/* Default TX delay timer value for SGDMA mode with DMAEngine */
> +#define XAXIDMAENGINE_DFT_TX_USEC	16
> +
>  #define XAXIDMA_BD_CTRL_TXSOF_MASK	0x08000000 /* First tx packet */
>  #define XAXIDMA_BD_CTRL_TXEOF_MASK	0x04000000 /* Last tx packet */
>  #define XAXIDMA_BD_CTRL_ALL_MASK	0x0C000000 /* All control bits */
> @@ -485,8 +488,11 @@ struct skbuf_dma_descriptor {
>   * @dma_regs:	Base address for the axidma device address space
>   * @napi_rx:	NAPI RX control structure
>   * @rx_dim:     DIM state for the receive queue
> - * @rx_dim_enabled: Whether DIM is enabled or not
> - * @rx_irqs:    Number of interrupts
> + * @tx_dim:     DIM state for the transmit queue
> + * @rx_dim_enabled: Whether Rx DIM is enabled or not
> + * @tx_dim_enabled: Whether Tx DIM is enabled or not
> + * @rx_irqs:    Number of Rx interrupts
> + * @tx_irqs:    Number of Tx interrupts
>   * @rx_cr_lock: Lock protecting @rx_dma_cr, its register, and @rx_dma_started
>   * @rx_dma_cr:  Nominal content of RX DMA control register
>   * @rx_dma_started: Set when RX DMA is started
> @@ -570,8 +576,11 @@ struct axienet_local {
>  
>  	struct napi_struct napi_rx;
>  	struct dim rx_dim;
> +	struct dim tx_dim;
>  	bool rx_dim_enabled;
> +	bool tx_dim_enabled;
>  	u16 rx_irqs;
> +	u16 tx_irqs;
>  	spinlock_t rx_cr_lock;
>  	u32 rx_dma_cr;
>  	bool rx_dma_started;
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> index 6011d7eae0c7..2c7cc092fbe8 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> @@ -54,7 +54,7 @@
>  #define RX_BD_NUM_MAX			4096
>  #define DMA_NUM_APP_WORDS		5
>  #define LEN_APP				4
> -#define RX_BUF_NUM_DEFAULT		128
> +#define RX_BUF_NUM_DEFAULT		512
>  
>  /* Must be shorter than length of ethtool_drvinfo.driver field to fit */
>  #define DRIVER_NAME		"xaxienet"
> @@ -869,6 +869,7 @@ static void axienet_dma_tx_cb(void *data, const struct dmaengine_result *result)
>  	struct netdev_queue *txq;
>  	int len;
>  
> +	WRITE_ONCE(lp->tx_irqs, READ_ONCE(lp->tx_irqs) + 1);
>  	skbuf_dma = axienet_get_tx_desc(lp, lp->tx_ring_tail++);
>  	len = skbuf_dma->skb->len;
>  	txq = skb_get_tx_queue(lp->ndev, skbuf_dma->skb);
> @@ -881,6 +882,17 @@ static void axienet_dma_tx_cb(void *data, const struct dmaengine_result *result)
>  	netif_txq_completed_wake(txq, 1, len,
>  				 CIRC_SPACE(lp->tx_ring_head, lp->tx_ring_tail, TX_BD_NUM_MAX),
>  				 2);
> +
> +	if (READ_ONCE(lp->tx_dim_enabled)) {
> +		struct dim_sample sample = {
> +			.time = ktime_get(),
> +			.pkt_ctr = u64_stats_read(&lp->tx_packets),
> +			.byte_ctr = u64_stats_read(&lp->tx_bytes),
> +			.event_ctr = READ_ONCE(lp->tx_irqs),
> +		};
> +
> +		net_dim(&lp->tx_dim, &sample);
> +	}
>  }
>  
>  /**
> @@ -1161,6 +1173,7 @@ static void axienet_dma_rx_cb(void *data, const struct dmaengine_result *result)
>  	struct sk_buff *skb;
>  	u32 *app_metadata;
>  
> +	WRITE_ONCE(lp->rx_irqs, READ_ONCE(lp->rx_irqs) + 1);
>  	skbuf_dma = axienet_get_rx_desc(lp, lp->rx_ring_tail++);
>  	skb = skbuf_dma->skb;
>  	app_metadata = dmaengine_desc_get_metadata_ptr(skbuf_dma->desc, &meta_len,
> @@ -1179,7 +1192,18 @@ static void axienet_dma_rx_cb(void *data, const struct dmaengine_result *result)
>  	u64_stats_add(&lp->rx_bytes, rx_len);
>  	u64_stats_update_end(&lp->rx_stat_sync);
>  	axienet_rx_submit_desc(lp->ndev);
> +
>  	dma_async_issue_pending(lp->rx_chan);
> +	if (READ_ONCE(lp->rx_dim_enabled)) {
> +		struct dim_sample sample = {
> +			.time = ktime_get(),
> +			.pkt_ctr = u64_stats_read(&lp->rx_packets),
> +			.byte_ctr = u64_stats_read(&lp->rx_bytes),
> +			.event_ctr = READ_ONCE(lp->rx_irqs),
> +		};
> +
> +		net_dim(&lp->rx_dim, &sample);
> +	}
>  }
>  
>  /**
> @@ -1492,6 +1516,9 @@ static void axienet_rx_submit_desc(struct net_device *ndev)
>  	dev_kfree_skb(skb);
>  }
>  
> +static u32 axienet_dim_coalesce_count_rx(struct axienet_local *lp);
> +static u32 axienet_dim_coalesce_count_tx(struct axienet_local *lp);
> +
>  /**
>   * axienet_init_dmaengine - init the dmaengine code.
>   * @ndev:       Pointer to net_device structure
> @@ -1505,6 +1532,7 @@ static int axienet_init_dmaengine(struct net_device *ndev)
>  {
>  	struct axienet_local *lp = netdev_priv(ndev);
>  	struct skbuf_dma_descriptor *skbuf_dma;
> +	struct dma_slave_config tx_config, rx_config;
>  	int i, ret;
>  
>  	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> @@ -1520,6 +1548,22 @@ static int axienet_init_dmaengine(struct net_device *ndev)
>  		goto err_dma_release_tx;
>  	}
>  
> +	tx_config.coalesce_cnt = axienet_dim_coalesce_count_tx(lp);
> +	tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
> +	rx_config.coalesce_cnt = axienet_dim_coalesce_count_rx(lp);
> +	rx_config.coalesce_usecs =  XAXIDMA_DFT_RX_USEC;
> +
> +	ret = dmaengine_slave_config(lp->tx_chan, &tx_config);
> +	if (ret) {
> +		dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n");
> +		goto err_dma_release_tx;
> +	}
> +	ret = dmaengine_slave_config(lp->rx_chan, &rx_config);
> +	if (ret) {
> +		dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n");
> +		goto err_dma_release_tx;
> +	}
> +
>  	lp->tx_ring_tail = 0;
>  	lp->tx_ring_head = 0;
>  	lp->rx_ring_tail = 0;
> @@ -1692,6 +1736,7 @@ static int axienet_open(struct net_device *ndev)
>  		free_irq(lp->eth_irq, ndev);
>  err_phy:
>  	cancel_work_sync(&lp->rx_dim.work);
> +	cancel_work_sync(&lp->tx_dim.work);
>  	cancel_delayed_work_sync(&lp->stats_work);
>  	phylink_stop(lp->phylink);
>  	phylink_disconnect_phy(lp->phylink);
> @@ -1722,6 +1767,7 @@ static int axienet_stop(struct net_device *ndev)
>  	}
>  
>  	cancel_work_sync(&lp->rx_dim.work);
> +	cancel_work_sync(&lp->tx_dim.work);
>  	cancel_delayed_work_sync(&lp->stats_work);
>  
>  	phylink_stop(lp->phylink);
> @@ -2104,6 +2150,15 @@ static u32 axienet_dim_coalesce_count_rx(struct axienet_local *lp)
>  	return min(1 << (lp->rx_dim.profile_ix << 1), 255);
>  }
>  
> +/**
> + * axienet_dim_coalesce_count_tx() - TX coalesce count for DIM
> + * @lp: Device private data
> + */
> +static u32 axienet_dim_coalesce_count_tx(struct axienet_local *lp)
> +{
> +	return min(1 << (lp->tx_dim.profile_ix << 1), 255);
> +}
> +
>  /**
>   * axienet_rx_dim_work() - Adjust RX DIM settings
>   * @work: The work struct
> @@ -2120,6 +2175,40 @@ static void axienet_rx_dim_work(struct work_struct *work)
>  	lp->rx_dim.state = DIM_START_MEASURE;
>  }
>  
> +/**
> + * axienet_rx_dim_work_dmaengine() - Adjust RX DIM settings in dmaengine
> + * @work: The work struct
> + */
> +static void axienet_rx_dim_work_dmaengine(struct work_struct *work)
> +{
> +	struct axienet_local *lp =
> +		container_of(work, struct axienet_local, rx_dim.work);
> +	struct dma_slave_config cfg = {
> +		.coalesce_cnt	= axienet_dim_coalesce_count_rx(lp),
> +		.coalesce_usecs	= 16,
> +	};
> +
> +	dmaengine_slave_config(lp->rx_chan, &cfg);
> +	lp->rx_dim.state = DIM_START_MEASURE;
> +}
> +
> +/**
> + * axienet_tx_dim_work_dmaengine() - Adjust RX DIM settings in dmaengine
> + * @work: The work struct
> + */
> +static void axienet_tx_dim_work_dmaengine(struct work_struct *work)
> +{
> +	struct axienet_local *lp =
> +		container_of(work, struct axienet_local, tx_dim.work);
> +	struct dma_slave_config cfg = {
> +		.coalesce_cnt	= axienet_dim_coalesce_count_tx(lp),
> +		.coalesce_usecs	= 16,
> +	};
> +
> +	dmaengine_slave_config(lp->tx_chan, &cfg);
> +	lp->tx_dim.state = DIM_START_MEASURE;
> +}
> +
>  /**
>   * axienet_update_coalesce_tx() - Set TX CR
>   * @lp: Device private data
> @@ -2171,6 +2260,20 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
>  	u32 cr;
>  
>  	ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
> +	ecoalesce->use_adaptive_tx_coalesce = lp->tx_dim_enabled;
> +
> +	if (lp->use_dmaengine) {
> +		struct dma_slave_caps tx_caps, rx_caps;
> +
> +		dma_get_slave_caps(lp->tx_chan, &tx_caps);
> +		dma_get_slave_caps(lp->rx_chan, &rx_caps);
> +
> +		ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt;
> +		ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs;
> +		ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt;
> +		ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs;
> +		return 0;
> +	}
>  
>  	spin_lock_irq(&lp->rx_cr_lock);
>  	cr = lp->rx_dma_cr;
> @@ -2208,8 +2311,10 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
>  			      struct netlink_ext_ack *extack)
>  {
>  	struct axienet_local *lp = netdev_priv(ndev);
> -	bool new_dim = ecoalesce->use_adaptive_rx_coalesce;
> -	bool old_dim = lp->rx_dim_enabled;
> +	bool new_rxdim = ecoalesce->use_adaptive_rx_coalesce;
> +	bool new_txdim = ecoalesce->use_adaptive_tx_coalesce;
> +	bool old_rxdim = lp->rx_dim_enabled;
> +	bool old_txdim = lp->tx_dim_enabled;
>  	u32 cr, mask = ~XAXIDMA_CR_RUNSTOP_MASK;
>  
>  	if (ecoalesce->rx_max_coalesced_frames > 255 ||
> @@ -2224,20 +2329,76 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
>  		return -EINVAL;
>  	}
>  
> -	if (((ecoalesce->rx_max_coalesced_frames > 1 || new_dim) &&
> +	if (((ecoalesce->rx_max_coalesced_frames > 1 || new_rxdim) &&
>  	     !ecoalesce->rx_coalesce_usecs) ||
> -	    (ecoalesce->tx_max_coalesced_frames > 1 &&
> +	    ((ecoalesce->tx_max_coalesced_frames > 1 || new_txdim) &&
>  	     !ecoalesce->tx_coalesce_usecs)) {
>  		NL_SET_ERR_MSG(extack,
>  			       "usecs must be non-zero when frames is greater than one");
>  		return -EINVAL;
>  	}
>  
> -	if (new_dim && !old_dim) {
> +	if (lp->use_dmaengine)	{
> +		struct dma_slave_config tx_cfg, rx_cfg;
> +		int ret;
> +
> +		if (new_rxdim && !old_rxdim) {
> +			rx_cfg.coalesce_cnt = axienet_dim_coalesce_count_rx(lp);
> +			rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
> +		} else if (!new_rxdim) {
> +			if (old_rxdim) {
> +				WRITE_ONCE(lp->rx_dim_enabled, false);
> +				flush_work(&lp->rx_dim.work);
> +			}
> +
> +			rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
> +			rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
> +		} else {
> +			rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
> +			rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
> +		}
> +
> +		if (new_txdim && !old_txdim) {
> +			tx_cfg.coalesce_cnt = axienet_dim_coalesce_count_tx(lp);
> +			tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
> +		} else if (!new_txdim) {
> +			if (old_txdim) {
> +				WRITE_ONCE(lp->tx_dim_enabled, false);
> +				flush_work(&lp->tx_dim.work);
> +			}
> +
> +			tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
> +			tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
> +		} else {
> +			tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
> +			tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
> +		}
> +
> +		ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg);
> +		if (ret) {
> +			NL_SET_ERR_MSG(extack, "failed to set rx coalesce parameters");
> +			return ret;
> +		}
> +
> +		if (new_rxdim && !old_rxdim)
> +			WRITE_ONCE(lp->rx_dim_enabled, true);
> +
> +		ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg);
> +		if (ret) {
> +			NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters");
> +			return ret;
> +		}
> +		if (new_txdim && !old_txdim)
> +			WRITE_ONCE(lp->tx_dim_enabled, true);
> +
> +		return 0;
> +	}

This is a very big block of if/else conditions and it looks confusing.
Please simplify it using small helpers for TX and RX. Also add a comment
describing what you are trying to do with the new and old TX and RX DIM
states.
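
For illustration, one shape such a per-direction helper could take (a
rough, untested sketch; the helper name is only a placeholder):

	static void axienet_fill_dmaengine_coalesce(struct dma_slave_config *cfg,
						    bool new_dim, bool old_dim,
						    bool *dim_enabled,
						    struct work_struct *dim_work,
						    u32 dim_frames,
						    u32 frames, u32 usecs)
	{
		if (new_dim && !old_dim) {
			/* DIM is being switched on: start from the DIM profile count */
			cfg->coalesce_cnt = dim_frames;
		} else {
			if (!new_dim && old_dim) {
				/* DIM is being switched off: quiesce its work first */
				WRITE_ONCE(*dim_enabled, false);
				flush_work(dim_work);
			}
			cfg->coalesce_cnt = frames;
		}
		cfg->coalesce_usecs = usecs;
	}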

Thanks,
Sundeep

> +
> +	if (new_rxdim && !old_rxdim) {
>  		cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
>  				     ecoalesce->rx_coalesce_usecs);
> -	} else if (!new_dim) {
> -		if (old_dim) {
> +	} else if (!new_rxdim) {
> +		if (old_rxdim) {
>  			WRITE_ONCE(lp->rx_dim_enabled, false);
>  			napi_synchronize(&lp->napi_rx);
>  			flush_work(&lp->rx_dim.work);
> @@ -2252,7 +2413,7 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
>  	}
>  
>  	axienet_update_coalesce_rx(lp, cr, mask);
> -	if (new_dim && !old_dim)
> +	if (new_rxdim && !old_rxdim)
>  		WRITE_ONCE(lp->rx_dim_enabled, true);
>  
>  	cr = axienet_calc_cr(lp, ecoalesce->tx_max_coalesced_frames,
> @@ -2496,7 +2657,7 @@ axienet_ethtool_get_rmon_stats(struct net_device *dev,
>  static const struct ethtool_ops axienet_ethtool_ops = {
>  	.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
>  				     ETHTOOL_COALESCE_USECS |
> -				     ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
> +				     ETHTOOL_COALESCE_USE_ADAPTIVE,
>  	.get_drvinfo    = axienet_ethtools_get_drvinfo,
>  	.get_regs_len   = axienet_ethtools_get_regs_len,
>  	.get_regs       = axienet_ethtools_get_regs,
> @@ -3041,7 +3202,14 @@ static int axienet_probe(struct platform_device *pdev)
>  
>  	spin_lock_init(&lp->rx_cr_lock);
>  	spin_lock_init(&lp->tx_cr_lock);
> -	INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work);
> +	if (lp->use_dmaengine) {
> +		INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work_dmaengine);
> +		INIT_WORK(&lp->tx_dim.work, axienet_tx_dim_work_dmaengine);
> +		lp->tx_dim_enabled = true;
> +		lp->tx_dim.profile_ix = 1;
> +	} else {
> +		INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work);
> +	}
>  	lp->rx_dim_enabled = true;
>  	lp->rx_dim.profile_ix = 1;
>  	lp->rx_dma_cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-11 16:26   ` Subbaraya Sundeep
@ 2025-07-11 20:13     ` Gupta, Suraj
  2025-07-12  5:36       ` Subbaraya Sundeep
  0 siblings, 1 reply; 16+ messages in thread
From: Gupta, Suraj @ 2025-07-11 20:13 UTC (permalink / raw)
  To: Subbaraya Sundeep
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Simek, Michal, vkoul@kernel.org,
	Pandey, Radhey Shyam, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	Katakam, Harini

[Public]

> -----Original Message-----
> From: Subbaraya Sundeep <sbhatta@marvell.com>
> Sent: Friday, July 11, 2025 9:56 PM
> To: Gupta, Suraj <Suraj.Gupta2@amd.com>
> Cc: andrew+netdev@lunn.ch; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Simek, Michal <michal.simek@amd.com>; vkoul@kernel.org;
> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>;
> netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; dmaengine@vger.kernel.org; Katakam, Harini
> <harini.katakam@amd.com>
> Subject: Re: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer
> path for AXI DMA
>
> On 2025-07-10 at 10:12:27, Suraj Gupta (suraj.gupta2@amd.com) wrote:
> > AXI DMA driver incorrectly assumes complete transfer completion upon
> > IRQ reception, particularly problematic when IRQ coalescing is active.
> > Updating the tail pointer dynamically fixes it.
> > Remove existing idle state validation in the beginning of
> > xilinx_dma_start_transfer() as it blocks valid transfer initiation on
> > busy channels with queued descriptors.
> > Additionally, refactor xilinx_dma_start_transfer() to consolidate
> > coalesce and delay configurations while conditionally starting
> > channels only when idle.
> >
> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> > Fixes: Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx
> > AXI Direct Memory Access Engine")
>
> > Your series looks like net-next material and this one is fixing an existing bug.
> > Send this one patch separately to net.
> > Also include net or net-next in the subject.
>
> Thanks,
> Sundeep

This series is more of a dmaengine series, as we're enabling coalesce parameter configuration via the dmaengine framework. AXI Ethernet is just an example client using AXI DMA.

I sent V1 as separate patches to the net-next and dmaengine mailing lists and got a suggestion[1] to send them as a single series for better review, so I didn't use a net-specific subject prefix.
[1]: https://lore.kernel.org/all/d5be7218-8ec1-4208-ac24-94d4831bfdb6@linux.dev/

Regards,
Suraj

> > ---
> >  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/dma/xilinx/xilinx_dma.c
> > b/drivers/dma/xilinx/xilinx_dma.c index a34d8f0ceed8..187749b7b8a6
> > 100644
> > --- a/drivers/dma/xilinx/xilinx_dma.c
> > +++ b/drivers/dma/xilinx/xilinx_dma.c
> > @@ -1548,9 +1548,6 @@ static void xilinx_dma_start_transfer(struct
> xilinx_dma_chan *chan)
> >       if (list_empty(&chan->pending_list))
> >               return;
> >
> > -     if (!chan->idle)
> > -             return;
> > -
> >       head_desc = list_first_entry(&chan->pending_list,
> >                                    struct xilinx_dma_tx_descriptor, node);
> >       tail_desc = list_last_entry(&chan->pending_list,
> > @@ -1558,23 +1555,24 @@ static void xilinx_dma_start_transfer(struct
> xilinx_dma_chan *chan)
> >       tail_segment = list_last_entry(&tail_desc->segments,
> >                                      struct xilinx_axidma_tx_segment,
> > node);
> >
> > +     if (chan->has_sg && list_empty(&chan->active_list))
> > +             xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> > +                          head_desc->async_tx.phys);
> > +
> >       reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
> >
> >       if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
> >               reg &= ~XILINX_DMA_CR_COALESCE_MAX;
> >               reg |= chan->desc_pendingcount <<
> >                                 XILINX_DMA_CR_COALESCE_SHIFT;
> > -             dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
> >       }
> >
> > -     if (chan->has_sg)
> > -             xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> > -                          head_desc->async_tx.phys);
> >       reg  &= ~XILINX_DMA_CR_DELAY_MAX;
> >       reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
> >       dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
> >
> > -     xilinx_dma_start(chan);
> > +     if (chan->idle)
> > +             xilinx_dma_start(chan);
> >
> >       if (chan->err)
> >               return;
> > @@ -1914,8 +1912,10 @@ static irqreturn_t xilinx_dma_irq_handler(int irq, void
> *data)
> >                     XILINX_DMA_DMASR_DLY_CNT_IRQ)) {
> >               spin_lock(&chan->lock);
> >               xilinx_dma_complete_descriptor(chan);
> > -             chan->idle = true;
> > -             chan->start_transfer(chan);
> > +             if (list_empty(&chan->active_list)) {
> > +                     chan->idle = true;
> > +                     chan->start_transfer(chan);
> > +             }
> >               spin_unlock(&chan->lock);
> >       }
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-11 20:13     ` Gupta, Suraj
@ 2025-07-12  5:36       ` Subbaraya Sundeep
  0 siblings, 0 replies; 16+ messages in thread
From: Subbaraya Sundeep @ 2025-07-12  5:36 UTC (permalink / raw)
  To: Gupta, Suraj
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Simek, Michal, vkoul@kernel.org,
	Pandey, Radhey Shyam, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	Katakam, Harini

On 2025-07-11 at 20:13:40, Gupta, Suraj (Suraj.Gupta2@amd.com) wrote:
> > -----Original Message-----
> > From: Subbaraya Sundeep <sbhatta@marvell.com>
> > Sent: Friday, July 11, 2025 9:56 PM
> > To: Gupta, Suraj <Suraj.Gupta2@amd.com>
> > Cc: andrew+netdev@lunn.ch; davem@davemloft.net; kuba@kernel.org;
> > pabeni@redhat.com; Simek, Michal <michal.simek@amd.com>; vkoul@kernel.org;
> > Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>;
> > netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> > kernel@vger.kernel.org; dmaengine@vger.kernel.org; Katakam, Harini
> > <harini.katakam@amd.com>
> > Subject: Re: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer
> > path for AXI DMA
> >
> > On 2025-07-10 at 10:12:27, Suraj Gupta (suraj.gupta2@amd.com) wrote:
> > > AXI DMA driver incorrectly assumes complete transfer completion upon
> > > IRQ reception, particularly problematic when IRQ coalescing is active.
> > > Updating the tail pointer dynamically fixes it.
> > > Remove existing idle state validation in the beginning of
> > > xilinx_dma_start_transfer() as it blocks valid transfer initiation on
> > > busy channels with queued descriptors.
> > > Additionally, refactor xilinx_dma_start_transfer() to consolidate
> > > coalesce and delay configurations while conditionally starting
> > > channels only when idle.
> > >
> > > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> > > Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx
> > > AXI Direct Memory Access Engine")
> >
> > You series looks like net-next material and this one is fixing some existing bug. Send
> > this one patch seperately to net.
> > Also include net or net-next in subject.
> >
> > Thanks,
> > Sundeep
> 
> This series is more of DMAengine as we're enabling coalesce parameters configuration via DMAengine framework. AXI ethernet is just an example client using AXI DMA.
> 
> I sent V1 as separate patches for net-next and dmaengine mailing list and got suggestion[1] to send them as single series for better review, so I didn't used net specific subject prefix.
> [1]: https://lore.kernel.org/all/d5be7218-8ec1-4208-ac24-94d4831bfdb6@linux.dev/

My bad, I forgot that AXI ethernet and DMA are different IPs.

Thanks,
Sundeep
> 
> Regards,
> Suraj
> 
> > > ---
> > >  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
> > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/dma/xilinx/xilinx_dma.c
> > > b/drivers/dma/xilinx/xilinx_dma.c index a34d8f0ceed8..187749b7b8a6
> > > 100644
> > > --- a/drivers/dma/xilinx/xilinx_dma.c
> > > +++ b/drivers/dma/xilinx/xilinx_dma.c
> > > @@ -1548,9 +1548,6 @@ static void xilinx_dma_start_transfer(struct
> > xilinx_dma_chan *chan)
> > >       if (list_empty(&chan->pending_list))
> > >               return;
> > >
> > > -     if (!chan->idle)
> > > -             return;
> > > -
> > >       head_desc = list_first_entry(&chan->pending_list,
> > >                                    struct xilinx_dma_tx_descriptor, node);
> > >       tail_desc = list_last_entry(&chan->pending_list,
> > > @@ -1558,23 +1555,24 @@ static void xilinx_dma_start_transfer(struct
> > xilinx_dma_chan *chan)
> > >       tail_segment = list_last_entry(&tail_desc->segments,
> > >                                      struct xilinx_axidma_tx_segment,
> > > node);
> > >
> > > +     if (chan->has_sg && list_empty(&chan->active_list))
> > > +             xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> > > +                          head_desc->async_tx.phys);
> > > +
> > >       reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
> > >
> > >       if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
> > >               reg &= ~XILINX_DMA_CR_COALESCE_MAX;
> > >               reg |= chan->desc_pendingcount <<
> > >                                 XILINX_DMA_CR_COALESCE_SHIFT;
> > > -             dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
> > >       }
> > >
> > > -     if (chan->has_sg)
> > > -             xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> > > -                          head_desc->async_tx.phys);
> > >       reg  &= ~XILINX_DMA_CR_DELAY_MAX;
> > >       reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
> > >       dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
> > >
> > > -     xilinx_dma_start(chan);
> > > +     if (chan->idle)
> > > +             xilinx_dma_start(chan);
> > >
> > >       if (chan->err)
> > >               return;
> > > @@ -1914,8 +1912,10 @@ static irqreturn_t xilinx_dma_irq_handler(int irq, void
> > *data)
> > >                     XILINX_DMA_DMASR_DLY_CNT_IRQ)) {
> > >               spin_lock(&chan->lock);
> > >               xilinx_dma_complete_descriptor(chan);
> > > -             chan->idle = true;
> > > -             chan->start_transfer(chan);
> > > +             if (list_empty(&chan->active_list)) {
> > > +                     chan->idle = true;
> > > +                     chan->start_transfer(chan);
> > > +             }
> > >               spin_unlock(&chan->lock);
> > >       }
> > >
> > > --
> > > 2.25.1
> > >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA
  2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
                     ` (2 preceding siblings ...)
  2025-07-11 16:26   ` Subbaraya Sundeep
@ 2025-07-15 11:05   ` Pandey, Radhey Shyam
  3 siblings, 0 replies; 16+ messages in thread
From: Pandey, Radhey Shyam @ 2025-07-15 11:05 UTC (permalink / raw)
  To: Gupta, Suraj, andrew+netdev@lunn.ch, davem@davemloft.net,
	kuba@kernel.org, pabeni@redhat.com, Simek, Michal,
	vkoul@kernel.org
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	Katakam, Harini

> -----Original Message-----
> From: Suraj Gupta <suraj.gupta2@amd.com>
> Sent: Thursday, July 10, 2025 3:42 PM
> To: andrew+netdev@lunn.ch; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Simek, Michal <michal.simek@amd.com>; vkoul@kernel.org;
> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; dmaengine@vger.kernel.org; Katakam, Harini
> <harini.katakam@amd.com>
> Subject: [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer
> path for AXI DMA

Mention a short summary of what you are fixing, i.e. allow queuing up multiple
DMA transactions in the running state, or something like that.

>
> AXI DMA driver incorrectly assumes complete transfer completion upon IRQ
> reception, particularly problematic when IRQ coalescing is active.
> Updating the tail pointer dynamically fixes it.
> Remove existing idle state validation in the beginning of
> xilinx_dma_start_transfer() as it blocks valid transfer initiation on busy channels with
> queued descriptors.
> Additionally, refactor xilinx_dma_start_transfer() to consolidate coalesce and delay
> configurations while conditionally starting channels only when idle.

These two refactors should go in a separate patch, as they are optimizing the existing flow.
>
> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct
> Memory Access Engine")

Not a real bug, but this has been the implementation from the start: subsequent queueing
was not allowed on a running DMA. The same is the case for the other DMA variants.

> ---
>  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c index
> a34d8f0ceed8..187749b7b8a6 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -1548,9 +1548,6 @@ static void xilinx_dma_start_transfer(struct
> xilinx_dma_chan *chan)
>       if (list_empty(&chan->pending_list))
>               return;
>
> -     if (!chan->idle)
> -             return;
> -
>       head_desc = list_first_entry(&chan->pending_list,
>                                    struct xilinx_dma_tx_descriptor, node);
>       tail_desc = list_last_entry(&chan->pending_list,
> @@ -1558,23 +1555,24 @@ static void xilinx_dma_start_transfer(struct
> xilinx_dma_chan *chan)
>       tail_segment = list_last_entry(&tail_desc->segments,
>                                      struct xilinx_axidma_tx_segment, node);
>
> +     if (chan->has_sg && list_empty(&chan->active_list))

Can you also use chan->idle as the equivalent of an empty active list?
But it is fine as is, too.


> +             xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> +                          head_desc->async_tx.phys);
> +
>       reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
>
>       if (chan->desc_pendingcount <= XILINX_DMA_COALESCE_MAX) {
>               reg &= ~XILINX_DMA_CR_COALESCE_MAX;
>               reg |= chan->desc_pendingcount <<
>                                 XILINX_DMA_CR_COALESCE_SHIFT;
> -             dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);

This seems to be an unrelated change. Please consider moving this optimization to a
separate patch.

>       }
>
> -     if (chan->has_sg)
> -             xilinx_write(chan, XILINX_DMA_REG_CURDESC,
> -                          head_desc->async_tx.phys);
>       reg  &= ~XILINX_DMA_CR_DELAY_MAX;
>       reg  |= chan->irq_delay << XILINX_DMA_CR_DELAY_SHIFT;
>       dma_ctrl_write(chan, XILINX_DMA_REG_DMACR, reg);
>
> -     xilinx_dma_start(chan);
> +     if (chan->idle)
> +             xilinx_dma_start(chan);

Same as above.

>
>       if (chan->err)
>               return;
> @@ -1914,8 +1912,10 @@ static irqreturn_t xilinx_dma_irq_handler(int irq, void
> *data)
>                     XILINX_DMA_DMASR_DLY_CNT_IRQ)) {
>               spin_lock(&chan->lock);
>               xilinx_dma_complete_descriptor(chan);
> -             chan->idle = true;
> -             chan->start_transfer(chan);
> +             if (list_empty(&chan->active_list)) {
> +                     chan->idle = true;
> +                     chan->start_transfer(chan);
> +             }
>               spin_unlock(&chan->lock);
>       }
>
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters
  2025-07-10 10:12 ` [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters Suraj Gupta
@ 2025-07-23  7:30   ` Vinod Koul
  2025-07-23 11:49     ` Gupta, Suraj
  0 siblings, 1 reply; 16+ messages in thread
From: Vinod Koul @ 2025-07-23  7:30 UTC (permalink / raw)
  To: Suraj Gupta
  Cc: andrew+netdev, davem, kuba, pabeni, michal.simek,
	radhey.shyam.pandey, netdev, linux-arm-kernel, linux-kernel,
	dmaengine, harini.katakam

On 10-07-25, 15:42, Suraj Gupta wrote:
> Interrupt coalescing is a mechanism to reduce the number of hardware
> interrupts triggered ether until a certain amount of work is pending,
> or a timeout timer triggers. Tuning the interrupt coalesce settings
> involves adjusting the amount of work and timeout delay.
> Many DMA controllers support to configure coalesce count and delay.
> Add support to configure them via dma_slave_config and read
> using dma_slave_caps.
> 
> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> ---
>  include/linux/dmaengine.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index bb146c5ac3e4..c7c1adb8e571 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -431,6 +431,9 @@ enum dma_slave_buswidth {
>   * @peripheral_config: peripheral configuration for programming peripheral
>   * for dmaengine transfer
>   * @peripheral_size: peripheral configuration buffer size
> + * @coalesce_cnt: Maximum number of transfers before receiving an interrupt.
> + * @coalesce_usecs: How many usecs to delay an interrupt after a transfer
> + * is completed.
>   *
>   * This struct is passed in as configuration data to a DMA engine
>   * in order to set up a certain channel for DMA transport at runtime.
> @@ -457,6 +460,8 @@ struct dma_slave_config {
>  	bool device_fc;
>  	void *peripheral_config;
>  	size_t peripheral_size;
> +	u32 coalesce_cnt;
> +	u32 coalesce_usecs;
>  };
>  
>  /**
> @@ -507,6 +512,9 @@ enum dma_residue_granularity {
>   * @residue_granularity: granularity of the reported transfer residue
>   * @descriptor_reuse: if a descriptor can be reused by client and
>   * resubmitted multiple times
> + * @coalesce_cnt: Maximum number of transfers before receiving an interrupt.
> + * @coalesce_usecs: How many usecs to delay an interrupt after a transfer
> + * is completed.
>   */
>  struct dma_slave_caps {
>  	u32 src_addr_widths;
> @@ -520,6 +528,8 @@ struct dma_slave_caps {
>  	bool cmd_terminate;
>  	enum dma_residue_granularity residue_granularity;
>  	bool descriptor_reuse;
> +	u32 coalesce_cnt;
> +	u32 coalesce_usecs;

Why not selectively set interrupts for the descriptor? The DMA
descriptors are in order, so once a descriptor is notified and completed,
you can also complete the descriptors before it. I would suggest using
that rather than defining a new interface for this.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters
  2025-07-23  7:30   ` Vinod Koul
@ 2025-07-23 11:49     ` Gupta, Suraj
  2025-08-25  6:17       ` Gupta, Suraj
  2025-08-25 11:30       ` Vinod Koul
  0 siblings, 2 replies; 16+ messages in thread
From: Gupta, Suraj @ 2025-07-23 11:49 UTC (permalink / raw)
  To: Vinod Koul
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Simek, Michal, Pandey, Radhey Shyam,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	Katakam, Harini

Hi Vinod,

> -----Original Message-----
> From: Vinod Koul <vkoul@kernel.org>
> Sent: Wednesday, July 23, 2025 1:00 PM
> To: Gupta, Suraj <Suraj.Gupta2@amd.com>
> Cc: andrew+netdev@lunn.ch; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Simek, Michal <michal.simek@amd.com>; Pandey, Radhey
> Shyam <radhey.shyam.pandey@amd.com>; netdev@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> dmaengine@vger.kernel.org; Katakam, Harini <harini.katakam@amd.com>
> Subject: Re: [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ
> coalescing parameters
>
> On 10-07-25, 15:42, Suraj Gupta wrote:
> > Interrupt coalescing is a mechanism to reduce the number of hardware
> > interrupts triggered ether until a certain amount of work is pending,
> > or a timeout timer triggers. Tuning the interrupt coalesce settings
> > involves adjusting the amount of work and timeout delay.
> > Many DMA controllers support to configure coalesce count and delay.
> > Add support to configure them via dma_slave_config and read using
> > dma_slave_caps.
> >
> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> > ---
> >  include/linux/dmaengine.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index bb146c5ac3e4..c7c1adb8e571 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -431,6 +431,9 @@ enum dma_slave_buswidth {
> >   * @peripheral_config: peripheral configuration for programming peripheral
> >   * for dmaengine transfer
> >   * @peripheral_size: peripheral configuration buffer size
> > + * @coalesce_cnt: Maximum number of transfers before receiving an interrupt.
> > + * @coalesce_usecs: How many usecs to delay an interrupt after a
> > + transfer
> > + * is completed.
> >   *
> >   * This struct is passed in as configuration data to a DMA engine
> >   * in order to set up a certain channel for DMA transport at runtime.
> > @@ -457,6 +460,8 @@ struct dma_slave_config {
> >       bool device_fc;
> >       void *peripheral_config;
> >       size_t peripheral_size;
> > +     u32 coalesce_cnt;
> > +     u32 coalesce_usecs;
> >  };
> >
> >  /**
> > @@ -507,6 +512,9 @@ enum dma_residue_granularity {
> >   * @residue_granularity: granularity of the reported transfer residue
> >   * @descriptor_reuse: if a descriptor can be reused by client and
> >   * resubmitted multiple times
> > + * @coalesce_cnt: Maximum number of transfers before receiving an interrupt.
> > + * @coalesce_usecs: How many usecs to delay an interrupt after a
> > + transfer
> > + * is completed.
> >   */
> >  struct dma_slave_caps {
> >       u32 src_addr_widths;
> > @@ -520,6 +528,8 @@ struct dma_slave_caps {
> >       bool cmd_terminate;
> >       enum dma_residue_granularity residue_granularity;
> >       bool descriptor_reuse;
> > +     u32 coalesce_cnt;
> > +     u32 coalesce_usecs;
>
> Why not selectively set interrupts for the descriptor. The dma descriptors are in order,
> so one a descriptor is notified and complete, you can also complete the descriptors
> before that. I would suggest to use that rather than define a new interface for this
>
> --
> ~Vinod

The reason I used struct dma_slave_config to pass the coalesce and delay information to the DMA driver is that the coalesce count is configured per channel in the AXI DMA channel control register[1].
The AXI DMA IP doesn't have a provision to request an interrupt per descriptor[2].
I can explore other ways to pass this information, for example via struct dma_async_tx_descriptor or metadata; a rough sketch of the client-side usage I have in mind is included after the references below.
Please let me know your thoughts.

References:
[1]: https://docs.amd.com/r/en-US/pg021_axi_dma/MM2S_DMACR-MM2S-DMA-Control-Register-Offset-00h ("IRQ Threshold" and "IRQ Delay" fields)
[2]: https://docs.amd.com/r/en-US/pg021_axi_dma/Scatter-Gather-Descriptor
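
To make the intended usage concrete, here is a minimal, untested sketch of how a client such as axienet could program and read back the proposed fields from its ethtool coalesce handlers. Only the dmaengine calls and the coalesce_cnt/coalesce_usecs fields added in patch 1 are real; the helper names are illustrative, and the rest of the slave config (direction, addresses, widths) is left out for brevity.

/* Illustrative only: program the per-channel coalesce settings proposed
 * in this series; needs <linux/dmaengine.h>.
 */
static int example_set_coalesce(struct dma_chan *chan, u32 frames, u32 usecs)
{
	struct dma_slave_config cfg = { };

	cfg.coalesce_cnt = frames;	/* IRQ threshold, per channel */
	cfg.coalesce_usecs = usecs;	/* IRQ delay timeout */

	return dmaengine_slave_config(chan, &cfg);
}

/* Illustrative only: report the current values back, e.g. for ethtool -c. */
static int example_get_coalesce(struct dma_chan *chan, u32 *frames, u32 *usecs)
{
	struct dma_slave_caps caps;
	int ret = dma_get_slave_caps(chan, &caps);

	if (ret)
		return ret;

	*frames = caps.coalesce_cnt;
	*usecs = caps.coalesce_usecs;
	return 0;
}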

Thanks,
Suraj

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters
  2025-07-23 11:49     ` Gupta, Suraj
@ 2025-08-25  6:17       ` Gupta, Suraj
  2025-08-25 11:30       ` Vinod Koul
  1 sibling, 0 replies; 16+ messages in thread
From: Gupta, Suraj @ 2025-08-25  6:17 UTC (permalink / raw)
  To: Vinod Koul
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Simek, Michal, Pandey, Radhey Shyam,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	Katakam, Harini

> -----Original Message-----
> From: Gupta, Suraj <Suraj.Gupta2@amd.com>
> Sent: Wednesday, July 23, 2025 5:19 PM
> To: Vinod Koul <vkoul@kernel.org>
> Cc: andrew+netdev@lunn.ch; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Simek, Michal <michal.simek@amd.com>; Pandey,
> Radhey Shyam <radhey.shyam.pandey@amd.com>; netdev@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> dmaengine@vger.kernel.org; Katakam, Harini <harini.katakam@amd.com>
> Subject: RE: [PATCH V2 1/4] dmaengine: Add support to configure and read
> IRQ coalescing parameters
>
> Hi Vinod,
>
> > -----Original Message-----
> > From: Vinod Koul <vkoul@kernel.org>
> > Sent: Wednesday, July 23, 2025 1:00 PM
> > To: Gupta, Suraj <Suraj.Gupta2@amd.com>
> > Cc: andrew+netdev@lunn.ch; davem@davemloft.net; kuba@kernel.org;
> > pabeni@redhat.com; Simek, Michal <michal.simek@amd.com>; Pandey,
> > Radhey Shyam <radhey.shyam.pandey@amd.com>;
> netdev@vger.kernel.org;
> > linux-arm- kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> > dmaengine@vger.kernel.org; Katakam, Harini <harini.katakam@amd.com>
> > Subject: Re: [PATCH V2 1/4] dmaengine: Add support to configure and
> > read IRQ coalescing parameters
> >
> > On 10-07-25, 15:42, Suraj Gupta wrote:
> > > Interrupt coalescing is a mechanism to reduce the number of hardware
> > > interrupts triggered ether until a certain amount of work is
> > > pending, or a timeout timer triggers. Tuning the interrupt coalesce
> > > settings involves adjusting the amount of work and timeout delay.
> > > Many DMA controllers support to configure coalesce count and delay.
> > > Add support to configure them via dma_slave_config and read using
> > > dma_slave_caps.
> > >
> > > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> > > ---
> > >  include/linux/dmaengine.h | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > > index bb146c5ac3e4..c7c1adb8e571 100644
> > > --- a/include/linux/dmaengine.h
> > > +++ b/include/linux/dmaengine.h
> > > @@ -431,6 +431,9 @@ enum dma_slave_buswidth {
> > >   * @peripheral_config: peripheral configuration for programming
> peripheral
> > >   * for dmaengine transfer
> > >   * @peripheral_size: peripheral configuration buffer size
> > > + * @coalesce_cnt: Maximum number of transfers before receiving an
> interrupt.
> > > + * @coalesce_usecs: How many usecs to delay an interrupt after a
> > > + transfer
> > > + * is completed.
> > >   *
> > >   * This struct is passed in as configuration data to a DMA engine
> > >   * in order to set up a certain channel for DMA transport at runtime.
> > > @@ -457,6 +460,8 @@ struct dma_slave_config {
> > >       bool device_fc;
> > >       void *peripheral_config;
> > >       size_t peripheral_size;
> > > +     u32 coalesce_cnt;
> > > +     u32 coalesce_usecs;
> > >  };
> > >
> > >  /**
> > > @@ -507,6 +512,9 @@ enum dma_residue_granularity {
> > >   * @residue_granularity: granularity of the reported transfer residue
> > >   * @descriptor_reuse: if a descriptor can be reused by client and
> > >   * resubmitted multiple times
> > > + * @coalesce_cnt: Maximum number of transfers before receiving an
> interrupt.
> > > + * @coalesce_usecs: How many usecs to delay an interrupt after a
> > > + transfer
> > > + * is completed.
> > >   */
> > >  struct dma_slave_caps {
> > >       u32 src_addr_widths;
> > > @@ -520,6 +528,8 @@ struct dma_slave_caps {
> > >       bool cmd_terminate;
> > >       enum dma_residue_granularity residue_granularity;
> > >       bool descriptor_reuse;
> > > +     u32 coalesce_cnt;
> > > +     u32 coalesce_usecs;
> >
> > Why not selectively set interrupts for the descriptor. The dma
> > descriptors are in order, so one a descriptor is notified and
> > complete, you can also complete the descriptors before that. I would
> > suggest to use that rather than define a new interface for this
> >
> > --
> > ~Vinod
>
> The reason I used struct dma_slave_config to pass coalesce and delay
> information to DMA driver is that the coalesce count is configured per
> channel in AXI DMA channel control register[1].
> AXI DMA IP doesn't have provision to set interrupt per descriptor[2].
> I can explore other ways to pass this information via struct
> dma_async_tx_descriptor or metadata, or any other way.
> Please let me know your thoughts.
>
> References:
> [1]: https://docs.amd.com/r/en-US/pg021_axi_dma/MM2S_DMACR-MM2S-
> DMA-Control-Register-Offset-00h ("IRQ Threshold" and "IRQ Delay" fields)
> [2]: https://docs.amd.com/r/en-US/pg021_axi_dma/Scatter-Gather-
> Descriptor
>
> Thanks,
> Suraj

Hi Vinod,

I've added a rationale above explaining why I opted to implement a new interface instead of using the interrupt flag in struct dma_async_tx_descriptor.
Please let me know your thoughts when you get a chance.

Regards,
Suraj

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters
  2025-07-23 11:49     ` Gupta, Suraj
  2025-08-25  6:17       ` Gupta, Suraj
@ 2025-08-25 11:30       ` Vinod Koul
  1 sibling, 0 replies; 16+ messages in thread
From: Vinod Koul @ 2025-08-25 11:30 UTC (permalink / raw)
  To: Gupta, Suraj
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Simek, Michal, Pandey, Radhey Shyam,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	Katakam, Harini

On 23-07-25, 11:49, Gupta, Suraj wrote:
> > >  struct dma_slave_caps {
> > >       u32 src_addr_widths;
> > > @@ -520,6 +528,8 @@ struct dma_slave_caps {
> > >       bool cmd_terminate;
> > >       enum dma_residue_granularity residue_granularity;
> > >       bool descriptor_reuse;
> > > +     u32 coalesce_cnt;
> > > +     u32 coalesce_usecs;
> >
> > Why not selectively set interrupts for the descriptor. The dma descriptors are in order,
> > so one a descriptor is notified and complete, you can also complete the descriptors
> > before that. I would suggest to use that rather than define a new interface for this
> >
> 
> The reason I used struct dma_slave_config to pass coalesce and delay information to DMA driver is that the coalesce count is configured per channel in AXI DMA channel control register[1].
> AXI DMA IP doesn't have provision to set interrupt per descriptor[2].
> I can explore other ways to pass this information via struct dma_async_tx_descriptor or metadata, or any other way.
> Please let me know your thoughts.

struct dma_async_tx_descriptor takes dma_ctrl_flags, and one of them is
DMA_PREP_INTERRUPT, which you can set for a descriptor to control when
you get the interrupt.
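
Something along these lines (a rough, untested sketch; the helper and its parameters are made up, only the dmaengine calls and flags are real) is what I mean:

/* Rough sketch: request an interrupt only on every Nth descriptor and
 * let its completion also complete the earlier ones.
 */
static int example_queue_rx(struct dma_chan *chan, struct scatterlist *sgl,
			    unsigned int nents, unsigned int index,
			    unsigned int coalesce_frames)
{
	struct dma_async_tx_descriptor *desc;
	unsigned long flags = DMA_CTRL_ACK;
	dma_cookie_t cookie;

	if ((index % coalesce_frames) == 0)
		flags |= DMA_PREP_INTERRUPT;

	desc = dmaengine_prep_slave_sg(chan, sgl, nents, DMA_DEV_TO_MEM, flags);
	if (!desc)
		return -ENOMEM;

	cookie = dmaengine_submit(desc);
	return dma_submit_error(cookie) ? -EIO : 0;
}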

I am not a fan of adding custom interfaces.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-25 11:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-10 10:12 [PATCH V2 0/4] Add ethtool support to configure irq coalescing count and delay Suraj Gupta
2025-07-10 10:12 ` [PATCH V2 1/4] dmaengine: Add support to configure and read IRQ coalescing parameters Suraj Gupta
2025-07-23  7:30   ` Vinod Koul
2025-07-23 11:49     ` Gupta, Suraj
2025-08-25  6:17       ` Gupta, Suraj
2025-08-25 11:30       ` Vinod Koul
2025-07-10 10:12 ` [PATCH V2 2/4] dmaengine: xilinx_dma: Fix irq handler and start transfer path for AXI DMA Suraj Gupta
2025-07-10 11:26   ` Simon Horman
2025-07-11  5:32   ` Folker Schwesinger
2025-07-11 16:26   ` Subbaraya Sundeep
2025-07-11 20:13     ` Gupta, Suraj
2025-07-12  5:36       ` Subbaraya Sundeep
2025-07-15 11:05   ` Pandey, Radhey Shyam
2025-07-10 10:12 ` [PATCH V2 3/4] dmaengine: xilinx_dma: Add support to configure/report coalesce parameters from/to client using " Suraj Gupta
2025-07-10 10:12 ` [PATCH V2 4/4] net: xilinx: axienet: Add ethtool support to configure/report irq coalescing parameters in DMAengine flow Suraj Gupta
2025-07-11 16:33   ` Subbaraya Sundeep
