public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode
@ 2025-11-25 10:06 Carlos Song
  2025-11-25 10:06 ` [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, shawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

ECSPI has low throughput because it lacks dynamic burst support: in DMA
mode it transfers only one word per frame, causing SCLK stalls between
words due to BURST_LENGTH updates.

This patch set adds support for the ECSPI dynamic burst feature to
improve ECSPI DMA mode performance.

Performance test (spidev_test @10MHz, 4KB):
  Before: tx/rx ~6651.9 kbps
  After:  tx/rx ~9922.2 kbps (~50% improvement)

For compatibility with slow SPI devices, add a configurable word delay in
DMA mode. When a word delay is set, dynamic burst is disabled and
BURST_LENGTH equals the word length.

Also support target DMA mode with dynamic burst enabled.

Carlos Song (6):
  spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer()
  spi: imx: introduce helper to clear DMA mode logic
  spi: imx: avoid dmaengine_terminate_all() on TX prep failure
  spi: imx: handle DMA submission errors with dma_submit_error()
  spi: imx: support dynamic burst length for ECSPI DMA mode
  spi: imx: enable DMA mode for target operation

 drivers/spi/spi-imx.c | 620 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 513 insertions(+), 107 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer()
  2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
@ 2025-11-25 10:06 ` Carlos Song
  2025-11-25 15:32   ` Frank Li
  2025-11-25 10:06 ` [PATCH 2/6] spi: imx: introduce helper to clear DMA mode logic Carlos Song
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, shawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

Relocate spi_imx_dma_configure() next to spi_imx_dma_transfer() so that
all DMA-related functions are grouped together for better readability.
No functional changes.

Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
 drivers/spi/spi-imx.c | 88 +++++++++++++++++++++----------------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index b8b79bb7fec3..e78e02a84b50 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1282,50 +1282,6 @@ static irqreturn_t spi_imx_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static int spi_imx_dma_configure(struct spi_controller *controller)
-{
-	int ret;
-	enum dma_slave_buswidth buswidth;
-	struct dma_slave_config rx = {}, tx = {};
-	struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
-
-	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
-	case 4:
-		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
-		break;
-	case 2:
-		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
-		break;
-	case 1:
-		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	tx.direction = DMA_MEM_TO_DEV;
-	tx.dst_addr = spi_imx->base_phys + MXC_CSPITXDATA;
-	tx.dst_addr_width = buswidth;
-	tx.dst_maxburst = spi_imx->wml;
-	ret = dmaengine_slave_config(controller->dma_tx, &tx);
-	if (ret) {
-		dev_err(spi_imx->dev, "TX dma configuration failed with %d\n", ret);
-		return ret;
-	}
-
-	rx.direction = DMA_DEV_TO_MEM;
-	rx.src_addr = spi_imx->base_phys + MXC_CSPIRXDATA;
-	rx.src_addr_width = buswidth;
-	rx.src_maxburst = spi_imx->wml;
-	ret = dmaengine_slave_config(controller->dma_rx, &rx);
-	if (ret) {
-		dev_err(spi_imx->dev, "RX dma configuration failed with %d\n", ret);
-		return ret;
-	}
-
-	return 0;
-}
-
 static int spi_imx_setupxfer(struct spi_device *spi,
 				 struct spi_transfer *t)
 {
@@ -1481,6 +1437,50 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
 	return secs_to_jiffies(2 * timeout);
 }
 
+static int spi_imx_dma_configure(struct spi_controller *controller)
+{
+	int ret;
+	enum dma_slave_buswidth buswidth;
+	struct dma_slave_config rx = {}, tx = {};
+	struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
+
+	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
+	case 4:
+		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
+		break;
+	case 2:
+		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
+		break;
+	case 1:
+		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	tx.direction = DMA_MEM_TO_DEV;
+	tx.dst_addr = spi_imx->base_phys + MXC_CSPITXDATA;
+	tx.dst_addr_width = buswidth;
+	tx.dst_maxburst = spi_imx->wml;
+	ret = dmaengine_slave_config(controller->dma_tx, &tx);
+	if (ret) {
+		dev_err(spi_imx->dev, "TX dma configuration failed with %d\n", ret);
+		return ret;
+	}
+
+	rx.direction = DMA_DEV_TO_MEM;
+	rx.src_addr = spi_imx->base_phys + MXC_CSPIRXDATA;
+	rx.src_addr_width = buswidth;
+	rx.src_maxburst = spi_imx->wml;
+	ret = dmaengine_slave_config(controller->dma_rx, &rx);
+	if (ret) {
+		dev_err(spi_imx->dev, "RX dma configuration failed with %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
 static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
 				struct spi_transfer *transfer)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/6] spi: imx: introduce helper to clear DMA mode logic
  2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
  2025-11-25 10:06 ` [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
@ 2025-11-25 10:06 ` Carlos Song
  2025-11-25 15:41   ` Frank Li
  2025-11-25 10:06 ` [PATCH 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure Carlos Song
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, shawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

Add helper functions to clean up the DMA mode logic: one to find the
maximum watermark level and one to submit the DMA descriptors. This
refactoring makes the code more concise and improves readability.
No functional changes.
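
The watermark search the new helper performs can be shown as a standalone
sketch (illustration only, not the driver code; `max_wml` and the example
FIFO size are made up for this demonstration):

```c
#include <assert.h>

/*
 * Pick the largest burst (up to half the FIFO size) that divides the
 * DMA length evenly, falling back to 1 -- the same walk the helper
 * performs over i = fifo_size/2 .. 1.
 */
static unsigned int max_wml(unsigned int dma_len, unsigned int bytes_per_word,
			    unsigned int fifo_size)
{
	unsigned int i;

	for (i = fifo_size / 2; i > 0; i--)
		if (!(dma_len % (i * bytes_per_word)))
			return i;

	return 1;	/* no evenly dividing burst length found */
}
```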

Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
 drivers/spi/spi-imx.c | 164 +++++++++++++++++++++++-------------------
 1 file changed, 92 insertions(+), 72 deletions(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index e78e02a84b50..012f5bcbf73f 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1437,6 +1437,94 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
 	return secs_to_jiffies(2 * timeout);
 }
 
+static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
+			      struct spi_transfer *transfer)
+{
+	struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
+	struct spi_controller *controller = spi_imx->controller;
+	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
+	unsigned long transfer_timeout;
+	unsigned long time_left;
+
+	/*
+	 * The TX DMA setup starts the transfer, so make sure RX is configured
+	 * before TX.
+	 */
+	desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
+					  rx->sgl, rx->nents, DMA_DEV_TO_MEM,
+					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	if (!desc_rx) {
+		transfer->error |= SPI_TRANS_FAIL_NO_START;
+		return -EINVAL;
+	}
+
+	desc_rx->callback = spi_imx_dma_rx_callback;
+	desc_rx->callback_param = (void *)spi_imx;
+	dmaengine_submit(desc_rx);
+	reinit_completion(&spi_imx->dma_rx_completion);
+	dma_async_issue_pending(controller->dma_rx);
+
+	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
+					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
+					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	if (!desc_tx) {
+		dmaengine_terminate_all(controller->dma_tx);
+		dmaengine_terminate_all(controller->dma_rx);
+		return -EINVAL;
+	}
+
+	desc_tx->callback = spi_imx_dma_tx_callback;
+	desc_tx->callback_param = (void *)spi_imx;
+	dmaengine_submit(desc_tx);
+	reinit_completion(&spi_imx->dma_tx_completion);
+	dma_async_issue_pending(controller->dma_tx);
+
+	spi_imx->devtype_data->trigger(spi_imx);
+
+	transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
+
+	/* Wait SDMA to finish the data transfer.*/
+	time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
+						transfer_timeout);
+	if (!time_left) {
+		dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
+		dmaengine_terminate_all(controller->dma_tx);
+		dmaengine_terminate_all(controller->dma_rx);
+		return -ETIMEDOUT;
+	}
+
+	time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
+						transfer_timeout);
+	if (!time_left) {
+		dev_err(&controller->dev, "I/O Error in DMA RX\n");
+		spi_imx->devtype_data->reset(spi_imx);
+		dmaengine_terminate_all(controller->dma_rx);
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
+				     struct spi_transfer *transfer)
+{
+	struct sg_table *rx = &transfer->rx_sg;
+	struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
+	unsigned int bytes_per_word, i;
+
+	/* Get the right burst length from the last sg to ensure no tail data */
+	bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
+	for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
+		if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
+			break;
+	}
+	/* Use 1 as wml in case no available burst length got */
+	if (i == 0)
+		i = 1;
+
+	spi_imx->wml = i;
+}
+
 static int spi_imx_dma_configure(struct spi_controller *controller)
 {
 	int ret;
@@ -1484,26 +1572,10 @@ static int spi_imx_dma_configure(struct spi_controller *controller)
 static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
 				struct spi_transfer *transfer)
 {
-	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
-	unsigned long transfer_timeout;
-	unsigned long time_left;
 	struct spi_controller *controller = spi_imx->controller;
-	struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
-	struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
-	unsigned int bytes_per_word, i;
 	int ret;
 
-	/* Get the right burst length from the last sg to ensure no tail data */
-	bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
-	for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
-		if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
-			break;
-	}
-	/* Use 1 as wml in case no available burst length got */
-	if (i == 0)
-		i = 1;
-
-	spi_imx->wml =  i;
+	spi_imx_dma_max_wml_find(spi_imx, transfer);
 
 	ret = spi_imx_dma_configure(controller);
 	if (ret)
@@ -1516,61 +1588,9 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
 	}
 	spi_imx->devtype_data->setup_wml(spi_imx);
 
-	/*
-	 * The TX DMA setup starts the transfer, so make sure RX is configured
-	 * before TX.
-	 */
-	desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
-				rx->sgl, rx->nents, DMA_DEV_TO_MEM,
-				DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-	if (!desc_rx) {
-		ret = -EINVAL;
-		goto dma_failure_no_start;
-	}
-
-	desc_rx->callback = spi_imx_dma_rx_callback;
-	desc_rx->callback_param = (void *)spi_imx;
-	dmaengine_submit(desc_rx);
-	reinit_completion(&spi_imx->dma_rx_completion);
-	dma_async_issue_pending(controller->dma_rx);
-
-	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
-				tx->sgl, tx->nents, DMA_MEM_TO_DEV,
-				DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-	if (!desc_tx) {
-		dmaengine_terminate_all(controller->dma_tx);
-		dmaengine_terminate_all(controller->dma_rx);
-		return -EINVAL;
-	}
-
-	desc_tx->callback = spi_imx_dma_tx_callback;
-	desc_tx->callback_param = (void *)spi_imx;
-	dmaengine_submit(desc_tx);
-	reinit_completion(&spi_imx->dma_tx_completion);
-	dma_async_issue_pending(controller->dma_tx);
-
-	spi_imx->devtype_data->trigger(spi_imx);
-
-	transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
-
-	/* Wait SDMA to finish the data transfer.*/
-	time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
-						transfer_timeout);
-	if (!time_left) {
-		dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
-		dmaengine_terminate_all(controller->dma_tx);
-		dmaengine_terminate_all(controller->dma_rx);
-		return -ETIMEDOUT;
-	}
-
-	time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
-						transfer_timeout);
-	if (!time_left) {
-		dev_err(&controller->dev, "I/O Error in DMA RX\n");
-		spi_imx->devtype_data->reset(spi_imx);
-		dmaengine_terminate_all(controller->dma_rx);
-		return -ETIMEDOUT;
-	}
+	ret = spi_imx_dma_submit(spi_imx, transfer);
+	if (ret)
+		return ret;
 
 	return 0;
 /* fallback to pio */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure
  2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
  2025-11-25 10:06 ` [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
  2025-11-25 10:06 ` [PATCH 2/6] spi: imx: introduce helper to clear DMA mode logic Carlos Song
@ 2025-11-25 10:06 ` Carlos Song
  2025-11-25 15:42   ` Frank Li
  2025-11-25 10:06 ` [PATCH 4/6] spi: imx: handle DMA submission errors with dma_submit_error() Carlos Song
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, shawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

If dmaengine_prep_slave_sg() fails, no descriptor is submitted to the TX
channel and DMA is never started. Therefore, calling
dmaengine_terminate_all() for the TX DMA channel is unnecessary.

Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
 drivers/spi/spi-imx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 012f5bcbf73f..186963d3d2e0 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1468,7 +1468,6 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
 					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 	if (!desc_tx) {
-		dmaengine_terminate_all(controller->dma_tx);
 		dmaengine_terminate_all(controller->dma_rx);
 		return -EINVAL;
 	}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/6] spi: imx: handle DMA submission errors with dma_submit_error()
  2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
                   ` (2 preceding siblings ...)
  2025-11-25 10:06 ` [PATCH 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure Carlos Song
@ 2025-11-25 10:06 ` Carlos Song
  2025-11-25 15:45   ` Frank Li
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
  2025-11-25 10:06 ` [PATCH 6/6] spi: imx: enable DMA mode for target operation Carlos Song
  5 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, shawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

Add error handling for DMA descriptor submission by checking the
returned cookie with dma_submit_error(). This prevents a failed
submission from silently propagating through the DMA transfer path,
which could otherwise lead to unexpected behavior.
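
For reference, dma_submit_error() treats only negative cookies as errors;
a userspace stand-in for the linux/dmaengine.h helper (sketch, using a
plain int for dma_cookie_t):

```c
#include <assert.h>

typedef int dma_cookie_t;

/*
 * Stand-in for the kernel's dma_submit_error() (linux/dmaengine.h):
 * dmaengine_submit() returns a negative cookie on failure and a
 * non-negative one on success, so the check passes negative cookies
 * through as error codes and maps valid ones to 0.
 */
static inline int dma_submit_error(dma_cookie_t cookie)
{
	return cookie < 0 ? cookie : 0;
}
```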

Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
 drivers/spi/spi-imx.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 186963d3d2e0..42f64d9535c9 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1445,6 +1445,7 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
 	unsigned long transfer_timeout;
 	unsigned long time_left;
+	dma_cookie_t cookie;
 
 	/*
 	 * The TX DMA setup starts the transfer, so make sure RX is configured
@@ -1460,21 +1461,29 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 
 	desc_rx->callback = spi_imx_dma_rx_callback;
 	desc_rx->callback_param = (void *)spi_imx;
-	dmaengine_submit(desc_rx);
+	cookie = dmaengine_submit(desc_rx);
+	if (dma_submit_error(cookie)) {
+		dev_err(spi_imx->dev, "submitting DMA RX failed\n");
+		transfer->error |= SPI_TRANS_FAIL_NO_START;
+		goto dmaengine_terminate_rx;
+	}
+
 	reinit_completion(&spi_imx->dma_rx_completion);
 	dma_async_issue_pending(controller->dma_rx);
 
 	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
 					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
 					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-	if (!desc_tx) {
-		dmaengine_terminate_all(controller->dma_rx);
-		return -EINVAL;
-	}
+	if (!desc_tx)
+		goto dmaengine_terminate_rx;
 
 	desc_tx->callback = spi_imx_dma_tx_callback;
 	desc_tx->callback_param = (void *)spi_imx;
-	dmaengine_submit(desc_tx);
+	cookie = dmaengine_submit(desc_tx);
+	if (dma_submit_error(cookie)) {
+		dev_err(spi_imx->dev, "submitting DMA TX failed\n");
+		goto dmaengine_terminate_tx;
+	}
 	reinit_completion(&spi_imx->dma_tx_completion);
 	dma_async_issue_pending(controller->dma_tx);
 
@@ -1502,6 +1511,13 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 	}
 
 	return 0;
+
+dmaengine_terminate_tx:
+	dmaengine_terminate_all(controller->dma_tx);
+dmaengine_terminate_rx:
+	dmaengine_terminate_all(controller->dma_rx);
+
+	return -EINVAL;
 }
 
 static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
                   ` (3 preceding siblings ...)
  2025-11-25 10:06 ` [PATCH 4/6] spi: imx: handle DMA submission errors with dma_submit_error() Carlos Song
@ 2025-11-25 10:06 ` Carlos Song
  2025-11-25 12:10   ` Marc Kleine-Budde
                     ` (4 more replies)
  2025-11-25 10:06 ` [PATCH 6/6] spi: imx: enable DMA mode for target operation Carlos Song
  5 siblings, 5 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, shawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

ECSPI transfers only one word per frame in DMA mode, causing SCLK stalls
between words due to BURST_LENGTH updates, which significantly impacts
performance.

To improve throughput, configure BURST_LENGTH as large as possible (up to
512 bytes per frame) instead of the word length, which avoids delays
between words. When the transfer length is not 4-byte aligned, use bounce
buffers to align the data for DMA: TX pushes aligned words to the TXFIFO,
while RX trims the DMA buffer after the transfer completes.
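
The placement of unaligned data inside the first 32-bit word can be
sketched as follows (illustration only; `tx_copy_offset` is a hypothetical
helper name, the patch computes this inline):

```c
#include <assert.h>

#define BYTES_PER_32BITS_WORD	4

/*
 * With BURST_LENGTH = 32*n + m bits, ECSPI sends the m LSB bits of the
 * first 32-bit word first, so an unaligned transfer is copied into the
 * bounce buffer at an offset inside that first word; aligned transfers
 * start at offset 0.
 */
static unsigned int tx_copy_offset(unsigned int data_len)
{
	unsigned int unaligned = data_len % BYTES_PER_32BITS_WORD;

	return unaligned ? BYTES_PER_32BITS_WORD - unaligned : 0;
}
```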

Introduce a new dma_package structure to store:
  1. BURST_LENGTH values for each DMA request
  2. Variables for DMA submission
  3. DMA transmission length and actual data length

Handle three cases:
  - len <= 512 bytes: one package, BURST_LENGTH = len * 8 - 1
  - len > 512 and aligned: one package, BURST_LENGTH = max (512 bytes)
  - len > 512 and unaligned: two packages, second for tail data
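
The three cases reduce to the following selection rule (standalone
sketch; `burst_fields` is a made-up name, the real code computes this
inline in the driver):

```c
#include <assert.h>

#define MAX_BURST_BYTES	512	/* MX51_ECSPI_CTRL_MAX_BURST */
#define BITS_PER_BYTE	8

/*
 * Compute the BURST_LENGTH field value(s) for a transfer of 'len'
 * bytes. Returns the number of DMA packages (1 or 2); *tail_bl is
 * filled only in the two-package case.
 */
static int burst_fields(unsigned int len, unsigned int *pre_bl,
			unsigned int *tail_bl)
{
	if (len <= MAX_BURST_BYTES) {
		*pre_bl = len * BITS_PER_BYTE - 1;
		return 1;
	}
	if (!(len % MAX_BURST_BYTES)) {
		*pre_bl = MAX_BURST_BYTES * BITS_PER_BYTE - 1;
		return 1;
	}
	/* unaligned tail: second package carries the remainder */
	*pre_bl = MAX_BURST_BYTES * BITS_PER_BYTE - 1;
	*tail_bl = (len % MAX_BURST_BYTES) * BITS_PER_BYTE - 1;
	return 2;
}
```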

Performance test (spidev_test @10MHz, 4KB):
  Before: tx/rx ~6651.9 kbps
  After:  tx/rx ~9922.2 kbps (~50% improvement)

For compatibility with slow SPI devices, add a configurable word delay in
DMA mode. When a word delay is set, dynamic burst is disabled and
BURST_LENGTH equals the word length.
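
Independent of the word-delay path, on little-endian CPUs the patch
byte-swaps each 32-bit DMA word (bpw = 8) or swaps its half-words
(bpw = 16). Userspace stand-ins for the kernel swab32()/swahw32()
helpers it relies on (sketch for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all four bytes: 0xAABBCCDD -> 0xDDCCBBAA (bpw = 8 case). */
static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Swap the two 16-bit half-words: 0xAABBCCDD -> 0xCCDDAABB (bpw = 16 case). */
static uint32_t swahw32(uint32_t x)
{
	return (x >> 16) | (x << 16);
}
```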

Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
 drivers/spi/spi-imx.c | 409 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 373 insertions(+), 36 deletions(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 42f64d9535c9..f02a47fbba8a 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -60,6 +60,7 @@ MODULE_PARM_DESC(polling_limit_us,
 #define MX51_ECSPI_CTRL_MAX_BURST	512
 /* The maximum bytes that IMX53_ECSPI can transfer in target mode.*/
 #define MX53_MAX_TRANSFER_BYTES		512
+#define BYTES_PER_32BITS_WORD		4
 
 enum spi_imx_devtype {
 	IMX1_CSPI,
@@ -95,6 +96,16 @@ struct spi_imx_devtype_data {
 	enum spi_imx_devtype devtype;
 };
 
+struct dma_data_package {
+	u32 cmd_word;
+	void *dma_rx_buf;
+	void *dma_tx_buf;
+	dma_addr_t	dma_tx_addr;
+	dma_addr_t	dma_rx_addr;
+	int dma_len;
+	int data_len;
+};
+
 struct spi_imx_data {
 	struct spi_controller *controller;
 	struct device *dev;
@@ -130,6 +141,9 @@ struct spi_imx_data {
 	u32 wml;
 	struct completion dma_rx_completion;
 	struct completion dma_tx_completion;
+	struct dma_data_package *dma_data;
+	int dma_package_num;
+	int rx_offset;
 
 	const struct spi_imx_devtype_data *devtype_data;
 };
@@ -189,6 +203,9 @@ MXC_SPI_BUF_TX(u16)
 MXC_SPI_BUF_RX(u32)
 MXC_SPI_BUF_TX(u32)
 
+/* Align to cache line to avoid swiotlb bounce */
+#define DMA_CACHE_ALIGNED_LEN(x) ALIGN((x), dma_get_cache_alignment())
+
 /* First entry is reserved, second entry is valid only if SDHC_SPIEN is set
  * (which is currently not the case in this driver)
  */
@@ -253,6 +270,14 @@ static bool spi_imx_can_dma(struct spi_controller *controller, struct spi_device
 	if (transfer->len < spi_imx->devtype_data->fifo_size)
 		return false;
 
+	/* DMA can only transfer data in whole bytes */
+	if (spi_imx->bits_per_word != 8 && spi_imx->bits_per_word != 16 &&
+	    spi_imx->bits_per_word != 32)
+		return false;
+
+	if (transfer->len >= MAX_SDMA_BD_BYTES)
+		return false;
+
 	spi_imx->dynamic_burst = 0;
 
 	return true;
@@ -1398,8 +1423,6 @@ static int spi_imx_sdma_init(struct device *dev, struct spi_imx_data *spi_imx,
 
 	init_completion(&spi_imx->dma_rx_completion);
 	init_completion(&spi_imx->dma_tx_completion);
-	controller->can_dma = spi_imx_can_dma;
-	controller->max_dma_len = MAX_SDMA_BD_BYTES;
 	spi_imx->controller->flags = SPI_CONTROLLER_MUST_RX |
 					 SPI_CONTROLLER_MUST_TX;
 
@@ -1437,10 +1460,252 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
 	return secs_to_jiffies(2 * timeout);
 }
 
+static void spi_imx_dma_unmap(struct spi_imx_data *spi_imx,
+			      struct dma_data_package *dma_data)
+{
+	struct device *tx_dev = spi_imx->controller->dma_tx->device->dev;
+	struct device *rx_dev = spi_imx->controller->dma_rx->device->dev;
+
+	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
+			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+			 DMA_TO_DEVICE);
+	dma_unmap_single(rx_dev, dma_data->dma_rx_addr,
+			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+			 DMA_FROM_DEVICE);
+}
+
+static void spi_imx_dma_rx_data_handle(struct spi_imx_data *spi_imx,
+				       struct dma_data_package *dma_data, void *rx_buf,
+				       bool word_delay)
+{
+#ifdef __LITTLE_ENDIAN
+	unsigned int bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+	u32 *temp = dma_data->dma_rx_buf;
+#endif
+	void *copy_ptr;
+	int unaligned;
+
+#ifdef __LITTLE_ENDIAN
+	/*
+	 * On little-endian CPUs, adjust byte order:
+	 * - Swap bytes when bpw = 8
+	 * - Swap half-words when bpw = 16
+	 * This ensures correct data ordering for DMA transfers.
+	 */
+	if (!word_delay) {
+		for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len, BYTES_PER_32BITS_WORD); i++) {
+			if (bytes_per_word == 1)
+				swab32s(temp + i);
+			else if (bytes_per_word == 2)
+				swahw32s(temp + i);
+		}
+	}
+#endif
+
+	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
+		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
+		copy_ptr = (u8 *)dma_data->dma_rx_buf + BYTES_PER_32BITS_WORD - unaligned;
+	} else {
+		copy_ptr = dma_data->dma_rx_buf;
+	}
+
+	memcpy(rx_buf, copy_ptr, dma_data->data_len);
+}
+
+static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
+			   struct dma_data_package *dma_data)
+{
+	struct spi_controller *controller = spi_imx->controller;
+	struct device *tx_dev = controller->dma_tx->device->dev;
+	struct device *rx_dev = controller->dma_rx->device->dev;
+
+	dma_data->dma_tx_addr = dma_map_single(tx_dev, dma_data->dma_tx_buf,
+					       DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+					       DMA_TO_DEVICE);
+	if (dma_mapping_error(tx_dev, dma_data->dma_tx_addr)) {
+		dev_err(spi_imx->dev, "DMA TX map failed\n");
+		return -EINVAL;
+	}
+
+	dma_data->dma_rx_addr = dma_map_single(rx_dev, dma_data->dma_rx_buf,
+					       DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+					       DMA_FROM_DEVICE);
+	if (dma_mapping_error(rx_dev, dma_data->dma_rx_addr)) {
+		dev_err(spi_imx->dev, "DMA RX map failed\n");
+		goto rx_map_err;
+	}
+
+	return 0;
+
+rx_map_err:
+	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
+			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+			 DMA_TO_DEVICE);
+	return -EINVAL;
+}
+
+static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
+				      struct dma_data_package *dma_data,
+				      const void *tx_buf,
+				      bool word_delay)
+{
+#ifdef __LITTLE_ENDIAN
+	unsigned int bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+	u32 *temp;
+#endif
+	void *copy_ptr;
+	int unaligned;
+
+	if (word_delay) {
+		dma_data->dma_len = dma_data->data_len;
+	} else {
+		/*
+		 * As per the reference manual, when burst length = 32*n + m bits, ECSPI
+		 * sends m LSB bits in the first word, followed by n full 32-bit words.
+		 * Since actual data may not be 4-byte aligned, allocate DMA TX/RX buffers
+		 * to ensure alignment. For TX, DMA pushes 4-byte aligned words to TXFIFO,
+		 * while ECSPI uses BURST_LENGTH settings to maintain correct bit count.
+		 * For RX, DMA receives 32-bit words from RXFIFO; after transfer completes,
+		 * trim the DMA RX buffer and copy the actual data to rx_buf.
+		 */
+		dma_data->dma_len = ALIGN(dma_data->data_len, BYTES_PER_32BITS_WORD);
+	}
+
+	dma_data->dma_tx_buf = kzalloc(dma_data->dma_len, GFP_KERNEL);
+	if (!dma_data->dma_tx_buf)
+		return -ENOMEM;
+
+	dma_data->dma_rx_buf = kzalloc(dma_data->dma_len, GFP_KERNEL);
+	if (!dma_data->dma_rx_buf) {
+		kfree(dma_data->dma_tx_buf);
+		return -ENOMEM;
+	}
+
+	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
+		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
+		copy_ptr = (u8 *)dma_data->dma_tx_buf + BYTES_PER_32BITS_WORD - unaligned;
+	} else {
+		copy_ptr = dma_data->dma_tx_buf;
+	}
+
+	memcpy(copy_ptr, tx_buf, dma_data->data_len);
+
+	/*
+	 * When word_delay is enabled, DMA transfers an entire word in one minor loop.
+	 * In this case, no data requires additional handling.
+	 */
+	if (word_delay)
+		return 0;
+
+#ifdef __LITTLE_ENDIAN
+	/*
+	 * On little-endian CPUs, adjust byte order:
+	 * - Swap bytes when bpw = 8
+	 * - Swap half-words when bpw = 16
+	 * This ensures correct data ordering for DMA transfers.
+	 */
+	temp = dma_data->dma_tx_buf;
+	for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len, BYTES_PER_32BITS_WORD); i++) {
+		if (bytes_per_word == 1)
+			swab32s(temp + i);
+		else if (bytes_per_word == 2)
+			swahw32s(temp + i);
+	}
+#endif
+
+	return 0;
+}
+
+static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
+				    struct spi_transfer *transfer,
+				    bool word_delay)
+{
+	u32 pre_bl, tail_bl;
+	u32 ctrl;
+	int ret;
+
+	/*
+	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds 512
+	 * and is not a multiple of 512, a tail transfer is required. In this case,
+	 * an extra DMA request is issued for the remaining data.
+	 */
+	ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL);
+	if (word_delay) {
+		/*
+		 * When a word delay is required, the Sample Period Control
+		 * Register (ECSPI_PERIODREG) is used to insert delays (wait
+		 * states) between consecutive SPI transfers. As a result,
+		 * ECSPI can only transfer one word per frame, and the delay
+		 * occurs between frames.
+		 */
+		spi_imx->dma_package_num = 1;
+		pre_bl = spi_imx->bits_per_word - 1;
+	} else if (transfer->len <= MX51_ECSPI_CTRL_MAX_BURST) {
+		spi_imx->dma_package_num = 1;
+		pre_bl = transfer->len * BITS_PER_BYTE - 1;
+	} else if (!(transfer->len % MX51_ECSPI_CTRL_MAX_BURST)) {
+		spi_imx->dma_package_num = 1;
+		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
+	} else {
+		spi_imx->dma_package_num = 2;
+		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
+		tail_bl = (transfer->len % MX51_ECSPI_CTRL_MAX_BURST) * BITS_PER_BYTE - 1;
+	}
+
+	spi_imx->dma_data = kcalloc(spi_imx->dma_package_num,
+				    sizeof(struct dma_data_package),
+				    GFP_KERNEL);
+	if (!spi_imx->dma_data) {
+		dev_err(spi_imx->dev, "Failed to allocate DMA package buffer!\n");
+		return -ENOMEM;
+	}
+
+	if (spi_imx->dma_package_num == 1) {
+		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
+		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
+		spi_imx->dma_data[0].cmd_word = ctrl;
+		spi_imx->dma_data[0].data_len = transfer->len;
+		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
+						 word_delay);
+		if (ret) {
+			kfree(spi_imx->dma_data);
+			return ret;
+		}
+	} else {
+		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
+		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
+		spi_imx->dma_data[0].cmd_word = ctrl;
+		spi_imx->dma_data[0].data_len = DIV_ROUND_DOWN_ULL(transfer->len,
+								   MX51_ECSPI_CTRL_MAX_BURST)
+								   * MX51_ECSPI_CTRL_MAX_BURST;
+		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
+						 false);
+		if (ret) {
+			kfree(spi_imx->dma_data);
+			return ret;
+		}
+
+		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
+		ctrl |= tail_bl << MX51_ECSPI_CTRL_BL_OFFSET;
+		spi_imx->dma_data[1].cmd_word = ctrl;
+		spi_imx->dma_data[1].data_len = transfer->len % MX51_ECSPI_CTRL_MAX_BURST;
+		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
+						 transfer->tx_buf + spi_imx->dma_data[0].data_len,
+						 false);
+		if (ret) {
+			kfree(spi_imx->dma_data[0].dma_tx_buf);
+			kfree(spi_imx->dma_data[0].dma_rx_buf);
+			kfree(spi_imx->dma_data);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
+			      struct dma_data_package *dma_data,
 			      struct spi_transfer *transfer)
 {
-	struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
 	struct spi_controller *controller = spi_imx->controller;
 	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
 	unsigned long transfer_timeout;
@@ -1451,9 +1716,9 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 	 * The TX DMA setup starts the transfer, so make sure RX is configured
 	 * before TX.
 	 */
-	desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
-					  rx->sgl, rx->nents, DMA_DEV_TO_MEM,
-					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	desc_rx = dmaengine_prep_slave_single(controller->dma_rx, dma_data->dma_rx_addr,
+					      dma_data->dma_len, DMA_DEV_TO_MEM,
+					      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 	if (!desc_rx) {
 		transfer->error |= SPI_TRANS_FAIL_NO_START;
 		return -EINVAL;
@@ -1471,9 +1736,9 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 	reinit_completion(&spi_imx->dma_rx_completion);
 	dma_async_issue_pending(controller->dma_rx);
 
-	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
-					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
-					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	desc_tx = dmaengine_prep_slave_single(controller->dma_tx, dma_data->dma_tx_addr,
+					      dma_data->dma_len, DMA_MEM_TO_DEV,
+					      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 	if (!desc_tx)
 		goto dmaengine_terminate_rx;
 
@@ -1521,16 +1786,16 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 }
 
 static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
-				     struct spi_transfer *transfer)
+				     struct dma_data_package *dma_data,
+				     bool word_delay)
 {
-	struct sg_table *rx = &transfer->rx_sg;
-	struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
-	unsigned int bytes_per_word, i;
+	unsigned int bytes_per_word = word_delay ?
+				      spi_imx_bytes_per_word(spi_imx->bits_per_word) :
+				      BYTES_PER_32BITS_WORD;
+	unsigned int i;
 
-	/* Get the right burst length from the last sg to ensure no tail data */
-	bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
 	for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
-		if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
+		if (!(dma_data->dma_len % (i * bytes_per_word)))
 			break;
 	}
 	/* Use 1 as wml in case no available burst length got */
@@ -1540,25 +1805,29 @@ static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
 	spi_imx->wml = i;
 }
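For reference, the watermark search above can be sketched as a standalone helper (the function name and the FIFO size are illustrative, not taken from the driver):

```c
#include <assert.h>

/* Illustrative sketch of the watermark-level search: pick the largest
 * WML (up to half the FIFO) that evenly divides the DMA length, and
 * fall back to 1 when no candidate fits. */
static unsigned int ecspi_pick_wml(unsigned int dma_len,
				   unsigned int bytes_per_word,
				   unsigned int fifo_size)
{
	unsigned int i;

	for (i = fifo_size / 2; i > 0; i--) {
		if (!(dma_len % (i * bytes_per_word)))
			break;
	}

	/* Use 1 as wml in case no suitable burst length was found */
	return i ? i : 1;
}
```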
 
-static int spi_imx_dma_configure(struct spi_controller *controller)
+static int spi_imx_dma_configure(struct spi_controller *controller, bool word_delay)
 {
 	int ret;
 	enum dma_slave_buswidth buswidth;
 	struct dma_slave_config rx = {}, tx = {};
 	struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
 
-	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
-	case 4:
+	if (word_delay) {
+		switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
+		case 4:
+			buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
+			break;
+		case 2:
+			buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
+			break;
+		case 1:
+			buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
+			break;
+		default:
+			return -EINVAL;
+		}
+	} else {
 		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
-		break;
-	case 2:
-		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
-		break;
-	case 1:
-		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
-		break;
-	default:
-		return -EINVAL;
 	}
 
 	tx.direction = DMA_MEM_TO_DEV;
@@ -1584,15 +1853,17 @@ static int spi_imx_dma_configure(struct spi_controller *controller)
 	return 0;
 }
 
-static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
-				struct spi_transfer *transfer)
+static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
+					struct dma_data_package *dma_data,
+					struct spi_transfer *transfer,
+					bool word_delay)
 {
 	struct spi_controller *controller = spi_imx->controller;
 	int ret;
 
-	spi_imx_dma_max_wml_find(spi_imx, transfer);
+	spi_imx_dma_max_wml_find(spi_imx, dma_data, word_delay);
 
-	ret = spi_imx_dma_configure(controller);
+	ret = spi_imx_dma_configure(controller, word_delay);
 	if (ret)
 		goto dma_failure_no_start;
 
@@ -1603,10 +1874,17 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
 	}
 	spi_imx->devtype_data->setup_wml(spi_imx);
 
-	ret = spi_imx_dma_submit(spi_imx, transfer);
+	ret = spi_imx_dma_submit(spi_imx, dma_data, transfer);
 	if (ret)
 		return ret;
 
+	/* Trim the DMA RX buffer and copy the actual data to rx_buf */
+	dma_sync_single_for_cpu(controller->dma_rx->device->dev, dma_data->dma_rx_addr,
+				dma_data->dma_len, DMA_FROM_DEVICE);
+	spi_imx_dma_rx_data_handle(spi_imx, dma_data, transfer->rx_buf + spi_imx->rx_offset,
+				   word_delay);
+	spi_imx->rx_offset += dma_data->data_len;
+
 	return 0;
 /* fallback to pio */
 dma_failure_no_start:
@@ -1614,6 +1892,60 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
 	return ret;
 }
 
+static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
+				struct spi_transfer *transfer)
+{
+	bool word_delay = transfer->word_delay.value != 0;
+	int ret;
+	int i;
+
+	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
+	if (ret < 0) {
+		transfer->error |= SPI_TRANS_FAIL_NO_START;
+		dev_err(spi_imx->dev, "DMA data prepare fail\n");
+		goto fallback_pio;
+	}
+
+	spi_imx->rx_offset = 0;
+
+	/* Each dma_package performs a separate DMA transfer once */
+	for (i = 0; i < spi_imx->dma_package_num; i++) {
+		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
+		if (ret < 0) {
+			transfer->error |= SPI_TRANS_FAIL_NO_START;
+			dev_err(spi_imx->dev, "DMA map fail\n");
+			break;
+		}
+
+		/* Update the CTRL register BL field */
+		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base + MX51_ECSPI_CTRL);
+
+		ret = spi_imx_dma_package_transfer(spi_imx, &spi_imx->dma_data[i],
+						   transfer, word_delay);
+
+		/* Whether the dma transmission is successful or not, dma unmap is necessary */
+		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
+
+		if (ret < 0) {
+			dev_dbg(spi_imx->dev, "DMA package %d transfer did not finish\n", i);
+			break;
+		}
+	}
+
+	for (int j = 0; j < spi_imx->dma_package_num; j++) {
+		kfree(spi_imx->dma_data[j].dma_tx_buf);
+		kfree(spi_imx->dma_data[j].dma_rx_buf);
+	}
+	kfree(spi_imx->dma_data);
+
+fallback_pio:
+	/* Only fall back to PIO when no DMA package at all was transferred */
+	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
+		transfer->error &= ~SPI_TRANS_FAIL_NO_START;
+
+	return ret;
+}
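A note on the error-flag handling above: clearing a bit flag requires bitwise NOT, since logical NOT collapses any non-zero mask to 0. A minimal sketch (the flag value is assumed for illustration, not taken from spi.h):

```c
#include <assert.h>

#define SPI_TRANS_FAIL_NO_START	(1U << 1)	/* value assumed for illustration */

/* Clearing a flag needs bitwise NOT (~); `error &= !FLAG` would zero
 * every bit, because !FLAG evaluates to 0 for any non-zero FLAG. */
static unsigned int clear_no_start(unsigned int error)
{
	return error & ~SPI_TRANS_FAIL_NO_START;
}
```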
+
 static int spi_imx_pio_transfer(struct spi_device *spi,
 				struct spi_transfer *transfer)
 {
@@ -1780,9 +2112,14 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
 	 * transfer, the SPI transfer has already been mapped, so we
 	 * have to do the DMA transfer here.
 	 */
-	if (spi_imx->usedma)
-		return spi_imx_dma_transfer(spi_imx, transfer);
-
+	if (spi_imx->usedma) {
+		ret = spi_imx_dma_transfer(spi_imx, transfer);
+		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
+			spi_imx->usedma = false;
+			return spi_imx_pio_transfer(spi, transfer);
+		}
+		return ret;
+	}
 	/* run in polling mode for short transfers */
 	if (transfer->len == 1 || (polling_limit_us &&
 				   spi_imx_transfer_estimate_time_us(transfer) < polling_limit_us))
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 6/6] spi: imx: enable DMA mode for target operation
  2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
                   ` (4 preceding siblings ...)
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
@ 2025-11-25 10:06 ` Carlos Song
  2025-11-25 16:05   ` Frank Li
  2025-11-26 12:18   ` Marc Kleine-Budde
  5 siblings, 2 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-25 10:06 UTC (permalink / raw)
  To: broonie, frank.li, hawnguo, s.hauer, kernel, festevam
  Cc: linux-spi, imx, linux-arm-kernel, linux-kernel, Carlos Song

Enable DMA mode for SPI IMX in target mode.
Disable the word delay feature in target mode, because the target must
sustain high throughput to keep up with the master. Target mode therefore
always operates with dynamic burst enabled.
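The size limit this patch enforces for target-mode DMA can be sketched as a standalone check (the helper name is illustrative):

```c
#include <assert.h>
#include <errno.h>

/* The maximum bytes that IMX53_ECSPI can transfer in target mode */
#define MX53_MAX_TRANSFER_BYTES	512

/* Illustrative sketch of the target-mode length guard: a target cannot
 * split a transaction, so transfers above the HW limit are rejected
 * with -EMSGSIZE before any DMA setup is done. */
static int check_target_len(unsigned int len)
{
	if (len > MX53_MAX_TRANSFER_BYTES)
		return -EMSGSIZE;
	return 0;
}
```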

Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
 drivers/spi/spi-imx.c | 78 +++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 22 deletions(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index f02a47fbba8a..16b7d2f45012 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -264,9 +264,6 @@ static bool spi_imx_can_dma(struct spi_controller *controller, struct spi_device
 	if (!controller->dma_rx)
 		return false;
 
-	if (spi_imx->target_mode)
-		return false;
-
 	if (transfer->len < spi_imx->devtype_data->fifo_size)
 		return false;
 
@@ -1756,23 +1753,51 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
 
 	transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
 
-	/* Wait SDMA to finish the data transfer.*/
-	time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
-						transfer_timeout);
-	if (!time_left) {
-		dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
-		dmaengine_terminate_all(controller->dma_tx);
-		dmaengine_terminate_all(controller->dma_rx);
-		return -ETIMEDOUT;
-	}
+	if (!spi_imx->target_mode) {
+		/* Wait SDMA to finish the data transfer.*/
+		time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
+							transfer_timeout);
+		if (!time_left) {
+			dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
+			dmaengine_terminate_all(controller->dma_tx);
+			dmaengine_terminate_all(controller->dma_rx);
+			return -ETIMEDOUT;
+		}
 
-	time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
-						transfer_timeout);
-	if (!time_left) {
-		dev_err(&controller->dev, "I/O Error in DMA RX\n");
-		spi_imx->devtype_data->reset(spi_imx);
-		dmaengine_terminate_all(controller->dma_rx);
-		return -ETIMEDOUT;
+		time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
+							transfer_timeout);
+		if (!time_left) {
+			dev_err(&controller->dev, "I/O Error in DMA RX\n");
+			spi_imx->devtype_data->reset(spi_imx);
+			dmaengine_terminate_all(controller->dma_rx);
+			return -ETIMEDOUT;
+		}
+	} else {
+		spi_imx->target_aborted = false;
+
+		if (wait_for_completion_interruptible(&spi_imx->dma_tx_completion) ||
+		    spi_imx->target_aborted) {
+			dev_dbg(spi_imx->dev, "I/O Error in DMA TX interrupted\n");
+			dmaengine_terminate_all(controller->dma_tx);
+			dmaengine_terminate_all(controller->dma_rx);
+			return -EINTR;
+		}
+
+		if (wait_for_completion_interruptible(&spi_imx->dma_rx_completion) ||
+		    spi_imx->target_aborted) {
+			dev_dbg(spi_imx->dev, "I/O Error in DMA RX interrupted\n");
+			dmaengine_terminate_all(controller->dma_rx);
+			return -EINTR;
+		}
+
+		/*
+		 * ECSPI has a HW issue when working in target mode: after 64 words
+		 * are written to TXFIFO, even once TXFIFO becomes empty, ECSPI_TXDATA
+		 * keeps shifting out the last word, so we have to disable ECSPI in
+		 * target mode after the transfer completes.
+		 */
+		if (spi_imx->devtype_data->disable)
+			spi_imx->devtype_data->disable(spi_imx);
 	}
 
 	return 0;
@@ -1895,10 +1920,16 @@ static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
 static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
 				struct spi_transfer *transfer)
 {
-	bool word_delay = transfer->word_delay.value != 0;
+	bool word_delay = transfer->word_delay.value != 0 && !spi_imx->target_mode;
 	int ret;
 	int i;
 
+	if (transfer->len > MX53_MAX_TRANSFER_BYTES && spi_imx->target_mode) {
+		dev_err(spi_imx->dev, "Transaction too big, max size is %d bytes\n",
+			MX53_MAX_TRANSFER_BYTES);
+		return -EMSGSIZE;
+	}
+
 	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
 	if (ret < 0) {
 		transfer->error |= SPI_TRANS_FAIL_NO_START;
@@ -2104,7 +2135,7 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
 	while (spi_imx->devtype_data->rx_available(spi_imx))
 		readl(spi_imx->base + MXC_CSPIRXDATA);
 
-	if (spi_imx->target_mode)
+	if (spi_imx->target_mode && !spi_imx->usedma)
 		return spi_imx_pio_transfer_target(spi, transfer);
 
 	/*
@@ -2116,7 +2147,10 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
 		ret = spi_imx_dma_transfer(spi_imx, transfer);
 		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
 			spi_imx->usedma = false;
-			return spi_imx_pio_transfer(spi, transfer);
+			if (spi_imx->target_mode)
+				return spi_imx_pio_transfer_target(spi, transfer);
+			else
+				return spi_imx_pio_transfer(spi, transfer);
 		}
 		return ret;
 	}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
@ 2025-11-25 12:10   ` Marc Kleine-Budde
  2025-11-26  7:42     ` Carlos Song
  2025-11-26  8:11   ` Marc Kleine-Budde
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-25 12:10 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, frank.li, hawnguo, s.hauer, kernel, festevam, imx,
	linux-kernel, linux-arm-kernel, linux-spi

[-- Attachment #1: Type: text/plain, Size: 21738 bytes --]

On 25.11.2025 18:06:17, Carlos Song wrote:
> ECSPI transfers only one word per frame in DMA mode, causing SCLK stalls
> between words due to BURST_LENGTH updates, which significantly impacts
> performance.
>
> To improve throughput, configure BURST_LENGTH as large as possible (up to
> 512 bytes per frame) instead of word length. This avoids delays between
> words. When transfer length is not 4-byte aligned, use bounce buffers to
> align data for DMA. TX uses aligned words for TXFIFO, while RX trims DMA
> buffer data after transfer completion.
>
> Introduce a new dma_package structure to store:
>   1. BURST_LENGTH values for each DMA request
>   2. Variables for DMA submission
>   3. DMA transmission length and actual data length
>
> Handle three cases:
>   - len <= 512 bytes: one package, BURST_LENGTH = len * 8 - 1
>   - len > 512 and aligned: one package, BURST_LENGTH = max (512 bytes)
>   - len > 512 and unaligned: two packages, second for tail data
>
> Performance test (spidev_test @10MHz, 4KB):
>   Before: tx/rx ~6651.9 kbps
>   After:  tx/rx ~9922.2 kbps (~50% improvement)
>
> For compatibility with slow SPI devices, add configurable word delay in
> DMA mode. When word delay is set, dynamic burst is disabled and
> BURST_LENGTH equals word length.
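The three burst-length cases described above can be sketched as a standalone helper (name and return convention are illustrative; the BURST_LENGTH field is encoded as bits minus one):

```c
#include <assert.h>

#define MX51_ECSPI_CTRL_MAX_BURST	512	/* bytes */
#define BITS_PER_BYTE			8

/* Returns the number of DMA packages (1 or 2) and fills the BURST_LENGTH
 * field values for the head burst and, when needed, the tail burst. */
static int ecspi_plan_bursts(unsigned int len, unsigned int *pre_bl,
			     unsigned int *tail_bl)
{
	if (len <= MX51_ECSPI_CTRL_MAX_BURST) {
		*pre_bl = len * BITS_PER_BYTE - 1;
		return 1;
	}

	*pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
	if (!(len % MX51_ECSPI_CTRL_MAX_BURST))
		return 1;

	*tail_bl = (len % MX51_ECSPI_CTRL_MAX_BURST) * BITS_PER_BYTE - 1;
	return 2;
}
```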
>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 409 ++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 373 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index 42f64d9535c9..f02a47fbba8a 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -60,6 +60,7 @@ MODULE_PARM_DESC(polling_limit_us,
>  #define MX51_ECSPI_CTRL_MAX_BURST	512
>  /* The maximum bytes that IMX53_ECSPI can transfer in target mode.*/
>  #define MX53_MAX_TRANSFER_BYTES		512
> +#define BYTES_PER_32BITS_WORD		4
>
>  enum spi_imx_devtype {
>  	IMX1_CSPI,
> @@ -95,6 +96,16 @@ struct spi_imx_devtype_data {
>  	enum spi_imx_devtype devtype;
>  };
>
> +struct dma_data_package {
> +	u32 cmd_word;
> +	void *dma_rx_buf;
> +	void *dma_tx_buf;
> +	dma_addr_t	dma_tx_addr;
> +	dma_addr_t	dma_rx_addr;

Better use uniform indention: here one space, not one tab.

> +	int dma_len;
> +	int data_len;
> +};
> +
>  struct spi_imx_data {
>  	struct spi_controller *controller;
>  	struct device *dev;
> @@ -130,6 +141,9 @@ struct spi_imx_data {
>  	u32 wml;
>  	struct completion dma_rx_completion;
>  	struct completion dma_tx_completion;
> +	struct dma_data_package *dma_data;

please add __counted_by(dma_package_num)

> +	int dma_package_num;
> +	int rx_offset;
>
>  	const struct spi_imx_devtype_data *devtype_data;
>  };
> @@ -189,6 +203,9 @@ MXC_SPI_BUF_TX(u16)
>  MXC_SPI_BUF_RX(u32)
>  MXC_SPI_BUF_TX(u32)
>
> +/* Align to cache line to avoid swiotlb bounce */
> +#define DMA_CACHE_ALIGNED_LEN(x) ALIGN((x), dma_get_cache_alignment())
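The macro above rounds buffer lengths up to a cache line so that the kmalloc'ed bounce buffers map without a swiotlb copy; a minimal sketch, with a 64-byte cache line assumed in place of dma_get_cache_alignment():

```c
#include <assert.h>

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define CACHE_LINE	64	/* assumed; the driver queries dma_get_cache_alignment() */

/* Round a DMA buffer length up to the next cache-line boundary */
static unsigned int cache_aligned_len(unsigned int len)
{
	return ALIGN(len, CACHE_LINE);
}
```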
> +
>  /* First entry is reserved, second entry is valid only if SDHC_SPIEN is set
>   * (which is currently not the case in this driver)
>   */
> @@ -253,6 +270,14 @@ static bool spi_imx_can_dma(struct spi_controller *controller, struct spi_device
>  	if (transfer->len < spi_imx->devtype_data->fifo_size)
>  		return false;
>
> +	/* DMA only can transmit data in bytes */
> +	if (spi_imx->bits_per_word != 8 && spi_imx->bits_per_word != 16 &&
> +	    spi_imx->bits_per_word != 32)
> +		return false;
> +
> +	if (transfer->len >= MAX_SDMA_BD_BYTES)
> +		return false;
> +
>  	spi_imx->dynamic_burst = 0;
>
>  	return true;
> @@ -1398,8 +1423,6 @@ static int spi_imx_sdma_init(struct device *dev, struct spi_imx_data *spi_imx,
>
>  	init_completion(&spi_imx->dma_rx_completion);
>  	init_completion(&spi_imx->dma_tx_completion);
> -	controller->can_dma = spi_imx_can_dma;
> -	controller->max_dma_len = MAX_SDMA_BD_BYTES;
>  	spi_imx->controller->flags = SPI_CONTROLLER_MUST_RX |
>  					 SPI_CONTROLLER_MUST_TX;
>
> @@ -1437,10 +1460,252 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
>  	return secs_to_jiffies(2 * timeout);
>  }
>
> +static void spi_imx_dma_unmap(struct spi_imx_data *spi_imx,
> +			      struct dma_data_package *dma_data)
> +{
> +	struct device *tx_dev = spi_imx->controller->dma_tx->device->dev;
> +	struct device *rx_dev = spi_imx->controller->dma_rx->device->dev;
> +
> +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> +			 DMA_TO_DEVICE);
> +	dma_unmap_single(rx_dev, dma_data->dma_rx_addr,
> +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> +			 DMA_FROM_DEVICE);
> +}
> +
> +static void spi_imx_dma_rx_data_handle(struct spi_imx_data *spi_imx,
> +				       struct dma_data_package *dma_data, void *rx_buf,
> +				       bool word_delay)
> +{
> +#ifdef __LITTLE_ENDIAN
> +	unsigned int bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
> +	u32 *temp = dma_data->dma_rx_buf;

can you move this into the scope of the if() below?
> +#endif
> +	void *copy_ptr;
> +	int unaligned;
> +
> +#ifdef __LITTLE_ENDIAN
> +	/*
> +	 * On little-endian CPUs, adjust byte order:
> +	 * - Swap bytes when bpw = 8
> +	 * - Swap half-words when bpw = 16
> +	 * This ensures correct data ordering for DMA transfers.
> +	 */
> +	if (!word_delay) {
> +		for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len, BYTES_PER_32BITS_WORD); i++) {

sizeof(*temp) instead of BYTES_PER_32BITS_WORD?

> +			if (bytes_per_word == 1)
> +				swab32s(temp + i);
> +			else if (bytes_per_word == 2)
> +				swahw32s(temp + i);

Why do you first do a swap in place and then a memcpy()?

> +		}
> +	}
> +#endif
> +
> +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {

I think this deserves a comment explaining why you re-align the data here.

> +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> +		copy_ptr = (u8 *)dma_data->dma_rx_buf + BYTES_PER_32BITS_WORD - unaligned;
> +	} else {
> +		copy_ptr = dma_data->dma_rx_buf;

Why do you use the bounce buffer if the data is correctly aligned?

> +	}
> +
> +	memcpy(rx_buf, copy_ptr, dma_data->data_len);
> +}
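The copy_ptr arithmetic above places an unaligned payload at the tail of the first 32-bit word, because ECSPI shifts out the m LSB bits of a 32*n + m bit burst first. A sketch of just the offset computation (helper name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_32BITS_WORD	4

/* For a burst of 32*n + m bits, ECSPI transfers the m LSB bits of the
 * first 32-bit word first, so the unaligned head of the payload lives
 * at offset (4 - len % 4) inside the zeroed, word-aligned bounce buffer. */
static size_t ecspi_copy_offset(size_t data_len)
{
	size_t unaligned = data_len % BYTES_PER_32BITS_WORD;

	return unaligned ? BYTES_PER_32BITS_WORD - unaligned : 0;
}
```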
> +
> +static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
> +			   struct dma_data_package *dma_data)
> +{
> +	struct spi_controller *controller = spi_imx->controller;
> +	struct device *tx_dev = controller->dma_tx->device->dev;
> +	struct device *rx_dev = controller->dma_rx->device->dev;
> +
> +	dma_data->dma_tx_addr = dma_map_single(tx_dev, dma_data->dma_tx_buf,
> +					       DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> +					       DMA_TO_DEVICE);
> +	if (dma_mapping_error(tx_dev, dma_data->dma_tx_addr)) {
> +		dev_err(spi_imx->dev, "DMA TX map failed\n");
> +		return -EINVAL;

please propagate the error value of dma_mapping_error()

> +	}
> +
> +	dma_data->dma_rx_addr = dma_map_single(rx_dev, dma_data->dma_rx_buf,
> +					       DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> +					       DMA_FROM_DEVICE);
> +	if (dma_mapping_error(rx_dev, dma_data->dma_rx_addr)) {
> +		dev_err(spi_imx->dev, "DMA RX map failed\n");
> +		goto rx_map_err;

there's only one user of the dma_unmap_single(), so no need for the
goto.

> +	}
> +
> +	return 0;
> +
> +rx_map_err:
> +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> +			 DMA_TO_DEVICE);
> +	return -EINVAL;
> +}
> +
> +static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
> +				      struct dma_data_package *dma_data,
> +				      const void *tx_buf,
> +				      bool word_delay)
> +{
> +#ifdef __LITTLE_ENDIAN
> +	unsigned int bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
> +	u32 *temp;
> +#endif

move into scope of if()

> +	void *copy_ptr;
> +	int unaligned;
> +
> +	if (word_delay) {
> +		dma_data->dma_len = dma_data->data_len;
> +	} else {
> +		/*
> +		 * As per the reference manual, when burst length = 32*n + m bits, ECSPI
> +		 * sends m LSB bits in the first word, followed by n full 32-bit words.
> +		 * Since actual data may not be 4-byte aligned, allocate DMA TX/RX buffers
> +		 * to ensure alignment. For TX, DMA pushes 4-byte aligned words to TXFIFO,
> +		 * while ECSPI uses BURST_LENGTH settings to maintain correct bit count.
> +		 * For RX, DMA receives 32-bit words from RXFIFO; after transfer completes,
> +		 * trim the DMA RX buffer and copy the actual data to rx_buf.
> +		 */

Ahh, please add the corresponding description for rx.

> +		dma_data->dma_len = ALIGN(dma_data->data_len, BYTES_PER_32BITS_WORD);
> +	}
> +
> +	dma_data->dma_tx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL | __GFP_ZERO);

kzalloc()?

> +	if (!dma_data->dma_tx_buf)
> +		return -ENOMEM;
> +
> +	dma_data->dma_rx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL | __GFP_ZERO);
> +	if (!dma_data->dma_rx_buf) {
> +		kfree(dma_data->dma_tx_buf);
> +		return -ENOMEM;
> +	}
> +
> +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> +		copy_ptr = (u8 *)dma_data->dma_tx_buf + BYTES_PER_32BITS_WORD - unaligned;
> +	} else {
> +		copy_ptr = dma_data->dma_tx_buf;
> +	}
> +
> +	memcpy(copy_ptr, tx_buf, dma_data->data_len);
> +
> +	/*
> +	 * When word_delay is enabled, DMA transfers an entire word in one minor loop.
> +	 * In this case, no data requires additional handling.
> +	 */
> +	if (word_delay)
> +		return 0;
> +
> +#ifdef __LITTLE_ENDIAN
> +	/*
> +	 * On little-endian CPUs, adjust byte order:
> +	 * - Swap bytes when bpw = 8
> +	 * - Swap half-words when bpw = 16
> +	 * This ensures correct data ordering for DMA transfers.
> +	 */
> +	temp = dma_data->dma_tx_buf;
> +	for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len, BYTES_PER_32BITS_WORD); i++) {
> +		if (bytes_per_word == 1)
> +			swab32s(temp + i);
> +		else if (bytes_per_word == 2)
> +			swahw32s(temp + i);
> +	}
> +#endif
> +
> +	return 0;
> +}
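The little-endian fixup above relies on the kernel's swab32s()/swahw32s(); userspace stand-ins (names prefixed to mark them as illustrative) show what those helpers do:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for swab32(): reverse all four bytes (bpw = 8 case) */
static uint32_t ex_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00UL) |
	       ((x << 8) & 0x00ff0000UL) | (x << 24);
}

/* Userspace stand-in for swahw32(): swap the 16-bit halves (bpw = 16 case) */
static uint32_t ex_swahw32(uint32_t x)
{
	return (x >> 16) | (x << 16);
}
```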
> +
> +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> +				    struct spi_transfer *transfer,
> +				    bool word_delay)
> +{
> +	u32 pre_bl, tail_bl;
> +	u32 ctrl;
> +	int ret;
> +
> +	/*
> +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds 512
> +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> +	 * an extra DMA request is issued for the remaining data.

Why do you need an extra transfer in this case?

> +	 */
> +	ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL);
> +	if (word_delay) {
> +		/*
> +		 * When word delay has to be supported, the Sample Period Control
> +		 * Register (ECSPI_PERIODREG) provides software a way to insert
> +		 * delays (wait states) between consecutive SPI transfers. Because
> +		 * the delay is applied between frames, ECSPI can then only
> +		 * transfer one word per frame.
> +		 */
> +		spi_imx->dma_package_num = 1;
> +		pre_bl = spi_imx->bits_per_word - 1;
> +	} else if (transfer->len <= MX51_ECSPI_CTRL_MAX_BURST) {
> +		spi_imx->dma_package_num = 1;
> +		pre_bl = transfer->len * BITS_PER_BYTE - 1;
> +	} else if (!(transfer->len % MX51_ECSPI_CTRL_MAX_BURST)) {
> +		spi_imx->dma_package_num = 1;
> +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> +	} else {
> +		spi_imx->dma_package_num = 2;
> +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> +		tail_bl = (transfer->len % MX51_ECSPI_CTRL_MAX_BURST) * BITS_PER_BYTE - 1;
> +	}
> +
> +	spi_imx->dma_data = kmalloc_array(spi_imx->dma_package_num,
> +					  sizeof(struct dma_data_package),
> +					  GFP_KERNEL | __GFP_ZERO);
> +	if (!spi_imx->dma_data) {
> +		dev_err(spi_imx->dev, "Failed to allocate DMA package buffer!\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (spi_imx->dma_package_num == 1) {
> +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> +		spi_imx->dma_data[0].cmd_word = ctrl;
> +		spi_imx->dma_data[0].data_len = transfer->len;
> +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
> +						 word_delay);
> +		if (ret) {
> +			kfree(spi_imx->dma_data);
> +			return ret;
> +		}
> +	} else {
> +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> +		spi_imx->dma_data[0].cmd_word = ctrl;
> +		spi_imx->dma_data[0].data_len = DIV_ROUND_DOWN_ULL(transfer->len,
> +								   MX51_ECSPI_CTRL_MAX_BURST)
> +								   * MX51_ECSPI_CTRL_MAX_BURST;
> +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
> +						 false);
> +		if (ret) {
> +			kfree(spi_imx->dma_data);
> +			return ret;
> +		}
> +
> +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> +		ctrl |= tail_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> +		spi_imx->dma_data[1].cmd_word = ctrl;
> +		spi_imx->dma_data[1].data_len = transfer->len % MX51_ECSPI_CTRL_MAX_BURST;
> +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
> +						 transfer->tx_buf + spi_imx->dma_data[0].data_len,
> +						 false);
> +		if (ret) {
> +			kfree(spi_imx->dma_data[0].dma_tx_buf);
> +			kfree(spi_imx->dma_data[0].dma_rx_buf);
> +			kfree(spi_imx->dma_data);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
> +			      struct dma_data_package *dma_data,
>  			      struct spi_transfer *transfer)
>  {
> -	struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
>  	struct spi_controller *controller = spi_imx->controller;
>  	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
>  	unsigned long transfer_timeout;
> @@ -1451,9 +1716,9 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>  	 * The TX DMA setup starts the transfer, so make sure RX is configured
>  	 * before TX.
>  	 */
> -	desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
> -					  rx->sgl, rx->nents, DMA_DEV_TO_MEM,
> -					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> +	desc_rx = dmaengine_prep_slave_single(controller->dma_rx, dma_data->dma_rx_addr,
> +					      dma_data->dma_len, DMA_DEV_TO_MEM,
> +					      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>  	if (!desc_rx) {
>  		transfer->error |= SPI_TRANS_FAIL_NO_START;
>  		return -EINVAL;
> @@ -1471,9 +1736,9 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>  	reinit_completion(&spi_imx->dma_rx_completion);
>  	dma_async_issue_pending(controller->dma_rx);
>
> -	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
> -					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
> -					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> +	desc_tx = dmaengine_prep_slave_single(controller->dma_tx, dma_data->dma_tx_addr,
> +					      dma_data->dma_len, DMA_MEM_TO_DEV,
> +					      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>  	if (!desc_tx)
>  		goto dmaengine_terminate_rx;
>
> @@ -1521,16 +1786,16 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>  }
>
>  static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
> -				     struct spi_transfer *transfer)
> +				     struct dma_data_package *dma_data,
> +				     bool word_delay)
>  {
> -	struct sg_table *rx = &transfer->rx_sg;
> -	struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
> -	unsigned int bytes_per_word, i;
> +	unsigned int bytes_per_word = word_delay ?
> +				      spi_imx_bytes_per_word(spi_imx->bits_per_word) :
> +				      BYTES_PER_32BITS_WORD;
> +	unsigned int i;
>
> -	/* Get the right burst length from the last sg to ensure no tail data */
> -	bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
>  	for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
> -		if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
> +		if (!(dma_data->dma_len % (i * bytes_per_word)))
>  			break;
>  	}
>  	/* Use 1 as wml in case no available burst length got */
> @@ -1540,25 +1805,29 @@ static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
>  	spi_imx->wml = i;
>  }
>
> -static int spi_imx_dma_configure(struct spi_controller *controller)
> +static int spi_imx_dma_configure(struct spi_controller *controller, bool word_delay)
>  {
>  	int ret;
>  	enum dma_slave_buswidth buswidth;
>  	struct dma_slave_config rx = {}, tx = {};
>  	struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
>
> -	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
> -	case 4:
> +	if (word_delay) {
> +		switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
> +		case 4:
> +			buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
> +			break;
> +		case 2:
> +			buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
> +			break;
> +		case 1:
> +			buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +	} else {
>  		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
> -		break;
> -	case 2:
> -		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
> -		break;
> -	case 1:
> -		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
> -		break;
> -	default:
> -		return -EINVAL;
>  	}
>
>  	tx.direction = DMA_MEM_TO_DEV;
> @@ -1584,15 +1853,17 @@ static int spi_imx_dma_configure(struct spi_controller *controller)
>  	return 0;
>  }
>
> -static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> -				struct spi_transfer *transfer)
> +static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
> +					struct dma_data_package *dma_data,
> +					struct spi_transfer *transfer,
> +					bool word_delay)
>  {
>  	struct spi_controller *controller = spi_imx->controller;
>  	int ret;
>
> -	spi_imx_dma_max_wml_find(spi_imx, transfer);
> +	spi_imx_dma_max_wml_find(spi_imx, dma_data, word_delay);
>
> -	ret = spi_imx_dma_configure(controller);
> +	ret = spi_imx_dma_configure(controller, word_delay);
>  	if (ret)
>  		goto dma_failure_no_start;
>
> @@ -1603,10 +1874,17 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
>  	}
>  	spi_imx->devtype_data->setup_wml(spi_imx);
>
> -	ret = spi_imx_dma_submit(spi_imx, transfer);
> +	ret = spi_imx_dma_submit(spi_imx, dma_data, transfer);
>  	if (ret)
>  		return ret;
>
> +	/* Trim the DMA RX buffer and copy the actual data to rx_buf */
> +	dma_sync_single_for_cpu(controller->dma_rx->device->dev, dma_data->dma_rx_addr,
> +				dma_data->dma_len, DMA_FROM_DEVICE);
> +	spi_imx_dma_rx_data_handle(spi_imx, dma_data, transfer->rx_buf + spi_imx->rx_offset,
> +				   word_delay);
> +	spi_imx->rx_offset += dma_data->data_len;
> +
>  	return 0;
>  /* fallback to pio */
>  dma_failure_no_start:
> @@ -1614,6 +1892,60 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
>  	return ret;
>  }
>
> +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> +				struct spi_transfer *transfer)
> +{
> +	bool word_delay = transfer->word_delay.value != 0;
> +	int ret;
> +	int i;
> +
> +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> +	if (ret < 0) {
> +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> +		goto fallback_pio;
> +	}
> +
> +	spi_imx->rx_offset = 0;
> +
> +	/* Each dma_package performs a separate DMA transfer once */
> +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> +		if (ret < 0) {
> +			transfer->error |= SPI_TRANS_FAIL_NO_START;
> +			dev_err(spi_imx->dev, "DMA map fail\n");
> +			break;
> +		}
> +
> +		/* Update the CTRL register BL field */
> +		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base + MX51_ECSPI_CTRL);
> +
> +		ret = spi_imx_dma_package_transfer(spi_imx, &spi_imx->dma_data[i],
> +						   transfer, word_delay);
> +
> +		/* Whether the dma transmission is successful or not, dma unmap is necessary */
> +		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
> +
> +		if (ret < 0) {
> +			dev_dbg(spi_imx->dev, "DMA package %d transfer did not finish\n", i);
> +			break;
> +		}
> +	}
> +
> +	for (int j = 0; j < spi_imx->dma_package_num; j++) {
> +		kfree(spi_imx->dma_data[j].dma_tx_buf);
> +		kfree(spi_imx->dma_data[j].dma_rx_buf);
> +	}
> +	kfree(spi_imx->dma_data);
> +
> +fallback_pio:
> +	/* Only fall back to PIO when no DMA package at all was transferred */
> +	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> +		transfer->error &= ~SPI_TRANS_FAIL_NO_START;
> +
> +	return ret;
> +}
> +
>  static int spi_imx_pio_transfer(struct spi_device *spi,
>  				struct spi_transfer *transfer)
>  {
> @@ -1780,9 +2112,14 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
>  	 * transfer, the SPI transfer has already been mapped, so we
>  	 * have to do the DMA transfer here.
>  	 */
> -	if (spi_imx->usedma)
> -		return spi_imx_dma_transfer(spi_imx, transfer);
> -
> +	if (spi_imx->usedma) {
> +		ret = spi_imx_dma_transfer(spi_imx, transfer);
> +		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> +			spi_imx->usedma = false;
> +			return spi_imx_pio_transfer(spi, transfer);
> +		}
> +		return ret;
> +	}
>  	/* run in polling mode for short transfers */
>  	if (transfer->len == 1 || (polling_limit_us &&
>  				   spi_imx_transfer_estimate_time_us(transfer) < polling_limit_us))
> --
> 2.34.1
>
>
>

regards,
Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer()
  2025-11-25 10:06 ` [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
@ 2025-11-25 15:32   ` Frank Li
  0 siblings, 0 replies; 34+ messages in thread
From: Frank Li @ 2025-11-25 15:32 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, hawnguo, s.hauer, kernel, festevam, linux-spi, imx,
	linux-arm-kernel, linux-kernel

On Tue, Nov 25, 2025 at 06:06:13PM +0800, Carlos Song wrote:
> Relocate spi_imx_dma_configure() next to spi_imx_dma_transfer() so that
> all DMA-related functions are grouped together for better readability.
> No functional changes.
>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---

Reviewed-by: Frank Li <Frank.Li@nxp.com>

>  drivers/spi/spi-imx.c | 88 +++++++++++++++++++++----------------------
>  1 file changed, 44 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index b8b79bb7fec3..e78e02a84b50 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -1282,50 +1282,6 @@ static irqreturn_t spi_imx_isr(int irq, void *dev_id)
>  	return IRQ_HANDLED;
>  }
>
> -static int spi_imx_dma_configure(struct spi_controller *controller)
> -{
> -	int ret;
> -	enum dma_slave_buswidth buswidth;
> -	struct dma_slave_config rx = {}, tx = {};
> -	struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
> -
> -	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
> -	case 4:
> -		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
> -		break;
> -	case 2:
> -		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
> -		break;
> -	case 1:
> -		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
> -		break;
> -	default:
> -		return -EINVAL;
> -	}
> -
> -	tx.direction = DMA_MEM_TO_DEV;
> -	tx.dst_addr = spi_imx->base_phys + MXC_CSPITXDATA;
> -	tx.dst_addr_width = buswidth;
> -	tx.dst_maxburst = spi_imx->wml;
> -	ret = dmaengine_slave_config(controller->dma_tx, &tx);
> -	if (ret) {
> -		dev_err(spi_imx->dev, "TX dma configuration failed with %d\n", ret);
> -		return ret;
> -	}
> -
> -	rx.direction = DMA_DEV_TO_MEM;
> -	rx.src_addr = spi_imx->base_phys + MXC_CSPIRXDATA;
> -	rx.src_addr_width = buswidth;
> -	rx.src_maxburst = spi_imx->wml;
> -	ret = dmaengine_slave_config(controller->dma_rx, &rx);
> -	if (ret) {
> -		dev_err(spi_imx->dev, "RX dma configuration failed with %d\n", ret);
> -		return ret;
> -	}
> -
> -	return 0;
> -}
> -
>  static int spi_imx_setupxfer(struct spi_device *spi,
>  				 struct spi_transfer *t)
>  {
> @@ -1481,6 +1437,50 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
>  	return secs_to_jiffies(2 * timeout);
>  }
>
> +static int spi_imx_dma_configure(struct spi_controller *controller)
> +{
> +	int ret;
> +	enum dma_slave_buswidth buswidth;
> +	struct dma_slave_config rx = {}, tx = {};
> +	struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
> +
> +	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
> +	case 4:
> +		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
> +		break;
> +	case 2:
> +		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
> +		break;
> +	case 1:
> +		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	tx.direction = DMA_MEM_TO_DEV;
> +	tx.dst_addr = spi_imx->base_phys + MXC_CSPITXDATA;
> +	tx.dst_addr_width = buswidth;
> +	tx.dst_maxburst = spi_imx->wml;
> +	ret = dmaengine_slave_config(controller->dma_tx, &tx);
> +	if (ret) {
> +		dev_err(spi_imx->dev, "TX dma configuration failed with %d\n", ret);
> +		return ret;
> +	}
> +
> +	rx.direction = DMA_DEV_TO_MEM;
> +	rx.src_addr = spi_imx->base_phys + MXC_CSPIRXDATA;
> +	rx.src_addr_width = buswidth;
> +	rx.src_maxburst = spi_imx->wml;
> +	ret = dmaengine_slave_config(controller->dma_rx, &rx);
> +	if (ret) {
> +		dev_err(spi_imx->dev, "RX dma configuration failed with %d\n", ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
>  static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
>  				struct spi_transfer *transfer)
>  {
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] spi: imx: introduce helper to clear DMA mode logic
  2025-11-25 10:06 ` [PATCH 2/6] spi: imx: introduce helper to clear DMA mode logic Carlos Song
@ 2025-11-25 15:41   ` Frank Li
  0 siblings, 0 replies; 34+ messages in thread
From: Frank Li @ 2025-11-25 15:41 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, hawnguo, s.hauer, kernel, festevam, linux-spi, imx,
	linux-arm-kernel, linux-kernel

On Tue, Nov 25, 2025 at 06:06:14PM +0800, Carlos Song wrote:
> Add a helper function to handle clearing DMA mode, including getting the
> maximum watermark length and submitting the DMA request. This refactoring
> makes the code more concise and improves readability.
> No functional changes.

spi: imx: introduce helper functions to improve readability

Add helper functions spi_imx_dma_submit() and spi_imx_dma_max_wml_find() to
improve readability and prepare to add more complex logic for dynamic burst
length calculation.

No functional changes.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 164 +++++++++++++++++++++++-------------------
>  1 file changed, 92 insertions(+), 72 deletions(-)
>
...
>
>  	return 0;
>  /* fallback to pio */
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure
  2025-11-25 10:06 ` [PATCH 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure Carlos Song
@ 2025-11-25 15:42   ` Frank Li
  0 siblings, 0 replies; 34+ messages in thread
From: Frank Li @ 2025-11-25 15:42 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, hawnguo, s.hauer, kernel, festevam, linux-spi, imx,
	linux-arm-kernel, linux-kernel

On Tue, Nov 25, 2025 at 06:06:15PM +0800, Carlos Song wrote:
> If dmaengine_prep_slave_sg() fails, no descriptor is submitted to the TX
> channel and DMA is never started. Therefore, calling
> dmaengine_terminate_all() for the TX DMA channel is unnecessary.
>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>

Reviewed-by: Frank Li <Frank.Li@nxp.com>

> ---
>  drivers/spi/spi-imx.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index 012f5bcbf73f..186963d3d2e0 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -1468,7 +1468,6 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>  					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
>  					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>  	if (!desc_tx) {
> -		dmaengine_terminate_all(controller->dma_tx);
>  		dmaengine_terminate_all(controller->dma_rx);
>  		return -EINVAL;
>  	}
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] spi: imx: handle DMA submission errors with dma_submit_error()
  2025-11-25 10:06 ` [PATCH 4/6] spi: imx: handle DMA submission errors with dma_submit_error() Carlos Song
@ 2025-11-25 15:45   ` Frank Li
  0 siblings, 0 replies; 34+ messages in thread
From: Frank Li @ 2025-11-25 15:45 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, hawnguo, s.hauer, kernel, festevam, linux-spi, imx,
	linux-arm-kernel, linux-kernel

On Tue, Nov 25, 2025 at 06:06:16PM +0800, Carlos Song wrote:
> Add error handling for DMA request submission by checking the return
> cookie with dma_submit_error(). This prevents propagating submission
> failures through the DMA transfer process, which could otherwise lead
> to unexpected behavior.

nit:  remove "otherwise".

Reviewed-by: Frank Li <Frank.Li@nxp.com>

>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index 186963d3d2e0..42f64d9535c9 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -1445,6 +1445,7 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>  	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
>  	unsigned long transfer_timeout;
>  	unsigned long time_left;
> +	dma_cookie_t cookie;
>
>  	/*
>  	 * The TX DMA setup starts the transfer, so make sure RX is configured
> @@ -1460,21 +1461,29 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>
>  	desc_rx->callback = spi_imx_dma_rx_callback;
>  	desc_rx->callback_param = (void *)spi_imx;
> -	dmaengine_submit(desc_rx);
> +	cookie = dmaengine_submit(desc_rx);
> +	if (dma_submit_error(cookie)) {
> +		dev_err(spi_imx->dev, "submitting DMA RX failed\n");
> +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> +		goto dmaengine_terminate_rx;
> +	}
> +
>  	reinit_completion(&spi_imx->dma_rx_completion);
>  	dma_async_issue_pending(controller->dma_rx);
>
>  	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
>  					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
>  					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> -	if (!desc_tx) {
> -		dmaengine_terminate_all(controller->dma_rx);
> -		return -EINVAL;
> -	}
> +	if (!desc_tx)
> +		goto dmaengine_terminate_rx;
>
>  	desc_tx->callback = spi_imx_dma_tx_callback;
>  	desc_tx->callback_param = (void *)spi_imx;
> -	dmaengine_submit(desc_tx);
> +	cookie = dmaengine_submit(desc_tx);
> +	if (dma_submit_error(cookie)) {
> +		dev_err(spi_imx->dev, "submitting DMA TX failed\n");
> +		goto dmaengine_terminate_tx;
> +	}
>  	reinit_completion(&spi_imx->dma_tx_completion);
>  	dma_async_issue_pending(controller->dma_tx);
>
> @@ -1502,6 +1511,13 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>  	}
>
>  	return 0;
> +
> +dmaengine_terminate_tx:
> +	dmaengine_terminate_all(controller->dma_tx);
> +dmaengine_terminate_rx:
> +	dmaengine_terminate_all(controller->dma_rx);
> +
> +	return -EINVAL;
>  }
>
>  static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/6] spi: imx: enable DMA mode for target operation
  2025-11-25 10:06 ` [PATCH 6/6] spi: imx: enable DMA mode for target operation Carlos Song
@ 2025-11-25 16:05   ` Frank Li
  2025-11-26  2:11     ` Carlos Song
  2025-12-02  7:00     ` Carlos Song
  2025-11-26 12:18   ` Marc Kleine-Budde
  1 sibling, 2 replies; 34+ messages in thread
From: Frank Li @ 2025-11-25 16:05 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, hawnguo, s.hauer, kernel, festevam, linux-spi, imx,
	linux-arm-kernel, linux-kernel

On Tue, Nov 25, 2025 at 06:06:18PM +0800, Carlos Song wrote:
> Enable DMA mode for SPI IMX in target mode.
> Disable the word delay feature for target mode, because target mode should
> always keep high performance to make sure it can follow the master. Target
> mode continues to operate in dynamic burst mode.

If these are two paragraphs, add an empty line between them. If it is one
paragraph, join it with the previous line.

>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 78 +++++++++++++++++++++++++++++++------------
>  1 file changed, 56 insertions(+), 22 deletions(-)
>
...
>
> @@ -1756,23 +1753,51 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>
>  	transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
>
> -	/* Wait SDMA to finish the data transfer.*/
> -	time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> -						transfer_timeout);
> -	if (!time_left) {
> -		dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> -		dmaengine_terminate_all(controller->dma_tx);
> -		dmaengine_terminate_all(controller->dma_rx);
> -		return -ETIMEDOUT;
> -	}
> +	if (!spi_imx->target_mode) {
> +		/* Wait SDMA to finish the data transfer.*/
> +		time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> +							transfer_timeout);
> +		if (!time_left) {
> +			dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> +			dmaengine_terminate_all(controller->dma_tx);
> +			dmaengine_terminate_all(controller->dma_rx);
> +			return -ETIMEDOUT;
> +		}
>
> -	time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> -						transfer_timeout);
> -	if (!time_left) {
> -		dev_err(&controller->dev, "I/O Error in DMA RX\n");
> -		spi_imx->devtype_data->reset(spi_imx);
> -		dmaengine_terminate_all(controller->dma_rx);
> -		return -ETIMEDOUT;
> +		time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> +							transfer_timeout);
> +		if (!time_left) {
> +			dev_err(&controller->dev, "I/O Error in DMA RX\n");
> +			spi_imx->devtype_data->reset(spi_imx);
> +			dmaengine_terminate_all(controller->dma_rx);
> +			return -ETIMEDOUT;
> +		}
> +	} else {
> +		spi_imx->target_aborted = false;
> +
> +		if (wait_for_completion_interruptible(&spi_imx->dma_tx_completion) ||
> +		    spi_imx->target_aborted) {

Somewhere else sets target_aborted to true, so I think READ_ONCE() should be
used here to make sure the value is read from memory.

Not sure why wait_for_completion_interruptible() is used here, while master
mode uses wait_for_completion_timeout().

> +			dev_dbg(spi_imx->dev, "I/O Error in DMA TX interrupted\n");
> +			dmaengine_terminate_all(controller->dma_tx);
> +			dmaengine_terminate_all(controller->dma_rx);
> +			return -EINTR;
> +		}
> +
> +		if (wait_for_completion_interruptible(&spi_imx->dma_rx_completion) ||
> +		    spi_imx->target_aborted) {
> +			dev_dbg(spi_imx->dev, "I/O Error in DMA RX interrupted\n");
> +			dmaengine_terminate_all(controller->dma_rx);
> +			return -EINTR;
> +		}
> +
> +		/*
> +		 * ECSPI has a HW issue in target mode: after 64 words are
> +		 * written to the TXFIFO, ECSPI_TXDATA keeps shifting out the
> +		 * last word even once the TXFIFO becomes empty, so ECSPI must
> +		 * be disabled after the transfer completes in target mode.
> +		 */
> +		if (spi_imx->devtype_data->disable)
> +			spi_imx->devtype_data->disable(spi_imx);
>  	}
>
>  	return 0;
> @@ -1895,10 +1920,16 @@ static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
>  static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
>  				struct spi_transfer *transfer)
>  {
> -	bool word_delay = transfer->word_delay.value != 0;
> +	bool word_delay = transfer->word_delay.value != 0 && !spi_imx->target_mode;
>  	int ret;
>  	int i;
>
> +	if (transfer->len > MX53_MAX_TRANSFER_BYTES && spi_imx->target_mode) {

why does only target mode have a length limitation?

Frank
> +		dev_err(spi_imx->dev, "Transaction too big, max size is %d bytes\n",
> +			MX53_MAX_TRANSFER_BYTES);
> +		return -EMSGSIZE;
> +	}
> +
>  	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
>  	if (ret < 0) {
>  		transfer->error |= SPI_TRANS_FAIL_NO_START;
> @@ -2104,7 +2135,7 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
>  	while (spi_imx->devtype_data->rx_available(spi_imx))
>  		readl(spi_imx->base + MXC_CSPIRXDATA);
>
> -	if (spi_imx->target_mode)
> +	if (spi_imx->target_mode && !spi_imx->usedma)
>  		return spi_imx_pio_transfer_target(spi, transfer);
>
>  	/*
> @@ -2116,7 +2147,10 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
>  		ret = spi_imx_dma_transfer(spi_imx, transfer);
>  		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
>  			spi_imx->usedma = false;
> -			return spi_imx_pio_transfer(spi, transfer);
> +			if (spi_imx->target_mode)
> +				return spi_imx_pio_transfer_target(spi, transfer);
> +			else
> +				return spi_imx_pio_transfer(spi, transfer);
>  		}
>  		return ret;
>  	}
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH 6/6] spi: imx: enable DMA mode for target operation
  2025-11-25 16:05   ` Frank Li
@ 2025-11-26  2:11     ` Carlos Song
  2025-12-02  7:00     ` Carlos Song
  1 sibling, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-26  2:11 UTC (permalink / raw)
  To: Frank Li
  Cc: broonie@kernel.org, hawnguo@kernel.org, s.hauer@pengutronix.de,
	kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Frank Li <frank.li@nxp.com>
> Sent: Wednesday, November 26, 2025 12:06 AM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; hawnguo@kernel.org; s.hauer@pengutronix.de;
> kernel@pengutronix.de; festevam@gmail.com; linux-spi@vger.kernel.org;
> imx@lists.linux.dev; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 6/6] spi: imx: enable DMA mode for target operation
> 
> On Tue, Nov 25, 2025 at 06:06:18PM +0800, Carlos Song wrote:
> > Enable DMA mode for SPI IMX in target mode.
> > Disable the word delay feature for target mode, because target mode
> > should always keep high performance to make sure it can follow the
> > master. Target mode continues to operate in dynamic burst mode.
> 
> If two paragraph, need extra empty line. If one parapraph, move to previous
> line.
> 
Will do this in V2.
> >
> > Signed-off-by: Carlos Song <carlos.song@nxp.com>
> > ---
> >  drivers/spi/spi-imx.c | 78
> > +++++++++++++++++++++++++++++++------------
> >  1 file changed, 56 insertions(+), 22 deletions(-)
> >
> ...
> >
> > @@ -1756,23 +1753,51 @@ static int spi_imx_dma_submit(struct
> > spi_imx_data *spi_imx,
> >
> >  	transfer_timeout = spi_imx_calculate_timeout(spi_imx,
> > transfer->len);
> >
> > -	/* Wait SDMA to finish the data transfer.*/
> > -	time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> > -						transfer_timeout);
> > -	if (!time_left) {
> > -		dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> > -		dmaengine_terminate_all(controller->dma_tx);
> > -		dmaengine_terminate_all(controller->dma_rx);
> > -		return -ETIMEDOUT;
> > -	}
> > +	if (!spi_imx->target_mode) {
> > +		/* Wait SDMA to finish the data transfer.*/
> > +		time_left =
> wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> > +							transfer_timeout);
> > +		if (!time_left) {
> > +			dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> > +			dmaengine_terminate_all(controller->dma_tx);
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -ETIMEDOUT;
> > +		}
> >
> > -	time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> > -						transfer_timeout);
> > -	if (!time_left) {
> > -		dev_err(&controller->dev, "I/O Error in DMA RX\n");
> > -		spi_imx->devtype_data->reset(spi_imx);
> > -		dmaengine_terminate_all(controller->dma_rx);
> > -		return -ETIMEDOUT;
> > +		time_left =
> wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> > +							transfer_timeout);
> > +		if (!time_left) {
> > +			dev_err(&controller->dev, "I/O Error in DMA RX\n");
> > +			spi_imx->devtype_data->reset(spi_imx);
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -ETIMEDOUT;
> > +		}
> > +	} else {
> > +		spi_imx->target_aborted = false;
> > +
> > +		if (wait_for_completion_interruptible(&spi_imx->dma_tx_completion)
> ||
> > +		    spi_imx->target_aborted) {
> 
> Suppose somewhere set target_aborted to true. I think here should use
> READ_ONCE() to make sure read from memory.
> 
> Not sure why here use wait_for_completion_interruptible() but at master mode
> use wait_for_completion_timeout().
> 
> > +			dev_dbg(spi_imx->dev, "I/O Error in DMA TX interrupted\n");
> > +			dmaengine_terminate_all(controller->dma_tx);
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -EINTR;
> > +		}
> > +
> > +		if (wait_for_completion_interruptible(&spi_imx->dma_rx_completion)
> ||
> > +		    spi_imx->target_aborted) {
> > +			dev_dbg(spi_imx->dev, "I/O Error in DMA RX interrupted\n");
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -EINTR;
> > +		}
> > +
> > +		/*
> > +		 * ECSPI has a HW issue in target mode: after 64 words are
> > +		 * written to the TXFIFO, ECSPI_TXDATA keeps shifting out the
> > +		 * last word even once the TXFIFO becomes empty, so ECSPI must
> > +		 * be disabled after the transfer completes in target mode.
> > +		 */
> > +		if (spi_imx->devtype_data->disable)
> > +			spi_imx->devtype_data->disable(spi_imx);
> >  	}
> >
> >  	return 0;
> > @@ -1895,10 +1920,16 @@ static int spi_imx_dma_package_transfer(struct
> > spi_imx_data *spi_imx,  static int spi_imx_dma_transfer(struct spi_imx_data
> *spi_imx,
> >  				struct spi_transfer *transfer)
> >  {
> > -	bool word_delay = transfer->word_delay.value != 0;
> > +	bool word_delay = transfer->word_delay.value != 0 &&
> > +!spi_imx->target_mode;
> >  	int ret;
> >  	int i;
> >
> > +	if (transfer->len > MX53_MAX_TRANSFER_BYTES &&
> spi_imx->target_mode)
> > +{
> 
> why does only target mode have a length limitation?
> 
> Frank
> > +		dev_err(spi_imx->dev, "Transaction too big, max size is %d bytes\n",
> > +			MX53_MAX_TRANSFER_BYTES);
> > +		return -EMSGSIZE;
> > +	}
> > +
> >  	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> >  	if (ret < 0) {
> >  		transfer->error |= SPI_TRANS_FAIL_NO_START; @@ -2104,7 +2135,7
> @@
> > static int spi_imx_transfer_one(struct spi_controller *controller,
> >  	while (spi_imx->devtype_data->rx_available(spi_imx))
> >  		readl(spi_imx->base + MXC_CSPIRXDATA);
> >
> > -	if (spi_imx->target_mode)
> > +	if (spi_imx->target_mode && !spi_imx->usedma)
> >  		return spi_imx_pio_transfer_target(spi, transfer);
> >
> >  	/*
> > @@ -2116,7 +2147,10 @@ static int spi_imx_transfer_one(struct
> spi_controller *controller,
> >  		ret = spi_imx_dma_transfer(spi_imx, transfer);
> >  		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> >  			spi_imx->usedma = false;
> > -			return spi_imx_pio_transfer(spi, transfer);
> > +			if (spi_imx->target_mode)
> > +				return spi_imx_pio_transfer_target(spi, transfer);
> > +			else
> > +				return spi_imx_pio_transfer(spi, transfer);
> >  		}
> >  		return ret;
> >  	}
> > --
> > 2.34.1
> >


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 12:10   ` Marc Kleine-Budde
@ 2025-11-26  7:42     ` Carlos Song
  2025-11-26  8:31       ` Marc Kleine-Budde
  0 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-26  7:42 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-spi@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Tuesday, November 25, 2025 8:10 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linux-spi@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 25.11.2025 18:06:17, Carlos Song wrote:
> > ECSPI transfers only one word per frame in DMA mode, causing SCLK
> > stalls between words due to BURST_LENGTH updates, which significantly
> > impacts performance.
> >
> > To improve throughput, configure BURST_LENGTH as large as possible (up
> > to
> > 512 bytes per frame) instead of word length. This avoids delays
> > between words. When transfer length is not 4-byte aligned, use bounce
> > buffers to align data for DMA. TX uses aligned words for TXFIFO, while
> > RX trims DMA buffer data after transfer completion.
> >
> > Introduce a new dma_package structure to store:
> >   1. BURST_LENGTH values for each DMA request
> >   2. Variables for DMA submission
> >   3. DMA transmission length and actual data length
> >
> > Handle three cases:
> >   - len <= 512 bytes: one package, BURST_LENGTH = len * 8 - 1
> >   - len > 512 and aligned: one package, BURST_LENGTH = max (512 bytes)
> >   - len > 512 and unaligned: two packages, second for tail data
> >
> > Performance test (spidev_test @10MHz, 4KB):
> >   Before: tx/rx ~6651.9 kbps
> >   After:  tx/rx ~9922.2 kbps (~50% improvement)
> >
> > For compatibility with slow SPI devices, add configurable word delay
> > in DMA mode. When word delay is set, dynamic burst is disabled and
> > BURST_LENGTH equals word length.
> >
> > Signed-off-by: Carlos Song <carlos.song@nxp.com>
> > ---
> >  drivers/spi/spi-imx.c | 409
> > ++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 373 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c index
> > 42f64d9535c9..f02a47fbba8a 100644
> > --- a/drivers/spi/spi-imx.c
> > +++ b/drivers/spi/spi-imx.c
> > @@ -60,6 +60,7 @@ MODULE_PARM_DESC(polling_limit_us,
> >  #define MX51_ECSPI_CTRL_MAX_BURST	512
> >  /* The maximum bytes that IMX53_ECSPI can transfer in target mode.*/
> >  #define MX53_MAX_TRANSFER_BYTES		512
> > +#define BYTES_PER_32BITS_WORD		4
> >
> >  enum spi_imx_devtype {
> >  	IMX1_CSPI,
> > @@ -95,6 +96,16 @@ struct spi_imx_devtype_data {
> >  	enum spi_imx_devtype devtype;
> >  };
> >
> > +struct dma_data_package {
> > +	u32 cmd_word;
> > +	void *dma_rx_buf;
> > +	void *dma_tx_buf;
> > +	dma_addr_t	dma_tx_addr;
> > +	dma_addr_t	dma_rx_addr;
> 
Hi, Marc

Thank you very much for your ack!

> Better use uniform indention: here one space, not one tab.
> 
Will do this in V2
> > +	int dma_len;
> > +	int data_len;
> > +};
> > +
> >  struct spi_imx_data {
> >  	struct spi_controller *controller;
> >  	struct device *dev;
> > @@ -130,6 +141,9 @@ struct spi_imx_data {
> >  	u32 wml;
> >  	struct completion dma_rx_completion;
> >  	struct completion dma_tx_completion;
> > +	struct dma_data_package *dma_data;
> 
> please add __counted_by(dma_package_num)
> 
Will do this in V2
> > +	int dma_package_num;
> > +	int rx_offset;
> >
> >  	const struct spi_imx_devtype_data *devtype_data;  }; @@ -189,6
> > +203,9 @@ MXC_SPI_BUF_TX(u16)
> >  MXC_SPI_BUF_RX(u32)
> >  MXC_SPI_BUF_TX(u32)
> >
> > +/* Align to cache line to avoid swiotlb bounce */ #define
> > +DMA_CACHE_ALIGNED_LEN(x) ALIGN((x), dma_get_cache_alignment())
> > +
> >  /* First entry is reserved, second entry is valid only if SDHC_SPIEN is set
> >   * (which is currently not the case in this driver)
> >   */
> > @@ -253,6 +270,14 @@ static bool spi_imx_can_dma(struct spi_controller
> *controller, struct spi_device
> >  	if (transfer->len < spi_imx->devtype_data->fifo_size)
> >  		return false;
> >
> > +	/* DMA only can transmit data in bytes */
> > +	if (spi_imx->bits_per_word != 8 && spi_imx->bits_per_word != 16 &&
> > +	    spi_imx->bits_per_word != 32)
> > +		return false;
> > +
> > +	if (transfer->len >= MAX_SDMA_BD_BYTES)
> > +		return false;
> > +
> >  	spi_imx->dynamic_burst = 0;
> >
> >  	return true;
> > @@ -1398,8 +1423,6 @@ static int spi_imx_sdma_init(struct device *dev,
> > struct spi_imx_data *spi_imx,
> >
> >  	init_completion(&spi_imx->dma_rx_completion);
> >  	init_completion(&spi_imx->dma_tx_completion);
> > -	controller->can_dma = spi_imx_can_dma;
> > -	controller->max_dma_len = MAX_SDMA_BD_BYTES;
> >  	spi_imx->controller->flags = SPI_CONTROLLER_MUST_RX |
> >  					 SPI_CONTROLLER_MUST_TX;
> >
> > @@ -1437,10 +1460,252 @@ static int spi_imx_calculate_timeout(struct
> spi_imx_data *spi_imx, int size)
> >  	return secs_to_jiffies(2 * timeout);  }
> >
> > +static void spi_imx_dma_unmap(struct spi_imx_data *spi_imx,
> > +			      struct dma_data_package *dma_data) {
> > +	struct device *tx_dev = spi_imx->controller->dma_tx->device->dev;
> > +	struct device *rx_dev = spi_imx->controller->dma_rx->device->dev;
> > +
> > +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> > +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > +			 DMA_TO_DEVICE);
> > +	dma_unmap_single(rx_dev, dma_data->dma_rx_addr,
> > +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > +			 DMA_FROM_DEVICE);
> > +}
> > +
> > +static void spi_imx_dma_rx_data_handle(struct spi_imx_data *spi_imx,
> > +				       struct dma_data_package *dma_data, void
> *rx_buf,
> > +				       bool word_delay)
> > +{
> > +#ifdef __LITTLE_ENDIAN
> > +	unsigned int bytes_per_word =
> spi_imx_bytes_per_word(spi_imx->bits_per_word);
> > +	u32 *temp = dma_data->dma_rx_buf;
> 
> can you move this into the scope of the if() below?
> > +#endif
> > +	void *copy_ptr;
> > +	int unaligned;
> > +
> > +#ifdef __LITTLE_ENDIAN
> > +	/*
> > +	 * On little-endian CPUs, adjust byte order:
> > +	 * - Swap bytes when bpw = 8
> > +	 * - Swap half-words when bpw = 16
> > +	 * This ensures correct data ordering for DMA transfers.
> > +	 */
> > +	if (!word_delay) {
> > +		for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len,
> > +BYTES_PER_32BITS_WORD); i++) {
> 
> sizeof(*temp) instead of BYTES_PER_32BITS_WORD?
> 
Looks good I will try this in V2
> > +			if (bytes_per_word == 1)
> > +				swab32s(temp + i);
> > +			else if (bytes_per_word == 2)
> > +				swahw32s(temp + i);
> 
> Why do you do first a swap in place and then a memcpy()
> 
When dynamic burst is enabled, DMA always copies data with a bus width of 4 bytes.
Since the CPU is little endian, the byte order in dma_buf is little endian.
But for bytes_per_word = 1, each byte must follow memory order, i.e. be big endian within the word, so every byte is swapped.
For the same reason, with bytes_per_word = 2 the half-words are swapped.
bytes_per_word = 4 needs no change.
(SPI does not mandate a byte order, so the SPI byte order always follows the CPU byte order, which is preserved here.)
> > +		}
> > +	}
> > +#endif
> > +
> > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> 
> I think this deserves a comment, why you do a re-alignment of the data here.
> 
Yes. I can add one comment here.

In fact it is not re-alignment, it is trimming the data.
When dynamic burst is enabled, DMA always copies data with a bus width of 4 bytes,
so DMA always reads 4-byte words from the RXFIFO. If the data length is not 4-byte aligned,
the data in the DMA bounce buffer contains extra garbage bytes, which must be trimmed before it is copied to the transfer buffer.

Why the first word?
It comes from the HW behavior. When dynamic burst is enabled, BURST_LENGTH is set to match the actual data length,
which keeps the bit count correct.

As RM shows:
"
In master mode, it controls the number of bits per SPI burst. Since the shift register always loads 32-bit
data from transmit FIFO, only the n least-significant (n = BURST LENGTH + 1) will be shifted out. The
remaining bits will be ignored.

Number of Valid Bits in a SPI burst.

0x000 A SPI burst contains the 1 LSB in a word.
0x001 A SPI burst contains the 2 LSB in a word.
0x002 A SPI burst contains the 3 LSB in a word.
...
0x01F A SPI burst contains all 32 bits in a word.
0x020 A SPI burst contains the 1 LSB in first word and all 32 bits in second word.
0x021 A SPI burst contains the 2 LSB in first word and all 32 bits in second word.
"
So when the data length is not 4-byte aligned, the first word always contains some garbage bytes (when transferring 7 bytes, the first word holds one garbage byte and 3 valid bytes, with the remaining 4 bytes in the second word).
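The trim offset can be sketched like this (rx_trim_offset() is a hypothetical helper mirroring the driver's logic, not a function from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BYTES_PER_32BITS_WORD 4

/* Number of garbage bytes to skip at the start of the RX bounce
 * buffer before copying data_len valid bytes to rx_buf.  With
 * word_delay, BURST_LENGTH equals the word length and no garbage
 * bytes are produced, so nothing is skipped. */
static size_t rx_trim_offset(size_t data_len, bool word_delay)
{
	size_t unaligned = data_len % BYTES_PER_32BITS_WORD;

	if (unaligned && !word_delay)
		return BYTES_PER_32BITS_WORD - unaligned;

	return 0;
}
```

For a 7-byte transfer this yields an offset of 1: one garbage byte in the first word is skipped, then the 7 valid bytes are copied.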

> > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > +		copy_ptr = (u8 *)dma_data->dma_rx_buf +
> BYTES_PER_32BITS_WORD - unaligned;
> > +	} else {
> > +		copy_ptr = dma_data->dma_rx_buf;
> 
> Why do you use the bounce buffer if the data is aligned correct?
> 
Even when the data is 4-byte aligned, on a little-endian CPU the bytes must additionally be swapped according to the bytes-per-word setting.

Summary of the whole dynamic-burst solution for DMA mode:
1. Always read/write the SPI FIFO with DMA bus width = 4, so the DMA bounce buffer is always 4-byte aligned:
  swap bytes/half-words for bits_per_word = 8/16 on a little-endian CPU.
2. The BURST_LENGTH setting is important; it keeps the bit count correct (the hardware trims: it neither shifts extra bits out of the TXFIFO nor shifts extra bits into the RXFIFO):
  * TX: although DMA pushes 4-byte-aligned data into the FIFO, the bounce buffer places the valid data in the valid LSBs of the first word, so ECSPI only shifts out the valid data from the bounce buffer.
  * RX: although DMA fetches 4-byte-aligned data from the RXFIFO into the bounce buffer, it is trimmed to the valid LSBs according to the actual xfer->len, so rx_buf receives the right data.
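The TX half of that summary can be sketched as follows (tx_prepare() is a hypothetical userspace helper; the in-place byte swap here hand-codes what swab32s() does for the bits_per_word = 8 case):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BYTES_PER_32BITS_WORD 4

/* Zero the bounce buffer, place the payload so the partial data lands
 * in the LSBs of the first 32-bit word, then swap bytes within each
 * word as a little-endian CPU needs for bits_per_word = 8.
 * dma_len must be len rounded up to a multiple of 4. */
static void tx_prepare(uint8_t *bounce, size_t dma_len,
		       const uint8_t *tx, size_t len)
{
	size_t unaligned = len % BYTES_PER_32BITS_WORD;
	size_t off = unaligned ? BYTES_PER_32BITS_WORD - unaligned : 0;

	memset(bounce, 0, dma_len);
	memcpy(bounce + off, tx, len);

	for (size_t i = 0; i < dma_len; i += BYTES_PER_32BITS_WORD) {
		uint8_t t;

		/* reverse the four bytes of this word */
		t = bounce[i];     bounce[i]     = bounce[i + 3]; bounce[i + 3] = t;
		t = bounce[i + 1]; bounce[i + 1] = bounce[i + 2]; bounce[i + 2] = t;
	}
}
```

With BURST_LENGTH set to the real bit count, the padding byte in the MSBs of the first word is never shifted out by the hardware.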

> > +	}
> > +
> > +	memcpy(rx_buf, copy_ptr, dma_data->data_len); }
> > +
> > +static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
> > +			   struct dma_data_package *dma_data) {
> > +	struct spi_controller *controller = spi_imx->controller;
> > +	struct device *tx_dev = controller->dma_tx->device->dev;
> > +	struct device *rx_dev = controller->dma_rx->device->dev;
> > +
> > +	dma_data->dma_tx_addr = dma_map_single(tx_dev,
> dma_data->dma_tx_buf,
> > +
> DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > +					       DMA_TO_DEVICE);
> > +	if (dma_mapping_error(tx_dev, dma_data->dma_tx_addr)) {
> > +		dev_err(spi_imx->dev, "DMA TX map failed\n");
> > +		return -EINVAL;
> 
> please propagate the error value of dma_mapping_error()
> 

Will do this in V2
> > +	}
> > +
> > +	dma_data->dma_rx_addr = dma_map_single(rx_dev,
> dma_data->dma_rx_buf,
> > +
> DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > +					       DMA_FROM_DEVICE);
> > +	if (dma_mapping_error(rx_dev, dma_data->dma_rx_addr)) {
> > +		dev_err(spi_imx->dev, "DMA RX map failed\n");
> > +		goto rx_map_err;
> 
> there's only one user of the dma_unmap_single(), so no need for the goto.
> 
This goto unmaps the previously mapped TX buffer, not this RX one. TX has already been mapped when the RX mapping fails -- do we really not need to
roll back the TX mapping?

> > +	}
> > +
> > +	return 0;
> > +
> > +rx_map_err:
> > +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> > +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > +			 DMA_TO_DEVICE);
> > +	return -EINVAL;
> > +}
> > +
> > +static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
> > +				      struct dma_data_package *dma_data,
> > +				      const void *tx_buf,
> > +				      bool word_delay)
> > +{
> > +#ifdef __LITTLE_ENDIAN
> > +	unsigned int bytes_per_word =
> spi_imx_bytes_per_word(spi_imx->bits_per_word);
> > +	u32 *temp;
> > +#endif
> 
> move into scope of if()
> 
Will do this in V2.
> > +	void *copy_ptr;
> > +	int unaligned;
> > +
> > +	if (word_delay) {
> > +		dma_data->dma_len = dma_data->data_len;
> > +	} else {
> > +		/*
> > +		 * As per the reference manual, when burst length = 32*n + m bits,
> ECSPI
> > +		 * sends m LSB bits in the first word, followed by n full 32-bit words.
> > +		 * Since actual data may not be 4-byte aligned, allocate DMA TX/RX
> buffers
> > +		 * to ensure alignment. For TX, DMA pushes 4-byte aligned words to
> TXFIFO,
> > +		 * while ECSPI uses BURST_LENGTH settings to maintain correct bit
> count.
> > +		 * For RX, DMA receives 32-bit words from RXFIFO; after transfer
> completes,
> > +		 * trim the DMA RX buffer and copy the actual data to rx_buf.
> > +		 */
> 
> Ahh, please add the corresponding description for rx.
> 
Will do this in V2.
> > +		dma_data->dma_len = ALIGN(dma_data->data_len,
> BYTES_PER_32BITS_WORD);
> > +	}
> > +
> > +	dma_data->dma_tx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL |
> > +__GFP_ZERO);
> 
> kzalloc()?
> 
Yes. Will do this in V2
> > +	if (!dma_data->dma_tx_buf)
> > +		return -ENOMEM;
> > +
> > +	dma_data->dma_rx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL |
> __GFP_ZERO);
> > +	if (!dma_data->dma_rx_buf) {
> > +		kfree(dma_data->dma_tx_buf);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > +		copy_ptr = (u8 *)dma_data->dma_tx_buf +
> BYTES_PER_32BITS_WORD - unaligned;
> > +	} else {
> > +		copy_ptr = dma_data->dma_tx_buf;
> > +	}
> > +
> > +	memcpy(copy_ptr, tx_buf, dma_data->data_len);
> > +
> > +	/*
> > +	 * When word_delay is enabled, DMA transfers an entire word in one
> minor loop.
> > +	 * In this case, no data requires additional handling.
> > +	 */
> > +	if (word_delay)
> > +		return 0;
> > +
> > +#ifdef __LITTLE_ENDIAN
> > +	/*
> > +	 * On little-endian CPUs, adjust byte order:
> > +	 * - Swap bytes when bpw = 8
> > +	 * - Swap half-words when bpw = 16
> > +	 * This ensures correct data ordering for DMA transfers.
> > +	 */
> > +	temp = dma_data->dma_tx_buf;
> > +	for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len,
> BYTES_PER_32BITS_WORD); i++) {
> > +		if (bytes_per_word == 1)
> > +			swab32s(temp + i);
> > +		else if (bytes_per_word == 2)
> > +			swahw32s(temp + i);
> > +	}
> > +#endif
> > +
> > +	return 0;
> > +}
> > +
> > +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> > +				    struct spi_transfer *transfer,
> > +				    bool word_delay)
> > +{
> > +	u32 pre_bl, tail_bl;
> > +	u32 ctrl;
> > +	int ret;
> > +
> > +	/*
> > +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds
> 512
> > +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> > +	 * an extra DMA request is issued for the remaining data.
> 
> Why do you need an extra transfer in this case?
> 

BURST_LENGTH is what the SPI hardware uses to maintain the correct bit count, so BURST_LENGTH must
track the data length. Once a DMA request is submitted, SPI can no longer update BURST_LENGTH; when
an update is needed, we must split the transfer into two packages, update the register, then set up the second DMA transfer.

The ECSPI hardware can auto-reload BURST_LENGTH, but it always reloads the previous value.
When len > 512 and len is not a multiple of 512, we need two packages, the second for the tail data.
For example, len = 512 * 3 + 511: the first transfer uses BURST_LENGTH = 512 bytes (auto-reloaded 3 times) with DMA transfer len = 512 * 3,
and the second package uses BURST_LENGTH = 511 bytes with DMA transfer len = 511. (If we used 512 bytes as BURST_LENGTH here,
SPI would shift extra bits out/in, which is not acceptable!)
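As a plain-C sketch of that split decision (split_packages() is a hypothetical helper; the word_delay case and the actual register writes are left out), the BURST_LENGTH field values work out like this:

```c
#include <assert.h>
#include <stddef.h>

#define MX51_ECSPI_CTRL_MAX_BURST 512
#define BITS_PER_BYTE 8

/* Returns the number of DMA packages; *pre_bl and *tail_bl receive
 * the BURST_LENGTH field values (valid bit count minus one). */
static int split_packages(size_t len, unsigned int *pre_bl,
			  unsigned int *tail_bl)
{
	if (len <= MX51_ECSPI_CTRL_MAX_BURST) {
		/* one package, burst covers the whole transfer */
		*pre_bl = len * BITS_PER_BYTE - 1;
		return 1;
	}

	if (!(len % MX51_ECSPI_CTRL_MAX_BURST)) {
		/* one package, hardware auto-reloads the full burst */
		*pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
		return 1;
	}

	/* two packages: full bursts first, then the tail */
	*pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
	*tail_bl = (len % MX51_ECSPI_CTRL_MAX_BURST) * BITS_PER_BYTE - 1;
	return 2;
}
```

For len = 512 * 3 + 511 = 2047 this gives two packages with pre_bl = 4095 (512 bytes) and tail_bl = 4087 (511 bytes).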

Carlos Song
> > +	 */
> > +	ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL);
> > +	if (word_delay) {
> > +		/*
> > +		 * When SPI IMX need to support word delay, according to "Sample
> Period Control
> > +		 * Register" shows, The Sample Period Control Register
> (ECSPI_PERIODREG)
> > +		 * provides software a way to insert delays (wait states) between
> consecutive
> > +		 * SPI transfers. As a result, ECSPI can only transfer one word per
> frame, and
> > +		 * the delay occurs between frames.
> > +		 */
> > +		spi_imx->dma_package_num = 1;
> > +		pre_bl = spi_imx->bits_per_word - 1;
> > +	} else if (transfer->len <= MX51_ECSPI_CTRL_MAX_BURST) {
> > +		spi_imx->dma_package_num = 1;
> > +		pre_bl = transfer->len * BITS_PER_BYTE - 1;
> > +	} else if (!(transfer->len % MX51_ECSPI_CTRL_MAX_BURST)) {
> > +		spi_imx->dma_package_num = 1;
> > +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> > +	} else {
> > +		spi_imx->dma_package_num = 2;
> > +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> > +		tail_bl = (transfer->len % MX51_ECSPI_CTRL_MAX_BURST) *
> BITS_PER_BYTE - 1;
> > +	}
> > +
> > +	spi_imx->dma_data = kmalloc_array(spi_imx->dma_package_num,
> > +					  sizeof(struct dma_data_package),
> > +					  GFP_KERNEL | __GFP_ZERO);
> > +	if (!spi_imx->dma_data) {
> > +		dev_err(spi_imx->dev, "Failed to allocate DMA package buffer!\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (spi_imx->dma_package_num == 1) {
> > +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> > +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> > +		spi_imx->dma_data[0].cmd_word = ctrl;
> > +		spi_imx->dma_data[0].data_len = transfer->len;
> > +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0],
> transfer->tx_buf,
> > +						 word_delay);
> > +		if (ret) {
> > +			kfree(spi_imx->dma_data);
> > +			return ret;
> > +		}
> > +	} else {
> > +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> > +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> > +		spi_imx->dma_data[0].cmd_word = ctrl;
> > +		spi_imx->dma_data[0].data_len =
> DIV_ROUND_DOWN_ULL(transfer->len,
> > +								   MX51_ECSPI_CTRL_MAX_BURST)
> > +								   * MX51_ECSPI_CTRL_MAX_BURST;
> > +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0],
> transfer->tx_buf,
> > +						 false);
> > +		if (ret) {
> > +			kfree(spi_imx->dma_data);
> > +			return ret;
> > +		}
> > +
> > +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> > +		ctrl |= tail_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> > +		spi_imx->dma_data[1].cmd_word = ctrl;
> > +		spi_imx->dma_data[1].data_len = transfer->len %
> MX51_ECSPI_CTRL_MAX_BURST;
> > +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
> > +						 transfer->tx_buf +
> spi_imx->dma_data[0].data_len,
> > +						 false);
> > +		if (ret) {
> > +			kfree(spi_imx->dma_data[0].dma_tx_buf);
> > +			kfree(spi_imx->dma_data[0].dma_rx_buf);
> > +			kfree(spi_imx->dma_data);
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
> > +			      struct dma_data_package *dma_data,
> >  			      struct spi_transfer *transfer)  {
> > -	struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
> >  	struct spi_controller *controller = spi_imx->controller;
> >  	struct dma_async_tx_descriptor *desc_tx, *desc_rx;
> >  	unsigned long transfer_timeout;
> > @@ -1451,9 +1716,9 @@ static int spi_imx_dma_submit(struct spi_imx_data
> *spi_imx,
> >  	 * The TX DMA setup starts the transfer, so make sure RX is configured
> >  	 * before TX.
> >  	 */
> > -	desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
> > -					  rx->sgl, rx->nents, DMA_DEV_TO_MEM,
> > -					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> > +	desc_rx = dmaengine_prep_slave_single(controller->dma_rx,
> dma_data->dma_rx_addr,
> > +					      dma_data->dma_len, DMA_DEV_TO_MEM,
> > +					      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> >  	if (!desc_rx) {
> >  		transfer->error |= SPI_TRANS_FAIL_NO_START;
> >  		return -EINVAL;
> > @@ -1471,9 +1736,9 @@ static int spi_imx_dma_submit(struct spi_imx_data
> *spi_imx,
> >  	reinit_completion(&spi_imx->dma_rx_completion);
> >  	dma_async_issue_pending(controller->dma_rx);
> >
> > -	desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
> > -					  tx->sgl, tx->nents, DMA_MEM_TO_DEV,
> > -					  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> > +	desc_tx = dmaengine_prep_slave_single(controller->dma_tx,
> dma_data->dma_tx_addr,
> > +					      dma_data->dma_len, DMA_MEM_TO_DEV,
> > +					      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
> >  	if (!desc_tx)
> >  		goto dmaengine_terminate_rx;
> >
> > @@ -1521,16 +1786,16 @@ static int spi_imx_dma_submit(struct
> > spi_imx_data *spi_imx,  }
> >
> >  static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
> > -				     struct spi_transfer *transfer)
> > +				     struct dma_data_package *dma_data,
> > +				     bool word_delay)
> >  {
> > -	struct sg_table *rx = &transfer->rx_sg;
> > -	struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
> > -	unsigned int bytes_per_word, i;
> > +	unsigned int bytes_per_word = word_delay ?
> > +				      spi_imx_bytes_per_word(spi_imx->bits_per_word) :
> > +				      BYTES_PER_32BITS_WORD;
> > +	unsigned int i;
> >
> > -	/* Get the right burst length from the last sg to ensure no tail data */
> > -	bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
> >  	for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
> > -		if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
> > +		if (!dma_data->dma_len % (i * bytes_per_word))
> >  			break;
> >  	}
> >  	/* Use 1 as wml in case no available burst length got */ @@ -1540,25
> > +1805,29 @@ static void spi_imx_dma_max_wml_find(struct spi_imx_data
> *spi_imx,
> >  	spi_imx->wml = i;
> >  }
> >
> > -static int spi_imx_dma_configure(struct spi_controller *controller)
> > +static int spi_imx_dma_configure(struct spi_controller *controller,
> > +bool word_delay)
> >  {
> >  	int ret;
> >  	enum dma_slave_buswidth buswidth;
> >  	struct dma_slave_config rx = {}, tx = {};
> >  	struct spi_imx_data *spi_imx =
> > spi_controller_get_devdata(controller);
> >
> > -	switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
> > -	case 4:
> > +	if (word_delay) {
> > +		switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
> > +		case 4:
> > +			buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
> > +			break;
> > +		case 2:
> > +			buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
> > +			break;
> > +		case 1:
> > +			buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
> > +			break;
> > +		default:
> > +			return -EINVAL;
> > +		}
> > +	} else {
> >  		buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
> > -		break;
> > -	case 2:
> > -		buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
> > -		break;
> > -	case 1:
> > -		buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
> > -		break;
> > -	default:
> > -		return -EINVAL;
> >  	}
> >
> >  	tx.direction = DMA_MEM_TO_DEV;
> > @@ -1584,15 +1853,17 @@ static int spi_imx_dma_configure(struct
> spi_controller *controller)
> >  	return 0;
> >  }
> >
> > -static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> > -				struct spi_transfer *transfer)
> > +static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
> > +					struct dma_data_package *dma_data,
> > +					struct spi_transfer *transfer,
> > +					bool word_delay)
> >  {
> >  	struct spi_controller *controller = spi_imx->controller;
> >  	int ret;
> >
> > -	spi_imx_dma_max_wml_find(spi_imx, transfer);
> > +	spi_imx_dma_max_wml_find(spi_imx, dma_data, word_delay);
> >
> > -	ret = spi_imx_dma_configure(controller);
> > +	ret = spi_imx_dma_configure(controller, word_delay);
> >  	if (ret)
> >  		goto dma_failure_no_start;
> >
> > @@ -1603,10 +1874,17 @@ static int spi_imx_dma_transfer(struct
> spi_imx_data *spi_imx,
> >  	}
> >  	spi_imx->devtype_data->setup_wml(spi_imx);
> >
> > -	ret = spi_imx_dma_submit(spi_imx, transfer);
> > +	ret = spi_imx_dma_submit(spi_imx, dma_data, transfer);
> >  	if (ret)
> >  		return ret;
> >
> > +	/* Trim the DMA RX buffer and copy the actual data to rx_buf */
> > +	dma_sync_single_for_cpu(controller->dma_rx->device->dev,
> dma_data->dma_rx_addr,
> > +				dma_data->dma_len, DMA_FROM_DEVICE);
> > +	spi_imx_dma_rx_data_handle(spi_imx, dma_data, transfer->rx_buf +
> spi_imx->rx_offset,
> > +				   word_delay);
> > +	spi_imx->rx_offset += dma_data->data_len;
> > +
> >  	return 0;
> >  /* fallback to pio */
> >  dma_failure_no_start:
> > @@ -1614,6 +1892,60 @@ static int spi_imx_dma_transfer(struct
> spi_imx_data *spi_imx,
> >  	return ret;
> >  }
> >
> > +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> > +				struct spi_transfer *transfer)
> > +{
> > +	bool word_delay = transfer->word_delay.value != 0;
> > +	int ret;
> > +	int i;
> > +
> > +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> > +	if (ret < 0) {
> > +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> > +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> > +		goto fallback_pio;
> > +	}
> > +
> > +	spi_imx->rx_offset = 0;
> > +
> > +	/* Each dma_package performs a separate DMA transfer once */
> > +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> > +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> > +		if (ret < 0) {
> > +			transfer->error |= SPI_TRANS_FAIL_NO_START;
> > +			dev_err(spi_imx->dev, "DMA map fail\n");
> > +			break;
> > +		}
> > +
> > +		/* Update the CTRL register BL field */
> > +		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base +
> > +MX51_ECSPI_CTRL);
> > +
> > +		ret = spi_imx_dma_package_transfer(spi_imx,
> &spi_imx->dma_data[i],
> > +						   transfer, word_delay);
> > +
> > +		/* Whether the dma transmission is successful or not, dma unmap is
> necessary */
> > +		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
> > +
> > +		if (ret < 0) {
> > +			dev_dbg(spi_imx->dev, "DMA %d transfer not really finish\n", i);
> > +			break;
> > +		}
> > +	}
> > +
> > +	for (int j = 0; j < spi_imx->dma_package_num; j++) {
> > +		kfree(spi_imx->dma_data[j].dma_tx_buf);
> > +		kfree(spi_imx->dma_data[j].dma_rx_buf);
> > +	}
> > +	kfree(spi_imx->dma_data);
> > +
> > +fallback_pio:
> > +	/* If no any dma package data is transferred, fallback to PIO mode transfer
> */
> > +	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> > +		transfer->error &= !SPI_TRANS_FAIL_NO_START;
> > +
> > +	return ret;
> > +}
> > +
> >  static int spi_imx_pio_transfer(struct spi_device *spi,
> >  				struct spi_transfer *transfer)
> >  {
> > @@ -1780,9 +2112,14 @@ static int spi_imx_transfer_one(struct
> spi_controller *controller,
> >  	 * transfer, the SPI transfer has already been mapped, so we
> >  	 * have to do the DMA transfer here.
> >  	 */
> > -	if (spi_imx->usedma)
> > -		return spi_imx_dma_transfer(spi_imx, transfer);
> > -
> > +	if (spi_imx->usedma) {
> > +		ret = spi_imx_dma_transfer(spi_imx, transfer);
> > +		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> > +			spi_imx->usedma = false;
> > +			return spi_imx_pio_transfer(spi, transfer);
> > +		}
> > +		return ret;
> > +	}
> >  	/* run in polling mode for short transfers */
> >  	if (transfer->len == 1 || (polling_limit_us &&
> >  				   spi_imx_transfer_estimate_time_us(transfer) <
> > polling_limit_us))
> > --
> > 2.34.1
> >
> >
> >
> 
> regards,
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
  2025-11-25 12:10   ` Marc Kleine-Budde
@ 2025-11-26  8:11   ` Marc Kleine-Budde
  2025-11-26  8:18     ` [EXT] " Carlos Song
  2025-11-26  8:20   ` Marc Kleine-Budde
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26  8:11 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, frank.li, hawnguo, s.hauer, kernel, festevam, linux-spi,
	imx, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2146 bytes --]

On 25.11.2025 18:06:17, Carlos Song wrote:
> +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> +				struct spi_transfer *transfer)
> +{
> +	bool word_delay = transfer->word_delay.value != 0;
> +	int ret;
> +	int i;
> +
> +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> +	if (ret < 0) {
> +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> +		goto fallback_pio;
> +	}
> +
> +	spi_imx->rx_offset = 0;
> +
> +	/* Each dma_package performs a separate DMA transfer once */
> +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> +		if (ret < 0) {
> +			transfer->error |= SPI_TRANS_FAIL_NO_START;
> +			dev_err(spi_imx->dev, "DMA map fail\n");
> +			break;
> +		}
> +
> +		/* Update the CTRL register BL field */
> +		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base + MX51_ECSPI_CTRL);
> +
> +		ret = spi_imx_dma_package_transfer(spi_imx, &spi_imx->dma_data[i],
> +						   transfer, word_delay);
> +
> +		/* Whether the dma transmission is successful or not, dma unmap is necessary */
> +		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
> +
> +		if (ret < 0) {
> +			dev_dbg(spi_imx->dev, "DMA %d transfer not really finish\n", i);
> +			break;
> +		}
> +	}
> +
> +	for (int j = 0; j < spi_imx->dma_package_num; j++) {
> +		kfree(spi_imx->dma_data[j].dma_tx_buf);
> +		kfree(spi_imx->dma_data[j].dma_rx_buf);
> +	}
> +	kfree(spi_imx->dma_data);
> +
> +fallback_pio:
> +	/* If no any dma package data is transferred, fallback to PIO mode transfer */
> +	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> +		transfer->error &= !SPI_TRANS_FAIL_NO_START;
                                   ^
this doesn't look correct, you probably want to use a "~", right?

Marc


-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  8:11   ` Marc Kleine-Budde
@ 2025-11-26  8:18     ` Carlos Song
  0 siblings, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-26  8:18 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 4:12 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-spi@vger.kernel.org; imx@lists.linux.dev;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 25.11.2025 18:06:17, Carlos Song wrote:
> > +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> > +				struct spi_transfer *transfer)
> > +{
> > +	bool word_delay = transfer->word_delay.value != 0;
> > +	int ret;
> > +	int i;
> > +
> > +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> > +	if (ret < 0) {
> > +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> > +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> > +		goto fallback_pio;
> > +	}
> > +
> > +	spi_imx->rx_offset = 0;
> > +
> > +	/* Each dma_package performs a separate DMA transfer once */
> > +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> > +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> > +		if (ret < 0) {
> > +			transfer->error |= SPI_TRANS_FAIL_NO_START;
> > +			dev_err(spi_imx->dev, "DMA map fail\n");
> > +			break;
> > +		}
> > +
> > +		/* Update the CTRL register BL field */
> > +		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base +
> MX51_ECSPI_CTRL);
> > +
> > +		ret = spi_imx_dma_package_transfer(spi_imx,
> &spi_imx->dma_data[i],
> > +						   transfer, word_delay);
> > +
> > +		/* Whether the dma transmission is successful or not, dma unmap is
> necessary */
> > +		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
> > +
> > +		if (ret < 0) {
> > +			dev_dbg(spi_imx->dev, "DMA %d transfer not really finish\n", i);
> > +			break;
> > +		}
> > +	}
> > +
> > +	for (int j = 0; j < spi_imx->dma_package_num; j++) {
> > +		kfree(spi_imx->dma_data[j].dma_tx_buf);
> > +		kfree(spi_imx->dma_data[j].dma_rx_buf);
> > +	}
> > +	kfree(spi_imx->dma_data);
> > +
> > +fallback_pio:
> > +	/* If no any dma package data is transferred, fallback to PIO mode transfer
> */
> > +	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> > +		transfer->error &= !SPI_TRANS_FAIL_NO_START;
>                                    ^
> this doesn't look correct, you probably want to use a "~", right?
> 
> Marc
> 

Ohh! Yes! You are totally right!
Will also fix it in V2~
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
  2025-11-25 12:10   ` Marc Kleine-Budde
  2025-11-26  8:11   ` Marc Kleine-Budde
@ 2025-11-26  8:20   ` Marc Kleine-Budde
  2025-11-26  8:34     ` [EXT] " Carlos Song
  2025-11-26  8:27   ` Marc Kleine-Budde
  2025-11-26 12:22   ` Marc Kleine-Budde
  4 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26  8:20 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, frank.li, hawnguo, s.hauer, kernel, festevam, linux-spi,
	imx, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2189 bytes --]

On 25.11.2025 18:06:17, Carlos Song wrote:
> +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> +				struct spi_transfer *transfer)
> +{
> +	bool word_delay = transfer->word_delay.value != 0;
> +	int ret;
> +	int i;
> +
> +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> +	if (ret < 0) {
> +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> +		goto fallback_pio;
> +	}
> +
> +	spi_imx->rx_offset = 0;
> +
> +	/* Each dma_package performs a separate DMA transfer once */
> +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> +		if (ret < 0) {
> +			transfer->error |= SPI_TRANS_FAIL_NO_START;

What about:

if (i == 0)
        transfer->error |= SPI_TRANS_FAIL_NO_START;

instead of removing the later?
> +			dev_err(spi_imx->dev, "DMA map fail\n");
> +			break;
> +		}
> +
> +		/* Update the CTRL register BL field */
> +		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base + MX51_ECSPI_CTRL);
> +
> +		ret = spi_imx_dma_package_transfer(spi_imx, &spi_imx->dma_data[i],
> +						   transfer, word_delay);
> +
> +		/* Whether the dma transmission is successful or not, dma unmap is necessary */
> +		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
> +
> +		if (ret < 0) {
> +			dev_dbg(spi_imx->dev, "DMA %d transfer not really finish\n", i);
> +			break;
> +		}
> +	}
> +
> +	for (int j = 0; j < spi_imx->dma_package_num; j++) {
> +		kfree(spi_imx->dma_data[j].dma_tx_buf);
> +		kfree(spi_imx->dma_data[j].dma_rx_buf);
> +	}
> +	kfree(spi_imx->dma_data);
> +
> +fallback_pio:
> +	/* If no any dma package data is transferred, fallback to PIO mode transfer */
> +	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> +		transfer->error &= !SPI_TRANS_FAIL_NO_START;
> +
> +	return ret;
> +}
> +

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
                     ` (2 preceding siblings ...)
  2025-11-26  8:20   ` Marc Kleine-Budde
@ 2025-11-26  8:27   ` Marc Kleine-Budde
  2025-11-26  8:43     ` [EXT] " Carlos Song
  2025-11-26 12:22   ` Marc Kleine-Budde
  4 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26  8:27 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, frank.li, hawnguo, s.hauer, kernel, festevam, linux-spi,
	imx, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3703 bytes --]

On 25.11.2025 18:06:17, Carlos Song wrote:
> +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> +				    struct spi_transfer *transfer,
> +				    bool word_delay)
> +{
> +	u32 pre_bl, tail_bl;
> +	u32 ctrl;
> +	int ret;
> +
> +	/*
> +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds 512
> +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> +	 * an extra DMA request is issued for the remaining data.
> +	 */
> +	ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL);
> +	if (word_delay) {
> +		/*
> +		 * When SPI IMX need to support word delay, according to "Sample Period Control
> +		 * Register" shows, The Sample Period Control Register (ECSPI_PERIODREG)
> +		 * provides software a way to insert delays (wait states) between consecutive
> +		 * SPI transfers. As a result, ECSPI can only transfer one word per frame, and
> +		 * the delay occurs between frames.
> +		 */
> +		spi_imx->dma_package_num = 1;
> +		pre_bl = spi_imx->bits_per_word - 1;
> +	} else if (transfer->len <= MX51_ECSPI_CTRL_MAX_BURST) {
> +		spi_imx->dma_package_num = 1;
> +		pre_bl = transfer->len * BITS_PER_BYTE - 1;
> +	} else if (!(transfer->len % MX51_ECSPI_CTRL_MAX_BURST)) {
> +		spi_imx->dma_package_num = 1;
> +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> +	} else {
> +		spi_imx->dma_package_num = 2;
> +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> +		tail_bl = (transfer->len % MX51_ECSPI_CTRL_MAX_BURST) * BITS_PER_BYTE - 1;
> +	}
> +
> +	spi_imx->dma_data = kmalloc_array(spi_imx->dma_package_num,
> +					  sizeof(struct dma_data_package),
> +					  GFP_KERNEL | __GFP_ZERO);
> +	if (!spi_imx->dma_data) {
> +		dev_err(spi_imx->dev, "Failed to allocate DMA package buffer!\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (spi_imx->dma_package_num == 1) {
> +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> +		spi_imx->dma_data[0].cmd_word = ctrl;
> +		spi_imx->dma_data[0].data_len = transfer->len;
> +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
> +						 word_delay);
> +		if (ret) {
> +			kfree(spi_imx->dma_data);
> +			return ret;
> +		}
> +	} else {
> +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> +		spi_imx->dma_data[0].cmd_word = ctrl;
> +		spi_imx->dma_data[0].data_len = DIV_ROUND_DOWN_ULL(transfer->len,
> +								   MX51_ECSPI_CTRL_MAX_BURST)
> +								   * MX51_ECSPI_CTRL_MAX_BURST;

What mathematical operation do you want to do? Why do you use a 64 bit
div? What about round_down()?

> +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
> +						 false);
> +		if (ret) {
> +			kfree(spi_imx->dma_data);
> +			return ret;
> +		}
> +
> +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> +		ctrl |= tail_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> +		spi_imx->dma_data[1].cmd_word = ctrl;
> +		spi_imx->dma_data[1].data_len = transfer->len % MX51_ECSPI_CTRL_MAX_BURST;
> +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
> +						 transfer->tx_buf + spi_imx->dma_data[0].data_len,
> +						 false);
> +		if (ret) {
> +			kfree(spi_imx->dma_data[0].dma_tx_buf);
> +			kfree(spi_imx->dma_data[0].dma_rx_buf);
> +			kfree(spi_imx->dma_data);
> +		}
> +	}
> +
> +	return 0;
> +}

regards,
Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  7:42     ` Carlos Song
@ 2025-11-26  8:31       ` Marc Kleine-Budde
  2025-11-26  9:51         ` [EXT] " Carlos Song
  2025-11-26 11:17         ` Carlos Song
  0 siblings, 2 replies; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26  8:31 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-spi@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 10842 bytes --]

On 26.11.2025 07:42:36, Carlos Song wrote:

[...]

> > > +			if (bytes_per_word == 1)
> > > +				swab32s(temp + i);
> > > +			else if (bytes_per_word == 2)
> > > +				swahw32s(temp + i);
> >
> > Why do you do first a swap in place and then a memcpy()
> >
> When dynamic burst is enabled, DMA always copies data with buswidth = 4 bytes.
> The CPU is little endian, so the byte order in dma_buf is little endian.
> But for bytes_per_word = 1, every byte must keep the same order as in memory, i.e. big endian, so every byte is swapped.
> For the same reason, with bytes_per_word = 2 the half-words are swapped.
> bytes_per_word = 4 needs no handling.
> (SPI does not mandate a byte order, so the SPI byte order always follows the CPU byte order, as it does here.)

Thanks for the explanation. I think my question was not completely
clear. I want to know why you touch every byte twice, first you do the
swap on the existing buffer, then you do a memcpy(). You might do both
operations in one go, i.e. read the bytes from the original buffer and
write them swapped to the bounce buffer.
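To make the suggestion concrete, here is a minimal single-pass swap-and-copy sketch (standalone user-space code, not the driver as posted; `swap32()`/`swaphw32()` stand in for the kernel's `swab32()`/`swahw32()`):

```c
#include <stdint.h>
#include <stddef.h>

/* Byte swap within a 32-bit word, like the kernel's swab32() (bpw == 8). */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00) |
	       ((v << 8) & 0xff0000) | (v << 24);
}

/* Half-word swap within a 32-bit word, like swahw32() (bpw == 16). */
static uint32_t swaphw32(uint32_t v)
{
	return (v >> 16) | (v << 16);
}

/*
 * One pass over the data: read each word from the original buffer and
 * write it swapped into the bounce buffer, instead of an in-place swap
 * followed by a separate memcpy().
 */
static void swap_copy(uint32_t *bounce, const uint32_t *src,
		      size_t nwords, unsigned int bytes_per_word)
{
	for (size_t i = 0; i < nwords; i++) {
		if (bytes_per_word == 1)
			bounce[i] = swap32(src[i]);
		else if (bytes_per_word == 2)
			bounce[i] = swaphw32(src[i]);
		else
			bounce[i] = src[i];
	}
}
```

Each byte is then read and written exactly once, which matters for large transfers.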

> > > +		}
> > > +	}
> > > +#endif
> > > +
> > > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> >
> > I think this deserves a comment, why you do a re-alignment of the data here.
> >
> Yes. I can add one comment here.
>
> In fact it is not a re-alignment, it trims the data.
> When dynamic burst is enabled, DMA always copies data with bus width = 4 bytes,
> so DMA always reads 4 bytes at a time from the RXFIFO. If the data length is not 4-byte aligned,
> the DMA bounce buffer contains extra garbage bytes, so it needs to be trimmed before the memcpy() to the xfer buffer.
>
> Why the first word?
> It comes from the HW behavior. When dynamic burst is enabled, BURST_LENGTH is set to match the actual data length,
> which maintains the correct bit count.
> As RM shows:
> "
> In master mode, it controls the number of bits per SPI burst. Since the shift register always loads 32-bit
> data from transmit FIFO, only the n least-significant (n = BURST LENGTH + 1) will be shifted out. The
> remaining bits will be ignored.
>
> Number of Valid Bits in a SPI burst.
>
> 0x000 A SPI burst contains the 1 LSB in a word.
> 0x001 A SPI burst contains the 2 LSB in a word.
> 0x002 A SPI burst contains the 3 LSB in a word.
> ...
> 0x01F A SPI burst contains all 32 bits in a word.
> 0x020 A SPI burst contains the 1 LSB in first word and all 32 bits in second word.
> 0x021 A SPI burst contains the 2 LSB in first word and all 32 bits in second word.
> "
> When the data length is not 4-byte aligned, the first word always includes some garbage bytes (e.g. when transferring 7 bytes, the first word contains one garbage byte and 3 valid bytes, and the second word contains four valid bytes).
>
> > > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > > +		copy_ptr = (u8 *)dma_data->dma_rx_buf +
> > BYTES_PER_32BITS_WORD - unaligned;
> > > +	} else {
> > > +		copy_ptr = dma_data->dma_rx_buf;
> >
> > Why do you use the bounce buffer if the data is aligned correct?
> >
> Even when the data is 4-byte aligned, on a little-endian CPU a byte swap must additionally be done depending on the bytes-per-word setting.

But for 32 bits per word or word delay you do a not needed bounce
buffers and memcpy()?

We still need the bounce buffer for length not multiple of 4, because a
direct DMA write would overflow the destination buffer, right? And of
course bounce buffers for LE 8 and 16 bit per word for the byte
swapping.

> To summarize the whole dynamic burst solution for DMA mode:
> 1. Always read/write the SPI FIFO with DMA buswidth = 4, so the DMA bounce buffer is always 4-byte aligned:
>   swap bytes/half-words for bits_per_word = 8/16 on a little-endian CPU.
> 2. The BURST_LENGTH setting is important; it maintains the correct bit count (HW trim: extra bits are neither shifted out of the TXFIFO nor shifted into the RXFIFO):
>   * TX: although DMA pushes 4-byte-aligned data into the FIFO, the valid data is placed in the valid LSBs of the first word of the bounce buffer, so ECSPI shifts out only the valid data in the bounce buffer.
>   * RX: although DMA fetches 4-byte-aligned data from the RXFIFO into the bounce buffer, it is trimmed to the valid LSBs according to the actual xfer->len, so rx_buf contains the right data.
>
> > > +	}
> > > +
> > > +	memcpy(rx_buf, copy_ptr, dma_data->data_len); }
> > > +
> > > +static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
> > > +			   struct dma_data_package *dma_data) {
> > > +	struct spi_controller *controller = spi_imx->controller;
> > > +	struct device *tx_dev = controller->dma_tx->device->dev;
> > > +	struct device *rx_dev = controller->dma_rx->device->dev;
> > > +
> > > +	dma_data->dma_tx_addr = dma_map_single(tx_dev,
> > dma_data->dma_tx_buf,
> > > +
> > DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > +					       DMA_TO_DEVICE);
> > > +	if (dma_mapping_error(tx_dev, dma_data->dma_tx_addr)) {
> > > +		dev_err(spi_imx->dev, "DMA TX map failed\n");
> > > +		return -EINVAL;
> >
> > please propagate the error value of dma_mapping_error()
> >
>
> Will do this in V2
> > > +	}
> > > +
> > > +	dma_data->dma_rx_addr = dma_map_single(rx_dev,
> > dma_data->dma_rx_buf,
> > > +
> > DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > +					       DMA_FROM_DEVICE);
> > > +	if (dma_mapping_error(rx_dev, dma_data->dma_rx_addr)) {
> > > +		dev_err(spi_imx->dev, "DMA RX map failed\n");
> > > +		goto rx_map_err;
> >
> > there's only one user of the dma_unmap_single(), so no need for the goto.
> >
> This goto is to unmap the previous TX mapping, not this RX one. TX has been mapped and then we start to map RX;
> if the RX mapping fails, do we really not need to roll back the TX mapping?

Sorry there was a misunderstanding, I mean you should directly call
dma_unmap_single() and remove the goto.

>
> > > +	}
> > > +
> > > +	return 0;
> > > +
> > > +rx_map_err:
> > > +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> > > +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > +			 DMA_TO_DEVICE);
> > > +	return -EINVAL;
> > > +}
> > > +
> > > +static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
> > > +				      struct dma_data_package *dma_data,
> > > +				      const void *tx_buf,
> > > +				      bool word_delay)
> > > +{
> > > +#ifdef __LITTLE_ENDIAN
> > > +	unsigned int bytes_per_word =
> > spi_imx_bytes_per_word(spi_imx->bits_per_word);
> > > +	u32 *temp;
> > > +#endif
> >
> > move into scope of if()
> >
> Will do this in V2.
> > > +	void *copy_ptr;
> > > +	int unaligned;
> > > +
> > > +	if (word_delay) {
> > > +		dma_data->dma_len = dma_data->data_len;
> > > +	} else {
> > > +		/*
> > > +		 * As per the reference manual, when burst length = 32*n + m bits,
> > ECSPI
> > > +		 * sends m LSB bits in the first word, followed by n full 32-bit words.
> > > +		 * Since actual data may not be 4-byte aligned, allocate DMA TX/RX
> > buffers
> > > +		 * to ensure alignment. For TX, DMA pushes 4-byte aligned words to
> > TXFIFO,
> > > +		 * while ECSPI uses BURST_LENGTH settings to maintain correct bit
> > count.
> > > +		 * For RX, DMA receives 32-bit words from RXFIFO; after transfer
> > completes,
> > > +		 * trim the DMA RX buffer and copy the actual data to rx_buf.
> > > +		 */
> >
> > Ahh, please add the corresponding description for rx.
> >
> Will do this in V2.
> > > +		dma_data->dma_len = ALIGN(dma_data->data_len,
> > BYTES_PER_32BITS_WORD);
> > > +	}
> > > +
> > > +	dma_data->dma_tx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL |
> > > +__GFP_ZERO);
> >
> > kzalloc()?
> >
> Yes. Will do this in V2
> > > +	if (!dma_data->dma_tx_buf)
> > > +		return -ENOMEM;
> > > +
> > > +	dma_data->dma_rx_buf = kmalloc(dma_data->dma_len, GFP_KERNEL |
> > __GFP_ZERO);
> > > +	if (!dma_data->dma_rx_buf) {
> > > +		kfree(dma_data->dma_tx_buf);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
> > > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > > +		copy_ptr = (u8 *)dma_data->dma_tx_buf +
> > BYTES_PER_32BITS_WORD - unaligned;
> > > +	} else {
> > > +		copy_ptr = dma_data->dma_tx_buf;
> > > +	}
> > > +
> > > +	memcpy(copy_ptr, tx_buf, dma_data->data_len);
> > > +
> > > +	/*
> > > +	 * When word_delay is enabled, DMA transfers an entire word in one
> > minor loop.
> > > +	 * In this case, no data requires additional handling.
> > > +	 */
> > > +	if (word_delay)
> > > +		return 0;
> > > +
> > > +#ifdef __LITTLE_ENDIAN
> > > +	/*
> > > +	 * On little-endian CPUs, adjust byte order:
> > > +	 * - Swap bytes when bpw = 8
> > > +	 * - Swap half-words when bpw = 16
> > > +	 * This ensures correct data ordering for DMA transfers.
> > > +	 */
> > > +	temp = dma_data->dma_tx_buf;
> > > +	for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len,
> > BYTES_PER_32BITS_WORD); i++) {
> > > +		if (bytes_per_word == 1)
> > > +			swab32s(temp + i);
> > > +		else if (bytes_per_word == 2)
> > > +			swahw32s(temp + i);
> > > +	}
> > > +#endif
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> > > +				    struct spi_transfer *transfer,
> > > +				    bool word_delay)
> > > +{
> > > +	u32 pre_bl, tail_bl;
> > > +	u32 ctrl;
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds
> > 512
> > > +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> > > +	 * an extra DMA request is issued for the remaining data.
> >
> > Why do you need an extra transfer in this case?
> >
>
> BURST_LENGTH is used by the SPI HW to maintain the correct bit count, so BURST_LENGTH must be updated along with
> the data length. After a DMA request is submitted, SPI cannot update BURST_LENGTH any more; when an update is needed, we must
> split the transfer into two packages, update the register, then set up the second DMA transfer.
>
> The ECSPI HW can update BURST_LENGTH automatically, but it always reuses the previous value.
> When len > 512 and len is not a multiple of 512, we need two packages, the second for the tail data.
> For example, len = 512 * 3 + 511: the first transfer uses BURST_LENGTH = 512 bytes (auto-updated 3 times) with DMA transfer len = 512 * 3;
> the second package uses BURST_LENGTH = 511 bytes with DMA transfer len = 511. (If we used 512 bytes as BURST_LENGTH here,
> SPI would shift extra bits out/in, which is not acceptable!)

What happens if you keep the Burst Length at 512 and only transfer 511
bytes with the DMA engine?
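For reference, the package split being discussed can be written out as a standalone sketch (mirroring the posted logic; the constant and names are illustrative, not the driver's):

```c
#include <stdint.h>

#define MAX_BURST     512u /* MX51_ECSPI_CTRL_MAX_BURST, in bytes */
#define BITS_PER_BYTE 8u

struct burst_plan {
	unsigned int npackages; /* number of DMA packages needed */
	unsigned int pre_bl;    /* BURST_LENGTH field for the first package */
	unsigned int tail_bl;   /* BURST_LENGTH field for the tail package */
};

/* Split a transfer of len bytes into full-burst and tail packages. */
static struct burst_plan plan_bursts(unsigned int len)
{
	struct burst_plan p = { .npackages = 1, .tail_bl = 0 };

	if (len <= MAX_BURST) {
		p.pre_bl = len * BITS_PER_BYTE - 1;
	} else if (len % MAX_BURST == 0) {
		p.pre_bl = MAX_BURST * BITS_PER_BYTE - 1;
	} else {
		/* e.g. len = 512 * 3 + 511: one full-burst package plus a tail */
		p.npackages = 2;
		p.pre_bl = MAX_BURST * BITS_PER_BYTE - 1;
		p.tail_bl = (len % MAX_BURST) * BITS_PER_BYTE - 1;
	}

	return p;
}
```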

Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  8:20   ` Marc Kleine-Budde
@ 2025-11-26  8:34     ` Carlos Song
  2025-11-26  8:44       ` Marc Kleine-Budde
  0 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-26  8:34 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 4:20 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-spi@vger.kernel.org; imx@lists.linux.dev;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 25.11.2025 18:06:17, Carlos Song wrote:
> > +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> > +				struct spi_transfer *transfer)
> > +{
> > +	bool word_delay = transfer->word_delay.value != 0;
> > +	int ret;
> > +	int i;
> > +
> > +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> > +	if (ret < 0) {
> > +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> > +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> > +		goto fallback_pio;
> > +	}
> > +
> > +	spi_imx->rx_offset = 0;
> > +
> > +	/* Each dma_package performs a separate DMA transfer once */
> > +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> > +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> > +		if (ret < 0) {
> > +			transfer->error |= SPI_TRANS_FAIL_NO_START;
> 
> What about:
> 
> if (i == 0)
>         transfer->error |= SPI_TRANS_FAIL_NO_START;
> 
> instead of removing the later?

	for (i = 0; i < spi_imx->dma_package_num; i++) {
		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
		if (ret < 0) {
			if (I == 0)
				transfer->error |= SPI_TRANS_FAIL_NO_START;
			dev_err(spi_imx->dev, "DMA map fail\n");
			break;
		}
...
-	/* If no any dma package data is transferred, fallback to PIO mode transfer */
-	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
-		transfer->error &= !SPI_TRANS_FAIL_NO_START;

	return ret;

Just like this? I accept this. I can fix this in V2
> > +			dev_err(spi_imx->dev, "DMA map fail\n");
> > +			break;
> > +		}
> > +
> > +		/* Update the CTRL register BL field */
> > +		writel(spi_imx->dma_data[i].cmd_word, spi_imx->base +
> MX51_ECSPI_CTRL);
> > +
> > +		ret = spi_imx_dma_package_transfer(spi_imx,
> &spi_imx->dma_data[i],
> > +						   transfer, word_delay);
> > +
> > +		/* Whether the dma transmission is successful or not, dma unmap is
> necessary */
> > +		spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
> > +
> > +		if (ret < 0) {
> > +			dev_dbg(spi_imx->dev, "DMA %d transfer not really finish\n", i);
> > +			break;
> > +		}
> > +	}
> > +
> > +	for (int j = 0; j < spi_imx->dma_package_num; j++) {
> > +		kfree(spi_imx->dma_data[j].dma_tx_buf);
> > +		kfree(spi_imx->dma_data[j].dma_rx_buf);
> > +	}
> > +	kfree(spi_imx->dma_data);
> > +
> > +fallback_pio:
> > +	/* If no any dma package data is transferred, fallback to PIO mode transfer
> */
> > +	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> > +		transfer->error &= !SPI_TRANS_FAIL_NO_START;
> > +
> > +	return ret;
> > +}
> > +
> 
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  8:27   ` Marc Kleine-Budde
@ 2025-11-26  8:43     ` Carlos Song
  0 siblings, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-26  8:43 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 4:28 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-spi@vger.kernel.org; imx@lists.linux.dev;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 25.11.2025 18:06:17, Carlos Song wrote:
> > +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> > +				    struct spi_transfer *transfer,
> > +				    bool word_delay)
> > +{
> > +	u32 pre_bl, tail_bl;
> > +	u32 ctrl;
> > +	int ret;
> > +
> > +	/*
> > +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds
> 512
> > +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> > +	 * an extra DMA request is issued for the remaining data.
> > +	 */
> > +	ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL);
> > +	if (word_delay) {
> > +		/*
> > +		 * When SPI IMX need to support word delay, according to "Sample
> Period Control
> > +		 * Register" shows, The Sample Period Control Register
> (ECSPI_PERIODREG)
> > +		 * provides software a way to insert delays (wait states) between
> consecutive
> > +		 * SPI transfers. As a result, ECSPI can only transfer one word per
> frame, and
> > +		 * the delay occurs between frames.
> > +		 */
> > +		spi_imx->dma_package_num = 1;
> > +		pre_bl = spi_imx->bits_per_word - 1;
> > +	} else if (transfer->len <= MX51_ECSPI_CTRL_MAX_BURST) {
> > +		spi_imx->dma_package_num = 1;
> > +		pre_bl = transfer->len * BITS_PER_BYTE - 1;
> > +	} else if (!(transfer->len % MX51_ECSPI_CTRL_MAX_BURST)) {
> > +		spi_imx->dma_package_num = 1;
> > +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> > +	} else {
> > +		spi_imx->dma_package_num = 2;
> > +		pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
> > +		tail_bl = (transfer->len % MX51_ECSPI_CTRL_MAX_BURST) *
> BITS_PER_BYTE - 1;
> > +	}
> > +
> > +	spi_imx->dma_data = kmalloc_array(spi_imx->dma_package_num,
> > +					  sizeof(struct dma_data_package),
> > +					  GFP_KERNEL | __GFP_ZERO);
> > +	if (!spi_imx->dma_data) {
> > +		dev_err(spi_imx->dev, "Failed to allocate DMA package buffer!\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (spi_imx->dma_package_num == 1) {
> > +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> > +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> > +		spi_imx->dma_data[0].cmd_word = ctrl;
> > +		spi_imx->dma_data[0].data_len = transfer->len;
> > +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0],
> transfer->tx_buf,
> > +						 word_delay);
> > +		if (ret) {
> > +			kfree(spi_imx->dma_data);
> > +			return ret;
> > +		}
> > +	} else {
> > +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> > +		ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> > +		spi_imx->dma_data[0].cmd_word = ctrl;
> > +		spi_imx->dma_data[0].data_len =
> DIV_ROUND_DOWN_ULL(transfer->len,
> > +								   MX51_ECSPI_CTRL_MAX_BURST)
> > +								   * MX51_ECSPI_CTRL_MAX_BURST;
> 
> What mathematical operation do you want to do? Why do you use a 64 bit div?
> What about round_down()?
> 

I want to round xfer->len down to a multiple of MX51_ECSPI_CTRL_MAX_BURST, i.e. calculate how many full MX51_ECSPI_CTRL_MAX_BURST chunks fit in xfer->len.
For example, when len = 512 * 3 + 511, this gives spi_imx->dma_data[0].data_len = 512 * 3.
round_down() is enough (u32); no 64-bit division is needed. I can fix it in V2.
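To illustrate, the kernel's round_down() (valid for a power-of-2 divisor) gives exactly that split. A standalone sketch, with the macro reproduced so the snippet is self-contained:

```c
#include <stdint.h>

/* round_down() as in include/linux/math.h, for a power-of-2 divisor */
#define round_down(x, y) ((x) & ~((__typeof__(x))(y) - 1))

#define MAX_BURST 512u /* MX51_ECSPI_CTRL_MAX_BURST */

/* Length of the full-burst part of a transfer, in bytes (32-bit math). */
static uint32_t full_burst_len(uint32_t len)
{
	return round_down(len, MAX_BURST);
}
```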

> > +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0],
> transfer->tx_buf,
> > +						 false);
> > +		if (ret) {
> > +			kfree(spi_imx->dma_data);
> > +			return ret;
> > +		}
> > +
> > +		ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
> > +		ctrl |= tail_bl << MX51_ECSPI_CTRL_BL_OFFSET;
> > +		spi_imx->dma_data[1].cmd_word = ctrl;
> > +		spi_imx->dma_data[1].data_len = transfer->len %
> MX51_ECSPI_CTRL_MAX_BURST;
> > +		ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
> > +						 transfer->tx_buf +
> spi_imx->dma_data[0].data_len,
> > +						 false);
> > +		if (ret) {
> > +			kfree(spi_imx->dma_data[0].dma_tx_buf);
> > +			kfree(spi_imx->dma_data[0].dma_rx_buf);
> > +			kfree(spi_imx->dma_data);
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> 
> regards,
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  8:34     ` [EXT] " Carlos Song
@ 2025-11-26  8:44       ` Marc Kleine-Budde
  0 siblings, 0 replies; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26  8:44 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie@kernel.org, Frank Li, s.hauer@pengutronix.de,
	kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1927 bytes --]

On 26.11.2025 08:34:18, Carlos Song wrote:
> > On 25.11.2025 18:06:17, Carlos Song wrote:
> > > +static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> > > +				struct spi_transfer *transfer)
> > > +{
> > > +	bool word_delay = transfer->word_delay.value != 0;
> > > +	int ret;
> > > +	int i;
> > > +
> > > +	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> > > +	if (ret < 0) {
> > > +		transfer->error |= SPI_TRANS_FAIL_NO_START;
> > > +		dev_err(spi_imx->dev, "DMA data prepare fail\n");
> > > +		goto fallback_pio;
> > > +	}
> > > +
> > > +	spi_imx->rx_offset = 0;
> > > +
> > > +	/* Each dma_package performs a separate DMA transfer once */
> > > +	for (i = 0; i < spi_imx->dma_package_num; i++) {
> > > +		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> > > +		if (ret < 0) {
> > > +			transfer->error |= SPI_TRANS_FAIL_NO_START;
> >
> > What about:
> >
> > if (i == 0)
> >         transfer->error |= SPI_TRANS_FAIL_NO_START;
> >
> > instead of removing the later?
>
> 	for (i = 0; i < spi_imx->dma_package_num; i++) {
> 		ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
> 		if (ret < 0) {
> 			if (I == 0)
                            ^ i
> 				transfer->error |= SPI_TRANS_FAIL_NO_START;
> 			dev_err(spi_imx->dev, "DMA map fail\n");
> 			break;
> 		}
> ...
> -	/* If no any dma package data is transferred, fallback to PIO mode transfer */
> -	if ((transfer->error & SPI_TRANS_FAIL_NO_START) && i != 0)
> -		transfer->error &= !SPI_TRANS_FAIL_NO_START;
>
> 	return ret;
>
> Just like this? I accept this. I can fix this in V2

Yes, with the fixed typo.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  8:31       ` Marc Kleine-Budde
@ 2025-11-26  9:51         ` Carlos Song
  2025-11-26 11:17         ` Carlos Song
  1 sibling, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-26  9:51 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-spi@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 4:31 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linux-spi@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 26.11.2025 07:42:36, Carlos Song wrote:
> 
> [...]
> 
> > > > +			if (bytes_per_word == 1)
> > > > +				swab32s(temp + i);
> > > > +			else if (bytes_per_word == 2)
> > > > +				swahw32s(temp + i);
> > >
> > > Why do you do first a swap in place and then a memcpy()
> > >
> > When dynamic burst is enabled, DMA always copies data with buswidth = 4 bytes.
> > The CPU is little endian, so the byte order in dma_buf is little endian.
> > But for bytes_per_word = 1, every byte must keep the same order as in memory,
> > i.e. big endian, so every byte is swapped.
> > For the same reason, with bytes_per_word = 2 the half-words are swapped.
> > bytes_per_word = 4 needs no handling.
> > (SPI does not mandate a byte order, so the SPI byte order always follows the CPU
> > byte order, as it does here.)
> 
> Thanks for the explanation. I think my question was not completely clear. I want
> to know why you touch every byte twice, first you do the swap on the existing
> buffer, then you do a memcpy(). You might do both operations in one go, i.e.
> read the bytes from the original buffer and write them swapped to the bounce
> buffer.
> 

Hi, Marc

If the CPU is big endian, we don't touch these bytes twice.
But if the CPU is little endian, we must touch them twice; this is needed to get the right byte order.

Also, when the RX trim and the TX byte relocation are needed, the two steps cannot be combined into one.
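For clarity, the offset of the valid payload within the first 32-bit word (the TX relocation / RX trim mentioned here) can be sketched standalone (illustrative names, not the driver's):

```c
#define BYTES_PER_32BITS_WORD 4u

/*
 * With burst length = 32*n + m bits, ECSPI shifts only the m LSBs of the
 * first word, so the valid payload sits at the end of that word and the
 * leading bytes are garbage. This returns the byte offset of the payload
 * within the bounce buffer's first word.
 */
static unsigned int first_word_offset(unsigned int data_len)
{
	unsigned int unaligned = data_len % BYTES_PER_32BITS_WORD;

	return unaligned ? BYTES_PER_32BITS_WORD - unaligned : 0;
}
```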

> > > > +		}
> > > > +	}
> > > > +#endif
> > > > +
> > > > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD
> && !word_delay) {
> > >
> > > I think this deserves a comment, why you do a re-alignment of the data
> here.
> > >
> > Yes. I can add one comment here.
> >
> > In fact it is not a re-alignment, it trims the data.
> > When dynamic burst is enabled, DMA always copies data with bus width = 4 bytes,
> > so DMA always reads 4 bytes at a time from the RXFIFO. If the data length is not
> > 4-byte aligned, the DMA bounce buffer contains extra garbage bytes, so it needs
> > to be trimmed before the memcpy() to the xfer buffer.
> >
> > Why the first word?
> > It comes from the HW behavior. When dynamic burst is enabled, BURST_LENGTH
> > is set to match the actual data length, which maintains the correct bit count.
> >
> > As RM shows:
> > "
> > In master mode, it controls the number of bits per SPI burst. Since
> > the shift register always loads 32-bit data from transmit FIFO, only
> > the n least-significant (n = BURST LENGTH + 1) will be shifted out. The
> remaining bits will be ignored.
> >
> > Number of Valid Bits in a SPI burst.
> >
> > 0x000 A SPI burst contains the 1 LSB in a word.
> > 0x001 A SPI burst contains the 2 LSB in a word.
> > 0x002 A SPI burst contains the 3 LSB in a word.
> > ...
> > 0x01F A SPI burst contains all 32 bits in a word.
> > 0x020 A SPI burst contains the 1 LSB in first word and all 32 bits in second
> word.
> > 0x021 A SPI burst contains the 2 LSB in first word and all 32 bits in second
> word.
> > "
> > When the data length is not 4-byte aligned, the first word always includes some
> > garbage bytes (e.g. when transferring 7 bytes, the first word contains one garbage
> > byte and 3 valid bytes, and the second word contains four valid bytes).
> >
> > > > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > > > +		copy_ptr = (u8 *)dma_data->dma_rx_buf +
> > > BYTES_PER_32BITS_WORD - unaligned;
> > > > +	} else {
> > > > +		copy_ptr = dma_data->dma_rx_buf;
> > >
> > > Why do you use the bounce buffer if the data is aligned correct?
> > >
> > Even when the data is 4-byte aligned, on a little-endian CPU a byte swap must
> > additionally be done depending on the bytes-per-word setting.
> 
> But for 32 bits per word or word delay you do a not needed bounce buffers and
> memcpy()?
> 
Yes, awesome! You are totally right, and I've also considered this.
Doing it the way you suggest would require a separate DMA logic path, from DMA mapping through transfer completion.

I would have to provide one DMA transfer function for bits_per_word = 32 and word delay (no bounce buffer),
and another DMA transfer function for bits_per_word = 8/16 (with a bounce buffer).

That would not bring any further performance improvement, but the driver would become more complex with more code.
I prefer to maintain a single set of simple, common code.

So I didn't do this.

> We still need the bounce buffer for length not multiple of 4, because a direct
> DMA write would overflow the destination buffer, right? And of course bounce
> buffers for LE 8 and 16 bit per word for the byte swapping.
> 

Yes, totally right!

> > To summarize the whole dynamic burst solution for DMA mode:
> > 1. Always read/write the SPI FIFO with DMA buswidth = 4, so the DMA bounce buffer
> > is always 4-byte aligned:
> >   swap bytes/half-words for bits_per_word = 8/16 on a little-endian CPU.
> > 2. The BURST_LENGTH setting is important; it maintains the correct bit count (HW
> > trim: extra bits are neither shifted out of the TXFIFO nor shifted into the RXFIFO):
> >   * TX: although DMA pushes 4-byte-aligned data into the FIFO, the valid data is
> > placed in the valid LSBs of the first word of the bounce buffer, so ECSPI shifts
> > out only the valid data in the bounce buffer.
> >   * RX: although DMA fetches 4-byte-aligned data from the RXFIFO into the bounce
> > buffer, it is trimmed to the valid LSBs according to the actual xfer->len, so
> > rx_buf contains the right data.
> >
> > > > +	}
> > > > +
> > > > +	memcpy(rx_buf, copy_ptr, dma_data->data_len); }
> > > > +
> > > > +static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
> > > > +			   struct dma_data_package *dma_data) {
> > > > +	struct spi_controller *controller = spi_imx->controller;
> > > > +	struct device *tx_dev = controller->dma_tx->device->dev;
> > > > +	struct device *rx_dev = controller->dma_rx->device->dev;
> > > > +
> > > > +	dma_data->dma_tx_addr = dma_map_single(tx_dev,
> > > dma_data->dma_tx_buf,
> > > > +
> > > DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > > +					       DMA_TO_DEVICE);
> > > > +	if (dma_mapping_error(tx_dev, dma_data->dma_tx_addr)) {
> > > > +		dev_err(spi_imx->dev, "DMA TX map failed\n");
> > > > +		return -EINVAL;
> > >
> > > please propagate the error value of dma_mapping_error()
> > >
> >
> > Will do this in V2
> > > > +	}
> > > > +
> > > > +	dma_data->dma_rx_addr = dma_map_single(rx_dev,
> > > dma_data->dma_rx_buf,
> > > > +
> > > DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > > +					       DMA_FROM_DEVICE);
> > > > +	if (dma_mapping_error(rx_dev, dma_data->dma_rx_addr)) {
> > > > +		dev_err(spi_imx->dev, "DMA RX map failed\n");
> > > > +		goto rx_map_err;
> > >
> > > there's only one user of the dma_unmap_single(), so no need for the goto.
> > >
> > This goto is to unmap the previous TX mapping, not this RX one. TX has been mapped
> > and then we start to map RX; if the RX mapping fails, do we really not need to
> > roll back the TX mapping?
> 
> Sorry there was a misunderstanding, I mean you should directly call
> dma_unmap_single() and remove the goto.
> 
> >
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +
> > > > +rx_map_err:
> > > > +	dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
> > > > +			 DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
> > > > +			 DMA_TO_DEVICE);
> > > > +	return -EINVAL;
> > > > +}
> > > > +
> > > > +static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
> > > > +				      struct dma_data_package *dma_data,
> > > > +				      const void *tx_buf,
> > > > +				      bool word_delay)
> > > > +{
> > > > +#ifdef __LITTLE_ENDIAN
> > > > +	unsigned int bytes_per_word =
> > > spi_imx_bytes_per_word(spi_imx->bits_per_word);
> > > > +	u32 *temp;
> > > > +#endif
> > >
> > > move into scope of if()
> > >
> > Will do this in V2.
> > > > +	void *copy_ptr;
> > > > +	int unaligned;
> > > > +
> > > > +	if (word_delay) {
> > > > +		dma_data->dma_len = dma_data->data_len;
> > > > +	} else {
> > > > +		/*
> > > > +		 * As per the reference manual, when burst length = 32*n + m
> > > > +bits,
> > > ECSPI
> > > > +		 * sends m LSB bits in the first word, followed by n full 32-bit
> words.
> > > > +		 * Since actual data may not be 4-byte aligned, allocate DMA
> > > > +TX/RX
> > > buffers
> > > > +		 * to ensure alignment. For TX, DMA pushes 4-byte aligned
> words
> > > > +to
> > > TXFIFO,
> > > > +		 * while ECSPI uses BURST_LENGTH settings to maintain correct
> > > > +bit
> > > count.
> > > > +		 * For RX, DMA receives 32-bit words from RXFIFO; after
> > > > +transfer
> > > completes,
> > > > +		 * trim the DMA RX buffer and copy the actual data to rx_buf.
> > > > +		 */
> > >
> > > Ahh, please add the corresponding description for rx.
> > >
> > Will do this in V2.
> > > > +		dma_data->dma_len = ALIGN(dma_data->data_len,
> > > BYTES_PER_32BITS_WORD);
> > > > +	}
> > > > +
> > > > +	dma_data->dma_tx_buf = kmalloc(dma_data->dma_len,
> GFP_KERNEL |
> > > > +__GFP_ZERO);
> > >
> > > kzalloc()?
> > >
> > Yes. Will do this in V2
> > > > +	if (!dma_data->dma_tx_buf)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	dma_data->dma_rx_buf = kmalloc(dma_data->dma_len,
> GFP_KERNEL |
> > > __GFP_ZERO);
> > > > +	if (!dma_data->dma_rx_buf) {
> > > > +		kfree(dma_data->dma_tx_buf);
> > > > +		return -ENOMEM;
> > > > +	}
> > > > +
> > > > +	if (dma_data->data_len % BYTES_PER_32BITS_WORD
> && !word_delay) {
> > > > +		unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
> > > > +		copy_ptr = (u8 *)dma_data->dma_tx_buf +
> > > BYTES_PER_32BITS_WORD - unaligned;
> > > > +	} else {
> > > > +		copy_ptr = dma_data->dma_tx_buf;
> > > > +	}
> > > > +
> > > > +	memcpy(copy_ptr, tx_buf, dma_data->data_len);
> > > > +
> > > > +	/*
> > > > +	 * When word_delay is enabled, DMA transfers an entire word in
> > > > +one
> > > minor loop.
> > > > +	 * In this case, no data requires additional handling.
> > > > +	 */
> > > > +	if (word_delay)
> > > > +		return 0;
> > > > +
> > > > +#ifdef __LITTLE_ENDIAN
> > > > +	/*
> > > > +	 * On little-endian CPUs, adjust byte order:
> > > > +	 * - Swap bytes when bpw = 8
> > > > +	 * - Swap half-words when bpw = 16
> > > > +	 * This ensures correct data ordering for DMA transfers.
> > > > +	 */
> > > > +	temp = dma_data->dma_tx_buf;
> > > > +	for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len,
> > > BYTES_PER_32BITS_WORD); i++) {
> > > > +		if (bytes_per_word == 1)
> > > > +			swab32s(temp + i);
> > > > +		else if (bytes_per_word == 2)
> > > > +			swahw32s(temp + i);
> > > > +	}
> > > > +#endif
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
> > > > +				    struct spi_transfer *transfer,
> > > > +				    bool word_delay)
> > > > +{
> > > > +	u32 pre_bl, tail_bl;
> > > > +	u32 ctrl;
> > > > +	int ret;
> > > > +
> > > > +	/*
> > > > +	 * ECSPI supports a maximum burst of 512 bytes. When xfer->len
> > > > +exceeds
> > > 512
> > > > +	 * and is not a multiple of 512, a tail transfer is required. In this case,
> > > > +	 * an extra DMA request is issued for the remaining data.
> > >
> > > Why do you need an extra transfer in this case?
> > >
> >
> > BURST_LENGTH is used by the SPI HW to maintain the correct bit count,
> > so it must be updated to match the data length. Once a DMA request has
> > been submitted, SPI can no longer update BURST_LENGTH; when an update
> > is needed, we must split the transfer into two packages, update the
> > register, and then set up the second DMA transfer.
> >
> > ECSPI HW can update BURST_LENGTH automatically, but it always reuses
> > the previous value. When len > 512 and len is not a multiple of 512,
> > we need two packages, the second one for the tail data.
> > For example, len = 512 * 3 + 511: the first transfer uses BURST_LENGTH
> > = 512 bytes (auto-updated 3 times) with DMA transfer len = 512 * 3;
> > the second package uses BURST_LENGTH = 511 bytes with DMA transfer
> > len = 511. (If we used 512 bytes as BURST_LENGTH here, SPI would
> > shift out/in extra bits, which is not acceptable!)
> 
> What happens if you keep the Burst Length at 512 and only transfer 511 bytes
> with the DMA engine?
> 
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26  8:31       ` Marc Kleine-Budde
  2025-11-26  9:51         ` [EXT] " Carlos Song
@ 2025-11-26 11:17         ` Carlos Song
  2025-11-26 12:36           ` Marc Kleine-Budde
  1 sibling, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-26 11:17 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-spi@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 4:31 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linux-spi@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 26.11.2025 07:42:36, Carlos Song wrote:
> 
> [...]
> > For example len is 512 *3 + 511. So first transfer using BURST_LENGTH
> > = 512 bytes(auto update 3 times), DMA transfer len = 512 *3, second
> > package BURST_LENGTH = 511 bytes, DMA transfer len = 511.(If here we
> > use 512 bytes as BURST_LENGTH, SPI will shift out/in extra bits, it
> > previous isn't acceptable!)
> 
> What happens if you keep the Burst Length at 512 and only transfer 511 bytes
> with the DMA engine?
> 

Sorry, I missed one question:
With BURST_LENGTH = 511, ECSPI shifts out 511 bytes on the bus.
With BURST_LENGTH = 512, ECSPI shifts out all bits of the last 32-bit word in
the FIFO, so it shifts out 512 bytes on the bus, including one zero byte (8
bits) of padding.

Carlos

> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/6] spi: imx: enable DMA mode for target operation
  2025-11-25 10:06 ` [PATCH 6/6] spi: imx: enable DMA mode for target operation Carlos Song
  2025-11-25 16:05   ` Frank Li
@ 2025-11-26 12:18   ` Marc Kleine-Budde
  2025-11-26 12:30     ` [EXT] " Carlos Song
  1 sibling, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26 12:18 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, frank.li, hawnguo, s.hauer, kernel, festevam, linux-spi,
	imx, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 987 bytes --]

On 25.11.2025 18:06:18, Carlos Song wrote:
> @@ -1895,10 +1920,16 @@ static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
>  static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
>  				struct spi_transfer *transfer)
>  {
> -	bool word_delay = transfer->word_delay.value != 0;
> +	bool word_delay = transfer->word_delay.value != 0 && !spi_imx->target_mode;
>  	int ret;
>  	int i;
>
> +	if (transfer->len > MX53_MAX_TRANSFER_BYTES && spi_imx->target_mode) {
> +		dev_err(spi_imx->dev, "Transaction too big, max size is %d bytes\n",
> +			MX53_MAX_TRANSFER_BYTES);
> +		return -EMSGSIZE;
> +	}

If there is this limitation, this check should go into
spi_imx_can_dma().

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
                     ` (3 preceding siblings ...)
  2025-11-26  8:27   ` Marc Kleine-Budde
@ 2025-11-26 12:22   ` Marc Kleine-Budde
  2025-11-26 12:29     ` [EXT] " Carlos Song
  4 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26 12:22 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie, frank.li, hawnguo, s.hauer, kernel, festevam, linux-spi,
	imx, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

On 25.11.2025 18:06:17, Carlos Song wrote:
>  static int spi_imx_pio_transfer(struct spi_device *spi,
>  				struct spi_transfer *transfer)
>  {
> @@ -1780,9 +2112,14 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
>  	 * transfer, the SPI transfer has already been mapped, so we
>  	 * have to do the DMA transfer here.
>  	 */
> -	if (spi_imx->usedma)
> -		return spi_imx_dma_transfer(spi_imx, transfer);
> -
> +	if (spi_imx->usedma) {
> +		ret = spi_imx_dma_transfer(spi_imx, transfer);
> +		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> +			spi_imx->usedma = false;
> +			return spi_imx_pio_transfer(spi, transfer);
> +		}
> +		return ret;
> +	}

Why do you do this? AFAICS the framework already does this for you see:
spi_transfer_one_message().

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26 12:22   ` Marc Kleine-Budde
@ 2025-11-26 12:29     ` Carlos Song
  2025-11-26 12:52       ` Marc Kleine-Budde
  0 siblings, 1 reply; 34+ messages in thread
From: Carlos Song @ 2025-11-26 12:29 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 8:23 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-spi@vger.kernel.org; imx@lists.linux.dev;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 25.11.2025 18:06:17, Carlos Song wrote:
> >  static int spi_imx_pio_transfer(struct spi_device *spi,
> >  				struct spi_transfer *transfer)
> >  {
> > @@ -1780,9 +2112,14 @@ static int spi_imx_transfer_one(struct
> spi_controller *controller,
> >  	 * transfer, the SPI transfer has already been mapped, so we
> >  	 * have to do the DMA transfer here.
> >  	 */
> > -	if (spi_imx->usedma)
> > -		return spi_imx_dma_transfer(spi_imx, transfer);
> > -
> > +	if (spi_imx->usedma) {
> > +		ret = spi_imx_dma_transfer(spi_imx, transfer);
> > +		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> > +			spi_imx->usedma = false;
> > +			return spi_imx_pio_transfer(spi, transfer);
> > +		}
> > +		return ret;
> > +	}
> 
> Why do you do this? AFAICS the framework already does this for you see:
> spi_transfer_one_message().
> 
Hi,

In the framework:
				if ((xfer->tx_sg_mapped || xfer->rx_sg_mapped) &&
				    (xfer->error & SPI_TRANS_FAIL_NO_START)) {
					__spi_unmap_msg(ctlr, msg);
					ctlr->fallback = true;
					xfer->error &= ~SPI_TRANS_FAIL_NO_START;
					goto fallback_pio;
				}

It only does this for the case where the framework performed the DMA mapping. But for the DMA dynamic burst feature, the mapping is done in the driver,
so I think this condition is never met: (xfer->tx_sg_mapped || xfer->rx_sg_mapped).
Carlos

> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 6/6] spi: imx: enable DMA mode for target operation
  2025-11-26 12:18   ` Marc Kleine-Budde
@ 2025-11-26 12:30     ` Carlos Song
  0 siblings, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-26 12:30 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 8:18 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-spi@vger.kernel.org; imx@lists.linux.dev;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: [EXT] Re: [PATCH 6/6] spi: imx: enable DMA mode for target operation
> 
> On 25.11.2025 18:06:18, Carlos Song wrote:
> > @@ -1895,10 +1920,16 @@ static int spi_imx_dma_package_transfer(struct
> > spi_imx_data *spi_imx,  static int spi_imx_dma_transfer(struct spi_imx_data
> *spi_imx,
> >  				struct spi_transfer *transfer)
> >  {
> > -	bool word_delay = transfer->word_delay.value != 0;
> > +	bool word_delay = transfer->word_delay.value != 0 &&
> > +!spi_imx->target_mode;
> >  	int ret;
> >  	int i;
> >
> > +	if (transfer->len > MX53_MAX_TRANSFER_BYTES &&
> spi_imx->target_mode) {
> > +		dev_err(spi_imx->dev, "Transaction too big, max size is %d bytes\n",
> > +			MX53_MAX_TRANSFER_BYTES);
> > +		return -EMSGSIZE;
> > +	}
> 
> If there is this limitation, this check should go into spi_imx_can_dma().
> 

Yes. I can do this in V2.

> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26 11:17         ` Carlos Song
@ 2025-11-26 12:36           ` Marc Kleine-Budde
  2025-11-27  2:36             ` [EXT] " Carlos Song
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26 12:36 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-spi@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

On 26.11.2025 11:17:41, Carlos Song wrote:
>
>
> > -----Original Message-----
> > From: Marc Kleine-Budde <mkl@pengutronix.de>
> > Sent: Wednesday, November 26, 2025 4:31 PM
> > To: Carlos Song <carlos.song@nxp.com>
> > Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> > s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> > imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> > linux-arm-kernel@lists.infradead.org; linux-spi@vger.kernel.org
> > Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> > DMA mode
> >
> > On 26.11.2025 07:42:36, Carlos Song wrote:
> >
> > [...]
> > > For example len is 512 *3 + 511. So first transfer using BURST_LENGTH
> > > = 512 bytes(auto update 3 times), DMA transfer len = 512 *3, second
> > > package BURST_LENGTH = 511 bytes, DMA transfer len = 511.(If here we
> > > use 512 bytes as BURST_LENGTH, SPI will shift out/in extra bits, it
> > > previous isn't acceptable!)
> >
> > What happens if you keep the Burst Length at 512 and only transfer 511 bytes
> > with the DMA engine?
> >
>
> Sorry, I missed one question:
> With BURST_LENGTH = 511, ECSPI shifts out 511 bytes on the bus.
> With BURST_LENGTH = 512, ECSPI shifts out all bits of the last 32-bit word
> in the FIFO, so it shifts out 512 bytes on the bus, including one zero
> byte of padding.

Why was the tail transfer not needed before your patch?

Is because you configure buswidth to DMA_SLAVE_BUSWIDTH_4_BYTES
unconditionally on !word_delay mode. What happens if you always use
DMA_SLAVE_BUSWIDTH_1_BYTES?

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26 12:29     ` [EXT] " Carlos Song
@ 2025-11-26 12:52       ` Marc Kleine-Budde
  2025-11-27  2:58         ` Carlos Song
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Kleine-Budde @ 2025-11-26 12:52 UTC (permalink / raw)
  To: Carlos Song
  Cc: broonie@kernel.org, Frank Li, s.hauer@pengutronix.de,
	kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2124 bytes --]

On 26.11.2025 12:29:35, Carlos Song wrote:
> > On 25.11.2025 18:06:17, Carlos Song wrote:
> > >  static int spi_imx_pio_transfer(struct spi_device *spi,
> > >  				struct spi_transfer *transfer)
> > >  {
> > > @@ -1780,9 +2112,14 @@ static int spi_imx_transfer_one(struct
> > spi_controller *controller,
> > >  	 * transfer, the SPI transfer has already been mapped, so we
> > >  	 * have to do the DMA transfer here.
> > >  	 */
> > > -	if (spi_imx->usedma)
> > > -		return spi_imx_dma_transfer(spi_imx, transfer);
> > > -
> > > +	if (spi_imx->usedma) {
> > > +		ret = spi_imx_dma_transfer(spi_imx, transfer);
> > > +		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> > > +			spi_imx->usedma = false;
> > > +			return spi_imx_pio_transfer(spi, transfer);
> > > +		}
> > > +		return ret;
> > > +	}
> >
> > Why do you do this? AFAICS the framework already does this for you see:
> > spi_transfer_one_message().
> >
> Hi,
>
> In frame work:
> 				if ((xfer->tx_sg_mapped || xfer->rx_sg_mapped) &&
> 				    (xfer->error & SPI_TRANS_FAIL_NO_START)) {
> 					__spi_unmap_msg(ctlr, msg);
> 					ctlr->fallback = true;
> 					xfer->error &= ~SPI_TRANS_FAIL_NO_START;
> 					goto fallback_pio;
> 				}
>
> It only will help do this for "framework finished DMA map case". But DMA dynamic burst feature is mapped in driver.
> So this condition never meet: (xfer->tx_sg_mapped || xfer->rx_sg_mapped) I think..

...and you remove "controller->can_dma" this deserves some comments in
the commit message. That you fool the framework, that you cannot do DMA,
and do all the mapping on your own.

If you can make it work with DMA_SLAVE_BUSWIDTH_1_BYTES then you can use
controller->prepare_message and controller->unprepare_message to do the
byte swapping. These functions are called before and after the DMA sync.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26 12:36           ` Marc Kleine-Budde
@ 2025-11-27  2:36             ` Carlos Song
  0 siblings, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-27  2:36 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, hawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-spi@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 8:36 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>; hawnguo@kernel.org;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linux-spi@vger.kernel.org
> Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI
> DMA mode
> 
> On 26.11.2025 11:17:41, Carlos Song wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marc Kleine-Budde <mkl@pengutronix.de>
> > > Sent: Wednesday, November 26, 2025 4:31 PM
> > > To: Carlos Song <carlos.song@nxp.com>
> > > Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>;
> > > hawnguo@kernel.org; s.hauer@pengutronix.de; kernel@pengutronix.de;
> > > festevam@gmail.com; imx@lists.linux.dev;
> > > linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> > > linux-spi@vger.kernel.org
> > > Subject: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst
> > > length for ECSPI DMA mode
> > >
> > > On 26.11.2025 07:42:36, Carlos Song wrote:
> > >
> > > [...]
> > > > For example len is 512 *3 + 511. So first transfer using
> > > > BURST_LENGTH = 512 bytes(auto update 3 times), DMA transfer len =
> > > > 512 *3, second package BURST_LENGTH = 511 bytes, DMA transfer len
> > > > = 511.(If here we use 512 bytes as BURST_LENGTH, SPI will shift
> > > > out/in extra bits, it previous isn't acceptable!)
> > >
> > > What happens if you keep the Burst Length at 512 and only transfer
> > > 511 bytes with the DMA engine?
> > >
> >
> > Sorry for missing one question:
> > BURST_LENGTH = 511, ECSPI will shift out 511 bytes in bus.
> > BURST_LENGTH = 512, ECSPI will shift out the last one word all bit in 32 bits
> FIFO.
> > So ECSPI will shift out 512 bytes include 8-bits zero bytes in bus.
> 
> Why was the tail transfer not needed before your patch?
> 
> Is because you configure buswidth to DMA_SLAVE_BUSWIDTH_4_BYTES
> unconditionally on !word_delay mode. What happens if you always use
> DMA_SLAVE_BUSWIDTH_1_BYTES?
> 

Yes, DMA_SLAVE_BUSWIDTH_4_BYTES causes this. I cannot use DMA_SLAVE_BUSWIDTH_1_BYTES.
When BURST_LENGTH is set >= 4 bytes, SPI always handles whole words (always 32 bits), as the RM shows.
"
Burst Length. This field defines the length of a SPI burst to be transferred. The Chip Select (SS) will
remain asserted until all bits in a SPI burst are shifted out. A maximum of 2^12 bits can be transferred in a
single SPI burst.
In master mode, it controls the number of bits per SPI burst. Since the shift register always loads 32-bit
data from transmit FIFO, only the n least-significant (n = BURST LENGTH + 1) will be shifted out. The
remaining bits will be ignored.
Number of Valid Bits in a SPI burst.
0x000 A SPI burst contains the 1 LSB in a word.
0x001 A SPI burst contains the 2 LSB in a word.
0x002 A SPI burst contains the 3 LSB in a word.
...
0x01F A SPI burst contains all 32 bits in a word.
0x020 A SPI burst contains the 1 LSB in first word and all 32 bits in second word.
0x021 A SPI burst contains the 2 LSB in first word and all 32 bits in second word.
...
0xFFE A SPI burst contains the 31 LSB in first word and 2^7 -1 words.
0xFFF A SPI burst contains 2^7 words.
"
For example, for TX with xfer->len = 4, BURST_LENGTH = 4 bytes and DMA_SLAVE_BUSWIDTH_1_BYTES, DMA fills only one byte into each 32-bit FIFO word, but SPI shifts out each whole 32-bit word (3 zero bytes + 1 data byte), four times in total. That is actually 16 bytes, not 4 bytes, which is wrong.

You can check the previous code: it uses BURST_LENGTH to maintain the real bit count.
DMA_SLAVE_BUSWIDTH_1_BYTES: BURST_LENGTH = 1 byte
DMA_SLAVE_BUSWIDTH_2_BYTES: BURST_LENGTH = 2 bytes
DMA_SLAVE_BUSWIDTH_4_BYTES: BURST_LENGTH = 4 bytes

Carlos

> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
  2025-11-26 12:52       ` Marc Kleine-Budde
@ 2025-11-27  2:58         ` Carlos Song
  0 siblings, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-11-27  2:58 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: broonie@kernel.org, Frank Li, s.hauer@pengutronix.de,
	kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: Wednesday, November 26, 2025 8:52 PM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; Frank Li <frank.li@nxp.com>;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-spi@vger.kernel.org; imx@lists.linux.dev;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: RE: [EXT] Re: [PATCH 5/6] spi: imx: support dynamic burst length
> for ECSPI DMA mode
> 
> On 26.11.2025 12:29:35, Carlos Song wrote:
> > > On 25.11.2025 18:06:17, Carlos Song wrote:
> > > >  static int spi_imx_pio_transfer(struct spi_device *spi,
> > > >  				struct spi_transfer *transfer)  { @@ -1780,9
> +2112,14 @@
> > > > static int spi_imx_transfer_one(struct
> > > spi_controller *controller,
> > > >  	 * transfer, the SPI transfer has already been mapped, so we
> > > >  	 * have to do the DMA transfer here.
> > > >  	 */
> > > > -	if (spi_imx->usedma)
> > > > -		return spi_imx_dma_transfer(spi_imx, transfer);
> > > > -
> > > > +	if (spi_imx->usedma) {
> > > > +		ret = spi_imx_dma_transfer(spi_imx, transfer);
> > > > +		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> > > > +			spi_imx->usedma = false;
> > > > +			return spi_imx_pio_transfer(spi, transfer);
> > > > +		}
> > > > +		return ret;
> > > > +	}
> > >
> > > Why do you do this? AFAICS the framework already does this for you see:
> > > spi_transfer_one_message().
> > >
> > Hi,
> >
> > In frame work:
> > 				if ((xfer->tx_sg_mapped || xfer->rx_sg_mapped) &&
> > 				    (xfer->error & SPI_TRANS_FAIL_NO_START)) {
> > 					__spi_unmap_msg(ctlr, msg);
> > 					ctlr->fallback = true;
> > 					xfer->error &= ~SPI_TRANS_FAIL_NO_START;
> > 					goto fallback_pio;
> > 				}
> >
> > It only will help do this for "framework finished DMA map case". But DMA
> dynamic burst feature is mapped in driver.
> > So this condition never meet: (xfer->tx_sg_mapped || xfer->rx_sg_mapped) I
> think..
> 
> ...and you remove "controller->can_dma" this deserves some comments in the
> commit message. That you fool the framework, that you cannot do DMA, and
> do all the mapping on your own.
> 
Hi,

Yes, I will add this to the V2 commit log.

Hahaha, I don't think the framework forces DMA handling. My understanding is
that controller->can_dma (just like a flag) selects which path to take; if I
have a special DMA configuration, I don't need to implement it there and can
do DMA on my own. I never intended to fool the framework, I just do what I
should do.

> If you can make it work with DMA_SLAVE_BUSWIDTH_1_BYTES then you can
> use
> controller->prepare_message and controller->unprepare_message to do the
> byte swapping. These functions are called before and after the DMA sync.
> 

As I said, DMA_SLAVE_BUSWIDTH_1_BYTES does not match the ECSPI HW when dynamic burst is enabled.
Because SPI on i.MX uses SDMA (an external general-purpose DMA engine, not a dedicated SPI DMA),
these configurations were necessary to improve speed while maintaining compatibility with two different hardware components.

Carlos
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH 6/6] spi: imx: enable DMA mode for target operation
  2025-11-25 16:05   ` Frank Li
  2025-11-26  2:11     ` Carlos Song
@ 2025-12-02  7:00     ` Carlos Song
  1 sibling, 0 replies; 34+ messages in thread
From: Carlos Song @ 2025-12-02  7:00 UTC (permalink / raw)
  To: Frank Li
  Cc: broonie@kernel.org, hawnguo@kernel.org, s.hauer@pengutronix.de,
	kernel@pengutronix.de, festevam@gmail.com,
	linux-spi@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Frank Li <frank.li@nxp.com>
> Sent: Wednesday, November 26, 2025 12:06 AM
> To: Carlos Song <carlos.song@nxp.com>
> Cc: broonie@kernel.org; hawnguo@kernel.org; s.hauer@pengutronix.de;
> kernel@pengutronix.de; festevam@gmail.com; linux-spi@vger.kernel.org;
> imx@lists.linux.dev; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 6/6] spi: imx: enable DMA mode for target operation
> 
> On Tue, Nov 25, 2025 at 06:06:18PM +0800, Carlos Song wrote:
> > Enable DMA mode for SPI IMX in target mode.
> > Disable the word delay feature for target mode, because target mode
> > should always keep high performance to make sure it can follow the
> > master. Target mode continues to operate in dynamic burst mode.
> 
> If two paragraph, need extra empty line. If one parapraph, move to previous
> line.
> 
> >
> > Signed-off-by: Carlos Song <carlos.song@nxp.com>
> > ---
> >  drivers/spi/spi-imx.c | 78
> > +++++++++++++++++++++++++++++++------------
> >  1 file changed, 56 insertions(+), 22 deletions(-)
> >
> ...
> >
> > @@ -1756,23 +1753,51 @@ static int spi_imx_dma_submit(struct
> > spi_imx_data *spi_imx,
> >
> >  	transfer_timeout = spi_imx_calculate_timeout(spi_imx,
> > transfer->len);
> >
> > -	/* Wait SDMA to finish the data transfer.*/
> > -	time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> > -						transfer_timeout);
> > -	if (!time_left) {
> > -		dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> > -		dmaengine_terminate_all(controller->dma_tx);
> > -		dmaengine_terminate_all(controller->dma_rx);
> > -		return -ETIMEDOUT;
> > -	}
> > +	if (!spi_imx->target_mode) {
> > +		/* Wait SDMA to finish the data transfer.*/
> > +		time_left =
> wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> > +							transfer_timeout);
> > +		if (!time_left) {
> > +			dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> > +			dmaengine_terminate_all(controller->dma_tx);
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -ETIMEDOUT;
> > +		}
> >
> > -	time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> > -						transfer_timeout);
> > -	if (!time_left) {
> > -		dev_err(&controller->dev, "I/O Error in DMA RX\n");
> > -		spi_imx->devtype_data->reset(spi_imx);
> > -		dmaengine_terminate_all(controller->dma_rx);
> > -		return -ETIMEDOUT;
> > +		time_left =
> wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> > +							transfer_timeout);
> > +		if (!time_left) {
> > +			dev_err(&controller->dev, "I/O Error in DMA RX\n");
> > +			spi_imx->devtype_data->reset(spi_imx);
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -ETIMEDOUT;
> > +		}
> > +	} else {
> > +		spi_imx->target_aborted = false;
> > +
> > +		if (wait_for_completion_interruptible(&spi_imx->dma_tx_completion)
> ||
> > +		    spi_imx->target_aborted) {
> 
Hi Frank,

Sorry for missing these questions:

> Suppose somewhere set target_aborted to true. I think here should use
> READ_ONCE() to make sure read from memory.
> 

Will do this in V2.

> Not sure why here use wait_for_completion_interruptible() but at master mode
> use wait_for_completion_timeout().
>

Because the target should support Ctrl+C to stop the transfer (so the wait must be interruptible), and the target keeps waiting until the master's clock arrives to shift data out/in (so no timeout is needed).
That is why target mode uses wait_for_completion_interruptible().

> > +			dev_dbg(spi_imx->dev, "I/O Error in DMA TX interrupted\n");
> > +			dmaengine_terminate_all(controller->dma_tx);
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -EINTR;
> > +		}
> > +
> > +		if (wait_for_completion_interruptible(&spi_imx->dma_rx_completion)
> ||
> > +		    spi_imx->target_aborted) {
> > +			dev_dbg(spi_imx->dev, "I/O Error in DMA RX interrupted\n");
> > +			dmaengine_terminate_all(controller->dma_rx);
> > +			return -EINTR;
> > +		}
> > +
> > +		/*
> > +		 * ECSPI has a HW issue when works in Target mode, after 64 words
> > +		 * writtern to TXFIFO, even TXFIFO becomes empty, ECSPI_TXDATA
> keeps
> > +		 * shift out the last word data, so we have to disable ECSPI when in
> > +		 * target mode after the transfer completes.
> > +		 */
> > +		if (spi_imx->devtype_data->disable)
> > +			spi_imx->devtype_data->disable(spi_imx);
> >  	}
> >
> >  	return 0;
> > @@ -1895,10 +1920,16 @@ static int spi_imx_dma_package_transfer(struct
> > spi_imx_data *spi_imx,  static int spi_imx_dma_transfer(struct spi_imx_data
> *spi_imx,
> >  				struct spi_transfer *transfer)
> >  {
> > -	bool word_delay = transfer->word_delay.value != 0;
> > +	bool word_delay = transfer->word_delay.value != 0 &&
> > +!spi_imx->target_mode;
> >  	int ret;
> >  	int i;
> >
> > +	if (transfer->len > MX53_MAX_TRANSFER_BYTES &&
> spi_imx->target_mode)
> > +{
> 
> why does only target mode have a len limitation?
> 
> Frank

This comes from an erratum:
Due to Freescale erratum ERR003775, "eCSPI: Burst completion by Chip Select (SS) signal in Slave mode is not functional", the burst size must be set exactly to the size of the transfer.
This limits an SPI transaction to a maximum of 2^12 bits.

So I added a limit for this. I will add a comment in V2.

Carlos
> > +		dev_err(spi_imx->dev, "Transaction too big, max size is %d bytes\n",
> > +			MX53_MAX_TRANSFER_BYTES);
> > +		return -EMSGSIZE;
> > +	}
> > +
> >  	ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
> >  	if (ret < 0) {
> >  		transfer->error |= SPI_TRANS_FAIL_NO_START; @@ -2104,7 +2135,7
> @@
> > static int spi_imx_transfer_one(struct spi_controller *controller,
> >  	while (spi_imx->devtype_data->rx_available(spi_imx))
> >  		readl(spi_imx->base + MXC_CSPIRXDATA);
> >
> > -	if (spi_imx->target_mode)
> > +	if (spi_imx->target_mode && !spi_imx->usedma)
> >  		return spi_imx_pio_transfer_target(spi, transfer);
> >
> >  	/*
> > @@ -2116,7 +2147,10 @@ static int spi_imx_transfer_one(struct
> spi_controller *controller,
> >  		ret = spi_imx_dma_transfer(spi_imx, transfer);
> >  		if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> >  			spi_imx->usedma = false;
> > -			return spi_imx_pio_transfer(spi, transfer);
> > +			if (spi_imx->target_mode)
> > +				return spi_imx_pio_transfer_target(spi, transfer);
> > +			else
> > +				return spi_imx_pio_transfer(spi, transfer);
> >  		}
> >  		return ret;
> >  	}
> > --
> > 2.34.1
> >


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2025-12-02  7:01 UTC | newest]

Thread overview: 34+ messages
2025-11-25 10:06 [PATCH 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
2025-11-25 10:06 ` [PATCH 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
2025-11-25 15:32   ` Frank Li
2025-11-25 10:06 ` [PATCH 2/6] spi: imx: introduce helper to clear DMA mode logic Carlos Song
2025-11-25 15:41   ` Frank Li
2025-11-25 10:06 ` [PATCH 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure Carlos Song
2025-11-25 15:42   ` Frank Li
2025-11-25 10:06 ` [PATCH 4/6] spi: imx: handle DMA submission errors with dma_submit_error() Carlos Song
2025-11-25 15:45   ` Frank Li
2025-11-25 10:06 ` [PATCH 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
2025-11-25 12:10   ` Marc Kleine-Budde
2025-11-26  7:42     ` Carlos Song
2025-11-26  8:31       ` Marc Kleine-Budde
2025-11-26  9:51         ` [EXT] " Carlos Song
2025-11-26 11:17         ` Carlos Song
2025-11-26 12:36           ` Marc Kleine-Budde
2025-11-27  2:36             ` [EXT] " Carlos Song
2025-11-26  8:11   ` Marc Kleine-Budde
2025-11-26  8:18     ` [EXT] " Carlos Song
2025-11-26  8:20   ` Marc Kleine-Budde
2025-11-26  8:34     ` [EXT] " Carlos Song
2025-11-26  8:44       ` Marc Kleine-Budde
2025-11-26  8:27   ` Marc Kleine-Budde
2025-11-26  8:43     ` [EXT] " Carlos Song
2025-11-26 12:22   ` Marc Kleine-Budde
2025-11-26 12:29     ` [EXT] " Carlos Song
2025-11-26 12:52       ` Marc Kleine-Budde
2025-11-27  2:58         ` Carlos Song
2025-11-25 10:06 ` [PATCH 6/6] spi: imx: enable DMA mode for target operation Carlos Song
2025-11-25 16:05   ` Frank Li
2025-11-26  2:11     ` Carlos Song
2025-12-02  7:00     ` Carlos Song
2025-11-26 12:18   ` Marc Kleine-Budde
2025-11-26 12:30     ` [EXT] " Carlos Song
