* [PATCH v2 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer()
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
@ 2025-12-02 7:54 ` Carlos Song
2025-12-02 7:54 ` [PATCH v2 2/6] spi: imx: introduce helper to clear DMA mode logic Carlos Song
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Carlos Song @ 2025-12-02 7:54 UTC (permalink / raw)
To: frank.li, mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening,
Carlos Song, Frank Li
Relocate spi_imx_dma_configure() next to spi_imx_dma_transfer() so that
all DMA-related functions are grouped together for better readability.
No functional changes.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
drivers/spi/spi-imx.c | 88 +++++++++++++++++++++----------------------
1 file changed, 44 insertions(+), 44 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index b8b79bb7fec3..e78e02a84b50 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1282,50 +1282,6 @@ static irqreturn_t spi_imx_isr(int irq, void *dev_id)
return IRQ_HANDLED;
}
-static int spi_imx_dma_configure(struct spi_controller *controller)
-{
- int ret;
- enum dma_slave_buswidth buswidth;
- struct dma_slave_config rx = {}, tx = {};
- struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
-
- switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
- case 4:
- buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
- break;
- case 2:
- buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
- break;
- case 1:
- buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
- break;
- default:
- return -EINVAL;
- }
-
- tx.direction = DMA_MEM_TO_DEV;
- tx.dst_addr = spi_imx->base_phys + MXC_CSPITXDATA;
- tx.dst_addr_width = buswidth;
- tx.dst_maxburst = spi_imx->wml;
- ret = dmaengine_slave_config(controller->dma_tx, &tx);
- if (ret) {
- dev_err(spi_imx->dev, "TX dma configuration failed with %d\n", ret);
- return ret;
- }
-
- rx.direction = DMA_DEV_TO_MEM;
- rx.src_addr = spi_imx->base_phys + MXC_CSPIRXDATA;
- rx.src_addr_width = buswidth;
- rx.src_maxburst = spi_imx->wml;
- ret = dmaengine_slave_config(controller->dma_rx, &rx);
- if (ret) {
- dev_err(spi_imx->dev, "RX dma configuration failed with %d\n", ret);
- return ret;
- }
-
- return 0;
-}
-
static int spi_imx_setupxfer(struct spi_device *spi,
struct spi_transfer *t)
{
@@ -1481,6 +1437,50 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
return secs_to_jiffies(2 * timeout);
}
+static int spi_imx_dma_configure(struct spi_controller *controller)
+{
+ int ret;
+ enum dma_slave_buswidth buswidth;
+ struct dma_slave_config rx = {}, tx = {};
+ struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
+
+ switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
+ case 4:
+ buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ break;
+ case 2:
+ buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
+ break;
+ case 1:
+ buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ tx.direction = DMA_MEM_TO_DEV;
+ tx.dst_addr = spi_imx->base_phys + MXC_CSPITXDATA;
+ tx.dst_addr_width = buswidth;
+ tx.dst_maxburst = spi_imx->wml;
+ ret = dmaengine_slave_config(controller->dma_tx, &tx);
+ if (ret) {
+ dev_err(spi_imx->dev, "TX dma configuration failed with %d\n", ret);
+ return ret;
+ }
+
+ rx.direction = DMA_DEV_TO_MEM;
+ rx.src_addr = spi_imx->base_phys + MXC_CSPIRXDATA;
+ rx.src_addr_width = buswidth;
+ rx.src_maxburst = spi_imx->wml;
+ ret = dmaengine_slave_config(controller->dma_rx, &rx);
+ if (ret) {
+ dev_err(spi_imx->dev, "RX dma configuration failed with %d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
struct spi_transfer *transfer)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 2/6] spi: imx: introduce helper to clear DMA mode logic
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
2025-12-02 7:54 ` [PATCH v2 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
@ 2025-12-02 7:54 ` Carlos Song
2025-12-02 7:55 ` [PATCH v2 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure Carlos Song
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Carlos Song @ 2025-12-02 7:54 UTC (permalink / raw)
To: frank.li, mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening,
Carlos Song, Frank Li
Add helper functions to clean up the DMA mode logic: one finds the maximum
watermark level and one submits the DMA request. This refactoring makes the
code more concise and improves readability.
No functional changes.
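For illustration, the watermark selection that the new helper performs can be
sketched as follows. This is an editorial sketch of the loop that ends up in
spi_imx_dma_max_wml_find(); the helper name pick_wml, the FIFO depth and the
example values are assumptions, not part of the patch.

static unsigned int pick_wml(unsigned int dma_len, unsigned int bytes_per_word,
			     unsigned int fifo_size)
{
	unsigned int i;

	/* Pick the largest burst, starting at half the FIFO depth, that
	 * divides the DMA length evenly so no tail words are left over.
	 */
	for (i = fifo_size / 2; i > 0; i--)
		if (!(dma_len % (i * bytes_per_word)))
			break;

	return i ? i : 1;	/* fall back to 1 when nothing divides evenly */
}

For example, pick_wml(4096, 1, 64) returns 32, while pick_wml(7, 4, 64) falls
back to 1.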
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
drivers/spi/spi-imx.c | 164 +++++++++++++++++++++++-------------------
1 file changed, 92 insertions(+), 72 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index e78e02a84b50..012f5bcbf73f 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1437,6 +1437,94 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
return secs_to_jiffies(2 * timeout);
}
+static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
+ struct spi_transfer *transfer)
+{
+ struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
+ struct spi_controller *controller = spi_imx->controller;
+ struct dma_async_tx_descriptor *desc_tx, *desc_rx;
+ unsigned long transfer_timeout;
+ unsigned long time_left;
+
+ /*
+ * The TX DMA setup starts the transfer, so make sure RX is configured
+ * before TX.
+ */
+ desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
+ rx->sgl, rx->nents, DMA_DEV_TO_MEM,
+ DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ if (!desc_rx) {
+ transfer->error |= SPI_TRANS_FAIL_NO_START;
+ return -EINVAL;
+ }
+
+ desc_rx->callback = spi_imx_dma_rx_callback;
+ desc_rx->callback_param = (void *)spi_imx;
+ dmaengine_submit(desc_rx);
+ reinit_completion(&spi_imx->dma_rx_completion);
+ dma_async_issue_pending(controller->dma_rx);
+
+ desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
+ tx->sgl, tx->nents, DMA_MEM_TO_DEV,
+ DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ if (!desc_tx) {
+ dmaengine_terminate_all(controller->dma_tx);
+ dmaengine_terminate_all(controller->dma_rx);
+ return -EINVAL;
+ }
+
+ desc_tx->callback = spi_imx_dma_tx_callback;
+ desc_tx->callback_param = (void *)spi_imx;
+ dmaengine_submit(desc_tx);
+ reinit_completion(&spi_imx->dma_tx_completion);
+ dma_async_issue_pending(controller->dma_tx);
+
+ spi_imx->devtype_data->trigger(spi_imx);
+
+ transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
+
+ /* Wait SDMA to finish the data transfer.*/
+ time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
+ transfer_timeout);
+ if (!time_left) {
+ dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
+ dmaengine_terminate_all(controller->dma_tx);
+ dmaengine_terminate_all(controller->dma_rx);
+ return -ETIMEDOUT;
+ }
+
+ time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
+ transfer_timeout);
+ if (!time_left) {
+ dev_err(&controller->dev, "I/O Error in DMA RX\n");
+ spi_imx->devtype_data->reset(spi_imx);
+ dmaengine_terminate_all(controller->dma_rx);
+ return -ETIMEDOUT;
+ }
+
+ return 0;
+}
+
+static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
+ struct spi_transfer *transfer)
+{
+ struct sg_table *rx = &transfer->rx_sg;
+ struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
+ unsigned int bytes_per_word, i;
+
+ /* Get the right burst length from the last sg to ensure no tail data */
+ bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
+ for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
+ if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
+ break;
+ }
+ /* Use 1 as wml in case no available burst length got */
+ if (i == 0)
+ i = 1;
+
+ spi_imx->wml = i;
+}
+
static int spi_imx_dma_configure(struct spi_controller *controller)
{
int ret;
@@ -1484,26 +1572,10 @@ static int spi_imx_dma_configure(struct spi_controller *controller)
static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
struct spi_transfer *transfer)
{
- struct dma_async_tx_descriptor *desc_tx, *desc_rx;
- unsigned long transfer_timeout;
- unsigned long time_left;
struct spi_controller *controller = spi_imx->controller;
- struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
- struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
- unsigned int bytes_per_word, i;
int ret;
- /* Get the right burst length from the last sg to ensure no tail data */
- bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
- for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
- if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
- break;
- }
- /* Use 1 as wml in case no available burst length got */
- if (i == 0)
- i = 1;
-
- spi_imx->wml = i;
+ spi_imx_dma_max_wml_find(spi_imx, transfer);
ret = spi_imx_dma_configure(controller);
if (ret)
@@ -1516,61 +1588,9 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
}
spi_imx->devtype_data->setup_wml(spi_imx);
- /*
- * The TX DMA setup starts the transfer, so make sure RX is configured
- * before TX.
- */
- desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
- rx->sgl, rx->nents, DMA_DEV_TO_MEM,
- DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
- if (!desc_rx) {
- ret = -EINVAL;
- goto dma_failure_no_start;
- }
-
- desc_rx->callback = spi_imx_dma_rx_callback;
- desc_rx->callback_param = (void *)spi_imx;
- dmaengine_submit(desc_rx);
- reinit_completion(&spi_imx->dma_rx_completion);
- dma_async_issue_pending(controller->dma_rx);
-
- desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
- tx->sgl, tx->nents, DMA_MEM_TO_DEV,
- DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
- if (!desc_tx) {
- dmaengine_terminate_all(controller->dma_tx);
- dmaengine_terminate_all(controller->dma_rx);
- return -EINVAL;
- }
-
- desc_tx->callback = spi_imx_dma_tx_callback;
- desc_tx->callback_param = (void *)spi_imx;
- dmaengine_submit(desc_tx);
- reinit_completion(&spi_imx->dma_tx_completion);
- dma_async_issue_pending(controller->dma_tx);
-
- spi_imx->devtype_data->trigger(spi_imx);
-
- transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
-
- /* Wait SDMA to finish the data transfer.*/
- time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
- transfer_timeout);
- if (!time_left) {
- dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
- dmaengine_terminate_all(controller->dma_tx);
- dmaengine_terminate_all(controller->dma_rx);
- return -ETIMEDOUT;
- }
-
- time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
- transfer_timeout);
- if (!time_left) {
- dev_err(&controller->dev, "I/O Error in DMA RX\n");
- spi_imx->devtype_data->reset(spi_imx);
- dmaengine_terminate_all(controller->dma_rx);
- return -ETIMEDOUT;
- }
+ ret = spi_imx_dma_submit(spi_imx, transfer);
+ if (ret)
+ return ret;
return 0;
/* fallback to pio */
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
2025-12-02 7:54 ` [PATCH v2 1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer() Carlos Song
2025-12-02 7:54 ` [PATCH v2 2/6] spi: imx: introduce helper to clear DMA mode logic Carlos Song
@ 2025-12-02 7:55 ` Carlos Song
2025-12-02 7:55 ` [PATCH v2 4/6] spi: imx: handle DMA submission errors with dma_submit_error() Carlos Song
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Carlos Song @ 2025-12-02 7:55 UTC (permalink / raw)
To: frank.li, mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening,
Carlos Song, Frank Li
If dmaengine_prep_slave_sg() fails, no descriptor is submitted to the TX
channel and DMA is never started. Therefore, calling
dmaengine_terminate_all() for the TX DMA channel is unnecessary.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
drivers/spi/spi-imx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 012f5bcbf73f..186963d3d2e0 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1468,7 +1468,6 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
tx->sgl, tx->nents, DMA_MEM_TO_DEV,
DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!desc_tx) {
- dmaengine_terminate_all(controller->dma_tx);
dmaengine_terminate_all(controller->dma_rx);
return -EINVAL;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 4/6] spi: imx: handle DMA submission errors with dma_submit_error()
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
` (2 preceding siblings ...)
2025-12-02 7:55 ` [PATCH v2 3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure Carlos Song
@ 2025-12-02 7:55 ` Carlos Song
2025-12-02 7:55 ` [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Carlos Song @ 2025-12-02 7:55 UTC (permalink / raw)
To: frank.li, mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening,
Carlos Song, Frank Li
Add error handling for DMA request submission by checking the returned
cookie with dma_submit_error(). This prevents a failed submission from
silently propagating through the rest of the DMA transfer, which could
lead to unexpected behavior.
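For reference, this follows the standard dmaengine cookie-check pattern; a
minimal editorial sketch is shown below (the helper name and its parameters
are illustrative and assume <linux/dmaengine.h>, they are not part of the
patch).

/* Submit a prepared descriptor and verify it was actually queued. */
static int submit_and_issue(struct dma_chan *chan,
			    struct dma_async_tx_descriptor *desc)
{
	dma_cookie_t cookie;

	cookie = dmaengine_submit(desc);
	if (dma_submit_error(cookie))
		return -EINVAL;		/* never queued, nothing to wait for */

	dma_async_issue_pending(chan);
	return 0;
}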
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
drivers/spi/spi-imx.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 186963d3d2e0..42f64d9535c9 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1445,6 +1445,7 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
struct dma_async_tx_descriptor *desc_tx, *desc_rx;
unsigned long transfer_timeout;
unsigned long time_left;
+ dma_cookie_t cookie;
/*
* The TX DMA setup starts the transfer, so make sure RX is configured
@@ -1460,21 +1461,29 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
desc_rx->callback = spi_imx_dma_rx_callback;
desc_rx->callback_param = (void *)spi_imx;
- dmaengine_submit(desc_rx);
+ cookie = dmaengine_submit(desc_rx);
+ if (dma_submit_error(cookie)) {
+ dev_err(spi_imx->dev, "submitting DMA RX failed\n");
+ transfer->error |= SPI_TRANS_FAIL_NO_START;
+ goto dmaengine_terminate_rx;
+ }
+
reinit_completion(&spi_imx->dma_rx_completion);
dma_async_issue_pending(controller->dma_rx);
desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
tx->sgl, tx->nents, DMA_MEM_TO_DEV,
DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
- if (!desc_tx) {
- dmaengine_terminate_all(controller->dma_rx);
- return -EINVAL;
- }
+ if (!desc_tx)
+ goto dmaengine_terminate_rx;
desc_tx->callback = spi_imx_dma_tx_callback;
desc_tx->callback_param = (void *)spi_imx;
- dmaengine_submit(desc_tx);
+ cookie = dmaengine_submit(desc_tx);
+ if (dma_submit_error(cookie)) {
+ dev_err(spi_imx->dev, "submitting DMA TX failed\n");
+ goto dmaengine_terminate_tx;
+ }
reinit_completion(&spi_imx->dma_tx_completion);
dma_async_issue_pending(controller->dma_tx);
@@ -1502,6 +1511,13 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
}
return 0;
+
+dmaengine_terminate_tx:
+ dmaengine_terminate_all(controller->dma_tx);
+dmaengine_terminate_rx:
+ dmaengine_terminate_all(controller->dma_rx);
+
+ return -EINVAL;
}
static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
` (3 preceding siblings ...)
2025-12-02 7:55 ` [PATCH v2 4/6] spi: imx: handle DMA submission errors with dma_submit_error() Carlos Song
@ 2025-12-02 7:55 ` Carlos Song
2025-12-02 15:06 ` Frank Li
2025-12-03 6:59 ` kernel test robot
2025-12-02 7:55 ` [PATCH v2 6/6] spi: imx: enable DMA mode for target operation Carlos Song
2025-12-15 13:59 ` [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Mark Brown
6 siblings, 2 replies; 11+ messages in thread
From: Carlos Song @ 2025-12-02 7:55 UTC (permalink / raw)
To: frank.li, mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening,
Carlos Song
ECSPI transfers only one word per frame in DMA mode, causing SCLK stalls
between words due to BURST_LENGTH updates, which significantly impacts
performance.
To improve throughput, configure BURST_LENGTH as large as possible (up to
512 bytes per frame) instead of the word length, which avoids the delays
between words. When the transfer length is not 4-byte aligned, use bounce
buffers to align data for DMA: TX pushes aligned words into the TXFIFO,
while RX trims the DMA buffer data after the transfer completes.
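A worked example of the alignment handling described above (editorial
illustration with assumed numbers, following the formulas used in this
series):

/*
 * For a 7-byte transfer with dynamic burst:
 *
 *   BURST_LENGTH field = 7 * 8 - 1 = 55      (56 bits = 32 * 1 + 24)
 *   dma_len            = ALIGN(7, 4) = 8     (two 32-bit FIFO words)
 *   unaligned          = 7 % 4 = 3
 *   copy offset        = 4 - unaligned = 1
 *
 * The first received 32-bit word carries only 3 valid bytes, so the RX
 * path copies the 7 payload bytes starting at dma_rx_buf + 1.
 */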
Introduce a new dma_package structure to store:
1. BURST_LENGTH values for each DMA request
2. Variables for DMA submission
3. DMA transmission length and actual data length
Handle three cases:
- len <= 512 bytes: one package, BURST_LENGTH = len * 8 - 1
- len > 512 and aligned: one package, BURST_LENGTH = max (512 bytes)
- len > 512 and unaligned: two packages, second for tail data
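A minimal sketch of the case split above (editorial illustration;
MX51_ECSPI_CTRL_MAX_BURST is 512 as in the driver, but the helper itself is
hypothetical and not part of the patch):

static unsigned int burst_split(unsigned int len, u32 *first_bl, u32 *tail_bl)
{
	if (len <= MX51_ECSPI_CTRL_MAX_BURST) {
		*first_bl = len * 8 - 1;
		return 1;				/* one package */
	}

	*first_bl = MX51_ECSPI_CTRL_MAX_BURST * 8 - 1;	/* 4095 */
	if (!(len % MX51_ECSPI_CTRL_MAX_BURST))
		return 1;				/* one package at max burst */

	*tail_bl = (len % MX51_ECSPI_CTRL_MAX_BURST) * 8 - 1;
	return 2;					/* second package for the tail */
}

For example, 4096 bytes needs one package with BL = 4095, while 700 bytes
needs two packages with BL = 4095 and then (700 - 512) * 8 - 1 = 1503.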
Performance test (spidev_test @10MHz, 4KB):
Before: tx/rx ~6651.9 kbps
After: tx/rx ~9922.2 kbps (~50% improvement)
For compatibility with slow SPI devices, add a configurable word delay in
DMA mode. When a word delay is set, dynamic burst is disabled and
BURST_LENGTH equals the word length.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
drivers/spi/spi-imx.c | 413 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 377 insertions(+), 36 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 42f64d9535c9..045f4ffd680a 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -60,6 +60,7 @@ MODULE_PARM_DESC(polling_limit_us,
#define MX51_ECSPI_CTRL_MAX_BURST 512
/* The maximum bytes that IMX53_ECSPI can transfer in target mode.*/
#define MX53_MAX_TRANSFER_BYTES 512
+#define BYTES_PER_32BITS_WORD 4
enum spi_imx_devtype {
IMX1_CSPI,
@@ -95,6 +96,16 @@ struct spi_imx_devtype_data {
enum spi_imx_devtype devtype;
};
+struct dma_data_package {
+ u32 cmd_word;
+ void *dma_rx_buf;
+ void *dma_tx_buf;
+ dma_addr_t dma_tx_addr;
+ dma_addr_t dma_rx_addr;
+ int dma_len;
+ int data_len;
+};
+
struct spi_imx_data {
struct spi_controller *controller;
struct device *dev;
@@ -130,6 +141,9 @@ struct spi_imx_data {
u32 wml;
struct completion dma_rx_completion;
struct completion dma_tx_completion;
+ size_t dma_package_num;
+ struct dma_data_package *dma_data __counted_by(dma_package_num);
+ int rx_offset;
const struct spi_imx_devtype_data *devtype_data;
};
@@ -189,6 +203,9 @@ MXC_SPI_BUF_TX(u16)
MXC_SPI_BUF_RX(u32)
MXC_SPI_BUF_TX(u32)
+/* Align to cache line to avoid swiotlb bounce */
+#define DMA_CACHE_ALIGNED_LEN(x) ALIGN((x), dma_get_cache_alignment())
+
/* First entry is reserved, second entry is valid only if SDHC_SPIEN is set
* (which is currently not the case in this driver)
*/
@@ -253,6 +270,14 @@ static bool spi_imx_can_dma(struct spi_controller *controller, struct spi_device
if (transfer->len < spi_imx->devtype_data->fifo_size)
return false;
+ /* DMA can only transfer 8-, 16- or 32-bit words */
+ if (spi_imx->bits_per_word != 8 && spi_imx->bits_per_word != 16 &&
+ spi_imx->bits_per_word != 32)
+ return false;
+
+ if (transfer->len >= MAX_SDMA_BD_BYTES)
+ return false;
+
spi_imx->dynamic_burst = 0;
return true;
@@ -1398,8 +1423,6 @@ static int spi_imx_sdma_init(struct device *dev, struct spi_imx_data *spi_imx,
init_completion(&spi_imx->dma_rx_completion);
init_completion(&spi_imx->dma_tx_completion);
- controller->can_dma = spi_imx_can_dma;
- controller->max_dma_len = MAX_SDMA_BD_BYTES;
spi_imx->controller->flags = SPI_CONTROLLER_MUST_RX |
SPI_CONTROLLER_MUST_TX;
@@ -1437,10 +1460,259 @@ static int spi_imx_calculate_timeout(struct spi_imx_data *spi_imx, int size)
return secs_to_jiffies(2 * timeout);
}
+static void spi_imx_dma_unmap(struct spi_imx_data *spi_imx,
+ struct dma_data_package *dma_data)
+{
+ struct device *tx_dev = spi_imx->controller->dma_tx->device->dev;
+ struct device *rx_dev = spi_imx->controller->dma_rx->device->dev;
+
+ dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
+ DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+ DMA_TO_DEVICE);
+ dma_unmap_single(rx_dev, dma_data->dma_rx_addr,
+ DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+ DMA_FROM_DEVICE);
+}
+
+static void spi_imx_dma_rx_data_handle(struct spi_imx_data *spi_imx,
+ struct dma_data_package *dma_data, void *rx_buf,
+ bool word_delay)
+{
+ void *copy_ptr;
+ int unaligned;
+
+ /*
+ * On little-endian CPUs, adjust byte order:
+ * - Swap bytes when bpw = 8
+ * - Swap half-words when bpw = 16
+ * This ensures correct data ordering for DMA transfers.
+ */
+#ifdef __LITTLE_ENDIAN
+ if (!word_delay) {
+ unsigned int bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+ u32 *temp = dma_data->dma_rx_buf;
+
+ for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len, sizeof(*temp)); i++) {
+ if (bytes_per_word == 1)
+ swab32s(temp + i);
+ else if (bytes_per_word == 2)
+ swahw32s(temp + i);
+ }
+ }
+#endif
+
+ /*
+ * When dynamic burst is enabled, DMA RX always receives 32-bit words from RXFIFO
+ * with buswidth = 4. When data_len is not 4-byte aligned, the RM states that for a
+ * burst length of 32*n + m bits the SPI burst carries the m LSBs in the first word
+ * and full 32 bits in the other n words. So if the first word contains garbage
+ * bytes, trim it and then copy only the actual data to rx_buf.
+ */
+ if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
+ unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
+ copy_ptr = (u8 *)dma_data->dma_rx_buf + BYTES_PER_32BITS_WORD - unaligned;
+ } else {
+ copy_ptr = dma_data->dma_rx_buf;
+ }
+
+ memcpy(rx_buf, copy_ptr, dma_data->data_len);
+}
+
+static int spi_imx_dma_map(struct spi_imx_data *spi_imx,
+ struct dma_data_package *dma_data)
+{
+ struct spi_controller *controller = spi_imx->controller;
+ struct device *tx_dev = controller->dma_tx->device->dev;
+ struct device *rx_dev = controller->dma_rx->device->dev;
+ int ret;
+
+ dma_data->dma_tx_addr = dma_map_single(tx_dev, dma_data->dma_tx_buf,
+ DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+ DMA_TO_DEVICE);
+ ret = dma_mapping_error(tx_dev, dma_data->dma_tx_addr);
+ if (ret < 0) {
+ dev_err(spi_imx->dev, "DMA TX map failed %d\n", ret);
+ return ret;
+ }
+
+ dma_data->dma_rx_addr = dma_map_single(rx_dev, dma_data->dma_rx_buf,
+ DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+ DMA_FROM_DEVICE);
+ ret = dma_mapping_error(rx_dev, dma_data->dma_rx_addr);
+ if (ret < 0) {
+ dev_err(spi_imx->dev, "DMA RX map failed %d\n", ret);
+ dma_unmap_single(tx_dev, dma_data->dma_tx_addr,
+ DMA_CACHE_ALIGNED_LEN(dma_data->dma_len),
+ DMA_TO_DEVICE);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int spi_imx_dma_tx_data_handle(struct spi_imx_data *spi_imx,
+ struct dma_data_package *dma_data,
+ const void *tx_buf,
+ bool word_delay)
+{
+ void *copy_ptr;
+ int unaligned;
+
+ if (word_delay) {
+ dma_data->dma_len = dma_data->data_len;
+ } else {
+ /*
+ * As per the reference manual, when burst length = 32*n + m bits, ECSPI
+ * sends m LSB bits in the first word, followed by n full 32-bit words.
+ * Since actual data may not be 4-byte aligned, allocate DMA TX/RX buffers
+ * to ensure alignment. For TX, DMA pushes 4-byte aligned words to TXFIFO,
+ * while ECSPI uses the BURST_LENGTH setting to maintain the correct bit count.
+ * For RX, DMA always receives 32-bit words from RXFIFO; when the data length is
+ * not 4-byte aligned, trim the first word to drop the garbage bytes, then gather
+ * the DMA bounce buffers and copy all valid data to rx_buf.
+ */
+ dma_data->dma_len = ALIGN(dma_data->data_len, BYTES_PER_32BITS_WORD);
+ }
+
+ dma_data->dma_tx_buf = kzalloc(dma_data->dma_len, GFP_KERNEL);
+ if (!dma_data->dma_tx_buf)
+ return -ENOMEM;
+
+ dma_data->dma_rx_buf = kzalloc(dma_data->dma_len, GFP_KERNEL);
+ if (!dma_data->dma_rx_buf) {
+ kfree(dma_data->dma_tx_buf);
+ return -ENOMEM;
+ }
+
+ if (dma_data->data_len % BYTES_PER_32BITS_WORD && !word_delay) {
+ unaligned = dma_data->data_len % BYTES_PER_32BITS_WORD;
+ copy_ptr = (u8 *)dma_data->dma_tx_buf + BYTES_PER_32BITS_WORD - unaligned;
+ } else {
+ copy_ptr = dma_data->dma_tx_buf;
+ }
+
+ memcpy(copy_ptr, tx_buf, dma_data->data_len);
+
+ /*
+ * When word_delay is enabled, DMA transfers an entire word in one minor loop.
+ * In this case, no data requires additional handling.
+ */
+ if (word_delay)
+ return 0;
+
+#ifdef __LITTLE_ENDIAN
+ /*
+ * On little-endian CPUs, adjust byte order:
+ * - Swap bytes when bpw = 8
+ * - Swap half-words when bpw = 16
+ * This ensures correct data ordering for DMA transfers.
+ */
+ unsigned int bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+ u32 *temp = dma_data->dma_tx_buf;
+
+ for (int i = 0; i < DIV_ROUND_UP(dma_data->dma_len, sizeof(*temp)); i++) {
+ if (bytes_per_word == 1)
+ swab32s(temp + i);
+ else if (bytes_per_word == 2)
+ swahw32s(temp + i);
+ }
+#endif
+
+ return 0;
+}
+
+static int spi_imx_dma_data_prepare(struct spi_imx_data *spi_imx,
+ struct spi_transfer *transfer,
+ bool word_delay)
+{
+ u32 pre_bl, tail_bl;
+ u32 ctrl;
+ int ret;
+
+ /*
+ * ECSPI supports a maximum burst of 512 bytes. When xfer->len exceeds 512
+ * and is not a multiple of 512, a tail transfer is required. BURST_LEGTH
+ * is used for SPI HW to maintain correct bit count. BURST_LENGTH should
+ * update with data length. After DMA request submit, SPI can not update the
+ * BURST_LENGTH, in this case, we must split two package, update the register
+ * then setup second DMA transfer.
+ */
+ ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL);
+ if (word_delay) {
+ /*
+ * When the SPI IMX driver needs to support a word delay, the Sample Period
+ * Control Register (ECSPI_PERIODREG) provides software a way to insert
+ * delays (wait states) between consecutive
+ * SPI transfers. As a result, ECSPI can only transfer one word per frame, and
+ * the delay occurs between frames.
+ */
+ spi_imx->dma_package_num = 1;
+ pre_bl = spi_imx->bits_per_word - 1;
+ } else if (transfer->len <= MX51_ECSPI_CTRL_MAX_BURST) {
+ spi_imx->dma_package_num = 1;
+ pre_bl = transfer->len * BITS_PER_BYTE - 1;
+ } else if (!(transfer->len % MX51_ECSPI_CTRL_MAX_BURST)) {
+ spi_imx->dma_package_num = 1;
+ pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
+ } else {
+ spi_imx->dma_package_num = 2;
+ pre_bl = MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1;
+ tail_bl = (transfer->len % MX51_ECSPI_CTRL_MAX_BURST) * BITS_PER_BYTE - 1;
+ }
+
+ spi_imx->dma_data = kmalloc_array(spi_imx->dma_package_num,
+ sizeof(struct dma_data_package),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!spi_imx->dma_data) {
+ dev_err(spi_imx->dev, "Failed to allocate DMA package buffer!\n");
+ return -ENOMEM;
+ }
+
+ if (spi_imx->dma_package_num == 1) {
+ ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
+ ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
+ spi_imx->dma_data[0].cmd_word = ctrl;
+ spi_imx->dma_data[0].data_len = transfer->len;
+ ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
+ word_delay);
+ if (ret) {
+ kfree(spi_imx->dma_data);
+ return ret;
+ }
+ } else {
+ ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
+ ctrl |= pre_bl << MX51_ECSPI_CTRL_BL_OFFSET;
+ spi_imx->dma_data[0].cmd_word = ctrl;
+ spi_imx->dma_data[0].data_len = round_down(transfer->len,
+ MX51_ECSPI_CTRL_MAX_BURST);
+ ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[0], transfer->tx_buf,
+ false);
+ if (ret) {
+ kfree(spi_imx->dma_data);
+ return ret;
+ }
+
+ ctrl &= ~MX51_ECSPI_CTRL_BL_MASK;
+ ctrl |= tail_bl << MX51_ECSPI_CTRL_BL_OFFSET;
+ spi_imx->dma_data[1].cmd_word = ctrl;
+ spi_imx->dma_data[1].data_len = transfer->len % MX51_ECSPI_CTRL_MAX_BURST;
+ ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
+ transfer->tx_buf + spi_imx->dma_data[0].data_len,
+ false);
+ if (ret) {
+ kfree(spi_imx->dma_data[0].dma_tx_buf);
+ kfree(spi_imx->dma_data[0].dma_rx_buf);
+ kfree(spi_imx->dma_data);
+ }
+ }
+
+ return 0;
+}
+
static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
+ struct dma_data_package *dma_data,
struct spi_transfer *transfer)
{
- struct sg_table *tx = &transfer->tx_sg, *rx = &transfer->rx_sg;
struct spi_controller *controller = spi_imx->controller;
struct dma_async_tx_descriptor *desc_tx, *desc_rx;
unsigned long transfer_timeout;
@@ -1451,9 +1723,9 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
* The TX DMA setup starts the transfer, so make sure RX is configured
* before TX.
*/
- desc_rx = dmaengine_prep_slave_sg(controller->dma_rx,
- rx->sgl, rx->nents, DMA_DEV_TO_MEM,
- DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ desc_rx = dmaengine_prep_slave_single(controller->dma_rx, dma_data->dma_rx_addr,
+ dma_data->dma_len, DMA_DEV_TO_MEM,
+ DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!desc_rx) {
transfer->error |= SPI_TRANS_FAIL_NO_START;
return -EINVAL;
@@ -1471,9 +1743,9 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
reinit_completion(&spi_imx->dma_rx_completion);
dma_async_issue_pending(controller->dma_rx);
- desc_tx = dmaengine_prep_slave_sg(controller->dma_tx,
- tx->sgl, tx->nents, DMA_MEM_TO_DEV,
- DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ desc_tx = dmaengine_prep_slave_single(controller->dma_tx, dma_data->dma_tx_addr,
+ dma_data->dma_len, DMA_MEM_TO_DEV,
+ DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!desc_tx)
goto dmaengine_terminate_rx;
@@ -1521,16 +1793,16 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
}
static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
- struct spi_transfer *transfer)
+ struct dma_data_package *dma_data,
+ bool word_delay)
{
- struct sg_table *rx = &transfer->rx_sg;
- struct scatterlist *last_sg = sg_last(rx->sgl, rx->nents);
- unsigned int bytes_per_word, i;
+ unsigned int bytes_per_word = word_delay ?
+ spi_imx_bytes_per_word(spi_imx->bits_per_word) :
+ BYTES_PER_32BITS_WORD;
+ unsigned int i;
- /* Get the right burst length from the last sg to ensure no tail data */
- bytes_per_word = spi_imx_bytes_per_word(transfer->bits_per_word);
for (i = spi_imx->devtype_data->fifo_size / 2; i > 0; i--) {
- if (!(sg_dma_len(last_sg) % (i * bytes_per_word)))
+ if (!(dma_data->dma_len % (i * bytes_per_word)))
break;
}
/* Use 1 as wml in case no available burst length got */
@@ -1540,25 +1812,29 @@ static void spi_imx_dma_max_wml_find(struct spi_imx_data *spi_imx,
spi_imx->wml = i;
}
-static int spi_imx_dma_configure(struct spi_controller *controller)
+static int spi_imx_dma_configure(struct spi_controller *controller, bool word_delay)
{
int ret;
enum dma_slave_buswidth buswidth;
struct dma_slave_config rx = {}, tx = {};
struct spi_imx_data *spi_imx = spi_controller_get_devdata(controller);
- switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
- case 4:
+ if (word_delay) {
+ switch (spi_imx_bytes_per_word(spi_imx->bits_per_word)) {
+ case 4:
+ buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ break;
+ case 2:
+ buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
+ break;
+ case 1:
+ buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
+ break;
+ default:
+ return -EINVAL;
+ }
+ } else {
buswidth = DMA_SLAVE_BUSWIDTH_4_BYTES;
- break;
- case 2:
- buswidth = DMA_SLAVE_BUSWIDTH_2_BYTES;
- break;
- case 1:
- buswidth = DMA_SLAVE_BUSWIDTH_1_BYTE;
- break;
- default:
- return -EINVAL;
}
tx.direction = DMA_MEM_TO_DEV;
@@ -1584,15 +1860,17 @@ static int spi_imx_dma_configure(struct spi_controller *controller)
return 0;
}
-static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
- struct spi_transfer *transfer)
+static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
+ struct dma_data_package *dma_data,
+ struct spi_transfer *transfer,
+ bool word_delay)
{
struct spi_controller *controller = spi_imx->controller;
int ret;
- spi_imx_dma_max_wml_find(spi_imx, transfer);
+ spi_imx_dma_max_wml_find(spi_imx, dma_data, word_delay);
- ret = spi_imx_dma_configure(controller);
+ ret = spi_imx_dma_configure(controller, word_delay);
if (ret)
goto dma_failure_no_start;
@@ -1603,10 +1881,17 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
}
spi_imx->devtype_data->setup_wml(spi_imx);
- ret = spi_imx_dma_submit(spi_imx, transfer);
+ ret = spi_imx_dma_submit(spi_imx, dma_data, transfer);
if (ret)
return ret;
+ /* Trim the DMA RX buffer and copy the actual data to rx_buf */
+ dma_sync_single_for_cpu(controller->dma_rx->device->dev, dma_data->dma_rx_addr,
+ dma_data->dma_len, DMA_FROM_DEVICE);
+ spi_imx_dma_rx_data_handle(spi_imx, dma_data, transfer->rx_buf + spi_imx->rx_offset,
+ word_delay);
+ spi_imx->rx_offset += dma_data->data_len;
+
return 0;
/* fallback to pio */
dma_failure_no_start:
@@ -1614,6 +1899,57 @@ static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
return ret;
}
+static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
+ struct spi_transfer *transfer)
+{
+ bool word_delay = transfer->word_delay.value != 0;
+ int ret;
+ int i;
+
+ ret = spi_imx_dma_data_prepare(spi_imx, transfer, word_delay);
+ if (ret < 0) {
+ transfer->error |= SPI_TRANS_FAIL_NO_START;
+ dev_err(spi_imx->dev, "DMA data prepare fail\n");
+ goto fallback_pio;
+ }
+
+ spi_imx->rx_offset = 0;
+
+ /* Each dma_package performs a separate DMA transfer once */
+ for (i = 0; i < spi_imx->dma_package_num; i++) {
+ ret = spi_imx_dma_map(spi_imx, &spi_imx->dma_data[i]);
+ if (ret < 0) {
+ if (i == 0)
+ transfer->error |= SPI_TRANS_FAIL_NO_START;
+ dev_err(spi_imx->dev, "DMA map fail\n");
+ break;
+ }
+
+ /* Update the CTRL register BL field */
+ writel(spi_imx->dma_data[i].cmd_word, spi_imx->base + MX51_ECSPI_CTRL);
+
+ ret = spi_imx_dma_package_transfer(spi_imx, &spi_imx->dma_data[i],
+ transfer, word_delay);
+
+ /* Whether the dma transmission is successful or not, dma unmap is necessary */
+ spi_imx_dma_unmap(spi_imx, &spi_imx->dma_data[i]);
+
+ if (ret < 0) {
+ dev_dbg(spi_imx->dev, "DMA %d transfer not really finish\n", i);
+ break;
+ }
+ }
+
+ for (int j = 0; j < spi_imx->dma_package_num; j++) {
+ kfree(spi_imx->dma_data[j].dma_tx_buf);
+ kfree(spi_imx->dma_data[j].dma_rx_buf);
+ }
+ kfree(spi_imx->dma_data);
+
+fallback_pio:
+ return ret;
+}
+
static int spi_imx_pio_transfer(struct spi_device *spi,
struct spi_transfer *transfer)
{
@@ -1780,9 +2116,14 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
* transfer, the SPI transfer has already been mapped, so we
* have to do the DMA transfer here.
*/
- if (spi_imx->usedma)
- return spi_imx_dma_transfer(spi_imx, transfer);
-
+ if (spi_imx->usedma) {
+ ret = spi_imx_dma_transfer(spi_imx, transfer);
+ if (transfer->error & SPI_TRANS_FAIL_NO_START) {
+ spi_imx->usedma = false;
+ return spi_imx_pio_transfer(spi, transfer);
+ }
+ return ret;
+ }
/* run in polling mode for short transfers */
if (transfer->len == 1 || (polling_limit_us &&
spi_imx_transfer_estimate_time_us(transfer) < polling_limit_us))
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
2025-12-02 7:55 ` [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
@ 2025-12-02 15:06 ` Frank Li
2025-12-03 6:59 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: Frank Li @ 2025-12-02 15:06 UTC (permalink / raw)
To: Carlos Song
Cc: mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars, linux-spi, imx, linux-kernel, linux-arm-kernel,
linux-hardening
On Tue, Dec 02, 2025 at 03:55:02PM +0800, Carlos Song wrote:
> ECSPI transfers only one word per frame in DMA mode, causing SCLK stalls
> between words due to BURST_LENGTH updates, which significantly impacts
> performance.
>
> To improve throughput, configure BURST_LENGTH as large as possible (up to
> 512 bytes per frame) instead of word length. This avoids delays between
> words. When transfer length is not 4-byte aligned, use bounce buffers to
> align data for DMA. TX uses aligned words for TXFIFO, while RX trims DMA
> buffer data after transfer completion.
>
> Introduce a new dma_package structure to store:
> 1. BURST_LENGTH values for each DMA request
> 2. Variables for DMA submission
> 3. DMA transmission length and actual data length
>
> Handle three cases:
> - len <= 512 bytes: one package, BURST_LENGTH = len * 8 - 1
> - len > 512 and aligned: one package, BURST_LENGTH = max (512 bytes)
> - len > 512 and unaligned: two packages, second for tail data
>
> Performance test (spidev_test @10MHz, 4KB):
> Before: tx/rx ~6651.9 kbps
> After: tx/rx ~9922.2 kbps (~50% improvement)
>
> For compatibility with slow SPI devices, add configurable word delay in
> DMA mode. When word delay is set, dynamic burst is disabled and
> BURST_LENGTH equals word length.
>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---
Reviewed-by: Frank Li <Frank.Li@nxp.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
2025-12-02 7:55 ` [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
2025-12-02 15:06 ` Frank Li
@ 2025-12-03 6:59 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-12-03 6:59 UTC (permalink / raw)
To: Carlos Song, frank.li, mkl, broonie, shawnguo, s.hauer, kernel,
festevam, kees, gustavoars
Cc: oe-kbuild-all, linux-spi, imx, linux-kernel, linux-arm-kernel,
linux-hardening, Carlos Song
Hi Carlos,
kernel test robot noticed the following build errors:
[auto build test ERROR on v6.18]
[also build test ERROR on linus/master next-20251202]
[cannot apply to shawnguo/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Carlos-Song/spi-imx-group-spi_imx_dma_configure-with-spi_imx_dma_transfer/20251202-160030
base: v6.18
patch link: https://lore.kernel.org/r/20251202075503.2448339-6-carlos.song%40nxp.com
patch subject: [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
config: csky-allmodconfig (https://download.01.org/0day-ci/archive/20251203/202512031425.cmBJXuXy-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251203/202512031425.cmBJXuXy-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512031425.cmBJXuXy-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/spi/spi-imx.c:144:34: error: 'counted_by' attribute is not allowed for a non-array field
144 | struct dma_data_package *dma_data __counted_by(dma_package_num);
| ^~~~~~~~
vim +/counted_by +144 drivers/spi/spi-imx.c
107
108 struct spi_imx_data {
109 struct spi_controller *controller;
110 struct device *dev;
111
112 struct completion xfer_done;
113 void __iomem *base;
114 unsigned long base_phys;
115
116 struct clk *clk_per;
117 struct clk *clk_ipg;
118 unsigned long spi_clk;
119 unsigned int spi_bus_clk;
120
121 unsigned int bits_per_word;
122 unsigned int spi_drctl;
123
124 unsigned int count, remainder;
125 void (*tx)(struct spi_imx_data *spi_imx);
126 void (*rx)(struct spi_imx_data *spi_imx);
127 void *rx_buf;
128 const void *tx_buf;
129 unsigned int txfifo; /* number of words pushed in tx FIFO */
130 unsigned int dynamic_burst;
131 bool rx_only;
132
133 /* Target mode */
134 bool target_mode;
135 bool target_aborted;
136 unsigned int target_burst;
137
138 /* DMA */
139 bool usedma;
140 u32 wml;
141 struct completion dma_rx_completion;
142 struct completion dma_tx_completion;
143 size_t dma_package_num;
> 144 struct dma_data_package *dma_data __counted_by(dma_package_num);
145 int rx_offset;
146
147 const struct spi_imx_devtype_data *devtype_data;
148 };
149
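For context (a sketch only, not necessarily the fix that was applied): GCC's
counted_by attribute is only accepted on C99 flexible array members, which is
why annotating the plain pointer member above triggers this error. One way to
keep the bounds annotation is to hang the packages off a trailing flexible
array; the struct and field names below are hypothetical and assume the
driver's existing struct dma_data_package:

struct dma_data_packages {
	size_t dma_package_num;
	/* hypothetical flexible array member; __counted_by() is valid here */
	struct dma_data_package packages[] __counted_by(dma_package_num);
};

The simpler alternative is to drop the annotation from the pointer member.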
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 6/6] spi: imx: enable DMA mode for target operation
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
` (4 preceding siblings ...)
2025-12-02 7:55 ` [PATCH v2 5/6] spi: imx: support dynamic burst length for ECSPI DMA mode Carlos Song
@ 2025-12-02 7:55 ` Carlos Song
2025-12-02 14:52 ` Frank Li
2025-12-15 13:59 ` [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Mark Brown
6 siblings, 1 reply; 11+ messages in thread
From: Carlos Song @ 2025-12-02 7:55 UTC (permalink / raw)
To: frank.li, mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening,
Carlos Song
Enable DMA mode for the SPI IMX controller in target mode. Disable the word
delay feature in target mode, because target mode must sustain high
throughput to keep pace with the master. Target mode continues to operate
in dynamic burst mode.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
---
drivers/spi/spi-imx.c | 77 ++++++++++++++++++++++++++++++++-----------
1 file changed, 57 insertions(+), 20 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 045f4ffd680a..e37d786a5276 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -264,7 +264,13 @@ static bool spi_imx_can_dma(struct spi_controller *controller, struct spi_device
if (!controller->dma_rx)
return false;
- if (spi_imx->target_mode)
+ /*
+ * Due to Freescale errata ERR003775 "eCSPI: Burst completion by Chip
+ * Select (SS) signal in Slave mode is not functional", the burst size must
+ * be set to exactly the size of the transfer. This limits SPI transactions
+ * to a maximum of 2^12 bits.
+ */
+ if (transfer->len > MX53_MAX_TRANSFER_BYTES && spi_imx->target_mode)
return false;
if (transfer->len < spi_imx->devtype_data->fifo_size)
@@ -1763,23 +1769,51 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
- /* Wait SDMA to finish the data transfer.*/
- time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
- transfer_timeout);
- if (!time_left) {
- dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
- dmaengine_terminate_all(controller->dma_tx);
- dmaengine_terminate_all(controller->dma_rx);
- return -ETIMEDOUT;
- }
+ if (!spi_imx->target_mode) {
+ /* Wait for SDMA to finish the data transfer. */
+ time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
+ transfer_timeout);
+ if (!time_left) {
+ dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
+ dmaengine_terminate_all(controller->dma_tx);
+ dmaengine_terminate_all(controller->dma_rx);
+ return -ETIMEDOUT;
+ }
- time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
- transfer_timeout);
- if (!time_left) {
- dev_err(&controller->dev, "I/O Error in DMA RX\n");
- spi_imx->devtype_data->reset(spi_imx);
- dmaengine_terminate_all(controller->dma_rx);
- return -ETIMEDOUT;
+ time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
+ transfer_timeout);
+ if (!time_left) {
+ dev_err(&controller->dev, "I/O Error in DMA RX\n");
+ spi_imx->devtype_data->reset(spi_imx);
+ dmaengine_terminate_all(controller->dma_rx);
+ return -ETIMEDOUT;
+ }
+ } else {
+ spi_imx->target_aborted = false;
+
+ if (wait_for_completion_interruptible(&spi_imx->dma_tx_completion) ||
+ READ_ONCE(spi_imx->target_aborted)) {
+ dev_dbg(spi_imx->dev, "I/O Error in DMA TX interrupted\n");
+ dmaengine_terminate_all(controller->dma_tx);
+ dmaengine_terminate_all(controller->dma_rx);
+ return -EINTR;
+ }
+
+ if (wait_for_completion_interruptible(&spi_imx->dma_rx_completion) ||
+ READ_ONCE(spi_imx->target_aborted)) {
+ dev_dbg(spi_imx->dev, "I/O Error in DMA RX interrupted\n");
+ dmaengine_terminate_all(controller->dma_rx);
+ return -EINTR;
+ }
+
+ /*
+ * ECSPI has a HW issue when working in target mode: after 64 words are
+ * written to the TXFIFO, ECSPI_TXDATA keeps shifting out the last word
+ * even when the TXFIFO becomes empty, so we have to disable ECSPI in
+ * target mode after the transfer completes.
+ */
+ if (spi_imx->devtype_data->disable)
+ spi_imx->devtype_data->disable(spi_imx);
}
return 0;
@@ -1902,7 +1936,7 @@ static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
struct spi_transfer *transfer)
{
- bool word_delay = transfer->word_delay.value != 0;
+ bool word_delay = transfer->word_delay.value != 0 && !spi_imx->target_mode;
int ret;
int i;
@@ -2108,7 +2142,7 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
while (spi_imx->devtype_data->rx_available(spi_imx))
readl(spi_imx->base + MXC_CSPIRXDATA);
- if (spi_imx->target_mode)
+ if (spi_imx->target_mode && !spi_imx->usedma)
return spi_imx_pio_transfer_target(spi, transfer);
/*
@@ -2120,7 +2154,10 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
ret = spi_imx_dma_transfer(spi_imx, transfer);
if (transfer->error & SPI_TRANS_FAIL_NO_START) {
spi_imx->usedma = false;
- return spi_imx_pio_transfer(spi, transfer);
+ if (spi_imx->target_mode)
+ return spi_imx_pio_transfer_target(spi, transfer);
+ else
+ return spi_imx_pio_transfer(spi, transfer);
}
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 6/6] spi: imx: enable DMA mode for target operation
2025-12-02 7:55 ` [PATCH v2 6/6] spi: imx: enable DMA mode for target operation Carlos Song
@ 2025-12-02 14:52 ` Frank Li
0 siblings, 0 replies; 11+ messages in thread
From: Frank Li @ 2025-12-02 14:52 UTC (permalink / raw)
To: Carlos Song
Cc: mkl, broonie, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars, linux-spi, imx, linux-kernel, linux-arm-kernel,
linux-hardening
On Tue, Dec 02, 2025 at 03:55:03PM +0800, Carlos Song wrote:
> Enable DMA mode for the SPI IMX controller in target mode. Disable the word
> delay feature in target mode, because target mode must sustain high
> throughput to keep pace with the master. Target mode continues to operate
> in dynamic burst mode.
>
> Signed-off-by: Carlos Song <carlos.song@nxp.com>
> ---
Reviewed-by: Frank Li <Frank.Li@nxp.com>
> drivers/spi/spi-imx.c | 77 ++++++++++++++++++++++++++++++++-----------
> 1 file changed, 57 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index 045f4ffd680a..e37d786a5276 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -264,7 +264,13 @@ static bool spi_imx_can_dma(struct spi_controller *controller, struct spi_device
> if (!controller->dma_rx)
> return false;
>
> - if (spi_imx->target_mode)
> + /*
> + * Due to Freescale errata ERR003775 "eCSPI: Burst completion by Chip
> + * Select (SS) signal in Slave mode is not functional", the burst size must
> + * be set to exactly the size of the transfer. This limits SPI transactions
> + * to a maximum of 2^12 bits.
> + */
> + if (transfer->len > MX53_MAX_TRANSFER_BYTES && spi_imx->target_mode)
> return false;
>
> if (transfer->len < spi_imx->devtype_data->fifo_size)
> @@ -1763,23 +1769,51 @@ static int spi_imx_dma_submit(struct spi_imx_data *spi_imx,
>
> transfer_timeout = spi_imx_calculate_timeout(spi_imx, transfer->len);
>
> - /* Wait SDMA to finish the data transfer.*/
> - time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> - transfer_timeout);
> - if (!time_left) {
> - dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> - dmaengine_terminate_all(controller->dma_tx);
> - dmaengine_terminate_all(controller->dma_rx);
> - return -ETIMEDOUT;
> - }
> + if (!spi_imx->target_mode) {
> + /* Wait for SDMA to finish the data transfer. */
> + time_left = wait_for_completion_timeout(&spi_imx->dma_tx_completion,
> + transfer_timeout);
> + if (!time_left) {
> + dev_err(spi_imx->dev, "I/O Error in DMA TX\n");
> + dmaengine_terminate_all(controller->dma_tx);
> + dmaengine_terminate_all(controller->dma_rx);
> + return -ETIMEDOUT;
> + }
>
> - time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> - transfer_timeout);
> - if (!time_left) {
> - dev_err(&controller->dev, "I/O Error in DMA RX\n");
> - spi_imx->devtype_data->reset(spi_imx);
> - dmaengine_terminate_all(controller->dma_rx);
> - return -ETIMEDOUT;
> + time_left = wait_for_completion_timeout(&spi_imx->dma_rx_completion,
> + transfer_timeout);
> + if (!time_left) {
> + dev_err(&controller->dev, "I/O Error in DMA RX\n");
> + spi_imx->devtype_data->reset(spi_imx);
> + dmaengine_terminate_all(controller->dma_rx);
> + return -ETIMEDOUT;
> + }
> + } else {
> + spi_imx->target_aborted = false;
> +
> + if (wait_for_completion_interruptible(&spi_imx->dma_tx_completion) ||
> + READ_ONCE(spi_imx->target_aborted)) {
> + dev_dbg(spi_imx->dev, "I/O Error in DMA TX interrupted\n");
> + dmaengine_terminate_all(controller->dma_tx);
> + dmaengine_terminate_all(controller->dma_rx);
> + return -EINTR;
> + }
> +
> + if (wait_for_completion_interruptible(&spi_imx->dma_rx_completion) ||
> + READ_ONCE(spi_imx->target_aborted)) {
> + dev_dbg(spi_imx->dev, "I/O Error in DMA RX interrupted\n");
> + dmaengine_terminate_all(controller->dma_rx);
> + return -EINTR;
> + }
> +
> + /*
> + * ECSPI has a HW issue when working in target mode: after 64 words are
> + * written to the TXFIFO, ECSPI_TXDATA keeps shifting out the last word
> + * even when the TXFIFO becomes empty, so we have to disable ECSPI in
> + * target mode after the transfer completes.
> + */
> + if (spi_imx->devtype_data->disable)
> + spi_imx->devtype_data->disable(spi_imx);
> }
>
> return 0;
> @@ -1902,7 +1936,7 @@ static int spi_imx_dma_package_transfer(struct spi_imx_data *spi_imx,
> static int spi_imx_dma_transfer(struct spi_imx_data *spi_imx,
> struct spi_transfer *transfer)
> {
> - bool word_delay = transfer->word_delay.value != 0;
> + bool word_delay = transfer->word_delay.value != 0 && !spi_imx->target_mode;
> int ret;
> int i;
>
> @@ -2108,7 +2142,7 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
> while (spi_imx->devtype_data->rx_available(spi_imx))
> readl(spi_imx->base + MXC_CSPIRXDATA);
>
> - if (spi_imx->target_mode)
> + if (spi_imx->target_mode && !spi_imx->usedma)
> return spi_imx_pio_transfer_target(spi, transfer);
>
> /*
> @@ -2120,7 +2154,10 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
> ret = spi_imx_dma_transfer(spi_imx, transfer);
> if (transfer->error & SPI_TRANS_FAIL_NO_START) {
> spi_imx->usedma = false;
> - return spi_imx_pio_transfer(spi, transfer);
> + if (spi_imx->target_mode)
> + return spi_imx_pio_transfer_target(spi, transfer);
> + else
> + return spi_imx_pio_transfer(spi, transfer);
> }
> return ret;
> }
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode
2025-12-02 7:54 [PATCH v2 0/6] Support ECSPI dynamic burst feature for DMA mode Carlos Song
` (5 preceding siblings ...)
2025-12-02 7:55 ` [PATCH v2 6/6] spi: imx: enable DMA mode for target operation Carlos Song
@ 2025-12-15 13:59 ` Mark Brown
6 siblings, 0 replies; 11+ messages in thread
From: Mark Brown @ 2025-12-15 13:59 UTC (permalink / raw)
To: frank.li, mkl, shawnguo, s.hauer, kernel, festevam, kees,
gustavoars, Carlos Song
Cc: linux-spi, imx, linux-kernel, linux-arm-kernel, linux-hardening
On Tue, 02 Dec 2025 15:54:57 +0800, Carlos Song wrote:
> ECSPI has low throughput because it lacks dynamic burst support: in DMA
> mode it transfers only one word per frame, causing SCLK stalls between
> words due to BURST_LENGTH updates.
>
> This patch set adds support for the ECSPI dynamic burst feature to improve
> ECSPI DMA mode performance.
>
> [...]
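To make the quoted description concrete: per the i.MX ECSPI reference manual,
the BURST_LENGTH field (bits 31:20 of the control register) holds the burst
size in bits minus one; the 12-bit width of this field is also why the errata
comment in patch 6/6 caps target-mode transfers at 2^12 bits. Without dynamic
burst the field is reprogrammed for every single-word frame, stalling SCLK
between words; with dynamic burst it is set once per multi-word burst. A
minimal sketch of the encoding (the macro and helper names here are
illustrative, not the driver's in-tree identifiers):

#define ECSPI_CTRL_BL_OFFSET	20
#define ECSPI_CTRL_BL_MASK	(0xfff << ECSPI_CTRL_BL_OFFSET)

/* Encode an N-bit burst into the control register value. */
static u32 ecspi_ctrl_set_burst(u32 ctrl, unsigned int burst_bits)
{
	ctrl &= ~ECSPI_CTRL_BL_MASK;
	ctrl |= ((burst_bits - 1) << ECSPI_CTRL_BL_OFFSET) & ECSPI_CTRL_BL_MASK;
	return ctrl;
}

In the dynamic burst scheme such a helper would run once per dma_package (see
the cmd_word write to MX51_ECSPI_CTRL in patch 5/6) rather than once per word.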
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
Thanks!
[1/6] spi: imx: group spi_imx_dma_configure() with spi_imx_dma_transfer()
commit: c64f62b036aed30626cb30fa82d3ec4a13fa83df
[2/6] spi: imx: introduce helper to clear DMA mode logic
commit: 5395bb7f7c361310d0f329c8169d2190809b05c1
[3/6] spi: imx: avoid dmaengine_terminate_all() on TX prep failure
commit: a5f298581d454c5ea77c5fb6f4ee1bff61eb2b2c
[4/6] spi: imx: handle DMA submission errors with dma_submit_error()
commit: a450c8b77f929f5f9f5236861761a8c5cab22023
[5/6] spi: imx: support dynamic burst length for ECSPI DMA mode
commit: faa8e404ad8e686cb98c51dc507fdcacfb8020ce
[6/6] spi: imx: enable DMA mode for target operation
commit: ba9b28652c75b07383e267328f1759195d5430f7
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix); however, if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree; please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply [flat|nested] 11+ messages in thread