linux-spi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
@ 2025-06-09 15:32 James Clark
  2025-06-09 15:32 ` [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer James Clark
                   ` (4 more replies)
  0 siblings, 5 replies; 43+ messages in thread
From: James Clark @ 2025-06-09 15:32 UTC (permalink / raw)
  To: Vladimir Oltean, Mark Brown
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, James Clark,
	Arnd Bergmann, Larisa Grigore

Improve usability of target mode by reporting FIFO errors and increasing
the buffer size when DMA is used. While we're touching DMA stuff also
switch to non-coherent memory, although this is unrelated to target
mode.

The first commit is marked as a fix because it can fix intermittent
issues with existing transfers, rather than the later fixes which
improve larger than FIFO target mode transfers which would have never
worked.

Signed-off-by: James Clark <james.clark@linaro.org>
---
James Clark (3):
      spi: spi-fsl-dspi: Clear completion counter before initiating transfer
      spi: spi-fsl-dspi: Use non-coherent memory for DMA
      spi: spi-fsl-dspi: Report FIFO overflows as errors

Larisa Grigore (1):
      spi: spi-fsl-dspi: Increase DMA buffer size

 drivers/spi/spi-fsl-dspi.c | 128 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 102 insertions(+), 26 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250522-james-nxp-spi-dma-a997ebebfb6b

Best regards,
-- 
James Clark <james.clark@linaro.org>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  2025-06-09 15:32 [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements James Clark
@ 2025-06-09 15:32 ` James Clark
  2025-06-10 11:34   ` Vladimir Oltean
  2025-06-09 15:32 ` [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA James Clark
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-09 15:32 UTC (permalink / raw)
  To: Vladimir Oltean, Mark Brown
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, James Clark

In target mode, extra interrupts can be received between the end of a
transfer and halting the module if the host continues sending more data.
If the interrupt from this occurs after the reinit_completion() then the
completion counter is left at a non-zero value. The next unrelated
transfer initiated by userspace will then complete immediately without
waiting for the interrupt or writing to the RX buffer.

Fix it by resetting the counter before the transfer so that lingering
values are cleared. This is done after clearing the FIFOs and the
status register but before the transfer is initiated, so no interrupts
should be received at this point resulting in other race conditions.

Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Signed-off-by: James Clark <james.clark@linaro.org>
---
 drivers/spi/spi-fsl-dspi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 863781ba6c16..386a17871e79 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -983,11 +983,13 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 		if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
 			status = dspi_dma_xfer(dspi);
 		} else {
+			if (dspi->irq)
+				reinit_completion(&dspi->xfer_done);
+
 			dspi_fifo_write(dspi);
 
 			if (dspi->irq) {
 				wait_for_completion(&dspi->xfer_done);
-				reinit_completion(&dspi->xfer_done);
 			} else {
 				do {
 					status = dspi_poll(dspi);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-09 15:32 [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements James Clark
  2025-06-09 15:32 ` [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer James Clark
@ 2025-06-09 15:32 ` James Clark
  2025-06-10  8:26   ` Arnd Bergmann
  2025-06-10 15:15   ` Frank Li
  2025-06-09 15:32 ` [PATCH 3/4] spi: spi-fsl-dspi: Increase DMA buffer size James Clark
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 43+ messages in thread
From: James Clark @ 2025-06-09 15:32 UTC (permalink / raw)
  To: Vladimir Oltean, Mark Brown
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, James Clark,
	Arnd Bergmann

Using coherent memory here isn't functionally necessary. Because the
change to use non-coherent memory isn't overly complex and only a few
synchronization points are required, we might as well do it while fixing
up some other DMA issues.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 drivers/spi/spi-fsl-dspi.c | 55 +++++++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 386a17871e79..567632042f8f 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -247,6 +247,11 @@ struct fsl_dspi {
 	void (*dev_to_host)(struct fsl_dspi *dspi, u32 rxdata);
 };
 
+static int dspi_dma_transfer_size(struct fsl_dspi *dspi)
+{
+	return dspi->words_in_flight * DMA_SLAVE_BUSWIDTH_4_BYTES;
+}
+
 static void dspi_native_host_to_dev(struct fsl_dspi *dspi, u32 *txdata)
 {
 	switch (dspi->oper_word_size) {
@@ -361,7 +366,10 @@ static void dspi_tx_dma_callback(void *arg)
 {
 	struct fsl_dspi *dspi = arg;
 	struct fsl_dspi_dma *dma = dspi->dma;
+	struct device *dev = &dspi->pdev->dev;
 
+	dma_sync_single_for_cpu(dev, dma->tx_dma_phys,
+				dspi_dma_transfer_size(dspi), DMA_TO_DEVICE);
 	complete(&dma->cmd_tx_complete);
 }
 
@@ -369,9 +377,13 @@ static void dspi_rx_dma_callback(void *arg)
 {
 	struct fsl_dspi *dspi = arg;
 	struct fsl_dspi_dma *dma = dspi->dma;
+	struct device *dev = &dspi->pdev->dev;
 	int i;
 
 	if (dspi->rx) {
+		dma_sync_single_for_cpu(dev, dma->rx_dma_phys,
+					dspi_dma_transfer_size(dspi),
+					DMA_FROM_DEVICE);
 		for (i = 0; i < dspi->words_in_flight; i++)
 			dspi_push_rx(dspi, dspi->dma->rx_dma_buf[i]);
 	}
@@ -381,6 +393,7 @@ static void dspi_rx_dma_callback(void *arg)
 
 static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 {
+	size_t size = dspi_dma_transfer_size(dspi);
 	struct device *dev = &dspi->pdev->dev;
 	struct fsl_dspi_dma *dma = dspi->dma;
 	int time_left;
@@ -389,10 +402,9 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 	for (i = 0; i < dspi->words_in_flight; i++)
 		dspi->dma->tx_dma_buf[i] = dspi_pop_tx_pushr(dspi);
 
+	dma_sync_single_for_device(dev, dma->tx_dma_phys, size, DMA_TO_DEVICE);
 	dma->tx_desc = dmaengine_prep_slave_single(dma->chan_tx,
-					dma->tx_dma_phys,
-					dspi->words_in_flight *
-					DMA_SLAVE_BUSWIDTH_4_BYTES,
+					dma->tx_dma_phys, size,
 					DMA_MEM_TO_DEV,
 					DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 	if (!dma->tx_desc) {
@@ -407,10 +419,10 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 		return -EINVAL;
 	}
 
+	dma_sync_single_for_device(dev, dma->rx_dma_phys, size,
+				   DMA_FROM_DEVICE);
 	dma->rx_desc = dmaengine_prep_slave_single(dma->chan_rx,
-					dma->rx_dma_phys,
-					dspi->words_in_flight *
-					DMA_SLAVE_BUSWIDTH_4_BYTES,
+					dma->rx_dma_phys, size,
 					DMA_DEV_TO_MEM,
 					DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 	if (!dma->rx_desc) {
@@ -512,17 +524,17 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
 		goto err_tx_channel;
 	}
 
-	dma->tx_dma_buf = dma_alloc_coherent(dma->chan_tx->device->dev,
-					     dma_bufsize, &dma->tx_dma_phys,
-					     GFP_KERNEL);
+	dma->tx_dma_buf = dma_alloc_noncoherent(dma->chan_tx->device->dev,
+						dma_bufsize, &dma->tx_dma_phys,
+						DMA_TO_DEVICE, GFP_KERNEL);
 	if (!dma->tx_dma_buf) {
 		ret = -ENOMEM;
 		goto err_tx_dma_buf;
 	}
 
-	dma->rx_dma_buf = dma_alloc_coherent(dma->chan_rx->device->dev,
-					     dma_bufsize, &dma->rx_dma_phys,
-					     GFP_KERNEL);
+	dma->rx_dma_buf = dma_alloc_noncoherent(dma->chan_rx->device->dev,
+						dma_bufsize, &dma->rx_dma_phys,
+						DMA_FROM_DEVICE, GFP_KERNEL);
 	if (!dma->rx_dma_buf) {
 		ret = -ENOMEM;
 		goto err_rx_dma_buf;
@@ -557,11 +569,12 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
 	return 0;
 
 err_slave_config:
-	dma_free_coherent(dma->chan_rx->device->dev,
-			  dma_bufsize, dma->rx_dma_buf, dma->rx_dma_phys);
+	dma_free_noncoherent(dma->chan_rx->device->dev, dma_bufsize,
+			     dma->rx_dma_buf, dma->rx_dma_phys,
+			     DMA_FROM_DEVICE);
 err_rx_dma_buf:
-	dma_free_coherent(dma->chan_tx->device->dev,
-			  dma_bufsize, dma->tx_dma_buf, dma->tx_dma_phys);
+	dma_free_noncoherent(dma->chan_tx->device->dev, dma_bufsize,
+			     dma->tx_dma_buf, dma->tx_dma_phys, DMA_TO_DEVICE);
 err_tx_dma_buf:
 	dma_release_channel(dma->chan_tx);
 err_tx_channel:
@@ -582,14 +595,16 @@ static void dspi_release_dma(struct fsl_dspi *dspi)
 		return;
 
 	if (dma->chan_tx) {
-		dma_free_coherent(dma->chan_tx->device->dev, dma_bufsize,
-				  dma->tx_dma_buf, dma->tx_dma_phys);
+		dma_free_noncoherent(dma->chan_tx->device->dev, dma_bufsize,
+				     dma->tx_dma_buf, dma->tx_dma_phys,
+				     DMA_TO_DEVICE);
 		dma_release_channel(dma->chan_tx);
 	}
 
 	if (dma->chan_rx) {
-		dma_free_coherent(dma->chan_rx->device->dev, dma_bufsize,
-				  dma->rx_dma_buf, dma->rx_dma_phys);
+		dma_free_noncoherent(dma->chan_rx->device->dev, dma_bufsize,
+				     dma->rx_dma_buf, dma->rx_dma_phys,
+				     DMA_FROM_DEVICE);
 		dma_release_channel(dma->chan_rx);
 	}
 }

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/4] spi: spi-fsl-dspi: Increase DMA buffer size
  2025-06-09 15:32 [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements James Clark
  2025-06-09 15:32 ` [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer James Clark
  2025-06-09 15:32 ` [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA James Clark
@ 2025-06-09 15:32 ` James Clark
  2025-06-09 15:32 ` [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors James Clark
  2025-06-30 11:40 ` [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements Mark Brown
  4 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-09 15:32 UTC (permalink / raw)
  To: Vladimir Oltean, Mark Brown
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, James Clark,
	Larisa Grigore

From: Larisa Grigore <larisa.grigore@nxp.com>

When the device is configured as a target, the host won't stop sending
data while we're draining the buffer which leads to FIFO underflows
and corruption.

Increase the DMA buffer size to the maximum words that edma can
transfer once to reduce the chance of this happening.

While we're here, also change the buffer size for host mode back to a
page as it was before commit a957499bd437 ("spi: spi-fsl-dspi: Fix
bits-per-word acceleration in DMA mode"). dma_alloc_noncoherent()
allocations are backed by a full page anyway, so we might as well use it
all.

Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 drivers/spi/spi-fsl-dspi.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 567632042f8f..e211e44e977f 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -252,6 +252,39 @@ static int dspi_dma_transfer_size(struct fsl_dspi *dspi)
 	return dspi->words_in_flight * DMA_SLAVE_BUSWIDTH_4_BYTES;
 }
 
+static int dspi_dma_bufsize(struct fsl_dspi *dspi)
+{
+	if (spi_controller_is_target(dspi->ctlr)) {
+		/*
+		 * In target mode we have to be ready to receive the maximum
+		 * that can possibly be transferred at once by EDMA without any
+		 * FIFO underflows. This is CITER * SSIZE, where SSIZE is a max
+		 * of 4 when transferring to a peripheral.
+		 */
+		return GENMASK(14, 0) * DMA_SLAVE_BUSWIDTH_4_BYTES;
+	}
+
+	return PAGE_SIZE;
+}
+
+static int dspi_dma_max_datawords(struct fsl_dspi *dspi)
+{
+	/*
+	 * Transfers look like this so we always use a full DMA word regardless
+	 * of SPI word size:
+	 *
+	 * 31              16 15                   0
+	 * -----------------------------------------
+	 * |   CONTROL WORD  |     16-bit DATA     |
+	 * -----------------------------------------
+	 * or
+	 * -----------------------------------------
+	 * |   CONTROL WORD  | UNUSED | 8-bit DATA |
+	 * -----------------------------------------
+	 */
+	return dspi_dma_bufsize(dspi) / DMA_SLAVE_BUSWIDTH_4_BYTES;
+}
+
 static void dspi_native_host_to_dev(struct fsl_dspi *dspi, u32 *txdata)
 {
 	switch (dspi->oper_word_size) {
@@ -474,6 +507,7 @@ static void dspi_setup_accel(struct fsl_dspi *dspi);
 static int dspi_dma_xfer(struct fsl_dspi *dspi)
 {
 	struct spi_message *message = dspi->cur_msg;
+	int max_words = dspi_dma_max_datawords(dspi);
 	struct device *dev = &dspi->pdev->dev;
 	int ret = 0;
 
@@ -486,8 +520,8 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
 		dspi_setup_accel(dspi);
 
 		dspi->words_in_flight = dspi->len / dspi->oper_word_size;
-		if (dspi->words_in_flight > dspi->devtype_data->fifo_size)
-			dspi->words_in_flight = dspi->devtype_data->fifo_size;
+		if (dspi->words_in_flight > max_words)
+			dspi->words_in_flight = max_words;
 
 		message->actual_length += dspi->words_in_flight *
 					  dspi->oper_word_size;
@@ -504,7 +538,7 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
 
 static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
 {
-	int dma_bufsize = dspi->devtype_data->fifo_size * 2;
+	int dma_bufsize = dspi_dma_bufsize(dspi);
 	struct device *dev = &dspi->pdev->dev;
 	struct dma_slave_config cfg;
 	struct fsl_dspi_dma *dma;
@@ -588,7 +622,7 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
 
 static void dspi_release_dma(struct fsl_dspi *dspi)
 {
-	int dma_bufsize = dspi->devtype_data->fifo_size * 2;
+	int dma_bufsize = dspi_dma_bufsize(dspi);
 	struct fsl_dspi_dma *dma = dspi->dma;
 
 	if (!dma)

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
  2025-06-09 15:32 [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements James Clark
                   ` (2 preceding siblings ...)
  2025-06-09 15:32 ` [PATCH 3/4] spi: spi-fsl-dspi: Increase DMA buffer size James Clark
@ 2025-06-09 15:32 ` James Clark
  2025-06-10 21:52   ` Vladimir Oltean
  2025-06-30 11:40 ` [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements Mark Brown
  4 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-09 15:32 UTC (permalink / raw)
  To: Vladimir Oltean, Mark Brown
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, James Clark

In target mode, the host sending more data than can be consumed would be
a common problem for any message exceeding the FIFO or DMA buffer size.
Cancel the whole message as soon as this condition is hit as the message
will be corrupted.

Only do this for target mode in a DMA transfer because we need to add a
register read. In IRQ and polling modes always do it because SPI_SR was
already read and it might catch some host mode programming/buffer
management errors too.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 drivers/spi/spi-fsl-dspi.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index e211e44e977f..75767d756496 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -228,6 +228,7 @@ struct fsl_dspi {
 	const struct fsl_dspi_devtype_data	*devtype_data;
 
 	struct completion			xfer_done;
+	int                                     xfer_status;
 
 	struct fsl_dspi_dma			*dma;
 
@@ -504,12 +505,22 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 
 static void dspi_setup_accel(struct fsl_dspi *dspi);
 
+static bool dspi_is_fifo_overflow(struct fsl_dspi *dspi, u32 spi_sr)
+{
+	if (spi_sr & (SPI_SR_TFUF | SPI_SR_RFOF)) {
+		dev_err(&dspi->pdev->dev, "FIFO under/overflow");
+		return true;
+	}
+	return false;
+}
+
 static int dspi_dma_xfer(struct fsl_dspi *dspi)
 {
 	struct spi_message *message = dspi->cur_msg;
 	int max_words = dspi_dma_max_datawords(dspi);
 	struct device *dev = &dspi->pdev->dev;
 	int ret = 0;
+	u32 spi_sr;
 
 	/*
 	 * dspi->len gets decremented by dspi_pop_tx_pushr in
@@ -531,6 +542,12 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
 			dev_err(dev, "DMA transfer failed\n");
 			break;
 		}
+
+		if (spi_controller_is_target(dspi->ctlr)) {
+			regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+			if (dspi_is_fifo_overflow(dspi, spi_sr))
+				return -EIO;
+		}
 	}
 
 	return ret;
@@ -918,6 +935,8 @@ static int dspi_poll(struct fsl_dspi *dspi)
 		regmap_read(dspi->regmap, SPI_SR, &spi_sr);
 		regmap_write(dspi->regmap, SPI_SR, spi_sr);
 
+		if (dspi_is_fifo_overflow(dspi, spi_sr))
+			return -EIO;
 		if (spi_sr & SPI_SR_CMDTCF)
 			break;
 	} while (--tries);
@@ -939,8 +958,12 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
 	if (!(spi_sr & SPI_SR_CMDTCF))
 		return IRQ_NONE;
 
-	if (dspi_rxtx(dspi) == 0)
+	if (dspi_is_fifo_overflow(dspi, spi_sr)) {
+		WRITE_ONCE(dspi->xfer_status, -EIO);
+		complete(&dspi->xfer_done);
+	} else if (dspi_rxtx(dspi) == 0) {
 		complete(&dspi->xfer_done);
+	}
 
 	return IRQ_HANDLED;
 }
@@ -1032,13 +1055,15 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 		if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
 			status = dspi_dma_xfer(dspi);
 		} else {
-			if (dspi->irq)
+			if (dspi->irq) {
+				WRITE_ONCE(dspi->xfer_status, 0);
 				reinit_completion(&dspi->xfer_done);
-
+			}
 			dspi_fifo_write(dspi);
 
 			if (dspi->irq) {
 				wait_for_completion(&dspi->xfer_done);
+				status = READ_ONCE(dspi->xfer_status);
 			} else {
 				do {
 					status = dspi_poll(dspi);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-09 15:32 ` [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA James Clark
@ 2025-06-10  8:26   ` Arnd Bergmann
  2025-06-10  9:03     ` James Clark
  2025-06-10 15:15   ` Frank Li
  1 sibling, 1 reply; 43+ messages in thread
From: Arnd Bergmann @ 2025-06-10  8:26 UTC (permalink / raw)
  To: James Clark, Vladimir Oltean, Mark Brown
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel

On Mon, Jun 9, 2025, at 17:32, James Clark wrote:
> Using coherent memory here isn't functionally necessary. Because the
> change to use non-coherent memory isn't overly complex and only a few
> synchronization points are required, we might as well do it while fixing
> up some other DMA issues.
>
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: James Clark <james.clark@linaro.org>

This version looks good to me,

Acked-by: Arnd Bergmann <arnd@arndb.de>

I had reviewed an internal version originally and had some comment
on that, all of which are addressed now. You did not Cc me on the
other patches, so I looked them up in the archive, Patch 3 also
looks good to me and complements this one (i.e. you really want
the combination). I did not understand the logic in patch 4,
and it would be good if someone else can take a closer look
at that in order to Ack that.

      Arnd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-10  8:26   ` Arnd Bergmann
@ 2025-06-10  9:03     ` James Clark
  0 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-10  9:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, Vladimir Oltean,
	Mark Brown



On 10/06/2025 9:26 am, Arnd Bergmann wrote:
> On Mon, Jun 9, 2025, at 17:32, James Clark wrote:
>> Using coherent memory here isn't functionally necessary. Because the
>> change to use non-coherent memory isn't overly complex and only a few
>> synchronization points are required, we might as well do it while fixing
>> up some other DMA issues.
>>
>> Suggested-by: Arnd Bergmann <arnd@arndb.de>
>> Signed-off-by: James Clark <james.clark@linaro.org>
> 
> This version looks good to me,
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> 
> I had reviewed an internal version originally and had some comment
> on that, all of which are addressed now. You did not Cc me on the
> other patches, so I looked them up in the archive, Patch 3 also

Yes sorry about that. I've just started using "b4 send" and I was under 
the impression that it would automatically CC all patches the same way, 
but apparently not. Maybe I'm holding it wrong.

> looks good to me and complements this one (i.e. you really want
> the combination). I did not understand the logic in patch 4,
> and it would be good if someone else can take a closer look
> at that in order to Ack that.
> 
>        Arnd


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  2025-06-09 15:32 ` [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer James Clark
@ 2025-06-10 11:34   ` Vladimir Oltean
  2025-06-10 15:41     ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-10 11:34 UTC (permalink / raw)
  To: James Clark; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel

On Mon, Jun 09, 2025 at 04:32:38PM +0100, James Clark wrote:
> In target mode, extra interrupts can be received between the end of a
> transfer and halting the module if the host continues sending more data.

Presumably you mean not just any extra interrupts can be received, but
specifically CMDTCF, since that triggers the complete(&dspi->xfer_done)
call. Other interrupt sources are masked in XSPI mode and should be
irrelevant.

> If the interrupt from this occurs after the reinit_completion() then the
> completion counter is left at a non-zero value. The next unrelated
> transfer initiated by userspace will then complete immediately without
> waiting for the interrupt or writing to the RX buffer.
> 
> Fix it by resetting the counter before the transfer so that lingering
> values are cleared. This is done after clearing the FIFOs and the
> status register but before the transfer is initiated, so no interrupts
> should be received at this point resulting in other race conditions.

Sorry, I don't have a lot of experience with the target mode, and when I
introduced the XSPI FIFO mode, I didn't take target mode into consideration.

The question is, does the module support XSPI FIFO writes in target
mode? In the LS1028A reference manual, I see PUSHR_SLAVE has the upper
16 bits (for the command) hidden, specifically there is no CTAS field
there that would point to one of the CTARE0/CTARE1 registers.
Cross-checking with the S32G3 RM, I see nothing fundamentally different.

I am surprised, given this fact, that the CMDTCF interrupt would fire at
all in target mode.

> 
> Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")

To be clear, if you ran 'git bisect' to track down this issue, it
wouldn't have pointed you to this commit, would it?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-09 15:32 ` [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA James Clark
  2025-06-10  8:26   ` Arnd Bergmann
@ 2025-06-10 15:15   ` Frank Li
  2025-06-10 15:46     ` James Clark
  2025-06-10 15:48     ` Arnd Bergmann
  1 sibling, 2 replies; 43+ messages in thread
From: Frank Li @ 2025-06-10 15:15 UTC (permalink / raw)
  To: James Clark
  Cc: Vladimir Oltean, Mark Brown, Vladimir Oltean, linux-spi, imx,
	linux-kernel, Arnd Bergmann

On Mon, Jun 09, 2025 at 04:32:39PM +0100, James Clark wrote:
> Using coherent memory here isn't functionally necessary.
> Because the
> change to use non-coherent memory isn't overly complex and only a few
> synchronization points are required, we might as well do it while fixing
> up some other DMA issues.

Any beanfit by use on-coherent memory here?

Frank

>
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  drivers/spi/spi-fsl-dspi.c | 55 +++++++++++++++++++++++++++++-----------------
>  1 file changed, 35 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
> index 386a17871e79..567632042f8f 100644
> --- a/drivers/spi/spi-fsl-dspi.c
> +++ b/drivers/spi/spi-fsl-dspi.c
> @@ -247,6 +247,11 @@ struct fsl_dspi {
>  	void (*dev_to_host)(struct fsl_dspi *dspi, u32 rxdata);
>  };
>
> +static int dspi_dma_transfer_size(struct fsl_dspi *dspi)
> +{
> +	return dspi->words_in_flight * DMA_SLAVE_BUSWIDTH_4_BYTES;
> +}
> +
>  static void dspi_native_host_to_dev(struct fsl_dspi *dspi, u32 *txdata)
>  {
>  	switch (dspi->oper_word_size) {
> @@ -361,7 +366,10 @@ static void dspi_tx_dma_callback(void *arg)
>  {
>  	struct fsl_dspi *dspi = arg;
>  	struct fsl_dspi_dma *dma = dspi->dma;
> +	struct device *dev = &dspi->pdev->dev;
>
> +	dma_sync_single_for_cpu(dev, dma->tx_dma_phys,
> +				dspi_dma_transfer_size(dspi), DMA_TO_DEVICE);
>  	complete(&dma->cmd_tx_complete);
>  }
>
> @@ -369,9 +377,13 @@ static void dspi_rx_dma_callback(void *arg)
>  {
>  	struct fsl_dspi *dspi = arg;
>  	struct fsl_dspi_dma *dma = dspi->dma;
> +	struct device *dev = &dspi->pdev->dev;
>  	int i;
>
>  	if (dspi->rx) {
> +		dma_sync_single_for_cpu(dev, dma->rx_dma_phys,
> +					dspi_dma_transfer_size(dspi),
> +					DMA_FROM_DEVICE);
>  		for (i = 0; i < dspi->words_in_flight; i++)
>  			dspi_push_rx(dspi, dspi->dma->rx_dma_buf[i]);
>  	}
> @@ -381,6 +393,7 @@ static void dspi_rx_dma_callback(void *arg)
>
>  static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>  {
> +	size_t size = dspi_dma_transfer_size(dspi);
>  	struct device *dev = &dspi->pdev->dev;
>  	struct fsl_dspi_dma *dma = dspi->dma;
>  	int time_left;
> @@ -389,10 +402,9 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>  	for (i = 0; i < dspi->words_in_flight; i++)
>  		dspi->dma->tx_dma_buf[i] = dspi_pop_tx_pushr(dspi);
>
> +	dma_sync_single_for_device(dev, dma->tx_dma_phys, size, DMA_TO_DEVICE);
>  	dma->tx_desc = dmaengine_prep_slave_single(dma->chan_tx,
> -					dma->tx_dma_phys,
> -					dspi->words_in_flight *
> -					DMA_SLAVE_BUSWIDTH_4_BYTES,
> +					dma->tx_dma_phys, size,
>  					DMA_MEM_TO_DEV,
>  					DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>  	if (!dma->tx_desc) {
> @@ -407,10 +419,10 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>  		return -EINVAL;
>  	}
>
> +	dma_sync_single_for_device(dev, dma->rx_dma_phys, size,
> +				   DMA_FROM_DEVICE);
>  	dma->rx_desc = dmaengine_prep_slave_single(dma->chan_rx,
> -					dma->rx_dma_phys,
> -					dspi->words_in_flight *
> -					DMA_SLAVE_BUSWIDTH_4_BYTES,
> +					dma->rx_dma_phys, size,
>  					DMA_DEV_TO_MEM,
>  					DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>  	if (!dma->rx_desc) {
> @@ -512,17 +524,17 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
>  		goto err_tx_channel;
>  	}
>
> -	dma->tx_dma_buf = dma_alloc_coherent(dma->chan_tx->device->dev,
> -					     dma_bufsize, &dma->tx_dma_phys,
> -					     GFP_KERNEL);
> +	dma->tx_dma_buf = dma_alloc_noncoherent(dma->chan_tx->device->dev,
> +						dma_bufsize, &dma->tx_dma_phys,
> +						DMA_TO_DEVICE, GFP_KERNEL);
>  	if (!dma->tx_dma_buf) {
>  		ret = -ENOMEM;
>  		goto err_tx_dma_buf;
>  	}
>
> -	dma->rx_dma_buf = dma_alloc_coherent(dma->chan_rx->device->dev,
> -					     dma_bufsize, &dma->rx_dma_phys,
> -					     GFP_KERNEL);
> +	dma->rx_dma_buf = dma_alloc_noncoherent(dma->chan_rx->device->dev,
> +						dma_bufsize, &dma->rx_dma_phys,
> +						DMA_FROM_DEVICE, GFP_KERNEL);
>  	if (!dma->rx_dma_buf) {
>  		ret = -ENOMEM;
>  		goto err_rx_dma_buf;
> @@ -557,11 +569,12 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
>  	return 0;
>
>  err_slave_config:
> -	dma_free_coherent(dma->chan_rx->device->dev,
> -			  dma_bufsize, dma->rx_dma_buf, dma->rx_dma_phys);
> +	dma_free_noncoherent(dma->chan_rx->device->dev, dma_bufsize,
> +			     dma->rx_dma_buf, dma->rx_dma_phys,
> +			     DMA_FROM_DEVICE);
>  err_rx_dma_buf:
> -	dma_free_coherent(dma->chan_tx->device->dev,
> -			  dma_bufsize, dma->tx_dma_buf, dma->tx_dma_phys);
> +	dma_free_noncoherent(dma->chan_tx->device->dev, dma_bufsize,
> +			     dma->tx_dma_buf, dma->tx_dma_phys, DMA_TO_DEVICE);
>  err_tx_dma_buf:
>  	dma_release_channel(dma->chan_tx);
>  err_tx_channel:
> @@ -582,14 +595,16 @@ static void dspi_release_dma(struct fsl_dspi *dspi)
>  		return;
>
>  	if (dma->chan_tx) {
> -		dma_free_coherent(dma->chan_tx->device->dev, dma_bufsize,
> -				  dma->tx_dma_buf, dma->tx_dma_phys);
> +		dma_free_noncoherent(dma->chan_tx->device->dev, dma_bufsize,
> +				     dma->tx_dma_buf, dma->tx_dma_phys,
> +				     DMA_TO_DEVICE);
>  		dma_release_channel(dma->chan_tx);
>  	}
>
>  	if (dma->chan_rx) {
> -		dma_free_coherent(dma->chan_rx->device->dev, dma_bufsize,
> -				  dma->rx_dma_buf, dma->rx_dma_phys);
> +		dma_free_noncoherent(dma->chan_rx->device->dev, dma_bufsize,
> +				     dma->rx_dma_buf, dma->rx_dma_phys,
> +				     DMA_FROM_DEVICE);
>  		dma_release_channel(dma->chan_rx);
>  	}
>  }
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  2025-06-10 11:34   ` Vladimir Oltean
@ 2025-06-10 15:41     ` James Clark
  2025-06-10 21:01       ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-10 15:41 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel



On 10/06/2025 12:34 pm, Vladimir Oltean wrote:
> On Mon, Jun 09, 2025 at 04:32:38PM +0100, James Clark wrote:
>> In target mode, extra interrupts can be received between the end of a
>> transfer and halting the module if the host continues sending more data.
> 
> Presumably you mean not just any extra interrupts can be received, but
> specifically CMDTCF, since that triggers the complete(&dspi->xfer_done)
> call. Other interrupt sources are masked in XSPI mode and should be
> irrelevant.
> 

Yes complete(&dspi->xfer_done) is called so CMDTCF is set. For example 
in one case of underflow I get SPI_SR = 0xca8b0450, which is these flags:

   TCF, TXRXS, TFUF, TFFF, CMDTCF, RFOF, RFDF, CMDFFF

Compared to a successful transfer I get 0xc2830330:

   TCF, TXRXS,       TFFF, CMDTCF,       RFDF, CMDFFF

>> If the interrupt from this occurs after the reinit_completion() then the
>> completion counter is left at a non-zero value. The next unrelated
>> transfer initiated by userspace will then complete immediately without
>> waiting for the interrupt or writing to the RX buffer.
>>
>> Fix it by resetting the counter before the transfer so that lingering
>> values are cleared. This is done after clearing the FIFOs and the
>> status register but before the transfer is initiated, so no interrupts
>> should be received at this point resulting in other race conditions.
> 
> Sorry, I don't have a lot of experience with the target mode, and when I
> introduced the XSPI FIFO mode, I didn't take target mode into consideration.
> 
> The question is, does the module support XSPI FIFO writes in target
> mode? In the LS1028A reference manual, I see PUSHR_SLAVE has the upper
> 16 bits (for the command) hidden, specifically there is no CTAS field
> there that would point to one of the CTARE0/CTARE1 registers.
> Cross-checking with the S32G3 RM, I see nothing fundamentally different.
> 
> I am surprised, given this fact, that the CMDTCF interrupt would fire at
> all in target mode.
> 


It's working in my testing where I've forced it to XSPI mode instead of 
DMA mode on S32G3. I assume the command is blank because in target mode 
CTAR0 (aka CTAR0_SLAVE) is always used regardless of the frame.

CTARE0 isn't explicitly relabeled like CTAR0, but this paragraph states 
that CTARE0 is used:

   50.4.3.2 Slave mode

   ... The SPI Slave mode transfer attributes are configured in the CTAR0
   and CTARE0 registers ...

Any transfers smaller than the FIFO are working in interrupt mode, 
although larger ones are problematic because there isn't enough time to 
reload the FIFOs while the host is still sending (hence the error I 
added in patch 4).

Polling mode isn't working at all because it has a timeout which gets 
hit and returns -ETIMEDOUT before the host sends anything. Although I 
added the check there for consistency and for catching host mode errors.

>>
>> Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
> 
> To be clear, if you ran 'git bisect' to track down this issue, it
> wouldn't have pointed you to this commit, would it?

I didn't test it no, but I did assume that the wake_up_interruptible() 
that got replaced wasn't vulnerable to this same issue. Because the 
spurious wake_up_interruptible() would be "lost", and a fresh one from 
the next transfer would have been required to proceed past the 
wait_event_interruptible().

Whereas wait_for_completion() is just a counter so it has the memory 
problem explained in the commit message.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-10 15:15   ` Frank Li
@ 2025-06-10 15:46     ` James Clark
  2025-06-13 15:56       ` David Laight
  2025-06-10 15:48     ` Arnd Bergmann
  1 sibling, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-10 15:46 UTC (permalink / raw)
  To: Frank Li
  Cc: Vladimir Oltean, Mark Brown, Vladimir Oltean, linux-spi, imx,
	linux-kernel, Arnd Bergmann



On 10/06/2025 4:15 pm, Frank Li wrote:
> On Mon, Jun 09, 2025 at 04:32:39PM +0100, James Clark wrote:
>> Using coherent memory here isn't functionally necessary.
>> Because the
>> change to use non-coherent memory isn't overly complex and only a few
>> synchronization points are required, we might as well do it while fixing
>> up some other DMA issues.
> 
> Any beanfit by use on-coherent memory here?
> 
> Frank
> 

Presumably less cache maintenance traffic?

Thanks
James

>>
>> Suggested-by: Arnd Bergmann <arnd@arndb.de>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   drivers/spi/spi-fsl-dspi.c | 55 +++++++++++++++++++++++++++++-----------------
>>   1 file changed, 35 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
>> index 386a17871e79..567632042f8f 100644
>> --- a/drivers/spi/spi-fsl-dspi.c
>> +++ b/drivers/spi/spi-fsl-dspi.c
>> @@ -247,6 +247,11 @@ struct fsl_dspi {
>>   	void (*dev_to_host)(struct fsl_dspi *dspi, u32 rxdata);
>>   };
>>
>> +static int dspi_dma_transfer_size(struct fsl_dspi *dspi)
>> +{
>> +	return dspi->words_in_flight * DMA_SLAVE_BUSWIDTH_4_BYTES;
>> +}
>> +
>>   static void dspi_native_host_to_dev(struct fsl_dspi *dspi, u32 *txdata)
>>   {
>>   	switch (dspi->oper_word_size) {
>> @@ -361,7 +366,10 @@ static void dspi_tx_dma_callback(void *arg)
>>   {
>>   	struct fsl_dspi *dspi = arg;
>>   	struct fsl_dspi_dma *dma = dspi->dma;
>> +	struct device *dev = &dspi->pdev->dev;
>>
>> +	dma_sync_single_for_cpu(dev, dma->tx_dma_phys,
>> +				dspi_dma_transfer_size(dspi), DMA_TO_DEVICE);
>>   	complete(&dma->cmd_tx_complete);
>>   }
>>
>> @@ -369,9 +377,13 @@ static void dspi_rx_dma_callback(void *arg)
>>   {
>>   	struct fsl_dspi *dspi = arg;
>>   	struct fsl_dspi_dma *dma = dspi->dma;
>> +	struct device *dev = &dspi->pdev->dev;
>>   	int i;
>>
>>   	if (dspi->rx) {
>> +		dma_sync_single_for_cpu(dev, dma->rx_dma_phys,
>> +					dspi_dma_transfer_size(dspi),
>> +					DMA_FROM_DEVICE);
>>   		for (i = 0; i < dspi->words_in_flight; i++)
>>   			dspi_push_rx(dspi, dspi->dma->rx_dma_buf[i]);
>>   	}
>> @@ -381,6 +393,7 @@ static void dspi_rx_dma_callback(void *arg)
>>
>>   static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>>   {
>> +	size_t size = dspi_dma_transfer_size(dspi);
>>   	struct device *dev = &dspi->pdev->dev;
>>   	struct fsl_dspi_dma *dma = dspi->dma;
>>   	int time_left;
>> @@ -389,10 +402,9 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>>   	for (i = 0; i < dspi->words_in_flight; i++)
>>   		dspi->dma->tx_dma_buf[i] = dspi_pop_tx_pushr(dspi);
>>
>> +	dma_sync_single_for_device(dev, dma->tx_dma_phys, size, DMA_TO_DEVICE);
>>   	dma->tx_desc = dmaengine_prep_slave_single(dma->chan_tx,
>> -					dma->tx_dma_phys,
>> -					dspi->words_in_flight *
>> -					DMA_SLAVE_BUSWIDTH_4_BYTES,
>> +					dma->tx_dma_phys, size,
>>   					DMA_MEM_TO_DEV,
>>   					DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>>   	if (!dma->tx_desc) {
>> @@ -407,10 +419,10 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>>   		return -EINVAL;
>>   	}
>>
>> +	dma_sync_single_for_device(dev, dma->rx_dma_phys, size,
>> +				   DMA_FROM_DEVICE);
>>   	dma->rx_desc = dmaengine_prep_slave_single(dma->chan_rx,
>> -					dma->rx_dma_phys,
>> -					dspi->words_in_flight *
>> -					DMA_SLAVE_BUSWIDTH_4_BYTES,
>> +					dma->rx_dma_phys, size,
>>   					DMA_DEV_TO_MEM,
>>   					DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
>>   	if (!dma->rx_desc) {
>> @@ -512,17 +524,17 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
>>   		goto err_tx_channel;
>>   	}
>>
>> -	dma->tx_dma_buf = dma_alloc_coherent(dma->chan_tx->device->dev,
>> -					     dma_bufsize, &dma->tx_dma_phys,
>> -					     GFP_KERNEL);
>> +	dma->tx_dma_buf = dma_alloc_noncoherent(dma->chan_tx->device->dev,
>> +						dma_bufsize, &dma->tx_dma_phys,
>> +						DMA_TO_DEVICE, GFP_KERNEL);
>>   	if (!dma->tx_dma_buf) {
>>   		ret = -ENOMEM;
>>   		goto err_tx_dma_buf;
>>   	}
>>
>> -	dma->rx_dma_buf = dma_alloc_coherent(dma->chan_rx->device->dev,
>> -					     dma_bufsize, &dma->rx_dma_phys,
>> -					     GFP_KERNEL);
>> +	dma->rx_dma_buf = dma_alloc_noncoherent(dma->chan_rx->device->dev,
>> +						dma_bufsize, &dma->rx_dma_phys,
>> +						DMA_FROM_DEVICE, GFP_KERNEL);
>>   	if (!dma->rx_dma_buf) {
>>   		ret = -ENOMEM;
>>   		goto err_rx_dma_buf;
>> @@ -557,11 +569,12 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
>>   	return 0;
>>
>>   err_slave_config:
>> -	dma_free_coherent(dma->chan_rx->device->dev,
>> -			  dma_bufsize, dma->rx_dma_buf, dma->rx_dma_phys);
>> +	dma_free_noncoherent(dma->chan_rx->device->dev, dma_bufsize,
>> +			     dma->rx_dma_buf, dma->rx_dma_phys,
>> +			     DMA_FROM_DEVICE);
>>   err_rx_dma_buf:
>> -	dma_free_coherent(dma->chan_tx->device->dev,
>> -			  dma_bufsize, dma->tx_dma_buf, dma->tx_dma_phys);
>> +	dma_free_noncoherent(dma->chan_tx->device->dev, dma_bufsize,
>> +			     dma->tx_dma_buf, dma->tx_dma_phys, DMA_TO_DEVICE);
>>   err_tx_dma_buf:
>>   	dma_release_channel(dma->chan_tx);
>>   err_tx_channel:
>> @@ -582,14 +595,16 @@ static void dspi_release_dma(struct fsl_dspi *dspi)
>>   		return;
>>
>>   	if (dma->chan_tx) {
>> -		dma_free_coherent(dma->chan_tx->device->dev, dma_bufsize,
>> -				  dma->tx_dma_buf, dma->tx_dma_phys);
>> +		dma_free_noncoherent(dma->chan_tx->device->dev, dma_bufsize,
>> +				     dma->tx_dma_buf, dma->tx_dma_phys,
>> +				     DMA_TO_DEVICE);
>>   		dma_release_channel(dma->chan_tx);
>>   	}
>>
>>   	if (dma->chan_rx) {
>> -		dma_free_coherent(dma->chan_rx->device->dev, dma_bufsize,
>> -				  dma->rx_dma_buf, dma->rx_dma_phys);
>> +		dma_free_noncoherent(dma->chan_rx->device->dev, dma_bufsize,
>> +				     dma->rx_dma_buf, dma->rx_dma_phys,
>> +				     DMA_FROM_DEVICE);
>>   		dma_release_channel(dma->chan_rx);
>>   	}
>>   }
>>
>> --
>> 2.34.1
>>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-10 15:15   ` Frank Li
  2025-06-10 15:46     ` James Clark
@ 2025-06-10 15:48     ` Arnd Bergmann
  2025-06-10 15:56       ` Frank Li
  1 sibling, 1 reply; 43+ messages in thread
From: Arnd Bergmann @ 2025-06-10 15:48 UTC (permalink / raw)
  To: Frank Li, James Clark
  Cc: Vladimir Oltean, Mark Brown, Vladimir Oltean, linux-spi, imx,
	linux-kernel

On Tue, Jun 10, 2025, at 17:15, Frank Li wrote:
> On Mon, Jun 09, 2025 at 04:32:39PM +0100, James Clark wrote:
>> Using coherent memory here isn't functionally necessary.
>> Because the
>> change to use non-coherent memory isn't overly complex and only a few
>> synchronization points are required, we might as well do it while fixing
>> up some other DMA issues.
>
> Any beanfit by use on-coherent memory here?

The driver copies data in and out of a coherent buffer by default. This is
fine if the buffer is only a few bytes in size, but for large transfers
this is quite slow because this bypasses the cache for any DMA master
that is marked as not "dma-coherent" in devicetree.

Patch 3/4 changes the size from a few bytes to many pages of memory,
so it's access the buffer in cache first and manually maintain
coherency.

     Arnd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-10 15:48     ` Arnd Bergmann
@ 2025-06-10 15:56       ` Frank Li
  2025-06-11  9:01         ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Frank Li @ 2025-06-10 15:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: James Clark, Vladimir Oltean, Mark Brown, Vladimir Oltean,
	linux-spi, imx, linux-kernel

On Tue, Jun 10, 2025 at 05:48:40PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 10, 2025, at 17:15, Frank Li wrote:
> > On Mon, Jun 09, 2025 at 04:32:39PM +0100, James Clark wrote:
> >> Using coherent memory here isn't functionally necessary.
> >> Because the
> >> change to use non-coherent memory isn't overly complex and only a few
> >> synchronization points are required, we might as well do it while fixing
> >> up some other DMA issues.
> >
> > Any beanfit by use on-coherent memory here?
>
> The driver copies data in and out of a coherent buffer by default. This is
> fine if the buffer is only a few bytes in size, but for large transfers
> this is quite slow because this bypasses the cache for any DMA master
> that is marked as not "dma-coherent" in devicetree.

I see, non-coherent memory use cachable normal memory page. but coherent
use non-cachable normal memory page.

Can you add performance beneafit information after use non-coherent memory
in commit message to let reviewer easily know your intention.

Frank

>
> Patch 3/4 changes the size from a few bytes to many pages of memory,
> so it's access the buffer in cache first and manually maintain
> coherency.
>
>      Arnd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  2025-06-10 15:41     ` James Clark
@ 2025-06-10 21:01       ` Vladimir Oltean
  2025-06-10 21:31         ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-10 21:01 UTC (permalink / raw)
  To: James Clark; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel

On Tue, Jun 10, 2025 at 04:41:04PM +0100, James Clark wrote:
> On 10/06/2025 12:34 pm, Vladimir Oltean wrote:
> > On Mon, Jun 09, 2025 at 04:32:38PM +0100, James Clark wrote:
> > > In target mode, extra interrupts can be received between the end of a
> > > transfer and halting the module if the host continues sending more data.
> > 
> > Presumably you mean not just any extra interrupts can be received, but
> > specifically CMDTCF, since that triggers the complete(&dspi->xfer_done)
> > call. Other interrupt sources are masked in XSPI mode and should be
> > irrelevant.
> 
> Yes complete(&dspi->xfer_done) is called so CMDTCF is set. For example in
> one case of underflow I get SPI_SR = 0xca8b0450, which is these flags:
> 
>   TCF, TXRXS, TFUF, TFFF, CMDTCF, RFOF, RFDF, CMDFFF
> 
> Compared to a successful transfer I get 0xc2830330:
> 
>   TCF, TXRXS,       TFFF, CMDTCF,       RFDF, CMDFFF

Ok, so my new question would be: if CMDTCF is set, presumably it means a
command was transferred. What command was transferred, and who put data
in the FIFO for it?

Because the answer to the above is AFAIU "no one", I guess the driver
should ignore CMDTCF when TFUF (TX FIFO underflow) is set; I consider
that to be the logic bug. You are also doing that in patch 4/4, except
you still call complete() for some reason. If you don't call complete(),
there is no reason to fend against spurious completions.

I think I would prefer seeing more deliberate decisions in the driver,
it helps if things don't just work by coincidence.

> > > If the interrupt from this occurs after the reinit_completion() then the
> > > completion counter is left at a non-zero value. The next unrelated
> > > transfer initiated by userspace will then complete immediately without
> > > waiting for the interrupt or writing to the RX buffer.
> > > 
> > > Fix it by resetting the counter before the transfer so that lingering
> > > values are cleared. This is done after clearing the FIFOs and the
> > > status register but before the transfer is initiated, so no interrupts
> > > should be received at this point resulting in other race conditions.
> > 
> > Sorry, I don't have a lot of experience with the target mode, and when I
> > introduced the XSPI FIFO mode, I didn't take target mode into consideration.
> > 
> > The question is, does the module support XSPI FIFO writes in target
> > mode? In the LS1028A reference manual, I see PUSHR_SLAVE has the upper
> > 16 bits (for the command) hidden, specifically there is no CTAS field
> > there that would point to one of the CTARE0/CTARE1 registers.
> > Cross-checking with the S32G3 RM, I see nothing fundamentally different.
> > 
> > I am surprised, given this fact, that the CMDTCF interrupt would fire at
> > all in target mode.
> 
> It's working in my testing where I've forced it to XSPI mode instead of DMA
> mode on S32G3. I assume the command is blank because in target mode CTAR0
> (aka CTAR0_SLAVE) is always used regardless of the frame.
> 
> CTARE0 isn't explicitly relabeled like CTAR0, but this paragraph states that
> CTARE0 is used:
> 
>   50.4.3.2 Slave mode
> 
>   ... The SPI Slave mode transfer attributes are configured in the CTAR0
>   and CTARE0 registers ...

That's an interesting piece of data which I wasn't aware of, thanks.

> Any transfers smaller than the FIFO are working in interrupt mode, although
> larger ones are problematic because there isn't enough time to reload the
> FIFOs while the host is still sending (hence the error I added in patch 4).
> 
> Polling mode isn't working at all because it has a timeout which gets hit
> and returns -ETIMEDOUT before the host sends anything. Although I added the
> check there for consistency and for catching host mode errors.
> 
> > > 
> > > Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
> > 
> > To be clear, if you ran 'git bisect' to track down this issue, it
> > wouldn't have pointed you to this commit, would it?
> 
> I didn't test it no, but I did assume that the wake_up_interruptible() that
> got replaced wasn't vulnerable to this same issue. Because the spurious
> wake_up_interruptible() would be "lost", and a fresh one from the next
> transfer would have been required to proceed past the
> wait_event_interruptible().
> 
> Whereas wait_for_completion() is just a counter so it has the memory problem
> explained in the commit message.

Why would a spurious wake_up_interruptible() be lost? Is it because of
the dspi->waitflags condition not becoming 1? It would also become 1...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  2025-06-10 21:01       ` Vladimir Oltean
@ 2025-06-10 21:31         ` Vladimir Oltean
  2025-06-11  9:12           ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-10 21:31 UTC (permalink / raw)
  To: James Clark; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel

On Wed, Jun 11, 2025 at 12:01:47AM +0300, Vladimir Oltean wrote:
> On Tue, Jun 10, 2025 at 04:41:04PM +0100, James Clark wrote:
> > On 10/06/2025 12:34 pm, Vladimir Oltean wrote:
> > > On Mon, Jun 09, 2025 at 04:32:38PM +0100, James Clark wrote:
> > > > In target mode, extra interrupts can be received between the end of a
> > > > transfer and halting the module if the host continues sending more data.
> > > 
> > > Presumably you mean not just any extra interrupts can be received, but
> > > specifically CMDTCF, since that triggers the complete(&dspi->xfer_done)
> > > call. Other interrupt sources are masked in XSPI mode and should be
> > > irrelevant.
> > 
> > Yes complete(&dspi->xfer_done) is called so CMDTCF is set. For example in
> > one case of underflow I get SPI_SR = 0xca8b0450, which is these flags:
> > 
> >   TCF, TXRXS, TFUF, TFFF, CMDTCF, RFOF, RFDF, CMDFFF
> > 
> > Compared to a successful transfer I get 0xc2830330:
> > 
> >   TCF, TXRXS,       TFFF, CMDTCF,       RFDF, CMDFFF
> 
> Ok, so my new question would be: if CMDTCF is set, presumably it means a
> command was transferred. What command was transferred, and who put data
> in the FIFO for it?
> 
> Because the answer to the above is AFAIU "no one", I guess the driver
> should ignore CMDTCF when TFUF (TX FIFO underflow) is set; I consider
> that to be the logic bug. You are also doing that in patch 4/4, except
> you still call complete() for some reason. If you don't call complete(),
> there is no reason to fend against spurious completions.
> 
> I think I would prefer seeing more deliberate decisions in the driver,
> it helps if things don't just work by coincidence.

After thinking some more, I think I agree with your decision.

If there's a TX FIFO underflow in target mode, presumably there are 2
cases to handle.

1. The underflow occurred in the middle of a large-ish SPI message
   prepared by the driver, where the driver couldn't refill the TX FIFO
   fast enough in dspi_interrupt().

2. The underflow occurred because the driver had absolutely no SPI
   message prepared, and yet the host wanted something.

What changed my mind is that if you don't call complete() on SPI_SR_TFUF
(like I suggested), then case #1 above will hang. Your proposal is to
call complete() anyway, but to discard any previous completions,
associated with case #2, when there's a new message to prepare.

But I would like you to introduce a comment above the earlier
reinit_completion() explaining why it is there.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
  2025-06-09 15:32 ` [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors James Clark
@ 2025-06-10 21:52   ` Vladimir Oltean
  2025-06-11 14:40     ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-10 21:52 UTC (permalink / raw)
  To: James Clark; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel

On Mon, Jun 09, 2025 at 04:32:41PM +0100, James Clark wrote:
> In target mode, the host sending more data than can be consumed would be
> a common problem for any message exceeding the FIFO or DMA buffer size.
> Cancel the whole message as soon as this condition is hit as the message
> will be corrupted.
> 
> Only do this for target mode in a DMA transfer because we need to add a
> register read. In IRQ and polling modes always do it because SPI_SR was
> already read and it might catch some host mode programming/buffer
> management errors too.
> 
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
>  drivers/spi/spi-fsl-dspi.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
> index e211e44e977f..75767d756496 100644
> --- a/drivers/spi/spi-fsl-dspi.c
> +++ b/drivers/spi/spi-fsl-dspi.c
> @@ -228,6 +228,7 @@ struct fsl_dspi {
>  	const struct fsl_dspi_devtype_data	*devtype_data;
>  
>  	struct completion			xfer_done;
> +	int                                     xfer_status;

This is certainly simple, and simple is not bad.

But based on the fact that you care about the xfer_status only when
there's an associated dspi->cur_msg, have you considered to update
dspi->cur_msg->status directly?

You'd need to reset dspi->cur_msg to NULL at the end of
dspi_transfer_one_message(), and then check for NULL when you update
the transfer status.

>  
>  	struct fsl_dspi_dma			*dma;
>  
> @@ -504,12 +505,22 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>  
>  static void dspi_setup_accel(struct fsl_dspi *dspi);
>  
> +static bool dspi_is_fifo_overflow(struct fsl_dspi *dspi, u32 spi_sr)

Can you name this some way else, like dspi_fifo_error()? It's strange
for a reader for this to return true on an underflow.

> +{
> +	if (spi_sr & (SPI_SR_TFUF | SPI_SR_RFOF)) {
> +		dev_err(&dspi->pdev->dev, "FIFO under/overflow");

Missing \n.

And you should use dev_err_ratelimited(), as you don't want an external
entity, when in target mode, to DoS you.

Also, could there be individual error messages for TFUF and for RFOF?
If you are concerned about the penalty for the error-free case, make the
check two-level. First for all errors, then for individual errors.

> +		return true;
> +	}
> +	return false;
> +}
> +
>  static int dspi_dma_xfer(struct fsl_dspi *dspi)
>  {
>  	struct spi_message *message = dspi->cur_msg;
>  	int max_words = dspi_dma_max_datawords(dspi);
>  	struct device *dev = &dspi->pdev->dev;
>  	int ret = 0;
> +	u32 spi_sr;
>  
>  	/*
>  	 * dspi->len gets decremented by dspi_pop_tx_pushr in
> @@ -531,6 +542,12 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
>  			dev_err(dev, "DMA transfer failed\n");
>  			break;
>  		}
> +
> +		if (spi_controller_is_target(dspi->ctlr)) {
> +			regmap_read(dspi->regmap, SPI_SR, &spi_sr);
> +			if (dspi_is_fifo_overflow(dspi, spi_sr))
> +				return -EIO;
> +		}
>  	}

Can this go within this block from dspi_next_xfer_dma_submit() instead?

	if (spi_controller_is_target(dspi->ctlr)) {
		wait_for_completion_interruptible(&dspi->dma->cmd_rx_complete);
		// here
		return 0;
	}

>  
>  	return ret;
> @@ -918,6 +935,8 @@ static int dspi_poll(struct fsl_dspi *dspi)
>  		regmap_read(dspi->regmap, SPI_SR, &spi_sr);
>  		regmap_write(dspi->regmap, SPI_SR, spi_sr);
>  
> +		if (dspi_is_fifo_overflow(dspi, spi_sr))
> +			return -EIO;
>  		if (spi_sr & SPI_SR_CMDTCF)
>  			break;
>  	} while (--tries);
> @@ -939,8 +958,12 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
>  	if (!(spi_sr & SPI_SR_CMDTCF))
>  		return IRQ_NONE;
>  
> -	if (dspi_rxtx(dspi) == 0)
> +	if (dspi_is_fifo_overflow(dspi, spi_sr)) {
> +		WRITE_ONCE(dspi->xfer_status, -EIO);
> +		complete(&dspi->xfer_done);
> +	} else if (dspi_rxtx(dspi) == 0) {
>  		complete(&dspi->xfer_done);
> +	}
>  
>  	return IRQ_HANDLED;
>  }
> @@ -1032,13 +1055,15 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
>  		if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
>  			status = dspi_dma_xfer(dspi);
>  		} else {
> -			if (dspi->irq)
> +			if (dspi->irq) {
> +				WRITE_ONCE(dspi->xfer_status, 0);
>  				reinit_completion(&dspi->xfer_done);
> -

Nitpick: The blank line was doing fine here.

> +			}
>  			dspi_fifo_write(dspi);
>  
>  			if (dspi->irq) {
>  				wait_for_completion(&dspi->xfer_done);
> +				status = READ_ONCE(dspi->xfer_status);
>  			} else {
>  				do {
>  					status = dspi_poll(dspi);
> 
> -- 
> 2.34.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-10 15:56       ` Frank Li
@ 2025-06-11  9:01         ` Vladimir Oltean
  2025-06-12 11:05           ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-11  9:01 UTC (permalink / raw)
  To: James Clark
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel

On Tue, Jun 10, 2025 at 11:56:34AM -0400, Frank Li wrote:
> Can you add performance beneafit information after use non-coherent memory
> in commit message to let reviewer easily know your intention.

To expand on that, you can post the output of something like this
(before and after):
$ spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 10000000
where /dev/spidev1.0 is an unconnected chip select with a dummy entry in
the device tree.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
  2025-06-10 21:31         ` Vladimir Oltean
@ 2025-06-11  9:12           ` James Clark
  0 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-11  9:12 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel



On 10/06/2025 10:31 pm, Vladimir Oltean wrote:
> On Wed, Jun 11, 2025 at 12:01:47AM +0300, Vladimir Oltean wrote:
>> On Tue, Jun 10, 2025 at 04:41:04PM +0100, James Clark wrote:
>>> On 10/06/2025 12:34 pm, Vladimir Oltean wrote:
>>>> On Mon, Jun 09, 2025 at 04:32:38PM +0100, James Clark wrote:
>>>>> In target mode, extra interrupts can be received between the end of a
>>>>> transfer and halting the module if the host continues sending more data.
>>>>
>>>> Presumably you mean not just any extra interrupts can be received, but
>>>> specifically CMDTCF, since that triggers the complete(&dspi->xfer_done)
>>>> call. Other interrupt sources are masked in XSPI mode and should be
>>>> irrelevant.
>>>
>>> Yes complete(&dspi->xfer_done) is called so CMDTCF is set. For example in
>>> one case of underflow I get SPI_SR = 0xca8b0450, which is these flags:
>>>
>>>    TCF, TXRXS, TFUF, TFFF, CMDTCF, RFOF, RFDF, CMDFFF
>>>
>>> Compared to a successful transfer I get 0xc2830330:
>>>
>>>    TCF, TXRXS,       TFFF, CMDTCF,       RFDF, CMDFFF
>>
>> Ok, so my new question would be: if CMDTCF is set, presumably it means a
>> command was transferred. What command was transferred, and who put data
>> in the FIFO for it?
>>
>> Because the answer to the above is AFAIU "no one", I guess the driver
>> should ignore CMDTCF when TFUF (TX FIFO underflow) is set; I consider
>> that to be the logic bug. You are also doing that in patch 4/4, except
>> you still call complete() for some reason. If you don't call complete(),
>> there is no reason to fend against spurious completions.
>>
>> I think I would prefer seeing more deliberate decisions in the driver,
>> it helps if things don't just work by coincidence.
> 
> After thinking some more, I think I agree with your decision.
> 
> If there's a TX FIFO underflow in target mode, presumably there are 2
> cases to handle.
> 
> 1. The underflow occurred in the middle of a large-ish SPI message
>     prepared by the driver, where the driver couldn't refill the TX FIFO
>     fast enough in dspi_interrupt().
> 
> 2. The underflow occurred because the driver had absolutely no SPI
>     message prepared, and yet the host wanted something.
> 
> What changed my mind is that if you don't call complete() on SPI_SR_TFUF
> (like I suggested), then case #1 above will hang. Your proposal is to
> call complete() anyway, but to discard any previous completions,
> associated with case #2, when there's a new message to prepare.
> 
> But I would like you to introduce a comment above the earlier
> reinit_completion() explaining why it is there.

Ok I can do that. I was going to say that a call to complete() would be 
required, sometimes you only get a single interrupt with both CMDTCF and 
TFUF set, rather than two. So it can't be ignored but it seems like 
we've come to the same conclusion.

I did try a few other approaches. One was disabling SPI in the interrupt 
handler on the first completion, but that didn't work because you need 
to wait for stop mode which might hang, and I was still struggling to 
stop the interrupt handler from firing again but I gave up before 
exploring it fully because it didn't feel right.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
  2025-06-10 21:52   ` Vladimir Oltean
@ 2025-06-11 14:40     ` James Clark
  2025-06-11 14:56       ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-11 14:40 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel



On 10/06/2025 10:52 pm, Vladimir Oltean wrote:
> On Mon, Jun 09, 2025 at 04:32:41PM +0100, James Clark wrote:
>> In target mode, the host sending more data than can be consumed would be
>> a common problem for any message exceeding the FIFO or DMA buffer size.
>> Cancel the whole message as soon as this condition is hit as the message
>> will be corrupted.
>>
>> Only do this for target mode in a DMA transfer because we need to add a
>> register read. In IRQ and polling modes always do it because SPI_SR was
>> already read and it might catch some host mode programming/buffer
>> management errors too.
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   drivers/spi/spi-fsl-dspi.c | 31 ++++++++++++++++++++++++++++---
>>   1 file changed, 28 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
>> index e211e44e977f..75767d756496 100644
>> --- a/drivers/spi/spi-fsl-dspi.c
>> +++ b/drivers/spi/spi-fsl-dspi.c
>> @@ -228,6 +228,7 @@ struct fsl_dspi {
>>   	const struct fsl_dspi_devtype_data	*devtype_data;
>>   
>>   	struct completion			xfer_done;
>> +	int                                     xfer_status;
> 
> This is certainly simple, and simple is not bad.
> 
> But based on the fact that you care about the xfer_status only when
> there's an associated dspi->cur_msg, have you considered to update
> dspi->cur_msg->status directly?
> 
> You'd need to reset dspi->cur_msg to NULL at the end of
> dspi_transfer_one_message(), and then check for NULL when you update
> the transfer status.
> 

That will work. I can use it for polling and DMA modes too and I think 
it looks a bit better after a refactor.

>>   
>>   	struct fsl_dspi_dma			*dma;
>>   
>> @@ -504,12 +505,22 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
>>   
>>   static void dspi_setup_accel(struct fsl_dspi *dspi);
>>   
>> +static bool dspi_is_fifo_overflow(struct fsl_dspi *dspi, u32 spi_sr)
> 
> Can you name this some way else, like dspi_fifo_error()? It's strange
> for a reader for this to return true on an underflow.
> 

Will do

>> +{
>> +	if (spi_sr & (SPI_SR_TFUF | SPI_SR_RFOF)) {
>> +		dev_err(&dspi->pdev->dev, "FIFO under/overflow");
> 
> Missing \n.
> 
> And you should use dev_err_ratelimited(), as you don't want an external
> entity, when in target mode, to DoS you.
>

Ack

> Also, could there be individual error messages for TFUF and for RFOF?
> If you are concerned about the penalty for the error-free case, make the
> check two-level. First for all errors, then for individual errors.
> 

If I was going to split them I would probably let the compiler optimize 
it whichever way was best. The real reason for combining them is because 
usually you get them both together. As long as the message and fifos are 
configured correctly you'd always get TFUF and RFOF at the same time and 
I wanted to avoid printing twice for one event.

We could have 3 different warnings, TFUF/RFOF, TFUF and RFOF but I don't 
think that's useful to an end user. At that level of debugging you'd 
probably have a load of other debug printfs added anyway.

>> +		return true;
>> +	}
>> +	return false;
>> +}
>> +
>>   static int dspi_dma_xfer(struct fsl_dspi *dspi)
>>   {
>>   	struct spi_message *message = dspi->cur_msg;
>>   	int max_words = dspi_dma_max_datawords(dspi);
>>   	struct device *dev = &dspi->pdev->dev;
>>   	int ret = 0;
>> +	u32 spi_sr;
>>   
>>   	/*
>>   	 * dspi->len gets decremented by dspi_pop_tx_pushr in
>> @@ -531,6 +542,12 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
>>   			dev_err(dev, "DMA transfer failed\n");
>>   			break;
>>   		}
>> +
>> +		if (spi_controller_is_target(dspi->ctlr)) {
>> +			regmap_read(dspi->regmap, SPI_SR, &spi_sr);
>> +			if (dspi_is_fifo_overflow(dspi, spi_sr))
>> +				return -EIO;
>> +		}
>>   	}
> 
> Can this go within this block from dspi_next_xfer_dma_submit() instead?
> 
> 	if (spi_controller_is_target(dspi->ctlr)) {
> 		wait_for_completion_interruptible(&dspi->dma->cmd_rx_complete);
> 		// here
> 		return 0;
> 	}
> 

Makes sense yeah

>>   
>>   	return ret;
>> @@ -918,6 +935,8 @@ static int dspi_poll(struct fsl_dspi *dspi)
>>   		regmap_read(dspi->regmap, SPI_SR, &spi_sr);
>>   		regmap_write(dspi->regmap, SPI_SR, spi_sr);
>>   
>> +		if (dspi_is_fifo_overflow(dspi, spi_sr))
>> +			return -EIO;
>>   		if (spi_sr & SPI_SR_CMDTCF)
>>   			break;
>>   	} while (--tries);
>> @@ -939,8 +958,12 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
>>   	if (!(spi_sr & SPI_SR_CMDTCF))
>>   		return IRQ_NONE;
>>   
>> -	if (dspi_rxtx(dspi) == 0)
>> +	if (dspi_is_fifo_overflow(dspi, spi_sr)) {
>> +		WRITE_ONCE(dspi->xfer_status, -EIO);
>> +		complete(&dspi->xfer_done);
>> +	} else if (dspi_rxtx(dspi) == 0) {
>>   		complete(&dspi->xfer_done);
>> +	}
>>   
>>   	return IRQ_HANDLED;
>>   }
>> @@ -1032,13 +1055,15 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
>>   		if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
>>   			status = dspi_dma_xfer(dspi);
>>   		} else {
>> -			if (dspi->irq)
>> +			if (dspi->irq) {
>> +				WRITE_ONCE(dspi->xfer_status, 0);
>>   				reinit_completion(&dspi->xfer_done);
>> -
> 
> Nitpick: The blank line was doing fine here.
> 

Ack

>> +			}
>>   			dspi_fifo_write(dspi);
>>   
>>   			if (dspi->irq) {
>>   				wait_for_completion(&dspi->xfer_done);
>> +				status = READ_ONCE(dspi->xfer_status);
>>   			} else {
>>   				do {
>>   					status = dspi_poll(dspi);
>>
>> -- 
>> 2.34.1
>>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
  2025-06-11 14:40     ` James Clark
@ 2025-06-11 14:56       ` Vladimir Oltean
  2025-06-11 15:00         ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-11 14:56 UTC (permalink / raw)
  To: James Clark; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel

On Wed, Jun 11, 2025 at 03:40:40PM +0100, James Clark wrote:
> > Also, could there be individual error messages for TFUF and for RFOF?
> > If you are concerned about the penalty for the error-free case, make the
> > check two-level. First for all errors, then for individual errors.
> > 
> 
> If I was going to split them I would probably let the compiler optimize it
> whichever way was best. The real reason for combining them is because
> usually you get them both together. As long as the message and fifos are
> configured correctly you'd always get TFUF and RFOF at the same time and I
> wanted to avoid printing twice for one event.

In that case, why not:
	if (spi_sr & (SPI_SR_TFUF | SPI_SR_RFOF)) {
		dev_err_ratelimited(dev, "FIFO errors:%s%s\n",
				    spi_sr & SPI_SR_TFUF ? " TX underflow," : "",
				    spi_sr & SPI_SR_RFOF ? " RX overflow," : "");
		}
	}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
  2025-06-11 14:56       ` Vladimir Oltean
@ 2025-06-11 15:00         ` James Clark
  0 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-11 15:00 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel



On 11/06/2025 3:56 pm, Vladimir Oltean wrote:
> On Wed, Jun 11, 2025 at 03:40:40PM +0100, James Clark wrote:
>>> Also, could there be individual error messages for TFUF and for RFOF?
>>> If you are concerned about the penalty for the error-free case, make the
>>> check two-level. First for all errors, then for individual errors.
>>>
>>
>> If I was going to split them I would probably let the compiler optimize it
>> whichever way was best. The real reason for combining them is because
>> usually you get them both together. As long as the message and fifos are
>> configured correctly you'd always get TFUF and RFOF at the same time and I
>> wanted to avoid printing twice for one event.
> 
> In that case, why not:
> 	if (spi_sr & (SPI_SR_TFUF | SPI_SR_RFOF)) {
> 		dev_err_ratelimited(dev, "FIFO errors:%s%s\n",
> 				    spi_sr & SPI_SR_TFUF ? " TX underflow," : "",
> 				    spi_sr & SPI_SR_RFOF ? " RX overflow," : "");
> 		}
> 	}

Yep that looks good, will do that


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-11  9:01         ` Vladimir Oltean
@ 2025-06-12 11:05           ` James Clark
  2025-06-12 11:15             ` Vladimir Oltean
  2025-06-12 11:15             ` Arnd Bergmann
  0 siblings, 2 replies; 43+ messages in thread
From: James Clark @ 2025-06-12 11:05 UTC (permalink / raw)
  To: Vladimir Oltean, Arnd Bergmann, Frank Li
  Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel



On 11/06/2025 10:01 am, Vladimir Oltean wrote:
> On Tue, Jun 10, 2025 at 11:56:34AM -0400, Frank Li wrote:
>> Can you add performance beneafit information after use non-coherent memory
>> in commit message to let reviewer easily know your intention.
> 
> To expand on that, you can post the output of something like this
> (before and after):
> $ spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 10000000
> where /dev/spidev1.0 is an unconnected chip select with a dummy entry in
> the device tree.

Coherent (before):

rate: tx 385.8kbps, rx 385.8kbps
rate: tx 1215.7kbps, rx 1215.7kbps
rate: tx 1845.2kbps, rx 1845.2kbps
rate: tx 1844.0kbps, rx 1844.0kbps
rate: tx 1846.1kbps, rx 1846.1kbps
rate: tx 1844.8kbps, rx 1844.8kbps
rate: tx 1844.4kbps, rx 1844.4kbps
rate: tx 1846.9kbps, rx 1846.9kbps
rate: tx 1846.5kbps, rx 1846.5kbps
rate: tx 1843.2kbps, rx 1843.2kbps
rate: tx 1844.8kbps, rx 1844.8kbps
rate: tx 1845.2kbps, rx 1845.2kbps
rate: tx 1846.5kbps, rx 1846.5kbps

Non-coherent (after):

rate: tx 314.6kbps, rx 314.6kbps
rate: tx 748.3kbps, rx 748.3kbps
rate: tx 1845.2kbps, rx 1845.2kbps
rate: tx 1849.3kbps, rx 1849.3kbps
rate: tx 1846.1kbps, rx 1846.1kbps
rate: tx 1847.3kbps, rx 1847.3kbps
rate: tx 1845.7kbps, rx 1845.7kbps
rate: tx 1846.5kbps, rx 1846.5kbps
rate: tx 1844.4kbps, rx 1844.4kbps
rate: tx 1847.3kbps, rx 1847.3kbps
rate: tx 1847.3kbps, rx 1847.3kbps
rate: tx 1845.7kbps, rx 1845.7kbps
rate: tx 1846.5kbps, rx 1846.5kbps

Ignoring anything less than 1800 as starting up, coherent has an average 
of 1845.2kbps and non-coherent 1846.5kbps. Not sure if that's just noise 
or an actual effect.

With stress running in the background the difference in average over 17 
runs is slightly more significant:

   stress -m 8 --vm-stride 1 --vm-bytes 64MB

Coherent: 2105.5kbps
Non-coherent: 2125.6kbps

There's not much variance in the runs either, they're pretty much always 
2105 and 2125 +-1 so I don't think this result is noise.

(No idea why it goes faster when it's under load, but I hope that can be 
ignored for this test)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 11:05           ` James Clark
@ 2025-06-12 11:15             ` Vladimir Oltean
  2025-06-12 14:14               ` James Clark
  2025-06-12 11:15             ` Arnd Bergmann
  1 sibling, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-12 11:15 UTC (permalink / raw)
  To: James Clark
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel

On Thu, Jun 12, 2025 at 12:05:26PM +0100, James Clark wrote:
> (No idea why it goes faster when it's under load, but I hope that can be
> ignored for this test)

Might be because of dynamic CPU frequency scaling as done by the governor.
If the CPU utilization of spidev_test isn't high enough, the governor
will prefer lower CPU frequencies. You can try to repeat the test with
the "performance" governor and/or setting the min frequency equal to the
max one.

That's why I don't like the DMA mode in DSPI, it's still CPU-bound,
because the DMA buffers are very small (you can only provide one TX FIFO
worth of data per DMA transfer, rather than the whole buffer).

FWIW, the XSPI FIFO performance should be higher.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 11:05           ` James Clark
  2025-06-12 11:15             ` Vladimir Oltean
@ 2025-06-12 11:15             ` Arnd Bergmann
  1 sibling, 0 replies; 43+ messages in thread
From: Arnd Bergmann @ 2025-06-12 11:15 UTC (permalink / raw)
  To: James Clark, Vladimir Oltean, Frank Li
  Cc: Vladimir Oltean, Mark Brown, linux-spi, imx, linux-kernel

On Thu, Jun 12, 2025, at 13:05, James Clark wrote:
> On 11/06/2025 10:01 am, Vladimir Oltean wrote:
>> On Tue, Jun 10, 2025 at 11:56:34AM -0400, Frank Li wrote:
>>> Can you add performance beneafit information after use non-coherent memory
>>> in commit message to let reviewer easily know your intention.
>> 
>> To expand on that, you can post the output of something like this
>> (before and after):
>> $ spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 10000000
>> where /dev/spidev1.0 is an unconnected chip select with a dummy entry in
>> the device tree.
>
> Coherent (before):
>
> rate: tx 385.8kbps, rx 385.8kbps
> rate: tx 1215.7kbps, rx 1215.7kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1844.0kbps, rx 1844.0kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1846.9kbps, rx 1846.9kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1843.2kbps, rx 1843.2kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Non-coherent (after):
>
> rate: tx 314.6kbps, rx 314.6kbps
> rate: tx 748.3kbps, rx 748.3kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1849.3kbps, rx 1849.3kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Ignoring anything less than 1800 as starting up, coherent has an average 
> of 1845.2kbps and non-coherent 1846.5kbps. Not sure if that's just noise 
> or an actual effect.

The extra cache flushes do introduce some overhead as well, so I
would expect the noncoherent case to be slightly slower for
small transfers, but the coherent case to be faster for large
transfers.

"--size 256" presumably means 256 bytes, i.e. four cachelines?
If it's easy to reproduce, can you check with smaller sizes
that still use the DMA codepath (e.g. 64 bytes) and much larger
transfers (e.g. 2048 bytes)?

      Arnd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 11:15             ` Vladimir Oltean
@ 2025-06-12 14:14               ` James Clark
  2025-06-12 14:23                 ` Vladimir Oltean
                                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: James Clark @ 2025-06-12 14:14 UTC (permalink / raw)
  To: Vladimir Oltean, Arnd Bergmann
  Cc: Frank Li, Vladimir Oltean, Mark Brown, linux-spi, imx,
	linux-kernel



On 12/06/2025 12:15 pm, Vladimir Oltean wrote:
> On Thu, Jun 12, 2025 at 12:05:26PM +0100, James Clark wrote:
>> (No idea why it goes faster when it's under load, but I hope that can be
>> ignored for this test)
> 
> Might be because of dynamic CPU frequency scaling as done by the governor.
> If the CPU utilization of spidev_test isn't high enough, the governor
> will prefer lower CPU frequencies. You can try to repeat the test with
> the "performance" governor and/or setting the min frequency equal to the
> max one.
> 

That doesn't seem to make a difference, I get the same results with 
this. Even for the "fixed" DMA test results below there is a similar 
small performance increase when stressing the system:

   # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
   1300000
   ...

   # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
   1300000
   ...

   # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
   performance
   ...

> That's why I don't like the DMA mode in DSPI, it's still CPU-bound,
> because the DMA buffers are very small (you can only provide one TX FIFO
> worth of data per DMA transfer, rather than the whole buffer).

Is that right? The FIFO size isn't used in any of the DMA codepaths, it 
looks like the whole DMA buffer is filled before initiating the 
transfer. And we increase the buffer to 4k in this patchset to fully use 
the existing allocation.

> 
> FWIW, the XSPI FIFO performance should be higher.

This leads me to realise a mistake in my original figures. My head was 
stuck in target mode where we use DMA so I forgot to force DMA in host 
mode to run the performance tests. The previous figures were all XSPI 
mode and the small difference in performance could have been just down 
to the layout of the code changing?

Changing it to DMA mode gives figures that make much more sense:

Coherent (4096 byte transfers): 6534 kbps
Non-coherent:                   7347 kbps

Coherent (16 byte transfers):    447 kbps
Non-coherent:                    448 kbps


Just for comparison running the same test in XSPI mode:

4096 byte transfers:            2143 kbps
16 byte transfers:               637 kbps


So for small transfers XSPI is slightly better but for large ones DMA is 
much better, with non-coherent memory giving another 800kbps gain. 
Perhaps we could find the midpoint and then auto select the mode 
depending on the size, but maybe there is latency to consider too which 
could be important.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:14               ` James Clark
@ 2025-06-12 14:23                 ` Vladimir Oltean
  2025-06-12 14:28                   ` James Clark
  2025-06-12 14:36                 ` Vladimir Oltean
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-12 14:23 UTC (permalink / raw)
  To: James Clark
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel

On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
> > That's why I don't like the DMA mode in DSPI, it's still CPU-bound,
> > because the DMA buffers are very small (you can only provide one TX FIFO
> > worth of data per DMA transfer, rather than the whole buffer).
> 
> Is that right? The FIFO size isn't used in any of the DMA codepaths, it
> looks like the whole DMA buffer is filled before initiating the transfer.
> And we increase the buffer to 4k in this patchset to fully use the existing
> allocation.

Uhm, yeah, no?

dspi_dma_xfer():

	while (dspi->len) {
		dspi->words_in_flight = dspi->len / dspi->oper_word_size;
		if (dspi->words_in_flight > dspi->devtype_data->fifo_size)
			dspi->words_in_flight = dspi->devtype_data->fifo_size;
		dspi_next_xfer_dma_submit();
	}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:23                 ` Vladimir Oltean
@ 2025-06-12 14:28                   ` James Clark
  2025-06-12 14:31                     ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-12 14:28 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel



On 12/06/2025 3:23 pm, Vladimir Oltean wrote:
> On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
>>> That's why I don't like the DMA mode in DSPI, it's still CPU-bound,
>>> because the DMA buffers are very small (you can only provide one TX FIFO
>>> worth of data per DMA transfer, rather than the whole buffer).
>>
>> Is that right? The FIFO size isn't used in any of the DMA codepaths, it
>> looks like the whole DMA buffer is filled before initiating the transfer.
>> And we increase the buffer to 4k in this patchset to fully use the existing
>> allocation.
> 
> Uhm, yeah, no?
> 
> dspi_dma_xfer():
> 
> 	while (dspi->len) {
> 		dspi->words_in_flight = dspi->len / dspi->oper_word_size;
> 		if (dspi->words_in_flight > dspi->devtype_data->fifo_size)
> 			dspi->words_in_flight = dspi->devtype_data->fifo_size;
> 		dspi_next_xfer_dma_submit();
> 	}

Right but that's before the change in this patchset to use the whole 
page that was allocated, hence the next bit:

 > And we increase the buffer to 4k in this patchset to fully use the
   existing allocation.

We were allocating for the size of the FIFO (multiplied by two to hold 
the control words), but dma_alloc_coherent() will be backed by a whole 
page anyway, even if you only ask for a few bytes.

After changing that to make use of the full allocation the FIFO length 
is no longer involved.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:28                   ` James Clark
@ 2025-06-12 14:31                     ` Vladimir Oltean
  2025-06-12 14:35                       ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-12 14:31 UTC (permalink / raw)
  To: James Clark
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel

On Thu, Jun 12, 2025 at 03:28:37PM +0100, James Clark wrote:
> 
> 
> On 12/06/2025 3:23 pm, Vladimir Oltean wrote:
> > On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
> > > > That's why I don't like the DMA mode in DSPI, it's still CPU-bound,
> > > > because the DMA buffers are very small (you can only provide one TX FIFO
> > > > worth of data per DMA transfer, rather than the whole buffer).
> > > 
> > > Is that right? The FIFO size isn't used in any of the DMA codepaths, it
> > > looks like the whole DMA buffer is filled before initiating the transfer.
> > > And we increase the buffer to 4k in this patchset to fully use the existing
> > > allocation.
> > 
> > Uhm, yeah, no?
> > 
> > dspi_dma_xfer():
> > 
> > 	while (dspi->len) {
> > 		dspi->words_in_flight = dspi->len / dspi->oper_word_size;
> > 		if (dspi->words_in_flight > dspi->devtype_data->fifo_size)
> > 			dspi->words_in_flight = dspi->devtype_data->fifo_size;
> > 		dspi_next_xfer_dma_submit();
> > 	}
> 
> Right but that's before the change in this patchset to use the whole page
> that was allocated, hence the next bit:
> 
> > And we increase the buffer to 4k in this patchset to fully use the
>   existing allocation.
> 
> We were allocating for the size of the FIFO (multiplied by two to hold the
> control words), but dma_alloc_coherent() will be backed by a whole page
> anyway, even if you only ask for a few bytes.
> 
> After changing that to make use of the full allocation the FIFO length is no
> longer involved.

Ok, I haven't walked through patch 3 yet, I didn't realize it would be
changing that. I will want to test it on LS1028A.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:31                     ` Vladimir Oltean
@ 2025-06-12 14:35                       ` James Clark
  0 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-12 14:35 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel



On 12/06/2025 3:31 pm, Vladimir Oltean wrote:
> On Thu, Jun 12, 2025 at 03:28:37PM +0100, James Clark wrote:
>>
>>
>> On 12/06/2025 3:23 pm, Vladimir Oltean wrote:
>>> On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
>>>>> That's why I don't like the DMA mode in DSPI, it's still CPU-bound,
>>>>> because the DMA buffers are very small (you can only provide one TX FIFO
>>>>> worth of data per DMA transfer, rather than the whole buffer).
>>>>
>>>> Is that right? The FIFO size isn't used in any of the DMA codepaths, it
>>>> looks like the whole DMA buffer is filled before initiating the transfer.
>>>> And we increase the buffer to 4k in this patchset to fully use the existing
>>>> allocation.
>>>
>>> Uhm, yeah, no?
>>>
>>> dspi_dma_xfer():
>>>
>>> 	while (dspi->len) {
>>> 		dspi->words_in_flight = dspi->len / dspi->oper_word_size;
>>> 		if (dspi->words_in_flight > dspi->devtype_data->fifo_size)
>>> 			dspi->words_in_flight = dspi->devtype_data->fifo_size;
>>> 		dspi_next_xfer_dma_submit();
>>> 	}
>>
>> Right but that's before the change in this patchset to use the whole page
>> that was allocated, hence the next bit:
>>
>>> And we increase the buffer to 4k in this patchset to fully use the
>>    existing allocation.
>>
>> We were allocating for the size of the FIFO (multiplied by two to hold the
>> control words), but dma_alloc_coherent() will be backed by a whole page
>> anyway, even if you only ask for a few bytes.
>>
>> After changing that to make use of the full allocation the FIFO length is no
>> longer involved.
> 
> Ok, I haven't walked through patch 3 yet, I didn't realize it would be
> changing that. I will want to test it on LS1028A.

Yeah testing it somewhere else would be good. Maybe there is some 
limitation there about the max size of the DMA transfer, but I didn't 
see it.

I realise the tense in my original message may have been a bit 
confusing. I was mixing up the existing code and proposed changes...


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:14               ` James Clark
  2025-06-12 14:23                 ` Vladimir Oltean
@ 2025-06-12 14:36                 ` Vladimir Oltean
  2025-06-12 15:36                   ` James Clark
  2025-06-12 14:43                 ` Mark Brown
  2025-06-12 15:40                 ` Arnd Bergmann
  3 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-12 14:36 UTC (permalink / raw)
  To: James Clark
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel

On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
> > FWIW, the XSPI FIFO performance should be higher.
> 
> This leads me to realise a mistake in my original figures. My head was stuck
> in target mode where we use DMA so I forgot to force DMA in host mode to run
> the performance tests. The previous figures were all XSPI mode and the small
> difference in performance could have been just down to the layout of the
> code changing?
> 
> Changing it to DMA mode gives figures that make much more sense:
> 
> Coherent (4096 byte transfers): 6534 kbps
> Non-coherent:                   7347 kbps
> 
> Coherent (16 byte transfers):    447 kbps
> Non-coherent:                    448 kbps
> 
> 
> Just for comparison running the same test in XSPI mode:
> 
> 4096 byte transfers:            2143 kbps
> 16 byte transfers:               637 kbps

So to be clear, the 'non-coherent' test was done just with patch 2
applied, or also with 3?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:14               ` James Clark
  2025-06-12 14:23                 ` Vladimir Oltean
  2025-06-12 14:36                 ` Vladimir Oltean
@ 2025-06-12 14:43                 ` Mark Brown
  2025-06-12 15:47                   ` James Clark
  2025-06-12 15:40                 ` Arnd Bergmann
  3 siblings, 1 reply; 43+ messages in thread
From: Mark Brown @ 2025-06-12 14:43 UTC (permalink / raw)
  To: James Clark
  Cc: Vladimir Oltean, Arnd Bergmann, Frank Li, Vladimir Oltean,
	linux-spi, imx, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
> On 12/06/2025 12:15 pm, Vladimir Oltean wrote:

> > FWIW, the XSPI FIFO performance should be higher.

> This leads me to realise a mistake in my original figures. My head was stuck
> in target mode where we use DMA so I forgot to force DMA in host mode to run
> the performance tests. The previous figures were all XSPI mode and the small
> difference in performance could have been just down to the layout of the
> code changing?

> Changing it to DMA mode gives figures that make much more sense:

If not having DMA mode is making this much of a difference shouldn't the
driver just do that?  I'm not seeing runtime configuration in the driver
so I guess this is local hacking...

> So for small transfers XSPI is slightly better but for large ones DMA is
> much better, with non-coherent memory giving another 800kbps gain. Perhaps
> we could find the midpoint and then auto select the mode depending on the
> size, but maybe there is latency to consider too which could be important.

This is a fairly normal pattern, it's a big part of why the can_dma()
callback is per transfer - so you can do a copybreak and use PIO for
smaller transfers where the overhead of setting up DMA is often more
than the overhead of just doing PIO.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:36                 ` Vladimir Oltean
@ 2025-06-12 15:36                   ` James Clark
  2025-06-12 15:37                     ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-12 15:36 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel



On 12/06/2025 3:36 pm, Vladimir Oltean wrote:
> On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
>>> FWIW, the XSPI FIFO performance should be higher.
>>
>> This leads me to realise a mistake in my original figures. My head was stuck
>> in target mode where we use DMA so I forgot to force DMA in host mode to run
>> the performance tests. The previous figures were all XSPI mode and the small
>> difference in performance could have been just down to the layout of the
>> code changing?
>>
>> Changing it to DMA mode gives figures that make much more sense:
>>
>> Coherent (4096 byte transfers): 6534 kbps
>> Non-coherent:                   7347 kbps
>>
>> Coherent (16 byte transfers):    447 kbps
>> Non-coherent:                    448 kbps
>>
>>
>> Just for comparison running the same test in XSPI mode:
>>
>> 4096 byte transfers:            2143 kbps
>> 16 byte transfers:               637 kbps
> 
> So to be clear, the 'non-coherent' test was done just with patch 2
> applied, or also with 3?

The whole set, and then the non-coherent patch reverted.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 15:36                   ` James Clark
@ 2025-06-12 15:37                     ` James Clark
  0 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-12 15:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Arnd Bergmann, Frank Li, Vladimir Oltean, Mark Brown, linux-spi,
	imx, linux-kernel



On 12/06/2025 4:36 pm, James Clark wrote:
> 
> 
> On 12/06/2025 3:36 pm, Vladimir Oltean wrote:
>> On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
>>>> FWIW, the XSPI FIFO performance should be higher.
>>>
>>> This leads me to realise a mistake in my original figures. My head 
>>> was stuck
>>> in target mode where we use DMA so I forgot to force DMA in host mode 
>>> to run
>>> the performance tests. The previous figures were all XSPI mode and 
>>> the small
>>> difference in performance could have been just down to the layout of the
>>> code changing?
>>>
>>> Changing it to DMA mode gives figures that make much more sense:
>>>
>>> Coherent (4096 byte transfers): 6534 kbps
>>> Non-coherent:                   7347 kbps
>>>
>>> Coherent (16 byte transfers):    447 kbps
>>> Non-coherent:                    448 kbps
>>>
>>>
>>> Just for comparison running the same test in XSPI mode:
>>>
>>> 4096 byte transfers:            2143 kbps
>>> 16 byte transfers:               637 kbps
>>
>> So to be clear, the 'non-coherent' test was done just with patch 2
>> applied, or also with 3?
> 
> The whole set, and then the non-coherent patch reverted.
> 

And with DMA forced in host mode as a hack.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:14               ` James Clark
                                   ` (2 preceding siblings ...)
  2025-06-12 14:43                 ` Mark Brown
@ 2025-06-12 15:40                 ` Arnd Bergmann
  3 siblings, 0 replies; 43+ messages in thread
From: Arnd Bergmann @ 2025-06-12 15:40 UTC (permalink / raw)
  To: James Clark, Vladimir Oltean
  Cc: Frank Li, Vladimir Oltean, Mark Brown, linux-spi, imx,
	linux-kernel

On Thu, Jun 12, 2025, at 16:14, James Clark wrote:
> On 12/06/2025 12:15 pm, Vladimir Oltean wrote:
> This leads me to realise a mistake in my original figures. My head was 
> stuck in target mode where we use DMA so I forgot to force DMA in host 
> mode to run the performance tests. The previous figures were all XSPI 
> mode and the small difference in performance could have been just down 
> to the layout of the code changing?
>
> Changing it to DMA mode gives figures that make much more sense:
>
> Coherent (4096 byte transfers): 6534 kbps
> Non-coherent:                   7347 kbps
>
> Coherent (16 byte transfers):    447 kbps
> Non-coherent:                    448 kbps

Ok, good. The improvement from the patch is less than I had hoped
for, but it does sound like at least it doesn't get worse for
small transfers.

     Arnd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 14:43                 ` Mark Brown
@ 2025-06-12 15:47                   ` James Clark
  2025-06-12 15:51                     ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-12 15:47 UTC (permalink / raw)
  To: Mark Brown
  Cc: Vladimir Oltean, Arnd Bergmann, Frank Li, Vladimir Oltean,
	linux-spi, imx, linux-kernel



On 12/06/2025 3:43 pm, Mark Brown wrote:
> On Thu, Jun 12, 2025 at 03:14:32PM +0100, James Clark wrote:
>> On 12/06/2025 12:15 pm, Vladimir Oltean wrote:
> 
>>> FWIW, the XSPI FIFO performance should be higher.
> 
>> This leads me to realise a mistake in my original figures. My head was stuck
>> in target mode where we use DMA so I forgot to force DMA in host mode to run
>> the performance tests. The previous figures were all XSPI mode and the small
>> difference in performance could have been just down to the layout of the
>> code changing?
> 
>> Changing it to DMA mode gives figures that make much more sense:
> 
> If not having DMA mode is making this much of a difference shouldn't the
> driver just do that?  I'm not seeing runtime configuration in the driver
> so I guess this is local hacking...
> 

Yes just changed locally.

>> So for small transfers XSPI is slightly better but for large ones DMA is
>> much better, with non-coherent memory giving another 800kbps gain. Perhaps
>> we could find the midpoint and then auto select the mode depending on the
>> size, but maybe there is latency to consider too which could be important.
> 
> This is a fairly normal pattern, it's a big part of why the can_dma()
> callback is per transfer - so you can do a copybreak and use PIO for
> smaller transfers where the overhead of setting up DMA is often more
> than the overhead of just doing PIO.

Makes sense. Although for some reason two devices already use DMA for 
host mode and it's not that clear to me what the reason is.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-12 15:47                   ` James Clark
@ 2025-06-12 15:51                     ` Vladimir Oltean
  0 siblings, 0 replies; 43+ messages in thread
From: Vladimir Oltean @ 2025-06-12 15:51 UTC (permalink / raw)
  To: James Clark
  Cc: Mark Brown, Arnd Bergmann, Frank Li, Vladimir Oltean, linux-spi,
	imx, linux-kernel

On Thu, Jun 12, 2025 at 04:47:26PM +0100, James Clark wrote:
> > This is a fairly normal pattern, it's a big part of why the can_dma()
> > callback is per transfer - so you can do a copybreak and use PIO for
> > smaller transfers where the overhead of setting up DMA is often more
> > than the overhead of just doing PIO.
> 
> Makes sense. Although for some reason two devices already use DMA for host
> mode and it's not that clear to me what the reason is.

I couldn't test them and I've kept them as they were.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
  2025-06-10 15:46     ` James Clark
@ 2025-06-13 15:56       ` David Laight
  0 siblings, 0 replies; 43+ messages in thread
From: David Laight @ 2025-06-13 15:56 UTC (permalink / raw)
  To: James Clark
  Cc: Frank Li, Vladimir Oltean, Mark Brown, Vladimir Oltean, linux-spi,
	imx, linux-kernel, Arnd Bergmann

On Tue, 10 Jun 2025 16:46:36 +0100
James Clark <james.clark@linaro.org> wrote:

> On 10/06/2025 4:15 pm, Frank Li wrote:
> > On Mon, Jun 09, 2025 at 04:32:39PM +0100, James Clark wrote:  
> >> Using coherent memory here isn't functionally necessary.
> >> Because the
> >> change to use non-coherent memory isn't overly complex and only a few
> >> synchronization points are required, we might as well do it while fixing
> >> up some other DMA issues.  
> > 
> > Any beanfit by use on-coherent memory here?
> > 
> > Frank
> >   
> 
> Presumably less cache maintenance traffic?

I bet it only helps when cache-coherent memory has to be uncached.
Otherwise the software cache operations are pretty much guaranteed
to be more expensive than the hardware ones.

	David

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
  2025-06-09 15:32 [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements James Clark
                   ` (3 preceding siblings ...)
  2025-06-09 15:32 ` [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors James Clark
@ 2025-06-30 11:40 ` Mark Brown
  2025-06-30 12:23   ` James Clark
  4 siblings, 1 reply; 43+ messages in thread
From: Mark Brown @ 2025-06-30 11:40 UTC (permalink / raw)
  To: Vladimir Oltean, James Clark
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, Arnd Bergmann,
	Larisa Grigore

On Mon, 09 Jun 2025 16:32:37 +0100, James Clark wrote:
> Improve usability of target mode by reporting FIFO errors and increasing
> the buffer size when DMA is used. While we're touching DMA stuff also
> switch to non-coherent memory, although this is unrelated to target
> mode.
> 
> The first commit is marked as a fix because it can fix intermittent
> issues with existing transfers, rather than the later fixes which
> improve larger than FIFO target mode transfers which would have never
> worked.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
      commit: fa60c094c19b97e103d653f528f8d9c178b6a5f5
[2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
      (no commit info)
[3/4] spi: spi-fsl-dspi: Increase DMA buffer size
      (no commit info)
[4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
      (no commit info)

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
  2025-06-30 11:40 ` [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements Mark Brown
@ 2025-06-30 12:23   ` James Clark
  2025-06-30 12:25     ` Mark Brown
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-30 12:23 UTC (permalink / raw)
  To: Mark Brown, Vladimir Oltean
  Cc: Vladimir Oltean, linux-spi, imx, linux-kernel, Arnd Bergmann,
	Larisa Grigore



On 30/06/2025 12:40 pm, Mark Brown wrote:
> On Mon, 09 Jun 2025 16:32:37 +0100, James Clark wrote:
>> Improve usability of target mode by reporting FIFO errors and increasing
>> the buffer size when DMA is used. While we're touching DMA stuff also
>> switch to non-coherent memory, although this is unrelated to target
>> mode.
>>
>> The first commit is marked as a fix because it can fix intermittent
>> issues with existing transfers, rather than the later fixes which
>> improve larger than FIFO target mode transfers which would have never
>> worked.
>>
>> [...]
> 
> Applied to
> 
>     https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> 
> Thanks!
> 
> [1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
>        commit: fa60c094c19b97e103d653f528f8d9c178b6a5f5
> [2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
>        (no commit info)
> [3/4] spi: spi-fsl-dspi: Increase DMA buffer size
>        (no commit info)
> [4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors
>        (no commit info)
> 
> All being well this means that it will be integrated into the linux-next
> tree (usually sometime in the next 24 hours) and sent to Linus during
> the next merge window (or sooner if it is a bug fix), however if
> problems are discovered then the patch may be dropped or reverted.
> 
> You may get further e-mails resulting from automated or manual testing
> and review of the tree, please engage with people reporting problems and
> send followup patches addressing any issues that are reported if needed.
> 
> If any updates are required or you are submitting further changes they
> should be sent as incremental updates against current git, existing
> patches will not be replaced.
> 
> Please add any relevant lists and maintainers to the CCs when replying
> to this mail.
> 
> Thanks,
> Mark
> 

Hi Mark,

Not sure if this is a mistake in the notification or not, but this one 
shouldn't be applied. There is a v4 with some issues. Although the 
notification on V4 that patch 1 was applied is OK.

Thanks
James


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
  2025-06-30 12:23   ` James Clark
@ 2025-06-30 12:25     ` Mark Brown
  2025-06-30 12:36       ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Mark Brown @ 2025-06-30 12:25 UTC (permalink / raw)
  To: James Clark
  Cc: Vladimir Oltean, Vladimir Oltean, linux-spi, imx, linux-kernel,
	Arnd Bergmann, Larisa Grigore

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

On Mon, Jun 30, 2025 at 01:23:18PM +0100, James Clark wrote:
> On 30/06/2025 12:40 pm, Mark Brown wrote:

> > [1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
> >        commit: fa60c094c19b97e103d653f528f8d9c178b6a5f5

> > If any updates are required or you are submitting further changes they
> > should be sent as incremental updates against current git, existing
> > patches will not be replaced.

> Not sure if this is a mistake in the notification or not, but this one
> shouldn't be applied. There is a v4 with some issues. Although the
> notification on V4 that patch 1 was applied is OK.

Please follow the above for any fixes that are needed.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
  2025-06-30 12:25     ` Mark Brown
@ 2025-06-30 12:36       ` James Clark
  2025-06-30 12:40         ` Mark Brown
  0 siblings, 1 reply; 43+ messages in thread
From: James Clark @ 2025-06-30 12:36 UTC (permalink / raw)
  To: Mark Brown
  Cc: Vladimir Oltean, Vladimir Oltean, linux-spi, imx, linux-kernel,
	Arnd Bergmann, Larisa Grigore



On 30/06/2025 1:25 pm, Mark Brown wrote:
> On Mon, Jun 30, 2025 at 01:23:18PM +0100, James Clark wrote:
>> On 30/06/2025 12:40 pm, Mark Brown wrote:
> 
>>> [1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer
>>>         commit: fa60c094c19b97e103d653f528f8d9c178b6a5f5
> 
>>> If any updates are required or you are submitting further changes they
>>> should be sent as incremental updates against current git, existing
>>> patches will not be replaced.
> 
>> Not sure if this is a mistake in the notification or not, but this one
>> shouldn't be applied. There is a v4 with some issues. Although the
>> notification on V4 that patch 1 was applied is OK.
> 
> Please follow the above for any fixes that are needed.

Ok I think I may have just been reading the notification incorrectly. I 
see that patches 2-4 are listed as "(no commit info)" so I'm assuming 
they weren't applied.

I was a bit confused to see this notification on both V1 and V4. Maybe 
an artifact of the first commit that was applied being the same?

So nothing needs following up, that particular commit happens to be the 
same on both versions and it looks like the remaining patches weren't 
actually applied.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
  2025-06-30 12:36       ` James Clark
@ 2025-06-30 12:40         ` Mark Brown
  2025-06-30 12:54           ` James Clark
  0 siblings, 1 reply; 43+ messages in thread
From: Mark Brown @ 2025-06-30 12:40 UTC (permalink / raw)
  To: James Clark
  Cc: Vladimir Oltean, Vladimir Oltean, linux-spi, imx, linux-kernel,
	Arnd Bergmann, Larisa Grigore

[-- Attachment #1: Type: text/plain, Size: 372 bytes --]

On Mon, Jun 30, 2025 at 01:36:17PM +0100, James Clark wrote:

> I was a bit confused to see this notification on both V1 and V4. Maybe an
> artifact of the first commit that was applied being the same?

The tooling remembers if I ever downloaded any of the versions and
didn't explicitly delete them, if it sees a patch that it remembers
it'll report that it was applied.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements
  2025-06-30 12:40         ` Mark Brown
@ 2025-06-30 12:54           ` James Clark
  0 siblings, 0 replies; 43+ messages in thread
From: James Clark @ 2025-06-30 12:54 UTC (permalink / raw)
  To: Mark Brown
  Cc: Vladimir Oltean, Vladimir Oltean, linux-spi, imx, linux-kernel,
	Arnd Bergmann, Larisa Grigore



On 30/06/2025 1:40 pm, Mark Brown wrote:
> On Mon, Jun 30, 2025 at 01:36:17PM +0100, James Clark wrote:
> 
>> I was a bit confused to see this notification on both V1 and V4. Maybe an
>> artifact of the first commit that was applied being the same?
> 
> The tooling remembers if I ever downloaded any of the versions and
> didn't explicitly delete them, if it sees a patch that it remembers
> it'll report that it was applied.

Got it, thanks


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2025-06-30 12:54 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-09 15:32 [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements James Clark
2025-06-09 15:32 ` [PATCH 1/4] spi: spi-fsl-dspi: Clear completion counter before initiating transfer James Clark
2025-06-10 11:34   ` Vladimir Oltean
2025-06-10 15:41     ` James Clark
2025-06-10 21:01       ` Vladimir Oltean
2025-06-10 21:31         ` Vladimir Oltean
2025-06-11  9:12           ` James Clark
2025-06-09 15:32 ` [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA James Clark
2025-06-10  8:26   ` Arnd Bergmann
2025-06-10  9:03     ` James Clark
2025-06-10 15:15   ` Frank Li
2025-06-10 15:46     ` James Clark
2025-06-13 15:56       ` David Laight
2025-06-10 15:48     ` Arnd Bergmann
2025-06-10 15:56       ` Frank Li
2025-06-11  9:01         ` Vladimir Oltean
2025-06-12 11:05           ` James Clark
2025-06-12 11:15             ` Vladimir Oltean
2025-06-12 14:14               ` James Clark
2025-06-12 14:23                 ` Vladimir Oltean
2025-06-12 14:28                   ` James Clark
2025-06-12 14:31                     ` Vladimir Oltean
2025-06-12 14:35                       ` James Clark
2025-06-12 14:36                 ` Vladimir Oltean
2025-06-12 15:36                   ` James Clark
2025-06-12 15:37                     ` James Clark
2025-06-12 14:43                 ` Mark Brown
2025-06-12 15:47                   ` James Clark
2025-06-12 15:51                     ` Vladimir Oltean
2025-06-12 15:40                 ` Arnd Bergmann
2025-06-12 11:15             ` Arnd Bergmann
2025-06-09 15:32 ` [PATCH 3/4] spi: spi-fsl-dspi: Increase DMA buffer size James Clark
2025-06-09 15:32 ` [PATCH 4/4] spi: spi-fsl-dspi: Report FIFO overflows as errors James Clark
2025-06-10 21:52   ` Vladimir Oltean
2025-06-11 14:40     ` James Clark
2025-06-11 14:56       ` Vladimir Oltean
2025-06-11 15:00         ` James Clark
2025-06-30 11:40 ` [PATCH 0/4] spi: spi-fsl-dspi: Target mode improvements Mark Brown
2025-06-30 12:23   ` James Clark
2025-06-30 12:25     ` Mark Brown
2025-06-30 12:36       ` James Clark
2025-06-30 12:40         ` Mark Brown
2025-06-30 12:54           ` James Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).