[PATCH 0/3] spi: tegra210-quad: Improve interrupt handling for loaded systems

Linux Tegra architecture development
 help / color / mirror / Atom feed

* [PATCH 0/3] spi: tegra210-quad: Improve interrupt handling for loaded systems
@ 2026-05-18 16:07 Vishwaroop A
  2026-05-18 16:07 ` [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue Vishwaroop A
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Vishwaroop A @ 2026-05-18 16:07 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Mark Brown
  Cc: Vishwaroop A, Laxman Dewangan, Sowjanya Komatineni, Breno Leitao,
	Suresh Mangipudi, Krishna Yarlagadda, linux-tegra, linux-spi,
	linux-kernel

The current threaded IRQ implementation in spi-tegra210-quad suffers from
scheduler-induced latency on heavily loaded systems. Because threaded IRQ
handlers are subject to CFS scheduling, they can be delayed long enough to
trigger transfer timeouts even though hardware completes in microseconds.
This results in false timeout errors and WARN_ON splats during normal
operation.

This series addresses the problem in three steps:

1. Convert the threaded IRQ handler to a hard IRQ + high-priority unbound
   workqueue model. The hard IRQ does the minimum: verify interrupt
   ownership, cache status registers, clear and mask interrupts. The
   workqueue bottom-half handles the rest in process context and can run
   on any available CPU, avoiding the CPU0 bottleneck inherent in threaded
   IRQs.

2. Cache QSPI_TRANS_STATUS in the ISR before clearing it. This allows the
   timeout handler to distinguish between a real hardware timeout (QSPI_RDY
   not set) and a delayed workqueue (QSPI_RDY set), preventing false
   timeout errors when hardware has already completed.

3. Process small PIO transfers (≤ FIFO depth, 256 bytes) directly in
   hard IRQ context. This eliminates workqueue scheduling overhead for
   latency-sensitive devices like TPMs, reducing completion latency from
   potentially seconds to microseconds.

Tested on TH500 with TPM and QSPI flash devices under sustained load.
The series applies cleanly on top of linux-next (20260508).

Vishwaroop A (3):
  spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue
  spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler
  spi: tegra210-quad: Process small PIO transfers in hard IRQ context

 drivers/spi/spi-tegra210-quad.c | 169 ++++++++++++++++++++++----------
 1 file changed, 117 insertions(+), 52 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue
  2026-05-18 16:07 [PATCH 0/3] spi: tegra210-quad: Improve interrupt handling for loaded systems Vishwaroop A
@ 2026-05-18 16:07 ` Vishwaroop A
  2026-05-19  9:15   ` Jon Hunter
  2026-05-18 16:07 ` [PATCH 2/3] spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler Vishwaroop A
  2026-05-18 16:07 ` [PATCH 3/3] spi: tegra210-quad: Process small PIO transfers in hard IRQ context Vishwaroop A
  2 siblings, 1 reply; 6+ messages in thread
From: Vishwaroop A @ 2026-05-18 16:07 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Mark Brown
  Cc: Vishwaroop A, Laxman Dewangan, Sowjanya Komatineni, Breno Leitao,
	Suresh Mangipudi, Krishna Yarlagadda, linux-tegra, linux-spi,
	linux-kernel

Threaded IRQ handlers suffer from scheduler latency on heavily loaded
systems, causing false transfer timeouts. Convert to hard IRQ handler
that schedules work on a high-priority unbound workqueue.

The hard IRQ handler verifies the interrupt, caches FIFO status,
clears and masks interrupts, then schedules bottom-half processing.
The workqueue handler runs in process context (can sleep for DMA)
and can execute on any CPU, avoiding CPU0 bottlenecks.

Signed-off-by: Vishwaroop A <va@nvidia.com>
---
 drivers/spi/spi-tegra210-quad.c | 120 ++++++++++++++++++++++----------
 1 file changed, 83 insertions(+), 37 deletions(-)

diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index db28dd556484..a551c7a7f6c4 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -191,6 +191,8 @@ struct tegra_qspi {
 	void __iomem				*base;
 	phys_addr_t				phys;
 	unsigned int				irq;
+	struct work_struct			irq_work;
+	struct workqueue_struct			*wq;
 
 	u32					cur_speed;
 	unsigned int				cur_pos;
@@ -1574,46 +1576,40 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t tegra_qspi_isr_thread(int irq, void *context_data)
+/**
+ * tegra_qspi_work_handler - Workqueue handler for interrupt bottom-half
+ * @work: work_struct embedded in tegra_qspi
+ *
+ * Runs in process context and can sleep (needed for DMA completion waits).
+ * Can run on any available CPU, avoiding CPU0 bottleneck that occurs with
+ * threaded IRQ handlers which are pinned to the IRQ's CPU.
+ *
+ * The hard IRQ handler has already:
+ * - Verified this is our interrupt (QSPI_RDY was set)
+ * - Cached FIFO status in tqspi->status_reg
+ * - Parsed tx_status / rx_status from FIFO status
+ * - Masked further interrupts
+ */
+static void tegra_qspi_work_handler(struct work_struct *work)
 {
-	struct tegra_qspi *tqspi = context_data;
+	struct tegra_qspi *tqspi = container_of(work, struct tegra_qspi, irq_work);
 	unsigned long flags;
-	u32 status;
 
-	/*
-	 * Read transfer status to check if interrupt was triggered by transfer
-	 * completion
-	 */
-	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
+	spin_lock_irqsave(&tqspi->lock, flags);
 
 	/*
-	 * Occasionally the IRQ thread takes a long time to wake up (usually
-	 * when the CPU that it's running on is excessively busy) and we have
-	 * already reached the timeout before and cleaned up the timed out
-	 * transfer. Avoid any processing in that case and bail out early.
-	 *
-	 * If no transfer is in progress, check if this was a real interrupt
-	 * that the timeout handler already processed, or a spurious one.
+	 * Check if timeout handler already processed this transfer.
+	 * Can happen if work was delayed and timeout fired first. If
+	 * so, we must unmask interrupts before returning, otherwise
+	 * they remain masked from the hard IRQ handler and the next
+	 * transfer will timeout.
 	 */
-	spin_lock_irqsave(&tqspi->lock, flags);
 	if (!tqspi->curr_xfer) {
 		spin_unlock_irqrestore(&tqspi->lock, flags);
-		/* Spurious interrupt - transfer not ready */
-		if (!(status & QSPI_RDY))
-			return IRQ_NONE;
-		/* Real interrupt, already handled by timeout path */
-		return IRQ_HANDLED;
+		tegra_qspi_unmask_irq(tqspi);
+		return;
 	}
 
-	tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
-
-	if (tqspi->cur_direction & DATA_DIR_TX)
-		tqspi->tx_status = tqspi->status_reg & (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
-
-	if (tqspi->cur_direction & DATA_DIR_RX)
-		tqspi->rx_status = tqspi->status_reg & (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
-
-	tegra_qspi_mask_clear_irq(tqspi);
 	spin_unlock_irqrestore(&tqspi->lock, flags);
 
 	/*
@@ -1623,9 +1619,46 @@ static irqreturn_t tegra_qspi_isr_thread(int irq, void *context_data)
 	 * cannot be done while holding spinlock.
 	 */
 	if (!tqspi->is_curr_dma_xfer)
-		return handle_cpu_based_xfer(tqspi);
+		handle_cpu_based_xfer(tqspi);
+	else
+		handle_dma_based_xfer(tqspi);
+}
 
-	return handle_dma_based_xfer(tqspi);
+/**
+ * tegra_qspi_isr - Hard IRQ handler
+ * @irq: IRQ number
+ * @context_data: QSPI controller instance
+ *
+ * Runs in hard IRQ context with minimal latency. Cannot sleep.
+ *
+ * Return: IRQ_NONE if not our interrupt, IRQ_HANDLED if handled
+ */
+static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
+{
+	struct tegra_qspi *tqspi = context_data;
+	u32 status;
+
+	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
+	if (!(status & QSPI_RDY))
+		return IRQ_NONE;
+
+	spin_lock(&tqspi->lock);
+	tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+	tegra_qspi_mask_clear_irq(tqspi);
+
+	if (tqspi->cur_direction & DATA_DIR_TX)
+		tqspi->tx_status = tqspi->status_reg &
+				    (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
+
+	if (tqspi->cur_direction & DATA_DIR_RX)
+		tqspi->rx_status = tqspi->status_reg &
+				    (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
+
+	spin_unlock(&tqspi->lock);
+
+	queue_work(tqspi->wq, &tqspi->irq_work);
+
+	return IRQ_HANDLED;
 }
 
 static struct tegra_qspi_soc_data tegra210_qspi_soc_data = {
@@ -1793,12 +1826,21 @@ static int tegra_qspi_probe(struct platform_device *pdev)
 
 	pm_runtime_put_autosuspend(&pdev->dev);
 
-	ret = request_threaded_irq(tqspi->irq, NULL,
-				   tegra_qspi_isr_thread, IRQF_ONESHOT,
-				   dev_name(&pdev->dev), tqspi);
+	tqspi->wq = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND, 0,
+				    dev_name(&pdev->dev));
+	if (!tqspi->wq) {
+		dev_err(&pdev->dev, "failed to allocate workqueue\n");
+		ret = -ENOMEM;
+		goto exit_pm_disable;
+	}
+
+	INIT_WORK(&tqspi->irq_work, tegra_qspi_work_handler);
+
+	ret = request_irq(tqspi->irq, tegra_qspi_isr, IRQF_SHARED,
+			  dev_name(&pdev->dev), tqspi);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "failed to request IRQ#%u: %d\n", tqspi->irq, ret);
-		goto exit_pm_disable;
+		goto exit_destroy_wq;
 	}
 
 	ret = spi_register_controller(host);
@@ -1810,7 +1852,9 @@ static int tegra_qspi_probe(struct platform_device *pdev)
 	return 0;
 
 exit_free_irq:
-	free_irq(qspi_irq, tqspi);
+	free_irq(tqspi->irq, tqspi);
+exit_destroy_wq:
+	destroy_workqueue(tqspi->wq);
 exit_pm_disable:
 	pm_runtime_dont_use_autosuspend(&pdev->dev);
 	pm_runtime_force_suspend(&pdev->dev);
@@ -1825,6 +1869,8 @@ static void tegra_qspi_remove(struct platform_device *pdev)
 
 	spi_unregister_controller(host);
 	free_irq(tqspi->irq, tqspi);
+	flush_workqueue(tqspi->wq);
+	destroy_workqueue(tqspi->wq);
 	pm_runtime_dont_use_autosuspend(&pdev->dev);
 	pm_runtime_force_suspend(&pdev->dev);
 	tegra_qspi_deinit_dma(tqspi);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue
  2026-05-18 16:07 ` [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue Vishwaroop A
@ 2026-05-19  9:15   ` Jon Hunter
  0 siblings, 0 replies; 6+ messages in thread
From: Jon Hunter @ 2026-05-19  9:15 UTC (permalink / raw)
  To: Vishwaroop A, Thierry Reding, Mark Brown
  Cc: Laxman Dewangan, Sowjanya Komatineni, Breno Leitao,
	Suresh Mangipudi, Krishna Yarlagadda, linux-tegra, linux-spi,
	linux-kernel



On 18/05/2026 17:07, Vishwaroop A wrote:
> Threaded IRQ handlers suffer from scheduler latency on heavily loaded
> systems, causing false transfer timeouts. Convert to hard IRQ handler
> that schedules work on a high-priority unbound workqueue.
> 
> The hard IRQ handler verifies the interrupt, caches FIFO status,
> clears and masks interrupts, then schedules bottom-half processing.
> The workqueue handler runs in process context (can sleep for DMA)
> and can execute on any CPU, avoiding CPU0 bottlenecks.
> 
> Signed-off-by: Vishwaroop A <va@nvidia.com>
> ---
>   drivers/spi/spi-tegra210-quad.c | 120 ++++++++++++++++++++++----------
>   1 file changed, 83 insertions(+), 37 deletions(-)

...

>   static struct tegra_qspi_soc_data tegra210_qspi_soc_data = {
> @@ -1793,12 +1826,21 @@ static int tegra_qspi_probe(struct platform_device *pdev)
>   
>   	pm_runtime_put_autosuspend(&pdev->dev);
>   
> -	ret = request_threaded_irq(tqspi->irq, NULL,
> -				   tegra_qspi_isr_thread, IRQF_ONESHOT,
> -				   dev_name(&pdev->dev), tqspi);
> +	tqspi->wq = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND, 0,
> +				    dev_name(&pdev->dev));

Why not used devm_alloc_workqueue() here?

> +	if (!tqspi->wq) {
> +		dev_err(&pdev->dev, "failed to allocate workqueue\n");
> +		ret = -ENOMEM;
> +		goto exit_pm_disable;
> +	}
> +
> +	INIT_WORK(&tqspi->irq_work, tegra_qspi_work_handler);
> +
> +	ret = request_irq(tqspi->irq, tegra_qspi_isr, IRQF_SHARED,
> +			  dev_name(&pdev->dev), tqspi);

And use devm_request_irq() here?

>   	if (ret < 0) {
>   		dev_err(&pdev->dev, "failed to request IRQ#%u: %d\n", tqspi->irq, ret);
> -		goto exit_pm_disable;
> +		goto exit_destroy_wq;
>   	}
>   
>   	ret = spi_register_controller(host);
> @@ -1810,7 +1852,9 @@ static int tegra_qspi_probe(struct platform_device *pdev)
>   	return 0;
>   
>   exit_free_irq:
> -	free_irq(qspi_irq, tqspi);
> +	free_irq(tqspi->irq, tqspi);
> +exit_destroy_wq:
> +	destroy_workqueue(tqspi->wq);
>   exit_pm_disable:
>   	pm_runtime_dont_use_autosuspend(&pdev->dev);
>   	pm_runtime_force_suspend(&pdev->dev);
> @@ -1825,6 +1869,8 @@ static void tegra_qspi_remove(struct platform_device *pdev)
>   
>   	spi_unregister_controller(host);
>   	free_irq(tqspi->irq, tqspi);
> +	flush_workqueue(tqspi->wq);
> +	destroy_workqueue(tqspi->wq);
>   	pm_runtime_dont_use_autosuspend(&pdev->dev);
>   	pm_runtime_force_suspend(&pdev->dev);
>   	tegra_qspi_deinit_dma(tqspi);

Jon

-- 
nvpublic


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/3] spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler
  2026-05-18 16:07 [PATCH 0/3] spi: tegra210-quad: Improve interrupt handling for loaded systems Vishwaroop A
  2026-05-18 16:07 ` [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue Vishwaroop A
@ 2026-05-18 16:07 ` Vishwaroop A
  2026-05-19  9:27   ` Jon Hunter
  2026-05-18 16:07 ` [PATCH 3/3] spi: tegra210-quad: Process small PIO transfers in hard IRQ context Vishwaroop A
  2 siblings, 1 reply; 6+ messages in thread
From: Vishwaroop A @ 2026-05-18 16:07 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Mark Brown
  Cc: Vishwaroop A, Laxman Dewangan, Sowjanya Komatineni, Breno Leitao,
	Suresh Mangipudi, Krishna Yarlagadda, linux-tegra, linux-spi,
	linux-kernel

On heavily loaded systems, workqueue scheduling delays can exceed the
transfer timeout even though hardware completes the transfer in
microseconds. The timeout handler cannot distinguish between a real
hardware timeout and a delayed workqueue, causing false timeout errors.

Cache QSPI_TRANS_STATUS in the ISR before clearing it, allowing the
timeout handler to check if hardware completed (QSPI_RDY set) versus
a real timeout (QSPI_RDY not set). This prevents false timeout errors
when the hardware completes but the workqueue is delayed.

Signed-off-by: Vishwaroop A <va@nvidia.com>
---
 drivers/spi/spi-tegra210-quad.c | 42 ++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index a551c7a7f6c4..6148267a51cd 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -214,6 +214,7 @@ struct tegra_qspi {
 	u32					tx_status;
 	u32					rx_status;
 	u32					status_reg;
+	u32					trans_status;
 	bool					is_packed;
 	bool					use_dma;
 
@@ -854,6 +855,7 @@ static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi, struct spi_tran
 	tqspi->cur_rx_pos = 0;
 	tqspi->cur_tx_pos = 0;
 	tqspi->curr_xfer = t;
+	tqspi->trans_status = 0;
 	spin_unlock_irqrestore(&tqspi->lock, flags);
 
 	if (is_first_of_msg) {
@@ -1068,26 +1070,32 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi);
  */
 static int tegra_qspi_handle_timeout(struct tegra_qspi *tqspi)
 {
+	unsigned long flags;
 	irqreturn_t ret;
-	u32 status;
+	u32 trans_status;
 
-	/* Check if hardware actually completed the transfer */
-	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
-	if (!(status & QSPI_RDY))
+	spin_lock_irqsave(&tqspi->lock, flags);
+
+	trans_status = tqspi->trans_status;
+	if (!(trans_status & QSPI_RDY)) {
+		spin_unlock_irqrestore(&tqspi->lock, flags);
 		return -ETIMEDOUT;
+	}
 
 	/*
-	 * Hardware completed but interrupt was lost/delayed. Manually
-	 * process the completion by calling the appropriate handler.
+	 * ISR or workqueue may have already completed the transfer
+	 * and NULLed curr_xfer between the completion timeout and now.
 	 */
+	if (!tqspi->curr_xfer) {
+		spin_unlock_irqrestore(&tqspi->lock, flags);
+		return 0;
+	}
+
+	spin_unlock_irqrestore(&tqspi->lock, flags);
+
 	dev_warn_ratelimited(tqspi->dev,
 			     "QSPI interrupt timeout, but transfer complete\n");
 
-	/* Clear the transfer status */
-	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
-	tegra_qspi_writel(tqspi, status, QSPI_TRANS_STATUS);
-
-	/* Manually trigger completion handler */
 	if (!tqspi->is_curr_dma_xfer)
 		ret = handle_cpu_based_xfer(tqspi);
 	else
@@ -1227,9 +1235,9 @@ static int tegra_qspi_combined_seq_xfer(struct tegra_qspi *tqspi,
 
 			if (ret == 0) {
 				/*
-				 * Check if hardware completed the transfer
-				 * even though interrupt was lost or delayed.
-				 * If so, process the completion and continue.
+				 * Check if hardware completed the transfer even though
+				 * workqueue was delayed. If so, process completion and
+				 * continue.
 				 */
 				ret = tegra_qspi_handle_timeout(tqspi);
 				if (ret < 0) {
@@ -1346,8 +1354,8 @@ static int tegra_qspi_non_combined_seq_xfer(struct tegra_qspi *tqspi,
 		if (ret == 0) {
 			/*
 			 * Check if hardware completed the transfer even though
-			 * interrupt was lost or delayed. If so, process the
-			 * completion and continue.
+			 * workqueue was delayed. If so, process completion and
+			 * continue.
 			 */
 			ret = tegra_qspi_handle_timeout(tqspi);
 			if (ret < 0) {
@@ -1642,6 +1650,8 @@ static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
 	if (!(status & QSPI_RDY))
 		return IRQ_NONE;
 
+	tqspi->trans_status = status;
+
 	spin_lock(&tqspi->lock);
 	tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
 	tegra_qspi_mask_clear_irq(tqspi);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler
  2026-05-18 16:07 ` [PATCH 2/3] spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler Vishwaroop A
@ 2026-05-19  9:27   ` Jon Hunter
  0 siblings, 0 replies; 6+ messages in thread
From: Jon Hunter @ 2026-05-19  9:27 UTC (permalink / raw)
  To: Vishwaroop A, Thierry Reding, Mark Brown
  Cc: Laxman Dewangan, Sowjanya Komatineni, Breno Leitao,
	Suresh Mangipudi, Krishna Yarlagadda, linux-tegra, linux-spi,
	linux-kernel



On 18/05/2026 17:07, Vishwaroop A wrote:
> On heavily loaded systems, workqueue scheduling delays can exceed the
> transfer timeout even though hardware completes the transfer in
> microseconds. The timeout handler cannot distinguish between a real
> hardware timeout and a delayed workqueue, causing false timeout errors.

Can you explain this better? Ie. exactly how does this occur?


> Cache QSPI_TRANS_STATUS in the ISR before clearing it, allowing the
> timeout handler to check if hardware completed (QSPI_RDY set) versus
> a real timeout (QSPI_RDY not set). This prevents false timeout errors
> when the hardware completes but the workqueue is delayed.

The workqueue was introduced in patch 1/3. Did this problem already 
exist with the threaded IRQ?

> 
> Signed-off-by: Vishwaroop A <va@nvidia.com>
> ---
>   drivers/spi/spi-tegra210-quad.c | 42 ++++++++++++++++++++-------------
>   1 file changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
> index a551c7a7f6c4..6148267a51cd 100644
> --- a/drivers/spi/spi-tegra210-quad.c
> +++ b/drivers/spi/spi-tegra210-quad.c
> @@ -214,6 +214,7 @@ struct tegra_qspi {
>   	u32					tx_status;
>   	u32					rx_status;
>   	u32					status_reg;
> +	u32					trans_status;
>   	bool					is_packed;
>   	bool					use_dma;
>   
> @@ -854,6 +855,7 @@ static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi, struct spi_tran
>   	tqspi->cur_rx_pos = 0;
>   	tqspi->cur_tx_pos = 0;
>   	tqspi->curr_xfer = t;
> +	tqspi->trans_status = 0;
>   	spin_unlock_irqrestore(&tqspi->lock, flags);
>   
>   	if (is_first_of_msg) {
> @@ -1068,26 +1070,32 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi);
>    */
>   static int tegra_qspi_handle_timeout(struct tegra_qspi *tqspi)
>   {
> +	unsigned long flags;
>   	irqreturn_t ret;
> -	u32 status;
> +	u32 trans_status;

This variable is only used once and so seem unnecessary to have a local 
variable for this.

>   
> -	/* Check if hardware actually completed the transfer */
> -	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
> -	if (!(status & QSPI_RDY))
> +	spin_lock_irqsave(&tqspi->lock, flags);
> +
> +	trans_status = tqspi->trans_status;
> +	if (!(trans_status & QSPI_RDY)) {
> +		spin_unlock_irqrestore(&tqspi->lock, flags);
>   		return -ETIMEDOUT;
> +	}
>   
>   	/*
> -	 * Hardware completed but interrupt was lost/delayed. Manually
> -	 * process the completion by calling the appropriate handler.
> +	 * ISR or workqueue may have already completed the transfer
> +	 * and NULLed curr_xfer between the completion timeout and now.
>   	 */
> +	if (!tqspi->curr_xfer) {
> +		spin_unlock_irqrestore(&tqspi->lock, flags);
> +		return 0;
> +	}
> +
> +	spin_unlock_irqrestore(&tqspi->lock, flags);
> +
>   	dev_warn_ratelimited(tqspi->dev,
>   			     "QSPI interrupt timeout, but transfer complete\n");
>   
> -	/* Clear the transfer status */
> -	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
> -	tegra_qspi_writel(tqspi, status, QSPI_TRANS_STATUS);
> -
> -	/* Manually trigger completion handler */
>   	if (!tqspi->is_curr_dma_xfer)
>   		ret = handle_cpu_based_xfer(tqspi);
>   	else
> @@ -1227,9 +1235,9 @@ static int tegra_qspi_combined_seq_xfer(struct tegra_qspi *tqspi,
>   
>   			if (ret == 0) {
>   				/*
> -				 * Check if hardware completed the transfer
> -				 * even though interrupt was lost or delayed.
> -				 * If so, process the completion and continue.
> +				 * Check if hardware completed the transfer even though
> +				 * workqueue was delayed. If so, process completion and
> +				 * continue.

Shouldn't this part be part of patch 1/3 because the introduced the 
workqueue?

>   				 */
>   				ret = tegra_qspi_handle_timeout(tqspi);
>   				if (ret < 0) {
> @@ -1346,8 +1354,8 @@ static int tegra_qspi_non_combined_seq_xfer(struct tegra_qspi *tqspi,
>   		if (ret == 0) {
>   			/*
>   			 * Check if hardware completed the transfer even though
> -			 * interrupt was lost or delayed. If so, process the
> -			 * completion and continue.
> +			 * workqueue was delayed. If so, process completion and
> +			 * continue.

Same here.

>   			 */
>   			ret = tegra_qspi_handle_timeout(tqspi);
>   			if (ret < 0) {
> @@ -1642,6 +1650,8 @@ static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
>   	if (!(status & QSPI_RDY))
>   		return IRQ_NONE;
>   
> +	tqspi->trans_status = status;
> +
>   	spin_lock(&tqspi->lock);
>   	tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
>   	tegra_qspi_mask_clear_irq(tqspi);

-- 
nvpublic


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3/3] spi: tegra210-quad: Process small PIO transfers in hard IRQ context
  2026-05-18 16:07 [PATCH 0/3] spi: tegra210-quad: Improve interrupt handling for loaded systems Vishwaroop A
  2026-05-18 16:07 ` [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue Vishwaroop A
  2026-05-18 16:07 ` [PATCH 2/3] spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler Vishwaroop A
@ 2026-05-18 16:07 ` Vishwaroop A
  2 siblings, 0 replies; 6+ messages in thread
From: Vishwaroop A @ 2026-05-18 16:07 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Mark Brown
  Cc: Vishwaroop A, Laxman Dewangan, Sowjanya Komatineni, Breno Leitao,
	Suresh Mangipudi, Krishna Yarlagadda, linux-tegra, linux-spi,
	linux-kernel

On heavily loaded systems, workqueue scheduling delays can exceed
transfer timeouts even for high-priority queues, causing false timeouts
for latency-sensitive devices like TPM despite hardware completing in
microseconds.

Process small PIO transfers (≤256 bytes) directly in hard IRQ context
instead of deferring to workqueue. This reduces completion latency
from 1000ms+ to microseconds and matches the pattern used by other
SPI drivers.

The 256-byte threshold (FIFO depth) ensures small transfers for devices
like TPMs use the fast path, while larger transfers continue using
workqueue.

Signed-off-by: Vishwaroop A <va@nvidia.com>
---
 drivers/spi/spi-tegra210-quad.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index 6148267a51cd..435e14d80bfa 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -1666,6 +1666,15 @@ static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
 
 	spin_unlock(&tqspi->lock);
 
+	/*
+	 * For small PIO transfers (e.g., TPM), process directly in hard IRQ
+	 * context unless there was a FIFO error. Error recovery calls
+	 * device_reset() which can sleep, so must be deferred to workqueue.
+	 */
+	if (!tqspi->is_curr_dma_xfer && tqspi->curr_dma_words <= QSPI_FIFO_DEPTH &&
+	    !tqspi->tx_status && !tqspi->rx_status)
+		return handle_cpu_based_xfer(tqspi);
+
 	queue_work(tqspi->wq, &tqspi->irq_work);
 
 	return IRQ_HANDLED;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-19  9:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 16:07 [PATCH 0/3] spi: tegra210-quad: Improve interrupt handling for loaded systems Vishwaroop A
2026-05-18 16:07 ` [PATCH 1/3] spi: tegra210-quad: Convert to hard IRQ with high-priority workqueue Vishwaroop A
2026-05-19  9:15   ` Jon Hunter
2026-05-18 16:07 ` [PATCH 2/3] spi: tegra210-quad: Cache TRANS_STATUS in ISR for timeout handler Vishwaroop A
2026-05-19  9:27   ` Jon Hunter
2026-05-18 16:07 ` [PATCH 3/3] spi: tegra210-quad: Process small PIO transfers in hard IRQ context Vishwaroop A

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox