* [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler
@ 2026-05-22 9:57 Peng Yang
2026-05-25 14:32 ` Mark Brown
2026-06-08 9:58 ` [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP Peng Yang
0 siblings, 2 replies; 7+ messages in thread
From: Peng Yang @ 2026-05-22 9:57 UTC (permalink / raw)
To: Mark Brown; +Cc: Serge Semin, linux-spi, linux-kernel, pyangyyd, Peng Yang
dw_spi_transfer_handler() can race with dw_spi_handle_err() on SMP.
When an IRQ-based transfer times out, the error path resets the chip
while the IRQ handler is still accessing the FIFO on another CPU.
This causes bus errors on platforms where empty FIFO access is illegal.
Fix by adding a spinlock around FIFO access and chip reset to make them
in serial.
Fixes: 0b6bfad4cee4 ("spi: spi-dw: Remove extraneous locking")
Signed-off-by: Peng Yang <pyangyyd@amazon.com>
---
drivers/spi/spi-dw-core.c | 10 ++++++++++
drivers/spi/spi-dw.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
index b47637888..e2ae04410 100644
--- a/drivers/spi/spi-dw-core.c
+++ b/drivers/spi/spi-dw-core.c
@@ -213,6 +213,7 @@ EXPORT_SYMBOL_NS_GPL(dw_spi_check_status, "SPI_DW_CORE");
static irqreturn_t dw_spi_transfer_handler(struct dw_spi *dws)
{
u16 irq_status = dw_readl(dws, DW_SPI_ISR);
+ unsigned long flags;
if (dw_spi_check_status(dws, false)) {
spi_finalize_current_transfer(dws->ctlr);
@@ -226,7 +227,9 @@ static irqreturn_t dw_spi_transfer_handler(struct dw_spi *dws)
* final stage of the transfer. By doing so we'll get the next IRQ
* right when the leftover incoming data is received.
*/
+ spin_lock_irqsave(&dws->buf_lock, flags);
dw_reader(dws);
+ spin_unlock_irqrestore(&dws->buf_lock, flags);
if (!dws->rx_len) {
dw_spi_mask_intr(dws, 0xff);
spi_finalize_current_transfer(dws->ctlr);
@@ -240,7 +243,9 @@ static irqreturn_t dw_spi_transfer_handler(struct dw_spi *dws)
* have the TXE IRQ flood at the final stage of the transfer.
*/
if (irq_status & DW_SPI_INT_TXEI) {
+ spin_lock_irqsave(&dws->buf_lock, flags);
dw_writer(dws);
+ spin_unlock_irqrestore(&dws->buf_lock, flags);
if (!dws->tx_len)
dw_spi_mask_intr(dws, DW_SPI_INT_TXEI);
}
@@ -468,11 +473,14 @@ static int dw_spi_transfer_one(struct spi_controller *ctlr,
static inline void dw_spi_abort(struct spi_controller *ctlr)
{
struct dw_spi *dws = spi_controller_get_devdata(ctlr);
+ unsigned long flags;
if (dws->dma_mapped)
dws->dma_ops->dma_stop(dws);
+ spin_lock_irqsave(&dws->buf_lock, flags);
dw_spi_reset_chip(dws);
+ spin_unlock_irqrestore(&dws->buf_lock, flags);
}
static void dw_spi_handle_err(struct spi_controller *ctlr,
@@ -939,6 +947,8 @@ int dw_spi_add_controller(struct device *dev, struct dw_spi *dws)
dws->ctlr = ctlr;
dws->dma_addr = (dma_addr_t)(dws->paddr + DW_SPI_DR);
+ spin_lock_init(&dws->buf_lock);
+
spi_controller_set_devdata(ctlr, dws);
/* Basic HW init */
diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h
index 9cc79c566..4c0843e96 100644
--- a/drivers/spi/spi-dw.h
+++ b/drivers/spi/spi-dw.h
@@ -196,6 +196,7 @@ struct dw_spi {
const struct dw_spi_dma_ops *dma_ops;
struct completion dma_completion;
+ spinlock_t buf_lock; /* Serialize FIFO access vs chip reset */
#ifdef CONFIG_DEBUG_FS
struct dentry *debugfs;
struct debugfs_regset32 regset;
--
2.50.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler
2026-05-22 9:57 [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler Peng Yang
@ 2026-05-25 14:32 ` Mark Brown
[not found] ` <TYRP286MB5365F4E370E3F4862A598B56A60B2@TYRP286MB5365.JPNP286.PROD.OUTLOOK.COM>
2026-05-27 8:20 ` Peng Yang
2026-06-08 9:58 ` [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP Peng Yang
1 sibling, 2 replies; 7+ messages in thread
From: Mark Brown @ 2026-05-25 14:32 UTC (permalink / raw)
To: Peng Yang; +Cc: Serge Semin, linux-spi, linux-kernel, Peng Yang
[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]
On Fri, May 22, 2026 at 05:57:27PM +0800, Peng Yang wrote:
> dw_spi_transfer_handler() can race with dw_spi_handle_err() on SMP.
> When an IRQ-based transfer times out, the error path resets the chip
> while the IRQ handler is still accessing the FIFO on another CPU.
> This causes bus errors on platforms where empty FIFO access is illegal.
> Fix by adding a spinlock around FIFO access and chip reset to make them
> in serial.
> @@ -240,7 +243,9 @@ static irqreturn_t dw_spi_transfer_handler(struct dw_spi *dws)
> * have the TXE IRQ flood at the final stage of the transfer.
> */
> if (irq_status & DW_SPI_INT_TXEI) {
> + spin_lock_irqsave(&dws->buf_lock, flags);
> dw_writer(dws);
> + spin_unlock_irqrestore(&dws->buf_lock, flags);
I am having an incredibly hard time convincing myself that these very
tight locking windows are safe, especially given that we're using a
value for irq_status we took before the lock.
> static inline void dw_spi_abort(struct spi_controller *ctlr)
> {
> struct dw_spi *dws = spi_controller_get_devdata(ctlr);
> + unsigned long flags;
>
> if (dws->dma_mapped)
> dws->dma_ops->dma_stop(dws);
>
> + spin_lock_irqsave(&dws->buf_lock, flags);
> dw_spi_reset_chip(dws);
> + spin_unlock_irqrestore(&dws->buf_lock, flags);
> }
What prevents an abort happening between checking the interrupt status
and trying to do a read or write, and/or is it safe for the readers and
writers to run after we reset the hardware? We didn't reset any driver
state AFAICT, just the hardware.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread[parent not found: <TYRP286MB5365F4E370E3F4862A598B56A60B2@TYRP286MB5365.JPNP286.PROD.OUTLOOK.COM>]
* Re: [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler
[not found] ` <TYRP286MB5365F4E370E3F4862A598B56A60B2@TYRP286MB5365.JPNP286.PROD.OUTLOOK.COM>
@ 2026-05-26 11:33 ` Mark Brown
0 siblings, 0 replies; 7+ messages in thread
From: Mark Brown @ 2026-05-26 11:33 UTC (permalink / raw)
To: Yang, Peng
Cc: Peng Yang, Serge Semin, linux-spi@vger.kernel.org,
linux-kernel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]
On Tue, May 26, 2026 at 02:18:39AM +0000, Yang, Peng wrote:
> Hi Mark,
Please don't top post, reply in line with needed context. This allows
readers to readily follow the flow of conversation and understand what
you are talking about and also helps ensure that everything in the
discussion is being addressed.
Please fix your mail client to word wrap within paragraphs at something
substantially less than 80 columns. Doing this makes your messages much
easier to read and reply to.
> The race I'm fixing is specifically about the local variable “max” in
> dw_reader()/dw_writer(). These functions snapshot the FIFO level into
> a local “max" via dw_spi_rx_max()/dw_spi_tx_max(), then iterate
That doesn't mean that's the only possible thing that could race.
> Regarding irq_status being read before the lock: this is intentional.
> irq_status only gates which code path is taken (read vs write). The
> actual safety comes from the FIFO level check inside
> dw_spi_rx_max()/dw_spi_tx_max() being protected by the lock together
> with the DR access.
What happens if between checking the interrupt status and handling the
FIFOs we call dw_spi_handle_err()? The next transfer could be started,
updating the buffer pointers and lengths in dw_spi_transfer_one() but
that doesn't exclude the interrupt handler. That's also already an
issue, but it's complicated by the thin locking windows.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler
2026-05-25 14:32 ` Mark Brown
[not found] ` <TYRP286MB5365F4E370E3F4862A598B56A60B2@TYRP286MB5365.JPNP286.PROD.OUTLOOK.COM>
@ 2026-05-27 8:20 ` Peng Yang
1 sibling, 0 replies; 7+ messages in thread
From: Peng Yang @ 2026-05-27 8:20 UTC (permalink / raw)
To: Mark Brown; +Cc: Serge Semin, linux-spi, linux-kernel, pyangyyd
On Mon, May 26, 2026 at ..., Mark Brown wrote:
> Please don't top post, reply in line with needed context.
Apologies for the format, fixing it here.
> Please fix your mail client to word wrap within paragraphs at
> something substantially less than 80 columns.
Will do.
> That doesn't mean that's the only possible thing that could race.
I've traced through the code paths during my debug. The only
race path is in IRQ mode, where handle_err() calls
dw_spi_reset_chip() from the SPI core kthread while the IRQ
handler is still servicing FIFO interrupts on another CPU.
Poll mode returns 0 so the kthread never sleeps and
handle_err can't be reached concurrently.
DMA mode has its own transfer handler that doesn't access
the FIFO directly.
> What happens if between checking the interrupt status and
> handling the FIFOs we call dw_spi_handle_err()? The next
> transfer could be started, updating the buffer pointers and
> lengths in dw_spi_transfer_one() but that doesn't exclude
> the interrupt handler.
Looking at spi_transfer_one_message(), handle_err() and the
next transfer_one() are called sequentially from the same
kthread:
transfer_one() -> spi_transfer_wait() -> timeout ->
handle_err() -> spi_finalize_current_message() ->
next transfer_one()
A new transfer cannot start until handle_err() returns and
spi_finalize_current_message() completes, so buffer pointers
won't be updated concurrently.
If reset happens between checking irq_status and taking the
lock, dw_reader/dw_writer will see an empty FIFO (max=0)
and exit without accessing DR.
> That's also already an issue, but it's complicated by the
> thin locking windows.
I think this case is safe because the kthread serializes
handle_err and the next transfer_one sequentially, so the
buffer pointers can't be updated while the IRQ handler is
still running.
Please let me know if I'm missing something here.
Best Regards,
Peng Yang
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP
2026-05-22 9:57 [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler Peng Yang
2026-05-25 14:32 ` Mark Brown
@ 2026-06-08 9:58 ` Peng Yang
2026-06-08 13:10 ` Mark Brown
2026-06-09 23:07 ` Mark Brown
1 sibling, 2 replies; 7+ messages in thread
From: Peng Yang @ 2026-06-08 9:58 UTC (permalink / raw)
To: Mark Brown
Cc: Serge Semin, Jonathan Chocron, linux-spi, linux-kernel, pyangyyd,
pyangyyd
On SMP systems, dw_spi_handle_err() can be called from the SPI core
kthread while the IRQ handler is still accessing the FIFO on another
CPU. Resetting the chip via dw_spi_reset_chip() during an active FIFO
read/write causes a bus error.
Fix this by calling disable_irq() before the chip reset, which masks
the IRQ and waits for any in-flight handler to complete via
synchronize_irq(). This ensures no handler is accessing the FIFO when
the reset occurs.
Signed-off-by: Peng Yang <pyangyyd@amazon.com>
Suggested-by: Jonathan Chocron <jonnyc@amazon.com>
---
v1 -> v2:
- Replace spinlock with disable_irq()/enable_irq() as suggested by
Jonathan Chocron. This avoids touching the IRQ handler hot path
entirely.
drivers/spi/spi-dw-core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
--- a/drivers/spi/spi-dw-core.c
+++ b/drivers/spi/spi-dw-core.c
@@ -486,10 +486,12 @@ static inline void dw_spi_abort(struct spi_controller *ctlr)
{
struct dw_spi *dws = spi_controller_get_devdata(ctlr);
if (dws->dma_mapped)
dws->dma_ops->dma_stop(dws);
+ disable_irq(dws->irq);
dw_spi_reset_chip(dws);
+ enable_irq(dws->irq);
}
static void dw_spi_handle_err(struct spi_controller *ctlr,
--
2.40.1
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP
2026-06-08 9:58 ` [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP Peng Yang
@ 2026-06-08 13:10 ` Mark Brown
2026-06-09 23:07 ` Mark Brown
1 sibling, 0 replies; 7+ messages in thread
From: Mark Brown @ 2026-06-08 13:10 UTC (permalink / raw)
To: Peng Yang
Cc: Serge Semin, Jonathan Chocron, linux-spi, linux-kernel, pyangyyd
[-- Attachment #1: Type: text/plain, Size: 593 bytes --]
On Mon, Jun 08, 2026 at 05:58:49PM +0800, Peng Yang wrote:
> On SMP systems, dw_spi_handle_err() can be called from the SPI core
> kthread while the IRQ handler is still accessing the FIFO on another
> CPU. Resetting the chip via dw_spi_reset_chip() during an active FIFO
> read/write causes a bus error.
Please don't send new patches in reply to old patches or serieses, this
makes it harder for both people and tools to understand what is going
on - it can bury things in mailboxes and make it difficult to keep track
of what current patches are, both for the new patches and the old ones.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP
2026-06-08 9:58 ` [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP Peng Yang
2026-06-08 13:10 ` Mark Brown
@ 2026-06-09 23:07 ` Mark Brown
1 sibling, 0 replies; 7+ messages in thread
From: Mark Brown @ 2026-06-09 23:07 UTC (permalink / raw)
To: Peng Yang
Cc: Serge Semin, Jonathan Chocron, linux-spi, linux-kernel, pyangyyd
On Mon, 08 Jun 2026 17:58:49 +0800, Peng Yang wrote:
> spi: dw: fix race between IRQ handler and error handler on SMP
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-7.1
Thanks!
[1/1] spi: dw: fix race between IRQ handler and error handler on SMP
https://git.kernel.org/broonie/spi/c/3c60184e39b5
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-10 10:23 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-22 9:57 [PATCH] spi: dw: fix race between transfer IRQ handler and timeout handler Peng Yang
2026-05-25 14:32 ` Mark Brown
[not found] ` <TYRP286MB5365F4E370E3F4862A598B56A60B2@TYRP286MB5365.JPNP286.PROD.OUTLOOK.COM>
2026-05-26 11:33 ` Mark Brown
2026-05-27 8:20 ` Peng Yang
2026-06-08 9:58 ` [PATCH v2] spi: dw: fix race between IRQ handler and error handler on SMP Peng Yang
2026-06-08 13:10 ` Mark Brown
2026-06-09 23:07 ` Mark Brown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox