Re: [PATCH] dma: altera-msgdma: Replace memcpy with io32write in msgdma_copy_one

DMA Engine development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: tze.yee.ng@altera.com
Cc: Frank.Li@kernel.org, dmaengine@vger.kernel.org, vkoul@kernel.org
Subject: Re: [PATCH] dma: altera-msgdma: Replace memcpy with io32write in msgdma_copy_one
Date: Tue, 19 May 2026 07:18:15 +0000	[thread overview]
Message-ID: <20260519071815.EC443C2BCC6@smtp.kernel.org> (raw)
In-Reply-To: <4586c39b43aa3b9480989940fe905dac40c8cefc.1779173156.git.tze.yee.ng@altera.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Unbounded busy-waiting with mdelay(1) inside a hard IRQ handler causes severe latency spikes and hard lockups. (Pre-existing issue)
- [High] Clearing the interrupt status after starting a new transfer creates a race condition leading to lost interrupts and a hung DMA pipeline. (Pre-existing issue)
- [Medium] The patch introduces excessive heavy memory barriers on the DMA fast path by using iowrite32() in a loop.
--

commit 64f4d2469744b57d2ca64e78f8cef229de2c5eae
Author: Adrian Ng Ho Yin <adrianhoyin.ng@altera.com>

dma: altera-msgdma: Replace memcpy with io32write in msgdma_copy_one

This commit replaces memcpy() with an explicit iowrite32() loop when
writing descriptors into the DMA controller's FIFO. This ensures that
MMIO writes are not reordered by the compiler or architecture, guaranteeing
that the control word is always written last.

> diff --git a/drivers/dma/altera-msgdma.c b/drivers/dma/altera-msgdma.c
> index b46999c81df00..5816973d2c70a 100644
> --- a/drivers/dma/altera-msgdma.c
> +++ b/drivers/dma/altera-msgdma.c
> @@ -495,6 +495,9 @@ static void msgdma_copy_one(struct msgdma_device *mdev,
>  			    struct msgdma_sw_desc *desc)
>  {
>  	void __iomem *hw_desc = mdev->desc;
> +	const u32 *src = (const u32 *)&desc->hw_desc;
> +	unsigned int i, nwords = offsetof(struct msgdma_extended_desc, control) /
> +				 sizeof(u32);
>  
>  	/*
>  	 * Check if the DESC FIFO it not full. If its full, we need to wait
> @@ -505,16 +508,16 @@ static void msgdma_copy_one(struct msgdma_device *mdev,
>  		mdelay(1);

This isn't a regression from this patch, but can this unbounded busy-wait
inside a hard IRQ handler cause severe latency spikes and hard lockups?

If a large scatter-gather list is submitted that exceeds the hardware FIFO
size, msgdma_start_transfer() will attempt to push all of them at once.
The CPU will spin in mdelay(1) for each subsequent descriptor while the DMA
engine processes the transfers.

Waiting for hundreds of milliseconds with IRQs disabled could trigger a kernel
hard lockup watchdog panic. Furthermore, if the hardware stalls due to a bus
error, the CPU will spin forever.

Would it be possible to implement a proper timeout mechanism and software
queuing here instead?

[ ... ]

> -	 * The descriptor needs to get copied into the descriptor FIFO
> -	 * of the DMA controller. The descriptor will get flushed to the
> -	 * FIFO, once the last word (control word) is written. Since we
> -	 * are not 100% sure that memcpy() writes all word in the "correct"
> -	 * order (address from low to high) on all architectures, we make
> -	 * sure this control word is written last by single coding it and
> -	 * adding some write-barriers here.
> +	 * The descriptor must be written into the descriptor FIFO of the DMA
> +	 * controller. The FIFO is flushed and the descriptor becomes valid once
> +	 * the last word (the control word) is written. To guarantee the ordering
> +	 * of MMIO writes across all architectures, we write each 32-bit word
> +	 * individually using iowrite32(), and handle the control word separately
> +	 * at the end. This ensures the control word is always written last and
> +	 * prevents memcpy() or the compiler from reordering accesses.
>  	 */
> -	memcpy((void __force *)hw_desc, &desc->hw_desc,
> -	       sizeof(desc->hw_desc) - sizeof(u32));
> +	for (i = 0; i < nwords; i++)
> +		iowrite32(src[i], hw_desc + i * sizeof(u32));

Does this introduce excessive heavy memory barriers on the DMA fast path?

The iowrite32() macro maps to writel(), which includes an implicit heavy
memory barrier before the write to ensure memory accesses are visible to the
device.

Executing this barrier in a tight loop for every word of the DMA descriptor
could significantly degrade performance on the submission hot path.

Since the hardware merely requires the descriptor words to be written
sequentially before writing the final control word, could writel_relaxed()
be used instead to avoid the per-word barrier overhead? An explicit wmb()
is already present before the final control word write to ensure correct
ordering.

This isn't a regression from this patch, but could clearing the interrupt
status after starting a new transfer create a race condition leading to
lost interrupts and a hung DMA pipeline?

drivers/dma/altera-msgdma.c:msgdma_irq_handler() {
	if ((status & MSGDMA_CSR_STAT_BUSY) == 0) {
		/* Start next transfer if the DMA controller is idle */
		spin_lock(&mdev->lock);
		mdev->idle = true;
		msgdma_start_transfer(mdev);
		spin_unlock(&mdev->lock);
	}

	tasklet_schedule(&mdev->irq_tasklet);

	/* Clear interrupt in mSGDMA controller */
	iowrite32(MSGDMA_CSR_STAT_IRQ, mdev->csr + MSGDMA_CSR_STATUS);
}

If the newly started DMA transfer completes extremely quickly, before the
CPU reaches the iowrite32() instruction, the hardware will assert the IRQ
bit again.

The unconditional clear will then clear this new interrupt event, and because
the event is lost, the tasklet will not be scheduled for the second transfer.

On systems without a response FIFO, the tasklet completes exactly one
descriptor per run, so losing the interrupt could strand the second descriptor
in the active_list permanently.

Should the interrupt status be cleared before initiating the next transfer?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/4586c39b43aa3b9480989940fe905dac40c8cefc.1779173156.git.tze.yee.ng@altera.com?part=1

next prev parent reply	other threads:[~2026-05-19  7:18 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  6:47 [PATCH] dma: altera-msgdma: Replace memcpy with io32write in msgdma_copy_one tze.yee.ng
2026-05-19  7:18 ` sashiko-bot [this message]
2026-05-20  2:58   ` NG, TZE YEE
2026-05-20 21:29 ` Frank Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260519071815.EC443C2BCC6@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=Frank.Li@kernel.org \
    --cc=dmaengine@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tze.yee.ng@altera.com \
    --cc=vkoul@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox