From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 496D93CF042 for ; Tue, 19 May 2026 07:18:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779175096; cv=none; b=cGTjOpP4xOb+Rph/pjGqn4P4J/j+Lim/ugTnN3sG6PAcgc6Dj/cj+4hQYEkYLcb4zlLfSD3C1DnRrEGhDcNJEVh9kO07uvqf2JigfCPqAUxxkaxKbUKMUKa72MDCp8Nu6fb2wkjxE3BQgQZ5b5oH/DC1h0ZB2lzkj4RNc2QavC8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779175096; c=relaxed/simple; bh=NXfQyjSa8/XG4PHYnjXFo12TwzRr3wd8DLMGw/d8n7k=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=X6Qy8JWpfc4huwIkXVNQBoPCai5r0arnfKC3m0+wPWiFdNPTqlC8nafCEyjLj1312pKNCZPmGfXu2ohA+MdXyCllpRM1caAg/KJkYgNcLfXgIX4RIoNKmDG8olBOWXL7Rsqf5DRiWxCrrZvYwqrmb256RcC+z5zM/DV4yfIpXlw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ZEVcD5r6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ZEVcD5r6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EC443C2BCC6; Tue, 19 May 2026 07:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779175096; bh=NXfQyjSa8/XG4PHYnjXFo12TwzRr3wd8DLMGw/d8n7k=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date:From; b=ZEVcD5r6hpB21NJerWuU5GbVTPTG9SXT16k9aeWctEc8x1Pd2xLoFppf27xGRMTlG SiEJ9J6Qr5LCg5tr63TMRpMURW+thobti4gs5SzEOpVsGehHMghypFoiQ7IEZMWzSj yqXAWaqA6O+5bAfrJS8/LlSfBd38eVNjI5WjsL5sx3vuJbIgivyPCrLX+H35ZEzsk0 FBczunyQZ5maquMDgkSnRuGArgvKYHW4r4B0kegtRsHOtfp3YwBtMZXthrjS4UwL2y 2HLYNz8sUZ+gvSj40NU5l2vVFrVTSEcaJOFlDUrLCQRLKwWBPwDfNbuSdvWTaUwAGT ac7Bv1a5jLsPg== From: sashiko-bot@kernel.org Subject: Re: [PATCH] dma: altera-msgdma: Replace memcpy with io32write in msgdma_copy_one Reply-To: sashiko-reviews@lists.linux.dev To: tze.yee.ng@altera.com Cc: Frank.Li@kernel.org, dmaengine@vger.kernel.org, vkoul@kernel.org In-Reply-To: <4586c39b43aa3b9480989940fe905dac40c8cefc.1779173156.git.tze.yee.ng@altera.com> References: <4586c39b43aa3b9480989940fe905dac40c8cefc.1779173156.git.tze.yee.ng@altera.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 19 May 2026 07:18:15 +0000 Message-Id: <20260519071815.EC443C2BCC6@smtp.kernel.org> Precedence: bulk X-Mailing-List: dmaengine@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 3 potential issue(= s) to consider: - [High] Unbounded busy-waiting with mdelay(1) inside a hard IRQ handler ca= uses severe latency spikes and hard lockups. (Pre-existing issue) - [High] Clearing the interrupt status after starting a new transfer create= s a race condition leading to lost interrupts and a hung DMA pipeline. (Pre= -existing issue) - [Medium] The patch introduces excessive heavy memory barriers on the DMA = fast path by using iowrite32() in a loop. -- commit 64f4d2469744b57d2ca64e78f8cef229de2c5eae Author: Adrian Ng Ho Yin dma: altera-msgdma: Replace memcpy with io32write in msgdma_copy_one This commit replaces memcpy() with an explicit iowrite32() loop when writing descriptors into the DMA controller's FIFO. This ensures that MMIO writes are not reordered by the compiler or architecture, guaranteeing that the control word is always written last. > diff --git a/drivers/dma/altera-msgdma.c b/drivers/dma/altera-msgdma.c > index b46999c81df00..5816973d2c70a 100644 > --- a/drivers/dma/altera-msgdma.c > +++ b/drivers/dma/altera-msgdma.c > @@ -495,6 +495,9 @@ static void msgdma_copy_one(struct msgdma_device *mde= v, > struct msgdma_sw_desc *desc) > { > void __iomem *hw_desc =3D mdev->desc; > + const u32 *src =3D (const u32 *)&desc->hw_desc; > + unsigned int i, nwords =3D offsetof(struct msgdma_extended_desc, contro= l) / > + sizeof(u32); > =20 > /* > * Check if the DESC FIFO it not full. If its full, we need to wait > @@ -505,16 +508,16 @@ static void msgdma_copy_one(struct msgdma_device *m= dev, > mdelay(1); This isn't a regression from this patch, but can this unbounded busy-wait inside a hard IRQ handler cause severe latency spikes and hard lockups? If a large scatter-gather list is submitted that exceeds the hardware FIFO size, msgdma_start_transfer() will attempt to push all of them at once. The CPU will spin in mdelay(1) for each subsequent descriptor while the DMA engine processes the transfers. Waiting for hundreds of milliseconds with IRQs disabled could trigger a ker= nel hard lockup watchdog panic. Furthermore, if the hardware stalls due to a bus error, the CPU will spin forever. Would it be possible to implement a proper timeout mechanism and software queuing here instead? [ ... ] > - * The descriptor needs to get copied into the descriptor FIFO > - * of the DMA controller. The descriptor will get flushed to the > - * FIFO, once the last word (control word) is written. Since we > - * are not 100% sure that memcpy() writes all word in the "correct" > - * order (address from low to high) on all architectures, we make > - * sure this control word is written last by single coding it and > - * adding some write-barriers here. > + * The descriptor must be written into the descriptor FIFO of the DMA > + * controller. The FIFO is flushed and the descriptor becomes valid once > + * the last word (the control word) is written. To guarantee the orderi= ng > + * of MMIO writes across all architectures, we write each 32-bit word > + * individually using iowrite32(), and handle the control word separate= ly > + * at the end. This ensures the control word is always written last and > + * prevents memcpy() or the compiler from reordering accesses. > */ > - memcpy((void __force *)hw_desc, &desc->hw_desc, > - sizeof(desc->hw_desc) - sizeof(u32)); > + for (i =3D 0; i < nwords; i++) > + iowrite32(src[i], hw_desc + i * sizeof(u32)); Does this introduce excessive heavy memory barriers on the DMA fast path? The iowrite32() macro maps to writel(), which includes an implicit heavy memory barrier before the write to ensure memory accesses are visible to the device. Executing this barrier in a tight loop for every word of the DMA descriptor could significantly degrade performance on the submission hot path. Since the hardware merely requires the descriptor words to be written sequentially before writing the final control word, could writel_relaxed() be used instead to avoid the per-word barrier overhead? An explicit wmb() is already present before the final control word write to ensure correct ordering. This isn't a regression from this patch, but could clearing the interrupt status after starting a new transfer create a race condition leading to lost interrupts and a hung DMA pipeline? drivers/dma/altera-msgdma.c:msgdma_irq_handler() { if ((status & MSGDMA_CSR_STAT_BUSY) =3D=3D 0) { /* Start next transfer if the DMA controller is idle */ spin_lock(&mdev->lock); mdev->idle =3D true; msgdma_start_transfer(mdev); spin_unlock(&mdev->lock); } tasklet_schedule(&mdev->irq_tasklet); /* Clear interrupt in mSGDMA controller */ iowrite32(MSGDMA_CSR_STAT_IRQ, mdev->csr + MSGDMA_CSR_STATUS); } If the newly started DMA transfer completes extremely quickly, before the CPU reaches the iowrite32() instruction, the hardware will assert the IRQ bit again. The unconditional clear will then clear this new interrupt event, and becau= se the event is lost, the tasklet will not be scheduled for the second transfe= r. On systems without a response FIFO, the tasklet completes exactly one descriptor per run, so losing the interrupt could strand the second descrip= tor in the active_list permanently. Should the interrupt status be cleared before initiating the next transfer? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/4586c39b43aa3b94809= 89940fe905dac40c8cefc.1779173156.git.tze.yee.ng@altera.com?part=3D1