dmaengine.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] dmaengine: ti: k3-udma: Fix/improve the completion helper
@ 2025-07-31 16:41 Miquel Raynal
  2025-07-31 16:41 ` [PATCH 1/3] dmaengine: ti: k3-udma: Reduce completion polling delay Miquel Raynal
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Miquel Raynal @ 2025-07-31 16:41 UTC (permalink / raw)
  To: Peter Ujfalusi, Vinod Koul, Grygorii Strashko
  Cc: Thomas Petazzoni, Peter Ujfalusi, dmaengine, linux-kernel,
	Miquel Raynal, stable

While working on a BeaglePlay with the UART controller and making it use
DMA, I figured the DMA controller completion helper was clearly
misbehaving.

On one side, with slow devices like a UART controller, the helper uses
an unusual polling delay of 1s.

On the other side, the helper can also perform 0us sleeps which means
the driver does not sleep at all in some situations and keeps the CPU
busy waiting.

Finally, while I was digging into these issues, it felt like the helper
was a bit too complex and could be simplified, which is what I did in
patch 3. I initially worked on v6.7.x which did not had the spinlocks, I
hope I didn't got them wrong.

I also tested this with v6.17-rc7.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Miquel Raynal (3):
      dmaengine: ti: k3-udma: Reduce completion polling delay
      dmaengine: ti: k3-udma: Ensure a minimum polling delay
      dmaengine: ti: k3-udma: Simplify the completion worker

 drivers/dma/ti/k3-udma.c | 85 ++++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 49 deletions(-)
---
base-commit: b59220b9fa2c3da4295d71913146cd64b869fcf9
change-id: 20250731-ti-dma-timeout-1c64868b5baf

Best regards,
-- 
Miquel Raynal <miquel.raynal@bootlin.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3] dmaengine: ti: k3-udma: Reduce completion polling delay
  2025-07-31 16:41 [PATCH 0/3] dmaengine: ti: k3-udma: Fix/improve the completion helper Miquel Raynal
@ 2025-07-31 16:41 ` Miquel Raynal
  2025-07-31 16:41 ` [PATCH 2/3] dmaengine: ti: k3-udma: Ensure a minimum " Miquel Raynal
  2025-07-31 16:41 ` [PATCH 3/3] dmaengine: ti: k3-udma: Simplify the completion worker Miquel Raynal
  2 siblings, 0 replies; 4+ messages in thread
From: Miquel Raynal @ 2025-07-31 16:41 UTC (permalink / raw)
  To: Peter Ujfalusi, Vinod Koul, Grygorii Strashko
  Cc: Thomas Petazzoni, Peter Ujfalusi, dmaengine, linux-kernel,
	Miquel Raynal, stable

DMA transfers may be slow if the peer device is slow (eg. a UART
controller, a SPI controller...). The completion worker is started very
early compared to the actual draining speed, leading to low speed
devices suffering from a full 1s delay before noticing the transfers are
done, further delaying the execution of their callback.

1s seems overly large for a polling delay, reduce it to arbitrarily
100us which is the minimum amount of time to transfer a byte (and see
some progress on the residue variable) on a 100kHz bus. Based on this
first measure, the next sleep will be much closer to what is actually
needed for the transfer to complete.

The 1 second polling delay involves that any device driver with a (very
standard) 1 second timeout will error out indicating a transfer error,
whereas the data has actually been transferred correctly.

Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA")
Cc: stable@vger.kernel.org # 5.7
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
This issue has been observed by playing with the DMA controller on
Beagle Play.

The patch will not apply before v5.7 and probably do not need to be
backported before anyway.
---
 drivers/dma/ti/k3-udma.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index aa2dc762140f6eee334f4506a592e72600ae9834..b2059baed1b2ffc81c10feca797c763e2a04a357 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1115,19 +1115,18 @@ static void udma_check_tx_completion(struct work_struct *work)
 			time_diff = ktime_sub(uc->tx_drain.tstamp,
 					      time_diff) + 1;
 			residue_diff -= uc->tx_drain.residue;
+			/*
+			 * Try to guess when we should check next time by
+			 * calculating rate at which data is being drained at
+			 * the peer device. Slow devices might have not yet
+			 * started, showing no progress. Use an arbitrary delay
+			 * in this case.
+			 */
 			if (residue_diff) {
-				/*
-				 * Try to guess when we should check
-				 * next time by calculating rate at
-				 * which data is being drained at the
-				 * peer device
-				 */
 				delay = (time_diff / residue_diff) *
 					uc->tx_drain.residue;
 			} else {
-				/* No progress, check again in 1 second  */
-				schedule_delayed_work(&uc->tx_drain.work, HZ);
-				break;
+				delay = 100000;
 			}
 
 			spin_unlock_irqrestore(&uc->vc.lock, flags);

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/3] dmaengine: ti: k3-udma: Ensure a minimum polling delay
  2025-07-31 16:41 [PATCH 0/3] dmaengine: ti: k3-udma: Fix/improve the completion helper Miquel Raynal
  2025-07-31 16:41 ` [PATCH 1/3] dmaengine: ti: k3-udma: Reduce completion polling delay Miquel Raynal
@ 2025-07-31 16:41 ` Miquel Raynal
  2025-07-31 16:41 ` [PATCH 3/3] dmaengine: ti: k3-udma: Simplify the completion worker Miquel Raynal
  2 siblings, 0 replies; 4+ messages in thread
From: Miquel Raynal @ 2025-07-31 16:41 UTC (permalink / raw)
  To: Peter Ujfalusi, Vinod Koul, Grygorii Strashko
  Cc: Thomas Petazzoni, Peter Ujfalusi, dmaengine, linux-kernel,
	Miquel Raynal, stable

ktime_to_us() returns 0 if the time (ktime_t stores nanoseconds) is too
small, leading to a while loop without sleeps, kind of conflicting with
the initial aim of this function at observing a small delay and then
guessing the amount of time needed to finish draining the descriptor.

Make sure we always sleep for (abitrarily) at least 1us to reduce
pressure on the CPU, while not waiting too much either.

Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA")
Cc: stable@vger.kernel.org # 5.7
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/dma/ti/k3-udma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index b2059baed1b2ffc81c10feca797c763e2a04a357..11232b306475edd5e1ed75d938bbf49ed9c2aabd 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1125,6 +1125,8 @@ static void udma_check_tx_completion(struct work_struct *work)
 			if (residue_diff) {
 				delay = (time_diff / residue_diff) *
 					uc->tx_drain.residue;
+				if (delay < 1000)
+					delay = 1000;
 			} else {
 				delay = 100000;
 			}

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 3/3] dmaengine: ti: k3-udma: Simplify the completion worker
  2025-07-31 16:41 [PATCH 0/3] dmaengine: ti: k3-udma: Fix/improve the completion helper Miquel Raynal
  2025-07-31 16:41 ` [PATCH 1/3] dmaengine: ti: k3-udma: Reduce completion polling delay Miquel Raynal
  2025-07-31 16:41 ` [PATCH 2/3] dmaengine: ti: k3-udma: Ensure a minimum " Miquel Raynal
@ 2025-07-31 16:41 ` Miquel Raynal
  2 siblings, 0 replies; 4+ messages in thread
From: Miquel Raynal @ 2025-07-31 16:41 UTC (permalink / raw)
  To: Peter Ujfalusi, Vinod Koul, Grygorii Strashko
  Cc: Thomas Petazzoni, Peter Ujfalusi, dmaengine, linux-kernel,
	Miquel Raynal

The function does nothing in the !uc->desc condition.

No need to go through the entire delay guessing logic if the descriptor
is already done when we first check.

Invert the "no descriptor" and "descriptor done" conditions. This
greatly simplifies the function by dropping two indentation levels and
improves the readability.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/dma/ti/k3-udma.c | 86 ++++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 50 deletions(-)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index 11232b306475edd5e1ed75d938bbf49ed9c2aabd..fb323df15a1b6693d917750f18f907e63bb38c53 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1087,69 +1087,55 @@ static void udma_check_tx_completion(struct work_struct *work)
 {
 	struct udma_chan *uc = container_of(work, typeof(*uc),
 					    tx_drain.work.work);
-	bool desc_done = true;
+	struct udma_desc *d = uc->desc;
+	bool desc_done;
 	u32 residue_diff;
 	ktime_t time_diff;
 	unsigned long delay;
 	unsigned long flags;
 
+	if (!d)
+		return;
+
 	while (1) {
 		spin_lock_irqsave(&uc->vc.lock, flags);
 
-		if (uc->desc) {
-			/* Get previous residue and time stamp */
-			residue_diff = uc->tx_drain.residue;
-			time_diff = uc->tx_drain.tstamp;
-			/*
-			 * Get current residue and time stamp or see if
-			 * transfer is complete
-			 */
-			desc_done = udma_is_desc_really_done(uc, uc->desc);
-		}
-
-		if (!desc_done) {
-			/*
-			 * Find the time delta and residue delta w.r.t
-			 * previous poll
-			 */
-			time_diff = ktime_sub(uc->tx_drain.tstamp,
-					      time_diff) + 1;
-			residue_diff -= uc->tx_drain.residue;
-			/*
-			 * Try to guess when we should check next time by
-			 * calculating rate at which data is being drained at
-			 * the peer device. Slow devices might have not yet
-			 * started, showing no progress. Use an arbitrary delay
-			 * in this case.
-			 */
-			if (residue_diff) {
-				delay = (time_diff / residue_diff) *
-					uc->tx_drain.residue;
-				if (delay < 1000)
-					delay = 1000;
-			} else {
-				delay = 100000;
-			}
-
-			spin_unlock_irqrestore(&uc->vc.lock, flags);
-
-			usleep_range(ktime_to_us(delay),
-				     ktime_to_us(delay) + 10);
-			continue;
-		}
-
-		if (uc->desc) {
-			struct udma_desc *d = uc->desc;
-
-			udma_decrement_byte_counters(uc, d->residue);
-			udma_start(uc);
-			vchan_cookie_complete(&d->vd);
+		/* Get previous residue and time stamp */
+		residue_diff = uc->tx_drain.residue;
+		time_diff = uc->tx_drain.tstamp;
+		/* Get current residue and time stamp or see if transfer is complete */
+		desc_done = udma_is_desc_really_done(uc, uc->desc);
+		if (desc_done)
 			break;
+
+		/* Find the time delta and residue delta w.r.t previous poll */
+		time_diff = ktime_sub(uc->tx_drain.tstamp, time_diff) + 1;
+		residue_diff -= uc->tx_drain.residue;
+		/*
+		 * Try to guess when we should check next time by
+		 * calculating rate at which data is being drained at
+		 * the peer device. Slow devices might have not yet
+		 * started, showing no progress. Use an arbitrary delay
+		 * in this case.
+		 */
+		if (residue_diff) {
+			delay = (time_diff / residue_diff) * uc->tx_drain.residue;
+			if (delay < 1000)
+				delay = 1000;
+		} else {
+			delay = 100000;
 		}
 
-		break;
+		spin_unlock_irqrestore(&uc->vc.lock, flags);
+
+		usleep_range(ktime_to_us(delay),
+			     ktime_to_us(delay) + 10);
 	}
 
+	udma_decrement_byte_counters(uc, d->residue);
+	udma_start(uc);
+	vchan_cookie_complete(&d->vd);
+
 	spin_unlock_irqrestore(&uc->vc.lock, flags);
 }
 

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-07-31 16:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-31 16:41 [PATCH 0/3] dmaengine: ti: k3-udma: Fix/improve the completion helper Miquel Raynal
2025-07-31 16:41 ` [PATCH 1/3] dmaengine: ti: k3-udma: Reduce completion polling delay Miquel Raynal
2025-07-31 16:41 ` [PATCH 2/3] dmaengine: ti: k3-udma: Ensure a minimum " Miquel Raynal
2025-07-31 16:41 ` [PATCH 3/3] dmaengine: ti: k3-udma: Simplify the completion worker Miquel Raynal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).