* [PATCH 0/2] gpu: host1x: syncpt_wait micro-optimizations
@ 2026-05-14 10:31 Tanmay Patil
  2026-05-14 10:31 ` [PATCH 1/2] gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait() Tanmay Patil
  2026-05-14 10:31 ` [PATCH 2/2] gpu: host1x: skip redundant HW state update Tanmay Patil
  0 siblings, 2 replies; 3+ messages in thread
From: Tanmay Patil @ 2026-05-14 10:31 UTC
  To: Thierry Reding, Mikko Perttunen
  Cc: David Airlie, Simona Vetter, dri-devel, linux-tegra, linux-kernel,
	Tanmay Patil

This series reduces latency in the host1x syncpoint wait path.

Patch 1 removes redundant MMIO reads in host1x_syncpt_wait().
Patch 2 skips the host1x_intr_update_hw_state() call in the ISR
when no fences remain.

Measured syncpoint wait latency (50000 samples):
  Average latency:   12.2 us  -> 9.4 us
  99.99 pct latency: 62.96 us -> 36.58 us

Tanmay Patil (2):
  gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait()
  gpu: host1x: skip redundant HW state update

 drivers/gpu/host1x/intr.c   |  8 ++++++--
 drivers/gpu/host1x/syncpt.c | 23 ++++++++++++++---------
 2 files changed, 20 insertions(+), 11 deletions(-)

-- 
2.54.0



* [PATCH 1/2] gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait()
From: Tanmay Patil @ 2026-05-14 10:31 UTC
  To: Thierry Reding, Mikko Perttunen
  Cc: David Airlie, Simona Vetter, dri-devel, linux-tegra, linux-kernel,
	Tanmay Patil

host1x_syncpt_wait() loaded the hardware syncpoint value once for the
expiry check and then a second time to populate the caller's value
pointer. Reuse a single load for both purposes.

After dma_fence_wait_timeout(), the previous code reloaded the syncpoint
value for the expiry check, although a fresh value is only required in
the timeout case. On success (i.e., return value > 0, or return value
== 0 with zero jiffies remaining), the ISR has already cached the value
before signaling the fence, so the value pointer can be populated from
the cache with host1x_syncpt_read_min(), without MMIO access. Only the
timeout path requires a fresh load, so move host1x_syncpt_load() under
that path.
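
For illustration only, the post-wait decision described above can be
modeled in userspace. This is a sketch under stated assumptions, not the
driver code: struct model_syncpt and every model_*() function are
hypothetical stand-ins, the expiry check is simplified to a signed
difference on the cached value, and waiter_mmio_reads counts only the
reads issued by the waiter (the ISR's own read is not counted).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct host1x_syncpt. */
struct model_syncpt {
	uint32_t hw_val;                /* what an MMIO read would return */
	uint32_t min_val;               /* value cached by a load or by the ISR */
	unsigned int waiter_mmio_reads; /* simulated MMIO reads from the waiter */
};

/* Models host1x_syncpt_load(): read hardware, refresh the cache. */
uint32_t model_load(struct model_syncpt *sp)
{
	sp->waiter_mmio_reads++;
	sp->min_val = sp->hw_val;
	return sp->min_val;
}

/* Models host1x_syncpt_read_min(): return the cache, no MMIO. */
uint32_t model_read_min(const struct model_syncpt *sp)
{
	return sp->min_val;
}

/* Simplified, wraparound-tolerant expiry check on the cached value. */
int model_is_expired(const struct model_syncpt *sp, uint32_t thresh)
{
	return (int32_t)(sp->min_val - thresh) >= 0;
}

/* Models the ISR caching the hardware value before signaling the fence. */
void model_isr(struct model_syncpt *sp)
{
	sp->min_val = sp->hw_val;
}

/*
 * Models the tail of the wait path after the fence wait returned wait_err:
 * only the timeout branch pays for a fresh load.
 */
int model_wait_tail(struct model_syncpt *sp, uint32_t thresh,
		    long wait_err, uint32_t *value)
{
	assert(sp != NULL);

	if (wait_err == 0 && !model_is_expired(sp, thresh)) {
		/* Timeout: report a fresh hardware value. */
		if (value)
			*value = model_load(sp);
		return -EAGAIN;
	} else if (wait_err < 0) {
		return (int)wait_err;
	}

	/* Success: reuse the value cached by the ISR. */
	if (value)
		*value = model_read_min(sp);
	return 0;
}
```

Calling model_isr() followed by model_wait_tail() with a positive
wait_err leaves waiter_mmio_reads at zero, while a zero wait_err with an
unexpired threshold performs exactly one simulated read.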

Measured syncpoint wait latency (50000 samples):
  Average latency:   12.2 us  -> 10.6 us
  99.99 pct latency: 62.96 us -> 51.90 us

Signed-off-by: Tanmay Patil <tanmayp@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index acc7d82e0585..807c74fc6a0a 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -222,11 +222,12 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 {
 	struct dma_fence *fence;
 	long wait_err;
+	u32 curr;
 
-	host1x_hw_syncpt_load(sp->host, sp);
+	curr = host1x_syncpt_load(sp);
 
 	if (value)
-		*value = host1x_syncpt_load(sp);
+		*value = curr;
 
 	if (host1x_syncpt_is_expired(sp, thresh))
 		return 0;
@@ -245,21 +246,25 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		host1x_fence_cancel(fence);
 	dma_fence_put(fence);
 
-	if (value)
-		*value = host1x_syncpt_load(sp);
-
 	/*
 	 * Don't rely on dma_fence_wait_timeout return value,
 	 * since it returns zero both on timeout and if the
 	 * wait completed with 0 jiffies left.
 	 */
-	host1x_hw_syncpt_load(sp->host, sp);
-	if (wait_err == 0 && !host1x_syncpt_is_expired(sp, thresh))
+	if (wait_err == 0 && !host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = host1x_syncpt_load(sp);
+
 		return -EAGAIN;
-	else if (wait_err < 0)
+	} else if (wait_err < 0) {
 		return wait_err;
-	else
+	} else {
+		/* Success, read the value cached by ISR */
+		if (value)
+			*value = host1x_syncpt_read_min(sp);
+
 		return 0;
+	}
 }
 EXPORT_SYMBOL(host1x_syncpt_wait);
 
-- 
2.54.0



* [PATCH 2/2] gpu: host1x: skip redundant HW state update
From: Tanmay Patil @ 2026-05-14 10:31 UTC
  To: Thierry Reding, Mikko Perttunen
  Cc: David Airlie, Simona Vetter, dri-devel, linux-tegra, linux-kernel,
	Tanmay Patil

When the fence list is empty, host1x_intr_update_hw_state() falls
through to host1x_intr_disable_syncpt_intr(), which performs two MMIO
writes to disable the syncpoint interrupt and clear its status.

The ISR has already disabled and acked the interrupt before calling
host1x_intr_handle_interrupt(), making these two writes redundant.
Skip the host1x_intr_update_hw_state() call if no fences remain.
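
For illustration only, the redundancy can be modeled in userspace. This
is a sketch under stated assumptions, not the driver code: struct
model_intr and the model_*() functions are hypothetical, mmio_writes
counts only writes issued from the state update, and the cost of the
re-arm branch (two writes) is an assumption made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-syncpoint interrupt state. */
struct model_intr {
	size_t pending_fences;    /* fences still queued on the syncpoint */
	bool enabled;             /* simulated interrupt enable bit */
	unsigned int mmio_writes; /* writes issued by the state update only */
};

/* Models the ISR entry: interrupt disabled and acked before handling. */
void model_isr_entry(struct model_intr *it)
{
	it->enabled = false;
}

/* Models host1x_intr_update_hw_state(). */
void model_update_hw_state(struct model_intr *it)
{
	if (it->pending_fences == 0) {
		/* Disable and clear status: two writes, per the description. */
		it->enabled = false;
		it->mmio_writes += 2;
		return;
	}

	/* Assumed cost: program the next threshold, then re-enable. */
	it->enabled = true;
	it->mmio_writes += 2;
}

/*
 * Models the tail of the interrupt handler after `signaled` fences were
 * signaled. With skip_when_empty set, the update runs only if fences remain.
 */
void model_handle_interrupt(struct model_intr *it, size_t signaled,
			    bool skip_when_empty)
{
	assert(signaled <= it->pending_fences);
	it->pending_fences -= signaled;

	if (!skip_when_empty || it->pending_fences != 0)
		model_update_hw_state(it);
}
```

In this model, signaling the last fence ends with the interrupt disabled
whether or not the update is skipped; skipping it only removes the two
simulated writes. When fences remain, the interrupt is re-armed in both
variants.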

Measured syncpoint wait latency (50000 samples):
  Average latency:   10.6 us  -> 9.4 us
  99.99 pct latency: 51.90 us -> 36.58 us

Signed-off-by: Tanmay Patil <tanmayp@nvidia.com>
---
 drivers/gpu/host1x/intr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index f77a678949e9..723297250768 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -92,8 +92,12 @@ void host1x_intr_handle_interrupt(struct host1x *host, unsigned int id)
 		host1x_fence_signal(fence);
 	}
 
-	/* Re-enable interrupt if necessary */
-	host1x_intr_update_hw_state(host, sp);
+	/*
+	 * Re-enable interrupt if necessary. The ISR already disabled the interrupt,
+	 * so if no fences remain, no update is needed.
+	 */
+	if (!list_empty(&sp->fences.list))
+		host1x_intr_update_hw_state(host, sp);
 
 	spin_unlock(&sp->fences.lock);
 }
-- 
2.54.0

