The Linux Kernel Mailing List
* [PATCH 0/2] gpu: host1x: syncpt_wait micro-optimizations
@ 2026-05-14 10:31 Tanmay Patil
  2026-05-14 10:31 ` [PATCH 1/2] gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait() Tanmay Patil
  2026-05-14 10:31 ` [PATCH 2/2] gpu: host1x: skip redundant HW state update Tanmay Patil
  0 siblings, 2 replies; 3+ messages in thread
From: Tanmay Patil @ 2026-05-14 10:31 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: David Airlie, Simona Vetter, dri-devel, linux-tegra, linux-kernel,
	Tanmay Patil

This series reduces latency in the host1x syncpoint wait path.

Patch 1 removes redundant MMIO reads in host1x_syncpt_wait().
Patch 2 skips the host1x_intr_update_hw_state() call in the ISR
when no fences remain.

Measured syncpoint wait latency (50000 samples):
  Average latency:   12.2 us  -> 9.4 us
  99.99 pct latency: 62.96 us -> 36.58 us

Tanmay Patil (2):
  gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait()
  gpu: host1x: skip redundant HW state update

 drivers/gpu/host1x/intr.c   |  8 ++++++--
 drivers/gpu/host1x/syncpt.c | 23 ++++++++++++++---------
 2 files changed, 20 insertions(+), 11 deletions(-)

-- 
2.54.0



* [PATCH 1/2] gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait()
  2026-05-14 10:31 [PATCH 0/2] gpu: host1x: syncpt_wait micro-optimizations Tanmay Patil
@ 2026-05-14 10:31 ` Tanmay Patil
  2026-05-14 10:31 ` [PATCH 2/2] gpu: host1x: skip redundant HW state update Tanmay Patil
  1 sibling, 0 replies; 3+ messages in thread
From: Tanmay Patil @ 2026-05-14 10:31 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: David Airlie, Simona Vetter, dri-devel, linux-tegra, linux-kernel,
	Tanmay Patil

In host1x_syncpt_wait(), the hardware syncpoint value was loaded once
for the expiry check and then a second time to populate the caller's
value pointer. Reuse a single load for both purposes.

After dma_fence_wait_timeout() returned, the previous code reloaded the
syncpoint value for the expiry check, but that reload is only required
in the timeout case. On success (i.e., return value > 0, or return
value == 0 with zero jiffies remaining), the ISR has already cached the
value before signaling the fence, so the value pointer can be populated
from the cache with host1x_syncpt_read_min(), without any MMIO access.
Only the timeout path requires a fresh load, so move
host1x_syncpt_load() under that path.

Measured syncpoint wait latency (50000 samples):
  Average latency:   12.2 us  -> 10.6 us
  99.99 pct latency: 62.96 us -> 51.90 us

Signed-off-by: Tanmay Patil <tanmayp@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index acc7d82e0585..807c74fc6a0a 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -222,11 +222,12 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 {
 	struct dma_fence *fence;
 	long wait_err;
+	u32 curr;
 
-	host1x_hw_syncpt_load(sp->host, sp);
+	curr = host1x_syncpt_load(sp);
 
 	if (value)
-		*value = host1x_syncpt_load(sp);
+		*value = curr;
 
 	if (host1x_syncpt_is_expired(sp, thresh))
 		return 0;
@@ -245,21 +246,25 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		host1x_fence_cancel(fence);
 	dma_fence_put(fence);
 
-	if (value)
-		*value = host1x_syncpt_load(sp);
-
 	/*
 	 * Don't rely on dma_fence_wait_timeout return value,
 	 * since it returns zero both on timeout and if the
 	 * wait completed with 0 jiffies left.
 	 */
-	host1x_hw_syncpt_load(sp->host, sp);
-	if (wait_err == 0 && !host1x_syncpt_is_expired(sp, thresh))
+	if (wait_err == 0 && !host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = host1x_syncpt_load(sp);
+
 		return -EAGAIN;
-	else if (wait_err < 0)
+	} else if (wait_err < 0) {
 		return wait_err;
-	else
+	} else {
+		/* Success, read the value cached by ISR */
+		if (value)
+			*value = host1x_syncpt_read_min(sp);
+
 		return 0;
+	}
 }
 EXPORT_SYMBOL(host1x_syncpt_wait);
 
-- 
2.54.0



* [PATCH 2/2] gpu: host1x: skip redundant HW state update
  2026-05-14 10:31 [PATCH 0/2] gpu: host1x: syncpt_wait micro-optimizations Tanmay Patil
  2026-05-14 10:31 ` [PATCH 1/2] gpu: host1x: skip redundant syncpoint loads in host1x_syncpt_wait() Tanmay Patil
@ 2026-05-14 10:31 ` Tanmay Patil
  1 sibling, 0 replies; 3+ messages in thread
From: Tanmay Patil @ 2026-05-14 10:31 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: David Airlie, Simona Vetter, dri-devel, linux-tegra, linux-kernel,
	Tanmay Patil

When the fence list is empty, host1x_intr_update_hw_state()
falls through to host1x_intr_disable_syncpt_intr(), which
does two MMIO writes to disable the syncpoint interrupt
and clear its status.

The ISR has already disabled and acked the interrupt
before calling host1x_intr_handle_interrupt(), making
these two writes redundant. Skip the
host1x_intr_update_hw_state() call if no fences remain.

Measured syncpoint wait latency (50000 samples):
  Average latency:   10.6 us  -> 9.4 us
  99.99 pct latency: 51.90 us -> 36.58 us

Signed-off-by: Tanmay Patil <tanmayp@nvidia.com>
---
 drivers/gpu/host1x/intr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index f77a678949e9..723297250768 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -92,8 +92,12 @@ void host1x_intr_handle_interrupt(struct host1x *host, unsigned int id)
 		host1x_fence_signal(fence);
 	}
 
-	/* Re-enable interrupt if necessary */
-	host1x_intr_update_hw_state(host, sp);
+	/*
+	 * Re-enable interrupt if necessary. The ISR already disabled the interrupt,
+	 * so if no fences remain, no update is needed.
+	 */
+	if (!list_empty(&sp->fences.list))
+		host1x_intr_update_hw_state(host, sp);
 
 	spin_unlock(&sp->fences.lock);
 }
-- 
2.54.0


