[PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency

* [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency
@ 2026-04-29  9:38 Boris Brezillon
  2026-04-29  9:38 ` [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
                   ` (11 more replies)
  0 siblings, 12 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Right now, panthor is one of the few drivers that signal fences
from work items (not even from the threaded IRQ handler). We
could move that to the threaded handler, but that would still
leave the latency caused by the scheduling of the IRQ thread.

Instead, this patchset moves all the job IRQ processing to
the raw IRQ handler, which is fine because all the current
code does is demux the interrupts and defer the actual handling
to sub work items. The only bit we keep in the IRQ path is
the dma_fence signalling, which should be acceptable in terms
of CPU cycles spent in IRQ context.

Pretty much all the patches except the last two are just
preparing the ground to get there. The second-to-last one
does the thread -> IRQ transition, and the last one is some
experimental interrupt coalescing support that I've added
because I noticed that moving job IRQ handling to the raw
handler generates quite a lot of interrupts in some cases, and
having the system constantly interrupted like that can be
detrimental.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Boris Brezillon (10):
      drm/panthor: Make panthor_irq::state a non-atomic field
      drm/panthor: Move the register accessors before the IRQ helpers
      drm/panthor: Replace the panthor_irq macro machinery by inline helpers
      drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers
      drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
      drm/panthor: Prepare the scheduler logic for FW events in IRQ context
      drm/panthor: Automate CSG IRQ processing at group unbind time
      drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
      drm/panthor: Process FW events in IRQ context
      drm/panthor: Introduce interrupt coalescing support for job IRQs

 drivers/gpu/drm/panthor/panthor_device.h | 358 ++++++++++++++---------
 drivers/gpu/drm/panthor/panthor_drv.c    |   1 +
 drivers/gpu/drm/panthor/panthor_fw.c     | 226 +++++++++++++--
 drivers/gpu/drm/panthor/panthor_fw.h     |  11 +-
 drivers/gpu/drm/panthor/panthor_gpu.c    |  27 +-
 drivers/gpu/drm/panthor/panthor_mmu.c    |  38 +--
 drivers/gpu/drm/panthor/panthor_pwr.c    |  21 +-
 drivers/gpu/drm/panthor/panthor_sched.c  | 475 ++++++++++++++-----------------
 8 files changed, 698 insertions(+), 459 deletions(-)
---
base-commit: 7455a0583a906533041a80e48c6a2e3230cce96e
change-id: 20260429-panthor-signal-from-irq-d33684f4d292
prerequisite-message-id: <20260427155934.416502-1-karunika.choo@arm.com>
prerequisite-patch-id: 70905a2eb09ab2b31d242a5ed5af3b42fb6a464c
prerequisite-patch-id: aa4c22669f80328039762f25c0b3942bbadbdc89
prerequisite-patch-id: 7f61bcee3c4bb5703900b18d5b6e0f52e622f29d
prerequisite-patch-id: 3402f4d60aa526d40113fc3d9b3e599f8f89e705
prerequisite-patch-id: 00ddbd3d455891f6950609614c1acd2baa78b0db
prerequisite-patch-id: 6a9928f609e3757cadebb2df6795d0da55745f4e
prerequisite-patch-id: fd91f68f25d4bc93eec405f0131f5ae4284bfaf2
prerequisite-patch-id: 553958a10a0ca2f20f7883ad4c752cfc7485c5a8

Best regards,
-- 
Boris Brezillon <boris.brezillon@collabora.com>


* [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-04-29 12:29   ` Liviu Dudau
  2026-05-01 13:17   ` Steven Price
  2026-04-29  9:38 ` [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

The only place where panthor_irq::state is accessed without
panthor_irq::mask_lock held is in the prologue of _irq_suspend(),
which is not really a fast-path. So let's simplify things by assuming
panthor_irq::state must always be accessed with the mask_lock held,
and add a scoped_guard() in _irq_suspend().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 35 ++++++++++++++++----------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 4e4607bca7cc..3f91ba73829d 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -101,8 +101,12 @@ struct panthor_irq {
 	 */
 	spinlock_t mask_lock;
 
-	/** @state: one of &enum panthor_irq_state reflecting the current state. */
-	atomic_t state;
+	/**
+	 * @state: one of &enum panthor_irq_state reflecting the current state.
+	 *
+	 * Must be accessed with mask_lock held.
+	 */
+	enum panthor_irq_state state;
 };
 
 /**
@@ -510,18 +514,15 @@ const char *panthor_exception_name(struct panthor_device *ptdev,
 static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
 {												\
 	struct panthor_irq *pirq = data;							\
-	enum panthor_irq_state old_state;							\
 												\
 	if (!gpu_read(pirq->iomem, INT_STAT))							\
 		return IRQ_NONE;								\
 												\
 	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	old_state = atomic_cmpxchg(&pirq->state,						\
-				   PANTHOR_IRQ_STATE_ACTIVE,					\
-				   PANTHOR_IRQ_STATE_PROCESSING);				\
-	if (old_state != PANTHOR_IRQ_STATE_ACTIVE)						\
+	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
 		return IRQ_NONE;								\
 												\
+	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
 	gpu_write(pirq->iomem, INT_MASK, 0);							\
 	return IRQ_WAKE_THREAD;									\
 }												\
@@ -551,13 +552,10 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
 	}											\
 												\
 	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		enum panthor_irq_state old_state;						\
-												\
-		old_state = atomic_cmpxchg(&pirq->state,					\
-					   PANTHOR_IRQ_STATE_PROCESSING,			\
-					   PANTHOR_IRQ_STATE_ACTIVE);				\
-		if (old_state == PANTHOR_IRQ_STATE_PROCESSING)					\
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
 			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
+		}										\
 	}											\
 												\
 	return ret;										\
@@ -566,18 +564,19 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
 static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
 {												\
 	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDING);				\
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
 		gpu_write(pirq->iomem, INT_MASK, 0);						\
 	}											\
 	synchronize_irq(pirq->irq);								\
-	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDED);					\
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
 }												\
 												\
 static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
 {												\
 	guard(spinlock_irqsave)(&pirq->mask_lock);						\
 												\
-	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_ACTIVE);					\
+	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
 	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
 	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
 }												\
@@ -610,7 +609,7 @@ static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *
 	 * on the PROCESSING -> ACTIVE transition.						\
 	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
 	 */											\
-	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
 		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
 }												\
 												\
@@ -624,7 +623,7 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
 	 * on the PROCESSING -> ACTIVE transition.						\
 	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
 	 */											\
-	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
 		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
 }
 

-- 
2.53.0
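
The locking rule introduced by this patch (panthor_irq::state is only read or written with mask_lock held) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: `struct irq_model`, the `model_lock()`/`model_unlock()` helpers and the `hw_mask` field standing in for the INT_MASK register are all hypothetical names, and a C11 `atomic_flag` spinlock replaces the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of enum panthor_irq_state. */
enum irq_state { IRQ_ACTIVE, IRQ_PROCESSING, IRQ_SUSPENDING, IRQ_SUSPENDED };

struct irq_model {
	atomic_flag mask_lock;	/* stands in for the kernel spinlock */
	enum irq_state state;	/* plain field: only touched with mask_lock held */
	uint32_t mask;		/* events the driver is interested in */
	uint32_t hw_mask;	/* stands in for the INT_MASK register */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void irq_model_init(struct irq_model *m, uint32_t mask)
{
	atomic_flag_clear(&m->mask_lock);
	m->state = IRQ_ACTIVE;
	m->mask = mask;
	m->hw_mask = mask;
}

/* Raw handler step: returns true when the IRQ thread should be woken. */
static bool irq_model_raw(struct irq_model *m)
{
	bool wake = false;

	model_lock(&m->mask_lock);
	if (m->state == IRQ_ACTIVE) {
		/* Plain compare + store replaces atomic_cmpxchg(). */
		m->state = IRQ_PROCESSING;
		m->hw_mask = 0;
		wake = true;
	}
	model_unlock(&m->mask_lock);
	return wake;
}

/* Epilogue of the threaded handler: PROCESSING -> ACTIVE, mask restored. */
static void irq_model_thread_done(struct irq_model *m)
{
	model_lock(&m->mask_lock);
	if (m->state == IRQ_PROCESSING) {
		m->state = IRQ_ACTIVE;
		m->hw_mask = m->mask;
	}
	model_unlock(&m->mask_lock);
}

static void irq_model_suspend(struct irq_model *m)
{
	model_lock(&m->mask_lock);
	m->state = IRQ_SUSPENDING;
	m->hw_mask = 0;
	model_unlock(&m->mask_lock);
	/* The driver calls synchronize_irq() at this point. */
	model_lock(&m->mask_lock);
	m->state = IRQ_SUSPENDED;
	model_unlock(&m->mask_lock);
}

/* Walks through the transitions the patch description relies on. */
static void irq_model_demo(void)
{
	struct irq_model m;

	irq_model_init(&m, 0x3);
	assert(irq_model_raw(&m) && m.state == IRQ_PROCESSING);
	irq_model_thread_done(&m);
	assert(m.state == IRQ_ACTIVE && m.hw_mask == 0x3);
	irq_model_suspend(&m);
	assert(!irq_model_raw(&m) && m.state == IRQ_SUSPENDED);
}
```

Calling `irq_model_demo()` from a test program exercises the ACTIVE -> PROCESSING -> ACTIVE round trip and shows that a raw handler racing with a suspend leaves the state untouched, which is the behaviour the atomic_cmpxchg() calls used to provide.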


* [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
  2026-04-29  9:38 ` [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-04-29 12:31   ` Liviu Dudau
  2026-05-01 13:17   ` Steven Price
  2026-04-29  9:38 ` [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

We're about to add an IRQ inline helper using gpu_read(). Move things
around to avoid forward declarations.

No functional changes.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 142 +++++++++++++++----------------
 1 file changed, 71 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 3f91ba73829d..768fc1992368 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -495,6 +495,77 @@ panthor_exception_is_fault(u32 exception_code)
 const char *panthor_exception_name(struct panthor_device *ptdev,
 				   u32 exception_code);
 
+static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
+{
+	writel(data, iomem + reg);
+}
+
+static inline u32 gpu_read(void __iomem *iomem, u32 reg)
+{
+	return readl(iomem + reg);
+}
+
+static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
+{
+	return readl_relaxed(iomem + reg);
+}
+
+static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
+{
+	gpu_write(iomem, reg, lower_32_bits(data));
+	gpu_write(iomem, reg + 4, upper_32_bits(data));
+}
+
+static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
+{
+	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
+}
+
+static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
+{
+	return (gpu_read_relaxed(iomem, reg) |
+		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
+}
+
+static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
+{
+	u32 lo, hi1, hi2;
+	do {
+		hi1 = gpu_read(iomem, reg + 4);
+		lo = gpu_read(iomem, reg);
+		hi2 = gpu_read(iomem, reg + 4);
+	} while (hi1 != hi2);
+	return lo | ((u64)hi2 << 32);
+}
+
+#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
+	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
+			  iomem, reg)
+
+#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
+				     timeout_us)				\
+	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
+				 false, iomem, reg)
+
+#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
+	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
+			  iomem, reg)
+
+#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
+				       timeout_us)				\
+	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
+				 false, iomem, reg)
+
+#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
+					     timeout_us)			\
+	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
+				 timeout_us, false, iomem, reg)
+
+#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
+					timeout_us)				\
+	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
+			  false, iomem, reg)
+
 #define INT_RAWSTAT 0x0
 #define INT_CLEAR   0x4
 #define INT_MASK    0x8
@@ -629,75 +700,4 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
 
 extern struct workqueue_struct *panthor_cleanup_wq;
 
-static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
-{
-	writel(data, iomem + reg);
-}
-
-static inline u32 gpu_read(void __iomem *iomem, u32 reg)
-{
-	return readl(iomem + reg);
-}
-
-static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
-{
-	return readl_relaxed(iomem + reg);
-}
-
-static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
-{
-	gpu_write(iomem, reg, lower_32_bits(data));
-	gpu_write(iomem, reg + 4, upper_32_bits(data));
-}
-
-static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
-{
-	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
-}
-
-static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
-{
-	return (gpu_read_relaxed(iomem, reg) |
-		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
-}
-
-static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
-{
-	u32 lo, hi1, hi2;
-	do {
-		hi1 = gpu_read(iomem, reg + 4);
-		lo = gpu_read(iomem, reg);
-		hi2 = gpu_read(iomem, reg + 4);
-	} while (hi1 != hi2);
-	return lo | ((u64)hi2 << 32);
-}
-
-#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
-	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
-			  iomem, reg)
-
-#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
-				     timeout_us)				\
-	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
-				 false, iomem, reg)
-
-#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
-	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
-			  iomem, reg)
-
-#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
-				       timeout_us)				\
-	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
-				 false, iomem, reg)
-
-#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
-					     timeout_us)			\
-	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
-				 timeout_us, false, iomem, reg)
-
-#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
-					timeout_us)				\
-	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
-			  false, iomem, reg)
-
 #endif

-- 
2.53.0
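
Among the accessors being moved, gpu_read64_counter() is the least obvious one: it reads the high word twice around the low word so that a carry between the two 32-bit reads cannot produce a torn value. The userspace sketch below tries to show why that matters. The `struct fake_counter` device, which ticks after every 32-bit read, and the `fake_read*()` helpers are hypothetical stand-ins for the MMIO register and gpu_read(); only the hi/lo/hi loop is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device: a 64-bit counter exposed as two 32-bit halves. */
struct fake_counter {
	uint64_t value;
	uint64_t step;	/* how far the counter advances after each 32-bit read */
};

/* reg == 0 reads the low half, anything else reads the high half. */
static uint32_t fake_read(struct fake_counter *c, uint32_t reg)
{
	uint32_t v = reg ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;

	c->value += c->step;
	return v;
}

/* Naive lo-then-hi read: can mix halves from two different counter values. */
static uint64_t fake_read64_naive(struct fake_counter *c)
{
	uint32_t lo = fake_read(c, 0);
	uint32_t hi = fake_read(c, 4);

	return lo | ((uint64_t)hi << 32);
}

/* Same loop as gpu_read64_counter(): retry until the high half is stable. */
static uint64_t fake_read64_counter(struct fake_counter *c)
{
	uint32_t lo, hi1, hi2;

	do {
		hi1 = fake_read(c, 4);
		lo = fake_read(c, 0);
		hi2 = fake_read(c, 4);
	} while (hi1 != hi2);

	return lo | ((uint64_t)hi2 << 32);
}

static void fake_counter_demo(void)
{
	struct fake_counter c = { .value = 0xFFFFFFFFull, .step = 1 };

	/* Low half read before the carry, high half after it: torn value. */
	assert(fake_read64_naive(&c) == 0x1FFFFFFFFull);

	c.value = 0xFFFFFFFFull;
	/* First pass sees hi1 = 0, hi2 = 1 and retries; second pass is stable. */
	assert(fake_read64_counter(&c) == 0x100000003ull);
}
```

In this model the naive read returns 0x1FFFFFFFF, a value the counter never held, while the retry loop returns 0x100000003, which it did hold during the second pass.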


* [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
  2026-04-29  9:38 ` [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
  2026-04-29  9:38 ` [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-04-30  9:40   ` Karunika Choo
  2026-05-01 13:22   ` Steven Price
  2026-04-29  9:38 ` [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers Boris Brezillon
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Now that panthor_irq contains the iomem region, there's no real need
for the macro-based panthor_irq helper generation logic. We can just
provide inline helpers that do the same and let the compiler optimize
indirect function calls. The only extra annoyance is the fact we have
to open-code the panthor_xxx_irq_threaded_handler() implementation, but
those are single-line functions, so it's acceptable.

While at it, change the prototype of the IRQ handlers to take
a panthor_irq instead of a panthor_device, since that's the
object that's passed around when it comes to IRQ handling, and
the panthor_device can be directly extracted from there.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 245 +++++++++++++++----------------
 drivers/gpu/drm/panthor/panthor_fw.c     |  22 ++-
 drivers/gpu/drm/panthor/panthor_gpu.c    |  26 ++--
 drivers/gpu/drm/panthor/panthor_mmu.c    |  37 ++---
 drivers/gpu/drm/panthor/panthor_pwr.c    |  20 ++-
 5 files changed, 183 insertions(+), 167 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 768fc1992368..afa202546316 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -571,131 +571,126 @@ static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
 #define INT_MASK    0x8
 #define INT_STAT    0xc
 
-/**
- * PANTHOR_IRQ_HANDLER() - Define interrupt handlers and the interrupt
- * registration function.
- *
- * The boiler-plate to gracefully deal with shared interrupts is
- * auto-generated. All you have to do is call PANTHOR_IRQ_HANDLER()
- * just after the actual handler. The handler prototype is:
- *
- * void (*handler)(struct panthor_device *, u32 status);
- */
-#define PANTHOR_IRQ_HANDLER(__name, __handler)							\
-static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
-{												\
-	struct panthor_irq *pirq = data;							\
-												\
-	if (!gpu_read(pirq->iomem, INT_STAT))							\
-		return IRQ_NONE;								\
-												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
-		return IRQ_NONE;								\
-												\
-	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
-	gpu_write(pirq->iomem, INT_MASK, 0);							\
-	return IRQ_WAKE_THREAD;									\
-}												\
-												\
-static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *data)		\
-{												\
-	struct panthor_irq *pirq = data;							\
-	struct panthor_device *ptdev = pirq->ptdev;						\
-	irqreturn_t ret = IRQ_NONE;								\
-												\
-	while (true) {										\
-		/* It's safe to access pirq->mask without the lock held here. If a new		\
-		 * event gets added to the mask and the corresponding IRQ is pending,		\
-		 * we'll process it right away instead of adding an extra raw -> threaded	\
-		 * round trip. If an event is removed and the status bit is set, it will	\
-		 * be ignored, just like it would have been if the mask had been adjusted	\
-		 * right before the HW event kicks in. TLDR; it's all expected races we're	\
-		 * covered for.									\
-		 */										\
-		u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;			\
-												\
-		if (!status)									\
-			break;									\
-												\
-		__handler(ptdev, status);							\
-		ret = IRQ_HANDLED;								\
-	}											\
-												\
-	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
-			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
-			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
-		}										\
-	}											\
-												\
-	return ret;										\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
-{												\
-	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
-		gpu_write(pirq->iomem, INT_MASK, 0);						\
-	}											\
-	synchronize_irq(pirq->irq);								\
-	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
-		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
-{												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-												\
-	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
-	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
-	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
-}												\
-												\
-static int panthor_request_ ## __name ## _irq(struct panthor_device *ptdev,			\
-					      struct panthor_irq *pirq,				\
-					      int irq, u32 mask, void __iomem *iomem)		\
-{												\
-	pirq->ptdev = ptdev;									\
-	pirq->irq = irq;									\
-	pirq->mask = mask;									\
-	pirq->iomem = iomem;									\
-	spin_lock_init(&pirq->mask_lock);							\
-	panthor_ ## __name ## _irq_resume(pirq);						\
-												\
-	return devm_request_threaded_irq(ptdev->base.dev, irq,					\
-					 panthor_ ## __name ## _irq_raw_handler,		\
-					 panthor_ ## __name ## _irq_threaded_handler,		\
-					 IRQF_SHARED, KBUILD_MODNAME "-" # __name,		\
-					 pirq);							\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *pirq, u32 mask)	\
-{												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	pirq->mask |= mask;									\
-												\
-	/* The only situation where we need to write the new mask is if the IRQ is active.	\
-	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()	\
-	 * on the PROCESSING -> ACTIVE transition.						\
-	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
-	 */											\
-	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
-		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq *pirq, u32 mask)\
-{												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	pirq->mask &= ~mask;									\
-												\
-	/* The only situation where we need to write the new mask is if the IRQ is active.	\
-	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()	\
-	 * on the PROCESSING -> ACTIVE transition.						\
-	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
-	 */											\
-	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
-		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
+static inline irqreturn_t panthor_irq_default_raw_handler(int irq, void *data)
+{
+	struct panthor_irq *pirq = data;
+
+	if (!gpu_read(pirq->iomem, INT_STAT))
+		return IRQ_NONE;
+
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
+		return IRQ_NONE;
+
+	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
+	gpu_write(pirq->iomem, INT_MASK, 0);
+	return IRQ_WAKE_THREAD;
+}
+
+static inline irqreturn_t
+panthor_irq_default_threaded_handler(void *data,
+				     void (*slow_handler)(struct panthor_irq *, u32))
+{
+	struct panthor_irq *pirq = data;
+	irqreturn_t ret = IRQ_NONE;
+
+	while (true) {
+		/* It's safe to access pirq->mask without the lock held here. If a new
+		 * event gets added to the mask and the corresponding IRQ is pending,
+		 * we'll process it right away instead of adding an extra raw -> threaded
+		 * round trip. If an event is removed and the status bit is set, it will
+		 * be ignored, just like it would have been if the mask had been adjusted
+		 * right before the HW event kicks in. TLDR; it's all expected races we're
+		 * covered for.
+		 */
+		u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;
+
+		if (!status)
+			break;
+
+		slow_handler(pirq, status);
+		ret = IRQ_HANDLED;
+	}
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+			gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+		}
+	}
+
+	return ret;
+}
+
+static inline void panthor_irq_suspend(struct panthor_irq *pirq)
+{
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;
+		gpu_write(pirq->iomem, INT_MASK, 0);
+	}
+	synchronize_irq(pirq->irq);
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock)
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;
+}
+
+static inline void panthor_irq_resume(struct panthor_irq *pirq)
+{
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);
+	gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+}
+
+static inline void panthor_irq_enable_events(struct panthor_irq *pirq, u32 mask)
+{
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	pirq->mask |= mask;
+
+	/* The only situation where we need to write the new mask is if the IRQ is active.
+	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
+	 * on the PROCESSING -> ACTIVE transition.
+	 * If the IRQ is suspended/suspending, the mask is restored at resume time.
+	 */
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
+		gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+}
+
+static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask)
+{
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	pirq->mask &= ~mask;
+
+	/* The only situation where we need to write the new mask is if the IRQ is active.
+	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
+	 * on the PROCESSING -> ACTIVE transition.
+	 * If the IRQ is suspended/suspending, the mask is restored at resume time.
+	 */
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
+		gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+}
+
+static inline int
+panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
+		    int irq, u32 mask, void __iomem *iomem, const char *name,
+		    irqreturn_t (*threaded_handler)(int, void *data))
+{
+	const char *full_name;
+
+	pirq->ptdev = ptdev;
+	pirq->irq = irq;
+	pirq->mask = mask;
+	pirq->iomem = iomem;
+	spin_lock_init(&pirq->mask_lock);
+	panthor_irq_resume(pirq);
+
+	full_name = devm_kasprintf(ptdev->base.dev, GFP_KERNEL, KBUILD_MODNAME "-%s", name);
+	if (!full_name)
+		return -ENOMEM;
+
+	return devm_request_threaded_irq(ptdev->base.dev, irq,
+					 panthor_irq_default_raw_handler,
+					 threaded_handler,
+					 IRQF_SHARED, full_name, pirq);
 }
 
 extern struct workqueue_struct *panthor_cleanup_wq;
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 986151681b24..eaf599b0a887 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1064,8 +1064,9 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
 			 msecs_to_jiffies(PING_INTERVAL_MS));
 }
 
-static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	u32 duration;
 	u64 start = 0;
 
@@ -1091,7 +1092,11 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
 		trace_gpu_job_irq(ptdev->base.dev, status, duration);
 	}
 }
-PANTHOR_IRQ_HANDLER(job, panthor_job_irq_handler);
+
+static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
+}
 
 static int panthor_fw_start(struct panthor_device *ptdev)
 {
@@ -1099,8 +1104,8 @@ static int panthor_fw_start(struct panthor_device *ptdev)
 	bool timedout = false;
 
 	ptdev->fw->booted = false;
-	panthor_job_irq_enable_events(&ptdev->fw->irq, ~0);
-	panthor_job_irq_resume(&ptdev->fw->irq);
+	panthor_irq_enable_events(&ptdev->fw->irq, ~0);
+	panthor_irq_resume(&ptdev->fw->irq);
 	gpu_write(fw->iomem, MCU_CONTROL, MCU_CONTROL_AUTO);
 
 	if (!wait_event_timeout(ptdev->fw->req_waitqueue,
@@ -1210,7 +1215,7 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang)
 			ptdev->reset.fast = true;
 	}
 
-	panthor_job_irq_suspend(&ptdev->fw->irq);
+	panthor_irq_suspend(&ptdev->fw->irq);
 	panthor_fw_stop(ptdev);
 }
 
@@ -1280,7 +1285,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) {
 		/* Make sure the IRQ handler cannot be called after that point. */
 		if (ptdev->fw->irq.irq)
-			panthor_job_irq_suspend(&ptdev->fw->irq);
+			panthor_irq_suspend(&ptdev->fw->irq);
 
 		panthor_fw_stop(ptdev);
 	}
@@ -1476,8 +1481,9 @@ int panthor_fw_init(struct panthor_device *ptdev)
 	if (irq <= 0)
 		return -ENODEV;
 
-	ret = panthor_request_job_irq(ptdev, &fw->irq, irq, 0,
-				      ptdev->iomem + JOB_INT_BASE);
+	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
+				  ptdev->iomem + JOB_INT_BASE, "job",
+				  panthor_job_irq_threaded_handler);
 	if (ret) {
 		drm_err(&ptdev->base, "failed to request job irq");
 		return ret;
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index e52c5675981f..ce208e384762 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -86,8 +86,9 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
 	gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
 }
 
-static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_gpu *gpu = ptdev->gpu;
 
 	gpu_write(gpu->irq.iomem, INT_CLEAR, status);
@@ -116,7 +117,11 @@ static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
 	}
 	spin_unlock(&ptdev->gpu->reqs_lock);
 }
-PANTHOR_IRQ_HANDLER(gpu, panthor_gpu_irq_handler);
+
+static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
+}
 
 /**
  * panthor_gpu_unplug() - Called when the GPU is unplugged.
@@ -128,7 +133,7 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
 
 	/* Make sure the IRQ handler is not running after that point. */
 	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
-		panthor_gpu_irq_suspend(&ptdev->gpu->irq);
+		panthor_irq_suspend(&ptdev->gpu->irq);
 
 	/* Wake-up all waiters. */
 	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
@@ -169,9 +174,10 @@ int panthor_gpu_init(struct panthor_device *ptdev)
 	if (irq < 0)
 		return irq;
 
-	ret = panthor_request_gpu_irq(ptdev, &ptdev->gpu->irq, irq,
-				      GPU_INTERRUPTS_MASK,
-				      ptdev->iomem + GPU_INT_BASE);
+	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
+				  GPU_INTERRUPTS_MASK,
+				  ptdev->iomem + GPU_INT_BASE, "gpu",
+				  panthor_gpu_irq_threaded_handler);
 	if (ret)
 		return ret;
 
@@ -182,7 +188,7 @@ int panthor_gpu_power_changed_on(struct panthor_device *ptdev)
 {
 	guard(pm_runtime_active)(ptdev->base.dev);
 
-	panthor_gpu_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
+	panthor_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
 
 	return 0;
 }
@@ -191,7 +197,7 @@ void panthor_gpu_power_changed_off(struct panthor_device *ptdev)
 {
 	guard(pm_runtime_active)(ptdev->base.dev);
 
-	panthor_gpu_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
+	panthor_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
 }
 
 /**
@@ -424,7 +430,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
 	else
 		panthor_hw_l2_power_off(ptdev);
 
-	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
+	panthor_irq_suspend(&ptdev->gpu->irq);
 }
 
 /**
@@ -436,7 +442,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
  */
 void panthor_gpu_resume(struct panthor_device *ptdev)
 {
-	panthor_gpu_irq_resume(&ptdev->gpu->irq);
+	panthor_irq_resume(&ptdev->gpu->irq);
 	panthor_hw_l2_power_on(ptdev);
 }
 
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index a7ee14986849..a0d0a9b2926f 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -586,17 +586,13 @@ static u32 panthor_mmu_as_fault_mask(struct panthor_device *ptdev, u32 as)
 	return BIT(as);
 }
 
-/* Forward declaration to call helpers within as_enable/disable */
-static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status);
-PANTHOR_IRQ_HANDLER(mmu, panthor_mmu_irq_handler);
-
 static int panthor_mmu_as_enable(struct panthor_device *ptdev, u32 as_nr,
 				 u64 transtab, u64 transcfg, u64 memattr)
 {
 	struct panthor_mmu *mmu = ptdev->mmu;
 
-	panthor_mmu_irq_enable_events(&ptdev->mmu->irq,
-				      panthor_mmu_as_fault_mask(ptdev, as_nr));
+	panthor_irq_enable_events(&ptdev->mmu->irq,
+				  panthor_mmu_as_fault_mask(ptdev, as_nr));
 
 	gpu_write64(mmu->iomem, AS_TRANSTAB(as_nr), transtab);
 	gpu_write64(mmu->iomem, AS_MEMATTR(as_nr), memattr);
@@ -614,8 +610,8 @@ static int panthor_mmu_as_disable(struct panthor_device *ptdev, u32 as_nr,
 
 	lockdep_assert_held(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_disable_events(&ptdev->mmu->irq,
-				       panthor_mmu_as_fault_mask(ptdev, as_nr));
+	panthor_irq_disable_events(&ptdev->mmu->irq,
+				   panthor_mmu_as_fault_mask(ptdev, as_nr));
 
 	/* Flush+invalidate RW caches, invalidate RO ones. */
 	ret = panthor_gpu_flush_caches(ptdev, CACHE_CLEAN | CACHE_INV,
@@ -1785,8 +1781,9 @@ static void panthor_vm_unlock_region(struct panthor_vm *vm)
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 }
 
-static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_mmu_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_mmu *mmu = ptdev->mmu;
 	bool has_unhandled_faults = false;
 
@@ -1849,6 +1846,11 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
 		panthor_sched_report_mmu_fault(ptdev);
 }
 
+static irqreturn_t panthor_mmu_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_mmu_irq_handler);
+}
+
 /**
  * panthor_mmu_suspend() - Suspend the MMU logic
  * @ptdev: Device.
@@ -1873,7 +1875,7 @@ void panthor_mmu_suspend(struct panthor_device *ptdev)
 	}
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+	panthor_irq_suspend(&ptdev->mmu->irq);
 }
 
 /**
@@ -1892,7 +1894,7 @@ void panthor_mmu_resume(struct panthor_device *ptdev)
 	ptdev->mmu->as.faulty_mask = 0;
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_resume(&ptdev->mmu->irq);
+	panthor_irq_resume(&ptdev->mmu->irq);
 }
 
 /**
@@ -1909,7 +1911,7 @@ void panthor_mmu_pre_reset(struct panthor_device *ptdev)
 {
 	struct panthor_vm *vm;
 
-	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+	panthor_irq_suspend(&ptdev->mmu->irq);
 
 	mutex_lock(&ptdev->mmu->vm.lock);
 	ptdev->mmu->vm.reset_in_progress = true;
@@ -1946,7 +1948,7 @@ void panthor_mmu_post_reset(struct panthor_device *ptdev)
 
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_resume(&ptdev->mmu->irq);
+	panthor_irq_resume(&ptdev->mmu->irq);
 
 	/* Restart the VM_BIND queues. */
 	mutex_lock(&ptdev->mmu->vm.lock);
@@ -3201,7 +3203,7 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
 void panthor_mmu_unplug(struct panthor_device *ptdev)
 {
 	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
-		panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+		panthor_irq_suspend(&ptdev->mmu->irq);
 
 	mutex_lock(&ptdev->mmu->as.slots_lock);
 	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
@@ -3255,9 +3257,10 @@ int panthor_mmu_init(struct panthor_device *ptdev)
 	if (irq <= 0)
 		return -ENODEV;
 
-	ret = panthor_request_mmu_irq(ptdev, &mmu->irq, irq,
-				      panthor_mmu_fault_mask(ptdev, ~0),
-				      ptdev->iomem + MMU_INT_BASE);
+	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
+				  panthor_mmu_fault_mask(ptdev, ~0),
+				  ptdev->iomem + MMU_INT_BASE, "mmu",
+				  panthor_mmu_irq_threaded_handler);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
index 7c7f424a1436..80cf78007896 100644
--- a/drivers/gpu/drm/panthor/panthor_pwr.c
+++ b/drivers/gpu/drm/panthor/panthor_pwr.c
@@ -56,8 +56,9 @@ struct panthor_pwr {
 	wait_queue_head_t reqs_acked;
 };
 
-static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_pwr_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_pwr *pwr = ptdev->pwr;
 
 	spin_lock(&ptdev->pwr->reqs_lock);
@@ -75,7 +76,11 @@ static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
 	}
 	spin_unlock(&ptdev->pwr->reqs_lock);
 }
-PANTHOR_IRQ_HANDLER(pwr, panthor_pwr_irq_handler);
+
+static irqreturn_t panthor_pwr_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_pwr_irq_handler);
+}
 
 static void panthor_pwr_write_command(struct panthor_device *ptdev, u32 command, u64 args)
 {
@@ -453,7 +458,7 @@ void panthor_pwr_unplug(struct panthor_device *ptdev)
 		return;
 
 	/* Make sure the IRQ handler is not running after that point. */
-	panthor_pwr_irq_suspend(&ptdev->pwr->irq);
+	panthor_irq_suspend(&ptdev->pwr->irq);
 
 	/* Wake-up all waiters. */
 	spin_lock_irqsave(&ptdev->pwr->reqs_lock, flags);
@@ -483,9 +488,10 @@ int panthor_pwr_init(struct panthor_device *ptdev)
 	if (irq < 0)
 		return irq;
 
-	err = panthor_request_pwr_irq(
+	err = panthor_irq_request(
 		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
-		pwr->iomem + PWR_INT_BASE);
+		pwr->iomem + PWR_INT_BASE, "pwr",
+		panthor_pwr_irq_threaded_handler);
 	if (err)
 		return err;
 
@@ -564,7 +570,7 @@ void panthor_pwr_suspend(struct panthor_device *ptdev)
 	if (!ptdev->pwr)
 		return;
 
-	panthor_pwr_irq_suspend(&ptdev->pwr->irq);
+	panthor_irq_suspend(&ptdev->pwr->irq);
 }
 
 void panthor_pwr_resume(struct panthor_device *ptdev)
@@ -572,5 +578,5 @@ void panthor_pwr_resume(struct panthor_device *ptdev)
 	if (!ptdev->pwr)
 		return;
 
-	panthor_pwr_irq_resume(&ptdev->pwr->irq);
+	panthor_irq_resume(&ptdev->pwr->irq);
 }

-- 
2.53.0



* [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (2 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-04-29 13:32   ` Liviu Dudau
  2026-05-01 13:28   ` Steven Price
  2026-04-29  9:38 ` [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Most drivers signal their fences from their interrupt handler to
minimize latency; panthor is one of the rare exceptions. We could do the
same from the threaded interrupt handler, but the latency is still quite
high in that case, so let's allow components to choose the context they
want their IRQ handler to run in.

This takes the form of an extra raw_handler argument to
panthor_irq_request(). The raw handler returns an irqreturn_t that
reflects whether the IRQ thread needs to be woken up. All existing
callers pass panthor_irq_default_raw_handler, so behavior is unchanged
for now. The raw and threaded handlers are still assumed to be mutually
exclusive. When a custom raw handler is provided, the threaded handler
is expected to run only when the event can't be processed directly in
the raw handler, or when the driver thinks it would be beneficial to
coalesce interrupts by polling in the thread rather than re-enabling
interrupts immediately.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
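Not part of the patch: a minimal userspace sketch of the raw/threaded
split this patch makes possible, assuming a raw handler that processes
some events itself and defers the rest. The enum, the struct, and both
function names are hypothetical stand-ins for irqreturn_t and the
panthor helpers; nothing here is the driver's actual code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum sketch_irqreturn {
	SKETCH_IRQ_NONE,
	SKETCH_IRQ_HANDLED,
	SKETCH_IRQ_WAKE_THREAD,
};

/* Hypothetical per-IRQ state; not the real struct panthor_irq. */
struct sketch_irq {
	uint32_t pending;	/* events raised by the "hardware" */
	uint32_t fast_mask;	/* events the raw handler may process itself */
	uint32_t deferred;	/* events left for the threaded handler */
};

/* Default behaviour: process nothing here, always hand over to the thread. */
static enum sketch_irqreturn sketch_default_raw_handler(struct sketch_irq *pirq)
{
	if (!pirq->pending)
		return SKETCH_IRQ_NONE;

	pirq->deferred |= pirq->pending;
	pirq->pending = 0;
	return SKETCH_IRQ_WAKE_THREAD;
}

/* Fast handler: consume what it can, defer the remainder to the thread. */
static enum sketch_irqreturn sketch_fast_raw_handler(struct sketch_irq *pirq)
{
	uint32_t status = pirq->pending;

	if (!status)
		return SKETCH_IRQ_NONE;

	pirq->pending = 0;
	pirq->deferred |= status & ~pirq->fast_mask;
	return pirq->deferred ? SKETCH_IRQ_WAKE_THREAD : SKETCH_IRQ_HANDLED;
}
```

With the default handler every event costs a thread wake-up. With the
fast handler the thread only runs when something could not be dealt with
directly, which is where the latency reduction would come from.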
 drivers/gpu/drm/panthor/panthor_device.h | 5 ++---
 drivers/gpu/drm/panthor/panthor_fw.c     | 1 +
 drivers/gpu/drm/panthor/panthor_gpu.c    | 1 +
 drivers/gpu/drm/panthor/panthor_mmu.c    | 1 +
 drivers/gpu/drm/panthor/panthor_pwr.c    | 1 +
 5 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index afa202546316..1c130b8394ab 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
 static inline int
 panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
 		    int irq, u32 mask, void __iomem *iomem, const char *name,
+		    irqreturn_t (*raw_handler)(int, void *data),
 		    irqreturn_t (*threaded_handler)(int, void *data))
 {
 	const char *full_name;
@@ -687,9 +688,7 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
 	if (!full_name)
 		return -ENOMEM;
 
-	return devm_request_threaded_irq(ptdev->base.dev, irq,
-					 panthor_irq_default_raw_handler,
-					 threaded_handler,
+	return devm_request_threaded_irq(ptdev->base.dev, irq, raw_handler, threaded_handler,
 					 IRQF_SHARED, full_name, pirq);
 }
 
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index eaf599b0a887..8239a6951569 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1483,6 +1483,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
 
 	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
 				  ptdev->iomem + JOB_INT_BASE, "job",
+				  panthor_irq_default_raw_handler,
 				  panthor_job_irq_threaded_handler);
 	if (ret) {
 		drm_err(&ptdev->base, "failed to request job irq");
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index ce208e384762..d0be758ea3e1 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -177,6 +177,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
 	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
 				  GPU_INTERRUPTS_MASK,
 				  ptdev->iomem + GPU_INT_BASE, "gpu",
+				  panthor_irq_default_raw_handler,
 				  panthor_gpu_irq_threaded_handler);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index a0d0a9b2926f..2cb07933b629 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -3260,6 +3260,7 @@ int panthor_mmu_init(struct panthor_device *ptdev)
 	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
 				  panthor_mmu_fault_mask(ptdev, ~0),
 				  ptdev->iomem + MMU_INT_BASE, "mmu",
+				  panthor_irq_default_raw_handler,
 				  panthor_mmu_irq_threaded_handler);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
index 80cf78007896..1efb7f3482ba 100644
--- a/drivers/gpu/drm/panthor/panthor_pwr.c
+++ b/drivers/gpu/drm/panthor/panthor_pwr.c
@@ -491,6 +491,7 @@ int panthor_pwr_init(struct panthor_device *ptdev)
 	err = panthor_irq_request(
 		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
 		pwr->iomem + PWR_INT_BASE, "pwr",
+		panthor_irq_default_raw_handler,
 		panthor_pwr_irq_threaded_handler);
 	if (err)
 		return err;

-- 
2.53.0



* [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (3 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-04-29 13:33   ` Liviu Dudau
  2026-05-01 13:39   ` Steven Price
  2026-04-29  9:38 ` [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

If we want some FW events to be processed in the interrupt path, we need
the helpers manipulating req regs to be IRQ-safe, which implies using
spin_lock_irqsave() instead of spin_lock(). While at it, use guards
instead of plain spin_lock/unlock calls.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
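Not part of the patch: a small userspace sketch of the read-modify-write
arithmetic that the macros below perform under the interface lock. The
function names are hypothetical and the locking is deliberately left
out; the point is only to show which bits each helper touches. The
irqsave variant matters once the same lock can be taken from both
process context and an interrupt handler, since a plain spin_lock() in
process context could then be interrupted by a handler spinning on the
same lock.

```c
#include <stdint.h>

/*
 * Same arithmetic as panthor_fw_update_reqs(): bits inside the mask are
 * taken from val, bits outside the mask keep their current value.
 */
static uint32_t sketch_update_reqs(uint32_t cur, uint32_t val, uint32_t mask)
{
	return (cur & ~mask) | (val & mask);
}

/*
 * Same arithmetic as panthor_fw_toggle_reqs(): bits inside the mask are
 * set to the inverse of the corresponding output bits, bits outside the
 * mask keep their current value.
 */
static uint32_t sketch_toggle_reqs(uint32_t cur, uint32_t out, uint32_t mask)
{
	return ((out ^ mask) & mask) | (cur & ~mask);
}
```

For instance, sketch_update_reqs(0xF0, 0x0A, 0x0F) leaves the upper
nibble alone and replaces the lower one, giving 0xFA.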
 drivers/gpu/drm/panthor/panthor_fw.h | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
index a99a9b6f4825..e56b7fe15bb3 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.h
+++ b/drivers/gpu/drm/panthor/panthor_fw.h
@@ -432,12 +432,11 @@ struct panthor_fw_global_iface {
 #define panthor_fw_toggle_reqs(__iface, __in_reg, __out_reg, __mask) \
 	do { \
 		u32 __cur_val, __new_val, __out_val; \
-		spin_lock(&(__iface)->lock); \
+		guard(spinlock_irqsave)(&(__iface)->lock); \
 		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
 		__out_val = READ_ONCE((__iface)->output->__out_reg); \
 		__new_val = ((__out_val ^ (__mask)) & (__mask)) | (__cur_val & ~(__mask)); \
 		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
-		spin_unlock(&(__iface)->lock); \
 	} while (0)
 
 /**
@@ -458,21 +457,19 @@ struct panthor_fw_global_iface {
 #define panthor_fw_update_reqs(__iface, __in_reg, __val, __mask) \
 	do { \
 		u32 __cur_val, __new_val; \
-		spin_lock(&(__iface)->lock); \
+		guard(spinlock_irqsave)(&(__iface)->lock); \
 		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
 		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
 		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
-		spin_unlock(&(__iface)->lock); \
 	} while (0)
 
 #define panthor_fw_update_reqs64(__iface, __in_reg, __val, __mask) \
 	do { \
 		u64 __cur_val, __new_val; \
-		spin_lock(&(__iface)->lock); \
+		guard(spinlock_irqsave)(&(__iface)->lock); \
 		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
 		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
 		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
-		spin_unlock(&(__iface)->lock); \
 	} while (0)
 
 struct panthor_fw_global_iface *

-- 
2.53.0



* [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (4 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-05-01 13:47   ` Steven Price
  2026-04-29  9:38 ` [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Add a dedicated spinlock for event processing, and process events
directly in the panthor_sched_report_fw_events() path rather than
deferring them to a work item. We also fast-track fence signalling by
making the job completion logic IRQ-safe.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
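Not part of the patch: a userspace sketch of the event demux loop that
panthor_sched_report_fw_events() now runs inline. The global-interface
bit is handled first, then each remaining bit is treated as a CSG slot
index, lowest first. SKETCH_GLOBAL_IF is an assumed value (the diff does
not show the definition of JOB_INT_GLOBAL_IF), and the trace struct is a
hypothetical stand-in for the real per-event processing.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* Assumed bit position; the real JOB_INT_GLOBAL_IF is defined elsewhere. */
#define SKETCH_GLOBAL_IF (1u << 31)

/* Records what the demux loop would have dispatched. */
struct sketch_trace {
	int global_hits;
	int csg_ids[32];
	int csg_count;
};

static void sketch_report_fw_events(uint32_t events, struct sketch_trace *trace)
{
	/* The global interface event is handled once, before any CSG slot. */
	if (events & SKETCH_GLOBAL_IF) {
		trace->global_hits++;
		events &= ~SKETCH_GLOBAL_IF;
	}

	/* Every remaining set bit is a CSG slot, visited lowest index first. */
	while (events) {
		int csg_id = ffs((int)events) - 1;

		trace->csg_ids[trace->csg_count++] = csg_id;
		events &= ~(1u << csg_id);
	}
}
```

In the driver this loop runs with events_lock held and interrupts
disabled, so whatever each step does has to stay short and must not
sleep.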
 drivers/gpu/drm/panthor/panthor_sched.c | 322 +++++++++++++++-----------------
 1 file changed, 149 insertions(+), 173 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 5b34032deff8..c197bdc4b2c7 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -177,18 +177,6 @@ struct panthor_scheduler {
 	 */
 	struct work_struct sync_upd_work;
 
-	/**
-	 * @fw_events_work: Work used to process FW events outside the interrupt path.
-	 *
-	 * Even if the interrupt is threaded, we need any event processing
-	 * that require taking the panthor_scheduler::lock to be processed
-	 * outside the interrupt path so we don't block the tick logic when
-	 * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
-	 * event processing requires taking this lock, we just delegate all
-	 * FW event processing to the scheduler workqueue.
-	 */
-	struct work_struct fw_events_work;
-
 	/**
 	 * @fw_events: Bitmask encoding pending FW events.
 	 */
@@ -254,6 +242,15 @@ struct panthor_scheduler {
 		struct list_head waiting;
 	} groups;
 
+	/**
+	 * @events_lock: Lock taken when processing events.
+	 *
+	 * This also needs to be taken when csg_slots are updated, to make sure
+	 * the event processing logic doesn't touch groups that have left the CSG
+	 * slot.
+	 */
+	spinlock_t events_lock;
+
 	/**
 	 * @csg_slots: FW command stream group slots.
 	 */
@@ -676,9 +673,6 @@ struct panthor_group {
 	 */
 	struct panthor_kernel_bo *protm_suspend_buf;
 
-	/** @sync_upd_work: Work used to check/signal job fences. */
-	struct work_struct sync_upd_work;
-
 	/** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
 	struct work_struct tiler_oom_work;
 
@@ -999,7 +993,6 @@ static int
 group_bind_locked(struct panthor_group *group, u32 csg_id)
 {
 	struct panthor_device *ptdev = group->ptdev;
-	struct panthor_csg_slot *csg_slot;
 	int ret;
 
 	lockdep_assert_held(&ptdev->scheduler->lock);
@@ -1012,9 +1005,7 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
 	if (ret)
 		return ret;
 
-	csg_slot = &ptdev->scheduler->csg_slots[csg_id];
 	group_get(group);
-	group->csg_id = csg_id;
 
 	/* Dummy doorbell allocation: doorbell is assigned to the group and
 	 * all queues use the same doorbell.
@@ -1026,7 +1017,10 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
 	for (u32 i = 0; i < group->queue_count; i++)
 		group->queues[i]->doorbell_id = csg_id + 1;
 
-	csg_slot->group = group;
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		ptdev->scheduler->csg_slots[csg_id].group = group;
+		group->csg_id = csg_id;
+	}
 
 	return 0;
 }
@@ -1041,7 +1035,6 @@ static int
 group_unbind_locked(struct panthor_group *group)
 {
 	struct panthor_device *ptdev = group->ptdev;
-	struct panthor_csg_slot *slot;
 
 	lockdep_assert_held(&ptdev->scheduler->lock);
 
@@ -1051,9 +1044,12 @@ group_unbind_locked(struct panthor_group *group)
 	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
 		return -EINVAL;
 
-	slot = &ptdev->scheduler->csg_slots[group->csg_id];
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
+		group->csg_id = -1;
+	}
+
 	panthor_vm_idle(group->vm);
-	group->csg_id = -1;
 
 	/* Tiler OOM events will be re-issued next time the group is scheduled. */
 	atomic_set(&group->tiler_oom, 0);
@@ -1062,8 +1058,6 @@ group_unbind_locked(struct panthor_group *group)
 	for (u32 i = 0; i < group->queue_count; i++)
 		group->queues[i]->doorbell_id = -1;
 
-	slot->group = NULL;
-
 	group_put(group);
 	return 0;
 }
@@ -1151,16 +1145,14 @@ queue_suspend_timeout_locked(struct panthor_queue *queue)
 static void
 queue_suspend_timeout(struct panthor_queue *queue)
 {
-	spin_lock(&queue->fence_ctx.lock);
+	guard(spinlock_irqsave)(&queue->fence_ctx.lock);
 	queue_suspend_timeout_locked(queue);
-	spin_unlock(&queue->fence_ctx.lock);
 }
 
 static void
 queue_resume_timeout(struct panthor_queue *queue)
 {
-	spin_lock(&queue->fence_ctx.lock);
-
+	guard(spinlock_irqsave)(&queue->fence_ctx.lock);
 	if (queue_timeout_is_suspended(queue)) {
 		mod_delayed_work(queue->scheduler.timeout_wq,
 				 &queue->timeout.work,
@@ -1168,8 +1160,6 @@ queue_resume_timeout(struct panthor_queue *queue)
 
 		queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
 	}
-
-	spin_unlock(&queue->fence_ctx.lock);
 }
 
 /**
@@ -1484,7 +1474,7 @@ cs_slot_process_fatal_event_locked(struct panthor_device *ptdev,
 	u32 fatal;
 	u64 info;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
 	fatal = cs_iface->output->fatal;
@@ -1532,7 +1522,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 	u32 fault;
 	u64 info;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
 	fault = cs_iface->output->fault;
@@ -1542,7 +1532,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 		u64 cs_extract = queue->iface.output->extract;
 		struct panthor_job *job;
 
-		spin_lock(&queue->fence_ctx.lock);
+		guard(spinlock_irqsave)(&queue->fence_ctx.lock);
 		list_for_each_entry(job, &queue->fence_ctx.in_flight_jobs, node) {
 			if (cs_extract >= job->ringbuf.end)
 				continue;
@@ -1552,7 +1542,6 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 
 			dma_fence_set_error(job->done_fence, -EINVAL);
 		}
-		spin_unlock(&queue->fence_ctx.lock);
 	}
 
 	if (group) {
@@ -1682,7 +1671,7 @@ cs_slot_process_tiler_oom_event_locked(struct panthor_device *ptdev,
 	struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
 	struct panthor_group *group = csg_slot->group;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	if (drm_WARN_ON(&ptdev->base, !group))
 		return;
@@ -1703,7 +1692,7 @@ static bool cs_slot_process_irq_locked(struct panthor_device *ptdev,
 	struct panthor_fw_cs_iface *cs_iface;
 	u32 req, ack, events;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
 	req = cs_iface->input->req;
@@ -1731,7 +1720,7 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
 {
 	struct panthor_scheduler *sched = ptdev->scheduler;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	sched->might_have_idle_groups = true;
 
@@ -1742,16 +1731,102 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
 	sched_queue_delayed_work(sched, tick, 0);
 }
 
+static void update_fdinfo_stats(struct panthor_job *job)
+{
+	struct panthor_group *group = job->group;
+	struct panthor_queue *queue = group->queues[job->queue_idx];
+	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
+	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
+	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
+
+	scoped_guard(spinlock, &group->fdinfo.lock) {
+		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
+			fdinfo->cycles += data->cycles.after - data->cycles.before;
+		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
+			fdinfo->time += data->time.after - data->time.before;
+	}
+}
+
+static bool queue_check_job_completion(struct panthor_queue *queue)
+{
+	struct panthor_syncobj_64b *syncobj = NULL;
+	struct panthor_job *job, *job_tmp;
+	bool cookie, progress = false;
+	LIST_HEAD(done_jobs);
+
+	cookie = dma_fence_begin_signalling();
+	scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
+		list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
+			if (!syncobj) {
+				struct panthor_group *group = job->group;
+
+				syncobj = group->syncobjs->kmap +
+					  (job->queue_idx * sizeof(*syncobj));
+			}
+
+			if (syncobj->seqno < job->done_fence->seqno)
+				break;
+
+			list_move_tail(&job->node, &done_jobs);
+			dma_fence_signal_locked(job->done_fence);
+		}
+
+		if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
+			/* If we have no job left, we cancel the timer, and reset remaining
+			 * time to its default so it can be restarted next time
+			 * queue_resume_timeout() is called.
+			 */
+			queue_suspend_timeout_locked(queue);
+
+			/* If there's no job pending, we consider it progress to avoid a
+			 * spurious timeout if the timeout handler and the sync update
+			 * handler raced.
+			 */
+			progress = true;
+		} else if (!list_empty(&done_jobs)) {
+			queue_reset_timeout_locked(queue);
+			progress = true;
+		}
+	}
+	dma_fence_end_signalling(cookie);
+
+	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
+		if (job->profiling.mask)
+			update_fdinfo_stats(job);
+		list_del_init(&job->node);
+		panthor_job_put(&job->base);
+	}
+
+	return progress;
+}
+
+static void group_check_job_completion(struct panthor_group *group)
+{
+	bool cookie;
+	u32 queue_idx;
+
+	cookie = dma_fence_begin_signalling();
+	for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
+		struct panthor_queue *queue = group->queues[queue_idx];
+
+		if (!queue)
+			continue;
+
+		queue_check_job_completion(queue);
+	}
+	dma_fence_end_signalling(cookie);
+}
+
 static void csg_slot_sync_update_locked(struct panthor_device *ptdev,
 					u32 csg_id)
 {
 	struct panthor_csg_slot *csg_slot = &ptdev->scheduler->csg_slots[csg_id];
 	struct panthor_group *group = csg_slot->group;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	if (group)
-		group_queue_work(group, sync_upd);
+		group_check_job_completion(group);
 
 	sched_queue_work(ptdev->scheduler, sync_upd);
 }
@@ -1784,7 +1859,7 @@ static void sched_process_csg_irq_locked(struct panthor_device *ptdev, u32 csg_i
 	struct panthor_fw_csg_iface *csg_iface;
 	u32 ring_cs_db_mask = 0;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	if (drm_WARN_ON(&ptdev->base, csg_id >= ptdev->scheduler->csg_slot_count))
 		return;
@@ -1842,7 +1917,7 @@ static void sched_process_idle_event_locked(struct panthor_device *ptdev)
 {
 	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	/* Acknowledge the idle event and schedule a tick. */
 	panthor_fw_update_reqs(glb_iface, req, glb_iface->output->ack, GLB_IDLE);
@@ -1858,7 +1933,7 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
 	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
 	u32 req, ack, evts;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	req = READ_ONCE(glb_iface->input->req);
 	ack = READ_ONCE(glb_iface->output->ack);
@@ -1868,30 +1943,6 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
 		sched_process_idle_event_locked(ptdev);
 }
 
-static void process_fw_events_work(struct work_struct *work)
-{
-	struct panthor_scheduler *sched = container_of(work, struct panthor_scheduler,
-						      fw_events_work);
-	u32 events = atomic_xchg(&sched->fw_events, 0);
-	struct panthor_device *ptdev = sched->ptdev;
-
-	mutex_lock(&sched->lock);
-
-	if (events & JOB_INT_GLOBAL_IF) {
-		sched_process_global_irq_locked(ptdev);
-		events &= ~JOB_INT_GLOBAL_IF;
-	}
-
-	while (events) {
-		u32 csg_id = ffs(events) - 1;
-
-		sched_process_csg_irq_locked(ptdev, csg_id);
-		events &= ~BIT(csg_id);
-	}
-
-	mutex_unlock(&sched->lock);
-}
-
 /**
  * panthor_sched_report_fw_events() - Report FW events to the scheduler.
  * @ptdev: Device.
@@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
 	if (!ptdev->scheduler)
 		return;
 
-	atomic_or(events, &ptdev->scheduler->fw_events);
-	sched_queue_work(ptdev->scheduler, fw_events);
+	guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
+
+	if (events & JOB_INT_GLOBAL_IF) {
+		sched_process_global_irq_locked(ptdev);
+		events &= ~JOB_INT_GLOBAL_IF;
+	}
+
+	while (events) {
+		u32 csg_id = ffs(events) - 1;
+
+		sched_process_csg_irq_locked(ptdev, csg_id);
+		events &= ~BIT(csg_id);
+	}
 }
 
 static const char *fence_get_driver_name(struct dma_fence *fence)
@@ -2136,7 +2198,9 @@ tick_ctx_init(struct panthor_scheduler *sched,
 		 * CSG IRQs, so we can flag the faulty queue.
 		 */
 		if (panthor_vm_has_unhandled_faults(group->vm)) {
-			sched_process_csg_irq_locked(ptdev, i);
+			scoped_guard(spinlock_irqsave, &sched->events_lock) {
+				sched_process_csg_irq_locked(ptdev, i);
+			}
 
 			/* No fatal fault reported, flag all queues as faulty. */
 			if (!group->fatal_queues)
@@ -2183,13 +2247,13 @@ group_term_post_processing(struct panthor_group *group)
 		if (!queue)
 			continue;
 
-		spin_lock(&queue->fence_ctx.lock);
-		list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
-			list_move_tail(&job->node, &faulty_jobs);
-			dma_fence_set_error(job->done_fence, err);
-			dma_fence_signal_locked(job->done_fence);
+		scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
+			list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
+				list_move_tail(&job->node, &faulty_jobs);
+				dma_fence_set_error(job->done_fence, err);
+				dma_fence_signal_locked(job->done_fence);
+			}
 		}
-		spin_unlock(&queue->fence_ctx.lock);
 
 		/* Manually update the syncobj seqno to unblock waiters. */
 		syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj));
@@ -2336,8 +2400,10 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
 			 * any pending interrupts before we start the new
 			 * group.
 			 */
-			if (group->csg_id >= 0)
+			if (group->csg_id >= 0) {
+				guard(spinlock_irqsave)(&sched->events_lock);
 				sched_process_csg_irq_locked(ptdev, group->csg_id);
+			}
 
 			group_unbind_locked(group);
 		}
@@ -2920,8 +2986,10 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 
 		group_get(group);
 
-		if (group->csg_id >= 0)
+		if (group->csg_id >= 0) {
+			guard(spinlock_irqsave)(&sched->events_lock);
 			sched_process_csg_irq_locked(ptdev, group->csg_id);
+		}
 
 		group_unbind_locked(group);
 
@@ -3005,22 +3073,6 @@ void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed)
 	}
 }
 
-static void update_fdinfo_stats(struct panthor_job *job)
-{
-	struct panthor_group *group = job->group;
-	struct panthor_queue *queue = group->queues[job->queue_idx];
-	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
-	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
-	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
-
-	scoped_guard(spinlock, &group->fdinfo.lock) {
-		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
-			fdinfo->cycles += data->cycles.after - data->cycles.before;
-		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
-			fdinfo->time += data->time.after - data->time.before;
-	}
-}
-
 void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
 {
 	struct panthor_group_pool *gpool = pfile->groups;
@@ -3041,80 +3093,6 @@ void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
 	xa_unlock(&gpool->xa);
 }
 
-static bool queue_check_job_completion(struct panthor_queue *queue)
-{
-	struct panthor_syncobj_64b *syncobj = NULL;
-	struct panthor_job *job, *job_tmp;
-	bool cookie, progress = false;
-	LIST_HEAD(done_jobs);
-
-	cookie = dma_fence_begin_signalling();
-	spin_lock(&queue->fence_ctx.lock);
-	list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
-		if (!syncobj) {
-			struct panthor_group *group = job->group;
-
-			syncobj = group->syncobjs->kmap +
-				  (job->queue_idx * sizeof(*syncobj));
-		}
-
-		if (syncobj->seqno < job->done_fence->seqno)
-			break;
-
-		list_move_tail(&job->node, &done_jobs);
-		dma_fence_signal_locked(job->done_fence);
-	}
-
-	if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
-		/* If we have no job left, we cancel the timer, and reset remaining
-		 * time to its default so it can be restarted next time
-		 * queue_resume_timeout() is called.
-		 */
-		queue_suspend_timeout_locked(queue);
-
-		/* If there's no job pending, we consider it progress to avoid a
-		 * spurious timeout if the timeout handler and the sync update
-		 * handler raced.
-		 */
-		progress = true;
-	} else if (!list_empty(&done_jobs)) {
-		queue_reset_timeout_locked(queue);
-		progress = true;
-	}
-	spin_unlock(&queue->fence_ctx.lock);
-	dma_fence_end_signalling(cookie);
-
-	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
-		if (job->profiling.mask)
-			update_fdinfo_stats(job);
-		list_del_init(&job->node);
-		panthor_job_put(&job->base);
-	}
-
-	return progress;
-}
-
-static void group_sync_upd_work(struct work_struct *work)
-{
-	struct panthor_group *group =
-		container_of(work, struct panthor_group, sync_upd_work);
-	u32 queue_idx;
-	bool cookie;
-
-	cookie = dma_fence_begin_signalling();
-	for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
-		struct panthor_queue *queue = group->queues[queue_idx];
-
-		if (!queue)
-			continue;
-
-		queue_check_job_completion(queue);
-	}
-	dma_fence_end_signalling(cookie);
-
-	group_put(group);
-}
-
 struct panthor_job_ringbuf_instrs {
 	u64 buffer[MAX_INSTRS_PER_JOB];
 	u32 count;
@@ -3346,9 +3324,8 @@ queue_run_job(struct drm_sched_job *sched_job)
 	job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64));
 
 	panthor_job_get(&job->base);
-	spin_lock(&queue->fence_ctx.lock);
-	list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
-	spin_unlock(&queue->fence_ctx.lock);
+	scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock)
+		list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
 
 	/* Make sure the ring buffer is updated before the INSERT
 	 * register.
@@ -3683,7 +3660,6 @@ int panthor_group_create(struct panthor_file *pfile,
 	INIT_LIST_HEAD(&group->wait_node);
 	INIT_LIST_HEAD(&group->run_node);
 	INIT_WORK(&group->term_work, group_term_work);
-	INIT_WORK(&group->sync_upd_work, group_sync_upd_work);
 	INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
 	INIT_WORK(&group->release_work, group_release_work);
 
@@ -4054,7 +4030,6 @@ void panthor_sched_unplug(struct panthor_device *ptdev)
 	struct panthor_scheduler *sched = ptdev->scheduler;
 
 	disable_delayed_work_sync(&sched->tick_work);
-	disable_work_sync(&sched->fw_events_work);
 	disable_work_sync(&sched->sync_upd_work);
 
 	mutex_lock(&sched->lock);
@@ -4139,7 +4114,8 @@ int panthor_sched_init(struct panthor_device *ptdev)
 	sched->tick_period = msecs_to_jiffies(10);
 	INIT_DELAYED_WORK(&sched->tick_work, tick_work);
 	INIT_WORK(&sched->sync_upd_work, sync_upd_work);
-	INIT_WORK(&sched->fw_events_work, process_fw_events_work);
+
+	spin_lock_init(&sched->events_lock);
 
 	ret = drmm_mutex_init(&ptdev->base, &sched->lock);
 	if (ret)

-- 
2.53.0



* [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (5 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-05-01 13:53   ` Steven Price
  2026-04-29  9:38 ` [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Make the sched_process_csg_irq_locked() call part of
group_unbind_locked() so we don't have to manually call it in
tick_ctx_apply()/panthor_sched_suspend().

This implies moving group_[un]bind_locked() around to avoid a
forward declaration.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 178 +++++++++++++++-----------------
 1 file changed, 82 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index c197bdc4b2c7..601a9bff1485 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -982,86 +982,6 @@ group_get(struct panthor_group *group)
 	return group;
 }
 
-/**
- * group_bind_locked() - Bind a group to a group slot
- * @group: Group.
- * @csg_id: Slot.
- *
- * Return: 0 on success, a negative error code otherwise.
- */
-static int
-group_bind_locked(struct panthor_group *group, u32 csg_id)
-{
-	struct panthor_device *ptdev = group->ptdev;
-	int ret;
-
-	lockdep_assert_held(&ptdev->scheduler->lock);
-
-	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
-			ptdev->scheduler->csg_slots[csg_id].group))
-		return -EINVAL;
-
-	ret = panthor_vm_active(group->vm);
-	if (ret)
-		return ret;
-
-	group_get(group);
-
-	/* Dummy doorbell allocation: doorbell is assigned to the group and
-	 * all queues use the same doorbell.
-	 *
-	 * TODO: Implement LRU-based doorbell assignment, so the most often
-	 * updated queues get their own doorbell, thus avoiding useless checks
-	 * on queues belonging to the same group that are rarely updated.
-	 */
-	for (u32 i = 0; i < group->queue_count; i++)
-		group->queues[i]->doorbell_id = csg_id + 1;
-
-	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
-		ptdev->scheduler->csg_slots[csg_id].group = group;
-		group->csg_id = csg_id;
-	}
-
-	return 0;
-}
-
-/**
- * group_unbind_locked() - Unbind a group from a slot.
- * @group: Group to unbind.
- *
- * Return: 0 on success, a negative error code otherwise.
- */
-static int
-group_unbind_locked(struct panthor_group *group)
-{
-	struct panthor_device *ptdev = group->ptdev;
-
-	lockdep_assert_held(&ptdev->scheduler->lock);
-
-	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
-		return -EINVAL;
-
-	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
-		return -EINVAL;
-
-	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
-		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
-		group->csg_id = -1;
-	}
-
-	panthor_vm_idle(group->vm);
-
-	/* Tiler OOM events will be re-issued next time the group is scheduled. */
-	atomic_set(&group->tiler_oom, 0);
-	cancel_work(&group->tiler_oom_work);
-
-	for (u32 i = 0; i < group->queue_count; i++)
-		group->queues[i]->doorbell_id = -1;
-
-	group_put(group);
-	return 0;
-}
-
 static bool
 group_is_idle(struct panthor_group *group)
 {
@@ -1968,6 +1888,88 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
 	}
 }
 
+/**
+ * group_bind_locked() - Bind a group to a group slot
+ * @group: Group.
+ * @csg_id: Slot.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+static int
+group_bind_locked(struct panthor_group *group, u32 csg_id)
+{
+	struct panthor_device *ptdev = group->ptdev;
+	int ret;
+
+	lockdep_assert_held(&ptdev->scheduler->lock);
+
+	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
+			ptdev->scheduler->csg_slots[csg_id].group))
+		return -EINVAL;
+
+	ret = panthor_vm_active(group->vm);
+	if (ret)
+		return ret;
+
+	group_get(group);
+
+	/* Dummy doorbell allocation: doorbell is assigned to the group and
+	 * all queues use the same doorbell.
+	 *
+	 * TODO: Implement LRU-based doorbell assignment, so the most often
+	 * updated queues get their own doorbell, thus avoiding useless checks
+	 * on queues belonging to the same group that are rarely updated.
+	 */
+	for (u32 i = 0; i < group->queue_count; i++)
+		group->queues[i]->doorbell_id = csg_id + 1;
+
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		ptdev->scheduler->csg_slots[csg_id].group = group;
+		group->csg_id = csg_id;
+	}
+
+	return 0;
+}
+
+/**
+ * group_unbind_locked() - Unbind a group from a slot.
+ * @group: Group to unbind.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+static int
+group_unbind_locked(struct panthor_group *group)
+{
+	struct panthor_device *ptdev = group->ptdev;
+
+	lockdep_assert_held(&ptdev->scheduler->lock);
+
+	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
+		return -EINVAL;
+
+	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
+		return -EINVAL;
+
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		/* Process all pending IRQs before returning the slot. */
+		sched_process_csg_irq_locked(ptdev, group->csg_id);
+		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
+		group->csg_id = -1;
+	}
+
+	panthor_vm_idle(group->vm);
+
+	/* Tiler OOM events will be re-issued next time the group is scheduled. */
+	atomic_set(&group->tiler_oom, 0);
+	cancel_work(&group->tiler_oom_work);
+
+	for (u32 i = 0; i < group->queue_count; i++)
+		group->queues[i]->doorbell_id = -1;
+
+	group_put(group);
+	return 0;
+}
+
 static const char *fence_get_driver_name(struct dma_fence *fence)
 {
 	return "panthor";
@@ -2396,15 +2398,6 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
 	/* Unbind evicted groups. */
 	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
 		list_for_each_entry(group, &ctx->old_groups[prio], run_node) {
-			/* This group is gone. Process interrupts to clear
-			 * any pending interrupts before we start the new
-			 * group.
-			 */
-			if (group->csg_id >= 0) {
-				guard(spinlock_irqsave)(&sched->events_lock);
-				sched_process_csg_irq_locked(ptdev, group->csg_id);
-			}
-
 			group_unbind_locked(group);
 		}
 	}
@@ -2970,8 +2963,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 
 			if (flush_caches_failed)
 				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
-			else
-				csg_slot_sync_update_locked(ptdev, csg_id);
 
 			slot_mask &= ~BIT(csg_id);
 		}
@@ -2986,11 +2977,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 
 		group_get(group);
 
-		if (group->csg_id >= 0) {
-			guard(spinlock_irqsave)(&sched->events_lock);
-			sched_process_csg_irq_locked(ptdev, group->csg_id);
-		}
-
 		group_unbind_locked(group);
 
 		drm_WARN_ON(&group->ptdev->base, !list_empty(&group->run_node));

-- 
2.53.0



* [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (6 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-05-01 14:20   ` Steven Price
  2026-04-29  9:38 ` [PATCH 09/10] drm/panthor: Process FW events in IRQ context Boris Brezillon
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Rather than assuming an interrupt is always expected for request
acks, temporarily enable the relevant interrupts only when the
polling-wait fails. This should reduce the number of interrupts the
CPU has to process.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
 drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
 2 files changed, 23 insertions(+), 16 deletions(-)
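
A minimal userspace sketch of the wait flow described above: poll first,
unmask the ack interrupts only for the slow path, then mask them again.
All names here (fw_iface_sketch, wait_acks_sketch, fake_fw_ack_if_unmasked)
are hypothetical and exist only for illustration. The locking done in the
patch with scoped_guard() is left out, and the blocking wait is modelled
as a callback.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the req/ack/ack_irq_mask interface fields. */
struct fw_iface_sketch {
	uint32_t req;
	uint32_t ack;
	uint32_t ack_irq_mask;
};

/* Models the blocking wait: returns true if we were "woken" by an ack. */
typedef bool (*slow_wait_fn)(struct fw_iface_sketch *iface, uint32_t req_mask);

/* Fake firmware: acks the request only if its interrupt is unmasked. */
static bool fake_fw_ack_if_unmasked(struct fw_iface_sketch *iface,
				    uint32_t req_mask)
{
	if ((iface->ack_irq_mask & req_mask) != req_mask)
		return false;

	iface->ack = (iface->ack & ~req_mask) | (iface->req & req_mask);
	return true;
}

static int wait_acks_sketch(struct fw_iface_sketch *iface, uint32_t req_mask,
			    unsigned int poll_tries, slow_wait_fn slow_wait)
{
	uint32_t req = iface->req & req_mask;
	bool woken;

	/* Fast path: short polling phase, no interrupt involved. */
	for (unsigned int i = 0; i < poll_tries; i++) {
		if ((iface->ack & req_mask) == req)
			return 0;
	}

	/* Slow path: unmask the ack interrupts just for this wait. */
	iface->ack_irq_mask |= req_mask;
	woken = slow_wait(iface, req_mask);
	iface->ack_irq_mask &= ~req_mask;

	if (woken)
		return 0;

	/* Check one last time, in case we were not woken up. */
	return (iface->ack & req_mask) == req ? 0 : -ETIMEDOUT;
}
```

Bits that were already set in ack_irq_mask for asynchronous events are
left untouched as long as they do not overlap with req_mask.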

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 8239a6951569..f5e0ceca4130 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
 	glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
 	glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
 
-	/* Enable interrupts we care about. */
-	glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
-					 GLB_PING |
-					 GLB_CFG_PROGRESS_TIMER |
-					 GLB_CFG_POWEROFF_TIMER |
-					 GLB_IDLE_EN |
-					 GLB_IDLE;
-
-	if (panthor_fw_has_glb_state(ptdev))
-		glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
+	/* Enable interrupts for asynchronous events that are not
+	 * triggered by request acks.
+	 */
+	glb_iface->input->ack_irq_mask = GLB_IDLE;
 
 	panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
 			       GLB_IDLE_EN | GLB_COUNTER_EN);
@@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
  * Return: 0 on success, -ETIMEDOUT otherwise.
  */
 static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
-				wait_queue_head_t *wq,
-				u32 req_mask, u32 *acked,
+				u32 *ack_irq_mask_ptr, spinlock_t *lock,
+				wait_queue_head_t *wq, u32 req_mask, u32 *acked,
 				u32 timeout_ms)
 {
 	u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
@@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
 	if (!ret)
 		return 0;
 
-	if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
-			       msecs_to_jiffies(timeout_ms)))
+	scoped_guard(spinlock_irqsave, lock)
+		*ack_irq_mask_ptr |= req_mask;
+
+	ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
+				 msecs_to_jiffies(timeout_ms));
+
+	scoped_guard(spinlock_irqsave, lock)
+		*ack_irq_mask_ptr &= ~req_mask;
+
+	if (ret)
 		return 0;
 
 	/* Check one last time, in case we were not woken up for some reason. */
@@ -1369,6 +1371,8 @@ int panthor_fw_glb_wait_acks(struct panthor_device *ptdev,
 
 	return panthor_fw_wait_acks(&glb_iface->input->req,
 				    &glb_iface->output->ack,
+				    &glb_iface->input->ack_irq_mask,
+				    &glb_iface->lock,
 				    &ptdev->fw->req_waitqueue,
 				    req_mask, acked, timeout_ms);
 }
@@ -1395,6 +1399,8 @@ int panthor_fw_csg_wait_acks(struct panthor_device *ptdev, u32 csg_slot,
 
 	ret = panthor_fw_wait_acks(&csg_iface->input->req,
 				   &csg_iface->output->ack,
+				   &csg_iface->input->ack_irq_mask,
+				   &csg_iface->lock,
 				   &ptdev->fw->req_waitqueue,
 				   req_mask, acked, timeout_ms);
 
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 601a9bff1485..2edba335f22d 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1110,7 +1110,7 @@ cs_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
 	cs_iface->input->ringbuf_output = queue->iface.output_fw_va;
 	cs_iface->input->config = CS_CONFIG_PRIORITY(queue->priority) |
 				  CS_CONFIG_DOORBELL(queue->doorbell_id);
-	cs_iface->input->ack_irq_mask = ~0;
+	cs_iface->input->ack_irq_mask = CS_FATAL | CS_FAULT | CS_TILER_OOM;
 	panthor_fw_update_reqs(cs_iface, req,
 			       CS_IDLE_SYNC_WAIT |
 			       CS_IDLE_EMPTY |
@@ -1378,7 +1378,8 @@ csg_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 priority)
 		csg_iface->input->protm_suspend_buf = 0;
 	}
 
-	csg_iface->input->ack_irq_mask = ~0;
+	csg_iface->input->ack_irq_mask = CSG_SYNC_UPDATE | CSG_IDLE |
+					 CSG_PROGRESS_TIMER_EVENT;
 	panthor_fw_toggle_reqs(csg_iface, doorbell_req, doorbell_ack, queue_mask);
 	return 0;
 }

-- 
2.53.0



* [PATCH 09/10] drm/panthor: Process FW events in IRQ context
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (7 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-05-01 14:38   ` Steven Price
  2026-04-29  9:38 ` [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs Boris Brezillon
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Now that everything is in place to process FW events in IRQ context,
do so. This should reduce the dma_fence signalling latency.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_fw.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)
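
A minimal userspace sketch of the state handling the new raw handler
relies on: events are only processed while the IRQ is ACTIVE, and the
state is only restored if nothing changed it during processing. The
types and names below (irq_sketch, raw_handler_sketch, count_events) are
hypothetical, the SUSPENDED state is an assumption standing in for "any
state other than ACTIVE/PROCESSING", and the mask_lock taken in the patch
is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the IRQ states; only ACTIVE and PROCESSING
 * appear in the patch, SUSPENDED stands for any other state.
 */
enum irq_state_sketch { IRQ_SUSPENDED, IRQ_ACTIVE, IRQ_PROCESSING };

struct irq_sketch {
	enum irq_state_sketch state;
	uint32_t int_stat;	/* masked status register */
	uint32_t int_rawstat;	/* raw status register */
	unsigned int handled;	/* number of handler invocations */
};

/* Stand-in for panthor_job_irq_handler(): just counts invocations. */
static void count_events(struct irq_sketch *pirq, uint32_t status)
{
	(void)status;
	pirq->handled++;
}

/* Returns true for "IRQ_HANDLED", false for "IRQ_NONE". */
static bool raw_handler_sketch(struct irq_sketch *pirq)
{
	if (!pirq->int_stat)
		return false;

	if (pirq->state != IRQ_ACTIVE)
		return false;

	pirq->state = IRQ_PROCESSING;
	count_events(pirq, pirq->int_rawstat);

	/* Only go back to ACTIVE if nobody changed the state meanwhile. */
	if (pirq->state == IRQ_PROCESSING)
		pirq->state = IRQ_ACTIVE;

	return true;
}
```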

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index f5e0ceca4130..05c632913359 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1087,9 +1087,38 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
 	}
 }
 
+static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
+{
+	struct panthor_irq *pirq = data;
+
+	if (!gpu_read(pirq->iomem, INT_STAT))
+		return IRQ_NONE;
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
+			return IRQ_NONE;
+
+		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
+	}
+
+	panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_RAWSTAT));
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+	}
+
+	return IRQ_HANDLED;
+}
+
 static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
 {
-	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
+	struct panthor_irq *pirq = data;
+
+	/* We never return IRQ_WAKE_THREAD, so we're not supposed to be called. */
+	drm_WARN_ON_ONCE(&pirq->ptdev->base,
+			 "threaded IRQ handler should never be called.");
+	return IRQ_NONE;
 }
 
 static int panthor_fw_start(struct panthor_device *ptdev)
@@ -1489,7 +1518,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
 
 	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
 				  ptdev->iomem + JOB_INT_BASE, "job",
-				  panthor_irq_default_raw_handler,
+				  panthor_job_irq_raw_handler,
 				  panthor_job_irq_threaded_handler);
 	if (ret) {
 		drm_err(&ptdev->base, "failed to request job irq");

-- 
2.53.0



* [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (8 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 09/10] drm/panthor: Process FW events in IRQ context Boris Brezillon
@ 2026-04-29  9:38 ` Boris Brezillon
  2026-05-01 14:57   ` Steven Price
  2026-04-29  9:59 ` [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
  2026-04-29 10:36 ` Boris Brezillon
  11 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:38 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Dealing with interrupts from the raw IRQ handler is good for latency,
but might be detrimental to the overall throughput, because the system
keeps being interrupted to process job interrupts.

Try to mitigate that with some interrupt coalescing infrastructure,
where we wake up the IRQ thread when enough interrupts arrive close
together.

It's still experimental, which is why the feature is off by default.
It can be enabled through a debugfs knob.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h |  83 +++++++++++++++++
 drivers/gpu/drm/panthor/panthor_drv.c    |   1 +
 drivers/gpu/drm/panthor/panthor_fw.c     | 150 +++++++++++++++++++++++++++++--
 drivers/gpu/drm/panthor/panthor_fw.h     |   2 +
 4 files changed, 231 insertions(+), 5 deletions(-)
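
A minimal userspace sketch of the coalescing heuristic described above,
with timestamps passed in explicitly instead of read with ktime_get().
The struct and function names (coalescing_sketch,
coalescing_should_wake_thread, coalescing_update_ts) are hypothetical
mirrors of the helpers added by this patch, not the driver code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the coalescing state. */
struct coalescing_sketch {
	uint16_t max_us;		 /* max gap between two "close" IRQs */
	uint16_t inbounds_cnt_threshold; /* 0 disables coalescing */
	uint16_t inbounds_cnt;		 /* current run of close IRQs */
	int64_t last_ts_ns;		 /* timestamp of the previous IRQ */
};

/* Returns true when the raw handler should wake the IRQ thread. */
static bool coalescing_should_wake_thread(struct coalescing_sketch *c,
					  int64_t now_ns)
{
	if (!c->inbounds_cnt_threshold)
		return false;

	/* Too long since the last IRQ: restart the run at one. */
	if (now_ns - c->last_ts_ns > (int64_t)c->max_us * 1000) {
		c->inbounds_cnt = 1;
		return false;
	}

	/* Saturating increment, so the counter never wraps. */
	if (c->inbounds_cnt < UINT16_MAX)
		c->inbounds_cnt++;

	return c->inbounds_cnt >= c->inbounds_cnt_threshold;
}

/* Records the time at which the raw handler finished. */
static void coalescing_update_ts(struct coalescing_sketch *c, int64_t now_ns)
{
	if (c->inbounds_cnt_threshold)
		c->last_ts_ns = now_ns;
}
```

With max_us=100 and inbounds_cnt_threshold=3, three interrupts spaced
50us apart make the third call return true, while a later interrupt
arriving 1ms after the previous one resets the run.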

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 1c130b8394ab..e90f251f75e2 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -109,6 +109,48 @@ struct panthor_irq {
 	enum panthor_irq_state state;
 };
 
+/**
+ * struct panthor_irq_coalescing - IRQ coalescing info
+ */
+struct panthor_irq_coalescing {
+	/**
+	 * @max_us: Maximum time in microseconds between two consecutive
+	 * interrupts to consider coalescing.
+	 *
+	 * It being a u16 means we can't encode more than 65-ish msecs, but
+	 * if we have to poll status for more than a few hundreds usecs it's
+	 * going to make the IRQ thread consume more CPU than we want.
+	 */
+	u16 max_us;
+
+	/**
+	 * @poll_period_us: Rate at which status polling happens.
+	 *
+	 * It being a u16 means we can't encode more than 65-ish msecs, but
+	 * if we have to delay each status check by more than a few usecs
+	 * it's going to add latency we don't want.
+	 */
+	u16 poll_period_us;
+
+	/**
+	 * @inbounds_cnt_threshold: Minimum number of consecutive interrupts
+	 * with no more than max_us between them to wake up the thread handler.
+	 */
+	u16 inbounds_cnt_threshold;
+
+	/**
+	 * @inbounds_cnt: Current number of consecutive interrupts with no more
+	 * than max_us between.
+	 */
+	u16 inbounds_cnt;
+
+	/** @coalesced_cnt: Total number of interrupts coalesced. */
+	u64 coalesced_cnt;
+
+	/** @last_ts: Timestamp of the last IRQ. */
+	ktime_t last_ts;
+};
+
 /**
  * enum panthor_device_profiling_mode - Profiling state
  */
@@ -571,6 +613,47 @@ static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
 #define INT_MASK    0x8
 #define INT_STAT    0xc
 
+static inline bool
+panthor_irq_coalescing_wake_thread(struct panthor_irq_coalescing *coalescing)
+{
+	ktime_t ts;
+	s64 diff_ns;
+
+	if (!coalescing->inbounds_cnt_threshold)
+		return false;
+
+	ts = ktime_get();
+	diff_ns = ktime_to_ns(ktime_sub(ts, coalescing->last_ts));
+	if (diff_ns > coalescing->max_us * 1000) {
+		coalescing->inbounds_cnt = 1;
+		return false;
+	}
+
+	if (coalescing->inbounds_cnt < U16_MAX)
+		coalescing->inbounds_cnt++;
+
+	return coalescing->inbounds_cnt >= coalescing->inbounds_cnt_threshold;
+}
+
+static inline void
+panthor_irq_coalescing_update_ts(struct panthor_irq_coalescing *coalescing)
+{
+	if (coalescing->inbounds_cnt_threshold)
+		coalescing->last_ts = ktime_get();
+}
+
+static inline void
+panthor_irq_coalescing_init(struct panthor_irq_coalescing *coalescing,
+			     u16 max_us, u16 poll_period_us, u16 inbounds_cnt_threshold)
+{
+	coalescing->inbounds_cnt = 0;
+	coalescing->coalesced_cnt = 0;
+	coalescing->max_us = max_us;
+	coalescing->poll_period_us = poll_period_us;
+	coalescing->inbounds_cnt_threshold = inbounds_cnt_threshold;
+	coalescing->last_ts = ktime_set(0, 0);
+}
+
 static inline irqreturn_t panthor_irq_default_raw_handler(int irq, void *data)
 {
 	struct panthor_irq *pirq = data;
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 66996c9147c2..2fac5ba57f9d 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1760,6 +1760,7 @@ static void panthor_debugfs_init(struct drm_minor *minor)
 {
 	panthor_mmu_debugfs_init(minor);
 	panthor_gem_debugfs_init(minor);
+	panthor_fw_debugfs_init(minor);
 }
 #endif
 
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 05c632913359..cbb7d00f0e6e 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -6,6 +6,7 @@
 #endif
 
 #include <linux/clk.h>
+#include <linux/debugfs.h>
 #include <linux/dma-mapping.h>
 #include <linux/firmware.h>
 #include <linux/iopoll.h>
@@ -15,6 +16,7 @@
 #include <linux/pm_runtime.h>
 
 #include <drm/drm_drv.h>
+#include <drm/drm_file.h>
 #include <drm/drm_managed.h>
 #include <drm/drm_print.h>
 
@@ -271,6 +273,9 @@ struct panthor_fw {
 
 	/** @irq: Job irq data. */
 	struct panthor_irq irq;
+
+	/** @irq_coalescing: Job IRQ coalescing. */
+	struct panthor_irq_coalescing irq_coalescing;
 };
 
 struct panthor_vm *panthor_fw_vm(struct panthor_device *ptdev)
@@ -1090,6 +1095,8 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
 static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
 {
 	struct panthor_irq *pirq = data;
+	struct panthor_device *ptdev = pirq->ptdev;
+	irqreturn_t ret = IRQ_HANDLED;
 
 	if (!gpu_read(pirq->iomem, INT_STAT))
 		return IRQ_NONE;
@@ -1101,6 +1108,9 @@ static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
 		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
 	}
 
+	if (panthor_irq_coalescing_wake_thread(&ptdev->fw->irq_coalescing))
+		ret = IRQ_WAKE_THREAD;
+
 	panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_RAWSTAT));
 
 	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
@@ -1108,17 +1118,58 @@ static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
 			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
 	}
 
-	return IRQ_HANDLED;
+	panthor_irq_coalescing_update_ts(&ptdev->fw->irq_coalescing);
+	return ret;
 }
 
 static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
 {
 	struct panthor_irq *pirq = data;
+	struct panthor_device *ptdev = pirq->ptdev;
+	irqreturn_t ret = IRQ_NONE;
+	u32 processed_count = 0;
 
-	/* We never return IRQ_WAKE_THREAD, so we're not supposed to be called. */
-	drm_WARN_ON_ONCE(&pirq->ptdev->base,
-			 "threaded IRQ handler should never be called.");
-	return IRQ_NONE;
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
+			return IRQ_NONE;
+
+		gpu_write(pirq->iomem, INT_MASK, 0);
+		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
+	}
+
+	while (true) {
+		u32 status;
+
+		/* It's safe to access pirq->mask without the lock held here. If a new
+		 * event gets added to the mask and the corresponding IRQ is pending,
+		 * we'll process it right away instead of adding an extra raw -> threaded
+		 * round trip. If an event is removed and the status bit is set, it will
+		 * be ignored, just like it would have been if the mask had been adjusted
+		 * right before the HW event kicks in. TLDR; it's all expected races we're
+		 * covered for.
+		 */
+		if (readl_poll_timeout_atomic(pirq->iomem + INT_RAWSTAT,
+					      status, status & pirq->mask,
+					      ptdev->fw->irq_coalescing.poll_period_us,
+					      ptdev->fw->irq_coalescing.max_us))
+			break;
+
+		panthor_job_irq_handler(pirq, status);
+		ret = IRQ_HANDLED;
+		processed_count++;
+	}
+
+	if (processed_count > 1)
+		ptdev->fw->irq_coalescing.coalesced_cnt += processed_count - 1;
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+			gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+		}
+	}
+
+	return ret;
 }
 
 static int panthor_fw_start(struct panthor_device *ptdev)
@@ -1516,6 +1567,11 @@ int panthor_fw_init(struct panthor_device *ptdev)
 	if (irq <= 0)
 		return -ENODEV;
 
+	/* Start with IRQ coalescing disabled, until we have enough proof it's
+	 * useful and doesn't add too much CPU overhead. Those parameters can
+	 * be tweaked with the debugfs knobs.
+	 */
+	panthor_irq_coalescing_init(&fw->irq_coalescing, 0, 0, 0);
 	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
 				  ptdev->iomem + JOB_INT_BASE, "job",
 				  panthor_job_irq_raw_handler,
@@ -1563,6 +1619,90 @@ int panthor_fw_init(struct panthor_device *ptdev)
 	return ret;
 }
 
+static ssize_t job_irq_coalescing_props_read(struct file *file,
+					     char __user *ubuf,
+					     size_t ubuf_size,
+					     loff_t *ppos)
+{
+	struct panthor_device *ptdev = container_of(file->private_data,
+						    struct panthor_device, base);
+	char kbuf[256] = {};
+	int kbuf_size;
+
+	kbuf_size = snprintf(kbuf, sizeof(kbuf) - 1,
+			     "max_us=%u poll_period_us=%u inbounds_cnt_threshold=%u\n",
+			     ptdev->fw->irq_coalescing.max_us,
+			     ptdev->fw->irq_coalescing.poll_period_us,
+			     ptdev->fw->irq_coalescing.inbounds_cnt_threshold);
+	if (kbuf_size > sizeof(kbuf) - 1)
+		kbuf_size = sizeof(kbuf) - 1;
+
+	return simple_read_from_buffer(ubuf, ubuf_size, ppos, kbuf, kbuf_size);
+}
+
+static ssize_t job_irq_coalescing_props_write(struct file *file,
+					      const char __user *ubuf,
+					      size_t ubuf_size, loff_t *ppos)
+{
+	struct panthor_device *ptdev = container_of(file->private_data,
+						    struct panthor_device, base);
+	unsigned int max_us = 0, poll_period_us = 0, inbounds_cnt_threshold = 0;
+	char kbuf[256] = {};
+	int ret;
+
+	simple_write_to_buffer(kbuf, sizeof(kbuf) - 1, ppos, ubuf, ubuf_size);
+	ret = sscanf(kbuf,
+		     "max_us=%u poll_period_us=%u inbounds_cnt_threshold=%u",
+		     &max_us, &poll_period_us, &inbounds_cnt_threshold);
+	if (ret != 3)
+		return -EINVAL;
+
+	if (max_us > U16_MAX || poll_period_us > U16_MAX || inbounds_cnt_threshold > U16_MAX)
+		return -EINVAL;
+
+	panthor_irq_coalescing_init(&ptdev->fw->irq_coalescing, max_us,
+				    poll_period_us, inbounds_cnt_threshold);
+	return ubuf_size;
+}
+
+static const struct debugfs_short_fops job_irq_coalescing_props_fops = {
+	.read = job_irq_coalescing_props_read,
+	.write = job_irq_coalescing_props_write,
+};
+
+static ssize_t job_irq_coalescing_stats_read(struct file *file,
+					     char __user *ubuf,
+					     size_t ubuf_size,
+					     loff_t *ppos)
+{
+	struct panthor_device *ptdev = container_of(file->private_data,
+						    struct panthor_device, base);
+	char kbuf[256] = {};
+	int kbuf_size;
+
+	kbuf_size = snprintf(kbuf, sizeof(kbuf) - 1,
+			     "inbounds_cnt=%u coalesced_cnt=%llu last_ts=%llu\n",
+			     ptdev->fw->irq_coalescing.inbounds_cnt,
+			     ptdev->fw->irq_coalescing.coalesced_cnt,
+			     ktime_to_ns(ptdev->fw->irq_coalescing.last_ts));
+	if (kbuf_size > sizeof(kbuf) - 1)
+		kbuf_size = sizeof(kbuf) - 1;
+
+	return simple_read_from_buffer(ubuf, ubuf_size, ppos, kbuf, kbuf_size);
+}
+
+static const struct debugfs_short_fops job_irq_coalescing_stats_fops = {
+	.read = job_irq_coalescing_stats_read,
+};
+
+void panthor_fw_debugfs_init(struct drm_minor *minor)
+{
+	debugfs_create_file("job_irq_coalescing_props", 0600, minor->debugfs_root,
+			    minor->dev, &job_irq_coalescing_props_fops);
+	debugfs_create_file("job_irq_coalescing_stats", 0400, minor->debugfs_root,
+			    minor->dev, &job_irq_coalescing_stats_fops);
+}
+
 MODULE_FIRMWARE("arm/mali/arch10.8/mali_csffw.bin");
 MODULE_FIRMWARE("arm/mali/arch10.10/mali_csffw.bin");
 MODULE_FIRMWARE("arm/mali/arch10.12/mali_csffw.bin");
diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
index e56b7fe15bb3..2643bd9e4ef9 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.h
+++ b/drivers/gpu/drm/panthor/panthor_fw.h
@@ -526,4 +526,6 @@ static inline int panthor_fw_resume(struct panthor_device *ptdev)
 int panthor_fw_init(struct panthor_device *ptdev);
 void panthor_fw_unplug(struct panthor_device *ptdev);
 
+void panthor_fw_debugfs_init(struct drm_minor *minor);
+
 #endif

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (9 preceding siblings ...)
  2026-04-29  9:38 ` [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs Boris Brezillon
@ 2026-04-29  9:59 ` Boris Brezillon
  2026-04-29 10:36 ` Boris Brezillon
  11 siblings, 0 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29  9:59 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Chia-I Wu

+Chia-I

On Wed, 29 Apr 2026 11:38:27 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> Right now, panthor is one of the rare drivers to signal fences
> from work items (not even from the threaded IRQ handler). We
> could move that to the threaded handler, but that would still
> leave the latency caused by the scheduling of the IRQ thread.
> 
> Instead, this patchset moves all the job IRQ processing to
> the raw IRQ handler, which is fine because what the current
> code does is demux the interrupts and deferring actual handling
> to sub work items. The only bits we keep in the IRQ path is
> the dma_fence signalling, which should be acceptable, in term
> of CPU cycles spent in the IRQ context.
> 
> Pretty much all the patches except the last two are just
> preparing the ground to get there. The second to last one
> does the thread -> IRQ transition, and the last one is some
> experimental interrupt coalescing support that I've added
> because I noticed moving job IRQ handling to the raw handler
> generates quite a lot of interrupts in some case, and having
> the system constantly interrupted like that can be
> detrimental.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
> Boris Brezillon (10):
>       drm/panthor: Make panthor_irq::state a non-atomic field
>       drm/panthor: Move the register accessors before the IRQ helpers
>       drm/panthor: Replace the panthor_irq macro machinery by inline helpers
>       drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers
>       drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
>       drm/panthor: Prepare the scheduler logic for FW events in IRQ context
>       drm/panthor: Automate CSG IRQ processing at group unbind time
>       drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
>       drm/panthor: Process FW events in IRQ context
>       drm/panthor: Introduce interrupt coalescing support for job IRQs
> 
>  drivers/gpu/drm/panthor/panthor_device.h | 358 ++++++++++++++---------
>  drivers/gpu/drm/panthor/panthor_drv.c    |   1 +
>  drivers/gpu/drm/panthor/panthor_fw.c     | 226 +++++++++++++--
>  drivers/gpu/drm/panthor/panthor_fw.h     |  11 +-
>  drivers/gpu/drm/panthor/panthor_gpu.c    |  27 +-
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  38 +--
>  drivers/gpu/drm/panthor/panthor_pwr.c    |  21 +-
>  drivers/gpu/drm/panthor/panthor_sched.c  | 475 ++++++++++++++-----------------
>  8 files changed, 698 insertions(+), 459 deletions(-)
> ---
> base-commit: 7455a0583a906533041a80e48c6a2e3230cce96e
> change-id: 20260429-panthor-signal-from-irq-d33684f4d292
> prerequisite-message-id: <20260427155934.416502-1-karunika.choo@arm.com>
> prerequisite-patch-id: 70905a2eb09ab2b31d242a5ed5af3b42fb6a464c
> prerequisite-patch-id: aa4c22669f80328039762f25c0b3942bbadbdc89
> prerequisite-patch-id: 7f61bcee3c4bb5703900b18d5b6e0f52e622f29d
> prerequisite-patch-id: 3402f4d60aa526d40113fc3d9b3e599f8f89e705
> prerequisite-patch-id: 00ddbd3d455891f6950609614c1acd2baa78b0db
> prerequisite-patch-id: 6a9928f609e3757cadebb2df6795d0da55745f4e
> prerequisite-patch-id: fd91f68f25d4bc93eec405f0131f5ae4284bfaf2
> prerequisite-patch-id: 553958a10a0ca2f20f7883ad4c752cfc7485c5a8
> 
> Best regards,


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency
  2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (10 preceding siblings ...)
  2026-04-29  9:59 ` [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
@ 2026-04-29 10:36 ` Boris Brezillon
  2026-05-05  8:54   ` Boris Brezillon
  11 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-04-29 10:36 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On Wed, 29 Apr 2026 11:38:27 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> Right now, panthor is one of the rare drivers to signal fences
> from work items (not even from the threaded IRQ handler). We
> could move that to the threaded handler, but that would still
> leave the latency caused by the scheduling of the IRQ thread.
> 
> Instead, this patchset moves all the job IRQ processing to
> the raw IRQ handler, which is fine because what the current
> code does is demux the interrupts and deferring actual handling
> to sub work items. The only bits we keep in the IRQ path is
> the dma_fence signalling, which should be acceptable, in term
> of CPU cycles spent in the IRQ context.
> 
> Pretty much all the patches except the last two are just
> preparing the ground to get there. The second to last one
> does the thread -> IRQ transition, and the last one is some
> experimental interrupt coalescing support that I've added
> because I noticed moving job IRQ handling to the raw handler
> generates quite a lot of interrupts in some case, and having
> the system constantly interrupted like that can be
> detrimental.
> 

Forgot to post some preliminary numbers I collected during my,
admittedly, very basic testing :-). What this shows is that IRQ
coalescing provides small but noticeable improvements only in some
of the glmark2 scenes (terrain, refract); the rest of the variations
stay in the noise of what we see between regular glmark2 runs. BTW,
those relatively small improvements (~5%) aren't even reflected in the
final score, because many tests have high FPS scores, and any variation
on those might actually have more impact on the final score (which is
just an average FPS IIUC) than any improvement on the lower-FPS scenes.

It's also worth noting that the refract scene seems to suffer from
this threaded -> raw-IRQ transition, and that coalescing gets us back
to where we were.

TL;DR: As always, there's no simple answer to this 'latency vs throughput'
issue, and it's not surprising one approach helps some cases and
regresses others.

---------- Before this series ---------------

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Mesa
    GL_RENDERER:    Mali-G610 MC4 (Panfrost)
    GL_VERSION:     OpenGL ES 3.1 Mesa 26.2.0-devel (git-c71664cfbc)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 2708 FrameTime: 0.369 ms
[build] use-vbo=true: FPS: 4209 FrameTime: 0.238 ms
[texture] texture-filter=nearest: FPS: 5211 FrameTime: 0.192 ms
[texture] texture-filter=linear: FPS: 5224 FrameTime: 0.191 ms
[texture] texture-filter=mipmap: FPS: 5255 FrameTime: 0.190 ms
[shading] shading=gouraud: FPS: 3395 FrameTime: 0.295 ms
[shading] shading=blinn-phong-inf: FPS: 3329 FrameTime: 0.300 ms
[shading] shading=phong: FPS: 2990 FrameTime: 0.335 ms
[shading] shading=cel: FPS: 2916 FrameTime: 0.343 ms
[bump] bump-render=high-poly: FPS: 1879 FrameTime: 0.532 ms
[bump] bump-render=normals: FPS: 5242 FrameTime: 0.191 ms
[bump] bump-render=height: FPS: 4997 FrameTime: 0.200 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3725 FrameTime: 0.268 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 1906 FrameTime: 0.525 ms
[pulsar] light=false:quads=5:texture=false: FPS: 4863 FrameTime: 0.206 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 706 FrameTime: 1.417 ms
[desktop] effect=shadow:windows=4: FPS: 2621 FrameTime: 0.382 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 411 FrameTime: 2.435 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 402 FrameTime: 2.489 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 490 FrameTime: 2.043 ms
[ideas] speed=duration: FPS: 1008 FrameTime: 0.992 ms
[jellyfish] <default>: FPS: 2722 FrameTime: 0.367 ms
[terrain] <default>: FPS: 120 FrameTime: 8.339 ms
[shadow] <default>: FPS: 2086 FrameTime: 0.479 ms
[refract] <default>: FPS: 312 FrameTime: 3.209 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 4877 FrameTime: 0.205 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 4118 FrameTime: 0.243 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 4845 FrameTime: 0.206 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 4444 FrameTime: 0.225 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 3722 FrameTime: 0.269 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4468 FrameTime: 0.224 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4442 FrameTime: 0.225 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 3847 FrameTime: 0.260 ms
=======================================================
                                  glmark2 Score: 3135 
=======================================================

---------- After transitioning to job event processing in the IRQ context ------------

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Mesa
    GL_RENDERER:    Mali-G610 MC4 (Panfrost)
    GL_VERSION:     OpenGL ES 3.1 Mesa 26.2.0-devel (git-c71664cfbc)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 2703 FrameTime: 0.370 ms
[build] use-vbo=true: FPS: 4630 FrameTime: 0.216 ms
[texture] texture-filter=nearest: FPS: 5406 FrameTime: 0.185 ms
[texture] texture-filter=linear: FPS: 5429 FrameTime: 0.184 ms
[texture] texture-filter=mipmap: FPS: 5408 FrameTime: 0.185 ms
[shading] shading=gouraud: FPS: 3678 FrameTime: 0.272 ms
[shading] shading=blinn-phong-inf: FPS: 3587 FrameTime: 0.279 ms
[shading] shading=phong: FPS: 3221 FrameTime: 0.311 ms
[shading] shading=cel: FPS: 3119 FrameTime: 0.321 ms
[bump] bump-render=high-poly: FPS: 1977 FrameTime: 0.506 ms
[bump] bump-render=normals: FPS: 5488 FrameTime: 0.182 ms
[bump] bump-render=height: FPS: 5323 FrameTime: 0.188 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 4003 FrameTime: 0.250 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2008 FrameTime: 0.498 ms
[pulsar] light=false:quads=5:texture=false: FPS: 4961 FrameTime: 0.202 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 852 FrameTime: 1.174 ms
[desktop] effect=shadow:windows=4: FPS: 2649 FrameTime: 0.378 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 412 FrameTime: 2.429 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 392 FrameTime: 2.554 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 482 FrameTime: 2.075 ms
[ideas] speed=duration: FPS: 1021 FrameTime: 0.980 ms
[jellyfish] <default>: FPS: 2939 FrameTime: 0.340 ms
[terrain] <default>: FPS: 126 FrameTime: 7.979 ms
[shadow] <default>: FPS: 2273 FrameTime: 0.440 ms
[refract] <default>: FPS: 251 FrameTime: 3.999 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 5148 FrameTime: 0.194 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 4555 FrameTime: 0.220 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 5245 FrameTime: 0.191 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 4880 FrameTime: 0.205 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 4042 FrameTime: 0.247 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4846 FrameTime: 0.206 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4854 FrameTime: 0.206 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4207 FrameTime: 0.238 ms
=======================================================
                                  glmark2 Score: 3335 
=======================================================

---- With IRQ coalescing enabled (max_us=100 poll_period_us=5 inbounds_cnt_threshold=5) ---

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Mesa
    GL_RENDERER:    Mali-G610 MC4 (Panfrost)
    GL_VERSION:     OpenGL ES 3.1 Mesa 26.2.0-devel (git-c71664cfbc)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 2663 FrameTime: 0.376 ms
[build] use-vbo=true: FPS: 4640 FrameTime: 0.216 ms
[texture] texture-filter=nearest: FPS: 5335 FrameTime: 0.187 ms
[texture] texture-filter=linear: FPS: 5442 FrameTime: 0.184 ms
[texture] texture-filter=mipmap: FPS: 5434 FrameTime: 0.184 ms
[shading] shading=gouraud: FPS: 3683 FrameTime: 0.272 ms
[shading] shading=blinn-phong-inf: FPS: 3580 FrameTime: 0.279 ms
[shading] shading=phong: FPS: 3211 FrameTime: 0.312 ms
[shading] shading=cel: FPS: 3093 FrameTime: 0.323 ms
[bump] bump-render=high-poly: FPS: 1969 FrameTime: 0.508 ms
[bump] bump-render=normals: FPS: 5368 FrameTime: 0.186 ms
[bump] bump-render=height: FPS: 5273 FrameTime: 0.190 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 4038 FrameTime: 0.248 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2001 FrameTime: 0.500 ms
[pulsar] light=false:quads=5:texture=false: FPS: 4961 FrameTime: 0.202 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 842 FrameTime: 1.188 ms
[desktop] effect=shadow:windows=4: FPS: 2681 FrameTime: 0.373 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 412 FrameTime: 2.430 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 408 FrameTime: 2.452 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 483 FrameTime: 2.072 ms
[ideas] speed=duration: FPS: 1005 FrameTime: 0.995 ms
[jellyfish] <default>: FPS: 2945 FrameTime: 0.340 ms
[terrain] <default>: FPS: 131 FrameTime: 7.663 ms
[shadow] <default>: FPS: 2276 FrameTime: 0.440 ms
[refract] <default>: FPS: 328 FrameTime: 3.050 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 5099 FrameTime: 0.196 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 4538 FrameTime: 0.220 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 5152 FrameTime: 0.194 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 4818 FrameTime: 0.208 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 4035 FrameTime: 0.248 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4855 FrameTime: 0.206 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4812 FrameTime: 0.208 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4150 FrameTime: 0.241 ms
=======================================================
                                  glmark2 Score: 3322 
=======================================================




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field
  2026-04-29  9:38 ` [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
@ 2026-04-29 12:29   ` Liviu Dudau
  2026-05-01 13:17   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Liviu Dudau @ 2026-04-29 12:29 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, Apr 29, 2026 at 11:38:28AM +0200, Boris Brezillon wrote:
> The only place where panthor_irq::state is accessed without
> panthor_irq::mask_lock held is in the prologue of _irq_suspend(),
> which is not really a fast-path. So let's simplify things by assuming
> panthor_irq::state must always be accessed with the mask_lock held,
> and add a scoped_guard() in _irq_suspend().
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu
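
For readers following along, the locking rule this patch settles on can be
sketched in userspace C. This is an illustration only, not the driver
code: the small atomic_flag spinlock stands in for panthor_irq::mask_lock,
and struct irq_ctx and the irq_*() helpers are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum irq_state {
	IRQ_STATE_ACTIVE,
	IRQ_STATE_PROCESSING,
	IRQ_STATE_SUSPENDING,
	IRQ_STATE_SUSPENDED,
};

struct irq_ctx {
	atomic_flag mask_lock;	/* stand-in for panthor_irq::mask_lock */
	enum irq_state state;	/* plain field, only accessed with mask_lock held */
};

static void ctx_lock(struct irq_ctx *ctx)
{
	while (atomic_flag_test_and_set_explicit(&ctx->mask_lock,
						 memory_order_acquire))
		;
}

static void ctx_unlock(struct irq_ctx *ctx)
{
	atomic_flag_clear_explicit(&ctx->mask_lock, memory_order_release);
}

static void irq_ctx_init(struct irq_ctx *ctx)
{
	atomic_flag_clear(&ctx->mask_lock);
	ctx->state = IRQ_STATE_SUSPENDED;
}

/* Raw handler side: ACTIVE -> PROCESSING, refused in any other state. */
static bool irq_begin_processing(struct irq_ctx *ctx)
{
	bool claimed = false;

	ctx_lock(ctx);
	if (ctx->state == IRQ_STATE_ACTIVE) {
		ctx->state = IRQ_STATE_PROCESSING;
		claimed = true;
	}
	ctx_unlock(ctx);

	return claimed;
}

/* Threaded handler side: PROCESSING -> ACTIVE, unless a suspend came first. */
static bool irq_end_processing(struct irq_ctx *ctx)
{
	bool reactivated = false;

	ctx_lock(ctx);
	if (ctx->state == IRQ_STATE_PROCESSING) {
		ctx->state = IRQ_STATE_ACTIVE;
		reactivated = true;
	}
	ctx_unlock(ctx);

	return reactivated;
}

static void irq_suspend(struct irq_ctx *ctx)
{
	ctx_lock(ctx);
	ctx->state = IRQ_STATE_SUSPENDING;
	ctx_unlock(ctx);

	/* The driver calls synchronize_irq() at this point. */

	ctx_lock(ctx);
	ctx->state = IRQ_STATE_SUSPENDED;
	ctx_unlock(ctx);
}

static void irq_resume(struct irq_ctx *ctx)
{
	ctx_lock(ctx);
	ctx->state = IRQ_STATE_ACTIVE;
	ctx_unlock(ctx);
}
```

The point of the sketch is that once every access happens under the lock,
a compare-and-branch on a plain enum is enough and the cmpxchg dance goes
away. The only extra cost is the second lock/unlock pair in the suspend
path, which is not a fast path.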

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 35 ++++++++++++++++----------------
>  1 file changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 4e4607bca7cc..3f91ba73829d 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -101,8 +101,12 @@ struct panthor_irq {
>  	 */
>  	spinlock_t mask_lock;
>  
> -	/** @state: one of &enum panthor_irq_state reflecting the current state. */
> -	atomic_t state;
> +	/**
> +	 * @state: one of &enum panthor_irq_state reflecting the current state.
> +	 *
> +	 * Must be accessed with mask_lock held.
> +	 */
> +	enum panthor_irq_state state;
>  };
>  
>  /**
> @@ -510,18 +514,15 @@ const char *panthor_exception_name(struct panthor_device *ptdev,
>  static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
>  {												\
>  	struct panthor_irq *pirq = data;							\
> -	enum panthor_irq_state old_state;							\
>  												\
>  	if (!gpu_read(pirq->iomem, INT_STAT))							\
>  		return IRQ_NONE;								\
>  												\
>  	guard(spinlock_irqsave)(&pirq->mask_lock);						\
> -	old_state = atomic_cmpxchg(&pirq->state,						\
> -				   PANTHOR_IRQ_STATE_ACTIVE,					\
> -				   PANTHOR_IRQ_STATE_PROCESSING);				\
> -	if (old_state != PANTHOR_IRQ_STATE_ACTIVE)						\
> +	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
>  		return IRQ_NONE;								\
>  												\
> +	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
>  	gpu_write(pirq->iomem, INT_MASK, 0);							\
>  	return IRQ_WAKE_THREAD;									\
>  }												\
> @@ -551,13 +552,10 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>  	}											\
>  												\
>  	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
> -		enum panthor_irq_state old_state;						\
> -												\
> -		old_state = atomic_cmpxchg(&pirq->state,					\
> -					   PANTHOR_IRQ_STATE_PROCESSING,			\
> -					   PANTHOR_IRQ_STATE_ACTIVE);				\
> -		if (old_state == PANTHOR_IRQ_STATE_PROCESSING)					\
> +		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
> +			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
>  			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
> +		}										\
>  	}											\
>  												\
>  	return ret;										\
> @@ -566,18 +564,19 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>  static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
>  {												\
>  	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
> -		atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDING);				\
> +		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
>  		gpu_write(pirq->iomem, INT_MASK, 0);						\
>  	}											\
>  	synchronize_irq(pirq->irq);								\
> -	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDED);					\
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
> +		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
>  }												\
>  												\
>  static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
>  {												\
>  	guard(spinlock_irqsave)(&pirq->mask_lock);						\
>  												\
> -	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_ACTIVE);					\
> +	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
>  	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
>  	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
>  }												\
> @@ -610,7 +609,7 @@ static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *
>  	 * on the PROCESSING -> ACTIVE transition.						\
>  	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
>  	 */											\
> -	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
> +	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
>  		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
>  }												\
>  												\
> @@ -624,7 +623,7 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
>  	 * on the PROCESSING -> ACTIVE transition.						\
>  	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
>  	 */											\
> -	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
> +	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
>  		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
>  }
>  
> 
> -- 
> 2.53.0
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers
  2026-04-29  9:38 ` [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
@ 2026-04-29 12:31   ` Liviu Dudau
  2026-05-01 13:17   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Liviu Dudau @ 2026-04-29 12:31 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, Apr 29, 2026 at 11:38:29AM +0200, Boris Brezillon wrote:
> We're about to add an IRQ inline helper using gpu_read(). Move things
> around to avoid forward declarations.
> 
> No functional changes.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 142 +++++++++++++++----------------
>  1 file changed, 71 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 3f91ba73829d..768fc1992368 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -495,6 +495,77 @@ panthor_exception_is_fault(u32 exception_code)
>  const char *panthor_exception_name(struct panthor_device *ptdev,
>  				   u32 exception_code);
>  
> +static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
> +{
> +	writel(data, iomem + reg);
> +}
> +
> +static inline u32 gpu_read(void __iomem *iomem, u32 reg)
> +{
> +	return readl(iomem + reg);
> +}
> +
> +static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
> +{
> +	return readl_relaxed(iomem + reg);
> +}
> +
> +static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
> +{
> +	gpu_write(iomem, reg, lower_32_bits(data));
> +	gpu_write(iomem, reg + 4, upper_32_bits(data));
> +}
> +
> +static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
> +{
> +	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
> +}
> +
> +static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
> +{
> +	return (gpu_read_relaxed(iomem, reg) |
> +		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
> +}
> +
> +static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
> +{
> +	u32 lo, hi1, hi2;
> +	do {
> +		hi1 = gpu_read(iomem, reg + 4);
> +		lo = gpu_read(iomem, reg);
> +		hi2 = gpu_read(iomem, reg + 4);
> +	} while (hi1 != hi2);
> +	return lo | ((u64)hi2 << 32);
> +}
> +
> +#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> +	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
> +			  iomem, reg)
> +
> +#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> +				     timeout_us)				\
> +	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
> +				 false, iomem, reg)
> +
> +#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> +	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
> +			  iomem, reg)
> +
> +#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> +				       timeout_us)				\
> +	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
> +				 false, iomem, reg)
> +
> +#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
> +					     timeout_us)			\
> +	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
> +				 timeout_us, false, iomem, reg)
> +
> +#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
> +					timeout_us)				\
> +	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
> +			  false, iomem, reg)
> +
>  #define INT_RAWSTAT 0x0
>  #define INT_CLEAR   0x4
>  #define INT_MASK    0x8
> @@ -629,75 +700,4 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
>  
>  extern struct workqueue_struct *panthor_cleanup_wq;
>  
> -static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
> -{
> -	writel(data, iomem + reg);
> -}
> -
> -static inline u32 gpu_read(void __iomem *iomem, u32 reg)
> -{
> -	return readl(iomem + reg);
> -}
> -
> -static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
> -{
> -	return readl_relaxed(iomem + reg);
> -}
> -
> -static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
> -{
> -	gpu_write(iomem, reg, lower_32_bits(data));
> -	gpu_write(iomem, reg + 4, upper_32_bits(data));
> -}
> -
> -static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
> -{
> -	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
> -}
> -
> -static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
> -{
> -	return (gpu_read_relaxed(iomem, reg) |
> -		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
> -}
> -
> -static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
> -{
> -	u32 lo, hi1, hi2;
> -	do {
> -		hi1 = gpu_read(iomem, reg + 4);
> -		lo = gpu_read(iomem, reg);
> -		hi2 = gpu_read(iomem, reg + 4);
> -	} while (hi1 != hi2);
> -	return lo | ((u64)hi2 << 32);
> -}
> -
> -#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> -	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
> -			  iomem, reg)
> -
> -#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> -				     timeout_us)				\
> -	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
> -				 false, iomem, reg)
> -
> -#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> -	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
> -			  iomem, reg)
> -
> -#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> -				       timeout_us)				\
> -	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
> -				 false, iomem, reg)
> -
> -#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
> -					     timeout_us)			\
> -	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
> -				 timeout_us, false, iomem, reg)
> -
> -#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
> -					timeout_us)				\
> -	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
> -			  false, iomem, reg)
> -
>  #endif
> 
> -- 
> 2.53.0
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers
  2026-04-29  9:38 ` [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers Boris Brezillon
@ 2026-04-29 13:32   ` Liviu Dudau
  2026-05-01 13:28   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Liviu Dudau @ 2026-04-29 13:32 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, Apr 29, 2026 at 11:38:31AM +0200, Boris Brezillon wrote:
> All drivers except panthor signal their fences from their interrupt
> handler to minimize latency. We could do the same from the interrupt
> handler, but the latency is still quite high in that case, so let's
> allow components to choose the context they want their IRQ handler
> to run in.

Starting here

> 
> This takes the form of an extra fast_handler() returning an irqreturn_t
> reflecting the need to wake-up a thread or not.
> A new PANTHOR_IRQ_ADV_HANDLER() macro taking this extra fast_handler
> argument is added, PANTHOR_IRQ_HANDLER() is implemented as a wrapper
> around PANTHOR_IRQ_ADV_HANDLER() with a default fast_handler
> returning IRQ_WAKE_THREAD.

Up to here: no code in this patch matches the description. Left over from
an earlier iteration?

> The fast and slow handler are still assumed
> to be mutually exclusive. In case a fast handler is provided, the
> slow_handler is expected to be run when the event can't be processed
> directly in the fast handler, or when the driver thinks it would be
> beneficial to coalesce interrupts by polling in the thread rather than
> re-enabling interrupts immediately.

This part doesn't really describe any code, just the intent. Maybe it's worth
moving it into the code as a comment?

Otherwise, the change looks fine to me.

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu

> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 5 ++---
>  drivers/gpu/drm/panthor/panthor_fw.c     | 1 +
>  drivers/gpu/drm/panthor/panthor_gpu.c    | 1 +
>  drivers/gpu/drm/panthor/panthor_mmu.c    | 1 +
>  drivers/gpu/drm/panthor/panthor_pwr.c    | 1 +
>  5 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index afa202546316..1c130b8394ab 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
>  static inline int
>  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
>  		    int irq, u32 mask, void __iomem *iomem, const char *name,
> +		    irqreturn_t (*raw_handler)(int, void *data),
>  		    irqreturn_t (*threaded_handler)(int, void *data))
>  {
>  	const char *full_name;
> @@ -687,9 +688,7 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
>  	if (!full_name)
>  		return -ENOMEM;
>  
> -	return devm_request_threaded_irq(ptdev->base.dev, irq,
> -					 panthor_irq_default_raw_handler,
> -					 threaded_handler,
> +	return devm_request_threaded_irq(ptdev->base.dev, irq, raw_handler, threaded_handler,
>  					 IRQF_SHARED, full_name, pirq);
>  }
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index eaf599b0a887..8239a6951569 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1483,6 +1483,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  
>  	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>  				  ptdev->iomem + JOB_INT_BASE, "job",
> +				  panthor_irq_default_raw_handler,
>  				  panthor_job_irq_threaded_handler);
>  	if (ret) {
>  		drm_err(&ptdev->base, "failed to request job irq");
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index ce208e384762..d0be758ea3e1 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -177,6 +177,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
>  	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
>  				  GPU_INTERRUPTS_MASK,
>  				  ptdev->iomem + GPU_INT_BASE, "gpu",
> +				  panthor_irq_default_raw_handler,
>  				  panthor_gpu_irq_threaded_handler);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index a0d0a9b2926f..2cb07933b629 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -3260,6 +3260,7 @@ int panthor_mmu_init(struct panthor_device *ptdev)
>  	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
>  				  panthor_mmu_fault_mask(ptdev, ~0),
>  				  ptdev->iomem + MMU_INT_BASE, "mmu",
> +				  panthor_irq_default_raw_handler,
>  				  panthor_mmu_irq_threaded_handler);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> index 80cf78007896..1efb7f3482ba 100644
> --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> @@ -491,6 +491,7 @@ int panthor_pwr_init(struct panthor_device *ptdev)
>  	err = panthor_irq_request(
>  		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
>  		pwr->iomem + PWR_INT_BASE, "pwr",
> +		panthor_irq_default_raw_handler,
>  		panthor_pwr_irq_threaded_handler);
>  	if (err)
>  		return err;
> 
> -- 
> 2.53.0
> 
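
As a rough illustration of the raw/threaded split this patch makes
configurable, here is a small userspace model. It is only a sketch: every
name prefixed with model_/MODEL_ is hypothetical and merely mimics the
kernel's irqreturn_t convention, and the "thread" is invoked synchronously,
whereas a real IRQ thread is scheduled asynchronously by the IRQ core.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's irqreturn_t values (hypothetical names). */
enum model_irqreturn {
	MODEL_IRQ_NONE,
	MODEL_IRQ_HANDLED,
	MODEL_IRQ_WAKE_THREAD,
};

typedef enum model_irqreturn (*model_handler_t)(void *data);

#define MODEL_EVT_FAST 0x1u /* cheap event, consumed in the raw handler */
#define MODEL_EVT_SLOW 0x2u /* expensive event, deferred to the thread */

struct model_irq {
	uint32_t pending;        /* events raised by the fake hardware */
	unsigned int fast_count; /* passes that consumed a fast event */
	unsigned int slow_count; /* passes that ran in the threaded handler */
	model_handler_t raw_handler;
	model_handler_t threaded_handler;
};

/* Registration takes both handlers, like the extended panthor_irq_request(). */
static void model_irq_request(struct model_irq *irq,
			      model_handler_t raw_handler,
			      model_handler_t threaded_handler)
{
	irq->raw_handler = raw_handler;
	irq->threaded_handler = threaded_handler;
}

/* Raw handler: consume cheap events inline, ask for the thread otherwise. */
static enum model_irqreturn model_raw_handler(void *data)
{
	struct model_irq *irq = data;

	if (!irq->pending)
		return MODEL_IRQ_NONE;

	if (irq->pending & MODEL_EVT_FAST) {
		irq->pending &= ~MODEL_EVT_FAST;
		irq->fast_count++;
	}

	return irq->pending ? MODEL_IRQ_WAKE_THREAD : MODEL_IRQ_HANDLED;
}

/* Threaded handler: consume whatever the raw handler left behind. */
static enum model_irqreturn model_threaded_handler(void *data)
{
	struct model_irq *irq = data;

	if (!irq->pending)
		return MODEL_IRQ_NONE;

	irq->pending = 0;
	irq->slow_count++;
	return MODEL_IRQ_HANDLED;
}

/* Simplified dispatch: run the thread only if the raw handler asks for it. */
static enum model_irqreturn model_dispatch(struct model_irq *irq)
{
	enum model_irqreturn ret = irq->raw_handler(irq);

	if (ret == MODEL_IRQ_WAKE_THREAD)
		ret = irq->threaded_handler(irq);

	return ret;
}
```

In this model, a caller that only cares about deferred processing would
register a raw handler that always returns MODEL_IRQ_WAKE_THREAD, which
is roughly the role panthor_irq_default_raw_handler plays in the diff above.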



* Re: [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
  2026-04-29  9:38 ` [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
@ 2026-04-29 13:33   ` Liviu Dudau
  2026-05-01 13:39   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Liviu Dudau @ 2026-04-29 13:33 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, Apr 29, 2026 at 11:38:32AM +0200, Boris Brezillon wrote:
> If we want some FW events to be processed in the interrupt path, we need
> the helpers manipulating req regs to be IRQ-safe, which implies using
> spin_lock_irqsave instead of spinlock. While at it, use guards instead
> of plain spin_lock/unlock calls.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu

> ---
>  drivers/gpu/drm/panthor/panthor_fw.h | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index a99a9b6f4825..e56b7fe15bb3 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -432,12 +432,11 @@ struct panthor_fw_global_iface {
>  #define panthor_fw_toggle_reqs(__iface, __in_reg, __out_reg, __mask) \
>  	do { \
>  		u32 __cur_val, __new_val, __out_val; \
> -		spin_lock(&(__iface)->lock); \
> +		guard(spinlock_irqsave)(&(__iface)->lock); \
>  		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
>  		__out_val = READ_ONCE((__iface)->output->__out_reg); \
>  		__new_val = ((__out_val ^ (__mask)) & (__mask)) | (__cur_val & ~(__mask)); \
>  		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -		spin_unlock(&(__iface)->lock); \
>  	} while (0)
>  
>  /**
> @@ -458,21 +457,19 @@ struct panthor_fw_global_iface {
>  #define panthor_fw_update_reqs(__iface, __in_reg, __val, __mask) \
>  	do { \
>  		u32 __cur_val, __new_val; \
> -		spin_lock(&(__iface)->lock); \
> +		guard(spinlock_irqsave)(&(__iface)->lock); \
>  		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
>  		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
>  		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -		spin_unlock(&(__iface)->lock); \
>  	} while (0)
>  
>  #define panthor_fw_update_reqs64(__iface, __in_reg, __val, __mask) \
>  	do { \
>  		u64 __cur_val, __new_val; \
> -		spin_lock(&(__iface)->lock); \
> +		guard(spinlock_irqsave)(&(__iface)->lock); \
>  		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
>  		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
>  		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -		spin_unlock(&(__iface)->lock); \
>  	} while (0)
>  
>  struct panthor_fw_global_iface *
> 
> -- 
> 2.53.0
> 
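
For readers unfamiliar with the request/acknowledge arithmetic inside these
macros, the sketch below restates it as pure functions. This is only an
illustration: model_update_reqs() and model_toggle_reqs() are hypothetical
names, and the locking and READ_ONCE()/WRITE_ONCE() accesses of the real
macros are deliberately left out.

```c
#include <stdint.h>

/*
 * Same expression as panthor_fw_update_reqs(): bits selected by @mask are
 * taken from @val, all other bits of @cur are preserved.
 */
static uint32_t model_update_reqs(uint32_t cur, uint32_t val, uint32_t mask)
{
	return (cur & ~mask) | (val & mask);
}

/*
 * Same expression as panthor_fw_toggle_reqs(): bits selected by @mask are
 * set to the inverse of the matching bits in the output register value
 * @out, all other bits of @cur are preserved.
 */
static uint32_t model_toggle_reqs(uint32_t cur, uint32_t out, uint32_t mask)
{
	return ((out ^ mask) & mask) | (cur & ~mask);
}
```

For instance, model_update_reqs(0xF0F0, 0x00FF, 0x0F0F) gives 0xF0FF, and
model_toggle_reqs(0x5, 0x1, 0x3) gives 0x6: bit 0 is cleared because the
output bit is set, bit 1 is set because the output bit is clear, and bit 2
is outside the mask so it is kept.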



* Re: [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-04-29  9:38 ` [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
@ 2026-04-30  9:40   ` Karunika Choo
  2026-04-30 10:38     ` Boris Brezillon
  2026-05-01 13:22   ` Steven Price
  1 sibling, 1 reply; 39+ messages in thread
From: Karunika Choo @ 2026-04-30  9:40 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Now that panthor_irq contains the iomem region, there's no real need
> for the macro-based panthor_irq helper generation logic. We can just
> provide inline helpers that do the same and let the compiler optimize
> indirect function calls. The only extra annoyance is the fact we have
> to open-code the panthor_xxx_irq_threaded_handler() implementation, but
> those are single-line functions, so it's acceptable.
> 
> While at it, we changed the prototype of the IRQ handlers to take
> a panthor_irq instead of panthor_device, since that's the thing
> that's passed around when it comes to panthor_irq, and the
> panthor_device can be directly extracted from there.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 245 +++++++++++++++----------------
>  drivers/gpu/drm/panthor/panthor_fw.c     |  22 ++-
>  drivers/gpu/drm/panthor/panthor_gpu.c    |  26 ++--
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  37 ++---
>  drivers/gpu/drm/panthor/panthor_pwr.c    |  20 ++-
>  5 files changed, 183 insertions(+), 167 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 768fc1992368..afa202546316 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -571,131 +571,126 @@ static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
>  #define INT_MASK    0x8
>  #define INT_STAT    0xc
>  
> -/**
> - * PANTHOR_IRQ_HANDLER() - Define interrupt handlers and the interrupt
> - * registration function.
> - *
> - * The boiler-plate to gracefully deal with shared interrupts is
> - * auto-generated. All you have to do is call PANTHOR_IRQ_HANDLER()
> - * just after the actual handler. The handler prototype is:
> - *
> - * void (*handler)(struct panthor_device *, u32 status);
> - */
> -#define PANTHOR_IRQ_HANDLER(__name, __handler)							\
> -static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
> -{												\
> -	struct panthor_irq *pirq = data;							\
> -												\
> -	if (!gpu_read(pirq->iomem, INT_STAT))							\
> -		return IRQ_NONE;								\
> -												\
> -	guard(spinlock_irqsave)(&pirq->mask_lock);						\
> -	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
> -		return IRQ_NONE;								\
> -												\
> -	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
> -	gpu_write(pirq->iomem, INT_MASK, 0);							\
> -	return IRQ_WAKE_THREAD;									\
> -}												\
> -												\
> -static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *data)		\
> -{												\
> -	struct panthor_irq *pirq = data;							\
> -	struct panthor_device *ptdev = pirq->ptdev;						\
> -	irqreturn_t ret = IRQ_NONE;								\
> -												\
> -	while (true) {										\
> -		/* It's safe to access pirq->mask without the lock held here. If a new		\
> -		 * event gets added to the mask and the corresponding IRQ is pending,		\
> -		 * we'll process it right away instead of adding an extra raw -> threaded	\
> -		 * round trip. If an event is removed and the status bit is set, it will	\
> -		 * be ignored, just like it would have been if the mask had been adjusted	\
> -		 * right before the HW event kicks in. TLDR; it's all expected races we're	\
> -		 * covered for.									\
> -		 */										\
> -		u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;			\
> -												\
> -		if (!status)									\
> -			break;									\
> -												\
> -		__handler(ptdev, status);							\
> -		ret = IRQ_HANDLED;								\
> -	}											\
> -												\
> -	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
> -		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
> -			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
> -			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
> -		}										\
> -	}											\
> -												\
> -	return ret;										\
> -}												\
> -												\
> -static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
> -{												\
> -	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
> -		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
> -		gpu_write(pirq->iomem, INT_MASK, 0);						\
> -	}											\
> -	synchronize_irq(pirq->irq);								\
> -	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
> -		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
> -}												\
> -												\
> -static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
> -{												\
> -	guard(spinlock_irqsave)(&pirq->mask_lock);						\
> -												\
> -	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
> -	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
> -	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
> -}												\
> -												\
> -static int panthor_request_ ## __name ## _irq(struct panthor_device *ptdev,			\
> -					      struct panthor_irq *pirq,				\
> -					      int irq, u32 mask, void __iomem *iomem)		\
> -{												\
> -	pirq->ptdev = ptdev;									\
> -	pirq->irq = irq;									\
> -	pirq->mask = mask;									\
> -	pirq->iomem = iomem;									\
> -	spin_lock_init(&pirq->mask_lock);							\
> -	panthor_ ## __name ## _irq_resume(pirq);						\
> -												\
> -	return devm_request_threaded_irq(ptdev->base.dev, irq,					\
> -					 panthor_ ## __name ## _irq_raw_handler,		\
> -					 panthor_ ## __name ## _irq_threaded_handler,		\
> -					 IRQF_SHARED, KBUILD_MODNAME "-" # __name,		\
> -					 pirq);							\
> -}												\
> -												\
> -static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *pirq, u32 mask)	\
> -{												\
> -	guard(spinlock_irqsave)(&pirq->mask_lock);						\
> -	pirq->mask |= mask;									\
> -												\
> -	/* The only situation where we need to write the new mask is if the IRQ is active.	\
> -	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()	\
> -	 * on the PROCESSING -> ACTIVE transition.						\
> -	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
> -	 */											\
> -	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
> -		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
> -}												\
> -												\
> -static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq *pirq, u32 mask)\
> -{												\
> -	guard(spinlock_irqsave)(&pirq->mask_lock);						\
> -	pirq->mask &= ~mask;									\
> -												\
> -	/* The only situation where we need to write the new mask is if the IRQ is active.	\
> -	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()	\
> -	 * on the PROCESSING -> ACTIVE transition.						\
> -	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
> -	 */											\
> -	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
> -		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
> +static inline irqreturn_t panthor_irq_default_raw_handler(int irq, void *data)
> +{
> +	struct panthor_irq *pirq = data;
> +
> +	if (!gpu_read(pirq->iomem, INT_STAT))
> +		return IRQ_NONE;
> +
> +	guard(spinlock_irqsave)(&pirq->mask_lock);
> +	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> +		return IRQ_NONE;
> +
> +	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> +	gpu_write(pirq->iomem, INT_MASK, 0);
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static inline irqreturn_t
> +panthor_irq_default_threaded_handler(void *data,
> +				     void (*slow_handler)(struct panthor_irq *, u32))
> +{
> +	struct panthor_irq *pirq = data;
> +	irqreturn_t ret = IRQ_NONE;
> +
> +	while (true) {
> +		/* It's safe to access pirq->mask without the lock held here. If a new
> +		 * event gets added to the mask and the corresponding IRQ is pending,
> +		 * we'll process it right away instead of adding an extra raw -> threaded
> +		 * round trip. If an event is removed and the status bit is set, it will
> +		 * be ignored, just like it would have been if the mask had been adjusted
> +		 * right before the HW event kicks in. TLDR; it's all expected races we're
> +		 * covered for.
> +		 */
> +		u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;
> +
> +		if (!status)
> +			break;
> +
> +		slow_handler(pirq, status);
> +		ret = IRQ_HANDLED;
> +	}
> +
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {
> +			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +			gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static inline void panthor_irq_suspend(struct panthor_irq *pirq)
> +{
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;
> +		gpu_write(pirq->iomem, INT_MASK, 0);
> +	}
> +	synchronize_irq(pirq->irq);
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock)
> +		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;
> +}
> +
> +static inline void panthor_irq_resume(struct panthor_irq *pirq)
> +{
> +	guard(spinlock_irqsave)(&pirq->mask_lock);
> +	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);
> +	gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +}
> +
> +static inline void panthor_irq_enable_events(struct panthor_irq *pirq, u32 mask)
> +{
> +	guard(spinlock_irqsave)(&pirq->mask_lock);
> +	pirq->mask |= mask;
> +
> +	/* The only situation where we need to write the new mask is if the IRQ is active.
> +	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
> +	 * on the PROCESSING -> ACTIVE transition.
> +	 * If the IRQ is suspended/suspending, the mask is restored at resume time.
> +	 */
> +	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
> +		gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +}
> +
> +static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask)
> +{
> +	guard(spinlock_irqsave)(&pirq->mask_lock);
> +	pirq->mask &= ~mask;
> +
> +	/* The only situation where we need to write the new mask is if the IRQ is active.
> +	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
> +	 * on the PROCESSING -> ACTIVE transition.
> +	 * If the IRQ is suspended/suspending, the mask is restored at resume time.
> +	 */
> +	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
> +		gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +}
> +
> +static inline int
> +panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> +		    int irq, u32 mask, void __iomem *iomem, const char *name,
> +		    irqreturn_t (*threaded_handler)(int, void *data))
> +{
> +	const char *full_name;
> +
> +	pirq->ptdev = ptdev;
> +	pirq->irq = irq;
> +	pirq->mask = mask;
> +	pirq->iomem = iomem;
> +	spin_lock_init(&pirq->mask_lock);
> +	panthor_irq_resume(pirq);
> +
> +	full_name = devm_kasprintf(ptdev->base.dev, GFP_KERNEL, KBUILD_MODNAME "-%s", name);
> +	if (!full_name)
> +		return -ENOMEM;
> +
> +	return devm_request_threaded_irq(ptdev->base.dev, irq,
> +					 panthor_irq_default_raw_handler,
> +					 threaded_handler,
> +					 IRQF_SHARED, full_name, pirq);
>  }
>  
>  extern struct workqueue_struct *panthor_cleanup_wq;
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 986151681b24..eaf599b0a887 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1064,8 +1064,9 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>  			 msecs_to_jiffies(PING_INTERVAL_MS));
>  }
>  
> -static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +	struct panthor_device *ptdev = pirq->ptdev;
>  	u32 duration;
>  	u64 start = 0;
>  
> @@ -1091,7 +1092,11 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
>  		trace_gpu_job_irq(ptdev->base.dev, status, duration);
>  	}
>  }
> -PANTHOR_IRQ_HANDLER(job, panthor_job_irq_handler);
> +
> +static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> +{
> +	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> +}
>  

Hello,

Maybe we can consider embedding the slow_handler into struct panthor_irq?
You can then always use a default threaded IRQ handler here and call
pirq->slow_handler when needed.

Kind regards,
Karunika


>  static int panthor_fw_start(struct panthor_device *ptdev)
>  {
> @@ -1099,8 +1104,8 @@ static int panthor_fw_start(struct panthor_device *ptdev)
>  	bool timedout = false;
>  
>  	ptdev->fw->booted = false;
> -	panthor_job_irq_enable_events(&ptdev->fw->irq, ~0);
> -	panthor_job_irq_resume(&ptdev->fw->irq);
> +	panthor_irq_enable_events(&ptdev->fw->irq, ~0);
> +	panthor_irq_resume(&ptdev->fw->irq);
>  	gpu_write(fw->iomem, MCU_CONTROL, MCU_CONTROL_AUTO);
>  
>  	if (!wait_event_timeout(ptdev->fw->req_waitqueue,
> @@ -1210,7 +1215,7 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang)
>  			ptdev->reset.fast = true;
>  	}
>  
> -	panthor_job_irq_suspend(&ptdev->fw->irq);
> +	panthor_irq_suspend(&ptdev->fw->irq);
>  	panthor_fw_stop(ptdev);
>  }
>  
> @@ -1280,7 +1285,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>  	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) {
>  		/* Make sure the IRQ handler cannot be called after that point. */
>  		if (ptdev->fw->irq.irq)
> -			panthor_job_irq_suspend(&ptdev->fw->irq);
> +			panthor_irq_suspend(&ptdev->fw->irq);
>  
>  		panthor_fw_stop(ptdev);
>  	}
> @@ -1476,8 +1481,9 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  	if (irq <= 0)
>  		return -ENODEV;
>  
> -	ret = panthor_request_job_irq(ptdev, &fw->irq, irq, 0,
> -				      ptdev->iomem + JOB_INT_BASE);
> +	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
> +				  ptdev->iomem + JOB_INT_BASE, "job",
> +				  panthor_job_irq_threaded_handler);
>  	if (ret) {
>  		drm_err(&ptdev->base, "failed to request job irq");
>  		return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index e52c5675981f..ce208e384762 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -86,8 +86,9 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
>  	gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
>  }
>  
> -static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +	struct panthor_device *ptdev = pirq->ptdev;
>  	struct panthor_gpu *gpu = ptdev->gpu;
>  
>  	gpu_write(gpu->irq.iomem, INT_CLEAR, status);
> @@ -116,7 +117,11 @@ static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
>  	}
>  	spin_unlock(&ptdev->gpu->reqs_lock);
>  }
> -PANTHOR_IRQ_HANDLER(gpu, panthor_gpu_irq_handler);
> +
> +static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> +{
> +	return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
> +}
>  
>  /**
>   * panthor_gpu_unplug() - Called when the GPU is unplugged.
> @@ -128,7 +133,7 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
>  
>  	/* Make sure the IRQ handler is not running after that point. */
>  	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> -		panthor_gpu_irq_suspend(&ptdev->gpu->irq);
> +		panthor_irq_suspend(&ptdev->gpu->irq);
>  
>  	/* Wake-up all waiters. */
>  	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
> @@ -169,9 +174,10 @@ int panthor_gpu_init(struct panthor_device *ptdev)
>  	if (irq < 0)
>  		return irq;
>  
> -	ret = panthor_request_gpu_irq(ptdev, &ptdev->gpu->irq, irq,
> -				      GPU_INTERRUPTS_MASK,
> -				      ptdev->iomem + GPU_INT_BASE);
> +	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
> +				  GPU_INTERRUPTS_MASK,
> +				  ptdev->iomem + GPU_INT_BASE, "gpu",
> +				  panthor_gpu_irq_threaded_handler);
>  	if (ret)
>  		return ret;
>  
> @@ -182,7 +188,7 @@ int panthor_gpu_power_changed_on(struct panthor_device *ptdev)
>  {
>  	guard(pm_runtime_active)(ptdev->base.dev);
>  
> -	panthor_gpu_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
> +	panthor_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
>  
>  	return 0;
>  }
> @@ -191,7 +197,7 @@ void panthor_gpu_power_changed_off(struct panthor_device *ptdev)
>  {
>  	guard(pm_runtime_active)(ptdev->base.dev);
>  
> -	panthor_gpu_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
> +	panthor_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
>  }
>  
>  /**
> @@ -424,7 +430,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
>  	else
>  		panthor_hw_l2_power_off(ptdev);
>  
> -	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
> +	panthor_irq_suspend(&ptdev->gpu->irq);
>  }
>  
>  /**
> @@ -436,7 +442,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
>   */
>  void panthor_gpu_resume(struct panthor_device *ptdev)
>  {
> -	panthor_gpu_irq_resume(&ptdev->gpu->irq);
> +	panthor_irq_resume(&ptdev->gpu->irq);
>  	panthor_hw_l2_power_on(ptdev);
>  }
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index a7ee14986849..a0d0a9b2926f 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -586,17 +586,13 @@ static u32 panthor_mmu_as_fault_mask(struct panthor_device *ptdev, u32 as)
>  	return BIT(as);
>  }
>  
> -/* Forward declaration to call helpers within as_enable/disable */
> -static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status);
> -PANTHOR_IRQ_HANDLER(mmu, panthor_mmu_irq_handler);
> -
>  static int panthor_mmu_as_enable(struct panthor_device *ptdev, u32 as_nr,
>  				 u64 transtab, u64 transcfg, u64 memattr)
>  {
>  	struct panthor_mmu *mmu = ptdev->mmu;
>  
> -	panthor_mmu_irq_enable_events(&ptdev->mmu->irq,
> -				      panthor_mmu_as_fault_mask(ptdev, as_nr));
> +	panthor_irq_enable_events(&ptdev->mmu->irq,
> +				  panthor_mmu_as_fault_mask(ptdev, as_nr));
>  
>  	gpu_write64(mmu->iomem, AS_TRANSTAB(as_nr), transtab);
>  	gpu_write64(mmu->iomem, AS_MEMATTR(as_nr), memattr);
> @@ -614,8 +610,8 @@ static int panthor_mmu_as_disable(struct panthor_device *ptdev, u32 as_nr,
>  
>  	lockdep_assert_held(&ptdev->mmu->as.slots_lock);
>  
> -	panthor_mmu_irq_disable_events(&ptdev->mmu->irq,
> -				       panthor_mmu_as_fault_mask(ptdev, as_nr));
> +	panthor_irq_disable_events(&ptdev->mmu->irq,
> +				   panthor_mmu_as_fault_mask(ptdev, as_nr));
>  
>  	/* Flush+invalidate RW caches, invalidate RO ones. */
>  	ret = panthor_gpu_flush_caches(ptdev, CACHE_CLEAN | CACHE_INV,
> @@ -1785,8 +1781,9 @@ static void panthor_vm_unlock_region(struct panthor_vm *vm)
>  	mutex_unlock(&ptdev->mmu->as.slots_lock);
>  }
>  
> -static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_mmu_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +	struct panthor_device *ptdev = pirq->ptdev;
>  	struct panthor_mmu *mmu = ptdev->mmu;
>  	bool has_unhandled_faults = false;
>  
> @@ -1849,6 +1846,11 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
>  		panthor_sched_report_mmu_fault(ptdev);
>  }
>  
> +static irqreturn_t panthor_mmu_irq_threaded_handler(int irq, void *data)
> +{
> +	return panthor_irq_default_threaded_handler(data, panthor_mmu_irq_handler);
> +}
> +
>  /**
>   * panthor_mmu_suspend() - Suspend the MMU logic
>   * @ptdev: Device.
> @@ -1873,7 +1875,7 @@ void panthor_mmu_suspend(struct panthor_device *ptdev)
>  	}
>  	mutex_unlock(&ptdev->mmu->as.slots_lock);
>  
> -	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +	panthor_irq_suspend(&ptdev->mmu->irq);
>  }
>  
>  /**
> @@ -1892,7 +1894,7 @@ void panthor_mmu_resume(struct panthor_device *ptdev)
>  	ptdev->mmu->as.faulty_mask = 0;
>  	mutex_unlock(&ptdev->mmu->as.slots_lock);
>  
> -	panthor_mmu_irq_resume(&ptdev->mmu->irq);
> +	panthor_irq_resume(&ptdev->mmu->irq);
>  }
>  
>  /**
> @@ -1909,7 +1911,7 @@ void panthor_mmu_pre_reset(struct panthor_device *ptdev)
>  {
>  	struct panthor_vm *vm;
>  
> -	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +	panthor_irq_suspend(&ptdev->mmu->irq);
>  
>  	mutex_lock(&ptdev->mmu->vm.lock);
>  	ptdev->mmu->vm.reset_in_progress = true;
> @@ -1946,7 +1948,7 @@ void panthor_mmu_post_reset(struct panthor_device *ptdev)
>  
>  	mutex_unlock(&ptdev->mmu->as.slots_lock);
>  
> -	panthor_mmu_irq_resume(&ptdev->mmu->irq);
> +	panthor_irq_resume(&ptdev->mmu->irq);
>  
>  	/* Restart the VM_BIND queues. */
>  	mutex_lock(&ptdev->mmu->vm.lock);
> @@ -3201,7 +3203,7 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
>  void panthor_mmu_unplug(struct panthor_device *ptdev)
>  {
>  	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> -		panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +		panthor_irq_suspend(&ptdev->mmu->irq);
>  
>  	mutex_lock(&ptdev->mmu->as.slots_lock);
>  	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
> @@ -3255,9 +3257,10 @@ int panthor_mmu_init(struct panthor_device *ptdev)
>  	if (irq <= 0)
>  		return -ENODEV;
>  
> -	ret = panthor_request_mmu_irq(ptdev, &mmu->irq, irq,
> -				      panthor_mmu_fault_mask(ptdev, ~0),
> -				      ptdev->iomem + MMU_INT_BASE);
> +	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
> +				  panthor_mmu_fault_mask(ptdev, ~0),
> +				  ptdev->iomem + MMU_INT_BASE, "mmu",
> +				  panthor_mmu_irq_threaded_handler);
>  	if (ret)
>  		return ret;
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> index 7c7f424a1436..80cf78007896 100644
> --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> @@ -56,8 +56,9 @@ struct panthor_pwr {
>  	wait_queue_head_t reqs_acked;
>  };
>  
> -static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_pwr_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +	struct panthor_device *ptdev = pirq->ptdev;
>  	struct panthor_pwr *pwr = ptdev->pwr;
>  
>  	spin_lock(&ptdev->pwr->reqs_lock);
> @@ -75,7 +76,11 @@ static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
>  	}
>  	spin_unlock(&ptdev->pwr->reqs_lock);
>  }
> -PANTHOR_IRQ_HANDLER(pwr, panthor_pwr_irq_handler);
> +
> +static irqreturn_t panthor_pwr_irq_threaded_handler(int irq, void *data)
> +{
> +	return panthor_irq_default_threaded_handler(data, panthor_pwr_irq_handler);
> +}
>  
>  static void panthor_pwr_write_command(struct panthor_device *ptdev, u32 command, u64 args)
>  {
> @@ -453,7 +458,7 @@ void panthor_pwr_unplug(struct panthor_device *ptdev)
>  		return;
>  
>  	/* Make sure the IRQ handler is not running after that point. */
> -	panthor_pwr_irq_suspend(&ptdev->pwr->irq);
> +	panthor_irq_suspend(&ptdev->pwr->irq);
>  
>  	/* Wake-up all waiters. */
>  	spin_lock_irqsave(&ptdev->pwr->reqs_lock, flags);
> @@ -483,9 +488,10 @@ int panthor_pwr_init(struct panthor_device *ptdev)
>  	if (irq < 0)
>  		return irq;
>  
> -	err = panthor_request_pwr_irq(
> +	err = panthor_irq_request(
>  		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
> -		pwr->iomem + PWR_INT_BASE);
> +		pwr->iomem + PWR_INT_BASE, "pwr",
> +		panthor_pwr_irq_threaded_handler);
>  	if (err)
>  		return err;
>  
> @@ -564,7 +570,7 @@ void panthor_pwr_suspend(struct panthor_device *ptdev)
>  	if (!ptdev->pwr)
>  		return;
>  
> -	panthor_pwr_irq_suspend(&ptdev->pwr->irq);
> +	panthor_irq_suspend(&ptdev->pwr->irq);
>  }
>  
>  void panthor_pwr_resume(struct panthor_device *ptdev)
> @@ -572,5 +578,5 @@ void panthor_pwr_resume(struct panthor_device *ptdev)
>  	if (!ptdev->pwr)
>  		return;
>  
> -	panthor_pwr_irq_resume(&ptdev->pwr->irq);
> +	panthor_irq_resume(&ptdev->pwr->irq);
>  }
> 



* Re: [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-04-30  9:40   ` Karunika Choo
@ 2026-04-30 10:38     ` Boris Brezillon
  0 siblings, 0 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-04-30 10:38 UTC (permalink / raw)
  To: Karunika Choo
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Thu, 30 Apr 2026 10:40:32 +0100
Karunika Choo <karunika.choo@arm.com> wrote:

> > +
> > +static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> > +{
> > +	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> > +}
> >    
> 
> Hello,
> 
> Maybe we can consider embedding the slow_handler into struct panthor_irq?
> You can then always use a default threaded IRQ handler here and call
> pirq->slow_handler when needed.

The idea here was that the compiler can basically inline
panthor_irq_default_threaded_handler() and turn
panthor_job_irq_handler() into a direct call. If we make it a true
function pointer stored in struct panthor_irq, this optimization
can't happen.
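
To make the trade-off concrete, here is a minimal user-space sketch of
the two options. Every demo_* name is a hypothetical stand-in, not the
real panthor code. In variant A the handler is a compile-time constant
at the call site, so the compiler is free to inline the helper and emit
a direct call. In variant B the pointer is loaded from the struct at
run time, so the call normally stays indirect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct panthor_irq. */
struct demo_irq {
	uint32_t pending;	/* pretend interrupt status */
	uint32_t handled;	/* bits seen by the slow handler */
	/* Only used by variant B. */
	void (*slow_handler)(struct demo_irq *irq, uint32_t status);
};

typedef void (*demo_handler_t)(struct demo_irq *irq, uint32_t status);

/* Variant A: the handler is an argument of a static inline helper. */
static inline int demo_default_threaded_handler(struct demo_irq *irq,
						demo_handler_t handler)
{
	uint32_t status = irq->pending;

	assert(handler);
	if (!status)
		return 0;

	irq->pending = 0;
	handler(irq, status);
	return 1;
}

static void demo_job_handler(struct demo_irq *irq, uint32_t status)
{
	irq->handled |= status;
}

/* The handler is a constant here, so a direct call can be generated. */
static int demo_job_threaded_handler(struct demo_irq *irq)
{
	return demo_default_threaded_handler(irq, demo_job_handler);
}

/* Variant B: one generic handler, pointer fetched from the struct. */
static int demo_generic_threaded_handler(struct demo_irq *irq)
{
	uint32_t status = irq->pending;

	if (!status)
		return 0;

	irq->pending = 0;
	irq->slow_handler(irq, status);
	return 1;
}
```

Both variants behave the same. The difference is only in what the
compiler can see at the call site.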


* Re: [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field
  2026-04-29  9:38 ` [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
  2026-04-29 12:29   ` Liviu Dudau
@ 2026-05-01 13:17   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:17 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> The only place where panthor_irq::state is accessed without
> panthor_irq::mask_lock held is in the prologue of _irq_suspend(),
> which is not really a fast-path. So let's simplify things by assuming
> panthor_irq::state must always be accessed with the mask_lock held,
> and add a scoped_guard() in _irq_suspend().
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 35 ++++++++++++++++----------------
>  1 file changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 4e4607bca7cc..3f91ba73829d 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -101,8 +101,12 @@ struct panthor_irq {
>  	 */
>  	spinlock_t mask_lock;
>  
> -	/** @state: one of &enum panthor_irq_state reflecting the current state. */
> -	atomic_t state;
> +	/**
> +	 * @state: one of &enum panthor_irq_state reflecting the current state.
> +	 *
> +	 * Must be accessed with mask_lock held.
> +	 */
> +	enum panthor_irq_state state;
>  };
>  
>  /**
> @@ -510,18 +514,15 @@ const char *panthor_exception_name(struct panthor_device *ptdev,
>  static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
>  {												\
>  	struct panthor_irq *pirq = data;							\
> -	enum panthor_irq_state old_state;							\
>  												\
>  	if (!gpu_read(pirq->iomem, INT_STAT))							\
>  		return IRQ_NONE;								\
>  												\
>  	guard(spinlock_irqsave)(&pirq->mask_lock);						\
> -	old_state = atomic_cmpxchg(&pirq->state,						\
> -				   PANTHOR_IRQ_STATE_ACTIVE,					\
> -				   PANTHOR_IRQ_STATE_PROCESSING);				\
> -	if (old_state != PANTHOR_IRQ_STATE_ACTIVE)						\
> +	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
>  		return IRQ_NONE;								\
>  												\
> +	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
>  	gpu_write(pirq->iomem, INT_MASK, 0);							\
>  	return IRQ_WAKE_THREAD;									\
>  }												\
> @@ -551,13 +552,10 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>  	}											\
>  												\
>  	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
> -		enum panthor_irq_state old_state;						\
> -												\
> -		old_state = atomic_cmpxchg(&pirq->state,					\
> -					   PANTHOR_IRQ_STATE_PROCESSING,			\
> -					   PANTHOR_IRQ_STATE_ACTIVE);				\
> -		if (old_state == PANTHOR_IRQ_STATE_PROCESSING)					\
> +		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
> +			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
>  			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
> +		}										\
>  	}											\
>  												\
>  	return ret;										\
> @@ -566,18 +564,19 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>  static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
>  {												\
>  	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
> -		atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDING);				\
> +		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
>  		gpu_write(pirq->iomem, INT_MASK, 0);						\
>  	}											\
>  	synchronize_irq(pirq->irq);								\
> -	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDED);					\
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
> +		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
>  }												\
>  												\
>  static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
>  {												\
>  	guard(spinlock_irqsave)(&pirq->mask_lock);						\
>  												\
> -	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_ACTIVE);					\
> +	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
>  	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
>  	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
>  }												\
> @@ -610,7 +609,7 @@ static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *
>  	 * on the PROCESSING -> ACTIVE transition.						\
>  	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
>  	 */											\
> -	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
> +	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
>  		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
>  }												\
>  												\
> @@ -624,7 +623,7 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
>  	 * on the PROCESSING -> ACTIVE transition.						\
>  	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
>  	 */											\
> -	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
> +	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
>  		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
>  }
>  
> 



* Re: [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers
  2026-04-29  9:38 ` [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
  2026-04-29 12:31   ` Liviu Dudau
@ 2026-05-01 13:17   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:17 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> We're about to add an IRQ inline helper using gpu_read(). Move things
> around to avoid forward declarations.
> 
> No functional changes.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 142 +++++++++++++++----------------
>  1 file changed, 71 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 3f91ba73829d..768fc1992368 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -495,6 +495,77 @@ panthor_exception_is_fault(u32 exception_code)
>  const char *panthor_exception_name(struct panthor_device *ptdev,
>  				   u32 exception_code);
>  
> +static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
> +{
> +	writel(data, iomem + reg);
> +}
> +
> +static inline u32 gpu_read(void __iomem *iomem, u32 reg)
> +{
> +	return readl(iomem + reg);
> +}
> +
> +static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
> +{
> +	return readl_relaxed(iomem + reg);
> +}
> +
> +static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
> +{
> +	gpu_write(iomem, reg, lower_32_bits(data));
> +	gpu_write(iomem, reg + 4, upper_32_bits(data));
> +}
> +
> +static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
> +{
> +	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
> +}
> +
> +static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
> +{
> +	return (gpu_read_relaxed(iomem, reg) |
> +		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
> +}
> +
> +static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
> +{
> +	u32 lo, hi1, hi2;
> +	do {
> +		hi1 = gpu_read(iomem, reg + 4);
> +		lo = gpu_read(iomem, reg);
> +		hi2 = gpu_read(iomem, reg + 4);
> +	} while (hi1 != hi2);
> +	return lo | ((u64)hi2 << 32);
> +}
> +
> +#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> +	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
> +			  iomem, reg)
> +
> +#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> +				     timeout_us)				\
> +	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
> +				 false, iomem, reg)
> +
> +#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> +	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
> +			  iomem, reg)
> +
> +#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> +				       timeout_us)				\
> +	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
> +				 false, iomem, reg)
> +
> +#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
> +					     timeout_us)			\
> +	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
> +				 timeout_us, false, iomem, reg)
> +
> +#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
> +					timeout_us)				\
> +	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
> +			  false, iomem, reg)
> +
>  #define INT_RAWSTAT 0x0
>  #define INT_CLEAR   0x4
>  #define INT_MASK    0x8
> @@ -629,75 +700,4 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
>  
>  extern struct workqueue_struct *panthor_cleanup_wq;
>  
> -static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
> -{
> -	writel(data, iomem + reg);
> -}
> -
> -static inline u32 gpu_read(void __iomem *iomem, u32 reg)
> -{
> -	return readl(iomem + reg);
> -}
> -
> -static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
> -{
> -	return readl_relaxed(iomem + reg);
> -}
> -
> -static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
> -{
> -	gpu_write(iomem, reg, lower_32_bits(data));
> -	gpu_write(iomem, reg + 4, upper_32_bits(data));
> -}
> -
> -static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
> -{
> -	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
> -}
> -
> -static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
> -{
> -	return (gpu_read_relaxed(iomem, reg) |
> -		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
> -}
> -
> -static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
> -{
> -	u32 lo, hi1, hi2;
> -	do {
> -		hi1 = gpu_read(iomem, reg + 4);
> -		lo = gpu_read(iomem, reg);
> -		hi2 = gpu_read(iomem, reg + 4);
> -	} while (hi1 != hi2);
> -	return lo | ((u64)hi2 << 32);
> -}
> -
> -#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> -	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
> -			  iomem, reg)
> -
> -#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> -				     timeout_us)				\
> -	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
> -				 false, iomem, reg)
> -
> -#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
> -	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
> -			  iomem, reg)
> -
> -#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
> -				       timeout_us)				\
> -	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
> -				 false, iomem, reg)
> -
> -#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
> -					     timeout_us)			\
> -	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
> -				 timeout_us, false, iomem, reg)
> -
> -#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
> -					timeout_us)				\
> -	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
> -			  false, iomem, reg)
> -
>  #endif
> 



* Re: [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-04-29  9:38 ` [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
  2026-04-30  9:40   ` Karunika Choo
@ 2026-05-01 13:22   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:22 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Now that panthor_irq contains the iomem region, there's no real need
> for the macro-based panthor_irq helper generation logic. We can just
> provide inline helpers that do the same and let the compiler optimize
> indirect function calls. The only extra annoyance is the fact we have
> to open-code the panthor_xxx_irq_threaded_handler() implementation, but
> those are single-line functions, so it's acceptable.
> 
> While at it, we changed the prototype of the IRQ handlers to take
> a panthor_irq instead of panthor_device, since that's the thing
> that's passed around when it comes to panthor_irq, and the
> panthor_device can be directly extracted from there.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Nice! I've never liked those macros. One minor issue below but with that
fixed:

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 245 +++++++++++++++----------------
>  drivers/gpu/drm/panthor/panthor_fw.c     |  22 ++-
>  drivers/gpu/drm/panthor/panthor_gpu.c    |  26 ++--
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  37 ++---
>  drivers/gpu/drm/panthor/panthor_pwr.c    |  20 ++-
>  5 files changed, 183 insertions(+), 167 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 768fc1992368..afa202546316 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
[...]
> +static inline int
> +panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> +		    int irq, u32 mask, void __iomem *iomem, const char *name,
> +		    irqreturn_t (*threaded_handler)(int, void *data))
> +{
> +	const char *full_name;
> +
> +	pirq->ptdev = ptdev;
> +	pirq->irq = irq;
> +	pirq->mask = mask;
> +	pirq->iomem = iomem;
> +	spin_lock_init(&pirq->mask_lock);
> +	panthor_irq_resume(pirq);
> +
> +	full_name = devm_kasprintf(ptdev->base.dev, GFP_KERNEL, KBUILD_MODNAME "-%s", name);
> +	if (!full_name)
> +		return -ENOMEM;

You should probably move this failure path up before the
panthor_irq_resume() call.
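
A rough user-space model of the ordering I have in mind, assuming the
name allocation is the only thing that can fail before the IRQ is
requested. The demo_* helpers and the fail_alloc knob are made up for
illustration. malloc() stands in for devm_kasprintf().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct panthor_irq. */
struct demo_irq {
	bool unmasked;
};

static void demo_irq_resume(struct demo_irq *irq)
{
	irq->unmasked = true;
}

/* Builds "demo-<name>", or returns NULL when fail_alloc simulates -ENOMEM. */
static char *demo_make_name(const char *name, bool fail_alloc)
{
	size_t len;
	char *full;

	assert(name);
	if (fail_alloc)
		return NULL;

	len = strlen("demo-") + strlen(name) + 1;
	full = malloc(len);
	if (full)
		snprintf(full, len, "demo-%s", name);
	return full;
}

static int demo_irq_request(struct demo_irq *irq, const char *name,
			    bool fail_alloc)
{
	char *full_name = demo_make_name(name, fail_alloc);

	/* Fail first: interrupts have not been unmasked at this point. */
	if (!full_name)
		return -ENOMEM;

	demo_irq_resume(irq);

	/* The real code would pass full_name to the IRQ request here. */
	free(full_name);
	return 0;
}
```

With this ordering, an allocation failure returns with the interrupt
mask untouched.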

Thanks,
Steve



* Re: [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers
  2026-04-29  9:38 ` [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers Boris Brezillon
  2026-04-29 13:32   ` Liviu Dudau
@ 2026-05-01 13:28   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:28 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> All drivers except panthor signal their fences from their interrupt
> handler to minimize latency. We could do the same from the interrupt
> handler, but the latency is still quite high in that case, so let's
> allow components to choose the context they want their IRQ handler
> to run in.
> 
> This takes the form of an extra fast_handler() returning an irqreturn_t
> reflecting the need to wake-up a thread or not.
> A new PANTHOR_IRQ_ADV_HANDLER() macro taking this extra fast_handler
> argument is added, PANTHOR_IRQ_HANDLER() is implemented as a wrapper
> around PANTHOR_IRQ_ADV_HANDLER() with a default fast_handler
> returning IRQ_WAKE_THREAD. The fast and slow handler are still assumed
> to be mutually exclusive. In case a fast handler is provided, the
> slow_handler is expected to be run when the event can't be processed
> directly in the fast handler, or when the driver thinks it would be
> beneficial to coalesce interrupts by polling in the thread rather than
> re-enabling interrupts immediately.

As Liviu pointed out, this commit message isn't right, but the code
change below looks fine. So with a fixed commit message:

Reviewed-by: Steven Price <steven.price@arm.com>

> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 5 ++---
>  drivers/gpu/drm/panthor/panthor_fw.c     | 1 +
>  drivers/gpu/drm/panthor/panthor_gpu.c    | 1 +
>  drivers/gpu/drm/panthor/panthor_mmu.c    | 1 +
>  drivers/gpu/drm/panthor/panthor_pwr.c    | 1 +
>  5 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index afa202546316..1c130b8394ab 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
>  static inline int
>  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
>  		    int irq, u32 mask, void __iomem *iomem, const char *name,
> +		    irqreturn_t (*raw_handler)(int, void *data),
>  		    irqreturn_t (*threaded_handler)(int, void *data))
>  {
>  	const char *full_name;
> @@ -687,9 +688,7 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
>  	if (!full_name)
>  		return -ENOMEM;
>  
> -	return devm_request_threaded_irq(ptdev->base.dev, irq,
> -					 panthor_irq_default_raw_handler,
> -					 threaded_handler,
> +	return devm_request_threaded_irq(ptdev->base.dev, irq, raw_handler, threaded_handler,
>  					 IRQF_SHARED, full_name, pirq);
>  }
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index eaf599b0a887..8239a6951569 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1483,6 +1483,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  
>  	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>  				  ptdev->iomem + JOB_INT_BASE, "job",
> +				  panthor_irq_default_raw_handler,
>  				  panthor_job_irq_threaded_handler);
>  	if (ret) {
>  		drm_err(&ptdev->base, "failed to request job irq");
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index ce208e384762..d0be758ea3e1 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -177,6 +177,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
>  	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
>  				  GPU_INTERRUPTS_MASK,
>  				  ptdev->iomem + GPU_INT_BASE, "gpu",
> +				  panthor_irq_default_raw_handler,
>  				  panthor_gpu_irq_threaded_handler);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index a0d0a9b2926f..2cb07933b629 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -3260,6 +3260,7 @@ int panthor_mmu_init(struct panthor_device *ptdev)
>  	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
>  				  panthor_mmu_fault_mask(ptdev, ~0),
>  				  ptdev->iomem + MMU_INT_BASE, "mmu",
> +				  panthor_irq_default_raw_handler,
>  				  panthor_mmu_irq_threaded_handler);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> index 80cf78007896..1efb7f3482ba 100644
> --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> @@ -491,6 +491,7 @@ int panthor_pwr_init(struct panthor_device *ptdev)
>  	err = panthor_irq_request(
>  		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
>  		pwr->iomem + PWR_INT_BASE, "pwr",
> +		panthor_irq_default_raw_handler,
>  		panthor_pwr_irq_threaded_handler);
>  	if (err)
>  		return err;
> 



* Re: [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
  2026-04-29  9:38 ` [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
  2026-04-29 13:33   ` Liviu Dudau
@ 2026-05-01 13:39   ` Steven Price
  1 sibling, 0 replies; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:39 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> If we want some FW events to be processed in the interrupt path, we need
> the helpers manipulating req regs to be IRQ-safe, which implies using
> spin_lock_irqsave instead of spinlock. While at it, use guards instead
> of plain spin_lock/unlock calls.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_fw.h | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index a99a9b6f4825..e56b7fe15bb3 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -432,12 +432,11 @@ struct panthor_fw_global_iface {
>  #define panthor_fw_toggle_reqs(__iface, __in_reg, __out_reg, __mask) \
>  	do { \
>  		u32 __cur_val, __new_val, __out_val; \
> -		spin_lock(&(__iface)->lock); \
> +		guard(spinlock_irqsave)(&(__iface)->lock); \
>  		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
>  		__out_val = READ_ONCE((__iface)->output->__out_reg); \
>  		__new_val = ((__out_val ^ (__mask)) & (__mask)) | (__cur_val & ~(__mask)); \
>  		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -		spin_unlock(&(__iface)->lock); \
>  	} while (0)
>  
>  /**
> @@ -458,21 +457,19 @@ struct panthor_fw_global_iface {
>  #define panthor_fw_update_reqs(__iface, __in_reg, __val, __mask) \
>  	do { \
>  		u32 __cur_val, __new_val; \
> -		spin_lock(&(__iface)->lock); \
> +		guard(spinlock_irqsave)(&(__iface)->lock); \
>  		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
>  		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
>  		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -		spin_unlock(&(__iface)->lock); \
>  	} while (0)
>  
>  #define panthor_fw_update_reqs64(__iface, __in_reg, __val, __mask) \
>  	do { \
>  		u64 __cur_val, __new_val; \
> -		spin_lock(&(__iface)->lock); \
> +		guard(spinlock_irqsave)(&(__iface)->lock); \
>  		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
>  		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
>  		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -		spin_unlock(&(__iface)->lock); \
>  	} while (0)
>  
>  struct panthor_fw_global_iface *
> 



* Re: [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-04-29  9:38 ` [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
@ 2026-05-01 13:47   ` Steven Price
  2026-05-04  9:34     ` Boris Brezillon
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:47 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Add a specific spinlock for events processing, and force processing
> of events in the panthor_sched_report_fw_events() path rather than
> deferring it to a work item. We also fast-track fence signalling by
> making the job completion logic IRQ-safe.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

I think there are some locking problems here. With some AI help I found
the following path:

 * panthor_job_irq_handler()
  * panthor_sched_report_fw_events()
   * [takes events_lock]
   * sched_process_csg_irq_locked()
    * csg_slot_process_progress_timer_event_locked()
     * lockdep_assert_held(&sched->lock);
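
The shape of the problem can be modelled in user space as below. This
is only a sketch: the demo_* names are hypothetical, the locks are
plain flags, and a counter replaces the lockdep splat. The caller holds
events_lock, while the leaf function still asserts on the other lock.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_lock {
	bool held;
};

/* Hypothetical stand-in for struct panthor_scheduler. */
struct demo_sched {
	struct demo_lock lock;
	struct demo_lock events_lock;
	unsigned int violations;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Counts a violation where lockdep_assert_held() would warn. */
static void demo_assert_held(struct demo_sched *s, const struct demo_lock *l)
{
	if (!l->held)
		s->violations++;
}

/* Leaf function still checking the old lock. */
static void demo_progress_timer_event(struct demo_sched *s)
{
	demo_assert_held(s, &s->lock);
}

static void demo_process_csg_irq(struct demo_sched *s)
{
	demo_progress_timer_event(s);
}

/* Entry point: only events_lock is taken on this path. */
static void demo_report_fw_events(struct demo_sched *s)
{
	demo_lock_acquire(&s->events_lock);
	demo_process_csg_irq(s);
	demo_lock_release(&s->events_lock);
}
```

One call to demo_report_fw_events() records one violation, which is
what I'd expect lockdep to flag on the path above.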

Thanks,

Steve

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 322 +++++++++++++++-----------------
>  1 file changed, 149 insertions(+), 173 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 5b34032deff8..c197bdc4b2c7 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -177,18 +177,6 @@ struct panthor_scheduler {
>  	 */
>  	struct work_struct sync_upd_work;
>  
> -	/**
> -	 * @fw_events_work: Work used to process FW events outside the interrupt path.
> -	 *
> -	 * Even if the interrupt is threaded, we need any event processing
> -	 * that require taking the panthor_scheduler::lock to be processed
> -	 * outside the interrupt path so we don't block the tick logic when
> -	 * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
> -	 * event processing requires taking this lock, we just delegate all
> -	 * FW event processing to the scheduler workqueue.
> -	 */
> -	struct work_struct fw_events_work;
> -
>  	/**
>  	 * @fw_events: Bitmask encoding pending FW events.
>  	 */
> @@ -254,6 +242,15 @@ struct panthor_scheduler {
>  		struct list_head waiting;
>  	} groups;
>  
> +	/**
> +	 * @events_lock: Lock taken when processing events.
> +	 *
> +	 * This also needs to be taken when csg_slots are updated, to make sure
> +	 * the event processing logic doesn't touch groups that have left the CSG
> +	 * slot.
> +	 */
> +	spinlock_t events_lock;
> +
>  	/**
>  	 * @csg_slots: FW command stream group slots.
>  	 */
> @@ -676,9 +673,6 @@ struct panthor_group {
>  	 */
>  	struct panthor_kernel_bo *protm_suspend_buf;
>  
> -	/** @sync_upd_work: Work used to check/signal job fences. */
> -	struct work_struct sync_upd_work;
> -
>  	/** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
>  	struct work_struct tiler_oom_work;
>  
> @@ -999,7 +993,6 @@ static int
>  group_bind_locked(struct panthor_group *group, u32 csg_id)
>  {
>  	struct panthor_device *ptdev = group->ptdev;
> -	struct panthor_csg_slot *csg_slot;
>  	int ret;
>  
>  	lockdep_assert_held(&ptdev->scheduler->lock);
> @@ -1012,9 +1005,7 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
>  	if (ret)
>  		return ret;
>  
> -	csg_slot = &ptdev->scheduler->csg_slots[csg_id];
>  	group_get(group);
> -	group->csg_id = csg_id;
>  
>  	/* Dummy doorbell allocation: doorbell is assigned to the group and
>  	 * all queues use the same doorbell.
> @@ -1026,7 +1017,10 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
>  	for (u32 i = 0; i < group->queue_count; i++)
>  		group->queues[i]->doorbell_id = csg_id + 1;
>  
> -	csg_slot->group = group;
> +	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +		ptdev->scheduler->csg_slots[csg_id].group = group;
> +		group->csg_id = csg_id;
> +	}
>  
>  	return 0;
>  }
> @@ -1041,7 +1035,6 @@ static int
>  group_unbind_locked(struct panthor_group *group)
>  {
>  	struct panthor_device *ptdev = group->ptdev;
> -	struct panthor_csg_slot *slot;
>  
>  	lockdep_assert_held(&ptdev->scheduler->lock);
>  
> @@ -1051,9 +1044,12 @@ group_unbind_locked(struct panthor_group *group)
>  	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
>  		return -EINVAL;
>  
> -	slot = &ptdev->scheduler->csg_slots[group->csg_id];
> +	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> +		group->csg_id = -1;
> +	}
> +
>  	panthor_vm_idle(group->vm);
> -	group->csg_id = -1;
>  
>  	/* Tiler OOM events will be re-issued next time the group is scheduled. */
>  	atomic_set(&group->tiler_oom, 0);
> @@ -1062,8 +1058,6 @@ group_unbind_locked(struct panthor_group *group)
>  	for (u32 i = 0; i < group->queue_count; i++)
>  		group->queues[i]->doorbell_id = -1;
>  
> -	slot->group = NULL;
> -
>  	group_put(group);
>  	return 0;
>  }
> @@ -1151,16 +1145,14 @@ queue_suspend_timeout_locked(struct panthor_queue *queue)
>  static void
>  queue_suspend_timeout(struct panthor_queue *queue)
>  {
> -	spin_lock(&queue->fence_ctx.lock);
> +	guard(spinlock_irqsave)(&queue->fence_ctx.lock);
>  	queue_suspend_timeout_locked(queue);
> -	spin_unlock(&queue->fence_ctx.lock);
>  }
>  
>  static void
>  queue_resume_timeout(struct panthor_queue *queue)
>  {
> -	spin_lock(&queue->fence_ctx.lock);
> -
> +	guard(spinlock_irqsave)(&queue->fence_ctx.lock);
>  	if (queue_timeout_is_suspended(queue)) {
>  		mod_delayed_work(queue->scheduler.timeout_wq,
>  				 &queue->timeout.work,
> @@ -1168,8 +1160,6 @@ queue_resume_timeout(struct panthor_queue *queue)
>  
>  		queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
>  	}
> -
> -	spin_unlock(&queue->fence_ctx.lock);
>  }
>  
>  /**
> @@ -1484,7 +1474,7 @@ cs_slot_process_fatal_event_locked(struct panthor_device *ptdev,
>  	u32 fatal;
>  	u64 info;
>  
> -	lockdep_assert_held(&sched->lock);
> +	lockdep_assert_held(&sched->events_lock);
>  
>  	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
>  	fatal = cs_iface->output->fatal;
> @@ -1532,7 +1522,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>  	u32 fault;
>  	u64 info;
>  
> -	lockdep_assert_held(&sched->lock);
> +	lockdep_assert_held(&sched->events_lock);
>  
>  	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
>  	fault = cs_iface->output->fault;
> @@ -1542,7 +1532,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>  		u64 cs_extract = queue->iface.output->extract;
>  		struct panthor_job *job;
>  
> -		spin_lock(&queue->fence_ctx.lock);
> +		guard(spinlock_irqsave)(&queue->fence_ctx.lock);
>  		list_for_each_entry(job, &queue->fence_ctx.in_flight_jobs, node) {
>  			if (cs_extract >= job->ringbuf.end)
>  				continue;
> @@ -1552,7 +1542,6 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>  
>  			dma_fence_set_error(job->done_fence, -EINVAL);
>  		}
> -		spin_unlock(&queue->fence_ctx.lock);
>  	}
>  
>  	if (group) {
> @@ -1682,7 +1671,7 @@ cs_slot_process_tiler_oom_event_locked(struct panthor_device *ptdev,
>  	struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
>  	struct panthor_group *group = csg_slot->group;
>  
> -	lockdep_assert_held(&sched->lock);
> +	lockdep_assert_held(&sched->events_lock);
>  
>  	if (drm_WARN_ON(&ptdev->base, !group))
>  		return;
> @@ -1703,7 +1692,7 @@ static bool cs_slot_process_irq_locked(struct panthor_device *ptdev,
>  	struct panthor_fw_cs_iface *cs_iface;
>  	u32 req, ack, events;
>  
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> +	lockdep_assert_held(&ptdev->scheduler->events_lock);
>  
>  	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
>  	req = cs_iface->input->req;
> @@ -1731,7 +1720,7 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
>  {
>  	struct panthor_scheduler *sched = ptdev->scheduler;
>  
> -	lockdep_assert_held(&sched->lock);
> +	lockdep_assert_held(&sched->events_lock);
>  
>  	sched->might_have_idle_groups = true;
>  
> @@ -1742,16 +1731,102 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
>  	sched_queue_delayed_work(sched, tick, 0);
>  }
>  
> +static void update_fdinfo_stats(struct panthor_job *job)
> +{
> +	struct panthor_group *group = job->group;
> +	struct panthor_queue *queue = group->queues[job->queue_idx];
> +	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
> +	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
> +	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
> +
> +	scoped_guard(spinlock, &group->fdinfo.lock) {
> +		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
> +			fdinfo->cycles += data->cycles.after - data->cycles.before;
> +		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
> +			fdinfo->time += data->time.after - data->time.before;
> +	}
> +}
> +
> +static bool queue_check_job_completion(struct panthor_queue *queue)
> +{
> +	struct panthor_syncobj_64b *syncobj = NULL;
> +	struct panthor_job *job, *job_tmp;
> +	bool cookie, progress = false;
> +	LIST_HEAD(done_jobs);
> +
> +	cookie = dma_fence_begin_signalling();
> +	scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
> +		list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
> +			if (!syncobj) {
> +				struct panthor_group *group = job->group;
> +
> +				syncobj = group->syncobjs->kmap +
> +					  (job->queue_idx * sizeof(*syncobj));
> +			}
> +
> +			if (syncobj->seqno < job->done_fence->seqno)
> +				break;
> +
> +			list_move_tail(&job->node, &done_jobs);
> +			dma_fence_signal_locked(job->done_fence);
> +		}
> +
> +		if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
> +			/* If we have no job left, we cancel the timer, and reset remaining
> +			 * time to its default so it can be restarted next time
> +			 * queue_resume_timeout() is called.
> +			 */
> +			queue_suspend_timeout_locked(queue);
> +
> +			/* If there's no job pending, we consider it progress to avoid a
> +			 * spurious timeout if the timeout handler and the sync update
> +			 * handler raced.
> +			 */
> +			progress = true;
> +		} else if (!list_empty(&done_jobs)) {
> +			queue_reset_timeout_locked(queue);
> +			progress = true;
> +		}
> +	}
> +	dma_fence_end_signalling(cookie);
> +
> +	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
> +		if (job->profiling.mask)
> +			update_fdinfo_stats(job);
> +		list_del_init(&job->node);
> +		panthor_job_put(&job->base);
> +	}
> +
> +	return progress;
> +}
> +
> +static void group_check_job_completion(struct panthor_group *group)
> +{
> +	bool cookie;
> +	u32 queue_idx;
> +
> +	cookie = dma_fence_begin_signalling();
> +	for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
> +		struct panthor_queue *queue = group->queues[queue_idx];
> +
> +		if (!queue)
> +			continue;
> +
> +		queue_check_job_completion(queue);
> +	}
> +	dma_fence_end_signalling(cookie);
> +}
> +
>  static void csg_slot_sync_update_locked(struct panthor_device *ptdev,
>  					u32 csg_id)
>  {
>  	struct panthor_csg_slot *csg_slot = &ptdev->scheduler->csg_slots[csg_id];
>  	struct panthor_group *group = csg_slot->group;
>  
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> +	lockdep_assert_held(&ptdev->scheduler->events_lock);
>  
>  	if (group)
> -		group_queue_work(group, sync_upd);
> +		group_check_job_completion(group);
>  
>  	sched_queue_work(ptdev->scheduler, sync_upd);
>  }
> @@ -1784,7 +1859,7 @@ static void sched_process_csg_irq_locked(struct panthor_device *ptdev, u32 csg_i
>  	struct panthor_fw_csg_iface *csg_iface;
>  	u32 ring_cs_db_mask = 0;
>  
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> +	lockdep_assert_held(&ptdev->scheduler->events_lock);
>  
>  	if (drm_WARN_ON(&ptdev->base, csg_id >= ptdev->scheduler->csg_slot_count))
>  		return;
> @@ -1842,7 +1917,7 @@ static void sched_process_idle_event_locked(struct panthor_device *ptdev)
>  {
>  	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>  
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> +	lockdep_assert_held(&ptdev->scheduler->events_lock);
>  
>  	/* Acknowledge the idle event and schedule a tick. */
>  	panthor_fw_update_reqs(glb_iface, req, glb_iface->output->ack, GLB_IDLE);
> @@ -1858,7 +1933,7 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
>  	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>  	u32 req, ack, evts;
>  
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> +	lockdep_assert_held(&ptdev->scheduler->events_lock);
>  
>  	req = READ_ONCE(glb_iface->input->req);
>  	ack = READ_ONCE(glb_iface->output->ack);
> @@ -1868,30 +1943,6 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
>  		sched_process_idle_event_locked(ptdev);
>  }
>  
> -static void process_fw_events_work(struct work_struct *work)
> -{
> -	struct panthor_scheduler *sched = container_of(work, struct panthor_scheduler,
> -						      fw_events_work);
> -	u32 events = atomic_xchg(&sched->fw_events, 0);
> -	struct panthor_device *ptdev = sched->ptdev;
> -
> -	mutex_lock(&sched->lock);
> -
> -	if (events & JOB_INT_GLOBAL_IF) {
> -		sched_process_global_irq_locked(ptdev);
> -		events &= ~JOB_INT_GLOBAL_IF;
> -	}
> -
> -	while (events) {
> -		u32 csg_id = ffs(events) - 1;
> -
> -		sched_process_csg_irq_locked(ptdev, csg_id);
> -		events &= ~BIT(csg_id);
> -	}
> -
> -	mutex_unlock(&sched->lock);
> -}
> -
>  /**
>   * panthor_sched_report_fw_events() - Report FW events to the scheduler.
>   * @ptdev: Device.
> @@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
>  	if (!ptdev->scheduler)
>  		return;
>  
> -	atomic_or(events, &ptdev->scheduler->fw_events);
> -	sched_queue_work(ptdev->scheduler, fw_events);
> +	guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> +
> +	if (events & JOB_INT_GLOBAL_IF) {
> +		sched_process_global_irq_locked(ptdev);
> +		events &= ~JOB_INT_GLOBAL_IF;
> +	}
> +
> +	while (events) {
> +		u32 csg_id = ffs(events) - 1;
> +
> +		sched_process_csg_irq_locked(ptdev, csg_id);
> +		events &= ~BIT(csg_id);
> +	}
>  }
>  
>  static const char *fence_get_driver_name(struct dma_fence *fence)
> @@ -2136,7 +2198,9 @@ tick_ctx_init(struct panthor_scheduler *sched,
>  		 * CSG IRQs, so we can flag the faulty queue.
>  		 */
>  		if (panthor_vm_has_unhandled_faults(group->vm)) {
> -			sched_process_csg_irq_locked(ptdev, i);
> +			scoped_guard(spinlock_irqsave, &sched->events_lock) {
> +				sched_process_csg_irq_locked(ptdev, i);
> +			}
>  
>  			/* No fatal fault reported, flag all queues as faulty. */
>  			if (!group->fatal_queues)
> @@ -2183,13 +2247,13 @@ group_term_post_processing(struct panthor_group *group)
>  		if (!queue)
>  			continue;
>  
> -		spin_lock(&queue->fence_ctx.lock);
> -		list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
> -			list_move_tail(&job->node, &faulty_jobs);
> -			dma_fence_set_error(job->done_fence, err);
> -			dma_fence_signal_locked(job->done_fence);
> +		scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
> +			list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
> +				list_move_tail(&job->node, &faulty_jobs);
> +				dma_fence_set_error(job->done_fence, err);
> +				dma_fence_signal_locked(job->done_fence);
> +			}
>  		}
> -		spin_unlock(&queue->fence_ctx.lock);
>  
>  		/* Manually update the syncobj seqno to unblock waiters. */
>  		syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj));
> @@ -2336,8 +2400,10 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
>  			 * any pending interrupts before we start the new
>  			 * group.
>  			 */
> -			if (group->csg_id >= 0)
> +			if (group->csg_id >= 0) {
> +				guard(spinlock_irqsave)(&sched->events_lock);
>  				sched_process_csg_irq_locked(ptdev, group->csg_id);
> +			}
>  
>  			group_unbind_locked(group);
>  		}
> @@ -2920,8 +2986,10 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  
>  		group_get(group);
>  
> -		if (group->csg_id >= 0)
> +		if (group->csg_id >= 0) {
> +			guard(spinlock_irqsave)(&sched->events_lock);
>  			sched_process_csg_irq_locked(ptdev, group->csg_id);
> +		}
>  
>  		group_unbind_locked(group);
>  
> @@ -3005,22 +3073,6 @@ void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed)
>  	}
>  }
>  
> -static void update_fdinfo_stats(struct panthor_job *job)
> -{
> -	struct panthor_group *group = job->group;
> -	struct panthor_queue *queue = group->queues[job->queue_idx];
> -	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
> -	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
> -	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
> -
> -	scoped_guard(spinlock, &group->fdinfo.lock) {
> -		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
> -			fdinfo->cycles += data->cycles.after - data->cycles.before;
> -		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
> -			fdinfo->time += data->time.after - data->time.before;
> -	}
> -}
> -
>  void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
>  {
>  	struct panthor_group_pool *gpool = pfile->groups;
> @@ -3041,80 +3093,6 @@ void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
>  	xa_unlock(&gpool->xa);
>  }
>  
> -static bool queue_check_job_completion(struct panthor_queue *queue)
> -{
> -	struct panthor_syncobj_64b *syncobj = NULL;
> -	struct panthor_job *job, *job_tmp;
> -	bool cookie, progress = false;
> -	LIST_HEAD(done_jobs);
> -
> -	cookie = dma_fence_begin_signalling();
> -	spin_lock(&queue->fence_ctx.lock);
> -	list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
> -		if (!syncobj) {
> -			struct panthor_group *group = job->group;
> -
> -			syncobj = group->syncobjs->kmap +
> -				  (job->queue_idx * sizeof(*syncobj));
> -		}
> -
> -		if (syncobj->seqno < job->done_fence->seqno)
> -			break;
> -
> -		list_move_tail(&job->node, &done_jobs);
> -		dma_fence_signal_locked(job->done_fence);
> -	}
> -
> -	if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
> -		/* If we have no job left, we cancel the timer, and reset remaining
> -		 * time to its default so it can be restarted next time
> -		 * queue_resume_timeout() is called.
> -		 */
> -		queue_suspend_timeout_locked(queue);
> -
> -		/* If there's no job pending, we consider it progress to avoid a
> -		 * spurious timeout if the timeout handler and the sync update
> -		 * handler raced.
> -		 */
> -		progress = true;
> -	} else if (!list_empty(&done_jobs)) {
> -		queue_reset_timeout_locked(queue);
> -		progress = true;
> -	}
> -	spin_unlock(&queue->fence_ctx.lock);
> -	dma_fence_end_signalling(cookie);
> -
> -	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
> -		if (job->profiling.mask)
> -			update_fdinfo_stats(job);
> -		list_del_init(&job->node);
> -		panthor_job_put(&job->base);
> -	}
> -
> -	return progress;
> -}
> -
> -static void group_sync_upd_work(struct work_struct *work)
> -{
> -	struct panthor_group *group =
> -		container_of(work, struct panthor_group, sync_upd_work);
> -	u32 queue_idx;
> -	bool cookie;
> -
> -	cookie = dma_fence_begin_signalling();
> -	for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
> -		struct panthor_queue *queue = group->queues[queue_idx];
> -
> -		if (!queue)
> -			continue;
> -
> -		queue_check_job_completion(queue);
> -	}
> -	dma_fence_end_signalling(cookie);
> -
> -	group_put(group);
> -}
> -
>  struct panthor_job_ringbuf_instrs {
>  	u64 buffer[MAX_INSTRS_PER_JOB];
>  	u32 count;
> @@ -3346,9 +3324,8 @@ queue_run_job(struct drm_sched_job *sched_job)
>  	job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64));
>  
>  	panthor_job_get(&job->base);
> -	spin_lock(&queue->fence_ctx.lock);
> -	list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
> -	spin_unlock(&queue->fence_ctx.lock);
> +	scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock)
> +		list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
>  
>  	/* Make sure the ring buffer is updated before the INSERT
>  	 * register.
> @@ -3683,7 +3660,6 @@ int panthor_group_create(struct panthor_file *pfile,
>  	INIT_LIST_HEAD(&group->wait_node);
>  	INIT_LIST_HEAD(&group->run_node);
>  	INIT_WORK(&group->term_work, group_term_work);
> -	INIT_WORK(&group->sync_upd_work, group_sync_upd_work);
>  	INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
>  	INIT_WORK(&group->release_work, group_release_work);
>  
> @@ -4054,7 +4030,6 @@ void panthor_sched_unplug(struct panthor_device *ptdev)
>  	struct panthor_scheduler *sched = ptdev->scheduler;
>  
>  	disable_delayed_work_sync(&sched->tick_work);
> -	disable_work_sync(&sched->fw_events_work);
>  	disable_work_sync(&sched->sync_upd_work);
>  
>  	mutex_lock(&sched->lock);
> @@ -4139,7 +4114,8 @@ int panthor_sched_init(struct panthor_device *ptdev)
>  	sched->tick_period = msecs_to_jiffies(10);
>  	INIT_DELAYED_WORK(&sched->tick_work, tick_work);
>  	INIT_WORK(&sched->sync_upd_work, sync_upd_work);
> -	INIT_WORK(&sched->fw_events_work, process_fw_events_work);
> +
> +	spin_lock_init(&sched->events_lock);
>  
>  	ret = drmm_mutex_init(&ptdev->base, &sched->lock);
>  	if (ret)
> 



* Re: [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time
  2026-04-29  9:38 ` [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
@ 2026-05-01 13:53   ` Steven Price
  2026-05-04 15:00     ` Boris Brezillon
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Price @ 2026-05-01 13:53 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Make the sched_process_csg_irq_locked() call part of
> group_unbind_locked() so we don't have to manually call it in
> tick_ctx_apply()/panthor_sched_suspend().
> 
> This implies moving group_[un]bind_locked() around to avoid a
> forward declaration.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 178 +++++++++++++++-----------------
>  1 file changed, 82 insertions(+), 96 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index c197bdc4b2c7..601a9bff1485 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -982,86 +982,6 @@ group_get(struct panthor_group *group)
>  	return group;
>  }
>  
> -/**
> - * group_bind_locked() - Bind a group to a group slot
> - * @group: Group.
> - * @csg_id: Slot.
> - *
> - * Return: 0 on success, a negative error code otherwise.
> - */
> -static int
> -group_bind_locked(struct panthor_group *group, u32 csg_id)
> -{
> -	struct panthor_device *ptdev = group->ptdev;
> -	int ret;
> -
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> -
> -	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
> -			ptdev->scheduler->csg_slots[csg_id].group))
> -		return -EINVAL;
> -
> -	ret = panthor_vm_active(group->vm);
> -	if (ret)
> -		return ret;
> -
> -	group_get(group);
> -
> -	/* Dummy doorbell allocation: doorbell is assigned to the group and
> -	 * all queues use the same doorbell.
> -	 *
> -	 * TODO: Implement LRU-based doorbell assignment, so the most often
> -	 * updated queues get their own doorbell, thus avoiding useless checks
> -	 * on queues belonging to the same group that are rarely updated.
> -	 */
> -	for (u32 i = 0; i < group->queue_count; i++)
> -		group->queues[i]->doorbell_id = csg_id + 1;
> -
> -	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> -		ptdev->scheduler->csg_slots[csg_id].group = group;
> -		group->csg_id = csg_id;
> -	}
> -
> -	return 0;
> -}
> -
> -/**
> - * group_unbind_locked() - Unbind a group from a slot.
> - * @group: Group to unbind.
> - *
> - * Return: 0 on success, a negative error code otherwise.
> - */
> -static int
> -group_unbind_locked(struct panthor_group *group)
> -{
> -	struct panthor_device *ptdev = group->ptdev;
> -
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> -
> -	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
> -		return -EINVAL;
> -
> -	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
> -		return -EINVAL;
> -
> -	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> -		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> -		group->csg_id = -1;
> -	}
> -
> -	panthor_vm_idle(group->vm);
> -
> -	/* Tiler OOM events will be re-issued next time the group is scheduled. */
> -	atomic_set(&group->tiler_oom, 0);
> -	cancel_work(&group->tiler_oom_work);
> -
> -	for (u32 i = 0; i < group->queue_count; i++)
> -		group->queues[i]->doorbell_id = -1;
> -
> -	group_put(group);
> -	return 0;
> -}
> -
>  static bool
>  group_is_idle(struct panthor_group *group)
>  {
> @@ -1968,6 +1888,88 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
>  	}
>  }
>  
> +/**
> + * group_bind_locked() - Bind a group to a group slot
> + * @group: Group.
> + * @csg_id: Slot.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_bind_locked(struct panthor_group *group, u32 csg_id)
> +{
> +	struct panthor_device *ptdev = group->ptdev;
> +	int ret;
> +
> +	lockdep_assert_held(&ptdev->scheduler->lock);
> +
> +	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
> +			ptdev->scheduler->csg_slots[csg_id].group))
> +		return -EINVAL;
> +
> +	ret = panthor_vm_active(group->vm);
> +	if (ret)
> +		return ret;
> +
> +	group_get(group);
> +
> +	/* Dummy doorbell allocation: doorbell is assigned to the group and
> +	 * all queues use the same doorbell.
> +	 *
> +	 * TODO: Implement LRU-based doorbell assignment, so the most often
> +	 * updated queues get their own doorbell, thus avoiding useless checks
> +	 * on queues belonging to the same group that are rarely updated.
> +	 */
> +	for (u32 i = 0; i < group->queue_count; i++)
> +		group->queues[i]->doorbell_id = csg_id + 1;
> +
> +	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +		ptdev->scheduler->csg_slots[csg_id].group = group;
> +		group->csg_id = csg_id;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * group_unbind_locked() - Unbind a group from a slot.
> + * @group: Group to unbind.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_unbind_locked(struct panthor_group *group)
> +{
> +	struct panthor_device *ptdev = group->ptdev;
> +
> +	lockdep_assert_held(&ptdev->scheduler->lock);
> +
> +	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
> +		return -EINVAL;
> +
> +	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
> +		return -EINVAL;
> +
> +	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +		/* Process all pending IRQs before returning the slot. */
> +		sched_process_csg_irq_locked(ptdev, group->csg_id);
> +		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> +		group->csg_id = -1;
> +	}
> +
> +	panthor_vm_idle(group->vm);
> +
> +	/* Tiler OOM events will be re-issued next time the group is scheduled. */
> +	atomic_set(&group->tiler_oom, 0);
> +	cancel_work(&group->tiler_oom_work);
> +
> +	for (u32 i = 0; i < group->queue_count; i++)
> +		group->queues[i]->doorbell_id = -1;
> +
> +	group_put(group);
> +	return 0;
> +}
> +
>  static const char *fence_get_driver_name(struct dma_fence *fence)
>  {
>  	return "panthor";
> @@ -2396,15 +2398,6 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
>  	/* Unbind evicted groups. */
>  	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
>  		list_for_each_entry(group, &ctx->old_groups[prio], run_node) {
> -			/* This group is gone. Process interrupts to clear
> -			 * any pending interrupts before we start the new
> -			 * group.
> -			 */
> -			if (group->csg_id >= 0) {
> -				guard(spinlock_irqsave)(&sched->events_lock);
> -				sched_process_csg_irq_locked(ptdev, group->csg_id);
> -			}
> -
>  			group_unbind_locked(group);
>  		}
>  	}
> @@ -2970,8 +2963,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  
>  			if (flush_caches_failed)
>  				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
> -			else
> -				csg_slot_sync_update_locked(ptdev, csg_id);

The justification for this change doesn't seem to be included in the
commit message, and the change looks suspicious. That said, AFAICT the
events_lock wouldn't be held here, so the removed call could already
trigger a lockdep assert before this change...

Thanks,
Steve

>  
>  			slot_mask &= ~BIT(csg_id);
>  		}
> @@ -2986,11 +2977,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  
>  		group_get(group);
>  
> -		if (group->csg_id >= 0) {
> -			guard(spinlock_irqsave)(&sched->events_lock);
> -			sched_process_csg_irq_locked(ptdev, group->csg_id);
> -		}
> -
>  		group_unbind_locked(group);
>  
>  		drm_WARN_ON(&group->ptdev->base, !list_empty(&group->run_node));
> 



* Re: [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-04-29  9:38 ` [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
@ 2026-05-01 14:20   ` Steven Price
  2026-05-04 11:02     ` Boris Brezillon
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Price @ 2026-05-01 14:20 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Rather than assuming an interrupt is always expected for request
> acks, temporarily enable the relevant interrupts when the polling
> wait fails. This should hopefully reduce the number of interrupts the
> CPU has to process.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

It seems to work, although I'm slightly uneasy about this because I'm
not entirely sure whether the FW will immediately see the updates to
ack_irq_mask, and therefore whether it's possible to miss an event and
be stuck waiting for the timeout.

Memory models are not my strong point; OpenAI tells me the sequence
should be something like:

  scoped_guard(spinlock_irqsave, lock) {
  	u32 ack_irq_mask = READ_ONCE(*ack_irq_mask_ptr);

  	WRITE_ONCE(*ack_irq_mask_ptr, ack_irq_mask | req_mask);
  }

  /*
   * The FW interface can be mapped write-combine/Normal-NC. Make sure the
   * IRQ mask update is visible to the FW before sleeping waiting for
   * the IRQ.
   */
  wmb();

Which seems plausible. But I learnt long ago that plausible doesn't
mean much when dealing with memory models!

Thanks,
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
>  drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
>  2 files changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 8239a6951569..f5e0ceca4130 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>  	glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
>  	glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
>  
> -	/* Enable interrupts we care about. */
> -	glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> -					 GLB_PING |
> -					 GLB_CFG_PROGRESS_TIMER |
> -					 GLB_CFG_POWEROFF_TIMER |
> -					 GLB_IDLE_EN |
> -					 GLB_IDLE;
> -
> -	if (panthor_fw_has_glb_state(ptdev))
> -		glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
> +	/* Enable interrupts for asynchronous events that are not
> +	 * triggered by request acks.
> +	 */
> +	glb_iface->input->ack_irq_mask = GLB_IDLE;
>  
>  	panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
>  			       GLB_IDLE_EN | GLB_COUNTER_EN);
> @@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>   * Return: 0 on success, -ETIMEDOUT otherwise.
>   */
>  static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> -				wait_queue_head_t *wq,
> -				u32 req_mask, u32 *acked,
> +				u32 *ack_irq_mask_ptr, spinlock_t *lock,
> +				wait_queue_head_t *wq, u32 req_mask, u32 *acked,
>  				u32 timeout_ms)
>  {
>  	u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
> @@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
>  	if (!ret)
>  		return 0;
>  
> -	if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> -			       msecs_to_jiffies(timeout_ms)))
> +	scoped_guard(spinlock_irqsave, lock)
> +		*ack_irq_mask_ptr |= req_mask;
> +
> +	ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> +				 msecs_to_jiffies(timeout_ms));
> +
> +	scoped_guard(spinlock_irqsave, lock)
> +		*ack_irq_mask_ptr &= ~req_mask;
> +
> +	if (ret)
>  		return 0;
>  
>  	/* Check one last time, in case we were not woken up for some reason. */
> @@ -1369,6 +1371,8 @@ int panthor_fw_glb_wait_acks(struct panthor_device *ptdev,
>  
>  	return panthor_fw_wait_acks(&glb_iface->input->req,
>  				    &glb_iface->output->ack,
> +				    &glb_iface->input->ack_irq_mask,
> +				    &glb_iface->lock,
>  				    &ptdev->fw->req_waitqueue,
>  				    req_mask, acked, timeout_ms);
>  }
> @@ -1395,6 +1399,8 @@ int panthor_fw_csg_wait_acks(struct panthor_device *ptdev, u32 csg_slot,
>  
>  	ret = panthor_fw_wait_acks(&csg_iface->input->req,
>  				   &csg_iface->output->ack,
> +				   &csg_iface->input->ack_irq_mask,
> +				   &csg_iface->lock,
>  				   &ptdev->fw->req_waitqueue,
>  				   req_mask, acked, timeout_ms);
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 601a9bff1485..2edba335f22d 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1110,7 +1110,7 @@ cs_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
>  	cs_iface->input->ringbuf_output = queue->iface.output_fw_va;
>  	cs_iface->input->config = CS_CONFIG_PRIORITY(queue->priority) |
>  				  CS_CONFIG_DOORBELL(queue->doorbell_id);
> -	cs_iface->input->ack_irq_mask = ~0;
> +	cs_iface->input->ack_irq_mask = CS_FATAL | CS_FAULT | CS_TILER_OOM;
>  	panthor_fw_update_reqs(cs_iface, req,
>  			       CS_IDLE_SYNC_WAIT |
>  			       CS_IDLE_EMPTY |
> @@ -1378,7 +1378,8 @@ csg_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 priority)
>  		csg_iface->input->protm_suspend_buf = 0;
>  	}
>  
> -	csg_iface->input->ack_irq_mask = ~0;
> +	csg_iface->input->ack_irq_mask = CSG_SYNC_UPDATE | CSG_IDLE |
> +					 CSG_PROGRESS_TIMER_EVENT;
>  	panthor_fw_toggle_reqs(csg_iface, doorbell_req, doorbell_ack, queue_mask);
>  	return 0;
>  }
> 



* Re: [PATCH 09/10] drm/panthor: Process FW events in IRQ context
  2026-04-29  9:38 ` [PATCH 09/10] drm/panthor: Process FW events in IRQ context Boris Brezillon
@ 2026-05-01 14:38   ` Steven Price
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Price @ 2026-05-01 14:38 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Now that everything is set to allow processing FW events in IRQ context,
> go for it. This should reduce the dma_fence signaling latency.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Another AI-found locking bug...

With this change there's a callpath:

  panthor_job_irq_raw_handler()
    panthor_job_irq_handler()
      panthor_sched_report_fw_events()
        sched_process_csg_irq_locked()
          if (csg_events & CSG_SYNC_UPDATE)
            csg_slot_sync_update_locked()
              group_check_job_completion()
                queue_check_job_completion()
                  if (job->profiling.mask)
                    update_fdinfo_stats()
                      spin_lock(&group->fdinfo.lock)

However, group->fdinfo.lock is also taken in process context via:

  panthor_gpu_show_fdinfo()
    panthor_fdinfo_gather_group_samples()
      guard(spinlock)(&group->fdinfo.lock);

So panthor_fdinfo_gather_group_samples() will need to use
spinlock_irqsave to be safe: otherwise the job IRQ can fire on the CPU
that already holds the lock and spin on it forever.
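
To make the failure mode concrete, here is a toy single-CPU model of
the two paths above. It is plain userspace C, not kernel code: struct
toy_cpu and the toy_*() helpers are made up for illustration, and a
real spinlock would simply spin forever where the model sets a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. None of these names exist in the kernel. */
struct toy_cpu {
	bool irqs_enabled;	/* can an interrupt be delivered right now? */
	bool lock_held;		/* stands in for group->fdinfo.lock */
	bool irq_pending;	/* a job IRQ is waiting to be delivered */
	bool deadlocked;	/* set where a real spinlock would spin forever */
	int handled;		/* number of IRQs that ran to completion */
};

/* Stands in for the raw IRQ path that ends in update_fdinfo_stats(). */
static void toy_irq_handler(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		/* The interrupted context owns the lock and cannot run
		 * again until the handler returns: self-deadlock.
		 */
		cpu->deadlocked = true;
		return;
	}

	cpu->lock_held = true;
	cpu->handled++;
	cpu->lock_held = false;
}

/* Deliver the pending IRQ, but only if interrupts are enabled. */
static void toy_deliver_pending(struct toy_cpu *cpu)
{
	if (cpu->irq_pending && cpu->irqs_enabled) {
		cpu->irq_pending = false;
		toy_irq_handler(cpu);
	}
}

/* Stands in for panthor_fdinfo_gather_group_samples(), with a job IRQ
 * raised while the lock is held. Returns false on deadlock.
 */
static bool toy_gather_samples(struct toy_cpu *cpu, bool use_irqsave)
{
	bool saved = cpu->irqs_enabled;

	if (use_irqsave)
		cpu->irqs_enabled = false;	/* the "irqsave" part */
	cpu->lock_held = true;

	/* The GPU raises a job IRQ in the middle of the critical section. */
	cpu->irq_pending = true;
	toy_deliver_pending(cpu);
	if (cpu->deadlocked)
		return false;

	cpu->lock_held = false;
	if (use_irqsave)
		cpu->irqs_enabled = saved;	/* the "irqrestore" part */

	/* With irqsave, the IRQ is only delivered now, lock released. */
	toy_deliver_pending(cpu);
	return true;
}
```

Calling toy_gather_samples(&cpu, false) on a CPU with interrupts
enabled models the current code and reports the deadlock; passing true
models the irqsave variant, where the IRQ is only serviced once the
lock has been dropped.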

Thanks,
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_fw.c | 33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index f5e0ceca4130..05c632913359 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1087,9 +1087,38 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
>  	}
>  }
>  
> +static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
> +{
> +	struct panthor_irq *pirq = data;
> +
> +	if (!gpu_read(pirq->iomem, INT_STAT))
> +		return IRQ_NONE;
> +
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> +			return IRQ_NONE;
> +
> +		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> +	}
> +
> +	panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_RAWSTAT));
> +
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> +			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
>  static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
>  {
> -	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> +	struct panthor_irq *pirq = data;
> +
> +	/* We never return IRQ_WAKE_THREAD, so we're not supposed to be called. */
> +	drm_WARN_ON_ONCE(&pirq->ptdev->base,
> +			 "threaded IRQ handler should never be called.");
> +	return IRQ_NONE;
>  }
>  
>  static int panthor_fw_start(struct panthor_device *ptdev)
> @@ -1489,7 +1518,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  
>  	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>  				  ptdev->iomem + JOB_INT_BASE, "job",
> -				  panthor_irq_default_raw_handler,
> +				  panthor_job_irq_raw_handler,
>  				  panthor_job_irq_threaded_handler);
>  	if (ret) {
>  		drm_err(&ptdev->base, "failed to request job irq");
> 



* Re: [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs
  2026-04-29  9:38 ` [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs Boris Brezillon
@ 2026-05-01 14:57   ` Steven Price
  2026-05-04 11:15     ` Boris Brezillon
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Price @ 2026-05-01 14:57 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 29/04/2026 10:38, Boris Brezillon wrote:
> Dealing with interrupts from the raw IRQ handler is good for latency,
> but might be detrimental for the overall throughput, because the system
> keeps being interrupted to process job interrupts.
> 
> Try to mitigate that with some interrupt coalescing infrastructure,
> where we wake up the IRQ thread if close enough interrupts gets
> detected.
> 
> It's still experimental, which explains why the feature is off by
> default, and can be enabled through a debugfs knob.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

I think we need some more serious benchmarking to decide whether this is
a good idea. We've experimented with coalescing interrupts in the past
and it generally regressed some important benchmark of the day. But I'm
not in the loop of "benchmark of the day" any more (although I do know
that glmark hasn't been for years...) so it might have changed. From
what I hear AI workloads "benefit"[1] from spinning a CPU waiting for
jobs to finish.

[1] AI workloads don't tend to care so much about power... at least from
the CPU.

One typo I spotted below. And I'm not awfully keen on the debugfs
interface (but for testing it's obviously fine).

> ---
>  drivers/gpu/drm/panthor/panthor_device.h |  83 +++++++++++++++++
>  drivers/gpu/drm/panthor/panthor_drv.c    |   1 +
>  drivers/gpu/drm/panthor/panthor_fw.c     | 150 +++++++++++++++++++++++++++++--
>  drivers/gpu/drm/panthor/panthor_fw.h     |   2 +
>  4 files changed, 231 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 1c130b8394ab..e90f251f75e2 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -109,6 +109,48 @@ struct panthor_irq {
>  	enum panthor_irq_state state;
>  };
>  
> +/**
> + * struct panthor_irq_coalescing - IRQ coalescing info
> + */
> +struct panthor_irq_coalescing {
> +	/**
> +	 * @max_us: Maximum time in microseconds between two consecutive
> +	 * interrupts to consider coalescing.
> +	 *
> +	 * It being a u16 means we can't encode more than 65-ish msecs, but
> +	 * if we have to poll status for more than a few hundreds usecs it's
> +	 * going to make the IRQ thread consume more CPU than we want.
> +	 */
> +	u16 max_us;
> +
> +	/**
> +	 * @poll_perios_us: Rate at which status polling happens.

NIT: Typo: s/perios/period/

Thanks,
Steve

> +	 *
> +	 * It being a u16 means we can't encode more than 65-ish msecs, but
> +	 * if we have to delay each status check by more than a few usecs
> +	 * it's going to add latency we don't want.
> +	 */
> +	u16 poll_period_us;
> +
> +	/**
> +	 * @inbounds_cnt_threshold: Minimum of consecutive interrupts with no
> +	 * more than max_us between them to wake up the thread handler.
> +	 */
> +	u16 inbounds_cnt_threshold;
> +
> +	/**
> +	 * @inbounds_cnt: Current number of consecutive interrupts with no more
> +	 * than max_us between.
> +	 */
> +	u16 inbounds_cnt;
> +
> +	/** @coalesced_cnt: Total number of interrupts coalesced. */
> +	u64 coalesced_cnt;
> +
> +	/** @last_ts: Timestamp of the last IRQ. */
> +	ktime_t last_ts;
> +};
> +
>  /**
>   * enum panthor_device_profiling_mode - Profiling state
>   */
> @@ -571,6 +613,47 @@ static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
>  #define INT_MASK    0x8
>  #define INT_STAT    0xc
>  
> +static inline bool
> +panthor_irq_coalescing_wake_thread(struct panthor_irq_coalescing *coalescing)
> +{
> +	ktime_t ts;
> +	s64 diff_ns;
> +
> +	if (!coalescing->inbounds_cnt_threshold)
> +		return false;
> +
> +	ts = ktime_get();
> +	diff_ns = ktime_to_ns(ktime_sub(ts, coalescing->last_ts));
> +	if (diff_ns > coalescing->max_us * 1000) {
> +		coalescing->inbounds_cnt = 1;
> +		return false;
> +	}
> +
> +	if (coalescing->inbounds_cnt < U16_MAX)
> +		coalescing->inbounds_cnt++;
> +
> +	return coalescing->inbounds_cnt >= coalescing->inbounds_cnt_threshold;
> +}
> +
> +static inline void
> +panthor_irq_coalescing_update_ts(struct panthor_irq_coalescing *coalescing)
> +{
> +	if (coalescing->inbounds_cnt_threshold)
> +		coalescing->last_ts = ktime_get();
> +}
> +
> +static inline void
> +panthor_irq_coalescing_init(struct panthor_irq_coalescing *coalescing,
> +			     u16 max_us, u16 poll_period_us, u16 inbounds_cnt_threshold)
> +{
> +	coalescing->inbounds_cnt = 0;
> +	coalescing->coalesced_cnt = 0;
> +	coalescing->max_us = max_us;
> +	coalescing->poll_period_us = poll_period_us;
> +	coalescing->inbounds_cnt_threshold = inbounds_cnt_threshold;
> +	coalescing->last_ts = ktime_set(0, 0);
> +}
> +
>  static inline irqreturn_t panthor_irq_default_raw_handler(int irq, void *data)
>  {
>  	struct panthor_irq *pirq = data;
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 66996c9147c2..2fac5ba57f9d 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1760,6 +1760,7 @@ static void panthor_debugfs_init(struct drm_minor *minor)
>  {
>  	panthor_mmu_debugfs_init(minor);
>  	panthor_gem_debugfs_init(minor);
> +	panthor_fw_debugfs_init(minor);
>  }
>  #endif
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 05c632913359..cbb7d00f0e6e 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -6,6 +6,7 @@
>  #endif
>  
>  #include <linux/clk.h>
> +#include <linux/debugfs.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/firmware.h>
>  #include <linux/iopoll.h>
> @@ -15,6 +16,7 @@
>  #include <linux/pm_runtime.h>
>  
>  #include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
>  #include <drm/drm_managed.h>
>  #include <drm/drm_print.h>
>  
> @@ -271,6 +273,9 @@ struct panthor_fw {
>  
>  	/** @irq: Job irq data. */
>  	struct panthor_irq irq;
> +
> +	/** @irq_coalescing: Job IRQ coalescing. */
> +	struct panthor_irq_coalescing irq_coalescing;
>  };
>  
>  struct panthor_vm *panthor_fw_vm(struct panthor_device *ptdev)
> @@ -1090,6 +1095,8 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
>  static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
>  {
>  	struct panthor_irq *pirq = data;
> +	struct panthor_device *ptdev = pirq->ptdev;
> +	irqreturn_t ret = IRQ_HANDLED;
>  
>  	if (!gpu_read(pirq->iomem, INT_STAT))
>  		return IRQ_NONE;
> @@ -1101,6 +1108,9 @@ static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
>  		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
>  	}
>  
> +	if (panthor_irq_coalescing_wake_thread(&ptdev->fw->irq_coalescing))
> +		ret = IRQ_WAKE_THREAD;
> +
>  	panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_RAWSTAT));
>  
>  	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> @@ -1108,17 +1118,58 @@ static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
>  			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
>  	}
>  
> -	return IRQ_HANDLED;
> +	panthor_irq_coalescing_update_ts(&ptdev->fw->irq_coalescing);
> +	return ret;
>  }
>  
>  static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
>  {
>  	struct panthor_irq *pirq = data;
> +	struct panthor_device *ptdev = pirq->ptdev;
> +	irqreturn_t ret = IRQ_NONE;
> +	u32 processed_count = 0;
>  
> -	/* We never return IRQ_WAKE_THREAD, so we're not supposed to be called. */
> -	drm_WARN_ON_ONCE(&pirq->ptdev->base,
> -			 "threaded IRQ handler should never be called.");
> -	return IRQ_NONE;
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> +			return IRQ_NONE;
> +
> +		gpu_write(pirq->iomem, INT_MASK, 0);
> +		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> +	}
> +
> +	while (true) {
> +		u32 status;
> +
> +		/* It's safe to access pirq->mask without the lock held here. If a new
> +		 * event gets added to the mask and the corresponding IRQ is pending,
> +		 * we'll process it right away instead of adding an extra raw -> threaded
> +		 * round trip. If an event is removed and the status bit is set, it will
> +		 * be ignored, just like it would have been if the mask had been adjusted
> +		 * right before the HW event kicks in. TLDR; it's all expected races we're
> +		 * covered for.
> +		 */
> +		if (readl_poll_timeout_atomic(pirq->iomem + INT_RAWSTAT,
> +					      status, status & pirq->mask,
> +					      ptdev->fw->irq_coalescing.poll_period_us,
> +					      ptdev->fw->irq_coalescing.max_us))
> +			break;
> +
> +		panthor_job_irq_handler(pirq, status);
> +		ret = IRQ_HANDLED;
> +		processed_count++;
> +	}
> +
> +	if (processed_count > 1)
> +		ptdev->fw->irq_coalescing.coalesced_cnt += processed_count - 1;
> +
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {
> +			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +			gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +		}
> +	}
> +
> +	return ret;
>  }
>  
>  static int panthor_fw_start(struct panthor_device *ptdev)
> @@ -1516,6 +1567,11 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  	if (irq <= 0)
>  		return -ENODEV;
>  
> +	/* Start with IRQ coalescing disabled, until we have enough proof it's
> +	 * useful and doesn't have a too big CPU overhead. Those parameters can
> +	 * be tweaked with the debugfs knobs.
> +	 */
> +	panthor_irq_coalescing_init(&fw->irq_coalescing, 0, 0, 0);
>  	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>  				  ptdev->iomem + JOB_INT_BASE, "job",
>  				  panthor_job_irq_raw_handler,
> @@ -1563,6 +1619,90 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  	return ret;
>  }
>  
> +static ssize_t job_irq_coalescing_props_read(struct file *file,
> +					     char __user *ubuf,
> +					     size_t ubuf_size,
> +					     loff_t *ppos)
> +{
> +	struct panthor_device *ptdev = container_of(file->private_data,
> +						    struct panthor_device, base);
> +	char kbuf[256] = {};
> +	int kbuf_size;
> +
> +	kbuf_size = snprintf(kbuf, sizeof(kbuf) - 1,
> +			     "max_us=%u poll_period_us=%u inbounds_cnt_threshold=%u\n",
> +			     ptdev->fw->irq_coalescing.max_us,
> +			     ptdev->fw->irq_coalescing.poll_period_us,
> +			     ptdev->fw->irq_coalescing.inbounds_cnt_threshold);
> +	if (kbuf_size > sizeof(kbuf) - 1)
> +		kbuf_size = sizeof(kbuf) - 1;
> +
> +	return simple_read_from_buffer(ubuf, ubuf_size, ppos, kbuf, kbuf_size);
> +}
> +
> +static ssize_t job_irq_coalescing_props_write(struct file *file,
> +					      const char __user *ubuf,
> +					      size_t ubuf_size, loff_t *ppos)
> +{
> +	struct panthor_device *ptdev = container_of(file->private_data,
> +						    struct panthor_device, base);
> +	unsigned int max_us = 0, poll_period_us = 0, inbounds_cnt_threshold = 0;
> +	char kbuf[256] = {};
> +	int ret;
> +
> +	simple_write_to_buffer(kbuf, sizeof(kbuf) - 1, ppos, ubuf, ubuf_size);
> +	ret = sscanf(kbuf,
> +		     "max_us=%u poll_period_us=%u inbounds_cnt_threshold=%u",
> +		     &max_us, &poll_period_us, &inbounds_cnt_threshold);
> +	if (ret != 3)
> +		return -EINVAL;
> +
> +	if (max_us > U16_MAX || poll_period_us > U16_MAX || inbounds_cnt_threshold > U16_MAX)
> +		return -EINVAL;
> +
> +	panthor_irq_coalescing_init(&ptdev->fw->irq_coalescing, max_us,
> +				    poll_period_us, inbounds_cnt_threshold);
> +	return ubuf_size;
> +}
> +
> +static const struct debugfs_short_fops job_irq_coalescing_props_fops = {
> +	.read = job_irq_coalescing_props_read,
> +	.write = job_irq_coalescing_props_write,
> +};
> +
> +static ssize_t job_irq_coalescing_stats_read(struct file *file,
> +					     char __user *ubuf,
> +					     size_t ubuf_size,
> +					     loff_t *ppos)
> +{
> +	struct panthor_device *ptdev = container_of(file->private_data,
> +						    struct panthor_device, base);
> +	char kbuf[256] = {};
> +	int kbuf_size;
> +
> +	kbuf_size = snprintf(kbuf, sizeof(kbuf) - 1,
> +			     "inbounds_cnt=%u coalesced_cnt=%llu last_ts=%llu\n",
> +			     ptdev->fw->irq_coalescing.inbounds_cnt,
> +			     ptdev->fw->irq_coalescing.coalesced_cnt,
> +			     ktime_to_ns(ptdev->fw->irq_coalescing.last_ts));
> +	if (kbuf_size > sizeof(kbuf) - 1)
> +		kbuf_size = sizeof(kbuf) - 1;
> +
> +	return simple_read_from_buffer(ubuf, ubuf_size, ppos, kbuf, kbuf_size);
> +}
> +
> +static const struct debugfs_short_fops job_irq_coalescing_stats_fops = {
> +	.read = job_irq_coalescing_stats_read,
> +};
> +
> +void panthor_fw_debugfs_init(struct drm_minor *minor)
> +{
> +	debugfs_create_file("job_irq_coalescing_props", 0600, minor->debugfs_root,
> +			    minor->dev, &job_irq_coalescing_props_fops);
> +	debugfs_create_file("job_irq_coalescing_stats", 0400, minor->debugfs_root,
> +			    minor->dev, &job_irq_coalescing_stats_fops);
> +}
> +
>  MODULE_FIRMWARE("arm/mali/arch10.8/mali_csffw.bin");
>  MODULE_FIRMWARE("arm/mali/arch10.10/mali_csffw.bin");
>  MODULE_FIRMWARE("arm/mali/arch10.12/mali_csffw.bin");
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index e56b7fe15bb3..2643bd9e4ef9 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -526,4 +526,6 @@ static inline int panthor_fw_resume(struct panthor_device *ptdev)
>  int panthor_fw_init(struct panthor_device *ptdev);
>  void panthor_fw_unplug(struct panthor_device *ptdev);
>  
> +void panthor_fw_debugfs_init(struct drm_minor *minor);
> +
>  #endif
> 
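
To make the wake-up heuristic in the quoted patch easier to reason about, here is a small userspace model of it. This is only a sketch under assumptions: the struct and the model_*() names are hypothetical, the clock is passed in as a plain nanosecond value instead of coming from ktime_get(), and each interrupt is treated as instantaneous (the patch samples the timestamp at the end of the raw handler).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the coalescing state; field names mirror the patch. */
struct irq_coalescing_model {
	uint16_t max_us;                 /* max gap between IRQs to count as "close" */
	uint16_t inbounds_cnt_threshold; /* 0 disables coalescing */
	uint16_t inbounds_cnt;           /* current run of close-together IRQs */
	int64_t last_ts_ns;              /* timestamp of the previous IRQ */
};

/* Mirrors panthor_irq_coalescing_wake_thread(), with the clock passed in. */
static bool model_wake_thread(struct irq_coalescing_model *c, int64_t now_ns)
{
	if (!c->inbounds_cnt_threshold)
		return false;

	if (now_ns - c->last_ts_ns > (int64_t)c->max_us * 1000) {
		c->inbounds_cnt = 1; /* gap too long: start a new run */
		return false;
	}

	if (c->inbounds_cnt < UINT16_MAX)
		c->inbounds_cnt++;

	return c->inbounds_cnt >= c->inbounds_cnt_threshold;
}

/* Mirrors panthor_irq_coalescing_update_ts(). */
static void model_update_ts(struct irq_coalescing_model *c, int64_t now_ns)
{
	if (c->inbounds_cnt_threshold)
		c->last_ts_ns = now_ns;
}

/* Hypothetical helper: one raw IRQ arriving and completing at now_ns. */
static bool model_irq(struct irq_coalescing_model *c, int64_t now_ns)
{
	bool wake = model_wake_thread(c, now_ns);

	model_update_ts(c, now_ns);
	return wake;
}

/* Walks through one run of close interrupts followed by a long gap. */
static void model_demo(void)
{
	struct irq_coalescing_model c = {
		.max_us = 100, .inbounds_cnt_threshold = 3,
	};

	assert(!model_irq(&c, 1000000)); /* 1 ms after t=0: new run, cnt=1 */
	assert(!model_irq(&c, 1050000)); /* 50 us later: cnt=2 */
	assert(model_irq(&c, 1100000));  /* 50 us later: cnt=3, wake thread */
	assert(!model_irq(&c, 2000000)); /* 900 us later: run resets */
}
```

Calling model_demo() from a main() exercises the sequence. With inbounds_cnt_threshold left at 0, as panthor_fw_init() does by default, model_irq() never asks for the thread and never touches the timestamp.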


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-01 13:47   ` Steven Price
@ 2026-05-04  9:34     ` Boris Brezillon
  0 siblings, 0 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-05-04  9:34 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Fri, 1 May 2026 14:47:59 +0100
Steven Price <steven.price@arm.com> wrote:

> On 29/04/2026 10:38, Boris Brezillon wrote:
> > Add a specific spinlock for events processing, and force processing
> > of events in the panthor_sched_report_fw_events() path rather than
> > deferring it to a work item. We also fast-track fence signalling by
> > making the job completion logic IRQ-safe.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> I think there's some locking problems here. With some AI help I found
> the following path:
> 
>  * panthor_job_irq_handler()
>   * panthor_sched_report_fw_events()
>    * [takes events_lock]
>    * sched_process_csg_irq_locked()
>     * csg_slot_process_progress_timer_event_locked()
>      * lockdep_assert_held(&sched->lock);

Oops, this one should check that events_lock is held instead. I'll fix
that.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-01 14:20   ` Steven Price
@ 2026-05-04 11:02     ` Boris Brezillon
  2026-05-06 14:35       ` Steven Price
  0 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-05-04 11:02 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Fri, 1 May 2026 15:20:17 +0100
Steven Price <steven.price@arm.com> wrote:

> On 29/04/2026 10:38, Boris Brezillon wrote:
> > Rather than assuming an interrupt is always expected for request
> > acks, temporarily enable the relevant interrupts when the polling-wait
> > failed. This should hopefully reduce the number of interrupts the CPU
> > has to process.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> It seems to work, although I'm lightly uneasy about this because I'm not
> entirely sure whether the FW will immediately see the updates to
> ack_irq_mask and therefore whether there's a possibility to miss an
> event and be stuck waiting for the timeout.
> 
> Memory models are not my strong point, OpenAI tells me the sequence
> should be something like:
> 
>   scoped_guard(spinlock_irqsave, lock) {
>   	u32 ack_irq_mask = READ_ONCE(*ack_irq_mask_ptr);
> 
>   	WRITE_ONCE(*ack_irq_mask_ptr, ack_irq_mask | req_mask);
>   }

Is this really needed? In which situation would the compiler/CPU decide
to re-order this read-modify-write sequence?

> 
>   /*
>    * The FW interface can be mapped write-combine/Normal-NC.

I'm not too sure I see what the non-cached property has to do with it.
If it was cached we would still need this memory barrier, and in
addition, we'd need a cache flush if the FW is not IO-coherent.

>    * Make sure the
>    * IRQ mask update is visible to the FW before sleeping waiting for
>    * the IRQ.
>    */
>   wmb();
> 
> Which seems plausible. But I've long ago learnt that plausible doesn't
> mean much when dealing with memory models!

Yeah, I'm not too sure. I was honestly expecting the spinlock guard to
act as a memory barrier already, but maybe it's not enough.
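
For what it's worth, here is a userspace analogy of the sequence being discussed, using C11 atomics rather than the kernel primitives. It is only a sketch under assumptions: enable_ack_irqs_model() and the local mask variable are made-up names, atomic_fetch_or_explicit() stands in for the locked read-modify-write, and atomic_thread_fence() stands in for wmb(). It says nothing about write-combine/Normal-NC mappings or about when the FW actually observes the store, which is the open question here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for "set bits in ack_irq_mask, then make sure the
 * update is ordered before we go to sleep waiting for the IRQ".
 */
static void enable_ack_irqs_model(_Atomic uint32_t *mask, uint32_t req_mask)
{
	/*
	 * Read-modify-write of the shared word. In the driver the spinlock
	 * serialises CPU-side writers; here the atomic OR plays that role.
	 */
	atomic_fetch_or_explicit(mask, req_mask, memory_order_relaxed);

	/*
	 * Explicit fence after the update, standing in for the wmb() in the
	 * suggested sequence. A lock release alone only orders things for
	 * another party that later acquires the same lock.
	 */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Enables two request bits in turn and checks both end up set. */
static void enable_ack_irqs_demo(void)
{
	_Atomic uint32_t mask = 0;

	enable_ack_irqs_model(&mask, 0x1);
	enable_ack_irqs_model(&mask, 0x4);
	assert(atomic_load(&mask) == 0x5);
}
```

The point of the sketch is only the shape of the sequence: the update itself, then a separate ordering step before waiting. Whether the real code needs that extra step depends on how the FW interface is mapped.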

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs
  2026-05-01 14:57   ` Steven Price
@ 2026-05-04 11:15     ` Boris Brezillon
  0 siblings, 0 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-05-04 11:15 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Fri, 1 May 2026 15:57:35 +0100
Steven Price <steven.price@arm.com> wrote:

> On 29/04/2026 10:38, Boris Brezillon wrote:
> > Dealing with interrupts from the raw IRQ handler is good for latency,
> > but might be detrimental for the overall throughput, because the system
> > keeps being interrupted to process job interrupts.
> > 
> > Try to mitigate that with some interrupt coalescing infrastructure,
> > where we wake up the IRQ thread if close enough interrupts gets
> > detected.
> > 
> > It's still experimental, which explains why the feature is off by
> > default, and can be enabled through a debugfs knob.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> I think we need some more serious benchmarking to decide whether this is
> a good idea. We've experimented with coalescing interrupts in the past
> and it generally regressed some important benchmark of the day. But I'm
> not in the loop of "benchmark of the day" any more (although I do know
> that glmark hasn't been for years...) so it might have changed. From
> what I hear AI workloads "benefit"[1] from spinning a CPU waiting for
> jobs to finish.
> 
> [1] AI workloads don't tend to care so much about power... at least from
> the CPU.
> 
> One typo I spotted below. And I'm not awfully keen on the debugfs
> interface (but for testing it's obviously fine).

Yeah, just to be clear, patch 10 was really meant to be an RFC to get
the discussion started. What worries me a bit is the regression I'm
seeing on refract/terrain when switching to "event processing from the
hard handler", which is why I worked on that.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time
  2026-05-01 13:53   ` Steven Price
@ 2026-05-04 15:00     ` Boris Brezillon
  0 siblings, 0 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-05-04 15:00 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Fri, 1 May 2026 14:53:22 +0100
Steven Price <steven.price@arm.com> wrote:

> > @@ -2970,8 +2963,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
> >  
> >  			if (flush_caches_failed)
> >  				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
> > -			else
> > -				csg_slot_sync_update_locked(ptdev, csg_id);  
> 
> The justification for this change doesn't seem to be included in the
> commit message and looks suspicious.

Hm, right. I somehow confused csg_slot_sync_update_locked() and
sched_process_csg_irq_locked().

> Although AFAICT the events_lock
> wouldn't be held here so it could trigger a lockdep assert before this
> change...

Yeah, the guard(spinlock_irqsave)(&sched->events_lock); should have
been added to patch 4.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency
  2026-04-29 10:36 ` Boris Brezillon
@ 2026-05-05  8:54   ` Boris Brezillon
  2026-05-05 16:12     ` Liviu Dudau
  0 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-05-05  8:54 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On Wed, 29 Apr 2026 12:36:07 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Wed, 29 Apr 2026 11:38:27 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
> 
> > Right now, panthor is one of the rare drivers to signal fences
> > from work items (not even from the threaded IRQ handler). We
> > could move that to the threaded handler, but that would still
> > leave the latency caused by the scheduling of the IRQ thread.
> > 
> > Instead, this patchset moves all the job IRQ processing to
> > the raw IRQ handler, which is fine because what the current
> > code does is demux the interrupts and deferring actual handling
> > to sub work items. The only bits we keep in the IRQ path is
> > the dma_fence signalling, which should be acceptable, in term
> > of CPU cycles spent in the IRQ context.
> > 
> > Pretty much all the patches except the last two are just
> > preparing the ground to get there. The second to last one
> > does the thread -> IRQ transition, and the last one is some
> > experimental interrupt coalescing support that I've added
> > because I noticed moving job IRQ handling to the raw handler
> > generates quite a lot of interrupts in some case, and having
> > the system constantly interrupted like that can be
> > detrimental.
> >   
> 
> Forgot to post some preliminary numbers I collected during my,
> admittedly, very basic testing :-). What this shows is that IRQ
> coalescing provides small but noticeable improvements only in some
> of the glmark scenes (terrain, refract),

I think I found the cause of the "refract" regression. Turns out if I
set the cpufreq governor to performance instead of schedutil, I get those
~20% back, which is not surprising given schedutil uses scheduling stats
to decide whether to bump/lower the frequency, and the extra
IRQ_WAKE_THREAD indirection does add regular thread activity.

So, this basically leaves terrain where interrupt coalescing helps a
bit, but it's not clear that the improvement justifies the added
complexity or the extra CPU load incurred by the polling done in the
threaded handler.

If everyone is fine with that, I'd be tempted to drop patch 10 for now,
and revisit it if/when we have an actual use case where it's deemed
essential to limit IRQs coming from the GPU.

Regards,

Boris

> the rest of the variations
> stay in the noise of what we see between regular glmark runs. BTW,
> those relatively small improvements (~5%) aren't even reflected in the
> final score, because many tests have high FPS scores, and any variation
> on those might actually have more impact on the final score (which is
> just an average FPS IIUC) than any improvement on the lower-FPS scenes.
> 
> It's also worth noting that the refract scene seems to suffer from
> this threaded -> raw-IRQ transition, and that coalescing gets us back
> to where we were.
> 
> TLDR; As always, there's no simple answer to this 'latency vs throughput'
> issue, and it's not surprising one approach helps some cases and
> regresses others.
> 
> ---------- Before this series ---------------
> 
> =======================================================
>     glmark2 2023.01
> =======================================================
>     OpenGL Information
>     GL_VENDOR:      Mesa
>     GL_RENDERER:    Mali-G610 MC4 (Panfrost)
>     GL_VERSION:     OpenGL ES 3.1 Mesa 26.2.0-devel (git-c71664cfbc)
>     Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
>     Surface Size:   800x600 windowed
> =======================================================
> [build] use-vbo=false: FPS: 2708 FrameTime: 0.369 ms
> [build] use-vbo=true: FPS: 4209 FrameTime: 0.238 ms
> [texture] texture-filter=nearest: FPS: 5211 FrameTime: 0.192 ms
> [texture] texture-filter=linear: FPS: 5224 FrameTime: 0.191 ms
> [texture] texture-filter=mipmap: FPS: 5255 FrameTime: 0.190 ms
> [shading] shading=gouraud: FPS: 3395 FrameTime: 0.295 ms
> [shading] shading=blinn-phong-inf: FPS: 3329 FrameTime: 0.300 ms
> [shading] shading=phong: FPS: 2990 FrameTime: 0.335 ms
> [shading] shading=cel: FPS: 2916 FrameTime: 0.343 ms
> [bump] bump-render=high-poly: FPS: 1879 FrameTime: 0.532 ms
> [bump] bump-render=normals: FPS: 5242 FrameTime: 0.191 ms
> [bump] bump-render=height: FPS: 4997 FrameTime: 0.200 ms
> [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3725 FrameTime: 0.268 ms
> [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 1906 FrameTime: 0.525 ms
> [pulsar] light=false:quads=5:texture=false: FPS: 4863 FrameTime: 0.206 ms
> [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 706 FrameTime: 1.417 ms
> [desktop] effect=shadow:windows=4: FPS: 2621 FrameTime: 0.382 ms
> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 411 FrameTime: 2.435 ms
> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 402 FrameTime: 2.489 ms
> [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 490 FrameTime: 2.043 ms
> [ideas] speed=duration: FPS: 1008 FrameTime: 0.992 ms
> [jellyfish] <default>: FPS: 2722 FrameTime: 0.367 ms
> [terrain] <default>: FPS: 120 FrameTime: 8.339 ms
> [shadow] <default>: FPS: 2086 FrameTime: 0.479 ms
> [refract] <default>: FPS: 312 FrameTime: 3.209 ms
> [conditionals] fragment-steps=0:vertex-steps=0: FPS: 4877 FrameTime: 0.205 ms
> [conditionals] fragment-steps=5:vertex-steps=0: FPS: 4118 FrameTime: 0.243 ms
> [conditionals] fragment-steps=0:vertex-steps=5: FPS: 4845 FrameTime: 0.206 ms
> [function] fragment-complexity=low:fragment-steps=5: FPS: 4444 FrameTime: 0.225 ms
> [function] fragment-complexity=medium:fragment-steps=5: FPS: 3722 FrameTime: 0.269 ms
> [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4468 FrameTime: 0.224 ms
> [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4442 FrameTime: 0.225 ms
> [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 3847 FrameTime: 0.260 ms
> =======================================================
>                                   glmark2 Score: 3135 
> =======================================================
> 
> ---------- After transitioning to job event processing in the IRQ context ------------
> 
> =======================================================
>     glmark2 2023.01
> =======================================================
>     OpenGL Information
>     GL_VENDOR:      Mesa
>     GL_RENDERER:    Mali-G610 MC4 (Panfrost)
>     GL_VERSION:     OpenGL ES 3.1 Mesa 26.2.0-devel (git-c71664cfbc)
>     Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
>     Surface Size:   800x600 windowed
> =======================================================
> [build] use-vbo=false: FPS: 2703 FrameTime: 0.370 ms
> [build] use-vbo=true: FPS: 4630 FrameTime: 0.216 ms
> [texture] texture-filter=nearest: FPS: 5406 FrameTime: 0.185 ms
> [texture] texture-filter=linear: FPS: 5429 FrameTime: 0.184 ms
> [texture] texture-filter=mipmap: FPS: 5408 FrameTime: 0.185 ms
> [shading] shading=gouraud: FPS: 3678 FrameTime: 0.272 ms
> [shading] shading=blinn-phong-inf: FPS: 3587 FrameTime: 0.279 ms
> [shading] shading=phong: FPS: 3221 FrameTime: 0.311 ms
> [shading] shading=cel: FPS: 3119 FrameTime: 0.321 ms
> [bump] bump-render=high-poly: FPS: 1977 FrameTime: 0.506 ms
> [bump] bump-render=normals: FPS: 5488 FrameTime: 0.182 ms
> [bump] bump-render=height: FPS: 5323 FrameTime: 0.188 ms
> [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 4003 FrameTime: 0.250 ms
> [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2008 FrameTime: 0.498 ms
> [pulsar] light=false:quads=5:texture=false: FPS: 4961 FrameTime: 0.202 ms
> [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 852 FrameTime: 1.174 ms
> [desktop] effect=shadow:windows=4: FPS: 2649 FrameTime: 0.378 ms
> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 412 FrameTime: 2.429 ms
> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 392 FrameTime: 2.554 ms
> [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 482 FrameTime: 2.075 ms
> [ideas] speed=duration: FPS: 1021 FrameTime: 0.980 ms
> [jellyfish] <default>: FPS: 2939 FrameTime: 0.340 ms
> [terrain] <default>: FPS: 126 FrameTime: 7.979 ms
> [shadow] <default>: FPS: 2273 FrameTime: 0.440 ms
> [refract] <default>: FPS: 251 FrameTime: 3.999 ms
> [conditionals] fragment-steps=0:vertex-steps=0: FPS: 5148 FrameTime: 0.194 ms
> [conditionals] fragment-steps=5:vertex-steps=0: FPS: 4555 FrameTime: 0.220 ms
> [conditionals] fragment-steps=0:vertex-steps=5: FPS: 5245 FrameTime: 0.191 ms
> [function] fragment-complexity=low:fragment-steps=5: FPS: 4880 FrameTime: 0.205 ms
> [function] fragment-complexity=medium:fragment-steps=5: FPS: 4042 FrameTime: 0.247 ms
> [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4846 FrameTime: 0.206 ms
> [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4854 FrameTime: 0.206 ms
> [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4207 FrameTime: 0.238 ms
> =======================================================
>                                   glmark2 Score: 3335 
> =======================================================
> 
> ---- With IRQ coalescing enabled (max_us=100 poll_period_us=5 inbounds_cnt_threshold=5) ---
> 
> =======================================================
>     glmark2 2023.01
> =======================================================
>     OpenGL Information
>     GL_VENDOR:      Mesa
>     GL_RENDERER:    Mali-G610 MC4 (Panfrost)
>     GL_VERSION:     OpenGL ES 3.1 Mesa 26.2.0-devel (git-c71664cfbc)
>     Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
>     Surface Size:   800x600 windowed
> =======================================================
> [build] use-vbo=false: FPS: 2663 FrameTime: 0.376 ms
> [build] use-vbo=true: FPS: 4640 FrameTime: 0.216 ms
> [texture] texture-filter=nearest: FPS: 5335 FrameTime: 0.187 ms
> [texture] texture-filter=linear: FPS: 5442 FrameTime: 0.184 ms
> [texture] texture-filter=mipmap: FPS: 5434 FrameTime: 0.184 ms
> [shading] shading=gouraud: FPS: 3683 FrameTime: 0.272 ms
> [shading] shading=blinn-phong-inf: FPS: 3580 FrameTime: 0.279 ms
> [shading] shading=phong: FPS: 3211 FrameTime: 0.312 ms
> [shading] shading=cel: FPS: 3093 FrameTime: 0.323 ms
> [bump] bump-render=high-poly: FPS: 1969 FrameTime: 0.508 ms
> [bump] bump-render=normals: FPS: 5368 FrameTime: 0.186 ms
> [bump] bump-render=height: FPS: 5273 FrameTime: 0.190 ms
> [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 4038 FrameTime: 0.248 ms
> [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2001 FrameTime: 0.500 ms
> [pulsar] light=false:quads=5:texture=false: FPS: 4961 FrameTime: 0.202 ms
> [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 842 FrameTime: 1.188 ms
> [desktop] effect=shadow:windows=4: FPS: 2681 FrameTime: 0.373 ms
> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 412 FrameTime: 2.430 ms
> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 408 FrameTime: 2.452 ms
> [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 483 FrameTime: 2.072 ms
> [ideas] speed=duration: FPS: 1005 FrameTime: 0.995 ms
> [jellyfish] <default>: FPS: 2945 FrameTime: 0.340 ms
> [terrain] <default>: FPS: 131 FrameTime: 7.663 ms
> [shadow] <default>: FPS: 2276 FrameTime: 0.440 ms
> [refract] <default>: FPS: 328 FrameTime: 3.050 ms
> [conditionals] fragment-steps=0:vertex-steps=0: FPS: 5099 FrameTime: 0.196 ms
> [conditionals] fragment-steps=5:vertex-steps=0: FPS: 4538 FrameTime: 0.220 ms
> [conditionals] fragment-steps=0:vertex-steps=5: FPS: 5152 FrameTime: 0.194 ms
> [function] fragment-complexity=low:fragment-steps=5: FPS: 4818 FrameTime: 0.208 ms
> [function] fragment-complexity=medium:fragment-steps=5: FPS: 4035 FrameTime: 0.248 ms
> [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4855 FrameTime: 0.206 ms
> [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4812 FrameTime: 0.208 ms
> [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4150 FrameTime: 0.241 ms
> =======================================================
>                                   glmark2 Score: 3322 
> =======================================================
> 
> 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > Boris Brezillon (10):
> >       drm/panthor: Make panthor_irq::state a non-atomic field
> >       drm/panthor: Move the register accessors before the IRQ helpers
> >       drm/panthor: Replace the panthor_irq macro machinery by inline helpers
> >       drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers
> >       drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
> >       drm/panthor: Prepare the scheduler logic for FW events in IRQ context
> >       drm/panthor: Automate CSG IRQ processing at group unbind time
> >       drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
> >       drm/panthor: Process FW events in IRQ context
> >       drm/panthor: Introduce interrupt coalescing support for job IRQs
> > 
> >  drivers/gpu/drm/panthor/panthor_device.h | 358 ++++++++++++++---------
> >  drivers/gpu/drm/panthor/panthor_drv.c    |   1 +
> >  drivers/gpu/drm/panthor/panthor_fw.c     | 226 +++++++++++++--
> >  drivers/gpu/drm/panthor/panthor_fw.h     |  11 +-
> >  drivers/gpu/drm/panthor/panthor_gpu.c    |  27 +-
> >  drivers/gpu/drm/panthor/panthor_mmu.c    |  38 +--
> >  drivers/gpu/drm/panthor/panthor_pwr.c    |  21 +-
> >  drivers/gpu/drm/panthor/panthor_sched.c  | 475 ++++++++++++++-----------------
> >  8 files changed, 698 insertions(+), 459 deletions(-)
> > ---
> > base-commit: 7455a0583a906533041a80e48c6a2e3230cce96e
> > change-id: 20260429-panthor-signal-from-irq-d33684f4d292
> > prerequisite-message-id: <20260427155934.416502-1-karunika.choo@arm.com>
> > prerequisite-patch-id: 70905a2eb09ab2b31d242a5ed5af3b42fb6a464c
> > prerequisite-patch-id: aa4c22669f80328039762f25c0b3942bbadbdc89
> > prerequisite-patch-id: 7f61bcee3c4bb5703900b18d5b6e0f52e622f29d
> > prerequisite-patch-id: 3402f4d60aa526d40113fc3d9b3e599f8f89e705
> > prerequisite-patch-id: 00ddbd3d455891f6950609614c1acd2baa78b0db
> > prerequisite-patch-id: 6a9928f609e3757cadebb2df6795d0da55745f4e
> > prerequisite-patch-id: fd91f68f25d4bc93eec405f0131f5ae4284bfaf2
> > prerequisite-patch-id: 553958a10a0ca2f20f7883ad4c752cfc7485c5a8
> > 
> > Best regards,  
> 



* Re: [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency
  2026-05-05  8:54   ` Boris Brezillon
@ 2026-05-05 16:12     ` Liviu Dudau
  0 siblings, 0 replies; 39+ messages in thread
From: Liviu Dudau @ 2026-05-05 16:12 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Tue, May 05, 2026 at 10:54:53AM +0200, Boris Brezillon wrote:
> On Wed, 29 Apr 2026 12:36:07 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
> 
> > On Wed, 29 Apr 2026 11:38:27 +0200
> > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > 
> > > Right now, panthor is one of the rare drivers to signal fences
> > > from work items (not even from the threaded IRQ handler). We
> > > could move that to the threaded handler, but that would still
> > > leave the latency caused by the scheduling of the IRQ thread.
> > > 
> > > Instead, this patchset moves all the job IRQ processing to
> > > the raw IRQ handler, which is fine because what the current
> > > code does is demux the interrupts and deferring actual handling
> > > to sub work items. The only bits we keep in the IRQ path is
> > > the dma_fence signalling, which should be acceptable, in term
> > > of CPU cycles spent in the IRQ context.
> > > 
> > > Pretty much all the patches except the last two are just
> > > preparing the ground to get there. The second to last one
> > > does the thread -> IRQ transition, and the last one is some
> > > experimental interrupt coalescing support that I've added
> > > because I noticed moving job IRQ handling to the raw handler
> > > generates quite a lot of interrupts in some case, and having
> > > the system constantly interrupted like that can be
> > > detrimental.
> > >   
> > 
> > Forgot to post some preliminary numbers I collected during my,
> > admittedly, very basic testing :-). What this shows is that IRQ
> > coalescing provides small but noticeable improvements only in some
> > of the glmark scenes (terrain, refract),
> 
> I think I found the problem on the "refract" regression. Turns out if I
> set cpufreq governor to performance instead of schedutil, I get those
> ~20% back, which is not surprising given schedutil uses scheduling stats
> to decide to bump/lower the frequency, and the extra IRQ_WAKE_THREAD
> indirection does add regular thread activity.

Nice find! I had completely overlooked that the choice of cpufreq governor
could play a part in the latency of handling raw IRQs if it ends up lowering
the CPU frequency.

> 
> So, this basically leaves terrain where interrupt coalescing helps a
> bit, but it's not clear that the improvement justifies the added
> complexity or the extra CPU load incurred by the polling done in the
> threaded handler.
> 
> If everyone is fine with that, I'd be tempted to drop patch 10 for now,
> and revisit it if/when we have an actual use case where it's deemed
> essential to limit IRQs coming from the GPU.

Yes, I agree with the idea of dropping the coalescing patch for now. Doing
more in the raw handler and deferring only slow work to the threaded handler
makes sense, so we should keep that.

Best regards,
Liviu

> 
> Regards,
> 
> Boris
> 
> > the rest of the variations
> > stay in the noise of what we see between regular glmark runs. BTW,
> > those relatively small improvements (~5%) aren't even reflected in the
> > final score, because many tests have high FPS scores, and any variation
> > on those might actually have more impact on the final score (which is
> > just an average FPS IIUC) than any improvement on the lower-FPS scenes.
> > 
> > It's also worth noting that the refract scene seems to suffer from
> > this threaded -> raw-IRQ transition, and that coalescing gets us back
> > to where we were.
> > 
> > TLDR; As always, there's no simple answer to this 'latency vs throughput'
> > issue, and it's not surprising one approach helps some cases and
> > regresses others.
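
A quick check of the "average FPS" reading: by my arithmetic, the 33 per-scene
FPS values of the "Before this series" run average to roughly 3136, next to
the reported score of 3135. Assuming the score really is a plain arithmetic
mean (that is the IIUC above, not something checked against the glmark2
source), the sketch below shows why a gain on a low-FPS scene barely registers.
mean_fps() and score_delta() are hypothetical helpers written for this
illustration only.

```c
#include <stddef.h>

/* Arithmetic mean of per-scene FPS values (assumed scoring rule). */
static double mean_fps(const double *fps, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += fps[i];

	return n ? sum / (double)n : 0.0;
}

/*
 * Change in the mean when a single scene goes from old_fps to new_fps
 * in a run of n_scenes scenes, all other scenes being unchanged.
 */
static double score_delta(double old_fps, double new_fps, size_t n_scenes)
{
	return (new_fps - old_fps) / (double)n_scenes;
}
```

With the numbers from the runs in this thread, terrain going from 120 to 131
FPS (about +9%) moves such a mean by 11/33, roughly 0.33 points, while
texture-filter=nearest going from 5211 to 5335 FPS (about +2.4%) moves it by
124/33, roughly 3.8 points.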
> > 
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯


* Re: [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-04 11:02     ` Boris Brezillon
@ 2026-05-06 14:35       ` Steven Price
  2026-05-06 16:08         ` Boris Brezillon
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Price @ 2026-05-06 14:35 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On 04/05/2026 12:02, Boris Brezillon wrote:
> On Fri, 1 May 2026 15:20:17 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 29/04/2026 10:38, Boris Brezillon wrote:
>>> Rather than assuming an interrupt is always expected for request
>>> acks, temporarily enable the relevant interrupts when the polling-wait
>>> failed. This should hopefully reduce the number of interrupts the CPU
>>> has to process.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
>>
>> It seems to work, although I'm slightly uneasy about this because I'm not
>> entirely sure whether the FW will immediately see the updates to
>> ack_irq_mask and therefore whether there's a possibility to miss an
>> event and be stuck waiting for the timeout.
>>
>> Memory models are not my strong point, OpenAI tells me the sequence
>> should be something like:
>>
>>   scoped_guard(spinlock_irqsave, lock) {
>>   	u32 ack_irq_mask = READ_ONCE(*ack_irq_mask_ptr);
>>
>>   	WRITE_ONCE(*ack_irq_mask_ptr, ack_irq_mask | req_mask);
>>   }
> 
> Is this really needed? In which situation would the compiler/CPU decide
> to re-order this read_update_modify sequence?

I think that's the AI being a bit overzealous, but in general WRITE_ONCE
is necessary to avoid some surprising effects. In theory the compiler
can decide to perform multiple writes if it's non-volatile. I.e. a
sequence like:

	u32 old_mask = *ack_irq_mask_ptr;
	if (condition)
		*ack_irq_mask_ptr = 0;
	else
		*ack_irq_mask_ptr |= req_mask;

Can be 'optimised' to:

	u32 old_mask = *ack_irq_mask_ptr;
	*ack_irq_mask_ptr = 0;
	if (!condition)
		*ack_irq_mask_ptr = old_mask | req_mask;

Here the compiler has changed the (!condition) path to do two writes,
one of which "should never be seen".

Given that the compiler shouldn't be able to move any of the effects
outside of the scoped_guard(), and since there's only one operation then
I can't see how a compiler would screw it up - but the compiler is
technically free to do so.
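
For illustration, the single-load/single-store form can be sketched in
userspace C. The READ_ONCE()/WRITE_ONCE() definitions below are simplified
volatile-cast stand-ins for the kernel macros (they rely on the GNU
__typeof__ extension), and update_ack_irq_mask() is a hypothetical helper,
not code from the series. The lock is left out on purpose: the point is only
the shape of the accesses.

```c
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel macros: each one forces
 * exactly one volatile access to the object.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/*
 * Hypothetical helper: OR req_mask into the mask with one load and one
 * store, and return the value that was read.
 */
static uint32_t update_ack_irq_mask(uint32_t *ack_irq_mask_ptr,
				    uint32_t req_mask)
{
	uint32_t old_mask = READ_ONCE(*ack_irq_mask_ptr);

	/* Single store of the final value, no transient intermediate. */
	WRITE_ONCE(*ack_irq_mask_ptr, old_mask | req_mask);

	return old_mask;
}
```

Because the store is a volatile access, the compiler has to emit it as
written and cannot split it into a transient write followed by the final
value.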

>>
>>   /*
>>    * The FW interface can be mapped write-combine/Normal-NC.
> 
> I'm not too sure I see what the non-cached property has to do with it.
> If it was cached we would still need this memory barrier, and in
> addition, we'd need a cache flush if the FW is not IO-coherent.

I *think* the point the AI was making is that the memory isn't Device.
I.e. it's writeback and the write might not have completed.

>> Make sure the
>>    * IRQ mask update is visible to the FW before sleeping waiting for
>> the IRQ.
>>    */
>>   wmb();
>>
>> Which seems plausible. But I've long ago learnt that plausible doesn't
>> mean much when dealing with memory models!
> 
> Yeah, I'm not too sure. I was honestly expecting the spinlock guard to
> act as a memory barrier already, but maybe it's not enough.

So logically it must be enough to enable other CPUs to see writes within
the spinlock - otherwise spinlocks would be completely broken on SMP. I
guess it should be sufficient for the GPU's firmware MCU to see.

This is of course the problem with AI - it's found something 'plausible'
but it's probably completely wrong. I guess we just ignore it unless we
do actually start seeing that timeout happen.

Thanks,
Steve


* Re: [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-06 14:35       ` Steven Price
@ 2026-05-06 16:08         ` Boris Brezillon
  2026-05-13 15:02           ` Steven Price
  0 siblings, 1 reply; 39+ messages in thread
From: Boris Brezillon @ 2026-05-06 16:08 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, 6 May 2026 15:35:18 +0100
Steven Price <steven.price@arm.com> wrote:

> On 04/05/2026 12:02, Boris Brezillon wrote:
> > On Fri, 1 May 2026 15:20:17 +0100
> > Steven Price <steven.price@arm.com> wrote:
> >   
> >> On 29/04/2026 10:38, Boris Brezillon wrote:  
> >>> Rather than assuming an interrupt is always expected for request
> >>> acks, temporarily enable the relevant interrupts when the polling-wait
> >>> failed. This should hopefully reduce the number of interrupts the CPU
> >>> has to process.
> >>>
> >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>    
> >>
> >> It seems to work, although I'm slightly uneasy about this because I'm not
> >> entirely sure whether the FW will immediately see the updates to
> >> ack_irq_mask and therefore whether there's a possibility to miss an
> >> event and be stuck waiting for the timeout.
> >>
> >> Memory models are not my strong point, OpenAI tells me the sequence
> >> should be something like:
> >>
> >>   scoped_guard(spinlock_irqsave, lock) {
> >>   	u32 ack_irq_mask = READ_ONCE(*ack_irq_mask_ptr);
> >>
> >>   	WRITE_ONCE(*ack_irq_mask_ptr, ack_irq_mask | req_mask);
> >>   }  
> > 
> > Is this really needed? In which situation would the compiler/CPU decide
> > to re-order this read_update_modify sequence?  
> 
> I think that's the AI being a bit overzealous, but in general WRITE_ONCE
> is necessary to avoid some surprising effects. In theory the compiler
> can decide to perform multiple writes if it's non-volatile. I.e. a
> sequence like:
> 
> 	u32 old_mask = *ack_irq_mask_ptr;
> 	if (condition)
> 		*ack_irq_mask_ptr = 0;
> 	else
> 		*ack_irq_mask_ptr |= req_mask;
> 
> Can be 'optimised' to:
> 
> 	u32 old_mask = *ack_irq_mask_ptr;
> 	*ack_irq_mask_ptr = 0;
> 	if (!condition)
> 		*ack_irq_mask_ptr = old_mask | req_mask;
> 
> In which the compiler has changed the (!condition) path to do two writes
> one of which "should never be seen".
> 
> Given that the compiler shouldn't be able to move any of the effects
> outside of the scoped_guard(), and since there's only one operation then
> I can't see how a compiler would screw it up - but the compiler is
> technically free to do so.

Sure, I'm not saying read_modify_write is atomic per se (even though
I'd be surprised if the compiler wasn't generating instructions that
are atomic in the end), but it is thread-safe because of the spinlock
covering the read_modify_write op.

> 
> >>
> >>   /*
> >>    * The FW interface can be mapped write-combine/Normal-NC.  
> > 
> > I'm not too sure I see what the non-cached property has to do with it.
> > If it was cached we would still need this memory barrier, and in
> > addition, we'd need a cache flush if the FW is not IO-coherent.  
> 
> I *think* the point the AI was making is that the memory isn't Device.
> I.e. it's writeback and the write might not have completed.

Okay, get it now.

> 
> >> Make sure the
> >>    * IRQ mask update is visible to the FW before sleeping waiting for
> >> the IRQ.
> >>    */
> >>   wmb();
> >>
> >> Which seems plausible. But I've long ago learnt that plausible doesn't
> >> mean much when dealing with memory models!  
> > 
> > Yeah, I'm not too sure. I was honestly expecting the spinlock guard to
> > act as a memory barrier already, but maybe it's not enough.  
> 
> So logically it must be enough to enable other CPUs to see writes within
> the spinlock - otherwise spinlocks would be completely broken on SMP. I
> guess it should be sufficient for the GPU's firmware MCU to see.

For the record, this is currently mapped uncached on both the CPU and
GPU side, because we don't have a way to describe the
shareability properly with the current IOMMU flags.

So, my understanding was that the smp_mb() (DMB(ISH) on arm64) at
the end of a spin_unlock(), would ensure proper store/load instruction
ordering around this barrier, but that it would only wait for the
content to reach the inner shareable domain before returning, not any
further. But maybe I got that wrong from the start, and DMB(ISH)
doesn't even start the transaction if the access is targeting uncached
memory. In which case, AI is right, a full wmb() is needed, otherwise
there's a chance we'll wait indefinitely because the update didn't make
it to the FW interface in the first place.

Also, if that's broken for ack_irq_mask, it's also broken in other
places...
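
To make the ordering question concrete, here is a userspace sketch of the
sequence under discussion: update the mask under a lock, then issue a barrier
before going to sleep. Everything in it is a stand-in. fake_spin_lock() and
fake_spin_unlock() are a C11 atomic_flag lock, fake_wmb() is a seq_cst fence
that only marks where the kernel's wmb() would sit (it does not reproduce what
wmb() does on arm64), and enable_ack_irqs_before_wait() is a hypothetical
name. It shows where each step goes, not whether the extra barrier is needed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the kernel spinlock: a C11 test-and-set lock. */
static atomic_flag fake_lock = ATOMIC_FLAG_INIT;

static void fake_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&fake_lock,
						 memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void fake_spin_unlock(void)
{
	/* Release ordering: earlier stores are visible to the next holder. */
	atomic_flag_clear_explicit(&fake_lock, memory_order_release);
}

/* Placeholder marking where wmb() would go; NOT equivalent to it. */
static void fake_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical helper: enable ack IRQs for req_mask before sleeping. */
static void enable_ack_irqs_before_wait(volatile uint32_t *ack_irq_mask_ptr,
					uint32_t req_mask)
{
	fake_spin_lock();
	/* volatile pointer: one load followed by one store */
	*ack_irq_mask_ptr = *ack_irq_mask_ptr | req_mask;
	fake_spin_unlock();

	/*
	 * Open question from the thread: is the unlock above enough for
	 * the FW to observe the new mask, or is this barrier required?
	 */
	fake_wmb();

	/* ...the caller would now sleep waiting for the IRQ... */
}
```

The lock only orders the update against other lock holders. The FW never
takes it, which is why the barrier after the unlock is the part in question.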


* Re: [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-06 16:08         ` Boris Brezillon
@ 2026-05-13 15:02           ` Steven Price
  2026-05-13 15:42             ` Boris Brezillon
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Price @ 2026-05-13 15:02 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On 06/05/2026 17:08, Boris Brezillon wrote:
> On Wed, 6 May 2026 15:35:18 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 04/05/2026 12:02, Boris Brezillon wrote:
>>> On Fri, 1 May 2026 15:20:17 +0100
>>> Steven Price <steven.price@arm.com> wrote:
>>>   
>>>> On 29/04/2026 10:38, Boris Brezillon wrote:  
>>>>> Rather than assuming an interrupt is always expected for request
>>>>> acks, temporarily enable the relevant interrupts when the polling-wait
>>>>> failed. This should hopefully reduce the number of interrupts the CPU
>>>>> has to process.
>>>>>
>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>    
>>>>
>>>> It seems to work, although I'm slightly uneasy about this because I'm not
>>>> entirely sure whether the FW will immediately see the updates to
>>>> ack_irq_mask and therefore whether there's a possibility to miss an
>>>> event and be stuck waiting for the timeout.
>>>>
>>>> Memory models are not my strong point, OpenAI tells me the sequence
>>>> should be something like:
>>>>
>>>>   scoped_guard(spinlock_irqsave, lock) {
>>>>   	u32 ack_irq_mask = READ_ONCE(*ack_irq_mask_ptr);
>>>>
>>>>   	WRITE_ONCE(*ack_irq_mask_ptr, ack_irq_mask | req_mask);
>>>>   }  
>>>
>>> Is this really needed? In which situation would the compiler/CPU decide
>>> to re-order this read_update_modify sequence?  
>>
>> I think that's the AI being a bit overzealous, but in general WRITE_ONCE
>> is necessary to avoid some surprising effects. In theory the compiler
>> can decide to perform multiple writes if it's non-volatile. I.e. a
>> sequence like:
>>
>> 	u32 old_mask = *ack_irq_mask_ptr;
>> 	if (condition)
>> 		*ack_irq_mask_ptr = 0;
>> 	else
>> 		*ack_irq_mask_ptr |= req_mask;
>>
>> Can be 'optimised' to:
>>
>> 	u32 old_mask = *ack_irq_mask_ptr;
>> 	*ack_irq_mask_ptr = 0;
>> 	if (!condition)
>> 		*ack_irq_mask_ptr = old_mask | req_mask;
>>
>> In which the compiler has changed the (!condition) path to do two writes
>> one of which "should never be seen".
>>
>> Given that the compiler shouldn't be able to move any of the effects
>> outside of the scoped_guard(), and since there's only one operation then
>> I can't see how a compiler would screw it up - but the compiler is
>> technically free to do so.
> 
> Sure, I'm not saying read_modify_write is atomic per-se (even though
> I'd be surprised if the compiler wasn't generating instructions that
> are atomic in the end), but it is thread-safe because of the spinlock
> covering the read_modify_write op.

But one of the "threads" is the MCU which isn't using the spinlock -
which is why it's a problem if the compiler left the value in a 'random'
state even if it's all fixed up by the time the spinlock is released.

Like you say I would be very surprised if a compiler messed it up in
this case.

>>
>>>>
>>>>   /*
>>>>    * The FW interface can be mapped write-combine/Normal-NC.  
>>>
>>> I'm not too sure I see what the non-cached property has to do with it.
>>> If it was cached we would still need this memory barrier, and in
>>> addition, we'd need a cache flush if the FW is not IO-coherent.  
>>
>> I *think* the point the AI was making is that the memory isn't Device.
>> I.e. it's writeback and the write might not have completed.
> 
> Okay, get it now.
> 
>>
>>>> Make sure the
>>>>    * IRQ mask update is visible to the FW before sleeping waiting for
>>>> the IRQ.
>>>>    */
>>>>   wmb();
>>>>
>>>> Which seems plausible. But I've long ago learnt that plausible doesn't
>>>> mean much when dealing with memory models!  
>>>
>>> Yeah, I'm not too sure. I was honestly expecting the spinlock guard to
>>> act as a memory barrier already, but maybe it's not enough.  
>>
>> So logically it must be enough to enable other CPUs to see writes within
>> the spinlock - otherwise spinlocks would be completely broken on SMP. I
>> guess it should be sufficient for the GPU's firmware MCU to see.
> 
> For the record, this is currently mapped uncached on both the CPU and
> GPU side, because we don't have a way to describe the
> shareability properly with the current IOMMU flags.
> 
> So, my understanding was that the smp_wb() (DMB(ISH) on arm64) at
> the end of a spin_unlock(), would ensure proper store/load instruction
> ordering around this barrier, but that it would only wait for the
> content to reach the inner shareable domain before returning, not any
> further. But maybe I got that wrong from the start, and DMB(ISH)
> doesn't even start the transaction if the access is targeting uncached
> memory. In which case, AI is right, a full wmb() is needed, otherwise
> there's a chance we'll wait indefinitely because the update didn't make
> it to the FW interface in the first place.
> 
> Also, if that's broken for ack_irq_mask, it's also broken in other
> places...

So I've looked at the firmware implementation and I can say that there's
a race condition on the firmware side if we change the mask after
sending the START/RESUME request. The firmware currently samples the
mask before triggering the update to CSG_ACK and then uses the sampled
value to decide whether to trigger an interrupt.

This is all fine if the mask is set before the request (so nothing
broken with the current code), but we'd need a firmware change before we
can safely do what this patch was proposing. And of course we'd have to
get our heads round the barriers needed! ;)
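
To make the ordering problem concrete, here is a minimal sketch of the
race as described above. It is a model only: the struct layout, the
function names and the single request bit are all hypothetical, and the
real firmware logic may differ in detail. The FW is modelled as two
steps (sample the mask, then update the ack and decide on the IRQ), and
the host enables the mask bit either before the request or in between
those two steps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the shared interface, not the real FW layout. */
struct fw_iface_model {
	uint32_t req;          /* written by the host */
	uint32_t ack;          /* written by the FW */
	uint32_t ack_irq_mask; /* written by the host, read by the FW */
};

/* FW step 1: sample the mask before touching ack. */
static uint32_t fw_sample_mask(const struct fw_iface_model *i)
{
	return i->ack_irq_mask;
}

/* FW step 2: acknowledge the request, then decide on the IRQ from the earlier sample. */
static bool fw_ack_and_maybe_irq(struct fw_iface_model *i, uint32_t sampled,
				 uint32_t bit)
{
	i->ack = (i->ack & ~bit) | (i->req & bit);
	return (sampled & bit) != 0;
}

/* Returns true if the modelled FW raises an interrupt for the request. */
static bool request_raises_irq(bool mask_set_before_request)
{
	struct fw_iface_model i = { 0 };
	const uint32_t bit = 1u << 0;
	uint32_t sampled;

	if (mask_set_before_request)
		i.ack_irq_mask |= bit;

	i.req ^= bit;                 /* host sends the request */
	sampled = fw_sample_mask(&i); /* FW samples the mask */

	if (!mask_set_before_request)
		i.ack_irq_mask |= bit; /* host enables the IRQ too late */

	return fw_ack_and_maybe_irq(&i, sampled, bit);
}
```

In this model request_raises_irq(true) reports an interrupt, while
request_raises_irq(false) does not, so a host sleeping on the IRQ would
only wake up on the timeout. No barrier on the host side changes that
outcome, since the mask update happens after the FW has already sampled.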

Thanks,
Steve



* Re: [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-13 15:02           ` Steven Price
@ 2026-05-13 15:42             ` Boris Brezillon
  0 siblings, 0 replies; 39+ messages in thread
From: Boris Brezillon @ 2026-05-13 15:42 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, 13 May 2026 16:02:11 +0100
Steven Price <steven.price@arm.com> wrote:

> >>>> It seems to work, although I'm slightly uneasy about this because I'm not
> >>>> entirely sure whether the FW will immediately see the updates to
> >>>> ack_irq_mask and therefore whether there's a possibility to miss an
> >>>> event and be stuck waiting for the timeout.
> >>>>
> >>>> Memory models are not my strong point, OpenAI tells me the sequence
> >>>> should be something like:
> >>>>
> >>>>   scoped_guard(spinlock_irqsave, lock) {
> >>>>   	u32 ack_irq_mask = READ_ONCE(*ack_irq_mask_ptr);
> >>>>
> >>>>   	WRITE_ONCE(*ack_irq_mask_ptr, ack_irq_mask | req_mask);
> >>>>   }    
> >>>
> >>> Is this really needed? In which situation would the compiler/CPU decide
> >>> to re-order this read-modify-write sequence?
> >>
> >> I think that's the AI being a bit overzealous, but in general WRITE_ONCE
> >> is necessary to avoid some surprising effects. In theory the compiler
> >> can decide to perform multiple writes if it's non-volatile. I.e. a
> >> sequence like:
> >>
> >> 	u32 old_mask = *ack_irq_mask_ptr;
> >> 	if (condition)
> >> 		*ack_irq_mask_ptr = 0;
> >> 	else
> >> 		*ack_irq_mask_ptr |= req_mask;
> >>
> >> Can be 'optimised' to:
> >>
> >> 	u32 old_mask = *ack_irq_mask_ptr;
> >> 	*ack_irq_mask_ptr = 0;
> >> 	if (!condition)
> >> 		*ack_irq_mask_ptr = old_mask | req_mask;
> >>
> >> In which the compiler has changed the (!condition) path to do two writes
> >> one of which "should never be seen".
> >>
> >> Given that the compiler shouldn't be able to move any of the effects
> >> outside of the scoped_guard(), and since there's only one operation then
> >> I can't see how a compiler would screw it up - but the compiler is
> >> technically free to do so.  
> > 
> > Sure, I'm not saying read_modify_write is atomic per-se (even though
> > I'd be surprised if the compiler wasn't generating instructions that
> > are atomic in the end), but it is thread-safe because of the spinlock
> > covering the read_modify_write op.  
> 
> But one of the "threads" is the MCU which isn't using the spinlock -
> which is why it's a problem if the compiler left the value in a 'random'
> state even if it's all fixed up by the time the spinlock is released.

Okay, I see what you mean. I truly hope it's not random values, but if
it goes

	X -> 0 -> X | Y

or

	X -> 0 -> X & ~Y

that's already problematic, because we'd lose events.
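
A small sketch of why the intermediate 0 matters, under the assumption
that the FW only raises an interrupt for an event whose bit is set in
the mask value it happens to observe. Both helper names are
hypothetical and only illustrate the X -> 0 -> X | Y sequence:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical FW-side check: an event only raises an IRQ if its bit is
 * set in the mask value the FW observes.
 */
static bool fw_raises_irq(uint32_t observed_mask, uint32_t event_bit)
{
	return (observed_mask & event_bit) != 0;
}

/*
 * Fill out[] with the values visible in memory if "mask |= y" were split
 * by the compiler into two stores: X -> 0 -> X | Y.
 */
static void split_update_states(uint32_t x, uint32_t y, uint32_t out[3])
{
	out[0] = x;
	out[1] = 0;
	out[2] = x | y;
}
```

An event that was already enabled in X is reported if the FW reads the
first or last state, but silently dropped if it reads the middle one,
even though the host never intended to disable it.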

> 
> Like you say I would be very surprised if a compiler messed it up in
> this case.

I'll add the READ/WRITE_ONCE() and add a comment to make sure we don't
forget why they are needed (in theory).
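
Something along these lines, as a userspace sketch only. The two macros
are simplified 32-bit stand-ins for the kernel's READ_ONCE() and
WRITE_ONCE(), and ack_irq_mask_enable() is a hypothetical helper name,
not a function from the driver. The caller is assumed to hold the lock
that serializes host-side updates:

```c
#include <stdint.h>

/* Simplified userspace stand-ins for READ_ONCE()/WRITE_ONCE(), 32-bit only. */
#define READ_ONCE_U32(p)     (*(const volatile uint32_t *)(p))
#define WRITE_ONCE_U32(p, v) (*(volatile uint32_t *)(p) = (v))

/*
 * Hypothetical helper: OR req_mask into the mask word shared with the FW.
 *
 * The lock held by the caller serializes host-side updates, but the FW
 * reads this word without taking it. The volatile accesses force exactly
 * one load and one store, so the FW can never observe an intermediate
 * value such as 0.
 *
 * Returns the previous mask value.
 */
static inline uint32_t ack_irq_mask_enable(uint32_t *ack_irq_mask_ptr,
					   uint32_t req_mask)
{
	uint32_t old = READ_ONCE_U32(ack_irq_mask_ptr);

	WRITE_ONCE_U32(ack_irq_mask_ptr, old | req_mask);
	return old;
}
```

This only addresses the compiler side. It says nothing about when the
store becomes visible to the FW, which is the separate barrier question
discussed earlier in the thread.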


end of thread, other threads:[~2026-05-13 15:42 UTC | newest]

Thread overview: 39+ messages
2026-04-29  9:38 [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
2026-04-29  9:38 ` [PATCH 01/10] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
2026-04-29 12:29   ` Liviu Dudau
2026-05-01 13:17   ` Steven Price
2026-04-29  9:38 ` [PATCH 02/10] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
2026-04-29 12:31   ` Liviu Dudau
2026-05-01 13:17   ` Steven Price
2026-04-29  9:38 ` [PATCH 03/10] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
2026-04-30  9:40   ` Karunika Choo
2026-04-30 10:38     ` Boris Brezillon
2026-05-01 13:22   ` Steven Price
2026-04-29  9:38 ` [PATCH 04/10] drm/panthor: Extend the IRQ logic to allow fast/raw IRQ handlers Boris Brezillon
2026-04-29 13:32   ` Liviu Dudau
2026-05-01 13:28   ` Steven Price
2026-04-29  9:38 ` [PATCH 05/10] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
2026-04-29 13:33   ` Liviu Dudau
2026-05-01 13:39   ` Steven Price
2026-04-29  9:38 ` [PATCH 06/10] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
2026-05-01 13:47   ` Steven Price
2026-05-04  9:34     ` Boris Brezillon
2026-04-29  9:38 ` [PATCH 07/10] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
2026-05-01 13:53   ` Steven Price
2026-05-04 15:00     ` Boris Brezillon
2026-04-29  9:38 ` [PATCH 08/10] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
2026-05-01 14:20   ` Steven Price
2026-05-04 11:02     ` Boris Brezillon
2026-05-06 14:35       ` Steven Price
2026-05-06 16:08         ` Boris Brezillon
2026-05-13 15:02           ` Steven Price
2026-05-13 15:42             ` Boris Brezillon
2026-04-29  9:38 ` [PATCH 09/10] drm/panthor: Process FW events in IRQ context Boris Brezillon
2026-05-01 14:38   ` Steven Price
2026-04-29  9:38 ` [PATCH 10/10] drm/panthor: Introduce interrupt coalescing support for job IRQs Boris Brezillon
2026-05-01 14:57   ` Steven Price
2026-05-04 11:15     ` Boris Brezillon
2026-04-29  9:59 ` [PATCH 00/10] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
2026-04-29 10:36 ` Boris Brezillon
2026-05-05  8:54   ` Boris Brezillon
2026-05-05 16:12     ` Liviu Dudau
