[PATCH v2 00/11] drm/panthor: Reduce dma

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency
@ 2026-05-12 11:37 Boris Brezillon
  2026-05-12 11:37 ` [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
                   ` (10 more replies)
  0 siblings, 11 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Right now, panthor is one of the rare drivers to signal fences
from work items (not even from the threaded IRQ handler). We
could move that to the threaded handler, but that would still
leave the latency caused by the scheduling of the IRQ thread.

Instead, this patchset moves all the JOB/GPU IRQ processing to
the raw IRQ handler, which is fine because what the current
code does is demux the interrupts and defer actual handling
to sub work items. The only non-trivial thing we keep in the
IRQ path is the dma_fence signalling, which should be acceptable
in term of CPU cycles burnt in IRQ context.

Note that the MMU event handling is left in a threaded handler
because it requires acquiring sleepable locks and fixing that
is non-trivial.

Still very basic testing done, but glmark2 and gfxbench's
manhattan test show a ~5% perf improvement on a rk3588 with this
patchset applied.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Changes in v2:
- Fix commit message in patch 4
- Move devm_kasprintf() before panthor_irq_resume() in patch 3
- Fix erroneous lockdep_assert_held() in patch 6
- Make sure events_lock is held when calling
  csg_slot_sync_update_locked() in patch 6
- Restore a csg_slot_sync_update_locked() call in patch 7
- Fix a potential deadlock in patch 9
- Drop the IRQ coalescing patch (formerly patch 10)
- Change panthor_irq_request() so we don't have to define a dummy
  threaded handler, and we can let RT kernels move the hard handler
  to a thread
- Add patches to transition GPU event processing to the hard IRQ handler
- Link to v1: https://lore.kernel.org/r/20260429-panthor-signal-from-irq-v1-0-4b92ae4142d2@collabora.com

---
Boris Brezillon (11):
      drm/panthor: Make panthor_irq::state a non-atomic field
      drm/panthor: Move the register accessors before the IRQ helpers
      drm/panthor: Replace the panthor_irq macro machinery by inline helpers
      drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
      drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
      drm/panthor: Prepare the scheduler logic for FW events in IRQ context
      drm/panthor: Automate CSG IRQ processing at group unbind time
      drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
      drm/panthor: Process FW events in IRQ context
      drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler()
      drm/panthor: Process GPU events in IRQ context

 drivers/gpu/drm/panthor/panthor_device.h | 281 +++++++++---------
 drivers/gpu/drm/panthor/panthor_fw.c     |  76 +++--
 drivers/gpu/drm/panthor/panthor_fw.h     |   9 +-
 drivers/gpu/drm/panthor/panthor_gpu.c    |  31 +-
 drivers/gpu/drm/panthor/panthor_mmu.c    |  38 +--
 drivers/gpu/drm/panthor/panthor_pwr.c    |  21 +-
 drivers/gpu/drm/panthor/panthor_sched.c  | 483 ++++++++++++++-----------------
 7 files changed, 476 insertions(+), 463 deletions(-)
---
base-commit: ac5ac0acf11df04295eb1811066097b7022d6c7f
change-id: 20260429-panthor-signal-from-irq-d33684f4d292

Best regards,
-- 
Boris Brezillon <boris.brezillon@collabora.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 18:40   ` Chia-I Wu
  2026-05-12 11:37 ` [PATCH v2 02/11] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

The only place where panthor_irq::state is accessed without
panthor_irq::mask_lock held is in the prologue of _irq_suspend(),
which is not really a fast-path. So let's simplify things by assuming
panthor_irq::state must always be accessed with the mask_lock held,
and add a scoped_guard() in _irq_suspend().

Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 35 ++++++++++++++++----------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 4e4607bca7cc..3f91ba73829d 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -101,8 +101,12 @@ struct panthor_irq {
 	 */
 	spinlock_t mask_lock;
 
-	/** @state: one of &enum panthor_irq_state reflecting the current state. */
-	atomic_t state;
+	/**
+	 * @state: one of &enum panthor_irq_state reflecting the current state.
+	 *
+	 * Must be accessed with mask_lock held.
+	 */
+	enum panthor_irq_state state;
 };
 
 /**
@@ -510,18 +514,15 @@ const char *panthor_exception_name(struct panthor_device *ptdev,
 static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
 {												\
 	struct panthor_irq *pirq = data;							\
-	enum panthor_irq_state old_state;							\
 												\
 	if (!gpu_read(pirq->iomem, INT_STAT))							\
 		return IRQ_NONE;								\
 												\
 	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	old_state = atomic_cmpxchg(&pirq->state,						\
-				   PANTHOR_IRQ_STATE_ACTIVE,					\
-				   PANTHOR_IRQ_STATE_PROCESSING);				\
-	if (old_state != PANTHOR_IRQ_STATE_ACTIVE)						\
+	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
 		return IRQ_NONE;								\
 												\
+	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
 	gpu_write(pirq->iomem, INT_MASK, 0);							\
 	return IRQ_WAKE_THREAD;									\
 }												\
@@ -551,13 +552,10 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
 	}											\
 												\
 	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		enum panthor_irq_state old_state;						\
-												\
-		old_state = atomic_cmpxchg(&pirq->state,					\
-					   PANTHOR_IRQ_STATE_PROCESSING,			\
-					   PANTHOR_IRQ_STATE_ACTIVE);				\
-		if (old_state == PANTHOR_IRQ_STATE_PROCESSING)					\
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
 			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
+		}										\
 	}											\
 												\
 	return ret;										\
@@ -566,18 +564,19 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
 static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
 {												\
 	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDING);				\
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
 		gpu_write(pirq->iomem, INT_MASK, 0);						\
 	}											\
 	synchronize_irq(pirq->irq);								\
-	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDED);					\
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
 }												\
 												\
 static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
 {												\
 	guard(spinlock_irqsave)(&pirq->mask_lock);						\
 												\
-	atomic_set(&pirq->state, PANTHOR_IRQ_STATE_ACTIVE);					\
+	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
 	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
 	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
 }												\
@@ -610,7 +609,7 @@ static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *
 	 * on the PROCESSING -> ACTIVE transition.						\
 	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
 	 */											\
-	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
 		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
 }												\
 												\
@@ -624,7 +623,7 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
 	 * on the PROCESSING -> ACTIVE transition.						\
 	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
 	 */											\
-	if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)				\
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
 		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
 }
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 02/11] drm/panthor: Move the register accessors before the IRQ helpers
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
  2026-05-12 11:37 ` [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 18:41   ` Chia-I Wu
  2026-05-12 11:37 ` [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

We're about to add an IRQ inline helper using gpu_read(). Move things
around to avoid forward declarations.

No functional changes.

Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 142 +++++++++++++++----------------
 1 file changed, 71 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 3f91ba73829d..768fc1992368 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -495,6 +495,77 @@ panthor_exception_is_fault(u32 exception_code)
 const char *panthor_exception_name(struct panthor_device *ptdev,
 				   u32 exception_code);
 
+static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
+{
+	writel(data, iomem + reg);
+}
+
+static inline u32 gpu_read(void __iomem *iomem, u32 reg)
+{
+	return readl(iomem + reg);
+}
+
+static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
+{
+	return readl_relaxed(iomem + reg);
+}
+
+static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
+{
+	gpu_write(iomem, reg, lower_32_bits(data));
+	gpu_write(iomem, reg + 4, upper_32_bits(data));
+}
+
+static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
+{
+	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
+}
+
+static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
+{
+	return (gpu_read_relaxed(iomem, reg) |
+		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
+}
+
+static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
+{
+	u32 lo, hi1, hi2;
+	do {
+		hi1 = gpu_read(iomem, reg + 4);
+		lo = gpu_read(iomem, reg);
+		hi2 = gpu_read(iomem, reg + 4);
+	} while (hi1 != hi2);
+	return lo | ((u64)hi2 << 32);
+}
+
+#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
+	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
+			  iomem, reg)
+
+#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
+				     timeout_us)				\
+	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
+				 false, iomem, reg)
+
+#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
+	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
+			  iomem, reg)
+
+#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
+				       timeout_us)				\
+	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
+				 false, iomem, reg)
+
+#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
+					     timeout_us)			\
+	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
+				 timeout_us, false, iomem, reg)
+
+#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
+					timeout_us)				\
+	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
+			  false, iomem, reg)
+
 #define INT_RAWSTAT 0x0
 #define INT_CLEAR   0x4
 #define INT_MASK    0x8
@@ -629,75 +700,4 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
 
 extern struct workqueue_struct *panthor_cleanup_wq;
 
-static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
-{
-	writel(data, iomem + reg);
-}
-
-static inline u32 gpu_read(void __iomem *iomem, u32 reg)
-{
-	return readl(iomem + reg);
-}
-
-static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
-{
-	return readl_relaxed(iomem + reg);
-}
-
-static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
-{
-	gpu_write(iomem, reg, lower_32_bits(data));
-	gpu_write(iomem, reg + 4, upper_32_bits(data));
-}
-
-static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
-{
-	return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
-}
-
-static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
-{
-	return (gpu_read_relaxed(iomem, reg) |
-		((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
-}
-
-static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
-{
-	u32 lo, hi1, hi2;
-	do {
-		hi1 = gpu_read(iomem, reg + 4);
-		lo = gpu_read(iomem, reg);
-		hi2 = gpu_read(iomem, reg + 4);
-	} while (hi1 != hi2);
-	return lo | ((u64)hi2 << 32);
-}
-
-#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
-	read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,	\
-			  iomem, reg)
-
-#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
-				     timeout_us)				\
-	read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,	\
-				 false, iomem, reg)
-
-#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)	\
-	read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,	\
-			  iomem, reg)
-
-#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,		\
-				       timeout_us)				\
-	read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,	\
-				 false, iomem, reg)
-
-#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,	\
-					     timeout_us)			\
-	read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,		\
-				 timeout_us, false, iomem, reg)
-
-#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,	\
-					timeout_us)				\
-	read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,	\
-			  false, iomem, reg)
-
 #endif

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
  2026-05-12 11:37 ` [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
  2026-05-12 11:37 ` [PATCH v2 02/11] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 18:58   ` Chia-I Wu
  2026-05-12 11:37 ` [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers Boris Brezillon
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Now that panthor_irq contains the iomem region, there's no real need
for the macro-based panthor_irq helper generation logic. We can just
provide inline helpers that do the same and let the compiler optimize
indirect function calls. The only extra annoyance is the fact we have
to open-code the panthor_xxx_irq_threaded_handler() implementation, but
those are single-line functions, so it's acceptable.

While at it, we changed the prototype of the IRQ handlers to take
a panthor_irq instead of panthor_device, since that's the thing
that's passed around when it comes to panthor_irq, and the
panthor_device can be directly extracted from there.

Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 245 +++++++++++++++----------------
 drivers/gpu/drm/panthor/panthor_fw.c     |  22 ++-
 drivers/gpu/drm/panthor/panthor_gpu.c    |  26 ++--
 drivers/gpu/drm/panthor/panthor_mmu.c    |  37 ++---
 drivers/gpu/drm/panthor/panthor_pwr.c    |  20 ++-
 5 files changed, 183 insertions(+), 167 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 768fc1992368..393fcda73d88 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -571,131 +571,126 @@ static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
 #define INT_MASK    0x8
 #define INT_STAT    0xc
 
-/**
- * PANTHOR_IRQ_HANDLER() - Define interrupt handlers and the interrupt
- * registration function.
- *
- * The boiler-plate to gracefully deal with shared interrupts is
- * auto-generated. All you have to do is call PANTHOR_IRQ_HANDLER()
- * just after the actual handler. The handler prototype is:
- *
- * void (*handler)(struct panthor_device *, u32 status);
- */
-#define PANTHOR_IRQ_HANDLER(__name, __handler)							\
-static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)			\
-{												\
-	struct panthor_irq *pirq = data;							\
-												\
-	if (!gpu_read(pirq->iomem, INT_STAT))							\
-		return IRQ_NONE;								\
-												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)						\
-		return IRQ_NONE;								\
-												\
-	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;						\
-	gpu_write(pirq->iomem, INT_MASK, 0);							\
-	return IRQ_WAKE_THREAD;									\
-}												\
-												\
-static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *data)		\
-{												\
-	struct panthor_irq *pirq = data;							\
-	struct panthor_device *ptdev = pirq->ptdev;						\
-	irqreturn_t ret = IRQ_NONE;								\
-												\
-	while (true) {										\
-		/* It's safe to access pirq->mask without the lock held here. If a new		\
-		 * event gets added to the mask and the corresponding IRQ is pending,		\
-		 * we'll process it right away instead of adding an extra raw -> threaded	\
-		 * round trip. If an event is removed and the status bit is set, it will	\
-		 * be ignored, just like it would have been if the mask had been adjusted	\
-		 * right before the HW event kicks in. TLDR; it's all expected races we're	\
-		 * covered for.									\
-		 */										\
-		u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;			\
-												\
-		if (!status)									\
-			break;									\
-												\
-		__handler(ptdev, status);							\
-		ret = IRQ_HANDLED;								\
-	}											\
-												\
-	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {				\
-			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;					\
-			gpu_write(pirq->iomem, INT_MASK, pirq->mask);				\
-		}										\
-	}											\
-												\
-	return ret;										\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)			\
-{												\
-	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {					\
-		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;					\
-		gpu_write(pirq->iomem, INT_MASK, 0);						\
-	}											\
-	synchronize_irq(pirq->irq);								\
-	scoped_guard(spinlock_irqsave, &pirq->mask_lock)					\
-		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;					\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)			\
-{												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-												\
-	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;							\
-	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);						\
-	gpu_write(pirq->iomem, INT_MASK, pirq->mask);						\
-}												\
-												\
-static int panthor_request_ ## __name ## _irq(struct panthor_device *ptdev,			\
-					      struct panthor_irq *pirq,				\
-					      int irq, u32 mask, void __iomem *iomem)		\
-{												\
-	pirq->ptdev = ptdev;									\
-	pirq->irq = irq;									\
-	pirq->mask = mask;									\
-	pirq->iomem = iomem;									\
-	spin_lock_init(&pirq->mask_lock);							\
-	panthor_ ## __name ## _irq_resume(pirq);						\
-												\
-	return devm_request_threaded_irq(ptdev->base.dev, irq,					\
-					 panthor_ ## __name ## _irq_raw_handler,		\
-					 panthor_ ## __name ## _irq_threaded_handler,		\
-					 IRQF_SHARED, KBUILD_MODNAME "-" # __name,		\
-					 pirq);							\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *pirq, u32 mask)	\
-{												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	pirq->mask |= mask;									\
-												\
-	/* The only situation where we need to write the new mask is if the IRQ is active.	\
-	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()	\
-	 * on the PROCESSING -> ACTIVE transition.						\
-	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
-	 */											\
-	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
-		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
-}												\
-												\
-static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq *pirq, u32 mask)\
-{												\
-	guard(spinlock_irqsave)(&pirq->mask_lock);						\
-	pirq->mask &= ~mask;									\
-												\
-	/* The only situation where we need to write the new mask is if the IRQ is active.	\
-	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()	\
-	 * on the PROCESSING -> ACTIVE transition.						\
-	 * If the IRQ is suspended/suspending, the mask is restored at resume time.		\
-	 */											\
-	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)						\
-		gpu_write(pirq->iomem, INT_MASK, pirq->mask);					\
+static inline irqreturn_t panthor_irq_default_raw_handler(int irq, void *data)
+{
+	struct panthor_irq *pirq = data;
+
+	if (!gpu_read(pirq->iomem, INT_STAT))
+		return IRQ_NONE;
+
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
+		return IRQ_NONE;
+
+	pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
+	gpu_write(pirq->iomem, INT_MASK, 0);
+	return IRQ_WAKE_THREAD;
+}
+
+static inline irqreturn_t
+panthor_irq_default_threaded_handler(void *data,
+				     void (*slow_handler)(struct panthor_irq *, u32))
+{
+	struct panthor_irq *pirq = data;
+	irqreturn_t ret = IRQ_NONE;
+
+	while (true) {
+		/* It's safe to access pirq->mask without the lock held here. If a new
+		 * event gets added to the mask and the corresponding IRQ is pending,
+		 * we'll process it right away instead of adding an extra raw -> threaded
+		 * round trip. If an event is removed and the status bit is set, it will
+		 * be ignored, just like it would have been if the mask had been adjusted
+		 * right before the HW event kicks in. TLDR; it's all expected races we're
+		 * covered for.
+		 */
+		u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;
+
+		if (!status)
+			break;
+
+		slow_handler(pirq, status);
+		ret = IRQ_HANDLED;
+	}
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+			gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+		}
+	}
+
+	return ret;
+}
+
+static inline void panthor_irq_suspend(struct panthor_irq *pirq)
+{
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;
+		gpu_write(pirq->iomem, INT_MASK, 0);
+	}
+	synchronize_irq(pirq->irq);
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock)
+		pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;
+}
+
+static inline void panthor_irq_resume(struct panthor_irq *pirq)
+{
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+	gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);
+	gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+}
+
+static inline void panthor_irq_enable_events(struct panthor_irq *pirq, u32 mask)
+{
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	pirq->mask |= mask;
+
+	/* The only situation where we need to write the new mask is if the IRQ is active.
+	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
+	 * on the PROCESSING -> ACTIVE transition.
+	 * If the IRQ is suspended/suspending, the mask is restored at resume time.
+	 */
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
+		gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+}
+
+static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask)
+{
+	guard(spinlock_irqsave)(&pirq->mask_lock);
+	pirq->mask &= ~mask;
+
+	/* The only situation where we need to write the new mask is if the IRQ is active.
+	 * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
+	 * on the PROCESSING -> ACTIVE transition.
+	 * If the IRQ is suspended/suspending, the mask is restored at resume time.
+	 */
+	if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
+		gpu_write(pirq->iomem, INT_MASK, pirq->mask);
+}
+
+static inline int
+panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
+		    int irq, u32 mask, void __iomem *iomem, const char *name,
+		    irqreturn_t (*threaded_handler)(int, void *data))
+{
+	const char *full_name;
+
+	pirq->ptdev = ptdev;
+	pirq->irq = irq;
+	pirq->mask = mask;
+	pirq->iomem = iomem;
+	spin_lock_init(&pirq->mask_lock);
+
+	full_name = devm_kasprintf(ptdev->base.dev, GFP_KERNEL, KBUILD_MODNAME "-%s", name);
+	if (!full_name)
+		return -ENOMEM;
+
+	panthor_irq_resume(pirq);
+	return devm_request_threaded_irq(ptdev->base.dev, irq,
+					 panthor_irq_default_raw_handler,
+					 threaded_handler,
+					 IRQF_SHARED, full_name, pirq);
 }
 
 extern struct workqueue_struct *panthor_cleanup_wq;
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 986151681b24..eaf599b0a887 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1064,8 +1064,9 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
 			 msecs_to_jiffies(PING_INTERVAL_MS));
 }
 
-static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	u32 duration;
 	u64 start = 0;
 
@@ -1091,7 +1092,11 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
 		trace_gpu_job_irq(ptdev->base.dev, status, duration);
 	}
 }
-PANTHOR_IRQ_HANDLER(job, panthor_job_irq_handler);
+
+static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
+}
 
 static int panthor_fw_start(struct panthor_device *ptdev)
 {
@@ -1099,8 +1104,8 @@ static int panthor_fw_start(struct panthor_device *ptdev)
 	bool timedout = false;
 
 	ptdev->fw->booted = false;
-	panthor_job_irq_enable_events(&ptdev->fw->irq, ~0);
-	panthor_job_irq_resume(&ptdev->fw->irq);
+	panthor_irq_enable_events(&ptdev->fw->irq, ~0);
+	panthor_irq_resume(&ptdev->fw->irq);
 	gpu_write(fw->iomem, MCU_CONTROL, MCU_CONTROL_AUTO);
 
 	if (!wait_event_timeout(ptdev->fw->req_waitqueue,
@@ -1210,7 +1215,7 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang)
 			ptdev->reset.fast = true;
 	}
 
-	panthor_job_irq_suspend(&ptdev->fw->irq);
+	panthor_irq_suspend(&ptdev->fw->irq);
 	panthor_fw_stop(ptdev);
 }
 
@@ -1280,7 +1285,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) {
 		/* Make sure the IRQ handler cannot be called after that point. */
 		if (ptdev->fw->irq.irq)
-			panthor_job_irq_suspend(&ptdev->fw->irq);
+			panthor_irq_suspend(&ptdev->fw->irq);
 
 		panthor_fw_stop(ptdev);
 	}
@@ -1476,8 +1481,9 @@ int panthor_fw_init(struct panthor_device *ptdev)
 	if (irq <= 0)
 		return -ENODEV;
 
-	ret = panthor_request_job_irq(ptdev, &fw->irq, irq, 0,
-				      ptdev->iomem + JOB_INT_BASE);
+	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
+				  ptdev->iomem + JOB_INT_BASE, "job",
+				  panthor_job_irq_threaded_handler);
 	if (ret) {
 		drm_err(&ptdev->base, "failed to request job irq");
 		return ret;
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index e52c5675981f..ce208e384762 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -86,8 +86,9 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
 	gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
 }
 
-static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_gpu *gpu = ptdev->gpu;
 
 	gpu_write(gpu->irq.iomem, INT_CLEAR, status);
@@ -116,7 +117,11 @@ static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
 	}
 	spin_unlock(&ptdev->gpu->reqs_lock);
 }
-PANTHOR_IRQ_HANDLER(gpu, panthor_gpu_irq_handler);
+
+static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
+}
 
 /**
  * panthor_gpu_unplug() - Called when the GPU is unplugged.
@@ -128,7 +133,7 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
 
 	/* Make sure the IRQ handler is not running after that point. */
 	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
-		panthor_gpu_irq_suspend(&ptdev->gpu->irq);
+		panthor_irq_suspend(&ptdev->gpu->irq);
 
 	/* Wake-up all waiters. */
 	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
@@ -169,9 +174,10 @@ int panthor_gpu_init(struct panthor_device *ptdev)
 	if (irq < 0)
 		return irq;
 
-	ret = panthor_request_gpu_irq(ptdev, &ptdev->gpu->irq, irq,
-				      GPU_INTERRUPTS_MASK,
-				      ptdev->iomem + GPU_INT_BASE);
+	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
+				  GPU_INTERRUPTS_MASK,
+				  ptdev->iomem + GPU_INT_BASE, "gpu",
+				  panthor_gpu_irq_threaded_handler);
 	if (ret)
 		return ret;
 
@@ -182,7 +188,7 @@ int panthor_gpu_power_changed_on(struct panthor_device *ptdev)
 {
 	guard(pm_runtime_active)(ptdev->base.dev);
 
-	panthor_gpu_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
+	panthor_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
 
 	return 0;
 }
@@ -191,7 +197,7 @@ void panthor_gpu_power_changed_off(struct panthor_device *ptdev)
 {
 	guard(pm_runtime_active)(ptdev->base.dev);
 
-	panthor_gpu_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
+	panthor_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
 }
 
 /**
@@ -424,7 +430,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
 	else
 		panthor_hw_l2_power_off(ptdev);
 
-	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
+	panthor_irq_suspend(&ptdev->gpu->irq);
 }
 
 /**
@@ -436,7 +442,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
  */
 void panthor_gpu_resume(struct panthor_device *ptdev)
 {
-	panthor_gpu_irq_resume(&ptdev->gpu->irq);
+	panthor_irq_resume(&ptdev->gpu->irq);
 	panthor_hw_l2_power_on(ptdev);
 }
 
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index 452d0b6d4668..375022fb3fd8 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -586,17 +586,13 @@ static u32 panthor_mmu_as_fault_mask(struct panthor_device *ptdev, u32 as)
 	return BIT(as);
 }
 
-/* Forward declaration to call helpers within as_enable/disable */
-static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status);
-PANTHOR_IRQ_HANDLER(mmu, panthor_mmu_irq_handler);
-
 static int panthor_mmu_as_enable(struct panthor_device *ptdev, u32 as_nr,
 				 u64 transtab, u64 transcfg, u64 memattr)
 {
 	struct panthor_mmu *mmu = ptdev->mmu;
 
-	panthor_mmu_irq_enable_events(&ptdev->mmu->irq,
-				      panthor_mmu_as_fault_mask(ptdev, as_nr));
+	panthor_irq_enable_events(&ptdev->mmu->irq,
+				  panthor_mmu_as_fault_mask(ptdev, as_nr));
 
 	gpu_write64(mmu->iomem, AS_TRANSTAB(as_nr), transtab);
 	gpu_write64(mmu->iomem, AS_MEMATTR(as_nr), memattr);
@@ -614,8 +610,8 @@ static int panthor_mmu_as_disable(struct panthor_device *ptdev, u32 as_nr,
 
 	lockdep_assert_held(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_disable_events(&ptdev->mmu->irq,
-				       panthor_mmu_as_fault_mask(ptdev, as_nr));
+	panthor_irq_disable_events(&ptdev->mmu->irq,
+				   panthor_mmu_as_fault_mask(ptdev, as_nr));
 
 	/* Flush+invalidate RW caches, invalidate RO ones. */
 	ret = panthor_gpu_flush_caches(ptdev, CACHE_CLEAN | CACHE_INV,
@@ -1785,8 +1781,9 @@ static void panthor_vm_unlock_region(struct panthor_vm *vm)
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 }
 
-static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_mmu_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_mmu *mmu = ptdev->mmu;
 	bool has_unhandled_faults = false;
 
@@ -1849,6 +1846,11 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
 		panthor_sched_report_mmu_fault(ptdev);
 }
 
+static irqreturn_t panthor_mmu_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_mmu_irq_handler);
+}
+
 /**
  * panthor_mmu_suspend() - Suspend the MMU logic
  * @ptdev: Device.
@@ -1873,7 +1875,7 @@ void panthor_mmu_suspend(struct panthor_device *ptdev)
 	}
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+	panthor_irq_suspend(&ptdev->mmu->irq);
 }
 
 /**
@@ -1892,7 +1894,7 @@ void panthor_mmu_resume(struct panthor_device *ptdev)
 	ptdev->mmu->as.faulty_mask = 0;
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_resume(&ptdev->mmu->irq);
+	panthor_irq_resume(&ptdev->mmu->irq);
 }
 
 /**
@@ -1909,7 +1911,7 @@ void panthor_mmu_pre_reset(struct panthor_device *ptdev)
 {
 	struct panthor_vm *vm;
 
-	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+	panthor_irq_suspend(&ptdev->mmu->irq);
 
 	mutex_lock(&ptdev->mmu->vm.lock);
 	ptdev->mmu->vm.reset_in_progress = true;
@@ -1946,7 +1948,7 @@ void panthor_mmu_post_reset(struct panthor_device *ptdev)
 
 	mutex_unlock(&ptdev->mmu->as.slots_lock);
 
-	panthor_mmu_irq_resume(&ptdev->mmu->irq);
+	panthor_irq_resume(&ptdev->mmu->irq);
 
 	/* Restart the VM_BIND queues. */
 	mutex_lock(&ptdev->mmu->vm.lock);
@@ -3207,7 +3209,7 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
 void panthor_mmu_unplug(struct panthor_device *ptdev)
 {
 	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
-		panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+		panthor_irq_suspend(&ptdev->mmu->irq);
 
 	mutex_lock(&ptdev->mmu->as.slots_lock);
 	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
@@ -3261,9 +3263,10 @@ int panthor_mmu_init(struct panthor_device *ptdev)
 	if (irq <= 0)
 		return -ENODEV;
 
-	ret = panthor_request_mmu_irq(ptdev, &mmu->irq, irq,
-				      panthor_mmu_fault_mask(ptdev, ~0),
-				      ptdev->iomem + MMU_INT_BASE);
+	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
+				  panthor_mmu_fault_mask(ptdev, ~0),
+				  ptdev->iomem + MMU_INT_BASE, "mmu",
+				  panthor_mmu_irq_threaded_handler);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
index 7c7f424a1436..80cf78007896 100644
--- a/drivers/gpu/drm/panthor/panthor_pwr.c
+++ b/drivers/gpu/drm/panthor/panthor_pwr.c
@@ -56,8 +56,9 @@ struct panthor_pwr {
 	wait_queue_head_t reqs_acked;
 };
 
-static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
+static void panthor_pwr_irq_handler(struct panthor_irq *pirq, u32 status)
 {
+	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_pwr *pwr = ptdev->pwr;
 
 	spin_lock(&ptdev->pwr->reqs_lock);
@@ -75,7 +76,11 @@ static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
 	}
 	spin_unlock(&ptdev->pwr->reqs_lock);
 }
-PANTHOR_IRQ_HANDLER(pwr, panthor_pwr_irq_handler);
+
+static irqreturn_t panthor_pwr_irq_threaded_handler(int irq, void *data)
+{
+	return panthor_irq_default_threaded_handler(data, panthor_pwr_irq_handler);
+}
 
 static void panthor_pwr_write_command(struct panthor_device *ptdev, u32 command, u64 args)
 {
@@ -453,7 +458,7 @@ void panthor_pwr_unplug(struct panthor_device *ptdev)
 		return;
 
 	/* Make sure the IRQ handler is not running after that point. */
-	panthor_pwr_irq_suspend(&ptdev->pwr->irq);
+	panthor_irq_suspend(&ptdev->pwr->irq);
 
 	/* Wake-up all waiters. */
 	spin_lock_irqsave(&ptdev->pwr->reqs_lock, flags);
@@ -483,9 +488,10 @@ int panthor_pwr_init(struct panthor_device *ptdev)
 	if (irq < 0)
 		return irq;
 
-	err = panthor_request_pwr_irq(
+	err = panthor_irq_request(
 		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
-		pwr->iomem + PWR_INT_BASE);
+		pwr->iomem + PWR_INT_BASE, "pwr",
+		panthor_pwr_irq_threaded_handler);
 	if (err)
 		return err;
 
@@ -564,7 +570,7 @@ void panthor_pwr_suspend(struct panthor_device *ptdev)
 	if (!ptdev->pwr)
 		return;
 
-	panthor_pwr_irq_suspend(&ptdev->pwr->irq);
+	panthor_irq_suspend(&ptdev->pwr->irq);
 }
 
 void panthor_pwr_resume(struct panthor_device *ptdev)
@@ -572,5 +578,5 @@ void panthor_pwr_resume(struct panthor_device *ptdev)
 	if (!ptdev->pwr)
 		return;
 
-	panthor_pwr_irq_resume(&ptdev->pwr->irq);
+	panthor_irq_resume(&ptdev->pwr->irq);
 }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (2 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 19:11   ` Chia-I Wu
  2026-05-12 11:37 ` [PATCH v2 05/11] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

All drivers except panthor signal their fences from their interrupt
handler to minimize latency. We could do the same from the threaded
handler, but the latency is still quite high in that case, so let's
allow components to choose the context they want their IRQ handler
to run in by exposing support for custom hard handlers.

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
 drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
 drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
 drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
 drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
 5 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 393fcda73d88..1aaf06df875b 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
 static inline int
 panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
 		    int irq, u32 mask, void __iomem *iomem, const char *name,
+		    irqreturn_t (*raw_handler)(int, void *data),
 		    irqreturn_t (*threaded_handler)(int, void *data))
 {
 	const char *full_name;
@@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
 		return -ENOMEM;
 
 	panthor_irq_resume(pirq);
-	return devm_request_threaded_irq(ptdev->base.dev, irq,
-					 panthor_irq_default_raw_handler,
-					 threaded_handler,
+
+	if (!threaded_handler) {
+		return devm_request_irq(ptdev->base.dev, irq, raw_handler,
+					IRQF_SHARED, full_name, pirq);
+	}
+
+	return devm_request_threaded_irq(ptdev->base.dev, irq, raw_handler, threaded_handler,
 					 IRQF_SHARED, full_name, pirq);
 }
 
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index eaf599b0a887..8239a6951569 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1483,6 +1483,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
 
 	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
 				  ptdev->iomem + JOB_INT_BASE, "job",
+				  panthor_irq_default_raw_handler,
 				  panthor_job_irq_threaded_handler);
 	if (ret) {
 		drm_err(&ptdev->base, "failed to request job irq");
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index ce208e384762..d0be758ea3e1 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -177,6 +177,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
 	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
 				  GPU_INTERRUPTS_MASK,
 				  ptdev->iomem + GPU_INT_BASE, "gpu",
+				  panthor_irq_default_raw_handler,
 				  panthor_gpu_irq_threaded_handler);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index 375022fb3fd8..2955b8baa2e2 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -3266,6 +3266,7 @@ int panthor_mmu_init(struct panthor_device *ptdev)
 	ret = panthor_irq_request(ptdev, &mmu->irq, irq,
 				  panthor_mmu_fault_mask(ptdev, ~0),
 				  ptdev->iomem + MMU_INT_BASE, "mmu",
+				  panthor_irq_default_raw_handler,
 				  panthor_mmu_irq_threaded_handler);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
index 80cf78007896..1efb7f3482ba 100644
--- a/drivers/gpu/drm/panthor/panthor_pwr.c
+++ b/drivers/gpu/drm/panthor/panthor_pwr.c
@@ -491,6 +491,7 @@ int panthor_pwr_init(struct panthor_device *ptdev)
 	err = panthor_irq_request(
 		ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
 		pwr->iomem + PWR_INT_BASE, "pwr",
+		panthor_irq_default_raw_handler,
 		panthor_pwr_irq_threaded_handler);
 	if (err)
 		return err;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 05/11] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (3 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 19:29   ` Chia-I Wu
  2026-05-12 11:37 ` [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

If we want some FW events to be processed in the interrupt path, we need
the helpers manipulating req regs to be IRQ-safe, which implies using
spin_lock_irqsave instead of spinlock. While at it, use guards instead
of plain spin_lock/unlock calls.

Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_fw.h | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
index a99a9b6f4825..e56b7fe15bb3 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.h
+++ b/drivers/gpu/drm/panthor/panthor_fw.h
@@ -432,12 +432,11 @@ struct panthor_fw_global_iface {
 #define panthor_fw_toggle_reqs(__iface, __in_reg, __out_reg, __mask) \
 	do { \
 		u32 __cur_val, __new_val, __out_val; \
-		spin_lock(&(__iface)->lock); \
+		guard(spinlock_irqsave)(&(__iface)->lock); \
 		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
 		__out_val = READ_ONCE((__iface)->output->__out_reg); \
 		__new_val = ((__out_val ^ (__mask)) & (__mask)) | (__cur_val & ~(__mask)); \
 		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
-		spin_unlock(&(__iface)->lock); \
 	} while (0)
 
 /**
@@ -458,21 +457,19 @@ struct panthor_fw_global_iface {
 #define panthor_fw_update_reqs(__iface, __in_reg, __val, __mask) \
 	do { \
 		u32 __cur_val, __new_val; \
-		spin_lock(&(__iface)->lock); \
+		guard(spinlock_irqsave)(&(__iface)->lock); \
 		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
 		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
 		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
-		spin_unlock(&(__iface)->lock); \
 	} while (0)
 
 #define panthor_fw_update_reqs64(__iface, __in_reg, __val, __mask) \
 	do { \
 		u64 __cur_val, __new_val; \
-		spin_lock(&(__iface)->lock); \
+		guard(spinlock_irqsave)(&(__iface)->lock); \
 		__cur_val = READ_ONCE((__iface)->input->__in_reg); \
 		__new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
 		WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
-		spin_unlock(&(__iface)->lock); \
 	} while (0)
 
 struct panthor_fw_global_iface *

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (4 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 05/11] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 21:04   ` Chia-I Wu
  2026-05-12 11:37 ` [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Add a specific spinlock for events processing, and force processing
of events in the panthor_sched_report_fw_events() path rather than
deferring it to a work item. We also fast-track fence signalling by
making the job completion logic IRQ-safe.

Note that it requires changing a couple spin_lock() into
spin_lock_irqsave() when those are taken inside a events_lock section.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 332 +++++++++++++++-----------------
 1 file changed, 155 insertions(+), 177 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 5b34032deff8..fbf76b59b7ef 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -177,18 +177,6 @@ struct panthor_scheduler {
 	 */
 	struct work_struct sync_upd_work;
 
-	/**
-	 * @fw_events_work: Work used to process FW events outside the interrupt path.
-	 *
-	 * Even if the interrupt is threaded, we need any event processing
-	 * that require taking the panthor_scheduler::lock to be processed
-	 * outside the interrupt path so we don't block the tick logic when
-	 * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
-	 * event processing requires taking this lock, we just delegate all
-	 * FW event processing to the scheduler workqueue.
-	 */
-	struct work_struct fw_events_work;
-
 	/**
 	 * @fw_events: Bitmask encoding pending FW events.
 	 */
@@ -254,6 +242,15 @@ struct panthor_scheduler {
 		struct list_head waiting;
 	} groups;
 
+	/**
+	 * @events_lock: Lock taken when processing events.
+	 *
+	 * This also needs to be taken when csg_slots are updated, to make sure
+	 * the event processing logic doesn't touch groups that have left the CSG
+	 * slot.
+	 */
+	spinlock_t events_lock;
+
 	/**
 	 * @csg_slots: FW command stream group slots.
 	 */
@@ -676,9 +673,6 @@ struct panthor_group {
 	 */
 	struct panthor_kernel_bo *protm_suspend_buf;
 
-	/** @sync_upd_work: Work used to check/signal job fences. */
-	struct work_struct sync_upd_work;
-
 	/** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
 	struct work_struct tiler_oom_work;
 
@@ -999,7 +993,6 @@ static int
 group_bind_locked(struct panthor_group *group, u32 csg_id)
 {
 	struct panthor_device *ptdev = group->ptdev;
-	struct panthor_csg_slot *csg_slot;
 	int ret;
 
 	lockdep_assert_held(&ptdev->scheduler->lock);
@@ -1012,9 +1005,7 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
 	if (ret)
 		return ret;
 
-	csg_slot = &ptdev->scheduler->csg_slots[csg_id];
 	group_get(group);
-	group->csg_id = csg_id;
 
 	/* Dummy doorbell allocation: doorbell is assigned to the group and
 	 * all queues use the same doorbell.
@@ -1026,7 +1017,10 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
 	for (u32 i = 0; i < group->queue_count; i++)
 		group->queues[i]->doorbell_id = csg_id + 1;
 
-	csg_slot->group = group;
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		ptdev->scheduler->csg_slots[csg_id].group = group;
+		group->csg_id = csg_id;
+	}
 
 	return 0;
 }
@@ -1041,7 +1035,6 @@ static int
 group_unbind_locked(struct panthor_group *group)
 {
 	struct panthor_device *ptdev = group->ptdev;
-	struct panthor_csg_slot *slot;
 
 	lockdep_assert_held(&ptdev->scheduler->lock);
 
@@ -1051,9 +1044,12 @@ group_unbind_locked(struct panthor_group *group)
 	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
 		return -EINVAL;
 
-	slot = &ptdev->scheduler->csg_slots[group->csg_id];
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
+		group->csg_id = -1;
+	}
+
 	panthor_vm_idle(group->vm);
-	group->csg_id = -1;
 
 	/* Tiler OOM events will be re-issued next time the group is scheduled. */
 	atomic_set(&group->tiler_oom, 0);
@@ -1062,8 +1058,6 @@ group_unbind_locked(struct panthor_group *group)
 	for (u32 i = 0; i < group->queue_count; i++)
 		group->queues[i]->doorbell_id = -1;
 
-	slot->group = NULL;
-
 	group_put(group);
 	return 0;
 }
@@ -1151,16 +1145,14 @@ queue_suspend_timeout_locked(struct panthor_queue *queue)
 static void
 queue_suspend_timeout(struct panthor_queue *queue)
 {
-	spin_lock(&queue->fence_ctx.lock);
+	guard(spinlock_irqsave)(&queue->fence_ctx.lock);
 	queue_suspend_timeout_locked(queue);
-	spin_unlock(&queue->fence_ctx.lock);
 }
 
 static void
 queue_resume_timeout(struct panthor_queue *queue)
 {
-	spin_lock(&queue->fence_ctx.lock);
-
+	guard(spinlock_irqsave)(&queue->fence_ctx.lock);
 	if (queue_timeout_is_suspended(queue)) {
 		mod_delayed_work(queue->scheduler.timeout_wq,
 				 &queue->timeout.work,
@@ -1168,8 +1160,6 @@ queue_resume_timeout(struct panthor_queue *queue)
 
 		queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
 	}
-
-	spin_unlock(&queue->fence_ctx.lock);
 }
 
 /**
@@ -1484,7 +1474,7 @@ cs_slot_process_fatal_event_locked(struct panthor_device *ptdev,
 	u32 fatal;
 	u64 info;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
 	fatal = cs_iface->output->fatal;
@@ -1532,7 +1522,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 	u32 fault;
 	u64 info;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
 	fault = cs_iface->output->fault;
@@ -1542,7 +1532,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 		u64 cs_extract = queue->iface.output->extract;
 		struct panthor_job *job;
 
-		spin_lock(&queue->fence_ctx.lock);
+		guard(spinlock_irqsave)(&queue->fence_ctx.lock);
 		list_for_each_entry(job, &queue->fence_ctx.in_flight_jobs, node) {
 			if (cs_extract >= job->ringbuf.end)
 				continue;
@@ -1552,7 +1542,6 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 
 			dma_fence_set_error(job->done_fence, -EINVAL);
 		}
-		spin_unlock(&queue->fence_ctx.lock);
 	}
 
 	if (group) {
@@ -1682,7 +1671,7 @@ cs_slot_process_tiler_oom_event_locked(struct panthor_device *ptdev,
 	struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
 	struct panthor_group *group = csg_slot->group;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	if (drm_WARN_ON(&ptdev->base, !group))
 		return;
@@ -1703,7 +1692,7 @@ static bool cs_slot_process_irq_locked(struct panthor_device *ptdev,
 	struct panthor_fw_cs_iface *cs_iface;
 	u32 req, ack, events;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
 	req = cs_iface->input->req;
@@ -1731,7 +1720,7 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
 {
 	struct panthor_scheduler *sched = ptdev->scheduler;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	sched->might_have_idle_groups = true;
 
@@ -1742,16 +1731,102 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
 	sched_queue_delayed_work(sched, tick, 0);
 }
 
+static void update_fdinfo_stats(struct panthor_job *job)
+{
+	struct panthor_group *group = job->group;
+	struct panthor_queue *queue = group->queues[job->queue_idx];
+	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
+	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
+	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
+
+	scoped_guard(spinlock_irqsave, &group->fdinfo.lock) {
+		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
+			fdinfo->cycles += data->cycles.after - data->cycles.before;
+		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
+			fdinfo->time += data->time.after - data->time.before;
+	}
+}
+
+static bool queue_check_job_completion(struct panthor_queue *queue)
+{
+	struct panthor_syncobj_64b *syncobj = NULL;
+	struct panthor_job *job, *job_tmp;
+	bool cookie, progress = false;
+	LIST_HEAD(done_jobs);
+
+	cookie = dma_fence_begin_signalling();
+	scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
+		list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
+			if (!syncobj) {
+				struct panthor_group *group = job->group;
+
+				syncobj = group->syncobjs->kmap +
+					  (job->queue_idx * sizeof(*syncobj));
+			}
+
+			if (syncobj->seqno < job->done_fence->seqno)
+				break;
+
+			list_move_tail(&job->node, &done_jobs);
+			dma_fence_signal_locked(job->done_fence);
+		}
+
+		if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
+			/* If we have no job left, we cancel the timer, and reset remaining
+			 * time to its default so it can be restarted next time
+			 * queue_resume_timeout() is called.
+			 */
+			queue_suspend_timeout_locked(queue);
+
+			/* If there's no job pending, we consider it progress to avoid a
+			 * spurious timeout if the timeout handler and the sync update
+			 * handler raced.
+			 */
+			progress = true;
+		} else if (!list_empty(&done_jobs)) {
+			queue_reset_timeout_locked(queue);
+			progress = true;
+		}
+	}
+	dma_fence_end_signalling(cookie);
+
+	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
+		if (job->profiling.mask)
+			update_fdinfo_stats(job);
+		list_del_init(&job->node);
+		panthor_job_put(&job->base);
+	}
+
+	return progress;
+}
+
+static void group_check_job_completion(struct panthor_group *group)
+{
+	bool cookie;
+	u32 queue_idx;
+
+	cookie = dma_fence_begin_signalling();
+	for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
+		struct panthor_queue *queue = group->queues[queue_idx];
+
+		if (!queue)
+			continue;
+
+		queue_check_job_completion(queue);
+	}
+	dma_fence_end_signalling(cookie);
+}
+
 static void csg_slot_sync_update_locked(struct panthor_device *ptdev,
 					u32 csg_id)
 {
 	struct panthor_csg_slot *csg_slot = &ptdev->scheduler->csg_slots[csg_id];
 	struct panthor_group *group = csg_slot->group;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	if (group)
-		group_queue_work(group, sync_upd);
+		group_check_job_completion(group);
 
 	sched_queue_work(ptdev->scheduler, sync_upd);
 }
@@ -1763,7 +1838,7 @@ csg_slot_process_progress_timer_event_locked(struct panthor_device *ptdev, u32 c
 	struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
 	struct panthor_group *group = csg_slot->group;
 
-	lockdep_assert_held(&sched->lock);
+	lockdep_assert_held(&sched->events_lock);
 
 	group = csg_slot->group;
 	if (!drm_WARN_ON(&ptdev->base, !group)) {
@@ -1784,7 +1859,7 @@ static void sched_process_csg_irq_locked(struct panthor_device *ptdev, u32 csg_i
 	struct panthor_fw_csg_iface *csg_iface;
 	u32 ring_cs_db_mask = 0;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	if (drm_WARN_ON(&ptdev->base, csg_id >= ptdev->scheduler->csg_slot_count))
 		return;
@@ -1842,7 +1917,7 @@ static void sched_process_idle_event_locked(struct panthor_device *ptdev)
 {
 	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	/* Acknowledge the idle event and schedule a tick. */
 	panthor_fw_update_reqs(glb_iface, req, glb_iface->output->ack, GLB_IDLE);
@@ -1858,7 +1933,7 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
 	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
 	u32 req, ack, evts;
 
-	lockdep_assert_held(&ptdev->scheduler->lock);
+	lockdep_assert_held(&ptdev->scheduler->events_lock);
 
 	req = READ_ONCE(glb_iface->input->req);
 	ack = READ_ONCE(glb_iface->output->ack);
@@ -1868,30 +1943,6 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
 		sched_process_idle_event_locked(ptdev);
 }
 
-static void process_fw_events_work(struct work_struct *work)
-{
-	struct panthor_scheduler *sched = container_of(work, struct panthor_scheduler,
-						      fw_events_work);
-	u32 events = atomic_xchg(&sched->fw_events, 0);
-	struct panthor_device *ptdev = sched->ptdev;
-
-	mutex_lock(&sched->lock);
-
-	if (events & JOB_INT_GLOBAL_IF) {
-		sched_process_global_irq_locked(ptdev);
-		events &= ~JOB_INT_GLOBAL_IF;
-	}
-
-	while (events) {
-		u32 csg_id = ffs(events) - 1;
-
-		sched_process_csg_irq_locked(ptdev, csg_id);
-		events &= ~BIT(csg_id);
-	}
-
-	mutex_unlock(&sched->lock);
-}
-
 /**
  * panthor_sched_report_fw_events() - Report FW events to the scheduler.
  * @ptdev: Device.
@@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
 	if (!ptdev->scheduler)
 		return;
 
-	atomic_or(events, &ptdev->scheduler->fw_events);
-	sched_queue_work(ptdev->scheduler, fw_events);
+	guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
+
+	if (events & JOB_INT_GLOBAL_IF) {
+		sched_process_global_irq_locked(ptdev);
+		events &= ~JOB_INT_GLOBAL_IF;
+	}
+
+	while (events) {
+		u32 csg_id = ffs(events) - 1;
+
+		sched_process_csg_irq_locked(ptdev, csg_id);
+		events &= ~BIT(csg_id);
+	}
 }
 
 static const char *fence_get_driver_name(struct dma_fence *fence)
@@ -2136,7 +2198,9 @@ tick_ctx_init(struct panthor_scheduler *sched,
 		 * CSG IRQs, so we can flag the faulty queue.
 		 */
 		if (panthor_vm_has_unhandled_faults(group->vm)) {
-			sched_process_csg_irq_locked(ptdev, i);
+			scoped_guard(spinlock_irqsave, &sched->events_lock) {
+				sched_process_csg_irq_locked(ptdev, i);
+			}
 
 			/* No fatal fault reported, flag all queues as faulty. */
 			if (!group->fatal_queues)
@@ -2183,13 +2247,13 @@ group_term_post_processing(struct panthor_group *group)
 		if (!queue)
 			continue;
 
-		spin_lock(&queue->fence_ctx.lock);
-		list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
-			list_move_tail(&job->node, &faulty_jobs);
-			dma_fence_set_error(job->done_fence, err);
-			dma_fence_signal_locked(job->done_fence);
+		scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
+			list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
+				list_move_tail(&job->node, &faulty_jobs);
+				dma_fence_set_error(job->done_fence, err);
+				dma_fence_signal_locked(job->done_fence);
+			}
 		}
-		spin_unlock(&queue->fence_ctx.lock);
 
 		/* Manually update the syncobj seqno to unblock waiters. */
 		syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj));
@@ -2336,8 +2400,10 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
 			 * any pending interrupts before we start the new
 			 * group.
 			 */
-			if (group->csg_id >= 0)
+			if (group->csg_id >= 0) {
+				guard(spinlock_irqsave)(&sched->events_lock);
 				sched_process_csg_irq_locked(ptdev, group->csg_id);
+			}
 
 			group_unbind_locked(group);
 		}
@@ -2902,10 +2968,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 			u32 csg_id = ffs(slot_mask) - 1;
 			struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
 
-			if (flush_caches_failed)
+			if (flush_caches_failed) {
 				csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
-			else
+			} else {
+				guard(spinlock_irqsave)(&sched->events_lock);
 				csg_slot_sync_update_locked(ptdev, csg_id);
+			}
 
 			slot_mask &= ~BIT(csg_id);
 		}
@@ -2920,8 +2988,10 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 
 		group_get(group);
 
-		if (group->csg_id >= 0)
+		if (group->csg_id >= 0) {
+			guard(spinlock_irqsave)(&sched->events_lock);
 			sched_process_csg_irq_locked(ptdev, group->csg_id);
+		}
 
 		group_unbind_locked(group);
 
@@ -3005,22 +3075,6 @@ void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed)
 	}
 }
 
-static void update_fdinfo_stats(struct panthor_job *job)
-{
-	struct panthor_group *group = job->group;
-	struct panthor_queue *queue = group->queues[job->queue_idx];
-	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
-	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
-	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
-
-	scoped_guard(spinlock, &group->fdinfo.lock) {
-		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
-			fdinfo->cycles += data->cycles.after - data->cycles.before;
-		if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
-			fdinfo->time += data->time.after - data->time.before;
-	}
-}
-
 void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
 {
 	struct panthor_group_pool *gpool = pfile->groups;
@@ -3032,7 +3086,7 @@ void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
 
 	xa_lock(&gpool->xa);
 	xa_for_each_marked(&gpool->xa, i, group, GROUP_REGISTERED) {
-		guard(spinlock)(&group->fdinfo.lock);
+		guard(spinlock_irqsave)(&group->fdinfo.lock);
 		pfile->stats.cycles += group->fdinfo.data.cycles;
 		pfile->stats.time += group->fdinfo.data.time;
 		group->fdinfo.data.cycles = 0;
@@ -3041,80 +3095,6 @@ void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
 	xa_unlock(&gpool->xa);
 }
 
-static bool queue_check_job_completion(struct panthor_queue *queue)
-{
-	struct panthor_syncobj_64b *syncobj = NULL;
-	struct panthor_job *job, *job_tmp;
-	bool cookie, progress = false;
-	LIST_HEAD(done_jobs);
-
-	cookie = dma_fence_begin_signalling();
-	spin_lock(&queue->fence_ctx.lock);
-	list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
-		if (!syncobj) {
-			struct panthor_group *group = job->group;
-
-			syncobj = group->syncobjs->kmap +
-				  (job->queue_idx * sizeof(*syncobj));
-		}
-
-		if (syncobj->seqno < job->done_fence->seqno)
-			break;
-
-		list_move_tail(&job->node, &done_jobs);
-		dma_fence_signal_locked(job->done_fence);
-	}
-
-	if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
-		/* If we have no job left, we cancel the timer, and reset remaining
-		 * time to its default so it can be restarted next time
-		 * queue_resume_timeout() is called.
-		 */
-		queue_suspend_timeout_locked(queue);
-
-		/* If there's no job pending, we consider it progress to avoid a
-		 * spurious timeout if the timeout handler and the sync update
-		 * handler raced.
-		 */
-		progress = true;
-	} else if (!list_empty(&done_jobs)) {
-		queue_reset_timeout_locked(queue);
-		progress = true;
-	}
-	spin_unlock(&queue->fence_ctx.lock);
-	dma_fence_end_signalling(cookie);
-
-	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
-		if (job->profiling.mask)
-			update_fdinfo_stats(job);
-		list_del_init(&job->node);
-		panthor_job_put(&job->base);
-	}
-
-	return progress;
-}
-
-static void group_sync_upd_work(struct work_struct *work)
-{
-	struct panthor_group *group =
-		container_of(work, struct panthor_group, sync_upd_work);
-	u32 queue_idx;
-	bool cookie;
-
-	cookie = dma_fence_begin_signalling();
-	for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
-		struct panthor_queue *queue = group->queues[queue_idx];
-
-		if (!queue)
-			continue;
-
-		queue_check_job_completion(queue);
-	}
-	dma_fence_end_signalling(cookie);
-
-	group_put(group);
-}
-
 struct panthor_job_ringbuf_instrs {
 	u64 buffer[MAX_INSTRS_PER_JOB];
 	u32 count;
@@ -3346,9 +3326,8 @@ queue_run_job(struct drm_sched_job *sched_job)
 	job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64));
 
 	panthor_job_get(&job->base);
-	spin_lock(&queue->fence_ctx.lock);
-	list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
-	spin_unlock(&queue->fence_ctx.lock);
+	scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock)
+		list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
 
 	/* Make sure the ring buffer is updated before the INSERT
 	 * register.
@@ -3683,7 +3662,6 @@ int panthor_group_create(struct panthor_file *pfile,
 	INIT_LIST_HEAD(&group->wait_node);
 	INIT_LIST_HEAD(&group->run_node);
 	INIT_WORK(&group->term_work, group_term_work);
-	INIT_WORK(&group->sync_upd_work, group_sync_upd_work);
 	INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
 	INIT_WORK(&group->release_work, group_release_work);
 
@@ -4054,7 +4032,6 @@ void panthor_sched_unplug(struct panthor_device *ptdev)
 	struct panthor_scheduler *sched = ptdev->scheduler;
 
 	disable_delayed_work_sync(&sched->tick_work);
-	disable_work_sync(&sched->fw_events_work);
 	disable_work_sync(&sched->sync_upd_work);
 
 	mutex_lock(&sched->lock);
@@ -4139,7 +4116,8 @@ int panthor_sched_init(struct panthor_device *ptdev)
 	sched->tick_period = msecs_to_jiffies(10);
 	INIT_DELAYED_WORK(&sched->tick_work, tick_work);
 	INIT_WORK(&sched->sync_upd_work, sync_upd_work);
-	INIT_WORK(&sched->fw_events_work, process_fw_events_work);
+
+	spin_lock_init(&sched->events_lock);
 
 	ret = drmm_mutex_init(&ptdev->base, &sched->lock);
 	if (ret)

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (5 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 21:16   ` Chia-I Wu
  2026-05-14 14:17   ` Steven Price
  2026-05-12 11:37 ` [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Make the sched_process_csg_irq_locked() call part of
group_unbind_locked() so we don't have to manually call it in
tick_ctx_apply()/panthor_sched_suspend().

This implies moving group_[un]bind_locked() around to avoid a
forward declaration.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 176 +++++++++++++++-----------------
 1 file changed, 82 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index fbf76b59b7ef..6c5ba747ae45 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -982,86 +982,6 @@ group_get(struct panthor_group *group)
 	return group;
 }
 
-/**
- * group_bind_locked() - Bind a group to a group slot
- * @group: Group.
- * @csg_id: Slot.
- *
- * Return: 0 on success, a negative error code otherwise.
- */
-static int
-group_bind_locked(struct panthor_group *group, u32 csg_id)
-{
-	struct panthor_device *ptdev = group->ptdev;
-	int ret;
-
-	lockdep_assert_held(&ptdev->scheduler->lock);
-
-	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
-			ptdev->scheduler->csg_slots[csg_id].group))
-		return -EINVAL;
-
-	ret = panthor_vm_active(group->vm);
-	if (ret)
-		return ret;
-
-	group_get(group);
-
-	/* Dummy doorbell allocation: doorbell is assigned to the group and
-	 * all queues use the same doorbell.
-	 *
-	 * TODO: Implement LRU-based doorbell assignment, so the most often
-	 * updated queues get their own doorbell, thus avoiding useless checks
-	 * on queues belonging to the same group that are rarely updated.
-	 */
-	for (u32 i = 0; i < group->queue_count; i++)
-		group->queues[i]->doorbell_id = csg_id + 1;
-
-	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
-		ptdev->scheduler->csg_slots[csg_id].group = group;
-		group->csg_id = csg_id;
-	}
-
-	return 0;
-}
-
-/**
- * group_unbind_locked() - Unbind a group from a slot.
- * @group: Group to unbind.
- *
- * Return: 0 on success, a negative error code otherwise.
- */
-static int
-group_unbind_locked(struct panthor_group *group)
-{
-	struct panthor_device *ptdev = group->ptdev;
-
-	lockdep_assert_held(&ptdev->scheduler->lock);
-
-	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
-		return -EINVAL;
-
-	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
-		return -EINVAL;
-
-	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
-		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
-		group->csg_id = -1;
-	}
-
-	panthor_vm_idle(group->vm);
-
-	/* Tiler OOM events will be re-issued next time the group is scheduled. */
-	atomic_set(&group->tiler_oom, 0);
-	cancel_work(&group->tiler_oom_work);
-
-	for (u32 i = 0; i < group->queue_count; i++)
-		group->queues[i]->doorbell_id = -1;
-
-	group_put(group);
-	return 0;
-}
-
 static bool
 group_is_idle(struct panthor_group *group)
 {
@@ -1968,6 +1888,88 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
 	}
 }
 
+/**
+ * group_bind_locked() - Bind a group to a group slot
+ * @group: Group.
+ * @csg_id: Slot.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+static int
+group_bind_locked(struct panthor_group *group, u32 csg_id)
+{
+	struct panthor_device *ptdev = group->ptdev;
+	int ret;
+
+	lockdep_assert_held(&ptdev->scheduler->lock);
+
+	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
+			ptdev->scheduler->csg_slots[csg_id].group))
+		return -EINVAL;
+
+	ret = panthor_vm_active(group->vm);
+	if (ret)
+		return ret;
+
+	group_get(group);
+
+	/* Dummy doorbell allocation: doorbell is assigned to the group and
+	 * all queues use the same doorbell.
+	 *
+	 * TODO: Implement LRU-based doorbell assignment, so the most often
+	 * updated queues get their own doorbell, thus avoiding useless checks
+	 * on queues belonging to the same group that are rarely updated.
+	 */
+	for (u32 i = 0; i < group->queue_count; i++)
+		group->queues[i]->doorbell_id = csg_id + 1;
+
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		ptdev->scheduler->csg_slots[csg_id].group = group;
+		group->csg_id = csg_id;
+	}
+
+	return 0;
+}
+
+/**
+ * group_unbind_locked() - Unbind a group from a slot.
+ * @group: Group to unbind.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+static int
+group_unbind_locked(struct panthor_group *group)
+{
+	struct panthor_device *ptdev = group->ptdev;
+
+	lockdep_assert_held(&ptdev->scheduler->lock);
+
+	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
+		return -EINVAL;
+
+	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
+		return -EINVAL;
+
+	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
+		/* Process all pending IRQs before returning the slot. */
+		sched_process_csg_irq_locked(ptdev, group->csg_id);
+		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
+		group->csg_id = -1;
+	}
+
+	panthor_vm_idle(group->vm);
+
+	/* Tiler OOM events will be re-issued next time the group is scheduled. */
+	atomic_set(&group->tiler_oom, 0);
+	cancel_work(&group->tiler_oom_work);
+
+	for (u32 i = 0; i < group->queue_count; i++)
+		group->queues[i]->doorbell_id = -1;
+
+	group_put(group);
+	return 0;
+}
+
 static const char *fence_get_driver_name(struct dma_fence *fence)
 {
 	return "panthor";
@@ -2396,15 +2398,6 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
 	/* Unbind evicted groups. */
 	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
 		list_for_each_entry(group, &ctx->old_groups[prio], run_node) {
-			/* This group is gone. Process interrupts to clear
-			 * any pending interrupts before we start the new
-			 * group.
-			 */
-			if (group->csg_id >= 0) {
-				guard(spinlock_irqsave)(&sched->events_lock);
-				sched_process_csg_irq_locked(ptdev, group->csg_id);
-			}
-
 			group_unbind_locked(group);
 		}
 	}
@@ -2988,11 +2981,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 
 		group_get(group);
 
-		if (group->csg_id >= 0) {
-			guard(spinlock_irqsave)(&sched->events_lock);
-			sched_process_csg_irq_locked(ptdev, group->csg_id);
-		}
-
 		group_unbind_locked(group);
 
 		drm_WARN_ON(&group->ptdev->base, !list_empty(&group->run_node));

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (6 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 21:55   ` Chia-I Wu
  2026-05-14 14:25   ` Steven Price
  2026-05-12 11:37 ` [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context Boris Brezillon
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Rather than assuming an interrupt is always expected for request
acks, temporarily enable the relevant interrupts when the polling-wait
failed. This should hopefully reduce the number of interrupts the CPU
has to process.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
 drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
 2 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 8239a6951569..f5e0ceca4130 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
 	glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
 	glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
 
-	/* Enable interrupts we care about. */
-	glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
-					 GLB_PING |
-					 GLB_CFG_PROGRESS_TIMER |
-					 GLB_CFG_POWEROFF_TIMER |
-					 GLB_IDLE_EN |
-					 GLB_IDLE;
-
-	if (panthor_fw_has_glb_state(ptdev))
-		glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
+	/* Enable interrupts for asynchronous events that are not
+	 * triggered by request acks.
+	 */
+	glb_iface->input->ack_irq_mask = GLB_IDLE;
 
 	panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
 			       GLB_IDLE_EN | GLB_COUNTER_EN);
@@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
  * Return: 0 on success, -ETIMEDOUT otherwise.
  */
 static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
-				wait_queue_head_t *wq,
-				u32 req_mask, u32 *acked,
+				u32 *ack_irq_mask_ptr, spinlock_t *lock,
+				wait_queue_head_t *wq, u32 req_mask, u32 *acked,
 				u32 timeout_ms)
 {
 	u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
@@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
 	if (!ret)
 		return 0;
 
-	if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
-			       msecs_to_jiffies(timeout_ms)))
+	scoped_guard(spinlock_irqsave, lock)
+		*ack_irq_mask_ptr |= req_mask;
+
+	ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
+				 msecs_to_jiffies(timeout_ms));
+
+	scoped_guard(spinlock_irqsave, lock)
+		*ack_irq_mask_ptr &= ~req_mask;
+
+	if (ret)
 		return 0;
 
 	/* Check one last time, in case we were not woken up for some reason. */
@@ -1369,6 +1371,8 @@ int panthor_fw_glb_wait_acks(struct panthor_device *ptdev,
 
 	return panthor_fw_wait_acks(&glb_iface->input->req,
 				    &glb_iface->output->ack,
+				    &glb_iface->input->ack_irq_mask,
+				    &glb_iface->lock,
 				    &ptdev->fw->req_waitqueue,
 				    req_mask, acked, timeout_ms);
 }
@@ -1395,6 +1399,8 @@ int panthor_fw_csg_wait_acks(struct panthor_device *ptdev, u32 csg_slot,
 
 	ret = panthor_fw_wait_acks(&csg_iface->input->req,
 				   &csg_iface->output->ack,
+				   &csg_iface->input->ack_irq_mask,
+				   &csg_iface->lock,
 				   &ptdev->fw->req_waitqueue,
 				   req_mask, acked, timeout_ms);
 
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 6c5ba747ae45..a9124bcc7de6 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1110,7 +1110,7 @@ cs_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
 	cs_iface->input->ringbuf_output = queue->iface.output_fw_va;
 	cs_iface->input->config = CS_CONFIG_PRIORITY(queue->priority) |
 				  CS_CONFIG_DOORBELL(queue->doorbell_id);
-	cs_iface->input->ack_irq_mask = ~0;
+	cs_iface->input->ack_irq_mask = CS_FATAL | CS_FAULT | CS_TILER_OOM;
 	panthor_fw_update_reqs(cs_iface, req,
 			       CS_IDLE_SYNC_WAIT |
 			       CS_IDLE_EMPTY |
@@ -1378,7 +1378,8 @@ csg_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 priority)
 		csg_iface->input->protm_suspend_buf = 0;
 	}
 
-	csg_iface->input->ack_irq_mask = ~0;
+	csg_iface->input->ack_irq_mask = CSG_SYNC_UPDATE | CSG_IDLE |
+					 CSG_PROGRESS_TIMER_EVENT;
 	panthor_fw_toggle_reqs(csg_iface, doorbell_req, doorbell_ack, queue_mask);
 	return 0;
 }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (7 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 22:05   ` Chia-I Wu
  2026-05-14 15:23   ` Steven Price
  2026-05-12 11:37 ` [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler() Boris Brezillon
  2026-05-12 11:37 ` [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context Boris Brezillon
  10 siblings, 2 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

Now that everything is set to allow processing FW events in IRQ context,
go for it. This should reduce the dma_fence signaling latency.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_fw.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index f5e0ceca4130..8cfebf180de7 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1087,9 +1087,29 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
 	}
 }
 
-static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
+static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
 {
-	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
+	struct panthor_irq *pirq = data;
+
+	if (!gpu_read(pirq->iomem, INT_STAT))
+		return IRQ_NONE;
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
+			return IRQ_NONE;
+
+		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
+	}
+
+	/* We can use INT_STAT here, because we didn't mask the IRQs. */
+	panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_STAT));
+
+	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
+		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
+			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
+	}
+
+	return IRQ_HANDLED;
 }
 
 static int panthor_fw_start(struct panthor_device *ptdev)
@@ -1489,8 +1509,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
 
 	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
 				  ptdev->iomem + JOB_INT_BASE, "job",
-				  panthor_irq_default_raw_handler,
-				  panthor_job_irq_threaded_handler);
+				  panthor_job_irq_raw_handler, NULL);
 	if (ret) {
 		drm_err(&ptdev->base, "failed to request job irq");
 		return ret;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler()
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (8 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-14 15:26   ` Steven Price
  2026-05-12 11:37 ` [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context Boris Brezillon
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

This is not a bug per-se, because this lock is never taken in an
interrupt context, but it's not consistent with the other users of this
lock. We're also planning on transitioning GPU event processing to
a hard handler. Again, this alone wouldn't justify using the IRQ-safe
variant, because then this _lock/unlock sequence would be in the
hard-IRQ path, where IRQs are already disabled, but let's do it anyway,
to keep things consistent.

While at it, transition to a guard() instead of a plain lock/unlock
sequence.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_gpu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index d0be758ea3e1..b9c51f8a051d 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -110,12 +110,11 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
 	if (status & GPU_IRQ_PROTM_FAULT)
 		drm_warn(&ptdev->base, "GPU Fault in protected mode\n");
 
-	spin_lock(&ptdev->gpu->reqs_lock);
+	guard(spinlock_irqsave)(&ptdev->gpu->reqs_lock);
 	if (status & ptdev->gpu->pending_reqs) {
 		ptdev->gpu->pending_reqs &= ~status;
 		wake_up_all(&ptdev->gpu->reqs_acked);
 	}
-	spin_unlock(&ptdev->gpu->reqs_lock);
 }
 
 static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context
  2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
                   ` (9 preceding siblings ...)
  2026-05-12 11:37 ` [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler() Boris Brezillon
@ 2026-05-12 11:37 ` Boris Brezillon
  2026-05-12 11:50   ` Boris Brezillon
  10 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:37 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel, Boris Brezillon

The current panthor_gpu_irq_handler() logic is already IRQ-safe
(no sleep or sleeping locks, spinlocks taken with irqsave in other
contexts, etc), so let's toggle the switch and make it an hard IRQ
handler.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_gpu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index b9c51f8a051d..04c8f23baf3f 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -86,10 +86,15 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
 	gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
 }
 
-static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
+static irqreturn_t panthor_gpu_irq_raw_handler(int irq, void *data)
 {
+	struct panthor_irq *pirq = data;
 	struct panthor_device *ptdev = pirq->ptdev;
 	struct panthor_gpu *gpu = ptdev->gpu;
+	u32 status = gpu_read(gpu->irq.iomem, INT_STAT);
+
+	if (!status)
+		return IRQ_NONE;
 
 	gpu_write(gpu->irq.iomem, INT_CLEAR, status);
 
@@ -115,11 +120,8 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
 		ptdev->gpu->pending_reqs &= ~status;
 		wake_up_all(&ptdev->gpu->reqs_acked);
 	}
-}
 
-static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
-{
-	return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
+	return IRQ_HANDLED;
 }
 
 /**
@@ -176,8 +178,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
 	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
 				  GPU_INTERRUPTS_MASK,
 				  ptdev->iomem + GPU_INT_BASE, "gpu",
-				  panthor_irq_default_raw_handler,
-				  panthor_gpu_irq_threaded_handler);
+				  panthor_gpu_irq_raw_handler, NULL);
 	if (ret)
 		return ret;
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context
  2026-05-12 11:37 ` [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context Boris Brezillon
@ 2026-05-12 11:50   ` Boris Brezillon
  2026-05-12 22:40     ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-12 11:50 UTC (permalink / raw)
  To: Steven Price, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On Tue, 12 May 2026 13:37:41 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> The current panthor_gpu_irq_handler() logic is already IRQ-safe
> (no sleep or sleeping locks, spinlocks taken with irqsave in other
> contexts, etc), so let's toggle the switch and make it an hard IRQ
> handler.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_gpu.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index b9c51f8a051d..04c8f23baf3f 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -86,10 +86,15 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
>  	gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
>  }
>  
> -static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> +static irqreturn_t panthor_gpu_irq_raw_handler(int irq, void *data)
>  {
> +	struct panthor_irq *pirq = data;
>  	struct panthor_device *ptdev = pirq->ptdev;
>  	struct panthor_gpu *gpu = ptdev->gpu;
> +	u32 status = gpu_read(gpu->irq.iomem, INT_STAT);
> +
> +	if (!status)
> +		return IRQ_NONE;
>  

Forgot to add the pirq state transition here:

       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
               if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
                       return IRQ_NONE;

               pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
       }

>  	gpu_write(gpu->irq.iomem, INT_CLEAR, status);
>  
> @@ -115,11 +120,8 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
>  		ptdev->gpu->pending_reqs &= ~status;
>  		wake_up_all(&ptdev->gpu->reqs_acked);
>  	}
> -}
>  
> -static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> -{
> -	return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);

and restore it here:

       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
       }

> +	return IRQ_HANDLED;
>  }
>  
>  /**
> @@ -176,8 +178,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
>  	ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
>  				  GPU_INTERRUPTS_MASK,
>  				  ptdev->iomem + GPU_INT_BASE, "gpu",
> -				  panthor_irq_default_raw_handler,
> -				  panthor_gpu_irq_threaded_handler);
> +				  panthor_gpu_irq_raw_handler, NULL);
>  	if (ret)
>  		return ret;
>  
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field
  2026-05-12 11:37 ` [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
@ 2026-05-12 18:40   ` Chia-I Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 18:40 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:44 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> The only place where panthor_irq::state is accessed without
> panthor_irq::mask_lock held is in the prologue of _irq_suspend(),
> which is not really a fast-path. So let's simplify things by assuming
> panthor_irq::state must always be accessed with the mask_lock held,
> and add a scoped_guard() in _irq_suspend().
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 35 ++++++++++++++++----------------
>  1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 4e4607bca7cc..3f91ba73829d 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -101,8 +101,12 @@ struct panthor_irq {
>          */
>         spinlock_t mask_lock;

optional nit: might want to rename mask_lock

>
> -       /** @state: one of &enum panthor_irq_state reflecting the current state. */
> -       atomic_t state;
> +       /**
> +        * @state: one of &enum panthor_irq_state reflecting the current state.
> +        *
> +        * Must be accessed with mask_lock held.
> +        */
> +       enum panthor_irq_state state;
>  };
>
>  /**
> @@ -510,18 +514,15 @@ const char *panthor_exception_name(struct panthor_device *ptdev,
>  static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)                 \
>  {                                                                                              \
>         struct panthor_irq *pirq = data;                                                        \
> -       enum panthor_irq_state old_state;                                                       \
>                                                                                                 \
>         if (!gpu_read(pirq->iomem, INT_STAT))                                                   \
>                 return IRQ_NONE;                                                                \
>                                                                                                 \
>         guard(spinlock_irqsave)(&pirq->mask_lock);                                              \
> -       old_state = atomic_cmpxchg(&pirq->state,                                                \
> -                                  PANTHOR_IRQ_STATE_ACTIVE,                                    \
> -                                  PANTHOR_IRQ_STATE_PROCESSING);                               \
> -       if (old_state != PANTHOR_IRQ_STATE_ACTIVE)                                              \
> +       if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)                                            \
>                 return IRQ_NONE;                                                                \
>                                                                                                 \
> +       pirq->state = PANTHOR_IRQ_STATE_PROCESSING;                                             \
>         gpu_write(pirq->iomem, INT_MASK, 0);                                                    \
>         return IRQ_WAKE_THREAD;                                                                 \
>  }                                                                                              \
> @@ -551,13 +552,10 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>         }                                                                                       \
>                                                                                                 \
>         scoped_guard(spinlock_irqsave, &pirq->mask_lock) {                                      \
> -               enum panthor_irq_state old_state;                                               \
> -                                                                                               \
> -               old_state = atomic_cmpxchg(&pirq->state,                                        \
> -                                          PANTHOR_IRQ_STATE_PROCESSING,                        \
> -                                          PANTHOR_IRQ_STATE_ACTIVE);                           \
> -               if (old_state == PANTHOR_IRQ_STATE_PROCESSING)                                  \
> +               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {                              \
> +                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;                                 \
>                         gpu_write(pirq->iomem, INT_MASK, pirq->mask);                           \
> +               }                                                                               \
>         }                                                                                       \
>                                                                                                 \
>         return ret;                                                                             \
> @@ -566,18 +564,19 @@ static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *da
>  static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)                        \
>  {                                                                                              \
>         scoped_guard(spinlock_irqsave, &pirq->mask_lock) {                                      \
> -               atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDING);                         \
> +               pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;                                     \
>                 gpu_write(pirq->iomem, INT_MASK, 0);                                            \
>         }                                                                                       \
>         synchronize_irq(pirq->irq);                                                             \
> -       atomic_set(&pirq->state, PANTHOR_IRQ_STATE_SUSPENDED);                                  \
> +       scoped_guard(spinlock_irqsave, &pirq->mask_lock)                                        \
> +               pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;                                      \
>  }                                                                                              \
>                                                                                                 \
>  static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)                 \
>  {                                                                                              \
>         guard(spinlock_irqsave)(&pirq->mask_lock);                                              \
>                                                                                                 \
> -       atomic_set(&pirq->state, PANTHOR_IRQ_STATE_ACTIVE);                                     \
> +       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;                                                 \
>         gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);                                          \
>         gpu_write(pirq->iomem, INT_MASK, pirq->mask);                                           \
>  }                                                                                              \
> @@ -610,7 +609,7 @@ static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *
>          * on the PROCESSING -> ACTIVE transition.                                              \
>          * If the IRQ is suspended/suspending, the mask is restored at resume time.             \
>          */                                                                                     \
> -       if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)                              \
> +       if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)                                            \
>                 gpu_write(pirq->iomem, INT_MASK, pirq->mask);                                   \
>  }                                                                                              \
>                                                                                                 \
> @@ -624,7 +623,7 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
>          * on the PROCESSING -> ACTIVE transition.                                              \
>          * If the IRQ is suspended/suspending, the mask is restored at resume time.             \
>          */                                                                                     \
> -       if (atomic_read(&pirq->state) == PANTHOR_IRQ_STATE_ACTIVE)                              \
> +       if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)                                            \
>                 gpu_write(pirq->iomem, INT_MASK, pirq->mask);                                   \
>  }
>
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 02/11] drm/panthor: Move the register accessors before the IRQ helpers
  2026-05-12 11:37 ` [PATCH v2 02/11] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
@ 2026-05-12 18:41   ` Chia-I Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 18:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 5:14 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> We're about to add an IRQ inline helper using gpu_read(). Move things
> around to avoid forward declarations.
>
> No functional changes.
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

This can be dropped if patch 3 uses non-inline functions.

> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 142 +++++++++++++++----------------
>  1 file changed, 71 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 3f91ba73829d..768fc1992368 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -495,6 +495,77 @@ panthor_exception_is_fault(u32 exception_code)
>  const char *panthor_exception_name(struct panthor_device *ptdev,
>                                    u32 exception_code);
>
> +static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
> +{
> +       writel(data, iomem + reg);
> +}
> +
> +static inline u32 gpu_read(void __iomem *iomem, u32 reg)
> +{
> +       return readl(iomem + reg);
> +}
> +
> +static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
> +{
> +       return readl_relaxed(iomem + reg);
> +}
> +
> +static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
> +{
> +       gpu_write(iomem, reg, lower_32_bits(data));
> +       gpu_write(iomem, reg + 4, upper_32_bits(data));
> +}
> +
> +static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
> +{
> +       return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
> +}
> +
> +static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
> +{
> +       return (gpu_read_relaxed(iomem, reg) |
> +               ((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
> +}
> +
> +static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
> +{
> +       u32 lo, hi1, hi2;
> +       do {
> +               hi1 = gpu_read(iomem, reg + 4);
> +               lo = gpu_read(iomem, reg);
> +               hi2 = gpu_read(iomem, reg + 4);
> +       } while (hi1 != hi2);
> +       return lo | ((u64)hi2 << 32);
> +}
> +
> +#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)     \
> +       read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,     \
> +                         iomem, reg)
> +
> +#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,          \
> +                                    timeout_us)                                \
> +       read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,     \
> +                                false, iomem, reg)
> +
> +#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)   \
> +       read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,   \
> +                         iomem, reg)
> +
> +#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,                \
> +                                      timeout_us)                              \
> +       read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,   \
> +                                false, iomem, reg)
> +
> +#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,  \
> +                                            timeout_us)                        \
> +       read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,         \
> +                                timeout_us, false, iomem, reg)
> +
> +#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,       \
> +                                       timeout_us)                             \
> +       read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,  \
> +                         false, iomem, reg)
> +
>  #define INT_RAWSTAT 0x0
>  #define INT_CLEAR   0x4
>  #define INT_MASK    0x8
> @@ -629,75 +700,4 @@ static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq
>
>  extern struct workqueue_struct *panthor_cleanup_wq;
>
> -static inline void gpu_write(void __iomem *iomem, u32 reg, u32 data)
> -{
> -       writel(data, iomem + reg);
> -}
> -
> -static inline u32 gpu_read(void __iomem *iomem, u32 reg)
> -{
> -       return readl(iomem + reg);
> -}
> -
> -static inline u32 gpu_read_relaxed(void __iomem *iomem, u32 reg)
> -{
> -       return readl_relaxed(iomem + reg);
> -}
> -
> -static inline void gpu_write64(void __iomem *iomem, u32 reg, u64 data)
> -{
> -       gpu_write(iomem, reg, lower_32_bits(data));
> -       gpu_write(iomem, reg + 4, upper_32_bits(data));
> -}
> -
> -static inline u64 gpu_read64(void __iomem *iomem, u32 reg)
> -{
> -       return (gpu_read(iomem, reg) | ((u64)gpu_read(iomem, reg + 4) << 32));
> -}
> -
> -static inline u64 gpu_read64_relaxed(void __iomem *iomem, u32 reg)
> -{
> -       return (gpu_read_relaxed(iomem, reg) |
> -               ((u64)gpu_read_relaxed(iomem, reg + 4) << 32));
> -}
> -
> -static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
> -{
> -       u32 lo, hi1, hi2;
> -       do {
> -               hi1 = gpu_read(iomem, reg + 4);
> -               lo = gpu_read(iomem, reg);
> -               hi2 = gpu_read(iomem, reg + 4);
> -       } while (hi1 != hi2);
> -       return lo | ((u64)hi2 << 32);
> -}
> -
> -#define gpu_read_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)     \
> -       read_poll_timeout(gpu_read, val, cond, delay_us, timeout_us, false,     \
> -                         iomem, reg)
> -
> -#define gpu_read_poll_timeout_atomic(iomem, reg, val, cond, delay_us,          \
> -                                    timeout_us)                                \
> -       read_poll_timeout_atomic(gpu_read, val, cond, delay_us, timeout_us,     \
> -                                false, iomem, reg)
> -
> -#define gpu_read64_poll_timeout(iomem, reg, val, cond, delay_us, timeout_us)   \
> -       read_poll_timeout(gpu_read64, val, cond, delay_us, timeout_us, false,   \
> -                         iomem, reg)
> -
> -#define gpu_read64_poll_timeout_atomic(iomem, reg, val, cond, delay_us,                \
> -                                      timeout_us)                              \
> -       read_poll_timeout_atomic(gpu_read64, val, cond, delay_us, timeout_us,   \
> -                                false, iomem, reg)
> -
> -#define gpu_read_relaxed_poll_timeout_atomic(iomem, reg, val, cond, delay_us,  \
> -                                            timeout_us)                        \
> -       read_poll_timeout_atomic(gpu_read_relaxed, val, cond, delay_us,         \
> -                                timeout_us, false, iomem, reg)
> -
> -#define gpu_read64_relaxed_poll_timeout(iomem, reg, val, cond, delay_us,       \
> -                                       timeout_us)                             \
> -       read_poll_timeout(gpu_read64_relaxed, val, cond, delay_us, timeout_us,  \
> -                         false, iomem, reg)
> -
>  #endif
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-05-12 11:37 ` [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
@ 2026-05-12 18:58   ` Chia-I Wu
  2026-05-13  8:03     ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 18:58 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Now that panthor_irq contains the iomem region, there's no real need
> for the macro-based panthor_irq helper generation logic. We can just
> provide inline helpers that do the same and let the compiler optimize
> indirect function calls. The only extra annoyance is the fact we have
> to open-code the panthor_xxx_irq_threaded_handler() implementation, but
> those are single-line functions, so it's acceptable.
We might want to __always_inline panthor_irq_default_threaded_handler.
For the rest, do we want to un-inline them?

>
> While at it, we changed the prototype of the IRQ handlers to take
> a panthor_irq instead of panthor_device, since that's the thing
> that's passed around when it comes to panthor_irq, and the
> panthor_device can be directly extracted from there.
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 245 +++++++++++++++----------------
>  drivers/gpu/drm/panthor/panthor_fw.c     |  22 ++-
>  drivers/gpu/drm/panthor/panthor_gpu.c    |  26 ++--
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  37 ++---
>  drivers/gpu/drm/panthor/panthor_pwr.c    |  20 ++-
>  5 files changed, 183 insertions(+), 167 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 768fc1992368..393fcda73d88 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -571,131 +571,126 @@ static inline u64 gpu_read64_counter(void __iomem *iomem, u32 reg)
>  #define INT_MASK    0x8
>  #define INT_STAT    0xc
>
> -/**
> - * PANTHOR_IRQ_HANDLER() - Define interrupt handlers and the interrupt
> - * registration function.
> - *
> - * The boiler-plate to gracefully deal with shared interrupts is
> - * auto-generated. All you have to do is call PANTHOR_IRQ_HANDLER()
> - * just after the actual handler. The handler prototype is:
> - *
> - * void (*handler)(struct panthor_device *, u32 status);
> - */
> -#define PANTHOR_IRQ_HANDLER(__name, __handler)                                                 \
> -static irqreturn_t panthor_ ## __name ## _irq_raw_handler(int irq, void *data)                 \
> -{                                                                                              \
> -       struct panthor_irq *pirq = data;                                                        \
> -                                                                                               \
> -       if (!gpu_read(pirq->iomem, INT_STAT))                                                   \
> -               return IRQ_NONE;                                                                \
> -                                                                                               \
> -       guard(spinlock_irqsave)(&pirq->mask_lock);                                              \
> -       if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)                                            \
> -               return IRQ_NONE;                                                                \
> -                                                                                               \
> -       pirq->state = PANTHOR_IRQ_STATE_PROCESSING;                                             \
> -       gpu_write(pirq->iomem, INT_MASK, 0);                                                    \
> -       return IRQ_WAKE_THREAD;                                                                 \
> -}                                                                                              \
> -                                                                                               \
> -static irqreturn_t panthor_ ## __name ## _irq_threaded_handler(int irq, void *data)            \
> -{                                                                                              \
> -       struct panthor_irq *pirq = data;                                                        \
> -       struct panthor_device *ptdev = pirq->ptdev;                                             \
> -       irqreturn_t ret = IRQ_NONE;                                                             \
> -                                                                                               \
> -       while (true) {                                                                          \
> -               /* It's safe to access pirq->mask without the lock held here. If a new          \
> -                * event gets added to the mask and the corresponding IRQ is pending,           \
> -                * we'll process it right away instead of adding an extra raw -> threaded       \
> -                * round trip. If an event is removed and the status bit is set, it will        \
> -                * be ignored, just like it would have been if the mask had been adjusted       \
> -                * right before the HW event kicks in. TLDR; it's all expected races we're      \
> -                * covered for.                                                                 \
> -                */                                                                             \
> -               u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;                   \
> -                                                                                               \
> -               if (!status)                                                                    \
> -                       break;                                                                  \
> -                                                                                               \
> -               __handler(ptdev, status);                                                       \
> -               ret = IRQ_HANDLED;                                                              \
> -       }                                                                                       \
> -                                                                                               \
> -       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {                                      \
> -               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {                              \
> -                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;                                 \
> -                       gpu_write(pirq->iomem, INT_MASK, pirq->mask);                           \
> -               }                                                                               \
> -       }                                                                                       \
> -                                                                                               \
> -       return ret;                                                                             \
> -}                                                                                              \
> -                                                                                               \
> -static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)                        \
> -{                                                                                              \
> -       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {                                      \
> -               pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;                                     \
> -               gpu_write(pirq->iomem, INT_MASK, 0);                                            \
> -       }                                                                                       \
> -       synchronize_irq(pirq->irq);                                                             \
> -       scoped_guard(spinlock_irqsave, &pirq->mask_lock)                                        \
> -               pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;                                      \
> -}                                                                                              \
> -                                                                                               \
> -static inline void panthor_ ## __name ## _irq_resume(struct panthor_irq *pirq)                 \
> -{                                                                                              \
> -       guard(spinlock_irqsave)(&pirq->mask_lock);                                              \
> -                                                                                               \
> -       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;                                                 \
> -       gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);                                          \
> -       gpu_write(pirq->iomem, INT_MASK, pirq->mask);                                           \
> -}                                                                                              \
> -                                                                                               \
> -static int panthor_request_ ## __name ## _irq(struct panthor_device *ptdev,                    \
> -                                             struct panthor_irq *pirq,                         \
> -                                             int irq, u32 mask, void __iomem *iomem)           \
> -{                                                                                              \
> -       pirq->ptdev = ptdev;                                                                    \
> -       pirq->irq = irq;                                                                        \
> -       pirq->mask = mask;                                                                      \
> -       pirq->iomem = iomem;                                                                    \
> -       spin_lock_init(&pirq->mask_lock);                                                       \
> -       panthor_ ## __name ## _irq_resume(pirq);                                                \
> -                                                                                               \
> -       return devm_request_threaded_irq(ptdev->base.dev, irq,                                  \
> -                                        panthor_ ## __name ## _irq_raw_handler,                \
> -                                        panthor_ ## __name ## _irq_threaded_handler,           \
> -                                        IRQF_SHARED, KBUILD_MODNAME "-" # __name,              \
> -                                        pirq);                                                 \
> -}                                                                                              \
> -                                                                                               \
> -static inline void panthor_ ## __name ## _irq_enable_events(struct panthor_irq *pirq, u32 mask)        \
> -{                                                                                              \
> -       guard(spinlock_irqsave)(&pirq->mask_lock);                                              \
> -       pirq->mask |= mask;                                                                     \
> -                                                                                               \
> -       /* The only situation where we need to write the new mask is if the IRQ is active.      \
> -        * If it's being processed, the mask will be restored for us in _irq_threaded_handler() \
> -        * on the PROCESSING -> ACTIVE transition.                                              \
> -        * If the IRQ is suspended/suspending, the mask is restored at resume time.             \
> -        */                                                                                     \
> -       if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)                                            \
> -               gpu_write(pirq->iomem, INT_MASK, pirq->mask);                                   \
> -}                                                                                              \
> -                                                                                               \
> -static inline void panthor_ ## __name ## _irq_disable_events(struct panthor_irq *pirq, u32 mask)\
> -{                                                                                              \
> -       guard(spinlock_irqsave)(&pirq->mask_lock);                                              \
> -       pirq->mask &= ~mask;                                                                    \
> -                                                                                               \
> -       /* The only situation where we need to write the new mask is if the IRQ is active.      \
> -        * If it's being processed, the mask will be restored for us in _irq_threaded_handler() \
> -        * on the PROCESSING -> ACTIVE transition.                                              \
> -        * If the IRQ is suspended/suspending, the mask is restored at resume time.             \
> -        */                                                                                     \
> -       if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)                                            \
> -               gpu_write(pirq->iomem, INT_MASK, pirq->mask);                                   \
> +static inline irqreturn_t panthor_irq_default_raw_handler(int irq, void *data)
> +{
> +       struct panthor_irq *pirq = data;
> +
> +       if (!gpu_read(pirq->iomem, INT_STAT))
> +               return IRQ_NONE;
> +
> +       guard(spinlock_irqsave)(&pirq->mask_lock);
> +       if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> +               return IRQ_NONE;
> +
> +       pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> +       gpu_write(pirq->iomem, INT_MASK, 0);
> +       return IRQ_WAKE_THREAD;
> +}
> +
> +static inline irqreturn_t
> +panthor_irq_default_threaded_handler(void *data,
> +                                    void (*slow_handler)(struct panthor_irq *, u32))
> +{
> +       struct panthor_irq *pirq = data;
> +       irqreturn_t ret = IRQ_NONE;
> +
> +       while (true) {
> +               /* It's safe to access pirq->mask without the lock held here. If a new
> +                * event gets added to the mask and the corresponding IRQ is pending,
> +                * we'll process it right away instead of adding an extra raw -> threaded
> +                * round trip. If an event is removed and the status bit is set, it will
> +                * be ignored, just like it would have been if the mask had been adjusted
> +                * right before the HW event kicks in. TLDR; it's all expected races we're
> +                * covered for.
> +                */
> +               u32 status = gpu_read(pirq->iomem, INT_RAWSTAT) & pirq->mask;
> +
> +               if (!status)
> +                       break;
> +
> +               slow_handler(pirq, status);
> +               ret = IRQ_HANDLED;
> +       }
> +
> +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING) {
> +                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +                       gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +static inline void panthor_irq_suspend(struct panthor_irq *pirq)
> +{
> +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +               pirq->state = PANTHOR_IRQ_STATE_SUSPENDING;
> +               gpu_write(pirq->iomem, INT_MASK, 0);
> +       }
> +       synchronize_irq(pirq->irq);
> +       scoped_guard(spinlock_irqsave, &pirq->mask_lock)
> +               pirq->state = PANTHOR_IRQ_STATE_SUSPENDED;
> +}
> +
> +static inline void panthor_irq_resume(struct panthor_irq *pirq)
> +{
> +       guard(spinlock_irqsave)(&pirq->mask_lock);
> +       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +       gpu_write(pirq->iomem, INT_CLEAR, pirq->mask);
> +       gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +}
> +
> +static inline void panthor_irq_enable_events(struct panthor_irq *pirq, u32 mask)
> +{
> +       guard(spinlock_irqsave)(&pirq->mask_lock);
> +       pirq->mask |= mask;
> +
> +       /* The only situation where we need to write the new mask is if the IRQ is active.
> +        * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
> +        * on the PROCESSING -> ACTIVE transition.
> +        * If the IRQ is suspended/suspending, the mask is restored at resume time.
> +        */
> +       if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
> +               gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +}
> +
> +static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask)
> +{
> +       guard(spinlock_irqsave)(&pirq->mask_lock);
> +       pirq->mask &= ~mask;
> +
> +       /* The only situation where we need to write the new mask is if the IRQ is active.
> +        * If it's being processed, the mask will be restored for us in _irq_threaded_handler()
> +        * on the PROCESSING -> ACTIVE transition.
> +        * If the IRQ is suspended/suspending, the mask is restored at resume time.
> +        */
> +       if (pirq->state == PANTHOR_IRQ_STATE_ACTIVE)
> +               gpu_write(pirq->iomem, INT_MASK, pirq->mask);
> +}
> +
> +static inline int
> +panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> +                   int irq, u32 mask, void __iomem *iomem, const char *name,
> +                   irqreturn_t (*threaded_handler)(int, void *data))
> +{
> +       const char *full_name;
> +
> +       pirq->ptdev = ptdev;
> +       pirq->irq = irq;
> +       pirq->mask = mask;
> +       pirq->iomem = iomem;
> +       spin_lock_init(&pirq->mask_lock);
> +
> +       full_name = devm_kasprintf(ptdev->base.dev, GFP_KERNEL, KBUILD_MODNAME "-%s", name);
> +       if (!full_name)
> +               return -ENOMEM;
> +
> +       panthor_irq_resume(pirq);
> +       return devm_request_threaded_irq(ptdev->base.dev, irq,
> +                                        panthor_irq_default_raw_handler,
> +                                        threaded_handler,
> +                                        IRQF_SHARED, full_name, pirq);
>  }
>
>  extern struct workqueue_struct *panthor_cleanup_wq;
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 986151681b24..eaf599b0a887 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1064,8 +1064,9 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>                          msecs_to_jiffies(PING_INTERVAL_MS));
>  }
>
> -static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +       struct panthor_device *ptdev = pirq->ptdev;
>         u32 duration;
>         u64 start = 0;
>
> @@ -1091,7 +1092,11 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
>                 trace_gpu_job_irq(ptdev->base.dev, status, duration);
>         }
>  }
> -PANTHOR_IRQ_HANDLER(job, panthor_job_irq_handler);
> +
> +static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> +{
> +       return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> +}
>
>  static int panthor_fw_start(struct panthor_device *ptdev)
>  {
> @@ -1099,8 +1104,8 @@ static int panthor_fw_start(struct panthor_device *ptdev)
>         bool timedout = false;
>
>         ptdev->fw->booted = false;
> -       panthor_job_irq_enable_events(&ptdev->fw->irq, ~0);
> -       panthor_job_irq_resume(&ptdev->fw->irq);
> +       panthor_irq_enable_events(&ptdev->fw->irq, ~0);
> +       panthor_irq_resume(&ptdev->fw->irq);
>         gpu_write(fw->iomem, MCU_CONTROL, MCU_CONTROL_AUTO);
>
>         if (!wait_event_timeout(ptdev->fw->req_waitqueue,
> @@ -1210,7 +1215,7 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang)
>                         ptdev->reset.fast = true;
>         }
>
> -       panthor_job_irq_suspend(&ptdev->fw->irq);
> +       panthor_irq_suspend(&ptdev->fw->irq);
>         panthor_fw_stop(ptdev);
>  }
>
> @@ -1280,7 +1285,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>         if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) {
>                 /* Make sure the IRQ handler cannot be called after that point. */
>                 if (ptdev->fw->irq.irq)
> -                       panthor_job_irq_suspend(&ptdev->fw->irq);
> +                       panthor_irq_suspend(&ptdev->fw->irq);
>
>                 panthor_fw_stop(ptdev);
>         }
> @@ -1476,8 +1481,9 @@ int panthor_fw_init(struct panthor_device *ptdev)
>         if (irq <= 0)
>                 return -ENODEV;
>
> -       ret = panthor_request_job_irq(ptdev, &fw->irq, irq, 0,
> -                                     ptdev->iomem + JOB_INT_BASE);
> +       ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
> +                                 ptdev->iomem + JOB_INT_BASE, "job",
> +                                 panthor_job_irq_threaded_handler);
>         if (ret) {
>                 drm_err(&ptdev->base, "failed to request job irq");
>                 return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index e52c5675981f..ce208e384762 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -86,8 +86,9 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
>         gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
>  }
>
> -static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +       struct panthor_device *ptdev = pirq->ptdev;
>         struct panthor_gpu *gpu = ptdev->gpu;
>
>         gpu_write(gpu->irq.iomem, INT_CLEAR, status);
> @@ -116,7 +117,11 @@ static void panthor_gpu_irq_handler(struct panthor_device *ptdev, u32 status)
>         }
>         spin_unlock(&ptdev->gpu->reqs_lock);
>  }
> -PANTHOR_IRQ_HANDLER(gpu, panthor_gpu_irq_handler);
> +
> +static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> +{
> +       return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
> +}
>
>  /**
>   * panthor_gpu_unplug() - Called when the GPU is unplugged.
> @@ -128,7 +133,7 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
>
>         /* Make sure the IRQ handler is not running after that point. */
>         if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> -               panthor_gpu_irq_suspend(&ptdev->gpu->irq);
> +               panthor_irq_suspend(&ptdev->gpu->irq);
>
>         /* Wake-up all waiters. */
>         spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
> @@ -169,9 +174,10 @@ int panthor_gpu_init(struct panthor_device *ptdev)
>         if (irq < 0)
>                 return irq;
>
> -       ret = panthor_request_gpu_irq(ptdev, &ptdev->gpu->irq, irq,
> -                                     GPU_INTERRUPTS_MASK,
> -                                     ptdev->iomem + GPU_INT_BASE);
> +       ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
> +                                 GPU_INTERRUPTS_MASK,
> +                                 ptdev->iomem + GPU_INT_BASE, "gpu",
> +                                 panthor_gpu_irq_threaded_handler);
>         if (ret)
>                 return ret;
>
> @@ -182,7 +188,7 @@ int panthor_gpu_power_changed_on(struct panthor_device *ptdev)
>  {
>         guard(pm_runtime_active)(ptdev->base.dev);
>
> -       panthor_gpu_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
> +       panthor_irq_enable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
>
>         return 0;
>  }
> @@ -191,7 +197,7 @@ void panthor_gpu_power_changed_off(struct panthor_device *ptdev)
>  {
>         guard(pm_runtime_active)(ptdev->base.dev);
>
> -       panthor_gpu_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
> +       panthor_irq_disable_events(&ptdev->gpu->irq, GPU_POWER_INTERRUPTS_MASK);
>  }
>
>  /**
> @@ -424,7 +430,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
>         else
>                 panthor_hw_l2_power_off(ptdev);
>
> -       panthor_gpu_irq_suspend(&ptdev->gpu->irq);
> +       panthor_irq_suspend(&ptdev->gpu->irq);
>  }
>
>  /**
> @@ -436,7 +442,7 @@ void panthor_gpu_suspend(struct panthor_device *ptdev)
>   */
>  void panthor_gpu_resume(struct panthor_device *ptdev)
>  {
> -       panthor_gpu_irq_resume(&ptdev->gpu->irq);
> +       panthor_irq_resume(&ptdev->gpu->irq);
>         panthor_hw_l2_power_on(ptdev);
>  }
>
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index 452d0b6d4668..375022fb3fd8 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -586,17 +586,13 @@ static u32 panthor_mmu_as_fault_mask(struct panthor_device *ptdev, u32 as)
>         return BIT(as);
>  }
>
> -/* Forward declaration to call helpers within as_enable/disable */
> -static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status);
> -PANTHOR_IRQ_HANDLER(mmu, panthor_mmu_irq_handler);
> -
>  static int panthor_mmu_as_enable(struct panthor_device *ptdev, u32 as_nr,
>                                  u64 transtab, u64 transcfg, u64 memattr)
>  {
>         struct panthor_mmu *mmu = ptdev->mmu;
>
> -       panthor_mmu_irq_enable_events(&ptdev->mmu->irq,
> -                                     panthor_mmu_as_fault_mask(ptdev, as_nr));
> +       panthor_irq_enable_events(&ptdev->mmu->irq,
> +                                 panthor_mmu_as_fault_mask(ptdev, as_nr));
>
>         gpu_write64(mmu->iomem, AS_TRANSTAB(as_nr), transtab);
>         gpu_write64(mmu->iomem, AS_MEMATTR(as_nr), memattr);
> @@ -614,8 +610,8 @@ static int panthor_mmu_as_disable(struct panthor_device *ptdev, u32 as_nr,
>
>         lockdep_assert_held(&ptdev->mmu->as.slots_lock);
>
> -       panthor_mmu_irq_disable_events(&ptdev->mmu->irq,
> -                                      panthor_mmu_as_fault_mask(ptdev, as_nr));
> +       panthor_irq_disable_events(&ptdev->mmu->irq,
> +                                  panthor_mmu_as_fault_mask(ptdev, as_nr));
>
>         /* Flush+invalidate RW caches, invalidate RO ones. */
>         ret = panthor_gpu_flush_caches(ptdev, CACHE_CLEAN | CACHE_INV,
> @@ -1785,8 +1781,9 @@ static void panthor_vm_unlock_region(struct panthor_vm *vm)
>         mutex_unlock(&ptdev->mmu->as.slots_lock);
>  }
>
> -static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_mmu_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +       struct panthor_device *ptdev = pirq->ptdev;
>         struct panthor_mmu *mmu = ptdev->mmu;
>         bool has_unhandled_faults = false;
>
> @@ -1849,6 +1846,11 @@ static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
>                 panthor_sched_report_mmu_fault(ptdev);
>  }
>
> +static irqreturn_t panthor_mmu_irq_threaded_handler(int irq, void *data)
> +{
> +       return panthor_irq_default_threaded_handler(data, panthor_mmu_irq_handler);
> +}
> +
>  /**
>   * panthor_mmu_suspend() - Suspend the MMU logic
>   * @ptdev: Device.
> @@ -1873,7 +1875,7 @@ void panthor_mmu_suspend(struct panthor_device *ptdev)
>         }
>         mutex_unlock(&ptdev->mmu->as.slots_lock);
>
> -       panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +       panthor_irq_suspend(&ptdev->mmu->irq);
>  }
>
>  /**
> @@ -1892,7 +1894,7 @@ void panthor_mmu_resume(struct panthor_device *ptdev)
>         ptdev->mmu->as.faulty_mask = 0;
>         mutex_unlock(&ptdev->mmu->as.slots_lock);
>
> -       panthor_mmu_irq_resume(&ptdev->mmu->irq);
> +       panthor_irq_resume(&ptdev->mmu->irq);
>  }
>
>  /**
> @@ -1909,7 +1911,7 @@ void panthor_mmu_pre_reset(struct panthor_device *ptdev)
>  {
>         struct panthor_vm *vm;
>
> -       panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +       panthor_irq_suspend(&ptdev->mmu->irq);
>
>         mutex_lock(&ptdev->mmu->vm.lock);
>         ptdev->mmu->vm.reset_in_progress = true;
> @@ -1946,7 +1948,7 @@ void panthor_mmu_post_reset(struct panthor_device *ptdev)
>
>         mutex_unlock(&ptdev->mmu->as.slots_lock);
>
> -       panthor_mmu_irq_resume(&ptdev->mmu->irq);
> +       panthor_irq_resume(&ptdev->mmu->irq);
>
>         /* Restart the VM_BIND queues. */
>         mutex_lock(&ptdev->mmu->vm.lock);
> @@ -3207,7 +3209,7 @@ panthor_mmu_reclaim_priv_bos(struct panthor_device *ptdev,
>  void panthor_mmu_unplug(struct panthor_device *ptdev)
>  {
>         if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> -               panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +               panthor_irq_suspend(&ptdev->mmu->irq);
>
>         mutex_lock(&ptdev->mmu->as.slots_lock);
>         for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
> @@ -3261,9 +3263,10 @@ int panthor_mmu_init(struct panthor_device *ptdev)
>         if (irq <= 0)
>                 return -ENODEV;
>
> -       ret = panthor_request_mmu_irq(ptdev, &mmu->irq, irq,
> -                                     panthor_mmu_fault_mask(ptdev, ~0),
> -                                     ptdev->iomem + MMU_INT_BASE);
> +       ret = panthor_irq_request(ptdev, &mmu->irq, irq,
> +                                 panthor_mmu_fault_mask(ptdev, ~0),
> +                                 ptdev->iomem + MMU_INT_BASE, "mmu",
> +                                 panthor_mmu_irq_threaded_handler);
>         if (ret)
>                 return ret;
>
> diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> index 7c7f424a1436..80cf78007896 100644
> --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> @@ -56,8 +56,9 @@ struct panthor_pwr {
>         wait_queue_head_t reqs_acked;
>  };
>
> -static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
> +static void panthor_pwr_irq_handler(struct panthor_irq *pirq, u32 status)
>  {
> +       struct panthor_device *ptdev = pirq->ptdev;
>         struct panthor_pwr *pwr = ptdev->pwr;
>
>         spin_lock(&ptdev->pwr->reqs_lock);
> @@ -75,7 +76,11 @@ static void panthor_pwr_irq_handler(struct panthor_device *ptdev, u32 status)
>         }
>         spin_unlock(&ptdev->pwr->reqs_lock);
>  }
> -PANTHOR_IRQ_HANDLER(pwr, panthor_pwr_irq_handler);
> +
> +static irqreturn_t panthor_pwr_irq_threaded_handler(int irq, void *data)
> +{
> +       return panthor_irq_default_threaded_handler(data, panthor_pwr_irq_handler);
> +}
>
>  static void panthor_pwr_write_command(struct panthor_device *ptdev, u32 command, u64 args)
>  {
> @@ -453,7 +458,7 @@ void panthor_pwr_unplug(struct panthor_device *ptdev)
>                 return;
>
>         /* Make sure the IRQ handler is not running after that point. */
> -       panthor_pwr_irq_suspend(&ptdev->pwr->irq);
> +       panthor_irq_suspend(&ptdev->pwr->irq);
>
>         /* Wake-up all waiters. */
>         spin_lock_irqsave(&ptdev->pwr->reqs_lock, flags);
> @@ -483,9 +488,10 @@ int panthor_pwr_init(struct panthor_device *ptdev)
>         if (irq < 0)
>                 return irq;
>
> -       err = panthor_request_pwr_irq(
> +       err = panthor_irq_request(
>                 ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
> -               pwr->iomem + PWR_INT_BASE);
> +               pwr->iomem + PWR_INT_BASE, "pwr",
> +               panthor_pwr_irq_threaded_handler);
>         if (err)
>                 return err;
>
> @@ -564,7 +570,7 @@ void panthor_pwr_suspend(struct panthor_device *ptdev)
>         if (!ptdev->pwr)
>                 return;
>
> -       panthor_pwr_irq_suspend(&ptdev->pwr->irq);
> +       panthor_irq_suspend(&ptdev->pwr->irq);
>  }
>
>  void panthor_pwr_resume(struct panthor_device *ptdev)
> @@ -572,5 +578,5 @@ void panthor_pwr_resume(struct panthor_device *ptdev)
>         if (!ptdev->pwr)
>                 return;
>
> -       panthor_pwr_irq_resume(&ptdev->pwr->irq);
> +       panthor_irq_resume(&ptdev->pwr->irq);
>  }
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-12 11:37 ` [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers Boris Brezillon
@ 2026-05-12 19:11   ` Chia-I Wu
  2026-05-13  8:09     ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 19:11 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> All drivers except panthor signal their fences from their interrupt
> handler to minimize latency. We could do the same from the threaded
> handler, but the latency is still quite high in that case, so let's
> allow components to choose the context they want their IRQ handler
> to run in by exposing support for custom hard handlers.
>
> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
>  drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
>  drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
>  drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
>  5 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 393fcda73d88..1aaf06df875b 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
>  static inline int
>  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
>                     int irq, u32 mask, void __iomem *iomem, const char *name,
> +                   irqreturn_t (*raw_handler)(int, void *data),
>                     irqreturn_t (*threaded_handler)(int, void *data))
>  {
>         const char *full_name;
> @@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
>                 return -ENOMEM;
>
>         panthor_irq_resume(pirq);
> -       return devm_request_threaded_irq(ptdev->base.dev, irq,
> -                                        panthor_irq_default_raw_handler,
> -                                        threaded_handler,
> +
> +       if (!threaded_handler) {
> +               return devm_request_irq(ptdev->base.dev, irq, raw_handler,
> +                                       IRQF_SHARED, full_name, pirq);
> +       }
devm_request_irq expands to devm_request_threaded_irq plus
IRQF_COND_ONESHOT. This appears redundant.

> +
> +       return devm_request_threaded_irq(ptdev->base.dev, irq, raw_handler, threaded_handler,
>                                          IRQF_SHARED, full_name, pirq);
>  }
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index eaf599b0a887..8239a6951569 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1483,6 +1483,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
>
>         ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>                                   ptdev->iomem + JOB_INT_BASE, "job",
> +                                 panthor_irq_default_raw_handler,
>                                   panthor_job_irq_threaded_handler);
>         if (ret) {
>                 drm_err(&ptdev->base, "failed to request job irq");
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index ce208e384762..d0be758ea3e1 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -177,6 +177,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
>         ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
>                                   GPU_INTERRUPTS_MASK,
>                                   ptdev->iomem + GPU_INT_BASE, "gpu",
> +                                 panthor_irq_default_raw_handler,
>                                   panthor_gpu_irq_threaded_handler);
>         if (ret)
>                 return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index 375022fb3fd8..2955b8baa2e2 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -3266,6 +3266,7 @@ int panthor_mmu_init(struct panthor_device *ptdev)
>         ret = panthor_irq_request(ptdev, &mmu->irq, irq,
>                                   panthor_mmu_fault_mask(ptdev, ~0),
>                                   ptdev->iomem + MMU_INT_BASE, "mmu",
> +                                 panthor_irq_default_raw_handler,
>                                   panthor_mmu_irq_threaded_handler);
>         if (ret)
>                 return ret;
> diff --git a/drivers/gpu/drm/panthor/panthor_pwr.c b/drivers/gpu/drm/panthor/panthor_pwr.c
> index 80cf78007896..1efb7f3482ba 100644
> --- a/drivers/gpu/drm/panthor/panthor_pwr.c
> +++ b/drivers/gpu/drm/panthor/panthor_pwr.c
> @@ -491,6 +491,7 @@ int panthor_pwr_init(struct panthor_device *ptdev)
>         err = panthor_irq_request(
>                 ptdev, &pwr->irq, irq, PWR_INTERRUPTS_MASK,
>                 pwr->iomem + PWR_INT_BASE, "pwr",
> +               panthor_irq_default_raw_handler,
>                 panthor_pwr_irq_threaded_handler);
>         if (err)
>                 return err;
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 05/11] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context
  2026-05-12 11:37 ` [PATCH v2 05/11] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
@ 2026-05-12 19:29   ` Chia-I Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 19:29 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> If we want some FW events to be processed in the interrupt path, we need
> the helpers manipulating req regs to be IRQ-safe, which implies using
> spin_lock_irqsave instead of spinlock. While at it, use guards instead
> of plain spin_lock/unlock calls.
>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> ---
>  drivers/gpu/drm/panthor/panthor_fw.h | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index a99a9b6f4825..e56b7fe15bb3 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -432,12 +432,11 @@ struct panthor_fw_global_iface {
>  #define panthor_fw_toggle_reqs(__iface, __in_reg, __out_reg, __mask) \
>         do { \
>                 u32 __cur_val, __new_val, __out_val; \
> -               spin_lock(&(__iface)->lock); \
> +               guard(spinlock_irqsave)(&(__iface)->lock); \
>                 __cur_val = READ_ONCE((__iface)->input->__in_reg); \
>                 __out_val = READ_ONCE((__iface)->output->__out_reg); \
>                 __new_val = ((__out_val ^ (__mask)) & (__mask)) | (__cur_val & ~(__mask)); \
>                 WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -               spin_unlock(&(__iface)->lock); \
>         } while (0)
>
>  /**
> @@ -458,21 +457,19 @@ struct panthor_fw_global_iface {
>  #define panthor_fw_update_reqs(__iface, __in_reg, __val, __mask) \
>         do { \
>                 u32 __cur_val, __new_val; \
> -               spin_lock(&(__iface)->lock); \
> +               guard(spinlock_irqsave)(&(__iface)->lock); \
>                 __cur_val = READ_ONCE((__iface)->input->__in_reg); \
>                 __new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
>                 WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -               spin_unlock(&(__iface)->lock); \
>         } while (0)
>
>  #define panthor_fw_update_reqs64(__iface, __in_reg, __val, __mask) \
>         do { \
>                 u64 __cur_val, __new_val; \
> -               spin_lock(&(__iface)->lock); \
> +               guard(spinlock_irqsave)(&(__iface)->lock); \
>                 __cur_val = READ_ONCE((__iface)->input->__in_reg); \
>                 __new_val = (__cur_val & ~(__mask)) | ((__val) & (__mask)); \
>                 WRITE_ONCE((__iface)->input->__in_reg, __new_val); \
> -               spin_unlock(&(__iface)->lock); \
>         } while (0)
>
>  struct panthor_fw_global_iface *
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-12 11:37 ` [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
@ 2026-05-12 21:04   ` Chia-I Wu
  2026-05-13  8:29     ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 21:04 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 5:14 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Add a specific spinlock for events processing, and force processing
> of events in the panthor_sched_report_fw_events() path rather than
> deferring it to a work item. We also fast-track fence signalling by
> making the job completion logic IRQ-safe.
>
> Note that it requires changing a couple spin_lock() into
> spin_lock_irqsave() when those are taken inside a events_lock section.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 332 +++++++++++++++-----------------
>  1 file changed, 155 insertions(+), 177 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 5b34032deff8..fbf76b59b7ef 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -177,18 +177,6 @@ struct panthor_scheduler {
>          */
>         struct work_struct sync_upd_work;
>
> -       /**
> -        * @fw_events_work: Work used to process FW events outside the interrupt path.
> -        *
> -        * Even if the interrupt is threaded, we need any event processing
> -        * that require taking the panthor_scheduler::lock to be processed
> -        * outside the interrupt path so we don't block the tick logic when
> -        * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
> -        * event processing requires taking this lock, we just delegate all
> -        * FW event processing to the scheduler workqueue.
> -        */
> -       struct work_struct fw_events_work;
> -
>         /**
>          * @fw_events: Bitmask encoding pending FW events.
>          */
If we process all fw events in the irq context, we can remove
fw_events as well. More on this below.
> @@ -254,6 +242,15 @@ struct panthor_scheduler {
>                 struct list_head waiting;
>         } groups;
>
> +       /**
> +        * @events_lock: Lock taken when processing events.
> +        *
> +        * This also needs to be taken when csg_slots are updated, to make sure
> +        * the event processing logic doesn't touch groups that have left the CSG
> +        * slot.
> +        */
> +       spinlock_t events_lock;
> +
>         /**
>          * @csg_slots: FW command stream group slots.
It looks like read access can use either lock (process context) or
events_lock (irq context), while write access must use events_lock
(process context). Can we put that into the comment, or if makes
sense, enforce that with accessor functions?


>          */
> @@ -676,9 +673,6 @@ struct panthor_group {
>          */
>         struct panthor_kernel_bo *protm_suspend_buf;
>
> -       /** @sync_upd_work: Work used to check/signal job fences. */
> -       struct work_struct sync_upd_work;
> -
Can we make this a preparatory commit, where group_sync_upd_work is
replaced by group_check_job_completion?

Multiple things happen in this commit. I try to identify things that
can be separate commits. If this does not make sense, feel free to
ignore.

>         /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
>         struct work_struct tiler_oom_work;
>
> @@ -999,7 +993,6 @@ static int
>  group_bind_locked(struct panthor_group *group, u32 csg_id)
>  {
>         struct panthor_device *ptdev = group->ptdev;
> -       struct panthor_csg_slot *csg_slot;
>         int ret;
>
>         lockdep_assert_held(&ptdev->scheduler->lock);
> @@ -1012,9 +1005,7 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
>         if (ret)
>                 return ret;
>
> -       csg_slot = &ptdev->scheduler->csg_slots[csg_id];
>         group_get(group);
> -       group->csg_id = csg_id;
>
>         /* Dummy doorbell allocation: doorbell is assigned to the group and
>          * all queues use the same doorbell.
> @@ -1026,7 +1017,10 @@ group_bind_locked(struct panthor_group *group, u32 csg_id)
>         for (u32 i = 0; i < group->queue_count; i++)
>                 group->queues[i]->doorbell_id = csg_id + 1;
>
> -       csg_slot->group = group;
> +       scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +               ptdev->scheduler->csg_slots[csg_id].group = group;
> +               group->csg_id = csg_id;
> +       }
>
>         return 0;
>  }
> @@ -1041,7 +1035,6 @@ static int
>  group_unbind_locked(struct panthor_group *group)
>  {
>         struct panthor_device *ptdev = group->ptdev;
> -       struct panthor_csg_slot *slot;
>
>         lockdep_assert_held(&ptdev->scheduler->lock);
>
> @@ -1051,9 +1044,12 @@ group_unbind_locked(struct panthor_group *group)
>         if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
>                 return -EINVAL;
>
> -       slot = &ptdev->scheduler->csg_slots[group->csg_id];
> +       scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +               ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> +               group->csg_id = -1;
> +       }
> +
>         panthor_vm_idle(group->vm);
> -       group->csg_id = -1;
>
>         /* Tiler OOM events will be re-issued next time the group is scheduled. */
>         atomic_set(&group->tiler_oom, 0);
> @@ -1062,8 +1058,6 @@ group_unbind_locked(struct panthor_group *group)
>         for (u32 i = 0; i < group->queue_count; i++)
>                 group->queues[i]->doorbell_id = -1;
>
> -       slot->group = NULL;
> -
>         group_put(group);
>         return 0;
>  }
> @@ -1151,16 +1145,14 @@ queue_suspend_timeout_locked(struct panthor_queue *queue)
>  static void
>  queue_suspend_timeout(struct panthor_queue *queue)
>  {
> -       spin_lock(&queue->fence_ctx.lock);
> +       guard(spinlock_irqsave)(&queue->fence_ctx.lock);
>         queue_suspend_timeout_locked(queue);
> -       spin_unlock(&queue->fence_ctx.lock);
>  }
>
>  static void
>  queue_resume_timeout(struct panthor_queue *queue)
>  {
> -       spin_lock(&queue->fence_ctx.lock);
> -
> +       guard(spinlock_irqsave)(&queue->fence_ctx.lock);
>         if (queue_timeout_is_suspended(queue)) {
>                 mod_delayed_work(queue->scheduler.timeout_wq,
>                                  &queue->timeout.work,
> @@ -1168,8 +1160,6 @@ queue_resume_timeout(struct panthor_queue *queue)
>
>                 queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
>         }
> -
> -       spin_unlock(&queue->fence_ctx.lock);
>  }
>
>  /**
> @@ -1484,7 +1474,7 @@ cs_slot_process_fatal_event_locked(struct panthor_device *ptdev,
>         u32 fatal;
>         u64 info;
>
> -       lockdep_assert_held(&sched->lock);
> +       lockdep_assert_held(&sched->events_lock);
>
>         cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
>         fatal = cs_iface->output->fatal;
> @@ -1532,7 +1522,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>         u32 fault;
>         u64 info;
>
> -       lockdep_assert_held(&sched->lock);
> +       lockdep_assert_held(&sched->events_lock);
>
>         cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
>         fault = cs_iface->output->fault;
> @@ -1542,7 +1532,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>                 u64 cs_extract = queue->iface.output->extract;
>                 struct panthor_job *job;
>
> -               spin_lock(&queue->fence_ctx.lock);
> +               guard(spinlock_irqsave)(&queue->fence_ctx.lock);
>                 list_for_each_entry(job, &queue->fence_ctx.in_flight_jobs, node) {
>                         if (cs_extract >= job->ringbuf.end)
>                                 continue;
> @@ -1552,7 +1542,6 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>
>                         dma_fence_set_error(job->done_fence, -EINVAL);
>                 }
> -               spin_unlock(&queue->fence_ctx.lock);
>         }
>
>         if (group) {
> @@ -1682,7 +1671,7 @@ cs_slot_process_tiler_oom_event_locked(struct panthor_device *ptdev,
>         struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
>         struct panthor_group *group = csg_slot->group;
>
> -       lockdep_assert_held(&sched->lock);
> +       lockdep_assert_held(&sched->events_lock);
>
>         if (drm_WARN_ON(&ptdev->base, !group))
>                 return;
> @@ -1703,7 +1692,7 @@ static bool cs_slot_process_irq_locked(struct panthor_device *ptdev,
>         struct panthor_fw_cs_iface *cs_iface;
>         u32 req, ack, events;
>
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> +       lockdep_assert_held(&ptdev->scheduler->events_lock);
>
>         cs_iface = panthor_fw_get_cs_iface(ptdev, csg_id, cs_id);
>         req = cs_iface->input->req;
> @@ -1731,7 +1720,7 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
>  {
>         struct panthor_scheduler *sched = ptdev->scheduler;
>
> -       lockdep_assert_held(&sched->lock);
> +       lockdep_assert_held(&sched->events_lock);
>
>         sched->might_have_idle_groups = true;
>
> @@ -1742,16 +1731,102 @@ static void csg_slot_process_idle_event_locked(struct panthor_device *ptdev, u32
>         sched_queue_delayed_work(sched, tick, 0);
>  }
>
> +static void update_fdinfo_stats(struct panthor_job *job)
> +{
> +       struct panthor_group *group = job->group;
> +       struct panthor_queue *queue = group->queues[job->queue_idx];
> +       struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
> +       struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
> +       struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
> +
> +       scoped_guard(spinlock_irqsave, &group->fdinfo.lock) {
> +               if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
> +                       fdinfo->cycles += data->cycles.after - data->cycles.before;
> +               if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
> +                       fdinfo->time += data->time.after - data->time.before;
> +       }
> +}
> +
> +static bool queue_check_job_completion(struct panthor_queue *queue)
> +{
> +       struct panthor_syncobj_64b *syncobj = NULL;
> +       struct panthor_job *job, *job_tmp;
> +       bool cookie, progress = false;
> +       LIST_HEAD(done_jobs);
> +
> +       cookie = dma_fence_begin_signalling();
> +       scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
> +               list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
> +                       if (!syncobj) {
> +                               struct panthor_group *group = job->group;
> +
> +                               syncobj = group->syncobjs->kmap +
> +                                         (job->queue_idx * sizeof(*syncobj));
> +                       }
> +
> +                       if (syncobj->seqno < job->done_fence->seqno)
> +                               break;
> +
> +                       list_move_tail(&job->node, &done_jobs);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +
> +               if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
> +                       /* If we have no job left, we cancel the timer, and reset remaining
> +                        * time to its default so it can be restarted next time
> +                        * queue_resume_timeout() is called.
> +                        */
> +                       queue_suspend_timeout_locked(queue);
> +
> +                       /* If there's no job pending, we consider it progress to avoid a
> +                        * spurious timeout if the timeout handler and the sync update
> +                        * handler raced.
> +                        */
> +                       progress = true;
> +               } else if (!list_empty(&done_jobs)) {
> +                       queue_reset_timeout_locked(queue);
> +                       progress = true;
> +               }
> +       }
> +       dma_fence_end_signalling(cookie);
> +
> +       list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
> +               if (job->profiling.mask)
> +                       update_fdinfo_stats(job);
> +               list_del_init(&job->node);
> +               panthor_job_put(&job->base);
> +       }
> +
> +       return progress;
> +}
> +
> +static void group_check_job_completion(struct panthor_group *group)
> +{
> +       bool cookie;
> +       u32 queue_idx;
> +
> +       cookie = dma_fence_begin_signalling();
> +       for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
> +               struct panthor_queue *queue = group->queues[queue_idx];
> +
> +               if (!queue)
> +                       continue;
> +
> +               queue_check_job_completion(queue);
> +       }
> +       dma_fence_end_signalling(cookie);
> +}
> +
>  static void csg_slot_sync_update_locked(struct panthor_device *ptdev,
>                                         u32 csg_id)
>  {
>         struct panthor_csg_slot *csg_slot = &ptdev->scheduler->csg_slots[csg_id];
>         struct panthor_group *group = csg_slot->group;
>
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> +       lockdep_assert_held(&ptdev->scheduler->events_lock);
>
>         if (group)
> -               group_queue_work(group, sync_upd);
> +               group_check_job_completion(group);
>
>         sched_queue_work(ptdev->scheduler, sync_upd);
>  }
> @@ -1763,7 +1838,7 @@ csg_slot_process_progress_timer_event_locked(struct panthor_device *ptdev, u32 c
>         struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
>         struct panthor_group *group = csg_slot->group;
>
> -       lockdep_assert_held(&sched->lock);
> +       lockdep_assert_held(&sched->events_lock);
>
>         group = csg_slot->group;
>         if (!drm_WARN_ON(&ptdev->base, !group)) {
> @@ -1784,7 +1859,7 @@ static void sched_process_csg_irq_locked(struct panthor_device *ptdev, u32 csg_i
>         struct panthor_fw_csg_iface *csg_iface;
>         u32 ring_cs_db_mask = 0;
>
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> +       lockdep_assert_held(&ptdev->scheduler->events_lock);
>
>         if (drm_WARN_ON(&ptdev->base, csg_id >= ptdev->scheduler->csg_slot_count))
>                 return;
> @@ -1842,7 +1917,7 @@ static void sched_process_idle_event_locked(struct panthor_device *ptdev)
>  {
>         struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> +       lockdep_assert_held(&ptdev->scheduler->events_lock);
>
>         /* Acknowledge the idle event and schedule a tick. */
>         panthor_fw_update_reqs(glb_iface, req, glb_iface->output->ack, GLB_IDLE);
> @@ -1858,7 +1933,7 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
>         struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>         u32 req, ack, evts;
>
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> +       lockdep_assert_held(&ptdev->scheduler->events_lock);
>
>         req = READ_ONCE(glb_iface->input->req);
>         ack = READ_ONCE(glb_iface->output->ack);
> @@ -1868,30 +1943,6 @@ static void sched_process_global_irq_locked(struct panthor_device *ptdev)
>                 sched_process_idle_event_locked(ptdev);
>  }
>
> -static void process_fw_events_work(struct work_struct *work)
> -{
> -       struct panthor_scheduler *sched = container_of(work, struct panthor_scheduler,
> -                                                     fw_events_work);
> -       u32 events = atomic_xchg(&sched->fw_events, 0);
> -       struct panthor_device *ptdev = sched->ptdev;
> -
> -       mutex_lock(&sched->lock);
> -
> -       if (events & JOB_INT_GLOBAL_IF) {
> -               sched_process_global_irq_locked(ptdev);
> -               events &= ~JOB_INT_GLOBAL_IF;
> -       }
> -
> -       while (events) {
> -               u32 csg_id = ffs(events) - 1;
> -
> -               sched_process_csg_irq_locked(ptdev, csg_id);
> -               events &= ~BIT(csg_id);
> -       }
> -
> -       mutex_unlock(&sched->lock);
> -}
> -
>  /**
>   * panthor_sched_report_fw_events() - Report FW events to the scheduler.
>   * @ptdev: Device.
> @@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
This can be renamed to panthor_sched_handle_fw_events.

>         if (!ptdev->scheduler)
>                 return;
>
> -       atomic_or(events, &ptdev->scheduler->fw_events);
> -       sched_queue_work(ptdev->scheduler, fw_events);
> +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> +
> +       if (events & JOB_INT_GLOBAL_IF) {
> +               sched_process_global_irq_locked(ptdev);
> +               events &= ~JOB_INT_GLOBAL_IF;
> +       }
> +
> +       while (events) {
> +               u32 csg_id = ffs(events) - 1;
> +
> +               sched_process_csg_irq_locked(ptdev, csg_id);
> +               events &= ~BIT(csg_id);
> +       }
This handles all fw events in the irq context. Are there concerns that
it may take too long? I might be wrong, but it seems possible to
handle only CSG_SYNC_UPDATE and defer the rest as before.

>  }
>
>  static const char *fence_get_driver_name(struct dma_fence *fence)
> @@ -2136,7 +2198,9 @@ tick_ctx_init(struct panthor_scheduler *sched,
>                  * CSG IRQs, so we can flag the faulty queue.
>                  */
>                 if (panthor_vm_has_unhandled_faults(group->vm)) {
> -                       sched_process_csg_irq_locked(ptdev, i);
> +                       scoped_guard(spinlock_irqsave, &sched->events_lock) {
> +                               sched_process_csg_irq_locked(ptdev, i);
> +                       }
>
>                         /* No fatal fault reported, flag all queues as faulty. */
>                         if (!group->fatal_queues)
> @@ -2183,13 +2247,13 @@ group_term_post_processing(struct panthor_group *group)
>                 if (!queue)
>                         continue;
>
> -               spin_lock(&queue->fence_ctx.lock);
> -               list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
> -                       list_move_tail(&job->node, &faulty_jobs);
> -                       dma_fence_set_error(job->done_fence, err);
> -                       dma_fence_signal_locked(job->done_fence);
> +               scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock) {
> +                       list_for_each_entry_safe(job, tmp, &queue->fence_ctx.in_flight_jobs, node) {
> +                               list_move_tail(&job->node, &faulty_jobs);
> +                               dma_fence_set_error(job->done_fence, err);
> +                               dma_fence_signal_locked(job->done_fence);
> +                       }
>                 }
> -               spin_unlock(&queue->fence_ctx.lock);
>
>                 /* Manually update the syncobj seqno to unblock waiters. */
>                 syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj));
> @@ -2336,8 +2400,10 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
>                          * any pending interrupts before we start the new
>                          * group.
>                          */
> -                       if (group->csg_id >= 0)
> +                       if (group->csg_id >= 0) {
> +                               guard(spinlock_irqsave)(&sched->events_lock);
>                                 sched_process_csg_irq_locked(ptdev, group->csg_id);
> +                       }
>
>                         group_unbind_locked(group);
>                 }
> @@ -2902,10 +2968,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>                         u32 csg_id = ffs(slot_mask) - 1;
>                         struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
>
> -                       if (flush_caches_failed)
> +                       if (flush_caches_failed) {
>                                 csg_slot->group->state = PANTHOR_CS_GROUP_TERMINATED;
> -                       else
> +                       } else {
> +                               guard(spinlock_irqsave)(&sched->events_lock);
>                                 csg_slot_sync_update_locked(ptdev, csg_id);
> +                       }
>
>                         slot_mask &= ~BIT(csg_id);
>                 }
> @@ -2920,8 +2988,10 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>
>                 group_get(group);
>
> -               if (group->csg_id >= 0)
> +               if (group->csg_id >= 0) {
> +                       guard(spinlock_irqsave)(&sched->events_lock);
>                         sched_process_csg_irq_locked(ptdev, group->csg_id);
> +               }
>
>                 group_unbind_locked(group);
>
> @@ -3005,22 +3075,6 @@ void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed)
>         }
>  }
>
> -static void update_fdinfo_stats(struct panthor_job *job)
> -{
> -       struct panthor_group *group = job->group;
> -       struct panthor_queue *queue = group->queues[job->queue_idx];
> -       struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
> -       struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
> -       struct panthor_job_profiling_data *data = &slots[job->profiling.slot];
> -
> -       scoped_guard(spinlock, &group->fdinfo.lock) {
> -               if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
> -                       fdinfo->cycles += data->cycles.after - data->cycles.before;
> -               if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
> -                       fdinfo->time += data->time.after - data->time.before;
> -       }
> -}
> -
>  void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
>  {
>         struct panthor_group_pool *gpool = pfile->groups;
> @@ -3032,7 +3086,7 @@ void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
>
>         xa_lock(&gpool->xa);
>         xa_for_each_marked(&gpool->xa, i, group, GROUP_REGISTERED) {
> -               guard(spinlock)(&group->fdinfo.lock);
> +               guard(spinlock_irqsave)(&group->fdinfo.lock);
>                 pfile->stats.cycles += group->fdinfo.data.cycles;
>                 pfile->stats.time += group->fdinfo.data.time;
>                 group->fdinfo.data.cycles = 0;
> @@ -3041,80 +3095,6 @@ void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
>         xa_unlock(&gpool->xa);
>  }
>
> -static bool queue_check_job_completion(struct panthor_queue *queue)
> -{
> -       struct panthor_syncobj_64b *syncobj = NULL;
> -       struct panthor_job *job, *job_tmp;
> -       bool cookie, progress = false;
> -       LIST_HEAD(done_jobs);
> -
> -       cookie = dma_fence_begin_signalling();
> -       spin_lock(&queue->fence_ctx.lock);
> -       list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
> -               if (!syncobj) {
> -                       struct panthor_group *group = job->group;
> -
> -                       syncobj = group->syncobjs->kmap +
> -                                 (job->queue_idx * sizeof(*syncobj));
> -               }
> -
> -               if (syncobj->seqno < job->done_fence->seqno)
> -                       break;
> -
> -               list_move_tail(&job->node, &done_jobs);
> -               dma_fence_signal_locked(job->done_fence);
> -       }
> -
> -       if (list_empty(&queue->fence_ctx.in_flight_jobs)) {
> -               /* If we have no job left, we cancel the timer, and reset remaining
> -                * time to its default so it can be restarted next time
> -                * queue_resume_timeout() is called.
> -                */
> -               queue_suspend_timeout_locked(queue);
> -
> -               /* If there's no job pending, we consider it progress to avoid a
> -                * spurious timeout if the timeout handler and the sync update
> -                * handler raced.
> -                */
> -               progress = true;
> -       } else if (!list_empty(&done_jobs)) {
> -               queue_reset_timeout_locked(queue);
> -               progress = true;
> -       }
> -       spin_unlock(&queue->fence_ctx.lock);
> -       dma_fence_end_signalling(cookie);
> -
> -       list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
> -               if (job->profiling.mask)
> -                       update_fdinfo_stats(job);
> -               list_del_init(&job->node);
> -               panthor_job_put(&job->base);
> -       }
> -
> -       return progress;
> -}
> -
> -static void group_sync_upd_work(struct work_struct *work)
> -{
> -       struct panthor_group *group =
> -               container_of(work, struct panthor_group, sync_upd_work);
> -       u32 queue_idx;
> -       bool cookie;
> -
> -       cookie = dma_fence_begin_signalling();
> -       for (queue_idx = 0; queue_idx < group->queue_count; queue_idx++) {
> -               struct panthor_queue *queue = group->queues[queue_idx];
> -
> -               if (!queue)
> -                       continue;
> -
> -               queue_check_job_completion(queue);
> -       }
> -       dma_fence_end_signalling(cookie);
> -
> -       group_put(group);
> -}
> -
>  struct panthor_job_ringbuf_instrs {
>         u64 buffer[MAX_INSTRS_PER_JOB];
>         u32 count;
> @@ -3346,9 +3326,8 @@ queue_run_job(struct drm_sched_job *sched_job)
>         job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64));
>
>         panthor_job_get(&job->base);
> -       spin_lock(&queue->fence_ctx.lock);
> -       list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
> -       spin_unlock(&queue->fence_ctx.lock);
> +       scoped_guard(spinlock_irqsave, &queue->fence_ctx.lock)
> +               list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
>
>         /* Make sure the ring buffer is updated before the INSERT
>          * register.
> @@ -3683,7 +3662,6 @@ int panthor_group_create(struct panthor_file *pfile,
>         INIT_LIST_HEAD(&group->wait_node);
>         INIT_LIST_HEAD(&group->run_node);
>         INIT_WORK(&group->term_work, group_term_work);
> -       INIT_WORK(&group->sync_upd_work, group_sync_upd_work);
>         INIT_WORK(&group->tiler_oom_work, group_tiler_oom_work);
>         INIT_WORK(&group->release_work, group_release_work);
>
> @@ -4054,7 +4032,6 @@ void panthor_sched_unplug(struct panthor_device *ptdev)
>         struct panthor_scheduler *sched = ptdev->scheduler;
>
>         disable_delayed_work_sync(&sched->tick_work);
> -       disable_work_sync(&sched->fw_events_work);
>         disable_work_sync(&sched->sync_upd_work);
>
>         mutex_lock(&sched->lock);
> @@ -4139,7 +4116,8 @@ int panthor_sched_init(struct panthor_device *ptdev)
>         sched->tick_period = msecs_to_jiffies(10);
>         INIT_DELAYED_WORK(&sched->tick_work, tick_work);
>         INIT_WORK(&sched->sync_upd_work, sync_upd_work);
> -       INIT_WORK(&sched->fw_events_work, process_fw_events_work);
> +
> +       spin_lock_init(&sched->events_lock);
>
>         ret = drmm_mutex_init(&ptdev->base, &sched->lock);
>         if (ret)
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time
  2026-05-12 11:37 ` [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
@ 2026-05-12 21:16   ` Chia-I Wu
  2026-05-14 14:17   ` Steven Price
  1 sibling, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 21:16 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Make the sched_process_csg_irq_locked() call part of
> group_unbind_locked() so we don't have to manually call it in
> tick_ctx_apply()/panthor_sched_suspend().
>
> This implies moving group_[un]bind_locked() around to avoid a
> forward declaration.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 176 +++++++++++++++-----------------
>  1 file changed, 82 insertions(+), 94 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index fbf76b59b7ef..6c5ba747ae45 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -982,86 +982,6 @@ group_get(struct panthor_group *group)
>         return group;
>  }
>
> -/**
> - * group_bind_locked() - Bind a group to a group slot
> - * @group: Group.
> - * @csg_id: Slot.
> - *
> - * Return: 0 on success, a negative error code otherwise.
> - */
> -static int
> -group_bind_locked(struct panthor_group *group, u32 csg_id)
> -{
> -       struct panthor_device *ptdev = group->ptdev;
> -       int ret;
> -
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> -
> -       if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
> -                       ptdev->scheduler->csg_slots[csg_id].group))
> -               return -EINVAL;
> -
> -       ret = panthor_vm_active(group->vm);
> -       if (ret)
> -               return ret;
> -
> -       group_get(group);
> -
> -       /* Dummy doorbell allocation: doorbell is assigned to the group and
> -        * all queues use the same doorbell.
> -        *
> -        * TODO: Implement LRU-based doorbell assignment, so the most often
> -        * updated queues get their own doorbell, thus avoiding useless checks
> -        * on queues belonging to the same group that are rarely updated.
> -        */
> -       for (u32 i = 0; i < group->queue_count; i++)
> -               group->queues[i]->doorbell_id = csg_id + 1;
> -
> -       scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> -               ptdev->scheduler->csg_slots[csg_id].group = group;
> -               group->csg_id = csg_id;
> -       }
> -
> -       return 0;
> -}
> -
> -/**
> - * group_unbind_locked() - Unbind a group from a slot.
> - * @group: Group to unbind.
> - *
> - * Return: 0 on success, a negative error code otherwise.
> - */
> -static int
> -group_unbind_locked(struct panthor_group *group)
> -{
> -       struct panthor_device *ptdev = group->ptdev;
> -
> -       lockdep_assert_held(&ptdev->scheduler->lock);
> -
> -       if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
> -               return -EINVAL;
> -
> -       if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
> -               return -EINVAL;
> -
> -       scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> -               ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> -               group->csg_id = -1;
> -       }
> -
> -       panthor_vm_idle(group->vm);
> -
> -       /* Tiler OOM events will be re-issued next time the group is scheduled. */
> -       atomic_set(&group->tiler_oom, 0);
> -       cancel_work(&group->tiler_oom_work);
> -
> -       for (u32 i = 0; i < group->queue_count; i++)
> -               group->queues[i]->doorbell_id = -1;
> -
> -       group_put(group);
> -       return 0;
> -}
> -
>  static bool
>  group_is_idle(struct panthor_group *group)
>  {
> @@ -1968,6 +1888,88 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
>         }
>  }
>
> +/**
> + * group_bind_locked() - Bind a group to a group slot
> + * @group: Group.
> + * @csg_id: Slot.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_bind_locked(struct panthor_group *group, u32 csg_id)
> +{
> +       struct panthor_device *ptdev = group->ptdev;
> +       int ret;
> +
> +       lockdep_assert_held(&ptdev->scheduler->lock);
> +
> +       if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
> +                       ptdev->scheduler->csg_slots[csg_id].group))
> +               return -EINVAL;
> +
> +       ret = panthor_vm_active(group->vm);
> +       if (ret)
> +               return ret;
> +
> +       group_get(group);
> +
> +       /* Dummy doorbell allocation: doorbell is assigned to the group and
> +        * all queues use the same doorbell.
> +        *
> +        * TODO: Implement LRU-based doorbell assignment, so the most often
> +        * updated queues get their own doorbell, thus avoiding useless checks
> +        * on queues belonging to the same group that are rarely updated.
> +        */
> +       for (u32 i = 0; i < group->queue_count; i++)
> +               group->queues[i]->doorbell_id = csg_id + 1;
> +
> +       scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +               ptdev->scheduler->csg_slots[csg_id].group = group;
> +               group->csg_id = csg_id;
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * group_unbind_locked() - Unbind a group from a slot.
> + * @group: Group to unbind.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_unbind_locked(struct panthor_group *group)
> +{
> +       struct panthor_device *ptdev = group->ptdev;
> +
> +       lockdep_assert_held(&ptdev->scheduler->lock);
> +
> +       if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
> +               return -EINVAL;
> +
> +       if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
> +               return -EINVAL;
> +
> +       scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +               /* Process all pending IRQs before returning the slot. */
> +               sched_process_csg_irq_locked(ptdev, group->csg_id);
> +               ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> +               group->csg_id = -1;
> +       }
> +
> +       panthor_vm_idle(group->vm);
> +
> +       /* Tiler OOM events will be re-issued next time the group is scheduled. */
> +       atomic_set(&group->tiler_oom, 0);
> +       cancel_work(&group->tiler_oom_work);
> +
> +       for (u32 i = 0; i < group->queue_count; i++)
> +               group->queues[i]->doorbell_id = -1;
> +
> +       group_put(group);
> +       return 0;
> +}
> +
>  static const char *fence_get_driver_name(struct dma_fence *fence)
>  {
>         return "panthor";
> @@ -2396,15 +2398,6 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
>         /* Unbind evicted groups. */
>         for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
>                 list_for_each_entry(group, &ctx->old_groups[prio], run_node) {
> -                       /* This group is gone. Process interrupts to clear
> -                        * any pending interrupts before we start the new
> -                        * group.
> -                        */
> -                       if (group->csg_id >= 0) {
> -                               guard(spinlock_irqsave)(&sched->events_lock);
> -                               sched_process_csg_irq_locked(ptdev, group->csg_id);
> -                       }
> -
>                         group_unbind_locked(group);
>                 }
>         }
> @@ -2988,11 +2981,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>
>                 group_get(group);
>
> -               if (group->csg_id >= 0) {
> -                       guard(spinlock_irqsave)(&sched->events_lock);
> -                       sched_process_csg_irq_locked(ptdev, group->csg_id);
> -               }
> -
>                 group_unbind_locked(group);
>
>                 drm_WARN_ON(&group->ptdev->base, !list_empty(&group->run_node));
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-12 11:37 ` [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
@ 2026-05-12 21:55   ` Chia-I Wu
  2026-05-13  8:42     ` Boris Brezillon
  2026-05-14 14:25   ` Steven Price
  1 sibling, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 21:55 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Rather than assuming an interrupt is always expected for request
> acks, temporarily enable the relevant interrupts when the polling-wait
> failed. This should hopefully reduce the number of interrupts the CPU
> has to process.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
WIth minor comments below, Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> ---
>  drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
>  drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
>  2 files changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 8239a6951569..f5e0ceca4130 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>         glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
>         glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
>
> -       /* Enable interrupts we care about. */
> -       glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> -                                        GLB_PING |
> -                                        GLB_CFG_PROGRESS_TIMER |
> -                                        GLB_CFG_POWEROFF_TIMER |
> -                                        GLB_IDLE_EN |
> -                                        GLB_IDLE;
> -
> -       if (panthor_fw_has_glb_state(ptdev))
> -               glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
> +       /* Enable interrupts for asynchronous events that are not
> +        * triggered by request acks.
> +        */
> +       glb_iface->input->ack_irq_mask = GLB_IDLE;
We should static_assert or & with GLB_EVT_MASK. Same for CSG and CS.

>
>         panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
>                                GLB_IDLE_EN | GLB_COUNTER_EN);
> @@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>   * Return: 0 on success, -ETIMEDOUT otherwise.
>   */
>  static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> -                               wait_queue_head_t *wq,
> -                               u32 req_mask, u32 *acked,
> +                               u32 *ack_irq_mask_ptr, spinlock_t *lock,
> +                               wait_queue_head_t *wq, u32 req_mask, u32 *acked,
>                                 u32 timeout_ms)
>  {
>         u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
> @@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
>         if (!ret)
>                 return 0;
>
> -       if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> -                              msecs_to_jiffies(timeout_ms)))
> +       scoped_guard(spinlock_irqsave, lock)
> +               *ack_irq_mask_ptr |= req_mask;
> +
> +       ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> +                                msecs_to_jiffies(timeout_ms));
> +
> +       scoped_guard(spinlock_irqsave, lock)
> +               *ack_irq_mask_ptr &= ~req_mask;
We should add a comment saying that this is safe because
{GLB,CSG,CS}_REQ_MASK and {GLB,CSG,CS}_EVT_MASK are disjoint, and thus
req_mask and ack_irq_mask are disjoint.
> +
> +       if (ret)
>                 return 0;
>
>         /* Check one last time, in case we were not woken up for some reason. */
> @@ -1369,6 +1371,8 @@ int panthor_fw_glb_wait_acks(struct panthor_device *ptdev,
>
>         return panthor_fw_wait_acks(&glb_iface->input->req,
>                                     &glb_iface->output->ack,
> +                                   &glb_iface->input->ack_irq_mask,
> +                                   &glb_iface->lock,
>                                     &ptdev->fw->req_waitqueue,
>                                     req_mask, acked, timeout_ms);
>  }
> @@ -1395,6 +1399,8 @@ int panthor_fw_csg_wait_acks(struct panthor_device *ptdev, u32 csg_slot,
>
>         ret = panthor_fw_wait_acks(&csg_iface->input->req,
>                                    &csg_iface->output->ack,
> +                                  &csg_iface->input->ack_irq_mask,
> +                                  &csg_iface->lock,
>                                    &ptdev->fw->req_waitqueue,
>                                    req_mask, acked, timeout_ms);
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 6c5ba747ae45..a9124bcc7de6 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1110,7 +1110,7 @@ cs_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
>         cs_iface->input->ringbuf_output = queue->iface.output_fw_va;
>         cs_iface->input->config = CS_CONFIG_PRIORITY(queue->priority) |
>                                   CS_CONFIG_DOORBELL(queue->doorbell_id);
> -       cs_iface->input->ack_irq_mask = ~0;
> +       cs_iface->input->ack_irq_mask = CS_FATAL | CS_FAULT | CS_TILER_OOM;
>         panthor_fw_update_reqs(cs_iface, req,
>                                CS_IDLE_SYNC_WAIT |
>                                CS_IDLE_EMPTY |
> @@ -1378,7 +1378,8 @@ csg_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 priority)
>                 csg_iface->input->protm_suspend_buf = 0;
>         }
>
> -       csg_iface->input->ack_irq_mask = ~0;
> +       csg_iface->input->ack_irq_mask = CSG_SYNC_UPDATE | CSG_IDLE |
> +                                        CSG_PROGRESS_TIMER_EVENT;
>         panthor_fw_toggle_reqs(csg_iface, doorbell_req, doorbell_ack, queue_mask);
>         return 0;
>  }
>
> --
> 2.54.0
>

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Rather than assuming an interrupt is always expected for request
> acks, temporarily enable the relevant interrupts when the polling-wait
> failed. This should hopefully reduce the number of interrupts the CPU
> has to process.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
>  drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
>  2 files changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 8239a6951569..f5e0ceca4130 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>         glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
>         glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
>
> -       /* Enable interrupts we care about. */
> -       glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> -                                        GLB_PING |
> -                                        GLB_CFG_PROGRESS_TIMER |
> -                                        GLB_CFG_POWEROFF_TIMER |
> -                                        GLB_IDLE_EN |
> -                                        GLB_IDLE;
> -
> -       if (panthor_fw_has_glb_state(ptdev))
> -               glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
> +       /* Enable interrupts for asynchronous events that are not
> +        * triggered by request acks.
> +        */
> +       glb_iface->input->ack_irq_mask = GLB_IDLE;
>
>         panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
>                                GLB_IDLE_EN | GLB_COUNTER_EN);
> @@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>   * Return: 0 on success, -ETIMEDOUT otherwise.
>   */
>  static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> -                               wait_queue_head_t *wq,
> -                               u32 req_mask, u32 *acked,
> +                               u32 *ack_irq_mask_ptr, spinlock_t *lock,
> +                               wait_queue_head_t *wq, u32 req_mask, u32 *acked,
>                                 u32 timeout_ms)
>  {
>         u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
> @@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
>         if (!ret)
>                 return 0;
>
> -       if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> -                              msecs_to_jiffies(timeout_ms)))
> +       scoped_guard(spinlock_irqsave, lock)
> +               *ack_irq_mask_ptr |= req_mask;
> +
> +       ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> +                                msecs_to_jiffies(timeout_ms));
> +
> +       scoped_guard(spinlock_irqsave, lock)
> +               *ack_irq_mask_ptr &= ~req_mask;
> +
> +       if (ret)
>                 return 0;
>
>         /* Check one last time, in case we were not woken up for some reason. */
> @@ -1369,6 +1371,8 @@ int panthor_fw_glb_wait_acks(struct panthor_device *ptdev,
>
>         return panthor_fw_wait_acks(&glb_iface->input->req,
>                                     &glb_iface->output->ack,
> +                                   &glb_iface->input->ack_irq_mask,
> +                                   &glb_iface->lock,
>                                     &ptdev->fw->req_waitqueue,
>                                     req_mask, acked, timeout_ms);
>  }
> @@ -1395,6 +1399,8 @@ int panthor_fw_csg_wait_acks(struct panthor_device *ptdev, u32 csg_slot,
>
>         ret = panthor_fw_wait_acks(&csg_iface->input->req,
>                                    &csg_iface->output->ack,
> +                                  &csg_iface->input->ack_irq_mask,
> +                                  &csg_iface->lock,
>                                    &ptdev->fw->req_waitqueue,
>                                    req_mask, acked, timeout_ms);
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 6c5ba747ae45..a9124bcc7de6 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1110,7 +1110,7 @@ cs_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
>         cs_iface->input->ringbuf_output = queue->iface.output_fw_va;
>         cs_iface->input->config = CS_CONFIG_PRIORITY(queue->priority) |
>                                   CS_CONFIG_DOORBELL(queue->doorbell_id);
> -       cs_iface->input->ack_irq_mask = ~0;
> +       cs_iface->input->ack_irq_mask = CS_FATAL | CS_FAULT | CS_TILER_OOM;
>         panthor_fw_update_reqs(cs_iface, req,
>                                CS_IDLE_SYNC_WAIT |
>                                CS_IDLE_EMPTY |
> @@ -1378,7 +1378,8 @@ csg_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 priority)
>                 csg_iface->input->protm_suspend_buf = 0;
>         }
>
> -       csg_iface->input->ack_irq_mask = ~0;
> +       csg_iface->input->ack_irq_mask = CSG_SYNC_UPDATE | CSG_IDLE |
> +                                        CSG_PROGRESS_TIMER_EVENT;
>         panthor_fw_toggle_reqs(csg_iface, doorbell_req, doorbell_ack, queue_mask);
>         return 0;
>  }
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context
  2026-05-12 11:37 ` [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context Boris Brezillon
@ 2026-05-12 22:05   ` Chia-I Wu
  2026-05-12 22:09     ` Chia-I Wu
  2026-05-14 15:23   ` Steven Price
  1 sibling, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 22:05 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Now that everything is set to allow processing FW events in IRQ context,
> go for it. This should reduce the dma_fence signaling latency.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_fw.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index f5e0ceca4130..8cfebf180de7 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1087,9 +1087,29 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
>         }
>  }
>
> -static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> +static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
>  {
> -       return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> +       struct panthor_irq *pirq = data;
> +
> +       if (!gpu_read(pirq->iomem, INT_STAT))
> +               return IRQ_NONE;
> +
> +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +               if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> +                       return IRQ_NONE;
> +
> +               pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> +       }
> +
> +       /* We can use INT_STAT here, because we didn't mask the IRQs. */
> +       panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_STAT));
We should loop here until INT_STAT is cleared.
> +
> +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> +                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +       }
> +
> +       return IRQ_HANDLED;
>  }
>
>  static int panthor_fw_start(struct panthor_device *ptdev)
> @@ -1489,8 +1509,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
>
>         ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>                                   ptdev->iomem + JOB_INT_BASE, "job",
> -                                 panthor_irq_default_raw_handler,
> -                                 panthor_job_irq_threaded_handler);
> +                                 panthor_job_irq_raw_handler, NULL);
>         if (ret) {
>                 drm_err(&ptdev->base, "failed to request job irq");
>                 return ret;
>
> --
> 2.54.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context
  2026-05-12 22:05   ` Chia-I Wu
@ 2026-05-12 22:09     ` Chia-I Wu
  2026-05-13  8:44       ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 22:09 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 3:05 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>
> On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > Now that everything is set to allow processing FW events in IRQ context,
> > go for it. This should reduce the dma_fence signaling latency.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_fw.c | 27 +++++++++++++++++++++++----
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> > index f5e0ceca4130..8cfebf180de7 100644
> > --- a/drivers/gpu/drm/panthor/panthor_fw.c
> > +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> > @@ -1087,9 +1087,29 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
> >         }
> >  }
> >
> > -static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> > +static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
> >  {
> > -       return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> > +       struct panthor_irq *pirq = data;
> > +
> > +       if (!gpu_read(pirq->iomem, INT_STAT))
> > +               return IRQ_NONE;
> > +
> > +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> > +               if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> > +                       return IRQ_NONE;
> > +
> > +               pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> > +       }
> > +
> > +       /* We can use INT_STAT here, because we didn't mask the IRQs. */
> > +       panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_STAT));
> We should loop here until INT_STAT is cleared.
Perhaps not. This is hardirq.

Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> > +
> > +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> > +               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> > +                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> > +       }
> > +
> > +       return IRQ_HANDLED;
> >  }
> >
> >  static int panthor_fw_start(struct panthor_device *ptdev)
> > @@ -1489,8 +1509,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
> >
> >         ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
> >                                   ptdev->iomem + JOB_INT_BASE, "job",
> > -                                 panthor_irq_default_raw_handler,
> > -                                 panthor_job_irq_threaded_handler);
> > +                                 panthor_job_irq_raw_handler, NULL);
> >         if (ret) {
> >                 drm_err(&ptdev->base, "failed to request job irq");
> >                 return ret;
> >
> > --
> > 2.54.0
> >

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context
  2026-05-12 11:50   ` Boris Brezillon
@ 2026-05-12 22:40     ` Chia-I Wu
  2026-05-13  8:54       ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-12 22:40 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 12, 2026 at 5:09 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 May 2026 13:37:41 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
> > The current panthor_gpu_irq_handler() logic is already IRQ-safe
> > (no sleep or sleeping locks, spinlocks taken with irqsave in other
> > contexts, etc), so let's toggle the switch and make it an hard IRQ
> > handler.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_gpu.c | 15 ++++++++-------
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> > index b9c51f8a051d..04c8f23baf3f 100644
> > --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> > @@ -86,10 +86,15 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
> >       gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
> >  }
> >
> > -static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> > +static irqreturn_t panthor_gpu_irq_raw_handler(int irq, void *data)
> >  {
> > +     struct panthor_irq *pirq = data;
> >       struct panthor_device *ptdev = pirq->ptdev;
> >       struct panthor_gpu *gpu = ptdev->gpu;
> > +     u32 status = gpu_read(gpu->irq.iomem, INT_STAT);
> > +
> > +     if (!status)
> > +             return IRQ_NONE;
> >
>
> Forgot to add the pirq state transition here:
>
>        scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
>                if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
>                        return IRQ_NONE;
>
>                pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
>        }
>
> >       gpu_write(gpu->irq.iomem, INT_CLEAR, status);
> >
> > @@ -115,11 +120,8 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> >               ptdev->gpu->pending_reqs &= ~status;
> >               wake_up_all(&ptdev->gpu->reqs_acked);
> >       }
> > -}
> >
> > -static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> > -{
> > -     return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
>
> and restore it here:
>
>        scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
>                if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
>                        pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
>        }
>
It looks like we can get rid of state transitions if
panthor_irq_{enable,disable}_events updates INT_MASK directly when the
handler is not threaded. Hm, we can even make pirq->state atomic again
to get rid of locking.

> > +     return IRQ_HANDLED;
> >  }
> >
> >  /**
> > @@ -176,8 +178,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
> >       ret = panthor_irq_request(ptdev, &ptdev->gpu->irq, irq,
> >                                 GPU_INTERRUPTS_MASK,
> >                                 ptdev->iomem + GPU_INT_BASE, "gpu",
> > -                               panthor_irq_default_raw_handler,
> > -                               panthor_gpu_irq_threaded_handler);
> > +                               panthor_gpu_irq_raw_handler, NULL);
> >       if (ret)
> >               return ret;
> >
> >
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-05-12 18:58   ` Chia-I Wu
@ 2026-05-13  8:03     ` Boris Brezillon
  2026-05-13 16:46       ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13  8:03 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 12 May 2026 11:58:30 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > Now that panthor_irq contains the iomem region, there's no real need
> > for the macro-based panthor_irq helper generation logic. We can just
> > provide inline helpers that do the same and let the compiler optimize
> > indirect function calls. The only extra annoyance is the fact we have
> > to open-code the panthor_xxx_irq_threaded_handler() implementation, but
> > those are single-line functions, so it's acceptable.  
> We might want to __always_inline panthor_irq_default_threaded_handler.

Yep, I can flag it __always_inline, but I'd be surprised if the
compiler wasn't always inlining anyway, unless you use more exotic
optimization options, like -Os (not even sure that would be the case
with -Os, I didn't check), at which point it becomes a user decision,
and not inlining is probably fine.

> For the rest, do we want to un-inline them?

Most of them are super trivial, and I think there's benefit in having
them inlined. Again, because it's not __always_inline, the compiler is
still free to unline, but at least we wouldn't resort to LTO for this
sort of inlining optimization. So I'm still tempted to keep it as
static inline helpers defined in the header file, unless you a strong
reason to think this is a bad idea.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-12 19:11   ` Chia-I Wu
@ 2026-05-13  8:09     ` Boris Brezillon
  2026-05-13 17:06       ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13  8:09 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 12 May 2026 12:11:08 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > All drivers except panthor signal their fences from their interrupt
> > handler to minimize latency. We could do the same from the threaded
> > handler, but the latency is still quite high in that case, so let's
> > allow components to choose the context they want their IRQ handler
> > to run in by exposing support for custom hard handlers.
> >
> > Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> > Reviewed-by: Steven Price <steven.price@arm.com>
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
> >  drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
> >  drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
> >  drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
> >  drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
> >  5 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index 393fcda73d88..1aaf06df875b 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
> >  static inline int
> >  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> >                     int irq, u32 mask, void __iomem *iomem, const char *name,
> > +                   irqreturn_t (*raw_handler)(int, void *data),
> >                     irqreturn_t (*threaded_handler)(int, void *data))
> >  {
> >         const char *full_name;
> > @@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> >                 return -ENOMEM;
> >
> >         panthor_irq_resume(pirq);
> > -       return devm_request_threaded_irq(ptdev->base.dev, irq,
> > -                                        panthor_irq_default_raw_handler,
> > -                                        threaded_handler,
> > +
> > +       if (!threaded_handler) {
> > +               return devm_request_irq(ptdev->base.dev, irq, raw_handler,
> > +                                       IRQF_SHARED, full_name, pirq);
> > +       }  
> devm_request_irq expands to devm_request_threaded_irq plus
> IRQF_COND_ONESHOT. This appears redundant.

I considered going for devm_request_threaded_irq(COND_ONESHOT), but I
thought it was easier to reason about with a regular devm_request_irq()
and an extra conditional since request_irq() is what people tend
to use when they just have a hard handler (see [1], there's just one
driver using it, and it's not even needed, because it's calling
devm_request_irq() which adds this flag already)

[1]https://elixir.bootlin.com/linux/v7.1-rc3/A/ident/IRQF_COND_ONESHOT

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-12 21:04   ` Chia-I Wu
@ 2026-05-13  8:29     ` Boris Brezillon
  2026-05-13 17:47       ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13  8:29 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 12 May 2026 14:04:43 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 5:14 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > Add a specific spinlock for events processing, and force processing
> > of events in the panthor_sched_report_fw_events() path rather than
> > deferring it to a work item. We also fast-track fence signalling by
> > making the job completion logic IRQ-safe.
> >
> > Note that it requires changing a couple spin_lock() into
> > spin_lock_irqsave() when those are taken inside a events_lock section.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_sched.c | 332 +++++++++++++++-----------------
> >  1 file changed, 155 insertions(+), 177 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index 5b34032deff8..fbf76b59b7ef 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -177,18 +177,6 @@ struct panthor_scheduler {
> >          */
> >         struct work_struct sync_upd_work;
> >
> > -       /**
> > -        * @fw_events_work: Work used to process FW events outside the interrupt path.
> > -        *
> > -        * Even if the interrupt is threaded, we need any event processing
> > -        * that require taking the panthor_scheduler::lock to be processed
> > -        * outside the interrupt path so we don't block the tick logic when
> > -        * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
> > -        * event processing requires taking this lock, we just delegate all
> > -        * FW event processing to the scheduler workqueue.
> > -        */
> > -       struct work_struct fw_events_work;
> > -
> >         /**
> >          * @fw_events: Bitmask encoding pending FW events.
> >          */  
> If we process all fw events in the irq context, we can remove
> fw_events as well. More on this below.

Oops, forgot to remove this field, indeed.

> > @@ -254,6 +242,15 @@ struct panthor_scheduler {
> >                 struct list_head waiting;
> >         } groups;
> >
> > +       /**
> > +        * @events_lock: Lock taken when processing events.
> > +        *
> > +        * This also needs to be taken when csg_slots are updated, to make sure
> > +        * the event processing logic doesn't touch groups that have left the CSG
> > +        * slot.
> > +        */
> > +       spinlock_t events_lock;
> > +
> >         /**
> >          * @csg_slots: FW command stream group slots.  
> It looks like read access can use either lock (process context) or
> events_lock (irq context), while write access must use events_lock
> (process context). Can we put that into the comment, or if makes
> sense, enforce that with accessor functions?

You're right. I'll mention that updates to csg_slots[] must be done
with both the ::lock and ::events_lock held, while reads can be done
with any of them held.

> 
> 
> >          */
> > @@ -676,9 +673,6 @@ struct panthor_group {
> >          */
> >         struct panthor_kernel_bo *protm_suspend_buf;
> >
> > -       /** @sync_upd_work: Work used to check/signal job fences. */
> > -       struct work_struct sync_upd_work;
> > -  
> Can we make this a preparatory commit, where group_sync_upd_work is
> replaced by group_check_job_completion?

I'll try to split that up.

> 
> Multiple things happen in this commit. I try to identify things that
> can be separate commits. If this does not make sense, feel free to
> ignore.
> 
> >         /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
> >         struct work_struct tiler_oom_work;
> >

[...]

> >  /**
> >   * panthor_sched_report_fw_events() - Report FW events to the scheduler.
> >   * @ptdev: Device.
> > @@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)  
> This can be renamed to panthor_sched_handle_fw_events.

It's not quite handling events though. For most of them, it's really
just deferring the processing to work items, SYNC_UPDATE is the
exception.

> 
> >         if (!ptdev->scheduler)
> >                 return;
> >
> > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > -       sched_queue_work(ptdev->scheduler, fw_events);
> > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > +
> > +       if (events & JOB_INT_GLOBAL_IF) {
> > +               sched_process_global_irq_locked(ptdev);
> > +               events &= ~JOB_INT_GLOBAL_IF;
> > +       }
> > +
> > +       while (events) {
> > +               u32 csg_id = ffs(events) - 1;
> > +
> > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > +               events &= ~BIT(csg_id);
> > +       }  
> This handles all fw events in the irq context. Are there concerns that
> it may take too long? I might be wrong, but it seems possible to
> handle only CSG_SYNC_UPDATE and defer the rest as before.

I started with just the SYNC_UPDATE processing done in the hard-irq
context, but after auditing the other stuff done in the handler, I
realized it's basically just deferring all actual processing to work
items. Yes, there's the overhead of demuxing the events from the
ack/req regs, but part of this is already done to get to SYNC_UPDATE
anyway, so at this point we're probably better off demuxing everything
and scheduling works for all kind of events.

I also compared the perfs between the two approaches (though I didn't
do as much testing as I did with the new version, so I might have
missed something), and it didn't seem to matter at all, because the
interrupts we receive the most are SYNC_UPDATE and IDLE events, and
those are at the same level.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-12 21:55   ` Chia-I Wu
@ 2026-05-13  8:42     ` Boris Brezillon
  2026-05-13 17:14       ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13  8:42 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 12 May 2026 14:55:30 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > Rather than assuming an interrupt is always expected for request
> > acks, temporarily enable the relevant interrupts when the polling-wait
> > failed. This should hopefully reduce the number of interrupts the CPU
> > has to process.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> WIth minor comments below, Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
> >  drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
> >  2 files changed, 23 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> > index 8239a6951569..f5e0ceca4130 100644
> > --- a/drivers/gpu/drm/panthor/panthor_fw.c
> > +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> > @@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
> >         glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
> >         glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
> >
> > -       /* Enable interrupts we care about. */
> > -       glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> > -                                        GLB_PING |
> > -                                        GLB_CFG_PROGRESS_TIMER |
> > -                                        GLB_CFG_POWEROFF_TIMER |
> > -                                        GLB_IDLE_EN |
> > -                                        GLB_IDLE;
> > -
> > -       if (panthor_fw_has_glb_state(ptdev))
> > -               glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
> > +       /* Enable interrupts for asynchronous events that are not
> > +        * triggered by request acks.
> > +        */
> > +       glb_iface->input->ack_irq_mask = GLB_IDLE;  
> We should static_assert or & with GLB_EVT_MASK. Same for CSG and CS.

Yep, good idea, I'll add a static_assert() in all places where
->ack_irq_mask is set.

> 
> >
> >         panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
> >                                GLB_IDLE_EN | GLB_COUNTER_EN);
> > @@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
> >   * Return: 0 on success, -ETIMEDOUT otherwise.
> >   */
> >  static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> > -                               wait_queue_head_t *wq,
> > -                               u32 req_mask, u32 *acked,
> > +                               u32 *ack_irq_mask_ptr, spinlock_t *lock,
> > +                               wait_queue_head_t *wq, u32 req_mask, u32 *acked,
> >                                 u32 timeout_ms)
> >  {
> >         u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
> > @@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> >         if (!ret)
> >                 return 0;
> >
> > -       if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> > -                              msecs_to_jiffies(timeout_ms)))
> > +       scoped_guard(spinlock_irqsave, lock)
> > +               *ack_irq_mask_ptr |= req_mask;
> > +
> > +       ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> > +                                msecs_to_jiffies(timeout_ms));
> > +
> > +       scoped_guard(spinlock_irqsave, lock)
> > +               *ack_irq_mask_ptr &= ~req_mask;  
> We should add a comment saying that this is safe because
> {GLB,CSG,CS}_REQ_MASK and {GLB,CSG,CS}_EVT_MASK are disjoint, and thus
> req_mask and ack_irq_mask are disjoint.

You mean the ack_irq_mask set at init time? Because
xxx_iface->input->ack_irq_mask is moving target now.

Well, if we expand on safety matters, I'd say none of this is safe
since it relies on the caller knowing what it does and passing a valid
req_mask. But I'll add a comment mentioning that the original
ack_irq_mask shouldn't intersect with any of the bits that might be set
in req_mask (that's basically the static_assert() you suggested).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context
  2026-05-12 22:09     ` Chia-I Wu
@ 2026-05-13  8:44       ` Boris Brezillon
  0 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13  8:44 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 12 May 2026 15:09:47 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 3:05 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:  
> > >
> > > Now that everything is set to allow processing FW events in IRQ context,
> > > go for it. This should reduce the dma_fence signaling latency.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_fw.c | 27 +++++++++++++++++++++++----
> > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> > > index f5e0ceca4130..8cfebf180de7 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_fw.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> > > @@ -1087,9 +1087,29 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
> > >         }
> > >  }
> > >
> > > -static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> > > +static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
> > >  {
> > > -       return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> > > +       struct panthor_irq *pirq = data;
> > > +
> > > +       if (!gpu_read(pirq->iomem, INT_STAT))
> > > +               return IRQ_NONE;
> > > +
> > > +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> > > +               if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> > > +                       return IRQ_NONE;
> > > +
> > > +               pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> > > +       }
> > > +
> > > +       /* We can use INT_STAT here, because we didn't mask the IRQs. */
> > > +       panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_STAT));  
> > We should loop here until INT_STAT is cleared.  
> Perhaps not. This is hardirq.

Yep, the absence of loop is intentional here. We process it one at a
time, and give other interrupt handlers a chance to do their stuff
before we get called again.

> 
> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> > > +
> > > +       scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> > > +               if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> > > +                       pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> > > +       }
> > > +
> > > +       return IRQ_HANDLED;
> > >  }
> > >
> > >  static int panthor_fw_start(struct panthor_device *ptdev)
> > > @@ -1489,8 +1509,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
> > >
> > >         ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
> > >                                   ptdev->iomem + JOB_INT_BASE, "job",
> > > -                                 panthor_irq_default_raw_handler,
> > > -                                 panthor_job_irq_threaded_handler);
> > > +                                 panthor_job_irq_raw_handler, NULL);
> > >         if (ret) {
> > >                 drm_err(&ptdev->base, "failed to request job irq");
> > >                 return ret;
> > >
> > > --
> > > 2.54.0
> > >  


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context
  2026-05-12 22:40     ` Chia-I Wu
@ 2026-05-13  8:54       ` Boris Brezillon
  2026-05-13 18:07         ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13  8:54 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 12 May 2026 15:40:41 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 5:09 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Tue, 12 May 2026 13:37:41 +0200
> > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> >  
> > > The current panthor_gpu_irq_handler() logic is already IRQ-safe
> > > (no sleep or sleeping locks, spinlocks taken with irqsave in other
> > > contexts, etc), so let's toggle the switch and make it an hard IRQ
> > > handler.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_gpu.c | 15 ++++++++-------
> > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> > > index b9c51f8a051d..04c8f23baf3f 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> > > @@ -86,10 +86,15 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
> > >       gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
> > >  }
> > >
> > > -static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> > > +static irqreturn_t panthor_gpu_irq_raw_handler(int irq, void *data)
> > >  {
> > > +     struct panthor_irq *pirq = data;
> > >       struct panthor_device *ptdev = pirq->ptdev;
> > >       struct panthor_gpu *gpu = ptdev->gpu;
> > > +     u32 status = gpu_read(gpu->irq.iomem, INT_STAT);
> > > +
> > > +     if (!status)
> > > +             return IRQ_NONE;
> > >  
> >
> > Forgot to add the pirq state transition here:
> >
> >        scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> >                if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> >                        return IRQ_NONE;
> >
> >                pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> >        }
> >  
> > >       gpu_write(gpu->irq.iomem, INT_CLEAR, status);
> > >
> > > @@ -115,11 +120,8 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> > >               ptdev->gpu->pending_reqs &= ~status;
> > >               wake_up_all(&ptdev->gpu->reqs_acked);
> > >       }
> > > -}
> > >
> > > -static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> > > -{
> > > -     return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);  
> >
> > and restore it here:
> >
> >        scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> >                if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> >                        pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> >        }
> >  
> It looks like we can get rid of state transitions if
> panthor_irq_{enable,disable}_events updates INT_MASK directly when the
> handler is not threaded.

Hm, this would add some conditionals to
panthor_irq_{enable,disable}_events() and it makes the whole thing even
harder to reason about, because now it's different depending on whether
this is a threaded handler or not.

> Hm, we can even make pirq->state atomic again
> to get rid of locking.

I'd say, if we really want to optimize that, we do it in a follow-up
series. And I'd rather have an attempt at turning the MMU handler into a
hard handler (which implies selecting what we process immediately and
what we defer) than adding conditionals to irq_enabled/disable_events.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers
  2026-05-13  8:03     ` Boris Brezillon
@ 2026-05-13 16:46       ` Chia-I Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-13 16:46 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 13, 2026 at 1:03 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 May 2026 11:58:30 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > Now that panthor_irq contains the iomem region, there's no real need
> > > for the macro-based panthor_irq helper generation logic. We can just
> > > provide inline helpers that do the same and let the compiler optimize
> > > indirect function calls. The only extra annoyance is the fact we have
> > > to open-code the panthor_xxx_irq_threaded_handler() implementation, but
> > > those are single-line functions, so it's acceptable.
> > We might want to __always_inline panthor_irq_default_threaded_handler.
>
> Yep, I can flag it __always_inline, but I'd be surprised if the
> compiler wasn't always inlining anyway, unless you use more exotic
> optimization options, like -Os (not even sure that would be the case
> with -Os, I didn't check), at which point it becomes a user decision,
> and not inlining is probably fine.
>
> > For the rest, do we want to un-inline them?
>
> Most of them are super trivial, and I think there's benefit in having
> them inlined. Again, because it's not __always_inline, the compiler is
> still free to unline, but at least we wouldn't resort to LTO for this
> sort of inlining optimization. So I'm still tempted to keep it as
> static inline helpers defined in the header file, unless you a strong
> reason to think this is a bad idea.
No, no strong reason just that the rest are not hot enough to inline.
But there is no harm to inline so

Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-13  8:09     ` Boris Brezillon
@ 2026-05-13 17:06       ` Chia-I Wu
  2026-05-13 17:30         ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-13 17:06 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 13, 2026 at 1:09 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 May 2026 12:11:08 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > All drivers except panthor signal their fences from their interrupt
> > > handler to minimize latency. We could do the same from the threaded
> > > handler, but the latency is still quite high in that case, so let's
> > > allow components to choose the context they want their IRQ handler
> > > to run in by exposing support for custom hard handlers.
> > >
> > > Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
> > >  drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
> > >  drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
> > >  drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
> > >  drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
> > >  5 files changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > index 393fcda73d88..1aaf06df875b 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
> > >  static inline int
> > >  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > >                     int irq, u32 mask, void __iomem *iomem, const char *name,
> > > +                   irqreturn_t (*raw_handler)(int, void *data),
> > >                     irqreturn_t (*threaded_handler)(int, void *data))
> > >  {
> > >         const char *full_name;
> > > @@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > >                 return -ENOMEM;
> > >
> > >         panthor_irq_resume(pirq);
> > > -       return devm_request_threaded_irq(ptdev->base.dev, irq,
> > > -                                        panthor_irq_default_raw_handler,
> > > -                                        threaded_handler,
> > > +
> > > +       if (!threaded_handler) {
> > > +               return devm_request_irq(ptdev->base.dev, irq, raw_handler,
> > > +                                       IRQF_SHARED, full_name, pirq);
> > > +       }
> > devm_request_irq expands to devm_request_threaded_irq plus
> > IRQF_COND_ONESHOT. This appears redundant.
>
> I considered going for devm_request_threaded_irq(COND_ONESHOT), but I
> thought it was easier to reason about with a regular devm_request_irq()
> and an extra conditional since request_irq() is what people tend
> to use when they just have a hard handler (see [1], there's just one
> driver using it, and it's not even needed, because it's calling
> devm_request_irq() which adds this flag already)
It is unclear to me why the current version wants IRQF_COND_ONESHOT in
one case but not in another. Can't we call devm_request_threaded_irq
without IRQF_COND_ONESHOT for both cases?

>
> [1]https://elixir.bootlin.com/linux/v7.1-rc3/A/ident/IRQF_COND_ONESHOT

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-13  8:42     ` Boris Brezillon
@ 2026-05-13 17:14       ` Chia-I Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-13 17:14 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 13, 2026 at 1:42 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 May 2026 14:55:30 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > Rather than assuming an interrupt is always expected for request
> > > acks, temporarily enable the relevant interrupts when the polling-wait
> > > failed. This should hopefully reduce the number of interrupts the CPU
> > > has to process.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > WIth minor comments below, Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
> > >  drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
> > >  2 files changed, 23 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> > > index 8239a6951569..f5e0ceca4130 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_fw.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> > > @@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
> > >         glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
> > >         glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
> > >
> > > -       /* Enable interrupts we care about. */
> > > -       glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> > > -                                        GLB_PING |
> > > -                                        GLB_CFG_PROGRESS_TIMER |
> > > -                                        GLB_CFG_POWEROFF_TIMER |
> > > -                                        GLB_IDLE_EN |
> > > -                                        GLB_IDLE;
> > > -
> > > -       if (panthor_fw_has_glb_state(ptdev))
> > > -               glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
> > > +       /* Enable interrupts for asynchronous events that are not
> > > +        * triggered by request acks.
> > > +        */
> > > +       glb_iface->input->ack_irq_mask = GLB_IDLE;
> > We should static_assert or & with GLB_EVT_MASK. Same for CSG and CS.
>
> Yep, good idea, I'll add a static_assert() in all places where
> ->ack_irq_mask is set.
>
> >
> > >
> > >         panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
> > >                                GLB_IDLE_EN | GLB_COUNTER_EN);
> > > @@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
> > >   * Return: 0 on success, -ETIMEDOUT otherwise.
> > >   */
> > >  static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> > > -                               wait_queue_head_t *wq,
> > > -                               u32 req_mask, u32 *acked,
> > > +                               u32 *ack_irq_mask_ptr, spinlock_t *lock,
> > > +                               wait_queue_head_t *wq, u32 req_mask, u32 *acked,
> > >                                 u32 timeout_ms)
> > >  {
> > >         u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
> > > @@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> > >         if (!ret)
> > >                 return 0;
> > >
> > > -       if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> > > -                              msecs_to_jiffies(timeout_ms)))
> > > +       scoped_guard(spinlock_irqsave, lock)
> > > +               *ack_irq_mask_ptr |= req_mask;
> > > +
> > > +       ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> > > +                                msecs_to_jiffies(timeout_ms));
> > > +
> > > +       scoped_guard(spinlock_irqsave, lock)
> > > +               *ack_irq_mask_ptr &= ~req_mask;
> > We should add a comment saying that this is safe because
> > {GLB,CSG,CS}_REQ_MASK and {GLB,CSG,CS}_EVT_MASK are disjoint, and thus
> > req_mask and ack_irq_mask are disjoint.
>
> You mean the ack_irq_mask set at init time? Because
> xxx_iface->input->ack_irq_mask is moving target now.
>
> Well, if we expand on safety matters, I'd say none of this is safe
> since it relies on the caller knowing what it does and passing a valid
> req_mask. But I'll add a comment mentioning that the original
> ack_irq_mask shouldn't intersect with any of the bits that might be set
> in req_mask (that's basically the static_assert() you suggested).
The callers of panthor_fw_wait_acks validate req_mask. With a
static_assert on ack_irq_mask at init, we are sure the two masks are
disjoint upon entry. We just need a comment explaining why
save-and-restore is unnecessary.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-13 17:06       ` Chia-I Wu
@ 2026-05-13 17:30         ` Boris Brezillon
  2026-05-13 18:17           ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-13 17:30 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, 13 May 2026 10:06:14 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Wed, May 13, 2026 at 1:09 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Tue, 12 May 2026 12:11:08 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >  
> > > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:  
> > > >
> > > > All drivers except panthor signal their fences from their interrupt
> > > > handler to minimize latency. We could do the same from the threaded
> > > > handler, but the latency is still quite high in that case, so let's
> > > > allow components to choose the context they want their IRQ handler
> > > > to run in by exposing support for custom hard handlers.
> > > >
> > > > Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > ---
> > > >  drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
> > > >  drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
> > > >  drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
> > > >  drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
> > > >  drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
> > > >  5 files changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > > index 393fcda73d88..1aaf06df875b 100644
> > > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > > @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
> > > >  static inline int
> > > >  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > > >                     int irq, u32 mask, void __iomem *iomem, const char *name,
> > > > +                   irqreturn_t (*raw_handler)(int, void *data),
> > > >                     irqreturn_t (*threaded_handler)(int, void *data))
> > > >  {
> > > >         const char *full_name;
> > > > @@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > > >                 return -ENOMEM;
> > > >
> > > >         panthor_irq_resume(pirq);
> > > > -       return devm_request_threaded_irq(ptdev->base.dev, irq,
> > > > -                                        panthor_irq_default_raw_handler,
> > > > -                                        threaded_handler,
> > > > +
> > > > +       if (!threaded_handler) {
> > > > +               return devm_request_irq(ptdev->base.dev, irq, raw_handler,
> > > > +                                       IRQF_SHARED, full_name, pirq);
> > > > +       }  
> > > devm_request_irq expands to devm_request_threaded_irq plus
> > > IRQF_COND_ONESHOT. This appears redundant.  
> >
> > I considered going for devm_request_threaded_irq(COND_ONESHOT), but I
> > thought it was easier to reason about with a regular devm_request_irq()
> > and an extra conditional since request_irq() is what people tend
> > to use when they just have a hard handler (see [1], there's just one
> > driver using it, and it's not even needed, because it's calling
> > devm_request_irq() which adds this flag already)  
> It is unclear to me why the current version wants IRQF_COND_ONESHOT in
> one case but not in another. Can't we call devm_request_threaded_irq
> without IRQF_COND_ONESHOT for both cases?

Hm, I thought this had to do with the automatic hard -> threaded
downgrade happening when RT is enabled, but I fail to see why it
matters, since all irqaction end up with IRQF_ONESHOT in that case
anyway. Honestly, I'm tempted to stay on the safe side, and have
devm_request_irq() called when we just have a hard handler, because I'm
sure there's a reason for this COND_ONESHOT flag.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-13  8:29     ` Boris Brezillon
@ 2026-05-13 17:47       ` Chia-I Wu
  2026-05-18 13:45         ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-13 17:47 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 13, 2026 at 1:29 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 May 2026 14:04:43 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 12, 2026 at 5:14 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > Add a specific spinlock for events processing, and force processing
> > > of events in the panthor_sched_report_fw_events() path rather than
> > > deferring it to a work item. We also fast-track fence signalling by
> > > making the job completion logic IRQ-safe.
> > >
> > > Note that it requires changing a couple spin_lock() into
> > > spin_lock_irqsave() when those are taken inside a events_lock section.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_sched.c | 332 +++++++++++++++-----------------
> > >  1 file changed, 155 insertions(+), 177 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > > index 5b34032deff8..fbf76b59b7ef 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > > @@ -177,18 +177,6 @@ struct panthor_scheduler {
> > >          */
> > >         struct work_struct sync_upd_work;
> > >
> > > -       /**
> > > -        * @fw_events_work: Work used to process FW events outside the interrupt path.
> > > -        *
> > > -        * Even if the interrupt is threaded, we need any event processing
> > > -        * that require taking the panthor_scheduler::lock to be processed
> > > -        * outside the interrupt path so we don't block the tick logic when
> > > -        * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
> > > -        * event processing requires taking this lock, we just delegate all
> > > -        * FW event processing to the scheduler workqueue.
> > > -        */
> > > -       struct work_struct fw_events_work;
> > > -
> > >         /**
> > >          * @fw_events: Bitmask encoding pending FW events.
> > >          */
> > If we process all fw events in the irq context, we can remove
> > fw_events as well. More on this below.
>
> Oops, forgot to remove this field, indeed.
>
> > > @@ -254,6 +242,15 @@ struct panthor_scheduler {
> > >                 struct list_head waiting;
> > >         } groups;
> > >
> > > +       /**
> > > +        * @events_lock: Lock taken when processing events.
> > > +        *
> > > +        * This also needs to be taken when csg_slots are updated, to make sure
> > > +        * the event processing logic doesn't touch groups that have left the CSG
> > > +        * slot.
> > > +        */
> > > +       spinlock_t events_lock;
> > > +
> > >         /**
> > >          * @csg_slots: FW command stream group slots.
> > It looks like read access can use either lock (process context) or
> > events_lock (irq context), while write access must use events_lock
> > (process context). Can we put that into the comment, or if makes
> > sense, enforce that with accessor functions?
>
> You're right. I'll mention that updates to csg_slots[] must be done
> with both the ::lock and ::events_lock held, while reads can be done
> with any of them held.
>
> >
> >
> > >          */
> > > @@ -676,9 +673,6 @@ struct panthor_group {
> > >          */
> > >         struct panthor_kernel_bo *protm_suspend_buf;
> > >
> > > -       /** @sync_upd_work: Work used to check/signal job fences. */
> > > -       struct work_struct sync_upd_work;
> > > -
> > Can we make this a preparatory commit, where group_sync_upd_work is
> > replaced by group_check_job_completion?
>
> I'll try to split that up.
>
> >
> > Multiple things happen in this commit. I try to identify things that
> > can be separate commits. If this does not make sense, feel free to
> > ignore.
> >
> > >         /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
> > >         struct work_struct tiler_oom_work;
> > >
>
> [...]
>
> > >  /**
> > >   * panthor_sched_report_fw_events() - Report FW events to the scheduler.
> > >   * @ptdev: Device.
> > > @@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
> > This can be renamed to panthor_sched_handle_fw_events.
>
> It's not quite handling events though. For most of them, it's really
> just deferring the processing to work items, SYNC_UPDATE is the
> exception.
panthor_sched_report_fw_events no longer just queues
process_fw_events_work. It processes fw events immediately. If
"handle" is not the right verb, perhaps we can go with "process".


>
> >
> > >         if (!ptdev->scheduler)
> > >                 return;
> > >
> > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > +
> > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > +               sched_process_global_irq_locked(ptdev);
> > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > +       }
> > > +
> > > +       while (events) {
> > > +               u32 csg_id = ffs(events) - 1;
> > > +
> > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > +               events &= ~BIT(csg_id);
> > > +       }
> > This handles all fw events in the irq context. Are there concerns that
> > it may take too long? I might be wrong, but it seems possible to
> > handle only CSG_SYNC_UPDATE and defer the rest as before.
>
> I started with just the SYNC_UPDATE processing done in the hard-irq
> context, but after auditing the other stuff done in the handler, I
> realized it's basically just deferring all actual processing to work
> items. Yes, there's the overhead of demuxing the events from the
> ack/req regs, but part of this is already done to get to SYNC_UPDATE
> anyway, so at this point we're probably better off demuxing everything
> and scheduling works for all kind of events.
>
> I also compared the perfs between the two approaches (though I didn't
> do as much testing as I did with the new version, so I might have
> missed something), and it didn't seem to matter at all, because the
> interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> those are at the same level.
Looking at ftrace irq events, when there is one active csg,
panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).

I don't have a good sense if that's considered normal in hardirq. But
if that is ever an issue, and if the majority of the time is spent in
CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
processing to threaded handler.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context
  2026-05-13  8:54       ` Boris Brezillon
@ 2026-05-13 18:07         ` Chia-I Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Chia-I Wu @ 2026-05-13 18:07 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 13, 2026 at 1:54 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 May 2026 15:40:41 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 12, 2026 at 5:09 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > On Tue, 12 May 2026 13:37:41 +0200
> > > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > >
> > > > The current panthor_gpu_irq_handler() logic is already IRQ-safe
> > > > (no sleep or sleeping locks, spinlocks taken with irqsave in other
> > > > contexts, etc), so let's toggle the switch and make it an hard IRQ
> > > > handler.
> > > >
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > ---
> > > >  drivers/gpu/drm/panthor/panthor_gpu.c | 15 ++++++++-------
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> > > > index b9c51f8a051d..04c8f23baf3f 100644
> > > > --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> > > > +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> > > > @@ -86,10 +86,15 @@ static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
> > > >       gpu_write(gpu->iomem, GPU_L2_CONFIG, l2_config);
> > > >  }
> > > >
> > > > -static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> > > > +static irqreturn_t panthor_gpu_irq_raw_handler(int irq, void *data)
> > > >  {
> > > > +     struct panthor_irq *pirq = data;
> > > >       struct panthor_device *ptdev = pirq->ptdev;
> > > >       struct panthor_gpu *gpu = ptdev->gpu;
> > > > +     u32 status = gpu_read(gpu->irq.iomem, INT_STAT);
> > > > +
> > > > +     if (!status)
> > > > +             return IRQ_NONE;
> > > >
> > >
> > > Forgot to add the pirq state transition here:
> > >
> > >        scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> > >                if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> > >                        return IRQ_NONE;
> > >
> > >                pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> > >        }
> > >
> > > >       gpu_write(gpu->irq.iomem, INT_CLEAR, status);
> > > >
> > > > @@ -115,11 +120,8 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> > > >               ptdev->gpu->pending_reqs &= ~status;
> > > >               wake_up_all(&ptdev->gpu->reqs_acked);
> > > >       }
> > > > -}
> > > >
> > > > -static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> > > > -{
> > > > -     return panthor_irq_default_threaded_handler(data, panthor_gpu_irq_handler);
> > >
> > > and restore it here:
> > >
> > >        scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> > >                if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> > >                        pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> > >        }
> > >
> > It looks like we can get rid of state transitions if
> > panthor_irq_{enable,disable}_events updates INT_MASK directly when the
> > handler is not threaded.
>
> Hm, this would add some conditionals to
> panthor_irq_{enable,disable}_events() and it makes the whole thing even
> harder to reason about, because now it's different depending on whether
> this is a threaded handler or not.
The difference comes from the default raw handler clears INT_MASK
while the custom raw handler does not. That's how one interacts with
{enable,disable}_events and the other does not.

I think it makes sense to move work from hot raw handlers to cold
{enable,disable}_events in the event. It can certainly be a follow-up
series because the benefit is insignificant compared to this series :)

>
> > Hm, we can even make pirq->state atomic again
> > to get rid of locking.
>
> I'd say, if we really want to optimize that, we do it in a follow-up
> series. And I'd rather have an attempt at turning the MMU handler into a
> hard handler (which implies selecting what we process immediately and
> what we defer) than adding conditionals to irq_enabled/disable_events.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-13 17:30         ` Boris Brezillon
@ 2026-05-13 18:17           ` Chia-I Wu
  2026-05-18 11:54             ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-13 18:17 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 13, 2026 at 10:30 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Wed, 13 May 2026 10:06:14 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Wed, May 13, 2026 at 1:09 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > On Tue, 12 May 2026 12:11:08 -0700
> > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > >
> > > > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > > > <boris.brezillon@collabora.com> wrote:
> > > > >
> > > > > All drivers except panthor signal their fences from their interrupt
> > > > > handler to minimize latency. We could do the same from the threaded
> > > > > handler, but the latency is still quite high in that case, so let's
> > > > > allow components to choose the context they want their IRQ handler
> > > > > to run in by exposing support for custom hard handlers.
> > > > >
> > > > > Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > ---
> > > > >  drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
> > > > >  drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
> > > > >  drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
> > > > >  drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
> > > > >  drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
> > > > >  5 files changed, 12 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > > > index 393fcda73d88..1aaf06df875b 100644
> > > > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > > > @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
> > > > >  static inline int
> > > > >  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > > > >                     int irq, u32 mask, void __iomem *iomem, const char *name,
> > > > > +                   irqreturn_t (*raw_handler)(int, void *data),
> > > > >                     irqreturn_t (*threaded_handler)(int, void *data))
> > > > >  {
> > > > >         const char *full_name;
> > > > > @@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > > > >                 return -ENOMEM;
> > > > >
> > > > >         panthor_irq_resume(pirq);
> > > > > -       return devm_request_threaded_irq(ptdev->base.dev, irq,
> > > > > -                                        panthor_irq_default_raw_handler,
> > > > > -                                        threaded_handler,
> > > > > +
> > > > > +       if (!threaded_handler) {
> > > > > +               return devm_request_irq(ptdev->base.dev, irq, raw_handler,
> > > > > +                                       IRQF_SHARED, full_name, pirq);
> > > > > +       }
> > > > devm_request_irq expands to devm_request_threaded_irq plus
> > > > IRQF_COND_ONESHOT. This appears redundant.
> > >
> > > I considered going for devm_request_threaded_irq(COND_ONESHOT), but I
> > > thought it was easier to reason about with a regular devm_request_irq()
> > > and an extra conditional since request_irq() is what people tend
> > > to use when they just have a hard handler (see [1], there's just one
> > > driver using it, and it's not even needed, because it's calling
> > > devm_request_irq() which adds this flag already)
> > It is unclear to me why the current version wants IRQF_COND_ONESHOT in
> > one case but not in another. Can't we call devm_request_threaded_irq
> > without IRQF_COND_ONESHOT for both cases?
>
> Hm, I thought this had to do with the automatic hard -> threaded
> downgrade happening when RT is enabled, but I fail to see why it
> matters, since all irqaction end up with IRQF_ONESHOT in that case
> anyway. Honestly, I'm tempted to stay on the safe side, and have
> devm_request_irq() called when we just have a hard handler, because I'm
> sure there's a reason for this COND_ONESHOT flag.
Sounds good. Feel free to add Reviewed-by: Chia-I Wu <olvaffe@gmail.com>.

FWIW, I think this commit explains the motivation

  commit c37927a203fa283950f6045602b9f71328ad786c
  Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  Date:   Thu Jul 11 12:20:04 2024 +0200

      genirq: Set IRQF_COND_ONESHOT in request_irq()

      The callers of request_irq() don't care about IRQF_ONESHOT because they
      don't provide threaded handlers, but if they happen to share the IRQ with
      the ACPI SCI, which has a threaded handler and sets IRQF_ONESHOT,
      request_irq() will fail for them due to a flags mismatch.

For panthor, my takeaway is we either care about flag mismatch (then
we should set IRQF_COND_ONESHOT for the threaded case) or we don't
(then we don't need to use devm_request_irq).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time
  2026-05-12 11:37 ` [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
  2026-05-12 21:16   ` Chia-I Wu
@ 2026-05-14 14:17   ` Steven Price
  1 sibling, 0 replies; 56+ messages in thread
From: Steven Price @ 2026-05-14 14:17 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 12/05/2026 12:37, Boris Brezillon wrote:
> Make the sched_process_csg_irq_locked() call part of
> group_unbind_locked() so we don't have to manually call it in
> tick_ctx_apply()/panthor_sched_suspend().
> 
> This implies moving group_[un]bind_locked() around to avoid a
> forward declaration.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 176 +++++++++++++++-----------------
>  1 file changed, 82 insertions(+), 94 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index fbf76b59b7ef..6c5ba747ae45 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -982,86 +982,6 @@ group_get(struct panthor_group *group)
>  	return group;
>  }
>  
> -/**
> - * group_bind_locked() - Bind a group to a group slot
> - * @group: Group.
> - * @csg_id: Slot.
> - *
> - * Return: 0 on success, a negative error code otherwise.
> - */
> -static int
> -group_bind_locked(struct panthor_group *group, u32 csg_id)
> -{
> -	struct panthor_device *ptdev = group->ptdev;
> -	int ret;
> -
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> -
> -	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
> -			ptdev->scheduler->csg_slots[csg_id].group))
> -		return -EINVAL;
> -
> -	ret = panthor_vm_active(group->vm);
> -	if (ret)
> -		return ret;
> -
> -	group_get(group);
> -
> -	/* Dummy doorbell allocation: doorbell is assigned to the group and
> -	 * all queues use the same doorbell.
> -	 *
> -	 * TODO: Implement LRU-based doorbell assignment, so the most often
> -	 * updated queues get their own doorbell, thus avoiding useless checks
> -	 * on queues belonging to the same group that are rarely updated.
> -	 */
> -	for (u32 i = 0; i < group->queue_count; i++)
> -		group->queues[i]->doorbell_id = csg_id + 1;
> -
> -	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> -		ptdev->scheduler->csg_slots[csg_id].group = group;
> -		group->csg_id = csg_id;
> -	}
> -
> -	return 0;
> -}
> -
> -/**
> - * group_unbind_locked() - Unbind a group from a slot.
> - * @group: Group to unbind.
> - *
> - * Return: 0 on success, a negative error code otherwise.
> - */
> -static int
> -group_unbind_locked(struct panthor_group *group)
> -{
> -	struct panthor_device *ptdev = group->ptdev;
> -
> -	lockdep_assert_held(&ptdev->scheduler->lock);
> -
> -	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
> -		return -EINVAL;
> -
> -	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
> -		return -EINVAL;
> -
> -	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> -		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> -		group->csg_id = -1;
> -	}
> -
> -	panthor_vm_idle(group->vm);
> -
> -	/* Tiler OOM events will be re-issued next time the group is scheduled. */
> -	atomic_set(&group->tiler_oom, 0);
> -	cancel_work(&group->tiler_oom_work);
> -
> -	for (u32 i = 0; i < group->queue_count; i++)
> -		group->queues[i]->doorbell_id = -1;
> -
> -	group_put(group);
> -	return 0;
> -}
> -
>  static bool
>  group_is_idle(struct panthor_group *group)
>  {
> @@ -1968,6 +1888,88 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
>  	}
>  }
>  
> +/**
> + * group_bind_locked() - Bind a group to a group slot
> + * @group: Group.
> + * @csg_id: Slot.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_bind_locked(struct panthor_group *group, u32 csg_id)
> +{
> +	struct panthor_device *ptdev = group->ptdev;
> +	int ret;
> +
> +	lockdep_assert_held(&ptdev->scheduler->lock);
> +
> +	if (drm_WARN_ON(&ptdev->base, group->csg_id != -1 || csg_id >= MAX_CSGS ||
> +			ptdev->scheduler->csg_slots[csg_id].group))
> +		return -EINVAL;
> +
> +	ret = panthor_vm_active(group->vm);
> +	if (ret)
> +		return ret;
> +
> +	group_get(group);
> +
> +	/* Dummy doorbell allocation: doorbell is assigned to the group and
> +	 * all queues use the same doorbell.
> +	 *
> +	 * TODO: Implement LRU-based doorbell assignment, so the most often
> +	 * updated queues get their own doorbell, thus avoiding useless checks
> +	 * on queues belonging to the same group that are rarely updated.
> +	 */
> +	for (u32 i = 0; i < group->queue_count; i++)
> +		group->queues[i]->doorbell_id = csg_id + 1;
> +
> +	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +		ptdev->scheduler->csg_slots[csg_id].group = group;
> +		group->csg_id = csg_id;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * group_unbind_locked() - Unbind a group from a slot.
> + * @group: Group to unbind.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_unbind_locked(struct panthor_group *group)
> +{
> +	struct panthor_device *ptdev = group->ptdev;
> +
> +	lockdep_assert_held(&ptdev->scheduler->lock);
> +
> +	if (drm_WARN_ON(&ptdev->base, group->csg_id < 0 || group->csg_id >= MAX_CSGS))
> +		return -EINVAL;
> +
> +	if (drm_WARN_ON(&ptdev->base, group->state == PANTHOR_CS_GROUP_ACTIVE))
> +		return -EINVAL;
> +
> +	scoped_guard(spinlock_irqsave, &ptdev->scheduler->events_lock) {
> +		/* Process all pending IRQs before returning the slot. */
> +		sched_process_csg_irq_locked(ptdev, group->csg_id);
> +		ptdev->scheduler->csg_slots[group->csg_id].group = NULL;
> +		group->csg_id = -1;
> +	}
> +
> +	panthor_vm_idle(group->vm);
> +
> +	/* Tiler OOM events will be re-issued next time the group is scheduled. */
> +	atomic_set(&group->tiler_oom, 0);
> +	cancel_work(&group->tiler_oom_work);
> +
> +	for (u32 i = 0; i < group->queue_count; i++)
> +		group->queues[i]->doorbell_id = -1;
> +
> +	group_put(group);
> +	return 0;
> +}
> +
>  static const char *fence_get_driver_name(struct dma_fence *fence)
>  {
>  	return "panthor";
> @@ -2396,15 +2398,6 @@ tick_ctx_apply(struct panthor_scheduler *sched, struct panthor_sched_tick_ctx *c
>  	/* Unbind evicted groups. */
>  	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
>  		list_for_each_entry(group, &ctx->old_groups[prio], run_node) {
> -			/* This group is gone. Process interrupts to clear
> -			 * any pending interrupts before we start the new
> -			 * group.
> -			 */
> -			if (group->csg_id >= 0) {
> -				guard(spinlock_irqsave)(&sched->events_lock);
> -				sched_process_csg_irq_locked(ptdev, group->csg_id);
> -			}
> -
>  			group_unbind_locked(group);
>  		}
>  	}
> @@ -2988,11 +2981,6 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  
>  		group_get(group);
>  
> -		if (group->csg_id >= 0) {
> -			guard(spinlock_irqsave)(&sched->events_lock);
> -			sched_process_csg_irq_locked(ptdev, group->csg_id);
> -		}
> -
>  		group_unbind_locked(group);
>  
>  		drm_WARN_ON(&group->ptdev->base, !list_empty(&group->run_node));
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-12 11:37 ` [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
  2026-05-12 21:55   ` Chia-I Wu
@ 2026-05-14 14:25   ` Steven Price
  2026-05-18  8:16     ` Boris Brezillon
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Price @ 2026-05-14 14:25 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 12/05/2026 12:37, Boris Brezillon wrote:
> Rather than assuming an interrupt is always expected for request
> acks, temporarily enable the relevant interrupts when the polling-wait
> failed. This should hopefully reduce the number of interrupts the CPU
> has to process.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

As mentioned in the other thread[1] it turns out this won't work with
the current firmware.

The firmware checks the interrupt mask before signalling the ACK - so
enabling the bit in the mask just before waiting for it is problematic -
the firmware may not see the addition in the mask and will not trigger
the interrupt.

In practice this will mostly work - the CPU is fast enough that it's
unlikely the firmware will have picked up the request by the time we
become blocked (and hence the firmware's mask read will be after the
host is already in the wait_event_timeout()). But it's not robust.

I'm hoping to get at least a clarification in the spec if not a change
in (later) firmware.

Thanks,
Steve

[1]
https://lore.kernel.org/all/3c721f22-d1a7-474e-8276-f0afc7cd9a0b@arm.com/

> ---
>  drivers/gpu/drm/panthor/panthor_fw.c    | 34 +++++++++++++++++++--------------
>  drivers/gpu/drm/panthor/panthor_sched.c |  5 +++--
>  2 files changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 8239a6951569..f5e0ceca4130 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1039,16 +1039,10 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>  	glb_iface->input->progress_timer = PROGRESS_TIMEOUT_CYCLES >> PROGRESS_TIMEOUT_SCALE_SHIFT;
>  	glb_iface->input->idle_timer = panthor_fw_conv_timeout(ptdev, IDLE_HYSTERESIS_US);
>  
> -	/* Enable interrupts we care about. */
> -	glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> -					 GLB_PING |
> -					 GLB_CFG_PROGRESS_TIMER |
> -					 GLB_CFG_POWEROFF_TIMER |
> -					 GLB_IDLE_EN |
> -					 GLB_IDLE;
> -
> -	if (panthor_fw_has_glb_state(ptdev))
> -		glb_iface->input->ack_irq_mask |= GLB_STATE_MASK;
> +	/* Enable interrupts for asynchronous events that are not
> +	 * triggered by request acks.
> +	 */
> +	glb_iface->input->ack_irq_mask = GLB_IDLE;
>  
>  	panthor_fw_update_reqs(glb_iface, req, GLB_IDLE_EN | GLB_COUNTER_EN,
>  			       GLB_IDLE_EN | GLB_COUNTER_EN);
> @@ -1318,8 +1312,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>   * Return: 0 on success, -ETIMEDOUT otherwise.
>   */
>  static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
> -				wait_queue_head_t *wq,
> -				u32 req_mask, u32 *acked,
> +				u32 *ack_irq_mask_ptr, spinlock_t *lock,
> +				wait_queue_head_t *wq, u32 req_mask, u32 *acked,
>  				u32 timeout_ms)
>  {
>  	u32 ack, req = READ_ONCE(*req_ptr) & req_mask;
> @@ -1334,8 +1328,16 @@ static int panthor_fw_wait_acks(const u32 *req_ptr, const u32 *ack_ptr,
>  	if (!ret)
>  		return 0;
>  
> -	if (wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> -			       msecs_to_jiffies(timeout_ms)))
> +	scoped_guard(spinlock_irqsave, lock)
> +		*ack_irq_mask_ptr |= req_mask;
> +
> +	ret = wait_event_timeout(*wq, (READ_ONCE(*ack_ptr) & req_mask) == req,
> +				 msecs_to_jiffies(timeout_ms));
> +
> +	scoped_guard(spinlock_irqsave, lock)
> +		*ack_irq_mask_ptr &= ~req_mask;
> +
> +	if (ret)
>  		return 0;
>  
>  	/* Check one last time, in case we were not woken up for some reason. */
> @@ -1369,6 +1371,8 @@ int panthor_fw_glb_wait_acks(struct panthor_device *ptdev,
>  
>  	return panthor_fw_wait_acks(&glb_iface->input->req,
>  				    &glb_iface->output->ack,
> +				    &glb_iface->input->ack_irq_mask,
> +				    &glb_iface->lock,
>  				    &ptdev->fw->req_waitqueue,
>  				    req_mask, acked, timeout_ms);
>  }
> @@ -1395,6 +1399,8 @@ int panthor_fw_csg_wait_acks(struct panthor_device *ptdev, u32 csg_slot,
>  
>  	ret = panthor_fw_wait_acks(&csg_iface->input->req,
>  				   &csg_iface->output->ack,
> +				   &csg_iface->input->ack_irq_mask,
> +				   &csg_iface->lock,
>  				   &ptdev->fw->req_waitqueue,
>  				   req_mask, acked, timeout_ms);
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 6c5ba747ae45..a9124bcc7de6 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1110,7 +1110,7 @@ cs_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
>  	cs_iface->input->ringbuf_output = queue->iface.output_fw_va;
>  	cs_iface->input->config = CS_CONFIG_PRIORITY(queue->priority) |
>  				  CS_CONFIG_DOORBELL(queue->doorbell_id);
> -	cs_iface->input->ack_irq_mask = ~0;
> +	cs_iface->input->ack_irq_mask = CS_FATAL | CS_FAULT | CS_TILER_OOM;
>  	panthor_fw_update_reqs(cs_iface, req,
>  			       CS_IDLE_SYNC_WAIT |
>  			       CS_IDLE_EMPTY |
> @@ -1378,7 +1378,8 @@ csg_slot_prog_locked(struct panthor_device *ptdev, u32 csg_id, u32 priority)
>  		csg_iface->input->protm_suspend_buf = 0;
>  	}
>  
> -	csg_iface->input->ack_irq_mask = ~0;
> +	csg_iface->input->ack_irq_mask = CSG_SYNC_UPDATE | CSG_IDLE |
> +					 CSG_PROGRESS_TIMER_EVENT;
>  	panthor_fw_toggle_reqs(csg_iface, doorbell_req, doorbell_ack, queue_mask);
>  	return 0;
>  }
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context
  2026-05-12 11:37 ` [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context Boris Brezillon
  2026-05-12 22:05   ` Chia-I Wu
@ 2026-05-14 15:23   ` Steven Price
  1 sibling, 0 replies; 56+ messages in thread
From: Steven Price @ 2026-05-14 15:23 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 12/05/2026 12:37, Boris Brezillon wrote:
> Now that everything is set to allow processing FW events in IRQ context,
> go for it. This should reduce the dma_fence signaling latency.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_fw.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index f5e0ceca4130..8cfebf180de7 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1087,9 +1087,29 @@ static void panthor_job_irq_handler(struct panthor_irq *pirq, u32 status)
>  	}
>  }
>  
> -static irqreturn_t panthor_job_irq_threaded_handler(int irq, void *data)
> +static irqreturn_t panthor_job_irq_raw_handler(int irq, void *data)
>  {
> -	return panthor_irq_default_threaded_handler(data, panthor_job_irq_handler);
> +	struct panthor_irq *pirq = data;
> +
> +	if (!gpu_read(pirq->iomem, INT_STAT))
> +		return IRQ_NONE;
> +
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state != PANTHOR_IRQ_STATE_ACTIVE)
> +			return IRQ_NONE;
> +
> +		pirq->state = PANTHOR_IRQ_STATE_PROCESSING;
> +	}
> +
> +	/* We can use INT_STAT here, because we didn't mask the IRQs. */
> +	panthor_job_irq_handler(pirq, gpu_read(pirq->iomem, INT_STAT));
> +
> +	scoped_guard(spinlock_irqsave, &pirq->mask_lock) {
> +		if (pirq->state == PANTHOR_IRQ_STATE_PROCESSING)
> +			pirq->state = PANTHOR_IRQ_STATE_ACTIVE;
> +	}
> +
> +	return IRQ_HANDLED;
>  }
>  
>  static int panthor_fw_start(struct panthor_device *ptdev)
> @@ -1489,8 +1509,7 @@ int panthor_fw_init(struct panthor_device *ptdev)
>  
>  	ret = panthor_irq_request(ptdev, &fw->irq, irq, 0,
>  				  ptdev->iomem + JOB_INT_BASE, "job",
> -				  panthor_irq_default_raw_handler,
> -				  panthor_job_irq_threaded_handler);
> +				  panthor_job_irq_raw_handler, NULL);
>  	if (ret) {
>  		drm_err(&ptdev->base, "failed to request job irq");
>  		return ret;
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler()
  2026-05-12 11:37 ` [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler() Boris Brezillon
@ 2026-05-14 15:26   ` Steven Price
  2026-05-18  8:04     ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Steven Price @ 2026-05-14 15:26 UTC (permalink / raw)
  To: Boris Brezillon, Liviu Dudau
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, dri-devel, linux-kernel

On 12/05/2026 12:37, Boris Brezillon wrote:
> This is not a bug per-se, because this lock is never taken in an
> interrupt context, but it's not consistent with the other users of this
> lock. We're also planning on transitioning GPU event processing to
> a hard handler. Again, this alone wouldn't justify using the IRQ-safe
> variant, because then this _lock/unlock sequence would be in the
> hard-IRQ path, where IRQs are already disabled, but let's do it anyway,
> to keep things consistent.
> 
> While at it, transition to a guard() instead of a plain lock/unlock
> sequence.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

Although this now leaves the lock in panthor_pwr_irq_handler() the odd
one out...

Thanks,
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_gpu.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index d0be758ea3e1..b9c51f8a051d 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -110,12 +110,11 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
>  	if (status & GPU_IRQ_PROTM_FAULT)
>  		drm_warn(&ptdev->base, "GPU Fault in protected mode\n");
>  
> -	spin_lock(&ptdev->gpu->reqs_lock);
> +	guard(spinlock_irqsave)(&ptdev->gpu->reqs_lock);
>  	if (status & ptdev->gpu->pending_reqs) {
>  		ptdev->gpu->pending_reqs &= ~status;
>  		wake_up_all(&ptdev->gpu->reqs_acked);
>  	}
> -	spin_unlock(&ptdev->gpu->reqs_lock);
>  }
>  
>  static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler()
  2026-05-14 15:26   ` Steven Price
@ 2026-05-18  8:04     ` Boris Brezillon
  0 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-18  8:04 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Thu, 14 May 2026 16:26:25 +0100
Steven Price <steven.price@arm.com> wrote:

> On 12/05/2026 12:37, Boris Brezillon wrote:
> > This is not a bug per-se, because this lock is never taken in an
> > interrupt context, but it's not consistent with the other users of this
> > lock. We're also planning on transitioning GPU event processing to
> > a hard handler. Again, this alone wouldn't justify using the IRQ-safe
> > variant, because then this _lock/unlock sequence would be in the
> > hard-IRQ path, where IRQs are already disabled, but let's do it anyway,
> > to keep things consistent.
> > 
> > While at it, transition to a guard() instead of a plain lock/unlock
> > sequence.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> Reviewed-by: Steven Price <steven.price@arm.com>
> 
> Although this now leaves the lock in panthor_pwr_irq_handler() the odd
> one out...

Yeah, I realized I had overlooked the PWR IRQ after sending v2. I'll
transition that one to an hard handler in v3, and use the irqsave
variant there as well.

> 
> Thanks,
> Steve
> 
> > ---
> >  drivers/gpu/drm/panthor/panthor_gpu.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> > index d0be758ea3e1..b9c51f8a051d 100644
> > --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> > @@ -110,12 +110,11 @@ static void panthor_gpu_irq_handler(struct panthor_irq *pirq, u32 status)
> >  	if (status & GPU_IRQ_PROTM_FAULT)
> >  		drm_warn(&ptdev->base, "GPU Fault in protected mode\n");
> >  
> > -	spin_lock(&ptdev->gpu->reqs_lock);
> > +	guard(spinlock_irqsave)(&ptdev->gpu->reqs_lock);
> >  	if (status & ptdev->gpu->pending_reqs) {
> >  		ptdev->gpu->pending_reqs &= ~status;
> >  		wake_up_all(&ptdev->gpu->reqs_acked);
> >  	}
> > -	spin_unlock(&ptdev->gpu->reqs_lock);
> >  }
> >  
> >  static irqreturn_t panthor_gpu_irq_threaded_handler(int irq, void *data)
> >   
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-14 14:25   ` Steven Price
@ 2026-05-18  8:16     ` Boris Brezillon
  2026-05-19 14:19       ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-18  8:16 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Thu, 14 May 2026 15:25:07 +0100
Steven Price <steven.price@arm.com> wrote:

> On 12/05/2026 12:37, Boris Brezillon wrote:
> > Rather than assuming an interrupt is always expected for request
> > acks, temporarily enable the relevant interrupts when the polling-wait
> > failed. This should hopefully reduce the number of interrupts the CPU
> > has to process.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> As mentioned in the other thread[1] it turns out this won't work with
> the current firmware.
> 
> The firmware checks the interrupt mask before signalling the ACK - so
> enabling the bit in the mask just before waiting for it is problematic -
> the firmware may not see the addition in the mask and will not trigger
> the interrupt.

Is it a problem though? wait_event_timeout() will evaluate the
condition before going to sleep, so, if the FW raced with the
input->ack_irq_mask update, I assume the condition will evaluate to
true and wait_event_timeout() would return immediately. The only issue
is if the FW updates the output->ack register after reading
input->ack_irq_mask, but that would be weird, since the output->ack
update doesn't depend on input->ack_irq_mask, and raising an interrupt
before updating output->ack would be racy anyway.

Am I missing something?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers
  2026-05-13 18:17           ` Chia-I Wu
@ 2026-05-18 11:54             ` Boris Brezillon
  0 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-18 11:54 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, 13 May 2026 11:17:26 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Wed, May 13, 2026 at 10:30 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Wed, 13 May 2026 10:06:14 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >  
> > > On Wed, May 13, 2026 at 1:09 AM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:  
> > > >
> > > > On Tue, 12 May 2026 12:11:08 -0700
> > > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > > >  
> > > > > On Tue, May 12, 2026 at 4:54 AM Boris Brezillon
> > > > > <boris.brezillon@collabora.com> wrote:  
> > > > > >
> > > > > > All drivers except panthor signal their fences from their interrupt
> > > > > > handler to minimize latency. We could do the same from the threaded
> > > > > > handler, but the latency is still quite high in that case, so let's
> > > > > > allow components to choose the context they want their IRQ handler
> > > > > > to run in by exposing support for custom hard handlers.
> > > > > >
> > > > > > Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> > > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/panthor/panthor_device.h | 11 ++++++++---
> > > > > >  drivers/gpu/drm/panthor/panthor_fw.c     |  1 +
> > > > > >  drivers/gpu/drm/panthor/panthor_gpu.c    |  1 +
> > > > > >  drivers/gpu/drm/panthor/panthor_mmu.c    |  1 +
> > > > > >  drivers/gpu/drm/panthor/panthor_pwr.c    |  1 +
> > > > > >  5 files changed, 12 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > > > > index 393fcda73d88..1aaf06df875b 100644
> > > > > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > > > > @@ -672,6 +672,7 @@ static inline void panthor_irq_disable_events(struct panthor_irq *pirq, u32 mask
> > > > > >  static inline int
> > > > > >  panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > > > > >                     int irq, u32 mask, void __iomem *iomem, const char *name,
> > > > > > +                   irqreturn_t (*raw_handler)(int, void *data),
> > > > > >                     irqreturn_t (*threaded_handler)(int, void *data))
> > > > > >  {
> > > > > >         const char *full_name;
> > > > > > @@ -687,9 +688,13 @@ panthor_irq_request(struct panthor_device *ptdev, struct panthor_irq *pirq,
> > > > > >                 return -ENOMEM;
> > > > > >
> > > > > >         panthor_irq_resume(pirq);
> > > > > > -       return devm_request_threaded_irq(ptdev->base.dev, irq,
> > > > > > -                                        panthor_irq_default_raw_handler,
> > > > > > -                                        threaded_handler,
> > > > > > +
> > > > > > +       if (!threaded_handler) {
> > > > > > +               return devm_request_irq(ptdev->base.dev, irq, raw_handler,
> > > > > > +                                       IRQF_SHARED, full_name, pirq);
> > > > > > +       }  
> > > > > devm_request_irq expands to devm_request_threaded_irq plus
> > > > > IRQF_COND_ONESHOT. This appears redundant.  
> > > >
> > > > I considered going for devm_request_threaded_irq(COND_ONESHOT), but I
> > > > thought it was easier to reason about with a regular devm_request_irq()
> > > > and an extra conditional since request_irq() is what people tend
> > > > to use when they just have a hard handler (see [1], there's just one
> > > > driver using it, and it's not even needed, because it's calling
> > > > devm_request_irq() which adds this flag already)  
> > > It is unclear to me why the current version wants IRQF_COND_ONESHOT in
> > > one case but not in another. Can't we call devm_request_threaded_irq
> > > without IRQF_COND_ONESHOT for both cases?  
> >
> > Hm, I thought this had to do with the automatic hard -> threaded
> > downgrade happening when RT is enabled, but I fail to see why it
> > matters, since all irqaction end up with IRQF_ONESHOT in that case
> > anyway. Honestly, I'm tempted to stay on the safe side, and have
> > devm_request_irq() called when we just have a hard handler, because I'm
> > sure there's a reason for this COND_ONESHOT flag.  
> Sounds good. Feel free to add Reviewed-by: Chia-I Wu <olvaffe@gmail.com>.
> 
> FWIW, I think this commit explains the motivation
> 
>   commit c37927a203fa283950f6045602b9f71328ad786c
>   Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>   Date:   Thu Jul 11 12:20:04 2024 +0200
> 
>       genirq: Set IRQF_COND_ONESHOT in request_irq()
> 
>       The callers of request_irq() don't care about IRQF_ONESHOT because they
>       don't provide threaded handlers, but if they happen to share the IRQ with
>       the ACPI SCI, which has a threaded handler and sets IRQF_ONESHOT,
>       request_irq() will fail for them due to a flags mismatch.
> 
> For panthor, my takeaway is we either care about flag mismatch (then
> we should set IRQF_COND_ONESHOT for the threaded case) or we don't
> (then we don't need to use devm_request_irq).

Fair enough. I'll use devm_request_threaded_irq() without
IRQF_COND_ONESHOT since I don't think we ever had an SoC where the IRQ
line was shared with other non-GPU blocks, and we set the same flags
for all GPU irqs.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-13 17:47       ` Chia-I Wu
@ 2026-05-18 13:45         ` Boris Brezillon
  2026-05-18 23:33           ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-18 13:45 UTC (permalink / raw)
  To: Chia-I Wu, Thomas Zimmermann
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Wed, 13 May 2026 10:47:28 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> > >
> > > Multiple things happen in this commit. I try to identify things that
> > > can be separate commits. If this does not make sense, feel free to
> > > ignore.
> > >  
> > > >         /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
> > > >         struct work_struct tiler_oom_work;
> > > >  
> >
> > [...]
> >  
> > > >  /**
> > > >   * panthor_sched_report_fw_events() - Report FW events to the scheduler.
> > > >   * @ptdev: Device.
> > > > @@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)  
> > > This can be renamed to panthor_sched_handle_fw_events.  
> >
> > It's not quite handling events though. For most of them, it's really
> > just deferring the processing to work items, SYNC_UPDATE is the
> > exception.  
> panthor_sched_report_fw_events no longer just queues
> process_fw_events_work. It processes fw events immediately. If
> "handle" is not the right verb, perhaps we can go with "process".

I guess "demux" would be more accurate, but do we need to rename this
function in the first place? I mean, panthor_sched_report_fw_events()
doesn't imply that events are processed/handled, it just reflects the
fact FW events are reported to the scheduler. Up to the scheduler to
do what it wants with this piece of information (process some of them
immediately, defer the processing for others, etc).

> 
> 
> >  
> > >  
> > > >         if (!ptdev->scheduler)
> > > >                 return;
> > > >
> > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > +
> > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > +               sched_process_global_irq_locked(ptdev);
> > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > +       }
> > > > +
> > > > +       while (events) {
> > > > +               u32 csg_id = ffs(events) - 1;
> > > > +
> > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > +               events &= ~BIT(csg_id);
> > > > +       }  
> > > This handles all fw events in the irq context. Are there concerns that
> > > it may take too long? I might be wrong, but it seems possible to
> > > handle only CSG_SYNC_UPDATE and defer the rest as before.  
> >
> > I started with just the SYNC_UPDATE processing done in the hard-irq
> > context, but after auditing the other stuff done in the handler, I
> > realized it's basically just deferring all actual processing to work
> > items. Yes, there's the overhead of demuxing the events from the
> > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > anyway, so at this point we're probably better off demuxing everything
> > and scheduling works for all kind of events.
> >
> > I also compared the perfs between the two approaches (though I didn't
> > do as much testing as I did with the new version, so I might have
> > missed something), and it didn't seem to matter at all, because the
> > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > those are at the same level.  
> Looking at ftrace irq events, when there is one active csg,
> panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> 
> I don't have a good sense if that's considered normal in hardirq. But
> if that is ever an issue, and if the majority of the time is spent in
> CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> processing to threaded handler.

Actually, the threaded -> hard transition (patch 9) is where the perf
gain is.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-18 13:45         ` Boris Brezillon
@ 2026-05-18 23:33           ` Chia-I Wu
  2026-05-19  7:53             ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-18 23:33 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Mon, May 18, 2026 at 6:45 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Wed, 13 May 2026 10:47:28 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > > >
> > > > Multiple things happen in this commit. I try to identify things that
> > > > can be separate commits. If this does not make sense, feel free to
> > > > ignore.
> > > >
> > > > >         /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
> > > > >         struct work_struct tiler_oom_work;
> > > > >
> > >
> > > [...]
> > >
> > > > >  /**
> > > > >   * panthor_sched_report_fw_events() - Report FW events to the scheduler.
> > > > >   * @ptdev: Device.
> > > > > @@ -1902,8 +1953,19 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
> > > > This can be renamed to panthor_sched_handle_fw_events.
> > >
> > > It's not quite handling events though. For most of them, it's really
> > > just deferring the processing to work items, SYNC_UPDATE is the
> > > exception.
> > panthor_sched_report_fw_events no longer just queues
> > process_fw_events_work. It processes fw events immediately. If
> > "handle" is not the right verb, perhaps we can go with "process".
>
> I guess "demux" would be more accurate, but do we need to rename this
> function in the first place? I mean, panthor_sched_report_fw_events()
> doesn't imply that events are processed/handled, it just reflects the
> fact FW events are reported to the scheduler. Up to the scheduler to
> do what it wants with this piece of information (process some of them
> immediately, defer the processing for others, etc).
process_fw_events_work gets inlined into
panthor_sched_report_fw_events. Either report/process/handle should be
fine. Let's drop this comment.

>
> >
> >
> > >
> > > >
> > > > >         if (!ptdev->scheduler)
> > > > >                 return;
> > > > >
> > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > +
> > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > +       }
> > > > > +
> > > > > +       while (events) {
> > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > +
> > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > +               events &= ~BIT(csg_id);
> > > > > +       }
> > > > This handles all fw events in the irq context. Are there concerns that
> > > > it may take too long? I might be wrong, but it seems possible to
> > > > handle only CSG_SYNC_UPDATE and defer the rest as before.
> > >
> > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > context, but after auditing the other stuff done in the handler, I
> > > realized it's basically just deferring all actual processing to work
> > > items. Yes, there's the overhead of demuxing the events from the
> > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > anyway, so at this point we're probably better off demuxing everything
> > > and scheduling works for all kind of events.
> > >
> > > I also compared the perfs between the two approaches (though I didn't
> > > do as much testing as I did with the new version, so I might have
> > > missed something), and it didn't seem to matter at all, because the
> > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > those are at the same level.
> > Looking at ftrace irq events, when there is one active csg,
> > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> >
> > I don't have a good sense if that's considered normal in hardirq. But
> > if that is ever an issue, and if the majority of the time is spent in
> > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > processing to threaded handler.
>
> Actually, the threaded -> hard transition (patch 9) is where the perf
> gain is.
hardirq is even more timely for sure. For our use case, the threaded
handler is RT and is also good enough.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-18 23:33           ` Chia-I Wu
@ 2026-05-19  7:53             ` Boris Brezillon
  2026-05-19 17:16               ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-19  7:53 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Mon, 18 May 2026 16:33:20 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:


> > >
> > >  
> > > >  
> > > > >  
> > > > > >         if (!ptdev->scheduler)
> > > > > >                 return;
> > > > > >
> > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > +
> > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > +       }
> > > > > > +
> > > > > > +       while (events) {
> > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > +
> > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > +               events &= ~BIT(csg_id);
> > > > > > +       }  
> > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.  
> > > >
> > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > context, but after auditing the other stuff done in the handler, I
> > > > realized it's basically just deferring all actual processing to work
> > > > items. Yes, there's the overhead of demuxing the events from the
> > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > anyway, so at this point we're probably better off demuxing everything
> > > > and scheduling works for all kind of events.
> > > >
> > > > I also compared the perfs between the two approaches (though I didn't
> > > > do as much testing as I did with the new version, so I might have
> > > > missed something), and it didn't seem to matter at all, because the
> > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > those are at the same level.  
> > > Looking at ftrace irq events, when there is one active csg,
> > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > >
> > > I don't have a good sense if that's considered normal in hardirq. But
> > > if that is ever an issue, and if the majority of the time is spent in
> > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > processing to threaded handler.  
> >
> > Actually, the threaded -> hard transition (patch 9) is where the perf
> > gain is.  
> hardirq is even more timely for sure. For our use case, the threaded
> handler is RT and is also good enough.

Yeah, true. I forgot you were forcing RT priority on threaded handlers.
Anyway, let's stick to hardirqs for now, and revisit it if it proves to
be too much work done in irq context.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks()
  2026-05-18  8:16     ` Boris Brezillon
@ 2026-05-19 14:19       ` Boris Brezillon
  0 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-05-19 14:19 UTC (permalink / raw)
  To: Steven Price
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, dri-devel, linux-kernel

On Mon, 18 May 2026 10:16:56 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Thu, 14 May 2026 15:25:07 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
> > On 12/05/2026 12:37, Boris Brezillon wrote:  
> > > Rather than assuming an interrupt is always expected for request
> > > acks, temporarily enable the relevant interrupts when the polling-wait
> > > failed. This should hopefully reduce the number of interrupts the CPU
> > > has to process.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>    
> > 
> > As mentioned in the other thread[1] it turns out this won't work with
> > the current firmware.
> > 
> > The firmware checks the interrupt mask before signalling the ACK - so
> > enabling the bit in the mask just before waiting for it is problematic -
> > the firmware may not see the addition in the mask and will not trigger
> > the interrupt.  
> 
> Is it a problem though? wait_event_timeout() will evaluate the
> condition before going to sleep, so, if the FW raced with the
> input->ack_irq_mask update, I assume the condition will evaluate to
> true and wait_event_timeout() would return immediately. The only issue
> is if the FW updates the output->ack register after reading
> input->ack_irq_mask, but that would be weird, since the output->ack
> update doesn't depend on input->ack_irq_mask, and raising an interrupt
> before updating output->ack would be racy anyway.
> 
> Am I missing something?
> 

Quoting your reply to v1, for the context, because I missed it when
replying here:

> So I've looked at the firmware implementation and I can say that there's
> a race condition on the firmware side if we change the mask after
> sending the START/RESUME request. The firmware currently samples the
> mask before triggering the update to CSG_ACK and then uses the sampled
> value to decide whether to trigger an interrupt.

So, the race is:

CPU						MCU

CSG_REQ.STATE = START
atomic_poll(CSG_ACK & STATE_MASK)
						process_state_change()
							...
 							mask = CSG_ACK_IRQ_MASK
							...
							// done
CSG_ACK_IRQ_MASK |= STATE_MASK
wait(CSG_ACK & STATE_MASK)
							CSG_ACK = (CSG_ACK & STATE_MASK) | CSG_REQ.STATE
							if (mask & STATE_MASK) // evaluates to false
								raise_int()


It's a bit unfortunate, because conditionally enabling IRQs
only after the polling period expires reduces the number of
interrupts directly caused by requests initiated by the CPU
since it's assumed those requests will complete relatively
quickly and avoid a sleep+interrupt+wake-up round-trip. We
could of course update CSG_ACK_IRQ_MASK at the time we update
CSG_REQ, but that means we'll still get an interrupt, so I
guess we should stick to what exists now if there's no other
option. Oh well.

>
> This is all fine if the mask is set before the request (so nothing
> broken with the current code), but we'd need a firmware change before we
> can safely do what this patch was proposing. And of course we'd have to
> get our heads round the barriers needed! ;)

:-(

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-19  7:53             ` Boris Brezillon
@ 2026-05-19 17:16               ` Chia-I Wu
  2026-05-19 18:26                 ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-19 17:16 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Mon, 18 May 2026 16:33:20 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
>
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > >         if (!ptdev->scheduler)
> > > > > > >                 return;
> > > > > > >
> > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > +
> > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       while (events) {
> > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > +
> > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > +       }
> > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.
> > > > >
> > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > context, but after auditing the other stuff done in the handler, I
> > > > > realized it's basically just deferring all actual processing to work
> > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > and scheduling works for all kind of events.
> > > > >
> > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > do as much testing as I did with the new version, so I might have
> > > > > missed something), and it didn't seem to matter at all, because the
> > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > those are at the same level.
> > > > Looking at ftrace irq events, when there is one active csg,
> > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > >
> > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > if that is ever an issue, and if the majority of the time is spent in
> > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > processing to threaded handler.
> > >
> > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > gain is.
> > hardirq is even more timely for sure. For our use case, the threaded
> > handler is RT and is also good enough.
>
> Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> be too much work done in irq context.
Just want to clarify that irq_thread calls sched_set_fifo to make the
task RT. The behavior is universal and is not specific to any
downstream kernel.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-19 17:16               ` Chia-I Wu
@ 2026-05-19 18:26                 ` Boris Brezillon
  2026-05-19 20:45                   ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-19 18:26 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 19 May 2026 10:16:26 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Mon, 18 May 2026 16:33:20 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> >  
> > > > >
> > > > >  
> > > > > >  
> > > > > > >  
> > > > > > > >         if (!ptdev->scheduler)
> > > > > > > >                 return;
> > > > > > > >
> > > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > > +
> > > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       while (events) {
> > > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > > +
> > > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > > +       }  
> > > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.  
> > > > > >
> > > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > > context, but after auditing the other stuff done in the handler, I
> > > > > > realized it's basically just deferring all actual processing to work
> > > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > > and scheduling works for all kind of events.
> > > > > >
> > > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > > do as much testing as I did with the new version, so I might have
> > > > > > missed something), and it didn't seem to matter at all, because the
> > > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > > those are at the same level.  
> > > > > Looking at ftrace irq events, when there is one active csg,
> > > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > > >
> > > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > > if that is ever an issue, and if the majority of the time is spent in
> > > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > > processing to threaded handler.  
> > > >
> > > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > > gain is.  
> > > hardirq is even more timely for sure. For our use case, the threaded
> > > handler is RT and is also good enough.  
> >
> > Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> > Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> > be too much work done in irq context.  
> Just want to clarify that irq_thread calls sched_set_fifo to make the
> task RT. The behavior is universal and is not specific to any
> downstream kernel.

Hm, interesting. In my testing, any of the changes before patch 9
didn't make a huge difference in term of perf, patch 9 is where the perf
gains happen. For the record, patch 6 is where we get rid of the
threaded -> work round-trip for job completion/fence signaling, and it
didn't seem to reflect in the benchmark results, but I'll do another
round of tests before posting v3, just to confirm.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-19 18:26                 ` Boris Brezillon
@ 2026-05-19 20:45                   ` Chia-I Wu
  2026-05-19 21:04                     ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-19 20:45 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 19, 2026 at 11:26 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 19 May 2026 10:16:26 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > On Mon, 18 May 2026 16:33:20 -0700
> > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > >
> > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >         if (!ptdev->scheduler)
> > > > > > > > >                 return;
> > > > > > > > >
> > > > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > > > +
> > > > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       while (events) {
> > > > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > > > +
> > > > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > > > +       }
> > > > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.
> > > > > > >
> > > > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > > > context, but after auditing the other stuff done in the handler, I
> > > > > > > realized it's basically just deferring all actual processing to work
> > > > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > > > and scheduling works for all kind of events.
> > > > > > >
> > > > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > > > do as much testing as I did with the new version, so I might have
> > > > > > > missed something), and it didn't seem to matter at all, because the
> > > > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > > > those are at the same level.
> > > > > > Looking at ftrace irq events, when there is one active csg,
> > > > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > > > >
> > > > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > > > if that is ever an issue, and if the majority of the time is spent in
> > > > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > > > processing to threaded handler.
> > > > >
> > > > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > > > gain is.
> > > > hardirq is even more timely for sure. For our use case, the threaded
> > > > handler is RT and is also good enough.
> > >
> > > Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> > > Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> > > be too much work done in irq context.
> > Just want to clarify that irq_thread calls sched_set_fifo to make the
> > task RT. The behavior is universal and is not specific to any
> > downstream kernel.
>
> Hm, interesting. In my testing, any of the changes before patch 9
> didn't make a huge difference in term of perf, patch 9 is where the perf
> gains happen. For the record, patch 6 is where we get rid of the
> threaded -> work round-trip for job completion/fence signaling, and it
> didn't seem to reflect in the benchmark results, but I'll do another
> round of tests before posting v3, just to confirm.
We care the most about signaling latency for this series. I collected
some numbers with baseline, with this series, and with patch 9
reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
Reposting the numbers here for reference

|                    | baseline | entire series | patch 9 reverted |
| -                  | -        | -             | -                |
| frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
| frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
| frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
| panthor-job median | 0.8us    | 6.2us         | 0.9us            |
| panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
| panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-19 20:45                   ` Chia-I Wu
@ 2026-05-19 21:04                     ` Chia-I Wu
  2026-05-20  8:09                       ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-19 21:04 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, May 19, 2026 at 1:45 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>
> On Tue, May 19, 2026 at 11:26 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Tue, 19 May 2026 10:16:26 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > > On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:
> > > >
> > > > On Mon, 18 May 2026 16:33:20 -0700
> > > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > > >
> > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >         if (!ptdev->scheduler)
> > > > > > > > > >                 return;
> > > > > > > > > >
> > > > > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > > > > +
> > > > > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       while (events) {
> > > > > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > > > > +
> > > > > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > > > > +       }
> > > > > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.
> > > > > > > >
> > > > > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > > > > context, but after auditing the other stuff done in the handler, I
> > > > > > > > realized it's basically just deferring all actual processing to work
> > > > > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > > > > and scheduling works for all kind of events.
> > > > > > > >
> > > > > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > > > > do as much testing as I did with the new version, so I might have
> > > > > > > > missed something), and it didn't seem to matter at all, because the
> > > > > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > > > > those are at the same level.
> > > > > > > Looking at ftrace irq events, when there is one active csg,
> > > > > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > > > > >
> > > > > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > > > > if that is ever an issue, and if the majority of the time is spent in
> > > > > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > > > > processing to threaded handler.
> > > > > >
> > > > > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > > > > gain is.
> > > > > hardirq is even more timely for sure. For our use case, the threaded
> > > > > handler is RT and is also good enough.
> > > >
> > > > Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> > > > Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> > > > be too much work done in irq context.
> > > Just want to clarify that irq_thread calls sched_set_fifo to make the
> > > task RT. The behavior is universal and is not specific to any
> > > downstream kernel.
> >
> > Hm, interesting. In my testing, any of the changes before patch 9
> > didn't make a huge difference in term of perf, patch 9 is where the perf
> > gains happen. For the record, patch 6 is where we get rid of the
> > threaded -> work round-trip for job completion/fence signaling, and it
> > didn't seem to reflect in the benchmark results, but I'll do another
> > round of tests before posting v3, just to confirm.
> We care the most about signaling latency for this series. I collected
> some numbers with baseline, with this series, and with patch 9
> reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> Reposting the numbers here for reference
>
> |                    | baseline | entire series | patch 9 reverted |
> | -                  | -        | -             | -                |
> | frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
> | frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
> | frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
> | panthor-job median | 0.8us    | 6.2us         | 0.9us            |
> | panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
> | panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |

panthor-job rows are the durations of the raw irq handlers, collected
from irq/irq_handler_{entry,exit}.

frag job rows are the durations from frag jobs, collected from
gpu_scheduler/drm_sched_job_{run,done}.

The fence signaling paths of them are

 - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
fence signal
 - entire series: raw handler -> fence signal
 - patch 9 reverted: raw handler -> rt threaded handler -> fence signal

This is a synthetic test with a busy userspace. The higher durations
and variances of the baseline main come from the scheduling latencies
of wq jobs.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-19 21:04                     ` Chia-I Wu
@ 2026-05-20  8:09                       ` Boris Brezillon
  2026-05-20 22:15                         ` Chia-I Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-05-20  8:09 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Tue, 19 May 2026 14:04:47 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 19, 2026 at 1:45 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:26 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:  
> > >
> > > On Tue, 19 May 2026 10:16:26 -0700
> > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > >  
> > > > On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
> > > > <boris.brezillon@collabora.com> wrote:  
> > > > >
> > > > > On Mon, 18 May 2026 16:33:20 -0700
> > > > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > > > >
> > > > >  
> > > > > > > >
> > > > > > > >  
> > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > > > >         if (!ptdev->scheduler)
> > > > > > > > > > >                 return;
> > > > > > > > > > >
> > > > > > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > > > > > +
> > > > > > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       while (events) {
> > > > > > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > > > > > +
> > > > > > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > > > > > +       }  
> > > > > > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.  
> > > > > > > > >
> > > > > > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > > > > > context, but after auditing the other stuff done in the handler, I
> > > > > > > > > realized it's basically just deferring all actual processing to work
> > > > > > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > > > > > and scheduling works for all kind of events.
> > > > > > > > >
> > > > > > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > > > > > do as much testing as I did with the new version, so I might have
> > > > > > > > > missed something), and it didn't seem to matter at all, because the
> > > > > > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > > > > > those are at the same level.  
> > > > > > > > Looking at ftrace irq events, when there is one active csg,
> > > > > > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > > > > > >
> > > > > > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > > > > > if that is ever an issue, and if the majority of the time is spent in
> > > > > > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > > > > > processing to threaded handler.  
> > > > > > >
> > > > > > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > > > > > gain is.  
> > > > > > hardirq is even more timely for sure. For our use case, the threaded
> > > > > > handler is RT and is also good enough.  
> > > > >
> > > > > Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> > > > > Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> > > > > be too much work done in irq context.  
> > > > Just want to clarify that irq_thread calls sched_set_fifo to make the
> > > > task RT. The behavior is universal and is not specific to any
> > > > downstream kernel.  

There's a difference in what RT means depending on whether the system
is configured with PREEMPT or PREEMPT_RT though. But I assume you're
using PREEMPT not PREEMPT_RT.

> > >
> > > Hm, interesting. In my testing, any of the changes before patch 9
> > > didn't make a huge difference in term of perf, patch 9 is where the perf
> > > gains happen. For the record, patch 6 is where we get rid of the
> > > threaded -> work round-trip for job completion/fence signaling, and it
> > > didn't seem to reflect in the benchmark results, but I'll do another
> > > round of tests before posting v3, just to confirm.  
> > We care the most about signaling latency for this series.

Yes, I know. It's just that it also seemed to help the throughput, which
I initially checked to make sure we were not regressing perfs
significantly by interrupting the system aggressively. I guess the
reason for that is that, by reducing the latency, we also unleash the
job submitter (if you get signaled early, and jobs tend to be
serialized because of deps, you can submit more).

> > I collected
> > some numbers with baseline, with this series, and with patch 9
> > reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> > Reposting the numbers here for reference
> >
> > |                    | baseline | entire series | patch 9 reverted |
> > | -                  | -        | -             | -                |
> > | frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
> > | frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
> > | frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
> > | panthor-job median | 0.8us    | 6.2us         | 0.9us            |
> > | panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
> > | panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |  
> 
> panthor-job rows are the durations of the raw irq handlers, collected
> from irq/irq_handler_{entry,exit}.
> 
> frag job rows are the durations from frag jobs, collected from
> gpu_scheduler/drm_sched_job_{run,done}.
> 
> The fence signaling paths of them are
> 
>  - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
> fence signal
>  - entire series: raw handler -> fence signal
>  - patch 9 reverted: raw handler -> rt threaded handler -> fence signal

Just did another set of throughput tests, and I confirm the gains are
noticeable only with patch 9 applied (that's on rk3588, which embeds a
G610, so not the exact same setup). As an example, on
gfxbench/gl_manhattan, I get the following score bump 2391 -> 2457.

Now I need to set things up to measure latency like you did and make
sure I'm observing the same thing: threaded handlers providing roughly
the same latency as hardirq handlers. If not it probably has to do with
some config options that differ and change the preemptability of the
system.

I'll hold off on the submission of v3 until this is done, because if
threaded handlers are roughly as efficient as hardirq ones, we probably
want to stick to threaded handlers.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-20  8:09                       ` Boris Brezillon
@ 2026-05-20 22:15                         ` Chia-I Wu
  2026-06-22 12:49                           ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Chia-I Wu @ 2026-05-20 22:15 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, May 20, 2026 at 1:09 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 19 May 2026 14:04:47 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > On Tue, May 19, 2026 at 1:45 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> > >
> > > On Tue, May 19, 2026 at 11:26 AM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:
> > > >
> > > > On Tue, 19 May 2026 10:16:26 -0700
> > > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > > >
> > > > > On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
> > > > > <boris.brezillon@collabora.com> wrote:
> > > > > >
> > > > > > On Mon, 18 May 2026 16:33:20 -0700
> > > > > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >         if (!ptdev->scheduler)
> > > > > > > > > > > >                 return;
> > > > > > > > > > > >
> > > > > > > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       while (events) {
> > > > > > > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > > > > > > +       }
> > > > > > > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.
> > > > > > > > > >
> > > > > > > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > > > > > > context, but after auditing the other stuff done in the handler, I
> > > > > > > > > > realized it's basically just deferring all actual processing to work
> > > > > > > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > > > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > > > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > > > > > > and scheduling works for all kind of events.
> > > > > > > > > >
> > > > > > > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > > > > > > do as much testing as I did with the new version, so I might have
> > > > > > > > > > missed something), and it didn't seem to matter at all, because the
> > > > > > > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > > > > > > those are at the same level.
> > > > > > > > > Looking at ftrace irq events, when there is one active csg,
> > > > > > > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > > > > > > >
> > > > > > > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > > > > > > if that is ever an issue, and if the majority of the time is spent in
> > > > > > > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > > > > > > processing to threaded handler.
> > > > > > > >
> > > > > > > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > > > > > > gain is.
> > > > > > > hardirq is even more timely for sure. For our use case, the threaded
> > > > > > > handler is RT and is also good enough.
> > > > > >
> > > > > > Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> > > > > > Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> > > > > > be too much work done in irq context.
> > > > > Just want to clarify that irq_thread calls sched_set_fifo to make the
> > > > > task RT. The behavior is universal and is not specific to any
> > > > > downstream kernel.
>
> There's a difference in what RT means depending on whether the system
> is configured with PREEMPT or PREEMPT_RT though. But I assume you're
> using PREEMPT not PREEMPT_RT.
Right. What I meant for an RT task is that the task uses an RT
scheduling policy (rt_policy).

>
> > > >
> > > > Hm, interesting. In my testing, any of the changes before patch 9
> > > > didn't make a huge difference in term of perf, patch 9 is where the perf
> > > > gains happen. For the record, patch 6 is where we get rid of the
> > > > threaded -> work round-trip for job completion/fence signaling, and it
> > > > didn't seem to reflect in the benchmark results, but I'll do another
> > > > round of tests before posting v3, just to confirm.
> > > We care the most about signaling latency for this series.
>
> Yes, I know. It's just that it also seemed to help the throughput, which
> I initially checked to make sure we were not regressing perfs
> significantly by interrupting the system aggressively. I guess the
> reason for that is that, by reducing the latency, we also unleash the
> job submitter (if you get signaled early, and jobs tend to be
> serialized because of deps, you can submit more).
Yeah, that's likely the case.

Although, if the benchmark is supposed to be GPU-bound and is not
poorly-written, we might also want to check if the userspace driver
adds unncecessary deps between jobs. A trace should give us some
insight.

> > > I collected
> > > some numbers with baseline, with this series, and with patch 9
> > > reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> > > Reposting the numbers here for reference
> > >
> > > |                    | baseline | entire series | patch 9 reverted |
> > > | -                  | -        | -             | -                |
> > > | frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
> > > | frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
> > > | frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
> > > | panthor-job median | 0.8us    | 6.2us         | 0.9us            |
> > > | panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
> > > | panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |
> >
> > panthor-job rows are the durations of the raw irq handlers, collected
> > from irq/irq_handler_{entry,exit}.
> >
> > frag job rows are the durations from frag jobs, collected from
> > gpu_scheduler/drm_sched_job_{run,done}.
> >
> > The fence signaling paths of them are
> >
> >  - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
> > fence signal
> >  - entire series: raw handler -> fence signal
> >  - patch 9 reverted: raw handler -> rt threaded handler -> fence signal
>
> Just did another set of throughput tests, and I confirm the gains are
> noticeable only with patch 9 applied (that's on rk3588, which embeds a
> G610, so not the exact same setup). As an example, on
> gfxbench/gl_manhattan, I get the following score bump 2391 -> 2457.
>
> Now I need to set things up to measure latency like you did and make
> sure I'm observing the same thing: threaded handlers providing roughly
> the same latency as hardirq handlers. If not it probably has to do with
> some config options that differ and change the preemptability of the
> system.
>
> I'll hold off on the submission of v3 until this is done, because if
> threaded handlers are roughly as efficient as hardirq ones, we probably
> want to stick to threaded handlers.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-05-20 22:15                         ` Chia-I Wu
@ 2026-06-22 12:49                           ` Boris Brezillon
  2026-06-23 12:52                             ` Boris Brezillon
  0 siblings, 1 reply; 56+ messages in thread
From: Boris Brezillon @ 2026-06-22 12:49 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed, 20 May 2026 15:15:54 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> > > > I collected
> > > > some numbers with baseline, with this series, and with patch 9
> > > > reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> > > > Reposting the numbers here for reference
> > > >
> > > > |                    | baseline | entire series | patch 9 reverted |
> > > > | -                  | -        | -             | -                |
> > > > | frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
> > > > | frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
> > > > | frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
> > > > | panthor-job median | 0.8us    | 6.2us         | 0.9us            |
> > > > | panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
> > > > | panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |  
> > >
> > > panthor-job rows are the durations of the raw irq handlers, collected
> > > from irq/irq_handler_{entry,exit}.
> > >
> > > frag job rows are the durations from frag jobs, collected from
> > > gpu_scheduler/drm_sched_job_{run,done}.
> > >
> > > The fence signaling paths of them are
> > >
> > >  - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
> > > fence signal
> > >  - entire series: raw handler -> fence signal
> > >  - patch 9 reverted: raw handler -> rt threaded handler -> fence signal  
> >
> > Just did another set of throughput tests, and I confirm the gains are
> > noticeable only with patch 9 applied (that's on rk3588, which embeds a
> > G610, so not the exact same setup). As an example, on
> > gfxbench/gl_manhattan, I get the following score bump 2391 -> 2457.
> >
> > Now I need to set things up to measure latency like you did and make
> > sure I'm observing the same thing: threaded handlers providing roughly
> > the same latency as hardirq handlers. If not it probably has to do with
> > some config options that differ and change the preemptability of the
> > system.
> >
> > I'll hold off on the submission of v3 until this is done, because if
> > threaded handlers are roughly as efficient as hardirq ones, we probably
> > want to stick to threaded handlers. 

Sorry for the delay, I only got back to this on Friday.

So, I've been using ftrace/function-graph with some noinline added to
get a sense of where most of the time was spent in the hardirq handler
after the transition to hardirqs, and unlike what I thought, it's not
coming from the accesses to uncached mappings of the FW
interface/syncobjs, but instead the various queue[_delayed]_work()
and/or wake_up_all() on panthor_fw::req_waitqueue. I don't expect us to
be able to optimize that anytime soon, so I guess we should just keep
everything in the threaded handler for now and accept the extra delay
(assuming 20+ usec for the hardirq handler is too long). This also
means that a lot of the things I do in this series are moot
(irqsave/restore, using spinlocks instead of mutexes, ...), but before
I go and rework that, I'd like to get some feedback from Steve and
Liviu to make sure this is okay with Arm.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
  2026-06-22 12:49                           ` Boris Brezillon
@ 2026-06-23 12:52                             ` Boris Brezillon
  0 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2026-06-23 12:52 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Thomas Zimmermann, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Mon, 22 Jun 2026 14:49:49 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Wed, 20 May 2026 15:15:54 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
> 
> > > > > I collected
> > > > > some numbers with baseline, with this series, and with patch 9
> > > > > reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> > > > > Reposting the numbers here for reference
> > > > >
> > > > > |                    | baseline | entire series | patch 9 reverted |
> > > > > | -                  | -        | -             | -                |
> > > > > | frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
> > > > > | frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
> > > > > | frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
> > > > > | panthor-job median | 0.8us    | 6.2us         | 0.9us            |
> > > > > | panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
> > > > > | panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |    
> > > >
> > > > panthor-job rows are the durations of the raw irq handlers, collected
> > > > from irq/irq_handler_{entry,exit}.
> > > >
> > > > frag job rows are the durations from frag jobs, collected from
> > > > gpu_scheduler/drm_sched_job_{run,done}.
> > > >
> > > > The fence signaling paths of them are
> > > >
> > > >  - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
> > > > fence signal
> > > >  - entire series: raw handler -> fence signal
> > > >  - patch 9 reverted: raw handler -> rt threaded handler -> fence signal    
> > >
> > > Just did another set of throughput tests, and I confirm the gains are
> > > noticeable only with patch 9 applied (that's on rk3588, which embeds a
> > > G610, so not the exact same setup). As an example, on
> > > gfxbench/gl_manhattan, I get the following score bump 2391 -> 2457.
> > >
> > > Now I need to set things up to measure latency like you did and make
> > > sure I'm observing the same thing: threaded handlers providing roughly
> > > the same latency as hardirq handlers. If not it probably has to do with
> > > some config options that differ and change the preemptability of the
> > > system.
> > >
> > > I'll hold off on the submission of v3 until this is done, because if
> > > threaded handlers are roughly as efficient as hardirq ones, we probably
> > > want to stick to threaded handlers.   
> 
> Sorry for the delay, I only got back to this on Friday.
> 
> So, I've been using ftrace/function-graph with some noinline added to
> get a sense of where most of the time was spent in the hardirq handler
> after the transition to hardirqs, and unlike what I thought, it's not
> coming from the accesses to uncached mappings of the FW
> interface/syncobjs, but instead the various queue[_delayed]_work()
> and/or wake_up_all() on panthor_fw::req_waitqueue. I don't expect us to
> be able to optimize that anytime soon, so I guess we should just keep
> everything in the threaded handler for now and accept the extra delay
> (assuming 20+ usec for the hardirq handler is too long). This also
> means that a lot of the things I do in this series are moot
> (irqsave/restore, using spinlocks instead of mutexes, ...), but before
> I go and rework that, I'd like to get some feedback from Steve and
> Liviu to make sure this is okay with Arm.

I ended up sending a v3 doing that. I can easily go back to the
previous version if needed.

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2026-06-23 12:52 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 11:37 [PATCH v2 00/11] drm/panthor: Reduce dma_fence signalling latency Boris Brezillon
2026-05-12 11:37 ` [PATCH v2 01/11] drm/panthor: Make panthor_irq::state a non-atomic field Boris Brezillon
2026-05-12 18:40   ` Chia-I Wu
2026-05-12 11:37 ` [PATCH v2 02/11] drm/panthor: Move the register accessors before the IRQ helpers Boris Brezillon
2026-05-12 18:41   ` Chia-I Wu
2026-05-12 11:37 ` [PATCH v2 03/11] drm/panthor: Replace the panthor_irq macro machinery by inline helpers Boris Brezillon
2026-05-12 18:58   ` Chia-I Wu
2026-05-13  8:03     ` Boris Brezillon
2026-05-13 16:46       ` Chia-I Wu
2026-05-12 11:37 ` [PATCH v2 04/11] drm/panthor: Extend the IRQ logic to allow fast/hard IRQ handlers Boris Brezillon
2026-05-12 19:11   ` Chia-I Wu
2026-05-13  8:09     ` Boris Brezillon
2026-05-13 17:06       ` Chia-I Wu
2026-05-13 17:30         ` Boris Brezillon
2026-05-13 18:17           ` Chia-I Wu
2026-05-18 11:54             ` Boris Brezillon
2026-05-12 11:37 ` [PATCH v2 05/11] drm/panthor: Make panthor_fw_{update,toggle}_reqs() callable from IRQ context Boris Brezillon
2026-05-12 19:29   ` Chia-I Wu
2026-05-12 11:37 ` [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in " Boris Brezillon
2026-05-12 21:04   ` Chia-I Wu
2026-05-13  8:29     ` Boris Brezillon
2026-05-13 17:47       ` Chia-I Wu
2026-05-18 13:45         ` Boris Brezillon
2026-05-18 23:33           ` Chia-I Wu
2026-05-19  7:53             ` Boris Brezillon
2026-05-19 17:16               ` Chia-I Wu
2026-05-19 18:26                 ` Boris Brezillon
2026-05-19 20:45                   ` Chia-I Wu
2026-05-19 21:04                     ` Chia-I Wu
2026-05-20  8:09                       ` Boris Brezillon
2026-05-20 22:15                         ` Chia-I Wu
2026-06-22 12:49                           ` Boris Brezillon
2026-06-23 12:52                             ` Boris Brezillon
2026-05-12 11:37 ` [PATCH v2 07/11] drm/panthor: Automate CSG IRQ processing at group unbind time Boris Brezillon
2026-05-12 21:16   ` Chia-I Wu
2026-05-14 14:17   ` Steven Price
2026-05-12 11:37 ` [PATCH v2 08/11] drm/panthor: Automatically enable interrupts in panthor_fw_wait_acks() Boris Brezillon
2026-05-12 21:55   ` Chia-I Wu
2026-05-13  8:42     ` Boris Brezillon
2026-05-13 17:14       ` Chia-I Wu
2026-05-14 14:25   ` Steven Price
2026-05-18  8:16     ` Boris Brezillon
2026-05-19 14:19       ` Boris Brezillon
2026-05-12 11:37 ` [PATCH v2 09/11] drm/panthor: Process FW events in IRQ context Boris Brezillon
2026-05-12 22:05   ` Chia-I Wu
2026-05-12 22:09     ` Chia-I Wu
2026-05-13  8:44       ` Boris Brezillon
2026-05-14 15:23   ` Steven Price
2026-05-12 11:37 ` [PATCH v2 10/11] drm/panthor: Use the irqsave variant of spin_lock in panthor_gpu_irq_handler() Boris Brezillon
2026-05-14 15:26   ` Steven Price
2026-05-18  8:04     ` Boris Brezillon
2026-05-12 11:37 ` [PATCH v2 11/11] drm/panthor: Process GPU events in IRQ context Boris Brezillon
2026-05-12 11:50   ` Boris Brezillon
2026-05-12 22:40     ` Chia-I Wu
2026-05-13  8:54       ` Boris Brezillon
2026-05-13 18:07         ` Chia-I Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox