[RFC 0/2] Misc panthor bits

* [RFC 0/2] Misc panthor bits
@ 2026-05-22 11:38 Tvrtko Ursulin
  2026-05-22 11:38 ` [RFC 1/2] drm/panthor: Remove redundant drm_sched_job_cleanup() from the .free_job callback Tvrtko Ursulin
  2026-05-22 11:38 ` [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler Tvrtko Ursulin
  0 siblings, 2 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-05-22 11:38 UTC
  To: dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Boris Brezillon, Liviu Dudau,
	Steven Price

I am working on some DRM scheduler experiments and have noticed two things
which perhaps can be improved in panthor.

Sending them out as RFC since they are only compile-tested, but feel free to
have a look at your leisure and see what you think.

Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Steven Price <steven.price@arm.com>

Tvrtko Ursulin (2):
  drm/panthor: Remove redundant drm_sched_job_cleanup() from the
    .free_job callback
  drm/panthor: Use separate workqueue for DRM scheduler

 drivers/gpu/drm/panthor/panthor_sched.c | 28 +++++++++++++++----------
 1 file changed, 17 insertions(+), 11 deletions(-)

-- 
2.54.0


* [RFC 1/2] drm/panthor: Remove redundant drm_sched_job_cleanup() from the .free_job callback
  2026-05-22 11:38 [RFC 0/2] Misc panthor bits Tvrtko Ursulin
@ 2026-05-22 11:38 ` Tvrtko Ursulin
  2026-05-22 11:38 ` [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler Tvrtko Ursulin
  1 sibling, 0 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-05-22 11:38 UTC
  To: dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Boris Brezillon, Liviu Dudau,
	Steven Price

After calling drm_sched_job_cleanup(), the .free_job callback releases its
reference to the job, and dropping the last reference calls the
drm_sched_job_cleanup() helper again.

We can therefore remove the redundant call from the .free_job callback.

However, the "if (job->base.s_fence)" guard in job_release() has to stay:
besides handling the double cleanup described above, it also covers all
job cleanup paths that run before the job has been armed.
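
To make the double-cleanup argument concrete, here is a userspace C model of
the refcounting described above. It is only a sketch: model_job,
model_job_cleanup(), model_job_release() and the two free_job_*() variants are
hypothetical stand-ins for panthor_job, drm_sched_job_cleanup(), job_release()
and queue_free_job(), and the model assumes (as the guard implies) that cleanup
clears s_fence. In the model, cleanup runs exactly once with or without the
explicit call. One visible difference is that, without it, cleanup is deferred
until the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for panthor_job / drm_sched_job. */
struct model_job {
	int refcount;
	void *s_fence;	/* non-NULL once armed, cleared by cleanup */
};

static int fence_sentinel;	/* stands in for a real scheduler fence */
static int cleanup_calls;	/* times cleanup actually released the fence */

/* Models drm_sched_job_cleanup(): drops the fence and clears s_fence. */
static void model_job_cleanup(struct model_job *job)
{
	assert(job->s_fence);
	job->s_fence = NULL;
	cleanup_calls++;
}

/* Models job_release(): the s_fence guard covers never-armed jobs as well
 * as jobs whose cleanup already ran. */
static void model_job_release(struct model_job *job)
{
	if (job->s_fence)
		model_job_cleanup(job);
	free(job);
}

static void model_job_put(struct model_job *job)
{
	assert(job->refcount > 0);
	if (--job->refcount == 0)
		model_job_release(job);
}

/* .free_job before the patch: explicit cleanup, then drop the reference. */
static void free_job_old(struct model_job *job)
{
	model_job_cleanup(job);
	model_job_put(job);
}

/* .free_job after the patch: dropping the reference is enough. */
static void free_job_new(struct model_job *job)
{
	model_job_put(job);
}

static struct model_job *model_job_create(bool armed)
{
	struct model_job *job = calloc(1, sizeof(*job));

	if (!job)
		return NULL;
	job->refcount = 1;
	job->s_fence = armed ? &fence_sentinel : NULL;
	return job;
}
```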

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 5b34032deff8..2bee1c92fb9e 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -3434,7 +3434,6 @@ queue_timedout_job(struct drm_sched_job *sched_job)
 
 static void queue_free_job(struct drm_sched_job *sched_job)
 {
-	drm_sched_job_cleanup(sched_job);
 	panthor_job_put(sched_job);
 }
 
-- 
2.54.0


* [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-22 11:38 [RFC 0/2] Misc panthor bits Tvrtko Ursulin
  2026-05-22 11:38 ` [RFC 1/2] drm/panthor: Remove redundant drm_sched_job_cleanup() from the .free_job callback Tvrtko Ursulin
@ 2026-05-22 11:38 ` Tvrtko Ursulin
  2026-05-22 16:08   ` Boris Brezillon
  2026-05-23 10:46   ` [RFC v2 " Tvrtko Ursulin
  1 sibling, 2 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-05-22 11:38 UTC
  To: dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Boris Brezillon, Liviu Dudau,
	Steven Price

Currently an unordered workqueue is used for the DRM scheduler, which means
its concurrency is managed by the workqueue core. Given that there is one
scheduler instance per userspace queue, the workqueue management logic is
within its rights to spawn many kernel threads to submit their respective
jobs.

The problem is that all .run_job callbacks are serialized on the
device-global mutex, so the potential thread storm only results in lock
contention.

Adding a separate, ordered workqueue for the DRM scheduler integration
avoids this problem, since the ordered property directly expresses the
serialized nature of the submission backend.
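
A minimal userspace model of the property relied on here: an ordered
workqueue runs at most one work item at a time, in queueing order, so
per-queue run_job items can never contend with each other on a global lock.
The ordered_wq type and its helpers are hypothetical, built on a single
pthread worker and a fixed-size FIFO; they are not the kernel workqueue API
(in the kernel the equivalent is alloc_ordered_workqueue()).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define OWQ_MAX_ITEMS 64

/* Hypothetical userspace model of an ordered workqueue: a single worker
 * thread runs queued items one at a time, strictly in FIFO order. */
struct ordered_wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int items[OWQ_MAX_ITEMS];	/* queued item ids, e.g. a sched queue index */
	int head, tail;
	bool stopping;
	pthread_t worker;
	int executed[OWQ_MAX_ITEMS];	/* execution log, written by the worker only */
	int n_executed;
};

/* Stand-in for a run_job work item. If it took a device-global lock, no
 * other item from this workqueue could be contending for it. */
static void run_item(struct ordered_wq *wq, int id)
{
	assert(wq->n_executed < OWQ_MAX_ITEMS);
	wq->executed[wq->n_executed++] = id;
}

static void *ordered_wq_worker(void *arg)
{
	struct ordered_wq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (wq->head == wq->tail && !wq->stopping)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (wq->head == wq->tail)
			break;	/* stopping and fully drained */
		int id = wq->items[wq->head++];

		pthread_mutex_unlock(&wq->lock);
		run_item(wq, id);	/* only this thread ever runs items */
		pthread_mutex_lock(&wq->lock);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static int ordered_wq_init(struct ordered_wq *wq)
{
	*wq = (struct ordered_wq){ .head = 0 };
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	return pthread_create(&wq->worker, NULL, ordered_wq_worker, wq);
}

/* Returns false once the fixed-size (non-recycled) FIFO is full. */
static bool ordered_wq_queue(struct ordered_wq *wq, int id)
{
	bool ok;

	pthread_mutex_lock(&wq->lock);
	ok = wq->tail < OWQ_MAX_ITEMS;
	if (ok) {
		wq->items[wq->tail++] = id;
		pthread_cond_signal(&wq->cond);
	}
	pthread_mutex_unlock(&wq->lock);
	return ok;
}

/* Like destroy_workqueue(): drain pending items, then stop the worker. */
static void ordered_wq_destroy(struct ordered_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = true;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->worker, NULL);
	pthread_cond_destroy(&wq->cond);
	pthread_mutex_destroy(&wq->lock);
}
```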

The other user of this workqueue is the .free_job callback, which is not
globally serialized in this manner and could therefore be expected to
regress with this change. That should not be the case, since commit
a58f317c1ca0 ("drm/sched: Free all finished jobs at once") made the DRM
scheduler clean up finished jobs more promptly.
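
The effect of the referenced commit on this patch can be sketched as follows,
assuming it replaced freeing a single finished job per work item execution
with a loop that frees every finished job at once. model_sched,
free_job_work_one() and free_job_work_all() are hypothetical; the point is
that fewer free-job executions means fewer turns taken on the now-ordered
workqueue.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_JOBS 8

/* Hypothetical model of one scheduler's pending list, in submission order. */
struct model_sched {
	bool finished[NR_JOBS];	/* "fence signalled" per job */
	int nr_jobs;
	int head;		/* oldest job not yet freed */
	int jobs_freed;
};

/* Models picking the next finished job: only the oldest pending job can be
 * returned, and only once it has finished. */
static bool get_finished_job(struct model_sched *s)
{
	assert(s->nr_jobs <= NR_JOBS);
	if (s->head < s->nr_jobs && s->finished[s->head]) {
		s->head++;
		return true;
	}
	return false;
}

/* Assumed old behaviour: free at most one job per work item execution.
 * Returns true if the work item would have to be queued again. */
static bool free_job_work_one(struct model_sched *s)
{
	if (get_finished_job(s))
		s->jobs_freed++;
	return s->head < s->nr_jobs && s->finished[s->head];
}

/* Behaviour after the commit: free every finished job in one execution. */
static void free_job_work_all(struct model_sched *s)
{
	while (get_finished_job(s))
		s->jobs_freed++;
}

/* Number of work item executions the old behaviour needs to catch up. */
static int work_runs_one_at_a_time(struct model_sched *s)
{
	int runs = 0;
	bool again;

	do {
		again = free_job_work_one(s);
		runs++;
	} while (again);
	return runs;
}
```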

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_sched.c | 27 ++++++++++++++++---------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 2bee1c92fb9e..cc6b3e2b015a 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -147,13 +147,11 @@ struct panthor_scheduler {
 	struct panthor_device *ptdev;
 
 	/**
-	 * @wq: Workqueue used by our internal scheduler logic and
-	 * drm_gpu_scheduler.
+	 * @wq: Workqueue used by our internal scheduler logic.
 	 *
 	 * Used for the scheduler tick, group update or other kind of FW
 	 * event processing that can't be handled in the threaded interrupt
-	 * path. Also passed to the drm_gpu_scheduler instances embedded
-	 * in panthor_queue.
+	 * path.
 	 */
 	struct workqueue_struct *wq;
 
@@ -166,6 +164,14 @@ struct panthor_scheduler {
 	 */
 	struct workqueue_struct *heap_alloc_wq;
 
+	/**
+	 * @sched_wq: Workqueue used for the DRM scheduler.
+	 *
+	 * Workqueue used for drm_gpu_scheduler instances embedded in
+	 * panthor_queue.
+	 */
+	struct workqueue_struct *sched_wq;
+
 	/** @tick_work: Work executed on a scheduling tick. */
 	struct delayed_work tick_work;
 
@@ -3488,7 +3494,7 @@ group_create_queue(struct panthor_group *group,
 {
 	struct drm_sched_init_args sched_args = {
 		.ops = &panthor_queue_sched_ops,
-		.submit_wq = group->ptdev->scheduler->wq,
+		.submit_wq = group->ptdev->scheduler->sched_wq,
 		/*
 		 * The credit limit argument tells us the total number of
 		 * instructions across all CS slots in the ringbuffer, with
@@ -4078,6 +4084,9 @@ static void panthor_sched_fini(struct drm_device *ddev, void *res)
 	if (sched->heap_alloc_wq)
 		destroy_workqueue(sched->heap_alloc_wq);
 
+	if (sched->sched_wq)
+		destroy_workqueue(sched->sched_wq);
+
 	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
 		drm_WARN_ON(ddev, !list_empty(&sched->groups.runnable[prio]));
 		drm_WARN_ON(ddev, !list_empty(&sched->groups.idle[prio]));
@@ -4167,13 +4176,11 @@ int panthor_sched_init(struct panthor_device *ptdev)
 	 * FW is smart enough to fall back on other methods if the kernel can't
 	 * allocate memory, and fail the tiling job if none of these
 	 * countermeasures worked.
-	 *
-	 * Set WQ_MEM_RECLAIM on sched->wq to unblock the situation when the
-	 * system is running out of memory.
 	 */
 	sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0);
-	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
-	if (!sched->wq || !sched->heap_alloc_wq) {
+	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_UNBOUND, 0);
+	sched->sched_wq = alloc_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM, 0);
+	if (!sched->wq || !sched->heap_alloc_wq || !sched->sched_wq) {
 		panthor_sched_fini(&ptdev->base, sched);
 		drm_err(&ptdev->base, "Failed to allocate the workqueues");
 		return -ENOMEM;
-- 
2.54.0


* Re: [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-22 11:38 ` [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler Tvrtko Ursulin
@ 2026-05-22 16:08   ` Boris Brezillon
  2026-05-22 16:25     ` Tvrtko Ursulin
  2026-05-23 10:46   ` [RFC v2 " Tvrtko Ursulin
  1 sibling, 1 reply; 9+ messages in thread
From: Boris Brezillon @ 2026-05-22 16:08 UTC
  To: Tvrtko Ursulin; +Cc: dri-devel, kernel-dev, Liviu Dudau, Steven Price

On Fri, 22 May 2026 12:38:17 +0100
Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:

> Currently an unordered workqueue is used for the DRM scheduler which means
> its concurrency is externally managed, and given there is one scheduler
> instance per userspace queue, that means workqueue management logic is
> within its rights to spawn many kernel threads to submit their respective
> jobs.
> 
> Problem there is that all run job callbacks are serialized on the device
> global mutex, making the potential thread storm just causing lock
> contention.

Yeah, so initially we were not supposed to take the lock over the whole
run_job() function. We should normally be able to queue things to the
ring buffer, and only briefly take the lock to check if the context is
still resident and kick the group scheduler if it's not. I agree that
in practice it turned into a huge synchronization point. I guess we should
consider turning that mutex into a rwsem that's taken in write mode in
the tick path, and in read mode elsewhere.
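
A rough userspace sketch of the rwsem idea, using a POSIX rwlock in place of a
kernel rw_semaphore. All names are hypothetical. It assumes that the state
run_job() touches outside the residency check (here a ring tail and a
tick-request counter) is protected by other means, atomics in this model,
which is exactly the part that may not be easy in practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device state; sched_lock models the proposed rwsem. */
struct model_dev {
	pthread_rwlock_t sched_lock;
	bool group_resident;	/* only changed by the tick (writer) */
	atomic_int ring_tail;	/* ring buffer state written by run_job */
	atomic_int tick_requests;	/* "kick the group scheduler" requests */
};

static int model_dev_init(struct model_dev *dev)
{
	dev->group_resident = false;
	atomic_init(&dev->ring_tail, 0);
	atomic_init(&dev->tick_requests, 0);
	return pthread_rwlock_init(&dev->sched_lock, NULL);
}

/* Tick path: changes residency, so it takes the lock in write mode. */
static void model_tick(struct model_dev *dev, bool make_resident)
{
	pthread_rwlock_wrlock(&dev->sched_lock);
	dev->group_resident = make_resident;
	atomic_store(&dev->tick_requests, 0);
	pthread_rwlock_unlock(&dev->sched_lock);
}

/* run_job path: the ring buffer write needs no global lock; the residency
 * check only needs read mode, so several run_job calls may run at once. */
static void model_run_job(struct model_dev *dev, int nr_instrs)
{
	assert(nr_instrs > 0);
	atomic_fetch_add(&dev->ring_tail, nr_instrs);

	pthread_rwlock_rdlock(&dev->sched_lock);
	if (!dev->group_resident)
		atomic_fetch_add(&dev->tick_requests, 1);
	pthread_rwlock_unlock(&dev->sched_lock);
}
```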

> 
> If we add a separate ordered workqueue for the DRM scheduler integration
> we can avoid this problem, since the ordered property directly expresses
> the nature of the submission backend implementation.

I don't see alloc_ordered_workqueue() being used for the sched_wq
workqueue. Is that intended? According to this comment, you seem to
want an ordered wq.

> 
> And considering the other user of this workqueue, the free job callback,
> which is not globally serialized in this manner so could be thought to
> potentially regress with this change, it should not be the case since
> commit
> a58f317c1ca0 ("drm/sched: Free all finished jobs at once")
> made the DRM scheduler handle the cleanup of finished jobs more promptly.
> [...]
>  	sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0);
> -	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
> -	if (!sched->wq || !sched->heap_alloc_wq) {
> +	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_UNBOUND, 0);
> +	sched->sched_wq = alloc_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM, 0);

The other one also needs MEM_RECLAIM, because you need work items
queued to sched->wq to run to guarantee forward progress, and you need
to guarantee forward progress to reclaim GPU mem.

> +	if (!sched->wq || !sched->heap_alloc_wq || !sched->sched_wq) {
>  		panthor_sched_fini(&ptdev->base, sched);
>  		drm_err(&ptdev->base, "Failed to allocate the workqueues");
>  		return -ENOMEM;


* Re: [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-22 16:08   ` Boris Brezillon
@ 2026-05-22 16:25     ` Tvrtko Ursulin
  2026-05-23  6:06       ` Boris Brezillon
  0 siblings, 1 reply; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-05-22 16:25 UTC
  To: Boris Brezillon; +Cc: dri-devel, kernel-dev, Liviu Dudau, Steven Price


On 22/05/2026 17:08, Boris Brezillon wrote:
> On Fri, 22 May 2026 12:38:17 +0100
> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
> 
>> Currently an unordered workqueue is used for the DRM scheduler which means
>> its concurrency is externally managed, and given there is one scheduler
>> instance per userspace queue, that means workqueue management logic is
>> within its rights to spawn many kernel threads to submit their respective
>> jobs.
>>
>> Problem there is that all run job callbacks are serialized on the device
>> global mutex, making the potential thread storm just causing lock
>> contention.
> 
> Yeah, so initially we were not supposed to take the lock over the whole
> run_job() function. We should normally be able to queue things to the
> ring buffer, and only briefly take the lock to check if the context is
> still resident and kick the group scheduler if it's not. I agree that
> in practice it turned in a huge synchronization point. I guess we should
> consider turning that mutex into a rwsem that's taken in write mode in
> the tick path, and read-mode elsewhere.

Some software state is modified there too, so I am not sure how easy it
would be to make .run_job hold only the read lock.

>> If we add a separate ordered workqueue for the DRM scheduler integration
>> we can avoid this problem, since the ordered property directly expresses
>> the nature of the submission backend implementation.
> 
> I don't see alloc_ordered_workqueue() being used for the sched_wq
> workqueue, is that intended (according to this comment, you seem to
> want an ordered wq).

Yeah, it seems I mistyped the wq allocation below.

>> And considering the other user of this workqueue, the free job callback,
>> which is not globally serialized in this manner so could be thought to
>> potentially regress with this change, it should not be the case since
>> commit
>> a58f317c1ca0 ("drm/sched: Free all finished jobs at once")
>> made the DRM scheduler handle the cleanup of finished jobs more promptly.
>> [...]
>>   	sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0);
>> -	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>> -	if (!sched->wq || !sched->heap_alloc_wq) {
>> +	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_UNBOUND, 0);
>> +	sched->sched_wq = alloc_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM, 0);
> 
> The other one also needs MEM_RECLAIM, because you need work items
> queued to sched->wq to run to guarantee forward progress, and you need
> to guarantee forward progress to reclaim GPU mem.

Ack, I had a feeling that might be the case.

I will respin next week or so. Or, if you tell me the global lock can
easily be dropped from .run_job, I can drop this patch and forget about it.
The wider context is that I am experimenting with a kthread_worker
conversion and trying to polish a
somewhat-broken-but-showing-great-latency-improvements branch. For that
RFC I can take either the one-global-worker route or the
one-worker-per-client route; it is no big deal either way for the prototype.

Regards,

Tvrtko

> 
>> +	if (!sched->wq || !sched->heap_alloc_wq || !sched->sched_wq) {
>>   		panthor_sched_fini(&ptdev->base, sched);
>>   		drm_err(&ptdev->base, "Failed to allocate the workqueues");
>>   		return -ENOMEM;
> 


* Re: [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-22 16:25     ` Tvrtko Ursulin
@ 2026-05-23  6:06       ` Boris Brezillon
  2026-05-23 13:12         ` Tvrtko Ursulin
  0 siblings, 1 reply; 9+ messages in thread
From: Boris Brezillon @ 2026-05-23  6:06 UTC
  To: Tvrtko Ursulin; +Cc: dri-devel, kernel-dev, Liviu Dudau, Steven Price

On Fri, 22 May 2026 17:25:18 +0100
Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:

> On 22/05/2026 17:08, Boris Brezillon wrote:
> > On Fri, 22 May 2026 12:38:17 +0100
> > Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
> >   
> >> Currently an unordered workqueue is used for the DRM scheduler which means
> >> its concurrency is externally managed, and given there is one scheduler
> >> instance per userspace queue, that means workqueue management logic is
> >> within its rights to spawn many kernel threads to submit their respective
> >> jobs.
> >>
> >> Problem there is that all run job callbacks are serialized on the device
> >> global mutex, making the potential thread storm just causing lock
> >> contention.  
> > 
> > Yeah, so initially we were not supposed to take the lock over the whole
> > run_job() function. We should normally be able to queue things to the
> > ring buffer, and only briefly take the lock to check if the context is
> > still resident and kick the group scheduler if it's not. I agree that
> > in practice it turned in a huge synchronization point. I guess we should
> > consider turning that mutex into a rwsem that's taken in write mode in
> > the tick path, and read-mode elsewhere.  
> 
> There is some software state modified too, so I am not sure how easy or 
> hard it would be to make run job only hold the read lock?

Yeah, it's probably not as easy as it sounds.

> 
> >> If we add a separate ordered workqueue for the DRM scheduler integration
> >> we can avoid this problem, since the ordered property directly expresses
> >> the nature of the submission backend implementation.  
> > 
> > I don't see alloc_ordered_workqueue() being used for the sched_wq
> > workqueue, is that intended (according to this comment, you seem to
> > want an ordered wq).  
> 
> Yeah, it seems I mistyped the wq allocation below.
> 
> >> And considering the other user of this workqueue, the free job callback,
> >> which is not globally serialized in this manner so could be thought to
> >> potentially regress with this change, it should not be the case since
> >> commit
> >> a58f317c1ca0 ("drm/sched: Free all finished jobs at once")
> >> made the DRM scheduler handle the cleanup of finished jobs more promptly.
> >> [...]
> >>   	sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0);
> >> -	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
> >> -	if (!sched->wq || !sched->heap_alloc_wq) {
> >> +	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_UNBOUND, 0);
> >> +	sched->sched_wq = alloc_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM, 0);  
> > 
> > The other one also needs MEM_RECLAIM, because you need work items
> > queued to sched->wq to run to guarantee forward progress, and you need
> > to guarantee forward progress to reclaim GPU mem.  
> 
> Ack, I had a feeling that might be the case.
> 
> I will respin next week or so. Or if you tell me the global lock can be 
> easily dropped from .run_job I can drop and forget about it. Wider 
> context is that I am experimenting with kthread_worker conversion and 
> trying to polish a 
> somewhat-broken-but-showing-great-latency-improvements branch.

Yep, I know, your experimental branch is actually on my list of things
to look at/test ;-).

> For that 
> I can kind of take either one global worker, or one worker per client 
> route for the RFC, no big deal either way for the prototype.

We probably want a worker per priority (and possibly per CPU), but
certainly not one per client, or you'll end up with the thread explosion
that was addressed by the kthread -> workqueue transition.
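
To illustrate the bound, a sketch of per-priority worker selection. Names are
hypothetical and the priority levels only loosely mirror panthor's
LOW/MEDIUM/HIGH/RT groups; a plain struct stands in for a kthread_worker, and
the per-CPU variant is left out. However many clients or queues exist, at most
PRIO_COUNT workers (threads) are used.

```c
#include <assert.h>

/* Hypothetical priority levels, loosely mirroring panthor's group priorities. */
enum model_prio { PRIO_LOW, PRIO_MEDIUM, PRIO_HIGH, PRIO_RT, PRIO_COUNT };

/* Stand-in for a kthread_worker: in the kernel, each one is a thread. */
struct model_worker {
	enum model_prio prio;
	int nr_queues;	/* scheduler queues fed into this worker */
};

static struct model_worker workers[PRIO_COUNT];

static void workers_init(void)
{
	for (int p = 0; p < PRIO_COUNT; p++)
		workers[p] = (struct model_worker){ .prio = (enum model_prio)p };
}

/* Per-priority selection: the number of workers (threads) stays bounded by
 * PRIO_COUNT however many clients or queues are created. A per-client
 * scheme would instead add one thread per client. */
static struct model_worker *worker_for_queue(enum model_prio prio)
{
	assert(prio < PRIO_COUNT);
	workers[prio].nr_queues++;
	return &workers[prio];
}
```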

Also, I haven't looked at your panthor changes in this branch yet, but if
we're switching drm_sched to kthread workers, we probably want to
transition most existing panthor works to kthread_work, because some FW
events might need to be processed for the GPU context to be unblocked, and
if we keep queuing those to a regular workqueue, they will lag behind the
HI_PRIO thread you have for HI_PRIO contexts.

* [RFC v2 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-22 11:38 ` [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler Tvrtko Ursulin
  2026-05-22 16:08   ` Boris Brezillon
@ 2026-05-23 10:46   ` Tvrtko Ursulin
  1 sibling, 0 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-05-23 10:46 UTC
  To: dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Boris Brezillon, Liviu Dudau,
	Steven Price

Currently an unordered workqueue is used for the DRM scheduler, which means
its concurrency is managed by the workqueue core. Given that there is one
scheduler instance per userspace queue, the workqueue management logic is
within its rights to spawn many kernel threads to submit their respective
jobs.

The problem is that all .run_job callbacks are serialized on the
device-global mutex, so the potential thread storm only results in lock
contention.

Adding a separate, ordered workqueue for the DRM scheduler integration
avoids this problem, since the ordered property directly expresses the
serialized nature of the submission backend.

The other user of this workqueue is the .free_job callback, which is not
globally serialized in this manner and could therefore be expected to
regress with this change. That should not be the case, since commit
a58f317c1ca0 ("drm/sched: Free all finished jobs at once") made the DRM
scheduler clean up finished jobs more promptly.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Steven Price <steven.price@arm.com>
---
v2:
 * Actually create an ordered wq.
 * Put back WQ_MEM_RECLAIM to sched->wq.
---
 drivers/gpu/drm/panthor/panthor_sched.c | 26 +++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 2bee1c92fb9e..f4dfd82ad8a8 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -147,13 +147,11 @@ struct panthor_scheduler {
 	struct panthor_device *ptdev;
 
 	/**
-	 * @wq: Workqueue used by our internal scheduler logic and
-	 * drm_gpu_scheduler.
+	 * @wq: Workqueue used by our internal scheduler logic.
 	 *
 	 * Used for the scheduler tick, group update or other kind of FW
 	 * event processing that can't be handled in the threaded interrupt
-	 * path. Also passed to the drm_gpu_scheduler instances embedded
-	 * in panthor_queue.
+	 * path.
 	 */
 	struct workqueue_struct *wq;
 
@@ -166,6 +164,14 @@ struct panthor_scheduler {
 	 */
 	struct workqueue_struct *heap_alloc_wq;
 
+	/**
+	 * @sched_wq: Workqueue used for the DRM scheduler.
+	 *
+	 * Workqueue used for drm_gpu_scheduler instances embedded in
+	 * panthor_queue.
+	 */
+	struct workqueue_struct *sched_wq;
+
 	/** @tick_work: Work executed on a scheduling tick. */
 	struct delayed_work tick_work;
 
@@ -3488,7 +3494,7 @@ group_create_queue(struct panthor_group *group,
 {
 	struct drm_sched_init_args sched_args = {
 		.ops = &panthor_queue_sched_ops,
-		.submit_wq = group->ptdev->scheduler->wq,
+		.submit_wq = group->ptdev->scheduler->sched_wq,
 		/*
 		 * The credit limit argument tells us the total number of
 		 * instructions across all CS slots in the ringbuffer, with
@@ -4078,6 +4084,9 @@ static void panthor_sched_fini(struct drm_device *ddev, void *res)
 	if (sched->heap_alloc_wq)
 		destroy_workqueue(sched->heap_alloc_wq);
 
+	if (sched->sched_wq)
+		destroy_workqueue(sched->sched_wq);
+
 	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
 		drm_WARN_ON(ddev, !list_empty(&sched->groups.runnable[prio]));
 		drm_WARN_ON(ddev, !list_empty(&sched->groups.idle[prio]));
@@ -4168,12 +4177,13 @@ int panthor_sched_init(struct panthor_device *ptdev)
 	 * allocate memory, and fail the tiling job if none of these
 	 * countermeasures worked.
 	 *
-	 * Set WQ_MEM_RECLAIM on sched->wq to unblock the situation when the
-	 * system is running out of memory.
+	 * Set WQ_MEM_RECLAIM on sched->wq and sched->sched_wq to unblock the
+	 * situation when the system is running out of memory.
 	 */
 	sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0);
 	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
-	if (!sched->wq || !sched->heap_alloc_wq) {
+	sched->sched_wq = alloc_ordered_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM);
+	if (!sched->wq || !sched->heap_alloc_wq || !sched->sched_wq) {
 		panthor_sched_fini(&ptdev->base, sched);
 		drm_err(&ptdev->base, "Failed to allocate the workqueues");
 		return -ENOMEM;
-- 
2.54.0



* Re: [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-23  6:06       ` Boris Brezillon
@ 2026-05-23 13:12         ` Tvrtko Ursulin
  2026-06-01  9:05           ` Tvrtko Ursulin
  0 siblings, 1 reply; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-05-23 13:12 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel, kernel-dev, Liviu Dudau, Steven Price


On 23/05/2026 07:06, Boris Brezillon wrote:
> On Fri, 22 May 2026 17:25:18 +0100
> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
> 
>> On 22/05/2026 17:08, Boris Brezillon wrote:
>>> On Fri, 22 May 2026 12:38:17 +0100
>>> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
>>>    
>>>> Currently an unordered workqueue is used for the DRM scheduler which means
>>>> its concurrency is externally managed, and given there is one scheduler
>>>> instance per userspace queue, that means workqueue management logic is
>>>> within its rights to spawn many kernel threads to submit their respective
>>>> jobs.
>>>>
>>>> The problem there is that all run job callbacks are serialized on the
>>>> device-global mutex, so the potential thread storm only results in lock
>>>> contention.
>>>
>>> Yeah, so initially we were not supposed to take the lock over the whole
>>> run_job() function. We should normally be able to queue things to the
>>> ring buffer, and only briefly take the lock to check if the context is
>>> still resident and kick the group scheduler if it's not. I agree that
>>> in practice it turned into a huge synchronization point. I guess we should
>>> consider turning that mutex into a rwsem that's taken in write mode in
>>> the tick path, and read-mode elsewhere.
>>
>> There is some software state modified too, so I am not sure how easy or
>> hard it would be to make run job only hold the read lock?
> 
> Yeah, it's probably not as easy as it sounds.
> 
>>
>>>> If we add a separate ordered workqueue for the DRM scheduler integration
>>>> we can avoid this problem, since the ordered property directly expresses
>>>> the nature of the submission backend implementation.
>>>
>>> I don't see alloc_ordered_workqueue() being used for the sched_wq
>>> workqueue. Is that intended? According to this comment, you seem to
>>> want an ordered wq.
>>
>> Yeah, it seems I mistyped the wq allocation below.
>>
>>>> As for the other user of this workqueue, the free job callback: since it
>>>> is not globally serialized in this manner, it could be expected to
>>>> regress with this change. That should not be the case, however, because
>>>> commit
>>>> a58f317c1ca0 ("drm/sched: Free all finished jobs at once")
>>>> made the DRM scheduler handle the cleanup of finished jobs more promptly.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>> Cc: Liviu Dudau <liviu.dudau@arm.com>
>>>> Cc: Steven Price <steven.price@arm.com>
>>>> ---
>>>>    drivers/gpu/drm/panthor/panthor_sched.c | 27 ++++++++++++++++---------
>>>>    1 file changed, 17 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
>>>> index 2bee1c92fb9e..cc6b3e2b015a 100644
>>>> --- a/drivers/gpu/drm/panthor/panthor_sched.c
>>>> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
>>>> @@ -147,13 +147,11 @@ struct panthor_scheduler {
>>>>    	struct panthor_device *ptdev;
>>>>    
>>>>    	/**
>>>> -	 * @wq: Workqueue used by our internal scheduler logic and
>>>> -	 * drm_gpu_scheduler.
>>>> +	 * @wq: Workqueue used by our internal scheduler logic.
>>>>    	 *
>>>>    	 * Used for the scheduler tick, group update or other kind of FW
>>>>    	 * event processing that can't be handled in the threaded interrupt
>>>> -	 * path. Also passed to the drm_gpu_scheduler instances embedded
>>>> -	 * in panthor_queue.
>>>> +	 * path.
>>>>    	 */
>>>>    	struct workqueue_struct *wq;
>>>>    
>>>> @@ -166,6 +164,14 @@ struct panthor_scheduler {
>>>>    	 */
>>>>    	struct workqueue_struct *heap_alloc_wq;
>>>>    
>>>> +	/**
>>>> +	 * @sched_wq: Workqueue used for the DRM scheduler.
>>>> +	 *
>>>> +	 * Workqueue used for drm_gpu_scheduler instances embedded in
>>>> +	 * panthor_queue.
>>>> +	 */
>>>> +	struct workqueue_struct *sched_wq;
>>>> +
>>>>    	/** @tick_work: Work executed on a scheduling tick. */
>>>>    	struct delayed_work tick_work;
>>>>    
>>>> @@ -3488,7 +3494,7 @@ group_create_queue(struct panthor_group *group,
>>>>    {
>>>>    	struct drm_sched_init_args sched_args = {
>>>>    		.ops = &panthor_queue_sched_ops,
>>>> -		.submit_wq = group->ptdev->scheduler->wq,
>>>> +		.submit_wq = group->ptdev->scheduler->sched_wq,
>>>>    		/*
>>>>    		 * The credit limit argument tells us the total number of
>>>>    		 * instructions across all CS slots in the ringbuffer, with
>>>> @@ -4078,6 +4084,9 @@ static void panthor_sched_fini(struct drm_device *ddev, void *res)
>>>>    	if (sched->heap_alloc_wq)
>>>>    		destroy_workqueue(sched->heap_alloc_wq);
>>>>    
>>>> +	if (sched->sched_wq)
>>>> +		destroy_workqueue(sched->sched_wq);
>>>> +
>>>>    	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
>>>>    		drm_WARN_ON(ddev, !list_empty(&sched->groups.runnable[prio]));
>>>>    		drm_WARN_ON(ddev, !list_empty(&sched->groups.idle[prio]));
>>>> @@ -4167,13 +4176,11 @@ int panthor_sched_init(struct panthor_device *ptdev)
>>>>    	 * FW is smart enough to fall back on other methods if the kernel can't
>>>>    	 * allocate memory, and fail the tiling job if none of these
>>>>    	 * countermeasures worked.
>>>> -	 *
>>>> -	 * Set WQ_MEM_RECLAIM on sched->wq to unblock the situation when the
>>>> -	 * system is running out of memory.
>>>>    	 */
>>>>    	sched->heap_alloc_wq = alloc_workqueue("panthor-heap-alloc", WQ_UNBOUND, 0);
>>>> -	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>>>> -	if (!sched->wq || !sched->heap_alloc_wq) {
>>>> +	sched->wq = alloc_workqueue("panthor-csf-sched", WQ_UNBOUND, 0);
>>>> +	sched->sched_wq = alloc_workqueue("panthor-drm-sched", WQ_MEM_RECLAIM, 0);
>>>
>>> The other one also needs MEM_RECLAIM, because you need work items
>>> queued to sched->wq to run to guarantee forward progress, and you need
>>> to guarantee forward progress to reclaim GPU mem.
>>
>> Ack, I had a feeling that might be the case.
>>
>> I will respin next week or so. Or if you tell me the global lock can be
>> easily dropped from .run_job I can drop and forget about it. Wider
>> context is that I am experimenting with kthread_worker conversion and
>> trying to polish a
>> somewhat-broken-but-showing-great-latency-improvements branch.
> 
> Yep, I know, your experimental branch is actually on my list of things
> to look at/test ;-).

Hold off a few days at least, there is a UAF in the branch Chia-I tested 
and I am in the process of reworking some details to eliminate that.

>> For that
>> I can kind of take either one global worker, or one worker per client
>> route for the RFC, no big deal either way for the prototype.
> 
> We probably want a worker per-prio (and possibly per-cpu), but certainly
> not one per-client, or you'll end up with the thread explosion that was
> addressed by the kthread -> workqueue transition.

Yeah I know, it is non-trivial, especially when different drivers are 
considered. There are multiple angles I am attacking that from and we 
can discuss it once I have something polished enough for sharing.

One of those things is the fact that both panthor and, I think, xe
currently serialize on a device-global lock (the per-GuC CT lock in the
case of xe), so the whole point of per-scheduler workers is questionable.
I have yet to look at the other drivers to get a fuller picture, but
there is also VM BIND in panthor, which is a problem.

> Also, I didn't look at your panthor changes in this branch yet, but if
> we're switching drm_sched to kthread workers, we probably want to
> transition most existing panthor works to kthread_work, because some FW
> events might need to be processed for the GPU context to be unblocked,
> and if we keep queuing those to a regular workqueue, they will be lagging
> behind the HI_PRIO thread you have for HI_PRIO contexts.

Possibly, yes. I glanced over some of the tick-based machinery, so it is
plausible that it could be changed completely. For now the goal was to do
a proof of concept and see whether the numbers support it. The numbers
seem dramatically strong, so I am cleaning it all up for an RFC.

Regards,

Tvrtko



* Re: [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler
  2026-05-23 13:12         ` Tvrtko Ursulin
@ 2026-06-01  9:05           ` Tvrtko Ursulin
  0 siblings, 0 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-06-01  9:05 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel, kernel-dev, Liviu Dudau, Steven Price

On 23/05/2026 14:12, Tvrtko Ursulin wrote:

8><

>>> I will respin next week or so. Or if you tell me the global lock can be
>>> easily dropped from .run_job I can drop and forget about it. Wider
>>> context is that I am experimenting with kthread_worker conversion and
>>> trying to polish a
>>> somewhat-broken-but-showing-great-latency-improvements branch.
>>
>> Yep, I know, your experimental branch is actually on my list of things
>> to look at/test ;-).
> 
> Hold off a few days at least, there is a UAF in the branch Chia-I tested 
> and I am in the process of reworking some details to eliminate that.

FYI the new branch is at:

https://cgit.freedesktop.org/~tursulin/drm-intel/log/?h=drm-sched-kworker-single-submit

And benchmark numbers from Chia-I are at:

https://gitlab.freedesktop.org/panfrost/linux/-/work_items/49#note_3484837

Those were collected from the old branch, but I am quite confident the 
fixed version will behave the same.

In summary, the results are quite compelling and, details aside, I guess
the question at some point will be where we go from here. Do we proceed
with building something on top of kthread_worker, if the problem of
spawning too many threads cannot otherwise be avoided? Do we go back to
the workqueue maintainers and ask them to reconsider adding priority
inheritance or something similar? Or is there some third option?

Regards,

Tvrtko



Thread overview: 9+ messages
2026-05-22 11:38 [RFC 0/2] Misc panthor bits Tvrtko Ursulin
2026-05-22 11:38 ` [RFC 1/2] drm/panthor: Remove redundant drm_sched_job_cleanup() from the .free_job callback Tvrtko Ursulin
2026-05-22 11:38 ` [RFC 2/2] drm/panthor: Use separate workqueue for DRM scheduler Tvrtko Ursulin
2026-05-22 16:08   ` Boris Brezillon
2026-05-22 16:25     ` Tvrtko Ursulin
2026-05-23  6:06       ` Boris Brezillon
2026-05-23 13:12         ` Tvrtko Ursulin
2026-06-01  9:05           ` Tvrtko Ursulin
2026-05-23 10:46   ` [RFC v2 " Tvrtko Ursulin
